Industry News

The Problem With AI Scribes Isn't Just the Model

81% of physicians now use AI. More than 70% still don't fully trust it. The conversation fixates on model accuracy — but the documentation layer beneath is often the real problem.

6 min read • March 19, 2026
Healthcare AI • AI Scribes • Clinical Documentation • Knowledge Management • Healthcare Technology

Four in five physicians now use AI in their practice. That number more than doubled in three years. And more than 70% of them still cite accuracy and reliability as their primary concern with the technology.

Read that again. Massive adoption. Persistent doubt. That's not a contradiction; it's a clinical judgment call.

The skepticism is rational

A Doximity survey of more than 3,000 physicians found 94% have adopted AI or plan to. More than half are already using it in practice. The AMA's own numbers confirm the trend: 81% of physicians use AI professionally in 2026, compared to 38% in 2023.

These aren't early adopters taking a flier on experimental tools. These are working clinicians integrating AI into patient care. And still, 7 in 10 have reservations about accuracy.

The industry response has been predictable: better models, tighter prompts, more training data. That's not wrong. But it misses something important.

A lot of AI documentation errors don't come from the model. They come from what the model is working with.

AI scribes are doing more than transcription now

The early version of AI scribes was simple enough: physician speaks, AI transcribes, documentation gets faster. Useful, but limited. The new generation is more ambitious.

AI scribes are being integrated into deeper clinical workflows — cross-referencing prior visit notes, surfacing medication histories, flagging relevant diagnoses during the encounter. Optum and Suki have expanded their collaboration into richer operational environments. R1 partnered with Heidi to connect AI scribing with revenue cycle functions. Computer vision tools are starting to appear in clinic settings, reading vitals from monitors in real time.

The surface area of what these systems touch is expanding. That's fine. The question is what those systems are reading when they go deeper.

The documentation problem nobody's naming

There's a distinction worth drawing — between model error and source-document error — that almost never appears in the accuracy debate.

When an AI scribe produces an inaccurate output, the reflex is to interrogate the model. Wrong model. Undertrained on clinical language. Not enough data. Sometimes that's right. But an equally common problem is that the model performed exactly as designed, and the source material it pulled from was wrong, outdated, or internally inconsistent.

Clinical research puts the medication discrepancy rate at 40-60% during standard encounters: in roughly half of all clinical interactions, the patient's documented medication list doesn't match what they're actually taking. Those lists live in the EHR. AI scribes pull from the EHR. The model doesn't know the list is stale; it reads what's there.

The same problem shows up with formulary updates that haven't propagated across systems, clinical protocols that got revised in one department but not another, and patient records reconciled in one context but not yet updated in others. An AI working from a confident but six-month-old document doesn't hallucinate — it reads accurately from an inaccurate source.
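
To make that failure mode concrete, here is a minimal sketch, in Python, of what a freshness gate at the retrieval layer could look like. Everything in it is an assumption for illustration: the `SourceDocument` shape, the `STALENESS_LIMITS` thresholds, and the idea that a last-reconciled timestamp is even available. This describes no vendor's API; the point is only that staleness can become visible metadata instead of a silent property of the source.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical record shape: any document an AI scribe might read from the EHR.
@dataclass
class SourceDocument:
    doc_id: str
    doc_type: str              # e.g. "medication_list", "formulary", "protocol"
    last_reconciled: datetime  # when a human last verified the content
    text: str

# Illustrative thresholds only; real limits belong to clinical governance.
# A medication list goes stale far faster than a care protocol.
STALENESS_LIMITS = {
    "medication_list": timedelta(days=90),
    "formulary": timedelta(days=180),
    "protocol": timedelta(days=365),
}

def freshness_gate(doc: SourceDocument, now: datetime) -> dict:
    """Attach an explicit staleness flag to a retrieved document.

    The model downstream still reads the same text; the difference is
    that "this source is old" is now data, not a silent property.
    """
    limit = STALENESS_LIMITS.get(doc.doc_type, timedelta(days=180))
    age_days = (now - doc.last_reconciled).days
    return {
        "doc_id": doc.doc_id,
        "text": doc.text,
        "age_days": age_days,
        "stale": age_days > limit.days,
    }
```

A gate like this fixes nothing by itself. What it changes is visibility: the scribe's output can carry a warning that its source was last verified 200 days ago, and the clinician reviewing the note can weigh it accordingly.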

That's a fundamentally different failure mode. And it doesn't respond to model improvements.

Better models don't fix stale knowledge

There's a tendency in the AI industry to assume that accuracy problems are model problems. That if you just keep scaling the model, keep improving the training data, keep tightening the prompts, the errors will eventually go away.

Some will. The ones that won't are the ones upstream — in the documentation that feeds the system.

Healthcare has a chronic maintenance problem with its own knowledge assets. Clinical guidelines get updated; the documents they replace often don't get removed. Formularies change; not every reference to the old version gets caught. Protocols are revised; the old version sits in an archive folder somewhere, structurally identical to the new one.
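
That archive problem is mechanically simple to surface, even if organizationally hard to fix. The sketch below is a hypothetical audit, not any EHR vendor's tooling: it assumes documents carry a family key (which protocol or formulary they describe) and a version date, and it flags everything that isn't the newest in its family.

```python
from collections import defaultdict
from datetime import datetime

def find_superseded(docs: list[tuple[str, str, datetime]]) -> list[str]:
    """Flag every document that is not the newest in its family.

    Input tuples are (doc_id, family_key, version_date). The old protocol
    is structurally identical to the new one, so only version metadata
    can tell them apart, which is exactly why archives accumulate them.
    """
    by_family: dict[str, list[tuple[str, datetime]]] = defaultdict(list)
    for doc_id, family, version_date in docs:
        by_family[family].append((doc_id, version_date))

    superseded: list[str] = []
    for versions in by_family.values():
        versions.sort(key=lambda v: v[1], reverse=True)  # newest first
        superseded.extend(doc_id for doc_id, _ in versions[1:])
    return superseded
```

Whether flagged documents get deleted, tombstoned, or excluded from retrieval is a policy question. The audit just makes the backlog countable.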

This matters in every industry. In healthcare it carries direct clinical stakes. An AI scribe that references outdated dosing guidance isn't the scribe's problem — it's a documentation infrastructure problem that the scribe just made visible.

The industry push for trustworthy AI in clinical settings has largely focused on the model layer: validation frameworks, accuracy benchmarks, explainability requirements. Much less attention has gone to the records layer underneath. HIMSS26 surfaced this tension explicitly — federal officials acknowledged that healthcare AI operates with few guardrails, but the policy responses pointed mostly at the AI systems themselves rather than the documentation they depend on.

Documentation infrastructure is the unsolved prerequisite

The path to physician trust in AI scribes runs through the documents those scribes read.

That means treating clinical documentation not as a passive archive that AI happens to access, but as active infrastructure that requires maintenance. Records need to be current. Contradictions need to be found and resolved, not just flagged. Outdated versions need to be removed, not just superseded. The AI that sits on top of that layer will only be as trustworthy as the layer itself.
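
Contradiction-finding is the part of that maintenance loop most amenable to automation. A deliberately tiny example, with made-up inputs: comparing two documented medication lists for conflicting doses. Detection is the automatable half; resolution still requires a clinician, but a conflict nobody detects is a conflict nobody resolves.

```python
def find_dose_conflicts(list_a: dict[str, str], list_b: dict[str, str]) -> list[str]:
    """Report drugs documented with different doses in two records."""
    conflicts = []
    for drug, dose in list_a.items():
        other_dose = list_b.get(drug)
        if other_dose is not None and other_dose != dose:
            conflicts.append(f"{drug}: {dose} vs {other_dose}")
    return conflicts

# Made-up example: the same patient, two systems, one drug, two doses.
print(find_dose_conflicts(
    {"metformin": "500 mg BID", "lisinopril": "10 mg daily"},
    {"metformin": "1000 mg BID", "lisinopril": "10 mg daily"},
))
# -> ['metformin: 500 mg BID vs 1000 mg BID']
```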

As AI scribes expand further into clinical workflows — reading deeper into patient histories, pulling from broader documentation sets, making more consequential connections — the quality of that source material becomes harder to ignore. The loop Oracle has built into its Emergency Department documentation tool illustrates the stakes: when AI writes documentation that other AI then reads, any errors in the source compound quietly, without human review at each step.

That's the real argument for getting the documentation layer right. Not just for accuracy. For maintaining any meaningful oversight of what the system knows.
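
One concrete safeguard against that kind of compounding, sketched under assumptions (the `ProvenanceTag` shape and the admission rule are illustrative, not a description of Oracle's or anyone else's product): tag every document with who wrote it, and refuse to index AI-authored documents into the AI-readable corpus until a human has signed off.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ProvenanceTag:
    author: str                        # "human" or a generating model's identifier
    reviewed_by: Optional[str] = None  # clinician sign-off, if any

def admit_to_corpus(tag: ProvenanceTag) -> bool:
    """Admission rule for an AI-readable corpus.

    AI-authored documents enter only after human review, so an error in
    one model's output can't silently become another model's source.
    """
    if tag.author != "human" and tag.reviewed_by is None:
        return False  # quarantine for review instead of indexing
    return True
```

The rule is trivial. The discipline of enforcing it at every write path is the infrastructure work.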

The closer

Physicians are right to want more before they fully trust AI documentation tools. The solution the industry keeps reaching for — better models — will get part of the way there. The part it won't address is the documentation layer underneath.

If health systems want clinical AI that doctors actually rely on, they need to treat their knowledge assets as infrastructure: maintained, reconciled, current. Not as a passive store the AI queries. An active system that stays accurate.

The trust problem isn't the model. In a lot of cases, it's what the model reads.


Mojar AI is a knowledge management platform that helps enterprises keep their documentation current, consistent, and queryable — so the AI systems built on top of it have something worth reading.

Related Resources

  • Epic's Agent Factory Will Deploy AI Across 85% of US Healthcare. Who's Keeping the Knowledge Accurate?
  • Epic Reads It. Oracle Writes It. Nobody's Checking If It's True.