Every few weeks, someone announces a tool that detects AI hallucinations. A startup, a research lab, a hyperscaler bolting a “trust layer” onto its chatbot. The release uses the word “guardrails.” Everyone nods. Another brick in the road to safe, reliable AI.
I want to argue that we are cheering for the wrong thing — that hallucination detection, however clever, cannot be the strategy. It can be a backstop. It can be a monitor. It cannot be the plan. And the reason is older than computing.
Start with the trap at the center of the whole idea.
To catch a hallucination, your detector has to know the right answer. Sit with what that means. The original model produced a confident falsehood because it did not have the grounded knowledge to do otherwise. Now you propose a second system to sit behind it and flag the lies. But to flag a lie, that second system has to know the truth — and if it knew the truth, you would not have needed the first model to guess in the first place. You would just serve the truth and skip the theater.
A detector good enough to reliably catch fabrication would have to possess exactly the capability whose absence caused the fabrication. Detection doesn’t solve the problem. It assumes the problem is already solved. That is the whole argument in a paragraph; everything else is just watching it play out.
So watch it play out. The first thing you notice is that a hallucination has no tell. When one of these models invents a court case, a citation, a drug dosage, a quarterly number, the sentence it produces is grammatically perfect, tonally identical to a true one, and delivered with precisely the same confidence. The model is not more hesitant when it lies. It does not sweat. There is no flicker. That is the entire reason this is hard: the false output and the true output are indistinguishable on their face. A detector staring at the text has nothing to grab onto, because there is nothing in the text to grab.
So the detector-builders do the sensible thing and go probabilistic. They get good — let’s be generous and say 95% good. And 95% sounds like an A. But invert it. In a hospital, a courtroom, a bank, a grid control room, 95% means one in twenty confident falsehoods walks right past the guard. And here is the cruel part: the ones that get through are not random. They are the most plausible fabrications in the batch — the ones convincing enough to fool the detector, which makes them precisely the ones most likely to fool you. A safety system that is only probabilistic is not a safety system. It is a liability with a press release.
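The arithmetic here, and the claim that the escapees skew toward the most plausible fabrications, can be made concrete with a small calculation. This is a toy sketch under one loudly stated assumption: a detector’s chance of missing a fabrication rises linearly with how plausible that fabrication is, scaled so the overall catch rate averages 95%. The plausibility scores, the linear miss curve, and the function name `escape_stats` are hypothetical illustrations, not measurements of any real detector.

```python
# Toy model (hypothetical): each fabrication gets a plausibility score in (0, 1).
# Assumption: the detector's miss probability rises linearly with plausibility,
# scaled so the average catch rate across all fabrications comes out to 95%.

def escape_stats(n: int = 1000, max_miss: float = 0.10) -> tuple[float, float, float, float]:
    # Evenly spaced plausibility scores: midpoints of n equal bins on (0, 1).
    scores = [(i + 0.5) / n for i in range(n)]
    # Assumed miss probability: near 0 for the clumsiest fakes, up to max_miss for the best.
    miss_probs = [max_miss * p for p in scores]

    expected_misses = sum(miss_probs)
    catch_rate = 1 - expected_misses / n
    overall_plausibility = sum(scores) / n
    # Average plausibility of what slips through, weighted by each item's miss probability.
    escapee_plausibility = sum(p * m for p, m in zip(scores, miss_probs)) / expected_misses
    return catch_rate, expected_misses, overall_plausibility, escapee_plausibility


catch_rate, misses, overall_p, escapee_p = escape_stats()
print(f"catch rate: {catch_rate:.1%}")
print(f"expected misses per 1,000 fabrications: {misses:.0f}")
print(f"mean plausibility, all fabrications: {overall_p:.2f}")
print(f"mean plausibility, the ones that escape: {escapee_p:.2f}")
```

Under these assumptions the detector scores a 95.0% catch rate, yet 50 of every 1,000 fabrications get through, and their mean plausibility is about 0.67 against 0.50 for the batch as a whole. The exact numbers depend entirely on the assumed miss curve; the point is only that any detector that misses convincing fakes more often than clumsy ones hands you a residue biased toward the convincing.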
It is also a treadmill. Every new model, every new domain, every fresh way of being wrong demands that the detector be retrained and re-tuned. It is antivirus software for an attacker that rewrites itself weekly — perpetual catch-up, by design. And you pay for it twice: once to generate the answer, again to check it, and you still don’t get certainty for the money.
But the deepest mistake here is a category error, and to name it I have to wade back into a fight I picked a quarter century ago.
Everyone reaches for W. Edwards Deming when they talk about quality: the American sage the Japanese supposedly heeded when Detroit wouldn’t. I once spent 4,400 words arguing that the standard story gets the hero wrong. The man who actually carried disciplined quality into occupied Japan was a 29-year-old radio engineer named Homer Sarasohn, sent by MacArthur in 1946 to rebuild a flattened electronics industry. He and his colleague Charles Protzman, a Western Electric production man, spent four years teaching Japanese executives how to run a company and build things that worked. They literally wrote the handbook for it, a course book still in print in Japan today. When they went home, Sarasohn handed the baton to Deming, who had a gift for self-promotion and ended up with his name on the prize and the legend. (Sarasohn was no footnote; he went on to a long career at IBM. History simply looked past him.) A remarkable number of readers wrote in to tell me I had it backwards. I didn’t, and I still don’t.
The Deming faithful were the loudest. The real transformation, they insisted, came from a handful of lectures Deming gave Japanese executives in the summer of 1950, as if quality had arrived by seminar. Nonsense. If a few brilliant talks were all it took, answer me this: why did it take the better part of thirty years for Japan to turn quality into a weapon? The tools had been on the shelf since 1950: Sarasohn’s manual, Protzman’s production discipline, Deming’s statistics, all of it.
What finally lit the fire was the memory chip. When Hitachi and the other Japanese makers went after the DRAM business Intel had invented, they slammed into the cruelest arithmetic in manufacturing: in a commodity chip, yield is the entire margin — and theirs was too low to make a dime. The answer had been sitting in Sarasohn’s handbook for three decades: build quality into the process instead of inspecting the failures out at the end. This time they used it. Japanese yields climbed past the Americans’ — seventy and eighty percent against Intel’s fifty or sixty — and by the mid-1980s the company that invented the DRAM had been driven out of it. The instruction was never the bottleneck. Necessity was.
We just prefer the story where one clever intervention saves the day — which is exactly the story being sold to us again: that a hallucination detector will do for AI what we like to pretend a seminar did for Japan.
But here is what matters for our purposes, and it is bigger than who gets the statue. Whether you credit Sarasohn, Deming, or the Japanese engineers who did the actual work, they all arrived at the same unglamorous law: you cannot inspect quality into a product. Sarasohn found factories where “quality” meant building a pile of vacuum tubes and throwing ninety percent of them away — where no one saw the problem with assembling precision electronics in a shack with a dirt floor. You do not fix that by hiring more inspectors to stand at the end of the line catching the bad ones. Inspection is expensive, it is late, and it never catches everything. The only thing that works is to build quality in — to design the process so the defect never happens. The industry that learned this went on to bury the one that had won the war. We are still driving the proof.
Hallucination detection is the man with the clipboard at the end of the line. It is quality by inspection, in a field that should have learned the lesson from manufacturing forty years ago.
And here is the part the clipboard can never fix: hallucination is not a malfunction. The model isn’t breaking when it makes things up. It is doing exactly what it was built to do — predict the most plausible next word, with no native notion of whether that word is true. Fabrication isn’t a bug in the architecture. It is the architecture, working as designed. You cannot detect your way out of a feature.
Which points at the only strategy that survives contact with the problem. Stop trying to catch the lie after the fact, and build a system that knows the boundary of what it actually knows — one that can tell the difference between answering from grounded, verified knowledge and reaching past the edge into invention, and that says so when it gets there. Not a smarter smoke detector. A machine that doesn’t set the fire.
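One way to picture a system that “knows the boundary of what it knows” is one that answers only when it can point to a grounded source and otherwise says so, rather than emitting its most plausible guess. What follows is a deliberately tiny illustration under hypothetical assumptions: a hand-built verified store stands in for grounded knowledge, and the names `VERIFIED`, `GroundedAnswer`, `normalize`, and `answer` are invented here. It is not a description of how the author’s company or any product does this, and a real system would need far more than an exact-match lookup.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical store of verified facts: normalized question -> (answer, source).
# The single entry is taken from this column, purely as a placeholder.
VERIFIED = {
    "who was sent by macarthur in 1946 to rebuild japan's electronics industry?": (
        "Homer Sarasohn",
        "this column",
    ),
}


@dataclass(frozen=True)
class GroundedAnswer:
    text: Optional[str]    # None when the system declines to answer
    grounded: bool         # True only if the answer came from the verified store
    source: Optional[str]  # where the answer came from, if anywhere


def normalize(question: str) -> str:
    # Crude normalization: lowercase and collapse runs of whitespace.
    return " ".join(question.lower().split())


def answer(question: str) -> GroundedAnswer:
    hit = VERIFIED.get(normalize(question))
    if hit is None:
        # At the edge of what it knows, the system says so instead of guessing.
        return GroundedAnswer(text=None, grounded=False, source=None)
    text, source = hit
    return GroundedAnswer(text=text, grounded=True, source=source)


print(answer("Who was sent by MacArthur in 1946 to rebuild Japan's electronics industry?"))
print(answer("What was Intel's DRAM yield in 1984?"))
```

The first question comes back grounded, with its source attached; the second comes back as an explicit refusal rather than a plausible-sounding number. The design choice being illustrated is that “I don’t know” is a first-class return value of the system itself, not something a downstream filter has to infer after the fact.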
That is harder. It is architectural, not bolted on, and it does not make for a tidy press release about a new trust layer. But it is the only version of this that works in a courtroom, where “our filter catches 95%” is not a sentence you want to say to a judge.
Detection is not a strategy. Design is. Sarasohn knew it in 1948. It is past time we learned it about machines that talk.
(Disclosure: I co-founded 2Brains, which is built around designing it in rather than inspecting it out, so I come to this with a horse in the race. I’d make the argument anyway — I was making versions of it about Japanese factory floors a quarter century ago.)

“To catch a hallucination, your detector has to know the right answer. Sit with what that means. The original model produced a confident falsehood because it did not have the grounded knowledge to do otherwise. Now you propose a second system to sit behind it and flag the lies. But to flag a lie, that second system has to know the truth”
It really doesn’t. It needs to be able to take the information and *check* to see if it exists. That’s not the same thing. If a word gets challenged in Scrabble, it doesn’t come down to what everyone knows already. They use a dictionary to check.
“When one of these models invents a court case, a citation, a drug dosage, a quarterly number, the sentence it produces is grammatically perfect, tonally identical to a true one, and delivered with precisely the same confidence.”
All of these are *checkable.* Does the citation actually exist? Follow the reference. You’re treating this as if there were nothing outside the AI and the checker, but the entire world of knowledge exists as a reference.
How strange that Sarasohn mentions Donald Trump in your May 2000 article 🙂