Fluent Illusions

My sister told me she's never watching another cop show with me again. Apparently I do this very annoying thing where I guess the killer. And it's particularly annoying because I'm usually right. But no one would say I actually knew who the killer was ten minutes into a procedural cop show.

In philosophy, it is said that luck is the knowledge killer. It's another way of saying being right isn't the same as knowing. Most of the time, we're very good at telling the difference. At least when we're judging other people. But what if I asked you:

If you ask an LLM a question, and it provides the correct answer, would you say that it knows the answer?

I've asked several people this question, and the answer I've consistently gotten is 'no'. But when I push further and ask why, people struggle to explain. I think that's partly because it's a trick question. Knowledge isn't just about producing true statements. It requires a mechanism that tracks why those statements are true.

There are different reasons we might say something is true. The kind of truth most people understand implicitly is correspondence: a statement is true when it corresponds to reality.¹

But correspondence isn't the only way we make that evaluation. Coherence truth contends that something is true when it is coherent with other related premises.² Take the statement "Harry Potter is a wizard". Harry Potter isn't even a real person, but most of us would say that statement is true. Truth, in this sense, is a function of a system.

Here's another, possibly more familiar example. You write some code, the tests pass, and you're good to go, right? Wrong. We've all written code that was internally coherent, but when faced with a complex business problem, the assumptions used to write that code may be missing some necessary nuance. The tests indicate internal coherence, but the output may not correspond to what is actually needed.
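Here's a minimal sketch of that failure mode, with hypothetical numbers (the `total` function, the tax rate, and the pre-tax-discount rule are all invented for illustration). The code and its test share the same assumption, so the test can only ever confirm coherence:

```python
TAX_RATE = 0.10  # hypothetical rate, for illustration only

def total(price: float, discount: float) -> float:
    # Assumption baked in: the discount applies *before* tax.
    return (price - discount) * (1 + TAX_RATE)

# The test encodes the same assumption as the code, so passing proves
# internal coherence, not that the business wanted pre-tax discounts.
assert abs(total(100, 10) - 99.0) < 1e-9
print("tests pass")
```

If the business actually wanted the discount applied after tax, the correct answer would be 100.0, not 99.0, and the suite would stay green either way. The check that matters lives outside the system.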

If you've used a general-purpose LLM like Claude or ChatGPT, it can feel pretty magical. You can ask it to give you a resignation letter in Shakespearean iambic pentameter, and it will actually do that. I, however, cannot do that. That's the magic. But when an LLM produces a true statement, that output is downstream of internal relational consistency, not world-checking.

During training, models process massive amounts of text and learn statistical relationships between pieces of information within that data. A mechanism called attention allows the model to weigh which of those relationships are most relevant for a given context.³ The same words appear in different contexts, and attention is how the model encodes and resolves context. The word "delivery" might appear frequently near "mail" and "package," but in other contexts near "baby" and "hospital."

By the time training is done, the model has built an enormous internal map of how pieces of information relate to each other, derived entirely from statistical patterns in text. At inference, it applies that learned map to generate output. "Delivery" near "package" activates the pathways related to logistics rather than babies. The fact that a model can build a map complex enough to make that distinction is super cool. But that map is internally coherent, not externally verified.
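The "delivery" example can be sketched in a few lines. This is a toy scaled-dot-product attention with hand-made two-dimensional vectors (dimension 0 loosely "logistics", dimension 1 loosely "medicine"); the embeddings are illustrative stand-ins I've invented, not real learned representations:

```python
import math

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    # The output is a weight-blended mix of the value vectors.
    out = [sum(w * v[i] for w, v in zip(weights, values))
           for i in range(len(values[0]))]
    return weights, out

# Hypothetical 2-d embeddings: [logistics-ness, medicine-ness].
emb = {
    "the":      [0.1, 0.1],
    "package":  [2.0, 0.0],
    "hospital": [0.0, 2.0],
    "delivery": [1.0, 1.0],  # ambiguous until context weighs in
}

for sentence in (["the", "package", "delivery"],
                 ["the", "hospital", "delivery"]):
    vecs = [emb[w] for w in sentence]
    weights, contextual = attend(emb["delivery"], vecs, vecs)
    print(sentence, [round(w, 2) for w in weights],
          [round(x, 2) for x in contextual])
```

Same word, two different outputs: near "package", the blended vector for "delivery" leans logistics; near "hospital", it leans medicine. Everything here is relational similarity between vectors. Nothing checks whether any delivery actually happened.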

Models inherit correlations from representations of reality. Whether their outputs correspond to the world depends entirely on the data they were trained on. LLMs produce true statements frequently, which is what makes them convincing. But the truth is incidental. It's a byproduct of coherence aligning with reality, not a result of checking. Frequency of correctness isn't evidence of understanding; it's evidence of a training set that happens to correlate with reality.

Consider this example. In 2023, an attorney in New York was caught filing a brief that was entirely AI-generated.⁴ The brief sounded great, but upon closer inspection, the cases it cited simply didn't exist. ChatGPT had learned what a legal brief looks like and how to construct one, but it never performed the correspondence check needed to ensure the stated facts matched reality. Hallucination is a side effect of this coherence mechanism. It isn't a bug; it's a fundamental consequence of the architecture.

There are some ML approaches that attempt to close the gap. Reinforcement learning from human feedback (RLHF) is a mechanism that allows for some correspondence checking. Unlike RAG or tool use, RLHF actually changes the model's weights: human raters evaluate outputs, and the model learns from those preferences. Research has shown this measurably improves output quality.⁵ You could argue that this is a correspondence mechanism: humans can check against reality, their judgments get encoded into the model, and the model gets better at producing outputs that correspond to the world. That's a real counterargument.
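The reward-modeling step at the heart of RLHF can be sketched in a few lines. This is a toy version of the pairwise preference loss (Bradley-Terry form) with hypothetical reward scores; the real pipeline trains a neural reward model, but the shape of the signal is the same:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def preference_loss(reward_chosen, reward_rejected):
    # -log P(chosen beats rejected): small when the model already
    # scores the human-preferred output higher, large when it disagrees.
    return -math.log(sigmoid(reward_chosen - reward_rejected))

# Hypothetical scores the reward model assigns to two candidate answers.
print(preference_loss(2.0, 0.5))  # agrees with the rater: small loss
print(preference_loss(0.5, 2.0))  # disagrees with the rater: large loss
```

Notice what the loss actually touches: two scores and a human's pick. Reality never enters the computation directly; only the rater's judgment does. That's exactly what makes the next point matter.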

But the model doesn't gain the ability to do correspondence. It gains a behavioral approximation of what correspondence-checked outputs look like. This is correspondence by proxy. And the proxy has its own biases: humans love being told what they want to hear. Personally, it's been great for my confidence.

The same mechanism that improves accuracy also produces agreeableness — RLHF has been shown to produce sycophancy. The encoded preferences shape the interaction in subtle ways that create psychological pressure. If the output feels right and affirming, you're less likely to push back.* And if you're less likely to push back, you're probably less likely to verify. Which is the thing you needed to do in the first place.

Even perfect verification wouldn't solve a deeper issue. These models generate outputs from artifacts of human externalization: text, images, video, audio. And there are many truths that never get externalized at all. The gap between coherence and correspondence doesn't start at the output. It starts at the training data. But does that actually matter if the tools work well?


1. "The Correspondence Theory of Truth." Stanford Encyclopedia of Philosophy.

2. "The Coherence Theory of Truth." Stanford Encyclopedia of Philosophy.

3. 3Blue1Brown. "Attention in transformers, visually explained." YouTube (2024).

4. Bohannon, Molly. "Lawyer Used ChatGPT In Court—And Cited Fake Cases. A Judge Is Considering Sanctions." Forbes (2023).

5. Ouyang, Long, et al. "Training language models to follow instructions with human feedback." arXiv preprint arXiv:2203.02155 (2022).

* You're absolutely right!