Generative models are probability engines, not knowledge engines. In legal, that gap is the difference between a brief that's filed and a brief that's sanctioned. Here's why the LLM alone isn't enough — and what a deterministic layer actually does about it.
An LLM is a probability engine. Picture a giant matrix of every word, mapping the probability of the next word or phrase based on the words and phrases that came before it. The model emits whichever next tokens score as most probable, or samples from among the likeliest. That's why it's extraordinarily good at certain tasks and absolutely atrocious at others.
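Here is a toy version of that idea, offered as a sketch and not as how any particular model is built. The candidate tokens and their scores are made up for illustration. The code turns scores into probabilities with a softmax, then either takes the top token or samples from the distribution. Notice that nothing in it checks whether the continuation is true.

```python
import math
import random

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical scores for the token that follows "The Supreme Court held in"
scores = {"Bruen": 2.1, "Heller": 1.7, "Smith": 0.3, "Marbury": -0.5}
probs = softmax(scores)

# Greedy decoding: always take the single most probable token.
greedy_choice = max(probs, key=probs.get)

# Sampling: draw a token in proportion to its probability (seeded for repeatability).
rng = random.Random(0)
sampled_choice = rng.choices(list(probs), weights=list(probs.values()), k=1)[0]

print(greedy_choice, sampled_choice)
```

Either way, the selection rule is "likely", not "correct". Which case was actually the right one to cite never enters the calculation.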
It doesn't comprehend certainty. It doesn't comprehend anything, actually. It's a stochastic language probability map. When you ask it to qualify itself if uncertain, it isn't calculating uncertainty — it's mapping your prose about uncertainty onto the probability map and spitting out tokens that look like a hedge.
That's the part that gets lost in the marketing. The model isn't being humble or confident. It isn't "thinking" at all. It's continuing a sequence in the most statistically expected way.
Two very different jobs get conflated all the time: document extraction over a discovery set, and case-based research. They are not the same problem.
The documents in a discovery set tend to resemble the text the model was trained on, so the probability of hallucination is lower. Where extraction still fails is in the dozen pre-processing layers that distort a document before the model ever sees it: OCR errors, redactions, page order, exhibit tagging. Firms (and their clients) tend to accept that risk because page counts on a matter routinely run eight to twelve figures, and the cost of manually reviewing a billion pages has been indefensible since well before LLMs existed. Fortune 500 clients made that calculation during the 2008-2009 financial crisis.
Case-based research is the inverse. The authoritative corpus is narrow: cases, statutes, regulations, treatises, law review articles. The LLM's training corpus is not that. It is everything ever written about cases, statutes, and regulations — including articles by non-lawyers that don't represent, or don't accurately represent, the underlying authority. All of that ends up in the same probability map, weighted by volume.
It gets dramatically worse in politicized areas of law. Pick your flavor — immigration, election law, Second Amendment, abortion, antitrust, civil rights. In those areas, the volume of non-lawyer commentary (op-eds, advocacy pieces, think-tank "explainers", Reddit threads, viral tweets) absolutely dwarfs the actual case law and statutory text by orders of magnitude.
And remember: the probability map weights by volume, not by authority. The model has no concept that a law review article should outweigh a partisan blog post. To the matrix, they're both just tokens.
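A deliberately crude analogy for that point, assuming a six-document mini-corpus that is entirely made up. A real model does not tally claims like this, so treat it only as an illustration of what "weighted by volume" means. The first count ignores where a claim came from; the second applies hypothetical authority weights for contrast.

```python
from collections import Counter

# Hypothetical mini-corpus: (source type, what the text says a case held)
corpus = [
    ("opinion", "holding A"),
    ("law_review", "holding A"),
    ("blog", "holding B"),
    ("op_ed", "holding B"),
    ("forum_post", "holding B"),
    ("social_media", "holding B"),
]

# Volume-only view: the source type is thrown away, every document counts once.
by_volume = Counter(claim for _, claim in corpus)
most_common_claim = by_volume.most_common(1)[0][0]

# Authority-aware view, for contrast. These weights are assumptions, not a standard.
AUTHORITY = {"opinion": 10.0, "law_review": 3.0}
by_authority = Counter()
for source_type, claim in corpus:
    by_authority[claim] += AUTHORITY.get(source_type, 0.0)
most_authoritative_claim = by_authority.most_common(1)[0][0]

print(most_common_claim, most_authoritative_claim)
```

Counted by volume, the commentary wins four to two. Only the second view, which someone had to build on purpose, lets the opinion outrank the blog post.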
The real killer is that political rhetoric paraphrases law in ways that structurally mimic citations. "The Supreme Court held in Bruen that…" followed by a complete mischaracterization is shaped identically to an accurate cite. The model learns the shape of legal authority from text that wears the costume of legal authority. When you then ask it about a politicized area, it confidently produces something that sounds exactly like a real holding, because rhetoric taught it how holdings sound, not what they actually say.
The more politicized the area, the worse that signal-to-noise gets. And the more confident the LLM's output reads, the more likely you are to file it without a second look. That is how careful lawyers — at firms with names on the building — end up on the sanctions list.
We've already established the LLM can't comprehend certainty and can't comprehend authority. Asking it to qualify itself better, prompt itself harder, or "think step by step" doesn't change what it fundamentally is. So stop asking it to.
A deterministic layer refuses to trust the LLM at the boundary where it matters. Before a citation lands in a document, it has to resolve against ground truth — not against the model's statistical impression of ground truth. If it doesn't resolve, it doesn't ship.
Start with the draft: prose, structure, argument flow. This is what generative models are actually good at. Let them work.
Next, extract every citation: cases, statutes, regulations, full cites, short cites, Id., supra. Every claim of authority gets pulled out of the draft and queued for verification. Nothing gets a pass because it "looks fine".
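A minimal sketch of that extraction step, assuming two simplified regular expressions that cover only a handful of federal reporters plus Id. and supra. Real citation formats are far more varied than this, and the function name, patterns, and sample draft (including its citation) are all made up for illustration.

```python
import re

# Simplified patterns: a few federal reporters only. Not a complete grammar.
FULL_CITE = re.compile(
    r"\b(?P<volume>\d{1,3})\s+"
    r"(?P<reporter>U\.S\.|S\. Ct\.|F\.(?:2d|3d|4th)|F\. Supp\.(?: [23]d)?)\s+"
    r"(?P<page>\d{1,4})"
    r"(?:,\s*(?P<pin>\d{1,4}))?"
)
SHORT_FORM = re.compile(r"\bId\.(?:\s+at\s+(?P<pin>\d{1,4}))?|\bsupra\b")

def extract_citations(text):
    """Return every citation-like span, in order of appearance."""
    found = []
    for m in FULL_CITE.finditer(text):
        found.append({"kind": "full", "text": m.group(0),
                      "start": m.start(), **m.groupdict()})
    for m in SHORT_FORM.finditer(text):
        found.append({"kind": "short", "text": m.group(0),
                      "start": m.start(), "pin": m.group("pin")})
    return sorted(found, key=lambda c: c["start"])

# Made-up draft text with a made-up citation.
draft = "See Doe v. Roe, 123 F.3d 456, 460. Id. at 461. As noted supra, the rule is settled."
queue = extract_citations(draft)
for cite in queue:
    print(cite["kind"], cite["text"])
```

The point of the design is that the queue is built by pattern, not by judgment. A cite that reads as plausible and a cite that reads as odd both land in the same queue.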
Does the case exist in the indexed reporter? Does the statute section exist in the official code? Was it in effect on the filing date? If the answer is no, the cite is rejected — no LLM gets to argue with that result.
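Those three questions can be sketched as plain lookups. The two in-memory indexes below are hypothetical stand-ins for an indexed reporter and an official code, and the entries in them (including "Example Code" section 100.1 and its dates) are invented. What matters is the shape: each check returns yes or no from stored data, with no model in the loop.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class StatuteVersion:
    effective_from: date
    effective_to: Optional[date]  # None means still in effect

# Hypothetical ground-truth indexes with made-up entries.
REPORTER_INDEX = {("123", "F.3d", "456")}
CODE_INDEX = {
    ("Example Code", "100.1"): [
        StatuteVersion(date(2010, 1, 1), date(2019, 12, 31)),
        StatuteVersion(date(2020, 1, 1), None),
    ],
}

def case_exists(volume, reporter, page):
    """True only if the exact volume/reporter/page is in the index."""
    return (volume, reporter, page) in REPORTER_INDEX

def statute_in_effect(code, section, on):
    """True only if some version of the section was in force on the given date."""
    for version in CODE_INDEX.get((code, section), []):
        started = version.effective_from <= on
        not_ended = version.effective_to is None or on <= version.effective_to
        if started and not_ended:
            return True
    return False

print(case_exists("123", "F.3d", "456"))                           # in the index
print(case_exists("999", "F.3d", "1"))                             # not in the index
print(statute_in_effect("Example Code", "100.1", date(2021, 6, 1)))
```

A missing key is a rejection. There is no fallback path that asks a model whether the cite "probably" exists.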
The pin cite has to point to text that actually exists at that location. The proposition has to be supported by the language at that pin cite, not by adjacent vibes. Quotation drift, version drift, paraphrase passed off as direct quotation: all of it gets flagged.
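The quotation half of that check can be sketched as a string comparison against the cited page. The page-indexed opinion text below is a made-up stand-in, and the function names are hypothetical. Whitespace and curly quotes are normalized so layout differences don't count as drift, but the words themselves have to match exactly.

```python
import re

# Hypothetical page-indexed opinion text: page number -> text on that page (invented).
OPINION_PAGES = {
    460: "A party may not rely on evidence it failed to disclose during discovery.",
}

def normalize(text):
    """Collapse whitespace and unify curly quotes; leave the wording untouched."""
    text = text.replace("\u201c", '"').replace("\u201d", '"').replace("\u2019", "'")
    return re.sub(r"\s+", " ", text).strip()

def quote_at_pin(quote, pages, pin):
    """True only if the quoted language appears verbatim on the cited page."""
    page_text = pages.get(pin)
    if page_text is None:
        return False  # the pin cite points at a page we do not have
    return normalize(quote) in normalize(page_text)

exact = "may not rely on evidence it failed to disclose"
paraphrase = "cannot rely on evidence it did not disclose"

print(quote_at_pin(exact, OPINION_PAGES, 460))       # verbatim, right page
print(quote_at_pin(paraphrase, OPINION_PAGES, 460))  # paraphrase in quote marks
print(quote_at_pin(exact, OPINION_PAGES, 461))       # right words, wrong pin
```

This sketch only covers direct quotations. Whether the language at the pin cite supports a paraphrased proposition is a harder question that a substring test cannot settle, which is one reason unresolved items go to a human.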
Anything that didn't resolve cleanly is surfaced to the attorney with the evidence: the reporter, the pin cite, the actual language. No silent pass-throughs. The deterministic layer is the partner who reads every cite before signing the brief.
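One way to picture that hand-off, as a sketch under assumptions: the `Finding` fields, the function names, and the sample findings are all hypothetical. Every checked cite produces a record, failures sort to the top, and the output order depends only on the content, so the same findings always serialize to the same report.

```python
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class Finding:
    cite: str
    exists: bool
    quote_verified: bool
    evidence: str  # what was actually found at the source

def passed(finding):
    return finding.exists and finding.quote_verified

def needs_review(findings):
    """Everything that did not resolve cleanly. Nothing is dropped silently."""
    return [f for f in findings if not passed(f)]

def build_report(findings):
    """Failures first, then alphabetical by cite, so the order is stable."""
    ordered = sorted(findings, key=lambda f: (passed(f), f.cite))
    return json.dumps([asdict(f) for f in ordered], indent=2, sort_keys=True)

# Made-up findings for illustration.
findings = [
    Finding("123 F.3d 456, 460", True, True, "quote found on page 460"),
    Finding("999 F.3d 1, 5", False, False, "no such volume/page in index"),
]

report = build_report(findings)
print(report)
```

Sorting before serializing is the small design choice that makes the report reproducible: feed the findings in any order and the text that comes out is identical.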
People want the LLM to be Mike Ross — legal theorization and photographic memory in one package. It isn't. The LLM is the theorization without the photographic memory. The deterministic layer is the photographic memory without the theorization. You want Mike? You need both.
Verbatim is the deterministic memory layer for citation and quote verification. Run it against a brief — yours, or one you've received — and it produces a report that says, for every authority cited, whether the cite is real and whether the quoted language actually appears in the cited opinion at the pin cite. Same brief, same report, every time.