Epistemic
Custody
Custody of what a model knows versus what it asserts.
A hallucination is not something we detect after the fact. It is a claim with no settleable receipt — structurally visible the moment it is minted.
🦷⟐♾️⿻ incision · containment · compression · tension mapThe problem
The grader is the thing being graded.
Every honesty tool you have seen builds a better lie detector: a sharper confidence score, a stronger judge model rating another model's truthfulness. The text and its certification come from the same place. The loop ratifies itself.
That loop has a second floor under it. "Was this true?" is usually not a question you can settle — truth needs a world to check against, and most claims don't carry one. So the self-grader quietly substitutes fluency for grounding. On clean benchmark prose it looks fine. On dense, confident, world-building text it breaks worst: charmed by a coherent voice, it waves claims through that nothing actually backs.
We do not try to fix the lie detector. We refuse the question it answers. We do not adjudicate truth — we measure provenance: whether a claim settles against a receipt at all. The claims that don't settle stay in the record, un-deletable, labelled, sitting next to the ones that do.
The approach
Provenance, not truth-claim.
A receipt is the smallest checkable unit of grounding: a cited span that either exists, verbatim, in the named source — or does not. A model can write fluently about a span that isn't there. A deterministic, model-free check cannot be charmed into agreeing that it is. That asymmetry is the whole instrument.
Honesty has three pressure points, so the lab now tests all three: one reads, one writes, one witnesses an agent inside a receipt-bound world.
Did the claim settle?
After a model speaks, run two graders over the same atomic claims — the model's own self-grader against a frozen structural judge — and ship the gap. The claims the language layer accepted but no receipt backs are the leak set.
May the model commit?
Before a model changes state, a deterministic controller decides whether to let it. The model may propose; only the controller may commit — and it re-derives its verdict from the diff itself, ignoring whatever the proposer claims about its own work.
Does the agent stay coupled?
Inside a deterministic world, the agent's narration is checked against receipts. It may describe, explain, or plan, but it cannot upgrade a rejected action, hide a failed event, or invent inventory the folded world state does not contain.
Instruments
Three pressure points on one custody boundary.
No instrument is "done" without an executable acceptance line that fails when it should. All three ship green, offline, with no network and no API key. The output below is verbatim from the real run.
provenance_escrow
v0 greenTwo graders over the same atomic claims — a leaky language self-grader versus a frozen, model-free structural judge that asks only "does the cited span exist verbatim in the named doc?" Ships the leak set. The adversarial corpus is the maximally-mystical text where a language honesty judge breaks worst.
commit_gate
oracle greenA deterministic controller takes a JSON seal proposal and decides COMMIT / NO_COMMIT / FORBIDDEN_BOUNDARY / ESCALATE / REJECT_MALFORMED by structural re-derivation only. It ignores the proposer's self-graded scores. A structurally-valid lie — schema clean, self-graded clean — is still rejected, because boundary-touches are computed from the diff text itself.
world_witness
v0 greenA WAKN-style receipt trace becomes the external witness. The agent report must mention the rejected action, preserve its rejection code, and make only state claims that match the folded after-world snapshots. Speech is rendered beside receipts, not above them.
The model may propose.
Only the controller may commit.
None of the instruments decides what is true. The read side proves a claim has a settleable receipt; the write side guards the custody boundary, not the correctness of the change; the world side checks narration against receipt-bound state, not intelligence. All refuse to certify a cleanliness they never checked — every metric is proven able to fire before any number is trusted.