Epistemic
Custody

Custody of what a model knows versus what it asserts.

A hallucination is not something we detect after the fact. It is a claim with no settleable receipt — structurally visible the moment it is minted.

🦷⟐♾️⿻ incision · containment · compression · tension map

The problem

The grader is the thing being graded.

Every honesty tool you have seen builds a better lie detector: a sharper confidence score, a stronger judge model rating another model's truthfulness. The text and its certification come from the same place. The loop ratifies itself.

That loop has a second floor under it. "Was this true?" is usually not a question you can settle — truth needs a world to check against, and most claims don't carry one. So the self-grader quietly substitutes fluency for grounding. On clean benchmark prose it looks fine. On dense, confident, world-building text it breaks worst: charmed by a coherent voice, it waves claims through that nothing actually backs.

We do not try to fix the lie detector. We refuse the question it answers. We do not adjudicate truth — we measure provenance: whether a claim settles against a receipt at all. The claims that don't settle stay in the record, un-deletable, labelled, sitting next to the ones that do.

The approach

Provenance, not truth-claim.

A receipt is the smallest checkable unit of grounding: a cited span that either exists, verbatim, in the named source — or does not. A model can write fluently about a span that isn't there. A deterministic, model-free check cannot be charmed into agreeing that it is. That asymmetry is the whole instrument.

Honesty has three pressure points, so the lab now tests all three: one reads, one writes, one witnesses an agent inside a receipt-bound world.

Read side

Did the claim settle?

After a model speaks, run two graders over the same atomic claims — the model's own self-grader against a frozen structural judge — and ship the gap. The claims the language layer accepted but no receipt backs are the leak set.

Write side

May the model commit?

Before a model changes state, a deterministic controller decides whether to let it. The model may propose; only the controller may commit — and it re-derives its verdict from the diff itself, ignoring whatever the proposer claims about its own work.

World side

Does the agent stay coupled?

Inside a deterministic world, the agent's narration is checked against receipts. It may describe, explain, or plan, but it cannot upgrade a rejected action, hide a failed event, or invent inventory the folded world state does not contain.

Instruments

Three pressure points on one custody boundary.

No instrument is "done" without an executable acceptance line that fails when it should. All three ship green, offline, with no network and no API key. The output below is verbatim from the real run.

Read side

provenance_escrow

v0 green

Two graders over the same atomic claims — a leaky language self-grader versus a frozen, model-free structural judge that asks only "does the cited span exist verbatim in the named doc?" Ships the leak set. The adversarial corpus is the maximally-mystical text where a language honesty judge breaks worst.

ANTHROPIC_API_KEY= python3 run_asymmetry.py HARD GATE PASS: structural_judge.py unchanged HARD GATE PASS: planted_lies.json unchanged CANARY: structural judge marked 5/5 planted lies UNBACKED (matching rejection codes), 3/3 clean claims BACKED DELETION MATRIX: disabling span-containment flipped a planted lie to BACKED (guard load-bearing); restored LEAK SET: 5 claims lang=BACKED structural=UNBACKED (over 11 atomic claims) [K>0] METRIC LIVE: deletion probe incremented leak counter 5->6 ASYMMETRY: PROVEN

Read the read-side spec →

Write side

commit_gate

oracle green

A deterministic controller takes a JSON seal proposal and decides COMMIT / NO_COMMIT / FORBIDDEN_BOUNDARY / ESCALATE / REJECT_MALFORMED by structural re-derivation only. It ignores the proposer's self-graded scores. A structurally-valid lie — schema clean, self-graded clean — is still rejected, because boundary-touches are computed from the diff text itself.

python3 oracle.py HARD GATE PASS: commit_gate.py + seal_proposals.json byte-unchanged CANARY PASS: 6/6 attacks rejected with matching decisions, 2/2 healthy committed DELETION MATRIX PASS: disabling boundary-derivation flipped 'structurally_valid_lie_forbidden_in_prose' to COMMIT (guard load-bearing); restored METRIC LIVE: rejection counter fired 0->1 on a fresh forbidden probe ASYMMETRY: 4 attack(s) self-graded every-invariant-clean and were STILL rejected by structural re-derivation (the gate cannot be talked past). COMMIT set = […] COMMIT-GATE: ASYMMETRY PROVEN

Read the write-side spec →

World side

world_witness

v0 green

A WAKN-style receipt trace becomes the external witness. The agent report must mention the rejected action, preserve its rejection code, and make only state claims that match the folded after-world snapshots. Speech is rendered beside receipts, not above them.

python3 oracle.py HARD GATE PASS: world_witness.py + wakn_trace.json byte-unchanged CLEAN TRACE PASS: 3 receipts, 2 accepted, 1 rejected, 4 state claims checked CANARY PASS: 7/7 receipt-coupling attacks killed with expected codes DELETION MATRIX PASS: 7/7 expected guards are load-bearing METRIC LIVE: claim mismatch counter fired 0->1 on a fresh state-lie probe ASYMMETRY: agent narration cannot upgrade, hide, or outvote receipts; 1 rejected-action upgrade killed WORLD-WITNESS: REALITY COUPLING MEASURED

Read the world-side spec →

Reference implementation

Juris compiles authority before a model writes.

The instruments observe three pressure points. Juris is not a fourth one: it is a write-side reference implementation that assigns final authority to every declared value, exposes one bounded model decision, and lets an independent implementation reconstruct the release from receipts.

Typed authority compilation

The model may propose meaning.

Evidence owns source facts. Code owns deterministic derivations. A human owns declared exceptions. The model receives one exact enum, while assembly and release remain outside its authority.

Open Juris →

Authority inspector

Follow every released coordinate.

Inspect the frozen execution graph, authority map, source bindings, admitted controls, killed attacks, and the negative result that keeps v0's claim narrow.

Inspect the artifact →

The model may propose.
Only the controller may commit.

None of the instruments decides what is true. The read side proves a claim has a settleable receipt; the write side guards the custody boundary, not the correctness of the change; the world side checks narration against receipt-bound state, not intelligence. All refuse to certify a cleanliness they never checked — every metric is proven able to fire before any number is trusted.

enter deeper ⟐

EpistemicCustody