LLM document extractors are missing a 50-year-old CS technique
How I verify private-equity management fees, and why “extract-then-apply” is silently wrong for any document corpus with cross-references.
In March 2024, the GP of a private-equity fund signs a side letter with one of its limited partners. One clause caps the management fee rate at 1.25% from the end of the fund’s investment period onward. An unremarkable concession; the LP’s lawyers extract it, the GP’s CFO files it, everyone moves on.
Two years later, in May 2026, the GP issues a fee accommodation: management fee reduced from 2% to 1% for the period from January 2028 through the end of the investment period. Also unremarkable. Both clauses are anchored to the same boundary, “the end of the investment period.” The cap activates at that date; the waiver expires at it. That boundary is a date in the LPA: January 15, 2029.
In June 2028, the GP signs an amendment extending the investment period by 18 months. The new end-of-investment-period date is July 15, 2030.
A human reading those three documents in sequence has to notice: when the boundary moves, every clause anchored to it moves with it. The cap from 2024 now activates 18 months later. The waiver from 2026 now lasts 18 months longer. The dollar impact varies with commitment size, fee basis, and invested capital at the moment the gap opens. On a $10M LP commitment it runs to the low six figures. On a pension or sovereign-wealth commitment ten times that size, it runs into seven figures.
Catching compounding changes like these is exactly the work an LP’s shadow-accounting team exists to do. The largest LPs run teams of dozens of people for it, by hand, in spreadsheets. Smaller LPs (family offices, small endowments) don’t have that headcount, and they quietly lose money to the kinds of dependencies their bigger peers catch by sheer labor.
The naive way to automate this work with an LLM gets it wrong. The reason has nothing to do with the LLM.
What most LLM document systems do today
The default pattern for building an LLM document-extraction system in 2026 is roughly this:
for doc in inbox:
    extracted = llm.extract(doc, schema)
    validated = schema.validate(extracted)
    database.write(validated)
I started this way. So has the doc-AI side of essentially every B2B startup shipping document extraction in the last two years, plus most internal teams at incumbents that bolted an LLM onto an existing pipeline. The code is straightforward, the LLM does the real work, and the schema gives you validation cheaply.
RAG is the same shape with a different surface. Instead of pre-extracting at ingest and writing rows, you index document chunks, retrieve relevant ones at query time, and let an LLM read those chunks to answer the user. The retrieval is smarter; the underlying mental model is identical. Each document is treated as a self-contained unit of meaning. RAG is excellent for “find the relevant passage and summarize it.” It is not a substitute for evaluating clauses as a system, because retrieval is local and clause evaluation is global. A query like “what is the management fee on July 1, 2030?” can pull every fee-related clause from every document, hand them all to the LLM, and still produce a wrong number, because the right answer is the output of executing those clauses against each other in dependency order, not the output of reading them.
For most document corpora, this is correct. Invoices, expense receipts, customer-support emails, KYC onboarding packets, freight bills of lading: each document is independent. The data extracted from invoice #1 is not contingent on what was in invoice #872, and nobody expects it to be. Extract-then-apply maps perfectly to the underlying problem.
For legal documents, it does not.
A side letter is not “data about a fund.” It is a modification to a prior agreement, the LPA. An amendment is a modification to the LPA and to every prior side letter. An MFN (most-favored-nation) election is a modification contingent on a disclosure made binding by a separate confirmation, three documents apart. The clauses inside any one of these documents reference fields that other documents define and other documents change.
Extract-then-apply, applied to that corpus, runs the LLM across every document, gets back a clean list of clauses, applies them in document order, and produces a wrong answer. The wrongness is silent. The output is shaped right and shaped consistently. It is just not arithmetically correct.
This is not an LLM problem. The LLM extracts the clauses fine. The problem is the architectural shape of the system around it.
What you actually have is a fixed-point
The problem is structural. Each clause is a function that reads from a shared state, a fund-parameter timeline (rate, basis, investment-period end, billing cadence, and so on), and writes back to it. Some clauses read fields that other clauses write. Some of those reads sit inside date expressions: a clause’s effective end date can itself be a reference to a field that another clause mutates.
Applying the clauses in document order does not converge. Pass 1 executes clause A using the seed value of fund_investment_end_date. Pass 1 then executes clause B, which mutates fund_investment_end_date. Clause A’s effective end is now stale. Re-executing A picks up the new value. But re-executing A might in turn cause clause C’s date condition to flip, requiring re-execution of C. And so on.
What you actually have is a fixed-point computation. The system has converged when applying all clauses against the current timeline produces no further changes to the timeline. Until then, the answer is provisional.
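Stripped of the domain, the shape is a few lines of code: iterate a state transformer until it stops changing. This is a toy, not truefee's actual loop; the propagation rule inside `step` is invented for illustration.

```python
def fixed_point(step, state, max_iters=10):
    """Apply `step` until the state stops changing, or a guard trips."""
    for _ in range(max_iters):
        new_state = step(state)
        if new_state == state:  # a pass that changes nothing = converged
            return new_state
        state = new_state
    raise RuntimeError("did not converge within max_iters")

# Toy rules: y is seeded to 3; x is defined as y + 1.  x's first
# computation reads a stale y, exactly like the waiver reading a
# stale fund_investment_end_date, and a second pass repairs it.
def step(s):
    return {"y": 3, "x": s["y"] + 1}

result = fixed_point(step, {"x": 0, "y": 0})  # → {"y": 3, "x": 4}
```

Until the loop exits, every value in `state` is provisional, which is the property the rest of the post is built around.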
This is not new. The pattern has been understood in computer science for at least fifty years.
Datalog evaluation has solved exactly this since the 1970s: rules that reference other rules’ outputs, evaluated bottom-up to a fixed point (Bancilhon and Ramakrishnan, 1986, is the canonical reference). Constraint-propagation networks (Mackworth, 1977) are the same shape, applied to AI rather than databases. Build-system DAGs (Make, Bazel, Nix) are the same shape, applied to source files. Reactive frameworks (Vue, Solid, Recoil) are the same shape, applied to UI state. Incremental-computation libraries (Adapton, salsa) are the same shape, applied to compiler workloads. Anyone who has shipped any of those systems already knows what is coming next.
What is surprising is not the technique. What is surprising is that nobody applying LLMs to legal documents in 2026 reaches for it. Document-AI products treat clauses as independent entities to extract. Extracting clauses is the LLM’s job, and it does that well. But once you have the clauses, the next step is not to write them to a database. The next step is to evaluate them as a system to a fixed point.
Typed clauses, shared timelines, a fixed-point loop
The architecture truefee uses to handle this is two parts: clauses as typed instructions, and timelines as the mutable shared state those instructions read and write. The fixed-point loop sits on top.
The typed-instruction shape.
The clause-interpretation pass sends each natural-language clause to an LLM with a strict output schema. The LLM does not return prose, JSON blobs, or “extracted entities.” It returns one or more ClauseInstruction objects:
class ClauseInstruction:
    action: Literal["SET", "ADJUST", "CONSTRAIN", "GATE",
                    "NO_ACTION", "MANUAL_REVIEW"]
    affected_field: str | None               # e.g. "management_fee_rate"
    value_expr: ASTNode | None               # what to write
    effective_date_expr: ASTNode | None      # when it activates
    effective_end_date_expr: ASTNode | None  # when it expires
    condition_ast: ASTNode | None            # gate predicate
    # other typed slots elided
The action is a closed six-element set: SET writes a value, ADJUST writes a delta against the current value, CONSTRAIN registers a CAP or FLOOR, GATE modifies the timing of a transition another clause already wrote, NO_ACTION explicitly asserts no effect, MANUAL_REVIEW surfaces ambiguity to a human. The closed set matters: anything outside it is a category error caught by validation, not silently passed downstream.
affected_field names the field being mutated, drawn from a small fixed registry (management_fee_rate, fund_investment_end_date, management_fee_basis, and so on). The registry is what makes the entire system tractable. Clauses can only touch fields the system has thought about. Anything else gets MANUAL_REVIEW.
value_expr, effective_date_expr, effective_end_date_expr, and condition_ast are recursive ASTNode trees. Leaves are literal (a constant), field_ref (a pointer to another field’s current value), or function_call (a registry-restricted function like ANNIVERSARY or FUND_REALIZATION_PCT). Combinators are comparison, arithmetic, temporal. Nothing else is allowed in the schema, which means nothing else is allowed in the LLM’s output.
The 2026 fee waiver from the opening becomes:
ClauseInstruction(
    action="SET",
    affected_field="management_fee_rate",
    value_expr=literal(0.01),  # 1%
    effective_date_expr=literal(date(2028, 1, 1)),
    effective_end_date_expr=field_ref("fund_investment_end_date"),
)
The field_ref in effective_end_date_expr is the dependency. The waiver does not say “ends on January 15, 2029.” It says “ends on whatever fund_investment_end_date is when this is evaluated.”
The 2028 amendment becomes:
ClauseInstruction(
    action="ADJUST",
    affected_field="fund_investment_end_date",
    value_expr=temporal(ADD_MONTHS,
                        field_ref("fund_investment_end_date"),
                        literal(18)),
)
Note the self-reference: the field being mutated appears inside its own update expression. We will return to this.
The shared state.
Timelines are a dict keyed by field name. Each timeline holds a list of entries. An entry looks roughly like (start_date, value, end_date, source_clause_id, insertion_order). Reads happen via value_at(field, date), which finds the entry whose [start_date, end_date) interval contains the query date. Writes happen via insert_entry(field, ...), which appends a new entry with a fresh insertion_order.
If multiple entries cover the same date, which happens routinely as later clauses modify earlier ones, the entry with the highest insertion_order wins. insertion_order is what lets the same clause re-executed on a later pass overwrite its own prior write deterministically. Without it, repeated execution would pile up entries and ambiguous winners.
Initially the timelines are populated from the LPA seed values. After clauses execute, the timelines reflect every modification.
The execution loop.
The fixed-point iteration is short:
MAX_PASSES = 3  # runaway-runtime guard, not a termination strategy;
                # in practice the scenarios converge in two

for pass_num in range(MAX_PASSES):
    prev_snap = snapshot(timelines)
    prev_cond_dates = resolved_condition_dates()
    for clause in clauses_in_document_order:
        execute(clause, timelines)
    new_snap = snapshot(timelines)
    new_cond_dates = resolved_condition_dates()
    if new_snap == prev_snap and new_cond_dates == prev_cond_dates:
        break
Each pass executes every clause from scratch against the current state of the timelines. execute(clause, timelines) resolves every AST node inside the clause’s date and value expressions. A field_ref("fund_investment_end_date") is evaluated by reading the timeline for that field and returning whatever value lives there now. The result is written back as a new timeline entry tagged with this pass’s insertion_order. Reads always hit the latest state. Writes from earlier passes are overwritten by writes from later passes against the same (field, source_clause_id) key.
Convergence is detected when a pass changes nothing. The loop always runs one final no-op pass to confirm.
Tracing the running example.
Pass 1, document order: 2024 cap, 2026 waiver, 2028 amendment.
The 2024 cap executes first. Its effective_date_expr is a field_ref("fund_investment_end_date"). The timeline for that field, at this moment, holds only the LPA seed entry: January 15, 2029. The cap is registered with active_from = 2029-01-15.
The 2026 waiver executes next. Same field_ref resolution: January 15, 2029. The waiver writes a management_fee_rate entry: [2028-01-01, 2029-01-15) = 1%.
The 2028 amendment executes last. It reads fund_investment_end_date (still January 15, 2029), calls ADD_MONTHS(2029-01-15, 18), gets July 15, 2030, and writes that as the new value for fund_investment_end_date effective from the amendment’s own document date forward.
At the end of pass 1, the timeline contains, in part:
management_fee_rate:
    [2028-01-01, 2029-01-15) → 1%        (from 2026 waiver)
fund_investment_end_date:
    2024-01-15 onward → 2029-01-15       (LPA seed)
    2028-06-15 onward → 2030-07-15       (from 2028 amendment)
constraints[management_fee_rate]:
    CAP 1.25% active_from 2029-01-15     (from 2024 cap)
Internally inconsistent. The waiver’s end date and the cap’s activation are still anchored to January 15, 2029, but the field they referenced has moved to July 15, 2030 from the amendment date forward. The convergence check compares snapshots; they differ. Loop.
Pass 2, same clauses, same order.
The cap re-executes. Resolves field_ref("fund_investment_end_date") against the current timeline. The amendment’s write is now visible: post-amendment, the field reads July 15, 2030. The cap’s prior registration (active_from 2029-01-15) is overwritten with active_from 2030-07-15.
The waiver re-executes. Same field_ref resolution: July 15, 2030. The prior entry [2028-01-01, 2029-01-15) = 1% is overwritten with [2028-01-01, 2030-07-15) = 1%. An additional 18 months of 1% rate. This is where the LP’s fee relief lives.
The amendment re-executes. Its value_expr is ADD_MONTHS(field_ref("fund_investment_end_date"), 18). If we naively re-evaluate it now, we get ADD_MONTHS(2030-07-15, 18) = 2032-01-15 and write that. The investment period would extend by 18 more months on every pass, forever.
This is the self-reference case flagged earlier. The detection is structural: traverse the clause’s value_expr AST and collect every field_ref. If affected_field appears in that set, the clause is self-referential. Re-execution of self-referential ADJUSTs is skipped on passes after the first. The amendment’s pass-1 write (July 15, 2030) stands.
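The structural check is a plain AST walk. The minimal tuple encoding below — `("literal", v)`, `("field_ref", name)`, `("call", fn, args)` — is assumed for illustration, not truefee's actual node classes.

```python
# Assumed minimal AST encoding: ("literal", v), ("field_ref", name),
# or ("call", fn, (child, child, ...)).
def collect_field_refs(node) -> set[str]:
    """Gather every field a value expression reads."""
    tag = node[0]
    if tag == "field_ref":
        return {node[1]}
    if tag == "call":
        refs = set()
        for child in node[2]:
            refs |= collect_field_refs(child)
        return refs
    return set()  # literal: reads nothing

def is_self_referential(affected_field: str, value_expr) -> bool:
    """True when the mutated field appears inside its own update expr."""
    return affected_field in collect_field_refs(value_expr)

# The 2028 amendment: fund_investment_end_date += 18 months.
amendment_expr = ("call", "ADD_MONTHS",
                  (("field_ref", "fund_investment_end_date"),
                   ("literal", 18)))
```

In the loop, a clause flagged this way executes on pass 1 and is skipped on every re-execution pass, which is what pins the amendment's write at July 15, 2030.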
At the end of pass 2 the timeline is internally consistent. The cap is anchored to 2030-07-15. The waiver ends at 2030-07-15. fund_investment_end_date is at 2030-07-15. Pass 3 runs, nothing changes, the snapshot equality check passes, and the loop exits.
Side by side:
                 waiver (1.0%)              cap (1.25%)
                 start        end           start
                 -----------  -----------   -----------
After pass 1:    2028-01-01   2029-01-15    2029-01-15   (stale)
After pass 2:    2028-01-01   2030-07-15    2030-07-15   (correct)
                              ^             ^
                              extended      activates
                              18 months     18 months later
Converged. The fee calculator runs against the final timeline. The LP gets credit for the 18 months of additional fee relief, the cap activates at the right boundary, and the GP-claimed fee can be checked against a number that is now arithmetically correct.
Edge cases
The simple loop above gets the running scenario right. It does not get every scenario right. Four cases that bite a naive implementation:
Transitive cycles. A clause does not have to be syntactically self-referential to create one in principle. Clause X writes to field A while reading field B; clause Y writes to field B while reading field A. Each pass would ping-pong, and truefee’s syntactic self-reference detection (collect field_refs from the clause’s value_expr, check affected_field membership) would not catch it. In practice, PE governing language is consistently acyclic. Lawyers write clauses to be enforceable on receipt, and a clause whose meaning depends on its own downstream output is ambiguous in legal review and rare in the wild. The dependency directions that actually show up (clauses anchored to LPA fields, amendments rewriting those fields, reports populating metric fields) converge in at most a handful of passes. I have not encountered a true cycle in the corpora I have worked with. The MAX_PASSES cap exists as a runaway-runtime guard rather than a termination strategy. If a future corpus produced a real cycle, the right behavior is to detect non-convergence and surface the involved clauses as MANUAL_REVIEW; today the system logs a warning and exits.
GATE conditions whose resolution moves. A GATE clause modifies the timing of a transition another clause already wrote. Its trigger is a condition_ast, a Boolean expression over fund fields and metrics. For example, “the deferred fee reduction takes effect on the earlier of the second anniversary of final closing, or the date the fund’s realization reaches 50%.” That condition resolves to a specific date by evaluating its operands against the current timeline.
Now suppose a later clause mutates one of those operands. The 2nd-anniversary-of-final-closing leg of the disjunction depends on fund_final_closing_date. If the fund had a subsequent close that an amendment captured, that field has moved, and the resolved gate date moves with it. The GATE’s own AST slots haven’t been written to. Its meaning has nonetheless changed.
Tracking only timeline writes would miss this. truefee’s convergence check therefore snapshots two things, not one: the timelines themselves, and the per-condition resolved dates. The break condition is both unchanged together. A pass that produces a stable timeline but a different resolved gate date is not converged.
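A toy illustration of a gate date moving out from under a GATE clause whose own slots never change. The field names and `resolve_gate_date` helper are assumptions for the sketch, not truefee's API.

```python
from datetime import date

def add_months(d: date, n: int) -> date:
    m = d.month - 1 + n
    return d.replace(year=d.year + m // 12, month=m % 12 + 1)

def resolve_gate_date(read_field):
    """'Earlier of the 2nd anniversary of final closing, or the date
    realization reaches 50%' — both legs read *current* timeline state."""
    anniversary = add_months(read_field("fund_final_closing_date"), 24)
    realization_50 = read_field("date_realization_reached_50pct")
    return min(anniversary, realization_50)

# Pass N: final closing per the original LPA.
before = resolve_gate_date(lambda f: {
    "fund_final_closing_date": date(2025, 3, 1),
    "date_realization_reached_50pct": date(2028, 6, 1)}[f])

# Pass N+1: an amendment captured a subsequent close.  The operand
# moved, so the resolved gate date moves, even though nothing wrote
# to the GATE clause's own AST slots.
after = resolve_gate_date(lambda f: {
    "fund_final_closing_date": date(2025, 9, 1),
    "date_realization_reached_50pct": date(2028, 6, 1)}[f])
```

A convergence check that only diffed timeline writes would see `before != after` as "nothing changed"; snapshotting resolved condition dates alongside the timelines is what catches it.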
ADJUST against a moved baseline. ADJUST writes a delta against the current value at its effective date. If a SET writing 2% gets overwritten on a later pass by a SET writing 2.5%, the ADJUST that follows (“reduced by 25 basis points”) needs the new baseline. This works because re-execution is from-scratch every pass: the ADJUST resolves value_at(effective_date) against the current timeline, not against a memoized delta. Implementations that cache the prior pass’s intermediate values get this wrong silently.
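From-scratch re-execution is the property that keeps ADJUST correct; a toy illustration with invented values (the rates and helper below are not from the post's scenario):

```python
def apply_adjust(value_at, effective_date, delta_bps):
    """ADJUST resolves its baseline from the *current* timeline on
    every pass, never from a memoized prior-pass value."""
    return value_at(effective_date) - delta_bps / 10_000

# Pass 1: the SET in force at the effective date says 2%.
pass1 = apply_adjust(lambda d: 0.02, "2028-01-01", 25)   # 1.75%

# Pass 2: a later clause overwrote the SET to 2.5%; re-resolving
# from scratch picks up the new baseline automatically.
pass2 = apply_adjust(lambda d: 0.025, "2028-01-01", 25)  # 2.25%

# A memoized implementation would keep returning 1.75% — silently.
```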
Extracted versus computed fields. A quarterly fund-realization report supplies fund_percentage_realized as extracted data, not as the output of a clause. It enters the timeline once, at extraction, and is never overwritten by the loop. Computed fields, the ones a clause has written, are the only fields whose values can change between passes. Mixing the two in a single timeline requires discipline at insertion time: extracted entries get a sentinel source_clause_id that the loop never replays.
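The insertion-time discipline can be a sentinel check at replay time. The `EXTRACTED` constant and dict-shaped entries here are illustrative, not truefee's representation.

```python
EXTRACTED = "__extracted__"   # sentinel source_clause_id (assumed name)

def replayable(entries):
    """Only clause-written entries are re-executed by the loop;
    extracted data enters the timeline once and is never replayed."""
    return [e for e in entries if e["source_clause_id"] != EXTRACTED]

entries = [
    {"field": "fund_percentage_realized", "source_clause_id": EXTRACTED},
    {"field": "management_fee_rate",      "source_clause_id": "waiver_2026"},
]
# replayable(entries) keeps only the waiver's write.
```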
Beyond PE
The pattern is not about private equity. It is about any document corpus where legal language references other legal language, and where the resolution of those references produces money or compliance.
None of this is unsolved territory. Tax-software vendors, trading-rule engines, and contract-lifecycle systems have applied constraint-propagation-style evaluation to legal text for decades. What is new is the wave of LLM-driven document automation, which has largely been built without picking up the techniques that older domain-specific software solved long ago. The same shape of problem keeps coming back; the same shape of solution is sitting on a shelf.
The shape shows up in:
- Commercial contract management. Master agreements with addenda, side agreements, change orders. An amendment silently shifts the meaning of references in standing addenda. Procurement teams catch this by hand, in spreadsheets, and miss it routinely on long-running enterprise relationships.
- Insurance. A base policy plus endorsements plus riders, where each rider is a modification to terms defined in the base or in earlier riders. The same dependency direction, the same silent wrongness when references are not re-resolved.
- Tax-software rule engines. Statutes reference statutes; revenue rulings reference statutes; new tax bills modify older code, cascading through prior cross-references. Mature tax software handles this. New LLM-based tax tools are rediscovering it.
The pattern is not universal. It does not apply to invoice extraction, KYC parsing, receipt OCR, or any corpus where document meaning is independent. For those corpora, extract-then-apply is the right shape: each document settles its own meaning at the moment it leaves the system that produced it. The cost of running a fixed-point loop over them is wasted complexity, nothing worse. The real damage is in the other direction: extract-then-apply, run over a corpus with cross-document dependencies, produces silently wrong numbers.
PE is the corpus I picked because the dollars are unambiguous, the documents are obtainable, and the math is verifiable end-to-end. The architecture goes wherever the shape goes.
Honest limits
Three things I want a reader to know before believing the post.
Schema validation catches structural errors, not all semantic ones. Pydantic plus a strict closed-set schema catch the structural failures: a clause emitted with the wrong field type, an AST with disallowed node types, an action outside the six-action set. Semantic misinterpretation is the residual. A clause that should be SET sometimes comes back as ADJUST, a date phrase gets parsed wrong, a GATE returns as MANUAL_REVIEW where a careful human would have interpreted it. The AST self-check rules baked into the prompt catch most of these; the rest fall through to MANUAL_REVIEW rather than executing silently. The architecture in this post does not depend on the LLM being perfect. It depends on the LLM being good enough that residual errors are caught at validation, not at the fee calculation.
The test corpus is intentionally synthetic. truefee has been exercised against scenarios designed to surface specific architectural cases: cross-clause dependencies, MFN chains, deferred reductions, fee waivers tied to extracted metrics. The clause language is drawn from publicly available LPA examples and SEC filings, and each scenario is constructed to test the architecture, not corpus depth. Real LP documents are confidential and require a customer relationship to obtain. A production deployment will surface clause shapes V1’s prompt does not yet handle; the MANUAL_REVIEW escape hatch absorbs those for now, and the prompt grows iteratively against a live corpus.
The fee scope is one slice. truefee verifies management fees. It does not compute carried interest, waterfalls, clawback obligations, transaction-fee offsets, or expense allocations. Each of those has the same architectural shape (typed clauses, shared timelines, fixed-point evaluation) but requires data sources beyond the LP’s document inbox. The article’s claims are about management-fee verification, which is the slice I have actually built.
Close
truefee is the working implementation of all of the above. Live demo at truefee.io; the multi_amendment scenario walks through the running example from this post end-to-end. Code at github.com/Bhavya-2k03/pe-doc-intelligence.
If there is a domain where this pattern shows up that I missed, or a counter-example where extract-then-apply is enough for legal text, I want to hear it.
Bhavya Gupta, bhavya.2k03@gmail.com.