How the Form Is Presented to the LLM

This page records why Booker4j presents a form to the form-filling LLM the way it does. We do not hand the model the raw YAML form. Instead the engine transforms the YAML into two purpose-built text representations — a flat schema (injected once) and a compact state (refreshed every turn) — and the model fetches per-field detail on demand.

That format performs well and reads cleanly in practice. This document captures the theory and the published research that led us to it, so the reasoning is not lost. Research claims are cited inline, and the full annotated bibliography lives in References.

Where this is implemented

Schema — FormSchemaRenderer.java → FormEngineImpl.getFormSchema()
State — FormEngineImpl.getFormState()
On-demand detail — FormTools.getFieldInfo(fieldName)
Original design discussion: .cursor/docs/tool-call-for-form-filling.md (the load-bearing rationale is at line 294).

What the LLM actually sees

1. The schema (static, injected once into the system prompt). The whole YAML tree is flattened depth-first (DFS) into a flat field list. Option keys stay inline; full option details, validation rules, and long descriptions are omitted and fetched on demand.

=== FIELD TYPE REFERENCE ===
SELECT  A dropdown with predefined options...
DATE    A date picker field...

=== FORM STRUCTURE ===
Mandatory: serviceType, email, name
- serviceType [SMART_SELECT, required, options: campusMoveOut|moveOutCleaning|windowCleaning] - "What service?"
- email [EMAIL, required, show_when: serviceType = campusMoveOut] - "Email address?"

2. The state (dynamic, refreshed every turn). It contains only the answered fields, the active branch, and the remaining fields.

Answered (2):
  serviceType = "campusMoveOut"
  name = "John Doe"
Active Branch: serviceType → campusMoveOut
Remaining Mandatory (1): email
Remaining Optional (0):
Legally Complete: No

The decisions, and the research behind each

1. Flatten the tree — don't send nested YAML

Decision. The YAML is a nested tree, but we render it to the LLM as a flat list of fields whose branching is expressed by inline show_when: field = option conditions.

Why. A flat list with explicit conditions is semantically equivalent to the nested tree but requires the model to hold far less structure in working memory to understand which field depends on what. Nested structures also carry redundant structural tokens and push the model to track depth.

Evidence.

Practitioner benchmarks and structured-prompt comparisons consistently find LLMs comprehend flat key-value / conditional structures better than deeply nested ones, and that deep nesting raises per-token overhead and error rates — so the standard guidance is to "keep schemas as flat as practical." [Prompt-format comparisons] [Structured-prompt eval]
BAML's design is built on the same observation: declare structure compactly, let the model reason in flatter terms. [BAML]

Honest framing

The flat-vs-nested guidance is strongly supported by practitioner benchmarks and a preprint comparative evaluation rather than a single definitive peer-reviewed result. We treat it as a well-evidenced heuristic, not a theorem. See References.
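
As an illustration of the flattening step, here is a minimal sketch that walks a nested form depth-first and emits a flat list in which each nested field carries a show_when condition. The names Node, FlatField, and flatten are hypothetical and are not taken from FormSchemaRenderer.java. The sketch also assumes a child only records the condition of its immediate parent.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: Node, FlatField and flatten are hypothetical names,
// not Booker4j's actual classes.
final class FlattenSketch {

    /** A nested form node: each option key may reveal a list of child fields. */
    record Node(String name, Map<String, List<Node>> childrenByOption) {
        static Node leaf(String name) {
            return new Node(name, Map.of());
        }
    }

    /** One flat schema entry; showWhen is null for root-level fields. */
    record FlatField(String name, String showWhen) {}

    /** Depth-first walk that replaces nesting with an inline show_when condition. */
    static List<FlatField> flatten(List<Node> roots) {
        List<FlatField> out = new ArrayList<>();
        for (Node root : roots) {
            visit(root, null, out);
        }
        return out;
    }

    private static void visit(Node node, String showWhen, List<FlatField> out) {
        out.add(new FlatField(node.name(), showWhen));
        node.childrenByOption().forEach((option, children) -> {
            // Assumption: a child records only its immediate parent condition.
            String condition = node.name() + " = " + option;
            for (Node child : children) {
                visit(child, condition, out);
            }
        });
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the option order stable when printing.
        Map<String, List<Node>> branches = new LinkedHashMap<>();
        branches.put("campusMoveOut", List.of(Node.leaf("email")));
        branches.put("windowCleaning", List.of());
        List<Node> form = List.of(new Node("serviceType", branches), Node.leaf("name"));
        for (FlatField f : flatten(form)) {
            String suffix = f.showWhen() == null ? "" : " [show_when: " + f.showWhen() + "]";
            System.out.println(f.name() + suffix);
        }
    }
}
```

The nesting disappears from the output, but nothing is lost: the condition on each entry is enough to reconstruct which field depends on which option.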

2. Split static schema from per-turn state

Decision. The form structure (schema) goes into the system prompt once. Only what's been answered and what remains (state) is refreshed on every turn, and it is kept tiny.

Why. The structure never changes during a session, so re-sending it each turn would waste tokens and — more importantly — bury the decision-relevant "what to do next" information in a wall of static text. Keeping the live state small and fresh keeps the model's attention on it.

Evidence — this is the strongest, peer-reviewed pillar.

Lost in the Middle (Liu et al., TACL 2023). LLMs use long contexts unevenly: accuracy is highest when relevant information is at the very start or end of the context and degrades sharply when it sits in the middle — even for explicitly long-context models. [Liu et al.]
Found in the Middle (arXiv 2406.16008) traces this to a U-shaped positional-attention bias (a RoPE side-effect) where mid-context tokens land in a low-attention zone. [Found in the Middle]
Implication we act on: keep the per-turn state compact and position it as a distinct, fresh block so the "current task" never gets lost in the middle of a large static prompt.
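
To make the per-turn block concrete, the sketch below renders a state snapshot in the same layout as the example in "What the LLM actually sees". It is not the code behind FormEngineImpl.getFormState(). The class and method names are hypothetical, and treating "Legally Complete" as "no mandatory fields remain" is an assumption made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: StateSketch and renderState are hypothetical names.
final class StateSketch {

    /** Renders the compact per-turn state block from already-computed inputs. */
    static String renderState(Map<String, String> answered, String activeBranch,
                              List<String> remainingMandatory, List<String> remainingOptional) {
        StringBuilder sb = new StringBuilder();
        sb.append("Answered (").append(answered.size()).append("):\n");
        answered.forEach((name, value) ->
                sb.append("  ").append(name).append(" = \"").append(value).append("\"\n"));
        sb.append("Active Branch: ").append(activeBranch).append('\n');
        sb.append(countLine("Remaining Mandatory", remainingMandatory)).append('\n');
        sb.append(countLine("Remaining Optional", remainingOptional)).append('\n');
        // Assumption: the form is complete once no mandatory field remains.
        sb.append("Legally Complete: ").append(remainingMandatory.isEmpty() ? "Yes" : "No");
        return sb.toString();
    }

    /** Formats "Label (n): a, b", with nothing after the colon when the list is empty. */
    private static String countLine(String label, List<String> names) {
        String head = label + " (" + names.size() + "):";
        return names.isEmpty() ? head : head + " " + String.join(", ", names);
    }

    public static void main(String[] args) {
        Map<String, String> answered = new LinkedHashMap<>();
        answered.put("serviceType", "campusMoveOut");
        answered.put("name", "John Doe");
        System.out.println(renderState(answered, "serviceType \u2192 campusMoveOut",
                List.of("email"), List.of()));
    }
}
```

Because the block is rebuilt from scratch every turn, it stays a handful of lines no matter how large the static schema is.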

3. Load field detail on demand via `getFieldInfo`

Decision. The schema lines carry only what's needed to reason about ordering and visibility (name, type, required, show_when, option keys). Full option lists, validation rules, and descriptions are fetched only when the model actually touches a field, via the getFieldInfo tool.

Why. Pre-loading every field's full detail bloats context for fields the model may never reach (branches not taken). Just-in-time retrieval keeps the injected schema for even a 20-field form to about 40–50 lines.

Evidence.

This is exactly the just-in-time / lazy-loading pattern Anthropic later shipped as a first-class capability (Tool Search Tool, defer_loading) precisely because loading everything upfront burns context. Their guidance: agents should discover and load only what's relevant to the current step. [Anthropic advanced tool use] Our getFieldInfo-on-demand design applies the same principle to form fields instead of tools.
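
The on-demand lookup can be sketched as a small registry that holds the heavy per-field detail outside the prompt and returns it only when asked. This is a hypothetical stand-in for FormTools.getFieldInfo(fieldName): the FieldDetail record, the register method, the output layout, and the sample data are all assumptions, not the real tool contract.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: a hypothetical stand-in for the real getFieldInfo tool.
final class FieldInfoSketch {

    /** The heavy detail that is kept out of the static schema. */
    record FieldDetail(String description, List<String> optionLabels, String validationRule) {}

    private final Map<String, FieldDetail> details = new LinkedHashMap<>();

    void register(String fieldName, FieldDetail detail) {
        details.put(fieldName, detail);
    }

    /** Returns the full detail for one field, or a short message for unknown names. */
    String getFieldInfo(String fieldName) {
        FieldDetail d = details.get(fieldName);
        if (d == null) {
            return "Unknown field: " + fieldName;
        }
        return fieldName + "\n"
                + "  description: " + d.description() + "\n"
                + "  options: " + String.join(" | ", d.optionLabels()) + "\n"
                + "  validation: " + d.validationRule();
    }

    public static void main(String[] args) {
        FieldInfoSketch tools = new FieldInfoSketch();
        // Sample data invented for the example.
        tools.register("email", new FieldDetail(
                "Contact address for the booking confirmation", List.of(), "must contain @"));
        System.out.println(tools.getFieldInfo("email"));
        System.out.println(tools.getFieldInfo("phone"));
    }
}
```

Only fields the model actually asks about cost any context; fields on branches that are never taken cost nothing beyond their one schema line.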

4. A compact DSL line format — not JSON

Decision. Each field is one line: - field [TYPE, required, show_when: …, options: a|b] - "prompt". We deliberately did not use JSON Schema.

Why. A terse DSL conveys the same structure with a fraction of the structural tokens (braces, quotes, repeated keys), and reads more naturally to the model.

Evidence.

BAML reports that its TypeScript-style type syntax cuts schema token overhead by ~50–80% versus JSON Schema (one example: ~300 tokens saved per call) while improving, not hurting, output quality. [BAML] We treat these as vendor-reported figures, but the direction matches the general token-efficiency literature.
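
For illustration, the one-line format described above can be produced by a renderer like the following. The Field record and renderLine are hypothetical names rather than the actual FormSchemaRenderer API. The attribute order (type, required, show_when, options) simply follows the example lines shown earlier.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: Field and renderLine are hypothetical names.
final class LineFormatSketch {

    /** showWhen may be null; optionKeys may be empty. */
    record Field(String name, String type, boolean required, String showWhen,
                 List<String> optionKeys, String prompt) {}

    /** Renders: - name [TYPE, required, show_when: ..., options: a|b] - "prompt" */
    static String renderLine(Field f) {
        List<String> attrs = new ArrayList<>();
        attrs.add(f.type());
        if (f.required()) {
            attrs.add("required");
        }
        if (f.showWhen() != null) {
            attrs.add("show_when: " + f.showWhen());
        }
        if (!f.optionKeys().isEmpty()) {
            attrs.add("options: " + String.join("|", f.optionKeys()));
        }
        return "- " + f.name() + " [" + String.join(", ", attrs) + "] - \"" + f.prompt() + "\"";
    }

    public static void main(String[] args) {
        System.out.println(renderLine(new Field("serviceType", "SMART_SELECT", true, null,
                List.of("campusMoveOut", "moveOutCleaning", "windowCleaning"), "What service?")));
        System.out.println(renderLine(new Field("email", "EMAIL", true,
                "serviceType = campusMoveOut", List.of(), "Email address?")));
    }
}
```

Absent attributes are left out entirely instead of being written as empty keys, which is where most of the savings over a JSON rendering come from.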

5. Determinism: stack-based active path, exclusive branches, predictable progress

Decision. State is computed by walking only the answered branches (walkActivePath). Once a branch option is chosen, sibling branches are never visited ("exclusive branch selection"), and progress is counted over root-level fields only.

Why. This makes the representation unambiguous: given the stack of answered branch choices, the remaining fields are always computable, with no "what-if" reasoning for the model to do. Counting only root-level fields keeps progress predictable regardless of which branch was taken ("Question 2 of 4" stays stable).

Evidence / rationale. This is primarily an engineering decision in service of determinism rather than a single cited research finding, but it aligns with tool-use best practice: constrain the model's decision space and remove ambiguity to reduce errors. [Anthropic advanced tool use] The serialization constraint (Redis-friendly, no circular references) also forced a flatter, lighter snapshot model, which reinforced the same goal. (See .cursor/docs/form-yml-documentation/phase3-traversal-implementation-discussion.md and n-level-nesting.md.)
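
The traversal rule can be sketched as follows. This is not the real walkActivePath: the sketch uses plain recursion instead of an explicit stack, the Node type and the remaining and progress helpers are hypothetical, and the "Question N of M" wording is assumed from the example above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: recursion stands in for the explicit stack,
// and all names here are hypothetical.
final class ActivePathSketch {

    record Node(String name, Map<String, List<Node>> childrenByOption) {
        static Node leaf(String name) {
            return new Node(name, Map.of());
        }
    }

    /** Visits root fields plus the children of chosen options; sibling branches are skipped. */
    static List<String> walkActivePath(List<Node> roots, Map<String, String> answers) {
        List<String> active = new ArrayList<>();
        for (Node root : roots) {
            visit(root, answers, active);
        }
        return active;
    }

    private static void visit(Node node, Map<String, String> answers, List<String> active) {
        active.add(node.name());
        String chosen = answers.get(node.name());
        if (chosen == null) {
            return; // unanswered: no branch is open yet
        }
        for (Node child : node.childrenByOption().getOrDefault(chosen, List.of())) {
            visit(child, answers, active);
        }
    }

    /** Active fields that have no answer yet. */
    static List<String> remaining(List<Node> roots, Map<String, String> answers) {
        List<String> out = new ArrayList<>(walkActivePath(roots, answers));
        out.removeAll(answers.keySet());
        return out;
    }

    /** Progress counted over root-level fields only, so the total never changes. */
    static String progress(List<Node> roots, Map<String, String> answers) {
        int answeredRoots = 0;
        for (Node root : roots) {
            if (answers.containsKey(root.name())) {
                answeredRoots++;
            }
        }
        return "Question " + Math.min(answeredRoots + 1, roots.size()) + " of " + roots.size();
    }

    public static void main(String[] args) {
        Node serviceType = new Node("serviceType", Map.of(
                "campusMoveOut", List.of(Node.leaf("email")),
                "windowCleaning", List.of(Node.leaf("windowCount"))));
        List<Node> roots = List.of(serviceType, Node.leaf("name"));
        Map<String, String> answers = Map.of("serviceType", "campusMoveOut");
        System.out.println(walkActivePath(roots, answers));
        System.out.println(remaining(roots, answers));
        System.out.println(progress(roots, answers));
    }
}
```

Given the same answers, the walk always yields the same active and remaining fields, so the model never has to reason about branches that were not chosen.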

6. Supporting principles

LLM-generated replies, not hardcoded templates. Final user-facing text is generated by the model with full runtime context (history, validation hints, field type) rather than concatenated from templates, so responses stay natural and handle edge cases gracefully. (See .cursor/docs/llm-flow-response-plan.md.)
Translation-first. Every user-facing string comes from translation files, never hardcoded — clean data/logic separation and locale-switching without code changes. (See .cursor/docs/rules-plan/prompt-writing-rules.md.)

One-sentence synthesis

We converged on "flat schema once, minimal active-branch state every turn, detail on demand." That triangulates three independent research currents: flat beats nested for comprehension (practitioner benchmarks + BAML), lost-in-the-middle attention bias (Liu et al., TACL 2023 — keep live state small and fresh), and just-in-time context loading (Anthropic's deferred tool loading). Anthropic later shipping defer_loading as a product feature is good after-the-fact validation that the getFieldInfo lazy-loading instinct was right.

➡️ Full annotated citations: References.
