References
The full, annotated bibliography behind "How the Form Is Presented to the LLM" and "Confirmation, Profiles & Agent Architecture". Entries are grouped by the design decision each source supports. Anchors are linked from the rationale pages.
Each entry notes its source type — peer-reviewed, preprint, vendor/engineering blog, or practitioner — so future readers know how much weight to give it. The strongest pillar (positional attention bias) is peer-reviewed; the flat-vs-nested and token-efficiency claims rest on practitioner benchmarks and vendor-reported figures.
Lost in the middle / positional attention bias
Supports decision 2 — splitting static schema from a small, fresh per-turn state.
- Liu, N. F., Lin, K., Hewitt, J., Paranjape, A., Bevilacqua, M., Petroni, F., & Liang, P. (2023). Lost in the Middle: How Language Models Use Long Contexts. TACL. — Peer-reviewed (Transactions of the ACL). The canonical finding: model accuracy is highest when relevant information is at the start or end of the context and degrades significantly in the middle, even for long-context models.
- Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization (2024). arXiv 2406.16008. — Preprint. Attributes the effect to a U-shaped positional-attention bias (a RoPE side-effect) and proposes a calibration fix.
- "Lost in the Middle LLM: The U-Shaped Attention Problem Explained" — Morph. — Engineering blog. Accessible explanation of the U-shaped bias and its RoPE root cause.
Flat vs. nested structure for LLM comprehension
Supports decision 1 — flattening the YAML tree into a flat field list with show_when conditions.
- "Prompt Engineering for Structured Data: A Comparative Evaluation of Styles and LLM Performance" (2025). Preprints.org. — Preprint comparative study. Compares structured prompt styles and their effect on LLM performance.
- "Best Structured Prompt Formats for LLMs, Ranked" — MightyBot. — Practitioner benchmark. Finds flat key-value formats reduce cognitive load vs. complex nested structures; flags the token overhead of deep nesting.
- "Structuring Data for LLMs: Why Your Schema Matters More Than Ever" — Jasmine Directory. — Practitioner. On token waste from repeated/nested structure.
- "LLMStructBench: Benchmarking Large Language Model Structured Data Extraction." arXiv. — Preprint benchmark. Structured-data extraction performance across formats.
BAML — schema-aligned parsing & token efficiency
Supports decisions 1, 4 — the compact DSL line format instead of JSON Schema, and flat structure.
- "Every Way To Get Structured Output From LLMs" — BoundaryML (BAML) Blog. — Vendor/engineering blog. Explains schema-aligned parsing and why concise type declarations beat verbose JSON Schema in prompts.
- "BAML vs POML vs YAML vs JSON for LLM Prompts" — Augment Code. — Vendor/engineering blog. Reports BAML's TypeScript-style types cut schema overhead by ~50–80% (≈300 tokens saved in one example) vs. JSON. Treat figures as vendor-reported.
Just-in-time / lazy context loading
Supports decision 3 (load field detail on demand via getFieldInfo) and decision 5 (constrain the decision space).
- "Introducing advanced tool use on the Claude Developer Platform" — Anthropic. — Vendor/engineering (primary). Tool Search Tool and defer_loading: load only the tools/definitions relevant to the current step instead of everything upfront, to save context. Directly parallels our on-demand getFieldInfo.
- "Tool use with Claude" — Anthropic API docs. — Vendor docs. Reference for tool-calling mechanics and best practices (reduce ambiguity, examples for complex structures).
Explicit vs. implicit confirmation
Supports the confirmation decision — choosing confirmation style by stakes; verified (explicit) vs. express (implicit) profiles.
- "Implicit and Explicit Confirmations" — Hanakano. — Practitioner. Defines the two confirmation types and when each fits; explicit for severe-consequence actions, implicit folded into the result for natural flow.
- "5 Confirmations for Voice User Interfaces" — Voice First (Medium). — Practitioner. Catalogue of confirmation strategies (explicit, implicit, generic, visual, none) and where each is appropriate.
- "Exploring Confirmation Strategies for Voice Interaction in Multi-Tasking Scenario" (2025). International Journal of Social Robotics, Springer. — Peer-reviewed. Finds users prefer explicit confirmation for high-purposiveness tasks (precise completion) and implicit for low-purposiveness tasks (efficiency).
- "Tips on designing conversations for voice interfaces" — UX Collective. — Practitioner. Confirmation-flow guidance; unnecessary confirmations disrupt flow.
- "10 examples of perfect chatbot conversation flows" — Jotform. — Practitioner. Typical booking flow with a confirmation-summary step; baseline for what we deliberately collapse in express.
- "AI Chatbot Appointment Booking" — Oscar Chat. — Vendor/practitioner. Natural-language booking parsing and confirmation-message patterns.
Submit/reset metaphor in conversational interfaces
Supports making commit a deterministic, in-turn event rather than an LLM "press submit" decision or a timeout job.
- "Bridging UI Design and Chatbot Interactions: Applying Form-Based Principles to Conversational Agents" (2025). arXiv 2507.01862. — Preprint. Notes that GUIs give the backend explicit Submit/Reset signals, while chat lacks them — context can shift ambiguously when the user changes subject without a clear prompt.
Playbooks, generative vs. deterministic, hybrid agents
Supports the "named playbook/profile as unit of behavior" decision and the declarative/deterministic/generative layering.
- "Playbooks" — Dialogflow CX (Google Cloud) docs. — Vendor (primary). Playbooks define agents with natural-language goals + instructions + examples + parameters instead of flows/intents/transitions; you route between many.
- "Generative versus deterministic" — Dialogflow CX docs. — Vendor (primary). When to use deterministic flows vs. generative features.
- "Develop Hybrid Agents With Dialogflow CX" — TEKsystems. — Practitioner/engineering. Visual flow builder for deterministic control + playbooks for generative segments in one agent.
Rasa CALM — flows, process-calling, "LLM interprets, logic decides"
Supports the three-layer separation (declarative / deterministic / generative) and reusable repair patterns.
- "Rasa CALM — High-Trust Conversational AI Without Hallucinations" — Rasa. — Vendor (primary). CALM = Conversational AI with Language Models: YAML flows define the business process; the LLM interprets intent, the logic decides next steps.
- "Conversation Patterns" — Rasa Documentation. — Vendor (primary). Reusable system flows for non-linear interactions and conversation repair.
- "Rasa 2025 Fully Explained — the CALM Revolution" — Communeify. — Practitioner. Flows replacing Stories/Rules; the process-calling pattern (LLM predicts and collaborates with a stateful process).
- "Designing Natural and Engaging Conversations" — Rasa Documentation. — Vendor (primary). Conversation-design best practices.
Constraint decay & declarative vs. imperative agent layers
Supports the "named profiles, not boolean flags" decision — avoiding combinatorial config fragility.
- "Constraint decay: The Fragility of LLM Agents in Backend Code Generation." arXiv 2605.06445. — Preprint. As the density of non-functional constraints rises, agent performance declines — the named fragility behind config/flag accumulation.
- "Towards a Declarative Agentic Layer for Intelligent Agents in MCP-Based Server Ecosystems." arXiv 2601.17435. — Preprint. Contrasts declarative behavior specs with fragile imperative workflows; reports large dev-time reductions and deployment-velocity gains, and constrains behavior to a verifiable operational space.
- "A Declarative Language for Building And Orchestrating LLM-Powered Agent Workflows." arXiv 2512.19769. — Preprint. DSL for agent workflows expressed in far fewer lines than imperative code.
- "Formally Specifying the High-Level Behavior of LLM-Based Agents." arXiv 2310.08535. — Preprint. High-level declarative specification of agent behavior, decoupled from enforcement.
- "LLM Agents as Catalysts for Resilient DFT: An Orchestration-Based Framework Beyond Brittle Scripts." Applied Sciences (MDPI). — Peer-reviewed. Orchestration-based agent framework as an alternative to brittle imperative scripts.
Internal design records
The first-party discussion that produced these decisions lives in the booker4j repo under .cursor/docs/:
| Document | Covers |
|---|---|
| tool-call-for-form-filling.md | Master design doc; flat-schema rationale (line 294), schema vs. state split, getFieldInfo lazy loading, system-prompt layering |
| form-yml-documentation/phase3-traversal-implementation-discussion.md | Stack-based traversal, exclusive branch selection, predictable progress |
| form-yml-documentation/n-level-nesting.md | Memory/perf characteristics of stack-based state |
| implementation-before-form-traversal.md | Redis-serializable state model constraint |
| llm-flow-response-plan.md | LLM-generated replies vs. hardcoded templates |
| rules-plan/prompt-writing-rules.md | Translation-first, localization rules |