References

The full, annotated bibliography behind "How the Form Is Presented to the LLM" and "Confirmation, Profiles & Agent Architecture". Entries are grouped by the design decision each source supports, and the rationale pages link to the section anchors.

How to read this

Each entry notes its source type — peer-reviewed, preprint, vendor/engineering blog, or practitioner — so future readers know how much weight to give it. The strongest pillar (positional attention bias) is peer-reviewed; the flat-vs-nested and token-efficiency claims rest on practitioner benchmarks and vendor-reported figures.

Lost in the middle / positional attention bias

Supports decision 2 — splitting static schema from a small, fresh per-turn state.

Liu, N. F., Lin, K., Hewitt, J., Paranjape, A., Bevilacqua, M., Petroni, F., & Liang, P. (2023). Lost in the Middle: How Language Models Use Long Contexts. TACL. — Peer-reviewed (Transactions of the ACL). The canonical finding: model accuracy is highest when relevant information is at the start or end of the context and degrades significantly in the middle, even for long-context models.
- Paper: https://arxiv.org/abs/2307.03172
- Code/data: https://github.com/nelson-liu/lost-in-the-middle
Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization (2024). arXiv 2406.16008. — Preprint. Attributes the effect to a U-shaped positional-attention bias (a RoPE side-effect) and proposes a calibration fix.
- https://arxiv.org/abs/2406.16008
"Lost in the Middle LLM: The U-Shaped Attention Problem Explained" — Morph. — Engineering blog. Accessible explanation of the U-shaped bias and its RoPE root cause.
- https://www.morphllm.com/lost-in-the-middle-llm

Flat vs. nested structure for LLM comprehension

Supports decision 1 — flattening the YAML tree into a flat field list with show_when conditions.

"Prompt Engineering for Structured Data: A Comparative Evaluation of Styles and LLM Performance" (2025). Preprints.org. — Preprint comparative study. Compares structured prompt styles and their effect on LLM performance.
- https://www.preprints.org/manuscript/202506.1937
"Best Structured Prompt Formats for LLMs, Ranked" — MightyBot. — Practitioner benchmark. Finds that flat key-value formats reduce cognitive load vs. complex nested structures, and flags the token overhead of deep nesting.
- https://mightybot.ai/blog/best-structured-prompt-formats-for-llms/
"Structuring Data for LLMs: Why Your Schema Matters More Than Ever" — Jasmine Directory. — Practitioner. On token waste from repeated/nested structure.
- https://www.jasminedirectory.com/blog/structuring-data-for-llms-why-your-schema-matters-more-than-ever/
"LLMStructBench: Benchmarking Large Language Model Structured Data Extraction." arXiv. — Preprint benchmark. Structured-data extraction performance across formats.
- https://arxiv.org/html/2602.14743v1

BAML — schema-aligned parsing & token efficiency

Supports decisions 1 and 4: flat structure, and the compact DSL line format instead of JSON Schema.

"Every Way To Get Structured Output From LLMs" — BoundaryML (BAML) Blog. — Vendor/engineering blog. Explains schema-aligned parsing and why concise type declarations beat verbose JSON Schema in prompts.
- https://boundaryml.com/blog/structured-output-from-llms
"BAML vs POML vs YAML vs JSON for LLM Prompts" — Augment Code. — Vendor/engineering blog. Reports BAML's TypeScript-style types cut schema overhead by ~50–80% (≈300 tokens saved in one example) vs. JSON. Treat figures as vendor-reported.
- https://www.augmentcode.com/learn/baml-vs-poml-vs-yaml-vs-json-for-llm-prompts

Just-in-time / lazy context loading

Supports decision 3 (load field detail on demand via getFieldInfo) and decision 5 (constrain the decision space).

"Introducing advanced tool use on the Claude Developer Platform" — Anthropic. — Vendor/engineering (primary). Tool Search Tool and defer_loading: load only the tools/definitions relevant to the current step instead of everything upfront, to save context. Directly parallels our on-demand getFieldInfo.
- https://www.anthropic.com/engineering/advanced-tool-use
"Tool use with Claude" — Anthropic API docs. — Vendor docs. Reference for tool-calling mechanics and best practices (reduce ambiguity, examples for complex structures).
- https://docs.anthropic.com/claude/docs/tool-use

Explicit vs. implicit confirmation

Supports the confirmation decision — choosing confirmation style by stakes; verified (explicit) vs. express (implicit) profiles.

"Implicit and Explicit Confirmations" — Hanakano. — Practitioner. Defines the two confirmation types and when each fits; explicit for severe-consequence actions, implicit folded into the result for natural flow.
- https://www.hanakano.com/posts/confirmation-types/
"5 Confirmations for Voice User Interfaces" — Voice First (Medium). — Practitioner. Catalogue of confirmation strategies (explicit, implicit, generic, visual, none) and where each is appropriate.
- https://voicefirstai.medium.com/5-confirmations-for-voice-user-interfaces-e9c9fd03c764
"Exploring Confirmation Strategies for Voice Interaction in Multi-Tasking Scenario" (2025). International Journal of Social Robotics, Springer. — Peer-reviewed. Finds users prefer explicit confirmation for high-purposiveness tasks (precise completion) and implicit for low-purposiveness tasks (efficiency).
- https://link.springer.com/article/10.1007/s12369-025-01302-w
"Tips on designing conversations for voice interfaces" — UX Collective. — Practitioner. Confirmation-flow guidance; unnecessary confirmations disrupt flow.
- https://uxdesign.cc/tips-on-designing-conversations-for-voice-interfaces-d4084178cfd2
"10 examples of perfect chatbot conversation flows" — Jotform. — Practitioner. Typical booking flow with a confirmation-summary step; baseline for what we deliberately collapse in express.
- https://www.jotform.com/ai/agents/chatbot-conversation-flow/
"AI Chatbot Appointment Booking" — Oscar Chat. — Vendor/practitioner. Natural-language booking parsing and confirmation-message patterns.
- https://www.oscarchat.ai/blog/ai-chatbot-appointment-booking-website/

Submit/reset metaphor in conversational interfaces

Supports making commit a deterministic, in-turn event rather than an LLM "press submit" decision or a timeout job.

"Bridging UI Design and Chatbot Interactions: Applying Form-Based Principles to Conversational Agents" (2025). arXiv 2507.01862. — Preprint. Notes that GUIs give the backend explicit Submit/Reset signals, while chat lacks them — context can shift ambiguously when the user changes subject without a clear prompt.
- https://arxiv.org/pdf/2507.01862

Playbooks, generative vs. deterministic, hybrid agents

Supports the "named playbook/profile as unit of behavior" decision and the declarative/deterministic/generative layering.

"Playbooks" — Dialogflow CX (Google Cloud) docs. — Vendor (primary). Playbooks define agents with natural-language goals + instructions + examples + parameters instead of flows/intents/transitions; you route between many.
"Generative versus deterministic" — Dialogflow CX docs. — Vendor (primary). When to use deterministic flows vs. generative features.
- https://cloud.google.com/dialogflow/cx/docs/generative-deterministic
"Develop Hybrid Agents With Dialogflow CX" — TEKsystems. — Practitioner/engineering. Visual flow builder for deterministic control + playbooks for generative segments in one agent.
- https://www.teksystems.com/en/insights/article/hybrid-conversational-agents-part-2

Rasa CALM — flows, process-calling, "LLM interprets, logic decides"

Supports the three-layer separation (declarative / deterministic / generative) and reusable repair patterns.

"Rasa CALM — High-Trust Conversational AI Without Hallucinations" — Rasa. — Vendor (primary). CALM = Conversational AI with Language Models: YAML flows define the business process; the LLM interprets intent, the logic decides next steps.
- https://rasa.com/calm
"Conversation Patterns" — Rasa Documentation. — Vendor (primary). Reusable system flows for non-linear interactions and conversation repair.
- https://rasa.com/docs/learn/concepts/conversation-patterns/
"Rasa 2025 Fully Explained — the CALM Revolution" — Communeify. — Practitioner. Flows replacing Stories/Rules; the process-calling pattern (LLM predicts and collaborates with a stateful process).
- https://www.communeify.com/en/blog/what-is-rasa/
"Designing Natural and Engaging Conversations" — Rasa Documentation. — Vendor (primary). Conversation-design best practices.
- https://rasa.com/docs/learn/best-practices/conversation-design/

Constraint decay & declarative vs. imperative agent layers

Supports the "named profiles, not boolean flags" decision — avoiding combinatorial config fragility.

"Constraint decay: The Fragility of LLM Agents in Backend Code Generation." arXiv 2605.06445. — Preprint. As the density of non-functional constraints rises, agent performance declines — the named fragility behind config/flag accumulation.
- https://arxiv.org/html/2605.06445
"Towards a Declarative Agentic Layer for Intelligent Agents in MCP-Based Server Ecosystems." arXiv 2601.17435. — Preprint. Covers declarative behavior specs vs. fragile imperative workflows, and constraining behavior to a verifiable operational space; reports large dev-time reductions and deployment-velocity gains.
- https://arxiv.org/pdf/2601.17435
"A Declarative Language for Building And Orchestrating LLM-Powered Agent Workflows." arXiv 2512.19769. — Preprint. DSL for agent workflows expressed in far fewer lines than imperative code.
- https://arxiv.org/pdf/2512.19769
"Formally Specifying the High-Level Behavior of LLM-Based Agents." arXiv 2310.08535. — Preprint. High-level declarative specification of agent behavior, decoupled from enforcement.
- https://arxiv.org/pdf/2310.08535
"LLM Agents as Catalysts for Resilient DFT: An Orchestration-Based Framework Beyond Brittle Scripts." Applied Sciences (MDPI). — Peer-reviewed. Orchestration-based agent framework as an alternative to brittle imperative scripts.
- https://www.mdpi.com/2076-3417/15/21/11390

Internal design records

The first-party discussion that produced these decisions lives in the booker4j repo under .cursor/docs/:

| Document | Covers |
| --- | --- |
| `tool-call-for-form-filling.md` | Master design doc; flat-schema rationale (line 294), schema vs. state split, `getFieldInfo` lazy loading, system-prompt layering |
| `form-yml-documentation/phase3-traversal-implementation-discussion.md` | Stack-based traversal, exclusive branch selection, predictable progress |
| `form-yml-documentation/n-level-nesting.md` | Memory/perf characteristics of stack-based state |
| `implementation-before-form-traversal.md` | Redis-serializable state model constraint |
| `llm-flow-response-plan.md` | LLM-generated replies vs. hardcoded templates |
| `rules-plan/prompt-writing-rules.md` | Translation-first, localization rules |
