12-Factor Agents: Patterns of reliable LLM applications — Dex Horthy, HumanLayer

Introduction & Motivation 00:01

  • The speaker discusses the experience of building agents, emphasizing the ease of reaching an initial 70-80% solution using libraries, but noting how hard it is to push past that 70-80% quality bar
  • Highlights that not all problems are well-suited for agents—sometimes a simple script is faster and more effective
  • Points out that many production "agents" are not truly agentic; rather, they're modular software with certain best practices

Patterns and the "12-Factor Agents" Approach 02:01

  • After talking to over 100 builders, the speaker identified recurring patterns for successful LLM-based applications
  • These patterns are codified in the "12-factor agent" approach, available as a GitHub repo with significant community interest and contribution
  • The approach is positioned as a set of modular, wishlist-like best practices, rather than an anti-framework manifesto

Rethinking Agent Design with Software Engineering Principles 03:27

  • Encourages application of standard software engineering principles, comparing the shift to how Heroku defined cloud-native practices years ago
  • The talk aims to help developers apply modular enhancements to increase agent reliability and flexibility

Factor Highlights: Key Patterns for Reliable Agents 03:47

  • The "magical" capability of LLM agents is their ability to convert flexible natural language into structured JSON
  • Cautions against fetishizing "tool use" as a magic abstraction; at its core it is just structured model output ("tool use is harmful" only in the sense that treating it as magic overcomplicates the abstraction)
  • Advocates modular design where deterministic code, not magical loops, drives the agent’s operation
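The "magic" step described above can be sketched in a few lines. This is an illustrative example, not code from the talk: the model call is stubbed out, and the schema (`intent`, `amount_usd`, `customer_email`) is invented for the sketch. The point is that the LLM's only job is to emit JSON matching a schema we define; everything after that is plain code.

```python
import json

def fake_llm(prompt: str) -> str:
    # A real call would go to an LLM API; this stub returns the kind of
    # structured output we would prompt the model to produce.
    return json.dumps({
        "intent": "create_payment_link",
        "amount_usd": 750,
        "customer_email": "jane@example.com",
    })

def determine_next_step(user_message: str) -> dict:
    """Turn free-form natural language into a validated, structured step."""
    raw = fake_llm(f"Extract the next step as JSON: {user_message}")
    step = json.loads(raw)
    # Validate against our own schema before any code acts on the output.
    if step["intent"] not in {"create_payment_link", "done"}:
        raise ValueError(f"unknown intent: {step['intent']}")
    return step

step = determine_next_step("Please invoice Jane $750")
```

Once the output is validated JSON, deterministic code can dispatch on it like any other data, with no agent framework required.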

Control Flow, Execution, and State Management 05:07

  • Emphasizes the need for explicit ownership of the agent’s control flow, drawing parallels with DAGs and orchestration tools like Airflow or Prefect
  • Notes that naive agent loops relying on ever-growing context windows become unreliable, particularly in longer workflows
  • Recommends breaking up logic into modular components: prompt, action (via switch statement), context window, and execution loop
  • Owning the control flow allows for advanced features like pausing, resuming, retries, and integration with business state and APIs
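A minimal sketch of an owned control loop, with all names invented for illustration and the model call stubbed so the example is self-contained: deterministic code drives the loop and a switch-style dispatch decides what each intent does, while the model only picks the next step.

```python
def next_step(context: list) -> dict:
    # A real implementation would serialize `context` into a prompt and
    # call an LLM; this stub finishes after one successful tool call.
    if any(e["type"] == "tool_result" for e in context):
        return {"intent": "done", "message": "deployed"}
    return {"intent": "deploy_backend", "tag": "v1.2.3"}

def run_agent(event: str) -> list:
    context = [{"type": "user_input", "text": event}]
    while True:
        step = next_step(context)
        context.append({"type": "tool_call", "step": step})
        # The "switch statement": our code decides what each intent does,
        # which is where pausing, retries, or human approval can be added.
        if step["intent"] == "done":
            return context
        elif step["intent"] == "deploy_backend":
            context.append({"type": "tool_result", "ok": True})
        else:
            context.append({"type": "tool_result", "error": "unknown intent"})

trace = run_agent("deploy backend v1.2.3")
```

Because the loop, the context list, and the dispatch are all ordinary code, serializing `context` mid-run gives pause/resume and retries without any framework support.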

Prompt and Context Engineering 08:37

  • Stresses the importance of deeply owning and handcrafting prompts to reach high reliability—out-of-the-box methods don’t suffice for top quality
  • The quality of outputs is entirely dependent on input tokens; experimentation, tuning, and manual refinement are emphasized
  • Context building (including prompt formats, memory, history, and RAG) is central; optimizing token density and relevance is critical for performance
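One way to act on the token-density point is to render events into a compact, model-friendly format rather than replaying a raw message transcript. The event shapes and tag format below are invented for illustration:

```python
def render_event(event: dict) -> str:
    # Compact, hand-designed rendering instead of raw API message dumps;
    # every token in the window should earn its place.
    kind = event["type"]
    if kind == "slack_message":
        return f"<slack from={event['user']}>{event['text']}</slack>"
    if kind == "tool_result":
        return f"<result tool={event['tool']}>{event['summary']}</result>"
    return f"<{kind}>{event}</{kind}>"

def build_context(events: list) -> str:
    """Serialize the event history into a single dense context string."""
    return "\n".join(render_event(e) for e in events)

ctx = build_context([
    {"type": "slack_message", "user": "dex", "text": "deploy backend"},
    {"type": "tool_result", "tool": "deploy", "summary": "success"},
])
```

Owning this serialization step is what makes experimentation possible: changing the format, summarizing old events, or dropping irrelevant ones is a one-function edit.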

Error Handling and Human Interaction 10:34

  • When tool API calls fail, don’t blindly append errors to model context; instead, summarize or clear them to avoid model confusion
  • Suggests a clear distinction between tool calls and communication with humans, having the model explicitly choose between them in natural language at each decision point
  • Enables agents to integrate seamlessly with various human interfaces (email, Slack, Discord, SMS), emphasizing meeting users where they are
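The error-handling advice can be sketched as a small helper (the thresholds and event shapes are assumptions for illustration): summarize failures instead of appending raw traces, escalate to a human after repeated failures, and clear stale errors once a call succeeds.

```python
MAX_CONSECUTIVE_ERRORS = 3  # illustrative threshold

def record_result(context: list, state: dict, result: dict) -> None:
    """Fold a tool result into the context without polluting it."""
    if result.get("error"):
        state["consecutive_errors"] = state.get("consecutive_errors", 0) + 1
        # Append a short summary, never the full stack trace.
        context.append({"type": "error", "summary": str(result["error"])[:200]})
        if state["consecutive_errors"] >= MAX_CONSECUTIVE_ERRORS:
            # Repeated failures: hand off to a human instead of looping.
            context.append({"type": "escalate_to_human"})
    else:
        state["consecutive_errors"] = 0
        # Success: drop old error entries so they can't confuse the model.
        context[:] = [e for e in context if e["type"] != "error"]
        context.append({"type": "tool_result", "data": result})
```

Because the developer owns the context window (per the earlier factors), this kind of pruning is just list manipulation.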

Micro-Agents and Modularity in Practice 12:22

  • Advocates for micro-agents: small, focused agent loops with clear, manageable responsibilities, typically 3–10 steps
  • Shares a practical example from HumanLayer: a deployment agent where deterministic code handles CI/CD, but LLMs take over at the point needing flexible decision-making, all mediated by structured JSON and human feedback
  • Predicts a trend where deterministic workflows gradually become more agentic as LLM capacity improves, but each expansion should be engineered for reliability
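The deployment example can be sketched as follows, with all function names invented and the model step stubbed: deterministic code runs CI, and a small agent loop handles only the narrow decision that needs flexibility, turning a human's free-form reply into a structured choice.

```python
def deterministic_ci(tag: str) -> bool:
    # Tests, build, artifact push: plain code, no LLM involved.
    return bool(tag)

def micro_agent_decide(question: str, human_reply: str) -> str:
    # In practice a short 3-10 step loop with a real model; this stub
    # maps the human's natural-language reply to a structured decision.
    return "proceed" if "yes" in human_reply.lower() else "abort"

def deploy(tag: str, human_reply: str) -> str:
    """Mostly deterministic pipeline with one agentic decision point."""
    if not deterministic_ci(tag):
        return "ci_failed"
    decision = micro_agent_decide(f"Deploy {tag} to prod?", human_reply)
    return "deployed" if decision == "proceed" else "aborted"
```

Keeping the agentic surface this small is what makes the whole pipeline debuggable: only one step can behave unpredictably, and it is bracketed by structured JSON on both sides.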

State Management, Abstractions, and Future Directions 14:25

  • Recommends agents themselves be stateless, with developers retaining explicit ownership and management of all state
  • Acknowledges ongoing search for the right abstractions, referencing debates on framework vs. library style development
  • Announces work on "create 12-factor agent" to scaffold projects while letting developers stay in control of the core logic
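The stateless-agent recommendation can be sketched as a pure step function over caller-owned, serializable state (the state shape is an assumption for illustration): because the agent holds nothing, any step boundary is a valid point to persist to a database and resume later.

```python
import json

def agent_step(state: dict) -> dict:
    """Pure function: serialized state in, new serialized state out."""
    state = json.loads(json.dumps(state))  # defensive copy, stays JSON-safe
    # A real step would build a prompt from state["context"] and call the
    # model; this stub just advances a counter and marks completion.
    state["steps_taken"] += 1
    if state["steps_taken"] >= 2:
        state["status"] = "done"
    return state

state = {"context": [], "steps_taken": 0, "status": "running"}
state = agent_step(state)
saved = json.dumps(state)               # persist mid-run, e.g. to a DB row
state = agent_step(json.loads(saved))   # resume later from storage
```

Pause, resume, and retries fall out of this shape for free, since "the agent" is just a function the developer calls with state they own.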

Takeaways & Closing Advice 15:09

  • Reiterates that building reliable LLM agents is essentially a software engineering task—focus on inputs, state, and explicit control, leveraging familiar constructs like loops and switch statements
  • Flexibility comes from deep understanding and explicit control—make every model interaction intentional
  • Success depends on finding the reliability edge: engineering workflows to exploit agent strengths while safeguarding against unpredictability
  • Highlights the value of agents as collaborative tools with humans, with best results coming from careful prompt and flow engineering
  • Encourages practitioners to embrace and tackle hard problems, focusing efforts on AI and reliability rather than infrastructure friction
  • Concludes by inviting further conversation and collaboration on building better agents