12-Factor Agents: Patterns of reliable LLM applications — Dex Horthy, HumanLayer

Introduction & Motivation 00:01

  • The speaker discusses the experience of building agents, emphasizing the ease of reaching an initial 70-80% solution using libraries, but noting how hard it is to push past that 70-80% quality bar
  • Highlights that not all problems are well-suited for agents—sometimes a simple script is faster and more effective
  • Points out that many production "agents" are not truly agentic; rather, they're modular software with certain best practices

Patterns and the "12-Factor Agents" Approach 02:01

  • After talking to over 100 builders, the speaker identified recurring patterns for successful LLM-based applications
  • These patterns are codified in the "12-factor agent" approach, available as a GitHub repo with significant community interest and contribution
  • The approach is positioned as a set of modular, wishlist-like best practices, rather than an anti-framework manifesto

Rethinking Agent Design with Software Engineering Principles 03:27

  • Encourages application of standard software engineering principles, comparing the shift to how Heroku defined cloud-native practices years ago
  • The talk aims to help developers apply modular enhancements to increase agent reliability and flexibility

Factor Highlights: Key Patterns for Reliable Agents 03:47

  • The "magical" capability of LLM agents is their ability to convert flexible natural language into structured JSON
  • Cautions against fetishizing "tool use" as a magic abstraction; at its core it is just structured model output ("tool use is harmful" only in the sense that treating it as magic overcomplicates the abstraction)
  • Advocates modular design where deterministic code, not magical loops, drives the agent’s operation
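The "magic" step described above can be sketched in a few lines. This is an illustrative example, not code from the talk: the model call is stubbed out, and the schema (`intent`, `amount_usd`, `customer_email`) is invented for the sketch. The point is that the LLM's only job is to emit JSON matching a schema we define; everything after that is plain code.

```python
import json

def fake_llm(prompt: str) -> str:
    # A real call would go to an LLM API; this stub returns the kind of
    # structured output we would prompt the model to produce.
    return json.dumps({
        "intent": "create_payment_link",
        "amount_usd": 750,
        "customer_email": "jane@example.com",
    })

def determine_next_step(user_message: str) -> dict:
    """Turn free-form natural language into a validated, structured step."""
    raw = fake_llm(f"Extract the next step as JSON: {user_message}")
    step = json.loads(raw)
    # Validate against our own schema before any code acts on the output.
    if step["intent"] not in {"create_payment_link", "done"}:
        raise ValueError(f"unknown intent: {step['intent']}")
    return step

step = determine_next_step("Please invoice Jane $750")
```

Once the output is validated JSON, deterministic code can dispatch on it like any other data, with no agent framework required.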

Control Flow, Execution, and State Management 05:07

  • Emphasizes the need for explicit ownership of the agent’s control flow, drawing parallels with DAGs and orchestration tools like Airflow or Prefect
  • Notes that naive agent loops relying on ever-growing context windows become unreliable, particularly in longer workflows
  • Recommends breaking up logic into modular components: prompt, action (via switch statement), context window, and execution loop
  • Owning the control flow allows for advanced features like pausing, resuming, retries, and integration with business state and APIs
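A minimal sketch of an owned control loop, with all names invented for illustration and the model call stubbed so the example is self-contained: deterministic code drives the loop and a switch-style dispatch decides what each intent does, while the model only picks the next step.

```python
def next_step(context: list) -> dict:
    # A real implementation would serialize `context` into a prompt and
    # call an LLM; this stub finishes after one successful tool call.
    if any(e["type"] == "tool_result" for e in context):
        return {"intent": "done", "message": "deployed"}
    return {"intent": "deploy_backend", "tag": "v1.2.3"}

def run_agent(event: str) -> list:
    context = [{"type": "user_input", "text": event}]
    while True:
        step = next_step(context)
        context.append({"type": "tool_call", "step": step})
        # The "switch statement": our code decides what each intent does,
        # which is where pausing, retries, or human approval can be added.
        if step["intent"] == "done":
            return context
        elif step["intent"] == "deploy_backend":
            context.append({"type": "tool_result", "ok": True})
        else:
            context.append({"type": "tool_result", "error": "unknown intent"})

trace = run_agent("deploy backend v1.2.3")
```

Because the loop, the context list, and the dispatch are all ordinary code, serializing `context` mid-run gives pause/resume and retries without any framework support.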

Prompt and Context Engineering 08:37

  • Stresses the importance of deeply owning and handcrafting prompts to reach high reliability—out-of-the-box methods don’t suffice for top quality
  • The quality of outputs is entirely dependent on input tokens; experimentation, tuning, and manual refinement are emphasized
  • Context building (including prompt formats, memory, history, and RAG) is central; optimizing token density and relevance is critical for performance
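One way to act on the token-density point is to render events into a compact, model-friendly format rather than replaying a raw message transcript. The event shapes and tag format below are invented for illustration:

```python
def render_event(event: dict) -> str:
    # Compact, hand-designed rendering instead of raw API message dumps;
    # every token in the window should earn its place.
    kind = event["type"]
    if kind == "slack_message":
        return f"<slack from={event['user']}>{event['text']}</slack>"
    if kind == "tool_result":
        return f"<result tool={event['tool']}>{event['summary']}</result>"
    return f"<{kind}>{event}</{kind}>"

def build_context(events: list) -> str:
    """Serialize the event history into a single dense context string."""
    return "\n".join(render_event(e) for e in events)

ctx = build_context([
    {"type": "slack_message", "user": "dex", "text": "deploy backend"},
    {"type": "tool_result", "tool": "deploy", "summary": "success"},
])
```

Owning this serialization step is what makes experimentation possible: changing the format, summarizing old events, or dropping irrelevant ones is a one-function edit.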

Error Handling and Human Interaction 10:34

  • When tool API calls fail, don’t blindly append errors to model context; instead, summarize or clear them to avoid model confusion
  • Suggests a clear distinction between tool calls and communication with humans, having the model explicitly choose between them in natural language at each decision point
  • Enables agents to integrate seamlessly with various human interfaces (email, Slack, Discord, SMS), emphasizing meeting users where they are
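The error-handling advice can be sketched as a small helper (the thresholds and event shapes are assumptions for illustration): summarize failures instead of appending raw traces, escalate to a human after repeated failures, and clear stale errors once a call succeeds.

```python
MAX_CONSECUTIVE_ERRORS = 3  # illustrative threshold

def record_result(context: list, state: dict, result: dict) -> None:
    """Fold a tool result into the context without polluting it."""
    if result.get("error"):
        state["consecutive_errors"] = state.get("consecutive_errors", 0) + 1
        # Append a short summary, never the full stack trace.
        context.append({"type": "error", "summary": str(result["error"])[:200]})
        if state["consecutive_errors"] >= MAX_CONSECUTIVE_ERRORS:
            # Repeated failures: hand off to a human instead of looping.
            context.append({"type": "escalate_to_human"})
    else:
        state["consecutive_errors"] = 0
        # Success: drop old error entries so they can't confuse the model.
        context[:] = [e for e in context if e["type"] != "error"]
        context.append({"type": "tool_result", "data": result})
```

Because the developer owns the context window (per the earlier factors), this kind of pruning is just list manipulation.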

Micro-Agents and Modularity in Practice 12:22

  • Advocates for micro-agents: small, focused agent loops with clear, manageable responsibilities, typically 3–10 steps
  • Shares a practical example from HumanLayer: a deployment agent where deterministic code handles CI/CD, but LLMs take over at the point needing flexible decision-making, all mediated by structured JSON and human feedback
  • Predicts a trend where deterministic workflows gradually become more agentic as LLM capacity improves, but each expansion should be engineered for reliability
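The deployment example can be sketched as follows, with all function names invented and the model step stubbed: deterministic code runs CI, and a small agent loop handles only the narrow decision that needs flexibility, turning a human's free-form reply into a structured choice.

```python
def deterministic_ci(tag: str) -> bool:
    # Tests, build, artifact push: plain code, no LLM involved.
    return bool(tag)

def micro_agent_decide(question: str, human_reply: str) -> str:
    # In practice a short 3-10 step loop with a real model; this stub
    # maps the human's natural-language reply to a structured decision.
    return "proceed" if "yes" in human_reply.lower() else "abort"

def deploy(tag: str, human_reply: str) -> str:
    """Mostly deterministic pipeline with one agentic decision point."""
    if not deterministic_ci(tag):
        return "ci_failed"
    decision = micro_agent_decide(f"Deploy {tag} to prod?", human_reply)
    return "deployed" if decision == "proceed" else "aborted"
```

Keeping the agentic surface this small is what makes the whole pipeline debuggable: only one step can behave unpredictably, and it is bracketed by structured JSON on both sides.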

State Management, Abstractions, and Future Directions 14:25

  • Recommends agents themselves be stateless, with developers retaining explicit ownership and management of all state
  • Acknowledges ongoing search for the right abstractions, referencing debates on framework vs. library style development
  • Announces work on "create 12-factor agent" to scaffold projects while letting developers stay in control of the core logic
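The stateless-agent recommendation can be sketched as a pure step function over caller-owned, serializable state (the state shape is an assumption for illustration): because the agent holds nothing, any step boundary is a valid point to persist to a database and resume later.

```python
import json

def agent_step(state: dict) -> dict:
    """Pure function: serialized state in, new serialized state out."""
    state = json.loads(json.dumps(state))  # defensive copy, stays JSON-safe
    # A real step would build a prompt from state["context"] and call the
    # model; this stub just advances a counter and marks completion.
    state["steps_taken"] += 1
    if state["steps_taken"] >= 2:
        state["status"] = "done"
    return state

state = {"context": [], "steps_taken": 0, "status": "running"}
state = agent_step(state)
saved = json.dumps(state)               # persist mid-run, e.g. to a DB row
state = agent_step(json.loads(saved))   # resume later from storage
```

Pause, resume, and retries fall out of this shape for free, since "the agent" is just a function the developer calls with state they own.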

Takeaways & Closing Advice 15:09

  • Reiterates that building reliable LLM agents is essentially a software engineering task—focus on inputs, state, and explicit control, leveraging familiar constructs like loops and switch statements
  • Flexibility comes from deep understanding and explicit control—make every model interaction intentional
  • Success depends on finding the reliability edge: engineering workflows to exploit agent strengths while safeguarding against unpredictability
  • Highlights the value of agents as collaborative tools with humans, with best results coming from careful prompt and flow engineering
  • Encourages practitioners to embrace and tackle hard problems, focusing efforts on AI and reliability rather than infrastructure friction
  • Concludes by inviting further conversation and collaboration on building better agents