Proposes three basic ingredients for agent success: high value when correct, high probability of correct operation, and low cost of errors
Suggests a rough formula: probability of success × value of success − cost of errors; this expected value should exceed the agent's operating cost to justify production use
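A minimal sketch of that deployment rule in Python, with illustrative made-up numbers; the variable names are assumptions, not from the talk:

```python
# Rough expected-value check for putting an agent into production
# (illustrative numbers only).
p_success = 0.90       # probability the agent completes the task correctly
value_success = 50.0   # value of a correct completion
cost_errors = 20.0     # expected cost of errors per run, one natural reading:
                       # (1 - p_success) * per_error_cost
cost_operation = 5.0   # cost of running the agent once

# The talk's formula: probability of success x value of success - cost of errors.
expected_value = p_success * value_success - cost_errors

# Justify production use only if the expected value clears the operating cost.
print("deploy" if expected_value > cost_operation else "hold off")
```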
Advises focusing agents on high-value enterprise problems or tasks (e.g., legal research, finance research) where enterprise users already spend significant resources
Notes trend toward agents that handle more complex, long-running tasks for higher value, such as deep research or ambient agents working in the background
UI/UX patterns can be shifted to support more substantial, ongoing work, increasing agent value beyond quick, simple answers
Improving Reliability and Probability of Success 05:12
Reliability is a major challenge; a prototype may work once, but making it perform consistently in production is hard
Enterprises prefer deterministic workflows where steps are predictable and controlled (e.g., step A always follows step B)
Recommends making more of the agent's process deterministic, blending workflow systems with agentic systems as necessary
Introduces LangGraph, a framework supporting the spectrum from pure workflow to flexible agents
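A minimal sketch of that blend using LangGraph's StateGraph API; the node functions (fetch_data, summarize, agent_step) are hypothetical stand-ins, and only the graph wiring reflects the library:

```python
from typing import TypedDict

from langgraph.graph import StateGraph, START, END


class State(TypedDict):
    query: str
    result: str
    needs_review: bool


# Deterministic workflow steps: these always run, in a fixed order.
def fetch_data(state: State) -> dict:
    return {"result": f"docs for {state['query']}"}      # (hypothetical)


def summarize(state: State) -> dict:
    return {"result": state["result"] + " [summary]"}    # (hypothetical)


# Agentic step: only reached when the router decides it is needed.
def agent_step(state: State) -> dict:
    return {"result": state["result"] + " [agent fix]"}  # (hypothetical)


def route(state: State) -> str:
    # Escape hatch from the fixed workflow into the flexible agent.
    return "agent_step" if state["needs_review"] else END


graph = StateGraph(State)
graph.add_node("fetch_data", fetch_data)
graph.add_node("summarize", summarize)
graph.add_node("agent_step", agent_step)
graph.add_edge(START, "fetch_data")
graph.add_edge("fetch_data", "summarize")        # fixed, predictable edge
graph.add_conditional_edges("summarize", route)  # agentic branch point
graph.add_edge("agent_step", END)

app = graph.compile()
print(app.invoke({"query": "q3 revenue", "result": "", "needs_review": False}))
```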
Stresses observability and evaluation tooling (e.g., LangSmith) to reduce perceived risk and shrink error bars, making it easier for stakeholders to trust and approve agents
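For the observability side, a small sketch of LangSmith's @traceable decorator (a real API); the function body and environment setup are assumptions:

```python
import os

from langsmith import traceable

# LangSmith reads credentials and the tracing flag from the environment.
os.environ["LANGSMITH_TRACING"] = "true"
# os.environ["LANGSMITH_API_KEY"] = "..."   # supply your own key


@traceable  # every call is logged as a run in LangSmith for later evaluation
def plan_step(task: str) -> str:
    return f"plan for: {task}"   # (hypothetical) agent planning logic


plan_step("triage inbox")
```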
Argues that "ambient agents," which operate in the background and respond to events, allow scalable use (many agents working in parallel)
Major distinctions: ambient agents are event-triggered and not dependent on immediate user input; they can handle more complex, longer-running operations due to relaxed latency requirements
Stresses that ambient does not mean fully autonomous; maintaining human-in-the-loop checkpoints is important for trust and safety
Details various human-agent interaction patterns: approve/reject, tool call editing, answering agent questions, and "time travel" (undoing or rerunning previous steps)
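A sketch of an approve/reject checkpoint using LangGraph's interrupt_before compile option and an in-memory checkpointer (both real APIs); the node bodies are hypothetical:

```python
from typing import TypedDict

from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import StateGraph, START, END


class State(TypedDict):
    draft: str


def write_draft(state: State) -> dict:
    return {"draft": "proposed email reply"}   # (hypothetical) agent output


def send(state: State) -> dict:
    print("sending:", state["draft"])          # (hypothetical) real-world action
    return {}


graph = StateGraph(State)
graph.add_node("write_draft", write_draft)
graph.add_node("send", send)
graph.add_edge(START, "write_draft")
graph.add_edge("write_draft", "send")
graph.add_edge("send", END)

# Pause before the irreversible step; the checkpointer persists the paused run.
app = graph.compile(checkpointer=MemorySaver(), interrupt_before=["send"])

config = {"configurable": {"thread_id": "demo"}}
app.invoke({"draft": ""}, config)   # runs write_draft, then pauses for review

# A human can now approve as-is, or edit the pending state first, e.g.:
# app.update_state(config, {"draft": "edited reply"})
app.invoke(None, config)            # resuming with None continues the paused run
```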
Suggests the agent evolution path: from synchronous chat agents (operated by humans) to "sync-to-async" models (human initiates, agent works asynchronously), heading toward more autonomous but still supervised agents
Mentions "agent inbox" UX, where users approve or reject agent-proposed actions in bulk
Email is presented as a natural domain for ambient agents: agents process incoming messages but still require human approval to send emails or modify calendar events
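A toy sketch of that email pattern; the Proposal type, queue, and handlers are hypothetical scaffolding for the "agent inbox" idea, not a library API:

```python
from dataclasses import dataclass
from queue import Queue


@dataclass
class Proposal:
    """An action the agent wants to take, awaiting human review."""
    description: str
    action: str      # e.g. "send_email" or "update_calendar"
    payload: dict


inbox: Queue[Proposal] = Queue()


def on_incoming_email(email: dict) -> None:
    # Event-triggered: the agent drafts a reply in the background...
    draft = f"Re: {email['subject']} (drafted reply)"
    # ...but only proposes the send; a human approves or rejects later.
    inbox.put(Proposal(f"reply to {email['sender']}", "send_email",
                       {"to": email["sender"], "body": draft}))


def review_inbox() -> None:
    # The human works through proposals in bulk, approving or rejecting each.
    while not inbox.empty():
        p = inbox.get()
        if input(f"approve '{p.description}'? [y/n] ").strip() == "y":
            print("executing:", p.action, p.payload)
        else:
            print("rejected:", p.description)


on_incoming_email({"sender": "alice@example.com", "subject": "Q3 numbers"})
review_inbox()
```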
Question raised about why code-generating agents attract more funding than others
Code agents favored because outputs are measurable (tests, compilation) and reversible (commits, reverts)
Large models have abundant, verifiable training data in domains like code and math, leading to superior performance
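A sketch of that measurability/reversibility loop; it assumes a git repo with a pytest suite, and the agent-edit step is hypothetical:

```python
import subprocess


def apply_agent_edit() -> None:
    ...   # (hypothetical) the agent writes its proposed change to the working tree


def tests_pass() -> bool:
    # Measurable: the test suite gives a mechanical pass/fail signal.
    return subprocess.run(["pytest", "-q"]).returncode == 0


apply_agent_edit()
if tests_pass():
    subprocess.run(["git", "commit", "-am", "agent change"])  # keep the change
else:
    subprocess.run(["git", "checkout", "--", "."])            # reversible: discard it
```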
"First draft" UX is generalizable—legal and writing tasks can also benefit from agent output followed by human review, balancing agent value with necessary human oversight
Verifiable domains remain easier for agent deployment; ambiguity in outputs (e.g., essay writing) poses challenges