3 ingredients for building reliable enterprise agents - Harrison Chase, LangChain/LangGraph

Introduction & Enterprise Agent Challenges 00:00

  • Focuses on building reliable agents for enterprise use, both for internal developers and vendors selling solutions to enterprises
  • Presents a vision of the future with many specialized agents performing various tasks within enterprises and being managed by humans
  • Sets out to identify why some agents succeed in the enterprise and others fail

Three Ingredients for Success 01:46

  • Proposes three basic ingredients for agent success: high value when correct, high probability of correct operation, and low cost of errors
  • Suggests a formula: Probability of success × Value of success − Cost of errors, which should be greater than the cost of operating the agent to justify production use (worked through in the example below)
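
A back-of-the-envelope reading of that formula, sketched in Python below; every number is a made-up placeholder, not a figure from the talk.

```python
# Back-of-the-envelope check of the "is this agent worth running?" formula.
# All values are hypothetical placeholders.
p_success = 0.9          # probability the agent completes the task correctly
value_of_success = 50.0  # value delivered when it succeeds (e.g., dollars saved)
cost_of_errors = 10.0    # expected downside of mistakes (rework, review, risk)
cost_to_run = 2.0        # LLM calls, infrastructure, human supervision time

expected_value = p_success * value_of_success - cost_of_errors
if expected_value > cost_to_run:
    print(f"Worth putting in production: {expected_value:.1f} > {cost_to_run:.1f}")
else:
    print("Not yet: raise the value, improve reliability, or cut the cost of errors")
```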

Increasing Agent Value 03:11

  • Advises focusing agents on high-value enterprise problems or tasks (e.g., legal research, finance research) where enterprise users already spend significant resources
  • Notes trend toward agents that handle more complex, long-running tasks for higher value, such as deep research or ambient agents working in the background
  • UI/UX patterns can be shifted to support more substantial, ongoing work, increasing agent value beyond quick, simple answers

Improving Reliability and Probability of Success 05:12

  • Reliability is a major challenge; prototypes may work once but are hard to make reliable for production
  • Enterprises prefer deterministic workflows where steps are predictable and controlled (e.g., step A always follows step B)
  • Recommends making more of the agent's process deterministic, blending workflow systems with agentic systems as necessary
  • Introduces LangGraph, a framework supporting the spectrum from pure workflows to flexible agents (a minimal sketch follows this list)
  • Stresses observability and evaluation tooling (e.g., LangSmith) to shrink error bars and reduce perceived risk, making it easier for stakeholders to trust and approve agents
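
A minimal LangGraph sketch of that workflow-to-agent blend. The node names, state fields, and stubbed logic are illustrative; a real graph would call an LLM (and tools) in the agentic step.

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    question: str
    documents: list[str]
    answer: str

def retrieve(state: State) -> dict:
    # Deterministic "workflow" step: always runs, always in the same place.
    return {"documents": [f"doc about {state['question']}"]}

def draft_answer(state: State) -> dict:
    # Agentic step: a real graph would call an LLM here, possibly with tools.
    return {"answer": f"answer grounded in {len(state['documents'])} document(s)"}

def route(state: State) -> str:
    # The only non-deterministic branch: loop back if nothing was retrieved.
    return END if state["documents"] else "retrieve"

builder = StateGraph(State)
builder.add_node("retrieve", retrieve)
builder.add_node("draft_answer", draft_answer)
builder.add_edge(START, "retrieve")          # fixed ordering stakeholders can rely on
builder.add_edge("retrieve", "draft_answer")
builder.add_conditional_edges("draft_answer", route)
graph = builder.compile()

print(graph.invoke({"question": "Q3 revenue drivers", "documents": [], "answer": ""}))
```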

Reducing the Cost of Agent Errors 09:48

  • In enterprises, perceived risks of agent mistakes (brand damage, data leakage) are significant barriers
  • Suggests UI/UX patterns to minimize these risks, such as making all agent actions easily reversible (e.g., code commits and pull requests)
  • Advocates "human-in-the-loop" design where a person reviews or approves agent actions before they have external effects
  • Provides examples: deep research flows confirm scope and preferences with the user before a long run, and code agents propose changes as drafts or on branches for human approval (an approval-gate sketch follows this list)
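
One way to wire that approval gate, assuming LangGraph's interrupt/Command resume pattern; the node names, payload shape, and the print standing in for a real send are illustrative.

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver
from langgraph.types import Command, interrupt

class State(TypedDict):
    draft: str

def propose_email(state: State) -> dict:
    # An LLM would write this draft in practice.
    return {"draft": "Hi team, here is the summary you asked for..."}

def send_email(state: State) -> dict:
    # Pause here until a human weighs in; nothing external happens before approval.
    decision = interrupt({"action": "send_email", "draft": state["draft"]})
    if decision == "approve":
        print("sending:", state["draft"])  # stand-in for the real side effect
    return {}

builder = StateGraph(State)
builder.add_node("propose_email", propose_email)
builder.add_node("send_email", send_email)
builder.add_edge(START, "propose_email")
builder.add_edge("propose_email", "send_email")
builder.add_edge("send_email", END)
graph = builder.compile(checkpointer=MemorySaver())  # checkpointing enables the pause

config = {"configurable": {"thread_id": "demo"}}
graph.invoke({"draft": ""}, config)              # runs until the interrupt
graph.invoke(Command(resume="approve"), config)  # human approves; the send proceeds
```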

Scaling and Evolving Enterprise Agents 13:17

  • Argues that "ambient agents," which operate in the background and respond to events, allow scalable use (many agents working in parallel)
  • Major distinctions: ambient agents are event-triggered rather than dependent on immediate user input, and they can handle more complex, longer-running operations because latency requirements are relaxed
  • Stresses that ambient does not mean fully autonomous; maintaining human-in-the-loop checkpoints is important for trust and safety (a minimal event-loop sketch follows this list)
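
A framework-agnostic sketch of the ambient pattern: runs are triggered by events rather than a chat message, many can proceed in parallel, and results still land in a review queue. The event source, handler, and queue below are hypothetical.

```python
import asyncio

REVIEW_QUEUE: list[dict] = []  # stand-in for an "agent inbox" awaiting approval

async def handle_event(event: dict) -> None:
    # A real handler would invoke an agent (e.g., a compiled LangGraph graph).
    await asyncio.sleep(1)     # long-running work is fine; no user is waiting on chat
    REVIEW_QUEUE.append({"event": event["id"],
                         "proposal": f"draft response to {event['id']}"})

async def main() -> None:
    events = [{"id": f"email-{i}"} for i in range(5)]         # stand-in event source
    await asyncio.gather(*(handle_event(e) for e in events))  # many agents in parallel
    print(f"{len(REVIEW_QUEUE)} proposals waiting for human approval")

asyncio.run(main())
```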

UX Patterns and Agent Types 15:19

  • Details various human-agent interaction patterns: approve/reject, tool call editing, answering agent questions, and "time travel" (undoing or rerunning previous steps)
  • Suggests the agent evolution path: from synchronous chat agents (operated by humans) to "sync-to-async" models (human initiates, agent works asynchronously), heading toward more autonomous but still supervised agents
  • Mentions an "agent inbox" UX, where users approve or reject agent-proposed actions in bulk (the response types are sketched below)
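
Those interaction patterns reduce to a small response protocol between human and agent. A hypothetical shape for it is sketched below; the names are illustrative, not a LangChain API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional

class Decision(str, Enum):
    APPROVE = "approve"  # run the tool call as proposed
    REJECT = "reject"    # discard the proposal
    EDIT = "edit"        # run a human-edited version of the tool call
    RESPOND = "respond"  # run nothing; answer the agent's question instead

@dataclass
class ProposedAction:
    agent_id: str
    tool_name: str
    tool_args: dict[str, Any]

@dataclass
class HumanResponse:
    decision: Decision
    edited_args: Optional[dict] = None  # used when decision == EDIT
    message: Optional[str] = None       # used when decision == RESPOND

# Bulk review, inbox-style: approve every pending proposal in one pass.
inbox = [ProposedAction("email-agent-1", "send_email", {"to": "cfo@example.com"})]
responses = [HumanResponse(decision=Decision.APPROVE) for _ in inbox]
print(responses)
```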

Concrete Example: Ambient Email Agent 17:43

  • Email is presented as a natural domain for ambient agents: agents process incoming messages but still require human approval to send emails or modify calendar events
  • Open-source example provided for the audience to try (a simplified triage sketch follows)
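
A simplified sketch of the triage logic such an email agent might use; the categories, keyword stub, and approval queue are illustrative, not the open-source project's actual code.

```python
def triage(email: dict) -> str:
    # In the real agent an LLM classifies; a keyword stub keeps this runnable.
    subject = email["subject"].lower()
    if "unsubscribe" in subject:
        return "ignore"
    if "meeting" in subject:
        return "schedule"
    return "respond"

def handle(email: dict, approval_queue: list[dict]) -> None:
    category = triage(email)
    if category == "ignore":
        return
    # Drafting is safe; sending mail or editing the calendar is not, so both
    # wait in the approval queue until a human signs off.
    approval_queue.append({"email": email["subject"], "category": category,
                           "draft": f"Proposed {category} for: {email['subject']}"})

queue: list[dict] = []
for msg in [{"subject": "Meeting next week?"}, {"subject": "Unsubscribe"}]:
    handle(msg, queue)
print(f"{len(queue)} proposals awaiting human approval")
```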

Q&A: Why Code Generation Agents Dominate Funding 18:48

  • Question raised about why code-generating agents attract more funding than others
  • Code agents favored because outputs are measurable (tests, compilation) and reversible (commits, reverts)
  • Large models have abundant, verifiable training data in domains like code and math, leading to superior performance
  • "First draft" UX is generalizable—legal and writing tasks can also benefit from agent output followed by human review, balancing agent value with necessary human oversight
  • Verifiable domains remain easier for agent deployment; ambiguity in outputs (e.g., essay writing) poses challenges