Proposes three basic ingredients for agent success: high value when correct, high probability of correct operation, and low cost of errors
Suggests a rough formula: probability of success × value of success − cost of errors; this expected value should exceed the agent's operating cost to justify production use
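A minimal sketch of that deployment rule in Python, with illustrative made-up numbers; the variable names are assumptions, not from the talk:

```python
# Rough expected-value check for putting an agent into production
# (illustrative numbers only).
p_success = 0.90       # probability the agent completes the task correctly
value_success = 50.0   # value of a correct completion
cost_errors = 20.0     # expected cost of errors per run, one natural reading:
                       # (1 - p_success) * per_error_cost
cost_operation = 5.0   # cost of running the agent once

# The talk's formula: probability of success x value of success - cost of errors.
expected_value = p_success * value_success - cost_errors

# Justify production use only if the expected value clears the operating cost.
print("deploy" if expected_value > cost_operation else "hold off")
```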
Advises focusing agents on high-value enterprise problems or tasks (e.g., legal research, finance research) where enterprise users already spend significant resources
Notes trend toward agents that handle more complex, long-running tasks for higher value, such as deep research or ambient agents working in the background
UI/UX patterns can be shifted to support more substantial, ongoing work, increasing agent value beyond quick, simple answers
Improving Reliability and Probability of Success 05:12
Reliability is a major challenge; a prototype may work once, but making it perform consistently in production is hard
Enterprises prefer deterministic workflows where steps are predictable and controlled (e.g., step A always follows step B)
Recommends making more of the agent's process deterministic, blending workflow systems with agentic systems as necessary
Introduces LangGraph, a framework supporting the spectrum from pure workflow to flexible agents
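A minimal sketch of that blend using LangGraph's StateGraph API; the node functions (fetch_data, summarize, agent_step) are hypothetical stand-ins, and only the graph wiring reflects the library:

```python
from typing import TypedDict

from langgraph.graph import StateGraph, START, END


class State(TypedDict):
    query: str
    result: str
    needs_review: bool


# Deterministic workflow steps: these always run, in a fixed order.
def fetch_data(state: State) -> dict:
    return {"result": f"docs for {state['query']}"}      # (hypothetical)


def summarize(state: State) -> dict:
    return {"result": state["result"] + " [summary]"}    # (hypothetical)


# Agentic step: only reached when the router decides it is needed.
def agent_step(state: State) -> dict:
    return {"result": state["result"] + " [agent fix]"}  # (hypothetical)


def route(state: State) -> str:
    # Escape hatch from the fixed workflow into the flexible agent.
    return "agent_step" if state["needs_review"] else END


graph = StateGraph(State)
graph.add_node("fetch_data", fetch_data)
graph.add_node("summarize", summarize)
graph.add_node("agent_step", agent_step)
graph.add_edge(START, "fetch_data")
graph.add_edge("fetch_data", "summarize")        # fixed, predictable edge
graph.add_conditional_edges("summarize", route)  # agentic branch point
graph.add_edge("agent_step", END)

app = graph.compile()
print(app.invoke({"query": "q3 revenue", "result": "", "needs_review": False}))
```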
Stresses observability and evaluation tooling (e.g., LangSmith) to reduce perceived risk and shrink error bars, making it easier for stakeholders to trust and approve agents
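For the observability side, a small sketch of LangSmith's @traceable decorator (a real API); the function body and environment setup are assumptions:

```python
import os

from langsmith import traceable

# LangSmith reads credentials and the tracing flag from the environment.
os.environ["LANGSMITH_TRACING"] = "true"
# os.environ["LANGSMITH_API_KEY"] = "..."   # supply your own key


@traceable  # every call is logged as a run in LangSmith for later evaluation
def plan_step(task: str) -> str:
    return f"plan for: {task}"   # (hypothetical) agent planning logic


plan_step("triage inbox")
```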
Argues that "ambient agents," which operate in the background and respond to events, allow scalable use (many agents working in parallel)
Major distinctions: ambient agents are event-triggered and not dependent on immediate user input; they can handle more complex, longer-running operations due to relaxed latency requirements
Stresses that ambient does not mean fully autonomous; maintaining human-in-the-loop checkpoints is important for trust and safety
Details various human-agent interaction patterns: approve/reject, tool call editing, answering agent questions, and "time travel" (undoing or rerunning previous steps)
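A sketch of an approve/reject checkpoint using LangGraph's interrupt_before compile option and an in-memory checkpointer (both real APIs); the node bodies are hypothetical:

```python
from typing import TypedDict

from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import StateGraph, START, END


class State(TypedDict):
    draft: str


def write_draft(state: State) -> dict:
    return {"draft": "proposed email reply"}   # (hypothetical) agent output


def send(state: State) -> dict:
    print("sending:", state["draft"])          # (hypothetical) real-world action
    return {}


graph = StateGraph(State)
graph.add_node("write_draft", write_draft)
graph.add_node("send", send)
graph.add_edge(START, "write_draft")
graph.add_edge("write_draft", "send")
graph.add_edge("send", END)

# Pause before the irreversible step; the checkpointer persists the paused run.
app = graph.compile(checkpointer=MemorySaver(), interrupt_before=["send"])

config = {"configurable": {"thread_id": "demo"}}
app.invoke({"draft": ""}, config)   # runs write_draft, then pauses for review

# A human can now approve as-is, or edit the pending state first, e.g.:
# app.update_state(config, {"draft": "edited reply"})
app.invoke(None, config)            # resuming with None continues the paused run
```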
Suggests the agent evolution path: from synchronous chat agents (operated by humans) to "sync-to-async" models (human initiates, agent works asynchronously), heading toward more autonomous but still supervised agents
Mentions "agent inbox" UX, where users approve or reject agent-proposed actions in bulk
Email is presented as a natural domain for ambient agents: agents process incoming messages but still require human approval to send emails or modify calendar events
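A toy sketch of that email pattern; the Proposal type, queue, and handlers are hypothetical scaffolding for the "agent inbox" idea, not a library API:

```python
from dataclasses import dataclass
from queue import Queue


@dataclass
class Proposal:
    """An action the agent wants to take, awaiting human review."""
    description: str
    action: str      # e.g. "send_email" or "update_calendar"
    payload: dict


inbox: Queue[Proposal] = Queue()


def on_incoming_email(email: dict) -> None:
    # Event-triggered: the agent drafts a reply in the background...
    draft = f"Re: {email['subject']} (drafted reply)"
    # ...but only proposes the send; a human approves or rejects later.
    inbox.put(Proposal(f"reply to {email['sender']}", "send_email",
                       {"to": email["sender"], "body": draft}))


def review_inbox() -> None:
    # The human works through proposals in bulk, approving or rejecting each.
    while not inbox.empty():
        p = inbox.get()
        if input(f"approve '{p.description}'? [y/n] ").strip() == "y":
            print("executing:", p.action, p.payload)
        else:
            print("rejected:", p.description)


on_incoming_email({"sender": "alice@example.com", "subject": "Q3 numbers"})
review_inbox()
```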
Question raised about why code-generating agents attract more funding than others
Code agents favored because outputs are measurable (tests, compilation) and reversible (commits, reverts)
Large models have abundant, verifiable training data in domains like code and math, leading to superior performance
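A sketch of that measurability/reversibility loop; it assumes a git repo with a pytest suite, and the agent-edit step is hypothetical:

```python
import subprocess


def apply_agent_edit() -> None:
    ...   # (hypothetical) the agent writes its proposed change to the working tree


def tests_pass() -> bool:
    # Measurable: the test suite gives a mechanical pass/fail signal.
    return subprocess.run(["pytest", "-q"]).returncode == 0


apply_agent_edit()
if tests_pass():
    subprocess.run(["git", "commit", "-am", "agent change"])  # keep the change
else:
    subprocess.run(["git", "checkout", "--", "."])            # reversible: discard it
```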
"First draft" UX is generalizable—legal and writing tasks can also benefit from agent output followed by human review, balancing agent value with necessary human oversight
Verifiable domains remain easier for agent deployment; ambiguity in outputs (e.g., essay writing) poses challenges