Why should anyone care about Evals? — Manu Goyal, Braintrust

The Journey to Evals 00:38

  • The speaker, Manu, from Braintrust, shares his personal "eval journey," which began with a childhood disappointment in rule-based technology.
  • He dedicated his career to software engineering in the AI industry, including working on self-driving cars.
  • In self-driving cars, simply tuning models or adjusting loss functions was not enough to ship to production; there was a need to understand if the model actually worked in real-world scenarios, such as avoiding pedestrians or obeying traffic laws.
  • This experience highlighted the necessity of evals to contextualize and validate AI models for real-world applications.

Why Evals are Essential 02:26

  • Evals are not just unit tests for AI or solely for finding regressions.
  • Relying on shipping to production for signal on changes is expensive, slow, and risky.
  • Investing in good evals creates a "laboratory" for running experiments, allowing 90% of the product iteration loop to occur before production.
  • This process enables much quicker and more confident shipping of AI products.
  • Applying the same offline metrics to online production data provides data-driven insights into which real-world examples are most useful for the next iteration loop.
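As a rough illustration of the loop described above, the sketch below scores two prompt variants on a small offline dataset, then reuses the same metric on logged production outputs to surface the weakest examples for the next iteration. Everything here is hypothetical and chosen only for illustration: the `conciseness` metric, the record layout, and the sample data are assumptions, not Braintrust's API or anything shown in the talk.

```python
from statistics import mean
from typing import Callable, Dict, List

Record = Dict[str, str]            # hypothetical layout: {"input": ..., "output": ...}
Scorer = Callable[[Record], float]


def conciseness(record: Record, max_words: int = 5) -> float:
    """Hypothetical reference-free metric: 1.0 if the output is short enough,
    otherwise a score that shrinks as the output grows longer."""
    n_words = len(record["output"].split())
    return 1.0 if n_words <= max_words else max_words / n_words


def run_eval(records: List[Record], scorer: Scorer) -> float:
    """Average score over a dataset: the offline 'laboratory' measurement."""
    return mean(scorer(r) for r in records)


def worst_examples(records: List[Record], scorer: Scorer, k: int) -> List[Record]:
    """Apply the same metric to production logs and return the k lowest scorers."""
    return sorted(records, key=scorer)[:k]


# Offline: outputs from two prompt variants on the same input (made-up data).
offline_baseline = [
    {"input": "Capital of France?",
     "output": "The capital city of France is, of course, Paris."},
]
offline_candidate = [
    {"input": "Capital of France?", "output": "Paris."},
]

# Online: logged production outputs (made-up data), scored with the same metric.
production_logs = [
    {"input": "Capital of Japan?", "output": "Tokyo."},
    {"input": "Capital of Peru?",
     "output": "Well, the capital of Peru is the city of Lima."},
]

if __name__ == "__main__":
    print("baseline :", run_eval(offline_baseline, conciseness))
    print("candidate:", run_eval(offline_candidate, conciseness))
    # The lowest-scoring production record is a candidate for the offline dataset.
    print("add to dataset:", worst_examples(production_logs, conciseness, k=1))
```

Using a metric that needs no reference answer is a deliberate choice in this sketch: it lets the identical scorer run on production traffic, where expected outputs are usually unknown.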

Industry Endorsement and Braintrust's Role 03:38

  • Tech luminaries, including Kevin Weil, Garry Tan, Mike Krieger, and Greg Brockman, consistently stress the value and necessity of evals.
  • Braintrust aims to build a development platform that supports evals, prompt tweaking, experimentation, and data logging for observability.
  • The platform connects these elements to create a "data flywheel" that helps users achieve their AI goals.

The Core Message 04:54

  • The key to industry transformation and success in AI is evals.