Even top companies like OpenAI face challenges shipping reliable products (e.g., issues with Codex and chatbots)
Examples of public AI failures: Virgin Money’s chatbot flagging the word "virgin" as inappropriate; Google Cloud confusing credits; Grok responding inappropriately to queries
AI product mistakes typically reach public awareness only because the product itself is highly visible; most failures go unnoticed
Unlike traditional apps, which fail with concrete, loggable errors, AI apps require careful monitoring of explicit and implicit user signals
Explicit signals: direct feedback such as thumbs up/down, how much of a response the user copies, stated preferences, and reported errors
Implicit signals: behavior inferred from usage, such as model refusals, task failures, or signs of user frustration (see the capture sketch below)
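A minimal capture sketch in Python, assuming a simple event schema; the names (`FeedbackEvent`, `infer_implicit_signals`) and the keyword heuristics are illustrative, not from the talk:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class FeedbackEvent:
    """One user signal attached to a model response."""
    response_id: str
    kind: str     # "explicit" or "implicit"
    name: str     # e.g. "thumbs_down", "copied_response", "refusal"
    value: float  # normalized: +1 positive, -1 negative
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

# Crude keyword heuristics; a production system would use a trained classifier.
REFUSAL_MARKERS = ("i can't help with", "i'm unable to", "i cannot assist")
FRUSTRATION_MARKERS = ("that's wrong", "not what i asked", "try again")

def infer_implicit_signals(
    response_id: str, model_text: str, next_user_msg: str
) -> list[FeedbackEvent]:
    """Derive implicit signals from the response text and the user's follow-up."""
    events: list[FeedbackEvent] = []
    if any(m in model_text.lower() for m in REFUSAL_MARKERS):
        events.append(FeedbackEvent(response_id, "implicit", "refusal", -1.0))
    if any(m in next_user_msg.lower() for m in FRUSTRATION_MARKERS):
        events.append(FeedbackEvent(response_id, "implicit", "user_frustration", -1.0))
    return events
```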
Signals should be combined with user intent to define, discover, and refine AI issues over time
Staying close to data and user feedback is essential for improving AI products
The Trellis Framework for Scaling AI Products 15:16
Sid introduces Trellis, Oleve’s systematic approach to refining and scaling viral AI products
Trellis’ three core principles: discretization (breaking down the output space into focus areas), prioritization (ranking those areas by business impact), and recursive refinement (continually improving within each area)
The six steps of Trellis: launch an MVP to gather data, classify user intents, convert intents into workflows, prioritize workflows based on metrics, analyze failures, and refine further (steps 2–3 are sketched below)
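A toy sketch of steps 2–3, where classified intents feed workflow candidates; keyword rules stand in for a real classifier here, and all names and rules are hypothetical:

```python
from collections import Counter

# Toy keyword rules; in practice an LLM or trained classifier labels intents.
INTENT_RULES = {
    "summarize": ("summarize", "tl;dr", "shorten"),
    "translate": ("translate", "in spanish", "in french"),
    "rewrite":   ("rewrite", "rephrase", "make this sound"),
}

def classify_intent(prompt: str) -> str:
    p = prompt.lower()
    for intent, keywords in INTENT_RULES.items():
        if any(k in p for k in keywords):
            return intent
    return "other"  # unclassified traffic; revisit the rules when it grows

prompts = [
    "Summarize this article for me",
    "Translate the intro in Spanish",
    "Rewrite my bio",
    "tl;dr of my meeting notes",
]
volume = Counter(classify_intent(p) for p in prompts)
print(volume.most_common())  # [('summarize', 2), ('translate', 1), ('rewrite', 1)]
```

The highest-volume intents become the candidate workflows to formalize in step 3.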
Prioritization should combine workflow volume with negative-sentiment rate and estimated achievable improvement, rather than ranking by usage volume alone (scoring sketch below)
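One way to encode that ranking is a simple product of the three factors; the exact formula is an assumption (the talk names the inputs, not the math):

```python
def workflow_priority(volume: int, negative_rate: float, achievable_gain: float) -> float:
    """Expected impact of fixing a workflow.

    volume: weekly interactions in the workflow
    negative_rate: fraction of interactions with negative signals (0..1)
    achievable_gain: estimated fraction of failures realistically fixable (0..1)
    """
    return volume * negative_rate * achievable_gain

# Volume alone would pick "summarize"; impact-weighting picks "translate".
scores = {
    "summarize": workflow_priority(volume=9_000, negative_rate=0.04, achievable_gain=0.8),  # 288.0
    "translate": workflow_priority(volume=1_200, negative_rate=0.30, achievable_gain=0.9),  # 324.0
}
print(max(scores, key=scores.get))  # translate
```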
Structured, self-contained workflows allow faster iteration and more reliable improvements
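A sketch of what "self-contained" could mean in code, assuming each workflow bundles its own prompt, eval cases, and pass criterion (all names hypothetical), so one workflow can be refined without touching the rest of the product:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Workflow:
    """Self-contained unit: owns its prompt, eval cases, and pass criterion."""
    name: str
    prompt_template: str                # must contain "{input}"
    eval_cases: list[tuple[str, str]]   # (input, expected behavior)
    passes: Callable[[str, str], bool]  # judge: (output, expected) -> pass?

def eval_workflow(wf: Workflow, run_model: Callable[[str], str]) -> float:
    """Pass rate on the workflow's own eval set; refinement loops on this number."""
    results = [
        wf.passes(run_model(wf.prompt_template.format(input=x)), expected)
        for x, expected in wf.eval_cases
    ]
    return sum(results) / len(results)
```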