From Copilot to Colleague: Building Trustworthy Productivity Agents for High-Stakes Work - Joel Hron

Introduction and Shifting Goals 00:00

  • Thomson Reuters' initial focus for AI assistants was on being helpful: referencing information accurately and providing citations.
  • Recently, the priority has shifted from simple helpfulness to productivity, expecting AI to produce output and to make judgments and decisions on behalf of users.
  • High-stakes environments (law, tax, global trade, risk, and fraud) require higher accuracy due to the severe consequences of errors.
  • Thomson Reuters has a long history in these fields, with a strong base of domain expertise and proprietary content.

Background on Thomson Reuters and Approach 01:34

  • The company employs 4,500 domain experts, primarily lawyers, making it the largest employer of lawyers globally.
  • Possesses over 1.5 terabytes of proprietary domain content in legal, tax, compliance, audit, and risk.
  • Invested over $3 billion in acquisitions in recent years and spends over $200 million annually on AI product development.
  • Operates an applied research lab with over 200 scientists and engineers working with development teams.

Evolution Towards Agentic AI 03:16

  • There's a significant industry shift from building "agentic tools for law firms" to constructing "law firms of agents."
  • Agentic AI refers to systems with varying degrees of autonomy, moving beyond just being helpful to taking actions and making decisions.
  • Agency is viewed as a spectrum with adjustable "dials" like autonomy, context, memory, and coordination, depending on risk tolerance and use case.
  • Autonomy dial ranges from performing simple tasks to self-evolving workflows where the AI plans and replans its work dynamically.
  • Context dial evolved from using static model knowledge to integrating multiple and even dynamic knowledge sources.
  • Memory dial importance has grown, with systems needing persistent memory across complex workflows and sessions.
  • Coordination covers the evolution from single-task execution to delegation and multi-agent collaboration.
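The four dials above can be sketched as a simple configuration object. This is an illustrative sketch only; the field names, levels, and the `risk_budget` heuristic are hypothetical placeholders, not Thomson Reuters' actual system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgencyDials:
    # Each dial is a level from 0 (minimal agency) to 3 (maximal agency).
    autonomy: int      # 0 = single fixed task ... 3 = self-evolving workflow
    context: int       # 0 = static model knowledge ... 3 = dynamic sources
    memory: int        # 0 = stateless ... 3 = persistent across sessions
    coordination: int  # 0 = single-task ... 3 = multi-agent collaboration

    def risk_budget(self) -> int:
        """Crude proxy: more total agency implies more human review needed."""
        return self.autonomy + self.context + self.memory + self.coordination

# A high-stakes legal workflow might keep autonomy low but context high.
legal_research = AgencyDials(autonomy=1, context=3, memory=2, coordination=1)
```

The point of the sketch is that the dials are set per use case: a low-risk drafting aid and a high-stakes filing workflow would get different settings, not different architectures.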

Lessons Learned and Evaluation Challenges 07:21

  • Evaluation (eval) of AI systems is both complex and crucial for building trust, and AI's non-deterministic nature makes it harder still.
  • Both users and internal teams struggle with inconsistency in evaluation; human experts’ assessments can vary by over 10% on identical cases.
  • Referencing reliable source material grows harder as agency increases; tracing decision drift and building robust guardrails requires deep domain expertise.
  • Rigorous evaluation rubrics are developed, but user preference remains a key north star in assessing if systems are improving.
  • Legacy applications, often dismissed as outdated, are now valuable as decomposable assets: agents can use their highly tuned domain logic as tools.
  • The concept of MVP (minimum viable product) can be limiting; building out the whole system first often yields better understanding and results.
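The inconsistency lesson above can be made concrete with a small sketch: two experts grade the same answers on a 0-1 rubric scale, and we measure how often they disagree by more than a tolerance. The function and the sample scores are hypothetical, chosen only to mirror the over-10% variance mentioned in the talk.

```python
def disagreement_rate(scores_a, scores_b, tolerance=0.1):
    """Fraction of paired rubric scores that differ by more than `tolerance`."""
    pairs = list(zip(scores_a, scores_b))
    disagreements = sum(1 for a, b in pairs if abs(a - b) > tolerance)
    return disagreements / len(pairs)

# Two experts scoring the same four answers (illustrative values).
expert_1 = [0.9, 0.80, 1.0, 0.7]
expert_2 = [0.7, 0.85, 1.0, 0.9]
rate = disagreement_rate(expert_1, expert_2)  # 0.5: half the pairs differ by >0.1
```

If human experts themselves disagree at this rate, a rubric score alone cannot be the north star, which is why user preference remains part of the assessment.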

Product Demonstrations and Application Examples 12:01

  • Demonstration of a tax workflow where AI automates document ingestion, data extraction, mapping to tax engines, applying tax law rules, validating results, and generating returns.
  • Success in automation is due to legacy tools (tax engines, validation engines) which AI can leverage for calculations and error checking.
  • Legal research application: AI uses litigation research tools to search, validate, and compare legal documents, statutes, citations, and blogs.
  • The AI tracks its evidence, writes intermediate notes, and compiles a final report with explicit, traceable citations and risk flags.
  • These examples highlight the benefits of decomposing legacy systems and building full-fledged products that agents can operate within.
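The tax demo above can be caricatured as a short pipeline in which legacy engines are wrapped as callable tools for the agent. Everything here is a stand-in: the function names, the flat 20% rate, and the negative-amount check are invented for illustration and bear no relation to the real tax or validation engines.

```python
def run_tax_workflow(documents):
    """Chain the demo's steps: ingest, compute via a legacy engine, validate."""
    # Ingestion/extraction stub: each document yields one structured record.
    records = [{"source": doc, "amount": amt} for doc, amt in documents]

    # Legacy tax engine wrapped as a tool (flat 20% rate as a placeholder).
    def tax_engine(record):
        return round(record["amount"] * 0.20, 2)

    computed = [{**r, "tax": tax_engine(r)} for r in records]

    # Legacy validation engine as a second tool: flag impossible inputs.
    errors = [r for r in computed if r["amount"] < 0]
    return {"returns": computed, "errors": errors}

result = run_tax_workflow([("w2.pdf", 50000.0), ("1099.pdf", -100.0)])
```

The design point is the one the talk makes: the agent does not reimplement tax math or validation; it orchestrates existing, highly tuned engines and inherits their correctness.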

Key Takeaways and Strategy 16:09

  • Start by considering the entire problem and solution space when building agentic AI systems, rather than focusing on minimal components.
  • Agency should be dynamically adjusted depending on the use case’s risk and the user's tolerance.
  • Break legacy systems down into components that agentic systems can leverage as tools.
  • Maintain human-in-the-loop evaluation for trust, especially in high-stakes and specialized domains.
  • Differentiate products by capitalizing on unique assets, such as proprietary data and deep domain expertise.

Q&A: Security and Compliance 17:57

  • Question addressed on how Thomson Reuters ensures security and compliance, especially for regulated or government-related clients.
  • Emphasis on compliance with government and industry standards (e.g., FedRAMP, ISO).
  • Security posture is described as adaptive, ensuring alignment with rapidly evolving standards.
  • Detailed technical documentation is available for further specifics on architecture and compliance.