Vendor contracts are expensive and can lead to vendor/model lock-in, making upgrades challenging even within the same vendor
LLM latency challenges: response times run to several seconds and get harder to manage as users' tax information becomes more detailed, especially near tax deadlines
Product and fallback designs are implemented to ensure a seamless and user-friendly experience even under high latency
Rigorous evaluation (eval) processes are critical for launching and maintaining quality and regulatory compliance
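The latency fallback mentioned above can be illustrated with a timeout wrapper. This is a minimal sketch, not Intuit's implementation: `slow_llm_call`, `answer_with_fallback`, the fallback message, and the timeout values are all hypothetical names and numbers chosen for illustration.

```python
import asyncio

# Hypothetical fallback text shown when the model is too slow.
FALLBACK_MESSAGE = (
    "Your explanation is taking longer than usual. "
    "Here is a summary of your figures instead."
)


async def slow_llm_call(prompt: str, delay_s: float) -> str:
    # Stand-in for a real LLM request; delay_s simulates model latency.
    await asyncio.sleep(delay_s)
    return f"LLM answer for: {prompt}"


async def answer_with_fallback(prompt: str, delay_s: float, timeout_s: float) -> str:
    # Return the LLM answer if it arrives within timeout_s seconds,
    # otherwise degrade to the static fallback message.
    try:
        return await asyncio.wait_for(slow_llm_call(prompt, delay_s), timeout=timeout_s)
    except asyncio.TimeoutError:
        return FALLBACK_MESSAGE


if __name__ == "__main__":
    print(asyncio.run(answer_with_fallback("Why did my refund change?", 0.01, 1.0)))
    print(asyncio.run(answer_with_fallback("Why did my refund change?", 0.5, 0.01)))
```

The design choice here is that the user always gets a response within a bounded time; what the fallback content should be is a product decision the notes do not specify.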
Q&A: Evaluation Methods and Workflow Integration 12:20
Evaluation types vary by development phase: manual evaluation by tax experts for initial baselining, automated evaluation (LLM as judge) for ongoing prompt tweaks, and manual review again for major changes
User interactions with the LLM include product questions (how to use TurboTax features) and tax-specific questions (e.g., how to claim tuition payments)
Intuit uses different system components to route/plan the proper solution for each question type
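The mapping from development phase to evaluation method described above can be written down as a small lookup. This is a sketch under stated assumptions: the enum names and `select_eval` are hypothetical, and only the three phase-to-method pairings come from the talk.

```python
from enum import Enum


class ChangeType(Enum):
    # Hypothetical labels for the three phases named in the notes.
    INITIAL_BASELINE = "initial_baseline"
    PROMPT_TWEAK = "prompt_tweak"
    MAJOR_CHANGE = "major_change"


class EvalMethod(Enum):
    MANUAL_EXPERT = "manual_expert"   # review by tax experts
    LLM_AS_JUDGE = "llm_as_judge"     # automated evaluation


# Pairings as stated in the notes; everything else is illustrative.
EVAL_POLICY = {
    ChangeType.INITIAL_BASELINE: EvalMethod.MANUAL_EXPERT,
    ChangeType.PROMPT_TWEAK: EvalMethod.LLM_AS_JUDGE,
    ChangeType.MAJOR_CHANGE: EvalMethod.MANUAL_EXPERT,
}


def select_eval(change: ChangeType) -> EvalMethod:
    # Look up which evaluation method applies to a given kind of change.
    return EVAL_POLICY[change]


if __name__ == "__main__":
    for change in ChangeType:
        print(change.value, "->", select_eval(change).value)
```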
Q&A: Data Integrity, Personalization, and Safety 15:12
All numeric tax data comes from Intuit’s proprietary tax engine; LLMs do not perform calculations, ensuring ground-truth accuracy
Security systems and guardrails prevent hallucinated numbers from being included in user explanations
ML models are used to verify the accuracy of numbers in the final user-facing output
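One way to picture a guardrail against hallucinated numbers is a check that every figure in the generated explanation also appears in the tax engine's output. The talk says ML models perform this verification; the sketch below swaps in a plain regular-expression check for illustration. `extract_numbers`, `unverified_numbers`, and the sample values are hypothetical.

```python
import re
from decimal import Decimal

# Matches figures such as 4000, 4,000, or $1,250.00 (the "$" is not captured).
NUMBER_RE = re.compile(r"\$?(\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?)")


def extract_numbers(text: str) -> list[Decimal]:
    # Pull every numeric literal out of the text, dropping thousands separators.
    return [Decimal(m.replace(",", "")) for m in NUMBER_RE.findall(text)]


def unverified_numbers(explanation: str, engine_values: list[str]) -> list[Decimal]:
    # Return numbers in the explanation that the tax engine did not produce.
    # Note: this naive check flags ANY unknown number, including years or counts.
    allowed = {Decimal(str(v)) for v in engine_values}
    return [n for n in extract_numbers(explanation) if n not in allowed]


if __name__ == "__main__":
    engine = ["1250.00", "4000"]  # hypothetical tax-engine output
    ok = "Your refund is $1,250.00 based on a $4,000 deduction."
    bad = "Your refund is $1,300 based on a $4,000 deduction."
    print(unverified_numbers(ok, engine))
    print(unverified_numbers(bad, engine))
```

An explanation would be shown to the user only when the returned list is empty; a non-empty list signals a number that cannot be traced back to the engine.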
Q&A: RAG Approaches, Personalization, and Future Models 16:42
Hybrid use of standard RAG and GraphRAG, with GraphRAG providing better, more personalized answers for users
Ongoing evaluation of new LLMs and custom in-house models; adoption decisions have not yet been made
Q&A: Legal, Privacy, and Explanation Traceability 17:53
All complex tax answers are based on data from the tax engine, with prompts crafted and tested by tax experts
Legal and privacy controls are strictly enforced to prevent regulatory errors or legal issues
Explanations are constructed using validated systems to ensure traceability and correctness