Forget RAG Pipelines—Build Production Ready Agents in 15 Mins: Nina Lopatina, Rajiv Shah, Contextual

Introduction to Contextual AI 00:37

  • Rajiv Shah, Chief Evangelist, and Nina Lopatina, who works in NLP and language modeling, introduce Contextual AI; they are joined by platform engineer Matthew and solution architect John.
  • Contextual AI aims to simplify RAG (Retrieval-Augmented Generation) by offering it as a managed service, much as most teams no longer train their own large language models or build their own vector databases.
  • The company's founders, Douwe Kiela and Amanpreet Singh, were involved in the original RAG paper and focused on the challenges of scaling RAG beyond simple demos, particularly around accuracy and document diversity.

Contextual AI Platform Overview 04:20

  • RAG is crucial for understanding unstructured enterprise data, with a simple pipeline involving a vector database, cosine similarity, and an LLM.
  • The Contextual AI platform is designed for different user levels: a no-code option for business users and an orchestratable platform for developers to fine-tune RAG components.
  • The system is modular, allowing users to leverage specific components like extraction or reranking, with Contextual AI solely focused on building RAG solutions.
  • The platform addresses the complexities of production RAG, which involves orchestrating multiple models (BM25, rerankers) and can become time-consuming and costly.
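
The "simple pipeline" described above (vector database, cosine similarity, LLM) can be sketched in a few lines of plain Python. The vectors and document IDs below are toy stand-ins; in practice an embedding model produces the vectors and the retrieved chunks are inserted into the LLM prompt.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, store, k=2):
    """Rank (doc_id, vector) pairs in the store by similarity to the query."""
    ranked = sorted(store, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

# Toy vector store; real embeddings are high-dimensional model outputs.
store = [
    ("q1_revenue", [0.9, 0.1, 0.0]),
    ("q2_revenue", [0.8, 0.2, 0.1]),
    ("spurious_correlations", [0.0, 0.1, 0.9]),
]

top = retrieve([1.0, 0.0, 0.0], store, k=2)
# The retrieved chunks would be stuffed into the LLM prompt as grounding context.
prompt = "Answer using only these sources: " + ", ".join(top)
```

Production RAG replaces each of these toy pieces with a real component, which is exactly the orchestration burden the platform is meant to absorb.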

Hands-on Agent Setup and Document Ingestion 06:20

  • Users can access the notebook and getting started guide at contextual.ai/25 and the GUI at app.contextual.ai.
  • The demonstration involves loading financial statements from Nvidia and "spurious correlations" documents to test the RAG system's data handling.
  • The setup process includes signing up, creating a workspace, obtaining an API key (the only one needed), setting up the client, and creating a data store.
  • Documents are downloaded and uploaded to the data store, where they are processed, parsed, and ingested, with the ability to inspect how tables and images are extracted with high accuracy.
  • Spurious correlation documents are used to show how the RAG system prioritizes information from loaded documents over conventional wisdom, helping to test for hallucinations.
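
The setup steps above (client, data store, upload) might look roughly like the following with the contextual-client Python SDK. The method names follow the SDK's documented pattern but are assumptions here; the authoritative version is the getting-started notebook at contextual.ai/25.

```python
# Sketch of the demo's setup flow; assumes `pip install contextual-client`.
# Method names follow the SDK's documented pattern and may differ slightly
# from the notebook -- treat this as an outline, not a verified script.

def ingest_documents(api_key, datastore_name, paths):
    """Create a data store and upload each local file for parsing/ingestion."""
    from contextual import ContextualAI  # assumed import path

    client = ContextualAI(api_key=api_key)  # the one API key you need
    datastore = client.datastores.create(name=datastore_name)
    doc_ids = []
    for path in paths:
        with open(path, "rb") as f:
            result = client.datastores.documents.ingest(datastore.id, file=f)
        doc_ids.append(result.id)
    return datastore.id, doc_ids

# Example (requires a real key from app.contextual.ai):
# ds_id, docs = ingest_documents("key-...", "financial-demo",
#                                ["nvidia_financials.pdf", "spurious_correlations.pdf"])
```

Once ingestion finishes, the GUI lets you inspect how each table and image was extracted before querying.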

Deep Dive into Platform Components 14:19

  • Building RAG systems involves several complex tasks: extraction pipeline, chunking strategies, reranking, and scaling.
  • Contextual AI provides an end-to-end platform available as SaaS or in a user's VPC, offering both UI and REST API endpoints.
  • The document understanding pipeline extracts information (tables, images, document structure, metadata), chunks it, and sets bounding boxes for attribution.
  • The retrieval process uses a mixture of retrievers (BM25, embedding model) and a proprietary state-of-the-art reranker.
  • Contextual AI trains its own grounded language model, specifically fine-tuned to respect provided context and avoid generating its own knowledge.
  • The platform's components are built with academic rigor, consistently achieving state-of-the-art performance on benchmarks for document understanding, retrieval, and grounded generation.
  • Use cases extend beyond simple Q&A chatbots (e.g., Qualcomm's website), including automated financial workflows and integrations with tools like Claude via an MCP server.
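
One common, generic way to combine the BM25 and embedding retriever rankings mentioned above before reranking is reciprocal rank fusion; this is a standard-technique sketch, not Contextual AI's proprietary fusion method.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked doc-id lists; the constant k damps top-rank dominance."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical hit lists from the two retrievers.
bm25_hits = ["doc_tables", "doc_revenue", "doc_charts"]
embedding_hits = ["doc_revenue", "doc_footnotes", "doc_tables"]

fused = reciprocal_rank_fusion([bm25_hits, embedding_hits])
# A learned reranker would then rescore this fused candidate list.
```

Documents that rank high in both lists (doc_revenue here) float to the top, which is why fusion is a useful pre-filter before an expensive reranker.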

Agent Querying and Evaluation 19:14

  • Users can access and query documents via API or GUI, with agents capable of quantitative reasoning (e.g., summing quarterly Nvidia revenue) and citing sources from multiple documents.
  • When queried about spurious correlations, the agent provides the statistical correlation but also includes caveats from the document, demonstrating its focus on avoiding hallucinations and adhering strictly to provided data.
  • A comparison with ChatGPT shows that Contextual AI's agent presents the document's information even when it contradicts conventional wisdom, whereas general-purpose LLMs may push back or explain the spurious correlations in less detail.
  • Agents can be customized through an admin panel to adjust settings like linking data stores, system prompts, query understanding (multi-turn, expansion), retrieval, reranking, and prompt generation.
  • Contextual AI offers an LMUnit model for natural language unit testing of RAG systems, allowing users to define specific test questions (e.g., accuracy, causation, relevance) and evaluate responses on a 1-5 scale.
  • An example query about Nvidia's revenue showed a low score for "avoid unnecessary information," indicating areas for prompt refinement.
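
The LMUnit workflow described above can be sketched as a small harness: each unit test is a plain-English criterion scored 1-5 per response, and low scores flag prompts that need refinement. The scorer below is a toy keyword stub standing in for the actual LMUnit model call.

```python
# Natural-language unit testing sketch; `stub_score` is a toy stand-in
# for a call to Contextual AI's LMUnit model.

UNIT_TESTS = [
    "Is the response factually accurate per the documents?",
    "Does the response avoid unnecessary information?",
]

def stub_score(response, criterion):
    """Toy scorer: penalize verbose answers on the 'unnecessary info' test."""
    if "unnecessary" in criterion and len(response.split()) > 30:
        return 2.0
    return 5.0

def evaluate(response, tests, threshold=3.0):
    """Score a response against every unit test and collect the failures."""
    scores = {t: stub_score(response, t) for t in tests}
    failures = [t for t, s in scores.items() if s < threshold]
    return scores, failures

long_answer = "The quarterly figures sum correctly. " + "Here is some extra context. " * 10
scores, failures = evaluate(long_answer, UNIT_TESTS)
# The failing test mirrors the demo's low "avoid unnecessary information" score.
```

Running the same battery of criteria after every prompt change gives a repeatable signal for iterating on agent accuracy.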

Integration and Pricing 47:51

  • Contextual AI agents can integrate with other MCP clients like Claude Desktop and Cursor, enabling users to leverage their RAG agents within other applications.
  • A GitHub repository provides the Contextual MCP server for easy integration, requiring users to configure their client to point to the server.
  • Pricing for individual components (parse, rerank, generate, LMUnit) is consumption-based (pay-per-token), with new sign-ups receiving a $25 credit.
  • The full RAG platform will also adopt consumption-based pricing, calculated by documents ingested and queries performed.
  • Provisioned throughput is available for enterprise customers requiring latency or queries-per-second guarantees.
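
Pointing an MCP client at the Contextual server is mostly a configuration change. For Claude Desktop, the entry goes under the documented `mcpServers` key in `claude_desktop_config.json`; the command, path, and environment variable name below are illustrative assumptions, so check the GitHub repository's README for the actual entrypoint.

```json
{
  "mcpServers": {
    "contextual": {
      "command": "python",
      "args": ["/path/to/contextual-mcp-server/server.py"],
      "env": { "CONTEXTUAL_API_KEY": "key-..." }
    }
  }
}
```

After restarting the client, the RAG agent's tools appear alongside the model's built-in capabilities.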

Q&A Highlights 54:22

  • System Prompt & Eval Changes: Business users might adjust system prompts, but advanced settings require a developer's understanding for error analysis and hill-climbing accuracy.
  • Diverse Data Types: Contextual AI provides customer-facing machine learning engineers to help enterprises build evaluation datasets and improve agent performance across document types (PDF, Excel, structured data).
  • Modular Integration: The platform's components are modular with APIs and SDKs (including JavaScript), allowing developers to integrate specific parts into their custom applications.
  • Data Sovereignty: Contextual AI offers SaaS, Snowflake partnership, and VPC installs, but currently does not support custom on-premises deployments or specific government clouds like AWS GovCloud, though open to demand.
  • Scalability: The platform is designed to scale with customers, handling tens of thousands of complex documents with hundreds of pages.
  • LMUnit Determinism: LMUnit scoring is designed to be repeatable and consistent, with fixed random seeds used during testing.
  • Challenges & Future of RAG: Key challenges include complex document extraction (tables, multimodal charts), scalability, and handling structured data queries (text-to-SQL). The future involves dynamic workflows, model routers, and advanced tool use for deeper research capabilities.
  • Document Updates & Permissions: Contextual AI supports continuous ingestion for frequently updated content and is developing an entitlements layer to manage document-level permissions and governance for sensitive data like PHI.
  • Conflicting Information: The retrieval process brings all relevant information to the grounded language model, which attempts to reason through discrepancies; metadata (recency, authority) can help prioritize answers.