open-rag-eval: RAG Evaluation without "golden" answers — Ofer Mendelevitch, Vectara

Introduction to Open RAG Eval 00:01

  • Ofer Mendelevitch introduces Open RAG Eval, an open-source project designed for quick and scalable RAG evaluation.
  • The project addresses the challenge that conventional RAG evaluation requires "golden" (reference) answers, an approach that does not scale.

Architecture Overview 00:25

  • The evaluation process begins with a set of queries collected for the RAG system.
  • A RAG connector gathers the actual chunks and answers generated by the RAG pipeline and works with various frameworks such as Vectara, LangChain, and LlamaIndex (a minimal connector sketch follows).
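
As a rough illustration of the connector idea, the sketch below shows one way to capture, per query, the retrieved chunks and the generated answer so they can later be handed to the evaluators. The names here (RAGResult, RAGConnector, run, collect_results) are hypothetical and are not the open-rag-eval API; a real adapter for LangChain or LlamaIndex would implement run() by calling that framework's retriever and generation chain.

```python
# Hypothetical sketch, not the open-rag-eval API: shows the general shape of a
# connector that records, for each query, what the RAG pipeline retrieved and answered.
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class RAGResult:
    query: str
    chunks: list[str]                 # retrieved passages, in ranked order
    answer: str                       # generated response
    citations: dict[int, str] = field(default_factory=dict)  # statement index -> chunk id


class RAGConnector(Protocol):
    """Anything that can run a query through a RAG pipeline and report what happened."""
    def run(self, query: str) -> RAGResult: ...


def collect_results(connector: RAGConnector, queries: list[str]) -> list[RAGResult]:
    """Run every query through the pipeline and keep chunks + answers for evaluation."""
    return [connector.run(q) for q in queries]
```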

Evaluation Metrics 01:36

  • Evaluation metrics are grouped into evaluators, which generate files necessary for assessing the RAG pipeline.
  • Key metrics include:
    • UMBRELA: A retrieval metric in which an LLM judge scores each retrieved chunk from 0 to 3 based on its relevance to the query; these scores correlate well with human judgment (see the sketch after this list).
    • AutoNuggetizer: A generation metric that creates atomic units of information called "nuggets," which an LLM judge then rates and checks for support in the RAG response.
    • Citation Faithfulness: Measures whether the citations in the response are accurate, i.e., whether the cited chunks actually support the statements they are attached to.
    • Hallucination Detection: Uses Vectara's hallucination detection model (HHEM) to check that the response stays consistent with the retrieved content.
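
To make the retrieval metric concrete, here is a minimal sketch of UMBRELA-style scoring: an LLM judge grades each retrieved chunk on a 0-3 relevance scale and the grades are averaged into a per-query score. The prompt wording and the `judge` callable are assumptions for illustration, not the actual open-rag-eval implementation; the generation-side metrics follow the same judge-based pattern.

```python
# Illustrative sketch of UMBRELA-style 0-3 relevance grading with an LLM judge.
# The prompt text and the `judge` callable are assumptions, not open-rag-eval code.
from statistics import mean
from typing import Callable

UMBRELA_STYLE_PROMPT = """Given a query and a passage, grade the passage on a 0-3 scale:
0 = irrelevant, 1 = related but does not answer, 2 = partially answers, 3 = fully answers.
Query: {query}
Passage: {chunk}
Answer with a single digit."""


def score_retrieval(query: str, chunks: list[str], judge: Callable[[str], str]) -> float:
    """Average the judge's 0-3 relevance grade over all retrieved chunks for one query."""
    grades = []
    for chunk in chunks:
        reply = judge(UMBRELA_STYLE_PROMPT.format(query=query, chunk=chunk))
        digits = [c for c in reply if c in "0123"]
        grades.append(int(digits[0]) if digits else 0)  # default to 0 on unparsable output
    return mean(grades) if grades else 0.0
```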

User Interface and Conclusion 04:05

  • A user-friendly interface allows users to drag and drop evaluation files for analysis.
  • The tool aims for transparency in how its metrics are computed and welcomes contributions of additional RAG pipeline connectors.
  • Mendelevitch encourages exploration of the package, emphasizing its potential for optimizing RAG pipelines.