The evaluation process begins with a set of queries collected for the RAG system.
A RAG connector gathers the chunks actually retrieved and the answers actually generated by the RAG pipeline, with connectors available for frameworks such as Vectara, LangChain, and LlamaIndex.
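To make the data flow concrete, here is a minimal sketch of what a connector might collect per query. The `RetrievedChunk` and `RAGResult` dataclasses and the `pipeline.retrieve` / `pipeline.generate` calls are illustrative assumptions, not any framework's actual API.

```python
from dataclasses import dataclass

@dataclass
class RetrievedChunk:
    text: str           # chunk content returned by the retriever
    doc_id: str         # source document identifier
    score: float = 0.0  # the retriever's own score, if it exposes one

@dataclass
class RAGResult:
    query: str
    chunks: list[RetrievedChunk]  # what the pipeline retrieved for the query
    answer: str                   # what the pipeline generated from those chunks

def collect_results(queries: list[str], pipeline) -> list[RAGResult]:
    """Run each query through a RAG pipeline and capture chunks plus answer.

    `pipeline` is assumed to expose retrieve(query) and generate(query, chunks);
    a real connector adapts this step to the target framework's own API.
    """
    results = []
    for q in queries:
        chunks = [RetrievedChunk(c.text, c.doc_id, c.score) for c in pipeline.retrieve(q)]
        answer = pipeline.generate(q, chunks)
        results.append(RAGResult(query=q, chunks=chunks, answer=answer))
    return results
```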
Evaluation metrics are grouped into evaluators, which run the metrics and produce the output files used to assess the RAG pipeline.
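A hedged sketch of how an evaluator might group metrics and write one results file, reusing the hypothetical `RAGResult` dataclass from the sketch above. The `Metric` callable signature and the CSV output format are assumptions for illustration only.

```python
import csv
from typing import Callable

# A metric maps one RAGResult to a dict of named scores, e.g. {"umbrela": 2.0}.
Metric = Callable[[RAGResult], dict[str, float]]

class Evaluator:
    """Groups related metrics and writes their per-query scores to one output file."""

    def __init__(self, metrics: list[Metric]):
        self.metrics = metrics

    def run(self, results: list[RAGResult], out_path: str) -> None:
        rows = []
        for result in results:
            row = {"query": result.query}
            for metric in self.metrics:
                row.update(metric(result))  # merge each metric's scores into the row
            rows.append(row)
        if not rows:
            return
        with open(out_path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
```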
Key metrics include:
UMBRELA: A retrieval metric in which an LLM judge grades each retrieved chunk from 0 to 3 for relevance to the query; these grades correlate well with human judgment (a minimal scoring sketch follows this list).
AutoNuggetizer: A generation metric that breaks the expected answer into atomic units called "nuggets," rates their importance, and has an LLM judge check whether each nugget is supported by the RAG response (see the nugget-support sketch after this list).
Citation Faithfulness: Measures whether the citations attached to the response actually support the statements they are cited for.
Hallucination Detection: Uses Vectara's hallucination detection model to check that the response is grounded in the retrieved content.
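As referenced in the UMBRELA item above, here is a minimal sketch of that style of retrieval scoring: an LLM judge grades each chunk on a 0-3 scale and the grades are averaged per query. The prompt wording and the `llm_judge(prompt) -> str` callable are assumptions, not the toolkit's actual implementation.

```python
UMBRELA_STYLE_PROMPT = """\
Rate how well the passage answers the query on a 0-3 scale:
0 = unrelated, 1 = related but does not answer it,
2 = partially answers it, 3 = directly and completely answers it.
Query: {query}
Passage: {passage}
Reply with a single digit."""

def score_retrieval(query: str, chunks: list[str], llm_judge) -> float:
    """Average 0-3 relevance grade over the retrieved chunks for one query.

    `llm_judge(prompt) -> str` stands in for whatever LLM client is used.
    """
    grades = []
    for chunk in chunks:
        reply = llm_judge(UMBRELA_STYLE_PROMPT.format(query=query, passage=chunk))
        digits = [ch for ch in reply if ch in "0123"]
        grades.append(int(digits[0]) if digits else 0)  # treat unparseable replies as 0
    return sum(grades) / len(grades) if grades else 0.0
```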
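And a similarly hedged sketch of the support-checking step of an AutoNuggetizer-style metric: given a list of already-extracted nuggets, an LLM judge labels each one as supported, partially supported, or not supported by the response. The labels, prompt, and the two reported fractions are illustrative choices, not the metric's exact scoring rules.

```python
def nugget_support_scores(nuggets: list[str], answer: str, llm_judge) -> dict[str, float]:
    """Label each nugget as supported / partially supported / not supported by the
    RAG response, then report the supported fractions.

    `llm_judge(prompt) -> str` is a placeholder for the LLM client actually used.
    """
    prompt = ("Does the answer below support the statement? Reply with exactly one of: "
              "support, partial_support, not_support.\n"
              "Statement: {nugget}\nAnswer: {answer}")
    full = partial = 0
    for nugget in nuggets:
        label = llm_judge(prompt.format(nugget=nugget, answer=answer)).strip().lower()
        if label.startswith("support"):
            full += 1
        elif label.startswith("partial"):
            partial += 1
    n = len(nuggets)
    return {
        "nugget_support_strict": full / n if n else 0.0,               # fully supported only
        "nugget_support_lenient": (full + partial) / n if n else 0.0,  # partial counts too
    }
```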