HybridRAG: A Fusion of Graph and Vector Retrieval to Enhance Data Interpretation - Mitesh Patel

Introduction and Purpose of Talk 00:00

  • Mitesh Patel, leader of the Nvidia developer advocate team, introduces himself and his team's mission of creating and sharing technical workflows and codebases on GitHub for developer use.
  • The talk focuses on building a GraphRAG system, its advantages, and the benefits of a hybrid approach combining graph and vector retrieval.
  • The presentation provides a high-level overview rather than an in-depth code exploration; all related resources are available on GitHub.

Knowledge Graph Basics and Advantages 01:21

  • Knowledge graphs represent relationships between entities (such as people, places, concepts, events).
  • These relationships (edges) between entities give a detailed, comprehensive view of knowledge that can surpass semantic-only retrieval in certain cases (see the triplet sketch after this list).
  • Knowledge graphs can organize data from multiple sources, making them useful for rich data retrieval and interpretation.
  • Creation of a GraphRAG or hybrid system involves four main components: data, data processing, graph creation (or semantic embedding), and inferencing (querying and user response).
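
To make the entity-relationship idea concrete, here is a minimal sketch that stores entity-relation-entity triplets as a labeled directed graph; NetworkX and the sample triplets are used purely for illustration and are not from the talk's codebase.

```python
import networkx as nx

# Hypothetical triplets: (subject entity, relation, object entity)
triplets = [
    ("NVIDIA", "develops", "cuGraph"),
    ("cuGraph", "accelerates", "graph analytics"),
    ("Mitesh Patel", "works_for", "NVIDIA"),
]

# Store each triplet as a labeled edge in a directed graph.
kg = nx.DiGraph()
for subj, relation, obj in triplets:
    kg.add_edge(subj, obj, relation=relation)

# Every edge now carries its relationship as metadata.
for subj, obj, data in kg.edges(data=True):
    print(f"{subj} --{data['relation']}--> {obj}")
```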

Building a GraphRAG System: Workflow and Components 03:28

  • The pipeline is split into offline (data processing, graph/vector creation) and online (querying/inferencing) stages.
  • Data is processed and used to create both a semantic vector database (via document chunking and embedding) and a knowledge graph (via entity and relationship extraction).
  • The process for building a knowledge graph involves LLM-assisted extraction of entity-relation-entity triplets from unstructured documents (see the extraction sketch after this list).
  • Ontology design and prompt engineering are critical for accurate, low-noise triplet extraction; this step often requires iteration and occupies most of the development time.
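
A rough sketch of LLM-assisted triplet extraction with an ontology-constrained prompt. It assumes an OpenAI-compatible chat endpoint; the model name, ontology, and prompt wording are placeholders rather than the talk's actual implementation.

```python
import json
from openai import OpenAI  # any OpenAI-compatible endpoint works

client = OpenAI()  # assumes the API key/endpoint are configured via environment variables

# A small ontology constrains which entity and relation types are allowed,
# which keeps the extracted graph from filling up with noisy triplets.
ONTOLOGY = {
    "entities": ["Person", "Organization", "Product", "Location"],
    "relations": ["works_for", "develops", "located_in", "partners_with"],
}

PROMPT = """Extract knowledge-graph triplets from the text below.
Use only these entity types: {entities}
Use only these relation types: {relations}
Return a JSON list of [subject, relation, object] triplets and nothing else.

Text:
{text}"""

def extract_triplets(text: str) -> list[list[str]]:
    response = client.chat.completions.create(
        model="meta/llama-3.1-70b-instruct",  # placeholder model name
        messages=[{"role": "user", "content": PROMPT.format(
            entities=", ".join(ONTOLOGY["entities"]),
            relations=", ".join(ONTOLOGY["relations"]),
            text=text,
        )}],
        temperature=0.0,
    )
    return json.loads(response.choices[0].message.content)
```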

Vector Database and Embeddings 07:50

  • Creating a semantic vector database involves chunking documents, choosing chunk size and overlap, and embedding each chunk for storage in a vector database (a chunking-and-embedding sketch follows this list).
  • Chunk overlap is important to maintain contextual continuity between document segments.
  • Semantic vector retrieval is straightforward, but does not naturally capture detailed entity relationships as graph-based systems do.
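
A minimal chunking-and-embedding sketch in plain Python; the chunk size, overlap, embedding model, and input file below are illustrative assumptions, not values from the talk.

```python
from sentence_transformers import SentenceTransformer

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character chunks that share an overlap
    so context is not lost at chunk boundaries."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# Placeholder embedding model; any embedding service could be used instead.
model = SentenceTransformer("all-MiniLM-L6-v2")

document = open("example.txt").read()  # hypothetical input document
chunks = chunk_text(document)
embeddings = model.encode(chunks)      # one vector per chunk

# Each (chunk, vector) pair would then be written to a vector database.
print(len(chunks), embeddings.shape)
```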

Retrieval and Graph Traversal Strategies 09:01

  • Retrieving information from a knowledge graph can involve single-hop or multi-hop traversal strategies to exploit relationships across nodes (see the traversal sketch after this list).
  • Deeper graph traversal can provide better context but increases latency, creating a need to balance depth and performance.
  • Graph traversal latency can be reduced with libraries that accelerate graph search (e.g., RAPIDS cuGraph integrated with NetworkX).
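
A sketch of hop-limited retrieval over a NetworkX graph, where the hop depth is the knob that trades context richness against latency; the function and entity names are placeholders.

```python
import networkx as nx

def multi_hop_context(kg: nx.DiGraph, entity: str, hops: int = 2) -> list[str]:
    """Collect relationship statements reachable within `hops` edges of an entity.
    hops=1 is single-hop retrieval; larger values widen context but raise latency."""
    subgraph = nx.ego_graph(kg, entity, radius=hops)
    return [
        f"{subj} {data.get('relation', 'related_to')} {obj}"
        for subj, obj, data in subgraph.edges(data=True)
    ]

# Usage with the toy graph built earlier: multi_hop_context(kg, "NVIDIA", hops=2)
```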

Evaluation and Optimization 11:16

  • Multiple metrics are used to evaluate retrieval systems, such as faithfulness, answer relevancy, precision, recall, coherence, and verbosity.
  • The Ragas library provides end-to-end evaluation of RAG pipelines, using LLMs to assess query, retrieval, and response quality; it supports customizable models and parameters (a minimal Ragas sketch follows this list).
  • LLM reward models, like Nemotron (a 340-billion-parameter reward model), can judge the quality of LLM outputs along several dimensions.
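
A minimal Ragas sketch; the imports and dataset columns follow the Ragas 0.1-style API (which may differ across versions), and the sample row is a placeholder.

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_precision, context_recall

# One row per evaluated query: the question, the retrieved contexts,
# the generated answer, and a reference ground-truth answer.
rows = {
    "question": ["Who develops cuGraph?"],
    "contexts": [["cuGraph is a GPU-accelerated graph analytics library from NVIDIA."]],
    "answer": ["cuGraph is developed by NVIDIA."],
    "ground_truth": ["NVIDIA develops cuGraph."],
}

results = evaluate(
    Dataset.from_dict(rows),
    metrics=[faithfulness, answer_relevancy, context_precision, context_recall],
)
print(results)  # per-metric scores for the RAG pipeline
```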

Practical Strategies and Fine-Tuning 13:44

  • The 80/20 rule applies: initial system development is quick, but optimization for production quality requires significant iterative effort.
  • Success depends heavily on the quality of the knowledge graph: fine-tuning LLMs, cleaning data (removing extraneous characters; see the cleanup sketch after this list), and tuning output length all improve triplet extraction quality.
  • Experiments show that cleaning data and fine-tuning models (e.g., using Llama 3.3 + LoRA) can raise knowledge graph accuracy from 71% to 87% (measured on 100 documents).
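
A small illustration of the kind of pre-extraction cleanup mentioned above; the specific regex patterns are assumptions, since the actual cleaning rules depend on the corpus.

```python
import re

def clean_document(text: str) -> str:
    """Strip markup remnants and stray characters that tend to pollute
    LLM-extracted triplets."""
    text = re.sub(r"<[^>]+>", " ", text)          # drop leftover HTML/XML tags
    text = re.sub(r"[^\w\s.,;:()\-]", " ", text)  # drop unusual symbols
    text = re.sub(r"\s+", " ", text)              # collapse repeated whitespace
    return text.strip()

# cleaned = clean_document(raw_document)  # then run triplet extraction on `cleaned`
```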

Scalability, Latency, and Technical Tweaks 16:56

  • As knowledge graphs scale to millions/billions of nodes, efficient search and reduced latency are critical.
  • Integrating cuGraph with NetworkX demonstrates substantial latency reductions in search operations, enabling practical use of large-scale knowledge graphs (see the sketch below).
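
A sketch of GPU-accelerated NetworkX calls through the nx-cugraph backend; it assumes the nx-cugraph package and a compatible GPU are available, and the random graph and PageRank call stand in for a real knowledge graph and query.

```python
import networkx as nx

# Stand-in for a large knowledge graph (~100k nodes, ~500k expected edges).
G = nx.fast_gnp_random_graph(100_000, 1e-4, seed=42)

# The same NetworkX call can be dispatched to the GPU when the
# nx-cugraph backend is installed; the application code does not change.
scores_cpu = nx.pagerank(G)                     # default CPU implementation
scores_gpu = nx.pagerank(G, backend="cugraph")  # GPU-accelerated via cuGraph
```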

Choosing Between Graph, Semantic, and Hybrid Approaches 18:07

  • Use of graph, semantic, or hybrid retrieval depends on data structure and application needs (a simple hybrid-fusion sketch follows this list).
  • Structured datasets (e.g., retail, financial, employee records) are well-suited for graph-based systems.
  • For unstructured data, success depends on your ability to extract quality knowledge graphs.
  • Graph approaches are ideal when complex relationships must be understood and retrieved, but are compute-intensive and should be used judiciously.
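
For completeness, a conceptual sketch of the hybrid fusion step, assuming the vector and graph retrieval results have already been produced; simple concatenation into one prompt context is only one of several possible fusion strategies and is not taken from the talk's code.

```python
def fuse_context(query: str, vector_chunks: list[str], graph_facts: list[str]) -> str:
    """Combine semantically retrieved chunks and graph-derived facts into a
    single context block for the LLM (one simple fusion strategy among several)."""
    context = "\n".join(vector_chunks + graph_facts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

# Usage with placeholder retrieval results:
prompt = fuse_context(
    "Who develops cuGraph?",
    vector_chunks=["cuGraph is a GPU-accelerated graph analytics library."],
    graph_facts=["NVIDIA develops cuGraph"],
)
```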

Resources and Closing 19:28

  • All workflow details, including fine-tuning and code, are available on GitHub; a two-hour workshop covers the material in depth.
  • Developers are encouraged to join Nvidia’s mailing list for updates and resources.
  • The speaker invites further questions and interaction at the Neo4j booth.