Mitesh Patel, leader of the Nvidia developer advocate team, introduces himself and his team's mission of creating and sharing technical workflows and codebases on GitHub for developer use.
The talk focuses on building a GraphRAG system, its advantages, and the benefits of a hybrid approach combining graph and vector retrieval.
The presentation provides a high-level overview rather than an in-depth code exploration; all related resources are available on GitHub.
Knowledge graphs represent relationships between entities (such as people, places, concepts, events).
These relationships (edges) between entities can offer a detailed, comprehensive view of knowledge, surpassing semantic-only systems in certain cases.
Knowledge graphs can organize data from multiple sources, making them useful for rich data retrieval and interpretation.
Creation of a GraphRAG or hybrid system involves four main components: data, data processing, graph creation (or semantic embedding), and inferencing (querying and returning a response to the user).
Building a GraphRAG System: Workflow and Components 03:28
The pipeline is split into offline (data processing, graph/vector creation) and online (querying/inferencing) stages.
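The offline/online split can be sketched as two functions: one that runs ahead of time to build both indexes, and one that serves queries. This is a minimal illustration of the structure only; the function names, the stand-in "embedding", and the trivial matching logic are assumptions, not the talk's actual codebase.

```python
# Sketch of the offline/online pipeline split. build_indexes and answer are
# hypothetical names; the "embedding" and triplet logic are placeholders.

def build_indexes(documents):
    """Offline stage: process raw documents into two retrieval indexes."""
    vector_index = {}   # chunk text -> embedding vector (placeholder)
    graph_index = []    # list of (entity, relation, entity) triplets
    for doc in documents:
        vector_index[doc] = [float(len(doc))]  # stand-in for a real embedding
        words = doc.split()
        graph_index.append((words[0], "mentions", words[-1]))  # toy triplet
    return vector_index, graph_index

def answer(query, vector_index, graph_index):
    """Online stage: retrieve from both indexes, then compose a response."""
    hits = [t for t in graph_index if query.lower() in " ".join(t).lower()]
    return hits or list(vector_index)[:1]
```

In a real system the offline stage is by far the more expensive one, which is why the talk treats graph construction as a separate, iterated phase.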
Data is processed and used to create both a semantic vector database (via document chunking and embedding) and a knowledge graph (via entity and relationship extraction).
The process for building a knowledge graph involves LLM-assisted extraction of entity-relation-entity triplets from unstructured documents.
Ontology design and prompt engineering are critical for accurate and non-noisy triplet (knowledge graph) creation; this step often requires iteration and occupies most of the development time.
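A minimal sketch of what LLM-assisted triplet extraction looks like in practice: a prompt template constrained by an ontology, plus a parser for the model's raw output. The ontology labels, prompt wording, and output format here are assumptions for illustration; the talk describes the approach only in general terms, and the actual prompt is where most of the iteration happens.

```python
import re

# Hypothetical ontology of allowed entity types (an assumption, not the
# talk's actual ontology) -- constraining types reduces noisy triplets.
ONTOLOGY = ["PERSON", "ORG", "PRODUCT"]

PROMPT_TEMPLATE = (
    "Extract (entity, relation, entity) triplets from the text below.\n"
    "Only use entity types: {types}.\n"
    "Output one triplet per line as: (subject | relation | object)\n\n"
    "Text: {text}"
)

def build_prompt(text: str) -> str:
    """Fill the extraction prompt for a single document chunk."""
    return PROMPT_TEMPLATE.format(types=", ".join(ONTOLOGY), text=text)

def parse_triplets(llm_output: str):
    """Parse '(a | rel | b)' lines from the model's raw text output."""
    pattern = re.compile(r"\(([^|]+)\|([^|]+)\|([^)]+)\)")
    return [tuple(part.strip() for part in m.groups())
            for m in pattern.finditer(llm_output)]
```

Keeping the output format rigid (one parseable triplet per line) is what makes the parser reliable; free-form model output is a common source of graph noise.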
Creating a semantic vector database involves chunking documents, choosing chunk size and overlap, and embedding each chunk for storage in a vector database.
Chunk overlap is important to maintain contextual continuity between document segments.
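The chunking step described above can be sketched as a simple sliding window; the character-level granularity and default sizes are illustrative choices (real pipelines often chunk by tokens or sentences).

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50):
    """Split text into fixed-size character chunks, where consecutive
    chunks share `overlap` characters so context spanning a boundary
    appears in both neighbors."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size]
            for i in range(0, len(text), step)
            if text[i:i + chunk_size]]
```

Each chunk would then be embedded and stored in the vector database; larger overlap improves continuity at the cost of more chunks to embed and search.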
Semantic vector retrieval is straightforward, but does not naturally capture detailed entity relationships as graph-based systems do.
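Semantic retrieval at its core is a nearest-neighbor search over embeddings; a stdlib-only sketch using cosine similarity (production systems use a vector database with approximate search, and the toy vectors here are assumptions):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, chunk_vecs, k=2):
    """Rank chunk embeddings by similarity to the query embedding and
    return the ids of the k closest chunks."""
    scored = sorted(chunk_vecs.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [chunk_id for chunk_id, _ in scored[:k]]
```

Note what this ranking cannot express: it scores each chunk independently against the query, so multi-entity relationships spread across chunks are invisible to it, which is exactly the gap graph retrieval fills.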
Multiple metrics are used to evaluate retrieval systems, such as faithfulness, answer relevancy, precision, recall, coherence, and verbosity.
The Ragas library provides end-to-end evaluation of RAG pipelines, using LLMs to assess query, retrieval, and response quality; it supports customizable models and parameters.
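Ragas computes its metrics with LLM judges; the retrieval-side precision and recall it reports reduce, conceptually, to set overlap between retrieved and relevant contexts. A plain set-based sketch of those two metrics (this is an illustration of the concept, not Ragas's implementation):

```python
def context_precision(retrieved, relevant):
    """Fraction of retrieved chunks that are actually relevant."""
    if not retrieved:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(retrieved)

def context_recall(retrieved, relevant):
    """Fraction of relevant chunks that were retrieved."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(relevant)
```

Metrics like faithfulness and answer relevancy, by contrast, require judging generated text against retrieved context, which is where the LLM-as-judge component comes in.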
LLM reward models, like Nvidia's Nemotron (a 340-billion-parameter reward model), can judge the quality of LLM outputs along several dimensions.
The 80/20 rule applies: initial system development is quick, but optimization for production quality requires significant iterative effort.
Success depends heavily on the quality of the knowledge graph—fine-tuning LLMs, cleaning data (removing extraneous characters), and tuning output length all improve triplet extraction quality.
Experiments show that cleaning data and fine-tuning models (e.g., Llama 3.3 with LoRA) can raise knowledge graph extraction accuracy from 71% to 87% (measured on 100 documents).
As knowledge graphs scale to millions/billions of nodes, efficient search and reduced latency are critical.
Integrating cuGraph with NetworkX demonstrates substantial latency reductions in search operations, enabling practical use of large-scale knowledge graphs.
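The kind of query these graph stacks accelerate is multi-hop traversal. A stdlib-only BFS over a triplet store shows the shape of the operation (in production this would run through NetworkX, with cuGraph as a GPU backend for scale; the triplet data and function names here are illustrative):

```python
from collections import deque

def neighbors(triplets, entity):
    """Adjacency lookup over (subject, relation, object) triplets,
    treating edges as undirected for retrieval purposes."""
    out = []
    for s, r, o in triplets:
        if s == entity:
            out.append((r, o))
        elif o == entity:
            out.append((r, s))
    return out

def multi_hop(triplets, start, max_hops=2):
    """Collect all entities reachable from `start` within `max_hops`
    edges -- the kind of relational query graph retrieval answers
    directly and pure vector search cannot express."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for _, nxt in neighbors(triplets, node):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen - {start}
```

At millions of nodes, each hop fans out combinatorially, which is why GPU-accelerated traversal matters for keeping query latency practical.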
Choosing Between Graph, Semantic, and Hybrid Approaches 18:07
Use of graph, semantic, or hybrid retrieval depends on data structure and application needs.
Structured datasets (e.g., retail, financial, employee records) are well-suited for graph-based systems.
For unstructured data, success depends on your ability to extract quality knowledge graphs.
Graph approaches are ideal when complex relationships must be understood and retrieved, but are compute-intensive and should be used judiciously.
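A hybrid system ultimately has to merge the two retrievers' results. One common pattern is weighted score fusion; this sketch (the scoring scheme and `alpha` parameter are assumptions, not the talk's method) shows the idea:

```python
def hybrid_rank(vector_scores, graph_scores, alpha=0.5):
    """Blend per-chunk scores from vector similarity and graph relevance
    into one ranking; `alpha` weights the vector side. Scores are assumed
    already normalized to [0, 1]."""
    ids = set(vector_scores) | set(graph_scores)
    blended = {i: alpha * vector_scores.get(i, 0.0)
                  + (1 - alpha) * graph_scores.get(i, 0.0)
               for i in ids}
    return sorted(blended, key=blended.get, reverse=True)
```

Tuning `alpha` per application reflects the trade-off above: lean toward the graph side when relational structure dominates, toward the vector side for loosely structured text, and pay the graph's compute cost only where it earns its keep.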