Mitesh Patel, leader of the Nvidia developer advocate team, introduces himself and his team's mission of creating and sharing technical workflows and codebases on GitHub for developer use.
The talk focuses on building a GraphRAG system, its advantages, and the benefits of a hybrid approach combining graph and vector retrieval.
The presentation provides a high-level overview rather than an in-depth code exploration; all related resources are available on GitHub.
Knowledge graphs represent relationships between entities (such as people, places, concepts, events).
These relationships (edges) between entities can offer a detailed, comprehensive view of knowledge, surpassing semantic-only systems in certain cases.
Knowledge graphs can organize data from multiple sources, making them useful for rich data retrieval and interpretation.
Creation of a GraphRAG or hybrid system involves four main components: data, data processing, graph creation (or semantic embedding), and inferencing (querying and returning a response to the user).
Building a GraphRAG System: Workflow and Components 03:28
The pipeline is split into offline (data processing, graph/vector creation) and online (querying/inferencing) stages.
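The offline/online split can be sketched as two functions: one that runs ahead of time to build both indexes, and one that serves queries. This is a minimal illustration of the structure only; the function names, the stand-in "embedding", and the trivial matching logic are assumptions, not the talk's actual codebase.

```python
# Sketch of the offline/online pipeline split. build_indexes and answer are
# hypothetical names; the "embedding" and triplet logic are placeholders.

def build_indexes(documents):
    """Offline stage: process raw documents into two retrieval indexes."""
    vector_index = {}   # chunk text -> embedding vector (placeholder)
    graph_index = []    # list of (entity, relation, entity) triplets
    for doc in documents:
        vector_index[doc] = [float(len(doc))]  # stand-in for a real embedding
        words = doc.split()
        graph_index.append((words[0], "mentions", words[-1]))  # toy triplet
    return vector_index, graph_index

def answer(query, vector_index, graph_index):
    """Online stage: retrieve from both indexes, then compose a response."""
    hits = [t for t in graph_index if query.lower() in " ".join(t).lower()]
    return hits or list(vector_index)[:1]
```

In a real system the offline stage is by far the more expensive one, which is why the talk treats graph construction as a separate, iterated phase.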
Data is processed and used to create both a semantic vector database (via document chunking and embedding) and a knowledge graph (via entity and relationship extraction).
The process for building a knowledge graph involves LLM-assisted extraction of entity-relation-entity triplets from unstructured documents.
Ontology design and prompt engineering are critical for accurate and non-noisy triplet (knowledge graph) creation; this step often requires iteration and occupies most of the development time.
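A minimal sketch of what LLM-assisted triplet extraction looks like in practice: a prompt template constrained by an ontology, plus a parser for the model's raw output. The ontology labels, prompt wording, and output format here are assumptions for illustration; the talk describes the approach only in general terms, and the actual prompt is where most of the iteration happens.

```python
import re

# Hypothetical ontology of allowed entity types (an assumption, not the
# talk's actual ontology) -- constraining types reduces noisy triplets.
ONTOLOGY = ["PERSON", "ORG", "PRODUCT"]

PROMPT_TEMPLATE = (
    "Extract (entity, relation, entity) triplets from the text below.\n"
    "Only use entity types: {types}.\n"
    "Output one triplet per line as: (subject | relation | object)\n\n"
    "Text: {text}"
)

def build_prompt(text: str) -> str:
    """Fill the extraction prompt for a single document chunk."""
    return PROMPT_TEMPLATE.format(types=", ".join(ONTOLOGY), text=text)

def parse_triplets(llm_output: str):
    """Parse '(a | rel | b)' lines from the model's raw text output."""
    pattern = re.compile(r"\(([^|]+)\|([^|]+)\|([^)]+)\)")
    return [tuple(part.strip() for part in m.groups())
            for m in pattern.finditer(llm_output)]
```

Keeping the output format rigid (one parseable triplet per line) is what makes the parser reliable; free-form model output is a common source of graph noise.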
Creating a semantic vector database involves chunking documents, choosing chunk size and overlap, and embedding each chunk for storage in a vector database.
Chunk overlap is important to maintain contextual continuity between document segments.
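The chunking step described above can be sketched as a simple sliding window; the character-level granularity and default sizes are illustrative choices (real pipelines often chunk by tokens or sentences).

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50):
    """Split text into fixed-size character chunks, where consecutive
    chunks share `overlap` characters so context spanning a boundary
    appears in both neighbors."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size]
            for i in range(0, len(text), step)
            if text[i:i + chunk_size]]
```

Each chunk would then be embedded and stored in the vector database; larger overlap improves continuity at the cost of more chunks to embed and search.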
Semantic vector retrieval is straightforward, but does not naturally capture detailed entity relationships as graph-based systems do.
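Semantic retrieval at its core is a nearest-neighbor search over embeddings; a stdlib-only sketch using cosine similarity (production systems use a vector database with approximate search, and the toy vectors here are assumptions):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, chunk_vecs, k=2):
    """Rank chunk embeddings by similarity to the query embedding and
    return the ids of the k closest chunks."""
    scored = sorted(chunk_vecs.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [chunk_id for chunk_id, _ in scored[:k]]
```

Note what this ranking cannot express: it scores each chunk independently against the query, so multi-entity relationships spread across chunks are invisible to it, which is exactly the gap graph retrieval fills.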
Multiple metrics are used to evaluate retrieval systems, such as faithfulness, answer relevancy, precision, recall, coherence, and verbosity.
The Ragas library provides end-to-end evaluation of RAG pipelines, using LLMs to assess query, retrieval, and response quality; it supports customizable models and parameters.
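Ragas computes its metrics with LLM judges; the retrieval-side precision and recall it reports reduce, conceptually, to set overlap between retrieved and relevant contexts. A plain set-based sketch of those two metrics (this is an illustration of the concept, not Ragas's implementation):

```python
def context_precision(retrieved, relevant):
    """Fraction of retrieved chunks that are actually relevant."""
    if not retrieved:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(retrieved)

def context_recall(retrieved, relevant):
    """Fraction of relevant chunks that were retrieved."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(relevant)
```

Metrics like faithfulness and answer relevancy, by contrast, require judging generated text against retrieved context, which is where the LLM-as-judge component comes in.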
LLM reward models, like Nvidia's Nemotron (a 340-billion-parameter reward model), can judge the quality of LLM outputs along several dimensions.
The 80/20 rule applies: initial system development is quick, but optimization for production quality requires significant iterative effort.
Success depends heavily on the quality of the knowledge graph—fine-tuning LLMs, cleaning data (removing extraneous characters), and tuning output length all improve triplet extraction quality.
Experiments show that cleaning data and fine-tuning models (e.g., Llama 3.3 with LoRA) can raise knowledge graph extraction accuracy from 71% to 87% (measured on 100 documents).
As knowledge graphs scale to millions/billions of nodes, efficient search and reduced latency are critical.
Integrating cuGraph with NetworkX demonstrates substantial latency reductions in search operations, enabling practical use of large-scale knowledge graphs.
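The kind of query these graph stacks accelerate is multi-hop traversal. A stdlib-only BFS over a triplet store shows the shape of the operation (in production this would run through NetworkX, with cuGraph as a GPU backend for scale; the triplet data and function names here are illustrative):

```python
from collections import deque

def neighbors(triplets, entity):
    """Adjacency lookup over (subject, relation, object) triplets,
    treating edges as undirected for retrieval purposes."""
    out = []
    for s, r, o in triplets:
        if s == entity:
            out.append((r, o))
        elif o == entity:
            out.append((r, s))
    return out

def multi_hop(triplets, start, max_hops=2):
    """Collect all entities reachable from `start` within `max_hops`
    edges -- the kind of relational query graph retrieval answers
    directly and pure vector search cannot express."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for _, nxt in neighbors(triplets, node):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen - {start}
```

At millions of nodes, each hop fans out combinatorially, which is why GPU-accelerated traversal matters for keeping query latency practical.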
Choosing Between Graph, Semantic, and Hybrid Approaches 18:07
Use of graph, semantic, or hybrid retrieval depends on data structure and application needs.
Structured datasets (e.g., retail, financial, employee records) are well-suited for graph-based systems.
For unstructured data, success depends on your ability to extract quality knowledge graphs.
Graph approaches are ideal when complex relationships must be understood and retrieved, but are compute-intensive and should be used judiciously.
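A hybrid system ultimately has to merge the two retrievers' results. One common pattern is weighted score fusion; this sketch (the scoring scheme and `alpha` parameter are assumptions, not the talk's method) shows the idea:

```python
def hybrid_rank(vector_scores, graph_scores, alpha=0.5):
    """Blend per-chunk scores from vector similarity and graph relevance
    into one ranking; `alpha` weights the vector side. Scores are assumed
    already normalized to [0, 1]."""
    ids = set(vector_scores) | set(graph_scores)
    blended = {i: alpha * vector_scores.get(i, 0.0)
                  + (1 - alpha) * graph_scores.get(i, 0.0)
               for i in ids}
    return sorted(blended, key=blended.get, reverse=True)
```

Tuning `alpha` per application reflects the trade-off above: lean toward the graph side when relational structure dominates, toward the vector side for loosely structured text, and pay the graph's compute cost only where it earns its keep.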