GraphRAG aims to make LLMs smarter by integrating knowledge graphs into the Retrieval-Augmented Generation (RAG) pipeline.
Traditional LLMs often lack enterprise domain knowledge, struggle with verification and explanation, are prone to hallucinations, and raise ethical and data-bias concerns.
Vector databases used in standard RAG systems return limited, sometimes irrelevant results and lack the explainability and scalability needed for robust enterprise solutions.
Knowledge graphs deliver accurate, contextual, and explainable answers by grounding the LLM in structured data.
They achieve this by organizing information as nodes, relationships, and properties, allowing richer metadata connections.
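As a minimal sketch of that model (the entity names and properties here are hypothetical, not from any specific graph database), nodes, relationships, and properties can be represented like this:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                       # e.g. "Person", "Company"
    properties: dict = field(default_factory=dict)

@dataclass
class Relationship:
    start: Node                      # source node
    end: Node                        # target node
    rel_type: str                    # e.g. "WORKS_FOR"
    properties: dict = field(default_factory=dict)

# A (Person)-[:WORKS_FOR]->(Company) pattern, with metadata
# attached to the nodes and to the relationship itself.
alice = Node("Person", {"name": "Alice", "role": "Engineer"})
acme = Node("Company", {"name": "Acme", "industry": "Manufacturing"})
employment = Relationship(alice, acme, "WORKS_FOR", {"since": 2019})
```

Because the relationship carries its own properties, context such as when two entities became connected survives into retrieval.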
Papers from Microsoft Research and others demonstrate that GraphRAG leads to better results and reduced token costs versus standard RAG methods.
Industry studies, including an early data.world study, show a 3x improvement in LLM response accuracy when graph-based retrieval is used.
Gartner identifies GraphRAG as an emerging trend, crediting it with breathing new life into AI through its grounding in facts and reduction of hallucinations.
Large organizations achieve production benefits: LinkedIn, for example, reported a 28.6% reduction in customer support issue resolution time after adopting knowledge graphs.
Retrieval in GraphRAG is more advanced than simple vector search: an initial index search (vector, full-text, or spatial) is followed by relationship-based context expansion.
User and external context determine what information and relationships are retrieved, making responses more tailored.
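As a sketch of that two-step retrieval flow, assuming a Neo4j 5.x vector index named `chunks` over `:Chunk(embedding)` and a hypothetical `MENTIONS` schema (none of these names come from the source):

```python
from neo4j import GraphDatabase  # pip install neo4j

# Assumed setup: a Neo4j 5.x instance with a vector index "chunks"
# on :Chunk(embedding); connection details are placeholders.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

RETRIEVAL_QUERY = """
// Step 1: index search — top-k chunks nearest the question embedding.
CALL db.index.vector.queryNodes('chunks', $k, $embedding)
YIELD node AS chunk, score
// Step 2: relationship-based expansion — follow edges from the matched
// chunks to entities and neighbors that vector search alone would miss.
MATCH (chunk)-[:MENTIONS]->(entity)-[rel]-(neighbor)
RETURN chunk.text AS passage, score,
       entity.name AS entity, type(rel) AS relation, neighbor.name AS related
"""

def retrieve(question_embedding: list[float], k: int = 5) -> list[dict]:
    """Run index search plus graph expansion; user/role filters could be
    added as extra query parameters to tailor what is retrieved."""
    with driver.session() as session:
        return session.run(RETRIEVAL_QUERY, embedding=question_embedding, k=k).data()
```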
Modern LLMs can process graph patterns, so node-relationship-node patterns can be passed to them as richer context.
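Continuing the sketch above, the retrieved rows can be rendered as node-relationship-node text for the prompt (the field names match the hypothetical query above):

```python
def triples_to_context(rows: list[dict]) -> str:
    """Render (node)-[relationship]->(node) patterns as plain text lines
    that can be placed in the LLM prompt alongside retrieved passages."""
    lines = {f"({r['entity']}) -[{r['relation']}]-> ({r['related']})" for r in rows}
    return "\n".join(sorted(lines))

# e.g. "(Acme) -[PARTNERS_WITH]-> (Globex)" becomes one structured context line.
```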
Graph algorithms further enhance retrieval with techniques such as clustering and link prediction.
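As an illustration of both techniques on a toy graph (the entities are made up, and `networkx` stands in for whatever graph engine is actually used):

```python
import networkx as nx
from networkx.algorithms import community

# Toy graph standing in for an extracted knowledge graph.
G = nx.Graph()
G.add_edges_from([("Alice", "Acme"), ("Bob", "Acme"),
                  ("Bob", "Globex"), ("Carol", "Globex")])

# Clustering: group densely connected entities into communities,
# which can scope retrieval to the most relevant subgraph.
communities = community.greedy_modularity_communities(G)

# Link prediction: score unconnected entity pairs that are likely
# related (here via the Jaccard coefficient), highest scores first.
predictions = sorted(nx.jaccard_coefficient(G), key=lambda t: -t[2])
```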
Tools exist for extracting knowledge graphs from unstructured sources such as PDFs, YouTube transcripts, and Wikipedia articles.
A demonstration showed a tool that lets users upload varied content, build a knowledge graph from the extracted entities and relationships, and define extraction schemas for improved results.
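One way such a schema can work (the schema contents and prompt wording here are illustrative, not the demoed tool's actual interface) is to constrain an LLM extraction prompt:

```python
# Hypothetical extraction schema: restricting entity and relationship
# types steers the LLM toward a consistent, queryable graph.
SCHEMA = {
    "entities": ["Person", "Company", "Product"],
    "relationships": ["WORKS_FOR", "PRODUCES", "PARTNERS_WITH"],
}

PROMPT_TEMPLATE = """Extract a knowledge graph from the text below.
Use only these entity types: {entities}
Use only these relationship types: {relationships}
Return JSON of the form {{"nodes": [...], "relationships": [...]}}.

Text:
{text}"""

def build_extraction_prompt(text: str) -> str:
    """Fill the template; the result is sent to any LLM completion API."""
    return PROMPT_TEMPLATE.format(
        entities=", ".join(SCHEMA["entities"]),
        relationships=", ".join(SCHEMA["relationships"]),
        text=text,
    )
```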
Users can trace responses back to their sources and the graph entities involved, supporting explainability and evaluation.
An "agentic" approach breaks questions down into subtasks, each handled by domain-specific retrievers/cypher queries, creating more nuanced responses.