LlamaIndex is a framework in Python and TypeScript for building generative AI applications, specializing in agents.
LlamaParse is a service that parses complex document formats (PDFs, Word documents, PowerPoint decks) to improve agent quality by making unstructured data easier for LLMs to understand.
LlamaCloud is an enterprise service for ingesting documents and getting a retrieval endpoint, available as SaaS or for private cloud deployment.
LlamaHub is a registry of open-source software providing adapters for data integration from various sources (e.g., Notion, Slack, databases) and storage in vector databases.
LlamaIndex integrates with over 400 different models from more than 80 LLM providers, including local models, and offers pre-built agent tools.
The framework's core promise is to help users go faster by skipping boilerplate, providing best practices, and accelerating time to production.
LlamaIndex is particularly proficient in Retrieval Augmented Generation (RAG) and agents.
An agent is a semi-autonomous piece of software that can use tools to achieve a goal without explicit step-by-step instructions.
Agents represent a significant departure from traditional programming by giving LLMs decision-making power to select and use tools, making them highly flexible and powerful.
Agents are most useful for situations involving unstructured data and unexpected inputs, where LLMs excel at handling messy information.
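The tool-using loop described above can be sketched in plain Python. The LLM's decision step is stubbed with a hard-coded rule here; all names are illustrative and none of this is the LlamaIndex API.

```python
# Minimal sketch of an agent loop: the "LLM" (stubbed as a rule-based
# function) repeatedly picks a tool until it decides the goal is met.

def search_docs(query: str) -> str:
    """Toy tool: pretend to search a knowledge base."""
    return f"3 documents mention '{query}'"

def calculate(expression: str) -> str:
    """Toy tool: evaluate a simple arithmetic expression."""
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"search_docs": search_docs, "calculate": calculate}

def stub_llm_decide(goal: str, history: list) -> tuple:
    """Stand-in for the LLM: pick a tool, or finish once a result exists."""
    if not history:
        return ("calculate", "6 * 7")
    return ("finish", history[-1])

def run_agent(goal: str) -> str:
    history = []
    while True:
        action, arg = stub_llm_decide(goal, history)
        if action == "finish":
            return arg
        history.append(TOOLS[action](arg))

print(run_agent("What is 6 * 7?"))  # → 42
```

The key design point is that the decision of *which* tool to call next lives in the (stubbed) LLM step, not in hand-written control flow.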
A good agent use case involves an LLM transforming a large body of text into a smaller one, such as summarizing documents, interpreting contracts, or processing invoices.
It's recommended to integrate LLMs into existing software to leverage their ability to turn unstructured data into structured data for decision-making, moving beyond simple chatbot applications.
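The "unstructured data into structured data" use case can be sketched as pulling typed fields out of free-form text. A regex stands in for the LLM extraction step here; the `Invoice` type and field names are hypothetical.

```python
# Sketch: extract typed fields from a free-form invoice string so that
# downstream code can make decisions on them. In a real system an LLM
# (not a regex) would perform the extraction.
import re
from dataclasses import dataclass

@dataclass
class Invoice:
    number: str
    total: float

def extract_invoice(text: str) -> Invoice:
    number = re.search(r"Invoice\s+#([\w-]+)", text).group(1)
    total = float(re.search(r"\$([\d.]+)", text).group(1))
    return Invoice(number=number, total=total)

inv = extract_invoice("Invoice #A-102: services rendered, total $1299.50, net 30.")
print(inv)  # → Invoice(number='A-102', total=1299.5)
```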
LLMs require contextual data relevant to a specific domain or problem to provide useful outputs, as generic queries are often insufficient.
Retrieval Augmented Generation (RAG) addresses the challenge of feeding large amounts of data to LLMs by embedding data into vectors for efficient searching in a vector database.
RAG allows only the most relevant context from a data corpus to be fed to the LLM, making interactions significantly faster and more cost-effective.
RAG matters because sending less data to the LLM is cheaper and faster, and a focused context produces more specific and accurate answers.
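The retrieval step can be illustrated with toy vectors: embed the documents and the query, score similarity, and keep only the top-k chunks as context. Real systems use learned embeddings and a vector database; the bag-of-words "embedding" below is purely illustrative.

```python
# Toy RAG retrieval: score each document against the query and keep
# only the most relevant chunk(s) to send to the LLM.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a learned embedding model: bag-of-words counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

docs = [
    "invoices are due within 30 days of receipt",
    "the contract renews annually unless cancelled",
    "llamas are domesticated South American camelids",
]

def retrieve(query: str, k: int = 1) -> list:
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

context = retrieve("when are invoices due")
print(context)  # only the best-matching chunk is forwarded to the LLM
```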
While agents can use RAG as a tool, RAG also benefits from agents, as naive top-K RAG often performs poorly.
Layering an agent on top of RAG significantly improves result quality and enables advanced capabilities like introspection (e.g., breaking down complex questions, re-extracting data, self-evaluating answers).
Agents enhance RAG's performance in both speed and accuracy.
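The agent-over-RAG idea can be sketched as: decompose a complex question into simpler sub-questions, retrieve context for each, then synthesize. The decomposition and synthesis steps below are deterministic stand-ins for LLM calls, and the knowledge base is a toy dictionary.

```python
# Sketch of agentic RAG: break one complex question into sub-questions,
# run retrieval per sub-question, and combine the results.

KB = {
    "revenue": "Q3 revenue was $12M.",
    "headcount": "Headcount grew to 85 people in Q3.",
}

def naive_retrieve(query: str) -> str:
    # Stand-in for top-k vector retrieval over a corpus.
    for key, chunk in KB.items():
        if key in query:
            return chunk
    return ""

def stub_decompose(question: str) -> list:
    # Stand-in for the LLM splitting a complex question into simpler ones.
    return ["what was revenue?", "what was headcount?"]

def answer(question: str) -> str:
    contexts = [naive_retrieve(sub) for sub in stub_decompose(question)]
    # Stand-in for the LLM synthesizing a final answer from the contexts.
    return " ".join(c for c in contexts if c)

print(answer("How did revenue and headcount change in Q3?"))
```

A naive single retrieval over the compound question would likely match only one of the two facts; decomposition retrieves both.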
Anthropic identified several effective agent design patterns, which LlamaIndex implements in its workflows:
Chaining: Passes the output of one LLM call as input to the next in a sequence, easily built using LlamaIndex's workflow abstraction.
Routing: An LLM decides which of several LLM-based tools or paths to follow to solve different problems, implemented in LlamaIndex workflows using branches.
Parallelization: Running multiple LLMs concurrently and aggregating their results, with two flavors:
Sectioning: Acting on the same input in different ways (e.g., one track for processing, another for guardrails to check for illegal requests).
Voting: Giving the same query to multiple tracks (same or different LLMs) and comparing answers to reduce hallucination, as LLMs hallucinate in different ways.
Orchestrator-Workers: An LLM splits a complex task into several simpler questions, asks them in parallel, and then aggregates the answers into a single coherent response (e.g., for deep research); also implemented using parallelization.
Evaluator-Optimizer (Self-Reflection): An LLM evaluates its own output against the original goal and generates feedback for iterative improvement, implemented in LlamaIndex workflows using loops.
These patterns can be combined to create arbitrarily complex workflows for diverse circumstances.
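Three of the patterns above (routing, voting, and evaluator-optimizer) can be sketched compactly in plain Python, with deterministic stand-ins for the LLM calls. This is illustrative code, not the LlamaIndex workflow API.

```python
# Compact sketches of routing, voting, and the evaluator-optimizer loop.
from collections import Counter

def route(query: str) -> str:
    """Routing: a (stubbed) LLM picks which specialized path handles the query."""
    return "math" if any(ch.isdigit() for ch in query) else "general"

def vote(query: str, models: list) -> str:
    """Voting: ask several models the same query, return the majority answer."""
    answers = [m(query) for m in models]
    return Counter(answers).most_common(1)[0][0]

def evaluate_optimize(draft: str, goal: str, max_rounds: int = 3) -> str:
    """Evaluator-optimizer: loop until a (stubbed) evaluator accepts the draft."""
    for _ in range(max_rounds):
        if goal in draft:           # stub evaluator: does the draft meet the goal?
            return draft
        draft = draft + " " + goal  # stub optimizer: revise toward the goal
    return draft

# Stub "models" for the voting demo: two agree, one hallucinates.
models = [lambda q: "42", lambda q: "42", lambda q: "41"]

print(route("what is 2 + 2?"))         # → math
print(vote("what is 6 * 7?", models))  # → 42
print(evaluate_optimize("Draft.", "Mention llamas."))
```

Voting works here for the reason the pattern exists: the two honest stubs agree while the "hallucinating" one disagrees, so the majority answer wins.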
In LlamaIndex, an agent's tools are defined as standard Python functions, which the framework wraps (e.g., as a FunctionTool) so the agent can call them.
Multi-agent systems can be created in LlamaIndex by passing a list of agents to an orchestrating workflow, which then manages control flow and handoffs between them.
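The handoff mechanics can be sketched in plain Python: an orchestrator loop gives control to the agent named by the previous agent's handoff, until one of them signals completion. The agent names and return convention are illustrative; LlamaIndex's own orchestration handles this control flow (and shared state) for you.

```python
# Toy multi-agent handoff: each agent returns (next_agent, payload),
# and the orchestrator loops until an agent signals "done".

def research_agent(task: str) -> tuple:
    # Produces notes, then hands off to the writer.
    return ("write", f"notes on {task}")

def writer_agent(task: str) -> tuple:
    # Turns the notes into a report and ends the run.
    return ("done", f"report based on {task}")

AGENTS = {"research": research_agent, "write": writer_agent}

def run(task: str) -> str:
    current = "research"
    while current != "done":
        current, task = AGENTS[current](task)
    return task

print(run("llama migration patterns"))
# → report based on notes on llama migration patterns
```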
A tutorial notebook is available for building a deep research agent workflow.