The Infrastructure Company Powering the Top AI Apps

Introduction and Background 00:00

  • Turbopuffer is a fast-growing vector database and search engine used by leading AI applications like Cursor, Notion, and Linear.
  • CEO Simon previously spent a decade at Shopify working on significant infrastructure challenges.
  • The discussion explores the need for new search paradigms, the evolution of vector databases, and Simon's learnings from building at the AI app frontier.

The Evolution of Context Windows and Database Needs 01:01

  • Early AI apps had small context windows (e.g., 8K tokens), which necessitated the use of vector indexing and retrieval-augmented generation (RAG).
  • As context windows expanded, some applications simply filled them with more data, but that approach runs into limits on latency, recall, and training difficulty.
  • Organizations now want to connect tens or hundreds of millions of tokens, with complex permissioning and high recall requirements, which exceeds what large context windows alone can handle.
  • Traditional storage architectures are costly due to triple disk replication, while Turbopuffer’s storage model (object storage with tailored trade-offs) enables much cheaper and more scalable search; a back-of-envelope comparison follows this list.
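The comparison below uses illustrative list prices (assumed for this sketch, not figures from the episode):

```python
# Back-of-envelope storage cost per GB-month: triple-replicated block storage
# vs. object storage. Prices are assumed list prices; check current cloud pricing.
BLOCK_GB_MONTH = 0.08   # assumed gp3-class block storage price
S3_GB_MONTH = 0.023     # assumed S3 Standard price
REPLICAS = 3            # classic database setups keep three disk copies

dataset_gb = 1_000      # 1 TB of vectors + metadata

replicated = dataset_gb * BLOCK_GB_MONTH * REPLICAS   # $240/month
object_store = dataset_gb * S3_GB_MONTH               # $23/month: durability is
                                                      # the object store's problem
print(f"replicated block storage: ${replicated:,.0f}/month")
print(f"object storage:           ${object_store:,.0f}/month")
```

Roughly an order of magnitude on storage alone; caching hot data on NVMe closes the latency gap without reintroducing the replication cost.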

The SCRAP Model for Content Databases 03:56

  • SCRAP is an acronym standing for Scale, Cost, Recall, ACL (Access Control Lists), and Performance.
  • Scale: Eventually, even very large context windows are insufficient for certain analytical queries.
  • Cost: Keeping all data in high-performance memory (VRAM/DRAM) is expensive, pushing for object storage solutions.
  • Recall: Achieving high recall over large corpora is challenging, in part because benchmarking datasets at that scale are scarce (a minimal recall measurement is sketched after this list).
  • ACL: Context windows don’t currently provide granular enough access controls, so permissions have to be enforced at the retrieval layer.
  • Performance: Loading very large context windows can compromise sub-second response times desired by users.
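To make the Recall point concrete: recall is the fraction of true nearest neighbors that an approximate index actually returns. A minimal measurement sketch in numpy, where brute force stands in for ground truth and `ann_ids` comes from whatever index is being evaluated (names are assumptions, not from the episode):

```python
import numpy as np

def recall_at_k(query, corpus, ann_ids, k=10):
    """Fraction of the true top-k neighbors that an approximate index returned.

    corpus: (n, d) array of unit-normalized embeddings.
    ann_ids: ids returned by the approximate index under evaluation.
    """
    scores = corpus @ query                      # exact scores, brute force
    true_ids = set(np.argsort(-scores)[:k])      # ground-truth top k
    return len(true_ids & set(ann_ids[:k])) / k

# A "95% recall" comfort level means recall_at_k, averaged over a set of
# representative queries, stays at or above 0.95.
```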

Changing Use Cases and Emerging Demands 06:41

  • An increasing number of customers require connecting massive amounts of data for semantic search or RAG workflows.
  • Earlier, this was uncommon, but now diverse and large-scale applications drive adoption of platforms like Turbopuffer.

Object Storage Architecture: Why Now? 09:20

  • Recent advancements make object-storage-based databases viable:
    • NVMe SSDs (since ~2017) provide high bandwidth at much lower cost than DRAM.
    • S3 gained strong consistency (since 2020), a critical primitive for databases.
    • S3 and other object stores introduced compare-and-swap primitives (from 2024), needed for distributed synchronization (see the sketch after this list).
  • This enables, for the first time, performant, cost-efficient, and scalable search engines built entirely on object storage.
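A minimal sketch of the compare-and-swap primitive via boto3, assuming a recent boto3 with S3's 2024 conditional-write support (bucket and key names are hypothetical):

```python
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

def create_if_absent(bucket: str, key: str, body: bytes) -> bool:
    """PUT only if the key does not exist yet (If-None-Match: *).
    A building block for leader election and exactly-once commits."""
    try:
        s3.put_object(Bucket=bucket, Key=key, Body=body, IfNoneMatch="*")
        return True
    except ClientError as e:
        if e.response["ResponseMetadata"]["HTTPStatusCode"] == 412:
            return False   # another writer got there first
        raise

def swap_if_unchanged(bucket: str, key: str, body: bytes, etag: str) -> bool:
    """PUT only if the object still has the ETag we read earlier (If-Match)."""
    try:
        s3.put_object(Bucket=bucket, Key=key, Body=body, IfMatch=etag)
        return True
    except ClientError as e:
        if e.response["ResponseMetadata"]["HTTPStatusCode"] == 412:
            return False   # lost the race; re-read, rebuild, retry
        raise
```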

Trade-offs of Object Storage for Search 13:33

  • The primary trade-off is higher write latency (100–200 ms), because every commit is a round trip to S3 (a minimal write-path sketch follows this list).
  • That is acceptable for search workloads but not for high-frequency transactional workloads (e.g., e-commerce checkouts).
  • Benefits include scalability, cost efficiency, architectural simplicity, and high durability.
  • Occasional cache misses can add latency but are manageable; the upside is simpler system management and recovery.
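To see why the 100–200 ms commit cost is tolerable for search: writes can be buffered and committed in batches, so one S3 round trip is amortized across many records. A hypothetical write-ahead-log sketch (naming scheme and batching policy are illustrative assumptions, not Turbopuffer's actual design):

```python
import json
import boto3

s3 = boto3.client("s3")

class Wal:
    """Buffer records locally; commit each batch as one object on S3."""
    def __init__(self, bucket: str, namespace: str):
        self.bucket, self.namespace = bucket, namespace
        self.buffer: list[dict] = []
        self.seq = 0                      # next log sequence number

    def append(self, record: dict) -> None:
        self.buffer.append(record)        # local, effectively free

    def commit(self) -> None:
        """One S3 PUT per batch: the 100-200 ms round trip is paid once,
        amortized across every record in the buffer."""
        if not self.buffer:
            return
        key = f"{self.namespace}/wal/{self.seq:016d}.json"
        s3.put_object(
            Bucket=self.bucket, Key=key,
            Body=json.dumps(self.buffer).encode(),
            IfNoneMatch="*",              # conditional write: exactly-once commit
        )
        self.seq += 1
        self.buffer.clear()
```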

When to Use Turbopuffer and Vector Search 14:46

  • Turbopuffer excels when searching over very large datasets (tens of millions to billions of vectors).
  • For small datasets, a traditional relational database with a vector extension (e.g., Postgres with pgvector) is often sufficient; a sketch follows this list.
  • As data size and search complexity grow, specialized databases like Turbopuffer become economically and technically necessary.
  • Historically, full-text search has been split out of transactional databases once they reach scale; vector search accelerates that split.
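For the small-dataset case, a minimal sketch of Postgres with the pgvector extension via psycopg (connection string, table name, and the 1536 dimensionality are illustrative assumptions):

```python
# Requires Postgres with pgvector available, plus `pip install psycopg pgvector numpy`.
import numpy as np
import psycopg
from pgvector.psycopg import register_vector

conn = psycopg.connect("dbname=app", autocommit=True)  # assumed local database
with conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
register_vector(conn)  # lets psycopg pass numpy arrays as pgvector values

with conn.cursor() as cur:
    cur.execute(
        "CREATE TABLE IF NOT EXISTS docs "
        "(id bigserial PRIMARY KEY, body text, embedding vector(1536))"
    )
    query_embedding = np.random.rand(1536).astype(np.float32)  # stand-in vector
    # Exact (non-approximate) scan; perfectly fine at small scale.
    cur.execute(
        "SELECT id, body FROM docs ORDER BY embedding <-> %s LIMIT 10",
        (query_embedding,),
    )
    nearest = cur.fetchall()
```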

Key Use Cases and Customer Stories 17:04

  • Cursor uses Turbopuffer to enable semantic search across codebases, powering code-related queries via RAG.
  • Notion integrates Turbopuffer for Q&A features, enabling natural language search across internal wikis and documents.
  • Linear uses Turbopuffer for similarity search and deduplication (e.g., finding duplicate issues); a toy sketch of the idea follows.
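The deduplication case reduces to a nearest-neighbor query with a similarity threshold. A brute-force numpy sketch (the 0.9 threshold is an assumption for illustration):

```python
import numpy as np

def find_duplicate_pairs(embeddings: np.ndarray, threshold: float = 0.9):
    """Return index pairs whose cosine similarity exceeds the threshold.

    embeddings: (n, d), one row per issue. Brute force for illustration;
    at production scale this becomes one ANN query per newly filed issue.
    """
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    upper = np.triu(sims, k=1)            # keep each pair once, drop self-pairs
    i, j = np.where(upper > threshold)
    return list(zip(i.tolist(), j.tolist()))
```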

Comparison with Incumbents and Relational Databases 19:13

  • Traditional databases and tools like Elasticsearch are effective at smaller scales or when all data fits in memory.
  • At massive scale (billions or trillions of documents), cost and performance limitations make object storage-based search engines more attractive.
  • Turbopuffer’s pricing aligns with the cost structure of object storage, making it more sustainable for large applications.

Lessons in Building Object-Storage Architectures 20:11

  • Simplicity in design is key to scalable, reliable systems.
  • Keeping all data (including metadata) on object storage was initially a forced constraint but turned out to be beneficial.
  • Various engineering tricks support scalability and performance (e.g., read-through caches, use of S3-specific features); one is sketched below.
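One of those tricks, a read-through cache in miniature: serve reads from a local cache when possible and fall back to object storage on a miss (a sketch; an in-memory dict stands in for an NVMe page cache, and a real system would add eviction):

```python
import boto3

s3 = boto3.client("s3")

class ReadThroughCache:
    """Serve object-storage reads locally, filling the cache on a miss."""
    def __init__(self, bucket: str):
        self.bucket = bucket
        self.cache: dict[str, bytes] = {}

    def get(self, key: str) -> bytes:
        if key in self.cache:                 # hit: local, microseconds
            return self.cache[key]
        obj = s3.get_object(Bucket=self.bucket, Key=key)   # miss: S3 round trip
        data = obj["Body"].read()
        self.cache[key] = data                # next read for this key is local
        return data
```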

Unsolved Problems in Vector Search 22:33

  • Maintaining up-to-date, high-recall vector indexes is challenging, especially as data changes or grows.
  • Approximate nearest neighbor (ANN) search is used; customers typically feel comfortable at about 95% recall.
  • Incremental index maintenance at scale (hundreds of millions or billions of vectors per shard) is technically difficult.
  • Filtering during vector search (e.g., combining fuzzy vector results with hard filters like shipping regions) adds further complexity, illustrated below.
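Post-filtering shows why this is hard: over-fetch ANN candidates, then apply the hard filter, and a selective filter can still leave fewer than k results. A numpy sketch (brute-force scoring stands in for an ANN probe):

```python
import numpy as np

def filtered_search(query, corpus, allowed_mask, k=10, overfetch=4):
    """Post-filtering: over-fetch candidates, then apply the hard filter.

    allowed_mask: boolean array marking rows the filter (ACL, region, ...)
    permits. With a very selective filter, even heavy over-fetching can
    return fewer than k hits, which is why engines also need pre-filtering
    or filter-aware index traversal.
    """
    scores = corpus @ query                        # stands in for an ANN probe
    candidates = np.argsort(-scores)[: k * overfetch]
    hits = [int(i) for i in candidates if allowed_mask[i]]
    return hits[:k]
```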

Product Focus, Simplicity, and Expansion 27:48

  • Turbopuffer prioritizes a simple, focused product over bundling multiple features prematurely.
  • Reliability for customers is paramount, influencing technical and product decisions.
  • Expansion into adjacent spaces (e.g., embeddings, re-ranking) will be gradual and only if customer demand and fit justify it.

Trends and Persistent Components in AI Infrastructure 36:02

  • The AI infrastructure stack evolves rapidly, with changing model capabilities and usage patterns.
  • Specialized, high-quality component providers currently have an advantage; bundling may come later but risks diluting focus and product quality.
  • New AI “table stakes” features for SaaS may include semantic search, similarity/recommendations, advanced reporting, and agentic workflows.

Memory and Multimodal Data in AI Apps 39:39

  • AI “memory” features are emerging, but the data scale varies by implementation.
  • Turbopuffer can handle both simple key-value storage and complex vector search for memory use cases.
  • Multimodal search (beyond text, such as images or PDFs) is possible and supported but not yet widespread among customers.

Reflections, Learnings, and the Future 47:00

  • Simon values trusting instincts and focusing the team on core strengths, resisting outside pressure to overstretch.
  • Key product lessons include recognizing the power of simplicity and matching product features to real customer needs.
  • Looking forward, the evolution of AI agents’ reliance on search and the future of software engineering (with LLMs as learning companions) are open questions.
  • Final advice: reliability and simplicity, partnered with close customer alignment, are crucial in AI infrastructure engineering and product design.