SUMM

The speaker, Craig, leads product management at Databricks and has prior experience at Google (Vertex AI) and AWS (SageMaker).
Databricks is a leading cross-cloud data platform with tens of thousands of customers and billions in revenue.
The company is known for popular open-source tools such as Spark, MLflow, and Delta.

Large enterprises often deal with highly fragmented data due to multiple acquisitions and different systems across various clouds.
Data is scattered among numerous warehouses, making integration and access difficult.
Expertise within organizations is often siloed, complicating cross-platform data utilization, especially for AI initiatives.

Databricks focuses on managing data and delivering AI capabilities, particularly with Mosaic AI.
The talk emphasizes "data intelligence" as distinct from "general intelligence": the focus is on connecting AI systems to an enterprise's complex data estate.

FactSet, a financial services company, had a proprietary query language (FQL) that limited customer access.
The initial GenAI solution, English-to-FQL translation, achieved 59% accuracy with 15-second latency (latency serving as a proxy for cost).
Decomposing the task into an agent-based, multi-step process improved accuracy to 85% and cut latency to 6 seconds; accuracy later reached "the 9s."
The example highlights the importance of breaking complex prompts down into manageable tasks that can be tuned individually.
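
The decomposition idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the step functions (`extract_tickers`, `pick_metric`, `assemble_query`) are hypothetical stand-ins for narrow LLM calls, and the output string is a placeholder format, not real FQL syntax, which the talk does not show.

```python
from typing import Callable

# Each step is a hypothetical stand-in for one narrow LLM call.
Step = Callable[[dict], dict]

def extract_tickers(state: dict) -> dict:
    # Illustrative rule only: treat multi-letter all-caps words as tickers.
    words = state["question"].replace("?", "").split()
    tickers = [w for w in words if w.isupper() and len(w) > 1]
    return {**state, "tickers": tickers}

def pick_metric(state: dict) -> dict:
    # Illustrative rule only: a real step would classify the requested metric.
    metric = "revenue" if "revenue" in state["question"].lower() else "price"
    return {**state, "metric": metric}

def assemble_query(state: dict) -> dict:
    # Placeholder output format; NOT real FQL syntax.
    query = f"{state['metric'].upper()}({', '.join(state['tickers'])})"
    return {**state, "query": query}

STEPS: list[Step] = [extract_tickers, pick_metric, assemble_query]

def run_pipeline(question: str, steps: list[Step] = STEPS) -> dict:
    """Run each small step in order and record which steps executed."""
    state = {"question": question, "trace": []}
    for step in steps:
        state = step(state)
        state["trace"].append(step.__name__)
    return state

result = run_pipeline("What was revenue for AAPL and MSFT?")
print(result["query"])  # REVENUE(AAPL, MSFT)
print(result["trace"])
```

Because each step has a narrow input and output, it can be tested, measured, and tuned on its own, which is the property the FactSet example relies on.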

Databricks prioritizes enabling high-value use cases that carry financial or reputational risk.
Successful enterprise AI systems require:
- Governance: Controlling access to data, models, tools, and queries at a granular level.
- Evaluation: Quantifying and improving model/system accuracy objectively.
Many organizations are trying to build deterministic systems from inherently probabilistic AI components, which requires careful management.

Databricks governs not just data but also models and tool access.
Agents are treated as principals, and all actions are tightly controlled, with further capabilities forthcoming.
Incorporating vector stores or feature stores for reasoning over data is standard.
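
As a rough sketch of what "agents as principals" can mean, the snippet below models a default-deny policy in which an agent identity needs an explicit grant for each resource and action, and every check is logged. The `Grant` and `AccessPolicy` names, the resource strings, and the agent name are all hypothetical; this is not the Databricks governance API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Grant:
    principal: str  # a user or an agent identity
    resource: str   # hypothetical naming, e.g. "table:...", "tool:..."
    action: str     # e.g. "read", "invoke"

class AccessPolicy:
    """Default-deny policy: anything not explicitly granted is refused."""

    def __init__(self) -> None:
        self._grants: set[Grant] = set()
        self.audit_log: list[tuple[str, str, str, bool]] = []

    def grant(self, principal: str, resource: str, action: str) -> None:
        self._grants.add(Grant(principal, resource, action))

    def check(self, principal: str, resource: str, action: str) -> bool:
        allowed = Grant(principal, resource, action) in self._grants
        # Record every decision, allowed or not, for later review.
        self.audit_log.append((principal, resource, action, allowed))
        return allowed

policy = AccessPolicy()
policy.grant("agent:support-bot", "table:sales.orders", "read")

print(policy.check("agent:support-bot", "table:sales.orders", "read"))  # True
print(policy.check("agent:support-bot", "tool:send_email", "invoke"))   # False
```

The same check applies whether the resource is a table, a model, or a tool, which mirrors the point that governance extends beyond data.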

Tool calling allows LLMs to select among various tools or pathways, supporting quasi-deterministic outcomes.
The integration of Claude (frontier LLM by Anthropic) into Databricks has improved the accuracy of tool selection.
Claude is natively available across all major clouds (Azure, AWS, GCP) within Databricks, supporting advanced agent use cases.
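
One way to picture the quasi-deterministic side of tool calling: the model only chooses among registered tools, and the surrounding code validates that choice before anything runs. The sketch below assumes the model's choice arrives as a JSON string; the tool names and the `dispatch` helper are hypothetical, and no real LLM or Databricks API is called.

```python
import json

# Hypothetical tool registry; each tool is an ordinary deterministic function.
TOOLS = {
    "search_docs": lambda args: f"docs matching {args['query']!r}",
    "query_sales": lambda args: f"sales rows for region {args['region']!r}",
}

def dispatch(model_output: str) -> str:
    """Validate a model's JSON tool choice, then run the chosen tool."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return "error: model output was not valid JSON"
    if not isinstance(call, dict):
        return "error: model output was not a JSON object"
    name = call.get("tool")
    if name not in TOOLS:
        # The model can only pick from the registry; anything else is refused.
        return f"error: unknown tool {name!r}"
    return TOOLS[name](call.get("args", {}))

print(dispatch('{"tool": "search_docs", "args": {"query": "refund policy"}}'))
print(dispatch('{"tool": "drop_table", "args": {}}'))
```

The model's selection is probabilistic, but the set of possible actions is fixed in code, so a wrong or malformed choice fails closed instead of executing something unexpected.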

Highly governed industries (banks, hospitals) are now able to use generative AI after implementing robust controls.
Databricks and Claude together enable customers to unlock high-value use cases, moving AI from experimental to operational stages.

Databricks provides an evaluation (eval) platform involving golden data sets and LLM-based judges for assessing system performance.
The platform includes simplified UIs for subject matter experts to provide feedback and corrections.
Much of the evaluation tooling is open source via MLflow, though some custom judges remain proprietary.
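
The golden-data-set idea can be illustrated with a toy evaluation loop. This is a sketch under assumptions, not the MLflow or Databricks evaluation API: `exact_match_judge` is a simple stand-in for an LLM-based judge, and `toy_system` and the golden examples are made up for the illustration.

```python
from typing import Callable

def exact_match_judge(expected: str, actual: str) -> bool:
    """Stand-in for an LLM-based judge: case- and whitespace-insensitive match."""
    return expected.strip().lower() == actual.strip().lower()

def evaluate(
    system: Callable[[str], str],
    golden: list[tuple[str, str]],
    judge: Callable[[str, str], bool],
) -> tuple[float, list[str]]:
    """Return accuracy over the golden set plus the questions that failed."""
    failures = [q for q, expected in golden if not judge(expected, system(q))]
    accuracy = (len(golden) - len(failures)) / len(golden) if golden else 0.0
    return accuracy, failures

# Made-up golden set: (question, expected answer) pairs.
GOLDEN = [
    ("capital of France?", "Paris"),
    ("2 + 2?", "4"),
    ("largest planet?", "Jupiter"),
]

# Made-up system under test; it gets one answer wrong on purpose.
ANSWERS = {"capital of France?": "paris", "2 + 2?": "4", "largest planet?": "Saturn"}

def toy_system(question: str) -> str:
    return ANSWERS.get(question, "")

accuracy, failures = evaluate(toy_system, GOLDEN, exact_match_judge)
print(f"accuracy={accuracy:.2f}, failures={failures}")
```

The list of failing questions is what a subject matter expert would review and correct, feeding the golden set for the next iteration.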

Databricks uses Claude to automate answers to extensive analyst questionnaires (e.g., from Gartner and Forrester), reducing the work from hundreds of employee hours to simple editing.
The team iterated from open-source models to non-Anthropic models before settling on Claude, which produced shippable, high-quality results.
Block (formerly Square) uses Databricks and Claude to power "Goose," an open-source agentic developer environment, reporting 40-50% weekly user adoption increases and 8-10 hours saved per week.

Enterprises are encouraged to identify AI use cases, define success metrics, and contact Databricks or Anthropic for tailored support.
Building composable agentic systems allows for greater control and tuning in high-risk environments.
The approach involves decomposing problems and building deterministic behaviors atop probabilistic LLMs.
Databricks aims to tightly integrate AI and data layers for next-level productivity gains.
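
A common pattern for building deterministic behavior atop a probabilistic component is to validate every output and fall back to a fixed, known response when validation keeps failing. The sketch below shows that pattern only; it is an assumption about how such a wrapper might look, not something described in the talk. `scripted_model` is a hypothetical stand-in for an LLM, and the `ESCALATE_TO_HUMAN` fallback value is made up.

```python
from typing import Callable

def scripted_model(responses: list[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for an LLM: replays a fixed list of outputs."""
    it = iter(responses)
    return lambda prompt: next(it)

def constrained_call(
    prompt: str,
    model: Callable[[str], str],
    validate: Callable[[str], bool],
    max_attempts: int = 3,
    fallback: str = "ESCALATE_TO_HUMAN",
) -> tuple[str, int]:
    """Retry until the output validates; otherwise return a fixed fallback."""
    for attempt in range(1, max_attempts + 1):
        output = model(prompt)
        if validate(output):
            return output, attempt
    # Callers always receive either a validated output or the known fallback.
    return fallback, max_attempts

# The first two outputs fail validation (not all digits); the third passes.
model = scripted_model(["forty-two", "", "42"])
print(constrained_call("How many?", model, str.isdigit))  # ('42', 3)
```

The caller never sees an unvalidated output, so downstream code can depend on the validator's contract regardless of how the model behaves on a given call.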

"Safe score" in LLM judges acts as a simple guardrail, not an adversarial (red teaming) metric.
Databricks' differentiation from competitors is rooted in deep integration between data and AI rather than in point solutions.
Encouragement is given to use composable, agent-based architectures for fine-grained control and error mitigation.
Final thanks and invitation for further discussion.

Spotlight on Databricks: Driving data intelligence with AI