Enterprises are key to realizing AI's broad impact due to their massive reach and resources.
Slow enterprise adoption is often cited as evidence that AI is hype; in fact, enterprise involvement is the real signal of industry transformation.
The biggest value is unlocked when enterprises move from buying vertical solutions (like AI for sales or marketing) to building with AI themselves.
The journey typically begins with enterprises using closed models (OpenAI, Anthropic), often via their own dedicated cloud deployments for security and privacy reasons.
Most enterprises rely on existing predictive ML teams to spearhead initial AI projects with these platforms.
In 2023, AI adoption was mostly experimental or for "toying around"; by 2024, 40–50% of companies in discussions had production use cases.
In 2025, cracks emerged in the assumption that enterprises could always rely on closed "frontier" models for all needs.
Vendor lock-in is not usually a concern: multiple major providers offer similar, largely interchangeable products, so switching costs stay low.
Ballooning API costs were not widely cited at first, given plummeting token prices, but have become an issue with the rise of complex, agentic use cases that generate many inference calls per action (see the back-of-envelope sketch below).
Security, privacy, and compliance are largely addressed by providers via dedicated deployments.
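For a sense of scale, here is a back-of-envelope sketch in Python; every price and call count in it is a hypothetical assumption for illustration, not a quoted figure.

```python
# Back-of-envelope sketch (all prices and counts hypothetical): why agentic
# workloads balloon API spend even as per-token prices fall.

PRICE_IN = 3.00    # USD per 1M input tokens (assumed list price)
PRICE_OUT = 15.00  # USD per 1M output tokens (assumed list price)

def cost_usd(input_tokens: int, output_tokens: int) -> float:
    return input_tokens / 1e6 * PRICE_IN + output_tokens / 1e6 * PRICE_OUT

# One chat completion vs. one agentic "action" that fans out into many
# planning / tool-use / reflection calls with a growing context window.
simple_call = cost_usd(input_tokens=2_000, output_tokens=500)
calls_per_action = 25  # assumed fan-out of the agent loop
agentic_action = calls_per_action * cost_usd(input_tokens=8_000, output_tokens=500)

print(f"simple call:    ${simple_call:.4f}")
print(f"agentic action: ${agentic_action:.2f} "
      f"(~{agentic_action / simple_call:.0f}x the simple call)")
```

Under these assumptions one agentic action costs roughly 58x a single chat completion, which is why per-token price drops alone do not keep the bill flat.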
The "Cracks" in Relying Only on Closed Models 06:32
Quality: Generalist models aren't always best; on specialized tasks (e.g., medical document extraction, transcription of medical jargon) in-house or custom models can outperform them, especially when companies hold large labeled datasets (see the fine-tuning sketch after this list).
Latency: API-based models tuned for high throughput often sacrifice latency, which becomes a problem for latency-sensitive real-time and voice applications.
Unit Economics: As usage scales and agentic use cases proliferate, costs rise dramatically. Enterprises see potential to lower costs and control pricing by running their own models.
Differentiation ("Destiny"): Enterprises worry that using the same off-the-shelf models as competitors yields no competitive advantage. In-house or customized models can become a source of differentiation.
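To make the quality point concrete, a minimal fine-tuning sketch using Hugging Face transformers. The dataset name and splits, the base model, and the label count are all hypothetical placeholders, not a recipe from the talk.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# "acme/medical-extraction" is a hypothetical in-house dataset assumed to
# have "text" and "label" columns and train/validation splits.
dataset = load_dataset("acme/medical-extraction")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=8)  # assumed 8 document classes

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3,
                           per_device_train_batch_size=32),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
```

The point of the sketch is the shape of the investment: with a large labeled corpus, a small specialized model like this can beat a generalist API on a narrow task at a fraction of the serving cost.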
Challenges ("Dragons") of Running Open Models Internally 10:45
Transitioning from calling a simple API to operating in-house models requires significant investment in inference infrastructure.
Performance: Achieving high throughput and low latency (especially for agentic or latency-sensitive use cases) requires ongoing optimization at both the model and infrastructure levels, e.g., speculative decoding, prefix caching, and disaggregated serving (a conceptual sketch follows below).
Reliability: Achieving high-availability targets (e.g., "four nines" uptime, 99.99%, roughly 52 minutes of downtime per year) is challenging. Failures and hardware issues have major impact unless mitigated, and scaling up to handle traffic spikes can itself be slow and problematic.
Observability and Control: Mission-critical applications need sophisticated tooling, lifecycle management, auditing, and compliance; the requirements go far beyond basic logs and metrics (a minimal telemetry sketch follows below).
Speed of Engineering: Building internal capabilities without slowing down innovation or overprovisioning resources is complex and resource-intensive.
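To make the performance point concrete, a runnable conceptual sketch of speculative decoding, one of the techniques named above. Both "models" are toy stubs so the control flow executes as-is, and the acceptance rule is a simplified greedy-agreement variant of the rejection-sampling scheme used in real systems.

```python
# Conceptual sketch of speculative decoding: a cheap draft model proposes
# k tokens, the expensive target model verifies them in one pass, and the
# longest agreeing prefix is kept.

def draft_model(prefix: list[str], k: int) -> list[str]:
    # Stub for a small, fast draft model proposing the next k tokens.
    return [f"tok{len(prefix) + i}" for i in range(k)]

def target_model(prefix: list[str], proposed: list[str]) -> list[str]:
    # Stub for the large target model: the token it would emit at each
    # proposed position (here it happens to always agree with the draft).
    return [f"tok{len(prefix) + i}" for i in range(len(proposed))]

def speculative_decode(prompt: list[str], new_tokens: int, k: int = 4) -> list[str]:
    tokens = list(prompt)
    while len(tokens) < len(prompt) + new_tokens:
        proposed = draft_model(tokens, k)
        target = target_model(tokens, proposed)
        # Count how many draft tokens the target agrees with.
        n_agree = 0
        for d, t in zip(proposed, target):
            if d != t:
                break
            n_agree += 1
        if n_agree < len(proposed):
            # Keep the agreed prefix plus the target's own token at the
            # first divergence, so every round still makes progress.
            tokens.extend(target[:n_agree + 1])
        else:
            tokens.extend(target)  # draft fully accepted: k tokens per call
    return tokens

print(speculative_decode(["<bos>"], new_tokens=8))
```

When the draft agrees often, each expensive target call yields up to k tokens instead of one, which is the latency win the talk alludes to.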
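And for the observability point, a minimal per-request telemetry sketch using only the Python standard library; the field names and values are illustrative assumptions, and a real deployment would ship these records to a tracing or metrics system rather than stdout.

```python
# Minimal sketch of per-request inference telemetry, stdlib only.
import json
import time
import uuid

def log_inference(model_id: str, model_version: str,
                  prompt_tokens: int, completion_tokens: int,
                  latency_ms: float, status: str) -> None:
    record = {
        "request_id": str(uuid.uuid4()),
        "ts": time.time(),
        "model_id": model_id,
        "model_version": model_version,   # pin versions for auditability
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "latency_ms": round(latency_ms, 1),
        "status": status,                 # e.g. "ok", "timeout", "oom"
    }
    print(json.dumps(record))             # stand-in for a real log sink

# Hypothetical call for a single request to an in-house model.
log_inference("in-house-extractor", "2024-11-07", 1843, 212, 341.7, "ok")
```

Structured records like this are the floor, not the ceiling: lifecycle management, audit trails, and compliance reporting all build on top of them.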