Ollama Gets a Turbo Update

Ollama’s Birthday and New App Launch 00:00

  • Ollama celebrated its second birthday at the ICML conference in Vancouver, where it hosted a booth and a party.
  • The main announcement was a new desktop app that goes beyond the existing menu-bar interface in usability and features.
  • The new app’s interface lets users browse and chat with multiple models directly.
  • First-time use of any model triggers an automatic download within the app (a script-level equivalent is sketched below).
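As a rough script-level parallel to the app’s first-use behavior, the official `ollama` Python package (pip install ollama) can pull a model on demand and then chat with it. The model tag below is illustrative; any tag from the Ollama library works the same way.

    # Minimal sketch: mirror the app's first-use behavior by pulling the
    # model, then chatting with it. "gemma3" is an illustrative tag.
    import ollama

    model = "gemma3"

    ollama.pull(model)  # downloads the model if it is not already on disk

    response = ollama.chat(
        model=model,
        messages=[{"role": "user", "content": "Why is the sky blue?"}],
    )
    print(response["message"]["content"])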

Key Features and User Experience 01:10

  • The interface works much like ChatGPT, supporting chat-style interactions and showing a “thinking” status while responses are generated.
  • Supports retrieval-augmented generation (RAG): users can chat with PDFs, images, and other files as source context.
  • Users can drag and drop multiple files into a model’s context, such as slides or books, though context limits may apply.
  • The app provides settings for adjusting model context size and managing file locations (see the sketch after this list).
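The app handles PDF and image parsing itself, and the video does not detail how. As a hedged approximation for plain-text sources, a script can inline a file’s contents into the prompt and enlarge the context window via the standard `num_ctx` option. The filename and context size here are made up for illustration.

    # Hedged approximation of "chat with a file": inline plain text as
    # context. The app does this (plus PDF/image parsing) through its GUI;
    # this is only a script-level analogue. File name and num_ctx value
    # are illustrative.
    import ollama
    from pathlib import Path

    notes = Path("slides.txt").read_text()  # hypothetical exported notes

    response = ollama.chat(
        model="gemma3",
        messages=[{
            "role": "user",
            "content": f"Using these notes as context:\n\n{notes}\n\nSummarize the key points.",
        }],
        options={"num_ctx": 8192},  # raise the context window for long documents
    )
    print(response["message"]["content"])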

Introduction of Turbo Mode (Cloud Models) 02:58

  • Previously, Ollama only ran smaller models locally; “turbo mode” adds access to larger models hosted in the cloud.
  • Turbo mode enables fast interaction with models like Kimi K2 directly from the app, with no personal GPU or API setup required (a speculative client-side sketch follows this list).
  • To the presenter’s knowledge, conversations using turbo mode are not stored in the cloud.
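The video does not show turbo mode’s wire protocol. If the hosted service speaks the same API as a local Ollama server, pointing the client at ollama.com with an account credential might be enough; the host URL, auth header, and model tag below are all assumptions, not confirmed details.

    # Speculative sketch: assumes turbo mode exposes the normal Ollama chat
    # API at a hosted endpoint. Host URL, auth header, and model tag are
    # assumptions for illustration only.
    from ollama import Client

    client = Client(
        host="https://ollama.com",                           # assumed hosted endpoint
        headers={"Authorization": "Bearer <YOUR_API_KEY>"},  # placeholder credential
    )

    response = client.chat(
        model="kimi-k2",  # hypothetical tag for the hosted Kimi K2 model
        messages=[{"role": "user", "content": "Hello from turbo mode"}],
    )
    print(response["message"]["content"])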

Account Requirements and Credits 04:16

  • Turbo mode requires creating an ollama.com account.
  • A free plan grants 10,000 credits (interpreted as tokens) per 7 days.
  • Users can upgrade to a Pro plan, though pricing details were not finalized in this pre-release version.
  • The founders intend turbo mode not as a major source of revenue but as a way to fill gaps for users who need larger models.

Model Options and Future Outlook 05:07

  • Users have access to the Kimi K2 model and the Qwen 3 mixture-of-experts models (both large and small variants).
  • The app allows importing custom models to work with images and PDFs.
  • Ollama’s app now runs on its own engine, not just llama.cpp.
  • The new app aims to appeal to users preferring a graphical interface over the command line.
  • The presenter anticipates that future updates will bring additional features.
  • The release offers a quick way to test both local and large open models before integrating with other APIs or systems; a short sketch of swapping between model tags follows.
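Because local and hosted models share one chat interface, “testing before integrating” can amount to swapping a model tag in a loop. The tags below are illustrative, not a confirmed list of what is offered.

    # Sketch: trying several models is just a tag swap against one chat
    # call. Tags are illustrative; substitute whatever is installed or
    # available to your account.
    import ollama

    prompt = [{"role": "user", "content": "Suggest one test case for a RAG pipeline."}]

    for model in ("qwen3", "qwen3:32b"):  # small vs. larger variant (illustrative)
        reply = ollama.chat(model=model, messages=prompt)
        print(f"--- {model} ---")
        print(reply["message"]["content"])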