The 120B model is around 60GB in size; the 20B model is about 11GB.
Both are mixture-of-experts (MoE) models; each token activates only a fraction of the parameters, about 5.1 billion active parameters for the 120B model.
The 20B model runs smoothly on laptops and may even run on phones; the 120B model can severely tax a laptop’s memory and is much better suited to desktops or the cloud.
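To illustrate how approachable the 20B model is locally, here is a minimal sketch of running it with the Hugging Face transformers library and the openai/gpt-oss-20b checkpoint; memory needs and exact behavior depend on your hardware, backend, and library version, so treat this as a starting point rather than a definitive recipe.

```python
# Minimal local-inference sketch (assumes a recent transformers release and
# enough GPU/CPU memory for the ~11GB checkpoint; details vary by backend).
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",   # Hugging Face model ID for the 20B release
    device_map="auto",            # spread weights across available devices
    torch_dtype="auto",           # keep the checkpoint's native precision
)

messages = [{"role": "user", "content": "Summarize mixture-of-experts in two sentences."}]
out = generator(messages, max_new_tokens=128)
print(out[0]["generated_text"])
```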
Cloud services like T3 Chat offer fast generation: the 20B model is free to use, while the 120B model requires a subscription.
These are not the “Horizon” models (which are especially strong at coding), and they perform less impressively on some code-generation tasks.
Tool-calling reliability varies widely across providers due to differences in how each one implements the custom “Harmony” response format.
Error rates and edge cases (such as omitting required JSON fields in tool calls) appear more frequently with the 20B model.
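To make that failure mode concrete, here is a hypothetical tool-calling request against an OpenAI-compatible endpoint; the base_url, model name, and get_weather tool are illustrative, and each provider translates this request into the Harmony format differently, which is where the inconsistencies creep in.

```python
# Hypothetical tool-calling sketch against an OpenAI-compatible server
# hosting gpt-oss-20b. The endpoint, model name, and tool are made up.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],  # the kind of required field the 20B model sometimes omits
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-oss-20b",
    messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```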
Specific benchmarks, like “SnitchBench” (how readily a model reports the user to authorities) and “Skatebench” (knowledge of skateboarding trick names), show the GPT-OSS models performing better than many Chinese models but below the top proprietary models.
Benchmark Results & Comparison to Other Models 20:24
OpenAI claims the 120B model roughly matches o3/o4-mini on reasoning and tool use, and that the 20B model is comparable to o3-mini.
On specialized tests (e.g., HealthBench), the 120B model even outperforms o4-mini.
The 20B model offers performance comparable to o3-mini despite being much smaller.
Benchmarks by Artificial Analysis place the 120B model between Qwen 3 (235B) and Gemini 2.5 Flash in general intelligence.
Cost is a significant advantage: the open models are much cheaper than proprietary ones, at roughly 15–25 cents per million input tokens for the larger model.
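A quick back-of-the-envelope check of what that pricing means in practice; the token count and the exact per-million rate below are illustrative, taken from the middle of the quoted range.

```python
# Rough cost estimate at 15-25 cents per million input tokens (120B model).
input_tokens = 50_000           # e.g., a long document plus prompt (illustrative)
price_per_million = 0.20        # USD, midpoint of the quoted range
cost = input_tokens / 1_000_000 * price_per_million
print(f"${cost:.4f}")           # ~$0.01 for 50K input tokens
```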
Technical Insights: Architecture and Capabilities 28:07
Both models are text-only and sparsely activate parameters for efficiency.
120B model: 36 layers with 64 query attention heads per layer; 20B model: 24 layers.
Rotary position embeddings (RoPE) with YaRN scaling extend the context window to 128K tokens.
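For context, this is roughly what a YaRN-style RoPE configuration looks like in Hugging Face transformers terms; the specific values below (base frequency, scaling factor, original window) are assumptions for illustration, not figures quoted from the released config.

```python
# Illustrative YaRN/RoPE settings in Hugging Face transformers config terms.
# Values are assumed for the sketch, not copied from the released config.
rope_config = {
    "rope_theta": 150_000,                   # rotary base frequency (assumed)
    "rope_scaling": {
        "rope_type": "yarn",
        "factor": 32.0,                      # assumed: 4K native window scaled 32x
        "original_max_position_embeddings": 4096,
    },
    "max_position_embeddings": 131_072,      # 128K-token context window
}
scaling = rope_config["rope_scaling"]
print(scaling["factor"] * scaling["original_max_position_embeddings"])  # 131072.0
```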
The models activate only a small percentage of their parameters per token, with higher sparsity at the larger size.
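A quick check of that sparsity claim, assuming the commonly reported parameter counts (117B total / 5.1B active for the 120B model, 21B total / 3.6B active for the 20B model):

```python
# Active-parameter fraction per token, using commonly reported counts
# (assumed here; only the 120B model's 5.1B active figure appears above).
models = {
    "gpt-oss-120b": (117e9, 5.1e9),
    "gpt-oss-20b": (21e9, 3.6e9),
}
for name, (total, active) in models.items():
    print(f"{name}: {active / total:.1%} of parameters active per token")
# gpt-oss-120b: ~4.4%, gpt-oss-20b: ~17.1%, so sparsity is indeed
# higher at the larger size.
```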
OpenAI's open-weight models are significant for open-source AI: they combine competitive intelligence, reasonable size, affordability, and strong instruction following.
They can run on consumer hardware, making advanced AI more accessible.
Future comparisons with upcoming models (like GPT-5) and further benchmarking are expected.