OpenAI's New OPEN Models - GPT-OSS 120B & 20B
Introduction and Model Overview 00:00
OpenAI has released two new open weight models: a 120 billion parameter model and a 20 billion parameter model.
The models are released under an Apache 2.0 license, allowing broad commercial use, modification, and redistribution.
There is discussion over the use of “open source” in the naming; the models are more accurately “open weight” rather than fully open source, since base models, training code, checkpoints, and datasets are not included.
Model Details and Design Choices 02:49
The models are trained using methods similar to those behind GPT-3 and GPT-4, including supervised fine-tuning, instruction tuning, and reinforcement learning.
The 120B model targets cloud and high-GPU environments, while the 20B model is intended for local use on personal machines.
Comparisons are made between these models and OpenAI's proprietary ones: the 120B is likened to o4-mini and the 20B to o3-mini.
Both models are especially suitable for agentic workflows such as instruction following, tool use, web search, and code execution.
Three levels of reasoning effort (low, medium, high) can be set via system prompt, balancing latency and performance requirements.
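For gpt-oss the effort level is pinned in the system prompt itself. A minimal sketch, assuming the "Reasoning: high" wording described in the model card (verify the exact phrasing against your runtime's documentation):

```python
# Sketch: setting reasoning effort through the system prompt.
# The "Reasoning: <level>" line follows the gpt-oss model card convention;
# treat the exact wording as an assumption if your runtime differs.

def build_messages(user_prompt: str, effort: str = "medium") -> list[dict]:
    """Build a chat message list with the reasoning effort pinned in the system prompt."""
    assert effort in ("low", "medium", "high")
    return [
        {"role": "system", "content": f"Reasoning: {effort}"},
        {"role": "user", "content": user_prompt},
    ]

messages = build_messages("Summarize the MoE architecture.", effort="high")
```

Higher effort trades latency for longer internal reasoning; "low" is the sensible default for latency-sensitive agent loops.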
Technical Architecture and Multilingual Capabilities 06:04
Both models use a Mixture of Experts (MoE) architecture: the 120B model activates about 5B parameters per token, the 20B about 3.6B.
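The gap between total and active parameters comes from the router selecting only a few experts per token. A toy top-k routing sketch (illustrative only; the real gpt-oss router internals are not described in this summary):

```python
import math

# Minimal sketch of Mixture-of-Experts top-k routing.
# Only the selected experts run for a given token, which is why
# active parameters per token are far fewer than total parameters.

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(router_logits, k=2):
    """Pick the top-k experts and renormalize their gate weights."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top)
    return [(i, probs[i] / total) for i in top]

# e.g. 8 experts, activate 2 per token
picked = route([0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3], k=2)
```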
Rotary positional embeddings are implemented, supporting up to a 128K context window (though likely trained up to 32K).
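Rotary embeddings encode position by rotating pairs of channels, which is what lets the context window be extended beyond the trained length. A simplified sketch with illustrative dimensions (real implementations work on query/key tensors, not flat lists):

```python
import math

# Sketch of rotary positional embeddings (RoPE): each channel pair is
# rotated by a position-dependent angle, so attention scores end up
# depending on relative positions. Rotation preserves vector norms.

def rope(vec, position, base=10000.0):
    """Apply a rotary embedding to a flat vector of even length."""
    out = []
    d = len(vec)
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)  # lower frequency for later pairs
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

rotated = rope([1.0, 0.0, 0.0, 1.0], position=3)
```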
Both models are largely English-only at release; more multilingual capabilities may come in later versions.
Benchmarks and Performance 08:07
Benchmark comparisons are limited to OpenAI’s own models; broader comparisons to other models are not included yet.
Performance with tools is notably higher than without tools, indicating strength for agentic applications.
Strong results are shown for function calling and reasoning tasks; longer reasoning chains yield higher accuracy.
Potential benchmark overfitting is raised as a concern, since the smaller model sometimes outperforms the larger one.
Using the Models: Cloud and Local Deployment 10:36
Access is available via OpenRouter with various providers, making it simple to try the models in the cloud.
The API includes a "reasoning effort" setting (low, medium, high) that affects the depth and thoroughness of responses.
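Through a cloud router the effort can be passed as a request field rather than in the system prompt. A sketch of the request body, following OpenRouter's convention (the `reasoning` field shape and the model slug are assumptions; check your provider's docs):

```python
import json

# Sketch of a cloud chat-completions request body with the
# reasoning-effort knob set as a request field.

payload = {
    "model": "openai/gpt-oss-120b",  # model slug is an assumption
    "messages": [{"role": "user", "content": "Plan a three-step web search."}],
    "reasoning": {"effort": "high"},  # one of "low" | "medium" | "high"
}
body = json.dumps(payload)
```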
Models often present answers in table format and have personality traits similar to other OpenAI models.
Running locally on a GPU is possible; installing Triton kernels is recommended to handle the models' native 4-bit (MXFP4) quantization, which reduces hardware requirements.
The 20B model can be run on systems with 16GB RAM, while 120B requires more substantial hardware.
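The memory savings come from storing each weight in 4 bits plus a shared scale. A toy round-trip below (a simple symmetric int4 scheme for illustration, not the actual MXFP4 format the released weights use):

```python
# Illustrative 4-bit quantization round-trip: 4-bit storage cuts weight
# memory roughly 4x versus 16-bit floats, at some precision cost.
# This is NOT the MXFP4 format gpt-oss ships with, just the idea.

def quantize4(weights):
    """Map floats to signed 4-bit ints (-8..7) with one shared scale."""
    scale = max(abs(w) for w in weights) / 7.0 or 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize4(q, scale):
    return [v * scale for v in q]

q, scale = quantize4([0.5, -1.2, 0.9, 0.0])
approx = dequantize4(q, scale)
```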
API, SDKs, and Developer Tooling 14:32
The new OpenAI Harmony response format is supported, with an accompanying SDK that helps structure the response format and manage roles (system, developer, user).
Knowledge cutoff for the models is June 2024.
The Harmony library automates insertion of system context such as the knowledge cutoff and current date into the prompt.
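A hand-rolled sketch of what that rendered prompt looks like. The special tokens follow the published Harmony format, but in practice the openai-harmony library renders this for you, so treat the exact layout here as illustrative:

```python
from datetime import date

# Sketch of the Harmony chat format: each turn is wrapped in
# <|start|>{role}<|message|>{content}<|end|> tokens, and the system
# turn carries knowledge cutoff, current date, and reasoning level.

def render_turn(role: str, content: str) -> str:
    return f"<|start|>{role}<|message|>{content}<|end|>"

system = render_turn(
    "system",
    f"Knowledge cutoff: 2024-06\n"
    f"Current date: {date.today().isoformat()}\n"
    f"Reasoning: medium",
)
prompt = system + render_turn("user", "Hello!")
```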
Observations from Local Testing 16:06
When tested locally (e.g., via Ollama), the model's thinking is output as summaries rather than detailed chain-of-thought steps.
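For reproducing the local test, the request shape for Ollama's documented chat endpoint is sketched below (the `gpt-oss:20b` tag is an assumption about the local model name; no network call is made here):

```python
import json

# Sketch of a request to a local Ollama server's /api/chat endpoint.
# Ollama returns the model's thinking separately from the final answer,
# which is why local output reads as a summary plus a response.

request = {
    "url": "http://localhost:11434/api/chat",  # Ollama's default port
    "body": {
        "model": "gpt-oss:20b",  # assumed local model tag
        "messages": [{"role": "user", "content": "Why is the sky blue?"}],
        "stream": False,
    },
}
encoded = json.dumps(request["body"])
```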
Response thoroughness is high but response speed is relatively slow compared to some alternatives.
Output quality is comparable to larger models from other vendors.
Agentic Use Cases and Final Thoughts 19:06
Initial agentic function tests (with tools and agentic frameworks) show promising results.
Release increases pressure on other major labs, especially in the West, to offer open models.
The move may be partly timed to precede the upcoming GPT-5 launch.
The community is encouraged to experiment, provide feedback on strengths and weaknesses, and explore agentic applications such as code and tool use.