OpenAI's New OPEN Models - GPT-OSS 120B & 20B

Introduction and Model Overview 00:00

  • OpenAI has released two new open-weight models: a 120 billion parameter model (gpt-oss-120b) and a 20 billion parameter model (gpt-oss-20b).
  • The models are released under the Apache 2.0 license, which permits broad commercial and research use.
  • There is discussion over the use of “open source” in the naming; the models are more accurately “open weight” than fully open source, since the base models, training code, checkpoints, and datasets are not included.

Model Details and Design Choices 02:49

  • The models are trained with methods similar to those used for GPT-3 and GPT-4, including reinforcement learning, supervised fine-tuning, and instruction tuning.
  • The 120B model targets cloud and high-end GPU deployments, while the 20B model is intended for local use on personal machines.
  • Comparisons are made to OpenAI’s own models: the 120B is likened to o4-mini and the 20B to o3-mini.
  • Both models are especially suitable for agentic workflows such as instruction following, tool use, web search, and code execution.
  • Three levels of reasoning effort (low, medium, high) can be set via the system prompt, trading latency against performance (see the sketch after this list).
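
A minimal sketch of how the effort level might be selected through the system prompt, assuming an OpenAI-compatible server is hosting the model; the endpoint URL and model name below are placeholders:

```python
# Sketch: choosing a reasoning effort through the system prompt.
# The base_url and model name are placeholders for whatever server
# (cloud or local) is hosting gpt-oss.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

for effort in ("low", "medium", "high"):
    resp = client.chat.completions.create(
        model="gpt-oss-20b",
        messages=[
            # gpt-oss reads the effort level from the system prompt;
            # higher effort means longer reasoning and higher latency.
            {"role": "system", "content": f"Reasoning: {effort}"},
            {"role": "user", "content": "How many primes are below 30?"},
        ],
    )
    print(effort, "->", resp.choices[0].message.content)
```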

Technical Architecture and Multilingual Capabilities 06:04

  • Both models use a Mixture of Experts (MoE) architecture: the 120B model activates roughly 5.1B parameters per token, and the 20B roughly 3.6B (see the toy sketch after this list).
  • Rotary positional embeddings (RoPE) are used, supporting a context window of up to 128K tokens (though the models were likely trained at up to 32K).
  • Both models are largely English-only at release; stronger multilingual capabilities may come in later versions.
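
To make the active-versus-total parameter distinction concrete, here is a toy routing sketch (the sizes and routing scheme are illustrative, not gpt-oss internals): only the top-k experts run for each token, so compute scales with the active count rather than the total.

```python
# Toy Mixture-of-Experts routing (illustrative sizes, not gpt-oss internals):
# only the top-k experts run per token, so the "active" parameter count
# is a small fraction of the total parameter count.
import numpy as np

n_experts, top_k, d = 32, 4, 8                               # toy sizes
experts = [np.random.randn(d, d) for _ in range(n_experts)]  # expert weights
router = np.random.randn(d, n_experts)                       # router projection

def moe_forward(x: np.ndarray) -> np.ndarray:
    logits = x @ router                     # score every expert for this token
    top = np.argsort(logits)[-top_k:]       # keep only the top-k experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                # softmax over the selected experts
    # Only top_k of the n_experts weight matrices are touched per token.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = np.random.randn(d)
print(moe_forward(token).shape)  # (8,) — same width, ~1/8 of the expert FLOPs
```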

Benchmarks and Performance 08:07

  • Benchmark comparisons are limited to OpenAI’s own models; broader comparisons against other vendors’ models are not yet included.
  • Performance with tools is notably higher than without tools, indicating strength for agentic applications.
  • Strong results are shown for function calling and reasoning tasks; longer reasoning chains yield higher accuracy.
  • Potential benchmark overfitting is raised as a question, since the smaller model sometimes outperforms the larger one.

Using the Models: Cloud and Local Deployment 10:36

  • Access is available via OpenRouter with various providers, making it simple to try the models in the cloud (see the example after this list).
  • The API exposes a “reasoning effort” setting (low, medium, high) that controls the depth and thoroughness of responses.
  • Models often present answers in table format and have personality traits similar to other OpenAI models.
  • Running locally on a GPU is possible; installing Triton is recommended to support the MXFP4 4-bit quantization of the MoE weights, which reduces hardware requirements.
  • The 20B model can run on systems with 16GB of memory, while the 120B requires more substantial hardware (it is sized to fit a single 80GB GPU).
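
A minimal sketch of the cloud path through OpenRouter’s OpenAI-compatible API; the model slug and the unified `reasoning` request field are assumptions based on OpenRouter’s conventions, so verify both against the current docs:

```python
# Sketch: calling gpt-oss through OpenRouter's OpenAI-compatible API.
# The model slug and "reasoning" request field are assumptions based on
# OpenRouter's conventions; check the current docs before relying on them.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

resp = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "Summarize MoE routing in two sentences."}],
    extra_body={"reasoning": {"effort": "medium"}},  # low | medium | high
)
print(resp.choices[0].message.content)
```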

API, SDKs, and Developer Tooling 14:32

  • The models use OpenAI’s new Harmony response format, supported by an openai-harmony library that structures prompts and responses and manages roles (system, developer, user).
  • Knowledge cutoff for the models is June 2024.
  • The Harmony format automates insertion of system metadata such as the model identity, knowledge cutoff, and current date (see the sketch below).
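
A minimal sketch using the openai-harmony Python package (`pip install openai-harmony`), following its documented interface; the calls should be checked against the library’s README, and the inference step is left out:

```python
# Sketch: rendering a role-structured conversation with openai-harmony.
# Calls follow the package's documented interface; verify against the
# library's README before use.
from openai_harmony import (
    Conversation,
    DeveloperContent,
    HarmonyEncodingName,
    Message,
    Role,
    SystemContent,
    load_harmony_encoding,
)

enc = load_harmony_encoding(HarmonyEncodingName.HARMONY_GPT_OSS)

convo = Conversation.from_messages([
    # SystemContent.new() supplies the standard system header
    # (model identity, knowledge cutoff, and similar metadata).
    Message.from_role_and_content(Role.SYSTEM, SystemContent.new()),
    Message.from_role_and_content(
        Role.DEVELOPER,
        DeveloperContent.new().with_instructions("Answer concisely."),
    ),
    Message.from_role_and_content(Role.USER, "What is MXFP4 quantization?"),
])

# Token ids ready to hand to an inference engine for completion.
tokens = enc.render_conversation_for_completion(convo, Role.ASSISTANT)
print(len(tokens))
```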

Observations from Local Testing 16:06

  • When tested locally (e.g., via Ollama), the model’s thinking is output as summaries rather than detailed chain-of-thought steps (see the sketch after this list).
  • Responses are thorough, but generation speed is relatively slow compared to some alternatives.
  • Output quality is comparable to larger models from other vendors.
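
For reference, local testing of the kind described above can go through Ollama’s OpenAI-compatible endpoint; the model tag below is an assumption taken from Ollama’s library listing:

```python
# Sketch: querying a locally served model through Ollama's
# OpenAI-compatible endpoint (after e.g. `ollama pull gpt-oss:20b`;
# the tag is an assumption from Ollama's model library).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

resp = client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "Explain RoPE in one paragraph."}],
)
# Locally, the visible "thinking" arrives as a summary rather than the
# full chain of thought, matching the observation above.
print(resp.choices[0].message.content)
```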

Agentic Use Cases and Final Thoughts 19:06

  • Initial agentic function tests (with tools and agentic frameworks) show promising results; a minimal tool-calling sketch follows this list.
  • Release increases pressure on other major labs, especially in the West, to offer open models.
  • The move may be partly timed to precede the upcoming GPT-5 launch.
  • The community is encouraged to experiment, provide feedback on strengths and weaknesses, and explore agentic applications such as code and tool use.
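
A minimal tool-calling sketch over the standard chat-completions tools interface; the weather function, endpoint, and model tag are all hypothetical placeholders:

```python
# Sketch: one round of agentic function calling via the standard
# chat-completions tools interface. The get_weather tool, endpoint,
# and model tag are hypothetical placeholders.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

msg = resp.choices[0].message
if msg.tool_calls:  # the model decided to call the tool
    call = msg.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))
    # A real agent would run the tool and send the result back in a
    # "tool" role message so the model can produce its final answer.
else:
    print(msg.content)
```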