OpenAI's mystery models are insane...

Introduction and New OpenAI Models 00:00

  • OpenAI has released a new mystery model called o3-alpha, visible on LMArena.
  • o3-alpha appears to be extremely capable at coding, and may be linked to OpenAI's recent standout result at a top coding competition.
  • Additional experimental models from OpenAI have achieved gold at the International Math Olympiad (IMO), marking a significant achievement in AI reasoning.

o3-alpha in Coding Competitions 00:28

  • o3-alpha's metadata on LMArena: model ID o3-alpha-responses-2025-07-17, provider OpenAI (a hypothetical API call is sketched after this list).
  • The model can build complex games (e.g., Space Invaders, a 3D Pokédex, Doom) with a notable jump in polish and features over previous models.
  • At the AtCoder World Tour Finals 2025, an OpenAI model (likely o3-alpha) took second place; the Polish programmer known as Psyho won first after a 10-hour marathon.
  • Psyho, a former OpenAI employee, beat both the AI and every other human competitor.
  • The final leaderboard showed Psyho in first, the OpenAI model in second, and the remaining humans filling the subsequent ranks.
  • Greg Brockman, OpenAI's president and co-founder, shared live updates during the event, noting how close the race was between the AI and the leading human contestants.
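
The "responses" suffix in the model ID suggests the model is served through OpenAI's Responses API. As a minimal sketch, assuming the leaked ID were callable (it is not a publicly served model name, and the prompt here is purely illustrative):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical: "o3-alpha-responses-2025-07-17" is the ID reported on
# LMArena, not a publicly available model name.
response = client.responses.create(
    model="o3-alpha-responses-2025-07-17",
    input="Write a playable Space Invaders clone in a single HTML file.",
)

print(response.output_text)
```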

OpenAI's Experimental Math Model and Achievements 03:07

  • Soon after the coding competition, OpenAI introduced an experimental reasoning LLM that achieved gold-level performance at the 2025 International Math Olympiad.
  • The model was tested under human-equivalent conditions: two 4.5-hour exam sessions, official problem statements, no internet/tools, and natural language proof submissions.
  • IMO problems demand extended, creative reasoning; recent AI benchmarks now span reasoning time horizons from seconds (GSM8K) to roughly 100 minutes per problem (IMO level), as sketched after this list.
  • Achieving gold at the IMO demonstrates the model's capability to solve frontier math problems typically reserved for elite humans.
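
To make the "reasoning time horizon" concrete, here is a minimal sketch mapping benchmarks to roughly how long a strong human needs per problem. The minute values are rough assumptions for illustration; the orders-of-magnitude trend is the point, not the exact numbers.

```python
# Approximate human time-per-problem for math reasoning benchmarks.
# All figures are assumed, rough estimates for illustration only.
BENCHMARK_TIME_HORIZON_MIN = {
    "GSM8K (grade-school word problems)": 0.1,  # a few seconds
    "MATH (competition problems)": 1.0,
    "AIME (olympiad qualifier)": 10.0,
    "IMO (full olympiad proofs)": 100.0,        # ~1.5 hours per problem
}

for benchmark, minutes in BENCHMARK_TIME_HORIZON_MIN.items():
    print(f"{benchmark:40s} ~{minutes:g} min")
```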

Verification, Rewarding, and Generalization Challenges in AI 05:38

  • IMO submissions are multi-page proofs that no program can easily verify; current AI progress leans heavily on verifiable rewards (math and coding problems with machine-checkable solutions).
  • Moving toward valuable but hard-to-verify tasks (like IMO proofs) is challenging, since scaling them may require human or AI-based judges; the first sketch after this list contrasts the two reward regimes.
  • Recent advances suggest models can now judge one another's outputs effectively.
  • The groundbreaking results have come from general-purpose reinforcement learning and scaled-up test-time compute, not narrow, task-specific methods; the second sketch after this list shows a simple test-time scaling strategy.
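
To contrast the two reward regimes, here is a minimal sketch: a verifiable reward that a program checks exactly, versus a model-based judge for proofs no checker can score. The judge model name and grading prompt are assumptions for illustration, not OpenAI's actual setup.

```python
from openai import OpenAI

client = OpenAI()

def verifiable_reward(predicted: str, reference: str) -> float:
    """Reward for machine-checkable tasks (e.g. a final numeric answer):
    exact comparison, no judgment required."""
    return 1.0 if predicted.strip() == reference.strip() else 0.0

def judged_reward(problem: str, proof: str) -> float:
    """Reward for tasks with no programmatic checker (e.g. a multi-page
    IMO proof): fall back on an LLM judge. The model name is illustrative."""
    verdict = client.responses.create(
        model="o3",  # assumed judge model for this sketch
        input=(
            "You are grading a mathematical proof. Reply with exactly "
            f"VALID or INVALID.\n\nProblem:\n{problem}\n\nProof:\n{proof}"
        ),
    )
    return 1.0 if verdict.output_text.strip().upper().startswith("VALID") else 0.0
```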
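
And a minimal sketch of test-time compute scaling via best-of-n sampling: draw several candidate solutions and keep the one a scorer prefers. `generate` and `score` are assumed stand-ins for any sampler and any reward function (such as the ones above), not a specific OpenAI API.

```python
import random
from typing import Callable

def best_of_n(
    generate: Callable[[str], str],      # samples one candidate solution
    score: Callable[[str, str], float],  # verifier or judge, as above
    problem: str,
    n: int = 8,
) -> str:
    """Sample n candidates and return the best-scoring one: more inference
    compute (larger n) buys a higher chance of a correct solution."""
    candidates = [generate(problem) for _ in range(n)]
    return max(candidates, key=lambda c: score(problem, c))

# Toy usage with stand-in functions:
if __name__ == "__main__":
    gen = lambda p: f"candidate {random.randint(0, 9)}"
    scr = lambda p, c: float(c.endswith("7"))  # pretend 7 is correct
    print(best_of_n(gen, scr, "toy problem"))
```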

The Bitter Lesson in AI and Path to Superintelligence 06:44

  • Richard Sutton's "bitter lesson": the best long-run AI results come from general methods that scale with computation, not from approaches built on hand-engineered human knowledge.
  • In both chess engines and Tesla's self-driving stack, scalable, self-taught neural networks have consistently overtaken systems built on hand-coded rules.
  • OpenAI's recent advances are seen as applying this lesson: embrace large-scale RL and reduce reliance on hand-crafted, task-specific training.

Details and Future of OpenAI’s High-Performing Models 08:10

  • The new math model solved 5 out of 6 IMO 2025 problems; each proof was graded by three independent former IMO medalists, with scores finalized only on unanimous agreement.
  • OpenAI announced that GPT-5 will be released soon, but the IMO-gold model is not GPT-5 and will not be publicly released for several months.
  • OpenAI’s models have quickly advanced to near or above top human performance in both coding and advanced mathematical reasoning competitions.
  • The video concludes by emphasizing the ongoing rapid acceleration in AI capability advancement.