OpenAI has released a new mystery model called o3 Alpha, visible on LM Arena.
o3 Alpha appears to be extremely capable at coding, potentially linked to OpenAI's recent standout result at a top coding competition.
A separate experimental OpenAI model has achieved gold-medal-level performance at the International Mathematical Olympiad (IMO), marking a significant achievement in AI reasoning.
o3 Alpha's metadata: model ID o3-alpha-responses-2025-07-17, provider OpenAI.
The model can build complex games (e.g., Space Invaders, a 3D Pokedex, Doom) with notably more polish and features than previous models.
The AtCoder World Tour Finals 2025 saw an OpenAI model (likely related to o3 Alpha) take second place, with a Polish programmer known as Psyho winning first after a 10-hour marathon.
Psyho, a former OpenAI employee, defeated both the AI and the other human competitors.
The competition leaderboard showed Psyho in first, the OpenAI model in second, and the remaining humans filling the subsequent ranks.
Greg Brockman, OpenAI's president, shared live updates during the competition, noting how narrow the gap was between the AI and the leading human participants.
OpenAI's Experimental Math Model and Achievements 03:07
Soon after the coding competition, OpenAI introduced an experimental reasoning LLM that achieved gold-medal-level performance at the 2025 International Mathematical Olympiad.
The model was tested under human-equivalent conditions: two 4.5-hour exam sessions, official problem statements, no internet/tools, and natural language proof submissions.
IMO problems require sustained, creative reasoning; recent AI benchmarks now span reasoning time horizons from seconds (GSM8K) to roughly 100 minutes (IMO-level problems).
Achieving gold at the IMO demonstrates the model's capability to solve frontier math problems typically reserved for elite humans.
Verification, Rewarding, and Generalization Challenges in AI 05:38
IMO submissions are multi-page natural-language proofs that cannot easily be checked by a program; current AI progress relies heavily on verifiable rewards (math and coding problems with clear-cut solutions).
Moving toward valuable but unverifiable tasks (like IMO proofs) is challenging, as scaling may require human or AI-based judges.
Recent advances suggest models can now judge one another's outputs effectively.
Groundbreaking results have come via general-purpose reinforcement learning and increased test-time computational scaling, not narrow, task-specific methods.
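The contrast between verifiable and unverifiable rewards described above can be sketched in a few lines. This is an illustrative toy, not OpenAI's actual training setup; `toy_judge` is a made-up stand-in for a human or AI grader:

```python
# Toy sketch of two reward styles used in RL for LLMs (illustrative only).

def verifiable_reward(model_answer: str, reference_answer: str) -> float:
    """Exact-match check: works for math/coding tasks with a clear-cut answer."""
    return 1.0 if model_answer.strip() == reference_answer.strip() else 0.0

def judge_reward(proof: str, judge) -> float:
    """Multi-page proofs have no reference string to compare against, so a
    human or AI judge must score the submission instead."""
    return judge(proof)  # judge returns a score, e.g. in [0, 1]

# Hypothetical judge that only checks whether a required proof technique appears.
toy_judge = lambda proof: 1.0 if "induction" in proof else 0.0

print(verifiable_reward("42", "42"))                      # 1.0
print(judge_reward("proof by induction on n ...", toy_judge))  # 1.0
```

The design point: the first reward is cheap and objective, so it scales easily; the second depends entirely on the quality of the judge, which is why judge-based (model-grading-model) setups matter for unverifiable tasks.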
The Bitter Lesson in AI and Path to Superintelligence 06:44
Richard Sutton's "bitter lesson": the best AI results come from general methods that leverage large-scale computation rather than hand-built human knowledge.
In both chess AI and Tesla's self-driving program, moving away from hand-coded rules toward scalable, self-taught neural networks has consistently outperformed human-guided systems.
OpenAI's recent advances are seen as applying this lesson by embracing large-scale RL and reducing reliance on human-intervened training.
Details and Future of OpenAI’s High-Performing Models 08:10
The new math model solved 5 of the 6 IMO 2025 problems; each proof was graded by three former IMO medalists, with scores finalized only on unanimous agreement.
OpenAI announced that GPT-5 would be released soon, but the IMO Gold LLM is not GPT-5 and will not be publicly released for several months.
OpenAI’s models have quickly advanced to near or above top human performance in both coding and advanced mathematical reasoning competitions.
The video concludes by emphasizing the ongoing rapid acceleration in AI capability advancement.