AI models have rapidly advanced from struggling with grade school math to achieving elite-level performance on high school olympiad problems in just a few years
Recent AI benchmarks have progressed quickly from GSM8K to MATH, then AIME, and now USAMO- and IMO-level problems
OpenAI's model has attained gold-medal performance at the International Mathematical Olympiad (IMO), considered a key milestone in AI development
The quest for IMO gold has been a long-term ambition within OpenAI, with renewed focus in the months leading up to the competition
The core team consisted of just three people, but many others at OpenAI contributed in supporting roles
Researchers at OpenAI are empowered to pursue high-impact projects, and the IMO effort was initially driven by a new technique proposed by Alex
There was early skepticism about success, but promising results led to wider support
Verification and Grading of AI-Generated Proofs 05:27
Proofs generated by the model were often not human-readable but were published in their raw form for transparency
To verify correctness, OpenAI hired former IMO medalists to grade each proof, requiring unanimous agreement that a proof was correct (the rule is sketched below)
Even OpenAI researchers found the proofs too advanced to comprehend, highlighting the sophistication of the model
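A minimal sketch of the unanimity rule described above, assuming a simple boolean verdict per grader; the data structure and names are illustrative, and the real process was a manual grading workflow rather than code.

```python
# Illustrative sketch of the unanimous-grading rule: a proof counts as correct
# only if every grader independently marks it correct. Names are hypothetical.
from dataclasses import dataclass

@dataclass
class Grade:
    grader: str        # e.g. a former IMO medalist
    is_correct: bool   # that grader's verdict on the proof

def proof_accepted(grades: list[Grade]) -> bool:
    """Accept a proof only if there is at least one grader and all agree it is correct."""
    return bool(grades) and all(g.is_correct for g in grades)

# Two graders accept, one rejects -> the proof is not counted as solved.
print(proof_accepted([Grade("A", True), Grade("B", True), Grade("C", False)]))  # False
```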
Tackling the Hardest Problems: Problem 6 and Model Self-Awareness 07:49
Problem 6 at the IMO is traditionally the hardest; this year, no models, including OpenAI's, solved it
The model displayed self-awareness by declining to attempt a problem it couldn't solve, responding "no answer" rather than hallucinating a solution
This self-awareness marks an improvement over earlier models, which would fabricate convincing but incorrect answers
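A small sketch of how an evaluation harness might treat the "no answer" behavior above, using the standard 7-points-per-problem IMO scale; the scoring function is an assumption for illustration, not the actual grading pipeline.

```python
# Hypothetical handling of abstentions: an explicit "no answer" is recorded as
# an abstention (0 points) with nothing sent to graders, whereas a submitted
# proof is only credited if graders verify it.
def score_submission(submission: str, graders_verified: bool) -> int:
    if submission.strip().lower() == "no answer":
        return 0                         # honest abstention, nothing to grade
    return 7 if graders_verified else 0  # each IMO problem is worth 7 points

print(score_submission("no answer", graders_verified=False))   # 0
print(score_submission("Proof: ...", graders_verified=True))   # 7
```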
Model Strengths, Limitations, and Remaining Challenges 10:09
While models excel at certain types of problems, like geometry and stepwise reasoning, they struggle more with abstract and high-dimensional problems such as combinatorics
Internal optimism about winning IMO gold was cautious, with some team members putting their chances at less than one in three
Progress in mathematical capability has moved from seconds-long problems (GSM8K) to IMO-level problems that take humans hours, but research-level problems remain orders of magnitude more challenging
Scaling up “test-time compute” (letting models think for longer) was key to success, with per-problem reasoning time growing from roughly 0.1 minutes toward 100 minutes
As models take longer to solve problems, evaluation also takes longer, which slows progress for very long tasks
Multi-agent systems and parallel compute were leveraged, prioritizing general-purpose techniques over bespoke, narrow solutions (see the sketch after this section's bullets)
The same infrastructure and approach used for the IMO model are shared with other OpenAI projects, aiming for broad applicability
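A rough sketch of the two general-purpose ideas above: a larger per-problem thinking budget and many parallel attempts combined by a simple selection rule. The function names, the thread pool, and the majority-vote selection are illustrative assumptions, not a description of OpenAI's actual system.

```python
# Illustrative sketch (not OpenAI's system): give the model a larger reasoning
# budget per attempt, run many attempts in parallel, and pick the answer the
# attempts agree on most often.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def solve_once(problem: str, thinking_minutes: float) -> str:
    """Placeholder for one model attempt with a given reasoning budget."""
    # In a real system this would call a reasoning model with a time/token budget.
    return f"candidate answer for: {problem[:20]}"

def solve_with_parallel_attempts(problem: str, n_attempts: int = 32,
                                 thinking_minutes: float = 100.0) -> str:
    """Run attempts in parallel and return the most common (consensus) answer."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        answers = list(pool.map(lambda _: solve_once(problem, thinking_minutes),
                                range(n_attempts)))
    return Counter(answers).most_common(1)[0][0]

print(solve_with_parallel_attempts("Let n be a positive integer such that ..."))
```

Majority voting is only one simple way to combine parallel attempts; for open-ended proofs, selection would more plausibly involve ranking or verification, which is not shown here.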
Formal vs. Informal Reasoning, and Use of Natural Language 17:53
Unlike the IMO’s official AI track, which used Lean for formal proof verification, OpenAI prioritized natural language and informal reasoning for greater generality (a toy Lean example appears after this section's bullets)
Formal and informal methods are seen as complementary rather than directly competing, with the informal approach capturing the core of the difficulty while generalizing more broadly
Problems were fed to the model upon release, with the team monitoring progress overnight and manually checking proofs before sending them to graders
The model could express confidence or uncertainty in its proofs, giving the team hints about its internal "feelings" of progress
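For contrast with the informal, natural-language approach above, this is roughly what the Lean-based formal route looks like: the statement and proof are machine-checkable, so the kernel verifies correctness with no human grading. The theorem below is a toy Lean 4 example, not an IMO problem.

```lean
-- Toy Lean 4 example: the kernel mechanically checks this proof term,
-- in contrast to a natural-language proof graded by human medalists.
theorem add_comm_toy (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

Because the checker either accepts or rejects the proof term, correctness does not depend on a grader's judgment, which is the main trade-off against the more general natural-language route described above.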
Beyond Competition Math: Next Steps and Frontiers 23:46
OpenAI’s model performs even better on the Putnam exam (more knowledge-heavy, less time per problem) than on IMO problems
The next frontier is solving problems that require much longer reasoning, akin to research-level tasks (hundreds to thousands of hours of human effort)
One gap remains: generating novel problems is still a human-intensive task, but the team sees no fundamental barriers to AI eventually doing this as well
The focus was on developing general-purpose techniques, with the expectation that these will improve AI reasoning in fields beyond math
Incorporation of these advances into broader OpenAI models is ongoing and will take more time
The Physics Olympiad presents additional challenges due to its experimental tasks, which current models cannot yet handle
Model Release and Collaboration with Mathematicians 28:03
OpenAI aims to make the math-capable model accessible to mathematicians, but details are still being worked out
Ongoing dialogue with researchers includes testing the model on unsolved math problems, with the model’s growing ability to admit uncertainty considered a meaningful milestone