Jasper Jang, co-founder of Hyperbolic, describes how frustration over limited GPU access led to building Hyperbolic, an on-demand AI cloud for developers.
The conversation shifts from Hyperbolic to Jasper's experience with Math Olympiads and recent AI achievements.
Both DeepMind and OpenAI have recently claimed gold medal-level performance at the International Mathematical Olympiad (IMO).
OpenAI's announcement came via social media, while DeepMind waited for verification from the competition organizers before announcing its results.
OpenAI did not officially enter the competition; instead, the team ran its model on this year's IMO problems and had the resulting solutions graded by former IMO medalists.
DeepMind's claim was officially verified by the competition.
IMO is a prestigious high school mathematics competition focused on combinatorics, number theory, geometry, and algebra, emphasizing problem solving over advanced knowledge.
Achieving gold medal level in IMO is considered a major milestone for AI, indicating advanced reasoning abilities.
Last year, DeepMind reached silver-medal level using formal verification systems such as Lean; this year, both Google DeepMind and OpenAI succeeded with natural-language reasoning in large language models, moving away from formal proof languages.
Previous AI solutions could take up to 60 hours for some problems, far exceeding the human time limit of 4.5 hours; recent advances have dramatically reduced solution time.
Different IMO problems test different skills: problem one often involves combinatorial geometry and case analysis, and the AI models solved it using reduction strategies.
AI solutions sometimes approach problems differently from humans but can still produce valid answers.
While IMO tests logical problem solving, it does not evaluate deeper mathematical creativity or capability in advanced research.
The most difficult (sixth) problem usually remains unsolved by both AI and many humans, primarily requiring higher-level creativity and experimentation.
Past benchmarking shows AI struggles most with combinatorial creativity and with constructing counterexamples.
Limitations of Current AI and Benchmark Improvements 15:00
Current benchmarks focus on competition problems and do not comprehensively evaluate true mathematical ability, such as conjecture formulation or advanced creativity.
Jasper is involved in a project to create a new, more holistic benchmark, breaking down mathematical expertise into categories: knowledge and understanding, problem solving and communication, learning/meta-skills, creativity, and intuition.
Collaboration between mathematicians and AI researchers is emphasized as key to developing effective measures and reward functions for creativity.
Lean, Natural Language, and Proof Verification 23:29
AI models can approach mathematical proofs either with natural language or formal systems like Lean.
Natural language outputs are easier to generate and interpret but harder to verify.
Formal systems like Lean offer clearer verification but currently lack large training datasets.
Both approaches are expected to advance, with parallel efforts in expanding Lean data and in improving language model capabilities.
There are ongoing projects, such as Kevin Buzzard's attempt to formalize the proof of Fermat's Last Theorem in Lean.
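As a concrete illustration (not from the episode), here is a minimal Lean 4 sketch of what machine-checked verification means: the kernel accepts a proof only if every step type-checks, so a verified statement needs no human grading.

```lean
-- Toy statement: zero is a left identity for addition on the naturals.
-- Unlike a natural-language argument, this proof is checked by Lean's kernel.
theorem zero_add' (n : Nat) : 0 + n = n := by
  induction n with
  | zero => rfl                        -- base case: 0 + 0 = 0 holds definitionally
  | succ k ih => rw [Nat.add_succ, ih] -- 0 + (k+1) = (0 + k) + 1 = k + 1
```

This is the trade-off mentioned above: writing proofs this way is slower than prose, but a successful check is a guarantee rather than a reviewer's judgment.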
Breakthroughs and Scaling in AI Math Performance 27:21
There is speculation about the nature of the breakthroughs achieved by OpenAI and DeepMind, with mentions of larger-scale training, better data, and advances in multi-step reasoning (e.g., parallel idea exploration, improved reinforcement learning).
Significant performance gains, such as reducing solution time from 60 hours to under 4 hours, suggest more than scaling alone; new ideas for efficiency and reasoning are likely involved.
Recent tests of Gemini models showed that both curated and non-curated versions achieved high performance, strengthening confidence in the results.
Future Directions and Milestones for AI in Mathematics 31:32
Current benchmarks are compared to exams; the real test for math-capable AI is making breakthroughs in genuine open problems.
The ultimate aspiration is for an AI to generate new mathematical theories or solve long-standing open problems, potentially winning a Fields Medal (the highest prize in mathematics).
The conversation closes with an invitation to further explore and solve IMO problems together.