Gemini Deep Think

Background and Achievements 00:00

  • Google DeepMind achieved gold medal standard at the International Mathematical Olympiad (IMO), a milestone AI companies have pursued for years.
  • The IMO is a prestigious mathematics competition for high school students, featuring six challenging problems covering algebra, geometry, and number theory.
  • Last year, DeepMind’s system earned a silver medal equivalent (28/42 points), with the added caveat that its solving time far exceeded the human time limits.
  • OpenAI also recently achieved gold medal-level performance at IMO using an experimental reasoning LLM, scoring 35/42 points.
  • Both DeepMind and OpenAI published solutions for the five problems they solved; neither released an answer for the sixth, which sparked speculation about the omitted problem.
  • Some observers believe DeepMind's solutions appeared more "humanlike" than OpenAI's.

Gemini Deep Think Model and Approach 02:27

  • This year, Google used a new approach: instead of math-specific formal languages or proof assistants such as Lean, they used an advanced version of Gemini with a feature called Deep Think.
  • Deep Think, announced at Google I/O, generates multiple chains of thought in parallel and then evaluates which are most promising before answering (a toy sketch of this pattern follows this list).
  • This approach helps the model excel on logic, math, and coding benchmarks, as shown in the updated benchmark figures from Google I/O.
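
To make the parallel-thinking idea concrete, here is a toy TypeScript sketch of the general pattern: sample several candidate reasoning traces concurrently, score each with a critic, and keep the most promising one. The sampleTrace and scoreTrace functions are hypothetical placeholders for illustration, not Google's implementation.

```typescript
// Toy sketch of parallel reasoning: generate N candidate chains of thought
// concurrently, score each one, and return the most promising.
// sampleTrace/scoreTrace are hypothetical placeholders, not a real API.

async function sampleTrace(problem: string, seed: number): Promise<string> {
  // In a real system this would be an LLM call with distinct sampling noise.
  return `candidate reasoning #${seed} for: ${problem}`;
}

function scoreTrace(trace: string): number {
  // Stand-in for a learned verifier or self-evaluation step.
  return trace.length % 7; // arbitrary toy heuristic
}

export async function parallelThink(problem: string, n = 8): Promise<string> {
  const traces = await Promise.all(
    Array.from({ length: n }, (_, i) => sampleTrace(problem, i))
  );
  // Keep the candidate the critic rates highest.
  return traces.reduce((best, t) => (scoreTrace(t) > scoreTrace(best) ? t : best));
}
```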

Model Usage, Performance, and Limitations 05:28

  • Using Deep Think involves inputting a problem directly; the model processes it without specialized math software or languages.
  • Generation of solutions can take a long time, often 10 minutes or more for difficult problems, due to the parallel reasoning strategy.
  • During solution generation, users may not see any progress or partial results for several minutes.
  • For example, solving an IMO problem took around 16 minutes to deliver a correct, step-by-step solution.
  • Problem-solving times are inconsistent: some tasks complete in under 10 minutes, while more difficult prompts take noticeably longer (a sketch of wrapping such long calls is shown after this list).
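
Because a single request can run for ten minutes or more with no partial output, one practical pattern is to wrap the call with periodic elapsed-time logging and a cutoff. The sketch below is generic TypeScript around any promise-returning generation call; generateWithProgress is a made-up helper, and nothing in it is specific to Deep Think.

```typescript
// Wrap a long-running generation call with elapsed-time logging and a cutoff.
// `call` stands in for whatever actually invokes the model.

export async function generateWithProgress(
  call: () => Promise<string>,
  timeoutMs = 30 * 60 * 1000,   // generous 30-minute cutoff
  logEveryMs = 60 * 1000        // report elapsed time every minute
): Promise<string> {
  const start = Date.now();
  const ticker = setInterval(() => {
    console.log(`still thinking... ${Math.round((Date.now() - start) / 1000)}s elapsed`);
  }, logEveryMs);

  const timeout = new Promise<never>((_, reject) =>
    setTimeout(() => reject(new Error("generation timed out")), timeoutMs)
  );

  try {
    // Resolve with the model's answer, or reject if the cutoff fires first.
    return await Promise.race([call(), timeout]);
  } finally {
    clearInterval(ticker);
  }
}
```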

Performance on Benchmarks and Coding Tasks 10:24

  • Deep Think demonstrates strong performance on math benchmarks like AIME 2025 (American Invitational Mathematics Examination).
  • In a sample math problem (answer: 204), the model reached the correct result quickly in its reasoning summary, but kept "thinking" for several more minutes before emitting the final answer.
  • In routine use, this means submitting a prompt and waiting to be notified when the response is ready, given the lengthy response times.

Non-Mathematical and Coding Capabilities 11:35

  • Deep Think can also handle creative prompts, such as generating 3D scene code for a traditional Thai pavilion (a “sala Thai”), producing quality output with Three.js (a minimal illustrative scene is sketched after this list).
  • For game generation tasks like “Angry Birds,” the model handled incremental improvements via follow-up prompts, indicating it runs and tests some code in a sandbox-like environment.
  • Despite successes, Deep Think is not ideal for rapid coding iterations; its extended computation times limit practicality for many software development tasks compared to models like Gemini 2.5 Pro.
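
For a sense of what Three.js scene code for this kind of prompt involves, below is a minimal hand-written sketch of a sala-like structure, a raised platform on posts with a steep pyramid roof. It only illustrates the sort of code being asked for; it is not the model's actual output.

```typescript
import * as THREE from "three";

// Minimal, illustrative scene: raised platform, four posts, steep pyramid roof.
const scene = new THREE.Scene();
const camera = new THREE.PerspectiveCamera(60, window.innerWidth / window.innerHeight, 0.1, 100);
camera.position.set(4, 3, 6);
camera.lookAt(0, 1, 0);

const renderer = new THREE.WebGLRenderer({ antialias: true });
renderer.setSize(window.innerWidth, window.innerHeight);
document.body.appendChild(renderer.domElement);

// Raised wooden platform
const platform = new THREE.Mesh(
  new THREE.BoxGeometry(3, 0.2, 2),
  new THREE.MeshStandardMaterial({ color: 0x8b5a2b })
);
platform.position.y = 0.8;
scene.add(platform);

// Four supporting posts
for (const [x, z] of [[-1.3, -0.8], [1.3, -0.8], [-1.3, 0.8], [1.3, 0.8]]) {
  const post = new THREE.Mesh(
    new THREE.CylinderGeometry(0.08, 0.08, 0.8),
    new THREE.MeshStandardMaterial({ color: 0x6b4423 })
  );
  post.position.set(x, 0.4, z);
  scene.add(post);
}

// Steep pitched roof approximated by a four-sided cone (pyramid)
const roof = new THREE.Mesh(
  new THREE.ConeGeometry(2.2, 1.8, 4),
  new THREE.MeshStandardMaterial({ color: 0xb22222 })
);
roof.position.y = 2.4;
roof.rotation.y = Math.PI / 4; // align pyramid faces with the platform edges
scene.add(roof);

scene.add(new THREE.AmbientLight(0xffffff, 0.6));
const sun = new THREE.DirectionalLight(0xffffff, 1);
sun.position.set(5, 10, 5);
scene.add(sun);

renderer.setAnimationLoop(() => renderer.render(scene, camera));
```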

Practical Considerations and Availability 15:29

  • Deep Think is accessible in the Gemini app for Ultra subscribers and will be available via API (AI Studio and Google Cloud) in the near future (a hedged API sketch follows this list).
  • Its primary strength is in deep intelligence and reasoning, but slow response times limit its everyday usability.
  • Balancing speed, cost, and intelligence in future model development is emphasized as a significant ongoing challenge.
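
Once the API does arrive, calling Deep Think will presumably look like any other Gemini model through the @google/genai SDK. The sketch below follows that SDK's current usage, but the model id is a made-up placeholder, since no Deep Think identifier has been published yet.

```typescript
import { GoogleGenAI } from "@google/genai";

// Hypothetical sketch: client usage matches the existing @google/genai SDK,
// but "gemini-deep-think-preview" is a placeholder, not a published model id.
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const response = await ai.models.generateContent({
  model: "gemini-deep-think-preview", // placeholder model id
  contents: "Paste the full competition problem statement here.",
});

// Expect long latencies (ten minutes or more on hard problems), so run this
// from a background job rather than an interactive request path.
console.log(response.text);
```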