Gemini Deep Think

Background and Achievements 00:00

  • Google DeepMind achieved gold medal standard at the International Mathematical Olympiad (IMO), a milestone AI companies have pursued for years.
  • The IMO is a prestigious mathematics competition for high school students, featuring six challenging problems covering algebra, geometry, and number theory.
  • Last year, DeepMind’s system earned a silver medal equivalent (28/42 points), with the added caveat that its solving time far exceeded the human time limits.
  • OpenAI also recently achieved gold medal-level performance at IMO using an experimental reasoning LLM, scoring 35/42 points.
  • Both DeepMind and OpenAI published solutions for the five problems they solved; neither released an answer for the sixth, which sparked speculation about the omitted problem.
  • Some observers believe DeepMind's solutions appeared more "humanlike" than OpenAI's.

Gemini Deep Think Model and Approach 02:27

  • This year, Google used a new approach: instead of math-specific formal languages or proof assistants such as Lean, they used an advanced version of Gemini with a feature called Deep Think.
  • Deep Think, announced at Google I/O, generates multiple chains of thought in parallel and then evaluates which are most promising before answering (a toy sketch of this pattern follows this list).
  • This approach helps the model excel on logic, math, and coding benchmarks, as shown in the updated benchmark figures from Google I/O.
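
To make the parallel-thinking idea concrete, here is a toy TypeScript sketch of the general pattern: sample several candidate reasoning traces concurrently, score each with a critic, and keep the most promising one. The sampleTrace and scoreTrace functions are hypothetical placeholders for illustration, not Google's implementation.

```typescript
// Toy sketch of parallel reasoning: generate N candidate chains of thought
// concurrently, score each one, and return the most promising.
// sampleTrace/scoreTrace are hypothetical placeholders, not a real API.

async function sampleTrace(problem: string, seed: number): Promise<string> {
  // In a real system this would be an LLM call with distinct sampling noise.
  return `candidate reasoning #${seed} for: ${problem}`;
}

function scoreTrace(trace: string): number {
  // Stand-in for a learned verifier or self-evaluation step.
  return trace.length % 7; // arbitrary toy heuristic
}

export async function parallelThink(problem: string, n = 8): Promise<string> {
  const traces = await Promise.all(
    Array.from({ length: n }, (_, i) => sampleTrace(problem, i))
  );
  // Keep the candidate the critic rates highest.
  return traces.reduce((best, t) => (scoreTrace(t) > scoreTrace(best) ? t : best));
}
```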

Model Usage, Performance, and Limitations 05:28

  • Using Deep Think involves inputting a problem directly; the model processes it without specialized math software or languages.
  • Generation of solutions can take a long time, often 10 minutes or more for difficult problems, due to the parallel reasoning strategy.
  • During solution generation, users may not see any progress or partial results for several minutes.
  • For example, solving an IMO problem took around 16 minutes to deliver a correct, step-by-step solution.
  • Problem-solving times are inconsistent: some tasks complete in under 10 minutes, while more difficult prompts take noticeably longer (a sketch of wrapping such long calls is shown after this list).
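
Because a single request can run for ten minutes or more with no partial output, one practical pattern is to wrap the call with periodic elapsed-time logging and a cutoff. The sketch below is generic TypeScript around any promise-returning generation call; generateWithProgress is a made-up helper, and nothing in it is specific to Deep Think.

```typescript
// Wrap a long-running generation call with elapsed-time logging and a cutoff.
// `call` stands in for whatever actually invokes the model.

export async function generateWithProgress(
  call: () => Promise<string>,
  timeoutMs = 30 * 60 * 1000,   // generous 30-minute cutoff
  logEveryMs = 60 * 1000        // report elapsed time every minute
): Promise<string> {
  const start = Date.now();
  const ticker = setInterval(() => {
    console.log(`still thinking... ${Math.round((Date.now() - start) / 1000)}s elapsed`);
  }, logEveryMs);

  const timeout = new Promise<never>((_, reject) =>
    setTimeout(() => reject(new Error("generation timed out")), timeoutMs)
  );

  try {
    // Resolve with the model's answer, or reject if the cutoff fires first.
    return await Promise.race([call(), timeout]);
  } finally {
    clearInterval(ticker);
  }
}
```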

Performance on Benchmarks and Coding Tasks 10:24

  • Deep Think demonstrates strong performance on math benchmarks like AIME 2025 (American Invitational Mathematics Examination).
  • In a sample math problem (answer: 204), the model reached the correct result quickly in its reasoning summary, but kept "thinking" for several more minutes before emitting the final answer.
  • In routine use, this means submitting a prompt and waiting to be notified when the response is ready, given the lengthy response times.

Non-Mathematical and Coding Capabilities 11:35

  • Deep Think can also handle creative prompts, such as generating 3D scene code for a traditional Thai pavilion (a “sala Thai”), producing quality output with Three.js (a minimal illustrative scene is sketched after this list).
  • For game generation tasks like “Angry Birds,” the model handled incremental improvements via follow-up prompts, indicating it runs and tests some code in a sandbox-like environment.
  • Despite successes, Deep Think is not ideal for rapid coding iterations; its extended computation times limit practicality for many software development tasks compared to models like Gemini 2.5 Pro.
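
For a sense of what Three.js scene code for this kind of prompt involves, below is a minimal hand-written sketch of a sala-like structure, a raised platform on posts with a steep pyramid roof. It only illustrates the sort of code being asked for; it is not the model's actual output.

```typescript
import * as THREE from "three";

// Minimal, illustrative scene: raised platform, four posts, steep pyramid roof.
const scene = new THREE.Scene();
const camera = new THREE.PerspectiveCamera(60, window.innerWidth / window.innerHeight, 0.1, 100);
camera.position.set(4, 3, 6);
camera.lookAt(0, 1, 0);

const renderer = new THREE.WebGLRenderer({ antialias: true });
renderer.setSize(window.innerWidth, window.innerHeight);
document.body.appendChild(renderer.domElement);

// Raised wooden platform
const platform = new THREE.Mesh(
  new THREE.BoxGeometry(3, 0.2, 2),
  new THREE.MeshStandardMaterial({ color: 0x8b5a2b })
);
platform.position.y = 0.8;
scene.add(platform);

// Four supporting posts
for (const [x, z] of [[-1.3, -0.8], [1.3, -0.8], [-1.3, 0.8], [1.3, 0.8]]) {
  const post = new THREE.Mesh(
    new THREE.CylinderGeometry(0.08, 0.08, 0.8),
    new THREE.MeshStandardMaterial({ color: 0x6b4423 })
  );
  post.position.set(x, 0.4, z);
  scene.add(post);
}

// Steep pitched roof approximated by a four-sided cone (pyramid)
const roof = new THREE.Mesh(
  new THREE.ConeGeometry(2.2, 1.8, 4),
  new THREE.MeshStandardMaterial({ color: 0xb22222 })
);
roof.position.y = 2.4;
roof.rotation.y = Math.PI / 4; // align pyramid faces with the platform edges
scene.add(roof);

scene.add(new THREE.AmbientLight(0xffffff, 0.6));
const sun = new THREE.DirectionalLight(0xffffff, 1);
sun.position.set(5, 10, 5);
scene.add(sun);

renderer.setAnimationLoop(() => renderer.render(scene, camera));
```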

Practical Considerations and Availability 15:29

  • Deep Think is accessible in the Gemini app for Ultra subscribers and will be available via API (AI Studio and Google Cloud) in the near future (a hedged API sketch follows this list).
  • Its primary strength is in deep intelligence and reasoning, but slow response times limit its everyday usability.
  • Balancing speed, cost, and intelligence in future model development is emphasized as a significant ongoing challenge.
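
Once the API does arrive, calling Deep Think will presumably look like any other Gemini model through the @google/genai SDK. The sketch below follows that SDK's current usage, but the model id is a made-up placeholder, since no Deep Think identifier has been published yet.

```typescript
import { GoogleGenAI } from "@google/genai";

// Hypothetical sketch: client usage matches the existing @google/genai SDK,
// but "gemini-deep-think-preview" is a placeholder, not a published model id.
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const response = await ai.models.generateContent({
  model: "gemini-deep-think-preview", // placeholder model id
  contents: "Paste the full competition problem statement here.",
});

// Expect long latencies (ten minutes or more on hard problems), so run this
// from a background job rather than an interactive request path.
console.log(response.text);
```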