Claude Just Got a Big Update (Opus 4.1)

Claude Opus 4.1 Release Overview 00:00

  • Anthropic released Claude Opus 4.1, an updated version of Claude Opus 4.
  • The update focuses on agentic tasks, real-world coding, and reasoning improvements.
  • Anthropic plans to release even larger improvements in the coming weeks.
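For developers already calling Claude through the Anthropic Messages API, adopting the update amounts to swapping the model identifier. The sketch below illustrates this under the assumption that the new alias is `claude-opus-4-1`; check Anthropic's model list for the exact (possibly dated) ID before relying on it.

```python
# Minimal sketch: upgrading to Opus 4.1 is a one-line change to the
# model identifier in an Anthropic Messages API request body.
# The model ID string "claude-opus-4-1" is an assumption here.

def build_request(prompt: str, model: str = "claude-opus-4-1") -> dict:
    """Assemble a Messages API request body; only `model` changes for the upgrade."""
    return {
        "model": model,
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }

request = build_request("Refactor this function to be iterative.")
```

Keeping the model ID as a parameter (or a config value) makes it easy to A/B the old and new versions on your own workloads, which the video's closing section recommends over trusting benchmarks alone.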

Benchmark Performance 00:37

  • Opus 4.1 achieved 74.5% on SWE-bench Verified, up from 72.5% for Opus 4 and 62.3% for Sonnet 3.7.
  • The Terminal-Bench score improved from 39.2 to 43.3.
  • Graduate-level reasoning (GPQA Diamond) saw a minor increase from 79.6 to 80.9.
  • Agentic tool use (TAU-bench) improved for retail to 82.4 from 81.4, but dropped for airline scenarios to 56.0% from 59.6%.
  • Multilingual Q&A performance rose to 89.5 from 88.8, visual reasoning had a small increase, and AIME 2025 saw a 2.5-point boost to 78%.

Comparison with Competing Models 02:13

  • Opus 4.1 outperforms OpenAI's o3 and Google's Gemini 2.5 Pro on SWE-bench Verified and Terminal-Bench.
  • It trails both competitors on GPQA Diamond and agentic tool use, and is notably behind on the high school math competition AIME 2025 (Opus 4.1 at 78%, o3 at 88.9%, Gemini 2.5 Pro at 88%).

Practical Use and Closing Thoughts 02:43

  • The model's actual value becomes clear only through real-world usage, not benchmarks alone.
  • Claude is recognized as the top coding model currently available, especially for agentic coding.
  • The update details were brief, and further testing is planned.
  • Viewers are encouraged to test the model and share feedback.