China Went HARD...

Introduction & Model Performance 00:00

  • Qwen3-Coder, a new open-source coding model from China, rivals Anthropic's Claude family of models in coding capability
  • On SWE-bench Verified, Qwen3-Coder scores 69.6% compared with Claude Sonnet 4's 70.4%
  • Although Qwen3-Coder is described as a much smaller model than Claude Sonnet 4, the two perform at effectively the same level (0.8 percentage points apart)
  • The model ships with a command-line interface (CLI), Qwen Code, which is similar to Claude Code and forked from Gemini CLI
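
To put the score gap in concrete terms, the sketch below converts the two percentages into resolved-task counts. It assumes the scores were measured on the 500-task SWE-bench Verified subset; the helper name `resolved_tasks` is made up for illustration.

```python
# Assumption: both scores refer to SWE-bench Verified, which contains 500 tasks.
TOTAL_TASKS = 500

def resolved_tasks(score_percent, total=TOTAL_TASKS):
    """Convert a percentage score into a whole number of resolved tasks."""
    return round(score_percent / 100 * total)

qwen_resolved = resolved_tasks(69.6)    # Qwen3-Coder score from the summary
claude_resolved = resolved_tasks(70.4)  # Claude Sonnet 4 score from the summary

if __name__ == "__main__":
    print(qwen_resolved, claude_resolved, claude_resolved - qwen_resolved)
    # prints: 348 352 4
```

Under that assumption, the 0.8-point difference corresponds to four tasks out of 500.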

Model Architecture & Capabilities 00:59

  • The most powerful variant is Qwen3-Coder 480B: a mixture-of-experts model with 480 billion total parameters, of which 35 billion are active per token
  • Native context length of 256k tokens, extendable to 1 million tokens with extrapolation methods
  • Exceptionally strong at tool calling and agentic tasks, supported by the Qwen Code CLI tool
  • Qwen Code, adapted from Gemini CLI, adds customized prompts and function calling for enhanced agentic coding
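
A mixture-of-experts model keeps its per-token cost low by running only a few experts for each token. The sketch below computes the active share from the figures above and shows generic top-k routing. The summary does not give the real expert count or router design, so the four experts, `k=2`, and the function names here are hypothetical.

```python
import math

TOTAL_PARAMS_B = 480   # total parameters in billions (from the summary)
ACTIVE_PARAMS_B = 35   # active parameters per token in billions (from the summary)

def active_fraction(active, total):
    """Share of the model's parameters used for a single token."""
    return active / total

def top_k_route(gate_logits, k):
    """Pick the k highest-scoring experts and softmax-normalise their weights.

    Generic top-k gating, not the model's actual router.
    """
    ranked = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)
    chosen = ranked[:k]
    peak = max(gate_logits[i] for i in chosen)  # subtract the max for numerical stability
    exps = [math.exp(gate_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]

if __name__ == "__main__":
    print(f"{active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B):.1%}")  # prints: 7.3%
    # Hypothetical gate scores for 4 experts; experts 1 and 3 are selected.
    print(top_k_route([0.1, 2.0, -1.0, 1.5], k=2))
```

Only the selected experts run for a given token, which is why the active parameter count is far below the total.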

Training Data & Methods 01:49

  • Pre-trained on 7.5 trillion tokens, 70% of which are code (about 5.25 trillion tokens), preserving both coding and general abilities
  • Qwen2.5-Coder was used to clean and rewrite noisy data, improving overall data quality
  • Focused on high-quality coding training data, with reinforcement learning on diverse real-world coding tasks
  • Automated test case scaling led to better code execution success rates and improvements in other task areas
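
Reinforcement learning on coding tasks needs a signal that can be checked automatically, and running generated code against test cases is one common way to get it. The sketch below is an illustration under that assumption, not the training pipeline described in the video. The function name `execution_reward` and the test-case format are hypothetical.

```python
def execution_reward(candidate_source, func_name, test_cases):
    """Return the fraction of test cases a candidate solution passes.

    candidate_source: Python source code expected to define `func_name`.
    test_cases: list of (args_tuple, expected_result) pairs.
    Note: exec() on untrusted code is unsafe; a real system would sandbox it.
    """
    if not test_cases:
        return 0.0
    namespace = {}
    try:
        exec(candidate_source, namespace)
        func = namespace[func_name]
    except Exception:
        return 0.0  # code that fails to load earns no reward
    passed = 0
    for args, expected in test_cases:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crash on one case counts as a failure for that case
    return passed / len(test_cases)

if __name__ == "__main__":
    tests = [((1, 2), 3), ((0, 0), 0)]
    good = "def add(a, b):\n    return a + b\n"
    buggy = "def add(a, b):\n    return a - b\n"
    print(execution_reward(good, "add", tests))   # prints: 1.0
    print(execution_reward(buggy, "add", tests))  # prints: 0.5
```

Generating more test cases per task makes a reward of this kind harder to satisfy by accident.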

Reinforcement Learning, Post-Training & Technical Innovations 03:18

  • Post-training integrated long-horizon, agent-based RL (agent RL) to solve real-world tasks through multi-turn tool use
  • Used a scalable system to run 20,000 independent environments in parallel on Alibaba Cloud infrastructure for self-play training
  • Achieves state-of-the-art performance among open-source models on SWE-bench Verified
  • Does so without test-time scaling or reasoning, which leaves room for future improvement
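
The idea of running many independent multi-turn environments in parallel can be sketched at toy scale with a thread pool. Everything below is hypothetical (`ToyEnv`, `toy_policy`, the single `"increment"` tool) and stands in for a real agent and real coding tasks. It shows only the shape of the loop, not the system described in the video.

```python
from concurrent.futures import ThreadPoolExecutor

class ToyEnv:
    """Hypothetical environment: solved once the counter reaches the target."""

    def __init__(self, target):
        self.target = target
        self.counter = 0

    def step(self, tool_call):
        # The only tool this toy environment understands is "increment".
        if tool_call == "increment":
            self.counter += 1
        return self.counter, self.counter >= self.target

def toy_policy(observation):
    """Stand-in for the model: always calls the same tool."""
    return "increment"

def rollout(env, policy, max_turns=10):
    """Run one multi-turn episode and report whether it was solved."""
    observation = 0
    for turn in range(1, max_turns + 1):
        observation, done = env.step(policy(observation))
        if done:
            return {"solved": True, "turns": turn}
    return {"solved": False, "turns": max_turns}

def run_parallel(targets, workers=8):
    """Each target gets its own environment; episodes run concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda t: rollout(ToyEnv(t), toy_policy), targets))

if __name__ == "__main__":
    print(run_parallel([1, 3, 20]))
```

Because the environments share no state, scaling up mainly means adding workers.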

Access, Demos & Real-World Examples 04:10

  • Qwen3-Coder is hosted on Hugging Face and is free to use
  • Users can generate and execute code directly within the Hugging Face interface
  • Demonstrated capabilities include physics simulations, interactive visualizations, 3D terrain simulations, and typing speed test apps, along with smaller demos such as a bouncing ball, a rotating hypercube, a solar system simulation, and the game Duet

CLI Demo & Closing 06:19

  • Qwen Code can be set up on the command line by following the provided instructions
  • Demonstrated generating a complex snake game: 792 lines of code in about 60 seconds, functional with minor lag
  • Encourages viewers to try Qwen Code and share feedback
  • Video ends with a prompt to like and subscribe