Kimi K2, a new open-source language model from the Chinese company Moonshot AI, is gaining industry attention for its remarkably smooth training loss curve.
The one-trillion-parameter model was trained with novel optimization techniques that avoided the instability spikes typical of runs at this scale.
Kimi K2 is a mixture-of-experts (MoE) language model with 32 billion activated parameters and 1 trillion total parameters.
Trained with the Muon optimizer, it excels at knowledge, reasoning, coding, and agentic tasks.
The model was pre-trained on 15.5 trillion tokens using the MuonClip optimizer at unprecedented scale.
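As a rough illustration of what Muon-style optimization involves, here is a minimal sketch of the core update: the momentum matrix is approximately orthogonalized with a Newton-Schulz iteration before being applied to the weights. The coefficients and hyperparameters below follow the public Muon reference implementation, not Kimi K2's actual training setup, and MuonClip's additional QK-clip logit capping is not shown.

```python
import torch

def newton_schulz_orthogonalize(g: torch.Tensor, steps: int = 5) -> torch.Tensor:
    # Quintic Newton-Schulz iteration that approximately orthogonalizes a 2-D
    # update matrix (coefficients taken from the public Muon reference code).
    a, b, c = 3.4445, -4.7750, 2.0315
    x = g / (g.norm() + 1e-7)            # normalize so the iteration converges
    transposed = x.size(0) > x.size(1)
    if transposed:
        x = x.T
    for _ in range(steps):
        s = x @ x.T
        x = a * x + (b * s + c * (s @ s)) @ x
    return x.T if transposed else x

@torch.no_grad()
def muon_step(weight, momentum, grad, lr=0.02, beta=0.95):
    # One simplified Muon step: accumulate momentum, orthogonalize it,
    # then apply it as the update for this weight matrix.
    momentum.mul_(beta).add_(grad)
    weight.add_(newton_schulz_orthogonalize(momentum), alpha=-lr)
```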
Designed for tool use, reasoning, and autonomous problem solving.
Boasts a context window that the team has tested at up to 2 million tokens, albeit with slight quality loss.
Versions, Open Source Status & Benchmark Performance 02:06
Two initial versions were released: base and instruct; a dedicated reasoning version is not yet available, but one is expected soon from the open-source community.
Performance benchmarks show Kimi K2 Instruct outperforming models like DeepSeek, Qwen, and GPT-4.1, and closely trailing or beating proprietary leaders like Claude 4 Opus on various tasks.
It topped multiple benchmarks for coding (SWE-bench, LiveCodeBench), mathematics (AIME 2025), and general knowledge (GPQA-Diamond).
Open weights, transparent training process, and a forthcoming research paper announced.
Extensive benchmark results are available on the model's Hugging Face card, covering benchmarks such as Aider-Polyglot, AceBench, Humanity's Last Exam, and more.
Multiple inference providers are already serving Kimi K2.
Kimi K2 can be used via the official inference API at $0.15 per million input tokens (cache hit), $0.60 per million input tokens (cache miss), and $2.50 per million output tokens.
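For a sense of what using it looks like in practice, here is a minimal sketch of a chat call with a tool definition through an OpenAI-compatible client; the base_url, model id, environment variable name, and the run_sql tool are assumptions for illustration, to be verified against the provider's documentation.

```python
import os
from openai import OpenAI

# Assumed OpenAI-compatible endpoint and placeholder model id -- check the
# official platform docs for the exact values before running.
client = OpenAI(
    api_key=os.environ["MOONSHOT_API_KEY"],      # hypothetical env var name
    base_url="https://api.moonshot.ai/v1",       # assumed endpoint
)

# A single tool definition, to exercise the model's tool-use strengths.
tools = [{
    "type": "function",
    "function": {
        "name": "run_sql",                       # hypothetical tool for illustration
        "description": "Run a read-only SQL query and return the rows.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

resp = client.chat.completions.create(
    model="kimi-k2-instruct",                    # placeholder model id; verify with the provider
    messages=[{"role": "user", "content": "How many orders were placed last week?"}],
    tools=tools,
    temperature=0.6,
)
print(resp.choices[0].message)
```

At the quoted rates, a single call with 100K uncached input tokens and 2K output tokens works out to roughly 100,000/1,000,000 × $0.60 + 2,000/1,000,000 × $2.50 ≈ $0.065.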
The weights, technical blog, and GitHub resources are public; the model can be tried immediately at kimi.ai.
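To pull the open weights locally, a huggingface_hub download along these lines should work; the repo id is an assumption based on the public model card naming, and the full checkpoint is on the order of a terabyte, so plan storage accordingly.

```python
from huggingface_hub import snapshot_download

# Assumed repo id -- confirm the exact name on the Hugging Face model card.
local_dir = snapshot_download(
    repo_id="moonshotai/Kimi-K2-Instruct",
    local_dir="./kimi-k2-instruct",
)
print("Weights downloaded to", local_dir)
```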
Prompt engineering resources and guides are recommended for optimal usage.
Experts compare Kimi K2's architecture to DeepSeek V3's, noting that it uses fewer attention heads but more experts.
There is enthusiasm over the model's scaling: a one-trillion-parameter run with zero training spikes, something many previously doubted was possible.
It is positioned as potentially the best open-source coding and tool-use model, scoring 65.8% on SWE-bench Verified.
Kimi K2 demonstrates low cost and high efficacy, completing complex tasks like data analysis and web-app creation for minimal expense.
Examples include rapid inference speeds, efficient quantized deployments, and impressive performance generating a web-based Minecraft clone, where it outperformed Gemini 2.5 Pro in fewer attempts.
Kimi K2 has already been jailbroken by community members, signaling high interest and adaptability within the open-source ecosystem.