Hugging Face released a comprehensive "blueprint" detailing all training phases, from data selection and distributed training setup through long-context handling to post-training.
The pre-training used 384 H100 GPUs over 24 days (~220,000 GPU hours), suggesting training costs in the several-hundred-thousand-dollar range.
Pre-training ran in three phases: initially web-heavy, with the proportion of code and math data increasing in the later phases.
Relies on DeepSeek-R1 and Qwen3 for generating synthetic reasoning traces.
Utilized a new alignment variant based on DPO and merged model checkpoints to produce the final model.
Hugging Face published the datasets and training methodology openly, fostering transparency compared to proprietary labs.
SmolLM3 supports function calling/tool use, relevant for building local AI agents.
Tested by defining tools (with schemas) and prompting the model to invoke them (e.g., weather lookup, web search).
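A minimal sketch of what such a tool definition can look like, assuming the common JSON-schema convention used by many chat templates (including the `tools=` argument of transformers' `apply_chat_template`); the `get_weather` tool, its fields, and the prompt wording are hypothetical examples, not SmolLM3's exact format:

```python
import json

# Hypothetical weather-lookup tool described with a JSON-schema
# "parameters" block, as in the common OpenAI/Hermes convention.
tools = [
    {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
]

# Tool definitions are typically injected into the system prompt;
# chat templates with a `tools=` argument do this automatically.
system_prompt = (
    "You can call the following tools. Respond with a JSON tool call "
    "when one is needed:\n" + json.dumps(tools, indent=2)
)

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "What's the weather in Paris right now?"},
]
```

The model then sees the schema in context and can emit a structured call naming the tool and its arguments.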
The model generates accurate tool calls based on prompts and context.
When multiple tools are present, the model sometimes correctly refrains from calling a tool when none is needed, but not always; behavior depends on the prompt and the tool descriptions.
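On the application side, "refraining" simply means the reply contains no tool-call block and can be shown to the user directly. A sketch of that dispatch logic, assuming a `<tool_call>…</tool_call>` wrapper around a JSON payload (a common convention; SmolLM3's exact output format may differ):

```python
import json
import re

# Matches a JSON object wrapped in <tool_call> tags; the lazy body plus
# the required closing tag lets nested braces in "arguments" through.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def extract_tool_call(reply: str):
    """Return (name, arguments) if the reply requests a tool, else None."""
    match = TOOL_CALL_RE.search(reply)
    if match is None:
        return None  # model answered directly -- no tool needed
    call = json.loads(match.group(1))
    return call["name"], call.get("arguments", {})

# A reply requesting the (hypothetical) weather tool:
reply = '<tool_call>{"name": "get_weather", "arguments": {"city": "Paris"}}</tool_call>'
print(extract_tool_call(reply))   # ('get_weather', {'city': 'Paris'})

# A plain answer triggers no call:
print(extract_tool_call("It is sunny."))  # None
```

In a real loop, a `None` result is returned to the user as-is, while a parsed call is executed and its result appended to the conversation for a follow-up generation.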
The model uses search tools to answer questions beyond its knowledge cutoff, seen as a positive trait for its intended applications.