SmolLM3 - A Local Agents Winner?

Introduction & Model Overview 00:00

  • The video introduces SmolLM3, a new 3B parameter model released by Hugging Face, with base, instruct, and ONNX versions available.
  • SmolLM3 targets agentic use cases, including function calling and running local agents, aiming to reduce reliance on proprietary models.
  • The 3B size is highlighted as suitable for many mobile devices, outperforming previous models like Qwen 2.5 3B and Llama 3.2 3B.

Model Features and Architecture 01:09

  • SmolLM3 was trained on 11 trillion tokens, which is unusually large for the 3B parameter segment.
  • Hugging Face claims state-of-the-art results among 3B models and competitiveness with some 4B models.
  • The model supports a long context window of up to 128K tokens (potentially up to 256K).
  • Noted as "multilingual" with support for six European languages.
  • The architecture draws from Llama 3 and includes features like grouped-query attention (GQA) and NoPE (no positional embeddings in a subset of layers) for position handling.
  • Includes architectural insights from other models, e.g., removing weight decay from embeddings for training stability, inspired by OLMo 2.
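The grouped-query attention mentioned above can be sketched in a few lines: several query heads share a smaller set of key/value heads, shrinking the KV cache. This is a toy pure-Python illustration with made-up head counts and dimensions, not SmolLM3's actual configuration.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def grouped_query_attention(q, k, v, n_q_heads, n_kv_heads):
    """Toy GQA: q is [head][pos][dim]; k and v have only n_kv_heads
    heads, each shared by a group of query heads."""
    group = n_q_heads // n_kv_heads
    d = len(q[0][0])
    out = []
    for h in range(n_q_heads):
        kv = h // group  # query head h reads from this shared KV head
        head_out = []
        for qi in q[h]:
            scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d)
                      for kj in k[kv]]
            w = softmax(scores)
            head_out.append([sum(wi * vj[dim] for wi, vj in zip(w, v[kv]))
                             for dim in range(d)])
        out.append(head_out)
    return out

# 4 query heads sharing 2 KV heads (illustrative sizes only)
q = [[[1.0, 0.0], [0.0, 1.0]] for _ in range(4)]
k = [[[1.0, 0.0], [0.0, 1.0]] for _ in range(2)]
v = [[[1.0, 2.0], [3.0, 4.0]] for _ in range(2)]
out = grouped_query_attention(q, k, v, 4, 2)
print(len(out), len(out[0]), len(out[0][0]))  # 4 2 2
```

The payoff is memory, not math: with 4 query heads and 2 KV heads, the KV cache is half the size of full multi-head attention.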

Training Process and Blueprint Release 03:02

  • Hugging Face released a comprehensive "blueprint" detailing all training phases, from data selection, distributed setup, long context handling, to post-training.
  • The pre-training used 384 H100 GPUs over 24 days (~220,000 GPU hours), suggesting training costs in the several-hundred-thousand-dollar range.
  • The model underwent a three-phase pre-training, initially web-heavy, then increasing code and math data in later phases.
  • Relies on DeepSeek R1 and Qwen 3 for generating synthetic reasoning traces.
  • Utilized a new alignment variant of DPO (Anchored Preference Optimization, APO) and employed model checkpoint merging to produce the final model.
  • Hugging Face published the datasets and training methodology openly, fostering transparency compared to proprietary labs.
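The compute figures above check out with simple arithmetic; the dollar estimate assumes a hypothetical ~$2 per H100-hour cloud rate, which is not stated in the video.

```python
# Back-of-envelope check of the reported pre-training compute.
gpus = 384          # H100s, per the blueprint
days = 24
gpu_hours = gpus * days * 24
print(gpu_hours)    # 221184 -> the ~220,000 GPU-hours cited

rate = 2.0          # assumed $/H100-hour; actual rates vary widely
cost = gpu_hours * rate
print(f"${cost:,.0f}")  # ~$442,368 -> "several hundred thousand dollars"
```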

Model Usage and Reasoning ("Thinking") 07:31

  • SmolLM3 is easy to use with Hugging Face Transformers, as well as with SGLang and vLLM.
  • Basic prompt control allows toggling the "thinking" process by adjusting the system prompt (e.g., appending a "/no_think" flag).
  • When "thinking" is enabled, answers are often prefixed by a detailed reasoning chain, though the reasoning is not partitioned into enumerated steps.
  • The chain-of-thought is generally lengthy but sometimes outputs empty "thinking" sections, especially for code generation.
  • Compared to outputs with "no thinking," answers with reasoning are more thorough and justified.
  • The reasoning abilities are considered strong for a model of this size, rivaling older, much larger models.
  • The model sometimes provides unexpected outputs, such as omitting "thinking" on certain prompts or over-generating justification.
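The system-prompt toggle described above can be sketched as follows. The "/no_think" flag follows the SmolLM3 model card convention; treat the exact string as an assumption for other checkpoints, and note the generation step is shown only in comments since it requires downloading the model.

```python
# Build a chat that enables or suppresses SmolLM3's extended thinking.
def build_messages(question, thinking=True):
    system = "You are a helpful assistant."
    if not thinking:
        system += "\n/no_think"  # suppresses the reasoning chain
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]

msgs = build_messages("Why is the sky blue?", thinking=False)
print(msgs[0]["content"])

# In a real run the messages go through the usual Transformers flow:
#   tok = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B")
#   inputs = tok.apply_chat_template(msgs, add_generation_prompt=True,
#                                    return_tensors="pt")
#   model.generate(inputs, max_new_tokens=512)
```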

Tool Use and Agentic Abilities 11:41

  • SmolLM3 supports function calling/tool use, relevant for building local AI agents.
  • Tested by defining tools (with schemas) and prompting the model to invoke them (e.g., weather lookup, web search).
  • The model generates accurate tool calls based on prompts and context.
  • For multiple tools present, the model sometimes correctly refrains from calling a tool when unnecessary, but not always—behavior depends on prompt and tool descriptions.
  • The model uses search tools to answer questions beyond its knowledge cutoff, seen as a positive trait for its intended applications.
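The tool-use loop tested in the video can be sketched like this. The schema uses the common OpenAI-style JSON accepted by Transformers chat templates; `get_weather` is a hypothetical tool, and the "model output" below is a hand-written stand-in, not a real SmolLM3 completion.

```python
import json

# Hypothetical tool definition, OpenAI-style JSON schema.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def get_weather(city):
    return f"Sunny in {city}"  # stub implementation

# In a real run, the schemas are passed via
# tokenizer.apply_chat_template(messages, tools=tools, ...) and the
# model emits a JSON tool call; here we simulate that output.
model_output = '{"name": "get_weather", "arguments": {"city": "Berlin"}}'
call = json.loads(model_output)
if call["name"] == "get_weather":
    result = get_weather(**call["arguments"])
print(result)  # Sunny in Berlin
```

The tool result would then be appended as a "tool" role message and the model prompted again to produce the final answer.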

Overall Assessment and Openness 14:27

  • SmolLM3 is noted for its open release, including both the model weights and an extensive blueprint of its creation.
  • Further releases might include intermediate checkpoints for more granular user experimentation.
  • The model is already being converted for use in various runtime environments (e.g., GGUF, LM Studio).
  • The video closes with questions to viewers about their preferences for local versus proprietary models and notes increasing use cases for both.