When AI Is Designed Like A Biological Brain

Introducing the Continuous Thought Machine (CTM) 00:00

  • Sakana AI has released the "Continuous Thought Machine" (CTM), a new AI model inspired by biological processes.
  • CTM addresses a key problem in current AI models, the inability to perceive time, by incorporating an internal clock.
  • This model demonstrates emergent capabilities, such as solving 2D mazes directly from raw images without positional hints, by using its internal neural timing.
  • CTM can generalize maze-solving to larger scales, suggesting it builds an internal spatial representation, or world model, even without positional information during training.
  • For image processing, CTM naturally takes multiple steps to examine different parts of an image before making a decision, as indicated by its attention trace.
  • The longer CTM "thinks" (i.e., the longer its internal clock runs), the more accurate its answers become.
  • While not yet state-of-the-art, CTM represents a significant first step in bridging biologically inspired AI models into the field, showing strong performance for a novel idea.

Sponsor Message: Delete Me 01:19

  • The rapid evolution of AI research raises concerns about the AI agent-powered data scraping economy.
  • AI enables data brokers to scrape, analyze, and categorize personal details like home addresses, phone numbers, and family connections faster and with less effort than ever before.
  • Delete Me scans the web, submits removal requests to data brokers, and continuously monitors to keep personal data off these platforms.
  • The service provides a dashboard showing scanned listings (e.g., 635 in the example) and regular reports on where data was collected and how it's being removed.
  • Delete Me offers family plans to help protect loved ones' data, as collecting information on relatives has become easier.

CTM's Unique Approach to Time and Neuron Activity 02:37

  • While AI models incorporating time are not entirely new (e.g., recurrent neural networks), CTM differs significantly.
  • CTM replaces conventional static activation functions with neuron-level models that have learnable weights, incorporating histories of activations to produce complex neuron-level activity (see the sketch after this list).
  • When producing outputs, CTM aggregates temporal relationships between neurons.
  • CTM doesn't "decide" based on wall-clock time; instead, its internal clock ticks forward, giving the model a way to track its own past activity.
  • This internal clock allows CTM to focus on different points of an image and improve prediction confidence, similar to how human attention works.
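
A minimal sketch of this neuron-level idea, assuming a PyTorch-style implementation (the class name NeuronLevelModel and the 3-tick history length are illustrative, not Sakana's API): each neuron swaps a fixed activation function for a tiny private MLP over its own recent pre-activations.

```python
# Illustrative sketch only (not Sakana's code): a single "neuron-level model"
# that replaces a static activation function with a tiny MLP over the neuron's
# own recent history of pre-activations.
import torch
import torch.nn as nn

class NeuronLevelModel(nn.Module):
    def __init__(self, history_len: int = 3, hidden: int = 8):
        super().__init__()
        # Each neuron only sees its own last `history_len` pre-activations.
        self.mlp = nn.Sequential(
            nn.Linear(history_len, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, pre_activation_history: torch.Tensor) -> torch.Tensor:
        # pre_activation_history: (batch, history_len) for this one neuron.
        return self.mlp(pre_activation_history)  # (batch, 1) post-activation

# A neuron whose response depends on how its input evolved over the last 3 ticks.
nlm = NeuronLevelModel(history_len=3)
post = nlm(torch.randn(2, 3))  # batch of 2 -> (2, 1)
```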

Detailed Breakdown of CTM Architecture 03:35

  • CTM has a flexible input mechanism, processing raw data via a feature extractor (e.g., CNN for images, embedding layers for sequential data), then through an attention layer.
  • The model contains a certain number of neurons (e.g., 128 to 4,096 in the paper, simplified to 4 for explanation purposes).
  • CTM's internal clock has a base unit called a "tick," and the model progresses through these ticks, accumulating "thought steps."
  • Input attention combines with neuron activations from the previous tick and is sent to a "synapse model," which compares input with each neuron's past "thoughts" to generate "pre-activations."
  • The synapse model acts as a communication point, combining insights from all neurons' signals and new input to redistribute guiding signals; after this, all signals are processed in parallel (see the tick sketch after this list).
  • Pre-activations are then passed to individual "neuron-level models" (NLMs), which are simple MLPs with a small memory that stores a limited history of recent pre-activations (e.g., 3 ticks).
  • Each NLM generates a "post-activation," which is sent to the synapse model for the next tick, mimicking how individual biological neurons demonstrate complex, time-sensitive responses.
  • Post-activations undergo "synchronization," where the model quantifies how neuron activity patterns change over time relative to each other, using this dynamic and temporal relationship for decisions.
  • Neurons are artificially paired, and each pair's activation histories are evaluated for a "synchronization score"; the scores are then concatenated to create a "latent representation" (see the synchronization sketch after this list).
  • This synchronization step allows the model to base its understanding on evolving patterns of paired activities throughout the entire thought process, leading to a deeper representation of information.
  • Neuron signals are weighted with a learnable decay, enabling CTM to learn whether to prioritize recent activity or long-term historical relationships, capturing interactions across different time scales.
  • This latent representation is fed into learnable linear layers; one layer generates CTM's output/prediction for the current tick, while another creates an attention query for the next tick.
  • Users define the number of "thought steps" or ticks, receiving one output per tick, with all outputs then used to determine a single, robust final prediction across the entire thought process.
  • For the first tick, where no historical information exists, initialization values for the synapse model, neuron-level models, and attention mechanisms can be set and optimized during training.
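
The tick sketch referenced above, again as a rough PyTorch-style approximation rather than the released implementation (module names, sizes, and the tick function are assumptions): the shared synapse model turns the attended input plus the previous post-activations into pre-activations, and per-neuron NLMs turn a short pre-activation history into the next post-activations.

```python
# Illustrative sketch of one CTM "tick" (names and sizes are assumptions):
# attended features + previous post-activations -> synapse model -> pre-activations,
# then per-neuron NLMs over a short pre-activation history -> new post-activations.
import torch
import torch.nn as nn

N_NEURONS, FEAT_DIM, HIST = 4, 16, 3    # 4 neurons, as in the simplified explanation

synapse_model = nn.Sequential(          # shared "communication point"
    nn.Linear(FEAT_DIM + N_NEURONS, 32),
    nn.ReLU(),
    nn.Linear(32, N_NEURONS),           # one pre-activation per neuron
)
nlms = nn.ModuleList(                   # one private MLP per neuron
    [nn.Sequential(nn.Linear(HIST, 8), nn.ReLU(), nn.Linear(8, 1))
     for _ in range(N_NEURONS)]
)

def tick(attended, post_prev, pre_history):
    # attended:    (B, FEAT_DIM) output of the attention layer for this tick
    # post_prev:   (B, N_NEURONS) post-activations from the previous tick
    # pre_history: (B, N_NEURONS, HIST) recent pre-activations per neuron
    pre = synapse_model(torch.cat([attended, post_prev], dim=-1))
    pre_history = torch.cat([pre_history[:, :, 1:], pre.unsqueeze(-1)], dim=-1)
    post = torch.cat([nlms[i](pre_history[:, i]) for i in range(N_NEURONS)], dim=-1)
    return post, pre_history

# Run a few thought steps, carrying the neuron state forward; the real model
# would also recompute the attended features from a learned query each tick.
post = torch.zeros(2, N_NEURONS)                # learnable start state in practice
pre_history = torch.zeros(2, N_NEURONS, HIST)
for _ in range(5):
    post, pre_history = tick(torch.randn(2, FEAT_DIM), post, pre_history)
```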
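
And the synchronization sketch, under the same caveats (the exact pairing scheme and decay parameterization here are assumptions): each neuron pair gets a decay-weighted dot product of its post-activation histories, and the concatenated scores form the latent representation that feeds the per-tick prediction head and the next attention query.

```python
# Illustrative sketch of synchronization (parameterization is an assumption):
# pairwise, decay-weighted dot products of post-activation histories form the
# latent representation used for the per-tick prediction and the next query.
import torch
import torch.nn as nn

N_NEURONS, FEAT_DIM, N_CLASSES = 4, 16, 10
N_PAIRS = N_NEURONS * (N_NEURONS - 1) // 2

decay = nn.Parameter(torch.zeros(N_PAIRS))     # learned: 0 -> use the whole history,
                                               # large -> emphasize recent ticks
output_head = nn.Linear(N_PAIRS, N_CLASSES)    # this tick's prediction
query_head = nn.Linear(N_PAIRS, FEAT_DIM)      # attention query for the next tick

def synchronization(post_history: torch.Tensor) -> torch.Tensor:
    # post_history: (B, N_NEURONS, T) post-activations accumulated over T ticks.
    B, N, T = post_history.shape
    ages = torch.arange(T - 1, -1, -1, dtype=post_history.dtype)  # newest tick = 0
    pairs = torch.combinations(torch.arange(N), r=2)              # (N_PAIRS, 2)
    scores = []
    for p, (i, j) in enumerate(pairs):
        weights = torch.exp(-decay[p].clamp(min=0) * ages)        # per-pair decay
        scores.append((post_history[:, i] * post_history[:, j] * weights).sum(-1))
    return torch.stack(scores, dim=-1)                            # (B, N_PAIRS)

# After each tick, the latent representation yields one prediction; across ticks
# these per-tick outputs are combined into the final answer.
post_history = torch.randn(2, N_NEURONS, 5)    # e.g., 5 ticks of "thought" so far
latent = synchronization(post_history)
logits = output_head(latent)                   # prediction at this tick
next_query = query_head(latent)                # drives where to attend next tick
```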