Introducing the Continuous Thought Machine (CTM) 00:00
Sakana AI has released the "Continuous Thought Machine" (CTM), a new AI model inspired by biological processes.
CTM addresses a key limitation of current AI models, the inability to perceive time, by incorporating an internal clock.
This model demonstrates emergent capabilities, such as solving 2D mazes directly from raw images without positional hints, by using its internal neural timing.
CTM can generalize maze-solving to larger scales, suggesting it builds a useful internal spatial representation, or world model, even without positional information during training.
For image processing, CTM naturally takes multiple steps to examine different parts of an image before making a decision, as indicated by its attention trace.
The longer CTM "thinks" (i.e., the longer its internal clock runs), the more accurate its answers become.
While not yet state-of-the-art, CTM represents a significant first step in bridging biologically inspired AI models into the field, showing strong performance for a novel idea.
The rapid evolution of AI research also fuels an AI agent-powered data-scraping economy, which raises privacy concerns.
AI enables data brokers to scrape, analyze, and categorize personal details like home addresses, phone numbers, and family connections faster and with less effort than ever before.
DeleteMe scans the web, submits removal requests to data brokers, and continuously monitors to keep personal data off these platforms.
The service provides a dashboard showing scanned listings (e.g., 635 in the example) and regular reports on where data was collected and how it's being removed.
DeleteMe offers family plans to help protect loved ones' data, as collecting information on relatives has become easier.
CTM's Unique Approach to Time and Neuron Activity 02:37
While AI models incorporating time are not entirely new (e.g., recurrent neural networks), CTM differs significantly.
CTM replaces conventional static activation functions with neuron-level models that have their own learnable weights, incorporating histories of activations to produce complex neuron-level activity.
When producing outputs, CTM aggregates temporal relationships between neurons.
CTM doesn't "decide" based on wall-clock time but has an internal clock that ticks, providing a way to track its own past activity.
This internal clock allows CTM to focus on different points of an image and improve prediction confidence, similar to how human attention works.
CTM has a flexible input mechanism, processing raw data via a feature extractor (e.g., CNN for images, embedding layers for sequential data), then through an attention layer.
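The input path described above can be sketched in a few lines. This is a minimal illustration, not Sakana AI's code: a stand-in "feature extractor" replaces the CNN, and a standard softmax dot-product attention layer lets a query vector select among the extracted features. All sizes, weights, and the patch data are illustrative.

```python
import math

def extract_features(image_patches):
    """Stand-in for a CNN: reduce each patch to a 2-D feature (mean, max)."""
    return [[sum(p) / len(p), max(p)] for p in image_patches]

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(query, features):
    """Dot-product attention: score each feature against the query,
    then return the softmax-weighted sum of features."""
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * feat[i] for w, feat in zip(weights, features))
            for i in range(dim)]

patches = [[0.1, 0.2], [0.9, 1.0], [0.4, 0.5]]   # toy "image patches"
feats = extract_features(patches)                 # one feature per patch
attended = attend([1.0, 0.0], feats)              # query picks out features
```

The attended vector is what gets combined with the previous tick's neuron activations in the steps below.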
The model contains a certain number of neurons (e.g., 128 to 4,096 in the paper, simplified to 4 for explanation purposes).
CTM's internal clock has a base unit called a "tick," and the model progresses through these ticks, accumulating "thought steps."
Input attention combines with neuron activations from the previous tick and is sent to a "synapse model," which compares input with each neuron's past "thoughts" to generate "pre-activations."
The synapse model acts as a communication point, combining insights from all neurons' signals and new input to redistribute guiding signals; after this, all signals are processed in parallel.
Pre-activations are then passed to individual "neuron level models" (NLMs), which are simple MLPs with a small memory that stores a limited history of recent pre-activations (e.g., 3 ticks).
Each NLM generates a "post-activation," which is sent to the synapse model for the next tick, mimicking how individual biological neurons demonstrate complex, time-sensitive responses.
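The synapse-model-to-NLM flow above can be sketched as follows. This is an illustrative toy, not the paper's implementation: the synapse model is a fixed linear mix where the real one is learned, and each NLM is a weighted sum plus tanh where the paper uses a small MLP. It uses 4 neurons and a 3-tick memory, matching the simplified numbers in the explanation.

```python
import math

N_NEURONS = 4
MEMORY = 3   # each NLM remembers the last 3 ticks of pre-activations

def synapse_model(attended, prev_post, weights):
    """Mix the attended input with every neuron's previous post-activation
    into one pre-activation per neuron (toy fixed weights here)."""
    combined = attended + prev_post
    return [math.tanh(sum(w * x for w, x in zip(row, combined)))
            for row in weights]

def nlm(history, w):
    """Neuron-level model: a per-neuron function over that neuron's own
    recent pre-activation history (an MLP in the paper)."""
    return math.tanh(sum(wi * h for wi, h in zip(w, history)))

attended = [0.5, -0.2]            # toy attended input
prev_post = [0.0] * N_NEURONS     # previous tick's post-activations
syn_w = [[0.1 * (i + j + 1) for j in range(len(attended) + N_NEURONS)]
         for i in range(N_NEURONS)]
nlm_w = [[0.3, 0.2, 0.5]] * N_NEURONS   # one weight per remembered tick

pre = synapse_model(attended, prev_post, syn_w)
histories = [[0.0, 0.0, p] for p in pre]        # newest tick last
post = [nlm(h, w) for h, w in zip(histories, nlm_w)]
```

Each `post` value would then feed back into the synapse model on the next tick, giving every neuron a time-sensitive response shaped by its own history.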
Post-activations undergo "synchronization," where the model quantifies how neuron activity patterns change over time relative to each other, using this dynamic and temporal relationship for decisions.
Neurons are artificially paired, and their activation histories are evaluated for a "synchronization score," which is then concatenated to create a "latent representation."
This synchronization step allows the model to base its understanding on evolving patterns of paired activities throughout the entire thought process, leading to a deeper representation of information.
A decay is applied to neuron signals, enabling CTM to learn whether to prioritize recent activity or long-term historical relationships, capturing interactions across different time scales.
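The synchronization scoring can be sketched like this. It is an illustrative formula, not the paper's exact one: each neuron pair gets a decay-weighted inner product of their activation histories, where a large decay emphasizes recent ticks and a decay near zero weights the whole history; in CTM the decay is learnable per pair.

```python
import math

def sync_score(hist_a, hist_b, decay=0.5):
    """Decay-weighted inner product of two neurons' activation histories;
    the most recent tick is last and decays least."""
    T = len(hist_a)
    num = norm = 0.0
    for t, (a, b) in enumerate(zip(hist_a, hist_b)):
        w = math.exp(-decay * (T - 1 - t))   # older ticks shrink
        num += w * a * b
        norm += w
    return num / norm

# Latent representation: concatenate scores over the chosen neuron pairs.
histories = [
    [0.9, 0.8, 0.7],     # neuron 0
    [0.8, 0.9, 0.6],     # neuron 1: moves with neuron 0
    [-0.5, 0.4, -0.3],   # neuron 2: moves against neurons 0 and 1
]
pairs = [(0, 1), (1, 2), (0, 2)]
latent = [sync_score(histories[i], histories[j]) for i, j in pairs]
```

Co-varying neurons (0 and 1) score positively while anti-correlated ones (0 and 2) score negatively, so the concatenated scores encode how activity patterns evolve relative to each other.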
This latent representation is fed into learnable linear layers; one layer generates CTM's output/prediction for the current tick, while another creates an attention query for the next tick.
Users define the number of "thought steps" or ticks, receiving one output per tick, with all outputs then used to determine a single, robust final prediction across the entire thought process.
For the first tick, where no historical information exists, initialization values for the synapse model, neuron level models, and attention mechanisms can be set and optimized during training.
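The whole tick loop described above can be sketched end to end. All components here are illustrative stand-ins: one `step` function plays the role of the synapse model plus NLMs, two toy "heads" emit the per-tick prediction and the next tick's attention query, and the per-tick outputs are combined by a plain average, which is a simplification of how the paper aggregates outputs across ticks.

```python
import math

N_TICKS = 5

def step(state, query, x):
    """Stand-in for synapse model + NLMs: fold input and query into state."""
    return [math.tanh(0.6 * s + 0.3 * x + 0.1 * q)
            for s, q in zip(state, query)]

def output_head(state):
    return sum(state) / len(state)     # this tick's prediction

def query_head(state):
    return [0.5 * s for s in state]    # attention query for the next tick

state = [0.0, 0.0, 0.0, 0.0]   # a learnable initialization in the paper
query = [0.0, 0.0, 0.0, 0.0]   # likewise initialized and optimized
x = 1.0                        # toy scalar input
per_tick = []
for _ in range(N_TICKS):
    state = step(state, query, x)
    per_tick.append(output_head(state))
    query = query_head(state)

final = sum(per_tick) / len(per_tick)  # one answer for the whole thought process
```

Even in this toy, successive ticks refine the prediction toward a stable value, mirroring the observation that CTM's answers improve the longer its internal clock runs.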