Cost of computing has fallen dramatically and consistently since 1940, enabling rapid AI progress.
The 2010s saw deep learning thrive thanks to fast, cheap GPUs and large data sets.
AI’s dominant paradigm became scaling up model and data size, with larger models achieving predictably better benchmark scores.
Many believed that further scaling alone would automatically lead to artificial general intelligence (AGI).
However, benchmarks mostly measured static, memorized skills—very different from true, adaptive intelligence.
Chollet introduced the Abstraction and Reasoning Corpus (ARC) in 2019 to highlight these limits, showing human performance far surpasses scaled models.
In 2024, AI research shifted focus to test-time adaptation, where models adjust their behavior dynamically based on problems they encounter during inference.
Notable progress on ARC was observed with these new dynamic approaches.
OpenAI’s o3 model, fine-tuned on ARC, achieved human-level performance, demonstrating fluid intelligence on this benchmark for the first time.
The AI field has now moved away from static pre-training and into the era of test-time adaptation.
Two perspectives on intelligence: Minsky’s "task automation" view (AGI as performing 80% of human economic tasks) and the "fluid problem-solving" view (ability to tackle truly novel problems).
Chollet favors defining intelligence as the efficiency with which systems use past experience to handle future novelty.
Benchmarks that resemble human exams mostly measure rote skills, not true adaptive intelligence.
Intelligence involves operational area (range of contexts a skill applies to) and information efficiency (how much practice/data needed to acquire skills).
The way intelligence is measured deeply affects research direction—a poorly chosen metric can lead to superficial targets and "missing the point."
ARC1's binary nature means once a system gains some fluid intelligence, scores jump near the maximum, lacking granularity to judge progress.
ARC2, released March 2025, increases complexity and compositional reasoning demands, remaining solvable by untrained humans, as confirmed via real-world testing.
Baseline large language models and even static reasoning systems score near zero; only systems using test-time adaptation make marginal progress, and even they remain far below human level.
The fact that it remains easy to construct tasks that are simple for humans yet stump AI is evidence that AGI has not been achieved.
ARC3, expected in early 2026, will introduce interactive environments, requiring agentic goal-setting, exploration, and learning, with an emphasis on efficiency of actions.
The Foundations of Intelligence: Abstractions and Recombination 20:26
True novelty is rare; most real-world situations recombine a modest set of “atoms of meaning”—abstractions.
Intelligence is the ability to extract reusable abstractions from past experience and recombine them on the fly for new tasks.
Intelligence is distinguished not just by what is achievable, but by the efficiency in learning and deploying abstractions (in both data and compute).
Chollet distinguishes two types of abstraction:
Type 1: Value-centric (continuous) — central to perception, intuition, and modern machine learning.
Type 2: Program-centric (discrete) — central to human reasoning, logic, and discrete tasks like code manipulation.
Transformers excel at Type 1, but struggle with discrete reasoning tasks (Type 2), such as sorting or algorithmic computation.
Discrete program search enables invention and creativity, relying on combinatorial pattern search rather than simple interpolation.
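To make discrete program search concrete, here is a minimal sketch (all primitive names are hypothetical, not from the talk): brute-force enumeration of compositions of a tiny DSL of list operations, stopping at the first program consistent with the input/output examples.

```python
from itertools import product

# A tiny hypothetical DSL of discrete primitives over lists of ints.
PRIMITIVES = {
    "reverse": lambda xs: xs[::-1],
    "sort": sorted,
    "drop_first": lambda xs: xs[1:],
    "double": lambda xs: [2 * x for x in xs],
}

def search(examples, max_depth=3):
    """Enumerate compositions of primitives (Type 2 search) and return
    the first program consistent with all input/output examples."""
    for depth in range(1, max_depth + 1):
        for names in product(PRIMITIVES, repeat=depth):
            def run(xs, names=names):
                for name in names:
                    xs = PRIMITIVES[name](xs)
                return xs
            if all(run(i) == o for i, o in examples):
                return names  # the synthesized program, e.g. ("sort", "double")
    return None

prog = search([([3, 1, 2], [2, 4, 6]), ([5, 4], [8, 10])])
# -> ('sort', 'double')
```

Even with four primitives and depth three, the space already holds 84 candidate programs; the combinatorial blow-up with a realistic DSL is why unguided enumeration alone cannot scale.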
Combining Type 1 and Type 2 for Human-like Reasoning 29:33
Human intelligence combines both forms: pattern recognition (Type 1) to narrow options, then explicit reasoning (Type 2) to analyze them.
The challenge is managing combinatorial “search explosion” when trying to synthesize new programs; intuition-based heuristics (from Type 1) can guide efficient search.
Programmer-like Meta-Learners: The Path Forward 31:53
Future AI should function as programmer-like systems that approach new tasks by synthesizing custom programs.
This involves a global, evolving abstraction library that updates as new problems are solved, much like a software engineer growing their toolkit and sharing on GitHub.
Such AI would blend deep learning (for intuitive, Type 1 sub-problems) with algorithmic reasoning (for Type 2 sub-problems), using learned intuition to guide the search through the space of candidate programs.
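The evolving abstraction library could work roughly as in this sketch (all names hypothetical): each solved task's program is stored as a new named primitive, so later searches can reuse it as a single building block instead of rediscovering it.

```python
from itertools import product

class AbstractionLibrary:
    """Minimal sketch of a growing abstraction library."""

    def __init__(self, primitives):
        self.fns = dict(primitives)  # name -> callable

    def _run(self, names, xs):
        for name in names:
            xs = self.fns[name](xs)
        return xs

    def search(self, examples, max_depth=3):
        """Brute-force compose library entries until the examples are solved."""
        for depth in range(1, max_depth + 1):
            for names in product(self.fns, repeat=depth):
                if all(self._run(names, i) == o for i, o in examples):
                    return names
        return None

    def solve_and_learn(self, task_name, examples):
        """Solve a task, then keep its program as a reusable abstraction."""
        prog = self.search(examples)
        if prog is not None and len(prog) > 1:
            self.fns[task_name] = lambda xs, p=prog: self._run(p, xs)
        return prog

lib = AbstractionLibrary({
    "reverse": lambda xs: xs[::-1],
    "sort": sorted,
    "double": lambda xs: [2 * x for x in xs],
})

# Solving 'sort_desc' ([2,1,3] -> [3,2,1]) adds it to the library...
lib.solve_and_learn("sort_desc", [([2, 1, 3], [3, 2, 1])])
# ...so a later task ([2,1,3] -> [6,4,2]) is solved in two steps
# instead of the three it would need from the base primitives alone.
prog = lib.search([([2, 1, 3], [6, 4, 2])])
```

Curating which programs deserve a name (compression, reuse frequency) is the hard part a real meta-learner would have to solve; here every multi-step solution is kept.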