AI models like Claude and ChatGPT are trained in two phases: pre-training (learning to predict the next word in large volumes of human-written text) and reinforcement learning (using human feedback to optimize desired behaviors).
Scaling up data and compute in pre-training and RL phases has led to consistent and predictable improvements in AI capability.
Scaling laws discovered in both pre-training and RL show a precise power-law relationship: as compute, model size, and data increase, loss falls smoothly and predictably over many orders of magnitude.
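A sketch of the canonical form, using the notation of Kaplan et al. (2020); the exponents are empirical fits from that paper and may not be the exact figures discussed here:

```latex
% Pre-training loss as a power law in parameters N, data D, and compute C
% (Kaplan et al., 2020; N_c, D_c, C_c are fitted constants).
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \quad
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}, \quad
L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C}
% Fitted exponents are small, on the order of 0.05 to 0.1
% (e.g., \alpha_N \approx 0.076, \alpha_D \approx 0.095 in that paper),
% which is why improvement is smooth across many orders of magnitude.
```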
These scaling laws give confidence that AI can continue to advance predictably as resources increase.
The discovery of these laws grew out of the physics habit of asking broad, simple questions of a new system.
AI progress can be visualized on two axes: flexibility (how many modalities and kinds of input a model can handle) and task complexity/duration.
Models are progressing from narrow, game-specific systems (e.g., AlphaGo) to flexible, multi-modal, and long-horizon tasks.
The length of tasks AI can handle is doubling roughly every seven months, with future models possibly handling projects that would take humans weeks, months, or even years.
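A back-of-the-envelope extrapolation of that trend (the starting horizon and time points below are assumptions for illustration, not figures from the talk):

```python
# Toy projection of the ~7-month task-horizon doubling trend.
def projected_horizon(months_from_now: float,
                      current_horizon_hours: float = 1.0,
                      doubling_months: float = 7.0) -> float:
    """Task length (hours) an AI could handle, extrapolating the trend."""
    return current_horizon_hours * 2 ** (months_from_now / doubling_months)

for months in (0, 12, 24, 36):
    print(f"{months:2d} months out: ~{projected_horizon(months):.1f} hours")
# ~3.3x growth per year: a 1-hour horizon today reaches roughly a
# 35-hour (work-week-scale) horizon in three years.
```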
To reach broadly human-level AI, models need more organizational knowledge (context about how organizations work), long-term memory, and enhanced oversight for nuanced tasks.
Improvements in AI memory and contextual understanding are being incorporated into newer models like Claude 4.
Progress is required in generating better reward signals for creative and subjective tasks (e.g., humor, poetry).
Memory improvements are enabling collaboration on longer tasks; complex work on the scale of hours is now achievable.
The distinction between human judgment and AI generation is shrinking, suggesting the human role will shift toward managing, supervising, and validating AI outputs.
Products are evolving from co-pilots (requiring human approval) to end-to-end full workflow replacements in some domains.
The Future of AI Automation Versus Collaboration 20:17
Some tasks can tolerate less-than-perfect AI performance and are being automated more quickly.
High-reliability tasks (99.9%+ correct) still require humans in the loop, but full automation in more areas is expected as reliability improves.
Human-AI teamwork is likely to dominate high-complexity areas in the near term.
High-skill, computer-based tasks involving large data interactions—such as finance, legal, and business integrations—are promising fields for AI application.
Integrating AI into the fabric of existing businesses, analogous to reimagining factories during electrification, offers significant leverage.
Physics training helped Kaplan focus on identifying broad, precise trends (like scaling laws) in AI.
Asking naive but fundamental questions (such as the exact mathematical nature of learning curves) is valuable in AI’s young and rapidly evolving field.
In AI, studying very large neural networks leverages mathematical techniques familiar from physics.
Understanding AI interpretability is akin to neuroscience or biology, except that every component of a neural network can be measured exactly, enabling far more thorough analysis.
Scaling laws have proven robust, and deviations from expected trends often indicate training or engineering problems, not limits of the paradigm itself.
AI training and inference are becoming rapidly more efficient, with 3x to 10x annual algorithmic improvements.
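Compounded over a few years, gains of that size are dramatic (the 3-year window below is an assumption for illustration):

```python
# Compounding the reported 3x-10x annual algorithmic efficiency gains.
years = 3
for annual_gain in (3, 10):
    total = annual_gain ** years
    print(f"{annual_gain}x/year for {years} years -> {total}x cheaper "
          "at fixed capability")
# 27x to 1000x over three years: yesterday's frontier capability quickly
# becomes cheap, even as demand shifts to new frontier models.
```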
While costs will decrease, high-capability frontier models will likely continue to be in demand for their ability to handle complex, long-horizon tasks.
Building at the frontier of AI, understanding model mechanics, and efficiently leveraging and integrating AIs are key strategies for staying relevant.
Audience Q&A: Scaling, Self-Correction, and Training 35:24
Task-horizon expansion may stem from improved self-correction and planning, which let modest gains in underlying ability translate into much longer achievable tasks.
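A toy model of why (my illustration, not from the talk): if an agent completes each step of a task independently with probability p, the achievable horizon grows sharply as p approaches 1:

```latex
% Expected number of consecutive successful steps before the first failure
% (geometric distribution); for p near 1 this is about 1/(1-p).
\mathbb{E}[\text{steps}] = \frac{p}{1-p} \approx \frac{1}{1-p}
% p = 0.99  -> ~100-step horizon
% p = 0.999 -> ~1000-step horizon
% A small absolute gain in per-step reliability (or in catching and
% correcting one's own errors) yields a 10x longer horizon.
```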
For complex, long-horizon tasks outside coding, training data and verification become challenging. AI oversight (AI supervising AI) may improve efficiency and scalability.
Generating and curating training tasks increasingly combines both AI and human input as complexity grows.