Google announced Genie 3, a world model that generates fully controllable, immersive environments, much like a video game.
Genie 3 lets users steer the scene in real time with keyboard input while keeping frames visually consistent with one another.
Demonstrations showcase diverse environments, such as a gorilla in a city, a mountain biker on hills, a cartoonish firefly, and a tropical island during a storm.
Environments and character movements show a high level of realism and detail, rendered at 720p.
Google positions world models as critical tools for training robots and AI agents, and for simulating rich environments that allow unlimited learning and self-improvement.
Removing humans from the training loop allows rapid and scalable agent training limited primarily by computational resources.
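The idea of training an agent inside a simulated world with no human in the loop can be illustrated with a toy sketch. Everything here is an assumption for illustration: `ToyWorld` is a hypothetical five-cell corridor standing in for a generated environment, and tabular Q-learning is a swapped-in textbook method, not a description of how Google trains agents. The point is only that the number of episodes is a parameter bounded by compute, not by human availability.

```python
import random


class ToyWorld:
    """Hypothetical stand-in for a generated world: a corridor with a goal at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # action 0 moves left, action 1 moves right; walls clamp the position
        delta = 1 if action == 1 else -1
        self.pos = max(0, min(self.length - 1, self.pos + delta))
        done = self.pos == self.length - 1
        reward = 1.0 if done else 0.0
        return self.pos, reward, done


def train(episodes, seed=0, alpha=0.5, gamma=0.9, epsilon=0.2, max_steps=100):
    """Tabular Q-learning; `episodes` is the compute budget, no human input is needed."""
    rng = random.Random(seed)
    env = ToyWorld()
    q = [[0.0, 0.0] for _ in range(env.length)]
    for _ in range(episodes):
        state = env.reset()
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore
            else:
                action = 0 if q[state][0] > q[state][1] else 1  # exploit, ties go right
            nxt, reward, done = env.step(action)
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q
```

Calling `train(episodes=200)` returns a table in which moving right is valued above moving left in every non-goal cell. Raising `episodes` costs only more computation, which is the scaling property the summary refers to.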
Technical Advancements and Model Consistency 05:31
Genie 3 generates each frame by accounting for the entire sequence of previous frames, ensuring consistent and realistic world behavior (e.g., ball trajectories, revisiting previous locations).
Achieving real-time, frame-by-frame generation is computationally demanding.
Consistency in Genie 3 is described as an emergent property of scale and additional training, not the result of explicit programming.
Comparisons with methods such as NeRFs and Gaussian splatting, which rely on an explicit 3D representation, highlight that Genie 3 builds flexible, dynamic worlds without explicit 3D modeling.
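Generating each frame from the entire sequence of previous frames can be sketched with a toy autoregressive rollout. This is a minimal illustration under stated assumptions, not Genie 3's architecture: a "frame" is reduced to a hypothetical `(position, content)` pair, and `next_frame` is an invented helper that scans the full history so a revisited location shows what was generated there before.

```python
import random
from typing import List, Tuple

Frame = Tuple[int, str]  # (position, content visible at that position)

SCENERY = ["tree", "rock", "river", "house"]
MOVES = {"left": -1, "right": 1}


def next_frame(history: List[Frame], action: str, rng: random.Random) -> Frame:
    """Produce the next frame conditioned on every earlier frame and the user's action."""
    pos = history[-1][0] + MOVES.get(action, 0)
    # Scan the whole history: if this position was shown before, reproduce it.
    for seen_pos, content in history:
        if seen_pos == pos:
            return (pos, content)
    # Otherwise invent new content for a location not yet seen.
    return (pos, rng.choice(SCENERY))


def rollout(actions: List[str], seed: int = 0) -> List[Frame]:
    """Generate frames one at a time, feeding the growing history back in."""
    rng = random.Random(seed)
    history: List[Frame] = [(0, "start")]
    for action in actions:
        history.append(next_frame(history, action, rng))
    return history
```

With `rollout(["right", "right", "left", "left", "right"])`, the frames at steps 1, 3, and 5 all show position 1 with identical content, which is the "revisiting previous locations" behavior in miniature. In this toy the memory is an explicit lookup, whereas the summary describes Genie 3's consistency as emerging from training.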
Interactive Prompting & On-the-Fly Scene Changes 08:02
Users can prompt events during real-time simulations, such as making characters or objects appear or causing rain to start.
Examples include adding a man in a chicken suit, a jet ski, or a dragon dynamically to ongoing scenes.
This capability demonstrates real-time adaptability: scenes can be modified without breaking visual continuity.
Genie 3 has not been released or made available for public testing; it remains internal to Google.
The current demos have no generated sound; related models can already generate audio, which suggests real-time audio may be integrated in the future.
Additional Visual Demos and Room for Improvement 10:17
Genie 3 can generate a wide aesthetic range, from Pixar-like village scenes to highly realistic scenarios such as a person approaching a spaceship.
Realistic interactions include environmental responses, such as flowers moving as a person walks past, though issues like blurriness persist in complex regions of the scene.
One technical demo shows wall painting, where paint appears only where the brush actually touches the wall, although visual artifacts such as missing reflections are present.