Genie 3 by Google DeepMind lets users make real-world images interactive: they can enter an image as an explorable environment and modify it with prompts.
The system enables users to move around, take lasting actions, and explore creatively within generated environments.
Genie 3 is designed for AI agents to act out scenarios and self-improve, but the presenter predicts it will also gamify reality and imagination for users.
Real-time interaction is now possible at 720p, 24 frames per second, meaning immediate on-screen responses to user actions at a reasonably high resolution.
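To put those figures in perspective, here is a rough back-of-the-envelope sketch of what 720p at 24 frames per second implies. It assumes "720p" means the common 1280x720 frame size (the summary states only "720p"), and the function names are illustrative, not part of any Genie 3 interface.

```python
# Assumption: "720p" is taken to mean 1280x720 pixels; the source only says "720p".
WIDTH, HEIGHT = 1280, 720
FPS = 24  # frame rate stated in the summary


def frame_budget_ms(fps: int) -> float:
    """Time available to produce one frame, in milliseconds."""
    return 1000.0 / fps


def pixels_per_second(width: int, height: int, fps: int) -> int:
    """Number of pixels that must be generated each second."""
    return width * height * fps


print(f"{frame_budget_ms(FPS):.1f} ms per frame")
print(f"{pixels_per_second(WIDTH, HEIGHT, FPS):,} pixels per second")
```

Under these assumptions, each frame has to be generated in roughly 42 milliseconds for the response to a user action to feel immediate.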
Genie 3 is still in research preview; no release date has been shared for public access.
The system's world memory persists for only minutes: a change such as painting a wall will not be remembered if the user returns the next day.
Key limitations include:
Only common, simple actions are currently possible (e.g., moving, jumping); complex actions are not yet supported.
No ability to talk to or engage in complex interactions with other characters—modeling these remains a research challenge.
Real-world locations are not accurately rendered; the focus is on imaginative worlds rather than faithful reproductions of real places.
High-fidelity text rendering is not native; legible text generally appears only when it is specified in the prompt.
Google emphasizes that earlier image generators also started as research-only tools and later became widely available, so Genie 3 may follow a similar path.
Genie 3 is not positioned as a replacement for platforms like Omniverse or Unreal Engine, but provides a different approach: scalable, prompt-based world generation rather than meticulously handcrafted assets.
Hybrid approaches exist, such as models that write code for parts of an environment directly from prompts, but they may be less scalable.
Genie’s advantage is scalability: it can potentially learn from billions of hours of video data rather than depend on manually built assets.