Genie 3

3 · Family: Genie

Foundation world model from Google DeepMind that generates interactive 3D worlds from a text prompt, in real time at 24 fps, 720p, with consistency for several minutes.

⏳ Preview · ⏳ Limited access · World Model · 📁 Genie

Release date

5 August 2025

🏢 Producer: Google DeepMind

Access: Hosted · Deployment: ☁ Cloud

Overview

Genie 3 is a general-purpose foundation world model developed by Google DeepMind, announced on 5 August 2025 by Jack Parker-Holder and Shlomi Fruchter. Given a text prompt, the model generates dynamic, interactive 3D worlds that can be navigated in real time at 24 frames per second and 720p resolution, and that stay consistent for several minutes.

Progress over Genie 2

Genie 3 is the first model in the Genie family to allow real-time interaction, and it also improves consistency and realism compared with Genie 2 (December 2024). Visual memory extends back roughly one minute: the model remembers previously seen regions and renders them correctly when they are revisited. Unlike approaches such as NeRFs or Gaussian Splatting, Genie 3 does not rely on an explicit 3D representation. Worlds are generated frame by frame from the world description and the user's actions, which makes them more dynamic and richer.
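The published figures imply some simple budgets. The following is a back-of-the-envelope sketch, not a description of Genie 3's internals: it assumes that 720p means 1280×720 pixels and that "roughly one minute" of visual memory is exactly 60 seconds. Both values and all variable names are illustrative assumptions.

```python
# Back-of-the-envelope budgets implied by the published figures.
# Assumptions (not stated by DeepMind): 720p = 1280x720, memory = exactly 60 s.
FPS = 24
WIDTH, HEIGHT = 1280, 720
MEMORY_SECONDS = 60

# Time available to produce each frame if generation keeps up with playback.
frame_budget_ms = 1000 / FPS

# Number of past frames spanned by a one-minute visual memory.
frames_in_memory = FPS * MEMORY_SECONDS

# Raw pixels that must be produced every second at this resolution.
pixels_per_second = WIDTH * HEIGHT * FPS

print(f"frame budget: {frame_budget_ms:.1f} ms")
print(f"frames spanned by memory: {frames_in_memory}")
print(f"pixels per second: {pixels_per_second:,}")
```

Under these assumptions, each frame has to be generated in under about 42 ms, and revisiting a region seen a minute earlier means staying consistent with content from 1,440 frames back.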

Promptable world events

In addition to navigational input, Genie 3 introduces promptable world events, a text-based form of interaction that lets the user change the simulated world on the fly (for example by altering the weather or introducing new objects or characters). This mechanism broadens the range of counterfactual ("what if") scenarios available to agents that learn from experience.
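One way to picture the two input channels is as a single ordered stream in which text events change the conditioning for every later frame. Genie 3 has no public API, so the sketch below is purely illustrative: `NavigationInput`, `WorldEvent` and `conditioning_per_step` are hypothetical names, and the way prompts are accumulated is an assumption rather than DeepMind's method.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class NavigationInput:
    """Hypothetical navigation action, e.g. 'forward' or 'left'."""
    direction: str


@dataclass(frozen=True)
class WorldEvent:
    """Hypothetical text event that alters the world from this point on."""
    prompt: str


def conditioning_per_step(
    description: str,
    inputs: List[Union[NavigationInput, WorldEvent]],
) -> List[Tuple[str, Tuple[str, ...]]]:
    """For each navigation step, list the text prompts active at that moment."""
    active = [description]
    steps = []
    for item in inputs:
        if isinstance(item, WorldEvent):
            # An event does not produce a step; it changes later conditioning.
            active.append(item.prompt)
        else:
            steps.append((item.direction, tuple(active)))
    return steps


steps = conditioning_per_step(
    "a snowy mountain trail",
    [
        NavigationInput("forward"),
        WorldEvent("a storm rolls in"),
        NavigationInput("forward"),
    ],
)
for direction, prompts in steps:
    print(direction, prompts)
```

In this toy version the first step is conditioned only on the original description, while the second also sees the storm event, which mirrors the "on the fly" behaviour described above.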

Embodied agent research

Genie 3 is used to generate worlds for training and evaluating embodied agents. DeepMind demonstrated this with a recent version of its SIMA agent: in worlds generated by Genie 3, SIMA pursues stated goals by issuing navigation actions to the model, while Genie 3, which is not told the agent's goal, simulates the frames that follow from those actions. Longer-horizon consistency makes it possible to execute longer sequences of actions and therefore more complex tasks.
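The division of labour described above can be sketched as a loop in which only the agent knows the goal and the world model only sees actions. This is a toy stand-in, not SIMA or Genie 3: `ToyWorldModel`, `ToyAgent`, `rollout` and the grid-based "frames" are hypothetical and exist only to illustrate the interface.

```python
from dataclasses import dataclass, field

# Hypothetical discrete action space; the real action space is not public.
ACTIONS = {"left": (-1, 0), "right": (1, 0), "up": (0, -1), "down": (0, 1)}


@dataclass
class ToyWorldModel:
    """Stand-in world model: turns an action into the next 'frame'.

    It never receives the agent's goal, only the actions.
    """
    position: tuple = (0, 0)
    history: list = field(default_factory=list)

    def step(self, action: str) -> dict:
        dx, dy = ACTIONS[action]
        self.position = (self.position[0] + dx, self.position[1] + dy)
        frame = {"t": len(self.history), "agent_xy": self.position}
        self.history.append(frame)
        return frame


@dataclass
class ToyAgent:
    """Stand-in agent: holds the goal and picks actions from observed frames."""
    goal: tuple

    def act(self, frame: dict) -> str:
        x, y = frame["agent_xy"]
        gx, gy = self.goal
        if x != gx:
            return "right" if gx > x else "left"
        return "down" if gy > y else "up"


def rollout(world: ToyWorldModel, agent: ToyAgent, max_steps: int = 50) -> list:
    """Alternate agent actions and world-model frames until the goal is reached."""
    frame = {"t": -1, "agent_xy": world.position}  # initial observation
    for _ in range(max_steps):
        if frame["agent_xy"] == agent.goal:
            break
        frame = world.step(agent.act(frame))
    return world.history


trajectory = rollout(ToyWorldModel(), ToyAgent(goal=(3, 2)))
print(len(trajectory), trajectory[-1])
```

The `max_steps` cap plays the role of the interaction horizon: a world model that stays consistent for longer allows a larger cap and therefore longer tasks.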

Limitations

DeepMind lists the following limitations: the action space directly available to the agent is limited; interactions between multiple independent agents are modelled imperfectly; real-world locations are not reproduced with perfect geographic accuracy; legible text is often rendered only when it is provided in the world description; and continuous interaction lasts a few minutes rather than hours.

Availability

Genie 3 has been released as a limited research preview to a small cohort of academics and creators. The weights are not publicly available, and there is no public API. DeepMind has said it is exploring how to extend access to additional testers.

Classification

World Model

Family: Genie

Access & deployment

Access: Hosted · Deployment: Cloud · Weights: Closed

Key parameters

📥 Input: text, structured data

Robotics

Environment modeling · Spatial prediction · Scene understanding · Spatial reasoning

Technical specification

Modalities

⬇ Input

text, structured_data

⬆ Output

video

Capabilities and applications

Native model capabilities

Video understanding

The model's ability to analyse and interpret video content — recognising actions, motion, events and relationships between objects over time.

Category: video

Planning

Forming and executing action plans for complex tasks.

Category: planning

Technical architecture

Core Architecture

Transformer

Model Form

World Models · WAM

Sources and related pages

3 sources

Blog: Genie 3: A new frontier for world models (Google DeepMind, Aug 5, 2025), deepmind.google

Web: Genie (Google DeepMind models page), deepmind.google

Web: Project Genie (Google Labs), labs.google
