DreamerV3

3 · Family: Dreamer

General-purpose model-based RL algorithm that, with a single set of hyperparameters, masters more than 150 tasks and is the first to collect diamonds in Minecraft without any human data.

Active · Public access · Open source · World Model · Dreamer

Parameters

12M – 400M parameters

Release date

10 January 2023

Google DeepMind (research lab)

Access: Download · Deployment: Local

Overview

DreamerV3 is a general-purpose model-based reinforcement learning algorithm developed by Danijar Hafner, Jurgis Pasukonis, Jimmy Ba and Timothy Lillicrap. It was first released as an arXiv preprint on 10 January 2023 (arXiv:2301.04104) and published in Nature in 2025.

The agent learns a representation of the environment from raw observations (such as images) using a Recurrent State-Space Model (RSSM) with discrete latent representations. An actor-critic policy is trained on trajectories rolled out in imagination by the world model, without executing those actions in the real environment.
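As a rough illustration of learning in imagination, the sketch below rolls a policy forward inside a toy latent model and never calls a real environment. It is not the reference implementation: `ToyWorldModel`, `policy`, `imagine`, `discounted_return`, the `tanh` update (standing in for the GRU of an RSSM), the reward head, the horizon of 15 and the discount of 0.99 are all assumptions made for this example.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ToyWorldModel:
    """Hypothetical stand-in for a learned RSSM.

    The latent state has a deterministic recurrent part `h` (a single float
    here) and a discrete stochastic part `z` (one categorical variable).
    """

    def __init__(self, num_classes=4, seed=0):
        self.num_classes = num_classes
        self.rng = random.Random(seed)

    def prior_logits(self, h):
        # Toy prior: logits over latent classes predicted from h alone.
        return [math.cos(h + k) for k in range(self.num_classes)]

    def step(self, h, z, action):
        # Toy recurrent update; a real RSSM would use a learned GRU here.
        h_next = math.tanh(0.5 * h + 0.3 * z + 0.2 * action)
        # Sample the next discrete latent from the prior (no observation).
        probs = softmax(self.prior_logits(h_next))
        z_next = self.rng.choices(range(self.num_classes), weights=probs)[0]
        reward = h_next  # toy reward head reading the latent state
        return h_next, z_next, reward


def policy(h, z):
    """Hypothetical actor that sees only the latent state."""
    return 1 if h < 0.5 else -1


def imagine(model, h, z, horizon):
    """Roll the policy forward inside the model for `horizon` steps."""
    trajectory = []
    for _ in range(horizon):
        action = policy(h, z)
        h, z, reward = model.step(h, z, action)
        trajectory.append((h, z, action, reward))
    return trajectory


def discounted_return(rewards, gamma):
    """Discounted sum of rewards, a simple target a critic could regress to."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


trajectory = imagine(ToyWorldModel(), h=0.0, z=0, horizon=15)
rewards = [r for _, _, _, r in trajectory]
print(len(trajectory), discounted_return(rewards, gamma=0.99))
```

In a full agent the actor and critic would be updated from such imagined trajectories, while real environment steps are used only to collect data for training the world model.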

Results

With a single fixed set of hyperparameters, DreamerV3 outperforms specialised methods on more than 150 tasks across many domains (DMLab, Atari 100k/200M, Crafter, ProcGen, Minecraft, BSuite, continuous control benchmarks). It is the first algorithm to collect diamonds in Minecraft from scratch, given only pixels and sparse rewards, without human data or curricula.

Scaling

The paper demonstrates favourable scaling behaviour: larger models (from roughly 12M to 400M parameters) consistently improve both final performance and sample efficiency. Increasing the number of gradient steps further improves data efficiency.
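To make the "number of gradient steps" knob concrete, the sketch below shows one way a training loop might interleave data collection with gradient updates. The function `training_schedule` and the parameter `updates_per_env_step` are hypothetical names chosen for this example, not settings from the reference implementation.

```python
def training_schedule(env_steps, updates_per_env_step):
    """Return the ordered list of events a training loop would execute.

    Each environment step adds `updates_per_env_step` to an update budget;
    a gradient update runs whenever the budget reaches one.
    """
    events = []
    budget = 0.0
    for t in range(env_steps):
        events.append(("collect", t))
        budget += updates_per_env_step
        while budget >= 1.0:
            events.append(("update", t))
            budget -= 1.0
    return events


low = training_schedule(env_steps=8, updates_per_env_step=0.5)
high = training_schedule(env_steps=8, updates_per_env_step=2.0)
print(sum(kind == "update" for kind, _ in low))   # 4
print(sum(kind == "update" for kind, _ in high))  # 16
```

Both schedules consume the same eight environment steps; the second simply performs four times as many gradient updates on that data, which is the trade of extra compute for data efficiency described above.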

Classification

World Model

Family: Dreamer

Access & deployment

Download

Local

Weights: Open source

Key parameters

🧩 Parameters: 12M – 400M

✓ Fine-tuning

📥 Input: image, structured data, robot state data

Robotics

Motion planning, Robot control, Environment modeling, Spatial prediction

Technical specification

Parameters

12M – 400M parameters

License

MIT

Hardware requirements

Trains on a single GPU. Reported training times range from about 12 hours (small configurations) to several days (large models) on modern NVIDIA GPUs or TPUs. The reference implementation is built on JAX.

Features: Fine-tuning

Modalities

⬇ Input

image, structured_data, robot_state_data

⬆ Output

robot_actions, structured_data

Capabilities and applications

Native model capabilities

Planning

Forming and executing action plans for complex tasks.

Category: planning

Robotics

Motion planning, Robot control, Environment modeling, Spatial prediction

Benchmark results

4 benchmarks

Minecraft (Diamond)

pixel input, sparse rewards, no curriculum

first to collect diamonds without human data

📄 DreamerV3 paper (arXiv:2301.04104)

Atari 200M

single hyperparameter configuration across all games

state-of-the-art with single config

📄 DreamerV3 paper (arXiv:2301.04104)

DeepMind Control Suite (Proprio)

state-of-the-art

📄 DreamerV3 paper (arXiv:2301.04104)

Crafter

state-of-the-art

📄 DreamerV3 paper (arXiv:2301.04104)

Technical architecture

Core Architecture

GRU (Gated Recurrent Unit)

Model Form

World Models, WAM

Training Techniques

RL

Sources and related pages

4 sources

Paper: Mastering Diverse Domains through World Models (arXiv:2301.04104), arxiv.org

Web: DreamerV3 project website (danijar.com/dreamerv3), danijar.com

Repo: danijar/dreamerv3 (GitHub, MIT license), github.com

Paper: Mastering diverse control tasks through world models (Nature, 2025), nature.com
