DreamerV3

3 · Family: Dreamer
General-purpose model-based RL algorithm that, with a single set of hyperparameters, masters more than 150 tasks and is the first to collect diamonds in Minecraft without any human data.
Active · Public access · Open source · World Model · Dreamer
Parameters: 12M – 400M
Release date: 10 January 2023
Access: Download
Deployment: Local

Overview

DreamerV3 is a general-purpose model-based reinforcement learning algorithm developed by Danijar Hafner, Jurgis Pasukonis, Jimmy Ba and Timothy Lillicrap. It was first released as an arXiv preprint on 10 January 2023 (arXiv:2301.04104) and published in Nature in 2025.

The agent learns a representation of the environment from raw observations (such as images) using a Recurrent State-Space Model (RSSM) with discrete latent representations. An actor-critic policy is trained on trajectories rolled out in imagination by the world model, without executing those actions in the real environment.
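
The imagination step described above can be illustrated with a small, self-contained Python sketch. It is not the DreamerV3 implementation: `ToyWorldModel`, its random weights, the latent sizes, the scalar recurrent state and the random placeholder policy are all assumptions made for illustration. The sketch shows only the shape of the idea: latents are sampled as groups of one-hot vectors, and the rollout calls the model rather than an environment.

```python
import math
import random

NUM_GROUPS = 4    # toy value: number of categorical latent variables
NUM_CLASSES = 3   # toy value: classes per categorical variable


def sample_one_hot(logits, rng):
    """Sample a one-hot vector from unnormalised log-probabilities."""
    top = max(logits)
    weights = [math.exp(x - top) for x in logits]
    index = rng.choices(range(len(logits)), weights=weights)[0]
    return [1.0 if i == index else 0.0 for i in range(len(logits))]


class ToyWorldModel:
    """Hypothetical stand-in for a learned RSSM; weights are random, not trained."""

    def __init__(self, rng):
        self.rng = rng
        self.prior_weights = [
            [rng.uniform(-1.0, 1.0) for _ in range(NUM_CLASSES)]
            for _ in range(NUM_GROUPS)
        ]

    def initial(self):
        h = 0.0  # deterministic recurrent state (a scalar in this toy)
        z = [sample_one_hot([0.0] * NUM_CLASSES, self.rng) for _ in range(NUM_GROUPS)]
        return h, z

    def step(self, h, z, action):
        # Deterministic update from the previous state, latent and action.
        latent_code = sum(group.index(1.0) for group in z)
        h = math.tanh(0.5 * h + 0.1 * latent_code + 0.3 * (action - 0.5))
        # Sample the next discrete latent from a prior that depends only on h,
        # so no real observation is needed.
        z = [sample_one_hot([w * h for w in row], self.rng) for row in self.prior_weights]
        reward = h  # toy reward prediction
        return h, z, reward


def imagine(model, policy, horizon):
    """Roll the policy forward inside the model; the real environment is never called."""
    h, z = model.initial()
    trajectory = []
    for _ in range(horizon):
        action = policy(h, z)
        h, z, reward = model.step(h, z, action)
        trajectory.append((h, z, action, reward))
    return trajectory


def discounted_return(rewards, gamma):
    """Plain discounted sum of rewards, a simple learning target for a critic."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


rng = random.Random(0)
model = ToyWorldModel(rng)
policy = lambda h, z: rng.choice([0, 1])  # placeholder random policy
trajectory = imagine(model, policy, horizon=15)
value_target = discounted_return([step[3] for step in trajectory], gamma=0.99)
```

In a real agent the world model, actor and critic would be trained neural networks. The plain discounted sum here is a simplification chosen to keep the example short.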

Results

With a single fixed set of hyperparameters, DreamerV3 outperforms specialised methods on more than 150 tasks across many domains (DMLab, Atari 100k/200M, Crafter, ProcGen, Minecraft, BSuite, continuous control benchmarks). It is the first algorithm to collect diamonds in Minecraft from scratch from pixels and sparse rewards, without human data or curricula.

Scaling

The paper demonstrates favourable scaling behaviour: larger models (from roughly 12M to 400M parameters) consistently improve both final performance and sample efficiency. Increasing the number of gradient steps further improves data efficiency.
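
One way to picture "more gradient steps" is as a ratio of gradient updates to environment steps. The Python sketch below assumes a hypothetical `updates_per_env_step` setting and a hypothetical `UpdateScheduler` helper; neither name comes from the paper or the reference implementation. It shows only how a fractional ratio can be turned into a whole number of updates after each environment step.

```python
class UpdateScheduler:
    """Tracks how many gradient updates are due after each environment step.

    Fractional credit is carried over, so a ratio of 0.25 yields one
    update every fourth environment step.
    """

    def __init__(self, updates_per_env_step):
        self.ratio = updates_per_env_step  # hypothetical tuning knob
        self.credit = 0.0

    def updates_due(self):
        self.credit += self.ratio
        due = int(self.credit)
        self.credit -= due
        return due


def count_updates(updates_per_env_step, env_steps):
    """Total gradient updates performed over a number of environment steps."""
    scheduler = UpdateScheduler(updates_per_env_step)
    total = 0
    for _ in range(env_steps):
        # An agent would collect one environment step here, then train
        # on replayed data this many times.
        total += scheduler.updates_due()
    return total


low = count_updates(0.25, env_steps=8)
high = count_updates(2.0, env_steps=8)
```

With the same eight environment steps, the higher ratio performs more updates on the same amount of collected data, which is the trade-off the scaling result refers to: more computation per sample in exchange for needing fewer samples.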

Classification
World Model
Family: Dreamer
Access & deployment
Access: Download
Deployment: Local
Weights: Open source
Key parameters
Parameters: 12M – 400M
Fine-tuning: supported
Input: image, structured data, robot state data
Robotics: Motion planning, Robot control, Environment modeling, Spatial prediction

Technical specification

Parameters: 12M – 400M
License: MIT
Hardware requirements
Trains on a single GPU. Reported training times range from about 12 hours for small configurations to several days for large models on modern NVIDIA GPUs or TPUs. The reference implementation is built on JAX.
Features: Fine-tuning
Modalities
Input: image, structured data, robot state data
Output: robot actions, structured data

Capabilities and applications

Native model capabilities
Planning
The model's ability to determine a sequence of actions leading to a goal: predicting the consequences of actions and selecting an optimal path in a given environment.
Category: planning
Robotics: Motion planning, Robot control, Environment modeling, Spatial prediction

Benchmark results

4 benchmarks

Minecraft (Diamond)
Setting: pixel input, sparse rewards, no curriculum
Result: first to collect diamonds without human data
Source: DreamerV3 paper (arXiv:2301.04104)

Atari 200M
Setting: single hyperparameter configuration across all games
Result: state-of-the-art with single config
Source: DreamerV3 paper (arXiv:2301.04104)

DeepMind Control Suite (Proprio)
Result: state-of-the-art
Source: DreamerV3 paper (arXiv:2301.04104)

Crafter
Result: state-of-the-art
Source: DreamerV3 paper (arXiv:2301.04104)

Technical architecture

Training Techniques