GAIA-1

Wayve's generative world model for autonomous driving. From video, text and action inputs it generates realistic driving video sequences.
Research only, World Model, Video generation, Multimodal
Parameters: 9B
Release date: 20 June 2023
Deployment: Cloud

Overview

GAIA-1 is a generative world model developed by the UK company Wayve for autonomous driving. It takes video, text and action (vehicle control) inputs and generates realistic driver-perspective video sequences that are physically and geometrically consistent with the driving scenario.

Architecture

The model combines an autoregressive transformer (~6.5B parameters) operating on discrete video, text and action tokens with a diffusion video decoder (~2.6B parameters) that renders continuous frames from those tokens, for roughly 9B parameters in total. It was trained on ~4,700 hours of proprietary driving data collected by Wayve in the United Kingdom.
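The split between the two components can be illustrated with a toy sketch. The Python below is not Wayve's code (the weights and implementation are closed); every name in it (`toy_world_model`, `toy_decoder`, `VOCAB_SIZE`) is hypothetical, and the token arithmetic is a placeholder for a learned model. It only mirrors the data flow described above: discrete tokens are extended one at a time, then a separate decoder maps them to continuous values. It also tallies the stated parameter counts.

```python
import random

# Approximate parameter counts quoted in the text, in billions.
WORLD_MODEL_PARAMS_B = 6.5  # autoregressive transformer
DECODER_PARAMS_B = 2.6      # diffusion video decoder

VOCAB_SIZE = 16  # hypothetical toy vocabulary, not GAIA-1's real codebook size


def total_params_b() -> float:
    """Sum of the two stated component sizes, rounded to one decimal."""
    return round(WORLD_MODEL_PARAMS_B + DECODER_PARAMS_B, 1)


def toy_world_model(context: list[int], n_new: int, seed: int = 0) -> list[int]:
    """Stand-in for the autoregressive stage.

    Emits tokens one at a time; each new token depends on the sequence so far
    (here only on the last token plus seeded noise, as a placeholder).
    """
    if not context:
        raise ValueError("context must contain at least one token")
    rng = random.Random(seed)
    tokens = list(context)
    for _ in range(n_new):
        next_token = (tokens[-1] + rng.randrange(VOCAB_SIZE)) % VOCAB_SIZE
        tokens.append(next_token)
    return tokens[len(context):]


def toy_decoder(tokens: list[int], tokens_per_frame: int) -> list[list[float]]:
    """Stand-in for the decoder stage.

    Groups discrete tokens into frames and maps each token to a continuous
    value in [0, 1]. A real diffusion decoder would denoise pixels instead.
    """
    frames = []
    for start in range(0, len(tokens), tokens_per_frame):
        chunk = tokens[start:start + tokens_per_frame]
        frames.append([t / (VOCAB_SIZE - 1) for t in chunk])
    return frames


context = [1, 2, 3]  # pretend these encode past video, text and action
new_tokens = toy_world_model(context, n_new=8)
frames = toy_decoder(new_tokens, tokens_per_frame=4)
print(f"total parameters: ~{total_params_b()}B, frames decoded: {len(frames)}")
```

With the figures quoted above, the tally comes to about 9.1B, which is consistent with the rounded 9B shown elsewhere on this page.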

Use cases

GAIA-1 does not control the vehicle — it is used to generate synthetic data and scenarios for training and evaluating autonomous driving stacks, including rare corner cases. Weather, lighting, behaviour of other road users and ego-vehicle commands can be controlled via text prompts and action vectors.
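As an illustration of what "text prompts and action vectors" could look like in a data-generation pipeline, here is a minimal sketch. GAIA-1 has no public API, so the `ScenarioSpec` and `Action` types, their field names, and the choice of speed and curvature as action components are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Action:
    """Hypothetical ego-vehicle command for one time step."""
    speed_mps: float        # assumed unit: metres per second
    curvature_inv_m: float  # assumed unit: 1/metre, 0.0 means straight ahead


@dataclass
class ScenarioSpec:
    """Hypothetical container for the controllable factors named in the text."""
    weather: str
    lighting: str
    other_agents: str
    actions: list[Action] = field(default_factory=list)

    def text_prompt(self) -> str:
        """Join the scene descriptors into a single text prompt."""
        return f"{self.weather}, {self.lighting}, {self.other_agents}"


def constant_arc(speed_mps: float, curvature_inv_m: float, n_steps: int) -> list[Action]:
    """Build an action sequence that holds one speed and one curvature."""
    if n_steps < 0:
        raise ValueError("n_steps must be non-negative")
    return [Action(speed_mps, curvature_inv_m) for _ in range(n_steps)]


# A rare corner case: gentle turn at night in heavy rain with a cyclist ahead.
spec = ScenarioSpec(
    weather="heavy rain",
    lighting="night",
    other_agents="a cyclist crossing ahead",
    actions=constant_arc(speed_mps=8.0, curvature_inv_m=0.02, n_steps=5),
)
print(spec.text_prompt())
print(len(spec.actions), "action steps")
```

Keeping the scene description and the ego commands in separate fields makes it easy to sweep one factor (for example, weather) while holding the driven trajectory fixed.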

Classification
World Model, Video generation, Multimodal
Access & deployment
Cloud
Weights: Closed
Key parameters
Parameters: 9B
Input: video, text, action
Robotics
Environment modeling, Spatial prediction, Scene understanding

Technical specification

Parameters: 9B
License: Proprietary (research, not released)
Modalities
Input: video, text, action
Output: video

Capabilities and applications

Native model capabilities
Video generation
The model's ability to generate video clips from a text prompt, image or another video, with control over length, resolution and visual characteristics.
Category: video
Synthetic data generation
Generating synthetic datasets that preserve the statistical properties of the original — used for model training, testing, and privacy protection.
Category: structured_generation
Image-to-video
The model's ability to animate a static input image — extending it in time into a consistent video clip according to a description of motion or action.
Category: video