GAIA-1

1 · Family: GAIA

Wayve's generative world model for autonomous driving. From video, text and action inputs it generates realistic driving video sequences.

🔬 Research · 🔬 Research only · World Model · Video generation · Multimodal · 📁 GAIA

Parameters

~9B parameters

Release date

20 June 2023

Producer: 🏢 Wayve

Deployment: ☁ Cloud

Overview

GAIA-1 is a generative world model developed by the UK company Wayve for autonomous driving. It takes video, text and action (vehicle control) inputs and generates realistic driver-perspective video sequences that are physically and geometrically consistent with the driving scenario.

Architecture

The model combines an autoregressive transformer (~6.5B parameters), which operates on discrete video, text and action tokens, with a diffusion video decoder (~2.6B parameters) that renders continuous frames from those tokens, for roughly 9B parameters in total. It was trained on ~4,700 hours of proprietary driving data collected by Wayve in the United Kingdom.
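The two-stage data flow described above can be illustrated with a toy sketch. This is not Wayve's implementation (the weights and code are not released): the per-timestep ordering of text, image and action tokens is an assumption, and `toy_predict_next` and `toy_decode` are hypothetical placeholders standing in for the transformer and the diffusion decoder.

```python
from dataclasses import dataclass
from typing import Callable, List

# Approximate component sizes quoted above, in billions of parameters.
WORLD_MODEL_PARAMS_B = 6.5
VIDEO_DECODER_PARAMS_B = 2.6


@dataclass
class Step:
    """One timestep: conditioning tokens plus the discrete image tokens of that frame."""
    text_tokens: List[int]
    image_tokens: List[int]
    action_tokens: List[int]


def interleave(steps: List[Step]) -> List[int]:
    """Flatten timesteps into one sequence (assumed order: text, image, action)."""
    seq: List[int] = []
    for s in steps:
        seq += s.text_tokens + s.image_tokens + s.action_tokens
    return seq


def rollout(context: List[int],
            predict_next: Callable[[List[int]], int],
            n_new: int) -> List[int]:
    """Autoregressive generation: append one predicted token at a time."""
    seq = list(context)
    for _ in range(n_new):
        seq.append(predict_next(seq))
    return seq[len(context):]


def toy_predict_next(seq: List[int]) -> int:
    """Hypothetical stand-in for the transformer: a deterministic function of the prefix."""
    return (sum(seq) + len(seq)) % 16


def toy_decode(image_tokens: List[int]) -> List[float]:
    """Hypothetical stand-in for the diffusion decoder: token ids to values in [0, 1)."""
    return [t / 16 for t in image_tokens]


steps = [Step([1], [2, 3], [4]), Step([1], [5, 6], [7])]
context = interleave(steps)                      # discrete token sequence
new_tokens = rollout(context, toy_predict_next, n_new=2)
frame = toy_decode(new_tokens)                   # "rendered" continuous values
total_params_b = WORLD_MODEL_PARAMS_B + VIDEO_DECODER_PARAMS_B  # about 9.1
```

The point of the split is that prediction happens in a compact discrete token space, while the decoder is only responsible for turning predicted tokens back into continuous frames.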

Use cases

GAIA-1 does not control the vehicle. It is used to generate synthetic data and scenarios, including rare corner cases, for training and evaluating autonomous driving stacks. Weather, lighting, the behaviour of other road users and ego-vehicle commands can be controlled via text prompts and action vectors.
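As a rough illustration of how text prompts and action vectors might be combined to enumerate scenario variants, here is a minimal sketch. GAIA-1 has no public API, so `ScenarioSpec` and `build_scenarios` are hypothetical names, and representing each action as a (speed, curvature) pair is an assumption made for this example.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Sequence, Tuple

Action = Tuple[float, float]  # assumed layout: (speed in m/s, curvature in 1/m)


@dataclass(frozen=True)
class ScenarioSpec:
    """Hypothetical conditioning bundle: one text prompt plus per-frame ego actions."""
    prompt: str
    actions: Tuple[Action, ...]


def build_scenarios(weathers: Sequence[str],
                    lightings: Sequence[str],
                    actions: Sequence[Action]) -> List[ScenarioSpec]:
    """Cross every weather with every lighting condition, reusing one action trace."""
    return [
        ScenarioSpec(prompt=f"{weather}, {lighting}", actions=tuple(actions))
        for weather, lighting in product(weathers, lightings)
    ]


specs = build_scenarios(
    weathers=["heavy rain", "clear"],
    lightings=["night", "dusk"],
    actions=[(8.0, 0.0), (7.5, 0.02)],
)
for spec in specs:
    print(spec.prompt, len(spec.actions))
```

Holding the action trace fixed while varying the prompt is one way to produce the same manoeuvre under different conditions; varying the actions under a fixed prompt would do the reverse.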

Classification

World Model · Video generation · Multimodal

Family: GAIA

Applications

Simulation / synthetic data generation · Robot policy training

Access & deployment

Cloud

Weights: Closed

Key parameters

🧩 Parameters: 9B

📥 Input: video, text, action

Robotics

Environment modeling · Spatial prediction · Scene understanding

Technical specification

Parameters

~9B parameters

License

Proprietary (research, not released)

Modalities

⬇ Input

video, text, action

⬆ Output

video

Capabilities and applications

Native model capabilities

Video generation

The model's ability to generate video clips from a text prompt, image or another video, with control over length, resolution and visual characteristics.

Category: video

Synthetic data generation

Generating synthetic datasets that preserve the statistical properties of the original — used for model training, testing, and privacy protection.

Category: structured_generation

Image-to-video

The model's ability to animate a static input image — extending it in time into a consistent video clip according to a description of motion or action.

Category: video


Application domains

Simulation / synthetic data generation · Robot policy training

Technical architecture

Core Architecture

Transformer · AR Generation · Diffusion Model · Action-Conditioned Video Generation

Model Form

World Models

Sources and related pages

4 sources

Blog: Scaling GAIA-1: 9B parameters and beyond (wayve.ai)
Blog: Introducing GAIA-1: A cutting-edge generative AI model for autonomy (wayve.ai)
Paper: GAIA-1: A Generative World Model for Autonomous Driving, arXiv:2309.17080 (arxiv.org)
Web: Wayve Labs: GAIA (wayve.ai)
