Cosmos 3

3 · Family: Cosmos
NVIDIA open world foundation model (omni-model) for physical AI. Combines vision reasoning, multimodal generation and robot action prediction.
Status: Active · Public access · Open weights
Categories: World Model · Robotics foundation model · Multimodal
Parameters: 65B (Super) / 16B (Nano)
Release date: 31 May 2026
Access: API · Download · Hosted
Deployment: Cloud · Local · On-device

Overview

Cosmos 3 is an open world foundation model released by NVIDIA at GTC Taipei during COMPUTEX 2026 (May 31, 2026). It is the first omni-model in the Cosmos family with native reasoning, world generation and action generation in a single Mixture-of-Transformers architecture (separate transformer blocks for reasoning and generation).

The model accepts and produces text, image, video, ambient sound and action data (numerical values such as joint angles, gripper positions and trajectory points). Native action generation enables Cosmos 3 to serve as a World Action Model (WAM) backbone for post-training robot policies.
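As a rough illustration of what numerical action data of this kind might look like, the sketch below packs joint angles, a gripper position and trajectory points into one record and flattens it into a single vector. The class name `ActionStep`, its field names, the units and the flattening order are all assumptions made for illustration; this page does not specify the action format Cosmos 3 actually uses.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ActionStep:
    """One hypothetical action sample; field names and units are illustrative."""
    joint_angles: List[float]                     # radians, one per joint (assumed unit)
    gripper_position: float                       # 0.0 = closed, 1.0 = open (assumed convention)
    trajectory: List[Tuple[float, float, float]]  # (x, y, z) waypoints in meters (assumed)

    def to_vector(self) -> List[float]:
        """Flatten to one numeric vector: joints, then gripper, then waypoints."""
        flat = list(self.joint_angles)
        flat.append(self.gripper_position)
        for point in self.trajectory:
            flat.extend(point)
        return flat


step = ActionStep(
    joint_angles=[0.0, 0.5, -0.25],
    gripper_position=1.0,
    trajectory=[(0.1, 0.2, 0.3), (0.4, 0.5, 0.6)],
)
vector = step.to_vector()
print(len(vector))
```

A flat numeric vector like this is one common way to feed actions to a sequence model alongside other modalities; whether Cosmos 3 does so is not stated here.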

The family includes Cosmos 3 Super (65B parameters, highest physics accuracy, intended for robotics and AV post-training), Cosmos 3 Nano (16B parameters, lightweight, optimized for fast inference and policy post-training) and the announced Cosmos 3 Edge (on-device, forthcoming). Additional variants are listed on Hugging Face: Cosmos3-Super-Image2Video, Cosmos3-Super-Text2Image and Cosmos3-Nano-Policy-DROID.
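For a sense of scale, the parameter counts above translate into a rough weights-only memory footprint. The sketch assumes 2 bytes per parameter (16-bit weights) and 1 byte per parameter (8-bit quantization). The precisions of the released checkpoints are not stated on this page, and activations, caches and runtime overhead are ignored.

```python
# Parameter counts come from the page; bytes per parameter are assumptions.
PARAMS = {"Cosmos 3 Super": 65e9, "Cosmos 3 Nano": 16e9}


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Weights-only footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


for name, count in PARAMS.items():
    at_16_bit = weight_memory_gb(count, 2)  # assumed 16-bit weights
    at_8_bit = weight_memory_gb(count, 1)   # assumed 8-bit quantization
    print(f"{name}: ~{at_16_bit:.0f} GB at 16-bit, ~{at_8_bit:.0f} GB at 8-bit")
```

Under these assumptions, Super needs about 130 GB for weights alone at 16-bit and Nano about 32 GB, which is one way to read the split between a post-training model and a fast-inference model.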

Weights and code are released under the OpenMDW 1.1 license (Linux Foundation), which permits training, modification, redistribution and deployment. A Cosmos 3 Nano post-trained policy ranked first on the RoboLab and RoboArena leaderboards, and Cosmos 3 variants lead the open-weights leaderboards on Artificial Analysis, Physics-IQ, R-Bench, PAI-Bench, VANTAGE-Bench and the TAR challenge.

Applications include robot policy training, synthetic data generation, environment simulation for autonomous vehicles and video analytics agents for industrial use.

Classification
World Model · Robotics foundation model · Multimodal
Family: Cosmos
Access & deployment
Access: API · Download · Hosted
Deployment: Cloud · Local · On-device
Weights: Open weights
Key parameters
Parameters: 65B (Super) / 16B (Nano)
Fine-tuning: supported
Input: text, image, video, audio
Robotics
Robot control · Robot manipulation · Bimanual manipulation · Embodied task planning · Scene understanding · Spatial reasoning · Spatial prediction · Environment modeling · Visual grounding

Technical specification

Parameters: 65B (Super) / 16B (Nano)
License: OpenMDW 1.1 (Linux Foundation)
Features: Fine-tuning
Modalities
Input: text, image, video, audio, robot_sensors, robot_state_data
Output: text, image, video, audio, robot_actions, robot_commands, motion_trajectories
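
The modality lists above can be read as a simple capability table. The sketch below checks a requested input/output combination against them. The function `unsupported_modalities` and the shape of the request are hypothetical and do not correspond to any NVIDIA API; only the modality names are taken from this page.

```python
from typing import Iterable, List

# Modality names copied from the specification above.
INPUTS = {"text", "image", "video", "audio", "robot_sensors", "robot_state_data"}
OUTPUTS = {"text", "image", "video", "audio", "robot_actions",
           "robot_commands", "motion_trajectories"}


def unsupported_modalities(inputs: Iterable[str], outputs: Iterable[str]) -> List[str]:
    """Return the requested modalities that the listed spec does not cover."""
    bad_in = [f"input:{m}" for m in inputs if m not in INPUTS]
    bad_out = [f"output:{m}" for m in outputs if m not in OUTPUTS]
    return bad_in + bad_out


# Video plus robot state in, robot actions out: all listed, so nothing is flagged.
print(unsupported_modalities(["video", "robot_state_data"], ["robot_actions"]))
# "lidar" is not a listed input, and "robot_sensors" is an input, not an output.
print(unsupported_modalities(["lidar"], ["text", "robot_sensors"]))
```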

Capabilities and applications

Native model capabilities
Synthetic data generation
Generating synthetic datasets that preserve the statistical properties of the original data, for use in model training, testing and privacy protection.
Category: structured_generation
Reasoning
Category: reasoning
Video Understanding
Category: video
Multimodal understanding
Category: multimodal
Planning
Category: planning
Robotics
Robot control · Robot manipulation · Bimanual manipulation · Embodied task planning · Scene understanding · Spatial reasoning · Spatial prediction · Environment modeling · Visual grounding

Technical architecture

Core Architecture