NVIDIA's open World Foundation Model line for generating future world states from text, image or video. The main generative model in the Cosmos platform for training robots and autonomous vehicles.
Parameters
4B – 14B parameters (Cosmos Predict 1, multiple variants)
Release date
6 January 2025
Access: Download, API, Hosted
Deployment: 💻 Local, ☁ Cloud
Overview
Access & deployment
Access: Download, API, Hosted
Deployment: Local, Cloud
Weights: Open weights
Key parameters
🧩 Parameters: 4B – 14B (Cosmos Predict 1, multiple variants)
✓ Fine-tuning
📥 Input: text, image, video, robot state data
Robotics
Environment modeling, Spatial prediction, Scene understanding, Spatial reasoning
Technical specification
Parameters
4B – 14B parameters (Cosmos Predict 1, multiple variants)
License
NVIDIA Open Model License (Cosmos Predict 1 / 2 / 2.5)
Hardware requirements
Training and inference on NVIDIA GPU clusters (recommended: H100 / B100 / GB200). Inference for the smaller variants (4B–7B) is feasible on a single server-grade GPU; the 12B–14B variants and multiview scenarios require multiple GPUs. Reference implementation in PyTorch.
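To put the hardware guidance above in rough numbers, the sketch below estimates the GPU memory needed just to store the weights for a given parameter count. It assumes 2 bytes per parameter (bf16); the function name and the dtype table are illustrative and not part of any Cosmos tooling. Activations, caches and framework overhead are not counted and can be substantial for video generation, so treat the result as a lower bound.

```python
# Rough lower bound on GPU memory needed just to hold model weights.
# Assumption: weights stored at a fixed width per parameter (bf16 = 2 bytes).
# Activations, caches and framework overhead are NOT included.

BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}


def weight_memory_gb(params_billion: float, dtype: str = "bf16") -> float:
    """Return weight storage in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * BYTES_PER_PARAM[dtype]
    return total_bytes / 1e9


if __name__ == "__main__":
    # Sizes taken from the parameter range quoted in this card.
    for size in (4, 7, 12, 14):
        print(f"{size}B params -> {weight_memory_gb(size):.0f} GB in bf16")
```

Under these assumptions a 4B variant needs about 8 GB for weights and a 14B variant about 28 GB, before any working memory for the generated video is added.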
Features: ✓ Fine-tuning
Modalities
⬇ Input
text, image, video, robot_state_data
⬆ Output
video
Capabilities and applications
Native model capabilities
Video generation
The model's ability to generate video clips from a text prompt, image or another video, with control over length, resolution and visual characteristics.
Category: video
Image-to-video
The model's ability to animate a static input image — extending it in time into a consistent video clip according to a description of motion or action.
Category: video
Video understanding
The model's ability to analyse and interpret video content — recognising actions, motion, events and relationships between objects over time.
Category: video
Planning
The model's ability to determine a sequence of actions leading to a goal — predicting the consequences of actions and selecting an optimal path in a given environment.
Category: planning
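The planning idea described above (predict the consequences of candidate actions, then pick the best sequence) can be illustrated with a toy example. This is a minimal sketch, not the Cosmos API: `predict_next_state` is a hypothetical stand-in for a learned world model, the grid world and cost function are invented for illustration, and exhaustive search replaces whatever search strategy a real system would use.

```python
import itertools

# Hypothetical stand-in for a learned world model: a point moving on a 2D grid.
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}


def predict_next_state(state, action):
    """Toy dynamics; a real world model would predict future frames instead."""
    dx, dy = MOVES[action]
    return (state[0] + dx, state[1] + dy)


def cost(state, goal):
    """Manhattan distance between a predicted state and the goal."""
    return abs(state[0] - goal[0]) + abs(state[1] - goal[1])


def plan(start, goal, horizon=3):
    """Roll out every action sequence of length `horizon` through the model
    and return the sequence whose predicted final state is closest to the goal."""
    best_seq, best_cost = None, float("inf")
    for seq in itertools.product(MOVES, repeat=horizon):
        state = start
        for action in seq:
            state = predict_next_state(state, action)
        c = cost(state, goal)
        if c < best_cost:
            best_seq, best_cost = seq, c
    return best_seq, best_cost
```

Calling `plan((0, 0), (2, 1), horizon=3)` returns a sequence containing two `"right"` moves and one `"up"` move with a final cost of 0. Exhaustive enumeration grows exponentially with the horizon, so practical planners sample or optimise over candidate sequences instead.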
