Robots Atlas
2 June 2026 · 5 min read · Cosmos 3 · World Action Model · Physical AI

NVIDIA Cosmos 3 Launches as World Action Model to Replace VLA Paradigm

At GTC Taipei during COMPUTEX, NVIDIA presented Cosmos 3 -- an open-weights physical AI foundation model that integrates visual reasoning, physical simulation, and robotic action prediction in a single architecture. The launch realizes the WAM (World Action Model) blueprint that Jim Fan, NVIDIA's Lead of Embodied Autonomous Research, outlined as the strategic direction for robotics.

Key takeaways

  • Cosmos 3 released as open-weights under the OpenMDW 1.1 license (Linux Foundation) -- available on Hugging Face, GitHub and NVIDIA NIM microservices
  • Mixture-of-transformers architecture: reasoning block (scenes, interactions, spatial-temporal relationships) + generation block (video, text, audio, actions)
  • Native action generation: model directly outputs joint angles, gripper positions and spatial trajectories -- no natural language intermediary
  • Ranked first across four open leaderboards: Artificial Analysis, Physics-IQ, PAI-Bench and R-Bench
  • Cosmos Coalition with Agile Robots, Skild AI and Generalist AI -- standardizing open world models under DGX Cloud infrastructure

From VLA to WAM: A Paradigm Shift

The dominant architecture in robot machine learning has been the VLA (Vision-Language-Action) model -- a system that processes images through a language head and generates actions from its output. The limitation is structural: physics and robot movements are poorly described by the grammar of natural language.

NVIDIA Cosmos 3 shifts the focus to the WAM paradigm -- a video-first model where physics and actions are first-class citizens. Jensen Huang, CEO of NVIDIA, declared during his keynote:

"The big bang of physical AI is just around the corner thanks to breakthroughs in multimodal reasoning language, vision and world models."

The mixture-of-transformers architecture is designed so that the reasoning block interprets moving scenes and object interactions, while the generation block produces physically grounded outputs.
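The division of labor described above can be pictured as a two-stage data flow: a reasoning stage turns frames into a scene description, and a generation stage produces an output conditioned on it. The following is a toy sketch of that flow only, not of Cosmos 3's internals or of any NVIDIA API. Every class and function name here (`SceneState`, `ReasoningBlock`, `GenerationBlock`, `run_pipeline`) is hypothetical, and the "reasoning" is plain list handling rather than a transformer.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SceneState:
    """Toy stand-in for what a reasoning stage hands to a generation stage."""
    objects: List[str]
    relations: List[str]


class ReasoningBlock:
    """Hypothetical: interprets frames into objects and pairwise relations."""

    def interpret(self, frames: List[Dict[str, List[str]]]) -> SceneState:
        seen: List[str] = []
        for frame in frames:
            for name in frame["objects"]:
                if name not in seen:  # keep first-seen order, drop repeats
                    seen.append(name)
        # Toy "interaction": relate each object to the next one seen.
        relations = [f"{a} near {b}" for a, b in zip(seen, seen[1:])]
        return SceneState(objects=seen, relations=relations)


class GenerationBlock:
    """Hypothetical: emits one of the four output types the article lists."""

    MODALITIES = ("video", "text", "audio", "actions")

    def generate(self, state: SceneState, modality: str) -> str:
        if modality not in self.MODALITIES:
            raise ValueError(f"unsupported modality: {modality!r}")
        return (f"{modality} conditioned on {len(state.objects)} objects, "
                f"{len(state.relations)} relations")


def run_pipeline(frames: List[Dict[str, List[str]]], modality: str) -> str:
    """Reasoning first, then generation conditioned on the reasoning result."""
    state = ReasoningBlock().interpret(frames)
    return GenerationBlock().generate(state, modality)
```

Calling `run_pipeline([{"objects": ["cup", "table"]}], "actions")` shows the point of the split: the generation stage never sees raw frames, only the structured scene state.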

Trained on a massive multimodal dataset comprising billions of physical AI samples, Cosmos 3 topped four open leaderboards at once: Artificial Analysis, Physics-IQ, PAI-Bench and R-Bench, all of which evaluate world generation accuracy for physical environments.

Three Model Tiers

  • Cosmos 3 Super -- designed for post-training robotics and AV models requiring the highest physics accuracy and generation quality
  • Cosmos 3 Nano -- a lightweight variant optimized for high-quality video and action reasoning in fractions of a second
  • Cosmos 3 Edge -- a forthcoming variant tailored for real-time inference directly on physical hardware at the edge

In practice, policies post-trained with Cosmos 3 Nano secured the top spots on the RoboLab and RoboArena leaderboards, which evaluate robot control effectiveness in simulated and real-world environments.

Native Action Generation and Hardware Integration

The key innovation in Cosmos 3 is that it operates as a single all-in-one model (an "omnimodel") that translates what it sees directly into the robot's movements, a capability called native action generation, without relying on separate modules. Instead of using vision-language training as a conceptual bridge, the system directly outputs numerical action data: joint angles, gripper positions and spatial trajectory points. For complex tasks that require both hands at once (bimanual manipulation), this gives the robot immediate, reactive guidance.
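To make "numerical action data" concrete, here is a minimal sketch of what one step of such an output could look like as a typed record, with basic sanity checks a downstream controller might apply. It is an illustration only: the article does not specify Cosmos 3's actual output schema, so the `ActionStep` type, the units (radians, metres), the 0-to-1 gripper convention and the joint limit are all assumptions.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ActionStep:
    """Hypothetical container for one step of numeric action output."""
    joint_angles: Tuple[float, ...]        # assumed unit: radians, one per joint
    gripper: float                         # assumed: 0.0 = closed, 1.0 = open
    waypoint: Tuple[float, float, float]   # assumed: x, y, z in metres


def validate(step: ActionStep, num_joints: int,
             joint_limit: float = math.pi) -> ActionStep:
    """Reject steps that are malformed or outside the assumed limits."""
    if len(step.joint_angles) != num_joints:
        raise ValueError("wrong number of joint angles")
    if any(not math.isfinite(a) or abs(a) > joint_limit
           for a in step.joint_angles):
        raise ValueError("joint angle out of range")
    if not 0.0 <= step.gripper <= 1.0:
        raise ValueError("gripper position out of range")
    if not all(math.isfinite(c) for c in step.waypoint):
        raise ValueError("non-finite waypoint")
    return step


def path_length(steps: List[ActionStep]) -> float:
    """Total straight-line distance between consecutive waypoints."""
    return sum(math.dist(a.waypoint, b.waypoint)
               for a, b in zip(steps, steps[1:]))
```

Because the output is already numeric, checks like these can run directly on it; a pipeline that emitted natural-language instructions would first need a parsing step before any such validation.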

Cosmos 3 is tightly coupled with the NVIDIA compute hardware it is built to run on. Also at GTC Taipei, the company unveiled the Isaac GR00T Reference Humanoid Robot -- an open reference architecture built on the next-generation Jetson AGX Thor T5000 compute. Cosmos 3 serves as the predictive baseline driving these setups, compressing research validation cycles from months to days.

Cosmos Coalition and Open-Weights Strategy

NVIDIA announced the Cosmos Coalition -- a global collaboration uniting world model builders, AI developers, and robotics pioneers to standardize open-source physical AI through shared models, evaluation metrics, and large-scale training workflows over NVIDIA DGX Cloud infrastructure. The OpenMDW 1.1 license allows developers to train, modify, redistribute and deploy weights, documentation and source code across enterprise pipelines.

Founding coalition members: Agile Robots (Munich) uses Cosmos 3 to generate action-conditioned trajectories for its Agile ONE humanoid platform; Skild AI, backed by a $1.4 billion Series C, integrates the model with its fleet orchestration software; Generalist AI -- known for training large models from scratch -- gains access to Cosmos 3's synthetic data engine to supplement its proprietary dataset.

Why This Matters

Cosmos 3 is NVIDIA's bid to occupy the base infrastructure position for physical AI, analogous to the role that CUDA (NVIDIA's proprietary platform that harnesses GPUs for AI computation) plays in machine learning. The open-weights release under OpenMDW 1.1 is a calculated move against closed models that aim to serve as a universal intelligence layer. By preventing any single player from locking in the ecosystem, NVIDIA positions its hardware stack (GPUs, Jetson, DGX Cloud) as the indispensable compute layer for everyone building on Cosmos 3.

For robotics labs, the shift is significant: policies post-trained on Cosmos 3 Nano win on open leaderboards, suggesting that anchoring a pipeline on Cosmos 3 as a baseline is becoming a genuine competitive alternative to training a model from scratch. Compressing validation cycles from months to days translates directly into faster product iteration.

What's Next

  • Cosmos 3 Edge -- the on-device inference variant -- is scheduled for a future release; its availability will be critical for industrial robot deployments without cloud connectivity.
  • Skild AI plans to integrate Cosmos 3 with its fleet orchestration software following the Fetch Robotics acquisition -- the outcome will test the omnimodel's scalability in warehouse environments.
  • PAI-Bench and other open leaderboards will be the key adoption indicators -- a growing number of models post-trained on Cosmos 3 would confirm the digital flywheel thesis.
