NVIDIA open foundation model for humanoid robots. Dual-system architecture (VLM + Diffusion Transformer) generating manipulation actions.
Parameters: 2B
Release date: 18 March 2025
Access: Download
Deployment: 💻 Local, 📱 On-device
Overview
Access & deployment
Access: Download
Deployment: Local, On-device
Weights: Open weights
Key parameters
🧩 Parameters: 2B
✓ Fine-tuning
📥 Input: text, image, robot sensors, robot state data
Robotics
Bimanual manipulation, Dexterous manipulation, Robot manipulation, Embodied task planning, Robot control, Scene understanding, Visual grounding
Technical specification
Parameters
2B
License
NVIDIA Open Model License
Hardware requirements
Post-training: NVIDIA RTX A6000 or RTX 4090 (minimum); recommended NVIDIA DGX Spark / DGX H100. Inference: NVIDIA RTX A6000 or NVIDIA Jetson AGX Orin.
Features: ✓ Fine-tuning
Modalities
⬇ Input
text, image, robot_sensors, robot_state_data
⬆ Output
robot_actions, motion_trajectories, manipulator_control
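The modality lists above can be treated as a simple input/output contract. The following is a minimal sketch, assuming a request is passed around as a plain dictionary keyed by the modality names listed on this page. The helper `missing_inputs` and the dictionary layout are hypothetical illustrations, not part of any NVIDIA API, and the model itself may not require every modality on every call.

```python
# Modality names copied from the listing above.
INPUT_MODALITIES = ("text", "image", "robot_sensors", "robot_state_data")
OUTPUT_MODALITIES = ("robot_actions", "motion_trajectories", "manipulator_control")


def missing_inputs(payload: dict) -> list:
    """Return the listed input modalities absent from a payload.

    Hypothetical helper: the dict-based payload is an assumption made
    for illustration, not the model's real interface.
    """
    return [name for name in INPUT_MODALITIES if name not in payload]


if __name__ == "__main__":
    # Example payload that only carries a language instruction and a camera frame.
    request = {"text": "pick up the cup", "image": [[0, 0, 0]]}
    print("missing:", missing_inputs(request))
```

A check of this kind is only a convenience for catching incomplete requests early in a pipeline that wraps the model.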
Capabilities and applications
Native model capabilities
Multimodal understanding · Category: multimodal
Image understanding · Category: vision
Reasoning · Category: reasoning
Planning · Category: planning
Multi-step reasoning · Category: reasoning
Benchmark results
4 benchmarks
RoboCasa
success rate · 100 demonstrations per task
32.1%
📄 GR00T N1 paper (arXiv:2503.14734)
DexMG
success rate · 100 demonstrations per task
66.5%
📄 GR00T N1 paper (arXiv:2503.14734)
GR-1 simulation suite
success rate · 100 demonstrations per task
50.0%
📄 GR00T N1 paper (arXiv:2503.14734)
Real-world tabletop (full data, GR-1 humanoid)
average policy success rate · Full data; pick-and-place, articulated, industrial, coordination tasks
76.8%
📄 NVIDIA Developer Blog (Mar 2025)
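For quick comparison, the four results listed above can be tabulated in a few lines. This is a minimal sketch: the dictionary names and helper functions are illustrative, and the mean computed here is a plain unweighted average over the three simulation suites, which is not necessarily the aggregate reported in the cited paper (suites can differ in task count). The real-world figure is kept separate because it comes from a different source and setup.

```python
# Success rates (percent) as listed in the benchmark section above.
SIM_BENCHMARKS = {
    "RoboCasa": 32.1,
    "DexMG": 66.5,
    "GR-1 simulation suite": 50.0,
}
REAL_WORLD = {"Real-world tabletop (full data, GR-1 humanoid)": 76.8}


def unweighted_mean(scores: dict) -> float:
    """Plain arithmetic mean; ignores how many tasks each suite contains."""
    values = list(scores.values())
    return sum(values) / len(values)


def best_benchmark(scores: dict) -> str:
    """Name of the benchmark with the highest success rate."""
    return max(scores, key=scores.get)


if __name__ == "__main__":
    for name, rate in {**SIM_BENCHMARKS, **REAL_WORLD}.items():
        print(f"{name:<50} {rate:5.1f}%")
    print(f"Unweighted simulation mean: {unweighted_mean(SIM_BENCHMARKS):.1f}%")
    print(f"Strongest simulation suite: {best_benchmark(SIM_BENCHMARKS)}")
```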
