GR00T N1

N1 (2B) · Family: GR00T
NVIDIA open foundation model for humanoid robots. Dual-system architecture (VLM + Diffusion Transformer) generating manipulation actions.
Active · Public access · Open weights · Robotics foundation model · Vision-Language-Action model · GR00T
Parameters: 2B
Release date: 18 March 2025
Access: Download
Deployment: Local, On-device

Overview

NVIDIA Isaac GR00T N1 is an open foundation model for generalized humanoid robot reasoning and skills. The cross-embodiment model accepts multimodal input (vision and natural language) and outputs continuous control actions for manipulation tasks across diverse environments.

Architecture

GR00T N1 employs a dual-system architecture inspired by human cognition: System 2 is a Vision-Language Model (based on NVIDIA-Eagle with SmolLM-1.7B) responsible for reasoning and planning; System 1 is a Diffusion Transformer generating continuous robot motion trajectories. Both systems are tightly coupled and jointly optimized during post-training.
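
The division of labour between the two systems can be illustrated with a toy sketch. This is not NVIDIA's implementation: `system2_encode`, `system1_denoise`, the constants, and the update rule are all hypothetical stand-ins chosen so the example runs with the standard library alone. It only shows the control flow, in which a slow module produces a context from image and text, and a fast module iteratively refines noise into a chunk of continuous actions conditioned on that context.

```python
import random
from typing import List

ACTION_DIM = 4      # hypothetical action dimensionality
HORIZON = 8         # hypothetical number of future steps per action chunk
DENOISE_STEPS = 4   # hypothetical number of refinement iterations


def system2_encode(image: List[float], instruction: str) -> List[float]:
    """Stand-in for the VLM: fold image and text into a tiny context vector."""
    text_feature = (sum(ord(c) for c in instruction) % 100) / 100.0
    image_feature = sum(image) / max(len(image), 1)
    return [text_feature, image_feature]


def system1_denoise(context: List[float], state: List[float],
                    rng: random.Random) -> List[List[float]]:
    """Stand-in for the Diffusion Transformer: refine noise into an action chunk."""
    # Toy "clean" target that depends on the context and the robot state.
    target = [context[0] + context[1] + s for s in state]
    # Start from pure Gaussian noise, one row per future time step.
    chunk = [[rng.gauss(0.0, 1.0) for _ in range(ACTION_DIM)]
             for _ in range(HORIZON)]
    for step in range(DENOISE_STEPS):
        # Move each noisy action part of the way to the target;
        # the final step (alpha == 1.0) lands on it.
        alpha = 1.0 / (DENOISE_STEPS - step)
        chunk = [[a + alpha * (t - a) for a, t in zip(row, target)]
                 for row in chunk]
    return chunk


rng = random.Random(0)
ctx = system2_encode([0.2, 0.4, 0.6], "pick up the cup")
actions = system1_denoise(ctx, [0.0] * ACTION_DIM, rng)
print(len(actions), len(actions[0]))  # prints: 8 4
```

In a real deployment the context would be a sequence of VLM token embeddings and the refinement step would be a learned network, so the sketch should be read as a picture of the data flow only.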

Training

The model was trained on a pyramidal data mix: internet-scale web data and human videos at the base, synthetic data generated with NVIDIA Omniverse in the middle, and real teleoperated robot data at the peak. Combining 780K synthetic trajectories (generated in 11 hours via the Isaac GR00T Blueprint) with real data yielded a 40% performance gain over real-data-only training.
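
One common way to train on a tiered mix like this is to sample each training example's source according to fixed weights. The sketch below assumes that approach; the source does not state the actual sampling ratios, so the `MIX_WEIGHTS` values and the tier names are hypothetical.

```python
import random
from collections import Counter

# Hypothetical sampling weights for the three tiers of the data pyramid.
# The real mixing ratios are not given in the source.
MIX_WEIGHTS = {
    "web_and_human_video": 0.5,   # base of the pyramid
    "synthetic": 0.3,             # middle
    "real_teleoperation": 0.2,    # peak
}


def sample_sources(batch_size: int, rng: random.Random) -> list:
    """Pick a data tier for every example in a batch, weighted by MIX_WEIGHTS."""
    names = list(MIX_WEIGHTS)
    weights = [MIX_WEIGHTS[name] for name in names]
    return rng.choices(names, weights=weights, k=batch_size)


batch = sample_sources(1000, random.Random(0))
counts = Counter(batch)
for name in MIX_WEIGHTS:
    print(name, counts[name])
```

Weighted sampling keeps the scarce real-robot data present in every batch without letting the much larger lower tiers crowd it out.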

Availability

The GR00T-N1-2B weights are publicly available on Hugging Face. PyTorch fine-tuning and inference scripts are on GitHub (NVIDIA/Isaac-GR00T). Minimum post-training configuration: a single NVIDIA RTX A6000 or RTX 4090. Inference is supported on NVIDIA Jetson AGX Orin.
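
A minimal sketch of fetching the weights with the `huggingface_hub` package follows. The repository id `nvidia/GR00T-N1-2B` and the local directory are assumptions and should be checked against the Hugging Face listing; loading and running the model is handled by the scripts in the GitHub repository and is not shown here.

```python
from pathlib import Path

# Assumed repository id; verify it on Hugging Face before use.
REPO_ID = "nvidia/GR00T-N1-2B"


def download_weights(local_dir: str = "checkpoints/GR00T-N1-2B") -> Path:
    """Download the checkpoint files and return the local directory."""
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    path = snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
    return Path(path)
```

Calling `download_weights()` requires network access and `pip install huggingface_hub`; nothing is downloaded until the function is invoked.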

Classification
Robotics foundation model, Vision-Language-Action model
Family: GR00T
Access & deployment
Access: Download
Deployment: Local, On-device
Weights: Open weights
Key parameters
Parameters: 2B
Fine-tuning: supported
Input: text, image, robot sensors, robot state data
Robotics
Bimanual manipulation, Dexterous manipulation, Robot manipulation, Embodied task planning, Robot control, Scene understanding, Visual grounding

Technical specification

Parameters: 2B
License: NVIDIA Open Model License
Hardware requirements
Post-training: NVIDIA RTX A6000 or RTX 4090 (minimum); NVIDIA DGX Spark or DGX H100 (recommended).
Inference: NVIDIA RTX A6000 or NVIDIA Jetson AGX Orin.
Features: Fine-tuning
Modalities
Input: text, image, robot_sensors, robot_state_data
Output: robot_actions, motion_trajectories, manipulator_control
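
The listed modalities can be pictured as a pair of typed containers. The sketch below is illustrative only: the field names are borrowed from the modality labels above, but the types, shapes, and the `Observation` and `ActionOutput` classes are hypothetical and do not reflect the model's actual interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Observation:
    """Hypothetical input bundle mirroring the listed input modalities."""
    text: str                                   # natural-language instruction
    image: List[List[float]]                    # placeholder for camera pixels
    robot_sensors: Dict[str, float] = field(default_factory=dict)
    robot_state_data: List[float] = field(default_factory=list)


@dataclass
class ActionOutput:
    """Hypothetical output bundle: one action vector per future time step."""
    robot_actions: List[List[float]]

    def horizon(self) -> int:
        """Number of future time steps covered by this action chunk."""
        return len(self.robot_actions)

    def is_rectangular(self) -> bool:
        """True when every action vector has the same length."""
        return len({len(step) for step in self.robot_actions}) <= 1


obs = Observation(
    text="place the apple in the bowl",
    image=[[0.0, 0.1], [0.2, 0.3]],
    robot_sensors={"gripper_force": 0.0},
    robot_state_data=[0.0, 0.0, 0.0],
)
out = ActionOutput(robot_actions=[[0.1, 0.2, 0.3], [0.2, 0.3, 0.4]])
print(out.horizon(), out.is_rectangular())  # prints: 2 True
```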

Capabilities and applications

Native model capabilities
Multimodal understanding (category: multimodal)
Image understanding (category: vision)
Reasoning (category: reasoning)
Planning (category: planning)
Multi-step reasoning (category: reasoning)

Benchmark results

4 benchmarks
RoboCasa: 32.1% success rate (100 demonstrations per task). Source: GR00T N1 paper (arXiv:2503.14734)
DexMG: 66.5% success rate (100 demonstrations per task). Source: GR00T N1 paper (arXiv:2503.14734)
GR-1 simulation suite: 50.0% success rate (100 demonstrations per task). Source: GR00T N1 paper (arXiv:2503.14734)
Real-world tabletop (full data, GR-1 humanoid): 76.8% average policy success rate across pick-and-place, articulated, industrial, and coordination tasks. Source: NVIDIA Developer Blog (Mar 2025)

Technical architecture

Deployment and security