An open NVIDIA foundation model for humanoid robots and the successor to GR00T N1. It uses a flow matching transformer architecture with pre-trained SigLip2 (vision) and T5 (language) encoders.
Parameters: 3B
Access: Download
Deployment: 💻 Local, 📱 On-device
Overview
Access & deployment
Access: Download
Deployment: Local, On-device
Weights: Open weights
Key parameters
🧩 Parameters: 3B
✓ Fine-tuning
📥 Input: text, image, robot sensors, robot state data
Robotics
Bimanual manipulation, Dexterous manipulation, Robot manipulation, Embodied task planning, Robot control, Scene understanding, Visual grounding
Technical specification
Parameters: 3B
License
NVIDIA One-Way Noncommercial License
Hardware requirements
Supported NVIDIA hardware: Ampere, Blackwell, Hopper, and Lovelace microarchitectures, plus Jetson platforms. Runtime: PyTorch. OS: Linux.
Features: ✓ Fine-tuning
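The hardware list above can be turned into a quick pre-flight check. The following is a minimal sketch, not an official NVIDIA utility: the helper names `arch_family` and `is_supported` are hypothetical, and the mapping from CUDA compute capability to architecture family is an assumption based on publicly documented capability numbers.

```python
# Hypothetical helper: map a CUDA compute capability (major, minor) to the
# architecture families named in the hardware requirements.
SUPPORTED_FAMILIES = {"Ampere", "Lovelace", "Hopper", "Blackwell"}


def arch_family(major: int, minor: int) -> str:
    """Return the assumed NVIDIA architecture family for a compute capability."""
    if major == 8:
        # 8.9 is Ada Lovelace; 8.0, 8.6 and 8.7 (Jetson Orin) are Ampere.
        return "Lovelace" if minor == 9 else "Ampere"
    if major == 9:
        return "Hopper"
    if major in (10, 11, 12):
        # Assumption: these major versions all belong to the Blackwell family.
        return "Blackwell"
    return "unsupported"


def is_supported(major: int, minor: int) -> bool:
    """True if the capability falls in one of the listed families."""
    return arch_family(major, minor) in SUPPORTED_FAMILIES
```

On a machine with PyTorch installed, `torch.cuda.get_device_capability()` returns a `(major, minor)` tuple that can be passed straight to `is_supported`.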
Modalities
⬇ Input
text, image, robot_sensors, robot_state_data
⬆ Output
robot_actions, motion_trajectories, manipulator_control
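To make the modalities above concrete, here is a minimal sketch of how one request and one response might be represented. The class and field names (`Observation`, `ActionChunk`, `instruction`, `images`, `state`, `actions`) are hypothetical and chosen for illustration; they are not the model's actual input or output schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Observation:
    """Hypothetical container for the listed input modalities."""
    instruction: str                  # text
    images: Dict[str, bytes]          # image: camera name -> encoded frame
    state: Dict[str, List[float]]     # robot sensors / state, e.g. joint angles


@dataclass
class ActionChunk:
    """Hypothetical container for the output: a short trajectory of actions."""
    actions: List[List[float]]        # horizon x action_dim


def validate_chunk(chunk: ActionChunk, action_dim: int) -> int:
    """Check every step has `action_dim` values and return the horizon length."""
    for step, action in enumerate(chunk.actions):
        if len(action) != action_dim:
            raise ValueError(f"step {step}: expected {action_dim} values, got {len(action)}")
    return len(chunk.actions)


obs = Observation(
    instruction="pick up the red cube",
    images={"head_camera": b"<encoded frame>"},
    state={"left_arm": [0.0] * 7, "right_arm": [0.0] * 7},
)
chunk = ActionChunk(actions=[[0.0] * 14 for _ in range(16)])
horizon = validate_chunk(chunk, action_dim=14)
```

The 7-joint arms, 14-dimensional action, and 16-step horizon are placeholder values for the sketch, not figures from the model's documentation.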
Capabilities and applications
Native model capabilities
Multimodal understanding (category: multimodal)
Image understanding (category: vision)
Reasoning (category: reasoning)
Planning (category: planning)
Multi-step reasoning (category: reasoning)
Technical architecture
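The summary describes the model as a flow matching transformer. As a rough illustration of how flow matching generates an action vector, the sketch below integrates a velocity field from Gaussian noise toward a target with a few Euler steps. Everything here is an assumption made for illustration: `oracle_velocity` is a toy stand-in for the learned network (which in the real model would be conditioned on image, text, and robot state), and `num_steps` is an arbitrary choice, not the model's documented setting.

```python
import random
from typing import List


def oracle_velocity(x: List[float], t: float, target: List[float]) -> List[float]:
    """Toy stand-in for the learned velocity network.

    On a straight noise-to-data path the velocity at time t is
    (target - x) / (1 - t); a trained model would predict this instead.
    """
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def sample_actions(target: List[float], num_steps: int = 4, seed: int = 0) -> List[float]:
    """Euler-integrate the velocity field from t=0 (noise) to t=1 (actions)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from Gaussian noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt                              # t stays strictly below 1
        v = oracle_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


actions = sample_actions([0.5, -0.2, 1.0])
```

Because the toy velocity field points exactly at the target, the final Euler step lands on it up to floating-point error; a learned field only approximates this, which is why the number of integration steps becomes a quality and latency trade-off.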
Sources and related pages
6 sources
Web: NVIDIA Isaac GR00T - official site
Repo: GR00T-N1.5-3B model card (Hugging Face)
Repo: NVIDIA/Isaac-GR00T (GitHub)
Paper: Eagle 2: Building Post-Training Data Strategies from Scratch for Frontier Vision-Language Models (arXiv:2501.14818)
Paper: π0: A Vision-Language-Action Flow Model for General Robot Control (arXiv:2410.24164)
Paper: Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow (arXiv:2209.03003)
