GR00T N1.5

N1.5 (3B) · Family: GR00T

NVIDIA's open foundation model for humanoid robots and the successor to GR00T N1. It uses a flow matching transformer architecture with pre-trained SigLip2 (vision) and T5 (language) encoders.

✓ Active · ⏳ Limited access · ⚖ Open weights · Robotics foundation model · Vision-Language-Action model · 📁 GR00T

🏢 Producer: NVIDIA

Access: Download · Deployment: 💻 Local, 📱 On-device

Overview

NVIDIA Isaac GR00T N1.5 is an open foundation model for generalized humanoid robot reasoning and skills. The cross-embodiment model takes multimodal input (vision and language) and outputs continuous control actions. It is the 3B-parameter successor to GR00T N1 (2B).

Architecture

GR00T N1.5 uses a pre-trained Vision Transformer (SigLip2) to encode robot camera frames and a pre-trained Transformer (T5) to encode text instructions. Proprioception is encoded by an MLP indexed by embodiment ID, with padding to a configured max length to handle variable-dimension proprioceptive vectors.
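The padding and embodiment-indexed encoding described above can be illustrated with a minimal, standard-library-only sketch. It is not NVIDIA's implementation: `MAX_STATE_DIM`, `pad_state`, `EmbodimentMLP`, and all layer sizes are hypothetical names and values chosen for illustration, and the weights are random rather than trained.

```python
import random

MAX_STATE_DIM = 8  # hypothetical "configured max length" for proprioception


def pad_state(state, max_dim=MAX_STATE_DIM):
    """Right-pad a proprioceptive vector with zeros to a fixed length."""
    if len(state) > max_dim:
        raise ValueError("state is longer than the configured max length")
    return list(state) + [0.0] * (max_dim - len(state))


class EmbodimentMLP:
    """Two-layer MLP that keeps one set of weights per embodiment ID."""

    def __init__(self, num_embodiments, in_dim, hidden_dim, out_dim, seed=0):
        rng = random.Random(seed)

        def matrix(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        # Index 0 of each list holds embodiment 0's weights, and so on.
        self.w1 = [matrix(hidden_dim, in_dim) for _ in range(num_embodiments)]
        self.w2 = [matrix(out_dim, hidden_dim) for _ in range(num_embodiments)]

    @staticmethod
    def _matvec(matrix, vector):
        return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

    def __call__(self, padded_state, embodiment_id):
        # Select this embodiment's weights, then apply linear -> ReLU -> linear.
        hidden = self._matvec(self.w1[embodiment_id], padded_state)
        hidden = [max(0.0, h) for h in hidden]
        return self._matvec(self.w2[embodiment_id], hidden)


encoder = EmbodimentMLP(num_embodiments=2, in_dim=MAX_STATE_DIM,
                        hidden_dim=16, out_dim=4)

arm_state = [0.1, -0.3, 0.5]   # toy 3-dimensional state
humanoid_state = [0.2] * 7     # toy 7-dimensional state

arm_embedding = encoder(pad_state(arm_state), embodiment_id=0)
humanoid_embedding = encoder(pad_state(humanoid_state), embodiment_id=1)
```

Both toy robots produce embeddings of the same size even though their raw state vectors differ in length, which is the point of combining padding with per-embodiment weights.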

The action sequence is modeled by a flow matching transformer (implemented as a Diffusion Transformer / DiT with adaptive layernorm conditioning). The transformer interleaves self-attention over proprioception and actions with cross-attention to vision and language embeddings. Compared to N1, the MLP connector between vision-language features and the DiT was modified, and the model was trained jointly with flow matching and world-modeling objectives.
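As a rough illustration of adaptive layernorm conditioning, the sketch below normalizes a feature vector and then applies a scale and shift that are linear functions of a conditioning vector. This is the generic DiT-style formulation, not code from the GR00T repository; the function names, the choice of conditioning signal (for example a timestep embedding), and the weight shapes are all assumptions.

```python
import math


def layer_norm(x, eps=1e-6):
    """Plain layer normalization without learned scale or shift."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def linear(weights, vector):
    """Matrix-vector product; weights is a list of rows."""
    return [sum(w * c for w, c in zip(row, vector)) for row in weights]


def adaptive_layer_norm(x, cond, w_scale, w_shift):
    """Normalize x, then modulate it with scale/shift regressed from cond."""
    scale = linear(w_scale, cond)   # one scale value per feature
    shift = linear(w_shift, cond)   # one shift value per feature
    normed = layer_norm(x)
    return [n * (1.0 + s) + b for n, s, b in zip(normed, scale, shift)]


features = [1.0, 2.0, 4.0]
cond = [0.5, -1.0]                      # hypothetical conditioning vector
zeros = [[0.0, 0.0] for _ in features]  # zero-initialized modulation weights
ones = [[1.0, 0.0] for _ in features]   # shift every feature by cond[0]

unmodulated = adaptive_layer_norm(features, cond, zeros, zeros)
shifted = adaptive_layer_norm(features, cond, zeros, ones)
```

With zero modulation weights the layer reduces to ordinary layer normalization, so the conditioning signal only changes the output once those weights are non-zero.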

Inference

At inference time, the policy samples a Gaussian noise vector and iteratively refines it into a continuous-valued action by integrating the predicted velocity.
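The sampling loop can be sketched as a fixed-step Euler integration of a velocity field from t = 0 (noise) to t = 1 (action). This is a toy illustration under stated assumptions: `sample_action`, `toy_velocity`, `TARGET`, and the default of four steps are hypothetical, and the closed-form velocity stands in for the trained transformer, which would also be conditioned on vision, language, and proprioception.

```python
import random


def sample_action(velocity_fn, action_dim, num_steps=4, seed=0):
    """Integrate a velocity field from Gaussian noise (t=0) to an action (t=1)."""
    rng = random.Random(seed)
    action = [rng.gauss(0.0, 1.0) for _ in range(action_dim)]  # start from noise
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt
        velocity = velocity_fn(action, t)          # model call in a real policy
        action = [a + dt * v for a, v in zip(action, velocity)]
    return action


TARGET = [0.5, -0.2, 0.1]  # hypothetical "ground truth" action for the toy field


def toy_velocity(action, t):
    """Velocity of the straight-line path x_t = (1 - t) * noise + t * TARGET."""
    return [(goal - a) / (1.0 - t) for goal, a in zip(TARGET, action)]


result = sample_action(toy_velocity, action_dim=3)
```

Because the toy field points straight at `TARGET`, the loop lands on it after the final step; a learned velocity network would only approximate such a path, and more steps trade compute for accuracy.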

Availability and license

GR00T-N1.5-3B weights are publicly available on Hugging Face under the NVIDIA One-Way Noncommercial License, so the model is ready for non-commercial use. It runs on Linux with the PyTorch runtime. Supported hardware: NVIDIA Ampere, Blackwell, Hopper, and Lovelace GPUs, and Jetson.

Classification

Robotics foundation model · Vision-Language-Action model

Family: GR00T

Applications

Robotic manipulation · Robot policy training · Simulation / synthetic data generation

Access & deployment

Download

Local · On-device

Weights: Open weights

Key parameters

🧩 Parameters: 3B

✓ Fine-tuning