>ROBOTS ATLAS

RT-2 (Robotics Transformer 2)

Name: RT-2 (Robotics Transformer 2)
Brand: Google DeepMind

Gemini Robotics

Active

CATEGORY: Control · Control & Planning

READINESS: TRL 6

ADOPTION SCALE: Research / Prototype

LICENSES: LicenseRef-Proprietary

FIRST RELEASE: 2023

**RT-2 (Robotics Transformer 2)** is a Vision-Language-Action (VLA) model announced by Google DeepMind in July 2023 (paper 'RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control', Brohan et al., arXiv:2307.15818). It is the successor to RT-1. Unlike RT-1, a specialized 35M-parameter transformer trained exclusively on robotic data, RT-2 **co-trains** a large pretrained Vision-Language Model (PaLI-X 55B or PaLM-E 12B) on a **mixed dataset** of web data (~10B image-text pairs) and the robotic data collected for RT-1 (~130k episodes).
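Co-training on a mixed dataset can be pictured as drawing every training batch from two pools at once. The sketch below is only an illustration of that idea, not the RT-2 training pipeline: the function name `mixed_batches`, the `robot_fraction` parameter, and the example records are all hypothetical, and the mixing ratio actually used is not stated here.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def mixed_batches(
    web_examples: Sequence[str],
    robot_examples: Sequence[str],
    robot_fraction: float,
    batch_size: int,
    seed: int = 0,
) -> Iterator[List[Tuple[str, str]]]:
    """Yield endless batches that mix web and robot examples.

    robot_fraction is a hypothetical knob: the probability that any
    single slot in the batch is filled from the robot pool.
    """
    if not 0.0 <= robot_fraction <= 1.0:
        raise ValueError("robot_fraction must be in [0, 1]")
    rng = random.Random(seed)
    while True:
        batch = []
        for _ in range(batch_size):
            if rng.random() < robot_fraction:
                batch.append(("robot", rng.choice(robot_examples)))
            else:
                batch.append(("web", rng.choice(web_examples)))
        yield batch


if __name__ == "__main__":
    web = ["caption: a banana on a table", "vqa: what colour is the cup?"]
    robot = ["episode: pick up the banana", "episode: open the drawer"]
    first = next(mixed_batches(web, robot, robot_fraction=0.5, batch_size=8))
    for source, example in first:
        print(source, "|", example)
```

Because both pools feed the same model with the same loss, the web examples keep the pretrained vision-language knowledge in play while the robot examples teach action output.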

The key innovation is **actions as tokens**. RT-2 discretizes each dimension of the robot's action space (6-DOF end-effector translation and rotation, plus gripper) into 256 bins and maps each bin to **a token in the VLM's vocabulary**. The model can therefore generate its 'answer' as one uniform token sequence, whether that answer is text (VQA) or a robot action. This approach enables **emergent capabilities**: RT-2 can perform semantic tasks that require reasoning and are absent from the robot training data. Given 'Move banana to the sum of two and one', the model has to work out that the answer is 3 and move the banana to the matching target.
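The discretization step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the RT-2 implementation: the 256 bins come from the text above, while the action bounds, the seven-dimensional layout, and the space-separated integer string format are hypothetical choices made for the example.

```python
from typing import List, Sequence

NUM_BINS = 256  # bins per action dimension, as described above

# Hypothetical per-dimension bounds: 3 translation deltas, 3 rotation
# deltas, 1 gripper opening. The numbers are illustrative only.
ACTION_LOW = [-0.1, -0.1, -0.1, -0.5, -0.5, -0.5, 0.0]
ACTION_HIGH = [0.1, 0.1, 0.1, 0.5, 0.5, 0.5, 1.0]


def discretize(action: Sequence[float]) -> List[int]:
    """Map each continuous action dimension to a bin index in [0, 255]."""
    if len(action) != len(ACTION_LOW):
        raise ValueError("unexpected action dimensionality")
    bins = []
    for value, low, high in zip(action, ACTION_LOW, ACTION_HIGH):
        clipped = min(max(value, low), high)  # clamp out-of-range values
        fraction = (clipped - low) / (high - low)
        bins.append(min(int(fraction * NUM_BINS), NUM_BINS - 1))
    return bins


def to_token_string(bins: Sequence[int]) -> str:
    """Render bin indices as the text the language model would emit."""
    return " ".join(str(b) for b in bins)


def from_token_string(text: str) -> List[float]:
    """Decode emitted tokens back to continuous values (bin centres)."""
    bins = [int(token) for token in text.split()]
    if len(bins) != len(ACTION_LOW):
        raise ValueError("unexpected number of action tokens")
    return [
        low + (b + 0.5) / NUM_BINS * (high - low)
        for b, low, high in zip(bins, ACTION_LOW, ACTION_HIGH)
    ]


if __name__ == "__main__":
    action = [0.02, -0.05, 0.0, 0.1, 0.0, -0.3, 1.0]
    tokens = to_token_string(discretize(action))
    print("tokens :", tokens)
    print("decoded:", from_token_string(tokens))
```

Decoding returns bin centres, so the round trip loses at most half a bin width per dimension. That quantization error is the price paid for letting a text decoder produce robot commands.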

Results: RT-2 achieves **62% success in generalization scenarios** (novel objects, novel instructions, novel backgrounds) vs. 32% for RT-1. Experiments were run on Everyday Robots mobile manipulators (an internal Google project, discontinued in 2023) and Franka. RT-2 **is not open source**: Google DeepMind only released a checkpoint of a small replication version, without the full PaLI-X/PaLM-E weights. Successors: **RT-X / Open X-Embodiment** (October 2023, cross-embodiment generalization), **Gemini Robotics** (March 2025, Apptronik Apollo integration).

RT-2 launched the era of 'foundation models for robotics' and influenced an entire generation of subsequent models: OpenVLA (Stanford/Berkeley, open-source replication), π0 (Physical Intelligence), Octo (Berkeley), CogACT. Many VLAs from 2024-2026 build on the 'actions as tokens' idea introduced by RT-2.

Documentation

Type & Roles

Software types

VLA / Foundation Model · Runtime

Runtime

A Runtime is the environment or execution layer used to run code, load libraries, manage dependencies, and operate applications or services — either in real time or during normal system operation. In robotics this includes real-time operating system (RTOS) runtimes, ROS 2 executor runtimes, containerised execution environments (Docker, podman), and embedded C++ runtimes on microcontrollers.

Main category

Control & Planning · Perception & Vision Software · Runtime & Infrastructure

Roles in robotics ecosystem

Robot Control · Motion Planning · Perception · Computer Vision

Robot Control

Robot Control denotes the role of software responsible for motion control, command execution, coordination of actuating elements and the direct operational logic of the robot.

Software family

Family

Proprietary VLA Stacks

Maturity & Adoption

6 / 9

Demonstration phase

Research · Prototype · Production

Adoption scale: Research / Prototype

Maintenance status: Internal / Proprietary – Not Public

First release: 2023

Last update: 20 May 2026

Deployments

Internal Google DeepMind experiments on Everyday Robots mobile manipulators (a discontinued project), with 6,000+ evaluation trials in 2023. RT-2 became the foundation of the Open X-Embodiment program (RT-X, October 2023; 21 institutions plus Google DeepMind; 1M demonstrations from 22 robotic platforms). The successor **Gemini Robotics** (March 2025) is integrated with Apptronik Apollo and Boston Dynamics Spot.

Community

No public weights repository. The arXiv:2307.15818 paper has 1,500+ citations (Q1 2026). The project site robotics-transformer2.github.io hosts video demonstrations. The RT-X / Open X-Embodiment repository at github.com/openx-embodiment has ~2.8k★. The #RT-2 / #VLA hashtags on X/Twitter generate ~50-100 mentions/day.

Organizations

Google DeepMind

Producer · Primary

Integrates with

NVIDIA Isaac Sim

Photorealistic robotics simulator (RTX) with advanced PhysX 5 physics. Built on Omniverse Kit, supports ROS 2, synthetic data generation (SDG), Isaac Lab training, and the Isaac ROS deployment pipeline on Jetson.

MuJoCo (Multi-Joint dynamics with Contact)

Open-source rigid-body physics engine with accurate contact and friction, by Emo Todorov. Apache 2.0 since 2021, maintained by DeepMind. Standard for learning-based robotics (RL) and sim-to-real.

OpenVLA

Open-source replication of RT-2 (Stanford + Berkeley + TRI, June 2024). A 7B-parameter VLA (Llama 2 + DINOv2 + SigLIP), trained on 970k demonstrations from Open X-Embodiment.

Related robotics software

π0 (pi-zero)

Physical Intelligence's first 'generalist robot policy' — VLA with flow matching, 50 Hz actions, trained on 10,000+ hours of demonstrations. Open weights π0-base (February 2025, Apache 2.0).

π0.5 (pi-zero-5)

π0.5 (pi-zero-5, Physical Intelligence) — evolution of π0 focused on open-world mobile manipulation: generalises to new environments and tasks without additional fine-tuning thanks to large-scale training data.

OpenVLA

Open-source replication of RT-2 (Stanford + Berkeley + TRI, June 2024). A 7B-parameter VLA (Llama 2 + DINOv2 + SigLIP), trained on 970k demonstrations from Open X-Embodiment.

GR00T N1

NVIDIA GR00T N1 — open foundational model for humanoids: dual-system VLA (fast motor policy + slow semantic reasoning), trained on cross-embodiment data, available through NVIDIA Isaac.

LeRobot

LeRobot (Hugging Face) — open-source robot learning framework with ACT, Diffusion Policy and TDMPC implementations. Standardises data collection and policy training for manipulators and mobile robots.

Supported robot models

Atlas

Boston Dynamics bipedal humanoid robot. The fully electric generation unveiled in 2024 succeeds the hydraulic Atlas that was retired after more than a decade of research.

Applications

Factory Automation
Research
Production Line Operation
Object Manipulation

Figure 03

Figure 03 is the third-generation humanoid robot from Figure AI, designed for Helix, home environments, and scalable mass production.

Applications

Factory Automation
Industrial Logistics
Production Line Operation
Warehouse Automation
Object Manipulation
Home Assistance

Unitree G1

Bipedal humanoid robot by Unitree Robotics, designed as a compact research, development, and developer platform.

Applications

Research
Home Assistance

Target robotic platforms

Robotic Arm

Mobile Robot

Research Robot

Supported hardware

NVIDIA Jetson AGX Orin 64GB

NVIDIA Jetson AGX Thor

compute · compute_modules · industrial

Intel RealSense D435i

sensing · cameras · research · Stereo RGB-D camera

Stereolabs ZED 2i

sensing · cameras · industrial

ROS support: Compatibility with ROS / ROS 2 ecosystem

Community ROS 2 Wrapper: ROS 2 wrapper created and maintained by the community, not by the vendor

System capabilities

Open source

Source code is publicly available under an open-source license — enables security audits, custom modifications, and integration without licensing barriers.

Real-time capable

Designed with timing-determinism guarantees — meets the requirements of control loops, safety systems, and tasks demanding low, predictable latency.

API available

The software exposes a programmable interface (REST, gRPC, SDK, or language bindings) that enables automation and integration with other systems.

Pre-built / binary

Distributed as ready-to-use binary packages, container images, or installers — no need to build from source.

Programming languages

Python · C++ · CUDA

Operating systems

Ubuntu 22.04 · Ubuntu 20.04 · Ubuntu 24.04 · Debian

Ubuntu 24.04

Ubuntu 24.04 LTS 'Noble Numbat' — supported until April 2029. The host for ROS 2 Jazzy.

Minimum hardware requirements

CPU: PaLI-X 55B inference requires a TPU v4 pod slice (8 chips) or 8× H100 80 GB. Local replications on 1-2 GPUs are possible for smaller variants.

RAM (GB): 64

GPU: Model weights are closed, so the full model runs only inside Google. Replication experiments: 1× A100 80 GB minimum for smaller variants.

Disk (GB): 500

RT-2 was not released as open weights. Google has only published the paper and inference code for a small replication version. The community built OpenVLA (Stanford/Berkeley) as an open-source successor with an architecture similar to RT-2.
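A rough back-of-the-envelope check helps explain why the 55B variant needs several accelerators while smaller variants fit on one. The sketch below counts weight memory only and assumes 2 bytes per parameter (bf16/fp16). Activations, caches, and framework overhead are ignored, so the real requirement is higher. The function names are hypothetical.

```python
import math


def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed just to hold the weights, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


def min_devices(
    num_params: float, device_memory_gb: float, bytes_per_param: int = 2
) -> int:
    """Lower bound on accelerator count, counting weights only."""
    needed = weight_memory_gb(num_params, bytes_per_param)
    return math.ceil(needed / device_memory_gb)


if __name__ == "__main__":
    for name, params in [("PaLI-X 55B", 55e9), ("PaLM-E 12B", 12e9)]:
        gb = weight_memory_gb(params)
        devices = min_devices(params, device_memory_gb=80)
        print(f"{name}: {gb:.0f} GB of weights, at least {devices} x 80 GB device(s)")
```

At 2 bytes per parameter the 55B weights alone come to 110 GB, which already exceeds a single 80 GB device. The 8-device figure quoted above leaves headroom for everything this estimate leaves out.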

Packaging & distribution

Package managers

pip / PyPI · GitHub Releases / GitHub Actions Artifacts · Docker / Docker Hub · Source – Python (setup.py / pyproject.toml) · conda / mamba

CPU architectures

x86_64 (AMD64) · NVIDIA GPU (CUDA – x86_64) · ARM64 / AArch64 · NVIDIA Jetson – AArch64 (JetPack)

Installation difficulty

Level: Expert only

Protocols and interfaces

Communication protocols

gRPC · REST API (HTTP/HTTPS) · WebSocket · ROS 2 Topics · Shared Memory (POSIX / mmap)

Hardware interfaces

Ethernet 1000BASE-T (Gigabit Ethernet) · Ethernet 10GBASE-T (10 Gigabit Ethernet) · USB 3.0 / 3.1 Gen 1 · PCIe 4.0 · MIPI CSI-2

Latency classes

Soft Real-Time (100–500 ms) · Near Real-Time (500 ms – 2 s)
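The latency classes translate directly into an upper bound on control rate. The sketch below assumes the policy emits one action per inference call, with no action chunking or asynchronous execution, and assigns the shared 500 ms boundary to the faster class. Both are simplifying assumptions, and the function names are hypothetical.

```python
def control_rate_hz(latency_ms: float) -> float:
    """Maximum action rate if every action waits for one full inference."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


def latency_class(latency_ms: float) -> str:
    """Classify a per-action latency using the two classes listed above."""
    if 100 <= latency_ms <= 500:
        return "Soft Real-Time"
    if 500 < latency_ms <= 2000:
        return "Near Real-Time"
    return "Outside listed classes"


if __name__ == "__main__":
    for latency in (100, 300, 500, 1000, 2000):
        print(
            f"{latency:>5} ms: {latency_class(latency):<15} "
            f"{control_rate_hz(latency):.1f} Hz"
        )
```

Under these assumptions the listed classes span roughly 0.5 to 10 actions per second. That is slow next to classical joint-level control loops, which is why a policy in these classes usually sits above a faster low-level controller.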

Deployment types

Cloud · Local Workstation · Hybrid

Supported simulators

MuJoCo

NVIDIA Isaac Sim

Gazebo Harmonic

PyBullet / Bullet3

Licenses

License family: Proprietary – Commercial

Version history

Gemini Robotics · Mar 2025

Successor to RT-2 based on Gemini 2.0 — a multimodal VLA with an Apptronik (Apollo) partnership.

AutoRT + SARA-RT · Jan 2024

Derived architectures — AutoRT (scaling robot data collection) + SARA-RT (efficient self-attention).

RT-2-X · Oct 2023

RT-2 trained on the expanded Open X-Embodiment dataset.

RT-X / Open X-Embodiment · Oct 2023

Cross-embodiment training: 1M demonstrations across 22 robots + 21 institutions. First cross-embodiment generalization benchmark.

RT-2 paper · Jul 2023

RT-2: co-training PaLI-X 55B with robotic data. Introduces 'actions as tokens'. Emergent reasoning capabilities.

RT-1 paper · Dec 2022

Robotics Transformer 1 — 35M parameters, trained exclusively on robotic data (130k Everyday Robots episodes). The baseline benchmark.