Training

Cross-Embodiment Learning

2022 · Active · Published: 20 June 2026 · Updated: 20 June 2026

Key innovation

A single model learns to execute tasks across robots with different physical morphologies (humanoids, dual-arm rigs, mobile manipulators) instead of training a separate model per platform.

How it works

1) Collect data from multiple sources: teleoperation across different robots or human demonstrations (motion capture, egocentric cameras).
2) Convert to a shared action representation via a cross-embodiment pipeline (e.g. kinematics-independent action tokenization).
3) Train a high-level policy on the unified dataset (task planning, scene understanding).
4) A low-level controller translates intent into physical motion respecting each robot's dynamics.
5) Optionally, a lightweight sim-to-real compensation model corrects hardware-specific errors.
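
The steps above can be read as a data flow. The following is a minimal sketch of that flow, not any project's actual API: every name in it (`SharedAction`, `to_shared`, `high_level_policy`, `CONTROLLERS`, the two robot names) is hypothetical, the unit conversion stands in for a real cross-embodiment pipeline, and the policy is a hard-coded stub where a trained model would sit.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class SharedAction:
    """Embodiment-agnostic action: desired end-effector displacement in meters."""
    dx: float
    dy: float
    dz: float


def to_shared(raw: Tuple[float, float, float], meters_per_unit: float) -> SharedAction:
    """Step 2: convert a source-specific displacement into the shared units."""
    return SharedAction(*(v * meters_per_unit for v in raw))


# Step 1: two hypothetical sources recording the same kind of motion in different units.
mocap_mm = [(50.0, 0.0, 0.0), (0.0, 20.0, 0.0)]  # human motion capture, millimeters
teleop_m = [(0.05, 0.0, 0.0)]                    # robot teleoperation, meters
dataset: List[SharedAction] = (
    [to_shared(r, 0.001) for r in mocap_mm] + [to_shared(r, 1.0) for r in teleop_m]
)


def high_level_policy(observation: str) -> SharedAction:
    """Step 3 (stub): a trained model would be fitted to `dataset`; this is hard-coded."""
    if observation == "reach":
        return SharedAction(0.05, 0.0, 0.0)
    return SharedAction(0.0, 0.0, 0.0)


# Step 4: one controller per robot. These return strings for illustration only;
# a real controller would emit joint targets or torques.
CONTROLLERS: Dict[str, Callable[[SharedAction], str]] = {
    "humanoid": lambda a: f"humanoid right hand: move by ({a.dx:.2f}, {a.dy:.2f}, {a.dz:.2f}) m",
    "dual_arm": lambda a: f"dual_arm left arm: move by ({a.dx:.2f}, {a.dy:.2f}, {a.dz:.2f}) m",
}


def act(robot: str, observation: str) -> str:
    """Run the shared policy, then hand its output to the robot-specific controller."""
    return CONTROLLERS[robot](high_level_policy(observation))
```

Calling `act("humanoid", "reach")` and `act("dual_arm", "reach")` runs the same policy output through two different controllers, which is the decoupling the steps describe.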

Problem solved

Classical robot policy learning required collecting a separate dataset and training a separate model for every robot body. As a result, data could not be pooled across platforms, and robotics could not scale the way LLMs do. Cross-Embodiment Learning addresses this by decoupling intelligence from embodiment and letting a single model drive many platforms.

Components

Cross-Embodiment Data Pipeline · Unification of data from heterogeneous sources

A layer that converts observations and actions from different robots (or human demonstrations) into a shared representation. This may take the form of proprioceptive normalization, a canonical state representation, or action tokenization.
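
As one possible instance of such a layer, the sketch below combines per-embodiment min-max normalization with uniform binning into discrete tokens. It is an illustration under stated assumptions, not the method of any particular project: the robot names, the limits in `ACTION_RANGES`, the three-dimensional action layout, and the choice of 256 bins are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Range = Tuple[float, float]

# Hypothetical per-robot action limits: one (low, high) pair per dimension
# (x displacement, y displacement, gripper opening), each in that robot's own units.
ACTION_RANGES: Dict[str, List[Range]] = {
    "arm_a": [(-0.10, 0.10), (-0.10, 0.10), (0.0, 1.0)],
    "arm_b": [(-0.25, 0.25), (-0.25, 0.25), (0.0, 0.08)],
}


def normalize(action: Sequence[float], ranges: Sequence[Range]) -> List[float]:
    """Rescale each dimension to [-1, 1] using that robot's own limits."""
    out = []
    for value, (low, high) in zip(action, ranges):
        clipped = min(max(value, low), high)
        out.append(2.0 * (clipped - low) / (high - low) - 1.0)
    return out


def tokenize(normalized: Sequence[float], num_bins: int = 256) -> List[int]:
    """Uniformly bin values in [-1, 1] into integer tokens 0..num_bins-1."""
    return [min(int((v + 1.0) / 2.0 * num_bins), num_bins - 1) for v in normalized]


def detokenize(tokens: Sequence[int], ranges: Sequence[Range], num_bins: int = 256) -> List[float]:
    """Map tokens back to bin-center values expressed in a target robot's units."""
    return [low + (t + 0.5) / num_bins * (high - low) for t, (low, high) in zip(tokens, ranges)]


# The same relative action (full +x, full -x... here: max x, min y, half-open gripper)
# recorded on two robots with different limits.
tokens_a = tokenize(normalize([0.10, -0.10, 0.5], ACTION_RANGES["arm_a"]))
tokens_b = tokenize(normalize([0.25, -0.25, 0.04], ACTION_RANGES["arm_b"]))
```

Both recordings end up as the same token sequence, so a policy trained on tokens sees them as one behavior; `detokenize` then maps a predicted token sequence back into whichever robot's units are needed. Min-max scaling treats "fraction of the robot's own range" as the shared meaning, which is a design choice and not the only option.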

High-level policy · Scene understanding, task planning, intent generation

An AI model, typically a vision-language-action (VLA) model or a robotics foundation model, that produces task-level behavior: what to do, in what order, and where to direct attention. It operates on embodiment-agnostic actions.

Low-level controller · Physical execution, balance, stability

An embodiment-specific component that translates abstract intent into concrete motor commands, torques, trajectories, and control signals that respect the specific robot's dynamics and constraints.
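
To make the split concrete, here is a minimal sketch of the simplest such translation: closed-form inverse kinematics for a planar two-link arm, turning one shared end-effector goal into joint angles for two arms with different link lengths. The planar two-link model, the function names, and the link lengths are assumptions chosen for illustration; a real controller would also handle dynamics, joint limits, balance, and torque constraints.

```python
import math
from typing import Tuple


def two_link_ik(x: float, y: float, l1: float, l2: float) -> Tuple[float, float]:
    """Return (shoulder, elbow) angles in radians placing the arm tip at (x, y)."""
    dist_sq = x * x + y * y
    cos_elbow = (dist_sq - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if not -1.0 <= cos_elbow <= 1.0:
        raise ValueError("goal is outside this robot's reachable workspace")
    # acos picks one of the two mirror-image elbow configurations.
    elbow = math.acos(cos_elbow)
    shoulder = math.atan2(y, x) - math.atan2(l2 * math.sin(elbow), l1 + l2 * math.cos(elbow))
    return shoulder, elbow


def two_link_fk(shoulder: float, elbow: float, l1: float, l2: float) -> Tuple[float, float]:
    """Forward kinematics, used here only to check the IK result."""
    x = l1 * math.cos(shoulder) + l2 * math.cos(shoulder + elbow)
    y = l1 * math.sin(shoulder) + l2 * math.sin(shoulder + elbow)
    return x, y


# One embodiment-agnostic goal, two hypothetical robots with different link lengths (meters).
goal = (0.30, 0.20)
robots = {"long_arm": (0.30, 0.25), "short_arm": (0.20, 0.20)}
joint_targets = {name: two_link_ik(*goal, *links) for name, links in robots.items()}
```

The two robots receive different joint angles for the same goal, and a goal beyond a robot's reach raises an error instead of silently producing a wrong pose.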

Sim-to-real compensation model · Closing the sim-to-real gap

An optional, lightweight layer that corrects tracking errors and dynamics mismatch between simulation and real hardware. Trained on a small dataset from real deployments.
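
A minimal sketch of the idea, assuming the hardware's tracking error is well approximated by a gain and an offset on a single axis: fit that relationship by ordinary least squares on a handful of real (commanded, measured) pairs, then invert it to pre-correct commands. The linear model, the function names, and the simulated `hardware` response are all assumptions; the source does not specify the form of the compensation model.

```python
from typing import Callable, List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of ys ~ gain * xs + offset."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    gain = cov / var
    return gain, mean_y - gain * mean_x


def make_compensator(commanded: Sequence[float], measured: Sequence[float]) -> Callable[[float], float]:
    """Learn how the hardware distorts commands and return a pre-correction function."""
    gain, offset = fit_line(commanded, measured)

    def compensate(target: float) -> float:
        # Invert the fitted model: which command makes the hardware land on `target`?
        return (target - offset) / gain

    return compensate


def hardware(command: float) -> float:
    """Stand-in for a real joint that undershoots by 10% and has a small bias."""
    return 0.9 * command - 0.02


# A small calibration set, as might be logged from a real deployment.
commands: List[float] = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
measurements = [hardware(c) for c in commands]
compensate = make_compensator(commands, measurements)
```

Sending `compensate(target)` instead of `target` cancels the modeled error. Real tracking errors are usually state-dependent and nonlinear, so a small learned model would replace the straight line, but the structure (fit on real data, then correct the command) stays the same.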

Implementation

Reference implementations

Open X-Embodiment / RT-X

Python · Google DeepMind + 33 academic labs · Official

Octo (open-source generalist robot policy)

Python (JAX) · UC Berkeley + Stanford + CMU

pi-0

Physical Intelligence · Official

Implementation pitfalls

Kinematic differences between robots · High

Actions that are feasible for one robot may be unreachable for another because of differences in reach or degrees of freedom. Directly imitating them leads to execution errors.

Fix: Introduce an action abstraction layer (e.g. end-effector goals instead of joint positions) and dedicated low-level controllers per robot.
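
A small sketch of what such an abstraction layer might do, assuming planar serial arms: convert a source robot's joint-space demonstration into end-effector goals with forward kinematics, then project any goal that falls outside the target robot's reach back onto its workspace. The function names, link lengths, and the clamping policy are hypothetical; rejecting or re-planning unreachable goals would be equally valid choices.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def forward_kinematics(joints: Sequence[float], link_lengths: Sequence[float]) -> Point:
    """End-effector (x, y) of a planar serial arm with any number of revolute joints."""
    x = y = angle = 0.0
    for q, length in zip(joints, link_lengths):
        angle += q
        x += length * math.cos(angle)
        y += length * math.sin(angle)
    return x, y


def clamp_to_reach(goal: Point, max_reach: float) -> Point:
    """Project a goal onto the target robot's reachable disc if it lies outside."""
    x, y = goal
    dist = math.hypot(x, y)
    if dist <= max_reach:
        return goal
    scale = max_reach / dist
    return x * scale, y * scale


# Joint-space demonstration from a hypothetical 3-DoF source arm (link lengths in meters).
source_links = [0.3, 0.3, 0.2]
demo_joints = [[0.0, 0.0, 0.0], [math.pi / 2, -math.pi / 2, 0.0]]

# Abstract away the source kinematics: keep only where the end effector went.
ee_goals: List[Point] = [forward_kinematics(q, source_links) for q in demo_joints]

# Retarget to a hypothetical 2-DoF arm that can only reach 0.5 m.
retargeted = [clamp_to_reach(g, max_reach=0.5) for g in ee_goals]
```

The joint angles of the 3-DoF arm are meaningless to the 2-DoF arm, but the end-effector goals transfer once they are brought inside its workspace; the target robot's own controller then solves for its joints.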

Perception and control latency · Medium

Human demonstration data is effectively delay-free, whereas robot data carries real perception and actuation latency. Direct imitation leads to desynchronization.

Fix: Use a hierarchical reasoning loop that monitors low-level feedback and adaptively schedules actions.
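
One way to picture adaptive scheduling is a high-level loop that issues the next goal only once low-level feedback shows the previous one has been tracked, instead of advancing on a fixed clock. The sketch below is a toy illustration under stated assumptions: the function name and parameters are hypothetical, and the actuator is simulated as a first-order lag that closes a fixed fraction of the remaining error per control step.

```python
from typing import List, Tuple


def run_adaptive_schedule(
    goals: List[float],
    gain: float = 0.5,
    tolerance: float = 0.01,
    max_steps_per_goal: int = 50,
) -> List[Tuple[float, int, float]]:
    """Issue each goal only after the previous one is tracked to within `tolerance`.

    Returns one (goal, control_steps_used, final_position) tuple per goal.
    """
    position = 0.0
    log = []
    for goal in goals:
        steps = 0
        # High-level loop: watch the tracking error instead of a fixed timer.
        while abs(goal - position) > tolerance and steps < max_steps_per_goal:
            # Simulated low-level response: close a fraction of the gap each step.
            position += gain * (goal - position)
            steps += 1
        log.append((goal, steps, position))
    return log


# A responsive actuator and a sluggish one receive the same goal sequence.
fast_log = run_adaptive_schedule([1.0, 0.0], gain=0.5)
slow_log = run_adaptive_schedule([1.0, 0.0], gain=0.2)
```

The sluggish actuator is simply given more control steps per goal, so the goal sequence stays synchronized with what the hardware has actually achieved; `max_steps_per_goal` bounds the wait so a stuck actuator cannot stall the schedule indefinitely.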

Sim-to-real gap · High

A policy trained in simulation often fails on the real robot due to dynamics mismatch, friction and latency.

Fix: Add a lightweight compensation model trained on real deployment data to correct tracking errors.

Evolution

Original paper · 2023 · arXiv preprint (ICRA 2024) · Open X-Embodiment Collaboration (Google DeepMind + 33 academic labs)

Open X-Embodiment: Robotic Learning Datasets and RT-X Models

2022

RT-1 (Robotics Transformer)

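Google Robotics releases RT-1, one of the first large transformer policies for real-world control, trained on roughly 130k demonstration episodes collected by a fleet of 13 robots. It suggests that robot policies can benefit from data and model scale much as LLMs do.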

2023

Open X-Embodiment + RT-X

Inflection point

A collaboration of 34 research labs (Google DeepMind + 33 academic labs) publishes 1M+ trajectories from 22 robot types. RT-X demonstrates positive cross-embodiment skill transfer.

Open X-Embodiment: Robotic Learning Datasets and RT-X Models (paper)

2024

pi-0 (Physical Intelligence)

Physical Intelligence (PI) releases pi-0 — a generalist VLA trained cross-embodiment on 8 platforms.

2026

Mind-0 (MindOn) — human-centric cross-embodiment

Inflection point

MindOn shows that a cross-embodiment policy can be trained purely from human-centric data (whole-body motion capture, egocentric cameras), without robot teleoperation. Demo: one model simultaneously driving a Unitree G1 humanoid and a stationary dual-arm rig.

(concept)