Robots Atlas

Imitation Learning / Behavior Cloning

Learning an agent policy directly from expert demonstrations without defining a reward function, eliminating reward engineering in robotics.

Category: Abstraction level

Applications: Robotic policy training · Object manipulation · Autonomous navigation · Robotic arm control · Fine-tuning foundation models on human data

Pairs of (observation, action) are collected from expert demonstrations. A model (the policy network) is trained to map observations to actions by minimising an MSE or cross-entropy loss. In BC the model learns off-policy, without environment interaction during training. In more advanced variants such as DAgger, the agent queries the expert in the loop to correct errors caused by distribution shift.
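The training step described above can be sketched in a few lines. This is a minimal illustration, not any particular system's implementation: the "expert" is a hypothetical fixed linear rule standing in for real demonstration data, and the policy is a linear map fitted by gradient descent on the MSE loss.

```python
import numpy as np

# Hypothetical expert: maps 3-D observations to 2-D continuous actions via a
# fixed linear rule (a stand-in for real expert demonstrations).
rng = np.random.default_rng(0)
W_expert = rng.normal(size=(3, 2))
obs = rng.normal(size=(500, 3))   # observations from demonstrations
actions = obs @ W_expert          # corresponding expert actions

# Behavior Cloning: fit a linear policy action = obs @ W by minimising MSE.
W = np.zeros((3, 2))
lr = 0.05
for _ in range(2000):
    pred = obs @ W
    grad = obs.T @ (pred - actions) / len(obs)  # gradient of the MSE loss
    W -= lr * grad

print(np.allclose(W, W_expert, atol=1e-3))  # prints True
```

In practice the linear map is replaced by a neural network and MSE by cross-entropy for discrete action spaces, but the supervised-learning structure is the same: no reward signal and no environment interaction during training.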

Difficulty of defining reward functions for complex robotic tasks; need for efficient skill transfer from human demonstrations.

GENESIS · Source paper

Efficient Training of Artificial Neural Networks for Autonomous Navigation
Dean A. Pomerleau · Neural Computation, 1991
1991: ALVINN (Pomerleau), first demonstration of Behavior Cloning for autonomous navigation

2011 · breakthrough: DAgger (Ross et al.), iterative dataset aggregation solves the distribution shift problem in BC

2022 · breakthrough: Open-X-Embodiment, scaling IL to millions of robotic demonstrations across diverse platforms

2025 · breakthrough: UnifoLM-WMA-0 applies IL/BC as Policy Enhancement on Open-X data
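The DAgger idea from the timeline can be sketched concretely. Everything here is a toy assumption for illustration: the "expert" is a hypothetical rule that steers toward the origin, the dynamics are a made-up one-liner, and the policy is a least-squares fit. The structure of the loop, however, is DAgger's: roll out the current learner, have the expert relabel the states the learner actually visits, aggregate those pairs into the dataset, and refit.

```python
import numpy as np

rng = np.random.default_rng(1)

def expert_policy(obs):
    # Hypothetical expert: always steer toward the origin.
    return -obs

def rollout(policy_W, steps=20):
    # Collect the states visited by the *current learner* policy.
    s, visited = rng.normal(size=2), []
    for _ in range(steps):
        visited.append(s.copy())
        s = s + 0.1 * (s @ policy_W)  # toy dynamics: s += 0.1 * action
    return np.array(visited)

def fit(X, Y):
    # Least-squares linear policy: action = obs @ W.
    return np.linalg.lstsq(X, Y, rcond=None)[0]

# Initial BC dataset from expert demonstrations.
X = rng.normal(size=(50, 2))
Y = expert_policy(X)
W = fit(X, Y)

# DAgger iterations: aggregate expert-relabelled learner states and refit.
for _ in range(5):
    states = rollout(W)
    X = np.vstack([X, states])
    Y = np.vstack([Y, expert_policy(states)])  # expert relabels visited states
    W = fit(X, Y)
```

The key difference from plain BC is that the training distribution now includes states the learner itself reaches, so errors no longer compound once the learner drifts off the expert's trajectories.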

GPU Tensor Cores (primary)

Training neural network policies on large demonstration datasets requires GPUs.
