Imitation Learning / Behavior Cloning

Learning an agent's policy directly from expert demonstrations, without a hand-defined reward function, which avoids reward engineering in robotics.

Robotic policy training, object manipulation, autonomous navigation, robotic arm control, fine-tuning foundation models on human data

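Pairs of (observation, action) are collected from expert demonstrations. A model (the policy network) is trained by supervised learning to map observations to actions, minimising MSE for continuous actions or cross-entropy for discrete ones. In BC the model learns offline, with no environment interaction during training. Because the policy's own mistakes can take it into states the demonstrations never covered, errors may compound at deployment (distribution shift). More advanced variants such as DAgger address this by rolling out the learner's policy, querying the expert for the correct action in the states it visits, and retraining on the aggregated data.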
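The training loop described here can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the linear `expert_policy`, the toy point-mass dynamics in `rollout`, the linear learner, and all hyperparameters are hypothetical stand-ins chosen so the example runs without any robotics or deep-learning library. `train_bc` is behavior cloning by MSE regression; `dagger` wraps it in a DAgger-style loop where the learner picks the states and the expert only labels them.

```python
import random

def expert_policy(obs):
    # Hypothetical expert: a fixed linear controller standing in for a human demonstrator.
    pos, vel = obs
    return -1.0 * pos - 0.5 * vel

def policy_action(params, obs):
    # Learner: a linear policy a = w0*pos + w1*vel + b (assumed for simplicity).
    w0, w1, b = params
    return w0 * obs[0] + w1 * obs[1] + b

def train_bc(dataset, epochs=500, lr=0.1):
    # Behavior cloning: fit the policy to (observation, action) pairs by
    # full-batch gradient descent on the mean squared error.
    # No environment interaction happens inside this function.
    params = [0.0, 0.0, 0.0]
    n = len(dataset)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for obs, act in dataset:
            err = policy_action(params, obs) - act
            grad[0] += 2.0 * err * obs[0] / n
            grad[1] += 2.0 * err * obs[1] / n
            grad[2] += 2.0 * err / n
        params = [p - lr * g for p, g in zip(params, grad)]
    return params

def rollout(params, start, steps=20, dt=0.1):
    # Run the learner's own policy in a toy point-mass environment
    # and return the observations it visits.
    pos, vel = start
    visited = []
    for _ in range(steps):
        obs = (pos, vel)
        visited.append(obs)
        vel += policy_action(params, obs) * dt
        pos += vel * dt
    return visited

def dagger(rng, n_demos=100, iterations=3, rollouts_per_iter=5):
    # Iteration 0 is plain BC on expert demonstrations.
    dataset = []
    for _ in range(n_demos):
        obs = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        dataset.append((obs, expert_policy(obs)))
    params = train_bc(dataset)
    for _ in range(iterations):
        for _ in range(rollouts_per_iter):
            start = (rng.uniform(-1, 1), rng.uniform(-1, 1))
            # The learner chooses the states; the expert only labels them.
            for obs in rollout(params, start):
                dataset.append((obs, expert_policy(obs)))
        # Retrain from scratch on the aggregated dataset.
        params = train_bc(dataset)
    return params, dataset

rng = random.Random(0)
params, dataset = dagger(rng)
print("learned weights:", [round(p, 3) for p in params])
print("aggregated dataset size:", len(dataset))
```

Because the toy expert is itself linear and noise-free, plain BC already recovers it almost exactly, so the DAgger iterations here only illustrate the data flow. The benefit of aggregation shows up when the learner cannot match the expert perfectly and drifts into states the original demonstrations did not cover.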

Difficulty of defining reward functions for complex robotic tasks; need for efficient skill transfer from human demonstrations.

GENESIS · Source paper

Efficient Training of Artificial Neural Networks for Autonomous Navigation

Dean A. Pomerleau, Neural Computation, 1991

1991

ALVINN (Pomerleau): early demonstration of behavior cloning for autonomous navigation

breakthrough

2011

DAgger (Ross et al.): iterative dataset aggregation mitigates the distribution shift problem in BC

breakthrough

2023

Open X-Embodiment: scaling IL to more than a million robot trajectories across diverse platforms

breakthrough

2025

UnifoLM-WMA-0 applies IL/BC as Policy Enhancement on Open-X data

GPU Tensor Cores (primary)

Training neural network policies on large demonstration datasets is compute-intensive and is typically done on GPUs.

Related AI models

Other

Ti0

UnifoLM-WMA-0
