Imitation Learning / Behavior Cloning
Learning an agent policy directly from expert demonstrations, without hand-designing a reward function; this avoids the reward-engineering step in robotics.
Pairs of (observation, action) are collected from expert demonstrations. A model (policy network) is trained to map observations to actions by minimising an MSE loss (continuous actions) or a cross-entropy loss (discrete actions). In BC the model is trained purely offline, with no environment interaction during training. More advanced variants such as DAgger query the expert in the loop, relabelling the states the learned policy actually visits, to correct the compounding errors caused by distribution shift; both loops are sketched below.
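As a concrete illustration of the mechanism above, here is a minimal PyTorch sketch of BC plus a DAgger-style expert-in-the-loop correction. The network sizes, `env` (a gym-style environment with `reset`/`step`), and `expert_action` (an expert labelling function) are illustrative assumptions, not taken from the source.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

OBS_DIM, ACT_DIM = 16, 4  # hypothetical observation/action sizes

# Policy network: maps observations to continuous actions.
policy = nn.Sequential(
    nn.Linear(OBS_DIM, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, ACT_DIM),
)

def train_bc(policy, observations, actions, epochs=10, lr=1e-3):
    """Behavior Cloning: supervised regression of expert actions onto
    observations. MSE suits continuous actions; a discrete action space
    would use cross-entropy over action logits instead."""
    loader = DataLoader(TensorDataset(observations, actions),
                        batch_size=256, shuffle=True)
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for obs, act in loader:
            opt.zero_grad()
            loss_fn(policy(obs), act).backward()
            opt.step()
    return policy

def dagger(policy, env, expert_action, rounds=5, horizon=200):
    """DAgger: let the *current policy* drive, relabel every visited
    state with the expert's action, aggregate, and retrain on the union."""
    obs_buf, act_buf = [], []
    for _ in range(rounds):
        obs = env.reset()  # placeholder gym-style API
        for _ in range(horizon):
            obs_t = torch.as_tensor(obs, dtype=torch.float32)
            obs_buf.append(obs_t)
            # The expert labels the state the policy actually reached.
            act_buf.append(torch.as_tensor(expert_action(obs),
                                           dtype=torch.float32))
            with torch.no_grad():
                action = policy(obs_t)  # policy, not expert, picks the step
            obs, _, done, _ = env.step(action.numpy())
            if done:
                break
        # Retrain on the aggregated dataset after each round.
        policy = train_bc(policy, torch.stack(obs_buf), torch.stack(act_buf))
    return policy
```

Note the key difference from plain BC: in `dagger` the rollout is driven by the learned policy while the expert only supplies labels, which is what keeps the training distribution aligned with the states the policy will actually encounter.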
Difficulty of defining reward functions for complex robotic tasks; need for efficient skill transfer from human demonstrations.
genesis · Efficient Training of Artificial Neural Networks for Autonomous Navigation (ALVINN, Pomerleau): first demonstration of Behavior Cloning for autonomous navigation
breakthrough · DAgger (Ross et al.): iterative dataset aggregation mitigates the distribution-shift problem in BC
breakthrough · Open-X-Embodiment: scaling IL to millions of robotic demonstrations across diverse platforms
breakthrough · UnifoLM-WMA-0: applies IL/BC as Policy Enhancement on Open-X data
Training neural-network policies on large demonstration datasets typically requires GPU acceleration.