Neural Networks: From Fundamentals to Modern AI · Backpropagation: How a Network Learns

Chain rule — the foundation of backpropagation

Introduction

Backpropagation is essentially one thing: a systematic, mechanical application of the chain rule from calculus. If y = f(g(h(x))), then dy/dx = f'(g(h(x))) · g'(h(x)) · h'(x): the derivative of a composition is a product of derivatives. This lesson covers the chain rule in scalar, vector, and Jacobian form, introduces the computational graph (a directed acyclic graph, DAG) as the basis of modern autograd, and explains why training deep networks is computationally tractable. You will understand why, for a function R^n -> R^m with m much smaller than n, reverse-mode autodiff is roughly n/m times cheaper than forward-mode: the full Jacobian takes about m reverse passes but about n forward passes, so for a scalar loss (m = 1) a single backward pass yields the whole gradient. You will also see where vanishing and exploding gradients come from as a direct consequence of the product in the chain rule.
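The two claims about the chain rule can be checked numerically with a short sketch. The functions h(x) = x^2, g(u) = sin(u), and f(v) = exp(v) are illustrative choices, not part of the lesson, and the helper names are hypothetical. The first part compares the chain-rule product against a central finite difference. The second part multiplies the local derivatives of a repeatedly applied sigmoid to show how a long product of small factors shrinks the gradient.

```python
import math

# Illustrative composition y = f(g(h(x))); these three functions are assumptions.
def h(x): return x * x
def g(u): return math.sin(u)
def f(v): return math.exp(v)

# Their derivatives, written out by hand.
def dh(x): return 2.0 * x
def dg(u): return math.cos(u)
def df(v): return math.exp(v)

def chain_rule_derivative(x):
    """dy/dx = f'(g(h(x))) * g'(h(x)) * h'(x)."""
    u = h(x)   # forward pass: keep the intermediate values
    v = g(u)
    return df(v) * dg(u) * dh(x)

def finite_difference(x, eps=1e-6):
    """Central-difference estimate of dy/dx, used only as a sanity check."""
    def y(t):
        return f(g(h(t)))
    return (y(x + eps) - y(x - eps)) / (2.0 * eps)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_chain_gradient(x, depth):
    """Derivative of sigmoid applied `depth` times, as a product of local derivatives."""
    grad = 1.0
    a = x
    for _ in range(depth):
        a = sigmoid(a)
        grad *= a * (1.0 - a)  # sigmoid'(z) = s(z) * (1 - s(z)), never above 0.25
    return grad

x0 = 0.7
analytic = chain_rule_derivative(x0)
numeric = finite_difference(x0)
grad_shallow = sigmoid_chain_gradient(0.5, depth=2)
grad_deep = sigmoid_chain_gradient(0.5, depth=20)

print("chain rule:", analytic, "finite difference:", numeric)
print("gradient through 2 sigmoids:", grad_shallow)
print("gradient through 20 sigmoids:", grad_deep)
```

Because every sigmoid factor is at most 0.25, the product through 20 layers is bounded by 0.25^20, which is the vanishing-gradient effect in miniature. Factors larger than 1 would make the same product grow instead, giving exploding gradients.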