
Building micrograd: Value, backward(), graph visualization (Karpathy)

Backpropagation: How a Network Learns

Introduction

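Andrej Karpathy's micrograd (2020) is a complete scalar autograd engine in roughly 150 lines of Python: about 100 lines for the engine and about 50 for a small neural-network library built on top of it. Each Value object stores its data, its grad, the set of Values it was computed from (_prev), the operation that produced it (_op), and a _backward closure that pushes its gradient to those inputs. Operators such as __add__, __mul__, __pow__ and a nonlinearity (tanh in the video lecture, relu in the repository) build the computation graph dynamically during the forward pass. The backward() method then sorts the graph topologically, seeds the output's gradient with 1, and calls each node's _backward in reverse order, propagating the gradient from the loss back to the leaves.

This lesson walks through the construction step by step:
- why gradients are accumulated with += rather than assigned with = (a Value used in several places receives one contribution per path, and the multivariate chain rule says those contributions add up);
- how the DFS-based topological sort guarantees that a node's gradient is complete before it is passed on to its inputs;
- how to visualize the graph with Graphviz;
- how the same pattern scales to PyTorch, which works on tensors instead of scalars and runs its computations in a C++/CUDA backend instead of pure Python.

The key lesson: backprop is neither magic nor a framework. It is an algorithm you can write yourself in one evening.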
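The Value class and its backward() method described above can be sketched as follows. This is a minimal sketch in the spirit of micrograd, not Karpathy's exact code: it reuses micrograd's attribute names (data, grad, _prev, _op, _backward) but implements only +, *, ** with a constant exponent, and tanh (as in the lecture), and omits relu, subtraction and division.

```python
import math


class Value:
    """A scalar node in a dynamically built computation graph (micrograd-style sketch)."""

    def __init__(self, data, _children=(), _op=""):
        self.data = float(data)
        self.grad = 0.0                   # dL/d(self), filled in by backward()
        self._backward = lambda: None     # leaves have nothing to propagate
        self._prev = set(_children)       # the Values this one was computed from
        self._op = _op                    # operation label, used for visualization

    def __repr__(self):
        return f"Value(data={self.data:.4f}, grad={self.grad:.4f})"

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other), "+")

        def _backward():
            # local derivative of a+b is 1 for both inputs;
            # += accumulates the contribution of every use of a node
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other), "*")

        def _backward():
            # d(a*b)/da = b, d(a*b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def __pow__(self, k):
        assert isinstance(k, (int, float)), "this sketch supports constant exponents only"
        out = Value(self.data ** k, (self,), f"**{k}")

        def _backward():
            self.grad += k * self.data ** (k - 1) * out.grad
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,), "tanh")

        def _backward():
            self.grad += (1 - t ** 2) * out.grad   # d tanh(x)/dx = 1 - tanh(x)^2
        out._backward = _backward
        return out

    # allow expressions like 2 + v and 3 * v
    __radd__ = __add__
    __rmul__ = __mul__

    def backward(self):
        # DFS post-order: a node is appended only after all of its inputs
        topo, visited = [], set()

        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)

        self.grad = 1.0                   # dL/dL = 1 seeds the backward pass
        for v in reversed(topo):          # every consumer runs before its inputs
            v._backward()


if __name__ == "__main__":
    a, b = Value(2.0), Value(-3.0)
    L = a * b + a          # a is used twice: inside a*b and directly
    L.backward()
    print(a.grad, b.grad)  # -2.0 2.0  (dL/da = b + 1, dL/db = a)
```

Because backward() only seeds the root and never resets the other nodes, gradients keep accumulating across calls; in a training loop every grad has to be set back to 0 before the next backward pass (micrograd's nn module does this in zero_grad()).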
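Why += and not =: the multivariate chain rule sums over every path from a node to the loss. Worked on a small expression chosen here for illustration, $L = m + a$ with $m = a\,b$, where $a$ has two consumers ($m$ and the addition that produces $L$):

```latex
\frac{\partial L}{\partial a}
  = \sum_{u \,\in\, \mathrm{consumers}(a)} \frac{\partial L}{\partial u}\,\frac{\partial u}{\partial a}
  = \underbrace{\frac{\partial L}{\partial m}\cdot\frac{\partial m}{\partial a}}_{\text{path through } m}
  + \underbrace{\frac{\partial L}{\partial L}\cdot\frac{\partial (m + a)}{\partial a}\Big|_{m\ \text{fixed}}}_{\text{direct path}}
  = 1\cdot b + 1\cdot 1
  = b + 1
```

With $a = 2$ and $b = -3$ this gives $\partial L/\partial a = -2$. Each _backward call adds exactly one term of this sum into a.grad; with = instead of +=, whichever call ran last would overwrite the other, leaving either $b$ or $1$ instead of $b + 1$.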
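The graph visualization step can be sketched without any extra dependency by writing Graphviz's DOT language directly. This is an assumption-laden sketch: the function name to_dot is hypothetical, and instead of the graphviz Python package used in Karpathy's notebook it builds the DOT text by hand. It assumes each node exposes micrograd-style attributes (data, grad, _prev, _op) and draws one record per Value plus a small node for the operation that produced it.

```python
def to_dot(root):
    """Return Graphviz DOT source for the computation graph ending at `root`.

    Assumes micrograd-style nodes with .data, .grad, ._prev (inputs) and
    ._op (operation label, empty string for leaves).
    """
    nodes, edges, seen = [], [], set()

    def walk(v):
        # DFS over inputs; id() keys avoid relying on node hashing
        if id(v) in seen:
            return
        seen.add(id(v))
        nodes.append(v)
        for child in v._prev:
            edges.append((child, v))
            walk(child)

    walk(root)

    lines = ["digraph G {", "  rankdir=LR;"]
    for v in nodes:
        uid = f"n{id(v)}"
        # record-shaped box showing the value and its gradient
        lines.append(
            f'  {uid} [shape=record, label="{{ data {v.data:.4f} | grad {v.grad:.4f} }}"];'
        )
        if v._op:
            # separate node for the operation that produced v
            lines.append(f'  {uid}_op [label="{v._op}"];')
            lines.append(f"  {uid}_op -> {uid};")
    for child, parent in edges:
        # each input feeds the operation node of the Value it helped compute
        lines.append(f"  n{id(child)} -> n{id(parent)}_op;")
    lines.append("}")
    return "\n".join(lines)
```

Writing the returned string to a file such as graph.dot and running `dot -Tsvg graph.dot -o graph.svg` renders the graph; calling to_dot on a loss before and after backward() shows the grad fields filling in from the root toward the leaves.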