Neural Networks: From Fundamentals to Modern AI · Math and tools: tensors, gradients, Python, NumPy

Gradient of a multi-variable function — an arrow on the loss map

Introduction

A derivative tells us how steep a function of a single variable is. A neural network, however, has millions of variables (its weights), and the loss depends on all of them at once. This is where the gradient comes in: a vector that collects the partial derivative of the loss with respect to each variable. Geometrically, the gradient is an arrow in parameter space that points, at the current point, in the direction in which the function grows fastest. The opposite vector points in the direction of fastest decrease, that is, "which way to go to reduce the loss". Gradient descent follows exactly this opposite arrow, step by step, walking down the loss map toward a valley. In this lesson we build intuition for the gradient on 2D functions (a height map) and show that the gradient of a network's loss has exactly as many components as the network has parameters.
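To make the picture concrete, here is a minimal NumPy sketch. The function f(x, y) = x^2 + 3y^2, the starting point, the learning rate, and all names below are assumptions chosen for illustration; they are not taken from the lesson. The sketch computes the gradient as a vector of partial derivatives, checks it against a finite-difference estimate, and then takes a few steps against the gradient.

```python
import numpy as np

def f(p):
    """Hypothetical 2D 'height map': a bowl with its minimum at (0, 0)."""
    x, y = p
    return x**2 + 3.0 * y**2

def grad_f(p):
    """Analytic gradient: the vector of partial derivatives (df/dx, df/dy)."""
    x, y = p
    return np.array([2.0 * x, 6.0 * y])

def numerical_grad(func, p, eps=1e-6):
    """Central-difference estimate, one partial derivative per component."""
    p = np.asarray(p, dtype=float)
    g = np.zeros_like(p)
    for i in range(p.size):
        step = np.zeros_like(p)
        step[i] = eps  # nudge only the i-th variable
        g[i] = (func(p + step) - func(p - step)) / (2.0 * eps)
    return g

p0 = np.array([1.0, 2.0])          # assumed starting point on the map
g = grad_f(p0)                     # arrow of fastest growth at p0: [2., 12.]
g_num = numerical_grad(f, p0)      # should agree closely with g

# The gradient has one component per variable (here 2; in a network,
# one per weight).
assert g.shape == p0.shape

# Gradient descent: repeatedly step in the direction opposite to the gradient.
lr = 0.1                           # assumed learning rate
p = p0.copy()
losses = [f(p)]
for _ in range(20):
    p = p - lr * grad_f(p)
    losses.append(f(p))

print("gradient at p0:", g)
print("numerical estimate:", g_num)
print("loss before / after:", losses[0], losses[-1])
```

With this particular function and step size, each step shrinks both coordinates, so the loss falls on every iteration. A larger learning rate could overshoot the valley, which is why the step size is treated as a tunable assumption here.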