Neural Networks: From Fundamentals to Modern AI · Math and tools: tensors, gradients, Python, NumPy

Gradient descent on a simple function — walking downhill step by step

Introduction

We already know that the gradient is an arrow pointing in the direction in which a function grows fastest, and that its opposite (−∇f) points in the direction of fastest decrease. Gradient descent is the algorithm that uses this arrow: you stand at some point, compute the gradient, take a small step in the opposite direction, and repeat until you are close enough to the bottom of the valley. In this lesson we will run the algorithm by hand on a toy function, f(x) = x² (a parabola with its minimum at x = 0). We start at x = 2, set the learning rate η = 0.1, and compute several iterations on paper. Because the derivative is f′(x) = 2x, each update is x ← x − η·2x = 0.8x, so x shrinks by the same factor of 0.8 at every iteration: 2, 1.6, 1.28, 1.024, and so on. The steps themselves get smaller near the minimum, because the gradient gets smaller there too. This is the same basic algorithm that trains neural networks; the only difference is that instead of a single variable x we have millions of weights, and instead of x² we have a loss that depends on all of those weights.
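The paper calculation can be checked with a few lines of plain Python. This is only a minimal sketch: the function names f, grad_f, and gradient_descent are invented for this illustration, and stopping after 5 steps is an arbitrary choice, not something the lesson prescribes.

```python
def f(x):
    """The toy function from the lesson: a parabola with its minimum at x = 0."""
    return x ** 2


def grad_f(x):
    """Derivative of f, worked out by hand: d/dx x^2 = 2x."""
    return 2 * x


def gradient_descent(x0, eta, steps):
    """Run `steps` updates of x <- x - eta * grad_f(x).

    Returns every visited point, starting with x0 itself.
    (Hypothetical helper, written only for this example.)
    """
    xs = [x0]
    x = x0
    for _ in range(steps):
        x = x - eta * grad_f(x)  # step against the gradient
        xs.append(x)
    return xs


# Same setup as in the text: start at x = 2 with learning rate 0.1.
trajectory = gradient_descent(x0=2.0, eta=0.1, steps=5)
for k, x in enumerate(trajectory):
    print(f"step {k}: x = {x:.5f}  f(x) = {f(x):.5f}  gradient = {grad_f(x):.5f}")
```

With these settings the printed x values should be 2.00000, 1.60000, 1.28000, 1.02400, 0.81920, and 0.65536. Each one is 0.8 times the previous one, and the distance covered per step (0.4, then 0.32, then 0.256, and so on) shrinks together with the gradient.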