Neural Networks: From Fundamentals to Modern AI · Regularization — how to avoid overfitting

Dropout: mechanism, train vs eval mode, implementation

Introduction

Dropout (Srivastava et al. 2014, "Dropout: A Simple Way to Prevent Neural Networks from Overfitting") is one of the most influential regularization techniques in deep learning. The mechanism looks trivial: during training, each activation in a dropout layer is zeroed independently with probability p, and during evaluation the layer is disabled and passes its input through unchanged. Beneath this simplicity lie several subtleties that decide whether dropout helps or breaks training: the need to rescale the surviving activations by 1/(1 - p) (inverted dropout), the difference between training and evaluation mode in PyTorch (model.train() vs model.eval()), interactions with BatchNorm, and the interpretation of a single network as an approximate average over exponentially many sub-networks. This lesson moves from intuition (preventing co-adaptation of neurons), through the math of the rescaling that keeps expected activations unchanged, to practical implementation pitfalls. Chief among them is forgetting model.eval() during validation, which leaves dropout active and shows up as seemingly random noise in the val_loss metric with no apparent cause.
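To make the train/eval distinction concrete, here is a minimal sketch of inverted dropout in plain Python (standard library only). It is an illustration of the idea, not PyTorch's implementation: the function name `inverted_dropout`, its signature, and the `training` flag are assumptions made for this example. The flag plays the role that model.train() and model.eval() play for a real dropout layer.

```python
import random


def inverted_dropout(activations, p, training, rng=random):
    """Apply inverted dropout to a list of floats.

    Hypothetical helper for illustration: p is the drop probability,
    training selects train-mode or eval-mode behaviour.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        # Eval mode: identity. No masking and no rescaling.
        return list(activations)
    scale = 1.0 / (1.0 - p)
    # Train mode: drop each unit independently with probability p and
    # scale the survivors by 1/(1-p) so the expected value is unchanged.
    return [0.0 if rng.random() < p else a * scale for a in activations]


rng = random.Random(0)  # fixed seed so the run is reproducible
x = [1.0] * 100_000

train_out = inverted_dropout(x, p=0.5, training=True, rng=rng)
eval_out = inverted_dropout(x, p=0.5, training=False, rng=rng)

train_mean = sum(train_out) / len(train_out)
eval_mean = sum(eval_out) / len(eval_out)

# In train mode every value is either 0.0 or 2.0, yet the mean stays near 1.0.
print(f"train-mode mean: {train_mean:.3f}")
# In eval mode the input comes back untouched.
print(f"eval-mode mean:  {eval_mean:.3f}")
```

Because the rescaling happens during training, evaluation needs no correction at all. That is also why leaving the layer in training mode during validation is harmful: every validation pass sees a different random mask, so the measured loss fluctuates even though the weights are fixed.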