Neural Networks: From Fundamentals to Modern AI · Convolutional Neural Networks (CNN)

Pooling, feature maps, and dimension flow through the network

Introduction

After convolution comes aggregation: pooling reduces the spatial size of a feature map without any learned parameters. The two most popular variants are **max pooling**, which takes the maximum over a 2×2 or 3×3 window, and **average pooling**, which takes the mean. Each carries a different inductive bias. Max pooling answers "did this feature appear anywhere in this window?", while average pooling answers "how strong is this feature on average across the region?"

Pooling serves several functions at once. It cuts the compute cost of later layers: a stride-2 pool halves both H and W, so the number of spatial positions, and with it the FLOPs of the next convolution, drops by a factor of about 4. It widens the receptive field of every neuron deeper in the network. It introduces partial (local) translation invariance, because a small shift of a feature inside the window often leaves the pooled value unchanged. Finally, it can act as a form of regularization.

This lesson systematically covers the math of pooling; its role in classical architectures (AlexNet, VGG); the gradual retirement of max pooling in favor of strided convolution in modern models (ResNet keeps a single max pool in its stem and downsamples between stages with strided convolutions, while ConvNeXt drops max pooling entirely); the key role of **global average pooling** (Lin et al., "Network In Network", 2013) as the bridge between the last feature map and the classifier; and the exact flow of (N, C, H, W) dimensions through a typical ImageNet network, step by step from a 3×224×224 input image to a vector of 1000 logits.
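A minimal sketch of the two pooling variants, plus global average pooling, on a single-channel 4×4 feature map stored as nested Python lists. The helper names (`pool2d`, `global_avg_pool`, `REDUCERS`) and the example values are hypothetical, chosen only for illustration. The code assumes no padding, so each output side is floor((H - k) / stride) + 1.

```python
def _mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Reduction applied to each window: max pooling vs. average pooling.
REDUCERS = {"max": max, "avg": _mean}


def pool2d(fmap, k=2, stride=2, mode="max"):
    """Pool a single-channel H x W feature map (list of lists), no padding.

    Output side length: floor((H - k) / stride) + 1.
    """
    if mode not in REDUCERS:
        raise ValueError(f"unknown mode: {mode!r}")
    reducer = REDUCERS[mode]
    h, w = len(fmap), len(fmap[0])
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Collect the k x k window whose top-left corner is (i*stride, j*stride).
            window = [fmap[i * stride + di][j * stride + dj]
                      for di in range(k) for dj in range(k)]
            row.append(reducer(window))
        out.append(row)
    return out


def global_avg_pool(fmap):
    """Global average pooling: collapse the whole H x W map to its mean."""
    return _mean([v for row in fmap for v in row])


feature_map = [
    [1, 3, 2, 0],
    [5, 6, 1, 2],
    [0, 2, 4, 4],
    [1, 1, 3, 8],
]
print(pool2d(feature_map, mode="max"))  # 2x2 window, stride 2
print(pool2d(feature_map, mode="avg"))
print(global_avg_pool(feature_map))
```

With a 2×2 window and stride 2, max pooling returns [[6, 2], [2, 8]] and average pooling returns [[3.75, 1.25], [1.0, 4.75]]; global average pooling collapses the whole map to 43 / 16 = 2.6875. Moving the 6 to any other cell of its top-left window would leave the max-pooled output unchanged, which is the local translation invariance described above.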
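The claim that pooling widens the receptive field can be checked with the standard receptive-field recursion for a stack of layers: r ← r + (k - 1)·j, then j ← j·s, where k is the kernel size, s the stride, and j the cumulative stride ("jump") between adjacent units at the layer's input. The `receptive_field` helper and the two toy stacks below are illustrative assumptions; padding is ignored because it does not change the receptive field size, and dilation is not modeled.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of one unit after a stack of layers.

    `layers` is a list of (kernel_size, stride) pairs, applied in order.
    Recursion: r += (k - 1) * jump, then jump *= s, starting from r = jump = 1.
    """
    r, jump = 1, 1
    for k, s in layers:
        r += (k - 1) * jump  # each extra kernel tap reaches `jump` input pixels further
        jump *= s            # stride multiplies the spacing between adjacent units
    return r


convs_only = [(3, 1), (3, 1), (3, 1)]         # three 3x3 convs, stride 1
with_pool = [(3, 1), (2, 2), (3, 1), (3, 1)]  # same convs, 2x2/2 pool after the first
print(receptive_field(convs_only))  # 7
print(receptive_field(with_pool))   # 12
```

Three stacked 3×3 convolutions see a 7×7 input patch. Inserting one 2×2 stride-2 pool after the first convolution doubles the jump, so the two convolutions after it grow the field twice as fast and the stack now sees a 12×12 patch.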
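The dimension flow can be traced with plain shape arithmetic, using the usual output-size formula floor((size + 2p - k) / s) + 1 for convolutions and pooling alike. This sketch assumes a ResNet-50-style layout: a 7×7 stride-2 stem convolution, a 3×3 stride-2 max pool, four residual stages ending in 256, 512, 1024 and 2048 channels, global average pooling, and a 1000-way fully connected layer. Each residual stage is collapsed into a single 3×3 step carrying the stride of its first block, and the bottleneck internals are omitted. `conv_out` and `trace_shapes` are hypothetical helpers, not library functions.

```python
def conv_out(size, k, s, p):
    """Spatial output size of a conv or pooling layer: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * p - k) // s + 1


def trace_shapes(n=1, c=3, h=224, w=224):
    """Trace (N, C, H, W) through a simplified, assumed ResNet-50-style layout."""
    shapes = [("input", (n, c, h, w))]

    # Stem: 7x7 conv, stride 2, padding 3, 64 filters.
    c, h, w = 64, conv_out(h, 7, 2, 3), conv_out(w, 7, 2, 3)
    shapes.append(("conv1 7x7/2", (n, c, h, w)))

    # Stem max pool: 3x3 window, stride 2, padding 1 (no learned parameters).
    h, w = conv_out(h, 3, 2, 1), conv_out(w, 3, 2, 1)
    shapes.append(("maxpool 3x3/2", (n, c, h, w)))

    # Residual stages, summarized as (output channels, stride of the first 3x3 conv).
    for name, out_c, stride in [("conv2_x", 256, 1), ("conv3_x", 512, 2),
                                ("conv4_x", 1024, 2), ("conv5_x", 2048, 2)]:
        c = out_c
        h, w = conv_out(h, 3, stride, 1), conv_out(w, 3, stride, 1)
        shapes.append((name, (n, c, h, w)))

    # Global average pooling: each channel's H x W map collapses to one number.
    h, w = 1, 1
    shapes.append(("global avg pool", (n, c, h, w)))

    # Flatten, then a fully connected layer to 1000 ImageNet classes.
    shapes.append(("fc -> logits", (n, 1000)))
    return shapes


for name, shape in trace_shapes():
    print(f"{name:16s} {shape}")
```

Each stride-2 step halves H and W and so divides the number of spatial positions by 4 (56×56 = 3136 down to 28×28 = 784), which is where the compute savings of downsampling come from. Global average pooling then turns the 2048×7×7 map into 2048 numbers, one per channel, regardless of the spatial size of the final map.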