Neural Networks: From Fundamentals to Modern AI · Convolutional Neural Networks (CNN)

Architecture evolution: AlexNet → VGG → ResNet — what changed and why

Introduction

Three models mark the timeline of the deep learning revolution in computer vision. AlexNet (Krizhevsky, Sutskever, Hinton 2012) won the ImageNet challenge with a top-5 error of 15.3%, against 26.2% for the runner-up, and set off the wave of interest in CNNs. VGG (Simonyan & Zisserman 2014) showed that depth matters and that stacks of small 3×3 filters outperform single large filters. ResNet (He, Zhang, Ren, Sun 2015) addressed the degradation problem of very deep networks, in which adding layers makes even the training error worse. Its skip connections made it possible to train 152 layers without that loss of learning ability.

Each of these networks answered a specific problem of its time with specific technical choices. AlexNet combined ReLU activations, dropout, data augmentation, and GPU training at scale. VGG established small filters and a deep, homogeneous architecture as a universal recipe. ResNet introduced residual learning and bottleneck blocks, and used batch normalization throughout.

This lesson examines each architecture through its numbers (layers, parameters, FLOPs, top-5 error), its key design decisions, the problems it solved, and the limitations it exposed that drove the next iteration. Understanding this lineage is the foundation for evaluating any new architecture.
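The parameter arithmetic behind two of these design choices, VGG's small filters and ResNet's bottleneck blocks, can be checked with a few lines of Python. This is a minimal sketch under stated assumptions: stride-1 convolutions, biases ignored, and a constant channel width where one is needed. The helper names (`conv_params`, `stacked_receptive_field`, `basic_block_params`, `bottleneck_params`) are hypothetical and exist only for illustration. The 256/64 channel widths follow the bottleneck configuration described in the ResNet paper.

```python
def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weight count of a k x k convolution mapping c_in to c_out channels (biases ignored)."""
    return k * k * c_in * c_out


def stacked_receptive_field(k: int, n_layers: int) -> int:
    """Receptive field (pixels per side) of n stride-1 k x k convolutions applied in sequence."""
    return 1 + n_layers * (k - 1)


def basic_block_params(c: int) -> int:
    """Main path of a basic residual block: two 3x3 convolutions at constant width c."""
    return 2 * conv_params(3, c, c)


def bottleneck_params(c_outer: int, c_inner: int) -> int:
    """Main path of a bottleneck block: 1x1 reduce -> 3x3 at reduced width -> 1x1 expand."""
    return (
        conv_params(1, c_outer, c_inner)
        + conv_params(3, c_inner, c_inner)
        + conv_params(1, c_inner, c_outer)
    )


if __name__ == "__main__":
    C = 256  # illustrative channel width, held constant across layers

    # VGG: three stacked 3x3 layers see the same 7x7 window as one 7x7 layer...
    print(stacked_receptive_field(3, 3), stacked_receptive_field(7, 1))  # 7 7
    # ...with fewer weights (27*C^2 vs 49*C^2) and two extra nonlinearities in between.
    print(3 * conv_params(3, C, C), conv_params(7, C, C))  # 1769472 3211264

    # ResNet: a bottleneck on 256 channels costs about as much as a basic block on 64.
    print(basic_block_params(256), bottleneck_params(256, 64), basic_block_params(64))
    # 1179648 69632 73728
```

Three 3×3 layers cover the same 7×7 window as one 7×7 layer with 27C² instead of 49C² weights. The 256-channel bottleneck costs roughly as much as a basic block on only 64 channels, which is why the deeper ResNet variants (50 layers and up) use bottleneck blocks.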