Neural Networks: From Fundamentals to Modern AI · Convolutional Neural Networks (CNN)
Skip connections and residual blocks: solving the degradation problem (He et al., 2015)
Introduction
A skip connection (also called a residual connection or shortcut) is a simple architectural element: a direct link that adds the input of a group of layers to its output, y = F(x) + x. It looks trivial, yet it is one of the most important inventions in deep learning. This lesson covers the history and the mechanics:

(1) the **degradation problem** observed by He et al. (2015): deeper plain networks had higher training error than shallower ones. Overfitting cannot explain this, because it would lower training error, and He et al. argue that vanishing gradients are unlikely to be the cause either, since the networks were trained with batch normalization;
(2) the **residual insight**: instead of making the layers fit the target mapping y directly, let them fit the residual F(x) = y − x, the difference from the identity. If the identity is optimal, the network only has to drive F toward zero;
(3) **block architecture** in three variants (basic, bottleneck, pre-activation) and when to use each;
(4) **identity vs projection shortcuts** for when dimensions change;
(5) **gradient flow**: in ∂L/∂x = ∂L/∂y · (1 + ∂F/∂x), the constant 1 gives the gradient a direct path around F, so it does not have to pass only through the layers' own derivatives;
(6) **skip connections in other architectures**: Transformer, U-Net, DenseNet, and Highway Networks, a gated variant that predates ResNet.

Without skip connections, plain networks much deeper than VGG-19 become markedly harder to optimize.
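To make the gradient formula concrete, here is a minimal sketch in plain Python. It is a scalar toy model, not a real network: the residual branch F(x) = w · tanh(x), the function names, the weight value 0.1 and the depth of 20 blocks are all illustrative assumptions. Each residual block contributes a chain-rule factor (1 + ∂F/∂x), while each plain block contributes only ∂F/∂x.

```python
import math


def f(x, w):
    """Residual branch F(x) = w * tanh(x) (an illustrative choice)."""
    return w * math.tanh(x)


def df_dx(x, w):
    """Derivative of the branch: dF/dx = w * (1 - tanh(x)^2)."""
    return w * (1.0 - math.tanh(x) ** 2)


def stack_gradient(x, weights, residual):
    """Return (output, d output / d input) for a stack of scalar blocks.

    residual=True  -> each block computes y = F(x) + x
    residual=False -> each block computes y = F(x)  (plain stack)
    """
    grad = 1.0
    for w in weights:
        local = df_dx(x, w)
        if residual:
            grad *= 1.0 + local  # chain-rule factor (1 + dF/dx)
            x = f(x, w) + x
        else:
            grad *= local        # chain-rule factor dF/dx only
            x = f(x, w)
    return x, grad


weights = [0.1] * 20  # 20 blocks with small weights (assumed values)
_, plain_grad = stack_gradient(0.5, weights, residual=False)
_, res_grad = stack_gradient(0.5, weights, residual=True)

print(f"plain stack gradient:    {plain_grad:.3e}")
print(f"residual stack gradient: {res_grad:.3e}")
```

With these small weights the plain product shrinks by at least a factor of ten per block, while every residual factor stays above 1. In a real network ∂F/∂x is a Jacobian and the 1 is the identity matrix, so the sum is not guaranteed to be far from zero; the sketch only shows why the identity path makes collapse much less likely.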