Neural Networks: From Fundamentals to Modern AI · Convolutional Neural Networks (CNN)

Transfer learning — feature extraction vs fine-tuning (how to leverage ImageNet)

Introduction

Transfer learning is the standard practical approach for most real-world vision tasks: instead of training a CNN from scratch on a small dataset (typically hundreds to a few thousand images), we take a network pretrained on ImageNet (1.28M training images, 1000 classes) and adapt it to the new task. It works because the lower and middle layers of a CNN learn largely **universal** visual features (edges, textures, shapes, object parts) that do not depend on specific classes, so they remain useful for dog breeds, skin cancer, satellite imagery, or PCB inspection.

This lesson covers:

(1) **Two modes**: feature extraction (the pretrained network stays frozen and only a new classifier is trained) vs fine-tuning (part or all of the network is unfrozen and retrained).
(2) **Decision matrix**: when to use which mode, based on data size × domain similarity (Yosinski et al. 2014).
(3) **Discriminative learning rates**: a lower learning rate for earlier layers (Howard & Ruder, ULMFiT, 2018).
(4) **Gradual unfreezing**: unfreezing layers one at a time, starting from the layers closest to the output.
(5) **BatchNorm pitfalls**: whether frozen BatchNorm layers run in eval mode or train mode.
(6) **Catastrophic forgetting** vs negative transfer.
(7) **ImageNet alternatives**: DINOv2, CLIP, and models pretrained on specific domains (medical, satellite).

Without transfer learning, many computer vision projects with small datasets and modest budgets would be impractical.
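To make items (3) and (4) concrete before they are covered in detail, here is a minimal, framework-free sketch. The function names `discriminative_lrs` and `gradual_unfreezing`, and the idea of splitting the network into ordered "layer groups", are illustrative assumptions rather than part of any library. The default decay factor of 2.6 is the value reported in the ULMFiT paper for language models; for a CNN it should be treated as a tunable hyperparameter.

```python
from __future__ import annotations


def discriminative_lrs(base_lr: float, num_groups: int, decay: float = 2.6) -> list[float]:
    """Return one learning rate per layer group, earliest group first.

    The last group (the new classifier head) gets base_lr; each earlier
    group gets the rate of the group after it divided by `decay`.
    """
    return [base_lr / decay ** (num_groups - 1 - i) for i in range(num_groups)]


def gradual_unfreezing(num_groups: int, epoch: int) -> list[bool]:
    """Return a trainable flag per layer group, earliest group first.

    Epoch 0 trains only the last group (equivalent to feature extraction);
    each following epoch unfreezes one more group, moving toward the input,
    until every group is trainable (full fine-tuning).
    """
    n_trainable = min(num_groups, epoch + 1)
    return [i >= num_groups - n_trainable for i in range(num_groups)]


# Hypothetical split of a pretrained CNN into 4 groups:
# [early conv blocks, middle conv blocks, late conv blocks, classifier head]
lrs = discriminative_lrs(base_lr=1e-3, num_groups=4)
for epoch in range(5):
    flags = gradual_unfreezing(num_groups=4, epoch=epoch)
    active = [f"{lr:.2e}" if on else "frozen" for lr, on in zip(lrs, flags)]
    print(f"epoch {epoch}: {active}")
```

In a real training loop, the flags would typically map to setting `requires_grad` on each group's parameters and the rates to per-group optimizer settings. The schedule also shows how the two modes relate: the epoch 0 state is feature extraction, and the final state is full fine-tuning.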