Machine Learning · Overfitting, Underfitting, and Regularization

L1 and L2 regularization — Lasso, Ridge, and Elastic Net

Introduction

L1 (Lasso) and L2 (Ridge) regularization are the two canonical tools for controlling model capacity. Both add a weight-dependent penalty to the cost function, but their geometry produces very different effects: L2 shrinks the weights proportionally toward zero without eliminating them, whereas L1 sets some weights exactly to zero and therefore performs feature selection. This lesson derives both formulas and discusses the closed-form Ridge solution, the geometry of L1 (the corners of the ||·||₁ ball that generate sparsity), Elastic Net as a hybrid of the two penalties, the effect of regularization on the bias-variance trade-off, the necessity of standardizing features, and the choice of λ via cross-validation. We build on Tibshirani 1996 ("Regression shrinkage and selection via the Lasso"), Hoerl & Kennard 1970 (Ridge), Zou & Hastie 2005 (Elastic Net), and Hastie, Tibshirani & Wainwright 2015 ("Statistical Learning with Sparsity").
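The contrast between shrinking and zeroing can be made concrete in a simplified special case. The sketch below assumes an orthonormal design (XᵀX = I), where the penalized solutions have well-known coordinate-wise forms: with the objective (1/2)||y − Xw||² + (λ/2)||w||², Ridge rescales each least-squares weight by 1/(1 + λ), and with (1/2)||y − Xw||² + λ||w||₁, Lasso soft-thresholds each weight at λ. The function names, the example weights, and the value of λ are illustrative assumptions, not part of the lesson.

```python
def ridge_shrink(w_ols, lam):
    """Ridge under an assumed orthonormal design: every weight is scaled by 1/(1 + lam)."""
    return [w / (1.0 + lam) for w in w_ols]


def lasso_soft_threshold(w_ols, lam):
    """Lasso under an assumed orthonormal design: soft-thresholding at lam."""
    out = []
    for w in w_ols:
        if w > lam:
            out.append(w - lam)   # large positive weight: pulled down by lam
        elif w < -lam:
            out.append(w + lam)   # large negative weight: pulled up by lam
        else:
            out.append(0.0)       # |w| <= lam: weight is set exactly to zero
    return out


# Hypothetical least-squares weights and penalty strength, chosen for illustration.
w_ols = [3.0, -1.5, 0.4, -0.2]
lam = 0.5

ridge = ridge_shrink(w_ols, lam)
lasso = lasso_soft_threshold(w_ols, lam)

print("OLS:  ", w_ols)
print("Ridge:", ridge)
print("Lasso:", lasso)
```

In this toy setting, Ridge keeps all four weights nonzero while reducing each by the same factor, whereas Lasso removes the two weights whose magnitude does not exceed λ. With correlated features the solutions no longer decouple per coordinate, so this is only a sketch of the qualitative behavior.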