Underfitting and overfitting — diagnosing model fit

Overfitting, Underfitting, and Regularization

Introduction

Underfitting and overfitting are the two poles of generalization error. An underfit model is too simple to capture the structure of the data, so its error is high on both the training and test sets. An overfit model has fitted noise in the training data, so its training error is near zero while its test error is high. This lesson shows how to recognize both phenomena (learning curves, validation curves), where they come from (model capacity, dataset size, signal-to-noise ratio), and what the standard remedies are (more data, regularization, early stopping, a simpler model). We build on two classic works: Geman, Bienenstock & Doursat (1992) and Hastie, Tibshirani & Friedman, "The Elements of Statistical Learning" (2009).
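To make the two regimes concrete, here is a minimal sketch of a validation curve using only the Python standard library. It is an illustration chosen for this note, not the lesson's own code: the synthetic target `sin(2*pi*x)`, the noise level `0.3`, the sample sizes, and the helper names `make_data`, `knn_predict`, and `mse` are all assumptions. A k-nearest-neighbors regressor stands in for "a model with adjustable capacity": small `k` means high capacity, large `k` means low capacity.

```python
import math
import random


def make_data(n, rng, noise_std=0.3):
    """Draw n points from y = sin(2*pi*x) + Gaussian noise (synthetic, assumed)."""
    xs = [rng.random() for _ in range(n)]
    ys = [math.sin(2.0 * math.pi * x) + rng.gauss(0.0, noise_std) for x in xs]
    return xs, ys


def knn_predict(x_query, xs_train, ys_train, k):
    """Average the targets of the k training points nearest to x_query."""
    order = sorted(range(len(xs_train)), key=lambda i: abs(xs_train[i] - x_query))
    nearest = order[:k]
    return sum(ys_train[i] for i in nearest) / k


def mse(xs, ys, xs_train, ys_train, k):
    """Mean squared error of the k-NN predictor on the points (xs, ys)."""
    errors = [(knn_predict(x, xs_train, ys_train, k) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


rng = random.Random(0)  # fixed seed so the run is repeatable
x_train, y_train = make_data(100, rng)
x_test, y_test = make_data(1000, rng)

# Sweep capacity: k=1 memorizes the training set, k=100 predicts the global mean.
curve = {}
for k in (1, 10, 100):
    train_err = mse(x_train, y_train, x_train, y_train, k)
    test_err = mse(x_test, y_test, x_train, y_train, k)
    curve[k] = (train_err, test_err)
    print(f"k={k:3d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

With `k=1` each training point is its own nearest neighbor, so the training error is zero while the test error is not: the overfitting signature. With `k=100` (all training points) the model predicts a constant, so both errors are high: the underfitting signature. An intermediate `k` should give the lowest test error of the three. The exact numbers depend on the seed and the assumed noise level.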