Machine Learning · Ensembles and Model Selection

Gradient Boosting and XGBoost — sequential error correction

Introduction

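Gradient boosting (Friedman 2001) is a sequential ensemble method. It builds a strong model F(x) = F₀ + η Σₜ hₜ(x) as a sum of weak learners (typically shallow trees, depth 3–8), where F₀ is a constant initial prediction and each hₜ is fit to the negative gradient of the loss with respect to the model output, evaluated at F_{t−1}. For squared-error loss L = ½(y − F)², the negative gradient is exactly the residual y − F_{t−1}, so each new tree is trained on the error left by the whole ensemble built so far, not only by the previous tree; hence the intuition "boosting corrects errors". The learning rate η (typically 0.01–0.1) shrinks each tree's contribution, which acts as regularization and reduces overfitting at the cost of needing more trees.

XGBoost (Chen & Guestrin 2016) adds a second-order Taylor expansion of the loss (Newton boosting), L1 and L2 regularization on leaf weights together with a penalty on the number of leaves, sparsity-aware split finding that learns a default direction for missing values (this is what provides native NaN handling), and split search parallelized across features. LightGBM (Ke et al. 2017) and CatBoost (Prokhorenkova et al. 2018) go further with histogram-based split finding and ordered boosting, respectively. This family is among the most frequently used in winning Kaggle solutions on tabular data.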
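The residual-fitting loop described above can be sketched in a few lines. What follows is a minimal illustration, not any library's implementation: it assumes squared-error loss, a single numeric feature, and depth-1 trees (stumps) as weak learners. The names `fit_stump`, `fit_gbm`, and `predict_gbm` are hypothetical helpers introduced only for this example.

```python
import math
from typing import List, Tuple

# Hypothetical weak learner: (threshold, value if x <= threshold, value otherwise)
Stump = Tuple[float, float, float]


def mse(y: List[float], pred: List[float]) -> float:
    return sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y)


def fit_stump(x: List[float], r: List[float]) -> Stump:
    """Fit a depth-1 regression tree to targets r by least squares."""
    mean_r = sum(r) / len(r)
    # Fallback when no split helps: constant prediction (threshold = +inf)
    best: Stump = (float("inf"), mean_r, mean_r)
    best_sse = sum((ri - mean_r) ** 2 for ri in r)
    xs = sorted(set(x))
    # Candidate thresholds are midpoints between consecutive distinct values
    for lo, hi in zip(xs, xs[1:]):
        thr = (lo + hi) / 2.0
        left = [ri for xi, ri in zip(x, r) if xi <= thr]
        right = [ri for xi, ri in zip(x, r) if xi > thr]
        lv = sum(left) / len(left)
        rv = sum(right) / len(right)
        sse = sum((ri - lv) ** 2 for ri in left) + sum((ri - rv) ** 2 for ri in right)
        if sse < best_sse:
            best_sse, best = sse, (thr, lv, rv)
    return best


def predict_stump(stump: Stump, xi: float) -> float:
    thr, lv, rv = stump
    return lv if xi <= thr else rv


def fit_gbm(x: List[float], y: List[float], n_trees: int = 50, eta: float = 0.1):
    """Gradient boosting for squared error: each stump is fit to the residuals."""
    f0 = sum(y) / len(y)              # F_0: the constant that minimizes MSE
    pred = [f0] * len(y)
    stumps: List[Stump] = []
    losses = [mse(y, pred)]           # training loss before any tree is added
    for _ in range(n_trees):
        # Negative gradient of 1/2 * (y - F)^2 with respect to F is y - F
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        # Shrink the new tree's contribution by the learning rate eta
        pred = [pi + eta * predict_stump(stump, xi) for xi, pi in zip(x, pred)]
        losses.append(mse(y, pred))
    return f0, stumps, losses


def predict_gbm(f0: float, stumps: List[Stump], eta: float, xi: float) -> float:
    return f0 + eta * sum(predict_stump(s, xi) for s in stumps)


# Toy usage: fit a noiseless sine curve and report the training loss
x = [i / 10 for i in range(40)]
y = [math.sin(xi) for xi in x]
f0, stumps, losses = fit_gbm(x, y, n_trees=50, eta=0.1)
print(f"train MSE before boosting: {losses[0]:.4f}, after 50 stumps: {losses[-1]:.4f}")
```

With least-squares stumps and η in (0, 1], the training loss in this sketch cannot increase from one round to the next, which is why a small η trades slower progress for a smoother fit. Real libraries differ in many respects (multiple features, deeper trees, subsampling, second-order updates), so this should be read only as an illustration of the core loop.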