Machine Learning · Data and Preparation
Numerical features and normalization
Introduction
This lesson explains why scaling numerical features matters for distance- and gradient-based algorithms (kNN, SVM, logistic regression, neural nets) but not for decision trees, whose splits depend only on the ordering of values within each feature. It covers the scikit-learn scaler family (StandardScaler, MinMaxScaler, RobustScaler, MaxAbsScaler, and Normalizer, which rescales each sample rather than each feature), distribution transforms (log, Box-Cox, Yeo-Johnson, QuantileTransformer), and the "fit on train only" discipline, enforced by wrapping preprocessing in ColumnTransformer and Pipeline.
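As a preview of the "fit on train only" discipline, here is a minimal sketch rather than a definitive recipe. The synthetic data, the column roles, the step names, and the choice of kNN as the downstream model are all assumptions made for illustration. Because the scaler and the power transform live inside a Pipeline, their statistics are learned only from the rows passed to `fit`, and the test split is transformed with those same statistics.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PowerTransformer, StandardScaler

# Synthetic data (assumed for illustration): column 0 is roughly symmetric,
# column 1 is right-skewed and lives on a larger scale.
rng = np.random.default_rng(0)
n = 200
x0 = rng.normal(loc=0.0, scale=1.0, size=n)
x1 = rng.lognormal(mean=3.0, sigma=1.0, size=n)
X = np.column_stack([x0, x1])
y = (x0 + np.log(x1) > 3.0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Each column gets its own transform: standardize column 0,
# apply Yeo-Johnson (which also standardizes by default) to column 1.
preprocess = ColumnTransformer([
    ("standardize", StandardScaler(), [0]),
    ("yeo_johnson", PowerTransformer(method="yeo-johnson"), [1]),
])

# kNN is distance-based, so it is sensitive to feature scale.
model = Pipeline([
    ("preprocess", preprocess),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
])

# fit() learns the scaling statistics from the training rows only;
# score() reuses them to transform the test rows before predicting.
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)

# The fitted scaler's mean is the training-split mean, not the full-data mean.
fitted_scaler = model.named_steps["preprocess"].named_transformers_["standardize"]
train_mean_col0 = fitted_scaler.mean_[0]
```

Calling `StandardScaler().fit(X)` on the full array before splitting would let test-set statistics leak into training; keeping the transforms inside the pipeline avoids that by construction, including inside cross-validation.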