Machine Learning · Data and Preparation

Numerical features and normalization

Introduction

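This lesson explains why scaling numerical features matters for distance- and gradient-based algorithms (kNN, SVM, logistic regression, neural nets) but has little effect on decision trees, whose splits depend only on the ordering of values within each feature. It covers the scikit-learn scaler family (StandardScaler, MinMaxScaler, RobustScaler, MaxAbsScaler, and Normalizer, which rescales each sample rather than each feature), distribution transforms (log, Box-Cox, Yeo-Johnson, QuantileTransformer), and the "fit on train only" discipline: preprocessing statistics are learned from the training split alone, which ColumnTransformer and Pipeline enforce, so that no information from validation or test data leaks into the model.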
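As a preview of the "fit on train only" discipline, here is a minimal sketch rather than a definitive recipe. The synthetic data, the column indices, the step names, and the choice of LogisticRegression are all assumptions made for illustration; only the scikit-learn classes themselves come from the lesson.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PowerTransformer, StandardScaler

# Hypothetical data: column 0 is roughly symmetric, column 1 is right-skewed.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.normal(50.0, 10.0, size=n),
    rng.lognormal(mean=3.0, sigma=1.0, size=n),
])
y = (X[:, 0] + 5.0 * np.log(X[:, 1]) > 65.0).astype(int)

# Split first, so the test rows never influence any fitted statistic.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Per-column preprocessing: standardize column 0, Yeo-Johnson on column 1.
preprocess = ColumnTransformer([
    ("standard", StandardScaler(), [0]),
    ("yeo_johnson", PowerTransformer(method="yeo-johnson"), [1]),
])

# The pipeline fits the transformers and the classifier on training data only.
model = Pipeline([
    ("preprocess", preprocess),
    ("classifier", LogisticRegression()),
])
model.fit(X_train, y_train)

# At scoring time the test rows are transformed with the training statistics.
test_accuracy = model.score(X_test, y_test)

# The fitted scaler's mean is the training-split mean of column 0.
fitted_scaler = model.named_steps["preprocess"].named_transformers_["standard"]
```

Because the scalers live inside the Pipeline, the same object can be handed to cross-validation utilities such as cross_val_score, which refit the preprocessing within each training fold instead of reusing statistics computed on the full dataset.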