Machine Learning · Data and Preparation
Categorical features and encoding
Introduction
Most ML models, with the exception of some tree-based implementations, operate on numbers, so categorical features must be encoded before training. This lesson lays out a family of techniques: One-Hot, Ordinal, Target/Mean Encoding (computed out-of-fold (OOF) with K-fold splits to avoid target leakage), Frequency, the Hashing Trick, Binary, Weight of Evidence (WoE), and cyclical sin/cos encoding. We cover when each technique fits, how to deal with high cardinality and rare categories, and why CatBoost and LightGBM accept categorical features natively.
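To make two of these ideas concrete before the details, here is a minimal sketch in plain Python (standard library only). It shows out-of-fold target encoding, where each row's encoding is computed from the other folds so that the row's own target never enters it, and cyclical sin/cos encoding. The function names `oof_target_encode` and `cyclical_encode` are hypothetical. The additive smoothing toward the training-fold mean and the fold assignment by row index (instead of a shuffled K-fold split) are simplifying assumptions made for illustration, not something the lesson prescribes.

```python
import math
from collections import defaultdict


def oof_target_encode(categories, targets, n_folds=5, smoothing=1.0):
    """Out-of-fold mean target encoding (illustrative sketch).

    Assumption: row i belongs to fold i % n_folds. A real pipeline would
    usually shuffle or stratify the folds instead.
    """
    n = len(categories)
    if n != len(targets):
        raise ValueError("categories and targets must have the same length")
    if n_folds < 2 or n < n_folds:
        raise ValueError("need at least 2 folds and at least one row per fold")

    encoded = [0.0] * n
    for fold in range(n_folds):
        # Collect per-category statistics from every row outside this fold.
        sums = defaultdict(float)
        counts = defaultdict(int)
        train_total = 0.0
        train_n = 0
        for i in range(n):
            if i % n_folds != fold:
                sums[categories[i]] += targets[i]
                counts[categories[i]] += 1
                train_total += targets[i]
                train_n += 1
        prior = train_total / train_n  # mean target of the training folds

        # Encode the held-out rows using only those training-fold statistics.
        for i in range(n):
            if i % n_folds == fold:
                cat = categories[i]
                denom = counts.get(cat, 0) + smoothing
                if denom == 0:
                    # Category unseen in the training folds and no smoothing:
                    # fall back to the prior.
                    encoded[i] = prior
                else:
                    encoded[i] = (sums.get(cat, 0.0) + smoothing * prior) / denom
    return encoded


def cyclical_encode(value, period):
    """Map a periodic value (hour of day, month, ...) onto the unit circle."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


if __name__ == "__main__":
    cats = ["a", "a", "b", "b", "a", "b"]
    ys = [1, 0, 1, 1, 1, 0]
    print(oof_target_encode(cats, ys, n_folds=2, smoothing=1.0))
    print(cyclical_encode(23, 24), cyclical_encode(0, 24))
```

With cyclical encoding, hour 23 and hour 0 end up close together on the circle, which a plain ordinal code (23 vs. 0) would not capture.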