Machine Learning · Data and Preparation
Missing data and cleaning
Introduction
Missing data is the rule, not the exception. This lesson covers the terminology for missingness mechanisms (MCAR, MAR, MNAR), handling strategies (deletion, imputation, missing-value flags, native handling in tree-based models), sklearn tools (SimpleImputer, KNNImputer, IterativeImputer/MICE), and common cleaning operations: duplicates, text inconsistency, sentinel codes, date parsing, and units. Throughout, it applies the "fit on train only" discipline: imputation statistics are learned from the training split alone, so nothing from the validation or test data leaks into them.
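As a quick illustration of the "fit on train only" discipline, here is a minimal sketch using SimpleImputer. The toy arrays and the choice of median imputation are assumptions made for this example, not part of the lesson's data. The imputer learns its statistic from the training split and then reuses it unchanged on the test split; `add_indicator=True` adds a missing-flag column alongside the imputed values.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy data, made up for illustration: a single feature with gaps.
X_train = np.array([[1.0], [2.0], [np.nan], [3.0]])
X_test = np.array([[np.nan], [10.0]])

# Median imputation plus a binary "was missing" flag column.
imputer = SimpleImputer(strategy="median", add_indicator=True)

# Learn the median from the training split only, then apply it to both splits.
X_train_imp = imputer.fit_transform(X_train)
X_test_imp = imputer.transform(X_test)  # reuses the train median; no refit

print(imputer.statistics_)  # median learned from the observed train values
print(X_test_imp)           # column 0: imputed feature, column 1: missing flag
```

Calling `fit` or `fit_transform` on the test split instead would recompute the median with the test value 10.0 included, which is exactly the leak the discipline is meant to prevent.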