Machine Learning · Data and Preparation

Missing data and cleaning

Introduction

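Missing data is the rule, not the exception. This lesson covers the terminology of missingness mechanisms (MCAR, missing completely at random; MAR, missing at random; MNAR, missing not at random), handling strategies (deletion, imputation, missing-value flags, native handling in tree-based models), the scikit-learn tools (SimpleImputer, KNNImputer, and IterativeImputer, a MICE-style imputer), and common cleaning operations: removing duplicates, fixing inconsistent text, replacing sentinel codes, parsing dates, and normalizing units. Throughout, the "fit on train only" discipline applies: imputation statistics are computed on the training set alone, so no information from the test set leaks into them.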
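As a minimal sketch of the "fit on train only" discipline, the example below fits a SimpleImputer on a training split and reuses the learned medians on a test split. The arrays X_train and X_test are made-up illustration data, and the choice of the median strategy with add_indicator=True (which appends missing-value flag columns) is an assumption, not something the lesson prescribes.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical toy data: two numeric features with some missing values.
X_train = np.array([
    [1.0, 10.0],
    [2.0, np.nan],
    [3.0, 30.0],
    [np.nan, 50.0],
])
X_test = np.array([
    [np.nan, 20.0],
    [100.0, np.nan],
])

# add_indicator=True appends one 0/1 flag column per feature that had
# missing values during fit, so the model can still see "was missing".
imputer = SimpleImputer(strategy="median", add_indicator=True)

# Medians are learned from the training split only.
X_train_imp = imputer.fit_transform(X_train)

# The test split is only transformed: it reuses the training medians,
# so the extreme test value 100.0 never influences the fill values.
X_test_imp = imputer.transform(X_test)

print(imputer.statistics_)  # per-feature medians learned on train
print(X_test_imp)
```

Calling fit_transform on the full dataset before splitting would let test rows shift the medians, which is the leak this discipline is meant to prevent. Wrapping the imputer in a sklearn Pipeline is one common way to enforce the same order during cross-validation.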