Machine Learning · Data and Preparation
Exploratory Data Analysis (EDA) — first contact with the dataset
Introduction
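EDA, a term popularized by John Tukey's 1977 book Exploratory Data Analysis, is the systematic study of a dataset before any modeling: its shape, distributions, relationships, anomalies, and quality. This lesson walks through a typical pandas pipeline (head, info, describe), the standard visualizations (histogram, boxplot, scatter plot, pairplot), correlation measures (Pearson, which captures linear association, versus Spearman, which captures any monotonic one), outlier detection with the IQR rule (values below Q1 − 1.5·IQR or above Q3 + 1.5·IQR are flagged), and the train/test discipline. EDA is performed on the training set only, so that nothing learned from the test set leaks into modeling decisions.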
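A minimal sketch of that pipeline, assuming pandas and NumPy are available. The helper names (`split_train_test`, `first_look`, `iqr_fences`) and the toy data (`y = exp(x)`) are hypothetical, chosen only for illustration. The split happens first, and the IQR fences are fitted on the training set and then reused unchanged on the test set. Spearman is computed as Pearson on the ranks, which is its definition. The plotting step is left out so the sketch needs nothing beyond pandas and NumPy.

```python
import numpy as np
import pandas as pd


def split_train_test(df: pd.DataFrame, test_frac: float = 0.2, seed: int = 0):
    """Random hold-out split; all EDA below looks only at the training part."""
    test = df.sample(frac=test_frac, random_state=seed)
    train = df.drop(test.index)
    return train, test


def first_look(df: pd.DataFrame) -> None:
    """The usual first-contact trio."""
    print(df.head())      # first rows: do the values look sane?
    df.info()             # dtypes, non-null counts, memory usage (printed directly)
    print(df.describe())  # count, mean, std, min, quartiles, max per numeric column


def iqr_fences(s: pd.Series, k: float = 1.5):
    """Tukey fences: values outside [Q1 - k*IQR, Q3 + k*IQR] count as outliers."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical toy data: y is a monotonic but strongly nonlinear function of x.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 5.0, size=200)
df = pd.DataFrame({"x": x, "y": np.exp(x)})

# Split first, then explore only the training set.
train, test = split_train_test(df)
first_look(train)

# Pearson measures linear association; Spearman is Pearson on the ranks,
# so it rates any monotonic relationship as perfect.
pearson = train["x"].corr(train["y"])  # default method is "pearson"
spearman = train["x"].rank().corr(train["y"].rank())
print(f"Pearson={pearson:.3f}  Spearman={spearman:.3f}")

# Fences are fitted on the training set only, then applied as-is to the test set.
low, high = iqr_fences(train["y"])
train_outliers = ~train["y"].between(low, high)
test_outliers = ~test["y"].between(low, high)
print(f"IQR outliers: train={int(train_outliers.sum())}, test={int(test_outliers.sum())}")
```

On this data Spearman comes out at 1 while Pearson is noticeably lower, because the relationship is monotonic but far from linear. Refitting the fences on the test set would quietly let test-set statistics shape the preprocessing, which is exactly the leakage the train/test discipline is meant to prevent.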