Machine Learning · Data and Preparation

Feature engineering

Introduction

Feature engineering is the deliberate construction of new variables that expose structure in the data to the model. This lesson covers classical techniques (polynomial features, interactions, ratios, bucketing), domain-specific techniques (dates, geography, text, and time series with lag and rolling features), per-group aggregations, frequency and target encoding with K-fold discipline, PCA for dimensionality reduction, automated feature generation (Featuretools and its Deep Feature Synthesis, DFS), the curse of dimensionality, and reproducibility via the scikit-learn Pipeline. The guiding rule: feature engineering must be part of the pipeline and fit on the training data only, to prevent leakage.
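The guiding rule can be illustrated with a minimal sketch, assuming scikit-learn is installed. The synthetic dataset, the step names ("poly", "scale", "model"), and the choice of logistic regression are illustrative assumptions, not part of the lesson. The point is only that every transformer sits inside the Pipeline, so its statistics are learned from the training split alone.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data stands in for a real dataset (illustrative assumption).
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Feature engineering lives inside the pipeline, next to the model.
pipe = Pipeline([
    ("poly", PolynomialFeatures(degree=2, include_bias=False)),  # squares and interactions
    ("scale", StandardScaler()),                                 # mean/std learned in fit()
    ("model", LogisticRegression(max_iter=1000)),
])

# fit() sees the training split only; the test split is transformed
# later with the statistics learned here, so nothing leaks from test to train.
pipe.fit(X_train, y_train)
test_accuracy = pipe.score(X_test, y_test)

# The scaler's means equal the means of the engineered TRAIN features.
scaler_means = pipe.named_steps["scale"].mean_
train_poly = pipe.named_steps["poly"].transform(X_train)
```

Because the whole object is a single estimator, passing `pipe` to a cross-validation helper such as `cross_val_score` refits the transformers inside each fold, which is what keeps the fold-level evaluation free of leakage.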