Machine Learning · Ensembles and Model Selection

Feature importance — interpreting ensemble models

Introduction

Feature importance tries to answer the question "which features influence the model's predictions most?". Four methods are in common use:

(1) MDI (Mean Decrease in Impurity, called "Gini importance" in sklearn): the impurity decrease at every node that splits on the feature, weighted by the share of observations reaching that node and averaged over the trees. Fast, because it is a by-product of training, but biased toward high-cardinality and continuous features (Strobl et al. 2007).
(2) Permutation importance: the mean drop in a metric (e.g. accuracy) when the values of one feature are randomly shuffled on the validation set. Slower, but model-agnostic and free of the cardinality bias.
(3) SHAP values (Lundberg & Lee 2017): Shapley values from game theory, which decompose each individual prediction additively into per-feature contributions. Lundberg & Lee show that this is the only additive feature attribution method satisfying local accuracy, missingness, and consistency.
(4) Partial Dependence Plots: show the marginal effect of a feature on the prediction, averaged over the other features.

Each method answers a different question and has its own interpretation pitfalls.
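
To make the difference between a global and a per-observation answer concrete, here is a minimal standard-library sketch of two of the methods above: permutation importance and exact Shapley values. Everything in it is an assumption made for illustration: the toy `model`, the synthetic data, the helper names, the use of MSE (so importance is an increase in error rather than a drop in accuracy), and the choice to replace "absent" features with values from background rows. It is not how sklearn or the shap library implement these methods; in particular, exact Shapley enumeration is only feasible for a handful of features.

```python
import random
from itertools import combinations
from math import factorial

def model(x):
    # Hypothetical "fitted" model: strong effect of x[0], weak of x[1], none of x[2].
    return 3.0 * x[0] + 1.0 * x[1] + 0.0 * x[2]

def mse(predict, X, y):
    return sum((predict(row) - t) ** 2 for row, t in zip(X, y)) / len(X)

def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean increase in MSE when a single column of X is shuffled."""
    rng = random.Random(seed)
    baseline = mse(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # breaks the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(predict, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

def shapley_values(predict, x, background):
    """Exact Shapley values for one observation x (exponential in the feature count)."""
    n = len(x)

    def value(subset):
        # Mean prediction with features in `subset` fixed to x and the
        # remaining features taken from each background row (an assumption).
        total = 0.0
        for b in background:
            z = [x[i] if i in subset else b[i] for i in range(n)]
            total += predict(z)
        return total / len(background)

    phi = []
    for i in range(n):
        others = [k for k in range(n) if k != i]
        contribution = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for S in combinations(others, size):
                contribution += weight * (value(set(S) | {i}) - value(set(S)))
        phi.append(contribution)
    return phi

# Synthetic data: three standard-normal features, target = model output + noise.
rng = random.Random(42)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [model(row) + rng.gauss(0, 0.1) for row in X]

perm_imp = permutation_importance(model, X, y)   # one number per feature (global)

background = X[:50]
x0 = X[0]
phi = shapley_values(model, x0, background)      # one number per feature for x0 (local)
base_value = sum(model(b) for b in background) / len(background)

print("permutation importance:", perm_imp)
print("Shapley values for x0: ", phi)
print("local accuracy check:  ", sum(phi), "vs", model(x0) - base_value)
```

Permutation importance returns one score per feature for the whole dataset, whereas the Shapley values describe a single observation and sum to the gap between its prediction and the average background prediction, which is the local accuracy property mentioned in (3).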