AutoML

2013 · Active · Published: 17 May 2026 · Updated: 17 May 2026

How it works

AutoML defines a search space (models, pipelines, hyperparameters, architectures) and searches it with an optimization procedure: Bayesian (SMAC, BOHB), evolutionary (TPOT), bandit-based (Hyperband, ASHA), reinforcement-learning-based (NAS-RL), or gradient-based (DARTS). Meta-learning reuses results from previous experiments (e.g. configuration portfolios) to warm-start the search on new tasks. Cross-validation makes candidate scores more reliable, while stacking and ensembling combine the best candidates to improve the final predictor.

Problem solved

Applying ML requires expert knowledge in preprocessing, model selection, and hyperparameter tuning, creating an entry barrier for domain users and slowing research iterations.

Components

Search space

Formal definition of the set of admissible models, pipelines, hyperparameters, and network topologies.
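
As an illustration, a search space of this kind can be written down as a small conditional structure: pick a model first, then sample only the hyperparameters that belong to it. The sketch below is a minimal example under stated assumptions; `SEARCH_SPACE`, `sample_configuration`, the model names, and all ranges are hypothetical and not taken from any particular AutoML system.

```python
import math
import random

# Hypothetical conditional search space: each model owns its hyperparameters.
# A spec is ("int", low, high), ("log_float", low, high) or ("choice", options).
SEARCH_SPACE = {
    "random_forest": {
        "n_estimators": ("int", 50, 500),
        "max_depth": ("int", 2, 20),
    },
    "svm": {
        "C": ("log_float", 1e-3, 1e3),
        "kernel": ("choice", ["linear", "rbf"]),
    },
}


def sample_configuration(space, rng):
    """Draw one admissible configuration: first a model, then its hyperparameters."""
    model = rng.choice(sorted(space))
    config = {"model": model}
    for name, spec in space[model].items():
        kind = spec[0]
        if kind == "int":
            config[name] = rng.randint(spec[1], spec[2])  # bounds are inclusive
        elif kind == "log_float":
            # Uniform in log space, so each order of magnitude is equally likely.
            config[name] = math.exp(rng.uniform(math.log(spec[1]), math.log(spec[2])))
        elif kind == "choice":
            config[name] = rng.choice(spec[1])
        else:
            raise ValueError(f"unknown hyperparameter kind: {kind}")
    return config


example = sample_configuration(SEARCH_SPACE, random.Random(0))
```

Keeping the space as data rather than code means the same definition can be handed to different search strategies.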

Search strategy

Algorithm that explores the search space: Bayesian optimization, evolutionary algorithms, bandit methods, reinforcement learning, or gradient-based optimization.
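
To make one of these strategies concrete, here is a minimal sketch of an evolutionary search with elitism. It assumes a toy, hypothetical `validation_loss` function in place of real training, and a two-dimensional space (`log_lr`, `depth`) invented for the example; real systems such as TPOT evolve whole pipelines and are considerably more elaborate.

```python
import random


def validation_loss(config):
    """Hypothetical stand-in for training and validating a model (lower is better)."""
    return (config["log_lr"] + 2.0) ** 2 + 0.1 * (config["depth"] - 6) ** 2


def initial_population(rng, population_size):
    """Sample random starting configurations from the toy search space."""
    return [
        {"log_lr": rng.uniform(-5.0, 0.0), "depth": rng.randint(1, 12)}
        for _ in range(population_size)
    ]


def mutate(config, rng):
    """Perturb a parent configuration, clipping to the bounds of the search space."""
    child = dict(config)
    child["log_lr"] = min(0.0, max(-5.0, config["log_lr"] + rng.gauss(0.0, 0.5)))
    child["depth"] = min(12, max(1, config["depth"] + rng.choice([-1, 0, 1])))
    return child


def evolutionary_search(n_generations=30, population_size=8, seed=0):
    rng = random.Random(seed)
    population = initial_population(rng, population_size)
    for _ in range(n_generations):
        # Keep the better half unchanged (elitism) and refill with mutated copies.
        population.sort(key=validation_loss)
        parents = population[: population_size // 2]
        children = [mutate(rng.choice(parents), rng) for _ in parents]
        population = parents + children
    return min(population, key=validation_loss)


best = evolutionary_search()
```

Because the parents survive each generation unchanged, the best loss in the population can never get worse from one generation to the next.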

Performance estimation

Compute-efficient procedure for evaluating candidates: cross-validation, multi-fidelity (Hyperband), learning curve extrapolation, performance predictors.
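
The multi-fidelity idea can be sketched with successive halving, the subroutine that Hyperband runs repeatedly with different starting budgets. This is a minimal sketch, not a full Hyperband implementation; `successive_halving`, `toy_evaluate`, and the budget units are assumptions made for the example.

```python
import random


def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Evaluate every survivor on the current budget, keep the best 1/eta, repeat.

    Returns the last surviving configuration and the total budget spent.
    """
    survivors = list(configs)
    budget = min_budget
    total_cost = 0
    while len(survivors) > 1:
        losses = [evaluate(cfg, budget) for cfg in survivors]
        total_cost += budget * len(survivors)
        ranked = sorted(range(len(survivors)), key=lambda i: losses[i])
        keep = max(1, len(survivors) // eta)
        survivors = [survivors[i] for i in ranked[:keep]]
        budget *= eta  # the remaining candidates get a larger budget
    return survivors[0], total_cost


def toy_evaluate(final_loss, budget):
    """Hypothetical learning curve: loss approaches final_loss as budget grows."""
    return final_loss + 1.0 / budget


# Here a "configuration" is represented only by its (hidden) final loss.
candidates = [i / 27 for i in range(27)]
random.Random(0).shuffle(candidates)
winner, cost = successive_halving(candidates, toy_evaluate)
```

With 27 candidates and `eta=3`, the rounds evaluate 27, 9, and 3 candidates at budgets 1, 3, and 9, for 81 budget units in total, against 243 if every candidate were evaluated at budget 9. The saving relies on low-budget rankings being informative, which holds in this toy curve by construction but not always in practice.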

Meta-learning / warm-start

Leveraging knowledge from prior tasks (configuration portfolios, task embeddings) to speed up the start on a new dataset.
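
One simple way to picture a warm start is nearest-neighbour lookup over dataset meta-features: find the prior tasks most similar to the new dataset and try their best configurations first. The sketch below assumes a hypothetical table `PRIOR_TASKS` with invented meta-features and configurations; it illustrates the idea only and is not the procedure of any specific system.

```python
import math

# Hypothetical record of prior tasks: meta-features plus the best known configuration.
PRIOR_TASKS = [
    {
        "name": "small_balanced",
        "meta": {"log_n_rows": 3.0, "log_n_features": 1.0, "minority_fraction": 0.5},
        "best_config": {"model": "svm", "C": 1.0},
    },
    {
        "name": "large_wide",
        "meta": {"log_n_rows": 6.0, "log_n_features": 3.0, "minority_fraction": 0.4},
        "best_config": {"model": "gradient_boosting", "learning_rate": 0.05},
    },
    {
        "name": "medium_imbalanced",
        "meta": {"log_n_rows": 4.5, "log_n_features": 1.5, "minority_fraction": 0.05},
        "best_config": {"model": "random_forest", "max_depth": 12},
    },
]


def meta_distance(a, b):
    """Euclidean distance between two meta-feature dictionaries with the same keys."""
    return math.sqrt(sum((a[key] - b[key]) ** 2 for key in a))


def warm_start_configs(new_meta, prior_tasks, k=2):
    """Return the best configurations of the k most similar prior tasks, closest first."""
    ranked = sorted(prior_tasks, key=lambda task: meta_distance(new_meta, task["meta"]))
    return [task["best_config"] for task in ranked[:k]]


new_dataset = {"log_n_rows": 5.8, "log_n_features": 2.7, "minority_fraction": 0.45}
initial_configs = warm_start_configs(new_dataset, PRIOR_TASKS)
```

The returned configurations would be evaluated before the regular search begins, so the optimizer starts from plausible candidates instead of random ones.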

Ensembling and stacking

Combining multiple trained models (bagging, boosting, multi-layer stacking) into a final predictor.
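
As a small illustration of combining models, the sketch below uses greedy ensemble selection with replacement (in the style of Caruana et al.), which is a simpler technique than the multi-layer stacking mentioned above: at each round it adds whichever model most reduces the validation error of the averaged prediction. The function names and the toy predictions are assumptions made for the example.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def greedy_ensemble_selection(val_predictions, y_val, n_rounds=10):
    """Greedily add models (with replacement) to an averaged ensemble.

    val_predictions maps model name -> predictions on the validation set.
    Returns per-model weights that sum to 1.
    """
    counts = {name: 0 for name in val_predictions}
    running_sum = [0.0] * len(y_val)
    for round_idx in range(1, n_rounds + 1):
        best_name, best_err = None, float("inf")
        for name, pred in val_predictions.items():
            # Error of the ensemble average if this model were added now.
            candidate = [(s + p) / round_idx for s, p in zip(running_sum, pred)]
            err = mse(candidate, y_val)
            if err < best_err:
                best_name, best_err = name, err
        counts[best_name] += 1
        running_sum = [s + p for s, p in zip(running_sum, val_predictions[best_name])]
    return {name: count / n_rounds for name, count in counts.items()}


# Toy data: model "a" overshoots, "b" undershoots, "c" is mostly wrong.
y_val = [1.0, 2.0, 3.0, 4.0]
val_predictions = {
    "a": [1.5, 2.5, 3.5, 4.5],
    "b": [0.5, 1.5, 2.5, 3.5],
    "c": [4.0, 1.0, 0.0, 2.0],
}
weights = greedy_ensemble_selection(val_predictions, y_val)
```

In this toy case the errors of "a" and "b" cancel, so the procedure ends up weighting them equally and ignoring "c". Since the weights are chosen on validation data, the ensemble's validation score is itself optimistic and should be confirmed on held-out data.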

Implementation

Implementation pitfalls

High compute costs · High

Full hyperparameter or architecture search can consume thousands of GPU-hours, making naive AutoML inaccessible for small teams.

Overfitting to the validation set · High

Repeatedly evaluating configurations on the same validation set leads to a winner's curse: the best validation score is partly luck, so the selected model's performance is overestimated.
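
The effect can be reproduced with a short simulation. The sketch below assumes every candidate configuration has exactly the same true accuracy, so any difference in validation scores is pure noise; the function name and all parameter values are hypothetical.

```python
import random
import statistics


def simulate_selection_bias(n_configs=200, n_val=100, true_accuracy=0.70, seed=0):
    """Score identical-quality configurations on a finite validation set.

    Returns (best validation accuracy, mean validation accuracy).
    """
    rng = random.Random(seed)
    val_scores = []
    for _ in range(n_configs):
        # Each validation example is classified correctly with the same probability.
        correct = sum(rng.random() < true_accuracy for _ in range(n_val))
        val_scores.append(correct / n_val)
    return max(val_scores), statistics.mean(val_scores)


best_val, mean_val = simulate_selection_bias()
```

The mean score stays close to the true accuracy, while the maximum over many configurations sits well above it. A separate test set, or nested cross-validation, is needed to get an unbiased estimate for the selected model.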

Data leakage · High

Automated pipelines easily leak validation or test information into training if preprocessing (scaling, imputation, feature selection) is fitted on the full dataset rather than inside each training fold.
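
A leakage-safe setup fits every preprocessing step on the training part of each fold and only applies it to the validation part. The sketch below shows this for a single standardized feature using only the standard library; the helper names are hypothetical, and contiguous folds are an assumption made to keep the example short.

```python
import statistics


def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, val_idx) pairs for contiguous, near-equal folds."""
    sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
             for i in range(n_folds)]
    start = 0
    for size in sizes:
        val_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, val_idx
        start += size


def fit_scaler(values):
    """Estimate mean and standard deviation from the given rows only."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values) or 1.0  # avoid division by zero
    return mean, std


def transform(values, scaler):
    mean, std = scaler
    return [(v - mean) / std for v in values]


def scaled_folds(feature, n_folds=3):
    """Return (scaled_train, scaled_val, scaler) per fold, fitting on train rows only."""
    folds = []
    for train_idx, val_idx in k_fold_indices(len(feature), n_folds):
        train = [feature[i] for i in train_idx]
        val = [feature[i] for i in val_idx]
        scaler = fit_scaler(train)  # validation rows never influence the statistics
        folds.append((transform(train, scaler), transform(val, scaler), scaler))
    return folds


feature = [1.0, 2.0, 3.0, 4.0, 5.0, 600.0]
folds = scaled_folds(feature)
```

In the last fold the outlier 600.0 is a validation row, so it does not affect the fitted mean; fitting the scaler on all six values beforehand would have let it shape the training data. In scikit-learn, wrapping preprocessing and model in a `Pipeline` and passing that to `cross_val_score` gives the same per-fold behaviour.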

Limited interpretability · Medium

Complex ensembles selected by AutoML are often hard to explain, complicating validation in regulated domains (healthcare, finance).

Narrow search spaces · Medium

AutoML results are bounded by the chosen search space — by definition it cannot discover solutions outside of it.

Evolution

Original paper · 2013 · Chris Thornton

Auto-WEKA: Combined Selection and Hyperparameter Optimization of Classification Algorithms

Chris Thornton, Frank Hutter, Holger H. Hoos, Kevin Leyton-Brown

2013

Auto-WEKA — first system combining algorithm selection and hyperparameter optimization (Thornton, Hutter, Hoos, Leyton-Brown, KDD 2013).

Inflection point

2014

AutoML Workshop @ ICML 2014 (Hutter, Caruana, Bardenet et al.) — institutional establishment of the research field.

2015

Auto-sklearn (Feurer et al., NeurIPS 2015) — meta-learning + ensembling over scikit-learn; wins ChaLearn AutoML challenges.

Inflection point

2016

TPOT (Olson et al.) — ML pipeline optimized by genetic programming.

2017

Neural Architecture Search with RL (Zoph & Le, ICLR 2017) — NAS becomes a high-profile AutoML subfield.

Inflection point

2018

Google Cloud AutoML — commercialization of AutoML as a cloud service.

2019

DARTS (Liu et al., ICLR 2019) — differentiable NAS reducing compute cost by orders of magnitude; book "Automated Machine Learning: Methods, Systems, Challenges" (Hutter, Kotthoff, Vanschoren).

Inflection point

2020

AutoGluon-Tabular (Erickson et al., AWS) — multi-layer stacking for tabular data.

2022

TabPFN (Hollmann et al.) — transformer pretrained on synthetic data solving small tabular problems in one second.

2024

Third edition of the dedicated AutoML Conference (first held in 2022, following years of ICML/NeurIPS workshops); the field matures institutionally.