AutoML defines a search space (models, pipelines, hyperparameters, architectures) and explores it with a search procedure: Bayesian optimization (SMAC, BOHB), evolutionary search (TPOT), bandit-based methods (Hyperband, ASHA), reinforcement learning (NAS-RL), or gradient-based relaxation (DARTS). Meta-learning reuses results from previous experiments (e.g. configuration portfolios) to warm-start the search on new tasks. Cross-validation makes candidate evaluation more reliable, and stacking and ensembling typically improve the final predictor.
Applying ML requires expert knowledge in preprocessing, model selection, and hyperparameter tuning, creating an entry barrier for domain users and slowing research iterations.
Formal definition of the set of admissible models, pipelines, hyperparameters, and network topologies.
Algorithm that explores the search space: Bayesian optimization, evolutionary search, bandit-based methods, reinforcement learning, or gradient-based optimization.
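A minimal sketch of how a search space and a search strategy fit together, using a (1+1) evolutionary loop as one simple instance of the evolutionary family. The space, the hidden `TARGET`, and the `toy_score` function are all hypothetical stand-ins for illustration; a real system would train and validate a model for each configuration.

```python
import random

# Hypothetical search space: each hyperparameter maps to its admissible values.
SEARCH_SPACE = {
    "model": ["tree", "linear", "knn"],
    "learning_rate": [0.001, 0.01, 0.1, 1.0],
    "depth": [2, 4, 8, 16],
}

# Hypothetical optimum, used only to build a synthetic objective.
TARGET = {"model": "tree", "learning_rate": 0.1, "depth": 8}


def sample_config(rng):
    """Draw one configuration uniformly at random from the search space."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def mutate(config, rng):
    """Return a copy of config with one randomly chosen hyperparameter resampled."""
    child = dict(config)
    name = rng.choice(list(SEARCH_SPACE))
    child[name] = rng.choice(SEARCH_SPACE[name])
    return child


def toy_score(config):
    """Synthetic stand-in for a validation score: number of values matching TARGET."""
    return sum(config[name] == TARGET[name] for name in TARGET)


def evolutionary_search(n_steps, seed=0):
    """(1+1) evolution: mutate the incumbent, keep the child if it is no worse."""
    rng = random.Random(seed)
    best = sample_config(rng)
    best_score = toy_score(best)
    for _ in range(n_steps):
        child = mutate(best, rng)
        child_score = toy_score(child)
        if child_score >= best_score:
            best, best_score = child, child_score
    return best, best_score


best_config, best_score = evolutionary_search(n_steps=200)
print(best_config, best_score)
```

Whatever the strategy, the loop can only return configurations that `SEARCH_SPACE` contains, so the choice of space bounds the result before any search begins.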
Compute-efficient procedure for evaluating candidates: cross-validation, multi-fidelity (Hyperband), learning curve extrapolation, performance predictors.
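The multi-fidelity idea can be illustrated with successive halving, the subroutine that Hyperband repeats over several brackets. The sketch below is a simplified illustration, not the full Hyperband algorithm: the `evaluate` function and the `quality` field are hypothetical, and the assumption that noise shrinks with budget is built in by hand.

```python
import math
import random


def evaluate(config, budget, rng):
    """Synthetic noisy score; the noise is assumed to shrink as the budget grows."""
    noise = rng.gauss(0.0, 0.1 / math.sqrt(budget))
    return config["quality"] + noise


def successive_halving(configs, min_budget=1, eta=2, seed=0):
    """Evaluate all survivors, keep the top 1/eta, multiply the budget by eta."""
    rng = random.Random(seed)
    survivors = list(configs)
    budget = min_budget
    total_cost = 0
    while len(survivors) > 1:
        scored = [(evaluate(c, budget, rng), i) for i, c in enumerate(survivors)]
        total_cost += budget * len(survivors)
        scored.sort(reverse=True)  # best score first
        keep = max(1, len(survivors) // eta)
        survivors = [survivors[i] for _, i in scored[:keep]]
        budget *= eta
    return survivors[0], total_cost


rng = random.Random(1)
candidates = [{"id": i, "quality": rng.random()} for i in range(16)]
winner, cost = successive_halving(candidates)
print(winner["id"], cost)
```

With 16 candidates and `eta=2`, the rounds cost 16×1 + 8×2 + 4×4 + 2×8 = 64 budget units, compared with 16×8 = 128 units if every candidate were evaluated at the largest budget used.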
Leveraging knowledge from prior tasks (configuration portfolios, task embeddings) to warm-start the search on a new dataset.
Combining multiple trained models (bagging, boosting, multi-layer stacking) into a final predictor.
Full hyperparameter or architecture search can consume thousands of GPU-hours, making naive AutoML inaccessible for small teams.
Repeated evaluation of configurations on the same validation set leads to a winner's curse: the selected model's validation score is an optimistically biased estimate of its true performance.
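The selection bias can be seen in a small simulation. As a simplifying assumption, every configuration here has the same true accuracy and its measured score is an independent binomial draw; in practice, scores on a shared validation set are correlated, so the size of the effect will differ. All names and numbers are illustrative.

```python
import random


def simulate_selection(n_configs, n_val, true_acc, seed=0):
    """Return the best measured validation accuracy among equally good configs."""
    rng = random.Random(seed)

    def measured_acc():
        # Each validation example is classified correctly with probability true_acc.
        correct = sum(rng.random() < true_acc for _ in range(n_val))
        return correct / n_val

    return max(measured_acc() for _ in range(n_configs))


TRUE_ACC = 0.80
best_val = simulate_selection(n_configs=200, n_val=100, true_acc=TRUE_ACC)
print(f"true accuracy: {TRUE_ACC:.2f}, best validation score: {best_val:.2f}")
```

The maximum over many noisy scores sits above the shared true accuracy even though no configuration is actually better, which is why the winner should be re-scored on data that played no part in selection.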
Automated pipelines easily leak information from held-out data if preprocessing steps (e.g. scaling, imputation, feature selection) are fitted on the full dataset instead of inside each validation fold.
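A minimal sketch of leak-free preprocessing, assuming a single numeric feature and a standard-scaling step: the scaler statistics are fitted on the training rows of each fold only and then applied to that fold's validation rows. The helper names and the data are hypothetical.

```python
from statistics import mean, pstdev


def kfold_indices(n_samples, n_folds):
    """Yield (train_idx, val_idx) pairs for contiguous, non-shuffled folds."""
    fold_size = n_samples // n_folds
    for k in range(n_folds):
        start = k * fold_size
        stop = n_samples if k == n_folds - 1 else start + fold_size
        val_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, val_idx


def fit_scaler(values):
    """Compute mean and standard deviation; fall back to 1.0 for constant input."""
    return mean(values), (pstdev(values) or 1.0)


def transform(values, scaler):
    """Standardize values with previously fitted statistics."""
    mu, sigma = scaler
    return [(v - mu) / sigma for v in values]


feature = [1.0, 2.0, 3.0, 4.0, 100.0, 6.0]
folds = []
for train_idx, val_idx in kfold_indices(len(feature), n_folds=3):
    # Fit on training rows only, so validation rows never influence the scaler.
    scaler = fit_scaler([feature[i] for i in train_idx])
    val_scaled = transform([feature[i] for i in val_idx], scaler)
    folds.append((scaler, val_scaled))

leaky_mean = mean(feature)  # what a scaler fitted on all rows would use
print(folds[2][0][0], leaky_mean)
```

In the last fold the outlier `100.0` is a validation row, so the fold-wise mean is 2.5, whereas a scaler fitted on all rows would have absorbed the outlier into its statistics.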
Complex ensembles selected by AutoML are often hard to explain, complicating validation in regulated domains (healthcare, finance).
AutoML results are bounded by the chosen search space: by definition, the search cannot discover solutions that lie outside it.