Architecture

Support Vector Machine

Historical

How it works

1. Find the hyperplane w·x + b = 0 maximizing the margin 2/‖w‖ subject to y_i(w·x_i + b) ≥ 1.

2. Reformulate as a quadratic programming (QP) problem (or its Lagrangian dual), where the support vectors (points on the margin) determine the solution.

3. Soft-margin (parameter C): allow margin violations via slack variables ξ_i ≥ 0. Large C → narrow margin, fewer training errors; small C → wide margin, more errors tolerated.

4. Kernel trick: replace the dot product with a kernel function K(x_i, x_j) = φ(x_i)·φ(x_j) without computing φ explicitly. Common kernels: RBF K = exp(−γ‖x−x'‖²), polynomial, sigmoid.

5. Prediction: sign(Σ α_i y_i K(x_i, x) + b), summing only over support vectors (α_i > 0).

6. Training: the SMO (Sequential Minimal Optimization) algorithm decomposes the QP into two-variable sub-problems solved analytically.
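Steps 4 and 5 can be sketched in a few lines of plain Python. This is a minimal illustration, not a trained model: the support vectors, the α values, b and γ below are hand-picked hypothetical numbers, and the function names are chosen for this example only.

```python
import math

def rbf_kernel(x, z, gamma):
    """K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)

def decision_function(x, support_vectors, alphas, labels, b, gamma):
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + b, summed over support vectors only."""
    return b + sum(
        a * y * rbf_kernel(sv, x, gamma)
        for sv, a, y in zip(support_vectors, alphas, labels)
    )

def predict(x, support_vectors, alphas, labels, b, gamma):
    """Class label is the sign of the decision function."""
    score = decision_function(x, support_vectors, alphas, labels, b, gamma)
    return 1 if score >= 0 else -1

# Hypothetical values for illustration (they satisfy sum_i alpha_i * y_i = 0).
support_vectors = [(0.0, 0.0), (2.0, 2.0)]
alphas = [1.0, 1.0]
labels = [-1, +1]
b = 0.0
gamma = 0.5

near_negative = predict((0.2, 0.1), support_vectors, alphas, labels, b, gamma)
near_positive = predict((1.9, 2.2), support_vectors, alphas, labels, b, gamma)
```

A query point close to the negative support vector gets label -1 and one close to the positive support vector gets +1. The feature map φ never appears: only kernel values between the query and the stored support vectors are needed.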

Problem solved

A linear classifier such as the perceptron can return any separating hyperplane. Infinitely many exist, and nothing in its training criterion favors one that generalizes well. SVM resolves this by selecting the maximum-margin hyperplane (the one farthest from the nearest points of each class), a choice that margin-based generalization bounds associate with lower error on unseen data. Non-linear decision boundaries are handled by the kernel trick, which implicitly maps the data into a higher-dimensional space without ever computing the mapping.
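To make the "many separating hyperplanes" point concrete, the sketch below compares two hyperplanes that both separate the same toy data and measures the distance from each to its nearest point. The data set and the two hyperplanes are hypothetical, chosen only for illustration.

```python
import math

def geometric_margin(w, b, points, labels):
    """Smallest signed distance y_i * (w.x_i + b) / ||w|| over the data set."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )

# Toy 2-D data: negatives on the left, positives on the right.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
labels = [-1, -1, +1, +1]

# Both vertical lines separate the classes perfectly.
skewed = geometric_margin((1.0, 0.0), -1.2, points, labels)    # line x1 = 1.2
centered = geometric_margin((1.0, 0.0), -2.0, points, labels)  # line x1 = 2.0
```

Both candidates classify every training point correctly, but the line at x1 = 1.2 passes within about 0.2 of a negative point, while the centered line keeps a distance of 1.0 from the nearest point of each class. The centered line is the one a maximum-margin criterion prefers; with w = (1, 0) it is in canonical form, so the full margin width is 2/‖w‖ = 2.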

Key mechanisms

Maximization of the margin between classes (max-margin classifier)

Support vectors — only they define the decision boundary

Kernel trick — implicit mapping into a higher-dimensional space

Kernel functions: linear, RBF (Gaussian), polynomial, sigmoid

Soft-margin with the C parameter — trade-off between margin and classification error

Constrained quadratic optimization (QP) or the SMO algorithm

Hinge loss: max(0, 1 − y·f(x))
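The soft-margin, hinge-loss and QP items above are three views of one optimization problem. The block below writes them side by side as a sketch; n (the number of training samples) and f(x) = w·x + b are notation assumed here, and the dual is stated with a general kernel K, which reduces to K(x_i, x_j) = x_i·x_j in the linear case.

```latex
\begin{aligned}
\text{Primal (slack form):}\quad
  & \min_{w,\,b,\,\xi}\ \tfrac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i
  \quad\text{s.t.}\quad y_i\,(w\cdot x_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0 \\
\text{Hinge form:}\quad
  & \min_{w,\,b}\ \tfrac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\max\bigl(0,\ 1 - y_i f(x_i)\bigr),
  \qquad f(x) = w\cdot x + b \\
\text{Dual:}\quad
  & \max_{\alpha}\ \sum_{i=1}^{n}\alpha_i
    - \tfrac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j\,y_i y_j\,K(x_i, x_j)
  \quad\text{s.t.}\quad 0 \le \alpha_i \le C,\ \ \sum_{i=1}^{n}\alpha_i y_i = 0
\end{aligned}
```

The slack and hinge forms agree because, at the optimum, each slack takes its smallest feasible value ξ_i = max(0, 1 − y_i f(x_i)). In the dual, C appears only as the upper bound on each α_i, and the points with α_i > 0 are the support vectors.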

Strengths & limitations

Strengths

✓ Effective in high-dimensional spaces (e.g. vectorized text)

✓ Strong generalization properties, backed by VC theory

✓ Kernel trick enables non-linear classification without explicit transformation

✓ Result is deterministic: training converges to the global optimum of a convex problem

✓ Copes well when the number of features is large relative to the number of samples

✓ Stability: the model depends only on the support vectors

Limitations

✗ Training is computationally expensive: O(n²)–O(n³) in the number of samples n

✗ Scales poorly to very large datasets (millions of examples)

✗ No natively calibrated probabilities (Platt scaling is the usual remedy)

✗ Kernel and hyperparameter selection (C, γ) requires costly cross-validation

✗ Multi-class classification requires one-vs-rest or one-vs-one schemes

✗ Low interpretability once a non-linear kernel is used

✗ Sensitive to feature scaling
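The scaling sensitivity is easy to see with the RBF kernel, which depends on squared Euclidean distance. The sketch below uses hypothetical rows of (age in years, income in dollars) and a hypothetical γ = 0.5; the helper names are for illustration only.

```python
import math
import statistics

def rbf_kernel(x, z, gamma):
    """K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)

def standardize(rows):
    """Rescale each column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    stds = [statistics.pstdev(c) for c in columns]
    return [
        tuple((v - m) / s for v, m, s in zip(row, means, stds))
        for row in rows
    ]

# Hypothetical data: (age in years, income in dollars).
rows = [(25.0, 30_000.0), (60.0, 30_500.0), (26.0, 90_000.0)]
gamma = 0.5

# Raw features: the income column dominates every distance.
raw_k01 = rbf_kernel(rows[0], rows[1], gamma)
raw_k02 = rbf_kernel(rows[0], rows[2], gamma)

# Standardized features: both columns contribute on a comparable scale.
scaled = standardize(rows)
scaled_k01 = rbf_kernel(scaled[0], scaled[1], gamma)
scaled_k02 = rbf_kernel(scaled[0], scaled[2], gamma)
```

On the raw data the squared distances are so large that both kernel values underflow to 0.0, so every pair of points looks equally dissimilar and the kernel carries no information. After standardization the kernel values are small but non-zero, and differences in age and in income both register.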

Implementation

Implementation pitfalls

O(n²)–O(n³) complexity prevents scaling to large datasets · Medium

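Standard kernel SVM solvers (SMO, as implemented in LIBSVM) scale quadratically to cubically with the sample count n, so training often becomes impractical beyond roughly n > 100k. Alternatives for the linear case: SGD-SVM or LinearSVC, whose training cost grows roughly linearly with n.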
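The idea behind an SGD-SVM can be sketched as stochastic subgradient descent on the regularized hinge loss. The version below is a simplified illustration, not the algorithm of any particular library: it omits the bias term (so it assumes roughly centered data), and `lam`, `epochs`, `seed` and the toy data are hypothetical choices.

```python
import random

def train_linear_svm_sgd(points, labels, lam=0.01, epochs=20, seed=0):
    """Stochastic subgradient descent on lam/2 * ||w||^2 + mean hinge loss (no bias)."""
    rng = random.Random(seed)
    w = [0.0] * len(points[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(points)))
        rng.shuffle(order)
        for i in order:  # one pass over the data costs O(n * d)
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            x, y = points[i], labels[i]
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            # Shrink w: this is the gradient step for the regularizer.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # If the sample violates the margin, also step towards it.
            if margin < 1.0:
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w

def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1

# Toy data, symmetric about the origin so that no bias term is needed.
points = [(-2.0, -2.0), (-1.0, -2.0), (-2.0, -1.0), (-1.0, -1.0),
          (2.0, 2.0), (1.0, 2.0), (2.0, 1.0), (1.0, 1.0)]
labels = [-1, -1, -1, -1, 1, 1, 1, 1]

w = train_linear_svm_sgd(points, labels)
predictions = [predict(w, x) for x in points]
```

Each update touches a single sample, so the cost per epoch grows linearly with n and no n × n kernel matrix is ever formed. The price is that this sketch only learns a linear boundary.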

Kernel and hyperparameter selection requires cross-validation · Medium

Kernel choice (RBF, polynomial, linear) and the parameters C and γ strongly affect results, and no default values work in all cases. Grid search with cross-validation is costly on large datasets.
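A grid search over kernel, C and γ might look like the following with scikit-learn. This is a sketch under assumptions: the Iris data set is only a small stand-in, and the grid values are arbitrary starting points rather than recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data set (an assumption for this example).
X, y = load_iris(return_X_y=True)

# Scaling sits inside the pipeline so each CV fold is scaled
# using statistics from its own training split only.
pipeline = make_pipeline(StandardScaler(), SVC())

# Illustrative grid; gamma only applies to the RBF kernel.
param_grid = [
    {"svc__kernel": ["linear"], "svc__C": [0.1, 1, 10]},
    {"svc__kernel": ["rbf"], "svc__C": [0.1, 1, 10], "svc__gamma": [0.01, 0.1, 1]},
]

search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_  # mean cross-validated accuracy of the best setting
```

Even this small grid trains 12 candidate settings × 5 folds = 60 models, plus one final refit. Because each fit of a kernel SVM is itself super-linear in n, the total cost climbs quickly on large datasets.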

Evolution

Original paper · 1995 · Corinna Cortes

Support-Vector Networks

Corinna Cortes, Vladimir Vapnik

1963

Vladimir Vapnik and Alexey Chervonenkis publish the "Generalized Portrait" algorithm — a precursor to linear SVM.

1992

Boser, Guyon and Vapnik apply the kernel trick to maximum-margin classifiers, enabling non-linear classification in high-dimensional spaces.

1995

Cortes and Vapnik publish "Support-Vector Networks" — the soft-margin variant becomes the foundation of modern SVM.

1998

John Platt describes the SMO (Sequential Minimal Optimization) algorithm, dramatically speeding up SVM training.

2001

Chih-Chung Chang and Chih-Jen Lin release LIBSVM, which becomes one of the most widely used SVM implementations in academia and industry.

2012

AlexNet wins ImageNet — deep networks displace SVM as the dominant classifier in computer vision.

Sources

Support-Vector Networks

Paper

Machine Learning (Springer)

A Training Algorithm for Optimal Margin Classifiers

Paper

COLT 1992

Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines

Paper

Microsoft Research (Platt 1998)

LIBSVM — A Library for Support Vector Machines

Documentation

National Taiwan University

Support Vector Machines — scikit-learn documentation

Documentation

scikit-learn

Support vector machine

Reference

Wikipedia