Let the set of possible categories have size K. Each category is assigned a unique index i ∈ {0, …, K−1}. The categorical value is then represented as a vector v ∈ {0,1}^K with v[i] = 1 and v[j] = 0 for j ≠ i. Equivalently, it is the i-th row of the K×K identity matrix. In practice this is implemented by OneHotEncoder in scikit-learn, get_dummies in pandas, torch.nn.functional.one_hot in PyTorch and tf.one_hot in TensorFlow. For very large K (e.g. NLP vocabularies) the one-hot vector is rarely materialized explicitly: for a weight matrix W of shape K×d, the product vᵀW reduces to selecting the i-th row of W (an embedding lookup), which is the foundational optimization behind embedding layers.
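A minimal NumPy sketch of the definition and of the embedding-lookup shortcut. The helper one_hot and the random table W are illustrative assumptions, not any library's API; W is taken to have shape K×d, one row per category.

```python
import numpy as np

def one_hot(i: int, K: int) -> np.ndarray:
    """Return the one-hot vector for category index i (0 <= i < K)."""
    if not 0 <= i < K:
        raise IndexError(f"index {i} out of range for K={K}")
    v = np.zeros(K)
    v[i] = 1.0
    return v

K, d = 4, 3
# Same vector as the i-th row of the K×K identity matrix
assert np.array_equal(one_hot(2, K), np.eye(K)[2])

rng = np.random.default_rng(0)
W = rng.normal(size=(K, d))   # hypothetical embedding table: one d-dim row per category

i = 2
dense_result = one_hot(i, K) @ W   # explicit one-hot times W: O(K·d) work
lookup_result = W[i]               # embedding lookup: O(d) row copy, no one-hot needed
assert np.allclose(dense_result, lookup_result)
```

Embedding layers exploit exactly this equality: they store W and index into it, never building v.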
Removes the spurious ordering and unequal distances introduced by encoding categories as integers (e.g. "red=0, green=1, blue=2" would imply blue is twice as far from red as green is). Lets linear models and neural networks correctly handle nominal categorical variables.
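To make the distance argument concrete, the sketch below compares pairwise distances under the integer encoding from the example (red=0, green=1, blue=2) and under one-hot encoding. It assumes NumPy and uses absolute difference and Euclidean distance respectively as the notion of "distance".

```python
import itertools
import numpy as np

categories = ["red", "green", "blue"]

# Integer (ordinal) encoding: red=0, green=1, blue=2
ordinal = {c: float(i) for i, c in enumerate(categories)}
# One-hot encoding: rows of the 3×3 identity matrix
onehot = {c: row for c, row in zip(categories, np.eye(len(categories)))}

int_dist = {}
oh_dist = {}
for a, b in itertools.combinations(categories, 2):
    int_dist[(a, b)] = abs(ordinal[a] - ordinal[b])
    oh_dist[(a, b)] = float(np.linalg.norm(onehot[a] - onehot[b]))

for pair in int_dist:
    print(pair, int_dist[pair], round(oh_dist[pair], 4))
```

The integer distances come out as 1, 2 and 1 (blue is "twice as far" from red as green is), while every one-hot pair is √2 ≈ 1.4142 apart.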
For vocabularies of 10⁵–10⁶ tokens (NLP), dense materialization of one-hot vectors is impractical in both memory and compute.
All vectors are equidistant (any two distinct one-hot vectors are √2 apart in Euclidean distance), so the model has no prior information about category similarity.
The one-hot columns always sum to 1, so together they are perfectly collinear with the intercept in regression (the "dummy variable trap").
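A category that appears only at inference time (e.g. only in the test set) either raises an error or is silently encoded as an all-zero vector, depending on how the encoder was configured to handle unknown values.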
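The collinearity with the intercept can be checked numerically. This sketch uses a small hypothetical dataset of category indices and NumPy's matrix_rank: with the intercept plus all K one-hot columns the design matrix loses a rank, and dropping one column ("dummy encoding") restores full column rank.

```python
import numpy as np

# Hypothetical data: 6 samples of a 3-category feature, given as indices
idx = np.array([0, 1, 2, 0, 1, 2])
K = 3
X_onehot = np.eye(K)[idx]                  # shape (6, 3)
intercept = np.ones((len(idx), 1))

# Every row sums to 1, so the intercept column equals the sum of the K one-hot columns
assert np.allclose(X_onehot.sum(axis=1), 1.0)

full = np.hstack([intercept, X_onehot])              # 4 columns
dropped = np.hstack([intercept, X_onehot[:, 1:]])    # drop the first category: 3 columns

rank_full = np.linalg.matrix_rank(full)
rank_dropped = np.linalg.matrix_rank(dropped)
print(rank_full, full.shape[1])        # 3 4 -> rank-deficient, so XᵀX is singular
print(rank_dropped, dropped.shape[1])  # 3 3 -> full column rank
```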
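Time complexity: O(K) per sample for a dense vector (dominated by zero-filling), O(1) for the sparse / index form given an O(1) category-to-index lookup. Space complexity: O(K) per sample (dense), O(1) (sparse / index).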
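A rough illustration of the O(K) versus O(1) per-sample cost, assuming NumPy, a float32 dense matrix and int64 indices. K = 10,000 and n = 100 are illustrative sizes, far below real NLP vocabularies.

```python
import numpy as np

K = 10_000   # illustrative vocabulary size (assumption)
n = 100      # illustrative number of samples
rng = np.random.default_rng(42)

# Index form: a single integer per sample -> O(1) space per sample
idx = rng.integers(0, K, size=n)

# Dense form: K float32 slots per sample -> O(K) space; zero-filling is O(K) time per sample
dense = np.zeros((n, K), dtype=np.float32)
dense[np.arange(n), idx] = 1.0   # setting the single 1 is O(1) per sample

print(dense.nbytes)  # 4000000 = n * K * 4 bytes
print(idx.nbytes)    # 800 = n * 8 bytes (int64 indices)
```

Scaling K to 10⁶ multiplies the dense cost by 100 while the index form stays at 8 bytes per sample.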
Number of unique categories to encode. Directly determines the dimensionality of the output vector and memory cost.
Strategy for values not seen during fitting: ignore (encode as an all-zero vector), error (raise an exception), infrequent_if_exist (map to the infrequent-category column if one exists, otherwise behave like ignore).
Whether to drop one column to avoid collinearity in linear models (dummy encoding).
Whether to return a scipy.sparse matrix instead of a dense array; important for large K.
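How the category count, unknown-value strategy and column dropping interact can be sketched with a small encoder. MiniOneHotEncoder and its parameter names are hypothetical, chosen only to mirror the options above; this is not scikit-learn's OneHotEncoder, it supports only "error" and "ignore" for unknowns, and it omits the sparse-output option.

```python
import numpy as np

class MiniOneHotEncoder:
    """Hypothetical minimal one-hot encoder (not scikit-learn's API)."""

    def __init__(self, handle_unknown="error", drop_first=False):
        if handle_unknown not in ("error", "ignore"):
            raise ValueError("handle_unknown must be 'error' or 'ignore'")
        self.handle_unknown = handle_unknown
        self.drop_first = drop_first

    def fit(self, values):
        # K = number of unique categories seen during fit; sorted for a stable order
        self.categories_ = sorted(set(values))
        self.index_ = {c: i for i, c in enumerate(self.categories_)}
        return self

    def transform(self, values):
        K = len(self.categories_)
        out = np.zeros((len(values), K), dtype=np.float64)
        for row, v in enumerate(values):
            i = self.index_.get(v)
            if i is None:
                if self.handle_unknown == "error":
                    raise ValueError(f"unknown category {v!r}")
                continue  # 'ignore': leave this row as an all-zero vector
            out[row, i] = 1.0
        # Dummy encoding: drop the first column to avoid collinearity with the intercept
        return out[:, 1:] if self.drop_first else out

enc = MiniOneHotEncoder(handle_unknown="ignore").fit(["red", "green", "blue", "green"])
X = enc.transform(["green", "purple"])
# categories_ is ['blue', 'green', 'red'], so "green" -> [0, 1, 0];
# "purple" was never seen and becomes [0, 0, 0]
```

One design consequence worth noting: combining drop_first=True with handle_unknown="ignore" makes an unseen value indistinguishable from the dropped reference category, since both encode as all zeros.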
The resulting vector has exactly one active position, which makes it an ideal candidate for sparse representation.
Encoding of each sample is independent and trivially parallelizable.
The operation is computationally trivial (indexing / writing 1 into a zero vector) and gains nothing from specialized hardware.
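These properties can be seen in a batched NumPy sketch: each row is produced by a single scatter write, rows can be encoded in independent chunks (as separate workers would) and stitched together with identical results, and the index form is recoverable losslessly. The helper one_hot_batch is a hypothetical name for illustration.

```python
import numpy as np

def one_hot_batch(idx: np.ndarray, K: int) -> np.ndarray:
    """Encode a 1-D array of category indices as an (n, K) one-hot matrix."""
    out = np.zeros((idx.shape[0], K), dtype=np.float32)
    out[np.arange(idx.shape[0]), idx] = 1.0   # one scatter write per row
    return out

K = 5
idx = np.array([4, 0, 3, 3, 1, 2, 0])
full = one_hot_batch(idx, K)

# Rows are independent, so encoding chunks separately and concatenating
# gives exactly the same matrix as encoding everything at once.
chunks = np.array_split(idx, 3)
stitched = np.vstack([one_hot_batch(c, K) for c in chunks])
assert np.array_equal(full, stitched)

# Exactly one active position per row; argmax recovers the original indices.
assert (np.count_nonzero(full, axis=1) == 1).all()
assert np.array_equal(full.argmax(axis=1), idx)
```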