Retrieval

IVF

2003 · Active · Published: 20 May 2026 · Updated: 20 May 2026

Key innovation

Partitioning the vector space into Voronoi cells around k-means centroids, so a query only visits its nprobe nearest cells instead of the whole dataset — turning exact k-NN into ANN with a tunable recall/speed trade-off.

How it works

Training (offline): (1) take a sample of the dataset (e.g. 1–5%), (2) run k-means with k = nlist to obtain centroids c₁…c_nlist, (3) assign every database vector to its nearest centroid and append it to that cell's inverted list. Physically, the index is a dictionary: centroid → array of (vector id, vector).

Query: (a) compute distances from q to all nlist centroids, which costs O(nlist · D), (b) pick the nprobe closest cells, (c) scan the vectors in those cells while maintaining a top-k heap, (d) return the top-k.

Parameters: the key parameters are nlist (commonly around √N, e.g. nlist = 4096 for 16M vectors) and nprobe (usually 1–10% of nlist).

Variants: IVF-PQ replaces the full vectors in the lists with PQ codes (e.g. 8–16 bytes per vector). Distances are then computed via Asymmetric Distance Computation (ADC) against a lookup table (LUT) precomputed per query, which sharply cuts RAM and accelerates per-cell scans. IVF-OPQ adds an upfront orthogonal rotation that optimizes how dimensions are spread across the PQ sub-quantizers.
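
The training and query steps above can be sketched in plain Python. This is a minimal illustration, not how FAISS or any other library implements IVF: the function names (train_centroids, build_inverted_lists, ivf_search), the fixed number of Lloyd iterations, and the random initialisation are all assumptions made for the example.

```python
import heapq
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(centroids, v):
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda c: sq_dist(centroids[c], v))


def train_centroids(sample, nlist, iters=10, seed=0):
    """Plain Lloyd k-means on a training sample (hypothetical helper)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(sample, nlist)]
    for _ in range(iters):
        cells = [[] for _ in range(nlist)]
        for v in sample:
            cells[nearest_centroid(centroids, v)].append(v)
        for c, members in enumerate(cells):
            if members:  # an empty cell keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def build_inverted_lists(vectors, centroids):
    """Dictionary: centroid index -> list of (vector id, vector)."""
    lists = {c: [] for c in range(len(centroids))}
    for vid, v in enumerate(vectors):
        lists[nearest_centroid(centroids, v)].append((vid, v))
    return lists


def ivf_search(q, centroids, lists, k, nprobe):
    """Return ids of the (approximate) k nearest vectors, closest first."""
    # (a) + (b): rank all centroids and keep the nprobe closest cells
    probe = heapq.nsmallest(
        nprobe, range(len(centroids)), key=lambda c: sq_dist(centroids[c], q)
    )
    # (c): scan those cells, keeping a bounded max-heap via negated distances
    heap = []
    for c in probe:
        for vid, v in lists[c]:
            d = sq_dist(q, v)
            if len(heap) < k:
                heapq.heappush(heap, (-d, vid))
            elif -d > heap[0][0]:
                heapq.heapreplace(heap, (-d, vid))
    # (d): smallest distance first
    return [vid for _, vid in sorted(heap, reverse=True)]


# Toy usage: 200 random 4-dimensional vectors, 8 cells, probe 2 of them.
rng = random.Random(42)
data = [[rng.random() for _ in range(4)] for _ in range(200)]
centroids = train_centroids(data, nlist=8)
lists = build_inverted_lists(data, centroids)
top5 = ivf_search([0.5, 0.5, 0.5, 0.5], centroids, lists, k=5, nprobe=2)
```

Setting nprobe equal to nlist makes the sketch scan every cell, so it degenerates to exact search; smaller values skip cells and may miss true neighbours that fall just across a cell boundary.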

Problem solved

Exact k-NN over millions or billions of vectors requires scanning the entire dataset for every query, which is too slow for interactive latency budgets at that scale. IVF solves this by partitioning the dataset up front: instead of O(N · D) work per query, only about O((nlist + nprobe · N/nlist) · D) is needed when cells are roughly balanced, and the nprobe parameter gives operators a direct knob to trade recall against latency.
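
To make the cost formula concrete, the following sketch plugs in numbers. It counts only multiply-add work and assumes perfectly balanced cells; the function names and the chosen values (N = 4096², D = 128, nprobe = 64) are illustrative assumptions, not measurements.

```python
def brute_force_cost(n, d):
    """Distance work for an exact scan: O(N * D)."""
    return n * d


def ivf_cost(n, nlist, nprobe, d):
    """Distance work for IVF with balanced cells: O((nlist + nprobe * N/nlist) * D)."""
    coarse = nlist * d                 # compare q against every centroid
    scan = nprobe * (n / nlist) * d    # scan nprobe cells of about N/nlist vectors each
    return coarse + scan


# About 16.8M vectors, nlist = sqrt(N) = 4096, probing 64 cells (about 1.6% of nlist).
n, d, nlist, nprobe = 4096 ** 2, 128, 4096, 64
speedup = brute_force_cost(n, d) / ivf_cost(n, nlist, nprobe, d)
print(f"estimated speedup: {speedup:.1f}x")  # 4096 / 65, about 63x
```

The estimate ignores memory effects and heap maintenance, so real speedups will differ; it is only meant to show how nlist and nprobe enter the trade-off.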

Components

k-means coarse quantizer · Defines the space partitioning

Trained k-means centroids define the nlist Voronoi cells. The number and quality of centroids determine how the data distribution maps onto the inverted lists.

Inverted lists · Physical index layout

For each centroid an array of ids and (optionally compressed) vectors assigned to that cell is kept — analogous to the inverted index in text search.

nprobe selector · Cell-selection mechanism at query time

At query time, q→centroid distances are computed and the nprobe closest cells are selected. nprobe is the main recall/latency knob.

Optional PQ encoder · Per-cell vector compression (IVF-PQ)

In the IVF-PQ variant, vectors in the lists are stored as short PQ codes; distances are computed from a LUT via Asymmetric Distance Computation, dramatically cutting RAM and speeding up the scan.
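
The LUT-based distance can be sketched as follows. This is a simplified illustration under stated assumptions: codebooks are built by sampling training sub-vectors instead of running k-means per subspace, the raw vector is encoded (implementations may instead encode the residual relative to the cell centroid), and all function names are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split(v, m):
    """Cut a D-dimensional vector into m contiguous sub-vectors."""
    assert len(v) % m == 0, "D must be divisible by m"
    dsub = len(v) // m
    return [v[j * dsub:(j + 1) * dsub] for j in range(m)]


def train_codebooks(train, m, ksub, seed=0):
    """One codebook of ksub codewords per subspace.

    Simplification: codewords are sampled training sub-vectors;
    real PQ trains them with k-means in each subspace.
    """
    rng = random.Random(seed)
    subs = [split(v, m) for v in train]
    picks = rng.sample(range(len(train)), ksub)
    return [[subs[i][j] for i in picks] for j in range(m)]


def encode(v, codebooks):
    """PQ code: index of the nearest codeword in each subspace."""
    m = len(codebooks)
    return [
        min(range(len(cb)), key=lambda c: sq_dist(cb[c], sub))
        for cb, sub in zip(codebooks, split(v, m))
    ]


def decode(code, codebooks):
    """Reconstruct the approximate vector from its PQ code."""
    return [x for cb, idx in zip(codebooks, code) for x in cb[idx]]


def build_lut(q, codebooks):
    """Per-query table: lut[j][c] = squared distance from q's j-th sub-vector to codeword c."""
    m = len(codebooks)
    return [[sq_dist(sub, cw) for cw in cb] for cb, sub in zip(codebooks, split(q, m))]


def adc_distance(lut, code):
    """Asymmetric distance: m table lookups and additions per stored vector."""
    return sum(lut[j][c] for j, c in enumerate(code))


# Toy usage: D = 8, m = 4 sub-quantizers, 16 codewords each.
rng = random.Random(7)
train = [[rng.random() for _ in range(8)] for _ in range(100)]
codebooks = train_codebooks(train, m=4, ksub=16)
code = encode(train[0], codebooks)
lut = build_lut([0.5] * 8, codebooks)
approx = adc_distance(lut, code)
```

The query stays uncompressed while the database vector is represented only by its code, which is why the computation is called asymmetric. The ADC value equals the exact squared distance between the query and the decoded (reconstructed) vector, not the original one.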

Implementation

Reference implementations

FAISS IndexIVFFlat / IndexIVFPQ

C++ / Python · Meta AI Research

Official

Milvus IVF_FLAT / IVF_PQ / IVF_SQ8

C++ / Go · Zilliz / Milvus

pgvector ivfflat index

C (Postgres extension) · pgvector

Implementation pitfalls

Bad training sample → imbalanced cells · Severity: High

If the k-means training sample doesn't represent the production distribution (poor stratification, too small), you get "mega-cells" containing most of the data. Queries that land in them scan a large share of the dataset, which lowers recall at a fixed latency budget and erodes the benefit of indexing.

Fix: Use a training sample of at least 30 · nlist vectors, randomly stratified; monitor list sizes after assignment.
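
One way to monitor list sizes is a single imbalance statistic. The definition below is an assumption chosen for illustration: it returns 1.0 for perfectly balanced cells and nlist when every vector sits in one cell. The function name and the example sizes are hypothetical.

```python
def imbalance_factor(list_sizes):
    """nlist * sum(size^2) / (sum(size))^2.

    1.0 means perfectly balanced cells; nlist means all vectors
    ended up in a single cell.
    """
    total = sum(list_sizes)
    if total == 0:
        raise ValueError("index is empty")
    return len(list_sizes) * sum(s * s for s in list_sizes) / (total * total)


balanced = imbalance_factor([100] * 8)            # 1.0
degenerate = imbalance_factor([800] + [0] * 7)    # 8.0
```

A value that creeps upward after each batch of inserts is a hint that the centroids no longer match the data; the threshold at which to retrain depends on the workload.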

nprobe set too low → low recall · Severity: High

Default nprobe=1 often yields below-50% recall on real datasets. This is the most common reason behind "IVF is bad" benchmarks.

Fix: Tune nprobe on a validation set until the target recall is met; keep profiles per SLO.
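
A tuning loop might look like the sketch below. It assumes a caller-supplied search_fn(query, nprobe) that returns a list of ids and exact ground-truth neighbours computed offline; both the function names and that interface are hypothetical, not part of any library.

```python
def recall_at_k(found, truth):
    """Fraction of true neighbours that appear in the returned results."""
    hits = sum(len(set(f) & set(t)) for f, t in zip(found, truth))
    return hits / sum(len(t) for t in truth)


def pick_nprobe(search_fn, queries, truth, target, candidates):
    """Smallest candidate nprobe whose recall meets the target, or None."""
    for nprobe in sorted(candidates):
        found = [search_fn(q, nprobe) for q in queries]
        recall = recall_at_k(found, truth)
        if recall >= target:
            return nprobe, recall
    return None


# Toy stand-in for a real index: returns more true neighbours as nprobe grows.
toy_truth = [[0, 1, 2, 3], [4, 5, 6, 7]]


def toy_search(query_index, nprobe):
    return toy_truth[query_index][:nprobe]


choice = pick_nprobe(toy_search, [0, 1], toy_truth, target=0.75, candidates=[1, 2, 3, 4])
# choice == (3, 0.75)
```

Running the same loop with a different target for each SLO gives one nprobe setting per latency or recall profile.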

Missing normalization / wrong metric · Severity: Critical

Mixing metrics (cosine vs L2) or skipping L2 normalization for cosine yields completely wrong centroids and search results.

Fix: Use a single metric across training, index and query; when using cosine, L2-normalize both the database vectors and the queries before indexing and searching.
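
The reason normalization makes an L2 index usable for cosine is the identity ‖a − b‖² = 2 − 2·cos(a, b) for unit-length vectors, so ranking by L2 distance matches ranking by cosine similarity. The sketch below checks this numerically; the helper names are illustrative.

```python
import math


def l2_normalize(v):
    """Scale v to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def sq_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


a, b = [3.0, 4.0, 0.0], [1.0, 2.0, 2.0]
a_unit, b_unit = l2_normalize(a), l2_normalize(b)

# On unit vectors, squared L2 distance is a monotone function of cosine similarity.
lhs = sq_l2(a_unit, b_unit)
rhs = 2.0 - 2.0 * cosine_similarity(a, b)
```

Without normalization the identity does not hold, and vectors with large norms can crowd out closer directions under L2.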

Frequent mutations without retraining centroids · Severity: Medium

IVF is semi-static — when the data distribution drifts, the centroids stop being representative and recall drops. Adding vectors without retraining makes it worse.

Fix: Periodically retrain the quantizer; in mutable workloads prefer HNSW or FreshDiskANN.

IVF-PQ with m poorly matched to D · Severity: Medium

In IVF-PQ the dimensionality D must be divisible by m. A mismatched m (e.g. m=8 works for D=384, m=10 does not) causes construction errors or padding, and a poorly chosen m drastically reduces recall.

Fix: Pick m as a divisor of D; consider IVF-OPQ, which partly mitigates this via rotation.
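
Listing the admissible values of m for a given dimensionality is a one-liner. The helper name and the max_m cap are assumptions for illustration; which divisor gives the best recall still has to be measured.

```python
def valid_pq_m(d, max_m=64):
    """All m up to max_m that split D into equal-sized sub-vectors."""
    return [m for m in range(1, min(d, max_m) + 1) if d % m == 0]


options = valid_pq_m(384)
# [1, 2, 3, 4, 6, 8, 12, 16, 24, 32, 48, 64]
assert 8 in options and 10 not in options
```

Larger m means longer codes (more bytes per vector) and finer quantization, so the choice is a memory versus accuracy trade-off within this list.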

Evolution

Original paper · 2003 · ICCV 2003 · Josef Sivic

Video Google: A Text Retrieval Approach to Object Matching in Videos

Josef Sivic, Andrew Zisserman

2003

Sivic & Zisserman — Video Google (inverted file for visual features)

Inflection point

The idea of an inverted file for visual features: each "visual word" (centroid) has a list of documents in which it appears. The practical origin of IVF for vectors.

2011

Jégou et al. — IVF + Product Quantization (IVF-PQ)

Inflection point

Combining IVF with PQ ("Product Quantization for Nearest Neighbor Search", IEEE TPAMI 2011) — the foundation of modern billion-scale ANN.

Product Quantization for Nearest Neighbor Search (paper)

2017

FAISS — open-source library with IVF and IVF-PQ

Facebook AI Research releases FAISS — the most popular implementation of IVF and its variants, with GPU acceleration.

2020

IVF becomes a default index in vector databases

Milvus, pgvector and other vector databases expose IVF variants (e.g. IVF_FLAT, IVF_PQ) as ANN indexes alongside HNSW.