Training (offline): (1) take a sample of the dataset (e.g. 1–5%), (2) run k-means with k = nlist and obtain centroids c₁…c_nlist, (3) for each vector in the database find its nearest centroid and append it to that cell's inverted list. Physically the index is a dictionary: centroid → array of (vector id, vector). Query: (a) compute distances from q to all nlist centroids — O(nlist · D), (b) pick the nprobe closest cells, (c) scan the vectors in those cells while keeping a top-k heap, (d) return top-k. Key parameters are nlist (typically √N — e.g. nlist=4096 for 16M vectors) and nprobe (usually 1–10% of nlist). The IVF-PQ variant replaces the full vectors in the lists with PQ codes (e.g. 8–16 bytes per vector) — distances are computed via Asymmetric Distance Computation (ADC) against a precomputed LUT, which dramatically cuts RAM and accelerates per-cell scans. IVF-OPQ adds an upfront orthogonal rotation that optimizes how dimensions are spread across PQ sub-quantizers.
Exact k-NN over millions or billions of vectors requires scanning the entire dataset for every query, which is computationally infeasible. IVF solves this by partitioning the dataset up front: instead of O(N · D) work per query, only O((nlist + nprobe · N/nlist) · D) is needed, and the nprobe parameter gives operators a hard knob to trade recall against latency.
Trained k-means centroids define the nlist Voronoi cells. The number and quality of centroids determine how the data distribution maps onto the inverted lists.
Official
For each centroid an array of ids and (optionally compressed) vectors assigned to that cell is kept — analogous to the inverted index in text search.
At query time, q→centroid distances are computed and the nprobe closest cells are selected. nprobe is the main recall/latency knob.
In the IVF-PQ variant, vectors in the lists are stored as short PQ codes; distances are computed from a LUT via Asymmetric Distance Computation, dramatically cutting RAM and speeding up the scan.
Official
If the k-means training sample doesn't represent the production distribution (poor stratification, too small), you get "mega-cells" containing most of the data, which degrades recall and removes any benefit of indexing.
Default nprobe=1 often yields below-50% recall on real datasets. This is the most common reason behind "IVF is bad" benchmarks.
Mixing metrics (cosine vs L2) or skipping L2 normalization for cosine yields completely wrong centroids and search results.
IVF is semi-static — when the data distribution drifts, the centroids stop being representative and recall drops. Adding vectors without retraining makes it worse.
In IVF-PQ the dimensionality D must be divisible by m. A mismatched m (e.g. m=8 works for D=384, m=10 does not) causes construction errors or padding, and a poorly chosen m drastically reduces recall.
The idea of an inverted file for visual features: each "visual word" (centroid) has a list of documents in which it appears. The practical origin of IVF for vectors.
Combining IVF with PQ ("Product Quantization for Nearest Neighbor Search", IEEE TPAMI 2011) — the foundation of modern billion-scale ANN.
Facebook AI Research releases FAISS — the most popular implementation of IVF and its variants, with GPU acceleration.
Milvus, Qdrant, Weaviate, pgvector and others expose IVF / IVF-PQ as a primary ANN index alongside HNSW.
Time complexity: O((nlist + nprobe · N/nlist) · D) per query. Space complexity: O(nlist · D + N · D) (IVF) / O(nlist · D + N · b_PQ) (IVF-PQ).
Most cycles are spent scanning vectors in the selected nprobe cells. For IVF-Flat this is a dot product over full vectors; for IVF-PQ it is an ADC LUT lookup — both are CPU vector workloads.
Only nprobe out of nlist cells are active per query — a "sparse cell activation" paradigm.
Queries are independent. Within one query, scanning different cells can also run in parallel — IVF is classically data-parallel.
Number of k-means centroids / Voronoi cells. Larger nlist means shorter lists and faster scan, but higher centroid-selection cost and longer training. Typically nlist ≈ √N.
Number of cells visited at query time. The main recall vs latency knob — larger nprobe means higher recall and slower queries.
Number of vectors used to train the k-means quantizer. Recommended ≥ 30 · nlist; too small a sample leads to imbalanced cells and recall degradation.
IVF-PQ variant: number of PQ sub-quantizers. The D-dim vector is split into m equal segments, each quantized independently. Larger m means higher recall and larger RAM.
Scanning lists and computing distances are dense vector operations — perfect for CPU SIMD (AVX2/AVX-512). IVF-PQ with ADC also relies on LUTs that stay in L1/L2 cache.
FAISS GPU accelerates IVF / IVF-PQ — large query batching and regular loops fit GPUs well, although Tensor Cores are only partially utilized (these are not classical GEMMs).
IVF runs efficiently on ARM (Graviton, Apple Silicon) and x86; it mainly needs good memory bandwidth and SIMD.