Retrieval

PQ

2011 · Active · Published: 20 May 2026 · Updated: 20 May 2026

Key innovation

Decomposing a high-dimensional vector into m equal sub-vectors and quantizing each one independently with a small codebook (k = 256), which compresses a float32 vector 8–64× while preserving ANN search quality through Asymmetric Distance Computation (ADC) over a precomputed LUT.

How it works

Training (offline): (1) collect a representative sample of database vectors; (2) choose hyperparameters: the number of sub-quantizers m (must divide D) and bits per code (typically 8, i.e. k=256 centroids per sub-space); (3) split each sample vector into m segments of D/m dims; (4) run k-means independently on each of the m segment sets, yielding m codebooks of 256 centroids each.

Database encoding: each vector v is split into (v₁,…,v_m), each segment v_j is matched to its closest centroid in codebook j, and the vector is stored as m bytes (the centroid ids).

Query (ADC): (a) split q into (q₁,…,q_m); (b) for each sub-space j compute LUT_j[i] = dist(q_j, c_j^i) for i=0..255, which costs m × 256 sub-vector distance computations in total; (c) for each database vector with code (i₁,…,i_m) the approximate distance is Σ_j LUT_j[i_j], i.e. m table lookups plus m additions; (d) keep the best results in a top-k heap (this k is the number of results, not the codebook size).

Variants: OPQ applies an orthogonal rotation R before PQ, chosen to minimize quantization error, which substantially improves recall on data with correlated dimensions. RQ (Residual Quantization) quantizes residuals iteratively: each successive codebook learns the difference between the vector and its current approximation.
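The three phases above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the FAISS implementation: the helper names (kmeans, train_pq, encode, adc_search) are hypothetical, the k-means is a plain Lloyd loop, the sizes are toy values (k=16 rather than 256), and argsort stands in for the top-k heap.

```python
import numpy as np

def kmeans(x, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) array of centroids."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Squared L2 distance from every point to every centroid: (n, k)
        d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = d2.argmin(1)
        for c in range(k):
            members = x[assign == c]
            if len(members):  # leave empty clusters where they are
                centroids[c] = members.mean(0)
    return centroids

def train_pq(x, m, k):
    """Train m independent codebooks; returns an (m, k, D/m) array."""
    d = x.shape[1]
    assert d % m == 0, "m must divide D"
    dsub = d // m
    return np.stack([kmeans(x[:, j * dsub:(j + 1) * dsub], k, seed=j)
                     for j in range(m)])

def encode(x, codebooks):
    """Map each vector to m centroid ids (one byte each when k <= 256)."""
    m, k, dsub = codebooks.shape
    codes = np.empty((len(x), m), dtype=np.uint8)
    for j in range(m):
        seg = x[:, j * dsub:(j + 1) * dsub]
        d2 = ((seg[:, None, :] - codebooks[j][None]) ** 2).sum(-1)
        codes[:, j] = d2.argmin(1)
    return codes

def adc_search(q, codes, codebooks, topk):
    """Asymmetric distance computation: the query stays uncompressed."""
    m, k, dsub = codebooks.shape
    # lut[j, i] = squared L2 between q_j and centroid i of sub-space j
    lut = ((q.reshape(m, 1, dsub) - codebooks) ** 2).sum(-1)
    # m lookups + m additions per database vector
    approx = lut[np.arange(m), codes].sum(1)
    order = np.argsort(approx)[:topk]
    return order, approx[order]

# Toy run: 1000 vectors, D=16, m=4 sub-spaces, k=16 centroids each
rng = np.random.default_rng(0)
db = rng.normal(size=(1000, 16)).astype(np.float32)
codebooks = train_pq(db, m=4, k=16)
codes = encode(db, codebooks)
ids, dists = adc_search(db[0], codes, codebooks, topk=5)
print(codes.shape, ids, dists)
```

Each ADC score is the exact squared distance between the query and the reconstruction of the database vector from its centroids, which is why the scan never has to decompress the database.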

Problem solved

Embeddings from AI models (BERT, CLIP, OpenAI) typically have D=384–1536 dimensions at 4 bytes each (float32), giving 1.5–6 KB per vector. For one billion vectors that is 1.5–6 TB of RAM, which is infeasible on a single server. PQ solves this: at 16–64 bytes per code, the same billion vectors take 16–64 GB of RAM (on the order of 100× compression) with a modest recall loss, so a billion-scale index fits on a single node and infrastructure becomes dramatically cheaper.
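The arithmetic behind these figures can be checked with a few lines of Python. The pairings of D and m below (384 with 16, 1536 with 64) are assumptions picked to match the 16–64 GB range; the function names are hypothetical.

```python
def raw_bytes(n, d, bytes_per_dim=4):
    """Size of n uncompressed float32 vectors."""
    return n * d * bytes_per_dim

def pq_bytes(n, m, bits=8):
    """Size of n PQ codes with m sub-quantizers of `bits` bits each."""
    return n * m * bits // 8

def codebook_bytes(d, k=256, bytes_per_dim=4):
    """m codebooks of k centroids with d/m dims each: k * d floats in total."""
    return k * d * bytes_per_dim

n = 1_000_000_000
for d, m in [(384, 16), (1536, 64)]:
    raw, pq = raw_bytes(n, d), pq_bytes(n, m)
    print(f"D={d} m={m}: raw {raw / 1e12:.2f} TB, "
          f"PQ {pq / 1e9:.0f} GB, ratio {raw / pq:.0f}x, "
          f"codebooks {codebook_bytes(d) / 1e6:.2f} MB")
```

The codebooks themselves are negligible next to the codes: their size depends on k and D only, not on the number of stored vectors.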

Components

Sub-space decomposition

Split vector into m segments

A D-dimensional vector is split into m disjoint segments of D/m dims each. m must divide D; choosing m controls the compression vs quality trade-off.

Per-subspace k-means codebooks

Centroid dictionary per sub-space

m independent codebooks, each typically with k=256 centroids (8 bits). Trained with k-means on the corresponding sub-space sample.

PQ codes (m bytes per vector)

Compressed vector representation

Each vector is represented as an m-tuple of centroid indices (m bytes when k=256). This is what indexes hold in RAM/SSD.

Asymmetric Distance Computation (ADC)

Query algorithm without decompressing the database

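Precompute LUT_j[i] = dist(q_j, c_j^i) per sub-space, then scan the database as m lookups + m additions per vector. Symmetric distance computation (SDC) also quantizes the query, so it can reuse a query-independent centroid-to-centroid table and skip the per-query LUT build, at the cost of lower recall.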
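The difference between the two modes can be sketched as follows. Random codebooks stand in for trained ones, and all sizes and names are illustrative assumptions; the point is only which table each mode reads from.

```python
import numpy as np

rng = np.random.default_rng(0)
m, k, dsub = 4, 256, 2                       # illustrative sizes
codebooks = rng.normal(size=(m, k, dsub))    # stand-in for trained codebooks
code = rng.integers(0, k, size=m)            # PQ code of one database vector
q = rng.normal(size=m * dsub)                # uncompressed query

def decode(c):
    """Rebuild the approximate vector from its m centroid ids."""
    return np.concatenate([codebooks[j, c[j]] for j in range(m)])

def quantize(v):
    """Nearest centroid id in each sub-space."""
    return ((v.reshape(m, 1, dsub) - codebooks) ** 2).sum(-1).argmin(1)

# ADC: table built per query from the raw query segments, shape (m, k)
lut = ((q.reshape(m, 1, dsub) - codebooks) ** 2).sum(-1)
adc = lut[np.arange(m), code].sum()

# SDC: centroid-to-centroid table built once, shape (m, k, k);
# the query is quantized first, which adds its own quantization error
sdc_table = ((codebooks[:, :, None, :] - codebooks[:, None, :, :]) ** 2).sum(-1)
q_code = quantize(q)
sdc = sdc_table[np.arange(m), q_code, code].sum()

exact_to_recon = ((q - decode(code)) ** 2).sum()
print(adc, sdc, exact_to_recon)
```

ADC equals the squared distance from the raw query to the reconstructed database vector, while SDC measures the distance between two reconstructions, which is where the extra recall loss comes from.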

Optional rotation (OPQ)

Pre-processing that minimizes quantization error

OPQ learns an orthogonal matrix R that rotates vectors so variance is spread evenly across sub-spaces, which typically gives 2–5 percentage points higher recall at the same compression.
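One way to learn such a rotation is to alternate between retraining the PQ codebooks on the rotated data and updating R with the reconstructions held fixed. The sketch below shows only the rotation update, assuming that alternating formulation: it is the classic orthogonal Procrustes problem, solved with one SVD. The array y is a synthetic stand-in for PQ reconstructions, and the function name is hypothetical.

```python
import numpy as np

def procrustes_rotation(x, y):
    """Orthogonal R minimizing ||x @ R - y||_F (orthogonal Procrustes)."""
    u, _, vt = np.linalg.svd(x.T @ y)
    return u @ vt

rng = np.random.default_rng(0)
x = rng.normal(size=(500, 8))                        # original vectors
r_true, _ = np.linalg.qr(rng.normal(size=(8, 8)))    # hidden rotation
y = x @ r_true + 0.01 * rng.normal(size=(500, 8))    # stand-in reconstructions

r = procrustes_rotation(x, y)
err_rot = np.linalg.norm(x @ r - y)   # error after the learned rotation
err_id = np.linalg.norm(x - y)        # error with no rotation
print(err_rot, err_id)
```

Because R is orthogonal, it preserves L2 distances between the original vectors, so the rotation changes only how the quantization error is distributed, not the search problem itself.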

Implementation

Reference implementations

FAISS IndexPQ / IndexIVFPQ

C++ / Python · Meta AI Research

Official

Google ScaNN (Anisotropic PQ)

C++ / Python · Google Research

Milvus IVF_PQ / IVF_SQ8

C++ / Go · Zilliz / Milvus

Implementation pitfalls

m not dividing D

Severity: High

Plain PQ requires D to be divisible by m. E.g. for D=384 valid m: 1, 2, 3, 4, 6, 8, 12, 16, 24, 32, 48, 64, 96, 128, 192. Picking m=10 or m=20 throws an error or forces padding.

Fix: Check the divisors of D before picking m; alternatively consider OPQ, whose rotation matrix can also project to a different output dimension that m does divide.
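A small helper makes the check explicit. The function names are hypothetical; padded_dim shows the dimension you would end up with if you chose to zero-pad instead of changing m.

```python
def valid_m(d, max_m=None):
    """All sub-quantizer counts m that divide d evenly, up to max_m."""
    limit = d if max_m is None else min(d, max_m)
    return [m for m in range(1, limit + 1) if d % m == 0]

def padded_dim(d, m):
    """Smallest dimension >= d that m divides (for zero-padding)."""
    return -(-d // m) * m

print(valid_m(384, max_m=192))
# [1, 2, 3, 4, 6, 8, 12, 16, 24, 32, 48, 64, 96, 128, 192]
print(padded_dim(384, 10))
# 390
```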

No L2 normalization for cosine

Severity: Critical

PQ codebooks are trained with k-means under L2 distance. For cosine similarity you must therefore L2-normalize embeddings before training the codebooks; otherwise centroids capture norm differences instead of directional ones.

Fix: L2-normalize before encoding and before query; document the metric choice.
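The reason normalization is enough: on unit vectors, squared L2 distance equals 2 − 2·cos, so ranking by L2 is the same as ranking by cosine. A minimal NumPy sketch (the helper name and the eps guard are assumptions):

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each vector to unit L2 norm; eps guards against zero vectors."""
    x = np.asarray(x, dtype=np.float32)
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)

a = l2_normalize([3.0, 4.0])
b = l2_normalize([4.0, 3.0])
cos = float(a @ b)                     # 0.96
sq_l2 = float(((a - b) ** 2).sum())    # 2 - 2 * 0.96 = 0.08
print(cos, sq_l2)
```

Apply the same function to the training sample, to every database vector before encoding, and to every query.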

Training on a non-representative sample

Severity: High

k-means codebooks are frozen after training. If the training sample doesn't cover the production distribution, recall drops dramatically (drift effect).

Fix: Train on a stratified sample of at least 30·k vectors per sub-space (FAISS recommendation), and retrain periodically when the data drifts.

Skipping OPQ: leaving free recall on the table

Severity: Medium

For AI embeddings (BERT/CLIP/OpenAI) dimensions are often correlated. Vanilla PQ then yields significantly worse recall than OPQ for the same byte budget.

Fix: Default to OPQ (FAISS: an IndexPreTransform wrapping an OPQMatrix in front of the PQ or IVFPQ index).

No re-ranking → misleading recall@k metrics

Severity: Medium

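ADC produces approximate distances, so the raw top-10 from a PQ scan can miss true neighbors that rank slightly lower. Recall@10 measured on the raw ADC results can be much worse than recall@10 obtained by fetching the top-100 ADC candidates and re-ranking them against the full vectors.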

Fix: Keep full vectors in a slower tier (SSD/object storage) and re-rank top-100 → top-10 before returning.
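A minimal sketch of the re-ranking step, assuming the ADC scan has already produced a candidate list. Here full_vectors is an in-memory array standing in for the slower storage tier, and the function name and toy data are hypothetical.

```python
import numpy as np

def rerank(q, candidate_ids, full_vectors, topk):
    """Re-score ADC candidates with exact squared L2 and keep the best topk."""
    cand = full_vectors[candidate_ids]       # fetch full-precision vectors
    exact = ((cand - q) ** 2).sum(1)
    order = np.argsort(exact)[:topk]
    return candidate_ids[order], exact[order]

full = np.array([[0.0, 0.0], [1.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
q = np.array([0.0, 0.1])
candidates = np.array([1, 3, 2, 0])   # order returned by an approximate scan
ids, dists = rerank(q, candidates, full, topk=2)
print(ids, dists)
```

In the toy data the approximate order places vector 1 first, while exact distances put vectors 0 and 2 on top; that reordering is what the re-ranking pass recovers.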

Evolution

Original paper · 2011 · IEEE TPAMI 2011 · Hervé Jégou

Product Quantization for Nearest Neighbor Search

Hervé Jégou, Matthijs Douze, Cordelia Schmid

2011

Jégou, Douze, Schmid — Product Quantization (TPAMI)

Inflection point

The original paper introduces PQ together with two distance-estimation modes: SDC (symmetric) and ADC (asymmetric). Experiments on SIFT/GIST show 32× compression with a small recall drop.

2013

Ge et al. — Optimized Product Quantization (OPQ)

Inflection point

OPQ learns an orthogonal rotation that minimizes quantization error. It is available in FAISS as a pre-transform (OPQMatrix) placed in front of a PQ index.

Optimized Product Quantization (paper)

2014

Babenko & Lempitsky — Additive / Residual Quantization

A family of methods (AQ, RQ, LSQ) generalizing PQ — vector as a sum of centroids from multiple codebooks. Higher recall at the cost of more expensive encoding.

2017

FAISS — open-source PQ with SIMD Fast Scan

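Meta AI (then Facebook AI Research) releases FAISS with CPU and GPU implementations of PQ indexes. The SIMD-accelerated 4-bit PQ Fast Scan was added in later releases and substantially improved PQ performance on CPU.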

2020

Google — ScaNN with Anisotropic Vector Quantization

ScaNN improves PQ with a dot-product-aware (anisotropic) loss — higher recall for cosine/IP, widely used in Google production.

Accelerating Large-Scale Inference with Anisotropic Vector Quantization (paper)