Retrieval

HNSW

2016ActivePublished: 20 May 2026Updated: 20 May 2026Published

Key innovation

A multi-layer "small-world" proximity graph: upper layers carry sparse long-range shortcuts for fast global navigation, lower layers carry dense local neighborhoods for precise refinement. Queries descend layer by layer in time logarithmic in N.

How it works

Construction: (1) For each new vector, sample its maximum layer l ~ Geometric(1/ln(M)) — most end up in l=0, only a few reach higher layers. (2) The vector is inserted from layer l downwards. At each layer, an expanded greedy search collects ef_construction candidate neighbors, then M (or M_max0 for layer 0) final neighbors are picked using the pruning heuristic — points lying in "different directions" from the new node are preferred, not just the closest ones. (3) Edges are bidirectional; when a neighbor exceeds M_max connections its edges are re-pruned too. Search: (a) start from the entry point at the highest existing layer, (b) greedy search at each layer until a local minimum, with fixed ef=1 above layer 0, (c) at layer 0 run an expanded search with a size-ef priority queue (search-time), (d) return top-k from that queue. Parameters M (typically 16–48), ef_construction (100–500) and ef (50–500) control the build time / memory / recall / latency trade-off. Filtering (e.g. WHERE category='X') is usually implemented as filtered search with dynamic ef or as separate per-category graphs.

Problem solved

Exact k-NN in high-dimensional spaces costs O(N · D) per query — infeasible for a million vectors under millisecond SLOs. IVF and KD/ball-tree degrade in high dimensions due to the "curse of dimensionality". HNSW solves this graph-theoretically: it empirically delivers O(log N) query time and ≥ 95% recall at very low latency, which makes it the de facto standard for vector search over AI embeddings at single-node scale.

Components

Hierarchical layered graphIndex structure

Layers 0..L: higher are sparse (long-range shortcuts), lower are dense. The number of layers per vector is sampled geometrically with parameter 1/ln(M).

Greedy beam search per layerQuery algorithm

Above layer 0: ef=1 (pure navigation). At layer 0: a size-ef priority queue (search-time) controls recall/latency.

Diversity pruning heuristicNeighbor selection during construction

Instead of taking the M closest candidates, HNSW prefers candidates spread across different directions (relative-neighborhood-graph style). The key to high recall.

Official

Adjacency lists (per layer)Edge representation

Each node has at most M (M_max0 in layer 0) edges per layer. Bidirectional — insertion updates both sides.

Implementation

Reference implementations

hnswlib

C++ / Python · Yury Malkov (author)

Official

FAISS IndexHNSWFlat

C++ / Python · Meta AI Research

Qdrant HNSW index

Rust · Qdrant

pgvector HNSW

C (Postgres extension) · pgvector

Apache Lucene HNSW

Java · Apache Lucene

Implementation pitfalls

ef (search) too small → low recallHigh

Default ef=10–32 yields <90% recall on real AI embeddings (D≥384). This is the most common reason behind "HNSW is bad" benchmark blogs.

Fix:Tune ef on a validation set to hit the target recall; ef is a per-query knob and needs no rebuild.

Deletes / updates degrade the graphHigh

Vanilla HNSW is append-mostly. Deletes are tombstoned, and inserting new versions doesn't always rewire edges optimally — after many mutations recall drops and a full rebuild becomes necessary.

Fix:Schedule periodic rebuilds; use variants that support deletes with edge repair (Qdrant, pgvector ≥ 0.6).

Filtered search — empty-queue pitfallMedium

Post-hoc filters like WHERE category='X' applied on top-ef results can return fewer than k matching vectors. You must dynamically grow ef or use filtered-HNSW (predicate checked during traversal).

Fix:Use engines with native filtered HNSW (Qdrant, Milvus, Weaviate); for rare categories consider per-category graphs.

RAM blow-up at large M and high DHigh

HNSW keeps full vectors + edges in RAM. For 100M vectors at D=1536 (OpenAI embeddings) with M=32 that is ~700 GB RAM — quickly becomes economically unreasonable.

Fix:For N > ~50M consider HNSW-PQ / HNSW-INT8 or move to DiskANN / IVF-PQ.

Bad entry point placement after rebuildLow

The HNSW entry point is a single highest-layer node. After a full rebuild a suboptimal entry point inflates the number of steps per query — sometimes visible as a latency regression with no recall change.

Fix:Monitor mean hop count; some implementations support warm-up and entry-point selection via sampling.

Evolution

Original paper · 2018 · IEEE TPAMI 2018 (preprint 2016) · Yury A. Malkov

Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs

Yury A. Malkov, Dmitry A. Yashunin

2014

Malkov et al. — NSW (Navigable Small World)

HNSW's predecessor: a single-layer "small world" graph with both short and long edges. Already competitive, but slower than HNSW.

2016

Malkov & Yashunin — HNSW preprint (arXiv:1603.09320)

Inflection point

First HNSW paper: adds hierarchical layers and the diversity pruning heuristic. A breakthrough in ANN quality and speed.

2017

nmslib — reference C++ implementation

Open-source HNSW in nmslib (Non-Metric Space Library) — the first widely used implementation, basis for many ports.

2018

TPAMI publication; hnswlib as standalone

Inflection point

Official IEEE TPAMI publication. Yury Malkov releases hnswlib — a standalone, fast C++/Python library that remains the reference implementation.

2022

HNSW becomes the default index in vector databases

Qdrant, Pinecone, Weaviate, pgvector (≥ 0.5), Elasticsearch (8.0+), OpenSearch and Milvus — all expose HNSW as their primary or only ANN index.

2024

HNSW-PQ / HNSW-INT8 variants in mainstream use

Qdrant, Milvus, Lucene and pgvector add vector compression on top of HNSW (PQ, scalar quantization, INT8) to reduce RAM without giving up the graph.

Hyperparameters (configurable axes)

Max neighbors per node (M)Critical

Maximum neighbors per node in layers >0 (in layer 0 usually M_max0 = 2·M). Larger M means higher recall, more RAM, slower build.

32–48

ef_constructionHigh

Dynamic candidate list size during construction. Larger means better graph quality (higher recall) at the cost of longer build.

100

200–400

ef (search-time)Critical

Priority queue size during query at layer 0. The main runtime recall vs latency knob — can be changed per-query without rebuilding.

128

500

M_max0 (layer 0 cap)Medium

Max neighbor count in layer 0 (the densest). Typically 2·M; raising it improves recall but doubles layer-0 RAM.

2·M

Computational complexity

Time complexity: Query: O(log N) (empirical); Insert: O(log N · ef_construction · D). Space complexity: O(N · (D · 4B + M · 8B · L_avg)).

Compute bottleneck

Distance computations during graph traversal

Most query time is spent on dot products / L2 distances between the query and visited node vectors. SIMD implementations (AVX-512, NEON) are critical for performance.

Execution paradigm

Primary mode

sparse

Only a small subset of graph nodes is visited per query (sparse traversal). This contrasts with PQ (dense scan) and is closer to DiskANN.

Activation pattern

subset_active

Parallelism

Parallelism level

fully_parallel

Queries are independent — throughput scales linearly with cores. Index construction is partly parallel (per-node lock during edge updates).

Scope

inferenceacross_devices

Hardware requirements

Primary

HNSW search is dominated by distance computations (dot product / L2) over small vector sets per step. AVX2/AVX-512 provides the needed throughput. GPU has no edge due to small per-query batches and irregular memory access.

Good fit

HNSW implementations run efficiently on ARM (Graviton, Apple Silicon, NEON) and x86. What matters is memory bandwidth and SIMD, not a specific ISA.

Limited

Irregular graph reads and small per-query batches don't map well to Tensor Cores designed for dense GEMM. GPU can speed up index construction and extreme throughput scenarios, but not the typical single-query path.

Sources

Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs (arXiv:1603.09320)

Paper

IEEE TPAMI / arXiv

hnswlib — official C++/Python implementation

Repository

nmslib

FAISS — HNSW index documentation

Documentation

Meta AI Research

Qdrant — Indexing (HNSW configuration)

Documentation

Qdrant

pgvector — HNSW index

Repository

pgvector