BM25 (Okapi BM25)

1994 · Active · Published

Key innovation

Introduces term-frequency saturation (parameter k1) and document-length normalisation (parameter b), fixing two key weaknesses of TF-IDF.

Category

Retrieval

Abstraction level

Building block

Operation level

Retrieval

Use cases

Default ranking in Elasticsearch / OpenSearch

Full-text search engines (Lucene, Solr)

Sparse retriever in hybrid RAG

Baseline in IR benchmarks (BEIR)

How it works

score(D,Q) = Σ IDF(qi) · ( f(qi,D)·(k1+1) ) / ( f(qi,D) + k1·(1 - b + b·|D|/avgdl) )

The sum runs over the query terms qi of Q. f(qi,D) = frequency of term qi in document D, |D| = length of D, avgdl = average document length in the collection. k1 (typically 1.2–2.0) controls TF saturation; b (typically 0.75) controls the strength of length normalisation. IDF uses the probabilistic variant.
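The formula can be sketched directly in a few lines of Python. This is a minimal illustration, not the Lucene or rank_bm25 implementation: it assumes pre-tokenised documents (lists of strings), uses the IDF variant given under Components, and the function name `bm25_scores` and the toy corpus are made up for the example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25 score per tokenised document for a tokenised query."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        # Length normalisation factor (1 - b + b * |D| / avgdl).
        norm = 1 - b + b * len(d) / avgdl
        score = 0.0
        for q in query:
            f = tf[q]
            if f == 0:
                continue  # terms absent from the document contribute nothing
            idf = math.log((n_docs - df[q] + 0.5) / (df[q] + 0.5) + 1)
            score += idf * f * (k1 + 1) / (f + k1 * norm)
        scores.append(score)
    return scores


# Hypothetical toy corpus, whitespace-tokenised.
corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat around the garden".split(),
    "stock markets fell sharply today".split(),
]
scores = bm25_scores("cat mat".split(), corpus)
```

The first document matches both query terms and is shorter than the second, so it ranks first; the third shares no terms with the query and scores zero.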

Problem solved

In TF-IDF with raw term frequency, the weight grows linearly with the number of occurrences (100 occurrences = 100× the weight of one), and document length is not accounted for consistently. BM25 saturates the contribution of additional occurrences and penalises documents that are longer than average.

Components

TF saturation (k1) · Frequency damping

The term ( f·(k1+1) ) / ( f + k1·... ) makes the term-frequency contribution approach an upper bound of k1+1 as f grows, instead of increasing linearly.
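A short numeric sketch makes the saturation visible. It assumes a document of exactly average length, so the normalisation factor is 1; the helper name `tf_component` is chosen for this example only.

```python
def tf_component(f, k1=1.2, norm=1.0):
    """Term-frequency part of BM25 for a term occurring f times."""
    return f * (k1 + 1) / (f + k1 * norm)


# Contribution for 1, 2, 5, 10 and 100 occurrences of the same term.
frequencies = (1, 2, 5, 10, 100)
values = [tf_component(f) for f in frequencies]

for f, v in zip(frequencies, values):
    print(f"f={f:>3}  contribution={v:.3f}")
```

Each extra occurrence adds less than the previous one, and the value never exceeds k1 + 1 (2.2 with k1 = 1.2), so 100 occurrences weigh only about twice as much as one.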

Length normalisation (b) · Length correction

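The factor (1 - b + b·|D|/avgdl) penalises documents longer than average, limiting their artificial advantage. With b = 0 length is ignored; with b = 1 the term frequency is fully normalised by relative document length.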
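The effect of b can be checked with a small sketch. The helper name `length_norm` and the document lengths are illustrative assumptions.

```python
def length_norm(doc_len, avgdl, b=0.75):
    """Length normalisation factor used in the BM25 denominator."""
    return 1 - b + b * doc_len / avgdl


avgdl = 100
average_doc = length_norm(100, avgdl)   # document of average length
long_doc = length_norm(200, avgdl)      # twice the average length
short_doc = length_norm(50, avgdl)      # half the average length
no_norm = length_norm(200, avgdl, b=0)  # b = 0 switches normalisation off
```

A factor above 1 enlarges the denominator and lowers the score of long documents; a factor below 1 gives short documents a modest boost.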

Probabilistic IDF · Global weight

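IDF(t) = log((N - df + 0.5)/(df + 0.5) + 1), where N is the number of documents in the collection and df the number of documents containing t. This variant is derived from the probabilistic relevance model; the +1 inside the logarithm keeps the weight non-negative even for terms that occur in more than half of the documents.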
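As a rough illustration, the snippet below evaluates this IDF for rare and common terms and compares it with the same expression without the +1, which can turn negative for very common terms. The function names and collection size are assumptions made for the example.

```python
import math


def idf(n_docs, df):
    """IDF with +1 inside the logarithm (always positive)."""
    return math.log((n_docs - df + 0.5) / (df + 0.5) + 1)


def idf_without_plus_one(n_docs, df):
    """Same expression without the +1; negative when df > n_docs / 2."""
    return math.log((n_docs - df + 0.5) / (df + 0.5))


n_docs = 1000
rare = idf(n_docs, 1)          # term in a single document
common = idf(n_docs, 500)      # term in half of the documents
everywhere = idf(n_docs, 1000) # term in every document
negative_case = idf_without_plus_one(n_docs, 900)
```

Rare terms receive a much larger weight than common ones, and a term present in every document still gets a small positive weight rather than a negative one.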

Implementation

Reference implementations

Apache Lucene BM25Similarity

Java · Apache Software Foundation

rank_bm25 (Python)

Python · Dorian Brown

Implementation pitfalls

Tuning k1/b by guesswork instead of on data · Medium

Defaults k1=1.2, b=0.75 are not optimal for every corpus (e.g. short titles vs long articles).

Fix: Tune on a validation set using nDCG/MRR metrics.
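One way such tuning might look is a plain grid search scored by MRR. The sketch below is self-contained and assumes a tiny hand-labelled validation set (one relevant document per query); the corpus, queries, grid values and function names are all hypothetical, and a real setup would use a held-out set from the target corpus.

```python
import itertools
import math
from collections import Counter


def bm25_scores(query, docs, k1, b):
    """Compact BM25 scorer over tokenised documents."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))
    out = []
    for d in docs:
        tf = Counter(d)
        norm = 1 - b + b * len(d) / avgdl
        out.append(sum(
            math.log((n - df[q] + 0.5) / (df[q] + 0.5) + 1)
            * tf[q] * (k1 + 1) / (tf[q] + k1 * norm)
            for q in query if tf[q]
        ))
    return out


def mean_reciprocal_rank(queries, docs, k1, b):
    """MRR over (query_tokens, index_of_relevant_doc) pairs."""
    total = 0.0
    for query, relevant_idx in queries:
        scores = bm25_scores(query, docs, k1, b)
        ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
        total += 1.0 / (ranking.index(relevant_idx) + 1)
    return total / len(queries)


# Hypothetical corpus and validation labels.
docs = [
    "bm25 ranking function for search engines".split(),
    ("a long article about search engines that mentions ranking once among "
     "many other topics such as crawling indexing and caching").split(),
    "dense retrievers embed queries and documents".split(),
]
validation = [
    ("bm25 ranking".split(), 0),
    ("dense retrievers".split(), 2),
    ("crawling indexing".split(), 1),
]

k1_grid = [0.6, 0.9, 1.2, 1.6, 2.0]
b_grid = [0.0, 0.25, 0.5, 0.75, 1.0]
best_mrr, best_k1, best_b = max(
    (mean_reciprocal_rank(validation, docs, k1, b), k1, b)
    for k1, b in itertools.product(k1_grid, b_grid)
)
```

On a corpus this small many parameter pairs tie, so the selected values carry no meaning; the point is only the shape of the loop: score every (k1, b) pair on labelled queries and keep the best.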

No semantic understanding, same as TF-IDF · High

BM25 still matches lexically; synonyms and paraphrases are not captured.

Fix: Combine with a dense retriever (hybrid search) or apply query expansion.
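The text does not prescribe how to combine the two retrievers. One commonly used option is reciprocal rank fusion (RRF), sketched here under the assumption that each retriever returns a ranked list of document ids; the ids and the two example rankings are invented, and k = 60 is the constant conventionally used with RRF.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one list, best first."""
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for the documents it returns.
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


# Hypothetical outputs of a BM25 retriever and a dense retriever.
bm25_ranking = ["d1", "d2", "d3"]
dense_ranking = ["d3", "d1", "d4"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
# fused == ["d1", "d3", "d2", "d4"]
```

RRF works on ranks only, so the BM25 scores and the dense similarity scores do not need to be put on a common scale first. Documents found by both retrievers (d1, d3) rise above those found by only one.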

Evolution

Original paper · 1994 · Stephen E. Robertson

Okapi at TREC-3

Stephen E. Robertson, Steve Walker, Susan Jones, Micheline Hancock-Beaulieu, Mike Gatford

1976

Probabilistic relevance model

Robertson and Spärck Jones formulate the probabilistic retrieval model — the theoretical basis of BM25.

1994

Okapi BM25 at TREC-3

Inflection point

The full BM25 formula presented in the Okapi system at the TREC-3 conference.

2009

"BM25 and Beyond" monograph

Robertson and Zaragoza systematise the BM25 family (including BM25F for fields) in a comprehensive survey.

2021

BM25 as a sparse baseline in the BEIR benchmark

BEIR shows that BM25 remains competitive with dense retrievers in zero-shot retrieval.

Sources

The Probabilistic Relevance Framework: BM25 and Beyond

Foundations and Trends in Information Retrieval

Robertson & Zaragoza (2009) — survey monograph on BM25.