concept

BM25

BM25 is the classic lexical retrieval algorithm — a tuned TF-IDF variant that scores documents by query-term frequency and inverse document frequency, still essential as part of [[hybrid-search]] in 2026.

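BM25 (Best Matching 25) scores documents by how often query terms appear in them, weighted by how rare each term is across the corpus and normalized for document length. Term frequency saturates, so each repeated occurrence of a term adds less to the score than the one before. BM25 is purely lexical, with no notion of semantics, so it misses synonyms and paraphrases that vector embeddings catch. But it excels on exact-match queries (product codes, names, error messages, technical terms), where embeddings often fail. RAG pipelines in 2026 commonly use hybrid search: BM25 and vector retrieval run side by side, and their ranked results are combined via reciprocal rank fusion ([[rrf]]) before reranking. Implementations: Elasticsearch / OpenSearch, Tantivy, and Qdrant with sparse vectors. Postgres `tsvector` and Meilisearch also provide lexical search, but their built-in ranking (Postgres `ts_rank`, Meilisearch's ranking rules) is not BM25, so BM25 scoring in Postgres requires an extension.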
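
To make the scoring concrete, here is a minimal sketch of one common BM25 formulation in pure Python. It is illustrative, not any particular engine's implementation: the function names `bm25_scores` and `tokenize`, the whitespace tokenizer, the defaults `k1=1.2` and `b=0.75`, and the non-negative IDF variant are all assumptions chosen for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Rarer terms get a higher weight; the "1 +" keeps IDF non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer-than-average documents are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            # Term frequency saturates: the ratio approaches (k1 + 1) as tf grows.
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


corpus = [
    "connection refused error E1234 on startup",
    "how to fix a network problem",
    "error handling best practices",
]
docs = [tokenize(text) for text in corpus]
scores = bm25_scores(tokenize("error E1234"), docs)
```

In this toy corpus the first document scores highest because it contains the rare token `E1234` as well as `error`, while the second document shares no terms with the query and scores zero. That is the exact-match behavior described above, and also the blind spot: "network problem" is never matched to "connection refused".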

When to use BM25

Use it for exact-match queries (codes, names, error strings), and in hybrid search alongside vector retrieval.

Common mistakes

Skipping BM25, since vector-only retrieval misses exact-match queries. Using BM25 alone for semantic queries, where synonyms and paraphrases get missed.

FAQ

What is BM25?

BM25 is the classic lexical retrieval algorithm — a tuned TF-IDF variant that scores documents by query-term frequency and inverse document frequency, still essential as part of [[hybrid-search]] in 2026.

When should I use BM25?

Use BM25 for exact-match queries (codes, names, error strings), and as the lexical half of hybrid search alongside vector retrieval.
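
As a rough illustration of the hybrid case, the sketch below fuses a BM25 ranking and a vector ranking with reciprocal rank fusion. The function name `reciprocal_rank_fusion`, the document IDs, and the constant `k=60` (a commonly used default) are assumptions for the example, not part of any specific library.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking."""
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); higher ranks contribute more.
            fused[doc_id] += 1.0 / (k + rank)
    # Sort document IDs by fused score, best first.
    return sorted(fused, key=lambda doc_id: fused[doc_id], reverse=True)


# Hypothetical top results from each retriever, best first.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
# fused == ["doc_b", "doc_a", "doc_d", "doc_c"]
```

Because the fusion uses only ranks, BM25 scores and vector similarities never need to be put on a common scale. Documents that both retrievers return (`doc_a`, `doc_b`) rise above documents found by only one.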

What are the most common mistakes with BM25?

Skipping BM25, since vector-only retrieval misses exact-match queries. Using BM25 alone for semantic queries, where synonyms and paraphrases get missed.

Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/bm25.md.