# BM25

**Source:** https://promtable.com/glossary/bm25

> BM25 is the classic lexical retrieval algorithm — a tuned TF-IDF variant that scores documents by query-term frequency and inverse document frequency, still essential as part of [[hybrid-search]] in 2026.

---

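BM25 (Best Matching 25) scores each document by how often the query terms appear in it, weights each term by its rarity across the corpus (inverse document frequency), and normalizes for document length so that long documents are not favored merely for containing more words. Two parameters tune the behavior: `k1` controls how quickly repeated occurrences of a term saturate, and `b` controls the strength of length normalization. BM25 is purely lexical, with no notion of semantics, so it misses synonyms and paraphrases that vector embeddings catch. It often outperforms embeddings on exact-match queries (product codes, names, error messages, technical terms), where embeddings tend to fail. RAG pipelines in 2026 therefore typically use hybrid search: BM25 and vector retrieval run side by side, and their result lists are combined via reciprocal rank fusion ([[rrf]]) before reranking. Implementations: Elasticsearch / OpenSearch, Tantivy, Qdrant + sparse vectors. Postgres `tsvector` and MeiliSearch are common lexical-search options as well, but their built-in ranking (`ts_rank` in Postgres, ranking rules in MeiliSearch) is not BM25; in Postgres, BM25 scoring requires an extension.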
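The scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the function names `tokenize` and `bm25_scores` are hypothetical, the whitespace tokenizer is a simplifying assumption, and the defaults `k1 = 1.2` and `b = 0.75` are commonly used values rather than anything prescribed by this page. The IDF uses the non-negative `log(1 + ...)` variant.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: naive lowercase + whitespace tokenizer; real engines
    # use analyzers with stemming, stop words, and so on.
    return text.lower().split()


def bm25_scores(query: str, docs: list[str],
                k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Return one BM25 score per document for the given query."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs

    # Document frequency: in how many documents each term occurs.
    df: Counter = Counter()
    for toks in tokenized:
        df.update(set(toks))

    query_terms = tokenize(query)
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue  # purely lexical: no match, no contribution
            # Rare terms get a higher weight (non-negative IDF variant).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b scales length normalization.
            length_norm = 1 - b + b * len(toks) / avgdl
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    "error code E1234 in payment service",
    "the payment failed with a timeout",
    "how to reset your password",
]
print(bm25_scores("E1234", docs))    # only the first document matches
print(bm25_scores("payment", docs))  # the first two documents match
```

With the exact-match query `E1234`, only the document containing that token receives a non-zero score. A query such as `billing` would score zero everywhere, even though the first two documents are topically related, which is the synonym gap that vector retrieval is meant to cover.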

## When to use

- Exact-match queries (codes, names, error strings).
- Hybrid search alongside vector retrieval.
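For the hybrid case, the fusion step can be sketched as follows. This is a minimal illustration of reciprocal rank fusion under stated assumptions: the function name `reciprocal_rank_fusion` and the document IDs are hypothetical, and `k = 60` is a commonly used constant, not a value taken from this page. Each input is a list of document IDs ordered best-first, one list per retriever.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]],
                           k: int = 60) -> list[tuple[str, float]]:
    """Fuse several best-first rankings into one list of (doc_id, score)."""
    fused: defaultdict = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each retriever contributes 1 / (k + rank) for a document;
            # documents missing from a ranking contribute nothing.
            fused[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc_id for determinism.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from the two retrievers.
bm25_ranking = ["d1", "d2", "d3"]
vector_ranking = ["d3", "d1", "d4"]

for doc_id, score in reciprocal_rank_fusion([bm25_ranking, vector_ranking]):
    print(doc_id, round(score, 5))
```

Because RRF uses only ranks, the BM25 and vector scores never need to be put on a common scale. Documents that both retrievers rank highly (`d1` and `d3` here) rise above documents found by only one of them.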

## Common mistakes

- Skipping BM25 — vector-only retrieval misses exact-match queries.
- Using BM25 alone for semantic queries — synonyms / paraphrases get missed.

## Related terms

- [hybrid-search](https://promtable.com/glossary/hybrid-search)
- [rrf](https://promtable.com/glossary/rrf)
- [embeddings](https://promtable.com/glossary/embeddings)

*Last updated: 2026-06-01*
---

Maintained by Promtable (https://promtable.com). Content: CC BY 4.0. Cite as "Promtable — https://promtable.com/glossary/bm25".
Contact: info@vibecodingturkey.com.