Reranker
A reranker is a model, typically a small cross-encoder, that takes a query and a candidate document and outputs a relevance score. It runs as the second stage after embedding retrieval to push the most relevant results to the top.
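Two-stage retrieval pipelines work like this: stage 1 uses fast vector or BM25 search to fetch the top 50-200 candidates; stage 2 runs a reranker over each (query, candidate) pair to produce the final ordering. Rerankers are slower per pair than embeddings because a cross-encoder attends over both texts jointly instead of encoding each one separately, but that joint view of query and document is also why they rank materially better.
Rerankers used in production as of 2026 include Cohere Rerank 3, Voyage Rerank, Jina Reranker v2, and BAAI bge-reranker, plus ColBERT, which is a late-interaction model rather than a cross-encoder. Reranking is commonly reported to improve RAG accuracy by roughly 10-30% over embeddings alone while adding under 100 ms of latency per query, though the actual gain depends on the dataset and the model. Cost matters: every query triggers one model inference per candidate, so reranking 100 candidates per query adds up at scale.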
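The two stages described above can be sketched in a few lines. This is a minimal illustration, not a production implementation: `first_stage`, `toy_score_pair`, `rerank`, the stopword list, and the sample corpus are all hypothetical, and the toy scorer is only a stand-in for a real cross-encoder.

```python
from typing import Callable, List, Tuple

# Hypothetical stopword list, used only by the toy scorer below.
STOPWORDS = {"how", "to", "a", "the", "your", "from"}


def tokens(text: str) -> set:
    """Lowercased whitespace tokens as a set."""
    return set(text.lower().split())


def first_stage(query: str, corpus: List[str], top_n: int) -> List[str]:
    """Cheap stage-1 stand-in: rank documents by raw term overlap with the query."""
    q = tokens(query)
    ranked = sorted(corpus, key=lambda doc: len(q & tokens(doc)), reverse=True)
    return ranked[:top_n]


def toy_score_pair(query: str, doc: str) -> float:
    """Stand-in for a cross-encoder: counts shared content (non-stopword) terms."""
    content_terms = tokens(query) - STOPWORDS
    return float(len(content_terms & tokens(doc)))


def rerank(
    query: str,
    candidates: List[str],
    score_pair: Callable[[str, str], float],
    top_k: int,
) -> List[Tuple[str, float]]:
    """Stage 2: score every (query, candidate) pair, then sort by score."""
    scored = [(doc, score_pair(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


corpus = [
    "how to reset a router to factory settings",
    "reset your password from the account settings page",
    "the password policy requires twelve characters",
    "billing settings and invoices",
]
query = "how to reset password"

candidates = first_stage(query, corpus, top_n=3)
final = rerank(query, candidates, toy_score_pair, top_k=2)
```

Here the cheap first stage puts the router document first because it shares the most raw terms, and the second stage moves the password document to the top. In a real pipeline, `score_pair` would call an actual reranker model (for example the `CrossEncoder` class in sentence-transformers, or a hosted rerank API), usually batching all pairs into one call rather than scoring them one at a time.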
When to use a reranker
- Any production RAG pipeline above toy scale.
- When embedding-only top-K is too noisy.
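One way to judge whether embedding-only top-K is too noisy is to measure it on a small labeled set before and after reranking. The sketch below assumes hypothetical document IDs and relevance labels; `recall_at_k` and `reciprocal_rank` are illustrative helper names, not part of any particular library.

```python
from typing import List, Set


def recall_at_k(ranked_ids: List[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & relevant_ids)
    return hits / len(relevant_ids)


def reciprocal_rank(ranked_ids: List[str], relevant_ids: Set[str]) -> float:
    """1 / position of the first relevant result, or 0.0 if none is retrieved."""
    for position, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / position
    return 0.0


# Hypothetical embedding-only ranking for one query, plus its relevance labels.
embedding_only = ["d7", "d2", "d9", "d4", "d1"]
relevant = {"d1", "d4"}

r3 = recall_at_k(embedding_only, relevant, k=3)   # no relevant doc in the top 3
r5 = recall_at_k(embedding_only, relevant, k=5)   # both relevant docs in the top 5
rr = reciprocal_rank(embedding_only, relevant)    # first relevant doc is at position 4
```

A ranking like this one, where the relevant documents are retrieved but sit below the cutoff that gets passed to the LLM, is the case a reranker is meant to fix. Averaging these metrics over a set of queries, with and without the reranker, gives a direct comparison.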
Common mistakes
- Reranking too many candidates: returns diminish past roughly the top 100, while cost and latency keep growing with every extra candidate.
- Skipping rerank in production: pure embedding search tends to plateau at around 70% recall@10 on hard queries.
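The trade-off behind the first mistake is simple arithmetic: reranker work scales linearly with candidate depth. The sketch below uses made-up traffic, batch size, and per-batch latency figures purely for illustration, and assumes batches run one after another; measure real values for your own model and hardware.

```python
import math


def rerank_load(queries_per_day: int, candidates_per_query: int) -> int:
    """Total (query, candidate) pairs the reranker must score per day."""
    return queries_per_day * candidates_per_query


def added_latency_ms(candidates_per_query: int, batch_size: int, ms_per_batch: float) -> float:
    """Added latency per query if candidates are scored in sequential batches."""
    return math.ceil(candidates_per_query / batch_size) * ms_per_batch


# Hypothetical numbers, for illustration only.
QUERIES_PER_DAY = 1_000_000
BATCH_SIZE = 32
MS_PER_BATCH = 20.0

table = {
    depth: (
        rerank_load(QUERIES_PER_DAY, depth),
        added_latency_ms(depth, BATCH_SIZE, MS_PER_BATCH),
    )
    for depth in (25, 50, 100, 200)
}

for depth, (pairs, latency) in table.items():
    print(f"top {depth}: {pairs:,} pairs/day, about {latency:.0f} ms added per query")
```

Under these assumed figures, going from 100 to 200 candidates doubles the daily pair count and pushes the added latency from 80 ms to 140 ms, which is only worth paying if an offline evaluation shows the deeper candidate set actually improves the final ranking.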
Related terms
- Embeddings — Embeddings are dense numeric vectors that represent the meaning of text, images, or other data, allowing similarity to be measured as vector distance.
- Hybrid search (retrieval) — Hybrid search combines dense vector retrieval with sparse keyword (BM25) retrieval, then fuses the two ranked lists — the production retrieval default for RAG in 2026.
- BM25: BM25 is the classic lexical retrieval algorithm, a tuned TF-IDF variant that scores documents by query-term frequency and inverse document frequency with document-length normalization. It remains essential as part of hybrid search in 2026.
Sources
Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/reranker.md.