Reranker

A reranker is a small cross-encoder model that takes a query and a candidate document and outputs a relevance score. It runs as the second stage after embedding retrieval to push the right answer to the top.

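A two-stage retrieval pipeline works like this: stage 1 uses fast vector or BM25 search to fetch the top 50 to 200 candidates; stage 2 runs a reranker over each (query, candidate) pair to produce the final ordering. Rerankers are slower per pair than embedding models because they run cross-attention over both texts together instead of encoding each one separately, but that joint view of query and document is also why their relevance scores are materially better.

Rerankers in production use as of 2026 include Cohere Rerank 3, Voyage Rerank, Jina Reranker v2, BAAI bge-reranker, and ColBERT (strictly a late-interaction model rather than a cross-encoder, but used in the same second-stage role). Reranking typically improves RAG accuracy by 10-30% over embeddings alone while adding under 100 ms of latency per query. Cost still matters: reranking 100 candidates per query means scoring 100 pairs for every query, so 1 million queries per day becomes 100 million (query, candidate) pairs per day.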
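The two stages described above can be sketched in a few lines. This is a minimal illustration, not a production implementation: `cheap_overlap_score` stands in for vector or BM25 search, and `toy_pair_score` stands in for a real cross-encoder. Both functions, along with `retrieve_then_rerank` and its parameter names, are hypothetical and exist only to show the control flow.

```python
from typing import Callable, List, Set, Tuple

# A scorer takes (query, document) and returns a relevance score.
Scorer = Callable[[str, str], float]


def tokenize(text: str) -> List[str]:
    return text.lower().split()


def cheap_overlap_score(query: str, doc: str) -> float:
    """Stage-1 stand-in (assumption): fraction of query tokens found in the doc."""
    q, d = set(tokenize(query)), set(tokenize(doc))
    return len(q & d) / len(q) if q else 0.0


def bigrams(tokens: List[str]) -> Set[Tuple[str, str]]:
    return set(zip(tokens, tokens[1:]))


def toy_pair_score(query: str, doc: str) -> float:
    """Stage-2 stand-in (assumption): looks at query and doc together and
    rewards query word pairs that appear adjacent and in order in the doc.
    A real reranker would be a trained cross-encoder model instead."""
    q_bi = bigrams(tokenize(query))
    if not q_bi:
        return cheap_overlap_score(query, doc)
    return len(q_bi & bigrams(tokenize(doc))) / len(q_bi)


def retrieve_then_rerank(
    query: str,
    corpus: List[str],
    first_stage: Scorer,
    reranker: Scorer,
    first_k: int = 100,
    final_k: int = 10,
) -> List[Tuple[float, str]]:
    # Stage 1: cheap scoring over the whole corpus, keep the top first_k.
    candidates = sorted(
        corpus, key=lambda doc: first_stage(query, doc), reverse=True
    )[:first_k]
    # Stage 2: score each (query, candidate) pair, then reorder by that score.
    scored = [(reranker(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:final_k]


corpus = [
    "password policy and how to reset the router",
    "how to reset password for your account",
    "billing faq",
]
for score, doc in retrieve_then_rerank(
    "reset password", corpus, cheap_overlap_score, toy_pair_score, first_k=2, final_k=2
):
    print(f"{score:.2f}  {doc}")
```

In this toy run both of the first two documents tie in stage 1 because each contains both query words, and only the pairwise scorer in stage 2 separates them. The cost point is also visible in the structure: the reranker is called once per candidate, so `first_k` directly sets the number of pair scorings per query.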

When to use a reranker

Common mistakes

FAQ

What is a reranker?

A reranker is a small cross-encoder model that takes a query and a candidate document and outputs a relevance score. It runs as the second stage after embedding retrieval to push the right answer to the top.

When should I use a reranker?

Use one in any production RAG pipeline above toy scale, and whenever the embedding-only top-K results are too noisy.

What are the most common mistakes with rerankers?

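Two mistakes come up most often. The first is reranking too many candidates: returns diminish past the top 100, while cost and latency keep growing with every extra pair. The second is skipping the reranker in production: pure embedding search caps at roughly 70% recall@10 on hard queries.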
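Both the recall@10 ceiling and the point where extra candidates stop helping depend on the dataset, so they are worth measuring directly. A minimal sketch of recall@k follows, assuming you have a set of labeled relevant document ids per query. The function names, query ids, and document ids are all hypothetical toy values, not real results.

```python
from typing import Dict, List, Set


def recall_at_k(ranked_ids: List[str], relevant_ids: Set[str], k: int = 10) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    return len(set(ranked_ids[:k]) & relevant_ids) / len(relevant_ids)


def mean_recall_at_k(
    runs: Dict[str, List[str]], labels: Dict[str, Set[str]], k: int = 10
) -> float:
    """Average recall@k over every labeled query."""
    if not labels:
        return 0.0
    total = sum(recall_at_k(runs.get(q, []), rel, k) for q, rel in labels.items())
    return total / len(labels)


# Toy data (assumption): the same candidates, ordered two different ways.
labels = {"q1": {"d3"}, "q2": {"d7", "d9"}}
embedding_only = {"q1": ["d1", "d2", "d3"], "q2": ["d7", "d4", "d5"]}
reranked = {"q1": ["d3", "d1", "d2"], "q2": ["d7", "d9", "d4"]}

print(mean_recall_at_k(embedding_only, labels, k=2))  # 0.25
print(mean_recall_at_k(reranked, labels, k=2))        # 1.0
```

Running the same measurement while varying the number of candidates passed to the reranker (for example 25, 50, 100, 200) shows where the curve flattens for a given corpus.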

Sources

Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/reranker.md.