# Retrieval evals

**Source:** https://promtable.com/glossary/retrieval-evals

> Retrieval evals measure how well a RAG system's retrieval stage performs (Recall@K, nDCG@K, coverage), separately from the generation quality of the answer.

---

Retrieval and generation are different problems with different failure modes. Retrieval evals isolate the retrieval stage. Given a labelled set of (query, ideal_doc_ids) pairs, they ask three questions:

- Recall@K: what fraction of the ideal docs appear in the top K results?
- nDCG@K: how well are the relevant docs ordered within the top K?
- Coverage: what fraction of queries have at least one relevant doc retrieved?

Common tooling includes Ragas, LlamaIndex evaluators, or custom Python with scikit-learn metrics. The discipline: never debug a bad RAG answer without first checking whether retrieval actually pulled the right docs. In some cases retrieval was correct and the generation prompt is the bug; in others the right docs were never retrieved at all.
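
The three metrics can be sketched in plain Python. This is a minimal illustration, not a reference implementation: it assumes binary relevance (a doc is either in the ideal set or not), unique doc IDs in each ranked list, and a non-empty ideal set per query. The names `recall_at_k`, `ndcg_at_k`, `evaluate`, and the sample data are hypothetical.

```python
import math
from typing import Dict, List, Set


def recall_at_k(retrieved: List[str], ideal: Set[str], k: int) -> float:
    """Fraction of ideal docs that appear in the top-k retrieved results."""
    if not ideal:
        return 0.0
    return len(set(retrieved[:k]) & ideal) / len(ideal)


def ndcg_at_k(retrieved: List[str], ideal: Set[str], k: int) -> float:
    """nDCG@k with binary relevance: gain is 1 for an ideal doc, else 0."""
    # Discounted gain of the actual ranking (rank is 0-based, hence +2).
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, doc_id in enumerate(retrieved[:k])
        if doc_id in ideal
    )
    # Best achievable DCG: all ideal docs packed at the top of the list.
    best_hits = min(len(ideal), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(best_hits))
    return dcg / idcg if idcg > 0 else 0.0


def evaluate(
    labelled: Dict[str, Set[str]],  # query -> ideal doc IDs
    runs: Dict[str, List[str]],     # query -> ranked retrieved doc IDs
    k: int,
) -> Dict[str, float]:
    """Average Recall@k and nDCG@k over queries, plus coverage."""
    recalls, ndcgs, covered = [], [], 0
    for query, ideal in labelled.items():
        retrieved = runs.get(query, [])
        recalls.append(recall_at_k(retrieved, ideal, k))
        ndcgs.append(ndcg_at_k(retrieved, ideal, k))
        # Coverage counts queries with at least one ideal doc in the top k.
        if set(retrieved[:k]) & ideal:
            covered += 1
    n = len(labelled)
    return {
        "recall@k": sum(recalls) / n,
        "ndcg@k": sum(ndcgs) / n,
        "coverage": covered / n,
    }


if __name__ == "__main__":
    labelled = {"q1": {"a", "b"}, "q2": {"c"}}
    runs = {"q1": ["a", "x", "b"], "q2": ["y", "z", "w"]}
    print(evaluate(labelled, runs, k=3))
```

In this toy run, `q1` retrieves both ideal docs and `q2` retrieves none, so mean Recall@3 and coverage both come out to 0.5. The nDCG@3 for `q1` falls below 1 because the second relevant doc sits at rank 3 instead of rank 2.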

## When to use

- Any production RAG system.
- Whenever RAG answer quality regresses.
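
When a specific answer regresses, the first check is whether the ideal docs were retrieved at all. The sketch below shows one way to triage a failing query. The function name `triage` and the three labels it returns are hypothetical, and it assumes the query has a non-empty labelled ideal set.

```python
from typing import List, Set


def triage(retrieved: List[str], ideal: Set[str], k: int) -> str:
    """Classify a failing query by what retrieval returned in the top k."""
    hits = set(retrieved[:k]) & ideal
    if not hits:
        # None of the ideal docs reached the generator: look at retrieval.
        return "retrieval_miss"
    if hits != ideal:
        # Some ideal docs are missing: retrieval is at least partly at fault.
        return "partial_retrieval"
    # Every ideal doc was in the context: inspect the generation prompt.
    return "check_generation"


if __name__ == "__main__":
    print(triage(["a", "x", "b"], {"a", "b"}, k=3))
    print(triage(["x", "y", "z"], {"a"}, k=3))
```

Running this over every failing query in the labelled set gives a rough split between retrieval-side and generation-side failures before any prompt changes are made.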

## Common mistakes

- Combining retrieval and generation evals into a single score, which masks which stage is failing.
- Using a labelled set that doesn't reflect the production query distribution.

## Related terms

- [rag](https://promtable.com/glossary/rag)
- [evals](https://promtable.com/glossary/evals)
- [semantic-search](https://promtable.com/glossary/semantic-search)
- [hybrid-search](https://promtable.com/glossary/hybrid-search)

*Last updated: 2026-06-01*

---

Original page: https://promtable.com/glossary/retrieval-evals
Maintained by Promtable (https://promtable.com). Content: CC BY 4.0. Cite as "Promtable — https://promtable.com/glossary/retrieval-evals".
Contact: info@vibecodingturkey.com.