# Speculative RAG

**Source:** https://promtable.com/glossary/speculative-rag

> Speculative RAG uses a small, fast model to draft an answer and flag the claims it is uncertain about, then retrieves evidence and verifies only those claims with the strong model, saving cost on the confident parts.

---

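In this usage, speculative RAG reorders the standard RAG pipeline: rather than retrieving for every query up front, a cheap draft model first produces an initial answer and flags the claims it is uncertain about. The strong verifier model then retrieves evidence and corrects only those flagged claims, so confident claims incur no retrieval or strong-model cost. The name comes from Wang et al. (2024), whose method is structured differently: a smaller specialist model drafts several answers in parallel, each from a distinct subset of the retrieved documents, and a larger generalist model verifies the drafts. In both forms the goal is to approach full-RAG quality at lower cost or latency. Production stacks in 2026 use the pattern for high-volume search where most queries are common knowledge but a long tail needs grounding.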
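The draft, flag, and verify flow described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the `Claim` type, the `speculative_rag` function, the `0.8` threshold, and the toy stand-ins for the draft model, retriever, and verifier are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Claim:
    text: str
    confidence: float  # draft model's self-reported confidence in [0, 1]


def speculative_rag(
    query: str,
    draft: Callable[[str], List[Claim]],
    retrieve: Callable[[str], List[str]],
    verify: Callable[[str, List[str]], str],
    threshold: float = 0.8,  # assumed cutoff; should come from calibration
) -> List[str]:
    """Return final claim texts; only low-confidence claims are escalated."""
    final = []
    for claim in draft(query):
        if claim.confidence >= threshold:
            # Accepted as drafted: no retrieval or strong-model call.
            final.append(claim.text)
        else:
            # Flagged: fetch evidence and let the strong model correct it.
            evidence = retrieve(claim.text)
            final.append(verify(claim.text, evidence))
    return final


# Toy stand-ins so the sketch runs end to end (not real models).
def toy_draft(query: str) -> List[Claim]:
    return [
        Claim("Paris is the capital of France.", 0.97),
        Claim("The Eiffel Tower is 350 m tall.", 0.41),
    ]


def toy_retrieve(claim_text: str) -> List[str]:
    return ["The Eiffel Tower is about 330 m tall."]


def toy_verify(claim_text: str, evidence: List[str]) -> str:
    # A real verifier would reason over the evidence; this one just adopts it.
    return evidence[0] if evidence else claim_text


answer = speculative_rag("Tell me about Paris.", toy_draft, toy_retrieve, toy_verify)
```

In this toy run, the first claim passes through untouched and only the second triggers retrieval and verification, which is where the cost saving comes from.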

## When to use

- High-volume search where most queries can be answered from common knowledge.
- Cost-sensitive RAG at scale.
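
Whether the pattern pays off depends on how often the draft has to be escalated. The sketch below is a back-of-the-envelope cost model under simplifying assumptions: escalation is counted per query rather than per claim, every query pays the draft cost, and an escalated query pays the full retrieval-plus-strong-model cost. The function names and the idea of a single `flagged_rate` are illustrative, not taken from the paper.

```python
def expected_cost_per_query(draft_cost: float, verify_cost: float, flagged_rate: float) -> float:
    """Expected cost when every query is drafted and a fraction is escalated."""
    if not 0.0 <= flagged_rate <= 1.0:
        raise ValueError("flagged_rate must be in [0, 1]")
    return draft_cost + flagged_rate * verify_cost


def break_even_flagged_rate(draft_cost: float, verify_cost: float) -> float:
    """Flagged rate above which drafting first costs more than always running
    full RAG (whose per-query cost is assumed to equal verify_cost)."""
    if verify_cost <= 0:
        raise ValueError("verify_cost must be positive")
    return 1.0 - draft_cost / verify_cost


# Hypothetical cost units: the draft is 10x cheaper than retrieval + strong model.
cost = expected_cost_per_query(draft_cost=1.0, verify_cost=10.0, flagged_rate=0.2)
limit = break_even_flagged_rate(draft_cost=1.0, verify_cost=10.0)
```

With these made-up numbers, escalating 20% of queries costs roughly 3 units per query against 10 for always-on full RAG, and the approach stops saving money once about 90% of queries are flagged.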

## Common mistakes

- Trusting the draft model's confidence claims without calibration.
- Skipping verification on claims that look confident but aren't.
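
One way to guard against both mistakes is to set the acceptance threshold from labeled data rather than taking the draft model's confidence at face value. The following is a minimal sketch, assuming a validation set of `(draft_confidence, claim_was_correct)` pairs is available; `pick_threshold` and `target_precision` are hypothetical names introduced for illustration.

```python
from typing import List, Optional, Tuple


def pick_threshold(
    samples: List[Tuple[float, bool]], target_precision: float
) -> Optional[float]:
    """Lowest confidence cutoff whose accepted claims meet target_precision.

    A claim is "accepted" when its confidence is >= the cutoff.
    Returns None if no cutoff reaches the target.
    """
    ordered = sorted(samples, key=lambda s: s[0], reverse=True)
    best = None
    correct = 0
    for i, (conf, is_correct) in enumerate(ordered, start=1):
        correct += is_correct
        # Tied confidences share one cutoff: evaluate only after the last of them.
        if i < len(ordered) and ordered[i][0] == conf:
            continue
        if correct / i >= target_precision:
            best = conf  # keep lowering the cutoff while the target holds
    return best


# Hypothetical validation data: the 0.85 claim is confidently wrong.
validation = [
    (0.95, True), (0.90, True), (0.85, False),
    (0.80, True), (0.60, False), (0.50, False),
]
cutoff = pick_threshold(validation, target_precision=0.75)
```

On this toy data the function returns a cutoff of 0.8, where three of the four accepted claims are correct. Requiring perfect precision pushes the cutoff up to 0.9, which excludes the overconfident 0.85 claim and sends more traffic to the verifier.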

## Related terms

- [rag](https://promtable.com/glossary/rag)
- [speculative-decoding](https://promtable.com/glossary/speculative-decoding)
- [self-correction](https://promtable.com/glossary/self-correction)
- [chain-of-verification](https://promtable.com/glossary/chain-of-verification)

## Sources

- [Speculative RAG (arXiv)](https://arxiv.org/abs/2407.08223)

*Last updated: 2026-06-01*

---

Original page: https://promtable.com/glossary/speculative-rag
Maintained by Promtable (https://promtable.com). Content: CC BY 4.0. Cite as "Promtable — https://promtable.com/glossary/speculative-rag".
Contact: info@vibecodingturkey.com.