
Speculative RAG

Speculative RAG uses a small, fast model to draft an answer and flag the claims it is uncertain about. A strong model then retrieves evidence and verifies only those flagged claims, which saves cost on the parts the draft is confident about.

Speculative RAG (Wang et al., 2024) inverts the standard RAG pipeline: instead of retrieving first and generating second, a cheap draft model produces an initial answer and flags the claims it is uncertain about. The strong verifier model then retrieves evidence for the flagged claims only and corrects them where needed. The approach is reported to match full-RAG quality at much lower cost on queries the draft model answers confidently. Production stacks in 2026 use the pattern for high-volume search, where most queries are common knowledge but a tail needs grounding.
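The draft-then-verify flow described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `Claim` type, the `draft` and `verify` callables, the `0.8` threshold, and the toy stand-ins are all hypothetical names and values chosen for the example. A real system would plug in a small model for `draft` and a retriever plus strong model for `verify`.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Claim:
    text: str
    confidence: float  # draft model's self-reported confidence in [0, 1] (assumed)


def speculative_rag(
    query: str,
    draft: Callable[[str], List[Claim]],
    verify: Callable[[str, str], str],
    threshold: float = 0.8,  # hypothetical cutoff; should come from calibration
) -> List[str]:
    """Keep confident draft claims as-is; send the rest to the verifier."""
    final = []
    for claim in draft(query):
        if claim.confidence >= threshold:
            final.append(claim.text)  # accepted without retrieval
        else:
            final.append(verify(query, claim.text))  # retrieval + strong model
    return final


# Toy stand-ins so the sketch runs without any model or index.
def toy_draft(query: str) -> List[Claim]:
    return [
        Claim("Water boils at 100 C at sea level.", 0.97),
        Claim("The release shipped in March.", 0.40),
    ]


def toy_verify(query: str, claim: str) -> str:
    # A real verifier would retrieve evidence and rewrite the claim if needed.
    return f"[verified] {claim}"


answer = speculative_rag("When did it ship?", toy_draft, toy_verify)
```

Only the second claim falls below the threshold, so only that claim pays the cost of retrieval and the strong model.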

When to use Speculative RAG

Common mistakes

FAQ

What is Speculative RAG?

A pipeline in which a small, fast model drafts an answer and flags what it is uncertain about, and a strong model retrieves evidence and verifies only the flagged claims. The confident parts of the draft skip retrieval, which is where the cost saving comes from.

When should I use Speculative RAG?

Use it for high-volume search where most queries are common knowledge and only a tail needs grounding, and for cost-sensitive RAG at scale.
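Whether the pattern pays off depends on how often the verifier is actually needed. A rough back-of-the-envelope model, assuming each query either skips verification entirely or pays the full verifier cost (a simplification), looks like this. The function names and the cost units are hypothetical and not taken from the source.

```python
def expected_cost_per_query(draft_cost: float, verify_cost: float,
                            flagged_fraction: float) -> float:
    """Every query pays for the draft; only the flagged fraction pays for verification."""
    return draft_cost + flagged_fraction * verify_cost


def break_even_flagged_fraction(draft_cost: float, verify_cost: float) -> float:
    """Flagged fraction at which the pattern costs the same as always running full RAG.

    Derived from draft_cost + f * verify_cost = verify_cost.
    """
    return 1.0 - draft_cost / verify_cost


# Made-up cost units, for illustration only.
cost = expected_cost_per_query(draft_cost=2.0, verify_cost=8.0, flagged_fraction=0.25)
limit = break_even_flagged_fraction(draft_cost=2.0, verify_cost=8.0)
```

Under these assumptions the pattern is cheaper than full RAG only while the flagged fraction stays below the break-even value, which is why it suits traffic dominated by common-knowledge queries.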

What are the most common mistakes with Speculative RAG?

Trusting the draft model's stated confidence without calibrating it against real outcomes, and skipping verification on claims that look confident but are actually wrong.
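One way to guard against both mistakes is to check the draft model's confidence scores against a labeled validation set before choosing a cutoff. The sketch below is an illustration under stated assumptions: the helper names, the candidate thresholds, the 0.95 accuracy target, and the sample data are all made up for the example.

```python
from typing import List, Optional, Sequence, Tuple

# Each entry: (draft confidence, whether the claim was actually correct).
Labeled = List[Tuple[float, bool]]


def accuracy_above(labeled: Labeled, threshold: float) -> float:
    """Fraction of claims at or above `threshold` that were actually correct."""
    kept = [correct for conf, correct in labeled if conf >= threshold]
    if not kept:
        return 1.0  # nothing skips verification, so nothing wrong slips through
    return sum(kept) / len(kept)


def pick_threshold(
    labeled: Labeled,
    target: float = 0.95,
    candidates: Sequence[float] = (0.5, 0.6, 0.7, 0.8, 0.9),
) -> Optional[float]:
    """Lowest candidate threshold whose accepted claims meet the target accuracy."""
    for t in sorted(candidates):
        if accuracy_above(labeled, t) >= target:
            return t
    return None  # no candidate is safe enough: verify every claim


# Hypothetical validation data, for illustration only.
validation = [
    (0.95, True), (0.92, True), (0.90, True), (0.85, False),
    (0.75, True), (0.65, False), (0.55, True), (0.30, False),
]
threshold = pick_threshold(validation)
```

In this made-up data a claim with confidence 0.85 is wrong, so a cutoff of 0.8 would let an error through unverified; the check pushes the cutoff higher instead of taking the draft model's confidence at face value.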

Sources

Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/speculative-rag.md.