RAG fusion
RAG fusion runs multiple query rewrites in parallel against the retrieval index and fuses the ranked results — improving recall on ambiguous or multi-aspect queries.
Standard RAG retrieves on the user's original query. RAG fusion generates 3-5 query variants (semantic rewrites, more specific reformulations, synonym expansions), runs retrieval on each, and combines the ranked lists with reciprocal rank fusion (RRF). Empirically, this improves recall on ambiguous queries where the original wording misses relevant documents. The rewrite step adds latency and LLM cost, but that cost is small compared with the cost of a wrong answer. RAG fusion pairs naturally with re-ranking the fused candidate set.
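Reciprocal rank fusion gives a document 1/(k + rank) from every ranked list it appears in and sums those contributions, with ranks starting at 1. Below is a minimal Python sketch of just the fusion step. It assumes each variant's retrieval has already returned a list of document IDs. The function name `reciprocal_rank_fusion` and the sample IDs are illustrative. `k=60` is the constant commonly used with RRF, so treat it as a tunable default.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document IDs with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with ranks starting at 1.
    """
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID so output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical retrieval results for three variants of the same user question.
variant_results = [
    ["doc_a", "doc_b", "doc_c"],
    ["doc_b", "doc_d", "doc_a"],
    ["doc_e", "doc_b", "doc_a"],
]

fused = reciprocal_rank_fusion(variant_results)
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

With these inputs, `doc_b` and `doc_a`, which appear in all three lists, finish ahead of `doc_e`, which tops only one list. RRF uses only rank positions, so it can fuse lists whose raw retrieval scores are not comparable.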
When to use RAG fusion
- Ambiguous user queries ("how do I X" with multiple plausible meanings).
- Multi-aspect queries that need different docs for different parts.
Common mistakes
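- Generating too many variants: returns diminish past ~5 variants, while rewrite and retrieval cost keeps climbing linearly with each one.
- Skipping the re-rank step: fusion merges results from off-target rewrites along with the relevant ones, and nothing filters them out before generation.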
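The two mistakes above map onto two guards in a retrieval pipeline: cap the variant count, and re-rank the fused pool against the original query. Below is a minimal Python sketch under stated assumptions. Every `toy_*` function and `CORPUS` are hypothetical stand-ins. A real system would call an LLM for rewrites, a vector or hybrid index for retrieval, and typically a cross-encoder for re-ranking. The names `rag_fusion`, `rrf`, `pool_size`, and `top_n` are illustrative. `max_variants=5` follows the ~5 guideline and is a starting point, not a rule.

```python
from collections import defaultdict

# Hypothetical toy corpus standing in for a real retrieval index.
CORPUS = {
    "d1": "how to reset a forgotten password",
    "d2": "change my password rules",
    "d3": "reset the router to factory settings",
    "d4": "recover account access without email",
    "d5": "enable two factor login",
}


def toy_rewrite(query):
    """Hypothetical stand-in for the LLM rewrite step: returns a fixed list.

    Two rewrites drift toward routers, the kind of off-target variant
    that re-ranking is meant to catch.
    """
    return [
        "reset forgotten password",
        "factory reset settings",
        "reset router",
        "recover account access",
        "password rules",
        "login help",
    ]


def toy_retrieve(query, top_k=3):
    """Hypothetical keyword-overlap retriever; a real system would use vector or hybrid search."""
    terms = set(query.lower().split())
    hits = [(len(terms & set(text.split())), doc_id) for doc_id, text in CORPUS.items()]
    hits = sorted((h for h in hits if h[0] > 0), key=lambda h: (-h[0], h[1]))
    return [doc_id for _, doc_id in hits[:top_k]]


def toy_rerank_score(query, doc_id):
    """Hypothetical relevance scorer; production systems typically use a cross-encoder here."""
    return len(set(query.lower().split()) & set(CORPUS[doc_id].split()))


def rrf(ranked_lists, k=60):
    """Reciprocal rank fusion: sum 1 / (k + rank) per document, ranks starting at 1."""
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


def rag_fusion(query, rewrite, retrieve, rerank_score,
               max_variants=5, pool_size=10, top_n=2):
    # Guard 1: cap the rewrites, since each one adds a retrieval call.
    variants = rewrite(query)[:max_variants]
    # Retrieve on every variant independently, then fuse the ranked lists.
    fused = rrf([retrieve(v) for v in variants])[:pool_size]
    # Guard 2: re-rank the fused pool against the ORIGINAL query.
    # sorted() is stable, so equal scores keep their fused order.
    reranked = sorted(fused, key=lambda d: -rerank_score(query, d))
    return reranked[:top_n]


query = "how do I reset my password"
print(rag_fusion(query, toy_rewrite, toy_retrieve, toy_rerank_score))
```

With these toy inputs the call returns `['d1', 'd2']`. Fusion alone would rank the router document `d3` second, because two off-target rewrites both retrieved it near the top. Re-scoring the pool against the original query pushes it below `d2`. The cap also drops the sixth rewrite, `"login help"`, before any retrieval runs.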
FAQ
What is RAG fusion?
RAG fusion runs multiple query rewrites in parallel against the retrieval index and fuses the ranked results — improving recall on ambiguous or multi-aspect queries.
When should I use RAG fusion?
Ambiguous user queries ("how do I X" with multiple plausible meanings). Multi-aspect queries that need different docs for different parts.
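What are the most common mistakes with RAG fusion?
Generating too many variants: returns diminish past ~5 variants, while rewrite and retrieval cost keeps climbing linearly with each one. Skipping the re-rank step: fusion merges results from off-target rewrites along with the relevant ones, and nothing filters them out before generation.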
Related terms
- Retrieval-augmented generation (RAG) — Retrieval-augmented generation (RAG) injects relevant documents into the prompt at query time so the model answers from your data instead of its training memory.
- Hybrid search (retrieval) — Hybrid search combines dense vector retrieval with sparse keyword (BM25) retrieval, then fuses the two ranked lists — the production retrieval default for RAG in 2026.
- Semantic search — Semantic search finds documents by meaning rather than keyword match, using embedding similarity in a vector space.
- Self-consistency — Self-consistency runs the same prompt multiple times at non-zero temperature and picks the most common final answer, raising accuracy on reasoning tasks.
Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/rag-fusion.md.