# In-context RAG

**Source:** https://promtable.com/glossary/in-context-rag

> In-context RAG skips a vector index entirely and stuffs the whole knowledge base into the prompt — only viable when the corpus fits in the model's context window and is small enough that retrieval overhead exceeds inference cost.

---
In-context RAG skips a vector index entirely and stuffs the whole knowledge base into the prompt — only viable when the corpus fits in the model's context window and is small enough that retrieval overhead exceeds inference cost.

Long context windows (200K-1M tokens in 2026) made in-context RAG viable for small corpora — paste the whole policy document, contract, or FAQ into the prompt and let the model retrieve from there. Saves the ops cost of running a vector DB but burns more tokens per call and inherits long-context recall limitations. The break-even depends on QPS and corpus size: for a 50-page document queried 100×/day, in-context wins; for a 10,000-page corpus queried 100,000×/day, vector RAG wins by orders of magnitude. Anthropic ships prompt caching that makes in-context RAG cheaper by amortising the long-context cost across many queries.

## When to use

- Small corpora under ~200K tokens.
- Low QPS where infrastructure cost dominates.
- Prompt-cached static knowledge bases.

## Common mistakes

- Treating 1M token windows as recall-reliable for needle-in-haystack — they aren't.
- Forgetting that long context costs add up — even cached, throughput drops.

## Related terms

- [rag](https://promtable.com/glossary/rag)
- [context-window](https://promtable.com/glossary/context-window)
- [long-context-prompting](https://promtable.com/glossary/long-context-prompting)
- [prompt-caching](https://promtable.com/glossary/prompt-caching)

*Last updated: 2026-06-01*
---

Original page: https://promtable.com/glossary/in-context-rag
Maintained by Promtable (https://promtable.com). Content: CC BY 4.0. Cite as "Promtable — https://promtable.com/glossary/in-context-rag".
Contact: info@vibecodingturkey.com.