Context window stuffing
Context window stuffing is the antipattern of putting everything you might need into a single LLM call's context — degrading reasoning, blowing up cost, and obscuring which piece of context actually mattered.
Long context windows (200K to 1M tokens in 2026) tempt teams to skip the engineering and paste everything in. The result is consistently worse than thoughtful context selection: models recall content buried in the middle of a long prompt less reliably (the "lost in the middle" effect), input cost grows linearly with every token added, and working out which piece of context drove the answer becomes nearly impossible. The cure is context engineering: distil aggressively, use retrieval to pull in only what is actually relevant, and place critical content at the head and tail of the prompt. Long context is a tool, not a substitute for thinking about what the model needs.
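As a rough illustration of selecting context instead of stuffing it, the sketch below ranks candidate chunks against the question, keeps only those that fit a budget, and anchors the instructions at the head and the question at the tail. Everything here is an assumption made for illustration: the function names (`select_chunks`, `build_prompt`) are hypothetical, keyword overlap stands in for a real retriever, and word counts stand in for a real tokenizer.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase word split; a crude stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(chunk: str, question: str) -> int:
    """Count distinct question words that also appear in the chunk."""
    return len(set(tokenize(chunk)) & set(tokenize(question)))


def select_chunks(chunks: list[str], question: str, budget: int) -> list[str]:
    """Keep the highest-scoring chunks that fit within a word budget."""
    ranked = sorted(chunks, key=lambda c: score(c, question), reverse=True)
    selected, used = [], 0
    for chunk in ranked:
        cost = len(tokenize(chunk))
        # Skip chunks with no overlap, and chunks that would exceed the budget.
        if score(chunk, question) == 0 or used + cost > budget:
            continue
        selected.append(chunk)
        used += cost
    return selected


def build_prompt(instructions: str, chunks: list[str], question: str, budget: int) -> str:
    """Put instructions at the head and restate the question at the tail."""
    context = "\n\n".join(select_chunks(chunks, question, budget))
    return (
        f"{instructions}\n\n"
        f"Question: {question}\n\n"
        f"Context:\n{context}\n\n"
        f"Reminder of the question: {question}"
    )


# Hypothetical corpus: only the first chunk is relevant to the question.
chunks = [
    "Refunds are issued within 14 days of a return request.",
    "Our office is closed on public holidays.",
    "Shipping takes 3 to 5 business days within the EU.",
]
prompt = build_prompt(
    "Answer using only the context.",
    chunks,
    "How long do refunds take?",
    budget=20,
)
```

In this example only the refund chunk shares a word with the question, so it is the only context that reaches the prompt. A production system would likely replace the overlap score with embedding or hybrid search, but the shape stays the same: rank, budget, then anchor the critical content at both ends.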
Common mistakes
- Trusting that the model will find the needle in your haystack.
- Skipping retrieval because the corpus fits in context. A stuffed corpus is paid for again on every call, so retrieval is still cheaper at scale.
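The cost side of the second mistake can be made concrete with some back-of-the-envelope arithmetic. The sketch below is illustrative only: the price, corpus size, retrieved-context size, and query volume are all hypothetical numbers, and it ignores output tokens, the cost of running retrieval, and any prompt-caching discounts a provider may offer.

```python
def cost_per_query(input_tokens: int, price_per_million: float) -> float:
    """Input cost of one call, assuming a flat per-token price."""
    return input_tokens / 1_000_000 * price_per_million


# All figures below are hypothetical, chosen only to show the scaling.
PRICE_PER_MILLION = 3.00   # USD per million input tokens
CORPUS_TOKENS = 400_000    # whole corpus pasted into every call
RETRIEVED_TOKENS = 4_000   # a handful of retrieved chunks instead
DAILY_QUERIES = 10_000

stuffed = cost_per_query(CORPUS_TOKENS, PRICE_PER_MILLION)
retrieved = cost_per_query(RETRIEVED_TOKENS, PRICE_PER_MILLION)

print(f"stuffed:   ${stuffed:.3f} per query, ${stuffed * DAILY_QUERIES:,.0f} per day")
print(f"retrieved: ${retrieved:.3f} per query, ${retrieved * DAILY_QUERIES:,.0f} per day")
```

Under these assumed numbers, stuffing costs about $1.20 per query against roughly $0.012 with retrieval, a 100x gap that carries straight through to the daily bill because input cost is linear in tokens.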
Related terms
- Context window — The context window is the maximum number of tokens — system prompt, conversation history, retrieved documents, and the response — that a language model can process in a single turn.
- Long-context prompting — Long-context prompting is the discipline of writing prompts that exploit 200K-1M+ token windows effectively — chunk ordering, head-and-tail anchoring, summarisation, and recall-aware structure.
- Retrieval-augmented generation (RAG) — Retrieval-augmented generation (RAG) injects relevant documents into the prompt at query time so the model answers from your data instead of its training memory.
- Context distillation — Context distillation summarises an agent's growing conversation history into a compact representation, so each step's input stays small while preserving the relevant signal.
Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/context-window-stuffing.md.