
Context window stuffing

Context window stuffing is the antipattern of putting everything you might need into a single LLM call's context — degrading reasoning, blowing up cost, and obscuring which piece of context actually mattered.

Long context windows (200K to 1M tokens in 2026) tempt teams to skip the engineering and paste everything in. The result is consistently worse than thoughtful context selection. Models attend less reliably to material buried in the middle of a long prompt (the "lost in the middle" effect). Input cost grows linearly with every token added. Tracing which piece of context drove an answer becomes nearly impossible. The cure is context engineering: distil aggressively, use retrieval to include only what is actually relevant, and put critical content at the head and tail of the prompt. Long context is a tool, not a substitute for thinking about what the model needs.
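The context-engineering steps above can be sketched in a small prompt assembler. This is a minimal sketch under stated assumptions, not a reference implementation. A plain keyword-overlap score stands in for real retrieval (embeddings or BM25), and a word count stands in for a real tokenizer. `Chunk`, `build_prompt`, `budget`, and `k` are hypothetical names, not any library's API.

The sketch keeps only the top-scoring chunks that fit a token budget. It tags each chunk with its source ID so you can trace which piece drove the answer. It places the instructions and question at the head and restates the question at the tail.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str  # source ID, kept so answers can be traced back to their context
    text: str


def tokenize(text: str) -> list[str]:
    # Crude lowercase word splitter: an assumption standing in for a real tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def approx_tokens(text: str) -> int:
    # Rough proxy of one token per word; real token counts will differ.
    return len(tokenize(text))


def score(query: str, chunk: Chunk) -> int:
    # Keyword overlap: a placeholder for embedding or BM25 retrieval.
    return len(set(tokenize(query)) & set(tokenize(chunk.text)))


def build_prompt(instructions: str, question: str, chunks: list[Chunk],
                 budget: int, k: int = 3) -> str:
    # Rank chunks by relevance (Python's sort is stable, so ties keep input order).
    ranked = sorted(chunks, key=lambda c: score(question, c), reverse=True)
    # Reserve the head and tail first: the instructions plus the question twice.
    used = approx_tokens(instructions) + 2 * approx_tokens(question)
    selected: list[Chunk] = []
    for chunk in ranked[:k]:
        if score(question, chunk) == 0:
            break  # ranked descending, so every remaining chunk is irrelevant too
        cost = approx_tokens(chunk.text)
        if used + cost > budget:
            continue  # drop whole chunks that would exceed the budget
        selected.append(chunk)
        used += cost
    context = "\n\n".join(f"[{c.doc_id}] {c.text}" for c in selected)
    # Critical content at the head and tail; retrieved context in the middle.
    return (f"{instructions}\n\nQuestion: {question}\n\n"
            f"Context:\n{context}\n\n"
            f"Answer this question: {question}")


if __name__ == "__main__":
    docs = [
        Chunk("billing", "Invoices are issued on the first business day of each month."),
        Chunk("hr", "Employees accrue vacation days monthly."),
        Chunk("billing-faq", "Refunds for invoices are processed within 14 days."),
    ]
    print(build_prompt("Answer using only the context.",
                       "When are invoices issued?", docs, budget=200))
```

Skipping over-budget chunks instead of truncating them is a deliberate choice in this sketch. A truncated chunk can silently lose the sentence that mattered.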

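Common mistakes

Trusting the model to find the needle in your haystack.

Skipping retrieval because the corpus fits in context, even though retrieval is still cheaper at scale.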

FAQ

What is context window stuffing?

Context window stuffing is the antipattern of putting everything you might need into a single LLM call's context — degrading reasoning, blowing up cost, and obscuring which piece of context actually mattered.

What are the most common mistakes with context window stuffing?

Two stand out. The first is trusting the model to find the needle in your haystack. The second is skipping retrieval just because the corpus fits in context, even though retrieval is still cheaper at scale.
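The "cheaper at scale" point is simple arithmetic, because input cost scales linearly with tokens. This back-of-the-envelope sketch rests on several assumptions:

- a hypothetical price of $3 per million input tokens;
- a 500K-token corpus stuffed into every call, compared with about 8K tokens of retrieved chunks;
- 10,000 calls.

Substitute your provider's real prices and your own volumes. The sketch ignores the cost of running the retrieval index. It also ignores any discount a provider gives for cached prompt prefixes. Both would narrow the gap.

```python
# Hypothetical price for illustration only; check your provider's actual pricing.
PRICE_PER_MILLION_INPUT_TOKENS = 3.00  # USD


def input_cost(tokens_per_call: int, calls: int,
               price_per_million: float = PRICE_PER_MILLION_INPUT_TOKENS) -> float:
    """Input-token cost in USD: linear in tokens per call and in number of calls."""
    return tokens_per_call * calls * price_per_million / 1_000_000


CALLS = 10_000
stuffed = input_cost(500_000, CALLS)  # whole corpus in every call (assumed size)
retrieved = input_cost(8_000, CALLS)  # only top-k retrieved chunks (assumed size)

print(f"stuffed:   ${stuffed:,.2f}")             # stuffed:   $15,000.00
print(f"retrieved: ${retrieved:,.2f}")           # retrieved: $240.00
print(f"ratio:     {stuffed / retrieved:.1f}x")  # ratio:     62.5x
```

The ratio depends only on tokens per call, so it holds at any price: 500K tokens against 8K is 62.5x, whatever the per-token rate.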

Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/context-window-stuffing.md.