# Context window

**Source:** https://promtable.com/glossary/context-window

> The context window is the maximum number of tokens — system prompt, conversation history, retrieved documents, and the response — that a language model can process in a single turn.

---

Every model has a context window limit measured in tokens. In 2026 typical windows are 128k (GPT-4o), 200k (Claude 3.5 Sonnet, Claude 3 Opus), and 1M+ (Gemini 1.5 Pro, GPT-5 long-context). Everything that goes into the prompt (system prompt, conversation history, function-call schemas, retrieved RAG chunks) draws from this budget, as does the generated output. A long context window does not guarantee reliable recall across it: many models exhibit "lost in the middle" effects, where information buried in the center of the window is recalled less accurately than information near the start or end. Manage the window with summarization, retrieval, and selective conversation pruning.
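The budgeting and pruning idea can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: `Message`, `estimate_tokens`, and `prune_history` are hypothetical names, and the "about 4 characters per token" estimate is a rough assumption that stands in for the model's real tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str      # "system", "user", or "assistant"
    content: str


def estimate_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: assumes ~4 characters per token."""
    return max(1, len(text) // 4)


def prune_history(
    messages: list[Message],
    context_window: int,
    reserved_for_output: int,
) -> list[Message]:
    """Keep system messages plus the newest turns that fit the input budget."""
    # The response shares the window, so set its share aside first.
    input_budget = context_window - reserved_for_output
    system = [m for m in messages if m.role == "system"]
    turns = [m for m in messages if m.role != "system"]

    used = sum(estimate_tokens(m.content) for m in system)
    if used > input_budget:
        raise ValueError("system prompt alone exceeds the input budget")

    kept: list[Message] = []
    # Walk from newest to oldest so recent turns survive pruning.
    for message in reversed(turns):
        cost = estimate_tokens(message.content)
        if used + cost > input_budget:
            break
        kept.append(message)
        used += cost

    return system + list(reversed(kept))
```

Dropping the oldest turns first is only one possible policy. Replacing them with a summary, or retrieving only the relevant turns, keeps more information at the price of extra complexity.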

## Common mistakes

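- Resending the entire conversation history on every turn: per-request input cost grows linearly with the number of turns, cumulative cost grows roughly quadratically, and the history eventually overflows the window.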
- Trusting that 1M-token models retrieve middle content as reliably as head/tail content.
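To make the first mistake concrete, the arithmetic below shows how input tokens accumulate when the full history is resent on every turn. It is a simplified model under stated assumptions: each turn adds a fixed number of tokens, output tokens and any prompt caching are ignored, and the function names are hypothetical.

```python
def tokens_sent_on_turn(turn: int, tokens_per_turn: int, system_tokens: int = 0) -> int:
    """Input tokens sent on a 1-indexed turn when all prior history is resent."""
    return system_tokens + turn * tokens_per_turn


def cumulative_tokens(turns: int, tokens_per_turn: int, system_tokens: int = 0) -> int:
    """Total input tokens billed across all turns of the conversation."""
    return sum(
        tokens_sent_on_turn(t, tokens_per_turn, system_tokens)
        for t in range(1, turns + 1)
    )


# Assumed workload: 100 turns, each adding 500 tokens to the history.
print(tokens_sent_on_turn(100, 500))  # 50000 tokens in the 100th request alone
print(cumulative_tokens(100, 500))    # 2525000 tokens billed in total
```

Under these assumptions the size of each request grows linearly, while the running total grows roughly with the square of the number of turns. This is why pruning or summarizing early pays off.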

## Related terms

- [rag](https://promtable.com/glossary/rag)
- [token](https://promtable.com/glossary/token)
- [chat-history](https://promtable.com/glossary/chat-history)
- [summarization](https://promtable.com/glossary/summarization)

*Last updated: 2026-06-01*

---

Original page: https://promtable.com/glossary/context-window
Maintained by Promtable (https://promtable.com). Content: CC BY 4.0. Cite as "Promtable — https://promtable.com/glossary/context-window".
Contact: info@vibecodingturkey.com.