Context window
The context window is the maximum number of tokens — system prompt, conversation history, retrieved documents, and the response — that a language model can process in a single turn.
Every model has a context window limit measured in tokens. In 2026 typical windows are 128k (GPT-4o), 200k (Claude 3.5 Sonnet, Claude 3 Opus), and 1M+ (Gemini 1.5 Pro, GPT-5 long-context). Everything that goes into the prompt (system prompt, conversation history, function-call schemas, retrieved RAG chunks) draws from this budget, and so does the generated output. A long window does not guarantee reliable recall across it: many models show "lost in the middle" effects, where information buried in the center of the window is recalled less accurately than information near the start or end. Manage the window with summarization, retrieval, and selective pruning of conversation history.
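The budgeting and pruning idea can be sketched in a few lines of Python. This is a minimal illustration, not any provider's API: `estimate_tokens` and `prune_history` are hypothetical helpers, and the estimate of about 1.3 tokens per English word is a rough assumption. A real application would count tokens with the tokenizer of the model it targets.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: about 1.3 tokens per whitespace-separated word.

    Assumption for illustration only; use the target model's tokenizer
    for real counts. Integer math computes ceil(words * 1.3) exactly.
    """
    words = len(text.split())
    return (words * 13 + 9) // 10


def prune_history(system_prompt: str, history: list[str],
                  context_limit: int, reserved_output: int) -> list[str]:
    """Keep the newest messages that still fit in the input budget.

    Input budget = context_limit - reserved_output - system prompt tokens.
    Returns the retained messages in their original (oldest-first) order.
    """
    budget = context_limit - reserved_output - estimate_tokens(system_prompt)
    kept = []
    # Walk from the newest message backwards and stop at the first one
    # that no longer fits, so the retained history stays contiguous.
    for message in reversed(history):
        cost = estimate_tokens(message)
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    kept.reverse()
    return kept
```

Reserving output tokens up front matters because the response draws from the same window as the prompt. Dropping the oldest turns first is only one policy; summarizing them instead keeps more information at the price of an extra model call.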
Common mistakes
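- Resending the entire conversation history on every request indefinitely: the input cost of each request grows linearly with the number of turns, so the cumulative cost grows roughly quadratically.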
- Trusting that 1M-token models retrieve middle content as reliably as head/tail content.
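Both mistakes can be made concrete with a short Python sketch. The function names are hypothetical, and the fixed tokens-per-turn figure is a simplifying assumption. The reordering function shows one common heuristic for working around "lost in the middle" effects; it does not guarantee better recall.

```python
def cumulative_input_tokens(tokens_per_turn: int, turns: int) -> int:
    """Total input tokens sent over a conversation when every request
    resends all turns so far (assumes each turn adds the same number of tokens).
    """
    # The request for turn t carries t turns of history.
    return sum(tokens_per_turn * t for t in range(1, turns + 1))


def order_for_edges(ranked_chunks: list[str]) -> list[str]:
    """Reorder chunks (most relevant first) so the strongest ones sit at
    the start and end of the prompt and the weakest land in the middle.
    """
    front, back = [], []
    for i, chunk in enumerate(ranked_chunks):
        # Alternate: even ranks fill the head, odd ranks fill the tail.
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]


print(cumulative_input_tokens(500, 10))           # 27500
print(order_for_edges(["a", "b", "c", "d", "e"]))  # ['a', 'c', 'e', 'd', 'b']
```

At 500 tokens per turn, the tenth request carries only 5,000 input tokens, but the conversation as a whole has already sent 27,500. In the reordering example, the two most relevant chunks (`a` and `b`) end up at the head and tail, while the least relevant (`e`) sits in the center.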
FAQ
What is a context window?
The context window is the maximum number of tokens — system prompt, conversation history, retrieved documents, and the response — that a language model can process in a single turn.
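What are the most common mistakes with context windows?
The first is resending the entire conversation history on every request indefinitely: the input cost of each request grows linearly with the number of turns, so the cumulative cost grows roughly quadratically. The second is trusting that 1M-token models retrieve middle content as reliably as head/tail content.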
Related terms
- Retrieval-augmented generation (RAG) — Retrieval-augmented generation (RAG) injects relevant documents into the prompt at query time so the model answers from your data instead of its training memory.
- Token — A token is the smallest unit a language model reads or writes — typically a sub-word fragment, with one English word averaging about 1.3 tokens.
Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/context-window.md.