Long-context prompting
Long-context prompting is the discipline of writing prompts that make effective use of 200K to 1M+ token windows through chunk ordering, head-and-tail anchoring, summarisation, and recall-aware structure.
Long context is the default as of 2026 (Claude 200K, GPT-4o 128K, Gemini 2 Pro 1M), but a large window does not guarantee quality: models still suffer from "lost in the middle" recall degradation, where content placed mid-prompt is recalled less reliably than content near the start or end. Effective long-context prompting puts critical content at the head and tail, summarises mid-context content explicitly, repeats key instructions near the end of the prompt, and uses long-context evals (needle-in-haystack tests on your own data) to verify recall before shipping. Long context also enables many-shot in-context learning, with hundreds of examples in the prompt, which can approach fine-tuned quality on narrow tasks.
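The layout described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed format: the `Chunk` class, its `critical` flag, the `<document>` wrapper, and the function name `build_long_prompt` are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    title: str
    text: str
    critical: bool = False  # hypothetical flag: content the answer depends on


def build_long_prompt(instructions: str, chunks: list[Chunk], question: str) -> str:
    """Lay out a long prompt with critical material at the head and tail."""
    critical = [c for c in chunks if c.critical]
    background = [c for c in chunks if not c.critical]

    # Split critical chunks between head and tail so none sits mid-context.
    half = (len(critical) + 1) // 2
    head, tail = critical[:half], critical[half:]

    def render(c: Chunk) -> str:
        # The tag format is an illustrative choice, not a required syntax.
        return f"<document title={c.title!r}>\n{c.text}\n</document>"

    # Short index of the mid-context material. Only titles are listed here;
    # in practice each entry would carry a one-line summary.
    index = "\n".join(f"- {c.title}" for c in background)

    parts = [
        instructions,
        *[render(c) for c in head],
        f"Background documents in the middle section:\n{index}",
        *[render(c) for c in background],
        *[render(c) for c in tail],
        # Repeat the key instruction next to the question, at the very end.
        f"Reminder: {instructions}",
        f"Question: {question}",
    ]
    return "\n\n".join(parts)
```

Deciding which chunks count as critical is the hard part and is left to the caller, for example by ranking chunks against the question before assembling the prompt.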
When to use long-context prompting
- Document QA, summarisation, code-base review.
- Many-shot in-context learning.
- Long agent loops without retrieval.
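For the many-shot case, one practical question is how many examples fit in the window. The sketch below packs labelled examples into a prompt until a token budget is reached. The function names, the `Input:`/`Output:` format, and the 4-characters-per-token estimate are assumptions for illustration; a real budget should use the provider's own tokenizer.

```python
def estimate_tokens(text: str) -> int:
    # Crude heuristic (roughly 4 characters per token for English text).
    # This is an assumption; swap in the provider's tokenizer for real budgets.
    return max(1, len(text) // 4)


def build_many_shot_prompt(
    task: str,
    examples: list[tuple[str, str]],
    query: str,
    token_budget: int,
) -> tuple[str, int]:
    """Pack as many (input, output) examples as the budget allows.

    Returns the prompt and the number of examples included.
    """
    header = f"{task}\n"
    footer = f"\nInput: {query}\nOutput:"
    used = estimate_tokens(header) + estimate_tokens(footer)

    shots: list[str] = []
    for example_input, example_output in examples:
        shot = f"\nInput: {example_input}\nOutput: {example_output}\n"
        cost = estimate_tokens(shot)
        if used + cost > token_budget:
            break  # stop at the first example that would exceed the budget
        shots.append(shot)
        used += cost

    return header + "".join(shots) + footer, len(shots)
```

Examples are taken in the order given, so the caller should put the most representative ones first in case the budget cuts the list short.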
Common mistakes
- Trusting public benchmarks instead of running needle-in-haystack tests on your own data.
- Forgetting that long prompts multiply cost: unless a cached prefix applies, the whole context is billed as input tokens on every turn.
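Both mistakes above can be checked with a small amount of code. The sketch below is an assumption-laden illustration: `ask` is a hypothetical callable that wraps whatever model API is in use, and the cost function assumes the full history is resent each turn with no prompt caching.

```python
from typing import Callable


def make_haystack(filler: list[str], needle: str, depth: float) -> str:
    """Insert `needle` at a relative depth (0.0 = start, 1.0 = end)."""
    pos = round(depth * len(filler))
    return "\n".join(filler[:pos] + [needle] + filler[pos:])


def recall_by_depth(
    ask: Callable[[str], str],  # hypothetical wrapper around a model call
    filler: list[str],          # lines drawn from your own documents
    needle: str,
    question: str,
    expected: str,
    depths: tuple[float, ...] = (0.0, 0.25, 0.5, 0.75, 1.0),
) -> dict[float, bool]:
    """Report, for each depth, whether the answer contains the expected fact."""
    results: dict[float, bool] = {}
    for depth in depths:
        prompt = f"{make_haystack(filler, needle, depth)}\n\nQuestion: {question}"
        results[depth] = expected.lower() in ask(prompt).lower()
    return results


def conversation_input_tokens(context_tokens: int, tokens_per_turn: int, turns: int) -> int:
    """Total input tokens billed when the whole history is resent every turn.

    Assumes no prompt caching: turn t sends the base context plus the
    t earlier exchanges, each adding `tokens_per_turn` tokens.
    """
    return sum(context_tokens + t * tokens_per_turn for t in range(turns))
```

A substring match is a deliberately strict and simple scoring rule; a graded or model-judged check may suit free-form answers better. For the cost side, a 150,000-token context with 500 new tokens per turn over 20 turns comes to 3,095,000 billed input tokens under these assumptions.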
Related terms
- Context window: The context window is the maximum number of tokens (system prompt, conversation history, retrieved documents, and the response) that a language model can process in a single turn.
- Prompt caching: Prompt caching reuses the model's internal state for a repeated prompt prefix, so the prefix is computed once and later calls that reuse it are typically billed at a reduced rate.
- Few-shot prompting: Few-shot prompting supplies 2–10 input–output examples inside the prompt so the model imitates the pattern on a new input.
- Retrieval-augmented generation (RAG): RAG injects relevant documents into the prompt at query time so the model answers from your data instead of its training memory.
Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/long-context-prompting.md.