Context distillation
Context distillation summarises an agent's growing conversation history into a compact representation, so each step's input stays small while preserving the relevant signal.
In an agent loop the conversation history grows with every step, because each tool call appends its output. Without compression, by around step 20 the context is mostly stale tool outputs: each call costs more, and the model tends to reason worse because it attends less reliably to material buried mid-context (the "lost in the middle" effect). Context distillation runs a small summariser every N steps that rewrites the history as a tight scratchpad (current goal, key facts learned, open subtasks) and drops verbatim tool outputs once their useful facts have been extracted. The technique is common in mature agent frameworks as of 2026 (for example LangGraph state graphs and OpenAI Agents SDK summarisation), and it is often what separates an agent that only works at step 5 from one that still works at step 50.
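The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the API of any framework: `Scratchpad`, `distil`, `run_loop`, and `toy_summarise` are hypothetical names, and `toy_summarise` is a stand-in for the small summariser model (it just collects lines that start with `FACT:`).

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable

# Assumed message shape: {"role": "...", "content": "..."}
Message = dict[str, str]


@dataclass
class Scratchpad:
    """Compact state that replaces old history: goal, facts, open subtasks."""
    goal: str
    key_facts: list[str] = field(default_factory=list)
    open_subtasks: list[str] = field(default_factory=list)

    def render(self) -> str:
        facts = "\n".join(f"- {f}" for f in self.key_facts) or "- (none yet)"
        tasks = "\n".join(f"- {t}" for t in self.open_subtasks) or "- (none)"
        return f"Goal: {self.goal}\nKey facts:\n{facts}\nOpen subtasks:\n{tasks}"


# Hypothetical summariser signature: folds old messages into a new scratchpad.
Summariser = Callable[[Scratchpad, list[Message]], Scratchpad]


def distil(history: list[Message], pad: Scratchpad, summarise: Summariser,
           keep_last: int = 2) -> tuple[list[Message], Scratchpad]:
    """Fold all but the newest `keep_last` messages into the scratchpad."""
    split = max(0, len(history) - keep_last)
    old, recent = history[:split], history[split:]
    if not old:
        return history, pad
    # The verbatim old messages are dropped; only the scratchpad survives.
    return recent, summarise(pad, old)


def toy_summarise(pad: Scratchpad, old: list[Message]) -> Scratchpad:
    """Stand-in for a small model call: keep lines marked 'FACT:' from tools."""
    facts = list(pad.key_facts)
    for msg in old:
        if msg["role"] != "tool":
            continue
        for line in msg["content"].splitlines():
            if line.startswith("FACT:"):
                fact = line.removeprefix("FACT:").strip()
                if fact not in facts:
                    facts.append(fact)
    return Scratchpad(pad.goal, facts, list(pad.open_subtasks))


def run_loop(messages: Iterable[Message], pad: Scratchpad, summarise: Summariser,
             every_n: int = 5, keep_last: int = 2) -> tuple[list[Message], Scratchpad]:
    """Append one message per step and distil after every `every_n` steps."""
    history: list[Message] = []
    for step, msg in enumerate(messages, start=1):
        history.append(msg)
        if step % every_n == 0:
            history, pad = distil(history, pad, summarise, keep_last)
    return history, pad
```

In a real agent the next model call would receive `pad.render()` (for instance as a system message) followed by the short remaining `history`, and `summarise` would prompt a small model to update the goal, facts, and open subtasks.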
When to use context distillation
- Long-horizon agent loops (10+ steps).
- Multi-turn assistants with hours-long sessions.
- RAG workflows with many retrieved-document turns.
Common mistakes
- Distilling too aggressively — losing context the next step needed.
- Skipping distillation until you hit the context limit — by then quality has already degraded.
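Both mistakes can be guarded against with a token budget: trigger distillation well before the context limit, and always keep the newest messages verbatim so the next step still has what it needs. The sketch below is one possible way to do this. The function names, the 0.5 trigger ratio, and the four-characters-per-token estimate are all assumptions for illustration; a real system would use the model's own tokenizer.

```python
def estimate_tokens(text: str) -> int:
    # Crude heuristic (roughly 4 characters per token). This is an assumption;
    # replace it with a real tokenizer for the model in use.
    return max(1, len(text) // 4)


def should_distil(messages: list[str], context_limit: int,
                  trigger_ratio: float = 0.5) -> bool:
    """Trigger early, at a fraction of the limit, rather than at the limit."""
    used = sum(estimate_tokens(m) for m in messages)
    return used >= context_limit * trigger_ratio


def split_for_distillation(messages: list[str],
                           keep_recent_tokens: int) -> tuple[list[str], list[str]]:
    """Return (to_summarise, kept_verbatim).

    Walks backwards from the newest message and keeps messages verbatim
    until the `keep_recent_tokens` budget is used up; everything older
    is handed to the summariser.
    """
    kept: list[str] = []
    budget = keep_recent_tokens
    for msg in reversed(messages):
        cost = estimate_tokens(msg)
        if cost > budget:
            break
        kept.append(msg)
        budget -= cost
    kept.reverse()
    return messages[:len(messages) - len(kept)], kept
```

Raising `keep_recent_tokens` makes distillation less aggressive, and lowering `trigger_ratio` makes it start earlier; both values are tuning knobs that depend on the workload.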
FAQ
What is context distillation?
Context distillation summarises an agent's growing conversation history into a compact representation, so each step's input stays small while preserving the relevant signal.
When should I use context distillation?
Use it for long-horizon agent loops (10+ steps), multi-turn assistants with hours-long sessions, and RAG workflows with many retrieved-document turns.
What are the most common mistakes with context distillation?
The first is distilling too aggressively, which loses context the next step needed. The second is skipping distillation until you hit the context limit, by which point quality has already degraded.
Related terms
- AI agent — An AI agent is a system where a language model autonomously plans and executes a sequence of tool calls to accomplish a goal.
- Context window — The context window is the maximum number of tokens — system prompt, conversation history, retrieved documents, and the response — that a language model can process in a single turn.
- ReAct pattern — ReAct interleaves Reasoning + Acting in an agent loop — the model writes a thought, then decides to call a tool, then observes the result, then thinks again.
- Prompt caching — Prompt caching reuses the model's internal state for a repeated prompt prefix so the API charges and computes the prefix only once across many calls.
Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/context-distillation.md.