Conversation compaction
Conversation compaction summarises a long agent or chat history into a tight representation that preserves the relevant signal — used when the conversation approaches the model's context window.
As agent loops and long chats grow, the history eventually approaches the context window. Compaction runs a summariser (typically a cheaper model) that rewrites the history as a structured summary — current goal, key facts learned, open subtasks, important decisions — and discards verbatim tool outputs once their information has been extracted. Production implementations: Claude Code's auto-compact, OpenAI Assistants thread summarisation, LangGraph state compaction nodes. The hard problem is what to keep vs drop; aggressive compaction loses signal, light compaction barely helps. Most teams in 2026 trigger compaction at 70-80% context fill.
When to use conversation compaction
- Long agent loops (15+ steps).
- Multi-hour chat sessions.
Common mistakes
- Compacting too aggressively — losing context the next step needed.
- Compacting too late — the model has already lost quality from context pressure.
FAQ
What is conversation compaction?
Conversation compaction summarises a long agent or chat history into a tight representation that preserves the relevant signal — used when the conversation approaches the model's context window.
When should I use conversation compaction?
Long agent loops (15+ steps). Multi-hour chat sessions.
What are the most common mistakes with conversation compaction?
Compacting too aggressively — losing context the next step needed. Compacting too late — the model has already lost quality from context pressure.
Related terms
- Context distillation — Context distillation summarises an agent's growing conversation history into a compact representation, so each step's input stays small while preserving the relevant signal.
- Context window — The context window is the maximum number of tokens — system prompt, conversation history, retrieved documents, and the response — that a language model can process in a single turn.
- Agent loop — An agent loop is the repeating cycle of an AI agent — observe state, decide on an action (usually a tool call), execute, observe the result, and repeat — until a goal is reached or a stop condition fires.
- Long-context prompting — Long-context prompting is the discipline of writing prompts that exploit 200K-1M+ token windows effectively — chunk ordering, head-and-tail anchoring, summarisation, and recall-aware structure.
Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/compaction.md.