# KV cache

**Source:** https://promtable.com/glossary/kv-cache

> The KV (key-value) cache stores the attention keys and values for tokens already processed, so each new token attends to the cached history instead of recomputing it.

---

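Without a KV cache, every decoding step would re-run the model over the entire context and recompute keys and values for all earlier tokens, so the attention work per new token grows quadratically with context length. With the cache, each step computes keys and values only for the new token and attends over the stored entries, so the attention work per new token grows linearly with context length. The trade-off is memory: alongside the model weights, the KV cache is often the dominant memory consumer at inference time, and a 128K-token context can consume tens of gigabytes of GPU memory in KV alone. Grouped-query attention and multi-head latent attention (DeepSeek) shrink the cache stored per token, while PagedAttention (vLLM) reduces the memory lost to fragmentation; together, optimisations like these unlock long-context serving. Prompt caching is server-side reuse of the KV cache for a shared prompt prefix across API calls.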
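The difference between decoding with and without a cache can be sketched with a toy single-head, single-layer attention in plain Python. Everything here is an illustrative assumption rather than any library's API: the 2-dimensional projection matrices `W_Q`, `W_K`, `W_V`, the helper names, and the input vectors are made up, and a real model would repeat this per layer and per head.

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def attend(q, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]

# Hypothetical toy projection matrices (d_model = 2).
W_Q = [[1.0, 0.0], [0.0, 1.0]]
W_K = [[0.5, 0.1], [0.2, 0.7]]
W_V = [[1.0, 0.3], [0.4, 0.9]]

def decode_without_cache(tokens):
    """Re-project keys and values for the whole prefix at every step."""
    outputs, projections = [], 0
    for t in range(len(tokens)):
        prefix = tokens[: t + 1]
        keys = [matvec(W_K, x) for x in prefix]
        values = [matvec(W_V, x) for x in prefix]
        projections += len(prefix)  # one K/V pair per prefix token
        outputs.append(attend(matvec(W_Q, tokens[t]), keys, values))
    return outputs, projections

def decode_with_cache(tokens):
    """Project keys and values once per token and keep them in the cache."""
    outputs, projections = [], 0
    k_cache, v_cache = [], []
    for x in tokens:
        k_cache.append(matvec(W_K, x))
        v_cache.append(matvec(W_V, x))
        projections += 1  # only the new token is projected
        outputs.append(attend(matvec(W_Q, x), k_cache, v_cache))
    return outputs, projections

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
out_a, n_a = decode_without_cache(tokens)
out_b, n_b = decode_with_cache(tokens)
print("K/V projections without cache:", n_a)
print("K/V projections with cache:", n_b)
```

Both functions produce the same attention outputs; only the number of key/value projections differs (the sum 1 + 2 + ... + n without the cache, n with it). The price of the cached version is that `k_cache` and `v_cache` grow by one entry per token and must stay in memory for the whole generation.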

## Common mistakes

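- Forgetting KV memory when sizing GPU inference: the cache takes roughly 2 (keys and values) × layers × KV heads × head dimension × context length × batch size × bytes per element.
- Assuming long-context support is uniform: KV memory is scarce, so providers may limit, throttle, or price long contexts differently.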
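As a rough sizing aid, the KV memory estimate can be written as a short function. This is a back-of-the-envelope sketch: the function name `kv_cache_bytes` and the model shape used below (80 layers, head dimension 128, 16-bit elements) are hypothetical illustration values, not the specification of any particular model, and real servers add overhead for allocation and padding.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_len,
                   batch_size=1, bytes_per_elem=2):
    """Estimate KV cache size in bytes.

    The leading factor 2 accounts for storing both keys and values.
    bytes_per_elem=2 assumes 16-bit elements (fp16 or bf16).
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * context_len * batch_size * bytes_per_elem)

GIB = 2 ** 30
CONTEXT = 128 * 1024  # a 128K-token context

# Hypothetical model shape: 80 layers, head_dim 128.
gqa = kv_cache_bytes(num_layers=80, num_kv_heads=8, head_dim=128,
                     context_len=CONTEXT)
mha = kv_cache_bytes(num_layers=80, num_kv_heads=64, head_dim=128,
                     context_len=CONTEXT)

print(f"8 KV heads (grouped-query): {gqa / GIB:.0f} GiB")
print(f"64 KV heads (full multi-head): {mha / GIB:.0f} GiB")
```

For this assumed shape, a single 128K-token sequence needs 40 GiB with 8 KV heads and 320 GiB with 64, which shows why the number of KV heads, and not just the context length, has to go into any GPU sizing estimate.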

## Related terms

- [context-window](https://promtable.com/glossary/context-window)
- [prompt-caching](https://promtable.com/glossary/prompt-caching)
- [reasoning-model](https://promtable.com/glossary/reasoning-model)

*Last updated: 2026-06-01*

---

Maintained by Promtable (https://promtable.com). Content: CC BY 4.0. Cite as "Promtable — https://promtable.com/glossary/kv-cache".
Contact: info@vibecodingturkey.com.