# Prefix caching

**Source:** https://promtable.com/glossary/prefix-caching

> Prefix caching reuses the KV-cache state computed for a shared prompt prefix across many requests, so the prefix is processed once and amortised over all subsequent calls.

---

Prefix caching is the server-side mechanism behind the prompt caching that OpenAI, Anthropic, and Google expose in their APIs as of 2026. The model server stores the attention key/value tensors computed for a common prefix (system prompt, few-shot examples, tool schemas) and reuses them whenever a new request begins with exactly the same tokens. Cached input tokens are billed at roughly 10-25% of the normal input price. The cache is keyed on a hash of the prefix tokens, so changing even one token invalidates the cached state from that point onward. Design system prompts so that stable content comes first and stays identical across requests, which maximises cache hits.
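The hash-keyed lookup described above can be illustrated with a toy model. This is a sketch under stated assumptions, not any provider's implementation: it splits the prompt into fixed-size token blocks and gives each block a hash chained to the previous block's hash, loosely in the style of vLLM's automatic prefix caching. `PrefixCache`, `block_hashes`, the block size of 4, and the string placeholders standing in for K/V tensors are all hypothetical.

```python
import hashlib


def block_hashes(tokens: list[int], block_size: int) -> list[str]:
    """Return one chained hash per full block of tokens.

    Each hash covers its block's tokens plus the previous block's hash, so it
    identifies the whole prefix up to the end of that block. A trailing partial
    block is not hashed, so it is never cached.
    """
    hashes: list[str] = []
    parent = ""
    for start in range(0, len(tokens) - block_size + 1, block_size):
        block = tokens[start:start + block_size]
        payload = parent + "|" + ",".join(map(str, block))
        parent = hashlib.sha256(payload.encode()).hexdigest()
        hashes.append(parent)
    return hashes


class PrefixCache:
    """Toy prefix cache mapping chained block hashes to placeholder KV state."""

    def __init__(self, block_size: int = 4) -> None:
        self.block_size = block_size
        self.blocks: dict[str, str] = {}  # hash -> stand-in for K/V tensors

    def lookup_and_store(self, tokens: list[int]) -> int:
        """Return how many leading tokens hit the cache; store KV for the rest."""
        hit_tokens = 0
        still_matching = True
        for i, h in enumerate(block_hashes(tokens, self.block_size)):
            if still_matching and h in self.blocks:
                hit_tokens += self.block_size  # reused, no recomputation
            else:
                still_matching = False
                # A real server would run attention here and keep the K/V tensors.
                self.blocks[h] = f"kv for block {i}"
        return hit_tokens


if __name__ == "__main__":
    cache = PrefixCache(block_size=4)
    system = [101, 102, 103, 104, 105, 106, 107, 108]  # stable two-block prefix
    print(cache.lookup_and_store(system + [1, 2, 3, 4]))  # 0: cold cache
    print(cache.lookup_and_store(system + [5, 6, 7, 8]))  # 8: prefix reused
    print(cache.lookup_and_store([999] + system[1:] + [5, 6, 7, 8]))  # 0: first token changed
```

Because each hash depends on everything before it, editing the last block leaves earlier blocks reusable, while editing the first token forces a full recompute. That is why stable content belongs at the front of the prompt.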

## Common mistakes

- Putting variable content (timestamps, user IDs) at the start of the prompt: every request then has a different prefix, so the cache never hits.
- Not measuring the cache hit rate: without it you cannot tell whether you are actually getting the discount.
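Measuring the hit rate only takes per-request token counts. The sketch below computes the hit rate and the resulting input cost. The `Usage` fields, the $1.00-per-million-token price, and the 0.10 cached-token multiplier (within the 10-25% range above) are illustrative assumptions. Map the fields to whatever cached-token count your provider reports and substitute its real prices. The model also ignores any surcharge a provider may apply when the cache is first written.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    input_tokens: int   # all prompt tokens in the request, cached or not
    cached_tokens: int  # the part of input_tokens served from the prefix cache


def cache_hit_rate(usages: list[Usage]) -> float:
    """Fraction of all input tokens that were served from the cache."""
    total = sum(u.input_tokens for u in usages)
    cached = sum(u.cached_tokens for u in usages)
    return cached / total if total else 0.0


def input_cost(usages: list[Usage], price_per_mtok: float, cached_multiplier: float) -> float:
    """Input cost in dollars when cached tokens cost cached_multiplier times the normal price."""
    cost = 0.0
    for u in usages:
        uncached = u.input_tokens - u.cached_tokens
        billable = uncached + u.cached_tokens * cached_multiplier
        cost += billable * price_per_mtok / 1_000_000
    return cost


if __name__ == "__main__":
    # Hypothetical traffic: a cold first call, then two calls reusing an 8,000-token prefix.
    batch = [Usage(10_000, 0), Usage(10_000, 8_000), Usage(10_000, 8_000)]
    print(f"hit rate: {cache_hit_rate(batch):.1%}")  # hit rate: 53.3%
    print(f"with cache: ${input_cost(batch, 1.00, 0.10):.4f}")  # with cache: $0.0156
    print(f"without cache: ${input_cost(batch, 1.00, 1.00):.4f}")  # without cache: $0.0300
```

A hit rate that stays near zero after the first request usually means something early in the prompt changes on every call.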

## Related terms

- [prompt-caching](https://promtable.com/glossary/prompt-caching)
- [kv-cache](https://promtable.com/glossary/kv-cache)
- [system-prompt](https://promtable.com/glossary/system-prompt)

*Last updated: 2026-06-01*
---

Maintained by Promtable (https://promtable.com). Content: CC BY 4.0. Cite as "Promtable — https://promtable.com/glossary/prefix-caching".
Contact: info@vibecodingturkey.com.