
Prefix caching

Prefix caching reuses the KV-cache state computed for a shared prompt prefix across many requests, so the prefix is processed once and amortised over all subsequent calls.

Prefix caching is the server-side mechanism that powers prompt caching as exposed by OpenAI, Anthropic, and Google in 2026. The model server stores the attention key/value tensors computed for a common prefix (system prompt + few-shot examples + tool schemas) and reuses them when a new request begins with exactly the same tokens. The per-token cost of the cached portion drops to 10-25% of the normal input price. The cache is keyed on a hash of the prefix tokens, so changing even one token in the prefix causes a cache miss. To maximise cache hits, design system prompts so that the prefix stays stable: keep fixed content at the start and put anything that varies per request after it.
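The keying behaviour described above can be illustrated with a toy cache. This is only a sketch under stated assumptions: `ToyPrefixCache` and its methods are hypothetical names, the "tokens" are plain strings, and a placeholder string stands in for the real key/value tensors. It is not how any particular provider implements caching.

```python
import hashlib


class ToyPrefixCache:
    """Toy stand-in for a server-side prefix cache (hypothetical, illustrative only)."""

    def __init__(self):
        self._store = {}   # hash of prefix tokens -> placeholder for KV state
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(prefix_tokens):
        # Hash the exact token sequence; any change yields a different key.
        joined = "\x1f".join(prefix_tokens).encode("utf-8")
        return hashlib.sha256(joined).hexdigest()

    def lookup_or_compute(self, prefix_tokens):
        """Return (state, was_hit). On a miss, 'compute' and store the state."""
        key = self._key(prefix_tokens)
        if key in self._store:
            self.hits += 1
            return self._store[key], True
        self.misses += 1
        # Placeholder for the expensive prefill pass over the prefix.
        state = f"kv-state-for-{len(prefix_tokens)}-tokens"
        self._store[key] = state
        return state, False


cache = ToyPrefixCache()
prefix = ["You", "are", "a", "helpful", "assistant", "."]

_, first_hit = cache.lookup_or_compute(prefix)    # first request: prefix is computed
_, second_hit = cache.lookup_or_compute(prefix)   # identical prefix: reused
# Prepending variable content changes the token sequence, so the key changes.
_, third_hit = cache.lookup_or_compute(["Today", "is", "Monday", "."] + prefix)
```

In this sketch only the second lookup is a hit. The third misses even though it contains the original tokens, because they no longer sit at the start of the sequence.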

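Common mistakes

Putting variable content (timestamps, user IDs) at the start of the prompt, which breaks the cache.

Not measuring cache hit rate, so you don't know whether you're getting the discount.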

FAQ

What is prefix caching?

Prefix caching reuses the KV-cache state computed for a shared prompt prefix across many requests, so the prefix is processed once and amortised over all subsequent calls.

What are the most common mistakes with prefix caching?

Putting variable content (timestamps, user IDs) at the start of the prompt, which changes the prefix on every request and breaks the cache. Not measuring cache hit rate, which leaves you unable to tell whether you're getting the discount.
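Both mistakes can be guarded against with a few lines of code. The following is a minimal sketch, assuming a hypothetical `Usage` record with `input_tokens` and `cached_tokens` fields. Real providers report cached tokens under their own field names, so treat these as placeholders. The `cached_price_ratio` parameter is also an assumption, standing in for the 10-25% figure quoted above.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Hypothetical per-request usage record; real provider field names differ."""
    input_tokens: int    # total prompt tokens in the request
    cached_tokens: int   # prompt tokens served from the prefix cache


def build_prompt(stable_prefix: str, variable_context: str, user_message: str) -> str:
    # Stable content goes first so every request shares the same prefix;
    # per-request content (timestamps, user IDs) goes after it.
    return "\n\n".join([stable_prefix, variable_context, user_message])


def cache_hit_rate(usages) -> float:
    """Fraction of all input tokens that were served from the cache."""
    total = sum(u.input_tokens for u in usages)
    cached = sum(u.cached_tokens for u in usages)
    return cached / total if total else 0.0


def effective_input_cost(usages, price_per_token: float, cached_price_ratio: float) -> float:
    """Input cost when cached tokens are billed at a fraction of the normal price."""
    cost = 0.0
    for u in usages:
        uncached = u.input_tokens - u.cached_tokens
        cost += uncached * price_per_token
        cost += u.cached_tokens * price_per_token * cached_price_ratio
    return cost


# Example: the first request misses, the next two reuse a 900-token prefix.
usages = [Usage(1000, 0), Usage(1000, 900), Usage(1000, 900)]
rate = cache_hit_rate(usages)
cost = effective_input_cost(usages, price_per_token=1.0, cached_price_ratio=0.10)
```

Tracking the hit rate over time makes regressions visible: if a prompt change accidentally moves variable content into the prefix, the rate falls towards zero and the cost rises back to the uncached price.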

Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/prefix-caching.md.