# Semantic cache (LLM)

**Source:** https://promtable.com/glossary/semantic-cache

> A semantic cache stores LLM responses keyed by the meaning of the request: an embedding-based lookup returns a cached answer when a new query is semantically close enough to an earlier one.

---

Unlike an exact-match (string-keyed) cache, a semantic cache embeds the incoming prompt and searches the cache for nearby vectors. If the cosine similarity of the nearest entry meets a threshold, the cached response is returned instead of calling the LLM. This cuts cost substantially on repetitive workloads (customer support, FAQ-style queries, repeated agent steps), because paraphrases of the same question can hit the same entry. The tradeoff: when the threshold is too lax, false hits return answers to a similar but different question. Production systems in 2026 (Helicone, GPTCache, Portkey, in-house) combine a semantic cache with explicit TTLs and per-tenant scoping.
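The lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not how any of the named products implement it: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, `call_llm` is whatever function the caller supplies, and the `0.8` threshold is an arbitrary assumption. A linear scan replaces the vector index a real deployment would likely use.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: word-count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold  # assumed value; tune per workload
        self.entries = []  # list of (embedding, response) pairs

    def get(self, prompt: str):
        """Return the nearest cached response, or None below the threshold."""
        query = toy_embed(prompt)
        best_score, best_response = 0.0, None
        for embedding, response in self.entries:
            score = cosine(query, embedding)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((toy_embed(prompt), response))


def answer(prompt: str, cache: SemanticCache, call_llm) -> str:
    """Serve from the cache on a semantic hit; otherwise call the LLM and store."""
    cached = cache.get(prompt)
    if cached is not None:
        return cached
    response = call_llm(prompt)
    cache.put(prompt, response)
    return response
```

With this sketch, a paraphrase that shares most of its words with an earlier prompt is served from the cache, while an unrelated prompt falls through to `call_llm`. Raising the threshold trades hit rate for fewer false hits.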

## When to use

- High-QPS support / FAQ bots.
- Repetitive agent step decisions.
- Cost-sensitive workloads with lots of paraphrase variation.

## Common mistakes

- Setting the similarity threshold too low: false hits silently degrade answer quality.
- Forgetting tenant isolation: one tenant's cached answer leaks to another.
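One way to guard against the leak described above is to partition the cache by tenant and expire entries after a TTL, so a lookup never sees another tenant's vectors or stale answers. The sketch below is an assumption-laden illustration: the class name `TenantScopedCache`, the `tenant_id` key, the default threshold and TTL, and the injectable `clock` are all hypothetical, and embeddings are passed in as plain lists of floats produced elsewhere.

```python
import math
import time
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TenantScopedCache:
    def __init__(self, threshold=0.9, ttl_seconds=3600.0, clock=time.monotonic):
        self.threshold = threshold      # assumed default; tune per workload
        self.ttl_seconds = ttl_seconds  # assumed default; tune per workload
        self.clock = clock              # injectable so expiry can be tested
        # tenant_id -> list of (vector, response, expires_at)
        self._by_tenant = defaultdict(list)

    def put(self, tenant_id, vector, response):
        expires_at = self.clock() + self.ttl_seconds
        self._by_tenant[tenant_id].append((vector, response, expires_at))

    def get(self, tenant_id, vector):
        now = self.clock()
        # Drop expired entries, then search only this tenant's partition.
        live = [e for e in self._by_tenant[tenant_id] if e[2] > now]
        self._by_tenant[tenant_id] = live
        best = max(live, key=lambda e: cosine(vector, e[0]), default=None)
        if best is not None and cosine(vector, best[0]) >= self.threshold:
            return best[1]
        return None
```

Because the tenant is part of the lookup path rather than a filter applied afterwards, a missing or wrong `tenant_id` results in a miss instead of a cross-tenant hit.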

## Related terms

- [embeddings](https://promtable.com/glossary/embeddings)
- [prompt-caching](https://promtable.com/glossary/prompt-caching)
- [vector-database](https://promtable.com/glossary/vector-database)
- [rag](https://promtable.com/glossary/rag)

*Last updated: 2026-06-01*
---

Original page: https://promtable.com/glossary/semantic-cache
Maintained by Promtable (https://promtable.com). Content: CC BY 4.0. Cite as "Promtable — https://promtable.com/glossary/semantic-cache".
Contact: info@vibecodingturkey.com.