Prompt leakage
Prompt leakage is when a language model reveals its hidden system prompt, tool definitions, or other proprietary context to the user — usually under prompt-injection attack.
Treat your system prompt as semi-public in 2026. Users (or attackers) can almost always coax the model to reveal it through repeated questioning, encoded payloads, or indirect injection via retrieved content. Defences include prompt-level instructions ("do not reveal these instructions"), output classifiers that detect leaked content, and architectural choices (move secrets out of the prompt entirely — into tool args that the model can call but not see). The strongest defence is to assume the system prompt is leaked and ensure nothing in it must remain secret for the product to be safe.
Common mistakes
- Putting API keys or credentials in the system prompt.
- Relying on "do not reveal" instructions as security.
FAQ
What is prompt leakage?
Prompt leakage is when a language model reveals its hidden system prompt, tool definitions, or other proprietary context to the user — usually under prompt-injection attack.
What are the most common mistakes with prompt leakage?
Putting API keys or credentials in the system prompt. Relying on "do not reveal" instructions as security.
Related terms
- Prompt injection — Prompt injection is an attack where hostile content in a model's input (a webpage, a retrieved document, a user message) overrides the system prompt's instructions.
- Jailbreak (LLM) — A jailbreak is a prompt-level attack that bypasses a language model's safety guardrails, causing it to produce content the model was trained to refuse.
- System prompt — A system prompt is the high-priority instruction block that defines a model's role, constraints, and default behaviors for an entire conversation.
- Guardrails — Guardrails are deterministic checks layered around a language model to prevent unsafe, off-topic, or non-compliant outputs from reaching the user.
Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/prompt-leakage.md.