Instruction hierarchy
Instruction hierarchy is a model's trained ordering of trust — system prompt outranks user message which outranks retrieved content — used to resist prompt injection and jailbreak attempts.
Introduced by OpenAI in 2024 and now widely adopted, instruction hierarchy explicitly trains models to weight different sources of instructions differently. The system prompt (highest trust) sets policy; the user message (medium trust) can request actions within policy; tool output and retrieved content (lowest trust) provide information but should not be obeyed as instructions. The technique meaningfully reduces indirect prompt-injection success rates but does not eliminate them. Combine with input/output guardrails and tight tool surfaces for production safety.
Common mistakes
- Treating instruction hierarchy as complete protection — it raises the bar, not closes the door.
- Putting trust-sensitive policy in the user message instead of the system prompt.
FAQ
What is instruction hierarchy?
Instruction hierarchy is a model's trained ordering of trust — system prompt outranks user message which outranks retrieved content — used to resist prompt injection and jailbreak attempts.
What are the most common mistakes with instruction hierarchy?
Treating instruction hierarchy as complete protection — it raises the bar, not closes the door. Putting trust-sensitive policy in the user message instead of the system prompt.
Related terms
- Prompt injection — Prompt injection is an attack where hostile content in a model's input (a webpage, a retrieved document, a user message) overrides the system prompt's instructions.
- System prompt — A system prompt is the high-priority instruction block that defines a model's role, constraints, and default behaviors for an entire conversation.
- Guardrails — Guardrails are deterministic checks layered around a language model to prevent unsafe, off-topic, or non-compliant outputs from reaching the user.
- Jailbreak (LLM) — A jailbreak is a prompt-level attack that bypasses a language model's safety guardrails, causing it to produce content the model was trained to refuse.
Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/instruction-hierarchy.md.