Guardrails
Guardrails are deterministic checks layered around a language model to prevent unsafe, off-topic, or non-compliant outputs from reaching the user.
Guardrails sit on the input or output side of a model call. Input guardrails (PII detection, prompt injection scanners, policy classifiers) reject or rewrite the request before the model sees it. Output guardrails (toxicity checks, JSON validators, fact-checkers, profanity filters) inspect the model's response and either pass, rewrite, or refuse. Frameworks: Guardrails AI, NeMo Guardrails, Llama Guard, OpenAI Moderation, Lakera. In production every LLM-facing surface in 2026 has at least lightweight guardrails — the question is which ones.
When to use guardrails
- Any user-facing AI feature.
- Regulated industries (finance, healthcare, education).
Common mistakes
- Relying only on prompt instructions for safety — easily bypassed.
- Using guardrails as a substitute for an evals system.
FAQ
What is guardrails?
Guardrails are deterministic checks layered around a language model to prevent unsafe, off-topic, or non-compliant outputs from reaching the user.
When should I use guardrails?
Any user-facing AI feature. Regulated industries (finance, healthcare, education).
What are the most common mistakes with guardrails?
Relying only on prompt instructions for safety — easily bypassed. Using guardrails as a substitute for an evals system.
Related terms
- System prompt — A system prompt is the high-priority instruction block that defines a model's role, constraints, and default behaviors for an entire conversation.
- Hallucination — A hallucination is when a language model produces output that is factually wrong, fabricated, or unsupported, while sounding confident.
- AI agent — An AI agent is a system where a language model autonomously plans and executes a sequence of tool calls to accomplish a goal.
Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/guardrails.md.