concept

Guardrails

Guardrails are deterministic checks layered around a language model to prevent unsafe, off-topic, or non-compliant outputs from reaching the user.

Guardrails sit on the input or output side of a model call. Input guardrails (PII detection, prompt injection scanners, policy classifiers) reject or rewrite the request before the model sees it. Output guardrails (toxicity checks, JSON validators, fact-checkers, profanity filters) inspect the model's response and either pass, rewrite, or refuse. Frameworks: Guardrails AI, NeMo Guardrails, Llama Guard, OpenAI Moderation, Lakera. In production every LLM-facing surface in 2026 has at least lightweight guardrails — the question is which ones.

When to use guardrails

Common mistakes

FAQ

What is guardrails?

Guardrails are deterministic checks layered around a language model to prevent unsafe, off-topic, or non-compliant outputs from reaching the user.

When should I use guardrails?

Any user-facing AI feature. Regulated industries (finance, healthcare, education).

What are the most common mistakes with guardrails?

Relying only on prompt instructions for safety — easily bypassed. Using guardrails as a substitute for an evals system.

Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/guardrails.md.