Constitutional AI
Constitutional AI is Anthropic's alignment method where a model is trained to follow a written constitution — a set of principles applied during self-critique and revision — without per-task human preference labels at every step.
Introduced by Anthropic in 2022 and refined through Claude 4.x, Constitutional AI replaces large parts of RLHF with self-critique against a model constitution. During training the model is prompted to critique its own outputs against the constitution and revise; the resulting pairs train a reward model. The result: scalable alignment with less reliance on continuous human labelling, more transparent alignment criteria (the constitution is public), and easier auditability. By 2026 versions of the technique are used across multiple labs in production training pipelines.
Common mistakes
- Treating the constitution as static — it evolves with deployment learnings.
- Skipping human evaluation entirely — the constitution still needs human-graded checks.
FAQ
What is constitutional ai?
Constitutional AI is Anthropic's alignment method where a model is trained to follow a written constitution — a set of principles applied during self-critique and revision — without per-task human preference labels at every step.
What are the most common mistakes with constitutional ai?
Treating the constitution as static — it evolves with deployment learnings. Skipping human evaluation entirely — the constitution still needs human-graded checks.
Related terms
- Instruction tuning — Instruction tuning is the post-training stage where a base language model is fine-tuned on examples of (instruction, ideal response) pairs to follow human instructions reliably.
- Fine-tuning — Fine-tuning updates a pretrained model's weights on task-specific data, baking the new behaviour into the model rather than relying on prompts.
- Evals (LLM evaluations) — Evals are systematic tests that measure how well a language model or LLM-powered system performs on a defined task using a golden set of inputs and reference outputs.
Sources
Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/constitutional-ai.md.