technique

Constitutional AI

Constitutional AI is Anthropic's alignment method in which a model is trained to follow a written constitution, a set of principles applied during self-critique and revision, instead of relying on human preference labels for every output.

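Introduced by Anthropic in 2022 and refined through Claude 4.x, Constitutional AI replaces much of the human preference labelling in RLHF with AI feedback guided by a written constitution. Training has two phases. In the supervised phase, the model is prompted to critique its own outputs against the constitution and revise them, and it is then fine-tuned on the revised responses. In the reinforcement learning phase, the model compares pairs of its own responses against the constitution, and those AI-generated preferences train a preference (reward) model that supplies the reward signal, an approach often called RLAIF. The result is alignment that scales with less reliance on continuous human labelling, uses more transparent criteria (the constitution is public), and is easier to audit. By 2026, versions of the technique are used in production training pipelines at multiple labs.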
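
The two data-generation steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Anthropic's implementation: `Generate` is a hypothetical stand-in for any language model call, the two principles in `CONSTITUTION` are invented for the example, and the prompt wording is arbitrary.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical stand-in for a language model call: prompt text in, completion out.
Generate = Callable[[str], str]

# Illustrative principles only; not the text of any published constitution.
CONSTITUTION = [
    "Choose the response that is least harmful.",
    "Choose the response that is most honest and helpful.",
]


def critique_and_revise(
    generate: Generate, prompt: str, principles: List[str]
) -> Tuple[str, str]:
    """Supervised phase: return (initial_response, revised_response).

    The response is critiqued and rewritten once per principle; the final
    revision is what the model would later be fine-tuned on.
    """
    initial = generate(prompt)
    response = initial
    for principle in principles:
        critique = generate(
            f"Prompt: {prompt}\nResponse: {response}\n"
            f"Critique the response using this principle: {principle}"
        )
        response = generate(
            f"Prompt: {prompt}\nResponse: {response}\nCritique: {critique}\n"
            "Rewrite the response to address the critique."
        )
    return initial, response


@dataclass
class PreferencePair:
    """One AI-labelled comparison, usable as preference-model training data."""
    prompt: str
    chosen: str
    rejected: str


def label_preference(
    generate: Generate, prompt: str, a: str, b: str, principle: str
) -> PreferencePair:
    """RL phase: ask the model which of two responses better fits a principle."""
    verdict = generate(
        f"Principle: {principle}\nPrompt: {prompt}\n(A) {a}\n(B) {b}\n"
        "Which response better follows the principle? Answer A or B."
    )
    if verdict.strip().upper().startswith("A"):
        return PreferencePair(prompt, chosen=a, rejected=b)
    return PreferencePair(prompt, chosen=b, rejected=a)
```

To try it without a model, pass any function that maps a prompt string to a string as `generate`. In a real pipeline the revised responses would feed supervised fine-tuning and the `PreferencePair` records would train the preference model.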

Common mistakes

FAQ

What is Constitutional AI?

Constitutional AI is Anthropic's alignment method in which a model is trained to follow a written constitution, a set of principles applied during self-critique and revision, instead of relying on human preference labels for every output.

What are the most common mistakes with Constitutional AI?

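Two mistakes come up most often. The first is treating the constitution as static; it should evolve as deployment surfaces new lessons. The second is skipping human evaluation entirely; the constitution still needs human-graded checks to confirm that it produces the intended behaviour.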
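
One way to keep human-graded checks in the loop is to audit a random sample of AI-labelled items and track how often human graders agree. The sketch below is an assumption-laden illustration: the function names, the sampling fraction, and the use of a simple agreement rate are all choices made for the example, not part of any documented pipeline.

```python
import random
from typing import List, Sequence, Tuple


def sample_for_human_review(
    items: Sequence[str], fraction: float, seed: int = 0
) -> List[str]:
    """Pick a reproducible random subset of items for human grading."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if not items:
        return []
    # Always review at least one item, even for very small fractions.
    k = max(1, round(len(items) * fraction))
    return random.Random(seed).sample(list(items), k)


def agreement_rate(pairs: Sequence[Tuple[str, str]]) -> float:
    """Share of (ai_label, human_label) pairs where the two labels match."""
    if not pairs:
        raise ValueError("need at least one graded pair")
    return sum(ai == human for ai, human in pairs) / len(pairs)
```

A falling agreement rate over time would be one signal that the constitution, or the way it is applied, needs revisiting.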

Last updated: 2026-06-01.