# Constitutional AI

**Source:** https://promtable.com/glossary/constitutional-ai

> Constitutional AI is Anthropic's alignment method where a model is trained to follow a written constitution — a set of principles applied during self-critique and revision — without per-task human preference labels at every step.

---

Introduced by Anthropic in 2022 and refined through Claude 4.x, Constitutional AI replaces much of the human preference labelling used in RLHF with AI feedback guided by a written constitution. Training has two stages. In the supervised stage, the model is prompted to critique its own outputs against the constitution and revise them, and is then fine-tuned on the revised responses. In the reinforcement learning stage, the model compares pairs of responses against the constitution, and those AI-generated preference labels train a preference (reward) model that supplies the RL signal. The result is alignment that scales with less reliance on continuous human labelling, more transparent alignment criteria (the constitution is public), and easier auditing. By 2026, variants of the technique are used in production training pipelines at multiple labs.
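The data-generation steps can be sketched in a few lines of Python, loosely following the two-stage description in the paper listed under Sources. This is a minimal illustration under stated assumptions, not Anthropic's implementation: the `Model` interface, the prompt templates, `stub_model`, and the two principles are all hypothetical placeholders, and a real pipeline would call an actual language model and then run fine-tuning and RL on the collected data.

```python
import random
from typing import Callable, List, Tuple

# Illustrative principles only; NOT the text of any real constitution.
PRINCIPLES: List[str] = [
    "Prefer the response that is least likely to cause harm.",
    "Prefer the response that is most honest about uncertainty.",
]

# Hypothetical model interface: prompt text in, completion text out.
Model = Callable[[str], str]


def critique_and_revise(prompt: str, model: Model, principles: List[str],
                        rounds: int = 1, seed: int = 0) -> Tuple[str, str]:
    """Supervised stage: return (prompt, revised_response) for fine-tuning."""
    rng = random.Random(seed)
    response = model(prompt)
    for _ in range(rounds):
        principle = rng.choice(principles)  # sample one principle per round
        critique = model(
            f"CRITIQUE\nPrinciple: {principle}\n"
            f"Prompt: {prompt}\nResponse: {response}"
        )
        response = model(
            f"REVISE\nCritique: {critique}\n"
            f"Prompt: {prompt}\nResponse: {response}"
        )
    return prompt, response


def label_preference(prompt: str, response_a: str, response_b: str,
                     model: Model, principle: str) -> Tuple[str, str]:
    """RL stage: return (chosen, rejected) as judged against one principle."""
    verdict = model(
        f"COMPARE\nPrinciple: {principle}\nPrompt: {prompt}\n"
        f"(A) {response_a}\n(B) {response_b}\nAnswer A or B."
    )
    if verdict.strip().upper().startswith("A"):
        return response_a, response_b
    return response_b, response_a


def stub_model(text: str) -> str:
    """Canned stand-in for a real model so the sketch runs offline."""
    if text.startswith("CRITIQUE"):
        return "The draft ignores safety concerns."
    if text.startswith("REVISE"):
        return "Revised answer that addresses the critique."
    if text.startswith("COMPARE"):
        return "B"
    return "First-draft answer."


question = "Explain how vaccines work."
# Stage 1 output: a (prompt, revised response) pair for supervised fine-tuning.
sft_example = critique_and_revise(question, stub_model, PRINCIPLES)
# Stage 2 output: a (chosen, rejected) pair for training a preference model.
preference = label_preference(
    question, "Candidate one.", "Candidate two.", stub_model, PRINCIPLES[0]
)
```

The two outputs feed different training steps: `sft_example` is a fine-tuning target, while `preference` is one labelled comparison for the preference model. Sampling a single principle per round is a simplification chosen for brevity.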

## Common mistakes

- Treating the constitution as static: it is revised as lessons emerge from deployment.
- Skipping human evaluation entirely: a model trained against a constitution still needs human-graded checks.

## Related terms

- [instruction-tuning](https://promtable.com/glossary/instruction-tuning)
- [fine-tuning](https://promtable.com/glossary/fine-tuning)
- [evals](https://promtable.com/glossary/evals)

## Sources

- [Constitutional AI (arXiv)](https://arxiv.org/abs/2212.08073)

*Last updated: 2026-06-01*
---

Original page: https://promtable.com/glossary/constitutional-ai
Maintained by Promtable (https://promtable.com). Content: CC BY 4.0. Cite as "Promtable — https://promtable.com/glossary/constitutional-ai".
Contact: info@vibecodingturkey.com.