# Auto-eval (LLM)

**Source:** https://promtable.com/glossary/auto-eval

> Auto-eval is the automated grading of LLM output — usually by an LLM judge with a rubric — that replaces or supplements human grading in eval suites.

---
Auto-eval is the cost-effective core of the evals discipline in 2026. The pattern: a strong judge model (Claude 4.6 Sonnet, GPT-4o, Gemini 2 Pro) takes the rubric, the candidate output, and sometimes a reference output, then produces a score for each rubric dimension. Common tools include Braintrust, Ragas, DeepEval, and Inspect Evals. Pair the judge with periodic human spot-checks (for example, a 10% sample of graded outputs) to catch judge drift, where the judge's scores gradually diverge from what human graders would assign. Because no human-grading team has to scale with volume, auto-eval makes it economical to run evals on every prompt change and on every production sample.
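
The judge pattern can be sketched in a few lines of Python. This is a minimal illustration, not the API of any of the tools named above: `Rubric`, `build_judge_prompt`, `parse_scores`, and `auto_eval` are hypothetical names, the 1 to 5 scale is an assumption, and `judge` stands in for whatever function sends a prompt to your judge model and returns its text reply.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rubric:
    # Hypothetical structure: dimension name -> what a high score means.
    dimensions: dict[str, str]
    scale_min: int = 1  # assumed scale, not prescribed by any tool
    scale_max: int = 5


def build_judge_prompt(rubric: Rubric, candidate: str,
                       reference: Optional[str] = None) -> str:
    """Assemble rubric + candidate (+ optional reference) into one prompt."""
    lines = [
        "You are grading an LLM output against a rubric.",
        f"Score each dimension from {rubric.scale_min} to {rubric.scale_max}.",
        "Rubric:",
    ]
    for name, description in rubric.dimensions.items():
        lines.append(f"- {name}: {description}")
    if reference is not None:
        lines += ["Reference output:", reference]
    lines += [
        "Candidate output:",
        candidate,
        "Reply with a JSON object mapping each dimension to an integer score.",
    ]
    return "\n".join(lines)


def parse_scores(raw: str, rubric: Rubric) -> dict[str, int]:
    """Parse the judge's JSON reply and reject missing or out-of-range scores."""
    data = json.loads(raw)
    scores = {}
    for name in rubric.dimensions:
        value = int(data[name])  # KeyError if the judge skipped a dimension
        if not rubric.scale_min <= value <= rubric.scale_max:
            raise ValueError(f"{name} score {value} is outside the scale")
        scores[name] = value
    return scores


def auto_eval(judge: Callable[[str], str], rubric: Rubric, candidate: str,
              reference: Optional[str] = None) -> dict[str, int]:
    """Run one judge call and return a score per rubric dimension."""
    prompt = build_judge_prompt(rubric, candidate, reference)
    return parse_scores(judge(prompt), rubric)
```

Passing the judge in as a plain callable keeps the sketch independent of any provider SDK and makes it easy to swap in a stub when testing the parsing logic.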

## When to use

- Any production LLM feature that has an eval suite, especially one rerun on every prompt change.
- Continuous monitoring of production samples.
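
For continuous monitoring, one option is to pick production samples deterministically, so the same sample always gets the same in-or-out decision across reruns. The sketch below assumes each sample has a stable string ID; `in_sample` is a hypothetical helper, and the 10% rate mirrors the spot-check figure above rather than a recommended value.

```python
import hashlib


def in_sample(sample_id: str, rate: float) -> bool:
    """Return True for roughly `rate` of all IDs, stable across runs."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    digest = hashlib.sha256(sample_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    return bucket < rate * 2**64


# Hypothetical usage: route about 10% of production outputs to human spot-checks.
ids = [f"request-{i}" for i in range(1000)]
for_human_review = [sid for sid in ids if in_sample(sid, 0.10)]
```

Hashing the ID instead of calling a random number generator means a sample can be re-graded later, by the judge or by a human, without storing the sampling decision.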

## Common mistakes

- Single auto-judge without calibration against human grades: drift goes unnoticed.
- Rubric too vague: the auto-judge gives nearly every output a similar score, so results stop distinguishing good outputs from bad ones.
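
Both mistakes can be checked numerically. The sketch below assumes you hold paired judge and human scores for the same spot-checked outputs; `agreement_rate`, `mean_bias`, and `score_spread` are hypothetical helpers, and any alert thresholds you put on them are your own choice, not values from a tool.

```python
from statistics import mean, pstdev


def agreement_rate(judge_scores: list[int], human_scores: list[int],
                   tolerance: int = 0) -> float:
    """Fraction of items where judge and human differ by at most `tolerance`."""
    if not judge_scores or len(judge_scores) != len(human_scores):
        raise ValueError("need two non-empty score lists of equal length")
    hits = sum(abs(j - h) <= tolerance
               for j, h in zip(judge_scores, human_scores))
    return hits / len(judge_scores)


def mean_bias(judge_scores: list[int], human_scores: list[int]) -> float:
    """Average of (judge - human); positive means the judge is lenient."""
    if not judge_scores or len(judge_scores) != len(human_scores):
        raise ValueError("need two non-empty score lists of equal length")
    return mean(j - h for j, h in zip(judge_scores, human_scores))


def score_spread(scores: list[int]) -> float:
    """Population standard deviation; near zero suggests a vague rubric."""
    return pstdev(scores)


# Hypothetical spot-check data on a 1 to 5 scale.
judge = [4, 5, 3, 2]
human = [4, 4, 3, 1]
print(agreement_rate(judge, human))      # exact-match agreement
print(agreement_rate(judge, human, 1))   # agreement within one point
print(mean_bias(judge, human))           # judge leniency relative to humans
print(score_spread([4, 4, 4, 4]))        # no spread at all
```

Tracking these numbers per spot-check batch gives a simple drift signal: falling agreement or growing bias points at the judge, while a spread that stays near zero points at the rubric.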

## Related terms

- [evals](https://promtable.com/glossary/evals)
- [llm-jury](https://promtable.com/glossary/llm-jury)
- [ab-testing-prompts](https://promtable.com/glossary/ab-testing-prompts)

*Last updated: 2026-06-01*

---

Original page: https://promtable.com/glossary/auto-eval
Maintained by Promtable (https://promtable.com). Content: CC BY 4.0. Cite as "Promtable — https://promtable.com/glossary/auto-eval".
Contact: info@vibecodingturkey.com.