Evals-driven development

Evals-driven development is the discipline of writing the eval suite first, then iterating prompts and models against it — borrowing test-driven development for LLM work.

Evals-driven development inverts the usual order: define what success looks like, encode it as an automated eval against a golden set, then iterate on prompts, models, and orchestration until the evals pass. As of 2026 it is widely adopted by teams running LLM features in production, because it is the most dependable way to ship: prompt changes made by feel, with no evals behind them, break production in ways nobody notices until users do. Mature implementations integrate evals into CI: every prompt change runs the suite, regressions block merges, and scores are tracked over time in a dashboard. Tools: Braintrust, Langfuse, Ragas, Inspect Evals.
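The loop described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the API of any tool named here: `Case`, `exact_match`, `run_suite`, `gate`, `fake_model`, and the 0.60 baseline are all hypothetical names and values chosen for the example, and `fake_model` stands in for a real LLM call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Case:
    """One golden-set example: an input and the answer we expect."""
    prompt: str
    expected: str


def exact_match(output: str, expected: str) -> float:
    """Score 1.0 when the normalized output equals the expected answer."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_suite(
    model: Callable[[str], str],
    golden: list[Case],
    scorer: Callable[[str, str], float] = exact_match,
) -> float:
    """Return the mean score of `model` over the golden set."""
    if not golden:
        raise ValueError("golden set is empty")
    return sum(scorer(model(c.prompt), c.expected) for c in golden) / len(golden)


def gate(score: float, baseline: float, tolerance: float = 0.0) -> bool:
    """True when the candidate score has not regressed below the baseline."""
    return score >= baseline - tolerance


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a prompt + model under test."""
    answers = {"Capital of France?": "Paris", "2 + 2 = ?": "4"}
    return answers.get(prompt, "unknown")


# Hypothetical golden set; a real one would be drawn from production traffic.
GOLDEN = [
    Case("Capital of France?", "paris"),
    Case("2 + 2 = ?", "4"),
    Case("Largest planet?", "Jupiter"),
]


def main() -> int:
    """Run the suite and return a process exit code (0 = pass, 1 = regression)."""
    score = run_suite(fake_model, GOLDEN)
    print(f"score={score:.2f}")
    return 0 if gate(score, baseline=0.60) else 1
```

In a CI job, the script would end with `sys.exit(main())`, so a regression produces a non-zero exit code and the merge is blocked. Exact match is the simplest possible scorer; most real suites would swap in a rubric, a similarity metric, or a model-graded check through the `scorer` parameter.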

When to use evals-driven development

Common mistakes

FAQ

What is evals-driven development?

Evals-driven development is the discipline of writing the eval suite first, then iterating prompts and models against it — borrowing test-driven development for LLM work.

When should I use evals-driven development?

Use it for any serious production LLM feature, and especially on teams shipping multiple prompt changes per week.

What are the most common mistakes with evals-driven development?

Two stand out. The first is building evals after shipping: by then the prompt has already baked in regressions you didn't catch. The second is an eval set that is too small or unrepresentative of the real input distribution.
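The second mistake can be caught mechanically. The sketch below assumes every eval case and every sampled production request carries a category label, which is an assumption of this example rather than something the entry specifies. The function name `coverage_gaps` and the thresholds `min_cases=50` and `max_gap=0.10` are hypothetical placeholders to tune per project.

```python
from collections import Counter


def coverage_gaps(
    golden_labels: list[str],
    production_labels: list[str],
    min_cases: int = 50,
    max_gap: float = 0.10,
) -> list[str]:
    """Return warnings when the golden set is small or under-covers a category."""
    warnings = []
    if len(golden_labels) < min_cases:
        warnings.append(
            f"golden set has {len(golden_labels)} cases, fewer than {min_cases}"
        )

    golden = Counter(golden_labels)
    production = Counter(production_labels)
    for category in sorted(production):
        prod_share = production[category] / len(production_labels)
        # Counter returns 0 for categories the golden set never mentions.
        golden_share = golden[category] / len(golden_labels) if golden_labels else 0.0
        if prod_share - golden_share > max_gap:
            warnings.append(
                f"'{category}' is {prod_share:.0%} of production "
                f"but {golden_share:.0%} of the golden set"
            )
    return warnings


# Hypothetical labels: the golden set never covers 'outage' requests.
golden_labels = ["billing"] * 8 + ["refund"] * 2
production_labels = ["billing"] * 50 + ["refund"] * 20 + ["outage"] * 30

for warning in coverage_gaps(golden_labels, production_labels):
    print(warning)
```

Here the check flags both problems: the set has only 10 cases, and `outage` requests make up 30% of the production sample but none of the golden set. Comparing category shares is a coarse proxy for representativeness; it says nothing about difficulty or phrasing within a category.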

Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/evals-driven-development.md.