# Regression baseline

**Source:** https://promtable.com/glossary/regression-baseline

> A regression baseline is the recorded performance of the current production prompt + model against the golden set — every change must match or beat this baseline before merge.

---
Without a regression baseline, prompt and model changes ship on hope. The workflow has three steps: record the current production win rate against the [golden set](https://promtable.com/glossary/golden-set), gate merges on a rule such as "the change must not drop the win rate by more than X percentage points", and publish the baseline-versus-candidate diff in the PR. Common implementations include Braintrust experiment diffs, Langfuse evals, and custom CI scripts that run an eval suite. In production, keep a separate baseline per route (chat vs. classification vs. extraction) and set the tolerance per route (chat: 0%; bulk classification: 1-2% acceptable). Treat the check as a gate, not a block: a regression beyond tolerance can still merge, but only with an explicit human override. Together, the baseline, the golden set, and eval CI are what turn AI development from artisanal to engineered.
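The gate described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's API: the names `TOLERANCE_PP`, `GateResult`, `win_rate`, and `check_route` are hypothetical, the tolerances are read as percentage points (an assumption; the text says "0%" and "1-2%"), and each golden-set case is assumed to reduce to a pass/fail outcome.

```python
from dataclasses import dataclass

# Hypothetical per-route tolerances, in percentage points.
# Values mirror the examples in the text; tune them per product.
TOLERANCE_PP = {"chat": 0.0, "classification": 2.0}


@dataclass
class GateResult:
    route: str
    baseline: float          # recorded production win rate, in percent
    candidate: float         # win rate of the proposed change, in percent
    delta_pp: float          # candidate minus baseline, in percentage points
    within_tolerance: bool   # True if the drop does not exceed the tolerance
    allowed: bool            # True if the merge may proceed


def win_rate(outcomes: list[bool]) -> float:
    """Percentage of golden-set cases that passed."""
    if not outcomes:
        raise ValueError("golden set produced no outcomes")
    return 100.0 * sum(outcomes) / len(outcomes)


def check_route(route: str, baseline: float, outcomes: list[bool],
                override: bool = False) -> GateResult:
    """Compare a candidate run against the recorded baseline for one route."""
    candidate = win_rate(outcomes)
    delta = candidate - baseline
    within = delta >= -TOLERANCE_PP[route]
    # Gate, not block: a human override lets a regression through.
    return GateResult(route, baseline, candidate, delta, within, within or override)


if __name__ == "__main__":
    run = [True] * 9 + [False]  # 9 of 10 golden-set cases pass
    for route in ("chat", "classification"):
        print(check_route(route, baseline=91.0, outcomes=run))
```

In CI, a script along these lines would typically exit non-zero when `allowed` is false and post the `GateResult` fields as the PR diff.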

## When to use

- Production AI features with eval pipelines.

## Common mistakes

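- Skipping the baseline: regressions ship silently because nothing measures them.
- Setting the tolerance too loose: every change can shave a few percent off quality while still passing the gate, and the losses add up across merges.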
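The second mistake is easiest to see with a small simulation. The sketch below assumes the baseline is re-recorded after every merge, which is one common setup but not something the text specifies; `drift_after` and its parameters are hypothetical names.

```python
def drift_after(merges: int, drop_per_merge_pp: float, tolerance_pp: float,
                start: float = 90.0, rebaseline: bool = True) -> float:
    """Return the win rate after a series of changes that each lose a little.

    Each change drops the win rate by `drop_per_merge_pp` percentage points.
    A change is rejected (and the series stops) once the drop relative to
    the current baseline exceeds `tolerance_pp`.
    """
    baseline = rate = start
    for _ in range(merges):
        candidate = rate - drop_per_merge_pp
        if baseline - candidate > tolerance_pp:
            break  # the gate rejects this change
        rate = candidate
        if rebaseline:
            baseline = rate  # assumption: baseline follows production
    return rate


if __name__ == "__main__":
    # Five changes, each 1.5 points worse, against a 2-point tolerance.
    print(drift_after(5, 1.5, 2.0))                    # moving baseline
    print(drift_after(5, 1.5, 2.0, rebaseline=False))  # fixed baseline
```

With a moving baseline, every change passes individually while the total loss grows with each merge; with a fixed baseline, the gate stops the slide once the cumulative drop exceeds the tolerance.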

## Related terms

- [golden-set](https://promtable.com/glossary/golden-set)
- [evals-driven-development](https://promtable.com/glossary/evals-driven-development)
- [regression-suite](https://promtable.com/glossary/regression-suite)

*Last updated: 2026-06-01*
---

Maintained by Promtable (https://promtable.com). Content: CC BY 4.0. Cite as "Promtable — https://promtable.com/glossary/regression-baseline".
Contact: info@vibecodingturkey.com.