concept

Regression baseline

A regression baseline is the recorded performance of the current production prompt + model against the golden set — every change must match or beat this baseline before merge.

Without a regression baseline, prompt + model changes ship by hope. Baseline + eval: capture current win rate against the [[golden-set]], gate merges on 'change must not drop > X percentage points', publish the diff in the PR. Implementations: Braintrust experiment diffs, Langfuse evals, custom CI scripts running an eval suite. Production patterns: baseline per route (chat vs classification vs extraction), set tolerance per route (chat: 0%; bulk classification: 1-2% acceptable), require human override for regressions (gate, not block). The baseline + golden set + eval CI is what turns AI from artisanal to engineered.

When to use regression baseline

Common mistakes

FAQ

What is regression baseline?

A regression baseline is the recorded performance of the current production prompt + model against the golden set — every change must match or beat this baseline before merge.

When should I use regression baseline?

Production AI features with eval pipelines.

What are the most common mistakes with regression baseline?

Skipping baseline — silent regressions ship. Setting tolerance too loose — every change shrinks quality by a few percent.

Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/regression-baseline.md.