
Model collapse

Model collapse is what happens when a model is trained or fine-tuned on its own outputs across generations — quality degrades, diversity shrinks, and tail knowledge is forgotten.
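A common toy illustration of the effect is a one-dimensional Gaussian "model" that is refit, generation after generation, to samples drawn from its own previous fit. The sketch below is an assumption-laden simplification, not a reproduction of any published experiment: `simulate_collapse`, the sample size of 20, the 2000 generations, and the 20% real-data share are all hypothetical choices for illustration.

```python
import random
import statistics


def simulate_collapse(generations: int, n: int, real_fraction: float = 0.0,
                      seed: int = 0) -> list[float]:
    """Repeatedly refit a 1-D Gaussian to samples drawn from the previous fit.

    real_fraction is the share of each generation's n training samples drawn
    from the true distribution N(0, 1) instead of from the previous model.
    Returns the fitted standard deviation after every generation.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # the "model" starts out matching the true data
    history = []
    for _ in range(generations):
        n_real = round(n * real_fraction)
        real = [rng.gauss(0.0, 1.0) for _ in range(n_real)]
        synthetic = [rng.gauss(mu, sigma) for _ in range(n - n_real)]
        data = real + synthetic
        # Maximum-likelihood refit: sample mean and population std deviation.
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)
        history.append(sigma)
    return history


# Closed loop: every generation trains only on the previous generation's output.
closed = simulate_collapse(generations=2000, n=20)
# Anchored loop: 20% of each generation's data is fresh real data.
anchored = simulate_collapse(generations=2000, n=20, real_fraction=0.2)
print(f"closed loop:   final sigma = {closed[-1]:.3g}")
print(f"20% real data: final sigma = {anchored[-1]:.3g}")
```

In the closed loop the fitted spread shrinks toward zero: each refit sees only a finite sample, so tail values that happen not to be drawn are never recovered, and the distribution narrows around its mode. Mixing in even a modest share of real data keeps sigma from collapsing.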

Shumailov et al. documented the effect in 2023-2024, and research through 2026 has confirmed it. Model collapse occurs when synthetic data is fed back into training in a loop without quality filtering or an anchor of grounded data. Each generation converges further toward its own most likely (modal) outputs, long-tail knowledge drops out, and the output grows increasingly homogeneous.

The practical implications in 2026: synthetic-data pipelines must mix in real human-authored or otherwise grounded data, must filter for quality, and must monitor diversity metrics. Pre-training corpora are now heavily contaminated with AI-generated content, so major labs invest in provenance detection and human-authored data sources to counter collapse.
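Monitoring diversity can be as simple as tracking one metric per model generation and alerting when it falls well below the starting point. The source does not name a specific metric; distinct-n (unique n-grams divided by total n-grams across a batch of outputs) is one common lexical choice and is assumed here. The function names, the whitespace tokenization, the 0.8 alert ratio, and the sample texts are all hypothetical.

```python
def distinct_n(texts: list[str], n: int = 2) -> float:
    """distinct-n: unique n-grams divided by total n-grams across a batch."""
    total = 0
    unique = set()
    for text in texts:
        tokens = text.split()  # whitespace tokenization keeps the sketch simple
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(grams)
        unique.update(grams)
    return len(unique) / total if total else 0.0


def flag_diversity_drops(scores: list[float], min_ratio: float = 0.8) -> list[int]:
    """Indices of generations whose score fell below min_ratio * generation 0."""
    baseline = scores[0]
    return [i for i, score in enumerate(scores) if score < min_ratio * baseline]


# Hypothetical samples from two generations of a self-training loop.
gen_0 = ["the cat sat on the mat", "a dog ran in the park", "birds sing at dawn"]
gen_3 = ["the cat sat on the mat", "the cat sat on the mat", "the cat sat on a mat"]

scores = [distinct_n(gen_0), distinct_n(gen_3)]
print(scores, flag_diversity_drops(scores))
```

Lexical metrics like this catch repetition cheaply but miss semantic homogenization, so in practice they would likely be paired with embedding-based or task-specific diversity measures.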

Common mistakes

FAQ

What is model collapse?

Model collapse is what happens when a model is trained or fine-tuned on its own outputs across generations — quality degrades, diversity shrinks, and tail knowledge is forgotten.

What are the most common mistakes with model collapse?

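Two mistakes come up most often. The first is distilling a student model on a teacher's outputs with no real data to anchor it. The second is running a synthetic-data flywheel, in which each model generation produces training data for the next, without quality gates on what goes back into training.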
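A minimal sketch of one flywheel step that avoids both mistakes, assuming every synthetic example can be given a scalar quality score. `build_training_mix`, `toy_quality`, the 0.5 quality threshold, and the 30% minimum real-data share are hypothetical choices for illustration; a real pipeline would more likely gate on a learned quality scorer than on a type-token ratio.

```python
import random
from collections.abc import Callable


def build_training_mix(real: list[str], synthetic: list[str],
                       quality: Callable[[str], float],
                       min_quality: float = 0.5, real_fraction: float = 0.3,
                       seed: int = 0) -> list[str]:
    """Quality-gate synthetic examples, then cap them so real data keeps
    (up to float rounding) at least real_fraction of the returned mix."""
    if not 0.0 < real_fraction <= 1.0:
        raise ValueError("real_fraction must be in (0, 1]")
    # Quality gate: drop synthetic examples that score below the threshold.
    kept = [s for s in synthetic if quality(s) >= min_quality]
    # Real-data anchor: the largest synthetic count that keeps the real share.
    max_synthetic = int(len(real) * (1 - real_fraction) / real_fraction)
    rng = random.Random(seed)
    if len(kept) > max_synthetic:
        kept = rng.sample(kept, max_synthetic)
    mix = real + kept
    rng.shuffle(mix)
    return mix


def toy_quality(text: str) -> float:
    """Hypothetical stand-in for a quality scorer: the type-token ratio."""
    tokens = text.split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0


real = ["human written example one", "grounded fact about rivers",
        "a verified support answer"]
synthetic = [f"synthetic sample number {i}" for i in range(10)]
synthetic.append("spam spam spam spam")  # repetitive, should fail the gate

mix = build_training_mix(real, synthetic, toy_quality)
print(len(mix), sum(t in real for t in mix) / len(mix))
```

Capping the synthetic volume, rather than only adding real data, keeps the real share fixed as the flywheel produces more and more synthetic output each generation.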

Sources

Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/model-collapse.md.