# Model collapse

**Source:** https://promtable.com/glossary/model-collapse

> Model collapse is what happens when a model is trained or fine-tuned on its own outputs across generations — quality degrades, diversity shrinks, and tail knowledge is forgotten.

---

Model collapse was documented by Shumailov et al. in 2023-2024 (the 2024 paper appeared in *Nature*) and, according to follow-up research through 2026, occurs when synthetic-data loops feed back into training without quality filtering or grounded data. Over successive generations the model converges toward its own modal (most probable) outputs, loses long-tail knowledge, and produces increasingly homogeneous output.

Practical implications in 2026: synthetic-data pipelines should mix in real human or otherwise grounded data, filter generated samples for quality, and monitor diversity metrics. Pre-training corpora are now heavily contaminated with AI-generated content, so major labs invest in provenance detection and human-authored data sources to limit collapse.
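The feedback loop can be illustrated with a toy simulation. The sketch below is an assumption-laden illustration, not the experimental setup of the cited paper: the "model" is a one-dimensional Gaussian that is refit on samples drawn from its previous version. The function name `simulate_generations` and the parameter `real_fraction` are hypothetical and chosen only for this example.

```python
import random
import statistics


def simulate_generations(n_generations=500, sample_size=20,
                         real_fraction=0.0, seed=0):
    """Refit a 1-D Gaussian on its own samples, generation after generation.

    real_fraction is the share of each training set drawn from the true
    distribution N(0, 1) rather than from the previous generation's model.
    Returns the fitted standard deviation after each generation.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0 matches the true distribution
    n_real = int(sample_size * real_fraction)
    history = []
    for _ in range(n_generations):
        # Synthetic samples come from the current (possibly degraded) model.
        synthetic = [rng.gauss(mu, sigma) for _ in range(sample_size - n_real)]
        # Real samples always come from the true distribution (the "anchor").
        real = [rng.gauss(0.0, 1.0) for _ in range(n_real)]
        data = synthetic + real
        # Maximum-likelihood refit: sample mean and population std deviation.
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)
        history.append(sigma)
    return history


if __name__ == "__main__":
    collapsed = simulate_generations(real_fraction=0.0)
    anchored = simulate_generations(real_fraction=0.5)
    print("final std, synthetic only:", collapsed[-1])
    print("final std, half real data:", anchored[-1])
```

In this toy setting the synthetic-only run tends to lose variance over generations, which mirrors the loss of tail knowledge, while the run anchored with real samples stays close to the true spread. Real language models are far more complex, so the numbers here only illustrate the mechanism.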

## Common mistakes

- Distilling a student on the teacher's outputs without any real data anchor.
- Running synthetic-data flywheels without quality gates.
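Both mistakes can be guarded against when the training set is assembled. The following is a minimal sketch under stated assumptions: `build_training_mix`, `distinct_n`, the thresholds, and the caller-supplied `quality_score` function are all hypothetical names introduced for illustration, and distinct-n over whitespace tokens is only one simple diversity proxy among many.

```python
import random


def distinct_n(texts, n=2):
    """Share of unique n-grams among all n-grams (whitespace tokens)."""
    total = 0
    unique = set()
    for text in texts:
        tokens = text.split()
        for i in range(len(tokens) - n + 1):
            unique.add(tuple(tokens[i:i + n]))
            total += 1
    return len(unique) / total if total else 0.0


def build_training_mix(real, synthetic, quality_score, min_quality=0.5,
                       min_real_fraction=0.3, min_distinct=0.2, seed=0):
    """Combine real and synthetic samples behind three simple gates.

    1. Quality gate: drop synthetic samples scoring below min_quality.
    2. Real-data anchor: cap synthetic samples so that real data makes up
       at least min_real_fraction of the final mix.
    3. Diversity gate: refuse a synthetic batch whose distinct-n is too low.
    """
    if not 0 < min_real_fraction <= 1:
        raise ValueError("min_real_fraction must be in (0, 1]")
    kept = [s for s in synthetic if quality_score(s) >= min_quality]
    # Largest synthetic count that still leaves real data above its floor.
    max_synth = int(len(real) * (1 - min_real_fraction) / min_real_fraction)
    if len(kept) > max_synth:
        kept = random.Random(seed).sample(kept, max_synth)
    if kept and distinct_n(kept) < min_distinct:
        raise ValueError("synthetic batch is too repetitive to train on")
    return real + kept
```

A pipeline could call `build_training_mix` once per generation and log `distinct_n` over time; a steady decline in that value is one possible early warning that outputs are becoming homogeneous. The thresholds shown are placeholders and would need tuning for any real dataset.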

## Related terms

- [synthetic-data](https://promtable.com/glossary/synthetic-data)
- [distillation](https://promtable.com/glossary/distillation)
- [fine-tuning](https://promtable.com/glossary/fine-tuning)

## Sources

- [Shumailov et al. 2024 (Nature)](https://www.nature.com/articles/s41586-024-07566-y)

*Last updated: 2026-06-01*

---

Original page: https://promtable.com/glossary/model-collapse
Maintained by Promtable (https://promtable.com). Content: CC BY 4.0. Cite as "Promtable — https://promtable.com/glossary/model-collapse".
Contact: info@vibecodingturkey.com.