# Distillation

**Source:** https://promtable.com/glossary/distillation

> Distillation trains a smaller "student" model to mimic a larger "teacher" model's outputs, capturing most of the quality at a fraction of the inference cost.

---

Distillation produces a small, fast model that behaves like a big, slow one. The student trains on the teacher's outputs (logits or generated text) rather than on human-labelled data, which lets it pick up nuanced behaviour cheaply. As of 2026, distillation underpins much of consumer-facing inference: small tiers such as GPT-4o-mini, Claude Haiku, Gemini Flash, and Llama 3.3 8B are generally understood to be distilled from larger siblings, although vendors do not always publish the training details. The current frontier is reasoning distillation: teaching small models to produce chain-of-thought reasoning by training on traces from OpenAI's o-series or Claude with extended thinking.
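When the teacher's logits are available, the training signal is commonly a divergence between the teacher's and the student's temperature-softened output distributions. The sketch below illustrates that idea in plain Python; the function names, the temperature of 2.0, and the three-token logits are illustrative assumptions, not any vendor's actual recipe.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities, softened by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Multiplying by temperature**2 keeps the loss scale comparable across
    temperatures, as in the classic soft-target formulation.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        p * math.log(p / q)
        for p, q in zip(p_teacher, p_student)
        if p > 0
    )
    return kl * temperature ** 2


# Made-up logits over a three-token vocabulary (assumption for illustration).
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]
far_student = [0.5, 1.0, 4.0]

print(distillation_loss(close_student, teacher))  # small: student agrees
print(distillation_loss(far_student, teacher))    # large: student disagrees
```

A student that ranks tokens the way the teacher does yields a loss near zero, while one that prefers a different token is penalised heavily. When only generated text is available, the same role is typically played by ordinary cross-entropy on the teacher's sampled tokens.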

## When to use

- Cost-sensitive inference at high volume.
- Edge / on-device deployment.

## Common mistakes

- Distilling from a teacher that is itself wrong: the student inherits the teacher's errors.
- Skipping evals: distilled models can drift from the teacher on long-tail tasks even when headline quality looks fine.
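Both mistakes can be caught by scoring teacher and student against reference answers, broken down by task slice. The following is a minimal sketch under assumed names: the record fields (`slice`, `reference`, `teacher`, `student`), the `max_gap` threshold, and the sample data are all hypothetical.

```python
from collections import defaultdict


def accuracy_by_slice(records, model_key):
    """Return {slice: fraction of records where the model matches the reference}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        total[record["slice"]] += 1
        correct[record["slice"]] += int(record[model_key] == record["reference"])
    return {name: correct[name] / total[name] for name in total}


def drifted_slices(records, max_gap=0.1):
    """List slices where the student trails the teacher by more than max_gap."""
    teacher = accuracy_by_slice(records, "teacher")
    student = accuracy_by_slice(records, "student")
    return sorted(name for name in teacher if teacher[name] - student[name] > max_gap)


# Hypothetical eval records; real ones would come from a held-out test set.
records = [
    {"slice": "common", "reference": "A", "teacher": "A", "student": "A"},
    {"slice": "common", "reference": "B", "teacher": "B", "student": "B"},
    {"slice": "long_tail", "reference": "C", "teacher": "C", "student": "A"},
    {"slice": "long_tail", "reference": "D", "teacher": "D", "student": "D"},
]

print(drifted_slices(records))  # ['long_tail']
```

Low teacher accuracy on a slice signals the first mistake (errors the student will inherit); a large teacher-student gap signals the second (drift that an aggregate score would hide).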

## Related terms

- [fine-tuning](https://promtable.com/glossary/fine-tuning)
- [reasoning-model](https://promtable.com/glossary/reasoning-model)
- [model-router](https://promtable.com/glossary/model-router)

*Last updated: 2026-06-01*

---

Original page: https://promtable.com/glossary/distillation
Maintained by Promtable (https://promtable.com). Content: CC BY 4.0. Cite as "Promtable — https://promtable.com/glossary/distillation".
Contact: info@vibecodingturkey.com.