Distillation
Distillation trains a smaller "student" model to mimic a larger "teacher" model's outputs, capturing most of the quality at a fraction of the inference cost.
Distillation produces a small, fast model that behaves like a large, slow one. The student trains on the teacher's outputs (logits or generated text) rather than on human-labelled data, so it picks up nuanced behaviour without a fresh annotation effort. As of 2026, distilled models serve much of consumer-facing inference: small-tier models such as GPT-4o-mini, Claude Haiku, Gemini Flash, and Llama 3.3 8B are reported or widely believed to be distilled from larger siblings, although vendors do not always publish their training recipes. The current frontier is reasoning distillation: teaching small models to produce chain-of-thought by training on reasoning traces from larger reasoning models, such as the o-series or Claude with extended thinking.
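When the teacher's logits are available, one common way to train the student is to minimise the KL divergence between temperature-softened teacher and student distributions. The sketch below is a minimal, dependency-free illustration of that loss for a single prediction. It is not a description of how any of the models named above were trained, and the function names, the temperature of 2.0, and the example logits are all assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    The T^2 factor is a common convention that keeps gradient magnitudes
    comparable when the temperature changes.
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student's softened prediction
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Hypothetical logits over a three-token vocabulary.
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]
far_student = [0.5, 1.0, 4.0]

print(distillation_loss(teacher, close_student))  # small: student agrees with teacher
print(distillation_loss(teacher, far_student))    # large: student disagrees
```

In a real training loop this term would be averaged over tokens and batches, and it is often mixed with an ordinary cross-entropy loss on ground-truth labels. When only the teacher's generated text is available, the student is instead fine-tuned on that text with standard next-token cross-entropy.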
When to use distillation
- Cost-sensitive inference at high volume.
- Edge / on-device deployment.
Common mistakes
- Distilling from a teacher that is itself wrong: the student inherits the teacher's errors.
- Skipping evals: distilled models can drift from the teacher on long-tail tasks even when headline metrics look fine.
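One way to catch long-tail drift is to measure teacher-student agreement per task slice rather than as a single aggregate. The sketch below assumes a hypothetical evaluation set in which every example carries a slice label plus the teacher's and student's final answers. The slice names, sample answers, and the 0.9 threshold are illustrative assumptions, not recommended values.

```python
from collections import defaultdict


def agreement_by_slice(examples):
    """Return the fraction of examples per slice where student matches teacher.

    `examples` is an iterable of (slice_name, teacher_answer, student_answer).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for slice_name, teacher_answer, student_answer in examples:
        totals[slice_name] += 1
        if teacher_answer == student_answer:
            hits[slice_name] += 1
    return {name: hits[name] / totals[name] for name in totals}


def flag_drift(rates, threshold=0.9):
    """List the slices whose agreement rate falls below the threshold."""
    return sorted(name for name, rate in rates.items() if rate < threshold)


# Hypothetical evaluation results: (slice, teacher answer, student answer).
examples = [
    ("common", "A", "A"),
    ("common", "B", "B"),
    ("common", "C", "C"),
    ("common", "D", "D"),
    ("long_tail", "A", "A"),
    ("long_tail", "B", "C"),
]

rates = agreement_by_slice(examples)
print(rates)              # {'common': 1.0, 'long_tail': 0.5}
print(flag_drift(rates))  # ['long_tail']
```

Agreement with the teacher is only a proxy for quality. Because the student inherits the teacher's errors, scoring both models against gold labels, where they exist, gives a more reliable picture.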
FAQ
What is distillation?
Distillation trains a smaller "student" model to mimic a larger "teacher" model's outputs, capturing most of the quality at a fraction of the inference cost.
When should I use distillation?
Use it for cost-sensitive inference at high volume and for edge or on-device deployment.
What are the most common mistakes with distillation?
Distilling from a teacher that is itself wrong, so the student inherits the teacher's errors, and skipping evals, since distilled models can drift on long-tail tasks.
Related terms
- Fine-tuning — Fine-tuning updates a pretrained model's weights on task-specific data, baking the new behaviour into the model rather than relying on prompts.
- Reasoning model — A reasoning model is an LLM trained to produce extensive internal chain-of-thought before its final answer, trading latency for higher accuracy on hard problems.
- Model router — A model router picks which language model handles each request based on cost, latency, or task type — the standard production pattern in 2026.
Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/distillation.md.