# Instruction tuning

**Source:** https://promtable.com/glossary/instruction-tuning

> Instruction tuning is the post-training stage where a base language model is fine-tuned on examples of (instruction, ideal response) pairs to follow human instructions reliably.

---

Base LLMs (raw next-token predictors) do not follow instructions well; they simply continue the input text in a matching style. Instruction tuning (popularised by FLAN in 2021 and InstructGPT in 2022) reshapes the model to act on imperative inputs like "Translate this to French". The instruction-tuning step itself is supervised fine-tuning (SFT) on (instruction, ideal response) pairs; RLHF (reinforcement learning from human feedback) and DPO (direct preference optimisation) are the dominant preference-alignment techniques and are typically applied after it. As of 2026 the practice has matured: most open-weight model releases ship a base + instruction-tuned pair, and serious fine-tuning teams pick up where instruction tuning left off and add domain-specific or persona-specific alignment.
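To make the (instruction, ideal response) pairing concrete, here is a minimal sketch of how one pair might be turned into a supervised training example. The prompt template, the whitespace tokenizer, and the names `Example`, `toy_tokenize`, and `build_sft_example` are illustrative assumptions, not any particular library's API. A real pipeline would use the model's own tokenizer and chat template.

```python
from dataclasses import dataclass


@dataclass
class Example:
    instruction: str
    response: str


def toy_tokenize(text: str) -> list[str]:
    """Whitespace tokenizer standing in for a real subword tokenizer (assumption)."""
    return text.split()


def build_sft_example(ex: Example) -> tuple[list[str], list[bool]]:
    """Return (tokens, loss_mask) for one (instruction, response) pair.

    loss_mask[i] is True only for response tokens, so a trainer that honours
    the mask learns to produce the response rather than to repeat the prompt.
    """
    # Hypothetical template; real models each define their own format.
    prompt = f"### Instruction:\n{ex.instruction}\n### Response:\n"
    prompt_tokens = toy_tokenize(prompt)
    # An end-of-sequence marker teaches the model where the answer stops.
    response_tokens = toy_tokenize(ex.response) + ["<eos>"]

    tokens = prompt_tokens + response_tokens
    loss_mask = [False] * len(prompt_tokens) + [True] * len(response_tokens)
    return tokens, loss_mask


tokens, loss_mask = build_sft_example(
    Example(instruction="Translate this to French: Hello", response="Bonjour")
)
for token, trained in zip(tokens, loss_mask):
    print(f"{token!r:16} {'train' if trained else 'skip'}")
```

Masking the prompt is a common design choice rather than a requirement; some setups also compute the loss on the instruction tokens.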

## Common mistakes

- Trying to get a base model to perform new tasks through instruction prompts alone: it will not follow them consistently.
- Re-fine-tuning an already instruction-tuned model on raw text: this degrades instruction following.
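The first mistake is easier to see when the two prompt shapes sit side by side. The sketch below only builds strings and calls no model. The function names and the "English:/French:" layout are assumptions chosen for illustration. A base model tends to treat the imperative form as text to continue, whereas the few-shot form gives it a pattern whose most natural continuation is the answer.

```python
def instruction_prompt(query: str) -> str:
    """Imperative prompt: suited to an instruction-tuned model."""
    return f"Translate this to French: {query}"


def few_shot_prompt(demos: list[tuple[str, str]], query: str) -> str:
    """Continuation-style prompt: worked examples followed by an open slot.

    Base models are often steered more reliably this way, though it is a
    workaround, not a substitute for instruction tuning.
    """
    blocks = [f"English: {src}\nFrench: {tgt}\n" for src, tgt in demos]
    # End on the open slot so the next tokens the model predicts are the answer.
    blocks.append(f"English: {query}\nFrench:")
    return "\n".join(blocks)


demos = [("Hello", "Bonjour"), ("Good night", "Bonne nuit")]
print(instruction_prompt("Thank you"))
print("---")
print(few_shot_prompt(demos, "Thank you"))
```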

## Related terms

- [fine-tuning](https://promtable.com/glossary/fine-tuning)
- [system-prompt](https://promtable.com/glossary/system-prompt)
- [evals](https://promtable.com/glossary/evals)
- [model-card](https://promtable.com/glossary/model-card)

*Last updated: 2026-06-01*

---

Maintained by Promtable (https://promtable.com). Content: CC BY 4.0. Cite as "Promtable — https://promtable.com/glossary/instruction-tuning".
Contact: info@vibecodingturkey.com.