# Preference dataset

**Source:** https://promtable.com/glossary/preference-dataset

> A preference dataset is the (prompt, chosen response, rejected response) collection used to fine-tune model alignment via RLHF / DPO / IPO — teaching the model what good responses look like by comparison, not just by example.

---

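Instruction tuning teaches a model to follow inputs; preference fine-tuning teaches it to prefer good outputs over bad ones. Each record is a `{prompt, chosen, rejected}` triple, where `chosen` and `rejected` are two responses to the same prompt that differ in quality.

Sources:

- Human raters: expensive but high-quality.
- LLM judges: synthetic and cheaper, but biased toward the judge model's own preferences.
- User feedback: thumbs up / down on production responses.

Methods:

- RLHF: full reinforcement learning, with a reward model trained on the preference pairs and a policy optimized against it with PPO. Powerful but complex.
- [DPO](https://promtable.com/glossary/dpo) (Direct Preference Optimization): optimizes the model directly on the preference pairs, with no separate reward model or RL loop. Simpler.
- IPO / KTO: variants of the same idea.

A common production pattern is to collect preference data from user feedback in production and fine-tune on it on a regular cadence, for example monthly. By 2026, DPO has largely replaced RLHF in fine-tuning pipelines because it is cheaper and simpler to run.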
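To make "learning by comparison" concrete, here is a minimal sketch of the standard DPO loss for a single triple. It assumes you already have the summed log-probability of each full response given the prompt, under both the model being trained (the policy) and a frozen reference model. The loss is `-log σ(β · [(log π(chosen) - log π_ref(chosen)) - (log π(rejected) - log π_ref(rejected))])`. The function names and the default `beta=0.1` are illustrative choices, not any particular library's API.

```python
import math


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)) that avoids overflow for large |x|.
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (prompt, chosen, rejected) triple.

    Each *_logp is the summed log-probability of a full response given the
    prompt, under the trainable policy or the frozen reference model.
    """
    # How much more (log-)likely the policy makes each response than the reference does.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # The loss falls as the policy favors `chosen` over `rejected` more than the reference.
    margin = beta * (chosen_ratio - rejected_ratio)
    return -log_sigmoid(margin)


# Policy shifted probability toward the chosen response relative to the reference.
print(round(dpo_loss(-5.0, -15.0, -10.0, -10.0), 4))  # 0.3133
```

When the policy and reference agree exactly, the margin is zero and the loss is `log 2`; gradient descent pushes the margin up, which is why every training example needs both a better and a worse response to the same prompt.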

## When to use

- Alignment and style fine-tuning.
- Improving production quality from user feedback.
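Turning user feedback into triples mostly means pairing a liked and a disliked response to the same prompt. The sketch below assumes a hypothetical feedback log where each event is a dict with `prompt`, `response`, and `rating` (+1 for thumbs up, -1 for thumbs down); the field names and the `max_pairs_per_prompt` cap are illustrative assumptions, not a standard format.

```python
from collections import defaultdict
from itertools import product


def feedback_to_triples(events, max_pairs_per_prompt=4):
    """Pair thumbs-up and thumbs-down responses to the same prompt.

    `events` is an iterable of dicts with hypothetical keys
    "prompt", "response", and "rating" (+1 thumbs up, -1 thumbs down).
    """
    up = defaultdict(list)
    down = defaultdict(list)
    for e in events:
        if e["rating"] > 0:
            bucket = up
        elif e["rating"] < 0:
            bucket = down
        else:
            continue  # neutral or missing rating: no preference signal
        # Deduplicate repeated responses for the same prompt.
        if e["response"] not in bucket[e["prompt"]]:
            bucket[e["prompt"]].append(e["response"])

    triples = []
    for prompt in up:
        # Only same-prompt pairs are valid; skip responses rated both ways.
        pairs = [(c, r) for c, r in product(up[prompt], down.get(prompt, [])) if c != r]
        # Cap pairs so a few heavily rated prompts don't dominate the dataset.
        for chosen, rejected in pairs[:max_pairs_per_prompt]:
            triples.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return triples
```

Prompts that only ever received positive (or only negative) ratings produce no triples, which is one reason production feedback often yields fewer usable pairs than the raw event count suggests.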

## Common mistakes

- Using LLM-judge preferences for safety-critical fine-tunes: the judge model's biases propagate into the fine-tuned model.
- Training on a small preference dataset (fewer than ~1K pairs): typically not enough for DPO to shift behavior.
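A quick pre-training check catches several of these problems before any compute is spent. This is a minimal sketch assuming rows are dicts with `prompt`, `chosen`, and `rejected` strings; the function name is hypothetical, and the `min_size=1000` default simply encodes the rough 1K rule of thumb above rather than a hard threshold.

```python
def validate_preference_dataset(rows, min_size=1000):
    """Return a list of human-readable problems found in `rows`.

    Each row should be a dict with non-empty "prompt", "chosen", and
    "rejected" strings. `min_size` is a rule of thumb, not a hard limit.
    """
    problems = []
    required = ("prompt", "chosen", "rejected")
    for i, row in enumerate(rows):
        # Schema check: every field present, a string, and not just whitespace.
        missing = [k for k in required
                   if not isinstance(row.get(k), str) or not row[k].strip()]
        if missing:
            problems.append(f"row {i}: missing or empty {', '.join(missing)}")
        elif row["chosen"].strip() == row["rejected"].strip():
            # Identical responses carry no preference signal.
            problems.append(f"row {i}: chosen and rejected are identical")
    if len(rows) < min_size:
        problems.append(f"only {len(rows)} rows; under {min_size} may be too few for DPO")
    return problems
```

An empty return list means the dataset passed these checks; it says nothing about label quality, which still depends on where the preferences came from.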

## Related terms

- [dpo](https://promtable.com/glossary/dpo)
- [training-dataset](https://promtable.com/glossary/training-dataset)
- [constitutional-ai](https://promtable.com/glossary/constitutional-ai)

*Last updated: 2026-06-01*
---

Maintained by Promtable (https://promtable.com). Content: CC BY 4.0. Cite as "Promtable — https://promtable.com/glossary/preference-dataset".
Contact: info@vibecodingturkey.com.