Preference dataset

A preference dataset is a collection of (prompt, chosen response, rejected response) triples used to align a model through RLHF, DPO, or IPO. It teaches the model what good responses look like by comparison, not just by example.
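As a minimal sketch of what one such triple might look like on disk, assuming the records are stored as JSON Lines (the `PreferencePair` class and the helper names are illustrative, not a standard API):

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PreferencePair:
    """One preference record: two responses to the same prompt."""
    prompt: str
    chosen: str    # the response judged better
    rejected: str  # the response judged worse


def to_jsonl(pairs):
    """Serialize pairs as one JSON object per line."""
    return "\n".join(json.dumps(asdict(p), ensure_ascii=False) for p in pairs)


def from_jsonl(text):
    """Parse JSON Lines back into PreferencePair objects, skipping blank lines."""
    return [PreferencePair(**json.loads(line))
            for line in text.splitlines() if line.strip()]


example = PreferencePair(
    prompt="Explain what a mutex is in one sentence.",
    chosen="A mutex is a lock that lets only one thread at a time enter a critical section.",
    rejected="A mutex is a thing used in programming.",
)
print(to_jsonl([example]))
```

Both responses answer the same prompt; only their quality differs, which is the signal the fine-tune learns from.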

Instruction tuning teaches the model to follow instructions; preference fine-tuning teaches it to prefer good outputs over bad ones.

Format: `{prompt, chosen, rejected}` triples, where chosen and rejected are two responses to the same prompt that differ in quality.

Sources: human raters (expensive but high-quality), LLM judges (synthetic and cheaper, but biased toward the judge model's own preferences), and user feedback (thumbs up / down on production responses).

Methods: RLHF (full reinforcement learning with a reward model and PPO; complex), [[DPO]] (direct preference optimization, which trains on the pairs directly without a separate reward model; simpler), and the variants IPO and KTO.

Production pattern: collect preference data from production user feedback and fine-tune on a regular cadence, for example monthly. By 2026 DPO has largely replaced RLHF in fine-tune pipelines for cost and complexity reasons.
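To make the DPO objective concrete, here is a rough sketch of the per-pair loss in plain Python. It assumes you already have the summed token log-probabilities of each response under the policy being trained and under a frozen reference model; the function names and the default `beta=0.1` are illustrative choices, not values from any particular library.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x)), computed as -softplus(-x)."""
    return -(max(-x, 0.0) + math.log1p(math.exp(-abs(x))))


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for a single (prompt, chosen, rejected) triple.

    Each argument is log p(response | prompt) under the named model.
    beta scales how strongly the policy is pushed away from the reference
    (0.1 is an assumed, illustrative value).
    """
    # How much more the policy likes each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # The loss falls as the policy favors chosen over rejected more than the reference.
    return -log_sigmoid(beta * (chosen_logratio - rejected_logratio))


# When the policy is identical to the reference, the loss is log(2).
print(dpo_loss(-5.0, -5.0, -5.0, -5.0))
```

In a real pipeline the loss would be averaged over a batch and computed on tensors so gradients can flow into the policy; this scalar version only shows the shape of the objective.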

When to use a preference dataset

Common mistakes

FAQ

What is a preference dataset?

A preference dataset is a collection of (prompt, chosen response, rejected response) triples used to align a model through RLHF, DPO, or IPO. It teaches the model what good responses look like by comparison, not just by example.

When should I use a preference dataset?

Use one for alignment and style fine-tuning, and for improving production quality from user feedback.
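One possible way to turn thumbs up / down feedback into preference pairs is sketched below. It assumes each feedback event is a dict with hypothetical `prompt`, `response`, and `rating` keys, and it only produces a pair when the same prompt has at least one upvoted and one downvoted response.

```python
from collections import defaultdict
from itertools import product


def pairs_from_feedback(events):
    """Build {prompt, chosen, rejected} dicts from thumbs up / down events.

    events: iterable of dicts with keys "prompt", "response", and
    "rating" ("up" or "down"). These field names are assumptions.
    """
    by_prompt = defaultdict(lambda: {"up": [], "down": []})
    for event in events:
        if event["rating"] in ("up", "down"):
            by_prompt[event["prompt"]][event["rating"]].append(event["response"])

    pairs = []
    for prompt, votes in by_prompt.items():
        # Pair every upvoted response with every downvoted one for this prompt.
        for chosen, rejected in product(votes["up"], votes["down"]):
            if chosen != rejected:  # identical text carries no preference signal
                pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs
```

Prompts that only ever received one kind of rating yield no pairs under this scheme, so real feedback logs usually shrink considerably when converted.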

What are the most common mistakes with preference datasets?

Using LLM-judge preferences for safety-critical fine-tunes: the judge's biases propagate into the tuned model. Training on a small preference dataset (fewer than about 1K pairs), which is typically not enough for DPO to shift behavior.
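These checks can be automated before training. The sketch below is one way to do it, assuming each pair is a dict with an optional, hypothetical `source` field (for example `"human"` or `"llm_judge"`); the 1,000-pair threshold mirrors the rule of thumb above and is not a hard limit.

```python
MIN_PAIRS = 1000  # rule-of-thumb floor from the text, not a hard limit


def audit_preference_dataset(pairs, safety_critical=False, min_pairs=MIN_PAIRS):
    """Return a list of warning strings for common preference-data mistakes.

    pairs: list of dicts with "prompt", "chosen", "rejected" and an
    optional "source" key (an assumed field name).
    """
    warnings = []

    if len(pairs) < min_pairs:
        warnings.append(
            f"only {len(pairs)} pairs; fewer than {min_pairs} is typically "
            "not enough for DPO to shift behavior"
        )

    judged = sum(1 for p in pairs if p.get("source") == "llm_judge")
    if safety_critical and judged:
        warnings.append(
            f"{judged} LLM-judged pairs in a safety-critical fine-tune; "
            "judge biases may propagate"
        )

    degenerate = sum(1 for p in pairs if p["chosen"] == p["rejected"])
    if degenerate:
        warnings.append(f"{degenerate} pairs have identical chosen and rejected text")

    return warnings
```

An empty list means none of these particular checks fired; it does not say anything about label quality.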

Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/preference-dataset.md.