Training dataset
A training dataset is the JSONL / Parquet collection of (input, output) pairs used to fine-tune a model — for instruction tuning, RLHF / DPO preference data, vision examples, or domain-specific patterns. Quality + scale + diversity matter more than raw size.
Fine-tune outcomes are dominated by the dataset: garbage in, garbage out. Common shapes: instruction tuning ('User: ... Assistant: ...'), preference pairs (`{prompt, chosen, rejected}` for DPO), function-call examples (tool use), vision (image + caption / answer), code (problem + solution + tests). Quality drivers: (1) consistency (same style throughout), (2) diversity (cover the input distribution), (3) hardness (mix easy + hard cases), (4) noise (typos / mistakes confuse the model). Size guidelines for 2026: instruction tuning — 1K-10K high-quality examples usually beats 100K mediocre; LoRA can succeed on 100-1K examples for narrow tasks; full fine-tune needs more. Provenance + license matter for production (no scraped training data with unclear rights).
When to use training dataset
- Fine-tuning, RLHF / DPO, instruction tuning.
Common mistakes
- Quantity over quality — 100K noisy examples lose to 5K curated.
- Mismatched style — training data style ≠ desired output style → model picks up wrong style.
FAQ
What is training dataset?
A training dataset is the JSONL / Parquet collection of (input, output) pairs used to fine-tune a model — for instruction tuning, RLHF / DPO preference data, vision examples, or domain-specific patterns. Quality + scale + diversity matter more than raw size.
When should I use training dataset?
Fine-tuning, RLHF / DPO, instruction tuning.
What are the most common mistakes with training dataset?
Quantity over quality — 100K noisy examples lose to 5K curated. Mismatched style — training data style ≠ desired output style → model picks up wrong style.
Related terms
- Instruction tuning — Instruction tuning is the post-training stage where a base language model is fine-tuned on examples of (instruction, ideal response) pairs to follow human instructions reliably.
- Synthetic data — Synthetic data is training or evaluation data generated by a model rather than collected from humans — increasingly used to fine-tune smaller models and to fill gaps in real datasets.
- Direct preference optimisation (DPO) — Direct preference optimisation is a fine-tuning method that aligns a model to human preferences directly from preference pairs — without training an explicit reward model first.
Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/training-dataset.md.