Scaling law (LLM)
A scaling law is an empirical relationship — typically a power law — between a language model's loss and inputs like parameter count, training compute, or training data size.
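As an illustration of what "typically a power law" means, one generic functional form is shown below. The symbols are assumptions introduced here and do not come from this entry: $X$ stands for the scaled input (parameters, tokens, or compute), $L_\infty$ for an irreducible loss floor, and $A$ and $\alpha$ for constants fitted to measured training runs.

```latex
L(X) = L_\infty + A \, X^{-\alpha}, \qquad A > 0,\ \alpha > 0
```

On a log-log plot, the reducible part $L(X) - L_\infty$ is a straight line with slope $-\alpha$, which is why these fits are usually presented that way.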
Kaplan et al. (2020) and the Chinchilla paper (Hoffmann et al., 2022) established that LLM loss scales predictably with parameters, data, and compute. The Chinchilla finding, that a given compute budget has an optimal balance of parameters and training tokens (roughly 20 tokens per parameter), reshaped how labs train models. As of 2026, scaling laws still drive frontier model design, and newer laws describe how reasoning capability scales with additional inference compute (test-time scaling). The practical implication: a bigger model alone is not the answer. What matters is model size, training tokens, and inference compute together.
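The roughly 20 tokens per parameter rule can be turned into a back-of-the-envelope sizing calculation. The sketch below is a minimal illustration, not the procedure used in either paper. It assumes the widely used approximation that training a dense transformer costs about 6 FLOPs per parameter per token (C ≈ 6·N·D), which this entry does not state. The function name and its parameters are hypothetical.

```python
import math


def compute_optimal_allocation(
    compute_flops: float,
    tokens_per_param: float = 20.0,       # rule of thumb from the text above
    flops_per_param_token: float = 6.0,   # ASSUMPTION: C ~= 6 * N * D
) -> tuple[float, float]:
    """Split a training compute budget into (parameters, tokens).

    Solves C = k * N * D with D = r * N, which gives N = sqrt(C / (k * r)).
    """
    if compute_flops <= 0:
        raise ValueError("compute_flops must be positive")
    n_params = math.sqrt(compute_flops / (flops_per_param_token * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    # Hypothetical budget chosen only to make the arithmetic easy to follow.
    params, tokens = compute_optimal_allocation(1.2e20)
    print(f"parameters: {params:.3e}, tokens: {tokens:.3e}")
```

Under these assumptions, a budget of 1.2e20 FLOPs works out to about 1e9 parameters and 2e10 tokens. Because both quantities grow with the square root of compute, multiplying the budget by 100 multiplies each by only 10.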
Common mistakes
- Comparing models on parameter count alone — token count and inference compute matter as much.
FAQ
What is a scaling law (LLM)?
A scaling law is an empirical relationship — typically a power law — between a language model's loss and inputs like parameter count, training compute, or training data size.
What are the most common mistakes with scaling laws (LLM)?
Comparing models on parameter count alone — token count and inference compute matter as much.
Related terms
- Mixture of Experts (MoE) — Mixture of Experts is an architecture where a router activates only a subset of the model's parameters per token, so total parameter count is huge but inference cost stays low.
- Fine-tuning — Fine-tuning updates a pretrained model's weights on task-specific data, baking the new behaviour into the model rather than relying on prompts.
- Reasoning model — A reasoning model is an LLM trained to produce extensive internal chain-of-thought before its final answer, trading latency for higher accuracy on hard problems.
Sources
Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/scaling-law.md.