Top-p (nucleus sampling)
Top-p (nucleus sampling) restricts the model to the smallest set of tokens whose cumulative probability is at least p, then samples from that set.
Top-p, introduced by Holtzman et al. (2019), narrows the next-token candidates dynamically, based on the shape of the probability distribution rather than on a fixed count. At top-p=0.9 the model considers only the highest-probability tokens that together account for at least 90% of the probability mass, and samples from those after renormalizing. It is often a better diversity knob than temperature because it adapts: confident contexts stay confident (few candidates), while uncertain contexts get more variety. Most teams tune either top-p or temperature and leave the other at a fixed value, rather than adjusting both. Common production settings: top-p=1.0 (no nucleus filtering) with temperature 0–0.3 for factual tasks; top-p=0.9 with temperature 0.7 for creative writing.
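The selection step described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not any particular provider's implementation; the function names `top_p_filter` and `sample_top_p` and the two toy distributions are hypothetical and chosen only to show the adaptive behavior.

```python
import random

def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability is >= p.

    `probs` maps token -> probability. Returns the kept tokens with their
    probabilities renormalized to sum to 1.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    # Rank tokens from most to least likely.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, cumulative = [], 0.0
    for token, prob in ranked:
        nucleus.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in nucleus)
    return {token: prob / total for token, prob in nucleus}

def sample_top_p(probs, p, rng=random):
    """Sample one token from the renormalized nucleus."""
    nucleus = top_p_filter(probs, p)
    return rng.choices(list(nucleus), weights=list(nucleus.values()), k=1)[0]

# Toy distributions (made-up numbers for illustration).
confident = {"Paris": 0.92, "Lyon": 0.04, "Nice": 0.03, "Rome": 0.01}
uncertain = {"red": 0.28, "blue": 0.24, "green": 0.20, "gold": 0.16, "pink": 0.12}

print(len(top_p_filter(confident, 0.9)))  # 1 candidate: the top token already covers 0.92
print(len(top_p_filter(uncertain, 0.9)))  # 5 candidates: the first four only reach 0.88
print(sample_top_p(uncertain, 0.9))
```

The same p=0.9 keeps one candidate for the peaked distribution and five for the flat one, which is the adaptive behavior that a fixed candidate count cannot provide.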
When to use top-p (nucleus sampling)
- Open-ended generation where you want adaptive diversity.
- As an alternative to temperature when you want bounded randomness.
Common mistakes
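- Setting both temperature and top-p aggressively low: the candidate set collapses toward a single token, so the output becomes degenerate (near-greedy and often repetitive).
- Using top-p < 0.5: whenever the top token alone holds more than half the probability mass, it is the only candidate left, which usually produces robotic text.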
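To see why these settings interact badly, the sketch below counts how many candidates survive for a few temperature and top-p pairs. It is an illustration under stated assumptions: the logits are made-up values, and `softmax` and `nucleus_size` are hypothetical helper names, not part of any library API.

```python
import math

def softmax(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nucleus_size(probs, p):
    """Number of tokens in the smallest set with cumulative probability >= p."""
    cumulative = 0.0
    for count, prob in enumerate(sorted(probs, reverse=True), start=1):
        cumulative += prob
        if cumulative >= p:
            return count
    return len(probs)

# Hypothetical logits for five candidate tokens.
logits = [2.0, 1.5, 1.0, 0.5, 0.0]

for temperature, top_p in [(1.0, 0.9), (0.7, 0.9), (0.2, 0.9), (0.2, 0.4)]:
    size = nucleus_size(softmax(logits, temperature), top_p)
    print(f"temperature={temperature}, top_p={top_p}: {size} candidate(s)")
```

With these toy logits, lowering the temperature alone already shrinks the nucleus, and combining a low temperature with a low top-p leaves a single candidate, which amounts to greedy decoding.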
FAQ
What is top-p (nucleus sampling)?
Top-p (nucleus sampling) restricts the model to the smallest set of tokens whose cumulative probability is at least p, then samples from that set.
When should I use top-p (nucleus sampling)?
Use it for open-ended generation where you want adaptive diversity, or as an alternative to temperature when you want bounded randomness.
What are the most common mistakes with top-p (nucleus sampling)?
The two most common mistakes are setting both temperature and top-p aggressively low, which makes the output degenerate, and using top-p < 0.5, which usually produces robotic text.
Related terms
- Temperature — Temperature is a sampling parameter that controls randomness in a language model's output, where 0 is fully deterministic and higher values introduce more variety.
Sources
Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/top-p.md.