
Top-p (nucleus sampling)

Top-p (nucleus sampling) restricts the model to the smallest set of tokens whose cumulative probability is at least p, then samples from that set.

Top-p, introduced by Holtzman et al. (2019), narrows the next-token candidates dynamically, based on the shape of the distribution rather than on a fixed count (as top-k does). At top-p=0.9 the sampler keeps only the most probable tokens that together account for at least 90% of the probability mass, renormalizes their probabilities, and samples from that set. It is often a better diversity knob than temperature because it adapts: confident contexts stay confident (few candidates), while uncertain contexts get more variety. Most teams tune top-p or temperature, not both, and leave the other at a fixed value. Common production settings: top-p=1.0 with temperature 0–0.3 for factual tasks; top-p=0.9 with temperature 0.7 for creative writing.
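To make the adaptive behavior concrete, here is a minimal sketch of nucleus filtering in plain Python. The function names (`top_p_filter`, `sample_top_p`) and the two toy distributions are illustrative assumptions, not any particular library's API; real inference stacks apply the same idea to logits over the full vocabulary.

```python
import random


def top_p_filter(probs, p):
    """Return the nucleus of `probs` (a token -> probability dict).

    The nucleus is the smallest set of highest-probability tokens whose
    cumulative probability is at least `p`, renormalized to sum to 1.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    # Rank tokens from most to least probable.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus = []
    cumulative = 0.0
    for token, prob in ranked:
        nucleus.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break  # the smallest qualifying set has been reached
    total = sum(prob for _, prob in nucleus)
    return {token: prob / total for token, prob in nucleus}


def sample_top_p(probs, p, rng=random):
    """Draw one token from the renormalized nucleus."""
    nucleus = top_p_filter(probs, p)
    return rng.choices(list(nucleus), weights=list(nucleus.values()), k=1)[0]


# Hypothetical next-token distributions, made up for illustration.
confident = {"Paris": 0.92, "Lyon": 0.05, "Nice": 0.02, "Rome": 0.01}
uncertain = {"red": 0.30, "blue": 0.25, "green": 0.20, "gold": 0.17, "gray": 0.08}

print(list(top_p_filter(confident, 0.9)))  # ['Paris']
print(list(top_p_filter(uncertain, 0.9)))  # ['red', 'blue', 'green', 'gold']
print(sample_top_p(uncertain, 0.9))        # one of the four tokens above
```

With the same top-p=0.9, the confident distribution yields a single candidate while the flatter one keeps four, which is the adaptivity described above.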

When to use top-p (nucleus sampling)

Common mistakes

FAQ

What is top-p (nucleus sampling)?

Top-p (nucleus sampling) restricts the model to the smallest set of tokens whose cumulative probability is at least p, then samples from that set.

When should I use top-p (nucleus sampling)?

Use it for open-ended generation where you want adaptive diversity, or as an alternative to temperature when you want bounded randomness.

What are the most common mistakes with top-p (nucleus sampling)?

Setting both temperature and top-p aggressively low, which makes the output degenerate. Using top-p < 0.5, which usually produces robotic text.
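One way to see why stacking a low temperature on a low top-p is risky is to count how many candidates survive each combination. The sketch below is an illustration under assumed values: the five logits are made up, and `softmax` and `nucleus_size` are hypothetical helper names rather than a library API.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after temperature scaling."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus_size(probs, p):
    """Count the tokens in the smallest set with cumulative probability >= p."""
    cumulative = 0.0
    for count, prob in enumerate(sorted(probs, reverse=True), start=1):
        cumulative += prob
        if cumulative >= p:
            return count
    return len(probs)


# Hypothetical logits for five candidate tokens.
logits = [2.0, 1.5, 1.0, 0.5, 0.0]

for temperature in (1.0, 0.3):
    for p in (0.9, 0.5):
        size = nucleus_size(softmax(logits, temperature), p)
        print(f"temperature={temperature} top_p={p} -> {size} candidate(s)")
```

For these logits, temperature 1.0 with top-p 0.9 keeps four candidates, while temperature 0.3 with top-p 0.5 keeps only one, so sampling behaves like greedy decoding and every run picks the same token.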

Sources

Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/top-p.md.