Top-p, or nucleus sampling, restricts the model to the smallest set of next tokens whose cumulative probability reaches at least p. At top-p = 0.9, the model samples only from the most likely tokens that together account for at least 90% of the probability mass, cutting off the long tail of unlikely choices.
In practice you adjust either temperature or top-p, not both at once. Top-p is often considered more stable because it adapts to how peaked the distribution is: when the model is confident, only a few tokens clear the cutoff; when it is uncertain, more survive. A common production setting is top-p = 0.9 with temperature = 0.7.
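To make the mechanics concrete, here is a minimal sketch of the top-p filter over a next-token probability vector. The function name, the example distributions, and the use of NumPy are illustrative assumptions, not any particular library's API:

```python
import numpy as np

def top_p_filter(probs, p=0.9):
    """Keep the smallest set of tokens whose cumulative probability
    reaches p, zero out the rest, and renormalize (illustrative sketch)."""
    order = np.argsort(probs)[::-1]            # token indices, most likely first
    cumulative = np.cumsum(probs[order])       # running probability mass
    # First position where the running mass reaches p; keep everything
    # up to and including that token.
    cutoff = np.searchsorted(cumulative, p) + 1
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()           # renormalize to a distribution

# A peaked distribution: only 2 of 5 tokens survive the 0.9 cutoff.
peaky = np.array([0.85, 0.10, 0.03, 0.01, 0.01])
print(np.count_nonzero(top_p_filter(peaky, 0.9)))

# A flat distribution: the same p keeps all 5 tokens.
flat = np.array([0.25, 0.22, 0.20, 0.18, 0.15])
print(np.count_nonzero(top_p_filter(flat, 0.9)))
```

The two example distributions show the adaptivity mentioned above: the same p = 0.9 keeps 2 tokens when the model is confident and all 5 when it is not, which is why top-p is usually preferred over tuning temperature alone.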
Bring this to your business
Knowing the term is one thing. Shipping it is another.
We do two-week AI Sprints — one term, one workflow, into production by Day 10.