parameter

Thinking budget

Thinking budget is the API parameter that caps how many reasoning tokens a model is allowed to spend before producing a final answer — Claude `thinking.budget_tokens`, OpenAI o-series `reasoning.effort`, Gemini thinking config. Lets developers trade cost / latency for quality.

Without a budget, reasoning models can spend wildly varying amounts of compute per query — sometimes 50K thinking tokens, sometimes 2K. Thinking budget caps this: set `budget_tokens: 8000` and the model stops thinking after 8K (returning whatever final answer it has). OpenAI exposes `reasoning.effort` (`low`, `medium`, `high`) as a coarse equivalent. Production patterns: low budget for cheap classification with reasoning fallback, medium for general chat, high for math / code / multi-step. Trade-off: too low + the model can't reach correct answers on hard queries; too high + cost balloons. Most production stacks set per-route budgets (chat = low, refactor = high) rather than one global value.

When to use thinking budget

Common mistakes

FAQ

What is thinking budget?

Thinking budget is the API parameter that caps how many reasoning tokens a model is allowed to spend before producing a final answer — Claude `thinking.budget_tokens`, OpenAI o-series `reasoning.effort`, Gemini thinking config. Lets developers trade cost / latency for quality.

When should I use thinking budget?

Production reasoning-model deployments.

What are the most common mistakes with thinking budget?

No budget set — cost surprise on hard queries. Too-aggressive budget — wrong answers on easy-for-reasoning tasks.

Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/thinking-budget.md.