# Thinking budget

**Source:** https://promtable.com/glossary/thinking-budget

> A thinking budget is the API parameter that limits how many reasoning tokens a model may spend before producing a final answer: Claude's `thinking.budget_tokens`, the OpenAI o-series `reasoning.effort`, and Gemini's thinking config. It lets developers trade cost and latency for quality.

---
Without a budget, reasoning models can spend widely varying amounts of compute per query: sometimes 2K thinking tokens, sometimes 50K. A thinking budget bounds this. With `budget_tokens: 8000`, reasoning is limited to roughly 8K tokens, after which the model gives its final answer. OpenAI exposes `reasoning.effort` (`low`, `medium`, `high`) as a coarse equivalent that selects an effort level rather than a token count.

Common production patterns: a low budget for cheap classification, falling back to more reasoning when needed; a medium budget for general chat; and a high budget for math, code, and multi-step tasks. The trade-off runs both ways: set too low, the model cannot reach correct answers on hard queries; set too high, cost and latency balloon. Most production stacks set per-route budgets (for example classification = low, refactor = high) rather than one global value.
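
The per-route idea can be sketched as a small lookup table that is turned into request parameters. This is a minimal illustration, not a reference implementation: the route names, the token values, and the `thinking_params` helper are hypothetical, and the payload shapes assume the Claude-style `thinking.budget_tokens` and OpenAI-style `reasoning.effort` fields named above. Check the provider documentation for the exact request format before relying on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteBudget:
    budget_tokens: int  # token-style budget (Claude-style)
    effort: str         # coarse level (OpenAI-style): "low", "medium", "high"


# Hypothetical routes and values, chosen only for illustration.
ROUTE_BUDGETS = {
    "classify": RouteBudget(budget_tokens=1024, effort="low"),
    "chat": RouteBudget(budget_tokens=4000, effort="medium"),
    "refactor": RouteBudget(budget_tokens=16000, effort="high"),
}
DEFAULT_ROUTE = "chat"


def thinking_params(route: str, provider: str) -> dict:
    """Return the reasoning-related request fields for a route.

    Unknown routes fall back to DEFAULT_ROUTE so that no request
    goes out without an explicit budget.
    """
    budget = ROUTE_BUDGETS.get(route, ROUTE_BUDGETS[DEFAULT_ROUTE])
    if provider == "anthropic":
        # Assumed payload shape for Claude-style extended thinking.
        return {"thinking": {"type": "enabled", "budget_tokens": budget.budget_tokens}}
    if provider == "openai":
        # Assumed payload shape for OpenAI-style reasoning effort.
        return {"reasoning": {"effort": budget.effort}}
    raise ValueError(f"unknown provider: {provider}")
```

The returned dictionary is meant to be merged into the rest of the request (model, messages, output token limit). Keeping the table in one place makes it easy to review and tune budgets per route as cost and quality data come in.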

## When to use

- Production reasoning-model deployments.

## Common mistakes

- No budget set: cost surprises on hard queries.
- Overly aggressive (too low) budget: wrong answers on tasks that would be easy with enough reasoning.
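
Both mistakes can be guarded against before a request is sent. The sketch below is one possible guard, under stated assumptions: `resolve_budget`, `MIN_BUDGET`, and the default value are hypothetical names and numbers, and the rules it enforces (a minimum budget, and a budget smaller than the overall output token limit) should be checked against your provider's documented constraints.

```python
from typing import Optional

# Assumed floor for a token-style budget; verify against provider docs.
MIN_BUDGET = 1024


def resolve_budget(requested: Optional[int], max_tokens: int, default: int = 4000) -> int:
    """Return a budget that is always set, at least MIN_BUDGET, and below max_tokens."""
    if max_tokens <= MIN_BUDGET:
        raise ValueError("max_tokens must exceed the minimum thinking budget")
    # Mistake 1: no budget set. Fall back to an explicit default.
    budget = default if requested is None else requested
    # Mistake 2: too-aggressive budget. Raise it to the floor.
    budget = max(budget, MIN_BUDGET)
    # Leave at least one token of room for the final answer.
    return min(budget, max_tokens - 1)
```

A floor only prevents obviously unusable values. Whether a given budget is enough for a route still has to be measured on representative queries.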

## Related terms

- [reasoning-tokens](https://promtable.com/glossary/reasoning-tokens)
- [test-time-compute](https://promtable.com/glossary/test-time-compute)
- [extended-thinking](https://promtable.com/glossary/extended-thinking)

*Last updated: 2026-06-01*

---

Original page: https://promtable.com/glossary/thinking-budget
Maintained by Promtable (https://promtable.com). Content: CC BY 4.0. Cite as "Promtable — https://promtable.com/glossary/thinking-budget".
Contact: info@vibecodingturkey.com.