Reasoning tokens
Reasoning tokens (or thinking tokens) are the internal chain-of-thought tokens reasoning models produce before the user-visible answer — billed separately and not shown to the end user.
Reasoning models like OpenAI o-series, Claude with extended thinking, and Gemini 2 Thinking generate thousands of internal reasoning tokens for hard problems before emitting the final answer. APIs surface this as a separate token count (and price) and let developers cap it (budget_tokens for Anthropic, max_completion_tokens minus visible output for OpenAI). Higher reasoning budgets improve quality on hard math, code, and planning at the cost of latency and price. For trivial tasks they add latency without improving anything. Best practice in 2026 is to route by task: standard model for chat and extraction, reasoning model with explicit budget for hard steps.
When to use reasoning tokens
- Hard math, code, planning, multi-step reasoning.
- Agent step decisions where one wrong step cascades.
Common mistakes
- Running every query through reasoning tokens — costs and latency blow up.
- Capping the budget too low on hard problems — quality drops sharply at the edge.
FAQ
What is reasoning tokens?
Reasoning tokens (or thinking tokens) are the internal chain-of-thought tokens reasoning models produce before the user-visible answer — billed separately and not shown to the end user.
When should I use reasoning tokens?
Hard math, code, planning, multi-step reasoning. Agent step decisions where one wrong step cascades.
What are the most common mistakes with reasoning tokens?
Running every query through reasoning tokens — costs and latency blow up. Capping the budget too low on hard problems — quality drops sharply at the edge.
Related terms
- Reasoning model — A reasoning model is an LLM trained to produce extensive internal chain-of-thought before its final answer, trading latency for higher accuracy on hard problems.
- Chain-of-thought prompting — Chain-of-thought (CoT) prompting tells a language model to write its reasoning steps before its final answer, increasing accuracy on multi-step problems.
- Model router — A model router picks which language model handles each request based on cost, latency, or task type — the standard production pattern in 2026.
Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/reasoning-tokens.md.