Test-time compute
Test-time compute is an LLM technique: spend more inference compute per query (longer reasoning chains, multi-sample voting, deeper search) to get better answers. In 2026 it is the foundation of reasoning models such as OpenAI's o-series, Claude extended thinking, and the DeepSeek R-series.
Pre-2024 LLMs typically spent roughly the same compute on every query regardless of difficulty: fast, but with capped quality. Reasoning models flip this: more compute per query yields better answers on hard tasks. Common implementations:

- Extended thinking: the model emits a long internal reasoning trace before its final answer.
- Multi-sample + voting: run the query N times and take the majority answer (e.g., self-consistency).
- Tree search: explore N branches and pick the best one.
- Iterative refinement: draft → critique → revise.

The cost is 10-100× more inference spend per query and 10-100 s of latency instead of sub-second responses. That is worth it for hard tasks (math proofs, complex code, multi-step planning) and wasteful for simple lookups and classification. In production, route simple queries to non-reasoning models and escalate only the hard ones to the reasoning tier.
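A minimal sketch of the multi-sample + voting strategy (self-consistency), assuming a hypothetical `sample` callable that stands in for one sampled (temperature > 0) model call and returns only the extracted final answer. `make_noisy_sampler` is a toy simulator for illustration, not a real model client.

```python
import random
from collections import Counter
from typing import Callable

# Hypothetical sampler type: one sampled model call that returns only the
# final answer extracted from its reasoning chain.
Sampler = Callable[[str], str]


def self_consistency(prompt: str, sample: Sampler, n: int = 5) -> tuple[str, float]:
    """Run the same prompt n times; return the majority answer and its vote share."""
    if n < 1:
        raise ValueError("n must be at least 1")
    answers = [sample(prompt) for _ in range(n)]
    # most_common(1) returns [(answer, count)]; ties go to the answer seen first.
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes / n


def make_noisy_sampler(correct: str, wrong: list[str], p_correct: float, seed: int) -> Sampler:
    """Toy stand-in for a model that answers correctly with probability p_correct."""
    rng = random.Random(seed)

    def sample(_prompt: str) -> str:
        return correct if rng.random() < p_correct else rng.choice(wrong)

    return sample


if __name__ == "__main__":
    sampler = make_noisy_sampler("42", ["41", "43", "24"], p_correct=0.6, seed=0)
    answer, share = self_consistency("What is 6 * 7?", sampler, n=15)
    print(f"majority answer: {answer} (vote share {share:.2f})")
```

Voting only helps when answers can be compared exactly, so a real pipeline would usually normalize the extracted answer (strip whitespace, canonicalize numbers) before counting. Each extra sample multiplies inference cost, which is where the cost figures above come from.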
When to use test-time compute
- Hard math, complex code, multi-step planning.
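One way to apply this in practice is a router that escalates only hard-looking queries. The sketch below is illustrative: the tier names, the keyword pattern, and the length threshold are all hypothetical placeholders, and a production router would more likely use a trained classifier or a cheap model call to judge difficulty.

```python
import re
from dataclasses import dataclass

# Hypothetical tier names: substitute the models you actually deploy.
FAST_TIER = "fast-model"
REASONING_TIER = "reasoning-model"

# Placeholder difficulty signal: word stems hinting at math, code, or planning.
# \b anchors each stem at a word start, so "plan" matches "planning"
# but not "explanation".
HARD_PATTERN = re.compile(r"\b(prove|proof|derive|debug|refactor|plan|optimi[sz])", re.IGNORECASE)


@dataclass(frozen=True)
class Route:
    tier: str
    reason: str


def route(query: str, long_query_chars: int = 2000) -> Route:
    """Send simple queries to the fast tier; escalate hard-looking ones."""
    if HARD_PATTERN.search(query):
        return Route(REASONING_TIER, "matched a hard-task keyword")
    if len(query) > long_query_chars:
        return Route(REASONING_TIER, "long, likely multi-step query")
    return Route(FAST_TIER, "looks like a simple lookup or classification")


if __name__ == "__main__":
    for q in [
        "What is the capital of France?",
        "Prove that the square root of 2 is irrational.",
        "Classify this review as positive or negative: great phone!",
    ]:
        r = route(q)
        print(f"{r.tier:16} {r.reason:45} {q}")
```

Keyword matching is deliberately crude here; the point is the shape of the decision (cheap default, explicit escalation), not the heuristic itself.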
Common mistakes
- Sending every query to a reasoning model: an easy way to blow up costs 50×.
- Ignoring the latency budget: reasoning models can take 30-60 s to respond.
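Both mistakes can be caught with back-of-the-envelope arithmetic before launch. This sketch assumes the fast tier costs 1.0 per query and the reasoning tier costs a fixed multiple of that; the 20% hard-query share in the usage lines is an illustrative number, not measured data. With a 50× reasoning tier, routing only that 20% gives 0.8 × 1 + 0.2 × 50 = 10.8× the fast-tier cost instead of 50×.

```python
def blended_cost_multiplier(hard_fraction: float, reasoning_multiplier: float) -> float:
    """Average per-query cost, relative to the fast tier, when only hard queries escalate.

    Assumes the fast tier costs 1.0 per query and the reasoning tier costs
    reasoning_multiplier per query (e.g. 50 for a 50x cost blow-up).
    """
    if not 0.0 <= hard_fraction <= 1.0:
        raise ValueError("hard_fraction must be between 0 and 1")
    return (1.0 - hard_fraction) * 1.0 + hard_fraction * reasoning_multiplier


def fits_latency_budget(budget_s: float, worst_case_reasoning_s: float = 60.0) -> bool:
    """True if even a slow reasoning call (default: 60 s, the upper end above) fits the budget."""
    return worst_case_reasoning_s <= budget_s


if __name__ == "__main__":
    # Illustrative traffic mix, not measured data.
    print("route everything:", blended_cost_multiplier(1.0, 50.0))
    print("route 20% hard:  ", blended_cost_multiplier(0.2, 50.0))
    print("10 s budget ok?  ", fits_latency_budget(10.0))
```

Checking the worst case rather than the average latency is a conservative choice: a user-facing request with a 10 s budget should not be escalated at all, whatever the average reasoning latency turns out to be.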
FAQ
What is test-time compute?
Test-time compute is an LLM technique: spend more inference compute per query (longer reasoning chains, multi-sample voting, deeper search) to get better answers. In 2026 it is the foundation of reasoning models such as OpenAI's o-series, Claude extended thinking, and the DeepSeek R-series.
When should I use test-time compute?
Hard math, complex code, multi-step planning.
What are the most common mistakes with test-time compute?
Sending every query to a reasoning model, an easy way to blow up costs 50×. Ignoring the latency budget: reasoning models can take 30-60 s to respond.
Related terms
- Reasoning tokens: Reasoning tokens (or thinking tokens) are the internal chain-of-thought tokens a reasoning model produces before the user-visible answer. They are billed (typically as output tokens) but usually not shown in full to the end user.
- Extended thinking: Extended thinking is Anthropic's API option for Claude that allocates a configurable budget of internal reasoning tokens before the user-visible answer, enabling deeper reasoning on hard problems at a higher cost.
- Reasoning model: A reasoning model is an LLM trained to produce extensive internal chain-of-thought before its final answer, trading latency for higher accuracy on hard problems.
Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/test-time-compute.md.