Test-time compute

Test-time compute is the practice of spending more inference compute per query (longer reasoning chains, multi-sample voting, deeper search) to get better answers. As of 2026 it is the foundation of reasoning models such as the OpenAI o-series, Claude with extended thinking, and the DeepSeek R-series.

Pre-2024 LLMs spent roughly the same compute on every query regardless of difficulty: fast, but with a ceiling on quality. Reasoning models flip this, spending more compute per query to get better answers on hard tasks.

There are four common implementations: extended thinking (the model emits long internal reasoning before its final answer), multi-sample voting (run the query N times and take the majority answer, as in self-consistency), tree search (explore N branches and pick the best), and iterative refinement (draft, critique, revise).

The trade-off is cost and latency: roughly 10-100× more inference cost per query, and latency of 10-100 s instead of sub-second. That is worth it for hard tasks (math proofs, complex code, multi-step planning) and wasteful for simple lookups and classification. In production, route simple queries to non-reasoning models and escalate only the hard ones to the reasoning tier.
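To make multi-sample voting concrete, here is a minimal sketch of self-consistency in Python. It assumes a hypothetical `sample_answer` callable that stands in for one stochastic model call; `make_noisy_model` is an invented toy sampler, not a real API, and the 70% accuracy figure is made up for illustration.

```python
import random
from collections import Counter
from typing import Callable, Tuple


def self_consistency(
    sample_answer: Callable[[str], str], prompt: str, n: int = 5
) -> Tuple[str, float]:
    """Sample n answers and return the majority answer with its vote share."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Each call is one independent sample; more samples means more compute.
    votes = Counter(sample_answer(prompt) for _ in range(n))
    answer, count = votes.most_common(1)[0]
    return answer, count / n


def make_noisy_model(seed: int = 0) -> Callable[[str], str]:
    """Hypothetical stand-in for an LLM: answers "42" about 70% of the time."""
    rng = random.Random(seed)

    def sample(prompt: str) -> str:
        return "42" if rng.random() < 0.7 else rng.choice(["41", "43"])

    return sample


if __name__ == "__main__":
    model = make_noisy_model()
    print(self_consistency(model, "What is 6 * 7?", n=15))
```

Cost grows linearly with `n`, since every vote is a full model call. The vote share can also serve as a rough confidence signal: a narrow majority suggests the query may deserve more samples or a stronger tier.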

When to use test-time compute

Common mistakes

FAQ

What is test-time compute?

It is the practice of spending more inference compute per query to get better answers, for example through longer reasoning chains, multi-sample voting, or deeper search. Reasoning models (OpenAI o-series, Claude extended thinking, DeepSeek R-series) are built on it.

When should I use test-time compute?

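Use it for hard tasks: math proofs, complex code, and multi-step planning. It is wasteful for simple lookups and classification, which a non-reasoning model handles faster and more cheaply.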

What are the most common mistakes with test-time compute?

Sending everything to reasoning models, which can easily inflate cost by 50×. Forgetting the latency budget: reasoning models can take 30-60 s per query.
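Both mistakes can be guarded against with a small router. The sketch below is one possible shape, not a reference implementation: the `Tier` class, the task labels in `HARD_TASKS`, and the cost and latency numbers are all assumptions chosen to match the rough figures quoted here (50× cost, about 45 s latency).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    relative_cost: float      # cost per query relative to the fast tier
    typical_latency_s: float  # rough per-query latency in seconds


# Illustrative numbers only; measure your own models before relying on them.
FAST = Tier("fast", relative_cost=1.0, typical_latency_s=0.5)
REASONING = Tier("reasoning", relative_cost=50.0, typical_latency_s=45.0)

# Hypothetical task labels, assumed to come from an upstream classifier.
HARD_TASKS = {"math_proof", "complex_code", "multi_step_planning"}


def route(task_type: str, latency_budget_s: float) -> Tier:
    """Escalate only hard tasks whose latency budget fits the reasoning tier."""
    if task_type in HARD_TASKS and latency_budget_s >= REASONING.typical_latency_s:
        return REASONING
    return FAST


def blended_cost(hard_fraction: float) -> float:
    """Average relative cost per query when only hard queries are escalated."""
    return (
        hard_fraction * REASONING.relative_cost
        + (1.0 - hard_fraction) * FAST.relative_cost
    )


if __name__ == "__main__":
    print(route("math_proof", latency_budget_s=60).name)
    print(route("classification", latency_budget_s=60).name)
    print(round(blended_cost(0.1), 2))
```

Under these assumed numbers, escalating 10% of traffic costs about 5.9× the fast-tier baseline (0.1 × 50 + 0.9 × 1), compared with 50× when every query goes to the reasoning tier.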

Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/test-time-compute.md.