concept

Token budget

A token budget is the maximum number of tokens an application allows for a single LLM call or for an entire agent loop. It is enforced to control cost, latency, and runaway behaviour.

Token budgets are the simplest and most important production guardrail for LLM apps. Apply them at several levels: per call (max_tokens for the response), per conversation (truncate history past N tokens), per agent loop (a cumulative cap across steps), per user per day (rate limiting by token spend), and per feature (a kill switch if total tokens exceed a threshold). Without explicit budgets, agent loops can run away and adversarial users can drain accounts. Production LLM apps should enforce explicit token budgets server-side, where clients cannot bypass them.
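As a rough illustration of two of these levels, the sketch below shows a cumulative cap (the per-agent-loop case) and history truncation (the per-conversation case). The names `TokenBudget`, `BudgetExceeded`, `count_tokens`, and `truncate_history` are hypothetical, and `count_tokens` is a whitespace word count standing in for a real tokenizer; treat this as one possible shape, not a reference implementation.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when a charge would push usage past the cap (hypothetical name)."""


@dataclass
class TokenBudget:
    limit: int      # hard cap, in tokens
    used: int = 0   # tokens charged so far

    @property
    def remaining(self) -> int:
        return self.limit - self.used

    def charge(self, tokens: int) -> None:
        # Check before recording, so `used` never exceeds `limit`.
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        if self.used + tokens > self.limit:
            raise BudgetExceeded(f"{self.used + tokens} > {self.limit} tokens")
        self.used += tokens


def count_tokens(text: str) -> int:
    # Assumption: whitespace word count as a stand-in for the model's tokenizer.
    return len(text.split())


def truncate_history(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent messages whose combined size fits in max_tokens."""
    kept: list[str] = []
    total = 0
    for message in reversed(messages):  # walk from newest to oldest
        size = count_tokens(message)
        if total + size > max_tokens:
            break
        kept.append(message)
        total += size
    kept.reverse()  # restore chronological order
    return kept
```

In an agent loop, one `TokenBudget` instance would be created per run and charged after every step with the tokens that step consumed, so the loop stops with `BudgetExceeded` instead of continuing indefinitely. The same class could back a per-user-day budget if the counter is stored server-side and reset daily.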

When to use token budget

Common mistakes

FAQ

What is token budget?

A token budget is the maximum number of tokens an application allows for a single LLM call or for an entire agent loop. It is enforced to control cost, latency, and runaway behaviour.

When should I use token budget?

Use one for any production LLM feature, and especially for agent loops and user-facing chat, where adversarial users can spam requests.

What are the most common mistakes with token budget?

Enforcing budgets only at the API level, where adversarial users find ways to amplify their token spend. Sending no alert when a session approaches its budget: a budget should fail loudly, not silently.
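One way to avoid the silent-failure mistake is to pair the hard limit with a lower warning threshold. The sketch below is a minimal example under stated assumptions: `SessionBudget`, the 80% `warn_ratio` default, and the `"ok"` / `"warn"` / `"blocked"` return values are all hypothetical choices, and the alert is only a log line where a real system would likely page or emit a metric.

```python
import logging

logger = logging.getLogger("token_budget")


class SessionBudget:
    """Per-session token counter with a soft warning and a hard limit."""

    def __init__(self, limit: int, warn_ratio: float = 0.8) -> None:
        self.limit = limit
        self.warn_at = int(limit * warn_ratio)  # soft threshold, in tokens
        self.used = 0
        self.warned = False

    def record(self, tokens: int) -> str:
        """Record usage and return 'ok', 'warn', or 'blocked'."""
        if self.used + tokens > self.limit:
            # Fail loudly: log the rejection and do not record the usage.
            logger.error(
                "session budget exceeded: %d + %d > %d",
                self.used, tokens, self.limit,
            )
            return "blocked"
        self.used += tokens
        if self.used >= self.warn_at and not self.warned:
            # Warn once per session when the soft threshold is crossed.
            self.warned = True
            logger.warning(
                "session at %d of %d tokens", self.used, self.limit
            )
            return "warn"
        return "ok"
```

The caller decides what each status means: a `"warn"` might trigger a notice to the user or an operator, while `"blocked"` should stop the request before it reaches the model.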

Last updated: 2026-06-01.