Token budget
A token budget is the maximum number of tokens an application allows for a single LLM call (or an agent loop) — enforced to control cost, latency, and runaway behaviour.
Token budgets are the simplest and most important production guardrail for LLM apps. Apply at multiple levels: per-call (max_tokens for the response), per-conversation (truncate history past N tokens), per-agent-loop (cumulative cap across steps), per-user-day (rate limit by token spend), per-feature (kill switch if total tokens exceeds threshold). Without explicit budgets, agent loops can run away and adversarial users can drain accounts. Every production LLM app in 2026 has explicit token budgets enforced server-side.
When to use token budget
- Any production LLM feature.
- Agent loops.
- User-facing chat where adversarial users can spam.
Common mistakes
- Budgets only at the API level — adversarial users find ways to amplify.
- No alerting when a session approaches its budget — fail loudly, not silently.
FAQ
What is token budget?
A token budget is the maximum number of tokens an application allows for a single LLM call (or an agent loop) — enforced to control cost, latency, and runaway behaviour.
When should I use token budget?
Any production LLM feature. Agent loops. User-facing chat where adversarial users can spam.
What are the most common mistakes with token budget?
Budgets only at the API level — adversarial users find ways to amplify. No alerting when a session approaches its budget — fail loudly, not silently.
Related terms
- Rate limit — A rate limit is a hard cap on how many requests or tokens an API will accept from a single client in a given time window — the single most common production failure mode for LLM apps.
- AI agent — An AI agent is a system where a language model autonomously plans and executes a sequence of tool calls to accomplish a goal.
- Guardrails — Guardrails are deterministic checks layered around a language model to prevent unsafe, off-topic, or non-compliant outputs from reaching the user.
Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/token-budget.md.