# Token budget

**Source:** https://promtable.com/glossary/token-budget

> A token budget is the maximum number of tokens an application allows for a single LLM call (or an agent loop) — enforced to control cost, latency, and runaway behaviour.

---

Token budgets are one of the simplest and most important production guardrails for LLM apps. Apply them at multiple levels:

- Per call: `max_tokens` for the response.
- Per conversation: truncate history past N tokens.
- Per agent loop: a cumulative cap across steps.
- Per user per day: rate limit by token spend.
- Per feature: a kill switch if total tokens exceed a threshold.

Without explicit budgets, agent loops can run away and adversarial users can drain accounts. As of 2026, explicit token budgets enforced server-side are standard practice for production LLM apps.
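As an illustration, the per-call and per-agent-loop levels can be combined in one small guard. This is a minimal sketch rather than a reference implementation: `TokenBudget`, `BudgetExceeded`, `run_agent`, and the `step` callable are hypothetical names, and `step(max_tokens)` is assumed to return the generated text, the tokens it actually consumed, and a done flag.

```python
from collections.abc import Callable
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when the agent loop has no budget left or has overspent."""


@dataclass
class TokenBudget:
    limit: int      # cumulative cap for the whole agent loop
    used: int = 0   # tokens spent so far

    @property
    def remaining(self) -> int:
        return max(self.limit - self.used, 0)

    def charge(self, tokens: int) -> None:
        """Record real spend first, then fail loudly if the cap was passed."""
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        self.used += tokens
        if self.used > self.limit:
            raise BudgetExceeded(f"used {self.used} of {self.limit} tokens")


# Hypothetical step signature: step(max_tokens) -> (text, tokens_used, done)
Step = Callable[[int], tuple[str, int, bool]]


def run_agent(step: Step, budget: TokenBudget, per_call_cap: int,
              max_steps: int = 10) -> list[str]:
    """Run steps until done, the step cap is hit, or the budget runs out."""
    outputs: list[str] = []
    for _ in range(max_steps):
        # Per-call cap: never request more than the loop has left.
        max_tokens = min(per_call_cap, budget.remaining)
        if max_tokens == 0:
            raise BudgetExceeded("no tokens left for another step")
        text, tokens_used, done = step(max_tokens)
        budget.charge(tokens_used)
        outputs.append(text)
        if done:
            break
    return outputs
```

The guard lives in the application code that drives the loop, so it applies regardless of what the model or the client asks for. In a real service, `tokens_used` would typically come from the usage figures the provider reports for each call.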

## When to use

- Any production LLM feature.
- Agent loops.
- User-facing chat where adversarial users can spam.
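For the user-facing chat case, a per-conversation budget usually means trimming history before each call. The sketch below is one possible approach, not a prescribed one: `trim_history` and `count_tokens` are hypothetical helpers, messages are assumed to be dicts with `role` and `content` keys, and the whitespace word count is only a crude stand-in for the model's real tokenizer.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def trim_history(messages: list[dict], max_tokens: int) -> list[dict]:
    """Keep system messages plus the newest turns that fit within max_tokens."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]

    used = sum(count_tokens(m["content"]) for m in system)
    kept: list[dict] = []
    # Walk backwards so the most recent turns are considered first.
    for message in reversed(turns):
        cost = count_tokens(message["content"])
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    # Restore chronological order for the turns that survived.
    return system + list(reversed(kept))
```

Dropping the oldest turns first is the simplest policy; summarising older turns is a common alternative when early context matters.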

## Common mistakes

- Budgets only at the API level. Adversarial users find ways to amplify spend, so a single layer of caps is not enough.
- No alerting when a session approaches its budget. Fail loudly, not silently.
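To make the alerting point concrete, a session check might log a warning before the cap and an error once it is reached. This is a sketch under assumptions: `check_budget`, the `"token_budget"` logger name, and the 80% warning ratio are illustrative choices, not values from the source.

```python
import logging

logger = logging.getLogger("token_budget")


def check_budget(session_id: str, used: int, limit: int,
                 warn_ratio: float = 0.8) -> str:
    """Return "ok", "warn", or "exhausted", logging for the last two states."""
    if used >= limit:
        logger.error("session %s exhausted its token budget: %d/%d",
                     session_id, used, limit)
        return "exhausted"
    if used >= warn_ratio * limit:
        logger.warning("session %s is near its token budget: %d/%d",
                       session_id, used, limit)
        return "warn"
    return "ok"
```

The returned state lets the caller decide what to do next, for example refusing further calls on `"exhausted"`, while the log lines give operators a signal before users hit the cap.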

## Related terms

- [rate-limit](https://promtable.com/glossary/rate-limit)
- [agent](https://promtable.com/glossary/agent)
- [guardrails](https://promtable.com/glossary/guardrails)

*Last updated: 2026-06-01*

---

Maintained by Promtable (https://promtable.com). Content: CC BY 4.0. Cite as "Promtable — https://promtable.com/glossary/token-budget".
Contact: info@vibecodingturkey.com.