# Rate limit

**Source:** https://promtable.com/glossary/rate-limit

> A rate limit is a hard cap on how many requests or tokens an API will accept from a single client in a given time window — the single most common production failure mode for LLM apps.

---

Every LLM provider enforces several rate limits at once: requests per minute (RPM), input plus output tokens per minute (TPM), and concurrent in-flight requests. Exceed any one of them and the API returns HTTP 429 (Too Many Requests). Production apps must retry with exponential backoff and jitter, route across providers or regions when one tier is saturated, and show users a clear message instead of crashing. As of 2026, enterprise tiers of frontier APIs can allow 50,000 to 500,000 TPM, but traffic spikes will still hit those limits. Plan for it.
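
Exponential backoff with jitter can be sketched as follows. This is a minimal illustration, not any provider's SDK: `RateLimitError`, `backoff_delay`, and `call_with_backoff` are hypothetical names, and the `base` and `cap` defaults are assumptions to tune per provider. It uses "full jitter", where each delay is drawn uniformly between zero and an exponentially growing ceiling.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for whatever your client raises on HTTP 429."""


def backoff_delay(attempt, base=1.0, cap=60.0, rng=random.random):
    """Full-jitter delay: uniform between 0 and min(cap, base * 2**attempt)."""
    return rng() * min(cap, base * (2 ** attempt))


def call_with_backoff(fn, max_attempts=5, base=1.0, cap=60.0, sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of retries: let the caller degrade gracefully
            sleep(backoff_delay(attempt, base, cap))
```

Because every client draws a different random delay, retries spread out over time instead of arriving at the API in synchronised waves.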

## Common mistakes

- Linear retry without jitter: clients retry in lockstep, and the synchronised retry storm makes the problem worse.
- No fallback provider: one provider's 429s cascade and bring the whole product down.
- Mixing dev and prod traffic on the same key: development work uses up the quota that production needs.
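
One way to avoid the missing-fallback mistake is to try providers in priority order and move on when one is rate limited. The sketch below assumes each provider is wrapped in a plain callable that raises a hypothetical `RateLimitError` on HTTP 429; `complete_with_fallback` and the provider names are illustrative, not a real library API.

```python
class RateLimitError(Exception):
    """Hypothetical stand-in for whatever your client raises on HTTP 429."""


def complete_with_fallback(prompt, providers):
    """Try each (name, call) pair in order; skip providers that are rate limited."""
    for name, call in providers:
        try:
            return name, call(prompt)
        except RateLimitError:
            continue  # this provider is saturated: try the next one
    # Every provider refused: surface one error so the app can warn the user.
    raise RateLimitError("all providers are rate limited")
```

A call might look like `complete_with_fallback(prompt, [("primary", call_primary), ("secondary", call_secondary)])`, where both callables are wrappers you supply. Returning the provider name alongside the result makes it easy to log how often the fallback path is taken.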

## Related terms

- [model-router](https://promtable.com/glossary/model-router)
- [openrouter](https://promtable.com/glossary/openrouter)
- [agent](https://promtable.com/glossary/agent)

*Last updated: 2026-06-01*

---

Original page: https://promtable.com/glossary/rate-limit
Maintained by Promtable (https://promtable.com). Content: CC BY 4.0. Cite as "Promtable — https://promtable.com/glossary/rate-limit".
Contact: info@vibecodingturkey.com.