Retry with backoff
Retry with backoff is the production resilience pattern for handling transient LLM API failures (rate limits, 5xx errors, timeouts): failed requests are retried after exponentially growing delays, with random jitter so that clients do not retry in lockstep and create a thundering herd. As of 2026, most AI SDKs enable it by default.
LLM APIs fail transiently: rate limits are hit, the model is overloaded, the network blips. Naive immediate retry makes things worse, because every client hits the struggling service again at the same moment (the thundering herd). Retry with exponential backoff instead: wait 1s, then 2s, then 4s, then 8s, adding random jitter to each delay so that clients do not all retry in sync, and stop after N attempts.

Production tuning: use a shorter backoff where retries are cheap (for example, when rate-limit headers indicate when capacity returns) and a longer one where they are expensive (500 errors), set per-endpoint policies (chat vs. streaming vs. tool use), and respect the Retry-After header. Most AI SDKs implement this by default; customize it through their timeout and max-retries settings. It improves resilience at almost no cost, because many production failures are transient and succeed on retry.
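The schedule described above can be sketched in a few lines of Python. This is a minimal illustration, not any SDK's actual implementation: `TransientError`, `backoff_delay`, and `call_with_retry` are hypothetical names, the 30-second cap is an assumed default, and "full jitter" (a uniform draw between zero and the exponential ceiling) is only one of several common jitter strategies.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (429, 5xx, timeouts)."""


def backoff_delay(attempt, base=1.0, cap=30.0, rng=random.random):
    """Full-jitter delay for a zero-based attempt number.

    The ceiling doubles each attempt (1s, 2s, 4s, 8s, ...) up to `cap`;
    the actual delay is drawn uniformly between 0 and that ceiling.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling


def call_with_retry(fn, max_attempts=5, base=1.0, cap=30.0, sleep=time.sleep):
    """Call fn(); on TransientError, back off with jitter and try again."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(backoff_delay(attempt, base, cap))
```

In real code the `except` clause would catch whatever exception types the client library raises for rate limits, server errors, and timeouts. The `sleep` and `rng` parameters exist only so the schedule can be tested without actually waiting.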
When to use retry with backoff
- Any production LLM call.
Common mistakes
- No jitter: clients retry in sync and cause retry storms.
- Retrying on 400 errors: a bad request won't fix itself, so fail fast instead.
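One way to avoid the second mistake is to decide whether a response is retryable before sleeping at all. The sketch below is an assumption-laden illustration: `is_retryable` and `retry_after_seconds` are hypothetical helpers, the set of retryable statuses (408, 429, and all 5xx) is a reasonable default and not a rule from any particular vendor, and only the integer-seconds form of Retry-After is parsed.

```python
def is_retryable(status):
    """True for statuses that usually signal a transient failure.

    Assumed policy: request timeout (408), rate limit (429), and server
    errors (5xx). Other 4xx responses, such as 400, are treated as permanent.
    """
    return status in (408, 429) or 500 <= status <= 599


def retry_after_seconds(headers):
    """Return the Retry-After delay in seconds, or None if absent or unparsable.

    Header names are matched case-insensitively. The HTTP-date form of
    Retry-After is not handled in this sketch and yields None.
    """
    for name, value in headers.items():
        if name.lower() == "retry-after":
            try:
                return max(0, int(value))
            except ValueError:
                return None
    return None
```

A retry loop could then skip retrying when `is_retryable` returns False, and prefer the server-provided delay over its own computed backoff whenever `retry_after_seconds` returns a value.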
FAQ
What is retry with backoff?
Retry with backoff is the production resilience pattern for handling transient LLM API failures (rate limits, 5xx errors, timeouts): failed requests are retried after exponentially growing delays, with random jitter so that clients do not retry in lockstep and create a thundering herd. As of 2026, most AI SDKs enable it by default.
When should I use retry with backoff?
Any production LLM call.
What are the most common mistakes with retry with backoff?
No jitter: clients retry in sync and cause retry storms. Retrying on 400 errors: a bad request won't fix itself, so fail fast instead.
Related terms
- AI SDK — An AI SDK is the official client library a vendor ships for calling their model APIs — Anthropic SDK, OpenAI SDK, Google GenAI SDK, Mistral SDK, Vercel AI SDK (multi-vendor wrapper). Handles auth, retries, streaming, types.
- Rate limit — A rate limit is a hard cap on how many requests or tokens an API will accept from a single client in a given time window — the single most common production failure mode for LLM apps.
- Router fallback — A router fallback is a chain of model providers that the application tries in order — failing over from primary to secondary to tertiary on 429s, 500s, or quality thresholds.
Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/retry-with-backoff.md.