technique

Retry with backoff

Retry with backoff is a production resilience pattern for handling transient LLM API failures (rate limits, 5xx errors, timeouts): failed calls are retried with exponentially growing delays plus jitter, which avoids a thundering herd. As of 2026, most AI SDKs build it in by default.

LLM APIs fail transiently: a rate limit is hit, the model is overloaded, or the network blips. Naive immediate retries make things worse, because every failed client piles more load onto a service that is already struggling (the thundering herd problem). Retry with exponential backoff instead: wait 1s, then 2s, then 4s, then 8s, adding random jitter to each delay so that clients do not all retry in sync. Stop after N attempts.

Production tuning: use shorter backoff where a retry is cheap (rate-limit responses that carry rate-limit headers) and longer backoff where it is expensive (500 errors), set per-endpoint policies (chat vs. streaming vs. tool use), and respect the Retry-After header. Most AI SDKs implement this by default; customize it through their timeout and max-retries settings. The pattern improves resilience at very little cost, because most production failures are transient and a retry succeeds.
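The retry loop described above can be sketched in a few lines of Python. This is a minimal illustration, not any particular SDK's implementation: `TransientError`, `backoff_delay`, and `retry_with_backoff` are hypothetical names, the 30-second cap is an assumed default, and the jitter strategy is "full jitter" (a random delay between zero and the exponential ceiling), which is one common variant among several.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error type for failures worth retrying (429, 5xx, timeouts)."""

    def __init__(self, message: str, retry_after: Optional[float] = None):
        super().__init__(message)
        # Seconds to wait, if the server sent a Retry-After header.
        self.retry_after = retry_after


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Full jitter: a random fraction of min(cap, base * 2**attempt)."""
    return rng() * min(cap, base * (2 ** attempt))


def retry_with_backoff(call: Callable[[], T], max_attempts: int = 5,
                       base: float = 1.0, cap: float = 30.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(), retrying on TransientError with growing, jittered delays."""
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientError as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            if err.retry_after is not None:
                delay = err.retry_after  # prefer the server's own hint
            else:
                delay = backoff_delay(attempt, base, cap)
            sleep(delay)
    raise ValueError("max_attempts must be at least 1")
```

A caller would wrap the API request in a zero-argument function that raises `TransientError` on retryable failures, for example `retry_with_backoff(lambda: send_request(prompt))`, where `send_request` stands in for whatever client call is in use. Passing `sleep` as a parameter keeps the loop easy to test without real waiting.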

When to use retry with backoff

Common mistakes

FAQ

What is retry with backoff?

Retry with backoff is a production resilience pattern for handling transient LLM API failures (rate limits, 5xx errors, timeouts): failed calls are retried with exponentially growing delays plus jitter, which avoids a thundering herd. As of 2026, most AI SDKs build it in by default.

When should I use retry with backoff?

Any production LLM call.

What are the most common mistakes with retry with backoff?

Omitting jitter, which lets clients retry in lockstep and produces synchronized retry storms. Retrying on 400 errors, which is pointless because a bad request won't fix itself.
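The second mistake is usually avoided by deciding up front which responses count as transient. A rough sketch, assuming HTTP status codes are available to the caller; `should_retry` is a hypothetical helper, and the exact set of retryable codes is an assumption that should be checked against the provider's documentation:

```python
def should_retry(status: int) -> bool:
    """Return True only for failures that may succeed on a later attempt."""
    if status in (408, 429):       # request timeout, rate limit
        return True
    if 500 <= status <= 599:       # server-side errors
        return True
    return False                   # 400 and other client errors: do not retry
```

With this check in front of the retry loop, a malformed request fails immediately instead of burning through every attempt.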

Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/retry-with-backoff.md.