Router fallback

A router fallback is an ordered chain of model providers that the application tries in sequence, failing over from primary to secondary to tertiary on HTTP 429s (rate limits), 5xx server errors, or responses that miss a quality threshold.

Single-provider production deployments go down every time a frontier API has an incident. A router fallback chain defines the next model to call when the primary fails (rate limit, server error, latency timeout, content filter) and retries the request against it, so the failure stays transparent to the caller. In 2026 the standard pattern is primary frontier model → secondary frontier model (different lab) → cheap, fast model → graceful refusal. Tools that implement this include OpenRouter, Portkey, LiteLLM, Vellum, and Martian. Quality fallbacks, which degrade gracefully instead of returning an error, are the norm in production LLM apps that take real traffic.
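The chain described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any of the listed tools work internally: `ProviderError`, `Provider`, `complete_with_fallback`, and the reason strings are all hypothetical names, and the provider calls are stubs standing in for real API clients.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


class ProviderError(Exception):
    """Hypothetical error type that a provider call raises on failure."""

    def __init__(self, reason: str):
        super().__init__(reason)
        self.reason = reason


# Failure reasons that trigger failover (assumed labels for the cases above).
RETRYABLE = {"rate_limit", "server_error", "timeout", "content_filter"}


@dataclass
class Provider:
    name: str
    call: Callable[[str], str]  # takes a prompt, returns a completion


def complete_with_fallback(prompt: str, chain: Sequence[Provider]) -> tuple[str, str]:
    """Try each provider in order and return (provider_name, text)."""
    for provider in chain:
        try:
            return provider.name, provider.call(prompt)
        except ProviderError as err:
            if err.reason not in RETRYABLE:
                raise  # unknown failure: surface it instead of masking it
    # Every provider failed: return a graceful refusal rather than an error.
    return "refusal", "Sorry, I can't answer right now. Please try again shortly."


# Stub providers for demonstration only.
def rate_limited(prompt: str) -> str:
    raise ProviderError("rate_limit")


def echo(prompt: str) -> str:
    return f"echo: {prompt}"


chain = [Provider("primary", rate_limited), Provider("secondary", echo)]
result = complete_with_fallback("hello", chain)
```

Returning the provider name alongside the text is a deliberate choice in this sketch: the caller can log which link in the chain served each request, which is what makes fallback frequency observable.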

When to use router fallback

Common mistakes

FAQ

What is router fallback?

A router fallback is an ordered chain of model providers that the application tries in sequence, failing over from primary to secondary to tertiary on HTTP 429s (rate limits), 5xx server errors, or responses that miss a quality threshold.

When should I use router fallback?

Use it in any production LLM app that takes real traffic, and especially in customer-facing flows that cannot show a 5xx to the user.

What are the most common mistakes with router fallback?

The first is falling back to a much cheaper model on every error, so output quality drops without anyone noticing. The second is not alerting on fallback frequency, so a slowly degrading primary goes undetected.
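One way to catch the second mistake is to track what share of recent requests were served by something other than the primary. The sketch below is illustrative only: `FallbackMonitor`, the window size, and the 20% threshold are assumptions, and a real deployment would more likely emit this as a metric to its existing alerting system.

```python
from collections import deque


class FallbackMonitor:
    """Tracks the share of recent requests not served by the primary."""

    def __init__(self, window: int = 100, alert_threshold: float = 0.2):
        # Sliding window of booleans; True means the request fell back.
        self.events = deque(maxlen=window)
        self.alert_threshold = alert_threshold

    def record(self, served_by: str, primary: str = "primary") -> None:
        self.events.append(served_by != primary)

    def fallback_rate(self) -> float:
        return sum(self.events) / len(self.events) if self.events else 0.0

    def should_alert(self) -> bool:
        return self.fallback_rate() > self.alert_threshold


monitor = FallbackMonitor(window=10, alert_threshold=0.2)
for name in ["primary"] * 7 + ["secondary"] * 3:
    monitor.record(name)

rate = monitor.fallback_rate()    # 3 of the last 10 requests fell back
alerting = monitor.should_alert()  # True, since the rate exceeds the threshold
```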

Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/ai-router-fallback.md.