Router fallback
A router fallback is an ordered chain of model providers that the application tries in sequence, failing over from primary to secondary to tertiary when a call hits a 429 (rate limit), a 5xx server error, or a quality threshold.
A production deployment that depends on a single provider goes down every time that provider's frontier API has an incident. A router fallback chain defines which model to call next when the primary fails (rate limit, server error, latency timeout, content filter) and retries the request there, so the caller never sees the failure. In 2026 the standard pattern is primary frontier model → secondary frontier model (from a different lab) → cheap, fast model → graceful refusal. Tools that implement this pattern include OpenRouter, Portkey, LiteLLM, Vellum, and Martian. Quality fallbacks, which degrade gracefully instead of returning an error, are the norm in production LLM apps that take real traffic.
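The chain described above can be sketched in a few lines of Python. This is a minimal illustration, not how any of the listed tools implement it: `Provider`, `ProviderError`, `Result`, `RETRYABLE_STATUSES`, `REFUSAL`, and `complete_with_fallback` are hypothetical names invented for this example, and each provider is assumed to be wrapped in a plain function that takes a prompt and either returns text or raises. Content-filter and quality-threshold triggers are left out to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


class ProviderError(Exception):
    """Hypothetical error raised by a provider wrapper; `status` mimics an HTTP status."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(message or f"provider returned {status}")
        self.status = status


# Assumption: only rate limits and server errors justify trying the next provider.
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}

# Final link in the chain: a graceful refusal instead of a raw error.
REFUSAL = "Sorry, the service is temporarily unavailable. Please try again shortly."


@dataclass
class Provider:
    name: str
    call: Callable[[str], str]  # takes a prompt, returns a completion


@dataclass
class Result:
    text: str
    served_by: Optional[str]  # None means every provider in the chain failed
    attempts: int


def complete_with_fallback(prompt: str, chain: Sequence[Provider]) -> Result:
    """Try each provider in order and return the first successful completion."""
    attempts = 0
    for provider in chain:
        attempts += 1
        try:
            return Result(provider.call(prompt), provider.name, attempts)
        except ProviderError as err:
            if err.status not in RETRYABLE_STATUSES:
                # A malformed request (e.g. 400) would fail on every provider,
                # so surface it instead of walking the rest of the chain.
                raise
        except TimeoutError:
            pass  # latency timeout: fall through to the next provider
    return Result(REFUSAL, None, attempts)
```

Called with a chain such as `[Provider("primary", call_a), Provider("secondary", call_b), Provider("cheap", call_c)]`, the function returns a `Result` whose `served_by` field records which link answered. Recording that field on every request is what makes fallback frequency measurable later.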
When to use router fallback
- Any production LLM app with real traffic.
- Customer-facing flows that cannot show a 5xx to the user.
Common mistakes
- Falling back to a much cheaper model on every error: output quality drops and nothing signals it.
- No alerting on fallback frequency: you don't notice the primary slowly degrading.
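One way to catch the second mistake is to track what share of recent requests were served by something other than the primary. The sketch below is an assumption-laden illustration: `FallbackMonitor`, its window size, and its threshold are hypothetical and not taken from any of the tools named above. A real deployment would more likely emit this as a metric to its existing monitoring stack.

```python
from collections import deque
from typing import Deque, Optional


class FallbackMonitor:
    """Tracks the share of recent requests that were not served by the primary."""

    def __init__(self, primary: str, window: int = 100, threshold: float = 0.05) -> None:
        self.primary = primary
        self.threshold = threshold  # illustrative default, tune per application
        # Sliding window: True for each request that left the primary.
        self._events: Deque[bool] = deque(maxlen=window)

    def record(self, served_by: Optional[str]) -> None:
        """Record which provider answered; None means the chain ended in a refusal."""
        self._events.append(served_by != self.primary)

    def fallback_rate(self) -> float:
        if not self._events:
            return 0.0
        return sum(self._events) / len(self._events)

    def should_alert(self) -> bool:
        return self.fallback_rate() > self.threshold
```

Calling `record` with the name of the provider that served each request and checking `should_alert` periodically turns a slow primary degradation into a visible signal instead of a silent quality drop.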
FAQ
What is router fallback?
A router fallback is an ordered chain of model providers that the application tries in sequence, failing over from primary to secondary to tertiary when a call hits a 429 (rate limit), a 5xx server error, or a quality threshold.
When should I use router fallback?
Use it in any production LLM app with real traffic, and especially in customer-facing flows that cannot show a 5xx to the user.
What are the most common mistakes with router fallback?
The first is falling back to a much cheaper model on every error, so output quality drops and nothing signals it. The second is having no alerting on fallback frequency, so you don't notice the primary slowly degrading.
Related terms
- OpenRouter — OpenRouter is a unified API that lets you call 200+ language models through one endpoint with one API key — the de-facto model-router infrastructure layer in 2026.
- Model router — A model router picks which language model handles each request based on cost, latency, or task type — the standard production pattern in 2026.
- Rate limit — A rate limit is a hard cap on how many requests or tokens an API will accept from a single client in a given time window — the single most common production failure mode for LLM apps.
Last updated: 2026-06-01.