
Model router

A model router picks which language model handles each request based on cost, latency, or task type — the standard production pattern in 2026.

Single-model deployments are now the exception. Production stacks route by task: GPT-4o-mini for cheap classification, Claude for code, Gemini Pro for long context, and an OpenAI o-series model or Claude with extended thinking for hard reasoning. The router can be rule-based ('if task=code → Claude'), embedding-based ('embed the query, route by nearest cluster centroid'), or LLM-based ('ask a small model to pick'). OpenRouter, Portkey, Vellum, and Martian provide hosted routing; many teams roll their own. Routing decisions usually optimise cost-per-success (total spend divided by the number of requests answered acceptably), not raw quality.
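The rule-based and embedding-based variants can be sketched in a few lines. This is a minimal illustration, not a production router: the model names are placeholders, the `embed` function is a toy bag-of-words stand-in for a real embedding model, and the example phrases behind each centroid are invented for the demo.

```python
import math
from collections import Counter

# Hypothetical model identifiers; substitute whatever your provider exposes.
RULES = {
    "classification": "small-cheap-model",
    "code": "code-model",
    "long_context": "long-context-model",
    "reasoning": "reasoning-model",
}
DEFAULT_MODEL = "small-cheap-model"


def route_by_rule(task: str) -> str:
    """Rule-based routing: look up a known task label, else use the default."""
    return RULES.get(task, DEFAULT_MODEL)


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(examples: list[str]) -> Counter:
    """Sum the example vectors; cosine similarity ignores the scale."""
    total = Counter()
    for example in examples:
        total.update(embed(example))
    return total


# Invented example queries per task, used only to build the centroids.
CENTROIDS = {
    "code": centroid([
        "fix this python function bug",
        "write a sql query for the table",
    ]),
    "classification": centroid([
        "label this review as positive or negative",
        "classify the ticket category",
    ]),
}


def route_by_embedding(query: str) -> str:
    """Embedding-based routing: pick the nearest centroid, then apply the rules."""
    vector = embed(query)
    task = max(CENTROIDS, key=lambda name: cosine(vector, CENTROIDS[name]))
    return route_by_rule(task)
```

Calling `route_by_embedding("please fix the bug in this python function")` lands on the `code` centroid and therefore on `code-model`. An LLM-based router would keep the same shape and replace the centroid lookup with a call to a small model that returns the task label.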

When to use a model router

Common mistakes

FAQ

What is a model router?

A model router picks which language model handles each request based on cost, latency, or task type — the standard production pattern in 2026.

When should I use a model router?

Use one in any production app with diverse query types, and in cost-sensitive workloads where a cheaper model can handle the easy requests.

What are the most common mistakes with a model router?

Routing without evals, so quality drift goes unnoticed. Over-engineering a router for traffic that is small enough for a single model.
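One way to make drift visible is to score each route on a fixed eval set and track cost per success per model. The sketch below assumes you already log a cost and a pass/fail grade for every eval call; the `EvalResult` record, the model names, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    model: str       # which model the router chose
    cost_usd: float  # what the call cost
    passed: bool     # whether the answer met the eval's bar


def cost_per_success(results: list[EvalResult]) -> dict[str, float]:
    """Total spend divided by passing answers, per model.

    A model with no passing answers gets infinity, so it sorts last.
    """
    totals: dict[str, tuple[float, int]] = {}
    for r in results:
        cost, wins = totals.get(r.model, (0.0, 0))
        totals[r.model] = (cost + r.cost_usd, wins + int(r.passed))
    return {
        model: (cost / wins if wins else float("inf"))
        for model, (cost, wins) in totals.items()
    }


# Made-up eval log, for illustration only.
SAMPLE = [
    EvalResult("small-model", 0.001, True),
    EvalResult("small-model", 0.001, True),
    EvalResult("small-model", 0.001, False),
    EvalResult("small-model", 0.001, True),
    EvalResult("large-model", 0.010, True),
    EvalResult("large-model", 0.010, True),
    EvalResult("large-model", 0.010, True),
    EvalResult("large-model", 0.010, True),
]

if __name__ == "__main__":
    for model, value in sorted(cost_per_success(SAMPLE).items(), key=lambda kv: kv[1]):
        print(f"{model}: ${value:.4f} per success")
```

Re-running the same eval set after every routing change, and comparing these numbers over time, gives an early signal when a cheaper route starts failing more often.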

Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/model-router.md.