# Model router

**Source:** https://promtable.com/glossary/model-router

> A model router picks which language model handles each request based on cost, latency, or task type — the standard production pattern in 2026.

---

Single-model deployments are now the exception. Production stacks route by task: GPT-4o-mini for cheap classification, Claude for code, Gemini Pro for long context, and an o-series model or Claude with extended thinking for hard reasoning. The router can be rule-based ('if task=code → Claude'), embedding-based ('embed the query, route by nearest cluster centroid'), or LLM-based ('ask a small model to pick'). OpenRouter, Portkey, Vellum, and Martian provide hosted routing; many teams roll their own. Routing decisions usually optimise cost-per-success (total spend divided by the number of requests that meet the quality bar), not raw quality.
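
The first two router styles can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the model identifiers, the task labels, and the three-dimensional centroids are all hypothetical placeholders, and a real embedding-based router would get its vectors from an embedding model and its centroids from clustering past queries.

```python
import math

# Hypothetical model identifiers; substitute whatever your provider exposes.
RULES = {
    "classification": "small-cheap-model",
    "code": "code-model",
    "long_context": "long-context-model",
    "reasoning": "reasoning-model",
}
DEFAULT_MODEL = "small-cheap-model"


def route_by_rule(task: str) -> str:
    """Rule-based routing: a direct lookup from task label to model."""
    return RULES.get(task, DEFAULT_MODEL)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy centroids (assumed values), one per cluster of past queries.
CENTROIDS = {
    "code-model": [0.9, 0.1, 0.0],
    "long-context-model": [0.1, 0.9, 0.0],
    "reasoning-model": [0.0, 0.1, 0.9],
}


def route_by_embedding(query_vec: list[float]) -> str:
    """Embedding-based routing: pick the model whose centroid is most similar."""
    return max(CENTROIDS, key=lambda model: cosine(query_vec, CENTROIDS[model]))
```

An LLM-based router would keep the same interface but replace the lookup with a call to a small model that returns one of the model identifiers; that call is provider-specific and is left out here.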

## When to use

- Any production app with diverse query types.
- Cost-sensitive workloads.

## Common mistakes

- Routing without evals: quality drift on individual routes goes unnoticed.
- Over-engineering for traffic that's small enough for a single model.
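
One way to catch the first mistake is to record an eval outcome for every routed request and track success rate and cost-per-success per route. The sketch below assumes you already have some pass/fail eval signal and a per-request cost; the `RouteStats` class, the `has_drifted` helper, and the 0.05 tolerance are illustrative names and values, not part of any particular routing product.

```python
from dataclasses import dataclass


@dataclass
class RouteStats:
    """Running totals for one route (one target model)."""
    requests: int = 0
    successes: int = 0
    cost_usd: float = 0.0

    def record(self, success: bool, cost_usd: float) -> None:
        """Add one request's eval outcome and its cost."""
        self.requests += 1
        self.successes += int(success)
        self.cost_usd += cost_usd

    @property
    def success_rate(self) -> float:
        return self.successes / self.requests if self.requests else 0.0

    @property
    def cost_per_success(self) -> float:
        # Infinite when nothing succeeded, so a failing route never looks cheap.
        return self.cost_usd / self.successes if self.successes else float("inf")


def has_drifted(stats: RouteStats, baseline_rate: float, tolerance: float = 0.05) -> bool:
    """Flag a route whose success rate fell more than `tolerance` below baseline."""
    return stats.success_rate < baseline_rate - tolerance
```

Comparing `cost_per_success` across routes, rather than per-request price, keeps a cheap model that fails often from looking like a saving.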

## Related terms

- [reasoning-model](https://promtable.com/glossary/reasoning-model)
- [agent](https://promtable.com/glossary/agent)
- [system-prompt](https://promtable.com/glossary/system-prompt)

*Last updated: 2026-06-01*

---

Original page: https://promtable.com/glossary/model-router
Maintained by Promtable (https://promtable.com). Content: CC BY 4.0. Cite as "Promtable — https://promtable.com/glossary/model-router".
Contact: info@vibecodingturkey.com.