
Router LLM

A router LLM is a small, fast language model whose only job is to classify or rewrite an incoming request and decide which downstream model, agent, or tool should handle it.

Router LLMs (typically GPT-4o-mini, Claude Haiku, Gemini Flash, or even a small open-weight model) handle the lightweight "what is this" step before the heavyweight "answer this" step runs. They classify intent, rewrite queries for retrieval, decide which expert agent to call, or pick the cheapest model that meets the quality bar. In 2026, router LLMs are the production default for cost-sensitive multi-skill apps because they can cut frontier-model calls by 50-90% with minimal quality loss, provided the routing layer is evaluated and tuned.
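The classify-then-dispatch step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the route table, the handler names, the prompt wording, and `fake_small_model` are all hypothetical, and the stub stands in for a real call to whichever small model you use.

```python
from typing import Callable, Dict

# Hypothetical route table: intent label -> downstream handler name.
ROUTES: Dict[str, str] = {
    "faq": "small-model",
    "code": "code-agent",
    "search": "retrieval-tool",
}
# Unknown labels go to the most capable (and most expensive) handler.
FALLBACK = "frontier-model"

# Assumed prompt wording; tune it against your own routing evals.
ROUTER_PROMPT = (
    "Classify the user request into exactly one label from: "
    "{labels}. Reply with the label only.\n\nRequest: {query}"
)


def route(query: str, small_model: Callable[[str], str]) -> str:
    """Ask the small model for an intent label and map it to a handler."""
    prompt = ROUTER_PROMPT.format(labels=", ".join(sorted(ROUTES)), query=query)
    label = small_model(prompt).strip().lower()
    # A malformed or unexpected label falls back instead of raising.
    return ROUTES.get(label, FALLBACK)


def fake_small_model(prompt: str) -> str:
    """Keyword stub standing in for a real small-model API call."""
    request = prompt.split("Request: ", 1)[1].lower()
    if "traceback" in request or "function" in request:
        return "code"
    if "refund" in request:
        return "faq"
    return "unsure"


if __name__ == "__main__":
    for q in ("How do I get a refund?", "Fix this function", "Write a poem"):
        print(q, "->", route(q, fake_small_model))
```

Falling back to the frontier model on an unrecognized label trades cost for safety: a confused router costs more on that request but does not send it to a handler that cannot answer it.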

When to use a router LLM

Common mistakes

FAQ

What is a router LLM?

A router LLM is a small, fast language model whose only job is to classify or rewrite an incoming request and decide which downstream model, agent, or tool should handle it.

When should I use a router LLM?

Use one in cost-sensitive production systems that handle diverse query types, and in multi-agent stacks that need skill routing.

What are the most common mistakes with a router LLM?

The first is skipping evals on the router: routing errors are invisible without explicit monitoring. The second is using a router for tasks the frontier model could handle directly in fewer calls.
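One way to make routing errors visible is to score the router against a small hand-labeled set. The sketch below is illustrative only: `toy_router`, the handler names, and the example queries are hypothetical, and a real eval would use logged production queries with reviewed labels.

```python
from collections import Counter
from typing import Callable, Iterable, Tuple


def evaluate_router(
    router: Callable[[str], str],
    labeled: Iterable[Tuple[str, str]],
) -> Tuple[float, Counter]:
    """Return (accuracy, misroute counts keyed by (expected, actual))."""
    total = 0
    correct = 0
    misroutes: Counter = Counter()
    for query, expected in labeled:
        actual = router(query)
        total += 1
        if actual == expected:
            correct += 1
        else:
            # Record which route was expected and which one was chosen.
            misroutes[(expected, actual)] += 1
    accuracy = correct / total if total else 0.0
    return accuracy, misroutes


def toy_router(query: str) -> str:
    """Hypothetical keyword router used only to exercise the eval."""
    return "code-agent" if "bug" in query.lower() else "small-model"


# Hypothetical hand-labeled examples: (query, expected handler).
EXAMPLES = [
    ("There is a bug in my loop", "code-agent"),
    ("What are your opening hours?", "small-model"),
    ("Refactor this module", "code-agent"),
    ("Debug this stack trace", "code-agent"),
]

accuracy, misroutes = evaluate_router(toy_router, EXAMPLES)
print(f"accuracy={accuracy:.2f}")
for (expected, actual), count in misroutes.most_common():
    print(f"expected {expected}, got {actual}: {count}")
```

Counting misroutes by (expected, actual) pair, not just overall accuracy, shows which routes get confused with each other, which is usually what you need to know before adjusting the router prompt or route table.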

Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/router-llm.md.