technique

Semantic routing

Semantic routing classifies an incoming query by meaning — via embedding similarity to predefined route prototypes — and dispatches it to the right model, agent, or sub-system.

Semantic routing is the cheap, fast layer that decides what to do before the expensive layer runs. Embed the query, compare it to a set of pre-embedded route prototypes ("customer support", "sales", "code question"), pick the closest, and route accordingly. The routing model itself is usually a small embedder rather than an LLM. In 2026 production stacks it is widely used to cut LLM cost (for example, routing 70% of queries to a cheap model and only 30% to the frontier model) and to gate access (refusal routes for out-of-scope queries). Libraries include semantic-router, Pinecone Route, and Cohere Classify.
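The embed, compare, pick loop described above can be sketched in a few lines of Python. This is a minimal illustration, not how any of the listed libraries work internally: `embed` is a toy bag-of-words stand-in for a real embedding model, and the example utterances, `ROUTES`, `PROTOTYPES`, and `route` are all hypothetical names chosen for the sketch.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical routes, each described by a few example utterances.
ROUTES = {
    "customer support": ["my order has not arrived", "i want a refund for my order"],
    "sales": ["what does the enterprise plan cost", "can i get a pricing quote"],
    "code question": [
        "why does this python function raise an error",
        "how do i fix this bug in my code",
    ],
}

# Prototypes are embedded once, ahead of time, not per query.
PROTOTYPES = {
    name: [embed(u) for u in utterances] for name, utterances in ROUTES.items()
}


def route(query: str) -> tuple[str, float]:
    """Return the closest route and its similarity score."""
    q = embed(query)
    scores = {
        name: max(cosine(q, p) for p in protos)
        for name, protos in PROTOTYPES.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]
```

Calling `route("How do I fix an error in my Python code?")` selects the "code question" route. In a real system, `embed` would call a small embedding model, and the returned route name would decide which model, agent, or sub-system handles the query.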

When to use semantic routing

Common mistakes

FAQ

What is semantic routing?

Semantic routing classifies an incoming query by meaning — via embedding similarity to predefined route prototypes — and dispatches it to the right model, agent, or sub-system.

When should I use semantic routing?

Use it for multi-skill chatbots and agents, for cost-sensitive production systems with diverse query types, and for gate enforcement before expensive model calls.

What are the most common mistakes with semantic routing?

Route prototypes that are too similar: their embedding distances collapse, so different routes score almost identically for the same query. No fallback or out-of-scope route: ambiguous queries get mis-routed silently.
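Both mistakes can be guarded against with simple checks. The sketch below assumes embeddings are already available as plain lists of floats; the three-dimensional vectors, the function names, and the thresholds (`max_similarity`, `min_score`, `min_margin`) are hypothetical values for illustration and would need tuning against real traffic.

```python
import math


def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def too_similar_pairs(prototypes, max_similarity=0.9):
    """Return route pairs whose prototype vectors point in nearly the same direction."""
    names = sorted(prototypes)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if cosine(prototypes[a], prototypes[b]) > max_similarity
    ]


def route_with_fallback(query_vec, prototypes, min_score=0.6, min_margin=0.1):
    """Pick the closest route, or 'out_of_scope' when the match is weak or ambiguous."""
    ranked = sorted(
        ((cosine(query_vec, vec), name) for name, vec in prototypes.items()),
        reverse=True,
    )
    best_score, best_name = ranked[0]
    runner_up = ranked[1][0] if len(ranked) > 1 else -1.0
    if best_score < min_score or best_score - runner_up < min_margin:
        return "out_of_scope"
    return best_name


# Made-up vectors standing in for real prototype embeddings.
PROTOTYPES = {
    "billing": [1.0, 0.0, 0.0],
    "payments": [0.95, 0.05, 0.0],  # nearly the same direction as "billing"
    "code": [0.0, 1.0, 0.0],
}
```

Here `too_similar_pairs(PROTOTYPES)` flags the "billing" and "payments" pair, which is a signal to merge the routes or rewrite their prototypes. A query vector that sits between those two routes falls through to `"out_of_scope"` because the margin is too small, and so does a query that is far from every prototype, so neither case is mis-routed silently.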

Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/semantic-routing.md.