MoE routing
MoE routing is the per-token gating function inside a Mixture-of-Experts model that selects which expert sub-networks process each token. It largely determines the model's quality and efficiency.
Mixture-of-Experts (MoE) models contain N expert sub-networks; for each token, a router activates K of them (typically between 1 and 8). In the common top-K scheme, the router scores every expert, keeps the K highest-scoring ones, and combines their outputs using the renormalized scores as weights. Routing quality is critical: a poorly trained router either collapses onto a few experts (defeating the point of having N of them) or spreads tokens too thinly across experts. Modern MoE training addresses these failure modes with techniques such as auxiliary load-balancing losses, expert-choice routing, and Switch-style top-1 routing. As of 2026, Llama 4 Maverick, Mixtral 8x22B, DBRX, and DeepSeek V3 all use MoE with carefully tuned routing. From a developer's perspective MoE is mostly invisible, because the API or self-hosted inference engine handles routing, but understanding it matters when debugging quality regressions or sizing inference compute.
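To make the routing step and the collapse failure mode concrete, here is a minimal pure-Python sketch of softmax top-K gating together with an auxiliary load-balancing loss in the style of the Switch Transformer formulation. The function names (`route_top_k`, `load_balance_loss`) and the toy logits are illustrative assumptions, not the code of any model named above; production routers run batched on accelerators and differ in details such as noise, capacity limits, and loss scaling.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Pick the k highest-probability experts for one token.

    Returns [(expert_index, weight), ...] with weights renormalized
    so they sum to 1 over the selected experts.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def load_balance_loss(batch_logits, k):
    """Auxiliary loss N * sum_i(f_i * P_i) over a batch of tokens.

    f_i: fraction of routing slots assigned to expert i.
    P_i: mean router probability for expert i.
    Roughly 1.0 when load is balanced; grows as routing collapses.
    (Assumed formulation; real implementations add a scaling coefficient.)
    """
    n_tokens = len(batch_logits)
    n_experts = len(batch_logits[0])
    counts = [0] * n_experts
    mean_prob = [0.0] * n_experts
    for logits in batch_logits:
        for i, p in enumerate(softmax(logits)):
            mean_prob[i] += p / n_tokens
        for i, _weight in route_top_k(logits, k):
            counts[i] += 1
    frac = [c / (n_tokens * k) for c in counts]
    return n_experts * sum(f * p for f, p in zip(frac, mean_prob))


# Toy batches: four tokens, four experts, top-1 routing.
balanced = [[5, 0, 0, 0], [0, 5, 0, 0], [0, 0, 5, 0], [0, 0, 0, 5]]
collapsed = [[5, 0, 0, 0]] * 4  # every token prefers expert 0

print(route_top_k([2.0, 0.0, 1.0, -1.0], k=2))
print(load_balance_loss(balanced, k=1), load_balance_loss(collapsed, k=1))
```

In this toy setup the collapsed batch scores a noticeably higher loss than the balanced one, which is the signal training uses to push the router back toward even expert usage.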
Common mistakes
- Sizing MoE inference compute by total parameter count. Per-token compute scales with the active-parameter count, although all experts must still be held in memory.
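The gap between total and active parameters can be sketched with simple arithmetic. The configuration below is hypothetical (it is not the published size of any model named in this entry), and the "2 FLOPs per active parameter per token" figure is a common rule of thumb for a forward pass rather than an exact cost model.

```python
def moe_param_counts(shared_params, params_per_expert, n_experts, k):
    """Return (total, active) parameter counts for a simplified MoE.

    shared_params: parameters every token uses (attention, embeddings, ...).
    params_per_expert: parameters in one expert, summed over all MoE layers.
    """
    total = shared_params + n_experts * params_per_expert
    active = shared_params + k * params_per_expert
    return total, active


def approx_forward_flops_per_token(active_params):
    """Rule-of-thumb estimate: about 2 FLOPs per active parameter."""
    return 2 * active_params


def weight_memory_bytes(total_params, bytes_per_param=2):
    """Weights must be resident for every expert, active or not."""
    return total_params * bytes_per_param


# Hypothetical model: 2B shared parameters, 8 experts of 5B each, top-2 routing.
total, active = moe_param_counts(2_000_000_000, 5_000_000_000, n_experts=8, k=2)
print(f"total={total:,} active={active:,}")
print(f"approx FLOPs/token={approx_forward_flops_per_token(active):,}")
print(f"weight memory at 16-bit={weight_memory_bytes(total):,} bytes")
```

Under these assumptions the model stores 42B parameters but touches only 12B per token, so compute should be budgeted from the active count while memory should be budgeted from the total.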
FAQ
What is MoE routing?
MoE routing is the per-token gating function inside a Mixture-of-Experts model that selects which expert sub-networks process each token. It largely determines the model's quality and efficiency.
What are the most common mistakes with MoE routing?
Sizing MoE inference compute by total parameter count. Per-token compute scales with the active-parameter count, although all experts must still be held in memory.
Related terms
- Mixture of Experts (MoE): an architecture where a router activates only a subset of the model's parameters per token, so total parameter count can be very large while per-token inference compute stays low.
- Mixture of Depths — Mixture of Depths (MoD) is an efficiency technique where the model learns to skip some layers for some tokens — applying compute selectively based on token importance.
- Attention mechanism — The attention mechanism is the transformer building block that lets each token in an input weight the importance of every other token when computing its representation — the core technique that made modern LLMs possible.
Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/moe-routing.md.