concept

MoE routing

MoE routing is the per-token gating function inside a Mixture-of-Experts model that selects which expert sub-networks process each token. It is a critical detail that shapes both the quality and the efficiency of an MoE model.

Mixture-of-Experts (MoE) models contain N expert sub-networks; for each token, a router selects K of them (typically 1 to 8) to activate. The router's job is critical: bad routing either collapses onto a few experts, which defeats the point of having many, or spreads tokens too thinly across experts. Modern MoE training techniques, such as auxiliary load-balancing losses, expert-choice routing, and Switch-style top-1 routing, address these failure modes. As of 2026, Llama 4 Maverick, Mixtral 8x22B, DBRX, and DeepSeek V3 all use MoE with carefully tuned routing. From a developer's perspective MoE is mostly invisible, because the API or self-hosted inference engine handles routing, but understanding it matters when debugging quality regressions or sizing inference compute.
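The routing step and the collapse failure mode described above can be sketched in a few lines of plain Python. This is a minimal illustration, not how any of the named models implements routing: the function names `route_token` and `load_balance_loss` are hypothetical, the gate is assumed to be a softmax over per-expert logits with top-K selection, and the auxiliary loss follows the Switch-style form N · Σ f_i · P_i (fraction of routed tokens times mean router probability per expert).

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, k):
    """Pick the top-k experts for one token.

    Returns [(expert_index, weight), ...] with weights renormalized
    so that they sum to 1 over the selected experts.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def load_balance_loss(all_logits, k):
    """Switch-style auxiliary loss over a batch of tokens (assumed form).

    f_i = fraction of routing slots assigned to expert i
    P_i = mean router probability for expert i
    loss = N * sum_i f_i * P_i, about 1.0 when balanced, near N when collapsed.
    """
    n_tokens = len(all_logits)
    n_experts = len(all_logits[0])
    counts = [0] * n_experts
    mean_prob = [0.0] * n_experts
    for logits in all_logits:
        for i, p in enumerate(softmax(logits)):
            mean_prob[i] += p / n_tokens
        for i, _ in route_token(logits, k):
            counts[i] += 1
    frac = [c / (n_tokens * k) for c in counts]
    return n_experts * sum(f * p for f, p in zip(frac, mean_prob))


# Balanced batch: each of 4 tokens prefers a different expert.
balanced = [[5.0 if e == t else 0.0 for e in range(4)] for t in range(4)]
# Collapsed batch: every token prefers expert 0.
collapsed = [[5.0, 0.0, 0.0, 0.0] for _ in range(4)]

balanced_loss = load_balance_loss(balanced, k=1)
collapsed_loss = load_balance_loss(collapsed, k=1)
```

In this toy setup the loss is lowest when tokens are spread evenly and grows toward N as the router collapses onto one expert, which is why adding it to the training objective pushes the router away from collapse.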

Common mistakes

FAQ

What is MoE routing?

MoE routing is the per-token gating function inside a Mixture-of-Experts model that selects which expert sub-networks process each token. It is a critical detail that shapes both the quality and the efficiency of an MoE model.

What are the most common mistakes with MoE routing?

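Sizing MoE inference compute by total parameter count. Per-token compute scales with the active-parameter count (the shared layers plus the K experts selected for that token), not the total. Total parameter count still determines how much memory the weights need, because every expert must be loaded.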
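The difference between total and active parameters is simple arithmetic. The sketch below uses a hypothetical configuration; none of the numbers describe a real model, and `moe_param_counts` is an illustrative helper. It assumes that `params_per_expert` is already summed across all layers, and it uses the common rule of thumb of about 2 FLOPs per active parameter per token for a forward pass.

```python
def moe_param_counts(shared_params, params_per_expert, n_experts, k_active):
    """Return (total, active) parameter counts for a simplified MoE model.

    shared_params:     attention, embeddings, and other always-on weights
    params_per_expert: weights of one expert, summed across all layers
    """
    total = shared_params + n_experts * params_per_expert
    active = shared_params + k_active * params_per_expert
    return total, active


# Hypothetical configuration, chosen only to make the arithmetic easy.
total, active = moe_param_counts(
    shared_params=2_000_000_000,
    params_per_expert=5_000_000_000,
    n_experts=8,
    k_active=2,
)

# Memory for weights follows the TOTAL count (2 bytes per parameter in fp16).
weight_bytes_fp16 = total * 2

# Per-token forward compute follows the ACTIVE count (rule of thumb: ~2 FLOPs/param).
approx_flops_per_token = 2 * active

print(f"total={total / 1e9:.0f}B active={active / 1e9:.0f}B")
print(f"fp16 weights ~ {weight_bytes_fp16 / 1e9:.0f} GB")
print(f"~ {approx_flops_per_token / 1e9:.0f} GFLOPs per token")
```

Under these assumed numbers the model needs memory for 42B parameters but does roughly the per-token compute of a 12B dense model, so both figures are needed when sizing hardware.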

Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/moe-routing.md.