Mixture of Experts (MoE)

Mixture of Experts is an architecture where a router activates only a subset of the model's parameters per token, so total parameter count is huge but inference cost stays low.

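MoE models contain N expert sub-networks plus a router that selects K of them per token (typically 1 to 8; Mixtral, for example, routes each token to 2 of its 8 experts). Total parameter count can exceed 1T, but only the selected experts run in each forward pass, so per-token compute is far lower than for a dense model with the same total parameter count; all experts must still be kept in memory. Mixtral 8x22B, DBRX, Qwen2-MoE, DeepSeek V3, and Llama 4 Maverick are major open-weight MoE models. They dominate the Pareto frontier of quality-per-dollar for open-weight inference. Routing failures are the main engineering challenge: expert collapse, where the router sends most tokens to a few experts and the rest go undertrained, and load imbalance across experts.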
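The routing step can be sketched in a few lines. This is a minimal pure-Python illustration under stated assumptions, not any model's actual implementation: the experts are hypothetical toy functions that scale their input, the router is a single linear layer, and the gate applies softmax over only the K selected logits (the variant the Mixtral paper describes; other models normalize differently). The hypothetical `expert_load` helper shows how imbalance can be spotted by counting how often each expert is chosen.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(logits, k):
    """Pick the k highest-scoring experts and gate weights for them.

    Softmax is taken over the k selected logits only, so the weights sum to 1.
    """
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in ranked])
    return list(zip(ranked, weights))


def moe_layer(x, router_weights, experts, k):
    """Route one token vector x through k of len(experts) experts.

    Returns (output vector, list of chosen expert indices).
    """
    # Router: one logit per expert, here a plain dot product with a weight row.
    logits = [sum(a * b for a, b in zip(x, row)) for row in router_weights]
    out = [0.0] * len(x)
    chosen = []
    for idx, weight in top_k_route(logits, k):
        y = experts[idx](x)  # only the selected experts are evaluated
        out = [o + weight * yi for o, yi in zip(out, y)]
        chosen.append(idx)
    return out, chosen


def expert_load(assignments, num_experts):
    """Fraction of all (token, expert) assignments that went to each expert."""
    counts = [0] * num_experts
    for chosen in assignments:
        for idx in chosen:
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts]


# Hypothetical toy setup: N=4 experts that just scale their input, K=2.
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
router_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
tokens = [[2.0, 1.0], [3.0, 1.0], [1.0, 2.0], [-1.0, -2.0]]

assignments = [moe_layer(t, router_weights, experts, k=2)[1] for t in tokens]
print(assignments)                  # [[0, 1], [0, 1], [1, 0], [3, 2]]
print(expert_load(assignments, 4))  # [0.375, 0.375, 0.125, 0.125]
```

With 4 experts, a perfectly balanced router would give each expert 0.25 of the assignments; the skewed split here is the kind of imbalance that, taken to the extreme, becomes expert collapse. Training commonly counters it with an auxiliary load-balancing loss, which this sketch does not implement.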

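Common mistakes

Treating total parameter count as compute cost. Per-token compute depends on the active parameters, while memory still scales with the total.

Assuming MoE is automatically better. Dense models still lead at very small and very large scale.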

FAQ

What is Mixture of Experts (MoE)?

Mixture of Experts is an architecture where a router activates only a subset of the model's parameters per token, so total parameter count is huge but inference cost stays low.

What are the most common mistakes with Mixture of Experts (MoE)?

Treating total parameter count as compute cost: per-token compute depends on the active-per-token count, while memory still scales with the total. Assuming MoE is automatically better: dense models still lead at very small and very large scale.
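A worked example of the gap between total and active parameters, using a hypothetical configuration (the numbers are illustrative, not taken from any real model). It assumes shared weights such as attention and embeddings run for every token, uses the common rule of thumb of roughly 2 FLOPs per active parameter per token for a forward pass, and ignores attention cost that grows with context length.

```python
from dataclasses import dataclass


@dataclass
class MoEConfig:
    """Hypothetical MoE size description; all counts are summed over layers."""
    shared_params: int      # attention, embeddings, norms: used by every token
    params_per_expert: int  # one expert's weights across all MoE layers
    num_experts: int        # N experts available
    experts_per_token: int  # K experts the router selects per token

    def total_params(self) -> int:
        # Everything that must be stored and loaded into memory.
        return self.shared_params + self.num_experts * self.params_per_expert

    def active_params(self) -> int:
        # Everything that actually runs for one token.
        return self.shared_params + self.experts_per_token * self.params_per_expert


def weight_memory_bytes(cfg: MoEConfig, bytes_per_param: int) -> int:
    # Memory is set by TOTAL parameters: all experts stay resident.
    return cfg.total_params() * bytes_per_param


def approx_forward_flops_per_token(cfg: MoEConfig) -> int:
    # Rule of thumb: ~2 FLOPs (one multiply, one add) per ACTIVE parameter.
    return 2 * cfg.active_params()


cfg = MoEConfig(
    shared_params=2_000_000_000,
    params_per_expert=5_000_000_000,
    num_experts=8,
    experts_per_token=2,
)
print(cfg.total_params())                   # 42000000000
print(cfg.active_params())                  # 12000000000
print(weight_memory_bytes(cfg, 2))          # 84000000000 (16-bit weights)
print(approx_forward_flops_per_token(cfg))  # 24000000000
```

This hypothetical model needs memory for 42B parameters (about 84 GB of weights at 16 bits) but spends per-token compute comparable to a 12B dense model, which is why total parameter count is the wrong proxy for inference cost.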

Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/mixture-of-experts.md.