# Mixture of Experts (MoE)

**Source:** https://promtable.com/glossary/mixture-of-experts

> Mixture of Experts is an architecture where a router activates only a subset of the model's parameters per token, so total parameter count is huge but inference cost stays low.

---

An MoE model contains N expert sub-networks plus a router that selects K of them (usually 2–8) for each token. Total parameter count can exceed 1T, but only the selected experts run in a forward pass, so per-token compute is far lower than in a dense model of the same total size. Mixtral 8x22B, DBRX, Qwen2-MoE, DeepSeek V3, and Llama 4 Maverick are among the best-known open-weight MoE models. MoE models dominate the Pareto frontier of quality-per-dollar for open-weight inference. Routing failures are the main engineering challenge: expert collapse, where the router sends nearly all tokens to a few experts, and load imbalance across experts.
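
The routing step described above can be sketched in a few lines. This is a minimal illustration, not any particular model's implementation: the names `route_top_k`, `moe_forward`, and `expert_load` are hypothetical, the experts are toy scalar functions, and renormalizing the softmax over only the selected K experts is one common choice that is assumed here.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Pick the k highest-scoring experts and return [(expert_index, weight), ...].

    Weights are a softmax over the selected logits only (an assumption of
    this sketch), so they sum to 1.
    """
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, experts, router, k):
    """Run only the k selected experts and mix their outputs by router weight."""
    out = 0.0
    for idx, weight in route_top_k(router(x), k):
        out += weight * experts[idx](x)
    return out


def expert_load(per_token_logits, num_experts, k):
    """Fraction of routing assignments that land on each expert.

    A heavily skewed result is a symptom of imbalance or expert collapse.
    """
    counts = [0] * num_experts
    for logits in per_token_logits:
        for idx, _ in route_top_k(logits, k):
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * num_experts


# Toy setup: 4 scalar "experts" and a router that always prefers experts 2 and 3.
experts = [lambda x: x, lambda x: 2 * x, lambda x: 3 * x, lambda x: 4 * x]
router = lambda x: [0.0, 0.0, 5.0, 5.0]
print(moe_forward(2.0, experts, router, k=2))
```

In this toy setup only two of the four experts are ever evaluated per input, which is the source of the compute savings. Feeding a batch of router logits to `expert_load` gives a quick view of whether routing is balanced.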

## Common mistakes

- Treating total parameter count as compute cost. Per-token compute tracks the active parameter count, not the total.
- Assuming MoE is automatically better. Dense models still lead at very small and very large scale.
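
The first mistake is easier to avoid with the arithmetic written out. The sketch below uses a made-up configuration (the function `moe_param_counts` and every number are hypothetical, not figures for any real model) and ignores the router's own small parameter count.

```python
def moe_param_counts(shared_params, params_per_expert, num_experts,
                     experts_per_token, num_moe_layers):
    """Return (total, active_per_token) parameter counts for a simple MoE.

    shared_params covers everything that always runs (attention, embeddings).
    Router parameters are ignored for simplicity.
    """
    total = shared_params + num_moe_layers * num_experts * params_per_expert
    active = shared_params + num_moe_layers * experts_per_token * params_per_expert
    return total, active


# Hypothetical configuration, chosen only to make the arithmetic easy to follow.
total, active = moe_param_counts(
    shared_params=2_000_000_000,
    params_per_expert=100_000_000,
    num_experts=64,
    experts_per_token=4,
    num_moe_layers=30,
)
print(f"total:  {total / 1e9:.0f}B parameters")
print(f"active: {active / 1e9:.0f}B parameters per token")
print(f"active fraction: {active / total:.1%}")
```

With these assumed numbers the model stores 194B parameters but uses only 14B of them for any single token.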

## Related terms

- [reasoning-model](https://promtable.com/glossary/reasoning-model)
- [model-router](https://promtable.com/glossary/model-router)
- [context-window](https://promtable.com/glossary/context-window)

*Last updated: 2026-06-01*

---

Maintained by Promtable (https://promtable.com). Content: CC BY 4.0. Cite as "Promtable — https://promtable.com/glossary/mixture-of-experts".
Contact: info@vibecodingturkey.com.