Mixture of Depths
Mixture of Depths (MoD) is an efficiency technique in which a learned router decides, token by token, which layers to apply and which to skip, so compute is spent selectively according to token importance.
Introduced by Google DeepMind in 2024 and adopted in production architectures by 2026, Mixture of Depths complements Mixture of Experts. Where MoE selects different experts per token, MoD selects different depths: easy tokens skip layers and pass through on the residual connection, while hard tokens get the full stack. The result is comparable quality at lower average compute per token. Combining MoD with MoE gives per-token routing across both expert and depth, which is part of how 2026 frontier models keep getting more capable without a proportional increase in compute.
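The skip-or-process decision can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it assumes a top-k router with a fixed per-layer capacity (one common formulation of MoD), and `mod_layer`, `toy_block`, and `router_weights` are hypothetical names chosen for the example.

```python
from typing import Callable, List

Vector = List[float]


def mod_layer(
    tokens: List[Vector],
    router_weights: Vector,
    block: Callable[[Vector], Vector],
    capacity: int,
) -> List[Vector]:
    """Apply `block` to the `capacity` highest-scoring tokens; the rest skip it."""
    # Router score (assumed form): dot product of each token with a learned vector.
    scores = [sum(w * x for w, x in zip(router_weights, tok)) for tok in tokens]

    # Indices of the top-`capacity` tokens by router score.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    selected = set(ranked[:capacity])

    out = []
    for i, tok in enumerate(tokens):
        if i in selected:
            # Processed path: residual plus the block output, scaled by the
            # router score so the router sits on the gradient path in training.
            update = block(tok)
            out.append([x + scores[i] * u for x, u in zip(tok, update)])
        else:
            # Skipped path: the token passes through on the residual unchanged.
            out.append(list(tok))
    return out


def toy_block(tok: Vector) -> Vector:
    # Hypothetical stand-in for the attention + MLP computation of a real layer.
    return [2.0 * x for x in tok]


tokens = [[1.0, 0.0], [0.0, 1.0], [3.0, 0.0], [0.0, 0.5]]
router_weights = [1.0, 0.0]
result = mod_layer(tokens, router_weights, toy_block, capacity=2)
```

With `capacity=2` out of four tokens, the block runs on only half of the sequence at this layer. Because the capacity is fixed ahead of time, the compute budget is known in advance even though the choice of tokens is dynamic.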
Common mistakes
- Conflating MoD with MoE: MoE decides which expert processes a token, while MoD decides whether a layer processes it at all. They are complementary, not interchangeable.
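To make the distinction concrete, the two routing decisions can be placed side by side. This is a sketch under simplifying assumptions (top-1 expert selection, top-k depth selection, hand-picked scores); `moe_route` and `mod_route` are hypothetical helper names, not functions from any library.

```python
from typing import List, Optional


def moe_route(expert_scores: List[List[float]]) -> List[int]:
    """MoE: every token is processed; the router picks WHICH expert (top-1 here)."""
    return [max(range(len(s)), key=lambda e: s[e]) for s in expert_scores]


def mod_route(depth_scores: List[float], capacity: int) -> List[bool]:
    """MoD: the router picks WHETHER each token is processed by this layer."""
    ranked = sorted(
        range(len(depth_scores)), key=lambda i: depth_scores[i], reverse=True
    )
    keep = set(ranked[:capacity])
    return [i in keep for i in range(len(depth_scores))]


# Illustrative scores for three tokens (made-up values).
expert_scores = [[0.1, 0.9], [0.7, 0.3], [0.2, 0.8]]  # one row per token, two experts
depth_scores = [0.5, 0.1, 0.9]                         # one score per token

experts = moe_route(expert_scores)      # one expert index per token
processed = mod_route(depth_scores, 2)  # one keep/skip flag per token

# Combined routing: None marks a token that skips the layer entirely;
# otherwise the entry is the expert that handles the token.
combined: List[Optional[int]] = [e if p else None for e, p in zip(experts, processed)]
```

The two routers answer different questions, which is why they compose: in the combined list, the second token skips the layer, and the other two are each sent to the expert the MoE router picked.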
Related terms
- Mixture of Experts (MoE) — Mixture of Experts is an architecture where a router activates only a subset of the model's parameters per token, so total parameter count is huge but inference cost stays low.
- Reasoning model — A reasoning model is an LLM trained to produce extensive internal chain-of-thought before its final answer, trading latency for higher accuracy on hard problems.
- Speculative decoding — Speculative decoding is an inference technique where a small "draft" model proposes several tokens at once and a large "verifier" model accepts or rejects them, cutting latency by 2-4x.
Sources
Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/mixture-of-depths.md.