Mixture of Depths

Mixture of Depths (MoD) is an efficiency technique in which a transformer learns, layer by layer, which tokens to process and which to route around the layer, so that compute is spent on the tokens that need it most.

Introduced by Google DeepMind in 2024 and adopted in production architectures by 2026, Mixture of Depths complements Mixture of Experts (MoE). Where MoE selects different experts per token, MoD selects different depths: a learned router lets easy tokens skip a layer through the residual connection, while hard tokens pass through the full stack. The result is matched quality at lower average compute per token. Combined with MoE, a model routes each token across both experts and depth, which is part of how 2026 frontier models keep getting more capable without proportionally more compute.
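The routing idea can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the names `mod_block`, `router_weights`, `capacity`, and `toy_block` are hypothetical, the router is assumed to be a single dot product per token, and the top-k selection with a fixed capacity is one common way to decide which tokens a layer processes.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def mod_block(
    tokens: Sequence[Vector],
    router_weights: Vector,
    block: Callable[[Vector], Vector],
    capacity: int,
) -> List[Vector]:
    """Run `block` on the `capacity` highest-scoring tokens; skip the rest."""
    if not 0 <= capacity <= len(tokens):
        raise ValueError("capacity must be between 0 and the number of tokens")

    # Router (assumed form): one scalar score per token via a dot product.
    scores = [sum(w * x for w, x in zip(router_weights, tok)) for tok in tokens]

    # Top-k selection: keep the indices of the `capacity` largest scores.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    selected = set(ranked[:capacity])

    out: List[Vector] = []
    for i, tok in enumerate(tokens):
        if i in selected:
            # Processed token: residual plus the block output scaled by the
            # router score, which keeps the router on the gradient path.
            update = block(tok)
            out.append([x + scores[i] * u for x, u in zip(tok, update)])
        else:
            # Skipped token: only the residual connection, no block compute.
            out.append(list(tok))
    return out


def toy_block(v: Vector) -> Vector:
    """Hypothetical stand-in for an attention + MLP block."""
    return [1.0 for _ in v]


tokens = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
out = mod_block(tokens, router_weights=[1.0, 0.0], block=toy_block, capacity=1)
print(out)  # [[1.0, 0.0], [0.0, 1.0], [4.0, 2.0]]
```

With `capacity=1` and three tokens, the block runs on only one token in three; the other two pass through unchanged. In a real model the capacity would be a fraction of the sequence length chosen per layer, and `block` would be the layer's attention and MLP computation.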

Common mistakes

FAQ

What is mixture of depths?

Mixture of Depths (MoD) is an efficiency technique in which a transformer learns, layer by layer, which tokens to process and which to route around the layer, so that compute is spent on the tokens that need it most.

What are the most common mistakes with mixture of depths?

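Conflating MoD with MoE. MoE chooses which expert processes a token; MoD chooses whether a given layer processes the token at all. The two are complementary, not interchangeable.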

Sources

Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/mixture-of-depths.md.