# Model tier

**Source:** https://promtable.com/glossary/model-tier

> Model tier is the cloud-LLM pricing dimension in which a single vendor ships a hierarchy of model sizes (Claude Haiku / Sonnet / Opus; GPT-4o-mini / GPT-4o / o3; Gemini Flash / Pro / Ultra) at different cost and quality levels. Routing across tiers is core to production LLM cost optimization.

---

Frontier labs typically ship three or four model tiers per generation:

- Small / fast (the cheap tier): Haiku, GPT-4o-mini, Gemini Flash, Mistral Small.
- Medium: Sonnet, GPT-4o, Gemini Pro.
- Large / reasoning: Opus, o3, Gemini Ultra, DeepSeek R1.

Tiers differ by roughly 10-50× in price and by a much smaller margin (on the order of 2-5×, depending on the task and benchmark) in quality. Production cost optimization works by routing across tiers: the cheap tier for classification, extraction, and routing decisions; the medium tier for chat and most other tasks; the large tier for hard reasoning and extended thinking.

Routing strategies:

- Rules-based: a fixed mapping, e.g. always Sonnet for code and Haiku for classification.
- Learned: a small classifier picks the tier per query.
- Confidence-based: start on the small tier and escalate on uncertainty.

The [[Model router]] and [[router-llm]] patterns formalize this.
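As a rough illustration of two of these strategies, the sketch below implements rules-based routing and confidence-based escalation. Everything in it is hypothetical: the tier labels, the `RULES` table, the `Answer` type, the `call_model` callback, and the stub `fake_call_model` stand in for whatever client and confidence signal a real system would use. A learned router would replace `route_by_rules` with a trained classifier and is not shown.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical tier labels, ordered from cheapest to most capable.
TIERS = ["small", "medium", "large"]

# Rules-based routing: a fixed task-type -> tier table (illustrative values only).
RULES = {
    "classification": "small",
    "extraction": "small",
    "chat": "medium",
    "code": "medium",
    "hard_reasoning": "large",
}


def route_by_rules(task_type: str, default: str = "medium") -> str:
    """Return the tier for a task type, falling back to a default tier."""
    return RULES.get(task_type, default)


@dataclass
class Answer:
    text: str
    confidence: float  # assumed to lie in [0, 1]; how it is estimated is up to the caller
    tier: str


def route_by_confidence(
    prompt: str,
    call_model: Callable[[str, str], Answer],
    threshold: float = 0.8,
) -> Answer:
    """Start on the smallest tier and escalate while confidence is below threshold."""
    answer = None
    for tier in TIERS:
        answer = call_model(tier, prompt)
        if answer.confidence >= threshold:
            break
    # If no tier clears the threshold, this is the largest tier's answer.
    return answer


def fake_call_model(tier: str, prompt: str) -> Answer:
    """Stub standing in for a real API call; the confidences are made up."""
    confidence = {"small": 0.5, "medium": 0.9, "large": 0.99}[tier]
    return Answer(text=f"[{tier}] reply to: {prompt}", confidence=confidence, tier=tier)


if __name__ == "__main__":
    print(route_by_rules("classification"))
    print(route_by_confidence("Summarize this ticket", fake_call_model).tier)
```

Note that escalation pays for every tier it tries, so it only saves money when most queries stop at the small tier.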

## When to use

- Any production workload whose requests vary enough in difficulty that a single tier would be either too expensive or too weak.

## Common mistakes

- Using the flagship tier for everything, which can easily mean overspending by around 10×.
- Using the cheap tier for hard reasoning, where quality collapses.
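
To make the overspend concrete, here is a small worked calculation. The prices and the traffic mix are invented for illustration (chosen only to fall within the 10-50× spread mentioned earlier) and are not real vendor pricing. The sketch also ignores the usual split between input and output token prices.

```python
# Hypothetical prices in USD per million tokens; not real vendor pricing.
PRICE_PER_MTOK = {"small": 1.0, "medium": 10.0, "large": 50.0}


def blended_price(mix: dict[str, float]) -> float:
    """Average price per million tokens for a traffic mix whose shares sum to 1."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    return sum(PRICE_PER_MTOK[tier] * share for tier, share in mix.items())


# Everything on the flagship tier versus an assumed 60/30/10 routed mix.
all_flagship = blended_price({"large": 1.0})
routed = blended_price({"small": 0.6, "medium": 0.3, "large": 0.1})

print(f"all flagship: ${all_flagship:.2f} per million tokens")
print(f"routed:       ${routed:.2f} per million tokens")
print(f"savings:      {all_flagship / routed:.1f}x")
```

With these made-up numbers the routed mix costs about $8.60 per million tokens against $50.00 for flagship-only, roughly a 5.8× difference. The actual ratio depends entirely on real prices and on how much traffic can safely go to the cheaper tiers.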

## Related terms

- [cheap-tier-model](https://promtable.com/glossary/cheap-tier-model)
- [reasoning-model](https://promtable.com/glossary/reasoning-model)
- [model-router](https://promtable.com/glossary/model-router)

*Last updated: 2026-06-01*

---

Maintained by Promtable (https://promtable.com). Content: CC BY 4.0. Cite as "Promtable — https://promtable.com/glossary/model-tier".
Contact: info@vibecodingturkey.com.