Model tier

Model tier is the cloud-LLM pricing dimension in which one vendor ships a hierarchy of model sizes (Claude Haiku / Sonnet / Opus; GPT-4o-mini / GPT-4o / o3; Gemini Flash / Pro / Ultra) at different price and quality points. Routing requests across tiers is a core technique in production LLM cost optimization.

Frontier labs typically ship three or four model tiers per generation: small and fast (the cheap tier: Haiku, GPT-4o-mini, Gemini Flash, Mistral Small), medium (Sonnet, GPT-4o, Gemini Pro), and large reasoning models (Opus, o3, Gemini Ultra, DeepSeek R3). Prices differ across tiers by roughly 10-50×, while quality differs by roughly 2-5×.

Production cost optimization therefore routes each request to the cheapest adequate tier: the cheap tier for classification, extraction, and routing decisions; the medium tier for chat and most other tasks; the large tier for hard reasoning and extended thinking. Common routing strategies are rules-based (e.g. always Sonnet for code, Haiku for classification), learned (a small classifier picks the tier per query), and confidence-based (start with the small tier and escalate when the model is uncertain). The [[Model router]] and [[router-llm]] patterns formalize this.
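A minimal sketch of two of these strategies, rules-based and confidence-based routing, in Python. The tier labels (`small`, `medium`, `large`), the `RULES` table, the 0.8 threshold and the `call_model` client are all hypothetical: map the labels to your vendor's real model IDs, and note that how a confidence score is obtained (log-probabilities, a self-rating, a separate verifier) is an assumption left to the caller. The learned strategy is omitted because it needs a trained classifier.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical tier labels, cheapest first; map them to real model IDs.
TIERS = ["small", "medium", "large"]

# Rules-based routing: a fixed task-type -> tier table (illustrative only).
RULES = {
    "classification": "small",
    "extraction": "small",
    "routing": "small",
    "chat": "medium",
    "code": "medium",
    "hard_reasoning": "large",
}


def route_by_rule(task_type: str, default: str = "medium") -> str:
    """Return the tier for a known task type, else the default tier."""
    return RULES.get(task_type, default)


@dataclass
class Result:
    tier: str
    answer: str
    confidence: float


# Hypothetical client signature: (tier, prompt) -> (answer, confidence in [0, 1]).
ModelClient = Callable[[str, str], tuple[str, float]]


def route_with_escalation(prompt: str, call_model: ModelClient,
                          threshold: float = 0.8) -> Result:
    """Confidence-based routing: start at the cheapest tier and escalate
    one tier at a time until the reported confidence meets `threshold`."""
    answer, confidence = "", 0.0
    for tier in TIERS:
        answer, confidence = call_model(tier, prompt)
        if confidence >= threshold:
            return Result(tier, answer, confidence)
    # Even the largest tier was uncertain; return its answer anyway.
    return Result(TIERS[-1], answer, confidence)


# Usage with a stubbed client standing in for real API calls.
def fake_client(tier: str, prompt: str) -> tuple[str, float]:
    scores = {"small": 0.55, "medium": 0.90, "large": 0.97}
    return f"{tier} answer", scores[tier]


print(route_by_rule("classification"))                             # small
print(route_with_escalation("Summarize this.", fake_client).tier)  # medium
```

Escalation trades latency for cost: an uncertain request pays for every tier it passes through, so it works best when most traffic is resolved at the cheap tier.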

FAQ

When should I use model tier?

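In any production system that uses more than a single tier, that is, whenever requests can be routed between cheaper and more capable models instead of all going to one model.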

What are the most common mistakes with model tier?

Sending every request to the flagship tier, which can easily cost 10× more than necessary, and sending hard reasoning tasks to the cheap tier, where quality collapses.
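The 10× figure is easy to reproduce with back-of-envelope arithmetic. The sketch below assumes hypothetical per-token prices (a 50× spread between the cheap and large tiers, blending input and output rates) and a hypothetical traffic mix in which most tokens are simple enough for the cheap tier; real prices, separate input/output rates, and your own mix will change the result.

```python
import math

# Hypothetical prices in USD per 1M tokens, blending input and output;
# illustrative only, not any vendor's real rates (here a 50x spread).
PRICE_PER_MTOK = {"small": 0.30, "medium": 3.00, "large": 15.00}


def monthly_cost(mix: dict[str, float], mtok_per_month: float) -> float:
    """Cost of a month of traffic, given the share of tokens sent to each tier."""
    if not math.isclose(sum(mix.values()), 1.0):
        raise ValueError("tier shares must sum to 1")
    return sum(share * mtok_per_month * PRICE_PER_MTOK[tier]
               for tier, share in mix.items())


# 100M tokens/month, all sent to the flagship vs. a hypothetical routed mix.
flagship_only = monthly_cost({"large": 1.0}, mtok_per_month=100)
routed = monthly_cost({"small": 0.80, "medium": 0.15, "large": 0.05},
                      mtok_per_month=100)

print(f"flagship only: ${flagship_only:,.0f}")          # flagship only: $1,500
print(f"routed:        ${routed:,.0f}")                 # routed:        $144
print(f"overspend:     {flagship_only / routed:.1f}x")  # overspend:     10.4x
```

The saving comes almost entirely from the share of traffic moved off the large tier, so measuring what fraction of requests a cheaper tier can actually handle matters more than the exact prices.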

Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/model-tier.md.