Model tier
Model tier is the pricing and capability dimension along which a single cloud-LLM vendor ships a hierarchy of model sizes (Claude Haiku / Sonnet / Opus; GPT-4o-mini / GPT-4o / o3; Gemini Flash / Pro / Ultra), each at a different cost and quality level. Routing requests across tiers is central to production LLM cost optimization.
Frontier labs typically ship three or four model tiers per generation:
- Small / fast (the cheap tier): Haiku, GPT-4o-mini, Gemini Flash, Mistral Small.
- Medium: Sonnet, GPT-4o, Gemini Pro.
- Large / reasoning: Opus, o3, Gemini Ultra, DeepSeek R1.
Tiers differ by roughly 10-50× in price and 2-5× in quality. Production cost optimization works by routing each request to the cheapest tier that can handle it: cheap for classification, extraction, and routing decisions; medium for chat and most general tasks; large for hard reasoning and extended thinking.
Routing strategies:
- Rules-based: a fixed mapping from task type to tier (for example, always Sonnet for code, Haiku for classification).
- Learned: a small classifier picks the tier for each query.
- Confidence-based: start with the small tier and escalate when the answer is uncertain.
The [[Model router]] and [[router-llm]] patterns formalize this.
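The rules-based and confidence-based strategies can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the tier labels, the `RULES` table, the `call_model` callback, and the 0.8 threshold are all assumptions made for the example, and how a confidence score is obtained (log-probabilities, a verifier, self-reported uncertainty) is left to the caller.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tier labels, ordered cheapest to most expensive.
# Map them to real model IDs in your own configuration.
TIERS: List[str] = ["small", "medium", "large"]

# Rules-based routing: a fixed task-type -> tier table (illustrative values).
RULES: Dict[str, str] = {
    "classification": "small",
    "extraction": "small",
    "chat": "medium",
    "code": "medium",
    "hard_reasoning": "large",
}


def route_by_rule(task_type: str, default: str = "medium") -> str:
    """Return the tier for a task type, falling back to a default tier."""
    return RULES.get(task_type, default)


# A model call is modelled as (tier, prompt) -> (answer, confidence in [0, 1]).
# This callback is an assumption; plug in your own client and scoring.
ModelFn = Callable[[str, str], Tuple[str, float]]


def route_by_confidence(
    prompt: str,
    call_model: ModelFn,
    threshold: float = 0.8,
    start: str = "small",
) -> Tuple[str, str]:
    """Start at `start` and escalate one tier at a time while confidence is low.

    Returns (tier_used, answer). If no tier reaches the threshold, the
    answer from the largest tier is returned.
    """
    tier, answer = start, ""
    for tier in TIERS[TIERS.index(start):]:
        answer, confidence = call_model(tier, prompt)
        if confidence >= threshold:
            break
    return tier, answer
```

Note that confidence-based escalation pays for every tier it tries, so it only saves money when most requests are resolved at the first tier.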
When to use model tier
- Any production deployment that goes beyond a single model tier, that is, one whose requests vary enough in difficulty to justify routing.
Common mistakes
- Using the flagship tier for everything: an easy 10× cost overspend.
- Using the cheap tier for hard reasoning: quality collapses.
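To make the overspend concrete, the sketch below compares a routed traffic mix against sending everything to one tier. The relative prices (1 / 10 / 50) and the 70 / 25 / 5 traffic split are illustrative assumptions chosen to sit inside the 10-50× range mentioned above; they are not vendor pricing.

```python
from typing import Dict

# Assumed relative cost per request for each tier (illustrative only).
PRICE: Dict[str, float] = {"small": 1.0, "medium": 10.0, "large": 50.0}

# Assumed share of traffic routed to each tier (must sum to 1).
MIX: Dict[str, float] = {"small": 0.70, "medium": 0.25, "large": 0.05}


def blended_cost(mix: Dict[str, float], price: Dict[str, float]) -> float:
    """Average cost per request for a given traffic mix across tiers."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    return sum(share * price[tier] for tier, share in mix.items())


def overspend_factor(
    mix: Dict[str, float], price: Dict[str, float], single_tier: str
) -> float:
    """How many times more a single-tier deployment costs than the routed mix."""
    return price[single_tier] / blended_cost(mix, price)


if __name__ == "__main__":
    print(f"routed cost per request: {blended_cost(MIX, PRICE):.2f}")
    print(f"all-large overspend: {overspend_factor(MIX, PRICE, 'large'):.1f}x")
```

Under these assumed numbers the routed mix costs about 5.7 units per request against 50 for an all-large deployment, an overspend of roughly 8.8×. The figure says nothing about quality, which has to be measured separately for each tier and task.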
FAQ
What is model tier?
Model tier is the pricing and capability dimension along which a single cloud-LLM vendor ships a hierarchy of model sizes (Claude Haiku / Sonnet / Opus; GPT-4o-mini / GPT-4o / o3; Gemini Flash / Pro / Ultra), each at a different cost and quality level. Routing requests across tiers is central to production LLM cost optimization.
When should I use model tier?
In any production deployment that goes beyond a single model tier, that is, one whose requests vary enough in difficulty to justify routing.
What are the most common mistakes with model tier?
Using the flagship tier for everything, which is an easy 10× cost overspend, and using the cheap tier for hard reasoning, where quality collapses.
Related terms
- Cheap-tier model: A cheap-tier model is the small, fast LLM that each major provider ships alongside its frontier model (Claude Haiku, GPT-4o-mini, Gemini Flash, Mistral Small, DeepSeek V3), used for routing, classification, extraction, and bulk inference.
- Reasoning model: A reasoning model is an LLM trained to produce extensive internal chain-of-thought before its final answer, trading latency for higher accuracy on hard problems.
- Model router: A model router picks which language model handles each request based on cost, latency, or task type. It is the standard production pattern in 2026.
Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/model-tier.md.