concept

Throughput per dollar

Throughput per dollar is the production metric for LLM inference cost — tokens served per second of compute time per dollar of GPU cost — used to compare inference engines, serving platforms, and hardware in 2026.

Single-metric throughput (tokens/s) or single-metric cost ($/1M tokens) underspecifies production performance. Throughput per dollar captures the trade-off: a faster engine on cheaper hardware can beat a slower engine on premium hardware, even if both achieve the same raw tokens/s. In 2026 inference engine comparisons (vLLM vs TGI vs sglang vs TensorRT-LLM, Groq vs Together vs Fireworks) are typically reported in throughput-per-dollar terms because that's what production teams actually care about. PagedAttention and continuous batching are improvements measured this way.

When to use throughput per dollar

Common mistakes

FAQ

What is throughput per dollar?

Throughput per dollar is the production metric for LLM inference cost — tokens served per second of compute time per dollar of GPU cost — used to compare inference engines, serving platforms, and hardware in 2026.

When should I use throughput per dollar?

Comparing inference engines. Sizing production GPU fleets.

What are the most common mistakes with throughput per dollar?

Comparing tokens/s without controlling for hardware cost. Ignoring quality differences — cheap fast bad output isn't a win.

Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/throughput-per-dollar.md.