# Throughput per dollar

**Source:** https://promtable.com/glossary/throughput-per-dollar

> Throughput per dollar is a production metric for LLM inference cost efficiency: tokens served per second divided by the GPU cost of serving them (equivalently, tokens per dollar of GPU time). It is used to compare inference engines, serving platforms, and hardware in 2026.

---
A single metric, whether throughput (tokens/s) or cost ($/1M tokens), underspecifies production performance. Throughput per dollar captures the trade-off: an engine on cheaper hardware can beat an engine on premium hardware when both achieve the same raw tokens/s, because each token costs less to serve. In 2026, inference engine and platform comparisons (vLLM vs TGI vs SGLang vs TensorRT-LLM, Groq vs Together vs Fireworks) are typically reported in throughput-per-dollar terms because that is what production teams actually care about. Improvements such as PagedAttention and continuous batching are measured this way.
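The trade-off can be made concrete with a small calculation. The sketch below assumes throughput per dollar means measured tokens per second divided by the hourly GPU price, converted to tokens per dollar. The `Setup` class, the setup names, and all numbers are hypothetical illustrations, not benchmark results.

```python
from dataclasses import dataclass

SECONDS_PER_HOUR = 3600


@dataclass(frozen=True)
class Setup:
    name: str
    tokens_per_second: float  # measured aggregate throughput (hypothetical here)
    dollars_per_hour: float   # hourly GPU cost (hypothetical here)


def tokens_per_dollar(setup: Setup) -> float:
    """Tokens served for each dollar of GPU time."""
    return setup.tokens_per_second * SECONDS_PER_HOUR / setup.dollars_per_hour


def dollars_per_million_tokens(setup: Setup) -> float:
    """The same quantity expressed as a cost per 1M tokens."""
    return 1_000_000 / tokens_per_dollar(setup)


# Two made-up setups with identical raw throughput but different hardware prices.
setup_a = Setup("engine-a-on-cheaper-gpu", tokens_per_second=2000.0, dollars_per_hour=2.0)
setup_b = Setup("engine-b-on-premium-gpu", tokens_per_second=2000.0, dollars_per_hour=4.0)

for s in (setup_a, setup_b):
    print(f"{s.name}: {tokens_per_dollar(s):,.0f} tokens/$, "
          f"${dollars_per_million_tokens(s):.2f} per 1M tokens")
```

With these assumed numbers both setups report the same tokens/s, yet the first serves twice as many tokens per dollar, which is the difference a raw throughput comparison would hide.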

## When to use

- Comparing inference engines.
- Sizing production GPU fleets.
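
For fleet sizing, one rough approach is to divide the target throughput by the measured per-GPU throughput and round up. The sketch below assumes throughput scales linearly with GPU count, which real deployments may not satisfy. The function name `size_fleet`, the `headroom` parameter, and the numbers are hypothetical.

```python
import math


def size_fleet(
    target_tokens_per_second: float,
    tokens_per_second_per_gpu: float,
    dollars_per_gpu_hour: float,
    headroom: float = 0.0,
) -> tuple[int, float]:
    """Return (gpu_count, fleet_dollars_per_hour) for a target load.

    Assumes linear scaling across GPUs; headroom is a fractional
    safety margin (0.25 means 25% spare capacity).
    """
    if tokens_per_second_per_gpu <= 0:
        raise ValueError("per-GPU throughput must be positive")
    required = target_tokens_per_second * (1 + headroom)
    gpu_count = math.ceil(required / tokens_per_second_per_gpu)
    return gpu_count, gpu_count * dollars_per_gpu_hour


# Hypothetical load: 10,000 tokens/s with 25% headroom on GPUs that
# each sustain 1,500 tokens/s at $2.50 per hour.
gpus, hourly_cost = size_fleet(10_000, 1_500, 2.5, headroom=0.25)
print(f"{gpus} GPUs, ${hourly_cost:.2f}/hour")
```

Running the same function with each candidate engine's measured per-GPU throughput and price gives a like-for-like hourly fleet cost.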

## Common mistakes

- Comparing tokens/s without controlling for hardware cost.
- Ignoring quality differences: cheap, fast, low-quality output isn't a win.

## Related terms

- [batched-inference](https://promtable.com/glossary/batched-inference)
- [paged-attention](https://promtable.com/glossary/paged-attention)
- [managed-service](https://promtable.com/glossary/managed-service)

*Last updated: 2026-06-01*

---

Original page: https://promtable.com/glossary/throughput-per-dollar
Maintained by Promtable (https://promtable.com). Content: CC BY 4.0. Cite as "Promtable — https://promtable.com/glossary/throughput-per-dollar".
Contact: info@vibecodingturkey.com.