Cost attribution
Cost attribution is the FinOps discipline of tracking LLM spend per user, per feature, per tenant, and per model. It is the foundation for unit economics, abuse detection, and pricing decisions in 2026 AI products.
LLM bills can spike unpredictably: a single prompt-injected agent can burn $1K in an hour, and one viral feature can multiply the bill tenfold. Cost attribution tags every LLM call with metadata (user_id, tenant_id, feature, model, session) so the platform can answer four questions: which features cost the most, which users are unprofitable, which tenants need rate-limiting, and which models give the best quality per dollar.
Implementations fall into three groups: observability tools (Langfuse, Helicone, Braintrust) capture cost metadata; gateways and proxies (LiteLLM, Portkey, OpenRouter) inject tags; and OpenTelemetry span attributes carry the tags through existing tracing pipelines. Standard reports include cost per active user, cost per feature, cost by model, top-N spenders, and week-over-week deltas. Without cost attribution, AI features tend to hit unit-economics walls once scale arrives.
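The tagging and roll-up described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any vendor's API: the `LLMCall` record, the model names, and the `PRICES_PER_MTOK` table with its per-million-token prices are hypothetical placeholders that you would replace with your provider's actual price list.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical USD prices per million tokens: (input, output). Placeholder values.
PRICES_PER_MTOK = {
    "model-small": (0.50, 1.50),
    "model-large": (3.00, 15.00),
}


@dataclass(frozen=True)
class LLMCall:
    """One LLM request, tagged with the attribution metadata."""
    user_id: str
    tenant_id: str
    feature: str
    model: str
    session: str
    input_tokens: int
    output_tokens: int


def call_cost(call: LLMCall) -> float:
    """Price a single call from its token counts and the model's price pair."""
    in_price, out_price = PRICES_PER_MTOK[call.model]
    return (call.input_tokens * in_price + call.output_tokens * out_price) / 1_000_000


def cost_by(calls, key: str) -> dict[str, float]:
    """Sum cost grouped by any tag: 'user_id', 'tenant_id', 'feature', 'model'."""
    totals = defaultdict(float)
    for call in calls:
        totals[getattr(call, key)] += call_cost(call)
    return dict(totals)


def top_spenders(calls, n: int = 3):
    """Return the n users with the highest total cost, highest first."""
    per_user = cost_by(calls, "user_id")
    return sorted(per_user.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Made-up sample traffic for illustration only.
calls = [
    LLMCall("u1", "t1", "chat", "model-large", "s1", 1_000_000, 200_000),
    LLMCall("u2", "t1", "summarize", "model-small", "s2", 2_000_000, 1_000_000),
    LLMCall("u1", "t2", "chat", "model-small", "s3", 400_000, 200_000),
]

print(cost_by(calls, "feature"))
print(top_spenders(calls, n=1))
```

Because every record carries all five tags, the same `cost_by` function answers the per-feature, per-tenant, and per-model questions without a separate pipeline for each.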
When to use cost attribution
- Any AI product with non-trivial LLM spend.
- Multi-tenant SaaS, where per-tenant cost data is needed for accurate billing.
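For the multi-tenant case, a per-tenant statement might be assembled roughly as follows. This is a sketch, assuming usage records have already been priced upstream; the tenant names, the costs, and the `budget_usd` threshold are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical, already-priced usage records: (tenant_id, feature, cost_usd).
usage = [
    ("acme", "chat", 12.40),
    ("acme", "search", 3.10),
    ("globex", "chat", 0.75),
    ("initech", "agent", 48.00),
]


def tenant_statement(records):
    """Build {tenant_id: {feature: cost_usd}} so each tenant gets itemized lines."""
    statement = defaultdict(lambda: defaultdict(float))
    for tenant_id, feature, cost_usd in records:
        statement[tenant_id][feature] += cost_usd
    return {tenant: dict(lines) for tenant, lines in statement.items()}


def over_budget(records, budget_usd: float):
    """List tenants whose total spend exceeds the budget (rate-limit candidates)."""
    totals = defaultdict(float)
    for tenant_id, _feature, cost_usd in records:
        totals[tenant_id] += cost_usd
    return sorted(t for t, total in totals.items() if total > budget_usd)


print(tenant_statement(usage)["acme"])
print(over_budget(usage, budget_usd=15.0))
```

Keeping the feature dimension inside each tenant's statement lets the same data serve both invoicing and the question of which tenants need rate-limiting.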
Common mistakes
- Aggregating at the API key level only, which hides per-feature and per-user breakdowns.
- Skipping output token attribution. Output tokens are priced several times higher than input tokens on Claude / GPT-4-tier models, so they often dominate cost.
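A small worked example shows why output tokens deserve their own attribution. It assumes an illustrative price pair of $3 per million input tokens and $15 per million output tokens; these numbers and the `cost_split` helper are hypothetical and only stand in for a model whose output is priced higher than its input.

```python
def cost_split(input_tokens: int, output_tokens: int,
               in_price_per_mtok: float, out_price_per_mtok: float) -> dict:
    """Split one call's cost into input and output parts, plus the output share."""
    input_cost = input_tokens * in_price_per_mtok / 1_000_000
    output_cost = output_tokens * out_price_per_mtok / 1_000_000
    total = input_cost + output_cost
    return {
        "input_usd": input_cost,
        "output_usd": output_cost,
        # Guard against division by zero for empty calls.
        "output_share": output_cost / total if total else 0.0,
    }


# A call with twice as many input tokens as output tokens (assumed prices).
split = cost_split(2_000, 1_000, in_price_per_mtok=3.00, out_price_per_mtok=15.00)
print(f"output share of cost: {split['output_share']:.0%}")
```

Under these assumed prices the call costs $0.006 for input and $0.015 for output, so output accounts for about 71% of the cost even though it is only a third of the tokens. A report that counts total tokens without separating the two would misstate which features are expensive.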
Related terms
- LLM observability — LLM observability is the production discipline of capturing requests, responses, latencies, costs, and outcomes across LLM-driven systems — the prerequisite for debugging, evaluating, and optimizing AI features in 2026.
- Rate limit — A rate limit is a hard cap on how many requests or tokens an API will accept from a single client in a given time window — the single most common production failure mode for LLM apps.
- Model router — A model router picks which language model handles each request based on cost, latency, or task type — the standard production pattern in 2026.
Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/cost-attribution.md.