# LLM observability

**Source:** https://promtable.com/glossary/llm-observability

> LLM observability is the production discipline of capturing requests, responses, latencies, costs, and outcomes across LLM-driven systems — the prerequisite for debugging, evaluating, and optimizing AI features in 2026.

---

Traditional APM tooling (Datadog, New Relic) does not, by default, capture what matters for LLM apps: token counts, prompts, completions, tool calls, and eval scores. LLM observability tools (Langfuse, Helicone, Braintrust, LangSmith, Arize Phoenix, OpenLLMetry) record, per request:

- full prompts and completions
- model and version
- latency, time to first token (TTFT), and tokens per second (TPS)
- token usage and cost
- tool calls and their return values
- multi-step trace correlation
- user feedback signals (thumbs up / down)
- eval scores

The data flows into dashboards (cost per user, error rate by model), datasets (curated from production traffic for evals), and alerts (cost spikes, hallucination rate). Without observability, prompt regressions ship silently. With it, each prompt or model change can be checked against recorded production data.
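The per-request fields described above can be sketched as a single record type. This is a minimal illustration rather than the schema of any named tool: `LLMCallRecord`, `PRICE_PER_1K`, `cost_per_user`, the model name, and the prices are all hypothetical, tool calls are left out for brevity, and TPS is computed here as output tokens divided by the time after the first token, which is one common convention among several.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional

# Hypothetical USD prices per 1,000 tokens; real prices vary by provider and model.
PRICE_PER_1K: Dict[str, Dict[str, float]] = {
    "example-model": {"input": 0.001, "output": 0.002},
}


@dataclass
class LLMCallRecord:
    """One logged LLM request (illustrative schema, not a vendor format)."""
    trace_id: str          # correlates the steps of a multi-step trace
    model: str             # model name plus version
    prompt: str
    completion: str
    input_tokens: int
    output_tokens: int
    latency_s: float       # total wall-clock time for the request
    ttft_s: float          # time to first token
    user_id: Optional[str] = None
    feedback: Optional[str] = None  # e.g. "up" or "down"
    eval_scores: Dict[str, float] = field(default_factory=dict)

    @property
    def tokens_per_second(self) -> float:
        # Assumed definition: output tokens over the generation phase only.
        generation_time = self.latency_s - self.ttft_s
        return self.output_tokens / generation_time if generation_time > 0 else 0.0

    @property
    def cost_usd(self) -> float:
        price = PRICE_PER_1K[self.model]
        return (self.input_tokens * price["input"]
                + self.output_tokens * price["output"]) / 1000


def cost_per_user(records: Iterable[LLMCallRecord]) -> Dict[str, float]:
    """Aggregate cost by user, the kind of figure a dashboard would plot."""
    totals: Dict[str, float] = defaultdict(float)
    for record in records:
        totals[record.user_id or "anonymous"] += record.cost_usd
    return dict(totals)
```

Keeping cost and throughput as derived properties means a price change or a different TPS definition can be applied to already-stored records without re-logging them.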

## When to use

- Any production LLM feature: treat it as a baseline requirement, not an optional add-on.
- Pre-production: capture development runs so they can be replayed in offline evals.
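Capturing development runs for offline evals can be as simple as appending each run to a JSON Lines file. The sketch below assumes a local file is acceptable during development; `append_run`, `load_runs`, and the row fields are hypothetical names, not the API of any observability tool.

```python
import json
import time
from pathlib import Path
from typing import Dict, List, Optional, Union

PathLike = Union[str, Path]


def append_run(path: PathLike, prompt: str, completion: str, model: str,
               tags: Optional[List[str]] = None) -> Dict:
    """Append one dev run as a JSON line so it can be replayed in offline evals."""
    row = {
        "timestamp": time.time(),
        "model": model,
        "prompt": prompt,
        "completion": completion,
        "tags": tags or [],
    }
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(row, ensure_ascii=False) + "\n")
    return row


def load_runs(path: PathLike) -> List[Dict]:
    """Read every captured run back as a list of dicts, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

An append-only file keeps capture cheap enough to leave switched on during development, and the same rows can later be imported into a hosted tool's dataset feature.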

## Common mistakes

- Logging prompts but not completions — half the data is missing.
- Skipping user feedback signals — the ground truth for prod quality.
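Both mistakes above can be detected by auditing the logs themselves. The following sketch assumes each logged record is a dict with `trace_id`, `completion`, and `feedback` keys; those field names and the `audit_logs` helper are illustrative assumptions.

```python
from typing import Dict, List


def audit_logs(records: List[Dict]) -> Dict:
    """Report records that lack a completion and the share that carry user feedback."""
    missing_completion = [r["trace_id"] for r in records if not r.get("completion")]
    with_feedback = sum(1 for r in records if r.get("feedback") in ("up", "down"))
    coverage = with_feedback / len(records) if records else 0.0
    return {
        "missing_completion": missing_completion,
        "feedback_coverage": coverage,
    }


report = audit_logs([
    {"trace_id": "a", "completion": "ok", "feedback": "up"},
    {"trace_id": "b", "completion": "", "feedback": None},
    {"trace_id": "c", "completion": "ok", "feedback": "down"},
    {"trace_id": "d", "completion": "ok"},
])
print(report)
```

Run on a schedule, a check like this turns a silent logging gap into a visible number that can be alerted on.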

## Related terms

- [agent-tracing](https://promtable.com/glossary/agent-tracing)
- [evals](https://promtable.com/glossary/evals)
- [evals-driven-development](https://promtable.com/glossary/evals-driven-development)

*Last updated: 2026-06-01*

---

Maintained by Promtable (https://promtable.com). Content: CC BY 4.0. Cite as "Promtable — https://promtable.com/glossary/llm-observability".
Contact: info@vibecodingturkey.com.