Prompt replay
Prompt replay is the debugging technique of re-running a captured production prompt against the same or different model to reproduce a failure or test a fix — the LLM equivalent of replaying a stack trace.
When a prod LLM call returns the wrong answer, you don't just want logs — you want to re-run the exact request to test fixes. Prompt replay captures the full request payload (system prompt, messages, tool definitions, settings) and lets you re-execute it against the original model, a different model, or a modified prompt. Built into Langfuse, Braintrust, Helicone, LangSmith. Production use cases: 'this user got a bad answer — what would Claude 4.7 do?', 'this prompt regressed — bisect against the last known good version', 'a customer reported a hallucination — capture the trace, add to golden set, prevent regression'. Prompt replay turns observability data from passive logs into an active debugging tool.
When to use prompt replay
- Reproducing prod failures.
- Testing prompt fixes before deploy.
- Migrating from one model to another.
Common mistakes
- Replaying without seeding — non-deterministic samplers make 'replay' return different results.
- Not capturing tool definitions / RAG context — replay misses what the model actually saw.
FAQ
What is prompt replay?
Prompt replay is the debugging technique of re-running a captured production prompt against the same or different model to reproduce a failure or test a fix — the LLM equivalent of replaying a stack trace.
When should I use prompt replay?
Reproducing prod failures. Testing prompt fixes before deploy. Migrating from one model to another.
What are the most common mistakes with prompt replay?
Replaying without seeding — non-deterministic samplers make 'replay' return different results. Not capturing tool definitions / RAG context — replay misses what the model actually saw.
Related terms
- LLM observability — LLM observability is the production discipline of capturing requests, responses, latencies, costs, and outcomes across LLM-driven systems — the prerequisite for debugging, evaluating, and optimizing AI features in 2026.
- Prompt versioning — Prompt versioning is the discipline of treating prompts as source-controlled artefacts — each prompt has a versioned ID, a deploy history, and a regression-tested change log.
- A/B testing prompts — A/B testing prompts runs two prompt variants against the same input distribution and compares scored outputs, attributing quality differences to the prompt change.
Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/prompt-replay.md.