technique

Prompt replay

Prompt replay is the debugging technique of re-running a captured production prompt against the same or different model to reproduce a failure or test a fix — the LLM equivalent of replaying a stack trace.

When a prod LLM call returns the wrong answer, you don't just want logs — you want to re-run the exact request to test fixes. Prompt replay captures the full request payload (system prompt, messages, tool definitions, settings) and lets you re-execute it against the original model, a different model, or a modified prompt. Built into Langfuse, Braintrust, Helicone, LangSmith. Production use cases: 'this user got a bad answer — what would Claude 4.7 do?', 'this prompt regressed — bisect against the last known good version', 'a customer reported a hallucination — capture the trace, add to golden set, prevent regression'. Prompt replay turns observability data from passive logs into an active debugging tool.

When to use prompt replay

Common mistakes

FAQ

What is prompt replay?

Prompt replay is the debugging technique of re-running a captured production prompt against the same or different model to reproduce a failure or test a fix — the LLM equivalent of replaying a stack trace.

When should I use prompt replay?

Reproducing prod failures. Testing prompt fixes before deploy. Migrating from one model to another.

What are the most common mistakes with prompt replay?

Replaying without seeding — non-deterministic samplers make 'replay' return different results. Not capturing tool definitions / RAG context — replay misses what the model actually saw.

Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/prompt-replay.md.