Interrupt handling
Interrupt handling is the voice-agent capability of detecting when a user starts speaking over the AI's reply and immediately stopping playback — the difference between feeling natural and feeling robotic in production phone agents.
When humans talk on the phone, we interrupt each other constantly — to confirm, clarify, redirect. Voice agents that ignore interrupts (talk over the user, refuse to stop) feel robotic and break the conversation. Production interrupt handling combines: [[vad]] (voice activity detection) running continuously on the user audio, [[barge-in]] logic to cancel current TTS playback within ~100ms of detected speech, [[streaming-stt]] to process the interrupting words as they arrive, and an LLM call that incorporates the interrupt into the next turn rather than ignoring it. Retell AI leads in 2026 on interrupt-handling quality; Vapi, Pipecat, LiveKit Agents support it but with varying tuning.
When to use interrupt handling
- Any production voice agent — non-negotiable.
- Phone agents especially — phone callers expect natural turn-taking.
Common mistakes
- Tuning VAD too sensitive — every background noise stops playback; agent stutters constantly.
- Tuning VAD too lax — agent talks over the user; conversation collapses.
FAQ
What is interrupt handling?
Interrupt handling is the voice-agent capability of detecting when a user starts speaking over the AI's reply and immediately stopping playback — the difference between feeling natural and feeling robotic in production phone agents.
When should I use interrupt handling?
Any production voice agent — non-negotiable. Phone agents especially — phone callers expect natural turn-taking.
What are the most common mistakes with interrupt handling?
Tuning VAD too sensitive — every background noise stops playback; agent stutters constantly. Tuning VAD too lax — agent talks over the user; conversation collapses.
Related terms
- Barge-in — Barge-in is the voice-agent feature where the user can interrupt the assistant mid-response — the assistant detects the speech and stops talking — making conversations feel natural instead of robotic turn-taking.
- Voice activity detection (VAD) — Voice activity detection is the lightweight signal-processing step that determines whether incoming audio contains speech — used to start STT, trigger barge-in, and gate microphone use in voice agents.
- Voice agent platform — A voice agent platform is a managed stack that combines STT + LLM + TTS + telephony into a single API for building production phone / voice agents — Vapi, Retell, Bland are the 2026 leaders.
Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/interrupt-handling.md.