Duplex conversation
Duplex conversation is the realtime-voice property where both parties (the user and the AI) can speak and listen at the same time. It supports natural interruption, backchanneling ('uh-huh'), and overlapping speech, and it is the bar a voice agent must clear to feel human in 2026.
Half-duplex voice, where one party talks at a time in walkie-talkie style, feels robotic. Duplex flips this: audio flows continuously in both directions, and either party can speak, interrupt, or backchannel naturally. The engineering pieces are:
- Simultaneous two-way audio streaming (WebRTC handles this natively).
- Barge-in detection: stop AI playback within ~100 ms of the user starting to speak.
- Backchanneling: the AI can say 'mm-hmm' or 'right' without taking the turn away from the user.
- Turn-taking models that predict who should speak next.
Realtime APIs (OpenAI, Gemini, ElevenLabs Conversational) ship duplex out of the box. Duplex is the difference between voice that feels natural and voice that feels like calling an automated customer-service phone menu (IVR).
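To make "both directions at once" concrete, here is a minimal sketch using Python's asyncio. It assumes audio frames and TTS chunks can be modeled as plain strings, and that `asyncio.sleep(0)` stands in for the network and audio I/O a real client would await; the functions `uplink`, `downlink`, and `duplex_session` are hypothetical names, not part of WebRTC or any realtime API. The key point is that the mic keeps streaming while the AI is talking, so a barge-in signal can cut playback mid-reply.

```python
import asyncio


async def uplink(mic_frames: list[str], sent: list[str], barge_in: asyncio.Event) -> None:
    """Stream mic audio to the server continuously, even while the AI is talking."""
    for frame in mic_frames:
        sent.append(frame)           # stand-in for sending the frame over the network
        if frame == "speech":        # stand-in for a VAD decision on this frame
            barge_in.set()
        await asyncio.sleep(0)       # yield so the downlink runs concurrently


async def downlink(tts_chunks: list[str], played: list[str], barge_in: asyncio.Event) -> None:
    """Play AI audio as it arrives, stopping as soon as the user barges in."""
    for chunk in tts_chunks:
        if barge_in.is_set():
            break                    # cut playback; a real client would also flush its audio buffer
        played.append(chunk)         # stand-in for writing the chunk to the speaker
        await asyncio.sleep(0)


async def duplex_session(mic_frames: list[str], tts_chunks: list[str]) -> tuple[list[str], list[str]]:
    sent: list[str] = []
    played: list[str] = []
    barge_in = asyncio.Event()
    # Both directions run at the same time; a half-duplex client would run them one after the other.
    await asyncio.gather(
        uplink(mic_frames, sent, barge_in),
        downlink(tts_chunks, played, barge_in),
    )
    return sent, played


mic = ["silence", "silence", "speech", "speech"]
tts = [f"chunk{i}" for i in range(10)]
sent, played = asyncio.run(duplex_session(mic, tts))
print(len(sent), len(played))
```

Every mic frame is sent, while only the first few TTS chunks are played before the user's speech stops the reply. In production the same shape usually sits on top of a WebRTC peer connection or a realtime API's streaming socket rather than in-memory lists.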
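Backchanneling and turn-taking can be sketched as a policy that runs on every audio frame. Production systems typically use learned turn-taking or end-of-turn models; this sketch swaps in a plain silence-timer heuristic instead. It assumes a VAD already labels each 20 ms frame as user speech or not, and the class `TurnTaker`, the `Action` enum, and all threshold values are hypothetical. A short pause after a long stretch of speech earns a backchannel while the user keeps the turn; a long pause hands the turn to the AI.

```python
from enum import Enum


class Action(Enum):
    LISTEN = "listen"              # keep listening, say nothing
    BACKCHANNEL = "backchannel"    # short 'mm-hmm'; the user keeps the turn
    TAKE_TURN = "take_turn"        # user seems done; the AI starts replying


class TurnTaker:
    """Hypothetical silence-threshold policy standing in for a learned turn-taking model."""

    def __init__(self, frame_ms: int = 20, backchannel_after_ms: int = 300,
                 end_of_turn_ms: int = 800, min_speech_between_bc_ms: int = 1500) -> None:
        self.frame_ms = frame_ms
        self.backchannel_after_ms = backchannel_after_ms
        self.end_of_turn_ms = end_of_turn_ms
        self.min_speech_between_bc_ms = min_speech_between_bc_ms
        self.turn_speech_ms = 0        # user speech in the current turn
        self.speech_since_bc_ms = 0    # user speech since the last backchannel
        self.silence_ms = 0            # length of the current pause

    def on_frame(self, user_is_speaking: bool) -> Action:
        if user_is_speaking:
            self.turn_speech_ms += self.frame_ms
            self.speech_since_bc_ms += self.frame_ms
            self.silence_ms = 0
            return Action.LISTEN
        if self.turn_speech_ms == 0:
            return Action.LISTEN       # the user has not started a turn yet
        self.silence_ms += self.frame_ms
        if self.silence_ms >= self.end_of_turn_ms:
            # Long pause: treat it as the end of the user's turn and reset.
            self.turn_speech_ms = self.speech_since_bc_ms = self.silence_ms = 0
            return Action.TAKE_TURN
        if (self.silence_ms >= self.backchannel_after_ms
                and self.speech_since_bc_ms >= self.min_speech_between_bc_ms):
            # Short mid-turn pause after a long stretch of speech: acknowledge only.
            self.speech_since_bc_ms = 0
            return Action.BACKCHANNEL
        return Action.LISTEN


# 2 s of speech, a 400 ms pause, 1 s more speech, then 1 s of silence (20 ms frames).
frames = [True] * 100 + [False] * 20 + [True] * 50 + [False] * 50
tt = TurnTaker()
events = []
for i, speaking in enumerate(frames):
    action = tt.on_frame(speaking)
    if action is not Action.LISTEN:
        events.append(((i + 1) * tt.frame_ms, action.value))
print(events)  # [(2300, 'backchannel'), (4200, 'take_turn')]
```

The agent says 'mm-hmm' 300 ms into the first pause without taking the floor, then replies only after 800 ms of silence at the end. Requiring a stretch of speech between backchannels keeps the agent from acknowledging every breath.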
When to use duplex conversation
- Voice agents in customer-facing apps.
- Phone agents, voice assistants.
Common mistakes
- Building half-duplex voice apps in 2026: they feel dated immediately.
- Disabling barge-in to get 'clean audio': it kills the natural feel.
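If the motivation for disabling barge-in is false triggers from noise, a common alternative is to debounce it: only count the user as barging in after a short run of consecutive speech frames. The sketch below assumes 20 ms frames of 16 kHz samples scaled to [-1, 1] and uses an RMS energy threshold as a crude stand-in for a real VAD; `BargeInDetector` and its threshold values are hypothetical. Production stacks also depend on acoustic echo cancellation so the agent's own voice coming out of the speaker does not trigger it.

```python
import math
from dataclasses import dataclass


def rms(frame: list[float]) -> float:
    """Root-mean-square energy of one frame of samples scaled to [-1, 1]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


@dataclass
class BargeInDetector:
    """Hypothetical debounced barge-in detector; not a library API."""
    threshold: float = 0.05       # assumed RMS level that counts as speech
    min_speech_frames: int = 3    # 3 frames x 20 ms = 60 ms, inside a ~100 ms budget
    ai_speaking: bool = False
    run: int = 0                  # consecutive speech-like frames seen so far

    def on_mic_frame(self, frame: list[float]) -> bool:
        """Feed one mic frame; return True when AI playback should stop."""
        if not self.ai_speaking:
            self.run = 0
            return False
        # Count consecutive speech-like frames; any quiet frame resets the run,
        # so a lone 20 ms burst never triggers.
        self.run = self.run + 1 if rms(frame) >= self.threshold else 0
        if self.run >= self.min_speech_frames:
            self.ai_speaking = False  # caller stops TTS and flushes queued audio
            self.run = 0
            return True
        return False


silence = [0.0] * 320          # 20 ms of silence at 16 kHz
speech = [0.2, -0.2] * 160     # 20 ms of loud audio, RMS 0.2

det = BargeInDetector(ai_speaking=True)
frames = [silence, speech, silence, speech, speech, speech, speech]
results = [det.on_mic_frame(f) for f in frames]
print(results)  # [False, False, False, False, False, True, False]
```

The isolated speech frame is ignored, and the sustained speech stops playback 60 ms after it begins, so the agent stays interruptible without reacting to every blip.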
Related terms
- Barge-in — Barge-in is the voice-agent feature where the user can interrupt the assistant mid-response — the assistant detects the speech and stops talking — making conversations feel natural instead of robotic turn-taking.
- Interrupt handling — Interrupt handling is the voice-agent capability of detecting when a user starts speaking over the AI's reply and immediately stopping playback — the difference between feeling natural and feeling robotic in production phone agents.
- Voice agent platform — A voice agent platform is a managed stack that combines STT + LLM + TTS + telephony into a single API for building production phone / voice agents — Vapi, Retell, Bland are the 2026 leaders.
Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/duplex-conversation.md.