# Realtime API

**Source:** https://promtable.com/glossary/realtime-api

> A Realtime API is a WebSocket- or WebRTC-based LLM endpoint that streams audio in and audio out for natural full-duplex conversation. OpenAI Realtime API, Gemini Live, ElevenLabs Conversational, and Cartesia Sonic are the 2026 leaders.

---

Before realtime APIs, voice agents chained STT, LLM, and TTS sequentially, which produced a 1-2 s round trip. Realtime APIs change this: a single WebSocket or WebRTC connection streams audio in and audio out while the model reasons on that same connection. Total round-trip latency drops to 200-500 ms. The architecture has four parts: WebRTC for audio transport (low jitter, NAT traversal) or WebSocket for simpler integrations; model-native voice modes (GPT-4o voice, Gemini audio); tool calling mid-conversation; and interrupt handling. The production win is that voice agents finally feel natural, with UX that matches human conversation pacing. The trade-offs: realtime APIs are expensive (audio tokens cost more than text), session limits cap conversation length, and debugging a stream is harder than debugging request/response.
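The client side of such a session can be pictured as a small event handler. The sketch below is a minimal, vendor-neutral illustration, not any provider's API: the event names (`audio_delta`, `user_speech_started`, `tool_call`, `response_done`), the outgoing message shapes, and the `get_weather` tool are all hypothetical. It shows the two behaviours that set a duplex connection apart from request/response: dropping queued audio when the user interrupts, and answering a tool call on the same connection.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List


@dataclass
class DuplexSession:
    """Client-side state for one realtime connection (hypothetical event names)."""

    tools: Dict[str, Callable[[dict], str]]
    playback: Deque[bytes] = field(default_factory=deque)  # audio waiting to be played
    outbox: List[dict] = field(default_factory=list)       # messages to send to the server
    responding: bool = False

    def handle(self, event: dict) -> None:
        kind = event["type"]
        if kind == "audio_delta":
            # Model audio arrives in small chunks; queue each one for playback.
            self.responding = True
            self.playback.append(event["chunk"])
        elif kind == "user_speech_started":
            # Interrupt handling: the user talked over the model, so drop
            # unplayed audio and ask the server to stop generating.
            if self.responding or self.playback:
                self.playback.clear()
                self.outbox.append({"type": "cancel_response"})
                self.responding = False
        elif kind == "tool_call":
            # Tool call mid-conversation: run the function locally and send
            # the result back over the same connection.
            output = self.tools[event["name"]](event["arguments"])
            self.outbox.append(
                {"type": "tool_result", "call_id": event["call_id"], "output": output}
            )
        elif kind == "response_done":
            self.responding = False


# Simulated server events; a real client would read these from the socket.
session = DuplexSession(tools={"get_weather": lambda args: f"Sunny in {args['city']}"})
events = [
    {"type": "audio_delta", "chunk": b"\x00\x01"},
    {"type": "audio_delta", "chunk": b"\x02\x03"},
    {"type": "user_speech_started"},
    {"type": "tool_call", "name": "get_weather", "call_id": "call_1",
     "arguments": {"city": "Izmir"}},
    {"type": "audio_delta", "chunk": b"\x04"},
    {"type": "response_done"},
]
for event in events:
    session.handle(event)
```

After the loop, `session.outbox` holds the cancel message followed by the tool result, and only the audio chunk that arrived after the interruption remains in `session.playback`. Keeping the handler free of network code is a deliberate choice here: streaming is hard to debug, and a pure state machine can be replayed against recorded events.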

## When to use

- Production voice agents.
- Real-time multimodal demos.

## Common mistakes

- Wrapping a Realtime API in your own STT and TTS, which defeats the latency benefit.
- Not budgeting for audio token cost, which is much higher per minute than text.
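The budgeting mistake above is easy to avoid with a back-of-the-envelope estimate. The sketch below assumes cost can be approximated per minute of audio in each direction. The rates, call volume, and talk-time split are placeholders chosen for illustration, not real vendor pricing, so substitute figures from your provider's price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioRates:
    """Per-minute audio prices in USD. Placeholder values, not real pricing."""

    input_per_min: float   # user speech sent to the model
    output_per_min: float  # model speech sent back to the user


def monthly_audio_cost(
    rates: AudioRates,
    calls_per_day: int,
    avg_call_min: float,
    user_talk_share: float = 0.5,  # assumed fraction of each call the user speaks
    days: int = 30,
) -> float:
    """Estimate monthly spend on audio for a voice agent."""
    total_min = calls_per_day * avg_call_min * days
    user_min = total_min * user_talk_share
    model_min = total_min - user_min
    return user_min * rates.input_per_min + model_min * rates.output_per_min


# Hypothetical workload: 1,000 calls a day, 3 minutes each.
placeholder_rates = AudioRates(input_per_min=0.05, output_per_min=0.20)
cost = monthly_audio_cost(placeholder_rates, calls_per_day=1000, avg_call_min=3)
print(f"Estimated monthly audio cost: ${cost:,.2f}")
```

With these placeholder inputs the workload is 90,000 audio minutes a month, and output audio dominates the total, so both the model's talk time and the call volume are worth tracking.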

## Related terms

- [voice-agent-platform](https://promtable.com/glossary/voice-agent-platform)
- [barge-in](https://promtable.com/glossary/barge-in)
- [voice](https://promtable.com/glossary/voice)

*Last updated: 2026-06-01*

---

Maintained by Promtable (https://promtable.com). Content: CC BY 4.0. Cite as "Promtable — https://promtable.com/glossary/realtime-api".
Contact: info@vibecodingturkey.com.