Frame-based streaming
Frame-based streaming is the voice / video pipeline architecture where small fixed-size chunks (audio frames, image frames) flow through a chain of processors — Pipecat's core abstraction, also used in WebRTC media pipelines.
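Here is a minimal Python sketch of what "small fixed-size chunks" looks like for audio: raw PCM sliced into frames of equal duration. The format (16 kHz, 16-bit mono PCM) and the 20 ms frame length are assumptions chosen for illustration, not values mandated by Pipecat or WebRTC. `AudioFrame`, `frame_bytes`, and `split_into_frames` are hypothetical names. At those settings one frame is 16000 × 0.020 = 320 samples, or 640 bytes.

```python
from dataclasses import dataclass
from typing import Iterator, Optional

# Assumed audio format for illustration (not mandated by Pipecat or WebRTC).
SAMPLE_RATE_HZ = 16_000  # mono
BYTES_PER_SAMPLE = 2     # 16-bit signed PCM
FRAME_MS = 20            # duration of one frame


@dataclass(frozen=True)
class AudioFrame:
    seq: int    # position of the frame in the stream
    pcm: bytes  # raw samples; always exactly one frame long


def frame_bytes(sample_rate_hz: int = SAMPLE_RATE_HZ,
                frame_ms: int = FRAME_MS,
                bytes_per_sample: int = BYTES_PER_SAMPLE) -> int:
    # samples per frame = rate * duration; 16000 * 20 / 1000 = 320 samples = 640 bytes
    return sample_rate_hz * frame_ms // 1000 * bytes_per_sample


def split_into_frames(pcm: bytes, size: Optional[int] = None) -> Iterator[AudioFrame]:
    """Yield fixed-size frames; the final partial chunk is zero-padded (silence)."""
    size = size or frame_bytes()
    for seq, start in enumerate(range(0, len(pcm), size)):
        chunk = pcm[start:start + size]
        yield AudioFrame(seq=seq, pcm=chunk.ljust(size, b"\x00"))


if __name__ == "__main__":
    one_second = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)  # 1 s of silence = 32000 bytes
    frames = list(split_into_frames(one_second))
    print(len(frames), len(frames[0].pcm))  # 50 640
```

Padding the last partial chunk with zero bytes (silence for signed 16-bit PCM) keeps every frame the same size, so downstream processors such as a VAD can rely on a fixed frame length.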
Pipecat (and similar frameworks) model a voice pipeline as frames flowing between async processors: an audio capture node emits audio frames, a VAD node emits speech-start / speech-end frames, an STT node turns audio into text frames, an LLM node emits response text frames, a TTS node turns that text back into audio frames, and a playback node writes those frames to the speaker. Each processor is small and composable, so swapping STT or TTS providers means swapping one node. Trade-offs: frame-based streaming is more flexible than request / response orchestration, but debugging is harder because state is asynchronous and spread across many concurrently running processors. It also needs careful backpressure: a slow processor must be able to make upstream stages wait, or its input backlog grows without bound. The abstraction mirrors WebRTC media pipelines and Unix pipes, a tried-and-true pattern for streaming media.
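The processor chain described above can be sketched with plain `asyncio`: each stage reads frames from an input queue and writes its results to an output queue, and an end-of-stream frame shuts the chain down in order. This is a hypothetical toy, not Pipecat's API. The frame classes, `run_stage`, `run_pipeline`, and the `fake_stt` / `fake_llm` / `fake_tts` stand-ins are invented for illustration. The queues are bounded (`maxsize`), so a slow stage makes upstream `put` calls wait, which is the backpressure the trade-offs call for.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Sequence


# Hypothetical frame types for illustration; Pipecat's real frame classes differ.
@dataclass
class AudioFrame:
    pcm: bytes


@dataclass
class TextFrame:
    text: str


@dataclass
class EndFrame:
    """End-of-stream marker: each stage forwards it downstream, then stops."""


# A processor maps one input frame to zero or more output frames.
Processor = Callable[[object], Awaitable[List[object]]]


async def run_stage(proc: Processor, inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    while True:
        frame = await inbox.get()
        if isinstance(frame, EndFrame):
            await outbox.put(frame)  # propagate shutdown, then exit
            return
        for out in await proc(frame):
            await outbox.put(out)  # waits here if the next stage is behind (backpressure)


async def run_pipeline(procs: Sequence[Processor], frames: Sequence[object],
                       maxsize: int = 8) -> List[object]:
    # queues[i] feeds stage i; queues[-1] collects the final output.
    queues = [asyncio.Queue(maxsize=maxsize) for _ in range(len(procs) + 1)]
    stages = [run_stage(p, queues[i], queues[i + 1]) for i, p in enumerate(procs)]
    collected: List[object] = []

    async def source() -> None:
        for f in frames:
            await queues[0].put(f)
        await queues[0].put(EndFrame())

    async def sink() -> None:
        while True:
            f = await queues[-1].get()
            if isinstance(f, EndFrame):
                return
            collected.append(f)

    await asyncio.gather(source(), sink(), *stages)
    return collected


# Toy stand-ins for real providers: each handles the frame type it
# understands and passes everything else through unchanged.
async def fake_stt(frame: object) -> List[object]:
    if isinstance(frame, AudioFrame):
        return [TextFrame(frame.pcm.decode())]  # toy: pretend the bytes are the transcript
    return [frame]


async def fake_llm(frame: object) -> List[object]:
    if isinstance(frame, TextFrame):
        return [TextFrame(f"echo: {frame.text}")]
    return [frame]


async def fake_tts(frame: object) -> List[object]:
    if isinstance(frame, TextFrame):
        return [AudioFrame(frame.text.encode())]  # toy: "synthesize" by encoding the text
    return [frame]


if __name__ == "__main__":
    out = asyncio.run(run_pipeline([fake_stt, fake_llm, fake_tts],
                                   [AudioFrame(b"hi"), AudioFrame(b"there")]))
    print(out)  # [AudioFrame(pcm=b'echo: hi'), AudioFrame(pcm=b'echo: there')]
```

Swapping a provider in this sketch means passing a different coroutine of the same shape in the list, for example another TTS function in place of `fake_tts`. No other stage changes.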
When to use frame-based streaming
- Building streaming voice / video pipelines.
Common mistakes
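- No backpressure handling: a fast producer overwhelms a slow consumer, whose input queue then grows without bound.
- Mixing frame-based and request / response models in one pipeline: async state becomes very hard to reason about and debug.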
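The backpressure mistake is easy to reproduce. In this hypothetical sketch a producer pushes 20 frames into an `asyncio.Queue` as fast as it can, while a consumer takes about 1 ms per frame (the sleep stands in for a slow processor such as a TTS call). With `maxsize=0` (unbounded) nothing stops the producer, so the backlog piles up. With a bounded queue, `await q.put(...)` suspends the producer whenever the queue is full, which caps the backlog at `maxsize`.

```python
import asyncio


async def run(maxsize: int, n_frames: int = 20) -> int:
    """Return the deepest the queue got while a fast producer fed a slow consumer."""
    q: asyncio.Queue = asyncio.Queue(maxsize=maxsize)  # maxsize=0 means unbounded
    max_depth = 0

    async def producer() -> None:
        nonlocal max_depth
        for i in range(n_frames):
            await q.put(i)  # suspends only while the queue is full
            max_depth = max(max_depth, q.qsize())
        await q.put(None)  # end-of-stream marker

    async def consumer() -> None:
        while (await q.get()) is not None:
            await asyncio.sleep(0.001)  # simulate a slow processor

    await asyncio.gather(producer(), consumer())
    return max_depth


if __name__ == "__main__":
    print("unbounded:", asyncio.run(run(maxsize=0)))  # backlog grows with the input
    print("bounded:  ", asyncio.run(run(maxsize=4)))  # never exceeds 4
```

In a voice pipeline, capping the backlog also caps the extra latency a slow stage can add, because frames cannot queue up indefinitely ahead of playback.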
FAQ
What is frame-based streaming?
Frame-based streaming is the voice / video pipeline architecture where small fixed-size chunks (audio frames, image frames) flow through a chain of processors — Pipecat's core abstraction, also used in WebRTC media pipelines.
When should I use frame-based streaming?
Building streaming voice / video pipelines.
What are the most common mistakes with frame-based streaming?
Skipping backpressure handling, which lets a fast producer overwhelm a slow consumer, and mixing frame-based and request / response models in one pipeline, which makes async state very hard to reason about and debug.
Related terms
- Voice pipeline — A voice pipeline is the chain of audio processing stages — VAD → STT → LLM → TTS → playback — composed in a streaming framework like Pipecat, LiveKit Agents, Vocode, or a managed platform's internal stack.
- Voice agent platform — A voice agent platform is a managed stack that combines STT + LLM + TTS + telephony into a single API for building production phone / voice agents — Vapi, Retell, Bland are the 2026 leaders.
Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/frame-based-streaming.md.