# Frame-based streaming

**Source:** https://promtable.com/glossary/frame-based-streaming

> Frame-based streaming is the voice / video pipeline architecture where small fixed-size chunks (audio frames, image frames) flow through a chain of processors — Pipecat's core abstraction, also used in WebRTC media pipelines.

---

Pipecat and similar frameworks model a voice pipeline as frames flowing between asynchronous processors: an audio capture node emits audio frames, a VAD (voice activity detection) node tags speech start and end frames, an STT (speech-to-text) node emits text frames, an LLM node emits response text frames, a TTS (text-to-speech) node emits audio frames, and a playback node writes them to the speaker. Each processor is small and composable, so swapping STT or TTS providers means swapping one node.

The trade-offs: a frame-based design is more flexible than request/response orchestration, but it is harder to debug because execution is asynchronous and state is distributed across processors, and it needs careful backpressure handling because a slow processor stalls everything upstream of it. The abstraction mirrors WebRTC media pipelines and Unix pipes, a long-established pattern for streaming media.
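The chain of processors described above can be sketched with plain `asyncio` async generators. This is an illustrative sketch only, not Pipecat's API: the names `AudioFrame`, `TextFrame`, `capture`, `fake_stt`, `fake_llm`, and `run_pipeline` are hypothetical, and the "STT" and "LLM" stages are trivial stand-ins (byte decoding and upper-casing) so the example runs without external services.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass
class AudioFrame:
    samples: bytes  # one small chunk of audio data


@dataclass
class TextFrame:
    text: str


async def capture(chunks: list[bytes]) -> AsyncIterator[AudioFrame]:
    """Stand-in for a microphone: emits one AudioFrame per chunk."""
    for chunk in chunks:
        yield AudioFrame(chunk)
        await asyncio.sleep(0)  # hand control back to the event loop


async def fake_stt(frames: AsyncIterator[AudioFrame]) -> AsyncIterator[TextFrame]:
    """Stand-in for STT: 'transcribes' a frame by decoding its bytes."""
    async for frame in frames:
        yield TextFrame(frame.samples.decode())


async def fake_llm(frames: AsyncIterator[TextFrame]) -> AsyncIterator[TextFrame]:
    """Stand-in for an LLM: upper-cases each text frame it receives."""
    async for frame in frames:
        yield TextFrame(frame.text.upper())


async def run_pipeline(chunks: list[bytes]) -> list[str]:
    # Processors compose by nesting; replacing a stage means replacing one function.
    stream = fake_llm(fake_stt(capture(chunks)))
    return [frame.text async for frame in stream]


if __name__ == "__main__":
    print(asyncio.run(run_pipeline([b"hello", b"world"])))  # ['HELLO', 'WORLD']
```

Because each stage only consumes and yields frames, a different (hypothetical) STT stage with the same input and output frame types could replace `fake_stt` without touching the rest of the chain.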

## When to use

- Building streaming voice / video pipelines.

## Common mistakes

- No backpressure handling: a fast producer overwhelms a slow consumer.
- Mixing frame-based and request/response models in one pipeline: asynchronous state becomes very hard to reason about.
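One common way to address the backpressure mistake is a bounded queue between processors, sketched below with the standard library's `asyncio.Queue`. The names `producer`, `consumer`, `run`, `n_frames`, and `max_queue` are hypothetical, and integers stand in for frames; this is a sketch under those assumptions rather than how any particular framework implements it.

```python
import asyncio


async def producer(queue: asyncio.Queue, n_frames: int, max_seen: list[int]) -> None:
    """Fast producer: put() suspends whenever the bounded queue is full."""
    for i in range(n_frames):
        await queue.put(i)  # waits here instead of buffering without limit
        max_seen[0] = max(max_seen[0], queue.qsize())
    await queue.put(None)  # sentinel: no more frames


async def consumer(queue: asyncio.Queue, out: list[int]) -> None:
    """Slow consumer: simulates a processor that needs time per frame."""
    while (frame := await queue.get()) is not None:
        await asyncio.sleep(0.001)
        out.append(frame)


async def run(n_frames: int = 20, max_queue: int = 4) -> tuple[list[int], int]:
    queue: asyncio.Queue = asyncio.Queue(maxsize=max_queue)
    out: list[int] = []
    max_seen = [0]  # deepest queue size observed by the producer
    await asyncio.gather(producer(queue, n_frames, max_seen), consumer(queue, out))
    return out, max_seen[0]


if __name__ == "__main__":
    frames, depth = asyncio.run(run())
    print(len(frames), depth)
```

With `maxsize` set, the queue never holds more than `max_queue` frames, so the producer is slowed to the consumer's pace. In a real-time audio pipeline, blocking the producer may not be acceptable, and dropping or coalescing stale frames can be the better policy; which one fits depends on the application.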

## Related terms

- [voice-pipeline](https://promtable.com/glossary/voice-pipeline)
- [voice-agent-platform](https://promtable.com/glossary/voice-agent-platform)

*Last updated: 2026-06-01*

---

Original page: https://promtable.com/glossary/frame-based-streaming
Maintained by Promtable (https://promtable.com). Content: CC BY 4.0. Cite as "Promtable — https://promtable.com/glossary/frame-based-streaming".
Contact: info@vibecodingturkey.com.