# Streaming response

**Source:** https://promtable.com/glossary/streaming-response

> Streaming response is the LLM API pattern in which tokens are emitted incrementally over Server-Sent Events or WebSocket as the model generates them. It sharply reduces perceived latency, enables progressive UI updates, and is effectively mandatory for interactive UX.

---

A non-streaming LLM call waits for the full response before returning anything, so for a 1000-token response the user can stare at a loading spinner for 5-10 seconds. Streaming flips this: tokens arrive as they are generated, the UI renders them in real time, and the user typically sees the first word within 100-300 ms.

Implementation: the API sends Server-Sent Events (`text/event-stream`) or WebSocket messages; the client SDK parses delta events into token strings; the UI appends each string to a buffer and re-renders. Most AI SDKs handle this, and modern UI stacks (Vercel AI SDK + React, SvelteKit) ship streaming-friendly hooks.

Trade-offs: error handling is harder (an error can arrive after part of the response has already been shown), tool-call streaming is complex, and full responses are harder to cache. By 2026, non-streaming chat UX is rare; only batch jobs skip streaming.
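The client-side parsing step described above can be sketched as follows. This is a minimal illustration, not any particular provider's wire format: the `{"delta": ...}` JSON payload, the `[DONE]` sentinel, and the function names `iter_deltas` and `render` are all assumptions made for the example. It also assumes one `data:` line per event, whereas the SSE format allows an event to span several `data:` lines.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from the lines of a text/event-stream body.

    Assumes a hypothetical payload shape: data: {"delta": "<text>"}
    """
    for raw in lines:
        line = raw.rstrip("\r\n")
        # Blank lines separate events; lines starting with ":" are comments
        # (often used as keep-alives).
        if not line or line.startswith(":"):
            continue
        # This sketch ignores the "event:", "id:", and "retry:" fields.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # hypothetical end-of-stream sentinel
            return
        event = json.loads(payload)
        yield event.get("delta", "")


def render(lines: Iterable[str]) -> str:
    """Append each delta to a buffer, as a UI would, and return the full text."""
    buffer = []
    for delta in iter_deltas(lines):
        buffer.append(delta)
        # A real UI would re-render here with "".join(buffer).
    return "".join(buffer)
```

In practice `lines` would come from an HTTP client that iterates over the response body line by line; the generator keeps memory use flat and hands each delta to the UI as soon as it is parsed.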

## When to use

- Any interactive chat / agent UX.

## Common mistakes

- Buffering the whole stream (in a server, proxy, or client) before returning it, which defeats the purpose.
- Forgetting to handle errors that arrive after part of the response has already been delivered.
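Both mistakes can be avoided with a small consumer loop. The sketch below is one possible shape, not a prescribed design: `StreamResult`, `consume`, and the `on_delta` callback are hypothetical names. Each delta is forwarded as soon as it arrives (no buffering before delivery), and a mid-stream failure returns the text received so far, flagged as incomplete, instead of discarding it.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class StreamResult:
    text: str                    # everything received, even if the stream broke
    complete: bool               # False when the stream ended with an error
    error: Optional[str] = None  # error message for the UI or logs, if any


def consume(deltas: Iterable[str], on_delta: Callable[[str], None]) -> StreamResult:
    """Forward each delta immediately and survive a mid-stream failure."""
    parts = []
    try:
        for delta in deltas:
            parts.append(delta)
            on_delta(delta)  # push to the UI right away, do not wait for the end
    except Exception as exc:  # a real client would catch narrower error types
        return StreamResult("".join(parts), complete=False, error=str(exc))
    return StreamResult("".join(parts), complete=True)
```

With a result like this, the caller can decide whether to show the partial text with a retry option or to roll it back, rather than leaving a half-rendered message with no indication that it was cut off.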

## Related terms

- [response-streaming](https://promtable.com/glossary/response-streaming)
- [tool-call-streaming](https://promtable.com/glossary/tool-call-streaming)
- [ai-sdk](https://promtable.com/glossary/ai-sdk)

*Last updated: 2026-06-01*

---

Maintained by Promtable (https://promtable.com). Content: CC BY 4.0. Cite as "Promtable — https://promtable.com/glossary/streaming-response".
Contact: info@vibecodingturkey.com.