Streaming response
Streaming response is the LLM API pattern in which tokens are emitted incrementally, typically over Server-Sent Events or WebSocket, as the model generates them. It greatly improves perceived latency, enables progressive UI updates, and is effectively required for interactive UX.
Non-streaming LLM calls wait for the full response and then return it all at once. For a 1000-token response, the user may stare at a loading spinner for 5-10 seconds, depending on model speed. Streaming flips this: tokens arrive as they are generated, the UI renders them in real time, and the user often sees the first word within a few hundred milliseconds (roughly 100-300 ms).
Implementation: the API sends Server-Sent Events (`text/event-stream`) or WebSocket messages; the client SDK parses each delta event into a token string; the UI appends it to a buffer and re-renders. Most AI SDKs handle the parsing, and libraries such as the Vercel AI SDK provide streaming-friendly hooks for React, Svelte, and other UI frameworks.
Trade-offs: error handling is harder (a partial response can be followed by an error), streaming tool calls is complex, and full responses are harder to cache. By 2026, non-streaming chat UX is rare; mostly batch jobs skip streaming.
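A minimal sketch of the client side in TypeScript, assuming each SSE event carries a JSON payload like `{"delta": "Hel"}` and the stream ends with a `data: [DONE]` event. The `delta` field, the `[DONE]` sentinel, and the `SSEParser` / `extractDeltas` names are hypothetical; real vendor SDKs define their own event types and usually do this parsing for you. It shows the two details that matter: a network chunk can end mid-line, so the parser keeps the incomplete tail until the next chunk arrives, and each decoded token is appended to the UI buffer as soon as it is available.

```typescript
// Minimal incremental Server-Sent Events parser (sketch).
// Assumptions: payloads look like {"delta": "..."} and "[DONE]" ends the stream;
// both are illustrative, not a specific vendor's wire format.
class SSEParser {
  private buffer = ""; // incomplete trailing line carried over between chunks
  private dataLines: string[] = []; // data: lines of the event being assembled

  // Feed one network chunk; returns the data payload of every event it completed.
  feed(chunk: string): string[] {
    this.buffer += chunk;
    const events: string[] = [];
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() ?? ""; // the last piece may be a partial line
    for (const raw of lines) {
      const line = raw.endsWith("\r") ? raw.slice(0, -1) : raw; // accept CRLF
      if (line === "") {
        // A blank line terminates an event: dispatch it if it carried data.
        if (this.dataLines.length > 0) {
          events.push(this.dataLines.join("\n"));
          this.dataLines = [];
        }
      } else if (line.startsWith("data:")) {
        let value = line.slice("data:".length);
        if (value.startsWith(" ")) value = value.slice(1); // one optional space
        this.dataLines.push(value);
      }
      // event:, id:, retry: fields and ":" comment lines are ignored here.
    }
    return events;
  }
}

// Turn event payloads into token strings (hypothetical payload shape).
function extractDeltas(payloads: string[]): string[] {
  const tokens: string[] = [];
  for (const payload of payloads) {
    if (payload === "[DONE]") continue; // hypothetical end-of-stream sentinel
    const parsed = JSON.parse(payload) as { delta?: string };
    if (parsed.delta) tokens.push(parsed.delta);
  }
  return tokens;
}

// Simulated network chunks; note the second event is split mid-line.
const chunks = [
  'data: {"delta": "Hel"}\n\nda',
  'ta: {"delta": "lo, "}\n\n',
  'data: {"delta": "world"}\n\ndata: [DONE]\n\n',
];
const parser = new SSEParser();
let rendered = ""; // the UI's text buffer
for (const chunk of chunks) {
  for (const token of extractDeltas(parser.feed(chunk))) {
    rendered += token; // a real UI would re-render here, once per token
  }
}
console.log(rendered);
```

Only `data:` fields are handled, and line endings are assumed to be LF or CRLF; a production client should follow the full event-stream parsing rules or, better, rely on its SDK.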
When to use streaming response
- Any interactive chat / agent UX.
Common mistakes
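- Buffering the stream before returning it (for example, in a server route or reverse proxy), which defeats the purpose.
- Forgetting to handle errors that arrive after part of the response has already been shown.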
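Both mistakes can be avoided in the consuming loop. The TypeScript sketch below forwards every token to a render callback the moment it arrives and, if the stream throws midway, returns the text received so far with a `complete: false` flag instead of discarding it. `consumeStream`, `onDelta`, `StreamResult`, and `flakyStream` are hypothetical names, and the token stream is assumed to be an `AsyncIterable<string>` (many SDK stream objects can be iterated with `for await`, but check your SDK).

```typescript
// Sketch: forward tokens immediately and keep partial text on mid-stream errors.
// All names here are hypothetical, for illustration only.
interface StreamResult {
  text: string; // everything received before the stream ended or failed
  complete: boolean; // false if the stream broke off mid-response
  error?: unknown;
}

async function consumeStream(
  deltas: AsyncIterable<string>,
  onDelta: (token: string) => void,
): Promise<StreamResult> {
  let text = "";
  try {
    for await (const token of deltas) {
      text += token;
      onDelta(token); // render now; collecting everything first would defeat streaming
    }
    return { text, complete: true };
  } catch (error) {
    // The user has already seen `text`, so report the failure alongside it
    // rather than silently dropping it or presenting it as a finished answer.
    return { text, complete: false, error };
  }
}

// Simulated stream that fails after two tokens (e.g. a dropped connection).
async function* flakyStream(): AsyncGenerator<string> {
  yield "Partial ";
  yield "answer";
  throw new Error("connection reset");
}

const seen: string[] = [];
consumeStream(flakyStream(), (t) => {
  seen.push(t);
}).then((result) => {
  // Both tokens reached the UI; result.complete is false.
  console.log(seen, result.text, result.complete);
});
```

What to do with a partial result (keep it with a retry button, discard it, or resume) is a product decision; the point is that the UI should not present it as complete.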
FAQ
What is streaming response?
Streaming response is the LLM API pattern in which tokens are emitted incrementally, typically over Server-Sent Events or WebSocket, as the model generates them. It greatly improves perceived latency, enables progressive UI updates, and is effectively required for interactive UX.
When should I use streaming response?
Any interactive chat / agent UX.
What are the most common mistakes with streaming response?
Buffering the stream before returning it, which defeats the purpose, and forgetting to handle errors that arrive after part of the response has already been shown.
Related terms
- Response streaming — Response streaming pipes the model's output token-by-token to the client as it's generated, so users see text appearing in real time instead of waiting for the full answer.
- Tool call streaming — Tool call streaming is the API feature where the model emits a tool call (function name + arguments) incrementally as it generates — letting the client start preparing execution before the full call is complete.
- AI SDK — An AI SDK is the official client library a vendor ships for calling their model APIs — Anthropic SDK, OpenAI SDK, Google GenAI SDK, Mistral SDK, Vercel AI SDK (multi-vendor wrapper). Handles auth, retries, streaming, types.
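The Tool call streaming entry describes a call arriving in fragments. A minimal TypeScript sketch of the client-side accumulation, assuming a hypothetical `ToolCallDelta` shape in which the tool name arrives on the first fragment and the JSON-encoded arguments arrive in pieces keyed by an `index`; real APIs use their own field names. The name can be acted on immediately, while the arguments are parsed only after the stream ends, because a fragment such as `{"ci` is not valid JSON on its own.

```typescript
// Sketch: assembling streamed tool calls from fragments.
// ToolCallDelta is a hypothetical shape, not a specific vendor's event type.
interface ToolCallDelta {
  index: number; // which tool call this fragment belongs to
  name?: string; // assumed to appear on the first fragment only
  argsFragment?: string; // a slice of the JSON-encoded arguments
}

interface ToolCall {
  name: string;
  args: unknown;
}

class ToolCallAccumulator {
  private calls = new Map<number, { name: string; json: string }>();

  // Returns the tool name the first time it is seen, so the client can start
  // preparing (e.g. show "Calling get_weather...") before the arguments finish.
  push(d: ToolCallDelta): string | undefined {
    const entry = this.calls.get(d.index) ?? { name: "", json: "" };
    let announced: string | undefined;
    if (d.name && !entry.name) {
      entry.name = d.name;
      announced = d.name;
    }
    if (d.argsFragment) entry.json += d.argsFragment;
    this.calls.set(d.index, entry);
    return announced;
  }

  // Call once the stream has ended: only then is each argument string complete JSON.
  finish(): ToolCall[] {
    return Array.from(this.calls.entries())
      .sort(([a], [b]) => a - b)
      .map(([, e]) => ({
        name: e.name,
        // Assumption: a call with no argument fragments means empty arguments.
        args: e.json === "" ? {} : (JSON.parse(e.json) as unknown),
      }));
  }
}

const acc = new ToolCallAccumulator();
const firstSeen = [
  acc.push({ index: 0, name: "get_weather", argsFragment: '{"ci' }),
  acc.push({ index: 0, argsFragment: 'ty": "Par' }),
  acc.push({ index: 0, argsFragment: 'is"}' }),
];
const calls = acc.finish();
console.log(firstSeen, calls);
```

Some APIs also stream several tool calls in parallel; keying fragments by index, as above, keeps their arguments from interleaving.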
Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/streaming-response.md.