Tool call streaming
Tool call streaming is the API feature where the model emits a tool call (function name + arguments) incrementally as it generates — letting the client start preparing execution before the full call is complete.
Standard tool calling buffers the model's response until the full function name and JSON arguments have been generated, then ships them as one event. For agents whose tools take time to set up, that wait is wasted. Streaming tool calls flip it: the API emits partial deltas (`tool_name: 'search_'`, `tool_name: 'search_db'`, `arguments: '{"query":"'...`) so the client can start setup (open the DB connection, prefetch the embedding model) before the arguments finish. The exact event shape varies by provider; several send the tool name whole in the first event and stream only the arguments as JSON text fragments. By 2026 OpenAI, Anthropic, Google, and the Vercel AI SDK all expose streaming tool calls. Production benefit: roughly 200-500ms shaved per tool call in latency-sensitive voice and chat agents, depending on how much setup can overlap with generation.
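To make the delta flow concrete, here is a minimal sketch of a client-side accumulator. The event shape (`type` / `value` dicts with `name`, `arguments_delta`, and `done` events) is a simplified assumption for illustration, not any provider's wire format, and `ToolCallBuffer` and `consume` are hypothetical names. It also assumes the tool name arrives whole in a single event.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class ToolCallBuffer:
    """Accumulates the streamed pieces of a single tool call."""
    name: str = ""
    arg_chunks: list = field(default_factory=list)
    done: bool = False

    def feed(self, event: dict) -> None:
        # Hypothetical event shape: {"type": ..., "value": ...}
        if event["type"] == "name":
            self.name = event["value"]
        elif event["type"] == "arguments_delta":
            self.arg_chunks.append(event["value"])  # raw JSON text fragment
        elif event["type"] == "done":
            self.done = True

    def arguments(self) -> dict:
        # Parse only after the stream has signalled that the call is complete.
        if not self.done:
            raise RuntimeError("tool call is not finalized yet")
        return json.loads("".join(self.arg_chunks))


def consume(events: Iterable[dict], on_name: Callable[[str], None]) -> tuple:
    """Feed every event into a buffer; fire on_name once, as early as possible."""
    buf = ToolCallBuffer()
    announced = False
    for event in events:
        buf.feed(event)
        if buf.name and not announced:
            on_name(buf.name)  # earliest point where setup can begin
            announced = True
    return buf.name, buf.arguments()


events = [
    {"type": "name", "value": "search_db"},
    {"type": "arguments_delta", "value": '{"query":"'},
    {"type": "arguments_delta", "value": 'refund policy"}'},
    {"type": "done"},
]
started = []
name, args = consume(events, started.append)
```

The `on_name` hook is where setup work would be kicked off; the arguments are only parsed after the `done` event.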
When to use tool call streaming
- Latency-critical voice / chat agents.
- Tools with high setup cost (DB connections, model loads).
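The high-setup-cost case can be sketched with `asyncio`: the setup task starts as soon as the tool name is known and runs while the arguments are still arriving. Everything here is illustrative. `open_connection` and `fake_stream` are hypothetical stand-ins (with made-up delays) for a real connection pool and a real provider stream.

```python
import asyncio
import json


async def open_connection(tool_name: str) -> str:
    """Hypothetical slow setup, e.g. opening a DB connection."""
    await asyncio.sleep(0.05)
    return f"conn-for-{tool_name}"


async def fake_stream():
    """Hypothetical stream: the tool name first, then argument fragments."""
    yield ("name", "search_db")
    for fragment in ['{"query":', ' "streaming', ' tools"}']:
        await asyncio.sleep(0.02)  # simulated generation time per fragment
        yield ("arguments_delta", fragment)


async def run_tool_call() -> tuple:
    setup_task = None
    chunks = []
    async for kind, value in fake_stream():
        if kind == "name" and setup_task is None:
            # Start setup now; it runs concurrently with the rest of the stream.
            setup_task = asyncio.create_task(open_connection(value))
        elif kind == "arguments_delta":
            chunks.append(value)
    if setup_task is None:
        raise RuntimeError("stream ended without a tool name")
    args = json.loads("".join(chunks))  # the stream has ended, so the JSON is complete
    conn = await setup_task             # usually already finished by this point
    return conn, args


conn, args = asyncio.run(run_tool_call())
```

With these made-up delays, setup overlaps the argument stream rather than starting after it. How much time is saved in practice depends on how long setup takes relative to argument generation.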
Common mistakes
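- Acting on partial arguments before finalization. A value seen so far may be only a prefix of the final one, and the JSON does not parse until the call is complete.
- Skipping streaming in latency-sensitive apps, which leaves up to 500ms per tool call on the floor.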
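The first mistake is easy to see with a few fragments. This sketch, using a hypothetical `try_parse` helper and made-up fragments, tries to parse the accumulated text after each one:

```python
import json


def try_parse(partial: str):
    """Return the parsed arguments, or None while the JSON is still incomplete."""
    try:
        return json.loads(partial)
    except json.JSONDecodeError:
        return None


# Made-up argument fragments, in arrival order.
fragments = ['{"query": "refund', ' policy", "limit"', ': 5}']

seen = ""
snapshots = []
for fragment in fragments:
    seen += fragment
    snapshots.append(try_parse(seen))

# snapshots == [None, None, {"query": "refund policy", "limit": 5}]
```

After the first fragment the query reads `refund`, which is only a prefix of the final `refund policy`, so a search started at that point would use the wrong input. A successful parse is a useful sanity check, but the provider's end-of-call event is the safer signal that execution can start.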
FAQ
What is tool call streaming?
Tool call streaming is the API feature where the model emits a tool call (function name + arguments) incrementally as it generates — letting the client start preparing execution before the full call is complete.
When should I use tool call streaming?
Latency-critical voice / chat agents. Tools with high setup cost (DB connections, model loads).
What are the most common mistakes with tool call streaming?
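Acting on partial arguments before finalization: a value seen so far may be only a prefix of the final one, and the JSON does not parse until the call is complete. Skipping streaming in latency-sensitive apps, which leaves up to 500ms per tool call on the floor.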
Related terms
- Function calling (tool use) — Function calling lets a language model emit a structured request to invoke a developer-defined tool, enabling reliable JSON output and agent workflows.
- Response streaming — Response streaming pipes the model's output token-by-token to the client as it's generated, so users see text appearing in real time instead of waiting for the full answer.
- Tool use (LLM) — Tool use is the umbrella term for any LLM mechanism that lets the model invoke external functions, APIs, or services — function calling, code interpreter, MCP servers, browser actions.
Last updated: 2026-06-01.