
Tool call streaming

Tool call streaming is an API feature in which the model emits a tool call (function name and arguments) incrementally as it generates it, so the client can start preparing execution before the full call is complete.

Standard tool calling buffers the model's response until the full function name and JSON arguments have been generated, then delivers them as a single event. For agents whose tools take seconds to run, that buffering is wasted time. Streaming tool calls reverse this: the API emits partial deltas as tokens arrive (`tool_name: 'search_'`, `tool_name: 'search_db'`, `arguments: '{"query":"'...`), so the client can begin setup that depends only on the tool name, such as opening the DB connection or prefetching the embedding model, before the arguments finish. The call itself should still wait for the complete arguments. By 2026, OpenAI, Anthropic, Google, and the Vercel AI SDK all expose streaming tool calls. Production benefit: roughly 200-500 ms saved per tool call in latency-sensitive voice and chat agents.
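The flow described above can be sketched in a few lines. This is a minimal illustration, not any vendor's API: the event shape (`name_delta`, `args_delta`, `done`), the `ToolCallAccumulator` class, and the `run` helper are all assumptions made for the example, and the stream is simulated with a plain list. Real providers use their own event names and may deliver the tool name in a single event.

```python
import json
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class ToolCallAccumulator:
    """Collects streamed fragments of one tool call (hypothetical event shape)."""

    on_name_ready: Callable[[str], None]
    name: str = ""
    raw_args: str = ""
    warmed_up: bool = False

    def _warm_up_once(self) -> None:
        # Setup that depends only on the tool name (e.g. opening a connection).
        if not self.warmed_up:
            self.warmed_up = True
            self.on_name_ready(self.name)

    def feed(self, event: dict) -> Optional[dict]:
        """Consume one event; return the finished call on 'done', else None."""
        kind = event["type"]
        if kind == "name_delta":
            self.name += event["text"]
        elif kind == "args_delta":
            # The first argument fragment implies the name is final.
            self._warm_up_once()
            self.raw_args += event["text"]
        elif kind == "done":
            self._warm_up_once()  # covers tools called with no arguments
            # Parse only now: a partial JSON prefix is neither valid nor final.
            return {"name": self.name, "arguments": json.loads(self.raw_args or "{}")}
        return None


def run(events: Iterable[dict]) -> list:
    """Drive the accumulator over a stream and record what happened, in order."""
    log = []
    acc = ToolCallAccumulator(on_name_ready=lambda n: log.append(("warmup", n)))
    for event in events:
        call = acc.feed(event)
        if call is not None:
            log.append(("execute", call["name"], call["arguments"]))
    return log


# Simulated stream standing in for a provider's streaming response.
SIMULATED_STREAM = [
    {"type": "name_delta", "text": "search_"},
    {"type": "name_delta", "text": "db"},
    {"type": "args_delta", "text": '{"query":"'},
    {"type": "args_delta", "text": 'refund policy"}'},
    {"type": "done"},
]

print(run(SIMULATED_STREAM))
# [('warmup', 'search_db'), ('execute', 'search_db', {'query': 'refund policy'})]
```

The warm-up fires as soon as the first argument fragment arrives, while execution waits for the `done` event. In a real agent the warm-up callback would typically start an asynchronous task so that setup overlaps with the remaining generation time.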

When to use tool call streaming

Latency-critical voice and chat agents. Tools with high setup cost (DB connections, model loads).

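Common mistakes

Acting on partial arguments before the call is finalized. Skipping streaming in latency-sensitive apps.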

FAQ

What is tool call streaming?

Tool call streaming is an API feature in which the model emits a tool call (function name and arguments) incrementally as it generates it, so the client can start preparing execution before the full call is complete.

When should I use tool call streaming?

Use it in latency-critical voice and chat agents, and for tools with high setup cost (opening DB connections, loading models).

What are the most common mistakes with tool call streaming?

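Acting on partial arguments before finalization: the arguments seen so far are only a prefix, and later tokens can extend a value or add fields, so the final call may differ from what an early read suggests. Skipping streaming in latency-sensitive apps, which leaves up to 500 ms per tool call on the floor.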
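To make the first mistake concrete, here is a small sketch of how a prematurely "repaired" argument prefix can produce a plausible but wrong call. The fragment contents, the `finalized_arguments` helper, and the `ArgumentsNotFinal` exception are hypothetical names chosen for illustration; they do not come from any provider SDK.

```python
import json
from typing import List


class ArgumentsNotFinal(Exception):
    """Raised when code tries to read arguments before the stream has finished."""


def finalized_arguments(fragments: List[str], finished: bool) -> dict:
    """Return parsed arguments, but only once the tool call is complete."""
    if not finished:
        raise ArgumentsNotFinal("tool call is still streaming")
    return json.loads("".join(fragments))


# Two argument fragments as they might arrive over a stream (made-up data).
FRAGMENTS = ['{"query":"refund', ' policy", "limit": 5}']

# Mistake: close the JSON by hand after the first fragment and act on it.
premature = json.loads(FRAGMENTS[0] + '"}')

# Safer: wait for the end of the stream, then parse everything at once.
final = finalized_arguments(FRAGMENTS, finished=True)

print(premature)  # {'query': 'refund'}
print(final)      # {'query': 'refund policy', 'limit': 5}
```

The premature read parses cleanly, which is what makes this mistake easy to miss: nothing fails, but the query is truncated and the `limit` field is absent.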

Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/tool-call-streaming.md.