# Tool call streaming

**Source:** https://promtable.com/glossary/tool-call-streaming

> Tool call streaming is an API feature in which the model emits a tool call (function name and arguments) incrementally as it generates, so the client can start preparing execution before the full call is complete.

---

Standard tool calling buffers the model's response until the full function name and JSON arguments have been generated, then delivers them as a single event. For agents with multi-second tool latency, that buffering is wasted time. Streaming tool calls reverse this: the API emits partial deltas (`tool_name: 'search_'`, `tool_name: 'search_db'`, `arguments: '{"query":"'...`), so the client can begin setup (opening the DB connection, prefetching the embedding model) before the arguments finish.

As of 2026, OpenAI, Anthropic, Google, and the Vercel AI SDK all expose streaming tool calls. The production benefit is 200-500 ms saved per tool call in latency-sensitive voice and chat agents.
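The delta flow described above can be sketched in Python. This is a minimal illustration under stated assumptions, not any provider's actual API: the event shape (`name_delta`, `arguments_delta`, `done`), the `ToolCallAccumulator` class, and the `on_name_ready` callback are all hypothetical names chosen for the example. It treats the first argument fragment as the signal that the tool name is complete, which is the earliest safe point to begin setup.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToolCallAccumulator:
    """Collects streamed fragments of one tool call (hypothetical event shape)."""

    on_name_ready: Callable[[str], None]  # setup hook, e.g. open a DB connection
    name: str = ""
    raw_arguments: str = ""
    setup_started: bool = False

    def _start_setup(self) -> None:
        # Fire the setup hook at most once per tool call.
        if not self.setup_started:
            self.setup_started = True
            self.on_name_ready(self.name)

    def feed(self, event: dict) -> Optional[dict]:
        """Consume one delta; return the finished call only on the final event."""
        kind = event["type"]
        if kind == "name_delta":
            self.name += event["text"]
        elif kind == "arguments_delta":
            # The first argument fragment implies the name is complete.
            self._start_setup()
            self.raw_arguments += event["text"]
        elif kind == "done":
            self._start_setup()  # covers tools called without arguments
            return {
                "name": self.name,
                "arguments": json.loads(self.raw_arguments or "{}"),
            }
        return None


started = []
acc = ToolCallAccumulator(on_name_ready=started.append)
stream = [
    {"type": "name_delta", "text": "search_"},
    {"type": "name_delta", "text": "db"},
    {"type": "arguments_delta", "text": '{"query":"'},
    {"type": "arguments_delta", "text": 'cats"}'},
    {"type": "done"},
]
final_call = None
for event in stream:
    result = acc.feed(event)
    if result is not None:
        final_call = result

print(started)     # ['search_db']
print(final_call)  # {'name': 'search_db', 'arguments': {'query': 'cats'}}
```

In this sketch the setup hook runs while the arguments are still arriving, and the arguments are parsed only after the final event, so nothing executes on incomplete input.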

## When to use

- Latency-critical voice / chat agents.
- Tools with high setup cost (DB connections, model loads).

## Common mistakes

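- Acting on partial arguments before the call is finalized. Until the last delta arrives the arguments are an incomplete JSON fragment, and later tokens can extend or change a value that was read early. Use partial data for setup only.
- Skipping streaming in latency-sensitive apps, which leaves up to 500 ms per tool call on the floor.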
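The first mistake can be made concrete with a short Python sketch. The helper name `parse_if_complete` and the fragment sequence are assumptions made up for illustration. The example shows that intermediate buffers are not valid JSON, and that the text visible partway through (`cats`) is not the final value. A successful parse stands in for finalization here; where a provider sends an explicit completion event, that signal is the better trigger.

```python
import json
from typing import Optional


def parse_if_complete(raw_arguments: str) -> Optional[dict]:
    """Return the arguments as a dict only if the buffer is a complete JSON object."""
    try:
        parsed = json.loads(raw_arguments)
    except json.JSONDecodeError:
        return None  # still mid-stream: do not execute the tool yet
    return parsed if isinstance(parsed, dict) else None


# Hypothetical argument fragments for a single tool call.
fragments = ['{"query":"', "cats", " and dogs", '"}']

buffer = ""
snapshots = []
for fragment in fragments:
    buffer += fragment
    snapshots.append(parse_if_complete(buffer))

print(snapshots)
# [None, None, None, {'query': 'cats and dogs'}]
```

A client that had run the search after the second fragment would have queried for `cats` rather than `cats and dogs`.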

## Related terms

- [function-calling](https://promtable.com/glossary/function-calling)
- [response-streaming](https://promtable.com/glossary/response-streaming)
- [tool-use](https://promtable.com/glossary/tool-use)

*Last updated: 2026-06-01*

---

Original page: https://promtable.com/glossary/tool-call-streaming
Maintained by Promtable (https://promtable.com). Content: CC BY 4.0. Cite as "Promtable — https://promtable.com/glossary/tool-call-streaming".
Contact: info@vibecodingturkey.com.