Web retrieval tool
A web retrieval tool is an agent-callable API that fetches live web content for LLMs. Tavily, Exa, Serper, the Brave Search API, and the Perplexity API are the leading options in 2026, typically exposed as the 'web search' function in tool-use loops.
LLMs without web retrieval are limited to what they learned before their [[knowledge-cutoff]]. Web retrieval tools address this: the agent emits a tool call such as `search(query)`, and the tool returns either raw search engine results pages (SERPs; Serper, Brave) or pre-summarized answers with citations (Tavily, Perplexity). The trade-off: raw SERPs give the agent control over what to read but cost more reasoning tokens, while pre-summarized answers are cheaper but lossy. Production patterns: voice and phone agents need sub-second latency (Tavily, Perplexity); research agents need depth and citations (Exa, Perplexity); high-volume bulk extraction needs the cheapest raw SERPs (Serper). Most agents in 2026 wrap several tools, calling a cheap SERP provider first and an expensive LLM-summarized one only when the SERP results fall short.
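The "cheap SERP first, summarized fallback" pattern described above can be sketched roughly as follows. This is a minimal illustration, not any provider's actual client: `SearchResult`, `cascade_search`, `fake_serp`, and `fake_summarizer` are hypothetical names, and the two backends are stand-in functions where a real agent would call a vendor SDK or HTTP API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class SearchResult:
    source: str                      # which backend produced the result
    snippets: List[str]              # raw SERP snippets, or one summarized answer
    citations: List[str] = field(default_factory=list)  # URLs backing the result


# A backend takes a query and returns a result, or None if it found nothing.
Backend = Callable[[str], Optional[SearchResult]]


def cascade_search(
    query: str,
    cheap: Backend,
    expensive: Backend,
    min_snippets: int = 1,
) -> Optional[SearchResult]:
    """Try the cheap raw-SERP backend first; fall back to the summarized one."""
    result = cheap(query)
    if result is not None and len(result.snippets) >= min_snippets:
        return result
    # The cheap backend returned nothing usable, so pay for the expensive call.
    return expensive(query)


# Hypothetical stand-ins for real providers (assumptions for illustration only).
def fake_serp(query: str) -> Optional[SearchResult]:
    if "weather" in query:
        return SearchResult("serp", ["Sunny, 21 C"], ["https://example.com/weather"])
    return None


def fake_summarizer(query: str) -> Optional[SearchResult]:
    return SearchResult("summarizer", [f"Summary for: {query}"], ["https://example.com/a"])


if __name__ == "__main__":
    print(cascade_search("weather in Oslo", fake_serp, fake_summarizer).source)
    print(cascade_search("obscure topic", fake_serp, fake_summarizer).source)
```

The `min_snippets` threshold is one possible definition of "SERP fails"; a production agent might instead check relevance scores or let the model decide whether the snippets answer the question.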
When to use a web retrieval tool
- Any agent that needs post-cutoff or live-web information.
Common mistakes
- Hitting web retrieval on every turn. Cache results and refresh only when the query is novel.
- Trusting summarized answers without checking citations. LLM-summarized tools can hallucinate.
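Both mistakes can be guarded against with a thin wrapper around whatever search function the agent uses. The sketch below is illustrative only: `CachedSearch`, `normalize`, and `has_usable_citations` are hypothetical names, the result is assumed to be a dict with an optional `citations` list of URLs, and the 300-second TTL is an arbitrary default.

```python
import time
from typing import Callable, Dict, Tuple


def normalize(query: str) -> str:
    """Collapse case and whitespace so trivially different queries share a cache key."""
    return " ".join(query.lower().split())


class CachedSearch:
    """Wraps a search function with a small time-to-live (TTL) cache."""

    def __init__(
        self,
        search_fn: Callable[[str], dict],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._search_fn = search_fn
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, dict]] = {}
        self.backend_calls = 0  # how many times the real backend was hit

    def search(self, query: str) -> dict:
        key = normalize(query)
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # fresh cached result: skip the backend
        self.backend_calls += 1
        result = self._search_fn(query)
        self._cache[key] = (now, result)
        return result


def has_usable_citations(result: dict) -> bool:
    """Shallow check: at least one citation, and every citation looks like a URL.

    This does not verify that the cited pages support the answer.
    """
    citations = result.get("citations", [])
    return bool(citations) and all(
        str(url).startswith(("http://", "https://")) for url in citations
    )
```

A result that fails `has_usable_citations` is a reasonable trigger for re-querying or for fetching the cited pages directly. Confirming that a citation actually supports the summarized claim still requires reading the source.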
FAQ
What is a web retrieval tool?
A web retrieval tool is an agent-callable API that fetches live web content for LLMs. Tavily, Exa, Serper, the Brave Search API, and the Perplexity API are the leading options in 2026, typically exposed as the 'web search' function in tool-use loops.
When should I use a web retrieval tool?
Whenever an agent needs post-cutoff or live-web information.
What are the most common mistakes with web retrieval tools?
Hitting web retrieval on every turn: cache results and refresh only when the query is novel. Trusting summarized answers without checking citations: LLM-summarized tools can hallucinate.
Related terms
- AI search engine — An AI search engine answers a user's query by retrieving relevant web sources and synthesising a cited answer with a language model — the category that includes Perplexity, ChatGPT Search, Claude with web, and Gemini AI Overviews.
- Real-time knowledge — Real-time knowledge is an LLM's access to information from the past minutes/hours/days via live data feeds — Grok's X firehose, Perplexity's web search, ChatGPT's browse — separate from the model's static training cutoff.
- Tool use (LLM) — Tool use is the umbrella term for any LLM mechanism that lets the model invoke external functions, APIs, or services — function calling, code interpreter, MCP servers, browser actions.
Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/web-retrieval-tool.md.