Browser agent
A browser agent is an LLM-driven system that controls a real or headless web browser to navigate sites, fill forms, click, and extract data — automating tasks that require interacting with web UIs.
Browser agents emerged as a serious category in 2025-2026 with Anthropic Computer Use, OpenAI Operator, Browser Use (open source), Cline browser tools, and many others. The model receives screenshots or DOM, decides on the next action (click coordinates, type text, navigate), and the harness executes it. Use cases include scraping behind login walls, automating QA, filling SaaS dashboards, and consumer-task automation (booking, shopping). Production challenges: long horizons, brittle selectors, captchas, prompt injection from page content, and cost (every step is a vision + reasoning call). Best practice in 2026 is to scope agents tightly and combine browser actions with traditional APIs whenever a direct API exists.
When to use browser agent
- Automating workflows behind logins or paywalls without APIs.
- QA + visual regression on web apps.
- Bridging legacy SaaS dashboards into automation pipelines.
Common mistakes
- Letting the agent click freely on sensitive pages — sandbox actions.
- No max-step cap — costs blow up on long-horizon tasks.
- Trusting page content as instructions — prompt injection via web pages is the dominant attack.
FAQ
What is browser agent?
A browser agent is an LLM-driven system that controls a real or headless web browser to navigate sites, fill forms, click, and extract data — automating tasks that require interacting with web UIs.
When should I use browser agent?
Automating workflows behind logins or paywalls without APIs. QA + visual regression on web apps. Bridging legacy SaaS dashboards into automation pipelines.
What are the most common mistakes with browser agent?
Letting the agent click freely on sensitive pages — sandbox actions. No max-step cap — costs blow up on long-horizon tasks. Trusting page content as instructions — prompt injection via web pages is the dominant attack.
Related terms
- AI agent — An AI agent is a system where a language model autonomously plans and executes a sequence of tool calls to accomplish a goal.
- ReAct pattern — ReAct interleaves Reasoning + Acting in an agent loop — the model writes a thought, then decides to call a tool, then observes the result, then thinks again.
- Prompt injection — Prompt injection is an attack where hostile content in a model's input (a webpage, a retrieved document, a user message) overrides the system prompt's instructions.
- Guardrails — Guardrails are deterministic checks layered around a language model to prevent unsafe, off-topic, or non-compliant outputs from reaching the user.
Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/browser-agent.md.