Computer use
Computer use is the agent capability in which an LLM controls a real desktop or browser through screenshots and mouse/keyboard primitives. Anthropic introduced it in 2024, and by 2026 it is mainstream across Claude, GPT, and Gemini.
Computer use lets an agent drive any GUI by treating the screen as input and emitting mouse and keyboard actions. The model sees a screenshot, decides where to click or what to type, executes the primitive, sees the next screenshot, and repeats. By 2026 all three frontier labs (Anthropic Claude, OpenAI, Google Gemini) expose computer-use APIs. Production uses include filling out web forms on sites with no API, navigating legacy software, running browser-based agents (Browserbase, Browserless), and generating QA tests. Failure modes include visual grounding errors (a click that misses its target by 10px), prompt injection via on-screen text, high latency (multiple seconds per step), and high cost (image tokens plus reasoning on every turn). Agent sandboxes ([[agent-sandbox]]) are usually required for safety.
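The screenshot, decide, execute, repeat cycle described above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not any vendor's API: `Action`, `FakeScreen`, `run_agent`, and `scripted_decide` are hypothetical names, the "screen" is an in-memory stub, and the `decide` callable stands in for the model call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    kind: str        # "click", "type", or "done" (hypothetical action vocabulary)
    x: int = 0
    y: int = 0
    text: str = ""


class FakeScreen:
    """Hypothetical stand-in for a sandboxed desktop or browser."""

    def __init__(self) -> None:
        self.log: list[Action] = []

    def screenshot(self) -> bytes:
        # A real environment would return encoded image bytes (for example PNG).
        return f"frame-{len(self.log)}".encode()

    def execute(self, action: Action) -> None:
        # A real environment would move the mouse or send keystrokes here.
        self.log.append(action)


def run_agent(
    screen: FakeScreen,
    decide: Callable[[bytes, str], Action],
    goal: str,
    max_steps: int = 20,
) -> bool:
    """Observe, decide, act, repeat. Returns True if the policy reported done."""
    for _ in range(max_steps):
        frame = screen.screenshot()      # observe
        action = decide(frame, goal)     # model call in a real system
        if action.kind == "done":
            return True
        screen.execute(action)           # act, then loop to the next screenshot
    return False                         # step budget exhausted


def scripted_decide(frame: bytes, goal: str) -> Action:
    """Stub policy standing in for the model: click, type, then finish."""
    step = int(frame.decode().split("-")[1])
    plan = [Action("click", x=120, y=340), Action("type", text="hello")]
    return plan[step] if step < len(plan) else Action("done")
```

Calling `run_agent(FakeScreen(), scripted_decide, "fill the form")` runs the two scripted actions and then stops. The `max_steps` cap is a design choice for the sketch: because every step costs a screenshot and a model call, an explicit budget keeps a confused agent from looping indefinitely.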
When to use computer use
- Software with no API.
- Browser-based agents.
- Generating UI test scripts.
When not to use computer use
- An API exists. Use the API instead: computer use is roughly 100× slower and also more expensive.
Common mistakes
- Skipping the sandbox: an unsandboxed agent can click 'Buy now' or send email.
- Underestimating latency: production loops take multiple seconds per step.
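Both mistakes can be partly mitigated in the harness around the agent. The sketch below is an illustrative complement to a sandbox, not a replacement for one: it asks for confirmation before clicks whose target looks irreversible and returns the time spent so per-step latency can be logged. All names (`RISKY_LABELS`, `needs_confirmation`, `guarded_click`) are hypothetical, and it assumes the harness can obtain a text label for the element about to be clicked.

```python
import time
from typing import Callable

# Illustrative labels only; a real deployment would define its own policy.
RISKY_LABELS = ("buy now", "place order", "send", "delete")


def needs_confirmation(target_label: str) -> bool:
    """True if the label of the element about to be clicked looks irreversible."""
    label = target_label.strip().lower()
    return any(risky in label for risky in RISKY_LABELS)


def guarded_click(
    target_label: str,
    click: Callable[[], None],
    confirm: Callable[[str], bool],
) -> tuple[bool, float]:
    """Run click() unless the label is risky and confirm() declines.

    Returns (executed, seconds_spent) so per-step latency can be logged.
    """
    start = time.perf_counter()
    executed = False
    if not needs_confirmation(target_label) or confirm(target_label):
        click()
        executed = True
    return executed, time.perf_counter() - start
```

In use, `confirm` might prompt a human or consult an allowlist. The substring match is deliberately conservative, so it will also flag harmless labels such as "Resend code"; that trade-off favors extra confirmations over an unintended purchase or email.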
FAQ
What is computer use?
Computer use is the agent capability in which an LLM controls a real desktop or browser through screenshots and mouse/keyboard primitives. Anthropic introduced it in 2024, and by 2026 it is mainstream across Claude, GPT, and Gemini.
When should I use computer use?
Use it for software with no API, for browser-based agents, and for generating UI test scripts.
What are the most common mistakes with computer use?
Skipping the sandbox, which leaves the agent free to click 'Buy now' or send email, and underestimating latency: production loops take multiple seconds per step.
Related terms
- Agent sandbox — An agent sandbox is the isolated execution environment where an LLM-driven agent runs code, browses, or controls a desktop — the safety boundary that contains prompt-injection blast radius.
- Browser agent — A browser agent is an LLM-driven system that controls a real or headless web browser to navigate sites, fill forms, click, and extract data — automating tasks that require interacting with web UIs.
- Tool use (LLM) — Tool use is the umbrella term for any LLM mechanism that lets the model invoke external functions, APIs, or services — function calling, code interpreter, MCP servers, browser actions.
Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/computer-use.md.