technique

Computer use

Computer use is the agent capability in which an LLM controls a real desktop or browser through screenshots and mouse/keyboard primitives. Anthropic introduced it in 2024, and by 2026 it is mainstream across Claude, GPT, and Gemini.

Computer use lets agents drive any GUI by treating the screen as input and emitting mouse and keyboard actions. The model sees a screenshot, decides where to click or what to type, executes that primitive, sees the next screenshot, and repeats. By 2026 all three frontier labs (Anthropic, OpenAI, Google) expose computer-use APIs. Production uses include filling out web forms on sites with no API, navigating legacy software, running browser-based agents (Browserbase, Browserless), and generating QA tests. Failure modes include visual grounding errors (a click that lands 10px off target), prompt injection via on-screen text, high latency (multiple seconds per step), and high cost (image tokens plus reasoning on every turn). Agent sandboxes ([[agent-sandbox]]) are usually required for safety.
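The screenshot, decide, execute, repeat cycle described above can be sketched as a small loop. This is a minimal illustration, not any vendor's API: the `Action` shape, the `run_computer_use_loop` name, and the three callables (`take_screenshot`, `decide`, `execute`) are all hypothetical placeholders for whatever screen-capture, model, and input-injection layers a real harness provides.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    """One mouse/keyboard primitive (hypothetical shape, not a vendor schema)."""
    kind: str          # "click", "type", or "done"
    x: int = 0         # screen coordinates, used by "click"
    y: int = 0
    text: str = ""     # text to enter, used by "type"


def run_computer_use_loop(
    take_screenshot: Callable[[], bytes],
    decide: Callable[[bytes, str], Action],
    execute: Callable[[Action], None],
    goal: str,
    max_steps: int = 20,
) -> int:
    """Observe, decide, act, repeat. Returns the number of actions executed."""
    executed = 0
    for _ in range(max_steps):             # hard cap so the loop always ends
        screenshot = take_screenshot()     # observe the current screen
        action = decide(screenshot, goal)  # the model picks the next primitive
        if action.kind == "done":          # the model reports the goal is met
            break
        execute(action)                    # perform the click or keystroke
        executed += 1
    return executed
```

The `max_steps` cap is a design choice for this sketch: because every step costs a screenshot plus a model call, bounding the loop also bounds latency and spend when the model never reports that it is done.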

When to use computer use

When not to use computer use

Common mistakes

FAQ

What is computer use?

Computer use is the agent capability in which an LLM controls a real desktop or browser through screenshots and mouse/keyboard primitives. Anthropic introduced it in 2024, and by 2026 it is mainstream across Claude, GPT, and Gemini.

When should I use computer use?

Use it for software that has no API, for browser-based agents, and for generating UI test scripts.

What are the most common mistakes with computer use?

Skipping the sandbox: an unsandboxed agent can click 'Buy now' or send an email. Underestimating latency: production loops take multiple seconds per step.
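One way to reduce the 'Buy now' risk, in addition to a sandbox and not as a replacement for it, is an approval gate in front of irreversible clicks. The sketch below is a rough illustration under stated assumptions: the phrase list, the function names, and the idea that the harness can obtain a text label for the click target are all hypothetical.

```python
from typing import Callable, Iterable

# Illustrative phrases only (an assumption); a real deployment needs its own policy.
RISKY_PHRASES = ("buy now", "place order", "send")


def needs_approval(target_label: str,
                   risky_phrases: Iterable[str] = RISKY_PHRASES) -> bool:
    """Return True if the label of the click target contains a risky phrase."""
    label = target_label.lower()
    return any(phrase in label for phrase in risky_phrases)


def guarded_click(target_label: str,
                  click: Callable[[], None],
                  approve: Callable[[str], bool]) -> bool:
    """Click only if the target is harmless or a reviewer approves it.

    Returns True if the click was performed, False if it was blocked.
    """
    if needs_approval(target_label) and not approve(target_label):
        return False   # blocked: risky target and no approval given
    click()
    return True
```

Substring matching on labels is easy to evade and prone to false positives, so a gate like this only narrows the risk; isolation in a sandbox remains the primary safeguard.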

Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/computer-use.md.