AI agent design cheatsheet (2026 patterns that ship)
Production-tested AI agent design patterns in 2026: planner-executor split, tool design, context distillation, budget caps, eval discipline, and the antipatterns that bite.
Architecture patterns
Pick a pattern based on horizon and branching.
| Item | Description | Example |
|---|---|---|
ReAct loop | Thought → action → observation, repeat. Good for short horizons (≤5 steps). | Quick research, single-tool retrieval |
Planner-executor | Planner decomposes goal into subtasks; executor runs each with its own budget. Default for >7 steps. | Multi-file code refactor |
Graph (state machine) | Explicit nodes + transitions; LangGraph default. Easiest to debug. | Workflow with branching, retries |
Critic-actor | Critic scores output; actor revises until critic approves. Quality over cost. | Code generation with iterative fix loops |
Tool design rules
| Item | Description | Example |
|---|---|---|
Name by intent | search_company_news, not http_get_news_api. The model uses names as docs. | |
Description = documentation | 1-2 sentences per tool. Include when NOT to use it. | |
Tight schemas | JSON Schema with descriptive field names. Required vs optional matters. | |
Readable errors | "Invalid ticker 'XYZ' — try 4-letter ticker like 'AAPL'" beats "400 Bad Request". | |
Small results | Return summaries, not 10K-token JSON. The model has to fit them in next step's context. | |
Tool count <30 | Past that, mis-routing climbs. Split into sub-agents. | |
Context engineering
| Item | Description | Example |
|---|---|---|
Distill the loop | Every N steps, rewrite history as a tight summary. Drop verbatim tool outputs after extracting what matters. | |
System prompt for constants | Tools, role, format — anything stable. Maximises prompt caching. | |
User turn for variables | Task, latest tool result, next step. | |
Beware lost-in-middle | Put critical instructions at head + tail of the prompt. | |
Budgets + guardrails (mandatory)
| Item | Description | Example |
|---|---|---|
Max-step cap | Hard ceiling on loop iterations (e.g. 20). Set at framework level. | |
Token budget | Total tokens across the loop. Kill if exceeded. | |
Wall-clock budget | Total time. Important for user-facing agents. | |
No-progress detector | If plan / summary hasn't changed in N steps, stop and escalate. | |
Refusal whitelist | Explicit list of actions the agent must never take. | |
Routing
| Item | Description | Example |
|---|---|---|
Cheap router model | GPT-4o-mini / Claude Haiku picks which tool to call. Fast, near-free. | |
Strong executor model | Claude 4.6 / GPT-4o runs the actual step. | |
Reasoning model selectively | Only for planning + hardest subtasks. Save the budget. | |
Eval discipline
| Item | Description | Example |
|---|---|---|
End-to-end success | Score on 50-200 golden tasks. The metric that matters. | |
Step-level correctness | Was each tool right? Were args right? For debugging. | |
Failure mode taxonomy | Wrong tool, wrong args, no progress, hallucinated result, timeout. Count each. | |
Regression alarm | Run evals on every prompt / tool change. Block deploy on N% drop. | |
FAQ
Best AI agent framework in 2026?
LangGraph for graph-based; OpenAI Agents SDK if OpenAI-native; Claude Agent SDK if Anthropic-native; CrewAI for multi-agent. All credible.
How many tools is too many for one agent?
Past ~30 tools per agent, mis-routing climbs. Split into specialised sub-agents instead.
Should I use a planner-executor or ReAct?
ReAct for ≤5 steps; planner-executor for longer horizons.
Last updated: 2026-06-01.