Cheatsheet

AI agent design cheatsheet (2026 patterns that ship)

Production-tested AI agent design patterns in 2026: planner-executor split, tool design, context distillation, budget caps, eval discipline, and the antipatterns that bite.

Architecture patterns

Pick a pattern based on horizon and branching.

ItemDescriptionExample
ReAct loopThought → action → observation, repeat. Good for short horizons (≤5 steps).Quick research, single-tool retrieval
Planner-executorPlanner decomposes goal into subtasks; executor runs each with its own budget. Default for >7 steps.Multi-file code refactor
Graph (state machine)Explicit nodes + transitions; LangGraph default. Easiest to debug.Workflow with branching, retries
Critic-actorCritic scores output; actor revises until critic approves. Quality over cost.Code generation with iterative fix loops

Tool design rules

ItemDescriptionExample
Name by intentsearch_company_news, not http_get_news_api. The model uses names as docs.
Description = documentation1-2 sentences per tool. Include when NOT to use it.
Tight schemasJSON Schema with descriptive field names. Required vs optional matters.
Readable errors"Invalid ticker 'XYZ' — try 4-letter ticker like 'AAPL'" beats "400 Bad Request".
Small resultsReturn summaries, not 10K-token JSON. The model has to fit them in next step's context.
Tool count <30Past that, mis-routing climbs. Split into sub-agents.

Context engineering

ItemDescriptionExample
Distill the loopEvery N steps, rewrite history as a tight summary. Drop verbatim tool outputs after extracting what matters.
System prompt for constantsTools, role, format — anything stable. Maximises prompt caching.
User turn for variablesTask, latest tool result, next step.
Beware lost-in-middlePut critical instructions at head + tail of the prompt.

Budgets + guardrails (mandatory)

ItemDescriptionExample
Max-step capHard ceiling on loop iterations (e.g. 20). Set at framework level.
Token budgetTotal tokens across the loop. Kill if exceeded.
Wall-clock budgetTotal time. Important for user-facing agents.
No-progress detectorIf plan / summary hasn't changed in N steps, stop and escalate.
Refusal whitelistExplicit list of actions the agent must never take.

Routing

ItemDescriptionExample
Cheap router modelGPT-4o-mini / Claude Haiku picks which tool to call. Fast, near-free.
Strong executor modelClaude 4.6 / GPT-4o runs the actual step.
Reasoning model selectivelyOnly for planning + hardest subtasks. Save the budget.

Eval discipline

ItemDescriptionExample
End-to-end successScore on 50-200 golden tasks. The metric that matters.
Step-level correctnessWas each tool right? Were args right? For debugging.
Failure mode taxonomyWrong tool, wrong args, no progress, hallucinated result, timeout. Count each.
Regression alarmRun evals on every prompt / tool change. Block deploy on N% drop.

FAQ

Best AI agent framework in 2026?

LangGraph for graph-based; OpenAI Agents SDK if OpenAI-native; Claude Agent SDK if Anthropic-native; CrewAI for multi-agent. All credible.

How many tools is too many for one agent?

Past ~30 tools per agent, mis-routing climbs. Split into specialised sub-agents instead.

Should I use a planner-executor or ReAct?

ReAct for ≤5 steps; planner-executor for longer horizons.

Last updated: 2026-06-01.