LangGraph agent cheatsheet (2026 production patterns)
Production LangGraph patterns: state machine design, node types, persistent checkpointing, human-in-the-loop, sub-agents, tracing, eval integration, and the antipatterns to skip.
State machine basics
LangGraph models agents as graphs of state transitions.
| Item | Description | Example |
|---|---|---|
| State | TypedDict or Pydantic model holding everything the graph needs to pass between nodes. | |
| Nodes | Functions that take the current state and return an update to it. Can be LLM calls, tool calls, conditionals, or arbitrary Python. | |
| Edges | Transitions between nodes. Static, conditional (branching), or interrupt-driven. | |
| START / END | Special nodes marking graph entry and termination. | |
Node types
| Item | Description | Example |
|---|---|---|
| LLM call | Standard inference: chat completion, structured output, function calling. | |
| Tool call | Invoke a tool, fold result into state. | |
| Conditional branch | Function returning the next node name based on state. | |
| Sub-graph | An entire LangGraph as a node, composing larger agents. | |
Checkpointing
Production graphs persist state across turns.
| Item | Description | Example |
|---|---|---|
| MemorySaver | In-memory checkpointer for dev. | |
| PostgresSaver / SqliteSaver | Persistent checkpointer for production. Stores graph state per thread_id. | |
| Thread IDs | Each conversation / session has a thread_id; checkpoints map to it. | |
| Replay | Restart from any checkpoint to debug or branch from a specific state. | |
Human-in-the-loop
| Item | Description | Example |
|---|---|---|
| interrupt_before / interrupt_after | Pause execution before / after specific nodes for human approval. | |
| Resume with state edit | Human can edit state mid-pause before resuming. Useful for correction loops. | |
| Time travel | Roll back to a previous checkpoint and try a different branch. | |
Sub-agents + handoffs
| Item | Description | Example |
|---|---|---|
| create_react_agent | Ready-made ReAct agent for many use cases. Wrap as a node in a larger graph. | |
| Supervisor + workers | Coordinator agent routes to specialist sub-agents based on intent. | |
| Handoff | Explicit node that transfers control to another agent's sub-graph. | |
Observability
| Item | Description | Example |
|---|---|---|
| LangSmith | First-party tracing: every node, every LLM call, every state transition logged. | |
| OpenTelemetry | Custom tracing for OTel-native shops. | |
| Evals | Pair traces with evals so regressions on a node are caught. | |
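
Because nodes are plain functions of state, a node-level eval can be as small as the sketch below. The golden cases, the 0.9 threshold, and the keyword classifier are hypothetical; in practice the cases would more likely be collected from traced production inputs, and the node would call a model.

```python
from typing import Callable, TypedDict


class State(TypedDict, total=False):
    ticket: str
    label: str


def classify(state: State) -> dict:
    """Node under test; a keyword rule stands in for the real LLM call."""
    label = "refund" if "money back" in state["ticket"].lower() else "other"
    return {"label": label}


# Hypothetical golden cases: (input state, expected label).
GOLDEN = [
    ({"ticket": "I want my money back"}, "refund"),
    ({"ticket": "How do I reset my password?"}, "other"),
    ({"ticket": "Money back please, the item arrived broken"}, "refund"),
]


def pass_rate(node: Callable[[State], dict], cases: list) -> float:
    """Fraction of golden cases where the node returns the expected label."""
    hits = sum(node(inp)["label"] == expected for inp, expected in cases)
    return hits / len(cases)


score = pass_rate(classify, GOLDEN)
THRESHOLD = 0.9  # assumed release gate; tune per node
print(f"classify pass rate: {score:.2f}")
if score < THRESHOLD:
    raise SystemExit("regression on node 'classify'")
```

Running a check like this in CI for each node keeps a prompt or model change from silently degrading one step of the graph.
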
Antipatterns
| Item | Description | Example |
|---|---|---|
| Mega node | A single LLM-call node doing too much. Split into smaller focused nodes. | |
| No checkpoint | Production graphs without checkpoints can't recover from failure mid-run. | |
| No max-step cap | Loops can run away. Always cap step count. | |
FAQ
Best LangGraph pattern for production?
Planner-executor split, Postgres checkpointer (PostgresSaver), human-in-the-loop on destructive actions, LangSmith tracing.
How do I prevent runaway loops?
Set a maximum step count, add no-progress detection, and stop the run when it exceeds its token budget.
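
The three guards in this answer can be combined in a small framework-independent helper, sketched below. `LoopGuard`, `BudgetExceeded`, and the default limits are hypothetical names and numbers, not part of LangGraph; a node or wrapper would call `check()` once per step with the current state and the tokens that step consumed.

```python
import hashlib
import json
from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    """Raised when a run should be stopped."""


@dataclass
class LoopGuard:
    max_steps: int = 20
    max_tokens: int = 50_000
    max_repeats: int = 2  # identical states tolerated before giving up
    steps: int = 0
    tokens: int = 0
    seen: dict = field(default_factory=dict)

    def check(self, state: dict, tokens_used: int) -> None:
        self.steps += 1
        self.tokens += tokens_used
        if self.steps > self.max_steps:
            raise BudgetExceeded("step cap reached")
        if self.tokens > self.max_tokens:
            raise BudgetExceeded("token budget exceeded")
        # No-progress detection: fingerprint the state and count repeats.
        payload = json.dumps(state, sort_keys=True).encode()
        digest = hashlib.sha256(payload).hexdigest()
        self.seen[digest] = self.seen.get(digest, 0) + 1
        if self.seen[digest] > self.max_repeats:
            raise BudgetExceeded("no progress: state repeated")


guard = LoopGuard(max_steps=10, max_tokens=1_000, max_repeats=2)
reason = None
try:
    while True:
        # The agent keeps producing the same plan, so the state never changes.
        guard.check({"plan": "retry search"}, tokens_used=100)
except BudgetExceeded as exc:
    reason = str(exc)

print(reason)
```

The state passed to `check()` must be JSON-serializable for the fingerprint to work; hashing only the fields that should change between steps is usually enough.
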
Is LangGraph too heavy for simple use cases?
Probably. For single-step LLM calls a plain SDK is fine; LangGraph pays off once orchestration goes past 3 to 5 steps.
Last updated: 2026-06-01.