LLM gateway
An LLM gateway is the proxy layer between your app and one or more LLM providers. It handles routing, fallback, caching, cost tracking, rate limiting, and observability. OpenRouter, LiteLLM, Portkey, Helicone, and Cloudflare AI Gateway are the leading options as of 2026.
Calling LLM APIs directly works for prototypes. Production apps quickly need more: model fallback when the primary provider errors, request caching for repeated prompts, per-team cost caps, audit logs for compliance, rate limiting against runaway loops, and multi-vendor routing for cost or quality. An LLM gateway centralizes all of this: the app talks to one OpenAI-compatible endpoint and the gateway handles the rest. The trade-offs depend on the deployment model. Hosted gateways (OpenRouter, Portkey) are zero-ops but add latency and a margin. Self-hosted gateways (LiteLLM) give full control and lower cost but carry an operational burden. SDK-only approaches (Vercel AI SDK) avoid the proxy hop but lose central governance. Most production AI stacks above small scale run an LLM gateway.
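To make two of those responsibilities concrete, here is a minimal in-process sketch of request caching and per-team cost caps. It is not how any of the gateways named above are implemented. `MiniGateway`, `BudgetExceeded`, `fake_provider`, and the flat $0.25 per call are all hypothetical names and numbers chosen for illustration, and the provider is a stub rather than a real API client.

```python
import hashlib
from collections import defaultdict
from typing import Callable, Dict, Tuple


class BudgetExceeded(Exception):
    """Raised when a team has used up its spending cap (hypothetical error type)."""


class MiniGateway:
    """Toy gateway: one upstream provider, a prompt cache, and per-team caps."""

    def __init__(
        self,
        provider: Callable[[str, str], Tuple[str, float]],
        team_caps_usd: Dict[str, float],
    ) -> None:
        self.provider = provider            # (model, prompt) -> (text, cost in USD)
        self.team_caps_usd = team_caps_usd  # teams without an entry are denied
        self.spend_usd: Dict[str, float] = defaultdict(float)
        self.cache: Dict[str, str] = {}
        self.provider_calls = 0

    def complete(self, team: str, model: str, prompt: str) -> str:
        # Cache key covers the model and the exact prompt text.
        key = hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()
        if key in self.cache:
            return self.cache[key]  # repeat prompt: no provider call, no spend

        # Enforce the cap before spending more money upstream.
        if self.spend_usd[team] >= self.team_caps_usd.get(team, 0.0):
            raise BudgetExceeded(team)

        text, cost = self.provider(model, prompt)
        self.provider_calls += 1
        self.spend_usd[team] += cost
        self.cache[key] = text
        return text


def fake_provider(model: str, prompt: str) -> Tuple[str, float]:
    """Stand-in for a real LLM API; charges an assumed flat $0.25 per call."""
    return f"[{model}] echo: {prompt}", 0.25
```

With `MiniGateway(fake_provider, {"search": 0.5})`, two distinct prompts from the `search` team exhaust the cap and a third raises `BudgetExceeded`, while repeating an earlier prompt is still served from the cache. A real gateway would also need cache expiry, streaming support, and shared state across processes, all of which this sketch leaves out.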
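When to use an LLM gateway
- Any production AI app at non-trivial scale, for example once you need fallback, shared cost caps, or audit logs across more than one team or provider.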
Common mistakes
- Deploying the gateway in a region far from your app or your providers: this adds latency to every LLM call.
- Configuring no fallback: a primary provider outage takes down your app.
FAQ
What is an LLM gateway?
An LLM gateway is the proxy layer between your app and one or more LLM providers. It handles routing, fallback, caching, cost tracking, rate limiting, and observability. OpenRouter, LiteLLM, Portkey, Helicone, and Cloudflare AI Gateway are the leading options as of 2026.
When should I use an LLM gateway?
In any production AI app at non-trivial scale.
What are the most common mistakes with an LLM gateway?
Deploying the gateway in a region far from your app or your providers, which adds latency to every LLM call, and configuring no fallback, so that a primary provider outage takes down your app.
Related terms
- Model router — A model router picks which language model handles each request based on cost, latency, or task type — the standard production pattern in 2026.
- Router fallback — A router fallback is a chain of model providers that the application tries in order — failing over from primary to secondary to tertiary on 429s, 500s, or quality thresholds.
- Cost attribution — Cost attribution is the FinOps discipline of tracking LLM spend per-user, per-feature, per-tenant, per-model — the foundation for unit economics, abuse detection, and pricing decisions in 2026 AI products.
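The fallback and cost attribution entries above can be sketched together. The example below is a rough illustration under stated assumptions: `Route`, `FallbackRouter`, `ProviderError`, the retryable status set, and the per-1k-token prices are all hypothetical, and the providers are stub functions. It fails over on HTTP status only and does not model quality thresholds.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Assumed set of statuses worth failing over on (rate limits and server errors).
RETRYABLE_STATUSES = {429, 500, 502, 503}


class ProviderError(Exception):
    """Hypothetical error carrying the HTTP status a provider returned."""

    def __init__(self, status: int) -> None:
        super().__init__(f"provider returned HTTP {status}")
        self.status = status


@dataclass
class Route:
    model: str
    usd_per_1k_tokens: float                # illustrative price, not a real quote
    call: Callable[[str], Tuple[str, int]]  # prompt -> (text, tokens used)


class FallbackRouter:
    """Tries routes in order and records spend per (tenant, feature, model)."""

    def __init__(self, routes: List[Route]) -> None:
        self.routes = routes
        self.ledger: Dict[Tuple[str, str, str], float] = defaultdict(float)

    def complete(self, tenant: str, feature: str, prompt: str) -> str:
        last_error: Optional[ProviderError] = None
        for route in self.routes:
            try:
                text, tokens = route.call(prompt)
            except ProviderError as err:
                if err.status not in RETRYABLE_STATUSES:
                    raise  # e.g. a 400: the next provider would not help
                last_error = err
                continue  # fail over to the next route in the chain
            # Attribute the cost to whichever model actually served the request.
            cost = tokens / 1000 * route.usd_per_1k_tokens
            self.ledger[(tenant, feature, route.model)] += cost
            return text
        raise RuntimeError("all providers in the chain failed") from last_error


def always_rate_limited(prompt: str) -> Tuple[str, int]:
    """Stub primary provider that is always rate limited."""
    raise ProviderError(429)


def healthy(prompt: str) -> Tuple[str, int]:
    """Stub secondary provider that answers and reports 2000 tokens used."""
    return f"ok: {prompt}", 2000
```

Given the chain `[Route("primary", 1.0, always_rate_limited), Route("secondary", 0.5, healthy)]`, a request is served by the secondary route and the ledger charges only that model. Keying the ledger by tenant, feature, and model is one possible granularity; per-user tracking would add another field to the key.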
Last updated: 2026-06-01.