Mixture of agents
Mixture of agents (MoA) is an inference pattern in which several LLM agents answer the same query in parallel and an aggregator model combines their outputs into a single answer. The aim is higher quality than any single agent can reach, at a higher cost.
Mixture of agents (Wang et al., 2024) generalises the model-router idea: instead of routing each query to one model, you send it to N models in parallel and have an aggregator LLM synthesise the best answer from all N responses. Each agent in the ensemble can come from a different model family (Claude, GPT, Gemini, open-weight), so the ensemble draws on different training distributions. Open-source implementations (Together AI's mixture-of-agents, MoA-Lite) are available as of 2026. The trade-off is roughly 3-7x the cost for a ~15-25% quality lift on hard benchmarks, so reserve MoA for high-stakes inference where being right matters more than being cheap.
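The fan-out and aggregation flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation from Wang et al. or Together AI: `make_stub`, `stub_aggregator`, `build_aggregator_prompt`, and the model names are hypothetical stand-ins, and a real setup would replace the stubs with calls to actual model clients.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical type: any async function that maps a prompt to a completion.
Model = Callable[[str], Awaitable[str]]


def make_stub(name: str) -> Model:
    """Stand-in for a real model client; returns a canned answer."""
    async def call(prompt: str) -> str:
        await asyncio.sleep(0)  # placeholder for network latency
        return f"[{name}] answer to: {prompt}"
    return call


async def stub_aggregator(prompt: str) -> str:
    """Stand-in aggregator; a real aggregator LLM would synthesise an
    answer, whereas this stub simply returns the prompt it was given."""
    return prompt


def build_aggregator_prompt(query: str, candidates: list[str]) -> str:
    """Pack the original query and all candidate answers into one prompt."""
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(candidates, 1))
    return (
        "Synthesise the single best answer to the question from the candidates.\n"
        f"Question: {query}\nCandidates:\n{numbered}"
    )


async def mixture_of_agents(
    query: str, proposers: list[Model], aggregator: Model
) -> str:
    # Fan out: every proposer sees the same query, and the calls run concurrently.
    candidates = await asyncio.gather(*(p(query) for p in proposers))
    # Fan in: one aggregator call sees all N candidate answers.
    return await aggregator(build_aggregator_prompt(query, list(candidates)))


proposers = [make_stub(n) for n in ("model-a", "model-b", "model-c")]
result = asyncio.run(mixture_of_agents("What is 2 + 2?", proposers, stub_aggregator))
print(result)
```

Because the proposer calls run concurrently, wall-clock latency is roughly the slowest proposer plus the aggregator call, while the billed cost is the sum of all calls.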
When to use mixture of agents
- High-stakes single-shot answers (medical, legal, finance summarisation).
- Hard reasoning benchmarks (MMLU-Pro, GPQA).
Common mistakes
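- Running MoA on every query: each query pays for N proposer calls plus an aggregator call, so cost scales fast.
- Using identical models in the ensemble: diversity is the point, and identical models add cost without adding different perspectives.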
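To make the first mistake concrete, the sketch below counts model calls for a single-layer ensemble and shows how gating MoA behind a high-stakes flag changes the average. It is a back-of-the-envelope illustration under loud assumptions: every call is treated as costing the same (which ignores the aggregator's longer prompt), and `moa_call_count`, `blended_calls_per_query`, and `choose_pipeline` are hypothetical helper names.

```python
def moa_call_count(n_proposers: int) -> int:
    """Model calls per query for a single-layer ensemble:
    one call per proposer plus one aggregator call."""
    return n_proposers + 1


def blended_calls_per_query(moa_fraction: float, n_proposers: int) -> float:
    """Average calls per query when only `moa_fraction` of traffic uses MoA
    and the rest is answered by a single model (1 call)."""
    return moa_fraction * moa_call_count(n_proposers) + (1 - moa_fraction) * 1


def choose_pipeline(high_stakes: bool) -> str:
    """Gate: send only high-stakes queries through the ensemble."""
    return "moa" if high_stakes else "single"


# Compare running MoA on everything with gating it to 10% of traffic.
always = blended_calls_per_query(1.0, n_proposers=4)
gated = blended_calls_per_query(0.1, n_proposers=4)
print(f"calls per query, MoA on everything: {always:.1f}")
print(f"calls per query, MoA on 10% of traffic: {gated:.1f}")
```

How a query gets flagged as high-stakes is left open here; in practice that decision could come from the caller, a classifier, or a model router.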
Related terms
- Model router — A model router picks which language model handles each request based on cost, latency, or task type — the standard production pattern in 2026.
- AI agent — An AI agent is a system where a language model autonomously plans and executes a sequence of tool calls to accomplish a goal.
- Reasoning model — A reasoning model is an LLM trained to produce extensive internal chain-of-thought before its final answer, trading latency for higher accuracy on hard problems.
- Self-consistency — Self-consistency runs the same prompt multiple times at non-zero temperature and picks the most common final answer, raising accuracy on reasoning tasks.
Sources
Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/mixture-of-agents.md.