technique

Mixture of agents

Mixture of agents is an inference pattern in which multiple specialised LLM agents answer the same query in parallel and an aggregator model combines their outputs into a single answer. It aims for higher quality than any single agent, at higher cost.

Mixture of agents (Wang et al., 2024) generalises the model-router idea: instead of routing each query to one model, you send it to N models in parallel and have an aggregator LLM synthesise the best answer from all N responses. Each proposer model can come from a different model family (Claude, GPT, Gemini, open-weight), so the ensemble draws on different training distributions. Open-source implementations (Together AI's mixture-of-agents, MoA-Lite) are available as of 2026. The trade-off is roughly 3-7x the cost for a ~15-25% quality lift on hard benchmarks, so reserve it for high-stakes inference where being right matters more than being cheap.
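The fan-out and fan-in described above can be sketched in a few lines of Python. This is a minimal single-layer illustration under stated assumptions, not the reference implementation: the `Model` type, `make_stub`, `build_aggregation_prompt`, and `mixture_of_agents` are hypothetical names, the prompt wording is made up, and the stub models stand in for real API clients.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical model interface: takes a prompt, returns generated text.
Model = Callable[[str], Awaitable[str]]


def make_stub(name: str) -> Model:
    """Build a fake model that labels its reply; stands in for a real API client."""
    async def call(prompt: str) -> str:
        await asyncio.sleep(0)  # yield control, as a network call would
        return f"[{name}] answer to: {prompt.splitlines()[0]}"
    return call


def build_aggregation_prompt(query: str, proposals: list[str]) -> str:
    """Pack the original query and every candidate answer into one prompt."""
    numbered = "\n".join(f"{i}. {p}" for i, p in enumerate(proposals, start=1))
    return (
        f"Question: {query}\n"
        f"Candidate answers:\n{numbered}\n"
        "Synthesise the single best answer."
    )


async def mixture_of_agents(
    query: str, proposers: list[Model], aggregator: Model
) -> str:
    # Fan out: every proposer receives the same query concurrently.
    proposals = await asyncio.gather(*(p(query) for p in proposers))
    # Fan in: the aggregator sees the query plus all N proposals.
    return await aggregator(build_aggregation_prompt(query, list(proposals)))


if __name__ == "__main__":
    proposers = [make_stub("model-a"), make_stub("model-b"), make_stub("model-c")]
    answer = asyncio.run(
        mixture_of_agents("What is 2 + 2?", proposers, make_stub("aggregator"))
    )
    print(answer)
```

With real clients, each proposer call is billed separately and the aggregator call adds one more, which is where the cost multiple comes from.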

When to use mixture of agents

Common mistakes

FAQ

What is mixture of agents?

Mixture of agents is an inference pattern in which multiple specialised LLM agents answer the same query in parallel and an aggregator model combines their outputs into a single answer. It aims for higher quality than any single agent, at higher cost.

When should I use mixture of agents?

Use it for high-stakes, single-shot answers (medical, legal, or financial summarisation) and for hard reasoning benchmarks (MMLU-Pro, GPQA).

What are the most common mistakes with mixture of agents?

Running MoA on every query: cost scales fast. Using identical models in the ensemble: diversity is the point.
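Both mistakes can be guarded against with a small check before the ensemble runs. The sketch below is illustrative only: `Proposer`, `distinct_families`, `choose_strategy`, the `high_stakes` flag, and the `min_families` threshold are all hypothetical, and how a query gets classified as high-stakes is left to the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposer:
    name: str
    family: str  # e.g. "claude", "gpt", "gemini", "open-weight"


def distinct_families(proposers: list[Proposer]) -> int:
    """Count how many different model families the ensemble covers."""
    return len({p.family for p in proposers})


def choose_strategy(
    high_stakes: bool, proposers: list[Proposer], min_families: int = 2
) -> str:
    """Return "moa" only when the query warrants it and the ensemble is diverse."""
    if not high_stakes:
        return "single"  # routine query: a single model is cheaper
    if distinct_families(proposers) < min_families:
        return "single"  # identical models add cost without adding diversity
    return "moa"


if __name__ == "__main__":
    ensemble = [
        Proposer("model-a", "claude"),
        Proposer("model-b", "gpt"),
        Proposer("model-c", "open-weight"),
    ]
    print(choose_strategy(high_stakes=True, proposers=ensemble))
    print(choose_strategy(high_stakes=False, proposers=ensemble))
```

Counting families rather than model names is a deliberate simplification here; two sizes of the same model family would count once.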

Sources

Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/mixture-of-agents.md.