# Multi-LoRA serving

**Source:** https://promtable.com/glossary/multi-lora-serving

> Multi-LoRA serving is the inference pattern in which one base model serves dozens or hundreds of LoRA adapters from a single deployment. Predibase LoRAX, vLLM multi-LoRA, and S-LoRA are the best-known implementations. It is the cost-efficient way to deploy per-tenant fine-tunes.

---
Without multi-LoRA, each fine-tuned variant needs its own deployment, and each deployment costs more than $1K/mo for a serious GPU. Multi-LoRA serving folds the LoRA computation into the same forward pass as the base model: the base weights are loaded once, and small (10-100 MB) LoRA matrices are swapped in per request based on routing.

Implementations include vLLM multi-LoRA (Apache 2.0), Predibase LoRAX (per-request LoRA selection), and S-LoRA (research paper). The production unlock is that thousands of per-customer fine-tunes can be served from one GPU pool at the marginal cost of the base model plus adapter swaps.

Trade-offs: throughput drops compared with serving a single LoRA, adapter swapping adds overhead, and the routing logic becomes more complex.

## When to use

- Multi-tenant fine-tunes (per customer / use case).
- Cost-sensitive LoRA serving at scale.
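The cost argument is simple arithmetic, sketched below. The function names, the tenant count, and the pool size are hypothetical illustration values, and the $1,000 figure only echoes the rough per-GPU price mentioned in this entry. How many GPUs a shared pool actually needs depends on traffic and has to be measured.

```python
def monthly_cost_dedicated(num_adapters: int, gpu_cost_per_month: float) -> float:
    """One deployment (one GPU) per fine-tuned variant."""
    return num_adapters * gpu_cost_per_month


def monthly_cost_shared(num_gpus_in_pool: int, gpu_cost_per_month: float) -> float:
    """One multi-LoRA pool serving every adapter."""
    return num_gpus_in_pool * gpu_cost_per_month


# Hypothetical scenario: 200 tenants, $1,000 per GPU per month,
# and an assumed pool of 4 GPUs for the shared deployment.
dedicated = monthly_cost_dedicated(200, 1000)
shared = monthly_cost_shared(4, 1000)
savings_ratio = dedicated / shared
```

Under these assumed numbers the dedicated setup costs 50 times as much as the shared pool. The ratio is driven by how many tenants share each GPU, not by the GPU price itself.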

## Common mistakes

- Running multi-LoRA on a tiny base model: adapter swap overhead dominates the forward pass.
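Swap overhead comes from loading adapter weights that are not already resident in memory. One way to reason about it, assumed here purely for illustration, is a bounded least-recently-used cache that counts how often a load is triggered. `AdapterCache` and `load_fn` are hypothetical names, not the API of any serving engine.

```python
from collections import OrderedDict
from typing import Any, Callable


class AdapterCache:
    """Keeps at most `capacity` adapters resident, evicting the least recently used."""

    def __init__(self, capacity: int, load_fn: Callable[[str], Any]) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.load_fn = load_fn
        self._resident: "OrderedDict[str, Any]" = OrderedDict()
        self.loads = 0  # number of (expensive) adapter loads performed

    def get(self, adapter_id: str) -> Any:
        if adapter_id in self._resident:
            # Cache hit: mark as most recently used, no load needed.
            self._resident.move_to_end(adapter_id)
            return self._resident[adapter_id]
        # Cache miss: pay the swap cost, then evict the oldest entry if full.
        weights = self.load_fn(adapter_id)
        self.loads += 1
        self._resident[adapter_id] = weights
        if len(self._resident) > self.capacity:
            self._resident.popitem(last=False)
        return weights


# Hypothetical request stream across three tenants with room for two adapters.
cache = AdapterCache(capacity=2, load_fn=lambda adapter_id: f"weights:{adapter_id}")
for tenant in ["a", "b", "a", "c", "b"]:
    cache.get(tenant)
```

In this stream four of the five requests trigger a load. When the base model's forward pass is very cheap, that load time is a large share of each request, which is the situation the mistake above describes.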

## Related terms

- [lora-fine-tune](https://promtable.com/glossary/lora-fine-tune)
- [lora-hot-swap](https://promtable.com/glossary/lora-hot-swap)
- [inference-engine](https://promtable.com/glossary/inference-engine)

*Last updated: 2026-06-01*

---

Maintained by Promtable (https://promtable.com). Content: CC BY 4.0. Cite as "Promtable — https://promtable.com/glossary/multi-lora-serving".
Contact: info@vibecodingturkey.com.