# LoRA hot-swapping

**Source:** https://promtable.com/glossary/lora-hot-swap

> LoRA hot-swapping is the serving pattern where many fine-tuned LoRA adapters share a single base model on GPU — the appropriate adapter is loaded per request without reloading the base model.

---

Production serving systems in 2026 (vLLM multi-LoRA, S-LoRA, Predibase, Fireworks fine-tune-and-serve) let teams deploy hundreds of LoRA adapters on one base model, and each request specifies which adapter to apply. The cost win: one base model resident in GPU memory plus a small per-adapter overhead, instead of 100 separately served fine-tuned models. The constraint: every adapter must derive from the same base model. The pattern is used heavily for multi-tenant fine-tuning, per-customer brand voices, and domain-specific variants.
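The per-request selection described above can be illustrated with a toy, pure-Python sketch. It is not the API of vLLM, S-LoRA, or any other system named here; `HotSwapLayer` and `LoraAdapter` are hypothetical names, and a single linear layer stands in for a full model. The base weight is held once, and each call adds the chosen adapter's low-rank update `scale * B(Ax)` on top of it.

```python
from dataclasses import dataclass
from typing import Optional

Matrix = list[list[float]]
Vector = list[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


@dataclass(frozen=True)
class LoraAdapter:
    """Hypothetical container for one adapter's low-rank factors."""
    base_model: str      # identifier of the base model the adapter was trained on
    a: Matrix            # shape: rank x d_in
    b: Matrix            # shape: d_out x rank
    scale: float = 1.0   # stands in for the usual alpha / rank scaling


class HotSwapLayer:
    """One shared base weight plus any number of registered adapters."""

    def __init__(self, base_model: str, weight: Matrix) -> None:
        self.base_model = base_model
        self.weight = weight  # stored once, shared by every request
        self.adapters: dict[str, LoraAdapter] = {}

    def register(self, adapter_id: str, adapter: LoraAdapter) -> None:
        # Enforce the constraint: adapters must derive from this base model.
        if adapter.base_model != self.base_model:
            raise ValueError(
                f"adapter {adapter_id!r} targets {adapter.base_model!r}, "
                f"not {self.base_model!r}"
            )
        self.adapters[adapter_id] = adapter

    def forward(self, x: Vector, adapter_id: Optional[str] = None) -> Vector:
        y = matvec(self.weight, x)  # base model path, identical for all requests
        if adapter_id is None:
            return y
        adapter = self.adapters[adapter_id]
        # Low-rank update: project down with A, back up with B.
        delta = matvec(adapter.b, matvec(adapter.a, x))
        return [yi + adapter.scale * di for yi, di in zip(y, delta)]
```

Two requests with different `adapter_id` values reuse the same `weight` and differ only in the small `a` and `b` factors, which is why adding a tenant costs far less than serving another full model.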

## When to use

- Multi-tenant fine-tuning at scale.
- Per-customer brand-voice or domain variants.

## Common mistakes

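- Mixing adapters from different base models. An adapter only works with the base model it was fine-tuned from.
- Sizing GPU memory naively. Budget for more than the base model, because memory pages add overhead.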
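A rough way to reason about the sizing mistake is a back-of-the-envelope estimate. The sketch below assumes one possible reading of the page overhead: adapter weights are stored in fixed-size memory pages, so each adapter rounds up to a whole number of pages. The function names, the page model, and any sizes passed in are illustrative assumptions, not the behavior of a specific serving system. The parameter count uses the standard LoRA shape: a rank-`r` adapter on a `d_in x d_out` matrix adds `r * (d_in + d_out)` parameters.

```python
import math


def lora_param_count(rank: int, shapes: list[tuple[int, int]]) -> int:
    """Parameters added by one adapter across all adapted matrices.

    Each adapted matrix of shape (d_in, d_out) gets factors of shape
    (rank, d_in) and (d_out, rank).
    """
    return sum(rank * (d_in + d_out) for d_in, d_out in shapes)


def paged_bytes(raw_bytes: int, page_bytes: int) -> int:
    """Round an allocation up to whole pages (assumed allocation model)."""
    return math.ceil(raw_bytes / page_bytes) * page_bytes


def fleet_bytes(base_bytes: int, adapter_bytes: int,
                n_adapters: int, page_bytes: int) -> int:
    """One shared base model plus n page-rounded resident adapters."""
    return base_bytes + n_adapters * paged_bytes(adapter_bytes, page_bytes)
```

For example, `lora_param_count(8, [(4096, 4096)] * 64)` counts a rank-8 adapter over 64 square projection matrices; multiplying by 2 bytes per parameter gives its 16-bit size, which `fleet_bytes` then rounds up per adapter. The estimate leaves out KV cache and activation memory, which also need headroom.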

## Related terms

- [lora](https://promtable.com/glossary/lora)
- [fine-tuning](https://promtable.com/glossary/fine-tuning)
- [batched-inference](https://promtable.com/glossary/batched-inference)

*Last updated: 2026-06-01*

---

Original page: https://promtable.com/glossary/lora-hot-swap
Maintained by Promtable (https://promtable.com). Content: CC BY 4.0. Cite as "Promtable — https://promtable.com/glossary/lora-hot-swap".
Contact: info@vibecodingturkey.com.