LoRA hot-swapping
LoRA hot-swapping is the serving pattern where many fine-tuned LoRA adapters share a single base model on GPU — the appropriate adapter is loaded per request without reloading the base model.
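The per-request mechanism can be sketched for a single linear layer. This assumes the standard LoRA form y = W x + (alpha / r) * B (A x), where only the small factors A and B differ between adapters and the base weight W is shared. The adapter names, weights, and tiny dimensions are hypothetical.

```python
# Minimal sketch of multi-LoRA inference for one linear layer.
# Assumption: standard LoRA form y = W x + (alpha / r) * B (A x).
# Adapter names and the tiny dimensions below are hypothetical.

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

# Shared base weight W (d_out x d_in): loaded once, never modified per request.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]

# Per-adapter low-rank factors: A is r x d_in, B is d_out x r (here r = 1).
ADAPTERS = {
    "tenant-a": {"A": [[1.0, 1.0, 0.0]], "B": [[1.0], [0.0]], "alpha": 1.0},
    "tenant-b": {"A": [[0.0, 0.0, 1.0]], "B": [[0.0], [2.0]], "alpha": 1.0},
}

def lora_forward(x, adapter_id):
    """Apply the shared base weight plus the requested adapter's low-rank delta."""
    y = matvec(W, x)
    if adapter_id is None:            # plain base-model request
        return y
    ad = ADAPTERS[adapter_id]
    r = len(ad["A"])                  # LoRA rank = number of rows in A
    scale = ad["alpha"] / r
    delta = matvec(ad["B"], matvec(ad["A"], x))
    return [yi + scale * di for yi, di in zip(y, delta)]

# A mixed batch: each request names its own adapter; all share W.
batch = [([1.0, 2.0, 3.0], "tenant-a"),
         ([1.0, 2.0, 3.0], "tenant-b"),
         ([1.0, 2.0, 3.0], None)]
outputs = [lora_forward(x, aid) for x, aid in batch]
print(outputs)
```

The same input yields three different outputs depending only on which adapter the request names. Production servers batch this with custom GPU kernels rather than a per-request Python loop, but the arithmetic is the same.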
Production serving systems in 2026 (vLLM multi-LoRA, S-LoRA, Predibase, Fireworks fine-tune-and-serve) let teams deploy hundreds of LoRA adapters on one base model, with each request specifying which adapter to apply. The cost win: one copy of the base model in GPU memory plus a small per-adapter overhead, instead of 100 separate fully fine-tuned models that each hold their own copy of the weights. The constraint: all adapters must derive from the same base model. The pattern is used heavily for multi-tenant fine-tuning, per-customer brand voices, and domain-specific variants.
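The cost argument can be made concrete with back-of-the-envelope arithmetic. The dimensions below are assumptions loosely modeled on a 7B-parameter decoder (32 layers, hidden size 4096, fp16 weights, rank-16 adapters on two attention projections per layer). The calculation counts weights only and ignores KV cache and activations.

```python
# Back-of-the-envelope weight memory: 100 full models vs one base + 100 adapters.
# All model dimensions here are illustrative assumptions, not a specific model.

BYTES_PER_PARAM = 2  # fp16 / bf16 weights

def base_model_bytes(n_params):
    return n_params * BYTES_PER_PARAM

def lora_adapter_bytes(n_layers, targets_per_layer, d_in, d_out, rank):
    # Each adapted matrix adds A (rank x d_in) and B (d_out x rank).
    per_matrix = rank * d_in + d_out * rank
    return n_layers * targets_per_layer * per_matrix * BYTES_PER_PARAM

N_ADAPTERS = 100
base = base_model_bytes(7_000_000_000)
adapter = lora_adapter_bytes(n_layers=32, targets_per_layer=2,
                             d_in=4096, d_out=4096, rank=16)

separate_models = N_ADAPTERS * base          # every variant holds its own weights
shared_base = base + N_ADAPTERS * adapter    # one base, many small adapters

print(f"{separate_models / 1e9:.0f} GB vs {shared_base / 1e9:.1f} GB")
```

With these assumed numbers the script prints `1400 GB vs 15.7 GB`: each adapter adds about 16.8 MB, so a hundred of them cost roughly 12% of one base model's weights.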
When to use LoRA hot-swapping
- Multi-tenant fine-tuning at scale.
- Per-customer brand-voice or domain variants.
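At multi-tenant scale not every adapter fits on the GPU at once, so servers typically keep a bounded set of resident adapters and swap others in on demand. The sketch below models that with a hypothetical `AdapterSlotCache` using least-recently-used eviction. The tenant names, adapter IDs, and `load_fn` stand-in are illustrative, not any particular server's API.

```python
from collections import OrderedDict

class AdapterSlotCache:
    """Hypothetical LRU cache of GPU adapter slots; the base model is never reloaded."""

    def __init__(self, num_slots, load_fn):
        self.num_slots = num_slots
        self.load_fn = load_fn        # stand-in for copying adapter weights to the GPU
        self.slots = OrderedDict()    # adapter_id -> weights, least recently used first
        self.loads = 0

    def get(self, adapter_id):
        if adapter_id in self.slots:
            self.slots.move_to_end(adapter_id)   # mark as most recently used
            return self.slots[adapter_id]
        if len(self.slots) >= self.num_slots:
            self.slots.popitem(last=False)       # evict the least recently used adapter
        weights = self.load_fn(adapter_id)
        self.loads += 1
        self.slots[adapter_id] = weights
        return weights

# Hypothetical tenant -> adapter routing table.
TENANT_TO_ADAPTER = {
    "acme": "acme-brand-voice",
    "globex": "globex-legal",
    "initech": "initech-support",
}

cache = AdapterSlotCache(num_slots=2, load_fn=lambda aid: f"weights:{aid}")
for tenant in ["acme", "globex", "acme", "initech", "globex"]:
    cache.get(TENANT_TO_ADAPTER[tenant])

print(cache.loads, list(cache.slots))
```

With two slots, the five requests trigger four adapter loads and the base model is never touched. vLLM exposes comparable settings (`max_loras` for GPU-resident adapters, `max_cpu_loras` for a CPU-side cache); consult its documentation for the exact eviction behaviour.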
Common mistakes
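- Mixing adapters from different base models: an adapter's weights are low-rank deltas against the exact base weights it was trained on, so applying it to another base model fails on a shape mismatch or silently degrades output.
- Sizing GPU memory naively from the base weights plus adapter file sizes: the server also reserves memory for adapter slots and pages (paged adapter and KV-cache storage), which adds overhead.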
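Both mistakes can be caught before an adapter is registered. The sketch assumes PEFT-style `adapter_config.json` fields (`base_model_name_or_path`, `r`) already parsed into a dict, and a server that preallocates fixed-size adapter slots padded to a maximum rank. The helper names and model IDs are hypothetical.

```python
def check_adapter(adapter_config, served_base_model, max_lora_rank):
    """Return a list of reasons this adapter cannot be hot-swapped onto the server."""
    errors = []
    base = adapter_config.get("base_model_name_or_path")
    if base != served_base_model:
        # LoRA deltas only make sense against the weights they were trained on.
        errors.append(f"trained on {base!r}, server runs {served_base_model!r}")
    rank = adapter_config.get("r", 0)
    if rank > max_lora_rank:
        errors.append(f"rank {rank} exceeds slot rank {max_lora_rank}")
    return errors


def slot_memory_bytes(num_slots, max_lora_rank, n_layers, targets_per_layer,
                      d_model, bytes_per_param=2):
    """Memory reserved by preallocated adapter slots (assumed padded to max rank).

    Each adapted d_model x d_model projection stores A (r x d) and B (d x r),
    and every slot is sized for max_lora_rank regardless of the adapter's own rank.
    """
    per_matrix = 2 * max_lora_rank * d_model
    return num_slots * n_layers * targets_per_layer * per_matrix * bytes_per_param


ok = {"base_model_name_or_path": "org/base-7b", "r": 8}
bad = {"base_model_name_or_path": "org/other-7b", "r": 128}
print(check_adapter(ok, "org/base-7b", max_lora_rank=64))   # []
print(check_adapter(bad, "org/base-7b", max_lora_rank=64))  # two errors
print(slot_memory_bytes(8, 64, n_layers=32, targets_per_layer=2, d_model=4096) / 2**20, "MiB")
```

Under these assumptions, eight slots at `max_lora_rank=64` reserve 512 MiB even if every resident adapter is rank 8 (about 8 MiB each on disk). A name match is also only a first check; comparing layer shapes against the loaded base model is stricter.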
FAQ
What is LoRA hot-swapping?
LoRA hot-swapping is the serving pattern where many fine-tuned LoRA adapters share a single base model on GPU — the appropriate adapter is loaded per request without reloading the base model.
When should I use LoRA hot-swapping?
Multi-tenant fine-tuning at scale. Per-customer brand-voice or domain variants.
What are the most common mistakes with LoRA hot-swapping?
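Mixing adapters from different base models: an adapter's weights are deltas against the exact base weights it was trained on, so it will not work on another base model. Sizing GPU memory naively from the base weights plus adapter file sizes: adapter slots and paged storage add overhead.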
Related terms
- LoRA (Low-Rank Adaptation) — LoRA is a fine-tuning method that trains a small set of low-rank adapter weights on top of a frozen base model — cheaper to train and store than full fine-tuning.
- Fine-tuning — Fine-tuning updates a pretrained model's weights on task-specific data, baking the new behaviour into the model rather than relying on prompts.
- Batched inference — Batched inference packs multiple prompts into a single GPU forward pass, dramatically improving throughput and unit cost at the cost of per-request latency.
Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/lora-hot-swap.md.