LoRA fine-tune
LoRA (Low-Rank Adaptation) fine-tune is a parameter-efficient method that trains small adapter matrices on top of frozen base weights. It is commonly cited as 10-100× cheaper than a full fine-tune, adapters are swappable per task, and many LoRAs can be served from one base model.
A full fine-tune updates every base model parameter, which is expensive in compute, memory, and storage. LoRA freezes the base and trains only a low-rank update: `W' = W + AB^T`, where `W` is `d_out × d_in`, `A` is `d_out × r`, `B` is `d_in × r`, and the rank `r` is much smaller than either dimension. The trainable adapter is typically < 1% of base params.
Benefits: 10-100× cheaper training, tiny adapter files (10-100 MB vs 100+ GB for a large full model), easy swapping of LoRAs at serve time, and one base model serving many specialized variants (per-tenant, per-language, per-task).
2026 production patterns: train per-customer LoRAs in minutes to hours, then serve dozens of LoRAs from one base on multi-LoRA inference engines (Predibase LoRAX, vLLM multi-LoRA).
Trade-offs: LoRA quality usually approaches but doesn't quite match a full fine-tune on the hardest tasks, and merging multiple LoRAs is non-trivial.
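
The update rule and the "< 1% of base params" figure can be checked with a little arithmetic. The sketch below is a minimal pure-Python illustration, not any library's implementation: the function names (`lora_param_counts`, `lora_forward`, `merge`) and the 4096 × 4096 layer size are assumptions chosen for the example.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int) -> tuple[int, int]:
    """Return (full_params, adapter_params) for one d_out x d_in weight matrix."""
    full = d_out * d_in
    adapter = rank * (d_out + d_in)  # A is d_out x r, B is d_in x r
    return full, adapter


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x):
    """Compute W x + A (B^T x) without ever forming the full A B^T product."""
    B_t = [list(col) for col in zip(*B)]   # B^T has shape r x d_in
    low_rank = matvec(A, matvec(B_t, x))   # project down to r dims, then back up
    return [b + d for b, d in zip(matvec(W, x), low_rank)]


def merge(W, A, B):
    """Fold the adapter into the base weights: W' = W + A B^T."""
    rank = len(A[0])
    return [
        [W[i][j] + sum(A[i][k] * B[j][k] for k in range(rank))
         for j in range(len(W[0]))]
        for i in range(len(W))
    ]


# Hypothetical layer size: one 4096 x 4096 projection with rank 8.
full, adapter = lora_param_counts(4096, 4096, 8)
print(full, adapter, f"{adapter / full:.4%}")  # 16777216 65536 0.3906%

# Tiny rank-1 example: the unmerged and merged paths give the same output.
W = [[1, 0], [0, 1]]
A = [[1], [2]]
B = [[3], [4]]
x = [1, 1]
print(lora_forward(W, A, B, x))    # [8, 15]
print(matvec(merge(W, A, B), x))   # [8, 15]
```

Keeping the adapter unmerged is what makes per-request swapping possible; merging trades that flexibility for a single dense matrix at inference time.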
When to use LoRA fine-tune
- Per-customer / per-task model specialization.
- Cost-sensitive fine-tunes.
Common mistakes
- Picking too low a rank: this caps quality on hard tasks.
- Over-training: LoRA adapters can overfit quickly, especially on small datasets, so watch validation loss.
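
One common guard against over-training is early stopping on a held-out validation set. The sketch below is a generic illustration under stated assumptions: `best_checkpoint` is a hypothetical helper, `patience=2` is an arbitrary default, and the loss values are made up for the example.

```python
def best_checkpoint(val_losses: list[float], patience: int = 2) -> int:
    """Return the index of the best evaluation seen before early stopping.

    Training is treated as stopped once `patience` consecutive evaluations
    fail to improve on the best validation loss so far.
    """
    best_idx = 0
    for i, loss in enumerate(val_losses):
        if loss < val_losses[best_idx]:
            best_idx = i                  # new best: keep this checkpoint
        elif i - best_idx >= patience:
            break                         # no improvement for `patience` evals
    return best_idx


# Made-up validation losses, one per evaluation step.
losses = [1.0, 0.8, 0.7, 0.75, 0.9, 0.6]
print(best_checkpoint(losses))  # 2
```

In this run the loop stops at index 4, so the later 0.6 is never observed; a larger `patience` would trade extra training time for the chance to find it.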
FAQ
What is LoRA fine-tune?
LoRA (Low-Rank Adaptation) fine-tune is a parameter-efficient method that trains small adapter matrices on top of frozen base weights. It is commonly cited as 10-100× cheaper than a full fine-tune, adapters are swappable per task, and many LoRAs can be served from one base model.
When should I use LoRA fine-tune?
Per-customer / per-task model specialization. Cost-sensitive fine-tunes.
What are the most common mistakes with LoRA fine-tune?
Picking too low a rank, which caps quality on hard tasks. Over-training: LoRA adapters can overfit quickly, especially on small datasets, so watch validation loss.
Related terms
- LoRA (Low-Rank Adaptation) — LoRA is a fine-tuning method that trains a small set of low-rank adapter weights on top of a frozen base model — cheaper to train and store than full fine-tuning.
- LoRA stacking — LoRA stacking applies multiple LoRA adapters simultaneously to a diffusion model — combining a character LoRA, a style LoRA, and a quality LoRA — to compose effects without retraining.
- LoRA hot-swapping — LoRA hot-swapping is the serving pattern where many fine-tuned LoRA adapters share a single base model on GPU — the appropriate adapter is loaded per request without reloading the base model.
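
The hot-swapping pattern can be sketched as one shared base weight plus a registry of adapters selected per request. This is a toy pure-Python illustration only; it does not reflect how LoRAX or vLLM implement multi-LoRA serving, and `MultiLoRAServer`, its methods, and the tenant IDs are hypothetical names.

```python
def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


class MultiLoRAServer:
    """Toy model of one base weight shared by many per-request adapters."""

    def __init__(self, base_weight):
        self.base_weight = base_weight  # loaded once, never modified
        self.adapters = {}              # adapter_id -> (A, B)

    def register(self, adapter_id, A, B):
        """Add or replace an adapter without touching the base weights."""
        self.adapters[adapter_id] = (A, B)

    def forward(self, x, adapter_id=None):
        """Compute W x, plus A (B^T x) when an adapter is requested."""
        y = matvec(self.base_weight, x)
        if adapter_id is None:
            return y
        A, B = self.adapters[adapter_id]
        B_t = [list(col) for col in zip(*B)]
        delta = matvec(A, matvec(B_t, x))
        return [base + d for base, d in zip(y, delta)]


server = MultiLoRAServer([[1, 0], [0, 1]])
server.register("tenant-a", A=[[1], [0]], B=[[1], [1]])
server.register("tenant-b", A=[[0], [1]], B=[[1], [0]])

x = [2, 3]
print(server.forward(x))              # [2, 3]
print(server.forward(x, "tenant-a"))  # [7, 3]
print(server.forward(x, "tenant-b"))  # [2, 5]
```

Because the base weights are read-only, adding a tenant costs only the adapter's memory, which is why the adapters stay unmerged in this pattern.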
Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/lora-fine-tune.md.