technique

LoRA (Low-Rank Adaptation)

LoRA is a fine-tuning method that trains a small set of low-rank adapter weights on top of a frozen base model, which makes it cheaper to train and store than full fine-tuning.

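LoRA (Hu et al., 2021) inserts trainable rank-decomposition matrices into transformer layers while keeping the original weights frozen. For a frozen weight matrix W, it learns two small matrices B and A of rank r and uses W + (alpha / r) · BA in place of W, so only A and B receive gradients. The result: gradients and optimiser state shrink to the size of the adapter, and you store the adapter (typically tens to a few hundred MB for a 70B-parameter model, depending on rank and on which layers are adapted) instead of a full checkpoint (about 140 GB at 16-bit precision). Plain LoRA still has to hold the frozen base weights in memory, so a 70B model at 16-bit precision does not fit on a single consumer GPU. LoRA adapters can be hot-swapped at inference time, so one base model can serve many specialised tasks. QLoRA (Dettmers et al., 2023) quantises the frozen base model to 4 bits, which its authors report is enough to fine-tune a 65B model on a single 48 GB GPU; a 70B model at 4 bits is roughly 35 GB of weights alone, so it does not fit on a single 24 GB GPU without offloading. LoRA is the default fine-tuning technique in 2026 for open-weight LLMs and image diffusion models.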
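The low-rank update can be illustrated with a minimal NumPy sketch of a single linear layer. The layer sizes, rank, and `alpha` value are illustrative assumptions rather than values from the source, and `lora_forward` is a hypothetical helper name, not a library function.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not taken from the glossary entry).
d_in, d_out, rank, alpha = 1024, 1024, 8, 16

# Frozen base weight: never updated during LoRA training.
W = rng.standard_normal((d_out, d_in)) * 0.02

# Trainable low-rank factors. B starts at zero, so the adapter
# contributes nothing until training has updated it.
A = rng.standard_normal((rank, d_in)) * 0.01
B = np.zeros((d_out, rank))

def lora_forward(x, W, A, B, alpha, rank):
    """Compute y = x W^T + (alpha / rank) * x A^T B^T."""
    base = x @ W.T
    update = (alpha / rank) * (x @ A.T) @ B.T
    return base + update

x = rng.standard_normal((4, d_in))
y = lora_forward(x, W, A, B, alpha, rank)

# Compare trainable parameter counts for this one layer.
full_params = W.size
lora_params = A.size + B.size
print(f"full: {full_params}, lora: {lora_params}, "
      f"ratio: {lora_params / full_params:.4f}")
```

For this one layer the adapter holds 16,384 trainable values against 1,048,576 in the full matrix, which is where the storage and optimiser-memory savings come from.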

When to use LoRA (Low-Rank Adaptation)

Common mistakes

FAQ

What is LoRA (Low-Rank Adaptation)?

LoRA is a fine-tuning method that trains a small set of low-rank adapter weights on top of a frozen base model, which makes it cheaper to train and store than full fine-tuning.

When should I use LoRA (Low-Rank Adaptation)?

Use LoRA when customising open-weight models on small datasets (500–10,000 examples), when training a character or art style on Stable Diffusion or Flux, or in multi-tenant deployments where many adapters share one base model.
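The multi-tenant case can be sketched as one shared base weight plus a dictionary of per-tenant adapters selected at request time. This is a toy illustration under stated assumptions: the tenant names, sizes, and the `forward` helper are hypothetical, and a real serving stack would batch requests and manage adapter memory.

```python
import numpy as np

rng = np.random.default_rng(2)
d, rank, alpha = 16, 2, 4  # illustrative sizes (assumptions)

# One frozen base weight shared by every tenant.
W_base = rng.standard_normal((d, d))

def make_adapter():
    """Create a random stand-in for a trained adapter (hypothetical helper)."""
    return {
        "A": rng.standard_normal((rank, d)),
        "B": rng.standard_normal((d, rank)),
    }

# Hypothetical tenant names; each adapter is tiny compared with W_base.
adapters = {
    "support-bot": make_adapter(),
    "legal-summariser": make_adapter(),
}

def forward(x, adapter_name=None):
    """Run the shared base layer, optionally adding one tenant's adapter."""
    y = x @ W_base.T
    if adapter_name is not None:
        ad = adapters[adapter_name]
        y = y + (alpha / rank) * (x @ ad["A"].T) @ ad["B"].T
    return y

x = rng.standard_normal((1, d))
y_base = forward(x)
y_support = forward(x, "support-bot")
y_legal = forward(x, "legal-summariser")
```

Switching tenants only changes which small `A` and `B` pair is looked up; the base weight stays loaded once.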

What are the most common mistakes with LoRA (Low-Rank Adaptation)?

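Setting the LoRA rank too low (under 4), which under-fits on complex tasks. Forgetting to merge the LoRA weights into the base weights on latency-critical production paths, where an unmerged adapter adds extra matrix multiplications to every forward pass.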
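Merging can be sketched in a few lines of NumPy: fold the scaled product of the two adapter matrices into the base weight once, then serve with a single matrix multiplication. The sizes and the `merge_lora` helper name are assumptions made for illustration.

```python
import numpy as np

def merge_lora(W, A, B, alpha, rank):
    """Fold the adapter into the base weight: W' = W + (alpha / rank) * B A."""
    return W + (alpha / rank) * (B @ A)

rng = np.random.default_rng(1)
d_in, d_out, rank, alpha = 64, 32, 4, 8  # illustrative sizes (assumptions)

W = rng.standard_normal((d_out, d_in))   # frozen base weight
A = rng.standard_normal((rank, d_in))    # stand-in for a trained adapter factor
B = rng.standard_normal((d_out, rank))   # stand-in for a trained adapter factor
x = rng.standard_normal((5, d_in))

# Unmerged path: base matmul plus two extra adapter matmuls per call.
unmerged = x @ W.T + (alpha / rank) * (x @ A.T) @ B.T

# Merged path: one matmul, same result up to floating-point error.
W_merged = merge_lora(W, A, B, alpha, rank)
merged = x @ W_merged.T

print(np.allclose(unmerged, merged))
```

Merging trades away hot-swapping, since the merged weight is specific to one adapter, so it suits single-task, latency-critical paths rather than multi-tenant serving. In the Hugging Face PEFT library the equivalent step is `merge_and_unload()` on a LoRA-wrapped model.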

Sources

Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/lora.md.