LoRA fine-tune

LoRA (Low-Rank Adaptation) fine-tuning is a parameter-efficient method that trains small adapter matrices on top of frozen base weights. It is 10-100× cheaper than a full fine-tune, adapters can be swapped per task, and many LoRAs are easy to serve from one base model.

A full fine-tune updates every base model parameter, which is expensive in compute, memory, and storage. LoRA freezes the base weights and trains a low-rank update instead: for a frozen d × k weight W, the adapted weight is `W' = W + AB^T`, where A is d × r, B is k × r, and the rank r is much smaller than d and k. The trainable parameters, r(d + k) per adapted matrix, are typically < 1% of the base model's parameters.

Benefits: 10-100× cheaper training; tiny adapter files (10-100 MB vs. 100+ GB for a full model); LoRAs that are easy to swap at serve time; and one base model that can serve many specialized variants (per-tenant, per-language, per-task). 2026 production patterns: train per-customer LoRAs in minutes to hours, then serve dozens of them from one base model on multi-LoRA inference engines (Predibase LoRAX, vLLM multi-LoRA).

Trade-offs: LoRA quality usually approaches, but does not quite match, full fine-tune quality on the hardest tasks, and merging multiple LoRAs into one model is non-trivial.
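
A minimal sketch of the update rule above, in plain Python so it runs without a tensor library. It follows the glossary's notation (A is d × r, B is k × r), omits the α/r scaling factor that LoRA implementations typically apply to the update, and the helper names (`lora_forward`, `merge`, `param_counts`) are hypothetical. It shows that the adapted output can be computed as Wx + A(Bᵀx) without materializing the d × k update, and that folding the adapter into the weight gives the same result.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python product of an (n x m) and an (m x p) matrix."""
    return [[sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def transpose(x: Matrix) -> Matrix:
    return [list(row) for row in zip(*x)]


def add(x: Matrix, y: Matrix) -> Matrix:
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Matrix) -> Matrix:
    """Compute W x + A (B^T x) without forming the d x k update.

    W: frozen base weight, d x k
    A: trainable factor, d x r
    B: trainable factor, k x r
    x: inputs as columns, k x n
    """
    base = matmul(W, x)                            # frozen path, d x n
    low_rank = matmul(A, matmul(transpose(B), x))  # (d x r)(r x n), cheap when r is small
    return add(base, low_rank)


def merge(W: Matrix, A: Matrix, B: Matrix) -> Matrix:
    """Fold the adapter into the base weight: W' = W + A B^T."""
    return add(W, matmul(A, transpose(B)))


def param_counts(d: int, k: int, r: int) -> Tuple[int, int]:
    """Trainable parameters for one d x k weight: (full fine-tune, rank-r LoRA)."""
    return d * k, r * (d + k)


full, lora = param_counts(4096, 4096, 8)
print(full, lora, f"{100 * lora / full:.2f}%")  # 16777216 65536 0.39%
```

For a single 4096 × 4096 projection at r = 8, the adapter trains about 0.39% of that matrix's parameters, in line with the < 1% figure above. Keeping A and B separate is what makes adapters swappable at serve time; calling `merge` trades that flexibility for a single matmul per layer at inference.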

When to use LoRA fine-tuning

Common mistakes

FAQ

When should I use LoRA fine-tuning?

For per-customer or per-task model specialization, and for fine-tunes where training cost is a primary constraint.
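
Per-customer specialization relies on the serving pattern described earlier: one frozen base weight shared by many small adapters. A minimal single-layer sketch, assuming a hypothetical `MultiLoRALayer` class with tenant names as adapter keys; real multi-LoRA engines such as LoRAX and vLLM also batch requests that target different adapters and use optimized kernels, which this sketch does not attempt.

```python
from typing import Dict, List, Optional, Tuple

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python product of an (n x m) and an (m x p) matrix."""
    return [[sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def transpose(x: Matrix) -> Matrix:
    return [list(row) for row in zip(*x)]


class MultiLoRALayer:
    """One frozen base weight shared by many (hypothetical) per-tenant adapters."""

    def __init__(self, base_weight: Matrix) -> None:
        self.W = base_weight  # stored once and never modified
        self.adapters: Dict[str, Tuple[Matrix, Matrix]] = {}

    def register(self, name: str, A: Matrix, B: Matrix) -> None:
        # Adding a tenant stores only its small (A, B) pair, not a model copy.
        self.adapters[name] = (A, B)

    def forward(self, x: Matrix, adapter: Optional[str] = None) -> Matrix:
        out = matmul(self.W, x)
        if adapter is None:
            return out  # plain base model
        A, B = self.adapters[adapter]
        delta = matmul(A, matmul(transpose(B), x))  # this tenant's low-rank term
        return [[o + d for o, d in zip(ro, rd)] for ro, rd in zip(out, delta)]


layer = MultiLoRALayer([[1, 0], [0, 1]])
layer.register("tenant-a", [[1], [0]], [[1], [0]])
layer.register("tenant-b", [[0], [1]], [[0], [2]])
x = [[3], [4]]
print(layer.forward(x))              # [[3], [4]]
print(layer.forward(x, "tenant-a"))  # [[6], [4]]
print(layer.forward(x, "tenant-b"))  # [[3], [12]]
```

Routing is a per-request dictionary lookup, so adding a customer costs only the adapter's storage, not another copy of the base weights.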

What are the most common mistakes with LoRA fine-tuning?

Picking too low a rank, which caps quality on hard tasks. Over-training: LoRA overfits faster than a full fine-tune.
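
Why a rank that is too low caps quality: a rank-r adapter can only express updates of rank at most r, because rank(ABᵀ) ≤ r. If the update a task needs is not well approximated at that rank, more training does not close the gap. A small exact check in plain Python, using a hypothetical `rank` helper (Gaussian elimination over fractions): a rank-2 adapter on a 4 × 4 weight can never reproduce a full-rank target such as the identity.

```python
from fractions import Fraction


def matmul(x, y):
    """Plain-Python product of an (n x m) and an (m x p) matrix."""
    return [[sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def transpose(x):
    return [list(row) for row in zip(*x)]


def rank(m) -> int:
    """Exact matrix rank via Gaussian elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in m]
    n_rows, n_cols = len(rows), len(rows[0])
    r = 0
    for c in range(n_cols):
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Clear column c in every other row.
        for i in range(n_rows):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == n_rows:
            break
    return r


identity = [[int(i == j) for j in range(4)] for i in range(4)]  # full-rank target
A = [[1, 0], [0, 1], [1, 1], [2, 3]]  # d x r with r = 2
B = [[1, 2], [3, 4], [5, 6], [7, 8]]  # k x r
delta = matmul(A, transpose(B))       # the update a rank-2 adapter produces
print(rank(identity), rank(delta))    # 4 2
```

Whatever values training assigns to A and B, `delta` stays at rank 2 or below, so it can only approximate a higher-rank target; raising r lifts that ceiling at the cost of r(d + k) trainable parameters per matrix.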

Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/lora-fine-tune.md.