Spot instance (AI training)
A spot instance is a cloud GPU rented at a discount (often 50-90% off) on the condition that the provider can reclaim it on short notice — used for cost-sensitive training that can checkpoint and resume.
Spot pricing for GPUs (AWS, GCP, Vast.ai, Lambda) is typically the cheapest path to large-scale training in 2026, at the cost of operational complexity. Training jobs must checkpoint frequently and resume cleanly when instances are reclaimed. Tools such as Ray Train, Modal restart hooks, and SkyPilot handle the orchestration. For non-resumable workloads (interactive notebooks, live inference), spot is the wrong choice. For batch training, fine-tuning, and large embedding jobs, a 50-90% discount means spot can cut compute cost by roughly 2-10× vs on-demand.
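A discount and a cost multiple describe the same saving in two ways: a fractional discount d makes spot 1/(1-d) times cheaper than on-demand, so 50% off is 2× and 90% off is 10×. The sketch below works through that arithmetic and adds a rough allowance for work that is redone after interruptions. The function names and the `rework_fraction` parameter are hypothetical, introduced only for illustration; real overhead depends on checkpoint frequency and how often instances are reclaimed.

```python
def savings_multiple(discount: float) -> float:
    """Return how many times cheaper spot is than on-demand.

    `discount` is a fraction in [0, 1), e.g. 0.9 for "90% off".
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return 1.0 / (1.0 - discount)


def effective_spot_cost(
    on_demand_hourly: float,
    discount: float,
    useful_hours: float,
    rework_fraction: float = 0.0,
) -> float:
    """Estimate the spot bill for a job, including redone work.

    `rework_fraction` is an assumed overhead: the share of extra hours
    spent repeating steps lost between the last checkpoint and a reclaim.
    """
    if rework_fraction < 0.0:
        raise ValueError("rework_fraction must be non-negative")
    spot_hourly = on_demand_hourly * (1.0 - discount)
    billed_hours = useful_hours * (1.0 + rework_fraction)
    return spot_hourly * billed_hours
```

For example, `effective_spot_cost(2.0, 0.7, 100.0, 0.1)` models a 100-hour job on a GPU with an illustrative on-demand price of 2.00 per hour, a 70% discount, and 10% of hours redone; the result can be compared with the on-demand bill of 200.00 for the same job.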
When to use a spot instance (AI training)
- Batch training and fine-tuning jobs with checkpointing.
- Large embedding generation pipelines.
Common mistakes
- No checkpointing: hours of training are lost when an instance is reclaimed.
- Using spot for live inference: interruptions break the user experience.
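The first mistake is usually avoided with a checkpoint-and-resume loop. Below is a minimal sketch using only the Python standard library. It assumes the reclaim reaches the process as SIGTERM, which depends on the provider and on how the job is launched. The file layout, function names, and toy "training step" are hypothetical stand-ins for a real framework's checkpoint API.

```python
import json
import os
import signal
import tempfile
import threading

# Set when the process is asked to stop (assumed to arrive as SIGTERM).
STOP = threading.Event()


def install_stop_handler():
    """Call once from the main thread before training starts."""
    signal.signal(signal.SIGTERM, lambda signum, frame: STOP.set())


def save_checkpoint(path, state):
    """Write to a temp file, then rename, so a reclaim mid-write
    cannot leave a half-written checkpoint behind."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)  # atomic replacement of the old checkpoint


def load_checkpoint(path):
    """Return the saved state, or a fresh state if none exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "loss_sum": 0.0}


def train(path, total_steps, checkpoint_every=100, stop_event=None):
    """Run (or resume) training until done or until a stop is requested."""
    stop_event = STOP if stop_event is None else stop_event
    state = load_checkpoint(path)
    while state["step"] < total_steps and not stop_event.is_set():
        # Placeholder for one real training step.
        state["loss_sum"] += 1.0 / (state["step"] + 1)
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)  # final save, also on a graceful stop
    return state
```

In practice the checkpoint path should point at storage that outlives the instance (a network volume or object store), since local disk is typically lost on reclaim. Rerunning `train` with the same path on a replacement instance continues from the last saved step.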
FAQ
What is a spot instance (AI training)?
A spot instance is a cloud GPU rented at a discount (often 50-90% off) on the condition that the provider can reclaim it on short notice — used for cost-sensitive training that can checkpoint and resume.
When should I use a spot instance (AI training)?
Batch training and fine-tuning jobs with checkpointing. Large embedding generation pipelines.
What are the most common mistakes with spot instances (AI training)?
No checkpointing, which loses hours of training when an instance is reclaimed. Using spot for live inference, where interruptions break the user experience.
Related terms
- Batched inference — Batched inference packs multiple prompts into a single GPU forward pass, dramatically improving throughput and unit cost at the cost of per-request latency.
- Fine-tuning — Fine-tuning updates a pretrained model's weights on task-specific data, baking the new behaviour into the model rather than relying on prompts.
- Cold start (inference) — Cold start is the delay incurred when a serverless inference function loads its model into GPU memory for the first time after being idle — typically 5-60 seconds for large models.
Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/spot-instance.md.