
Spot instance (AI training)

A spot instance is a cloud GPU rented at a discount (often 50-90% off) on the condition that the provider can reclaim it on short notice (two minutes on AWS, 30 seconds on GCP). Spot is used for cost-sensitive training that can checkpoint and resume.
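A discount and a cost multiplier are two views of the same number: paying (1 - d) of the on-demand price makes spot 1/(1 - d) times cheaper, so 50% off is 2× and 90% off is 10×. The Python sketch below also inflates the spot bill by work that has to be redone after reclaims. The hourly price and the `wasted_fraction` estimate are hypothetical inputs, not provider quotes.

```python
def spot_cost_multiplier(discount: float) -> float:
    """How many times cheaper spot is than on-demand for a given discount.

    discount=0.5 (50% off) means paying half the price, so 2x cheaper;
    discount=0.9 (90% off) means paying a tenth, so 10x cheaper.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return 1.0 / (1.0 - discount)


def effective_spot_cost(on_demand_hourly: float, discount: float,
                        useful_hours: float, wasted_fraction: float) -> float:
    """Spot bill for a job, with hours inflated by work redone after reclaims.

    wasted_fraction is a hypothetical estimate of extra compute, as a
    fraction of useful hours, spent redoing work lost since the last checkpoint.
    """
    spot_hourly = on_demand_hourly * (1.0 - discount)
    return spot_hourly * useful_hours * (1.0 + wasted_fraction)
```

With a hypothetical $4.00/hour on-demand rate, a 70% discount, 100 useful hours, and 10% redone work, `effective_spot_cost(4.0, 0.7, 100.0, 0.1)` comes to about $132, versus $400 on-demand.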

Spot pricing for GPUs (AWS, GCP, Vast.ai, Lambda) is the cheapest path to large-scale training in 2026, at the price of extra operational complexity. Training jobs must checkpoint frequently and resume cleanly when instances are reclaimed; tools such as Ray Train, SkyPilot, and Modal restart hooks handle the orchestration. For non-resumable workloads (interactive notebooks, live inference), spot is the wrong choice. For batch training, fine-tuning, and large embedding jobs, spot can cut compute cost by roughly 2-10× versus on-demand, the range a 50-90% discount implies.

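When to use a spot instance (AI training)

Batch training and fine-tuning jobs that checkpoint and resume.

Large embedding generation pipelines.

Common mistakes

Skipping checkpointing, which can lose hours of training when an instance is reclaimed.

Using spot for live inference, where interruptions break the user experience.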

FAQ

What is a spot instance (AI training)?

A spot instance is a cloud GPU rented at a discount (often 50-90% off) on the condition that the provider can reclaim it on short notice — used for cost-sensitive training that can checkpoint and resume.

When should I use a spot instance (AI training)?

For batch training and fine-tuning jobs that checkpoint and resume, and for large embedding generation pipelines.
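Embedding pipelines resume most simply when the input is split into shards and each finished shard is written to disk, so a restarted job skips whatever is already done and only redoes the shard that was in flight. A minimal Python sketch; `fake_embed` is a hypothetical placeholder for a real model call, and the shard file naming is an assumption.

```python
import json
import os


def fake_embed(text: str) -> list[float]:
    """Hypothetical stand-in for a real embedding model call."""
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


def run_embedding_job(texts: list[str], out_dir: str, shard_size: int) -> list[str]:
    """Embed texts shard by shard, skipping shards already on disk.

    Returns the shard files written in this run only.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for start in range(0, len(texts), shard_size):
        shard_path = os.path.join(out_dir, f"shard_{start // shard_size:05d}.json")
        if os.path.exists(shard_path):
            continue  # finished before the interruption
        vectors = [fake_embed(t) for t in texts[start:start + shard_size]]
        tmp_path = shard_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(vectors, f)
        os.replace(tmp_path, shard_path)  # shard appears only once complete
        written.append(shard_path)
    return written
```

Because a shard file only appears after its rename, a reclaim mid-shard leaves at most a stray `.tmp` file, and the next run recomputes that one shard.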

What are the most common mistakes with spot instances (AI training)?

Skipping checkpointing, which can lose hours of training when an instance is reclaimed. Using spot for live inference, where interruptions break the user experience.
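How often is "frequently"? One common rule of thumb, taken from the checkpointing literature rather than from any provider's guidance, is Young's approximation: with checkpoint write time C and mean time between interruptions M, the interval T ≈ √(2CM) balances time spent writing checkpoints against work lost per reclaim. A Python sketch under that first-order model, which ignores restart time; M has to come from your own observed reclaim history.

```python
import math


def young_checkpoint_interval(checkpoint_seconds: float, mtbi_seconds: float) -> float:
    """Young's approximation for the checkpoint interval minimizing overhead.

    checkpoint_seconds: time to write one checkpoint (C).
    mtbi_seconds: mean time between interruptions (M), an estimate you supply.
    """
    if checkpoint_seconds <= 0 or mtbi_seconds <= 0:
        raise ValueError("both durations must be positive")
    return math.sqrt(2.0 * checkpoint_seconds * mtbi_seconds)


def overhead_fraction(interval_seconds: float, checkpoint_seconds: float,
                      mtbi_seconds: float) -> float:
    """First-order fraction of wall time lost to checkpointing plus rework.

    Writing a checkpoint costs C every T seconds (C/T), and each interruption
    loses on average half an interval of work, once every M seconds (T/2M).
    """
    return (checkpoint_seconds / interval_seconds
            + interval_seconds / (2.0 * mtbi_seconds))
```

With an illustrative 60-second checkpoint write and reclaims roughly every 6 hours, `young_checkpoint_interval(60, 21600)` gives about 1,610 seconds, so checkpointing every 25-30 minutes.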

Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/spot-instance.md.