Model registry
A model registry is the versioned store for trained model artifacts and their metadata: the source of truth for which model version is in staging or prod, what its eval scores were, and how to roll back. MLflow, W&B, SageMaker, and Vertex AI all ship one in 2026.
Without a registry, "which model is in prod?" becomes tribal knowledge. A registry solves this: every trained model is versioned, tagged with a stage (dev / staging / prod), linked to its training run, and stored with its eval scores, and the registry exposes lifecycle hooks (promote, deprecate, roll back). For LLM apps the registry holds prompt versions, eval scores, and linked datasets: same pattern, different artifact.
Common production patterns: every push to main triggers an eval run, promotion to prod requires passing the baseline, and rollback is one click. MLflow's registry is the open-source standard; cloud platforms (SageMaker, Vertex AI, Azure ML) ship native equivalents. For LLM-only stacks, Langfuse, Braintrust, and Opik provide prompt registries.
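The lifecycle described above (versioning, stage tags, a link to the training run, attached eval scores, gated promotion, and rollback) can be sketched as a toy in-memory registry. This is an illustration only, not the API of MLflow or any other product: `ToyRegistry`, `ModelVersion`, the `accuracy` metric, the `archived` stage, and the run IDs are all hypothetical names chosen for the example, and a real registry would persist this state rather than hold it in memory.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModelVersion:
    version: int
    run_id: str                     # link back to the training run
    eval_scores: dict[str, float]   # metric name -> score
    stage: str = "dev"              # dev / staging / prod / archived (hypothetical tags)


@dataclass
class ToyRegistry:
    """Hypothetical in-memory registry; real registries persist this state."""
    versions: list[ModelVersion] = field(default_factory=list)
    prod_history: list[int] = field(default_factory=list)  # versions that held prod, oldest first

    def register(self, run_id: str, eval_scores: dict[str, float]) -> ModelVersion:
        # Version numbers are assigned sequentially, starting at 1.
        mv = ModelVersion(len(self.versions) + 1, run_id, eval_scores)
        self.versions.append(mv)
        return mv

    def get(self, version: int) -> ModelVersion:
        return self.versions[version - 1]

    def current_prod(self) -> Optional[ModelVersion]:
        return self.get(self.prod_history[-1]) if self.prod_history else None

    def promote(self, version: int, metric: str = "accuracy") -> None:
        """Promote to prod only if the candidate is not below the prod baseline."""
        candidate, baseline = self.get(version), self.current_prod()
        if baseline is not None:
            if candidate.eval_scores[metric] < baseline.eval_scores[metric]:
                raise ValueError(f"v{version} is below the prod baseline on {metric}")
            baseline.stage = "archived"
        candidate.stage = "prod"
        self.prod_history.append(version)

    def rollback(self) -> ModelVersion:
        """Archive the current prod version and restore the previous one."""
        if len(self.prod_history) < 2:
            raise RuntimeError("no earlier prod version to roll back to")
        self.get(self.prod_history.pop()).stage = "archived"
        restored = self.get(self.prod_history[-1])
        restored.stage = "prod"
        return restored


registry = ToyRegistry()
registry.register("run-001", {"accuracy": 0.90})
registry.register("run-002", {"accuracy": 0.93})
registry.promote(1)
registry.promote(2)              # passes the gate: 0.93 >= 0.90
restored = registry.rollback()   # version 1 is prod again
```

Keeping a history of prod versions, rather than a single "current" pointer, is what makes rollback a single call in this sketch.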
When to use a model registry
- Production ML / LLM apps.
- Compliance / audit-heavy environments.
Common mistakes
- Promoting without eval gates, so a broken model ships silently.
- Never running a rollback drill, so the first incident is when you discover that rollback is broken.
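Both mistakes can be guarded against with small automated checks. The sketch below is one possible shape, under stated assumptions: `gate_failures` assumes higher is better for every metric, and `rollback_drill` assumes the deployment exposes a way to read and set the live version. Both function names, the `tolerance` parameter, and the `get_live` / `set_live` callables are hypothetical and do not come from any particular registry.

```python
from typing import Callable


def gate_failures(candidate: dict[str, float],
                  baseline: dict[str, float],
                  tolerance: float = 0.0) -> list[str]:
    """Return the baseline metrics on which the candidate regresses.

    Assumes higher is better for every metric. A metric missing from the
    candidate counts as a failure, so an incomplete eval run cannot pass.
    """
    failures = []
    for metric, base_score in baseline.items():
        score = candidate.get(metric)
        if score is None or score < base_score - tolerance:
            failures.append(metric)
    return failures


def rollback_drill(get_live: Callable[[], str],
                   set_live: Callable[[str], None],
                   previous: str) -> bool:
    """Roll back to `previous`, check that it is live, then restore the original."""
    original = get_live()
    set_live(previous)
    rolled_back = get_live() == previous
    set_live(original)
    return rolled_back and get_live() == original


# Stand-in for a real deployment target (hypothetical).
live = {"version": "v7"}
drill_ok = rollback_drill(lambda: live["version"],
                          lambda v: live.update(version=v),
                          "v6")
```

A promotion step would refuse to proceed when `gate_failures` returns a non-empty list, and the drill would run on a schedule rather than only after an incident.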
FAQ
What is a model registry?
A model registry is the versioned store for trained model artifacts and their metadata: the source of truth for which model version is in staging or prod, what its eval scores were, and how to roll back. MLflow, W&B, SageMaker, and Vertex AI all ship one in 2026.
When should I use a model registry?
In production ML / LLM apps, and in compliance / audit-heavy environments.
What are the most common mistakes with a model registry?
Promoting without eval gates, so a broken model ships silently. Never running a rollback drill, so the first incident is when you discover that rollback is broken.
Related terms
- Experiment tracking: Experiment tracking is the MLOps discipline of capturing hyperparameters, metrics, model artifacts, and code versions per training run, making model development reproducible and comparable. W&B, Comet, MLflow, and Neptune are the 2026 leaders.
- Shadow deployment (LLM) — Shadow deployment runs a new model or prompt alongside the production one — receiving the same traffic but never showing output to users — to measure quality, latency, and cost before flipping live.
- Evals-driven development — Evals-driven development is the discipline of writing the eval suite first, then iterating prompts and models against it — borrowing test-driven development for LLM work.
Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/model-registry.md.