concept

Experiment tracking

Experiment tracking is the MLOps discipline of capturing hyperparameters, metrics, model artifacts, and code versions for each training run, which makes model development reproducible and comparable. As of 2026, the leading platforms are W&B, Comet, MLflow, and Neptune.

Without experiment tracking, ML and LLM teams lose the answer to 'why did this model perform better?' after a few weeks. Tracking platforms capture hyperparameters (learning rate, batch size, prompt template), metrics (loss curves, eval scores), artifacts (model weights, datasets, configs), the code version (git SHA), and system metrics (GPU utilization). The UI lets you compare runs side by side, plot metric trends, and reproduce winning runs.

In 2026, LLM apps extend this with prompt tracking, eval scores, dataset versions, and trace samples; Weave (W&B), Opik (Comet), Langfuse, and Braintrust serve this need. Experiment tracking is the prerequisite for serious ML and LLM iteration: ad-hoc tracking in spreadsheets collapses after week 2.
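To make the captured fields concrete, here is a minimal sketch of a run record that uses only the Python standard library. It is not the API of W&B, Comet, MLflow, or Neptune; the names `Run`, `log_metric`, `log_artifact`, `save`, and `compare` are hypothetical and chosen for illustration, and system metrics such as GPU utilization are left out.

```python
import json
import subprocess
import time
import uuid
from dataclasses import asdict, dataclass, field
from pathlib import Path


def current_git_sha() -> str:
    """Return the current commit SHA, or "unknown" if git is unavailable."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"


@dataclass
class Run:
    """Hypothetical record of one training run (illustrative, not a real library)."""
    params: dict                                    # hyperparameters, prompt template, ...
    run_id: str = field(default_factory=lambda: uuid.uuid4().hex[:8])
    git_sha: str = field(default_factory=current_git_sha)
    started_at: float = field(default_factory=time.time)
    metrics: dict = field(default_factory=dict)     # name -> list of (step, value)
    artifacts: list = field(default_factory=list)   # paths to weights, datasets, configs

    def log_metric(self, name: str, value: float, step: int) -> None:
        self.metrics.setdefault(name, []).append((step, value))

    def log_artifact(self, path: str) -> None:
        self.artifacts.append(path)

    def save(self, root: str = "runs") -> Path:
        """Write the run as JSON so it can be reloaded and compared later."""
        out = Path(root) / f"{self.run_id}.json"
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_text(json.dumps(asdict(self), indent=2))
        return out


def compare(runs, metric):
    """Return (run_id, params, final value) rows, lowest final value first."""
    rows = [
        (r.run_id, r.params, r.metrics[metric][-1][1])
        for r in runs
        if metric in r.metrics
    ]
    return sorted(rows, key=lambda row: row[2])
```

Calling `compare([run_a, run_b], "loss")` gives a crude version of the side-by-side view: each row pairs a run's hyperparameters with its final loss, so the difference between two runs can be read off directly. Sorting lowest-first assumes a metric where lower is better.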

When to use experiment tracking

Common mistakes

FAQ

What is experiment tracking?

Experiment tracking is the MLOps discipline of capturing hyperparameters, metrics, model artifacts, and code versions for each training run, which makes model development reproducible and comparable. As of 2026, the leading platforms are W&B, Comet, MLflow, and Neptune.

When should I use experiment tracking?

Use it on any team that trains models or iterates on prompts seriously, and wherever reproducibility or audit requirements apply.

What are the most common mistakes with experiment tracking?

Logging too little: recording only loss, with no system metrics or sample outputs, makes later debugging impossible. Logging too much: recording every gradient norm makes the UI unusable.
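One way to steer between these two mistakes is to give each kind of signal its own logging interval. The sketch below is an assumption for illustration, not a recommendation from any tracking platform: the `LoggingPolicy` class, its field names, and the default intervals are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LoggingPolicy:
    """Hypothetical per-signal logging intervals, measured in training steps."""
    loss_every: int = 1         # cheap scalar: log every step
    system_every: int = 50      # system metrics such as GPU utilization
    grad_norm_every: int = 100  # one aggregated norm, sampled rather than every step
    sample_every: int = 500     # sample model outputs for later debugging

    def due(self, step: int) -> list:
        """Return the names of the signals that should be logged at this step."""
        schedule = {
            "loss": self.loss_every,
            "system": self.system_every,
            "grad_norm": self.grad_norm_every,
            "sample": self.sample_every,
        }
        return [name for name, every in schedule.items() if step % every == 0]
```

Inside a training loop, `policy.due(step)` would decide which values to send to the tracker at each step, so sample outputs and system metrics are still recorded while high-volume signals stay sparse enough to keep the UI readable.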

Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/experiment-tracking.md.