# Diffusion model

**Source:** https://promtable.com/glossary/diffusion-model

> A diffusion model is a generative neural network that creates images, video, or audio by iteratively denoising random noise toward a learned target distribution.

---

Diffusion models start from random Gaussian noise and repeatedly apply a denoising network, typically for 20–50 steps, until a coherent sample emerges. Systems generally described as diffusion-based include Stable Diffusion, Flux, Midjourney, DALL·E 3, Imagen, Sora, Kling, Veo, and Stable Audio. Two variants are common. Latent diffusion (Stable Diffusion, Flux) runs the denoising in a compressed latent space rather than on raw pixels, which makes it faster. Flow matching (Flux, Stable Diffusion 3) trains the network to predict a velocity field and generates a sample by integrating an ODE from noise to data, rather than reversing a stochastic noising process. Generation is steered by text encoders (CLIP, T5), whose embeddings condition the denoising network. In 2026 diffusion still dominates image and video generation; autoregressive image models (e.g. nano-banana) are catching up on instruction-following but remain a minority.
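The sampling loop described above can be sketched in a few lines. This is a toy, one-dimensional illustration in the flow-matching style, not how any of the named systems is implemented: the "data" distribution `N(MU, S^2)`, the closed-form `velocity` function (standing in for a trained network), and the `sample` helper are all assumptions made for the example.

```python
import random
import statistics

# Hypothetical 1-D "data" distribution N(MU, S^2); a real model learns
# its target distribution from images, video, or audio.
MU, S = 3.0, 0.5


def velocity(x: float, t: float) -> float:
    """Closed-form stand-in for a trained network (an assumption).

    For the straight-line path x_t = (1 - t) * noise + t * data, with
    noise ~ N(0, 1) and data ~ N(MU, S^2), this returns
    E[data - noise | x_t = x].
    """
    var_t = (1.0 - t) ** 2 + (t * S) ** 2
    return MU + (t * S**2 - (1.0 - t)) / var_t * (x - t * MU)


def sample(seed: int, steps: int = 50) -> float:
    """Turn one draw of Gaussian noise into one sample with Euler steps."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)      # t = 0: pure Gaussian noise
    dt = 1.0 / steps
    for k in range(steps):       # integrate the ODE toward t = 1 (data)
        x += dt * velocity(x, k * dt)
    return x


samples = [sample(seed) for seed in range(2000)]
# Mean and spread should land close to MU and S (up to Euler error).
print(statistics.mean(samples), statistics.stdev(samples))
```

In a real system `velocity` would be a large neural network that also receives the text embedding, and `x` would be an image or latent tensor rather than a single number; the loop structure is the part that carries over.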

## Common mistakes

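- Comparing diffusion samplers without fixing the seed, so differences caused by the initial noise are mistaken for differences between samplers.
- Assuming a prompt alone determines the output: the same prompt with a different seed produces a different image.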
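Both mistakes come down to where the randomness enters: the seed fixes the initial noise, and the sampler only transforms it. The sketch below illustrates this with a made-up scalar ODE; `toy_velocity`, `euler`, `heun`, and `generate` are hypothetical names for this example and do not correspond to any library's API.

```python
import math
import random


def toy_velocity(x: float, t: float) -> float:
    # Hypothetical stand-in for a trained denoising network.
    return math.cos(3.0 * t) - x


def euler(x: float, steps: int = 30) -> float:
    """First-order sampler: one network call per step."""
    dt = 1.0 / steps
    for k in range(steps):
        x += dt * toy_velocity(x, k * dt)
    return x


def heun(x: float, steps: int = 30) -> float:
    """Second-order sampler: averages the slope at both ends of a step."""
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        v1 = toy_velocity(x, t)
        v2 = toy_velocity(x + dt * v1, t + dt)
        x += 0.5 * dt * (v1 + v2)
    return x


def generate(sampler, seed: int) -> float:
    # The seed determines the starting noise and nothing else.
    x0 = random.Random(seed).gauss(0.0, 1.0)
    return sampler(x0)


# Same seed and same sampler: the output is reproducible.
assert generate(euler, seed=42) == generate(euler, seed=42)

# Fair comparison: same seed, so any gap is due to the sampler.
fair_gap = abs(generate(euler, 42) - generate(heun, 42))
# Confounded comparison: the seed changed too, so the gap mixes
# sampler differences with different starting noise.
confounded_gap = abs(generate(euler, 42) - generate(heun, 43))
print(fair_gap, confounded_gap)
```

The same discipline applies with real pipelines: hold the seed (and prompt) fixed, vary only the sampler or step count, and repeat over several seeds before drawing conclusions.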

## Related terms

- [negative-prompt](https://promtable.com/glossary/negative-prompt)
- [seed](https://promtable.com/glossary/seed)
- [cfg-scale](https://promtable.com/glossary/cfg-scale)

*Last updated: 2026-06-01*

---

Original page: https://promtable.com/glossary/diffusion-model
Maintained by Promtable (https://promtable.com). Content: CC BY 4.0. Cite as "Promtable — https://promtable.com/glossary/diffusion-model".
Contact: info@vibecodingturkey.com.