ControlNet
ControlNet is a neural-network architecture that conditions a diffusion image model on extra spatial inputs — edges, depth, pose, segmentation — for precise control over output structure.
ControlNet (Zhang et al., 2023) adds a parallel conditioning branch to a pretrained diffusion model so that it can accept a structural hint alongside the text prompt. Common control types include Canny edges, depth maps, OpenPose skeletons, normal maps, scribbles, and line art. The model then generates images that match both the text prompt and the spatial structure of the hint. ControlNet is widely used in production workflows for AI photography, AI animation, and architectural visualisation, where the composition has to be exact. The richest ControlNet ecosystem is on Stable Diffusion 1.5 and SDXL; Flux ControlNets are catching up; Midjourney has no first-party ControlNet.
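The parallel branch can be pictured with a toy example. In the original paper the branch is a trainable copy of the base model's encoder blocks, joined to the frozen model through zero-initialised convolutions, so at the start of training it contributes nothing and the pretrained behaviour is preserved. The sketch below reduces that idea to plain Python lists. `frozen_block`, `control_branch`, `ZeroConv`, and `controlled_block` are hypothetical stand-ins invented for illustration, not part of any library, and the arithmetic inside them is arbitrary.

```python
from typing import List

Vector = List[float]


def frozen_block(x: Vector) -> Vector:
    """Hypothetical stand-in for a pretrained, frozen layer of the base model."""
    return [2.0 * v + 1.0 for v in x]


def control_branch(x: Vector, hint: Vector) -> Vector:
    """Hypothetical stand-in for the trainable copy, which also sees the hint."""
    return [2.0 * (v + h) + 1.0 for v, h in zip(x, hint)]


class ZeroConv:
    """A 1x1 convolution reduced to one scalar weight and bias, both starting at zero."""

    def __init__(self) -> None:
        self.weight = 0.0
        self.bias = 0.0

    def __call__(self, x: Vector) -> Vector:
        return [self.weight * v + self.bias for v in x]


def controlled_block(x: Vector, hint: Vector, zero_conv: ZeroConv) -> Vector:
    """Frozen output plus the control branch's residual, gated by the zero conv."""
    base = frozen_block(x)
    residual = zero_conv(control_branch(x, hint))
    return [b + r for b, r in zip(base, residual)]


x = [0.5, -1.0, 2.0]
hint = [1.0, 0.0, 1.0]  # toy flattened edge map: 1.0 where an edge is present
zc = ZeroConv()

# Before any training the zero conv outputs zeros, so the base model is unchanged.
untrained = controlled_block(x, hint, zc)

# Pretend training has moved the weight away from zero: the hint now shifts the output.
zc.weight = 0.25
trained = controlled_block(x, hint, zc)
```

With the weight at zero, `untrained` equals `frozen_block(x)` exactly. Once the weight is non-zero, the positions where the hint is set move further from the base output than the position where it is not.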
When to use ControlNet
- Locking a specific pose, edge map, or depth structure across generations.
- Architectural visualisation, product photography, character keyframing.
Common mistakes
- Using a misaligned control input, which produces broken hybrid output.
- Combining too many ControlNets at once, which makes the model lose coherence.
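Both mistakes can be caught before any generation runs. The sketch below is a hypothetical pre-flight check in plain Python. It assumes "misaligned" means the control image's size or aspect ratio differs from the output canvas, which is only one way an input can be misaligned. `ControlInput`, `check_controls`, and the default limit of two ControlNets are illustrative assumptions, not values taken from any library or from the ControlNet paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ControlInput:
    kind: str              # e.g. "canny", "depth", "openpose"
    size: Tuple[int, int]  # (width, height) of the control image in pixels


def check_controls(
    controls: List[ControlInput],
    output_size: Tuple[int, int],
    max_controls: int = 2,  # assumed limit; tune for your own model and workflow
) -> List[str]:
    """Return human-readable warnings; an empty list means nothing was flagged."""
    warnings = []
    out_w, out_h = output_size

    for c in controls:
        w, h = c.size
        # Compare aspect ratios by cross-multiplying to avoid float rounding.
        if w * out_h != h * out_w:
            warnings.append(
                f"{c.kind}: aspect ratio {w}x{h} does not match output {out_w}x{out_h}"
            )
        elif (w, h) != (out_w, out_h):
            warnings.append(
                f"{c.kind}: size {w}x{h} will be rescaled to {out_w}x{out_h}"
            )

    if len(controls) > max_controls:
        warnings.append(
            f"{len(controls)} ControlNets stacked (limit {max_controls}); expect conflicts"
        )
    return warnings


controls = [
    ControlInput("canny", (1024, 1024)),
    ControlInput("depth", (768, 512)),
    ControlInput("openpose", (512, 512)),
]
report = check_controls(controls, output_size=(1024, 1024))
for line in report:
    print(line)
```

Here the Canny input passes, the depth map is flagged for a mismatched aspect ratio, the pose image is flagged for rescaling, and the stack of three is flagged for exceeding the assumed limit.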
FAQ
What is ControlNet?
ControlNet is a neural-network architecture that conditions a diffusion image model on extra spatial inputs — edges, depth, pose, segmentation — for precise control over output structure.
When should I use ControlNet?
Use it to lock a specific pose, edge map, or depth structure across generations, for example in architectural visualisation, product photography, and character keyframing.
What are the most common mistakes with ControlNet?
The two most common are using a misaligned control input, which produces broken hybrid output, and combining too many ControlNets at once, which makes the model lose coherence.
Related terms
- Diffusion model — A diffusion model is a generative neural network that creates images, video, or audio by iteratively denoising random noise toward a learned target distribution.
- LoRA (Low-Rank Adaptation) — LoRA is a fine-tuning method that trains a small set of low-rank adapter weights on top of a frozen base model — cheaper to train and store than full fine-tuning.
- Seed — A seed is an integer that initializes the random number generator inside an image, video, or audio model, making generation reproducible.
- CFG scale (classifier-free guidance) — CFG scale controls how strongly a diffusion image model follows its text prompt — higher values stick closer to the prompt, lower values explore more.
Sources
Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/controlnet.md.