Inference

CFG (Classifier-Free Guidance)

2022 · Active · Published: 8 June 2026 · Updated: 8 June 2026

Key innovation

Strengthens conditioning in generative models without a separate classifier — by extrapolating between the conditional and unconditional prediction of the same model trained with random condition dropout.

How it works

Training: the condition c (e.g. a text embedding) is randomly replaced with an empty token ∅ with probability p_uncond (typically 0.1-0.2). As a result, a single set of weights learns both the conditional prediction ε_θ(x,c) and the unconditional prediction ε_θ(x,∅).

Inference: at each denoising (or generation) step, two predictions are computed, one conditional and one unconditional, and the result is linearly extrapolated: ε̃ = ε_θ(x,∅) + w·(ε_θ(x,c) − ε_θ(x,∅)). Expanding gives the identical form ε̃ = (1−w)·ε_θ(x,∅) + w·ε_θ(x,c). The vector (ε_θ(x,c) − ε_θ(x,∅)) points in the "condition direction", and the scale w amplifies it: w = 0 gives the unconditional prediction, w = 1 means no guidance (purely conditional), and w > 1 amplifies the condition. Some papers, including the original, use a shifted scale on which 0 rather than 1 means no guidance.

Cost: roughly 2× inference compute, because each step needs two forward passes. Batching the conditional and unconditional passes into one call can hide much of the latency on hardware with spare capacity, but the amount of computation is unchanged.
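The different ways of writing the guided prediction can be reconciled in a few lines. The derivation below is a sketch: w' is a label introduced here for the shifted scale used in the original paper, not notation from this page.

```latex
\begin{aligned}
\tilde{\epsilon}
  &= \epsilon_\theta(x,\varnothing)
     + w\,\bigl(\epsilon_\theta(x,c) - \epsilon_\theta(x,\varnothing)\bigr) \\
  &= (1-w)\,\epsilon_\theta(x,\varnothing) + w\,\epsilon_\theta(x,c) \\
  &= (1+w')\,\epsilon_\theta(x,c) - w'\,\epsilon_\theta(x,\varnothing),
     \qquad w' = w - 1 .
\end{aligned}
```

On the shifted scale, w' = 0 corresponds to w = 1, i.e. the purely conditional prediction, so a guidance value quoted from one source may need an offset of 1 before it is used with another.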

Problem solved

Conditional generative models often follow the condition (prompt) weakly, producing content only loosely related to it. Earlier classifier guidance required a separate classifier trained on noisy data — costly and difficult. CFG strongly amplifies conditioning using only the generative model itself, with no extra networks.

Components

Condition dropout (training): trains a joint conditional and unconditional model

Randomly replacing the condition c with an empty token ∅ at probability p_uncond (usually 0.1-0.2) during training.
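Condition dropout can be sketched as a per-sample coin flip applied to a batch of condition embeddings. This is a minimal illustration, not any library's implementation: `drop_conditions` is a hypothetical helper, and the all-zero vector standing in for ∅ is an assumption (real models often use a learned embedding or the encoding of an empty prompt).

```python
import numpy as np

def drop_conditions(cond, null_cond, p_uncond, rng):
    """Replace each row of `cond` with `null_cond` with probability p_uncond."""
    batch_size = cond.shape[0]
    drop = rng.random(batch_size) < p_uncond       # one Bernoulli draw per sample
    mixed = np.where(drop[:, None], null_cond[None, :], cond)
    return mixed, drop

rng = np.random.default_rng(0)
cond = rng.normal(size=(10_000, 4))   # stand-in for a batch of text embeddings
null_cond = np.zeros(4)               # assumed representation of the empty token
mixed, drop = drop_conditions(cond, null_cond, p_uncond=0.1, rng=rng)
```

The model is then trained on `mixed` exactly as it would be on `cond`, so roughly a fraction p_uncond of each batch teaches the unconditional prediction.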

Dual forward pass (inference): computes conditional and unconditional predictions

Two model passes per step: ε_θ(x,c) and ε_θ(x,∅). Often batched together.
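Batching the two passes usually means duplicating the latents and stacking the empty and real conditions along the batch axis. The sketch below assumes a model that acts row by row; `toy_model` is a hypothetical stand-in for ε_θ, not a real network.

```python
import numpy as np

def toy_model(x, cond):
    """Hypothetical stand-in for the denoiser: acts independently on each row."""
    return 0.5 * x + cond

def dual_pass_batched(model, x, cond, null_cond):
    """Get conditional and unconditional predictions from one batched call."""
    x_in = np.concatenate([x, x], axis=0)                      # latents, duplicated
    c_in = np.concatenate(
        [np.broadcast_to(null_cond, cond.shape), cond], axis=0
    )                                                          # [empty, real]
    eps = model(x_in, c_in)                                    # one call, batch 2N
    eps_uncond, eps_cond = np.split(eps, 2, axis=0)
    return eps_cond, eps_uncond

x = np.ones((3, 4))
cond = np.full((3, 4), 2.0)
null_cond = np.zeros(4)               # assumed representation of the empty token
eps_cond, eps_uncond = dual_pass_batched(toy_model, x, cond, null_cond)
```

The result matches two separate calls; only the number of model invocations changes, not the amount of arithmetic.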

Guidance extrapolation: linear combination steered by scale w

ε̃ = ε_θ(x,∅) + w·(ε_θ(x,c) − ε_θ(x,∅)). The scale w controls conditioning strength.
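A small numeric example makes the role of w concrete. `apply_cfg` is a hypothetical helper name, and the two-component vectors are toy values chosen only so the arithmetic is easy to check.

```python
import numpy as np

def apply_cfg(eps_cond, eps_uncond, w):
    """Extrapolate from the unconditional prediction toward the conditional one."""
    return eps_uncond + w * (eps_cond - eps_uncond)

# Toy predictions: they differ only in the first component.
eps_uncond = np.array([0.0, 1.0])
eps_cond = np.array([1.0, 1.0])

guided = {w: apply_cfg(eps_cond, eps_uncond, w) for w in (0.0, 1.0, 7.5)}
# w = 0.0 returns eps_uncond, w = 1.0 returns eps_cond, and w = 7.5 moves
# 7.5 times as far along the direction (eps_cond - eps_uncond).
```

Components on which the two predictions agree are left untouched for any w; only the disagreement is scaled.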

CFG rescale: variance renormalization of ε̃ against oversaturation (Common Diffusion Noise Schedules paper).
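One way to read the rescale step: bring the per-sample standard deviation of the guided prediction back to that of the conditional prediction, then blend with the unrescaled result. The sketch below follows that reading; `cfg_with_rescale` is a hypothetical name, and the blend factor `phi` and its default of 0.7 should be treated as tunable assumptions.

```python
import numpy as np

def cfg_with_rescale(eps_cond, eps_uncond, w, phi=0.7):
    """CFG followed by a per-sample standard-deviation rescale."""
    eps_cfg = eps_uncond + w * (eps_cond - eps_uncond)
    axes = tuple(range(1, eps_cond.ndim))           # every axis except the batch
    std_cond = eps_cond.std(axis=axes, keepdims=True)
    std_cfg = eps_cfg.std(axis=axes, keepdims=True)
    eps_rescaled = eps_cfg * (std_cond / std_cfg)   # match the conditional std
    # phi = 0 leaves plain CFG, phi = 1 applies the full rescale.
    return phi * eps_rescaled + (1.0 - phi) * eps_cfg

rng = np.random.default_rng(0)
eps_cond = rng.normal(size=(2, 3, 8, 8))
eps_uncond = rng.normal(size=(2, 3, 8, 8))
full = cfg_with_rescale(eps_cond, eps_uncond, w=7.5, phi=1.0)
plain = cfg_with_rescale(eps_cond, eps_uncond, w=7.5, phi=0.0)
```

With `phi=1.0` the output has the same per-sample spread as `eps_cond`, which is what counteracts the growth in magnitude at high w.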

Dynamic/adaptive CFG: guidance scale w that varies over the denoising steps.
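Many schedules are possible; the page does not name a specific one. As one illustrative assumption, the sketch below decays w linearly from a strong value at the first step to 1 (no guidance) at the last. `guidance_schedule` and its default values are hypothetical.

```python
def guidance_schedule(step, num_steps, w_start=7.5, w_end=1.0):
    """Linearly interpolate w from w_start (first step) to w_end (last step)."""
    if num_steps < 2:
        return w_start
    t = step / (num_steps - 1)          # 0.0 at the first step, 1.0 at the last
    return w_start + t * (w_end - w_start)

schedule = [guidance_schedule(i, 5) for i in range(5)]
# schedule == [7.5, 5.875, 4.25, 2.625, 1.0]
```

Inside a sampling loop, the value for the current step would simply replace the constant w in the extrapolation formula.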

Negative prompt guidance: ∅ is replaced by a negative-prompt embedding.
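With a negative prompt, the same extrapolation formula is used, but the baseline it extrapolates away from is the negative-prompt prediction. A minimal sketch, with `toy_model` and `guided_with_negative` as hypothetical names and toy embeddings:

```python
import numpy as np

def toy_model(x, cond):
    """Hypothetical stand-in for the denoiser."""
    return x + cond

def guided_with_negative(model, x, cond, neg_cond, w):
    """CFG where the unconditional branch is fed the negative-prompt embedding."""
    eps_cond = model(x, cond)
    eps_neg = model(x, neg_cond)        # takes the place of the empty-token pass
    return eps_neg + w * (eps_cond - eps_neg)

x = np.zeros(2)
cond = np.array([1.0, 0.0])             # toy "wanted" direction
neg_cond = np.array([0.0, 1.0])         # toy "unwanted" direction
eps = guided_with_negative(toy_model, x, cond, neg_cond, w=3.0)
# eps == [3.0, -2.0]: pushed toward cond and away from neg_cond
```

The negative sign on the second component is the point: content associated with the negative prompt is actively suppressed rather than merely ignored.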


Implementation

Reference implementations

Diffusers (Hugging Face): guidance_scale · Python · Official

OpenAI guided-diffusion / GLIDE · Python · Official

Stability AI generative-models · Python · Official

Implementation pitfalls

Oversaturation and artifacts at high w (severity: High)

Large guidance scale causes oversaturated colors, posterization, and unnatural textures.

Fix: CFG rescale, dynamic thresholding (Imagen), lower w, zero-SNR schedule.
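Of these fixes, dynamic thresholding is the easiest to show in isolation. It operates on the predicted clean sample rather than on the noise prediction: take a high percentile s of the absolute values, and when s exceeds 1, clip to [−s, s] and divide by s. The sketch below assumes data normalized to [−1, 1]; `dynamic_threshold` is a hypothetical name and the percentile of 99.5 is an assumed setting.

```python
import numpy as np

def dynamic_threshold(x0, percentile=99.5):
    """Clip each sample to [-s, s] and rescale by s, where s is a high percentile."""
    flat = np.abs(x0).reshape(x0.shape[0], -1)
    s = np.percentile(flat, percentile, axis=1)     # one threshold per sample
    s = np.maximum(s, 1.0)                          # act only when s exceeds 1
    s = s.reshape(-1, *([1] * (x0.ndim - 1)))       # broadcast over non-batch axes
    return np.clip(x0, -s, s) / s

rng = np.random.default_rng(0)
saturated = rng.normal(scale=3.0, size=(2, 3, 8, 8))   # far outside [-1, 1]
in_range = rng.uniform(-1.0, 1.0, size=(2, 3, 8, 8))   # already inside [-1, 1]
fixed = dynamic_threshold(saturated)
untouched = dynamic_threshold(in_range)
```

Samples that are already in range pass through unchanged, while saturated ones are pulled back into [−1, 1] without hard clipping most of their values.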

Doubled inference cost (severity: Medium)

Two forward passes per step (conditional + unconditional) double the compute cost.

Fix: Batching both passes, guidance distillation, disabling CFG in late steps.
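The saving from disabling CFG in late steps is simple arithmetic: guided steps cost two model evaluations, unguided steps cost one. A back-of-the-envelope sketch, with `forward_passes` as a hypothetical helper and the step counts chosen only for illustration:

```python
def forward_passes(num_steps, cfg_steps):
    """Model evaluations when only the first `cfg_steps` steps use CFG."""
    if not 0 <= cfg_steps <= num_steps:
        raise ValueError("cfg_steps must lie between 0 and num_steps")
    return 2 * cfg_steps + (num_steps - cfg_steps)

full_cfg = forward_passes(50, 50)      # 100 evaluations: CFG at every step
partial_cfg = forward_passes(50, 30)   # 80 evaluations: CFG off for the last 20
no_cfg = forward_passes(50, 0)         # 50 evaluations: no guidance at all
```

How many late steps can go unguided without visible quality loss depends on the model and sampler, so the cutoff is something to tune rather than a fixed rule.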

Diversity drop (conditional mode collapse) (severity: Medium)

High w increases condition fidelity at the cost of sample diversity.

Fix: Choose w as a quality-diversity trade-off; adaptive guidance schedule.

Evolution

Original paper · 2022 · NeurIPS 2021 Workshop / arXiv 2022 · Jonathan Ho

Classifier-Free Diffusion Guidance

Jonathan Ho, Tim Salimans

2021

Classifier Guidance (predecessor)

Dhariwal & Nichol introduce guidance using a separate classifier trained on noisy data.

Diffusion Models Beat GANs on Image Synthesis (paper)

2021

Classifier-Free Guidance — introduction

Inflection point

Ho & Salimans show the separate classifier is unnecessary — a joint conditional/unconditional model suffices.

2022

Adoption in GLIDE, DALL·E 2, Imagen, Stable Diffusion

Inflection point

CFG becomes the standard conditioning mechanism in leading text-to-image models.

LDM (concept)

2023

CFG rescale and zero-SNR

Lin et al. diagnose oversaturation at high w and propose rescale + zero-SNR schedule.

Common Diffusion Noise Schedules and Sample Steps are Flawed (paper)

2024

Guidance distillation (fewer steps)

Distilling CFG into a single forward pass removes the 2× compute overhead (e.g. in few-step models).

Sources

Classifier-Free Diffusion Guidance · Paper · arXiv / NeurIPS 2021 Workshop

Diffusion Models Beat GANs on Image Synthesis (classifier guidance) · Paper · arXiv / NeurIPS 2021

GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models · Paper · arXiv

Common Diffusion Noise Schedules and Sample Steps are Flawed (CFG rescale) · Paper · arXiv / WACV 2024

Hugging Face Diffusers documentation · Documentation · Hugging Face
