Architecture

GAN

2014 · Active · Published: 8 June 2026 · Updated: 8 June 2026

Key innovation

Training a generative model via a minimax game between two networks — a generator and a discriminator — instead of explicit likelihood maximization, yielding sharp, realistic samples without modeling an explicit density.

How it works

The generator G(z; θ_G) maps a noise vector z (usually drawn from 𝒩(0, I) or U(−1, 1)) to a sample. The discriminator D(x; θ_D) returns the probability that x is real. Training alternates between two steps: (1) D step: maximize log D(x) + log(1 − D(G(z))) on a minibatch of real and generated samples (binary classification); (2) G step: minimize log(1 − D(G(z))) or, in practice, maximize log D(G(z)) (the non-saturating loss, which gives stronger gradients early in training). In the G step, gradients flow through D into G while D's parameters stay fixed.

Variants change the loss and regularization: WGAN (Wasserstein distance with weight clipping), WGAN-GP (gradient penalty), LSGAN (least squares), hinge loss, spectral normalization. Architectural variants: DCGAN (convolutions), conditional GAN (a condition c fed to both networks), Pix2Pix/CycleGAN (image-to-image), StyleGAN (style-based generator with a latent mapping z → w), BigGAN (large scale plus self-attention). Training is delicate: it requires balancing the capacity of G and D.
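The alternating D step and G step can be sketched on a toy one-dimensional problem. Everything concrete here is an assumption made for illustration: the linear generator G(z) = a·z + b, the logistic discriminator D(x) = sigmoid(w·x + c), the real data distribution 𝒩(3, 0.5), the learning rate, and the hand-derived gradients. A real GAN would use neural networks and automatic differentiation.

```python
import math
import random

def sigmoid(s):
    # Numerically stable logistic function.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def mean(values):
    return sum(values) / len(values)

def d_objective(d, real, fake):
    # Discriminator objective: mean log D(x) + mean log(1 - D(G(z))).
    w, c = d
    return (mean([math.log(sigmoid(w * x + c)) for x in real])
            + mean([math.log(1.0 - sigmoid(w * x + c)) for x in fake]))

def g_objective(g, d, noise):
    # Non-saturating generator objective: mean log D(G(z)).
    (a, b), (w, c) = g, d
    return mean([math.log(sigmoid(w * (a * z + b) + c)) for z in noise])

def d_step(d, real, fake, lr):
    # One gradient-ascent step on d_objective with respect to (w, c).
    w, c = d
    p_real = [sigmoid(w * x + c) for x in real]
    p_fake = [sigmoid(w * x + c) for x in fake]
    grad_w = (mean([(1.0 - p) * x for p, x in zip(p_real, real)])
              - mean([p * x for p, x in zip(p_fake, fake)]))
    grad_c = mean([1.0 - p for p in p_real]) - mean(p_fake)
    return (w + lr * grad_w, c + lr * grad_c)

def g_step(g, d, noise, lr):
    # One gradient-ascent step on g_objective with respect to (a, b).
    # The gradient reaches the generator through D; D itself is not updated.
    (a, b), (w, c) = g, d
    p_fake = [sigmoid(w * (a * z + b) + c) for z in noise]
    grad_a = mean([(1.0 - p) * w * z for p, z in zip(p_fake, noise)])
    grad_b = mean([(1.0 - p) * w for p in p_fake])
    return (a + lr * grad_a, b + lr * grad_b)

rng = random.Random(0)
real = [rng.gauss(3.0, 0.5) for _ in range(256)]   # assumed real data
noise = [rng.gauss(0.0, 1.0) for _ in range(256)]  # z ~ N(0, 1)
g, d = (1.0, 0.0), (0.0, 0.0)                      # initial (a, b) and (w, c)

fake = [g[0] * z + g[1] for z in noise]
d_new = d_step(d, real, fake, lr=0.05)   # (1) D step
g_new = g_step(g, d_new, noise, lr=0.05) # (2) G step
```

Repeating the two steps in a loop gives the full training procedure. This sketch makes no claim about convergence: even on toy problems the alternating updates can cycle around the equilibrium.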

Problem solved

Earlier generative models such as VAEs produced blurry samples because reconstruction losses average over plausible outputs, and explicit-density models were computationally expensive. GANs bypass explicit density modeling by learning the distribution implicitly from the discriminator's signal, which leads to sharp, realistic samples and fast single-pass generation.

Components

Generator: Maps noise to data samples

A network transforming a latent vector z into a sample G(z). Trained to fool the discriminator. In StyleGAN it is preceded by a mapping network z → w.

Discriminator: Distinguishes real from generated samples

A binary classifier (or, in WGAN, a critic returning a scalar score) that provides the learning signal to the generator. Usually discarded after training.

Binary classifier (vanilla/DCGAN): Sigmoid + binary cross-entropy.

Critic (WGAN): Returns an unbounded scalar score; the difference between mean scores on real and generated samples estimates the Wasserstein distance.

PatchGAN (Pix2Pix): Classifies local patches instead of the whole image.

Adversarial loss: The minimax game objective driving training

The loss defining the game: vanilla (BCE), non-saturating, Wasserstein, least squares, hinge. The choice strongly affects stability.
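The loss families listed above can be written as small functions of the discriminator's raw outputs (logits or critic scores). This is a minimal sketch in plain Python. The function names are hypothetical, and the conventions are assumptions rather than the only options: each function returns a value to minimize, and the least-squares targets are 1 for real and 0 for fake.

```python
import math

def mean(values):
    return sum(values) / len(values)

def softplus(s):
    # log(1 + exp(s)), computed without overflow for large |s|.
    return max(s, 0.0) + math.log1p(math.exp(-abs(s)))

def vanilla_d_loss(real, fake):
    # BCE with labels real=1, fake=0: -log sigmoid(r) - log(1 - sigmoid(f)).
    return mean([softplus(-r) for r in real]) + mean([softplus(f) for f in fake])

def non_saturating_g_loss(fake):
    # -log sigmoid(f): the generator maximizes log D(G(z)).
    return mean([softplus(-f) for f in fake])

def wasserstein_d_loss(real, fake):
    # The critic maximizes E[f(real)] - E[f(fake)], so minimize the negation.
    return mean(fake) - mean(real)

def wasserstein_g_loss(fake):
    # The generator raises the critic's score on generated samples.
    return -mean(fake)

def least_squares_d_loss(real, fake):
    # Squared error to the targets 1 (real) and 0 (fake).
    return (0.5 * mean([(r - 1.0) ** 2 for r in real])
            + 0.5 * mean([f ** 2 for f in fake]))

def hinge_d_loss(real, fake):
    # Penalize real scores below +1 and fake scores above -1.
    return (mean([max(0.0, 1.0 - r) for r in real])
            + mean([max(0.0, 1.0 + f) for f in fake]))

# Example batch of raw discriminator outputs (made-up numbers).
real_scores, fake_scores = [2.0, 1.0], [-2.0, 0.0]
losses = {
    "vanilla": vanilla_d_loss(real_scores, fake_scores),
    "wasserstein": wasserstein_d_loss(real_scores, fake_scores),
    "least_squares": least_squares_d_loss(real_scores, fake_scores),
    "hinge": hinge_d_loss(real_scores, fake_scores),
}
```

The Wasserstein loss is only meaningful when the critic is constrained (weight clipping, gradient penalty, or spectral normalization), and that constraint is not shown here.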

Latent space z: Source of generation stochasticity

The input distribution (usually 𝒩(0, I)). In StyleGAN it is mapped into a style space W with better disentanglement properties.

Implementation

Reference implementations

StyleGAN3 (NVIDIA, official)

Python

Official

pytorch-CycleGAN-and-pix2pix

Python

Official

PyTorch-GAN (collection of implementations)

Implementation pitfalls

Mode collapse · Critical

The generator produces samples with limited diversity (in the extreme, a single mode), ignoring parts of the data distribution.

Fix: WGAN-GP, minibatch discrimination, unrolled GANs, spectral norm, a larger discriminator.

Training instability and oscillations · High

The minimax game may not converge: losses oscillate and sample quality fluctuates, and the balance between G and D is delicate.

Fix: TTUR (two time-scale update rule: separate learning rates for G and D), gradient penalty, spectral norm, generator weight EMA, careful tuning.
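Two of these fixes are simple enough to sketch directly: separate learning rates for the two networks, and an exponential moving average (EMA) of the generator weights kept as a second copy for sampling. The learning-rate values, the decay, and the flat list of floats standing in for the generator parameters are all assumptions for illustration.

```python
def ema_update(ema_params, params, decay=0.999):
    # Move each averaged weight a small step toward the current weight.
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, params)]

# TTUR: the discriminator and generator get their own learning rates
# (illustrative values, to be tuned per model).
LR_D, LR_G = 4e-4, 1e-4

params = [1.0, -2.0]   # stand-in for generator weights
ema = list(params)     # the EMA copy starts as a clone
for step in range(3):
    params = [p + 0.1 for p in params]        # stand-in for a generator update
    ema = ema_update(ema, params, decay=0.9)  # low decay so the lag is visible
```

The EMA copy lags behind the raw weights and smooths out step-to-step oscillation. Only the raw weights receive gradient updates; the averaged copy is the one used to generate samples.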

Vanishing gradients (vanilla loss) · High

When the discriminator is too strong, D(G(z)) → 0, so log(1 − D(G(z))) saturates near 0 and the gradient reaching the generator vanishes.

Fix: Non-saturating loss (maximize log D(G(z))), Wasserstein loss.
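The saturation can be checked numerically. Writing D = sigmoid(s) for a discriminator logit s, the derivative of log(1 − D) with respect to s is −D, while the derivative of −log D is −(1 − D). The sketch below evaluates both at a few logits. The function names and the chosen logit values are assumptions for illustration.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def saturating_grad(s):
    # d/ds log(1 - sigmoid(s)) = -sigmoid(s)
    return -sigmoid(s)

def non_saturating_grad(s):
    # d/ds [-log sigmoid(s)] = -(1 - sigmoid(s))
    return -(1.0 - sigmoid(s))

# A very negative logit means the discriminator confidently rejects the sample.
for logit in (0.0, -5.0, -10.0):
    print(logit, saturating_grad(logit), non_saturating_grad(logit))
```

As the logit becomes more negative, the saturating gradient shrinks toward zero while the non-saturating gradient approaches −1, so the generator keeps receiving a usable signal exactly when it is losing.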

Hard evaluation · Medium

The lack of explicit likelihood complicates evaluation; metrics (FID, IS) are imperfect and sensitive.

Fix: FID + Precision/Recall, human evaluation, multiple seeds, variance reporting.
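FID is a Fréchet distance between Gaussians fitted to Inception features of real and generated images. The sketch below shows only the one-dimensional special case, where the distance reduces to (μ₁ − μ₂)² + (σ₁ − σ₂)², plus a helper for reporting mean and standard deviation across seeds. It is a simplified stand-in, not an FID implementation: real FID uses a pretrained feature extractor and full covariance matrices. The function names are hypothetical.

```python
import statistics

def frechet_distance_1d(xs, ys):
    # Squared Frechet distance between 1-D Gaussians fitted to xs and ys.
    mu_x, mu_y = statistics.fmean(xs), statistics.fmean(ys)
    sd_x, sd_y = statistics.pstdev(xs), statistics.pstdev(ys)
    return (mu_x - mu_y) ** 2 + (sd_x - sd_y) ** 2

def summarize(scores):
    # Mean and sample standard deviation of a metric over several seeds.
    return statistics.fmean(scores), statistics.stdev(scores)

same = frechet_distance_1d([0.0, 1.0, 2.0], [0.0, 1.0, 2.0])
shifted = frechet_distance_1d([0.0, 1.0, 2.0], [2.0, 3.0, 4.0])
mean_score, std_score = summarize([1.0, 2.0, 3.0])
```

Reporting the spread across seeds alongside the mean makes it easier to tell a real improvement from run-to-run noise.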

Evolution

Original paper · 2014 · NeurIPS 2014 · Ian J. Goodfellow

Generative Adversarial Nets

Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio

2014

GAN — introduction

Inflection point

Goodfellow et al. introduce the two-network minimax game as a new generative paradigm.

2015

DCGAN — stable convolutional GANs

Radford et al. establish architectural patterns enabling stable image GAN training.

Unsupervised Representation Learning with Deep Convolutional GANs (paper)

2017

WGAN / WGAN-GP — Wasserstein stabilization

Inflection point

Arjovsky et al. introduce the Wasserstein objective (with weight clipping) and Gulrajani et al. replace clipping with a gradient penalty, stabilizing training and mitigating mode collapse.

Wasserstein GAN (paper)

2017

Pix2Pix / CycleGAN — image-to-image

Isola et al. (Pix2Pix) enable paired image translation and Zhu et al. (CycleGAN) enable unpaired image translation.

Image-to-Image Translation with Conditional Adversarial Networks (Pix2Pix) (paper)

2018

Progressive GAN and BigGAN

Karras et al. (progressive growing) and Brock et al. (large scale + self-attention) reach high-resolution, photorealistic samples.

2019

StyleGAN / StyleGAN2 — style-based generator

Inflection point

Karras et al. introduce the style space W and feature control, setting the state of the art in face generation.

A Style-Based Generator Architecture for Generative Adversarial Networks (paper)

2021

Diffusion Models Beat GANs — shift of dominance

Inflection point

Dhariwal & Nichol show diffusion models surpass GANs in quality and diversity, ending the GAN-dominance era.

Diffusion Model (concept) · Diffusion Models Beat GANs on Image Synthesis (paper)

2023

GANs in supporting and low-latency roles

Adversarial loss remains a component of the VAEs used in diffusion pipelines; GANs dominate audio vocoders and single-pass super-resolution, and adversarial objectives are used in diffusion distillation (e.g. adversarial distillation).