A GAN consists of two networks: a *generator* network *G* and a *discriminator*
network *D* [1]. The generator *G* takes as input a random noise vector **z** sampled
from a prior distribution *p*_{z} and outputs a fake sample
*G*(**z**). The discriminator *D* takes as input a sample, either drawn from the real
data or produced by the generator, and outputs a scalar indicating its
authenticity.

The two networks are trained adversarially:

- *D* tries to tell the fake samples from the real ones.
- *G* tries to fool *D*, i.e., to make *D* misclassify the generated, fake samples as real ones.

In general, most GAN loss functions proposed in the literature take the following form:

max_{D} **E**_{**x**~*p*_{d}} [ *f*(*D*(**x**)) ] + **E**_{**x**~*p*_{g}} [ *g*(*D*(**x**)) ]

min_{G} **E**_{**x**~*p*_{g}} [ *h*(*D*(**x**)) ]
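As a concrete instance of this general form, the classic GAN [1] uses *f*(*t*) = log *t* and *g*(*t*) = log(1 − *t*), with *h*(*t*) = −log *t* for the non-saturating generator loss. A minimal numpy sketch, where the discriminator scores are illustrative placeholders:

```python
import numpy as np

def discriminator_objective(real_scores, fake_scores):
    """E_{x~p_d}[log D(x)] + E_{x~p_g}[log(1 - D(x))], to be maximized."""
    return np.mean(np.log(real_scores)) + np.mean(np.log(1.0 - fake_scores))

def generator_loss(fake_scores):
    """Non-saturating generator loss: E_{x~p_g}[-log D(x)], to be minimized."""
    return np.mean(-np.log(fake_scores))

# Hypothetical discriminator outputs in (0, 1): real samples scored high,
# fake samples scored low (i.e., D is currently doing well).
real_scores = np.array([0.9, 0.8, 0.95])
fake_scores = np.array([0.1, 0.2, 0.05])

d_obj = discriminator_objective(real_scores, fake_scores)
g_loss = generator_loss(fake_scores)
```

Note how the generator loss shrinks as the fake scores approach 1, i.e., as *G* succeeds in fooling *D*.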

Here, *p*_{d} denotes the data distribution, *p*_{g} the model (generator) distribution, and *f*, *g*, *h* are real-valued functions whose choice defines the specific GAN variant.

In a conditional GAN (CGAN) [2], both the generator *G* and the discriminator
*D* are conditioned on some variable *y*. Typical (**x**, *y*) pairs include
(data, labels), (data, tags) and (image, image).
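A common way to realize this conditioning is to simply concatenate *y* to the inputs of *G* and *D*. A small sketch, with shapes and names chosen for illustration:

```python
import numpy as np

def one_hot(labels, num_classes):
    """Encode integer class labels as one-hot vectors (the condition y)."""
    out = np.zeros((len(labels), num_classes))
    out[np.arange(len(labels)), labels] = 1.0
    return out

batch, noise_dim, num_classes = 4, 8, 3          # illustrative sizes
rng = np.random.default_rng(0)

z = rng.normal(size=(batch, noise_dim))          # noise z ~ p_z
y = one_hot(np.array([0, 2, 1, 2]), num_classes) # condition y

# The generator sees (z, y); the discriminator would likewise see (x, y).
g_input = np.concatenate([z, y], axis=1)
```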

As the discriminator is often found to be too strong to provide reliable
gradients to the generator, one **regularization approach** is to impose
gradient penalties that constrain the modeling capability of the discriminator.

Most gradient penalties proposed in the literature take the following form:

*λ* **E**_{**x**~*p*_{x}} [ *R*( ||∇_{**x**} *D*(**x**)|| ) ]

Here, the *penalty weight* *λ* ∈ ℝ is a predefined constant, and *R*(·) is a
real-valued function. The distribution *p*_{x} defines where the gradient
penalties are enforced. This term is added to the loss function
as a *regularization term* for the discriminator.

Here are some common gradient penalties with their corresponding *p*_{x} and *R*(·):

| gradient penalty type | *p*_{x} | *R*(x) |
|---|---|---|
| coupled gradient penalties [3] | *p*_{d} + U[0, 1] (*p*_{g} − *p*_{d}) | (x − k)^{2} or max(x, k) |
| local gradient penalties [4] | *p*_{d} + c N[0, I] | (x − k)^{2} or max(x, k) |
| R_{1} gradient penalties [5] | *p*_{d} | x^{2} |
| R_{2} gradient penalties [5] | *p*_{g} | x^{2} |
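For intuition, a numpy sketch of the coupled gradient penalty [3]: sample **x** on random interpolations between real and fake points (so *p*_{x} = *p*_{d} + U[0, 1](*p*_{g} − *p*_{d})) and penalize (||∇_{**x**} *D*(**x**)|| − k)². A linear toy discriminator *D*(**x**) = **w**·**x** + b is assumed here so that the input gradient is simply **w** in closed form; in practice the gradient comes from automatic differentiation.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=3)                      # toy linear discriminator weights
lam, k = 10.0, 1.0                          # penalty weight and target norm

x_real = rng.normal(size=(5, 3))            # samples from p_d
x_fake = rng.normal(size=(5, 3))            # samples from p_g
t = rng.uniform(size=(5, 1))                # t ~ U[0, 1]
x_hat = x_real + t * (x_fake - x_real)      # samples from p_x (interpolates)

grad = np.tile(w, (5, 1))                   # grad_x D(x) = w for a linear D
grad_norm = np.linalg.norm(grad, axis=1)
penalty = lam * np.mean((grad_norm - k) ** 2)
```

The penalty vanishes exactly when the gradient norm equals k everywhere on *p*_{x}, which is the constraint the regularizer pushes the discriminator toward.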

Spectral normalization [6] is another **regularization approach** for GANs. It
normalizes the spectral norm of each layer in a neural network to enforce a
Lipschitz constraint. While gradient penalties impose local
regularization, spectral normalization imposes a global regularization on
the discriminator.
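A sketch of the core idea for a single weight matrix: estimate its largest singular value with power iteration, then divide the weights by it so the layer has spectral norm ≈ 1. The iteration count here is exaggerated for accuracy; practical implementations typically run a single power-iteration step per training update, reusing the vectors across steps.

```python
import numpy as np

def spectral_normalize(w, num_iters=50):
    """Divide w by an estimate of its largest singular value (power iteration)."""
    u = np.random.default_rng(0).normal(size=w.shape[0])
    for _ in range(num_iters):
        v = w.T @ u
        v /= np.linalg.norm(v)
        u = w @ v
        u /= np.linalg.norm(u)
    sigma = u @ w @ v                       # estimated spectral norm of w
    return w / sigma

w = np.random.default_rng(1).normal(size=(4, 6))
w_sn = spectral_normalize(w)                # spectral norm of w_sn is ~1
```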

[1] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David
Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio,
“Generative Adversarial Networks,”
in *Proc. NeurIPS*, 2014.

[2] Mehdi Mirza and Simon Osindero,
“Conditional Generative Adversarial Nets,”
*arXiv preprint, arXiv:1411.1784*, 2014.

[3] Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and
Aaron Courville,
“Improved Training of Wasserstein GANs,”
in *Proc. NeurIPS*, 2017.

[4] Naveen Kodali, Jacob Abernethy, James Hays, and Zsolt Kira,
“On Convergence and Stability of GANs,”
*arXiv preprint, arXiv:1705.07215*, 2017.

[5] Lars Mescheder, Andreas Geiger, and Sebastian Nowozin,
“Which training methods for GANs do actually converge?”
in *Proc. ICML*, 2018.

[6] Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida,
“Spectral Normalization for Generative Adversarial Networks,”
in *Proc. ICLR*, 2018.