1. Mode Collapse

- Mode Collapse
- When the discriminator gets stuck in a local minimum and never learns a better strategy, it becomes easy for the generator to keep producing the single output that the discriminator finds most plausible
<aside>
🧑🏫 Mode collapse happens when the generator learns to fool the discriminator by producing examples from only a single class of the whole training dataset
</aside>
2. Problems with BCE loss
$$
J(\theta) = - \frac{1}{m}\sum^{m}_{i=1}[y^{(i)}\log{h(x^{(i)},\theta)}+(1-y^{(i)})\log{(1-h(x^{(i)},\theta))}]
$$
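As a concrete reference for this cost, here is a minimal NumPy sketch; the function and argument names (`bce_loss`, `preds`, `labels`) are illustrative, not from the notes.

```python
import numpy as np

def bce_loss(preds, labels, eps=1e-12):
    """Binary cross-entropy J(theta), averaged over a batch of m examples.

    preds  : model outputs h(x, theta), values in (0, 1)
    labels : ground-truth y, 1 for real and 0 for fake in the GAN setting
    """
    preds = np.clip(preds, eps, 1.0 - eps)  # keep log() away from zero
    return -np.mean(labels * np.log(preds) + (1.0 - labels) * np.log(1.0 - preds))

# A confident, mostly-correct classifier gets a small loss:
print(bce_loss(np.array([0.9, 0.1]), np.array([1.0, 0.0])))  # ≈ 0.105
```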

- Objective in GANs: make the generated and real data distributions look similar
2-1. BCE loss in GANs
- Discriminator
    - Needs to output just a single value prediction between zero and one
    - Easy to train
- Generator
    - Needs to produce a complex output composed of multiple features, e.g. an image
    - Hard to train
❗ This imbalance in training difficulty between the discriminator and the generator causes the vanishing gradient problem

- The discriminator can easily distinguish between real and fake when it is superior to the generator, which leads to vanishing gradients for the generator
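To see where the vanishing gradient comes from, here is a small illustrative PyTorch sketch (the tensor names are assumed, not from the notes): when the discriminator is confident that generated samples are fake, its sigmoid output D(G(z)) is close to zero, and the gradient of the BCE-style generator term log(1 - D(G(z))) with respect to the discriminator's logits is proportional to D(G(z)), so almost no signal reaches the generator.

```python
import torch

# Discriminator logits for a batch of generated samples. Very negative logits
# mean the discriminator is confident they are fake: D(G(z)) = sigmoid(logit) ≈ 0.
fake_logits = torch.tensor([0.0, -5.0, -20.0], requires_grad=True)

# Saturating generator term from the minimax objective: log(1 - D(G(z))).
gen_loss = torch.log(1.0 - torch.sigmoid(fake_logits)).mean()
gen_loss.backward()

# The gradient w.r.t. each logit is -sigmoid(logit) / batch_size, so it shrinks
# toward zero as the discriminator grows more confident; little signal reaches G.
print(fake_logits.grad)  # ≈ [-0.1667, -0.0022, -0.0000]
```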
3. Earth Mover’s Distance

- Measures the effort it takes to move the generated distribution until it matches the real distribution
- Depends on both how far the probability mass is moved and how much of it is moved

- Not capped between zero and one like a BCE output, which resolves the problem of vanishing gradients
- Reduces the likelihood of mode collapse in GANs
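A quick way to see the "no ceiling" property on toy data, using `scipy.stats.wasserstein_distance` for the 1-D case (the Gaussian samples below are purely illustrative):

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
real = rng.normal(loc=0.0, scale=1.0, size=10_000)

# As the "generated" samples drift further from the real ones, the EMD keeps
# growing with the shift: there is no ceiling at one, so the training signal
# does not flatten out the way a saturated BCE output does.
for shift in (0.0, 1.0, 5.0, 20.0):
    fake = rng.normal(loc=shift, scale=1.0, size=10_000)
    print(shift, round(wasserstein_distance(real, fake), 3))
```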
4. Wasserstein Loss
4-1. BCE Loss Simplified
$$
J(\theta) = - \frac{1}{m}\sum^{m}_{i=1}[y^{(i)}\log{h(x^{(i)},\theta)}+(1-y^{(i)})\log{(1-h(x^{(i)},\theta))}]
$$
Treating real examples as $y = 1$ and generated examples as $y = 0$, and taking expectations over the data and noise distributions, the BCE cost becomes the GAN minimax objective:
$$
\min_{G} \max_{D}V(D,G) = \mathbb{E}_{x\sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z\sim p_{z}}[\log (1-D(G(z)))]
$$
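For contrast with the BCE objective above, the sketch below shows the standard WGAN form of the Wasserstein loss, where the critic outputs an unbounded score instead of a probability; the function and tensor names are illustrative, and the required 1-Lipschitz constraint on the critic (weight clipping or a gradient penalty) is not shown.

```python
import torch

def critic_loss(real_scores, fake_scores):
    # The critic maximizes E[C(x)] - E[C(G(z))]; minimizing the negation is equivalent.
    return -(real_scores.mean() - fake_scores.mean())

def generator_loss(fake_scores):
    # The generator maximizes E[C(G(z))], i.e. minimizes -E[C(G(z))].
    return -fake_scores.mean()

# Toy scores that would come from a critic network with no sigmoid on its output.
real_scores = torch.tensor([2.3, 1.7, 3.1])
fake_scores = torch.tensor([-1.2, 0.4, -0.7])
print(critic_loss(real_scores, fake_scores), generator_loss(fake_scores))
```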