1. Mode Collapse

- Mode Collapse
- When the discriminator gets stuck in a local minimum and never learns a better strategy, it becomes easy for the generator to keep producing the single output that the discriminator finds most plausible
<aside>
🧑🏫 Mode collapse happens when the generator learns to fool the discriminator by producing examples from only a single class of the whole training dataset
</aside>
2. Problems with BCE loss
$$
J(\theta) = - \frac{1}{m}\sum^{m}_{i=1}[y^{(i)}\log{h(x^{(i)},\theta)}+(1-y^{(i)})\log{(1-h(x^{(i)},\theta))}]
$$
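As a concrete reference for this cost, here is a minimal NumPy sketch; the function and argument names (`bce_loss`, `preds`, `labels`) are illustrative, not from the notes.

```python
import numpy as np

def bce_loss(preds, labels, eps=1e-12):
    """Binary cross-entropy J(theta), averaged over a batch of m examples.

    preds  : model outputs h(x, theta), values in (0, 1)
    labels : ground-truth y, 1 for real and 0 for fake in the GAN setting
    """
    preds = np.clip(preds, eps, 1.0 - eps)  # keep log() away from zero
    return -np.mean(labels * np.log(preds) + (1.0 - labels) * np.log(1.0 - preds))

# A confident, mostly-correct classifier gets a small loss:
print(bce_loss(np.array([0.9, 0.1]), np.array([1.0, 0.0])))  # ≈ 0.105
```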

- Objective in GANs: make the generated and real data distributions look similar
2-1. BCE loss in GANs
- Discriminator
    - Needs to output just a single value prediction between zero and one
    - Easy to train
- Generator
    - Needs to produce a complex output composed of multiple features, e.g. an image
    - Hard to train
❗ This imbalance in training difficulty between the discriminator and the generator causes the vanishing gradient problem

- The discriminator can easily distinguish between real and fake when it is superior to the generator, which leads to vanishing gradients for the generator
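To see where the vanishing gradient comes from, here is a small illustrative PyTorch sketch (the tensor names are assumed, not from the notes): when the discriminator is confident that generated samples are fake, its sigmoid output D(G(z)) is close to zero, and the gradient of the BCE-style generator term log(1 - D(G(z))) with respect to the discriminator's logits is proportional to D(G(z)), so almost no signal reaches the generator.

```python
import torch

# Discriminator logits for a batch of generated samples. Very negative logits
# mean the discriminator is confident they are fake: D(G(z)) = sigmoid(logit) ≈ 0.
fake_logits = torch.tensor([0.0, -5.0, -20.0], requires_grad=True)

# Saturating generator term from the minimax objective: log(1 - D(G(z))).
gen_loss = torch.log(1.0 - torch.sigmoid(fake_logits)).mean()
gen_loss.backward()

# The gradient w.r.t. each logit is -sigmoid(logit) / batch_size, so it shrinks
# toward zero as the discriminator grows more confident; little signal reaches G.
print(fake_logits.grad)  # ≈ [-0.1667, -0.0022, -0.0000]
```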
3. Earth Mover’s Distance

- Measures the effort it takes to move the generated distribution until it matches the real distribution
- Depends on both how far the probability mass is moved and how much of it is moved

- Not capped between zero and one like a BCE output, which resolves the problem of vanishing gradients
- Reduces the likelihood of mode collapse in GANs
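A quick way to see the "no ceiling" property on toy data, using `scipy.stats.wasserstein_distance` for the 1-D case (the Gaussian samples below are purely illustrative):

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
real = rng.normal(loc=0.0, scale=1.0, size=10_000)

# As the "generated" samples drift further from the real ones, the EMD keeps
# growing with the shift: there is no ceiling at one, so the training signal
# does not flatten out the way a saturated BCE output does.
for shift in (0.0, 1.0, 5.0, 20.0):
    fake = rng.normal(loc=shift, scale=1.0, size=10_000)
    print(shift, round(wasserstein_distance(real, fake), 3))
```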
4. Wasserstein Loss
4-1. BCE Loss Simplified
$$
J(\theta) = - \frac{1}{m}\sum^{m}_{i=1}[y^{(i)}\log{h(x^{(i)},\theta)}+(1-y^{(i)})\log{(1-h(x^{(i)},\theta))}]
$$
Treating real examples as $y = 1$ and generated examples as $y = 0$, and taking expectations over the data and noise distributions, the BCE cost becomes the GAN minimax objective:
$$
\min_{G} \max_{D}V(D,G) = \mathbb{E}_{x\sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z\sim p_{z}}[\log (1-D(G(z)))]
$$
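For contrast with the BCE objective above, the sketch below shows the standard WGAN form of the Wasserstein loss, where the critic outputs an unbounded score instead of a probability; the function and tensor names are illustrative, and the required 1-Lipschitz constraint on the critic (weight clipping or a gradient penalty) is not shown.

```python
import torch

def critic_loss(real_scores, fake_scores):
    # The critic maximizes E[C(x)] - E[C(G(z))]; minimizing the negation is equivalent.
    return -(real_scores.mean() - fake_scores.mean())

def generator_loss(fake_scores):
    # The generator maximizes E[C(G(z))], i.e. minimizes -E[C(G(z))].
    return -fake_scores.mean()

# Toy scores that would come from a critic network with no sigmoid on its output.
real_scores = torch.tensor([2.3, 1.7, 3.1])
fake_scores = torch.tensor([-1.2, 0.4, -0.7])
print(critic_loss(real_scores, fake_scores), generator_loss(fake_scores))
```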