<aside> 🧑🏫 Activation functions have to be non-linear and differentiable
</aside>
GANs are very hard to train, so every trick that speeds up and stabilizes training is crucial.
Different Distributions
Normalization and its effects
Training
$$ \hat{z}^{[l]}_i = \frac{z^{[l]}_i - \mu_{z^{[l]}_i}}{\sqrt{\sigma^2_{z^{[l]}_i} + \epsilon}} $$
$$ y^{[l]}_i = \gamma\hat{z}^{[l]}_i + \beta $$
→ $\gamma$ and $\beta$ are learnable parameters that scale and shift the normalized values, letting the network learn the optimal distribution
<aside> 🧑🏫 Batch normalization gives you control over what that distribution will look like moving forward in the neural network
</aside>
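For concreteness, here is a minimal NumPy sketch of the batch-norm forward pass defined by the two equations above. The function name `batch_norm_forward` and the toy data are illustrative, not from any framework:

```python
import numpy as np

def batch_norm_forward(z, gamma, beta, eps=1e-5):
    """Normalize pre-activations z over the batch dimension,
    then scale and shift with learnable gamma and beta."""
    mu = z.mean(axis=0)                    # per-feature batch mean
    var = z.var(axis=0)                    # per-feature batch variance
    z_hat = (z - mu) / np.sqrt(var + eps)  # zero mean, unit variance
    y = gamma * z_hat + beta               # learned target distribution
    return y

# Toy batch: 4 examples, 3 features (hypothetical values)
z = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [0.0, 0.0, 0.0],
              [1.0, 2.0, 3.0]])
gamma = np.ones(3)   # identity scale at initialization
beta = np.zeros(3)   # zero shift at initialization

y = batch_norm_forward(z, gamma, beta)
print(y.mean(axis=0))  # ≈ 0 per feature
print(y.std(axis=0))   # ≈ 1 per feature (since gamma=1, beta=0)
```

With γ = 1 and β = 0 the output is simply the standardized batch; during training, gradient descent updates γ and β so each layer can recover whatever mean and variance serve it best.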