1. Sigmoid
$\sigma(x) = 1/(1 + e^{-x})$

Problems:
- Saturated neurons “kill” the gradients (see the sketch below)
- Sigmoid outputs are not zero-centered
- exp() is somewhat computationally expensive
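
A minimal NumPy sketch (standard definitions, not taken from these notes) showing how the local gradient vanishes once the neuron saturates:

```python
import numpy as np

def sigmoid(x):
    """sigma(x) = 1 / (1 + exp(-x)); squashes the input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    """Local gradient: sigma(x) * (1 - sigma(x)); at most 0.25."""
    s = sigmoid(x)
    return s * (1.0 - s)

# For large |x| the neuron saturates and the local gradient is ~0,
# so almost no gradient flows back through it during backprop.
for x in (-10.0, 0.0, 10.0):
    print(f"x={x:+.0f}  sigmoid={sigmoid(x):.5f}  grad={sigmoid_grad(x):.5f}")
# x=-10  sigmoid=0.00005  grad=0.00005
# x=+0   sigmoid=0.50000  grad=0.25000
# x=+10  sigmoid=0.99995  grad=0.00005
```
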
2. Tanh(x)
Squashes numbers to the range $[-1, 1]$; zero-centered (nice), but still kills the gradients when saturated.
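
A small illustrative NumPy check, using the standard identity $\tanh(x) = 2\sigma(2x) - 1$: the output is zero-centered, but the local gradient still vanishes at saturation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# tanh is a rescaled, shifted sigmoid: tanh(x) = 2*sigmoid(2x) - 1,
# so outputs are zero-centered in (-1, 1) but it still saturates.
x = np.linspace(-5.0, 5.0, 11)
print(np.allclose(np.tanh(x), 2.0 * sigmoid(2.0 * x) - 1.0))  # True
print(1.0 - np.tanh(10.0) ** 2)  # local gradient ~8e-9 at saturation
```
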
3. ReLU
$f(x) = \max(0, x)$

A dead ReLU will never activate → never update
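
A minimal NumPy sketch of ReLU and its gradient; the exactly-zero gradient for negative inputs is what keeps a dead unit dead:

```python
import numpy as np

def relu(x):
    """f(x) = max(0, x)"""
    return np.maximum(0.0, x)

def relu_grad(x):
    # Gradient is 1 where x > 0 and exactly 0 elsewhere: a unit whose
    # pre-activation is always negative gets zero gradient forever,
    # so its weights never update (a "dead" ReLU).
    return (x > 0).astype(float)

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(relu(x))       # [0.  0.  0.  0.5 3. ]
print(relu_grad(x))  # [0. 0. 0. 1. 1.]
```
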
4. Leaky ReLU
$f(x) = \max(0.01x, x)$

Parametric Rectifier (PReLU):
$f(x) = \max(ax, x)$, where the slope $a$ is learned via backprop
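
A minimal NumPy sketch, assuming the common fixed slope $a = 0.01$ (PReLU would learn $a$ instead):

```python
import numpy as np

def leaky_relu(x, a=0.01):
    """f(x) = max(a*x, x); a = 0.01 is an assumed common default.
    Negative inputs keep a small slope, so the gradient never goes
    exactly to zero and the unit cannot "die" like a plain ReLU.
    PReLU is identical except that the slope a is a learned parameter."""
    return np.maximum(a * x, x)

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(leaky_relu(x))  # [-0.03  -0.005  0.  0.5  3.]
```
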
5. Exponential Linear Units (ELU)
$f(x) = x$ if $x > 0$; $f(x) = \alpha(e^x - 1)$ if $x \le 0$
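
A minimal NumPy sketch, assuming $\alpha = 1$:

```python
import numpy as np

def elu(x, alpha=1.0):
    """f(x) = x for x > 0, alpha * (exp(x) - 1) for x <= 0.
    alpha = 1.0 is an assumed default."""
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(elu(x))  # [-0.95021 -0.39347  0.  0.5  3.]; negatives saturate toward -alpha
```
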