1. Sigmoid

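$\sigma(x) = \frac{1}{1 + e^{-x}}$ squashes its input into the range $(0, 1)$.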

problems:

  1. Saturated neurons “kill” the gradients (see the sketch below)
  2. Sigmoid outputs are not zero-centered
  3. exp() is a bit compute-expensive
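
A quick NumPy sketch (illustrative, not from the notes) of problem 1: the local gradient is $\sigma'(x) = \sigma(x)\,(1 - \sigma(x))$, which peaks at 0.25 at $x = 0$ and is nearly zero for large $|x|$, so almost no gradient flows back through a saturated neuron.

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # local gradient: sigma(x) * (1 - sigma(x)), maximal (0.25) at x = 0
    s = sigmoid(x)
    return s * (1.0 - s)

for x in [-10.0, -2.0, 0.0, 2.0, 10.0]:
    print(f"x={x:+5.1f}  sigmoid={sigmoid(x):.4f}  grad={sigmoid_grad(x):.6f}")
# At x = +/-10 the local gradient is ~4.5e-05: the neuron is saturated
# and the upstream gradient is effectively "killed" during backprop.
```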

2. Tanh(x)

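$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} = 2\,\sigma(2x) - 1$ squashes numbers into the range $(-1, 1)$: zero-centered (nice), but it still kills gradients when saturated.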

3. ReLU

$f(x) = \max(0, x)$

A dead ReLU (pre-activation stuck in the $x < 0$ region for every input) will never activate → never update, since its local gradient is 0 there; see the sketch below.
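
A minimal NumPy sketch (illustrative only) of the dead-ReLU problem: if a neuron's pre-activation is negative for every input, both its output and its local gradient are 0, so no gradient ever reaches its weights.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    # local gradient: 1 where x > 0, 0 elsewhere
    return (x > 0).astype(float)

# Pre-activations that are negative for every input in the batch
z = np.array([-3.2, -0.7, -1.5, -4.1])
print(relu(z))       # [0. 0. 0. 0.]  -> the neuron never activates
print(relu_grad(z))  # [0. 0. 0. 0.]  -> zero gradient, weights never update
```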

4. Leaky ReLU

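$f(x) = \max(0.01x, x)$ (the 0.01 slope is the usual default), so there is a small non-zero gradient for $x < 0$ and the unit does not die.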

Parametric Rectifier (PReLU)

$f(x) = \max(\alpha x, x)$, with the negative-region slope $\alpha$ learned by backprop
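
A small sketch (plain NumPy, illustrative names) of a PReLU forward pass and the gradients needed to learn $\alpha$; the only difference from Leaky ReLU is that the negative-region slope is a trainable parameter rather than a fixed 0.01.

```python
import numpy as np

def prelu(x, alpha):
    # f(x) = max(alpha * x, x); with alpha fixed at 0.01 this is Leaky ReLU
    return np.where(x > 0, x, alpha * x)

def prelu_backward(x, alpha, upstream):
    # gradient w.r.t. the input
    dx = upstream * np.where(x > 0, 1.0, alpha)
    # gradient w.r.t. alpha (only the x <= 0 region contributes)
    dalpha = np.sum(upstream * np.where(x > 0, 0.0, x))
    return dx, dalpha

x = np.array([-2.0, -0.5, 1.0, 3.0])
upstream = np.ones_like(x)
print(prelu(x, alpha=0.1))               # [-0.2  -0.05  1.    3.  ]
print(prelu_backward(x, 0.1, upstream))  # dx = [0.1 0.1 1. 1.], dalpha = -2.5
```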

5. Exponential Linear Units (ELU)

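$f(x) = \begin{cases} x & x > 0 \\ \alpha\,(e^{x} - 1) & x \le 0 \end{cases}$ with $\alpha$ a hyperparameter (commonly 1).

All the benefits of ReLU, with outputs closer to zero mean, but the negative branch requires exp().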