<aside> 🧑🏫 Using Global Average Pooling instead of FC layers in a CNN allows us to use Class Activation Maps for discriminative localization!
</aside>

$f_k(x,y)$ → activation of unit $k$ in the last conv layer at $(x,y)$
$F_k = \sum_{x,y}f_k(x,y)$ → GAP for unit $k$
$S_c = \sum_k w^c_kF_k$ → input to the softmax for class $c$
→ $w^c_k$ indicates the importance of $F_k$ for class $c$
$P_c = \frac{\exp(S_c)}{\sum_{c'}\exp(S_{c'})}$ → softmax of $S_c$
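
The definitions above can be traced in a few lines of NumPy. This is only a minimal sketch: the sizes `K`, `H`, `W`, `C` and the random arrays `f` and `w` are made-up stand-ins for a real network's last conv activations and classifier weights. The map `M` assumes the usual CAM definition $M_c(x,y) = \sum_k w^c_k f_k(x,y)$, which is not written out in the notes above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: K units in the last conv layer, an H x W feature map, C classes.
K, H, W, C = 4, 3, 3, 2

f = rng.random((K, H, W))          # f[k, y, x] plays the role of f_k(x, y)
w = rng.standard_normal((C, K))    # w[c, k] plays the role of w^c_k

# F_k: sum of f_k over all spatial positions (unnormalized, as in the notes).
F = f.sum(axis=(1, 2))             # shape (K,)

# S_c: weighted sum of the pooled features, the input to the softmax.
S = w @ F                          # shape (C,)

# P_c: softmax of S_c (subtracting the max only improves numerical stability).
P = np.exp(S - S.max())
P /= P.sum()

# Assumed CAM definition: M_c(x, y) = sum_k w^c_k * f_k(x, y).
M = np.einsum("ck,khw->chw", w, f) # shape (C, H, W)

# Swapping the order of the sums gives S_c = sum over (x, y) of M_c(x, y).
print(np.allclose(M.sum(axis=(1, 2)), S))
```

Because $S_c$ is just the spatial sum of `M[c]`, each entry of `M[c]` can be read as how much position $(x,y)$ contributes to the score of class $c$, which is what makes the map usable for localization.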


