<aside> 🧑‍🏫 By using the gradients of the target w.r.t. the feature maps as weights for those feature maps, we can produce a localization map for any CNN-based model
</aside>
- CAM trades off model complexity and performance for transparency (replacing the fully connected layers lowers classification accuracy) and can only be applied to CNNs that end in global average pooling (GAP)
- Grad-CAM interprets models without altering their architecture, avoiding this trade-off

The last convolutional layers offer the best compromise between high-level semantics and detailed spatial information → their neurons look for semantic, class-specific information in an image
$L^c_{\text{Grad-CAM}} \in \mathbb{R}^{u \times v}$ – class-discriminative localization map of Grad-CAM
$y^c$ – score for class $c$
$A^k$ – feature map activations
$a^c_k$ – neuron importance weights
$$ a_k^c = \frac{1}{Z}\sum_i \sum_j \frac{\partial y^c}{\partial A^k_{i,j}} $$
The weights represent a partial linearization of the deep network downstream of $A$, and capture the "importance" of feature map $k$ for a target class $c$. The localization map is a weighted combination of the forward activation maps:
$$ L^c_{\text{Grad-CAM}} = \mathrm{ReLU} \Bigg(\sum_k a^c_k A^k \Bigg) $$
We are only interested in features that have a positive influence on the class of interest, i.e. pixels whose intensity should be increased in order to increase $y^c$. Hence the $\mathrm{ReLU}$.
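The two formulas above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the feature-map activations $A^k$ and the gradients $\partial y^c / \partial A^k$ have already been extracted from the network (e.g. via a backward hook in a deep-learning framework); the function name and array shapes are my own choices, not from the paper.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Compute a Grad-CAM localization map.

    activations: A^k, shape (k, u, v) -- forward feature maps
    gradients:   dy^c/dA^k, shape (k, u, v) -- gradients of the
                 class score w.r.t. those feature maps
    """
    # a^c_k: global-average-pool the gradients over the spatial
    # dimensions (the 1/Z * sum_i sum_j in the formula)
    weights = gradients.mean(axis=(1, 2))             # shape (k,)

    # Weighted combination of the forward activation maps
    cam = np.tensordot(weights, activations, axes=1)  # shape (u, v)

    # ReLU: keep only features with a positive influence on class c
    return np.maximum(cam, 0.0)
```

The resulting $u \times v$ map is then typically upsampled to the input image size and overlaid as a heatmap.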
