<aside> 🧑‍🏫 By using the gradients of the target w.r.t. the feature maps as weights for those feature maps, we can produce a localization map for any CNN-based model
</aside>
- CAM trades off model complexity and performance for transparency (replacing the fully connected layers lowers classification accuracy) and can only be applied to CNNs that end in global average pooling (GAP)
- Grad-CAM interprets models without altering their architecture, avoiding this trade-off

The last convolutional layers offer the best compromise between high-level semantics and detailed spatial information → their neurons look for semantic, class-specific information in an image
$L^c_{\text{Grad-CAM}} \in \mathbb{R}^{u \times v}$ – class-discriminative localization map of Grad-CAM
$y^c$ – score for class $c$
$A^k$ – feature map activations
$a^c_k$ – neuron importance weights
$$ a_k^c = \frac{1}{Z}\sum_i \sum_j \frac{\partial y^c}{\partial A^k_{i,j}} $$
The weights represent a partial linearization of the deep network downstream of $A$, and capture the "importance" of feature map $k$ for a target class $c$. The localization map is a weighted combination of the forward activation maps:
$$ L^c_{\text{Grad-CAM}} = \mathrm{ReLU} \Bigg(\sum_k a^c_k A^k \Bigg) $$
We are only interested in features that have a positive influence on the class of interest, i.e. pixels whose intensity should be increased in order to increase $y^c$. Hence the $\mathrm{ReLU}$.
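The two formulas above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the feature-map activations $A^k$ and the gradients $\partial y^c / \partial A^k$ have already been extracted from the network (e.g. via a backward hook in a deep-learning framework); the function name and array shapes are my own choices, not from the paper.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Compute a Grad-CAM localization map.

    activations: A^k, shape (k, u, v) -- forward feature maps
    gradients:   dy^c/dA^k, shape (k, u, v) -- gradients of the
                 class score w.r.t. those feature maps
    """
    # a^c_k: global-average-pool the gradients over the spatial
    # dimensions (the 1/Z * sum_i sum_j in the formula)
    weights = gradients.mean(axis=(1, 2))             # shape (k,)

    # Weighted combination of the forward activation maps
    cam = np.tensordot(weights, activations, axes=1)  # shape (u, v)

    # ReLU: keep only features with a positive influence on class c
    return np.maximum(cam, 0.0)
```

The resulting $u \times v$ map is then typically upsampled to the input image size and overlaid as a heatmap.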
