Visualization using Grad-CAM
Deep learning based on convolutional neural networks has achieved high
accuracy in many areas of image recognition, such as image
classification, object detection, and image segmentation. This high
performance comes from training deep neural networks on substantial
amounts of data. However, the multilayer, nonlinear structure of deep
learning makes the resulting models difficult to interpret. This lack of
interpretability is a major disadvantage of deep learning, and the
technique is sometimes regarded as a “black box” method. In fact, in
fields such as clinical medicine, the lack of interpretability is a
barrier to the practical application of deep learning (Petch and Nelson
2022). Recently, several approaches have been developed to address this
challenge. For example, Class Activation Mapping (CAM)
provides a heatmap visualization of the regions that influenced the
model’s predictions, which is valuable information for human
interpretation of the results (Zhou et al. 2016). However, CAM is
applicable only when the network contains a Global Average Pooling (GAP)
layer, so its use depends on the network structure. Gradient-weighted
class activation mapping (Grad-CAM), a generalization of CAM that is not
constrained by the model structure, addresses this limitation (Selvaraju
et al. 2017). The algorithm uses the class-specific gradients flowing
into the final convolutional layer of a CNN to visualize the regions
most important to the prediction.
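Concretely, following Selvaraju et al. (2017), let $A^k$ denote the
$k$-th feature map of the final convolutional layer and $y^c$ the score
for class $c$ before the softmax. Grad-CAM first global-average-pools
the gradients to obtain one weight per feature map and then forms a
ReLU-rectified weighted combination of the feature maps:

$$\alpha_k^c = \frac{1}{Z}\sum_{i}\sum_{j}\frac{\partial y^c}{\partial A_{ij}^k},
\qquad
L_{\text{Grad-CAM}}^c = \mathrm{ReLU}\!\left(\sum_k \alpha_k^c A^k\right),$$

where $Z$ is the number of spatial locations in the feature map and the
ReLU retains only features with a positive influence on class $c$. CAM
(Zhou et al. 2016) corresponds to the special case in which the weights
are taken directly from the fully connected layer that follows a GAP
layer, which is why CAM requires that specific architecture.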
This study used Grad-CAM to provide visual evidence for the
classification of native and hybrid species.
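The model architecture and software framework are not restated here; as
an illustrative sketch only, the following Python code shows how such a
Grad-CAM heatmap can be computed with PyTorch, using torchvision's
pretrained ResNet-50 as a hypothetical stand-in for the species
classifier.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Hypothetical stand-in for the species classifier; the actual model
# used in the study is not specified in this section.
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.eval()

# Capture the activations of, and the gradients flowing into, the final
# convolutional block (layer4 in torchvision's ResNet implementation).
store = {}
layer = model.layer4
layer.register_forward_hook(lambda m, i, o: store.update(act=o))
layer.register_full_backward_hook(lambda m, gi, go: store.update(grad=go[0]))

def grad_cam(image, class_idx=None):
    """image: preprocessed tensor of shape (1, 3, H, W); returns a heatmap in [0, 1]."""
    logits = model(image)                        # forward pass
    if class_idx is None:
        class_idx = logits.argmax(dim=1).item()  # explain the predicted class
    model.zero_grad()
    logits[0, class_idx].backward()              # gradients of the class score

    # alpha_k^c: gradients global-average-pooled over spatial locations
    alpha = store["grad"].mean(dim=(2, 3), keepdim=True)            # (1, K, 1, 1)
    # weighted combination of feature maps, followed by ReLU
    cam = F.relu((alpha * store["act"]).sum(dim=1, keepdim=True))   # (1, 1, h, w)
    # upsample to the input resolution and normalise for display
    cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear",
                        align_corners=False)[0, 0]
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return cam, class_idx
```

The resulting heatmap can be overlaid on the input image to highlight
the regions that contributed most strongly to the predicted class,
providing the kind of visual evidence described above.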