4. Discussion
In the present study, the performance of different models in predicting
spatially biased occurrence data was evaluated. For this purpose,
modeling was performed based on four SDMs. A variety of correlative
distribution models have emerged in recent decades, and their
performance has been compared in numerous studies (Elith and Graham,
2009; Merow et al., 2014). These methods, however, are challenged by a
number of methodological problems that make their comparison
controversial. Among these issues are striking a balance between
goodness-of-fit and model complexity (Araújo et al., 2019; Warren and
Seifert, 2011), and handling the spatial bias of the input data when
evaluating model performance (Hijmans, 2012; Phillips et al., 2009).
Generally, for most SDMs, particularly for complex machine learning
ones, a set of default parameters has been recommended based on
comprehensive model tuning [see, for example, Phillips and Dudík (2008)
for MaxEnt and Elith et al. (2008) for boosted regression trees].
However, blindly applying these defaults may yield a poorly performing
model (Muscarella et al., 2014). Equally important is the probable SAC
in the input occurrence points, which may inflate the scores of the
metrics used for model evaluation (Dormann et al., 2007; Radosavljevic
and Anderson, 2014). Another problem is the paucity of operational
procedures for tuning SDMs that are both straightforward and
time-efficient. Consequently, although the species, scale, and area of
interest differ among studies, most SDM efforts rely on the default
settings of the chosen model.
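As an illustration of what model-specific parameterization, rather than default settings, can look like in practice, the sketch below tunes a boosted-tree classifier by cross-validation over a small grid of candidate settings. The data, parameter grid, and scoring choice are hypothetical and not taken from our pipeline.

```python
# Illustrative sketch only (synthetic data, hypothetical grid): tuning a
# boosted-tree SDM by cross-validation instead of accepting library defaults,
# trading goodness-of-fit against model complexity.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 6))        # stand-in environmental predictors
y = rng.integers(0, 2, size=500)     # stand-in presence (1) / background (0)

param_grid = {
    "n_estimators": [100, 500, 1000],   # number of trees
    "learning_rate": [0.01, 0.005],     # shrinkage
    "max_depth": [2, 3, 5],             # tree (interaction) depth
}
search = GridSearchCV(GradientBoostingClassifier(), param_grid,
                      cv=5, scoring="roc_auc")
search.fit(X, y)
print(search.best_params_)           # tuned settings rather than defaults
```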
In the current research, we employed a novel approach that, on the one
hand, applied model-specific parameterization and, on the other hand,
manipulated spatially imbalanced and biased input data to cope with the
above-mentioned issues. In general, the results showed that the models
had high predictive performance based on AUC and TSS values. However,
the geographical pattern of the predicted suitable habitats differed
among models. For example, the area of suitable habitat was greater for
the GLM than for the other SDMs. Accordingly, lower specificity values
were observed for this model across the gradient of suitability
thresholds (red curves in Fig. 3). This is intrinsically due to the
simple regression-based nature of this model, whose basic assumptions
include normality of the error distribution and constant error variance
(Osborne and Waters, 2002). Thus, data that are not used for training
are still fitted well by the predicted model. On the other hand, this
makes the GLM prone to over-prediction across non-sampled areas, leading
to its lower specificity, i.e., true negative rate.
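For concreteness, a minimal sketch of a binomial (logistic) GLM fitted to presence/background data is given below; the predictors and response are synthetic stand-ins, not our Montivipera dataset, and the specification is illustrative rather than the exact model used in this study.

```python
# Minimal sketch of a binomial (logistic) GLM on presence/background data;
# the arrays below are synthetic stand-ins, not the study's dataset.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))                  # stand-in bioclimatic predictors
y = rng.integers(0, 2, size=300)               # 1 = presence, 0 = background

glm = sm.GLM(y, sm.add_constant(X), family=sm.families.Binomial()).fit()
suitability = glm.predict(sm.add_constant(X))  # smooth probabilities in [0, 1]
print(glm.params)                              # intercept + linear coefficients
```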
In contrast, we found the best training sensitivity and specificity for
the decision-tree-based methods, i.e., GBM and RF. Driven by the
iterative improvement of the tree learner, these models attempt to
classify the training data as accurately as possible, leading to the
highest discrimination capacity on the training presence/absence dataset
(De’ath and Fabricius, 2000; Friedman, 2017). In our case, the results
indicated that the RF models under both the random and background
weighting procedures, while classifying the training dataset
excellently, failed to predict the out-of-bag data. Fundamentally, RF
and GBM rely on bagging and boosting algorithms of the tree learner,
respectively (Elith et al., 2008; Friedman, 2017). In RF, the bagging
algorithm, also called bootstrap aggregation, lets the tree learner
repeatedly draw random samples of the training set with replacement and
fit trees to these samples (Breiman, 2001). The trees in RF are built in
parallel, with no interaction between them during construction; once all
trees are built, their predictions are averaged (Cutler et al., 2012).
Conversely, the trees in boosting algorithms such as GBM are trained
sequentially, so that weaker results are boosted or reweighted over many
iterations, making the learner focus more on the cases it got wrong and
less on the observations it classified correctly (Elith et al., 2008;
Friedman, 2017). The risk of deeply grown trees in random forests is
overfitting of the training dataset, in which predictions have low bias
but high variance. In our modeling approach, we grew the RF model to
1000 trees, which is reasonable and comparable to similar SDM efforts
(Breiman, 2001; Shabani et al., 2016). Although we fine-tuned the RF
model based on cross-validated splitting of the training data, it
appears that this model still suffers from high variance. In fact, the
strict classification algorithm of RF is prone to perform poorly when
predicting the out-of-bag dataset (Elith et al., 2008; Merow et al.,
2014). Accordingly, the use of this method for extrapolating predictions
should be treated prudently.
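The contrast between training and held-out performance described above can be reproduced on synthetic data, as in the sketch below; the 1000-tree setting mirrors our RF configuration, but the data, signal, and all other settings are hypothetical.

```python
# Illustrative sketch (synthetic data, hypothetical signal): a 1000-tree random
# forest classifies its training data almost perfectly but generalizes less
# well to held-out records; boosting is fitted alongside it for comparison.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 6))
y = (X[:, 0] + rng.normal(scale=1.0, size=600) > 0).astype(int)  # noisy signal
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

rf = RandomForestClassifier(n_estimators=1000, oob_score=True).fit(X_tr, y_tr)
gbm = GradientBoostingClassifier(n_estimators=1000, learning_rate=0.01).fit(X_tr, y_tr)

print(f"RF out-of-bag accuracy: {rf.oob_score_:.2f}")
for name, model in [("RF", rf), ("GBM", gbm)]:
    auc_tr = roc_auc_score(y_tr, model.predict_proba(X_tr)[:, 1])
    auc_te = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: training AUC = {auc_tr:.2f}, held-out AUC = {auc_te:.2f}")
```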
Under the random background approach, our results highlight the
excellent predictive performance of the GBM model, in which the true
positive rate (sensitivity) and the true negative rate (specificity)
remained well balanced across cumulative thresholds. In contrast to the
parallel bagging method of RF, the boosting of the tree learner in GBM
is believed to reduce the probability of overfitting and to allow good
classification of out-of-bag samples (Elith et al., 2008; Shabani et
al., 2016). However, our study revealed that fitting a GBM model with
the background weighting approach significantly reduces its predictive
performance in identifying out-of-bag data. This highlights the
inefficiency of this method in SDM efforts with imbalanced, biased data,
where the primary goal is estimating a probability distribution over
areas that have not been comparably sampled.
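One way to implement the background-weighting idea is to weight background points by a sampling-effort surface via per-observation weights, as sketched below; the bias surface and all settings here are hypothetical illustrations, not our exact procedure.

```python
# Hedged sketch of background weighting: background points are weighted by a
# sampling-effort (bias) surface so that densely sampled areas do not dominate
# the fit; `bias_weight` is a hypothetical stand-in for such a surface.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 5))                  # stand-in predictors
y = rng.integers(0, 2, size=500)               # 1 = presence, 0 = background
bias_weight = rng.uniform(0.1, 1.0, size=500)  # hypothetical sampling-effort values
weights = np.where(y == 1, 1.0, bias_weight)   # presences keep unit weight

gbm = GradientBoostingClassifier(n_estimators=1000, learning_rate=0.01)
gbm.fit(X, y, sample_weight=weights)           # weighted background, unweighted presences
```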
Unlike the decision-tree-based methods, MaxEnt produced comparable
predictions under both the random and background weighting approaches.
This was similar to the results of the GLM, except that for the latter
true negative rates fell short at lower thresholds. As mentioned above,
normally distributed errors and the absence of trends in residuals
relative to the fitted values (Hardin et al., 2007) allow the GLM to be
interpretably efficient in predicting out-of-bag data. In MaxEnt, a
density-estimation algorithm, the species distribution is represented by
the probability distribution that is closest to uniform (Phillips et
al., 2006). This probability distribution is bounded by a set of
constraints that are simple functions of the explanatory variables,
called ‘features’, derived from the species occurrence locations. The
primary assumption of the MaxEnt model is that the mean of each feature
is required to be close (within some error bounds) to the empirical
average over the presence sites (Phillips and Dudík, 2008). From a
general point of view, this constraint in MaxEnt can be considered
analogous to the constancy of error variance in the GLM, and
consequently, consistent results for out-of-bag data can also be
expected from the MaxEnt model. In both the GLM and MaxEnt models,
maximum likelihood is used to estimate a parametric exponential
distribution over a linear combination of features (Phillips et al.,
2004). Although a GLM can be fitted with quadratic and interaction terms
of the explanatory variables, the greater variety of feature types in
MaxEnt allows more complex models to be fitted (Phillips et al., 2006).
More importantly, while GBM and RF, as complex machine learning methods,
are more prone to overfitting, the regularization multiplier in MaxEnt
prevents the model from matching the input data too closely. Altogether,
our results highlight the efficiency of the MaxEnt model in combining
the benefits of different modeling methods to produce results that are
both predictive (extrapolative) and complex (interpolative).
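In our paraphrase of Phillips and Dudík (2008), the feature-expectation constraint mentioned above and the regularized maximum-likelihood problem it is equivalent to can be written as follows, where the empirical distribution over the m presence sites, the per-feature regularization width, and the fitted Gibbs distribution are denoted by the symbols below (the notation is illustrative):

```latex
% Our paraphrase of the MaxEnt constraints and their regularized dual,
% following Phillips and Dudík (2008); notation is illustrative.
\[
  \max_{\pi}\; H(\pi)
  \quad \text{subject to} \quad
  \bigl|\,\pi[f_j] - \tilde{\pi}[f_j]\,\bigr| \le \beta_j
  \quad \text{for every feature } f_j ,
\]
\[
  \text{equivalently} \quad
  \min_{\lambda}\; -\frac{1}{m}\sum_{i=1}^{m} \ln q_{\lambda}(x_i)
  \;+\; \sum_{j} \beta_j \,\lvert \lambda_j \rvert ,
  \qquad
  q_{\lambda}(x) = \frac{e^{\lambda \cdot f(x)}}{Z_{\lambda}} .
\]
```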
Asymmetry in spatially biased model predictions also highlights the need
to evaluate model performance using threshold-dependent sensitivity
(true positive rate) and specificity (true negative rate) in addition to
the threshold-independent AUC. As criticized by Lobo et al. (2008) and
Jiménez‐Valverde (2012), modeling goals and settings strongly influence
the appropriateness of AUC for measuring the performance of a model. AUC
is inflated by false absence data (Lobo et al., 2008) and accordingly
over-represents predictive performance for rare species (Phillips et
al., 2009; Stolar and Nielsen, 2015), as in the case of Montivipera
species. Moreover, being only a discrimination measure, AUC does not
reflect goodness-of-fit, i.e., the classification accuracy of the model;
consequently, a model with a high AUC value is not necessarily a
well-fitted one (Jiménez‐Valverde, 2012). For threshold-dependent
measures, a critical step is selecting the suitability threshold at
which the sensitivity and specificity of the resulting model are well
balanced. Although several thresholds have been suggested for this
purpose [see Liu et al. (2005) for a review], a single suitability
threshold provides only a cross-section of model performance and does
not give a comprehensive picture of the classification accuracy of the
model across a gradient of suitability thresholds. For example, the
threshold at which sensitivity equals specificity, i.e., where their
curves cross in Fig. 3, is among the most widely used suitability
thresholds; nevertheless, the corresponding accuracy measure does not
fully characterize the performance of the model in classifying presence
and background data.
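The kind of threshold-dependent evaluation advocated here can be sketched as follows: sensitivity, specificity, and TSS are computed across the whole gradient of suitability thresholds rather than at a single cut-off. The function and variable names are ours and purely illustrative.

```python
# Sketch of threshold-dependent evaluation: sensitivity, specificity, and TSS
# across a gradient of suitability thresholds. `y_true` holds observed
# presence/background labels and `suitability` the model's predictions.
import numpy as np

def threshold_curves(y_true, suitability, thresholds=np.linspace(0, 1, 101)):
    sens, spec = [], []
    for t in thresholds:
        pred = (suitability >= t).astype(int)
        tp = np.sum((pred == 1) & (y_true == 1))
        fn = np.sum((pred == 0) & (y_true == 1))
        tn = np.sum((pred == 0) & (y_true == 0))
        fp = np.sum((pred == 1) & (y_true == 0))
        sens.append(tp / (tp + fn))        # true positive rate
        spec.append(tn / (tn + fp))        # true negative rate
    sens, spec = np.array(sens), np.array(spec)
    tss = sens + spec - 1                  # true skill statistic per threshold
    return thresholds, sens, spec, tss
```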
There are several ways to correct for sampling bias, some of which
cannot be used when data are scarce. Spatial filtering may not be
helpful when there are only a few presence points (Phillips et al.,
2009). Decreasing clumpiness reduces the training sample size and,
depending on the heterogeneity of the surrounding environment and the
selected spatial resolution, it may discard some of the information on
the sites the species occupies. In addition to spatial filtering and
background weighting, a third method, called model-based bias
correction, has been used to address spatial bias in occurrence data
(El‐Gabbas and Dormann, 2018a). In this method, additional environmental
variables, used as bias covariates, characterize potential sources of
sampling bias. Although this method has been shown to be useful when
dealing with sparse datasets (El‐Gabbas and Dormann, 2018b), it is
highly dependent on the selected bias variable, which in turn
intensifies the upstream uncertainty introduced by the auxiliary
covariates.
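For completeness, a minimal sketch of spatial filtering (thinning to one record per grid cell) is given below; the coordinates and cell size are hypothetical, and the example only illustrates why thinning shrinks already sparse datasets further.

```python
# Hedged sketch of spatial filtering: keep at most one occurrence per grid cell
# of a chosen resolution to reduce clumping, at the cost of sample size.
import numpy as np

def spatial_filter(lon, lat, cell_size_deg=0.1):
    """Return indices of one occurrence per cell of `cell_size_deg` degrees."""
    cells = np.stack([np.floor(lon / cell_size_deg),
                      np.floor(lat / cell_size_deg)], axis=1)
    _, keep = np.unique(cells, axis=0, return_index=True)
    return np.sort(keep)

# Hypothetical coordinates: the first two clumped points collapse to one record.
lon = np.array([44.01, 44.02, 44.55, 46.10])
lat = np.array([38.50, 38.51, 39.00, 39.90])
print(spatial_filter(lon, lat))
```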
It is worth mentioning that bias adjustment and model parameterization
result in varying spatial predictions depending on the prevalence of the
target species (Araújo et al., 2019; Pottier et al., 2013). This
variation is most noticeable for common species; thus, distribution
models of rare, habitat-specialist species may not be very sensitive to
spatially biased occurrence data (Stolar and Nielsen, 2015). Being
limited to a narrow gradient of environmental conditions, specialist
species are more predictable as well as more distinguishable, i.e.,
their SDMs attain high AUC values, because of the high distinctiveness
between their occurrence points and the background space. In our case,
this tendency was evident in that the narrow-ranged mountain vipers
(Ahmadi et al., 2019) obtained high AUC and TSS scores. This largely
justifies our SDM approach, in which, owing to the sparse data on the
mountain vipers, their occurrence points were pooled into one set. Since
they belong to distinct taxonomic levels, e.g., species or subspecies,
the resulting SDMs might challenge the niche equilibrium assumption
(Wiens et al., 2009) and be prone to inflated niche breadth (Pearman et
al., 2010), with the resulting distribution models showing higher levels
of over-estimation. However, niche inflation is more challenging for
generalist species with abundant data (Randin et al., 2006). Moreover,
the narrow-ranged mountain vipers in general, and the species/subspecies
belonging to the Raddei clade in particular, show low rates of niche
evolution and high degrees of niche conservatism (Ahmadi et al., 2021),
which leads to the occupation of similar ecological conditions among
these taxa.