Discussion
In this study, we estimated the ability of species distribution model
(SDM) frameworks to predict the ‘presence-absence’ and
‘relative-abundance’ of 67,148 soil microbial OTUs (i.e. eDNA-based
operational taxa) from the associated environmental conditions along an
elevation gradient. For most of these OTUs, prior knowledge of their
ecology was very sparse. Nevertheless, SDM frameworks yielded better
‘presence-absence’ predictions than null models for more than 85% of
OTUs, and 23% of OTUs had models with high predictive power. In line
with previous studies, our results confirm that eDNA sequences can be
used to model microbial environmental niches (Schröder, 2008; King et
al., 2010; Mod et al., 2020, 2021; Lembrechts et al., 2020; Malard et
al., 2022). For ‘relative-abundance’ models, 33% of predictions had a
good rank correlation with on-site values. However, predictions of the
exact ‘relative-abundance’ value of OTUs at each site yielded large
errors. One potential explanation is the bias introduced when relating
the proportion of reads to the environment, due, for instance, to
intraspecific variation in the copy number of the small-subunit
ribosomal RNA gene, as observed for some microbial organisms (Stoddard
et al., 2015; Lavrinienko et al., 2021), or to primer biases (Vaulot et
al., 2022). This result mirrors SDM studies of macro-organisms, which
also reported low predictive power for abundance models (Pearce &
Ferrier, 2001; Tôrres et al., 2012). We therefore highlight the
difficulty of using SDM frameworks to predict abundance or other
quantitative species characteristics (Van Couwenberghe et al., 2013;
Lee-Yaw et al., 2022; Waldock et al., 2022).
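The gap between a good rank correlation and large absolute errors can be illustrated with a short Python sketch. The site values below are hypothetical and not taken from our data, and the helper functions (`ranks`, `pearson`, `spearman`, `mean_absolute_error`) are illustrative only, not part of our modelling pipeline. A prediction that overestimates every site tenfold preserves the ordering of sites perfectly, so Spearman's rank correlation is 1, while its mean absolute error equals nine times the mean observed relative abundance.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical relative abundances (proportion of reads) of one OTU at four sites.
observed = [0.001, 0.004, 0.010, 0.030]
# A prediction that orders sites correctly but overestimates every site tenfold.
predicted = [10 * v for v in observed]

print(f"Spearman rho = {spearman(observed, predicted):.3f}")
print(f"MAE = {mean_absolute_error(observed, predicted):.5f}")
print(f"mean observed = {sum(observed) / len(observed):.5f}")
```

Rank-based metrics are therefore insensitive to a systematic scaling error, which is why a good rank correlation can coexist with poorly predicted absolute values.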
Given our results, we consider it likely that a ‘presence-absence’
approach reflects the situation in situ better than a quantitative
approach. However, a more refined quantitative approach than the one
presented in our study may yield better results. For instance,
zero-inflated models of semi-quantitative data, such as the frameworks
proposed by Guisan et al. (1998) and Irvine et al. (2016) for plant
cover, could be adapted to model ordinal classes of microbial OTU
abundance.
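As a minimal sketch of such an adaptation, assuming a cumulative-logit ordinal model combined with a probability of structural zeros (a generic zero-inflated formulation, not necessarily the exact frameworks of Guisan et al. (1998) or Irvine et al. (2016)), relative abundances could first be binned into ordinal classes and then scored by the likelihood below. The class breaks, cutpoints, linear predictors and zero-inflation probability are hypothetical values chosen for illustration. In practice they would be estimated from the data, for example by maximising `log_likelihood`.

```python
import bisect
import math


def abundance_class(rel_abundance, breaks=(0.001, 0.01)):
    """Map a relative abundance to an ordinal class: 0 = absent, then 1..len(breaks)+1.

    The class breaks are hypothetical and would need to be chosen for real data.
    """
    if rel_abundance <= 0:
        return 0
    return 1 + bisect.bisect_left(breaks, rel_abundance)


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def zi_ordinal_probs(eta, cutpoints, pi_zero):
    """Class probabilities [P(Y=0), ..., P(Y=K)] of a zero-inflated cumulative-logit model.

    eta       -- linear predictor from environmental covariates (higher = more abundant)
    cutpoints -- strictly increasing thresholds theta_0 < ... < theta_{K-1}
    pi_zero   -- probability of a structural zero (OTU absent regardless of eta)
    """
    if not 0.0 <= pi_zero <= 1.0:
        raise ValueError("pi_zero must be in [0, 1]")
    if any(b <= a for a, b in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cutpoints must be strictly increasing")
    # Cumulative probabilities P*(Y <= k) of the ordinal component.
    cumulative = [logistic(theta - eta) for theta in cutpoints] + [1.0]
    ordinal = [cumulative[0]] + [
        cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))
    ]
    # Mix in structural zeros: class 0 can arise from either component.
    probs = [(1.0 - pi_zero) * p for p in ordinal]
    probs[0] += pi_zero
    return probs


def log_likelihood(classes, etas, cutpoints, pi_zero):
    """Log-likelihood of observed ordinal classes given per-site linear predictors."""
    return sum(
        math.log(zi_ordinal_probs(eta, cutpoints, pi_zero)[y])
        for y, eta in zip(classes, etas)
    )


# Hypothetical example: four sites, relative abundances turned into classes 0-3.
rel_abundances = [0.0, 0.0005, 0.004, 0.02]
classes = [abundance_class(a) for a in rel_abundances]
etas = [-2.0, -0.5, 0.5, 2.0]
cutpoints = [-1.0, 0.0, 1.0]
ll = log_likelihood(classes, etas, cutpoints, pi_zero=0.2)
print(classes, round(ll, 3))
```

Separating structural zeros from low-abundance classes lets absences caused by unsuitable conditions be modelled differently from rare but present OTUs.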
Model performance may also depend on how well the environmental
covariates used reflect the ecological drivers of each OTU (Guisan et
al., 2017). Edaphic covariates were the most frequently selected
covariates across all groups, emphasising the importance of soil
properties in the spatial distribution of soil microorganisms, as
previously reported (Birkhofer et al., 2012; de Vries et al., 2012;
Terrat et al., 2017; Malard et al., 2022). Notably, our results confirm
that bacteria and archaea are highly dependent on soil pH (Yashiro et
al., 2016, 2018; Malard et al., 2022). However, model performance was
not consistent across phyla. For example, performance was better for
Chloroflexi, which are mostly heterotrophic phototrophs (Bryant, 2019),
and for Acidobacteriota, which are known to be strongly driven by pH and
other edaphic properties (Jones et al., 2009; Lauber et al., 2009;
Navarrete et al., 2013), all of which were measured directly on site.
The strong relationship between these organisms and the abiotic
conditions measured directly at the field sites or in the collected soil
samples (as opposed to covariates derived indirectly from models, or
unavailable covariates such as biotic interactions) may explain the
better performance obtained for these groups.
Conversely, the poor model performance obtained for several OTUs could
indicate that environmental covariates relevant to these OTUs were
missing. Dependence on biotic interactions may be one of the main
drivers of the distribution of some microorganisms, further affecting
model performance. For example, Patescibacteria, known for their strong
dependence on biotic interactions with the surrounding community (Tian
et al., 2007; Herrmann et al., 2019), had a lower proportion of OTUs
with good modelling results than other bacterial phyla. In contrast,
Planctomycetota, which are also documented to contain many OTUs that
depend strongly on biotic interactions (Kaboré et al., 2020), had a
higher proportion of OTUs with good model performance. More work
focussing on these two phyla is needed to determine the extent to which
biotic interactions and environmental conditions shape the distributions
of their OTUs.
The spatial and/or temporal resolution of our covariates may also be
inappropriate for accurately modelling OTU spatial patterns (Nunan et
al., 2003). For example, covariates describing landscape type and
structure were included in the initial covariate dataset, but at a
resolution that is very coarse compared with the size and generation
time of microorganisms. These covariates were rarely selected and had
low importance in all models, despite reports of microorganisms being
influenced by such covariates (e.g. for protists, Seppey et al., 2023).
Moreover, studies of macro-organisms have shown that information on
micro-scale environmental conditions can improve model predictive power
(Pradervand et al., 2014; Carter et al., 2016; Lembrechts et al., 2019,
2020).
The characteristics of the modelled organisms can also influence model
performance (Guisan, Graham, et al., 2007; McCune et al., 2020; Collart
et al., 2023). For instance, niche breadth is often reported to affect
the predictive power of SDMs, with species that have broad niches (i.e.
generalists) tending to be harder to model than species with narrow
niches (i.e. specialists) (Guisan & Hofer, 2003; Guisan, Zimmermann, et
al., 2007; Marshall et al., 2015; Regos et al., 2019; Hallman &
Robinson, 2020; Tessarolo et al., 2021). Malard et al. (2022) showed in
the same study area that bacteria and protists have broader niches than
archaea and fungi. Since bacterial models had much higher predictive
power than protist models in our study, although both groups have broad
niches, these findings suggest that niche breadth may not be the main
driver of the predictive power of microbial models. Whether niche
breadth affects the distribution of these microorganisms still needs
further investigation.
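Investigating this question requires a quantitative niche breadth index. One common choice, assumed here purely for illustration and not necessarily the index used by Malard et al. (2022), is Levins' B, computed from the proportions of an OTU's occurrences across environmental classes, together with its standardised form B_A, which ranges from 0 (one class only) to 1 (all classes equally). The sketch below uses hypothetical counts across four pH classes; `levins_breadth` is an illustrative helper.

```python
def levins_breadth(abundances):
    """Return Levins' niche breadth B and its standardised form B_A.

    abundances -- non-negative abundances (or occurrence counts) of one OTU
                  across n environmental classes (e.g. hypothetical pH bands).
    B   = 1 / sum(p_i ** 2), from 1 (one class) to n (all classes equally)
    B_A = (B - 1) / (n - 1), rescaled to [0, 1]
    """
    n = len(abundances)
    if n < 2:
        raise ValueError("need at least two environmental classes")
    if any(a < 0 for a in abundances):
        raise ValueError("abundances must be non-negative")
    total = sum(abundances)
    if total == 0:
        raise ValueError("the OTU must occur in at least one class")
    proportions = [a / total for a in abundances]
    b = 1.0 / sum(p * p for p in proportions)
    return b, (b - 1.0) / (n - 1)


# Hypothetical OTUs observed across four pH classes.
generalist = [5, 5, 5, 5]
specialist = [20, 0, 0, 0]
print("generalist:", levins_breadth(generalist))
print("specialist:", levins_breadth(specialist))
```

Comparing such per-OTU indices with per-OTU model performance, rather than with group-level averages, would allow a more direct test of the niche breadth hypothesis.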
Another explanation for differences in model performance may be the
number of sampling sites (Thuiller et al., 2004; Hernandez et al., 2006;
Wisz et al., 2008; Tessarolo et al., 2021; Chevalier et al., 2022).
However, although we showed that the number of sites affects
performance, it cannot on its own explain the differences in model
performance among taxa. Indeed, when we fitted models for bacteria and
archaea using exactly the same sites as for protists, the loss of
predictive power was only marginal, and the bacterial and archaeal
models remained much more predictive than the protist models. In
addition, the generated null distributions showed that prevalence had
little effect on the performance metrics of the null models, except at
extreme prevalence values. To improve model performance for OTUs with
very few presences or absences, ensembles of small models could be used
instead (Breiner et al., 2015, 2018).
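How null performance can depend on prevalence is illustrated by the permutation-based null model sketched below, assuming that predictions are evaluated with the AUC. This is a generic permutation null, not necessarily the procedure used to generate the null distributions in this study, and the data are simulated by a hypothetical helper (`simulate`). Shuffling predictions among sites keeps the prevalence fixed while removing any link with the observations. With few presences (or absences), the null AUC distribution typically widens, so the threshold a model must exceed rises at extreme prevalence values.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random presence outscores a random absence.

    Ties count as one half (equivalent to the Mann-Whitney U statistic).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one presence and one absence")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def null_auc_quantile(labels, scores, n_perm=199, q=0.95, seed=1):
    """q-quantile of AUC values after randomly shuffling scores among sites.

    Shuffling breaks any link between predictions and observations while
    keeping the OTU's prevalence and the set of predicted values unchanged.
    """
    rng = random.Random(seed)
    shuffled = list(scores)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(auc(labels, shuffled))
    null.sort()
    return null[int(q * (len(null) - 1))]


def simulate(n_sites, prevalence, seed=0):
    """Hypothetical OTU: presences at a given prevalence, noisy but informative scores."""
    rng = random.Random(seed)
    n_pres = max(1, round(prevalence * n_sites))
    labels = [1] * n_pres + [0] * (n_sites - n_pres)
    scores = [y + rng.gauss(0.0, 1.0) for y in labels]
    return labels, scores


for prevalence in (0.05, 0.5):
    labels, scores = simulate(100, prevalence)
    observed = auc(labels, scores)
    threshold = null_auc_quantile(labels, scores)
    print(f"prevalence={prevalence}: AUC={observed:.2f}, null 95% quantile={threshold:.2f}")
```

A model is then considered better than the null when its observed AUC exceeds the chosen null quantile for the same prevalence.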
Species distribution modelling frameworks were first designed to model
the distributions of macro-organism species (Franklin, 2010; Peterson et
al., 2011; Guisan et al., 2017). Using eDNA barcoding OTUs as modelled
entities implies that organisms with different ecological requirements
may be clustered into the same OTU, potentially resulting in misleading
predictions and poor modelling results (Qiao et al., 2017; Smith et al.,
2019). However, for macro-organisms, taxonomic levels other than the
Linnaean species have been modelled successfully (Hadly et al., 2009;
Smith et al., 2019). Moreover, in an exploratory study, Mod et al.
(2021) tested the effect of different OTU clustering levels on mean
model predictive power and found no strong effect.
A common application of species distribution models is to predict the
presence of the modelled entity beyond the sampled locations and time
period (i.e. ‘projections’; Guisan & Thuiller, 2005). Our results showed
that the presence-absence patterns of soil-borne OTUs depend strongly on
edaphic conditions. Consequently, projecting the distributions of alpine
soil-borne microorganisms in time and space would require the
development of edaphic maps and associated scenarios of change (Mod et
al., 2021).
However, mapping soil properties is not easy, even under current
conditions (Cianfrani et al., 2018). SoilGrids maps (Hengl et al., 2017)
are one option, but their resolution (250 m) is currently too coarse for
local study areas, especially in rugged mountain landscapes such as the
western Swiss Alps. Moreover, future predictions from models that
include soil covariates will not be possible until scenarios of soil
change are developed alongside those already available for climate and
land use (Mod et al., 2021).
Yet, the evolution of soils under global change remains rather
uncertain. While some studies predict acidification of mountain soils
through pollution (Hédl et al., 2011), others predict more nuanced
responses of soil pH and carbon and nitrogen content (Davidson &
Janssens, 2006; Trumbore & Czimczik, 2008; Rocci et al., 2021), with a
lag between climatic and edaphic changes (Ladau et al., 2018; Mod et
al., 2021). Taken together, making full use of soil microorganism SDMs
will require ecologically relevant representations of covariates and of
their future scenarios.
To conclude, we showed that SDMs can accurately predict the
presence-absence of many microbial OTUs and can rank sites by OTU
relative abundance, although exact relative-abundance values are poorly
predicted. Both
‘presence-absence’ and ‘relative-abundance’ approaches capture different
aspects of microbial OTU distributions that can be helpful in ecological
research on soil function and management. In particular,
relative-abundance models such as those developed in this study could be
used to distinguish areas where some OTUs are relatively more abundant
from areas where they are less abundant. However, we advise future
authors to pay particular attention to the coefficient of variation of
such models before placing much confidence in the predicted
relative-abundance values at individual sites. These models open the way
to spatial maps predicting changes in soil OTU composition and
distribution under future soil and landscape scenarios. In this context,
we urge the development of fine-scale maps and future scenarios for
edaphic covariates, given their importance as the main drivers of soil
microbial OTU distributions.
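As an illustration of this check, the sketch below computes, for each site, the coefficient of variation (sample standard deviation divided by the mean) of predicted relative abundances across replicate model runs, assuming such replicates are available (e.g. from repeated cross-validation or ensemble members), and flags sites above a threshold. The replicate values, site names, the threshold of 0.5 and the helper `flag_unreliable_sites` are hypothetical assumptions for illustration only.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean of replicate predictions."""
    if len(values) < 2:
        raise ValueError("need at least two replicate predictions")
    mean = statistics.fmean(values)
    if mean == 0:
        # Relative abundances are non-negative, so a zero mean means every
        # replicate predicted 0: there is no variation to report.
        return 0.0
    return statistics.stdev(values) / mean


def flag_unreliable_sites(predictions_by_site, max_cv=0.5):
    """Return the sites whose replicate predictions vary more than max_cv allows."""
    return [
        site
        for site, preds in predictions_by_site.items()
        if coefficient_of_variation(preds) > max_cv
    ]


# Hypothetical predicted relative abundances of one OTU from three replicate runs.
predictions = {
    "site_A": [0.010, 0.011, 0.009],
    "site_B": [0.002, 0.010, 0.030],
}
for site, preds in predictions.items():
    print(site, round(coefficient_of_variation(preds), 2))
print("flagged:", flag_unreliable_sites(predictions))
```

Sites with a high coefficient of variation are those where the predicted value itself should be given little weight, even if the model ranks sites reasonably well overall.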