Discussion
In this study, we estimated the ability of species distribution model (SDM) frameworks to predict the ‘presence-absence’ and ‘relative-abundance’ of 67,148 soil microbial OTUs (i.e. eDNA-based operational taxa) based on associated environmental conditions along an elevation gradient. For most of these OTUs, prior knowledge of their ecology was very sparse. Nevertheless, SDM frameworks allowed better ‘presence-absence’ prediction than null models for more than 85% of OTUs, and 23% had models that displayed a high predictive power. Our results confirm, in line with previous studies, that eDNA sequences can be used in models of microbial environmental niches (Schröder, 2008; King et al., 2010; Mod et al., 2020, 2021; Lembrechts et al., 2020; Malard et al., 2022). For ‘relative-abundance’ models, 33% of predictions had good rank correlation with on-site values. However, the prediction of the exact value of OTUs’ ‘relative-abundance’ per site yielded large errors. A potential explanation could be the biases associated with relating the proportion of reads to the environment due, for instance, to intraspecific variation in the number of copies of the small ribosomal subunit, as observed for some microbial organisms (Stoddard et al., 2015; Lavrinienko et al., 2021), or even biases due to primers (Vaulot et al., 2022). This result is similar to those of SDM studies on macro-organisms that also showed low predictive power of abundance models (Pearce & Ferrier, 2001; Tôrres et al., 2012). We therefore highlight the difficulty in using SDM frameworks to predict abundance or other quantitative species characteristics (Van Couwenberghe et al., 2013; Lee-Yaw et al., 2022; Waldock et al., 2022). Given our results, we consider it likely that a ‘presence-absence’ approach better depicts the situation in situ than a quantitative approach. However, a more fine-tuned quantitative approach than the one presented in our study may yield better results.
For instance, zero-inflated models of semi-quantitative data, such as the frameworks proposed in Guisan et al. (1998) and in Irvine et al. (2016) for plant coverages, could be adapted to model ordinal classes of microbial OTU abundance.
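The contrast between a good rank correlation and large errors on exact values can be illustrated with a minimal sketch. The Python below is not the analysis code of this study: the site values, the function names and the choice of mean absolute error as the error measure are all assumptions made purely for illustration.

```python
from statistics import mean

def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def mean_absolute_error(observed, predicted):
    return mean(abs(o - p) for o, p in zip(observed, predicted))

# Hypothetical relative abundances of one OTU at five sites (illustrative only).
observed = [0.01, 0.02, 0.05, 0.10, 0.20]
predicted = [0.10, 0.15, 0.30, 0.45, 0.60]

rho = spearman(observed, predicted)             # site ordering is fully preserved
mae = mean_absolute_error(observed, predicted)  # yet every value is overestimated
```

In this made-up case the predictions rank the sites in the correct order while overestimating every value, which is the kind of situation in which a model can separate high-abundance from low-abundance sites without providing reliable absolute values.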
Model performances might also depend on how well the environmental covariates that are used reflect the OTU ecological drivers (Guisan et al., 2017). We observed that edaphic covariates were the most frequently selected covariates across all groups, emphasising the importance of soil properties in the spatial distribution of soil microorganisms, as previously reported (Birkhofer et al., 2012; de Vries et al., 2012; Terrat et al., 2017; Malard et al., 2022). Notably, our results confirm that bacteria and archaea are highly dependent on soil pH (Yashiro et al., 2016, 2018; Malard et al., 2022). However, these performances were not consistent across phyla. For example, better performances were shown in Chloroflexi, which are mostly heterotrophic phototrophs (Bryant, 2019), and Acidobacteriota, known to be strongly driven by pH, as well as other edaphic properties that were directly measured on site (Jones et al., 2009; Lauber et al., 2009; Navarrete et al., 2013). The strong relationship between organisms and the abiotic conditions that were directly measured at the field sites or from the collected soil samples (as opposed to covariates indirectly derived from models, or unavailable covariates such as biotic interactions) may explain the better performances obtained for these groups.
Conversely, poor model performances obtained for several OTUs could indicate a lack of environmental covariates relevant for these OTUs. Dependence on biotic interactions may be one of the main drivers of the distribution of some microorganisms, further impacting model performances. For example, Patescibacteria, known for their strong dependence on biotic interactions with the surrounding community (Tian et al., 2007; Herrmann et al., 2019), had a low proportion of OTUs with good modelling results compared to other phyla within bacteria. In contrast, Planctomycetota, which is also documented to contain many OTUs that are highly dependent on biotic interactions (Kaboré et al., 2020), had a higher proportion of OTUs presenting good model performance. More work focussing on these two phyla is needed to determine to what extent biotic interactions and environmental conditions shape their OTU distributions.
The spatial and/or temporal resolution of our covariates could also be inadequate to accurately model OTU spatial patterns (Nunan et al., 2003). For example, landscape type and structure covariates were present in the initial covariate dataset, but at a very coarse resolution compared to the size and generation time of microorganisms. The selection and importance of these covariates were low in all models, despite reports of microorganisms being influenced by this kind of covariate (e.g. for protists, Seppey et al., 2023). Moreover, it has been shown in macro-organisms that information about micro-scale environmental conditions could improve model predictive power (Pradervand et al., 2014; Carter et al., 2016; Lembrechts et al., 2019, 2020).
Modelled organism characteristics can influence model performances (Guisan, Graham, et al., 2007; McCune et al., 2020; Collart et al., 2023). For instance, niche breadth is often reported as impacting the predictive power of SDMs, with species presenting large niches (i.e. generalists) tending to be harder to model than species presenting small ones (i.e. specialists) (Guisan & Hofer, 2003; Guisan, Zimmermann, et al., 2007; Marshall et al., 2015; Regos et al., 2019; Hallman & Robinson, 2020; Tessarolo et al., 2021). Malard et al. (2022) showed in the same study area that bacteria and protists have larger niche breadth than archaea and fungi. Our results combined with these findings tend to show that niche breadth may not be the main driving factor of microbial models’ predictive power. Whether or not niche breadth has an impact on the distribution of these microorganisms still needs further investigation.
Another explanation for differences in model performances can be the number of sampling sites (Thuiller et al., 2004; Hernandez et al., 2006; Wisz et al., 2008; Tessarolo et al., 2021; Chevalier et al., 2022). However, while we showed that the number of sites has an impact on performance, this explanation alone cannot account for model performance differences among taxa. Indeed, when we fitted models for bacteria and archaea with the exact same sites used for protists, the loss of predictive power was only marginal, and the archaea/bacteria models’ predictive power remained much higher than protist models’ predictive power. In addition, the generated null distributions showed little effect of prevalence on the performance metrics of the null models, with only some effect for extreme prevalence values. To improve model performances for these OTUs with very low counts of presences or absences, ensembles of small models could be used instead (Breiner et al., 2015, 2018).
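As an illustration of how a null distribution of a performance metric can be generated for a given prevalence, the following is a minimal sketch and not the procedure used in this study. It assumes the True Skill Statistic (TSS) as the metric and a permutation of the observed presences and absences as the null model; the OTU data, the function names and the number of permutations are hypothetical.

```python
import random
from statistics import mean

def tss(observed, predicted):
    """True Skill Statistic = sensitivity + specificity - 1 (binary 0/1 inputs).

    Requires at least one presence and one absence in `observed`.
    """
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    return tp / (tp + fn) + tn / (tn + fp) - 1

def null_tss(observed, predicted, n_perm=500, seed=0):
    """TSS values obtained after randomly permuting the observations.

    Permuting keeps the prevalence fixed while breaking any link between
    observations and predictions.
    """
    rng = random.Random(seed)
    shuffled = list(observed)
    values = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        values.append(tss(shuffled, predicted))
    return values

# Hypothetical OTU present at 10 of 40 sites (prevalence 0.25).
observed = [1] * 10 + [0] * 30
# Hypothetical model output: 8 presences and 27 absences correctly predicted.
predicted = [1] * 8 + [0] * 2 + [1] * 3 + [0] * 27

observed_tss = tss(observed, predicted)
null_values = null_tss(observed, predicted)
# One-sided permutation p-value: share of null TSS values at least as high.
p_value = (1 + sum(v >= observed_tss for v in null_values)) / (1 + len(null_values))
```

Repeating such a procedure across a range of prevalence values would show whether the null expectation of the metric shifts at very low or very high prevalence.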
Species distribution modelling frameworks were first designed to model distributions of macro-organism species (Franklin, 2010; Peterson et al., 2011; Guisan et al., 2017). Using eDNA barcoding OTUs as modelled entities implies that organisms with different ecological requirements are potentially clustered into the same OTU, thereby potentially resulting in misleading predictions and poor modelling results (Qiao et al., 2017; Smith et al., 2019). However, with macro-organisms, other taxonomic levels than Linnaean species have been modelled successfully (Hadly et al., 2009; Smith et al., 2019). Moreover, in an exploratory study, Mod et al. (2021) tested the effect of different clustering levels of OTUs on their models’ mean predictive power, and did not find any strong effect.
A usual application of species distribution models is the prediction of the modelled entity’s presence outside of sampling locations and time (i.e. ‘projections’; Guisan & Thuiller, 2005). Our results showed a large dependence of our soil-borne OTUs’ presence-absence patterns on edaphic conditions. Consequently, projections in time and space of alpine soil-borne microorganisms would necessitate the development of edaphic maps and associated scenarios of change (Mod et al., 2021). Yet, mapping soil properties is not an easy task, even under current conditions (Cianfrani et al., 2018). SoilGrids maps (Hengl et al., 2017) represent a possibility, but their resolution (250 m) is currently too coarse for local study areas, especially in rugged mountain landscapes such as the western Swiss Alps. Moreover, deriving future predictions with models including soil covariates will not be possible until scenarios of soil changes are also developed (as can be found currently for climate and land-use; Mod et al., 2021). Yet, soil evolution under global change is still rather uncertain. While some studies predict an acidification of mountain soils through pollution (Hédl et al., 2011), others predict more mitigated responses of soil pH, carbon and nitrogen content (Davidson & Janssens, 2006; Trumbore & Czimczik, 2008; Rocci et al., 2021), with a lag between climatic changes and edaphic changes (Ladau et al., 2018; Mod et al., 2021). Taken together, to make full use of soil microorganism SDMs, we need to develop an ecologically relevant representation of covariates and their future scenarios.
To conclude, we showed that SDMs can be used to accurately predict presence-absence and relative abundance of microbial OTUs. The ‘presence-absence’ and ‘relative-abundance’ approaches explore different aspects of microbial OTU distributions that can be helpful in ecological research on soil function and management. In particular, relative abundance models as developed in this study could be used to discriminate areas with higher relative abundance of some OTUs from areas with lower relative abundance of these OTUs. However, we advise future authors to pay special attention to the coefficient of variation of such models before giving too much credit to the actual predicted value of relative abundance at individual sites. These models open the way to developing spatial maps to predict soil OTU compositional changes and their spatial distributions in future soil and landscape scenarios. In this context, we urge that fine-scale maps of edaphic covariates be generated, together with future scenarios for these covariates, because of their importance as the main drivers of soil microbial OTU distribution.