Testing for the isolating effects of distance, terrain, current and historic climatic suitability and environmental niche
We used landscape genetic approaches to estimate the degree of connectivity among localities, assessing the relative impact of isolating barriers (IBRTerrain, based on topographic complexity), current climatic suitability (IBRHabitat, based on environmental niches), climatic suitability since the LGM (IBI – isolation-by-instability) and the environmental niche (IBE) on population genomic differentiation (FST ). For each species, we downloaded occurrence data from GBIF (Table S5; Chamberlainet al ., 2021, Fig. S1) and pruned the data using custom cleaning techniques (COORDINATECLEANER, Zizka et al. 2019), flagging records with equal longitude/latitude, zero coordinates, coordinate–country mismatches, records located in country centroids, in the sea or around GBIF headquarters as well as duplicates and altitudinal outliers. To estimate IBRTerrain, we used raw elevation data (30 arc-sec resolution, earthenv.org/topography) and calculated topographic complexity using the Terrain Ruggedness Index (TRI, tri function, SPATIALECO, Evans 2021). For each species, we drew convex hulls around GBIF occurrences with a 0.4° buffer (RGEOS, Bivand & Rundel 2020) and assessed each species’ elevational distribution range (Table S5). Within the convex hulls, we restricted the TRI layer to the plausible altitudinal distribution range of each species by removing areas 100 masl below and above the lowest and highest GBIF occurrence (mask , RASTER; Table S5). To define a resistance cost surface based on TRI, we chose cost values to represent different proportions of the GBIF occurrences, with 1 (low cost) representing the central 70% of the GBIF occurrences of each species, 2 (moderate cost) representing the adjacent 5% quantiles (10%-15% and 85%-90%), 4 (high cost) representing the next 5% quantiles (5%-10% and 90%-95%), 8 (very high cost) representing the 0%-5% and 95%-100% quantiles, and 16 (extremely high cost) representing TRI values outside of the range covered by actual GBIF occurrences. Next, we calculated a transition object specifying a ‘knight and one-cell queen move’ direction (transition , GDISTANCE, van Etten 2017) and correcting for map distortion (geoCorrection ). Finally, we calculated all pairwise least-cost distances (paths with least-accumulative cost over cost surface) between the sampling localities (costDistancefunction).
To estimate IBRHabitat, we selected four bioclimatic variables relevant for tropical plants: Annual Mean Temperature (bio1), Mean Diurnal Temperature Range (bio2), Annual Precipitation (bio12), Precipitation Seasonality (bio15; WorldClim 2.0, 30 arc sec, Fick & Hijmans 2017). We restricted these rasters as we did for IBRTerrain and estimated the abiotic climatic niche of each species using all GBIF occurrences using MaxEnt 3.4.1 (DISMO, Hijmans et al. 2020). We created 10000 pseudo-absence points (the default number of background points used by MaxEnt) from raster cells lacking GBIF occurrences (DISMO, Hijmans et al. 2020). We set 80% of the GBIF occurrences of each species as training dataset and 20% to validate the models, using 500 iterations. We fit models 10 times and validated model fit using AUC (area under the receiver operating curve, values > 0.75 indicate good fit, Swets 1988) and TSS (true skill statistic, values range between -1 and +1; -1 to 0 –no better than random; values > 0.4 – 0.8 acceptable models, see Ornelas et al. 2019, Table S6; SDMtune, Vignali et al. 2020). In addition, we evaluated model performance using fourfold spatial-block cross-validation (blockCV, Valavi et al. 2019). In this approach, the occurrence dataset is split into four spatially distinct blocks, which are used as training datasets separately and subsequently used to test how well a model can be transferred to environmental space not used for calibration (Table S6). After assuring good model performance, we created raster maps of habitat suitability (0 – 0% suitable, 1 – 100% suitable) and averaged all ten models. To arrive with a resistance cost surface, we subtracted the climatic niche raster from 1 so that 0 means low cost and 1 means high cost. We then calculated the transition object as for IBRTerrain and calculated all pairwise least-cost distances between the sampled localities.
To estimate IBI, we used the same four bioclimatic variables (bio1, bio2, bio12, bio15, at 2.5 arc min resolution) as for the current climate, but derived from three different general circulation models for the LGM and mid-Holocene (NCAR-CCSM4, MIROC-ESM, MPI-ESM-P, WorldClim 1.4, Hijmans et al. 2005). Since substantial downward shifts in elevation zones may have occurred during the LGM (Ramírez-Barahona & Eguiarte 2013), we set the lower elevation restriction to zero in all species and kept the upper elevation restriction (see above). To minimize errors of temporal extrapolation from current to past niche models, we ran Multivariate Environmental Similarity Surface analyses (MESS) in DISMO (Figure S2, Elith et al. 2010). If past environments encompass climatic conditions not found in the training dataset, MESS will give negative values. High dissimilarity (many negative values) is limiting the predictive accuracy of models and hence identifies scenarios/areas where model results need to be treated with particular care. For each species, we calculated MESS for each past circulation model (Figure S2). Dissimilarity was overall low in all species but M. sanguinea , and the MIROC-ESM circulation model showed highest dissimilarity across all species (Fig. S2). We hence excluded MIROC-ESM from subsequent analyses.
For estimating habitat stability, we first estimated the current abiotic climatic niche of each species as for IBRHabitat, and then projected current habitat suitability to the LGM and mid-Holocene. We fit each model 10 times and averaged habitat suitability across runs and circulation models. Next, to derive a measure of habitat suitability through time, we overlaid the suitability maps of the three times (current, mid-Holocene, LGM) and averaged suitability of each raster cell. A raster cell providing 100% suitable conditions at all three times hence received a value of 1, while a raster cell providing unsuitable conditions at all three times received a value of 0. Again, we subtracted the suitability raster by 1 to arrive with a resistance cost surface and calculated all pairwise least-cost distances between localities of each species (see above; Fig. S3).
Finally, we calculated isolation-by-environment (IBE) by extracting the raw values of the four bioclimatic variables (current climate) per population and calculating the Euclidean distances in environmental space between sampled localities using the dist function in R.
Naturally, the resistance surfaces we calculated for IBRTerrain, IBRHabitat and IBI represent refined geographic matrices and may thus be strongly correlated to the Euclidean geographic IBD matrices. We hence first ran separate Mantel tests (with 10,000 permutations) on these four matrices (each matrix standardized by the mean to account for differences in scale) to assess collinearity (García-Rodríguez et al. 2020). The four matrices were strongly correlated in most species (Table S7). We hence next constructed separate Mantel tests (with 10,000 permutations) to test for the effect of the respective distance matrix on normalized population genetic differentiation (FST /(1- FST )). Then, for each species, we selected the distance measure with the highest R² and used multiple matrix regressions with randomizations (10,000 random permutations, Wang 2013) to test for the relative impact of the respective distance matrix and IBE on population genetic differentiation using rglMMRR (POPGENREPORT, Adamack & Gruber 2014).
Given recent criticism on the use of multiple matrix regressions (inflation of degrees of freedom, weak correlations may receive significant p-values (Moncada et al. 2020)), we additionally implemented Generalized Dissimilarity Modelling (GDM). GDM is a multivariate procedure allowing for modelling a single response variable as a function of any number of explanatory variables (Manly 1986) and allows for non-linear relationships among the response and explanatory variables using I-spline functions (Ferrier et al. 2007). For each species, we fit a GDM, specifying FST as response and IBD plus the resistance matrices and IBE as explanatory variables using the R-package gdm (Fitzpatrick et al. 2020). We plotted I-splines to visualize how magnitudes and rates of genetic differentiation vary with explanatory variables and estimated the (combination of) explanatory variable(s) best explaining genetic differentiation using backward elimination with permutation (see Ferrier et al. 2007, García-Rodríguez et al. 2020 for recent implementation). Briefly, the unique contribution of each predictor variable to total explained deviance is calculated, and the variable least contributing is discarded before fitting a new GDM. We ran 500 permutations until all variables retained in the final model made significant (p < 0.05), unique contributions to the explained deviance.