Testing for the isolating effects of distance, terrain, current
and historic climatic suitability and environmental niche
We used landscape genetic approaches to estimate the degree of
connectivity among localities, assessing the relative impact of
isolating barriers (IBRTerrain, based on topographic
complexity), current climatic suitability (IBRHabitat,
based on environmental niches), climatic suitability since the LGM (IBI
– isolation-by-instability) and the environmental niche (IBE) on
population genomic differentiation (FST ).
For each species, we downloaded
occurrence data from GBIF (Table S5;
Chamberlainet al ., 2021, Fig. S1) and pruned the data using custom cleaning
techniques (COORDINATECLEANER, Zizka et al. 2019), flagging records with
equal longitude/latitude, zero coordinates, coordinate–country
mismatches, records located in country centroids, in the sea or around
GBIF headquarters as well as duplicates and altitudinal outliers. To
estimate IBRTerrain, we used raw elevation data (30
arc-sec resolution, earthenv.org/topography) and calculated topographic
complexity using the Terrain Ruggedness Index (TRI, tri function,
SPATIALECO, Evans 2021). For each
species, we drew convex hulls around GBIF occurrences with a 0.4° buffer
(RGEOS, Bivand & Rundel 2020) and assessed each species’ elevational
distribution range (Table S5). Within the convex hulls, we restricted
the TRI layer to the plausible altitudinal distribution range of each
species by removing areas 100 masl below and above the lowest and
highest GBIF occurrence (mask , RASTER; Table S5). To define a
resistance cost surface based on TRI, we chose cost values to represent
different proportions of the GBIF occurrences, with 1 (low cost)
representing the central 70% of the GBIF occurrences of each species, 2
(moderate cost) representing the adjacent 5% quantiles (10%-15% and
85%-90%), 4 (high cost) representing the next 5% quantiles (5%-10%
and 90%-95%), 8 (very high cost) representing the 0%-5% and
95%-100% quantiles, and 16 (extremely high cost) representing TRI
values outside of the range covered by actual GBIF occurrences. Next, we
calculated a transition object specifying a ‘knight and one-cell queen
move’ direction (transition , GDISTANCE,
van Etten 2017) and correcting for
map distortion (geoCorrection ). Finally, we calculated all
pairwise least-cost distances (paths with least-accumulative cost over
cost surface) between the sampling localities (costDistancefunction).
To estimate IBRHabitat, we selected four bioclimatic
variables relevant for tropical plants: Annual Mean Temperature (bio1),
Mean Diurnal Temperature Range (bio2), Annual Precipitation (bio12),
Precipitation Seasonality (bio15;
WorldClim 2.0, 30 arc sec, Fick &
Hijmans 2017). We restricted these rasters as we did for
IBRTerrain and estimated the abiotic climatic niche of
each species using all GBIF occurrences using MaxEnt 3.4.1 (DISMO,
Hijmans et al. 2020). We created 10000 pseudo-absence points (the
default number of background points used by MaxEnt) from raster cells
lacking GBIF occurrences (DISMO, Hijmans et al. 2020). We set 80% of
the GBIF occurrences of each species as training dataset and 20% to
validate the models, using 500 iterations. We fit models 10 times and
validated model fit using AUC (area under the receiver operating curve,
values > 0.75 indicate good fit,
Swets 1988) and TSS (true skill
statistic, values range between -1 and +1; -1 to 0 –no better than
random; values > 0.4 – 0.8 acceptable models, see
Ornelas et al. 2019, Table S6;
SDMtune, Vignali et al. 2020). In addition, we evaluated model
performance using fourfold spatial-block cross-validation (blockCV,
Valavi et al. 2019). In this approach, the occurrence dataset is split
into four spatially distinct blocks, which are used as training datasets
separately and subsequently used to test how well a model can be
transferred to environmental space not used for calibration (Table S6).
After assuring good model performance, we created raster maps of habitat
suitability (0 – 0% suitable, 1 – 100% suitable) and averaged all
ten models. To arrive with a resistance cost surface, we subtracted the
climatic niche raster from 1 so that 0 means low cost and 1 means high
cost. We then calculated the transition object as for
IBRTerrain and calculated all pairwise least-cost
distances between the sampled localities.
To estimate IBI, we used the same four bioclimatic variables (bio1,
bio2, bio12, bio15, at 2.5 arc min resolution) as for the current
climate, but derived from three different general circulation models for
the LGM and mid-Holocene (NCAR-CCSM4, MIROC-ESM,
MPI-ESM-P, WorldClim 1.4, Hijmans
et al. 2005). Since substantial downward shifts in elevation zones may
have occurred during the LGM
(Ramírez-Barahona & Eguiarte
2013), we set the lower elevation restriction to zero in all species and
kept the upper elevation restriction (see above). To minimize errors of
temporal extrapolation from current to past niche models, we ran
Multivariate Environmental
Similarity Surface analyses (MESS) in DISMO (Figure S2,
Elith et al. 2010). If past
environments encompass climatic conditions not found in the training
dataset, MESS will give negative values. High dissimilarity (many
negative values) is limiting the predictive accuracy of models and hence
identifies scenarios/areas where model results need to be treated with
particular care. For each species, we calculated MESS for each past
circulation model (Figure S2). Dissimilarity was overall low in all
species but M. sanguinea , and the MIROC-ESM circulation model
showed highest dissimilarity across all species (Fig. S2). We hence
excluded MIROC-ESM from subsequent analyses.
For estimating habitat stability, we first estimated the current abiotic
climatic niche of each species as for IBRHabitat, and
then projected current habitat suitability to the LGM and mid-Holocene.
We fit each model 10 times and averaged habitat suitability across runs
and circulation models. Next, to derive a measure of habitat suitability
through time, we overlaid the suitability maps of the three times
(current, mid-Holocene, LGM) and averaged suitability of each raster
cell. A raster cell providing 100% suitable conditions at all three
times hence received a value of 1, while a raster cell providing
unsuitable conditions at all three times received a value of 0. Again,
we subtracted the suitability raster by 1 to arrive with a resistance
cost surface and calculated all pairwise least-cost distances between
localities of each species (see above; Fig. S3).
Finally, we calculated isolation-by-environment (IBE) by extracting the
raw values of the four bioclimatic variables (current climate) per
population and calculating the Euclidean distances in environmental
space between sampled localities using the dist function in R.
Naturally, the resistance surfaces we calculated for
IBRTerrain, IBRHabitat and IBI represent
refined geographic matrices and may thus be strongly correlated to the
Euclidean geographic IBD matrices. We hence first ran separate Mantel
tests (with 10,000 permutations) on these four matrices (each matrix
standardized by the mean to account for differences in scale) to assess
collinearity (García-Rodríguez et
al. 2020). The four matrices were strongly correlated in most species
(Table S7). We hence next constructed separate Mantel tests (with 10,000
permutations) to test for the effect of the respective distance matrix
on normalized population genetic differentiation
(FST /(1- FST )). Then, for
each species, we selected the distance measure with the highest R² and
used multiple matrix regressions with randomizations (10,000 random
permutations, Wang 2013) to test
for the relative impact of the respective distance matrix and IBE on
population genetic differentiation using rglMMRR (POPGENREPORT,
Adamack
& Gruber 2014).
Given recent criticism on the use of multiple matrix regressions
(inflation of degrees of freedom, weak correlations may receive
significant p-values (Moncada et al. 2020)), we additionally implemented
Generalized Dissimilarity Modelling (GDM). GDM is a multivariate
procedure allowing for modelling a single response variable as a
function of any number of explanatory variables (Manly 1986) and allows
for non-linear relationships among the response and explanatory
variables using I-spline functions (Ferrier et al. 2007). For each
species, we fit a GDM, specifying FST as response
and IBD plus the resistance matrices and IBE as explanatory variables
using the R-package gdm
(Fitzpatrick et al. 2020). We
plotted I-splines to visualize how magnitudes and rates of genetic
differentiation vary with explanatory variables and estimated the
(combination of) explanatory variable(s) best explaining genetic
differentiation using backward elimination with permutation (see Ferrier
et al. 2007, García-Rodríguez et al. 2020 for recent implementation).
Briefly, the unique contribution of each predictor variable to total
explained deviance is calculated, and the variable least contributing is
discarded before fitting a new GDM. We ran 500 permutations until all
variables retained in the final model made significant (p <
0.05), unique contributions to the explained deviance.