Environment-associated loci (RDA analysis)
We used redundancy analysis (RDA) to detect candidate genes under
environmental selection in C. gr. laevigata. RDA can be
used as a genotype-environment association (GEA) method to detect loci
under selection based on multivariate ordination (Forester et al.,
2016). RDA determines how groups of loci (here also chromosome number)
covary in response to the multivariate environment and can detect
processes that result in weak, multilocus molecular signatures (Rellstab
et al., 2015; Forester et al., 2018). Compared to other methods, RDA has
shown a superior combination of low false-positive and high
true-positive rates across a variety of selection scenarios (Forester et
al., 2018). Another advantage of RDA is that it can be used to analyze
many loci and environmental predictors simultaneously. For RDA we
considered only SNP sites for which there was sequence coverage in at
least 100 individuals, and for which the less-common allele was present
in at least 10% of sampled individuals [minor allele frequency (MAF)
filter ≥0.10]. Chromosome number was added as an additional column
in the SNP data set. Because RDA is a regression-based method, it is
sensitive to highly correlated predictors. Hence, we
removed predictors with a pairwise correlation of
|r| > 0.7. We used bioclimatic variables from the WorldClim 2
database (Fick & Hijmans, 2017). The variable reduction was guided by
an ecological interpretation of the relevance of possible predictors. We
implemented the following variable reduction protocol. First, we performed cluster
analyses of the predictors based on a matrix of absolute correlation values
|r|, using the complete linkage clustering
method of the ‘hclust’ function in R (R Core Team, 2021). After
successive cluster analyses, we retained one variable from each cluster in which the
distance among variables was lower than 0.3 (i.e., |r| higher than 0.7).
We favored quarterly over monthly variables.
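The clustering step can be illustrated with a minimal sketch. It is written in Python rather than R and does not reproduce the ‘hclust’ call used in the study. It assumes that the distance between two predictors is 1 − |r|, that the tree is cut at 0.3, and that the first-listed variable in each cluster is the one retained, so the input order encodes the preference for quarterly variables. The variable names and values are made-up toy data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def complete_linkage_clusters(data, cut=0.3):
    """Group predictors by complete-linkage clustering on 1 - |r|.

    data: dict mapping predictor name -> list of values (one per site).
    Clusters are merged while the complete-linkage distance is below `cut`.
    """
    names = list(data)
    dist = {
        (a, b): 1.0 - abs(pearson(data[a], data[b]))
        for a in names for b in names if a != b
    }
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        # Complete linkage: distance between the two most dissimilar members.
        return max(dist[(a, b)] for a in c1 for b in c2)

    while len(clusters) > 1:
        d, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d >= cut:
            break
        # Merging into the earlier cluster keeps the first-listed name in front.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def reduce_predictors(data, cut=0.3):
    """Keep the first-listed predictor of every cluster (an assumption)."""
    return [cluster[0] for cluster in complete_linkage_clusters(data, cut)]


# Toy values only: a quarterly variable listed before a correlated monthly one.
toy = {
    "bio10": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "bio5": [1.1, 2.0, 3.2, 3.9, 5.1, 6.0],
    "bio12": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],
}
retained = reduce_predictors(toy)
print(retained)
```

In this toy case the two temperature variables fall into one cluster and only the first-listed one is kept, while the weakly correlated third variable remains on its own.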
We also checked for multicollinearity using variance inflation factors
(VIF) and confirmed that the VIF of every selected variable was <10. We
then performed a permutation test with 1000 permutations to assess the
statistical significance of the environmental variables and RDA axes used in
the models. We used the loadings of the SNPs (stored as ‘species’ scores in the
RDA object in R) in the ordination space to determine
which SNPs are candidates for local adaptation. We extracted the SNP
loadings from the two significant constrained axes. Outlier SNPs were
identified as those with the most extreme loadings along the significant RDA
axes (i.e., those in the upper and lower 2.5% tails of the loading
distributions; Capblancq et al., 2018), and were considered putatively
strongly associated with
environmental variables. Then we investigated the correlations between
environmental predictors and outlier SNPs. Finally, we represented SNPs
in the ordination space and color-coded them based on the predictor
variable with which they are most strongly correlated. Outlier SNPs showing
the same signal as chromosome number were selected for functional
annotation, which may provide additional evidence of their
role in environmental adaptation. We used their genomic positions to
identify candidate genes potentially under environmental selection
located <50 kbp upstream or downstream in the annotated Carex scoparia genome (Planta et al., 2022).
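The outlier and gene-window steps can be sketched as follows. This is an illustration in Python under stated assumptions, not the procedure run in the study. It assumes that the 2.5% tails are taken by ranking the loadings on one axis, and that a gene counts as a candidate when the gap between the SNP and the gene interval on the same chromosome is below 50,000 bp. SNP identifiers, gene names, and coordinates are hypothetical.

```python
def tail_outliers(loadings, tail=0.025):
    """Return the SNP ids in the lower and upper `tail` fraction of loadings.

    loadings: dict mapping SNP id -> loading on a single RDA axis.
    """
    ranked = sorted(loadings, key=loadings.get)
    k = int(len(ranked) * tail)  # number of SNPs taken from each tail
    if k == 0:
        return set()
    return set(ranked[:k]) | set(ranked[-k:])


def genes_near(snp, genes, window=50_000):
    """List genes lying less than `window` bp from a SNP.

    snp: (chromosome, position).
    genes: iterable of (name, chromosome, start, end).
    """
    chrom, pos = snp
    hits = []
    for name, g_chrom, start, end in genes:
        if g_chrom != chrom:
            continue
        # Gap is 0 when the SNP falls inside the gene interval.
        gap = max(start - pos, pos - end, 0)
        if gap < window:
            hits.append(name)
    return hits


# Toy loadings for 200 hypothetical SNPs on one axis.
toy_loadings = {f"snp{i}": (i - 100) / 1000 for i in range(200)}
outliers = tail_outliers(toy_loadings)

# Hypothetical gene annotations: (name, chromosome, start, end).
toy_genes = [
    ("geneA", "chr1", 100_000, 110_000),
    ("geneB", "chr1", 180_000, 190_000),
    ("geneC", "chr2", 115_000, 125_000),
]
nearby = genes_near(("chr1", 120_000), toy_genes)
print(sorted(outliers), nearby)
```

With two significant axes, the same ranking would be applied to each axis in turn and the resulting sets combined.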