Environment-associated loci (RDA)
We used redundancy analysis (RDA) to detect candidate genes under environmental selection in C. gr. laevigata. RDA can be used as a genotype-environment association (GEA) method that detects loci under selection through multivariate ordination (Forester et al., 2016). It determines how groups of loci (here, together with chromosome number) covary in response to the multivariate environment, and it can detect processes that leave weak, multilocus molecular signatures (Rellstab et al., 2015; Forester et al., 2018). Compared with other methods, RDA has shown a superior combination of low false-positive and high true-positive rates across a variety of selection scenarios (Forester et al., 2018). A further advantage of RDA is that it can analyze many loci and environmental predictors simultaneously.
For RDA, we retained only SNP sites with sequence coverage in at least 100 individuals and for which the less common allele was present in at least 10% of sampled individuals [minor allele frequency (MAF) filter ≥0.10]. Chromosome number was added as an additional column to the SNP data set. As environmental predictors, we used bioclimatic variables from the WorldClim 2 database (Fick & Hijmans, 2017). Because RDA is a regression-based method, it is sensitive to strong correlations among predictors; we therefore removed predictors correlated at |r| > 0.7, guiding the choice of which variable to keep by an ecological interpretation of the relevance of each candidate predictor. The variable reduction protocol had two steps. First, we clustered the variables with the complete linkage method of the ‘hclust’ function in R (R Core Team, 2021), using one minus the absolute correlation (1 − |r|) as the distance between variables. Second, from each cluster of variables separated by a distance lower than 0.3 (i.e., |r| > 0.7), we retained a single variable, favoring quarterly over monthly variables.
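The SNP filter and the correlation-based reduction of predictors can be sketched in plain Python. This is a minimal illustration, not the authors' R pipeline: it assumes diploid genotypes coded as 0/1/2 alternative-allele counts with None for missing calls, computes MAF from allele frequencies among called genotypes, and re-implements complete-linkage clustering on 1 − |r| with a cut at 0.3 in the spirit of ‘hclust’. The variable values are toy data, and the `priority` list is a hypothetical stand-in for the ecological judgement used to pick one variable per cluster.

```python
from itertools import combinations
from math import sqrt


def filter_snps(genotypes, min_called=100, min_maf=0.10):
    """Keep SNPs called in >= min_called individuals with MAF >= min_maf.

    genotypes: dict mapping a SNP id to per-individual alternative-allele
    counts (0, 1, 2; diploid coding assumed) or None for a missing call.
    """
    kept = []
    for snp_id, calls in genotypes.items():
        called = [g for g in calls if g is not None]
        if len(called) < min_called:
            continue  # insufficient coverage
        alt_freq = sum(called) / (2 * len(called))
        if min(alt_freq, 1 - alt_freq) >= min_maf:
            kept.append(snp_id)
    return kept


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def complete_linkage_groups(names, dist, cut):
    """Merge clusters greedily under complete linkage until the closest
    pair is no longer below `cut` (akin to cutting an hclust tree at h=cut)."""
    clusters = [[name] for name in names]
    while len(clusters) > 1:
        # Complete linkage: cluster distance = largest pairwise distance.
        d, ia, ib = min(
            (max(dist[a, b] for a in clusters[i] for b in clusters[j]), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )
        if d >= cut:
            break
        merged = clusters.pop(ib)  # ib > ia, so index ia stays valid
        clusters[ia].extend(merged)
    return clusters


def reduce_predictors(env, cut=0.3, priority=None):
    """Return one retained variable per cluster of correlated predictors.

    env: dict mapping a variable name to its values at the sampling sites.
    priority: hypothetical ordering used to pick each cluster's representative.
    """
    names = list(env)
    dist = {(a, b): 1 - abs(pearson(env[a], env[b])) for a in names for b in names}
    groups = complete_linkage_groups(names, dist, cut)
    rank = {name: i for i, name in enumerate(priority or names)}
    return sorted(min(g, key=lambda name: rank.get(name, len(rank))) for g in groups)


# Toy values at five hypothetical sites.
env = {
    "bio5": [24, 26, 28, 30, 32],        # max temperature of warmest month
    "bio10": [15, 17, 19, 21, 24],       # mean temperature of warmest quarter
    "bio12": [800, 500, 900, 600, 700],  # annual precipitation
}
print(reduce_predictors(env, priority=["bio10", "bio5", "bio12"]))  # ['bio10', 'bio12']
```

Here bio5 and bio10 are correlated at r ≈ 0.996 and fall into one cluster, and the quarterly bio10 is kept because it is ranked first, mirroring the preference for quarterly over monthly variables.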
We also checked for multicollinearity using variance inflation factors (VIF) and confirmed that all selected variables had VIF < 10. The statistical significance of the environmental variables and of the RDA axes used in the models was assessed with permutation tests (1000 permutations). To identify candidate SNPs for local adaptation, we used the SNP loadings in the ordination space (stored as species scores in the RDA object in R), extracted from the two significant constrained axes. Outlier SNPs were defined as those with the most extreme loadings along the significant RDA axes, i.e., in the lower and upper 2.5% tails of the loading distribution (Capblancq et al., 2018), and were considered putatively strongly associated with the environmental variables. We then examined the correlations between each outlier SNP and the environmental predictors. Finally, we plotted the SNPs in the ordination space, color-coded by the predictor variable with which each is most strongly correlated. Outlier SNPs showing the same association signal as chromosome number were selected for functional annotation, which could provide additional evidence of their role in environmental adaptation. Based on their genomic positions, we identified candidate genes potentially under environmental selection located less than 50 kbp upstream or downstream of each outlier SNP in the annotated Carex scoparia genome (Planta et al., 2022).
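The outlier and annotation steps can be sketched in plain Python as well. This sketch assumes the SNP loadings have already been extracted from the fitted RDA, one axis at a time, as a dict of SNP id to loading; it takes the 2.5% tails as values beyond the 2.5th and 97.5th percentiles from `statistics.quantiles`. The gene table format `(gene_id, chrom, start, end)`, the gene and SNP names, and all values are hypothetical toy data, not the C. scoparia annotation.

```python
from math import sqrt
from statistics import quantiles


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def tail_outliers(loadings, n=40):
    """SNP ids whose loading on one RDA axis falls in either tail.

    With n=40 quantile groups, the first and last cut points are the
    2.5th and 97.5th percentiles, i.e. the lower and upper 2.5% tails.
    """
    cuts = quantiles(loadings.values(), n=n, method="inclusive")
    lo, hi = cuts[0], cuts[-1]
    return {snp for snp, value in loadings.items() if value < lo or value > hi}


def strongest_predictor(genotype, env):
    """Name of the predictor with the largest |r| against one SNP's genotypes."""
    return max(env, key=lambda name: abs(pearson(genotype, env[name])))


def genes_near(chrom, pos, genes, window=50_000):
    """Genes on `chrom` lying less than `window` bp up- or downstream of `pos`.

    genes: iterable of (gene_id, chrom, start, end) tuples (hypothetical format).
    """
    return [gid for gid, c, start, end in genes
            if c == chrom and start - window < pos < end + window]


# Toy loadings on one significant axis: 40 hypothetical, evenly spaced SNPs.
axis1 = {f"snp{i}": i - 19.5 for i in range(40)}
print(sorted(tail_outliers(axis1)))  # ['snp0', 'snp39']

env = {"bio10": [15, 17, 19, 21, 24], "bio12": [800, 500, 900, 600, 700]}
print(strongest_predictor([0, 0, 1, 2, 2], env))  # bio10

genes = [("gene_A", "chr1", 100_000, 105_000),
         ("gene_B", "chr1", 200_000, 210_000),
         ("gene_C", "chr2", 100_000, 101_000)]
print(genes_near("chr1", 140_000, genes))  # ['gene_A']
```

Because the cutoff is a fixed percentile, roughly 5% of SNPs are flagged on every axis whatever the strength of the underlying signal, which is one reason such outliers are treated only as putative candidates.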