2.3. Data analysis
2.3.1. Genetic Diversity and Genetic Structure
The AFLP bands were scored as present (1) or absent (0),and the AFLP band data were transferred to a binary (1/0) data matrix for further analysis. Shannon’s information index (I ) (RC, 1972), percentage of polymorphic loci (PPL ), and genetic distance were estimated using the POPGENE v1.31 (Yeh F, 1997).Total genetic diversity (HT ) and mean genetic diversity within populations (HS ) were calculated using Nei’s(Nei, 1973) genetic diversity statistics. A UPGMA tree based on genetic distance(Nei, 1978) was performed among different populations to identify their genetic relationships using NTsyspc v2.02 (Rohlf, 1997).
Population genetic structure was further assessed using model-based Bayesian assignment as implemented in STRUCTURE 2.3.4 software(Pritchard, Stephens, & Donnelly, 2000). Clustering of individuals was constructed without using the geographical origin of the samples as an informative prior. Analyses were based on an admixture ancestry model with correlated allele frequencies for a range of K genetic clusters from 1 to 36, with 20 replicates per K. The analyses were performed with a burn-in period and a run length of the Monte Carlo Markov chain (MCMC), of 10000 and 100000 iterations, respectively. The most likely number of genetic clusters (K) was estimated by the ΔK statistic (Evanno, Regnaut, & Goudet, 2005), using Structure Harvester (Earl & Vonholdt, 2012). Then CLUMPP 1.1.2(Jakobsson & Rosenberg, 2007) was used to align the 20 runs of the most representative K value and to compute the pairwise symmetric similarity coefficients (G) between pairs of runs, the average pairwise similarity (H) for the 20 replicates and the average probability of belonging to each cluster (Q). For K = 2, the full search method with 1000 replicates was used. A hierarchical analysis of molecular variance (AMOVA) was used to determine genetic differentiation (FST) within and among the groups. A nested AMOVA taking into consideration the two main genetic groups resulting from the Bayesian clustering with STRUCTURE (K = 2). using ARLEQUIN version 3.5.1.2(L Excoffier & Lischer, 2010).
2.3.2. Outlier Detection
Two complementary methods were used to detect outlier loci of all populations of R. aureum. BayeScan version 2.1. identifies markers with unusually high or low levels of genetic differentiation as outliers that have signatures of diversifying or balancing selection, respectively (Foll & Gaggiotti, 2008). Specifically, selection is concluded from an AFLP marker if the marker-specific estimates ofFST are needed in addition to population-specific estimates to interpret observed patterns of differentiation in the dataset(Fischer, Foll, Excoffier, & Heckel, 2011). A threshold value for determining loci under selection was estimated according to Jeffreys’ interpretation(Harris, 1961), i.e., log10 PO > 2.0 was regarded as adequate evidence for selection. A threshold of log10 PO > 2.0 was employed to reject the null hypothesis in each of the conducted tests. The analysis was performed with 20 pilot runs and a 50,000 step burn-in followed by 50,000 iterations and a thinning interval of 10 for the set of polymorphic AFLP markers. The false discovery rate (FDR ) calculated the expected proportion of false positives for statistically significant results(Foll & Gaggiotti, 2008). We used the software Dfdist to simulate a null distribution of FST values under an island model, which was relative insensitive to demographic structure, population structure and mutation level. Simulations were conducted with a meanFSTequal to the trimmed mean FST , which was calculated by excluding 30% of the most extremeFST values observed in the empirical dataset. We analysed the distributions of the FST values over all loci to null hypothesis of neutral evolution. Loci with a high or low FST value were taken as potentially under selection. In the study, we simulated the neutral distribution ofFST with 10,000 iterations at the 99.9% confidence intervals.
2.3.3. Environmental data and correlation analyses
To characterize environmental differences, BIOCLIM variables were obtained for each of the 36 sites by extrapolating climate data to the GPS coordinates for each population using DIVA-GIS software (Hijmans, Cameron, Parra, Jones, & Jarvis, 2005; Hijmans, Guarino, Cruz, & Rojas, 2001). The BIOCLIM dataset includes 19 variables that describe monthly temperature and precipitation patterns for a spatial resolution of 1 km2(http://www.worldclim.org/). The sampled area spanned all the known R. aureum range in China, which occurs primarily in the alpine, along an annual mean precipitation gradient from 761-1096 mm and annual mean temperature gradient from -6.6°C to 0.1°C. Elevation, slope, and topographic index were derived from a 90m digital elevation model. The data set is provided by International Scientific & Technical Data Mirror Site, Computer Network Information Center, Chinese Academy of Sciences (http://www.gscloud.cn). A principal component analysis (PCA) was applied to these environmental variables to examine possible correlations between eco-climatic variables and elevation and remove redundant variables (i.e. variables that were correlated at |r| > 0.8 and which were logically related). We first identified variables correlated to each retained axis, creating groups of variables. Within each group, we kept only one (or two) variables considered to be the most pertinent in terms of local adaptation in plants. Finally, eleven factors (Bio 1, annual mean temperature; Bio 2, mean diurnal range; Bio 3, Isothermality; Bio 4, temperature seasonality; Bio 5, max temperature of warmest month; Bio 9, mean temperature of driest quarter; Bio 16, precipitation of wettest quarter; Bio 17, precipitation of driest quarter; slp, slope; asp, aspect; tpi, topographic position index) were chosen as representative of environment factors.
Prior to subsequent analyses, environment data were log10(x+1) transformed to improve normality and reduce heteroscedasticity. Dissimilarity matrices of Euclidean distances were calculated among normalized environment variables using R package. A matrix of geographic distances among sites was generated from GPS coordinates with the AFLP data in R software(Ehrich, 2006) and also log10(x+1) transformed. The genetic distance matrices of R. aureum was calculated with PopGene(Yeh F, 1997) using the Jaccard dissimilarity method of the scored bands, were tested against the Euclidean distance of the environment variables while controlling of the geographic distance matrix. The Mantel test was performed using the default preset values and parameters (999 permutation steps) according to the software manual using R program “vegan” package. Multiple matrix regression with randomization (MMRR) is a novel and robust approach for estimating the independent effects of potential factors, and the analysis was implemented with 10,000 permutations in R with the MMRR function script(Goslee & Urban, 2007; I. J. Wang, 2013; Wu, Yu, Wang, Li, & Xu, 2015).
Then, to detect relationships between allele frequencies and environmental variables, we applied Multiple Linear Regression (MLR)(Zulliger, Schnyder, & Gugerli, 2013) in R v.3.3.3 to determine potential adaptive loci that are under selection from existing environmental factors. For outlier loci identified by both Dfdist and BayeScan, we estimated their population pairwise frequencies of AFLP alleles at the 36 sites. We selected values of the eleven selected environmental factors from each population. We then regressed the allele frequencies of the retained outlier loci (dependent variables) on the selected environmental variables (explanatory variables; standardized) using the MLR model. Potential adaptive loci were identified as Radj2> 0.5. Univariate regressions were then applied to each variable individually to estimate its significance.
2.3.4. Species distribution model (SDM)
We used Maxent 3.3.3k to predict distribution changes for R. aureum as a result of climate changing. Maxent is a program for maximum entropy modelling of the geographical distributions of species; it combines presence-only data with ecological-climatic layers to predict suitable areas (Phillips SJ, 2006; Phillips & Dudik, 2008). For current distribution, we downscaled climate grids for the periods 1970–2000. In addition to sample locations in this study, we also collected the distribution records of R. aureum from the Chinese Virtual Herbarium (http://www.cvh.ac.cn/ ). After removing duplicate records, it remained a total of 42 records ofR. aureum (Table B )that were used to establish the distribution model by using 19 bio-climatic data layers from the WorldClim database. Most of the default parameters of Maxent were applied to conduct SDM, except the following user-selected parameters: application of random seed and random test percentage of 70%, replicates of 10 and bootstrap as the replicated run type. The logistic output of Maxent consists of a grid map with each cell having an index of suitability between 0 and 1. Low values show that the conditions are unsuitable for the species to live, whereas high values show that the conditions are more suitable. Model predictions were visualized using DIVA-GIS(Hijmans et al., 2001).
To obtain the distribution of R. aureum at the Last Glacial Maximum, we projected correlation between current species-climate and the LGM using the Community Climate System Model (CCSM4) scaled down to a 2.5-arcmin resolution. We used the Hadley Global Environment Model 2 (HadGEM2-ES) as a general circulation model under two climate scenarios (IPCC-CMIP5 RCP 26/85) to ensure the accuracy of assessment. The RCP 85 scenario represents a higher predicted greenhouse gas emission than RCP 26.
3. Results
3.1 Patterns of AFLP variation and population structure 3.1.1. Subsubsection
The ten AFLP primer combinations generated 449 unambiguously scorable bands ranging from 1500 to 100 bp in 461 individuals from 36 natural populations. Of these fragments, 99.777% (448) were polymorphic. The overall Shannon’s Information index (I ) and Nei’s genetic diversity (H ) was 0.584 and 0.402 respectively.R. aureum showed high genetic diversity at the species level. Iranged from 0.014 to 0.329 and H ranged from 0.01 to 0.228 within population (Table 1) . The genetic diversity of most populations is at moderate level; but there is still a low genetic diversity of some populations, such as population N7.
Bayesian clustering analyses performed with the software STRUCTURE indicated that the most informative representation of overall genetic structure was achieved for K = 2, where K is the number of subpopulations. The southern slope populations (populations S1–S6) and the western populations on the U-shaped valley of the northern slope (populations NW7 and NW8) formed cluster 2, while populations NW9 and NW10 showed a high degree of membership to cluster 2. Populations L1 and L2, which are not on the prominent peak of Changbai Mountain and populations from the western slope region showed that, with exception of populations NW3 and NW4, individuals had a high level of admixture. The remaining populations displayed a high degree of membership to cluster 1