2.3. Data analysis
2.3.1. Genetic Diversity and Genetic Structure
The
AFLP bands were scored as present (1) or absent (0),and the AFLP band
data were transferred to a binary (1/0) data matrix for further
analysis. Shannon’s information index (I ) (RC, 1972), percentage
of polymorphic loci (PPL ), and genetic distance were estimated
using the POPGENE v1.31 (Yeh F,
1997).Total
genetic diversity (HT ) and mean genetic diversity
within populations (HS ) were calculated using
Nei’s(Nei, 1973) genetic diversity statistics.
A UPGMA tree based on genetic
distance(Nei, 1978) was performed among different populations to
identify their genetic relationships using NTsyspc v2.02 (Rohlf, 1997).
Population genetic structure was
further assessed using model-based Bayesian assignment as implemented in
STRUCTURE 2.3.4 software(Pritchard,
Stephens, & Donnelly, 2000).
Clustering of individuals was
constructed
without using the geographical origin of the samples as an informative
prior.
Analyses
were based on an admixture ancestry model with correlated allele
frequencies for a range of K genetic clusters from 1 to 36, with 20
replicates per K. The analyses were performed with a burn-in period and
a run length of the Monte Carlo Markov chain (MCMC), of 10000 and 100000
iterations, respectively. The most likely number of genetic clusters (K)
was
estimated by the ΔK statistic (Evanno, Regnaut, & Goudet, 2005), using
Structure Harvester (Earl & Vonholdt, 2012). Then
CLUMPP
1.1.2(Jakobsson & Rosenberg, 2007) was used to align the 20 runs of the
most representative K value and to compute the pairwise symmetric
similarity coefficients (G) between pairs of runs, the average pairwise
similarity (H) for the 20 replicates and the average probability of
belonging to each cluster (Q). For K = 2, the full search method with
1000 replicates was used. A hierarchical analysis of molecular variance
(AMOVA) was used to determine genetic differentiation
(FST) within and among the groups. A nested AMOVA
taking into consideration the two main genetic groups resulting from the
Bayesian clustering with STRUCTURE (K = 2). using ARLEQUIN version
3.5.1.2(L Excoffier & Lischer, 2010).
2.3.2. Outlier Detection
Two
complementary methods were used to detect outlier loci of all
populations of R. aureum. BayeScan version 2.1. identifies
markers with unusually high or low levels of genetic differentiation as
outliers that have signatures of
diversifying or balancing selection, respectively (Foll & Gaggiotti,
2008). Specifically, selection is
concluded from an AFLP marker if the marker-specific estimates ofFST are needed in addition to population-specific
estimates to interpret observed patterns of differentiation in the
dataset(Fischer, Foll, Excoffier, & Heckel, 2011). A threshold value
for determining loci under selection was estimated according to
Jeffreys’ interpretation(Harris, 1961), i.e., log10 PO
> 2.0 was regarded as adequate evidence for selection. A
threshold of log10 PO > 2.0 was employed to reject the null
hypothesis in each of the conducted tests. The analysis was performed
with 20 pilot runs and a 50,000 step burn-in followed by 50,000
iterations and a thinning interval of 10 for the set of polymorphic AFLP
markers. The false discovery rate (FDR ) calculated the expected
proportion of false positives for statistically significant results(Foll
& Gaggiotti, 2008). We used the software Dfdist to simulate a null
distribution of FST values under an island model,
which
was relative insensitive to demographic structure, population structure
and mutation level. Simulations were conducted with a meanFSTequal to the trimmed mean FST , which was
calculated by excluding 30% of the most extremeFST values observed in the empirical dataset. We
analysed the distributions of the FST values over
all loci to null hypothesis of neutral evolution. Loci with a high or
low FST value were taken as potentially under
selection.
In
the study, we simulated the neutral distribution ofFST with 10,000 iterations at the 99.9%
confidence intervals.
2.3.3. Environmental data and correlation analyses
To characterize environmental differences, BIOCLIM variables were
obtained for each of the 36 sites by extrapolating climate data to the
GPS coordinates for each population using DIVA-GIS software (Hijmans,
Cameron, Parra, Jones, & Jarvis, 2005; Hijmans, Guarino, Cruz, &
Rojas, 2001). The BIOCLIM dataset includes 19 variables that describe
monthly temperature and precipitation patterns for a spatial resolution
of 1 km2(http://www.worldclim.org/).
The sampled area spanned all the known R. aureum range in China,
which occurs primarily in the alpine, along an annual mean precipitation
gradient from 761-1096 mm and annual mean temperature gradient from
-6.6°C to
0.1°C.
Elevation, slope, and topographic index were derived from a 90m digital
elevation model. The data set is provided by International Scientific &
Technical Data Mirror Site, Computer Network Information Center, Chinese
Academy of Sciences
(http://www.gscloud.cn). A
principal component analysis (PCA) was applied to these environmental
variables to examine possible correlations between eco-climatic
variables and elevation and remove redundant variables (i.e. variables
that were correlated at |r| > 0.8 and
which were logically related). We first identified variables correlated
to each retained axis, creating groups of variables. Within each group,
we kept only one (or two) variables considered to be the most pertinent
in terms of local adaptation in plants. Finally,
eleven
factors (Bio 1, annual mean temperature; Bio 2, mean diurnal range; Bio
3, Isothermality; Bio 4, temperature
seasonality; Bio 5, max temperature of warmest month; Bio 9, mean
temperature of driest quarter; Bio 16, precipitation of wettest quarter;
Bio 17, precipitation of driest quarter; slp, slope; asp, aspect; tpi,
topographic position index) were chosen as representative of environment
factors.
Prior to subsequent analyses, environment data
were
log10(x+1)
transformed to improve normality and reduce heteroscedasticity.
Dissimilarity matrices of Euclidean distances were calculated among
normalized environment variables using R package. A matrix of geographic
distances among sites was generated from GPS coordinates with the AFLP
data in
R
software(Ehrich, 2006) and also log10(x+1) transformed.
The genetic distance matrices of R. aureum was calculated with
PopGene(Yeh F, 1997) using the Jaccard dissimilarity method of the
scored bands, were tested against the Euclidean distance of the
environment variables while controlling of the geographic distance
matrix. The Mantel test was performed using the default preset values
and parameters (999 permutation steps) according to the software manual
using R program “vegan” package. Multiple matrix regression with
randomization (MMRR) is a novel and robust approach for estimating the
independent effects of potential factors, and the analysis was
implemented with 10,000 permutations in R with the MMRR function
script(Goslee & Urban, 2007; I. J. Wang, 2013; Wu, Yu, Wang, Li, & Xu,
2015).
Then, to detect relationships between allele frequencies and
environmental variables, we applied
Multiple
Linear Regression (MLR)(Zulliger, Schnyder, & Gugerli, 2013) in R
v.3.3.3 to determine potential adaptive loci that are under selection
from existing environmental factors. For outlier loci
identified
by both Dfdist and BayeScan, we estimated their population pairwise
frequencies of AFLP alleles at the 36 sites. We selected values of the
eleven selected environmental factors from each population. We then
regressed the allele frequencies of the
retained
outlier loci (dependent variables) on the selected environmental
variables (explanatory variables; standardized) using the MLR model.
Potential adaptive loci were identified as
Radj2> 0.5.
Univariate regressions were then
applied to each variable individually to estimate its significance.
2.3.4. Species distribution model (SDM)
We used
Maxent
3.3.3k to predict distribution changes for R. aureum as a result
of climate changing.
Maxent
is a program for maximum entropy modelling of the geographical
distributions of species; it combines presence-only data with
ecological-climatic layers to predict suitable areas (Phillips SJ, 2006;
Phillips & Dudik, 2008). For current distribution, we downscaled
climate grids for the periods 1970–2000. In addition to sample
locations in this study, we also
collected
the distribution records of R. aureum from the Chinese Virtual
Herbarium
(http://www.cvh.ac.cn/ ).
After removing duplicate records, it remained a total of 42 records ofR. aureum (Table
B )that were used to establish the
distribution model by using 19 bio-climatic data layers from the
WorldClim database. Most of the default parameters of Maxent were
applied to conduct SDM, except the following user-selected parameters:
application of random seed and random test percentage of 70%,
replicates of 10 and bootstrap as the replicated run type. The logistic
output of Maxent consists of a grid map with each cell having an index
of suitability between 0 and 1. Low values show that the conditions are
unsuitable for the species to live, whereas high values show that the
conditions are more suitable.
Model
predictions were visualized using DIVA-GIS(Hijmans et al., 2001).
To
obtain the distribution of R. aureum at the
Last
Glacial Maximum,
we
projected
correlation between current species-climate and the LGM using the
Community Climate System Model
(CCSM4)
scaled down to a 2.5-arcmin resolution. We used the Hadley Global
Environment Model 2 (HadGEM2-ES) as a general circulation model under
two climate scenarios (IPCC-CMIP5 RCP 26/85) to ensure the accuracy of
assessment. The RCP 85
scenario
represents a higher predicted greenhouse gas emission than RCP 26.
3. Results
3.1 Patterns of AFLP variation and population structure 3.1.1.
Subsubsection
The ten AFLP primer combinations generated 449 unambiguously
scorable
bands ranging from 1500 to 100 bp in 461 individuals from 36 natural
populations. Of these fragments, 99.777% (448) were
polymorphic.
The overall Shannon’s Information index
(I )
and Nei’s genetic diversity (H ) was 0.584 and 0.402 respectively.R.
aureum showed high genetic diversity at the species level. Iranged from 0.014 to 0.329 and H ranged from 0.01 to 0.228 within
population (Table 1) .
The
genetic diversity of most populations is at
moderate
level; but there is still a low genetic diversity of some populations,
such as population N7.
Bayesian clustering analyses performed with the software STRUCTURE
indicated that the most informative representation of overall genetic
structure was achieved for K = 2, where K is the number of
subpopulations. The southern slope populations (populations S1–S6) and
the western populations on the U-shaped valley of the northern slope
(populations NW7 and NW8) formed cluster 2, while populations NW9 and
NW10 showed a high degree of
membership to cluster 2. Populations L1 and L2, which are not on the
prominent peak of Changbai Mountain and populations from the western
slope region showed that, with exception of populations NW3 and NW4,
individuals had a high level of admixture.
The
remaining populations displayed a high degree of membership to cluster 1