2.3 Molecular analyses
Putatively neutral markers were assessed using a combination of multivariate methods to detect underlying population structure, which we expected to coincide with coastal and inland lineages described in previous studies. A principal component analysis (PCA) was plotted for all populations based on allele frequencies of putatively neutral markers determined to be without linkage disequilibrium (LD). The PCA of putatively neutral markers accompanies a discriminate analysis of principal components (DAPC) with the R package adegenet 2.1.0 to assign probability of individual membership to genetic groups, K , revealed with the putative neutral marker PCA (Jombart, 2008; Jombart & Ahmed, 2011). The DAPC recovers maximum genetic variation between groups, while minimizing genetic variation within groups (Jombart, 2008; Jombart & Ahmed, 2011). The ‘find.clusters’ adegenet function ran for 25 instances for K =1 through K =10. The Bayesian information criterion (BIC) was averaged and scaled by the standard deviation for each K value. The most appropriate number of genetic groups was determined with the greatest ΔK value as described in Evanno et al. (2005).
The distribution of genetic variation underlying adult migration timing in steelhead across the landscape was described by genotype frequencies. We examined 13 markers occurring on chromosome 28 within thegreb1L, rock1 , and intergenic region between greb1L androck1 that were previously shown to be strongly associated with adult migration timing (Hess et al., 2016; Micheletti et al., 2018a; Table 1). Initially the two most significant SNPs were retained from a previous RAD study (Hess et al. 2016), and the remaining 11 with the strongest association with migration timing from the whole genome resequencing conducted by Micheletti et al. (2018a).To reduce ascertainment bias, we examined genetic variation in this candidate region from several populations of O. mykiss in the region to design primers (Table 1). Premature, mature, and heterozygote genotypes for migration timing were established based on genotype association from previous studies (Hess et al., 2016; Micheletti et al., 2018a), as well as using reference Skamania broodstock genotyped, which is a hatchery-strain intensively selected for early migration and cultured since 1956 with steelhead from the Washougal and Klickitat Rivers (Chilcote et al., 1986). Premature, mature, and heterozygote migration timing genotype proportions were assessed across all collection locations. A PCA of allele frequencies of adaptive markers was also conducted for all collection locations to assess genetic groupings based on migration timing.
We assessed linkage disequilibrium (LD) within the 13 candidate markers to identify haplotype blocks that would be informative for estimating frequencies of migration types. Haplotype blocks within the 13 candidate markers were defined with solid spine LD analysis in the Java Runtime Environment software, Haploview 4.2, across all collection locations (Barrett et al., 2005). A solid spine of LD was extended across a haploblock if D’, or a normalization of the coefficient of LD, exceeded 0.74. The same markers were assessed for LD in individuals from coastal and inland lineages (as delineated by DAPC) separately. The effect of population structure on the LD of the markers was corrected in the analysis with the LDcorSV 1.3.2 R package (Mangin et al., 2012; Appendix S1 Table S2). Variation of genotype proportions were also evaluated with various groupings of the candidate markers.
Redundancy analyses (RDA) were conducted for all Columbia River basin collections to model the degree to which the variation in environmental variables explained the variation in allele frequencies of candidate markers included in the haplotype blocks (Borcard et al., 1992; Kierepka & Latch, 2015). Redundancy analysis was performed on two sets of collections, all populations and each lineage (coastal vs. inland) using the R package Vegan 2.5-6 (Oksanen et al., 2019). We used environmental variables that were significantly associated with adaptive genetic variation in a previous study (Micheletti et al., 2018b; Table 2; Appendix S1 Table S3). When two highly correlated (>0.75 pairwise correlation; Asuero et al., 2006) environmental variables were identified, one was removed from further analyses and the variable kept was determined from biological relevance to salmonids according to previous studies (Olsen et al., 2011; Hecht et al., 2015; Micheletti et al., 2018b). One-way analysis of variance (ANOVA) with a Tukey’s range test (Tukey, 1949) identified significant variability in salmonid habitat. Environmental variables were analyzed with the “envfit” PCA function of the vegan R package. The ANOVA test and PCA together determined significant environmental variables within and among O. mykiss habitats measured in this study. The final RDAs were run with significant environmental variables retained from permutation tests with 1000 permutations (α=0.05). Frequency of alleles in the haplotype block associated with migration timing were correlated to environmental variables with RDA constraint scores. Constraint scores indicated the degree of correlation and whether there was a positive or negative relationship between environmental variables and allelic frequencies.