3 Results
After aligning markers common to all samples, 246 neutral markers (Hess et al., 2016) and up to 13 candidate markers from chromosome 28 (Table 1) were included in further analyses. A total of 9,471 individuals from 113 populations met the inclusion criteria (>90% of loci successfully genotyped and an estimated genotyping error of <0.5% based on replicate genotyping) and were included in this study.
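The call-rate part of the inclusion criteria can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names, the genotype coding (0/1/2 with None for a missing call), and the example individuals are all hypothetical, and the replicate-based genotyping error criterion is not modeled here.

```python
def call_rate(genotypes):
    """Fraction of loci with a non-missing genotype call (None = missing)."""
    if not genotypes:
        return 0.0
    called = sum(1 for g in genotypes if g is not None)
    return called / len(genotypes)


def passes_inclusion(genotypes, min_call_rate=0.90):
    """True when strictly more than min_call_rate of loci were genotyped."""
    return call_rate(genotypes) > min_call_rate


# Hypothetical individuals scored at 10 loci (illustrative data only).
individuals = {
    "ind_A": [0, 1, 2, 0, 1, 1, 2, 0, 0, 1],        # 10 of 10 loci called
    "ind_B": [0, None, 2, 0, None, 1, 2, 0, 0, 1],  # 8 of 10 loci called
}

# Keep only individuals whose call rate exceeds the 90% threshold.
retained = [name for name, g in individuals.items() if passes_inclusion(g)]
```

In this toy example only ind_A is retained, because ind_B was genotyped at 80% of loci.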
Population structure, as visualized by PCA of allele frequencies at neutral markers, indicated genetic divergence by geographic location (Figure 2). DAPC and ΔK revealed two genetic groupings that coincided with coastal and inland localities (Appendix S1 Figure S1). Most coastal collections, except for Mill Creek and Indian Creek, exhibited non-overlapping allele frequencies relative to all inland collections. The Klickitat River, which is located between coastal and inland populations, formed a cluster intermediate between the two population types. Inland collections from the Yakima and Clearwater rivers clustered distinctly from the other collections in the study (Figure 2).
A second PCA produced using candidate markers separated individuals according to the proportion of premature and mature migration genotypes (Figure 3). In contrast to the neutral-marker results, which separated individuals by sample location and population structure, the PCA with adaptive markers separated individuals by migration timing genotype. Cluster membership delineated via DAPC assuming K=2 grouped individuals from 25 putatively coastal lineage collections into one cluster and individuals from 90 putatively inland lineage collections into a second cluster (Appendix S1 Figure S1).
Candidate markers were analyzed for all sampling locations in Haploview with the solid spine method, which resulted in two haplotype blocks, one with markers 1-7 and another with markers 8-13 (Figure 4a). One haplotype block contained all markers within greb1L and the other included all or the majority of markers located within the intergenic region upstream of greb1L and rock1. There was one marker located within rock1, but it did not show LD as strong as that of the other markers in the second haplotype block. The intergenic haplotype block, containing markers 8-12, maintained high LD in both inland and coastal collections.
When haplotype blocks were examined separately for coastal (Figure 4b) and inland (Figure 4c) lineages, high LD was retained at markers 8-12 for both lineages. Additionally, minor allele frequencies (MAF) were lower for all inland markers except for candidate markers 8-12 (Appendix S1 Table S3; Figure S2). Variation in LD occurred among markers 1-7, and LD was stronger in the coastal lineage (Figure 4b-c). Elevated LD in the coastal lineage resulted in one haplotype block spanning markers 1-12 (Figure 4b). The solid spine analysis revealed three haplotype blocks in the inland lineage, which were split between markers 2 and 3 and between markers 7 and 8 (Figure 4c). The split between markers 7 and 8 in the inland lineage was at the same position as the split in all collections (Figure 4a), indicating that the split for all collections was influenced by the inland collections. Further, a greater divergence in average MAF values can be observed between markers 7 and 8 in the inland collections than in the coastal collections (Figure S2). Confidence intervals (0.95 upper, 0.7 lower; Gabriel et al. 2002) and the four gamete rule, which assumes recombination when all four possible haplotypes are detected at frequencies exceeding 0.01 (frequency > 0.02-0.03; Wang et al. 2002), were applied in further LD analyses. Haplotype blocks varied between analyses; the differences were the inclusion or exclusion of markers 1 and 13 and whether the split fell between markers 5 and 6 or between markers 7 and 8. The location of the split could be influenced by fixed alleles at markers 4, 6, and 7 in some collections (Appendix S1 Table S3). All Snake River collections were limited to markers 2, 3, 6, and 9 because these markers were developed earlier than the rest and were the only markers available for these collections. This resulted in limited data availability (4 instead of 13 candidate markers) for the farthest inland collections. Haploview analysis comparing lineages was done both with and without the individuals that were genotyped at only 4 of the 13 markers, and both analyses yielded the same results.
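The logic of the four gamete rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Haploview's implementation: the function name, the 0.01 default threshold (taken from the text), the assumption of phased biallelic markers, and the example haplotypes are all hypothetical.

```python
from collections import Counter


def four_gamete_test(haplotypes, i, j, min_freq=0.01):
    """Return True when all four two-locus haplotypes at markers i and j
    (0-based positions) occur at a frequency above min_freq, which the
    four gamete rule treats as evidence of historical recombination.
    Assumes phased haplotypes at biallelic markers."""
    pairs = Counter((h[i], h[j]) for h in haplotypes)
    total = sum(pairs.values())
    common = [p for p, n in pairs.items() if n / total > min_freq]
    return len(common) == 4


# Hypothetical phased two-marker haplotypes (illustrative data only).
no_recombinants = ["AC"] * 60 + ["GT"] * 40
with_recombinants = ["AC"] * 50 + ["GT"] * 40 + ["AT"] * 5 + ["GC"] * 5
rare_recombinants = ["AC"] * 500 + ["GT"] * 497 + ["AT"] * 2 + ["GC"] * 1

block_intact = not four_gamete_test(no_recombinants, 0, 1)
block_split = four_gamete_test(with_recombinants, 0, 1)
rare_ignored = not four_gamete_test(rare_recombinants, 0, 1)
```

In the third data set all four gametes are present, but the two recombinant haplotypes fall below the frequency threshold, so the marker pair is still treated as belonging to one block.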
We examined six combinations of markers to determine whether haplotype frequencies were similar across marker groups: a single marker (9), three markers (2, 3, 6), four markers (2, 3, 6, 9), five markers (8-12), six markers (2-7), and 11 markers (2-12). In general, all six combinations provided similar haplotype frequencies, with associated haplotypes differing by only 1-7% (Figure S3). The groups with the most similar haplotype frequencies were marker 9 alone and markers 8-12, followed by markers 2, 3, and 6 and markers 2-7. Markers 2, 3, 6, and 9 and markers 2-12 had similar average genotype frequencies (Figure S3).
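Comparing marker combinations amounts to tabulating haplotype frequencies after restricting each haplotype to a subset of marker positions. The Python sketch below shows that step only; the function name, the "P"/"M" allele coding (premature- and mature-associated), and the ten example haplotypes are hypothetical and are not data from this study.

```python
from collections import Counter


def haplotype_frequencies(haplotypes, markers):
    """Frequencies of haplotypes restricted to the given 1-based marker
    positions, returned as a dict mapping sub-haplotype to frequency."""
    sub = Counter("".join(h[m - 1] for m in markers) for h in haplotypes)
    total = sum(sub.values())
    return {hap: n / total for hap, n in sub.items()}


# Hypothetical phased haplotypes across 13 candidate markers.
haplotypes = (
    ["M" * 13] * 6            # mature-associated allele at every marker
    + ["P" * 13] * 3          # premature-associated allele at every marker
    + ["MM" + "P" * 11] * 1   # mixed haplotype: markers 1-2 differ
)

freq_marker_9 = haplotype_frequencies(haplotypes, [9])
freq_markers_2_3_6 = haplotype_frequencies(haplotypes, [2, 3, 6])
```

Here marker 9 alone gives two haplotypes (M at 0.6, P at 0.4), whereas markers 2, 3, and 6 resolve the mixed haplotype separately (MMM at 0.6, PPP at 0.3, MPP at 0.1), which is the kind of difference between marker groups that such a comparison is meant to expose.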
Average genotype proportions were mapped across all collection locations with markers 2, 3, 6, and 9 because all collections were genotyped at these markers (Figure 5). The most common genotype among sampled individuals was the mature genotype. The mature genotype was predominant throughout much of the range in the Columbia River; however, many populations west of the Cascade Mountains and in the Salmon River had greater proportions of the premature genotype than other collections (Figure 5). Overall, only 9 of the 113 populations had a higher frequency of the premature alleles associated with early migration.
To evaluate haplotype frequencies for a single haplotype block in as many locations as possible, we further scrutinized haplotypes for markers 2, 3, and 6 across the landscape and found five unique haplotypes (Figure S4a). Haplotype frequencies for collections (Figure S4a) showed patterns of geographic distribution similar to those of the genotype frequencies (Figure 5), but with improved resolution for heterozygous haplotypes within a single haplotype block underlying greb1L. According to the overall haplotype frequencies (Figure S4a), the heterozygote haplotype 4 is present more frequently than the premature haplotype 5. Additionally, there is a distinct separation of heterozygote haplotypes between coastal (haplotypes 2 and 3) and inland (haplotype 4) collections (Figure S4a).
To model the effects of significant environmental variables on allele frequencies of markers associated with migration timing, RDAs were performed for all Columbia River basin collections and then separately for coastal and inland lineage collections. Significant environmental variables retained in the RDA for all collections were migration distance, minimum temperature of the warmest month, 20-year average August water temperature, annual mean temperature, isothermality, and annual precipitation (Figure 6). Annual precipitation had the greatest effect when all collections were analyzed together (Figure 6). Environmental variables retained in the coastal lineage RDA were average temperature of the coldest quarter and precipitation of the wettest month (Figure S5a). Environmental variables retained in the inland lineage RDA were 20-year average August water temperature and minimum temperature of the warmest month (Figure S5b).