Association testing
Genotyping-in-thousands by sequencing (GT-seq; Campbell et al. 2015) was employed to genotype 308 genetic markers for the association testing analyses. The 308 GT-seq loci were a subset of markers developed from the paired-end consensus reads of the Hess et al. (2013) RAD-seq dataset. The selection of loci and the steps in their development are described in detail in Supplemental Materials. Locus selection began with a group of 457 SNP loci considered in round 1, which included 120 that had already been designed for TaqMan assays (Hess et al. 2015). Final optimization left the 308 loci that worked best in GT-seq genotyping. For all samples used in the association testing below, we filtered out individuals missing >10% of genotypes at the 308 loci. Excluding the four species-diagnostic loci and two duplicated loci provided 302 unique loci for association tests.
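The individual-level missing-data filter can be illustrated with a minimal Python sketch. This is not the pipeline used in the study: the function name, the data layout (a dictionary of genotype calls per individual), the use of None for a failed call, and the example individuals are all assumptions made for illustration. Only the >10% threshold and the 308-locus panel size come from the text.

```python
# Hypothetical illustration only: the data layout and names are assumptions.
MISSING = None  # assumed placeholder for a failed genotype call


def filter_individuals(genotypes, max_missing=0.10):
    """Drop individuals missing more than max_missing of their genotypes.

    genotypes: dict mapping individual ID -> list of calls, one per locus.
    Returns a new dict containing only the retained individuals.
    """
    kept = {}
    for ind, calls in genotypes.items():
        n_missing = sum(1 for g in calls if g is MISSING)
        # Individuals missing >10% of genotypes are filtered out.
        if n_missing / len(calls) <= max_missing:
            kept[ind] = calls
    return kept


n_loci = 308  # panel size stated in the text
example = {
    "ind_A": ["AA"] * n_loci,                          # 0 missing
    "ind_B": [MISSING] * 30 + ["AG"] * (n_loci - 30),  # 30/308, about 9.7%
    "ind_C": [MISSING] * 31 + ["GG"] * (n_loci - 31),  # 31/308, about 10.1%
}
kept = filter_individuals(example)
print(sorted(kept))  # ['ind_A', 'ind_B']
```

With 308 loci, the threshold falls between 30 and 31 missing genotypes, so an individual with 31 or more failed calls would be excluded under this reading of the filter.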
We performed association testing with six samples, five composed of adults (JDD, S_BON, T_BON, WFA♀, and WFA♂) and one composed of larvae (GAR) (Table S1). Adult samples were from the following three locations: males (WFA♂, N=136) and females (WFA♀, N=133) collected at Willamette Falls in 2016 (Willamette River, Oregon City, OR; 205.6 Rkm upstream from the Columbia River mouth), two samples (S_BON, N=295 and T_BON, N=883) collected at Bonneville Dam in 2014 (235.1 Rkm upstream from the Columbia River mouth), and one sample (JDD, N=656) collected at John Day Dam in 2014 and 2015 (346.9 Rkm upstream from the Columbia River mouth). The following five traits were measured in all adult samples: ordinal “day” of collection (timing of migration to the sample point), girth (mm), total “length” (mm), weight (g), and distance between dorsal fins (“interdorsal”, mm). Interdorsal measurements have been suggested to serve as an indicator of maturation status in Pacific lamprey because the distance tends to decrease with maturation (Clemens et al. 2009). We measured an additional migration trait in three adult samples (S_BON, T_BON, and JDD) by tagging individual fish with a combination of passive integrated transponder (PIT) and radio tags and recording their furthest upstream detection from the release location (“Rkm”). Further, because the males and females collected at Willamette Falls (WFA♂ and WFA♀) were being harvested, we were able to measure gonad weight as a proxy for maturity in those samples. Finally, a subset of the adult sample from Bonneville Dam (S_BON) was used in a swim trial experiment within a flume (Kirk et al. 2016), in which the following three swimming behavioral traits were measured: “approached” experiment, passed challenge (“pass”), and passed challenge without fallback (“passrep”). Details of these swimming performance experiments can be found in Kirk et al. (2016) and Supplemental Materials.
A single group of larvae was artificially propagated using adults captured at Bonneville Dam. These larvae were reared in a common garden experiment to generate early larval growth rate data (“GAR”, N=337). All larvae were spawned in the spring of 2015 and allowed to rear from 30 to 163 days after hatching. Growth rate was measured as length / time (“growth”) and as a corrected growth rate [“growth rate_b”; (length – 4 mm) / time] that accounts for length at hatch (~4 mm).
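The two growth measures can be written out as a short Python sketch. The function names, the assumption that time is expressed in days after hatching, and the example larva (20 mm at 100 days) are hypothetical and chosen only to show the arithmetic. The 4 mm hatch length is the approximate value given in the text.

```python
HATCH_LENGTH_MM = 4.0  # approximate length at hatch, as stated in the text


def growth(length_mm, time_days):
    """Uncorrected growth rate: length / time (assumed units: mm per day)."""
    return length_mm / time_days


def growth_b(length_mm, time_days, hatch_length_mm=HATCH_LENGTH_MM):
    """Corrected growth rate: (length - hatch length) / time."""
    return (length_mm - hatch_length_mm) / time_days


# Hypothetical larva, not a measured value: 20 mm long at 100 days.
rate = growth(20.0, 100.0)
rate_b = growth_b(20.0, 100.0)
print(rate, rate_b)
```

The correction removes the portion of body length already present at hatch, so the corrected rate is always lower than the uncorrected one, and the difference is largest for larvae measured early.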
Intercorrelation among all measured traits in these six samples (i.e. JDD, S_BON, T_BON, WFA♀, WFA♂, and GAR) was examined (based on Pearson’s r) to avoid excessive redundancy of predictor variables (|r| > 0.95), and P-values were calculated (SAS Institute, Inc. 2000). We performed univariate analyses using a general linear model (GLM) and a mixed linear model (MLM) with TASSEL v. 5.1.0 (Bradbury et al. 2007). The GLM is a fixed effects linear model that is used in TASSEL to identify significant associations between phenotypes and genotypes. TASSEL takes population structure into account by using genetic principal coordinate axes as covariates in the model. The MLM is similar to the GLM but includes both fixed effects (e.g. population structure and genetic marker) and random effects (i.e. relationships among individuals) and can thus account for both population structure and kinship to reduce false positive associations (Yu et al. 2006). Details on the covariates and on how loci were used to account for population structure and relatedness in the GLM and MLM tests are provided in the Supplemental Materials. To account for multiple tests, only those associations with P-values less than the critical value determined using the false discovery rate procedure of Benjamini and Hochberg (1995) were considered significant. The Benjamini and Hochberg (1995) false discovery rate approach has more power to detect significant differences than sequential Bonferroni correction (Narum 2006). Critical values were calculated using the function p.adjust within the R package stats (RDC Team 2019).
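For readers unfamiliar with the Benjamini and Hochberg (1995) procedure, the following Python sketch shows the step-up adjustment it is based on. It is written in Python for illustration only and is not the code used in the study, which relied on p.adjust in R. The function name bh_adjust, the example P-values, and the 0.05 false discovery rate level are assumptions. Comparing an adjusted P-value with the chosen level is equivalent to comparing the raw P-value with its rank-based critical value.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    # Visit the tests from the largest P-value to the smallest.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for position, idx in enumerate(order):
        rank = m - position  # 1-based rank when sorted in ascending order
        # Scale by m / rank, then enforce monotonicity with a running minimum.
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical P-values for five marker-trait tests (not study results).
pvals = [0.001, 0.008, 0.039, 0.041, 0.6]
alpha = 0.05  # assumed false discovery rate level

adjusted = bh_adjust(pvals)
significant = [q <= alpha for q in adjusted]
print(significant)  # [True, True, False, False, False]
```

In this hypothetical example the third and fourth tests have raw P-values below 0.05 but are not significant after adjustment, whereas a sequential Bonferroni correction would be at least as strict for every test.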