Marker Performance
We evaluated the accuracy of these markers for predicting parentage using a dataset of 326 offspring from a partial cross of 5 dams and 6 sires from the Yakama hatchery, genotyped at a minimum threshold of 80% completeness (29 full sibling families, ranging from 3 to 23 offspring). Ploidy-accurate genotypes were generated by the updated GT-seq pipeline for polyploids. From all potential sire-dam-offspring trios, we estimated the percent of Mendelian incompatibilities in trios involving both vs. one or neither true parent using a custom R script (Supplemental File 2). For comparisons in which sex is unknown or both parents may not be included, we used the “Paternity” estimation routine of PolyGene (K. Huang et al., 2020), which includes several population genetic routines adapted for polyploids, though some of these are not applicable across samples of different ploidy. To evaluate performance of single-parent assignment, we included only 4 dams and 2 sires as candidate parents for the 326 offspring, and examined the LOD score when both, one, or neither parent was present in the candidate set. For circumstances where candidate parents cannot be identified a priori, or where sibship relationships are of greater interest, we evaluated the performance of the K. Huang et al. (2015) maximum likelihood estimator of relationship against known relationships among the 326 offspring in this Yakama set. Although the presence or degree of meiotic double-reduction, which results in gametes that carry both of a pair of sister chromatids, is not well known in white sturgeon, we applied all PolyGene analyses under the “pure random chromatid segregation” (PRCS) model, which provides for some amount of double-reduction. We also estimated sibship and full sibling families using Colony2 (Jones & Wang, 2010). Colony2 is designed for diploids but accepts dominant data, so each SNP locus was recoded as two pseudo-dominant loci (Rodzen et al., 2004; Wang & Scribner, 2014).
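The trio-based incompatibility count described above can be illustrated with a minimal Python sketch. It is not the custom R script in Supplemental File 2. It assumes that genotypes are stored as alternate-allele dosages (0 to 4) for tetraploid individuals, that missing calls are represented as None, and that each parent contributes a two-allele gamete. The function names and the optional double-reduction switch are hypothetical and introduced only for illustration.

```python
from itertools import product

PLOIDY = 4  # assumption: 4N individuals, genotype = alternate-allele dosage 0..4


def gamete_dosages(parent_dosage, allow_double_reduction=False):
    """Alternate-allele counts possible in a two-allele gamete of a 4N parent."""
    half = PLOIDY // 2
    lo = max(0, parent_dosage - half)
    hi = min(half, parent_dosage)
    dosages = set(range(lo, hi + 1))
    if allow_double_reduction:
        # With double reduction, any allele present at least once in the
        # parent can occur twice in a gamete (both sister chromatids).
        if parent_dosage >= 1:
            dosages.add(half)   # gamete carries two alternate copies
        if parent_dosage <= PLOIDY - 1:
            dosages.add(0)      # gamete carries two reference copies
    return dosages


def is_compatible(offspring, sire, dam, allow_double_reduction=False):
    """True if some pair of parental gametes sums to the offspring dosage."""
    sire_g = gamete_dosages(sire, allow_double_reduction)
    dam_g = gamete_dosages(dam, allow_double_reduction)
    return any(s + d == offspring for s, d in product(sire_g, dam_g))


def percent_incompatible(offspring, sire, dam, allow_double_reduction=False):
    """Percent of loci scored in all three individuals that are incompatible."""
    scored = [
        (o, s, d)
        for o, s, d in zip(offspring, sire, dam)
        if o is not None and s is not None and d is not None
    ]
    if not scored:
        return float("nan")
    bad = sum(
        not is_compatible(o, s, d, allow_double_reduction) for o, s, d in scored
    )
    return 100.0 * bad / len(scored)
```

Applying percent_incompatible to every candidate sire-dam-offspring trio would give one value per trio, which could then be grouped by whether both, one, or neither candidate is a true parent.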
We ran analyses (full likelihood, high precision, 3 medium-length runs) using different estimates of genotyping error (0.001 to 0.05), with parents absent or with all 11 parents present in different arrangements. Parent arrangements included: separated by sex, together in a single set but ordered, and together but unordered, in each case with a probability of inclusion of 0.9, and using either all loci or only score 1+2 loci. Other parameters were left as default.
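The pseudo-dominant recoding used to prepare the Colony2 input can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: genotypes are assumed to be alternate-allele dosages (0 to 4), each SNP becomes two presence/absence loci (one per allele), and presence is written as 1, absence as 0, and missing data as None. The numeric codes that Colony2 itself expects for dominant markers should be taken from its user guide; the function names here are hypothetical.

```python
def pseudo_dominant(dosage, ploidy=4):
    """Recode one SNP dosage as (reference present, alternate present)."""
    if dosage is None:
        return (None, None)  # missing genotype stays missing at both loci
    if not 0 <= dosage <= ploidy:
        raise ValueError(f"dosage {dosage} outside 0..{ploidy}")
    ref_present = int(dosage < ploidy)  # at least one reference copy
    alt_present = int(dosage > 0)       # at least one alternate copy
    return (ref_present, alt_present)


def recode_individual(dosages, ploidy=4):
    """Flatten an individual's SNP dosages into pseudo-dominant loci."""
    recoded = []
    for dosage in dosages:
        recoded.extend(pseudo_dominant(dosage, ploidy))
    return recoded
```

Under this coding, all heterozygous dosages (1, 2, and 3) collapse to the same pair of present/present phenotypes, which is the information loss inherent in treating codominant polyploid data as dominant.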
We also evaluated the utility of these SNPs in estimating population genetic parameters useful for understanding the differences among, and relationships between, white sturgeon in different population segments. Using the 3,514 sequenced sturgeon, we evaluated the relationship between sequencing depth, genotype completeness, heterozygosity, and confidence in the ploidy estimate, represented as minimum alternate LLR. We also compared the genotypes of 142 of these fish that had been sequenced more than once to estimate genotyping error. We then filtered to retain individuals with 4N ploidy (minimum alternate LLR >10) genotyped at a minimum of 80% completeness. We used only 4N individuals because PolyGene is currently limited to a single ploidy per population. We removed known stocked and hatchery individuals from the filtered set and, to filter unknown stocked individuals, excluded all but one individual from any set of individuals within a reach with relatedness estimates greater than 0.2 (K. Huang et al., 2015), resulting in 1,203 individuals. We reasoned that unknown stocked fish would exhibit high levels of relatedness because of the generally limited number of breeders available to hatchery operations. Based on estimates from known relationships (the Yakama fish), the applied value should eliminate all full siblings and most half siblings while minimizing the unintentional removal of unrelated individuals, although the actual results will be population-specific (Table 1). There has been considerable discussion of whether siblings should be filtered from population genetic datasets (Wang, 2018; Waples & Anderson, 2017). We acknowledge that statistics calculated from this dataset will be affected by this filtering; however, the filtering corrects for the potentially large bias that non-random sampling can introduce (Wang, 2018).
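One way to implement the relatedness filter within a single reach is sketched below. The text does not specify how a "set" of related individuals is delimited or which member is retained, so this sketch makes two assumptions: sets are single-linkage clusters (any chain of pairs with relatedness above the threshold joins a cluster), and the first individual in input order is kept. The function name and input layout are hypothetical.

```python
def filter_related(ids, pair_relatedness, threshold=0.2):
    """Keep one individual per cluster of related fish within one reach.

    ids: individual IDs in the reach, in the order used to choose who is kept.
    pair_relatedness: dict mapping (id_a, id_b) to a relatedness estimate.
    """
    parent = {i: i for i in ids}  # union-find forest, one node per fish

    def find(x):
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Join every pair whose relatedness exceeds the threshold.
    for (a, b), r in pair_relatedness.items():
        if r > threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    # Retain the first individual encountered from each cluster.
    kept, seen_roots = [], set()
    for i in ids:
        root = find(i)
        if root not in seen_roots:
            seen_roots.add(root)
            kept.append(i)
    return kept
```

A stricter alternative would require every pair in a set to exceed the threshold; single linkage removes more individuals when relatedness chains through intermediates, so the choice affects the final sample size.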
Using a custom R script (Supplemental File X), we calculated the minor allele frequency (MAF) of each locus for each reach with N>7 individuals to examine the information content of these loci for identifying distinct population segments. For each population with N>50 individuals, we calculated linkage disequilibrium among loci using Fisher’s G test and conformance to expectations of Hardy-Weinberg (HW) equilibrium using Raymond & Rousset’s (1995) Markov chain estimator (5k burn-in, 100 batches of 5k iterations), both applied in PolyGene and with correction for multiple tests (false discovery rate, FDR; Benjamini & Hochberg, 1995). Similarly, we examined differences across sub-populations in estimated inbreeding values using Weir’s (1997) estimator in PolyGene. Finally, we estimated genetic divergence, as Nei’s (1973) FST analog, among a selection of sub-populations for which sample size was sufficient. We estimated the precision of this statistic from 100 jackknife replicates across loci, each retaining 50% of the loci, and calculated the 95% confidence interval from these replicates assuming a normal distribution.
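The jackknife precision estimate can be sketched as follows. This is not the PolyGene implementation. It assumes that per-locus within-population (HS) and total (HT) gene diversities have already been computed, that the FST analog is taken as (mean HT - mean HS) / mean HT across loci, and that the interval is the replicate mean plus or minus 1.96 replicate standard deviations with no additional jackknife variance scaling. All function and parameter names are hypothetical.

```python
import random
import statistics


def fst_analog(hs, ht):
    """Nei-style (HT - HS) / HT using diversities averaged across loci."""
    mean_hs = statistics.fmean(hs)
    mean_ht = statistics.fmean(ht)
    return (mean_ht - mean_hs) / mean_ht


def jackknife_ci(hs, ht, n_reps=100, keep=0.5, seed=1):
    """Normal-approximation 95% interval from resampling half of the loci."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_loci = len(hs)
    n_keep = max(1, round(n_loci * keep))
    replicates = []
    for _ in range(n_reps):
        # Draw a random subset of loci without replacement.
        idx = rng.sample(range(n_loci), n_keep)
        replicates.append(
            fst_analog([hs[i] for i in idx], [ht[i] for i in idx])
        )
    mean = statistics.fmean(replicates)
    sd = statistics.stdev(replicates)
    return mean - 1.96 * sd, mean + 1.96 * sd
```

Whether to rescale the replicate variance for the fraction of loci dropped is a design choice that changes the interval width, so the unscaled version here should be read as one possible interpretation of the description above.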