Marker Evaluation
To examine the functional ploidy (‘somy’, or meiotic segregation
pattern) of the identified SNPs in white sturgeon, we initially
evaluated read ratios in the ascertainment panel individuals, which
exhibited patterns reflective of 5 genotype categories (AAAA, AAAB,
AABB, ABBB, BBBB), i.e. tetrasomy. To confirm this, we estimated ploidy
from a larger group (N=3,514; Table 1) of white sturgeon from the
Columbia, Fraser, and Sacramento River Basins, using read counts for the
325 SNPs and the R function funkyPloid (Delomas et al., submitted),
which implements a beta-binomial model with uniform noise, for
candidate ploidies (somies) of 4N, 5N, and 6N. From this group, we
retained putative tetrasomic (4N) individuals with a minimum of 50k
reads and a minimum log likelihood ratio to the next most likely ploidy
(hereafter, minimum alternate LLR) of 25, resulting in 2,378 individuals
retained for further analyses. We then used the read counts for each
locus across these individuals to assess the ploidy of each locus with
funkyPloid, comparing 4N and 8N models. This was achieved by
transposing the read count matrices input to funkyPloid, so that loci
rather than individuals were evaluated.
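The model comparison described above can be illustrated with a minimal Python sketch. It is not funkyPloid's implementation. It assumes that each genotype category (allele dosage j at ploidy k) produces reference-allele read counts from a beta-binomial with mean j/k, clamped slightly away from 0 and 1 by an error rate, mixed with a uniform noise component, and it estimates only the category weights by EM. The overdispersion (`phi`), error (`err`), and noise weight (`noise`) values, the use of summed per-locus counts as "reads", and all function names are hypothetical. The retention rule and the transposition for locus-level tests follow the text.

```python
import math


def betabinom_logpmf(x, n, a, b):
    """Log P(X = x) for a beta-binomial with n trials and shape parameters a, b."""
    return (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
            + math.lgamma(x + a) + math.lgamma(n - x + b) - math.lgamma(n + a + b)
            - math.lgamma(a) - math.lgamma(b) + math.lgamma(a + b))


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def ploidy_loglik(ref, total, ploidy, phi=50.0, err=0.01, noise=0.01, n_iter=50):
    """Log-likelihood of one sample's read counts under one candidate ploidy.

    Hypothetical model: one beta-binomial component per allele dosage j (mean j/ploidy,
    clamped to [err, 1 - err], precision phi) plus a uniform noise component with fixed
    weight `noise`. Only the dosage-category weights are estimated, by EM.
    """
    means = [min(max(j / ploidy, err), 1 - err) for j in range(ploidy + 1)]
    # Per-locus log-density under each component; last column = uniform over 0..n reads
    comp = [[betabinom_logpmf(x, n, m * phi, (1 - m) * phi) for m in means]
            + [-math.log(n + 1)] for x, n in zip(ref, total)]
    k = ploidy + 1
    weights = [(1 - noise) / k] * k + [noise]
    for _ in range(n_iter):
        acc = [0.0] * k  # expected number of loci assigned to each dosage category
        for row in comp:
            terms = [math.log(w) + c if w > 0 else -math.inf for w, c in zip(weights, row)]
            lse = logsumexp([t for t in terms if t > -math.inf])
            for j in range(k):
                acc[j] += math.exp(terms[j] - lse)
        s = sum(acc)
        if s == 0:
            break
        # Constrained M-step: dosage weights share the mass not given to noise
        weights = [(1 - noise) * a / s for a in acc] + [noise]
    return sum(logsumexp([math.log(w) + c for w, c in zip(weights, row) if w > 0])
               for row in comp)


def min_alternate_llr(ref, total, ploidies=(4, 5, 6)):
    """Best-fitting ploidy and its LLR relative to the next most likely ploidy."""
    ll = {p: ploidy_loglik(ref, total, p) for p in ploidies}
    ranked = sorted(ll, key=ll.get, reverse=True)
    return ranked[0], ll[ranked[0]] - ll[ranked[1]]


def keep_as_tetrasomic(ref, total, min_reads=50_000, min_llr=25.0):
    """Retention rule from the text: best fit 4N, >= 50k reads, minimum alternate LLR >= 25."""
    best, llr = min_alternate_llr(ref, total)
    return best == 4 and sum(total) >= min_reads and llr >= min_llr


def locus_ploidy(ref_matrix, total_matrix, locus):
    """Locus-level test: score one column (a locus across individuals) under 4N vs 8N."""
    ref = [row[locus] for row in ref_matrix]
    tot = [row[locus] for row in total_matrix]
    return min_alternate_llr(ref, tot, ploidies=(4, 8))
```

Calling `min_alternate_llr(ref, total)` on one individual's per-locus counts returns the best ploidy and its minimum alternate LLR, while `locus_ploidy` applies the same model to one column of the individual-by-locus matrices with 4N and 8N as candidates, which is the effect of transposing the input.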
The funkyPloid function assesses the number of amplified copies
of SNPs in the genome, and does not directly assess the pattern of
segregation. Thus, this test cannot discriminate between a true
tetrasomic locus and two co-amplified, disomic loci. However, given the
chromosome complement and previous observations (e.g. Drauch Schreier et
al., 2011), disomic loci are likely to be rare, and so here we only
consider tetrasomic and octosomic segregation for these loci. It should
also be noted that because this LLR metric includes no penalty for
overfitting, ‘noisy’ 4N loci may exhibit higher likelihood for 8N
despite being present in only four copies in the genome (Delomas et al.,
submitted). Thus, rather than evaluate the raw LLR results, we ranked
each locus by its relative fit to the 4N and 8N models.
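The reason raw LLRs tend to favour 8N is visible in the genotype categories themselves: every 4N allele ratio (j/4) is also an 8N ratio (2j/8). Under a mixture model with freely estimated category weights, the maximized 8N likelihood can therefore be no lower than the 4N one. The sketch below checks this nesting and orders loci by LLR = lnL(8N) - lnL(4N). Ranking by this particular statistic is an assumption, since the exact ranking criterion is not specified here, and the function and input names are hypothetical.

```python
from fractions import Fraction


def ratio_categories(ploidy):
    """Expected allele ratios j/ploidy for allele dosages j = 0..ploidy."""
    return {Fraction(j, ploidy) for j in range(ploidy + 1)}


# Every 4N category is also an 8N category (j/4 == 2j/8), so the 8N model nests the 4N model
assert ratio_categories(4) <= ratio_categories(8)


def rank_loci(loglik_4n, loglik_8n):
    """Order loci from most 4N-like to most 8N-like.

    Inputs are hypothetical dicts mapping locus name -> maximized log-likelihood.
    LLR = lnL(8N) - lnL(4N) is used only for ordering, not against a fixed cutoff.
    """
    llr = {locus: loglik_8n[locus] - loglik_4n[locus] for locus in loglik_4n}
    return sorted(llr, key=llr.get)
```

For example, `rank_loci({"a": -100.0, "b": -120.0, "c": -90.0}, {"a": -99.0, "b": -80.0, "c": -89.5})` returns `["c", "a", "b"]`: locus `b` gains most from the extra 8N categories and is ranked as the least tetrasomic.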
We compared this ranking with two measures of congruence with tetrasomic
inheritance. First, for each locus we calculated the percentage of
parent-offspring comparisons showing Mendelian incompatibilities in known
crosses of white sturgeon spawned by the Yakama Nation (4 dams, 2 sires,
and 128 offspring genotyped at 90% completeness), using a custom R script
(Supplemental File 2). A Mendelian incompatibility was defined as any
offspring genotype absent from the set of offspring genotypes possible for
its parental pair, where that set was formed from all combinations of the
potential diploid gametes of each parent, assuming no double reduction
(which has not been identified in white sturgeon; Drauch Schreier et al.,
2011; A. L. Van Eenennaam, Murray, & Medrano, 1998). Genotypes of each
individual were obtained with a modified GT-seq pipeline
(https://github.com/stuartwillis/gt-seq-ploidy) that accommodates
different ploidies by integrating funkyPloid. The pipeline assumes a
normal distribution of read ratios for each genotype category, with a
standard deviation of 0.05 for 4N that decreases for each higher ploidy as
s.d. = 0.05 × (4/ploidy) to avoid overlap between allele ratio categories.
Allele ratios that fall outside the 95% confidence bounds of all genotype
categories are scored as missing. Genotype thresholds therefore become
more stringent with increasing ploidy, and higher sequencing effort may be
required to estimate read ratios precisely enough to genotype
higher-ploidy samples at all loci. Second, we visually inspected the
allele ratio plots of the 2,378
4N individuals and rated each locus on a scale of 1-4 for conformance to
the expected allele ratios for tetrasomy (95% confidence intervals of the
normal distribution). The ratings were 1: almost all ratios fall within
the expected 95% bounds of the 5 genotype classes (~<5% outside); 2: ratio
medians fall within the expected 95% bounds of the 5 genotype classes but
with minor shifts towards the reference or alternate allele (allele
bias), and ~<25% of ratios fall out of bounds; 3: ~5 genotype classes, but
ratio medians often fall outside bounds (medians strongly skewed) and/or
many (~>25%) allele ratios fall outside the confidence intervals; 4: 6+
genotype classes, homozygote allele ratio medians fall off the x and y
axes, and/or no distinct genotype classes.
Example plots are provided with bounds reflecting 8N genotype categories
(Supplemental Figure 2).
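The genotype-calling rule and the Mendelian incompatibility test described in this section can be sketched in Python as follows. This is a minimal illustration, not the gt-seq-ploidy pipeline or the script in Supplemental File 2. It assumes read ratios are expressed as the fraction of reads carrying allele B and that genotypes are coded as B-allele dosage (e.g. AAAB = 1); all function names are hypothetical. Category bounds are mean ± 1.96 s.d. under the stated s.d. rule, and gametes are enumerated as every set of distinct parental chromosomes, which excludes double reduction.

```python
import itertools


def call_genotype(b_ratio, ploidy):
    """Allele dosage (0..ploidy) whose 95% normal bounds contain the ratio, else None (missing)."""
    sd = 0.05 * (4 / ploidy)   # s.d. rule quoted in the text
    half_width = 1.96 * sd     # half-width of a 95% normal interval
    for dosage in range(ploidy + 1):
        if abs(b_ratio - dosage / ploidy) <= half_width:
            return dosage
    return None


def gamete_dosages(parent_dosage, ploidy=4):
    """B-allele dosages of gametes carrying ploidy/2 of the parent's chromosomes.

    Each gamete draws distinct chromosomes, i.e. no double reduction.
    """
    chromosomes = [1] * parent_dosage + [0] * (ploidy - parent_dosage)
    return {sum(g) for g in itertools.combinations(chromosomes, ploidy // 2)}


def possible_offspring(dam, sire, ploidy=4):
    """Offspring dosages obtainable by combining one gamete from each parent."""
    return {a + b for a in gamete_dosages(dam, ploidy) for b in gamete_dosages(sire, ploidy)}


def percent_incompatible(trios, ploidy=4):
    """Percent of (dam, sire, offspring) dosage trios at one locus that are Mendelian-incompatible.

    Trios with any missing genotype (None) are skipped; returns None if nothing is scorable.
    """
    scored = [t for t in trios if None not in t]
    if not scored:
        return None
    bad = sum(1 for dam, sire, child in scored
              if child not in possible_offspring(dam, sire, ploidy))
    return 100.0 * bad / len(scored)
```

For example, a BBBB dam (dosage 4) crossed with an AAAA sire (dosage 0) can only produce AABB offspring (dosage 2), so an ABBB offspring at that locus is counted as an incompatibility. Note also that the bounds narrow with ploidy: a ratio of 0.37 is missing at 4N but is called as dosage 3 at 8N.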