RAD library sequencing and SNP ascertainment
Single nucleotide polymorphic (SNP) loci were identified from a
restriction site-associated DNA (RAD) library created using a single
enzyme (SbfI) (Miller, Dunham, Amores, Cresko, & Johnson, 2007; Baird
et al., 2008; Puritz et al., 2014). Twenty-eight barcoded samples of
white sturgeon from a broad geographic range were prepared, pooled
equimolarly, and sequenced on an Illumina HiSeq (100 bp paired-end,
quality-trimmed to 80 bp). The ascertainment panel included 12 samples
from the lower/middle Columbia River, 14 from the lower/middle Snake
River, and 2 from the Sacramento River. Forward reads were processed
with the Stacks pipeline (Catchen, Amores, Hohenlohe, Cresko, &
Postlethwait, 2011) using the following assembly parameters: M = 2
(maximum mismatch; ustacks), N = 4 (maximum secondary mismatch;
ustacks), n = 2 (maximum sample mismatch; cstacks), and m = 4 to 16
(minimum stack depth; ustacks). The value of m depended on the depth of
coverage of each individual and was set to twice the number of raw
reads in millions, rounded to the nearest integer. The relatively low
mismatch thresholds (versus a default of 3) ensured that only very
similar reads were assembled and helped prevent homeologs (homologs
derived from polyploidization) from being clustered (Dufresne et al.,
2014; but see Ilut, Nydam, & Hare, 2014; O’Leary, Puritz, Willis,
Hollenbeck, & Portnoy, 2018; Willis, Hollenbeck, Puritz, Gold, &
Portnoy, 2017). Stacks were filtered to retain variants with minor
allele frequencies above 5%, genotyped in at least 80% of individuals,
and with a combined depth between 1,050 and 1,600 sequence reads (first
mode of a multi-modal distribution; Supplemental Figure 1). Using read
ratios as a proxy for genotypes of unknown ploidy, a principal components
analysis was performed with the candidate SNPs. SNPs with the highest
loadings on the first 10 eigenvectors were selected for further
development. Forward reads containing candidate markers were
concatenated with their reverse-complemented paired reads to increase
target length, and primers were developed using Primer3. Primers were
tested individually using standard PCR conditions, and in combination
using standard GT-seq multiplex conditions (Supplemental File 1).
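
The depth rule and locus filters described above could be expressed as in the following Python sketch. It is an illustration only: the function names, the half-up rounding, and the clamping of m to the reported range of 4 to 16 are assumptions, and none of this is taken from the Stacks pipeline or from the original analysis scripts.

```python
def min_stack_depth(raw_reads, low=4, high=16):
    """Minimum stack depth (m) for one individual.

    Twice the raw read count in millions, rounded half-up to the nearest
    integer. Clamping to [low, high] is an assumption based on the
    reported range of 4 to 16.
    """
    m = int(raw_reads / 1_000_000 * 2 + 0.5)
    return max(low, min(high, m))


def passes_locus_filters(maf, call_rate, combined_depth,
                         min_maf=0.05, min_call_rate=0.80,
                         depth_range=(1050, 1600)):
    """Apply the three retention criteria to a single candidate variant.

    maf            : minor allele frequency (must exceed min_maf)
    call_rate      : fraction of individuals genotyped
    combined_depth : reads summed across all individuals
    """
    lo, hi = depth_range
    return (maf > min_maf
            and call_rate >= min_call_rate
            and lo <= combined_depth <= hi)
```

For example, under these assumptions an individual with 3.2 million raw reads would be assembled with m = 6, and a variant with a minor allele frequency of 0.10, an 85% call rate, and a combined depth of 1,200 reads would be retained.
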
Pooling thresholds (the number of individuals that can be genotyped
simultaneously in a single run) were tailored to produce >90%
genotyping success across individuals, as discussed later.
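
The read-ratio approach to marker selection described in this section could be sketched as follows, assuming NumPy is available. The function names, the definition of the ratio as alternate reads over total reads, the mean imputation of missing ratios, and the number of SNPs kept per component are all assumptions made for illustration; the original analysis may have differed on each point.

```python
import numpy as np


def read_ratio_matrix(ref_counts, alt_counts):
    """Return alt / (ref + alt) for each individual (row) and SNP (column).

    Cells with zero total depth are left as NaN.
    """
    ref = np.asarray(ref_counts, dtype=float)
    alt = np.asarray(alt_counts, dtype=float)
    total = ref + alt
    ratios = np.full(total.shape, np.nan)
    np.divide(alt, total, out=ratios, where=total > 0)
    return ratios


def top_loading_snps(ratios, n_components=10, per_component=1):
    """Indices of SNPs with the largest absolute loadings on the leading PCs."""
    # Assumption: missing ratios are replaced by the SNP (column) mean.
    col_means = np.nanmean(ratios, axis=0)
    filled = np.where(np.isnan(ratios), col_means, ratios)
    centered = filled - col_means
    # Rows of vt are the principal axes (eigenvectors of the covariance
    # matrix), ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    k = min(n_components, vt.shape[0])
    selected = set()
    for axis in vt[:k]:
        order = np.argsort(-np.abs(axis))
        selected.update(int(i) for i in order[:per_component])
    return sorted(selected)
```

Working with read ratios rather than called genotypes avoids having to assume a ploidy for each locus, which is the motivation given in the text for this step.
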