Introduction
Whole genome duplication is hypothesized to have played a fundamental
role in evolution (Dufresne, Stift, Vergilino, & Mable, 2014; Soltis,
Visger, Blaine Marchant, & Soltis, 2016), including that of vertebrates
(Dehal & Boore, 2005; Holland, Garcia-Fernandez, Williams, & Sidow,
1994), and in particular in fishes (Crow, Stadler, Lynch, Amemiya, &
Wagner, 2006; Meyer & Van De Peer, 2005). Despite this, there are
relatively few extant vertebrate species that are known to be polysomic
(exhibiting multivalent chromatids) (Comai, 2005), which stems in part
from the processes of diploidization that occur following most
polyploidization events (Lynch & Conery, 2000; Ohno, 1971; Wendel,
2000; Wolfe, 2001). Select lineages, however, including some vertebrates,
appear to be prone to episodic polyploidization and prolonged polysomism
(Dufresne et al., 2014).
Despite the obvious differences between the two types of polyploid, our
understanding of polyploid evolution has largely come via study of
allopolyploids, those that arise
from combination of two ancestral genomes, usually through
hybridization, rather than their autopolyploid counterparts, which arise
from the doubling of a single ancestral genome, usually through
fertilization of unreduced gametes (Dufresne et al., 2014; Soltis et
al., 2016). In part this stems from several methodological challenges to
developing genetic insights from polyploids, which are often more
significant in auto- than allopolyploids. Developing reliable genetic
markers for polyploids has been impeded both by the presence of
co-amplifying homeologs whose signals cannot be discriminated and by
true polysomic segregation of those homeologs, with the true somy
obscured by homeolog co-amplification. For example, while
microsatellites have often been the standard marker for population
genetics because of their ease of discovery and high allelic diversity,
many studies of polyploids have found mixed inheritance patterns that
could reflect true mixed-somy segregation or variable amplification of
homeologs from each ancestral genome (Dufresne et al., 2014).
While allopolyploids may often exhibit disomy of the ancestral genomes
soon or immediately following polyploidization (Spoelhof, Soltis, &
Soltis, 2017), in which case developing diploid markers is a matter of
identifying ancestral genome-specific primers or probes (Dufresne et
al., 2014), true polysomy in autopolyploids and segmental allopolyploids
(those formed from merger of partially divergent ancestral genomes)
presents additional challenges. In polysomic species, determining the dosage
(count or ratio) of microsatellite alleles in an individual’s genotype
may be difficult when the genotyping technology is not quantitative, and
the presence of null alleles can impede this further. Moreover, while
the estimators for many population genetic parameters can be extended to
include polysomic inheritance (Meirmans & Van Tienderen, 2013; Ronfort,
Jenczewski, Bataillon, & Rousset, 1998), until recently there has been
relatively little interest in incorporating these extensions into
popular genetic software, the majority of which permit only diploid
data. Several recent software packages that were updated to permit
polyploid data (e.g. Genodive (Meirmans, Liu, & Van Tienderen,
2018), the R package adegenet (Jombart, 2008), and others
reviewed in Dufresne et al. (2014)) or were designed specifically for
polyploids (EBG, Blischak, Kubatko, & Wolfe, 2018; Polygene,
Kang Huang, Dunn, Ritland, & Li, 2020) make progress on this front,
but these require that the ploidy and either the allelic phenotype
(dosage-blind genotype) or full ploidy-aware genotype be provided for
each individual. For species that vary in ploidy, this generally
requires separately assessing ploidy from genotyping/allelic
phenotyping, adding time and expense, and in some cases precluding the
use of commonly archived tissue types and preservation methods.
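
The distinction between an allelic phenotype and a dosage-aware genotype can be illustrated with a minimal Python sketch. The function names below are hypothetical and are not drawn from any of the packages cited above; the sketch simply enumerates which genotypes are compatible with an observed set of alleles at a given ploidy.

```python
from collections import Counter
from itertools import combinations_with_replacement


def allelic_phenotype(genotype):
    """Dosage-blind view: the distinct alleles observed, without copy numbers."""
    return tuple(sorted(set(genotype)))


def allele_dosage(genotype):
    """Dosage-aware view: the number of copies of each allele."""
    return dict(Counter(genotype))


def compatible_genotypes(phenotype, ploidy):
    """All genotypes of the given ploidy that contain exactly the observed alleles."""
    alleles = sorted(phenotype)
    return [
        g
        for g in combinations_with_replacement(alleles, ploidy)
        if set(g) == set(alleles)
    ]


# Two tetrasomic genotypes with different dosage share one allelic phenotype.
g1 = ("A", "A", "A", "B")
g2 = ("A", "B", "B", "B")
same_phenotype = allelic_phenotype(g1) == allelic_phenotype(g2)

# The ambiguity grows with ploidy: 3 candidate genotypes at 4N, 7 at 8N.
n_tetra = len(compatible_genotypes(("A", "B"), 4))
n_octo = len(compatible_genotypes(("A", "B"), 8))
```

With two observed alleles, the number of genotypes consistent with the phenotype rises from three under tetrasomy to seven under octosomy, which is why non-quantitative genotyping becomes increasingly limiting at higher ploidy.
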
Here, we demonstrate a set of methodological and bioinformatic
techniques which address many of these challenges in developing genetic
resources for a ploidy-variable, polysomic species, the white sturgeon
(Acipenser transmontanus). The sturgeons (Acipenseriformes) are a
classic example of polysomic polyploidy in vertebrates. All extant
sturgeons, which exhibit between ~120 and ~360 chromosomes, are
hypothesized to be polyploid relative to an extinct diploid ancestor
which had 60 chromosomes (Rajkov, Shao, & Berrebi, 2014). The sterlet
(Acipenser ruthenus), a Eurasian sturgeon with ~120 chromosomes,
should by this ratio be tetraploid (4N), but in exploring gene content
and homology of a draft genome, Du et al. (2020) discovered extensive,
though incomplete, diploidization resulting from a “segmental
deduplication” process, while others have inferred both disomic and
tetrasomic inheritance of microsatellite markers in this species (Rajkov
et al., 2014). By similar logic, the white sturgeon (A. transmontanus;
~240 chromosomes) should be ancestrally
octoploid (8N), though microsatellite inheritance patterns have
suggested both tetrasomic and octosomic segregation (Drauch Schreier,
Gille, Mahardja, & May, 2011). Intriguingly, white sturgeon
occasionally exhibit spontaneous autopolyploidy, generally resulting in
an increase in chromatin content by a factor of ~1.5 (12N,
dodecaploid) (Drauch Schreier et al., 2011; A. D. Schreier, May, &
Gille, 2013). Although of unknown fertility, backcrossed offspring
(10N, decaploid) are often viable (J. P. Van Eenennaam et al., 2019),
creating a wide range of ploidies within a single species.
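
The ploidy arithmetic in this paragraph can be made explicit with a short Python sketch. It assumes, as a simplification not stated in the sources, that chromosome counts scale exactly with the hypothesized 60-chromosome diploid ancestor and that each parent contributes a gamete carrying half of its chromosome sets; the function names are illustrative only.

```python
ANCESTRAL_DIPLOID_COUNT = 60  # hypothesized chromosome count of the extinct 2N ancestor


def inferred_ploidy(chromosome_count, diploid_count=ANCESTRAL_DIPLOID_COUNT):
    """Ploidy implied by a chromosome count relative to the diploid ancestor."""
    return round(2 * chromosome_count / diploid_count)


def offspring_ploidy(parent_a, parent_b):
    """Ploidy of offspring when each parent contributes half of its chromosome sets."""
    return parent_a // 2 + parent_b // 2


sterlet = inferred_ploidy(120)          # ~120 chromosomes, tetraploid
white_sturgeon = inferred_ploidy(240)   # ~240 chromosomes, octoploid
spontaneous = int(white_sturgeon * 1.5)  # ~1.5-fold chromatin increase, dodecaploid
backcross = offspring_ploidy(white_sturgeon, spontaneous)  # 8N x 12N cross
```

Under these assumptions the sterlet is 4N, the white sturgeon 8N, spontaneous autopolyploids 12N, and an 8N by 12N cross yields 10N offspring, matching the ploidy classes described above.
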
White sturgeon are the largest freshwater fish in North America,
reaching lengths up to 6.1 m, though lengths of 2 m are more common (Scott
& Crossman, 1973). As euryhaline fish, white sturgeon may be found
along the Pacific coast as far north as the Aleutian Islands and as far
south as northern Baja, though their current strongholds include the
Sacramento-San Joaquin, Columbia, and Fraser River Basins (Hildebrand et
al., 2016). Although the Columbia basin hosts the largest total
aggregation of white sturgeon, their distribution in this system is
broken into a number of de facto population segments by dams and
other river modifications that prevent almost all demographic exchange
(Hildebrand et al., 2016). Several of these river sections contain
populations classified by the US or Canada as threatened or endangered
and even more population segments are in decline due to recruitment
limitation resulting from habitat degradation (Hildebrand et al., 2016).
While conservation management plans have been developed for most white
sturgeon population segments, a lack of robust information about
historical and contemporary movement, population structure, and
recruitment patterns has hindered effective solutions for these fish,
which can take a decade or more to mature (Hildebrand et al., 2016).
Obtaining and utilizing genetic data, in particular, has been
challenging, as it has for many polyploid species (Anders et al., 2011). While
microsatellite markers for white sturgeon have been available for some
time (Rodzen, Famula, & May, 2004), the unclear or mixed segregation
patterns of these markers have made inferring robust genetic data
difficult (Clark & Schreier, 2017; Drauch Schreier et al., 2011). In
addition, although some researchers have achieved moderate success by
coding the polysomic data as pseudo-dominant disomic markers or by
using ploidy-agnostic analysis methods (Rodzen et al., 2004; A. Drauch
Schreier, Rodzen, Ireland, & May, 2012), this has nonetheless limited
the types of analyses available.
To remedy these limitations, we developed a set of single-nucleotide
polymorphism (SNP) markers using reduced representation genomic
libraries and tested the reliability of polysomic segregation patterns
by examining inheritance in known cross families and allele ratios in a
large sample of individuals. The SNP markers were developed for survey
with the ‘Genotyping-in-Thousands by sequencing’ or ‘GT-seq’ method (Campbell, Harmon,
& Narum, 2015), a multiplex amplicon-based method utilizing
massively-parallel sequencing to cost-effectively survey hundreds of
individuals simultaneously and providing read data approximately
proportional to allelic dosage. We provide updated scripts to
efficiently genotype polysomic individuals in a ploidy-aware manner by
incorporating the funkyPloid function from the R package
tripsAndDipR v0.2.0 (Delomas et al., submitted), which fits
beta-binomial mixture models to the sets of allele read counts and
compares the likelihoods of candidate ploidies. This permits each
individual to be genotyped in accordance with its inferred ploidy. We
demonstrate the utility of these SNPs to infer parentage/relatedness and
estimate population and individual-level genetic parameters using a
computer package specifically designed for polyploids, Polygene
(Kang Huang et al., 2020).
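
The idea of comparing the likelihoods of candidate ploidies from allele read counts can be sketched in Python as follows. This is a deliberately simplified illustration and not the funkyPloid implementation: it substitutes a plain binomial for the beta-binomial, gives every heterozygous dosage class equal mixture weight rather than estimating the weights, and ignores sequencing error and allelic bias. All function names and the example read counts are hypothetical.

```python
import math


def log_binom_pmf(k, n, p):
    """Log probability of k reference reads out of n at expected fraction p."""
    return math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log(1 - p)


def ploidy_log_likelihood(read_counts, ploidy):
    """Log-likelihood of heterozygous-locus read counts under one ploidy.

    read_counts: list of (reference_reads, total_reads) pairs.
    Each locus is an equal-weight mixture over dosages 1..ploidy-1,
    with expected reference-read fraction dosage / ploidy.
    """
    fractions = [dosage / ploidy for dosage in range(1, ploidy)]
    total = 0.0
    for ref, n in read_counts:
        terms = [log_binom_pmf(ref, n, f) for f in fractions]
        peak = max(terms)  # log-sum-exp for numerical stability
        mixture = peak + math.log(sum(math.exp(t - peak) for t in terms))
        total += mixture - math.log(len(fractions))
    return total


def most_likely_ploidy(read_counts, candidates=(8, 10, 12)):
    """Candidate ploidy with the highest log-likelihood."""
    return max(candidates, key=lambda p: ploidy_log_likelihood(read_counts, p))


# Hypothetical read counts whose allele ratios fall on multiples of 1/8.
example_counts = [(25, 200), (100, 200), (75, 200), (150, 200), (50, 200)]
best = most_likely_ploidy(example_counts)
```

For these example counts the 8N model is favored, because the observed allele ratios sit on multiples of 1/8 while the 10N and 12N models both spread their probability over more dosage classes and fit some loci less closely. A beta-binomial mixture with estimated weights, as described above, additionally accommodates overdispersion in real amplicon read data.
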