I. INTRODUCTION

Many key questions in evolutionary and conservation biology can only be addressed using genomic approaches and appropriate study species. Lake Trout (Salvelinus namaycush) are a top predator in many lentic ecosystems across northern North America and express exceptional levels of ecotypic variation (Muir et al. 2014; Muir et al. 2016), making them an ideal study species for exploring the processes of ecological speciation and adaptive diversification. The post-Pleistocene parallel evolution of diverse Lake Trout ecotypes has been likened to the adaptive radiation of cichlid species in the Great Lakes of East Africa (Muir et al. 2016); however, the radiation of Lake Trout ecotypes appears to have occurred over a relatively short evolutionary timescale (~8,000 years; Harris et al. 2015). At least three distinct Lake Trout ecotypes (lean, siscowet, and humper) once existed throughout the Laurentian Great Lakes (Hansen 1999), and anecdotal evidence suggests that as many as 10 easily differentiable forms once existed in Lake Superior (Goodier 1981). High levels of ecotypic variation have also been documented in contemporary populations across the species' range (Blackie et al. 2003; Zimmerman et al. 2006; Hansen et al. 2012; Chavarie et al. 2015), with as many as five trophic ecotypes being found in a single lake (Marin et al. 2016).
Lake Trout are also ancestrally autotetraploid, with the common ancestor of all salmonids having undergone a whole-genome duplication (WGD) event roughly 60-100 million years ago (Crête-Lafrenière et al. 2012; Macqueen and Johnston 2014). For this reason, salmonids have long been considered ideal study species for understanding the evolutionary consequences of WGD (Ohno 1970; Allendorf and Thorgaard 1984). Given the high levels of ecotypic diversity observed in Lake Trout, and the potential for WGD to facilitate the evolution of novel phenotypes (Ohno 1970; Macqueen and Johnston 2014; Van De Peer et al. 2017) and reproductive isolation (Lynch and Force 2000), research exploring the genetic basis for ecotypic differentiation and incipient speciation in Lake Trout could provide important insights about the role of relatively recent WGD events in adaptive radiations.
Furthermore, many Lake Trout populations, particularly those in the Laurentian Great Lakes, have been severely reduced in abundance or distribution, or extirpated, due to invasive species introductions and overfishing (Smith 1968). Following the basin-wide collapse of the lake whitefish (Coregonus clupeaformis) commercial fishery in the Great Lakes during the early 20th century, fishing pressure was transferred to Lake Trout populations, which contributed to population declines starting in the 1930s (Hansen 1999). A novel predator, the sea lamprey (Petromyzon marinus), also invaded the Great Lakes during this time, leading to further increases in adult Lake Trout mortality and functional extirpation from all lakes except Lake Superior and a small, isolated population in Lake Huron (Hansen 1999). The restoration program that followed focused largely on reducing sea lamprey predation, reducing fishing pressure, creating aquatic refuges, and stocking juvenile Lake Trout from a diverse collection of domesticated strains originating from multiple source populations (Krueger et al. 1983; Hansen 1999). Lake Trout populations in Lake Superior rebounded relatively quickly; however, the re-emergence of natural reproduction in other lakes was hindered by high levels of lamprey predation on adult Lake Trout (Pycha et al. 1980), predation on juveniles by invasive alewife (Madenjian et al. 2008), reduced juvenile survival caused by thiamine deficiency (Fitzsimmons et al. 2010), and potentially reduced hatching success associated with PCB contamination (Mac and Edsall 1991). Today, Lake Superior populations remain relatively stable and recruitment has been observed in lakes Huron (Riley et al. 2007), Michigan (Hanson et al. 2013), and Ontario (Lantry 2015). Recent research suggests that domesticated strains used for reintroduction have variable fitness in contemporary Great Lakes environments (Scribner et al. 2018; Larson et al. 2021) and may be differentially contributing to recent recruitment; however, the biological mechanisms that underlie these differences in fitness and recruitment remain unclear.
Genomic and transcriptomic approaches have been widely used to identify loci associated with adaptive diversity and ecotypic divergence in salmonids (Prince et al. 2017; Veale and Russelo 2017; Willoughby et al. 2018; Rougeux et al. 2019). This work has been partially driven by the publication of high-quality genome assemblies and linkage maps for numerous salmonid species (Gagnaire et al. 2013; Lien et al. 2016; Christensen et al. 2018a; Christensen et al. 2018b; Pearse et al. 2019; De-Kayne et al. 2020); however, genomic resources are notably lacking for Lake Trout. An annotated, chromosome-anchored genome assembly is arguably the most valuable resource for advancing genomic research on any species. A publicly available reference genome for Lake Trout would eliminate many challenges associated with conducting conservation-oriented genetic research aimed at restoring ecotypic diversity and viable wild populations. Until recently, the assembly of non-model eukaryotic genomes was prohibitively expensive, computationally challenging, and required the collaborative efforts of large genome consortia; however, the development of long-read (‘third generation’) sequencing technologies has substantially lowered these hurdles (Hotaling and Kelley 2020; Whibley et al. 2021).
Long-read sequencing data can be useful for scaffolding and filling gaps in existing, fragmented, short-read assemblies (English et al. 2012). A number of assembly algorithms also seek to assemble contigs directly from long-read sequencing data (Falcon, Chin et al. 2016; Canu, Koren et al. 2017; wtdbg2, Ruan and Li 2020), and recent work suggests that this approach can be highly effective for assembling chromosome-anchored salmonid genomes when combined with additional scaffolding information (De-Kayne et al. 2020; also see RefSeq: GCF_002021735.2).
Salmonid genomes are highly complex and relatively difficult to assemble owing to ancestral autotetraploidy (Macqueen and Johnston 2014) and high repeat content (Lien et al. 2016; De-Kayne et al. 2020; Kajitani et al. 2014). Sequencing low-diversity individuals from inbred lines or homozygous individuals produced via chromosome set manipulations provides one route for simplifying assembly in such species. Previous salmonid genome assemblies have made use of doubled haploid individuals (Lien et al. 2016; Christensen et al. 2018b; Pearse et al. 2019) because these individuals are theoretically homozygous at all loci (but see Lien et al. 2016). However, it should be noted that the highly contiguous assembly produced by De-Kayne et al. (2020) for European Whitefish (Coregonus sp. balchen) was produced using data from an outbred, wild-caught individual.
Here we present a chromosome-anchored reference genome for a doubled haploid Lake Trout that was assembled using Pacific Biosciences long-read sequencing data and scaffolded using a high-density linkage map (Smith et al. 2020) and genome-wide chromatin conformation capture followed by massively parallel sequencing (Hi-C). We also produced a number of complementary resources, including a custom repeat library, an interpolated recombination map, and a set of publicly available gene annotations, in order to facilitate additional research on this important species. Additionally, we identify Lake Trout homeologs resulting from the salmonid-specific autotetraploidization event (Ss4R) and establish homologous relationships with chromosomes from other salmonid species.