Genome sequencing, assembly, and post-processing
We generated 11 whole-genome sequences representing both New World marten species, including individuals collected in both known hybrid zones (Kuiu [KUI] and the northern Rocky Mountains [MTX]) and on multiple islands where marten were translocated (Prince of Wales Island, POW; Chichagof Island, CHI), with an Old World sable (Martes zibellina) included as an outgroup (Table 1). Sequences were generated on an Illumina HiSeq X through the Beijing Genomics Institute (BGI Americas, Philadelphia, PA, USA) and on a NextSeq 500 through the Molecular Biology Facility at the University of New Mexico. Sampling was based on previous genetic (Dawson et al. 2017; Colella et al. 2018a) and morphological analyses (Colella et al. 2018b) that helped define species limits and refine hybrid zone locations through the identification of mixed mitochondrial and nuclear haplotypes. Subsamples of liver tissue were loaned from the University of New Mexico’s Museum of Southwestern Biology (MSB) and the Burke Museum at the University of Washington (UWBM). DNA was extracted following the DNeasy Blood and Tissue Kit protocol (Qiagen, Venlo, The Netherlands). Our assembly pipeline followed Colella et al. (2018c). Read quality was examined using FastQC (Andrews 2010), adapter sequences were removed with Trimmomatic v0.33 (Bolger et al. 2014), and sex chromosomes were removed by excluding those scaffolds from the reference. The Burrows-Wheeler aligner (BWA; Li & Durbin 2010) was used to map reads to the domestic ferret genome (Mustela putorius furo; Peng et al. 2014), and an additional BWA iteration extracted mitochondrial genomes using the same reference. Final depth of coverage ranged from 19 to 30X (Table 1). PCR duplicates were removed using Picard v1.9 (MarkDuplicates; http://broadinstitute.github.io/picard/), and nuclear and mitochondrial consensus sequences were called using SAMtools (mpileup; Li et al. 2009).
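The exclusion of sex-chromosome scaffolds from the reference can be illustrated with a minimal Python sketch. This is not the pipeline of Colella et al. (2018c); the function names (`read_fasta`, `drop_scaffolds`) and the scaffold identifiers in the usage note are hypothetical, and the sketch assumes only that the scaffolds to exclude are known by name and that the reference is in FASTA format.

```python
from typing import Iterable, Iterator, List, Set, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_scaffolds(lines: Iterable[str], excluded: Set[str]) -> List[Tuple[str, str]]:
    """Return the FASTA records whose ID is not in `excluded`.

    The ID is taken to be the first whitespace-delimited word of the header
    (an assumption; some references encode IDs differently).
    """
    kept = []
    for header, seq in read_fasta(lines):
        scaffold_id = (header.split() or [""])[0]
        if scaffold_id not in excluded:
            kept.append((header, seq))
    return kept
```

For example, `drop_scaffolds(open("reference.fa"), {"scaffold_X"})` would return every record except the one named `scaffold_X`; both the file name and the scaffold name are placeholders.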
Single nucleotide polymorphisms (SNPs) were called with the Genome Analysis Toolkit (GATK, HaplotypeCaller; McKenna et al. 2010) for all North American marten and again against the M. zibellina outgroup. SNPs were filtered (Supplemental Information 1) by minimum depth (minDP = 2, set to 1/3rd the coverage of our lowest coverage sample, as recommended for PSMC analyses; Li & Durbin 2011), genotype quality (minGQ = 30), minimum minor allele frequency (MAF = 0.1), and scaffold size (1 Mb). Private alleles and indels were removed using VCFtools (Danecek et al. 2011). A MAF of 0.1 removed singletons (i.e., individual-specific, rare mutations), which are not informative about allelic overlap among populations, and reduced potential sequencing errors, which are more common in lower coverage genomes. Format conversions (vcf, ped, bed) were conducted in PLINK (Purcell et al. 2007). Missing data were removed (--max-missing, VCFtools) based on analysis specifications. Variants were spaced (1 per 100 bp window) to account for linkage disequilibrium and sorted into 46 ‘pseudo-chromosomes’ using custom Python scripts (available online at https://github.com/jpcolella/), enabling the application of human-specific analyses to a non-model system with only 38 chromosomes.
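Two of the filtering steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the custom scripts referenced above: the function names are hypothetical, genotypes are assumed to be diploid and biallelic (alleles coded 0/1), and "1 per 100 bp window" is interpreted as keeping the first variant in each non-overlapping window on each scaffold. The first function shows why a MAF threshold of 0.1 removes singletons: a single alternate allele among n diploid individuals has frequency 1/(2n), which falls below 0.1 whenever n > 5.

```python
from typing import Iterable, List, Optional, Sequence, Tuple

Genotype = Optional[Tuple[int, int]]  # (allele1, allele2); None if missing
Variant = Tuple[str, int]             # (scaffold name, 1-based position)


def minor_allele_frequency(genotypes: Sequence[Genotype]) -> float:
    """Minor allele frequency at one biallelic site, ignoring missing calls."""
    alleles = [a for gt in genotypes if gt is not None for a in gt]
    if not alleles:
        return 0.0
    alt_freq = sum(alleles) / len(alleles)
    return min(alt_freq, 1.0 - alt_freq)


def thin_variants(variants: Iterable[Variant], window: int = 100) -> List[Variant]:
    """Keep the first variant in each non-overlapping window on each scaffold."""
    seen = set()
    kept: List[Variant] = []
    for scaffold, pos in sorted(variants):
        key = (scaffold, (pos - 1) // window)  # window index for 1-based pos
        if key not in seen:
            seen.add(key)
            kept.append((scaffold, pos))
    return kept


# Hypothetical site: ten diploid individuals, one heterozygote (a singleton).
singleton_site = [(0, 1)] + [(0, 0)] * 9
passes_maf = minor_allele_frequency(singleton_site) >= 0.1  # False: 1/20 = 0.05
```

Window-based thinning of this kind differs slightly from distance-based thinning (which drops any site within a fixed distance of the previously retained site), so the two approaches can retain different sets of variants from the same input.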