Introgression analyses
For ADMIXTURE (Alexander et al. 2009) analyses, sites with >80% missing data were removed. The greatest delta (Δ) in cross-validation (cv) score identified the most appropriate number of populations (K) by iteratively leaving a sample out and reexamining the partitioning of genetic structure among the remaining samples. ADMIXTURE results were visualized in R v3.3.4. Populations identified by ADMIXTURE were used in F- statistics.
F -statistics were run in AdmixTools (Patterson et al.2012) using M. zibellina as an outgroup. F3- statistics [Target: Source-1, Source-2] explicitly test for admixture (3PopTest in AdmixTools) and considered all permutations where Source-1, Source-2, and the Target samples came from different populations. While a significantly negative F3 score (Z < 5) denotes admixture in the Target sample, a positive F3 value does not necessarily indicate the absence of admixture (Peter 2016). To determine the generational status (e.g., F1, F2, B1, B2, etc.) of each identified hybrid individual, we use R code available from Lavretsky et al.(2016) to simulate multi-generational hybrids based on unadmixedM. americana (‘POP1’) and M. caurina (‘POP2’). We contrasted admixture proportions of empirical hybrids against proportions output for simulated multi-generational hybrids to estimate generational status. Last, Treemix (Pickrell & Pritchard 2012) was used to infer historical relationships among populations with 2, 3 or 4 mixture events.
To characterize the backcrossing history of each hybrid sample, we usedF4- statistics, similar to D-statistics or ABBA/BABA (Kulathinalet al. 2009; Green et al. 2010; Durand et al.2011), in AdmixTools with block-jackknifing accommodating non-independence between loci. Although F -statistics alone cannot deduce the direction of gene flow in a system, admixture graph fitting can test whether a proposed evolutionary model fits the data well (Lipson et al. 2013; Martin et al. 2015). AdmixtureGraph (Mailund et al. 2016) iteratively fit hybrid individuals into two non-admixed tree topologies (RAxML topology and the RAxML topology collapsed into K = 6 populations; Supplemental Information 2-3) by estimating the minimal error placement from F4 results. We tested all F4 population permutations excluding hybrids identified through F3- statistics. We then tested hybrids against individuals from ‘pure’ populations (e.g., F4 (Outgroup, Hybrid; continentalamericana , insular caurina )) to decipher the backcrossing histories of hybrid samples and characterize patterns of gene flow across populations. F4 -statistics [W, X ; Y, Z] are negative (Z-score ≤ -5) if there is more allelic overlap between X and Y than between X and Z, and positive (Z score ≥ 5) if there has been more recent allele sharing between X and Z than between X and Y. To estimate the timing of introgressive events, we converted drift unit branch lengths (D ) output from MixMapper to absolute time (years) using the formula D ≈ 1- e-t/2Ne (solved for tgenerations; Puckett et al. 2015) and a generation time of 5 years (Buskirk et al. 2012). Small sample sizes and the absence of a Martes linkage map prevents linkage disequilibrium-based estimates of Ne and more refined dating of admixture events.