Introgression analyses
For ADMIXTURE (Alexander et al. 2009) analyses, sites with
>80% missing data were removed. The greatest delta (Δ) in
cross-validation (cv) score identified the most appropriate number of
populations (K) by iteratively leaving a sample out and reexamining the
partitioning of genetic structure among the remaining samples. ADMIXTURE
results were visualized in R v3.3.4. Populations identified by ADMIXTURE
were used in F- statistics.
F -statistics were run in AdmixTools (Patterson et al.2012) using M. zibellina as an outgroup. F3- statistics
[Target: Source-1, Source-2] explicitly test for admixture (3PopTest
in AdmixTools) and considered all permutations where Source-1, Source-2,
and the Target samples came from different populations. While a
significantly negative F3 score (Z < 5) denotes
admixture in the Target sample, a positive F3 value does not
necessarily indicate the absence of admixture (Peter 2016). To determine
the generational status (e.g., F1, F2, B1, B2, etc.) of each identified
hybrid individual, we use R code available from Lavretsky et al.(2016) to simulate multi-generational hybrids based on unadmixedM. americana (‘POP1’) and M. caurina (‘POP2’). We
contrasted admixture proportions of empirical hybrids against
proportions output for simulated multi-generational hybrids to estimate
generational status. Last, Treemix (Pickrell & Pritchard 2012) was used
to infer historical relationships among populations with 2, 3 or 4
mixture events.
To characterize the backcrossing history of each hybrid sample, we usedF4- statistics, similar to D-statistics or ABBA/BABA (Kulathinalet al. 2009; Green et al. 2010; Durand et al.2011), in AdmixTools with block-jackknifing accommodating
non-independence between loci. Although F -statistics alone cannot
deduce the direction of gene flow in a system, admixture graph fitting
can test whether a proposed evolutionary model fits the data well
(Lipson et al. 2013; Martin et al. 2015). AdmixtureGraph
(Mailund et al. 2016) iteratively fit hybrid individuals into two
non-admixed tree topologies (RAxML topology and the RAxML topology
collapsed into K = 6 populations; Supplemental Information 2-3) by
estimating the minimal error placement from F4 results. We tested
all F4 population permutations excluding hybrids identified
through F3- statistics. We then tested hybrids against individuals
from ‘pure’ populations (e.g., F4 (Outgroup, Hybrid; continentalamericana , insular caurina )) to decipher the backcrossing
histories of hybrid samples and characterize patterns of gene flow
across populations. F4 -statistics [W, X ; Y, Z] are negative
(Z-score ≤ -5) if there is more allelic overlap between X and Y than
between X and Z, and positive (Z score ≥ 5) if there has been more
recent allele sharing between X and Z than between X and Y. To estimate
the timing of introgressive events, we converted drift unit branch
lengths (D ) output from MixMapper to absolute time (years) using
the formula D ≈ 1- e-t/2Ne (solved for tgenerations; Puckett et al. 2015) and a generation time of 5
years (Buskirk et al. 2012). Small sample sizes and the absence
of a Martes linkage map prevents linkage disequilibrium-based
estimates of Ne and more refined dating of admixture
events.