2.4 | Demographic reconstruction and admixture
analyses
We made a consensus fastq file for each caribou and the reindeer from
their bam files, using the Samtools and BCFtools 1.5. This was converted
into an input file and run in Pairwise Sequentially Markovian Coalescent
(PSMC) model in PSMC (Li & Durbin, 2011) to investigate past effective
population size changes. These were plotted using the general mammal
mutation rate of 1.0E-9 (Li & Durbin, 2011) and a generation time of 7
years (COSEWIC 2014-2017).
To calculate admixture statistics, we used the R package admixr (Petr,
Vernot, & Kelso, 2019) to run ADMIXTOOLS (Patterson et al., 2012). We
converted our VCF file containing the Sitka deer (to use as an outgroup)
into EIGENSTRAT format using a C++ script (found here:
https://github.com/bodkan/vcf2eigenstrat). As the package does not
work when including more than 600 scaffolds, we filtered the dataset to
include SNPs found only on the 600 largest scaffolds, which encompassed
over 98% of the reference genome assembly (the scaffold L90 is 285;
Taylor et al., 2019). We used the EIGENSTRAT files to run f3, f4, and
f4-ratio statistics. See Reich, Thangaraj, Patterson, Price, and Singh
(2009) and Patterson et al., (2012) for full explanations of these
tests, but briefly, the f3 statistic is a three-population test that can
calculate whether population ‘C’ is a mixture of two other populations,
‘A’ and ‘B’. A negative f3 statistic indicates that population ‘C’ is a
mixture of ‘A’ and ‘B’. The f4 statistic is an ABBA BABBA test and acts
similarly to D statistics. It is a four-population test which requires a
phylogenetic set up including two sister groups, a test group to see if
introgression has occurred into one of the two sister groups, and an
outgroup (which we always set as the Sitka deer). An f4 statistic which
significantly differs from 0 indicates gene flow, whether it is positive
or negative tells you into which of the sister populations. In the
f4-ratio test, alpha is calculated, which is the proportion of the
genome in population ‘X’ that originates from population ‘B’ as opposed
to population ‘A’ (the proportion of population ‘A’ is calculated as 1
– alpha).
For these tests, we grouped the four barrenground genomes from Bluenose
and Qamanirjuaq as they show no differentiation and testing them
separately made no difference to the results. The four boreal caribou
genomes from Ontario and Manitoba were run separately as these do show
differentiation and grouping them did affect the outcome. We focussed on
using these tests to investigate 1) the amount of barrenground
introgression into eastern migratory caribou in Ontario/Manitoba and
Quebec/Labrador (f3, f4 and f4-ratio tests) separately as they have
non-overlapping ranges, 2) introgression between eastern migratory
caribou in Ontario/Manitoba and Quebec/Labrador (f4 test), 3)
introgression between boreal caribou of NAL origin and the mountain
caribou (f4 and f4-ratio tests for significant populations), and 4)
introgression between boreal caribou of NAL origin and the Northwest
Territories boreal caribou of BEL origin (f4 and f4-ratio tests), since
our goal was to investigate the potential role of adaptive introgression
in leading to parallel evolution.
We first investigated introgression from barrenground into the Manitoba
and Ontario boreal populations (f4 test), and due to its current
geographic isolation and low levels of introgression from the
barrenground lineage, we used Ignace as the representative NAL boreal
population. Similarly, to investigate introgression from the NAL into
the BEL, we used the Grant’s caribou as the sister group as these showed
the least amount of introgression from the NAL lineage (f4 test). In the
tests to investigate BEL introgression into the boreal caribou of NAL
origin, we used eastern migratory Quebec/Labrador caribou as the sister
group which had the lowest introgression from the BEL (f3, f4 and
f4-ratio tests). For the full set up our tests, see Supporting
Information.
To investigate patterns of introgression across the genome we used the
programme Dsuit (Malinsky, Matschiner, & Svardal, 2019). The
Dinvestigate function can be used to calculate introgression in windows
across the genome, and this was used to calculate the fDand fDM statistics (Malinsky et al., 2015) for sliding
windows of 1,000 SNPs incremented by 250 SNPs across the genome. We used
this programme to investigate introgression between NAL boreal caribou
and the mountain caribou as well as between NAL boreal caribou and BEL
boreal caribou to further investigate the process of parallel evolution.
Again, we used Ignace as the representative NAL boreal population and as
it is the most geographically isolated and has low levels of
introgression from BEL. Similarly, we found Grant’s caribou to show the
lowest levels of introgression from the NAL and so we used them as the
sister group into the BEL boreal and Columbia North caribou. The Sitka
deer was still used as the outgroup in all tests.
To further investigate the potential role of adaptive introgression in
the parallel evolution of the BEL boreal caribou (see Results) we
investigated the gene composition of the most introgressed regions
within the BEL boreal caribou identified as having originated from
Ignace. We compared these to the most introgressed regions from Ignace
into all mountain populations as adaptive introgression is unlikely to
have played a role in the parallel evolution of these populations due to
the uncovered patterns of introgression (see Results). To do this, we
extracted the sequences for all regions across the genome with an
fDM score over 0.2 (as it is the most conservative
statistic) using Bedtools 2.29 (Quinlan & Hall, 2010). To make sure the
sister group used in the set-up of the test didn’t bias the results, we
only included regions that were flagged as highly introgressed from the
NAL group when using both Grant’s and Peary caribou as the sister group.
We used the command line version of BLAST 2.6 (Altschul, Gish, Miller,
Myers, & Lipman, 1990) to search for the genes present in these
introgressed regions and genes with mRNA or predicted mRNA hits in at
least two species and with an E score of 0.