2.3 | Population and phylogenomic structure
We calculated the mean depth of coverage for each BAM file using
Samtools. After filtering, we measured the mean depth, the frequency of
missing data, and the inbreeding co-efficient, F, for each individual in
the final VCF file of 30 caribou plus the reindeer using VCFtools. We
performed principle component analyses in R 3.4.4 (R Development Core
Team, 2006) using the packages vcfR (Knaus & Grünwald, 2017) and
Adegenet (Jombart, 2008). The PCA was done on the VCF file containing
all caribou and the reindeer (but not the Sitka deer). We then ran
subsets of individuals on different PCA’s to gain higher resolution of
different lineages (see Results).
We used VCFkit (available here:
https://vcf-kit.readthedocs.io/en/latest/, using numpy 1.14 as the
programme does not work with newer versions) to generate a fasta file
using the ‘phylo fasta’ command. The programme concatenates SNPs for
each sample, using the first genotype of each allele, and replacing
missing values with an N. We ran this on the VCF file without the Sitka
deer to create an unrooted tree as including the Sitka deer pushed all
caribou too closely together to discern the branches. The resulting file
was input into RAxML 8 (Stamatakis, 2014), and run using the GTRGAMMA
model and 1,000 bootstrap replicates. We visualised the best tree in
FigTree 1.4.2 (https://github.com/rambaut/figtree). We also aligned each
of the conserved mammalian genes extracted from the genomes using BUSCO
(above) to construct phylogenies, from which we made a consensus tree.
We used the Sitka deer outgroup to root the tree. We used MUSCLE (Edgar,
2004) to align the sequences for each individual to create a combined
fasta file for each gene. We then used RAxML as above to create a gene
tree for each file, and then used ASTRAL-III (Zhang, Rabiee, Sayyari, &
Mirarab, 2018) to create a consensus tree which was visualised in
FigTree.
We used the populations module in Stacks 2.4.1 (Catchen, Hohenlohe,
Bassham, Amores, & Cresko, 2013) to convert our VCF files (both with
and without the Sitka deer) into an input file for Treemix 1.13
(Pickrell & Pritchard, 2012). We ran Treemix from 0-9 migration events,
with three iterations of each, grouping the SNPs in windows to account
for possible linkage using a block size of 1,000 for two of the
iterations and 5,000 for one of the iterations (because the OptM
package, below, must have different likelihood scores between
iterations). We plotted the resulting trees, and the residual plots, in
RStudio 1.0.136 (RStudio Team, 2015). We then used the R package OptM
(Fitak, In Review.) to calculate the second order rate of change in the
log-likelihood of the different migration events (the ad hoc statistic
delta M) to help infer how many migration events to visualize.