Processing of sequencing data and variant calling
We downloaded the forward sequencing reads of 26 human samples
(Supplemental Text S1), selected at random from the 1000 Genomes
Project samples for which whole-genome sequencing data were available.
We also retrieved parrot whole-genome sequencing data from a previous
study (McElroy et al., 2018); these datasets are listed in Supplemental
Text S1.
We downloaded a human mitochondrial reference (NC_012920.1) from
GenBank and assembled de novo the mitochondrial genome sequences for
the grasshopper and parrot using NOVOPlasty (Dierckxsens, Mardulyn, &
Smits, 2016) and GetOrganelle (Jin et al., 2020), respectively. We
annotated the grasshopper mitogenome with the MITOS web server (Bernt
et al., 2013).
The subsequent steps were identical for all data sets. We mapped
high-throughput sequencing reads with BWA (H. Li & Durbin, 2009) to the
appropriate mitochondrial genome assemblies, creating BAM files and
retaining unmapped reads. Using Samtools (H. Li et al., 2009), we then
sorted the alignment files and generated per-sample mapping statistics,
from which we extracted the total amount of read data and the number of
base pairs mapped (taking soft-clipping into account). We marked
duplicates with Picard Tools
(http://broadinstitute.github.io/picard/).
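To illustrate what "taking soft-clipping into account" means for the
mapped base count, the following is a minimal Python sketch, not the
pipeline used here. It counts aligned read bases from CIGAR strings and
excludes soft-clipped (S) and hard-clipped (H) bases. The function names
are hypothetical, and counting insertions (I) as mapped bases is an
assumption of this sketch.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_read_bases(cigar):
    """Count read bases that are aligned, ignoring clipped bases.

    M, =, X and I consume read bases and are counted as mapped here
    (including I is an assumption of this sketch). S and H are clipping
    and are excluded; D and N consume only the reference.
    """
    if cigar == "*":  # unmapped read: no alignment information
        return 0
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "M=XI")


def mapping_summary(records):
    """Summarise one sample from (read_length, cigar) pairs.

    Returns (total_bases, mapped_bases).
    """
    total = 0
    mapped = 0
    for read_length, cigar in records:
        total += read_length
        mapped += aligned_read_bases(cigar)
    return total, mapped
```

For a 100 bp read aligned as 5S90M5S, only the 90 matched bases
contribute to the mapped total, while all 100 bases contribute to the
total amount of read data.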
We then called single-nucleotide variants with Freebayes
(https://arxiv.org/abs/1207.3907) across all BAM files per
species. In order to retain apparent variants caused by NUMTs, we ran
Freebayes with the command line parameters “--haplotype-length -1
--min-alternate-fraction 0.01 --min-alternate-count 2
--pooled-continuous -p 1 -X -u -i”, counting the number of
occurrences of any nucleotide at any site in the alignments.
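As a sketch of how such a call could be assembled for one species, the
Python function below builds the Freebayes argument list from the
parameters given above. The function name and the file names are
hypothetical; passing the reference with -f and the BAM files as
positional arguments is an assumption about the invocation, which the
text does not specify.

```python
def freebayes_command(reference_fasta, bam_paths):
    """Build a Freebayes argument list for joint calling across BAM files."""
    cmd = [
        "freebayes",
        "-f", reference_fasta,                # reference FASTA (assumed here)
        "--haplotype-length", "-1",           # disable clumping into haplotypes
        "--min-alternate-fraction", "0.01",   # keep low-frequency alternates
        "--min-alternate-count", "2",         # at least two supporting reads
        "--pooled-continuous",                # frequency-based calling
        "-p", "1",                            # ploidy 1
        "-X",                                 # no multi-nucleotide polymorphisms
        "-u",                                 # no complex events
        "-i",                                 # no indels
    ]
    cmd.extend(bam_paths)                     # all BAM files of one species
    return cmd
```

The resulting list could be passed to subprocess.run with standard
output redirected to a VCF file, for example
freebayes_command("mito.fasta", ["s1.bam", "s2.bam"]).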
From the resulting VCF files (one per species), we extracted all
nucleotide allele counts for each variant site by means of an
interactive Python script that uses the scikit-allel package (version
1.3.2, https://github.com/cggh/scikit-allel). In each individual
and at each site, the most abundant allele was designated the genotype.
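
The genotype rule can be illustrated with a minimal Python sketch that
works on plain lists of allele counts rather than on the scikit-allel
data structures. The function names are hypothetical, and the handling
of ties (lowest allele index wins) and of sites without reads (no
genotype) are assumptions of this sketch, not rules stated in the text.

```python
def major_allele(counts):
    """Return the index of the most abundant allele at one site.

    counts holds read counts per allele, reference allele first.
    Returns None if no reads cover the site. On ties, the allele with
    the lowest index is returned (an assumption of this sketch).
    """
    if sum(counts) == 0:
        return None
    return max(range(len(counts)), key=counts.__getitem__)


def call_genotypes(site_counts):
    """Designate a genotype per individual at one variant site.

    site_counts maps each sample name to its per-allele read counts.
    """
    return {sample: major_allele(c) for sample, c in site_counts.items()}
```

For an individual with 3 reads supporting the reference allele and 10
supporting the first alternate allele, the first alternate allele
(index 1) would be designated the genotype.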