Processing of sequencing data and variant calling
We downloaded forward sequencing reads of 26 human samples (Supplemental Text S1), selected at random from the 1000 Genomes Project samples for which whole-genome sequencing data were available. We also retrieved parrot whole-genome sequencing data from a previous study (McElroy et al., 2018); these datasets are likewise listed in Supplemental Text S1.
We downloaded a human mitochondrial reference (NC_012920.1) from GenBank, and we assembled the mitochondrial genome sequences for the grasshopper and parrot de novo using NOVOPlasty (Dierckxsens, Mardulyn, & Smits, 2016) and GetOrganelle (Jin et al., 2020), respectively. We annotated the grasshopper mitogenome with the MITOS web server (Bernt et al., 2013).
The subsequent steps were identical for all data sets. We mapped high-throughput sequencing reads with BWA (H. Li & Durbin, 2009) to the appropriate mitochondrial genome assembly, creating BAM files and retaining unmapped reads. Using Samtools (H. Li et al., 2009), we then sorted the alignment files and generated per-sample mapping statistics, from which we extracted the total amount of read data and the number of base pairs mapped (taking soft-clipping into account). We marked duplicates with Picard Tools (http://broadinstitute.github.io/picard/). We then called single-nucleotide variants with Freebayes (https://arxiv.org/abs/1207.3907) jointly across all BAM files of each species. To retain apparent variants caused by NUMTs, we ran Freebayes with the command line parameters “--haplotype-length -1 --min-alternate-fraction 0.01 --min-alternate-count 2 --pooled-continuous -p 1 -X -u -i”, which counts the number of occurrences of any nucleotide at any site in the alignments.
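As a minimal sketch of how the variant-calling step could be scripted, the snippet below assembles a Freebayes command line from the parameters quoted above. The helper name `freebayes_command`, the file names, and the use of `-f` for the reference with the BAM files passed as positional arguments are illustrative assumptions and are not taken from the original pipeline.

```python
import shlex

# Parameters quoted in the text. The remarks on the short flags follow the
# Freebayes help text: -p sets ploidy, -X/-u/-i skip MNPs, complex events and indels.
NUMT_FLAGS = [
    "--haplotype-length", "-1",
    "--min-alternate-fraction", "0.01",
    "--min-alternate-count", "2",
    "--pooled-continuous",
    "-p", "1",
    "-X", "-u", "-i",
]


def freebayes_command(reference_fasta, bam_files):
    """Build an argument list for one joint Freebayes run per species.

    Hypothetical helper: the reference is passed with -f and the BAM files
    as positional arguments, which is an assumption about the invocation.
    """
    if not bam_files:
        raise ValueError("at least one BAM file is required")
    return ["freebayes", "-f", reference_fasta, *NUMT_FLAGS, *bam_files]


# Placeholder file names, for illustration only.
cmd = freebayes_command("mito_ref.fasta", ["sample1.bam", "sample2.bam"])
print(shlex.join(cmd))
```

The resulting list could be handed to `subprocess.run` with standard output redirected to a VCF file, giving one call and one VCF per species.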
From the resulting VCF files (one per species), we extracted the counts of all nucleotide alleles at each variant site using an interactive Python script that relies on the scikit-allel package (version 1.3.2, https://github.com/cggh/scikit-allel). For each individual and each site, the most abundant allele was designated the genotype.
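The genotype assignment rule can be illustrated with a short, dependency-free sketch. It assumes the per-sample allele counts have already been read from the VCF (for example with scikit-allel) into nested lists with the reference allele first. The function name, the data layout, the handling of sites without reads, and the tie-breaking rule (the lowest allele index wins) are assumptions made for illustration and are not details of the original script.

```python
def major_allele_genotypes(counts):
    """Return the index of the most abundant allele per site and sample.

    counts[site][sample] is a list of read counts per allele, reference first.
    Samples without any reads at a site get None (assumed convention).
    Ties are resolved in favour of the lowest allele index (assumed convention).
    """
    genotypes = []
    for site in counts:
        row = []
        for allele_counts in site:
            if not allele_counts or sum(allele_counts) == 0:
                row.append(None)  # no coverage: genotype left undetermined
            else:
                # list.index returns the first position of the maximum count
                row.append(allele_counts.index(max(allele_counts)))
        genotypes.append(row)
    return genotypes


# Toy data: two sites, two samples each.
example = [
    [[30, 2], [1, 25]],        # site 1: reference and one alternate allele
    [[0, 0, 0], [4, 4, 10]],   # site 2: reference and two alternate alleles
]
print(major_allele_genotypes(example))  # [[0, 1], [None, 2]]
```

In this toy input, the second sample carries the alternate allele at both sites, while the first sample has no reads at the second site and is left without a genotype there.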