Analyses of variant data and implementation of the method
We excluded individuals with excess missing data. The methods described above are implemented in R. The intercept method relies on a mixed-effect modelling approach, which limits the number of variant sites that can be analysed on a desktop computer. For each dataset, this method therefore subsamples 400 SNPs with a weighting proportional to each SNP’s average allele frequency (Po), which favours the most informative SNPs. This selection minimises the bias in the estimate, since these SNPs have the highest proportion of distinguishable reads (n/v; Figure 1). Our 95% confidence intervals are based on the standard deviation of the intercept estimates (‘standard error’ in R’s terminology); they correspond to the values 1.96 standard deviations on either side of the respective intercept estimate, transformed back into linear space.
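The two numerical steps in this paragraph can be illustrated with a short Python sketch. It is not the R implementation used in this study. The function names, the exponential-key scheme used for weighted sampling without replacement, and the choice of `math.exp` as the back-transformation are assumptions made for illustration only; where the intercept is estimated on a different scale, the matching inverse function should be passed instead.

```python
import math
import random


def subsample_snps(mean_allele_freqs, k=400, seed=None):
    """Draw up to k SNP indices without replacement, with selection
    weights proportional to each SNP's mean allele frequency.

    Uses exponential sort keys (an assumed scheme, not necessarily the
    one used in the R code): each SNP gets key = -log(u) / weight and
    the k smallest keys are kept.
    """
    rng = random.Random(seed)
    keyed = []
    for idx, weight in enumerate(mean_allele_freqs):
        if weight <= 0:
            continue  # a zero-weight SNP can never be selected
        u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is defined
        keyed.append((-math.log(u) / weight, idx))
    keyed.sort()
    return sorted(idx for _, idx in keyed[:k])


def back_transformed_ci(intercept, std_error, z=1.96, inverse=math.exp):
    """Return the interval intercept +/- z * std_error, mapped back to
    linear space with `inverse` (assumed here to be exp, i.e. a log scale)."""
    lower = inverse(intercept - z * std_error)
    upper = inverse(intercept + z * std_error)
    return lower, upper
```

For example, `subsample_snps(freqs, k=400, seed=1)` returns the indices of the retained SNPs, and `back_transformed_ci(estimate, se)` returns the lower and upper bounds of the interval in linear space. Because the bounds are transformed after the interval is formed, the resulting interval is generally asymmetric around the back-transformed estimate.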
To select sites for the diverged sites method, we carried out principal component analyses (PCA) on the variant data. In both the parrot and grasshopper datasets, the PCA identified two populations, arbitrarily designated populations A and B, carrying distinct mitochondrial genotypes. We then selected sites that showed fixed genotype differences between these populations to obtain the diverged sites estimates as described above.
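The site selection step can be sketched in Python as follows. This is an illustration under stated assumptions rather than the code used in this study: the function name, the genotype encoding (one code per individual, `None` for a missing call) and the working definition of a fixed difference (every called individual in population A shares one genotype, every called individual in population B shares another) are all hypothetical. Population membership is taken as given, for instance from the PCA clusters.

```python
def fixed_difference_sites(genotypes, pop_a, pop_b):
    """Return indices of sites at which populations A and B are each
    monomorphic for a genotype and the two genotypes differ.

    genotypes: list of sites, each a list of per-individual genotype
               codes, with None marking a missing call (assumed encoding).
    pop_a, pop_b: lists of individual indices for the two populations.
    """
    selected = []
    for site_idx, site in enumerate(genotypes):
        # Distinct genotypes observed in each population, ignoring missing calls
        a_calls = {site[i] for i in pop_a if site[i] is not None}
        b_calls = {site[i] for i in pop_b if site[i] is not None}
        # Require exactly one genotype per population, and that they differ
        if len(a_calls) == 1 and len(b_calls) == 1 and a_calls != b_calls:
            selected.append(site_idx)
    return selected
```

Sites at which one population has no called genotypes are skipped, since a fixed difference cannot be established there.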