Analyses of variant data and implementation of the method
We excluded individuals with excess missing data. The methods described
above are implemented in R. The intercept method relies on a
mixed-effect modelling approach, which limits the number of variant
sites that can be analysed on a desktop computer. It therefore
selects, per dataset, 400 of the most informative SNPs by weighted
subsampling, with weights proportional to each SNP’s average allele frequency
(Po ). This selection minimises the bias in the
estimate, since these SNPs have the highest proportion of
distinguishable reads (n /v ; Figure 1). Our 95% confidence
intervals are based on the standard deviation of the intercept estimates
(‘standard error’ in R’s terminology); they correspond to the values 1.96
standard deviations on either side of the respective intercept estimate,
transformed back into linear space.
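The subsampling and interval construction described in this paragraph can be illustrated with a minimal Python sketch; it is not the R implementation itself. Several details are assumptions made for illustration only: the function names are hypothetical, the weighted draw without replacement uses the Efraimidis-Spirakis key method rather than whatever routine the R code calls, and the back-transformation is taken to be the exponential (that is, the model is assumed to be fitted on a log scale). The inverse function is a parameter so that a different link can be substituted.

```python
import math
import random


def weighted_subsample(weights, k=400, seed=None):
    """Draw k distinct SNP indices, favouring SNPs with larger weights.

    Assumption: weights are the per-SNP average allele frequencies.
    Uses Efraimidis-Spirakis keys (u ** (1 / w)); the k largest keys win.
    SNPs with a weight of zero are never selected.
    """
    rng = random.Random(seed)
    keyed = []
    for index, weight in enumerate(weights):
        if weight > 0:
            keyed.append((rng.random() ** (1.0 / weight), index))
    if k > len(keyed):
        raise ValueError("fewer than k SNPs have a positive weight")
    keyed.sort(reverse=True)
    return sorted(index for _, index in keyed[:k])


def back_transformed_ci(estimate, std_error, inverse=math.exp, z=1.96):
    """Return (point, lower, upper) on the linear scale.

    The interval is estimate +/- z * std_error on the model scale,
    mapped back with `inverse` (assumed here to be exp).
    """
    lower = inverse(estimate - z * std_error)
    upper = inverse(estimate + z * std_error)
    return inverse(estimate), lower, upper


if __name__ == "__main__":
    # Hypothetical allele frequencies for 5,000 SNPs.
    rng = random.Random(0)
    frequencies = [rng.random() for _ in range(5000)]
    chosen = weighted_subsample(frequencies, k=400, seed=1)
    print(len(chosen), "SNPs selected")
    print(back_transformed_ci(math.log(0.02), 0.15))
```

Because the interval is symmetric on the model scale and then passed through a monotonic inverse, it is asymmetric around the point estimate on the linear scale.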
To select sites for the diverged sites method, we carried out principal
component analyses on the variant data. In both the parrot and
grasshopper datasets this PCA identified two populations, arbitrarily
designated populations A and B, carrying distinct mitochondrial
genotypes. We then selected sites that showed fixed genotype differences
between these populations to obtain the diverged sites estimates as
described above.
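The site selection for the diverged sites method can be sketched as follows. This is a hedged illustration in Python, not the code used for the analyses: the function name and data layout are hypothetical, the assignment of individuals to populations A and B is taken as given (for example from the PCA), and missing genotypes are assumed to be ignored as long as each population has at least one call at the site.

```python
def fixed_difference_sites(genotypes, population):
    """Return sites where populations A and B are fixed for different genotypes.

    genotypes:  dict mapping site -> dict mapping individual -> genotype
                (None marks a missing call; assumption for this sketch)
    population: dict mapping individual -> "A" or "B"
    """
    fixed = []
    for site, calls in genotypes.items():
        seen = {"A": set(), "B": set()}
        for individual, genotype in calls.items():
            label = population.get(individual)
            if genotype is not None and label in seen:
                seen[label].add(genotype)
        # Fixed difference: one genotype per population, and they differ.
        if len(seen["A"]) == 1 and len(seen["B"]) == 1 and seen["A"] != seen["B"]:
            fixed.append(site)
    return fixed


if __name__ == "__main__":
    # Hypothetical toy data: four individuals, three sites.
    population = {"i1": "A", "i2": "A", "i3": "B", "i4": "B"}
    genotypes = {
        "site1": {"i1": "0/0", "i2": "0/0", "i3": "1/1", "i4": "1/1"},
        "site2": {"i1": "0/0", "i2": "0/1", "i3": "1/1", "i4": "1/1"},
        "site3": {"i1": "0/0", "i2": None, "i3": "1/1", "i4": "1/1"},
    }
    print(fixed_difference_sites(genotypes, population))
```

In the toy data, the second site is rejected because population A is polymorphic there, while the third is retained despite one missing call.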