2.5.4 Network Analysis
Median-joining haplotype networks were constructed in POPART (Leigh & Bryant, 2015) with default settings. The analysis included 28 whole mitochondrial genomes: 14 from the NWA (four present-day and ten historical samples), 11 from the NEA (one historical sample, seven present-day samples from this study, and three from GenBank: MF409242, X72204 and a genome assembled from SRR5665644), one South Atlantic historical sample, one Antarctic historical sample and one sample of uncertain North Atlantic origin.
A median-joining network was also constructed for the control region, using the 28 samples described above together with an additional 126 partial mitochondrial control region sequences available in GenBank (Supplementary Information). Further haplogroups available in GenBank were excluded because of their shorter sequence length: the algorithm collapses sites that are missing or ambiguous in any sequence, so shorter sequences would reduce the number of sites compared. The consensus length of the control region sequence examined was 413 base pairs.
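The effect of missing or ambiguous sites on haplotype resolution can be illustrated with a minimal Python sketch. This is not POPART's implementation; it assumes, for illustration only, that any alignment column containing a non-ACGT character in at least one sequence is dropped before haplotypes are compared. The function names and the toy sequences are hypothetical.

```python
# Illustrative sketch only: not POPART's code. Assumes a column is dropped
# whenever any sequence has a gap or ambiguity code at that position.
VALID_BASES = set("ACGT")


def informative_columns(seqs):
    """Return indices of columns where every sequence has an unambiguous base."""
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    length = lengths.pop()
    return [
        i for i in range(length)
        if all(s[i].upper() in VALID_BASES for s in seqs.values())
    ]


def collapse_haplotypes(seqs):
    """Group sample names by their sequence over the retained columns."""
    cols = informative_columns(seqs)
    haplotypes = {}
    for name, s in seqs.items():
        key = "".join(s[i].upper() for i in cols)
        haplotypes.setdefault(key, []).append(name)
    return cols, haplotypes


# Hypothetical toy alignment: sample_C is shorter (padded with gaps).
toy = {
    "sample_A": "ACGTACGT",
    "sample_B": "ACGTACGA",
    "sample_C": "ACGTAC--",
}

# With the short sequence included, the only variable site is masked.
cols_all, haps_all = collapse_haplotypes(toy)

# Without it, all columns are retained and A and B are distinguished.
long_only = {k: v for k, v in toy.items() if k != "sample_C"}
cols_long, haps_long = collapse_haplotypes(long_only)

print(len(cols_all), len(haps_all))
print(len(cols_long), len(haps_long))
```

In this toy case, including the short sequence masks the one site that separates the two longer sequences, which is the kind of loss of resolution that motivates restricting the analysis to sequences of sufficient length.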
2.6 Heterozygosity
To estimate heterozygosity, ten present-day NA blue whale samples with sequence coverage of >20X were aligned to masked autosomal contigs of NA blue whales as described above, and the resulting alignments were analyzed with MLRHO (Haubold et al., 2010) using default settings. MLRHO gives a maximum likelihood estimate of the population mutation rate (θ = 4Neµ) from individual whole genome sequencing data, which approximates expected heterozygosity under the infinite sites model. The heterozygosity of the Antarctic historical sample (~6X) was estimated with ANGSD from the site frequency spectrum (SFS) under the infinite sites model. The trimmed paired-end reads from the Antarctic sample were aligned to the same NA blue whale masked autosomal contigs, and the ANGSD analysis was performed with filtering for quality score >20 and mapping quality >20, and with the options -noTrans (to exclude transitions and thereby the effects of cytosine deamination in historical samples) and -fold 1.
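As a rough illustration of how heterozygosity can be read from a single-sample SFS, the Python sketch below divides the number of heterozygous sites by the total number of sites. It assumes the spectrum has already been reduced to a list of expected site counts in which the first entry is homozygous sites and the second is heterozygous sites; the function name and the toy counts are hypothetical and do not come from this study.

```python
# Illustrative sketch only. Assumes a single diploid sample whose folded SFS
# is given as expected site counts: [homozygous, heterozygous, ...].
def heterozygosity_from_sfs(sfs):
    """Return the proportion of heterozygous sites in a single-sample SFS."""
    if len(sfs) < 2:
        raise ValueError("SFS needs at least two categories")
    total = sum(sfs)
    if total <= 0:
        raise ValueError("SFS must contain a positive number of sites")
    # Second category: sites carrying one copy of each allele.
    return sfs[1] / total


# Hypothetical counts, not data from the study.
toy_sfs = [999000.0, 1000.0, 0.0]
h = heterozygosity_from_sfs(toy_sfs)
print(f"{h:.6f}")
```

Under the infinite sites model, this per-site proportion is directly comparable to the θ estimate obtained from MLRHO for the higher-coverage samples.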