2.5.4 Network Analysis
Haplotype median-joining networks were created using the program POPART
(Leigh & Bryant, 2015) with the default settings. The networks were built
from 28 whole mitochondrial genomes: 14 from the NWA (four present-day and
ten historical samples), 11 from the NEA (one historical sample, seven
present-day samples from this study, and three public sequences: GenBank
MF409242 and X72204, and one assembled from SRR5665644), one South
Atlantic historical sample, one Antarctic historical sample, and one
sample of uncertain North Atlantic origin.
The median-joining network for the control region was examined for the
28 samples described above along with an additional 126 mitochondrial
partial control region sequences available in GenBank (Supplementary
Information). Additional haplogroups available from GenBank were
excluded because of their shorter sequence length, since the algorithm
collapses sites that are missing or ambiguous. The consensus length of the control
region sequence examined was 413 base pairs.
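The effect of collapsing missing or ambiguous sites can be illustrated with a minimal Python sketch. It is not POPART's implementation; the function name and the toy sequences are hypothetical, and it assumes that a column is dropped as soon as any sequence lacks an unambiguous base there. Under that assumption, adding a shorter sequence shortens the alignment for every sample, which is why short GenBank entries reduce the usable length.

```python
def drop_incomplete_sites(seqs):
    """Keep only alignment columns where every sequence has A, C, G or T.

    `seqs` maps a sequence name to an aligned sequence string.
    Hypothetical helper for illustration; not part of POPART.
    """
    valid = set("ACGT")
    length = len(next(iter(seqs.values())))
    if any(len(s) != length for s in seqs.values()):
        raise ValueError("sequences must be aligned to the same length")
    # A column survives only if no sequence has a gap or ambiguity code there.
    keep = [
        i for i in range(length)
        if all(s[i].upper() in valid for s in seqs.values())
    ]
    return {name: "".join(s[i] for i in keep) for name, s in seqs.items()}


# Toy alignment (made-up data): column 3 holds an N and a gap.
toy = {
    "hapA": "ACGTAC",
    "hapB": "ACNTAC",
    "hapC": "AC-TGC",
}
trimmed = drop_incomplete_sites(toy)
for name, seq in trimmed.items():
    print(name, seq)
```

In this toy case the third column is removed for all three sequences, so the informative difference in the fifth column is retained while the alignment becomes one site shorter.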
2.6 Heterozygosity
To estimate heterozygosity, ten present-day NA blue whale samples with
sequence coverage of >20X were aligned to masked autosomal
contigs of NA blue whales as described above and the resulting
alignments were analyzed with default settings of MLRHO (Haubold et al.,
2010). MLRHO gives a maximum likelihood estimate of the population
mutation rate (4Neµ) from individual whole genome sequencing data, which
approximates expected heterozygosity under the infinite sites model. The
heterozygosity of the Antarctic historical sample (~6X coverage) was
estimated with ANGSD from the site frequency spectrum (SFS) under the
infinite sites model. The trimmed paired-end reads from the Antarctic
sample were also aligned to the NA blue whale masked autosomal contigs
and the ANGSD analysis was run with filters for a quality score of >20
and a mapping quality of >20, and with the options -noTrans (to exclude
transitions, which can result from cytosine deamination in historical
samples) and -fold 1.
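For a single diploid sample, an SFS-based heterozygosity is commonly obtained by dividing the expected number of heterozygous sites by the total number of sites in the spectrum. The Python sketch below shows that arithmetic, assuming the SFS has already been estimated and is available as a list of expected site counts whose second entry is the heterozygous class. The function name and the example counts are hypothetical and are not values from this study.

```python
def heterozygosity_from_sfs(sfs):
    """Return the proportion of heterozygous sites in a single-sample SFS.

    `sfs` is a list of expected site counts for one diploid sample:
    sfs[0] = sites with no minor allele, sfs[1] = heterozygous sites.
    With a folded spectrum any further entry is expected to be zero.
    """
    if len(sfs) < 2:
        raise ValueError("expected at least two SFS categories")
    total = sum(sfs)
    if total <= 0:
        raise ValueError("SFS must contain a positive number of sites")
    return sfs[1] / total


# Made-up counts for illustration only.
example_sfs = [999000.0, 1000.0, 0.0]
h = heterozygosity_from_sfs(example_sfs)
print(f"heterozygosity = {h:.4f}")
```

Because transitions are excluded in the analysis described above, a value computed this way reflects transversions only and is not directly comparable to an estimate based on all substitution types.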