2.5.1 Principal component analysis
PCA was performed using LASER v2.04 (Wang et al., 2015), which uses projection Procrustes analysis to place samples with low depth of coverage in a reference PCA space constructed from the genotypes of a set of reference individuals with higher coverage depth. The first PCA included blue, fin and sei whale samples: 18 NA blue whales (four present-day and six historical samples from the NWA, and eight present-day samples from the NEA); one historical sample from an Antarctic blue whale (Table 1); seven present-day fin whales from the NA; and two sei whales (SRR5665645 and SRR5665646). The historical samples NWa-R4, NWa3, NWa4, NWa5, NWa6 and NWa-CM1 were included in this analysis. The second PCA visualized the genetic relationships among blue whales and included 12 present-day blue whales sampled from both sides of the NA, three historical samples from the NWA (NWa-R4, NWa3 and NWa4) and the historical sample from Antarctica.

The trimmed sequences of the blue, fin and sei whales were aligned to the autosomes of the assembled NA blue whale genome, variants were called and VCF files were generated. Biallelic SNPs were retained for the PCA if they were present in at least 50% of the samples, were >10 bases apart, and had a quality score >30, mapping quality >30, coverage depth between 3X and 130X and MAF >0.1. Sites were then filtered for linkage disequilibrium by removing sites with a correlation coefficient (r²) >0.8 within a 1 kb window, leaving 4,136,458 and 2,620,383 sites for the first and second PCA, respectively.

The ancestry reference PCA space for the first PCA was constructed using 11 present-day NA blue whales with >20X coverage (Table 1), two present-day fin whales with >20X coverage and two sei whales with ~10X coverage. The reference PCA space for the second PCA was computed using four NWA and six NEA present-day blue whales with high coverage and one sei whale (SRR5665645).
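The site-level filters described above can be illustrated with a minimal Python sketch. This is not the pipeline used in the study; it only restates the stated thresholds as code. The `Site` record, its field names and the `filter_sites` helper are hypothetical. The sketch also makes two assumptions that the text does not specify: the depth bounds are applied to a single per-site depth value, and the ">10 bases apart" criterion is applied by keeping a site only if it lies more than 10 bases from the previously retained site on the same chromosome. Linkage disequilibrium pruning is not included.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Site:
    """Per-site summary. Field names are hypothetical, not taken from any tool."""
    chrom: str
    pos: int          # 1-based position on the chromosome
    n_alleles: int    # number of alleles observed (2 = biallelic)
    call_rate: float  # fraction of samples with a genotype call
    qual: float       # variant quality score
    map_qual: float   # mapping quality
    depth: float      # coverage depth (assumed: one summary value per site)
    maf: float        # minor allele frequency


def passes_thresholds(s: Site) -> bool:
    """Apply the per-site thresholds stated in the text."""
    return (
        s.n_alleles == 2          # biallelic SNPs only
        and s.call_rate >= 0.5    # present in at least 50% of samples
        and s.qual > 30
        and s.map_qual > 30
        and 3 <= s.depth <= 130   # assumed inclusive bounds
        and s.maf > 0.1
    )


def filter_sites(sites: Iterable[Site], min_gap: int = 10) -> List[Site]:
    """Keep passing sites lying more than min_gap bases from the last kept site.

    Assumes the input is sorted by position within each chromosome.
    """
    kept: List[Site] = []
    last_pos: Dict[str, int] = {}  # last retained position per chromosome
    for s in sites:
        if not passes_thresholds(s):
            continue
        prev = last_pos.get(s.chrom)
        if prev is not None and s.pos - prev <= min_gap:
            continue  # too close to the previously retained site
        kept.append(s)
        last_pos[s.chrom] = s.pos
    return kept


# Toy input with made-up values, for illustration only.
example = [
    Site("chr1", 100, 2, 0.9, 50.0, 60.0, 25.0, 0.30),
    Site("chr1", 105, 2, 0.9, 50.0, 60.0, 25.0, 0.30),   # within 10 bases of 100
    Site("chr1", 111, 2, 0.9, 50.0, 60.0, 25.0, 0.30),
    Site("chr1", 200, 2, 0.9, 50.0, 60.0, 25.0, 0.05),   # MAF too low
    Site("chr2", 105, 2, 0.6, 45.0, 40.0, 10.0, 0.20),
    Site("chr2", 300, 2, 0.9, 50.0, 60.0, 150.0, 0.30),  # depth too high
]
kept = filter_sites(example)
```

In practice these filters would normally be applied with dedicated VCF tools rather than custom code; the sketch is only meant to make the order and meaning of the thresholds explicit.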