3.1 Genome assembly.
The NA blue whale genome assembly of 2.49 Gbp was generated from paired-end Illumina HiSeq X (~110X) and PacBio (~50X) sequencing of DNA from a female blue whale (NW-M6, Table 1) that washed ashore in Newfoundland in 2014 (Fig. 1). The assembly comprised 11,400 contigs with an N50 of 1.46 Mb (L50 of 449). Assembly completeness was assessed with BUSCO (Simão et al., 2015), which recovered 94.8% complete and 2.6% fragmented genes out of the 4,104 reference mammalian single-copy genes tested. A total of 255 sex chromosome-linked contigs were identified, as well as one contig that aligned to the blue whale mitochondrial DNA (Árnason & Gullberg, 1993). The genome size was estimated from the Illumina reads to be ~2.7 Gb, indicating that 92.6% of the genome was assembled.
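For readers unfamiliar with the contiguity statistics reported above, the following is a minimal sketch of how N50 and L50 are conventionally derived from a list of contig lengths. The function name n50_l50 and the toy contig lengths are hypothetical and serve only as illustration; this is not the software used to produce the values reported here.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a collection of contig lengths.

    N50 is the length of the shortest contig in the smallest set of
    longest contigs that together cover at least half of the assembly;
    L50 is the number of contigs in that set.
    """
    lengths = sorted(contig_lengths, reverse=True)  # longest first
    half_total = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise ValueError("contig_lengths must not be empty")


# Toy example (hypothetical lengths, not data from this study):
# total = 300, so half = 150; the two longest contigs (80 + 70) reach it.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = n50_l50(toy_contigs)
print(f"N50 = {n50}, L50 = {l50}")  # N50 = 70, L50 = 2
```

Under this definition, an N50 of 1.46 Mb with an L50 of 449 means that the 449 longest contigs, each at least 1.46 Mb long, together account for at least half of the assembled sequence.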
Scanning the blue whale genome for repeats showed that 46.2% of the genome consisted of repetitive elements. The most abundant transposable elements were LINEs (long interspersed nuclear elements), comprising 23.21% of the genome, and SINEs (short interspersed nuclear elements), constituting about 6.86% of the genome. The other repeat classes were LTR elements (long terminal repeats, 6.05%), DNA elements (3.53%), unclassified elements (0.05%), small RNA (3.70%), satellites (3.98%), simple repeats (1.00%), low-complexity sequences (0.2%) and de novo repeats (1.33%).
Transcriptome assembly annotation predicted 30,867 protein-coding genes, of which 25,736 were functionally annotated (Fig. 1). A quality assessment of the annotation showed that 65.7% of the predicted proteins contained known Pfam (Finn et al., 2010) domains.