3.1 Genome assembly.
The NA blue whale genome assembly of 2.49 Gbp was generated from
paired-end Illumina HiSeq X (~110X) and PacBio (~50X) sequencing data,
using DNA from a female blue whale (NW-M6, Table 1) that washed ashore
in Newfoundland in 2014 (Fig. 1). The assembly comprised 11,400 contigs
with an N50 of 1.46 Mb (L50 of 449). The completeness of the assembly
was assessed using
BUSCO (Simão et al., 2015) analysis that showed 94.8% complete genes
and 2.6% fragmented genes of the 4,104 reference mammalian single-copy
genes tested. A total of 255 sex chromosome-linked contigs were
identified, as well as one contig that aligned to the blue whale
mitochondrial DNA (Árnason & Gullberg, 1993). The genome size was
estimated from Illumina reads to be ~2.7 Gb, indicating that 92.6% of
the genome was assembled.
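For readers unfamiliar with the contiguity statistics quoted above, the
following is a minimal sketch of how N50 and L50 are conventionally
computed from a list of contig lengths. The function name n50_l50 and
the toy lengths are illustrative assumptions; they are not the pipeline
or the data used for this assembly.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig lengths.

    N50 is the length of the contig at which the running total of
    lengths, taken from longest to shortest, first reaches half of the
    total assembly length. L50 is the number of contigs needed to reach
    that point.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count


# Toy contig lengths (hypothetical, arbitrary units); total is 300.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = n50_l50(toy_lengths)
print(n50, l50)  # prints "70 2": the two longest contigs cover 150 of 300
```

Applied to the full set of contig lengths of an assembly, the same
calculation yields values of the kind reported above (N50 in base
pairs, L50 as a contig count).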
Upon scanning the blue whale genome for repeats, we found that 46.2% of
the genome consisted of repetitive elements. The most abundant
transposable elements were LINEs (long interspersed nuclear elements),
comprising 23.21% of the genome, and SINEs (short interspersed nuclear
elements), constituting about 6.86% of the genome. The other repeat
classes included LTR elements (long terminal repeats, 6.05%), DNA
elements (3.53%), unclassified repeats (0.05%), small RNA (3.70%),
satellites (3.98%), simple repeats (1.00%), low-complexity sequences
(0.2%) and de novo repeats (1.33%).
Transcriptome assembly annotation predicted 30,867 protein-coding genes,
of which 25,736 were functionally annotated (Fig. 1). Assessment of the
annotation quality showed that 65.7% of the predicted proteins contained
known Pfam (Finn et al., 2010) domains.