2.1 Genome assembly and annotation
Muscle tissue for the genome assembly was collected by the Royal Ontario Museum (ROM), Toronto, ON, from a female blue whale that died close to Newfoundland in 2014 (NW-M6, Fig. 1, Table 1), with the approval of the Minister of Fisheries and Oceans, Canada (SARA permit ref: NLSAR-003-14). The Illumina and PacBio reads for the genome assembly were generated at The Centre for Applied Genomics (TCAG), The Hospital for Sick Children, Toronto, Canada. The genome was assembled using the hybrid assembler MASURCA v3.2.8 (Koren et al., 2012; see Supplemental Information for more details).
RNA for the transcriptome assembly was obtained from a skin biopsy of a blue whale sampled in the Svalbard Archipelago (79°N), Norway (Fig. 1). The paired-end RNA-seq data were generated on a HiSeq 2500 at TCAG. The transcripts were assembled with TRINITY (Grabherr et al., 2011) and TOPHAT (Trapnell et al., 2009), as described in Supplemental Information. The masked genome was annotated using the MAKER2 pipeline (Holt & Yandell, 2011), with the blue whale transcriptome and NCBI proteins for cow and all cetaceans as input, as explained in Supplemental Information. The predicted genes were functionally annotated from BLASTp (Altschul et al., 1997) hits to UNIPROT (UniProt Consortium, 2015) with an E-value of <1e-6.
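The E-value filter in the functional annotation step can be illustrated with a minimal Python sketch. It is not the authors' script. It assumes the BLASTp results were written in the standard 12-column tabular format (BLAST+ -outfmt 6), and it assumes that the highest-scoring hit per predicted protein is the one retained, which the text does not state. The function name best_hits and the example records are hypothetical.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(handle, max_evalue=1e-6):
    """Return {query_id: (subject_id, evalue, bitscore)}.

    Hits with an E-value >= max_evalue are discarded (threshold from the
    text: E-value < 1e-6). Keeping only the hit with the highest bit score
    per query is an assumption made for this sketch.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        rec = dict(zip(BLAST6_COLUMNS, row))
        evalue = float(rec["evalue"])
        bitscore = float(rec["bitscore"])
        if evalue >= max_evalue:
            continue
        current = best.get(rec["qseqid"])
        if current is None or bitscore > current[2]:
            best[rec["qseqid"]] = (rec["sseqid"], evalue, bitscore)
    return best


# Hypothetical records, for illustration only (not data from the study).
example = io.StringIO(
    "gene1\tsp|P00001|A_BOVIN\t95.0\t200\t10\t0\t1\t200\t1\t200\t1e-50\t400\n"
    "gene1\tsp|P00002|B_BOVIN\t60.0\t180\t72\t0\t1\t180\t1\t180\t1e-10\t150\n"
    "gene2\tsp|P00003|C_BOVIN\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.001\t30\n"
)
annotations = best_hits(example)
```

With these example records, gene1 is annotated from its higher-scoring hit, and gene2 receives no annotation because its only hit has an E-value above the threshold.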