Genome assembly, polishing, and draft quality checks
Whole genomic DNA was obtained from two male tawny owls; DNA extraction
(from nucleated red blood cells), library preparation, and whole-genome
sequencing were outsourced to BGI. Sequencing consisted of PacBio RS II
Single Molecule, Real-Time (SMRT®) sequencing with high-accuracy
circular library construction (CCS, 40-60 kb) (Eid et al., 2009). Read
polishing at this stage included removal of SMRT adapters and clustering
of redundant subreads sequenced from the same circular molecule into
single reads of insert (ROI). Genome assembly was performed with Flye
(Kolmogorov, Yuan, Lin, & Pevzner, 2019). Flye uses a repeat graph as its
core data structure, as opposed to the De Bruijn graphs most commonly
used in short-read and hybrid assemblies. Repeat graphs do not require
exact k-mer matches; they are built from approximate sequence matches and
therefore tolerate the high noise of single-molecule sequencing reads
such as those from PacBio. Flye's main
parameters were set to the default minimum overlap of 5000 base pairs (bp)
between reads, while enforcing a reduced coverage of 20x for the initial
disjointig assembly, so that only a subset of the longest reads amounting
to 20x coverage was used to initiate the process. To explore how the
enforced overlap changes
the assembly quality, we performed one assembly with the minimum overlap
forced to 1000 bp between reads. Lastly, we replicated each assembly to
check the consistency of the algorithm and the variance of the assembly
statistics.
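The two assembly configurations and their replicates described above can be sketched as command lines. The snippet below is a minimal illustration, not the exact invocation used in this study. The read mode (`--pacbio-raw`), the genome-size value, the file and directory names, the thread count, and the number of replicates are assumptions made for the example; only the 20x reduced coverage and the 1000 bp overlap come from the text. Leaving `--min-overlap` unset keeps Flye's default behaviour.

```python
import shlex
from typing import List, Optional


def flye_command(
    reads: str,
    out_dir: str,
    genome_size: str,
    asm_coverage: int = 20,            # reduced coverage for the initial disjointig assembly
    min_overlap: Optional[int] = None,  # None leaves Flye's default overlap in place
    read_mode: str = "--pacbio-raw",   # assumption: the read mode is not stated in the text
    threads: int = 8,                  # assumption: arbitrary thread count
) -> List[str]:
    """Build (but do not run) a Flye command line as a list of arguments."""
    cmd = [
        "flye", read_mode, reads,
        "--out-dir", out_dir,
        # A genome size estimate accompanies --asm-coverage.
        "--genome-size", genome_size,
        "--asm-coverage", str(asm_coverage),
        "--threads", str(threads),
    ]
    if min_overlap is not None:
        cmd += ["--min-overlap", str(min_overlap)]
    return cmd


def planned_runs(reads: str, genome_size: str, replicates: int = 2) -> List[List[str]]:
    """List every run: two overlap settings, each repeated `replicates` times."""
    runs = []
    for label, overlap in (("default_overlap", None), ("overlap_1000", 1000)):
        for rep in range(1, replicates + 1):
            out_dir = f"flye_{label}_rep{rep}"  # hypothetical directory naming
            runs.append(flye_command(reads, out_dir, genome_size, min_overlap=overlap))
    return runs


if __name__ == "__main__":
    # Placeholder input name and genome size, for illustration only.
    for cmd in planned_runs("tawny_owl_reads.fastq.gz", "1.2g"):
        print(shlex.join(cmd))
```

Building the commands as argument lists keeps each replicate in its own output directory, so that the resulting assembly statistics can be compared run by run.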
Despite Flye having a built-in polishing step, we further polished the
assemblies with PacBio's polishing pipeline, gcpp and pbmm2
(https://github.com/PacificBiosciences/pbbioconda). All assemblies were
compared with QUAST (Gurevich, Saveliev, Vyahhi, & Tesler, 2013), and we
chose the most contiguous and complete assembly with the highest coverage
as the future reference genome. Taxon-specific completeness of the chosen
draft assembly was verified with BUSCO, using aves_odb as the
database of coding regions, with the northern spotted owl
(Strix occidentalis caurina), burrowing owl (Athene cunicularia), and
barn owl (Tyto alba) assemblies as terms of comparison (Simão,
Waterhouse, Ioannidis, Kriventseva, & Zdobnov, 2015). Repetitive
elements were identified and masked with RepeatMasker version
4.1.2-p1, using the HMM-Dfam_3.3 database updated in November
2020 (Chen, 2004). Genome versions utilized in this analysis can be
consulted in the supplemental information document.
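
As an illustration of the contiguity comparison between assemblies, the sketch below computes N50, one of the contiguity statistics that QUAST reports, from contig lengths. It is a simplified stand-in for a single QUAST metric and does not replace QUAST. The choice of N50 and the helper names `contig_lengths` and `n50` are assumptions made for this example; the text does not state which statistics drove the selection.

```python
from typing import Iterable, List


def contig_lengths(fasta_lines: Iterable[str]) -> List[int]:
    """Return the length of every record in FASTA-formatted lines."""
    lengths: List[int] = []
    current = None  # length of the record being read, None before the first header
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: Iterable[int]) -> int:
    """Length of the contig at which the running sum of lengths, taken from
    longest to shortest, first reaches half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contig lengths given")


if __name__ == "__main__":
    # Toy records, not real data.
    toy = [">contig1", "ACGTACGT", ">contig2", "ACGT", "AC", ">contig3", "GG"]
    print(n50(contig_lengths(toy)))
```

With real assemblies, the lines would come from an open FASTA file, for example `n50(contig_lengths(open(path)))`, where `path` is a hypothetical file name.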