2.3 Genome sequencing and assembly
Illumina sequencing was performed to evaluate genome size,
heterozygosity and duplication rate, and to polish the de novo assembly. A
paired-end library (insert size: 350 bp) was constructed and sequenced on
the Illumina NovaSeq platform. The raw data were filtered with FASTQC.
After filtering, we obtained a total of 112.81 Gb of clean data,
corresponding to 176× sequence coverage.
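
The relation between data yield and sequence coverage can be illustrated with a short calculation. This is only a sketch: it assumes that coverage is defined as total clean bases divided by haploid genome size and that 1 Gb equals 10^9 bases. The function names are hypothetical, and the genome size it prints is merely the value implied by the two figures quoted above, not a reported result.

```python
def coverage(total_bases: float, genome_size: float) -> float:
    """Mean sequence coverage = total sequenced bases / genome size (in bases)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def implied_genome_size(total_bases: float, depth: float) -> float:
    """Genome size (in bases) implied by a data yield and a stated coverage."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return total_bases / depth


# Figures quoted in the text; 1 Gb = 1e9 bases is an assumption.
clean_bases = 112.81e9
reported_depth = 176.0

size = implied_genome_size(clean_bases, reported_depth)
print(f"Implied genome size: {size / 1e9:.3f} Gb")  # prints 0.641 Gb
```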
High-quality genomic DNA was fragmented to construct a PCR-free SMRTbell
library. After the library was checked and qualified with Qubit 3.0 and
Agilent 2100, it was sequenced by SMRT cell sequencing on the PacBio
Sequel Ⅱ platform (Pacific Biosciences) to a mean depth of 186.17×.
We obtained a total of 169.37 Gb of clean data after filtering and
7,960,820 subreads (mean subread length: 21,275.65 bp; subread length
N50: 31,540 bp). Raw data generated from PacBio sequencing were
corrected with CANU. In the assembly phase, reads were assembled into
contigs, and consensus sequences were output, by WTDBG v2 with default
parameters. PBMM2 (MINIMAP2) was used to map the original data to the
draft assembly used as the reference genome, and ARROW (RACON) was used
for polishing. The previously
polished FASTA sequence was indexed with BWA index, and this corrected
genome was used as the reference genome. The Illumina FASTQ reads were
then aligned to it with BWA MEM, and Pilon was used for a second round
of error correction. To remove redundancy from the genome after
preliminary assembly and error correction, PURGE_HAPLOTIGS software was
used to identify and remove the redundant heterozygous contigs according
to the depth distribution of reads and sequence similarity. The quality
of the genome sequence was evaluated with BUSCO v4 using default
parameters (Manni et al., 2021).
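
The subread length N50 quoted in this section can be computed as in the following sketch. It assumes the usual definition, in which N50 is the largest length L such that reads of length L or longer together contain at least half of all sequenced bases. The function name and the toy lengths are illustrative only and are not taken from the sequencing data.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of read (or contig) lengths."""
    ordered = sorted(lengths, reverse=True)  # longest first
    if not ordered:
        raise ValueError("at least one length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first length where the cumulative sum reaches half the total.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Toy example: total = 20, and 6 + 5 = 11 is the first cumulative sum >= 10.
print(n50([2, 3, 4, 5, 6]))  # prints 5
```

The same function applies unchanged to contig lengths when summarizing an assembly.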