2.3 Genome assembly
The Illumina paired-end reads were used for k-mer analysis, with a k-mer length of 17, to estimate the genome size and heterozygosity. Specifically, k-mer counts and their frequency distribution were computed with Jellyfish (version 1.1.10, parameters set to -C, -m 17, -s 10G, -t 80), and the resulting k-mer histogram was analyzed and visualized with GenomeScope (version 2.0, parameters set to 12, 150) (Marcais & Kingsford, 2011; Ranallo-Benavidez, Jaron, & Schatz, 2020). PacBio sequencing
data were used to assemble the draft genome using Wtdbg2 (version 2.5,
parameters set to -t 8, -p 21, -S 4, -s 0.05, -g 274m, -L 5000) (Ruan &
Li, 2020). Potential contaminant sequences from bacteria, fungi, and other microorganisms were removed by aligning the draft genome sequences against the NCBI nucleotide (Nt) database. Both long and short reads were used to correct base errors in the draft genome with NextPolish (Hu, Fang, Su, & Liu, 2019). HaploMerger2 (default parameters) and purge_haplotigs (parameters set to -m 4G; -t 60; -l value1, -m value2, -h value3; -t 60, -a 70) were used to remove redundant haplotypic sequences arising from heterozygous regions of the genome (Huang, Kang, & Xu, 2017; Roach, Schmidt, & Borneman, 2018).
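The genome-size estimate from the k-mer analysis rests on a simple relation: the total number of k-mers divided by the depth of the main (homozygous) peak of the k-mer frequency histogram approximates the haploid genome size. The Python sketch below illustrates that peak-based calculation on a histogram of (depth, count) pairs such as those written by `jellyfish histo`. It is a simplification and not GenomeScope's method, which fits a mixture of negative binomial distributions and also estimates heterozygosity; the function name, the `min_depth` error cutoff, and the toy histogram are hypothetical.

```python
from typing import Dict


def estimate_genome_size(hist: Dict[int, int], min_depth: int = 5) -> float:
    """Peak-based haploid genome-size estimate from a k-mer depth histogram.

    hist maps k-mer depth -> number of distinct k-mers observed at that depth.
    Depths below min_depth (a hypothetical cutoff) are treated as sequencing
    errors and excluded.
    """
    filtered = {d: c for d, c in hist.items() if d >= min_depth}
    if not filtered:
        raise ValueError("no k-mers above the error cutoff")
    # The depth with the most distinct k-mers is taken as the homozygous peak.
    # In highly heterozygous genomes the half-depth heterozygous peak can be
    # taller, one reason model-based tools such as GenomeScope are preferred.
    peak_depth = max(filtered, key=filtered.get)
    # Total k-mer occurrences = sum over depths of depth * number of k-mers.
    total_kmers = sum(d * c for d, c in filtered.items())
    return total_kmers / peak_depth


# Toy histogram: the depth-1 and depth-2 bins mimic sequencing errors.
toy = {1: 1_000_000, 2: 200_000, 20: 50, 30: 100, 40: 50}
print(estimate_genome_size(toy))  # 200.0
```

Applied to a real Jellyfish histogram, the returned value is in base pairs.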
To construct the chromosome-level genome assembly, Hi-C reads were aligned to the haploid genome assembly using Juicer (version 1.5, default parameters). An initial chromosome-level assembly was generated with the 3D de novo assembly pipeline (3D-DNA, version 180114) using the parameter “-r 3” (Dudchenko et al., 2017). The final chromosome-level assembly was manually reviewed with Juicebox Assembly Tools (JBAT, version 1.11.0, default parameters) (Dudchenko et al., 2018). The
completeness of the genome assembly was assessed with BUSCO (v5.1.3) (Waterhouse et al., 2018) by searching for universal single-copy orthologs from the Eukaryota, Arthropoda, Insecta, and Hemiptera datasets (odb10). The final assembly was further validated by mapping the Illumina reads and RNA sequencing (RNA-seq) reads to it with Bowtie2 (Table S1).
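BUSCO expresses completeness as the fraction of the n searched orthologs in a lineage dataset that are found complete (single-copy plus duplicated), fragmented, or missing. The sketch below reproduces that arithmetic and the one-line summary notation for a single dataset; the class and function names and the counts in the usage example are hypothetical, and BUSCO's own rounding may differ slightly.

```python
from dataclasses import dataclass


@dataclass
class BuscoCounts:
    single: int      # complete, single-copy orthologs
    duplicated: int  # complete, duplicated orthologs
    fragmented: int  # partially recovered orthologs
    missing: int     # orthologs not found

    @property
    def total(self) -> int:
        """n: number of orthologs searched in the lineage dataset."""
        return self.single + self.duplicated + self.fragmented + self.missing


def busco_summary(c: BuscoCounts) -> str:
    """Format counts in BUSCO-style notation: C:%[S:%,D:%],F:%,M:%,n."""
    n = c.total
    if n == 0:
        raise ValueError("no orthologs searched")

    def pct(x: int) -> float:
        # Percentage of n, rounded to one decimal place.
        return round(100 * x / n, 1)

    complete = c.single + c.duplicated
    return (f"C:{pct(complete)}%[S:{pct(c.single)}%,D:{pct(c.duplicated)}%],"
            f"F:{pct(c.fragmented)}%,M:{pct(c.missing)}%,n:{n}")


print(busco_summary(BuscoCounts(single=900, duplicated=50, fragmented=30, missing=20)))
# C:95.0%[S:90.0%,D:5.0%],F:3.0%,M:2.0%,n:1000
```

Each of the four lineage datasets is summarized separately, since n differs between them.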
2.4 Localization of the sex chromosomes and autosomes
The mapped reads per million (MRPM) of each chromosome were calculated separately for the female and male Illumina reads to identify the sex chromosomes and autosomes (Ye et al., 2021). Because males carry only one copy of the X chromosome whereas females carry two, the normalized read count of the X chromosome is expected to be approximately twice as high in females as in males. Autosomes are present in two copies in both sexes, so their male-to-female ratio is expected to be close to 1 (Pal & Vicoso, 2015). Male and female
DNA reads were mapped separately to the genomic scaffolds using Bowtie2 with default parameters (Langmead & Salzberg, 2012). The resulting alignments were filtered with SAMtools view (-b -q 30) to discard reads with a mapping quality below 30, and the number of reads mapped to each chromosome was counted with SAMtools idxstats (Li et al., 2009). The sex chromosomes were
then verified by comparison with other species. Syntenic gene blocks were identified among the chromosome-level genome assemblies of S. chinensis, Acyrthosiphon pisum, Rhopalosiphum maidis, and E. lanigerum using MCScanX, and were visualized with the dual synteny plotting tool for MCScanX in the synteny visualization module of TBtools (version 1.09; Chen et al., 2020) (Table S1).
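The MRPM comparison can be sketched in Python from the output of `samtools idxstats`, whose tab-separated columns are sequence name, sequence length, number of mapped reads, and number of unmapped reads, followed by a final `*` row for unplaced reads. The sketch assumes that MRPM is normalized by the total number of mapped reads in each library. The function names, the 0.75 cutoff separating the expected X ratio (about 0.5) from the autosomal ratio (about 1), and the toy counts are hypothetical and are not the thresholds used in this study.

```python
from typing import Dict


def parse_idxstats(text: str) -> Dict[str, int]:
    """Return {sequence name: mapped reads} from `samtools idxstats` output."""
    mapped: Dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split("\t")
        # Skip blank or short lines and the final '*' row (unplaced reads).
        if len(fields) < 4 or fields[0] == "*":
            continue
        mapped[fields[0]] = int(fields[2])
    return mapped


def mrpm(mapped: Dict[str, int]) -> Dict[str, float]:
    """Mapped reads per million, normalized by all mapped reads (assumption)."""
    total = sum(mapped.values())
    if total == 0:
        raise ValueError("no mapped reads")
    return {chrom: n / total * 1e6 for chrom, n in mapped.items()}


def classify_chromosomes(male: Dict[str, int], female: Dict[str, int],
                         x_cutoff: float = 0.75) -> Dict[str, str]:
    """Label chromosomes by male/female MRPM ratio: ~0.5 for X, ~1 for autosomes.

    x_cutoff is a hypothetical threshold between the two expectations.
    """
    m, f = mrpm(male), mrpm(female)
    labels: Dict[str, str] = {}
    for chrom in sorted(m.keys() & f.keys()):
        if f[chrom] == 0:
            continue  # ratio undefined without female coverage
        ratio = m[chrom] / f[chrom]
        labels[chrom] = "X" if ratio < x_cutoff else "autosome"
    return labels


# Toy idxstats output: the male library has half the relative X coverage.
male_stats = "A1\t100\t1000\t0\nA2\t100\t1000\t0\nX\t100\t500\t0\n*\t0\t0\t7\n"
female_stats = "A1\t100\t1000\t0\nA2\t100\t1000\t0\nX\t100\t1000\t0\n*\t0\t0\t9\n"
print(classify_chromosomes(parse_idxstats(male_stats), parse_idxstats(female_stats)))
# {'A1': 'autosome', 'A2': 'autosome', 'X': 'X'}
```

Because each library is normalized by its own total, the X ratio in the toy data is 0.6 rather than exactly 0.5; a cutoff between the two expectations tolerates this shift.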