Discussion
Chinese flowering cabbage (B. rapa var. parachinensis) is
an important leafy and bolting-stem vegetable with high nutritional
value that is widely grown in Asia (Tan et al., 2019).
Among the abundant ecological types of Brassica rapa that are
planted as vegetables in China, Chinese flowering cabbage is the one
that is well-adapted to the high temperature and high humidity climate
in the south of China. It can be planted all year round for tender
flower products without the need for a strict vernalization process. In
this study, we report the first chromosome-level genome assembly of this
important ecological type of B. rapa, Chinese flowering cabbage,
which provides a valuable genomic data resource for evolutionary studies
of B. rapa and related Brassica species. The present
study is also the first to report the genome size, heterozygosity, and
repeat content of the Chinese flowering cabbage genome.
A highly continuous genome assembly is critical for genome-wide marker
development and gene model prediction. Numerous studies have
demonstrated that recent long-read sequencing technologies can greatly
improve the continuity of genome assemblies (Song et al., 2020;
Wang et al., 2019; Belser et al., 2018; Zhang et al., 2018).
In this study, we used PacBio long reads to
assemble the B. rapa var. parachinensis genome. Owing to
the low heterozygosity (0.16%) of the plants used for genome
sequencing, we obtained a contig N50 of 7.26 Mb, which is
longer than those of the two B. rapa genomes sequenced recently with
PacBio and Nanopore technology (Belser et al., 2018; Zhang et al., 2018),
and much longer than those of the B. rapa and B. oleracea genomes
sequenced using Illumina technology (Liu et al., 2014; Wang et al., 2019).
We applied the Hi-C technique to scaffold
more than 545 Mb of contigs onto 10 chromosomes. The scaffold N50 of
the final assembly reached 32.3 Mb, with a maximum scaffold size of 47.4 Mb,
which is similar to that of the B. rapa Z1 genome sequenced with
Nanopore technology (Belser et al., 2018) (Table S5). The completeness
of the genome (97.8%), assessed by BUSCO analysis in the present study,
surpassed that of most genomes of related Brassica species sequenced
thus far, including B. oleracea HDEM (Belser et al., 2018),
B. oleracea var. botrytis (Sun et al., 2019) and B. rapa Z1
(Belser et al., 2018) (Table S5).
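As an illustration of the contiguity metric quoted above, the N50 of a set of contig or scaffold lengths can be computed as in the following minimal Python sketch. The function name and the toy lengths are hypothetical and are not taken from the assembly reported here.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembly size."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that pushes the running sum past half
        # of the total defines the N50.
        if running >= half_total:
            return length


# Toy contig lengths in Mb (hypothetical values, not data from this study).
toy_contigs_mb = [10, 8, 6, 4, 2]
print(n50(toy_contigs_mb))
```

Because the metric is weighted by sequence length, a few long contigs raise the N50 far more than many short ones, which is why long-read assemblies tend to score higher.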
In the present study, the assembly of the Chinese flowering cabbage
genome resolved most of the pericentromeric regions of B. rapa.
Among them, the pericentromeric regions of chromosomes 5 (A05) and
6 (A06) were found to be significantly expanded in comparison to the other
pericentromeric regions, and very few genes were annotated in these regions
(Fig. 2B; Fig. 6). This observation is further supported by the Hi-C
contact map, in which the pericentromeric regions of chromosomes 5 and 6
show a clearly sparse Hi-C contact signal that is mostly caused by
repetitive sequences (Fig. 3). Strikingly, this expansion seems to be
lineage-specific, since we do not observe a similar pattern in the two
other Brassica genome types, i.e. chromosomes C05 and C06 in
B. oleracea and B. napus (Belser et al., 2018; Song et al.,
2020), and chromosomes B05 and B06 in B. nigra (Fig. 6A). This
lineage-specific expansion may play a role in the evolutionary
divergence of the Brassica AA, BB and CC genomes. It is worth noting
that such large repetitive regions can only be resolved by long-read
sequencing technology. For example, in previous studies, the B.
rapa Z1 and B. napus AA genome assemblies present a similar but
relatively weaker pattern than the current assembly (Belser
et al., 2018; Song et al., 2020; Zhang et al., 2018) (Fig. S1).
However, the assembly of B. rapa (Belser et al., 2018; Song et
al., 2020; Zhang et al., 2018) sequenced by PacBio Sequel,
with an N50 of 1.45 Mb, does not present the large repetitive regions
(Fig. S1E).
The genus Brassica contains three basic genomes, B. rapa (AA genome),
B. nigra (BB genome), and B. oleracea (CC genome), which hybridized
to give rise to three allopolyploid
species, B. napus (AACC genome), B. juncea (AABB genome),
and B. carinata (BBCC genome) (Cheng et al.,
2016; Sun et al., 2019). In the present study, a phylogenetic tree was
constructed to analyze the evolution of the Brassica species.
Interestingly, Chinese flowering cabbage shows the closest
relationship with the B. juncea AA genome rather than with the two
B. rapa genomes (Chinese cabbage and yellow sarson) (Fig. 4)
(Belser et al., 2018; Zhang et al., 2018).
The B. rapa species can be further
subdivided into six populations: turnips (Chinese and European turnips),
sarsons (sarson, rapid cycling and spring/winter oilseed), turnip rapes,
taicai and mixed Japanese morphotypes, pak choi (pak choi, wutacai,
Chinese flowering cabbage and zicaitai varieties) and heading Chinese
cabbages (Cheng et al., 2016). Our results suggest that the donor of the
AA genome in B. juncea most likely came from the pak choi group (Chinese
flowering cabbage), in contrast to other B. rapa varieties such
as sarsons and turnips (Belser et al.,
2018; Cai et al., 2017). Meanwhile, we found that B. rapa Z1
(sarson) clustered first with the B. napus AA genome and then with
the other AA genomes, implying that it is likely the evolutionarily
closest donor of the AA genome in B. napus. Similarly,
B. oleracea can also be subdivided into seven populations:
kohlrabies, Chinese kale, cauliflower, broccoli, Brussels sprouts, kale
and cabbages (Cheng et al., 2016). Interestingly, B. oleracea var.
capitata (cabbages) clustered first with the two B. napus CC genomes and
then with B. oleracea var. italica (broccoli), implying that
the donor of the CC genome in B. napus probably evolved from
B. oleracea var. capitata (cabbages) (Fig. 4). Thus, we
demonstrate that high-continuity genome assemblies can aid in the
interpretation of evolutionary relationships among Brassica species.
Numerous studies have found that structural variations can affect
larger genomic regions than SNPs. Structural variant (SV) discovery
not only helps our understanding of the landscape of genomic
variation within and between species but also reveals the functional
significance of SVs (Fuentes
et al., 2019). Compared with SV detection methods based on
Illumina short reads, whole-genome assembly-based methods can in theory
recover SVs fully, but they still depend on assembly quality. SV studies in
human (Audano et al.,
2019; Huang et al., 2010) and in a wide range of plant species, such
as rice (Fuentes et al.,
2019), maize (Mahmoud et
al., 2020), tomato (Voichek
& Weigel, 2020), and Arabidopsis (Voichek
& Weigel, 2020), indicate that SVs can affect a large proportion of
coding genes. In the current study, we detected SVs between the genome
assemblies of two Brassica rapa lines and identified
27,190 insertions, 26,002 deletions, 1,368 duplications,
46 medium-sized inversions ranging from 5.2 kb to 1,431.6 kb, and 8,565
complex SVs with imprecise breakpoints (Fig. 7). This is
the first report of SVs detected between Brassica genomes
using high-contiguity genome assemblies. These SVs may affect coding
genes, which may in turn contribute to phenotypic variation, such as
morphological and phytochemical characteristics.
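The SV counts reported above can be tallied in a short Python sketch to give an overall total and the share of each class. The counts are those stated in the text. The dictionary and function names are illustrative assumptions, and the total assumes that the five classes do not overlap.

```python
# SV counts between the two B. rapa assemblies, as reported in the text.
sv_counts = {
    "insertion": 27190,
    "deletion": 26002,
    "duplication": 1368,
    "inversion": 46,
    "complex": 8565,
}


def sv_shares(counts):
    """Return (total, {class: percentage of total}) for a dict of SV counts."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no SVs to summarise")
    shares = {name: 100 * n / total for name, n in counts.items()}
    return total, shares


total, shares = sv_shares(sv_counts)
print(f"total SVs: {total}")
# List the classes from most to least frequent.
for name, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:12s} {pct:5.1f}%")
```

Under that assumption, insertions and deletions together account for 53,192 of the 63,171 SVs.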
In summary, we report a chromosome-level genome assembly of Chinese
flowering cabbage together with accurate gene and TE annotation. The
phylogenetic analysis indicates that this genome has a closer evolutionary
relationship with the AA diploid progenitor of B. juncea. We also
found lineage-specific pericentromeric expansion events on
chromosomes 5 and 6 of the Brassica AA genome compared to the
orthologous genomic regions in the Brassica BB and CC genomes.
Finally, we report a large number of structural variations (SVs) between
two B. rapa lines (Z1 and parachinensis) using high-continuity
genome assemblies. Overall, our high-quality genome assembly
of Chinese flowering cabbage provides a valuable genetic resource for
deciphering the genome evolution of Brassica species, and it can
serve as a reference genome to guide the molecular breeding of
B. rapa crops.