Gene duplication analysis across 20 eudicot genomes reveals that the
current B. rapa var. parachinensis genome is among the
highest-quality Brassica genome assemblies
To assess the completeness of the genome assembly and gene models, we used
OrthoFinder (Emms & Kelly, 2015) to construct ortholog groups across
20 eudicot species and separated them into three categories: ortholog
groups with a single-copy gene, with two genes, and with multiple (more than two)
genes. The frequency of each category among the 20 eudicot species revealed
that the Brassica species (i.e., B. napus, B. rapa, B. juncea and B. nigra) harbor more duplicated orthologs
than Arabidopsis species (Fig. 3A,B), which is consistent with
the fact that Brassica species experienced an extra whole genome
triplication (WGT) event compared with the model plant Arabidopsis
thaliana (Liu et al., 2014). Additionally, more duplicated orthologs were identified in the
current B. rapa var. parachinensis genome assembly than in
the two other assemblies of this species, which have relatively lower N50 values
(Fig. 3A), suggesting that our genome assembly and gene annotation are of
higher quality than those of previous studies (Belser et al., 2018; Zhang et
al., 2018). BUSCO analysis suggested that all 12 Brassica species have
high-quality genome assemblies and that the current B. rapa var.
parachinensis assembly has the highest BUSCO score (Fig. 3B).
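The three-category classification described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in this study: it assumes a per-species gene-count table for each ortholog group (recent OrthoFinder releases write a table of this kind, Orthogroups.GeneCount.tsv), and the function names, species labels and counts below are hypothetical toy values.

```python
from collections import Counter

CATEGORIES = ("single", "two", "multiple")


def copy_category(n):
    """Map the number of genes a species has in one ortholog group to a category."""
    if n == 1:
        return "single"
    if n == 2:
        return "two"
    if n > 2:
        return "multiple"
    return None  # the species has no gene in this ortholog group


def category_frequencies(counts_by_group, species):
    """For each species, return the fraction of its ortholog groups in each category.

    counts_by_group: {group_id: {species_name: gene_count}}
    Groups in which a species has no gene are excluded from that species' total.
    """
    result = {}
    for sp in species:
        tally = Counter()
        for counts in counts_by_group.values():
            category = copy_category(counts.get(sp, 0))
            if category is not None:
                tally[category] += 1
        total = sum(tally.values())
        result[sp] = {c: (tally[c] / total if total else 0.0) for c in CATEGORIES}
    return result


# Toy, made-up counts for illustration only (not data from this study).
toy_counts = {
    "OG1": {"A_thaliana": 1, "B_rapa": 3},
    "OG2": {"A_thaliana": 1, "B_rapa": 2},
    "OG3": {"A_thaliana": 2, "B_rapa": 3},
    "OG4": {"A_thaliana": 1, "B_rapa": 1},
}

freqs = category_frequencies(toy_counts, ["A_thaliana", "B_rapa"])
for sp, fractions in freqs.items():
    print(sp, fractions)
```

In this toy input the hypothetical B_rapa entry has a larger share of two-copy and multi-copy groups than the A_thaliana entry, which is the kind of shift expected after an additional whole genome triplication.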
Next, we compared the overlap of gene models among B. rapa var. parachinensis and two other B. rapa genomes (Belser et al.,
2018; Zhang et al., 2018). A total of 19,042 genes are shared by all
three genomes. The Chinese flowering cabbage genome has more
genome-specific gene models (Fig. 3C), which may be caused by differences in assembly
quality among these three genomes or by a specific gene amplification
history.
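A three-way comparison of this kind reduces to set operations once the gene models of the three genomes have been mapped to a common identifier. The sketch below assumes such a mapping already exists (for example, shared ortholog-group IDs); how the overlap was actually computed is not stated here, and the function name, genome labels and identifiers are hypothetical.

```python
def three_way_overlap(a, b, c):
    """Return the IDs shared by all three sets and those unique to each set."""
    a, b, c = set(a), set(b), set(c)
    return {
        "shared_all": a & b & c,
        "only_a": a - b - c,
        "only_b": b - a - c,
        "only_c": c - a - b,
    }


# Toy, made-up identifiers for illustration only (not data from this study).
genome_a = {"G1", "G2", "G3", "G4", "G5"}  # hypothetical: current assembly
genome_b = {"G1", "G2", "G3", "G6"}        # hypothetical: previous assembly 1
genome_c = {"G1", "G2", "G7"}              # hypothetical: previous assembly 2

overlap = three_way_overlap(genome_a, genome_b, genome_c)
for name, ids in overlap.items():
    print(name, len(ids), sorted(ids))
```

The sizes of "shared_all" and of the three "only" sets correspond to the central and outermost regions of a three-set Venn diagram.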