Gene duplication analysis across 20 eudicot genomes reveals that the current B. rapa var. parachinensis genome is among the highest-quality Brassica genome assemblies
To assess the completeness of the genome assembly and gene models, we used OrthoFinder (Emms & Kelly, 2015) to construct ortholog groups across 20 eudicot species and separated them into three categories: ortholog groups with a single-copy gene, with two genes, and with multiple (more than two) genes. The frequency of each category among the 20 eudicot species revealed that the Brassica species (i.e. B. napus, B. rapa, B. juncea and B. nigra) harbor more duplicated orthologs than the Arabidopsis species (Fig. 3A,B), which is consistent with the fact that Brassica species experienced an extra whole-genome triplication (WGT) event compared with the model plant Arabidopsis thaliana (Liu et al., 2014). Additionally, more duplicated orthologs were identified in the current B. rapa var. parachinensis genome assembly than in the two other assemblies of this species, which have a relatively lower N50 (Fig. 3A), suggesting that we obtained a higher-quality genome assembly and gene annotation than previous studies (Belser et al., 2018; Zhang et al., 2018). BUSCO analysis suggested that all 12 Brassica species have high-quality genome assemblies and that the current B. rapa var. parachinensis assembly has the highest BUSCO value (Fig. 3B).
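The three-way categorization described above can be illustrated with a minimal Python sketch. It assumes a tab-separated table of per-species gene counts per ortholog group, laid out like OrthoFinder's gene count output (an `Orthogroup` column, one column per species, and a `Total` column). The function names, the extra "absent" category for species with no gene in a group, and the toy counts are illustrative assumptions, not the procedure or data used in this study.

```python
import csv
import io
from collections import Counter


def classify_copy_number(count):
    """Map one species' gene count in an ortholog group to a category."""
    if count == 0:
        return "absent"      # species has no gene in this group (assumed extra category)
    if count == 1:
        return "single"      # single-copy gene
    if count == 2:
        return "two"         # two genes
    return "multiple"        # more than two genes


def category_frequencies(tsv_text):
    """Return {species: Counter(category -> number of ortholog groups)}."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Every column other than the group ID and the row total is a species.
    species = [c for c in reader.fieldnames if c not in ("Orthogroup", "Total")]
    freqs = {sp: Counter() for sp in species}
    for row in reader:
        for sp in species:
            freqs[sp][classify_copy_number(int(row[sp]))] += 1
    return freqs


# Toy table with made-up counts, for illustration only.
EXAMPLE = (
    "Orthogroup\tA_thaliana\tB_rapa\tTotal\n"
    "OG0000001\t1\t3\t4\n"
    "OG0000002\t1\t2\t3\n"
    "OG0000003\t1\t1\t2\n"
    "OG0000004\t0\t2\t2\n"
)

freqs = category_frequencies(EXAMPLE)
for sp, counts in freqs.items():
    print(sp, dict(counts))
```

Comparing the per-species proportions of the "two" and "multiple" categories gives the kind of duplication profile summarized in Fig. 3A.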
Next, we compared the overlap of gene models among B. rapa var. parachinensis and two other B. rapa genomes (Belser et al., 2018; Zhang et al., 2018). A total of 19,042 genes are shared by all three genomes. The Chinese flowering cabbage genome has more genome-specific gene models (Fig. 3C), which may be caused by differences in assembly quality among the three genomes or by a specific gene amplification history.
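A three-genome overlap of this kind reduces to set operations once the gene models of each genome are mapped to a common identifier. The sketch below assumes such a mapping already exists (for example, shared ortholog group IDs). How gene models were matched across genomes is not stated here, so the identifiers, the function name and the toy sets are hypothetical.

```python
def three_way_overlap(a, b, c):
    """Split three sets of shared identifiers into core and genome-specific parts."""
    return {
        "shared_by_all": a & b & c,  # present in all three genomes
        "only_a": a - b - c,         # specific to the first genome
        "only_b": b - a - c,         # specific to the second genome
        "only_c": c - a - b,         # specific to the third genome
    }


# Toy identifier sets, for illustration only.
parachinensis = {"g1", "g2", "g3", "g4", "g5"}
assembly_2 = {"g1", "g2", "g6"}
assembly_3 = {"g1", "g2", "g3", "g7"}

overlap = three_way_overlap(parachinensis, assembly_2, assembly_3)
for name, members in overlap.items():
    print(name, len(members))
```

The size of `shared_by_all` corresponds to the core count reported above, and the `only_*` sets correspond to the genome-specific regions of the Venn diagram in Fig. 3C.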