Genome annotation and gene prediction
A total of ~327 Mb (58.81%) of the O. kokonorica
genome assembly was identified as repetitive elements. The
vast majority of repeats were classified as LTR retrotransposons,
accounting for 36.02% of the genome, including approximately 28.71%
Gypsy and 7.30% Copia retrotransposons (Table S5), a higher proportion
than in the related species C. songorica (26%). Analysis of the
evolutionary dynamics of LTRs indicated that the LTRs of O. kokonorica
were younger than those of C. songorica, having undergone a more
recent expansion that peaked at 0.8 Ma (Figure S3). DNA transposons,
long interspersed nuclear elements (LINEs) and short interspersed
nuclear elements (SINEs) accounted for 8.92%, 3.56%, and 0.22%,
respectively, of the genome assembly (Table S5).
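As a quick consistency check on the percentages above, the following minimal sketch (not part of the original analysis) sums the repeat classes named in the text and back-calculates the assembly size implied by ~327 Mb at 58.81%. The variable names are illustrative, and the only inputs are the values quoted in this section.

```python
# Percentages of the genome assembly, as quoted in the text.
ltr_total = 36.02
gypsy, copia = 28.71, 7.30
dna_tes, lines, sines = 8.92, 3.56, 0.22
repeat_total_pct = 58.81
repeat_total_mb = 327.0  # approximate value ("~327 Mb")

# Gypsy + Copia should account for nearly all LTR retrotransposons;
# the small residual is consistent with rounding.
ltr_other = round(ltr_total - (gypsy + copia), 2)

# Sum of the classes named in the text. The remainder is presumably
# other or unclassified repeats; the text does not itemise it.
named_classes = round(ltr_total + dna_tes + lines + sines, 2)
unlisted = round(repeat_total_pct - named_classes, 2)

# Assembly size implied by the repeat total (approximate, in Mb).
implied_assembly_mb = repeat_total_mb / (repeat_total_pct / 100)

print(f"LTR residual after Gypsy + Copia: {ltr_other:.2f}%")
print(f"Named classes: {named_classes:.2f}%, not itemised: {unlisted:.2f}%")
print(f"Implied assembly size: ~{implied_assembly_mb:.0f} Mb")
```

The implied assembly size is only as precise as the rounded ~327 Mb figure, so it should be read as an order-of-magnitude check rather than a reported result.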
In total, a high-confidence set of 48,598 protein-coding genes was
predicted using a combination of de novo, homology-based, and
transcriptome-based approaches, and 48,521 (99.84%) were anchored to
20 pseudochromosomes (Table 1; Table S6). As in other
Gramineae species, protein-coding genes in O. kokonorica averaged
4,327 bp in length and contained 6.18 exons. The lengths of exons
and introns were highly conserved in all five investigated plant genomes
(Figure S4; Table S7), which further illustrated the reliability of the
annotation results. In addition, BUSCO analysis of the protein set
showed that the annotation recovered 90.60% of BUSCOs (Table S8),
suggesting good annotation completeness of protein-coding genes.
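To make the summary statistics above concrete, here is a minimal sketch of how the anchoring rate, mean gene length, and mean exon number could be derived from a GFF3 annotation. It is an illustration, not the pipeline used in this study: the function name `gene_structure_stats` and the toy records are hypothetical, and the sketch assumes one transcript per gene with parent features listed before their children.

```python
from collections import defaultdict


def gene_structure_stats(gff3_lines):
    """Return (mean gene length in bp, mean exons per gene).

    Assumes one mRNA per gene and that each parent feature
    appears before its children (hypothetical simplification).
    """
    gene_len = {}
    mrna_to_gene = {}
    exon_count = defaultdict(int)
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        ftype, start, end = cols[2], int(cols[3]), int(cols[4])
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        if ftype == "gene":
            # GFF3 coordinates are 1-based and inclusive.
            gene_len[attrs["ID"]] = end - start + 1
        elif ftype == "mRNA":
            mrna_to_gene[attrs["ID"]] = attrs["Parent"]
        elif ftype == "exon":
            exon_count[mrna_to_gene[attrs["Parent"]]] += 1
    n = len(gene_len)
    return sum(gene_len.values()) / n, sum(exon_count.values()) / n


# Toy records for illustration only; these are NOT data from the study.
toy_gff3 = [
    "chr1\ttoy\tgene\t101\t1100\t.\t+\t.\tID=g1",
    "chr1\ttoy\tmRNA\t101\t1100\t.\t+\t.\tID=t1;Parent=g1",
    "chr1\ttoy\texon\t101\t300\t.\t+\t.\tID=e1;Parent=t1",
    "chr1\ttoy\texon\t801\t1100\t.\t+\t.\tID=e2;Parent=t1",
    "chr1\ttoy\tgene\t5001\t5500\t.\t-\t.\tID=g2",
    "chr1\ttoy\tmRNA\t5001\t5500\t.\t-\t.\tID=t2;Parent=g2",
    "chr1\ttoy\texon\t5001\t5500\t.\t-\t.\tID=e3;Parent=t2",
]
mean_len, mean_exons = gene_structure_stats(toy_gff3)

# Anchoring rate from the counts quoted in the text.
anchored_pct = round(48_521 / 48_598 * 100, 2)
```

On a real annotation with alternative transcripts, one would typically pick a representative transcript per gene (for example the longest) before counting exons.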
Approximately 90.59% of O. kokonorica genes could be functionally
annotated using the SWISS-PROT Protein
Sequence Database, Gene Ontology (GO), Kyoto Encyclopedia of Genes and
Genomes (KEGG), Clusters of Orthologous Groups for eukaryotic complete
genomes (KOG) and Non-Redundant Protein Sequence Database (NR) (Table
S9). We also identified 231 microRNA (miRNA), 1,012 small nuclear RNA
(snRNA), 903 transfer RNA (tRNA), and 183 ribosomal RNA (rRNA) genes in
the genome sequence (Table S10). The features of the O. kokonorica
genome are shown in Figure 1B. LTR density was inversely correlated
with gene density: these transposable
elements were mainly distributed across the pericentric regions, whereas
genes were mainly enriched in the more distal chromosomal regions.
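An inverse relationship of this kind is commonly quantified by correlating per-window LTR coverage with per-window gene counts along each pseudochromosome. The sketch below shows one such calculation using a plain Pearson coefficient. It is an assumption about how the relationship could be measured, not the method used for Figure 1B, and the window values are toy numbers invented for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Toy per-window values along one hypothetical pseudochromosome, ordered
# from one distal end through the pericentric region to the other end.
# These are NOT measurements from the O. kokonorica assembly.
ltr_fraction = [0.10, 0.25, 0.55, 0.70, 0.50, 0.20, 0.12]
genes_per_window = [42, 30, 12, 5, 14, 33, 40]

r = pearson_r(ltr_fraction, genes_per_window)
print(f"Pearson r between LTR coverage and gene count: {r:.2f}")
```

A strongly negative coefficient corresponds to the pattern described above, with LTR-rich pericentric windows carrying few genes and gene-rich distal windows carrying few LTRs.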