Gene prediction and functional annotation
The repeat-masked genome was then used to predict protein-coding genes with a combination of three complementary methods: de novo, homology-based, and transcriptome-based prediction. Augustus v. 3.3.2 (Stanke et al., 2004), GlimmerHMM v. 3.0.4 (Majoros et al., 2004), and Genscan (Burge & Karlin, 1997) were used for de novo predictions. GeMoMa v. 1.3.1 (Keilwagen et al., 2019) was used for homology-based predictions, with protein sequences from Arabidopsis thaliana, Eragrostis curvula, Eragrostis tef, O. thomaeum, Oryza sativa, Prunus persica, Sorghum bicolor, Triticum aestivum, and Zea mays. For transcriptome-based predictions, we first sequenced RNA libraries generated from five tissues (i.e., root, rhizome, rhizome tip, young leaf, and mature leaf) and assembled the RNA-seq reads into transcripts using Trinity v. 2.1.1 (Grabherr et al., 2011) with default parameters. We also used all the RNA-seq reads to assess genome assembly quality by mapping them to the final assembled genome using PASA v. 2.1.0 (Haas et al., 2003). Finally, the gene models predicted by all of the above approaches were integrated using EVidenceModeler (EVM) v. 1.1.1 (Haas et al., 2008) to generate a consensus gene set.
The predicted protein-coding genes were functionally annotated by searching against several databases. We used InterProScan v. 5.36 (Apweiler et al., 2000), which provides Gene Ontology (GO) annotations, protein motifs and domains, functional classifications, protein family identification, transmembrane topology, and predicted signal peptides, to obtain a comprehensive annotation of the predicted protein-coding genes. A custom Perl script was used to extract the annotation information. Then, KOBAS (http://kobas.cbi.pku.edu.cn/annotate/) was used to search the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (Kanehisa & Goto, 2000) for orthologs. Finally, we used BLASTP to search against the Swiss-Prot (Bairoch & Apweiler, 2000), NR (Pruitt et al., 2007), and KOG databases with an e-value cutoff of 1e-5. The best hits from all of these database searches were integrated to obtain the final functional annotation.
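The best-hit selection and integration step can be illustrated with a minimal Python sketch. It is not the script used in this study. It assumes that the BLASTP results are in the standard 12-column tabular format (query ID in column 1, subject ID in column 2, e-value in column 11, bit score in column 12) and that the "best" hit is the one with the highest bit score, a criterion the text does not specify. The function names `best_hits` and `merge_annotations`, the database labels, and the gene and protein identifiers in the example are hypothetical.

```python
import csv
import io

EVALUE_CUTOFF = 1e-5  # e-value cutoff stated in the text


def best_hits(handle, evalue_cutoff=EVALUE_CUTOFF):
    """Return {query_id: (subject_id, evalue, bitscore)} for one BLASTP table.

    Assumes the 12-column tabular layout; "best" means highest bit score,
    which is an assumption of this sketch.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > evalue_cutoff:
            continue  # hit does not pass the e-value cutoff
        current = best.get(query)
        if current is None or bitscore > current[2]:
            best[query] = (subject, evalue, bitscore)
    return best


def merge_annotations(per_database):
    """Combine best hits from several databases into one record per gene."""
    merged = {}
    for db_name, hits in per_database.items():
        for gene, (subject, _evalue, _bitscore) in hits.items():
            merged.setdefault(gene, {})[db_name] = subject
    return merged


# Hypothetical example input: two hits for gene1, one weak hit for gene2.
example_table = (
    "gene1\tprotA\t90.0\t100\t5\t0\t1\t100\t1\t100\t1e-30\t200\n"
    "gene1\tprotB\t95.0\t100\t2\t0\t1\t100\t1\t100\t1e-40\t250\n"
    "gene2\tprotC\t30.0\t50\t30\t2\t1\t50\t1\t50\t0.5\t20\n"
)
swissprot_hits = best_hits(io.StringIO(example_table))
annotation = merge_annotations({"Swiss-Prot": swissprot_hits})
print(annotation)
```

In this example, gene1 keeps only its higher-scoring hit and gene2 is dropped because its single hit exceeds the e-value cutoff. In practice, the same `best_hits` call would be applied to each database result before merging.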