1. Characterization of TSs provides an overview of the terpenoid biosynthetic potential in Trichoderma
We used an in silico approach to assign putative functions to the 387 TS proteins currently found in the genomes of Trichoderma spp. Initially, we identified PT and TC proteins according to their conserved domains. Subsequent detection of the metal-binding motifs allowed the proteins to be classified as Class I, Class II or bifunctional enzymes. Clustering-based phylogenetic analysis that included biologically characterized fungal TSs also allowed us to infer substrate specificity and to assign putative functions to 15 groups of proteins. TSs sharing conserved domains and metal-binding motifs clustered in the same phylogenetic group, each of which is highlighted in a different colour (Fig. 1). TS accession numbers, the TS content of each species showing specific portions of the TS inventory, and the phylogenetic tree including protein accession numbers are available in Supporting Information Table S4 and Fig. S1.
The analysis revealed specific TSs sharing N-terminal HAD-like (Pfam 13419; PTHR43611:SF3) and C-terminal TS domains (IPR008930) (light-blue colour in Fig. 1). Although they did not cluster with known TSs, the presence of both Class I DDxxE and Class II DxDTT motifs indicates that they are bifunctional enzymes. Interestingly, these proteins seem to be exclusive to species belonging to the Viride clade (Supporting Information Table S4).
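The motif-based classification described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function name `classify_ts` and the toy sequences are hypothetical, and the regular expressions only encode the motifs named in the text (Class I DDxxD/DDxxE, Class II DxDTT), with x standing for any residue.

```python
import re

# Motif patterns taken from the text; "." stands for any residue (x).
CLASS_I = re.compile(r"DD..[DE]")   # DDxxD or DDxxE
CLASS_II = re.compile(r"D.DTT")     # DxDTT

def classify_ts(seq: str) -> str:
    """Classify a protein sequence by the metal-binding motifs it contains.

    Hypothetical helper for illustration only; a real analysis would also
    check domain context and motif position.
    """
    seq = seq.upper()
    has_class_i = CLASS_I.search(seq) is not None
    has_class_ii = CLASS_II.search(seq) is not None
    if has_class_i and has_class_ii:
        return "Bifunctional"
    if has_class_i:
        return "Class I"
    if has_class_ii:
        return "Class II"
    return "Unclassified"

# Toy sequences (invented, not real proteins)
examples = {
    "toy_class_I": "MKDDLVEAAA",
    "toy_class_II": "AADADTTGG",
    "toy_bifunctional": "MKDDLVEAAADADTTGG",
}
for name, seq in examples.items():
    print(name, classify_ts(seq))
```

Because short motifs such as DDxxD can occur by chance, a match of this kind is only a first filter and would normally be combined with the conserved-domain assignments.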
We found a large group of sesquiterpene synthases (sesquiTSs) belonging to the TRI5 superfamily (Pfam 06330) (dark-green colour in Fig. 1), which was particularly well represented in species of the Viride clade (Supporting Information Table S4). It contained 7 trichodiene synthases (TRI5) (PIRSF001388), 15 longiborneol synthases, and two groups of proteins that did not cluster with known TSs and were therefore named “uncharacterized group 1” and “2”, respectively. TRI5 was found in species of the Brevicompactum clade, T. gamsii, T. asperellum and T. guizhouense. This phylogenetic distribution indicates that TRI5 is not a monophyletic trait in Trichoderma, raising questions about its evolutionary origin. Species of the Viride clade were the only ones lacking longiborneol synthases, which were highly conserved among species from the other clades. In contrast, proteins of “uncharacterized group 1” were present only in Viride and some species of the Longibrachiatum clade. In species of Viride, we found two TSs of “uncharacterized group 2”, while T. arundinaceum and T. virens had only one.
The sister clade of the TRI5-superfamily group (light-green colour in Fig. 1) contains Class I proteins sharing a conserved terpene synthase C domain (Pfam 03936), which can be found in sesquiTSs and monoterpene synthases (monoTSs). It contains two groups of sesquiTSs, comprising 16 presilphiperfolan-8β-ol synthases and 22 pentalenene synthases, along with two groups of proteins that did not cluster with known proteins (“uncharacterized group 3” and “4”, respectively) (Supporting Information Table S4). Presilphiperfolan-8β-ol synthases are absent in Viride species, T. arundinaceum and T. atrobrunneum, whereas pentalenene synthases were found in all the genomes analysed. Although most of the latter share highly conserved metal-binding motifs, some proteins lack the NSD/DTE triad but contain an additional DDxxD motif. This suggests that they could actually synthesize sesquiterpenes other than pentalenene. TSs of “uncharacterized group 4” are widely distributed across the species, but are particularly well represented in T. virens and T. pleuroticola. In contrast, proteins of “uncharacterized group 3” seem to be exclusive to species belonging to the Harzianum clade, and their phylogenetic proximity to both groups of sesquiTSs suggests that this group is also composed of this type of TS.
We found a large group of PTs, identified as squalene synthases (SQSs) (Pfam 00494; PTHR11626:SF2; PS01044) (orange colour in Fig. 1) and enzymes involved in protein prenylation (Jeong et al., 2018) (red colour in Fig. 1), such as type I geranylgeranyl transferases (GGTases 1) (PTHR11774:SF4), type II geranylgeranyl transferases (GGTases 2) (PTHR11774:SF11) and farnesyl transferases (FTases) (PTHR11774:SF6). In SQSs, we also identified a C-terminal TM helix region of 23 residues, which is conserved in all eukaryotic SQSs and is responsible for binding the protein to the endoplasmic reticulum (Linscott et al., 2016). Although each genome contains one of these TSs, an additional SQS is present in T. pleuroti, indicating that at least one of them is probably pathway-specific (Supporting Information Table S4).
All Trichoderma genomes analysed here contain one copy of an oxidosqualene cyclase (OSC) (Pfam 13249; Pfam 13243; PTHR11764; PS01074) (light-brown colour in Fig. 1) showing DCTSE or DCISE aspartate-rich motifs, both variants of the classical DCTAE motif reported in these enzymes (Abe, Naito, Takagi, & Noguchi, 2001). Furthermore, these OSCs contain 5 conserved QW motifs, which are thought to be responsible for strengthening the structure of the enzyme and stabilizing the carbocation intermediates (Kushiro et al., 2000). Interestingly, TSs from the sister clade of the OSCs contain a conserved squalene synthase-phytoene synthase domain (Pfam 00494; PTHR21181:SF13) (dark-brown colour in Fig. 1), but they clustered neither with SQSs nor with reference lycopene-phytoene synthases. They were therefore named “uncharacterized group 5”, of which one member is present in each genome.
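As an illustration of how the OSC sequence features above could be checked, the following Python sketch reports which aspartate-rich variant (DCTAE, DCTSE or DCISE) a sequence carries and counts QW-like motifs. The function name `osc_features` and the toy sequence are hypothetical, and the QW pattern is simplified here to QxxxGxW, which is an assumption of this sketch rather than a definition taken from the text.

```python
import re

# Aspartate-rich motif: classical DCTAE plus the two variants named in the text.
ASP_RICH = re.compile(r"DC(?:TAE|TSE|ISE)")

# Simplified QW motif (assumption of this sketch): Q-x-x-x-G-x-W.
# A lookahead is used so that overlapping occurrences would also be counted.
QW = re.compile(r"(?=Q...G.W)")

def osc_features(seq: str) -> dict:
    """Return the aspartate-rich variant found (or None) and the QW motif count."""
    seq = seq.upper()
    match = ASP_RICH.search(seq)
    return {
        "asp_rich_motif": match.group(0) if match else None,
        "qw_count": sum(1 for _ in QW.finditer(seq)),
    }

# Toy sequence (invented): two QW-like motifs flanking a DCTSE motif
toy = "AAQAAAGAWAADCTSEAAQLLLGLW"
print(osc_features(toy))
```

A sequence would be consistent with the description above if it returned one of the three aspartate-rich variants together with a QW count of 5, although the exact count depends on how strictly the QW motif is defined.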
Copalyl-pyrophosphate/ent-kaurene synthases (CPS/KS) (PTHR31739:SF4; PIRSF026498) were found in T. asperellum, known for its ability to synthesize gibberellins (Zhao & Zhang, 2015). In addition, we found bifunctional enzymes clustering with CPS/KS in T. citrinoviride, T. parareesei, T. reesei and species of the Brevicompactum clade (PTHR31739:SF4) (grey colour in Fig. 1), but their low sequence similarity to CPS/KS indicates that these are diterpene synthases (diTSs) not involved in ent-kaurene biosynthesis.
The last cluster (dark-blue colour in Fig. 1) contains proteins sharing a conserved polyprenyl synthase domain (Pfam 00348). Within this group, GGPP synthases (PTHR12001:SF47; PS00723; PS00444) and FPP synthases (PTHR11525:SF0; PS00723; PS00444) were identified, showing the two characteristic DDxxD motifs usually found in these enzymes (Wendt & Schulz, 1998; Gao et al., 2012). Some species belonging to the Harzianum and Brevicompactum clades have 2-3 copies of this class of PTs, suggesting that at least some of them could actually be pathway-specific (Supporting Information Table S4). The analysis also revealed a set of highly conserved indole diTSs, of which some species have more than one. Considering that production of indole diterpenes has not been reported in Trichoderma, our results reveal that these species have at least the potential to produce these compounds. The last group contains Class I TSs that cluster with known chimeric TSs from fungi and were therefore named chimeric-like; these were absent in species of the Viride clade. Most of these proteins contain only a polyprenyl synthase domain or a terpene synthase C domain. Nevertheless, we found one protein in T. asperellum TR456 that contains both domains and is highly similar to ophiobolin F synthase from Aspergillus clavatus, suggesting that this species is able to produce sesterterpenes.
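The presence of two DDxxD motifs in the polyprenyl synthases described above could be checked with a short Python sketch such as the one below. The helper names `ddxxd_sites` and `has_two_ddxxd` and the toy sequence are hypothetical, and the motif spacing follows the DDxxD form given in the text; this is an illustration under those assumptions, not the method used in this study.

```python
import re

# DDxxD with "." as any residue; the lookahead lets overlapping hits be reported.
DDXXD = re.compile(r"(?=(DD..D))")

def ddxxd_sites(seq: str) -> list:
    """Return (1-based position, matched motif) for every DDxxD occurrence."""
    return [(m.start() + 1, m.group(1)) for m in DDXXD.finditer(seq.upper())]

def has_two_ddxxd(seq: str) -> bool:
    """True if the sequence carries at least two DDxxD motifs."""
    return len(ddxxd_sites(seq)) >= 2

# Toy sequence (invented, not a real prenyltransferase)
toy = "MAADDIMDSSLLLDDYRDGG"
print(ddxxd_sites(toy))      # [(4, 'DDIMD'), (14, 'DDYRD')]
print(has_two_ddxxd(toy))    # True
```

Reporting positions as well as counts makes it easier to confirm that the two motifs lie in separate regions of the protein rather than overlapping within a single aspartate-rich stretch.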