Transcriptome assembly

We combined all libraries and assembled them de novo using Trinity (Grabherr et al., 2011), SOAPdenovo-Trans (k-mer sizes 39 and 41) (Xie et al., 2014) and Oases (k-mer sizes 39, 43 and 47) (Schulz, Zerbino, Vingron, & Birney, 2012). A minimum read coverage of three was used for all assemblies. Next, we pooled the resulting contigs and ran the EvidentialGene pipeline (Gilbert, 2013). We then combined the okayset and okayalt outputs of the pipeline, clustered them using RapClust (Srivastava, Sarkar, Malik, & Patro, 2016), and merged the genes within each cluster using Lace (Davidson, Hawkins, & Oshlack, 2017). To remove contamination, the resulting contigs were aligned against the NCBI non-redundant protein database (Pruitt, Tatusova, & Maglott, 2007) using BLAST, and only transcripts whose best hit was in the plant kingdom were retained. We also removed transcripts that mapped to ribosomal genes and those containing ambiguous sites (Ns).
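The final filtering step can be illustrated with a minimal Python sketch. It is not the code used in this study: the function names `parse_fasta` and `filter_transcripts`, the kingdom label `"Viridiplantae"`, and the two inputs (a mapping from transcript ID to the kingdom of its best BLAST hit, and a set of transcript IDs that map to ribosomal genes) are assumptions introduced for illustration and would have to be derived from the BLAST and mapping results beforehand.

```python
from typing import Dict, Iterable, Optional, Set


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Read FASTA-formatted lines into a {transcript_id: sequence} dict."""
    sequences: Dict[str, str] = {}
    current_id: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            current_id = line[1:].split()[0]
            sequences[current_id] = ""
        elif current_id is not None:
            sequences[current_id] += line
    return sequences


def filter_transcripts(
    sequences: Dict[str, str],
    best_hit_kingdom: Dict[str, str],
    ribosomal_ids: Set[str],
    keep_kingdom: str = "Viridiplantae",  # assumed label for the plant kingdom
) -> Dict[str, str]:
    """Keep transcripts with a plant best hit, no ribosomal match and no Ns."""
    retained: Dict[str, str] = {}
    for transcript_id, sequence in sequences.items():
        # Drop transcripts with no hit or with a best hit outside plants.
        if best_hit_kingdom.get(transcript_id) != keep_kingdom:
            continue
        # Drop transcripts flagged as mapping to ribosomal genes.
        if transcript_id in ribosomal_ids:
            continue
        # Drop transcripts containing ambiguous sites.
        if "N" in sequence.upper():
            continue
        retained[transcript_id] = sequence
    return retained
```

In this sketch a transcript without any BLAST hit is discarded along with non-plant hits, which matches the requirement that retained transcripts have a best hit in the plant kingdom.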