2.5 Identification of differentially expressed genes and pathways
To quantify expression we ran the “align_and_estimate_abundance.pl” script in Trinity v.2.5.1 (Grabherr et al., 2011) software incorporating the RSEM program (Li & Dwey, 2011). Gene-level quantification estimates produced by RSEM were imported into R/Bioconductor with the tximport package. The tximport package produces count matrices from gene-level quantification files with effective gene length taken into account (Soneson, Love, & Robinson, 2016). To detect differential gene expression, we passed the estimated count matrices from tximport to DESeq2 (Love, Huber, & Anders, 2014) and analyzed the two species separately. To build the model for the differential expression analysis we removed low count genes (<10) and genes that were not present in at least 2 replicates. To examine intra-species temperature specific expression pattern in B. calyciflorus s.s., we performed pairwise contrasts within the model in the following combinations: 20 vs.26 oC (control vs. mild heat), 20 vs. 32oC (control vs. high heat), and 26 vs.32 oC (mild vs. high heat) (Figure 1). ForB. fernandoi similarly we performed pairwise comparisons in the following combinations: 20 vs. 23 oC (controlvs. mild heat), 20 vs. 26 oC (controlvs. high heat), and 23 vs. 26 oC (mildvs. high heat) (Figure 1). We used a false discovery rate (FDR) threshold of 0.05 to correct for multiple testing. All differentially expressed genes (DEGs) were annotated against the NCBI non-redundant (nr ) database using blastx (e-value cutoff 1e-10). We assessed overall temperature-dependent patterns of expression by plotting a two-dimensional principal component analysis (PCA) of log-transformed counts for each species separately.
We further categorized differential expression at the gene-pathway level using the online Kyoto Encyclopedia of Genes and Genomes (KEGG) automatic server for KEGG pathway analysis (Moriya, Itoh, Okuda, Yoshizawa, & Kanehisa, 2007), which clusters genes based on their association in biochemical pathways. To estimate whole KEGG pathway expression, genes belonging to the same KEGG pathway were clustered together and a differential pathway expression analysis was performed. Pathways were considered differentially expressed at a false discovery rate (FDR) below 0.05. Analyses and visualization was performed using the “gage” (Luo et al., 2009), “clusterProfiler” (Yu, Wang, Han, & He, 2012), “pathview” (Luo, Weijun, Brouwer, & Cory, 2013), and “ggplot2” (Wickham, 2016) R packages (R Core Team).
To capture similar or contrasting patterns of expression between the two species, orthologous genes were identified using Orthofinder (Emms & Kelly, 2015). From this, we compared expression of orthologous genes between B. fernandoi and B. calyciflorous usingClust (Abu-Jamous & Kelly, 2018). In Clust, gene clusters (groups) are identified that are consistently co-expressed (well-correlated) in both shared and contrasting patterns between species. Within this, we chose patterns that were biologically meaningful for further analysis. These groups/clusters were checked for functional enrichment in any KEGG pathways, using a Fisher Exact Test and correcting for false positives (FDR=0.05).