Transcriptome assembly and expression analysis

On average, each of the 90 transcriptome libraries yielded about 14.6 million reads. All libraries were pooled to construct a de novo assembly with Trinity, which contained 177,948 isotigs. The assembly was largely complete: 87% of the universal single-copy orthologs searched by BUSCO (Seppey, Manni, & Zdobnov, 2019) were recovered as complete or fragmented genes. However, it likely contained many splice variants and allelic variants, because the proportion of duplicated gene models was high (46.3%). Careful clustering and filtering (see Materials and Methods) reduced the set to 24,332 high-quality, non-redundant transcripts with an average length of 1,858 bases. This procedure lowered the BUSCO score to 66.5% but effectively removed the allelic variants, as only 2% of gene models remained duplicated. The gene models removed by filtering mostly had low expression counts and were therefore of limited biological relevance to the experiment.
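BUSCO reports each searched ortholog as complete single-copy (S), complete duplicated (D), fragmented (F) or missing (M). The sketch below shows how a "complete plus fragmented" score and a duplication percentage can be derived from those counts. It assumes that the 87%, 66.5%, 46.3% and 2% figures above correspond to BUSCO's C+F and D percentages; the function name and the counts in the usage line are hypothetical and purely illustrative, not values from this study.

```python
def busco_summary(single: int, duplicated: int, fragmented: int, missing: int) -> dict:
    """Express BUSCO category counts as percentages of all searched BUSCOs.

    Hypothetical helper: the category definitions follow BUSCO's
    C = S + D convention; the dictionary keys are illustrative names.
    """
    total = single + duplicated + fragmented + missing

    def pct(x: int) -> float:
        return 100.0 * x / total

    return {
        "complete": pct(single + duplicated),                        # C = S + D
        "complete_or_fragmented": pct(single + duplicated + fragmented),
        "duplicated": pct(duplicated),                                # D, a redundancy indicator
    }


# Illustrative counts only: 60 single-copy, 20 duplicated, 10 fragmented, 10 missing.
summary = busco_summary(60, 20, 10, 10)
print(summary)
```

Because D is a share of all searched BUSCOs, collapsing allelic variants moves orthologs from D to S without changing C, whereas discarding transcripts outright can move them to M and lower C+F, consistent with the trade-off described above.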
On average, 75% of the reads mapped to the de novo assembly (Supplementary Table 1). Principal component analysis (PCA) of TPM-normalized gene expression data showed a clear grouping of the samples by genotype along the first three PCs (Figure 1g). Together, these three PCs explained 53% of the total variation, indicating that genotype was the main source of variation between samples. The inoculation treatment had a smaller but marked effect on gene expression, which was evident in genotype-specific PCA plots, where the effect of inoculation was associated with the PC1 and PC2 axes (Figure 1b to 1f). For example, in the resistant genotypes R1 and R2, PC1 explained 65% and 52% of the variation, respectively, and clearly separated inoculated from control plants (Figure 1b and 1c).
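The analysis above rests on two standard computations: TPM normalization and the fraction of variance explained by each principal component. The sketch below illustrates both with NumPy under stated assumptions. The function names are hypothetical, the log2(TPM + 1) transform before PCA is an assumed common practice rather than a step confirmed by the text, and the random toy matrix stands in for real data.

```python
import numpy as np


def tpm(counts: np.ndarray, lengths_bp: np.ndarray) -> np.ndarray:
    """Convert a (samples x genes) read-count matrix to TPM.

    Counts are divided by transcript length in kilobases, then each sample
    (row) is rescaled so that its values sum to one million.
    """
    rate = counts / (lengths_bp / 1_000.0)           # reads per kilobase
    return rate / rate.sum(axis=1, keepdims=True) * 1e6


def pca_explained_variance(expr: np.ndarray):
    """Return sample scores and the fraction of variance explained per PC.

    expr is a (samples x genes) matrix; each gene is mean-centred before SVD.
    """
    centred = expr - expr.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    scores = u * s                                   # sample coordinates on each PC
    explained = s**2 / np.sum(s**2)                  # singular values are sorted descending
    return scores, explained


# Toy data: 6 samples x 50 genes of random counts (not data from this study).
rng = np.random.default_rng(0)
counts = rng.poisson(20, size=(6, 50)).astype(float)
lengths = rng.integers(500, 3000, size=50).astype(float)

norm = tpm(counts, lengths)
scores, explained = pca_explained_variance(np.log2(norm + 1))  # assumed log transform
print("variance explained by PC1-PC3:", explained[:3].sum())
```

A genotype-specific PCA, as in Figure 1b to 1f, would simply rerun `pca_explained_variance` on the rows belonging to one genotype, so that genotype no longer dominates the leading components and the inoculation effect can surface on PC1 and PC2.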