Transcriptome assembly and expression analysis
On average, each of the 90 transcriptome libraries produced about
14.6 million reads. All libraries were pooled to construct a de novo
assembly with Trinity, yielding 177,948 isotigs. The assembly was of
high quality: its BUSCO score for universal single-copy orthologs
(Seppey, Manni, & Zdobnov, 2019) was 87% (complete plus fragmented
genes). However, it likely contained many splice variants and allelic
variants, because the proportion of duplicated gene models was high
(46.3%). Careful clustering and filtering (see Materials and Methods)
reduced the set to 24,332 high-quality, non-redundant transcripts with
an average length of 1,858 bases. This procedure lowered the BUSCO
score to 66.5%, but effectively removed the allelic variants, as only
2% of gene models remained duplicated. The gene models removed by
filtering mostly had low expression counts and were therefore of
limited biological relevance to the experiment.
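The BUSCO percentages above combine several categories. As a minimal sketch, assuming BUSCO's standard categories (complete single-copy S, complete duplicated D, fragmented F, missing M, out of n searched groups), the "complete plus fragmented" score and the duplication rate can be derived from category counts as follows. The class name `BuscoCounts` and the example counts are hypothetical and are not the values from this study.

```python
from dataclasses import dataclass


@dataclass
class BuscoCounts:
    """Number of BUSCO groups assigned to each category (hypothetical input)."""

    single: int      # S: complete, single-copy
    duplicated: int  # D: complete, found in more than one copy
    fragmented: int  # F: only partially recovered
    missing: int     # M: not found

    @property
    def total(self) -> int:
        # n: all BUSCO groups searched for in the lineage dataset
        return self.single + self.duplicated + self.fragmented + self.missing

    def percent(self, count: int) -> float:
        return 100.0 * count / self.total

    def complete_plus_fragmented(self) -> float:
        # Score as quoted in the text: complete (S + D) plus fragmented (F)
        return self.percent(self.single + self.duplicated + self.fragmented)

    def duplicated_percent(self) -> float:
        return self.percent(self.duplicated)


# Hypothetical counts for illustration only.
before = BuscoCounts(single=100, duplicated=120, fragmented=30, missing=50)
print(f"C+F = {before.complete_plus_fragmented():.1f}%, D = {before.duplicated_percent():.1f}%")
# prints: C+F = 83.3%, D = 40.0%
```

Because duplicated BUSCOs still count as complete, a high combined score can coexist with a high duplication rate, which is the pattern reported for the unfiltered assembly.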
On average, 75% of the reads mapped to the de novo assembly
(Supplementary Table 1). Principal component analysis (PCA) of
TPM-normalized gene expression data showed a clear grouping of the
samples by genotype along the first three PCs (Figure 1g). Together,
these three PCs explained 53% of the total variation, indicating that
genotype was the main source of variation between samples. The
inoculation treatment had a smaller but marked effect on gene
expression, which was clearly visible in the genotype-specific PCA
plots, where the inoculation effect was associated with the PC1 and
PC2 axes (Figure 1b to f). For example, in the resistant genotypes R1
and R2, PC1 explained 65% and 52% of the variation, respectively, and
clearly separated inoculated from control plants (Figure 1b and c).
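The TPM normalization and the "variation explained" percentages above can be illustrated with a small pure-Python sketch. It is not the authors' pipeline: it assumes a log2(TPM + 1) transform before PCA, which the text does not state, and it obtains each component's share of the total variance by power iteration with deflation on the sample-by-sample Gram matrix. The function names `tpm`, `explained_variance_ratio` and `pca_on_tpm` are hypothetical.

```python
import math
import random


def tpm(counts, lengths_bp):
    """Transcripts per million (TPM) for one sample.

    counts: raw read counts per transcript; lengths_bp: transcript lengths in bases.
    """
    # Reads per kilobase of transcript, then rescale so the sample sums to 1e6.
    rpk = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    total = sum(rpk)
    return [r / total * 1e6 for r in rpk]


def explained_variance_ratio(rows, k=3, iters=1000, seed=0):
    """Share of total variance captured by each of the first k principal components.

    rows: one list of expression values per sample (samples x genes).
    The non-zero eigenvalues of the sample-by-sample Gram matrix of the centered
    data equal those of the gene-by-gene scatter matrix, so they give the same ratios.
    """
    n, g = len(rows), len(rows[0])
    # Center each gene (column) across samples.
    means = [sum(r[j] for r in rows) / n for j in range(g)]
    xc = [[r[j] - means[j] for j in range(g)] for r in rows]
    # Gram matrix K = Xc Xc^T; its trace is the total sum of squares.
    gram = [[sum(a * b for a, b in zip(xi, xj)) for xj in xc] for xi in xc]
    total = sum(gram[i][i] for i in range(n))
    rng = random.Random(seed)
    ratios = []
    for _ in range(min(k, n)):
        # Random start: after centering, the all-ones vector lies in K's null space.
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):  # power iteration
            w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        kv = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(a * b for a, b in zip(v, kv)) / sum(a * a for a in v)  # Rayleigh quotient
        ratios.append(lam / total if total > 0 else 0.0)
        # Deflate: remove the component just found before searching for the next one.
        gram = [[gram[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return ratios


def pca_on_tpm(counts_by_sample, lengths_bp, k=3):
    # Assumption: log2(TPM + 1) transform before PCA (not stated in the text).
    logged = [[math.log2(t + 1.0) for t in tpm(c, lengths_bp)] for c in counts_by_sample]
    return explained_variance_ratio(logged, k)


# Toy check: centered rows along two axes with sums of squares 2 and 8.
print(explained_variance_ratio([[1, 0], [-1, 0], [0, 2], [0, -2]], k=2))  # about [0.8, 0.2]
```

The Gram-matrix route keeps the eigenproblem at samples × samples, which is small when samples are far fewer than transcripts. For a real matrix of 90 samples by 24,332 transcripts, this pure-Python loop would be slow, and a library routine such as scikit-learn's `PCA` (attribute `explained_variance_ratio_`) would be the practical choice.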