Characterisation of Focal Genes
We compared the results of the approaches described above to obtain a
final set of predictors. First, we extracted the 20 most important
features from each approach and compiled them into a single list of
focal genes. To test the statistical significance
of the overlaps, we calculated the Jaccard Index and Odds Ratio with the
GeneOverlap R package. The annotations of overlapping genes were
obtained using NCBI. Where NCBI could not provide any information on
putative gene function, we used BLAST to retrieve such information from
orthologs in other organisms. We then performed overlap analyses to
detect candidate genes shared among the three algorithms.
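
The overlap step can be illustrated with a minimal Python sketch. It is not the GeneOverlap implementation: the gene identifiers, the approach names, and the background size `universe_size` are hypothetical, and the odds ratio is the simple cross-product ratio of a 2x2 contingency table, which may differ from the estimate the package reports.

```python
from itertools import combinations


def jaccard_index(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def odds_ratio(a, b, universe_size):
    """Cross-product odds ratio of the 2x2 table for two gene lists.

    universe_size is the number of background genes (an assumption here;
    the source does not state which background was used).
    """
    a, b = set(a), set(b)
    both = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    neither = universe_size - both - only_a - only_b
    if only_a * only_b == 0:
        return float("inf")  # no discordant genes in at least one cell
    return (both * neither) / (only_a * only_b)


# Hypothetical top-ranked features per approach (the paper uses the top 20).
top_features = {
    "approach_1": ["g1", "g2", "g3", "g4"],
    "approach_2": ["g3", "g4", "g5", "g6"],
    "approach_3": ["g4", "g6", "g7", "g8"],
}

# Single list of focal genes: the union of all top-feature lists.
focal_genes = sorted(set().union(*top_features.values()))

# Candidate genes found by all three approaches.
shared_by_all = set.intersection(*(set(v) for v in top_features.values()))

# Pairwise overlap statistics.
pairwise = {
    (x, y): (
        jaccard_index(top_features[x], top_features[y]),
        odds_ratio(top_features[x], top_features[y], universe_size=100),
    )
    for x, y in combinations(top_features, 2)
}
```

The sketch leaves out the significance test itself. In practice the p-values for each pairwise overlap would come from the package.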
For comparison with standard analytical methods, we also analysed the
same dataset of RNAseq read counts with a traditional transcriptomic
approach to identify differentially expressed genes across groups (see .
We applied two statistical analyses implemented in Bioconductor R
packages: a Likelihood Ratio Test (LRT) using DESeq2, version 1.24,
and a Generalized Linear Model (GLM) fit using edgeR. We adopted a
False Discovery Rate (FDR) threshold of 0.05 to call statistically
significant differences in gene expression. Lastly, we
compared the outputs of these analyses with the list of candidate genes
from the machine learning approaches to identify common genes.
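
The thresholding and comparison steps can be sketched as follows. This is an illustration under stated assumptions, not the DESeq2 or edgeR code: it assumes the Benjamini-Hochberg adjustment (the text above gives only the 0.05 threshold), and the p-values, gene identifiers, and candidate list are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so that monotonicity is enforced.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_genes(pvalue_by_gene, alpha=0.05):
    """Genes whose adjusted p-value falls below the FDR threshold."""
    genes = list(pvalue_by_gene)
    adjusted = benjamini_hochberg([pvalue_by_gene[g] for g in genes])
    return {g for g, q in zip(genes, adjusted) if q < alpha}


# Hypothetical raw p-values from the two differential expression analyses.
lrt_pvalues = {"g1": 0.001, "g2": 0.02, "g3": 0.04, "g4": 0.5}
glm_pvalues = {"g1": 0.002, "g2": 0.3, "g3": 0.01, "g4": 0.6}

# Hypothetical candidate genes from the machine learning approaches.
ml_candidates = {"g1", "g3", "g9"}

# Genes called by each analysis that also appear among the candidates.
common_with_lrt = significant_genes(lrt_pvalues) & ml_candidates
common_with_glm = significant_genes(glm_pvalues) & ml_candidates
```

Keeping the two intersections separate shows which candidates are supported by one analysis only and which by both.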