Recursive Feature Elimination
The model achieved high accuracy even when using only a small portion of
the available features (genes in this case). In particular, with only 20
variables it achieved an area under the ROC curve of 0.9752, a
sensitivity of 0.906 and a specificity of 0.8715. The model was therefore
able to represent the data using only limited information.
The best subset, however, used 5000 of the original features, around 34%
of those available, and was only marginally better (see Supplemental
Information). We examined the first 20 of these genes and found
substantial overlap with the other methods (see Figure 4). As with the
approaches above, RFE generalised from the training data and obtained
100% accuracy (ACC) on the test set. Below we present a
characterization of the focal genes selected by each approach.
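
As a brief aside on the procedure itself, the elimination loop behind RFE
can be sketched as follows. This is a minimal illustration under stated
assumptions, not the pipeline used in this study: the logistic-regression
scorer, the synthetic data, and the names fit_logistic, rfe,
make_toy_data and drop_fraction are all hypothetical and chosen only to
keep the example self-contained.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, active, epochs=100, lr=0.5):
    """Fit a logistic regression (no intercept) on the active columns
    by batch gradient descent; returns one weight per active column."""
    w = {j: 0.0 for j in active}
    n = len(X)
    for _ in range(epochs):
        grad = {j: 0.0 for j in active}
        for row, label in zip(X, y):
            err = sigmoid(sum(w[j] * row[j] for j in active)) - label
            for j in active:
                grad[j] += err * row[j] / n
        for j in active:
            w[j] -= lr * grad[j]
    return w


def rfe(X, y, n_keep, drop_fraction=0.5):
    """Recursive feature elimination: refit on the surviving features,
    rank them by absolute weight, drop the weakest, and repeat."""
    active = list(range(len(X[0])))
    while len(active) > n_keep:
        w = fit_logistic(X, y, active)
        ranked = sorted(active, key=lambda j: abs(w[j]), reverse=True)
        n_next = max(n_keep, int(len(ranked) * (1 - drop_fraction)))
        active = ranked[:n_next]
    return sorted(active)


def make_toy_data(n_samples=200, n_features=12, n_informative=3, seed=0):
    """Synthetic two-class data (an assumption for illustration): the
    first n_informative columns are shifted by class, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        shift = 1.5 if label == 1 else -1.5
        X.append([rng.gauss(shift if j < n_informative else 0.0, 1.0)
                  for j in range(n_features)])
        y.append(label)
    return X, y


X, y = make_toy_data()
selected = rfe(X, y, n_keep=3)
print(selected)
```

In practice the subset size would be chosen by comparing resampled
performance across several candidate sizes, which is how a small subset
and a much larger one can end up with nearly identical scores.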