Model Hyperparameters and Data Pre-processing
We used the ‘caret’ package in the programming language R to train and
assess the performance of the classifiers. To evaluate the models, 20%
of the data (6 samples) were retained for testing and the remaining 80%
(26 samples) were used to train each model. During training, we assessed
the performance of each classifier on the validation set and selected
the model with the highest area under the ROC curve (AUC), which caret
reports when twoClassSummary is supplied as the summary function.
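The split and the AUC-based selection described above can be sketched in caret roughly as follows; the object names (predictors, outcome) and the random seed are illustrative assumptions, not taken from the original analysis:

```r
library(caret)

# `predictors` (a data frame of features) and `outcome` (a two-level
# factor of class labels) are assumed placeholders for the study data.
set.seed(1)

# Stratified 80/20 split: 80% for training, 20% held out for testing.
in_train <- createDataPartition(outcome, p = 0.8, list = FALSE)
train_x  <- predictors[in_train, ]
train_y  <- outcome[in_train]
test_x   <- predictors[-in_train, ]
test_y   <- outcome[-in_train]

# twoClassSummary reports ROC, sensitivity and specificity; class
# probabilities must be enabled for the ROC curve to be computed.
ctrl <- trainControl(
  classProbs      = TRUE,
  summaryFunction = twoClassSummary
)
```

With metric = "ROC" passed to train(), caret then selects the candidate model with the highest AUC on the held-out resamples.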
In order to avoid overfitting to the validation set, we used 10-fold
cross-validation, a standard choice in machine learning, repeated 100
times. The optimal hyperparameters were found implicitly by caret,
which performs a grid search over the most likely values. As
pre-processing, the data were centred, scaled, and stripped of
(near-)zero-variance predictors, as is standard in machine learning.
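A single caret::train() call can express this whole pipeline. The sketch below is one plausible configuration under stated assumptions: `predictors` and `outcome` stand in for the study data, the model ("glmnet") and tuneLength are illustrative, and only the resampling scheme, metric, and pre-processing steps come from the text above:

```r
library(caret)

# 10-fold cross-validation, repeated 100 times, scored by AUC.
ctrl <- trainControl(
  method          = "repeatedcv",
  number          = 10,               # 10 folds
  repeats         = 100,              # 100 repetitions
  classProbs      = TRUE,
  summaryFunction = twoClassSummary
)

fit <- train(
  predictors, outcome,                # assumed placeholder objects
  method     = "glmnet",              # model choice is an assumption
  metric     = "ROC",                 # tune by area under the ROC curve
  preProcess = c("nzv", "center", "scale"),  # drop near-zero-variance
                                             # predictors, then centre
                                             # and scale
  tuneLength = 5,                     # depth of caret's implicit grid search
  trControl  = ctrl
)
```

Note that caret applies the preProcess steps within each resampling fold, so the centring and scaling parameters are estimated on the training portion only, avoiding information leakage into the validation folds.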