Model Hyperparameters and Data Pre-processing
We used the ‘caret’ package in R to train the classifiers and assess their performance. To evaluate the models, 20% of the data (6 samples) were retained for testing and the remaining 80% (26 samples) were used to train each model. During training, we assessed each classifier's performance on the validation set and selected the model with the highest area under the ROC curve, as reported by the twoClassSummary summary function.
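The train/test split described above can be sketched with caret's createDataPartition, which stratifies the split by class. The data frame `df`, its outcome column `class`, and the seed are hypothetical stand-ins; the original study's data are not reproduced here.

```r
library(caret)

set.seed(42)  # hypothetical seed, for reproducibility of the split
# Hypothetical toy data standing in for the study's samples:
# a two-level factor outcome `class` and two numeric predictors.
df <- data.frame(
  x1    = rnorm(40),
  x2    = rnorm(40),
  class = factor(rep(c("case", "control"), each = 20))
)

# Hold out 20% for testing; train on the remaining 80%,
# with the split stratified on the class labels.
train_idx <- createDataPartition(df$class, p = 0.8, list = FALSE)
train_set <- df[train_idx, ]
test_set  <- df[-train_idx, ]
```

createDataPartition keeps the class proportions of `df` in both subsets, which matters for small samples such as the 32 used here.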
To avoid overfitting to the validation set, we performed cross-validation repeated 100 times, with the number of folds set to 10, a standard choice in machine learning. The optimal hyperparameters were found implicitly by caret, which performs a grid search over the most likely values. As pre-processing, the data were centred and scaled, and variables with (near) zero variance were removed, as is standard practice in machine learning.
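The repeated cross-validation, ROC-based model selection, grid search, and pre-processing steps map onto caret's trainControl and train as sketched below. The toy data frame and the choice of `method = "knn"` are hypothetical illustrations (the original classifiers are not specified in this excerpt); the 10 folds, 100 repeats, ROC metric, and pre-processing options follow the text.

```r
library(caret)

set.seed(1)
# Hypothetical toy two-class data standing in for the training set.
train_set <- data.frame(
  x1    = c(rnorm(20, mean = 0), rnorm(20, mean = 2)),
  x2    = rnorm(40),
  class = factor(rep(c("case", "control"), each = 20))
)

ctrl <- trainControl(
  method          = "repeatedcv",    # k-fold cross-validation, repeated
  number          = 10,              # 10 folds
  repeats         = 100,             # repeated 100 times
  classProbs      = TRUE,            # class probabilities, required for ROC
  summaryFunction = twoClassSummary  # reports ROC AUC, sensitivity, specificity
)

# caret grid-searches the tuning parameter(s) of the chosen method
# (here knn's neighbour count k) and keeps the value with the highest
# mean ROC AUC across all resamples.
fit <- train(
  class ~ ., data = train_set,
  method     = "knn",                         # hypothetical classifier
  metric     = "ROC",                         # select by area under the ROC curve
  preProcess = c("nzv", "center", "scale"),   # drop near-zero-variance, centre, scale
  trControl  = ctrl
)
```

The preProcess options are estimated on each training resample and applied to the corresponding held-out fold, so the centring and scaling do not leak information from the validation data.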