Statistical analysis
Statistical analyses were performed using R version 3.3.1 (R Core Team,
2016; R Foundation for Statistical Computing, Vienna, Austria).
Continuous variables are presented as medians [interquartile range] and
categorical variables as numbers (%). Quantitative data were compared
using Wilcoxon-Mann-Whitney tests. Categorical variables were compared
using the chi-square test or Fisher's exact test, as appropriate. The
number of positive biological sources by age was evaluated using
quasi-Poisson regression to account for overdispersion. Missing data
were not imputed. Heatmaps were used to visualize the data as a grid of
colors reflecting c-sIgE levels (ISU), with rows representing
individuals and columns representing components. The heatmaps were
stratified by severity group, and individuals were ordered by age.
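The quasi-Poisson adjustment can be illustrated with a short sketch. It is written in Python rather than R (in R the corresponding model would be fitted with `glm(..., family = quasipoisson)`), assumes a single age covariate, and the function name and toy counts are hypothetical. The coefficients are the ordinary Poisson estimates obtained by iteratively reweighted least squares; the dispersion φ is estimated as the Pearson χ² statistic divided by the residual degrees of freedom, and the standard errors are inflated by √φ, so φ > 1 signals overdispersion.

```python
import math


def fit_quasi_poisson(age, counts, max_iter=100, tol=1e-10):
    """Sketch: quasi-Poisson regression log(E[y]) = b0 + b1 * age.

    Coefficients are the Poisson estimates (IRLS, log link); the
    quasi-Poisson step estimates the dispersion phi from the Pearson
    chi-square and scales the standard errors by sqrt(phi)."""
    n = len(age)
    mean_y = sum(counts) / n
    b0, b1 = math.log(mean_y if mean_y > 0 else 1.0), 0.0
    for _ in range(max_iter):
        eta = [b0 + b1 * x for x in age]
        mu = [math.exp(e) for e in eta]
        # Working response and working weights for the log link.
        z = [e + (y - m) / m for e, y, m in zip(eta, counts, mu)]
        w = mu
        # Weighted least squares via the 2 x 2 normal equations.
        sw = sum(w)
        swx = sum(wi * x for wi, x in zip(w, age))
        swxx = sum(wi * x * x for wi, x in zip(w, age))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * x * zi for wi, x, zi in zip(w, age, z))
        det = sw * swxx - swx ** 2
        new_b0 = (swxx * swz - swx * swxz) / det
        new_b1 = (sw * swxz - swx * swz) / det
        converged = abs(new_b0 - b0) < tol and abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    mu = [math.exp(b0 + b1 * x) for x in age]
    # Pearson chi-square over residual degrees of freedom (n - 2 parameters).
    phi = sum((y - m) ** 2 / m for y, m in zip(counts, mu)) / (n - 2)
    # Poisson covariance (X'WX)^-1 at the fitted values, scaled by phi.
    sw = sum(mu)
    swx = sum(m * x for m, x in zip(mu, age))
    swxx = sum(m * x * x for m, x in zip(mu, age))
    det = sw * swxx - swx ** 2
    return {"b0": b0, "b1": b1, "phi": phi,
            "se_b0": math.sqrt(phi * swxx / det),
            "se_b1": math.sqrt(phi * sw / det)}


if __name__ == "__main__":
    ages = [1, 2, 3, 4, 5, 6]
    n_sources = [0, 10, 1, 12, 2, 15]  # hypothetical counts
    fit = fit_quasi_poisson(ages, n_sources)
    print(fit["b1"], fit["phi"], fit["se_b1"])
```

Because only the standard errors change, a quasi-Poisson fit gives the same age effect as a plain Poisson fit but with wider confidence intervals when the counts vary more than a Poisson model allows.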
Both unsupervised and supervised analyses were performed to explore the
underlying correlation structure of the data. Components with a positive
response (≥0.3 ISU) in at least three subjects, and participants with at
least one c-sIgE ≥0.3 ISU, were retained for these analyses (Sup Fig. 1).
Principal component analyses (PCAs) were performed using the R function
“prcomp”. Biplots of the principal components derived from the PCAs
were plotted with participants classified as having severe or non-severe
disease.
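The selection rule and the PCA step can be sketched as follows, assuming a participants × components matrix of c-sIgE values in ISU. The order of filtering (components first, then participants on the retained components) is an assumption, as is the use of prcomp's default settings (centering without scaling); whether the original analysis scaled the variables is not stated. Function names and the toy data are hypothetical, and the biplot itself is not drawn.

```python
import numpy as np

THRESHOLD = 0.3   # ISU cut-off for a positive c-sIgE response (from the text)
MIN_SUBJECTS = 3  # minimum number of positive subjects per component


def filter_sensitizations(X, threshold=THRESHOLD, min_subjects=MIN_SUBJECTS):
    """X: participants x components matrix of c-sIgE values (ISU).
    Keeps components positive in >= min_subjects participants, then
    participants with at least one positive retained component.
    Returns the filtered matrix and the boolean row and column masks."""
    positive = X >= threshold
    keep_cols = positive.sum(axis=0) >= min_subjects
    keep_rows = positive[:, keep_cols].any(axis=1)
    return X[np.ix_(keep_rows, keep_cols)], keep_rows, keep_cols


def pca_like_prcomp(X):
    """PCA by SVD of the column-centred matrix, mirroring the defaults of
    R's prcomp (center = TRUE, scale. = FALSE)."""
    Xc = X - X.mean(axis=0)
    _, d, Vt = np.linalg.svd(Xc, full_matrices=False)
    sdev = d / np.sqrt(X.shape[0] - 1)  # standard deviations of the PCs
    rotation = Vt.T                     # loadings, one column per PC
    scores = Xc @ rotation              # participant coordinates (prcomp's x)
    return sdev, rotation, scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.exponential(0.5, size=(12, 5))  # hypothetical 12 x 5 matrix
    Xf, rows, cols = filter_sensitizations(X)
    sdev, rotation, scores = pca_like_prcomp(Xf)
    print(Xf.shape, sdev.round(3))
```

A biplot overlays the first two columns of `scores` (participants, colored by severity) with the first two columns of `rotation` (components drawn as arrows).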
Random forest analyses were then performed using the known severity
class of the patients as the outcome. Receiver operating characteristic
(ROC) curves were used to assess the performance of the model that used
all c-sIgE for classification and to appraise its predictions. The area
under the curve (AUC) was used to grade discriminative performance:
excellent for an AUC between 0.90 and 1.00; good for an AUC between
0.80 and 0.90; fair for an AUC between 0.70 and 0.80; poor for an AUC
between 0.60 and 0.70; and fail for an AUC between 0.50 and 0.60.
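The AUC summarizes the ROC curve as a single number. A minimal sketch, assuming the model outputs one score per patient (for instance a predicted probability of severe disease), computes the AUC from pairwise comparisons of severe and non-severe patients and maps it onto the scale above. The function names and example scores are hypothetical, and assigning shared boundaries to the higher grade is an assumption.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a randomly chosen severe case (label 1)
    gets a higher score than a randomly chosen non-severe case (label 0),
    counting ties as 1/2; this equals the Mann-Whitney U statistic divided
    by n1 * n0 and the area under the empirical ROC curve."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case of each class")
    wins = 0.0
    for a in pos:
        for b in neg:
            wins += 1.0 if a > b else 0.5 if a == b else 0.0
    return wins / (len(pos) * len(neg))


def grade_auc(auc):
    """Qualitative grade from the scale in the text. Shared boundaries
    (e.g. exactly 0.80) go to the higher grade, which is an assumption."""
    for cutoff, grade in [(0.90, "excellent"), (0.80, "good"),
                          (0.70, "fair"), (0.60, "poor"), (0.50, "fail")]:
        if auc >= cutoff:
            return grade
    return "below 0.50 (outside the stated scale)"


if __name__ == "__main__":
    probs = [0.9, 0.4, 0.6, 0.2]  # hypothetical predicted probabilities
    severe = [1, 1, 0, 0]
    auc = roc_auc(probs, severe)
    print(auc, grade_auc(auc))  # prints: 0.75 fair
```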
The prediction errors of the random forest analyses were also assessed
by calculating the out-of-bag (OOB) error. Furthermore, an unsupervised
clustering approach was applied to identify patterns of c-sIgE
sensitization among participants. Sensitization clusters were derived by
Bayesian estimation of a mixture of Bernoulli distributions (Bernoulli
mixture model), as previously described in detail (15). The BayesBinMix
R package (15) was used to jointly estimate the number of clusters and
the parameters of the Bernoulli mixture model by Markov chain Monte
Carlo sampling. A Poisson prior distribution was placed on the number of
clusters and a uniform prior distribution on the Bernoulli parameters.
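A minimal Gibbs sampler shows the structure of a Bayesian Bernoulli mixture model. It is a simplification and not the BayesBinMix algorithm: the number of clusters K is fixed rather than sampled under the Poisson prior, the Dirichlet(1, ..., 1) prior on the mixing weights is an assumption, and the binary input (e.g. 1 for a component with c-sIgE ≥0.3 ISU) is assumed rather than taken from the text. The uniform Beta(1, 1) prior on the Bernoulli parameters matches the description above; the function name and toy data are hypothetical.

```python
import math
import random


def gibbs_bernoulli_mixture(X, K, n_sweeps=300, seed=1):
    """Gibbs sampler for a Bernoulli mixture with a FIXED number of
    clusters K (sampling K itself, as BayesBinMix does, is omitted).

    X: list of 0/1 rows (participants x components).
    Priors: Beta(1, 1), i.e. uniform, on each Bernoulli parameter, and
    Dirichlet(1, ..., 1) on the mixing weights (an assumption here).
    Returns the allocations, weights and parameters of the last sweep."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    z = [rng.randrange(K) for _ in range(n)]
    eps = 1e-12  # keeps log() finite if a draw lands on exactly 0 or 1
    for _ in range(n_sweeps):
        # 1. Sufficient statistics for the current allocations.
        size = [0] * K
        ones = [[0] * p for _ in range(K)]
        for x, k in zip(X, z):
            size[k] += 1
            for j, v in enumerate(x):
                ones[k][j] += v
        # 2. Conjugate updates: Beta for theta, Dirichlet (via Gammas) for weights.
        theta = [[min(max(rng.betavariate(1 + ones[k][j],
                                          1 + size[k] - ones[k][j]), eps), 1 - eps)
                  for j in range(p)] for k in range(K)]
        g = [rng.gammavariate(1 + size[k], 1.0) for k in range(K)]
        total = sum(g)
        weights = [gk / total for gk in g]
        # 3. Resample each allocation from its full conditional.
        for i, x in enumerate(X):
            logp = []
            for k in range(K):
                lp = math.log(weights[k])
                for j, v in enumerate(x):
                    lp += math.log(theta[k][j] if v else 1 - theta[k][j])
                logp.append(lp)
            top = max(logp)
            probs = [math.exp(lp - top) for lp in logp]
            z[i] = rng.choices(range(K), weights=probs)[0]
    return z, weights, theta


if __name__ == "__main__":
    # Hypothetical binary sensitization profiles with two clear patterns.
    X = [[1] * 6 + [0] * 6] * 20 + [[0] * 6 + [1] * 6] * 20
    z, weights, theta = gibbs_bernoulli_mixture(X, K=2)
    print(z)
```

In a real analysis the allocations would be summarized over many post-burn-in sweeps, and cluster labels would need to be aligned across sweeps (label switching); this sketch returns only the last sweep.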