Strengths and limitations
This SR was based on an ambitious, open and inclusive protocol, which aimed to include studies using any test to support the diagnosis of any food allergy. In this way, we captured all available evidence beyond the commonly used tests and the most common food allergies. However, we were limited by the number of studies available for meta-analysis and by the quality of the available evidence. For instance, randomized controlled trials (RCTs) are considered the highest level of evidence for evaluating the effectiveness of diagnostic strategies; however, none of the studies identified by our SR followed this methodology. It is important to note that RCTs may not always be feasible or practical for evaluating diagnostic strategies, especially if the strategy is already in widespread use, as is the case for SPT, sIgE and CRD. In such instances, observational studies may be used to evaluate diagnostic tests. The evidence in our SR was of this kind and came from cross-sectional and cohort study designs. Although we included eight case-control studies, these were judged as having a high risk of bias and did not contribute to the certainty of evidence.
The heterogeneity of studies was a major obstacle for our SR, complicating meaningful comparisons across studies. We found variability in the definition of the target condition, in the interpretation of test results and in the characteristics of the study populations. The different diagnostic thresholds implemented across the studies, as well as the composition of the food extracts and the commercial brands used, could affect the sensitivity and specificity of the tests included in the meta-analyses. Most studies on FA diagnosis have been conducted in children. Of the studies included, 60.4% were undertaken in a population ≤12 years of age. While these studies have provided important insights, their findings may not be fully generalizable to adults.
Our data highlight the importance of having age-validated cut-offs for allergy diagnostic tests. Previous research has examined diagnostic test accuracy in specific age groups or ethnicities as one single population, and pooled analyses of these data have thus far not been performed. Although the individual raw data were not available, we were able to draw inferences of interest. For example, we found that peanut-sIgE had greater diagnostic accuracy in children under 2 years of age, while Ara h 2-sIgE exhibited high specificity among adults.
Data included in the SR came mainly from Europe. Several geographical regions, such as Southeast Asia, the Middle East, Africa and Central and South America, were represented by few or no studies. Only 13.4% of eligible data were derived from multicentre studies, highlighting a need for future collaboration to understand cross-population differences. The lack of representation from certain regions or populations limits the generalizability of the findings, which may not accurately reflect the diversity of the global population.
While studies from Europe may provide valuable insights into diagnosis in that region, it is important to recognize that test accuracy may vary in other parts of the world. We analyzed the data by geographical region and found that Ara h 2-sIgE showed higher specificity in Northern Europe and Australia than in North America or Asia [184]. Furthermore, different ethnicities within a geographical region could have different diagnostic test accuracies. Most studies included in this SR made no reference to ethnic variation within the populations studied. Only 12 studies mentioned the ethnicity of the subjects enrolled, and 3 studies [80-82] analyzed the accuracy of diagnostic tests across different ethnicities within the same population. Better descriptions of the study populations in future diagnostic test accuracy studies may help to establish more personalized approaches.
Another limitation of diagnostic studies is that the results are often dichotomized, meaning that a specific cut-off value is used to classify participants as allergic or tolerant, and this affects the reported diagnostic performance. For example, if a high cut-off value of 8 mm is used, sensitivity (the proportion of participants with true food allergy who have SPT ≥8 mm) would be relatively low, while specificity (the proportion of truly tolerant participants who have SPT <8 mm) would be relatively high. This gives the misleading impression that the test has low sensitivity, when it may in fact be good at ruling out food allergy if the SPT result is much smaller (e.g. <3 mm). Ideally, a continuous model linking actual test results to the probability of food allergy would be used to evaluate allergy tests accurately, but this approach requires additional raw data that were not available at this stage. Furthermore, because we pooled estimates across the different cut-offs employed in the various studies, the pooled estimates may not accurately represent any specific cut-off point studied. Consequently, there is a need to exercise caution and to rate the certainty of the findings lower because of the indirect nature of the evidence.
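The dependence of reported accuracy on the chosen cut-off can be illustrated with a minimal Python sketch. The wheal sizes and allergy labels below are hypothetical values invented for illustration only; they are not data from this SR, and the function name is ours.

```python
def sensitivity_specificity(wheal_mm, allergic, cutoff):
    """Return (sensitivity, specificity) when SPT >= cutoff is called positive."""
    pairs = list(zip(wheal_mm, allergic))
    tp = sum(1 for w, a in pairs if a and w >= cutoff)      # allergic, test positive
    fn = sum(1 for w, a in pairs if a and w < cutoff)       # allergic, test negative
    tn = sum(1 for w, a in pairs if not a and w < cutoff)   # tolerant, test negative
    fp = sum(1 for w, a in pairs if not a and w >= cutoff)  # tolerant, test positive
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical SPT wheal diameters (mm): 7 allergic, then 7 tolerant participants.
wheal_mm = [2, 4, 5, 7, 9, 10, 12, 0, 1, 2, 3, 4, 6, 9]
allergic = [True] * 7 + [False] * 7

for cutoff in (3, 8):
    sens, spec = sensitivity_specificity(wheal_mm, allergic, cutoff)
    print(f"cut-off {cutoff} mm: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this toy sample, raising the cut-off from 3 mm to 8 mm lowers sensitivity and raises specificity, even though the underlying test and participants are unchanged.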
The sensitivity and specificity of the tests depend on the chosen threshold. Tables S5 and S6 demonstrate that when the threshold is set sufficiently high, almost every test for every food exhibits high specificity. Similarly, when the threshold is set low enough, most tests can achieve high sensitivity. Rather than concentrating solely on pooled results to determine optimal thresholds, it is important to consider that different studies may have been designed to optimize different factors. Consequently, pooling them together may not yield meaningful results. Using Youden’s index to jointly maximize sensitivity and specificity can lead to a threshold that does not perform well for either metric.
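As a sketch of this last point, Youden's index is J = sensitivity + specificity − 1, and the "optimal" threshold is the one that maximizes J. The Python example below uses hypothetical wheal sizes (not data from this SR) and a helper name of our own choosing; exact fractions are used so that ties between cut-offs are resolved deterministically in favour of the lowest cut-off.

```python
from fractions import Fraction


def youden_best_cutoff(values, allergic, cutoffs):
    """Return (cutoff, J, sensitivity, specificity) for the cut-off maximizing J.

    A result >= cutoff is called positive. Ties keep the first (lowest) cut-off.
    """
    n_pos = sum(1 for a in allergic if a)
    n_neg = len(allergic) - n_pos
    best = None
    for c in cutoffs:
        tp = sum(1 for v, a in zip(values, allergic) if a and v >= c)
        tn = sum(1 for v, a in zip(values, allergic) if not a and v < c)
        sens = Fraction(tp, n_pos)
        spec = Fraction(tn, n_neg)
        j = sens + spec - 1  # Youden's index
        if best is None or j > best[1]:
            best = (c, j, sens, spec)
    return best


# Hypothetical SPT wheal diameters (mm): 7 allergic, then 7 tolerant participants.
wheal_mm = [2, 4, 5, 7, 9, 10, 12, 0, 1, 2, 3, 4, 6, 9]
allergic = [True] * 7 + [False] * 7

cutoff, j, sens, spec = youden_best_cutoff(wheal_mm, allergic, range(1, 13))
print(f"best cut-off {cutoff} mm: J={float(j):.2f}, "
      f"sensitivity={float(sens):.2f}, specificity={float(spec):.2f}")
```

In this toy sample the J-maximizing cut-off yields sensitivity and specificity that are both below 0.9, whereas a lower or higher cut-off would reach a higher value on one of the two metrics at the expense of the other.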
We performed meta-analyses of maximum sensitivity and maximum specificity, with the aim of providing insight into the specific cut-offs that could help rule specific food allergies in or out. A highly sensitive test, when negative, rules out allergic disease, whereas a highly specific test, when positive, rules it in. The values used in the maximum specificity and sensitivity analyses were those provided by the authors as their maximum cut-offs; the results therefore depend on the way the data were reported in the different studies.