Responses at time point one
Of the 918 responses overall, 653 (71.13%) were correct. Accuracy was highest with the clara technique (221; 72.22%; 95% CI: 67-77%), followed by the spectra A technique (218; 71.24%; 95% CI: 66-76%) and the spectra B technique (214; 69.93%; 95% CI: 64-75%). The differences between the techniques were not statistically significant (p=0.7112). Table 2 shows all responses across the three DIE techniques.
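As an illustration of how such accuracy figures and intervals can be computed, the following minimal sketch recalculates the accuracy of each technique from the reported counts (306 responses per technique). The text does not state which interval method was used; the Wilson score interval here is an assumption, so the bounds may differ from the reported ones by about one percentage point. The function name `wilson_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion (assumed method)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Correct responses per technique, as reported in the text.
correct = {"clara": 221, "spectra A": 218, "spectra B": 214}
N_PER_TECHNIQUE = 306  # 918 responses split over three techniques

for name, k in correct.items():
    low, high = wilson_interval(k, N_PER_TECHNIQUE)
    print(f"{name}: {k / N_PER_TECHNIQUE:.2%} (95% CI: {low:.1%}-{high:.1%})")
```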
To compare the DIE techniques, we calculated sensitivity and specificity for each technique. Sensitivity was highest for spectra B (82.35%; 95% CI: 75-88%), followed by spectra A (81.05%; 95% CI: 74-87%) and clara (75.81%; 95% CI: 68-82%). Specificity was highest for clara (68.63%; 95% CI: 61-76%), followed by spectra A (61.44%; 95% CI: 53-69%) and spectra B (57.52%; 95% CI: 49-65%). Specificity, sensitivity and accuracy are depicted in Figure 2.
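The sensitivity and specificity values can be reproduced from true-positive and true-negative counts, as in the sketch below. Only 126 TP (spectra B) and 105 TN (clara) are quoted in the text; the remaining counts, and the split into 153 positive and 153 negative responses per technique, are back-calculated from the reported percentages and should be read as assumptions. The last digit may differ from the reported values because of rounding.

```python
# Assumed: 153 positive and 153 negative responses per technique,
# inferred from 126 TP = 82.35% and 105 TN = 68.63%.
N_POS = 153
N_NEG = 153

# (TP, TN) per technique. Only 126 TP and 105 TN appear in the text;
# the other counts are back-calculated from the reported percentages.
counts = {
    "clara": (116, 105),
    "spectra A": (124, 94),
    "spectra B": (126, 88),
}


def metrics(tp, tn, n_pos=N_POS, n_neg=N_NEG):
    """Return sensitivity, specificity and accuracy as fractions."""
    return {
        "sensitivity": tp / n_pos,            # TP / (TP + FN)
        "specificity": tn / n_neg,            # TN / (TN + FP)
        "accuracy": (tp + tn) / (n_pos + n_neg),
    }


for name, (tp, tn) in counts.items():
    m = metrics(tp, tn)
    print(name, {key: f"{value:.2%}" for key, value in m.items()})
```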
If we combine the true negatives underlying the high specificity of clara (105 TN) with the true positives underlying the high sensitivity of spectra B (126 TP), we obtain a theoretical accuracy of 231 correct responses (75.49% of 306). This means that 10 more images would be answered correctly if the two techniques were combined than with clara alone (221 correct responses).
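The arithmetic behind this theoretical combination is short enough to check directly. The sketch below uses only counts quoted in the text; the variable names are illustrative.

```python
N_RESPONSES = 306      # responses per technique
CLARA_CORRECT = 221    # correct responses with clara alone
CLARA_TN = 105         # true negatives with clara (best specificity)
SPECTRA_B_TP = 126     # true positives with spectra B (best sensitivity)

# Best-case combination: clara's true negatives plus spectra B's true positives.
combined_correct = CLARA_TN + SPECTRA_B_TP
combined_accuracy = combined_correct / N_RESPONSES
gain_over_clara = combined_correct - CLARA_CORRECT

print(f"combined correct responses: {combined_correct}")
print(f"combined accuracy: {combined_accuracy:.2%}")
print(f"gain over clara alone: {gain_over_clara}")
```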
We also calculated the positive and negative predictive values for each technique, which are included in Table 2. The highest positive predictive value was reached by clara (70.73%), followed by spectra A (67.76%) and spectra B (65.97%). The highest negative predictive value was reached by spectra B (76.52%), followed by spectra A (76.42%) and clara (73.94%).
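Predictive values follow from the same confusion-matrix counts. In this sketch, false positives and false negatives are obtained by subtraction, assuming 153 positive and 153 negative responses per technique; apart from 126 TP (spectra B) and 105 TN (clara), the counts are back-calculated from the reported percentages rather than taken from the text. The helper `predictive_values` is a hypothetical name.

```python
def predictive_values(tp, tn, n_pos=153, n_neg=153):
    """Return (PPV, NPV) as fractions from TP/TN counts and class sizes."""
    fp = n_neg - tn  # negatives wrongly rated as positive
    fn = n_pos - tp  # positives wrongly rated as negative
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv


# (TP, TN) per technique; mostly back-calculated, see the note above.
counts = {
    "clara": (116, 105),
    "spectra A": (124, 94),
    "spectra B": (126, 88),
}

for name, (tp, tn) in counts.items():
    ppv, npv = predictive_values(tp, tn)
    print(f"{name}: PPV {ppv:.2%}, NPV {npv:.2%}")
```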
We further inspected accuracy for each image separately. Figure 3 shows that certain images were very difficult to interpret, or that their information was interpreted incorrectly, with accuracy below 40% (images 4 and 8). To illustrate this challenge, the most difficult images are presented in Figure 4.