Table 2. Confusion matrices from random forest classifiers trained on AudioSet Fingerprint (a, c) and Analytical Indices (b, d) using uncompressed raw audio (a, b) and highly compressed CBR8 audio (c, d).
Impact of Temporal Subsetting
Temporal subsetting poses a trade-off: diel variation is reduced, but at the cost of fewer recording hours. Subsetting the day into quarters (Fig. 4) had a largely unpredictable effect on accuracy, precision and recall. There were clear differences in discrimination between pairs of sites. Notably, the comparison between cleared and primary forest had the highest precision across every temporal window, index choice and compression level (Fig. 4 e,f), but its recall was not markedly different from that of other pairs (Fig. 4 k,l). Temporal windows did not generally help discriminate between logged and primary forest (Table 2, Fig. 4 g,h,m,n), and the marked performance difference between AudioSet Fingerprints and Analytical Indices was largely maintained.
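Because precision and recall can diverge for the same pair of sites, it may help to recall how both are read off a confusion matrix such as those in Table 2. The following is a minimal sketch, not the analysis code used here: the counts, the `LABELS` ordering and the helper names `per_class_metrics` and `accuracy` are hypothetical and chosen purely for illustration.

```python
# Illustrative 3-class confusion matrix (rows = true class, columns = predicted).
# These counts are invented for demonstration and are NOT taken from the study.
LABELS = ["cleared", "logged", "primary"]
CONFUSION = [
    [40, 8, 2],    # true cleared
    [6, 30, 14],   # true logged
    [1, 12, 37],   # true primary
]


def per_class_metrics(matrix, labels):
    """Return {label: (precision, recall)} for a square confusion matrix."""
    n = len(labels)
    metrics = {}
    for i, label in enumerate(labels):
        true_positives = matrix[i][i]
        # Column sum: everything the classifier predicted as this class.
        predicted = sum(matrix[row][i] for row in range(n))
        # Row sum: everything that truly belongs to this class.
        actual = sum(matrix[i])
        precision = true_positives / predicted if predicted else 0.0
        recall = true_positives / actual if actual else 0.0
        metrics[label] = (precision, recall)
    return metrics


def accuracy(matrix):
    """Overall accuracy: correctly classified samples over all samples."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


if __name__ == "__main__":
    for label, (precision, recall) in per_class_metrics(CONFUSION, LABELS).items():
        print(f"{label}: precision={precision:.3f}, recall={recall:.3f}")
    print(f"accuracy={accuracy(CONFUSION):.3f}")
```

In this made-up matrix the cleared class has few false positives (high precision) while its recall is limited by samples misassigned to logged forest, which shows how the two measures can differ for one class.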
Combined Effects of Parameter Alterations on Classification Performance
Our model showed that performance measures were consistently higher when classifiers were trained on the AudioSet Fingerprint rather than the Analytical Indices (Accuracy: +16.9% (z = 10.38₁₇₉₉, p < 0.001), Precision: +15.5% (z = 9.717₁₇₉₉, p < 0.001), Recall: +16.9% (z = 10.22₁₇₉₉, p < 0.001); full model outputs in Supplementary 9C). Index type was by far the largest contributor to model accuracy (Table 3), although temporal splitting, compression level and frame size each had some effect. Despite the considerable impact of compression level on index values, it appeared to have only a minor effect on model accuracy (Fig. 5, Table 3). The effect of frame size appeared to increase as the days were cut into smaller temporal subsections; however, this effect was small compared to the contribution of index type (Fig. 5). Temporal subsetting appeared to have minimal effect on the accuracy of the AudioSet Fingerprint classifier, which remained consistently high (70-100%, Fig. 5). The accuracy of the Analytical Indices classifier, however, became much more unpredictable when temporal subsetting was used (20-100%, Fig. 5).