Table 2. Confusion matrices from random forest classifiers trained
on AudioSet Fingerprint (a, c) and Analytical Indices (b, d) using
uncompressed raw audio (a, b) and highly compressed CBR8 audio (c, d).
Impact of Temporal Subsetting
Temporal subsetting poses a trade-off: diel variation is reduced at
the cost of fewer recording hours. Subsetting the day into
quarters (Fig. 4) had a largely unpredictable effect on accuracy,
precision and recall. Discrimination differed clearly between pairs of
sites. Notably, the comparison of cleared and primary forest had
the highest precision across every temporal window, index choice and
compression level (Fig. 4 e,f), but its recall was not markedly different
from that of other pairs (Fig. 4 k,l). Temporal windows did not generally help
discriminate between logged and primary forest (Table 2, Fig. 4 g,h,m,n)
and the marked performance difference between AudioSet Fingerprints and
Analytical Indices was largely maintained.
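
The measures reported above can be illustrated with a minimal sketch. It is not the analysis code used in this study. It assumes that quarters are 6-hour windows starting at midnight, and that confusion matrices have true classes in rows and predicted classes in columns. The function names and the toy matrix are hypothetical and are not results.

```python
def day_quarter(hour):
    """Map an hour of day (0-23) to a quarter index (0-3).

    Assumption: quarters are 6-hour windows starting at midnight.
    """
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    return hour // 6


def confusion_matrix(true_labels, predicted_labels, classes):
    """Count predictions; rows are true classes, columns are predicted."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def precision_recall(matrix, k):
    """Return (precision, recall) for the class at row/column k."""
    true_positives = matrix[k][k]
    predicted_k = sum(row[k] for row in matrix)  # column total
    actual_k = sum(matrix[k])                    # row total
    precision = true_positives / predicted_k if predicted_k else 0.0
    recall = true_positives / actual_k if actual_k else 0.0
    return precision, recall


def accuracy(matrix):
    """Fraction of all samples on the diagonal of the matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Hypothetical toy matrix (cleared, logged, primary), not study data.
toy = [[8, 1, 1],
       [2, 5, 3],
       [0, 4, 6]]
cleared_precision, cleared_recall = precision_recall(toy, 0)
```

Precision and recall are computed per class, so a site pair can score high on one and unremarkably on the other, as seen for cleared versus primary forest.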
Combined Effects of Parameter Alterations on Classification Performance
Our model showed that performance measures were consistently higher
when classifiers were trained on the AudioSet Fingerprint rather than
Analytical Indices (Accuracy: +16.9% (z(1799) = 10.38, p < 0.001), Precision: +15.5% (z(1799) = 9.717, p < 0.001), Recall: +16.9% (z(1799) = 10.22, p < 0.001); full model outputs in Supplementary 9C). Index type was
by far the largest contributor to model accuracy (Table 3), although
there was some effect of temporal splitting, compression level and frame
size. Despite the considerable impact of compression level on index
values, it appeared to have only a minor effect on model accuracy (Fig. 5,
Table 3). The effect of frame size appeared to increase as the days were
cut into smaller temporal subsections; however, this effect was small
compared to the contribution of index type (Fig. 5). Temporal subsetting
appeared to have minimal effect on the accuracy of the AudioSet
Fingerprint classifier, which remained consistently high (70-100%, Fig. 5).
The Analytical Indices, however, became much less predictable when
temporal subsetting was used (20-100%, Fig. 5).