Table 3. ANOVA table for the model terms in the beta regression
model of the accuracy data (significance: *** p < 0.001, **
p < 0.01, * p < 0.05). Equivalent tables for precision
and recall are given in Supplementary 9C.
Discussion
Ecoacoustics is a new and rapidly expanding field of ecology with great
power to describe ecological systems (e.g. Sethi et al., 2020), but
methodological choices have proliferated whose impacts on ecoacoustic
analysis are poorly known. We show that the choice of audio index
is key, and confirm the finding of Sethi et al. (2020) that a
multi-dimensional generalist classifier outperforms more traditional
Analytical Indices regardless of the level of audio compression or the
recording schedule.
Analytical Indices have been constrained to a limited set of features
within soundscapes, leading to strong non-independence. For example,
ADI, AEve and H indices are all summaries of the evenness of frequency
band occupancy (Sueur, Aubin and Simonis, 2008; Villanueva-Rivera et al., 2011). This non-independence can further decrease the
dimensionality of suites of Analytical Indices, which are already
typically small. Here, we use just the mean values of Analytical
Indices, but other studies have incorporated both the mean and standard
deviation (Bradfer‐Lawrence et al., 2019), which provides further
dimensionality. Although the AudioSet Fingerprint clearly benefits from
a large number of relatively uncorrelated acoustic features, most
Analytical Indices have the advantage of being designed to capture
ecologically relevant aspects of the soundscape.
Compression affected the quantification of all indices (Fig. 3) and –
although the qualitative patterns are noisy – the groupings seen may
reflect the underlying algorithms. The apparent threshold for AudioSet
Fingerprint at CBR16 may be due to the obligatory loss in audio quality
before samples pass to the AudioSet CNN. The audio is downsampled to
16kHz and then presented as a mel-shifted spectrogram, which increases
sensitivity in frequency ranges relevant to human hearing, akin to those
frequencies favoured in commercial compression. Coupled with its
variable-quality training set (YouTube videos), these factors may
predispose the AudioSet Fingerprint to perform as well with high-quality
audio as with intermediate- and low-quality MP3s.
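The mel-shifting step described above can be sketched with the standard (HTK-style) mel formula. This is a minimal numpy illustration of why high frequencies are squeezed together on the mel axis, not the AudioSet pipeline itself:

```python
import numpy as np

def hz_to_mel(f_hz):
    """Standard (HTK-style) mel mapping: roughly linear below ~1 kHz and
    logarithmic above, mirroring human pitch perception."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

# After downsampling to 16 kHz, the usable band tops out at the 8 kHz
# Nyquist limit, so content above 8 kHz is gone before the CNN sees it.
nyquist = 16_000 / 2
mels = hz_to_mel(np.array([500.0, 4_000.0, nyquist]))

# Equal steps in Hz shrink on the mel axis at high frequency: the mel
# span of 0-1 kHz far exceeds the mel span of 7-8 kHz.
low_span = hz_to_mel(1_000) - hz_to_mel(0)
high_span = hz_to_mel(8_000) - hz_to_mel(7_000)
assert low_span > high_span
```

This compression of the high end is why the mel representation, like commercial audio codecs, allocates most of its resolution to the frequency range of human hearing.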
The M and NDSI indices were also largely unaffected by compression until
the frequency range was reduced. When MP3 audio is compressed below
32 kb/s, the audio swaps from being encoded as MPEG-1 Audio Layer III
(which supports a maximum frequency of 16–24 kHz) to MPEG-2 Audio Layer
III (max: 8–12 kHz); this change in format removes signals beyond
the cut-off frequency threshold. A further reduction is seen at CBR8,
when the encoding changes again to MPEG-2.5 Audio Layer III (max: 4–6 kHz).
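The cascade of encoder fall-backs can be summarised as a small lookup. The thresholds and frequency ceilings below simply restate the figures given in the text; exact behaviour varies with the encoder used:

```python
def mp3_band_ceiling(cbr_kbps):
    """Map a constant bitrate (kb/s) to the MPEG variant typically used
    and the approximate maximum-frequency range (kHz) it can represent.
    Figures restate those in the text; behaviour is encoder-dependent."""
    if cbr_kbps >= 32:
        return "MPEG-1 Audio Layer III", (16, 24)
    if cbr_kbps >= 16:
        return "MPEG-2 Audio Layer III", (8, 12)
    return "MPEG-2.5 Audio Layer III", (4, 6)

fmt, (lo_khz, hi_khz) = mp3_band_ceiling(8)  # CBR8 falls back to MPEG-2.5
```

Any signal above the returned ceiling is simply absent from the compressed file, which is the mechanism behind the step changes seen in M and NDSI.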
The M index is explicitly a measure of amplitude (Sueur et al.,
2014) and is largely unaffected until downsampling reduces amplitude.
Similarly, NDSI measures the proportion of sound in biophonic vs.
anthropophonic frequency bands: as downsampling progressively eliminates
sounds within the frequency range (2–11 kHz) containing most
biophony, NDSI is known to increase (Kasten et al., 2012).
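NDSI's sensitivity to a shrinking frequency range follows directly from its definition. A minimal sketch, using the band limits of Kasten et al. (2012) and a synthetic spectrum (not our field data):

```python
import numpy as np

def ndsi(power, freqs, anthro=(1_000, 2_000), bio=(2_000, 11_000)):
    """NDSI = (B - A) / (B + A), where B is the summed power in the
    biophonic band and A in the anthropophonic band (Kasten et al., 2012)."""
    a = power[(freqs >= anthro[0]) & (freqs < anthro[1])].sum()
    b = power[(freqs >= bio[0]) & (freqs < bio[1])].sum()
    return float((b - a) / (b + a))

rng = np.random.default_rng(0)
freqs = np.linspace(0, 22_050, 512)
power = rng.random(512)               # synthetic broadband spectrum
before = ndsi(power, freqs)

clipped = power.copy()
clipped[freqs > 8_000] = 0.0          # simulate an 8 kHz encoder ceiling
after = ndsi(clipped, freqs)          # only the biophonic band is truncated
assert -1.0 <= after <= 1.0 and after != before
```

Because the anthropophonic band sits entirely below typical encoder cut-offs while the biophonic band does not, downsampling alters only one side of the ratio; the direction of the shift in real soundscapes depends on how acoustic energy is actually distributed across the two bands.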
AEve and H, both of which describe the spread and evenness of amplitude
over the full range of frequencies, showed a gradual increase in D that
reversed when the maximum coded frequency was reduced. The two
measures differ in quantifying dominance (AEve: Villanueva-Rivera et al.,
2011) and evenness (H: Sueur et al., 2014) across bands, but may
share a common explanation: compression preferentially removes amplitude
from some bands, initially decreasing evenness, but downsampling then
removes bands entirely, possibly restoring a more even distribution.
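This proposed mechanism — evenness dropping as some bands lose amplitude, then recovering once bands vanish entirely — can be illustrated with the normalised spectral entropy that forms the spectral component of H. This is a synthetic-spectrum sketch, not the full index:

```python
import numpy as np

def spectral_entropy(power):
    """Normalised Shannon entropy of a spectrum (the spectral term of
    the H index, Sueur et al., 2008): 1 = perfectly even band occupancy."""
    p = np.asarray(power, dtype=float)
    p = p / p.sum()
    return float(-(p * np.log(p)).sum() / np.log(p.size))

flat = spectral_entropy(np.ones(256))                     # fully even
skewed = spectral_entropy(np.linspace(1.0, 50.0, 256) ** 4)
# Compression-like amplitude loss in some bands lowers evenness...
assert skewed < flat
# ...but dropping the weakest bands outright (downsampling) renormalises
# over the surviving bands, pushing evenness back up.
truncated = spectral_entropy(np.linspace(1.0, 50.0, 256)[128:] ** 4)
assert truncated > skewed
```

The non-monotonic response of AEve and H to increasing compression is consistent with this two-stage behaviour.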
ACI and Bio both depend on high-frequency or quieter sounds
and were generally the most severely affected by compression. ACI measures
frequency-band-dependent changes in amplitude over time (Pieretti,
Farina and Morri, 2011) and is reduced when there is minimal variation
between time steps. The loss of “masked” sounds under low compression, and
then of 16–24 kHz sound under CBR16, may reflect the loss of ecoacoustic
temporal variation: this band includes the calling range of many
invertebrates, birds, mammals and amphibians (Browning et al.,
2017). The Bio index similarly quantifies the spread of frequencies in
the range 2–11 kHz, all relative to the quietest 1 kHz band (Boelman
et al., 2007): the loss of quiet frequency bands therefore makes it
uniquely sensitive to compression. Despite both of these indices
incurring alterations 200% larger than the uncompressed range, the
Analytical Indices classifier accuracy remained robust to
compression, perhaps suggesting these indices are less important for
classification than the others. Bradfer‐Lawrence et al. (2019)
have already shown that the Bio index contributes little additional
power to classification tasks, but found that ACI was the strongest
individual contributor. Our findings suggest this ranking may not be
consistent across different levels of compression.
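For readers unfamiliar with ACI, its dependence on temporal variation can be sketched in a few lines. This is a simplified per-bin version, omitting the temporal-cluster windowing of the full Pieretti, Farina and Morri (2011) formulation:

```python
import numpy as np

def aci(spectrogram):
    """Simplified Acoustic Complexity Index: for each frequency bin, sum
    the absolute amplitude differences between adjacent time steps,
    normalise by the bin's total amplitude, then sum over bins."""
    s = np.asarray(spectrogram, dtype=float)      # shape: (freq, time)
    diffs = np.abs(np.diff(s, axis=1)).sum(axis=1)
    totals = s.sum(axis=1)
    return float((diffs / totals).sum())

rng = np.random.default_rng(1)
varied = rng.random((64, 100))    # fluctuating synthetic soundscape
steady = np.ones((64, 100))       # constant tone: no temporal variation
assert aci(varied) > aci(steady)  # minimal variation between steps -> low ACI
```

Smoothing away quiet, rapidly varying detail — exactly what lossy compression does — therefore reduces the numerator of each bin's ratio, which is consistent with ACI being among the indices most affected.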
Our findings reflect those of an earlier study that explored the effect
of MP3 compression (VBR0 and CBR128) on indices describing specific bird
calls (Araya-Salas, Smith-Vidaurre and Webster, 2019). They found that
compression did not cause a systematic deviation in all indices;
rather, indices designed to capture extreme frequencies were less precise
after compression, particularly with VBR-encoded files. While some of
these principles are present in our findings, the use of a wider range
of compressions has allowed us to develop a more complete description of
the action of compression on soundscape indices.
We found that even the highest rate of compression caused a
comparatively small reduction in the overall accuracy of the classification
task (5.8% and 3% for the Analytical Indices and the AudioSet Fingerprint
respectively; 5-minute, whole-day). In both cases, the reduction in
accuracy was explained by a higher degree of overlap between primary and
logged forest. When audio is compressed, the whole signal is altered, but
higher frequencies and quieter sounds are more severely altered and
reduced than others. Higher and quieter frequencies (akin to specific
animal vocalisations) may therefore be more important for separating
logged from primary forest, but less so for discerning cleared forest from
other forest types (which may depend more on overall sound level). These
proportionally small differences, while somewhat reassuring, should be
treated with caution, as they could be due to the large differences among
our three habitat classes. Accuracy may not have been conserved so well in
areas of more closely related forest.
Both the Analytical Indices and the AudioSet Fingerprint showed similar
changes in variance as a result of recording length. Transient vocalisers
are therefore likely to be somewhat important in determining the AudioSet
Fingerprint, and of mixed importance across the Analytical Indices.
The ACI index was not impacted by recording length, despite specifically
quantifying how the soundscape changes over time (Pieretti, Farina and
Morri, 2011). ADI, AEve and H did all incur a change in
variance as recording length changed; interestingly, these indices do not
consider any temporal information but rather just the spread of frequency
(Sueur et al., 2008; Villanueva-Rivera et al., 2011),
indicating that transient calls, akin to short-term anomalies in
frequency, are perhaps lost when recording windows are altered.
Finally, we found that subsetting audio data temporally and analysing
the subsets separately had an unpredictable effect on classification
accuracy: the AudioSet Fingerprint classifier stayed consistently high,
while the Analytical Indices classifier returned accuracies
anywhere between 20% and 100%. Temporal subsetting can reduce the impact
of diel variation on analyses, but poses a trade-off as it reduces the
amount of data used to train the classifier. It has been recommended that
>120 h of recordings are required for Analytical Indices
to stabilise (Bradfer‐Lawrence et al., 2019), yet in our study
we had just 70–75 h of recordings per site. Overall, we found that
compression, frame size and temporal subsetting caused a small decrease
in classifier accuracy, with the largest overall contributor being the
choice of the AudioSet Fingerprint over Analytical Indices. The AudioSet
Fingerprint classifier, temporally sectioned and trained on just 2 hours
of data, was able, on average, to outperform the Analytical Indices
classifier trained on the full 24 h.
Recommendations and Conclusion
Based on the results of this study we provide the following four
recommendations:
- When classifying soundscapes, use AudioSet Fingerprinting rather than
Analytical Indices.
- Lossless compression is always desirable, but if data
storage or transmission becomes a bottleneck to a study, we advise using
the VBR (quality = 0) MP3 encoder if using Analytical Indices, which
will reduce the file size to roughly 23% of the original while having
minimal impact on indices (other than ACI). The AudioSet Fingerprint,
however, is more robust to compression and so can tolerate compression
down to CBR64 (8% of the original file size) without
significant effect.
- If further compression is a necessity, use indices that describe the
general energy of the system rather than those that depend on
high-frequency or quieter sounds.
- Temporal subsetting may be a useful alternative for capturing
soundscape descriptors with AudioSet Fingerprinting when data storage
costs are a bottleneck. However, temporal subsetting should be used
with caution with Analytical Indices.
There exists a trade-off between the quality and volume of data that can
be stored in ecoacoustics. We have investigated the impact of
compression along a gradient of habitat disturbance, providing evidence
that compressed audio can be used without severely affecting acoustic
index values. The ability to use compression may reduce experimental
costs, remove bottlenecks in study design, and help remote ecoacoustic
recorders reach true autonomy. Moreover, by providing a quantified
description of how individual indices, and more broadly grouped index
categories, respond to compression, we have enabled comparisons to be
drawn between studies of compressed and non-compressed audio. Increasing
comparability of studies will become progressively more important as global
ecoacoustic databases and recording sites grow and open up novel
opportunities to explore datasets across huge temporal and geographic
scales. Such a task can now be cautiously approached using meta-analysis
of non-uniform acoustic data, while a simultaneous trajectory towards
more standardised practices will enable more rigorous analyses in the
future.
Acknowledgements
We firstly thank Dr Henry Bernard at the Sustainability of Altered
Forest Ecosystems project in Malaysian Borneo for permitting us to
research within their field sites. This project was funded by the
Natural Environment Research Council, UK, within the Quantitative
Methods in Ecology and Evolution (QMEE) Centre for Doctoral Training.
Author’s Contributions
B.E.H., L.P., S.S.S, R.M.E., and C.D.L.O. contributed to the
conceptualization and implementation of the study. B.E.H. and R.M.E. led
fieldwork and data collection. B.E.H., S.S.S., L.P., and C.D.L.O.
designed and ran the index extraction pipeline and data analysis. B.E.H.
and C.D.L.O. developed the statistics and figures for the main text and
supplementary material. B.E.H. led the manuscript writing process, aided
by revisions provided by all authors.
Data Accessibility
Acoustic Data: Will be made available on Zenodo and accessible via a
permanent DOI.
Analytical Indices / AudioSet Fingerprint Data: Will be available on the
SAFE project website, and accessible via a permanent DOI.
Analysis Scripts: Available on Github at
https://github.com/BeckyHeath/Experimental-Variation-Ecoacoustics-Analysis-Scripts
(made public after publication)
References
Araya-Salas, M., Smith-Vidaurre, G. and Webster, M. (2019) ‘Assessing
the effect of sound file compression and background noise on measures of
acoustic signal structure’, Bioacoustics. Taylor & Francis,
28(1), pp. 57–73. doi: 10.1080/09524622.2017.1396498.
Boelman, N. T. et al. (2007) ‘Multi-trophic invasion resistance
in Hawaii: Bioacoustics, field surveys, and airborne remote sensing’,
Ecological Applications, 17(8), pp. 2137–2144. doi: 10.1890/07-0004.1.
Bohnenstiehl, D. R. et al. (2018) ‘Investigating the utility of
ecoacoustic metrics in marine soundscapes’, Journal of
Ecoacoustics. doi: 10.22261/jea.r1156l.
Bradfer‐Lawrence, T. et al. (2019) ‘Guidelines for the use of
acoustic indices in environmental research’, Methods in Ecology
and Evolution , 10(10), pp. 1796–1807. doi: 10.1111/2041-210x.13254.
Browning, E. et al. (2017) ‘Passive acoustic monitoring in
ecology and conservation’, WWF Conservation Technology Series 1 ,
2(October), pp. 1–75. doi: 10.13140/RG.2.2.18158.46409.
Buxton, R. T. et al. (2018) ‘Efficacy of extracting indices from
large-scale acoustic recordings to monitor biodiversity’,
Conservation Biology, 32(5), pp. 1174–1184. doi: 10.1111/cobi.13119.
Costello, M. J. et al. (2016) ‘Field work ethics in biological
research’, Biological Conservation . Elsevier Ltd, 203, pp.
268–271. doi: 10.1016/j.biocon.2016.10.008.
Cribari-Neto, F. and Zeileis, A. (2010) ‘Beta regression in R’,
Journal of Statistical Software, 34(2), pp. 1–24. doi:
10.18637/jss.v034.i02.
Douma, J. C. and Weedon, J. T. (2019) ‘Analysing continuous proportions
in ecology and evolution: A practical introduction to beta and Dirichlet
regression’, Methods in Ecology and Evolution , 10(9), pp.
1412–1430. doi: 10.1111/2041-210X.13234.
Eldridge, A. et al. (2018) ‘Sounding out ecoacoustic metrics:
Avian species richness is predicted by acoustic indices in temperate but
not tropical habitats’, Ecological Indicators . Elsevier B.V., 95,
pp. 939–952. doi: 10.1016/j.ecolind.2018.06.012.
Ewers, R. M. et al. (2011) ‘A large-scale forest fragmentation
experiment: The stability of altered forest ecosystems project’,
Philosophical Transactions of the Royal Society B: Biological
Sciences, 366(1582), pp. 3292–3302. doi: 10.1098/rstb.2011.0049.
Fitzpatrick, M. C. et al. (2009) ‘Observer bias and the detection
of low-density populations’, Ecological Applications , 19(7), pp.
1673–1679.
Fuller, S. et al. (2015) ‘Connecting soundscape to landscape:
Which acoustic index best describes landscape configuration?’,
Ecological Indicators. Elsevier Ltd, 58, pp. 207–215. doi:
10.1016/j.ecolind.2015.05.057.
Gemmeke, J. F. et al. (2017) ‘Audio Set: An ontology and
human-labeled dataset for audio events’, ICASSP, IEEE
International Conference on Acoustics, Speech and Signal Processing -
Proceedings , pp. 776–780. doi: 10.1109/ICASSP.2017.7952261.
Gómez, W. E., Isaza, C. V. and Daza, J. M. (2018) ‘Identifying disturbed
habitats: A new method from acoustic indices’, Ecological
Informatics , 45. doi: 10.1016/j.ecoinf.2018.03.001.
Hershey, S. et al. (2017) ‘CNN architectures for large-scale
audio classification’, ICASSP, IEEE International Conference on
Acoustics, Speech and Signal Processing - Proceedings , pp. 131–135.
doi: 10.1109/ICASSP.2017.7952132.
Huff, M. H. et al. (2000) ‘A habitat-based point-count protocol
for terrestrial birds, emphasizing Washington and Oregon’, General
Technical Reports of the US Department of Agriculture, Forest Service ,
(PNW-GTR-501), pp. 2–30.
Kasten, E. P. et al. (2012) ‘The remote environmental assessment
laboratory’s acoustic library: An archive for studying soundscape
ecology’, Ecological Informatics . Elsevier B.V., 12, pp. 50–67.
doi: 10.1016/j.ecoinf.2012.08.001.
Linke, S. and Deretic, J. A. (2020) ‘Ecoacoustics can detect ecosystem
responses to environmental water allocations’, Freshwater
Biology , 65(1), pp. 133–141. doi: 10.1111/fwb.13249.
Mammides, C. et al. (2017) ‘Do acoustic indices correlate with
bird diversity? Insights from two biodiverse regions in Yunnan Province,
south China’, Ecological Indicators . Elsevier, 82(March), pp.
470–477. doi: 10.1016/j.ecolind.2017.07.017.
Medina-García, A., Araya-Salas, M. and Wright, T. F. (2015) ‘Does vocal
learning accelerate acoustic diversification? Evolution of contact calls
in Neotropical parrots’, Journal of Evolutionary Biology , 28(10),
pp. 1782–1792. doi: 10.1111/jeb.12694.
Pfeifer, M. et al. (2016) ‘Mapping the structure of Borneo’s
tropical forests across a degradation gradient’, Remote Sensing of
Environment , (176), pp. 84–97.
Pieretti, N., Farina, A. and Morri, D. (2011) ‘A new methodology to
infer the singing activity of an avian community: The Acoustic
Complexity Index (ACI)’, Ecological Indicators . Elsevier Ltd,
11(3), pp. 868–873. doi: 10.1016/j.ecolind.2010.11.005.
Rapport, D. J. (1989) ‘What constitutes ecosystem health?’,
Perspectives in Biology and Medicine, 33(1), pp. 120–132. doi:
10.1353/pbm.1990.0004.
Rapport, D. J., Costanza, R. and McMichael, A. J. (1998) ‘Assessing
ecosystem health’, Trends in Ecology and Evolution , 13(10), pp.
397–402. doi: 10.1016/S0169-5347(98)01449-9.
Rempel, R. S. et al. (2005) ‘Bioacoustic monitoring of forest
songbirds: interpreter variability and effects of configuration and
digital processing methods in the laboratory’, Journal of Field
Ornithology , 76(1), pp. 1–11. doi: 10.1648/0273-8570-76.1.1.
Roca, I. T. and Proulx, R. (2016) ‘Acoustic assessment of species
richness and assembly rules in ensiferan communities from temperate
ecosystems’, Ecology , 97(1), pp. 116–123. doi:
10.1890/15-0290.1.
Saito, K. et al. (2015) ‘Utilizing the Cyberforest live sound
system with social media to remotely conduct woodland bird censuses in
Central Japan’, Ambio . Springer Netherlands, 44(4), pp. 572–583.
doi: 10.1007/s13280-015-0708-y.
Sethi, S. S. et al. (2018) ‘Robust, real-time and autonomous
monitoring of ecosystems with an open, low-cost, networked device’,
Methods in Ecology and Evolution, 2018(December 2017), pp. 1–5.
doi: 10.1111/2041-210X.13089.
Sethi, S. S. et al. (2020) ‘Characterizing soundscapes across
diverse ecosystems using a universal acoustic feature set’, Proceedings
of the National Academy of Sciences, (24), pp. 1–7. doi:
10.1073/pnas.2004702117.
Struebig, M. J. et al. (2013) Quantifying the Biodiversity
Value of Repeatedly Logged Rainforests. Gradient and Comparative
Approaches from Borneo . 1st edn, Advances in Ecological
Research . 1st edn. Elsevier Ltd. doi:
10.1016/B978-0-12-417199-2.00003-3.
Sueur, J. et al. (2008) ‘Rapid acoustic survey for biodiversity
appraisal’, PLoS ONE , 3(12). doi: 10.1371/journal.pone.0004065.
Sueur, J. et al. (2014) ‘Acoustic indices for biodiversity
assessment and landscape investigation’, Acta Acustica united with
Acustica , 100(4), pp. 772–781. doi: 10.3813/AAA.918757.
Sueur, J., Aubin, T. and Simonis, C. (2008) ‘Equipment review: Seewave,
a free modular tool for sound analysis and synthesis’,
Bioacoustics, 18(2), pp. 213–226. doi: 10.1080/09524622.2008.9753600.
Sueur, J., Krause, B. and Farina, A. (2019) ‘Climate Change Is Breaking
Earth’s Beat’, Trends in Ecology & Evolution , 34(11), pp.
971–973. doi: 10.1016/j.tree.2019.07.014.
Sugai, L. S. M. et al. (2019) ‘A roadmap for survey designs in
terrestrial acoustic monitoring’, Remote Sensing in Ecology and
Conservation , pp. 1–16. doi: 10.1002/rse2.131.
Towsey, M. et al. (2014) ‘The use of acoustic indices to
determine avian species richness in audio-recordings of the
environment’, Ecological Informatics . Elsevier B.V., 21(100), pp.
110–119. doi: 10.1016/j.ecoinf.2013.11.007.
Towsey, M. (2018) ‘The calculation of acoustic indices derived from
long-duration recordings of the natural environment’, 0(August 2017),
pp. 0–12.
Towsey, M. W., Truskinger, A. M. and Roe, P. (2016) ‘The Navigation and
Visualisation of Environmental Audio Using Zooming Spectrograms’,
Proceedings - 15th IEEE International Conference on Data Mining
Workshop, ICDMW 2015, pp. 788–797. doi: 10.1109/ICDMW.2015.118.
Vesna, I. (2009) ‘Understanding Bland Altman Analysis’, Biochemia
Medica , 19(1), pp. 10–16. doi: 10.11613/BM.2013.003.
Villanueva-Rivera, L. J. et al. (2011) ‘A primer of acoustic
analysis for landscape ecologists’, Landscape Ecology , 26(9), pp.
1233–1246. doi: 10.1007/s10980-011-9636-9.
Villanueva-Rivera, L. J. and Pijanowski, B. C. (2016) ‘Package
“soundecology”’, CRAN, package version 1.3.3, p. 14. Available at:
http://ljvillanueva.github.io/soundecology/.
Zhang, L. et al. (2016) ‘Classifying and ranking audio clips to
support bird species richness surveys’, Ecological Informatics .
Elsevier B.V., 34, pp. 108–116. doi: 10.1016/j.ecoinf.2016.05.005.