Table 3. ANOVA table for the model terms in the beta regression model of the accuracy data (significance: *** p < 0.001, ** p < 0.01, * p < 0.05). Equivalent tables for precision and recall are given in Supplementary 9C.
Discussion
Ecoacoustics is a new and rapidly expanding field of ecology with great power to describe ecological systems (e.g. Sethi et al., 2020), but methodological choices whose impacts on ecoacoustic analysis are poorly known have proliferated. We show that the choice of audio index is key, and we confirm (Sethi et al., 2020) that a multi-dimensional generalist classifier outperforms more traditional Analytical Indices regardless of the level of audio compression or the recording schedule.
Analytical Indices have been constrained to a limited set of features within soundscapes, leading to strong non-independence. For example, the ADI, AEve and H indices are all summaries of the evenness of frequency band occupancy (Sueur, Aubin and Simonis, 2008; Villanueva-Rivera et al., 2011). This non-independence can further decrease the dimensionality of suites of Analytical Indices, which are already typically small. Here, we used just the mean values of Analytical Indices, but other studies have incorporated both the mean and standard deviation (Bradfer‐Lawrence et al., 2019), which provides further dimensionality. Although the AudioSet Fingerprint clearly benefits from a large number of relatively uncorrelated acoustic features, most Analytical Indices have the advantage of being designed to capture ecologically relevant aspects of the soundscape.
Compression affected the quantification of all indices (Fig. 3) and, although the qualitative patterns are noisy, the groupings seen may reflect the underlying algorithms. The apparent threshold for the AudioSet Fingerprint at CBR16 may be due to the obligatory loss in audio quality before samples pass to the AudioSet CNN. The audio is downsampled to 16 kHz and then presented as a mel-scaled spectrogram, which increases sensitivity in frequency ranges relevant to human hearing, akin to those frequencies favoured in commercial compression. Coupled with its variable-quality training set (YouTube videos), these factors may predispose the AudioSet Fingerprint to perform as well with high-quality audio as with intermediate- and low-quality MP3s.
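The preprocessing described above can be sketched in a few lines. The function below is a simplified illustration, not the exact VGGish/AudioSet pipeline: the frame, hop and mel-band parameters are assumptions chosen for readability, but it shows why the representation concentrates resolution at lower, human-audible frequencies.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=160, n_mels=64):
    """Log-mel spectrogram of audio already downsampled to 16 kHz,
    broadly following the kind of input the AudioSet CNN expects."""
    # Short-time Fourier transform power spectrum
    window = np.hanning(n_fft)
    frames = [signal[i:i + n_fft] * window
              for i in range(0, len(signal) - n_fft + 1, hop)]
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2

    # Triangular mel filterbank: filters are equally spaced on the mel
    # scale, so frequency resolution is concentrated at lower bands,
    # akin to the perceptual weighting used by commercial compression
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    hz_pts = mel_to_hz(mel_pts)
    bin_f = np.fft.rfftfreq(n_fft, 1.0 / sr)
    fbank = np.zeros((n_mels, len(bin_f)))
    for m in range(n_mels):
        lo, ctr, hi = hz_pts[m], hz_pts[m + 1], hz_pts[m + 2]
        up = (bin_f - lo) / (ctr - lo)
        down = (hi - bin_f) / (hi - ctr)
        fbank[m] = np.maximum(0.0, np.minimum(up, down))

    return np.log(spec @ fbank.T + 1e-6)
```

Because the filterbank already discards fine detail at high frequencies, moderate MP3 compression removes information the representation was largely ignoring anyway, consistent with the fingerprint's robustness down to CBR16.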
The M and NDSI indices were also largely unaffected by compression until the frequency range was reduced. When MP3 audio is compressed below 32 kb/s, the encoding swaps from MPEG-1 Audio Layer III (which supports a maximum frequency of 16–24 kHz) to MPEG-2 Audio Layer III (maximum: 8–12 kHz), and this change in format removes signals beyond the cut-off frequency. A further reduction is seen at CBR8, when the encoding changes again to MPEG-2.5 Audio Layer III (maximum: 4–6 kHz). The M index is explicitly a measure of amplitude (Sueur et al., 2014) and is largely unaffected until downsampling reduces amplitude. Similarly, NDSI measures the proportion of sound in biophonic vs. anthropophonic frequency bands: as downsampling progressively eliminates sounds within the frequency range (2–11 kHz) containing most biophony, NDSI is known to increase (Kasten et al., 2012).
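NDSI's band dependence can be made concrete with a minimal sketch. The band limits below are assumptions taken from the ranges discussed in the text (anthropophony 1–2 kHz, biophony 2–11 kHz); published implementations vary in their defaults.

```python
import numpy as np

def ndsi(signal, sr, anthro=(1000, 2000), bio=(2000, 11000)):
    """Normalised Difference Soundscape Index: (B - A) / (B + A),
    where B and A are the spectral power in the biophony and
    anthropophony bands respectively (band limits illustrative)."""
    spec = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), 1.0 / sr)
    a = spec[(freqs >= anthro[0]) & (freqs < anthro[1])].sum()
    # cap the biophony band at Nyquist: downsampling shrinks B directly
    b = spec[(freqs >= bio[0]) & (freqs < min(bio[1], sr / 2))].sum()
    return (b - a) / (b + a)
```

The Nyquist cap makes the mechanism explicit: once downsampling pushes the usable bandwidth below 11 kHz, the biophony integral B is truncated while A is untouched, so the index value shifts purely as an artefact of encoding.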
AEve and H, both of which describe the spread and evenness of amplitude over the full range of frequencies, showed a gradual increase in D that reversed when the maximum encoded frequency was reduced. The two measures differ in measuring dominance (AEve: Villanueva-Rivera et al., 2011) and evenness (H: Sueur et al., 2014) across bands but may share a common explanation. In both cases, compression preferentially removes amplitude from some bands, initially decreasing evenness, but downsampling removes bands entirely, possibly restoring a more even distribution.
ACI and Bio both share a dependence on high-frequency or quieter sounds and were generally the most severely affected by compression. ACI measures frequency-band-dependent changes in amplitude over time (Pieretti, Farina and Morri, 2011) and is reduced when there is minimal variation between time steps. Loss of “masked” sounds under low compression, and then of 16–24 kHz sound under CBR16, may reflect the loss of ecoacoustic temporal variation: this band includes the calling range of many invertebrates, birds, mammals and amphibians (Browning et al., 2017). The Bio index similarly quantifies the spread of frequencies in the range 2–11 kHz, all relative to the quietest 1 kHz band (Boelman et al., 2007): the loss of quiet frequency bands therefore makes it uniquely sensitive to compression. Despite both of these indices incurring alterations 200% larger than the uncompressed range, the Analytical Indices classifier accuracy was still robust to compression, perhaps suggesting these indices are less important for classification than the others. Bradfer‐Lawrence et al. (2019) have already shown that the Bio index contributes little additional power to classification tasks, but found that ACI was the strongest individual contributor. Our findings suggest this ranking may not be consistent across different levels of compression.
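ACI's sensitivity to between-frame variation can be seen directly in its basic form. The sketch below follows the core definition of Pieretti, Farina and Morri (2011) with assumed frame settings; published implementations add temporal sub-blocks and other refinements omitted here.

```python
import numpy as np

def aci(signal, n_fft=512):
    """Acoustic Complexity Index, basic form: for each frequency bin,
    the summed absolute change in amplitude between successive frames,
    normalised by that bin's total amplitude, then summed over bins."""
    window = np.hanning(n_fft)
    frames = [signal[i:i + n_fft] * window
              for i in range(0, len(signal) - n_fft + 1, n_fft)]
    spec = np.abs(np.fft.rfft(frames, axis=1))        # time x frequency
    d = np.abs(np.diff(spec, axis=0)).sum(axis=0)     # variation per bin
    tot = spec.sum(axis=0)
    return float((d / np.where(tot > 0, tot, 1.0)).sum())
```

Because every bin is normalised by its own total amplitude, a quiet high-frequency bin carrying a transient call contributes as much as a loud one; removing those bins through compression therefore removes exactly the variation the index is built to detect.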
Our findings reflect those of an earlier study that explored the effect of MP3 compression (VBR0 and CBR128) on indices describing specific bird calls (Araya-Salas, Smith-Vidaurre and Webster, 2019). They found that compression did not cause a systematic deviation in all indices; rather, indices designed to capture extreme frequencies were less precise after compression, particularly with VBR-encoded files. While some of these principles are present in our findings, the use of a wider range of compressions has allowed us to develop a more complete description of the action of compression on soundscape indices.
We found that even the highest rate of compression caused a comparatively small reduction in the overall accuracy of the classification task (5.8% and 3% for the Analytical Indices and the AudioSet Fingerprint respectively; 5-minute, whole-day setting). In both cases, the reduction in accuracy was explained by a higher degree of overlap between primary and logged forest. When audio is compressed, the whole signal is altered, but higher frequencies and quieter sounds are more severely altered and reduced than others. Higher and quieter frequencies (akin to specific animal vocalisations) may therefore be more important for separating logged from primary forest, but less so for discerning cleared forest from the other forest types (which may depend more on overall level). These proportionally small differences, while somewhat reassuring, should be treated with caution given the large differences among our three habitat classes. Accuracy may not have been conserved so well in areas of more closely related forest.
Both the Analytical Indices and the AudioSet Fingerprint showed similar changes in variance as a result of recording length. Transient vocalisers are therefore likely to be somewhat important in the determination of the AudioSet Fingerprint, and of mixed importance across the Analytical Indices. The ACI index was not impacted by recording length despite specifically quantifying how the soundscape changes over time (Pieretti, Farina and Morri, 2011). ADI, AEve and H all incurred an alteration in variance as recording length changed; interestingly, these indices do not consider any temporal value but rather just the spread of frequency (Sueur et al., 2008; Villanueva-Rivera et al., 2011), indicating that transient calls, akin to short-term anomalies in frequency, are perhaps lost when recording windows are altered.
Finally, we found that subsetting audio data temporally and analysing the subsets separately had an unpredictable effect on classification accuracy, with the AudioSet Fingerprint classifier staying consistently high while the Analytical Indices classifier returned accuracies anywhere between 20% and 100%. Temporal subsetting can reduce the impact of diel variation on analyses but poses a trade-off, as it reduces the amount of data used to train the classifier. Bradfer‐Lawrence et al. (2019) recommend > 120 h of recordings for Analytical Indices to stabilise, yet in our study we had just 70–75 h of recordings per site. Overall, we found that compression, frame size and temporal subsetting caused a small decrease in classifier accuracy, with the largest overall contributor being the choice of the AudioSet Fingerprint over Analytical Indices. The AudioSet Fingerprint classifier, temporally sectioned and trained on just 2 hours of data, was able, on average, to outperform the Analytical Indices classifier trained on the full 24 h.
Recommendations and Conclusion
Based on the results of this study we provide the following four recommendations:
  1. When classifying soundscapes, use AudioSet Fingerprinting rather than Analytical Indices.
  2. Lossless compression is always desirable, but if data storage or transmission becomes a bottleneck to a study, we advise using the VBR (quality = 0) MP3 encoder if using Analytical Indices, which will reduce the file size to roughly 23% of the original while having minimal impact on indices (other than ACI). The AudioSet Fingerprint, however, is more robust to compression and can tolerate compression down to CBR64 (8% of the original file size) without significant effect.
  3. If further compression is a necessity, use indices which describe the general energy of the system rather than those which depend on high-frequency or quieter sounds.
  4. Temporal subsetting may be a useful alternative for capturing soundscape descriptors with AudioSet Fingerprinting when data storage costs are a bottleneck. However, temporal subsetting should be used with caution when using Analytical Indices.
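As one concrete way to apply recommendation 2, the two suggested encodings can be produced with ffmpeg's LAME wrapper. The helper below only builds the command line; the `-q:a 0` (VBR quality 0) and `-b:a 64k` (CBR64) flags are standard ffmpeg/libmp3lame options, while the function name and file paths are illustrative.

```python
def mp3_encode_cmd(infile, outfile, mode="vbr0"):
    """Build an ffmpeg command for the two encodings recommended above:
    'vbr0'  -> VBR quality 0 (suitable for Analytical Indices pipelines)
    'cbr64' -> 64 kb/s constant bitrate (tolerable for the
               AudioSet Fingerprint)."""
    quality = {"vbr0": ["-q:a", "0"], "cbr64": ["-b:a", "64k"]}[mode]
    return ["ffmpeg", "-i", infile, "-codec:a", "libmp3lame",
            *quality, outfile]
```

A list such as this can be passed to `subprocess.run` for batch re-encoding of field recordings before index extraction.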
There exists a trade-off between the quality and volume of data that can be stored in ecoacoustics. We have investigated the impact of compression along a gradient of habitat disturbance, providing evidence that compressed audio can be used without severely affecting acoustic index values. The ability to use compression may reduce experimental costs, remove bottlenecks in study design, and help remote ecoacoustic recorders reach true autonomy. Moreover, by providing a quantified description of how individual indices, and more broadly grouped index categories, respond to compression, we have enabled comparisons to be drawn between studies of compressed and non-compressed audio. Increasing comparability of studies will become progressively important as global ecoacoustic databases and recording sites grow and open up novel opportunities to explore datasets across huge temporal and geographic scales. Such a task can now be cautiously approached using meta-analysis of non-uniform acoustic data, while a simultaneous trajectory towards more standardised practices will enable more rigorous analyses in the future.
Acknowledgements
We firstly thank Dr Henry Bernard at the Stability of Altered Forest Ecosystems project in Malaysian Borneo for permitting us to research within their field sites. This project was funded by the Natural Environment Research Council, UK, within the Quantitative Methods in Ecology and Evolution (QMEE) Centre for Doctoral Training.
Author’s Contributions
B.E.H., L.P., S.S.S., R.M.E., and C.D.L.O. contributed to the conceptualization and implementation of the study. B.E.H. and R.M.E. led fieldwork and data collection. B.E.H., S.S.S., L.P., and C.D.L.O. designed and ran the index extraction pipeline and data analysis. B.E.H. and C.D.L.O. developed the statistics and figures for the main text and supplementary material. B.E.H. led the manuscript writing process, aided by revisions provided by all authors.
Data Accessibility
Acoustic Data: Will be made available on Zenodo and accessible via permanent DOI.
Analytical Indices/AudioSet Fingerprint Data: Will be available on the SAFE project website, and accessible via permanent DOI.
Analysis Scripts: Available on Github at https://github.com/BeckyHeath/Experimental-Variation-Ecoacoustics-Analysis-Scripts (made public after publication)
References
Araya-Salas, M., Smith-Vidaurre, G. and Webster, M. (2019) ‘Assessing the effect of sound file compression and background noise on measures of acoustic signal structure’, Bioacoustics. Taylor & Francis, 28(1), pp. 57–73. doi: 10.1080/09524622.2017.1396498.
Boelman, N. T. et al. (2007) ‘Multi-trophic invasion resistance in Hawaii: Bioacoustics, field surveys, and airborne remote sensing’, Ecological Applications, 17(8), pp. 2137–2144. doi: 10.1890/07-0004.1.
Bohnenstiehl, D. R. et al. (2018) ‘Investigating the utility of ecoacoustic metrics in marine soundscapes’, Journal of Ecoacoustics. doi: 10.22261/jea.r1156l.
Bradfer‐Lawrence, T. et al. (2019) ‘Guidelines for the use of acoustic indices in environmental research’, Methods in Ecology and Evolution, 10(10), pp. 1796–1807. doi: 10.1111/2041-210x.13254.
Browning, E. et al. (2017) ‘Passive acoustic monitoring in ecology and conservation’, WWF Conservation Technology Series 1, 2(October), pp. 1–75. doi: 10.13140/RG.2.2.18158.46409.
Buxton, R. T. et al. (2018) ‘Efficacy of extracting indices from large-scale acoustic recordings to monitor biodiversity’, Conservation Biology, 32(5), pp. 1174–1184. doi: 10.1111/cobi.13119.
Costello, M. J. et al. (2016) ‘Field work ethics in biological research’, Biological Conservation. Elsevier Ltd, 203, pp. 268–271. doi: 10.1016/j.biocon.2016.10.008.
Cribari-Neto, F. and Zeileis, A. (2010) ‘Beta regression in R’, Journal of Statistical Software, 34(2), pp. 1–24. doi: 10.18637/jss.v034.i02.
Douma, J. C. and Weedon, J. T. (2019) ‘Analysing continuous proportions in ecology and evolution: A practical introduction to beta and Dirichlet regression’, Methods in Ecology and Evolution, 10(9), pp. 1412–1430. doi: 10.1111/2041-210X.13234.
Eldridge, A. et al. (2018) ‘Sounding out ecoacoustic metrics: Avian species richness is predicted by acoustic indices in temperate but not tropical habitats’, Ecological Indicators. Elsevier B.V., 95, pp. 939–952. doi: 10.1016/j.ecolind.2018.06.012.
Ewers, R. M. et al. (2011) ‘A large-scale forest fragmentation experiment: The stability of altered forest ecosystems project’, Philosophical Transactions of the Royal Society B: Biological Sciences, 366(1582), pp. 3292–3302. doi: 10.1098/rstb.2011.0049.
Fitzpatrick, M. C. et al. (2009) ‘Observer bias and the detection of low-density populations’, Ecological Applications, 19(7), pp. 1673–1679.
Fuller, S. et al. (2015) ‘Connecting soundscape to landscape: Which acoustic index best describes landscape configuration?’, Ecological Indicators. Elsevier Ltd, 58, pp. 207–215. doi: 10.1016/j.ecolind.2015.05.057.
Gemmeke, J. F. et al. (2017) ‘Audio Set: An ontology and human-labeled dataset for audio events’, ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, pp. 776–780. doi: 10.1109/ICASSP.2017.7952261.
Gómez, W. E., Isaza, C. V. and Daza, J. M. (2018) ‘Identifying disturbed habitats: A new method from acoustic indices’, Ecological Informatics, 45. doi: 10.1016/j.ecoinf.2018.03.001.
Hershey, S. et al. (2017) ‘CNN architectures for large-scale audio classification’, ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, pp. 131–135. doi: 10.1109/ICASSP.2017.7952132.
Huff, M. H. et al. (2000) ‘A habitat-based point-count protocol for terrestrial birds, emphasizing Washington and Oregon’, General Technical Reports of the US Department of Agriculture, Forest Service, (PNW-GTR-501), pp. 2–30.
Kasten, E. P. et al. (2012) ‘The remote environmental assessment laboratory’s acoustic library: An archive for studying soundscape ecology’, Ecological Informatics. Elsevier B.V., 12, pp. 50–67. doi: 10.1016/j.ecoinf.2012.08.001.
Linke, S. and Deretic, J. A. (2020) ‘Ecoacoustics can detect ecosystem responses to environmental water allocations’, Freshwater Biology, 65(1), pp. 133–141. doi: 10.1111/fwb.13249.
Mammides, C. et al. (2017) ‘Do acoustic indices correlate with bird diversity? Insights from two biodiverse regions in Yunnan Province, south China’, Ecological Indicators. Elsevier, 82(March), pp. 470–477. doi: 10.1016/j.ecolind.2017.07.017.
Medina-García, A., Araya-Salas, M. and Wright, T. F. (2015) ‘Does vocal learning accelerate acoustic diversification? Evolution of contact calls in Neotropical parrots’, Journal of Evolutionary Biology, 28(10), pp. 1782–1792. doi: 10.1111/jeb.12694.
Pfeifer, M. et al. (2016) ‘Mapping the structure of Borneo’s tropical forests across a degradation gradient’, Remote Sensing of Environment, (176), pp. 84–97.
Pieretti, N., Farina, A. and Morri, D. (2011) ‘A new methodology to infer the singing activity of an avian community: The Acoustic Complexity Index (ACI)’, Ecological Indicators. Elsevier Ltd, 11(3), pp. 868–873. doi: 10.1016/j.ecolind.2010.11.005.
Rapport, D. J. (1989) ‘What constitutes ecosystem health?’, Perspectives in Biology and Medicine, 33(1), pp. 120–132. doi: 10.1353/pbm.1990.0004.
Rapport, D. J., Costanza, R. and McMichael, A. J. (1998) ‘Assessing ecosystem health’, Trends in Ecology and Evolution, 13(10), pp. 397–402. doi: 10.1016/S0169-5347(98)01449-9.
Rempel, R. S. et al. (2005) ‘Bioacoustic monitoring of forest songbirds: interpreter variability and effects of configuration and digital processing methods in the laboratory’, Journal of Field Ornithology, 76(1), pp. 1–11. doi: 10.1648/0273-8570-76.1.1.
Roca, I. T. and Proulx, R. (2016) ‘Acoustic assessment of species richness and assembly rules in ensiferan communities from temperate ecosystems’, Ecology, 97(1), pp. 116–123. doi: 10.1890/15-0290.1.
Saito, K. et al. (2015) ‘Utilizing the Cyberforest live sound system with social media to remotely conduct woodland bird censuses in Central Japan’, Ambio. Springer Netherlands, 44(4), pp. 572–583. doi: 10.1007/s13280-015-0708-y.
Sethi, S. S. et al. (2018) ‘Robust, real-time and autonomous monitoring of ecosystems with an open, low-cost, networked device’, Methods in Ecology and Evolution, 2018(December 2017), pp. 1–5. doi: 10.1111/2041-210X.13089.
Sethi, S. S. et al. (2020) ‘Characterizing soundscapes across diverse ecosystems using a universal acoustic feature set’, Proceedings of the National Academy of Sciences, (24), pp. 1–7. doi: 10.1073/pnas.2004702117.
Struebig, M. J. et al. (2013) Quantifying the Biodiversity Value of Repeatedly Logged Rainforests: Gradient and Comparative Approaches from Borneo. 1st edn, Advances in Ecological Research. Elsevier Ltd. doi: 10.1016/B978-0-12-417199-2.00003-3.
Sueur, J. et al. (2008) ‘Rapid acoustic survey for biodiversity appraisal’, PLoS ONE, 3(12). doi: 10.1371/journal.pone.0004065.
Sueur, J. et al. (2014) ‘Acoustic indices for biodiversity assessment and landscape investigation’, Acta Acustica united with Acustica, 100(4), pp. 772–781. doi: 10.3813/AAA.918757.
Sueur, J., Aubin, T. and Simonis, C. (2008) ‘Equipment review: Seewave, a free modular tool for sound analysis and synthesis’, Bioacoustics, 18(2), pp. 213–226. doi: 10.1080/09524622.2008.9753600.
Sueur, J., Krause, B. and Farina, A. (2019) ‘Climate Change Is Breaking Earth’s Beat’, Trends in Ecology & Evolution, 34(11), pp. 971–973. doi: 10.1016/j.tree.2019.07.014.
Sugai, L. S. M. et al. (2019) ‘A roadmap for survey designs in terrestrial acoustic monitoring’, Remote Sensing in Ecology and Conservation, pp. 1–16. doi: 10.1002/rse2.131.
Towsey, M. et al. (2014) ‘The use of acoustic indices to determine avian species richness in audio-recordings of the environment’, Ecological Informatics. Elsevier B.V., 21(100), pp. 110–119. doi: 10.1016/j.ecoinf.2013.11.007.
Towsey, M. (2018) ‘The calculation of acoustic indices derived from long-duration recordings of the natural environment’, 0(August 2017), pp. 0–12.
Towsey, M. W., Truskinger, A. M. and Roe, P. (2016) ‘The Navigation and Visualisation of Environmental Audio Using Zooming Spectrograms’, Proceedings - 15th IEEE International Conference on Data Mining Workshop, ICDMW 2015, pp. 788–797. doi: 10.1109/ICDMW.2015.118.
Vesna, I. (2009) ‘Understanding Bland Altman Analysis’, Biochemia Medica, 19(1), pp. 10–16. doi: 10.11613/BM.2013.003.
Villanueva-Rivera, L. J. et al. (2011) ‘A primer of acoustic analysis for landscape ecologists’, Landscape Ecology, 26(9), pp. 1233–1246. doi: 10.1007/s10980-011-9636-9.
Villanueva-Rivera, L. J. and Pijanowski, B. C. (2016) ‘Package “soundecology”’, CRAN, (Package version 1.3.3), p. 14. Available at: http://ljvillanueva.github.io/soundecology/.
Zhang, L. et al. (2016) ‘Classifying and ranking audio clips to support bird species richness surveys’, Ecological Informatics. Elsevier B.V., 34, pp. 108–116. doi: 10.1016/j.ecoinf.2016.05.005.