Introduction
Bioacoustics – the study of sound production, dispersion and reception in animals – has been practised for millennia. Aristotle described communication between animals, including in underwater systems, in great anatomical and behavioural detail (Aristotle, 1910; Linke et al., 2020). Bioacoustics can be used to study animal ecology – for example, reproductive behaviour and success (Teixeira et al., 2019) – to monitor population dynamics of native or invasive species (Brodie et al., 2020b), and to detect rare and endangered soniferous animals (Dema et al., 2020; Dutilleux and Curé, 2020; Znidersic et al., 2020). The sister discipline, ecoacoustics, is a newer field that is not restricted to biotic organisms but – as ecology is to biology – investigates biodiversity and its relation to habitats, as well as populations and ecological communities (Sueur and Farina, 2015).
Ecoacoustics has been used to quantify ecological responses to environmental restoration or improvement in condition, providing a rapid and continuous monitoring framework that can detect both degradation and restoration success (Greenhalgh et al., 2021; Linke and Deretic, 2020; Znidersic and Watson, 2022). Often, acoustic indices are used in assessments. These indices are analogous to measurements of diversity or richness in classical ecology – they summarise the acoustic properties of an overall soundscape, for example its spatial, temporal or combined complexity, its overall volume or the relation between natural and human-influenced frequency bands (Sueur et al., 2014). However, given inherent variations in soundscapes between places, ecoacoustic indices must be calibrated by ecosystem. While some authors have described clear variation along landscape gradients (Ng et al., 2018), others have found little relation of acoustic indices to human disturbance (Mitchell et al., 2020). Other studies have found that acoustic indices can be dominated by single acoustic events, for example river flows (Linke and Deretic, 2020) or single species that dominate the soundscape, such as snapping shrimp (Bohnenstiehl et al., 2018).
To examine restoration of wetlands in Australia’s most highly regulated river system, the Murray-Darling, Linke and Deretic (2020) used both manual annotation and ecoacoustic indices to track recovery of amphibian and waterbird populations. The Murray-Darling system currently flows at only ~40% of its natural capacity, with the bulk of the extracted water used for irrigated agriculture (Grafton et al., 2014). Under a federal government initiative – the “Murray-Darling Basin Plan” – water is being returned to rivers and wetlands via water buybacks from irrigators; however, quantifying ecological recovery over the long term is difficult (King et al., 2015; Souchon et al., 2008). Linke and Deretic (2020) pioneered the use of ecoacoustic analysis as a tool to continuously monitor populations after restorative water returns to wetlands. By manually listening to recordings of frog and bird calls, they found highly significant responses in the richness of water-dependent biota to environmental watering. However, the response of acoustic indices was much weaker and in some cases non-significant: it was partially obscured by ambient noise and subject to high diurnal variation. This led the authors to conclude that a logical next step was to trial multi-species call recognisers that would combine the advantage of species specificity with the automated processing of acoustic indices (Linke and Deretic, 2020).
Call recognisers are usually built to detect a single species, since bioacoustics is often used to detect cryptic or rare animals. However, as the application of acoustics to environmental monitoring increases, multi-species recognisers are likely to become more important. Multi-species recognisers detect sympatric species simultaneously (Wright et al., 2020; Zhong et al., 2020), and outputs can be analysed for each species separately or for all species combined. This is useful where groups of species (e.g. mixed-species frog choruses) represent environmental change or other ecological values. Like single-species recognisers, multi-species recognisers can use acoustic indices to detect soundscapes in which target species are likely to occur (Brodie et al., 2020a), or they can implement several single-species algorithms to detect discrete calls (Ruff et al., 2020). There are many challenges to creating reliable multi-species recognisers; however, methods for reducing the increased risk of false detections are beginning to be examined (Campos et al., 2019; Wright et al., 2020).
Performance metrics used to evaluate and report on call recogniser performance are highly variable in the literature (Knight et al., 2017). Terminology is inconsistent and studies may report only a small number of the possible performance metrics. This makes comparisons and repeatability difficult. Perhaps more importantly, there are major inconsistencies in the type and amount of training data used and in the test datasets upon which recognisers are evaluated. While strictly standardised methods are unlikely to be feasible (e.g. for rare species, datasets can be extremely difficult to acquire), studies should, as a minimum, report the representativeness of the training data, how these were chosen or tested, and any limitations or assumptions. Decisions relating to, for example, geographical coverage may have important consequences for recogniser performance and transferability among regions. Moreover, the extent to which training and test data include real-world ambient noise should be explained, because factors like wind, noise and other species’ calls can significantly impact false detections (Brandes, 2008; Cragg et al., 2015; Crump and Houlahan, 2017; Kahl et al., 2021; Knight et al., 2017; Priyadarshani et al., 2018; Salamon et al., 2016; Towsey et al., 2012). To standardise the reporting of performance metrics, Knight et al. (2017) recommended that all studies report precision, recall, F-score and the area under the precision-recall curve (AUC) or, for comparison with the broader classifier literature, the receiver operating characteristic (ROC) curve.
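As a minimal sketch of how three of these metrics relate to one another, the Python example below computes precision, recall and F-score from counts of true positives, false positives and false negatives. The class name, function names and example counts are hypothetical illustrations using the standard definitions; they are not taken from Knight et al. (2017) or from any particular recogniser package.

```python
from dataclasses import dataclass


@dataclass
class DetectionCounts:
    """Hypothetical tally of recogniser output against annotated calls."""
    true_positives: int   # detections that match an annotated target call
    false_positives: int  # detections with no matching target call
    false_negatives: int  # annotated target calls the recogniser missed


def precision(c: DetectionCounts) -> float:
    """Proportion of detections that are real target calls."""
    detected = c.true_positives + c.false_positives
    return c.true_positives / detected if detected else 0.0


def recall(c: DetectionCounts) -> float:
    """Proportion of annotated target calls that were detected."""
    present = c.true_positives + c.false_negatives
    return c.true_positives / present if present else 0.0


def f_score(c: DetectionCounts, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (beta = 1 gives F1)."""
    p, r = precision(c), recall(c)
    denominator = beta ** 2 * p + r
    return (1 + beta ** 2) * p * r / denominator if denominator else 0.0


# Made-up counts for illustration only.
counts = DetectionCounts(true_positives=80, false_positives=20, false_negatives=40)
print(precision(counts), recall(counts), f_score(counts))
```

Reporting all three together matters because a recogniser can score high on precision simply by detecting very few calls, which recall and the F-score would expose.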
Using a template-matching algorithm (binary point matching; Towsey et al., 2012) in the R package monitor (Katz et al., 2016b), we aimed to establish a free and open-source protocol to optimise multi-species call recogniser construction and evaluation using three levers: template selection, amplitude cut-off and score cut-off.
As a case study, we tested this protocol on the calls of eight sympatric frog species from the Koondrook-Perricoota wetland complex in the Murray-Darling Basin.
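To illustrate the third lever in general terms, the sketch below sweeps a score cut-off over a set of scored detections and keeps the cut-off with the highest F1. It is a generic Python illustration under stated assumptions, not the protocol or the monitor package API: the function names, the scores, the true/false labels and the count of calls present are all made up for the example.

```python
def sweep_score_cutoff(scores, is_true_call, n_calls_present, cutoffs):
    """Return (cutoff, precision, recall, F1) for each candidate cut-off.

    scores          -- template-match score of each detection (hypothetical)
    is_true_call    -- True where a detection matches an annotated call
    n_calls_present -- number of annotated target calls in the test data
    """
    rows = []
    for cutoff in cutoffs:
        # Keep only detections scoring at or above the cut-off.
        kept = [truth for s, truth in zip(scores, is_true_call) if s >= cutoff]
        tp = sum(kept)
        p = tp / len(kept) if kept else 0.0
        r = tp / n_calls_present if n_calls_present else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        rows.append((cutoff, p, r, f1))
    return rows


def best_cutoff(rows):
    """Pick the row with the highest F1 (the first one if there is a tie)."""
    return max(rows, key=lambda row: row[3])


# Made-up detections for a single species, for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
is_true_call = [True, True, False, True, False, False]
rows = sweep_score_cutoff(scores, is_true_call, n_calls_present=4,
                          cutoffs=[0.4, 0.6, 0.8])
best = best_cutoff(rows)
print(best)
```

Raising the cut-off removes low-scoring false detections at the cost of missed calls, so the chosen value reflects a trade-off between precision and recall. In a multi-species setting, a sweep like this could be run separately for each species.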