Keywords
artificial intelligence, connectivity, deep learning, dispersal, machine learning, object tracking, underwater video

1. Introduction

Computer vision (CV), the research field that explores the use of computer algorithms to automate the interpretation of digital images or videos, is revolutionising data collection in science (Waldchen & Mader, 2018; Beyan & Browman, 2020). The use of remote camera imagery, such as underwater stations, camera traps and stereography, has driven the uptake of CV because CV can process and analyse imagery quickly and accurately (Bicknell et al., 2016; Schneider et al., 2019). In ecological studies, advances in CV have led to an increase in sampling accuracy and repeatability (Waldchen & Mader, 2018). For example, drones are being used to track grassland animals (van Gemert et al., 2015) and estimate tree defoliation (Kälin et al., 2019), underwater observatories with CV are monitoring deep sea ecosystems (Aguzzi et al., 2019), and CV-capable dive scooters are being used to monitor coral reefs at large spatial and temporal scales (González-Rivero et al., 2020; Kennedy et al., 2020).
In the past few years, we have seen an increase in the uptake of CV to study and monitor marine ecosystems. These applications are related to the two main CV tasks – object detection (OD) and object tracking (OT). OD and OT automate the task of gathering information about the type, location and movement of objects of interest. OD has received the most attention as OD models can count and identify species of interest in underwater video footage (Christin, Hervet & Lecomte, 2019). For example, OD models have been applied to detect seals (Salberg, 2015), identify whale hotspots (Guirado et al., 2019), monitor fish populations (Xiu et al., 2015; Salman et al., 2016; Villon et al., 2016; Marini et al., 2018; Villon et al., 2018; Ditria et al., 2020b; Jalal et al., 2020; Villon et al., 2020) and quantify floating debris on the ocean surface (Watanabe, Shao & Miura, 2019). By comparison, the application of OT is less advanced in marine ecosystems. Previous work has shown that OT models can successfully track on-surface objects (see topios.org) and underwater objects such as fish, sea turtles, dolphins, and whales (Spampinato et al., 2008; Chuang et al., 2017; Xu & Cheng, 2017; Arvind et al., 2019; Kezebou et al., 2019). There is also evidence that automated monitoring of fish in underwater ecosystems through the combination of OD and OT is reliable and accurate (Spampinato et al., 2008; Lantsova et al., 2016; Mohamed et al., 2020). However, no studies have jointly applied OD and OT to study animal movement. OD can help advance the automatic collection of traditional presence/absence data of different species (Xiu et al., 2015; Marini et al., 2018) and OT can subsequently track multiple individuals and provide fine-scale tracking data to assess behavioural and animal movement patterns across a range of environments (Francisco, Nührenberg & Jordan, 2020).
With a single, non-invasive automated approach, two types of ecological information can be obtained, providing individual-level information on different species and enhancing our ability to quantify the environmental drivers of species abundance and behaviour.
The combination of OD and OT is particularly suited to the subfield of marine animal movement because these tasks can provide the volume of data required to quantify the movement of numerous individuals (Lopez-Marcano et al., 2020). In marine environments, animal movement shapes predator-prey dynamics, nutrient dynamics and trophic functions (Olds et al., 2018). For example, the movement of herbivorous fish between seagrass and coral reefs helps maintain resilience by balancing fish abundances with algal growth rates that vary spatio-temporally (Pagès et al., 2014). Knowledge of animal movement is fundamental to many research objectives in marine science, but collecting movement data is challenging and requires substantial resources. Therefore, the development and application of emerging technologies (e.g. computer vision) can help advance our understanding of animal movement across a broad range of spatio-temporal dimensions and ecological hierarchies (e.g. individuals, populations, communities).
In this study, we aimed to test the ability of deep learning algorithms to track small-scale animal movement of many individuals in underwater videos. We developed a CV pipeline consisting of two steps, OD and OT, and we used the pipeline to quantify underwater animal movement across habitats for ecological research. We tested and applied off-the-shelf OT architectures to determine the efficacy and capacity of these emerging techniques to be used for underwater ecological applications. To demonstrate the applications of OD and OT, we deployed a 6-camera network in a known fish corridor in a coastal estuary and recorded the movement of a common fisheries species (yellowfin bream, Acanthopagrus australis). The corridor, in the Tweed River estuary, Australia, lies between a rockwall passage and a seagrass meadow. Multiple estuarine fish such as sand whiting (Sillago ciliata), river garfish (Hyporhamphus regularis), luderick (Girella tricuspidata), spotted scat (Scatophagus argus), three-bar porcupinefish (Dicotylichthys punctulatus) and yellowfin bream frequently move back and forth with the tides through this corridor. This represents a relatively challenging scenario (i.e. low visibility, with currents also carrying floating debris) in which to showcase the capacity of CV to detect the target species in a multi-species assemblage and quantify the direction of movement. Tidal movement of fish is an ideal test case for the method because how and where fish move with the tides is well known. We expected the analysis of the camera videos to detect and track bream moving through the corridor in a direction consistent with the tidal flow. For OD, we used an off-the-shelf model called Mask Regional Convolutional Neural Network (Mask R-CNN) (He et al., 2017) that has been shown to successfully and accurately detect and quantify fish in estuarine ecosystems (Ditria et al., 2020b).
We also benchmarked three OT architectures: Minimum Output Sum of Squared Errors (MOSSE) (Bolme et al., 2010), Sequential Non-Maximum Suppression (Seq-NMS) (Han et al., 2016), and Siamese Mask (SiamMask) (Wang et al., 2019). Ultimately, we demonstrate that these technologies can complement the collection and analysis of animal movement data and potentially contribute to the data-driven management of ecosystems.
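To illustrate how the output of an OT step might be turned into a direction of movement that can be compared with the tidal flow, the following is a minimal sketch and not the implementation used in this study. It assumes that each track is available as a sequence of per-frame centroid positions in pixel coordinates and that movement along the corridor corresponds to the horizontal image axis. The names Track, net_direction and agrees_with_tide, and the 20-pixel displacement threshold, are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Track:
    """One tracked individual (hypothetical structure, for illustration)."""
    track_id: int
    centroids: List[Tuple[float, float]]  # (x, y) in pixels, one per frame


def net_direction(track: Track, min_displacement: float = 20.0) -> str:
    """Classify a track as moving 'left', 'right' or 'undetermined'.

    Uses only the net horizontal displacement between the first and last
    centroid. The threshold (in pixels) is an assumed value that filters
    out fish that hover or drift without clearly crossing the field of view.
    """
    if len(track.centroids) < 2:
        return "undetermined"
    dx = track.centroids[-1][0] - track.centroids[0][0]
    if abs(dx) < min_displacement:
        return "undetermined"
    return "right" if dx > 0 else "left"


def agrees_with_tide(tracks: List[Track], tide_direction: str) -> float:
    """Proportion of classifiable tracks that move in the tide direction.

    tide_direction is 'left' or 'right' in image coordinates (an assumption
    about how the camera is oriented relative to the corridor).
    """
    directions = [net_direction(t) for t in tracks]
    classified = [d for d in directions if d != "undetermined"]
    if not classified:
        return 0.0
    return sum(d == tide_direction for d in classified) / len(classified)


# Usage example with made-up coordinates.
example_tracks = [
    Track(1, [(10.0, 50.0), (40.0, 52.0), (90.0, 55.0)]),     # moves right
    Track(2, [(200.0, 30.0), (150.0, 31.0), (100.0, 29.0)]),  # moves left
    Track(3, [(60.0, 80.0), (65.0, 82.0)]),                   # too short a move
    Track(4, [(5.0, 10.0), (70.0, 12.0)]),                    # moves right
]
proportion = agrees_with_tide(example_tracks, "right")
```

In this sketch, tracks with a small net displacement are excluded from the proportion rather than counted as disagreeing with the tide; whether that choice is appropriate would depend on the ecological question.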

2. Methods