FIGURE 1 The study location in the Tweed River estuary, Australia, showing the camera array deployed in a fish corridor (double-headed white arrow) between the rock wall channel and the seagrass meadow (green polygon). Each set of cameras consisted of three underwater cameras that recorded for 1 hr during a flood tide. Set 1 faced north and Set 2 faced south. The distance between cameras (~3 m) and between sets (20 m) ensured non-overlapping fields of view. Map data: NearMap 2020.
2.2.1 Minimum output sum of squared error (MOSSE)
The MOSSE algorithm produces adaptive correlation filters over target objects, and tracking is performed via convolutions (i.e. the filter is applied across each new frame to locate the target). MOSSE was introduced in 2010 and is robust to changes in the lighting, scale, pose and shape of objects (Bolme et al., 2010; Sidhu, 2016). Here, we modified the MOSSE tracking process by activating the tracker with the OD output (Figure 2). The object detection model and the object tracking architecture interacted to keep the tracker consistently on individual yellowfin bream. When a fish was detected, the detection was used to initialise the tracker. MOSSE then tracked the fish for 4 frames, and on the subsequent frame a check was made to verify the accuracy of the tracker. In this check, if the detection bounding box overlapped by ≥ 30% with the existing tracker bounding box, the tracker continued on the same object. If the detection bounding box did not overlap with the existing tracker bounding box, a new tracker entry was started. This interaction between detection and tracking occurred for every yellowfin bream detected in a frame and stopped when no more detections were found.
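The verification step described above can be illustrated with a minimal sketch. It is not the code used in this study: the function names (`iou`, `is_check_frame`, `verify_tracker`) are hypothetical, the overlap is assumed to be measured as intersection over union, the 4-frame tracking run followed by a check frame is assumed to repeat for the life of the tracker, and any overlap below the threshold is assumed to start a new tracker entry.

```python
from typing import Tuple

# Bounding box as (x_min, y_min, x_max, y_max) in pixels (assumed layout).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def is_check_frame(frames_since_init: int, track_frames: int = 4) -> bool:
    """True on the frame that follows each run of `track_frames` tracked frames.

    Frame 0 is the detection that initialised the tracker (assumption).
    """
    return frames_since_init > 0 and frames_since_init % (track_frames + 1) == 0


def verify_tracker(tracker_box: Box, detection_box: Box,
                   threshold: float = 0.30) -> str:
    """Decide whether the tracker continues or a new tracker entry starts.

    Assumption: any overlap below `threshold` starts a new entry.
    """
    if iou(tracker_box, detection_box) >= threshold:
        return "continue"
    return "new_track"


if __name__ == "__main__":
    # Frames at which the tracker would be checked against a detection.
    print([f for f in range(11) if is_check_frame(f)])
    # Tracker box versus a detection shifted 5 px to the right.
    print(verify_tracker((0, 0, 10, 10), (5, 0, 15, 10)))
```

In a full pipeline, `tracker_box` would come from the MOSSE tracker update and `detection_box` from the object detection model on the same frame.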
2.2.2 Sequential non-maximum suppression (Seq-NMS)
Sequential non-maximum suppression (Seq-NMS) was originally developed in 2016 to improve the classification results and consistency of deep learning outputs (Han et al., 2016). Seq-NMS works differently to the other trackers tested because it requires an OD output for every frame in which there is a fish. Seq-NMS links detections in neighbouring frames, which means that a detection in the first frame can be connected with a detection in the second frame if their intersection is above a defined threshold. In our case, we used the principles of Seq-NMS to create detection linkages for OT of fish when the overlap (intersection over union) of bounding boxes in consecutive frames was ≥ 30%. In other words, if an object was detected in Frame 1 (Detection 1) and another object was detected in Frame 2 (Detection 2), we checked whether the bounding boxes of Detection 1 and Detection 2 overlapped by 30% or more (Figure 2). If so, the chain of detections continued. When the overlap was less than 30%, a new detection link started (i.e. the tracker treated this detection as a new fish). We determined the movement direction by calculating the distance and angle (vector) between the two coordinates at the centres of the bounding boxes of Detections 1 and 2.
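As an illustration of the linking rule and the movement vector, the following sketch chains detections across consecutive frames and computes the distance and angle between bounding box centres. It is a simplified example rather than the implementation used here: it assumes a single detection per frame, the function names (`link_detections`, `movement_vector`) are hypothetical, and the angle convention (degrees from the positive x axis in image coordinates, where y increases downwards) is an assumption.

```python
import math
from typing import List, Tuple

# Bounding box as (x_min, y_min, x_max, y_max) in pixels (assumed layout).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def link_detections(boxes: List[Box], threshold: float = 0.30) -> List[List[Box]]:
    """Group per-frame detections into chains.

    `boxes` holds one detection per consecutive frame (simplifying assumption).
    A detection joins the current chain when its IoU with the previous
    detection is >= threshold; otherwise it starts a new chain (a new fish).
    """
    chains: List[List[Box]] = []
    for box in boxes:
        if chains and iou(chains[-1][-1], box) >= threshold:
            chains[-1].append(box)
        else:
            chains.append([box])
    return chains


def centre(box: Box) -> Tuple[float, float]:
    """Centre point of a bounding box."""
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)


def movement_vector(first: Box, second: Box) -> Tuple[float, float]:
    """Distance (pixels) and angle (degrees) between two box centres."""
    (x1, y1), (x2, y2) = centre(first), centre(second)
    dx, dy = x2 - x1, y2 - y1
    return math.hypot(dx, dy), math.degrees(math.atan2(dy, dx))


if __name__ == "__main__":
    detections = [(0, 0, 10, 10), (3, 0, 13, 10), (50, 50, 60, 60)]
    for chain in link_detections(detections):
        print(len(chain), "detection(s) in chain")
    print(movement_vector(detections[0], detections[1]))
```

With several fish in the same frame, each new detection would instead need to be compared against the last detection of every open chain.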
2.2.3 SiamMask
SiamMask is a tracking algorithm developed in October 2019 that uses outputs of deep learning models to estimate the rotation and location of objects (Wang et al., 2019). SiamMask is based on the concepts of Siamese network-based tracking. As with MOSSE, we slightly modified the tracking process by activating the tracker with the deep learning object detection model (Figure 2). Tracking with SiamMask started once a yellowfin bream was detected (Figure 4).
We have made the OD annotations and images, the movement dataset and annotations, and the tracker and data-wrangling code available online (https://github.com/slopezmarcano/automated-fish-tracking).