FIGURE 1 The study location in the Tweed River estuary, Australia,
showing the camera array deployed in a fish corridor (double-headed white
arrow) between the rock wall channel and the seagrass meadow (green
polygon). Each set of cameras consisted of three underwater cameras that
recorded for 1 hr during a flood tide. Set 1 faced north and Set 2 faced
south. The spacing between cameras (~3 m) and between sets (20 m)
ensured non-overlapping fields of view. Map data: NearMap 2020.
2.2.1 Minimum output sum of squared error (MOSSE)
The MOSSE algorithm produces adaptive correlation filters over target
objects, and tracking is performed via convolutions (an operation that
applies the filter to an image patch to produce a response map whose peak
marks the target's location). MOSSE was developed between 2010 and 2016
and is robust to changes in lighting, scale, pose and shape of objects
(Bolme et al., 2010; Sidhu, 2016). Here, we modified the MOSSE tracking
process by activating the tracker with the OD output (Figure 2). The
object detection model and the object tracking architecture interacted so
that each tracker stayed on the same yellowfin bream individual. When a
fish was detected, the detection was used to initialise the tracker.
MOSSE then tracked the fish for 4 frames, and a check was made on the
subsequent frame to verify the accuracy of the tracker. In this check, if
the detection bounding box overlapped the existing tracker bounding box by
≥ 30%, the tracker continued on the same object; otherwise, a new tracker
entry started. This interaction between detection and tracking occurred
for every yellowfin bream detected in a frame and stopped when no more
detections were found.
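To illustrate what an adaptive correlation filter is, the sketch below implements a minimal single-channel MOSSE filter in NumPy, following the formulation of Bolme et al. (2010): in the Fourier domain the filter is the ratio A / B of the accumulated terms A = sum of G·conj(F) and B = sum of F·conj(F), where F is a preprocessed image patch and G is a desired Gaussian-shaped output, and the target is located at the peak of the correlation response. The class name `MosseSketch`, the preprocessing details, the Gaussian width (`sigma`), `eps` and the learning rate are illustrative assumptions, not the settings or code used in this study.

```python
import numpy as np
from typing import Tuple


def preprocess(patch: np.ndarray) -> np.ndarray:
    """Log transform, normalise to zero mean/unit variance, apply a cosine window."""
    x = np.log(patch.astype(np.float64) + 1.0)  # assumes non-negative intensities
    x = (x - x.mean()) / (x.std() + 1e-5)
    window = np.outer(np.hanning(x.shape[0]), np.hanning(x.shape[1]))
    return x * window


class MosseSketch:
    """Minimal single-channel MOSSE correlation filter (after Bolme et al., 2010)."""

    def __init__(self, patch: np.ndarray, sigma: float = 2.0,
                 learning_rate: float = 0.125, eps: float = 1e-5) -> None:
        h, w = patch.shape
        rows, cols = np.indices((h, w))
        # Desired correlation output: a Gaussian peak centred on the target.
        g = np.exp(-((rows - h // 2) ** 2 + (cols - w // 2) ** 2) / (2 * sigma ** 2))
        self.G = np.fft.fft2(g)
        self.lr = learning_rate
        self.eps = eps
        F = np.fft.fft2(preprocess(patch))
        self.A = self.G * np.conj(F)  # numerator: sum of G * conj(F)
        self.B = F * np.conj(F)       # denominator: sum of F * conj(F)

    def response(self, patch: np.ndarray) -> np.ndarray:
        """Correlation response map; its peak marks the target location."""
        F = np.fft.fft2(preprocess(patch))
        H_conj = self.A / (self.B + self.eps)
        return np.real(np.fft.ifft2(H_conj * F))

    def locate(self, patch: np.ndarray) -> Tuple[int, int]:
        """(dy, dx) offset of the response peak from the patch centre."""
        resp = self.response(patch)
        py, px = np.unravel_index(np.argmax(resp), resp.shape)
        return int(py) - resp.shape[0] // 2, int(px) - resp.shape[1] // 2

    def update(self, patch: np.ndarray) -> None:
        """Adapt the filter to a new appearance with a running average."""
        F = np.fft.fft2(preprocess(patch))
        self.A = self.lr * self.G * np.conj(F) + (1 - self.lr) * self.A
        self.B = self.lr * F * np.conj(F) + (1 - self.lr) * self.B
```

In a tracking loop, `locate` would be applied to a patch cropped around the previous position, the crop window moved by the returned offset, and `update` called with the newly centred patch so the filter adapts to gradual changes in the fish's appearance.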
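The detection-verified tracking loop described above can be sketched as follows. The sketch makes several assumptions that the text does not pin down: overlap is measured as intersection over union (IoU); the detector is consulted only on every fifth frame (one initialisation or check frame followed by four tracker-only frames); a tracker whose prediction matches no detection on a check frame is stopped; and `detect` and `make_tracker` are hypothetical callables standing in for the OD model and a MOSSE tracker.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels (assumed)
Update = Callable[[object], Box]         # hypothetical tracker: frame -> predicted box


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def detect_then_track(
    frames: Sequence[object],
    detect: Callable[[object], List[Box]],          # hypothetical OD model
    make_tracker: Callable[[object, Box], Update],  # hypothetical MOSSE wrapper
    check_every: int = 5,
    threshold: float = 0.3,
) -> List[List[Tuple[int, Box]]]:
    tracks: List[List[Tuple[int, Box]]] = []
    active: Dict[int, Update] = {}  # track index -> tracker
    for t, frame in enumerate(frames):
        if t % check_every != 0:
            # Tracker-only frames: every active tracker predicts a new box.
            for i, update in active.items():
                tracks[i].append((t, update(frame)))
            continue
        # Check frame: compare each detection with the tracker predictions.
        predicted = {i: update(frame) for i, update in active.items()}
        kept: Dict[int, Update] = {}
        for box in detect(frame):
            candidates = [i for i in predicted
                          if i not in kept and iou(predicted[i], box) >= threshold]
            if candidates:
                # Overlap >= threshold: the tracker continues on the same fish.
                best = max(candidates, key=lambda i: iou(predicted[i], box))
                tracks[best].append((t, predicted[best]))
                kept[best] = active[best]
            else:
                # Insufficient overlap: start a new tracker entry from the detection.
                tracks.append([(t, box)])
                kept[len(tracks) - 1] = make_tracker(frame, box)
        # Trackers not confirmed by any detection stop here (assumption).
        active = kept
    return tracks


# Toy usage: one fish moving 2 px per frame, detectable in frames 0-11,
# with a hypothetical "perfect" tracker standing in for MOSSE.
def shift(box: Box, dx: float) -> Box:
    return (box[0] + dx, box[1], box[2] + dx, box[3])


frames = list(range(20))  # frame indices stand in for images
toy_detect = lambda f: [shift((0, 0, 10, 10), 2 * f)] if f < 12 else []
toy_tracker = lambda f0, box: (lambda f: shift(box, 2 * (f - f0)))
tracks = detect_then_track(frames, toy_detect, toy_tracker)
print(len(tracks), [t for t, _ in tracks[0]])
```

With a tracker that follows the fish, a single track is kept across check frames; a tracker that drifts away would fail the ≥ 30% check and a new track would start from the detection instead.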
2.2.2 Sequential non-maximum suppression (Seq-NMS)
Sequential non-maximum suppression (Seq-NMS) was developed in 2016 to
improve the classification results and consistency of deep learning
outputs (Han et al., 2016). Seq-NMS works differently from the other
trackers tested because it requires an OD output for every frame in which
a fish is present. Seq-NMS links detections in neighbouring frames: a
detection in the first frame can be connected to a detection in the
second frame if their overlap exceeds a defined threshold. In our case, we
used the principles of Seq-NMS to create detection linkages for OT of fish
when the overlap (intersection over union) of bounding boxes in
subsequent frames was ≥ 30%. In other words, if an object was detected
in Frame 1 (Detection 1) and another object was detected in Frame 2
(Detection 2), we checked whether the bounding boxes of Detection 1 and
Detection 2 overlapped by 30% or more (Figure 2). If so, the chain of
detections continued. When the overlap was less than 30%, a new detection
link started (i.e. the tracker treated this detection as a new fish). We
determined the movement direction by calculating the distance and angle
(vector) between the centres of the bounding boxes of Detections 1 and 2.
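The linking rule and the movement vector described above can be sketched in Python as follows. This is a minimal greedy frame-to-frame linker under stated assumptions, not the full Seq-NMS algorithm of Han et al. (2016), which also selects and rescores high-scoring sequences. Boxes are assumed to be (x1, y1, x2, y2) pixel tuples, the names `iou`, `link_detections` and `movement_vector` are hypothetical, and the angle convention (degrees counter-clockwise from the image x-axis, with the vertical component negated because image rows grow downward) is also an assumption.

```python
import math
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels (assumed)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def link_detections(frames: List[List[Box]],
                    threshold: float = 0.3) -> List[List[Tuple[int, Box]]]:
    """Greedily chain detections in consecutive frames whose IoU >= threshold."""
    chains: List[List[Tuple[int, Box]]] = []
    open_chains: List[int] = []  # chains whose last detection is in the previous frame
    for t, boxes in enumerate(frames):
        claimed: List[int] = []
        for box in boxes:
            best: Optional[int] = None
            best_iou = threshold
            for c in open_chains:
                overlap = iou(chains[c][-1][1], box)
                if c not in claimed and overlap >= best_iou:
                    best, best_iou = c, overlap
            if best is None:
                chains.append([(t, box)])      # overlap < 30%: treat as a new fish
                claimed.append(len(chains) - 1)
            else:
                chains[best].append((t, box))  # overlap >= 30%: the chain continues
                claimed.append(best)
        open_chains = claimed  # a frame without a matching detection breaks the chain
    return chains


def movement_vector(a: Box, b: Box) -> Tuple[float, float]:
    """Distance (px) and angle (degrees) between the centres of boxes a and b."""
    ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    dx, dy = bx - ax, by - ay
    # Image rows increase downward, so dy is negated to measure the angle
    # counter-clockwise from the positive x-axis (an assumed convention).
    return math.hypot(dx, dy), math.degrees(math.atan2(-dy, dx))
```

For example, a box at (0, 0, 10, 10) followed by one at (2, 0, 12, 10) in the next frame has an IoU of about 0.67, so the two detections join one chain, and `movement_vector` between them gives a distance of 2 px at an angle of 0°.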
2.2.3 SiamMask
SiamMask is a tracking algorithm developed in October 2019 that uses the
outputs of deep learning models to estimate the rotation and location of
objects (Wang et al., 2019). SiamMask builds on Siamese network-based
tracking. As with MOSSE, we slightly modified the tracking process by
activating the tracker with the deep learning object detection model
(Figure 2). Tracking with SiamMask started once a yellowfin bream was
detected (Figure 4).
We have made available the OD annotations and images, the movement
dataset and annotations, and the tracker and data wrangling code
(https://github.com/slopezmarcano/automated-fish-tracking).