FIGURE 2 Interaction between the object detection model and the tracking architectures. A detection from the object detection model initiates each of the three tracking architectures. For MOSSE and SiamMask, the tracker then runs for four frames after the initial detection; for Seq-NMS, the direction of movement is calculated as the vector between two consecutive detections. For all architectures, a check determines whether the tracker continues, stops, or a new tracker is started: for MOSSE and SiamMask this check is made four tracking frames after the first detection, whereas for Seq-NMS it is made at every frame after the first detection. This interaction between detections and trackers is applied over the full length of each video in which the object detection model detects a yellowfin bream, and is repeated across all frames, videos, and cameras. Each tracker provides a direction of movement for every frame in which the detection-tracking interaction succeeds.
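The hand-off logic summarised in Figure 2 can be illustrated with a minimal Python sketch. This is not the authors' implementation: it uses raw detection boxes as stand-ins for the tracker-predicted boxes of MOSSE/SiamMask, and the function names (`track_detections`, `movement_direction`) and the box format (x1, y1, x2, y2) are our own assumptions.

```python
import numpy as np

TRACK_FRAMES = 4  # MOSSE / SiamMask propagate a detection for four frames

def centre(box):
    """Centre point of an (x1, y1, x2, y2) bounding box."""
    return np.array([(box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0])

def movement_direction(prev_box, curr_box):
    """Unit vector from the previous to the current box centre
    (Seq-NMS derives direction from two consecutive detections)."""
    vec = centre(curr_box) - centre(prev_box)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def track_detections(detections_per_frame):
    """Walk through per-frame detections (a box or None). A detection starts
    a tracking window; after TRACK_FRAMES frames a check either re-initialises
    the tracker from the latest detection or stops it."""
    directions, anchor_box, frames_since_anchor = [], None, 0
    for box in detections_per_frame:
        if anchor_box is None:
            if box is not None:                 # first detection starts a tracker
                anchor_box, frames_since_anchor = box, 0
            continue
        frames_since_anchor += 1
        if box is not None:                     # direction relative to the anchor detection
            directions.append(movement_direction(anchor_box, box))
        if frames_since_anchor >= TRACK_FRAMES:
            # check: continue from a fresh detection, or stop the tracker
            anchor_box, frames_since_anchor = box, 0
    return directions

# Example: one fish moving to the right, detected every second frame
dets = [(10, 10, 30, 30), None, (20, 10, 40, 30), None, (30, 10, 50, 30)]
print(track_detections(dets))
```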

3. Results

3.1 Object detection

Using the Mask R-CNN framework for detecting yellowfin bream, we obtained an mAP50 of 81% and an F1 score of 91% (Table 1). Of the 169 ground-truth fish observed, the OD model missed 21 (false negatives), and it misidentified 8 other objects (e.g. algae or other fish species) as bream (false positives).
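As a cross-check, the reported F1 score follows directly from these counts. The short Python snippet below reproduces it; the counts are taken from Table 1 and the variable names are ours.

```python
# Reproducing the reported F1 score from the confusion-matrix counts in Table 1
# (169 ground-truth fish, 21 false negatives, 8 false positives).
tp = 169 - 21      # yellowfin bream correctly detected
fn = 21            # fish missed by the model
fp = 8             # algae or other species misidentified as bream

precision = tp / (tp + fp)                        # ~0.95
recall = tp / (tp + fn)                           # ~0.88
f1 = 2 * precision * recall / (precision + recall)
print(f"precision={precision:.2f}, recall={recall:.2f}, F1={f1:.2f}")  # F1 ~ 0.91
```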
TABLE 1 Object detection mAP50 and evaluation results for the Mask R-CNN yellowfin bream model. The confusion matrix is shown as counts of individual fish: true positives are correctly detected yellowfin bream, false negatives are yellowfin bream that were not detected, and false positives are other objects misidentified as bream.