2.3.2 Object tracking evaluation
We evaluated the tracking architectures against the movement dataset by calculating precision, recall and an F1 score, and by assessing the movement data (see Section 2.3.3). To calculate precision, recall and the F1 score, we manually observed every second of video and determined whether the OT architecture was correctly tracking the yellowfin bream individual (Supplementary B). We defined a true positive as a correct detection of a yellowfin bream followed by accurate tracking of the individual for ≥ 50% of the time the yellowfin bream appeared on frame (Supplementary B). A false negative occurred when a bream was not detected and tracked, or when a yellowfin bream was tracked for < 50% of the time the fish appeared on frame. A false positive occurred when a non-yellowfin bream object was detected and tracked, or when a yellowfin bream was detected but the tracking architecture then failed by tracking a non-yellowfin bream object.
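The metric calculation from the manually scored counts can be sketched as follows; the counts passed in below are placeholders for illustration, not values from this study.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall and F1 from true-positive,
    false-positive and false-negative counts scored manually
    against the video (as described above)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts for one OT architecture:
p, r, f1 = precision_recall_f1(tp=80, fp=10, fn=20)
```

Because a true positive here requires tracking for ≥ 50% of the on-frame time, a single fish contributes exactly one count (TP, FP or FN) per appearance rather than one per frame.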
2.3.3 Movement assessment
We conducted the movement assessment to evaluate the accuracy of the directions provided by the tracker. We obtained one row of tracking data per frame when a fish was detected and subsequently tracked. For each tracking output, the OT architecture provided a tracking angle of movement in two dimensions. We grouped tracking angles using reference angles into four directions: up, down, left and right (Supplementary B). Because the camera faced horizontally towards the fish corridor, parallel with the seafloor, we were able to calculate the horizontal movement of fish. Fish moving up had tracking angles between 315° and 45° (crossing 0°). Fish moving right (east) had angles between 45° and 135°, whereas fish moving left (west) had angles between 225° and 315°. Finally, fish moving down had tracking angles between 135° and 225°. The tracking angle for all OT architectures was obtained from the tracker vector generated within each tracker's bounding box (Supplementary B). By grouping the directions, we were able to count the number of movement angles in each direction per camera and per set. For each camera set, we then calculated the proportion of each tracking direction and determined net movement, which we defined as the direction with the highest proportion for a video. The data summary was generated in R (R Core Team, 2019) with the packages ggplot2, tidyverse and sqldf (Wickham, 2009; Grothendieck, 2017; Wickham & Henry, 2019).
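The binning of per-frame tracking angles into the four directions, and the derivation of net movement, can be sketched as below. The bin boundaries follow the reference angles described above (up crossing 0°); the angle values are hypothetical, not taken from the study's tracking output.

```python
from collections import Counter

def angle_to_direction(angle_deg: float) -> str:
    """Group a 2-D tracking angle into one of four directions
    using the reference-angle bins described in the text."""
    a = angle_deg % 360  # normalise to [0, 360)
    if 45 <= a < 135:
        return "right"   # east
    elif 135 <= a < 225:
        return "down"
    elif 225 <= a < 315:
        return "left"    # west
    else:
        return "up"      # 315-360 and 0-45, crossing 0

# Hypothetical per-frame tracking angles for one camera set:
angles = [10, 50, 300, 350, 200]
counts = Counter(angle_to_direction(a) for a in angles)
proportions = {d: n / len(angles) for d, n in counts.items()}
# Net movement = direction with the highest proportion in the video:
net_movement = max(counts, key=counts.get)
```

Each direction's proportion is taken over all tracked frames in the video, so net movement is well defined whenever at least one frame produced a tracking angle.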
To ground truth the tracking data, we manually observed all videos and determined the direction of movement of each fish (i.e. whether the fish moved mainly right or left), then derived the ground-truth net movement of each video as the direction with the highest proportion. We then compared this ground-truth output with the net movement direction from each of the three OT architectures (Supplementary B).
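The final comparison reduces to checking, per video, whether the tracker's net movement matches the manual ground truth. A minimal sketch, with hypothetical per-video labels rather than the study's data:

```python
# One ground-truth net-movement label per video (manually scored),
# and the corresponding net movement reported by one OT architecture.
ground_truth = ["right", "left", "right", "right"]
tracker_net = ["right", "left", "left", "right"]

# Fraction of videos where the tracker's net movement agreed
# with the manual ground truth:
agreement = sum(g == t for g, t in zip(ground_truth, tracker_net)) / len(ground_truth)
```

Repeating this per architecture gives one agreement proportion for each of the three OT architectures being compared.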