2.3.2 Object tracking evaluation
We evaluated the tracking architectures against the movement dataset by
calculating precision, recall and an F1 score, and by assessing the
movement data (see Section 2.3.3). To calculate precision, recall and
the F1 score, we manually observed every second of video and determined whether
the OT architecture was correctly tracking the yellowfin bream
individual (Supplementary B). We defined a true positive as a correct
detection of yellowfin bream and then accurate tracking of the
individual for ≥ 50% of the time where yellowfin bream appeared on
frame (Supplementary B). A false negative occurred when a yellowfin bream
was not detected and tracked, or when it was tracked < 50% of the time
the fish appeared on frame. We classified a false positive when a
non-yellowfin bream object was detected and tracked, or when a yellowfin
bream was detected but the tracking architecture subsequently failed by
tracking a non-yellowfin bream object.
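The precision, recall and F1 calculations from the manual counts above can be sketched as follows. This is a minimal illustration, not the study's own code; the function name and the example counts are hypothetical:

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from manual counts of
    true positives (tp), false positives (fp) and false negatives (fn)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

# Hypothetical counts for one set of videos:
p, r, f1 = precision_recall_f1(tp=40, fp=5, fn=10)
```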
2.3.3 Movement assessment
The movement assessment evaluated the accuracy of the movement
directions provided by the tracker. We obtained one row of tracking data
per frame when a fish was detected and subsequently tracked. For each
tracking output, the OT architecture provided a tracking angle of
movement in 2 dimensions. We grouped tracking angles using reference
angles into four directions: up, down, left and right (Supplementary B).
Because the camera was facing horizontally towards the fish corridor,
parallel with the seafloor, we were able to calculate horizontal
movement of fish. Fish moving up had tracking angles between 315° and
45° (i.e. the range crossing 0°). Fish moving right (east) had angles
between 45° and 135°, whereas fish moving left (west) had angles between
225° and 315°. Finally, fish moving down had tracking angles between
135° and 225°. The tracking angle
for all OT architectures was obtained from the tracker vector that is
generated within each tracker’s bounding box (Supplementary B). By
grouping the directions, we counted the number of movement angles in
each direction per camera and per set. For each camera set, we then
calculated the proportion of each tracking direction and determined net
movement, which we defined as the direction with the highest proportion
for a video. The data summary was generated in R (R Core Team, 2019)
with the packages ggplot2, tidyverse and sqldf (Wickham, 2009;
Grothendieck, 2017; Wickham & Henry, 2019).
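The angle-to-direction grouping and the net-movement calculation described above can be sketched as follows. This is an illustrative Python version rather than the study's R code, and the function names are ours:

```python
from collections import Counter

def bin_direction(angle):
    """Map a 2-D tracking angle (degrees) to one of four directions
    using the reference angles described above."""
    a = angle % 360
    if 45 <= a < 135:
        return "right"
    if 135 <= a < 225:
        return "down"
    if 225 <= a < 315:
        return "left"
    return "up"  # 315-360 and 0-45, the range crossing 0 degrees

def net_movement(angles):
    """Net movement for a video: the direction with the highest
    proportion of tracking angles."""
    counts = Counter(bin_direction(a) for a in angles)
    total = sum(counts.values())
    proportions = {d: n / total for d, n in counts.items()}
    return max(proportions, key=proportions.get), proportions
```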
To ground-truth the tracking data, we manually observed all the videos
and determined the direction of movement of each fish (i.e. fish moving
mainly right or left), and from this the net movement of each video (the
direction with the highest proportion for that video). We then compared
this ground-truth output with the net movement direction from the three
OT architectures (Supplementary B).
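The final comparison, the proportion of videos for which a tracker's net movement matched the manually observed direction, can be sketched with an assumed helper (not from the paper):

```python
def agreement_rate(groundtruth, predicted):
    """Proportion of videos where the tracker's net movement direction
    matches the manually observed (ground-truth) direction."""
    matches = sum(g == p for g, p in zip(groundtruth, predicted))
    return matches / len(groundtruth)
```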