SalmonSight: A Computer Vision Pipeline for Automated Fish Migration Monitoring at DFO Counting Fences
Jason Au — 101220472 — jasonau@cmail.carleton.ca
Ranad Osman — 101143136 — ranadosman@cmail.carleton.ca
Tan Tran — 101058868 — tantran@cmail.carleton.ca
As part of ongoing efforts by Fisheries and Oceans Canada (DFO) to enhance fishery management and conservation, this project develops an automated computer vision pipeline for monitoring fish migration at counting fences. Counting fences are structures installed across rivers or streams to direct fish into specific channels, enabling accurate population counts during migration periods. These structures are essential tools for assessing fish populations, informing harvest regulations, and implementing conservation strategies.
DFO currently collects fish migration data from underwater camera footage, which requires time-consuming manual analysis. This project applies machine learning and computer vision methods to automate the counting and species identification of fish from this footage, enabling better-informed, data-driven management decisions.
SalmonSight/
├── Fence_count_data/ # Raw footage and annotations (gitignored, see Data Access)
├── mmdetection/ # MMDetection framework (gitignored, see GETTING-STARTED.md)
├── src/
│ ├── baseline/ # PCA + SVM pipeline
│ ├── yolo/ # YOLOv8 / YOLOv11 / YOLOv26 experiments
│ └── cascade/ # Cascade Mask R-CNN + GradCAM
├── README.md
└── GETTING-STARTED.md
Raw footage and annotation data are stored externally due to file size constraints. Contact the authors for data access.
Place the downloaded folder at the repo root as Fence_count_data/ (matching the directory tree above).
Data is sourced from four DFO field sites:
- Cowichan Fence
- Skutz Falls Fishway
- Sakinaw
- Port Alberni
Raw video footage is retrieved from underwater cameras at each field site. Frames are then extracted and uploaded to cloud storage for annotation and model training.
- Class distribution analysis: Evaluate species population across frames and assess potential sample imbalance. Note that observed imbalance may reflect true population distributions rather than sampling bias.
- Batch intervals: A 5-second time window is applied between counted frames to reduce duplicate detections.
- Bounding box annotations are provided by DFO scientists with domain expertise in species identification.
- A dual-reviewer quality control process is applied to ensure consistency and accuracy across annotated frames.
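The class distribution check and the 5-second batch interval described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the function names, the use of per-frame timestamps in seconds, and the species labels are all assumptions made for the example.

```python
from collections import Counter


def select_counted_frames(timestamps_s, min_gap_s=5.0):
    """Keep a frame only if at least `min_gap_s` seconds have passed
    since the previously kept frame (assumed reading of the 5-second window)."""
    kept = []
    for t in sorted(timestamps_s):
        if not kept or t - kept[-1] >= min_gap_s:
            kept.append(t)
    return kept


def class_distribution(labels):
    """Return each species' share of all annotated instances."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {species: n / total for species, n in counts.items()}


# Hypothetical timestamps (seconds) and hypothetical species labels.
frames = select_counted_frames([0.0, 1.0, 4.9, 5.0, 7.0, 10.2])
shares = class_distribution(["coho", "coho", "chinook", "coho"])
print(frames)
print(shares)
```

A skewed result from `class_distribution` does not by itself indicate a sampling problem, since it may mirror the actual run composition at a site.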
Principal Component Analysis (PCA) is applied for two purposes:
- Denoising: Reduce underwater image noise (e.g., murky water artifacts) by reconstructing frames from top principal components.
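- Feature extraction: Each frame's projections onto the top principal components serve as input features to a Support Vector Machine (SVM) or logistic regression classifier; the eigenvalues indicate how much variance each component explains and therefore how many components to retain.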
Ensemble methods (bagging and boosting) are explored as extensions of the baseline.
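Both uses of PCA come from the same projection. As a sketch in notation introduced here (not taken from the project code), let $x \in \mathbb{R}^d$ be a flattened frame, $\mu$ the mean training frame, and $W_k \in \mathbb{R}^{d \times k}$ the matrix whose columns are the top $k$ eigenvectors of the training covariance matrix:

$$
z = W_k^{\top}(x - \mu), \qquad \hat{x} = \mu + W_k z
$$

Here $z \in \mathbb{R}^k$ is the feature vector that would be passed to the classifier, and $\hat{x}$ is the denoised reconstruction. The choice of $k$ is a tuning parameter; smaller values discard more of the low-variance directions, which is where much of the noise is assumed to lie.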
YOLO (You Only Look Once) models are trained on annotated footage for real-time fish detection and species classification. Experiments compare:
- YOLOv8 — primary model; strong general-purpose object detection
- YOLOv11 — reported improvements in small object detection
- YOLOv26 — optimized for edge deployment and faster convergence
Loss function modifications are evaluated across YOLO variants to improve detection performance on imbalanced species distributions.
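The specific loss modifications are not listed here. As an illustration only, two common options for imbalanced classes are inverse-frequency class weights and focal loss; the sketch below shows both in plain Python. The function names, species labels, and counts are hypothetical, and the code is not wired into any YOLO training loop.

```python
import math


def inverse_frequency_weights(class_counts):
    """Weight each class by total / (num_classes * count), so rare classes weigh more."""
    total = sum(class_counts.values())
    num_classes = len(class_counts)
    return {c: total / (num_classes * n) for c, n in class_counts.items()}


def focal_loss(p_true, gamma=2.0, weight=1.0):
    """Focal loss for one sample.

    p_true is the predicted probability of the correct class.
    With gamma=0 and weight=1 this reduces to ordinary cross-entropy.
    """
    return -weight * (1.0 - p_true) ** gamma * math.log(p_true)


# Hypothetical instance counts for two species.
weights = inverse_frequency_weights({"coho": 900, "chinook": 100})
easy = focal_loss(0.9)  # confident, correct prediction: heavily down-weighted
hard = focal_loss(0.1)  # poorly classified sample: dominates the loss
print(weights, easy, hard)
```

The focusing term `(1 - p_true) ** gamma` shrinks the contribution of well-classified samples, which tend to come from the majority species, so training signal shifts toward rare or difficult examples.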
A two-stage Cascade Mask R-CNN with a ResNet-50 backbone is used as the stretch model for high-accuracy detection:
- Cascade stages trained with progressively higher IoU thresholds refine bounding boxes step by step, improving localization quality
- Native instance segmentation support enables future morphological analysis (e.g., fin shape)
- Clean ResNet-50 backbone layer hooks enable reliable GradCAM attribution — critical for the QA/QC platform
- Well established in the medical and surgical computer vision (CV) literature, which supports reuse of existing explainable AI (XAI) pipelines
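To make the cascade idea concrete, the sketch below computes IoU between a proposal and a ground-truth box and checks whether the proposal would count as a positive training sample at each stage. The thresholds 0.5, 0.6, and 0.7 are the values commonly used for Cascade R-CNN and are an assumption here, not a setting confirmed for this project; the function names are hypothetical, and the box regression that happens between stages is omitted.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Assumed per-stage thresholds (common Cascade R-CNN defaults).
STAGE_IOU_THRESHOLDS = (0.5, 0.6, 0.7)


def positive_per_stage(proposal, ground_truth, thresholds=STAGE_IOU_THRESHOLDS):
    """For each stage, report whether the proposal overlaps enough to be a positive."""
    overlap = iou(proposal, ground_truth)
    return [overlap >= t for t in thresholds]


# A proposal covering 65% of the ground-truth box qualifies for the first
# two stages but not the strictest one.
labels = positive_per_stage((0, 0, 10, 6.5), (0, 0, 10, 10))
print(labels)
```

In the full model each stage also regresses the box before passing it on, so a proposal that fails a later threshold at first may pass it after refinement.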
Models are evaluated on a held-out test set that reflects the real-world distribution, including frames with multiple fish. Metrics include accuracy, precision, recall, and F1 score. The optimized detector is further validated by confirming at least one correct count in every frame that contains multiple fish.
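For reference, precision, recall, and F1 follow directly from true-positive, false-positive, and false-negative counts. The sketch below uses a hypothetical helper name and made-up counts; how detections are matched to ground truth (for example, by an IoU threshold) is assumed to happen upstream. Accuracy is left out because it also needs true negatives, which are not well defined for box-level detection.

```python
def detection_metrics(tp, fp, fn):
    """Compute precision, recall, and F1 from detection counts.

    Returns 0.0 for any metric whose denominator would be zero.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up counts for illustration only.
metrics = detection_metrics(tp=8, fp=2, fn=2)
print(metrics)
```

Reporting these per species, rather than only in aggregate, keeps a strong majority class from hiding weak performance on rare species.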
A human-in-the-loop review interface is developed to support DFO scientists in validating model predictions:
- GradCAM visualizations highlight the image regions driving each detection decision, helping reviewers distinguish true fish detections from background noise
- Reviewers can confirm or flag model predictions, with validated frames fed back into the training set to improve model precision iteratively
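One possible shape for the confirm-or-flag loop is sketched below. The `Review` record, its fields, and the helper functions are hypothetical and only illustrate how validated frames could be separated from flagged ones before retraining.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Review:
    frame_id: str
    predicted_species: str
    confirmed: Optional[bool] = None  # None means not yet reviewed


def frames_for_retraining(reviews):
    """Frames a reviewer confirmed; these can be added to the training set."""
    return [r.frame_id for r in reviews if r.confirmed is True]


def frames_flagged(reviews):
    """Frames a reviewer flagged; these need re-annotation before reuse."""
    return [r.frame_id for r in reviews if r.confirmed is False]


# Hypothetical review queue.
queue = [
    Review("frame_001", "coho", confirmed=True),
    Review("frame_002", "chinook", confirmed=False),
    Review("frame_003", "coho"),
]
print(frames_for_retraining(queue))
print(frames_flagged(queue))
```

Keeping unreviewed frames (`confirmed=None`) out of both lists prevents unverified predictions from leaking into the training set.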