# Bachelor's Thesis
This repository contains the benchmarking framework and evaluation results for a comparative study of modern object detectors applied to apple detection in orchards. Reliable apple detection is a core perception component for precision agriculture, enabling automated yield estimation, fruit counting, crop monitoring, and robotic harvesting.
Orchard images are challenging because of extreme variations in lighting and viewpoint, background clutter from foliage, and densely clustered or partially occluded fruit. Many existing agricultural studies report only a limited set of metrics (often just mAP@0.5) and use differing datasets, which makes direct comparison difficult.
This project establishes a transparent, controlled benchmark for single-class apple detection by evaluating six diverse architectures under a deterministic train/validation/test split using COCO-style metrics and threshold-dependent diagnostics.
- The project uses the public AppleBBCH81 orchard dataset, which provides bounding-box annotations in YOLO format.
- The dataset consists of 1,838 images containing 15,309 ground-truth apple bounding boxes.
- A fixed split protocol is used for all models: approximately 80% for training (1,470 images), 10% for validation (183 images), and 10% for testing (185 images).
- Empty-label images are retained as valid negative samples to reduce false positive bias.
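
A fixed split of this kind can be reproduced with a seeded shuffle. The following is a minimal sketch, not the repository's actual split script: the function name `deterministic_split`, the seed value, the placeholder file names, and the rule of flooring the train and validation sizes are all assumptions made for illustration. Under that flooring rule, 1,838 images happen to yield the 1,470 / 183 / 185 sizes listed above.

```python
import random


def deterministic_split(names, seed=0, train_frac=0.8, val_frac=0.1):
    """Split file names into train/val/test lists in a reproducible way.

    Hypothetical helper: the seed and the flooring rule are assumptions,
    not the protocol documented by the thesis.
    """
    ordered = sorted(names)      # fixed starting order, independent of directory listing order
    rng = random.Random(seed)    # local RNG so the global random state is untouched
    rng.shuffle(ordered)

    n_train = int(train_frac * len(ordered))  # floor
    n_val = int(val_frac * len(ordered))      # floor; the test set takes the remainder

    train = ordered[:n_train]
    val = ordered[n_train:n_train + n_val]
    test = ordered[n_train + n_val:]
    return train, val, test


# Placeholder file names standing in for the 1,838 dataset images.
names = [f"img_{i:04d}.jpg" for i in range(1838)]
train, val, test = deterministic_split(names)
print(len(train), len(val), len(test))  # 1470 183 185
```

Because the names are sorted before shuffling and the generator is seeded, running the function again returns the same three lists, so every model sees identical data.
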
The benchmark includes six contemporary models representing different architectural families:
- YOLO11n: A modern, lightweight one-stage CNN baseline for real-time inference.
- YOLOv10n: A nano-configured one-stage detector targeting low latency and reduced compute.
- RT-DETR-L: A real-time transformer-based detector that utilizes object queries instead of dense anchors.
- Faster R-CNN (ResNet50-FPN): A standard two-stage detector baseline known for strong recall via explicit region proposals.
- FCOS (ResNet50-FPN): An anchor-free, one-stage detector baseline.
- SSDLite320 (MobileNetV3-Large): A lightweight, deployment-oriented baseline designed for mobile or edge hardware.
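The primary localization-aware measure is the Intersection over Union (IoU). For a predicted box $B_p$ and a ground-truth box $B_{gt}$, it is calculated as follows:

$$\mathrm{IoU}(B_p, B_{gt}) = \frac{|B_p \cap B_{gt}|}{|B_p \cup B_{gt}|}$$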
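
For axis-aligned boxes, IoU reduces to a few lines of arithmetic. The sketch below is illustrative only: the helper names `box_iou` and `yolo_to_xyxy` are hypothetical and are not taken from the repository. It assumes corner-format boxes `(x_min, y_min, x_max, y_max)` and includes a conversion from the YOLO label layout (box center, width, height).

```python
def yolo_to_xyxy(cx, cy, w, h):
    """Convert a YOLO-style box (center x, center y, width, height) to corner format."""
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def box_iou(a, b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlap rectangle.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])

    # Clamp at zero so disjoint boxes get no intersection area.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: intersection 1, union 7.
print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, about 0.143
```
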
Performance is reported using:
- mAP@0.5: Average precision at an IoU threshold of 0.5.
- mAP@0.5:0.95: Average precision averaged over IoU thresholds from 0.50 to 0.95 in steps of 0.05.
- Fixed Operating-Point Diagnostics: Precision, Recall, and F1-score evaluated at an IoU of 0.5 and a confidence threshold >= 0.05 using a one-to-one greedy matching rule.
- Curve Analysis: Precision-Recall (PR) curves and ROC-AUC for confidence-separation diagnostics.
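
The fixed operating-point diagnostics can be illustrated with a small greedy matcher for a single image. This is a sketch under assumptions, not the evaluation code used for the reported numbers: the function names, the tie-breaking behavior, and the example boxes are all hypothetical. Predictions below the confidence threshold are dropped, the rest are visited in descending confidence, and each ground-truth box can be claimed by at most one prediction.

```python
def iou(a, b):
    """IoU of two boxes in (x_min, y_min, x_max, y_max) format."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(preds, gts, iou_thr=0.5, conf_thr=0.05):
    """One-to-one greedy matching; preds is a list of (box, score). Returns (tp, fp, fn)."""
    kept = sorted((p for p in preds if p[1] >= conf_thr),
                  key=lambda p: p[1], reverse=True)
    matched = set()  # indices of ground-truth boxes already claimed
    tp = 0
    for box, _score in kept:
        best_j, best_iou = -1, iou_thr
        for j, gt in enumerate(gts):
            if j in matched:
                continue
            overlap = iou(box, gt)
            if overlap >= best_iou:  # keep the best still-unmatched ground truth
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched.add(best_j)
            tp += 1
    return tp, len(kept) - tp, len(gts) - tp


def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up example: two apples, four predictions (one duplicate, one below threshold).
gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [((0, 0, 10, 10), 0.90), ((1, 1, 10, 10), 0.80),
         ((20, 20, 30, 31), 0.60), ((50, 50, 60, 60), 0.03)]
tp, fp, fn = greedy_match(preds, gts)
print(tp, fp, fn)                       # 2 1 0
print(precision_recall_f1(tp, fp, fn))  # precision 2/3, recall 1.0, F1 about 0.8
```

The duplicate prediction on the first apple counts as a false positive because its ground-truth box was already claimed by a higher-confidence prediction. For a dataset-level score, the per-image counts would be summed before computing precision, recall, and F1.
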
The table below summarizes the core COCO-style mAP results and the F1-scores at the fixed operating point.
| Model | mAP@0.5:0.95 | mAP@0.5 | F1-Score |
|---|---|---|---|
| YOLO11n | 0.6065 | 0.9620 | 0.752053 |
| RT-DETR-L | 0.6012 | 0.9506 | 0.364767 |
| YOLOv10n | 0.5941 | 0.9501 | 0.780870 |
| Faster R-CNN | 0.5278 | 0.9091 | 0.768721 |
| FCOS | 0.4926 | 0.8993 | 0.672042 |
| SSDLite320 | 0.2125 | 0.5400 | 0.080068 |
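
The two headline columns rank the models differently, which is easiest to see by sorting the table both ways. The snippet below copies the values from the table above; the variable names are illustrative only.

```python
# (mAP@0.5:0.95, mAP@0.5, F1) per model, copied from the results table.
results = {
    "YOLO11n":      (0.6065, 0.9620, 0.752053),
    "RT-DETR-L":    (0.6012, 0.9506, 0.364767),
    "YOLOv10n":     (0.5941, 0.9501, 0.780870),
    "Faster R-CNN": (0.5278, 0.9091, 0.768721),
    "FCOS":         (0.4926, 0.8993, 0.672042),
    "SSDLite320":   (0.2125, 0.5400, 0.080068),
}

by_map = sorted(results, key=lambda m: results[m][0], reverse=True)
by_f1 = sorted(results, key=lambda m: results[m][2], reverse=True)

for model in by_map:
    print(f"{model:<13} mAP rank {by_map.index(model) + 1}, "
          f"F1 rank {by_f1.index(model) + 1}")
```

RT-DETR-L shows the largest shift: second by mAP@0.5:0.95 but fifth by F1 at the fixed operating point, while Faster R-CNN moves from fourth to second.
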
Findings Summary:
- YOLO11n achieves the highest strict localization performance (mAP@0.5:0.95).
- YOLOv10n provides the highest F1-score when operating at a fixed threshold (IoU = 0.5, confidence >= 0.05).
- RT-DETR-L reaches very high recall, but at low confidence thresholds its precision is poor because of many false positives (F1 = 0.364767 at confidence >= 0.05), so its confidence threshold requires careful calibration.
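
One simple way to calibrate a confidence threshold is to sweep it and keep the value that maximizes F1. The sketch below is a hypothetical illustration, not the procedure used in the thesis: the function name `best_f1_threshold` and the scored detections are made up. It assumes each detection has already been labeled as a true or false positive by a confidence-ordered greedy matcher, so raising the threshold only removes detections and does not change the labels of the remaining ones.

```python
def best_f1_threshold(scored, n_gt):
    """Return (threshold, f1) maximizing F1.

    scored: list of (confidence, is_true_positive) pairs.
    n_gt:   total number of ground-truth boxes.
    """
    ranked = sorted(scored, key=lambda s: s[0], reverse=True)
    best_thr, best_f1 = None, 0.0
    tp = fp = 0
    for i, (conf, is_tp) in enumerate(ranked):
        if is_tp:
            tp += 1
        else:
            fp += 1
        # Evaluate only after the last detection of a run of tied scores.
        if i + 1 < len(ranked) and ranked[i + 1][0] == conf:
            continue
        fn = n_gt - tp
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_thr, best_f1 = conf, f1
    return best_thr, best_f1


# Made-up detections: confident true positives plus a tail of low-score false positives.
scored = [(0.90, True), (0.80, True), (0.60, True), (0.30, False),
          (0.20, False), (0.10, True), (0.06, False), (0.05, False)]
thr, f1 = best_f1_threshold(scored, n_gt=4)
print(thr, round(f1, 3))  # 0.6 0.857
```

In this toy example, keeping every detection with confidence >= 0.05 gives an F1 of about 0.667, while a threshold of 0.6 gives about 0.857. To avoid optimistic estimates, such a threshold would normally be chosen on the validation split and then applied unchanged to the test split.
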
Environment:
- Hardware: NVIDIA A100-SXM4-40GB GPU.
- Software: Python 3.12.1.
- Frameworks: PyTorch (2.1.0+cu121), Torchvision (0.16.0+cu121), and Ultralytics.
- Analysis: OpenCV, Matplotlib, NumPy, and Pandas.
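
When reproducing the results, it can help to compare the local environment against the versions listed above. The following sketch uses only the Python standard library; the distribution names in `PACKAGES` are assumptions about how the dependencies were installed (OpenCV, for instance, may instead be installed as `opencv-python-headless`).

```python
import platform
from importlib import metadata

# Assumed PyPI distribution names for the libraries listed above.
PACKAGES = ["torch", "torchvision", "ultralytics",
            "opencv-python", "matplotlib", "numpy", "pandas"]


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print("python", platform.python_version())
for name, version in installed_versions().items():
    print(name, version or "not installed")
```
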