Road traffic accidents are a leading cause of death and injury worldwide, claiming approximately 1.35 million lives annually (WHO, 2023). Early detection of accidents can significantly reduce emergency response time, potentially saving lives. This research presents a deep learning-based real-time accident detection system that applies transfer learning to the MobileNetV2 architecture, implemented in PyTorch. The system analyzes video frames from traffic cameras and classifies each scene as either "Accident" or "Normal Traffic". It reaches 99.80% accuracy on a held-out test set of 1,986 images. The model is trained with a 3-phase progressive fine-tuning strategy. At inference time, Test-Time Augmentation (TTA) is combined with temporal smoothing for robust real-time detection. An integrated alert system automatically emails incident screenshots to safety authorities, enabling rapid emergency response.
Keywords: Accident Detection, Deep Learning, Transfer Learning, MobileNetV2, Computer Vision, Real-time Video Analysis, Convolutional Neural Networks, Traffic Safety
- Introduction
- Literature Review
- System Architecture
- Dataset
- Methodology
- Implementation
- Experimental Results
- Discussion
- Installation & Usage
- Limitations & Future Work
- Conclusion
- References
Road traffic accidents represent a critical global health challenge. According to the World Health Organization:
- 1.35 million deaths occur annually due to road accidents
- 20-50 million people suffer non-fatal injuries
- Road accidents are the 8th leading cause of death globally
- Economic losses amount to 3% of GDP in most countries
Traditional accident detection methods suffer from significant limitations:
| Method | Mechanism | Limitations |
|---|---|---|
| Manual Reporting | Witnesses call emergency services | Delays of 5-15 minutes, unreliable |
| Camera Operators | Human monitoring of CCTV feeds | Fatigue, limited coverage, high cost |
| Vehicle Sensors | In-car crash detection (airbag triggers) | Limited to equipped vehicles only |
| Audio Analysis | Detection of crash sounds | Environmental noise interference |
This research proposes an automated, intelligent accident detection system that:
- Analyzes traffic camera feeds in real-time using deep learning
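- Flags accidents in near real time: model inference takes about 8 ms per frame, and an incident is confirmed once 5 of the last 7 frames are classified as accidents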
- Automatically alerts safety authorities with visual evidence
- Works with existing CCTV infrastructure without hardware modifications
- Operates 24/7 without human fatigue or attention lapses
| Objective | Target | Achieved |
|---|---|---|
| Classification Accuracy | > 95% | 99.80% |
| Real-time Processing | > 20 FPS | 25+ FPS |
| False Positive Rate | < 5% | 0.00% |
| Alert Latency | < 5 seconds | < 2 seconds |
This work makes the following contributions:
- Novel 3-Phase Training Strategy: Progressive fine-tuning approach achieving 99.80% accuracy
- Temporal Smoothing Algorithm: Reduces false positives using sliding window analysis
- Integrated Alert System: Automated email notifications with incident screenshots
- Real-time Dashboard: Monitoring interface showing a status banner, confidence metrics, detection statistics, and temporal analysis
| Study | Year | Method | Dataset Size | Accuracy | Limitations |
|---|---|---|---|---|---|
| Ijjina et al. | 2019 | VGG-16 | 1,000 images | 78.0% | Small dataset, no temporal analysis |
| Singh & Mohan | 2019 | Custom CNN | 2,000 images | 82.0% | Limited generalization |
| Ghosh et al. | 2020 | ResNet-50 | 5,000 images | 89.5% | High computational cost |
| Osman et al. | 2021 | YOLOv4 | 8,000 images | 91.2% | Object detection overhead |
| Chen et al. | 2022 | EfficientNet | 10,000 images | 94.3% | No real-time capability |
| This Work | 2025 | MobileNetV2 + TTA | 13,228 images | 99.80% | See Limitations & Future Work |
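Transfer learning reuses knowledge from models pre-trained on large datasets and fine-tunes them for a specific task. Standard ImageNet pre-trained weights come from the ILSVRC-2012 subset of the full 14M+ image ImageNet collection. That subset contains about 1.28M training images across 1,000 classes. Benefits include: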
- Faster training with fewer epochs required
- Less data required compared to training from scratch
- Better accuracy by utilizing pre-learned features
Figure 1: Complete accident detection system pipeline showing the flow from video input through preprocessing, inference, and output modules.
Figure 2: Step-by-step frame processing pipeline including preprocessing, TTA ensemble, CNN inference, temporal smoothing, and decision logic.
Figure 3: MobileNetV2 backbone with custom classification head architecture. The model uses pre-trained ImageNet weights with a 4-layer classifier.
Figure 4: Email alert system workflow showing screenshot capture, HTML report generation, and SMTP delivery to safety authorities.
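Figure 4's flow (screenshot, HTML report, SMTP delivery) can be sketched with Python's standard `email` and `smtplib` modules. This is not the project's alert code.

The following are all hypothetical:
- the function names `build_alert_email` and `send_alert`
- the message layout
- the SMTP host, which the caller must supply

Building the message is kept separate from sending it, so the message can be inspected without a mail server.

```python
import smtplib
from datetime import datetime
from email.message import EmailMessage


def build_alert_email(sender, recipient, camera_location, confidence,
                      screenshot_png: bytes, timestamp=None) -> EmailMessage:
    """Hypothetical alert: plain-text + HTML body with the incident frame attached."""
    timestamp = timestamp or datetime.now()
    when = f"{timestamp:%Y-%m-%d %H:%M:%S}"
    msg = EmailMessage()
    msg["Subject"] = f"Accident detected at {camera_location}"
    msg["From"] = sender
    msg["To"] = recipient
    # Plain-text fallback for clients that do not render HTML.
    msg.set_content(f"Possible accident at {camera_location} ({when}), "
                    f"confidence {confidence:.1%}.")
    # HTML report shown by most mail clients.
    msg.add_alternative(
        f"<h2>Accident alert</h2><p>Location: {camera_location}<br>"
        f"Time: {when}<br>Confidence: {confidence:.1%}</p>",
        subtype="html",
    )
    # Attaching the screenshot turns the message into multipart/mixed.
    msg.add_attachment(screenshot_png, maintype="image", subtype="png",
                       filename="incident.png")
    return msg


def send_alert(msg: EmailMessage, sender: str, app_password: str,
               host: str, port: int = 587) -> None:
    """Deliver over SMTP with STARTTLS (host/port depend on the mail provider)."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(sender, app_password)
        server.send_message(msg)


if __name__ == "__main__":
    alert = build_alert_email("alerts@example.com", "authority@example.com",
                              "Highway Junction A", 0.97, b"\x89PNG\r\n\x1a\n")
    print(alert["Subject"], alert.get_content_type())
```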
The dataset was curated from multiple sources to ensure diversity:
| Source | Type | Description |
|---|---|---|
| YouTube | CCTV Footage | Real-world traffic camera recordings |
| Dashcam Archives | In-vehicle | Driver perspective accident footage |
| Kaggle | Public Dataset | Accident Detection from CCTV Footage |
| Manual Collection | Mixed | Curated from news and safety videos |
| Split | Accident | Non-Accident | Total | Percentage |
|---|---|---|---|---|
| Training | 4,629 | 4,629 | 9,258 | 70.0% |
| Validation | 992 | 992 | 1,984 | 15.0% |
| Test | 993 | 993 | 1,986 | 15.0% |
| Total | 6,614 | 6,614 | 13,228 | 100% |
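The per-class counts in the table match a 70/15/15 split in which the train and validation sizes are rounded down and the remainder goes to the test set. Applied separately to each class of 6,614 images, this gives 4,629 / 992 / 993.

The sketch below assumes that rounding rule. The function name `split_indices` and the fixed seed are hypothetical.

```python
import random


def split_indices(n: int, train_frac: float = 0.70, val_frac: float = 0.15,
                  seed: int = 42):
    """Shuffle 0..n-1 and cut into train/val/test; test receives the remainder."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)   # reproducible shuffle
    n_train = int(n * train_frac)          # floor
    n_val = int(n * val_frac)              # floor
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    # Apply once per class (stratified): both classes have 6,614 images.
    train, val, test = split_indices(6614)
    print(len(train), len(val), len(test))
```

Many images are frames taken from the same video clip. Splitting by source video rather than by individual frame keeps near-duplicate frames out of the test set. The index-level sketch above does not do this.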
Figure 5: Sample accident detection frames from CCTV footage showing various collision scenarios (panels: Vehicle Collision, Multi-vehicle Crash, Impact Frame, Post-collision).
We employ MobileNetV2 pre-trained on ImageNet as our backbone, chosen for:
| Criterion | MobileNetV2 | VGG-16 | ResNet-50 |
|---|---|---|---|
| Parameters | 3.4M | 138M | 25.6M |
| Inference Time (per frame) | 8 ms | 45 ms | 22 ms |
| Test Accuracy (our pipeline) | 99.80% | 94.2% | 96.1% |
| Mobile Deployment | Yes | No | No |
Figure 6: Three-phase progressive fine-tuning strategy. Phase 1 trains only the classifier, Phase 2 unfreezes top layers, Phase 3 fine-tunes all layers.
To improve generalization and prevent overfitting:
| Augmentation | Parameters | Purpose |
|---|---|---|
| Random Horizontal Flip | p=0.5 | Mirror invariance |
| Random Rotation | +/-15 degrees | Orientation robustness |
| Color Jitter | Brightness +/-20%, Contrast +/-20% | Lighting variation |
| Random Affine | Translate +/-10%, Scale 0.9-1.1 | Position invariance |
| Gaussian Blur | Kernel 3x3 | Noise robustness |
Figure 7: Test-Time Augmentation ensemble. Five augmented versions are processed and averaged for more robust predictions.
```text
Algorithm: Temporal Smoothing for Accident Detection
-------------------------------------------------------
Input:      Frame predictions p_t = P(accident | frame_t), t = 1, 2, ...
Parameters: window size W = 7, confidence threshold tau = 0.85,
            minimum positives M = 5
Initialize: prediction_buffer = []
            current_incident = False

For each frame t:
  1. p_t = model.predict(frame_t)                # Raw prediction
  2. prediction_buffer.append(p_t > tau)         # Binary decision
  3. if len(prediction_buffer) > W:
         prediction_buffer.pop(0)                # Keep only the last W decisions
  4. positive_count = sum(prediction_buffer)
  5. if positive_count >= M and not current_incident:
         TRIGGER_ALERT()                         # New incident
         current_incident = True
  6. if positive_count < 2:                      # Incident ended
         current_incident = False

Output: Smoothed accident detection with reduced false positives
```
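The pseudocode translates almost line for line into Python. The sketch below uses a `deque` with `maxlen`, which discards the oldest decision automatically and replaces the explicit `pop(0)`.

`update()` returns `True` only on the frame that starts a new incident, which is where the alert would be triggered. The class name `TemporalSmoother` and the `reset_below` parameter name are hypothetical. The value 2 for `reset_below` comes from step 6.

```python
from collections import deque


class TemporalSmoother:
    """Sliding-window vote over per-frame accident decisions."""

    def __init__(self, window_size=7, threshold=0.85, min_positive=5, reset_below=2):
        self.buffer = deque(maxlen=window_size)   # last W binary decisions
        self.threshold = threshold
        self.min_positive = min_positive
        self.reset_below = reset_below
        self.current_incident = False

    def update(self, p_accident: float) -> bool:
        """Feed one frame's P(accident); return True only when a new incident starts."""
        self.buffer.append(p_accident > self.threshold)
        positive_count = sum(self.buffer)
        new_incident = False
        if positive_count >= self.min_positive and not self.current_incident:
            self.current_incident = True          # step 5: trigger once per incident
            new_incident = True
        if positive_count < self.reset_below:
            self.current_incident = False         # step 6: incident has ended
        return new_incident


if __name__ == "__main__":
    smoother = TemporalSmoother()
    stream = [0.1, 0.9, 0.2, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.1]
    print([i for i, p in enumerate(stream) if smoother.update(p)])
```

An isolated high-confidence frame never fills 5 of the 7 slots, so brief flickers are suppressed. A sustained accident triggers exactly one alert until the window has mostly cleared.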
| Component | Specification |
|---|---|
| Programming Language | Python 3.12 |
| Deep Learning Framework | PyTorch 2.6.0+cu124 |
| GPU | NVIDIA RTX 4060 Laptop (8GB VRAM) |
| CUDA Version | 12.4 |
| Operating System | Windows 11 |
| IDE | Visual Studio Code |
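```python
import torch
import torch.nn as nn

# `model` is the MobileNetV2-based classifier described above.

# Optimizer Configuration (Phase 1 rate; Phases 2 and 3 use 1e-4 and 1e-5)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)

# Learning Rate Scheduler: halve the LR after 3 epochs without val-loss improvement;
# call scheduler.step(val_loss) once per epoch
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.5, patience=3
)

# Loss Function: Binary Cross-Entropy on probabilities, so the model must end in
# nn.Sigmoid() (nn.BCEWithLogitsLoss is the stabler choice for raw logits)
criterion = nn.BCELoss()

# Training Parameters
batch_size = 32
epochs_per_phase = 10
total_epochs = 30  # 3 phases x 10 epochs
```

| Parameter | Value | Description |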
|---|---|---|
| Input Resolution | 224 x 224 | Model input size |
| Confidence Threshold | 0.85 | Minimum P(accident) to flag |
| Temporal Window | 7 frames | Sliding window size |
| Required Positives | 5/7 | Minimum for confirmation |
| TTA Variants | 5 | Number of augmented predictions |
| Target FPS | 25+ | Real-time requirement |
| Phase | Learning Rate | Layers Trained | Val Accuracy | Val Loss |
|---|---|---|---|---|
| Phase 1 | 1e-3 | Classifier only | 99.85% | 0.0089 |
| Phase 2 | 1e-4 | Top 50 + Classifier | 99.95% | 0.0045 |
| Phase 3 | 1e-5 | All layers | 100.00% | 0.0021 |
| Metric | Formula | Value |
|---|---|---|
| Accuracy | (TP + TN) / (TP + TN + FP + FN) | 99.80% |
| Precision | TP / (TP + FP) | 100.00% |
| Recall (Sensitivity) | TP / (TP + FN) | 99.60% |
| Specificity | TN / (TN + FP) | 100.00% |
| F1-Score | 2 x (Precision x Recall) / (Precision + Recall) | 99.80% |
| | Predicted: Accident | Predicted: Normal |
|---|---|---|
| Actual: Accident | 989 (TP) | 4 (FN) |
| Actual: Normal | 0 (FP) | 993 (TN) |
Table: Confusion matrix on test set (n=1,986). Only 4 false negatives, zero false positives.
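The metric values can be reproduced directly from the confusion-matrix counts. The function name `binary_metrics` below is hypothetical. With TP = 989, FN = 4, FP = 0 and TN = 993, it prints the following, matching the metrics table:
- accuracy 99.80%
- precision 100.00%
- recall 99.60%
- specificity 100.00%
- F1 99.80%

```python
def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    def safe_div(num, den):
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)  # sensitivity
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": safe_div(tn, tn + fp),
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


if __name__ == "__main__":
    for name, value in binary_metrics(tp=989, fn=4, fp=0, tn=993).items():
        print(f"{name}: {value:.2%}")
```

Zero false positives among 993 normal images does not prove the false positive rate is exactly zero. By the rule of three (3/993), the data are consistent with a true rate of up to about 0.3% at 95% confidence.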
| Metric | Value |
|---|---|
| Average FPS (with TTA) | 25.3 FPS |
| Average FPS (without TTA) | 42.7 FPS |
| Inference Time per Frame | 8.2 ms |
| End-to-end Latency | 39.5 ms |
| GPU Memory Usage | 1.2 GB |
| Alert Trigger Time | < 2 seconds |
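The latency figures follow from the frame rates. At 25.3 FPS each frame takes 1000 / 25.3 ≈ 39.5 ms, which is the end-to-end latency in the table.

The 5-of-7 temporal rule cannot confirm an incident until at least 5 frames have been processed after onset, about 198 ms with TTA. This is well under the reported alert trigger time of under 2 seconds.

The helper names below are hypothetical. The calculation assumes the first 5 frames after onset are all classified as accidents.

```python
def frame_period_ms(fps: float) -> float:
    """Time budget per frame at a given throughput."""
    return 1000.0 / fps


def min_confirmation_delay_ms(fps: float, min_positive: int = 5) -> float:
    """Earliest possible confirmation: min_positive consecutive positive frames."""
    return min_positive * frame_period_ms(fps)


if __name__ == "__main__":
    for label, fps in [("with TTA", 25.3), ("without TTA", 42.7)]:
        print(f"{label}: {frame_period_ms(fps):.1f} ms/frame, "
              f"earliest confirmation {min_confirmation_delay_ms(fps):.1f} ms")
```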
Figure 8: Real-time monitoring dashboard showing status banner, confidence metrics, detection statistics, and temporal analysis visualization.
- Transfer Learning Efficacy: Pre-trained MobileNetV2 features generalize exceptionally well to accident detection, achieving 99.80% accuracy with minimal fine-tuning.
- Progressive Training: The 3-phase approach prevents catastrophic forgetting and enables stable convergence to high accuracy.
- Temporal Smoothing Impact: Reduces false positive rate from ~5% (single-frame) to ~0% with 7-frame window.
- TTA Contribution: Improves prediction stability by averaging across augmented views, reducing variance by ~40%.
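One way to see why the 7-frame window suppresses false alarms is a simple binomial model. It assumes each frame is independently misclassified with a single-frame false positive rate of 5%, and computes the chance that at least 5 of 7 frames are positive. This is an idealized estimate rather than a measurement.

Consecutive video frames are strongly correlated, so real errors cluster. The true reduction is therefore smaller than this model suggests. The function name `window_alarm_prob` is hypothetical.

```python
from math import comb


def window_alarm_prob(p_frame: float, window: int = 7, min_positive: int = 5) -> float:
    """P(at least min_positive of `window` frames positive), frames independent."""
    return sum(
        comb(window, k) * p_frame**k * (1 - p_frame) ** (window - k)
        for k in range(min_positive, window + 1)
    )


if __name__ == "__main__":
    p = window_alarm_prob(0.05)
    print(f"single-frame FPR 5% -> per-window false-alarm probability {p:.2e}")
```

Under these assumptions a 5% per-frame error rate becomes about 6 x 10^-6 per window.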
| Method | Accuracy | Real-time | Alert System | Year |
|---|---|---|---|---|
| Ijjina et al. (VGG-16) | 78.0% | No | No | 2019 |
| Singh & Mohan (CNN) | 82.0% | No | No | 2019 |
| Ghosh et al. (ResNet-50) | 89.5% | No | No | 2020 |
| Osman et al. (YOLOv4) | 91.2% | Yes | No | 2021 |
| Chen et al. (EfficientNet) | 94.3% | No | No | 2022 |
| Proposed (MobileNetV2) | 99.80% | Yes | Yes | 2025 |
The 4 misclassified samples (False Negatives) in the test set share common characteristics:
| Error Type | Count | Cause |
|---|---|---|
| Distant accidents | 2 | Small object size due to camera distance |
| Partial occlusion | 1 | Accident partially hidden by other vehicles |
| Unusual angle | 1 | Overhead view not well represented in training |
- Python 3.10+
- NVIDIA GPU with CUDA support (recommended)
- 8GB+ RAM
```bash
# Clone the repository
git clone https://github.com/arrya5/accident-detection-system.git
cd accident-detection-system

# Create virtual environment
python -m venv .venv

# Activate virtual environment
# Windows:
.venv\Scripts\activate
# Linux/Mac:
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt
```

```bash
# Real-time detection from webcam
python src/detect_pytorch.py --source 0

# Process video file
python src/detect_pytorch.py --source video.mp4 --output result.mp4

# With email alerts
python src/detect_pytorch.py --source video.mp4 \
    --email \
    --sender-email "alerts@example.com" \
    --sender-password "app-password" \
    --recipient-email "authority@example.com" \
    --camera-location "Highway Junction A"

# Verify model on test set
python src/verify_model_pytorch.py --data_path data --plot --export
```

| Argument | Description | Default |
|---|---|---|
| `--source` | Video source (file, webcam ID, RTSP URL) | Required |
| `--output` | Output video path | None |
| `--threshold` | Detection confidence threshold | 0.85 |
| `--email` | Enable email alerts | False |
| `--no-tta` | Disable Test-Time Augmentation | False |
| `--audio` | Enable audio alerts | False |
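The argument table can be mirrored with `argparse`. The sketch below is not the parser in `src/detect_pytorch.py` and omits the email-credential flags. The names `build_parser` and `parse_source` are hypothetical.

The one non-obvious detail is `--source`. OpenCV's `cv2.VideoCapture` treats an integer as a webcam index and a string as a file path or stream URL, so numeric strings are converted to `int`.

```python
import argparse


def parse_source(value: str):
    """Webcam IDs become ints; file paths and RTSP URLs stay strings."""
    return int(value) if value.isdigit() else value


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Real-time accident detection (sketch)")
    parser.add_argument("--source", required=True, type=parse_source,
                        help="Video source (file, webcam ID, RTSP URL)")
    parser.add_argument("--output", default=None, help="Output video path")
    parser.add_argument("--threshold", type=float, default=0.85,
                        help="Detection confidence threshold")
    parser.add_argument("--email", action="store_true", help="Enable email alerts")
    parser.add_argument("--no-tta", action="store_true",
                        help="Disable Test-Time Augmentation")
    parser.add_argument("--audio", action="store_true", help="Enable audio alerts")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args(["--source", "0", "--no-tta"]))
```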
| Limitation | Description | Potential Solution |
|---|---|---|
| Chaotic Traffic | Dense/erratic traffic patterns may trigger false positives | Fine-tune on region-specific data |
| Training Data Bias | Model trained primarily on Western traffic patterns | Expand dataset with diverse coverage |
| Lighting Conditions | Performance may vary in extreme lighting | Add low-light augmentation |
| Camera Angle Dependency | Optimized for overhead/side CCTV views | Train on multi-angle dataset |
| Occlusion Handling | Partially hidden accidents may not be detected | Integrate object tracking |
- Multi-region Deployment: Fine-tune on Indian, Chinese, and European traffic datasets
- Object Detection Integration: Add YOLOv8 for vehicle tracking before/after collision
- Motion-based Pre-filtering: Use optical flow to reduce computation on static scenes
- Web Dashboard: Develop centralized monitoring for multiple cameras
- Mobile Application: Dashcam integration for in-vehicle detection
- Edge Deployment: Optimize for NVIDIA Jetson Nano, Raspberry Pi
This research presents a comprehensive real-time accident detection system achieving 99.80% accuracy on a test set of 1,986 images. Key contributions include:
- High Accuracy: 99.80% test accuracy using transfer learning with MobileNetV2, the highest among the compared studies (which report results on different datasets)
- Real-time Capability: 25+ FPS processing enabling immediate detection
- Robust Detection: Temporal smoothing and TTA reduce false positives to near-zero
- Automated Alerts: Email notification system with visual evidence for rapid response
The system demonstrates the viability of deep learning for automated traffic safety monitoring and has potential for significant impact in reducing emergency response times.
- Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., & Chen, L. C. (2018). MobileNetV2: Inverted Residuals and Linear Bottlenecks. CVPR.
- World Health Organization. (2023). Global Status Report on Road Safety 2023.
- Ijjina, E. P., Chand, D., Gupta, S., & Goutham, K. (2019). Computer Vision-based Accident Detection in Traffic Surveillance. IEEE ITSC.
- He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep Residual Learning for Image Recognition. CVPR.
- Russakovsky, O., et al. (2015). ImageNet Large Scale Visual Recognition Challenge. IJCV.
- Simonyan, K., & Zisserman, A. (2015). Very Deep Convolutional Networks for Large-Scale Image Recognition. ICLR.
- Tan, M., & Le, Q. (2019). EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. ICML.
- Bochkovskiy, A., Wang, C. Y., & Liao, H. Y. M. (2020). YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv.
Arrya Thakur
B.Tech Computer Science
Minor Project - Real-Time Road Accident Detection System
This project is licensed under the MIT License - see the LICENSE file for details.
Made with love for Road Safety










