Detection • Tracking • Background Understanding
Research paper implementation — Vishwakarma Institute of Technology, Pune
This repository contains the implementation of a unified hybrid deep learning pipeline that simultaneously performs:
- Object Detection — Real-time detection using YOLOv8
- Multi-Object Tracking — Persistent identity tracking via Deep SORT
- Semantic Segmentation — Scene-level background understanding using transformer-based models
The pipeline is designed for real-time video analytics applications including autonomous driving, surveillance, and smart city infrastructure.
graph TD
A[Input Video Frame] --> B[YOLOv8 Detector]
A --> C[SegFormer / OneFormer]
B --> D[Bounding Boxes + Classes]
D --> E[Deep SORT Tracker]
E --> F[Tracked Objects with IDs]
C --> G[Semantic Segmentation Map]
F --> H[Unified Output Frame]
G --> H
H --> I[Gradio Interactive Demo]
style A fill:#1a1a2e,color:#fff
style H fill:#16213e,color:#fff
style I fill:#0f3460,color:#fff
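The diagram's data flow can be sketched in a few lines of Python. This is a minimal illustration, not the notebook's actual code: `process_frame`, `Detection`, `Track`, and `FrameResult` are hypothetical names, and the three models are passed in as plain callables so the wiring stays visible. The two branches are independent; here they run one after the other for simplicity.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class Detection:
    box: Box
    score: float
    label: str


@dataclass
class Track:
    track_id: int
    box: Box
    label: str


@dataclass
class FrameResult:
    tracks: List[Track]  # output of the detector -> tracker branch
    seg_map: Any         # per-pixel class IDs from the segmentation branch


def process_frame(
    frame: Any,
    detect: Callable[[Any], List[Detection]],
    track: Callable[[List[Detection], Any], List[Track]],
    segment: Callable[[Any], Any],
) -> FrameResult:
    """Run both branches of the diagram on one frame and merge the results."""
    # Branch 1: detector feeds the tracker (nodes B -> D -> E -> F).
    detections = detect(frame)
    tracks = track(detections, frame)
    # Branch 2: segmentation reads the same frame and does not depend
    # on branch 1 (nodes C -> G).
    seg_map = segment(frame)
    # Merge point (node H): both results travel together for rendering.
    return FrameResult(tracks=tracks, seg_map=seg_map)
```

Passing the models as callables keeps the sketch testable with stubs. In a real run they would wrap the detector, tracker, and segmentation model.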
| Module | Model | Purpose |
|---|---|---|
| Detection | YOLOv8 (Ultralytics) | Real-time object detection |
| Tracking | Deep SORT | Multi-object identity persistence |
| Segmentation | SegFormer / OneFormer | Scene understanding & background parsing |
| Demo UI | Gradio | Interactive web-based visualization |
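Connecting the detection and tracking modules needs a small format conversion. YOLOv8 results expose boxes as corner coordinates (`boxes.xyxy`), while the `deep-sort-realtime` package documents its `update_tracks` input as `([left, top, width, height], confidence, class)` tuples. The helper below is a sketch of that conversion. The name `to_deepsort_detections` and the `min_conf` threshold of 0.25 are assumptions made for illustration, and the notebook may do this differently.

```python
from typing import List, Sequence, Tuple

DeepSortDetection = Tuple[List[float], float, str]


def to_deepsort_detections(
    boxes_xyxy: Sequence[Sequence[float]],
    confidences: Sequence[float],
    class_names: Sequence[str],
    min_conf: float = 0.25,  # hypothetical threshold, tune per use case
) -> List[DeepSortDetection]:
    """Convert (x1, y1, x2, y2) boxes to ([left, top, w, h], conf, class)."""
    detections: List[DeepSortDetection] = []
    for (x1, y1, x2, y2), conf, name in zip(boxes_xyxy, confidences, class_names):
        if conf < min_conf:
            continue  # drop low-confidence boxes before they reach the tracker
        detections.append(([x1, y1, x2 - x1, y2 - y1], float(conf), name))
    return detections


# With Ultralytics and deep-sort-realtime installed, the inputs would
# typically come from something like (not executed here):
#   r = model(frame)[0]
#   dets = to_deepsort_detections(
#       r.boxes.xyxy.tolist(),
#       r.boxes.conf.tolist(),
#       [model.names[int(c)] for c in r.boxes.cls.tolist()],
#   )
#   tracks = tracker.update_tracks(dets, frame=frame)
```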
- Click the Open in Colab badge above
- Set runtime: Runtime → Change runtime type → GPU (T4)
- Run all cells top to bottom
- The last cell outputs a Gradio public URL — click it to launch the interactive demo
No local setup required. The notebook auto-installs all dependencies and downloads pretrained weights.
⚠️ Running the notebook locally requires a working CUDA/PyTorch environment.
# Clone the repository
git clone https://github.com/coolss21/Hybrid-Deep-Learning-Video-Analytics.git
cd Hybrid-Deep-Learning-Video-Analytics
# Install dependencies
pip install ultralytics deep-sort-realtime transformers gradio torch torchvision opencv-python numpy
# Launch Jupyter and open the notebook
jupyter notebook vid_analytics.ipynb
Hardware Requirements:
- NVIDIA GPU with CUDA support (GTX 1060+ recommended)
- 8GB+ RAM
- Python 3.8+
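Before opening the notebook locally, a quick check can confirm that the requirements above are met. The sketch below is an illustration only: `check_environment` is a hypothetical helper, and the module list mirrors the `pip install` line using each package's import name (for example `cv2` for `opencv-python`). It does not check RAM or the GPU model.

```python
import importlib.util
import sys

# Import names for the packages in the pip install command.
REQUIRED_MODULES = [
    "ultralytics",
    "deep_sort_realtime",
    "transformers",
    "gradio",
    "torch",
    "torchvision",
    "cv2",
    "numpy",
]


def check_environment(min_python=(3, 8)):
    """Report Python version, missing packages, and CUDA availability."""
    missing = [m for m in REQUIRED_MODULES if importlib.util.find_spec(m) is None]
    report = {
        "python_ok": sys.version_info >= min_python,
        "missing_modules": missing,
        "cuda_available": False,
    }
    if "torch" not in missing:
        try:
            import torch

            report["cuda_available"] = bool(torch.cuda.is_available())
        except Exception:
            # A broken PyTorch install counts as "no CUDA" for this check.
            report["cuda_available"] = False
    return report


if __name__ == "__main__":
    print(check_environment())
```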
All datasets used in this study are publicly available:
| Dataset | Task | Source |
|---|---|---|
| MS-COCO 2017 | Object Detection | cocodataset.org |
| MOT17 | Multi-Object Tracking | motchallenge.net |
| ADE20K | Semantic Segmentation | MIT CSAIL |
| BDD100K | Driving Analytics | bdd-data.berkeley.edu |
| Cityscapes | Urban Segmentation | cityscapes-dataset.com |
Note: Some datasets require registration. Follow each dataset's official terms.
Hybrid-Deep-Learning-Video-Analytics/
├── vid_analytics.ipynb # Complete pipeline notebook (Colab-ready)
├── README.md # Project documentation
└── LICENSE # MIT License
- Best results on Colab T4 GPU with cells executed sequentially
- Minor FPS/timing variations are expected across GPU types
- External weights are auto-downloaded and may be cached across Colab sessions
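To compare timing across GPU types, throughput can be measured with a short helper. This is a sketch under assumptions: `measure_fps` is a hypothetical name, `process` stands for whatever per-frame function is being timed, and the warm-up count of 3 is arbitrary. The first few calls are excluded because model loading and kernel initialization usually make them slower.

```python
import time
from typing import Any, Callable, Iterable


def measure_fps(process: Callable[[Any], Any], frames: Iterable[Any], warmup: int = 3) -> float:
    """Return average frames per second of `process`, ignoring warm-up calls."""
    frames = list(frames)
    for frame in frames[:warmup]:
        process(frame)  # warm-up iterations are run but not timed
    timed = frames[warmup:]
    if not timed:
        raise ValueError("need more frames than warm-up iterations")
    # Note: CUDA work is asynchronous, so a GPU pipeline should call
    # torch.cuda.synchronize() inside `process` for accurate timing.
    start = time.perf_counter()
    for frame in timed:
        process(frame)
    elapsed = time.perf_counter() - start
    return len(timed) / max(elapsed, 1e-9)
```

Running it on the same clip on each machine gives numbers that can be compared directly.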
If you use this code in your research, please cite:
@article{vayadande2026hybrid,
title = {Hybrid Deep Learning Pipeline for Real-Time Video Analytics
with Detection, Tracking, and Background Understanding},
author = {Vayadande, Kuldeep and others},
journal = {Pattern Analysis and Applications},
year = {2026}
}
This project is licensed under the MIT License.
Advancing real-time video intelligence through hybrid deep learning.