🎬 Hybrid Deep Learning Pipeline for Real-Time Video Analytics

Detection • Tracking • Background Understanding

Research paper implementation — Vishwakarma Institute of Technology, Pune

📌 Abstract

This repository contains the implementation of a unified hybrid deep learning pipeline that simultaneously performs:

- Object Detection: real-time detection using YOLOv8
- Multi-Object Tracking: persistent identity tracking via Deep SORT
- Semantic Segmentation: scene-level background understanding using transformer-based models

The pipeline is designed for real-time video analytics applications including autonomous driving, surveillance, and smart city infrastructure.

🧠 Pipeline Architecture

graph TD
    A[Input Video Frame] --> B[YOLOv8 Detector]
    A --> C[SegFormer / OneFormer]
    B --> D[Bounding Boxes + Classes]
    D --> E[Deep SORT Tracker]
    E --> F[Tracked Objects with IDs]
    C --> G[Semantic Segmentation Map]
    F --> H[Unified Output Frame]
    G --> H
    H --> I[Gradio Interactive Demo]

    style A fill:#1a1a2e,color:#fff
    style H fill:#16213e,color:#fff
    style I fill:#0f3460,color:#fff
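
The hand-off from the detector to the tracker in the diagram above can be sketched as a small format conversion. This is an illustration, not code taken from the notebook. It assumes the detector yields corner-format boxes (`x1, y1, x2, y2`), which is what Ultralytics exposes as `boxes.xyxy`, and that the tracker is the `deep-sort-realtime` package, whose documentation describes detections as `([left, top, width, height], confidence, class)` tuples. The function name `xyxy_to_deepsort` and the `min_confidence` threshold are hypothetical.

```python
from typing import List, Sequence, Tuple

# One detection in the shape deep-sort-realtime documents:
# ([left, top, width, height], confidence, class_label)
RawDetection = Tuple[List[float], float, str]


def xyxy_to_deepsort(
    boxes_xyxy: Sequence[Sequence[float]],
    confidences: Sequence[float],
    class_names: Sequence[str],
    min_confidence: float = 0.25,  # hypothetical threshold, not from the notebook
) -> List[RawDetection]:
    """Convert corner-format boxes to (ltwh, confidence, label) tuples."""
    detections: List[RawDetection] = []
    for (x1, y1, x2, y2), conf, name in zip(boxes_xyxy, confidences, class_names):
        if conf < min_confidence:
            continue  # drop low-confidence detections before tracking
        width, height = x2 - x1, y2 - y1
        if width <= 0 or height <= 0:
            continue  # skip degenerate boxes
        detections.append(
            ([float(x1), float(y1), float(width), float(height)], float(conf), name)
        )
    return detections


if __name__ == "__main__":
    # A single 100x200 "person" box with its top-left corner at (10, 20).
    print(xyxy_to_deepsort([[10, 20, 110, 220]], [0.9], ["person"]))
```

Under these assumptions, the returned list could be passed to `DeepSort.update_tracks(detections, frame=frame)`; check the installed package version before relying on that call.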

| Module | Model | Purpose |
| --- | --- | --- |
| Detection | YOLOv8 (Ultralytics) | Real-time object detection |
| Tracking | Deep SORT | Multi-object identity persistence |
| Segmentation | SegFormer / OneFormer | Scene understanding & background parsing |
| Demo UI | Gradio | Interactive web-based visualization |
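
One plausible way to combine the segmentation branch with the video frame into a unified output is an alpha blend of a colour-coded class map over the image. The sketch below assumes a per-pixel class-index map (the kind of output a semantic segmentation model produces after an argmax) and a colour palette. The function name `overlay_segmentation`, the palette, and the `alpha` default are illustrative and may differ from what the notebook does.

```python
import numpy as np


def overlay_segmentation(frame, class_map, palette, alpha=0.5):
    """Blend a colour-coded class map onto a frame.

    frame:     (H, W, 3) uint8 image
    class_map: (H, W) integer class index per pixel
    palette:   (num_classes, 3) uint8 colour per class (hypothetical values)
    alpha:     weight of the segmentation colours in the blend
    """
    if frame.shape[:2] != class_map.shape:
        raise ValueError("frame and class_map must share height and width")
    colours = palette[class_map]  # (H, W, 3) lookup via integer indexing
    blended = (1.0 - alpha) * frame.astype(np.float32) + alpha * colours.astype(np.float32)
    return np.clip(blended, 0, 255).astype(np.uint8)


if __name__ == "__main__":
    frame = np.full((2, 2, 3), 100, dtype=np.uint8)
    class_map = np.array([[0, 1], [1, 0]])
    palette = np.array([[0, 0, 0], [200, 200, 200]], dtype=np.uint8)
    print(overlay_segmentation(frame, class_map, palette))
```

Tracked boxes and IDs would then be drawn on top of the blended frame, for example with OpenCV drawing calls, so that they stay legible over the overlay.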

🚀 Quick Start (Google Colab — Recommended)

1. Click the Open in Colab badge above
2. Set the runtime: Runtime → Change runtime type → GPU (T4)
3. Run all cells top to bottom
4. The last cell outputs a public Gradio URL; click it to launch the interactive demo

No local setup required. The notebook auto-installs all dependencies and downloads pretrained weights.

💻 Local Setup (Advanced)

⚠️ Requires a working CUDA/PyTorch environment.

# Clone the repository
git clone https://github.com/coolss21/Hybrid-Deep-Learning-Video-Analytics.git
cd Hybrid-Deep-Learning-Video-Analytics

# Install dependencies
pip install ultralytics deep-sort-realtime transformers gradio torch torchvision opencv-python numpy

# Launch Jupyter and open the notebook
jupyter notebook vid_analytics.ipynb
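
Before opening the notebook locally, it can help to confirm that the packages from the pip command above are importable. The following is a minimal sketch using only the standard library. The helper name `missing_modules` is hypothetical, and the list uses import names rather than pip names (`opencv-python` imports as `cv2`, `deep-sort-realtime` as `deep_sort_realtime`).

```python
from importlib.util import find_spec

# Import names for the packages installed by the pip command above.
REQUIRED_MODULES = [
    "ultralytics", "deep_sort_realtime", "transformers", "gradio",
    "torch", "torchvision", "cv2", "numpy",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing:", ", ".join(missing) if missing else "none")
```

If nothing is missing, `torch.cuda.is_available()` is a quick follow-up check that PyTorch can see the GPU.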

Requirements:

- NVIDIA GPU with CUDA support (GTX 1060+ recommended)
- 8 GB+ RAM
- Python 3.8+

📊 Datasets

All datasets used in this study are publicly available:

| Dataset | Task | Source |
| --- | --- | --- |
| MS-COCO 2017 | Object Detection | cocodataset.org |
| MOT17 | Multi-Object Tracking | motchallenge.net |
| ADE20K | Semantic Segmentation | MIT CSAIL |
| BDD100K | Driving Analytics | bdd-data.berkeley.edu |
| Cityscapes | Urban Segmentation | cityscapes-dataset.com |

Note: Some datasets require registration. Follow each dataset's official terms.

📂 Repository Structure

Hybrid-Deep-Learning-Video-Analytics/
├── vid_analytics.ipynb    # Complete pipeline notebook (Colab-ready)
├── README.md              # Project documentation
└── LICENSE                # MIT License

📝 Reproducibility Notes

- For best results, run on a Colab T4 GPU and execute the cells in order
- Expect minor FPS and timing variations across GPU types
- External weights are downloaded automatically and may be cached across Colab sessions
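
To compare FPS across GPU types in a consistent way, one option is to time the per-frame processing function over a fixed set of frames and discard the first few iterations, which often include model warm-up. This is a sketch under those assumptions and not the notebook's own benchmarking code. `measure_fps`, `process_frame`, and the `warmup` default are hypothetical names and values.

```python
import time
from typing import Any, Callable, Iterable


def measure_fps(
    process_frame: Callable[[Any], Any],
    frames: Iterable[Any],
    warmup: int = 5,  # number of initial frames excluded from timing
) -> float:
    """Average frames per second of process_frame, ignoring warm-up frames."""
    timed = 0
    elapsed = 0.0
    for index, frame in enumerate(frames):
        start = time.perf_counter()
        process_frame(frame)
        duration = time.perf_counter() - start
        if index >= warmup:
            timed += 1
            elapsed += duration
    if timed == 0 or elapsed == 0.0:
        raise ValueError("not enough frames to time after warm-up")
    return timed / elapsed


if __name__ == "__main__":
    # Stand-in workload: roughly 10 ms per "frame".
    print(f"{measure_fps(lambda f: time.sleep(0.01), range(20)):.1f} FPS")
```

CUDA kernels run asynchronously, so when timing GPU inference the processing function should synchronize (for example with `torch.cuda.synchronize()`) before it returns; otherwise the measured time may understate the real cost.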

📖 Citation

If you use this code in your research, please cite:

@article{vayadande2026hybrid,
  title   = {Hybrid Deep Learning Pipeline for Real-Time Video Analytics
             with Detection, Tracking, and Background Understanding},
  author  = {Vayadande, Kuldeep and others},
  journal = {Pattern Analysis and Applications},
  year    = {2026}
}

📄 License

This project is licensed under the MIT License.

Advancing real-time video intelligence through hybrid deep learning.
