🎯 Multi-Object Detection & Tracking Pipeline

title	Multi-Object Detection and Tracking
emoji	🎯
colorFrom	green
colorTo	blue
sdk	gradio
app_file	app.py
pinned	false
license	mit
short_description	YOLOv8-based real-time multi-object tracking

📦 Deliverables

All required deliverables for this project have been compiled and organized in a single Google Drive folder.

🔗 Access Link: View Deliverables Folder

The folder includes:

Annotated output video
Original public video
Short technical report
Sample screenshots of results
Short demo video (3–5 minutes) explaining the approach

🎯 Multi-Object Detection & Tracking Pipeline

A production-ready, end-to-end Python pipeline for detecting, tracking, and annotating multiple objects in video — with audio preserved in the output.

🔗 Live Demo: Multi Object Tracker

💡 Motivation

This project demonstrates real-time multi-object tracking using modern detection + tracking pipelines, solving challenges like occlusion, ID switching, and audio preservation — common in surveillance, sports analytics, and autonomous systems.

🎥 Demo

📸 Features

YOLOv8 (Ultralytics) object detection (nano → extra-large variants)
ByteTrack (default) and DeepSORT multi-object tracking
Stable cross-frame IDs with occlusion handling
Motion trail visualization — fading path per object
Audio preserved in output MP4 via FFmpeg muxing
YouTube / URL download via yt-dlp
Gradio web UI — no code needed
CLI interface for batch / server use
CSV + JSON tracking logs per session
GPU / CPU / MPS auto-detection

🗂️ Project Structure

multi_object_tracker/
├── app.py                   ← Gradio web UI (Hugging Face Spaces entry point)
├── main.py                  ← CLI entry point
├── config.py                ← Central configuration dataclasses
├── packages.txt             ← System packages for HF Spaces (ffmpeg)
├── requirements.txt
│
├── detector/
│   └── detector.py          ← YOLOv8 wrapper: Detection, DetectorConfig, Detector
│
├── tracker/
│   └── tracker.py           ← ByteTrack / DeepSORT / IoU-fallback wrappers
│
├── draw/
│   └── draw.py              ← Bounding boxes, labels, trails, FPS HUD
│
├── utils/
│   ├── video_io.py          ← VideoReader, VideoWriter, mux_audio_into_video
│   └── logger_utils.py      ← FPSCounter, TrackingLogger (CSV + JSON)
│
├── report/
│   └── technical_report.md  ← 1–2 page technical write-up
│
└── output/                  ← Auto-created: annotated videos + logs

⚙️ Setup

Prerequisites

Python 3.10+
FFmpeg (for audio): sudo apt install ffmpeg · brew install ffmpeg

Install

git clone https://github.com/manojk909/multi-object-tracker.git
cd multi-object-tracker

python -m venv .venv
source .venv/bin/activate        # Windows: .venv\Scripts\activate

pip install -r requirements.txt

For CUDA GPU:

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt

🚀 How to Run

Option 1 — Web UI (Gradio)

python app.py
# → opens at http://localhost:7860

Upload a video or paste a YouTube URL and click Run Tracking.

Option 2 — CLI

# Local file — track people
python main.py --video path/to/video.mp4 --classes person

# YouTube URL — full auto download + track + audio preserved
python main.py --video "https://www.youtube.com/watch?v=VIDEO_ID"

# GPU + larger model + live preview
python main.py --video video.mp4 --model yolov8m.pt --device cuda --display

# Track vehicles only
python main.py --video traffic.mp4 --classes car bus truck motorcycle

# DeepSORT for appearance-based re-ID
python main.py --video myvideo.mp4 --tracker deepsort

# Skip every other frame (2× faster)
python main.py --video myvideo.mp4 --skip 1

# Skip audio muxing
python main.py --video myvideo.mp4 --no-audio

Full CLI reference

Argument	Default	Description
`--video`	(required)	File path or public URL
`--output-dir`	`output/`	Where to save results
`--model`	`yolov8n.pt`	YOLO weights variant
`--conf`	`0.35`	Detection confidence threshold
`--iou`	`0.45`	NMS IoU threshold
`--classes`	all	Space-separated class names
`--tracker`	`bytetrack`	`bytetrack` or `deepsort`
`--track-buffer`	`30`	Frames before lost track deletion
`--skip`	`0`	Frame skip (0 = every frame)
`--device`	`auto`	`cpu`, `cuda`, `mps`, `auto`
`--no-audio`	off	Skip FFmpeg audio muxing
`--no-trail`	off	Disable motion trails
`--display`	off	Show live preview window

📤 Output Files

File	Description
`output/<name>_tracked.mp4`	Annotated video with audio
`output/logs/<name>_tracking.csv`	Per-frame detection log
`output/logs/<name>_tracking.json`	Same data grouped by frame

CSV schema

frame_id, track_id, class_id, class_name, x1, y1, x2, y2, confidence

🔊 Audio Preservation (Fix)

OpenCV's VideoWriter cannot write audio streams — it always produces silent video.

Our fix: After the frame loop completes, we call FFmpeg to mux the original audio track directly into the annotated video without re-encoding the video stream:

[silent annotated video] ──┐
                            ├──► FFmpeg mux ──► final video with audio ✅
[original audio stream]  ──┘

FFmpeg must be installed on the system (packages.txt handles this on HF Spaces automatically).

🔍 Model Choices

Model	Speed	Accuracy	Use Case
`yolov8n.pt`	⚡ Fastest	Good	Real-time, CPU
`yolov8s.pt`	Fast	Better	Balanced default
`yolov8m.pt`	Medium	Great	Higher accuracy
`yolov8l.pt`	Slow	Excellent	Offline
`yolov8x.pt`	Slowest	Best	Max accuracy

Models download automatically on first use.

🔁 Tracker Choices

ByteTrack (default)

Uses ALL detections including low-confidence ones → better occlusion recovery
No appearance features needed → very fast
Best for: sports, surveillance, traffic

DeepSORT

Adds appearance embeddings (MobileNet) for re-ID after long occlusions
Slower but more robust when objects look similar and disappear
Best for: long occlusions, crowded scenes

🧩 Assumptions & Limitations

Assumptions

Input video readable by OpenCV (MP4, AVI, MOV, MKV, …)
YOLO model trained on COCO classes (80 categories)
Camera motion is moderate (extreme shake degrades IoU matching)

Limitations

No cross-camera tracking — IDs are per video only
Long disappearances (> track_buffer frames) reset the ID
Very small objects (< 8 px after resize) are rarely detected
Age-restricted or DRM YouTube videos cannot be downloaded

🛠️ Programmatic API

from detector.detector import Detector, DetectorConfig
from tracker.tracker import ObjectTracker, TrackerConfig
from draw.draw import FrameAnnotator

detector  = Detector(DetectorConfig(model_name="yolov8s.pt", confidence_threshold=0.4))
tracker   = ObjectTracker(TrackerConfig())
annotator = FrameAnnotator()

# In your frame loop:
detections = detector.detect(frame_bgr)
tracked    = tracker.update(detections, frame_bgr)
annotated  = annotator.annotate(frame_bgr, tracked, fps=30.0, frame_id=42)

🚀 Deploy to Hugging Face Spaces

Push this repo to GitHub
Go to huggingface.co/new-space
Choose Gradio SDK, link your GitHub repo
HF Spaces reads packages.txt → installs ffmpeg automatically
Reads requirements.txt → installs Python deps
Launches app.py — your app is live at https://huggingface.co/spaces/mk909/multi-object-tracker

📄 License

MIT

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

📦 Deliverables

🎯 Multi-Object Detection & Tracking Pipeline

💡 Motivation

🎥 Demo

📸 Features

🗂️ Project Structure

⚙️ Setup

Prerequisites

Install

🚀 How to Run

Option 1 — Web UI (Gradio)

Option 2 — CLI

Full CLI reference

📤 Output Files

CSV schema

🔊 Audio Preservation (Fix)

🔍 Model Choices

🔁 Tracker Choices

🧩 Assumptions & Limitations

🛠️ Programmatic API

🚀 Deploy to Hugging Face Spaces

📄 License

MIT

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
detector		detector
draw		draw
report		report
tracker		tracker
utils		utils
.gitattributes		.gitattributes
.gitignore		.gitignore
README.md		README.md
app.py		app.py
config.py		config.py
main.py		main.py
packages.txt		packages.txt
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

📦 Deliverables

🎯 Multi-Object Detection & Tracking Pipeline

💡 Motivation

🎥 Demo

📸 Features

🗂️ Project Structure

⚙️ Setup

Prerequisites

Install

🚀 How to Run

Option 1 — Web UI (Gradio)

Option 2 — CLI

Full CLI reference

📤 Output Files

CSV schema

🔊 Audio Preservation (Fix)

🔍 Model Choices

🔁 Tracker Choices

🧩 Assumptions & Limitations

🛠️ Programmatic API

🚀 Deploy to Hugging Face Spaces

📄 License

MIT

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages