| title | Multi-Object Detection and Tracking |
|---|---|
| emoji | 🎯 |
| colorFrom | green |
| colorTo | blue |
| sdk | gradio |
| app_file | app.py |
| pinned | false |
| license | mit |
| short_description | YOLOv8-based real-time multi-object tracking |
All required deliverables for this project have been compiled and organized in a single Google Drive folder.
🔗 Access Link: View Deliverables Folder
The folder includes:
- Annotated output video
- Original public video
- Short technical report
- Sample screenshots of results
- Short demo video (3–5 minutes) explaining the approach
A production-ready, end-to-end Python pipeline for detecting, tracking, and annotating multiple objects in video — with audio preserved in the output.
🔗 Live Demo: Multi Object Tracker
This project demonstrates real-time multi-object tracking with a detection-plus-tracking pipeline. It addresses occlusion, ID switching, and loss of audio in the annotated output, problems that come up in surveillance, sports analytics, and autonomous systems.
- YOLOv8 (Ultralytics) object detection (nano → extra-large variants)
- ByteTrack (default) and DeepSORT multi-object tracking
- Stable cross-frame IDs with occlusion handling
- Motion trail visualization — fading path per object
- Audio preserved in output MP4 via FFmpeg muxing
- YouTube / URL download via yt-dlp
- Gradio web UI — no code needed
- CLI interface for batch / server use
- CSV + JSON tracking logs per session
- GPU / CPU / MPS auto-detection
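
The device auto-detection in the last bullet can be pictured as a simple priority order. The sketch below is an assumption about how such a resolver might look, not the project's code: `resolve_device` is a hypothetical helper, and the availability flags are passed in as plain booleans so the logic can be read without PyTorch installed. In a real pipeline they would typically come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.

```python
def resolve_device(requested: str, cuda_available: bool, mps_available: bool) -> str:
    """Turn a --device value into a concrete device string (hypothetical helper)."""
    requested = requested.lower()
    if requested == "auto":
        # Assumed priority: CUDA first, then Apple MPS, then CPU.
        if cuda_available:
            return "cuda"
        if mps_available:
            return "mps"
        return "cpu"
    # Explicit request: fall back to CPU if the accelerator is missing.
    if requested == "cuda" and not cuda_available:
        return "cpu"
    if requested == "mps" and not mps_available:
        return "cpu"
    return requested


# Example: a laptop with MPS but no CUDA.
chosen = resolve_device("auto", cuda_available=False, mps_available=True)
```

Falling back to CPU on an unavailable explicit request is one possible design; raising an error instead would make misconfiguration more visible.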
multi_object_tracker/
├── app.py ← Gradio web UI (Hugging Face Spaces entry point)
├── main.py ← CLI entry point
├── config.py ← Central configuration dataclasses
├── packages.txt ← System packages for HF Spaces (ffmpeg)
├── requirements.txt
│
├── detector/
│ └── detector.py ← YOLOv8 wrapper: Detection, DetectorConfig, Detector
│
├── tracker/
│ └── tracker.py ← ByteTrack / DeepSORT / IoU-fallback wrappers
│
├── draw/
│ └── draw.py ← Bounding boxes, labels, trails, FPS HUD
│
├── utils/
│ ├── video_io.py ← VideoReader, VideoWriter, mux_audio_into_video
│ └── logger_utils.py ← FPSCounter, TrackingLogger (CSV + JSON)
│
├── report/
│ └── technical_report.md ← 1–2 page technical write-up
│
└── output/ ← Auto-created: annotated videos + logs
- Python 3.10+
- FFmpeg (for audio): `sudo apt install ffmpeg` (Debian/Ubuntu) or `brew install ffmpeg` (macOS)
git clone https://github.com/manojk909/multi-object-tracker.git
cd multi-object-tracker
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt

For CUDA GPU:

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt

python app.py
# → opens at http://localhost:7860

Upload a video or paste a YouTube URL and click Run Tracking.

# Local file — track people
python main.py --video path/to/video.mp4 --classes person
# YouTube URL — full auto download + track + audio preserved
python main.py --video "https://www.youtube.com/watch?v=VIDEO_ID"
# GPU + larger model + live preview
python main.py --video video.mp4 --model yolov8m.pt --device cuda --display
# Track vehicles only
python main.py --video traffic.mp4 --classes car bus truck motorcycle
# DeepSORT for appearance-based re-ID
python main.py --video myvideo.mp4 --tracker deepsort
# Skip every other frame (2× faster)
python main.py --video myvideo.mp4 --skip 1
# Skip audio muxing
python main.py --video myvideo.mp4 --no-audio

| Argument | Default | Description |
|---|---|---|
| `--video` | (required) | File path or public URL |
| `--output-dir` | `output/` | Where to save results |
| `--model` | `yolov8n.pt` | YOLO weights variant |
| `--conf` | `0.35` | Detection confidence threshold |
| `--iou` | `0.45` | NMS IoU threshold |
| `--classes` | all | Space-separated class names |
| `--tracker` | `bytetrack` | `bytetrack` or `deepsort` |
| `--track-buffer` | `30` | Frames before lost track deletion |
| `--skip` | `0` | Frame skip (0 = every frame) |
| `--device` | `auto` | `cpu`, `cuda`, `mps`, `auto` |
| `--no-audio` | off | Skip FFmpeg audio muxing |
| `--no-trail` | off | Disable motion trails |
| `--display` | off | Show live preview window |

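
The arguments listed above map naturally onto the standard-library `argparse` module. The following is a minimal sketch under that assumption; it mirrors the documented flags and defaults, but `build_parser` is a hypothetical name and the actual `main.py` may be organized differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented CLI flags (illustrative only)."""
    p = argparse.ArgumentParser(prog="main.py", description="Multi-object tracking")
    p.add_argument("--video", required=True, help="File path or public URL")
    p.add_argument("--output-dir", default="output/", help="Where to save results")
    p.add_argument("--model", default="yolov8n.pt", help="YOLO weights variant")
    p.add_argument("--conf", type=float, default=0.35, help="Detection confidence threshold")
    p.add_argument("--iou", type=float, default=0.45, help="NMS IoU threshold")
    # No --classes given means "track all classes".
    p.add_argument("--classes", nargs="+", default=None, help="Space-separated class names")
    p.add_argument("--tracker", choices=["bytetrack", "deepsort"], default="bytetrack")
    p.add_argument("--track-buffer", type=int, default=30, help="Frames before lost track deletion")
    p.add_argument("--skip", type=int, default=0, help="Frame skip (0 = every frame)")
    p.add_argument("--device", choices=["auto", "cpu", "cuda", "mps"], default="auto")
    p.add_argument("--no-audio", action="store_true", help="Skip FFmpeg audio muxing")
    p.add_argument("--no-trail", action="store_true", help="Disable motion trails")
    p.add_argument("--display", action="store_true", help="Show live preview window")
    return p


# Parse an explicit argument list instead of sys.argv so the example is reproducible.
args = build_parser().parse_args(
    ["--video", "traffic.mp4", "--classes", "car", "bus", "--skip", "1"]
)
```

Here `args.classes` is a list of names, and the three boolean switches default to `False`, which corresponds to "off" in the table.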
| File | Description |
|---|---|
| `output/<name>_tracked.mp4` | Annotated video with audio |
| `output/logs/<name>_tracking.csv` | Per-frame detection log |
| `output/logs/<name>_tracking.json` | Same data grouped by frame |

frame_id, track_id, class_id, class_name, x1, y1, x2, y2, confidence
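
To make the relationship between the two log files concrete, here is a small sketch that writes rows with the columns listed above and regroups them by frame. It is an illustration only: the helper names are hypothetical, the sample rows are made-up values, and the exact JSON layout produced by `TrackingLogger` may differ.

```python
import csv
import io
import json
from collections import defaultdict

FIELDS = ["frame_id", "track_id", "class_id", "class_name",
          "x1", "y1", "x2", "y2", "confidence"]


def rows_to_csv(rows: list[dict]) -> str:
    """Serialize tracking rows to CSV text with the documented column order."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def group_by_frame(rows: list[dict]) -> dict:
    """Group rows under their frame_id (assumed JSON layout, not the project's)."""
    frames = defaultdict(list)
    for row in rows:
        entry = {k: v for k, v in row.items() if k != "frame_id"}
        frames[row["frame_id"]].append(entry)
    return dict(frames)


# Made-up sample rows for illustration.
rows = [
    {"frame_id": 0, "track_id": 1, "class_id": 0, "class_name": "person",
     "x1": 10, "y1": 20, "x2": 110, "y2": 220, "confidence": 0.91},
    {"frame_id": 0, "track_id": 2, "class_id": 2, "class_name": "car",
     "x1": 300, "y1": 150, "x2": 480, "y2": 260, "confidence": 0.84},
    {"frame_id": 1, "track_id": 1, "class_id": 0, "class_name": "person",
     "x1": 12, "y1": 21, "x2": 112, "y2": 221, "confidence": 0.89},
]

csv_text = rows_to_csv(rows)
json_text = json.dumps(group_by_frame(rows), indent=2)
```

Note that `json.dumps` turns the integer frame keys into strings, so a consumer reading the JSON back would look up `"0"` rather than `0`.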
OpenCV's VideoWriter cannot write audio streams — it always produces silent video.
Our fix: After the frame loop completes, we call FFmpeg to mux the original audio track directly into the annotated video without re-encoding the video stream:
[silent annotated video] ──┐
├──► FFmpeg mux ──► final video with audio ✅
[original audio stream] ──┘
FFmpeg must be installed on the system (packages.txt handles this on HF Spaces automatically).
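
The mux step described above can be sketched as a single FFmpeg invocation. This is a hedged example rather than the project's `mux_audio_into_video`: the function names are hypothetical, and re-encoding the audio to AAC is an assumption made for MP4 compatibility. The flags themselves (`-map`, `-c:v copy`, `-c:a`, `-shortest`) are standard FFmpeg options.

```python
import shutil
import subprocess


def build_mux_command(silent_video: str, source_video: str, output_path: str) -> list[str]:
    """Build an FFmpeg command that copies video from one file and audio from another."""
    return [
        "ffmpeg", "-y",          # overwrite the output if it exists
        "-i", silent_video,      # input 0: annotated video without audio
        "-i", source_video,      # input 1: original video with audio
        "-map", "0:v:0",         # take the video stream from input 0
        "-map", "1:a:0?",        # take audio from input 1; '?' tolerates a missing track
        "-c:v", "copy",          # do not re-encode the video stream
        "-c:a", "aac",           # assumption: encode audio to AAC for MP4 containers
        "-shortest",             # stop at the shorter of the two streams
        output_path,
    ]


def mux_audio(silent_video: str, source_video: str, output_path: str) -> bool:
    """Run the mux; return False if FFmpeg is missing or exits with an error."""
    if shutil.which("ffmpeg") is None:
        return False
    result = subprocess.run(
        build_mux_command(silent_video, source_video, output_path),
        capture_output=True,
    )
    return result.returncode == 0


cmd = build_mux_command("annotated_silent.mp4", "original.mp4", "final.mp4")
```

Keeping command construction separate from execution makes the command easy to log or inspect when a mux fails.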
| Model | Speed | Accuracy | Use Case |
|---|---|---|---|
| `yolov8n.pt` | ⚡ Fastest | Good | Real-time, CPU |
| `yolov8s.pt` | Fast | Better | Balanced default |
| `yolov8m.pt` | Medium | Great | Higher accuracy |
| `yolov8l.pt` | Slow | Excellent | Offline |
| `yolov8x.pt` | Slowest | Best | Max accuracy |

Models download automatically on first use.
ByteTrack (default)
- Uses ALL detections including low-confidence ones → better occlusion recovery
- No appearance features needed → very fast
- Best for: sports, surveillance, traffic
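
The idea of using low-confidence detections can be illustrated with a two-pass association. The sketch below is a simplification under stated assumptions: it uses greedy IoU matching on raw boxes, whereas ByteTrack itself matches Kalman-predicted boxes with the Hungarian algorithm. The function names and thresholds are illustrative, not taken from this project.

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks: dict[int, Box], boxes: list[Box], min_iou: float) -> dict[int, int]:
    """Pair track ids with box indices, highest IoU first (stand-in for Hungarian)."""
    pairs = sorted(
        ((iou(tbox, box), tid, j) for tid, tbox in tracks.items() for j, box in enumerate(boxes)),
        reverse=True,
    )
    matches: dict[int, int] = {}
    used_boxes: set[int] = set()
    for score, tid, j in pairs:
        if score < min_iou:
            break  # remaining pairs overlap even less
        if tid in matches or j in used_boxes:
            continue
        matches[tid] = j
        used_boxes.add(j)
    return matches


def associate(tracks, detections, high_thresh=0.5, min_iou=0.3) -> dict[int, Box]:
    """Pass 1 matches confident detections; pass 2 offers weak ones to leftover tracks."""
    high = [box for box, conf in detections if conf >= high_thresh]
    low = [box for box, conf in detections if conf < high_thresh]
    first = greedy_match(tracks, high, min_iou)
    assigned = {tid: high[j] for tid, j in first.items()}
    leftover = {tid: box for tid, box in tracks.items() if tid not in first}
    second = greedy_match(leftover, low, min_iou)
    assigned.update({tid: low[j] for tid, j in second.items()})
    return assigned


# Track 2 is partly occluded, so its detection has low confidence (0.2).
tracks = {1: (0, 0, 100, 100), 2: (200, 200, 300, 300)}
detections = [((5, 5, 105, 105), 0.9), ((205, 205, 305, 305), 0.2)]
assigned = associate(tracks, detections)
```

A tracker that discarded everything below `high_thresh` would leave track 2 unmatched in this frame; the second pass keeps its ID alive.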
DeepSORT
- Adds appearance embeddings (MobileNet) for re-ID after long occlusions
- Slower but more robust when objects look similar and disappear
- Best for: long occlusions, crowded scenes
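
Appearance-based re-identification comes down to comparing embedding vectors. The following sketch shows the general idea with cosine distance against a per-track gallery of past embeddings; it is not the DeepSORT library's code. The helper names, the toy 3-dimensional vectors, and the 0.2 threshold are assumptions for illustration, and real embeddings have far more dimensions.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; 0 means identical direction, 1 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat a zero vector as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)


def reidentify(gallery: dict[int, list[list[float]]], query: list[float],
               max_distance: float = 0.2):
    """Return the track id whose closest stored embedding is within max_distance, else None."""
    best_id, best_dist = None, max_distance
    for track_id, embeddings in gallery.items():
        dist = min(cosine_distance(e, query) for e in embeddings)
        if dist < best_dist:
            best_id, best_dist = track_id, dist
    return best_id


# Toy gallery: track 7 and track 9 have clearly different appearance vectors.
gallery = {7: [[1.0, 0.0, 0.0]], 9: [[0.0, 1.0, 0.0]]}
same_object = reidentify(gallery, [0.9, 0.1, 0.0])     # close to track 7
new_object = reidentify(gallery, [0.5, 0.5, 0.7])      # close to neither
```

Because the comparison ignores position, an object can recover its ID even after reappearing far from where it was last seen, at the cost of running an embedding network on every crop.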
Assumptions
- Input video readable by OpenCV (MP4, AVI, MOV, MKV, …)
- YOLO model trained on COCO classes (80 categories)
- Camera motion is moderate (extreme shake degrades IoU matching)
Limitations
- No cross-camera tracking — IDs are per video only
- Long disappearances (> `track_buffer` frames) reset the ID
- Very small objects (< 8 px after resize) are rarely detected
- Age-restricted or DRM YouTube videos cannot be downloaded
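
The `track_buffer` limitation can be read as a simple expiry rule: a track that has not been matched for more than `track_buffer` frames is dropped, so a returning object starts over with a new ID. The sketch below illustrates that rule only; `Track` and `prune_lost_tracks` are hypothetical names, and the actual trackers keep more state per track.

```python
from dataclasses import dataclass


@dataclass
class Track:
    track_id: int
    last_seen_frame: int  # most recent frame in which the track was matched


def prune_lost_tracks(tracks: list[Track], current_frame: int,
                      track_buffer: int = 30) -> list[Track]:
    """Keep tracks matched within the last track_buffer frames; drop the rest."""
    return [t for t in tracks if current_frame - t.last_seen_frame <= track_buffer]


# At frame 100, track 1 was just seen and track 2 has been missing for 40 frames.
tracks = [Track(track_id=1, last_seen_frame=100), Track(track_id=2, last_seen_frame=60)]
alive = prune_lost_tracks(tracks, current_frame=100, track_buffer=30)
```

Raising `--track-buffer` keeps IDs alive through longer gaps, but it also gives stale tracks more time to be matched to the wrong object.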
from detector.detector import Detector, DetectorConfig
from tracker.tracker import ObjectTracker, TrackerConfig
from draw.draw import FrameAnnotator
detector = Detector(DetectorConfig(model_name="yolov8s.pt", confidence_threshold=0.4))
tracker = ObjectTracker(TrackerConfig())
annotator = FrameAnnotator()
# In your frame loop:
detections = detector.detect(frame_bgr)
tracked = tracker.update(detections, frame_bgr)
annotated = annotator.annotate(frame_bgr, tracked, fps=30.0, frame_id=42)

- Push this repo to GitHub
- Go to huggingface.co/new-space
- Choose Gradio SDK, link your GitHub repo
- HF Spaces reads `packages.txt` → installs `ffmpeg` automatically
- Reads `requirements.txt` → installs Python deps
- Launches `app.py`; your app is live at `https://huggingface.co/spaces/mk909/multi-object-tracker`

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference