
Real-Time Object Detection (YOLO11)

Real-time webcam / video / RTSP object detection with Ultralytics YOLO11 and OpenCV. Auto-selects the best available device (CUDA → Apple Silicon MPS → CPU), draws boxes + labels + FPS overlay, optionally records the annotated stream, and writes detections to a rotating log.

Python Ultralytics YOLO OpenCV License: MIT

🌐 Languages: English · 中文


English

Demo

Demo GIF coming soon. Record one in under a minute with the recipe below — the workflow is python detect.py --duration 12 --save assets/demo.mp4 … then a single ffmpeg step.

Features

  • 80-class COCO detection with the latest YOLO11 family — swap any variant via --model (yolo11n / yolo11s / yolo11m / yolo11l / yolo11x).
  • Auto device selection — picks cuda if you have an NVIDIA GPU, else Apple Silicon mps, else cpu. No code change needed across machines.
  • Multiple sources — webcam (--source 0), local video files, or RTSP / HTTP streams.
  • Rotating log — detection events go to detection_log.txt with size-based rotation (no infinite growth).
  • Optional annotated recording — --save out.mp4 writes the live overlay to disk for replay or sharing.
  • HUD overlay — current time, rolling FPS, active device.
  • Class filter — restrict detection to specific COCO classes (e.g. --classes 0 for people only).
  • Headless mode — --no-display runs without a preview window, ideal for servers or batch processing.
  • Graceful shutdown on Ctrl+C and SIGTERM.
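
The device-selection order described above can be sketched as a small pure helper; in the real script the two booleans would come from torch.cuda.is_available() and torch.backends.mps.is_available() (the function name here is illustrative, not detect.py's actual API):

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Mirror the documented priority: CUDA -> Apple Silicon MPS -> CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"

# e.g. pick_device(False, True) → "mps" on an Apple Silicon machine
```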

Quick Start

git clone https://github.com/kairwang01/Computer-Vision-python.git
cd Computer-Vision-python

# Recommended: virtual environment
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# Run with the default webcam, default model (YOLO11n auto-downloads on first run)
python detect.py
# Press 'q' in the preview window to quit

CLI Options

| Flag | Default | Description |
|------|---------|-------------|
| --source | 0 | Camera index, video path, or RTSP / HTTP URL |
| --model | yolo11n.pt | YOLO weights — auto-downloaded by Ultralytics |
| --conf | 0.4 | Confidence threshold for displaying / logging |
| --iou | 0.5 | IoU threshold for non-maximum suppression |
| --device | auto | auto / cpu / cuda / mps |
| --imgsz | 640 | Inference image size (square) |
| --classes | (all) | COCO class indices to keep, e.g. --classes 0 2 7 (person, car, truck) |
| --save | (off) | Path to save annotated MP4, e.g. --save out.mp4 |
| --log-file | detection_log.txt | Rotating log path |
| --no-display | false | Headless mode |
| --max-fps | 0 (uncapped) | Soft FPS cap |
| --duration | 0 (no limit) | Auto-stop after N seconds (handy for demos) |
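
A subset of these flags could be declared with stdlib argparse roughly as follows (a sketch with matching defaults, not necessarily detect.py's exact definitions):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Subset of the flag table; defaults match the documented values.
    p = argparse.ArgumentParser(description="Real-time YOLO11 detection (sketch)")
    p.add_argument("--source", default="0", help="camera index, video path, or URL")
    p.add_argument("--model", default="yolo11n.pt", help="YOLO weights file")
    p.add_argument("--conf", type=float, default=0.4, help="confidence threshold")
    p.add_argument("--iou", type=float, default=0.5, help="NMS IoU threshold")
    p.add_argument("--classes", type=int, nargs="*", default=None,
                   help="COCO class indices to keep")
    p.add_argument("--no-display", action="store_true", help="headless mode")
    return p

args = build_parser().parse_args(["--conf", "0.25", "--classes", "0", "2", "7"])
# args.conf == 0.25, args.classes == [0, 2, 7]
```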

Examples

# Webcam, larger model, lower threshold (catch more)
python detect.py --model yolo11s.pt --conf 0.25

# Process a video file and save the annotated output
python detect.py --source clip.mp4 --save out.mp4

# RTSP camera, headless, log to a custom file, only people
python detect.py --source rtsp://cam.local/stream --no-display \
                 --classes 0 --log-file logs/people.txt

# Force CPU even if CUDA / MPS is available (e.g. for benchmarking)
python detect.py --device cpu

How It Works

┌────────────────┐     ┌──────────────────────┐     ┌────────────────┐
│ Source         │ ──▶ │ YOLO11 inference     │ ──▶ │ OpenCV render  │
│ webcam / file  │     │ (auto cuda/mps/cpu)  │     │ + HUD overlay  │
│ RTSP / HTTP    │     │ conf + iou + classes │     │                │
└────────────────┘     └──────────────────────┘     └────────────────┘
                                                            │
                              ┌─────────────────────────────┴────────┐
                              ▼                                      ▼
                  ┌────────────────────┐                  ┌────────────────────┐
                  │ Rotating log file  │                  │ Optional MP4 writer│
                  │ (auto-rotated)     │                  │ (--save out.mp4)   │
                  └────────────────────┘                  └────────────────────┘

Recording a demo

The --duration flag exits cleanly after N seconds, which makes it trivial to capture a short clip and convert it to a GIF for the README. Requires ffmpeg (brew install ffmpeg / apt install ffmpeg).

mkdir -p assets

# 1. Record a 12-second annotated MP4 from the webcam (no preview window)
python detect.py --duration 12 --max-fps 15 --no-display --save assets/demo.mp4

# 2. Convert MP4 → optimized GIF (~720px wide, 15 fps)
ffmpeg -i assets/demo.mp4 \
       -vf "fps=15,scale=720:-1:flags=lanczos,split[a][b];[a]palettegen[p];[b][p]paletteuse" \
       -loop 0 assets/demo.gif

# 3. Wire it into the README and commit (BSD sed shown; on Linux use sed -i without the '')
sed -i '' "s|_Demo GIF coming soon.*|![demo](assets/demo.gif)|" README.md
git add assets/demo.gif README.md && git commit -m "docs: add demo GIF"

Tech Stack

| Layer | Choice |
|-------|--------|
| Language | Python 3.10+ |
| Detection model | Ultralytics YOLO11 |
| Inference backend | PyTorch (auto-selected: CUDA / MPS / CPU) |
| Video I/O + drawing | OpenCV ≥ 4.10 |
| Logging | logging.handlers.RotatingFileHandler (stdlib) |
| CLI | argparse (stdlib) |
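
The rotating log needs only the stdlib handler named above; a minimal sketch of such a setup (the function name, size limit, and format string are illustrative, not the script's exact configuration):

```python
import logging
from logging.handlers import RotatingFileHandler

def make_detection_logger(path: str = "detection_log.txt",
                          max_bytes: int = 1_000_000,
                          backups: int = 3) -> logging.Logger:
    # Rotates path -> path.1 -> path.2 ... once max_bytes is exceeded,
    # so the log never grows without bound.
    logger = logging.getLogger("detections")
    logger.setLevel(logging.INFO)
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
    logger.addHandler(handler)
    return logger
```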

Project Structure

.
├── detect.py            Main entry — CLI, capture loop, inference, render, log
├── requirements.txt     Pinned major-version constraints
├── assets/              Demo GIF goes here (recording recipe above)
├── .gitignore           Excludes weights, logs, caches, output videos
├── LICENSE              MIT
└── README.md

Performance Reference

Numbers are rough indicators on common hardware with yolo11n.pt at --imgsz 640. Your mileage will vary with frame size and scene object density.

| Hardware | Device | FPS (typical) |
|----------|--------|---------------|
| Apple Silicon M2 / M3 | mps | ~30–60 |
| NVIDIA RTX 3060+ | cuda | ~60–120 |
| Modern CPU only | cpu | ~10–20 |

For tighter latency targets, use yolo11n with smaller --imgsz. For higher accuracy, switch to yolo11s / yolo11m and accept lower FPS.
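
A quick sanity check when picking a latency target: at a given FPS, inference plus rendering must fit inside the per-frame time budget of 1000 / fps milliseconds (helper name is illustrative):

```python
def frame_budget_ms(target_fps: float) -> float:
    # Time available per frame, in milliseconds, at a given FPS target.
    return 1000.0 / target_fps

# 30 FPS leaves about 33.3 ms per frame for inference + drawing
```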

Acknowledgements

Author

Kair Wang (@kairwang01)


中文 (Chinese)

Demo

Demo GIF coming soon. The commands below record one in under a minute: the core is python detect.py --duration 12 --save assets/demo.mp4 plus one ffmpeg step.

Features

  • 80-class COCO detection with the latest YOLO11 family — switch variants with a single --model flag (yolo11n / yolo11s / yolo11m / yolo11l / yolo11x).
  • Automatic device selection — prefers cuda, then Apple Silicon mps, with cpu as the fallback. No code changes when switching machines.
  • Multiple sources — webcam (--source 0), local video files, or RTSP / HTTP streams.
  • Rotating log — detection events go to detection_log.txt with size-based rotation, so it never grows without bound.
  • Optional annotated recording — --save out.mp4 writes the boxed live view to disk for replay or sharing.
  • HUD overlay — current time, rolling-window FPS, active device.
  • Class filter — keep only the specified COCO classes (e.g. --classes 0 detects people only).
  • Headless mode — --no-display skips the preview window, suited to servers and batch jobs.
  • Graceful shutdown — handles Ctrl+C and SIGTERM.

Quick Start

git clone https://github.com/kairwang01/Computer-Vision-python.git
cd Computer-Vision-python

# Recommended: virtual environment
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# Default webcam + default model (YOLO11n auto-downloads on first run)
python detect.py
# Press 'q' in the preview window to quit

CLI Options

| Flag | Default | Description |
|------|---------|-------------|
| --source | 0 | Camera index, video path, or RTSP / HTTP URL |
| --model | yolo11n.pt | YOLO weights — auto-downloaded by Ultralytics |
| --conf | 0.4 | Confidence threshold for display / logging |
| --iou | 0.5 | IoU threshold for NMS |
| --device | auto | auto / cpu / cuda / mps |
| --imgsz | 640 | Inference image size (square) |
| --classes | (all) | COCO class indices to keep, e.g. --classes 0 2 7 (person, car, truck) |
| --save | (off) | Path for the annotated MP4, e.g. --save out.mp4 |
| --log-file | detection_log.txt | Rotating log path |
| --no-display | false | Headless mode |
| --max-fps | 0 (uncapped) | Soft FPS cap |
| --duration | 0 (no limit) | Auto-stop after N seconds (demo-friendly) |

Examples

# Webcam, larger model, lower threshold (catch more)
python detect.py --model yolo11s.pt --conf 0.25

# Process a video file and save the annotated output
python detect.py --source clip.mp4 --save out.mp4

# RTSP camera, headless, people only, custom log file
python detect.py --source rtsp://cam.local/stream --no-display \
                 --classes 0 --log-file logs/people.txt

# Force CPU even when CUDA / MPS is available (e.g. for benchmarking)
python detect.py --device cpu

How It Works

┌────────────────┐     ┌──────────────────────┐     ┌────────────────┐
│ Source         │ ──▶ │ YOLO11 inference     │ ──▶ │ OpenCV render  │
│ webcam / file  │     │ (auto cuda/mps/cpu)  │     │ + HUD overlay  │
│ RTSP / HTTP    │     │ conf + iou + classes │     │                │
└────────────────┘     └──────────────────────┘     └────────────────┘
                                                            │
                              ┌─────────────────────────────┴────────┐
                              ▼                                      ▼
                  ┌────────────────────┐                  ┌────────────────────┐
                  │ Rotating log file  │                  │ Optional MP4 writer│
                  │ (auto-rotated)     │                  │ (--save out.mp4)   │
                  └────────────────────┘                  └────────────────────┘

Recording a demo

The --duration flag makes the program exit cleanly after N seconds, so capturing a short clip and converting it to a demo GIF takes a single ffmpeg command. Requires ffmpeg (brew install ffmpeg / apt install ffmpeg).

mkdir -p assets

# 1. Record a 12-second annotated MP4 from the webcam (no preview window)
python detect.py --duration 12 --max-fps 15 --no-display --save assets/demo.mp4

# 2. Convert MP4 → optimized GIF (~720px wide, 15 fps)
ffmpeg -i assets/demo.mp4 \
       -vf "fps=15,scale=720:-1:flags=lanczos,split[a][b];[a]palettegen[p];[b][p]paletteuse" \
       -loop 0 assets/demo.gif

# 3. Replace the README placeholder line with the image and commit (BSD sed shown; on Linux use sed -i without the '')
sed -i '' "s|Demo GIF .*|![demo](assets/demo.gif)|" README.md
git add assets/demo.gif README.md && git commit -m "docs: add demo GIF"

Tech Stack

| Layer | Choice |
|-------|--------|
| Language | Python 3.10+ |
| Detection model | Ultralytics YOLO11 |
| Inference backend | PyTorch (auto-selected: CUDA / MPS / CPU) |
| Video I/O + drawing | OpenCV ≥ 4.10 |
| Logging | logging.handlers.RotatingFileHandler (stdlib) |
| CLI | argparse (stdlib) |

Project Structure

.
├── detect.py            Main entry — CLI, capture loop, inference, render, log
├── requirements.txt     Pinned major-version constraints
├── assets/              Demo GIF goes here (recording recipe above)
├── .gitignore           Excludes weights, logs, caches, output videos
├── LICENSE              MIT
└── README.md

Performance Reference

Rough FPS figures for yolo11n.pt at --imgsz 640 on common hardware. Actual numbers depend on frame size and scene object density.

| Hardware | Device | FPS (typical) |
|----------|--------|---------------|
| Apple Silicon M2 / M3 | mps | ~30–60 |
| NVIDIA RTX 3060+ | cuda | ~60–120 |
| CPU only | cpu | ~10–20 |

For lower latency, use yolo11n with a smaller --imgsz. For higher accuracy, switch to yolo11s / yolo11m and accept lower FPS.

Acknowledgements

Author

Kair Wang (@kairwang01)
