MobileNetV2 Object Detection Toolkit

A desktop application for training and running MobileNetV2 SSD-Lite object detection models. Built with PyQt5 and powered by pytorch-ssd.

Features

| Feature | Description |
|---|---|
| Annotation | Built-in YOLO-format labelling tool (labelimg2) |
| Dataset prep | Automatic train / val / test split via GUI or CLI |
| Training | MobileNetV2 SSD-Lite, YOLO `.txt` labels, SGD with momentum |
| Photo detection | Load any image, run inference, view annotated result |
| Camera detection | Real-time detection from webcam |

Requirements

| Package | Version |
|---|---|
| Python | 3.9 – 3.12 |
| PyTorch | ≥ 2.1.0 |
| torchvision | ≥ 0.16.0 |
| OpenCV | ≥ 4.8.0 (`opencv-python`) |
| PyQt5 | ≥ 5.15.9 |
| NumPy | ≥ 1.24.0 |

Install all dependencies:

```
pip install -r requirements.txt
```

Launch the Application

```
python main.py
```

The main menu has three options:

数据集训练 — Dataset Training
摄像头模型验证 — Live Camera Detection
照片模型验证 — Single-Image Detection

Complete Workflow

```
Raw images
    │
    ▼
[1] Annotate with labelimg2
    │  python labelimg2/main.py --img_dir <images/> --save_dir <labels/>
    ▼
[2] Prepare dataset (train/val/test split)
    │  python prepare_dataset.py --src <raw/> --dst <dataset/> --class-names beer,coke
    ▼
[3] Train
    │  python -m mobilenet.train  (or GUI)
    ▼
[4] Weights saved → weights/mb2-ssd-lite-best.pth
    │              weights/mb2-ssd-labels.txt
    ▼
[5] Inference
       python main.py → Photo / Camera detection
```

Step 1 — Annotate Raw Images

Option A: Standalone annotation tool

```
python labelimg2/main.py \
  --img_dir  /path/to/your/images \
  --save_dir /path/to/save/labels
```

Draw bounding boxes, assign class names, save in YOLO .txt format.
Each saved file is named <image_stem>.txt and contains one line per object:
```
<class_id>  <cx>  <cy>  <width>  <height>
```
All coordinates are normalized to [0, 1].
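
To make the label format concrete, here is a minimal sketch that converts one label line back to pixel coordinates. It assumes `cx`, `cy`, `width`, and `height` follow the usual YOLO convention (box centre and size, each normalized by the image dimensions). The function name `yolo_line_to_pixel_box` is hypothetical and is not part of labelimg2 or this toolkit.

```python
def yolo_line_to_pixel_box(line, img_width, img_height):
    """Convert one YOLO label line to (class_id, x1, y1, x2, y2) in pixels.

    Hypothetical helper for illustration; assumes centre/size normalization.
    """
    parts = line.split()  # split() tolerates the repeated spaces between fields
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    cx, cy, w, h = (float(v) for v in parts[1:])
    # Scale the normalized centre and size back to pixel units.
    x1 = (cx - w / 2) * img_width
    y1 = (cy - h / 2) * img_height
    x2 = (cx + w / 2) * img_width
    y2 = (cy + h / 2) * img_height
    return class_id, x1, y1, x2, y2
```

For example, the line `0  0.5  0.5  0.5  0.5` on a 640 × 480 image describes a box centred in the image that spans half of each dimension.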

Option B: Annotate from the GUI

1. Launch `python main.py` and choose 数据集训练 (Dataset Training).
2. Click 选择数据集目录 (Select Dataset Directory) and pick a folder that already has `images/` and `labels/` subdirectories.
3. Click 开始标注 (Start Annotating). labelimg2 opens automatically, pointing at that folder.

Step 2 — Prepare Dataset Structure

The training pipeline requires images and labels split into train / val / test sub-folders. Use the provided script to create this layout automatically:

```
python prepare_dataset.py \
  --src   /path/to/raw_folder \
  --dst   /path/to/output_dataset \
  --class-names beer,coke,sprite
```

Arguments:

| Argument | Default | Description |
|---|---|---|
| `--src` | (required) | Folder with images + matching `.txt` label files |
| `--dst` | (required) | Output dataset folder to create |
| `--class-names` | `""` | Comma-separated class names (no spaces), e.g. `beer,coke,sprite` |
| `--train-ratio` | `0.7` | Fraction used for training |
| `--val-ratio` | `0.2` | Fraction used for validation (remainder → test) |
| `--seed` | `42` | Random seed for reproducible splits |

Output structure:

```
output_dataset/
├── images/
│   ├── train/   ← 70 % of images
│   ├── val/     ← 20 %
│   └── test/    ← 10 %
├── labels/
│   ├── train/   ← matching YOLO .txt files
│   ├── val/
│   └── test/
├── annotations/ ← COCO JSON stubs (used by GUI dataset checker)
└── class_names.txt
```
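
The way `--train-ratio`, `--val-ratio`, and `--seed` interact can be sketched as a seeded shuffle followed by two cuts. This illustrates the idea under stated assumptions and is not the actual logic of `prepare_dataset.py`: the function name `split_stems` is hypothetical, and the real script may round or order files differently.

```python
import random


def split_stems(stems, train_ratio=0.7, val_ratio=0.2, seed=42):
    """Shuffle file stems deterministically and cut them into three subsets.

    Hypothetical sketch; defaults mirror the documented CLI defaults.
    """
    if not 0 <= train_ratio + val_ratio <= 1:
        raise ValueError("train_ratio + val_ratio must be within [0, 1]")
    shuffled = sorted(stems)  # sort first so the input order does not matter
    random.Random(seed).shuffle(shuffled)  # private RNG, global state untouched
    n_train = int(len(shuffled) * train_ratio)
    n_val = int(len(shuffled) * val_ratio)
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],  # whatever remains goes to test
    }
```

Splitting on file stems rather than full paths keeps each image paired with its `.txt` label, since both share the same stem.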

Tip: If your images and labels are already split into subfolders, you can skip this script and point training directly at `images/train/` and `labels/train/`.

Step 3 — Train

Option A: GUI

1. Run `python main.py` and choose 数据集训练 (Dataset Training).
2. Click 选择数据集目录 (Select Dataset Directory) and pick the folder created in Step 2.
3. The GUI validates the dataset structure. Fix any errors shown.
4. Enter your class names (comma-separated, e.g. `beer,coke,sprite`).
5. Set Epochs and Batch size.
6. (Optional) Click 选择权重文件 (Select Weights File) to continue training from an existing `.pth` checkpoint.
7. Click 开始训练 (Start Training).

Training logs stream in real time. Weights are saved to weights/ automatically.

Option B: Command Line

```
python -m mobilenet.train \
  --image-dir  /path/to/dataset/images/train \
  --label-dir  /path/to/dataset/labels/train \
  --class-names beer,coke,sprite \
  --epochs 20 \
  --batch-size 8 \
  --lr 1e-3 \
  --save-dir weights
```

All CLI arguments:

| Argument | Default | Description |
|---|---|---|
| `--image-dir` | (required) | Path to training images directory |
| `--label-dir` | (required) | Path to training YOLO `.txt` labels directory |
| `--class-names` | (required) | Comma-separated class names (no background) |
| `--epochs` | `20` | Number of training epochs |
| `--batch-size` | `8` | Batch size |
| `--lr` | `1e-3` | Initial learning rate (SGD) |
| `--weights` | `""` | Pre-trained `.pth` file to continue training from |
| `--save-dir` | `weights` | Directory to save checkpoints |
| `--device` | auto | `cuda` or `cpu` |
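
If you want to start training from another Python script, one option is to assemble the command from the documented flags and run it as a subprocess. The sketch below uses only the flags listed in the table. The helper name `build_train_command` is hypothetical and not part of the toolkit, and it assumes the command is run from the project root so that `mobilenet.train` is importable.

```python
import sys


def build_train_command(image_dir, label_dir, class_names,
                        epochs=20, batch_size=8, lr=1e-3,
                        weights="", save_dir="weights", device=None):
    """Build the argv list for `python -m mobilenet.train` (hypothetical helper)."""
    cmd = [
        sys.executable, "-m", "mobilenet.train",
        "--image-dir", str(image_dir),
        "--label-dir", str(label_dir),
        "--class-names", ",".join(class_names),  # comma-separated, no spaces
        "--epochs", str(epochs),
        "--batch-size", str(batch_size),
        "--lr", str(lr),
        "--save-dir", str(save_dir),
    ]
    if weights:  # only pass --weights when resuming from a checkpoint
        cmd += ["--weights", str(weights)]
    if device:   # omit --device to keep the automatic choice
        cmd += ["--device", device]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Passing a list instead of a shell string avoids quoting problems with paths that contain spaces.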

Step 4 — Saved Weight Files

After training, the following files are written to --save-dir (default: weights/):

| File | Description |
|---|---|
| `mb2-ssd-lite-best.pth` | Checkpoint with the lowest training loss |
| `mb2-ssd-lite-last.pth` | Checkpoint from the last epoch |
| `mb2-ssd-labels.txt` | Class name list (read automatically at inference time) |

mb2-ssd-labels.txt must stay in the same folder as the .pth file — it is loaded automatically when you open the model for inference.
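
The same-folder requirement can be pictured as a lookup relative to the checkpoint path. The sketch below is an illustration only, not the toolkit's loader: `load_class_names` is a hypothetical name, and it assumes the labels file holds one class name per line, which this README does not specify.

```python
from pathlib import Path


def load_class_names(model_path, labels_filename="mb2-ssd-labels.txt"):
    """Read class names from the labels file that sits next to a checkpoint.

    Hypothetical helper; assumes one class name per line.
    """
    labels_path = Path(model_path).with_name(labels_filename)
    if not labels_path.is_file():
        raise FileNotFoundError(f"{labels_path} not found next to {model_path}")
    with labels_path.open(encoding="utf-8") as fh:
        # Skip blank lines so a trailing newline does not create an empty class.
        return [line.strip() for line in fh if line.strip()]
```

A check like this before copying a `.pth` file to another machine is a quick way to confirm that the labels file travels with it.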

Step 5 — Inference

Option A: Photo detection (GUI)

1. Run `python main.py` and choose 照片模型验证 (Single-Image Detection).
2. Select a model file (`.pth`) from the `weights/` folder.
3. Adjust the confidence threshold and image size if needed.
4. Click 选择图片 (Select Image). Detection results are drawn on the image.

Option B: Live camera detection (GUI)

1. Run `python main.py` and choose 摄像头模型验证 (Live Camera Detection).
2. Select a model file.
3. Click 开始检测 (Start Detection). Bounding boxes are overlaid on the live webcam feed in real time.

Option C: Python API

```python
import cv2
from mobilenet.infer import MobileNetV2Detector

# Load model (class names are read from mb2-ssd-labels.txt automatically)
detector = MobileNetV2Detector(
    model_path="weights/mb2-ssd-lite-best.pth",
    use_cpu=False,   # set True to force CPU
)

# Run on a single image
frame = cv2.imread("photo.jpg")          # BGR format
result = detector.predict(frame, score_threshold=0.5)

# result.boxes  → numpy (N, 4)  pixel coords [x1, y1, x2, y2]
# result.scores → numpy (N,)
# result.labels → numpy (N,)  1-based class indices

# Draw and save
annotated = detector.draw(frame, result)
cv2.imwrite("output.jpg", annotated)

print(f"Detected {len(result.boxes)} objects")
for box, score, label in zip(result.boxes, result.scores, result.labels):
    name = detector.class_names[int(label) - 1]
    print(f"  {name}  {score:.2f}  {box.astype(int).tolist()}")
```
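
Because `result.labels` holds 1-based indices, any post-processing has to subtract one before indexing into the class name list. As a small illustration, the sketch below tallies detections per class. `count_by_class` is a hypothetical helper, not part of `mobilenet.infer`; it accepts any iterable of integer-like labels, including a NumPy array.

```python
from collections import Counter


def count_by_class(labels, class_names):
    """Count detections per class name from 1-based label indices.

    Hypothetical helper for illustration.
    """
    counts = Counter()
    for label in labels:
        idx = int(label) - 1  # 1-based label to 0-based list index
        if not 0 <= idx < len(class_names):
            raise IndexError(f"label {label} has no matching class name")
        counts[class_names[idx]] += 1
    return dict(counts)
```

With the detector above, this would be called as `count_by_class(result.labels, detector.class_names)`.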

Project Structure

```
MB V2/
├── main.py                    # GUI application entry point
├── prepare_dataset.py         # Raw data → train/val/test dataset converter
├── requirements.txt           # Python dependencies
│
├── mobilenet/                 # MobileNetV2 module (wraps pytorch-ssd)
│   ├── model.py               # build_model / load_model
│   ├── train.py               # Training script (CLI entry point)
│   ├── infer.py               # MobileNetV2Detector class
│   ├── yolo_dataset.py        # YOLO-format dataset adapter for pytorch-ssd
│   └── config.py              # AppConfig defaults
│
├── pytorch-ssd/               # Upstream SSD library (qfgaohao/pytorch-ssd)
│   └── vision/                # Model, loss, transforms, predictor
│
├── labelimg2/                 # Annotation tool (YOLO .txt output)
│
└── weights/                   # Created by training
    ├── mb2-ssd-lite-best.pth
    ├── mb2-ssd-lite-last.pth
    └── mb2-ssd-labels.txt
```

Notes

- The model input size is fixed at 300 × 300 (the SSD standard). The size selector in the GUI controls how input images are pre-resized before they are passed to the model, not the network's internal input size.
- GPU acceleration: CUDA is used automatically when available. Pass `--device cpu` or tick 强制使用CPU (Force CPU) to force CPU mode.
- Class names must be consistent between training and inference. They are written to `mb2-ssd-labels.txt` at training time.
- The `yolov7/` folder is not required and can be deleted without affecting any functionality.
