A two-stage pipeline for classifying overtaking vehicles in dashcam footage:
- Stage 1 (RT-DETR) detects vehicles in each frame.
- Stage 2 (fine-tuned ViT) classifies the selected overtaking vehicle into one of six types.
Six classes: passenger_car · large_van · commercial_truck · minivan · pickup · suv
The ViT (google/vit-base-patch16-224-in21k) is fine-tuned with focal loss and inverse-frequency class weights to reduce bias toward common types (e.g. passenger car, SUV, pickup from Stanford Cars) relative to rarer types (large van, commercial truck, minivan from web-scraped and field-annotated images).
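As a rough illustration of the loss described above, the following is a minimal standard-library sketch, not the code in training/train.py. It assumes the common "balanced" normalisation w_c = N / (K * n_c) for the inverse-frequency weights and a focusing parameter gamma = 2.0; both are assumptions, and the function names are hypothetical. The values actually used are documented in training/config.py.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels, classes):
    """Weight each class by N / (K * count) so rarer classes weigh more.

    The normalisation is an assumption for illustration; classes with no
    samples are skipped to avoid division by zero.
    """
    counts = Counter(labels)
    total = len(labels)
    k = len(classes)
    return {c: total / (k * counts[c]) for c in classes if counts[c] > 0}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def focal_loss(logits, target_index, class_weight=1.0, gamma=2.0):
    """Focal loss for one example: -w * (1 - p_t)**gamma * log(p_t)."""
    p_t = softmax(logits)[target_index]
    return -class_weight * (1.0 - p_t) ** gamma * math.log(p_t)


# Toy label distribution: one common class, one rare class.
toy_labels = ["passenger_car"] * 8 + ["minivan"] * 2
toy_weights = inverse_frequency_weights(toy_labels, ["passenger_car", "minivan"])
toy_loss = focal_loss([0.2, 1.5], target_index=1, class_weight=toy_weights["minivan"])
```

With gamma = 0 the expression reduces to weighted cross-entropy; larger gamma shrinks the contribution of examples the model already classifies confidently.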
This repository follows a common layout for research-oriented computer vision projects: data preparation scripts, training code, released weights metadata, inference entry points, evaluation utilities, and generated artifacts kept separate.
├── data/ # Dataset preparation scripts (Stanford Cars, scraping, field crops, merge)
├── training/ # ViT fine-tuning (train.py, config.py)
├── model/ # Model config, label map, preprocessor config (weights distributed separately)
├── inference/ # Two-stage RT-DETR + ViT inference scripts
├── evaluation/ # Metrics, confusion matrices, and plots
├── requirements.txt # Python dependencies (install PyTorch separately first)
├── pyproject.toml # Project metadata
└── LICENSE
Data paths for inference and evaluation are configured through environment variables documented in inference/config.py. Set NSF_DATA_ROOT and ANNARBOR_DATA_ROOT to match your local data layout before running inference or evaluation scripts.
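The sketch below shows one way such environment variables could be read and validated in Python. It is illustrative only: the helper name data_root is hypothetical, and the actual handling in inference/config.py may differ.

```python
import os
from pathlib import Path


def data_root(var_name, environ=None):
    """Return the directory named by an environment variable.

    Raises RuntimeError with a readable message when the variable is unset
    or empty. `environ` can be passed explicitly to make testing easier.
    """
    env = os.environ if environ is None else environ
    value = env.get(var_name)
    if not value:
        raise RuntimeError(
            f"{var_name} is not set; point it at your local dataset root."
        )
    return Path(value).expanduser()


# Example with an explicit mapping instead of the real environment.
example_root = data_root("NSF_DATA_ROOT", {"NSF_DATA_ROOT": "/path/to/nsf-cycling-safety"})
```

Failing early with a named variable tends to be easier to debug than a missing-file error deep inside an inference run.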
# Example: Python 3.11
python -m venv .venv
.venv\Scripts\activate # Windows
# source .venv/bin/activate # Linux / macOS
# Install PyTorch for your platform first, e.g. CUDA 12.1:
# pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
For evaluation figures only:
pip install -r evaluation/requirements.txt
| Source | Role | Script |
|---|---|---|
| Stanford Cars (Hugging Face) | passenger_car, suv, pickup, minivan | data/download_stanford_cars.py |
| Web-scraped images | large_van, commercial_truck, minivan | data/scrape_images.py |
| Ann Arbor field crops (RT-DETR from video) | all six classes | data/generate_crops.py |
# Run all commands from the repository root
# Windows uses ^ for line continuation; Linux/macOS uses \
python data/download_stanford_cars.py --output-dir data/stanford_cars_raw
python data/scrape_images.py
python data/generate_crops.py ^
--csv path/to/annarbor-ndivision-20221005_groundtruth.csv ^
--images-dir path/to/image_sequence ^
--output-dir data/field_crops
python data/prepare_dataset_customdata.py ^
--stanford-raw data/stanford_cars_raw ^
--scraped-clean data/scraped_clean ^
--annarbor-dir data/field_crops ^
--output-dir data/train_customdata
# Windows
python training/train.py ^
--data-dir data/train_customdata ^
--output-dir model ^
--results-dir model ^
--epochs 30
# Linux/macOS
# python training/train.py \
# --data-dir data/train_customdata \
# --output-dir model \
# --results-dir model \
# --epochs 30
Training hyperparameters are documented in training/config.py and training/README.md.
Scripts in inference/ import each other by relative name, so cd into inference/ before running them.
NSF Cycling Safety (trip 1 / trip 2)
# Set the dataset root (adjust separator for your OS)
# Windows: set NSF_DATA_ROOT=D:\path\to\nsf-cycling-safety
# Linux/macOS: export NSF_DATA_ROOT=/path/to/nsf-cycling-safety
cd inference
# Windows
python inference_nsf.py --model-dir ..\model --trip both
# Linux/macOS
# python inference_nsf.py --model-dir ../model --trip both
Ann Arbor N. Division
cd inference
# Windows
python inference_annarbor.py --model-dir ..\model --csv path/to/passing_events.csv --images-dir path/to/image_sequence
# Linux/macOS
# python inference_annarbor.py --model-dir ../model --csv path/to/passing_events.csv --images-dir path/to/image_sequence
Output is written to ../outputs/trip_outputs/<name>/ by default. Intermediate image dumps are gitignored.
cd evaluation
# Windows
python run_all.py ^
--inf1 path/to/trip1_inference.csv --gt1 path/to/trip1_gt.csv ^
--inf2 path/to/trip2_inference.csv --gt2 path/to/trip2_gt.csv ^
--aa-inf path/to/annarbor_inference.csv --aa-gt path/to/annarbor_gt.csv ^
--out ..\outputs\evaluation_run
# Linux/macOS
# python run_all.py \
# --inf1 path/to/trip1_inference.csv --gt1 path/to/trip1_gt.csv \
# --inf2 path/to/trip2_inference.csv --gt2 path/to/trip2_gt.csv \
# --aa-inf path/to/annarbor_inference.csv --aa-gt path/to/annarbor_gt.csv \
# --out ../outputs/evaluation_run
See evaluation/README.md for per-dataset commands and metric definitions.
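For readers unfamiliar with the outputs, here is a minimal standard-library sketch of how a confusion matrix and per-class recall can be computed from ground-truth and predicted labels. It is not the code in evaluation/, the function names are hypothetical, and the metric definitions in evaluation/README.md take precedence.

```python
def confusion_matrix(y_true, y_pred, classes):
    """Return a matrix where row = true class and column = predicted class."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_recall(matrix, classes):
    """Recall per class: correct predictions divided by true instances.

    Classes with no true instances get None rather than a misleading 0.0.
    """
    recall = {}
    for i, c in enumerate(classes):
        row_total = sum(matrix[i])
        recall[c] = matrix[i][i] / row_total if row_total else None
    return recall


# Toy example with two of the six classes.
toy_classes = ["suv", "pickup"]
toy_matrix = confusion_matrix(["suv", "suv", "pickup"], ["suv", "pickup", "pickup"], toy_classes)
toy_recall = per_class_recall(toy_matrix, toy_classes)
```

Per-class recall is worth reading alongside overall accuracy here, since the rarer classes can be misclassified often without moving the aggregate number much.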
model/checkpoint_final/model.safetensors (~330 MB) is not stored in this repository.
To obtain weights:
- Option A (download): Check the Releases page or the associated paper for a direct download link. Place the file at model/checkpoint_final/model.safetensors.
- Option B (retrain): Follow the Reproducing training section above. Training takes ~45 minutes on a single NVIDIA RTX 6000 GPU.
The small sidecar files (config.json, label_mapping.json, preprocessor_config.json) are already in this repository and do not need to be downloaded separately.
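Before running inference, it can help to confirm that the weights and sidecar files are where the scripts expect them. The sketch below is a hypothetical helper, not part of the repository, and it assumes the sidecar files sit in the same directory as model.safetensors; adjust the directory or file list if your layout differs.

```python
from pathlib import Path

# Sidecar files named in this README.
SIDECARS = ("config.json", "label_mapping.json", "preprocessor_config.json")


def missing_files(checkpoint_dir, names=("model.safetensors",) + SIDECARS):
    """Return the expected file names that are absent from checkpoint_dir."""
    directory = Path(checkpoint_dir)
    return [name for name in names if not (directory / name).is_file()]


# Example: list anything missing from the checkpoint directory.
missing = missing_files("model/checkpoint_final")
```

An empty list means every expected file was found in that directory.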
If you use this code or model, please cite the associated dissertation or publication once available, and cite Hugging Face model IDs used in training/config.py and inference/config.py.
See LICENSE.