fenggroup/vehicle-type-classifier

Fine-tuned ViT for overtaking vehicle type classification

A two-stage pipeline for classifying overtaking vehicles in dashcam footage:

  1. Stage 1 (RT-DETR) detects vehicles in each frame.
  2. Stage 2 (fine-tuned ViT) classifies the selected overtaking vehicle into one of six types.

Six classes: passenger_car · large_van · commercial_truck · minivan · pickup · suv
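
The control flow of the two stages can be sketched as below. This is only an illustration of how a detector and a classifier compose, not the code in inference/. The `Box` type, the `detect` and `classify` callables, and in particular the selection rule (largest box above a confidence threshold) are assumptions made for the example; the repository's actual rule for picking the overtaking vehicle may differ.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

CLASSES = ("passenger_car", "large_van", "commercial_truck", "minivan", "pickup", "suv")


@dataclass
class Box:
    """Axis-aligned detection box in pixel coordinates with a confidence score."""
    x1: float
    y1: float
    x2: float
    y2: float
    score: float

    @property
    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def select_candidate(boxes: Sequence[Box], min_score: float = 0.5) -> Optional[Box]:
    """Hypothetical selection rule: the largest box above a confidence threshold."""
    kept = [b for b in boxes if b.score >= min_score]
    return max(kept, key=lambda b: b.area, default=None)


def classify_frame(
    frame: object,
    detect: Callable[[object], List[Box]],
    classify: Callable[[object, Box], str],
) -> Optional[str]:
    """Stage 1 proposes boxes; stage 2 labels the selected one."""
    box = select_candidate(detect(frame))
    if box is None:
        return None  # no usable detection in this frame
    label = classify(frame, box)
    if label not in CLASSES:
        raise ValueError(f"unexpected label: {label}")
    return label
```

Passing the two models in as callables keeps the sketch independent of any particular RT-DETR or ViT wrapper.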

The ViT (google/vit-base-patch16-224-in21k) is fine-tuned with focal loss and inverse-frequency class weights to reduce bias toward common types (e.g. passenger car, SUV, pickup from Stanford Cars) relative to rarer types (large van, commercial truck, minivan from web-scraped and field-annotated images).
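
To make the loss concrete, here is a minimal pure-Python sketch of focal loss combined with inverse-frequency class weights for a single sample. It is not the code in training/train.py. The weighting formula N / (K * count) and the focusing parameter gamma = 2.0 are assumptions chosen for illustration; the values actually used are documented in training/config.py.

```python
import math
from collections import Counter

CLASSES = ["passenger_car", "large_van", "commercial_truck", "minivan", "pickup", "suv"]


def inverse_frequency_weights(labels, classes=CLASSES):
    """Weight each class by N / (K * count), so rarer classes get larger weights."""
    counts = Counter(labels)
    n, k = len(labels), len(classes)
    # Classes absent from the data get weight 0.0 to avoid division by zero.
    return {c: n / (k * counts[c]) if counts[c] else 0.0 for c in classes}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target_idx, class_weight=1.0, gamma=2.0):
    """Focal loss for one sample: -w * (1 - p_t)^gamma * log(p_t)."""
    p_t = softmax(logits)[target_idx]
    return -class_weight * (1.0 - p_t) ** gamma * math.log(p_t)
```

With gamma set to 0 the expression reduces to weighted cross-entropy; larger gamma values shrink the contribution of samples the model already classifies confidently.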


Repository layout

The repository follows a common layout for research-oriented computer vision projects: data preparation scripts, training code, released model metadata, inference entry points, evaluation utilities, and generated artifacts are each kept in a separate location.

├── data/                    # Dataset preparation scripts (Stanford Cars, scraping, field crops, merge)
├── training/                # ViT fine-tuning (train.py, config.py)
├── model/                   # Model config, label map, preprocessor config (weights distributed separately)
├── inference/               # Two-stage RT-DETR + ViT inference scripts
├── evaluation/              # Metrics, confusion matrices, and plots
├── requirements.txt         # Python dependencies (install PyTorch separately first)
├── pyproject.toml           # Project metadata
└── LICENSE

Data paths for inference and evaluation are configured through environment variables documented in inference/config.py. Set NSF_DATA_ROOT and ANNARBOR_DATA_ROOT to match your local data layout before running inference or evaluation scripts.
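
A script that depends on these variables can fail early with a readable message when one is missing. The helper below is a hypothetical sketch of that pattern, not the contents of inference/config.py; the function name `require_data_root` is made up for the example.

```python
import os
from pathlib import Path


def require_data_root(var_name: str) -> Path:
    """Return the directory named by an environment variable, or raise."""
    value = os.environ.get(var_name)
    if not value:
        raise RuntimeError(f"{var_name} is not set; point it at your local dataset directory")
    root = Path(value).expanduser()
    if not root.is_dir():
        raise RuntimeError(f"{var_name}={value!r} is not an existing directory")
    return root
```

For example, `require_data_root("NSF_DATA_ROOT")` would return the configured path as a `Path` object once the variable has been set in the shell.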


Setup

# Example: Python 3.11
python -m venv .venv
.venv\Scripts\activate            # Windows
# source .venv/bin/activate       # Linux / macOS

# Install PyTorch for your platform first, e.g. CUDA 12.1:
# pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

pip install -r requirements.txt

For evaluation figures only:

pip install -r evaluation/requirements.txt

Reproducing training

1. Prepare training data

Source                                     | Role                                 | Script
Stanford Cars (Hugging Face)               | passenger_car, suv, pickup, minivan  | data/download_stanford_cars.py
Web-scraped images                         | large_van, commercial_truck, minivan | data/scrape_images.py
Ann Arbor field crops (RT-DETR from video) | all six classes                      | data/generate_crops.py

# Run all commands from the repository root
# Windows uses ^ for line continuation; Linux/macOS uses \

python data/download_stanford_cars.py --output-dir data/stanford_cars_raw
python data/scrape_images.py

python data/generate_crops.py ^
  --csv path/to/annarbor-ndivision-20221005_groundtruth.csv ^
  --images-dir path/to/image_sequence ^
  --output-dir data/field_crops

python data/prepare_dataset_customdata.py ^
  --stanford-raw  data/stanford_cars_raw ^
  --scraped-clean data/scraped_clean ^
  --annarbor-dir  data/field_crops ^
  --output-dir    data/train_customdata
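
# Optional sanity check after merging (Python, not a shell command): count
# images per class to see how imbalanced the merged set is. This is a sketch
# under a loud assumption: that the output directory contains one
# subdirectory per class. The layout actually written by
# prepare_dataset_customdata.py may differ (for example, it may add
# train/val splits), so adjust the root accordingly.

```python
from pathlib import Path
from typing import Dict

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_class(root) -> Dict[str, int]:
    """Count image files under each immediate subdirectory of root."""
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1 for p in class_dir.rglob("*") if p.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts
```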

2. Train

# Windows
python training/train.py ^
  --data-dir data/train_customdata ^
  --output-dir model ^
  --results-dir model ^
  --epochs 30

# Linux/macOS
# python training/train.py \
#   --data-dir data/train_customdata \
#   --output-dir model \
#   --results-dir model \
#   --epochs 30

Training hyperparameters are documented in training/config.py and training/README.md.


Inference

Scripts in inference/ import each other by bare module name, so always cd into inference before running them.
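
If you drive the scripts from another Python program rather than a shell, the same requirement can be met by setting the child process's working directory. The wrapper below is a hypothetical sketch: `run_inference` is not part of this repository, and it assumes the caller starts in the repository root. The script name and flags in the usage note are the ones shown in this README.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Sequence


def inference_command(script: str, args: Sequence[str]) -> List[str]:
    """Build the argv for one inference script, using the current interpreter."""
    return [sys.executable, script, *args]


def run_inference(script: str, args: Sequence[str], repo_root=".") -> None:
    """Run a script with inference/ as the working directory so sibling imports resolve."""
    subprocess.run(
        inference_command(script, args),
        cwd=Path(repo_root) / "inference",
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
```

A call such as `run_inference("inference_nsf.py", ["--model-dir", "../model", "--trip", "both"])` mirrors the NSF shell command; relative paths in the arguments are resolved from inference/, not from the caller's directory.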

NSF Cycling Safety (trip 1 / trip 2)

# Set the dataset root (adjust separator for your OS)
# Windows:  set NSF_DATA_ROOT=D:\path\to\nsf-cycling-safety
# Linux/macOS: export NSF_DATA_ROOT=/path/to/nsf-cycling-safety

cd inference

# Windows
python inference_nsf.py --model-dir ..\model --trip both
# Linux/macOS
# python inference_nsf.py --model-dir ../model --trip both

Ann Arbor N. Division

cd inference

# Windows
python inference_annarbor.py --model-dir ..\model --csv path/to/passing_events.csv --images-dir path/to/image_sequence
# Linux/macOS
# python inference_annarbor.py --model-dir ../model --csv path/to/passing_events.csv --images-dir path/to/image_sequence

Output is written to ../outputs/trip_outputs/<name>/ by default. Intermediate image dumps are gitignored.


Evaluation

cd evaluation

# Windows
python run_all.py ^
  --inf1 path/to/trip1_inference.csv --gt1 path/to/trip1_gt.csv ^
  --inf2 path/to/trip2_inference.csv --gt2 path/to/trip2_gt.csv ^
  --aa-inf path/to/annarbor_inference.csv --aa-gt path/to/annarbor_gt.csv ^
  --out ..\outputs\evaluation_run

# Linux/macOS
# python run_all.py \
#   --inf1 path/to/trip1_inference.csv --gt1 path/to/trip1_gt.csv \
#   --inf2 path/to/trip2_inference.csv --gt2 path/to/trip2_gt.csv \
#   --aa-inf path/to/annarbor_inference.csv --aa-gt path/to/annarbor_gt.csv \
#   --out ../outputs/evaluation_run

See evaluation/README.md for per-dataset commands and metric definitions.
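
For readers unfamiliar with the outputs, the sketch below shows how a confusion matrix and per-class recall are typically computed from paired ground-truth and predicted labels. It is a generic illustration, not the implementation in evaluation/; the authoritative metric definitions are in evaluation/README.md, and the function names are made up for the example.

```python
from typing import Dict, List, Sequence

CLASSES = ["passenger_car", "large_van", "commercial_truck", "minivan", "pickup", "suv"]


def confusion_matrix(
    y_true: Sequence[str], y_pred: Sequence[str], classes: Sequence[str] = CLASSES
) -> List[List[int]]:
    """Rows are ground-truth classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_recall(
    matrix: List[List[int]], classes: Sequence[str] = CLASSES
) -> Dict[str, float]:
    """Recall = correct / ground-truth count; classes with no samples are omitted."""
    recall = {}
    for i, c in enumerate(classes):
        support = sum(matrix[i])
        if support:
            recall[c] = matrix[i][i] / support
    return recall
```

Reporting recall per class rather than overall accuracy alone makes weak performance on rare types such as large_van visible even when common types dominate the sample.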


Model weights

model/checkpoint_final/model.safetensors (~330 MB) is not stored in this repository.

To obtain weights:

  • Option A — Download: Check the Releases page or the associated paper for a direct download link. Place the file at model/checkpoint_final/model.safetensors.
  • Option B — Retrain: Follow the Reproducing training section above. Training takes ~45 minutes on a single NVIDIA RTX 6000 GPU.

The small sidecar files (config.json, label_mapping.json, preprocessor_config.json) are already in this repository and do not need to be downloaded separately.
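
Before running inference, it can help to confirm that the weights and sidecar files are where the loader will look. The check below is a hypothetical helper, not part of this repository. It assumes the sidecar files sit next to the weights in model/checkpoint_final; if they live directly under model/ in your checkout, pass `sidecar_dir="model"`.

```python
from pathlib import Path
from typing import List

WEIGHTS = "model/checkpoint_final/model.safetensors"
SIDECARS = ("config.json", "label_mapping.json", "preprocessor_config.json")


def missing_model_files(repo_root=".", sidecar_dir="model/checkpoint_final") -> List[str]:
    """Return the expected model files (relative paths) that are not present."""
    root = Path(repo_root)
    expected = [WEIGHTS] + [f"{sidecar_dir}/{name}" for name in SIDECARS]
    return [rel for rel in expected if not (root / rel).is_file()]
```

An empty list means everything was found; otherwise the returned paths show what still needs to be downloaded or produced by training.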


Citation

If you use this code or model, please cite the associated dissertation or publication once it is available, and cite the Hugging Face models whose IDs appear in training/config.py and inference/config.py.


License

See LICENSE.
