Fine-tuned ViT for overtaking vehicle type classification

A two-stage pipeline for classifying overtaking vehicles in dashcam footage:

Stage 1 (RT-DETR) detects vehicles in each frame.
Stage 2 (fine-tuned ViT) classifies the selected overtaking vehicle into one of six types.

Six classes: passenger_car · large_van · commercial_truck · minivan · pickup · suv
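
The control flow of the two stages can be sketched as follows. This is a minimal illustration, not the code in inference/: the detect and classify callables stand in for RT-DETR and the fine-tuned ViT, and the Box type, the function names, and the "largest box" selection rule are all assumptions made for the example. The repository's actual rule for choosing the overtaking vehicle is not described here.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

CLASSES = ("passenger_car", "large_van", "commercial_truck",
           "minivan", "pickup", "suv")


@dataclass(frozen=True)
class Box:
    """Axis-aligned detection box in pixel coordinates (hypothetical type)."""
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def select_overtaking(boxes: Sequence[Box]) -> Optional[Box]:
    """ASSUMPTION: pick the largest box as a placeholder selection rule."""
    return max(boxes, key=lambda b: b.area, default=None)


def classify_frame(
    frame: object,
    detect: Callable[[object], Sequence[Box]],
    classify: Callable[[object, Box], str],
) -> Optional[str]:
    """Run stage 1, select one vehicle, then run stage 2 on that vehicle."""
    boxes = detect(frame)                 # Stage 1: vehicle detection
    target = select_overtaking(boxes)     # Choose the vehicle to classify
    if target is None:
        return None                       # No vehicle detected in this frame
    label = classify(frame, target)       # Stage 2: vehicle type
    if label not in CLASSES:
        raise ValueError(f"unexpected class label: {label!r}")
    return label
```

Passing the two models in as callables keeps the sketch independent of any particular detection or classification library.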

The ViT (google/vit-base-patch16-224-in21k) is fine-tuned with focal loss and inverse-frequency class weights. Both reduce bias toward common types (e.g. passenger car, SUV, and pickup, drawn from Stanford Cars) relative to rarer types (large van, commercial truck, and minivan, drawn from web-scraped and field-annotated images).
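
To make the loss concrete, here is a small standard-library sketch of inverse-frequency weights combined with focal loss for a single sample. It is illustrative only: the weight normalisation w_c = N / (K * n_c), the value gamma = 2.0, the function names, and the class counts are assumptions, and the settings actually used for training live in training/config.py.

```python
import math
from typing import Dict, List, Sequence

# HYPOTHETICAL class counts, chosen only to show the effect of imbalance.
EXAMPLE_COUNTS = {"passenger_car": 900, "suv": 600, "pickup": 300,
                  "minivan": 100, "large_van": 60, "commercial_truck": 40}


def inverse_frequency_weights(counts: Dict[str, int]) -> Dict[str, float]:
    """w_c = N / (K * n_c): the rarer the class, the larger its weight."""
    total = sum(counts.values())
    num_classes = len(counts)
    return {name: total / (num_classes * n) for name, n in counts.items()}


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one sample's logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


def focal_loss(logits: Sequence[float], target: int,
               class_weights: Sequence[float], gamma: float = 2.0) -> float:
    """FL = -w_t * (1 - p_t)^gamma * log(p_t) for a single sample."""
    p_t = softmax(logits)[target]
    return -class_weights[target] * (1.0 - p_t) ** gamma * math.log(p_t)
```

With gamma = 0 and unit weights this reduces to ordinary cross-entropy. Raising gamma shrinks the loss of samples the model already classifies confidently, so training concentrates on hard and rare examples.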

Repository layout

This repository follows a common layout for research-oriented computer vision projects. Data preparation scripts, training code, released weights metadata, inference entry points, evaluation utilities, and generated artifacts are kept in separate locations.

├── data/                    # Dataset preparation scripts (Stanford Cars, scraping, field crops, merge)
├── training/                # ViT fine-tuning (train.py, config.py)
├── model/                   # Model config, label map, preprocessor config (weights distributed separately)
├── inference/               # Two-stage RT-DETR + ViT inference scripts
├── evaluation/              # Metrics, confusion matrices, and plots
├── requirements.txt         # Python dependencies (install PyTorch separately first)
├── pyproject.toml           # Project metadata
└── LICENSE

Data paths for inference and evaluation are configured through environment variables documented in inference/config.py. Set NSF_DATA_ROOT and ANNARBOR_DATA_ROOT to match your local data layout before running inference or evaluation scripts.
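
As an illustration of this configuration style, the helper below resolves a dataset root from an environment variable and fails early with a readable message. It is a sketch only: the function name data_root and the error wording are assumptions, and inference/config.py may read the variables differently.

```python
import os
from pathlib import Path


def data_root(var_name: str) -> Path:
    """Return the directory named by an environment variable, or raise."""
    value = os.environ.get(var_name)
    if not value:
        raise RuntimeError(
            f"{var_name} is not set; point it at your local dataset root."
        )
    root = Path(value).expanduser()
    if not root.is_dir():
        raise RuntimeError(f"{var_name}={value!r} is not an existing directory.")
    return root


# Example use (both variable names come from this README):
# nsf_root = data_root("NSF_DATA_ROOT")
# annarbor_root = data_root("ANNARBOR_DATA_ROOT")
```

Checking the variables before a long inference run surfaces a missing or mistyped path immediately rather than partway through a trip.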

Setup

# Example: Python 3.11
python -m venv .venv
.venv\Scripts\activate            # Windows
# source .venv/bin/activate       # Linux / macOS

# Install PyTorch for your platform first, e.g. CUDA 12.1:
# pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

pip install -r requirements.txt

For evaluation figures only:

pip install -r evaluation/requirements.txt

Reproducing training

1. Prepare training data

Source	Classes supplied	Script
Stanford Cars (Hugging Face)	passenger_car, suv, pickup, minivan	`data/download_stanford_cars.py`
Web-scraped images	large_van, commercial_truck, minivan	`data/scrape_images.py`
Ann Arbor field crops (RT-DETR from video)	all six classes	`data/generate_crops.py`

# Run all commands from the repository root
# Windows uses ^ for line continuation; Linux/macOS uses \

python data/download_stanford_cars.py --output-dir data/stanford_cars_raw
python data/scrape_images.py

python data/generate_crops.py ^
  --csv path/to/annarbor-ndivision-20221005_groundtruth.csv ^
  --images-dir path/to/image_sequence ^
  --output-dir data/field_crops

python data/prepare_dataset_customdata.py ^
  --stanford-raw  data/stanford_cars_raw ^
  --scraped-clean data/scraped_clean ^
  --annarbor-dir  data/field_crops ^
  --output-dir    data/train_customdata
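
After merging, it can help to check how many images each class ended up with, since those counts drive the class weighting. The sketch below assumes that every image sits in a directory named after its class (for example .../suv/0001.jpg), possibly nested under split folders. That layout, the function name, and the list of image suffixes are assumptions rather than something the preparation scripts guarantee.

```python
from collections import Counter
from pathlib import Path

CLASSES = ("passenger_car", "large_van", "commercial_truck",
           "minivan", "pickup", "suv")
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # ASSUMPTION: formats in use


def count_images_per_class(root: Path) -> Counter:
    """Count image files whose parent directory is named after a class."""
    counts = Counter({name: 0 for name in CLASSES})
    for path in root.rglob("*"):
        if (path.is_file()
                and path.suffix.lower() in IMAGE_SUFFIXES
                and path.parent.name in counts):
            counts[path.parent.name] += 1
    return counts


# Example use from the repository root:
# for name, n in count_images_per_class(Path("data/train_customdata")).items():
#     print(f"{name:18s} {n}")
```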

2. Train

# Windows
python training/train.py ^
  --data-dir data/train_customdata ^
  --output-dir model ^
  --results-dir model ^
  --epochs 30

# Linux/macOS
# python training/train.py \
#   --data-dir data/train_customdata \
#   --output-dir model \
#   --results-dir model \
#   --epochs 30

Training hyperparameters are documented in training/config.py and training/README.md.

Inference

Scripts in inference/ import each other by relative name, so cd into inference/ before running them.

NSF Cycling Safety (trip 1 / trip 2)

# Set the dataset root (adjust separator for your OS)
# Windows:  set NSF_DATA_ROOT=D:\path\to\nsf-cycling-safety
# Linux/macOS: export NSF_DATA_ROOT=/path/to/nsf-cycling-safety

cd inference

# Windows
python inference_nsf.py --model-dir ..\model --trip both
# Linux/macOS
# python inference_nsf.py --model-dir ../model --trip both

Ann Arbor N. Division

cd inference                      # from the repository root

# Windows
python inference_annarbor.py --model-dir ..\model --csv path/to/passing_events.csv --images-dir path/to/image_sequence
# Linux/macOS
# python inference_annarbor.py --model-dir ../model --csv path/to/passing_events.csv --images-dir path/to/image_sequence

Output is written to ../outputs/trip_outputs/<name>/ by default. Intermediate image dumps are gitignored.

Evaluation

cd evaluation                     # from the repository root

# Windows
python run_all.py ^
  --inf1 path/to/trip1_inference.csv --gt1 path/to/trip1_gt.csv ^
  --inf2 path/to/trip2_inference.csv --gt2 path/to/trip2_gt.csv ^
  --aa-inf path/to/annarbor_inference.csv --aa-gt path/to/annarbor_gt.csv ^
  --out ..\outputs\evaluation_run

# Linux/macOS
# python run_all.py \
#   --inf1 path/to/trip1_inference.csv --gt1 path/to/trip1_gt.csv \
#   --inf2 path/to/trip2_inference.csv --gt2 path/to/trip2_gt.csv \
#   --aa-inf path/to/annarbor_inference.csv --aa-gt path/to/annarbor_gt.csv \
#   --out ../outputs/evaluation_run

See evaluation/README.md for per-dataset commands and metric definitions.
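
For orientation, the sketch below shows one conventional way to build a confusion matrix and derive per-class precision, recall, and F1 from paired ground-truth and predicted labels. It is a generic illustration under assumed function names. The definitions the evaluation scripts actually use are the ones in evaluation/README.md and may differ, for example in averaging or in how unmatched events are handled.

```python
from typing import Dict, List, Sequence, Tuple


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     labels: Sequence[str]) -> List[List[int]]:
    """matrix[i][j] = number of samples with true label i predicted as j."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def per_class_prf(matrix: List[List[int]],
                  labels: Sequence[str]) -> Dict[str, Tuple[float, float, float]]:
    """Return {label: (precision, recall, f1)}; empty denominators give 0.0."""
    scores = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        predicted = sum(row[i] for row in matrix)   # column sum
        actual = sum(matrix[i])                     # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(scores: Dict[str, Tuple[float, float, float]]) -> float:
    """Unweighted mean of per-class F1, so rare classes count equally."""
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)
```

Macro averaging is shown because it weights each class equally, which matters when rare types such as large vans have few test samples.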

Model weights

model/checkpoint_final/model.safetensors (~330 MB) is not stored in this repository.

To obtain weights:

Option A (download): Check the Releases page or the associated paper for a direct download link. Place the file at model/checkpoint_final/model.safetensors.
Option B (retrain): Follow the Reproducing training section above. Training takes ~45 minutes on a single NVIDIA RTX 6000 GPU.

The small sidecar files (config.json, label_mapping.json, preprocessor_config.json) are already in this repository and do not need to be downloaded separately.
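
Before running inference, a quick check such as the one below can confirm that the weights and sidecar files are in place. It is a sketch under assumptions: the function name is hypothetical, and because the exact directory holding the sidecar files is not stated here, the check searches for them anywhere under the model directory.

```python
from pathlib import Path
from typing import List

# Paths and file names as given in this README.
WEIGHTS = Path("checkpoint_final") / "model.safetensors"
SIDECARS = ("config.json", "label_mapping.json", "preprocessor_config.json")


def missing_model_files(model_dir: Path) -> List[str]:
    """List expected model files that cannot be found under model_dir."""
    missing = []
    if not (model_dir / WEIGHTS).is_file():
        missing.append(WEIGHTS.as_posix())
    for name in SIDECARS:
        # ASSUMPTION: sidecars may live at any depth below model_dir.
        if next(model_dir.rglob(name), None) is None:
            missing.append(name)
    return missing


# Example use from the repository root:
# problems = missing_model_files(Path("model"))
# if problems:
#     raise SystemExit("Missing model files: " + ", ".join(problems))
```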

Citation

If you use this code or model, please cite the associated dissertation or publication once available, and cite the Hugging Face models whose IDs appear in training/config.py and inference/config.py.

License

See LICENSE.
