Skip to content

botmahn/slowfast

M2MVT x DAAD in PySlowFast

License

An unofficial pytorch implementation for Early Anticipation of Driving Maneuvers based on the PySlowFast framework.

Overview

This repository attempts to reproduce the methodologies and results from the paper Early Anticipation of Driving Maneuvers (Abdul Wasi et al., ECCV 2024). The implementation processes multi-view and multi-modal driving sequences using 6 different camera views.

Key Features

  • Specialized dataloader for multi-view (6 cameras) driving sequences.
  • M2MVT architecture with spatio-temporal tubes, early fusion for the first 5 views and late fusion for the last (gaze) view.
  • Additional learnable memory tokens.
  • Built on Facebook AI Research's PySlowFast framework.

Installation

Installation steps are taken from here.

Prerequisites

  • Python 3.8
  • CUDA 11.7
  • PyTorch 1.13.0
  • TorchVision 0.14.0 (compiled from source)

Setup Environment

# Create and activate conda environment
conda create -n slowfast python=3.8
conda activate slowfast

# Install PyTorch ecosystem with CUDA support
conda install -y pytorch==1.13.0 torchvision==0.14.0 torchaudio==0.13.0 pytorch-cuda=11.7 -c pytorch -c nvidia

# Install FFmpeg
conda install -y -c conda-forge ffmpeg=4.2

Rebuild TorchVision from Source

This step is necessary to fix video decoding issues:

# Uninstall current torchvision
pip uninstall -y torchvision

# Clone and build TorchVision v0.14.0
git clone https://github.com/pytorch/vision.git
cd vision
git checkout v0.14.0
python setup.py install
cd ..

Install Dependencies

# Install PyTorchVideo from source
pip install "git+https://github.com/facebookresearch/pytorchvideo.git"

# Core dependencies
pip install simplejson opencv-python psutil
conda install -y -c conda-forge iopath
conda install -y tensorboard

# Analysis and data tools
pip install scikit-learn pandas
conda install -y -c conda-forge moviepy

# Additional frameworks
pip install 'git+https://github.com/facebookresearch/fairscale'
pip install cython
pip install -U 'git+https://github.com/facebookresearch/fvcore.git' 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'

# Install Detectron2
git clone https://github.com/facebookresearch/detectron2 detectron2_repo
pip install -e detectron2_repo

Install PySlowFast

# Clone repository
git clone https://github.com/facebookresearch/slowfast

# Add to PYTHONPATH
echo 'export PYTHONPATH=/path/to/slowfast:$PYTHONPATH' >> ~/.bashrc
source ~/.bashrc

# Build PySlowFast
cd slowfast
python setup.py build develop

Fixing Known Issues

After installation, you'll need to apply these critical fixes:

1. Fix Import Paths in tools/run_net.py

Replace:

# from vision.fair.slowfast.tools.demo_net import demo
# from vision.fair.slowfast.tools.test_net import test
# from vision.fair.slowfast.tools.train_net import train
# from vision.fair.slowfast.tools.visualization import visualize

With:

from demo_net import demo
from test_net import test
from train_net import train
from visualization import visualize

2. Fix Import Paths in slowfast/utils/ava_eval_helper.py

Replace:

# from vision.fair.slowfast.ava_evaluation import (
#     object_detection_evaluation,
#     standard_fields,
# )

With:

from ava_evaluation import (
    object_detection_evaluation,
    standard_fields,
)

3. Fix "NumPy array not writable" Error in slowfast/datasets/decoder.py

Replace:

# video_tensor = torch.from_numpy(np.frombuffer(video_handle, dtype=np.uint8))

With:

video_tensor = torch.from_numpy(np.frombuffer(np.array(video_handle), dtype=np.uint8))

Using Custom Data

Using custom data is taken from here.

Download the DAAD dataset from here

Follow these steps to prepare your custom dataset:

  1. Create the following directory structure:
SlowFast/
├── configs/
│       └── MyData/
│               └── I3D_8x8_R50.yaml
├── data/
│       └── MyData/
│               ├── ClassA/
│               │      └── video1.mp4
│               ├── ClassB/
│               │      └── video2.mp4
│               ├── ClassC/
│               |      └── video3.mp4
│               ├── train.csv
│               ├── test.csv
│               ├── val.csv
│               └── classids.json
  1. Create dataset handler:

    • Duplicate slowfast/datasets/kinetics.py and rename it to mydata.py
    • Replace all occurrences of "Kinetics" with "Mydata" (case-sensitive)
    • Add from .mydata import Mydata to slowfast/datasets/__init__.py
  2. Create JSON class mapping file (classids.json):

{"ClassA": 0, "ClassB": 1, "ClassC": 2}
  1. Create CSV dataset split files with format:
/path/to/SlowFast/data/MyData/ClassA/video1.mp4 0
/path/to/SlowFast/data/MyData/ClassC/video3.mp4 2
  1. Create configuration file by copying an existing one and changing "kinetics" to "mydata"

Training and Testing

To train the model:

python tools/run_net.py --cfg configs/DAAD/MVITv2_S_16x4_daad.yaml >& ./logs/log_m2mvt_daadsixviews.txt &

To test the model:

vim configs/DAAD/MVITv2_S_16x4_daad.yaml

set TRAIN.ENABLE to False
set TEST.ENABLE to True
set NUM_GPUS = 1

RUN:
python tools/run_net.py --cfg configs/DAAD/MVITv2_S_16x4_daad.yaml

Model Zoo

For pre-trained models and baseline results, refer to the PySlowFast Model Zoo.

For M2MVT x DAAD weights, refer to our MODEL ZOO.

Acknowledgements

This work builds upon several important contributions:

@inproceedings{feichtenhofer2019slowfast,
  title={Slowfast networks for video recognition},
  author={Feichtenhofer, Christoph and Fan, Haoqi and Malik, Jitendra and He, Kaiming},
  booktitle={Proceedings of the IEEE/CVF international conference on computer vision},
  pages={6202--6211},
  year={2019}
}

@inproceedings{adm2024daad,
  author       = {Abdul Wasi, Shankar Gangisetty, Shyam Nandan Rai and C. V. Jawahar},
  title        = {Early Anticipation of Driving Maneuvers},
  booktitle    = {ECCV (70)},
  series       = {Lecture Notes in Computer Science},
  volume       = {15128},
  pages        = {152--169},
  publisher    = {Springer},
  year         = {2024}
}

@misc{tziafas2023earlylatefusionmatters,
      title={Early or Late Fusion Matters: Efficient RGB-D Fusion in Vision Transformers for 3D Object Recognition}, 
      author={Georgios Tziafas and Hamidreza Kasaei},
      year={2023},
      eprint={2210.00843},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2210.00843}, 
}

@misc{sandler2022finetuningimagetransformersusing,
      title={Fine-tuning Image Transformers using Learnable Memory}, 
      author={Mark Sandler and Andrey Zhmoginov and Max Vladymyrov and Andrew Jackson},
      year={2022},
      eprint={2203.15243},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2203.15243}, 
}

License

This project is released under the Apache 2.0 license, in accordance with the original PySlowFast repository.

About

An unofficial pytorch implementation of "Early Anticipation of Driving Maneuvers" based on the PySlowFast framework.

Topics

Resources

License

Code of conduct

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages