mmarinated/ssl_project

Bird's Eye View layout prediction: roads and cars

Final project for Deep Learning course (DS-GA 1008, NYU Center for Data Science)

Placed first in both tasks: road layout prediction and car bounding box prediction

Shreyas Chandrakaladharan, Marina Zavalina, Philip Ekfeldt

Abstract

In this project we focus on Bird's Eye View (BEV) prediction from monocular photos taken by cameras mounted on top of a car. We present a Maximum Mean Discrepancy Variational Autoencoder (MMD VAE) model to predict the BEV road layout. We also contribute an approach that combines image warping, a U-Net, and post-processing to predict bounding boxes (BB) on the BEV layout. Our models achieve a test threat score of 0.81 on the road layout prediction task and 0.072 on the BB prediction task. The animations below visualize the predictions of our final models.
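For readers unfamiliar with the metric, the threat score (also called the critical success index) is TP / (TP + FP + FN). The sketch below computes it for two binary masks in plain Python. The function name `threat_score` and the handling of two empty masks are assumptions made for illustration; the course's actual evaluation script is not shown in this README and may differ in detail, especially for the bounding box task.

```python
def threat_score(pred, target):
    """Threat score of two equally sized binary masks (lists of rows).

    TS = TP / (TP + FP + FN); true negatives are ignored.
    """
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1      # predicted and present
            elif p:
                fp += 1      # predicted but absent
            elif t:
                fn += 1      # present but missed
    denom = tp + fp + fn
    # Assumption: two completely empty masks count as perfect agreement.
    return tp / denom if denom else 1.0


pred = [[1, 1],
        [0, 0]]
target = [[1, 0],
          [1, 0]]
score = threat_score(pred, target)  # 1 TP, 1 FP, 1 FN
```

Because true negatives do not enter the formula, the score is not inflated by large empty regions of the map.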

Animations

Predictions of the final models

[Animation: predictions of the final models]

Projecting photos to Bird's Eye View

[Animation: photos projected to Bird's Eye View]


Usage

Generate and save labels

Use generate_labels.py to generate and save:

  • vehicle masks
  • road masks
  • warped and glued photos
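As a rough illustration of what warping a photo towards a top-down view involves, the sketch below maps a single pixel coordinate through a 3x3 perspective (homography) matrix. It assumes the warp is a planar perspective transform; the function name `apply_homography` and the example matrices are hypothetical and are not taken from generate_labels.py.

```python
def apply_homography(H, x, y):
    """Map pixel (x, y) through a 3x3 homography H given as nested lists."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    # Divide by the homogeneous coordinate to return to pixel space.
    return u / w, v / w


# Hypothetical matrix: a pure translation by (2, 3).
shift = [[1, 0, 2],
         [0, 1, 3],
         [0, 0, 1]]
moved = apply_homography(shift, 10, 20)
```

A full image warp applies the same mapping (or its inverse) to every pixel and interpolates the result, which is the kind of work image libraries handle in a single call.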

Road Layout Prediction

Refer to road_layout_prediction/ for code used to train and test road layout prediction models.

  • Main notebook, Training & Evaluation: road_layout_prediction.ipynb
  • Model Architectures and Loss functions: modelzoo.py
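An MMD VAE regularizes the latent space with a Maximum Mean Discrepancy term between encoded latents and samples from the prior, in place of the usual KL term. The sketch below is a plain-Python estimate of squared MMD with a Gaussian kernel, intended only to show the quantity involved. It is not the code in modelzoo.py; the function names and the `sigma` bandwidth are assumptions.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two vectors given as sequences."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * sigma ** 2))


def mean_kernel(xs, ys, sigma=1.0):
    """Average kernel value over all pairs drawn from xs and ys."""
    total = sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, sigma=1.0):
    """Biased (V-statistic) estimate of squared MMD between two sample sets."""
    return (mean_kernel(xs, xs, sigma)
            + mean_kernel(ys, ys, sigma)
            - 2 * mean_kernel(xs, ys, sigma))


latents = [[0.0, 0.0], [0.1, 0.0]]        # stand-in for encoder outputs
prior_samples = [[3.0, 3.0], [3.1, 3.0]]  # stand-in for prior draws
gap = mmd_squared(latents, prior_samples)
```

The estimate is zero when both sample sets are identical and grows as the two distributions move apart, which is what makes it usable as a training penalty.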

Bounding Boxes Prediction

Refer to vehicle_layout_predictions/ for preprocessing, modeling and postprocessing.

  • Image warping and gluing (OpenCV, Kornia): preprocessing/
  • U-Net model (fastai): notebooks/fastai_final_for_cars.ipynb
  • Converting the segmentation map to bounding box coordinates (OpenCV): postprocessing/
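The last step, turning a segmentation map into bounding boxes, can be sketched without any dependencies. The version below swaps in a plain breadth-first flood fill in place of the OpenCV routines used in postprocessing/, and returns axis-aligned boxes only. The function name `mask_to_boxes` and the `(top, left, bottom, right)` box format are assumptions for illustration.

```python
from collections import deque


def mask_to_boxes(mask):
    """Return one (top, left, bottom, right) box per 4-connected blob.

    `mask` is a list of rows of 0/1 values; coordinates are inclusive.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Flood-fill this blob while tracking its extent.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, bottom, left, right = r, r, c, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


car_mask = [[1, 1, 0, 0],
            [1, 1, 0, 0],
            [0, 0, 0, 1],
            [0, 0, 0, 1]]
boxes = mask_to_boxes(car_mask)
```

Axis-aligned boxes are the simplest choice; rotated boxes would need an extra fitting step per blob.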

Self-supervised learning (tried, but not used in the final models)


Libraries used

Parts of code sourced from:

Papers and useful links:
