Bird's Eye View layout prediction: roads and cars

Final project for Deep Learning course (DS-GA 1008, NYU Center for Data Science)

Top-1 in both tasks: road layout prediction and car bounding boxes prediction

Shreyas Chandrakaladharan, Marina Zavalina, Philip Ekfeldt

Report | Video

Abstract

In this project we focus on Bird's Eye View (BEV) prediction from monocular photos taken by cameras mounted on top of a car. We present a Maximum Mean Discrepancy Variational Autoencoder (MMD VAE) model to predict the BEV road layout. We also contribute an approach that combines image warping, a U-Net, and post-processing to predict vehicle bounding boxes (BB) on the BEV layout. Our models achieve a test threat score of 0.81 on the road layout prediction task and 0.072 on the BB prediction task. The animations below visualize the predictions of our final models.
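For readers unfamiliar with the metric, the threat score (also called the critical success index) of a binary map is TP / (TP + FP + FN). The snippet below is a minimal NumPy sketch of that pixel-wise definition only; it is not the project's evaluation code. The function name and the handling of two empty masks are assumptions, and how the score was computed for bounding boxes is not described here.

```python
import numpy as np


def threat_score(pred, target):
    """Pixel-wise threat score: TP / (TP + FP + FN)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = np.logical_and(pred, target).sum()    # predicted and present
    fp = np.logical_and(pred, ~target).sum()   # predicted but absent
    fn = np.logical_and(~pred, target).sum()   # present but missed
    denom = tp + fp + fn
    # Assumption: two empty masks count as a perfect match.
    return 1.0 if denom == 0 else float(tp) / float(denom)


# Toy example: TP = 2, FP = 1, FN = 1, so the score is 2 / 4.
pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
score = threat_score(pred, target)
```

True negatives do not enter the formula, so large empty regions of the BEV map do not inflate the score.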

Animations

Predictions of the final models

Projecting photos to Bird's Eye View

Usage

Generate and save labels

Use generate_labels.py to generate:

vehicles mask
road mask
warped and glued photos
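As background on the warping step, a photo of a flat road can be mapped to a top-down view with a planar homography (inverse perspective mapping). The sketch below estimates a homography from four point correspondences with plain NumPy and applies it to points. It is an illustration under stated assumptions, not what generate_labels.py does: the project uses OpenCV and Kornia for warping, and the function names and pixel coordinates here are hypothetical.

```python
import numpy as np


def homography_from_points(src, dst):
    """Estimate a 3x3 homography from four or more point pairs (direct linear transform)."""
    rows = []
    for (x, y), (u, v) in zip(np.asarray(src, float), np.asarray(dst, float)):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the null vector of the stacked constraint matrix.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]


def apply_homography(H, pts):
    """Map (N, 2) points through H using homogeneous coordinates."""
    pts = np.asarray(pts, float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]


# Hypothetical road trapezoid in the photo (pixels) and its BEV rectangle.
src = [[120, 200], [186, 200], [306, 256], [0, 256]]
dst = [[0, 0], [100, 0], [100, 200], [0, 200]]
H = homography_from_points(src, dst)
warped_corners = apply_homography(H, src)
```

Warping a full image then amounts to sampling the photo at the inverse-mapped location of every BEV pixel, which is what library warp routines do internally.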

Road Layout Prediction

Refer to road_layout_prediction/ for code used to train and test road layout prediction models.

Main notebook, Training & Evaluation: road_layout_prediction.ipynb
Model Architectures and Loss functions: modelzoo.py
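For orientation, an MMD VAE in the sense of the InfoVAE paper and the MMD tutorial linked under "Papers and useful links" regularizes the latent space with a Maximum Mean Discrepancy term between encoded latents and samples from the prior, in place of the usual KL term. The snippet below is a minimal NumPy sketch of a Gaussian-kernel MMD estimate. It does not reproduce modelzoo.py; the function names, the bandwidth `sigma`, and the sample sizes are assumptions.

```python
import numpy as np


def gaussian_kernel(x, y, sigma=1.0):
    """Pairwise Gaussian kernel between rows of x (n, d) and y (m, d)."""
    sq_dist = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd(x, y, sigma=1.0):
    """Biased estimate of the squared MMD between two sample sets."""
    k_xx = gaussian_kernel(x, x, sigma).mean()
    k_yy = gaussian_kernel(y, y, sigma).mean()
    k_xy = gaussian_kernel(x, y, sigma).mean()
    return k_xx + k_yy - 2.0 * k_xy


rng = np.random.default_rng(0)
prior = rng.normal(size=(256, 8))              # samples from the N(0, I) prior
latents_close = rng.normal(size=(256, 8))      # latents that match the prior
latents_far = latents_close + 3.0              # latents shifted away from it

mmd_close = mmd(latents_close, prior)
mmd_far = mmd(latents_far, prior)
```

In training, a term like this would be added to the reconstruction loss with a weighting factor, so that the aggregate distribution of latents is pulled toward the prior.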

Bounding Boxes Prediction

Refer to vehicle_layout_prediction/ for preprocessing, modeling and postprocessing.

Image warping and gluing (OpenCV, Kornia): preprocessing/
U-Net model (fastai): notebooks/fastai_final_for_cars.ipynb
Converting segmentation map to bounding boxes coordinates (OpenCV): postprocessing/
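The last step above, turning a binary segmentation map into box coordinates, can be illustrated with a small dependency-light sketch. Note the swap: the project uses OpenCV for this step, while the code below uses a plain flood fill over 4-connected pixels and returns axis-aligned boxes. The function name, the `min_area` filter, and the (x_min, y_min, x_max, y_max) pixel format are assumptions; the box format used in postprocessing/ is not described here.

```python
from collections import deque

import numpy as np


def mask_to_boxes(mask, min_area=1):
    """Return (x_min, y_min, x_max, y_max) for each 4-connected blob in a binary mask."""
    mask = np.asarray(mask, dtype=bool)
    height, width = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    boxes = []
    for r0 in range(height):
        for c0 in range(width):
            if not mask[r0, c0] or seen[r0, c0]:
                continue
            # Breadth-first flood fill over one connected blob.
            queue = deque([(r0, c0)])
            seen[r0, c0] = True
            r_min = r_max = r0
            c_min = c_max = c0
            area = 0
            while queue:
                r, c = queue.popleft()
                area += 1
                r_min, r_max = min(r_min, r), max(r_max, r)
                c_min, c_max = min(c_min, c), max(c_max, c)
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < height and 0 <= nc < width
                            and mask[nr, nc] and not seen[nr, nc]):
                        seen[nr, nc] = True
                        queue.append((nr, nc))
            # Drop tiny blobs, which are likely segmentation noise.
            if area >= min_area:
                boxes.append((c_min, r_min, c_max, r_max))
    return boxes


# Toy mask: two car-sized blobs and one isolated noise pixel.
mask = np.zeros((8, 10), dtype=np.uint8)
mask[1:3, 1:5] = 1
mask[5:8, 6:8] = 1
mask[0, 9] = 1
boxes = mask_to_boxes(mask, min_area=2)
```

Filtering by area before emitting a box is one simple way to keep spurious pixels from becoming false-positive vehicles, which matters for a metric that penalizes false positives.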

Self-supervised learning (tried but didn't use in final models)

Shuffle and learn: ssl_ideas/shuffle_and_learn
Contrastive learning, SimCLR: ssl_ideas/simclr

Libraries used

Parts of code sourced from:

Papers and useful links:

InfoVAE: https://arxiv.org/abs/1706.02262
Understanding MMD: https://ermongroup.github.io/blog/a-tutorial-on-mmd-variational-autoencoders/
MonoOccupancy: https://arxiv.org/pdf/1804.02176.pdf
U-Net: https://arxiv.org/abs/1505.04597
Monocular Plan View Networks for Autonomous Driving: https://arxiv.org/pdf/1905.06937.pdf
Review of papers on 3D object detection: https://towardsdatascience.com/monocular-3d-object-detection-in-autonomous-driving-2476a3c7f57e
Inverse perspective mapping (IPM): from monocular images to Bird's Eye View (BEV) images
