In this project we focus on bird's-eye-view (BEV) prediction from monocular photos taken by cameras mounted on top of a car. We present a Maximum Mean Discrepancy Variational Autoencoder (MMD VAE) model to predict the BEV road layout. We also contribute an approach that combines image warping, a U-Net, and post-processing to predict bounding boxes (BB) on the BEV layout. Our models achieve a test threat score of 0.81 on the road layout prediction task and 0.072 on the BB prediction task. Animations below visualize the predictions of our final models.
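For readers unfamiliar with the metric, the threat score (also called the critical success index) is TP / (TP + FP + FN). The snippet below is a minimal per-pixel sketch for binary masks using NumPy. The function name `threat_score` and the convention of returning 1.0 when both masks are empty are assumptions made for illustration, and the project's own evaluation code (especially for bounding boxes) may count matches differently.

```python
import numpy as np


def threat_score(pred, target):
    """Per-pixel threat score TP / (TP + FP + FN) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)

    tp = np.logical_and(pred, target).sum()    # predicted and present
    fp = np.logical_and(pred, ~target).sum()   # predicted but absent
    fn = np.logical_and(~pred, target).sum()   # missed

    denom = tp + fp + fn
    # Assumption: two empty masks count as a perfect match.
    return float(tp / denom) if denom > 0 else 1.0


if __name__ == "__main__":
    pred = [[1, 1], [0, 0]]
    target = [[1, 0], [1, 0]]
    print(threat_score(pred, target))
```

True negatives do not appear in the formula, so large empty regions of the BEV grid do not inflate the score.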
Use generate_labels.py to generate:
- vehicles mask
- road mask
- warped and glued photos
Refer to road_layout_prediction/ for code used to train and test road layout prediction models.
- Main notebook, Training & Evaluation: road_layout_prediction.ipynb
- Model Architectures and Loss functions: modelzoo.py
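As background for the MMD VAE loss, the sketch below estimates the Maximum Mean Discrepancy between two sample sets with a Gaussian kernel. In an MMD VAE, this term compares encoded latent codes with samples from the prior. This is a NumPy illustration only, not the code in modelzoo.py. The function names, the bandwidth `sigma`, and the use of the simple biased estimator are all assumptions.

```python
import numpy as np


def gaussian_kernel(x, y, sigma=1.0):
    """Pairwise Gaussian kernel matrix between rows of x and rows of y."""
    diff = x[:, None, :] - y[None, :, :]        # shape (n, m, d)
    sq_dist = np.sum(diff ** 2, axis=-1)        # shape (n, m)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd(x, y, sigma=1.0):
    """Biased estimate of squared MMD between sample sets x and y."""
    k_xx = gaussian_kernel(x, x, sigma).mean()
    k_yy = gaussian_kernel(y, y, sigma).mean()
    k_xy = gaussian_kernel(x, y, sigma).mean()
    return k_xx + k_yy - 2.0 * k_xy


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    prior = rng.normal(size=(200, 2))              # samples from N(0, I)
    close = rng.normal(size=(200, 2))              # same distribution
    far = rng.normal(loc=3.0, size=(200, 2))       # shifted distribution
    print(mmd(prior, close), mmd(prior, far))
```

A training loss would typically add this term, computed on a batch of latent codes, to the reconstruction loss with some weighting.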
Refer to vehicle_layout_predictions/ for preprocessing, modeling and postprocessing.
- Image warping and glueing (OpenCV, Kornia): preprocessing/
- U-Net model (fastai): notebooks/fastai_final_for_cars.ipynb
- Converting segmentation map to bounding box coordinates (OpenCV): postprocessing/
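To illustrate the last step, the sketch below turns a binary segmentation mask into one box per connected blob. The repository does this with OpenCV. This example instead uses a plain flood fill with NumPy to stay dependency-free, and it produces axis-aligned boxes only, so it is a simplified stand-in rather than the project's method. The name `mask_to_boxes` and the `min_area` filter are hypothetical.

```python
from collections import deque

import numpy as np


def mask_to_boxes(mask, min_area=1):
    """Return (row_min, col_min, row_max, col_max) for each 4-connected blob."""
    mask = np.asarray(mask, dtype=bool)
    visited = np.zeros(mask.shape, dtype=bool)
    height, width = mask.shape
    boxes = []

    for r in range(height):
        for c in range(width):
            if not mask[r, c] or visited[r, c]:
                continue
            # Breadth-first flood fill over one blob, tracking its extent.
            queue = deque([(r, c)])
            visited[r, c] = True
            r0 = r1 = r
            c0 = c1 = c
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                r0, r1 = min(r0, y), max(r1, y)
                c0, c1 = min(c0, x), max(c1, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny, nx] and not visited[ny, nx]:
                        visited[ny, nx] = True
                        queue.append((ny, nx))
            # Drop tiny blobs, which are often segmentation noise.
            if area >= min_area:
                boxes.append((r0, c0, r1, c1))
    return boxes


if __name__ == "__main__":
    demo = np.zeros((6, 8), dtype=np.uint8)
    demo[1:3, 1:4] = 1
    demo[4:6, 5:8] = 1
    print(mask_to_boxes(demo))
```

Rotated vehicles would need oriented boxes, which an axis-aligned extent like this cannot represent.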
- Shuffle and learn: ssl_ideas/shuffle_and_learn
- Contrastive learning, SimCLR: ssl_ideas/simclr
Libraries used
Parts of code sourced from:
- https://github.com/Chenyang-Lu/mono-semantic-occupancy
- https://github.com/napsternxg/pytorch-practice/blob/master/Pytorch%20-%20MMD%20VAE.ipynb
- https://github.com/mdiephuis/SimCLR/blob/master/loss.py
- https://github.com/guptv93/saycam-metric-learning/blob/master/data_util/simclr_transforms.py
- InfoVAE (https://arxiv.org/abs/1706.02262)
- Understanding MMD: https://ermongroup.github.io/blog/a-tutorial-on-mmd-variational-autoencoders/
- MonoOccupancy (https://arxiv.org/pdf/1804.02176.pdf)
- UNet (https://arxiv.org/abs/1505.04597)
- Monocular Plan View Networks for Autonomous Driving: https://arxiv.org/pdf/1905.06937.pdf
- Review of papers on 3D object detection: https://towardsdatascience.com/monocular-3d-object-detection-in-autonomous-driving-2476a3c7f57e
- Inverse perspective mapping (IPM): from monocular images to Birds-eye-view (BEV) images

