badri999/MMDC-Net

Multi Modal Depth Completion Network

Official PyTorch implementation of the depth completion framework published in Optics and Lasers in Engineering.

Paper: https://doi.org/10.1016/j.optlaseng.2025.109587

Overview

This repository implements a multi-modal depth completion network that fuses sparse depth measurements, grayscale imagery, and monocular depth estimates to generate dense depth maps. The model employs circular convolutions and a multi-branch architecture optimized for efficient inference (~2.5M parameters). The overall framework is outlined below:

Figure 1: Overall workflow of the proposed approach. MMDC-Net fuses sparse FPP depth, grayscale imagery, and relative depth priors to recover dense geometry in unreliable regions. Balasubramaniam et al. (2026).

Sample Results

Figure 2: MMDC-Net sample results showing depth completion on hard drives. MMDC-Net successfully recovers geometry where traditional fringe projection fails. (a) Ground Truth | (b) Our Method (FPP + MMDC-Net) | (c) MMDC-Net prediction on sparse regions | (d) Ground Truth on sparse regions | (e) Error map. Balasubramaniam et al. (2026).

Key Features

  • Circular Convolution Kernels: Novel convolution operation for improved spatial feature extraction
  • Multi-Modal Fusion: Combines three complementary modalities (sparse depth, grayscale, relative depth)
  • Lightweight Architecture: SqueezeNet-inspired design with Fire modules for efficiency
  • Gated Convolutions: Adaptive feature gating for improved fusion quality
  • Squeeze-and-Excitation: Channel attention mechanism for feature refinement
  • Flexible Loss Functions: Supports L1, L2, SSIM, and Laplacian pyramid losses with custom weighting

Architecture

The model consists of four main branches:

  1. Sparse Depth Branch: Processes sparse structured light measurements
  2. Grayscale Branch: Extracts texture and edge information from intensity images
  3. Relative Depth Branch: Incorporates monocular depth priors (Depth-Anything-v2)
  4. Fusion Branch: Combines multi-scale features with gated convolutions and SE blocks

Each parallel branch uses an encoder-decoder architecture with skip connections. The fusion branch integrates outputs and encoder features from all three modalities to predict dense depth maps.

Figure 3: The proposed MMDC-Net architecture. Balasubramaniam et al. (2026).
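
As a rough illustration of the branch layout described above, the toy sketch below wires three small encoder-decoder branches into a fusion head. It is not the repository's model: `TinyBranch`, `ToyFusionNet`, the channel widths, and the use of plain convolutions in place of the circular, gated, and SE layers are all assumptions made to keep the example short.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyBranch(nn.Module):
    """Hypothetical one-level encoder-decoder with a single skip connection."""

    def __init__(self, in_ch: int = 1, ch: int = 8):
        super().__init__()
        self.enc1 = nn.Conv2d(in_ch, ch, 3, padding=1)
        self.enc2 = nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1)  # downsample by 2
        self.dec = nn.Conv2d(ch * 2 + ch, ch, 3, padding=1)
        self.head = nn.Conv2d(ch, 1, 1)

    def forward(self, x):
        e1 = F.relu(self.enc1(x))
        e2 = F.relu(self.enc2(e1))
        up = F.interpolate(e2, size=e1.shape[-2:], mode="nearest")
        d = F.relu(self.dec(torch.cat([up, e1], dim=1)))  # skip connection
        return self.head(d), e1  # branch prediction and one encoder feature map


class ToyFusionNet(nn.Module):
    """Three modality branches followed by a small fusion head."""

    def __init__(self, ch: int = 8):
        super().__init__()
        self.sparse = TinyBranch(1, ch)
        self.gray = TinyBranch(1, ch)
        self.rel = TinyBranch(1, ch)
        # The fusion head sees 3 branch outputs plus 3 encoder feature maps.
        self.fuse = nn.Sequential(
            nn.Conv2d(3 + 3 * ch, ch, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(ch, 1, 1),
        )

    def forward(self, sparse_depth, gray, rel_depth):
        outs, feats = zip(
            self.sparse(sparse_depth), self.gray(gray), self.rel(rel_depth)
        )
        return self.fuse(torch.cat(list(outs) + list(feats), dim=1))
```

Calling `ToyFusionNet()(sparse, gray, rel)` on three single-channel tensors of equal size returns a single-channel map of the same spatial size.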

Installation

# Clone the repository
git clone https://github.com/badri999/MMDC-Net.git
cd MMDC-Net

# Create the environment from the file
conda env create -f environment.yml

# Activate the environment
conda activate mmdc-net

## Dataset Structure

**Important:** The training data resides directly in the root folder. The validation and test sets are nested subfolders.

dataset_root/
├── sparse_depth_z/              # [Training] Sparse depth map (CSV)
├── sparse_mask/                 # [Training] Masks of unreliable regions (PNG)
├── grayscale/                   # [Training] Projector-illuminated images (PNG)
├── depth_anything_v2_map_1512/  # [Training] Relative depth from Depth-Anything-V2 (PNG)
├── gt_depth/                    # [Training] Ground truth depth map (CSV)
├── shadow_mask_gt/              # [Training] Shadow region masks (PNG)
├── background_mask_gt/          # [Training] (Required folder) Background masks (PNG)
│
├── valid/                       # [Validation] Nested folder
│   ├── sparse_depth_z/
│   ├── sparse_mask/
│   ├── grayscale/
│   ├── depth_anything_v2_map_1512/
│   ├── gt_depth/
│   ├── shadow_mask_gt/
│   └── background_mask_gt/
│
└── test/                        # [Test] Nested folder
    ├── sparse_depth_z/
    ├── sparse_mask/
    ├── grayscale/
    ├── depth_anything_v2_map_1512/
    ├── gt_depth/
    ├── shadow_mask_gt/
    └── background_mask_gt/
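
Before training, it can help to confirm that every folder in the tree above exists. The helper below is a minimal sketch using only the standard library; `missing_dataset_dirs` is a hypothetical name and is not part of the repository.

```python
from pathlib import Path

# Folder names copied from the tree above.
MODALITY_DIRS = [
    "sparse_depth_z", "sparse_mask", "grayscale",
    "depth_anything_v2_map_1512", "gt_depth",
    "shadow_mask_gt", "background_mask_gt",
]


def missing_dataset_dirs(dataset_root):
    """Return the expected folders that are absent under dataset_root."""
    root = Path(dataset_root)
    missing = []
    # Training data sits directly in the root; valid/ and test/ are nested.
    for split in ("", "valid", "test"):
        for name in MODALITY_DIRS:
            folder = root / split / name
            if not folder.is_dir():
                missing.append(folder.relative_to(root).as_posix())
    return missing
```

An empty list means the layout matches; otherwise each entry is a path relative to the dataset root, such as `valid/gt_depth`.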

> **Note on Background Masks:** The `background_mask_gt` folder is **required** by the dataloader structure, even though the masks are not currently used for training or inference. You must ensure this folder exists (populated with dummy 512x512 PNGs if you wish) to avoid file path errors.
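
One way to create the dummy masks mentioned in the note is sketched below with the standard library only. It writes an all-black 512x512 grayscale PNG for every file found in each split's `sparse_mask` folder. Mirroring the `sparse_mask` file names is an assumption about how the dataloader pairs files, and `blank_png` and `fill_background_masks` are hypothetical helpers, so check the dataloader before relying on this.

```python
import struct
import zlib
from pathlib import Path


def _chunk(tag: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, tag, data, CRC over tag + data."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def blank_png(width: int = 512, height: int = 512) -> bytes:
    """Return an all-black 8-bit grayscale PNG as bytes."""
    # IHDR: width, height, bit depth 8, colour type 0 (grayscale), no interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline is a filter byte (0) followed by `width` zero pixels.
    raw = (b"\x00" + b"\x00" * width) * height
    return (
        b"\x89PNG\r\n\x1a\n"
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(raw))
        + _chunk(b"IEND", b"")
    )


def fill_background_masks(dataset_root, reference_dir="sparse_mask"):
    """Write one dummy mask per PNG found in each split's reference folder."""
    root = Path(dataset_root)
    written = 0
    for split in ("", "valid", "test"):
        out_dir = root / split / "background_mask_gt"
        out_dir.mkdir(parents=True, exist_ok=True)
        for ref in sorted((root / split / reference_dir).glob("*.png")):
            (out_dir / ref.name).write_bytes(blank_png())
            written += 1
    return written
```

`fill_background_masks("dataset_root")` returns the number of files written, which should equal the total number of sparse masks across the three splits.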

## Usage

### Training

```bash
python hdd_training_script_2_5_ck.py \
    --config config.json \
    --train-on-shadow-regions
```

### Resume Training

```bash
python hdd_training_script_2_5_ck.py \
    --config config.json \
    --resume checkpoints/best-epoch50.pth \
    --resume-lr 0.0001 \
    --reset-best-loss
```

### Configuration

Use the existing JSON configuration files provided in the repository.

### Inference

```bash
python inference_script.py
```

Model Components

Fire Module

Efficient squeeze-expand module with 1×1 squeeze convolution followed by parallel 1×1 and 3×3 expand convolutions (with circular kernels).
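
The squeeze-expand pattern can be sketched in PyTorch as follows. This follows the standard SqueezeNet Fire layout; the class name `FireSketch`, the channel arguments, and the ReLU activations are assumptions, and a plain `nn.Conv2d` stands in for the circular 3×3 kernel used by the repository.

```python
import torch
import torch.nn as nn


class FireSketch(nn.Module):
    """SqueezeNet-style Fire block with standard convolutions."""

    def __init__(self, in_ch: int, squeeze_ch: int, expand_ch: int):
        super().__init__()
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, kernel_size=1)
        self.expand1x1 = nn.Conv2d(squeeze_ch, expand_ch, kernel_size=1)
        # Stand-in for the circular 3x3 expand convolution.
        self.expand3x3 = nn.Conv2d(squeeze_ch, expand_ch, kernel_size=3, padding=1)
        self.act = nn.ReLU()

    def forward(self, x):
        s = self.act(self.squeeze(x))  # reduce channels first
        # Run both expand paths in parallel and concatenate along channels.
        return torch.cat(
            [self.act(self.expand1x1(s)), self.act(self.expand3x3(s))], dim=1
        )
```

The output has `2 * expand_ch` channels and the same spatial size as the input; the parameter savings come from the 3×3 convolution only seeing the narrow squeezed tensor.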

Gated Circular Convolution

Feature gating mechanism: output = φ(features) ⊙ σ(gating) where both feature and gating branches use circular convolutions.
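
A minimal sketch of this gating formula is shown below. Here σ is taken to be the sigmoid, φ is assumed to be ELU (the text does not specify it), and plain `nn.Conv2d` layers replace the circular convolutions; `GatedConvSketch` is a hypothetical name.

```python
import torch
import torch.nn as nn


class GatedConvSketch(nn.Module):
    """output = phi(feature_conv(x)) * sigmoid(gate_conv(x))."""

    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        self.feature = nn.Conv2d(in_ch, out_ch, kernel_size, padding=pad)
        self.gate = nn.Conv2d(in_ch, out_ch, kernel_size, padding=pad)
        self.phi = nn.ELU()  # assumed activation; the source does not name phi

    def forward(self, x):
        # The sigmoid gate lies in (0, 1) and scales each feature element-wise.
        return self.phi(self.feature(x)) * torch.sigmoid(self.gate(x))
```

Because the gate is learned per pixel and per channel, the layer can suppress features in regions where an input modality is unreliable, for example where the sparse depth is missing.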

Circular Convolution (CircleConv3x3)

Custom convolution operation that applies a circular transformation matrix to the kernel for enhanced spatial feature learning.
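
One common way to realise a circular 3×3 kernel is to sample the eight outer taps on a circle instead of a square, using bilinear interpolation, and to fold that interpolation into a fixed 9×9 matrix applied to the kernel weights. The sketch below illustrates that idea only; the repository's `CircleConv3x3` may be implemented differently, and `circle_transform_matrix` and `CircleConvSketch` are hypothetical names.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


def circle_transform_matrix(radius: float = 1.0) -> torch.Tensor:
    """9x9 matrix B: row k holds the bilinear weights of tap k over the 3x3 grid.

    Assumes 0 < radius <= 1 so every sample point stays inside the grid.
    """
    B = torch.zeros(9, 9)
    for i in range(3):
        for j in range(3):
            dy, dx = i - 1, j - 1
            norm = math.hypot(dy, dx)
            scale = radius / norm if norm > 0 else 0.0
            y, x = 1 + dy * scale, 1 + dx * scale  # sample position in grid coords
            y0, x0 = min(math.floor(y), 1), min(math.floor(x), 1)
            wy, wx = y - y0, x - x0
            for yy, xx, w in [
                (y0, x0, (1 - wy) * (1 - wx)),
                (y0, x0 + 1, (1 - wy) * wx),
                (y0 + 1, x0, wy * (1 - wx)),
                (y0 + 1, x0 + 1, wy * wx),
            ]:
                B[i * 3 + j, yy * 3 + xx] += w
    return B


class CircleConvSketch(nn.Module):
    """3x3 convolution whose taps sample a circle rather than a square."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.register_buffer("B", circle_transform_matrix())

    def forward(self, x):
        w = self.conv.weight  # shape (out, in, 3, 3)
        # Effective square kernel: each grid pixel collects the weights of
        # every circular tap that interpolates from it.
        w_eff = (w.flatten(2) @ self.B).reshape(w.shape)
        return F.conv2d(x, w_eff, self.conv.bias, padding=1)
```

With `radius=1.0` the centre and the four axis-aligned taps coincide with grid pixels, so only the four diagonal taps are interpolated.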

Training Features

  • Multi-Loss Training: Combine L1, L2, SSIM, and Laplacian losses with custom weights
  • Shadow Region Handling: Optional masking of shadow regions during training
  • Data Augmentation: Built-in augmentation pipeline (see hdd_data_augmenter.py)
  • WandB Integration: Automatic experiment tracking and visualization
  • Checkpoint Management: Automatic best model saving and periodic checkpoints
  • Learning Rate Scheduling: Support for step, cosine, and plateau schedulers
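
For the scheduler options listed above, a name-to-scheduler mapping might look like the sketch below. The function name `build_scheduler` and the hyperparameter values (step size, decay factor, patience) are illustrative assumptions, not the repository's defaults; the scheduler classes themselves are standard PyTorch.

```python
import torch


def build_scheduler(optimizer, name: str, epochs: int = 100):
    """Map a scheduler name to a torch.optim.lr_scheduler instance."""
    sched = torch.optim.lr_scheduler
    if name == "step":
        # Decay the learning rate by 10x roughly three times over training.
        return sched.StepLR(optimizer, step_size=max(epochs // 3, 1), gamma=0.1)
    if name == "cosine":
        return sched.CosineAnnealingLR(optimizer, T_max=epochs)
    if name == "plateau":
        # Halve the learning rate when the monitored loss stops improving.
        return sched.ReduceLROnPlateau(optimizer, mode="min", factor=0.5, patience=5)
    raise ValueError(f"unknown scheduler: {name!r}")
```

Note that `ReduceLROnPlateau` is stepped with the monitored metric, as in `scheduler.step(val_loss)`, while the other two are stepped without arguments once per epoch.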

Loss Functions

The framework supports multiple loss functions defined in compute_loss.py:

  • L1 Loss: Mean Absolute Error with optional masking
  • L2 Loss: Mean Squared Error with optional masking
  • SSIM Loss: Structural similarity with cropping and masking
  • Laplacian Loss: Multi-scale edge-aware loss using Laplacian pyramids

Each loss can be independently weighted for shadow and sparse regions.
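
To make the per-region weighting concrete, the sketch below combines masked L1 and L2 terms for shadow and sparse regions. The function names and the weight keys are hypothetical and do not mirror the actual interface of `compute_loss.py`; the SSIM and Laplacian terms are omitted for brevity.

```python
import torch


def masked_l1(pred, target, mask):
    """Mean absolute error over pixels where mask is non-zero."""
    mask = mask.to(pred.dtype)
    return ((pred - target).abs() * mask).sum() / mask.sum().clamp(min=1.0)


def masked_l2(pred, target, mask):
    """Mean squared error over pixels where mask is non-zero."""
    mask = mask.to(pred.dtype)
    return (((pred - target) ** 2) * mask).sum() / mask.sum().clamp(min=1.0)


def combined_loss(pred, target, shadow_mask, sparse_mask, weights):
    """Weighted sum of region-specific terms; missing weight keys count as 0."""
    terms = {
        "l1_shadow": masked_l1(pred, target, shadow_mask),
        "l1_sparse": masked_l1(pred, target, sparse_mask),
        "l2_shadow": masked_l2(pred, target, shadow_mask),
        "l2_sparse": masked_l2(pred, target, sparse_mask),
    }
    return sum(weights.get(key, 0.0) * value for key, value in terms.items())
```

Dividing by the mask sum (clamped to at least 1) keeps each term an average over its own region, so a small shadow region is not drowned out by a large sparse one, and an empty mask yields a zero term rather than a division by zero.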

Citation

If you use this code in your research, please cite our paper:

@article{balasubramaniam2026application,
    title={Application-driven multi-modal depth completion in fringe projection profilometry},
    author={Balasubramaniam, Badrinath and Suresh, Vignesh and Cheng, Yang and Li, Jiaqiong and Li, Beiwen},
    journal={Optics and Lasers in Engineering},
    year={2026},
    doi={10.1016/j.optlaseng.2025.109587},
    url={https://doi.org/10.1016/j.optlaseng.2025.109587}
}

License

GPL-3.0 license

Acknowledgments

Contact

For questions or issues, please open an issue on GitHub or contact bb2@uga.edu.

About

Deep learning-based depth completion for 3D fringe projection profilometry systems
