Multi Modal Depth Completion Network

Official PyTorch implementation of the depth completion framework published in Optics and Lasers in Engineering.

Paper: https://doi.org/10.1016/j.optlaseng.2025.109587

Overview

This repository implements a multi-modal depth completion network that fuses sparse depth measurements, grayscale imagery, and monocular depth estimates to generate dense depth maps. The model employs circular convolutions and a multi-branch architecture optimized for efficient inference (~2.5M parameters). The overall framework is outlined below:

Figure 1: Overall workflow of the proposed approach. MMDC-Net fuses sparse FPP depth, grayscale imagery, and relative depth priors to recover dense geometry in unreliable regions. Balasubramaniam et al. (2026).

Sample Results

Figure 2: MMDC-Net sample results showing depth completion on hard drives. MMDC-Net successfully recovers geometry where traditional fringe projection fails. (a) Ground truth | (b) Our method (FPP + MMDC-Net) | (c) MMDC-Net prediction on sparse regions | (d) Ground truth on sparse regions | (e) Error map. Balasubramaniam et al. (2026).

Key Features

Circular Convolution Kernels: Novel convolution operation for improved spatial feature extraction
Multi-Modal Fusion: Combines three complementary modalities (sparse depth, grayscale, relative depth)
Lightweight Architecture: SqueezeNet-inspired design with Fire modules for efficiency
Gated Convolutions: Adaptive feature gating for improved fusion quality
Squeeze-and-Excitation: Channel attention mechanism for feature refinement
Flexible Loss Functions: Supports L1, L2, SSIM, and Laplacian pyramid losses with custom weighting

Architecture

The model consists of four main branches:

Sparse Depth Branch: Processes sparse structured light measurements
Grayscale Branch: Extracts texture and edge information from intensity images
Relative Depth Branch: Incorporates monocular depth priors (Depth-Anything-V2)
Fusion Branch: Combines multi-scale features with gated convolutions and SE blocks

Each parallel branch uses an encoder-decoder architecture with skip connections. The fusion branch integrates outputs and encoder features from all three modalities to predict dense depth maps.

Figure 3: The proposed MMDC-Net architecture. Balasubramaniam et al. (2026).

Installation

# Clone the repository
git clone https://github.com/badri999/MMDC-Net.git
cd MMDC-Net

# Create the environment from the file
conda env create -f environment.yml

# Activate the environment
conda activate mmdc-net

## Dataset Structure

**Important:** The training data resides directly in the root folder. The validation and test sets are nested subfolders.

dataset_root/
├── sparse_depth_z/              # [Training] Sparse depth map (CSV)
├── sparse_mask/                 # [Training] Masks of unreliable regions (PNG)
├── grayscale/                   # [Training] Projector-illuminated images (PNG)
├── depth_anything_v2_map_1512/  # [Training] Relative depth from Depth-Anything-V2 (PNG)
├── gt_depth/                    # [Training] Ground truth depth map (CSV)
├── shadow_mask_gt/              # [Training] Shadow region masks (PNG)
├── background_mask_gt/          # [Training] (Required folder) Background masks (PNG)
│
├── valid/                       # [Validation] Nested folder
│   ├── sparse_depth_z/
│   ├── sparse_mask/
│   ├── grayscale/
│   ├── depth_anything_v2_map_1512/
│   ├── gt_depth/
│   ├── shadow_mask_gt/
│   └── background_mask_gt/
│
└── test/                        # [Test] Nested folder
    ├── sparse_depth_z/
    ├── sparse_mask/
    ├── grayscale/
    ├── depth_anything_v2_map_1512/
    ├── gt_depth/
    ├── shadow_mask_gt/
    └── background_mask_gt/

> **Note on Background Masks:** The `background_mask_gt` folder is **required** by the dataloader structure, even though the masks are not currently used for training or inference. You must ensure this folder exists (populated with dummy 512x512 PNGs if you wish) to avoid file path errors.
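
The layout above can be checked before training with a short script. This is a minimal sketch using only the Python standard library; the function name `missing_dataset_folders` and the constants are illustrative and are not part of the repository.

```python
from pathlib import Path

# Folder names taken from the dataset tree above.
MODALITY_FOLDERS = [
    "sparse_depth_z",
    "sparse_mask",
    "grayscale",
    "depth_anything_v2_map_1512",
    "gt_depth",
    "shadow_mask_gt",
    "background_mask_gt",
]

# Training data lives directly in the root; validation and test are nested.
SPLIT_SUBDIRS = ["", "valid", "test"]


def missing_dataset_folders(dataset_root):
    """Return the expected folders that are absent under dataset_root."""
    root = Path(dataset_root)
    missing = []
    for subdir in SPLIT_SUBDIRS:
        for name in MODALITY_FOLDERS:
            folder = root / subdir / name
            if not folder.is_dir():
                missing.append(folder)
    return missing
```

Calling `missing_dataset_folders("dataset_root")` returns an empty list when all 21 folders are present, so any returned path (including a forgotten `background_mask_gt`) can be created or fixed before the dataloader runs.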

## Usage

### Training

```bash
python hdd_training_script_2_5_ck.py \
    --config config.json \
    --train-on-shadow-regions
```

### Resume Training

```bash
python hdd_training_script_2_5_ck.py \
    --config config.json \
    --resume checkpoints/best-epoch50.pth \
    --resume-lr 0.0001 \
    --reset-best-loss
```

### Configuration

Use the existing JSON configuration files in the repository and pass one to `--config`.

### Inference

```bash
python inference_script.py
```

Model Components

Fire Module

Efficient squeeze-expand module with 1×1 squeeze convolution followed by parallel 1×1 and 3×3 expand convolutions (with circular kernels).
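
To see why the squeeze-expand layout is parameter-efficient, the sketch below counts weights and biases for a Fire module and for a plain 3×3 convolution with the same input and output widths. The channel sizes are hypothetical and are not taken from the MMDC-Net configuration, and the count assumes that a circular kernel has the same number of learnable weights as a standard 3×3 kernel.

```python
def conv_params(in_ch, out_ch, kernel_size):
    """Weights plus biases of a single 2D convolution layer."""
    return in_ch * out_ch * kernel_size * kernel_size + out_ch


def fire_params(in_ch, squeeze_ch, expand1x1_ch, expand3x3_ch):
    """Parameters of a Fire module: 1x1 squeeze, then parallel 1x1 and 3x3 expands."""
    squeeze = conv_params(in_ch, squeeze_ch, 1)
    expand1x1 = conv_params(squeeze_ch, expand1x1_ch, 1)
    expand3x3 = conv_params(squeeze_ch, expand3x3_ch, 3)
    return squeeze + expand1x1 + expand3x3


# Hypothetical channel sizes, chosen only for illustration.
in_ch, squeeze_ch, expand_ch = 64, 16, 64
fire = fire_params(in_ch, squeeze_ch, expand_ch, expand_ch)
plain = conv_params(in_ch, 2 * expand_ch, 3)  # same 128 output channels
print(f"Fire module:    {fire} parameters")
print(f"Plain 3x3 conv: {plain} parameters")
```

With these example sizes the Fire module needs 11,408 parameters against 73,856 for the plain convolution, roughly 6.5 times fewer, because the expensive 3×3 filters only see the 16 squeezed channels.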

Gated Circular Convolution

Feature gating mechanism: output = φ(features) ⊙ σ(gating) where both feature and gating branches use circular convolutions.
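
The gating rule can be illustrated on plain numbers, independent of any convolution. In the sketch below the two input lists stand in for the outputs of the feature and gating convolutions at matching positions. Choosing LeakyReLU for φ is an assumption, since the activation is not specified here, and the function names are illustrative.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def leaky_relu(x, negative_slope=0.2):
    """Assumed choice for the activation phi; the repository may use another."""
    return x if x >= 0 else negative_slope * x


def gated_output(features, gating):
    """Element-wise phi(features) * sigmoid(gating)."""
    return [leaky_relu(f) * sigmoid(g) for f, g in zip(features, gating)]


features = [2.0, 2.0, 2.0]
gating = [-10.0, 0.0, 10.0]  # nearly closed, half-open, and nearly open gate
print(gated_output(features, gating))
```

The same feature value is almost suppressed, halved, or passed through nearly unchanged depending on the gating response, which is how the fusion branch can learn to down-weight unreliable inputs per pixel and per channel.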

Circular Convolution (CircleConv3x3)

Custom convolution operation with circular transformation matrix for enhanced spatial feature learning.
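
One way to build such a transformation matrix is to sample the kernel on a circle and express each sample as a bilinear interpolation of the square 3×3 grid. The sketch below follows that idea under stated assumptions (unit radius, eight samples obtained by projecting the grid offsets onto the circle, plus the center). It is not taken from this repository or from the CircularKernel code, and both function names are hypothetical.

```python
import math


def circle_transform_matrix():
    """Return a 9x9 matrix B whose row k holds the bilinear weights that
    express circle sample k in terms of the nine square-grid positions."""
    offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    matrix = [[0.0] * 9 for _ in range(9)]
    for k, (dy, dx) in enumerate(offsets):
        norm = math.hypot(dy, dx)
        # Project each grid offset onto the unit circle; the center stays put.
        y, x = (0.0, 0.0) if norm == 0 else (dy / norm, dx / norm)
        y0, x0 = math.floor(y), math.floor(x)
        wy, wx = y - y0, x - x0
        neighbours = (
            (y0, x0, (1 - wy) * (1 - wx)),
            (y0, x0 + 1, (1 - wy) * wx),
            (y0 + 1, x0, wy * (1 - wx)),
            (y0 + 1, x0 + 1, wy * wx),
        )
        for gy, gx, weight in neighbours:
            if weight > 0.0:  # zero-weight neighbours may lie outside the grid
                matrix[k][(gy + 1) * 3 + (gx + 1)] += weight
    return matrix


def effective_square_kernel(circle_weights, matrix):
    """Fold the interpolation into the weights: w_square = B^T w_circle."""
    return [
        sum(matrix[k][j] * circle_weights[k] for k in range(9))
        for j in range(9)
    ]
```

Under these assumptions the axis-aligned samples coincide with grid positions, while each corner sample is pulled toward the center and mixes four grid positions. Folding the matrix into the weights means the layer can still be evaluated as an ordinary 3×3 convolution.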

Training Features

Multi-Loss Training: Combine L1, L2, SSIM, and Laplacian losses with custom weights
Shadow Region Handling: Optional masking of shadow regions during training
Data Augmentation: Built-in augmentation pipeline (see hdd_data_augmenter.py)
WandB Integration: Automatic experiment tracking and visualization
Checkpoint Management: Automatic best model saving and periodic checkpoints
Learning Rate Scheduling: Support for step, cosine, and plateau schedulers
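
For reference, the step and cosine schedules named above follow simple closed-form rules, written out below in plain Python. The hyperparameter values are hypothetical and how the training script configures its schedulers is not shown here. PyTorch provides equivalent schedulers as `StepLR`, `CosineAnnealingLR`, and `ReduceLROnPlateau` in `torch.optim.lr_scheduler`.

```python
import math


def step_lr(base_lr, epoch, step_size, gamma):
    """Multiply the learning rate by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


def cosine_lr(base_lr, epoch, total_epochs, min_lr=0.0):
    """Anneal from base_lr to min_lr along half a cosine period."""
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))


# Hypothetical settings, for illustration only.
for epoch in (0, 25, 50, 100):
    stepped = step_lr(1e-3, epoch, step_size=50, gamma=0.1)
    annealed = cosine_lr(1e-3, epoch, total_epochs=100)
    print(f"epoch {epoch:3d}: step={stepped:.2e} cosine={annealed:.2e}")
```

A plateau scheduler has no closed form because it reacts to the monitored validation loss, lowering the rate only after the loss stops improving for a set number of epochs.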

Loss Functions

The framework supports multiple loss functions defined in compute_loss.py:

L1 Loss: Mean Absolute Error with optional masking
L2 Loss: Mean Squared Error with optional masking
SSIM Loss: Structural similarity with cropping and masking
Laplacian Loss: Multi-scale edge-aware loss using Laplacian pyramids

Each loss can be independently weighted for shadow and sparse regions.
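
As a rough illustration of masked, region-weighted loss terms, the sketch below computes masked L1 and L2 errors on flat lists and combines them with separate weights for shadow and sparse regions. The function names, the weight values, and the flat-list representation are assumptions made for illustration; they do not mirror the API of compute_loss.py.

```python
def masked_l1(pred, target, mask):
    """Mean absolute error over pixels where mask is 1 (0.0 if the mask is empty)."""
    selected = [abs(p - t) for p, t, m in zip(pred, target, mask) if m]
    return sum(selected) / len(selected) if selected else 0.0


def masked_l2(pred, target, mask):
    """Mean squared error over pixels where mask is 1 (0.0 if the mask is empty)."""
    selected = [(p - t) ** 2 for p, t, m in zip(pred, target, mask) if m]
    return sum(selected) / len(selected) if selected else 0.0


def combined_loss(pred, target, shadow_mask, sparse_mask, weights):
    """Weighted sum of per-region terms; weights maps (loss, region) to a float."""
    losses = {"l1": masked_l1, "l2": masked_l2}
    regions = {"shadow": shadow_mask, "sparse": sparse_mask}
    total = 0.0
    for (loss_name, region_name), weight in weights.items():
        total += weight * losses[loss_name](pred, target, regions[region_name])
    return total


# Toy example with four pixels; all values are made up.
pred = [1.0, 2.0, 3.0, 4.0]
target = [1.5, 2.0, 2.0, 4.0]
shadow_mask = [1, 1, 0, 0]
sparse_mask = [0, 0, 1, 1]
weights = {("l1", "shadow"): 1.0, ("l2", "sparse"): 0.5}
print(combined_loss(pred, target, shadow_mask, sparse_mask, weights))
```

Keeping the weights keyed by loss and region makes it easy to switch a term off (weight 0) or to emphasize one region without touching the loss functions themselves.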

Citation

If you use this code in your research, please cite our paper:

@article{balasubramaniam2026application,
    title={Application-driven multi-modal depth completion in fringe projection profilometry},
    author={Balasubramaniam, Badrinath and Suresh, Vignesh and Cheng, Yang and Li, Jiaqiong and Li, Beiwen},
    journal={Optics and Lasers in Engineering},
    year={2026},
    doi={10.1016/j.optlaseng.2025.109587},
    url={https://doi.org/10.1016/j.optlaseng.2025.109587}
}

License

GPL-3.0 license

Acknowledgments

Depth-Anything-V2 for monocular depth estimation
PyTorch-MSSSIM for the SSIM loss implementation
CircularKernel (https://github.com/JHL-HUST/CircularKernel) for the circular convolution implementation
Albumentations for the data augmentation pipeline

Contact

For questions or issues, please open an issue on GitHub or contact bb2@uga.edu.
