MER-DG: Modality-Entropy Regularization for Multimodal Domain Generalization

Georgia Institute of Technology (OLIVES Lab)

Published at ICML 2026

Abstract

Deploying multimodal models in real-world scenarios requires generalization to new environments where recording conditions differ from training, a challenge known as multimodal domain generalization (MMDG). Standard architectures employ separate encoders for each modality and a fusion module, training the system end-to-end by optimizing on the fused features. In this paper, we identify that such joint optimization causes encoders to exploit cross-modal co-occurrences (statistical relationships between modalities that arise from source-specific recording conditions) rather than learning domain-invariant features. We term this failure mode Fusion Overfitting. To address this, we propose Modality-Entropy Regularization for Domain Generalization (MER-DG), which maximizes the entropy of each encoder's feature distribution to preserve feature diversity. MER-DG is architecture-agnostic and integrates into existing multimodal frameworks as an additive loss term. Extensive experiments on the EPIC-Kitchens and HAC benchmarks demonstrate average improvements of approximately 5% over standard fusion and approximately 2% over state-of-the-art methods.
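To make the idea of an additive entropy term concrete, here is a minimal sketch in plain Python. It is not the estimator used in the paper or in this repository: it assumes a diagonal Gaussian fitted to each encoder's batch of features, and the names `diagonal_gaussian_entropy`, `regularized_loss`, and `weight` are hypothetical.

```python
import math


def diagonal_gaussian_entropy(features):
    """Entropy (in nats) of a diagonal Gaussian fitted to a feature batch.

    `features` is a list of equal-length lists (batch x dim). This estimator
    is an assumption for illustration, not necessarily the one MER-DG uses.
    """
    n = len(features)
    dim = len(features[0])
    eps = 1e-6  # keeps log() finite when a dimension has collapsed
    entropy = 0.0
    for d in range(dim):
        column = [row[d] for row in features]
        mean = sum(column) / n
        var = sum((x - mean) ** 2 for x in column) / n
        entropy += 0.5 * math.log(2 * math.pi * math.e * (var + eps))
    return entropy


def regularized_loss(task_loss, features_per_modality, weight=0.1):
    """Task loss minus a weighted sum of per-modality feature entropies.

    Subtracting the entropy means that minimizing this loss pushes each
    encoder toward higher-entropy (more diverse) features.
    `weight` is a hypothetical hyperparameter.
    """
    entropy_sum = sum(
        diagonal_gaussian_entropy(f) for f in features_per_modality.values()
    )
    return task_loss - weight * entropy_sum
```

With the same task loss, a batch whose features have collapsed to a single point yields a higher regularized loss than a batch with spread-out features, which is the behavior an entropy-maximizing term is meant to encourage.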

Code

The code was tested with Python 3.10.4 and torch 1.11.0+cu113.

Additional dependencies:

mmcv-full 1.2.7
mmaction2 0.13.0
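One way to confirm that the installed packages match the tested versions is a short check with the standard library's `importlib.metadata`. This is a sketch only: the helper `find_mismatches` and the `EXPECTED` table are hypothetical, and the distribution names are assumed to be the same ones used for installation.

```python
from importlib import metadata

# Versions listed in this README; the "+cu113" build tag is ignored below.
EXPECTED = {"torch": "1.11.0", "mmcv-full": "1.2.7", "mmaction2": "0.13.0"}


def find_mismatches(expected, lookup=metadata.version):
    """Return {package: installed_version_or_None} for every mismatch."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = lookup(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        # Drop a local build tag such as "+cu113" before comparing.
        base = installed.split("+")[0] if installed else None
        if base != wanted:
            mismatches[package] = installed
    return mismatches
```

Calling `find_mismatches(EXPECTED)` in the target environment returns an empty dictionary when all three versions match.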

EPIC-Kitchens & HAC Datasets Preparation

Download Pretrained Weights

Download the audio model (link), rename it to vggsound_avgpool.pth.tar, and place it under both the EPIC-rgb-flow-audio/pretrained_models and HAC-rgb-flow-audio/pretrained_models directories.
Download the SlowFast model for the RGB modality (link) and place it under the same pretrained_models directories.
Download the SlowOnly model for the Flow modality (link) and place it under the same pretrained_models directories.
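Before launching training, it can help to verify that the renamed audio checkpoint is in place in both directories. The sketch below checks only vggsound_avgpool.pth.tar, because the SlowFast and SlowOnly file names are not stated here; the helper name `missing_audio_weights` is hypothetical.

```python
from pathlib import Path

# Directories and file name taken from the instructions above.
WEIGHT_DIRS = [
    "EPIC-rgb-flow-audio/pretrained_models",
    "HAC-rgb-flow-audio/pretrained_models",
]
AUDIO_WEIGHTS = "vggsound_avgpool.pth.tar"


def missing_audio_weights(repo_root="."):
    """Return the relative paths where the audio checkpoint is missing."""
    root = Path(repo_root)
    return [
        (Path(d) / AUDIO_WEIGHTS).as_posix()
        for d in WEIGHT_DIRS
        if not (root / d / AUDIO_WEIGHTS).is_file()
    ]
```

Run from the repository root, `missing_audio_weights()` returns an empty list once the checkpoint has been copied into both directories.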

Download Datasets

EPIC-Kitchens: Download the audio files (EPIC-KITCHENS-audio.zip) and extract them following the original EPIC-Kitchens format.
HAC: Download the dataset at link.

(See the original SimMMDG repository for the exact directory tree expected for each dataset.)

Running the Code (Experiments)

We provide scripts for both datasets that run our MER-DG approach alongside the standard Baseline Fusion and the state-of-the-art SimMMDG framework.

Each directory contains a unified run_experiments.sh script that handles configuration and training. Before running:

Edit EPIC-rgb-flow-audio/run_experiments.sh or HAC-rgb-flow-audio/run_experiments.sh.
Set DATAPATH= to the local directory where you stored the datasets.

EPIC-Kitchens

cd EPIC-rgb-flow-audio
bash run_experiments.sh

HAC Dataset

cd HAC-rgb-flow-audio
bash run_experiments.sh

By default, the scripts execute the following experiments sequentially:

Baseline Fusion
Baseline Fusion + MER-DG
SimMMDG Baseline
SimMMDG + MER-DG

Modify the script to isolate specific experiments. Ensure wandb is configured for metric logging.
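A quick way to check for wandb credentials before starting a long run is sketched below. It is a heuristic under stated assumptions: it looks for the `WANDB_API_KEY` environment variable or an `api.wandb.ai` entry in `~/.netrc` (where `wandb login` stores the key for the hosted service), and the helper name `wandb_credentials_present` is hypothetical. Self-hosted wandb servers would need a different host name.

```python
import os
from pathlib import Path


def wandb_credentials_present(env=None, netrc_path=None):
    """Heuristically report whether wandb credentials appear to be set up."""
    env = os.environ if env is None else env
    # An API key in the environment takes effect without any login step.
    if env.get("WANDB_API_KEY"):
        return True
    # Otherwise look for the entry that `wandb login` writes to ~/.netrc.
    path = Path(netrc_path) if netrc_path else Path.home() / ".netrc"
    return path.is_file() and "api.wandb.ai" in path.read_text()
```

If this returns False, running `wandb login` once, or exporting `WANDB_API_KEY`, before `bash run_experiments.sh` should avoid an interactive prompt in the middle of the experiment sequence.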

Citation

@inproceedings{yarici2026merdg,
    title={MER-DG: Modality-Entropy Regularization for Multimodal Domain Generalization},
    author={Yarici, Yavuz and AlRegib, Ghassan},
    booktitle={2026 International Conference on Machine Learning (ICML)},
    note={Accepted on April 30, 2026},
    year={2026}
}

Acknowledgement

This codebase is adapted from the SimMMDG framework. We sincerely thank the authors for open-sourcing their code.
