DocF/MOD-ZOO

MOD-ZOO: Multispectral Object Detection - A Unified Framework and Systematic Survey

Paper | Project Page | Awesome | License: MIT

This is the official repository for the preprint "Multispectral Object Detection: A Unified Framework and Systematic Survey".

This repository (MOD-ZOO) provides a comprehensive, continuously updated collection of resources (papers, code, datasets) for Multispectral Object Detection (MOD) across ground-based and remote sensing scenarios.


📑 Table of Contents

MOD Publications over the past decade


📒 News

  • [2026/04] 🔥🔥 The preprint will be available soon!
  • [2026/04] 🔥🔥 Initial release of the MOD-ZOO repository, including taxonomy, datasets, and paper lists.

📖 Abstract

Multispectral Object Detection (MOD) has emerged as a critical methodology to overcome the limitations of visible-light imaging, particularly under adverse conditions such as low illumination and inclement weather. By integrating complementary information across diverse spectral bands, MOD ensures robust all-day and all-weather perception.

To organize the survey, we establish a unified four-stage mathematical framework that decomposes MOD into multispectral data input, feature learning, fusion schemes, and detection solutions.
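As a rough illustration of how the four stages chain together, the following Python sketch wires four placeholder functions into one pipeline. Everything in it is an assumption made for illustration: the function names, the toy 3x3 grids standing in for images, the min-max "encoder", the weighted-average fusion, and the threshold "detector" are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Grid = List[List[float]]


def load_pair() -> Tuple[Grid, Grid]:
    """Stage 1 (data input): a toy, spatially aligned visible/thermal pair."""
    visible = [[0.1, 0.2, 0.1], [0.2, 0.3, 0.2], [0.1, 0.2, 0.1]]  # dark scene
    thermal = [[10.0, 12.0, 10.0], [12.0, 30.0, 12.0], [10.0, 12.0, 10.0]]  # warm centre
    return visible, thermal


def extract_features(image: Grid) -> Grid:
    """Stage 2 (feature learning): stand-in encoder, min-max scaling to [0, 1]."""
    flat = [v for row in image for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on constant images
    return [[(v - lo) / span for v in row] for row in image]


def fuse(feat_a: Grid, feat_b: Grid, weight_a: float = 0.5) -> Grid:
    """Stage 3 (fusion): element-wise weighted average of two feature maps."""
    return [
        [weight_a * a + (1.0 - weight_a) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(feat_a, feat_b)
    ]


def detect(fused: Grid, threshold: float = 0.8) -> List[Tuple[int, int]]:
    """Stage 4 (detection): stand-in head, returns cells scoring above threshold."""
    return [
        (r, c)
        for r, row in enumerate(fused)
        for c, score in enumerate(row)
        if score > threshold
    ]


def run_pipeline() -> List[Tuple[int, int]]:
    """Chain the four stages on the toy pair."""
    visible, thermal = load_pair()
    fused = fuse(extract_features(visible), extract_features(thermal))
    return detect(fused)
```

On this toy pair, `run_pipeline()` flags only the centre cell `(1, 1)`, where both normalized responses peak. Real methods replace each placeholder with learned components; the sketch only shows the order in which the stages hand data to one another.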


πŸ–ΌοΈ Unified Framework & Taxonomy

Building upon the concepts introduced above, the following figures visualize the structural breakdown of our survey.

  • Figure 1 illustrates the detailed data flow of the unified mathematical framework, mapping the progression from raw multispectral inputs to final detection outputs.
  • Figure 2 expands this framework into a fine-grained hierarchical taxonomy. It categorizes recent state-of-the-art literature by the specific strategy each work uses to overcome core cross-modal challenges.

This taxonomy directly dictates the organization of the paper list in the following sections.


Unified Framework

Figure 1. A unified four-stage framework and systematic taxonomy of MOD.

Taxonomy

Figure 2. Hierarchical structural decomposition and taxonomy of the MOD landscape.

πŸ—‚οΈ Datasets & Benchmarks

An overview of representative MOD datasets spanning ground-based and remote sensing scenarios.

Spectrum

Figure 3. Electromagnetic spectrum mapping and visual comparisons.

Ground-based Datasets

(Legend: Pairs = Img Pairs, Res. = Resolution, Plat. = Platform (Surv. = Surveillance, Multi. = Multiple), Cls = Class, A/O = Alignment / Occlusion)

| Dataset | Venue | Modality | Pairs | Res. | Plat. | Cls | Den. | A/O | Link |
|---|---|---|---|---|---|---|---|---|---|
| KAIST | CVPR'15 | RGB-TIR | 95.3K | 640x480 | Driving | 1 | 0.62 | ✅/✅ | Link |
| CVC-14 | Sensors'16 | RGB-TIR | 8.5K | 640x512 | Driving | 1 | 0.80 | ❌/❌ | Link |
| FLIR-aligned | ICIP'20 | RGB-TIR | 5.1K | 640x512 | Driving | 3 | 7.92 | ✅/✅ | Link |
| LLVIP | ICCV'21 | RGB-TIR | 16.8K | 1080x720 | Surv. | 1 | 2.51 | ✅/❌ | Link |
| M³FD | CVPR'22 | RGB-TIR | 4.2K | 1024x768 | Multi. | 6 | 8.19 | ✅/❌ | Link |
| SMOD | TMM'25 | RGB-TIR | 8.6K | 640x512 | Driving | 4 | 3.62 | ✅/✅ | Link |
| MFAD | TCSVT'25 | RGB-TIR | 12.1K | 1280x960 | Driving | 6 | 7.13 | ✅/❌ | Link |

Remote Sensing Datasets

(Legend: Pairs = Img Pairs, Res. = Resolution, Plat. = Platform, Cls = Class, A/O = Alignment / Occlusion)

| Dataset | Venue | Modality | Pairs | Res. | Plat. | Cls | Den. | A/O | Link |
|---|---|---|---|---|---|---|---|---|---|
| VEDAI | JVCI'16 | R-NIR | 1.2K | 1024x1024 | UAV | 9 | 2.93 | ✅/❌ | Link |
| DroneVehicle | TCSVT'21 | R-TIR | 28.4K | 840x712 | UAV | 1 | 16.7 | ❌/❌ | Link |
| DronePerson | ISPRS'23 | R-TIR | 6.1K | 640x512 | UAV | 1 | 11.6 | ✅/❌ | Link |
| DVTOD | TIV'24 | R-TIR | 2.1K | 1920x1080 | UAV | 3 | 2.82 | ❌/❌ | Link |
| OdinMJ | GRSM'24 | R-TIR | 23K | 640x512 | UAV | 1 | 1.98 | ✅/✅ | Link |
| RGBT-Tiny | TPAMI'25 | R-TIR | ~47.5K | 640x512 | UAV | 7 | 12.9 | ✅/❌ | Link |
| SpaceNet6-OTD | TGRS'22 | R-SAR | 820 | 900x900 | Sat. | 1 | 22.0 | ✅/❌ | Link |
| OGSOD-1.0 | TGRS'23 | R-SAR | 14.6K | 256x256 | Sat. | 3 | 2.62 | ✅/❌ | Link |
| OGSOD-2.0 | ICGIP'25 | R-SAR | 23.4K | 256x256 | Sat. | 4 | 3.24 | ✅/❌ | Link |

📚 Paper List (The MOD Zoo)

We categorize representative methods according to our proposed taxonomy.

1. Feature Learning (Mitigating Representation Challenges)

This section addresses fundamental representation challenges: Modality Misalignment, Modality Imbalance, Modality Redundancy, and Modality Asymmetry.

Modality Misalignment

This challenge manifests in two primary forms: Spatial Misalignment and Semantic Misalignment.

Figure 4. Modality Misalignment
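
To give a feel for why spatial misalignment matters, the sketch below measures how well a bounding box annotated in one modality still overlaps the same object after a small horizontal offset in the other modality. The box coordinates and the 8-pixel shift are made-up values for illustration, not figures from any dataset listed here; the helper names `iou` and `shift` are likewise hypothetical. IoU is the standard intersection-over-union measure.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def shift(box: Box, dx: float, dy: float = 0.0) -> Box:
    """Translate a box, mimicking a cross-modal spatial offset."""
    x1, y1, x2, y2 = box
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)


# Hypothetical pedestrian box (40 x 100 px) annotated on the visible image.
rgb_box: Box = (100.0, 100.0, 140.0, 200.0)
# The same pedestrian as it would appear 8 px to the right in the thermal image.
tir_box = shift(rgb_box, dx=8.0)
overlap = iou(rgb_box, tir_box)
```

With these hypothetical numbers, an 8-pixel offset on a 40-pixel-wide box already lowers the IoU to about 0.67, so a box that is correct for one modality can be a noticeably poorer fit for the other.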

| Venue | Methods | Title | Modality | Source |
|---|---|---|---|---|
| AAAI'26 | IGIANet | IGIANet: Illumination guided implicit alignment network for infrared-visible UAV detection | RGB-TIR | Paper |
| TMM'25 | DeformCAT | Deformable cross-attention transformer for weakly aligned RGB-T pedestrian detection | RGB-TIR | Paper/Code |
| TCSVT'25 | SeaDATE | SeaDATE: Remedy dual-attention transformer with semantic alignment via contrast learning for multimodal object detection | RGB-TIR | Paper |
| CVPR'24 | OAFA | Weakly misalignment-free adaptive feature alignment for UAVs-based multimodal object detection | RGB-TIR | Paper |
| ECCV'24 | DAMSDet | DAMSDet: Dynamic adaptive multispectral detection transformer with competitive query selection and adaptive feature fusion | RGB-TIR | Paper/Code |
| ICIP'24 | L-CMAF | Revisiting misalignment in multispectral pedestrian detection | RGB-TIR | Paper |
| TIV'24 | YOLO-Adaptor | YOLO-Adaptor: A fast adaptive one-stage detector for non-aligned visible-infrared object detection | RGB-TIR | Paper |
| MM'23 | AANet | Attentive alignment network for multispectral pedestrian detection | RGB-TIR | Paper |
| MM'23 | CALNet | Multispectral object detection via cross-modal conflict-aware learning | RGB-TIR | Paper/Code |
| TITS'23 | MFPT | Multi-modal feature pyramid transformer for RGB-infrared object detection | RGB-TIR | Paper |
| ECCV'22 | TSFADet | Translation, scale and rotation: cross-modal alignment meets RGB-infrared vehicle detection | RGB-TIR | Paper |
| ICCV'19 | AR-CNN | Weakly aligned cross-modal learning for multispectral pedestrian detection | RGB-TIR | Paper/Code |

Modality Imbalance

| Venue | Methods | Title | Modality | Source |
|---|---|---|---|---|
| TCSVT'25 | MSCoTDet | MSCoTDet: Language-driven multi-modal fusion for improved multi-spectral pedestrian detection | RGB-TIR | Paper |
| TGRS'25 | DKDNet | Diffusion mechanism and knowledge distillation object detection in multimodal remote sensing imagery | RGB-SAR | Paper |
| InfFus'25 | EMOD | Efficient multispectral object detection with attentive feature aggregation leveraging zero-shot implicit illumination guidance | RGB-TIR | Paper/Code |
| ICCV'25 | M²D-LIF | Rethinking multi-modal object detection from the perspective of mono-modality feature learning | RGB-TIR | Paper/Code |
| TITS'24 | MS-DETR | MS-DETR: multispectral pedestrian detection transformer with loosely coupled fusion and modality-balanced optimization | RGB-TIR | Paper/Code |
| IROS'24 | DCSANet | DCSANet: Dual cross-channel and spatial attention make RGB-T object detection better | RGB-TIR | Paper |
| CVPR'24 | CMM | Causal mode multiplexer: A novel framework for unbiased multispectral pedestrian detection | RGB-TIR | Paper/Code |
| ECCV'22 | MBNet | Improving multispectral pedestrian detection by addressing modality imbalance problems | RGB-TIR | Paper/Code |

Modality Redundancy

| Venue | Methods | Title | Modality | Source |
|---|---|---|---|---|
| NeuCom'24 | DHFNet | DHFNet: Decoupled hierarchical fusion network for RGB-T dense prediction tasks | RGB-TIR | Paper |
| RS'22 | RISNet | Improving RGB-infrared object detection by reducing cross-modality redundancy | RGB-TIR | Paper |
| PR'22 | YOLOFusion | Cross-modality attentive feature fusion for object detection in multispectral remote sensing imagery | RGB-NIR | Paper/Code |

Modality Asymmetry

| Venue | Methods | Title | Modality | Source |
|---|---|---|---|---|
| MM'25 | UniRGB-IR | UniRGB-IR: A unified framework for visible-infrared semantic tasks via adapter tuning | RGB-TIR | Paper/Code |
| ECCV'24 | ModTr | Modality translation for object detection adaptation without forgetting prior knowledge | TIR | Paper/Code |
| CVPR'24 | D3T | D3T: Distinctive dual-domain teacher zigzagging across RGB-thermal gap for domain-adaptive object detection | TIR | Paper/Code |
| MM'23 | TIRDet | TIRDet: Mono-modality thermal infrared object detection based on prior thermal-to-visible translation | TIR | Paper/Code |
| TCSVT'22 | DCRL-PDN | Deep cross-modal representation learning and distillation for illumination-invariant pedestrian detection | RGB | Paper |
| AAAI'22 | VPD | Towards versatile pedestrian detector with multisensory-matching and multispectral recalling memory | RGB-TIR | Paper |
| ECCV'20 | TC-Det | Task-conditioned domain adaptation for pedestrian detection in thermal imagery | TIR | Paper/Code |
| CVPRW'19 | UMAD | Unsupervised domain adaptation for multispectral pedestrian detection | RGB-TIR | Paper/Code |
| CVPR'17 | CMT-CNN | Learning cross-modal deep representations for robust pedestrian detection | RGB-TIR | Paper/Code |

2. Fusion Scheme

Categorized by Fusion Stage Design and Fusion Function Construction.

Stage

Figure 5. Fusion Stage Design.

Function

Figure 6. Fusion Function Construction.
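
Fusion stage design is often described in terms of where the two modalities are combined: at the input, at the feature level, or at the decision level. The sketch below contrasts these three placements on toy vectors. It is a minimal illustration under stated assumptions: the labels early/mid/late, the stand-in `encoder` and `head`, and the choice of concatenation, element-wise max, and score max as fusion functions are all hypothetical, and the categories in Figure 5 may be finer-grained.

```python
from typing import List, Sequence

Vector = Sequence[float]


def encoder(x: Vector) -> List[float]:
    """Hypothetical stand-in encoder: clip negative values to zero."""
    return [max(0.0, v) for v in x]


def head(feat: Vector) -> float:
    """Hypothetical stand-in detection head: mean activation as a score."""
    return sum(feat) / len(feat)


def early_fusion(rgb: Vector, tir: Vector) -> float:
    """Fuse raw inputs by concatenation, then use one encoder and one head."""
    return head(encoder(list(rgb) + list(tir)))


def mid_fusion(rgb: Vector, tir: Vector) -> float:
    """Encode each modality separately, fuse features by element-wise max."""
    fused = [max(a, b) for a, b in zip(encoder(rgb), encoder(tir))]
    return head(fused)


def late_fusion(rgb: Vector, tir: Vector) -> float:
    """Run a full branch per modality, fuse the two scores by taking the max."""
    return max(head(encoder(rgb)), head(encoder(tir)))


# Toy inputs: made-up values, three "pixels" per modality.
rgb_values = [0.2, -0.1, 0.4]
tir_values = [0.9, 0.7, -0.3]

scores = {
    "early": early_fusion(rgb_values, tir_values),
    "mid": mid_fusion(rgb_values, tir_values),
    "late": late_fusion(rgb_values, tir_values),
}
```

The three variants share the same encoder and head and differ only in where the modalities meet, yet they return different scores on the same inputs. That difference is the design axis the stage taxonomy captures; the fusion function itself (here concatenation or max) is a separate choice.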

| Venue | Methods | Title | Modality | Source |
|---|---|---|---|---|
| TIP'26 | AFFNet | Adaptive fine-grained fusion network for multimodal UAV object detection | RGB-TIR | Paper |
| InfFus'26 | MSFF | Multispectral state-space feature fusion: Bridging shared and cross-parametric interactions for object detection | RGB-TIR | Paper/Code |
| InfFus'26 | COMO | COMO: cross-mamba interaction and offset-guided fusion for multimodal object detection | RGB-TIR | Paper/Code |
| TII'25 | RetinexDet | RetinexDet: Enhancing multispectral object detection via Retinex state space duality and wavelet-based frequency adaptive fusion | RGB-TIR | Paper |
| TGRS'25 | MPFF | Aerial image object detection based on RGB-infrared multibranch progressive fusion | RGB-TIR | Paper |
| TGRS'25 | DHANet | DHANet: Dual-stream hierarchical interaction networks for multimodal drone object detection | RGB-TIR | Paper/Code |
| TGRS'25 | DMM | DMM: disparity-guided multispectral mamba for oriented object detection in remote sensing | RGB-TIR | Paper/Code |
| PR'25 | MSTF | Multispectral transformer fusion via exploiting similarity and complementarity for robust pedestrian detection | RGB-TIR | Paper |
| TMM'25 | Fusion-Mamba | Fusion-Mamba for cross-modality object detection | RGB-TIR | Paper/Code |
| MM'25 | CSSFDet | Contextually-guided state space fusion for misaligned multi-spectral object detection | RGB-TIR | Paper |
| MM'25 | SemFusion | SAM-guided semantic knowledge fusion for visible-infrared object detection | RGB-TIR | Paper/Code |
| ICCV'25 | WaveMamba | WaveMamba: Wavelet-driven mamba fusion for RGB-infrared object detection | RGB-TIR | Paper |
| ICCV'25 | M-SpecGene | M-SpecGene: Generalized foundation model for RGBT multispectral vision | RGB-TIR | Paper/Code |
| TNNLS'24 | LRAF-Net | LRAF-Net: Long-range attention fusion network for visible-infrared object detection | RGB-TIR | Paper |
| TNNLS'24 | TFDet | TFDet: Target-aware fusion for RGB-T pedestrian detection | RGB-TIR | Paper/Code |
| ECCV'24 | MMPedestron | When pedestrian detection meets multi-modal learning: Generalist model and benchmark dataset | Multi | Paper/Code |
| NIPS'24 | E2E-MFD | E2E-MFD: Towards end-to-end synchronous multimodal fusion detection | RGB-TIR | Paper/Code |
| TMM'23 | CMPD | Confidence-aware fusion using Dempster-Shafer theory for multispectral pedestrian detection | RGB-TIR | Paper/Code |
| TCSVT'22 | UA-CMDet | Drone-based RGB-infrared cross-modality vehicle detection via uncertainty-aware learning | RGB-TIR | Paper/Code |
| InfFus'19 | CIAN | Cross-modality interactive attention network for multispectral pedestrian detection | RGB-TIR | Paper/Code |
| PR'19 | IAF R-CNN | Illumination-aware Faster R-CNN for robust multispectral pedestrian detection | RGB-TIR | Paper/Code |

3. Detection Solutions (Task-Specific)

This section categorizes detection solutions based on specific application challenges: Small Object Detection, Robust Perception Under Adverse Conditions, and Adversarial Attacks.

Small Object Detection

| Venue | Methods | Title | Modality | Source |
|---|---|---|---|---|
| TIM'25 | AMSDet | Adaptive modality selection drone-based RGBT detector for tiny targets | RGB-TIR | Paper |
| TGRS'23 | SuperYOLO | SuperYOLO: Super resolution assisted object detection in multimodal remote sensing imagery | RGB-NIR | Paper/Code |
| ISPRS'23 | QFDet | Drone-based RGBT tiny person detection | RGB-TIR | Paper/Code |
| BMVC'20 | ASMPD | Anchor-free small-scale multispectral pedestrian detection | RGB-TIR | Paper/Code |
| ISPRS'19 | HMFFN | Box-level segmentation supervised deep neural networks for accurate and real-time multispectral pedestrian detection | RGB-TIR | Paper |

Robust Object Detection

| Venue | Methods | Title | Modality | Source |
|---|---|---|---|---|
| TCSVT'25 | CFMW | CFMW: cross-modality fusion mamba for robust object detection under adverse weather | RGB-TIR | Paper/Code |
| PRL'25 | RRD | Learning a robust RGB-thermal detector for extreme modality imbalance | RGB-TIR | Paper |
| RAL'25 | HA-MLPD | Hybrid attention for robust RGB-T pedestrian detection in real-world conditions | RGB-TIR | Paper |
| MMUL'25 | VL-ACFDet | Vision-language-guided adaptive cross-modal fusion for multispectral object detection under adverse weather conditions | RGB-TIR | Paper |
| TGRS'24 | LF-MDet | Low-rank multimodal remote sensing object detection with frequency filtering experts | RGB-TIR | Paper/Code |
| ECCV'22 | ProbEn | Multimodal object detection via probabilistic ensembling | RGB-TIR | Paper/Code |

Adversarial Attack & Defense

| Venue | Methods | Title | Modality | Source |
|---|---|---|---|---|
| MM'25 | CDUPatch | CDUPatch: Color-driven universal adversarial patch attack for dual-modal visible-infrared detectors | RGB-TIR | Paper |
| TPAMI'24 | UAPatch | Unified adversarial patch for visible-infrared cross-modal attacks in the physical world | RGB-TIR | Paper/Code |
| AAAI'23 | MIC | Multispectral invisible coating: Laminated visible-thermal physical attack against multispectral object detectors using transparent low-e films | RGB-TIR | Paper |
| ICASSP'23 | SRG-ASRP | Similarity relation preserving cross-modal learning for multispectral pedestrian detection against adversarial attacks | RGB-TIR | Paper |

(Note: We welcome pull requests to update this list with the latest SOTA papers!)

Contact

Please contact us at fqy2017@gmail.com for any questions.
