UBC Faculty Research and Publications

Efficient Layer-Wise Cross-View Calibration and Aggregation for Multispectral Object Detection

He, Xiao; Yang, Tong; Yan, Tingzhou; Li, Hongtao; Ge, Yang; Ren, Zhijun; Liu, Zhe; Jiang, Jiahe; Tang, Chang

Abstract

Multispectral object detection is a fundamental task with a wide range of practical applications. In particular, combining visible (RGB) and infrared (IR) images offers complementary information that enhances detection performance across diverse weather conditions. However, existing methods generally align features across modalities using region proposals from two-stage detectors, which are often slow and unsuitable for large-scale applications. To overcome this limitation, we introduce a novel one-stage oriented detector for RGB-infrared object detection called the Layer-wise Cross-Modality calibration and Aggregation (LCMA) detector. LCMA employs a layer-wise strategy to achieve cross-modality alignment through the proposed inter-modality spatial-reduction attention. Moreover, we design a Gated Coupled Filter in each layer to capture semantically meaningful features, ensuring that well-aligned, foreground-focused information is obtained before it is forwarded to the detection head. This removes the need for a region-proposal step during alignment, enabling direct category and bounding-box prediction in a unified one-stage oriented detector. Extensive experiments on two challenging datasets demonstrate that the proposed LCMA outperforms state-of-the-art methods in both accuracy and computational efficiency, confirming the efficacy of our approach in exploiting multi-modality information for robust and efficient multispectral object detection.
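The two mechanisms named in the abstract can be illustrated schematically. The sketch below is an assumption-based NumPy illustration, not the paper's actual implementation: `cross_modal_sr_attention` shows the generic spatial-reduction attention idea (keys/values from one modality are spatially pooled before attending, cutting cost by a factor of r²) applied cross-modally, with queries taken from RGB features and keys/values from IR features; `gated_fuse` is a hypothetical per-position sigmoid gate coupling the two modality streams. All function names, weight shapes, and the choice of average pooling are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cross_modal_sr_attention(q_feat, kv_feat, W_q, W_k, W_v, r=2):
    """Sketch of inter-modality spatial-reduction attention (assumed form).

    q_feat:  (H*W, C) flattened RGB features acting as queries.
    kv_feat: (H, W, C) IR feature map; its keys/values are spatially
             reduced by r x r average pooling before attention, so the
             key/value sequence shrinks from H*W to (H/r)*(W/r).
    """
    H, W, C = kv_feat.shape
    # Spatial reduction via non-overlapping r x r average pooling.
    reduced = kv_feat.reshape(H // r, r, W // r, r, C).mean(axis=(1, 3))
    kv = reduced.reshape(-1, C)
    Q, K, V = q_feat @ W_q, kv @ W_k, kv @ W_v
    attn = softmax(Q @ K.T / np.sqrt(C))          # (H*W, H*W / r^2)
    return attn @ V                               # (H*W, C)

def gated_fuse(rgb, ir, W_g, b_g):
    """Hypothetical gated coupling: a sigmoid gate computed from both
    modalities convexly combines them per position and channel."""
    g = sigmoid(np.concatenate([rgb, ir], axis=-1) @ W_g + b_g)
    return g * rgb + (1.0 - g) * ir
```

Because the gate lies in (0, 1), the fused feature is an elementwise convex combination of the two modality features, which is one common way a gated filter can suppress a modality that carries little foreground evidence at a given position.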
