UBC Faculty Research and Publications

Efficient Layer-Wise Cross-View Calibration and Aggregation for Multispectral Object Detection

He, Xiao; Yang, Tong; Yan, Tingzhou; Li, Hongtao; Ge, Yang; Ren, Zhijun; Liu, Zhe; Jiang, Jiahe; Tang, Chang

Abstract

Multispectral object detection is a fundamental task with a wide range of practical applications. In particular, combining visible (RGB) and infrared (IR) images offers complementary information that enhances detection performance across diverse weather conditions. However, existing methods generally align features across modalities using region proposals from two-stage detectors, which are often slow and unsuitable for large-scale applications. To overcome this limitation, we introduce a novel one-stage oriented detector for RGB-infrared object detection called the Layer-wise Cross-Modality calibration and Aggregation (LCMA) detector. LCMA employs a layer-wise strategy to achieve cross-modality alignment through the proposed inter-modality spatial-reduction attention. Moreover, we design a Gated Coupled Filter in each layer to capture semantically meaningful features, ensuring that well-aligned, foreground-focused information is obtained before it is forwarded to the detection head. This removes the need for a region-proposal step during alignment, enabling direct category and bounding-box prediction in a unified one-stage oriented detector. Extensive experiments on two challenging datasets demonstrate that the proposed LCMA outperforms state-of-the-art methods in both accuracy and computational efficiency, confirming the efficacy of our approach in exploiting multi-modality information for robust and efficient multispectral object detection.
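The two mechanisms named in the abstract can be illustrated schematically. The sketch below is an assumption-based NumPy illustration, not the paper's actual implementation: `cross_modal_sr_attention` shows the generic spatial-reduction attention idea (keys/values from one modality are spatially pooled before attending, cutting cost by a factor of r²) applied cross-modally, with queries taken from RGB features and keys/values from IR features; `gated_fuse` is a hypothetical per-position sigmoid gate coupling the two modality streams. All function names, weight shapes, and the choice of average pooling are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cross_modal_sr_attention(q_feat, kv_feat, W_q, W_k, W_v, r=2):
    """Sketch of inter-modality spatial-reduction attention (assumed form).

    q_feat:  (H*W, C) flattened RGB features acting as queries.
    kv_feat: (H, W, C) IR feature map; its keys/values are spatially
             reduced by r x r average pooling before attention, so the
             key/value sequence shrinks from H*W to (H/r)*(W/r).
    """
    H, W, C = kv_feat.shape
    # Spatial reduction via non-overlapping r x r average pooling.
    reduced = kv_feat.reshape(H // r, r, W // r, r, C).mean(axis=(1, 3))
    kv = reduced.reshape(-1, C)
    Q, K, V = q_feat @ W_q, kv @ W_k, kv @ W_v
    attn = softmax(Q @ K.T / np.sqrt(C))          # (H*W, H*W / r^2)
    return attn @ V                               # (H*W, C)

def gated_fuse(rgb, ir, W_g, b_g):
    """Hypothetical gated coupling: a sigmoid gate computed from both
    modalities convexly combines them per position and channel."""
    g = sigmoid(np.concatenate([rgb, ir], axis=-1) @ W_g + b_g)
    return g * rgb + (1.0 - g) * ir
```

Because the gate lies in (0, 1), the fused feature is an elementwise convex combination of the two modality features, which is one common way a gated filter can suppress a modality that carries little foreground evidence at a given position.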
