Intelligent surveillance with multimodal object detection in complex environments

UBC Theses and Dissertations

Cao, Yue

Abstract

Surveillance systems play a crucial role in ensuring public safety. With the advent of deep learning algorithms, these systems have evolved from passive monitoring tools that relied heavily on human operators to advanced solutions capable of autonomously analyzing scenes with minimal human input. However, accurately detecting objects of interest in real-world scenarios remains a significant challenge because of dynamic illumination and the varying sizes of objects. This research aims to enhance the accuracy and robustness of intelligent surveillance systems for object detection in complex environments by integrating two complementary types of sensor data: visible light (RGB) and infrared (IR) images. First, a multimodal detection framework is developed on top of the Faster R-CNN architecture; it integrates features from both RGB and IR images for enhanced object detection. Poolfuser, a transformer-based fusion module, is then introduced and incorporated into the detection framework to fuse features from the different modalities from a spatial perspective, emphasizing the features that are critical for target detection. Experimental results show that the multimodal framework equipped with Poolfuser significantly outperforms unimodal detectors and other competing multimodal approaches in detection accuracy in complex environments. Second, to further improve the detection accuracy of the multimodal framework without introducing additional computational load, a lightweight fusion module based on convolutional neural networks (CNNs) is introduced. This module, termed Channel Switching and Spatial Attention (CSSA), integrates input features along both the channel and spatial dimensions. The experimental results demonstrate that the CSSA module further improves detection accuracy without affecting the real-time performance of the detection framework. Finally, considering the impact of other components, such as the backbone network and the loss function, on detection performance, this study further optimizes the CSSA-based multimodal detection model and introduces CSSA-Det. CSSA-Det shows improved object detection performance over CSSA and other state-of-the-art multimodal frameworks, particularly in the accuracy of bounding box localization.
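The abstract names the two stages of CSSA (channel switching, then spatial attention) but does not describe how either is computed. The toy sketch below, in plain Python, only illustrates the general idea of fusing RGB and IR feature maps along those two dimensions; it is not the thesis's implementation. Every detail is an assumption made for illustration: the function names, the use of a sigmoid of the channel mean as a stand-in for a learned channel score, the 0.5 threshold, and the per-pixel two-way softmax weighting.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_switch(rgb, ir, threshold=0.5):
    """Replace weak channels with the matching channel of the other modality.

    rgb, ir: lists of channels, each channel a flat list of pixel values.
    The channel score here is a sigmoid of the channel mean, an assumed
    stand-in for a learned channel-attention weight.
    """
    rgb_out, ir_out = [], []
    for rgb_ch, ir_ch in zip(rgb, ir):
        rgb_score = sigmoid(sum(rgb_ch) / len(rgb_ch))
        ir_score = sigmoid(sum(ir_ch) / len(ir_ch))
        rgb_out.append(list(ir_ch) if rgb_score < threshold else list(rgb_ch))
        ir_out.append(list(rgb_ch) if ir_score < threshold else list(ir_ch))
    return rgb_out, ir_out


def spatial_fuse(rgb, ir):
    """Blend the two modalities with a per-pixel weight.

    The weight is a softmax over the two modalities' channel-averaged
    responses at each pixel, which reduces to a sigmoid of their difference.
    """
    n_ch, n_px = len(rgb), len(rgb[0])
    fused = [[0.0] * n_px for _ in range(n_ch)]
    for p in range(n_px):
        rgb_energy = sum(ch[p] for ch in rgb) / n_ch
        ir_energy = sum(ch[p] for ch in ir) / n_ch
        w_rgb = sigmoid(rgb_energy - ir_energy)
        for c in range(n_ch):
            fused[c][p] = w_rgb * rgb[c][p] + (1.0 - w_rgb) * ir[c][p]
    return fused


def fuse(rgb, ir, threshold=0.5):
    """Channel switching followed by spatial fusion (hypothetical pipeline)."""
    rgb_sw, ir_sw = channel_switch(rgb, ir, threshold)
    return spatial_fuse(rgb_sw, ir_sw)


if __name__ == "__main__":
    rgb = [[1.0, 1.0], [-2.0, -2.0]]  # 2 channels, 2 pixels each
    ir = [[-1.0, -1.0], [3.0, 3.0]]
    print(fuse(rgb, ir))
```

Neither stage in this sketch has trainable parameters, which loosely mirrors the abstract's emphasis on a lightweight module, but the actual CSSA design should be taken from the thesis itself.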

Item Metadata

Title	Intelligent surveillance with multimodal object detection in complex environments
Creator	Cao, Yue
Supervisor	Liu, Zheng (Professor of engineering)
Publisher	University of British Columbia
Date Issued	2024
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2024-03-28
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0440965
URI	http://hdl.handle.net/2429/87661
Degree	Master of Applied Science - MASc
Program	Electrical Engineering
Affiliation	Applied Science, Faculty of; Engineering, School of (Okanagan)
Degree Grantor	University of British Columbia
Graduation Date	2024-05
Campus	UBCO
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace
