UBC Theses and Dissertations
Intelligent surveillance with multimodal object detection in complex environments
Cao, Yue
Abstract
Surveillance systems play a crucial role in ensuring public safety. With the advent of deep learning algorithms, these systems have evolved from passive monitoring tools that relied heavily on human operators into advanced solutions capable of autonomously analyzing scenes with minimal human input. However, accurately detecting objects of interest in real-world scenarios remains a significant challenge because of dynamic illumination and varying object sizes. This research aims to enhance the accuracy and robustness of intelligent surveillance systems for object detection in complex environments by integrating two complementary sensor modalities: visible light (RGB) and infrared (IR) images.
First, a multimodal detection framework is developed on top of the Faster R-CNN architecture. The framework integrates features from both RGB and IR images for enhanced object detection. Poolfuser, a transformer-based fusion module, is then introduced and incorporated into the detection framework to fuse features from the two modalities from a spatial perspective. This approach emphasizes the features that are critical for target detection. Experimental results show that the multimodal framework equipped with Poolfuser significantly outperforms unimodal detectors and other competing multimodal approaches in detection accuracy in complex environments.
Second, to further improve the detection accuracy of the multimodal detection framework without introducing additional computational load, a lightweight fusion module based on Convolutional Neural Networks (CNN) is introduced. This module, termed Channel Switching and Spatial Attention (CSSA), integrates input features along both the channel and spatial dimensions. The experimental results demonstrate that the CSSA module further improves detection accuracy without affecting the real-time performance of the detection framework.
Finally, considering the impact of other components, such as the backbone network and the loss function, on detection performance, this study further optimizes the CSSA-based multimodal detection model and introduces CSSA-Det. CSSA-Det shows improved object detection performance over CSSA and other state-of-the-art multimodal frameworks, particularly in the accuracy of bounding box localization.
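As a rough illustration of what fusing RGB and IR features along the channel and spatial dimensions could look like, the NumPy sketch below swaps low-importance channels between the two modalities and then blends them with per-location weights. This is not the thesis's CSSA implementation: the abstract does not describe the module's internals, so the function names (`channel_switch`, `spatial_attention_fuse`), the sigmoid importance score, the `threshold` parameter, and the softmax weighting are all assumptions chosen only to make the idea concrete.

```python
import numpy as np


def channel_switch(rgb, ir, threshold=0.5):
    """Swap low-importance channels between modalities (hypothetical rule).

    rgb, ir: feature maps of shape (C, H, W).
    A channel's importance is assumed to be the sigmoid of its global
    average; channels scoring below `threshold` are replaced by the
    corresponding channel of the other modality.
    """
    w_rgb = 1.0 / (1.0 + np.exp(-rgb.mean(axis=(1, 2))))  # shape (C,)
    w_ir = 1.0 / (1.0 + np.exp(-ir.mean(axis=(1, 2))))    # shape (C,)
    rgb_out = np.where(w_rgb[:, None, None] < threshold, ir, rgb)
    ir_out = np.where(w_ir[:, None, None] < threshold, rgb, ir)
    return rgb_out, ir_out


def spatial_attention_fuse(rgb, ir):
    """Blend two feature maps with per-location weights (hypothetical rule).

    The channel-mean of each modality acts as a saliency map of shape
    (H, W); a softmax over the two modalities gives the blending weights.
    """
    s_rgb = rgb.mean(axis=0)
    s_ir = ir.mean(axis=0)
    m = np.maximum(s_rgb, s_ir)          # subtract the max for stability
    e_rgb = np.exp(s_rgb - m)
    e_ir = np.exp(s_ir - m)
    a_rgb = e_rgb / (e_rgb + e_ir)       # weight of the RGB branch, (H, W)
    return a_rgb * rgb + (1.0 - a_rgb) * ir  # broadcasts over channels


def fuse(rgb, ir, threshold=0.5):
    """Channel switching followed by spatial-attention blending."""
    rgb_sw, ir_sw = channel_switch(rgb, ir, threshold)
    return spatial_attention_fuse(rgb_sw, ir_sw)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb_feat = rng.normal(size=(8, 16, 16))  # stand-in RGB backbone features
    ir_feat = rng.normal(size=(8, 16, 16))   # stand-in IR backbone features
    print(fuse(rgb_feat, ir_feat).shape)
```

In a detector such as the Faster R-CNN framework described above, a module of this kind would sit between the two backbone branches and the shared detection head. The sketch uses no learned parameters, whereas a trained fusion module would learn its weighting.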
Item Metadata
Title | Intelligent surveillance with multimodal object detection in complex environments
Creator | Cao, Yue
Publisher | University of British Columbia
Date Issued | 2024
Language | eng
Date Available | 2024-03-28
Provider | Vancouver : University of British Columbia Library
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International
DOI | 10.14288/1.0440965
Degree Grantor | University of British Columbia
Graduation Date | 2024-05
Scholarly Level | Graduate
Aggregated Source Repository | DSpace