UBC Theses and Dissertations
Adversarial attacks on multi-modal 3D detection models
Abdelfattah, Mazen
Abstract
Modern Autonomous Vehicles (AVs) rely on sensory data, often acquired by cameras and LiDARs, to perceive the world around them. To operate the vehicle safely and effectively, Artificial Intelligence (AI) is used to process this data and detect objects of interest around the vehicle. For 3D object detection, recent advances in deep learning have produced state-of-the-art multi-modal models built on Deep Neural Networks (DNNs) that process both camera images and LiDAR point clouds. While DNN-based models are powerful and accurate, they may be vulnerable to adversarial attacks, in which a small change to a model’s input can produce large errors in its output. Such attacks have been heavily investigated for models that operate on camera images alone, and more recently for point cloud processing models; however, they have rarely been investigated for models that use both modalities, as is often the case in modern AVs. To address this gap, we propose a realistic adversarial attack on multi-modal 3D detection models. We place a 3D adversarial object on a vehicle with the aim of hiding the object’s host vehicle from detection by powerful multi-modal 3D detectors. The object’s shape and texture are trained so that it prevents a specific model from detecting any host vehicle in any scene. 3D detection models are often based on either a cascaded architecture, where each input modality is processed consecutively, or a fusion architecture, where features from multiple inputs are extracted and fused simultaneously. We use our attack to study the vulnerability of representative models of both architectures to realistic adversarial attacks, and to understand the effect of multi-modal learning on a model’s robustness. Our experiments show that a single adversarial object can hide its host vehicle from the cascaded model 55.6% of the time and from the fusion model 63.19% of the time. This vulnerability was found to stem mainly from the RGB image features, which were much less robust to adversarial scene changes than the point cloud features.
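The abstract's core idea of a small input change producing a large output error can be illustrated with a minimal fast-gradient-sign-method (FGSM) style sketch on a toy detector. Everything below — the logistic "detector", its weights, and the perturbation budget `eps` — is an illustrative assumption for exposition, not the thesis's actual attack on multi-modal 3D models.

```python
import numpy as np

# Toy stand-in for a detector: a fixed logistic model whose output we
# interpret as the probability that an object is detected.
# (Illustrative only; the thesis attacks real multi-modal 3D detectors.)
rng = np.random.default_rng(0)
w = rng.normal(size=8)   # fixed "model" weights
b = 0.0
x = rng.normal(size=8)   # a clean input

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def score(x):
    # Detection score in (0, 1)
    return sigmoid(w @ x + b)

def grad_wrt_input(x):
    # Gradient of the detection score with respect to the input
    p = score(x)
    return p * (1.0 - p) * w

# FGSM-style step: perturb the input by at most eps per coordinate
# (an L-infinity bound) in the direction that *decreases* detection —
# the "small change, large effect" the abstract describes.
eps = 0.5
x_adv = x - eps * np.sign(grad_wrt_input(x))

print("clean score:      ", score(x))
print("adversarial score:", score(x_adv))
```

The thesis's attack is analogous in spirit but far more constrained: instead of perturbing raw input tensors freely, it optimizes the shape and texture of a single physical 3D object so the perturbation survives rendering into both the camera image and the LiDAR point cloud.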
Item Metadata
Title | Adversarial attacks on multi-modal 3D detection models
Creator | Abdelfattah, Mazen
Publisher | University of British Columbia
Date Issued | 2021
Genre | |
Type | |
Language | eng
Date Available | 2021-04-22
Provider | Vancouver : University of British Columbia Library
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International
DOI | 10.14288/1.0396930
URI | |
Degree | |
Program | |
Affiliation | |
Degree Grantor | University of British Columbia
Graduation Date | 2021-05
Campus | |
Scholarly Level | Graduate
Rights URI | |
Aggregated Source Repository | DSpace