UBC Theses and Dissertations

Adversarial attacks on multi-modal 3D detection models

Abdelfattah, Mazen


Modern Autonomous Vehicles (AVs) rely on sensory data, often acquired by cameras and LiDARs, to perceive the world around them. To operate the vehicle safely and effectively, Artificial Intelligence (AI) is used to process this data and detect objects of interest around the vehicle. For 3D object detection, recent advances in deep learning have produced state-of-the-art multi-modal models built from Deep Neural Networks (DNNs) that process both camera images and LiDAR point clouds. While DNN-based models are powerful and accurate, they may be vulnerable to adversarial attacks, which introduce a small change to a model's input that can cause large errors in its output. Such attacks have been studied extensively for models that take only camera images as input, and more recently for point cloud processing models; however, they have rarely been investigated for models that use both modalities, as is common in modern AVs. To address this gap, we propose a realistic adversarial attack on such multi-modal 3D detection models. We place a 3D adversarial object on a vehicle with the aim of hiding this host vehicle from detection by powerful multi-modal 3D detectors. The object's shape and texture are trained so that a single object can prevent a specific model from detecting any host vehicle in any scene. 3D detection models are often based on either a cascaded architecture, where each input modality is processed consecutively, or a fusion architecture, where features are extracted from multiple inputs and fused simultaneously. We use our attack to study the vulnerabilities of representative models of these two architectures to realistic adversarial attacks, and to understand how multi-modal learning affects a model's robustness. Our experiments show that a single adversarial object hides its host vehicle 55.6% of the time from the cascaded model and 63.19% of the time from the fusion model.
This vulnerability was found to stem mainly from the RGB image features, which were much less robust to adversarial scene changes than the point cloud features.
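The abstract does not describe the optimization procedure, so the following is only a minimal, hypothetical sketch of the general idea behind a *universal* adversarial perturbation: one bounded change, shared across every scene, that is optimized to push a detector's scores below its detection threshold. Everything here is an assumption for illustration: the 3D object's shape and texture are replaced by a single vector `delta` added to per-scene feature vectors, the multi-modal 3D detector is replaced by a toy logistic scorer, and the names `universal_attack`, `detection_scores`, `hide_rate`, the budget `eps`, and the synthetic data are invented. The attack itself is projected sign-gradient descent on the mean log detection score.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def detection_scores(features, delta, w, b):
    # Hypothetical toy detector: one logistic "vehicle" score per scene,
    # computed on the scene features plus the shared perturbation delta.
    return sigmoid((features + delta) @ w + b)


def universal_attack(features, w, b, eps=0.5, step=0.05, iters=100):
    """Projected sign-gradient descent on ONE perturbation shared by all scenes.

    features: (n_scenes, d) array, one row of host-vehicle features per scene.
    Returns delta with max(|delta|) <= eps (the "small change" budget).
    """
    delta = np.zeros(features.shape[1])
    for _ in range(iters):
        s = detection_scores(features, delta, w, b)
        # Gradient of mean(log s) w.r.t. delta: mean over scenes of (1 - s) * w.
        grad = np.mean(1.0 - s) * w
        # Step downhill so every scene's detection score drops.
        delta = delta - step * np.sign(grad)
        # Project back onto the L-infinity ball of radius eps.
        delta = np.clip(delta, -eps, eps)
    return delta


def hide_rate(features, delta, w, b, threshold=0.5):
    # Fraction of scenes in which the host vehicle is NOT detected.
    return float(np.mean(detection_scores(features, delta, w, b) < threshold))


# Synthetic scenes (hypothetical): vehicles the clean toy model mostly detects.
rng = np.random.default_rng(0)
d, n_scenes = 8, 200
w = rng.normal(size=d)
b = 0.0
features = 0.3 * rng.normal(size=(n_scenes, d)) + 0.4 * np.sign(w)

delta = universal_attack(features, w, b)
clean_rate = hide_rate(features, np.zeros(d), w, b)
attacked_rate = hide_rate(features, delta, w, b)
print(f"hidden before attack: {clean_rate:.2f}, after attack: {attacked_rate:.2f}")
```

Because the toy detector is linear in `delta`, the optimum is simply `-eps * sign(w)`, which the loop reaches after a few steps. A real attack on a multi-modal detector would instead have to backpropagate through the detector and through how the physical object appears in both the camera image and the LiDAR point cloud; that machinery is outside the scope of this sketch.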

Attribution-NonCommercial-NoDerivatives 4.0 International