UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Fault injection in Machine Learning applications Narayanan, Niranjhana

Abstract

As Machine Learning (ML) has seen increasing adoption in safety-critical domains (e.g., autonomous vehicles), the reliability of ML systems has also grown in importance. While prior studies have proposed techniques to enable efficient error-resilience (e.g., selective instruction duplication), a fundamental requirement for realizing these techniques is a detailed understanding of the application's resilience. The primary part of this thesis focuses on studying ML application resilience to hardware and software faults. To this end, we present the TensorFI tool set, consisting of TensorFI 1 and 2 which are high-level fault injection frameworks for TensorFlow 1 and 2 respectively. With this tool set, we inject faults in TensorFlow programs and study important reliability aspects such as model resilience to different kinds of faults, operator and layer level resilience of different models or the effect of hyperparameter variations. We evaluate the resilience of 12 ML applications, including those used in the autonomous vehicle domain. From our experiments, we find that there are significant differences between different ML applications and different configurations. Further, we find that applications are more vulnerable to bit-flip faults than other kinds of faults. We conduct four case studies to demonstrate some use cases of the tool set. We find the most and least resilient image classes to faults in a traffic sign recognition model. We consider layer-wise resilience and observe that faults in the initial layers of an application result in higher vulnerability. In addition, we visualize the outputs from layer-wise injection in an image segmentation model, and are able to identify the layer in which faults occurred based on the faulty prediction masks. These case studies thus provide valuable insights into how to improve the resilience of ML applications. The secondary part of this thesis focuses on studying ML application resilience to data faults (e.g. adversarial inputs, labeling errors, common corruptions/noisy data). We present a data mutation tool, TensorFlow Data Mutator (TF-DM), which targets different kinds of data faults commonly occurring in ML applications. We conduct experiments using TF-DM and outline the resiliency analysis of different models and datasets.

Item Citations and Data

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International