UBC Theses and Dissertations


Advancing privacy and accountability in machine learning
Chen, Zitao

Abstract

As inherently data-driven solutions, machine learning (ML) systems aggregate and process vast amounts of user data, such as clinical files and financial records. While these data are critical for training ML systems to achieve high prediction accuracy, it is equally important to ensure that these systems do not leak users' information. Moreover, users' data, such as personal photos, may be used to train ML systems without their permission, so it is essential that data owners be able to audit any unauthorized use of their data. This dissertation addresses both problems through the unifying lens of membership inference (MI) attacks, which aim to infer whether a specific data point was used to train an ML model. The abundance of existing MI attacks poses a severe privacy threat in ML. To address this threat, we develop a practical defense technique that effectively mitigates MI attacks while preserving high model accuracy, supporting model creators in building private and accurate ML systems. Beyond the privacy threat posed by common MI attacks, we then explore how malicious adversaries can proactively manipulate ML models to amplify their privacy leakage. We demonstrate how a supply-chain attacker can implant a privacy backdoor into ML models, causing them to leak significantly more information while still maintaining high accuracy on the main task. Finally, we study how MI, a class of attacks typically used maliciously, can be retrofitted for societal good. We present an MI-based data auditing technique that enables ordinary data holders to accurately detect whether their data were used to train ML models without their consent. Together, the findings from this dissertation support practitioners in building private ML systems from privacy-sensitive data and empower users to safeguard their data from misuse, ultimately contributing to the responsible use of ML in society.
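To make the membership-inference threat concrete, the sketch below illustrates a simple loss-threshold MI attack of the kind this line of work studies. It is an illustrative example only, not the attack, defense, or auditing technique developed in the dissertation; the toy model, inputs, and threshold are hypothetical placeholders.

    # Illustrative loss-threshold membership inference attack (not from the dissertation).
    # Assumes access to a trained classifier via predict_proba(x) -> class-probability vector;
    # the model, data points, and threshold below are hypothetical placeholders.
    import numpy as np

    def loss_threshold_mi_attack(predict_proba, x, y, threshold):
        """Guess that (x, y) was a training point if the model's loss on it is low."""
        probs = predict_proba(x)              # model's predicted class probabilities
        loss = -np.log(probs[y] + 1e-12)      # cross-entropy loss on the true label
        return loss < threshold               # low loss -> likely a training member

    # Toy "model" that is overconfident on one memorized point and uncertain elsewhere.
    def toy_predict_proba(x):
        return np.array([0.98, 0.01, 0.01]) if x == "memorized" else np.array([0.4, 0.3, 0.3])

    print(loss_threshold_mi_attack(toy_predict_proba, "memorized", 0, threshold=0.5))  # True
    print(loss_threshold_mi_attack(toy_predict_proba, "unseen", 0, threshold=0.5))     # False

Overconfident (low-loss) predictions on training points are the signal such attacks exploit; the dissertation's defense aims to suppress this signal while keeping accuracy high, and its auditing technique turns the same signal into evidence of unauthorized data use.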



Rights

Attribution-NonCommercial-NoDerivatives 4.0 International