UBC Theses and Dissertations

Enhancing privacy and efficiency in AI models through data-oriented learning strategies

Huang, Chun-Yin

Abstract

The rapid development of Artificial Intelligence (AI) is transforming how we work, live, and innovate. Despite achieving impressive performance, AI models still face challenges in reliability, privacy, and efficiency when deployed in the real world. This thesis addresses these concerns through two main objectives: 1) tackling data privacy and heterogeneity via federated learning (FL), and 2) auditing data usage and quality during both the pre-training and post-training stages. Efficiency is emphasized throughout to ensure practical applicability. All solutions in this thesis share a data-centric view, grounded in the belief that AI models are shaped by data and can, in turn, reveal information about it.

To enhance privacy and efficiency, we first improve heterogeneous FL with new frameworks such as FedLGD and DeSA. These methods use virtual data to overcome challenges such as heterogeneity, asynchrony, and decentralized coordination, leading to more robust and scalable FL systems. Both theoretical analysis and empirical results support their effectiveness.

We then shift to data auditing, which is critical for ensuring data quality and ethical use. For post-training analysis, we introduce EMA, a method that combines membership inference with statistical ensembles to verify the legitimacy of training data. For pre-training assessment, EXAMINE leverages self-supervised learning to evaluate data quality. These methods enhance the robustness and accountability of AI systems, particularly in sensitive fields such as healthcare.

Finally, we propose future research directions that aim to improve scalability and adapt our work to emerging model architectures, enabling broader applicability in resource-limited and multimodal settings.

In summary, this thesis offers a comprehensive approach to building trustworthy and efficient AI systems through innovations in federated learning and data auditing. By adopting a data-oriented perspective, it addresses key challenges in privacy, scalability, and accountability. The proposed methods, validated both theoretically and empirically, demonstrate strong potential for real-world impact, especially in ethically sensitive domains.

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International