Enhancing privacy and efficiency in AI models through data-oriented learning strategies

UBC Theses and Dissertations

Huang, Chun-Yin

Abstract

The rapid development of Artificial Intelligence (AI) is transforming how we work, live, and innovate. Despite achieving impressive performance, AI models face challenges in reliability, privacy, and efficiency for real-world deployment. This thesis addresses these concerns through two main objectives: 1) tackling data privacy and heterogeneity via federated learning (FL), and 2) auditing data usage and quality during both pre-training and post-training stages. Efficiency is also emphasized to ensure practical applicability. All solutions in this thesis share a data-centric view, grounded in the belief that AI models are shaped by data and can, in turn, reveal information about it.

To enhance privacy and efficiency, we first improve heterogeneous FL with new frameworks such as FedLGD and DeSA. These methods use virtual data to overcome challenges like heterogeneity, asynchronization, and decentralized coordination, leading to more robust and scalable FL systems. Both theoretical analysis and empirical results support their effectiveness.

Then, we shift to data auditing, critical for ensuring data quality and ethical use. For post-training analysis, we introduce EMA, a method that combines membership inference with statistical ensembles to verify training data legitimacy. For pre-training assessment, EXAMINE leverages self-supervised learning to evaluate data quality. These methods enhance the robustness and accountability of AI systems, particularly in sensitive fields like healthcare.

Finally, we propose future research directions that aim to improve scalability and adapt our work to emerging model architectures, enabling broader applicability in resource-limited and multimodal settings.

In summary, this thesis offers a comprehensive approach to building trustworthy and efficient AI systems through innovations in federated learning and data auditing. By adopting a data-oriented perspective, it addresses key challenges in privacy, scalability, and accountability. The proposed methods, validated both theoretically and empirically, demonstrate strong potential for real-world impact, especially in ethically sensitive domains.

Item Metadata

Title	Enhancing privacy and efficiency in AI models through data-oriented learning strategies
Creator	Huang, Chun-Yin
Supervisor	Li, Xiaoxiao
Publisher	University of British Columbia
Date Issued	2025
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2025-08-11
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0449620
URI	http://hdl.handle.net/2429/91816
Degree (Theses)	Doctor of Philosophy - PhD
Program (Theses)	Electrical and Computer Engineering
Affiliation	Applied Science, Faculty of; Electrical and Computer Engineering, Department of
Degree Grantor	University of British Columbia
Graduation Date	2025-11
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace
