Open Collections
UBC Theses and Dissertations
Enhancing privacy and efficiency in AI models through data-oriented learning strategies
Huang, Chun-Yin
Abstract
The rapid development of Artificial Intelligence (AI) is transforming how we work, live, and innovate. Despite achieving impressive performance, AI models face challenges in reliability, privacy, and efficiency for real-world deployment. This thesis addresses these concerns through two main objectives: 1) tackling data privacy and heterogeneity via federated learning (FL), and 2) auditing data usage and quality during both pre-training and post-training stages. Efficiency is also emphasized to ensure practical applicability.
All solutions in this thesis share a data-centric view, grounded in the belief that AI models are shaped by data and can, in turn, reveal information about it. To enhance privacy and efficiency, we first improve heterogeneous FL with new frameworks such as FedLGD and DeSA. These methods use virtual data to overcome challenges like heterogeneity, asynchronization, and decentralized coordination, leading to more robust and scalable FL systems. Both theoretical analysis and empirical results support their effectiveness.
Then, we shift to data auditing, critical for ensuring data quality and ethical use. For post-training analysis, we introduce EMA, a method that combines membership inference with statistical ensembles to verify training data legitimacy. For pre-training assessment, EXAMINE leverages self-supervised learning to evaluate data quality. These methods enhance the robustness and accountability of AI systems, particularly in sensitive fields like healthcare.
Finally, we propose future research directions that aim to improve scalability and adapt our work to emerging model architectures, enabling broader applicability in resource-limited and multimodal settings.
In summary, this thesis offers a comprehensive approach to building trustworthy and efficient AI systems through innovations in federated learning and data auditing. By adopting a data-oriented perspective, it addresses key challenges in privacy, scalability, and accountability. The proposed methods, validated both theoretically and empirically, demonstrate strong potential for real-world impact, especially in ethically sensitive domains.
Item Metadata
Title | Enhancing privacy and efficiency in AI models through data-oriented learning strategies
Publisher | University of British Columbia
Date Issued | 2025
Language | eng
Date Available | 2025-08-11
Provider | Vancouver : University of British Columbia Library
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International
DOI | 10.14288/1.0449620
Degree Grantor | University of British Columbia
Graduation Date | 2025-11
Scholarly Level | Graduate
Aggregated Source Repository | DSpace