
UBC Theses and Dissertations

Exploring heterogeneity and resource constraints in federated learning

Yu, Xinhui

Abstract

One key factor in training deep learning models is abundant training data. This often requires collecting data from multiple sources (clients), which may violate data protection regulations. Federated learning (FL) provides a training paradigm in which clients collaboratively train a global model while keeping their data decentralized. Despite its potential, current FL frameworks rely on restrictive assumptions that limit their performance and applicability in real-world scenarios. In practice, client data originate from different distributions (i.e., data heterogeneity), and local models are optimized for their respective data, leading to poor performance and slow convergence. To address data heterogeneity, we first adopt a domain-invariant feature learning perspective to capture task-relevant features and propose a generalized FL framework. We extend this framework to a label-scarce setting in which only a few samples are labeled at each client, thereby reducing the annotation burden. We then tackle a more challenging scenario involving both labeled and unlabeled clients, where data distributions differ between the two groups. To facilitate effective knowledge transfer, we introduce a generalized FL framework that combines pseudo-labeling with a dual-selection strategy, which selects both pseudo-labeled samples and model components for updating. Next, we reformulate the global objective to produce a personalized model for each client, addressing data heterogeneity from a new perspective. We propose a Bayesian-enhanced personalized FL framework that incorporates Bayesian learning to mitigate overfitting and designs a high-quality personalized prior for each client to guide local training. We then continue our investigation of personalized FL, focusing on an underexplored yet critical form of data heterogeneity: concept drift across clients. We also explore the potential of multimodal data.
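The pseudo-labeling half of the dual-selection strategy can be illustrated with a minimal sketch. The snippet below assumes a simple confidence-threshold criterion for selecting pseudo-labeled samples; the function names, the threshold value, and the criterion itself are illustrative assumptions, not the thesis's actual method, and the second half of the strategy (selecting model components for updating) is not modeled here.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def select_pseudo_labels(logits, threshold=0.9):
    """Illustrative sample-selection step: keep only unlabeled
    samples whose top predicted class probability exceeds
    `threshold`, and pseudo-label them with that class."""
    probs = softmax(np.asarray(logits, dtype=float))
    confidence = probs.max(axis=-1)
    labels = probs.argmax(axis=-1)
    mask = confidence >= threshold
    # Return the pseudo-labels and the indices of selected samples.
    return labels[mask], np.nonzero(mask)[0]

# A confident prediction (first row) is kept; an ambiguous one is not.
labels, idx = select_pseudo_labels([[5.0, 0.0], [0.1, 0.0]])
```

Only high-confidence predictions survive the filter, which limits the noise that wrong pseudo-labels would otherwise inject into local training.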
To improve model performance, we propose a multimodal-enhanced personalized FL framework with personalized modules that capture each client's individual understanding of its inputs and effective fusion strategies that integrate features from diverse modalities. Finally, we explore the feasibility of allowing clients to customize their model structures and propose a heterogeneous FL framework that tackles this dual heterogeneity. In this framework, a designed 'bridge' enables collaboration by representing each client's local knowledge as logits on the bridge, and a similarity-based knowledge distillation strategy supports effective cross-client knowledge absorption.
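The idea of similarity-based knowledge distillation over shared logits can be sketched as follows. This is a minimal illustration, assuming cosine similarity between logit vectors as the weighting measure and a temperature-scaled KL divergence as the distillation loss; both choices are assumptions for the sketch, not the specific design in the thesis.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def similarity_weighted_distill(student_logits, peer_logits_list,
                                temperature=2.0):
    """Illustrative cross-client distillation: weight each peer's
    soft targets by cosine similarity to the student's logits
    (hypothetical measure), then compute KL(target || student)."""
    s = np.asarray(student_logits, dtype=float)
    student_probs = softmax(s / temperature)
    weights, targets = [], []
    for peer in peer_logits_list:
        p = np.asarray(peer, dtype=float)
        sim = (s @ p) / (np.linalg.norm(s) * np.linalg.norm(p) + 1e-12)
        weights.append(max(sim, 0.0))  # ignore dissimilar peers
        targets.append(softmax(p / temperature))
    w = np.array(weights)
    if w.sum() == 0.0:
        return 0.0  # no sufficiently similar peer to learn from
    w = w / w.sum()
    # Blend peer soft targets according to similarity weights.
    target = (w[:, None] * np.array(targets)).sum(axis=0)
    # KL divergence from the blended target to the student prediction.
    return float((target * (np.log(target + 1e-12)
                            - np.log(student_probs + 1e-12))).sum())
```

Weighting peers by similarity lets a client absorb knowledge mainly from clients whose predictions align with its own, which is one plausible way to make cross-client transfer robust under heterogeneous model structures.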


Rights

Attribution-NonCommercial-NoDerivatives 4.0 International