UBC Theses and Dissertations


Combatting bias in small medical imaging datasets with generative deep learning
Vavasour, Zach

Abstract

As artificial intelligence (AI) continues to find its way into more aspects of our daily lives, addressing its potential for bias and discriminatory behaviour has become increasingly critical. While bias remains a concern across all fields of machine learning, it is a particularly pressing problem in medical imaging applications. The high cost of scanning participants, combined with the frequently imbalanced distribution of real-world health conditions, makes most datasets highly undersampled and unbalanced. To combat the effects of bias in small medical imaging datasets, we propose employing a generative deep learning framework to synthesize additional samples of the under-represented groups. In this research we first explore how imbalances in participant age, sex, disease phenotype, and scanner manufacturer affect downstream performance on three progressively more difficult classification tasks in a small brain MRI dataset (n=1000) of people living with multiple sclerosis (MS): MS versus healthy control, MS phenotype, and progressor versus non-progressor classification. Our results demonstrate that for all chosen factors except sex, downstream classification performance is significantly lower for the under-represented groups. We then train a state-of-the-art wavelet diffusion model (WDM) to synthesize additional samples conditioned on the identified sources of bias. We evaluate the quality of the generated data by training classifiers to predict the conditioning targets on a fully synthetic dataset; the trained classifiers are then applied to a test set of real data to assess whether the features present in the synthetic data are representative of real features. Lastly, we synthesize brain MRIs of the specific under-represented groups and re-evaluate performance on the three previously defined classification tasks. Our results demonstrate that including synthetic data reduces the relative gaps in performance between the majority and minority groups without harming overall performance. This research indicates that synthetic data from generative deep learning models can be a powerful tool to combat bias when intelligently configured to synthesize new samples of under-represented groups.
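To make the rebalancing step concrete, the following is a minimal sketch (not code from the thesis) of how per-group synthetic budgets could be computed: every under-represented conditioning group is topped up to the size of the majority group by requesting samples from a conditional generator. The `sample_wdm` function and the phenotype/scanner labels are hypothetical stand-ins for the conditioned wavelet diffusion sampler and the dataset's actual metadata.

```python
# A minimal, illustrative sketch of the rebalancing idea described in the abstract.
# `sample_wdm` is a hypothetical placeholder for the conditioned wavelet diffusion
# sampler; the group labels below are toy values, not the thesis's actual metadata.
from collections import Counter

def synthetic_budget(group_labels):
    """How many synthetic samples each group needs to match the largest group."""
    counts = Counter(group_labels)
    target = max(counts.values())
    return {group: target - n for group, n in counts.items() if n < target}

def sample_wdm(group, n):
    """Placeholder: would draw n synthetic MRI volumes conditioned on `group`."""
    return [f"synthetic_{'_'.join(group)}_{i}" for i in range(n)]

if __name__ == "__main__":
    # Toy (phenotype, scanner) labels for an imbalanced real dataset of 1000 scans.
    real_groups = ([("RRMS", "ScannerA")] * 700
                   + [("SPMS", "ScannerA")] * 200
                   + [("PPMS", "ScannerB")] * 100)
    for group, n in synthetic_budget(real_groups).items():
        volumes = sample_wdm(group, n)
        print(group, "->", len(volumes), "synthetic volumes requested")
```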

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International