UBC Theses and Dissertations


Combatting bias in small medical imaging datasets with generative deep learning
Vavasour, Zach

Abstract

As artificial intelligence (AI) continues to find its way into more aspects of our daily lives, addressing its potential for bias and discriminatory behaviour has become increasingly critical. While bias remains a concern across all fields of machine learning, it is a particularly pressing problem in medical imaging applications. The high cost of scanning participants, combined with the frequently imbalanced distribution of real-world health conditions, makes most datasets highly undersampled and unbalanced. To combat the effects of bias in small medical imaging datasets, we propose employing a generative deep learning framework to synthesize additional samples of the under-represented groups. In this research we first explore how imbalances in participant age, sex, disease phenotype, and scanner manufacturer affect downstream performance on three progressively more difficult classification tasks in a small brain MRI dataset (n=1000) of people living with multiple sclerosis (MS): MS versus healthy control, MS phenotype, and progressor versus non-progressor classification. Our results demonstrate that for all chosen factors except sex, downstream classification performance is significantly lower for the under-represented groups. We then train a state-of-the-art wavelet diffusion model (WDM) to synthesize additional samples conditioned on the identified sources of bias. We evaluate the quality of the generated data by training classifiers to predict the conditioning targets on a fully synthetic dataset; the trained classifiers are then applied to a test set of real data to assess whether the features present in the synthetic data are representative of real features. Lastly, we synthesize brain MRIs of the specific under-represented groups and re-evaluate performance on the three previously defined classification tasks. Our results demonstrate that including synthetic data reduces the relative gaps in performance between the majority and minority groups without harming overall performance. This research indicates that synthetic data from generative deep learning models can be a powerful tool to combat bias when intelligently configured to synthesize new samples of under-represented groups.
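To make the rebalancing step concrete, the following is a minimal sketch (not code from the thesis) of how per-group synthetic budgets could be computed: every under-represented conditioning group is topped up to the size of the majority group by requesting samples from a conditional generator. The `sample_wdm` function and the phenotype/scanner labels are hypothetical stand-ins for the conditioned wavelet diffusion sampler and the dataset's actual metadata.

```python
# A minimal, illustrative sketch of the rebalancing idea described in the abstract.
# `sample_wdm` is a hypothetical placeholder for the conditioned wavelet diffusion
# sampler; the group labels below are toy values, not the thesis's actual metadata.
from collections import Counter

def synthetic_budget(group_labels):
    """How many synthetic samples each group needs to match the largest group."""
    counts = Counter(group_labels)
    target = max(counts.values())
    return {group: target - n for group, n in counts.items() if n < target}

def sample_wdm(group, n):
    """Placeholder: would draw n synthetic MRI volumes conditioned on `group`."""
    return [f"synthetic_{'_'.join(group)}_{i}" for i in range(n)]

if __name__ == "__main__":
    # Toy (phenotype, scanner) labels for an imbalanced real dataset of 1000 scans.
    real_groups = ([("RRMS", "ScannerA")] * 700
                   + [("SPMS", "ScannerA")] * 200
                   + [("PPMS", "ScannerB")] * 100)
    for group, n in synthetic_budget(real_groups).items():
        volumes = sample_wdm(group, n)
        print(group, "->", len(volumes), "synthetic volumes requested")
```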

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International