UBC Theses and Dissertations
Graph neural networks and transformers for enhanced explainability and generalizability in medical machine learning
Mokhtari, Masoud
Machine learning frameworks for medical applications must be explainable, generalizable despite the scarcity of training data, and able to tackle various clinical tasks with minimal modifications. Explainability is crucial for safety-critical applications, as users must recognize when human supervision is needed. Additionally, developing a strong inductive bias from sparsely labeled data is essential, given that large-scale medical datasets are not widely available. In a clinical setting, numerous metrics are measured daily, making it logistically challenging to maintain separate models for individual metrics. This highlights the need for flexible frameworks that preserve explainability. In this thesis, we address these requirements by proposing three frameworks that harness the representation power of graph neural networks (GNNs) or transformers, improving on the state of the art and enhancing the practicality of machine learning in medical applications. Our first framework aims to provide explainability in the prediction pipeline. We demonstrate its effectiveness using the task of left ventricular ejection fraction estimation from echocardiographic videos. This framework employs GNNs to learn a weighted graph between the frames of an input echocardiogram before producing a single ejection fraction estimate. Our results show that the learned latent structure aligns with clinical guidelines for predicting ejection fraction and can serve as a surrogate for the model's confidence in its predictions. The second framework improves model generalizability for sparsely labeled data using GNNs. We apply the framework to the task of clinical landmark detection, where only a small number of frames in a video are labeled. To maximize the use of supervisory signals, we employ a multi-scale objective function and a hierarchical graph structure. Our results indicate that this approach builds a better inductive bias and outperforms previous work.
Lastly, we propose a flexible framework that offers attention-based explainability on multiple levels, making it suitable for various clinical tasks. This framework utilizes Transformers, a special instance of GNNs, to capture patch-wise, frame-wise, and video-wise interactions in echocardiographic data. This approach aids in identifying pertinent information for a specific clinical metric. To showcase the flexibility of this framework, we consider two critical cardiac tasks: aortic stenosis detection and ejection fraction estimation.
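The remark that Transformers are a special instance of GNNs can be illustrated with a minimal sketch: single-head self-attention over frame embeddings amounts to one round of message passing on a fully connected graph whose edge weights are the attention scores. The code below is not the thesis implementation. The function names, the random projection matrices, and the sizes (8 frames, 16-dimensional embeddings) are all assumptions chosen for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def frame_attention_graph(frames, w_q, w_k):
    """Build a dense weighted graph over frames from scaled dot-product scores.

    frames: (T, d) array of per-frame embeddings (hypothetical inputs).
    Returns a (T, T) matrix whose row i holds the weights of the edges
    along which frame i receives messages; each row sums to 1.
    """
    queries = frames @ w_q
    keys = frames @ w_k
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    return softmax(scores, axis=-1)


def message_pass(frames, adjacency, w_v):
    """One message-passing step: each frame aggregates projected neighbours."""
    return adjacency @ (frames @ w_v)


# Illustrative random data; a real pipeline would use learned weights
# and embeddings produced by a video encoder.
rng = np.random.default_rng(0)
num_frames, dim = 8, 16
frames = rng.normal(size=(num_frames, dim))
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) * 0.1 for _ in range(3))

adjacency = frame_attention_graph(frames, w_q, w_k)
updated = message_pass(frames, adjacency, w_v)

# Mean pooling gives one video-level vector; a regression head could map
# it to a scalar such as an ejection fraction estimate.
video_embedding = updated.mean(axis=0)
```

In this sketch the rows of `adjacency` can be inspected directly to see which frames contribute most to each updated frame, which is the kind of frame-level weighting the abstract describes as a source of explainability.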
Attribution-NonCommercial-ShareAlike 4.0 International