UBC Theses and Dissertations

Targeted feature extraction: a deep learning approach

Tsai, Yiting

Abstract

This thesis details the progressive development of a Machine Learning workflow aimed at multi-class problems in the engineering and clinical fields. We select Deep Learning as the basis of this modelling framework, as the universal approximation property of neural networks renders them agnostic to different types of underlying data structures. We propose an optimal Deep Learning model which extracts interpretable features, capturing the decisive, salient characteristics of each data class. This is accomplished by revising the traditional Deep Learning objective, introducing an additional term which enhances class separation and identity. Using mathematical properties of the discovered latent space, we introduce a Feature Extractor based on weight traceback, which connects the decisive class-specific neurons to the raw variables in the input layer. The efficacy and necessity of the proposed strategy are demonstrated across six case studies. The first two studies highlight the inconsistency across clusters discovered by traditional Unsupervised Learning models, as well as the misconception that traditional Deep Learning is a magical solution to every problem. The following two studies demonstrate proof-of-concept for the proposed strategy on two Machine Learning benchmark datasets, showing visible improvements in both classification accuracy and feature extraction compared to baseline models. Finally, the remaining two studies explore clinical applications concerning the diagnosis of COVID-19 and Scleroderma patients. In each case, the proposed Machine Learning strategy is compared against traditional, state-of-the-art models with respect to class cluster separability, prediction accuracy, and biomarker discovery. The results show clear improvements in each of these areas; moreover, computational complexity analysis shows that our method scales linearly with the number of samples in the dataset, and linearithmically with the number of raw variables.
The main practical contributions of this thesis include a significant improvement in prediction accuracy through the reduction of false discovery rates, as well as the discovery of signature variables which allow for targeted mitigation of undesired conditions.

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International