Scandent tree: a decision forest based classification method for multimodal incomplete datasets

by

Soheil Hor
B.Sc., Isfahan University of Technology, 2012

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in The Faculty of Graduate and Postdoctoral Studies (Biomedical Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

April 2016

© Soheil Hor 2016

Abstract

Incomplete and inconsistent datasets often pose difficulties in multimodal studies. A common scenario in such studies is one where many of the samples are non-randomly missing a large portion of the most discriminative features. We introduce the novel concept of scandent decision trees to tackle this issue in the context of a decision forest classifier. Scandent trees are decision trees that optimally mimic the partitioning of the data determined by another decision tree and, crucially, use only a subset of the feature set. We use the forest resulting from ensembling these trees as a classification model. We test the proposed method on a real-world example of the target scenario: a prostate cancer dataset with MRI and gene expression modalities. The dataset is imbalanced, with many MRI-only samples and few samples with both MRI and gene expression. Using scandent trees, we train a classifier that benefits from the large number of MRI samples at training time, and from the presence of both MRI and gene expression features at the time of testing. The results show that the diagnostic value of the proposed model in terms of detecting prostate cancer is improved compared to traditional methods of imputation and missing data removal.

The second major contribution of this work is the concept of tree-based feature maps in the decision forest paradigm. Tree-based feature maps enable us to train a classifier on a rich multimodal dataset and use it to classify samples with only a subset of the features of the training data. This has important clinical implications: one can benefit from an advanced modality to train a classifier, but use it in a practical situation where only less expensive modalities are available. We use the proposed methodology to build a model trained on MRI and PET images of the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, and then test it on cases with only MRI data. We show that our method is significantly more effective in staging of cognitive impairments compared to a model trained and tested on MRI only, or one that uses other kinds of feature transform applied to the MRI data.

Preface

The concept of scandent trees was originally introduced by the author in a paper published at the Medical Image Computing and Computer Assisted Intervention conference (MICCAI 2015), titled "Scandent tree: a random forest learning method for incomplete multimodal datasets". This paper is the source of most of the material used in Chapter 3.

The majority of the work discussed in Chapter 4 is extracted from a paper conditionally accepted to the MICCAI special issue of the Medical Image Analysis (MedIA) journal under the title "Learning in data-limited multimodal scenarios: scandent decision forests and tree-based features".

The author's contribution is the development and evaluation of the techniques proposed in these publications, performed under the supervision of Dr. Mehdi Moradi.

This study was performed as part of an ethics certificate approved by the UBC Research Ethics Board under the title "Computational multimodal radiologic profiling as a prognostic bio-marker for individualized prostate cancer therapy", UBC CREB number H14-00359.
This study was conducted under the supervision of Dr. Peter Black and Dr. Mehdi Moradi.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgements
Dedication

1 Introduction
  1.1 Motivation
  1.2 Objective
  1.3 Contributions
  1.4 Organization of the thesis

2 Background
  2.1 Introduction
  2.2 Decision trees and decision forests
    2.2.1 Classification and regression decision trees
    2.2.2 C5.0 decision trees
    2.2.3 Decision forests
  2.3 Handling missing values
    2.3.1 Data removal methods
    2.3.2 General imputation methods
  2.4 State of the art tree-based imputation methods
    2.4.1 CART embedded imputation method: surrogate divisions
    2.4.2 C5.0 embedded imputation method
    2.4.3 Decision forest embedded imputation method: rfImpute
  2.5 Summary

3 Scandent tree: a forest based method for multimodal classification
  3.1 Introduction
  3.2 Method
    3.2.1 Mathematical formulation
    3.2.2 Intuition
    3.2.3 Support tree
    3.2.4 Scandent trees
    3.2.5 Leaf level inference
    3.2.6 Implementation
  3.3 Evaluation
    3.3.1 Evaluation using benchmark datasets
    3.3.2 A real scenario: prostate cancer dataset
  3.4 Simulation and experimental results
    3.4.1 Simulation results
    3.4.2 Experimental results: prostate cancer dataset
  3.5 Summary

4 Tree-based feature transforms: applying the scandent tree model for single modal classification
  4.1 Introduction
  4.2 Method
    4.2.1 Implementation
  4.3 Evaluation and results
    4.3.1 Evaluation using benchmark datasets
    4.3.2 A real scenario: ADNI dataset
    4.3.3 Comparison with other work on ADNI
  4.4 Summary

5 Conclusion
  5.1 Summary
  5.2 Discussions and limitations
    5.2.1 Limitations of the implemented method
    5.2.2 Discussions and limitations of the multimodal study
    5.2.3 Discussions and limitations of the single modal study
  5.3 Future work

Bibliography

List of Tables

3.1 List of features and the outcome classes, dermatology dataset
3.2 The feature set of the heart disease dataset
3.3 The feature set of the breast cancer dataset
3.4 List of the genes used in the prostate cancer study
4.1 Accuracy (Acc), Sensitivity (Sens), Specificity (Spec) and Area under ROC curve (AUC) of the proposed methods and the baseline forest for the NL vs. pMCI single modal classification task, ADNI dataset
4.2 Accuracy (Acc), Sensitivity (Sens), Specificity (Spec) and Area under ROC curve (AUC) of the proposed methods and the baseline forest for the sMCI vs. AD single modal classification task, ADNI dataset
4.3 Accuracy (Acc), Sensitivity (Sens), Specificity (Spec) and Area under ROC curve (AUC) of the proposed methods and the baseline forest for the sMCI vs. pMCI single modal classification task, ADNI dataset
4.4 Comparison of the proposed single modal method with the state of the art for sMCI vs. pMCI prediction, ADNI dataset

List of Figures

3.1 Diagram of the proposed method for growing the scandent trees
3.2 Registration example: (a) T2-weighted, (b) DTI and (c) DCE-MRI slice. The green contour represents the boundaries of the prostate gland. The red contour represents the mapped tumor ROI ([16, 25]).
3.3 Gene expression heat-map of the probes corresponding to the selected genes for each patient. Each row presents a sample and each column presents a gene expression feature. The vertical dendrograms show clustering of samples and the horizontal dendrograms show clustering of features. Sample clustering correctly clusters each patient, which shows that the gene expression profiles are mostly patient-specific. Although all of the selected genes are known biomarkers of prostate cancer, neither correlations between features nor cancer-related patterns are visible.
3.4 AUC vs. multimodal sample size for the dermatology dataset (each box shows variation of AUC values for different randomised training and test sets)
3.5 AUC vs. multimodal sample size for the heart disease dataset (each box shows AUC values for different single modal feature sets)
3.6 AUC vs. single modal feature set size for the heart disease dataset (each box shows AUC values for different multimodal sample sizes)
3.7 AUC vs. multimodal sample size for the breast cancer dataset (each box shows AUC values for different single modal feature sets)
3.8 AUC vs. single modal feature set size for the breast cancer dataset (each box shows AUC values for different multimodal sample sizes)
3.9 AUC for the multimodal classification task obtained with different strategies for handling the missing data issue, prostate cancer dataset
4.1 Extracting tree-based feature transforms from the scandent tree model
4.2 Diagram of the proposed method for training the "multimodal feature transform" forest
4.3 AUC vs. single modal sample size for the dermatology dataset
4.4 AUC vs. single modal feature set size for the breast cancer dataset (each box shows AUC values for different multimodal sample sizes)
4.5 Diagram of the method used for forming the PC forest
4.6 Diagram of the method used for forming a forest based on single modal tree-based feature transforms
4.7 ROC curve for NL vs. progressive MCI classification, single modal classification task, ADNI dataset
4.8 ROC curve for stable MCI vs. AD classification, single modal classification task, ADNI dataset
4.9 ROC curve for stable MCI vs. progressive MCI classification, single modal classification task, ADNI dataset

Acknowledgements

Funding from the Canadian Institutes of Health Research (CIHR, Operating Grant) and the Natural Sciences and Engineering Research Council of Canada (Discovery Grant) is acknowledged. Prostate imaging and genomic data were obtained at Vancouver General Hospital and UBC Hospital with approval from the Clinical Research Ethics Board and informed patient consent. I would like to acknowledge Drs. Peter Black, Larry Goldenberg, Piotr Kozlowski, Jennifer Locke, Silvia Chang, Edward C. Jones, Ladan Fazli, all from UBC/VGH; Dr. Elai Davicioni, Christine Buerki, Heesun Shin, and Zaid Haddad from GenomeDx Biosciences Inc.

ADNI Disclosure: Data collection and sharing for parts of this work was funded by the Alzheimer's Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie; Alzheimer's Association; Alzheimer's Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics.
The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer's Disease Cooperative Study at the University of California, San Diego. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.

Dedication

First and foremost, I want to dedicate this thesis to my parents and my sisters for being the most supportive family one could possibly have. Thank you for supporting me with love and believing in me all through my life.

I also want to thank my supervisor, Dr. Mehdi Moradi, for his kind supervision throughout this study. During the past two years, his kind comments, his guidance and his support kept me motivated. Thank you for your trust and your patience.

Finally, I want to thank all my friends who made my life away from home more tolerable. Thank you for your continuous help and encouragement, when I needed it the most.

Chapter 1: Introduction

1.1 Motivation

In recent years there has been a surge of interest in multimodal data analysis. Different modalities provide researchers with complementary information about diseases and provide the means for more accurate detection and staging. This can be valuable in the case of progressive illnesses such as Alzheimer's disease and certain kinds of cancer. Simultaneous analysis of multiple modalities could also help us discover novel relations between different modalities, such as understanding the relationship between the molecular changes caused by a disease and its imaging signature when both genetics and imaging data are available. Given these potential advantages, there has been a trend of merging different modalities in biomedical studies. For instance, the Alzheimer's Disease Neuroimaging Initiative (ADNI), a six-year $65 million study, has focused on using medical imaging modalities like Magnetic Resonance Imaging (MRI) and Positron Emission Tomography (PET) together with genetics and other clinical biomarkers to gain a better understanding of Alzheimer's Disease and its progression.

Acquiring multimodal data is generally more costly and time consuming than acquiring a single modality. As a result, multimodal datasets usually have valuable features, but a small set of samples with all features. This makes it difficult to build classifiers with large training data for highly multimodal protocols. For instance, in the case of the ADNI dataset, nearly half of the patients are missing the PET data. PET imaging is expensive and requires the use of radioactive tracers. As a result, a large number of patients only receive MRI scans, despite the fact that PET imaging provides unique brain functional information through quantification of cerebral blood flow, metabolism, and receptor binding, which are not measured with MRI. This is a common scenario in dealing with multimodal data. So designing a computational model that can be trained on both MRI and PET data (multimodal data), but be deployed in clinical settings where only MRI (single modal data) is available, is a valuable contribution in this area, provided that the model outperforms one that is solely trained on MRI data.
Another common scenario is the case of a new multimodal research protocol that includes at least one component which is only obtained in the course of the study itself. An example of this scenario is our study to understand the relationship between the molecular signature of prostate cancer and the imaging signature of the disease obtained through multiparametric MRI (mpMRI). The hope is that simultaneous analysis of the molecular and imaging data can provide clues towards building a reliable and affordable clinical staging test. Since prostate cancer is a multifocal disease with tumors at different stages in each focus, this study requires tissue samples for molecular analysis that are obtained from a specific area with known spatial registration to the MRI images. The steps taken to acquire this data are not part of the clinical routine and the pace of data acquisition is slow. On the contrary, we have access to hundreds of samples with only mpMRI data and known histology. In this scenario, we are building a computational model that would be studied on the mpMRI+genomics (multimodal) data. Here, we may benefit from a computational framework that can utilize the rather large single modal dataset during training, but is able to handle the multimodal data at the testing stage.

In this thesis we present solutions, within the context of the decision tree/forest paradigm of learning, to address the problems posed in the two scenarios described above. Recent relevant work includes an investigation of the applications of imputation methods for dealing with missing values in the ADNI dataset [6]. The results show that joining a multimodal dataset with a single modal dataset by imputation of the missing values improves the classification accuracy, compared to training a classifier on either the single modal or the available multimodal data. In our current work, we intend to go beyond the paradigm of imputation. This is due to the fact that multimodal studies do not necessarily hold the usual assumptions of imputation, namely that only a small number of data points are missing at random. We intend to deal with situations where blocks of data are missing together and the missing values are not spread randomly.

One trend in dealing with block-wise missing values in multimodal datasets is separately modeling different blocks of data and then joining the resulting models by using a merging classifier or an ensembling method. One of the most successful attempts in this field is applying multi-source learning techniques for dealing with block-wise missing data in ADNI [38, 39, 42]. The incomplete Multi-Source Feature learning method (iMSF) proposed by Yuan et al. models different blocks of data with similar feature sets as different tasks and learns a joint model by imposing a sparse learning regularisation on these tasks [42]. The authors also propose a different approach based on a model score completion scheme. This method is based on training independent classifiers on different blocks of data, and then using the prediction scores calculated by each classifier as a new representation of the data that can then be imputed using conventional imputation techniques. A recent paper by Yu et al. proposes a new method based on Multi-task Linear Programming Discriminant (MLPD) analysis [41].
This method formulates the problem as a multi-task learning scenario in a fashion similar to the iMSF method, but does not constrain all of the tasks to share the same set of features, allowing joint learning of a more flexible model.

A limitation of these studies is that the training and testing datasets are assumed to have the same distribution and feature sets. Recently, Cheng et al. addressed this issue and proposed a method for multimodal data analysis based on a multimodal manifold-regularized transfer learning method [9]. This method enables using data from different domains together with unlabeled data for multimodal classification. This work uses a kernel-based data fusion approach and includes a sparsity constraint in order to deal with the high dimensionality issue.

In this thesis we address the same limitations reported in [9], but with different assumptions that fit our scenarios. We do not assume that unlabeled data is available. We do assume that the feature set of the test data is a subset of that of the training data. For instance, in the case of the ADNI dataset, we assume that the training dataset consists of a set of samples with both MRI and PET data (although incomplete) but the test sample only consists of MRI data. This scenario is aimed at enabling the use of multimodal datasets for training of a classifier that requires only a subset of the modalities for testing.

Another important issue in multimodal classification is the high dimensionality that poses difficulties in feature selection and classifier building. The majority of the methods in the literature use the multi-kernel SVM framework for multimodal classification and need to impose sparsity conditions on the multimodal feature set in order to avoid over-fitting [9, 20, 43]. However, by working within the decision tree/forest paradigm we can benefit from its embedded way of dealing with high dimensional data through feature bagging [4]. Another motivation for the use of the decision forest paradigm is that it provides the ability to morph the treatment of missing data within the framework of learning, to maximize the classification performance. This area of work has seen significant contributions in recent years. These include the state of the art imputation methods embedded in the Classification And Regression Tree (CART) algorithm and the C5.0 algorithm for decision tree growth [28, 29], and in Random Forests (rfImpute) [4], which we discuss in more detail in the next chapter.

1.2 Objective

The objective of this thesis is twofold:

First, developing a classifier in the context of decision forests that benefits from a large single modal dataset and a small multimodal dataset at the time of training, but is tested on multimodal data. This is motivated by the work in the area of prostate cancer detection and staging.

Second, developing a method based on the decision forest classifier that can benefit from a multimodal dataset together with a single modal dataset at the time of training, but is tested on single modal data. This is motivated by the work in the area of Alzheimer's Disease detection and staging.

1.3 Contributions

This thesis reports two specific contributions:
• First, we introduce the concept of scandent trees, a novel forest-based method that can leverage one or more single modal datasets in order to enhance a multimodal forest. To our knowledge, this is the first decision-forest based algorithm specifically designed for this purpose. We provide results for different scenarios by simulating the missing value problem on publicly available benchmark datasets. We also compare the scandent tree method to different state of the art methods for missing value imputation on a prostate cancer dataset, which is a real-world example of the target scenario.

• Second, we develop the idea of scandent tree-based feature transforms to solve the problem of missing data in the single modal testing scenario. This problem has many clinical applications in areas where expensive research protocols meet the realities of clinical practice and high cost. Here, the assumption of a multimodal dataset with block-wise missing values remains; however, there is no multimodal assumption about the test set. Using the proposed approach on the ADNI dataset, we show that we can use MRI and PET data to train a classifier that only requires the MRI data for the prediction of different stages of Alzheimer's disease. We show that the inclusion of the PET data at the time of training results in improved classification accuracy, even though the test cases are not subjected to PET imaging. We also examine the proposed method in different scenarios by simulating the single modal and multimodal datasets on publicly available benchmark datasets. To our knowledge this is also the first method based on decision forests that has been designed specifically for this purpose.

1.4 Organization of the thesis

In this chapter, we discussed the advantages and limitations of multimodal studies and the importance of developing a new approach, based on decision forest classifiers, that leverages multimodal datasets with block-wise missing data for either single modal or multimodal classification tasks. The remainder of this thesis is organised as follows:

• In Chapter 2 we explain the basics of the state of the art methods for growing decision trees and decision forests. We then present a review of conventional methods for handling missing data, including general purpose imputation methods and the methods specific to tree-based classifiers.

• Chapter 3 describes the concept of scandent trees for multimodal classification and provides experimental and simulation results as proofs of concept.

• Chapter 4 introduces the tree-based feature transforms and applications of the scandent tree model for single modal classification. This chapter also provides simulation results together with experimental results.

• Finally, Chapter 5 provides the conclusions of the thesis and a discussion of the limitations of this study and the potential for future work.

Chapter 2: Background

2.1 Introduction

In order to gain a better understanding of the missing value handling problem in tree-based classifiers, we first review tree-based classifiers. In the first section of this chapter, two of the most well known algorithms for growing decision trees, the Classification And Regression Trees (CART) and C5.0 algorithms, are introduced and explained in detail. Then an introduction to the concept of random forests and their theory of operation is presented. In a separate section, the challenges present in handling the problem of missing values in a general context are discussed in detail and a brief introduction to different general approaches to handling data with missing values is provided.
In the final section of this chapter, we investigate the state of the art imputation methods which are specifically designed for tree-based classifiers, with detailed information about three embedded imputation methods: those of CART, C5.0 and random forests (rfImpute).

2.2 Decision trees and decision forests

2.2.1 Classification and regression decision trees

In this section the tree growth algorithm known as "Classification And Regression Trees", or in short the CART algorithm, is explained. This algorithm was first introduced by Breiman et al. in 1984 [5]. A CART tree is a binary tree that is grown by an iterative process of finding the binary split point that gives the maximum purity gain at each node and using this division point to split the node into two child nodes. Let Y be the dependent variable or outcome class, which can be ordinal categorical, nominal categorical or a continuous number. If Y is categorical with k classes, its class takes values in C = {1, 2, ..., k}. Let us also define F as the set of features describing the data. Each feature (predictor) can also be ordinal categorical, nominal categorical or continuous. Assuming this notation, the growing process of a CART tree can be explained as described below.

Tree growing process

As mentioned above, the CART algorithm is an iterative process applied to each node of the tree, starting from the root (the node containing all of the available samples). The aim of the algorithm is to find the best split, defined as the split of the data that results in the maximum purity of the child nodes. In the basic implementation of CART it is assumed that the splits are univariate, meaning that each split depends on the value of one and only one feature. If F is the set of all available features and f_i is a nominal categorical feature in F with k_i different categories, there exist 2^{k_i - 1} - 1 possible splits for this feature. If f_i is an ordinal categorical or continuous variable with k_i different values, there are k_i - 1 different possible splits for f_i. The iterative growth algorithm of CART for each node can be summarized as follows (a minimal sketch of this greedy search is given after the list):

• Find each predictor's best split. In the case of continuous or ordinal categorical features, first sort the samples by the given feature; then, going through each candidate split from smallest to largest, form the two child nodes and calculate the splitting criterion (purity function, defined later) for that split, and choose the split with the highest purity. For each nominal categorical feature, examine all the possible subsets of the categories to find the purest split.

• Find the node's best split. Among the splits selected for each feature in the previous step, select the one with maximum purity (the one that maximises the splitting criterion).

• Split the parent node into child nodes. Use the best split found in the previous step to divide the parent node into the child nodes.

• Iterate. Until the criterion for maximum depth of the tree is reached, apply the same algorithm to each child node.
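To make the greedy search concrete, the following minimal Python sketch (illustrative only, not the implementation used in this thesis) enumerates candidate univariate thresholds for each feature and keeps the split with the largest decrease in impurity. The impurity function is a simple placeholder (misclassification rate); the criteria defined in the next subsection can be substituted for it, and the function names are hypothetical.

```python
import numpy as np

def misclassification_impurity(y):
    """Placeholder impurity: one minus the proportion of the majority class."""
    _, counts = np.unique(y, return_counts=True)
    return 1.0 - counts.max() / counts.sum()

def best_univariate_split(X, y, impurity=misclassification_impurity):
    """Exhaustive greedy search for the univariate split that maximizes the
    decrease in impurity between the parent node and its two child nodes."""
    parent = impurity(y)
    best_feature, best_threshold, best_gain = None, None, 0.0
    for j in range(X.shape[1]):
        # candidate cut points: every observed value except the largest
        for threshold in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= threshold
            p_left = left.mean()
            gain = parent - p_left * impurity(y[left]) \
                          - (1.0 - p_left) * impurity(y[~left])
            if gain > best_gain:
                best_feature, best_threshold, best_gain = j, threshold, gain
    return best_feature, best_threshold, best_gain
```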
Splitting criteria and impurity measures

For any given node t and a given split s, the best split is the split that maximizes the splitting criterion \Delta i(s, t), which corresponds to a decrease in impurity i. For classification tasks (categorical Y), three splitting criteria are defined for the CART algorithm: Gini, Twoing, and ordered Twoing.

Let us define P(t) and P(j, t) as the probability that a sample belongs to node t and the probability that a sample in class j is in node t, respectively. We can estimate these probabilities by

    P(j, t) = \frac{\pi(j) N_{w,j}(t)}{N_{w,j}},    (2.1)

    P(t) = \sum_j P(j, t),    (2.2)

where \pi(j) is the prior probability of the outcome class j, and by definition

    P(j|t) = \frac{P(j, t)}{P(t)} = \frac{P(j, t)}{\sum_j P(j, t)},    (2.3)

and

    N_{w,j}(t) = \sum_{n \in h(t)} w_n f_n I(y_n = j),    (2.4)

in which w_n and f_n are the case weight and frequency weight associated with sample n, h(t) is the set of samples at node t, and I(j_1 = j_2) is the indicator function, equal to 1 when j_1 and j_2 are equal and 0 otherwise.

Gini criterion

The Gini impurity measure at node t is defined as

    i(t) = \sum_{i,j} C(i|j) P(j|t) P(i|t),    (2.5)

in which C(i|j) is the cost of classifying a sample to class i given that it belongs to class j, assuming that C(i|i) is equal to zero. Using the Gini impurity measure we can define one of the most well known splitting criteria, the Gini decrease of impurity:

    \Delta i = i(t) - P_L i(t_L) - P_R i(t_R),    (2.6)

in which P_L and P_R are the probabilities that a sample is sent to the left or the right child node, respectively, defined as

    P_L = \frac{P(t_L)}{P(t)}, \quad P_R = \frac{P(t_R)}{P(t)}.    (2.7)

It should be noted that if user-specified costs are involved, altered priors can be used instead of the empirical estimations. In this case the altered prior is defined as

    \pi'(j) = \frac{C(j) \pi(j)}{\sum_j C(j) \pi(j)},    (2.8)

in which

    C(j) = \sum_i C(j|i).    (2.9)

Twoing criterion

This criterion is actually a goodness measure rather than an impurity measure, so it should be maximised for each split. The Twoing criterion can be defined as

    \Delta i(s, t) = P_L P_R \left[ \sum_j |P(j|t_L) - P(j|t_R)| \right]^2.    (2.10)

Ordered Twoing criterion

In the case of ordinal categorical outcome classes, the ordered Twoing criterion is the purity measure of choice. The algorithm to calculate this measure is as follows:

• First separate the class set C = {1, ..., k} of Y into two complementary super-classes C_1 and C_2 such that C_1 is of the form C_1 = {1, 2, ..., k_1}, in which k_1 ∈ {1, 2, ..., k - 1}.

• Using the purity measure i(t) = P(C_1|t) P(C_2|t), find the split that maximises the Twoing criterion of Equation 2.10 on C_1.

• Find the super-class C_1 that results in the best split (maximum gain in the Twoing measure).

Continuous dependent variable

For continuous outcome variables (regression trees), a splitting criterion similar to Equation 2.6 can be used; however, the impurity measure is different. A usual choice is the Least Squares Deviation (LSD) measure, which can be described as

    i(t) = \frac{\sum_{n \in h(t)} w_n f_n (y_n - \bar{y}(t))^2}{\sum_{n \in h(t)} w_n f_n},    (2.11)

in which

    P_L = \frac{N_w(t_L)}{N_w(t)}, \quad P_R = \frac{N_w(t_R)}{N_w(t)},    (2.12)

and

    \bar{y}(t) = \frac{\sum_{n \in h(t)} w_n f_n y_n}{N_w(t)},    (2.13)

where

    N_w(t) = \sum_{n \in h(t)} w_n f_n.    (2.14)
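As a concrete companion to Equations 2.5-2.7 and 2.11, the sketch below (illustrative code, not part of the thesis implementation) computes the Gini impurity with unit misclassification costs, the LSD impurity, and the resulting decrease in impurity for a candidate split; the product of the case and frequency weights, w_n f_n, is treated as a single per-sample weight.

```python
import numpy as np

def gini_impurity(y, w=None):
    """Gini impurity with unit misclassification costs (Eq. 2.5):
    i(t) = 1 - sum_j P(j|t)^2, with optional per-sample weights w_n * f_n."""
    w = np.ones(len(y)) if w is None else w
    p = np.array([w[y == c].sum() for c in np.unique(y)]) / w.sum()
    return 1.0 - np.sum(p ** 2)

def lsd_impurity(y, w=None):
    """Least Squares Deviation impurity for regression nodes (Eq. 2.11):
    the weighted variance of the node outcomes around their weighted mean."""
    w = np.ones(len(y)) if w is None else w
    y_bar = np.sum(w * y) / w.sum()
    return np.sum(w * (y - y_bar) ** 2) / w.sum()

def impurity_decrease(y, left_mask, impurity=gini_impurity, w=None):
    """Decrease in impurity for a candidate split (Eqs. 2.6 and 2.7)."""
    w = np.ones(len(y)) if w is None else w
    p_left = w[left_mask].sum() / w.sum()
    return (impurity(y, w)
            - p_left * impurity(y[left_mask], w[left_mask])
            - (1.0 - p_left) * impurity(y[~left_mask], w[~left_mask]))
```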
Stopping rules

Stopping rules determine whether the growth algorithm should continue dividing each child node into two further nodes or should declare the current nodes as leaves. The stopping rules can be set based on the application, but the ones usually used in CART are as follows:

• If a node becomes completely pure, the tree growth stops. That is, if in classification trees all cases in a node belong to the same outcome class, or in regression trees all cases in the node have exactly the same value of the outcome variable, the node will not be split any further.

• If all samples in a node have exactly the same values for all of the features, the node will not be split any further.

• If the tree depth reaches the predefined maximum limit set by the user, the tree growth will stop.

• If the sample size at a node is less than the minimum threshold set by the user, the tree growth at this node will stop.

• If splitting the parent node would produce a child node whose sample size is less than the minimum threshold set by the user, the parent node will not be split.

• If, for the best split possible at node t, the splitting criterion \Delta I(t) is less than a user-specified minimum purity gain, the node will not be split.

2.2.2 C5.0 decision trees

The C5.0 algorithm is based on the Iterative Dichotomiser 3 (ID3) algorithm, first introduced by Ross Quinlan in 1986 [27]. Similar to the CART algorithm, this algorithm is based on iterative splitting of the sample space into smaller nodes. However, unlike the CART algorithm, the first versions of the ID3 algorithm only supported categorical features. Later improvements resulted in the C4.5 algorithm, which is the most well-known variation of the ID3 algorithm and, in addition to supporting continuous variables, has several advantages over its ancestor, ID3. In this subsection we start with a brief explanation of the ID3 algorithm, then we introduce the C4.5 algorithm, and finally the latest version of this family of algorithms, C5.0, is explained.

ID3 algorithm

Similar to the CART algorithm, the Iterative Dichotomiser 3 (ID3) algorithm grows a tree using a top-down greedy search through all the possible splits of the training data for each feature at each node. However, it uses information gain as the measure of goodness of each split. Information and entropy are measures from information theory that can be used directly as impurity measures for tree growth algorithms. In information theory, the entropy of a sample set S that consists of c different classes is defined as

    Entropy(S) = \sum_{i=1}^{c} -p_i \log_2(p_i),    (2.15)

in which p_i is the probability that a sample s belongs to class i, and can be estimated by the proportion of samples in class i relative to the whole population. Given that the logarithm in the above equation is in base 2, the unit of entropy is bits. In this equation, if the probability of a class is very small (p_i ≈ 0) or very large (p_i ≈ 1), its contribution to the entropy is very small, so the entropy measure gives smaller values for pure subsets of samples; in other words, the more uniform the probability distribution between classes, the larger the entropy. If we define information simply as "lack of entropy", information gain can be defined as a splitting criterion that uses entropy as the impurity measure:

    \Delta I(S, F) = Entropy(S) - \sum_{f_i \in F} \frac{|S_{f_i}|}{|S|} Entropy(S_{f_i}),    (2.16)

in which the sum over f_i runs over the values taken by the tested feature F, and S_{f_i} is the subset of samples in S for which F takes the value f_i.
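The following short sketch (illustrative, with hypothetical function names) computes the entropy of Equation 2.15 and the information gain of Equation 2.16 for a categorical attribute.

```python
import numpy as np

def entropy(y):
    """Entropy of a label vector in bits (Eq. 2.15)."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(x, y):
    """Information gain of splitting on a categorical attribute x (Eq. 2.16):
    parent entropy minus the size-weighted entropy of each value's subset."""
    gain = entropy(y)
    for value in np.unique(x):
        subset = (x == value)
        gain -= subset.mean() * entropy(y[subset])
    return gain

# Example: an attribute that perfectly separates two classes gains 1 bit.
x = np.array(["a", "a", "b", "b"])
y = np.array([0, 0, 1, 1])
print(information_gain(x, y))   # -> 1.0
```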
Assuming this function as the splitting criterion, the algorithm to grow an ID3 tree is as follows:

• Given the samples in node t, calculate the splitting criterion (information gain) for all the features,

• Select the feature with the largest information gain (f_t) and the corresponding splitting point as the optimum division point,

• Use the optimum division point to split the samples at node t into two child nodes,

• If none of the stopping rules apply, continue growing the tree on the child nodes using the remaining features.

The minimum set of stopping rules for the ID3 algorithm is:

• All the cases in one node are from the same class (entropy = 0),

• The information gain is 0 or smaller than a pre-defined threshold,

• The number of remaining features is 0.

It should be noted that other limits, similar to the ones explained for the CART algorithm, can be put on the number of samples at each node or the total depth of the tree in order to penalize over-fitting.

C4.5 algorithm

C4.5 is an improved version of the ID3 algorithm with several major improvements:

• It can handle continuous features as well as categorical features. C4.5 extends the ID3 algorithm by putting a threshold on each continuous feature and calculating the information gain based on the selected threshold,

• It has an embedded method for handling missing values (this method will be explained in detail later),

• It can incorporate user-defined weights for the importance or cost of the features,

• It has an embedded method for pruning the tree.

More information about this method can be found in a recent paper by Quinlan [28].

C5.0 algorithm

The C5.0 algorithm is the latest version of this family of tree growth algorithms and has several advantages over the previous implementations, including:

• A more computationally efficient implementation in comparison to C4.5, resulting in faster performance,

• A more memory efficient implementation,

• Smaller trees in comparison to previous versions, which usually means a smaller probability of over-fitting,

• Support for boosting. The C5.0 algorithm uses a process similar to AdaBoost [15]. In this process, first a conventional C5.0 tree is grown; then the weights for each sample are calculated, and subsequent iterations are used to build weighted trees and rule-sets. These rule-sets and trees are then used to generate class probabilities, and the average of these class probabilities is reported as the final prediction (an illustrative reweighting loop in this spirit is sketched after this list),

• Support for different weightings of samples and different misclassification cost measures,

• Support for winnowing, an embedded feature selection method that is particularly useful if the number of features is large but the sample size is relatively small. This process works as follows: first, the samples are randomly split in half and one conventional C5.0 tree is grown using the first half of the data. The effect of removing each feature on the performance of this tree is determined using the other half of the data. If removing a feature does not increase the total error rate, that feature is removed from the set of features used for growth of the final tree.
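C5.0's boosting procedure is proprietary and differs in its details; the sketch below is only an AdaBoost-style illustration of the reweighting idea described in the boosting item above, using scikit-learn decision trees as stand-in base learners and assuming binary 0/1 labels. It is not the C5.0 algorithm itself.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boosted_trees(X, y, n_rounds=10, max_depth=3):
    """AdaBoost-style loop: each round fits a tree on the reweighted sample,
    then increases the weights of the cases that tree misclassified."""
    w = np.full(len(y), 1.0 / len(y))
    trees, alphas = [], []
    for _ in range(n_rounds):
        tree = DecisionTreeClassifier(max_depth=max_depth)
        tree.fit(X, y, sample_weight=w)
        miss = tree.predict(X) != y
        err = np.clip(np.sum(w * miss) / np.sum(w), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1.0 - err) / err)   # vote weight of this tree
        w *= np.exp(alpha * miss)                 # up-weight misclassified cases
        w /= w.sum()
        trees.append(tree)
        alphas.append(alpha)
    return trees, alphas

def boosted_predict(trees, alphas, X):
    """Weighted vote of the boosted trees (labels assumed to be 0/1)."""
    votes = sum(a * (2 * t.predict(X) - 1) for t, a in zip(trees, alphas))
    return (votes > 0).astype(int)
```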
2.2.3 Decision forests

Decision trees offer a fast and easy-to-interpret classifier for simple classification tasks, but they generally result in weak classifiers that easily over-fit, especially if the sample size is small. An idea to increase the classification power of decision trees while avoiding over-fitting at the same time is to ensemble decision trees and form a decision forest. Ensembling is known as a very effective technique that can improve the performance of any single classifier, and in a similar way, an ensemble of decision trees as a forest is expected to outperform a single decision tree. The idea of decision forests became popular after a paper by Ho et al. in 1995. In [18] they show that if the trees of a forest are trained on sets of randomly selected features, the accuracy of the forest grows with the number of trees in the forest. This observation, that forests become more and more accurate as the model grows (gets more complex), is in direct contrast to the common belief that the complexity of a classifier can only be increased to a certain point before accuracy is reduced due to over-fitting. The key to this unique advantage of random forests over other classifiers is in randomising the base classifiers to obtain an ensemble of independent estimators of the outcome label. More detail on the importance of randomisation in random forests can be found in a paper by Kleinberg et al. [22].

The current state of the art decision forest method is based on the algorithm proposed by Breiman et al. [4]. In this work Breiman uses two main tricks in order to ensure a fully randomised forest: first, randomly selecting a "bag" of samples for each tree, introduced for the first time by Breiman et al., and second, randomly sampling a subset of features for each tree, introduced by Ho et al. in [19]. Breiman also introduced methods for calculating a feature importance measure based on a forest, for calculating a distance measure between samples, and for calculating error rates based on out-of-bag samples. In the original implementation of the random forest by Breiman, the decision trees are grown using an algorithm similar to the CART algorithm, and the randomisation is introduced to each tree through the bagging and randomised feature selection methods. There exist other implementations of random forests that put more emphasis on the randomisation of the base classifiers. For instance, an idea introduced by Dietterich et al. in [12] is to ignore the optimum division step in the CART algorithm and choose a random split for the data at each node of each tree. This results in a faster classifier, but because it also results in weaker base trees, it may limit the performance of the final forest.

Algorithm

Decision forests use the general technique of bagging, or bootstrap aggregation with replacement, to re-sample n_b bags, each one with the same size as the original training set. For instance, if the training set consists of n samples S = {s_1, s_2, ..., s_n} described by k features F = {f_1, f_2, ..., f_k} and an outcome class C = {c_1, c_2, ..., c_n}, each bag of samples will also consist of n samples randomly chosen (with replacement) from the same sample set. We hereby name these samples S_b; they are described by the same set of features and the corresponding outcome classes (C_b). After bagging the samples, each bag is used to train a decision tree that is grown solely based on S_b and uses F to predict C_b. In the testing phase, the predictions of these trees can be merged either by majority voting or by averaging the probabilities using the following equation:

    P(C) = \frac{1}{n_b} \sum_{b \in B} \hat{p}(S_b, C),    (2.17)

in which P(C) is the probability of class C predicted by the whole forest and \hat{p}(S_b, C) is the probability of class C predicted by the tree trained on bag b.
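A minimal sketch of the bagging-and-averaging procedure follows (illustrative only). It draws one random feature subset per tree, matching the description above, whereas some implementations instead re-sample features at every node; it also assumes every bootstrap sample contains all classes so that the per-tree probability columns align, and it uses scikit-learn trees as the base learners.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_forest(X, y, n_trees=100, seed=0):
    """Grow n_trees trees, each on a bootstrap sample S_b of the training set
    and a random feature subset of size about sqrt(k)."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    m = max(1, int(np.sqrt(k)))
    forest = []
    for _ in range(n_trees):
        rows = rng.integers(0, n, size=n)             # bag: sample with replacement
        cols = rng.choice(k, size=m, replace=False)   # random feature subset
        tree = DecisionTreeClassifier().fit(X[np.ix_(rows, cols)], y[rows])
        forest.append((tree, cols))
    return forest

def forest_proba(forest, X):
    """Average the per-tree class probabilities, as in Eq. 2.17."""
    return np.mean([tree.predict_proba(X[:, cols]) for tree, cols in forest], axis=0)
```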
If grown too deep, each decision tree will over-fit to the training data and result in a low-bias but high-variance classifier. The bootstrap and ensembling trick does not have any effect on the bias, but it reduces the variance of the final classifier as the number of trees grows. However, this holds under the assumption that the resulting decision trees are uncorrelated; otherwise the trees will be very similar, and averaging similar trees does not have a significant effect on the variance of the final classifier. The number of bags and corresponding trees (n_b) is a parameter ranging from a few hundred to several thousand trees, depending on the nature of the dataset and the sample size, and can be optimised by cross validation or by observing the out-of-bag error. The decrease in out-of-bag error tends to diminish once the number of trees in the forest becomes large. The random forest growth algorithm also randomly selects a subset of features (for classification, usually of size √k, in which k is the total number of features) to grow each tree of the forest. This process helps in building a larger number of uncorrelated trees in the random forest.

Variable importance measure

Breiman also introduces a novel way to measure the importance of each feature, embedded in the random forest. The first step in measuring the importance of a feature is to grow a random forest using the available training data. Then the out-of-bag error for each data point is calculated and averaged over the whole forest. Now, for each feature f, we permute the values of this feature across the whole data set (replace it with a randomly generated value in the same range) and re-calculate and average the out-of-bag error. The importance of each feature is the difference in the total out-of-bag error before and after permutation, normalised by the standard deviation of these differences. The features that result in larger differences are ranked as the most important features. This method has a few drawbacks; for instance, it is biased to give higher importance to categorical variables with many different levels in comparison with features with fewer levels. Ways to overcome this problem include growing unbiased trees [30] or using partial permutations [1].
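The permutation idea can be sketched as follows (illustrative code only; Breiman's measure uses the out-of-bag samples of each tree and normalises by the standard deviation of the differences, whereas this simplified version scores an already-fitted model on a held-out set).

```python
import numpy as np

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Importance of each feature as the average drop in accuracy after
    randomly permuting that feature's column, all other columns untouched."""
    rng = np.random.default_rng(seed)
    baseline = np.mean(model.predict(X) == y)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])   # permute feature j only
            drops.append(baseline - np.mean(model.predict(X_perm) == y))
        importances[j] = np.mean(drops)
    return importances
```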
2.3 Handling missing values

Missing data is a well known problem that, if not handled correctly, can significantly affect the accuracy of any statistical inference performed on a dataset. The problem might be caused by human factors during the data acquisition stage, for instance patients who drop out of the study before data acquisition is complete. Or it can be a naturally possible state of the target variable, for instance the age of a patient's spouse in the case of single patients. To decide how to handle missing data, it is helpful to first review the usual assumptions about missing data. Missing value scenarios can be divided into four general types or classes:

• Missing completely at random. A variable is missing completely at random if the probability of a data point being missing is the same for all samples, for example if the decision that each patient should or should not undergo a specific clinical exam is taken by generating a random number or rolling a die. If data is missing completely at random, then throwing out cases with missing data does not bias the statistical inference.

• Missing at random. Most of the time data points are not missing completely at random. For instance, the decision that a patient is required to undergo additional examinations might be taken based on his preliminary clinical test results. The general assumption in this scenario is that the probability that a variable is missing depends only on the available information. Thus, if the preliminary data for a patient consists of age, sex and race, then it is assumed that the probability that each patient will undergo more examinations is solely dependent on these fully available parameters. In this case it is reasonable to assume a model for this process; one example is the assumption of a logistic regression model, where the outcome variable equals 1 for observed cases and 0 for missing ones. In this scenario, any data point can be removed from the study as long as it does not affect the assumed model for the probability of the data point being missing.

• Missing that depends on missing parameters. In this scenario, not only is the data not missing at random, the probability that each data point is missing depends on variables that are also missing. An example of this scenario is when the probability that a cancer patient is sent for an MRI scan depends on ultrasound pre-screening results that are not available at the analysis time. Another familiar example from medical studies is that if a particular experiment causes discomfort, a patient is more likely to drop out of the study; these data points are not missing at random (unless discomfort is measured and observed for all patients). If the data points are not missing at random, they must be explicitly modeled, otherwise adding bias to the statistical inference is inevitable.

• Missing that depends on the missing value itself. This scenario makes handling missing values difficult, not only because the data points are not missing at random, but also because the probability of missing a data point depends on the value that might be missing. For instance, in the case of heart disease patients, if the blood pressure of a patient is recorded only when it is outside of the normal range, we are dealing with missing values that also determine the probability of their being missing. Another example is censoring of data: in financial surveys, people with very high earnings may be less likely to report their actual salary. In the extreme case (for instance, if all participants earning more than $100k a year refuse to report their earnings), a large part of the dataset will be missing and the probability of the income variable being missing depends on the income itself.

It is true that when data is not missing at random, especially when it depends on the missing values themselves, it is hard to compensate for the bias introduced into the inference algorithm. However, this bias can be mitigated by using the available variables. For instance, available features like age, sex or preliminary clinical data can be used to guess whether a patient's blood pressure would be out of the standard range and, if so, whether it is higher or lower than average.
Similarly, in the case of financial surveys, it can be assumed that if a person has higher education and is above a certain age, he or she will have a higher income, and we can use that information to compensate for the relative bias in inference. These methods cannot yield a good estimate of the missing values, but they can certainly help in achieving a better model of the blanks in the data.

It should be mentioned that it is in general impossible to prove whether data is missing at random or not. If it is, the process is simple, because we can model the probability of data being missing based on the available features; if the data is not missing at random, we try to add as many related parameters as possible to the model so that the missing at random assumption becomes reasonable. For instance, it is true that the probability of missing the blood pressure depends on the blood pressure itself, but because blood pressure and heart rate are indirectly related, adding the heart rate to the model can help in modeling the probability of the blood pressure values being missing. This approach helps in reducing the bias caused by removing data points from a study. The next step is to fill in the missing data points, or to remove them in a way that has the least negative effect on the final classifier used on the completed dataset. This is the basis of the imputation methods introduced in the literature. We first investigate the general imputation methods that do not assume any specific model in the data other than the missing at random assumption. We also introduce the regression based methods that treat each missing variable as a regression target, for instance for linear regression. These methods can model complex relationships in the data, but they are still independent of the final classifier used on the imputed data, so they might not yield the best classification performance. We also investigate imputation methods that are proposed specifically for our target classifiers: decision trees and forests.

2.3.1 Data removal methods

The simplest and perhaps most common approach to dealing with missing data is simply ignoring the part of the data that is missing one or more of the features. This can be done in two general ways: removing columns (features) or rows (samples) from the data matrix.

Sample removal

This approach is most useful in cases where the number of samples with one or more missing values is small, or where a set of samples is missing a large set of features. This method may decrease the accuracy of the final classification because it reduces the total sample size, but it lets us train a more complex model because it preserves all of the features.

Feature removal

In contrast to the sample removal approach, the feature removal approach is usually selected when almost all of the samples are missing a specific set of features. This method decreases the accuracy of the final classifier by removing some potentially useful features, but it is the simplest method that can preserve the sample size of the dataset.
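In a tabular setting, both removal strategies are one-liners; the snippet below is a toy illustration with a hypothetical pandas data frame.

```python
import pandas as pd

# Hypothetical data frame with missing entries.
df = pd.DataFrame({"age": [54, 61, 47],
                   "psa": [4.2, None, 7.9],
                   "gleason": [None, 7, 8]})

complete_samples = df.dropna(axis=0)   # sample removal: keep complete cases only
complete_features = df.dropna(axis=1)  # feature removal: drop any feature with a missing value
```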
2.3.2 General imputation methods

Zero

A simple approach to the imputation problem is to replace all the missing values with a constant value: zero. This approach originates from the natural assumption that, in the absence of an input signal to a sensor, the recorded value should be zero. This method sometimes significantly outperforms the data removal methods because it preserves both samples and features, but it obviously introduces a bias towards smaller values.

Random guess

Another approach is replacing the missing values with a random guess in the acceptable range of the missing value. This approach matches a scenario in which a data acquisition system produces white noise in the absence of an input signal. This method also makes use of all the data by filling in the missing values, and therefore may perform significantly better than data removal methods. As an advantage, this method guarantees that no unwanted correlation is added between different features; in other words, features that are statistically independent will stay statistically independent after the imputation. This is a property that the constant value replacement methods usually do not guarantee.

Replacement with mean

Another well known method that is frequently used for handling missing data in large datasets is replacement with the mean value. This approach originates from the statement that, without any other knowledge, the average of the feature over the whole population is the best estimate of the real value and yields the smallest average error. Assuming that the sampling population is balanced and the final classifier uses the deviation from the mean as an error measure (which is the common scenario in regression models), this method is the simplest method that introduces minimal bias into the dataset. In the case of a normalised dataset with mean value 0 and standard deviation 1, which is the usual scenario in many data analysis problems, this method is the same as zero imputation.

Replacement with median

For very large datasets, this method performs the same as the mean replacement method. However, in smaller datasets it results in a more robust solution which is less dependent on outliers.

KNN imputation

The mean and median replacement methods might result in a reasonable estimate of the expected value of the missing values over the whole population, but they do not necessarily give the best local estimate. The K-Nearest-Neighbors (KNN) method is based on replacing missing values locally with the mean or median of the neighboring data points selected via a distance function. Although the KNN method seems like a very simple and naive approach, it has some advantages compared to more advanced imputation methods. For instance, it can predict both discrete attributes (the most frequent value among the k nearest neighbours) and continuous attributes (the mean among the k nearest neighbours), so there is no need to create a predictive model for each attribute that has missing data. But as a major drawback, it has to compute the distance to all the samples in the dataset for each sample that needs to be imputed; in large datasets this becomes a major issue. Even with this drawback, the KNN method is used in many medical data analysis studies (for instance [3] and [21]) and many general data analysis applications as a simple and robust imputation method.
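The sketch below (illustrative only) implements mean imputation and a simple KNN imputation over a numeric matrix with NaN entries. Distances are computed over the features observed for the sample being imputed, any remaining NaNs in a donor are ignored, and it is assumed that each feature is observed for at least k samples.

```python
import numpy as np

def impute_mean(X):
    """Replace each missing entry with its column mean (median is analogous)."""
    X = X.copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X

def impute_knn(X, k=5):
    """Replace each missing entry with the mean of that feature over the k
    nearest donors, measured on the features observed for the target sample."""
    X_out = X.copy()
    for i in range(X.shape[0]):
        missing = np.isnan(X[i])
        if not missing.any():
            continue
        observed = ~missing
        # Euclidean distance to all samples over the columns observed for sample i;
        # NaNs in the other samples are simply ignored in the sum.
        diffs = X[:, observed] - X[i, observed]
        dist = np.sqrt(np.nansum(diffs ** 2, axis=1))
        dist[i] = np.inf                      # exclude the sample itself
        for j in np.where(missing)[0]:
            donors = np.where(~np.isnan(X[:, j]))[0]
            nearest = donors[np.argsort(dist[donors])[:k]]
            X_out[i, j] = X[nearest, j].mean()
    return X_out
```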
These methodsalso make use of all the samples and all the features in order to modeland predict missing values in each feature so they may be more effectivethan methods which only use local data for estimation of each missing value(like KNN). However, assuming a wrong model between features may haveunexpected effects on the final classification. For instance, using a linearmodel for regression may introduce correlations between different featuresthat did not previously exist in the data. The regression methods also cannot guarantee the same error rate for all the predictions because missingvalues in each feature are predicted using a potentially very different set ofsamples depending on the distribution of missing values among features andsamples. Although these weaknesses limit the power of regression methodsin many applications, these methods are among the most popular methodsused in literature especially in medical applications (for instance [8], [34],[2]).2.4 State of the art tree-based imputationmethodsThe imputation methods explained in the previous section can be used withany type of classifier including decision trees or random forests. But it shouldbe noted that the aim of imputation is not finding the best estimation ofthe missing values. It is finding a set of values that cause the minimumreduction in the accuracy of the final classifier. With this goal in mind, theimputation methods that are designed to work with a specific classifier, inour case the tree-based classifiers, are the best choice. The three most wellknown methods for this purpose are introduced in this section.2.4.1 CART embedded imputation method: surrogatedivisionsThe missing data problem in CART algorithm can be divided into twosmaller problems. First, finding the optimum division points and second,assigning samples to child nodes at each division. Let us re-examine the212.4. State of the art tree-based imputation methodsinformation gain equation used for finding the optimum division point:∆I = I(t)− p(tL)I(tL)− p(tR)I(tR) (2.18)In which I is the impurity function and P is the probability that a sample inthe parent node belongs to the corresponding child node. Considering thatI(t) is calculated on the parent node, the missing features do not have anyeffect on this term of the equation above. However, the left and right childnode impurities and the probability that a sample belongs to these nodesis affected by the missing features. In the CART algorithm for every givenfeature, tested for the impurity gain, these values are calculated using thesamples that are not missing that particular feature.Given that the division points are calculated using the explained method,the problem of assigning samples with missing features to each child noderemains unsolved. The method used in the CART algorithm for this purposeis “surrogate divisions”. This method proposes that for each optimal divisionwhich is found using the method explained above, we find a “surrogate”division that uses one of the other features to split the data in a similarway as the original optimal division. In other words, given a known divisionof the complete samples in the parent node, lets grow a tree of length onethat can split the samples into the same child nodes using a different feature.Suppose there are n predictors (x1, x2 . . . xn) included in the CART analysis,lets assume that there are missing values only for one of the features. Inthis case, x1 which happens to be the best predictor chosen to define theoptimal split. 
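As a simplified illustration of equation 2.18 and of the CART convention of evaluating a candidate split only on the samples that are not missing the tested feature, consider the following R sketch. It uses the Gini index as the impurity function I, assumes the candidate cut produces two non-empty children, and is not the rpart implementation itself.

    # Gini impurity of a vector of class labels
    gini <- function(y) {
      p <- table(y) / length(y)
      1 - sum(p^2)
    }

    # Impurity gain of splitting on 'feature <= cut', computed only on the
    # samples for which the candidate feature is observed (the CART convention).
    impurity_gain <- function(feature, y, cut) {
      keep <- !is.na(feature)
      f <- feature[keep]; yk <- y[keep]
      left <- f <= cut
      gini(yk) - mean(left) * gini(yk[left]) - mean(!left) * gini(yk[!left])
    }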
The split necessarily defines two categories for x1. This x1 feature now becomes a binary response variable that splits the data into two classes, the left and right nodes. Then a tree of depth one (a single split) is grown that uses x2, . . . , xn as potential splitting variables and x1 as the response variable. The next step is to rank the n − 1 possible predictors by the proportion of cases that are inevitably misclassified. The surrogate splits that do no better than the marginal distribution of the missing feature are ignored and removed from the list of surrogate divisions. The best split based on this ranking is then used to divide the samples with missing values into the child nodes. In other words, the class predicted by the surrogate split is used to divide the data in a manner similar to the original division when x1 is not available. If a sample is missing the optimum division feature (x1 in this example), we use the best surrogate division instead. If the best surrogate division is also missing, we use the second best, and so on. If none of the features are available, the sample is assigned to the child node to which the majority of samples have been directed. In the implementation of the CART algorithm in the R language (the rpart package [33]) there are three ways to deal with missing data:
• Display only. Samples with missing values are completely ignored and are not passed to deeper nodes in the tree.
• Use surrogates. Split subjects with missing values according to the surrogate divisions; if all of the surrogates are missing, ignore the observation.
• The same as the second option, but if all surrogates are missing, assign the samples with missing values to the child node with the majority of complete samples.
In practice, when a small portion of data is missing completely at random (MCAR) or missing at random (MAR), this method provides a robust and effective solution. However, if a large portion of the data is missing, consecutive surrogate divisions are very likely to completely misguide the decision tree. Another issue that is very common in scenarios with very small sample size is skewed data. As mentioned in [17], this problem can be made worse by this imputation method. These weaknesses motivate us to examine other methods for imputation of the missing values, for instance the embedded method of C5.0 decision trees.

2.4.2 C5.0 embedded imputation method

C5.0 is the new version of the C4.5 algorithm, which is one of the most well-known decision tree growth methods used today. Besides the basic differences between a C5.0 decision tree and a CART decision tree, the two algorithms use essentially the same criteria for finding the optimum division points. However, unlike the CART algorithm, which simply ignores the missing values in the calculation of the information gain in equation 2.18, the C5.0 method uses a modified version of the information gain equation, shown below:

∆I = ((N − N0) / N) ∆I(t − t0),    (2.19)

in which N is the total sample size, N0 is the number of samples that are missing the feature tested by the C5.0 algorithm, and ∆I(t − t0) is the impurity decrease computed as if only the samples that possess the respective feature were present. In simple words, the C5.0 algorithm calculates the impurity decrease in the same fashion as the CART algorithm does, but it assigns a smaller weight to attributes that are missing from a large portion of the data.
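Both mechanisms discussed in this section can be made concrete in a few lines of R. The first call shows how the three rpart options listed above map onto the usesurrogate parameter of rpart.control; the second is a one-line helper for the attribute weighting of equation 2.19. The formula, data frame, and function names are illustrative placeholders, not the exact code used in this thesis.

    library(rpart)

    # usesurrogate = 0: "display only" -- samples missing the primary split
    #                   variable are not sent further down the tree.
    # usesurrogate = 1: use surrogate splits in order; if all surrogates are
    #                   missing, the observation is not split.
    # usesurrogate = 2: as above, but if all surrogates are missing the sample
    #                   follows the majority direction (the package default).
    fit <- rpart(outcome ~ ., data = train_data, method = "class",
                 control = rpart.control(usesurrogate = 2, maxsurrogate = 5))

    # C5.0-style down-weighting of equation 2.19: the gain computed on the
    # complete samples is scaled by the fraction of samples that actually
    # possess the tested attribute.
    weighted_gain <- function(gain_complete, N, N0) (N - N0) / N * gain_complete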
As a second difference from the CART algorithm, C5.0 does not use surrogate variables to assign samples with missing values in a parent node to the corresponding child nodes. Instead, the samples are fragmented into fractional cases and then assigned to the corresponding child nodes. For instance, if child node i has Ni samples, the missing samples in child node i will have weights equal to Ni/(N − N0). These weights are then used to weight the class probabilities in the leaves and calculate the probability of each class at each leaf.
The method used to handle missing data during the training phase of the C5.0 algorithm is straightforward. However, handling the missing values in the testing phase is a different matter. Let P be the classification result of the test case Y using a C5.0 tree named T. There are three possible scenarios:
• If T is a leaf (a tree of depth 0), P is found by the relative frequency of training cases that belong to the leaf.
• If T is a tree of depth one or more and all the features used as division points in T are available for Y, P is found by the relative frequency of training cases that belong to the same leaf as Y.
• Otherwise, all the possible outcomes of the decision tree (all the leaves that Y might belong to) are explored and combined probabilistically, giving:

P = Σ_{i=1}^{k} (Ni / (N − N0)) Pi,    (2.20)

in which Pi is the probability of each class given that Y belongs to leaf i, N is the total training set sample size, Ni is the number of corresponding training samples at leaf i, and N0 is the number of training samples at leaf i with a missing value (each N might be fractional).
When the probability P for each class of the outcome is calculated, the class with the largest probability is chosen as the classification result. The C5.0 algorithm has a more sophisticated approach to the missing data problem. Nonetheless, the approach is designed for a single decision tree, and in comparison to a random forest, decision trees are very prone to over-fitting. In the next section, we investigate the embedded imputation method for random forests, rfImpute.

2.4.3 Decision forest embedded imputation method: rfImpute

There are two main methods for dealing with missing values embedded in the implementation of random forests by Breiman, et al. [4]. First is the rough-fix method. This is a simple and naive approach which uses a technique similar to the median imputation explained in the previous section to estimate the missing values for continuous variables, and assigns the majority class for the categorical variables.
The state of the art embedded imputation method for random forests is rfImpute. This method starts with the rough-fix method as an initial estimation of the missing values, grows a random forest based on the imputed dataset, calculates the proximity matrix based on the resulting forest, and then updates the missing values relative to the proximity matrix. This process is repeated with the new imputed values for a few iterations. In order to explain this method in more detail, let us first see how the proximity matrix is formed. The proximity matrix is an intrinsic measure of similarity between samples in a forest, based on the number of times that two samples land in the same leaf in each tree of the forest. The values in the proximity matrix are calculated as follows:
Given that all the samples are run down each tree of the forest, for each tree in the forest, add 1 to the proximity measure between i and j if they both land in the same leaf.
Then divide the whole matrix by the numberof the trees in the forest and set the main diagonal of the matrix to 1.This matrix is embedded within the random forest growth algorithm and asBreiman, et al., mentioned in [4], the values 1− prox(i, j) can be interpretedas squared distances in an Euclidean space of high dimension. Each row ofthis proximity matrix is then used in order to calculate an estimate of eachmissing value based on weighted average of the corresponding feature inother samples. For the categorical features, each class is weighted relative tothe proximity measures and the most probable class is chosen. This processis usually repeated for 5 to 6 iterations. Although the method explainedis iterative and therefore slow, it has proven to be an effective imputationmethod designed specifically for random forests.2.5 SummaryIn this section we provided a detailed review of the most common tree growthalgorithms, an overview of different types of missing data, and some generalpurpose imputation methods. Then we focused on the imputation methods252.5. Summarydesigned specifically for tree-based classifiers like decision trees and randomforests.In the work presented in this thesis, we select the state of the art imputa-tion method embedded in decision forests (rfImpute) as a natural choice fora baseline imputation method. In the multimodal test scenario of prostatecancer we compare the proposed method with two data-discarding methodsin which we simply drop one or the other dataset (the single modal forest andthe multimodal forest), two forest-based imputation methods (C5.0 forestand rfImpute) and two other general purpose imputation methods, namelyreplacing the missing values with zero, and replacing with the weighted av-erage value of the K nearest neighbors (KNN) [3].26Chapter 3Scandent tree: a forest basedmethod for multimodalclassification3.1 IntroductionMissing data is a well known problem in data analysis and machine learning,however missing data handling in a multimodal study is more challengingbecause it might not hold some of the basic assumptions in usual missingdata scenarios. For instance, the usual assumption in missing value handlingis that only a subset of features are randomly missing from a subset ofsamples (typically 10% to 30% of data). However, in multimodal scenariosit is common that a large set of samples is missing a large set of features,often belonging to the same modality.Incomplete multimodal datasets are common in biomedical experiments,where a usual scenario is to examine new protocols or new modalities andthe relationship between them. For instance, joint analysis of medical imag-ing modalities and genetic biomarkers is an attractive subject for biomed-ical research. However, since clinical analysis of both imaging and geneticbiomarkers is not common, it is hard for biomedical researchers to buildlarge datasets of this type in a time and cost effective manner. This makesdealing with very valuable but very small datasets a common scenario.On the other hand, some of the modalities used in a multimodal studymight be part of the standard clinical procedures separately. For instance,although it is impractical and expensive to do MRI and genetics analysis oneach and every patient that visits a clinic, the MRI data of a large numberof patients is easily accessible from medical imaging archives of hospitals. 
Sothe missing value problem in multimodal scenarios can usually be formulatedas merging a small valuable multimodal dataset with a large but single modaldataset for enhancement of a multimodal data analysis task.In this chapter we intend to introduce a new method named “scandent273.2. Methodtrees” that is based on decision forests and is specifically designed for thistask. In this chapter first we explain the concept and implementation of theproposed method in detail then evaluate the proposed method in differentsample sizes and feature set sizes by simulation of an incomplete multimodalscenario using publicly available datasets. Finally, we examine the proposedmethod on a real incomplete multimodal dataset, a joint analysis of MRIand genetics features for prostate cancer detection.3.2 Method3.2.1 Mathematical formulationLet us assume that the training data consists of at least one single modaldataset defined as S = (s1, . . . , sNs) (in which each si is a single modalsample) and at least one multimodal dataset defined as M = (m1, . . . ,mNm)(in which each mi is a multimodal sample). These two datasets are describedrespectively by the single modal feature set Fs and the multimodal featureset Fm, where Fs ⊂ Fm. We do not set conditions on the feature or samplesizes but in practical scenarios, usually the multimodal dataset has fewersamples (Nm < Ns). Also the single modal set is missing some of the morediscriminative features. In this section we aim to train a classifier using bothS and M that can predict the outcome class C, for any test data describedby Fm. In other words, we want to make use of a single-modal dataset foroptimisation of a multimodal decision forest.3.2.2 IntuitionAssuming the decision tree model for a classification task, two main sourcesof error can be imagined. First is the error caused by in-efficient partitioningof the sample space by the decision tree. And second, the error in estimationof the outcome class probabilities at each leaf. As an advantage of having allthe important features, decision trees formed by the multimodal dataset areexpected to partition the feature space very effectively. However, because ofthe low multimodal sample size, the estimation of the outcome probabilityat each leaf may not be accurate. The proposed method tries to reduce theprediction error at each leaf of the multimodal tree by using single modalsamples that are likely to belong to the same leaf.In order to find these single modal samples, a feature space partitioningalgorithm is needed that can simulate the feature space division of the targetmultimodal tree on the single modal dataset. The proposed method is to283.2. Methodgrow single modal trees (trees that only need the available feature set) thatmimic the feature space division structure of the multimodal decision tree(a tree that needs all of the features for classification). Although it can notbe guaranteed that such tree exists, we try to provide an estimation of thedivision boundaries of a multimodal tree by breaking it into smaller sub-trees and trying to estimate simple division boundaries of these trees usingthe set of available features.This technique is expected to be more effective than the imputationmethods because it eliminates the need to know the exact value of eachmissing feature and only relies on predicting the feature-space partition thateach sample belongs to. 
This can be a much easier task.By using such a technique we expect that we can use high-level rela-tionships between modalities in order to merge datasets. For instance, weknow that in usual scenarios the values of two different modalities like MRIand PET might not be predictable by each other. However, assuming thatthey are not statistically independent, the local trees might be able to avoidprediction of the exact values and instead translate the knowledge of themissing modality in form of given that a patient belongs to a partition ofthe PET feature space, what is the probability that the patient belongs toa partition in MRI feature space?. For instance, if a patient is similar toanother patient in PET space, is it likely that these patients are also similarin MRI space ?. We are trying to show that sometimes, the answer to thisquestion is enough for a more accurate classification and we wont need topredict exact values of a modality by the other one.Growing a tree that follows the structure of another tree from the rootto the top brings analogy to the behaviour of “scandent”trees in nature thatclimb a stronger “support” tree. Considering this analogy, the proposedmethod can be divided into three basic steps: First, division of the samplespace by a multimodal decision tree, called “the support tree”. Second,forming the single modal trees that mimic the structure of the support tree,called “the scandent trees”. And third, leaf level inference of outcome labelC, using the multimodal samples in each leaf and the single modal samplesthat are most likely to belong to the selected leaf.3.2.3 Support treeThe first step in the proposed method is growing a decision tree to predictthe outcome class based on the multimodal dataset. This tree can be oneof the trees in a decision forest or an individual tree grown using any of thewell known methods, such as C4.5 [28] and CART [33]. The method used293.2. Methodin this paper for growth of the support tree is based on the implementationof CART algorithm in the package “rpart” in R language [32].Assuming that the tree is grown and optimized using the multi-modaldataset M , there are two steps that might be the source of classificationerror in the tree: Division of sample space at inner branches, and majorityvoting at the leaves. The sample space division requires sufficient samplesize at each division point which becomes an issue as the tree gets deeper.However, ensembling within the forest paradigm compensates for occasionalincorrect divisions at inner branches, leaving majority voting at the leavesas the critical step to get a precise estimation of probability of the classlabel. This error can be compensated for by the scandent trees.3.2.4 Scandent treesThe second step is to form the scandent trees which enable the assignmentof single modal samples to the leaves of the support tree. The process offeature space division in the support tree can be considered as grouping themultimodal data set M to different multimodal subsets. Let us define thesubset of the samples of M in the ith node as Mi and the feature used forsample space division at node i as fi. For any arbitrary choice of node j, andit’s immediate parent node i, we define node j as a ’link node’ if fi belongsto a different feature set from fj , or if node j is either the root node or aleaf. 
In other words, node j is a link node if and only if:
• node j is the root node, or
• node j is a leaf node, or
• fj ∈ Fs and fi /∈ Fs, or
• fj /∈ Fs and fi ∈ Fs.
Intuitively, the link nodes are the nodes that mark the roots and leaves of the largest sub-trees that use only one modality for feature space partitioning. For each division node i in the set of link nodes of the support tree, there exists a set of nearest child link nodes and child leaves j1, j2, . . . , jki. We define Ti as an optimum tree that can divide the set of multimodal samples at node i (Mi) into the sets of multimodal samples at each child node (Mj) using the feature set Fs. The pseudo-code for forming a scandent tree is as follows:

    For each link node i in the support tree:
        For each sample n in Mi and each node j in the set of
        nearest child link nodes and child leaves of node i:
            if n ∈ Mj, set C′i,n = j
        Grow Ti as the optimum tree that, for each sample n in Mi,
        predicts C′i,n using only Fs.

The above algorithm forms local trees Ti for each node i that divide Mi into the child subsets Mj using only the single modal features Fs. Here C′ is a new categorical label-set defined for the corresponding local tree. For each sample in the parent node, C′ is assigned in such a way that the samples belonging to a specific child node j are mapped to the same category within C′.
For each node i, if fi ∈ Fs, then Ti is expected to divide Mi into the child subsets (Mj) with perfect accuracy. But if fi /∈ Fs, then Ti will be optimized to form the smallest tree that can divide the sample space in a manner similar to the support tree. Using the Ti's for feature space division at each node, we can form a new tree that consists of the same link nodes as the support tree but only uses the features of a single modality (Fs) for sample space division. We name this single modal tree a scandent tree. Since the Ti's are single modal trees, they can be used to predict the probability that each single modal sample s belongs to link node j, calculated by

p(s ∈ Nodej) = p(s ∈ Nodej | s ∈ Nodei) p(s ∈ Nodei),

in which Nodei is the parent link node of Nodej, the term p(s ∈ Nodej | s ∈ Nodei) is estimated by the corresponding sub-tree Ti, and p(s ∈ Nodei) is calculated by recursion.
This method is expected to be generally more accurate than direct estimation of the leaves by other single modal classifiers, because the scandent tree only has to predict the division boundaries for features that do not belong to Fs; the other divisions will be perfectly accurate.
Given the small multimodal sample size, the local trees could be prone to over-fitting if only the few samples in the corresponding link nodes were used for training the Ti's. We overcome this problem by using all of the available multimodal samples (M) for training of each local tree, by running the whole multimodal training set through the corresponding sub-tree of the support tree. This gives each sample in the multimodal dataset a label from the set C′. This method adds more multimodal samples to the parent link node (Mi) and each child link node (Mj), which results in a better estimation of Ti. We found that using this trick adds to the robustness of the scandent tree method. Figure 3.1 shows a simple diagram of the proposed method.

Figure 3.1: Diagram of the proposed method for growing the scandent trees

3.2.5 Leaf level inference

The standard method for leaf level inference is majority voting. However, if there are a large number of single modal samples misplaced by the scandent tree, they might flood the original multimodal samples.
The proposed method is weighted majority voting: we re-sample non-uniformly from each leaf i and then calculate the probability of the outcome C using the resampled data. We define the re-sampling probability of each sample x in leaf i as

    p_re-sample,i(x) =
        1/N,                  if x ∈ Mi,
        p(x ∈ Leaf i)/N,      if x /∈ Mi and p(x ∈ Leaf i) > q,
        0,                    if x /∈ Mi and p(x ∈ Leaf i) < q,

in which q is the selected minimum threshold for the probability that a single modal sample belongs to the selected leaf i, and N is the total number of samples in leaf i (single modal and multimodal). As the value of q increases, the probability that a misplaced sample is used in the leaf level inference is reduced. This may increase the accuracy of the majority voting, but increasing q will also reduce the number of single modal samples at each leaf, resulting in low precision of the probability estimation. This trade-off is more evident at the two ends of the spectrum: for q = 1 the tree will be the same as the support tree, which suffers from low sample size at the leaves; for q = 0 all the single modal samples will be used for inference at each leaf.
The optimization of the q parameter for each leaf is essential for optimal performance of the resulting tree. This can be done by cross validation over the multimodal dataset, using out-of-bag samples in the case of a decision forest. Using non-uniform re-sampling instead of majority voting ensures that the single modal samples at the leaves are randomized. This randomization is critical because the single modal samples are not randomly selected in the scandent tree growth algorithm, and without re-sampling there is a possibility that many of the scandent trees in the resulting forest are not independent. This would violate one of the basic requirements of tree ensembling in a decision forest.
Although the proposed algorithm is explained only for one single modal dataset, the same method can be applied to different single modal datasets using the same support tree. As a result, the proposed framework can be used flexibly when different subsets of features are missing.

3.2.6 Implementation

For building the support trees, we randomly bagged two-thirds of the multimodal samples and randomly selected the square root of the dimension of the multimodal feature set as the feature bag. This bootstrapping and bagging phase is done separately for each of the outcome classes to ensure balanced class labels. Then the scandent trees are formed, and for each leaf of each support tree in the forest the q parameter is optimised using the corresponding out-of-bag samples.
After growing and optimizing each of the trees, the probability of the outcome class C is calculated by averaging the corresponding probabilities of all trees in the forest. We use the R package "rpart" [32] both for growing each support tree and each of the local single modal trees (the Ti's). This package uses internal cross validation to form the optimal tree, but for the purpose of controlling the bias-variance trade-off of the resulting forest, the depth of the support tree is limited by controlling the minimum number of samples needed for each division.
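To summarise the construction described in Sections 3.2.4 and 3.2.5 in code, the simplified R sketch below shows the two core ingredients: fitting one local tree Ti with rpart to predict which child of a link node each multimodal sample falls into, and computing the leaf-level re-sampling weights for a chosen threshold q. All object names are placeholders and the sketch omits the forest-level bookkeeping; it is an illustration of the procedure, not the full implementation.

    library(rpart)

    # Fit one local tree T_i: 'child_label' is the categorical label C' that
    # records which child of link node i each multimodal sample falls into,
    # and 'single_modal' holds only the F_s features of the same samples.
    fit_local_tree <- function(single_modal, child_label) {
      rpart(child_label ~ ., method = "class",
            data = cbind(single_modal, child_label = factor(child_label)))
    }

    # Probability that a single modal sample reaches a given child node,
    # read off the class probabilities of the local tree ('child' is one
    # level of C').
    p_child <- function(local_tree, new_single_modal, child) {
      predict(local_tree, newdata = new_single_modal, type = "prob")[, child]
    }

    # Re-sampling weight of a sample x in leaf i (Section 3.2.5):
    #   1/N             if x is a multimodal sample of the leaf,
    #   p(x in leaf)/N  if x is single modal and p(x in leaf) > q,
    #   0               otherwise.
    resample_weight <- function(p_leaf, is_multimodal, N, q) {
      ifelse(is_multimodal, 1 / N,
             ifelse(p_leaf > q, p_leaf / N, 0))
    }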
The depth of Ti’s in each scandent tree is optimized by internal crossvalidation.3.3 EvaluationWe simulate the missing data scenario using three publicly available datasets.First is a dermatology dataset which is multimodal in nature, therefore weonly need to simulate the size of the single modal and multimodal datasets.Second is a heart disease dataset which is formed by subsets that are par-tially overlapping in terms of features. However, it does not match thedefinition of a multimodal dataset in the sense that the missing features donot belong to separate modalities. For this dataset we simulate modalitiesby intentional feature removal from one of the subsets. The third dataset isa breast cancer dataset which is neither multimodal nor multi-source, so weshould simulate both the modalities and the single modal and multimodalsub-sets by random sampling and feature removal. A brief description ofeach dataset and evaluation method is presented below.3.3.1 Evaluation using benchmark datasetsDermatology datasetIn order to test the performance of the proposed method for different sam-ple sizes we simulate an incomplete multimodal dataset by discarding a setof features from a complete multimodal dataset. Because the dermatologydataset is a complete multimodal dataset, we have the opportunity to sim-ulate the target incomplete multimodal scenario by intentionally removingone of the modalities from a set of randomly selected samples.This set is a natural example of a multimodal dataset publicly availablethrough the University of California Irvine (UCI) database [23]. This datasetconsists of two distinct feature sets, an easily accessible feature set obtainedduring clinic visit of each patient (for instance, age of each patient) anda harder to access feature set that is acquired by further histopathologicaltests in a laboratory (eg. Melanin incontinence observed in skin samples).The total sample size of this dataset is 357, the clinical feature set size is 12,while the size of the histopathological feature set obtained in the laboratoryis 22. The outcome class is the diagnosis of one of six dermatology diseases.For this simulation we examine the classification task of one disease class(Seborrheic Dermatitis) vs other classes. We examine the performance of theproposed method for different multimodal sample sizes by assuming that the343.3. Evaluationhistopathological features are missing from a subset of samples. We changethe size of the multimodal sub-set (the set with no missing features) andreport Area Under Curve (AUC) as a function of the multimodal samplesize in comparison to the state of the art imputation method for randomforests (rfImpute). A list of clinical and histopathological features used inthis study is shown in the Table 3.1. 
The outcome class is selected fromthe dermatology diseases shown in the same table.Table 3.1: List of features and the outcome classes, dermatology datasetHistopathological features clinical features dermatology diseasesMelanin incontinence Erythema PsoriasisEosinophils in the infiltrate Scaling Seboreic dermatitisPNL infiltrate Definite borders Lichen planusFibrosis of the papillary dermis Itching Pityriasis roseaExocytosis Koebner phenomenon Cronic dermatitisAcanthosis Polygonal papules Pityriasis rubra pilarisHyperkeratosis Follicular papulesParakeratosis Oral mucosal involvementClubbing of the rete ridges Knee and elbow involvementElongation of the rete ridges Scalp involvementThinning of the suprapapillary epidermis Family historySpongiform pustule AgeMunro microabcessFocal hypergranulosisDisappearance of the granular layerVacuolisation and damage of basal layerSpongiosisSaw-tooth appearance of retesFollicular horn plugPerifollicular parakeratosisInflammatory monoluclear inflitrateBand-like infiltrateHeart disease datasetThis set consists of data from two different studies reported in [11]. Thisdataset is a natural example of a complete dataset accompanied by a similarlarge dataset with non-random missing features. One set (data from theHungarian Institute of Cardiology) is missing two out of 14 features. Weuse this as the single modal dataset in our experiments while the completeset (the Cleveland dataset) is used as the multimodal dataset. In real worldproblems, such as our prostate cancer study, the single modal dataset ismissing some of the most discriminative features. To simulate this conditionwe used a classical random forest feature ranking approach. Moreover we353.3. Evaluationstudy the effect of decreasing the number of features in the single modaldataset on the overall performance by sweeping from 12 to two features,always removing the most discriminative ones. The multimodal datasetin this experiment (the Cleveland dataset) consists of 303 samples. 100samples were randomly separated and used as the test data. The remainingsamples were used as the multimodal data for training the support trees.We experimented with scenarios that included 10% to 90% of this data intraining of the support trees. More information about the features used inthis experiment can be found in Table 3.2Table 3.2: The feature set of the heart disease datasetFeatures/Variables Explanationage Age in yearssex sex (1 = male; 0 = female)cp chest pain type:Value 1: typical anginaValue 2: atypical anginaValue 3: non-anginal painValue 4: asymptomatictrestbps resting blood pressure (in mm Hg on admission to the hospital)chol serum cholestoral in mg/dlfbs fasting blood sugar > 120 mg/dl (1 = true; 0 = false)restecg resting electrocardiographic results:Value 0: normalValue 1: having ST-T wave abnormalityValue 2: showing left ventricular hypertrophy by Estes’ criteriathalach maximum heart rate achievedexang exercise induced angina (1 = yes; 0 = no)oldpeak ST depression induced by exercise relative to restslope the slope of the peak exercise ST segmentValue 1: upslopingValue 2: flatValue 3: downslopingca Number of major vessels (0-3) colored by flourocsopythal 3 = normal; 6 = fixed defect; 7 = reversible defectoutcome class Diagnosis of heart disease (Angiographic disease status)Value 0: < 50% diameter narrowingValue 1: > 50% diameter narrowingBreast cancer datasetThis is a complete set with 569 samples [37]. This dataset consists of 30features describing the nucleus properties of a breast cancer cell. 
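The "remove the most discriminative features first" protocol used in these simulations can be reproduced with the permutation importance of an ordinary random forest. The sketch below, with placeholder object names, ranks the features of the complete training set and drops the top k to form the simulated single modal feature set; it is one reasonable way to implement the ranking step, not necessarily the exact procedure used here.

    library(randomForest)

    # Rank the features of the complete (multimodal) training set.
    rf <- randomForest(x = train_features, y = factor(train_labels),
                       importance = TRUE)
    imp <- importance(rf, type = 1)                # mean decrease in accuracy
    ranking <- order(imp[, 1], decreasing = TRUE)  # most discriminative first

    # Simulate the single modal dataset by removing the k strongest features.
    k <- 2
    single_modal_features <- train_features[, -ranking[1:k], drop = FALSE]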
The sce-nario of multimodal and single modal datasets was simulated with sampling.363.3. EvaluationWe change the size of the single modal feature set and report the AUC asa function of the number of single modal features. In a fashion similar tothe heart disease experiment, we simulate the feature quality disadvantageof the single modal dataset by removing the top ranked features of thisdataset. More information about this feature set can be found in Table 3.3Table 3.3: The feature set of the breast cancer datasetFeatures/variables Explanationradius Mean of distances from center to points on the perimetertexture Standard deviation of gray-scale valuesperimeterareasmoothness Local variation in radius lengthscompactness perimeter2/area - 1concavity Severity of concave portions of the contourconcave points Number of concave portions of the contoursymmetryfractal dimension “coastline approximation” - 1outcome class Cancer diagnosis:B:BenignM:Malignant3.3.2 A real scenario: prostate cancer datasetThis set consists of a small genomics+MRI prostate cancer dataset (Nm =27) accompanied by a relatively large MRI only dataset (Ns = 428). Thesingle modal dataset consists of five multi-parametric MRI features fromdynamic contrast enhanced (DCE) MRI and diffusion MRI on a 3 Teslascanner. We used the apparent diffusion coefficient (ADC) and fractionalanisotropy (FA) from diffusion MRI, and three pharmacokinetic parametersfrom DCE MRI: volume transfer constant, ktrans, fractional volume of ex-travascular extracellular space, ve, and fractional plasma volume vp [16, 25].This data is from patients undergoing radical prostatectomy at Vancou-ver General Hospital and has been collected with informed consent, andwith the approval of the Research Ethics Board of the Vancouver GeneralHospital. Imaging is performed a week before the surgery. After the surgery,the prostate specimens were processed with wholemount cuts that matchedthe slices in the MRI scans. A cutting device and the procedure described in[13] ensured that the cuts matched the MRI slices. An experienced patholo-gist outlined the area of the tumor/normal from wholemount histopathologyslides ([16, 25]).373.3. EvaluationThe histopathology slide of each patient was then registered to a T2-weighted image and corresponding DTI and DCE-MRI slices ([16, 24, 25]).This registration lets us find the Region Of Interest (ROI) in the DTI andDCE MRI slices that co-respond to the tumor(please see Figure 3.2). Wetook the average of the quantitative MRI features (apparent diffusion coeffi-cient, ADC, fractional anisotropy, FA, volume transfer constant, ktrans, frac-tional volume of extravascular extracellular space, ve, and fractional plasmavolume, vp) as the MRI features for each ROI.Figure 3.2: Registration example: (a) T2-weighted , (b) DTI and (c) DCE-MRI slice. The green contour represents the boundaries of the prostategland. The red contour represents the mapped tumor ROI([16, 25]).The tissue samples were then obtained by needle biopsy from the corre-sponding formalin-fixed paraffin-embedded (FFPE) tissue blocks and RNAwas extracted and purified from these samples. The expression level of39 genes that form the most recent consensus on the genetic signature ofprostate cancer for patients with European ancestry as reported and main-tained by National Institute of Health [26] were used as features (please seeTable 3.4). 
Each of the selected genes was mapped to the closest probe location on an Affymetrix Exon micro-array, and the gene expression at each of these locations was selected as a genetic feature in our study. Figure 3.3 shows a heat-map of the gene expression data for all the patients in our study. The dendrograms shown in this figure show the clustering of samples and genes based on the gene expression.
We have 27 samples with gene expression data and registered imaging data (14 normal, 13 cancer) from 21 patients. The evaluation of the proposed method on this small dataset was carried out in a leave-one-out scheme. Each time, the support trees were trained using 26 samples, with all the single modal data samples and features used for forming the scandent trees.

Figure 3.3: Gene expression heat-map of the probes corresponding to the selected genes for each patient. Each row presents a sample. Each column presents a gene expression feature. The vertical dendrograms show clustering of samples. The horizontal dendrograms show clustering of features. Sample clustering correctly clusters each patient. This shows that the gene expression profiles are mostly patient-specific. Although all of the selected genes are known to be biomarkers of prostate cancer, neither correlations between features nor cancer-related patterns are visible.

Table 3.4: List of the genes used in the prostate cancer study
Probe ID   Gene name   Probe ID   Gene name
2376037    MDM4        3947604    BIK
4008427    NUDT11      3956433    CHEK2
2436826    KCNN3       2887633    BOD1
2562343    GGCX        3968303    SHROOM2
3128411    EBF2        2920619    ARMC2
3761737    ZNF652      3286921    08-Mar
3754797    HNF1B       2934521    SLC22A3
2852766    AMACR       2852742    AMACR
2731257    AFM         2652027    CLDN11
2736322    PDLIM5      2949901    NOTCH4
3127978    NKX3-1      3043264    JAZF1
2484970    EHBP1       3349660    HTR3B
2845829    TERT        3359180    TH
3739668    VPS53       3739679    VPS53
2738146    TET2        3014159    LMTK2
2536531    FARP2       3338060    MYEOV
3839538    KLK3        3049522    TNS3
2417390    CTBP2       2469157    GRHL1
3311417    CTBP2       2636483    SIDT1
3413787    TUBA1C

3.4 Simulation and experimental results

In this section we provide simulation and experimental results to verify the performance of the method explained in the previous sections. The experimental results are on a prostate cancer dataset, a real incomplete multimodal dataset that consists of a small multimodal subset with genetics and MRI and a larger single modal subset with only MRI.
This dataset is a perfect example of our target scenario, but because of the small sample size of the multimodal dataset, it is hard to verify that the proposed method outperforms the state of the art in imputation for random forests.
In order to evaluate the performance of the proposed method in different scenarios we also used three publicly available datasets. We simulated the missing value scenario for different sample sizes of the multimodal dataset and different feature set sizes of the single modal dataset. These datasets are real biomedical datasets and some are even multimodal in nature (for instance the dermatology dataset). However, we call the results on these datasets "simulation results" because these datasets are complete and the missing values are intentionally removed from each dataset to simulate different missing value scenarios.

3.4.1 Simulation results

Dermatology dataset

Figure 3.4 shows the AUC of the proposed method and the state of the art embedded imputation method of random forests (rfImpute) for different multimodal sample sizes. It can be seen that the proposed method outperforms the rfImpute method, especially at smaller sample sizes. For instance, when the sample size is as small as 51, simulations on this dataset resulted in an average AUC of 0.97 for the proposed method and an AUC of 0.92 for the rfImpute method. This is because at smaller sample sizes the imputation method is forced to predict a large number of missing values using a very limited set of available samples, and the bias introduced by the imputation in the training set may misguide the random forest classifier. The scandent tree method enhances the performance of the random forest leaf by leaf, using cross validation instead of trying to predict the missing values. Therefore, it is expected to be less vulnerable to misclassifications.

Figure 3.4: AUC vs multimodal sample size for the dermatology dataset (each box shows variation of AUC values for different randomised training and test sets)

Because the dermatology dataset is multimodal in nature, we cannot examine the performance of the proposed method when a larger number of features are missing. In the next section we use the heart disease dataset for this purpose.

Heart disease dataset

Figure 3.5 shows the AUC of the proposed method and the rfImpute method for different multimodal sample sizes. Each box in this figure shows AUC values for different single modal feature set sizes and a fixed multimodal dataset sample size. The expected upward trend in AUC vs. multimodal sample size is evident, and it can be seen that the proposed method outperforms the rfImpute method especially at smaller sample sizes. For example, when only 14 multimodal samples are available, the rfImpute method results in a mean AUC of 0.91 whereas the proposed method delivers an AUC of 0.94. As the number of multimodal samples increases to 112, the performances increase for rfImpute and the scandent tree to 0.96 and 0.97, respectively. In other words, the scandent tree approach has a clear advantage when the dataset with multimodal data is significantly smaller.
Figure 3.6 shows the AUC of the proposed method and the rfImpute method for different single modal feature set sizes. Each box shows changes of AUC for different sample sizes at a fixed feature set size in the single modal data.
Figure 3.5: AUC vs multimodal sample size for the heart disease dataset (each box shows AUC values for different single modal feature sets)

Smaller variances of the boxes for the proposed method, especially at smaller feature set sizes, show that the proposed method is on average less sensitive to the multimodal sample size, especially when the single modal dataset has a large number of missing features. For example, at a feature vector size of 2 for the single modal dataset, the performance of rfImpute varies from 0.88 to 0.98, whereas the scandent tree shows a performance range of 0.93 to 0.98. This stable behavior is due to the unique ability of the scandent trees to predict division points for missing features that only conditionally depend on the available features.

Figure 3.6: AUC vs single modal feature set size for the heart disease dataset (each box shows AUC values for different multimodal sample sizes)

Breast cancer dataset

Figure 3.7 shows the AUC for the proposed method and the state of the art imputation method of random forests as a function of multimodal sample size. Each box in this figure shows the AUC of the two methods for a fixed multimodal sample size and different single modal feature set sizes. Smaller variances of the boxes for the proposed method, especially at smaller sample sizes, show that the proposed method is on average less sensitive to the single modal feature set size. For instance, at a sample size of 393 for the multimodal dataset, the performance of rfImpute varies from 0.978 to 0.989 (more than 1%), whereas the scandent tree shows a performance range of 0.986 to 0.988 (less than 0.3%).

Figure 3.7: AUC vs multimodal sample size for the breast cancer dataset (each box shows AUC values for different single modal feature sets)

Figure 3.8: AUC vs single modal feature set size for the breast cancer dataset (each box shows AUC values for different multimodal sample sizes)

Figure 3.8 shows the AUC of the proposed method and the rfImpute method for different single modal feature set sizes. Each box shows changes of AUC for different sample sizes at a fixed feature set size in the single modal data. It can be seen that the proposed method outperforms the state of the art imputation method of random forests (rfImpute) when a large number of features are missing from the single modal dataset. It also has similar performance when the single modal feature set is almost complete.
Smaller variances of the boxes for the proposed method show that the proposed method is on average less sensitive to the multimodal sample size, especially when the single modal dataset has a large number of missing features.

3.4.2 Experimental results: prostate cancer dataset

Figure 3.9 shows the AUC obtained on this data, for detection of prostate cancer, for several experiments. From left to right, the bars show the distribution of AUC values for 1) a multimodal decision forest that simply ignores the existence of archival imaging data, 2) our proposed scandent tree approach that uses the archival data to improve the performance of a forest trained and tested on multimodal data, 3) the standard rfImpute method applied at the forest level to include the single modal data in training, 4) the standard C5.0 method applied at the tree level, 5) training and testing a tree using only the single modal features of the multimodal set, 6) KNN imputation, and 7) zeroing of the missing feature values.

Figure 3.9: AUC for the multimodal classification task obtained with different strategies for handling the missing data issue, prostate cancer dataset

It can be seen that the multimodal forest performs significantly better than the single modal forest, even though the sample size of the single modal dataset is significantly larger than that of the multimodal dataset. This suggests that the missing modality, in this case the genetic features, is far more discriminative than the shared modality, MRI. The imputation methods outperform a single modal forest, but they fail to outperform the multimodal forest. This shows that even the state of the art imputation methods may misguide the decision forest when a large portion of data is missing, to the extent that a simple imputation method like zero replacement outperforms the state of the art imputation approaches.
In order to measure the statistical significance of the difference between the AUC values of the different methods, we use one of the most well-known univariate tests, Student's t-test. The t-test measures the difference between two populations relative to their variances. This test is commonly used when the sample size is small and the variances of the two populations are unknown. The outcome of a t-test is the p-value: the probability of observing a result similar to the one observed, or more extreme, under the null hypothesis (in this case, that the AUC values of the two methods come from the same distribution). In our experiments we measure the significance of the results by comparing the AUC values resulting from 100 random runs of the different methods.
In the case of the proposed method, the scandent forest, the significant advantage over a single modal forest and over each of the imputation methods is evident. Moreover, the proposed method does not introduce bias into the prediction like the other imputation methods and, as a result, it outperforms both the multimodal forest and the single modal forest.
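Concretely, each of these comparisons amounts to a two-sample t-test on the AUC values collected over the 100 randomized runs; a minimal R sketch, with placeholder vectors, is shown below.

    # auc_scandent, auc_baseline: numeric vectors of AUC values from the
    # 100 randomized runs of the two methods being compared (placeholders).
    test <- t.test(auc_scandent, auc_baseline)
    test$p.value   # p-value of the two-sample t-test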
However, be-cause the shared modality is significantly less discriminative than the missingmodality, the improvement in performance is small (mean AUC of 94% forthe scandent forest and 93% AUC for the multimodal forest), although it isstatistically significant (a two sample t-test resulted in p <0.01).3.5 SummaryIn this section we introduced the novel concept of scandent trees, singlemodal trees that enable a conventional random forest trained on a smallmultimodal dataset to leverage a single modal dataset. Also a detailedexplanation of the proposed method and implementation was presented.We evaluated the proposed method using three publicly available datasetsby simulation of different incomplete multimodal scenarios. We also com-pared the proposed method with the state of the art imputation methods onprostate cancer dataset as a real incomplete multimodal dataset. Using thedermatology and heart disease datasets we showed that the proposed methodoutperforms the state of the art imputation method of random forests if themultimodal dataset (the block of the dataset that has all of the modalities)is very small in comparison with the whole dataset. Using the breast cancerdataset we showed that the proposed method outperforms the state of theart imputation method for random forests if a large number of features bemissing from the single modal dataset. Moreover, we showed that on allof the datasets, in comparison with rfImpute, the proposed method is ingeneral less sensitive to multimodal sample size and single modal feature setsize.The experimental results on the prostate cancer dataset show that the473.5. Summaryproposed method significantly outperforms the well known imputation meth-ods, even the state of the art embedded imputation methods of randomforests (rfImpute) or C5.0 trees. Because in this study the missing modality(genetic features) are far more discriminative than imaging features, a mul-timodal classifier which is equivalent of using the sample removal methodto handle the missing values was the best method among the conventionalimputation methods. The proposed method outperformed the multimodalclassifier. This improvement was small, but statistically significant.48Chapter 4Tree-based featuretransforms: applyingscandent tree model forsingle modal classification4.1 IntroductionIn the previous chapter we focused on a scenario in which a small but valu-able multimodal dataset could be merged with a large and easy to accessdataset in order to improve performance of a multimodal classifier. Wediscussed that this is a common scenario in biomedical data analysis lab-oratories or in data analysis scenarios that deal with very novel or specialmultimodal datasets. However, from clinical point of view the reverse prob-lem is more attractive: leveraging a multimodal dataset in order to enhancea classifier trained and tested on a single modality.One of the well-known applications of multimodal data analysis is ingrading of generative diseases like cancer or Alzheimer’s disease. Eachmodality provides us with unique information about the patient and is ableto compensate for weaknesses of other modalities. However, the fact thatmany modalities are very expensive or are simply not feasible to use forevery patient, limits the clinical applications of multimodal data analysis.In case of Alzheimer’s disease, the state of the art medical imagingmodalities used are MRI and PET scan. 
The information provided by aPET scan is more useful than MRI because it provide information aboutcerebral blood flow, metabolism, and receptor binding but PET imaging isexpensive and requires the use of radioactive tracers. As a result, a largenumber of patients only receive MRI scans. For example, in the Alzheimer’sDisease Neuroimaging Initiative (ADNI) study which is one of the largestmultimodal studies of Alzheimer’s disease worldwide, nearly half of the pa-tients are missing the PET data. So a very valuable contribution in this494.1. Introductionfield would be to use a multimodal dataset together with a single modaldataset in order to train a classifier that only needs one of the modalities.This means, training a classifier that uses MRI and PET data to train aclassifier that only needs MRI data for Alzheimer’s disease grading.In a fashion similar to the previous chapter, we benefit from the embed-ded way of decision forests for dealing with high dimensional data throughfeature bagging [4]. Another motivation for the use of decision forest paradigmis that it provides the ability to morph the treatment of missing data withinthe framework of learning to maximize the classification performance.In this chapter we focus on applications of tree-based feature maps inthe scandent tree model. A disadvantage of decision forests compared withSVM is the lack of an embedded framework for kernel-based feature trans-formation in the case of forests. Using multi-kernel approaches, researchershave devised solutions for incorporation of various modalities in the SVMcontext.Tree-based feature transforms have recently received some attention. Forexample, a recent work by Cao et al., [7] uses stacked decision forests. Thismethod is based on using the probability values estimated by trees in arandom forest as a feature vector, and using this feature vector for trainingof an enhanced decision forest, potentially together with the original featureset. Inspired by the applications of multi-kernel SVMs in multimodal dataanalysis, we apply this concept of tree-based feature maps, for multimodaldata analysis.In this chapter we intend to develop the idea of scandent tree-basedfeature transforms to solve the problem of missing data in the single modaltesting scenario. This problem has many clinical applications in areas whereexpensive research protocols meet the realities of clinical practice and highcost. Here, the assumption of a multimodal dataset with block wise missingvalues remains. However, there is no multimodal assumption about the testset. To solve this problem, we use the idea of tree-based feature transformsalong with the scandent tree. This combination allows us to use tree-basedfeature transforms built on one modality to transform the features from adifferent modality. Using this approach, we use MRI and PET data in theADNI dataset and train a classifier that only requires the MRI data forthe prediction of different stages of Alzheimer’s disease. We show that theinclusion of the PET data at the time of training results in an improvedclassification accuracy, even though the test cases are not subjected to PETimaging. In order to test the performance of the proposed classifier in differ-ent scenarios, we also simulate the missing target scenario by intentionallyremoving a set of features from two publicly available benchmark datasets,504.2. 
MethodThe dermatology dataset and The breast cancer dataset introduced in theprevious chapter.4.2 MethodTo obtain a forest that transfers the value of the multimodal dataset intoa single modal environment, it is tempting to simply replace all the treesin a forest trained on the available multimodal training data with theircorresponding scandent trees. However, this approach fails due to bias andthe fact that many of the multimodal divisions of support trees might notbe predictable by the single modal feature set.Instead, we choose an approach inspired by the use of decision treesas feature maps. For this we start with growing a scandent forest usingthe method explained in the previous chapter. However, instead of directlyusing the scandent trees, we use the set of local trees (Ti’s) from all thescandent trees of a multimodal forest as tree-based “feature-maps” or “tree-based feature transforms”. Each Ti is a single modal tree which maps Fs toa new space defined by the corresponding C′set. This means that each Tiyields a categorical feature to describe each sample. Then we use the singlemodal dataset with the extended feature set, including the original and thesetree-based features, to grow an improved single modal forest. Note thattrees trained on single modal features can be directly used as categoricalor continuous (similar to [7]) feature transformers. In the current work,however, we use the scandent subtrees to link two inconsistent datasets.Figure 4.1: Extracting tree-based feature transforms from the scandent treemodelThis method has a few advantages compared to the conventional methodfor forming a single modal decision forest or directly using the scandent treesas a new set of trees in a single modal decision forest. First, because ateach split of each tree in the single modal forest, the tree growth algorithm514.2. Methodsearches for the best division feature among both the original single modalfeatures and the new features generated by the local trees (Tis), the resultingtree is expected to be more accurate than both the scandent tree and the treegrown using only the original single modal features. Second, although the Tisare formed by a small multimodal dataset, the feature selection criteria (Giniimpurity or information gain) is calculated based on the large single modaldataset. In other words, the single modal forest uses the features inspiredby the multimodal forest, but it is completely randomized and optimizedbased on the larger single modal dataset.Figure 4.2: Diagram of the proposed method for training the “multimodalfeature transform” forest4.2.1 ImplementationThe first step is to grow a multimodal forest and the related scandent treesusing the method explained in the previous chapter. Then the local trees(Tis) are extracted from each tree and each Ti is used as feature generatorsfor the single modal dataset. Given that each Ti is a single modal classifier,it can assign labels relative to the local class labels (C′) to each single modalsample. The resulting labels are used as new categorical features which canbe calculated for any test data using the corresponding Ti. We then use aconventional decision forest growth method similar to what was explained inthe previous chapter to grow a forest using this set of new features togetherwith the original single modal feature set.It should be mentioned that because the local trees are trained usingthe small multimodal dataset, many of the generated features might not beuseful for the single modal decision forest. 
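As a simplified sketch of this feature-transform step, assume the local trees Ti have already been extracted from a scandent forest; each Ti is then applied to the single modal data as a categorical feature generator, and the enlarged feature set is handed to a conventional random forest. The object names below are placeholders, and the sketch leaves out the feature filtering and separate feature bagging described next.

    library(rpart)
    library(randomForest)

    # local_trees: a list of local trees T_i extracted from the scandent forest;
    # single_modal: data frame of the original F_s features (placeholders).
    tree_features <- lapply(local_trees, function(Ti) {
      factor(predict(Ti, newdata = single_modal, type = "class"))
    })
    tree_features <- as.data.frame(tree_features)
    names(tree_features) <- paste0("T", seq_along(local_trees))

    # Train the enhanced single modal forest on the original features plus
    # the tree-based categorical features.
    enhanced <- randomForest(x = cbind(single_modal, tree_features),
                             y = factor(single_modal_labels))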
It should be mentioned that because the local trees are trained using the small multimodal dataset, many of the generated features might not be useful for the single modal decision forest. Considering the large number of local trees in a random forest, these features can flood the original single modal features. We therefore filter the new features with a conventional feature selection algorithm, namely one based on the feature importance measure of a decision forest. We also apply feature bagging separately to the set of original single modal features and to the set of new features, and then merge the two to form the feature bag used for each single modal tree. Diagrams of the proposed method for extracting the tree-based feature transforms from the scandent tree model and for using them in single modal classifier design are shown in Figures 4.1 and 4.2.
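The filtering and the separate feature bagging can be sketched as follows, again in R and again as an illustrative simplification rather than the in-house implementation: a preliminary random forest ranks the generated features by Gini importance, and a small hand-rolled bagging loop then draws features from the original and the transformed sets separately before fitting each tree. The names X_orig, X_new and y are hypothetical, and the cut-off of 20 retained features, the number of trees, and the mtry choices are arbitrary illustrative settings.

```r
library(rpart)
library(randomForest)

# Hypothetical inputs:
#   X_orig : data.frame of original single modal features
#   X_new  : data.frame of categorical features produced by the local trees
#   y      : factor of class labels for the single modal training samples

# 1) Rank the generated features with a conventional importance measure
#    and keep only the top-ranked ones.
rf_rank   <- randomForest(x = X_new, y = y, ntree = 200)
gini      <- importance(rf_rank)[, "MeanDecreaseGini"]
keep      <- names(sort(gini, decreasing = TRUE))[1:min(20, ncol(X_new))]
X_new_sel <- X_new[, keep, drop = FALSE]

# 2) Grow the enhanced forest with separate feature bagging: every tree
#    receives a bag drawn from the original features and a bag drawn from
#    the selected tree-based features.
n_trees <- 100
m_orig  <- ceiling(sqrt(ncol(X_orig)))
m_new   <- ceiling(sqrt(ncol(X_new_sel)))
forest  <- vector("list", n_trees)

for (b in seq_len(n_trees)) {
  rows <- sample(nrow(X_orig), replace = TRUE)   # bootstrap the samples
  bag  <- cbind(X_orig[rows, sample(ncol(X_orig), m_orig), drop = FALSE],
                X_new_sel[rows, sample(ncol(X_new_sel), m_new), drop = FALSE])
  bag$label <- y[rows]
  forest[[b]] <- rpart(label ~ ., data = bag, method = "class")
}

# 3) Classify by averaging the class-probability votes of all trees.
#    X_new_test holds the tree-based features computed for the test samples
#    with the same local trees (factor levels assumed consistent with training).
predict_forest <- function(forest, X_orig_test, X_new_test) {
  newdata <- cbind(X_orig_test, X_new_test)
  votes   <- lapply(forest, predict, newdata = newdata, type = "prob")
  Reduce(`+`, votes) / length(votes)
}
```

Bagging the two pools separately is what prevents the much larger set of transformed features from crowding out the original single modal features.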
4.3 Evaluation and results

In this section we evaluate the proposed method for the single modal classification task. First, we examine the proposed method in different simulated scenarios and for different single modal sample sizes using two of the datasets used in the previous chapter: the dermatology dataset and the breast cancer dataset. Then we test the performance of the proposed method on a real dataset that exactly matches our scenario, the ADNI dataset.

4.3.1 Evaluation using benchmark datasets

For this simulation task we vary the same two simulation parameters that were used in the previous chapter, with the difference that instead of the multimodal sample size, which is assumed to be sufficiently large in this scenario, we examine the effect of the single modal sample size of the training set. The dermatology dataset introduced in the previous chapter is multimodal in nature, so it is the perfect benchmark for testing the single modal sample size. However, because the feature sets for each modality are fixed, it is not meaningful to simulate the single modal feature set size using this dataset. Instead, we use the breast cancer dataset introduced in the previous chapter in order to simulate different feature set sizes for the single modal dataset.

Effect of the single modal sample size: the dermatology dataset

It is informative to investigate the performance of the proposed method for single modal classification tasks as a function of the single modal sample size. We design an experiment similar to the previous chapter using the dermatology dataset, but with the assumption that the test set is also missing the most discriminative modality. For this experiment we first randomly select 100 samples from the dermatology dataset as a test set. Then we discard the histopathological features from this set. Moreover, we form a multimodal dataset with a fixed size (40 percent of the remaining samples in this experiment) and use it to extract the proposed feature transforms. Finally, we evaluate the performance of a single modal forest with and without the new features for different single modal sample sizes.

Figure 4.3: AUC vs. single modal sample size for the dermatology dataset

Figure 4.3 shows the AUC of a single modal random forest and the enhanced forest trained on the dermatology dataset for different sample sizes. It can be seen that the enhanced forest can improve the single modal forest especially when the single modal dataset is small. For instance, at a sample size of 71 the single modal classifier gives an AUC of 0.78 while the enhanced forest gives an AUC of 0.84. This improvement is also evident at larger sample sizes but is less pronounced (AUC of 0.87 and 0.85 for the proposed method and a single modal forest, respectively). The dermatology dataset is multimodal in nature, so the sizes of the single modal and multimodal feature sets are fixed. The breast cancer dataset, on the other hand, does not have a predefined single modal feature set, so it can be used to simulate different single modal feature sets.

Effect of the single modal feature set size: the breast cancer dataset

For this simulation we first randomly select 100 test samples. We then change the single modal feature set size from 5 to 29 (the total feature set size is 30), removing the most discriminative features at each step. Then we observe the AUC as a function of feature set size for a single modal forest with and without the use of the tree-based feature transforms. The tree-based feature transforms used in each iteration were trained using 30 samples randomly selected from the multimodal dataset. We found that the size of the multimodal dataset in the breast cancer experiment does not have much effect on the tree-based feature transforms once the multimodal sample size is at least 30. Figure 4.4 shows the AUC of the conventional single modal forest and the proposed method for different single modal feature sets.

Figure 4.4: AUC vs. single modal feature set size for the breast cancer dataset (each box shows AUC values for different multimodal sample sizes)

The expected upward trend in AUC vs. single modal feature set size is evident; it can also be seen that the proposed method outperforms a conventional single modal forest especially at smaller feature set sizes. For example, when only the 5 least discriminative multimodal features are available, the conventional single modal random forest results in a mean AUC of 0.77 whereas the proposed method delivers an AUC of 0.83, while for larger feature sets the AUC of the two methods is almost equal. This is because at feature set sizes larger than 20, the AUC of a single modal forest without the tree-based feature transforms is already almost 1, which eliminates the need for any further improvement on the conventional classifier.

The breast cancer dataset and the dermatology dataset are complete datasets that provided us the opportunity to test the performance of the proposed method in different simulated scenarios. We further test the performance of the proposed method on a real incomplete multimodal dataset, the ADNI dataset.

4.3.2 A real scenario: the ADNI dataset

We test the proposed single modal classification method on a dataset from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. The ADNI was launched in 2003 as a public-private partnership, led by Dr. Michael W. Weiner. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer's disease (AD). The ADNI study is an example of a multimodal scenario in which a large portion of the samples are missing one of the modalities. In this chapter we take the samples that come from patients with both MRI and PET scans as the multimodal dataset (Nm = 218),
accompanied by a relatively large single modal dataset (Ns = 508) consisting of patients with only MRI data. The single modal dataset includes the MRI data from the 218 multimodal samples.

The single modal dataset consists of MRI volume measurements of six ROIs in the human brain (ventricles, hippocampus, whole brain, entorhinal, fusiform and mid-temporal) and the intra-cranial volume (ICV), in mm3. The multimodal feature set consists of the same MRI features together with two additional PET features, the FluoroDeoxyGlucose (FDG) measurement and the AV45 uptake measurement. The outcome labels include cognitively normal patients (NL), patients with confirmed dementia (AD) and patients with mild cognitive impairment (MCI). The MCI group can be divided into progressive cases (pMCI) that eventually convert to dementia and stable cases (sMCI). In this chapter we assume a maximum conversion time of 36 months for an MCI case to be considered pMCI.

The distribution of the outcome classes in the two datasets is as follows: for the normal class we have 178 samples in the single modal dataset versus only 18 samples in the multimodal dataset; for the dementia class we have 108 single modal samples versus 29 multimodal samples; for the sMCI class we have 126 single modal samples versus 144 multimodal samples; and for the pMCI class we have 96 single modal samples versus 27 multimodal samples. In other words, not only is the multimodal dataset much smaller than the single modal dataset, it also does not have the same distribution of outcome classes. This makes data fusion between the two datasets extremely difficult with traditional approaches. We examine the performance of the proposed method by reporting the AUC for three classification scenarios: NL versus pMCI, sMCI versus AD, and sMCI versus pMCI.

Figure 4.5: Diagram of the method used for forming the PC forest

We take the performance of a single modal forest trained solely on the original MRI feature set as the baseline. We then compare the performance of this baseline with a forest enhanced using principal component features (the PC forest) as shown in Figure 4.5, with a similar forest trained using MRI-based transformed features as shown in Figure 4.6, and with a forest using a transformed feature set extracted from both MRI and PET using the scandent tree approach as shown in Figure 4.2. It should be noted that all of these classifiers are designed for the single modal classification task, meaning that they only need the original MRI feature set for classification but may use the other modalities (PET in this example) for better feature transform design in the training phase.
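For concreteness, a minimal sketch of a PC forest baseline of this kind in R might look as follows, assuming (as one plausible reading of the description above) that the principal component scores of the MRI features are simply appended to the original MRI features before a standard random forest is grown. The variable names (X_mri, y) are hypothetical, and the number of trees is an arbitrary illustrative choice.

```r
library(randomForest)

# Hypothetical inputs:
#   X_mri : numeric data.frame of the original MRI features
#   y     : factor of outcome labels (e.g., NL vs. pMCI)

# Principal components of the MRI features (centred and scaled).
pca  <- prcomp(X_mri, center = TRUE, scale. = TRUE)
X_pc <- as.data.frame(pca$x)                 # PC scores for the training set

# The "PC forest": original MRI features extended with their PC scores.
X_ext <- cbind(X_mri, X_pc)
pc_rf <- randomForest(x = X_ext, y = y, ntree = 500)

# At test time the stored rotation is applied to the new MRI measurements,
# so the classifier still requires only the MRI features.
predict_pc_forest <- function(X_mri_test) {
  X_pc_test <- as.data.frame(predict(pca, newdata = X_mri_test))
  predict(pc_rf, newdata = cbind(X_mri_test, X_pc_test), type = "prob")
}
```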
Five-fold cross-validated ROC curves of the baseline single modal forest, the PC forest, the single modal feature-transform forest, and the scandent tree multimodal feature-transform forest for the NL vs. pMCI classification task are shown in Figure 4.7. As can be seen in Table 4.1, the forests grown using the tree-based feature transforms significantly outperform the baseline single modal forest and the PC forest. The difference between the baseline and the feature transform methods is statistically significant (p=0.01 for the single modal feature transforms and p=0.002 for the multimodal feature transforms).

Figure 4.6: Diagram of the method used for forming a forest based on single modal tree-based feature transforms

Figure 4.7: ROC curve for NL vs. progressive MCI classification, single modal classification task, ADNI dataset

Table 4.1: Accuracy (Acc), Sensitivity (Sens), Specificity (Spec) and Area under the ROC curve (AUC) of the proposed methods and the baseline forest for the NL vs. pMCI single modal classification task, ADNI dataset

                                    Acc    Sens   Spec   AUC
  Single modal forest               0.744  0.663  0.791  0.779
  PC forest                         0.774  0.878  0.747  0.781
  Single modal feature transform    0.781  0.691  0.844  0.819
  Multimodal feature transform      0.788  0.747  0.805  0.837

Table 4.2: Accuracy (Acc), Sensitivity (Sens), Specificity (Spec) and Area under the ROC curve (AUC) of the proposed methods and the baseline forest for the sMCI vs. AD single modal classification task, ADNI dataset

                                    Acc    Sens   Spec   AUC
  Single modal forest               0.731  0.824  0.699  0.814
  PC forest                         0.752  0.758  0.748  0.836
  Single modal feature transform    0.782  0.734  0.863  0.868
  Multimodal feature transform      0.795  0.737  0.897  0.892

However, the improvement in performance achieved by the PC-based features is not statistically significant (p-value = 0.92). The multimodal feature transforms are more effective than the single modal feature transforms, and this difference is significant (p=0.04).

Another classification problem worth investigating is the discrimination of samples with stable MCI from dementia cases using the MRI feature set. Figure 4.8 and Table 4.2 show the ROC curves and performance measures of the enhanced and baseline forests for this classification task. It can be seen that, similar to the NL vs. pMCI task, the forests enhanced by the new feature sets outperform the baseline single modal forest. The improvement observed for the PC forest is larger than in the previous task but still cannot be considered statistically significant (p-value = 0.08). On the other hand, the proposed tree-based feature transform methods significantly outperform the baseline, with p-values of 0.0001 and 3.698e-07 for the single modal and multimodal feature transforms, respectively, and the multimodal feature transforms are more effective than the single modal feature transforms (p=0.0003).

Figure 4.8: ROC curve for stable MCI vs. AD classification, single modal classification task, ADNI dataset

The third classification task, which separates sMCI from pMCI cases, is potentially the most clinically relevant model. The ROC curves and performance measures for this task can be seen in Figure 4.9 and Table 4.3.

Figure 4.9: ROC curve for stable MCI vs. progressive MCI classification, single modal classification task, ADNI dataset

The trends remain the same: the tree-based feature transforms outperform a simple single modal forest with p=0.01 and p=0.0002 for the single modal (MRI-based) and multimodal (MRI+PET) feature transforms, respectively.
It can also be seen that the PC-based features fail to enhance the baseline forest to a statistically significant level (p=0.672). Similar to the previous experiments, the multimodal feature transforms yield a larger AUC than the single modal feature transforms (p=0.01).

Table 4.3: Accuracy (Acc), Sensitivity (Sens), Specificity (Spec) and Area under the ROC curve (AUC) of the proposed methods and the baseline forest for the sMCI vs. pMCI single modal classification task, ADNI dataset

                                    Acc    Sens   Spec   AUC
  Single modal forest               0.743  0.713  0.744  0.810
  PC forest                         0.757  0.769  0.746  0.815
  Single modal feature transform    0.777  0.819  0.750  0.848
  Multimodal feature transform      0.815  0.831  0.803  0.872

4.3.3 Comparison with other work on ADNI

The block-wise missing value problem is a well-known issue of the ADNI dataset and has been addressed in many papers in the literature. However, none of them has the same goal and assumptions as our study. For instance, our work focuses on improving the performance of decision forests under the assumption that a decision forest is the classifier of choice for a given multimodal dataset, whereas most studies on the ADNI dataset use other classifiers, such as multi-kernel SVM, for multimodal classification. As a result, it is difficult to compare our results with the available literature, since any such comparison will be largely determined by the choice of classification paradigm.

Another issue that makes the comparison difficult is the variation in feature sets and sample sizes caused by different patient selection criteria. A simple example is the different assumptions on the conversion time from MCI to AD used to differentiate progressive from stable MCI. In our study, we assumed a 36 month conversion time for progressive MCI cases and used the summarized set of features extracted by the adnimerge R package as our feature set. This package is accessible from the ADNI website (https://adni.loni.usc.edu).

With all these differences and limitations in mind, we have gathered a list of comparable methods with the performance measures reported in the literature in Table 4.4. These are all on the sMCI vs. pMCI classification task. As can be seen, the proposed method matches or surpasses the performance of the state of the art, even in cases where multimodal data is available for all samples.

Table 4.4: Comparison of the proposed single modal method with the state of the art for sMCI vs. pMCI prediction, ADNI dataset

  Method             Sample size   Modalities             Acc    Sens   Spec   AUC
  Proposed method    122           MRI                    0.815  0.831  0.803  0.872
  [9]                99            MRI, PET, CSF          0.801  0.853  0.733  0.852
  [31]               204           MRI, PET               0.759  0.48   0.952  0.746
  [6]                397           MRI, PET, CSF          0.732  0.655  0.767  0.786
  [14]               388           MRI                    0.754  0.705  0.776  0.82
  [35]               200           MRI                    0.751  -      -      0.84
  [40]               143           MRI, PET, CSF, APOE    0.741  0.787  0.656  0.795
  [43]               91            MRI, PET, CSF          0.739  0.686  0.736  0.797
  [10]               405           MRI                    0.71   0.7    0.72   -
  [36]               162           MRI, CSF               0.685  0.741  0.63   0.76

4.4 Summary

In this chapter a novel application of the scandent tree model, tree-based feature transforms, was introduced. These feature transforms can be used to leverage a multimodal dataset for single modal classifier design. A method to extract these single modal tree-based feature transforms was explained, and a method to use these feature transforms within the context of a conventional random forest classifier was provided.

Using two publicly available datasets for simulation and the ADNI dataset as a real incomplete multimodal dataset, it was shown that the proposed method can be used to enhance a single modal classifier.
The experiments on the ADNI dataset show that if one extracts the proposed feature transforms from a scandent forest trained on both PET and MRI features and uses them together with the original set of MRI features for classification of the different stages of Alzheimer's disease, the resulting classifier can outperform a conventional single modal random forest, a random forest enhanced with principal components of the MRI features, and a random forest enhanced with tree-based feature transforms trained solely on the MRI data.

We examined the proposed classifier in three different scenarios: normal vs. progressive MCI, stable MCI vs. progressive MCI, and stable MCI vs. Alzheimer's disease. It was shown that in all of these scenarios the proposed method outperforms the conventional single modal forest and the other enhanced single modal forests mentioned. The stable MCI vs. progressive MCI classification task was found to be the most clinically valuable classification task. Comparison of the proposed method with the state of the art in this scenario shows that the proposed method matches or outperforms the state of the art even in the case of larger training sets or feature sets.

Chapter 5: Conclusion

5.1 Summary

In this thesis we addressed the problem of incomplete multimodal datasets in random forest learning algorithms in a scenario where many of the samples are non-randomly missing a large portion of the most discriminative features. This missing value problem in multimodal datasets is different from the common scenarios in single modal data analysis. In our problem of interest, the features of one specific modality might be missing altogether in training or testing. This causes an issue known as block-wise missing data. We showed that this issue cannot be handled by conventional imputation techniques if the number of samples that include all of the features is small. This is a common scenario in biomedical data analysis applications. In summary, this thesis has two major contributions:

• We developed the novel concept of scandent trees for enriching a multimodal classifier with a large training dataset from only a subset of modalities. The results show that the proposed method for multimodal classification outperforms the embedded missing value imputation method of decision forests introduced in [4] and other state of the art imputation methods, particularly at smaller sample sizes and when a large portion of the features is missing. We showed that the proposed method enables the integration of a small genomic plus imaging dataset with a relatively large imaging dataset. We also showed that this method is in general less sensitive to the number of missing features and to the multimodal sample size, by simulating different missing value scenarios on three publicly available benchmark datasets.

• We also proposed a novel learning method for training on multiple modalities and testing on one modality. To this end, we introduced the concept of tree-based feature transforms. We showed that using this approach we can efficiently transfer the discriminative power of PET imaging into the training phase of building a model that only uses the MRI data at the testing phase. By simulation on two publicly available benchmark datasets, we showed that the single modal features generated by the multimodal model can significantly improve a single modal forest, especially when the single modal dataset is small or is missing most of the discriminative features.
We also showed that the model achieved through multimodal data analysis can be used to form an enhanced random forest that only needs a single modality for classification.

5.2 Discussions and limitations

In this thesis we have addressed two classification scenarios regarding the problem of missing values in multimodal datasets: leveraging a single modal dataset for multimodal classification, and leveraging a multimodal dataset for single modal classification.

5.2.1 Limitations of the implemented method

There are some limitations to our current implementation of the proposed method.

One limitation is that, similar to other tree-based imputation methods, we propose this algorithm with the assumption that the random forest is the classifier of choice. In other words, when a different classifier outperforms the baseline random forest, the proposed method might not be the best option.

Another limitation of the proposed method is that it is based on optimizing each leaf of each tree in a forest separately, which is computationally expensive for large forests. Unlike conventional imputation methods, the computational time of the proposed method scales with the sample size of the small multimodal dataset rather than with the large sample size of the single modal dataset. However, as the multimodal dataset grows, the computation time becomes an issue. The computational complexity of the proposed method depends not only on the model parameters, but also on the dataset and feature sets used in the analysis. For instance, if the missing modalities are significantly more discriminative than the available modalities, it is expected that the link nodes are limited to the root and leaves of the support forest. As a result, the computational cost of training a scandent tree forest would be of the same order as training two support forests. A similar result is also expected if the missing feature set is significantly less discriminative. The testing phase of the scandent forest, however, is exactly the same as for a conventional random forest.

5.2.2 Discussions and limitations of the multimodal study

We tested the scandent tree method on a prostate cancer dataset as a real world example of our target scenario. This dataset is an example of the worst case scenario of missing data: a large, non-random portion of the data is missing the potentially more powerful genomic features, resulting in a very small multimodal dataset. At the same time, the number of features on the single modal (imaging) side is small. It is, therefore, revealing that even in this situation the use of the scandent tree methodology provides a clear advantage over the traditional approaches to dealing with missing values, such as simply ignoring one or the other set, or imputation.

There are a few limitations to our work with the prostate cancer data:

• As our experiments show, the missing modality (gene expression) is far more discriminative than the shared modality (MRI). This makes it extremely difficult for the proposed method to model the relationships between the modalities and effectively merge the two datasets.

• The small number of features in the shared modality (MRI) makes the feature bagging in the support tree unbalanced between the modalities. As a result, many of the support trees are grown entirely from the missing modality (gene expression), and the scandent trees have to follow the structure of a whole support tree.
This, together with the small sample size of the multimodal dataset, can cause over-fitting.

• Another limitation is the small sample size of the multimodal dataset itself. It not only makes it hard to train the support forest needed for the scandent tree method, it also makes it hard to show statistically significant differences between methods.

Given these limitations, a more revealing test of the performance of the solution proposed for the multimodal scenario was achieved through the study of the benchmark datasets. One such study was presented in chapter 3, where we examined the performance of the scandent tree method for different multimodal sample sizes and different feature sets using benchmark datasets publicly available from the University of California Irvine (UCI) database [23]. In comparison with the state of the art imputation method for decision forests (rfImpute), we observed that at larger multimodal sample sizes, or when only a small number of features were missing from the single modal dataset, both methods perform very well in handling the missing values for multimodal classification. However, for smaller multimodal datasets, or when a large portion of the features is missing from the single modal samples, the scandent tree method showed significantly better performance than rfImpute. Another observation was that, for a fixed sample size, the scandent tree method is less sensitive to the number of missing features, especially at smaller multimodal sample sizes. This advantage was also evident from the results on the prostate cancer dataset.

5.2.3 Discussions and limitations of the single modal study

We proposed the tree-based feature transform method in order to leverage a multimodal dataset for single modal classification. We showed that this method is very effective in common real-world scenarios where a large number of samples are missing many of the potentially most discriminative features. We also succeeded in leveraging information from the PET scans in the ADNI dataset to enhance the classification of different stages of Alzheimer's disease when only MRI data is available.

Unlike the prostate cancer study, we did not have a sample size problem in this study, but there is one limitation in our results on the ADNI dataset: the different stages of Alzheimer's disease are not clearly defined and vary from one study to another. For instance, the conversion time between MCI and Alzheimer's disease, which determines whether an MCI case is stable or progressive, differs between studies. The fact that the ADNI dataset is continuously being updated also makes it hard to compare results on this dataset.

It should also be mentioned that the current implementation of the proposed method is not computationally efficient. This becomes an issue if the relationship between the available modalities and the missing modalities is very complex or the feature set size is very large. This scenario requires a large scandent forest in order to generate the tree-based feature transforms. Considering that most of the scandent trees consist of at least two or three local trees, the number of tree-based feature transforms becomes very large. The computational cost of growing a scandent forest for large datasets may make it impractical to use in many studies. However, our experiments show that only around 5% of the feature transform trees are actually useful in practice.
This means that around 95% of the computation time used for training the scandent forest is unnecessary if the forest is grown only for the purpose of generating the tree-based feature transforms.

Moreover, note that a large set of the tree-based feature transforms are only locally discriminant, and their number may exceed the original single modal feature set size by a factor of 5. Therefore, using an off-the-shelf random forest that does not separately rank and bag the two feature sets may not result in an improved classifier, and may even result in a forest weaker than the original single modal forest. This is because a large number of artificial features may flood the set of original features. This becomes an issue in forest growth algorithms that use a randomized feature selection approach at each division of each tree in a forest. Such randomized methods may reduce growth time and increase independence between trees; however, they may also yield a large number of trees that are grown solely based on the transformed features. These trees are very likely to over-fit.

A similar problem arises when the scandent trees are not randomized efficiently. In this scenario, the transformed features are very likely to be statistically dependent because they are formed using the same feature set without bagging. This makes it hard to form independent decision trees and, as a result, limits the performance of the resulting decision forest.

5.3 Future work

We can envision several future improvements to the implementation of the scandent tree method, as well as a further application of the concept:

• The current implementation of the proposed inference method is based on the optimization of the scandent trees in a leaf by leaf manner. This process is computationally expensive and becomes an issue in the case of larger multimodal datasets. An area of improvement will be re-designing the inference method so that, instead of leaf-by-leaf merging of the scandent and support trees, it can merge the two trees directly at the tree level.

• The second area for continued work is improving the baseline trees. Because we needed full control over each division of each tree in the forest, we could not use the off-the-shelf decision forest packages available in R. Therefore the support forest, which is the basis of the scandent tree method, is our in-house implementation. We can improve the base classifier by using the boosting methods and advanced randomization schemes embedded in off-the-shelf random forests.

• The third area that needs more work is the design of the local predictors in each scandent tree. The current implementation uses a very simple classification tree, while any other machine learning algorithm that can mimic a local multimodal tree could be used. The current implementation has the advantage that it automatically results in a scandent tree, but methods resulting in any single modal rule set can also be used.

• The fourth area is adapting the scandent tree model for online learning. As mentioned, one of the main applications of the proposed method is in data analysis tasks where only a limited set of the samples are multimodal because of time constraints. A natural step is therefore to design a mechanism to insert information from new multimodal samples into the scandent tree model, so that the model can be iteratively updated as the number of multimodal samples grows.

• Another potential application of the scandent tree concept that is worth investigating is finding relationships between modalities.
The scandent tree model provides a unique tool to model the relationship between two sets of features, potentially from two modalities. The local trees that form each scandent tree are in fact decision trees that model the relationship between one available feature and a set of features in the missing modality. In many applications, for instance in gene expression studies, a decision tree is a perfect tool to model the relationships between features in an interpretable fashion. This can be a very simple but effective way to find relationships between an imaging modality and a set of genes, which may result in finding new biomarkers for the diagnosis of diseases like cancer by finding links between features.

Bibliography

[1] André Altmann, Laura Toloşi, Oliver Sander, and Thomas Lengauer. Permutation importance: a corrected feature importance measure. Bioinformatics, 26(10):1340–1347, 2010.

[2] Francisco Arteaga and Alberto Ferrer. Framework for regression-based missing data imputation methods in on-line MSPC. Journal of Chemometrics, 19(8):439–447, 2005.

[3] Hussam Aldeen Ashab, Piotr Kozlowski, S Larry Goldenberg, and Mehdi Moradi. Solutions for missing parameters in computer-aided diagnosis with multiparametric imaging data. In Machine Learning in Medical Imaging, pages 289–296. Springer, 2014.

[4] Leo Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.

[5] Leo Breiman, Jerome Friedman, Charles J Stone, and Richard A Olshen. Classification and Regression Trees. CRC Press, 1984.

[6] Sergio Campos, Luis Pizarro, Carlos Valle, Katherine R Gray, Daniel Rueckert, and Héctor Allende. Evaluating imputation techniques for missing data in ADNI: a patient classification study. In Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, pages 3–10. Springer, 2015.

[7] Yu Cao, Hongzhi Wang, Mehdi Moradi, Prasanth Prasanna, and Tanveer F Syeda-Mahmood. Fracture detection in x-ray images through stacked random forests feature fusion. In Biomedical Imaging (ISBI), 2015 IEEE 12th International Symposium on, pages 801–805. IEEE, 2015.

[8] Qixuan Chen and Sijian Wang. Variable selection for multiply-imputed data with application to dioxin exposure study. Statistics in Medicine, 32(21):3646–3659, 2013.

[9] Bo Cheng, Mingxia Liu, Heung-Il Suk, Dinggang Shen, and Daoqiang Zhang. Multimodal manifold-regularized transfer learning for MCI conversion prediction. Brain Imaging and Behavior, pages 1–14, 2015.

[10] Pierrick Coupé, Simon F Eskildsen, José V Manjón, Vladimir S Fonov, Jens C Pruessner, Michèle Allard, and D Louis Collins. Scoring by non-local image patch estimator for early detection of Alzheimer's disease. NeuroImage: Clinical, 1(1):141–152, 2012.

[11] Robert Detrano, Andras Janosi, Walter Steinbrunn, Matthias Pfisterer, Johann-Jakob Schmid, Sarbjit Sandhu, Kern H Guppy, Stella Lee, and Victor Froelicher. International application of a new probability algorithm for the diagnosis of coronary artery disease. The American Journal of Cardiology, 64(5):304–310, 1989.

[12] Thomas G Dietterich. An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization. Machine Learning, 40(2):139–157, 2000.

[13] B. Drew, E. C. Jones, S. Reinsberg, et al. Device for sectioning prostatectomy specimens to facilitate comparison between histology and in vivo MRI. Journal of Magnetic Resonance Imaging: JMRI, 32:992–996, 2010.

[14] Simon F Eskildsen, Pierrick Coupé, Daniel García-Lorenzo, Vladimir Fonov, Jens C Pruessner, and D Louis Collins.
Prediction of Alzheimer's disease in subjects with mild cognitive impairment from the ADNI cohort using patterns of cortical thinning. NeuroImage, 65:511–521, 2013.

[15] Yoav Freund and Robert E Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997.

[16] Nandinee Fariah Haq, Piotr Kozlowski, Edward C Jones, Silvia D Chang, S Larry Goldenberg, and Mehdi Moradi. A data-driven approach to prostate cancer detection from dynamic contrast enhanced MRI. Computerized Medical Imaging and Graphics, 41:37–45, 2015.

[17] Yan He. Missing data imputation for tree-based models. PhD thesis, University of California Los Angeles, 2006.

[18] Tin Kam Ho. Random decision forests. In Document Analysis and Recognition, 1995, Proceedings of the Third International Conference on, volume 1, pages 278–282. IEEE, 1995.

[19] Tin Kam Ho. The random subspace method for constructing decision forests. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 20(8):832–844, 1998.

[20] Biao Jie, Daoqiang Zhang, Bo Cheng, and Dinggang Shen. Manifold regularized multitask feature learning for multimodality disease classification. Human Brain Mapping, 36(2):489–507, 2015.

[21] Phimmarin Keerin, Werasak Kurutach, and Tossapon Boongoen. Cluster-based KNN missing value imputation for DNA microarray data. In Systems, Man, and Cybernetics (SMC), 2012 IEEE International Conference on, pages 445–450. IEEE, 2012.

[22] EM Kleinberg et al. An overtraining-resistant stochastic modeling method for pattern recognition. The Annals of Statistics, 24(6):2319–2349, 1996.

[23] M. Lichman. UCI machine learning repository, 2013.

[24] Mehdi Moradi, Firdaus Janoos, Andriy Fedorov, Petter Risholm, Tina Kapur, Luciant D Wolfsberger, Paul L Nguyen, Clare M Tempany, and William M Wells. Two solutions for registration of ultrasound to MRI for image-guided prostate interventions. In Engineering in Medicine and Biology Society (EMBC), 2012 Annual International Conference of the IEEE, pages 1129–1132. IEEE, 2012.

[25] Mehdi Moradi, Septimiu E Salcudean, Silvia D Chang, Edward C Jones, Nicholas Buchan, Rowan G Casey, S Larry Goldenberg, and Piotr Kozlowski. Multiparametric MRI maps for detection and grading of dominant prostate tumors. Journal of Magnetic Resonance Imaging, 35(6):1403–1413, 2012.

[26] National Institutes of Health. National Cancer Institute: PDQ genetics of prostate cancer. Date last modified 02/20/2015.

[27] J. Ross Quinlan. Induction of decision trees. Machine Learning, 1(1):81–106, 1986.

[28] J Ross Quinlan. C4.5: Programs for Machine Learning. Elsevier, 2014.

[29] Dan Steinberg and Phillip Colla. CART: classification and regression trees. The Top Ten Algorithms in Data Mining, 9:179, 2009.

[30] Carolin Strobl, Anne-Laure Boulesteix, and Thomas Augustin. Unbiased split selection for classification trees based on the Gini index. Computational Statistics & Data Analysis, 52(1):483–501, 2007.

[31] Heung-Il Suk, Seong-Whan Lee, Dinggang Shen, et al. Hierarchical feature representation and multimodal fusion with deep learning for AD/MCI diagnosis. NeuroImage, 101:569–582, 2014.

[32] Terry M Therneau, Beth Atkinson, and Brian Ripley. rpart: Recursive partitioning. R package version 3.1-46. Ported to R by Brian Ripley, 3, 2010.

[33] Terry M Therneau, Beth Atkinson, Brian Ripley, et al. rpart: Recursive partitioning. R package version, 3:1–46, 2010.

[34] Xian Wang, Ao Li, Zhaohui Jiang, and Huanqing Feng.
Missing value estimation for DNA microarray gene expression data by support vector regression imputation and orthogonal coding scheme. BMC Bioinformatics, 7(1):1, 2006.

[35] Chong-Yaw Wee, Pew-Thian Yap, and Dinggang Shen. Prediction of Alzheimer's disease and mild cognitive impairment using cortical morphological patterns. Human Brain Mapping, 34(12):3411–3425, 2013.

[36] Eric Westman, J-Sebastian Muehlboeck, and Andrew Simmons. Combining MRI and CSF measures for classification of Alzheimer's disease and prediction of mild cognitive impairment conversion. NeuroImage, 62(1):229–238, 2012.

[37] William H Wolberg, W Nick Street, Dennis M Heisey, and Olvi L Mangasarian. Computer-derived nuclear features distinguish malignant from benign breast cytology. Human Pathology, 26(7):792–796, 1995.

[38] Shuo Xiang, Lei Yuan, Wei Fan, Yalin Wang, Paul M Thompson, and Jieping Ye. Multi-source learning with block-wise missing data for Alzheimer's disease prediction. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 185–193. ACM, 2013.

[39] Shuo Xiang, Lei Yuan, Wei Fan, Yalin Wang, Paul M Thompson, and Jieping Ye. Bi-level multi-source learning for heterogeneous block-wise missing data. NeuroImage, 102:192–206, 2014.

[40] Jonathan Young, Marc Modat, Manuel J Cardoso, Alex Mendelson, Dave Cash, and Sebastien Ourselin. Accurate multimodal probabilistic prediction of conversion to Alzheimer's disease in patients with mild cognitive impairment. NeuroImage: Clinical, 2:735–745, 2013.

[41] Guan Yu, Yufeng Liu, Kim-Han Thung, and Dinggang Shen. Multi-task linear programming discriminant analysis for the identification of progressive MCI individuals. PLOS One, 9:e96458, 2014.

[42] Lei Yuan, Yalin Wang, Paul M Thompson, Vaibhav A Narayan, and Jieping Ye. Multi-source feature learning for joint analysis of incomplete multiple heterogeneous neuroimaging data. NeuroImage, 61(3):622–632, 2012.

[43] Daoqiang Zhang and Dinggang Shen. Multi-modal multi-task learning for joint prediction of multiple regression and classification variables in Alzheimer's disease. NeuroImage, 59(2):895–907, 2012.
