Predicting new disease activity from deep grey matter on MRI in early multiple sclerosis using random forests and neural networks : feature selection and accounting for class label uncertainty

UBC Theses and Dissertations

Tayyab, Maryam

Abstract

Multiple sclerosis (MS) is an autoimmune disease of the central nervous system with a heterogeneous disease course, making it difficult to predict patient-specific clinical outcomes. Machine learning can potentially improve predictions through feature extraction and/or by learning complex relationships. Morphological change in deep grey matter (DGM) structures is a consistent feature in all MS phenotypes, yet the value of DGM imaging features for clinical prediction is largely unexplored. In this thesis, I evaluated the contribution of DGM volumes and deep-learned (DL) features for predicting new disease activity within 24 months of a first clinical demyelinating event. The data set had two challenging characteristics: highly heterogeneous feature types, which required a thoughtful exploration of feature selection methods, and uncertain ground truth labels for 32 of the 140 patient samples. I implemented and evaluated 1) random forest (RF) models trained on clinical, demographic and DGM volume features, 2) four feature selection methods for RF training and their impact on model performance, 3) DL models trained on 3D segmentations of DGM nuclei with and without user-defined features, and 4) strategies to account for label uncertainty while training an RF. In a 7-fold nested cross-validation experiment, the best result without accounting for uncertainty (F1-score = 77.57%, SD = 6.60%) was achieved with an RF trained on manually selected features, which outperformed common automated feature selection methods, such as iterative RFs (F1-score = 72.35%, SD = 6.91%). The neural network using only deep-learned DGM features achieved a slightly lower F1-score (73.02%, SD = 4.70%), which decreased further when user-defined features were added. When accounting for label uncertainty, the highest performance on the 108 samples with confirmed labels was produced by a probabilistic RF (F1-score = 89.62%, SD = 4.90%) trained on all available samples, which was higher than the performance obtained by training only on the confirmed labels.
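The evaluation protocol named in the abstract (7-fold nested cross-validation, reported as a mean F1-score with a standard deviation across folds) can be illustrated with a minimal standard-library sketch. It is not the thesis code. The function names, the inner fold count of 3, the random seed, and the use of the sample standard deviation are all assumptions made for illustration; the abstract specifies only the 7 outer folds, the 140 samples, and the F1 metric.

```python
import random
import statistics


def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1): harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(indices, k, rng):
    """Shuffle the indices and deal them into k nearly equal folds."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(n_samples, k_outer=7, k_inner=3, seed=0):
    """Yield (train, test, inner_splits) for each outer fold.

    inner_splits holds (inner_train, inner_val) pairs drawn only from the
    outer training set, so model selection never sees the outer test fold.
    k_inner=3 and seed=0 are illustrative assumptions.
    """
    rng = random.Random(seed)
    outer_folds = k_fold_indices(range(n_samples), k_outer, rng)
    for i, test in enumerate(outer_folds):
        train = [idx for j, fold in enumerate(outer_folds) if j != i for idx in fold]
        inner_folds = k_fold_indices(train, k_inner, rng)
        inner_splits = []
        for m, val in enumerate(inner_folds):
            inner_train = [
                idx for n, fold in enumerate(inner_folds) if n != m for idx in fold
            ]
            inner_splits.append((inner_train, val))
        yield train, test, inner_splits


def summarize(fold_scores):
    """Mean and sample standard deviation of per-fold scores (SD type is an assumption)."""
    return statistics.mean(fold_scores), statistics.stdev(fold_scores)
```

In use, a model and its feature selection would be tuned on the inner splits, refit on the outer training indices, and scored with `f1_score` on the outer test indices; `summarize` then turns the seven outer-fold scores into a mean and SD pair of the kind quoted in the abstract.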

Item Metadata

Title	Predicting new disease activity from deep grey matter on MRI in early multiple sclerosis using random forests and neural networks : feature selection and accounting for class label uncertainty
Creator	Tayyab, Maryam
Supervisor	Tam, Roger
Publisher	University of British Columbia
Date Issued	2021
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2021-10-08
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0402485
URI	http://hdl.handle.net/2429/79932
Degree	Master of Applied Science - MASc
Program	Biomedical Engineering
Affiliation	Applied Science, Faculty of; Biomedical Engineering, School of
Degree Grantor	University of British Columbia
Graduation Date	2021-11
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace
