Utility of machine learning approaches for cancer diagnosis and analysis from RNA sequencing

UBC Theses and Dissertations

Grewal, Jasleen K

Abstract

Metastasis accounts for the largest number of cancer-associated deaths. Advanced cancers of this kind include rare cancer types that lack established treatment guidelines and cancers that have become resistant to established lines of therapy. Precision oncology projects aim to develop treatment options for these patients by obtaining a detailed molecular view of the cancer. Scientists use sequencing data such as whole-genome sequencing and RNA sequencing (RNA-Seq) to understand the biology of the cancer. A significant challenge in this process is diagnosing the cancer type of the sample, because the observed measurements are best interpreted in that context. Routine histopathology relies on tissue morphology and can fail to provide a definitive diagnosis when the cancer metastasizes, presents biology attributable to several different cancer types, or is a rare cancer type. Molecular data have revealed differences in the genetic makeup of cancers that appear morphologically similar, motivating the use of molecular diagnostics. Nevertheless, no existing tools use the output of these sequencing modalities in its entirety (that is, without feature selection). There is also limited work evaluating the utility of pan-cancer molecular diagnostics in a precision oncology trial. In this work we review an ongoing precision oncology trial and identify the impact of sequencing-based approaches on cancer diagnosis. We develop SCOPE, a machine-learning method that uses tumour RNA-Seq profiles for automated cancer diagnosis. We show that this method, which takes 17,688 gene measurements as input, achieves higher classification accuracy than a model restricted to statistically prioritized marker genes, can deconvolve cancer types with mixed histology, and performs well on metastatic cancers and cancers of unknown origin. In precision oncology, the tumour's genomic profile is analysed manually to understand tumour biology and driver pathways.
We find that by assessing the classifier's dependence on gene subsets, we can automatically calculate the importance of various biological programs in individual tumours. Pathways prioritized by this tool, called PIE, show high overlap with the manual integrative analyses that expert bioinformaticians perform to identify clinically important genomic changes. Lastly, we demonstrate that PIE facilitates cohort-wide cancer analysis and the discovery of novel subgroups in advanced cancers.
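The abstract does not specify how PIE measures the classifier's dependence on a gene subset. The sketch below illustrates one common way to do this, gene-set occlusion: reset the genes in a set to a baseline value, re-run the classifier, and score the set by how much the predicted probability of the tumour's class drops. Everything here is an assumption for illustration: the functions `make_linear_classifier` and `gene_set_importance`, the toy weights, the pathway names, and the all-zero baseline (a cohort mean would be a more realistic choice) are hypothetical. The linear softmax model merely stands in for SCOPE, and PIE's actual procedure may differ.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_linear_classifier(weights: List[Vector], biases: Vector) -> Callable[[Vector], Vector]:
    """Hypothetical toy multinomial-logistic classifier (a stand-in, not SCOPE)."""
    def predict_proba(x: Vector) -> Vector:
        scores = [sum(w * xi for w, xi in zip(row, x)) + b
                  for row, b in zip(weights, biases)]
        return softmax(scores)
    return predict_proba


def gene_set_importance(predict_proba: Callable[[Vector], Vector],
                        sample: Vector,
                        gene_sets: Dict[str, Sequence[int]],
                        baseline: Vector,
                        target: int) -> Dict[str, float]:
    """Score each gene set by the drop in P(target) when its genes are set to baseline.

    Positive values mean the prediction depends on that set; negative values
    mean removing the set's signal makes the target class more likely.
    """
    p_full = predict_proba(sample)[target]
    importance = {}
    for name, indices in gene_sets.items():
        masked = list(sample)          # copy so the original profile is untouched
        for i in indices:
            masked[i] = baseline[i]    # occlude only the genes in this set
        importance[name] = p_full - predict_proba(masked)[target]
    return importance


# Toy example: 4 genes, 2 classes; class 0 is driven by genes 0 and 1.
predict = make_linear_classifier(
    weights=[[2.0, 2.0, 0.0, 0.0], [0.0, 0.0, 0.1, 0.1]],
    biases=[0.0, 0.0],
)
tumour = [1.0, 1.0, 1.0, 1.0]
pathways = {"pathway_A": [0, 1], "pathway_B": [2, 3]}
scores = gene_set_importance(predict, tumour, pathways, baseline=[0.0] * 4, target=0)
print(scores)
```

Occlusion gives a per-tumour score for each gene set with only a few extra model evaluations per set, which fits the idea of ranking biological programs in individual tumours. The choice of baseline strongly affects the scores, so it should reflect a biologically meaningful "absent signal" state.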

Item Metadata

Title	Utility of machine learning approaches for cancer diagnosis and analysis from RNA sequencing
Creator	Grewal, Jasleen K
Publisher	University of British Columbia
Date Issued	2020
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2020-08-27
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0394043
URI	http://hdl.handle.net/2429/75717
Degree (Theses)	Doctor of Philosophy - PhD
Program (Theses)	Bioinformatics
Affiliation	Science, Faculty of
Degree Grantor	University of British Columbia
Graduation Date	2020-11
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace
