UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Utility of machine learning approaches for cancer diagnosis and analysis from RNA sequencing Grewal, Jasleen K


The highest number of cancer-associated deaths are attributable to metastasis. These include rare cancer types that lack established treatment guidelines, or cancers that become resistant to established lines of therapy. Precision oncology projects aim to develop treatment options for these patients by obtaining a detailed molecular view of the cancer. Scientists use sequencing data like whole-genome sequencing and RNA-sequencing to understand the biology of the cancer. A significant challenge in this process is diagnosing the cancer type of the sample since the observed measurements are best understood with this context. Routine histopathology relies on tissue morphology and can fail to provide a determinative diagnosis when the cancer metastasizes, presents biology attributable to multiple different cancer types, or presents as a rare cancer type. Molecular data has revealed differences in the genetic makeup of cancers that appear morphologically similar, motivating the use of molecular diagnostics. Nevertheless, no existing tools utilize the output from these sequencing modalities in its entirety (that is, without feature selection). There is also limited work evaluating the utility of pan-cancer molecular diagnostics in a precision oncology trial. In this work we review an ongoing precision oncology trial and identify the impact of sequencing-based approaches on cancer diagnosis. We develop SCOPE, a machine-learning method that uses RNA-Seq profiles of tumours for automated cancer diagnosis. We show that this method, which uses over 17,688 gene measurements as input, has better classification accuracy than when using statistically prioritized marker genes, can deconvolve cancer-types with mixed histology, and has high performance in metastatic cancers and cancers of unknown origin. In precision oncology, manual analysis of the tumour's genomic profile is used to understand tumour biology and driver pathways. We find that by assessing the classifier's dependence on gene subsets, we can automatically calculate the importance of various biological programs in individual tumours. Pathways prioritized through this tool - called PIE - show a high overlap with manual integrative analysis performed by expert bioinformaticians to identify clinically important genomic changes. Lastly, we demonstrate that PIE facilitates cohort-wide cancer analysis and discovery of novel sub-groups in advanced cancers.

Item Media

Item Citations and Data


Attribution-NonCommercial-NoDerivatives 4.0 International