Bioinformatic approaches for identifying single nucleotide variants and profiling alternative expression in cancer transcriptomes

UBC Theses and Dissertations

Goya, Rodrigo

Abstract

Over the last decade, the advent of high-throughput sequencing (HTS) has made it possible to study DNA and RNA sequences at nucleotide resolution, at unprecedented speed and relatively low cost. HTS has been an invaluable tool in the study of cancer, allowing projects such as The Cancer Genome Atlas and the International Cancer Genome Consortium to sequence thousands of tumours from multiple cancer types. The ever-increasing amounts of data created by these projects demanded new analysis methods. In the first part of this thesis, I focus on method development for mutation calling in genome and transcriptome data. I present SNVMix, a single nucleotide variant (SNV) caller based on a set of probabilistic models created to adapt to variations in allele representation in a tumour. Differential allele representation can occur in DNA when multiple clones are present in the sequenced tumour, and in RNA due to differences in gene expression or allele bias. These situations are nearly ubiquitous in cancer sequencing studies and thus need to be accounted for. I demonstrate that SNVMix outperformed a contemporary SNV caller that does not account for variations in allele representation. I also present BWA-R, an adaptation of the Burrows-Wheeler Aligner that can properly align paired-end RNA-Seq reads to a genome reference extended with exon-exon junction sequences formed through splicing. I show that BWA-R provides better alignments for SNV calling in transcriptomes, resulting in an increase in the proportion of true positive calls obtained. In the second part of this thesis, I analyze RNA-Seq data from a triple-negative breast cancer (TNBC) cohort and describe the alternative splicing profiles of the previously defined Basal and NonBasal subgroups.
TNBC is characterized by the absence of estrogen and progesterone receptors and of human epidermal growth factor receptor 2 (HER2), which precludes the use of currently available targeted therapies. TNBC patients are thus treated with chemotherapy, and outcomes are generally poor. I identify alternatively expressed genes that may be relevant to the biology of these two subgroups and that could provide clues for further studies or treatment options.

Item Metadata

Title	Bioinformatic approaches for identifying single nucleotide variants and profiling alternative expression in cancer transcriptomes
Creator	Goya, Rodrigo
Publisher	University of British Columbia
Date Issued	2017
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2017-12-18
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0362107
URI	http://hdl.handle.net/2429/64070
Degree (Theses)	Doctor of Philosophy - PhD
Program (Theses)	Bioinformatics
Affiliation	Science, Faculty of
Degree Grantor	University of British Columbia
Graduation Date	2018-02
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace
