Open Collections

UBC Theses and Dissertations

Probabilistic models for the identification and interpretation of somatic single nucleotide variants in cancer genomes

Roth, Andrew Justin Latham

Abstract

Somatic single nucleotide variants (SNVs) are mutations resulting from the substitution of a single nucleotide in the genome of cancer cells. Somatic SNVs are numerous in the genomes of most types of cancer. SNVs can contribute to the malignant phenotype of cancer cells, though many SNVs likely have negligible selective value. Because many SNVs are selectively neutral, their presence in a measurable proportion of cells is likely due to drift or genetic hitchhiking. This makes SNVs an appealing class of genomic aberrations to use as markers of clonal populations and, ultimately, of tumour evolution. Advances in sequencing technology, in particular the development of high throughput sequencing (HTS) technologies, have made it possible to systematically profile SNVs in tumour genomes. We introduce three probabilistic models to solve analytical problems raised by experimental designs that leverage HTS to study cancer biology. The first experimental design we address is paired sequencing of normal and tumour tissue samples to identify somatic SNVs. We develop a probabilistic model to jointly analyse data from both samples and reduce the number of false positive somatic SNV predictions. The second experimental design we address is the deep sequencing of SNVs to quantify the cellular prevalence of clones harbouring the SNVs. The key challenge we resolve is that allele abundance measured by HTS is not equivalent to cellular prevalence, due to the confounding effects of mutational genotype, normal cell contamination and technical noise. We develop a probabilistic model that solves these problems while simultaneously inferring the number of clonal populations in the tissue. The final experimental design we consider is single cell sequencing. Single cell sequencing provides a direct means to measure the genotypes of clonal populations. However, sequence data from a single cell are inherently noisy, which confounds accurate measurement of genotypes. To overcome this problem, we develop a model to aggregate cells by clonal population in order to pool statistical strength and reduce error. The model jointly infers the assignment of cells to clonal populations, the genotypes of the clonal populations, and the number of populations present.
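
To illustrate why allele abundance and cellular prevalence differ, the following is a minimal sketch and not the model developed in the thesis. It assumes a simplified deterministic relationship with no technical noise, in which every tumour cell has the same total copy number at the locus and every mutated cell carries the same number of mutant copies. The function name `expected_vaf` and all of its parameters are hypothetical names introduced here for illustration.

```python
def expected_vaf(cellular_prevalence, tumour_content,
                 mutant_copies=1, tumour_copies=2, normal_copies=2):
    """Expected variant allele fraction under a simplified, noise-free mixture.

    Assumptions (illustrative only, not taken from the thesis):
      - cellular_prevalence: fraction of tumour cells carrying the SNV
      - tumour_content: fraction of cells in the sample that are tumour cells
      - all tumour cells have `tumour_copies` copies of the locus
      - mutated cells carry `mutant_copies` copies of the variant allele
      - normal cells have `normal_copies` copies, none mutated
    """
    if not 0.0 <= cellular_prevalence <= 1.0:
        raise ValueError("cellular_prevalence must be in [0, 1]")
    if not 0.0 <= tumour_content <= 1.0:
        raise ValueError("tumour_content must be in [0, 1]")
    if not 0 <= mutant_copies <= tumour_copies:
        raise ValueError("mutant_copies must be between 0 and tumour_copies")

    # Variant alleles come only from mutated tumour cells.
    mutant_alleles = tumour_content * cellular_prevalence * mutant_copies
    # All alleles at the locus come from normal and tumour cells together.
    total_alleles = ((1.0 - tumour_content) * normal_copies
                     + tumour_content * tumour_copies)
    return mutant_alleles / total_alleles


if __name__ == "__main__":
    # Same cellular prevalence, different genotype and contamination.
    print(expected_vaf(1.0, 1.0))
    print(expected_vaf(1.0, 0.5))
    print(expected_vaf(1.0, 0.5, mutant_copies=2, tumour_copies=4))
```

Under these assumptions, an SNV present in every tumour cell can yield quite different allele fractions depending on tumour content and copy number, which is why the observed fraction alone does not identify cellular prevalence.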

Item Metadata

Title	Probabilistic models for the identification and interpretation of somatic single nucleotide variants in cancer genomes
Creator	Roth, Andrew Justin Latham
Publisher	University of British Columbia
Date Issued	2015
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2016-01-06
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial 2.5 Canada
DOI	10.14288/1.0223130
URI	http://hdl.handle.net/2429/56222
Degree (Theses)	Doctor of Philosophy - PhD
Program (Theses)	Bioinformatics
Affiliation	Science, Faculty of
Degree Grantor	University of British Columbia
Graduation Date	2016-02
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc/2.5/ca/
Aggregated Source Repository	DSpace
