UBC Theses and Dissertations
Probabilistic models for the identification and interpretation of somatic single nucleotide variants in cancer genomes
Roth, Andrew Justin Latham
Somatic single nucleotide variants (SNVs) are mutations resulting from the substitution of a single nucleotide in the genome of cancer cells. Somatic SNVs are numerous in the genomes of most types of cancer. SNVs can contribute to the malignant phenotype of cancer cells, though many SNVs likely have negligible selective value. Because many SNVs are selectively neutral, their presence in a measurable proportion of cells is likely due to drift or genetic hitchhiking. This makes SNVs an appealing class of genomic aberrations to use as markers of clonal populations and, ultimately, of tumour evolution. Advances in sequencing technology, in particular the development of high throughput sequencing (HTS) technologies, have made it possible to systematically profile SNVs in tumour genomes. We introduce three probabilistic models to solve analytical problems raised by experimental designs that leverage HTS to study cancer biology. The first experimental design we address is paired sequencing of normal and tumour tissue samples to identify somatic SNVs. We develop a probabilistic model that jointly analyses data from both samples and reduces the number of false positive somatic SNV predictions. The second experimental design we address is the deep sequencing of SNVs to quantify the cellular prevalence of clones harbouring the SNVs. The key challenge we resolve is that allele abundance measured by HTS is not equivalent to cellular prevalence, due to the confounding effects of mutational genotype, normal cell contamination and technical noise. We develop a probabilistic model which solves these problems while simultaneously inferring the number of clonal populations in the tissue. The final experimental design we consider is single cell sequencing. Single cell sequencing provides a direct means to measure the genotypes of clonal populations. However, sequence data from a single cell is inherently noisy, which confounds accurate measurement of genotypes. To overcome this problem, we develop a model that aggregates cells by clonal population in order to pool statistical strength and reduce error. The model jointly infers the assignment of cells to clonal populations, the genotypes of the clonal populations, and the number of populations present.
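To illustrate why allele abundance is not equivalent to cellular prevalence, the following is a minimal sketch of the expected variant allele fraction under strongly simplified assumptions. It is not the model developed in the thesis. The function name `expected_vaf` and all of its parameters are hypothetical, and the sketch assumes that normal cells carry no mutant copies, that every tumour cell has the same total copy number at the locus, and that there is no technical noise.

```python
def expected_vaf(cellular_prevalence, tumour_content, mutant_copies,
                 tumour_copies, normal_copies=2):
    """Expected fraction of sequenced alleles that carry the variant.

    Illustrative assumptions (not taken from the thesis):
      - normal cells have `normal_copies` copies of the locus, none mutant
      - all tumour cells have `tumour_copies` copies of the locus
      - a fraction `cellular_prevalence` of tumour cells carry the variant
        on `mutant_copies` of those copies
      - `tumour_content` is the fraction of cells in the sample that are tumour
    """
    if not 0.0 <= cellular_prevalence <= 1.0:
        raise ValueError("cellular_prevalence must be in [0, 1]")
    if not 0.0 <= tumour_content <= 1.0:
        raise ValueError("tumour_content must be in [0, 1]")
    if tumour_copies <= 0 or not 0 <= mutant_copies <= tumour_copies:
        raise ValueError("need 0 <= mutant_copies <= tumour_copies, tumour_copies > 0")

    # Variant alleles come only from tumour cells that harbour the mutation.
    mutant_alleles = tumour_content * cellular_prevalence * mutant_copies
    # All alleles at the locus, contributed by normal and tumour cells.
    total_alleles = ((1.0 - tumour_content) * normal_copies
                     + tumour_content * tumour_copies)
    return mutant_alleles / total_alleles


# Same cellular prevalence (0.5) and tumour content (0.8), different genotypes:
heterozygous_diploid = expected_vaf(0.5, 0.8, mutant_copies=1, tumour_copies=2)
amplified = expected_vaf(0.5, 0.8, mutant_copies=2, tumour_copies=4)
print(round(heterozygous_diploid, 3), round(amplified, 3))
```

Under these assumptions, two mutations present in the same proportion of tumour cells yield different expected allele fractions when their genotypes differ, and both values fall further as normal cell contamination increases. This is why the allele fraction alone cannot be read as a cellular prevalence.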
Attribution-NonCommercial 2.5 Canada