Model-based clustering for aCGH data using variational EM

UBC Theses and Dissertations

Alain, Guillaume

Abstract

DNA copy number alterations (CNAs) are genetic changes that can produce adverse effects in numerous human diseases, including cancer. Copy number variations (of which CNAs are a subset) are a common phenomenon, and not much is known about the nature of many of the mutations. By clustering patients according to CNA patterns, we can identify recurrent CNAs and understand molecular heterogeneity. This approach differs from standard distance-based clustering, which does not exploit the sequential structure of the data. Our approach is based on the hmmmix model introduced by [Sha08]. We show how it can be trained with variational methods to achieve better results and make it more flexible. We show how this allows for soft patient clusterings and how it partly addresses the difficult issue of determining the number of clusters to use. We compare the performance of our method with that of [Sha08] using their original benchmark test as well as synthetic data generated from the hmmmix model itself. We show how our method can be parallelized and adapted to huge datasets.

Item Metadata

Title	Model-based clustering for aCGH data using variational EM
Creator	Alain, Guillaume
Publisher	University of British Columbia
Date Issued	2009
Extent	2502822 bytes
Genre	Thesis/Dissertation
Type	Text
File Format	application/pdf
Language	eng
Date Available	2009-08-11
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0051536
URI	http://hdl.handle.net/2429/11992
Degree (Theses)	Master of Science - MSc
Program (Theses)	Computer Science
Affiliation	Science, Faculty of; Computer Science, Department of
Degree Grantor	University of British Columbia
Graduation Date	2009-11
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace
