Open Collections
UBC Theses and Dissertations
Model-based clustering for aCGH data using variational EM
Alain, Guillaume
Abstract
DNA copy number alterations (CNAs) are genetic changes that can produce adverse effects in numerous human diseases, including cancer. Copy number variations (of which CNAs are a subset) are common, yet little is known about the nature of many of these mutations. By clustering patients according to their CNA patterns, we can identify recurrent CNAs and better understand molecular heterogeneity. This approach differs from standard distance-based clustering, which does not exploit the sequential structure of the data. Our approach is based on the hmmmix model introduced by [Sha08]. We show how it can be trained with variational methods to achieve better results and to make it more flexible. We show how this allows for soft patient clusterings and how it partly addresses the difficult issue of determining the number of clusters to use. We compare the performance of our method with that of [Sha08] using their original benchmark test as well as synthetic data generated from the hmmmix model itself. We also show how our method can be parallelized and adapted to very large datasets.
Item Metadata
Title |
Model-based clustering for aCGH data using variational EM
|
Creator |
Alain, Guillaume
|
Publisher |
University of British Columbia
|
Date Issued |
2009
|
Extent |
2502822 bytes
|
Genre | |
Type | |
File Format |
application/pdf
|
Language |
eng
|
Date Available |
2009-08-11
|
Provider |
Vancouver : University of British Columbia Library
|
Rights |
Attribution-NonCommercial-NoDerivatives 4.0 International
|
DOI |
10.14288/1.0051536
|
URI | |
Degree | |
Program | |
Affiliation | |
Degree Grantor |
University of British Columbia
|
Graduation Date |
2009-11
|
Campus | |
Scholarly Level |
Graduate
|
Rights URI | |
Aggregated Source Repository |
DSpace
|