Open Collections
UBC Theses and Dissertations
Model based approaches to array CGH data analysis
Shah, Sohrab P.
Abstract
DNA copy number alterations (CNAs) are genetic changes that can produce adverse effects in numerous human diseases, including cancer. CNAs are segments of DNA that have been deleted or amplified and can range in size from one kilobase to whole chromosome arms. Development of array comparative genomic hybridization (aCGH) technology enables CNAs to be measured at sub-megabase resolution using tens of thousands of probes. However, aCGH data are noisy and yield continuous-valued measurements of the discrete CNAs. Consequently, the data must be processed through algorithmic and statistical techniques in order to derive meaningful biological insights. We introduce model-based approaches to analysis of aCGH data and develop state-of-the-art solutions to three distinct analytical problems.
In the simplest scenario, the task is to infer CNAs from a single aCGH experiment. We apply a hidden Markov model (HMM) to accurately identify CNAs from aCGH data. We show that borrowing statistical strength across chromosomes and explicitly modeling outliers in the data improves on baseline models.
In the second scenario, we wish to identify recurrent CNAs in a set of aCGH data derived from a patient cohort. These are locations in the genome altered in many patients, providing evidence for CNAs that may be playing important molecular roles in the disease. We develop a novel hierarchical HMM profiling method that explicitly models both statistical and biological noise in the data and is capable of producing a representative profile for a set of aCGH experiments. We demonstrate that our method is more accurate than simpler baselines on synthetic data, and show that our model produces output that is more interpretable than that of other methods.
Finally, we develop a model-based clustering framework to stratify a patient cohort that is expected to be composed of a fixed set of molecular subtypes. We introduce a model that jointly infers CNAs, assigns patients to subgroups, and infers the profiles that represent each subgroup. We show our model to be more accurate on synthetic data, and show in two patient cohorts how the model discovers putative novel subtypes and clinically relevant subgroups.
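As a rough illustration of the single-experiment scenario, the sketch below decodes a sequence of aCGH log2 ratios with a plain three-state HMM using the Viterbi algorithm. It is a baseline of the kind the abstract compares against, not the thesis's model: it does not borrow strength across chromosomes or model outliers. The state names, emission means, standard deviation, and self-transition probability are all assumed values chosen for illustration.

```python
import math

# Hypothetical three-state model: the states, means, and probabilities below
# are illustrative assumptions, not parameters taken from the thesis.
STATES = ("loss", "neutral", "gain")
MEANS = {"loss": -0.5, "neutral": 0.0, "gain": 0.5}  # assumed log2-ratio means
SIGMA = 0.2   # assumed shared emission standard deviation
STAY = 0.98   # assumed probability of remaining in the same state


def log_gauss(x, mean, sigma):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mean) ** 2 / (2 * sigma ** 2)


def viterbi(log_ratios):
    """Return the most probable state sequence for a list of log2 ratios."""
    if not log_ratios:
        return []
    n = len(STATES)
    log_stay = math.log(STAY)
    log_move = math.log((1 - STAY) / (n - 1))
    # Uniform prior over the initial state.
    score = {s: -math.log(n) + log_gauss(log_ratios[0], MEANS[s], SIGMA) for s in STATES}
    back = []
    for x in log_ratios[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # Best predecessor for state s, accounting for the transition cost.
            best_prev = max(STATES, key=lambda p: score[p] + (log_stay if p == s else log_move))
            trans = log_stay if best_prev == s else log_move
            new_score[s] = score[best_prev] + trans + log_gauss(x, MEANS[s], SIGMA)
            pointers[s] = best_prev
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

Calling `viterbi` on a list of per-probe log2 ratios for one chromosome returns one label per probe. Because the assumed self-transition probability is high, a single deviating probe tends to be absorbed into the surrounding segment, while a run of consistently shifted probes is called as a gain or loss.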
Item Metadata
Title | Model based approaches to array CGH data analysis |
Creator | Shah, Sohrab P. |
Publisher | University of British Columbia |
Date Issued | 2008 |
Extent | 15032556 bytes |
Genre | |
Type | |
File Format | application/pdf |
Language | eng |
Date Available | 2008-11-24 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
DOI | 10.14288/1.0051341 |
URI | |
Degree | |
Program | |
Affiliation | |
Degree Grantor | University of British Columbia |
Graduation Date | 2009-05 |
Campus | |
Scholarly Level | Graduate |
Rights URI | |
Aggregated Source Repository | DSpace |