UBC Theses and Dissertations
Interpretable clustering of epigenetic marks by incorporating their relationships to genes and their functions
Jafarzadeh, Sina
Abstract
Recent advances in high-throughput technologies have allowed researchers to measure epigenetic information, such as the methylation levels of CpG sites or the accessibility of chromatin, for hundreds of thousands of genomic regions. Many statistical methods have been developed to cluster these epigenetic measurements into contiguous, functional regions involved in biological processes or disease. In this project, I proposed a new approach for clustering epigenetic marks into regions. The proposed model defines each region as the set of epigenetic marks located within a predefined window around the transcription start site (TSS) of a gene. Because of this one-to-one mapping between regions and genes, the epigenetic function of a region can be interpreted through the known functions of the gene mapped to it. The model uses a weighted linear combination of the mark values in each region to construct a gene-level representation of that region, and the weights of the marks in each region are estimated with a scalable coordinate descent optimization algorithm. I evaluated the quality of the inferred gene-level representations on two types of epigenetic data: chromatin accessibility and DNA methylation. On the chromatin accessibility data, the gene-level representations inferred by the proposed model captured the variation in gene expression levels across samples more accurately than the baseline methods. On the DNA methylation data, the model performed worse. To understand this decline, I investigated how the type and quality of the epigenetic data affect model performance, and I offered a set of recommendations for using the proposed model effectively.
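The abstract does not state the exact objective that the coordinate descent minimizes, so the following is only a minimal Python sketch under stated assumptions, not the thesis's implementation. It assumes that a mark belongs to a gene's region when it lies within a hypothetical +/- window (in bp) of that gene's TSS on the same chromosome. It also assumes that each region's weights w are fit by cyclic coordinate descent on a ridge-penalized least-squares objective against a hypothetical per-gene target y, for example that gene's expression across samples. The gene-level value for sample s is then the weighted sum of the region's marks, sum_j w_j * x_j(s). The function names marks_in_window, fit_region_weights, and gene_representation are illustrative.

```python
import numpy as np


def marks_in_window(mark_pos, tss, window):
    """Indices of marks within +/- `window` bp of a TSS.

    ASSUMPTION: all positions lie on the same chromosome.
    """
    return np.flatnonzero(np.abs(mark_pos - tss) <= window)


def fit_region_weights(X, y, lam=1.0, n_iter=100, tol=1e-8):
    """Fit one region's mark weights by cyclic coordinate descent.

    ASSUMPTION: the objective is supervised ridge regression,
        0.5 * ||y - X w||^2 + 0.5 * lam * ||w||^2,
    where y is a hypothetical per-gene target (e.g. expression).
    The thesis's actual objective may differ.

    X: (n_samples, n_marks) mark values for one region.
    y: (n_samples,) target values; lam must be > 0.
    """
    n, p = X.shape
    w = np.zeros(p)
    resid = y.astype(float)             # residual y - X w, with w = 0 at start
    col_sq = (X ** 2).sum(axis=0)       # X_j^T X_j for every mark j
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Remove mark j's current contribution from the fit
            resid += X[:, j] * w[j]
            # Closed-form minimizer over w_j with all other weights fixed
            new_wj = (X[:, j] @ resid) / (col_sq[j] + lam)
            # Put the updated contribution back into the residual
            resid -= X[:, j] * new_wj
            max_change = max(max_change, abs(new_wj - w[j]))
            w[j] = new_wj
        if max_change < tol:            # stop once no weight moves noticeably
            break
    return w


def gene_representation(X, w):
    """Gene-level value per sample: weighted sum of the region's marks."""
    return X @ w
```

For example, with marks at positions 900, 1200, 5000 and 10100, a TSS at 1000 and a 500 bp window, marks_in_window selects the first two marks. Coordinate descent fits this setting because each update is a scalar closed-form solve that costs O(n_samples) once the residual is cached. Each region is also fit independently of the others, so the work parallelizes across genes.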
Item Metadata
Title |
Interpretable clustering of epigenetic marks by incorporating their relationships to genes and their functions
|
Creator |
Jafarzadeh, Sina
|
Publisher |
University of British Columbia
|
Date Issued |
2020
|
Language |
eng
|
Date Available |
2020-12-15
|
Provider |
Vancouver : University of British Columbia Library
|
Rights |
Attribution-NonCommercial-NoDerivatives 4.0 International
|
DOI |
10.14288/1.0395309
|
Degree Grantor |
University of British Columbia
|
Graduation Date |
2021-05
|
Scholarly Level |
Graduate
|
Aggregated Source Repository |
DSpace
|