UBC Theses and Dissertations

Interpretable clustering of epigenetic marks by incorporating their relationships to genes and their functions

Jafarzadeh, Sina

Abstract

Recent advances in high-throughput technologies have allowed researchers to measure epigenetic information, such as the methylation levels of CpG sites or the accessibility levels of chromatin, for hundreds of thousands of genomic regions. Many statistical methods have been developed to cluster these epigenetic measurements into contiguous, functional regions involved in biological processes or disease. In this project, I proposed a new approach for clustering epigenetic marks into regions. The proposed model defines each region as the set of epigenetic marks located within a predefined window around the transcription start site of a gene. The resulting one-to-one mapping between regions and genes helps elucidate the epigenetic function of each region through the functions of the gene mapped to it. The model uses a weighted linear combination of the mark values in each region to construct a gene-level representation for that region, and the weights are estimated using a scalable coordinate descent optimization algorithm. I evaluated the quality of the inferred gene-level representations on two types of epigenetic data: chromatin accessibility and DNA methylation. On the chromatin accessibility data, the gene-level representations inferred by the proposed model captured the variation in gene expression levels across samples more accurately than the baseline methods. Model performance declined on the DNA methylation data. To address this observation, I investigated how the type and quality of the epigenetic data affect model performance and offered a set of recommendations for using the proposed model effectively.
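The window-based region definition and the weighted linear combination described above can be illustrated with a minimal Python sketch. The abstract does not state the objective that the thesis optimizes, so the ridge-penalized least-squares objective, the target vector `y`, and every function and parameter name below (`marks_in_window`, `gene_level_representation`, `fit_weights_cd`, `lam`, `n_iter`) are assumptions chosen for illustration only, not the author's method.

```python
import numpy as np


def marks_in_window(positions, tss, window):
    """Indices of marks located within +/- `window` bp of a start site.

    Hypothetical helper: the abstract only says "a predefined window".
    """
    positions = np.asarray(positions)
    return np.flatnonzero(np.abs(positions - tss) <= window)


def gene_level_representation(X, w):
    """Weighted linear combination of mark values.

    X has shape (samples, marks in the region); the result has one
    value per sample.
    """
    return X @ w


def fit_weights_cd(X, y, lam=0.1, n_iter=200):
    """Coordinate descent on an ASSUMED objective:

        0.5 * ||y - X w||^2 + 0.5 * lam * ||w||^2

    Each coordinate is updated in closed form while the others are
    held fixed. The thesis objective may differ.
    """
    n_samples, n_marks = X.shape
    w = np.zeros(n_marks)
    residual = y - X @ w
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(n_marks):
            # Add back mark j's contribution to get the partial residual.
            residual += X[:, j] * w[j]
            # Closed-form minimizer for coordinate j.
            w[j] = X[:, j] @ residual / (col_sq[j] + lam)
            # Subtract the updated contribution again.
            residual -= X[:, j] * w[j]
    return w
```

In this sketch, a region would be built by calling `marks_in_window` once per gene, slicing the columns of the sample-by-mark matrix with the returned indices, fitting weights on that slice, and passing them to `gene_level_representation`. Updating one weight at a time keeps each step cheap, which is the usual reason coordinate descent scales to many regions.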

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International