UBC Theses and Dissertations

Application of supervised learning models to compare epigenetic predictors of gene expression across healthy breast cell types

Tello Palencia, Marco Antonio

Abstract

Moderate associations have been identified between gene expression and DNA methylation variability, predicted transcription factor binding sites, and transcription factor expression across multiple human tissues, including healthy mammary cells and diverse cancer-related cellular contexts. However, previous models summarized DNA methylation primarily at promoter regions, ignoring methylation variability in other genomic regions. In this thesis, I proposed using Variably Methylated Regions (VMRs) to summarize DNA methylation and hypothesized that models trained on VMR-derived features would outperform promoter-centered models in predicting individual gene expression across healthy mammary cell types. Results largely supported this hypothesis: VMR-based models predicted standardized individual gene expression in held-out samples better than their promoter-based counterparts. Additionally, the DNA methylation feature contributed the most to the performance of VMR-based models. Although all regression models had difficulty generalizing association patterns to unseen data, this thesis is the first study to use VMR-derived features, and to rigorously evaluate their contribution, to explain gene expression variability across healthy mammary cell types.

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International