UBC Theses and Dissertations


DNA methylation microarray data reduction for co-methylation analysis

Gatev, Evan

Abstract

DNA methylation (DNAm) is an epigenetic modification that is present across the human genome, primarily in the context of CpG dinucleotides. In human population studies, high-throughput bead chip microarray assays are the prevalent way to simultaneously measure the methylation state of many thousands of genomic CpG sites. Proximal genomic CpGs have correlated methylation states within a single cell and often function as a single biological unit. The prevailing common methylation state of multiple CpGs within such a biological unit has been the subject of intense study, due to its immediate relevance for gene expression regulation and ultimately for health and disease. I designed and implemented a method for biologically motivated DNAm array data reduction, which constructs co-methylated regions (CMRs) while incorporating information about the genomic CpG background from the reference human genome annotation. The method relies on the correlations of CpG methylation across individuals for proximal CpG probes. It aims for enhanced statistical performance in terms of statistical power and specificity, including for downstream applications. For example, Epigenome Wide Association Studies (EWAS), an important such application, often focus on group “hits” with multiple adjacent significant CpGs, because their genomic proximity makes it more likely that the detected correlations are not spurious. The CMRs capture such groups, and I showed that the CMRs constructed in public whole blood data have high statistical specificity in the context of EWAS for chronological age and biological sex. When the composite CMR methylation measures were used to perform EWAS for age and sex, they had high sensitivity and specificity, and they uncovered additional associated CpGs not detected by conventional EWAS.
The utility of the data reduction method was further discussed within the broader context of applying machine learning algorithms to high-dimensional DNAm array data analysis.
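
The core idea of grouping proximal, correlated CpG probes can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the `max_gap` and `min_corr` thresholds, the use of Pearson correlation, and the per-sample median as a composite measure are all assumptions made for illustration, and the step that incorporates the genomic CpG background from the reference annotation is omitted entirely.

```python
import numpy as np


def build_cmrs(beta, positions, max_gap=1000, min_corr=0.5):
    """Group adjacent probes into candidate co-methylated regions.

    beta:      (n_samples, n_probes) methylation values, with probes sorted
               by genomic position on a single chromosome.
    positions: (n_probes,) probe coordinates in base pairs.
    max_gap, min_corr: hypothetical thresholds, chosen for illustration only.
    Returns a list of probe-index lists, one per region with >= 2 probes.
    """
    cmrs, current = [], [0]
    for j in range(1, beta.shape[1]):
        close = positions[j] - positions[j - 1] <= max_gap
        # Correlation across individuals between neighbouring probes
        # (Pearson here; the choice of coefficient is an assumption).
        r = np.corrcoef(beta[:, j - 1], beta[:, j])[0, 1]
        if close and r >= min_corr:
            current.append(j)
        else:
            if len(current) > 1:
                cmrs.append(current)
            current = [j]
    if len(current) > 1:
        cmrs.append(current)
    return cmrs


def composite(beta, cmr):
    """Per-sample median over the probes of one region (assumed summary)."""
    return np.median(beta[:, cmr], axis=1)


# Toy data: 10 individuals, 5 probes.
alt = np.array([-1.0, 1.0] * 5)        # alternating pattern
shared = np.linspace(0.2, 0.8, 10)     # signal shared by probes 0, 1, 2, 4
beta = np.column_stack([
    shared,
    shared + 0.01 * alt,
    shared - 0.01 * alt,
    0.5 + 0.2 * alt,                   # nearby but weakly correlated probe
    shared,                            # correlated but far away
])
positions = np.array([100, 250, 400, 600, 50000])

cmrs = build_cmrs(beta, positions)
summary = composite(beta, cmrs[0])
print(cmrs)
```

In this toy example only the first three probes are both close together and strongly correlated across individuals, so they form the single candidate region; the distant fifth probe is excluded despite carrying the same signal.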


Rights

Attribution-NonCommercial-NoDerivatives 4.0 International