UBC Theses and Dissertations
Probabilistic modeling of high-throughput sequencing data for enhanced understanding of DNA methylation heterogeneity
Shen, Ning
Abstract
DNA methylation is a key epigenetic mechanism governing gene regulation and cellular identity. Advances in high-throughput sequencing technologies have enabled detailed investigation of methylation landscapes across single cells and complex tissue mixtures. However, the sparsity and noise inherent in single-cell data, as well as the signal distortion in enrichment-based platforms, pose major analytical challenges. This thesis presents two novel statistical frameworks to address these limitations and advance the computational toolkit for DNA methylation analysis.
The first contribution is vmrseq, a probabilistic method and software for detecting variably methylated regions from single-cell bisulfite sequencing data. vmrseq integrates a smoothing-based strategy for candidate region identification with hidden Markov modeling to account for spatial correlation and technical noise. Through extensive benchmarking on synthetic and experimental datasets, vmrseq demonstrates improved precision and biological relevance in identifying methylation heterogeneity, supporting downstream analyses such as unsupervised clustering and cell-type-specific marker discovery.
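Readers unfamiliar with hidden Markov modeling of methylation calls may find a small generic example useful. The sketch below is not vmrseq's model or code. It is a textbook two-state HMM over consecutive CpG sites in a single cell (hidden states: unmethylated/methylated), decoded with the Viterbi algorithm. The function name `viterbi_methylation` and its parameters are hypothetical: `p_stay` (probability that neighboring CpGs share a state) stands in for spatial correlation, and `error` (probability that an observed call disagrees with the hidden state) stands in for technical noise. vmrseq's smoothing-based candidate-region step is not shown.

```python
import math


def viterbi_methylation(calls, p_stay=0.9, error=0.05):
    """Most likely hidden methylation state path for one cell (generic HMM sketch).

    calls:  0/1 methylation calls at consecutive CpGs (1 = methylated).
    p_stay: hypothetical probability that the next CpG keeps the same state.
    error:  hypothetical probability that a call disagrees with the true state.
    """
    if not calls:
        raise ValueError("calls must be non-empty")
    states = (0, 1)  # 0 = unmethylated, 1 = methylated
    log_trans = {
        (s, t): math.log(p_stay if s == t else 1.0 - p_stay)
        for s in states
        for t in states
    }

    def log_emit(state, call):
        # Technical noise: the observed call flips with probability `error`.
        return math.log(1.0 - error if call == state else error)

    # Uniform initial distribution over the two hidden states.
    score = [math.log(0.5) + log_emit(s, calls[0]) for s in states]
    backptrs = []
    for call in calls[1:]:
        new_score, ptr = [], []
        for t in states:
            # Best previous state leading into state t at this CpG.
            best = max(states, key=lambda s: score[s] + log_trans[(s, t)])
            new_score.append(score[best] + log_trans[(best, t)] + log_emit(t, call))
            ptr.append(best)
        score = new_score
        backptrs.append(ptr)

    # Trace back from the highest-scoring final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    toy_calls = [1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0]
    print(viterbi_methylation(toy_calls))  # [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
```

With these settings the isolated unmethylated call at the fourth CpG is treated as noise, because two state switches cost more than one mismatched emission, while the run of five unmethylated calls is decoded as a genuine state change.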
The second contribution is decemedip, a hierarchical Bayesian model and software for cell-type deconvolution of enrichment-based methylation data such as MeDIP-seq. By leveraging reference panels derived from alternative platforms and modeling the complex relationship between methylation levels, CpG density, and read counts, decemedip enables accurate estimation of cell-type proportions with uncertainty quantification. Its performance is validated through simulations, cross-platform comparisons, and real-world applications involving patient-derived xenografts and circulating cell-free DNA from cancer cohorts.
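To illustrate only the quantity that deconvolution estimates, the sketch below treats a bulk sample's methylation level at each region as a proportion-weighted mixture of reference cell-type profiles and recovers the proportions by least squares constrained to the probability simplex, solved with projected gradient descent. This is a swapped-in technique, not decemedip's hierarchical Bayesian model: it ignores MeDIP enrichment, CpG density, and read-count noise, and it returns point estimates without uncertainty. The names `estimate_proportions` and `project_to_simplex`, the step size, and the toy reference values are all hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of vector v onto {x : x >= 0, sum(x) = 1}."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - 1.0) / j
        if uj - t > 0:
            theta = t  # the last j satisfying the condition fixes the threshold
    return [max(x - theta, 0.0) for x in v]


def estimate_proportions(bulk, reference, steps=5000, lr=0.1):
    """Simplex-constrained least squares: bulk[i] ~ sum_k reference[i][k] * pi[k].

    bulk:      methylation level (0..1) of the mixed sample at each region.
    reference: reference[i][k] = methylation level of cell type k at region i.
    """
    n_regions, n_types = len(bulk), len(reference[0])
    pi = [1.0 / n_types] * n_types  # start from equal proportions
    for _ in range(steps):
        resid = [
            sum(reference[i][k] * pi[k] for k in range(n_types)) - bulk[i]
            for i in range(n_regions)
        ]
        # Gradient of the mean squared residual with respect to pi.
        grad = [
            2.0 * sum(reference[i][k] * resid[i] for i in range(n_regions)) / n_regions
            for k in range(n_types)
        ]
        # Gradient step, then project back onto valid proportions.
        pi = project_to_simplex([pi[k] - lr * grad[k] for k in range(n_types)])
    return pi


if __name__ == "__main__":
    # Toy reference panel: 3 regions x 2 hypothetical cell types.
    reference = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
    true_pi = [0.3, 0.7]
    bulk = [sum(r[k] * true_pi[k] for k in range(2)) for r in reference]
    print(estimate_proportions(bulk, reference))  # close to [0.3, 0.7]
```

Projecting onto the simplex after each step keeps the estimates non-negative and summing to one, which is the constraint any cell-type proportion estimate must satisfy; a Bayesian formulation instead encodes this through the prior on the proportions and yields a posterior rather than a single point.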
Together, these methods address critical gaps in the analysis of high-throughput DNA methylation data, enabling robust detection of epigenetic heterogeneity across biological contexts. The associated open-source software implementations provide practical tools for future epigenomic research and potential clinical applications.
Item Metadata
| Field | Value |
|---|---|
| Title | Probabilistic modeling of high-throughput sequencing data for enhanced understanding of DNA methylation heterogeneity |
| Creator | Shen, Ning |
| Publisher | University of British Columbia |
| Date Issued | 2025 |
| Language | eng |
| Date Available | 2025-10-02 |
| Provider | Vancouver : University of British Columbia Library |
| Rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
| DOI | 10.14288/1.0450299 |
| Degree Grantor | University of British Columbia |
| Graduation Date | 2025-11 |
| Scholarly Level | Graduate |
| Aggregated Source Repository | DSpace |