Open Collections
UBC Theses and Dissertations
False discovery rate estimation for high-dimensional regression models Yuan, Ming
Abstract
A genome-wide association study (GWAS) aims to identify genetic variants that are statistically associated with phenotypes. However, linkage disequilibrium (LD), the strong local dependence between single-nucleotide polymorphisms (SNPs) that characterizes large-scale genomic datasets, usually makes it difficult to distinguish the actual causal variants from their associated proxies. In this work, we propose a Bayesian variable selection method called the sparse mixed Gaussian prior for generalized linear models (SMG-GLM). It is an efficient high-dimensional Bayesian variable selection approach designed for arbitrary relationships between variants and phenotypes. It also calibrates the selection uncertainty, which many popular variable selection methods do not address, by estimating posterior inclusion probabilities. We additionally combine SMG-GLM with knockoffs, a combination we call SMG-knockoffs, to account for the collinearity problem caused by LD. The SMG-knockoffs method supports inference on the variable selection result and controls the false discovery rate at a desired level. Simulation studies conducted on a GWAS dataset show that it discovers causal variables while keeping the false discovery rate at the desired level.
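To illustrate how a knockoff-based procedure can control the false discovery rate, here is a minimal sketch of the generic knockoff+ selection rule from the knockoff filter literature. It is not the thesis's SMG-knockoffs implementation. The function names and the input `W` are hypothetical: `W` is assumed to hold one importance statistic per variable, where a large positive value means the original variable looks more important than its knockoff copy (for example, a difference in posterior inclusion probabilities, which is an assumption here rather than something the abstract states).

```python
import numpy as np


def knockoff_plus_threshold(W, q):
    """Return the knockoff+ threshold for statistics W at target FDR level q.

    The threshold is the smallest t among the nonzero |W_j| such that
    (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q.
    Returns np.inf when no candidate satisfies the bound.
    """
    W = np.asarray(W, dtype=float)
    # Candidate thresholds are the distinct nonzero magnitudes, smallest first.
    candidates = np.sort(np.unique(np.abs(W[W != 0])))
    for t in candidates:
        # Negative statistics estimate how many false positives lie above t.
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf


def knockoff_select(W, q):
    """Return indices of variables whose statistic meets the threshold."""
    W = np.asarray(W, dtype=float)
    t = knockoff_plus_threshold(W, q)
    return np.flatnonzero(W >= t)


if __name__ == "__main__":
    # Hypothetical statistics for ten variables, purely for illustration.
    W_demo = [5.0, 4.0, 3.5, 3.0, 2.5, 2.0, -1.0, 0.5, -0.2, 1.5]
    print(knockoff_select(W_demo, q=0.3))
```

The "+1" in the numerator is what distinguishes knockoff+ from the plain knockoff threshold; it makes the rule more conservative when few variables are selected.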
Item Metadata
Title | False discovery rate estimation for high-dimensional regression models
Publisher | University of British Columbia
Date Issued | 2022
Language | eng
Date Available | 2022-10-21
Provider | Vancouver : University of British Columbia Library
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International
DOI | 10.14288/1.0421409
Degree Grantor | University of British Columbia
Graduation Date | 2022-11
Scholarly Level | Graduate
Aggregated Source Repository | DSpace