False discovery rate estimation for high-dimensional regression models

UBC Theses and Dissertations

False discovery rate estimation for high-dimensional regression models, by Yuan, Ming

Abstract

A genome-wide association study (GWAS) aims to identify genetic variants that are statistically associated with phenotypes. However, linkage disequilibrium (LD), the strong local dependence between single-nucleotide polymorphisms (SNPs) that characterizes large-scale genomic datasets, usually makes it challenging to distinguish the actual causal variants from their associated proxies. In this work, we propose a Bayesian variable selection method called the sparse mixed Gaussian prior for generalized linear models (SMG-GLM). It is an efficient high-dimensional Bayesian variable selection approach designed for arbitrary relationships between variants and phenotypes. It also calibrates selection uncertainty, which many popular variable selection methods do not address, by estimating posterior inclusion probabilities. We additionally combine SMG-GLM with knockoffs, a combination we name SMG-knockoffs, to account for the collinearity caused by LD. The SMG-knockoffs method supports inference on the variable selection result and controls the false discovery rate at a target level. Its ability to discover causal variables while controlling the false discovery rate at the desired level is demonstrated in simulation studies conducted on a GWAS dataset.
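
To make the idea of controlling the false discovery rate with knockoffs concrete, the following is a minimal sketch of the generic knockoff+ selection step from the knockoff filter literature. It is not the SMG-knockoffs procedure itself: the abstract does not say how the thesis computes its feature statistics, so the input `w` (one statistic per variable, large and positive when the real variable beats its knockoff copy) and the function names are assumptions made for illustration only.

```python
from typing import List, Sequence


def knockoff_plus_threshold(w: Sequence[float], q: float) -> float:
    """Return the knockoff+ threshold for statistics w at target FDR q.

    The threshold is the smallest t among the nonzero |w_j| such that
    (1 + #{j: w_j <= -t}) / max(1, #{j: w_j >= t}) <= q.
    Returns infinity when no candidate satisfies the bound.
    """
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        # Negative statistics estimate how many selections are false.
        false_estimate = 1 + sum(1 for x in w if x <= -t)
        n_selected = max(1, sum(1 for x in w if x >= t))
        if false_estimate / n_selected <= q:
            return t
    return float("inf")


def knockoff_select(w: Sequence[float], q: float) -> List[int]:
    """Indices of variables whose statistic reaches the knockoff+ threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


# Hypothetical statistics for ten variables (illustrative values only).
w_example = [5.0, 4.0, 3.5, 3.0, 2.5, -0.5, 0.2, -0.1, 2.0, 1.5]
selected = knockoff_select(w_example, q=0.2)
```

With these illustrative values, the ratio first drops below 0.2 at the threshold 1.5, so seven variables are selected. At a stricter target such as `q=0.1` no threshold qualifies and nothing is selected, which is how the filter trades power for a tighter false discovery bound.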

Item Metadata

Title	False discovery rate estimation for high-dimensional regression models
Creator	Yuan, Ming
Supervisor	Park, Yongjin P.
Publisher	University of British Columbia
Date Issued	2022
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2022-10-21
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0421409
URI	http://hdl.handle.net/2429/82959
Degree (Theses)	Master of Science - MSc
Program (Theses)	Statistics
Affiliation	Science, Faculty of; Statistics, Department of
Degree Grantor	University of British Columbia
Graduation Date	2022-11
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace
