BIRS Workshop Lecture Videos
Bayesian feature screening for big neuroimaging data via massively parallel computing
Kang, Jian
Motivated by the need to select important features from big neuroimaging data, we develop a new Bayesian feature screening approach in the generalized linear model (GLM) framework. We assign conjugate priors to the coefficients and obtain the analytical form of the marginal posterior density function. Under mild regularity conditions, we show that the marginal posterior moments follow a mixture of normal distributions. Building on this theoretical foundation, we develop a two-step Bayesian variable screening algorithm for ultra-high-dimensional data. Step 1 computes a multivariate variable screening statistic based on the marginal posterior moments. Step 2 performs mixture-model-based cluster analysis on the screening statistics to identify the unimportant variables. Step 1 has computational complexity linear in the number of predictors and is straightforward to parallelize; it is closely connected to sure independence screening (SIS) statistics and to high-dimensional ordinary least-squares projection (HOLP). Step 2 extends local false discovery rate (FDR) analysis. We implement the method with massively parallel computing based on general-purpose computing on graphics processing units (GPGPU), yielding an ultra-fast variable screening procedure. Our simulation studies show that the proposed approach can screen one million predictors within seconds and achieves higher selection accuracy than existing methods. We also illustrate the method with an analysis of resting-state functional magnetic resonance imaging (rs-fMRI) data from the Autism Brain Imaging Data Exchange (ABIDE) study.
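The two-step structure described in the abstract can be illustrated with a small serial sketch. This is not the author's method. Step 1 here uses a simple univariate statistic in the spirit of SIS (the standardized posterior mean of each predictor's coefficient in a Gaussian linear model with a conjugate N(0, tau2) prior and known noise variance sigma2) instead of the multivariate statistic from the talk. Step 2 fits a two-component, zero-mean normal mixture by EM and thresholds the local fdr. The function names, the hyperparameters `sigma2` and `tau2`, the simulation settings, and the 0.2 cutoff are all hypothetical choices made for illustration. Each predictor's statistic depends only on its own column, which is what makes Step 1 linear in the number of predictors and easy to spread across GPU threads; the code simply loops in plain Python.

```python
import math
import random


def screening_stats(columns, y, sigma2=1.0, tau2=1.0):
    """Step 1 stand-in (hypothetical): standardized marginal posterior mean per predictor.

    Predictor j is fit alone as y = x_j * b_j + noise, with conjugate prior
    b_j ~ N(0, tau2) and known noise variance sigma2 (assumed fixed values).
    Columns are independent, so the cost is O(n * p) and the loop is
    trivially parallel across predictors.
    """
    stats = []
    for x in columns:
        xtx = sum(v * v for v in x)
        xty = sum(a * b for a, b in zip(x, y))
        precision = xtx / sigma2 + 1.0 / tau2      # posterior precision of b_j
        mean = (xty / sigma2) / precision          # posterior mean of b_j
        stats.append(mean * math.sqrt(precision))  # posterior mean / posterior sd
    return stats


def _sigmoid(t):
    # Numerically stable logistic function: math.exp never gets a large positive argument.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def _log_odds(z2, pi1, v0, v1):
    # log[pi1 * N(z; 0, v1)] - log[(1 - pi1) * N(z; 0, v0)] for each squared statistic.
    c = math.log(pi1 / (1.0 - pi1)) - 0.5 * math.log(v1 / v0)
    k = 0.5 * (1.0 / v0 - 1.0 / v1)
    return [c + k * s2 for s2 in z2]


def local_fdr_screen(stats, threshold=0.2, iters=500, tol=1e-10):
    """Step 2 stand-in (hypothetical): fit pi0*N(0, v0) + pi1*N(0, v1), v1 > v0, by EM,
    then treat predictors whose local fdr P(null | z) exceeds `threshold` as unimportant."""
    z2 = [z * z for z in stats]
    n = len(z2)
    v0 = max(sorted(z2)[n // 2] / 0.4549, 1e-8)  # median of chi^2_1 is about 0.4549
    v1, pi1 = 10.0 * v0, 0.1
    for _ in range(iters):
        # E-step: responsibility of the wide ("important") component.
        r = [_sigmoid(t) for t in _log_odds(z2, pi1, v0, v1)]
        w1 = sum(r)
        w0 = n - w1
        # M-step: closed-form updates for the mixing weight and the two variances.
        new_pi1 = min(max(w1 / n, 1e-6), 1.0 - 1e-6)
        new_v1 = max(sum(ri * s2 for ri, s2 in zip(r, z2)) / max(w1, 1e-12), 1e-8)
        new_v0 = max(sum((1.0 - ri) * s2 for ri, s2 in zip(r, z2)) / max(w0, 1e-12), 1e-8)
        if new_v1 < new_v0:  # keep component 1 as the wider one
            new_v0, new_v1, new_pi1 = new_v1, new_v0, 1.0 - new_pi1
        done = (abs(new_pi1 - pi1) < tol and abs(new_v0 - v0) < tol * v0
                and abs(new_v1 - v1) < tol * v1)
        pi1, v0, v1 = new_pi1, new_v0, new_v1
        if done:
            break
    lfdr = [1.0 - _sigmoid(t) for t in _log_odds(z2, pi1, v0, v1)]
    keep = [j for j, f in enumerate(lfdr) if f <= threshold]
    return keep, lfdr


# Made-up simulation: 400 samples, 1000 predictors, the first five with coefficient 2.0.
random.seed(0)
n_obs, n_pred, active = 400, 1000, range(5)
columns = [[random.gauss(0.0, 1.0) for _ in range(n_obs)] for _ in range(n_pred)]
y = [sum(2.0 * columns[j][i] for j in active) + random.gauss(0.0, 1.0)
     for i in range(n_obs)]
keep, lfdr = local_fdr_screen(screening_stats(columns, y))
print("retained predictors:", keep)
```

Thresholding the local fdr, rather than keeping a fixed top-k list, lets the fitted mixture decide how many variables survive; on this synthetic example the retained set is expected to contain the five active predictors and little else.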
License: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International