UBC Theses and Dissertations

Subgroup-specific regression models

Yaghoubi, Marjan


Data sets are becoming massive as data collection technologies advance, and they are altering the nature of biomedical research. With many techniques, these huge data sets can be challenging, or even impossible, to analyse accurately. In biomedical settings, data sets are frequently heterogeneous, with samples representing various subtypes of disease that are thought to differ in their underlying biology. A motivating example is the study of progressive diseases such as Alzheimer's disease (AD). Although the number of studies that concentrate on regression modeling of disease progression has increased significantly, they ignore the fact that the patterns of change are profoundly different for patients with different initial profiles. Estimating separate models for each subgroup is extremely difficult because of small sample sizes in the high-dimensional setting, but it may yield results that are more accurate and reliable. Moreover, recognizing homogeneous subgroups of predictors can be cumbersome in high-dimensional regression analysis over subgroups of samples. This thesis attempts to improve upon an established method of regularized regression for group-structured datasets by using a linear combination of two penalty functions to select predictive clusters of correlated variables and to allow for subgroup-specific parameter estimates. To showcase the performance of the suggested methodology, we conducted a series of experiments on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, which includes three groups of subjects, Cognitively Normal Controls (CN), Late Mild Cognitive Impairment (LMCI), and Alzheimer's disease (AD), to estimate Mini-Mental State Examination (MMSE) scores at multiple future time points. Results reveal the effectiveness of the suggested method in terms of Root Mean Square Error (RMSE) over several well-known statistical methods in two subgroups, AD and LMCI. In the CN group, however, our proposed method performed better than the other methods at two time points. We also compared the prediction performance of our proposed method with multiple multi-task learning regression methods.


Attribution-NonCommercial-NoDerivatives 4.0 International