UBC Theses and Dissertations

Effect of misspecified response correlation in regression analysis

Chui, Grace Shung-Lai


One can imagine a possible loss of parameter estimation efficiency when response correlation is ignored or misspecified in modeling a response-covariate relationship. Under what conditions is efficiency lost? How much is lost? Whether the responses are correlated or independent, the distribution of least squares parameter estimates in linear models (Gaussian responses) can be readily determined from standard theory. We find that the linear regression analysis assuming independent responses is (theoretically) never more efficient than that incorporating response dependence. The "difference" in efficiencies between these two analyses, measured by how much more readily the latter detects a non-zero regression coefficient, generally increases as the coefficient-to-noise ratio increases. To incorporate response correlation in generalized linear model (GLM) parameter estimation, Liang & Zeger (1986) extended the quasi-likelihood theory and developed the generalized estimating equations (GEE) approach. Although GEE is a popular method, the effects of misspecifying response correlation (e.g. assuming independence when responses are correlated) on its parameter estimation efficiency are not obvious. To investigate such effects, we use simulation studies in which we generate count data and use the GEE approach to estimate the model parameters, using both the correct and misspecified correlation structures. The generated counts, the number of correlated responses in each cluster/replicate, and the total number of replicates are all small to imitate health impact studies (in which hospital admission counts are often the responses). Despite possible loss of parameter estimation efficiency due to such "obstacles" intrinsic in the model, simulation results indicate that the GEE approach produces:

1. regression parameter estimates with relatively small empirical biases using either a correct or misspecified response correlation;
2. a good estimate of the response correlation matrix if its structure is correctly specified;
3. naive and robust variance estimates, both of which estimate the true variance well when the response correlation structure is correctly specified; and
4. good robust variance estimates even when the response correlation is misspecified.

Furthermore, in a GLM with exchangeably correlated Poisson data and no covariates, specifying independence or exchangeable dependence yields the same intercept estimate and estimation efficiency, provided that inference is based on the robust variance estimate. The naive variance estimate can significantly underestimate the true variance if the responses are assumed independent when analyzing such a GLM.
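
The no-covariate Poisson case described in the abstract can be illustrated with a small simulation. What follows is a minimal sketch under stated assumptions, not the thesis's own code or design. It assumes a log link, equal cluster sizes, a dispersion parameter fixed at 1, and a working correlation `alpha` that is supplied rather than estimated. Exchangeable counts are generated with a shared-component construction (Y_ij = Z_i + E_ij), which is one convenient mechanism among several. All function names and the default settings (20 clusters of size 4, mean 3, correlation 0.5) are hypothetical. Under these assumptions the GEE estimating equation for the intercept has a closed form, so no iterative solver is needed.

```python
import math
import random
import statistics


def poisson_draw(rng, lam):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def make_clusters(rng, n_clusters, cluster_size, mu, rho):
    """Assumed mechanism: Y_ij = Z_i + E_ij, with Z_i shared within a cluster.

    Each Y_ij is Poisson(mu) and any two counts in a cluster have correlation rho.
    """
    clusters = []
    for _ in range(n_clusters):
        shared = poisson_draw(rng, rho * mu)
        clusters.append([shared + poisson_draw(rng, (1 - rho) * mu)
                         for _ in range(cluster_size)])
    return clusters


def gee_intercept_only(clusters, alpha):
    """Closed-form GEE fit of log(mu) = b0 with exchangeable working correlation.

    alpha = 0 is working independence. Returns (b0_hat, naive_var, robust_var),
    with the dispersion parameter assumed equal to 1.
    """
    weights = [1.0 / (1.0 + (len(y) - 1) * alpha) for y in clusters]
    mu_hat = (sum(w * sum(y) for w, y in zip(weights, clusters))
              / sum(w * len(y) for w, y in zip(weights, clusters)))
    # "Bread": sum of D' V^{-1} D; "meat": sum of squared cluster score terms.
    bread = sum(w * len(y) * mu_hat for w, y in zip(weights, clusters))
    meat = sum((w * (sum(y) - len(y) * mu_hat)) ** 2
               for w, y in zip(weights, clusters))
    return math.log(mu_hat), 1.0 / bread, meat / bread ** 2


def simulate(n_reps=2000, n_clusters=20, cluster_size=4, mu=3.0, rho=0.5, seed=1):
    """Fit under working independence and compare variance estimates."""
    rng = random.Random(seed)
    b0, naive, robust = [], [], []
    for _ in range(n_reps):
        data = make_clusters(rng, n_clusters, cluster_size, mu, rho)
        est, v_naive, v_robust = gee_intercept_only(data, alpha=0.0)
        b0.append(est)
        naive.append(v_naive)
        robust.append(v_robust)
    return {
        "empirical_var": statistics.variance(b0),
        "mean_naive_var": statistics.mean(naive),
        "mean_robust_var": statistics.mean(robust),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.5f}")
```

With equal cluster sizes the working-correlation weights cancel, so `gee_intercept_only(data, 0.0)` and `gee_intercept_only(data, 0.5)` return the same intercept and the same robust variance for any data set, and only the naive variance changes. Under the assumed settings, the naive variance from working independence should come out at roughly 1 / (1 + (cluster_size - 1) * rho) of the empirical variance, while the robust estimate should track it much more closely.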

For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use.