Effect of misspecified response correlation in regression analysis

UBC Theses and Dissertations

Chui, Grace Shung-Lai

Abstract

One can imagine a possible loss of parameter estimation efficiency when response correlation is ignored or misspecified in modeling a response-covariate relationship. Under what conditions is efficiency lost? How much is lost? For linear models with Gaussian responses, the distribution of the least squares parameter estimates can be readily determined from standard theory, whether the responses are correlated or independent. We find that the linear regression analysis assuming independent responses is (theoretically) never more efficient than that incorporating response dependence. The "difference" in efficiencies between these two analyses, measured by how much more readily the latter detects a non-zero regression coefficient, generally increases as the coefficient-to-noise ratio increases.

To incorporate response correlation in GLM parameter estimation, Liang & Zeger (1986) extended the quasi-likelihood theory and developed the generalized estimating equations (GEE) approach. Although GEE is a popular method, the effects of misspecifying response correlation (e.g. assuming independence when responses are correlated) on its parameter estimation efficiency are not obvious. To investigate such effects, we use simulation studies in which we generate count data and use the GEE approach to estimate the model parameters, using both the correct and misspecified correlation structures. The generated counts, the number of correlated responses in each cluster/replicate, and the total number of replicates are all small to imitate health impact studies (in which hospital admission counts are often the responses). Despite possible loss of parameter estimation efficiency due to such "obstacles" intrinsic in the model, simulation results indicate that the GEE approach produces:

1. regression parameter estimates with relatively small empirical biases using either a correct or misspecified response correlation;
2. a good estimate of the response correlation matrix if its structure is correctly specified;
3. naive and robust variance estimates, both of which estimate the true variance well when the response correlation structure is correctly specified; and
4. good robust variance estimates even when the response correlation is misspecified.

Furthermore, in a GLM with exchangeably correlated Poisson data and no covariates, specifying independence or exchangeable dependence yields the same intercept estimate and estimation efficiency, provided that inference is based on the robust variance estimate. The naive variance estimate can significantly underestimate the true variance if the responses are assumed independent when analyzing such a GLM.
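
The closing claim about the no-covariate Poisson case can be illustrated with a small simulation. The sketch below is not the thesis's simulation design; it is a minimal illustration under stated assumptions. Exchangeable correlation is induced by adding a shared Poisson component to every count in a cluster (an assumed construction, valid only for 0 <= rho <= 1), the analysis works on the mean scale rather than the log-link intercept, and the function names and the values mu = 2, rho = 0.5, n = 4, K = 10 are hypothetical choices. With equal cluster sizes the estimate is the grand mean under either working correlation, so only the variance estimates differ.

```python
import math
import random
from statistics import mean, pvariance


def poisson(rng, lam):
    """Draw one Poisson(lam) variate by Knuth's method (adequate for small lam)."""
    if lam <= 0:
        return 0
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_dataset(rng, mu, rho, n, K):
    """K clusters of n counts, each marginally Poisson(mu) with pairwise correlation rho.

    Assumed construction: y_ij = shared_i + own_ij, with shared_i ~ Poisson(rho * mu)
    and own_ij ~ Poisson((1 - rho) * mu).
    """
    clusters = []
    for _ in range(K):
        shared = poisson(rng, rho * mu)
        clusters.append([shared + poisson(rng, (1 - rho) * mu) for _ in range(n)])
    return clusters


def grand_mean(clusters):
    return sum(map(sum, clusters)) / sum(map(len, clusters))


def naive_variance(clusters):
    """Model-based variance of the grand mean, assuming independent Poisson counts."""
    total = sum(len(c) for c in clusters)
    return grand_mean(clusters) / total


def robust_variance(clusters):
    """Sandwich variance of the grand mean, built from cluster-level residual sums."""
    total = sum(len(c) for c in clusters)
    ybar = grand_mean(clusters)
    return sum(sum(y - ybar for y in c) ** 2 for c in clusters) / total ** 2


def run_study(mu=2.0, rho=0.5, n=4, K=10, reps=4000, seed=1996):
    """Compare the naive and robust variance estimates with the true variance."""
    rng = random.Random(seed)
    means, naive, robust = [], [], []
    for _ in range(reps):
        data = simulate_dataset(rng, mu, rho, n, K)
        means.append(grand_mean(data))
        naive.append(naive_variance(data))
        robust.append(robust_variance(data))
    return {
        "true": mu * (1 + (n - 1) * rho) / (n * K),  # theoretical Var(grand mean)
        "empirical": pvariance(means),               # variance across replicates
        "naive": mean(naive),                        # average naive estimate
        "robust": mean(robust),                      # average robust estimate
    }


if __name__ == "__main__":
    for name, value in run_study().items():
        print(f"{name:>9}: {value:.4f}")
```

Under these assumptions the true variance of the grand mean is mu(1 + (n - 1)rho)/(nK), while the naive estimate targets mu/(nK), so it understates the truth by the factor 1 + (n - 1)rho. The robust estimate has expectation (K - 1)/K times the true variance, which is close to the truth even for K = 10.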

Item Metadata

Title	Effect of misspecified response correlation in regression analysis
Creator	Chui, Grace Shung-Lai
Publisher	University of British Columbia
Date Issued	1996
Extent	6024464 bytes
Genre	Thesis/Dissertation
Type	Text
File Format	application/pdf
Language	eng
Date Available	2009-02-12
Provider	Vancouver : University of British Columbia Library
Rights	For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use.
DOI	10.14288/1.0086986
URI	http://hdl.handle.net/2429/4521
Degree	Master of Science - MSc
Program	Statistics
Affiliation	Science, Faculty of; Statistics, Department of
Degree Grantor	University of British Columbia
Graduation Date	1996-11
Campus	UBCV
Scholarly Level	Graduate
Aggregated Source Repository	DSpace

Item Media

ubc_1996-0352.pdf -- 5.75MB
