Imperfect variables : the combined problem of missing data and mismeasured variables with application to generalized linear models

UBC Theses and Dissertations

Featured Collection

UBC Theses and Dissertations

Imperfect variables : the combined problem of missing data and mismeasured variables with application to generalized linear models Regier, Michael David

Abstract

Observational studies predicated on the secondary use of information from administrative and health databases often encounter the problem of missing and mismeasured data. Although there is much methodological literature pertaining to each problem in isolation, there is a scant body of literature addressing both problems in tandem. I investigate the effect of missing and mismeasured covariates on parameter estimation from a binary logistic regression model and propose a likelihood based method to adjust for the combined data deficiencies. Two simulation studies are used to understand the effect of data imperfection on parameter estimation and to evaluate the utility of a likelihood based adjustment. When missing and mismeasured data occurred for separate covariates, I found that the parameter estimate associated with the mismeasured portion was biased and that the parameter estimate for the missing data aspect may be biased under both missing at random and non-ignorable missing at random assumptions. A Monte Carlo Expectation-Maximization adjustment reduced the magnitude of the bias, but a trade-off was observed. Bias reduction for the mismeasured covariate was achieved by increasing the bias associated with the others. When both problems affected a single covariate, the parameter estimate for the imperfect covariate was biased. Additionally, the parameter estimates for the other covariates were also biased. The Monte Carlo Expectation-Maximization adjustment often corrected the bias, but the bias trade-off amongst the covariates was observed. For both simulation studies, I observed a potential dissimilarity across missing data mechanisms. A substantive data set was investigated and by using the second simulation study, which was structurally similar, I could provide reasonable conclusions about the nature of the estimates. Also, I could suggest avenues of research which would potentially minimize expenditures for additional high quality data. I conclude that the problem of imperfection may be addressed through standard statistical methodology, but that the known effects of missing data or measurement error may not manifest as expected when more general data imperfections are considered.

Item Metadata

Title	Imperfect variables : the combined problem of missing data and mismeasured variables with application to generalized linear models
Creator	Regier, Michael David
Publisher	University of British Columbia
Date Issued	2009
Description	Observational studies predicated on the secondary use of information from administrative and health databases often encounter the problem of missing and mismeasured data. Although there is much methodological literature pertaining to each problem in isolation, there is a scant body of literature addressing both problems in tandem. I investigate the effect of missing and mismeasured covariates on parameter estimation from a binary logistic regression model and propose a likelihood based method to adjust for the combined data deficiencies. Two simulation studies are used to understand the effect of data imperfection on parameter estimation and to evaluate the utility of a likelihood based adjustment. When missing and mismeasured data occurred for separate covariates, I found that the parameter estimate associated with the mismeasured portion was biased and that the parameter estimate for the missing data aspect may be biased under both missing at random and non-ignorable missing at random assumptions. A Monte Carlo Expectation-Maximization adjustment reduced the magnitude of the bias, but a trade-off was observed. Bias reduction for the mismeasured covariate was achieved by increasing the bias associated with the others. When both problems affected a single covariate, the parameter estimate for the imperfect covariate was biased. Additionally, the parameter estimates for the other covariates were also biased. The Monte Carlo Expectation-Maximization adjustment often corrected the bias, but the bias trade-off amongst the covariates was observed. For both simulation studies, I observed a potential dissimilarity across missing data mechanisms. A substantive data set was investigated and by using the second simulation study, which was structurally similar, I could provide reasonable conclusions about the nature of the estimates. Also, I could suggest avenues of research which would potentially minimize expenditures for additional high quality data. I conclude that the problem of imperfection may be addressed through standard statistical methodology, but that the known effects of missing data or measurement error may not manifest as expected when more general data imperfections are considered.
Extent	4407561 bytes
Genre	Thesis/Dissertation
Type	Text
File Format	application/pdf
Language	eng
Date Available	2009-11-27
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0068495
URI	http://hdl.handle.net/2429/15883
Degree	Doctor of Philosophy - PhD
Program	Statistics
Affiliation	Science, Faculty of; Statistics, Department of
Degree Grantor	University of British Columbia
Graduation Date	2009-11
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace

Open Collections

UBC Theses and Dissertations

UBC Theses and Dissertations

Imperfect variables : the combined problem of missing data and mismeasured variables with application to generalized linear models Regier, Michael David

Abstract

Item Metadata

Item Media

Item Citations and Data

Rights