UBC Theses and Dissertations

Imperfect variables : the combined problem of missing data and mismeasured variables with application.. Regier, Michael David 2009-11-27

Full Text

Imperfect variables:
The combined problem of missing data and mismeasured variables with application to generalized linear models.

by

Michael David Regier

B.AR., Ambrose University College, 1997
B.Sc., University of the Fraser Valley, 2003
M.Sc., The University of British Columbia, 2005

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

in

The Faculty of Graduate Studies
(Statistics)

THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)

July 2009

© Michael David Regier 2009

Abstract

Observational studies predicated on the secondary use of information from administrative and health databases often encounter the problem of missing and mismeasured data. Although there is much methodological literature pertaining to each problem in isolation, there is a scant body of literature addressing both problems in tandem. I investigate the effect of missing and mismeasured covariates on parameter estimation from a binary logistic regression model and propose a likelihood based method to adjust for the combined data deficiencies. Two simulation studies are used to understand the effect of data imperfection on parameter estimation and to evaluate the utility of a likelihood based adjustment.

When missing and mismeasured data occurred for separate covariates, I found that the parameter estimate associated with the mismeasured portion was biased and that the parameter estimate for the missing data aspect may be biased under both missing at random and non-ignorable missing at random assumptions. A Monte Carlo Expectation-Maximization adjustment reduced the magnitude of the bias, but a trade-off was observed. Bias reduction for the mismeasured covariate was achieved by increasing the bias associated with the others. When both problems affected a single covariate, the parameter estimate for the imperfect covariate was biased. Additionally, the parameter estimates for the other covariates were also biased.
The Monte Carlo Expectation-Maximization adjustment often corrected the bias, but the bias trade-off amongst the covariates was observed. For both simulation studies, I observed a potential dissimilarity across missing data mechanisms.

A substantive data set was investigated and by using the second simulation study, which was structurally similar, I could provide reasonable conclusions about the nature of the estimates. Also, I could suggest avenues of research which would potentially minimize expenditures for additional high quality data.

I conclude that the problem of imperfection may be addressed through standard statistical methodology, but that the known effects of missing data or measurement error may not manifest as expected when more general data imperfections are considered.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements
Dedication
1 Introduction
  1.1 Literature review: identifying the gap
  1.2 Thesis goals and structure
  1.3 Motivating substantive problem
    1.3.1 Background to the substantive problem
    1.3.2 Study population
    1.3.3 The three routinely collected databases
  1.4 Summary
2 A synthetic methodology for imperfect variables
  2.1 Introductory example: a variable which suffers from both missing data and mismeasurement
  2.2 Nomenclature
  2.3 Imperfection indicator
  2.4 The synthetic notation
  2.5 The model
    2.5.1 Likelihood
    2.5.2 Covariate model
    2.5.3 The imperfection model
    2.5.4 Surrogate versus proxy: nondifferential and differential error
    2.5.5 Response model
    2.5.6 Mechanism of imperfection
    2.5.7 Missing data
    2.5.8 Missing Data and Mismeasurement as Special Cases
3 Maximum likelihood estimation via the Expectation-Maximization algorithm
  3.1 Brief history of the EM algorithm
  3.2 Rationale for and critiques against the use of the EM algorithm
  3.3 Formulation of the EM algorithm
    3.3.1 EM algorithm
    3.3.2 EM algorithm and imperfect variables
  3.4 Implementing the Expectation-Maximization algorithm: Monte-Carlo Expectation-Maximization
    3.4.1 Monte Carlo integration
    3.4.2 Monte Carlo EM: the expectation step
    3.4.3 Gibbs sampler
    3.4.4 Maximization
  3.5 Louis standard errors
    3.5.1 Monte Carlo with Louis standard error for imperfect variables
4 Simulation studies
  4.1 Global aims of the simulation studies
  4.2 General structural elements and procedure of the simulation studies
  4.3 Evaluative measures
    4.3.1 Measures of bias and accuracy
    4.3.2 Confidence interval based measures
    4.3.3 Robust measures
  4.4 Simulation study 1: binary logistic regression with two imperfect explanatory variables
    4.4.1 Likelihood model for simulation study 1
    4.4.2 Simulation specific details for the Monte Carlo EM algorithm
    4.4.3 Results and discussions
    4.4.4 Simulation study 1 discussion
  4.5 Simulation study 2: a random variable which suffers from both imperfections
    4.5.1 Likelihood model for simulation study 2
    4.5.2 Simulation specific details for the Monte Carlo EM algorithm
    4.5.3 Results and discussions
    4.5.4 Simulation study 2 discussion
5 Does income affect the location of death for males, living in Aboriginal communities, who died of prostate cancer?
  5.1 Results and discussion
  5.2 Conclusions
6 Conclusion
  6.1 Future work
Bibliography
Appendices
A British Columbia Vital Statistics location of death

List of Tables

1.1 Some results for single and multiple keyword searches using the Web of Science index
2.1 An example of how the imperfect variable framework allows for more complicated relationships in a data set. Here, there is a single variable which suffers from both missing data and mismeasurement. In this example, mismeasurement is not uniform for all realizations
2.2 Both missing data and mismeasured data have something missing and something observed. The similar features of both types of data are used as the foundation for the concept of an experimentally observable random variable and the target random variable
2.3 Imperfect data patterns as specified by the generalized imperfect data indicator
2.4 Types of data problems which can be modelled using the imperfect variable framework
4.1 Notation and combinations for the missing data mechanism that generates the data and the missing data mechanism assumed by the fitted model. The missing data mechanism takes the form of a binary logistic regression
4.2 Coefficient values for different missing data mechanisms used to generate simulated data sets. The systematic component is specified as $\gamma = (\gamma_0, \gamma_1, \gamma_2, \gamma_3)^T$
4.3 Comparing the effect of the missing data mechanism (MAR, NMAR) on the point estimate, bias, and mean squared error when the mechanism that generates the missing data and the missing data mechanism in the model are matched and mismatched.
Mechanism A was used to generate the data and the coefficient for the missing data variable is $\beta_1$ and for the mismeasured variable is $\beta_2$, case 1 and case 2 only
4.4 Comparing the effect of the missing data mechanism (MAR, NMAR) on the point estimate, bias, and mean squared error when the mechanism that generates the missing data and the missing data mechanism in the model are matched and mismatched. Mechanism A was used to generate the data and the coefficient for the missing data variable is $\beta_1$ and for the mismeasured variable is $\beta_2$, case 3 and case 4 only
4.5 Comparing the effect of the missing data mechanism (MAR, NMAR) on confidence interval length, and coverage when the mechanism that generates the data and the mechanism used to model the data are matched and mismatched. Mechanism A was used to generate the data and the coefficient for the missing data variable is $\beta_1$ and for the mismeasured variable is $\beta_2$
4.6 Comparing the effect of the missing data mechanism (MAR, NMAR) on the median and difference between the median and true value ($\delta$). Mechanism A was used to generate the data and the coefficient for the missing data variable is $\beta_1$ and for the mismeasured variable is $\beta_2$
4.7 Comparing different missing data mechanisms when the mechanism that generates the data and the mechanism used to model the data are matched. Simulation sizes for case 1 (MAR-MAR) are 153, 168, 167 for mechanism A, B, and C respectively
4.8 Comparing different missing data mechanisms when the mechanism that generates the data and the mechanism used to model the data are matched. Simulation sizes for case 4 (NMAR-NMAR) are 139, 163, 168 respectively
4.9 Confidence interval length, and coverage when the mechanism that generates the data and the mechanism used to model the data are matched. Simulation sizes for case 1 (MAR-MAR) are 153, 168, 167 for mechanism A, B, and C respectively; for case 4 (NMAR-NMAR), they are 139, 163, 168 respectively
4.10 Comparing the effect of the missing data mechanism (MAR, NMAR) on the median and difference between the median and true value ($\delta$) when the mechanism that generated the missing data and the mechanism assumed by the model are matched. Simulation sizes for case 1 (MAR-MAR) are 153, 168, 167 for mechanism A, B, and C respectively; for case 4 (NMAR-NMAR), they are 139, 163, 168 respectively
4.11 Comparing the effect of sample size on point estimation, bias, mean squared error, and relative efficiency for case 1: MAR-MAR. Missing data mechanism B was used for this comparison
4.12 Comparing the effect of sample size on confidence interval length, and coverage for case 1: MAR-MAR. Missing data mechanism B was used for this comparison
4.13 Comparing the effect of sample size on the median and difference between the median and true value ($\delta$) for case 1: MAR-MAR. Missing data mechanism B was used for this comparison
4.14 Comparing the effect of sample size on point estimation, bias, mean squared error, and relative efficiency for case 4: NMAR-NMAR. Missing data mechanism B was used for this comparison
4.15 Comparing the effect of sample size on confidence interval length, and coverage for case 4: NMAR-NMAR. Missing data mechanism B was used for this comparison
4.16 Comparing the effect of sample size on the median and difference between the median and true value ($\delta$) for case 4: NMAR-NMAR. Missing data mechanism B was used for this comparison
4.17 Study design for exploring the effect of $\tau$
4.18 Comparing how agreement and disagreement on the specification of $\tau$ when the true value of $\tau = 1.0$ (Table 4.17) affects point estimation, bias, mean squared error, and relative efficiency when using missing data mechanism B with agreement between the mechanism generating the missing data and the assumed missing data mechanism for case 1: MAR-MAR
4.19 Comparing how agreement and disagreement on the specification of $\tau$ when the true value of $\tau = 0.5$ (Table 4.17) affects point estimation, bias, mean squared error, and relative efficiency when using missing data mechanism B with agreement between the mechanism generating the missing data and the assumed missing data mechanism for case 1: MAR-MAR
4.20 Comparing confidence interval length, and coverage for the four patterns of $\tau$ as given in Table 4.17 when using missing data mechanism B with agreement between the mechanism generating the missing data and the assumed missing data mechanism for case 1: MAR-MAR
4.21 Comparing the effect of sample size on the median and difference between the median and true value ($\delta$) for the four patterns of $\tau$ as given in Table 4.17 when using missing data mechanism B with agreement between the mechanism generating the missing data and the assumed missing data mechanism for case 1: MAR-MAR
4.22 Coefficient values for different missing data mechanisms used to generate simulated data sets. The systematic component is specified as $\gamma = (\gamma_0, \gamma_1, \gamma_2, \gamma_3, \gamma_4)^T$
4.23 Comparing the effect of the missing data mechanism when the missing data was generated with a MAR mechanism on the point estimate, bias, and mean squared error when a covariate has both missing data and mismeasured data. The mechanism that generates the missing data and the missing data mechanism used to model the data are matched and mismatched. Mechanism A was used to generate the missing data for x
4.24 Comparing the effect of the missing data mechanism when the missing data was generated with a NMAR mechanism on the point estimate, bias, and mean squared error when a covariate has both missing data and mismeasured data. The mechanism that generates the missing data and the missing data mechanism used to model the data are matched and mismatched.
Mechanism A was used to generate the missing data for x
4.25 Comparing the effect of the missing data mechanism when the missing data was generated with a MAR mechanism on confidence interval length, and coverage where a single covariate suffers from both missing data and measurement error. Both matched and mismatched missing data mechanisms are considered
4.26 Comparing the effect of the missing data mechanism when the missing data was generated with a NMAR mechanism on confidence interval length, and coverage where a single covariate suffers from both missing data and measurement error. Both matched and mismatched missing data mechanisms are considered
4.27 Comparing the effect of the missing data mechanism on the median and difference between the median and true value ($\delta$) where a single covariate suffers from both missing data and measurement error
4.28 Comparing different missing data mechanisms when the mechanism that generates the data and the mechanism used to model the data are matched (MAR-MAR) for a covariate with both missing and mismeasured data
4.29 Confidence interval length, and coverage when the mechanism that generates the data and the mechanism used to model the data are matched (MAR-MAR) for a covariate with both missing and mismeasured data
4.30 Comparing the effect of the missing data mechanism on the median and difference between the median and the true value ($\delta$) when the mechanism that generated the missing data and the mechanism assumed by the model are matched (MAR-MAR) for a covariate with both missing and mismeasured data
4.31 Comparing different missing data mechanisms when the mechanism that generates the data and the mechanism used to model the data are matched (NMAR-NMAR) for a covariate with both missing and mismeasured data
4.32 Confidence interval length, and coverage when the mechanism that generates the data and the mechanism used to model the data are matched (NMAR-NMAR) for a covariate with both missing and mismeasured data
4.33 Comparing the effect of the missing data mechanism on the median and difference between the median and the true value ($\delta$) when the mechanism that generated the missing data and the mechanism assumed by the model are matched (NMAR-NMAR) for a covariate with both missing and mismeasured data
4.34 Comparing the effect of sample size on point estimation, bias, mean squared error, and relative efficiency for case 1 (MAR-MAR) when a covariate has both missing data and mismeasurement. Missing data mechanism A was used for this comparison
4.35 Comparing the effect of sample size on confidence interval length, and coverage for case 1 (MAR-MAR) when a covariate has both missing data and mismeasurement. Missing data mechanism A was used for this comparison
4.36 Comparing the effect of sample size on the median and difference between the median and true value ($\delta$) for case 1 (MAR-MAR) where a single covariate suffers from both missing data and measurement error. Missing data mechanism A was used for this comparison
4.37 Comparing the effect of sample size on point estimation, bias, mean squared error, and relative efficiency for case 4 (NMAR-NMAR) when a covariate has both missing data and mismeasurement. Missing data mechanism A was used for this comparison
4.38 Comparing the effect of sample size on confidence interval length, and coverage for case 4 (NMAR-NMAR) when a covariate has both missing data and mismeasurement. Missing data mechanism A was used for this comparison
4.39 Comparing the effect of sample size on the median and difference between the median and true value ($\delta$) for case 4 (NMAR-NMAR) where a single covariate suffers from both missing data and measurement error. Missing data mechanism A was used for this comparison
4.40 Comparing the effect of $\tau$ on the point estimate, bias, and mean squared error when a covariate has both missing data and mismeasured data and when the missing data mechanism is NMAR. Mechanism A was used to generate the missing data for $\beta_1$; 200 simulations were run
4.41 Comparing the effect of $\tau$ on confidence interval length, and coverage where a single covariate suffers from both missing data and measurement error and when the missing data mechanism is NMAR. Mechanism A was used to generate the missing data for $\beta_1$; 200 simulations were run
4.42 Comparing the effect of $\tau$ on the median and difference between the median and true value ($\delta$) where a single covariate suffers from both missing data and measurement error and when the missing data mechanism is NMAR. Mechanism A was used to generate the missing data for $\beta_1$; 200 simulations were run
5.1 Univariate summary of the applied data set. The overall column has the counts, proportions, means and standard deviation for all 215 subjects. The hospital and place of residence columns have the rates, proportions, means and standard deviations for the subjects contingent on being in the column category for the location of death
5.2 Naive complete-cases results
5.3 Parameter estimates, standard error, z-score, and associated p-value for the MCEM methodology assuming that the missing data mechanism is MAR. Four levels of $\tau$ are explored to check the sensitivity of the model to the assumption on $\tau$
5.4 Point estimate, 95% confidence interval, and confidence interval length on the odds ratio scale for the MCEM methodology assuming that the missing data mechanism is MAR.
Four levels of $\tau$ are explored to check the sensitivity of the model to the assumption on $\tau$
5.5 Parameter estimates, standard error, z-score, and associated p-value for the MCEM methodology assuming that the missing data mechanism is NMAR. Four levels of $\tau$ are explored to check the sensitivity of the model to the assumption on $\tau$
5.6 Point estimate, 95% confidence interval, and confidence interval length on the odds ratio scale for the MCEM methodology assuming that the missing data mechanism is NMAR. Four levels of $\tau$ are explored to check the sensitivity of the model to the assumption on $\tau$
B.1 Log-concave properties for probability distributions found in this document [35]

Acknowledgements

Firstly, I would like to thank my supervisor, Dr. Paul Gustafson, for a passing comment about missing data and measurement error which eventually turned into this thesis. I am grateful for his patience through this process and for the numerous points at which his insights have directed the research and influenced my particular interests in statistical methodology.

Secondly, I extend my appreciation to my supervisory committee, Dr. Lang Wu and Dr. Ying MacNab, for their help and encouragement through the process.

Finally, I would like to thank Dr. John Petkau for his advice about my career development and Dr. Rollin Brant for his thought provoking conversations.

Dedication

To my wife Christine Elizabeth Regier and daughter Elizabeth Anne Violet Regier, for their patience and sacrifices.

Chapter 1

Introduction

Mismeasured variables and missing information are common problems encountered in the routine analysis of data.
Frequently, the presence of such data deficiencies obfuscates the contextual truth which is sought. For example, assume there is a single predictor which has a direct causal effect on the outcome, but the predictor cannot be accurately measured. The parameter estimates of a simple linear model with a mismeasured covariate will attenuate if the statistical model does not account for the data deficiency. In this simple situation, neglecting mismeasurement and assuming the data is perfect may result in erroneously discounting the effect under investigation.

In a similar manner, neglecting missing data may result in erroneous conclusions. If some of the observations in a data set are missing and the data is analysed without properly accounting for this deficiency, then the parameter estimates may be biased and suffer from reduced efficiency. In both situations, by not accounting for the data deficiencies, biased estimates and erroneous associated measures of variability may result.

Encountering data which is plagued with both missing data and mismeasurement appears to be a common phenomenon with observational data that has been constructed through the linkage of several disparate data sources. Often the motivation for data linkage is the acquisition of a set of covariates which are of primary, or even secondary, interest to substantive researchers. One set of covariates which has recently drawn much attention is cultural covariates. Typically, these are abstracted from census data to model adjustment variables such as sex, age or socio-economic status [51]. The use of surrogates and proxies for these situations is a common practice, with census data being a common source of surrogates. Census data is routinely collected and often is released freely to the public at an aggregate level. Although this represents a monetary efficiency, it brings its own set of problems.

Aggregate census data derived from a measured attribute of the individual members of the group of interest are “contextual” variables [98]. If conceived as a cultural construct, it is feasible to view aggregate census data as describing a “context” which affects all the members of the group. Both classical and Berkson errors may be reasonable assumptions for such ambiguous social constructions. When using census based measures, the contextual measurement is ascribed to each unit, which is typically a person.

The process of ascription is called geocoding. In its simplest form, geocoding uses a geographic measure to link the aggregate surrogate data to a subject [51]. A fundamental concern with geocoding is the accuracy of the aggregate data in revealing the true relationship between the explanatory variable and the outcome [98]. A naive approach is to assume no measurement error and treat the surrogate or proxy as perfectly measured. In such a situation, we may see a conclusion that would sound something like, “cancer patients who live in a South-Asian community are more likely to die in the hospital when compared to a culturally heterogeneous community”. Although this statement is hard to fault, there is a subtle semantic game at play. The inference desired is for the patient, not the community, thus the implication is that people who live in a South-Asian community are more likely to be South-Asian. The danger is to conclude that South-Asians are more likely to die in a hospital when compared to non-South-Asians.

An ecological analysis often is performed with the intent of inferring to the individual, and the error of inferring group level conclusions to individuals is referred to as the ecological fallacy [80, 89, 92]. Distilling the ecological fallacy to its core problem reveals that it is a problem of inference across levels of data where the patient is seen as an aggregate unit of size one.
The fallacy occurs when inference based on the aggregate data is ascribed to the smaller aggregations.

In contrast to this concern is the individualistic fallacy, which is the notion that individual level data is sufficient for epidemiological studies [51, 92]. This suggests that the individual-level analysis is able to capture the group dynamics of a health related phenomenon. These are aggregate-level conclusions based upon a generalization from individual-level data [98]. There is much literature concerning these fallacies, which will not be visited in this exposition [34, 51, 72, 80, 92, 98].

The applied example in this thesis utilizes geocoding to associate cancer patients with surrogates of desired cultural attributes, thus we will briefly consider ecological studies. Morgenstern [80] deals with this situation in an oblique manner through the concept of cross-level bias. He partitions the bias resulting from an ecological study into

• aggregation bias, which results from the grouping of individuals, and
• specification bias, which results from the confounding effect of the group itself.

Cross-level bias is the sum of aggregation bias and specification bias and can make the ecological associations appear stronger or weaker than the individual-level associations. It is noted that the two could cancel each other out, resulting in no bias at all. It is suggested that no cross-level bias will occur if the ecological covariates affect the outcome [80]. Observational studies which require the use of surrogates or proxies may seem overly problematic, but Greenland [37] gives balanced insight into the problem: “it is important to remember that the possibility of bias does not demonstrate the presence of bias, and that a conflict between ecological and individual-level estimates does not by itself demonstrate that the ecological estimates are more biased”.

Stripping away the semantics and definitions, we see at the centre of all of this a basic mismeasurement problem.
We have a desired inference about a population, predicated on subject level data, but a model which crosses both the unit level and the aggregate level of the data.

With the use of secondary data sources we should expect a level of missing data. For databases, the absence of information may result from poor database management or from using constructed variables which are predicated on information from multiple sources. Poor database management would include such issues as incomplete database design such that fields can be left blank without any notation as to the intent of the blank. The intent of the database itself can result in missing information when the primary users of the data have very different data needs than the substantive researchers who are using the database as a secondary source of information.

A further source of imperfection is the construction of variables. Some information, such as survival time, may be constructed using a variety of data sources. The date of diagnosis may be in the health database, but the date of death may be abstracted from a different source such as vital statistics. Although for this particular scenario the probability of missing information may be low, it illustrates the type of problem that may be encountered. If census data is used, it is possible to link incomplete surrogate information to the subject; mean income would be an example.

It is clear that an observational study predicated on the linkage of disparate data sources for research beyond the intent of the databases themselves may present the dual problem of missing data and mismeasurement. It is reasonable to assume that any analysis of this type of data should not be based on standard naive complete-case methods. The obvious result would be measures of variation which would be too optimistic, but what other dangers of such an approach may be encountered?
What kind of adjustments are necessary?

1.1 Literature review: identifying the gap

Before we address these questions, we will see if they have been considered in the literature. Considered independently from one another, there is a wealth of information concerning the problems of missing data and mismeasurement, but the literature in which both problems are considered in tandem is very small. For the identification of peer reviewed articles where both missing data and mismeasurement were integrated into the same model under a unified theoretical framework, we restricted the search to the Thomson’s Institute for Scientific Information Web of Science index. Furthermore, we restricted the search to the following set of keywords: missing data, missing response, missing covariate, measurement error, mismeasurement, misclassification, error in covariate, and errors in variables. Single term searches resulted in large numbers of articles for which any review would extend beyond the parameters of this thesis, so all searches were restricted to combinations of keywords (e.g. missing response AND misclassification). When two keywords were considered the number of articles was greatly reduced (Table 1.1). The abstracts were read and promising articles were retrieved and reviewed.

Table 1.1: Some results for single and multiple keyword searches using the Web of Science index

Keyword(s): number of results (August 2007; September 2007; April 2009)
“Measurement error”: 6,011; 6,051; 7,086
“Missing data”: 4,150; 4,206; 5,355
Misclassification: 3,670; 3,701; 4,383
“Error in covariates”: 10
“Errors in variables”: 119
Misclassification AND “missing response”: 0; 0; 0
Misclassification AND “missing covariate”: 0; 0; 0
Misclassification AND “missing data”: 31; 37
“Measurement error” AND “missing response”: 2; 2; 2
“Measurement error” AND “missing covariate”: 8; 8; 8
“Measurement error” AND “missing data”: 77; 78; 94
“Error in covariates” AND “missing data”: 6
“Errors in variables” AND “missing data”: 3

The results of the literature search fall into four general categories and, since this is not a systematic review, an illustrative approach will be taken. From the search, peer reviewed articles

1. used measurement error and missing data as synonyms,
2. used proxies for the missing data and then proceeded with a measurement error model,
3. addressed measurement error and missing data separately, or
4. proposed a model which adjusted for both the missing data and mismeasurement problems found in the data.

Category 1 used the terms measurement error and missing data as synonyms. These papers typically identify one in the abstract and the other in the keywords (e.g. missing data in the abstract and measurement error in the keywords). They had a strong application focus and populated both subject area and statistical journals. An example of this is Strickland and Crabtree [94], where patterns of missingness of survey responses among heterogeneous subgroups with family medical practices are investigated. Missingness is the central idea. An EM algorithm is used in conjunction with a hierarchical model. Although the researchers are investigating missing data, measurement error is listed as a keyword.

For category 2, a missing data problem is transformed into a measurement error problem through the use of a proxy or supplemental data. This was done for covariates that were not completely observed and for covariates that were missing (i.e. completely unobserved). The most prevalent manifestation of this approach is for the latter situation. For example, Pepe [84] considers the situation where an entire covariate is missing, thus a proxy is used and a challenging missing data problem is transformed into a simpler and more concrete measurement error problem. Wang and Pepe [107] give another example.
Here, the investigators are concerned with estimation based on general estimating equations when true covariate data are missing for all the study subjects, but surrogate or mismeasured covariates are available.

The third category covers those papers that mention both measurement error and missing data, but handle them separately. Hsiao and Wang [42] propose a method to obtain least-squares or generalized least-squares estimators of structural nonlinear errors-in-variables models and then suggest that their proposed method is useful for missing data or covariate proxies.

The final category contains only four papers. Each considers the problems of missing data and measurement error in tandem. When this research was initiated there was one article concerning partial linear models which remotely touched on the underlying themes to be addressed in this thesis. Liang, Wang and Carroll [63] considered the situation where the response is missing at random and measurement error affects only the predictors. A semiparametric estimator and an empirical likelihood estimator are proposed.

Since then, three articles addressing this dual data deficiency have been published; each focuses on longitudinal data. Yi [115] considers the effect of covariate mismeasurement on the estimation of response parameters for longitudinal studies. Missing data is assumed to be missing at random along with a classical measurement error model. An inverse probability weighted generalized estimating equation (IPWGEE) is used with SIMEX. Wang et al. [106] consider the combined problem of missing data and measurement error for longitudinal studies and use a general approach based on expected estimating equations (EEEs) with the asymptotic variance obtained with sandwich estimation. Finally, we have Liu and Wu [71] who consider missing data and measurement error within the context of a longitudinal study with nonlinear mixed-effects models.
Non-ignorable missing data in the response and missing data in other time-varying covariates are considered.

1.2 Thesis goals and structure

It is clear that little work has been done on the issue of managing the combined problem of missing data and mismeasurement, but the work done is within a rather sophisticated methodological framework. Although it provides solutions to very real substantive problems and is interesting from a statistical point of view, it only obliquely asks a fundamental question: are missing data and mismeasurement manifestations of a more general problem?

In the spirit of Dempster et al. [22], we recognize that much work has been done in the areas of missing data and mismeasurement, with much of the work exhibiting advanced statistical methodology, but there has been no formal consideration that these two areas may just be particulars of a more general problem. The primary endeavour of this thesis is to propose a conceptual framework which considers missing data and mismeasurement as manifestations of a more general problem: imperfection. We will consider the structure of experimental or observational data and the role that observation plays in defining the problem of imperfection. Furthermore, we will propose an imperfect variable model which is an integrated missing data-mismeasurement model.

The secondary objective is to develop a likelihood based approach to adjusting for imperfection and to provide a parsimonious analytic framework for adjustment within the context of generalized linear models. Since there is no known literature concerning the adjustment for imperfect variables within the generalized linear model framework, we will explore the effect of imperfection in covariates on parameter estimation for two situations: when missing data and mismeasurement affect separate covariates, and when missing data and mismeasurement affect a single variable.
In order to highlight the substantive relevance of addressing both missing data and mismeasurement within a unified framework, a social-epidemiological example will be used to exhibit the utility of the proposed conceptual framework and methodological approach. One feature of the substantive problem which will be retained is the assumption that no auxiliary information exists for the mismeasured variables.

In a rather unconventional twist, chapter 2 will take a look at the concepts, language and notation. This chapter will begin with an illustrative example to motivate the ensuing metamorphosis of the standard language, notation and concepts used in both the missing data and mismeasurement literature. Since there is no common language that transcends either statistical sub-discipline, the synthesizing of statistical methodologies for the dual problem of missing data and mismeasurement requires nomenclature modification. Within this chapter, a unified model for imperfect variables will be presented and issues such as nondifferential error will be addressed for the proposed imperfect variable. Once the common language and notation have been established, the likelihood model and the components of the model will be discussed.

Chapter 3 will focus on how to obtain maximum likelihood type estimates of the parameters and their associated standard errors through the use of the EM algorithm. In this section, the value of the time devoted to clearly delineating the conceptual framework for imperfect variables should become evident. Chapter 4 will present the background and the results for two simulation studies. The first study will consider the problem of having both deficiencies in the same data set, but only one of the two deficiencies can be present in a single covariate. The second simulation study will consider the problem of having both deficiencies in a single covariate. Chapter 5 will consider a data set abstracted from a linked data set.
Most of the information about the motivating social-epidemiological problem will be presented in chapter 1, thus little of the substantive background will be repeated. The final chapter will draw some general conclusions and present some future areas of research.

1.3 Motivating substantive problem

Participation in substantive research permits statisticians an alternate perspective on their discipline. For a statistician, the phrase getting your hands dirty means working with data much like a gardener would work with the soil. Metaphorically digging through a complicated data set often reveals treasures, much like a gardener would experience when first acquiring an overgrown and unruly garden. After a period of time a familiarity begins to develop; what may at first have looked like an unimportant specimen soon becomes a valued and prized piece.

A composite data set, one constructed from a collection of disparate sources, often seems like an inherited and unruly garden to a statistician. The variables may be unfamiliar and may not be immediately recognized for their value, but familiarity brings understanding and appreciation. The inherent beauty of the composite data set is that information unattainable from a single source can be constructed from multiple and potentially disparate sources to address complicated substantive questions.

Although we can link data from disparate sources to construct ever-more elaborate databases and data sets, the act of doing so obscures two basic questions: should we perform data linkage and what is the cost of data linkage? The first involves a labyrinth of legal issues spanning ethical concerns about the use of secondary information for scientific research to basic concerns about the protection of an individual’s identity and privacy. As statisticians, we may choose to defer those questions to other experts.
Beyond the obvious costs of time and money, there is a much more subtle question concerning the validity of conclusions based upon routine analysis of composite data sets. This second question is where a statistical perspective is needed. This can be thought of as the cost of untenable inference.

The problem of defining and measuring culture

A team of researchers, known as the Cross-Cultural NET, is interested in understanding the impact of culture on health decisions and health outcomes. For this group, culture has been defined as a complex interplay of meanings that represent and shape the individual and collective lives of people. The definition, although satisfying from a sociological perspective, gives little guidance for measuring culture. The intentional omission of meta-narrative language has been used as an opportunity to generalize the concept of culture to range from familiar ideas, such as ethnicity and religion, to more novel constructions of culture which include socio-economic status and occupation.

From a conceptual point of view, this is a flexible definition of culture which has high utility in an open-ended course of research, but it does not provide a guide for measuring culture. The generality of the definition is one of its strengths because a study specific definition of culture can be constructed conditional on the data collected. The advantage is also its disadvantage; often the aspects of culture which are of most interest, such as self-reported cultural identity, are those not routinely collected in Canadian databases. One way around this problem is to link a database with census data.
In doing this, researchers are able to associate each patient in the defined population with a wider range of cultural constructs, but as previously discussed, it comes with a cost beyond those of time and money: a complex database.

1.3.1 Background to the substantive problem

The body of literature concerning the use of end-of-life health services in Canada is small, but when issues of culture are of interest, it becomes extremely sparse. Due to the substantive focus of the research and the sparsity of the literature, much of the investigation into culture has been predicated on the work of a few Canadian researchers: Burge, Lawson, and Johnston [9].

The operating hypothesis is that given a choice, Canadian cancer patients would prefer to die outside of a hospital setting. One avenue of research is to understand if this preference is shared across all Canadian cultural groups. A challenge with the research is that much of the information is predicated on routinely collected data housed in government and health databases. These sources of information can only address the action and not the intent of the patients, thus inference from the acts of the patients to the desires of the patients is assisted with supplementary domestic and international end-of-life research.

It is known that approximately two-thirds of Canadian cancer patients die in the hospital, yet from international literature it is hypothesized that between 50% and 80% of these patients would prefer to die at home [100, 112]. It is suggested that this difference arises from a complicated mechanism that prevents people from dying in their preferred place [88].
Barriers such as socio-economics, culture, provincial and federal policy, geography, and barriers inherent in the health service delivery system may result in a displacement of people from their desired location of death.

Although it is commonly assumed that people would prefer to die in their place of usual residence, for a culturally diverse population, this assumption may be erroneous. For example, discussions about death and dying within Chinese communities are, in general, considered disharmonious and are avoided due to the belief that such conversations will bring bad luck [29]. The Cross-Cultural NET researchers have posited that this belief affects the location of death and manifests in the decision to not die at home in order to prevent bad luck for the household. This hypothesis stands in contrast to a recent finding that suggests the trend towards hospital deaths among Chinese is more a factor of improved medical resources and changing societal norms due to the changes in the location of the best medical care and natural generational shifts. Furthermore, it is believed that the desire of the dying in Western societies to die at home is related to the quality of care given at the home or in a hospice when compared with hospital care [39]. In the United Kingdom, where there is a different mixture of cultures, cancer patients exhibit an overwhelming preference to die at home or in a hospice [101]. This suggests that under a general definition of culture, disregarding a patient’s culture impedes quality health service delivery.

1.3.2 Study population

The study population used to obtain and construct the composite database is defined as all adult British Columbian residents, age 20 and older, who died in BC, as identified on their death certificate, due to malignant cancer between 1997 and 2003, excluding death certificate only (DCO) diagnoses of cancer.
A DCO diagnosis is one made at the time of death and has not been traced back to any prior information which indicated or suggested that the patient may have cancer. Prior information can include a pathology report, or even a notation of an assumption that the patient had cancer but not confirmed through testing. In the latter case, it is assumed that the patient is dying from a non-cancer primary cause which will progress more rapidly than the cancer. This is the source population on which all study data sets will be predicated.

1.3.3 The three routinely collected databases

Figure 1.1: Database linkages used in the construction of the End-of-life Database and subsequent study data sets.

Focusing primarily on the location of death, which was readily available, the investigation was based on routinely collected data which resided with various agencies: the British Columbia Cancer Agency, Statistics Canada, and British Columbia Vital Statistics. Compared to the acquisition of the data, the process of linking the information is inexpensive and straightforward given the appropriate software [17]. Currently there are three sources of data: the British Columbia Cancer Registry (BCCR), British Columbia Vital Statistics (BCVS), and the Statistics Canada Census (Census or Census data). The BC Cancer Registry (BCCR) began collecting patient data in 1969. In 1980 the maintenance of the BCCR was transferred to the BC Cancer Agency (BCCA). Since then, the BCCA has collected patient information for the purposes of surveillance and research.
It is a population based registry that is routinely linked with BCVS death certificate information and it is assumed that all persons who lived and died in British Columbia and had a diagnosis of cancer (primary, secondary, or tertiary) are included in the BCCR.

The BCCR and BC Vital Statistics are routinely linked for basic information such as the causes of death. Information more specific to the death of a patient is not routinely linked and must be requested from BCVS. The linkage of additional information is done by either the BCVS or the BCCR depending on the type and quantity of information requested. The location of death is an example of information which would require a special data linkage.

The linked database of the BCCR and the BCVS contains basic health information. It does not include information such as relationship status (e.g. married, single), socio-economic status (SES), or cultural information. More sophisticated demographic, SES and cultural measures are derived from census data. The linked database comprised of the BCCA, BCVS and census data will be referred to as the End-of-life database (EOLDB); all study data sets are derived from this database (Figure 1.1).

The postal code of usual residence was taken to be the postal code abstracted from the death certificate. If this was unavailable, the postal code was obtained from the BCCR. The postal code was used in conjunction with Statistics Canada’s Postal Code Conversion File program to associate each patient with a dissemination area. This is known as geocoding [11]. The geocode is a seven or eight digit number which reflects the Standard Geographical Classification (SGC) used for census reporting. The hierarchy of the SGC, from largest geographic unit to smallest, is province, census division, census subdivision, and dissemination area.
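The digit layout of the geocode described in this section can be illustrated with a short sketch. It assumes only the layout stated in the text: two digits for the province, two for the census division, and then either three digits for the census subdivision (seven digit form) or four digits for the dissemination area (eight digit form). The names `Geocode` and `parse_geocode` and the example codes are hypothetical and are not part of the Postal Code Conversion File program.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Geocode:
    """Components of a seven or eight digit geocode (hypothetical helper type)."""
    province: str
    census_division: str
    census_subdivision: Optional[str]  # present only in the seven digit form
    dissemination_area: Optional[str]  # present only in the eight digit form


def parse_geocode(code: str) -> Geocode:
    """Split a geocode string into its hierarchical parts.

    Assumed layout, taken from the text: PP CC SSS (seven digits) or
    PP CC DDDD (eight digits).
    """
    if not (code.isascii() and code.isdigit()) or len(code) not in (7, 8):
        raise ValueError(f"expected a 7 or 8 digit geocode, got {code!r}")
    province, division, rest = code[:2], code[2:4], code[4:]
    if len(code) == 7:
        return Geocode(province, division, rest, None)
    return Geocode(province, division, None, rest)


# Made-up codes used purely for illustration.
seven = parse_geocode("5915022")
eight = parse_geocode("59150123")
```

Keeping the components as strings rather than integers preserves leading zeros, which matters when the parts are later used as linkage keys.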
In this hierarchy, the province is divided into census divisions, a census division is divided into census subdivisions, and a census subdivision is divided into dissemination areas. The seven digit geocode is broken down as follows: the first two digits represent the province, the second two represent the census division (CD), and the last three represent the census subdivision (CSD). The eight digit geocode has the province and census division for the first four digits with the last four representing the dissemination area (DA). Census information is abstracted at the DA level and linked to each patient based on the geocode.

British Columbia Cancer Registry

The British Columbia Cancer Registry (BCCR) is the primary source of patient level data. The target population, initially abstracted from the BCCR, contained approximately 80,000 unique patients. The data contained incomplete patient information with imprecisely recorded fields. Immediately, the combined problem of missing information and mismeasurement was evident.

Much of the data in the BCCR is categorical. There are many coding options, but in general there is no code to indicate the intentional omission of information. This makes things a bit more challenging when using the data. Does a blank mean that the data was obtained but not recorded, or does it mean that no data was ever collected and would never be collected? Furthermore, the update and completion of the BCCR data is done as time permits [113]. These represent only some of the problems encountered with the BCCR data.

Canadian Census

Census information is released in a variety of ways and at various aggregations. One useful aggregation is the dissemination area. Following the geographic hierarchy of province, census division, census subdivision and then dissemination area (DA), the DA is the lowest level that is freely released; it represents a small compact area bounded by visible borders (e.g.
river, road) and has an average size of 550 people [12]. If there are small numbers of people (less than 40) in a DA the information is not released, which suggests the presence of missing census information [12]. This problem occurs frequently for Indian Reservations, rural and northern locations. An example is the use of mean income rather than the quartiles provided by the Postal Code conversion program [11]. There are many dissemination areas where the population is less than 40 individuals, thus there is non-released income information. This translates to missing surrogate information, which is a combined missing data and measurement error problem. With the applied problem, we will consider a surrogate for which both problems exist within the same covariate.

British Columbia Vital Statistics

The data obtained from the death certificate has a high level of completeness. The problem with the death certificate information relates to the outcome measure (Appendix A). With secondary tables which contain the facility codes for various health units and facilities, it is possible to fine tune these codes to abstract more specific locations of death, but there is no information to identify hospices or locations of death inside the hospital (e.g. ICU), which are of greater interest to the Cross-Cultural NET researchers. With effort, these aggregate groups can be broken down into more specific categories, but for some locations it is currently impossible. The inability to identify the location of death in a hospice or in a specific location within a hospital represents a fundamental problem and a misclassification for the location of death. It will be assumed that the outcome is accurately measured and this will be reflected in the proposed model.

1.4 Summary

Observational epidemiological research utilizes census based data in order to supplement health information with cultural information that may be difficult to obtain.
The process of geocoding associates each patient with a geographic region, which allows for census based information to be associated with the patient. In epidemiological terms, this is a cross-level analysis and may suffer from the ecological fallacy. The heart of this problem is one of mismeasurement since an aggregate surrogate is being used, but when the surrogate is derived from the census, missing information may occur.

The problems which arise from having both deficiencies present in a data set are, in general, unknown. Even more opaque is the effect that this dual problem has on the parameter estimates. It is assumed that they will be biased and have problems with accurate measures of efficiency, but the directions and trends of these problems are not well documented. Furthermore, there exists no known literature addressing the dual problem as manifest in a single covariate. In response to these gaps of knowledge, a uniform perspective will be proposed in which both missing data and mismeasurement will be conceived of as particulars of a more general problem of data deficiency called imperfection.

Two simulation studies will be executed in order to investigate the effect of imperfection on parameter estimation. A likelihood based approach to adjust for imperfection will be explored. An applied data set will be abstracted from the population of all adult British Columbian residents, age 20 and older, who died in BC, as identified on their death certificate, due to malignant cancer between 1997 and 2003, excluding death certificate only (DCO) diagnosis of cancer.

Chapter 2
A synthetic methodology for imperfect variables

Frequently, missing data and mismeasurement are treated in isolation from one another. In this chapter, a unified approach will be proposed for the general problem of data imperfection.
Since there is substantial work done in both the areas of missing data and measurement error, it would be prudent to use the established methodological approaches and terminology as foundational to the generalization process. To this end, the primary methodological goal becomes the integration of the two areas under a unified perspective.

2.1 Introductory example: a variable which suffers from both missing data and mismeasurement

To begin, we will consider a constructed example to highlight important features, nomenclature, and notation for integrating missing data and mismeasurement into a general framework: imperfection. Table 2.1 is an example of how the complexity of imperfections can be accounted for within the single framework of imperfect variables. In this example, there is a single variable which suffers from both missing data and mismeasurement. Assume, for this example, that realizations are mismeasured once they are above a certain value, hence subjects 1 and 4 suffer from mismeasurement. Furthermore, above a certain value, the realizations are unobservable, as with subject 3. Finally, below a certain value, the realizations are observable and they are accurately measured, as with subjects 2 and 5. The realizations are being generated from a single process, but there are a multiplicity of data problems.

Table 2.1: An example of how the imperfect variable framework allows for more complicated relationships in a data set. Here, there is a single variable which suffers from both missing data and mismeasurement. In this example, mismeasurement is not uniform for all realizations.

Subject | Target: $x$ | Observed: $x^E$ | Imperfection | Relationship between $x$ and $x^E$ | Synthetic notation
1 | 6.5 | 4.3 | Mismeasurement | $x^E = f(x) + e$ | $x^E = x^E_{obs}$
2 | 1.3 | 1.3 | None (Perfect) | $x^E = x$ | $x^E = x^E_{obs}$
3 | 9.8 | - | Both | $x^E = f(x) + e$ | $x^E = x^E_{miss}$
4 | 8.2 | 7.6 | Mismeasurement | $x^E = f(x) + e$ | $x^E = x^E_{obs}$
5 | 2.2 | 2.2 | None (Perfect) | $x^E = x$ | $x^E = x^E_{obs}$

In this example, $x^E$ is the experimentally observable realization and it is always observed.
It is what a researcher would observe and record, the third column in Table 2.1. The observed realization is not necessarily a realization from the target, $X$, so we need to determine the relationship between what is observed, $x^E$, and the target which is what we want to observe, $x$.

Imperfect variables occur with subjects 1, 3, and 4. For subjects 1 and 4, the problem is restricted to measurement error. The realization $x^E$ is observed, but it is known to be functionally related to the target, thus $x^E = f(x) + e$. Additionally, the realization was observed, so $x^E = x^E_{obs}$. For subject 3, we know a priori that it is mismeasured, but we are unable to experimentally observe it. In this case we have $x^E = x^E_{miss}$ and we know that if it was observed then $x^E = f(x) + e$.

The introduction of an experimentally observable random variable lays the foundation for the development of both the nomenclature and the notation. Although this may appear to add a level of redundancy, it will help to reduce the notational complexity for specifying a model and assist in developing a model which relates the observed and unobserved data.

2.2 Nomenclature

When missing data and mismeasurement methodologies are brought together, the foundational theses do not cleanly align. In the case of mismeasurement, there are situations where a mismeasurement problem can be recast as a missing data problem. The counter-point is that the entire body of mismeasurement literature cannot be recast as a missing data problem and in the process be relegated to the status of a particular manifestation of missing data. If this was the case, missing data would then be a more general version of mismeasurement.

It is the process of integrating the fundamental concepts and notation of missing data and mismeasurement which necessitated the evolution of the notation and the supporting concepts. Although imperfect random variables are of primary interest, we must first consider the language used to describe such random variables. In the example, realizations from the random variable $X$ were sought, but not always obtained. These were called the desired or the target realizations because in the experimental process, the variable under investigation or the target of the investigation is $X$, from which a researcher desires realizations.

Table 2.2: Both missing data and mismeasured data have something missing and something observed. The similar features of both types of data are used as the foundation for the concept of an experimentally observable random variable and the target random variable.

Type of data | Representation | Unobserved | Observed
Missing Data | $X = (X_{miss}, X_{obs})$ | $X_{miss}$ | $X_{obs}$
Mismeasurement | $X = (X, X^*)$ | $X$ | $X^*$
Imperfect Data | $X = (X, X^E)$ | $X$ (Target) | $X^E$ (Experimental)

Recognizing that it is possible to not observe realizations from the target random variable leads us to make a distinction between what is observed and what is not observed. In a perfect experiment, we would observe all realizations from the target random variable $X$, but this is not always possible. When this occurs there are realizations from $X$ for which observation is not possible, but something is observed, whether it is a mismeasured version of the target random variable or a placeholder denoting the absence of an observation. Recognizing these two features of data collection assists us in modifying our usage of observation.

For each subject in an experiment, something is observed. We denote the experimentally observable “something” as $X^E$ for the random variable generating the realization and $x^E$ for the realization itself. This is an extension of the mismeasurement use of the notation where $x^*$ denotes the directly observable realization from $X^*$, which is functionally related to the desired random variable $X$ (Table 2.2). This extension facilitates a modification of how we perceive the random variable associated with a trial.
From the example it was implicitly suggested that for each trial there are two associated random variables: the target random variable $X$ and the experimentally observed random variable $X^E$. Rather than having the random variable $X$ being associated with each trial or subject, a random vector $(X, X^E)$ is associated.

The random vector $(X, X^E)$ associated with each trial of an experiment has a particular structure which should be recognized. For each trial, there is a desired realization from the target random variable $X$. Unfortunately, experimental realizations of the target can be frustrated, thus we observe a version $x^E$ from $X^E$. We assume that there is a relationship between $X$ and $X^E$ and, predicated on this assumption, we use $X^E$ to help us understand how $X$ affects some outcome of interest. Underneath the vector $(X, X^E)$ there is a structure which links the two and will provide a coherent means by which information from the experimentally observable random variable can be used to understand the relationship between the target random variable and the outcome of interest.

Finally, we will clarify the usage of direct and indirect observation. Direct observation of the random variable $X$ suggests that we not only observe realizations from $X$, but that these observations are themselves sufficiently accurate with respect to the precision of measurement in the case of quantitative measures and precision of the language with respect to qualitative measures. We may observe realizations from the target random variable with tools that are sufficiently blunt that the observations are not sufficiently accurate, an example of which would be rounding error. In this case, we will treat the measured value as being mismeasured for it is unable to provide accurate measures of the target random variable.
Realizations are not from $X$, but from $X^E$ which is related to the target.

2.3 Imperfection indicator

For each target variable associate a two dimensional indicator variable, $R = (R^I, R^M)$. The indicator $R^I$ corresponds to the missing data aspect and $R^M$ is the indicator for mismeasurement. The random vector $R$ has an associated joint probability distribution, $p(R^I, R^M)$. The imperfection indicator is not associated with $X^E$ because $X^E$ is what is observed; the imperfection is a problem associated with $X$ and manifests through $X^E$.

In section 2.2, we associated the random vector $(X, X^E)$ with each subject. In the same spirit, we associate the random indicator vector with each variable. For $(X, X^E)$ we associate $R = (R^I, R^M)$, therefore for each realization $(x, x^E)$ we have realizations of the indicator random vector $(r^I, r^M)$; the realized components are defined as

$$r^I = \begin{cases} 1 & \text{if } x \text{ is observed} \\ 0 & \text{if } x \text{ is not observed} \end{cases}$$

and

$$r^M = \begin{cases} 1 & \text{if } x \text{ is accurately measured} \\ 0 & \text{if } x \text{ is mismeasured} \end{cases}$$

which jointly allow for all combinations of imperfection to be specified (Table 2.3).

Table 2.3: Imperfect data patterns as specified by the generalized imperfect data indicator.

$r^I$ | $r^M$ | Generalized Imperfect Data Problem
1 | 1 | Perfect Data
0 | 1 | Missing Data
1 | 0 | Mismeasurement
0 | 0 | Missing Data and Mismeasurement

The realization of the imperfection indicator $r = (1, 1)$ indicates completely observed and accurate, thus we say that the target random variable $X$ is an imperfect variable if and only if $\Pr(R = (1,1) \mid \cdot) < 1$.

Table 2.4: Types of data problems which can be modelled using the imperfect variable framework.

$\Pr(R^I = 1 \mid \cdot)$ | $\Pr(R^M = 1 \mid \cdot)$ | Imperfection
1 | 1 | Perfect Data
[0, 1) | 1 | Missing Data
1 | [0, 1) | Mismeasurement
[0, 1) | [0, 1) | Both

Using one of the conditional expressions of the joint distribution as illustrative, if missing data is probable, then $\Pr(R^M = 1) = 1$ and $\Pr(R^I = 1 \mid r^M = 1) < 1$ (Table 2.4). For mismeasurement, $\Pr(R^M = 1) < 1$ and $\Pr(R^I = 1 \mid r^M = 1) = 1$ (Table 2.4). If the variable suffers from both problems, then the product of the conditionals will be less than one.
Finally, if neither problem exists then $\Pr(R^M = 1) = 1$ and $\Pr(R^I = 1 \mid r^M = 1) = 1$. An analogous result holds if the conditioning is done as $\Pr(R^M \mid r^I)\Pr(R^I)$.

Before moving on to a further synthesis of missing data and mismeasurement ideas, a final observation is that the imperfection indicator $R$ is analogous to Little and Rubin’s [67] proposal of an indicator for missing data $M$, but is modified to include the problem of mismeasurement. To foreshadow the subsequent formulation, it is an easy extension to conceive of multiple covariates each with an associated imperfection indicator vector, $R_{ij}$, where $i = 1, \ldots, n$ and $j = 1, \ldots, p$. In this situation, we will have the imperfection analog to Little and Rubin’s proposed missing data indicator matrix $M$.

2.4 The synthetic notation

With the use of the indicator vector and some further observations, we will be able to complete the notational synthesis. Recall that it was proposed that the random vector $(X, X^E)$ be used rather than just the random variable $X$. The idea behind this conceptual shift is that in an experiment, each trial produces something that can be observed: a realization in the form of a quantity or quality, or a placeholder denoting the absence of the quantity or quality. This is denoted $X^A = (X, X^E)$. This is the augmented version of what is typically associated with an experimental trial, with a realization denoted as $x^A = (x, x^E)$. Under the imperfection framework, it is possible to consider a vector of everything that can be observed during the experimental process. The complete set of random variables for each trial of an experiment, or universa, is denoted

$$X^U = (X, X^E, R^I, R^M) = (X^A, R).$$

Alluded to in previous sections is a relationship between $X$ and $X^E$. A missing data mechanism is a probabilistic model that characterizes the pattern of missing data and will be used to provide a relationship between $x^E$ and the potentially unobserved target $x$ when the imperfection involves mismeasurement. For mismeasured data, a mismeasurement model relates $x^E$ and $x$.
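As a minimal sketch of how the universa might be represented in practice, the constructed example of Table 2.1 can be encoded as records of the form $(x, x^E, r^I, r^M)$ and classified with the patterns of Table 2.3. The names `Universa`, `PATTERNS`, and `pattern` are hypothetical, and the indicator values assigned to each subject are an interpretation of Table 2.1, not values reported in the text. The target $x$ is available here only because the example is constructed; in real data it would be unknown whenever $r^I = 0$ or $r^M = 0$.

```python
from typing import NamedTuple, Optional


class Universa(NamedTuple):
    """One realization (x, x_e, r_i, r_m) of the universa for a single variable."""
    x: float              # target realization (known only in a constructed example)
    x_e: Optional[float]  # experimentally observed value; None is the placeholder
    r_i: int              # 1 if x is observed, 0 if it is not observed
    r_m: int              # 1 if x is accurately measured, 0 if mismeasured


# Imperfection patterns keyed by (r_i, r_m), following Table 2.3.
PATTERNS = {
    (1, 1): "Perfect Data",
    (0, 1): "Missing Data",
    (1, 0): "Mismeasurement",
    (0, 0): "Missing Data and Mismeasurement",
}


def pattern(u: Universa) -> str:
    """Return the generalized imperfect data problem for one realization."""
    return PATTERNS[(u.r_i, u.r_m)]


# The five subjects of Table 2.1; indicator values are an assumed reading of it.
subjects = [
    Universa(6.5, 4.3, 1, 0),
    Universa(1.3, 1.3, 1, 1),
    Universa(9.8, None, 0, 0),
    Universa(8.2, 7.6, 1, 0),
    Universa(2.2, 2.2, 1, 1),
]

labels = [pattern(u) for u in subjects]
```

Storing the placeholder as `None` keeps the record complete for every trial, which mirrors the idea that something is always observed even when the target is not.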
Auxiliary data may exist which would help to construct a relationship between $X$ and $X^E$, but this auxiliary data is not part of $X^U$ in the sense that it may not be observed concurrently with the other study variables. If there is no auxiliary data, a measurement error model can be assumed.

Now, let us extend these ideas to the situation where there are $n$ independent and identically distributed subjects with more than one covariate. Let $X^U_{ij} = (X_{ij}, X^E_{ij}, R^I_{ij}, R^M_{ij})$ be the random vector for the $i$th trial or subject, $i = 1, \dots, n$, and the $j$th covariate, $j = 1, \dots, p$. For $n$ independent and identically distributed subjects, the design matrix will be an $n \times 3p$ matrix which is related to the underlying conceptual $n \times 4p$ design matrix, which in turn is functionally related to the target $n \times p$ matrix.

Now that we have a working framework for imperfect variables, we can turn our attention to the likelihood. It can be broken into three parts: a covariate model, a response model, and a model for the mechanism of imperfection. Before moving on to the likelihood, two notes should be made. The first is that the missing data literature often uses $z$ to denote missing data. Although there is no direct correspondence between the notation used in the missing data literature and that proposed for imperfect variables, the use of $x^E = x_{\mathrm{miss}}$ and $X^E = X_{\mathrm{miss}}$ is analogous to $z$. The second is that much of the notation has been inspired by the mismeasurement literature. It is noted that the use of $x^*$ is not universal notation for the mismeasured variable and often $w$ is used.

2.5 The model

From notation and underlying concepts, we turn our attention to the models themselves. We will begin by considering the likelihood model and its components.

2.5.1 Likelihood

Suppose that there are $n$ independent and identically distributed subjects with $(X^U_i, Y_i)$ being the random vector for the $i$th subject, where $Y_i$ is the response, $X^U_i$ is the universa random vector, and $p(X^U_i, Y_i \mid \Phi)$ is the associated joint probability distribution indexed by the parameter $\Phi$.
For notational simplicity, $p(U_1 \mid u_2)$ will be used to denote $p(U_1 = u_1 \mid U_2 = u_2)$, which is the conditional distribution of the random variable $U_1$ given the realization $u_2$ from $U_2$. This notation will be used throughout for both continuous and discrete random variables. The joint distribution, as defined through a product of conditionals, is

$$p(X^U_i, Y_i \mid \Phi) = p(R_i \mid x^A_i, y_i, \gamma)\, p(Y_i \mid x^A_i, \beta, \phi)\, p(X^A_i \mid \psi) \tag{2.5.1}$$

where $X^A_i = (X_i, X^E_i)$, $X_i = (X_{i1}, \dots, X_{ip})$, $X^E_i = (X^E_{i1}, \dots, X^E_{ip})$, and $\Phi^T = (\gamma^T, \beta^T, \phi, \psi^T)$.

The conditional distribution of the indicator vector given the observed explanatory variables and response, $p(R_i \mid x^A_i, y_i, \gamma)$, is indexed by $\gamma$ and will be referred to as the mechanism of imperfection model. The conditional distribution of $Y_i$ given the observed explanatory variables, $p(Y_i \mid x^A_i, \beta, \phi)$, is indexed by the $p \times 1$ vector of regression coefficients $\beta$ and the dispersion parameter $\phi$ and will be referred to as the response model. Finally, $p(X^A_i \mid \psi)$ is the joint distribution of the explanatory variables, indexed by $\psi$, and will be referred to as the covariate model. The complete likelihood for the $i$th subject is

$$L_c(\Phi \mid x^U_i, y_i) = p(R_i \mid x^A_i, y_i, \gamma)\, p(Y_i \mid x^A_i, \beta, \phi)\, p(X^A_i \mid \psi) \tag{2.5.2}$$

with the complete likelihood as

$$L_c(\Phi \mid x^U, y) = \prod_{i=1}^{n} L_c(\Phi \mid x^U_i, y_i) \tag{2.5.3}$$

and the complete log-likelihood

$$l_c(\Phi \mid x^U, y) = \sum_{i=1}^{n} \left[ \log p(R_i \mid x^A_i, y_i, \gamma) + \log p(Y_i \mid x^A_i, \beta, \phi) + \log p(X^A_i \mid \psi) \right]. \tag{2.5.4}$$

Within the context of missing data, the idea of factoring the joint probability distribution was implicitly suggested by Rubin [90] when he begins to address missing data and the sampling distribution of inference. Little and Schluchter [68] make this approach explicit in the discussion about alternate strategies for specifying the joint probability distribution of the response and explanatory variables.
When the outcome is categorical and the explanatory variables are continuous, the factorization $p(X, Y) = p(Y \mid x)\, p(X)$ is recommended when $y$ is binary because this underlies the logistic regression model which, unlike the linear discriminant factorization $p(X \mid y)\, p(Y)$, does not rely on multivariate normal assumptions for the continuous variables. Additionally, the number of parameters is greater for the latter than that needed to characterize a logistic factorization [91]. The logistic regression factorization approach has been subsequently used by Ibrahim [43], Lipsitz and Ibrahim [64], Lipsitz et al. [65], Ibrahim et al. [44], and Ibrahim and Lipsitz [48]. Furthermore, by including in the likelihood the probability distribution of $R_i$, the reason for missing information, and $p(X^A_i \mid \psi)$, the model for the mismeasurement mechanism, bias from missing information and attenuation should be removed [14, 65].

Equation 2.5.1 is the probability distribution for the $i$th subject and reflects the factorization proposed by Ibrahim and Lipsitz. Additionally, it represents the complete, or perfect, data likelihood. Without loss of generality, assume that the imperfect covariates for which both problems do not co-exist in the same covariate are the first $j$ covariates in the data set and that the imperfect covariates for which both problems coexist within the same covariate are the following $r$ covariates. Then, when data is imperfect, the joint distribution of the observable covariates is

$$p(X^E_i, R_i, Y_i \mid \Phi) = \int \cdots \int p(X^U_i, Y_i \mid \Phi)\, dx_{i1} \cdots dx_{ij}\, dx_{i(j+1)} \cdots dx_{i(j+r)} \tag{2.5.5}$$

for the $i$th subject which suffers from imperfection, where the region of integration is the union of the variable space associated with the target covariates and the variable space associated with the experimentally observable covariates.

With the specification of the likelihood (Equation 2.5.4), we are implicitly suggesting a particular approach for handling the missing data component of imperfection.
Little [67] classifies such models as selection models, where the likelihood is written as a product of uniquely indexed distributions: the probability distribution for the indicator of missing information conditional on the data, and the probability distribution of the data. Furthermore, this characterization of the relationship between the indicators of missing information and the data which suffers from the imperfection motivates likelihood-based techniques [67].

2.5.2 Covariate model

The covariate model is of particular interest because it has the potential for some combination of missing data and mismeasurement. A successful integration of the two deficiencies will require ideas from both areas to be woven together: a model which specifies the joint probability distribution of $X^A = (X, X^E)$ and a mapping from $X$ to $X^E$.

Both nonparametric and parametric approaches have been proposed for the specification of the covariate model when explanatory data is missing. Laird [57] and Robins [86] proposed nonparametric approaches. Agresti [1] considered a parametric approach and used a joint multinomial log-linear model for discrete covariates. Although the number of nuisance parameters needed to model the distribution could be reduced by retaining only the main effects, Lipsitz and Ibrahim [64] proposed an alternate approach in which they sought to reduce the number of nuisance parameters by using a series of one-dimensional conditional distributions. With this approach, the joint distribution of $(Y, X)$ is broken into a response model, $Y$ conditional on $x$, and a conditional specification of the joint distribution of $X$. This approach has been termed the conditional-conditional approach.

Two immediate benefits are that the conditional-conditional specification of the joint distribution has been shown to approximate a joint log-linear model for the covariates, and that any completely observed covariate does not need to be modelled [48].
The latter is a significant improvement over any method which would require the full specification of the joint probability model. In a discrete covariate situation, the number of required cell probabilities to be modelled can grow quickly for a fully saturated model. For continuous covariates, specifying a high-dimensional joint probability distribution may be unrealistic, and a joint probability distribution may not naturally present itself. For example, consider a data set with three covariates: the first is time-to-event data such as survival time, the second is continuous and resides in the interval [0, 1], and the third appears normally distributed, but has heavy tails. A three-dimensional joint probability distribution does not instantly come to mind, but a conditional model using an exponential, a beta, and a t-distribution is readily apparent.

Initially, the conditional-conditional approach was used to reduce the number of nuisance parameters needed to specify the joint distribution of discrete covariates, and it was later extended to continuous covariates [48]. If the problem at hand was only missing data, then the joint distribution for a $p$-dimensional explanatory variable vector $X_i = (X_{i1}, \dots, X_{ip})$ would be conditionally specified as

$$p(X_{i1}, \dots, X_{ip} \mid \psi) = p(X_{ip} \mid x_{i1}, \dots, x_{i(p-1)}, \psi_p) \cdots p(X_{i2} \mid x_{i1}, \psi_2)\, p(X_{i1} \mid \psi_1)$$

where $\psi = (\psi_1^T, \dots, \psi_p^T)^T$, $\psi_j$ is the indexing vector for the $j$th conditional distribution, and the $\psi_j$ are distinct for all $j$. A beneficial feature of this approach is that it accommodates a mixture of covariate types, which matters because data sets rarely contain only discrete or only continuous covariates. Problems with both discrete and continuous covariates have been called mixed covariate models in that the covariate model is not strictly composed of only discrete or only continuous data.
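To make the sequence of one-dimensional conditionals concrete, the following is a minimal sketch for a mixed covariate model with one binary and one continuous covariate. The particular distributions and parameter values (a Bernoulli(0.4) for $X_{i1}$ and a unit-variance normal with mean $1 + 2x_{i1}$ for $X_{i2}$) are assumptions chosen purely for illustration, as are the function names; none of them come from the thesis.

```python
import math
import random

# Illustrative (assumed) parameter values: psi_1 = 0.4, psi_2 = (1.0, 2.0).
P_X1 = 0.4
INTERCEPT, SLOPE = 1.0, 2.0


def sample_covariates(rng: random.Random):
    """Draw (x1, x2) sequentially: first p(X1 | psi_1), then p(X2 | x1, psi_2)."""
    x1 = 1 if rng.random() < P_X1 else 0
    x2 = rng.gauss(INTERCEPT + SLOPE * x1, 1.0)
    return x1, x2


def log_joint(x1: int, x2: float) -> float:
    """log p(x1, x2) = log p(x2 | x1, psi_2) + log p(x1 | psi_1)."""
    log_p1 = math.log(P_X1 if x1 == 1 else 1.0 - P_X1)
    mean = INTERCEPT + SLOPE * x1
    log_p2 = -0.5 * math.log(2.0 * math.pi) - 0.5 * (x2 - mean) ** 2
    return log_p2 + log_p1
```

No joint distribution over the discrete and continuous components is ever written down directly; only the two one-dimensional pieces are specified, which is the design choice the conditional-conditional approach relies on.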
The conditional-conditional approach is general enough to handle these situations [48].

The complete likelihood (Equation 2.5.1) suggests that the conditional-conditional approach may have utility; we have the joint distribution $p(X^A_i, Y_i \mid \beta, \phi, \psi)$, which can be expressed as $p(Y_i \mid x^A_i, \beta, \phi)\, p(X^A_i \mid \psi)$. The conditional model for the covariate distribution is

$$\begin{aligned}
p(X^A_i \mid \psi) &= p(X^A_{ip} \mid x^A_{i1}, \dots, x^A_{i(p-1)}, \psi_p) \cdots p(X^A_{i2} \mid x^A_{i1}, \psi_2)\, p(X^A_{i1} \mid \psi_1) \\
&= p(X^E_{ip} \mid x_{ip}, x^A_{i1}, \dots, x^A_{i(p-1)}, \psi^E_p)\, p(X_{ip} \mid x^A_{i1}, \dots, x^A_{i(p-1)}, \psi^X_p) \\
&\quad \times \cdots \times p(X^E_{i2} \mid x_{i2}, x^A_{i1}, \psi^E_2)\, p(X_{i2} \mid x^A_{i1}, \psi^X_2) \\
&\quad \times p(X^E_{i1} \mid x_{i1}, \psi^E_1)\, p(X_{i1} \mid \psi^X_1)
\end{aligned} \tag{2.5.6}$$

where $\psi^T = (\psi_1^T, \dots, \psi_p^T)$, $\psi_j^T = (\psi_j^{X\,T}, \psi_j^{E\,T})$, $\psi^X_j$ is the parameter vector indexing the conditional distribution associated with the target explanatory variable for the $j$th conditional distribution, and $\psi^E_j$ is the parameter vector indexing the conditional distribution associated with the experimentally observable explanatory variable for the $j$th conditional distribution.

Buried within the conditional-conditional specification of the joint distribution $p(X^A_i, Y_i \mid \beta, \phi, \psi)$ is a measurement error model. Consider the $j$th component of the conditional distribution for the joint distribution of the covariate model,

$$p(X^E_{ij} \mid x_{ij}, x^A_{i1}, \dots, x^A_{i(j-1)}, \psi^E_j)\, p(X_{ij} \mid x^A_{i1}, \dots, x^A_{i(j-1)}, \psi^X_j);$$

if the imperfection includes mismeasurement, then this becomes

$$p(X^E_{ij} = X^*_{ij,P} \mid x_{ij}, x^A_{i1}, \dots, x^A_{i(j-1)}, \psi^E_j)\, p(X_{ij} \mid x^A_{i1}, \dots, x^A_{i(j-1)}, \psi^X_j)$$

where $P$ indexes the state of observation, that is, $P \in \{\mathrm{obs}, \mathrm{miss}\}$. By extending the conditional-conditional idea to the joint distribution of $X^A$, it is easy to see that we have come to the likelihood construction for classical measurement error [14]. A Berkson specification can result as well, with the $j$th component of the conditional distribution of the covariates being

$$p(X_{ij} \mid x^E_{ij} = x^*_{ij,P}, x^A_{i1}, \dots, x^A_{i(j-1)}, \psi^X_j)\, p(X^E_{ij} = X^*_{ij,P} \mid x^A_{i1}, \dots, x^A_{i(j-1)}, \psi^E_j).$$

An interesting feature of this approach is that a measurement error relationship naturally emerged through the application of a missing data modelling technique.

At first, the conditional-conditional specification of the covariate model may appear excessive, but the complexity of the specification allows for increased flexibility. Additionally, such complex models may be rare.
Consider the three cases of imperfection and the one case for perfect covariates. For a perfect covariate, no joint distribution needs to be specified, so all of the associated components in equation 2.5.6 drop out. For only missing data problems, the observed realizations and the target are identical and the missing data mechanism is already modelled in the complete likelihood. Mismeasured variables, missing or not, need a specification for a joint distribution and will still need the full complexity of the covariate model.

2.5.3 The imperfection model

Using a measurement error model as a genesis point, a structured relationship between the observable random variable and the target random variable will be constructed. There are two general types of models to use in order to specify the measurement error process:

• Error models, which include the classical measurement error models, where the conditional distribution of $X^*$ given $X$ is modelled, and
• Regression calibration models, which include Berkson error models, where the conditional distribution of $X$ given $X^*$ is modelled.

Given that a classical measurement error model has been specified in equation 2.5.6, we will restrict ourselves to this type of measurement error.

A general version of the classical error model relates the mismeasured variable to the true variable and may be conditional on some set of perfectly measured covariates.
In a standard problem of measurement error, the general classical measurement error model for the $i$th subject and the $j$th covariate is

$$X^*_{ij} = \alpha_0 + \alpha_1 X_{ij} + \alpha_2^T X_{i(-j)} + \epsilon_{ij} \tag{2.5.7}$$

where $\alpha = (\alpha_0, \alpha_1, \alpha_2^T)^T$ is the indexing vector for the measurement error model, $X_{ij}$ is the unobserved target variable, $X_{i(-j)}$ is the set of accurately measured covariates without the $j$th covariate, and $E(\epsilon_{ij} \mid X_{ij}) = 0$. If this was strictly a measurement error problem, then this would be sufficient for specifying the relationship between $X^*_{ij}$ and $X_{ij}$, but it is unsatisfactory for integrating the problems of missing data and measurement error.

Using material from both measurement error and missing data, a more general model can be constructed which relates the observable random variable, $X^E$, to the target random variable, $X$. The imperfection model must be able to characterize:

• accurate and observable data,
• missing data,
• mismeasured data, and
• data with both problems.

A simple approach is to use $R$ to combine these four into a single relationship. For the $i$th subject and $j$th covariate we have

$$X^E_{ij} = r^M_{ij}\left[ r^I_{ij} X_{ij,\mathrm{obs}} + (1 - r^I_{ij}) X_{ij,\mathrm{miss}} \right] + (1 - r^M_{ij})\left[ r^I_{ij} X^*_{ij,\mathrm{obs}} + (1 - r^I_{ij}) X^*_{ij,\mathrm{miss}} \right] \tag{2.5.8}$$

where $X^*_{ij} = \alpha_{0j} + \alpha_{1j} X_{ij} + \alpha_{2j}^T X_{i(-j)} + \epsilon_{ij}$, $E(\epsilon_{ij} \mid X_{ij}) = 0$, $X_{i(-j)}$ is the set of covariates with the $j$th removed, and $\alpha_j = (\alpha_{0j}, \alpha_{1j}, \alpha_{2j}^T)^T$ is the indexing vector for the $j$th imperfect covariate. Although this stitching together is a naive approach, it does permit the required flexibility to model a wide range of complex data structures. For example, the $i$th subject may have $x^U_{ij} = (x_{ij,\mathrm{miss}}, -, 0, 1)$, thus we have $x^E_{ij} = x_{ij,\mathrm{miss}}$. The $k$th subject may have $x^U_{kj} = (x_{kj,\mathrm{obs}}, 4.5, 1, 0)$, so $x^E_{kj} = x^*_{kj,\mathrm{obs}}$, for which $x^*_{kj,\mathrm{obs}} = x_{kj} + \epsilon_{kj}$.

An interesting feature of this model is that a further generalization is possible. Consider the situation where the measurement error does not apply uniformly to all subjects. In this setting, there exist disjoint subsets of subjects for which the measurement error structure is different.
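Equations 2.5.7 and 2.5.8 can be sketched directly in code. In the minimal sketch below, the function names `classical_error` and `imperfect_value` are hypothetical, and returning a pair (selected value, observed flag) is an assumed way of representing the distinction between the "obs" and "miss" versions of a variable; in real data the unobserved versions would of course not be available.

```python
def classical_error(x, x_other, alpha0, alpha1, alpha2, eps):
    """Eq. 2.5.7: X* = alpha0 + alpha1 * X + alpha2' X_(-j) + eps."""
    return alpha0 + alpha1 * x + sum(a * z for a, z in zip(alpha2, x_other)) + eps


def imperfect_value(r_i, r_m, x, x_star):
    """Eq. 2.5.8 for one subject and one covariate.

    r_m selects between the target x (r_m = 1) and the mismeasured x* (r_m = 0);
    r_i records whether the selected quantity is the observed (1) or the
    missing (0) version. Returns (selected value, observed flag).
    """
    selected = r_m * x + (1 - r_m) * x_star
    return selected, r_i == 1


# A subject whose target value 4.0 is observed but mismeasured with error 0.5,
# under the unbiased parameterization alpha = (0, 1, 0).
x_star = classical_error(4.0, [], 0.0, 1.0, [], 0.5)
value, observed = imperfect_value(1, 0, 4.0, x_star)
```

Here `value` is the mismeasured 4.5 and `observed` is true, matching the pattern $(r^I, r^M) = (1, 0)$ in the example of the $k$th subject.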
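Consider the situation with $K$ subsets indexed by $k$; then equation 2.5.8 becomes

$$X^E_{ijk} = r^M_{ij}\left[ r^I_{ij} X_{ij,\mathrm{obs}} + (1 - r^I_{ij}) X_{ij,\mathrm{miss}} \right] + (1 - r^M_{ij})\left[ r^I_{ij} X^*_{ijk,\mathrm{obs}} + (1 - r^I_{ij}) X^*_{ijk,\mathrm{miss}} \right] \tag{2.5.9}$$

where $X^*_{ijk} = \alpha_{0jk} + \alpha_{1jk} X_{ij} + \alpha_{2jk}^T X_{i(-j)} + \epsilon_{ijk}$ and $E(\epsilon_{ijk} \mid X_{ij}) = 0$. An obvious benefit of this flexibility is that imperfection models need not be constant across all subjects. The indexing parameter $\alpha_{jk}$ and random error $\epsilon_{ijk}$ permit a different parameterization for different imperfection subsets. Consider the situation where a measurement tool is imprecise at both the lower end and the upper end of the measurement scale. In such situations, it may not be reasonable to assume that the structure of imprecision is identical at both extremes. It is observed that as $K \to n$ the imperfection model becomes a subject-level model rather than a group-level model. For simplicity, we will restrict ourselves to the case where the imperfection model is identical for all $i$ subjects (Equation 2.5.8).

Equation 2.5.8 provides much flexibility, but at this point we will focus on unbiased measurement error. Under this restriction $\alpha_j = (\alpha_{0j} = 0, \alpha_{1j} = 1, \alpha_{2j} = 0)$, which reduces equation 2.5.8 to

$$X^E_{ij} = r^M_{ij}\left[ r^I_{ij} X_{ij,\mathrm{obs}} + (1 - r^I_{ij}) X_{ij,\mathrm{miss}} \right] + (1 - r^M_{ij})\left[ r^I_{ij} X^*_{ij,\mathrm{obs}} + (1 - r^I_{ij}) X^*_{ij,\mathrm{miss}} \right] \tag{2.5.10}$$

where $X^*_{ij} = X_{ij} + \epsilon_{ij}$.

With the measurement error component of the imperfection model unbiased, the overall bias needs to be determined. The expectation of $X^E_{ij}$ given the target random variable yields

$$E(X^E_{ij} \mid X_{ij}, r_{ij}) = E\left( r^M_{ij}\left[ r^I_{ij} X_{ij,\mathrm{obs}} + (1 - r^I_{ij}) X_{ij,\mathrm{miss}} \right] + (1 - r^M_{ij})\left[ r^I_{ij} X^*_{ij,\mathrm{obs}} + (1 - r^I_{ij}) X^*_{ij,\mathrm{miss}} \right] \,\middle|\, X_{ij}, r_{ij} \right)$$

where $E(X_{ij,\mathrm{obs}} \mid X_{ij}, r_{ij}) = E(X_{ij,\mathrm{miss}} \mid X_{ij}, r_{ij})$ and $E(X^*_{ij,\mathrm{obs}} \mid X_{ij}, r_{ij}) = E(X^*_{ij,\mathrm{miss}} \mid X_{ij}, r_{ij})$, so

$$\begin{aligned}
E(X^E_{ij} \mid X_{ij}, r_{ij}) &= r^M_{ij}\, E(X_{ij} \mid X_{ij}, r_{ij}) + (1 - r^M_{ij})\, E(X^*_{ij} \mid X_{ij}, r_{ij}) \\
&= r^M_{ij} X_{ij} + (1 - r^M_{ij})\, E\left( \alpha_{0j} + \alpha_{1j} X_{ij} + \alpha_{2j}^T X_{i(-j)} + \epsilon_{ij} \,\middle|\, X_{ij}, r_{ij} \right)
\end{aligned}$$

and under the assumption of unbiased measurement error and $E(\epsilon_{ij} \mid X_{ij}) = 0$,

$$E(X^E_{ij} \mid X_{ij}, r_{ij}) = r^M_{ij} X_{ij} + (1 - r^M_{ij}) X_{ij} = X_{ij},$$

so the imperfection model, under the assumption that the measurement error is unbiased, is itself unbiased.

A final note concerns the covariance structure of $X$.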
We will not assume independence amongst the target random variables, so in general $\mathrm{Cov}(X_{ij}, X_{ik}) \neq 0$, and independence will not be assumed amongst the experimental random variables, so in general $\mathrm{Cov}(X^E_{ij}, X^E_{ik}) \neq 0$. We will assume that the error terms are independent across subjects and across covariates, so the error is only random noise generated without respect to any other mechanism under consideration. Furthermore, we will assume that the experimentally observable random variables are conditionally independent given the target variable and the indication that this is a measurement error problem.

2.5.4 Surrogate versus proxy: nondifferential and differential error

When a variable is mismeasured, surrogate and proxy are terms commonly used to describe $X^E$, the mismeasured, observed random variable, but there is evidence that these two terms should not be used interchangeably. Carroll [14] suggests that a surrogate is the measured substitute for the desired variable, but it gives no information beyond that which would have been given by the true variable. In contrast, the proxy yields information beyond that which the true variable would have given. This distinction in terminology reflects the two types of measurement error: nondifferential and differential error. More technically, the measurement error is said to be nondifferential when $X^E_i$ and $Y_i$ are conditionally independent given $X_i$, so that $p(X^E_i \mid X_i, Y_i) = p(X^E_i \mid X_i)$ [40]. If conditioning includes the indicator of imperfection, then the measurement error is nondifferential when $X^E_i$ and $(Y_i, R_i)$ are conditionally independent given the target $X_i$, so that $p(X^E_i \mid X_i, Y_i, R_i) = p(X^E_i \mid X_i)$.

2.5.5 Response model

The response model, $p(Y_i \mid x^A_i, \beta, \phi)$, will be developed in two stages. Firstly, we will consider response models that are members of the exponential family. Secondly, we will restrict ourselves to a particular member, the Bernoulli response model.

The exponential family is a rich source of probability models, such as the normal, binomial, and gamma, which has often been exploited for the development of statistical theory.
Some authors make a distinction between the types of members and identify single-parameter members as members of the exponential family, while two-parameter members are members of the exponential dispersion family [27]. For our purposes, we will focus on single-parameter members; thus future references to the exponential family will be within this context.

Assuming that the $i$th subject has a response from the exponential family conditional on the indexing parameters $\theta$ and $\phi$, the response $Y_i$ is distributed as

$$p(Y_i = y_i \mid x_i, \theta, \phi) = \exp\left\{ \frac{y_i \theta_i - b(\theta_i)}{a(\phi)} + c(y_i, \phi) \right\} \tag{2.5.11}$$

indexed by the canonical parameter $\theta$, which represents location, and the dispersion parameter $\phi$, which represents scale. Members of the exponential family are defined through the specification of the functions $a$, $b$ and $c$; for example, the binomial distribution has $a(\phi) = 1$, $b(\theta) = -m \log(1 - \pi)$, where $\pi$ indexes the binomial distribution and $\theta = \log\frac{\pi}{1 - \pi}$, and $c(y, \phi) = \log\binom{m}{y}$.

The form of the expectation and variance for the exponential family is well known [73]. The expectation of an exponential family random variable $Y_i$ is $E(Y_i) = b'(\theta)$, a function of $\theta$. The variance is a product of the location and the scale, $\mathrm{Var}(Y_i) = b''(\theta)\, a(\phi)$. For the exponential family, $b''(\theta)$ is called the variance function and describes how the variance of $Y_i$ is related to the mean.

The function $a(\phi)$ is commonly of the form $a(\phi) = \phi / w$, where $w$ is a known prior weight that varies from observation to observation and $\phi$ is constant over observations. For one-parameter members, such as Poisson and binomial, $\phi$ is fixed at one, but for two-parameter members, such as Normal or Gamma, $\phi$ is free. Given that we are restricting ourselves to one-parameter members, it will be assumed that $\phi = 1$. An implication of this restriction is that over- and under-dispersed models will not be considered.

At this point it is good to observe that the response model given in the likelihood is parameterized by $\beta$ and relates a set of covariates to the response (Equation 2.5.1).
The proposed exponential family response is indexed by $\theta$, which is the canonical location parameter. A generalized linear model allows for a natural connection between the parameter of interest $\beta$ and the canonical parameter of the exponential family, $\theta$. This relationship requires three pieces: a probability distribution, a systematic component, and a link function. The probability distribution has already been identified, so only the systematic component and the link function are needed.

The systematic component is a linear combination of the covariates. If the covariates were perfectly measured and complete, then the systematic component would be

$$\eta = X\beta \tag{2.5.12}$$

where $X$ is the $n \times p$ design matrix and $\beta$ is the $p \times 1$ indexing vector.

The link function, $g(\cdot)$, connects the random component given by the probability distribution of $Y$ and the systematic component, which is a linear combination of the covariates. In general, $g(\cdot)$ describes how the mean response, $E(Y) = \mu$, is linked to the covariates through the linear predictor. For the $i$th subject,

$$g(\mu_i) = \eta_i = x_i^T \beta.$$

The link function can be any monotone, continuous and differentiable function, but we will restrict ourselves to canonical links. Working with canonical links brings a level of simplicity to the problem; the canonical link arises when $\theta_i = g(\mu_i) = \eta_i$. An immediate benefit is that a sufficient statistic equal in dimension to $\beta$ in the linear predictor, $\eta = X\beta$, exists for canonical links [73].

The response of interest, which is also a common response in epidemiological research, is the Bernoulli random variable, $Y$. A typical goal is to find a functional relationship between the response and a set of explanatory variables $X$ indexed by $\beta$. We will consider an independent and identically distributed sample of $n$ subjects with the $i$th response denoted $Y_i$, $i = 1, \dots, n$.
A realization from the Bernoulli random variable $Y_i$ is

$$y_i = \begin{cases} 1 & \text{with probability } \pi_i \\ 0 & \text{with probability } (1 - \pi_i) \end{cases}$$

where $\pi_i$ is the probability of observing a success in the Bernoulli experiment. The associated probability mass function is

$$p(Y_i = y_i \mid \pi_i) = \pi_i^{y_i} (1 - \pi_i)^{1 - y_i}$$

with $E(Y_i) = \pi_i$ and $\mathrm{Var}(Y_i) = \pi_i(1 - \pi_i)$. Here $a(\phi) = 1$ when we let the weight equal the number of independent trials and assume no dispersion problems with the data. With $m = 1$, where $m$ is the number of trials in the Bernoulli experiment, and $\phi = 1$, we have $b(\theta) = -\log(1 - \pi)$, where $\pi$ indexes the binomial distribution and $\theta = \log\frac{\pi}{1 - \pi}$, and $c(y, \phi) = 0$. We see that the canonical link is $\theta = \log\frac{\pi}{1 - \pi}$, the logit function. The linkage between the random and the systematic components is

$$\eta_i = \log\frac{\pi_i}{1 - \pi_i}$$

thus

$$\begin{aligned}
p(Y_i = y_i \mid \pi_i) &= \pi_i^{y_i} (1 - \pi_i)^{1 - y_i} \\
&= \exp\left( y_i \log\frac{\pi_i}{1 - \pi_i} + \log(1 - \pi_i) \right) \\
&= \exp\left( y_i \eta_i - \log(1 + e^{\eta_i}) \right) \\
&= p(Y_i \mid x_i, \beta, \phi = 1).
\end{aligned} \tag{2.5.13}$$

2.5.6 Mechanism of imperfection

The final component of equation 2.5.1 is the model for the mechanism of imperfection, $p(R_i \mid x^A_i, y_i, \gamma)$. Modelling the joint probability distribution of the indicator vector is simplified through two observations. The first is that we can use the ideas from the conditional-conditional model for the joint distribution of the covariates and model the mechanism as a sequence of one-dimensional distributions [48]. With this approach we have

$$\begin{aligned}
p(R_i \mid x^A_i, y_i, \gamma) &= p(R_{ip} \mid r_{i1}, \dots, r_{i(p-1)}, x^A_i, y_i, \gamma_p) \\
&\quad \times p(R_{i(p-1)} \mid r_{i1}, \dots, r_{i(p-2)}, x^A_i, y_i, \gamma_{p-1}) \\
&\quad \times \cdots \times p(R_{i2} \mid r_{i1}, x^A_i, y_i, \gamma_2)\, p(R_{i1} \mid x^A_i, y_i, \gamma_1)
\end{aligned}$$

where $\gamma_j$ indexes the model for the $j$th indicator vector, $j = 1, \dots, p$.

As with the covariates, the above relationship only provides a modest simplification of the problem. The next step is to apply the conditioning within each of the $p$ conditional distributions of $R_{ij}$, $j = 1, \dots, p$. Unlike the covariate situation, the relationship between the random variables is less obvious. With the covariates, there was a perfect random variable and there was an imperfect random variable. The natural relationship between the two was motivated by the classical measurement error assumption.
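The chain of equalities in Equation 2.5.13 can be checked numerically. The sketch below, with hypothetical function names, evaluates the Bernoulli probability mass function in its usual form and in its canonical (logit) form and compares the two; the same form applies to any binary variable modelled through a logit link.

```python
import math


def logit(pi: float) -> float:
    """Canonical link for the Bernoulli model: eta = log(pi / (1 - pi))."""
    return math.log(pi / (1.0 - pi))


def bernoulli_pmf(y: int, pi: float) -> float:
    """p(Y = y | pi) = pi^y * (1 - pi)^(1 - y)."""
    return pi ** y * (1.0 - pi) ** (1 - y)


def bernoulli_pmf_canonical(y: int, eta: float) -> float:
    """Last line of the derivation: exp(y * eta - log(1 + e^eta))."""
    return math.exp(y * eta - math.log1p(math.exp(eta)))


# The two forms agree for any success probability strictly between 0 and 1.
for pi in (0.1, 0.5, 0.9):
    for y in (0, 1):
        assert math.isclose(bernoulli_pmf(y, pi),
                            bernoulli_pmf_canonical(y, logit(pi)),
                            rel_tol=1e-9)
```

The canonical form is the one a logistic regression uses in practice, with $\eta_i = x_i^T\beta$ substituted for `eta`.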
Here, the $j$th joint distribution of the indicator variables is between the indicator for missing data, $R^I_{ij}$, and the indicator for mismeasurement, $R^M_{ij}$. Without a clear guide, we have three situations to consider.

The first is a conditioning on the indicator of mismeasurement. This characterization of the joint probability model suggests that the probabilistic mechanism for missing data is dependent on the presence, or absence, of measurement error. If we instead choose to condition on the indicator of missing information, then it is suggested that the mechanism for mismeasurement is dependent on the observation of a realization from the associated random variable. The third situation is to assume independence, so that the presence of one type of imperfection has no bearing on the presence of the other. Although simplest, at this point the assumption of independence between the two mechanisms is not desirable, primarily because it can be considered a special case of either conditional specification. For the $i$th subject and $j$th covariate, the associated models are

$$p(R_{ij} \mid r_{i1}, \dots, r_{i(j-1)}, x^A_i, y_i, \gamma_j) = p(R^I_{ij} \mid r^M_{ij}, r_{i1}, \dots, r_{i(j-1)}, x^A_i, y_i, \gamma^I_j) \times p(R^M_{ij} \mid r_{i1}, \dots, r_{i(j-1)}, x^A_i, y_i, \gamma^M_j)$$

or, for the second conditional situation,

$$p(R_{ij} \mid r_{i1}, \dots, r_{i(j-1)}, x^A_i, y_i, \gamma_j) = p(R^M_{ij} \mid r^I_{ij}, r_{i1}, \dots, r_{i(j-1)}, x^A_i, y_i, \gamma^M_j) \times p(R^I_{ij} \mid r_{i1}, \dots, r_{i(j-1)}, x^A_i, y_i, \gamma^I_j).$$

Although the first will be used for expository purposes, there are subtle contextual reasons for this choice. Recall that the first model suggests that missingness depends on mismeasurement, whereas the second suggests that mismeasurement depends on missingness.

Intuitively, the first seems more plausible, but why? Consider the second situation. The conditional distribution is conditional on the observational status: observed or not observed. If the realization is not observed, then there is no observed information about the measurement error mechanism for the $i$th subject. The only information we would have about the mismeasurement mechanism would be from the observed realizations. To apply this model, we would have to assume that both the observed and the unobserved would be subject to the same measurement error model.
The first situation is appealing at this point.

A motivating illustration is to consider the situation where obtaining perfect measurements is difficult, but obtaining imperfect measurements is easy. If it is difficult to observe a perfect realization of the random variable, there may be a high probability that the perfect realization will not be observed. If it is easy to obtain an imperfect realization of the random variable, then there may be a high probability of observing the realization. This characterization of the mechanism of imperfection still allows for multiple types of measurement error models for a single random variable. It is for these reasons that the first characterization will be used, with the recognition that the other characterization is not inferior. Conditioning on the indicator of mismeasurement yields

$$\begin{aligned}
p(R_i \mid x^A_i, y_i, \gamma) &= p(R^I_{ip} \mid r^M_{ip}, r_{i1}, \dots, r_{i(p-1)}, x^A_i, y_i, \gamma^I_p)\, p(R^M_{ip} \mid r_{i1}, \dots, r_{i(p-1)}, x^A_i, y_i, \gamma^M_p) \\
&\quad \times p(R^I_{i(p-1)} \mid r^M_{i(p-1)}, r_{i1}, \dots, r_{i(p-2)}, x^A_i, y_i, \gamma^I_{p-1})\, p(R^M_{i(p-1)} \mid r_{i1}, \dots, r_{i(p-2)}, x^A_i, y_i, \gamma^M_{p-1}) \\
&\quad \times \cdots \times p(R^I_{i2} \mid r^M_{i2}, r_{i1}, x^A_i, y_i, \gamma^I_2)\, p(R^M_{i2} \mid r_{i1}, x^A_i, y_i, \gamma^M_2) \\
&\quad \times p(R^I_{i1} \mid r^M_{i1}, x^A_i, y_i, \gamma^I_1)\, p(R^M_{i1} \mid x^A_i, y_i, \gamma^M_1)
\end{aligned} \tag{2.5.14}$$

where $\gamma^v_j$ is the indexing vector for the $j$th indicator vector and the $v$th imperfection, $v \in \{I, M\}$, and $\gamma^T = (\gamma^{I\,T}_1, \dots, \gamma^{I\,T}_p, \gamma^{M\,T}_1, \dots, \gamma^{M\,T}_p)$. Since $R^v_{ij}$ is binary, we can use a sequence of logistic regressions for equation 2.5.14. With binary indicators, we can use the previous exposition about the response model, substituting the random variable $R^v_{ij}$ for $Y_i$. Here we will apply the same assumptions as with the response; thus we have

$$p(R^I_{ij} \mid r^M_{ij}, r_{i1}, \dots, r_{i(j-1)}, x^A_i, y_i, \gamma^I_j) = \exp\left( r^I_{ij} \eta^I_{ij} - \log\left(1 + e^{\eta^I_{ij}}\right) \right)$$

and

$$p(R^M_{ij} \mid r_{i1}, \dots, r_{i(j-1)}, x^A_i, y_i, \gamma^M_j) = \exp\left( r^M_{ij} \eta^M_{ij} - \log\left(1 + e^{\eta^M_{ij}}\right) \right)$$

where $\eta^I_{ij} = [x^A_i, y_i, r^M_{ij}, r_{i(-j)}]\, \gamma^I_j$ is a linear combination of the realizations and the indicators on which conditioning is predicated for the indication of missing data model, $r_{i(-j)} = (r_{i1}, \dots, r_{i(j-1)})$ is the vector of indicators with the $j$th through $p$th indicators removed, and $\eta^M_{ij} = [x^A_i, y_i, r_{i(-j)}]\, \gamma^M_j$ is the linear combination of the realizations and the indicators on which conditioning is predicated for the indication of mismeasurement model.

As indicated with the covariate model, this method greatly reduces the number of nuisance parameters that need specification compared with a joint log-linear specification [95]. Furthermore, we have the advantage that it provides a natural approach to the specification of the joint probability distribution of the imperfection indicators.

2.5.7 Missing data

As it was natural to discuss the measurement error model in the context of the covariate model, the model for the missing data mechanism provides a natural context for discussing the types of missing data. There are three types of missing data: missing completely at random (MCAR), missing at random (MAR), and non-ignorable missing (NMAR) data. These three classes of missing data will follow the original descriptions as given by Rubin [90] and their interpretation by Ibrahim [46, 47].

2.5.7.1 Missing completely at random

Data are said to be missing completely at random (MCAR) when the inability to observe a realization from a random variable does not depend on any data, either observed or unobserved. In our situation, the response is completely observed whereas some components of the $i$th realization $x_i$ are missing. For example, if $x_{ij}$ is missing, then we say it is missing completely at random if the probability of observing a realization of $X_{ij}$ is independent of the response $y_i$ and of $x_i$. From a model perspective, if $\gamma^I_{j(0)} = 0$, where $\gamma^I_{j(0)}$ is the indexing parameter vector of the logistic regression model for the $j$th indicator variable associated with the missing information component of imperfection with the intercept term removed, then the data is missing completely at random [47]. Under the MCAR assumption, the observed data is a random sample of all the data.
When imperfection is restricted to missing data, a complete case analysis will lose efficiency, but will not introduce bias into the parameter estimates.

2.5.7.2 Missing at random

Data are said to be missing at random (MAR) if, conditional on the observed data, the inability to observe a realization from a random variable does not depend on the data that are not observed. Viewed another way, the inability to observe a realization may depend on the observed data. This conditional dependence does not prevent the unconditional probability of not observing a realization from depending on the unobserved data. As with MCAR data, assume that the response is completely observed and that some components of the $i$th realization $x_i$ are missing. The missing values of $x_i$ are MAR if, conditional on the observed data, the probability of observing $x_i$ is independent of the values of $x_i$ that would have been observed, where the observed values include both the response and the observed values of $x_i$.

It has been observed that MAR is a much more reasonable assumption than MCAR, but adjustments should be made because the observed realizations are no longer a random sample of the full sample [46]. A complete case analysis can result in inefficient and biased estimates of the parameters. More precisely, if the probability of not observing a realization depends on the observed covariates and not on the response, then a complete case analysis will result in unbiased estimates, but if it depends on at least the response, then a complete case analysis will result in biased estimates [67]. Little and Rubin's assessment, to which Ibrahim refers, is made in the context of a regression model. The unbiased property when the response is not part of the missing data model stems from the fact that both the regression model and the missing data model are conditional on the covariates [36].

For logistic regression, the effect of having data which is missing at random is less conclusive than with linear regression.
Vach and Blettner [104] observed for case-control studies under the MAR assumption that a complete case analysis may result in biased estimation when the exposure (covariate) and disease status (outcome) are variables in the missing data mechanism. Furthermore, Vach and Blettner comment that the bias for MAR data which involves both the exposure and the disease state can range from small to rather large. Greenland and Finkle [38] demonstrate in a series of limited simulation studies that logistic regression with the missing at random mechanism depending on the outcome can produce parameter estimates with negligible bias. These two studies indicate that missing at random data for logistic regression models may not operate as expected from linear regression.

If we consider the situation where there is a single covariate which suffers from missing data, $x_{ij}$, then from a modelling point of view, if the coefficient $[\gamma^I_j]_{x_j}$, which is the coefficient for the variable $x_j$ in the vector of parameters which indexes the missing data component of the $j$th imperfection model, is 0, then the missing data mechanism does not depend on $x_j$; in this situation, the missing data are missing at random [47].

2.5.7.3 Non-ignorable missing data

Data are said to be non-ignorable missing if the probability of observing a realization from a random variable depends on the value of the realization that would have been observed. As with the previous examples, assume that some components of $x_i$ are unobservable and that the response is always observed. Here, if the probability of observing realizations, conditional on the observed data, depends on the missing values that would have been observed from the components of $x_i$, then the missing data is non-ignorable.
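The three classes of missing data can be read off the coefficients of a logistic model for the missing data indicator $R^I$. The following is a minimal sketch for a single covariate $x$ that is subject to missingness and a fully observed binary response $y$. The function names, the standard normal covariate, the logistic response model, and all coefficient values are assumptions made only for illustration; they are not the simulation design of the thesis.

```python
import math
import random


def expit(eta: float) -> float:
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-eta))


def prob_observed(x, y, gamma0, gamma_x, gamma_y):
    """Pr(R^I = 1 | x, y) under an assumed logistic missingness model."""
    return expit(gamma0 + gamma_x * x + gamma_y * y)


def mechanism(gamma_x, gamma_y):
    """Classify the mechanism from the non-intercept coefficients."""
    if gamma_x != 0:
        return "NMAR"   # depends on the value that may be missing
    if gamma_y != 0:
        return "MAR"    # depends only on fully observed data
    return "MCAR"       # depends on no data at all


def observed_mean(gamma_x, gamma_y, n=20000, seed=1):
    """Mean of x among complete cases; x ~ N(0, 1), so the full-data mean is 0."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1 if rng.random() < expit(x) else 0   # assumed response model
        if rng.random() < prob_observed(x, y, 0.0, gamma_x, gamma_y):
            kept.append(x)
    return sum(kept) / len(kept)
```

Comparing `observed_mean(0.0, 0.0)` with `observed_mean(2.0, 0.0)` shows the complete cases remaining representative of $x$ in the first case and being shifted toward large values of $x$ in the second. Whether such a shift translates into biased regression coefficients is a separate question that depends on the role of the response in the mechanism.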
Under non-ignorable missingness, a complete case analysis, when considering only missing data imperfection, leads to unbiased estimates of the parameter if the conditional probability depends only on the missing covariate and not the response. Valid inference requires the correct model specification of the mechanism of missingness and/or the specification of the distribution of $X_i$ [46, 67]. Finally, if $\gamma_j \neq 0$ then the missing data mechanism is not ignorable [47, 90].

2.5.8 Missing Data and Mismeasurement as Special Cases

Throughout the development of the generalized imperfect variable model, examples have been given which highlight the flexibility and complexity of the framework. As a counterpoint to these examples, we will now consider simplifying assumptions which illustrate that the missing data and mismeasurement frameworks are now special cases. For this, equations 2.5.8 and 2.5.14 are of primary interest.

For missing data, $R^M = 1$ for all subjects, so the imperfect variable reduces to
\[ X^E_{ij} = r^I_{ij} X_{ij,obs} + (1 - r^I_{ij}) X_{ij,miss}. \]
If the realization is observed, then $X^E_{ij} = X_{ij,obs}$ and we observe a realization from the true random variable, but if it is not observed then $X^E_{ij} = X_{ij,miss}$, which is the random variable that would have generated the realization if it had been observable. The model for the mechanism of imperfection reduces to
\[ p(R_i \mid x_i, y_i, \gamma) = p(R^I_{ip} \mid r^M_{ip} = 1, r_{i1}, \ldots, r_{i(p-1)}, x_i, y_i, \gamma_p) \times \cdots \times p(R^I_{i2} \mid r^M_{i2} = 1, r_{i1}, x_i, y_i, \gamma_2)\, p(R^I_{i1} \mid r^M_{i1} = 1, x_i, y_i, \gamma_1), \]
which is identical to Ibrahim's proposal for a likelihood based approach to handling missing data within a generalized linear model framework [48].

For measurement error, it is often assumed that all realizations from $X$ are unobservable. Instead a surrogate is used, and realizations from $X^E$, which is functionally related to $X$, are observed. In this scenario $R^I = 1$ and $R^M = 0$ for all subjects. The generalized imperfect variable reduces to
\[ X^E_{ij} = \alpha_0 + \alpha_1 X_{ij} + \alpha_2^T X_{i(-j)} + \epsilon, \]
which is Carroll's proposed classical measurement error [14]. In this situation, the model for the mechanism of imperfection is $p(R_{ij} \mid r_{i1}, \ldots, r_{i(j-1)}, x_i, y_i, \gamma) = 1$ for all subjects. Since all the observations are observable and all realizations are mismeasured with probability one, the probability distribution is a point mass on the event $R_{ij} = (1, 0)$ for all subjects.

If the restriction on mismeasurement is relaxed, an application of the conditional-conditional method for measurement error results. Consider the situation where only $R^I = 1$,
\[ X^E_{ij} = r^M_{ij} X_{ij,obs} + (1 - r^M_{ij}) \ldots \]
The model for the mechanism of imperfection reduces to
\[ p(R_{ij} \mid x_i, y_i, \gamma) = p(R^I_{ij} \mid r^M_{ij}, x_i, y_i, \gamma^I)\, p(R^M_{ij} \mid x_i, y_i, \gamma^M), \]
which is Ibrahim's conditional-conditional method applied to a reformulated mismeasurement problem. It does not require all realizations from $X_i$ to be mismeasured.

Chapter 3

Maximum likelihood estimation via the Expectation-Maximization algorithm

The Expectation-Maximization algorithm, or EM algorithm, is a broadly applicable algorithm that provides an iterative methodology for obtaining maximum likelihood estimates in situations where straightforward estimation of the maximum likelihood estimate is frustrated by data complexity such as incomplete information. In this chapter, we will briefly recount the history of the EM algorithm, consider the rationale for and critiques against its use, present a formulation of the EM algorithm, translate the EM algorithm for generalized imperfect variables, provide a Monte Carlo based method for obtaining the EM algorithm based maximum likelihood estimates, and finally obtain standard error estimates for the parameters of primary interest.

3.1 Brief history of the EM algorithm

The earliest manifestation of an EM-type algorithm was in 1886. The central problem was parameter estimation of a mixture of two univariate normal distributions [74]. The EM-type algorithm in this instance was used to incorporate observations for which the associated errors were abnormally large into the estimation of the parameter, rather than discarding these observations as outliers and estimating parameters based on the reduced set of observations.
These outliers are affected by some error, termed the evil of a value, that is suspiciously similar, in some respects, to measurement error problems reformulated as missing data problems [81].

Following this early example of the EM algorithm, other examples of EM-type and EM algorithms in the statistical literature were considered. The number of these grew over the course of ninety years. In 1977, Dempster, Laird, and Rubin proposed a general formulation, known as the Expectation-Maximization algorithm, of which all predecessors are particular examples [22]. Prior to their formulation, a few precursors to the Dempster, Laird, and Rubin formulation of the EM algorithm should be noted.

In 1958, Hartley's approach for the general case of count data set forth the basic ideas of the EM algorithm. Two years later, Buck [8] considered the estimation of the p-dimensional mean vector and the associated covariance matrix when some of the observations are missing. Suggesting an imputation method based on regressing the missing variables on the observed ones, the parameters are estimated on a synthetic "complete" data set. Under certain conditions, this approach yields the maximum likelihood estimators of parameters for the exponential family and has the basic components of the EM algorithm.

Convergence results for EM-type algorithms emerged in the late 1960s, and an inter-related series of papers forms the basis of the present-day formulation of the EM algorithm [4-6]. In 1972, the Missing Information principle was introduced; it is in the spirit of, and related to, the basic ideas of the EM algorithm [83]. Here, the relationship between the complete-data and incomplete-data log-likelihood functions leads to the conclusion that the maximum likelihood estimator is a fixed point of a particular transformation.
Contemporaneous with these results was the proposal of the Self-Consistency principle by Efron [24].

Turnbull uses Efron's Self-Consistency idea for nonparametric estimation of a survivorship function with doubly censored data [102]. Here, Efron's ideas are extended to show the equivalence between Efron's self-consistency and the nonparametric likelihood equations. Convergence is proven for these EM-like algorithms. Two years later, he deals with the empirical distribution in the context of grouped, censored, and truncated data [103]. It is here that Turnbull derives a version of the EM algorithm and notes that this approach can be used not only for missing data but also for truncated data. Just prior to Dempster, Laird, and Rubin's seminal paper proposing the present version of the EM algorithm, work also appeared addressing mixture distributions [20] and an iterative mapping for incomplete-data problems predicated on exponential families, now known as the Sundberg formulas [96, 97].

Since Dempster, Laird, and Rubin's paper, there have been many applications of the EM algorithm. The algorithm has since become a staple in many statisticians' methodological toolbox. Although there is a high level of utility to the algorithm, two issues need to be addressed: interpretation and extensions of the EM algorithm. McLachlan and Krishnan [74] suggest two interpretations. The first comes from problems involving mixture distributions. In these problems, the EM algorithm naturally emerges from the particular forms taken by the derivatives of the log-likelihood function, of which Day's paper is an example [20]. The second interpretation is taken by viewing a complex problem as an incomplete-data problem with an associated complete-data problem.
If well formulated, there is a natural connection between the two likelihood functions which the EM algorithm can exploit. This interpretation is in the spirit of the Missing Information principle, and is the one taken in this thesis [83].

The work surrounding the EM algorithm has continued to grow since Dempster, Laird, and Rubin generalized the approach in 1977. A detailed account of the convergence properties of the EM algorithm was given by Wu [114]. In particular, Wu shows that convergence of $L(\Psi^{(k)})$ to $L(\Psi^*)$ does not necessarily imply the convergence of $\Psi^{(k)}$ to $\Psi^*$, where $\Psi$ is the vector of parameters. That is, the convergence of the likelihood evaluated using the $k$th iterative estimate of the parameter to the likelihood evaluated using the unique maximizer, $L(\Psi^*)$, does not necessarily imply that the $k$th iterative estimate of the parameter converges to the unique maximizer $\Psi^*$. Almost a decade later, Lansky, Casella, McCulloch and Lansky establish some invariance and convergence results in addition to considering the rate of convergence for the EM algorithm [60].

Laird [57], in work on maximum likelihood estimation in survival/sacrifice experiments, shows the equivalence of Efron's Self-Consistency principle [24] and Orchard and Woodbury's Missing Information principle [83]. Furthermore, Laird also shows that with parametric exponential families, these two principles have the same mathematical basis as the Sundberg formulas, and establishes the self-consistency algorithm as a special case of the EM algorithm.

Despite the popularity of the EM algorithm, it is not without its problems. An initial criticism, and probably one of the biggest impediments to its use, is its inability to automatically produce an estimate of the covariance matrix of the maximum likelihood estimate. Early on, Louis [70] developed a method which yields the observed information matrix in terms of the gradient and curvature of the complete log-likelihood function.
In doing so, Louis, by providing a method for obtaining the second moments, deepens the connections between the Dempster, Laird, and Rubin EM algorithm and Fisher's [28] observation that the incomplete-data score statistic is the conditional expectation of the complete-data score statistic given the incomplete data.

Since Louis, others have proposed methods for obtaining covariance estimates. Meilijson [75] proposed a numerical approach which uses components of the E- and M-steps of the algorithm. This method avoids the computation of the second derivatives required by Louis' method. Furthermore, it was shown that single-observation scores for the incomplete-data model can be obtained as a by-product of the E-step. Meng and Rubin [76] proposed a method which produces a numerically stable estimate of the asymptotic covariance of the EM estimate using only the code of the EM algorithm itself and standard matrix operations. This approach is called the Supplemented EM algorithm (SEM). A negative aspect of the SEM algorithm is that accurate estimates of the parameters are needed, causing the SEM to be potentially algorithmically expensive. Furthermore, it has been suggested that the SEM suffers from numerical inaccuracy, which is a considerable drawback for an algorithmic or numerical method [3, 93]. A recent proposal comes from Oakes [82]. Oakes proposes a computation of the standard error by deriving a formula for the observed information matrix, using an explicit expression for the second derivative of the observed-data log-likelihood in terms of the derivatives of the conditional expectation of the complete-data log-likelihood given the observed data.

Two secondary issues with the EM algorithm are the rate of convergence and the potential for high dimensional integration in the E-step. An interesting feature of the Dempster, Laird, and Rubin [22] paper is that keys for addressing acceleration and identifying problems with the likelihood surfaces are buried within.
Louis [70] not only suggested a widely adopted approach to obtaining the covariance matrix of the EM estimates, but also proposed a method to accelerate the convergence itself. His proposal uses the multivariate generalization of the Aitken acceleration procedure which, when applied to the EM problem, is essentially equivalent to using the Newton-Raphson method to find a zero of the incomplete-data score statistic. Jamshidian and Jennrich [49] propose a gradient approach for accelerating convergence, as does Lange [58, 59]. Meng and van Dyk [77] consider acceleration of the algorithm by introducing a working parameter in the specification of the complete-data likelihood. This working parameter indexes a class of EM algorithms, thus a parameter can be specified which accelerates the EM algorithm without affecting either the stability or the simplicity of the algorithm. Parameter Expanded Expectation Maximization (PX-EM) expands the parameter space over which maximization occurs and often results in accelerating the convergence of the EM algorithm [69].

High dimensional integration, although interesting, can pose serious analytic challenges even in what may be conceived as the simplest of problems. Wei and Tanner [109] use Monte Carlo integration for the E-step and propose the Monte Carlo EM algorithm (MCEM). Ibrahim [44] uses the MCEM approach combined with Gilks and Wild's [35], and Wild and Gilks' [111], adaptive rejection sampling for the Gibbs sampler in order to obtain EM estimates with missing data for generalized linear models.

3.2 Rationale for and critiques against the use of the EM algorithm

A rational application of statistical methodology is best done acknowledging both the reasons for a method and the critiques against it. The EM algorithm has high utility and an inherent simplicity, yet it has limitations and failings. McLachlan and Krishnan [74] provide a useful summary of reasons for and against the use of the EM algorithm.
This list will not be reproduced, but for completeness, a summary of the reasons will be presented. The authors identify ten reasons why the EM algorithm has such widespread appeal.

• The EM algorithm is numerically stable, increasing the likelihood with each iteration until it reaches a stationary point.
• Under general conditions, the EM algorithm has reliable global convergence. Given an arbitrary starting point, $\Phi^{(0)}$, in the parameter space, convergence to a local maximizer almost always occurs.
• In general, the EM algorithm is easily implemented. The E-step involves taking the expectations over the complete-data likelihood conditional on the observed data, and the M-step involves complete-data maximum likelihood estimation.
• The estimation in the M-step is frequently a standard one, thus it is possible to use standard software packages when the maximum likelihood estimates are not available in closed form.
• The analytic work is often simpler than for other methods because only the conditional expectation of the complete-data log-likelihood needs maximization. The counterpoint is that the E-step may require non-trivial analytic work.
• Programming of the EM algorithm is generally easy since there is no evaluation of the likelihood itself nor of any of its derivatives.
• In general, it is inexpensive in terms of computer resources since estimation does not involve the storage of large data structures such as the information matrix. Furthermore, since there is low cost per iteration, even a large number of iterations can prove to be less expensive computationally than other methods.
• Monitoring convergence is easy since each iterative step increases the value of the likelihood.
• The EM algorithm is useful in providing estimates for missing data problems.

A further reason in favour of the EM algorithm, specific to the context of missing data and logistic regression, is the recommendation by Vach and Schumacher [105] of either the EM algorithm for maximum likelihood estimation, following the methodology of Ibrahim [43], or the pseudo maximum likelihood approach.

Although there are compelling reasons to use the EM algorithm, it is not without its faults. As indicated in the previous section, the EM algorithm does not naturally provide an estimate of the covariance matrix of the parameter estimates. Although this is a failing, it was shown that work has been done to address this shortcoming. The counterpoint to the EM algorithm's low cost of computer resources for a single iteration is that it may take a long time to converge. This is a particular problem when there is a large amount of "missing information" [74]. Even though the EM algorithm will eventually converge, it may only converge to a local maximum. This is not a specific failure of the EM algorithm, but a more general failure of optimization algorithms. Finally, the E-step may be intractable, thus more complicated methods such as MCEM will be required. Here the simplicity of the EM algorithm is muted by the complexity of Monte Carlo integration; additionally, the computational complexity and cost will increase.

3.3 Formulation of the EM algorithm

3.3.1 EM algorithm

In this section, we will first consider the EM algorithm without reference to a specific likelihood, then we will consider the EM algorithm in the case of the exponential family.
Finally, we will briefly discuss the mapping induced by the EM algorithm and its role.

The EM algorithm is a commonly used iterative procedure for finding maximum likelihood parameters when part of the data is missing or, more generally, for finding maximum likelihood estimates in situations where maximum likelihood estimation would be straightforward but for the additional complexity of incomplete information. The objective is to obtain maximum likelihood type estimates, with nice statistical properties, that would be obtained if the data were perfect. The EM algorithm is less an algorithm and more a two-step general principle. The first step, the E-step, involves taking the conditional expectation of the complete likelihood given the observed data. The second step, the M-step, involves maximizing the conditional expectation with respect to the indexing parameter. We then return to the E-step, treating the parameter estimates obtained in the previous M-step as part of the observed data. In this manner, an iterative process evolves which, under some predetermined stopping rule, yields an EM based maximum likelihood estimate.

At this point, we will deviate from the standard notation commonly used for presenting the basic formulation of the EM algorithm. The fundamental reason for this is to minimize the introduction of superfluous notation and to provide a natural bridge between the proposed imperfect variable framework and the EM algorithm. Recall that, for the $i$th subject, the universal random vector is
\[ X^U_i = (X_i, X^E_i, R^I_i, R^M_i) = (X^A_i, R_i), \]
where $X$ is the true or target random variable, $X^E$ is the experimentally observable version of $X$, and $R = (R^I, R^M)$ is the indicator vector for missing information and mismeasurement respectively.
The likelihood of equation 2.5.4 is the product of three conditional distributions: the joint probability distribution of the indicator vector, the probability density of the response, and the joint probability distribution of the covariate vector $(X, X^E)$. Furthermore, recall that if all the data were observable, then the perfectly observed log-likelihood function is
\[ l_c(\Phi \mid x^U_i, y_i) = l(\Phi \mid x_i, y_i) = \log p(y_i \mid x_i, \beta) + \log p(x_i \mid \psi), \]
and the experimentally observable likelihood is given by
\[ l(\Phi \mid x^E_i, r_i, y_i) = \int \cdots \int_{\mathcal{X}^U} \ldots \, dX_{i1}\, dX_{i2} \cdots \tag{3.3.1} \]
where $\mathcal{X}^U$, for the $i$th subject which suffers from imperfection, is the union of the variable space associated with the target covariates and the variable space associated with the experimentally observable covariates. Furthermore, we assume that the imperfect covariates for which both problems do not coexist in the same covariate are the first j covariates in the data set, and that the imperfect covariates for which both problems coexist within the same covariate are the following r covariates. It is clear that all the unobservable information is integrated out, so that all that remains are the experimentally observed realizations from $X^E$.

The EM algorithm proceeds iteratively in terms of the complete-data log-likelihood function. Since the complete-data log-likelihood is unobservable, it is replaced by the conditional expectation given the observed data and the current estimate of the parameter, $\Phi^{(t)}$. To begin, specify some initial value, $\Phi^{(0)}$, for $\Phi$.
For the first iteration of the EM algorithm, the E-step requires the calculation of the expectation of the complete log-likelihood given the observed data and the initial parameter estimate, $\Phi^{(0)}$. For the $i$th subject this is
\[ Q_i(\Phi \mid \Phi^{(0)}) = E\left[ l_c(\Phi \mid y_i, x^U_i) \mid \Phi^{(0)}, x^E_i, y_i, r_i \right], \tag{3.3.2} \]
thus over all subjects, the expectation, denoted $Q(\Phi \mid \Phi^{(0)})$, is
\[ Q(\Phi \mid \Phi^{(0)}) = \sum_{i=1}^{n} Q_i(\Phi \mid \Phi^{(0)}), \tag{3.3.3} \]
where the expectation is taken with respect to the probability distribution of the missing information conditional on the observed information and the most recent update of the parameter estimates, $p(x^A \mid \Phi^{(0)}, x^E, y, r)$.

The M-step requires the maximization of $Q(\Phi \mid \Phi^{(0)})$ with respect to $\Phi$ over the parameter space. The parameter $\Phi^{(1)}$ is chosen such that $Q(\Phi^{(1)} \mid \Phi^{(0)}) \geq Q(\Phi \mid \Phi^{(0)})$ for all $\Phi$ in the parameter space. Alternately, this can be expressed as $\Phi^{(1)} = \arg\max_{\Phi} Q(\Phi \mid \Phi^{(0)})$.

The subsequent step replaces $\Phi^{(0)}$ with $\Phi^{(1)}$. The E- and M-steps are repeated, but this time with $\Phi^{(1)}$ in place of $\Phi^{(0)}$. For the $(t+1)$th iteration the E-step is
\[ Q(\Phi \mid \Phi^{(t)}) = \sum_{i=1}^{n} E\left[ l_c(\Phi \mid y_i, x^U_i) \mid \Phi^{(t)}, x^E_i, y_i, r_i \right]. \tag{3.3.4} \]
For the M-step, we choose $\Phi^{(t+1)}$ in the parameter space such that it maximizes $Q(\Phi \mid \Phi^{(t)})$, that is
\[ Q(\Phi^{(t+1)} \mid \Phi^{(t)}) \geq Q(\Phi \mid \Phi^{(t)}) \tag{3.3.5} \]
for all $\Phi$ in the parameter space, or alternately,
\[ \Phi^{(t+1)} = \arg\max_{\Phi} Q(\Phi \mid \Phi^{(t)}). \tag{3.3.6} \]

The E- and M-steps are repeated in an iterative fashion until the predetermined stopping criterion is satisfied. One of three choices is commonly used, two of which are based on the estimate of the parameter, the other being based on the likelihood. The first criterion is to allow the EM algorithm to iterate until the distance between lagged estimates of the parameter is less than some value, $\|\Phi^{(t+l)} - \Phi^{(t)}\| < \delta$, where $l$ is the chosen lag and $\delta$ is some arbitrarily small amount [65, 70]. The second criterion is to use only a subset of the parameters; for example, require that the distance between lagged estimates of the parameter vector of interest be less than some value, $\|\beta^{(t+l)} - \beta^{(t)}\| < \delta$. The third criterion is to require that the distance between lagged estimates of the likelihood be less than some value, $|L(\Phi^{(t+l)}) - L(\Phi^{(t)})| < \delta$ [74].
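The iteration and the first stopping criterion can be sketched on a toy problem. The example below is not the imperfect variable model of this thesis; it assumes a two-component mixture of unit-variance normal distributions, echoing the 1886 problem mentioned in Section 3.1, for which both steps are available in closed form. The function names, the starting value, the lag of one, the max-norm distance and the tolerance `delta` are all choices made for the illustration. The likelihood is evaluated at each step only to monitor its behaviour, not to decide when to stop.

```python
import math
import random


def normal_pdf(x, mu):
    """Density of N(mu, 1) evaluated at x."""
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)


def log_likelihood(data, theta):
    """Observed-data log-likelihood of the two-component mixture."""
    pi, mu1, mu2 = theta
    return sum(
        math.log(pi * normal_pdf(x, mu1) + (1.0 - pi) * normal_pdf(x, mu2))
        for x in data
    )


def em_step(data, theta):
    """One E-step followed by one M-step."""
    pi, mu1, mu2 = theta
    # E-step: conditional probability that each point came from component 1.
    weights = []
    for x in data:
        a = pi * normal_pdf(x, mu1)
        b = (1.0 - pi) * normal_pdf(x, mu2)
        weights.append(a / (a + b))
    # M-step: closed-form maximizers of the expected complete-data log-likelihood.
    total = sum(weights)
    n = len(data)
    new_mu1 = sum(w * x for w, x in zip(weights, data)) / total
    new_mu2 = sum((1.0 - w) * x for w, x in zip(weights, data)) / (n - total)
    return (total / n, new_mu1, new_mu2)


def run_em(data, theta0, delta=1e-8, max_iter=1000):
    """Iterate until the lag-one change in the estimates is below delta."""
    theta = theta0
    trace = [log_likelihood(data, theta)]
    for _ in range(max_iter):
        new_theta = em_step(data, theta)
        trace.append(log_likelihood(data, new_theta))
        change = max(abs(a - b) for a, b in zip(new_theta, theta))
        theta = new_theta
        if change < delta:
            break
    return theta, trace


rng = random.Random(1)
data = [rng.gauss(-2.0, 1.0) if rng.random() < 0.3 else rng.gauss(2.0, 1.0)
        for _ in range(2000)]
theta_hat, trace = run_em(data, theta0=(0.5, -1.0, 1.0))
print("estimates (pi, mu1, mu2):", tuple(round(v, 3) for v in theta_hat))
print("iterations:", len(trace) - 1)
```

Replacing the test on `change` with a test on successive values of `trace` would give the likelihood-based criterion, at the cost of requiring the likelihood at every step.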
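It is good to note that, regardless of the criterion chosen, this is only a measure of lack of progress in the change of the parameter estimate or in the likelihood, and not a measure of distance to the true maximum value of the likelihood.

Dempster, Laird, and Rubin [22] showed that the likelihood function does not decrease after an EM iteration; that is
\[ L(\Phi^{(t+1)}) \geq L(\Phi^{(t)}) \tag{3.3.7} \]
for $t = 0, 1, 2, \ldots$, thus convergence is obtained with a sequence of likelihood values that are bounded above when the stationary point is reached. This is a strong motivation to use a likelihood based stopping criterion. The counterpoint is that the potential complexity of computing the likelihood negates one of the advantages of using the EM algorithm. In response to this, a lagged distance between EM based estimates is commonly used.

3.3.2 EM algorithm and imperfect variables

Although we have begun translating the EM algorithm for generalized imperfect variables, we still need to finish the translational process by parsing the expectation function into several components. We begin with equation 3.3.4,
\[
\begin{aligned}
Q(\Phi \mid \Phi^{(t)}) &= \sum_{i=1}^{n} E\left[ l_c(\Phi \mid y_i, x^U_i) \mid \Phi^{(t)}, x^E_i, y_i, r_i \right] \\
&= \sum_{i=1}^{n} \left[ \Big( \prod_{j=1}^{p} r^I_{ij} r^M_{ij} \Big) l_c(\Phi \mid y_i, x^U_i) + \Big( 1 - \prod_{j=1}^{p} r^I_{ij} r^M_{ij} \Big) E\left[ l_c(\Phi \mid y_i, x^U_i) \mid \Phi^{(t)}, x^E_i, y_i, r_i \right] \right] \\
&= \sum_{i=1}^{n} \left[ \Big( \prod_{j=1}^{p} r^I_{ij} r^M_{ij} \Big) l_c(\Phi \mid y_i, x^U_i) + \Big( 1 - \prod_{j=1}^{p} r^I_{ij} r^M_{ij} \Big) \int \cdots \int_{\mathcal{X}^U} l_c(\Phi \mid y_i, x^U_i)\, p(x^A_i \mid \Phi^{(t)}, x^E_i, r_i, y_i)\, dx^U_i \right]
\end{aligned}
\tag{3.3.8}
\]
where $dx^U_i = dx_{i1} \cdots dx_{iq}\, dx^E_{i1} \cdots dx^E_{ir}$ for notational simplicity in the context of integration unless otherwise specified. Equation 3.3.8 consists of two parts. The first is the contribution when the $i$th subject is perfectly observed, the log-likelihood that is seen in complete case analysis. The second part is the contribution when the $i$th subject contains imperfections. The contribution is the expected log-likelihood over the space of unobservable realizations given the observed data. Notice that if there exists an imperfect covariate for the $i$th subject, the second component is the contribution, because that imperfection may affect not only the response model but also the missing data mechanism and the covariate model.

Now consider only the portion for which imperfection is problematic,
\[ \int \cdots \int_{\mathcal{X}^U} l_c(\Phi \mid y_i, x^U_i)\, p(x^A_i \mid \Phi^{(t)}, x^E_i, r_i, y_i)\, dx^U_i, \]
which can be written as
\[
\begin{aligned}
&\int \cdots \int_{\mathcal{X}^U} \log p(R_i \mid x_i, y_i, \gamma)\, p(x^A_i \mid \Phi^{(t)}, x^E_i, r_i, y_i)\, dx^U_i \\
&\quad + \int \cdots \int_{\mathcal{X}^U} \log p(Y_i \mid x_i, \beta)\, p(x^A_i \mid \Phi^{(t)}, x^E_i, r_i, y_i)\, dx^U_i \\
&\quad + \int \cdots \int_{\mathcal{X}^U} \log p(X^A_i \mid \psi)\, p(x^A_i \mid \Phi^{(t)}, x^E_i, r_i, y_i)\, dx^U_i
\end{aligned}
\]
and is succinctly represented as
\[ Q_i(\gamma \mid \gamma^{(t)}) + Q_i(\beta \mid \beta^{(t)}) + Q_i(\psi \mid \psi^{(t)}), \]
where $Q_i(\cdot)$ is the expectation for the $i$th subject (Equation 3.3.2); therefore
\[ Q(\Phi \mid \Phi^{(t)}) = \sum_{i=1}^{n} \left[ \Big( \prod_{j=1}^{p} r^I_{ij} r^M_{ij} \Big) l_c(\Phi \mid y_i, x^U_i) + \Big( 1 - \prod_{j=1}^{p} r^I_{ij} r^M_{ij} \Big) \left[ Q_i(\gamma \mid \gamma^{(t)}) + Q_i(\beta \mid \beta^{(t)}) + Q_i(\psi \mid \psi^{(t)}) \right] \right]. \tag{3.3.9} \]

This needs further decomposition. First we will consider the joint distribution of the imperfection indicator. From equation 2.5.14 we have
\[
\begin{aligned}
Q_i(\gamma \mid \gamma^{(t)}) &= \int \cdots \int_{\mathcal{X}^U} \log p(R_i \mid x_i, y_i, \gamma)\, p(x^A_i \mid \Phi^{(t)}, x^E_i, r_i, y_i)\, dx^U_i \\
&= \int \cdots \int_{\mathcal{X}^U} \log \prod_{j=1}^{p} p(R^I_{ij}, R^M_{ij} \mid r_{i1}, \ldots, r_{i(j-1)}, x_i, y_i, \gamma_j)\, p(x^A_i \mid \Phi^{(t)}, x^E_i, r_i, y_i)\, dx^U_i \\
&= \sum_{j=1}^{p} \Big[ \int \cdots \int_{\mathcal{X}^U} \log p(R^I_{ij} \mid r^M_{ij}, r_{i1}, \ldots, r_{i(j-1)}, x_i, y_i, \gamma^I_j)\, p(x^A_i \mid \Phi^{(t)}, x^E_i, r_i, y_i)\, dx^U_i \\
&\qquad\quad + \int \cdots \int_{\mathcal{X}^U} \log p(R^M_{ij} \mid r_{i1}, \ldots, r_{i(j-1)}, x_i, y_i, \gamma^M_j)\, p(x^A_i \mid \Phi^{(t)}, x^E_i, r_i, y_i)\, dx^U_i \Big] \\
&= \sum_{j=1}^{p} \left[ Q_{ij}(\gamma^I_j \mid \gamma^{I(t)}_j) + Q_{ij}(\gamma^M_j \mid \gamma^{M(t)}_j) \right]
\end{aligned}
\tag{3.3.10}
\]
where $r_{i0}$ denotes the absence of an indicator. In an analogous manner, the joint distribution of the covariates can be written as
\[ Q_i(\psi \mid \psi^{(t)}) = \sum_{j=1}^{p} \left[ Q_{ij}(\psi^X_j \mid \psi^{X(t)}_j) + Q_{ij}(\psi^E_j \mid \psi^{E(t)}_j) \right], \tag{3.3.11} \]
thus equation 3.3.9 becomes
\[
\begin{aligned}
Q(\Phi \mid \Phi^{(t)}) &= \sum_{i=1}^{n} \left[ \Big( \prod_{j=1}^{p} r^I_{ij} r^M_{ij} \Big) l_c(\Phi \mid y_i, x^U_i) + \Big( 1 - \prod_{j=1}^{p} r^I_{ij} r^M_{ij} \Big) \left[ Q_i(\gamma \mid \gamma^{(t)}) + Q_i(\beta \mid \beta^{(t)}) + Q_i(\psi \mid \psi^{(t)}) \right] \right] \\
&= \sum_{i=1}^{n} \Bigg[ \Big( \prod_{j=1}^{p} r^I_{ij} r^M_{ij} \Big) l_c(\Phi \mid y_i, x^U_i) + \Big( 1 - \prod_{j=1}^{p} r^I_{ij} r^M_{ij} \Big) \bigg[ \sum_{j=1}^{p} \left[ Q_{ij}(\gamma^I_j \mid \gamma^{I(t)}_j) + Q_{ij}(\gamma^M_j \mid \gamma^{M(t)}_j) \right] \\
&\qquad\quad + Q_i(\beta \mid \beta^{(t)}) + \sum_{j=1}^{p} \left[ Q_{ij}(\psi^X_j \mid \psi^{X(t)}_j) + Q_{ij}(\psi^E_j \mid \psi^{E(t)}_j) \right] \bigg] \Bigg].
\end{aligned}
\tag{3.3.12}
\]

It is here that the relevance of assuming unique parameterization for the various distributions becomes evident. Equation 3.3.12 tells us that a complicated joint distribution, under the assumption that the conditional expression of that distribution is uniquely parameterized, can be partitioned into a sum of simpler and most likely standard problems.

Equation 3.3.12 has four components. The first is the log-likelihood when there are no data imperfections. The remaining three result when imperfection exists. The second component corresponds to the sequence of one-dimensional conditional distributions used to specify the joint distribution of the imperfection indicator, which decomposes into a conditional expectation, given the observed data, of the indicator of missing information and a conditional expectation, given the observed data, of the indicator of mismeasurement. Since the indicators are binary, a binary regression model is being used to characterize the uniquely parameterized conditional distributions.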
The third component is the conditional expectation, given the observed data, of the response distribution. Recall that a binary regression model is being used to characterize the response distribution (Equation 2.5.13). The final component of equation 3.3.12 is for the joint distribution of the true variable and the experimentally observable variable. Recall that this distribution was constructed by using a sequence of one-dimensional conditional distributions, utilizing unique parameterization for each conditional distribution, which also undergoes a component-wise decomposition.

3.4 Implementing the Expectation-Maximization algorithm: Monte-Carlo Expectation-Maximization

Although equation 3.3.12 brings much needed simplicity to the E-step (Equation 3.3.4), it is still, for the most part, non-trivial. Even in a small problem that may involve only two imperfect covariates, double integrals will be required for the response and may be required for the covariate and indicator variable portions, depending on their structure. It is clear that even in well defined and modest problems where imperfection is present, high-dimensional integration over $\mathcal{X}^U$ will be necessary.

Evaluating the expectations of the E-step can take several forms. As we are synthesizing missing data and mismeasurement together, we have multiple sources from which guidance can be taken. Wei and Tanner [109, 110] proposed a Monte Carlo approach to compute the expectation in the E-step of the EM algorithm. The literature on missing data in the context of generalized linear models tends to support the use of Monte Carlo integration as a means to handle potentially difficult and high dimensional integration [46, 48, 95]. From the measurement error side of the problem, Monte Carlo methods have been considered when framing the mismeasurement problem as a missing data problem and using the EM algorithm [13, 30]. Wang [108] proposed a Gaussian quadrature EM (GQEM) that handles low dimensional problems as an alternative to MCEM.
In general, Monte Carlo methods are not competitive against quadrature methods, which have convergence rates of $O(M^{-2})$ or $O(M^{-4})$ in one dimension with $M$ evaluations [108]. Typically, quadratures are used for low dimensional problems and are commonly used for single integrals. Wang shows that the GQEM performs well for $k < 3$ dimensions [108]. This would be an appealing approach, but the simple model under discussion is only a first step in understanding how to integrate similar but more complex patterns of missingness and measurement error. It is reasonable to believe that any extension of this problem will result in high dimensional integration. Although MCEM is less efficient, it is a much more attractive starting point for a problem that is only going to grow in complexity.

3.4.1 Monte Carlo integration

Monte Carlo methods are well documented, thus only a brief overview of the methods used will be presented. In 1949, Metropolis presented a statistical method for the study of differential equations and, as indicated by Metropolis, of a more general class of integro-differential equations that occur across the natural sciences [79]. With the advent of inexpensive and almost ubiquitous computing power, Monte Carlo based methods have flourished. We will consider classical Monte Carlo integration as presented by Robert and Casella [85].

Consider the problem of evaluating the integral
\[ E_f[h(X)] = \int_{\mathcal{X}} h(x) f(x)\, dx \tag{3.4.1} \]
where $\mathcal{X}$ is the domain of $X$. A sample $(X_1, \ldots, X_m)$ is generated from the density $f(x)$ and equation 3.4.1 is approximated using the empirical average
\[ \bar{h}_m = \frac{1}{m} \sum_{j=1}^{m} h(x_j). \]
By the Strong Law of Large Numbers, $\bar{h}_m$ converges almost surely to $E_f[h(X)]$. The variance is
\[ \mathrm{var}(\bar{h}_m) = \frac{1}{m} \int_{\mathcal{X}} \left( h(x) - E_f[h(X)] \right)^2 f(x)\, dx \]
and can also be approximated from the sample with
\[ v_m = \frac{1}{m^2} \sum_{j=1}^{m} \left( h(x_j) - \bar{h}_m \right)^2. \]
For large $m$,
\[ \frac{\bar{h}_m - E_f[h(X)]}{\sqrt{v_m}} \]
has an approximate $N(0, 1)$ distribution, which allows for the construction of tests and confidence bounds on the approximation of $E_f[h(X)]$.

3.4.2 Monte Carlo EM: the expectation step

In the application of Monte Carlo integration to the problem of finding the expectation in the E-step of the EM algorithm, the approach suggested by Wei and Tanner [109, 110] will be used. A sample of simulated data is drawn from the conditional distribution $p(X^A \mid \Phi^{(t)}, x^E, r, y)$ on the $(t+1)$th iteration of the E-step. This sample is used to approximate the expectation in equation 3.3.4 using the empirical average
\[ \hat{Q}(\Phi \mid \Phi^{(t)}) = \sum_{i=1}^{n} \frac{1}{m} \sum_{l=1}^{m} l_c(\Phi \mid y_i, x^U_{il}). \tag{3.4.2} \]
When $m \to \infty$ this converges to $Q(\Phi \mid \Phi^{(t)})$, hence MCEM is the regular EM in the limit [85].

Although using Monte Carlo integration is an attractive solution to handle a difficult high-dimensional integral, it is not without some cost. Obtaining the sample $X_1, \ldots, X_m$ from $p(X^A \mid \Phi^{(t)}, x^E, r, y)$ can greatly increase the computation time for an algorithm which is notorious for its slow convergence. In response to this, Levine and Casella [61] proposed a method by which samples may be reused, thus reducing the overall computational time. Alternately, sampling has been eased through the use of Gibbs sampling and Gibbs Adaptive Rejection Sampling [18, 35, 44]. Rodriguez [87] has implemented Gibbs Adaptive Rejection Sampling in the statistical package R in the ars library. The ars function is predicated on Gilks' [35] paper and requires log-concavity of the functions from which it is sampling.

At this point, a brief digression concerning MCEM implementation will be taken.
The discussion herein is relevant to the MCEM algorithm, but will not be incorporated into the simulation studies nor the applied example. McLachlan and Krishnan [74] point out that in MCEM a Monte Carlo error is introduced at the E-step and that the monotonicity property of the EM algorithm is lost, in that sequential steps may not result in an increase of the likelihood; yet it is posited that with an adaptive procedure for increasing the size of the drawn sample, the MCEM algorithm will move towards the maximizer with high probability [7, 85].

Wei and Tanner [109] recommended that the size of the sample be increased as the algorithm moves closer to convergence, thus equation 3.4.2 can be written as
\[ \hat{Q}(\Phi \mid \Phi^{(t)}) = \sum_{i=1}^{n} \frac{1}{m_{g(t)}} \sum_{l=1}^{m_{g(t)}} l_c(\Phi \mid y_i, x^U_{il}) \tag{3.4.3} \]
where $m_{g(t)}$ is the size of the sample drawn from the sampling distribution, which is itself a function of the iterative step. By using a function of the step at which the EM algorithm currently is, $g(t)$, a more flexible representation of how to increase the sample size is obtained. In its simplest form, $g(t) = t$, with a sample size increase at each step of the algorithm. A more sophisticated approach is to monitor the induced Monte Carlo error so that $m$ is increased when changes in the parameter become dominated by the Monte Carlo error itself [7]. Since Booth and Hobert proposed the increase to be applied to the following step, the sample size can be thought of as a function of the step itself. The method requires independent MC samples in the E-step to allow for computationally inexpensive and straightforward assessment of Monte Carlo error through an application of the central limit theorem. Levine and Casella [61] proposed a method for monitoring Monte Carlo error in the MCEM algorithm in the presence of dependent Monte Carlo samples.

Although there are two recent proposals for monitoring the Monte Carlo rate for the MCEM algorithm, the process is not yet truly automatic. Both approaches can determine when $m$ should be increased, but neither suggests by how much $m$ should be increased.
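The effect of letting the Monte Carlo sample size grow with the iteration can be sketched on a toy problem. The example below is not the imperfect variable model: it assumes right-censored observations from a $N(\theta, 1)$ distribution, for which the exact E-step is available in closed form and can serve as a check on the Monte Carlo version. The schedule $m_t = m_0\, g(t)$ with $g(t) = t$, the base size `m0`, the censoring point `a`, the number of iterations and all function names are assumptions made for the illustration.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def draw_truncated(theta, a, m, rng):
    """Draw m values from N(theta, 1) truncated to (a, infinity) by inverse CDF."""
    lower = STD_NORMAL.cdf(a - theta)
    draws = []
    for _ in range(m):
        u = lower + (1.0 - lower) * rng.random()
        u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep u strictly inside (0, 1)
        draws.append(theta + STD_NORMAL.inv_cdf(u))
    return draws


def exact_em_update(theta, obs_sum, n_cens, n, a):
    """Closed-form EM update using E[Z | Z > a] = theta + phi(z) / (1 - Phi(z))."""
    z = a - theta
    mills = STD_NORMAL.pdf(z) / (1.0 - STD_NORMAL.cdf(z))
    return (obs_sum + n_cens * (theta + mills)) / n


def mcem(obs_sum, n_cens, n, a, theta0=0.0, m0=200, iters=40, seed=0,
         g=lambda t: t):
    """MCEM with Monte Carlo sample size m0 * g(t) at iteration t."""
    rng = random.Random(seed)
    theta = theta0
    for t in range(1, iters + 1):
        m_t = m0 * g(t)
        # Monte Carlo E-step: every censored subject has the same conditional
        # distribution here, so a single shared sample is used.
        draws = draw_truncated(theta, a, m_t, rng)
        e_z = sum(draws) / m_t
        # M-step: maximizer of the averaged complete-data log-likelihood.
        theta = (obs_sum + n_cens * e_z) / n
    return theta


# Simulated data: true theta = 1, values at or above a = 1.5 are censored.
data_rng = random.Random(42)
a, n = 1.5, 500
y = [data_rng.gauss(1.0, 1.0) for _ in range(n)]
obs_sum = sum(v for v in y if v < a)
n_cens = sum(1 for v in y if v >= a)

theta_em = 0.0
for _ in range(200):
    theta_em = exact_em_update(theta_em, obs_sum, n_cens, n, a)

theta_mcem = mcem(obs_sum, n_cens, n, a)
print(f"exact EM: {theta_em:.4f}   MCEM: {theta_mcem:.4f}")
```

With a fixed, small sample size the MCEM iterates would keep fluctuating around the EM fixed point; a growing schedule shrinks that fluctuation in later iterations, at the price of more draws per step.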
This is an important feature since the operating hypothesis is that by increasing the sample size as the EM algorithm nears convergence, the effect of the Monte Carlo error can be minimized, thus allowing the algorithm to get closer to the maximizer with high probability. Levine [62] considers a central limit theorem based method for gauging Monte Carlo error directly on the EM estimates in the presence of dependent MC samples. Using their asymptotic results, an adaptive rule for updating the Monte Carlo sample size at each iteration of the MCEM algorithm is proposed.

Another proposition is to allow each subject to have a subject-specific Monte Carlo sample size, $m_i$. In practice this recommendation, given by Ibrahim, is not implemented [44, 46, 48]. This proposition manifests as

$$\hat{Q}(\Phi \mid \Phi^{(t)}) = \sum_{i=1}^{n}\frac{1}{m_i}\sum_{l=1}^{m_i} l_c(\Phi \mid y_i, x_{il}) \qquad (3.4.4)$$

where $m_i$ is the Monte Carlo sample size for the $i$th subject. If the aforementioned adaptive Monte Carlo sample size methodologies were integrated, the E-step would become

$$\hat{Q}(\Phi \mid \Phi^{(t)}) = \sum_{i=1}^{n}\frac{1}{m_{g_i(t)}}\sum_{l=1}^{m_{g_i(t)}} l_c(\Phi \mid y_i, x_{il}) \qquad (3.4.5)$$

where $m_{g_i(t)}$ is the Monte Carlo sample size for the $i$th subject as a function of the $t$th iterative step in the MCEM algorithm.

3.4.2.1 Monte Carlo EM: the expectation step for imperfect variables

The implementation of this method requires the application of equation 3.4.5 to the conditional expectation of the log-likelihood given the observed information, equation 3.3.12. Recall that equation 3.3.12 has four components. The first is not of direct interest in this setting since it is the complete case, or perfect data, contribution to the likelihood. The remaining three components provide the complexity of the problem, hence are the direct points of interest.

Of the three remaining components in equation 3.3.12, the first is comprised of two parts, each pertaining to one aspect of the mechanism of imperfection: missing information and mismeasurement. Here, we will introduce the notation $x_{ijl}^{(t+1)}$ to indicate the $l$th Monte Carlo sample $(x_{ijl}^{A(t+1)}, x_{ij}^{E})$ for the $i$th subject and $j$th covariate for the $(t+1)$th EM iteration.
With the application of Monte Carlo integration, the mechanism of imperfection component, which is the sum of a missing data part and a mismeasurement part, is approximated by

$$\hat{Q}^{(t+1)}(\gamma \mid \gamma^{(t)}) = \sum_{i=1}^{n}\frac{1}{m_{g_i(t)}}\sum_{l=1}^{m_{g_i(t)}}\sum_{j=1}^{p}\Big[\log p\big(R^{E}_{ij} \mid r^{M}_{ij}, r_{i1}, \dots, r_{i(j-1)}, x_{il}^{(t+1)}, y_i, \gamma^{E(t)}\big) + \log p\big(R^{M}_{ij} \mid r_{i1}, \dots, r_{i(j-1)}, x_{il}^{A(t+1)}, y_i, \gamma^{M(t)}\big)\Big]$$

where $x_{il}^{(t+1)} = \big((x_{i1l}^{A(t+1)}, x_{i1}^{E}), \dots, (x_{ipl}^{A(t+1)}, x_{ip}^{E})\big)$. The response model is approximated by

$$\hat{Q}^{(t+1)}(\beta \mid \beta^{(t)}) = \sum_{i=1}^{n}\frac{1}{m_{g_i(t)}}\sum_{l=1}^{m_{g_i(t)}} \log p\big(y_i \mid x_{il}^{(t+1)}, \beta^{(t)}\big), \qquad (3.4.6)$$

and the covariate likelihood, which is likewise the sum of a measurement part and a covariate distribution part, is approximated by

$$\hat{Q}^{(t+1)}(\psi \mid \psi^{(t)}) = \sum_{i=1}^{n}\frac{1}{m_{g_i(t)}}\sum_{l=1}^{m_{g_i(t)}}\sum_{j=1}^{p}\Big[\log p\big(X^{E}_{ij} \mid x_{i1l}^{A(t+1)}, \dots, x_{ijl}^{A(t+1)}, \psi^{E(t)}\big) + \log p\big(X_{ijl}^{A(t+1)} \mid x_{i1l}^{A(t+1)}, \dots, x_{i(j-1)l}^{A(t+1)}, \psi^{(t)}\big)\Big].$$

The notation $\hat{Q}^{(t+1)}(\cdot)$ emphasizes that this is the approximation of the estimating equation for the $(t+1)$th iteration of the MCEM algorithm; it is reasonable to drop the suffix $(t+1)$, which indexes the iteration for which the estimation is being made. Although notationally complex, the core idea remains constant: generate a large sample of data from $p(X^A \mid \Phi^{(t)}, x^E, r, y)$ and then average over the sample.

3.4.3 Gibbs sampler

A key idea of the MCEM algorithm is to use a well chosen sample from the distribution of the unobservable variables, which is then used to approximate an expectation through an empirical average. We therefore turn our attention to how the sample is to be obtained before proceeding to a full representation of the MCEM algorithm for imperfect variables within a generalized linear model framework.

The Gibbs sampler can be traced back to the development of a fast computing algorithm for calculating the properties of any substance which may be considered as being composed of interacting individual molecules [78], with further developments by Hastings [41]. In the mid 1980s, the Gibbs sampler enjoyed some renewed interest with Geman and Geman's [33] introduction of the Gibbs sampler under the consideration of a Bayesian approach to image restoration [31].
Here, the Gibbs sampler is used to generate realizations from the Markov random field through an exploitation of the equivalence between Gibbs distributions and Markov random fields. Interest in the Gibbs sampler was renewed with Gelfand and Smith's [31] paper, which considers the Gibbs sampler as a method by which numerical estimates of non-analytically available marginal densities can be obtained. An objective of that paper was to translate the Gibbs sampler, which until then was widely known in the image-processing literature, to more general and conventional statistical problems. Two years later, Casella and George [16] provided an accessible and insightful paper concerning the Gibbs sampler in which they explain both how and why it works.

To begin, we will consider the general form of the Gibbs sampler before interpreting it for imperfect variables within a generalized linear model framework. Then we will briefly review log-concavity, and finally we will present a general approach using Gibbs adaptive rejection sampling for imperfect variables within a generalized linear model framework.

Suppose we have a joint density $p(X \mid \theta)$ indexed by $\theta$ and we want to obtain the characteristics of the marginal density $p(X_j \mid \theta_j)$, where $\theta_j$ indexes the marginal distribution for the $j$th variable and $\theta = (\theta_1, \dots, \theta_p)$. A straightforward approach would be to calculate $p(X_j \mid \theta_j)$ directly and then use the marginal density to obtain the desired characteristics. Obtaining the marginal distribution of $X_j$ directly may not be feasible. The integral may be analytically intractable and numerical methods may be overly difficult to implement. In these cases, the Gibbs sampler provides an alternative method for obtaining $p(X_j \mid \theta_j)$.

With Gibbs sampling, we do not compute or approximate $p(X_j \mid \theta_j)$ directly, but rather generate a sample $X_{1j}, \dots, X_{mj}$ from $p(X_j \mid \theta_j)$. By simulating a large enough sample, the empirical distribution of $p(X_j \mid \theta_j)$ is used to compute the desired characteristics [16].
It is worth reminding ourselves that although an empirical distribution based on a simulated sample is being used to obtain the characteristics of the marginal distribution, if $m$ is large enough the population characteristics and even the density itself can be obtained with a specified degree of accuracy. For example, if we are interested in the mean of the marginal distribution of the $j$th random variable then

$$\lim_{m \to \infty}\frac{1}{m}\sum_{i=1}^{m} X_{ij} = \int x_j\, p(x_j \mid \theta_j)\, dx_j = E(X_j).$$

A fundamental assumption is that we can simulate from the univariate conditional densities $p_1, \dots, p_p$, where $p_j = p(X_j \mid \theta_j, x_{(-j)})$ are called the full conditionals and $x_{(-j)} = (x_1, x_2, \dots, x_{j-1}, x_{j+1}, \dots, x_p)$ is the vector $x$ with the $j$th variable removed. The Gibbs sampling algorithm, or Gibbs sampler, for the $(t+1)$th sample given the $t$th sample $x^{(t)} = (x_1^{(t)}, \dots, x_p^{(t)})$ is

$$X_1^{(t+1)} \sim p_1\big(x_1 \mid \theta_1^{(t)}, x_2^{(t)}, \dots, x_p^{(t)}\big),$$
$$X_2^{(t+1)} \sim p_2\big(x_2 \mid \theta_2^{(t)}, x_1^{(t+1)}, x_3^{(t)}, \dots, x_p^{(t)}\big),$$
$$\vdots$$
$$X_p^{(t+1)} \sim p_p\big(x_p \mid \theta_p^{(t)}, x_1^{(t+1)}, \dots, x_{p-1}^{(t+1)}\big). \qquad (3.4.7)$$

A readily apparent feature of the Gibbs sampler is that the full conditionals are the only distributions used for the simulation; thus, even in high-dimensional problems, all of the simulations may be done using univariate full conditionals. Since the Gibbs sampler is being used as a tool to implement the EM algorithm via Monte Carlo integration, we will not discuss the technicalities of obtaining a sample from a stationary distribution. Such technical details are well documented in texts such as Robert and Casella [85] or Gelman, Carlin, Stern and Rubin [32]. Adaptive rejection sampling is one approach used to obtain samples from the full conditionals; details are given in Appendix B.1.

Gibbs sampler for imperfect variables

From the previous section, most of the general work for the specification of the Gibbs sampler for imperfect variables has been done. Equation 3.4.7 is already close to the general form needed for imperfect variables.
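Before specializing it, the generic scheme of equation 3.4.7 can be made concrete with a small sketch. The example below is an illustration under assumptions that are not part of the thesis: it uses a standard bivariate normal with correlation 0.2, for which both full conditionals are known univariate normals, and the function name `gibbs_bivariate_normal`, the burn-in length and the number of draws are hypothetical choices.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_draws, burn_in, rng):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Both full conditionals are univariate:
    X1 | x2 ~ N(rho * x2, 1 - rho^2) and X2 | x1 ~ N(rho * x1, 1 - rho^2).
    """
    sd = math.sqrt(1.0 - rho ** 2)
    x1, x2 = 0.0, 0.0  # arbitrary starting values
    draws = []
    for step in range(burn_in + n_draws):
        x1 = rng.gauss(rho * x2, sd)  # conditions on the current x2
        x2 = rng.gauss(rho * x1, sd)  # conditions on the freshly updated x1
        if step >= burn_in:
            draws.append((x1, x2))
    return draws


rng = random.Random(347)
draws = gibbs_bivariate_normal(rho=0.2, n_draws=20000, burn_in=500, rng=rng)
n = len(draws)
mean_x1 = sum(d[0] for d in draws) / n
mean_x2 = sum(d[1] for d in draws) / n
cov = sum((d[0] - mean_x1) * (d[1] - mean_x2) for d in draws) / n
var_x1 = sum((d[0] - mean_x1) ** 2 for d in draws) / n
var_x2 = sum((d[1] - mean_x2) ** 2 for d in draws) / n
corr = cov / math.sqrt(var_x1 * var_x2)
```

Only univariate draws are ever made, yet the empirical means and correlation of the retained pairs approximate those of the joint distribution, which is the property the MCEM E-step relies on.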
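The joint distribution from which samples are to be taken is $\Pr(X_i^A \mid \Phi^{(t)}, x_i^E, r_i, y_i)$. If we have $p$ imperfect random variables, then for the $i$th subject we have

$$X_{i1}^{(t+1)} \sim p\big(x_{i1} \mid \gamma_1^{(t)}, \beta^{(t)}, \psi_1^{(t)}, r_i, x_{i2}^{(t)}, \dots, x_{ip}^{(t)}, y_i\big),$$
$$X_{i2}^{(t+1)} \sim p\big(x_{i2} \mid \gamma_2^{(t)}, \beta^{(t)}, \psi_2^{(t)}, r_i, x_{i1}^{(t+1)}, x_{i3}^{(t)}, \dots, x_{ip}^{(t)}, y_i\big),$$
$$\vdots$$
$$X_{ip}^{(t+1)} \sim p\big(x_{ip} \mid \gamma_p^{(t)}, \beta^{(t)}, \psi_p^{(t)}, r_i, x_{i1}^{(t+1)}, \dots, x_{i(p-1)}^{(t+1)}, y_i\big).$$

Notice that the indexing parameter for each full conditional is itself indexed, such that the $j$th full conditional is indexed by $\Phi_j^{(t)} = (\gamma_j^{(t)}, \beta^{(t)}, \psi_j^{(t)})^T$. This indexing is a result of the conditional-conditional approach for specifying the joint probability distribution of the imperfection indicator and the joint distribution of the covariates (Sections 2.5.2 and 2.5.6).

Currently, we have the full conditionals for the joint distribution of $X^A$. For imperfect variables, we further decompose the sampling distribution so that the pairs $X_{i1}^{U(t+1)}, X_{i1}^{E(t+1)}, \dots, X_{ip}^{U(t+1)}, X_{ip}^{E(t+1)}$ are drawn in turn from their full conditionals (3.4.8). With the sequence of full conditional distributions specified, we can consider the structure of the $j$th set of full conditionals,

$$X_{ij}^{U(t+1)} \sim p\big(x_{ij}^{U} \mid \Phi_j^{(t)}, x_{ij}^{E(t)}, r_i, x_{i1}^{U(t+1)}, \dots, x_{i(j-1)}^{U(t+1)}, x_{i(j+1)}^{U(t)}, \dots, x_{ip}^{U(t)}, y_i\big)$$

and

$$X_{ij}^{E(t+1)} \sim p\big(x_{ij}^{E} \mid \Phi_j^{(t)}, x_{ij}^{U(t+1)}, r_i, x_{i1}^{U(t+1)}, \dots, x_{i(j-1)}^{U(t+1)}, x_{i(j+1)}^{U(t)}, \dots, x_{ip}^{U(t)}, y_i\big),$$

where $\Phi_j^{(t)} = (\gamma_j^{(t)}, \beta^{(t)}, \psi_j^{(t)})^T$, $x^{U(t+1)}$ denotes the set of updated covariates and $x^{U(t)}$ the set of covariates which have yet to be updated. The full conditionals reduce to the following:

$$p\big(X_{ij}^{U} \mid \Phi_j, r_i, x_{i(-j)}, y_i\big) \propto p(R_i \mid x_i, y_i, \gamma)\, p(Y_i \mid x_i, \beta, \phi)\, p(X_i \mid \psi)$$

and

$$p\big(X_{ij}^{E} \mid \Phi_j, x_{i(-j)}, y_i\big) \propto p(X_i \mid \psi) \qquad (3.4.9)$$

where $x_{i(-j)}$ denotes the set of covariates with the $j$th information removed.

3.4.4 Maximization

Since $\hat{Q}(\Phi \mid \Phi^{(t)})$ is equivalent to a sum of uniquely indexed functions, the maximization of $\hat{Q}(\Phi \mid \Phi^{(t)})$ reduces to the maximization of each component of $\hat{Q}(\Phi \mid \Phi^{(t)})$ (Appendix B.2.1). Given that we are working within the generalized linear model with a binary outcome, $\hat{Q}(\beta \mid \beta^{(t)})$ can be maximized using the standard iteratively re-weighted least squares method typically used for binary logistic regression.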
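A weighted maximization step of this kind can be sketched in a few lines. The code below is a minimal, assumption-laden illustration rather than the implementation used in the thesis: it performs Newton (iteratively re-weighted least squares) updates for a binary logistic regression in which every row carries a weight, and the helper names `solve_linear_system` and `weighted_logistic_irls` as well as the toy data are hypothetical. The intended use in MCEM is a weight of 1 for a subject with perfect covariates and a weight of $1/m$ for each of the $m$ Monte Carlo completions of a subject with an imperfect covariate.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def weighted_logistic_irls(rows, y, weights, n_iter=25):
    """Maximize sum_k w_k * log p(y_k | x_k, beta) by Newton/IRLS updates."""
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for x, yk, w in zip(rows, y, weights):
            eta = sum(b * xi for b, xi in zip(beta, x))
            mu = 1.0 / (1.0 + math.exp(-eta))
            for a in range(p):
                grad[a] += w * (yk - mu) * x[a]
                for c in range(p):
                    info[a][c] += w * mu * (1.0 - mu) * x[a] * x[c]
        step = solve_linear_system(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


# Toy data (hypothetical): an intercept and one covariate.
xs = [-2.0, -1.5, -1.0, 0.0, 0.5, 1.0, 1.5, 2.0]
ys = [0, 1, 0, 1, 0, 0, 1, 1]
rows = [[1.0, x] for x in xs]
beta_single = weighted_logistic_irls(rows, ys, [1.0] * len(ys))

# Replicating every row m times with weight 1/m leaves the fit unchanged.
m = 4
rows_split = [row for row in rows for _ in range(m)]
ys_split = [yk for yk in ys for _ in range(m)]
beta_split = weighted_logistic_irls(rows_split, ys_split, [1.0 / m] * len(ys_split))
```

The replication check shows why the weights are needed: without them, a subject represented by $m$ simulated rows would count $m$ times in the likelihood.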
With this approach, the maximization is the same as with complete data with weights [44, 109]. For example, if the $i$th subject has a perfect set of covariates, then it is given a weight of 1, but if the $i$th subject has an imperfect variable, then each contribution from the Monte Carlo sample is given weight $1/m_{g(t)}$.

This approach can be directly applied to the maximization of $\hat{Q}(\gamma \mid \gamma^{(t)})$ since the components are themselves Bernoulli random variables modelled with a binary logistic regression. The maximization for $\hat{Q}(\psi \mid \psi^{(t)})$ cannot be explicitly given without applying a distributional assumption, thus its maximization will be discussed in the context of particular examples.

3.5 Louis standard errors

Although there are a variety of strategies for obtaining the standard error of the EM based parameter estimates, Louis' method was chosen for several reasons. The first is that the information matrix and related derivatives can provide information about the rate of convergence. As well, the EM algorithm can be modified to increase the rate of convergence [74]. The second reason is that the likelihood surface is sufficiently complex that any movement towards understanding this surface can be considered an advantageous step in understanding the overall problem of imperfection and how it may affect the likelihood itself. A third reason is that the mapping induced by the EM algorithm can be expressed in terms of the information matrix and related objects. As before, any movement towards a deeper understanding of the EM algorithm can be considered advantageous. To this end, Louis' approach, based on an analytic derivation of the information matrix, is a reasonable venture.

An attractive feature of this method is that it can be embedded in the EM algorithm itself, with the evaluation occurring only after the EM algorithm has converged. Computationally this is attractive since potentially large matrices do not need to be stored, nor are computationally intense inversions of these matrices required for each EM iteration.
This is an attractive feature because with the implementation of the EM algorithm both the estimates and the associated standard errors are produced. A secondary feature which made Louis' method popular was that the gradient and second derivative matrix were derived from the complete-data log-likelihood and not the incomplete-data log-likelihood. Key to the proposed method is the ability to extract the observed information matrix from the complete-data log-likelihood.

Proceeding with the Louis standard error, denoted $SE_L$, will require the introduction of some notation. This will be done as it is needed. To begin we will consider the case for the $i$th random variable before proceeding with the situation of $n$ independent cases.

Recall from equation 2.5.2 that the $i$th complete-data log-likelihood is

$$l_c(\Phi \mid x_i^U, y_i) = \log p(X_i^U, Y_i \mid \Phi) = \log p(R_i \mid x_i, y_i, \gamma) + \log p(Y_i \mid x_i, \beta, \phi) + \log p(X_i \mid \psi).$$

For imperfect variables, $x_i^U$ is always missing. What is observed is an opaque version of $x_i^U$. For the $i$th subject, the observable distribution is

$$p(X_i^E, R_i, Y_i \mid \Phi) = \int \cdots \int p(X_i^U, Y_i \mid \Phi)\, p(X_i^U \mid \Phi^{(t)}, x_i^E, r_i, y_i)\, dx_i^U,$$

thus we can define

$$l^*(\Phi \mid x_i^E, r_i, y_i) = \log p(X_i^E, R_i, Y_i \mid \Phi).$$

The score vector for the $i$th subject associated with the complete-data log-likelihood, $l_c(\Phi \mid x_i^U, y_i)$, is denoted $S(\Phi \mid x_i^U, y_i)$ and the score vector associated with the observed-data log-likelihood, $l^*(\Phi \mid x_i^E, r_i, y_i)$, is denoted $S^*(\Phi \mid x_i^E, r_i, y_i)$. The Hessian matrix for the complete-data log-likelihood is denoted $H(\Phi \mid x_i^U, y_i)$, and for the observed-data log-likelihood the Hessian is denoted $H^*(\Phi \mid x_i^E, r_i, y_i)$.

Now, under the regularity conditions which permit differentiating under an integral sign [15], we have

$$S^*(\Phi \mid x_i^E, r_i, y_i) = E\big[S(\Phi \mid x_i^U, y_i) \mid x_i^E, r_i, y_i\big] \qquad (3.5.1)$$

with the details given in appendix B.3.1. Furthermore, the maximum likelihood estimate $\hat{\Phi}$ satisfies $S^*(\hat{\Phi} \mid x_i^E, r_i, y_i) = 0$.

Now we turn our attention to deriving the observed (incomplete-data) information matrix. Orchard and Woodbury [83] set down the basic principles for correctly specifying the information matrix in the context of missing data.
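Interestingly enough, the conceptual framework used by Orchard and Woodbury is similar to the imperfect variable framework currently under discussion. The similarity may stem from the common assumption that the data contain unobservable information. The goal in deriving the information matrix, which provides a direct link to estimating standard errors, is to adjust the information matrix obtained from the observed data with the information that has been lost due to non-observance. Louis [70] showed that the expected information matrix for the unobservable data $x_i^U$, given the observed data $x_i^E$ and the assumption that we have a regular exponential family distribution in the canonical form [74], is given by

$$I_m(\Phi \mid x_i^E) = E\Big\{\big[S(\Phi \mid x_i^U, y_i) - E(S(\Phi \mid x_i^U, y_i) \mid x_i^E)\big]\big[S(\Phi \mid x_i^U, y_i) - E(S(\Phi \mid x_i^U, y_i) \mid x_i^E)\big]^T \,\Big|\, x_i^E\Big\}$$
$$= E\big[S(\Phi \mid x_i^U, y_i)\, S(\Phi \mid x_i^U, y_i)^T \mid x_i^E\big] - S^*(\Phi \mid x_i^E, r_i, y_i)\, S^*(\Phi \mid x_i^E, r_i, y_i)^T, \qquad (3.5.2)$$

thus the information matrix for the observed data is

$$I(\Phi \mid x_i^E) = I_c(\Phi \mid x_i^E) - I_m(\Phi \mid x_i^E)$$
$$= E\big(I_c(\Phi \mid x_i^U) \mid x_i^E\big) - E\big(S(\Phi \mid x_i^U, y_i)\, S(\Phi \mid x_i^U, y_i)^T \mid x_i^E\big) + S^*(\Phi \mid x_i^E, r_i, y_i)\, S^*(\Phi \mid x_i^E, r_i, y_i)^T \qquad (3.5.3)$$

where the first part of equation 3.5.3 is the conditional expectation of the complete-data information matrix, while the latter two parts give the expected information for the conditional distribution of the complete data given the observed data. The Louis formula for the observed (incomplete-data) information matrix for the MCEM based maximum likelihood solution $\hat{\Phi}$ is given by

$$I(\hat{\Phi} \mid x_i^E) = E\big(I_c(\hat{\Phi} \mid x_i^U) \mid x_i^E\big) - E\big(S(\hat{\Phi} \mid x_i^U, y_i)\, S(\hat{\Phi} \mid x_i^U, y_i)^T \mid x_i^E\big) + S^*(\hat{\Phi} \mid x_i^E, r_i, y_i)\, S^*(\hat{\Phi} \mid x_i^E, r_i, y_i)^T. \qquad (3.5.4)$$

Details concerning the development of equations 3.5.3 and 3.5.4 are given in appendix B.

Monte Carlo with Louis standard error for imperfect variables

From equation 3.5.3 we have three expectations for which Monte Carlo integration will be used. Following Wei and Tanner [109], the Monte Carlo estimate of $S^*(\Phi \mid x_i^E, r_i, y_i)$ is

$$\hat{S}^*(\Phi \mid x_i^E, r_i, y_i) = \hat{E}\big[S(\Phi \mid x_i^U, y_i) \mid x_i^E, r_i, y_i\big] = \frac{1}{m_{g_i(t)}}\sum_{l=1}^{m_{g_i(t)}} S(\Phi \mid x_{il}, y_i)$$

where $x_{il} = (x_{il}^{A}, x_i^{E}, r_i, y_i)$ and $x_i = (x_{i1}, \dots, x_{i m_{g_i(t)}})$.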
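The estimate of the second term of equation 3.5.3 is

$$\hat{E}\big(S(\Phi \mid x_i^U, y_i)\, S(\Phi \mid x_i^U, y_i)^T \mid x_i^E\big) = \frac{1}{m_{g_i(t)}}\sum_{l=1}^{m_{g_i(t)}} S(\Phi \mid x_{il}, y_i)\, S(\Phi \mid x_{il}, y_i)^T.$$

The first term in equation 3.5.3 is approximated by

$$\hat{E}\big(I_c(\Phi \mid x_i^U) \mid x_i^E\big) = \frac{1}{m_{g_i(t)}}\sum_{l=1}^{m_{g_i(t)}} I_c(\Phi \mid x_{il}) = -\frac{1}{m_{g_i(t)}}\sum_{l=1}^{m_{g_i(t)}} \frac{\partial^2 l_c(\Phi \mid x_{il}, y_i)}{\partial\Phi\,\partial\Phi^T}.$$

Pulling these three approximations together transforms equation 3.5.3 into

$$\hat{I}(\Phi \mid x_i^E) = -\frac{1}{m_{g_i(t)}}\sum_{l=1}^{m_{g_i(t)}} \frac{\partial^2 l_c(\Phi \mid x_{il}, y_i)}{\partial\Phi\,\partial\Phi^T} - \frac{1}{m_{g_i(t)}}\sum_{l=1}^{m_{g_i(t)}} S(\Phi \mid x_{il}, y_i)\, S(\Phi \mid x_{il}, y_i)^T + \left[\frac{1}{m_{g_i(t)}}\sum_{l=1}^{m_{g_i(t)}} S(\Phi \mid x_{il}, y_i)\right]\left[\frac{1}{m_{g_i(t)}}\sum_{l=1}^{m_{g_i(t)}} S(\Phi \mid x_{il}, y_i)\right]^T \qquad (3.5.5)$$

and equation 3.5.4 into

$$\hat{I}(\hat{\Phi} \mid x_i^E) = -\frac{1}{m_{g_i(t)}}\sum_{l=1}^{m_{g_i(t)}} \left.\frac{\partial^2 l_c(\Phi \mid x_{il}, y_i)}{\partial\Phi\,\partial\Phi^T}\right|_{\Phi = \hat{\Phi}} - \frac{1}{m_{g_i(t)}}\sum_{l=1}^{m_{g_i(t)}} S(\hat{\Phi} \mid x_{il}, y_i)\, S(\hat{\Phi} \mid x_{il}, y_i)^T + \left[\frac{1}{m_{g_i(t)}}\sum_{l=1}^{m_{g_i(t)}} S(\hat{\Phi} \mid x_{il}, y_i)\right]\left[\frac{1}{m_{g_i(t)}}\sum_{l=1}^{m_{g_i(t)}} S(\hat{\Phi} \mid x_{il}, y_i)\right]^T. \qquad (3.5.6)$$

Chapter 4

Simulation studies

Two simulation studies will be executed. The first study will be based on a binary logistic regression with two covariates. One covariate will suffer from missing data and the other will suffer from measurement error. The second study will use a binary logistic regression model with a set of perfectly measured covariates and a single imperfect covariate. The imperfect covariate will suffer from both missing data and measurement error. It is recognized that there are many parameters involved with the utilized models. Due to the embryonic nature of this investigation, we will focus on the performance of the response model parameters.

4.1 Global aims of the simulation studies

There are three general and shared aims for the two simulation studies. The first is to determine if the combined problem of missing data and mismeasurement is computationally feasible. This will be considered from a resource management point of view, with a particular focus on computational time and storage requirements. The second goal is to identify how the combined problem of missing data and mismeasurement affects parameter estimation. This goal derives from the observation that in many situations complete case analysis is chosen for its simplicity in execution and low cost on resources.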
The third is to assess the effect of the MCEM adjustment on parameter estimation.

Table 4.1: Notation and combinations for the missing data mechanism that generates the data and the missing data mechanism assumed by the fitted model. The missing data mechanism takes the form of a binary logistic regression.

Case   True Mechanism   Fitted Mechanism
1      MAR              MAR
2      MAR              NMAR
3      NMAR             MAR
4      NMAR             NMAR

4.2 General structural elements and procedure of the simulation studies

Each simulation study will be a composite of simulation scenarios, for which a global scenario will be subsequently presented. For each scenario, moderately independent simulations will be used, which use the same set of simulated data sets to compare statistical methodologies [10]. For each scenario, we will have $S$ simulations. Within a scenario, the simulated data sets are independent, but the results from the methodologies under investigation are like matched pairs, reducing the effects of between sample variation.

Table 4.1 is an example of four scenarios which serve to construct a study of how the specification of the missing data mechanism in the model affects parameter estimation. It is clear from Table 4.1 that we can consider two modelling situations: we correctly specify the missing data mechanism in the model, or we incorrectly specify it. Outside of highly controlled situations, accurately specifying all pieces of a complicated model is highly unlikely. Often, the best we can do is make a reasonable conjecture based on prior knowledge about the system under investigation and on the data we have observed. It may not be feasible to consider all four cases for each aspect of the study. In these situations only cases 1 and 4 will be considered.

The MCAR mechanism is not being directly investigated because the literature suggests that a complete case analysis will yield unbiased estimates. The problem should reduce to a mismeasurement problem.
With missing data, one feature of interest which is not explored is how the proportion of missing information affects the quality of the estimator. Given that the motivating example typically had proportions of missing data in the range of 10% to 20%, when the missing data mechanism was altered, the goal was to keep the proportion of missing data in this range. This decision reflects the most common structure of social-epidemiological data related to the motivating example.

4.3 Evaluative measures

The same set of evaluative measures will be used for both simulation studies. We will consider a variety of ways to summarize the simulation results to create a more complete picture of both the problem of ignoring imperfect data and the effect of using a maximum likelihood type method to adjust for the presence of imperfect variables.

4.3.1 Measures of bias and accuracy

In the current context, the parameter of interest is the estimated vector of regression coefficients of the response model. For simplicity, consider the $j$th parameter, $\beta_j$. The expectation of the $j$th coefficient of the response model is estimated by

$$\hat{E}(\hat{\beta}_j) = \frac{1}{S}\sum_{s=1}^{S} \hat{\beta}_{js}$$

where $s$ indexes the simulations in a particular scenario. The estimate of the variance is given by

$$\widehat{\mathrm{Var}}(\hat{\beta}_j) = \frac{1}{S-1}\sum_{s=1}^{S}\big(\hat{\beta}_{js} - \hat{E}(\hat{\beta}_j)\big)^2$$

with the estimate of the standard deviation across the $S$ simulated data sets given as

$$\widehat{SD}(\hat{\beta}_j) = \sqrt{\widehat{\mathrm{Var}}(\hat{\beta}_j)}.$$

The standard deviation of $\hat{\beta}_j$ will be reported since it puts the measure of variability on the same scale as the estimate itself.
The bias is

$$\mathrm{Bias}(\hat{\beta}_j) = E(\hat{\beta}_j) - \beta_j$$

which is estimated by

$$\widehat{\mathrm{Bias}}(\hat{\beta}_j) = \hat{E}(\hat{\beta}_j) - \beta_j,$$

where the bias itself can be thought of as being drawn from its own sampling distribution; thus the variance of the estimated bias is

$$\mathrm{Var}\big(\widehat{\mathrm{Bias}}(\hat{\beta}_j)\big) = \mathrm{Var}\big(\hat{E}(\hat{\beta}_j) - \beta_j\big) = \frac{\mathrm{Var}(\hat{\beta}_j)}{S}$$

which is estimated by

$$\widehat{\mathrm{Var}}\big(\widehat{\mathrm{Bias}}(\hat{\beta}_j)\big) = \frac{\widehat{\mathrm{Var}}(\hat{\beta}_j)}{S}$$

with the standard deviation of the bias across the $S$ simulated data sets given by

$$\widehat{SD}\big(\widehat{\mathrm{Bias}}(\hat{\beta}_j)\big) = \sqrt{\frac{\widehat{\mathrm{Var}}(\hat{\beta}_j)}{S}}.$$

With these, we can construct the z-score for the bias under the assumption that the estimators are unbiased,

$$z_{bias} = \frac{\widehat{\mathrm{Bias}}(\hat{\beta}_j)}{\widehat{SD}\big(\widehat{\mathrm{Bias}}(\hat{\beta}_j)\big)}.$$

It is recognized that this is strictly a t-test statistic, but with sufficiently large $S$ and an associated degrees-of-freedom $S - 1$, the normal distribution becomes a reasonable approximation. To this end we will use the notation $z_{bias}$ to emphasize the fact that we are using an approximation. A large z-score indicates a biased point estimator.

There is concern with the use of statistical testing within the context of simulation studies. Since many statistical tests can be considered to be a function of the number of simulations $S$, it is suggested that the use of this type of measure is dubious: by running more or fewer simulations, the results of such statistics can be improved or penalized respectively [10]. Burton suggests the standardized bias, which Demirtas proposed in 2004 without any referential antecedent [21]. It is defined as

$$100 \times \frac{\hat{E}(\hat{\theta}) - \theta}{SE(\hat{\theta})}.$$

A value of 100% means that the estimate on average falls one standard error above the parameter. It has been suggested by Demirtas that a standardized bias of 50% or less is practically insignificant [21]. The measure is used again in a simulation study concerning missing information within the context of longitudinal data, which cites only Demirtas' paper as the source of this measure [99].
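The bias measures defined above can be collected in a short helper. This is a minimal sketch: the function name `bias_summary` and the five toy estimates are hypothetical, and the empirical standard deviation across simulations is assumed to play the role of the standard error in the standardized bias.

```python
import math
import statistics


def bias_summary(estimates, true_value):
    """Bias, its z-score and the standardized bias across S simulations."""
    s = len(estimates)
    mean_est = statistics.fmean(estimates)
    sd_est = statistics.stdev(estimates)        # divisor S - 1
    bias = mean_est - true_value
    sd_bias = sd_est / math.sqrt(s)             # SD of the estimated bias
    return {
        "bias": bias,
        "sd_bias": sd_bias,
        "z_bias": bias / sd_bias,
        # Percentage of one standard error (assumed: empirical SD).
        "standardized_bias": 100.0 * bias / sd_est,
    }


# Hypothetical estimates of a coefficient whose true value is 0.41.
summary = bias_summary([0.5, 0.3, 0.4, 0.6, 0.2], true_value=0.41)
```

Because `z_bias` equals the standardized bias multiplied by $\sqrt{S}/100$, the z-score grows with the number of simulations while the standardized bias does not, which is the dependence on $S$ discussed in the text.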
With a wider search, a robust analog was found, but the measure of variability was directly related to the estimator of interest [50].

In the proposed standardization, the standard error proposed for scaling is related to the sampling distribution of $\hat{\beta}_j$ and only indirectly related to the sampling distribution of the bias. Although there is an appreciation for what is intended with the standardized bias, the disconnect between the estimator and the standard error being used to standardize is problematic. The numerator is the estimated bias, which has its own sampling distribution and hence its own standard error. Even though the z-score has some subtle issues, it will be used with the caveat that no additional action was taken once the simulation study was completed to enhance the results, nor were preliminary results considered during the execution of the simulation. To this end, the z-score seems reasonable despite its dependence on $S$.

The mean squared error (MSE), often reported as the root mean squared error (RMSE) to put it on the same measurement scale as the parameter itself, offers an overall measure of accuracy. Incorporating both the bias and the variability associated with an estimator, it provides a succinct measure for comparing estimators. The MSE is given by

$$\mathrm{MSE}(\hat{\beta}_j) = E(\hat{\beta}_j - \beta_j)^2 = \mathrm{Var}(\hat{\beta}_j) + \mathrm{Bias}(\hat{\beta}_j)^2$$

which is estimated by

$$\widehat{\mathrm{MSE}}(\hat{\beta}_j) = \widehat{\mathrm{Var}}(\hat{\beta}_j) + \widehat{\mathrm{Bias}}(\hat{\beta}_j)^2.$$

The relative efficiency will be used as a comparative measure between the complete case estimate and the MCEM estimate. For the $j$th parameter, the relative efficiency is

$$e(\hat{\beta}_{CC,j}, \hat{\beta}_{MCEM,j}) = \frac{E(\hat{\beta}_{MCEM,j} - \beta_j)^2}{E(\hat{\beta}_{CC,j} - \beta_j)^2} = \frac{\mathrm{MSE}_{MCEM}(\hat{\beta}_j)}{\mathrm{MSE}_{CC}(\hat{\beta}_j)}$$

which is estimated as

$$\hat{e}(\hat{\beta}_{CC,j}, \hat{\beta}_{MCEM,j}) = \frac{\widehat{\mathrm{MSE}}_{MCEM}(\hat{\beta}_j)}{\widehat{\mathrm{MSE}}_{CC}(\hat{\beta}_j)}.$$

Recognizing that both $\widehat{\mathrm{MSE}}(\hat{\beta}_j)$ and $\hat{e}(\hat{\beta}_{CC,j}, \hat{\beta}_{MCEM,j})$ are themselves estimates which have associated sampling distributions, the bootstrap was used to estimate the respective associated standard deviations [26].
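The accuracy measures above, together with a bootstrap standard deviation, can be sketched as follows. Everything specific in this block is an assumption made for illustration: the function names, the simulated stand-in estimates, the choice to resample whole simulations (keeping the complete case and MCEM estimates paired, in the spirit of the matched design) and the use of 500 bootstrap replicates rather than 10,000.

```python
import random
import statistics


def mse(estimates, true_value):
    """Estimated MSE: variance across simulations plus squared bias."""
    bias = statistics.fmean(estimates) - true_value
    return statistics.variance(estimates) + bias ** 2


def relative_efficiency(cc_estimates, mcem_estimates, true_value):
    """MSE of the MCEM estimates divided by the MSE of the complete case ones."""
    return mse(mcem_estimates, true_value) / mse(cc_estimates, true_value)


def bootstrap_sd(statistic, simulations, n_boot, rng):
    """Bootstrap SD of statistic(simulations), resampling with replacement."""
    reps = []
    for _ in range(n_boot):
        resample = [simulations[rng.randrange(len(simulations))]
                    for _ in simulations]
        reps.append(statistic(resample))
    return statistics.stdev(reps)


# Hypothetical paired estimates from S = 150 simulated data sets.
rng = random.Random(43)
beta_true = 0.41
pairs = [(rng.gauss(0.30, 0.10), rng.gauss(0.40, 0.12)) for _ in range(150)]


def re_statistic(sample):
    cc = [p[0] for p in sample]
    mcem = [p[1] for p in sample]
    return relative_efficiency(cc, mcem, beta_true)


re_hat = re_statistic(pairs)
re_sd = bootstrap_sd(re_statistic, pairs, n_boot=500, rng=rng)
```

A relative efficiency below 1 favours the MCEM estimate under this definition, since its MSE appears in the numerator.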
From the original estimate of the sampling distribution of $\hat{\beta}_j$ yielded from the simulation scenario, a bootstrap sample of size $B = 10{,}000$ was obtained for $\widehat{\mathrm{MSE}}(\hat{\beta}_j)$ and $\hat{e}(\hat{\beta}_{CC,j}, \hat{\beta}_{MCEM,j})$. The bootstrap estimate of the standard deviation across the $S$ simulations for each estimator is

$$\widehat{SD}_B = \sqrt{\frac{1}{B-1}\sum_{b=1}^{B}\big(\hat{\theta}_b - \bar{\theta}_B\big)^2}$$

where

$$\bar{\theta}_B = \frac{1}{B}\sum_{b=1}^{B}\hat{\theta}_b$$

and $\hat{\theta}_b$ is the $b$th bootstrap estimate of $\theta$. Here we let $\theta = \mathrm{MSE}(\hat{\beta}_j)$ for estimating $SD_{B,MCEM}(\widehat{\mathrm{MSE}}(\hat{\beta}_j))$, the standard deviation of the mean squared error across the $S$ simulated data sets; we let $\theta = e(\hat{\beta}_{CC,j}, \hat{\beta}_{MCEM,j})$ for estimating $SD_{B,e}(\hat{e}(\hat{\beta}_{CC,j}, \hat{\beta}_{MCEM,j}))$, the standard deviation of the relative efficiency across the $S$ simulated data sets.

4.3.2 Confidence interval based measures

Each scenario within the simulation study consists of $S$ simulations. For each simulation, using the Louis estimate of the standard error, we can construct a confidence interval. From this we can directly compare the simulation based coverage rate to that assumed in the construction of the confidence intervals. The confidence interval for the $j$th parameter and the $s$th simulation is constructed with the end points defined as

$$\hat{\beta}_{js} \pm z_{1-\alpha/2}\, SE_L(\hat{\beta}_{js})$$

where $z_{1-\alpha/2}$ is the $1 - \alpha/2$ quantile of the standard normal distribution and $SE_L(\hat{\beta}_{js})$ is the Louis standard error for the $j$th parameter and the $s$th simulation.

From the confidence interval we will consider two measures: mean length and coverage. The mean length can be used as an evaluative tool in simulation studies. If the parameter estimates perform well and exhibit a general unbiased property, then a narrower confidence interval suggests a more precise estimate as well as gains in efficiency and power [10, 19]. Coverage refers to the proportion of times the constructed confidence intervals contain the true value. Ideally, the coverage should be the same as the nominal coverage rate; that is, the coverage for the 95% confidence interval should be 95%. Over-coverage, where the coverage rate exceeds the proposed confidence level, indicates that the results are too conservative.
In this situation, too many Type II errors are being made, which implies a decrease in statistical power. On the other hand, under-coverage, where the coverage rate is less than the proposed confidence level, indicates that the results are too bold. In this situation, too many Type I errors are being made.

For assessing the coverage, Burton [10] suggests a rough yet inferentially based approach where an acceptable simulation based assessment is one that does not fall outside two standard errors of the nominal coverage probability. The interesting feature is that Burton proposes a confidence interval based approach to assessing the coverage under the null hypothesis that the coverage is 0.95, but confidence interval construction with the intent of drawing inference is typically performed under the alternate hypothesis [56]. Casella [15] notes that in practice we will have both the null and alternate hypotheses in mind; this alternate will dictate the form of the acceptance region of a level-$\alpha$ test, which in turn influences the shape of the ensuing confidence set. Rather than taking the proposed confidence interval approach, where the $\alpha$-level is pre-set, we will use p-values based on a one-proportion z-test to test the sample based coverage rate against the nominal rate used for its construction.

4.3.3 Robust measures

It has been observed that some of the simulations resulted in point estimates which could be classified as outliers. The median, denoted $m$, and the median of the absolute deviations from the median will be used as complements to $\hat{E}(\hat{\beta}_j)$ and $\widehat{\mathrm{Var}}(\hat{\beta}_j)$.
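The two robust summaries just named, together with the difference between the median and the true value, can be computed as in the sketch below. The function name `robust_summary` and the toy estimates, one of which is a deliberate outlier, are hypothetical.

```python
import statistics


def robust_summary(estimates, true_value):
    """Median, median absolute deviation and median-based analog of bias."""
    med = statistics.median(estimates)
    mad = statistics.median([abs(b - med) for b in estimates])
    return {"median": med, "mad": mad, "delta": med - true_value}


# Hypothetical estimates of a coefficient equal to 0.41, with one outlier.
estimates = [0.38, 0.41, 0.44, 0.40, 3.50]
robust = robust_summary(estimates, true_value=0.41)
mean_estimate = statistics.fmean(estimates)
```

In this toy case the single outlier pulls the mean far from the true value, while the median and the median absolute deviation are essentially unaffected.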
The median of the absolute deviations from the median (MAD) is given by

$$\mathrm{MAD}(\hat{\beta}_j) = \mathrm{median}\,\big|\hat{\beta}_j - m(\hat{\beta}_j)\big|$$

which is estimated with

$$\widehat{\mathrm{MAD}}(\hat{\beta}_j) = \mathrm{median}_s\,\big|\hat{\beta}_{js} - \hat{m}(\hat{\beta}_j)\big|.$$

Furthermore, the difference between the median and the true value, which is a crude analog to the bias, will be given by

$$\delta_j = m(\hat{\beta}_j) - \beta_j$$

which is estimated by

$$\hat{\delta}_j = \hat{m}(\hat{\beta}_j) - \beta_j.$$

4.4 Simulation study 1: binary logistic regression with two imperfect explanatory variables

We will explore the situation where the two types of imperfection co-exist in a single data set, but do not occur simultaneously in the same variable. One covariate, $X_1$, will suffer from missing data, while the other, $X_2$, will suffer from measurement error. To help us understand the effect of imperfect variables on parameter estimation we will break the simulation study into four smaller components. Each part constitutes a comparison where all the features of the simulation are held constant while allowing one feature to change. We will compare

• missing data mechanisms,
• the effect of the model assumption about the missing data mechanism,
• the effect of sample size, and
• the effect of different distributional assumptions for the error term in the measurement error model.

The assumption of nondifferential, classical measurement error will remain constant across all scenarios in this simulation study, as will the assumption that $\epsilon \sim N(0, \tau^2)$. To prevent model misspecification, $\tau$ will be assumed known [40].

The target number of simulations for each scenario was 150 simulated data sets. For most components, eight runs of 25 simulations were executed. If this fell below 150 simulations, then two additional runs of 25 simulations were executed in order to bring the total number of simulations to approximately 150.
There is not a constant number of simulations run for each scenario, and this will be addressed in a subsequent section devoted to discussing the implementation problems of the MCEM algorithm within this context.

4.4.1 Likelihood model for simulation study 1

We will begin with equation 2.5.4 where

$$l_c(\Phi \mid x_i^U, y_i) = \log p(R_i \mid x_i, y_i, \gamma) + \log p(Y_i \mid x_i, \beta, \phi) + \log p(X_i \mid \psi);$$

recall that $\log p(R_i \mid x_i, y_i, \gamma)$ is the model of the mechanism of imperfection, $\log p(Y_i \mid x_i, \beta, \phi)$ is the response model and $\log p(X_i \mid \psi)$ is the covariate model. For the covariate model, we begin with equation 2.5.6, thus

$$p(X_i \mid \psi) = p(X_{i2}^E \mid x_{i2}, x_{i1}, \psi_2^E)\, p(X_{i2} \mid x_{i1}, \psi_2)\, p(X_{i1}^E \mid x_{i1}, \psi_1^E)\, p(X_{i1} \mid \psi_1).$$

Under the specifications of this simulation study, $x_{i1}^E = x_{i1,obs}$ if the realization is observed or $x_{i1}^E = x_{i1,miss}$ if the realization is not observed, thus

$$p(X_{i1}^E \mid x_{i1}, \psi_1^E)\, p(X_{i1} \mid \psi_1) = p(X_{i1} \mid \psi_1).$$

The second covariate suffers only from measurement error, so

$$p(X_{i2}^E \mid x_{i2}, x_{i1}, \psi_2^E) = p(X_{i2}^E \mid x_{i2}, \psi_2^E).$$

Assuming a bivariate normal distribution for the joint probability distribution of $(X_1, X_2)$, we have

$$(X_1, X_2) \sim MVN(\mu, \Sigma),$$

where

$$\mu = (0, 0)^T \quad\text{and}\quad \Sigma = \begin{bmatrix} 1 & 0.2 \\ 0.2 & 1 \end{bmatrix},$$

thus we have $X_1 \sim N(0, 1)$ and $X_2 \mid x_1 \sim N(0.2\,x_1, 0.96)$. Under the assumption of an unbiased classical imperfection model, equation 2.5.10 becomes

$$x_{i2}^E = x_{i2} + \epsilon_{i2}$$

where $\epsilon_{i2} \sim N(0, \tau^2)$; therefore $X_{i2}^E \mid x_{i2} \sim N(x_{i2}, \tau^2)$. Often it is assumed that social-epidemiological data sets are noisy, so we initially let $\tau^2 = 1$.

The response is a Bernoulli random variable with the binary logistic regression response model given in section 2.5.5, which is indexed by $\beta = (-1.39, 0.41, -0.69)^T$.

Table 4.2: Coefficient values for different missing data mechanisms used to generate simulated data sets. The systematic component is specified as $\gamma_0 + \gamma_1 x_{i1} + \gamma_2 x_{i2} + \gamma_3 y_i$.

Mechanism   MAR                        NMAR
A           $(-2, 0, 1.5, \dots)^T$    $(-2, 0.75, 1.5, \dots)^T$
B           $(-2, 0, 1.5, 0)^T$        $(-2, 0.75, 1.5, 0)^T$
C           $(-2, 0, 0.25, \dots)^T$   $(-2, 1.5, 0.25, \dots)^T$

The mechanism of imperfection model is given in section 2.5.6 and we will retain the definition for the imperfection indicator given in section 2.3.
It is clear from equation 2.5.14 that the conditional expression of the joint distribution of the imperfection indicator allows for a dependent structure between the indicator for missing data and the indicator for mismeasurement. Also, it allows for a dependent structure across the set of imperfection indicators.

Two simplifying assumptions about the mechanism of imperfection will be made:

• the $j$th and $k$th indicators of imperfection are independent for all $j = 1, \dots, p$ and $k = 1, \dots, p$, and
• the indicator for missing data and the indicator for mismeasurement are independent, $R^M \perp R^E$.

Predicated on these two assumptions, equation 2.5.14 becomes

$$p(R_i \mid x_i, y_i, \gamma) = p(R_{i1}^M \mid x_i, y_i, \gamma_1^M)$$

since all realizations from $X_1$ are accurately measured, $X_2$ is mismeasured for all $i = 1, \dots, n$, and all realizations from $X_2$ are observable.

For the simulation, three different missing data mechanisms will be used (Table 4.2). The structure of the variables for the missing data mechanism model is the same for all three mechanisms, $\gamma = (\gamma_0, \gamma_1, \gamma_2, \gamma_3)^T$, where $\gamma_0$ corresponds to the intercept, $\gamma_1$ relates to $X_1$, $\gamma_2$ is associated with $X_2$, and $\gamma_3$ is the coefficient for $y$. With only one covariate suffering from missing data, the MAR mechanism has $\gamma_1 = 0$ and the NMAR mechanism has $\gamma_1 \neq 0$.

For mechanism A, the response was part of the missing data mechanism model. The simulation based mean proportion of data missing when missing data mechanism A is MAR is 0.176, and it is 0.195 when NMAR, with respective ranges of 0.26 and 0.29. The specification of the missing data mechanism was motivated by the applied problem. With social-epidemiological research predicated on linked databases where one or more data sources include census data for the expressed purpose of investigating vulnerable populations, it is reasonable to assume that for some cultural constructs there will be a high proportion of missing information. The choice of having the proportion of missing data around 20% was to reflect this assumption.
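The data-generating process for this study can be sketched as below. Several points are assumptions rather than statements from the thesis: the function name `simulate_study1`, the coding of the indicator (1 means that $x_1$ is missing), the use of the true $x_2$ rather than its mismeasured version in the missingness model, and the reading of the mechanism B coefficients in Table 4.2 as $(-2, 0, 1.5, 0)$ for MAR and $(-2, 0.75, 1.5, 0)$ for NMAR.

```python
import math
import random


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate_study1(n, gamma, rng, beta=(-1.39, 0.41, -0.69), tau2=1.0):
    """One data set: (x1 or None, mismeasured x2, y, missing indicator)."""
    data = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = rng.gauss(0.2 * x1, math.sqrt(0.96))
        # Classical, nondifferential measurement error on x2.
        x2_obs = x2 + rng.gauss(0.0, math.sqrt(tau2))
        p_y = expit(beta[0] + beta[1] * x1 + beta[2] * x2)
        y = 1 if rng.random() < p_y else 0
        # Logistic missingness model for x1 (assumed to use the true x2).
        eta_r = gamma[0] + gamma[1] * x1 + gamma[2] * x2 + gamma[3] * y
        r1 = 1 if rng.random() < expit(eta_r) else 0
        data.append((None if r1 == 1 else x1, x2_obs, y, r1))
    return data


rng = random.Random(44)
mar = simulate_study1(20000, (-2.0, 0.0, 1.5, 0.0), rng)
nmar = simulate_study1(20000, (-2.0, 0.75, 1.5, 0.0), rng)
prop_missing_mar = sum(row[3] for row in mar) / len(mar)
prop_missing_nmar = sum(row[3] for row in nmar) / len(nmar)
```

Under these assumptions the simulated proportions of missing $x_1$ values should land close to the roughly 20% targeted in the text, with the NMAR version slightly higher than the MAR version.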
Missing data mechanism B is a variation of mechanism A with only the dependence on the response being different. This will allow for an exploration of the effect of the response in the missing data mechanism on parameter estimation. As with mechanism A, the proportions of missing information under the MAR and NMAR mechanisms are similar: 19% and 21% respectively. With mechanism C, the roles of $\gamma_1$ and $\gamma_2$ were switched. For mechanisms A and B, $\gamma_2$ was twice as large as $\gamma_1$. Here, we kept the sign of the two coefficients the same as in the previous two mechanisms, but allowed $\gamma_1$ to be greater than $\gamma_2$. Additionally, the difference in size between the two coefficients was amplified. This was done to force a strong effect when the mechanism was NMAR. This resulted in a difference in the proportion of missing data for the MAR and NMAR cases: 12% and 19% respectively.

4.4.2 Simulation specific details for the Monte Carlo EM algorithm

Beginning with the approximation to 3.3.12, we have

$$\hat{Q}_i(\zeta \mid \zeta^{(t)}) = \Big(\prod_{j} r^M_{ij} r^E_{ij}\Big)\, \ell_c(\zeta \mid x_i, y_i) + \Big(1 - \prod_{j} r^M_{ij} r^E_{ij}\Big) \Big[\hat{Q}_i(\gamma \mid \gamma^{(t)}) + \hat{Q}_i(\beta \mid \beta^{(t)}) + \hat{Q}_i(\psi \mid \psi^{(t)})\Big].$$

Under the imperfection mechanism model assumptions given in the previous section, the Monte Carlo estimate of $Q(\gamma \mid \gamma^{(t)})$ is

$$\hat{Q}_i(\gamma \mid \gamma^{(t)}) = \frac{1}{m_{g(t)}} \sum_{l=1}^{m_{g(t)}} \log p\big(R^M_{i1} \mid x^{(t+1)}_{il}, y_i, \gamma\big).$$

Recall that in section 3.4.2 we discussed how the sample size used in the expectation step can be made to increase as the algorithm gets closer to satisfying the convergence criterion, but indicated that we would not implement it. It will be assumed that the Monte Carlo sample size is the same for all iterative steps and all subjects.
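The Monte Carlo average that defines $\hat{Q}_i(\gamma \mid \gamma^{(t)})$ can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used for the simulations: the function name `mc_q_gamma` and its argument names are hypothetical, the indicator is treated as a Bernoulli variable with a logit link, and the systematic component is the one given with Table 4.2.

```python
import numpy as np


def mc_q_gamma(r_i1, x_draws, y_i, gamma):
    """Monte Carlo estimate of Q_i(gamma | gamma^(t)) for a single subject.

    r_i1    : 0/1 imperfection indicator for the first covariate
    x_draws : (m, 2) array of Gibbs draws of (x_i1, x_i2)
    y_i     : 0/1 response
    gamma   : (gamma_0, gamma_1, gamma_2, gamma_3)
    """
    x_draws = np.asarray(x_draws, dtype=float)
    gamma = np.asarray(gamma, dtype=float)

    # Linear predictor evaluated at every Monte Carlo draw.
    eta = gamma[0] + x_draws @ gamma[1:3] + gamma[3] * y_i

    # Bernoulli log-likelihood with a logit link: r * eta - log(1 + exp(eta)).
    log_p = r_i1 * eta - np.logaddexp(0.0, eta)

    # Average over the m draws.
    return float(log_p.mean())
```

Because the function simply averages over the rows of `x_draws`, it applies unchanged whether the Monte Carlo sample size is held fixed or allowed to vary across iterations.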
Under these assumptions $\hat{Q}_i(\gamma \mid \gamma^{(t)})$ becomes

$$\hat{Q}_i(\gamma \mid \gamma^{(t)}) = \frac{1}{m} \sum_{l=1}^{m} \log p\big(R^M_{i1} \mid x^{(t+1)}_{il}, y_i, \gamma\big).$$

It is recognized that this will not translate to the most efficient algorithm since in the early iterations of the EM algorithm a cruder approximation, based on smaller Monte Carlo samples, would be sufficient [7, 44, 46, 48]. The computational cost of this decision was not investigated.

The approximation to the expectation estimate of the response model given by equation 3.4.6 becomes

$$\hat{Q}_i(\beta \mid \beta^{(t)}) = \frac{1}{m} \sum_{l=1}^{m} \log p\big(y_i \mid x^{(t+1)}_{il}, \beta\big).$$

With the covariate model specified in the previous section, the estimate of $Q(\psi \mid \psi^{(t)})$ is

$$\hat{Q}_i(\psi \mid \psi^{(t)}) = \frac{1}{m} \sum_{l=1}^{m} \Big[\log p\big(x^E_{i2} \mid x^{(t+1)}_{i2l}, \psi\big) + \log p\big(x^{(t+1)}_{i2l} \mid x^{(t+1)}_{i1l}, \psi\big) + \log p\big(x^{(t+1)}_{i1l} \mid \psi\big)\Big].$$

4.4.2.1 Gibbs adaptive rejection sampling

Beginning with equations 3.4.8, the structure of the Gibbs sampler for this simulation is

$$x^{(t+1)}_{i1} \sim p\big(X_{i1} \mid x^{(t)}_{i2}, x^E_{i2}, r_{i1}, y_i\big) \quad \text{and} \quad x^{(t+1)}_{i2} \sim p\big(X_{i2} \mid x^{(t+1)}_{i1}, x^E_{i2}, r_{i1}, y_i\big).$$

Equation 3.4.9 gives the general form of the full conditionals, which become

$$p(X_{i1} \mid x_{i2}, x^E_{i2}, r_{i1}, y_i) \propto p(R^M_{i1} \mid x_{i1}, x_{i2}, y_i, \gamma)\, p(Y_i \mid x_{i1}, x_{i2}, \beta) \times p(X_{i2} \mid x_{i1}, \psi)\, p(X_{i1} \mid \psi)$$

and

$$p(X_{i2} \mid x^E_{i2}, x_{i1}, r_{i1}, y_i) \propto p(R^M_{i1} \mid x_{i1}, x_{i2}, y_i, \gamma)\, p(Y_i \mid x_{i1}, x_{i2}, \beta) \times p(X^E_{i2} \mid x_{i2}, \psi)\, p(X_{i2} \mid x_{i1}, \psi).$$

Each component of the conditionals is log-concave (Appendix B.1.5, Table B.1), thus each full conditional is itself log-concave. This permits the use of the Gibbs adaptive rejection sampler. The statistical package R was used to perform Gibbs Adaptive Rejection Sampling (ARS), using the recently developed function ars in the ars library [87]. The literature which includes a sensitivity analysis for the Monte Carlo sample size for MCEM implementation within a logistic regression setting strongly suggests that the results are insensitive to sample sizes ranging from 2000 to 10,000 [44, 45, 48]. Lipsitz seems to prefer a larger sample size and regularly used a Monte Carlo sample of 10,000, but makes no mention of burn-in [65, 66]. When burn-in samples are discussed, they are exceptionally small. Stubbendick [95] uses a Monte Carlo sample of size 100 within the context of a random effects model with no burn-in.
Also, it is suggested that no difference was found between using a Monte Carlo sample of size 100 with no burn-in and using a sample of size 1000 with a burn-in sample of size 100. The burn-ins appear to be rather small; this may be a reason for the negligible differences. Recognizing that automating the Monte Carlo sample size over the entire simulation was beyond the scope of inquiry for this investigation, a Monte Carlo sample of size 6000 was drawn from the full conditionals with the first 1000 being burn-in for the sampler; all results are predicated on the remaining Monte Carlo sample of size 5000.

4.4.2.2 Analytic solutions for maximization

The maximization step of the EM algorithm maximizes $Q(\zeta \mid \zeta^{(t)})$. With each component of $Q(\zeta \mid \zeta^{(t)})$ uniquely indexed, its maximization reduces to the maximization of each component. Both $Q(\beta \mid \beta^{(t)})$ and $Q(\gamma \mid \gamma^{(t)})$ are characterized by the binary logistic regression model, thus they can be maximized using the standard iteratively re-weighted least squares method, with the observed realizations and the Monte Carlo sample realizations weighted according to their contribution. The observed realizations from the experiment are given a weight of one, with each realization from the Monte Carlo sample given the weight $m^{-1}$. The Monte Carlo algorithm given by Wei and Tanner [109] allows us to use the EM algorithm by method of weights [48].

The remaining distributions are normally distributed and have closed form solutions. Using the results found in Dwyer's [23] work on multivariate derivatives, the MCEM maximum-likelihood type estimate of the mean is

$$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} \Big[\Big(\prod_{j} r^M_{ij} r^E_{ij}\Big) x_i + \Big(1 - \prod_{j} r^M_{ij} r^E_{ij}\Big) \bar{x}_{i\cdot}\Big] \qquad (4.4.1)$$

where $\bar{x}_{i\cdot} = \frac{1}{m} \sum_{l=1}^{m} x_{il}$ is the intra-Monte Carlo sample average, with the dot emphasizing the index over which the average is taken (Appendix C.1.1). The estimate for the covariance matrix is

$$\hat{\Sigma} = \frac{1}{n} \sum_{i=1}^{n} \Big[\Big(\prod_{j} r^M_{ij} r^E_{ij}\Big) (x_i - \hat{\mu})(x_i - \hat{\mu})^T + \Big(1 - \prod_{j} r^M_{ij} r^E_{ij}\Big) m^{-1} S_i^2\Big] \qquad (4.4.2)$$

where $S_i^2 = \sum_{l=1}^{m} (x_{il} - \hat{\mu})(x_{il} - \hat{\mu})^T$ is the intra-Monte Carlo sample variance (Appendix C.1.1).

These two estimates are intuitively appealing since they have a general form which emulates that of the maximum likelihood estimates for a complete case scenario. When a covariate has been classified as imperfect but not all of its realizations suffer from the imperfection, then for the ith subject for which the imperfection does not manifest we have $x_{ijl} = x_{ij}$ for all $l$. When this occurs for the first covariate we have

$$\frac{1}{n} \sum_{i=1}^{n} \Big[\Big(\prod_{j} r^M_{ij} r^E_{ij}\Big) x_i + \Big(1 - \prod_{j} r^M_{ij} r^E_{ij}\Big) \frac{1}{m} \sum_{l=1}^{m} x_{il}\Big] = \frac{1}{n} \sum_{i=1}^{n} \Big[\Big(\prod_{j} r^M_{ij} r^E_{ij}\Big) x_i + \Big(1 - \prod_{j} r^M_{ij} r^E_{ij}\Big) \begin{bmatrix} x_{i1} \\ \bar{x}_{i2\cdot} \end{bmatrix}\Big].$$

The situation is similar for the estimate of the covariance,

$$\frac{1}{m} \sum_{l=1}^{m} \begin{bmatrix} (x_{i1} - \mu_1)^2 & (x_{i1} - \mu_1)(x_{i2l} - \mu_2) \\ (x_{i1} - \mu_1)(x_{i2l} - \mu_2) & (x_{i2l} - \mu_2)^2 \end{bmatrix} = \begin{bmatrix} (x_{i1} - \mu_1)^2 & (x_{i1} - \mu_1)(\bar{x}_{i2\cdot} - \mu_2) \\ (x_{i1} - \mu_1)(\bar{x}_{i2\cdot} - \mu_2) & m^{-1} s^2_{i2} \end{bmatrix}$$

where $s^2_{i2} = \sum_{l=1}^{m} (x_{i2l} - \mu_2)^2$ is the intra-Monte Carlo sample variation for the second covariate, $\bar{x}_{i2\cdot}$ is the intra-Monte Carlo sample average for the ith subject, and $(\bar{x}_{i2\cdot} - \mu_2)$ is the average deviation of the Monte Carlo sample from the grand mean.

Since much of the maximization is algorithmic, the computational gains from using the closed form solutions are expected to be minimal, but they provide some insight into the structure of the estimates. It is clear from equation 4.4.1 that when imperfection exists, the contribution to the mean is the estimated expectation of the joint distribution of $X_1$ and $X_2$ given the tth MCEM estimate of the parameters, with an analogous contribution to the variance in equation 4.4.2.

4.4.2.3 Louis standard error

Before proceeding with the derivation of the Louis standard errors, two observations will be made. Given that the likelihood is a product of uniquely parameterized probability distributions, the Hessian matrix will be block diagonal with each block pertaining to a single probability distribution.
Also, since the object of interest in this investigation is parameter estimation of the response model, only the block pertaining to the response model will be discussed.

From the general presentation of the Louis standard error, we have the relationship given in equation 3.5.3 with the Monte Carlo approximation based on the current Gibbs sample given by equation 3.5.5. Since the likelihood model components are uniquely parameterized, the resulting information matrix will be block diagonal, thus we need not compute the inverse of the entire information matrix in order to obtain the covariance matrix for the parameters of interest. Using equation 3.5.6 and the score and Hessian for the response model as given in appendix B.2.2, we are able to obtain the MCEM Louis standard error for the sth simulation estimate of $\beta$.

4.4.3 Results and discussions

Through the very act of completing the first simulation, we have begun to answer the first global question: it is computationally feasible to implement a solution to the problem of having both missing data and measurement error in the same data set. Although fraught with problems, which will be addressed in section 4.4.4, the proposed strategy can be implemented. From implementation, we move to an assessment of how the combined problem of missing data and mismeasurement affects parameter estimation. Finally, the effect of the MCEM adjustment is considered.

For this simulation study, four experiments were considered. The first is a comparison of how the agreement or disagreement between the missing data mechanism that generated the missingness and the assumed mechanism for the model affects parameter estimation for the response model. Here we will consider four scenarios, denoted as generating mechanism-model mechanism: MAR-MAR (case 1), MAR-NMAR (case 2), NMAR-MAR (case 3), and NMAR-NMAR (case 4).
The second experiment explores the effect of different missing data mechanisms. This experiment will help to determine if our observations can be generalized to more missing data mechanisms than that used in the first experiment. The third experiment explores the effect of sample size on parameter estimation, which is a first step towards considering the asymptotic properties of this approach. The final experiment considers the effect that the variance $\tau^2$ has on parameter estimation.

4.4.3.1 Comparing the effect of the missing data mechanism assumption

Two comparisons will be made in this experiment: comparing the naive complete-case approach with the MCEM method when the generating missing data mechanism and the modelled mechanism are matched, and when they are mismatched. Since we are using a logistic regression model, it is good to recall that both Vach [104] and Greenland [38] indicate that the MAR missing data mechanism with the response as a covariate may or may not result in negligible bias, but for measurement error with the logistic regression model we should expect attenuation, or bias towards zero [40].

For the first situation, we will consider a comparison across the methods for case 1 and case 4. The complete-case approach has large estimated standard deviations for both cases (Tables 4.3 and 4.4). The estimate of $\beta_1$ is unbiased, as anticipated, with small z-scores. For both cases, $\beta_2$ is biased with very large z-scores, and we see the expected attenuation. There is a general failure to reach the nominal coverage level for $\beta_2$ across the two cases, with case 4 failing for the intercept as well (Table 4.5). The results for $\delta$ emulate those of the bias (Table 4.6).

The MCEM methodology also has large estimated standard errors, and these are much larger than those of the complete-case approach (Tables 4.3 and 4.4). The bias associated with $\beta_0$ and $\beta_1$ is increased in both cases, with large increases in case 1.
The trade-off for these increases is a reduction in the bias associated with $\beta_2$. Unfortunately, only for case 4 is there insufficient evidence against the null hypothesis of no bias. The MSE is similar across the two mechanisms, but the MCEM based MSE is substantially larger for all parameter estimates than that of the complete-case approach; $\hat{e}$ indicates a general inefficiency for the MCEM method.

The MCEM confidence intervals are consistently longer than those of the complete-case approach, but this is not unexpected given the observed increase in the estimated standard error. The lengths of the MCEM confidence intervals are also much more variable than those of the naive ones. For both cases, the nominal coverage is reached for all parameter estimates (Table 4.5). The results for $\delta_{MCEM}$ are similar to those of the bias, except in case 4 where a decrease in $\delta$ is seen moving from the naive approach to the MCEM method.

In general, similar trends are seen when moving from the complete-case method to the MCEM approach for cases 2 and 3 as we have seen for cases 1 and 4, thus we will focus only on the effect of misspecifying the missing data mechanism on the MCEM method when we move from correct specification to incorrect characterization. When moving from case 1 to case 2 there is a reduction in the estimated standard error for $\beta_2$ and a decrease in its bias (Table 4.3). The bias associated with $\beta_0$ and $\beta_1$ increases. Although we have some increase in bias, the MSE for all estimates is reduced since it is driven primarily by the size of the estimated standard deviation. Also, we see a reduction in $\hat{e}$ and a shortening of the confidence intervals. The attainment of the nominal coverage rate is lost for $\beta_0$ and $\beta_1$ (Table 4.5). Finally, we see a decrease in $\delta_{MCEM}$ (Table 4.6).

Under-modelling the missing data mechanism, that is, assuming a smaller missing data mechanism model than the true model (e.g. case 3), reduces the estimated standard deviation of $\beta_0$ and $\beta_2$, but increases it for $\beta_1$ (Table 4.4).
The bias is increased for the estimates of $\beta_1$ and $\beta_2$, and there is an increase in the associated MSE. Interestingly, $\hat{e}$ is reduced for all parameter estimates. There is little difference in the lengths of the confidence intervals and the attainment of the nominal coverage rate (Table 4.5). Finally, $\delta_{MCEM}$ exhibits mixed results (Table 4.6).

Discussion

We see many features which would be anticipated if this was only a missing data problem or only a mismeasurement problem. For case 1, we see an unbiased estimate of $\beta_1$ when a complete-case analysis is implemented; for case 4, the estimate is biased. Also, we see that the estimates of $\beta_2$ for both cases are biased and we see the expected attenuation. The estimated standard deviations are large, but we notice a substantial increase in the variability of the MCEM estimates. This contributes to large MSE estimates for both cases. A positive feature of the MCEM approach is that it is able to reach the nominal coverage rate for all parameter estimates, which the complete-case analysis fails to do. In this situation, we see many of the expected features, but we do see a subtle trade-off emerge. It appears that sacrifices are made in the estimation of $\beta_0$ and $\beta_1$ in order to reduce the bias associated with $\beta_2$. This trade-off seems to work best when the matched missing data mechanism is NMAR.

When the missing data model is misspecified, the type of misspecification appears to have some effect on how the parameters are affected. When we over-model the missing data mechanism (case 2), we see a general reduction in the noise of the parameter estimates, but an increase in the bias associated with $\beta_0$ and $\beta_1$. Also, the nominal coverage is lost for these two parameter estimates. If we are only interested in $\beta_2$, then over-modelling has some advantages: it reduces bias and variability and increases efficiency.
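The bias z-scores and relative efficiencies referred to in this discussion can be reproduced from the summary columns of Table 4.3. The sketch below rests on two assumptions that are not stated in this section but that match the tabulated values up to rounding: the z-score is taken to be the bias divided by the estimated standard deviation over the square root of the simulation size, and $\hat{e}$ is taken to be the ratio of the MCEM MSE to the naive MSE. The function names are hypothetical.

```python
import math


def bias_z_score(bias, sd, n_sim):
    """z-score for the null of no bias, assuming z = bias / (sd / sqrt(n_sim))."""
    return bias / (sd / math.sqrt(n_sim))


def relative_efficiency(mse_mcem, mse_naive):
    """Relative efficiency, assuming e-hat = MSE(MCEM) / MSE(naive)."""
    return mse_mcem / mse_naive


# Values for beta_2 in case 1 (MAR-MAR, simulation size 153), taken from Table 4.3(a).
z_naive = bias_z_score(0.341, 0.190, 153)    # compare with the tabulated 22.174
z_mcem = bias_z_score(-0.150, 0.759, 153)    # compare with the tabulated -2.445
e_hat = relative_efficiency(0.599, 0.152)    # compare with the tabulated 3.93

print(round(z_naive, 2), round(z_mcem, 2), round(e_hat, 2))
```

Small discrepancies from the tabulated values are to be expected because the inputs are themselves rounded to three decimal places.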
When the missing data mechanism is under-specified, we see mixed results for $\beta_1$ and $\beta_2$. We have an increase in bias and an increase in the MSE, but the estimated standard deviation decreases for only $\beta_1$. There is little change in the coverage and in the confidence interval lengths.

Table 4.3: Comparing the effect of the missing data mechanism (MAR, NMAR) on the point estimate, bias, and mean squared error when the mechanism that generates the missing data and the missing data mechanism in the model are matched and mismatched. Mechanism A was used to generate the data; the coefficient for the missing data variable is $\beta_1$ and for the mismeasured variable is $\beta_2$. Case 1 and case 2 only.

(a) Case 1: MAR-MAR, Simulation Size = 153
$\beta_j$ | Naive $E(\hat\beta_j)$ (SD) | Naive bias ($z_{bias}$) | Naive MSE (SD) | MCEM $E(\hat\beta_j)$ (SD) | MCEM bias ($z_{bias}$) | MCEM MSE (SD) | $\hat{e}$ (SD)
$\beta_0$ | -1.367 (0.290) | 0.023 (0.961) | 0.085 (0.013) | -1.503 (0.417) | -0.113 (-3.344) | 0.187 (0.051) | 2.20 (0.58)
$\beta_1$ | 0.419 (0.284) | 0.009 (0.389) | 0.080 (0.011) | 0.479 (0.347) | 0.069 (2.462) | 0.125 (0.020) | 1.56 (0.23)
$\beta_2$ | -0.349 (0.190) | 0.341 (22.174) | 0.152 (0.011) | -0.840 (0.759) | -0.150 (-2.445) | 0.599 (0.315) | 3.93 (2.15)

(b) Case 2: MAR-NMAR, Simulation Size = 112
$\beta_j$ | Naive $E(\hat\beta_j)$ (SD) | Naive bias ($z_{bias}$) | Naive MSE (SD) | MCEM $E(\hat\beta_j)$ (SD) | MCEM bias ($z_{bias}$) | MCEM MSE (SD) | $\hat{e}$ (SD)
$\beta_0$ | -1.419 (0.282) | -0.029 (-1.075) | 0.080 (0.014) | -1.524 (0.376) | -0.134 (-3.781) | 0.159 (0.042) | 1.98 (0.28)
$\beta_1$ | 0.420 (0.309) | 0.010 (0.335) | 0.095 (0.016) | 0.495 (0.356) | 0.085 (2.530) | 0.134 (0.024) | 1.40 (0.15)
$\beta_2$ | -0.333 (0.192) | 0.357 (19.652) | 0.164 (0.013) | -0.747 (0.543) | -0.057 (-1.112) | 0.298 (0.073) | 1.82 (0.50)

Table 4.4: Comparing the effect of the missing data mechanism (MAR, NMAR) on the point estimate, bias, and mean squared error when the mechanism that generates the missing data and the missing data mechanism in the model are matched and mismatched.
Mechanism A was used to generate the data; the coefficient for the missing data variable is $\beta_1$ and for the mismeasured variable is $\beta_2$. Case 3 and case 4 only.

(a) Case 3: NMAR-MAR, Simulation Size = 150
$\beta_j$ | Naive $E(\hat\beta_j)$ (SD) | Naive bias ($z_{bias}$) | Naive MSE (SD) | MCEM $E(\hat\beta_j)$ (SD) | MCEM bias ($z_{bias}$) | MCEM MSE (SD) | $\hat{e}$ (SD)
$\beta_0$ | -1.347 (0.311) | 0.043 (1.690) | 0.099 (0.013) | -1.458 (0.389) | -0.068 (-2.145) | 0.156 (0.023) | 1.58 (0.14)
$\beta_1$ | 0.457 (0.365) | 0.047 (1.577) | 0.135 (0.017) | 0.490 (0.382) | 0.080 (2.560) | 0.152 (0.020) | 1.12 (0.07)
$\beta_2$ | -0.348 (0.194) | 0.342 (21.581) | 0.154 (0.012) | -0.776 (0.473) | -0.086 (-2.227) | 0.231 (0.031) | 1.49 (0.25)

(b) Case 4: NMAR-NMAR, Simulation Size = 139
$\beta_j$ | Naive $E(\hat\beta_j)$ (SD) | Naive bias ($z_{bias}$) | Naive MSE (SD) | MCEM $E(\hat\beta_j)$ (SD) | MCEM bias ($z_{bias}$) | MCEM MSE (SD) | $\hat{e}$ (SD)
$\beta_0$ | -1.352 (0.308) | 0.038 (1.463) | 0.096 (0.012) | -1.478 (0.422) | -0.088 (-2.449) | 0.186 (0.050) | 1.93 (0.53)
$\beta_1$ | 0.440 (0.329) | 0.030 (1.092) | 0.109 (0.011) | 0.459 (0.350) | 0.049 (1.632) | 0.125 (0.015) | 1.15 (0.09)
$\beta_2$ | -0.341 (0.207) | 0.349 (19.904) | 0.164 (0.013) | -0.770 (0.641) | -0.080 (-1.464) | 0.418 (0.188) | 2.54 (1.15)

Table 4.5: Comparing the effect of the missing data mechanism (MAR, NMAR) on confidence interval length and coverage when the mechanism that generates the data and the mechanism used to model the data are matched and mismatched.
Mechanism A was used to generate the data; the coefficient for the missing data variable is $\beta_1$ and for the mismeasured variable is $\beta_2$.

(a) Case 1: MAR-MAR, Simulation Size = 153
$\beta_j$ | $L_{Naive}$ (SD) | $L_{MCEM}$ (SD) | Coverage$_{Naive}$ (p-value) | Coverage$_{MCEM}$ (p-value)
$\beta_0$ | 1.058 (0.123) | 1.571 (2.866) | 0.928 (0.296) | 0.961 (0.492)
$\beta_1$ | 1.140 (0.135) | 1.361 (0.728) | 0.967 (0.228) | 0.967 (0.228)
$\beta_2$ | 0.756 (0.101) | 2.414 (6.359) | 0.523 (<0.001) | 0.961 (0.492)

(b) Case 2: MAR-NMAR, Simulation Size = 112
$\beta_j$ | $L_{Naive}$ (SD) | $L_{MCEM}$ (SD) | Coverage$_{Naive}$ (p-value) | Coverage$_{MCEM}$ (p-value)
$\beta_0$ | 1.072 (0.135) | 1.352 (0.576) | 0.982 (0.012) | 1.000 (<0.001)
$\beta_1$ | 1.150 (0.143) | 1.303 (0.367) | 0.973 (0.128) | 0.982 (0.012)
$\beta_2$ | 0.758 (0.091) | 1.843 (0.878) | 0.500 (<0.001) | 0.938 (0.585)

(c) Case 3: NMAR-MAR, Simulation Size = 150
$\beta_j$ | $L_{Naive}$ (SD) | $L_{MCEM}$ (SD) | Coverage$_{Naive}$ (p-value) | Coverage$_{MCEM}$ (p-value)
$\beta_0$ | 1.064 (0.131) | 1.311 (0.395) | 0.92 (0.176) | 0.933 (0.412)
$\beta_1$ | 1.201 (0.177) | 1.328 (0.282) | 0.92 (0.176) | 0.947 (0.856)
$\beta_2$ | 0.762 (0.095) | 1.816 (0.638) | 0.56 (<0.001) | 0.967 (0.255)

(d) Case 4: NMAR-NMAR, Simulation Size = 139
$\beta_j$ | $L_{Naive}$ (SD) | $L_{MCEM}$ (SD) | Coverage$_{Naive}$ (p-value) | Coverage$_{MCEM}$ (p-value)
$\beta_0$ | 1.063 (0.126) | 1.454 (1.811) | 0.899 (0.048) | 0.935 (0.480)
$\beta_1$ | 1.202 (0.158) | 1.325 (0.363) | 0.964 (0.376) | 0.957 (0.692)
$\beta_2$ | 0.755 (0.102) | 1.990 (2.988) | 0.568 (<0.001) | 0.935 (0.480)

Table 4.6: Comparing the effect of the missing data mechanism (MAR, NMAR) on the median and the difference between the median and true value ($\delta$).
Mechanism A was used to generate the data; the coefficient for the missing data variable is $\beta_1$ and for the mismeasured variable is $\beta_2$.

(a) Case 1: MAR-MAR, Simulation Size = 153
$\beta_j$ | $\hat{m}_{Naive}(\hat\beta_j)$ (MAD) | $\hat{m}_{MCEM}(\hat\beta_j)$ (MAD) | $\delta_{Naive}$ | $\delta_{MCEM}$
$\beta_0$ | -1.361 (0.230) | -1.450 (0.276) | 0.029 | -0.060
$\beta_1$ | 0.428 (0.275) | 0.471 (0.351) | 0.018 | 0.061
$\beta_2$ | -0.325 (0.191) | -0.696 (0.472) | 0.365 | -0.006

(b) Case 2: MAR-NMAR, Simulation Size = 112
$\beta_j$ | $\hat{m}_{Naive}(\hat\beta_j)$ (MAD) | $\hat{m}_{MCEM}(\hat\beta_j)$ (MAD) | $\delta_{Naive}$ | $\delta_{MCEM}$
$\beta_0$ | -1.369 (0.248) | -1.439 (0.274) | 0.021 | -0.049
$\beta_1$ | 0.406 (0.289) | 0.443 (0.318) | -0.004 | 0.033
$\beta_2$ | -0.325 (0.143) | -0.684 (0.380) | 0.365 | 0.006

(c) Case 3: NMAR-MAR, Simulation Size = 150
$\beta_j$ | $\hat{m}_{Naive}(\hat\beta_j)$ (MAD) | $\hat{m}_{MCEM}(\hat\beta_j)$ (MAD) | $\delta_{Naive}$ | $\delta_{MCEM}$
$\beta_0$ | -1.354 (0.313) | -1.400 (0.366) | 0.036 | -0.010
$\beta_1$ | 0.418 (0.368) | 0.463 (0.373) | 0.008 | 0.053
$\beta_2$ | -0.360 (0.186) | -0.794 (0.457) | 0.330 | -0.104

(d) Case 4: NMAR-NMAR, Simulation Size = 139
$\beta_j$ | $\hat{m}_{Naive}(\hat\beta_j)$ (MAD) | $\hat{m}_{MCEM}(\hat\beta_j)$ (MAD) | $\delta_{Naive}$ | $\delta_{MCEM}$
$\beta_0$ | -1.330 (0.290) | -1.425 (0.356) | 0.060 | -0.035
$\beta_1$ | 0.409 (0.348) | 0.463 (0.379) | -0.001 | 0.053
$\beta_2$ | -0.334 (0.210) | -0.714 (0.466) | 0.356 | -0.024

4.4.3.2 Comparing missing data mechanisms

Three missing data mechanisms will be considered, with each taking an MAR and an NMAR form. These mechanisms will be considered for both case 1 and case 4, where the missing data mechanism that generated the data and the mechanism used to model the data are matched. We will focus on a comparison across the three mechanisms to see if the functional form of the missing data mechanism affects parameter estimation. Special note will be taken of differences between mechanisms A and B, since they vary only in the inclusion or exclusion of the response as part of the mechanism.

Beginning with case 1, we see that the estimated standard deviations for the parameter estimates under the complete-case method are similar across the mechanisms (Table 4.7). For $\beta_2$ we see the expected bias, but we have mixed results for the other parameter estimates. For $\beta_0$, mechanisms A and B are unbiased, and for $\beta_1$ only mechanism A is unbiased.
These mixed results may be how the ambiguity with parameter estimation under missing data is manifesting [38, 104]. The MSE is similar across the mechanisms, but is perhaps a bit large with mechanism C. Table 4.9 suggests that the confidence interval lengths are of similar magnitude across the three mechanisms. We see a similarity in the coverage rates for mechanisms A and B, but C shows a lower coverage rate for $\beta_0$. The noticeable difference in $\delta_{Naive}$ is with mechanism C for $\beta_0$ and $\beta_1$, which are much larger than the other mechanism estimates.

When the MCEM adjustment is applied, we notice that the estimated standard deviation of $\hat\beta$ is similar across the mechanisms, but the magnitude exhibits the expected increase (Table 4.7). There is a correction for the bias, but this correction appears to be better when the response is not part of the mechanism. We see that the MSE is similar across mechanisms for $\beta_0$ and $\beta_1$ but varied for $\beta_2$. A noticeable feature is that the MSE for mechanism A is larger than that of B. The confidence intervals are longer and more variable, but this is similar to that seen in the previous experiment. The coverage is similar across all mechanisms for $\beta_1$ and yields the same conclusions as the naive approach. This is not true for the other estimates; $\beta_2$ improves under the MCEM approach and attains the nominal rate (Table 4.9). We see no clear pattern for $\delta_{MCEM}$ across the mechanisms for $\beta_0$ and $\beta_1$, but we observe a general improvement for $\delta(\beta_2)$ (Table 4.10).

When the matched missing data mechanisms are NMAR, we see that the estimated standard deviations obtained from the naive complete-case analysis are similar across the three missing data mechanisms (Table 4.8). There is much variation in the biases, but we do observe that the bias of $\beta_2$ is similar across mechanisms and exhibits the expected magnitude. The confidence interval lengths are similar across the mechanisms, but in terms of coverage we observe that the rate for $\beta_0$ is too low for mechanisms A and C (Table 4.9).
We see for $\beta_0$ and $\beta_1$ that $\delta_{Naive}$ is lowest for mechanism A and largest for C. For $\beta_2$, $\delta_{Naive}$ is similar across the mechanisms (Table 4.10).

Under the MCEM methodology, we see the expected increase in the estimated standard deviation of the estimates, but we also notice that the mechanisms are similar within each set of parameter estimates (Table 4.8). There is much variation in the biases. We observe a reduction in the bias of $\beta_2$ for all mechanisms, but the reduction appears to remove the bias for only mechanisms A and B. Although there is no clear pattern for $\beta_1$, the z-scores reveal that all of its estimates can be considered unbiased, which is not the case for the naive approach. For $\beta_0$, there is an improvement in the bias for mechanisms B and C. The MSE is similar across the mechanisms, although there is the suggestion that differences may emerge for $\beta_2$. From Table 4.9, we see that the confidence interval lengths are longer for the MCEM approach, but they are similar across the mechanisms. In terms of coverage, all mechanisms across all the parameter estimates attain the nominal coverage rate. Finally, for $\delta_{MCEM}$ we see improvements for all estimates and mechanisms except in the single case of mechanism A and $\beta_1$, where we see a worsening (Table 4.10).

Discussion

When the generating missing data mechanism is MAR and matched with the model assumption, we see much similarity across the mechanisms. Some notable deviations are with mechanism C, which has a larger MSE for the complete-case approach and does not perform as well as the other mechanisms for the complete-case approach in terms of coverage. We see this problem with coverage vanish with the MCEM methodology. Furthermore, we note that the MCEM approach does a reasonable job of correcting the bias, but tends to perform less well when the response is part of the missing data mechanism.
We also see the largest MSE for all parameter estimates associated with mechanism A. We may conclude, based on this limited simulation based investigation, that under the MAR assumption there is little difference across the three mechanisms, but the inclusion of the response in the mechanism may cause the parameter estimates to have less than desirable properties.

Under the NMAR assumptions, drawing general conclusions is much more difficult: there are many points at which the three mechanisms are similar, but there are also points of divergence. The inclusion of the response in the missing data mechanism does not appear to have the same influence over the estimate properties as in the MAR situation. The MCEM approach appears to do a sufficient job of improving the parameter estimates and their properties over the naive complete-case approach.

Table 4.7: Comparing different missing data mechanisms when the mechanism that generates the data and the mechanism used to model the data are matched. Simulation sizes for case 1 (MAR-MAR) are 153, 168, 167 for mechanism A, B, and C respectively.

Mech. | $\beta_j$ | Naive $E(\hat\beta_j)$ (SD) | Naive bias ($z_{bias}$) | Naive MSE (SD) | MCEM $E(\hat\beta_j)$ (SD) | MCEM bias ($z_{bias}$) | MCEM MSE (SD)
A | $\beta_0$ | -1.367 (0.290) | 0.023 (0.961) | 0.085 (0.013) | -1.503 (0.417) | -0.113 (-3.344) | 0.187 (0.051)
B | $\beta_0$ | -1.355 (0.251) | 0.035 (1.804) | 0.064 (0.008) | -1.478 (0.315) | -0.088 (-3.618) | 0.107 (0.015)
C | $\beta_0$ | -1.318 (0.285) | 0.072 (3.253) | 0.087 (0.010) | -1.436 (0.423) | -0.046 (-1.411) | 0.181 (0.048)
A | $\beta_1$ | 0.419 (0.284) | 0.009 (0.389) | 0.080 (0.011) | 0.479 (0.347) | 0.069 (2.462) | 0.125 (0.020)
B | $\beta_1$ | 0.361 (0.309) | -0.049 (-2.062) | 0.098 (0.011) | 0.448 (0.374) | 0.038 (1.315) | 0.141 (0.017)
C | $\beta_1$ | 0.310 (0.310) | -0.100 (-4.174) | 0.106 (0.012) | 0.411 (0.403) | -0.001 (-0.021) | 0.163 (0.032)
A | $\beta_2$ | -0.349 (0.190) | 0.341 (22.174) | 0.152 (0.011) | -0.840 (0.759) | -0.150 (-2.445) | 0.599 (0.315)
B | $\beta_2$ | -0.321 (0.188) | 0.369 (25.457) | 0.171 (0.012) | -0.742 (0.531) | -0.052 (-1.259) | 0.285 (0.042)
C | $\beta_2$ | -0.330 (0.193) | 0.360 (24.184) | 0.167 (0.010) | -0.778 (0.652) | -0.088 (-1.743) | 0.432 (0.175)

Table 4.8: Comparing different missing data mechanisms when the mechanism that generates the data and the mechanism used to model the data are matched. Simulation sizes for case 4 (NMAR-NMAR) are 139, 163, 168 for mechanism A, B, and C respectively.

Mech. | $\beta_j$ | Naive $E(\hat\beta_j)$ (SD) | Naive bias ($z_{bias}$) | Naive MSE (SD) | MCEM $E(\hat\beta_j)$ (SD) | MCEM bias ($z_{bias}$) | MCEM MSE (SD)
A | $\beta_0$ | -1.352 (0.308) | 0.038 (1.463) | 0.096 (0.012) | -1.478 (0.422) | -0.088 (-2.449) | 0.186 (0.050)
B | $\beta_0$ | -1.319 (0.265) | 0.071 (3.418) | 0.075 (0.007) | -1.446 (0.374) | -0.056 (-1.914) | 0.143 (0.020)
C | $\beta_0$ | -1.288 (0.294) | 0.102 (4.490) | 0.097 (0.009) | -1.420 (0.388) | -0.030 (-0.990) | 0.151 (0.019)
A | $\beta_1$ | 0.440 (0.329) | 0.030 (1.092) | 0.109 (0.011) | 0.459 (0.350) | 0.049 (1.632) | 0.125 (0.015)
B | $\beta_1$ | 0.402 (0.325) | -0.008 (-0.332) | 0.106 (0.012) | 0.431 (0.385) | 0.021 (0.712) | 0.148 (0.011)
C | $\beta_1$ | 0.330 (0.323) | -0.080 (-3.208) | 0.111 (0.013) | 0.409 (0.408) | -0.001 (-0.044) | 0.167 (0.021)
A | $\beta_2$ | -0.341 (0.207) | 0.349 (19.904) | 0.164 (0.013) | -0.770 (0.641) | -0.080 (-1.464) | 0.418 (0.188)
B | $\beta_2$ | -0.319 (0.209) | 0.371 (22.629) | 0.181 (0.013) | -0.717 (0.533) | -0.027 (-0.644) | 0.285 (0.048)
C | $\beta_2$ | -0.337 (0.196) | 0.353 (23.349) | 0.163 (0.012) | -0.801 (0.573) | -0.111 (-2.502) | 0.341 (0.048)

Table 4.9: Confidence interval length and coverage when the mechanism that generates the data and the mechanism used to model the data are matched. Simulation sizes for case 1 (MAR-MAR) are 153, 168, 167 for mechanism A, B, and C respectively; for case 4 (NMAR-NMAR), they are 139, 163, 168 respectively.

(a) Case 1: MAR-MAR
Model | $\beta_j$ | $L_{Naive}$ (SD) | $L_{MCEM}$ (SD) | Coverage$_{Naive}$ (p-value) | Coverage$_{MCEM}$ (p-value)
A | $\beta_0$ | 1.058 (0.123) | 1.571 (2.866) | 0.928 (0.296) | 0.961 (0.492)
B | $\beta_0$ | 1.045 (0.102) | 1.306 (0.392) | 0.958 (0.588) | 0.976 (0.024)
C | $\beta_0$ | 1.044 (0.122) | 1.433 (1.717) | 0.898 (0.028) | 0.940 (0.592)
A | $\beta_1$ | 1.140 (0.135) | 1.361 (0.728) | 0.967 (0.228) | 0.967 (0.228)
B | $\beta_1$ | 1.134 (0.128) | 1.315 (0.336) | 0.940 (0.600) | 0.946 (0.836)
C | $\beta_1$ | 1.175 (0.167) | 1.438 (0.962) | 0.958 (0.604) | 0.958 (0.604)
A | $\beta_2$ | 0.756 (0.101) | 2.414 (6.359) | 0.523 (<0.001) | 0.961 (0.492)
B | $\beta_2$ | 0.739 (0.083) | 1.774 (0.796) | 0.488 (<0.001) | 0.940 (0.600)
C | $\beta_2$ | 0.745 (0.090) | 2.100 (2.785) | 0.461 (<0.001) | 0.970 (0.128)

(b) Case 4: NMAR-NMAR
Model | $\beta_j$ | $L_{Naive}$ (SD) | $L_{MCEM}$ (SD) | Coverage$_{Naive}$ (p-value) | Coverage$_{MCEM}$ (p-value)
A | $\beta_0$ | 1.063 (0.126) | 1.454 (1.811) | 0.899 (0.048) | 0.935 (0.480)
B | $\beta_0$ | 1.045 (0.105) | 1.315 (0.500) | 0.933 (0.372) | 0.963 (0.372)
C | $\beta_0$ | 1.050 (0.111) | 1.356 (0.498) | 0.869 (<0.001) | 0.935 (0.416)
A | $\beta_1$ | 1.202 (0.158) | 1.325 (0.363) | 0.964 (0.376) | 0.957 (0.692)
B | $\beta_1$ | 1.228 (0.139) | 1.359 (0.280) | 0.963 (0.372) | 0.945 (0.772)
C | $\beta_1$ | 1.280 (0.141) | 1.475 (0.361) | 0.940 (0.600) | 0.958 (0.588)
A | $\beta_2$ | 0.755 (0.102) | 1.990 (2.988) | 0.568 (<0.001) | 0.935 (0.480)
B | $\beta_2$ | 0.748 (0.076) | 1.749 (0.677) | 0.509 (<0.001) | 0.920 (0.160)
C | $\beta_2$ | 0.745 (0.087) | 1.955 (0.898) | 0.542 (<0.001) | 0.952 (0.884)

Table 4.10: Comparing the effect of the missing data mechanism (MAR, NMAR) on the median and the difference between the median and true value ($\delta$) when the mechanism that generated the missing data and the mechanism assumed by the model are matched.
Simulation sizes for case 1 (MAR-MAR) are 153, 168, 167 for mechanism A, B, and C respectively; for case 4 (NMAR-NMAR), they are 139, 163, 168 respectively.

(a) Case 1: MAR-MAR
Model | $\beta_j$ | $\hat{m}_{Naive}(\hat\beta_j)$ (MAD) | $\hat{m}_{MCEM}(\hat\beta_j)$ (MAD) | $\delta_{Naive}$ | $\delta_{MCEM}$
A | $\beta_0$ | -1.361 (0.230) | -1.450 (0.276) | 0.029 | -0.060
B | $\beta_0$ | -1.354 (0.242) | -1.463 (0.306) | 0.036 | -0.073
C | $\beta_0$ | -1.305 (0.231) | -1.364 (0.288) | 0.085 | 0.026
A | $\beta_1$ | 0.428 (0.275) | 0.471 (0.351) | 0.018 | 0.061
B | $\beta_1$ | 0.378 (0.299) | 0.436 (0.370) | -0.032 | 0.026
C | $\beta_1$ | 0.290 (0.313) | 0.374 (0.345) | -0.120 | -0.036
A | $\beta_2$ | -0.325 (0.191) | -0.696 (0.472) | 0.365 | -0.006
B | $\beta_2$ | -0.312 (0.200) | -0.653 (0.427) | 0.378 | 0.037
C | $\beta_2$ | -0.307 (0.183) | -0.651 (0.435) | 0.383 | 0.039

(b) Case 4: NMAR-NMAR
Model | $\beta_j$ | $\hat{m}_{Naive}(\hat\beta_j)$ (MAD) | $\hat{m}_{MCEM}(\hat\beta_j)$ (MAD) | $\delta_{Naive}$ | $\delta_{MCEM}$
A | $\beta_0$ | -1.330 (0.290) | -1.425 (0.356) | 0.060 | -0.035
B | $\beta_0$ | -1.294 (0.269) | -1.372 (0.356) | 0.096 | 0.018
C | $\beta_0$ | -1.252 (0.280) | -1.376 (0.369) | 0.138 | 0.014
A | $\beta_1$ | 0.409 (0.348) | 0.463 (0.379) | -0.001 | 0.053
B | $\beta_1$ | 0.388 (0.321) | 0.417 (0.377) | -0.022 | 0.007
C | $\beta_1$ | 0.343 (0.305) | 0.429 (0.392) | -0.067 | 0.019
A | $\beta_2$ | -0.334 (0.210) | -0.714 (0.466) | 0.356 | -0.024
B | $\beta_2$ | -0.325 (0.223) | -0.636 (0.432) | 0.365 | 0.054
C | $\beta_2$ | -0.357 (0.209) | -0.733 (0.514) | 0.333 | -0.043

4.4.3.3 Assessing the effect of sample size

The primary objective is to identify empirical evidence suggesting any asymptotic behaviour of the estimates using mechanism B (Table 4.2). For case 1, clear trends are exhibited. Considering first the complete-case approach, we see that for $\beta_1$ the bias worsens, but stays about the same for $\beta_0$ and $\beta_2$ as the sample size increases (Table 4.11). The mean squared error associated with the estimates decreases for all parameters as the sample size increases. The confidence interval coverage for the complete-case estimates of $\beta_2$ degrades as the sample size increases (Table 4.12). The confidence interval coverage rate associated with $\beta_0$ exhibits negligible differences, but the coverage associated with $\beta_1$ appears to be improving with increasing sample size.
Finally, we see no noticeable trend in $\delta_{Naive}$ (Table 4.13).

When the MCEM adjustment is implemented the story appears to change. We see a reduction in the bias for all the parameter estimates as the sample size increases (Table 4.11). This pattern is exhibited again for the MSE and for the estimated relative efficiency. For $\beta_2$ and n = 250 we observe $\hat{e} = 0.67$, which is statistically different from unity. In Table 4.12 we see the confidence interval coverage rates move towards the nominal value as the sample size increases. Finally, in Table 4.13 we see that $\delta_{MCEM}$ decreases as the sample size increases.

For case 4 we see similar trends as with case 1. For the complete-case analysis, the bias associated with the parameters appears to be stable across sample sizes. When n = 100, we see that the bias associated with the estimate of $\beta_1$ is smaller than for the other sample sizes. Although we see this difference, the magnitude of the bias is similar across the sizes and the z-scores tell the same story: $\beta_1$ is unbiased with the others being biased. As before, the MSE and confidence interval length decrease as the sample size increases. There is no noticeable trend for the coverage rates and they exhibit values similar to those seen in case 1. The difference between the estimated median and the true value has some interesting trends. For the intercept and $\beta_2$, the difference increases as the sample size increases, but the reverse is generally true for $\beta_1$.

As with case 1, the MCEM approach for case 4 exhibits some desirable properties as the sample size increases. In general, across all parameters there is a trend of decreasing bias, MSE, $\hat{e}$, and confidence interval length as the sample size increases. The exception to this is for $\beta_2$ at n = 250, where the bias appears to increase. Firstly, this is not a statistically significant difference (p-value = 0.052 on 144 df).
Secondly, the break in the trend could be a result of the small simulation size and the resulting crude empirical approximation to the sampling distribution produced by the simulation experiment when n = 250. In general, as the sample size increases, the coverage rate reaches the nominal level. For $\delta_{MCEM}$ there is no clear trend, but this may be due to the small simulation size when n = 250.

Special attention should be given to the trend of $\hat{e}$ for case 4. We see that as the sample size increases $\hat{e}$ decreases. The reduction is rather quick in that by the time the sample size has reached n = 250, the MCEM approach is at par with or better than the complete-case approach. For $\beta_1$ the relative efficiency exhibits no statistical difference from the complete-case approach, but for $\beta_0$ and $\beta_2$ there is a difference, with that of $\beta_2$ being most noticeable.

Discussion

A general trend emerges for both case 1 and case 4. As we move from the complete-case analysis to the MCEM approach and as we move from small to large sample sizes, there is a general improvement in estimate properties. The bias tends to decrease and balancing the trade-offs appears to be eased with more information. The MSE reduces and in some cases the MCEM approach becomes the more efficient approach. These trends continue to be observed for the length of confidence intervals and for the difference between the median and the true value for the MCEM approach. Furthermore, the MCEM approach exhibits the desirable property of reaching the nominal coverage rate as the sample size increases.

Given the evidence for emerging and desirable asymptotic properties of the MCEM approach, we can make a few observations. The first is that the properties that we anticipated to see by using the EM algorithm do emerge at relatively modest sample sizes.
The second is that due to computational limitations, n = 100 was used for many of the simulation experiments, but this may obfuscate a lucid understanding of the ability of the MCEM approach to manage the dual problem of missing data and mismeasurement in other experiments. That said, seeing the performance of the MCEM algorithm with a modest sample size of n = 100 will give valuable insight for substantive researchers who regularly deal with small and modestly sized samples. The importance lies in how it illuminates the fact that the complete-case approach may be attractive from an efficiency point of view, but is unable to reasonably manage the various biases, namely those caused by missing data and measurement error.

For case 4 there is a wide discrepancy in the number of simulations performed. It was found that in case 4, when dealing with NMAR missing data, the ARS Gibbs sampler regularly encountered numerical problems with the abscissa, causing the algorithm to abort in many runs. Additionally, the algorithm was found to wander through the parameter space, suggesting potential ridges in the log-likelihood function.

Table 4.11: Comparing the effect of sample size on point estimation, bias, mean squared error, and relative efficiency for case 1: MAR-MAR.
Missing data mechanism B was used for this comparison.

            Naive                                               MCEM
Parameter   Mean (SD)        Bias (z)          MSE (SD)        Mean (SD)        Bias (z)          MSE (SD)        $\hat{e}$ (SD)

(a) Case 1: Sample Size = 50, Simulation Size = 179
$\beta_0$   -1.406 (0.420)   -0.016 (-0.510)   0.177 (0.021)   -1.811 (1.932)   -0.421 (-2.914)   3.909 (2.296)   22.08 (12.60)
$\beta_1$    0.337 (0.448)   -0.073 (-2.176)   0.206 (0.025)    0.431 (0.803)    0.021 (0.351)    0.645 (0.172)    3.12 (0.820)
$\beta_2$   -0.339 (0.301)    0.351 (15.609)   0.214 (0.017)   -1.072 (2.177)   -0.382 (-2.351)   4.883 (2.801)   22.81 (12.94)

(b) Case 1: Sample Size = 100, Simulation Size = 168
$\beta_0$   -1.355 (0.251)    0.035 (1.804)    0.064 (0.008)   -1.478 (0.315)   -0.088 (-3.618)   0.107 (0.015)    1.66 (0.17)
$\beta_1$    0.361 (0.309)   -0.049 (-2.062)   0.098 (0.011)    0.448 (0.374)    0.038 (1.315)    0.141 (0.017)    1.44 (0.15)
$\beta_2$   -0.321 (0.188)    0.369 (25.457)   0.171 (0.012)   -0.742 (0.531)   -0.052 (-1.259)   0.285 (0.042)    1.67 (0.28)

(c) Case 1: Sample Size = 250, Simulation Size = 161
$\beta_0$   -1.340 (0.172)    0.050 (3.651)    0.032 (0.003)   -1.433 (0.208)   -0.043 (-2.618)   0.045 (0.005)    1.40 (0.13)
$\beta_1$    0.350 (0.186)   -0.060 (-4.094)   0.038 (0.004)    0.417 (0.210)    0.007 (0.417)    0.044 (0.005)    1.15 (0.08)
$\beta_2$   -0.323 (0.125)    0.367 (37.175)   0.151 (0.007)   -0.711 (0.318)   -0.021 (-0.828)   0.101 (0.011)    0.67 (0.09)

Table 4.12: Comparing the effect of sample size on confidence interval length, and coverage for case 1: MAR-MAR. Missing data mechanism B was used for this comparison.

Parameter   $L_{Naive}$ (SD)   $L_{MCEM}$ (SD)   Naive coverage (p-value)   MCEM coverage (p-value)

(a) Case 1: Sample Size = 50, Simulation Size = 179
$\beta_0$   1.568 (0.293)      7.968 (47.757)    0.955 (0.732)              0.978 (0.012)
$\beta_1$   1.743 (0.355)      3.664 (11.279)    0.966 (0.220)              0.972 (0.072)
$\beta_2$   1.123 (0.221)      8.936 (48.990)    0.670 (<0.001)             0.966 (0.220)

(b) Case 1: Sample Size = 100, Simulation Size = 168
$\beta_0$   1.045 (0.102)      1.306 (0.392)     0.958 (0.588)              0.976 (0.024)
$\beta_1$   1.134 (0.128)      1.315 (0.336)     0.940 (0.600)              0.946 (0.836)
$\beta_2$   0.739 (0.083)      1.774 (0.796)     0.488 (<0.001)             0.940 (0.600)

(c) Case 1: Sample Size = 250, Simulation Size = 161
$\beta_0$   0.644 (0.041)      0.756 (0.123)     0.919 (0.152)              0.963 (0.392)
$\beta_1$   0.704 (0.049)      0.779 (0.089)     0.951 (0.984)              0.944 (0.744)
$\beta_2$   0.461 (0.034)      1.041 (0.230)     0.168 (<0.001)             0.938 (0.524)

Table 4.13: Comparing the effect of sample size on the median and difference between the median and true value ($\delta$) for case 1: MAR-MAR. Missing data mechanism B was used for this comparison.

Parameter   Naive median (MAD)   MCEM median (MAD)   $\delta_{Naive}$   $\delta_{MCEM}$

(a) Case 1: Sample Size = 50, Simulation Size = 179
$\beta_0$   -1.365 (0.425)       -1.481 (0.473)       0.025             -0.091
$\beta_1$    0.362 (0.442)        0.387 (0.501)      -0.048             -0.023
$\beta_2$   -0.327 (0.307)       -0.658 (0.66)        0.363              0.032

(b) Case 1: Sample Size = 100, Simulation Size = 168
$\beta_0$   -1.354 (0.242)       -1.463 (0.306)       0.036             -0.073
$\beta_1$    0.378 (0.299)        0.436 (0.370)      -0.032              0.026
$\beta_2$   -0.312 (0.200)       -0.653 (0.427)       0.378              0.037

(c) Case 1: Sample Size = 250, Simulation Size = 161
$\beta_0$   -1.322 (0.192)       -1.399 (0.220)       0.068             -0.009
$\beta_1$    0.351 (0.210)        0.419 (0.225)      -0.059              0.009
$\beta_2$   -0.316 (0.124)       -0.674 (0.327)       0.374              0.016

Table 4.14: Comparing the effect of sample size on point estimation, bias, mean squared error, and relative efficiency for case 4: NMAR-NMAR. Missing data mechanism B was used for this comparison.

            Naive                                               MCEM
Parameter   Mean (SD)        Bias (z)          MSE (SD)        Mean (SD)        Bias (z)          MSE (SD)        $\hat{e}$ (SD)

(a) Case 4: Sample Size = 50, Simulation Size = 74
$\beta_0$   -1.305 (0.378)    0.085 (1.923)    0.150 (0.022)   -1.512 (0.612)   -0.122 (-1.720)   0.389 (0.101)   2.59 (0.60)
$\beta_1$    0.379 (0.495)   -0.031 (-0.534)   0.246 (0.048)    0.467 (0.648)    0.057 (0.758)    0.423 (0.110)   1.72 (0.28)
$\beta_2$   -0.349 (0.308)    0.341 (9.536)    0.211 (0.030)   -0.932 (1.010)   -0.242 (-2.059)   1.080 (0.291)   5.11 (1.65)

(b) Case 4: Sample Size = 100, Simulation Size = 163
$\beta_0$   -1.319 (0.265)    0.071 (3.418)    0.075 (0.007)   -1.446 (0.374)   -0.056 (-1.914)   0.143 (0.02)    1.90 (0.27)
$\beta_1$    0.402 (0.325)   -0.008 (-0.332)   0.106 (0.012)    0.431 (0.385)    0.021 (0.712)    0.148 (0.018)   1.40 (0.11)
$\beta_2$   -0.319 (0.209)    0.371 (22.629)   0.181 (0.013)   -0.717 (0.533)   -0.027 (-0.644)   0.285 (0.048)   1.57 (0.29)

(c) Case 4: Sample Size = 250, Simulation Size = 46
$\beta_0$   -1.298 (0.166)    0.092 (3.742)    0.036 (0.006)   -1.362 (0.175)    0.028 (1.067)    0.031 (0.006)   0.87 (0.11)
$\beta_1$    0.391 (0.185)   -0.019 (-0.705)   0.035 (0.009)    0.417 (0.198)    0.007 (0.253)    0.039 (0.009)   1.13 (0.09)
$\beta_2$   -0.297 (0.117)    0.393 (22.690)   0.168 (0.013)   -0.621 (0.278)    0.069 (1.697)    0.082 (0.018)   0.49 (0.11)

Table 4.15: Comparing the effect of sample size on confidence interval length, and coverage for case 4: NMAR-NMAR. Missing data mechanism B was used for this comparison.

Parameter   $L_{Naive}$ (SD)   $L_{MCEM}$ (SD)   Naive coverage (p-value)   MCEM coverage (p-value)

(a) Case 4: Sample Size = 50, Simulation Size = 74
$\beta_0$   1.526 (0.226)      2.481 (2.291)     0.946 (0.876)              0.959 (0.680)
$\beta_1$   1.758 (0.321)      2.301 (1.485)     0.959 (0.680)              0.973 (0.224)
$\beta_2$   1.094 (0.197)      3.646 (4.231)     0.635 (<0.001)             0.946 (0.876)

(b) Case 4: Sample Size = 100, Simulation Size = 163
$\beta_0$   1.045 (0.105)      1.315 (0.500)     0.933 (0.372)              0.963 (0.372)
$\beta_1$   1.228 (0.139)      1.359 (0.280)     0.963 (0.372)              0.945 (0.772)
$\beta_2$   0.748 (0.076)      1.749 (0.677)     0.509 (<0.001)             0.920 (0.160)

(c) Case 4: Sample Size = 250, Simulation Size = 46
$\beta_0$   0.640 (0.032)      0.718 (0.084)     0.935 (0.676)              0.957 (0.828)
$\beta_1$   0.741 (0.060)      0.784 (0.076)     0.913 (0.372)              0.957 (0.828)
$\beta_2$   0.459 (0.034)      0.973 (0.173)     0.087 (<0.001)             0.957 (0.828)

Table 4.16: Comparing the effect of sample size on the median and difference between the median and true value ($\delta$) for case 4: NMAR-NMAR.
Missing data mechanism B was used for this comparison.

Parameter   Naive median (MAD)   MCEM median (MAD)   $\delta_{Naive}$   $\delta_{MCEM}$

(a) Case 4: Sample Size = 50, Simulation Size = 74
$\beta_0$   -1.306 (0.424)       -1.428 (0.441)       0.084             -0.038
$\beta_1$    0.278 (0.402)        0.370 (0.409)      -0.132             -0.040
$\beta_2$   -0.372 (0.292)       -0.780 (0.784)       0.318             -0.090

(b) Case 4: Sample Size = 100, Simulation Size = 163
$\beta_0$   -1.294 (0.269)       -1.372 (0.356)       0.096              0.018
$\beta_1$    0.388 (0.321)        0.417 (0.377)      -0.022              0.007
$\beta_2$   -0.325 (0.223)       -0.636 (0.432)       0.365              0.054

(c) Case 4: Sample Size = 250, Simulation Size = 46
$\beta_0$   -1.267 (0.168)       -1.352 (0.151)       0.123              0.038
$\beta_1$    0.378 (0.186)        0.393 (0.265)      -0.032             -0.017
$\beta_2$   -0.284 (0.098)       -0.589 (0.247)       0.406              0.101

Table 4.17: Study design for exploring the effect of $\tau$.

            $\tau$
Pattern     Data Generating Model   Assumption
1           1.0                     1.0
2           1.0                     0.5
3           0.5                     1.0
4           0.5                     0.5

4.4.3.4 Comparing the effect of the specification of $\tau$

Under the assumption that we have correctly specified the missing data model, mechanism B, two aspects will be considered in this experiment: correctly specifying $\tau$ and incorrectly specifying it (Table 4.17). When the value of $\tau$ is correctly specified, we can observe some general differences between a noisy measurement error model (pattern 1) and a much less noisy one (pattern 4). For the complete-case analysis we observe a reduction in the bias as we move from the noisy model to the less noisy one, and we see a similar trend in the MSE. The MSE associated with $\beta_0$ is the exception to this trend (Tables 4.18 and 4.19). Here, the bias is reduced, but the variability of the estimator increases for $\beta_0$ and $\beta_2$ as $\tau$ goes from 1.0 to 0.5. We see the increased variability affect the MSE of $\beta_0$. For the other coefficients we see a reduction in the MSE. We see longer confidence intervals as we move from the noisy to the less noisy model; this difference may not be statistically significant (Table 4.20). There is no substantial difference in the coverage and $\delta_{Naive}$ (Table 4.21).
The exceptions are that the coverage rate for $\beta_2$ in the less noisy model is much better than that of the noisy model, and that $\delta_{Naive}$ for $\beta_2$ is noticeably smaller for the less noisy model.

Considering the MCEM adjustment, we see some familiar features. As desired, we see a reduction in the bias of the estimator when moving from a noisy measurement error model to a less noisy one. We see little change in the MSE, but we do note that $\beta_2$ has a smaller associated MSE for the less noisy model (Tables 4.18 and 4.19). Unlike the previous situation, the confidence interval length is smaller with the less noisy model. We also see the maintenance of reaching the nominal level for the confidence intervals and note that $\beta_0$ goes from not reaching the nominal level in the noisy model to reaching it in the less noisy model (Table 4.20). We see a reduction of $\delta_{MCEM}$ as we move from pattern 1 to 4 (Table 4.21). Finally, we observe that there is little reduction in $\hat{e}$ as we move to a less noisy measurement error model.

When considering the problem of incorrectly specifying $\tau$ we need to compare the changes with the correct model. For pattern 2, where we under-estimate the variability of the measurement error model, we should compare it with pattern 1. For pattern 3, where we over-estimate the variability of the measurement error model, we should compare it with pattern 4. We see, with pattern 2, that the complete-case analysis bias is similar to that of pattern 1, as are the MSE, length, coverage, and $\delta_{Naive}$. For the MCEM approach we see that the bias improves from the complete-case situation, but is larger for the $\beta_1$ and $\beta_2$ estimates than those of pattern 1. The MSE is about the same between the two methods, but is smaller than that of pattern 1 (Table 4.18). The lengths are longer than the complete-case confidence interval lengths, but they are shorter than those seen in pattern 1. The coverage is the most complicated relationship in that there is no clear trend or pattern.
For $\beta_2$ the coverage for the associated confidence intervals is similar across estimation procedures, but worse than in pattern 1. For $\beta_1$, the coverage is similar for both comparisons, and for $\beta_0$ the coverage is similar across estimation methodologies and better than pattern 1 (Table 4.20). For $\delta_{MCEM}$ the differences are in general larger than those found in pattern 1 (Table 4.21). Finally, the estimated relative efficiency is better for pattern 2 than in pattern 1, and is noticeably better for $\beta_2$.

For pattern 3, the results are not as favourable towards mismatching the true value of $\tau$ and the proposed value used for the model. In the complete-case situation, the parameter estimates are less variable than the matched counterparts but the bias increases with the mismatch. The result is that the MSE exhibits negligible differences between the matched and mismatched estimates. We notice that the confidence interval lengths and coverage rates are similar for both pattern 3 and 4, but $\delta_{Naive}$ is larger for the mismatched situation. For the MCEM adjustment, the bias is much greater than that seen in the complete-case analysis and is much worse than in pattern 4 (Table 4.19). This pattern is seen again for the MSE, the estimate of the relative efficiency, and the length of the confidence intervals. Coverage worsens for $\beta_0$ and $\beta_1$ when compared to both the complete-case approach and the matched situation. For $\beta_2$ the coverage rate for the associated confidence interval reaches the nominal rate and is similar to that of pattern 4 (Table 4.20).
Finally, the difference between the median and the true value is worse for both comparisons (Table 4.21).

Discussion

There are two main comparisons on which to make comment. Firstly, we will consider the difference between a noisy and less noisy model; secondly we will consider the situations when $\tau$ is inaccurately specified. We see that the less noisy model has smaller bias and a greater ability to maintain the associated confidence intervals at the correct nominal level, but we see little gain in the area of MSE and in the estimated relative efficiency, which may be due in part to the general, and unexpected, increase in variability of the estimators in pattern 4 when compared to pattern 1.

Pattern 2 under-estimates $\tau$, assuming a less noisy model. If we were to assess the performance solely on the MSE and relative efficiency, we may conclude that under-estimating the measurement error may be beneficial to the estimation procedure. This hides a bias that is larger for the covariates of interest: $\beta_1$ and $\beta_2$. Furthermore, we see undesirable properties for the coverage rate and for $\delta_{MCEM}$. Pattern 3 over-estimates $\tau$ and we observe that many of the estimate properties are noticeably worse than for the correctly specified model. If we were to make a mistake in our assumptions we would rather assume a smaller measurement error than one that is too big.

A final note concerns the execution of this experiment. It is clear from the number of successful simulations for pattern 3 that many computational problems existed.
The primary and most frequently encountered problem was with the initialization and updating of the abscissa.

Table 4.18: Comparing how agreement and disagreement on the specification of $\tau$ when the true value of $\tau$ = 1.0 (Table 4.17) affects point estimation, bias, mean squared error, and relative efficiency when using missing data mechanism B with agreement between the mechanism generating the missing data and the assumed missing data mechanism for case 1: MAR-MAR.

            Naive                                               MCEM
Parameter   Mean (SD)        Bias (z)          MSE (SD)        Mean (SD)        Bias (z)          MSE (SD)        $\hat{e}$ (SD)

(a) Pattern 1: Simulation Size = 168
$\beta_0$   -1.355 (0.251)    0.035 (1.804)    0.064 (0.008)   -1.478 (0.315)   -0.088 (-3.618)   0.107 (0.015)   1.66 (0.17)
$\beta_1$    0.361 (0.309)   -0.049 (-2.062)   0.098 (0.011)    0.448 (0.374)    0.038 (1.315)    0.141 (0.017)   1.44 (0.15)
$\beta_2$   -0.321 (0.188)    0.369 (25.457)   0.171 (0.012)   -0.742 (0.531)   -0.052 (-1.259)   0.285 (0.042)   1.67 (0.28)

(b) Pattern 2: Simulation Size = 179
$\beta_0$   -1.352 (0.267)    0.038 (1.918)    0.073 (0.008)   -1.379 (0.278)    0.011 (0.527)    0.078 (0.008)   1.06 (0.03)
$\beta_1$    0.367 (0.310)   -0.043 (-1.878)   0.098 (0.012)    0.365 (0.307)   -0.045 (-1.965)   0.097 (0.011)   0.99 (0.03)
$\beta_2$   -0.327 (0.187)    0.363 (25.964)   0.167 (0.010)   -0.394 (0.222)    0.296 (17.901)   0.137 (0.010)   0.82 (0.02)

Table 4.19: Comparing how agreement and disagreement on the specification of $\tau$ when the true value of $\tau$ = 0.5 (Table 4.17) affects point estimation, bias, mean squared error, and relative efficiency when using missing data mechanism B with agreement between the mechanism generating the missing data and the assumed missing data mechanism for case 1: MAR-MAR.

            Naive                                               MCEM
Parameter   Mean (SD)        Bias (z)          MSE (SD)        Mean (SD)        Bias (z)          MSE (SD)        $\hat{e}$ (SD)

(a) Pattern 3: Simulation Size = 39
$\beta_0$   -1.306 (0.217)    0.084 (2.425)    0.054 (0.011)   -1.824 (0.648)   -0.434 (-4.188)   0.608 (0.158)   11.19 (4.34)
$\beta_1$    0.366 (0.265)   -0.044 (-1.025)   0.072 (0.012)    0.752 (0.549)    0.342 (3.892)    0.418 (0.109)    5.79 (1.73)
$\beta_2$   -0.471 (0.246)    0.219 (5.559)    0.109 (0.023)   -2.304 (1.796)   -1.614 (-5.612)   5.828 (1.684)   53.67 (23.1)

(b) Pattern 4: Simulation Size = 150
$\beta_0$   -1.383 (0.290)    0.007 (0.297)    0.084 (0.012)   -1.45 (0.321)    -0.06 (-2.268)    0.107 (0.018)   1.26 (0.08)
$\beta_1$    0.387 (0.297)   -0.023 (-0.949)   0.089 (0.009)    0.429 (0.329)    0.019 (0.702)    0.109 (0.013)   1.23 (0.08)
$\beta_2$   -0.527 (0.265)    0.163 (7.552)    0.097 (0.010)   -0.724 (0.383)   -0.034 (-1.077)   0.148 (0.028)   1.52 (0.21)

Table 4.20: Comparing confidence interval length, and coverage for the four patterns of $\tau$ as given in Table 4.17 when using missing data mechanism B with agreement between the mechanism generating the missing data and the assumed missing data mechanism for case 1: MAR-MAR.

Parameter   $L_{Naive}$ (SD)   $L_{MCEM}$ (SD)   Naive coverage (p-value)   MCEM coverage (p-value)

(a) Pattern 1: Simulation Size = 168
$\beta_0$   1.045 (0.102)      1.306 (0.392)     0.958 (0.588)              0.976 (0.024)
$\beta_1$   1.134 (0.128)      1.315 (0.336)     0.940 (0.600)              0.946 (0.836)
$\beta_2$   0.739 (0.083)      1.774 (0.796)     0.488 (<0.001)             0.940 (0.600)

(b) Pattern 2: Simulation Size = 179
$\beta_0$   1.044 (0.107)      1.084 (0.137)     0.944 (0.732)              0.955 (0.732)
$\beta_1$   1.141 (0.135)      1.154 (0.149)     0.944 (0.732)              0.951 (0.988)
$\beta_2$   0.747 (0.078)      0.898 (0.114)     0.464 (<0.001)             0.737 (<0.001)

(c) Pattern 3: Simulation Size = 39
$\beta_0$   1.031 (0.086)      2.717 (3.923)     1.000 (<0.001)             1.000 (<0.001)
$\beta_1$   1.128 (0.121)      2.428 (2.576)     0.974 (0.336)              1.000 (<0.001)
$\beta_2$   0.938 (0.116)      6.331 (10.108)    0.821 (0.036)              0.923 (0.528)

(d) Pattern 4: Simulation Size = 150
$\beta_0$   1.073 (0.137)      1.173 (0.240)     0.933 (0.412)              0.947 (0.856)
$\beta_1$   1.148 (0.141)      1.225 (0.193)     0.947 (0.856)              0.960 (0.532)
$\beta_2$   0.979 (0.145)      1.360 (0.325)     0.880 (0.008)              0.967 (0.256)

Table 4.21: Comparing the effect of sample size on the median and difference between the median and true value ($\delta$) for the four patterns of $\tau$ as given in Table 4.17 when using missing data mechanism B with agreement between the mechanism generating the missing data and the assumed missing data mechanism for case 1: MAR-MAR.

Parameter   Naive median (MAD)   MCEM median (MAD)   $\delta_{Naive}$   $\delta_{MCEM}$

(a) Pattern 1: Simulation Size = 168
$\beta_0$   -1.365 (0.425)       -1.481 (0.473)       0.025             -0.091
$\beta_1$    0.362 (0.442)        0.387 (0.501)      -0.048             -0.023
$\beta_2$   -0.327 (0.307)       -0.658 (0.66)        0.363              0.032

(b) Pattern 2: Simulation Size = 179
$\beta_0$   -1.348 (0.263)       -1.364 (0.26)        0.042              0.026
$\beta_1$    0.374 (0.271)        0.356 (0.275)      -0.036             -0.054
$\beta_2$   -0.316 (0.187)       -0.377 (0.206)       0.374              0.313

(c) Pattern 3: Simulation Size = 39
$\beta_0$   -1.309 (0.190)       -1.551 (0.608)       0.081             -0.161
$\beta_1$    0.353 (0.319)        0.671 (0.501)      -0.057              0.261
$\beta_2$   -0.458 (0.274)       -1.883 (1.413)       0.232             -1.193

(d) Pattern 4: Simulation Size = 150
$\beta_0$   -1.364 (0.264)       -1.437 (0.304)       0.026             -0.047
$\beta_1$    0.378 (0.299)        0.399 (0.315)      -0.032             -0.011
$\beta_2$   -0.497 (0.233)       -0.667 (0.316)       0.193              0.023

4.4.4 Simulation study 1 discussion

We will first consider the combined effect of having missing data and measurement error problems in the same data set on a complete-case analysis. The estimated standard deviation was large, but reduced in magnitude as the sample size increased. When the missing data mechanism is MAR, $\beta_1$ is generally unbiased, but mechanism C produced large z-scores, with mechanism B producing z-scores which indicate that problems with bias may arise. As expected, $\beta_2$ was biased for all mechanisms and in all considered scenarios. Increasing the sample size appears to have little effect on bias reduction, but smaller $\tau$ appears to yield smaller biases. In terms of coverage, the nominal rate was not attained for any mechanism for $\beta_2$, but this was not unexpected. Mechanism C failed to attain the nominal rate for $\beta_0$. Increasing the sample size did not improve the overall attainment of the nominal coverage rate.

With a NMAR missing data mechanism, $\beta_1$ was unbiased for mechanisms A and B and $\beta_2$ was biased as expected. The bias of $\beta_0$ became more problematic under the NMAR assumption.
As with MAR, increasing the sample size had little impact on the bias. The attainment of the nominal coverage rate exhibited the same pattern as with MAR, with no change in attainment as the sample size increased.

When compared with the naive complete-case analysis we see that the MCEM approach attempts to mitigate the bias of $\beta_2$ associated with the mismeasured covariate by striking a set of trade-offs with the other estimates. The emergent empirical relationship shows the MCEM algorithm implicitly setting up a hierarchy amongst the three parameters, allowing the bias in both $\beta_0$ and $\beta_1$ to increase in order to reduce that of $\beta_2$. It appears to put the overall quality of the model estimates above the quality of the individual estimates. This is dramatically seen as the sample size increases. Furthermore, there appears to be a subtle structure to this hierarchy where both $\beta_0$ and $\beta_1$ suffer at the expense of correcting $\beta_2$. When benefit is brought back to the parameter estimates, $\beta_1$ is the first to undergo improvement.

The missing data mechanism, correctly or incorrectly specified, appears to make little impact on the results, but we do see that under-modelling the missing data mechanism is in general worse than over-modelling it. When the systematic component of the missing data mechanism is considered, we do see evidence that including the response in the systematic component does influence the quality of the estimates (Mechanism C). In this situation, the MCEM approach has a much more difficult time reducing the bias for all three parameter estimates and may sacrifice the accuracy of the estimates following the aforementioned priority.

In general, the larger the sample size, the better the performance of the MCEM adjustment. Although not unexpected, it is a desirable quality to observe. We also observe that the parameter estimates are in general better for less noisy measurement error models.
When $\tau$ is incorrectly specified, we see that it is better to use a less noisy value for $\tau$ than the one in the true measurement error model.

Although it was possible to execute the simulation study and obtain results, it was not without its obstacles. The barriers break down into four types: storage, computational time, ridges, and the ARS Gibbs sampler. Given the structure of the mismeasurement problem of the motivating substantive context, no additional data was assumed to exist for modelling the measurement error model. A classical model was assumed, as was the distribution for the random noise, $\epsilon$, in the measurement error model. This model was then built into the likelihood function, thus we needed to draw samples for each subject in the data. With a data set of size 100 with two covariates and 5000 ARS Gibbs samples per subject, the resulting augmented data set yields a 500,000 by 2 data matrix. A data matrix of this size becomes non-trivial when it needs to be held in memory for the duration of the algorithm. These algorithms often required between one and two gigabytes of memory, which was large enough to be noticed. When the sample size increased to 250 subjects, the increase in memory requirement was not a linear increase of 2.5 times that of the 100 subject memory requirements. Although it was possible to manage the programs so that they would run at non-peak times, they were sufficiently large as to warrant some concern about the practicality of what could be called a "brute force" approach to memory management. The required memory was reduced by identifying only the key pieces of information necessary for the overall simulation study; these changes made a modest impact on the required memory.

The second practical feature on which comment can be made is the computational time for the MCEM algorithm with imperfect covariates. Given the comments about the computational difficulties with the MCEM approach made by Ibrahim et al.
[44], it was expected that we may only be able to reasonably attain simulation sizes of around 60, but we were able to reach sizes of approximately 150 and still maintain a reasonable computation time in doing so. For this particular mixture of imperfect covariates, we ran each scenario in batches of 25. The computational time for each batch was roughly 2 to 5 days depending on the sample size and the demands placed on the servers. Although for many simulation studies this may seem unreasonably long, it permitted enough time to assess the algorithm itself, manage computational difficulties, and make extra runs for batches which would encounter computational problems or fail. Furthermore, this permitted a wide scoping look at the performance of the algorithm.

The computational problems fall into one of two types: potential ridges and the ARS Gibbs abscissa. Certain scenarios, such as the n = 50 scenario, exhibited behaviour highly suggestive of a ridge in the likelihood. At this point it is unclear if the likelihood associated with imperfect variables with the assumptions levied in this thesis has ridges and in these situations we were unlucky, or if the likelihood associated with only these scenarios contained ridges. An example of the problem is best exhibited with two scenarios. The first was when n = 50. In this situation, the measure of convergence travelled in the range (0.03, 0.04) and reached over 200 iterations. This was a noticeably large number of iterations, for the majority of the simulations were ending around 30 or 40 iterations. From what could be assessed, it appeared as if the algorithm found a portion of the likelihood where it could continue to travel without changing the distance between the lagged estimates of the parameter vector. This behaviour was also exhibited for mechanism C with n = 100.
Here, the convergence measure stayed in (0.01, 0.1) for longer than expected, around 60 iterations, but eventually failed before reaching a very large number of iterations (more than 300) due to problems with updating the abscissa for the ARS Gibbs sampler.

When considering the third pattern for $\tau$, we encountered some interesting computational problems: failure and non-convergence. This particular scenario was highly prone to problems with initializing and updating the abscissa for the ARS Gibbs sampler. This problem was not isolated to this scenario and occurred when n = 50 and when the missing data mechanisms were mismatched. Furthermore, it appears that many of the problems with updating the abscissa were found with simulations where the convergence criterion remained in a well defined area, appearing to travel along a ridge or a plateau. A final problem that was observed occurred only with pattern 3 of the $\tau$ experiment. In this situation, non-convergence was common and in some cases, the convergence measure grew with successive iterations.

4.5 Simulation study 2: a random variable which suffers from both imperfections

In this simulation study, we will consider the situation where there are multiple covariates with one suffering from both imperfections.
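This study will investigate a comparison of the:

• effect of the model assumption about the missing data mechanism,
• missing data mechanisms,
• effect of sample size, and
• effect of different distributional assumptions for the error term of the measurement error model.

The assumption of a classical measurement error model will remain constant across all the scenarios in this simulation study with a target of 200 simulations.

4.5.1 Likelihood model for simulation study 2

Beginning with the covariate model, we have

$$p(X^{A}_{i} \mid \psi) = p(X^{A}_{i1} \mid \psi_{1})\, p(X^{A}_{i(-1)} \mid \psi_{(-1)})$$

where $X^{A}_{i(-1)}$ is the vector of augmented random variables excluding the first augmented random variable, $X_{i(-1)}$ is the associated random variable, and $\psi_{(-1)}$ is the vector of indexing parameters with the indexing parameter associated with $X_{i1}$ removed. The covariate $X_{i1} \in \mathbb{R}$ while $X_{i(-1)} \in \mathbb{R}^{3}$. For $X_{ij}$, $j = 2, 3, 4$, $X^{E}_{ij} = X_{ij}$, so $X^{A}_{ij} = (X^{E}_{ij}, X_{ij}) = X_{ij}$ for $j = 2, 3, 4$. The covariate model becomes

$$p(X^{A}_{i} \mid \psi) = p(X^{A}_{i1} \mid \psi_{1})\, p(X_{i(-1)} \mid \psi_{(-1)}) = p(X^{E}_{i1} \mid x_{i1}, \tau)\, p(X_{i1} \mid \psi_{1})\, p(X_{i(-1)} \mid \psi_{(-1)}).$$

The log-likelihood is now specified as

$$\ell_c(\theta \mid x^{U}, y) = \sum_{i=1}^{n} \left[ \log p(R_i \mid x_i, y_i, \gamma) + \log p(Y_i \mid x_i, \beta) + \log p(X^{E}_{i1} \mid x_{i1}, \tau) + \log p(X_{i1} \mid \psi_{1}) + \log p(X_{i(-1)} \mid \psi_{(-1)}) \right].$$

For the covariates, we will assume a multivariate normal distribution for the joint probability distribution of $X$ where $X \sim MVN(\mu, \Sigma)$, with

$$\mu = (0, 0, 0, 0)^{T} \quad \text{and} \quad \Sigma = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0.1 & 0.05 \\ 0 & 0.1 & 1 & 0.1 \\ 0 & 0.05 & 0.1 & 1 \end{pmatrix}.$$

It is clear that the multivariate normal specification has $X_{i1}$ pairwise independent of $X_{i(-1)}$. Under the assumption of an unbiased imperfection model, and with the first covariate mismeasured for all $i$, equation 2.5.8 for $X^{E}_{i1}$ becomes

$$X^{E}_{i1} = \begin{cases} X^{*}_{i1,\mathrm{obs}} & \text{if } X_{i1} \text{ is observed,} \\ X^{*}_{i1,\mathrm{miss}} & \text{if } X_{i1} \text{ is missing,} \end{cases}$$

where $X^{*}_{i1,F} = X_{i1} + \epsilon_i$ for $F \in \{\mathrm{obs}, \mathrm{miss}\}$ and $\epsilon_i \sim N(0, \tau^2)$, so $X^{*}_{i1,F} \sim N(x_{i1}, \tau^2)$ under the assumption that the $X^{*}_{i1}$ are identically distributed.

In the previous simulation study a noisy measurement error model was assumed. For this simulation study, we will assume a slightly less noisy model and specify $\tau$ = 0.7 as the true value for all models. Unlike the previous simulation study, we will not consider various levels of $\tau$, but instead we will focus on the problem of misspecifying the correct value of $\tau$.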
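As a rough illustration of the data-generating step described above, the following sketch draws the four covariates from the stated multivariate normal distribution and adds classical measurement error to the first one with $\tau$ = 0.7. It is a minimal sketch under stated assumptions rather than the thesis code: the helper names (`cholesky`, `draw_subject`), the seed, and the number of draws are hypothetical, and a hand-written Cholesky factorisation is used only to keep the example free of external libraries.

```python
import math
import random

# Covariance matrix and measurement error SD as stated in the text.
SIGMA = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.1, 0.05],
    [0.0, 0.1, 1.0, 0.1],
    [0.0, 0.05, 0.1, 1.0],
]
TAU = 0.7


def cholesky(a):
    """Lower-triangular factor `low` such that low * low^T equals a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def draw_subject(rng, low, tau=TAU):
    """Return (x, x1_star): true covariates and the mismeasured x1."""
    z = [rng.gauss(0.0, 1.0) for _ in range(len(low))]
    # Zero mean vector, so x is just the Cholesky factor applied to z.
    x = [sum(low[i][k] * z[k] for k in range(i + 1))
         for i in range(len(low))]
    # Classical measurement error: x1_star = x1 + epsilon.
    x1_star = x[0] + rng.gauss(0.0, tau)
    return x, x1_star


rng = random.Random(2009)  # arbitrary seed, for reproducibility only
LOW = cholesky(SIGMA)
sample = [draw_subject(rng, LOW) for _ in range(20000)]
```

Under these assumptions the mismeasured covariate has marginal variance $1 + \tau^2 = 1.49$, which gives a quick check that the draws behave as intended.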
The change in focus for the $\tau$ experiment pertains to the similarity of the applied example to this simulation study; with the applied example we will not know the true value of $\tau$. In this respect it is of interest to understand how the naive complete-case approach and the MCEM methodology perform when the model is misspecified in terms of $\tau$.

The response is a Bernoulli random variable with the binary logistic regression response model given in section 2.5.5, which is indexed by $\beta = (-0.69, 0.41, -0.22, 0.10, -0.11)$. The mechanism of imperfection model is given in section 2.5.6 and we will retain the definition for the imperfection indicator given in section 2.3. It is clear from equation 2.5.14 that the conditional expression of the joint distribution of the imperfection indicators allows for a dependent structure between the indicator for missing data and the indicator for mismeasurement. Also, it allows for a dependent structure across the set of imperfection indicators.

Two simplifying assumptions about the mechanism of imperfection will be made. The first is that the $j$th and $k$th indicators of imperfection are independent for all $j = 1, \ldots, p$ and $k = 1, \ldots, p$. The second assumption is that the indicator for missing data and the indicator for mismeasurement are independent, $R^{M}_{ij} \perp R^{E}_{ij}$ for all $j$, $j = 1, \ldots, 4$. Predicated on these two assumptions and the model currently under investigation, equation 2.5.14 becomes

$$p(R_i \mid x_i, y_i, \gamma) = p(R^{M}_{i1} \mid x_i, y_i, \gamma^{M})\, p(R^{E}_{i1} \mid x_i, y_i, \gamma^{E}).$$

It is assumed that $X_1$ is mismeasured for all realizations, so $R^{E}_{i1} = 0$ for all subjects, which means that there is only one outcome possible for this random variable, and

$$p(R_i \mid x_i, y_i, \gamma) = p(R^{M}_{i1} \mid x_i, y_i, \gamma^{M}).$$

Table 4.22: Coefficient values for different missing data mechanisms used to generate simulated data sets.
The systematic component is specified as $\gamma = (\gamma_0, \gamma_1, \gamma_2, \gamma_3, \gamma_4)^T$.

Mechanism   MAR                           NMAR
A           (-2, 0, 0.5, 0.25, 0.5)^T     (-2, 0.75, 0.5, 0.25, -0.5)^T
B           (-3, 0, 0, 1, 1)^T            (-3, 2, 0, 1, 1)^T
C           (-2.5, 0)^T                   (-2.5, 0)^T

The complete log-likelihood becomes
\[ \ell_c(\Phi \mid x^U, y) = \sum_{i=1}^{n} \big[ \log p(R^M_{i1} \mid x_i, y_i, \gamma^M) + \log p(Y_i \mid x_i, \beta) + \log p(X^E_{i1} \mid x_{i1}, \tau) + \log p(X_{i1} \mid \psi_1) + \log p(X_{i(-1)} \mid \psi_{(-1)}) \big] \qquad (4.5.1) \]
since we are not considering over- or under-dispersed models.

For the simulation study, one experiment will consider three different missing data mechanisms (Table 4.22). The structure for the missing data mechanism model is the same for all three, $\gamma = (\gamma_0, \gamma_1, \gamma_2, \gamma_3, \gamma_4)^T$, where $\gamma_0$ corresponds to the intercept, $\gamma_1$ corresponds to the imperfect covariate, and $(\gamma_2, \gamma_3, \gamma_4)$ correspond to the perfect covariates. For this simulation study, the response was not considered to be a covariate in the indicator of imperfection model.

For missing data mechanism A, the probability of missing information was dependent on $(x_2, x_3, x_4)$ if MAR and on all covariates if NMAR. The simulation-based mean proportion of missing data is 0.14 and 0.16, respectively, with respective ranges of 0.25 and 0.27. As with the previous simulation study, the proportion of missing information was kept high to reflect the assumption, and observed levels of missing data, often encountered with social-epidemiological research. As with mechanism A of the previous simulation study, the mechanism was constructed so that the MAR and NMAR proportions would be similar.

In contrast to mechanism A, mechanism B puts much more weight on the imperfect covariate and does not depend on all the perfect covariates. The simulation-based mean proportion of missing data is 0.096 and 0.162 for the MAR and NMAR mechanisms. The respective ranges are 0.21 and 0.28.

For mechanism C, we further explored the role of the imperfect covariate on EM adjustment. For MAR, we consider the situation where the missing data mechanism is MCAR by setting $\gamma_j = 0$ for $j = 1,2,3,4$. As before we compare this with NMAR. The simulation-based mean proportion of missing data is 0.076 for MAR and 0.173 for NMAR.
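The way a coefficient vector $\gamma$ translates into a proportion of missing data can be sketched in Python. This is an illustrative sketch only. The mechanism of imperfection model itself is defined in section 2.5.6, so the logistic link, the coding of the indicator (1 marks a missing value of $x_1$), and the helper name `draw_missing_indicator` are assumptions made here. The two coefficient vectors are the mechanism C (MCAR) setting described in the text and the mechanism A NMAR entry of Table 4.22.

```python
import numpy as np

# Covariate covariance matrix from the likelihood model section.
SIGMA = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.1, 0.05],
    [0.0, 0.1, 1.0, 0.1],
    [0.0, 0.05, 0.1, 1.0],
])


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-z))


def draw_missing_indicator(x, gamma, rng):
    """Hypothetical helper: return (indicator, probability of missingness)."""
    design = np.column_stack([np.ones(x.shape[0]), x])  # intercept, x1..x4
    prob_missing = expit(design @ np.asarray(gamma, dtype=float))
    return rng.binomial(1, prob_missing), prob_missing


rng = np.random.default_rng(2)
x = rng.multivariate_normal(np.zeros(4), SIGMA, size=200_000)

gamma_c_mcar = [-2.5, 0.0, 0.0, 0.0, 0.0]     # mechanism C with gamma_j = 0, j >= 1
gamma_a_nmar = [-2.0, 0.75, 0.5, 0.25, -0.5]  # mechanism A, NMAR column

miss_c, _ = draw_missing_indicator(x, gamma_c_mcar, rng)
miss_a, _ = draw_missing_indicator(x, gamma_a_nmar, rng)
print(round(float(miss_c.mean()), 3), round(float(miss_a.mean()), 3))
```

Under these assumptions the simulated proportions can be compared with the mean proportions of missing data reported for the corresponding mechanisms.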
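The respective ranges for mechanism C are 0.2 and [...].

4.5.2 Simulation specific details for the Monte Carlo EM algorithm

Beginning with the approximation to $Q(\Phi \mid \Phi^{(t)})$, we have
\[ Q(\Phi \mid \Phi^{(t)}) \approx \sum_{i=1}^{n} \Big\{ \Big(\prod_j r_{ij}\Big) \ell_c(\Phi \mid x_i, y_i) + \Big(1 - \prod_j r_{ij}\Big) \big[ Q_i(\gamma \mid \gamma^{(t)}) + Q_i(\beta \mid \beta^{(t)}) + Q_i(\psi \mid \psi^{(t)}) \big] \Big\}. \]
Under the imperfection mechanism model assumptions given in the previous section, equation 4.5.1 and the functional form given in equation 3.3.10, we have
\[ Q_i(\gamma \mid \gamma^{(t)}) = \frac{1}{m} \sum_{k=1}^{m} \log p\big(R^M_{i1} \mid x^{(t+1)}_{ik}, y_i, \gamma\big) \]
where $m = m_t$ as discussed in the previous simulation study. The expectation-step estimate of the response model given by equation 3.4.6 becomes
\[ Q_i(\beta \mid \beta^{(t)}) = \frac{1}{m} \sum_{k=1}^{m} \log p\big(y_i \mid x^{(t+1)}_{ik}, \beta\big). \]
The covariate model is slightly more involved than that of the previous simulation. We approximate equation 3.3.11 in two parts. First we have the approximation for the observable random variable,
\[ \frac{1}{m} \sum_{k=1}^{m} \log p\big(x^E_{i1} \mid x^{(t+1)}_{i1k}, \tau\big), \]
and second we have the approximation for the unobservable target random variable,
\[ \frac{1}{m} \sum_{k=1}^{m} \big[ \log p(x^{(t+1)}_{i1k} \mid \psi_1) + \log p(x_{i(-1)} \mid \psi_{(-1)}) \big] = \frac{1}{m} \sum_{k=1}^{m} \big[ \log p(x^{(t+1)}_{i1k} \mid \psi_1) \big] + \log p(x_{i(-1)} \mid \psi_{(-1)}), \]
so
\[ Q_i(\psi \mid \psi^{(t)}) = \frac{1}{m} \sum_{k=1}^{m} \big[ \log p(x^E_{i1} \mid x^{(t+1)}_{i1k}, \tau) + \log p(x^{(t+1)}_{i1k} \mid \psi_1) \big] + \log p(x_{i(-1)} \mid \psi_{(-1)}). \]

4.5.2.1 Gibbs adaptive rejection sampling

Beginning with equations 3.4.8, the structure of the Gibbs sampler for this simulation is
\[ x^{E(t+1)}_{i1} \sim p\big(x^E_{i1} \mid x^{(t)}_{i1}, r_i, x_{i(-1)}, y_i, \Phi^{(t)}\big) \quad \text{and} \quad x^{(t+1)}_{i1} \sim p\big(x_{i1} \mid r_i, x^{E(t+1)}_{i1}, x_{i(-1)}, y_i, \Phi^{(t)}\big). \]
The full conditionals are
\[ p\big(x^E_{i1} \mid x^{(t)}_{i1}, r_i, x_{i(-1)}, y_i, \Phi^{(t)}\big) \propto p\big(x^E_{i1} \mid x^{(t)}_{i1}, \tau\big) \]
and
\[ p\big(x_{i1} \mid r_i, x^{E(t+1)}_{i1}, x_{i(-1)}, y_i, \Phi^{(t)}\big) \propto p\big(R_i \mid x^{E(t+1)}_{i1}, x_{i1}, x_{i(-1)}, y_i, \gamma^{(t)}\big) \times p\big(y_i \mid x^{E(t+1)}_{i1}, x_{i1}, x_{i(-1)}, \beta^{(t)}\big) \times p\big(x^{E(t+1)}_{i1} \mid x_{i1}, \tau\big) \times p\big(x_{i1} \mid \psi^{(t)}_1\big). \]
As with the previous simulation study, each full conditional is itself log-concave (Table B.1 and Appendix B.1.5). We will use the ARS Gibbs sampler under the sampling specifications outlined in simulation study one (Section [...]).

4.5.2.2 Analytic solutions for maximization

The underlying concepts for maximization are identical to those in section 4.4.2.1, thus we proceed directly to the normal distribution, which has a closed form solution. Before we proceed, an important observation should be made: $\tau$ is assumed to be given for the model to ensure it is identifiable. This is the only parameter which needs to be estimated for the conditional distribution $X^E_{i1} \mid x_{i1}$.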
Since we are assuming that $\tau$ is known, this distribution is fully characterized.

Again, following the results given by Dwyer [23] and beginning with equation C.1.3, the MCEM maximum-likelihood type estimate of the mean for $X_1$ is
\[ \hat{\mu}_1 = \frac{1}{n} \sum_{i=1}^{n} \Big[ \Big(\prod_j r_{ij}\Big) x_{i1} + \Big(1 - \prod_j r_{ij}\Big) \bar{x}_{i1\cdot} \Big] \]
where $\prod_j r_{ij} = 0$ for all $i$, so $\hat{\mu}_1 = \bar{x}_{\cdot 1 \cdot}$, where $\bar{x}_{\cdot 1 \cdot} = \frac{1}{mn} \sum_{i=1}^{n} \sum_{k=1}^{m} x_{i1k}$ is the grand average over the $mn$ simulated subjects.

Turning to the estimation of the variance, we begin with equation C.1.4,
\[ \hat{\sigma}^2_1 = \frac{1}{n} \sum_{i=1}^{n} \Big[ \Big(\prod_j r_{ij}\Big) (x_{i1} - \hat{\mu}_1)^2 + \Big(1 - \prod_j r_{ij}\Big) \frac{m-1}{m} s^2_{i1} \Big] \]
where $\prod_j r_{ij} = 0$ for all $i$ and we are only estimating $\sigma^2_1$, so
\[ \hat{\sigma}^2_1 = \frac{1}{n} \sum_{i=1}^{n} \frac{m-1}{m} s^2_{i1}, \quad \text{where} \quad s^2_{i1} = \frac{1}{m-1} \sum_{k=1}^{m} (x_{i1k} - \bar{x}_{i1\cdot})^2, \]
with the result for large $m$ holding (Equation C.1.5).

4.5.2.3 Louis standard error

The computation of the Louis standard errors is essentially the same as in the previous simulation study. From the general presentation of the Louis standard error, we have the relationship given in equation 3.5.3, with the Monte Carlo approximation based on the current Gibbs sample given by equation 3.5.5. Since the likelihood model components are uniquely parameterized, the resulting information matrix will be block diagonal, with each block corresponding to a uniquely parameterized component of the likelihood. With this structure we can use the result that the inverse of a block diagonal matrix is a block diagonal matrix of the block inverses; thus we need not compute the inverse of the entire information matrix in order to obtain the covariance matrix for the parameters of interest.

Since we are primarily interested in $\beta$, the parameter for the response model, we need only compute the block that corresponds to the response model to obtain the corresponding information and then take its inverse for the variance estimate.
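The block diagonal result used here can be checked numerically. The sketch below is illustrative only: the two information blocks are hypothetical numbers, not values from the simulation study, and `block_diag` is a small helper written for this example.

```python
import numpy as np


def block_diag(*blocks):
    """Assemble square blocks into one block diagonal matrix."""
    size = sum(b.shape[0] for b in blocks)
    out = np.zeros((size, size))
    start = 0
    for b in blocks:
        stop = start + b.shape[0]
        out[start:stop, start:stop] = b
        start = stop
    return out


# Hypothetical symmetric positive definite information blocks, e.g. one for
# the response model parameters and one for another likelihood component.
info_response = np.array([[4.0, 1.0],
                          [1.0, 3.0]])
info_other = np.array([[2.0, 0.5, 0.0],
                       [0.5, 2.0, 0.3],
                       [0.0, 0.3, 1.0]])

info_full = block_diag(info_response, info_other)
cov_full = np.linalg.inv(info_full)          # inverse of the whole matrix
cov_response = np.linalg.inv(info_response)  # inverse of the block of interest

# The response block of the full inverse equals the inverse of the response block.
print(np.allclose(cov_full[:2, :2], cov_response))
print(np.sqrt(np.diag(cov_response)))  # standard errors for the response block
```

Inverting only the block of interest gives the same standard errors as inverting the full matrix, which is why the response model block can be treated on its own.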
Using equation 3.5.6 and the score and Hessian for the response model as given in appendix B.2.2, we are able to obtain the MCEM Louis standard error for the $s$th simulation estimate of $\beta$.

4.5.3 Results and discussions

With two completed simulations, each of which considers different manifestations of imperfection in a data set, we have compelling evidence that it is computationally feasible to implement a solution to the problem of having both missing data and measurement error in the same data set. The caveat seems to be that we may be limited to small samples.

4.5.3.1 Comparing the effect of the missing data mechanism assumption

Three aspects of this simulation experiment will be considered: comparisons between the naive complete-case approach and the MCEM method within the matched scenarios, a comparison of correctly specified and misspecified missing data models, and a comparison between MAR and NMAR models when they are correctly specified.

We begin with table 4.23. With the model correctly specified and having a MAR missing data mechanism, we can see that the MCEM approach produces a dramatic reduction in the bias of $\beta_1$ and there is no statistical evidence against the null hypothesis of equality with zero. This is not surprising as this would be our best guess based on the missing data literature. An interesting feature is that $\beta_0$, $\beta_2$, and $\beta_3$ all retain biases which are significantly different from zero, while $\beta_4$ remains statistically non-significant. Furthermore, the MCEM approach does not appear to have made any noticeable adjustment for the $\beta_j$, $j = 0, 2, 3, 4$. The MSE for $\beta_1$ is much larger, with the others being much the same between the two methods. The efficiency for $\beta_1$ favours the complete-case method, but for all others the difference may not be significant. For all parameters, the lengths of the confidence intervals are similar between approaches (Table 4.25). The coverage is as expected for all parameters except $\beta_1$ and $\beta_3$. The MCEM adjustment corrects the coverage problem for $\beta_1$ while $\beta_3$ remains uncorrected.
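The summary measures discussed in this section can be sketched in Python. The measures are defined earlier in the thesis, so the forms below are assumptions that merely appear consistent with the tabulated values up to rounding: the bias z-score is taken as the bias divided by the Monte Carlo standard error of the mean estimate, and the coverage p-value as a two-sided Wald test of the observed coverage against the nominal 0.95 using the observed proportion in the standard error. The function names `summarize` and `coverage_p_value` are hypothetical.

```python
import math

import numpy as np


def summarize(estimates, truth):
    """Bias, bias z-score, and MSE of simulation estimates of one parameter."""
    est = np.asarray(estimates, dtype=float)
    n_sim = est.size
    mean = float(est.mean())
    sd = float(est.std(ddof=1))
    bias = mean - truth
    z_bias = bias / (sd / math.sqrt(n_sim))  # assumed form of the z-score
    mse = float(np.mean((est - truth) ** 2))
    return {"mean": mean, "sd": sd, "bias": bias, "z_bias": z_bias, "mse": mse}


def coverage_p_value(coverage, n_sim, nominal=0.95):
    """Two-sided Wald p-value for observed coverage versus the nominal rate."""
    if coverage <= 0.0 or coverage >= 1.0:
        return 0.0  # degenerate standard error; treated as strong evidence
    se = math.sqrt(coverage * (1.0 - coverage) / n_sim)
    z = (coverage - nominal) / se
    return math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))


# Toy estimates of a parameter whose true value is 0.41 (illustrative only).
print(summarize([0.3, 0.5, 0.4, 0.6], truth=0.41))
# Observed coverages of 0.945 and 0.975 over 200 simulations.
print(round(coverage_p_value(0.945, 200), 3), round(coverage_p_value(0.975, 200), 3))
```

The efficiency column in the tables appears to be the ratio of the MCEM MSE to the naive MSE (for example, 0.102/0.060 = 1.70 for $\beta_1$ in case 1), so values above one favour the complete-case method.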
Finally, with table 4.27 we see that there is a good adjustment for $\beta_1$ which looks very similar to that observed for the non-robust measures. Additionally, we see that there may be a slight trade-off in the correction; the difference in $\beta_0$ is sacrificed in order to improve that of $\beta_1$. This is similar to that observed in the previous study.

The NMAR-NMAR scenario shows that the bias for $\beta_1$ undergoes a substantial correction under the MCEM model while the other parameters are relatively unchanged (Table 4.24). As expected, there is little change in the MSE for all parameters except that of $\beta_1$, which has increased. The efficiency is similar to that of the other scenarios. There is no significant difference in the lengths of the confidence intervals, but we do note longer MCEM intervals (Table 4.26). For $\beta_1$, the coverage improves and attains the nominal rate. There is no change in coverage for the other parameters. With $\delta$, there is a substantial improvement in the difference for $\beta_1$ with little change for the other parameters (Table 4.27).

A very real challenge with missing data is the correct specification of the missing data model. It is overly naive to assume that we would always get this correct, thus it is of interest to consider how misspecification changes the adjusted estimates. For these two cases we will focus on the MCEM estimates. In table 4.23 we see, moving from assuming MAR to NMAR, that the bias for $\beta_2$ and $\beta_3$ changes from being significantly different from zero to non-significance. The MSE for each parameter is roughly the same, with a decrease for $\beta_1$, but this may not be a significant depreciation of the MSE. We see a similar result for the efficiency. The lengths are very similar, but we see a worsening of the coverage for $\beta_1$ and an improvement for $\beta_3$ (Table 4.25).
Finally, we see that $\delta_{MCEM}$ worsens for $\beta_0$ and $\beta_1$, improves for $\beta_2$ and $\beta_3$, and stays the same for $\beta_4$.

When the data is generated by a NMAR mechanism but assumed to be MAR, we see that misspecification causes the bias to be reduced for $\beta_0$ and $\beta_1$ while increasing it for the other parameters (Table 4.24). The MSE for the parameters increases for all except $\beta_0$, for which it decreases. Furthermore, $\beta_0$ and $\beta_2$ exhibit significant differences in their associated MSEs across the two cases. There is a general decrease in efficiency except for $\beta_3$; $\beta_0$ exhibits a significant difference in efficiency. There is no significant difference in the lengths (Table 4.26). The coverage worsens for $\beta_0$, improves for $\beta_1$ and $\beta_2$, and shows no real change for $\beta_3$ and $\beta_4$. For $\delta_{MCEM}$ we see that misspecification reduces $\delta$ for $\beta_0$ and $\beta_1$ while increasing it for the others.

Now we turn our attention to some comparisons between MAR and NMAR when the models are correctly specified, that is, we are comparing cases 1 and 4. The MCEM approach appears to do a better job with bias reduction for NMAR data compared to MAR when considering the entire parameter vector. The caveat here is that the MCEM approach does a better adjustment for $\beta_1$ if the missing data mechanism is MAR rather than NMAR, but performs rather poorly for the other parameters. Comparing the MSE across the two cases, we see little difference, with perhaps a difference for $\beta_1$ and $\beta_2$. The nominal coverage is reached for $\beta_0$ and $\beta_4$ for both mechanisms. For $\beta_1$ the coverage rate drops, but still attains the nominal rate. For $\beta_2$, there is a loss in the attainment of the nominal rate whereas $\beta_3$ attains it. We see no differences in the lengths of the confidence intervals. We see that the magnitude of $\delta_{MCEM}$ increases for $\beta_1$ as we move from the MAR to the NMAR mechanism, but all others decrease.

Discussion. As before, we see subtle trade-offs occurring.
Frequently these trade-offs involve a worsening of the bias for the parameters associated with perfect covariates and the intercept, with the latter often suffering the most. In most cases, we see stability in the MSE, with only the MSE for $\beta_1$ having a substantial increase. There is little change in efficiency across the four cases. The lengths of the confidence intervals for the MCEM approach are similar across all cases. The nominal coverage rate is only reached in case 1 and case 3. A general observation is that MCEM adjustment does improve the coverage rate, but for cases 1, 2, and 4 there exists at least one parameter for which the nominal rate is not achieved. Finally, we see that $\delta$ improves with adjustment, but this often comes at the expense of a degradation of the others, with a worsening for $\beta_0$ being the most obvious.

The effect of misspecification appears to depend on the type. Considering only the MCEM results, we see that over-modelling results in a general decrease of parameter biases whereas under-modelling tends to increase the biases. The inference about the biases changes only when going from case 1 to case 2. Here the over-modelled results appear to have no significant bias. Over-modelling appears to affect negatively the coverage associated with $\beta_1$, but has a positive effect on the coverage for $\beta_3$. Under-modelling brings an improvement to the coverage associated with $\beta_1$ and $\beta_2$, but negatively affects the intercept. From the point of view of bias and $\delta_{MCEM}$ it is better to over-model, but in terms of coverage it would be better to under-model.

Stepping back to look at the overall trend across the mechanisms reveals no clear or obvious patterns. What is consistent is that the MCEM approach does decrease the magnitude of the bias for the parameter estimate associated with the imperfect covariate, $\beta_1$. The MSE for this parameter is larger for the MCEM approach. The MCEM approach produces longer confidence intervals but this does not translate to the attainment of the nominal coverage rate for all cases.
As with the bias, $\delta$ improves with the application of the MCEM method on the imperfect data, but there is a general cost on $\delta$ for the other parameters which displays no clear pattern.

Table 4.23: Comparing the effect of the missing data mechanism when the missing data was generated with a MAR mechanism on the point estimate, bias, and mean squared error when a covariate has both missing data and mismeasured data. The mechanism that generates the missing data and the missing data mechanism used to model the data are matched and mismatched. Mechanism A was used to generate the missing data for $x_1$.

(a) Case 1: MAR-MAR, Simulation Size = 200
            Naive                                               MCEM
Parameter   Estimate (SD)    Bias (z_bias)     MSE (SD)         Estimate (SD)    Bias (z_bias)     MSE (SD)        Efficiency (SD)
$\beta_0$   -0.722 (0.230)   -0.032 (-1.992)   0.054 (0.005)    -0.738 (0.241)   -0.048 (-2.809)   0.060 (0.006)   1.12 (0.03)
$\beta_1$    0.291 (0.214)   -0.119 (-7.833)   0.060 (0.006)     0.418 (0.319)    0.008 (0.375)    0.102 (0.011)   1.70 (0.14)
$\beta_2$   -0.260 (0.237)   -0.040 (-2.417)   0.058 (0.006)    -0.266 (0.242)   -0.046 (-2.662)   0.061 (0.007)   1.05 (0.01)
$\beta_3$    0.133 (0.231)    0.033 (2.045)    0.055 (0.005)     0.136 (0.236)    0.036 (2.136)    0.057 (0.005)   1.04 (0.01)
$\beta_4$   -0.121 (0.253)   -0.011 (-0.629)   0.064 (0.006)    -0.125 (0.259)   -0.015 (-0.794)   0.067 (0.007)   1.05 (0.01)

(b) Case 2: MAR-NMAR, Simulation Size = 200
            Naive                                               MCEM
Parameter   Estimate (SD)    Bias (z_bias)     MSE (SD)         Estimate (SD)    Bias (z_bias)     MSE (SD)        Efficiency (SD)
$\beta_0$   -0.713 (0.242)   -0.023 (-1.361)   0.059 (0.006)    -0.728 (0.250)   -0.038 (-2.128)   0.064 (0.007)   1.08 (0.02)
$\beta_1$    0.280 (0.196)   -0.130 (-9.373)   0.055 (0.005)     0.405 (0.293)   -0.005 (-0.242)   0.086 (0.009)   1.55 (0.13)
$\beta_2$   -0.241 (0.237)   -0.021 (-1.227)   0.057 (0.007)    -0.245 (0.242)   -0.025 (-1.455)   0.059 (0.007)   1.05 (0.01)
$\beta_3$    0.089 (0.239)   -0.011 (-0.638)   0.057 (0.005)     0.091 (0.244)   -0.009 (-0.505)   0.060 (0.005)   1.04 (0.01)
$\beta_4$   -0.134 (0.250)   -0.024 (-1.351)   0.063 (0.008)    -0.136 (0.255)   -0.026 (-1.459)   0.066 (0.008)   1.05 (0.01)

Table 4.24: Comparing the effect of the missing data mechanism when the missing data was generated with a NMAR mechanism on the point estimate, bias, and mean squared error when a covariate has both missing data and mismeasured data. The mechanism that generates the missing data and the missing data mechanism used to model the data are matched and mismatched. Mechanism A was used to generate the missing data for $x_1$.

(a) Case 3: NMAR-MAR, Simulation Size = 200
            Naive                                               MCEM
Parameter   Estimate (SD)    Bias (z_bias)     MSE (SD)         Estimate (SD)    Bias (z_bias)     MSE (SD)        Efficiency (SD)
$\beta_0$   -0.700 (0.202)   -0.010 (-0.708)   0.041 (0.004)    -0.704 (0.207)   -0.014 (-0.938)   0.043 (0.005)   1.06 (0.02)
$\beta_1$    0.280 (0.235)   -0.130 (-7.822)   0.072 (0.007)     0.415 (0.369)    0.005 (0.183)    0.136 (0.019)   1.89 (0.20)
$\beta_2$   -0.242 (0.249)   -0.022 (-1.223)   0.063 (0.007)    -0.246 (0.256)   -0.026 (-1.450)   0.066 (0.007)   1.05 (0.01)
$\beta_3$    0.121 (0.259)    0.021 (1.119)    0.068 (0.008)     0.124 (0.268)    0.024 (1.260)    0.073 (0.009)   1.07 (0.02)
$\beta_4$   -0.133 (0.255)   -0.023 (-1.268)   0.065 (0.008)    -0.136 (0.261)   -0.026 (-1.429)   0.069 (0.008)   1.05 (0.01)

(b) Case 4: NMAR-NMAR, Simulation Size = 200
            Naive                                               MCEM
Parameter   Estimate (SD)    Bias (z_bias)     MSE (SD)         Estimate (SD)    Bias (z_bias)     MSE (SD)        Efficiency (SD)
$\beta_0$   -0.669 (0.228)    0.021 (1.331)    0.053 (0.005)    -0.672 (0.241)    0.018 (1.062)    0.058 (0.005)   1.11 (0.02)
$\beta_1$    0.284 (0.228)   -0.126 (-7.847)   0.053 (0.005)     0.427 (0.365)    0.017 (0.664)    0.134 (0.020)   1.97 (0.22)
$\beta_2$   -0.223 (0.212)   -0.003 (-0.175)   0.045 (0.004)    -0.228 (0.218)   -0.008 (-0.487)   0.048 (0.004)   1.06 (0.02)
$\beta_3$    0.111 (0.244)    0.011 (0.651)    0.060 (0.007)     0.114 (0.249)    0.014 (0.799)    0.062 (0.008)   1.05 (0.01)
$\beta_4$   -0.117 (0.236)   -0.007 (-0.407)   0.056 (0.008)    -0.119 (0.242)   -0.009 (-0.528)   0.059 (0.008)   1.05 (0.01)

Table 4.25: Comparing the effect of the missing data mechanism when the missing data was generated with a MAR mechanism on confidence interval length and coverage where a single covariate suffers from both missing data and measurement error. Both matched and mismatched missing data mechanisms are considered.

(a) Case 1: MAR-MAR, Simulation Size = 200
Parameter   L_Naive (SD)    L_MCEM (SD)     Coverage_Naive (p-value)   Coverage_MCEM (p-value)
$\beta_0$   0.900 (0.055)   0.936 (0.094)   0.945 (0.756)              0.950 (1.000)
$\beta_1$   0.812 (0.086)   1.207 (0.229)   0.880 (0.004)              0.955 (0.732)
$\beta_2$   0.915 (0.087)   0.936 (0.099)   0.955 (0.732)              0.955 (0.732)
$\beta_3$   0.917 (0.086)   0.937 (0.095)   0.975 (0.024)              0.975 (0.024)
$\beta_4$   0.907 (0.086)   0.927 (0.097)   0.930 (0.268)              0.930 (0.268)

(b) Case 2: MAR-NMAR, Simulation Size = 200
Parameter   L_Naive (SD)    L_MCEM (SD)     Coverage_Naive (p-value)   Coverage_MCEM (p-value)
$\beta_0$   0.901 (0.055)   0.934 (0.083)   0.950 (1.000)              0.950 (1.000)
$\beta_1$   0.813 (0.079)   1.204 (0.207)   0.905 (0.028)              0.985 (<0.001)
$\beta_2$   0.914 (0.081)   0.933 (0.090)   0.960 (0.472)              0.960 (0.472)
$\beta_3$   0.917 (0.082)   0.935 (0.089)   0.965 (0.248)              0.965 (0.248)
$\beta_4$   0.914 (0.082)   0.932 (0.092)   0.945 (0.756)              0.945 (0.756)

Table 4.26: Comparing the effect of the missing data mechanism when the missing data was generated with a NMAR mechanism on confidence interval length and coverage where a single covariate suffers from both missing data and measurement error. Both matched and mismatched missing data mechanisms are considered.

(a) Case 3: NMAR-MAR, Simulation Size = 200
Parameter   L_Naive (SD)    L_MCEM (SD)     Coverage_Naive (p-value)   Coverage_MCEM (p-value)
$\beta_0$   0.898 (0.047)   0.929 (0.072)   0.970 (0.096)              0.975 (0.024)
$\beta_1$   0.844 (0.087)   1.294 (0.294)   0.875 (<0.001)             0.955 (0.732)
$\beta_2$   0.917 (0.080)   0.938 (0.095)   0.935 (0.388)              0.935 (0.388)
$\beta_3$   0.915 (0.084)   0.937 (0.107)   0.940 (0.552)              0.940 (0.552)
$\beta_4$   0.915 (0.083)   0.937 (0.097)   0.940 (0.552)              0.940 (0.552)

(b) Case 4: NMAR-NMAR, Simulation Size = 200
Parameter   L_Naive (SD)    L_MCEM (SD)     Coverage_Naive (p-value)   Coverage_MCEM (p-value)
$\beta_0$   0.893 (0.052)   0.928 (0.089)   0.950 (1.000)              0.950 (1.000)
$\beta_1$   0.838 (0.091)   1.289 (0.311)   0.880 (0.004)              0.970 (0.096)
$\beta_2$   0.908 (0.078)   0.931 (0.095)   0.990 (<0.001)             0.990 (<0.001)
$\beta_3$   0.903 (0.086)   0.925 (0.101)   0.940 (0.552)              0.940 (0.552)
$\beta_4$   0.905 (0.088)   0.928 (0.103)   0.955 (0.732)              0.955 (0.732)

Table 4.27: Comparing the effect of the missing data mechanism on the median and the difference between the median and the true value ($\delta$) where a single covariate suffers from both missing data and measurement error.

(a) Case 1: MAR-MAR, Simulation Size = 200
Parameter   Median_Naive (MAD)   Median_MCEM (MAD)   $\delta_{Naive}$   $\delta_{MCEM}$
$\beta_0$   -0.698 (0.240)       -0.729 (0.266)      -0.008             -0.039
$\beta_1$    0.296 (0.228)        0.419 (0.327)      -0.114              0.009
$\beta_2$   -0.241 (0.234)       -0.246 (0.241)      -0.021             -0.026
$\beta_3$    0.148 (0.216)        0.149 (0.217)       0.048              0.049
$\beta_4$   -0.087 (0.248)       -0.087 (0.253)       0.023              0.023

(b) Case 2: MAR-NMAR, Simulation Size = 200
Parameter   Median_Naive (MAD)   Median_MCEM (MAD)   $\delta_{Naive}$   $\delta_{MCEM}$
$\beta_0$   -0.727 (0.231)       -0.745 (0.250)      -0.037             -0.055
$\beta_1$    0.273 (0.207)        0.391 (0.288)      -0.137             -0.019
$\beta_2$   -0.232 (0.236)       -0.239 (0.239)      -0.012             -0.019
$\beta_3$    0.093 (0.252)        0.094 (0.257)      -0.007             -0.006
$\beta_4$   -0.133 (0.250)       -0.133 (0.255)      -0.023             -0.023

(c) Case 3: NMAR-MAR, Simulation Size = 200
Parameter   Median_Naive (MAD)   Median_MCEM (MAD)   $\delta_{Naive}$   $\delta_{MCEM}$
$\beta_0$   -0.693 (0.185)       -0.703 (0.183)      -0.003             -0.013
$\beta_1$    0.278 (0.231)        0.395 (0.346)      -0.132             -0.015
$\beta_2$   -0.264 (0.223)       -0.266 (0.224)      -0.044             -0.046
$\beta_3$    0.124 (0.245)        0.125 (0.250)       0.024              0.025
$\beta_4$   -0.139 (0.21)        -0.143 (0.215)      -0.029             -0.033

(d) Case 4: NMAR-NMAR, Simulation Size = 200
Parameter   Median_Naive (MAD)   Median_MCEM (MAD)   $\delta_{Naive}$   $\delta_{MCEM}$
$\beta_0$   -0.664 (0.232)       -0.666 (0.247)       0.026              0.024
$\beta_1$    0.265 (0.212)        0.389 (0.324)      -0.145             -0.021
$\beta_2$   -0.227 (0.204)       -0.227 (0.209)      -0.007             -0.007
$\beta_3$    0.118 (0.222)        0.122 (0.223)       0.018              0.022
$\beta_4$   -0.116 (0.229)       -0.116 (0.235)      -0.006             -0.006

4.5.3.2 Comparing missing data mechanisms

Three missing data mechanisms will be considered (Table 4.22) and we will compare the effect of the different mechanisms on the summary measures for the situation where the generating missing data mechanism is correctly specified, for both the MAR and the NMAR cases. For the MAR naive complete-case analysis, little difference in the estimated standard deviation is exhibited across the three missing data mechanisms. For the parameter of primary interest, $\beta_1$, the bias across all three mechanisms is comparable. The large z-scores suggest significant biases for $\beta_1$, and for $\beta_2$ and $\beta_3$ with mechanism A. For $\beta_0$ and $\beta_1$, mechanism C, which is MCAR, produces the worst biases. We can think of the mechanisms as providing a progression from MAR with all the covariates, except $x_1$, participating in the model to MCAR for mechanism C. From this perspective we see that the bias associated with the intercept worsens as the mechanism progresses towards MCAR, but is reduced for $\beta_2$ and $\beta_3$. The MSE is similar in magnitude across mechanisms for each parameter estimate, as are the confidence interval lengths. For coverage, we see the expected significant difference for $\beta_1$, but we also see a difference from the nominal rate with mechanism A for $\beta_3$ and with mechanism C for all parameters except $\beta_1$ and $\beta_2$. When we consider the difference $\delta$, we see that mechanism C tends to do the best job at reducing the differences whereas mechanisms A and B have varied results.
This contrasts with the bias.

When we look at the MCEM summaries, we observe that the estimated standard deviations tend to be larger than those of the naive approach, with the most noticeable difference being the estimated standard deviations for $\beta_1$. The bias is significant for $\beta_0$ across all mechanisms and for $\beta_2$ and $\beta_3$ with mechanism A. As we move from the naive approach to the MCEM method, we see that the bias, in general, tends to increase for all parameter estimates aside from $\beta_1$, which is associated with the imperfect covariate. It is interesting that the z-scores are similar to those of the naive approach. Although the biases have increased, the estimated standard deviations for each parameter have also increased, which has an almost combined null effect on the magnitude of the z-score.

The MSE increases for all parameter estimates across all mechanisms. It seems that the increase in MSE is primarily driven by the increase in the estimated standard deviation. For $\beta_1$ there is a dramatic increase in the estimated standard deviation which almost swamps any gains made by managing the bias. For the other estimates, we have a combined effect of a slightly larger estimate of the standard deviation with a general increase in the bias. The efficiency is greater for the naive approach across all mechanisms and parameter estimates, but outside of $\beta_1$ the two are rather close to one another.

The confidence interval lengths are longer for the MCEM approach and they are similar across missing data mechanisms for each parameter estimate. The difference in length between the naive complete-case method and the MCEM approach may not be statistically significant. We see the usual trend from the naive approach to the MCEM method for the coverage. We can observe that mechanism C, the MCAR mechanism, struggles to achieve the nominal rate, leaving $\beta_0$, $\beta_2$ and $\beta_4$ with coverage rates either too low or too high.
Finally, we see that mechanism C has a general decrease in $\delta$ moving from the complete-case method to the MCEM, whereas the other two mechanisms have a general increase in $\delta$.

Turning our attention to case 4, NMAR-NMAR, we see that the estimates for the complete-case situation are similar in magnitude across the mechanisms and that the estimated standard deviations are large. The bias is not uniform across the three mechanisms, with mechanism C performing worse than the other two. We see the expected bias in $\beta_1$.

The MSE tends to increase as we move from mechanism A to C. The larger bias of mechanism C is the main driver for the difference between the MSE of the parameters associated with mechanisms A and B and those of C. This suggests that the mechanisms may in fact be dissimilar in terms of their MSE, which appears to result from the differences in the biases.

We see little difference across the mechanisms for the confidence intervals. We observe that mechanism C struggles with the attainment of the nominal coverage rate for all parameters except $\beta_4$. Mechanism B is unable to attain the coverage rate for the intercept parameter and mechanism A fails for $\beta_2$. Mechanism C consistently has the largest magnitudes for $\delta_{Naive}$, and we observe that $\delta_{Naive}$ for $\beta_1$ is large across all three mechanisms.

When we adjust for the imperfection using the MCEM approach, we see that the point estimates are similar across mechanisms for each parameter, with large estimates of the standard deviation. The MCEM approach appears to perform well for mechanisms A and B, but struggles to adequately correct the biases of mechanism C. This leads to a noticeable dissimilarity across the mechanisms, which translates to some dissimilarities in the MSE of each parameter across the mechanisms. For all parameter estimates associated with mechanism C, except for the estimate of $\beta_3$, the MSE is noticeably larger than for the other two mechanisms, and the difference is most likely statistically significant.
We see, as a general trend, that as the perfect covariates are removed from the missing data mechanism, the MSE tends to increase.

We see no substantial differences for the parameter estimates across mechanisms for the length of the confidence intervals. When we consider the coverage rates, we have some surprises. For example, $\beta_1$ fails to reach the nominal rate for mechanism C. We also see a lack of attainment for mechanism C with $\beta_0$ and $\beta_3$ and for mechanism A with $\beta_2$. For the difference between the median and the true value, we see the expected correction appear for $\beta_1$. Aside from this, what is noticeable is a distinct similarity between $\delta_{MCEM}$ for mechanisms A and B but a dissimilarity with mechanism C.

Discussion. Under the MAR missing data model, if we are only interested in $\beta_1$ and not the other covariates, that is, they are included in the model for adjustment only, then we see the desired corrections as we move from the naive complete-case approach to the MCEM methodology. If we are interested in all the covariates, then we have some mixed results. We see the familiar trade-off, in that the correction in the bias of the parameter associated with the imperfect covariate appears to increase the bias for other covariates. This happened across all missing data mechanisms for $\beta_0$, $\beta_2$, and $\beta_3$ and for mechanisms A and B with $\beta_4$. Although there were increases, these have a minimal impact on the estimates themselves and would have even less impact when translated to the odds-ratio scale.

Although for most measures there was little to distinguish the three missing data mechanisms, the coverage rates attained do pose some concern. Mechanism C, which is MCAR, struggled to reach the nominal coverage rate for most parameters. This is a serious issue, for it will affect the sensibility of inference for the affected parameters.
If we were to be restricted to the parameter associated with the imperfect covariate, then we would have little concern in this area.

In general, we observe few differences between the three missing data mechanisms under the MAR model. As noted above, there is a difference in coverage for mechanism C compared to mechanisms A and B. We also note a difference in how $\delta$ performs when moving from the complete-case situation to the MCEM approach. Mechanism C shows a general decrease in the magnitude of $\delta$ whereas the other two mechanisms tend to show an increase.

When we turn to the NMAR-NMAR case, we see in general a trend towards a similarity in results for mechanisms A and B, but a dissimilarity with C. This was surprising given the results for the MAR-MAR portion of this experiment. Mechanism C in the NMAR-NMAR situation has the missing data mechanism solely reliant on the unobservable realizations of the imperfect random variable. The biased estimates, as well as the evidence against the similarity of mechanism C with the other two, suggest that perhaps a synergistic effect is occurring, in that the combined imperfection problem within a single covariate results in a dissonance between what is expected when missing data and mismeasurement are considered in isolation and the effect that manifests when the two problems are combined.

Table 4.28: Comparing different missing data mechanisms when the mechanism that generates the data and the mechanism used to model the data are matched (MAR-MAR) for a covariate with both missing and mismeasured data.

                    Naive                                               MCEM
Model  Parameter    Estimate (SD)    Bias (z_bias)     MSE (SD)         Estimate (SD)    Bias (z_bias)     MSE (SD)        Efficiency (SD)
A      $\beta_0$    -0.722 (0.230)   -0.032 (-1.992)   0.054 (0.005)    -0.738 (0.241)   -0.048 (-2.809)   0.060 (0.006)   1.12 (0.03)
B      $\beta_0$    -0.758 (0.239)   -0.068 (-4.031)   0.062 (0.008)    -0.776 (0.255)   -0.086 (-4.808)   0.072 (0.010)   1.17 (0.05)
C      $\beta_0$    -0.793 (0.217)   -0.103 (-6.713)   0.058 (0.006)    -0.815 (0.231)   -0.125 (-7.662)   0.069 (0.006)   1.19 (0.03)
A      $\beta_1$     0.291 (0.214)   -0.119 (-7.833)   0.060 (0.006)     0.418 (0.319)    0.008 (0.375)    0.102 (0.011)   1.70 (0.14)
B      $\beta_1$     0.311 (0.227)   -0.099 (-6.187)   0.061 (0.007)     0.447 (0.354)    0.037 (1.470)    0.127 (0.024)   2.07 (0.25)
C      $\beta_1$     0.282 (0.230)   -0.128 (-7.897)   0.069 (0.006)     0.398 (0.336)   -0.012 (-0.524)   0.113 (0.014)   1.63 (0.13)
A      $\beta_2$    -0.260 (0.237)   -0.040 (-2.417)   0.058 (0.006)    -0.266 (0.242)   -0.046 (-2.662)   0.061 (0.007)   1.05 (0.01)
B      $\beta_2$    -0.239 (0.249)   -0.019 (-1.097)   0.062 (0.006)    -0.245 (0.255)   -0.025 (-1.374)   0.065 (0.006)   1.05 (0.01)
C      $\beta_2$    -0.228 (0.242)   -0.008 (-0.474)   0.058 (0.005)    -0.230 (0.245)   -0.010 (-0.583)   0.060 (0.005)   1.03 (<0.01)
A      $\beta_3$     0.133 (0.231)    0.033 (2.045)    0.055 (0.005)     0.136 (0.236)    0.036 (2.136)    0.057 (0.005)   1.04 (0.01)
B      $\beta_3$     0.116 (0.239)    0.016 (0.959)    0.057 (0.006)     0.120 (0.248)    0.020 (1.123)    0.062 (0.006)   1.08 (0.03)
C      $\beta_3$     0.113 (0.282)    0.013 (0.651)    0.080 (0.006)     0.117 (0.291)    0.017 (0.825)    0.085 (0.007)   1.06 (0.01)
A      $\beta_4$    -0.121 (0.253)   -0.011 (-0.629)   0.064 (0.006)    -0.125 (0.259)   -0.015 (-0.794)   0.067 (0.007)   1.05 (0.01)
B      $\beta_4$    -0.120 (0.248)   -0.010 (-0.568)   0.062 (0.007)    -0.123 (0.254)   -0.013 (-0.713)   0.065 (0.008)   1.05 (0.01)
C      $\beta_4$    -0.097 (0.172)    0.013 (1.079)    0.030 (0.003)    -0.099 (0.176)    0.011 (0.913)    0.031 (0.003)   1.04 (<0.01)

Table 4.29: Confidence interval length and coverage when the mechanism that generates the data and the mechanism used to model the data are matched (MAR-MAR) for a covariate with both missing and mismeasured data.

Model  Parameter    L_Naive (SD)    L_MCEM (SD)     Coverage_Naive (p-value)   Coverage_MCEM (p-value)
A      $\beta_0$    0.900 (0.055)   0.936 (0.094)   0.945 (0.756)              0.950 (1.000)
B      $\beta_0$    0.909 (0.060)   0.951 (0.125)   0.935 (0.396)              0.935 (0.396)
C      $\beta_0$    0.916 (0.058)   0.958 (0.107)   0.900 (0.020)              0.900 (0.020)
A      $\beta_1$    0.812 (0.086)   1.207 (0.229)   0.880 (0.004)              0.955 (0.732)
B      $\beta_1$    0.803 (0.087)   1.194 (0.291)   0.891 (0.008)              0.945 (0.768)
C      $\beta_1$    0.798 (0.076)   1.162 (0.199)   0.900 (0.020)              0.950 (1.000)
A      $\beta_2$    0.915 (0.087)   0.936 (0.099)   0.955 (0.732)              0.955 (0.732)
B      $\beta_2$    0.919 (0.078)   0.942 (0.091)   0.945 (0.768)              0.945 (0.768)
C      $\beta_2$    0.925 (0.091)   0.944 (0.095)   1.000 (<0.001)             1.000 (<0.001)
A      $\beta_3$    0.917 (0.086)   0.937 (0.095)   0.975 (0.024)              0.975 (0.024)
B      $\beta_3$    0.923 (0.082)   0.946 (0.101)   0.945 (0.768)              0.949 (0.988)
C      $\beta_3$    0.915 (0.086)   0.936 (0.111)   0.950 (1.000)              0.950 (1.000)
A      $\beta_4$    0.907 (0.086)   0.927 (0.097)   0.930 (0.268)              0.930 (0.268)
B      $\beta_4$    0.913 (0.084)   0.936 (0.098)   0.945 (0.768)              0.945 (0.768)
C      $\beta_4$    0.916 (0.075)   0.933 (0.074)   1.000 (<0.001)             1.000 (<0.001)

Table 4.30: Comparing the effect of the missing data mechanism on the median and the difference between the median and the true value ($\delta$) when the mechanism that generated the missing data and the mechanism assumed by the model are matched (MAR-MAR) for a covariate with both missing and mismeasured data.

Model  Parameter    Median_Naive (MAD)   Median_MCEM (MAD)   $\delta_{Naive}$   $\delta_{MCEM}$
A      $\beta_0$    -0.698 (0.240)       -0.729 (0.266)      -0.008             -0.039
B      $\beta_0$    -0.744 (0.215)       -0.761 (0.238)      -0.054             -0.071
C      $\beta_0$    -0.794 (0.121)       -0.809 (0.136)      -0.104             -0.119
A      $\beta_1$     0.296 (0.228)        0.419 (0.327)      -0.114              0.009
B      $\beta_1$     0.307 (0.184)        0.428 (0.275)      -0.103              0.018
C      $\beta_1$     0.291 (0.204)        0.405 (0.300)      -0.119             -0.005
A      $\beta_2$    -0.241 (0.234)       -0.246 (0.241)      -0.021             -0.026
B      $\beta_2$    -0.24 (0.272)        -0.243 (0.278)      -0.020             -0.023
C      $\beta_2$    -0.193 (0.296)       -0.194 (0.298)       0.027              0.026
A      $\beta_3$     0.148 (0.216)        0.149 (0.217)       0.048              0.049
B      $\beta_3$     0.11 (0.252)         0.115 (0.255)       0.010              0.015
C      $\beta_3$     0.069 (0.276)        0.070 (0.279)      -0.031             -0.030
A      $\beta_4$    -0.087 (0.248)       -0.087 (0.253)       0.023              0.023
B      $\beta_4$    -0.129 (0.235)       -0.131 (0.234)      -0.019             -0.021
C      $\beta_4$    -0.095 (0.168)       -0.096 (0.171)       0.015              0.014

Table 4.31: Comparing different missing data mechanisms when the mechanism that generates the data and the mechanism used to model the data are matched (NMAR-NMAR) for a covariate with both missing and mismeasured data.

                    Naive                                               MCEM
Model  Parameter    Estimate (SD)    Bias (z_bias)     MSE (SD)         Estimate (SD)    Bias (z_bias)     MSE (SD)        Efficiency (SD)
A      $\beta_0$    -0.669 (0.228)    0.021 (1.331)    0.053 (0.005)    -0.672 (0.241)    0.018 (1.062)    0.058 (0.005)   1.11 (0.02)
B      $\beta_0$    -0.671 (0.261)    0.019 (1.049)    0.069 (0.006)    -0.658 (0.275)    0.032 (1.640)    0.077 (0.007)   1.12 (0.03)
C      $\beta_0$    -0.508 (0.256)    0.182 (10.061)   0.098 (0.008)    -0.499 (0.289)    0.191 (9.344)    0.120 (0.010)   1.22 (0.02)
A      $\beta_1$     0.284 (0.228)   -0.126 (-7.847)   0.053 (0.005)     0.427 (0.365)    0.017 (0.664)    0.134 (0.020)   1.97 (0.22)
B      $\beta_1$     0.268 (0.233)   -0.142 (-8.607)   0.074 (0.006)     0.425 (0.380)    0.015 (0.563)    0.144 (0.017)   1.94 (0.20)
C      $\beta_1$     0.325 (0.287)   -0.085 (-4.176)   0.089 (0.006)     0.544 (0.527)    0.134 (3.603)    0.296 (0.039)   3.31 (0.35)
A      $\beta_2$    -0.223 (0.212)   -0.003 (-0.175)   0.045 (0.004)    -0.228 (0.218)   -0.008 (-0.487)   0.048 (0.004)   1.06 (0.02)
B      $\beta_2$    -0.221 (0.238)   -0.001 (-0.039)   0.057 (0.006)    -0.226 (0.245)   -0.006 (-0.354)   0.060 (0.007)   1.06 (0.01)
C      $\beta_2$    -0.134 (0.252)    0.086 (4.847)    0.071 (0.007)    -0.132 (0.267)    0.088 (4.674)    0.079 (0.009)   1.12 (0.02)
A      $\beta_3$     0.111 (0.244)    0.011 (0.651)    0.060 (0.007)     0.114 (0.249)    0.014 (0.799)    0.062 (0.008)   1.05 (0.01)
B      $\beta_3$     0.098 (0.253)   -0.002 (-0.123)   0.064 (0.007)     0.101 (0.259)    0.001 (0.056)    0.067 (0.007)   1.05 (0.01)
C      $\beta_3$     0.102 (0.287)    0.002 (0.085)    0.082 (0.007)     0.106 (0.302)    0.006 (0.294)    0.091 (0.009)   1.10 (0.01)
A      $\beta_4$    -0.117 (0.236)   -0.007 (-0.407)   0.056 (0.008)    -0.119 (0.242)   -0.009 (-0.528)   0.059 (0.008)   1.05 (0.01)
B      $\beta_4$    -0.097 (0.263)    0.013 (0.722)    0.069 (0.008)    -0.098 (0.270)    0.012 (0.633)    0.073 (0.008)   1.05 (0.01)
C      $\beta_4$    -0.065 (0.266)    0.045 (2.399)    0.073 (0.006)    -0.067 (0.276)    0.043 (2.229)    0.078 (0.007)   1.07 (0.01)

Table 4.32: Confidence interval length and coverage when the mechanism that generates the data and the mechanism used to model the data are matched (NMAR-NMAR) for a covariate with both missing and mismeasured data.

Model  Parameter    L_Naive (SD)    L_MCEM (SD)     Coverage_Naive (p-value)   Coverage_MCEM (p-value)
A      $\beta_0$    0.893 (0.052)   0.928 (0.089)   0.950 (1.000)              0.950 (1.000)
B      $\beta_0$    0.904 (0.059)   0.942 (0.081)   0.910 (0.048)              0.920 (0.116)
C      $\beta_0$    0.889 (0.057)   0.963 (0.124)   0.750 (<0.001)             0.750 (<0.001)
A      $\beta_1$    0.838 (0.091)   1.289 (0.311)   0.880 (0.004)              0.970 (0.096)
B      $\beta_1$    0.882 (0.092)   1.391 (0.325)   0.885 (0.004)              0.970 (0.096)
C      $\beta_1$    0.906 (0.121)   1.565 (0.613)   0.850 (<0.001)             1.000 (<0.001)
A      $\beta_2$    0.908 (0.078)   0.931 (0.095)   0.990 (<0.001)             0.990 (<0.001)
B      $\beta_2$    0.912 (0.083)   0.935 (0.095)   0.955 (0.732)              0.955 (0.732)
C      $\beta_2$    0.873 (0.081)   0.912 (0.121)   0.900 (0.020)              0.950 (1.000)
A      $\beta_3$    0.903 (0.086)   0.925 (0.101)   0.940 (0.552)              0.940 (0.552)
B      $\beta_3$    0.923 (0.087)   0.946 (0.101)   0.935 (0.388)              0.935 (0.388)
C      $\beta_3$    0.861 (0.084)   0.901 (0.120)   0.900 (0.020)              0.900 (0.020)
A      $\beta_4$    0.905 (0.088)   0.928 (0.103)   0.955 (0.732)              0.955 (0.732)
B      $\beta_4$    0.906 (0.085)   0.928 (0.095)   0.945 (0.756)              0.945 (0.756)
C      $\beta_4$    0.899 (0.086)   0.938 (0.120)   0.950 (1.000)              0.950 (1.000)

Table 4.33: Comparing the effect of the missing data mechanism on the median and the difference between the median and the true value ($\delta$) when the mechanism that generated the missing data and the mechanism assumed by the model are matched (NMAR-NMAR) for a covariate with both missing and mismeasured data.

Model  Parameter    Median_Naive (MAD)   Median_MCEM (MAD)   $\delta_{Naive}$   $\delta_{MCEM}$
A      $\beta_0$    -0.664 (0.232)       -0.666 (0.247)       0.026              0.024
B      $\beta_0$    -0.639 (0.241)       -0.619 (0.258)       0.051              0.070
C      $\beta_0$    -0.513 (0.186)       -0.479 (0.222)       0.177              0.211
A      $\beta_1$     0.265 (0.212)        0.389 (0.324)      -0.145             -0.021
B      $\beta_1$     0.266 (0.237)        0.390 (0.360)      -0.144             -0.020
C      $\beta_1$     0.284 (0.334)        0.416 (0.495)      -0.126              0.006
A      $\beta_2$    -0.227 (0.204)       -0.227 (0.209)      -0.007             -0.007
B      $\beta_2$    -0.192 (0.223)       -0.193 (0.227)       0.028              0.027
C      $\beta_2$    -0.145 (0.198)       -0.147 (0.199)       0.075              0.073
A      $\beta_3$     0.118 (0.222)        0.122 (0.223)       0.018              0.022
B      $\beta_3$     0.110 (0.245)        0.110 (0.244)       0.010              0.010
C      $\beta_3$     0.163 (0.308)        0.168 (0.314)       0.063              0.068
A      $\beta_4$    -0.116 (0.229)       -0.116 (0.235)      -0.006             -0.006
B      $\beta_4$    -0.103 (0.251)       -0.105 (0.259)       0.007              0.005
C      $\beta_4$    -0.072 (0.308)       -0.073 (0.313)       0.038              0.037

4.5.3.3 Comparing the effect of sample size

For this experiment, we use missing data mechanism A and we are interested in observing how the summary measures change as the sample size increases from n = 50 to n = 250. Although the largest sample size is still modest, it will serve to indicate possible asymptotic behaviours for both the complete-case and MCEM approaches. We will consider case 1, MAR-MAR, and case 4, NMAR-NMAR.

For the complete-case method we see that the estimated standard deviation of the parameter estimates decreases as n increases (Table 4.34). We also see a decrease in bias for $\beta_0$ and $\beta_3$.
For $\beta_2$ we see the bias increase when we go from a sample size of 50 to 100, and then decrease when we move from 100 to 250. For $\beta_4$ we see the reverse. The bias associated with the parameter of interest, $\beta_1$, increases as the sample size increases. Given that the estimated standard deviation is much larger than the bias, the MSE exhibits the same trend as the standard deviation, decreasing as the sample size increases.

The lengths of the confidence intervals decrease as the sample size increases (Table 4.35). When it comes to the coverage rates, we have a mixed story. For $\beta_0$ and $\beta_2$ we attain the nominal coverage rate for all sample sizes. For the parameter of interest, we see a worsening of the coverage and an inability to attain the nominal rate, but this is not surprising. For $\beta_3$ we see that the coverage rate improves with increasing sample size and moves from missing the nominal rate to achieving it. The reverse is true for $\beta_4$, where the coverage rate degrades with increasing sample size but maintains the nominal rate. We see the same trend for $\beta_1$, but it ceases to reach the nominal rate at n = 100. For $\delta$ we see that the values for both $\beta_0$ and $\beta_3$ decrease in magnitude as the sample size increases, but we see the opposite for $\beta_1$ (Table 4.36).

Turning now to the MCEM approach, we see some familiar trends. The estimate of the standard deviation decreases for all parameters as the sample size increases (Table 4.34). For $\beta_0$ and $\beta_3$, we see that the bias decreases with increasing n, but we see mixed results for the remaining parameters. Both $\beta_1$ and $\beta_4$ exhibit a decrease in bias as we move from n = 50 to n = 100 and then an increase as we move to n = 250. The reverse is true for the bias associated with the estimate of $\beta_2$. Although there are convoluted trends for $\beta_1$, $\beta_2$ and $\beta_4$, we do see some clarity in terms of possible statistical significance. For $\beta_1$, the associated estimate of the bias moves from being significantly different from zero to having no evidence for a difference as the sample size increases.
This does not hold for $\beta_2$, which moves from no evidence of a difference, to a significant difference, and back to no evidence of a difference. In the opposite fashion to the estimated bias for $\beta_1$, we see the bias associated with $\beta_4$ go from having no evidence of a difference to having a significant difference from the null value of zero. As with the complete-case method, we see that the MSE and $\hat{e}$ decrease with an increase in the sample size.

The lengths of the confidence intervals exhibit the same trend as with the complete-case method, with the additional observation that the mean lengths of the MCEM confidence intervals constructed using the Louis standard errors appear to approach those of the complete-case method as the sample size increases (Table 4.35). As with the naive complete-case approach, we have a mixed story for the coverage rates. We see that both $\beta_0$ and $\beta_2$ achieve the nominal coverage for all sample sizes. For $\beta_1$, the rate is too high, but it achieves the nominal rate when n = 250. Both $\beta_3$ and $\beta_4$ have coverage rates that decrease with increasing n, and in both cases they achieve the nominal coverage rate as the sample size increases. Finally, we see that $\delta_{MCEM}$ improves for $\beta_0$ and $\beta_3$ as the sample size increases (Table 4.36). We also see that $\delta_{MCEM}$ for $\beta_0$ becomes similar to the complete-case value, while $\delta_{MCEM}$ for $\beta_3$ is similar to it for all sample sizes. With only three measurement points, there is no clear pattern for the estimate of $\delta_{MCEM}$ associated with the remaining parameter estimates, but it is clear that MCEM does a good job of reducing the difference associated with the estimate of $\beta_1$.

When the model generating the missing information and the model assumed are matched and NMAR (case 4), we see similar results as before. For both the complete-case and the MCEM approaches we see that the estimate of the standard deviation, the MSE, and the length of the estimated confidence interval for the parameter estimate decrease as the sample size increases (Table 4.37).

The bias presents a mixed story.
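The bias, $Z_{bias}$, MSE, and $\hat{e}$ columns referred to throughout this section can be computed from the simulation replicates of a single parameter estimate. The following is a minimal sketch, not the thesis code. It assumes that $Z_{bias}$ is the bias divided by its Monte Carlo standard error (SD divided by the square root of the number of simulations) and that $\hat{e}$ is the ratio of the MCEM MSE to the naive MSE. Both definitions are inferred from the tabulated values rather than quoted from the text, and the function names are hypothetical.

```python
import math
import statistics


def summarize(estimates, true_value):
    """Summarize simulation replicates of one parameter estimate.

    Assumed definitions (inferred from the tables, not quoted from the thesis):
    bias = mean - true value, z_bias = bias / (SD / sqrt(n_sim)),
    mse = average squared error about the true value.
    """
    n_sim = len(estimates)
    mean = statistics.fmean(estimates)
    sd = statistics.stdev(estimates)  # sample SD across replicates
    bias = mean - true_value
    z_bias = bias / (sd / math.sqrt(n_sim))
    mse = statistics.fmean([(b - true_value) ** 2 for b in estimates])
    return {"mean": mean, "sd": sd, "bias": bias, "z_bias": z_bias, "mse": mse}


def relative_efficiency(mse_mcem, mse_naive):
    """Assumed form of the efficiency measure: MSE(MCEM) / MSE(naive)."""
    return mse_mcem / mse_naive
```

Under these assumptions a large $|Z_{bias}|$ indicates that the average estimate differs from the true value by more than Monte Carlo noise would explain, and an efficiency above 1 indicates that the adjusted estimator has the larger MSE.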
The intercept and $\beta_4$ go from having a large z-score when n = 50, suggesting a statistically significant bias, to a much lower z-score when n = 100, which indicates a lack of evidence that the bias differs from the null value of zero. When moving to n = 250, we see that the z-score is much larger and once again suggests a significant difference. In the naive situation, the bias associated with the estimate of $\beta_1$ increases as n increases, but for the MCEM method there is no clear trend, other than a lack of evidence for a significant difference at all sample sizes. For $\beta_2$ we see small z-scores for all sample sizes, even though the bias travels in different directions for the naive and MCEM methods as the sample size increases. Finally, we see that the estimated bias for the estimate of $\beta_3$ decreases as n increases.

Coverage in this situation is a rather complicated issue. There are no clear or general trends over both scenarios in this experiment. The trends for the coverage rates associated with the estimates of $\beta_2$, $\beta_3$, and $\beta_4$ are identical for both the naive complete-case and MCEM methods (Table 4.38). For $\beta_2$, we see the coverage oscillate from attaining the nominal rate at n = 50, to being different from it at n = 100, to attaining the rate once again when n = 250. We see with $\beta_3$ that the coverage is rather stable and all values suggest the attainment of the nominal rate for all sample sizes. Turning to $\beta_4$, it is observed that the coverage rate declines for both approaches but achieves the nominal rate for all sample sizes. For the naive approach, we see that the coverage associated with $\beta_0$ is stable and achieves the nominal rate for all sample sizes, but for the MCEM approach the coverage goes from being too high at n = 50 to attaining the nominal rate at n = 100 and n = 250. Finally, we see that the coverage rate for the parameter of interest in the naive situation declines as the sample size increases and never reaches the nominal rate.
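Whether a coverage rate "attains the nominal rate" is judged from the p-value reported beside it. The exact test is not stated in this section. The sketch below assumes a two-sided Wald-type test of the empirical coverage against the nominal 0.95, using the estimated binomial standard error. This reproduces several tabulated values closely (for example, a coverage of 0.945 over 200 simulations gives a p-value of about 0.756), but it remains an assumption, and the function name is hypothetical.

```python
import math


def coverage_p_value(coverage, n_sim, nominal=0.95):
    """Two-sided Wald-type p-value for H0: true coverage == nominal.

    Assumption: the standard error uses the observed coverage,
    sqrt(c * (1 - c) / n_sim). A coverage of exactly 0 or 1 has zero
    estimated standard error, so the p-value is returned as 0.0.
    """
    se = math.sqrt(coverage * (1.0 - coverage) / n_sim)
    if se == 0.0:
        return 0.0
    z = (coverage - nominal) / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))
```

A small p-value indicates that the empirical coverage is either too low or too high relative to 0.95, which is why rates such as 0.975 can also be flagged as missing the nominal rate.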
For the MCEM method, we see that the nominal rate is achieved for $\beta_1$ except at n = 100, where it is missed. For the differences, identical trends are observed with the complete-case and MCEM methods for all the parameters except $\beta_2$. Here we see opposing trends as we move from the naive to the MCEM approach; there is no clear overall trend for the differences (Table 4.39).

Discussion

Considering first case 1, we observe that as the sample size increases there is a general improvement in the bias associated with all the parameters except $\beta_1$. The parameter of interest, $\beta_1$, appears to become more biased as the sample size increases. We see the expected shrinkage of the estimated standard deviation and MSE, and a shortening of the confidence intervals, as the sample size increases. The coverage rate associated with $\beta_1$ degrades as the sample size increases. There appears to be a general improvement in the attainment of the nominal coverage rate for the other parameters, but this is not a uniform result.

For the MCEM approach, we do not see such a clear improvement in the bias as with the naive approach.
The bias associated with $\beta_1$ shows little evidence against the null of no bias as the sample size increases, which is desirable, but the results for the remaining estimates are a bit of a quagmire. The reduction in the estimated standard deviation, MSE, and $\hat{e}$, and the expected shortening of confidence interval lengths, are observed. There appears to be a general improvement in the attainment of the nominal coverage rate over all the estimates.

When the missing data mechanism is NMAR for the generating model, we see the expected reduction in the estimated standard deviation, MSE, and confidence interval length for the complete-case approach. We see the same trend of a general bias reduction for all parameters except $\beta_1$ as the sample size increases, and the reverse for $\beta_1$. The significance of the bias with respect to the z-score shows no clear pattern over all the parameter estimates, except for that already noted with $\beta_1$. For the coverage, the general trend is to reach the nominal rate except in the case of $\beta_1$.

The MCEM approach exhibits the expected trends for the estimated standard deviation, MSE, $\hat{e}$, and confidence interval length. What appears to be a general trend in the z-scores towards a lack of evidence against the null of no bias is frustrated by the bias showing a significant difference for $\beta_0$ and $\beta_4$. This is a noticeable change in direction when moving from n = 100 to n = 250. The coverage performs much better, with a general trend towards the attainment of the nominal rate as the sample size increases.

Over both missing data mechanisms and both methodologies we see the desired improvements in the estimate of the standard deviation, MSE, $\hat{e}$, and confidence interval length as the sample size increases. In terms of bias and coverage the results are not as transparent. In general, there is an improvement in the bias of the estimates under the MCEM approach, with $\beta_1$ exhibiting the most consistent trend. A peculiar result was a degradation of the bias associated with $\beta_1$ for the complete-case analysis.
This may be a manifestation of the combined effect of the imperfections on the estimate. The coverage is another murky set of results, with little transparency other than an overall improvement in the coverage for the MCEM methodology when the missing data mechanism is NMAR.

Table 4.34: Comparing the effect of sample size on point estimation, bias, mean squared error, and relative efficiency for case 1 (MAR-MAR) when a covariate has both missing data and mismeasurement. Missing data mechanism A was used for this comparison.

(a) Case 1: MAR-MAR, Sample Size = 50, Simulation Size = 200

| $\beta_j$ | Naive $E(\hat{\beta}_j)$ (SD) | Naive Bias ($Z_{bias}$) | Naive MSE (SD) | MCEM $E(\hat{\beta}_j)$ (SD) | MCEM Bias ($Z_{bias}$) | MCEM MSE (SD) | $\hat{e}$ (SD) |
|---|---|---|---|---|---|---|---|
| $\beta_0$ | -0.817 (0.380) | -0.127 (-4.724) | 0.161 (0.017) | -0.850 (0.421) | -0.160 (-5.372) | 0.203 (0.024) | 1.26 (0.05) |
| $\beta_1$ | 0.339 (0.355) | -0.071 (-2.818) | 0.131 (0.012) | 0.524 (0.589) | 0.114 (2.737) | 0.359 (0.052) | 2.74 (0.26) |
| $\beta_2$ | -0.236 (0.385) | -0.016 (-0.591) | 0.149 (0.015) | -0.246 (0.402) | -0.026 (-0.929) | 0.163 (0.017) | 1.09 (0.02) |
| $\beta_3$ | 0.177 (0.360) | 0.077 (3.042) | 0.135 (0.014) | 0.183 (0.379) | 0.083 (3.082) | 0.150 (0.016) | 1.11 (0.02) |
| $\beta_4$ | -0.132 (0.393) | -0.022 (-0.810) | 0.155 (0.020) | -0.140 (0.413) | -0.030 (-1.018) | 0.171 (0.023) | 1.11 (0.04) |

(b) Case 1: MAR-MAR, Sample Size = 100, Simulation Size = 200

| $\beta_j$ | Naive $E(\hat{\beta}_j)$ (SD) | Naive Bias ($Z_{bias}$) | Naive MSE (SD) | MCEM $E(\hat{\beta}_j)$ (SD) | MCEM Bias ($Z_{bias}$) | MCEM MSE (SD) | $\hat{e}$ (SD) |
|---|---|---|---|---|---|---|---|
| $\beta_0$ | -0.722 (0.230) | -0.032 (-1.992) | 0.054 (0.005) | -0.738 (0.241) | -0.048 (-2.809) | 0.060 (0.006) | 1.12 (0.03) |
| $\beta_1$ | 0.291 (0.214) | -0.119 (-7.833) | 0.060 (0.006) | 0.418 (0.319) | 0.008 (0.375) | 0.102 (0.011) | 1.70 (0.14) |
| $\beta_2$ | -0.260 (0.237) | -0.040 (-2.417) | 0.058 (0.006) | -0.266 (0.242) | -0.046 (-2.662) | 0.061 (0.007) | 1.05 (0.01) |
| $\beta_3$ | 0.133 (0.231) | 0.033 (2.045) | 0.055 (0.005) | 0.136 (0.236) | 0.036 (2.136) | 0.057 (0.005) | 1.04 (0.01) |
| $\beta_4$ | -0.121 (0.253) | -0.011 (-0.629) | 0.064 (0.006) | -0.125 (0.259) | -0.015 (-0.794) | 0.067 (0.007) | 1.05 (0.01) |

(c) Case 1: MAR-MAR, Sample Size = 250, Simulation Size = 200

| $\beta_j$ | Naive $E(\hat{\beta}_j)$ (SD) | Naive Bias ($Z_{bias}$) | Naive MSE (SD) | MCEM $E(\hat{\beta}_j)$ (SD) | MCEM Bias ($Z_{bias}$) | MCEM MSE (SD) | $\hat{e}$ (SD) |
|---|---|---|---|---|---|---|---|
| $\beta_0$ | -0.662 (0.137) | 0.028 (2.886) | 0.019 (0.002) | -0.660 (0.140) | 0.030 (3.025) | 0.020 (0.002) | 1.05 (0.01) |
| $\beta_1$ | 0.270 (0.132) | -0.140 (-14.963) | 0.037 (0.003) | 0.391 (0.197) | -0.019 (-1.392) | 0.039 (0.004) | 1.06 (0.09) |
| $\beta_2$ | -0.213 (0.145) | 0.007 (0.689) | 0.021 (0.003) | -0.216 (0.148) | 0.004 (0.391) | 0.022 (0.003) | 1.03 (0.01) |
| $\beta_3$ | 0.099 (0.151) | -0.001 (-0.124) | 0.023 (0.002) | 0.100 (0.154) | 0.000 (0.008) | 0.024 (0.002) | 1.03 (<0.01) |
| $\beta_4$ | -0.135 (0.151) | -0.025 (-2.387) | 0.023 (0.002) | -0.138 (0.153) | -0.028 (-2.546) | 0.024 (0.002) | 1.04 (<0.01) |

Table 4.35: Comparing the effect of sample size on confidence interval length and coverage for case 1 (MAR-MAR) when a covariate has both missing data and mismeasurement. Missing data mechanism A was used for this comparison.

(a) Case 1: MAR-MAR, Sample Size = 50, Simulation Size = 200

| $\beta_j$ | $L_{Naive}$ (SD) | $L_{MCEM}$ (SD) | Coverage$_{Naive}$ (p-value) | Coverage$_{MCEM}$ (p-value) |
|---|---|---|---|---|
| $\beta_0$ | 1.388 (0.167) | 1.527 (0.401) | 0.940 (0.552) | 0.950 (1.000) |
| $\beta_1$ | 1.277 (0.238) | 2.103 (0.954) | 0.920 (0.116) | 0.975 (0.024) |
| $\beta_2$ | 1.419 (0.228) | 1.486 (0.284) | 0.955 (0.732) | 0.955 (0.732) |
| $\beta_3$ | 1.391 (0.211) | 1.462 (0.305) | 0.975 (0.024) | 0.975 (0.024) |
| $\beta_4$ | 1.406 (0.205) | 1.478 (0.315) | 0.960 (0.472) | 0.970 (0.096) |

(b) Case 1: MAR-MAR, Sample Size = 100, Simulation Size = 200

| $\beta_j$ | $L_{Naive}$ (SD) | $L_{MCEM}$ (SD) | Coverage$_{Naive}$ (p-value) | Coverage$_{MCEM}$ (p-value) |
|---|---|---|---|---|
| $\beta_0$ | 0.901 (0.055) | 0.934 (0.083) | 0.950 (1.000) | 0.950 (1.000) |
| $\beta_1$ | 0.813 (0.079) | 1.204 (0.207) | 0.905 (0.028) | 0.985 (<0.001) |
| $\beta_2$ | 0.914 (0.081) | 0.933 (0.090) | 0.960 (0.472) | 0.960 (0.472) |
| $\beta_3$ | 0.917 (0.082) | 0.935 (0.089) | 0.965 (0.248) | 0.965 (0.248) |
| $\beta_4$ | 0.914 (0.082) | 0.932 (0.092) | 0.945 (0.756) | 0.945 (0.756) |

(c) Case 1: MAR-MAR, Sample Size = 250, Simulation Size = 200

| $\beta_j$ | $L_{Naive}$ (SD) | $L_{MCEM}$ (SD) | Coverage$_{Naive}$ (p-value) | Coverage$_{MCEM}$ (p-value) |
|---|---|---|---|---|
| $\beta_0$ | 0.546 (0.015) | 0.556 (0.020) | 0.945 (0.756) | 0.955 (0.732) |
| $\beta_1$ | 0.506 (0.030) | 0.747 (0.079) | 0.800 (<0.001) | 0.965 (0.248) |
| $\beta_2$ | 0.551 (0.030) | 0.559 (0.032) | 0.950 (1.000) | 0.950 (1.000) |
| $\beta_3$ | 0.551 (0.031) | 0.559 (0.032) | 0.940 (0.552) | 0.940 (0.552) |
| $\beta_4$ | 0.550 (0.032) | 0.558 (0.034) | 0.925 (0.180) | 0.930 (0.268) |

Table 4.36: Comparing the effect of sample size on the median and the difference between the median and true value ($\delta$) for case 1 (MAR-MAR) where a single covariate suffers from both missing data and measurement error. Missing data mechanism A was used for this comparison.

(a) Case 1: MAR-MAR, Sample Size = 50, Simulation Size = 200

| $\beta_j$ | $m_{Naive}(\hat{\beta}_j)$ (MAD) | $m_{MCEM}(\hat{\beta}_j)$ (MAD) | $\delta_{Naive}$ | $\delta_{MCEM}$ |
|---|---|---|---|---|
| $\beta_0$ | -0.778 (0.360) | -0.813 (0.399) | -0.088 | -0.123 |
| $\beta_1$ | 0.314 (0.356) | 0.451 (0.507) | -0.096 | 0.041 |
| $\beta_2$ | -0.215 (0.347) | -0.222 (0.365) | 0.005 | -0.002 |
| $\beta_3$ | 0.160 (0.336) | 0.161 (0.341) | 0.060 | 0.061 |
| $\beta_4$ | -0.094 (0.382) | -0.095 (0.401) | 0.016 | 0.015 |

(b) Case 1: MAR-MAR, Sample Size = 100, Simulation Size = 200

| $\beta_j$ | $m_{Naive}(\hat{\beta}_j)$ (MAD) | $m_{MCEM}(\hat{\beta}_j)$ (MAD) | $\delta_{Naive}$ | $\delta_{MCEM}$ |
|---|---|---|---|---|
| $\beta_0$ | -0.727 (0.231) | -0.745 (0.250) | -0.037 | -0.055 |
| $\beta_1$ | 0.273 (0.207) | 0.391 (0.288) | -0.137 | -0.019 |
| $\beta_2$ | -0.232 (0.236) | -0.239 (0.239) | -0.012 | -0.019 |
| $\beta_3$ | 0.093 (0.252) | 0.094 (0.257) | -0.007 | -0.006 |
| $\beta_4$ | -0.133 (0.250) | -0.133 (0.255) | -0.023 | -0.023 |

(c) Case 1: MAR-MAR, Sample Size = 250, Simulation Size = 200

| $\beta_j$ | $m_{Naive}(\hat{\beta}_j)$ (MAD) | $m_{MCEM}(\hat{\beta}_j)$ (MAD) | $\delta_{Naive}$ | $\delta_{MCEM}$ |
|---|---|---|---|---|
| $\beta_0$ | -0.654 (0.129) | -0.650 (0.134) | 0.036 | 0.040 |
| $\beta_1$ | 0.266 (0.135) | 0.372 (0.190) | -0.144 | -0.038 |
| $\beta_2$ | -0.211 (0.120) | -0.214 (0.122) | 0.009 | 0.006 |
| $\beta_3$ | 0.101 (0.164) | 0.103 (0.166) | 0.001 | 0.003 |
| $\beta_4$ | -0.124 (0.150) | -0.127 (0.152) | -0.014 | -0.017 |

Table 4.37: Comparing the effect of sample size on point estimation, bias, mean squared error, and relative efficiency for case 4 (NMAR-NMAR) when a covariate has both missing data and mismeasurement. Missing data mechanism A was used for this comparison.

(a) Case 4: NMAR-NMAR, Sample Size = 50, Simulation Size = 200

| $\beta_j$ | Naive $E(\hat{\beta}_j)$ (SD) | Naive Bias ($Z_{bias}$) | Naive MSE (SD) | MCEM $E(\hat{\beta}_j)$ (SD) | MCEM Bias ($Z_{bias}$) | MCEM MSE (SD) | $\hat{e}$ (SD) |
|---|---|---|---|---|---|---|---|
| $\beta_0$ | -0.771 (0.391) | -0.081 (-2.938) | 0.159 (0.022) | -0.790 (0.427) | -0.100 (-3.299) | 0.193 (0.029) | 1.21 (0.04) |
| $\beta_1$ | 0.293 (0.370) | -0.117 (-4.476) | 0.151 (0.016) | 0.462 (0.652) | 0.052 (1.133) | 0.428 (0.076) | 2.85 (0.33) |
| $\beta_2$ | -0.218 (0.405) | 0.002 (0.070) | 0.164 (0.021) | -0.229 (0.444) | -0.009 (-0.285) | 0.197 (0.032) | 1.21 (0.06) |
| $\beta_3$ | 0.149 (0.394) | 0.049 (1.773) | 0.158 (0.017) | 0.162 (0.425) | 0.062 (2.076) | 0.184 (0.025) | 1.17 (0.07) |
| $\beta_4$ | -0.188 (0.388) | -0.078 (-2.860) | 0.156 (0.022) | -0.196 (0.405) | -0.086 (-3.001) | 0.171 (0.026) | 1.10 (0.03) |

(b) Case 4: NMAR-NMAR, Sample Size = 100, Simulation Size = 200

| $\beta_j$ | Naive $E(\hat{\beta}_j)$ (SD) | Naive Bias ($Z_{bias}$) | Naive MSE (SD) | MCEM $E(\hat{\beta}_j)$ (SD) | MCEM Bias ($Z_{bias}$) | MCEM MSE (SD) | $\hat{e}$ (SD) |
|---|---|---|---|---|---|---|---|
| $\beta_0$ | -0.669 (0.228) | 0.021 (1.331) | 0.053 (0.005) | -0.672 (0.241) | 0.018 (1.062) | 0.058 (0.005) | 1.11 (0.02) |
| $\beta_1$ | 0.284 (0.228) | -0.126 (-7.847) | 0.053 (0.005) | 0.427 (0.365) | 0.017 (0.664) | 0.134 (0.020) | 1.97 (0.22) |
| $\beta_2$ | -0.223 (0.212) | -0.003 (-0.175) | 0.045 (0.004) | -0.228 (0.218) | -0.008 (-0.487) | 0.048 (0.004) | 1.06 (0.02) |
| $\beta_3$ | 0.111 (0.244) | 0.011 (0.651) | 0.060 (0.007) | 0.114 (0.249) | 0.014 (0.799) | 0.062 (0.008) | 1.05 (0.01) |
| $\beta_4$ | -0.117 (0.236) | -0.007 (-0.407) | 0.056 (0.008) | -0.119 (0.242) | -0.009 (-0.528) | 0.059 (0.008) | 1.05 (0.01) |

(c) Case 4: NMAR-NMAR, Sample Size = 250, Simulation Size = 200

| $\beta_j$ | Naive $E(\hat{\beta}_j)$ (SD) | Naive Bias ($Z_{bias}$) | Naive MSE (SD) | MCEM $E(\hat{\beta}_j)$ (SD) | MCEM Bias ($Z_{bias}$) | MCEM MSE (SD) | $\hat{e}$ (SD) |
|---|---|---|---|---|---|---|---|
| $\beta_0$ | -0.662 (0.137) | 0.028 (2.886) | 0.019 (0.002) | -0.660 (0.140) | 0.030 (3.025) | 0.020 (0.002) | 1.05 (0.01) |
| $\beta_1$ | 0.270 (0.132) | -0.140 (-14.963) | 0.037 (0.003) | 0.391 (0.197) | -0.019 (-1.392) | 0.039 (0.004) | 1.06 (0.09) |
| $\beta_2$ | -0.213 (0.145) | 0.007 (0.689) | 0.021 (0.003) | -0.216 (0.148) | 0.004 (0.391) | 0.022 (0.003) | 1.03 (0.01) |
| $\beta_3$ | 0.099 (0.151) | -0.001 (-0.124) | 0.023 (0.002) | 0.100 (0.154) | 0.000 (0.000) | 0.024 (0.002) | 1.03 (0.00) |
| $\beta_4$ | -0.135 (0.151) | -0.025 (-2.387) | 0.023 (0.002) | -0.138 (0.153) | -0.028 (-2.546) | 0.024 (0.002) | 1.04 (0.00) |

Table 4.38: Comparing the effect of sample size on confidence interval length and coverage for case 4 (NMAR-NMAR) when a covariate has both missing data and mismeasurement. Missing data mechanism A was used for this comparison.

(a) Case 4: NMAR-NMAR, Sample Size = 50, Simulation Size = 200

| $\beta_j$ | $L_{Naive}$ (SD) | $L_{MCEM}$ (SD) | Coverage$_{Naive}$ (p-value) | Coverage$_{MCEM}$ (p-value) |
|---|---|---|---|---|
| $\beta_0$ | 1.385 (0.197) | 1.509 (0.386) | 0.955 (0.732) | 0.975 (0.024) |
| $\beta_1$ | 1.294 (0.242) | 2.178 (1.191) | 0.895 (0.012) | 0.955 (0.732) |
| $\beta_2$ | 1.415 (0.252) | 1.507 (0.457) | 0.950 (1.000) | 0.955 (0.732) |
| $\beta_3$ | 1.404 (0.222) | 1.495 (0.482) | 0.935 (0.388) | 0.945 (0.756) |
| $\beta_4$ | 1.420 (0.242) | 1.491 (0.326) | 0.960 (0.472) | 0.965 (0.248) |

(b) Case 4: NMAR-NMAR, Sample Size = 100, Simulation Size = 200

| $\beta_j$ | $L_{Naive}$ (SD) | $L_{MCEM}$ (SD) | Coverage$_{Naive}$ (p-value) | Coverage$_{MCEM}$ (p-value) |
|---|---|---|---|---|
| $\beta_0$ | 0.893 (0.052) | 0.928 (0.089) | 0.950 (1.000) | 0.950 (1.000) |
| $\beta_1$ | 0.838 (0.091) | 1.289 (0.311) | 0.880 (0.004) | 0.970 (0.048) |
| $\beta_2$ | 0.908 (0.078) | 0.931 (0.095) | 0.990 (<0.001) | 0.990 (<0.001) |
| $\beta_3$ | 0.903 (0.086) | 0.925 (0.101) | 0.940 (0.552) | 0.940 (0.552) |
| $\beta_4$ | 0.905 (0.088) | 0.928 (0.103) | 0.955 (0.732) | 0.955 (0.732) |

(c) Case 4: NMAR-NMAR, Sample Size = 250, Simulation Size = 200

| $\beta_j$ | $L_{Naive}$ (SD) | $L_{MCEM}$ (SD) | Coverage$_{Naive}$ (p-value) | Coverage$_{MCEM}$ (p-value) |
|---|---|---|---|---|
| $\beta_0$ | 0.546 (0.015) | 0.556 (0.020) | 0.945 (0.756) | 0.955 (0.732) |
| $\beta_1$ | 0.506 (0.030) | 0.747 (0.079) | 0.800 (<0.001) | 0.965 (0.248) |
| $\beta_2$ | 0.551 (0.030) | 0.559 (0.032) | 0.950 (1.000) | 0.950 (1.000) |
| $\beta_3$ | 0.551 (0.031) | 0.559 (0.032) | 0.940 (0.552) | 0.940 (0.552) |
| $\beta_4$ | 0.550 (0.032) | 0.558 (0.034) | 0.925 (0.180) | 0.930 (0.268) |

Table 4.39: Comparing the effect of sample size on the median and the difference between the median and true value ($\delta$) for case 4 (NMAR-NMAR) where a single covariate suffers from both missing data and measurement error. Missing data mechanism A was used for this comparison.

(a) Case 4: NMAR-NMAR, Sample Size = 50, Simulation Size = 200

| $\beta_j$ | $m_{Naive}(\hat{\beta}_j)$ (MAD) | $m_{MCEM}(\hat{\beta}_j)$ (MAD) | $\delta_{Naive}$ | $\delta_{MCEM}$ |
|---|---|---|---|---|
| $\beta_0$ | -0.746 (0.385) | -0.758 (0.390) | -0.056 | -0.068 |
| $\beta_1$ | 0.279 (0.337) | 0.403 (0.516) | -0.131 | -0.007 |
| $\beta_2$ | -0.206 (0.365) | -0.215 (0.376) | 0.014 | 0.005 |
| $\beta_3$ | 0.136 (0.370) | 0.136 (0.384) | 0.036 | 0.036 |
| $\beta_4$ | -0.158 (0.311) | -0.165 (0.339) | -0.048 | -0.055 |

(b) Case 4: NMAR-NMAR, Sample Size = 100, Simulation Size = 200

| $\beta_j$ | $m_{Naive}(\hat{\beta}_j)$ (MAD) | $m_{MCEM}(\hat{\beta}_j)$ (MAD) | $\delta_{Naive}$ | $\delta_{MCEM}$ |
|---|---|---|---|---|
| $\beta_0$ | -0.664 (0.232) | -0.666 (0.247) | 0.026 | 0.024 |
| $\beta_1$ | 0.265 (0.212) | 0.389 (0.324) | -0.145 | -0.021 |
| $\beta_2$ | -0.227 (0.204) | -0.227 (0.209) | -0.007 | -0.007 |
| $\beta_3$ | 0.118 (0.222) | 0.122 (0.223) | 0.018 | 0.022 |
| $\beta_4$ | -0.116 (0.229) | -0.116 (0.235) | -0.006 | -0.006 |

(c) Case 4: NMAR-NMAR, Sample Size = 250, Simulation Size = 200

| $\beta_j$ | $m_{Naive}(\hat{\beta}_j)$ (MAD) | $m_{MCEM}(\hat{\beta}_j)$ (MAD) | $\delta_{Naive}$ | $\delta_{MCEM}$ |
|---|---|---|---|---|
| $\beta_0$ | -0.654 (0.129) | -0.650 (0.134) | 0.036 | 0.040 |
| $\beta_1$ | 0.266 (0.135) | 0.372 (0.190) | -0.144 | -0.038 |
| $\beta_2$ | -0.211 (0.120) | -0.214 (0.122) | 0.009 | 0.006 |
| $\beta_3$ | 0.101 (0.164) | 0.103 (0.166) | 0.001 | 0.003 |
| $\beta_4$ | -0.124 (0.150) | -0.127 (0.152) | -0.014 | -0.017 |

4.5.3.4 Comparing the effect of the specification of $\tau$

In this experiment, the true value of $\tau$ was kept at 0.7 for all scenarios. The values of $\tau$ assumed for the model were 0.5, 0.7, and 1.0. Rather than looking at situations where $\tau$ was matched and mismatched for different true values of $\tau$, we wanted to explore the effect of misspecification of $\tau$. This better represents the "real-world" context where the true value of $\tau$ is unknown and a sensitivity analysis is done to see how the parameter estimates change as a function of $\tau$. Furthermore, we will only consider the NMAR case and restrict our observations to the MCEM adjustment.

We observe that the estimate of the standard deviation associated with the estimate of $\beta_1$ increases as $\tau$ increases, but there is much stability in the standard deviation estimates of the other parameter estimates (Table 4.40). For $\beta_1$, the bias is greater, with a z-score suggesting a statistical difference from the null value of zero, when $\tau$ is mismatched. When it is matched, there is little evidence to suggest that the bias associated with $\beta_1$ differs from the null. The biases associated with the other parameter estimates all have small z-scores.

The MSE associated with the parameter estimate of $\beta_1$ increases as $\tau$ increases.
Since this is the same pattern as that of the standard deviations of the point estimates, which are much larger than the biases, this is not unexpected. Although there are varying patterns in the MSE for the other parameters, there may be little evidence suggesting a statistical difference in them.

We observe that the length of the confidence intervals increases as $\tau$ increases (Table 4.41). This too may not be surprising, as the increased noise in the measurement error model should translate to increased variability in the parameter estimates, which would be reflected in the associated confidence intervals. The coverage rate presents the familiar complexity seen in previous experiments. Unfortunately, in this case there is no overarching trend. We observe that $\beta_3$ and $\beta_4$ attain the nominal rate for all levels of $\tau$. For $\beta_2$ we see that the coverage is significantly different from the nominal rate for $\tau = 0.7$. For both $\beta_0$ and $\beta_1$ we observe an increase in the coverage rate as $\tau$ increases; for both parameters the movement is from attaining the nominal rate to being significantly different from it, and this occurs at $\tau = 1.0$. For $\delta$ we see no clear trend, but we do see that when the assumed $\tau$ matches that of the measurement error model, we have the smallest values for $\beta_1$ and $\beta_4$ (Table 4.42).

Discussion

We can observe a couple of trends in this experiment. We see an expected increase in measures of variability as $\tau$ increases, which manifests in the estimate of the standard deviation, MSE, efficiency, and confidence interval length. We observe that the bias is least for the parameter estimate associated with the imperfect covariate when the assumed value of $\tau$ matches that of the measurement error model. Misspecification of $\tau$ has no clear effect on the bias of the other parameter estimates. Although there is not a clear story for the coverage rate, the evidence seems to suggest that specifying a $\tau$ that is smaller than the true value is better than having one that is too large.
This seems to apply to the efficiency as well.

Table 4.40: Comparing the effect of $\tau$ on the point estimate, bias, and mean squared error when a covariate has both missing data and mismeasured data and when the missing data mechanism is NMAR. Mechanism A was used to generate the missing data for $\beta_1$; 200 simulations were run.

(a) $\tau_{True} = 0.7$, $\tau_{model} = 0.5$

| $\beta_j$ | Naive $E(\hat{\beta}_j)$ (SD) | Naive Bias ($Z_{bias}$) | Naive MSE (SD) | MCEM $E(\hat{\beta}_j)$ (SD) | MCEM Bias ($Z_{bias}$) | MCEM MSE (SD) | $\hat{e}$ (SD) |
|---|---|---|---|---|---|---|---|
| $\beta_0$ | -0.691 (0.253) | -0.001 (-0.031) | 0.064 (0.009) | -0.689 (0.256) | 0.001 (0.029) | 0.065 (0.009) | 1.03 (0.01) |
| $\beta_1$ | 0.274 (0.189) | -0.136 (-10.151) | 0.054 (0.005) | 0.338 (0.236) | -0.072 (-4.349) | 0.061 (0.007) | 1.12 (0.05) |
| $\beta_2$ | -0.196 (0.238) | 0.024 (1.448) | 0.057 (0.006) | -0.197 (0.240) | 0.023 (1.347) | 0.058 (0.006) | 1.02 (0.00) |
| $\beta_3$ | 0.12 (0.251) | 0.020 (1.102) | 0.064 (0.006) | 0.121 (0.253) | 0.021 (1.154) | 0.064 (0.006) | 1.01 (0.00) |
| $\beta_4$ | -0.127 (0.248) | -0.017 (-0.957) | 0.062 (0.006) | -0.128 (0.250) | -0.018 (-1.022) | 0.063 (0.007) | 1.02 (0.00) |

(b) $\tau_{True} = 0.7$, $\tau_{model} = 0.7$

| $\beta_j$ | Naive $E(\hat{\beta}_j)$ (SD) | Naive Bias ($Z_{bias}$) | Naive MSE (SD) | MCEM $E(\hat{\beta}_j)$ (SD) | MCEM Bias ($Z_{bias}$) | MCEM MSE (SD) | $\hat{e}$ (SD) |
|---|---|---|---|---|---|---|---|
| $\beta_0$ | -0.669 (0.228) | 0.021 (1.331) | 0.053 (0.005) | -0.672 (0.241) | 0.018 (1.062) | 0.058 (0.005) | 1.11 (0.02) |
| $\beta_1$ | 0.284 (0.228) | -0.126 (-7.847) | 0.053 (0.005) | 0.427 (0.365) | 0.017 (0.664) | 0.134 (0.020) | 1.97 (0.22) |
| $\beta_2$ | -0.223 (0.212) | -0.003 (-0.175) | 0.045 (0.004) | -0.228 (0.218) | -0.008 (-0.487) | 0.048 (0.004) | 1.06 (0.02) |
| $\beta_3$ | 0.111 (0.244) | 0.011 (0.651) | 0.060 (0.007) | 0.114 (0.249) | 0.014 (0.799) | 0.062 (0.008) | 1.05 (0.01) |
| $\beta_4$ | -0.117 (0.236) | -0.007 (-0.407) | 0.056 (0.008) | -0.119 (0.242) | -0.009 (-0.528) | 0.059 (0.008) | 1.05 (0.01) |

(c) $\tau_{True} = 0.7$, $\tau_{model} = 1.0$

| $\beta_j$ | Naive $E(\hat{\beta}_j)$ (SD) | Naive Bias ($Z_{bias}$) | Naive MSE (SD) | MCEM $E(\hat{\beta}_j)$ (SD) | MCEM Bias ($Z_{bias}$) | MCEM MSE (SD) | $\hat{e}$ (SD) |
|---|---|---|---|---|---|---|---|
| $\beta_0$ | -0.675 (0.216) | 0.015 (0.957) | 0.047 (0.004) | -0.695 (0.256) | -0.005 (-0.281) | 0.065 (0.009) | 1.39 (0.13) |
| $\beta_1$ | 0.271 (0.206) | -0.139 (-9.543) | 0.062 (0.005) | 0.595 (0.531) | 0.185 (4.928) | 0.316 (0.054) | 5.14 (0.92) |
| $\beta_2$ | -0.238 (0.257) | -0.018 (-0.987) | 0.066 (0.008) | -0.255 (0.286) | -0.035 (-1.752) | 0.083 (0.013) | 1.25 (0.10) |
| $\beta_3$ | 0.098 (0.233) | -0.002 (-0.093) | 0.054 (0.006) | 0.107 (0.249) | 0.007 (0.390) | 0.062 (0.007) | 1.14 (0.03) |
| $\beta_4$ | -0.125 (0.257) | -0.015 (-0.834) | 0.066 (0.008) | -0.135 (0.274) | -0.025 (-1.287) | 0.076 (0.009) | 1.14 (0.03) |

Table 4.41: Comparing the effect of $\tau$ on confidence interval length and coverage where a single covariate suffers from both missing data and measurement error and when the missing data mechanism is NMAR. Mechanism A was used to generate the missing data for $\beta_1$; 200 simulations were run.

(a) $\tau_{True} = 0.7$, $\tau_{model} = 0.5$

| $\beta_j$ | $L_{Naive}$ (SD) | $L_{MCEM}$ (SD) | Coverage$_{Naive}$ (p-value) | Coverage$_{MCEM}$ (p-value) |
|---|---|---|---|---|
| $\beta_0$ | 0.895 (0.060) | 0.905 (0.068) | 0.930 (0.268) | 0.930 (0.268) |
| $\beta_1$ | 0.831 (0.089) | 1.033 (0.149) | 0.910 (0.048) | 0.960 (0.472) |
| $\beta_2$ | 0.920 (0.092) | 0.927 (0.095) | 0.970 (0.096) | 0.970 (0.096) |
| $\beta_3$ | 0.912 (0.086) | 0.920 (0.089) | 0.940 (0.552) | 0.940 (0.552) |
| $\beta_4$ | 0.902 (0.087) | 0.910 (0.089) | 0.955 (0.732) | 0.955 (0.732) |

(b) $\tau_{True} = 0.7$, $\tau_{model} = 0.7$

| $\beta_j$ | $L_{Naive}$ (SD) | $L_{MCEM}$ (SD) | Coverage$_{Naive}$ (p-value) | Coverage$_{MCEM}$ (p-value) |
|---|---|---|---|---|
| $\beta_0$ | 0.893 (0.052) | 0.928 (0.089) | 0.950 (1.000) | 0.950 (1.000) |
| $\beta_1$ | 0.838 (0.091) | 1.289 (0.311) | 0.880 (0.004) | 0.970 (0.096) |
| $\beta_2$ | 0.908 (0.078) | 0.931 (0.095) | 0.990 (<0.001) | 0.990 (<0.001) |
| $\beta_3$ | 0.903 (0.086) | 0.925 (0.101) | 0.940 (0.552) | 0.940 (0.552) |
| $\beta_4$ | 0.905 (0.088) | 0.928 (0.103) | 0.955 (0.732) | 0.955 (0.732) |

(c) $\tau_{True} = 0.7$, $\tau_{model} = 1.0$

| $\beta_j$ | $L_{Naive}$ (SD) | $L_{MCEM}$ (SD) | Coverage$_{Naive}$ (p-value) | Coverage$_{MCEM}$ (p-value) |
|---|---|---|---|---|
| $\beta_0$ | 0.892 (0.049) | 1.007 (0.262) | 0.965 (0.248) | 0.985 (<0.001) |
| $\beta_1$ | 0.839 (0.082) | 1.955 (0.913) | 0.905 (0.028) | 1.000 (<0.001) |
| $\beta_2$ | 0.918 (0.089) | 0.999 (0.261) | 0.935 (0.388) | 0.945 (0.756) |
| $\beta_3$ | 0.907 (0.079) | 0.976 (0.164) | 0.955 (0.732) | 0.955 (0.732) |
| $\beta_4$ | 0.907 (0.086) | 0.977 (0.167) | 0.940 (0.552) | 0.945 (0.756) |

Table 4.42: Comparing the effect of $\tau$ on the median and the difference between the median and true value ($\delta$) where a single covariate suffers from both missing data and measurement error and when the missing data mechanism is NMAR. Mechanism A was used to generate the missing data for $\beta_1$; 200 simulations were run.

(a) $\tau_{True} = 0.7$, $\tau_{model} = 0.5$

| $\beta_j$ | $m_{Naive}(\hat{\beta}_j)$ (MAD) | $m_{MCEM}(\hat{\beta}_j)$ (MAD) | $\delta_{Naive}$ | $\delta_{MCEM}$ |
|---|---|---|---|---|
| $\beta_0$ | -0.675 (0.212) | -0.68 (0.224) | 0.015 | 0.010 |
| $\beta_1$ | 0.278 (0.174) | 0.337 (0.226) | -0.132 | -0.073 |
| $\beta_2$ | -0.199 (0.237) | -0.202 (0.237) | 0.021 | 0.018 |
| $\beta_3$ | 0.12 (0.222) | 0.121 (0.220) | 0.020 | 0.021 |
| $\beta_4$ | -0.124 (0.233) | -0.126 (0.234) | -0.014 | -0.016 |

(b) $\tau_{True} = 0.7$, $\tau_{model} = 0.7$

| $\beta_j$ | $m_{Naive}(\hat{\beta}_j)$ (MAD) | $m_{MCEM}(\hat{\beta}_j)$ (MAD) | $\delta_{Naive}$ | $\delta_{MCEM}$ |
|---|---|---|---|---|
| $\beta_0$ | -0.664 (0.232) | -0.666 (0.247) | 0.026 | 0.024 |
| $\beta_1$ | 0.265 (0.212) | 0.389 (0.324) | -0.145 | -0.021 |
| $\beta_2$ | -0.227 (0.204) | -0.227 (0.209) | -0.007 | -0.007 |
| $\beta_3$ | 0.118 (0.222) | 0.122 (0.223) | 0.018 | 0.022 |
| $\beta_4$ | -0.116 (0.229) | -0.116 (0.235) | -0.006 | -0.006 |

(c) $\tau_{True} = 0.7$, $\tau_{model} = 1.0$

| $\beta_j$ | $m_{Naive}(\hat{\beta}_j)$ (MAD) | $m_{MCEM}(\hat{\beta}_j)$ (MAD) | $\delta_{Naive}$ | $\delta_{MCEM}$ |
|---|---|---|---|---|
| $\beta_0$ | -0.669 (0.214) | -0.68 (0.239) | 0.021 | 0.010 |
| $\beta_1$ | 0.246 (0.194) | 0.482 (0.406) | -0.164 | 0.072 |
| $\beta_2$ | -0.225 (0.224) | -0.232 (0.261) | -0.005 | -0.012 |
| $\beta_3$ | 0.101 (0.209) | 0.109 (0.230) | 0.001 | 0.009 |
| $\beta_4$ | -0.118 (0.237) | -0.133 (0.246) | -0.008 | -0.023 |

4.5.4 Simulation study 2 discussion

Having both missing data and measurement error affect the same covariate introduces bias to the intercept and to $\beta_1$. The bias is not restricted to just these two parameter estimates, for it was seen that with mechanism A, significant bias was introduced for $\beta_2$ and $\beta_3$. The presence of both problems within a single covariate appears to have the potential to cause problems with the attainment of the nominal coverage rate for accurately measured covariates, as seen with mechanisms A and C.

In the first two experiments, we see the familiar trade-off between reducing the bias associated with the parameter of interest, $\beta_1$, and the bias associated with the other parameters.
As with the previous study, we see that the intercept is the most affected by this trade-off. Across the four cases in the first experiment, we see that for $\beta_1$ the MCEM approach reduces the bias and tends to improve the coverage rate, allowing the coverage to reach the nominal rate. Unfortunately, this is not a global result that holds for all the parameters. We also see the familiar result of larger estimated standard deviations, larger MSE, and longer confidence intervals.

When the missing data mechanism is misspecified, we observe differing effects which are contingent on the type of disconnect between the generating model and the assumed model. When the mechanism is under-modelled, that is, assumed to be MAR when it is NMAR, there is a tendency for the bias and the MSE to be larger than if the model was correctly specified. When it is over-modelled, we observed that the point estimates are less biased than if it was correctly specified. In this situation, however, the nominal rate of coverage was difficult to attain.

When we compared different missing data mechanisms for cases 1 and 4, we saw some conflicting results. For $\beta_1$ we observed few differences across the three mechanisms when the mechanism was MAR. For the other covariates, we saw many similarities across the three mechanisms, but there was some suggestion that mechanism C, which was MCAR, may be behaving differently. This difference emerges with the estimation of bias of the MCEM approach for $\beta_4$ and in the MCEM coverage rates associated with $\beta_0$, $\beta_2$ and $\beta_4$. The difference of mechanism C fully emerges when we consider case 4, NMAR-NMAR. It is here that the similarity of mechanisms A and B solidifies against the dissimilarity of mechanism C.

Although there are peculiarities observed with this simulation study, by distancing ourselves from the minutiae of the tables, we see some trends for both the complete-case and MCEM approaches across both the MAR and NMAR missing data mechanisms.
We observe decreasing estimates of the standard deviation, MSE, and mean length of the confidence intervals as the sample size increases. Furthermore, we see that the MCEM approach becomes more efficient as the sample size increases. When we consider the issue of bias, the details can create much confusion, but by taking a broader look at the trend we can see the possibility that for all parameters the bias is reduced as the sample size increases. Applying this perspective to the other murky results, we see familiar ground emerge, with the MCEM approach trending towards the attainment of the nominal rate as the sample size increases.

Finally, when we consider the specification of $\tau$, we see the expected results, such as larger estimates of the standard deviation for larger values of $\tau$. This trend is seen again for the MSE and the confidence interval length. When the assumed value of $\tau$ matches that of the model, we see that the bias is minimized for $\beta_1$. We see a generally positive feature with the bias in that misspecification of $\tau$ seems to have no effect on the bias of the other parameters. Although there is no clear story for the coverage rate, the evidence seems to suggest that specifying a $\tau$ that is smaller than the true value is better than specifying one that is too large. This seems to apply to the efficiency as well.

Chapter 5

Does income affect the location of death for males, living in Aboriginal communities, who died of prostate cancer?

Although there are a myriad of ways to specify an individual's culture, for this investigation we will consider the socio-economic context. This is rarely measured directly and is not present in any of the linked data sources. As a common means of compensating for this deficiency, income is used as a surrogate [53-55]. Socio-economic status represents a context in which its members learn social coding; it is a cultural context which forms an understanding of self and others.
Ecosocial theory suggests that people biologically incorporate their "context"; thus it is suggested that health is the physical realization of our cumulative experiences [52].

With this operating definition of a subject's socio-economic cultural context, we extend the original population definition to include all male adults, age 20 and older, who resided in and died in British Columbia, as identified on their death certificate, due to malignant prostate cancer between 1997 and 2003 inclusive, excluding death certificate only (DCO) diagnoses of cancer, who lived in a dissemination area where the self-reported aboriginal population was 20% or greater. With this definition, 215 patients were identified and the following covariates were used for the model: average income of the dissemination area, age, health authority, and aboriginal profile of the dissemination area. The response is the location where the patient died.

For the outcome, all locations of death are coded on the death certificate by the BCVS (Appendix A). In 2000, the place of death codes changed from following the International Classification of Diseases (ICD) version 9 to version 10. Coding for deaths at home (also called place of usual residence) remained the same across the change, but this was not true for other codes. For example, the hospital code was 7 prior to 2000 and 2 after. The change of code also changed the group of locations associated with hospital deaths. From the code alone, it is impossible to extract only hospital deaths; thus a supplementary file, giving the facility code and the hospital name, was obtained from the BCVS to indicate all patients who died in a hospital. The set of codes has been reduced to indicate death at the place of usual residence, that is, home and free-standing facilities which constitute a place of residence (e.g. nursing home), and death on a hospital campus, which includes all facilities associated with the postal code of the hospital.
The outcome was defined as

$$y_i = \begin{cases} 1 & \text{if the location of death was on a hospital campus} \\ 0 & \text{if the location of death was in the place of usual residence.} \end{cases}$$

We have three perfect covariates: age, health authority, and aboriginal profile of the dissemination area. Although age is constructed from multiple data sources (date of birth from the cancer registry and date of death from BC vital statistics), the population had no identifiable problems. Age was kept as a continuous variable, which is not standard for applied epidemiological analysis, and it was standardized.

The health authority for the place of usual residence was used and obtained from the geocoding program. All the persons in the population were sufficiently geocoded to correctly identify them with one health authority: Vancouver Island, Vancouver Coastal, Fraser, Interior, or Northern (Figure 5.1). Vancouver Coastal is the reference health authority, with $x_{32}$ = Fraser Health Authority, $x_{33}$ = Interior Health Authority, $x_{34}$ = Vancouver Island Health Authority, and $x_{35}$ = Northern Health Authority.

The aboriginal profile of each area is predicated on the dissemination code, which is obtained from the geocoding program. In British Columbia, dissemination areas which are in the 800 series, aside from the Nisga'a nation, are reservations. The dissemination area code was used to construct an indicator for reservations and the Nisga'a nation. Since all the patients were successfully geocoded, the indicator was complete and defined as

$$x_4 = \begin{cases} 0 & \text{if the location of residence is off a reservation} \\ 1 & \text{if the location of residence is on a reservation.} \end{cases}$$

Figure 5.1: British Columbia health authorities and health service delivery areas

The age, health authority, and reservation indicator were included in the model as adjusting variables only, with no direct interest in assessing their utility to the overall model. We see in Figure 5.1 that there may be a spatial aspect to this problem since the health authorities have geographic relationships to each other. Although it is reasonable to consider a hierarchical model to account for this structure, we have chosen to adjust for any potential regionality in the place of death by including the health authority in the model. Furthermore, the health authority is the overarching governmental structure for the delivery of regional services. By including this in the model, we are not only adjusting for any regional differences, but also adjusting for administrative practices. The indicator for reservations was included to adjust for any differences that may arise due to living on or off a reservation.

The average income is the surrogate for the socio-economic status and is the variable of primary interest for this investigation. Income is a census-based covariate which was inaccurately measured. Although there may be equivalent motivation for either a Berkson or a classical measurement error model, for illustrative purposes a classical model is assumed. It is clear that this variable is mismeasured, but it also suffers from missing data. Of the 215 subjects, 38 did not have any income data, which translates to 17.7% of the income data being missing. This is due to the non-reporting of information in areas with low numbers of individuals (Chapter 1). Income was standardized using the empirical mean and standard deviation of the observable data.

Two assumptions will be made for the imperfect covariate. First, we will assume that the distribution of the unobservable experimental realizations is identical to that of the observed experimental realizations, that is, the experimental realizations are identically distributed for all subjects.
Assuming unbiased measurement error and $\epsilon_i \sim N(0, \tau^2)$, then $X_i^{F} \mid x_i \sim N(x_i, \tau^2)$ where $F = \{\text{obs}, \text{miss}\}$, so $X_i^{E} \mid x_i \sim N(x_i, \tau^2)$. Secondly, we assume that the variance of the measurement error model is pre-specified, that is, we will not be estimating it from the data.

In order to specify a likelihood model and apply the MCEM methodology to this example, we will need to assume a model for the income data. Given that we are assuming a normal distribution for the measurement error, if the surrogate is normally distributed as well, we can reasonably assume that the underlying unobservable measure of direct interest is also normally distributed. The empirical distribution of the standardized income does not immediately suggest that a normal assumption is tenable (Figure 5.2). Even the next best guess, such as a t-distribution, remains unconvincing.

Figure 5.2: Histogram of the normalized income variable compared with t distributions of 5, 25, and 100 degrees of freedom and the standard normal distribution.

If we turn to the Q-Q plots for the $t_5$, $t_{25}$, $t_{100}$ and standard normal distributions, we gain some clarity (Figure 5.3). We see that the $t_{25}$, $t_{100}$ and standard normal distributions would be reasonable assumptions for a "real-world" data set; thus, to retain simplicity, we will assume that the standardized income is distributed as $N(0,1)$. The caveat to this assumption is the recognition that the tails do not have the correct mass.

Figure 5.3: Q-Q plots for the standardized income covariate: t-distributions with 5, 25, and 100 degrees of freedom and the standard normal distribution.

Finally we have the outcome: place of death.
We defined the place of death as being in a hospital or in the place of usual residence, where the place of usual residence includes both the home and any institutional place of usual residence such as a nursing home. Furthermore, a hospital designation is any death which occurred on a hospital campus as defined by the postal code. We identified location of death within the hospital campus first, then all places of usual residence. The proportion of men who died at the place of their usual residence indicates that the odds ratio is not a reasonable approximation to the relative risk (Table 5.1). There is a large difference in the proportion of individuals who die in hospital when comparing living on a reservation with living off a reservation. The covariate of primary interest, income, shows little difference between those who died in hospital and those who died in their place of usual residence.

Table 5.1: Univariate summary of the applied data set. The overall column has the counts, proportions, means and standard deviations for all 215 subjects. The hospital and place of residence columns have the rates, proportions, means and standard deviations for the subjects contingent on being in the column category for the location of death.

Variable                        Hospital (%)        Residence (%)
Place of death                  141 (65.6%)         74 (34.4%)
Health Authority
  Vancouver Island               10 (7.1%)           7 (9.5%)
  Vancouver Coastal              20 (14.2%)         11 (14.9%)
  Fraser                         46 (32.6%)         15 (20.3%)
  Interior                       26 (18.3%)         30 (40.5%)
  Northern                       39 (27.7%)         11 (14.9%)
Reservation status
  Living off a reservation      124 (87.9%)         56 (75.7%)
  Living on a reservation        17 (12.1%)         18 (24.3%)

Variable                        Hospital (SD)       Residence (SD)
Income                          43,650 (12,690)     39,712 (11,435)
Age                             76.2 (8.7)          78.7 (9.2)

The model required for the analysis has been comprehensively discussed in section 4.5; thus we will only briefly review the components for the applied problem. The analysis will involve a binary outcome with a logit link function, an imperfect covariate, the assumption that $\epsilon \sim N(0, \tau^2)$, and three perfect covariates which have been included in the model for adjustment purposes and are not of primary interest. The variance of $\epsilon$ is assumed known and a sensitivity analysis for $\tau$ will be performed. Finally, the MAR and NMAR assumptions will be considered to see if the assumption about the missing data mechanism affects parameter estimation.

Assuming that the imperfection indicators for missing data and mismeasurement are independent and that they are binary random variables, we will use the logit link function, with the basic model for the systematic component for the $i$th subject given by the linear predictor

$$\gamma_0 + \gamma_1 x_{i,1} + \gamma_2 x_{i,2} + \gamma_{32} x_{i,32} + \gamma_{33} x_{i,33} + \gamma_{34} x_{i,34} + \gamma_{35} x_{i,35} + \gamma_4 x_{i,4}$$

where $\gamma_2$ is the parameter for age, $\gamma_{32}$, $\gamma_{33}$, $\gamma_{34}$, and $\gamma_{35}$ are the parameters associated with the four levels of the Health Authority covariate, and $\gamma_4$ is the parameter indexing the indicator of residence on a reservation. The covariate of primary interest and suffering from imperfection is $x_1$; thus for MAR we set $\gamma_1 = 0$. For NMAR, we are assuming that $\gamma_1 \neq 0$.

5.1 Results and discussion

There are two aspects that need to be explored from a sensitivity analysis point of view: the assumption about the missing data mechanism and the variance of $\epsilon$, $\tau^2$. To begin, we will first consider the effect of $\tau$ when the missing data mechanism is MAR, then we will consider the NMAR case. Finally, we will consider the effect of changing the missing data assumption. When appropriate, comparisons against the complete-case analysis will be made (Table 5.2).

The striking feature under the MAR assumption is the stability of the parameter estimates across the various values of $\tau$ (Table 5.3). The MCEM estimates exhibit little variation across the specifications of $\tau$. We see stability in the parameter estimates, z-scores and the associated p-values.
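The z-scores and p-values reported in this section appear consistent with a Wald test of each coefficient against zero. As a minimal sketch, assuming a two-sided test with a standard normal reference distribution (the function name `wald_test` is illustrative and is not taken from the analysis code), the calculation can be written as:

```python
import math

def wald_test(estimate, std_error):
    """Return the Wald z-score and two-sided normal p-value for H0: coefficient = 0."""
    z = estimate / std_error
    # Two-sided tail area of the standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Naive complete-case estimate and standard error for age (beta_2), as reported in Table 5.2(a).
z, p = wald_test(0.355, 0.178)
print(f"z = {z:.3f}, p = {p:.3f}")
```

For these inputs the sketch gives a z-score of 1.994 and a p-value of 0.046, which agrees with the tabulated complete-case values for age. Small discrepancies for other rows are to be expected because the tabulated estimates and standard errors are already rounded.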
When considered in light of the associated standard errors, the small differences across the values of $\tau$ would most likely not achieve any significance. When it comes to the significance of the parameters in modelling the outcome, we see that only $\beta_4$ is statistically different from the null value of zero, but there is some weak evidence in support of $\beta_2$. We also notice that the standard errors associated with the MCEM estimates are, in general, smaller than those derived from the naive complete-case approach.

Table 5.2: Naive complete-case results

(a) Parameter estimates, standard error, z-score, and associated p-value
Parameter       $\hat\beta_{Naive}$ (SE)   z-score   p-value
$\beta_0$       -0.580 (0.636)    -0.912    0.362
$\beta_1$       -0.207 (0.194)    -1.069    0.284
$\beta_2$        0.355 (0.178)     1.994    0.046
$\beta_{32}$    -0.425 (0.800)    -0.531    0.596
$\beta_{33}$    -0.760 (0.713)    -1.066    0.286
$\beta_{34}$     0.654 (0.686)     0.954    0.340
$\beta_{35}$    -0.414 (0.760)    -0.544    0.586
$\beta_4$        0.751 (0.659)     1.141    0.254

(b) Point estimate, 95% confidence interval, and confidence interval length on the odds ratio scale
Parameter       $OR_{Naive}$   95% CI         Length
$\beta_0$       0.56    (0.16, 1.95)   1.79
$\beta_1$       0.81    (0.56, 1.19)   0.63
$\beta_2$       1.43    (1.01, 2.02)   1.02
$\beta_{32}$    0.65    (0.14, 3.14)   3.00
$\beta_{33}$    0.47    (0.12, 1.89)   1.78
$\beta_{34}$    1.92    (0.50, 7.37)   6.87
$\beta_{35}$    0.66    (0.15, 2.93)   2.78
$\beta_4$       2.12    (0.58, 7.71)   7.13

Table 5.3: Parameter estimates, standard error, z-score, and associated p-value for the MCEM methodology assuming that the missing data mechanism is MAR. Four levels of $\tau$ are explored to check the sensitivity of the model to the assumption on $\tau$.

(a) $\tau = 0.2$
Parameter       $\hat\beta_{MCEM}$ (SE)   z-score   p-value
$\beta_0$       -0.789 (0.561)    -1.408    0.160
$\beta_1$       -0.137 (0.231)    -0.593    0.554
$\beta_2$        0.298 (0.160)     1.855    0.064
$\beta_{32}$    -0.131 (0.647)    -0.203    0.838
$\beta_{33}$    -0.507 (0.615)    -0.824    0.410
$\beta_{34}$     0.783 (0.602)     1.303    0.192
$\beta_{35}$    -0.436 (0.684)    -0.637    0.524
$\beta_4$        0.804 (0.409)     1.968    0.050

(b) $\tau = 0.3$
Parameter       $\hat\beta_{MCEM}$ (SE)   z-score   p-value
$\beta_0$       -0.787 (0.563)    -1.398    0.162
$\beta_1$       -0.144 (0.262)    -0.550    0.582
$\beta_2$        0.298 (0.161)     1.854    0.064
$\beta_{32}$    -0.137 (0.646)    -0.213    0.832
$\beta_{33}$    -0.513 (0.615)    -0.833    0.404
$\beta_{34}$     0.780 (0.602)     1.296    0.194
$\beta_{35}$    -0.440 (0.689)    -0.639    0.522
$\beta_4$        0.805 (0.409)     1.969    0.048

(c) $\tau = 0.4$
Parameter       $\hat\beta_{MCEM}$ (SE)   z-score   p-value
$\beta_0$       -0.781 (0.574)    -1.359    0.174
$\beta_1$       -0.141 (0.341)    -0.413    0.680
$\beta_2$        0.298 (0.161)     1.852    0.064
$\beta_{32}$    -0.141 (0.649)    -0.217    0.828
$\beta_{33}$    -0.518 (0.621)    -0.835    0.404
$\beta_{34}$     0.776 (0.606)     1.280    0.200
$\beta_{35}$    -0.454 (0.712)    -0.638    0.524
$\beta_4$        0.805 (0.409)     1.969    0.048

(d) $\tau = 0.5$
Parameter       $\hat\beta_{MCEM}$ (SE)   z-score   p-value
$\beta_0$       -0.775 (0.635)    -1.220    0.222
$\beta_1$       -0.142 (0.638)    -0.222    0.824
$\beta_2$        0.297 (0.161)     1.841    0.066
$\beta_{32}$    -0.146 (0.665)    -0.220    0.826
$\beta_{33}$    -0.525 (0.648)    -0.810    0.418
$\beta_{34}$     0.771 (0.628)     1.228    0.220
$\beta_{35}$    -0.469 (0.832)    -0.563    0.574
$\beta_4$        0.807 (0.409)     1.973    0.048

Table 5.4: Point estimate, 95% confidence interval, and confidence interval length on the odds ratio scale for the MCEM methodology assuming that the missing data mechanism is MAR. Four levels of $\tau$ are explored to check the sensitivity of the model to the assumption on $\tau$.

(a) $\tau = 0.2$
Parameter       $OR_{MCEM}$   95% CI         Length
$\beta_0$       0.45    (0.15, 1.36)   1.21
$\beta_1$       0.87    (0.55, 1.37)   0.82
$\beta_2$       1.35    (0.98, 1.84)   0.86
$\beta_{32}$    0.88    (0.25, 3.11)   2.87
$\beta_{33}$    0.60    (0.18, 2.01)   1.83
$\beta_{34}$    2.19    (0.67, 7.12)   6.44
$\beta_{35}$    0.65    (0.17, 2.47)   2.30
$\beta_4$       2.23    (1.00, 4.98)   3.97

(b) $\tau = 0.3$
Parameter       $OR_{MCEM}$   95% CI         Length
$\beta_0$       0.46    (0.15, 1.37)   1.22
$\beta_1$       0.87    (0.52, 1.45)   0.93
$\beta_2$       1.35    (0.98, 1.84)   0.86
$\beta_{32}$    0.87    (0.25, 3.09)   2.85
$\beta_{33}$    0.60    (0.18, 2.00)   1.82
$\beta_{34}$    2.18    (0.67, 7.09)   6.42
$\beta_{35}$    0.64    (0.17, 2.48)   2.32
$\beta_4$       2.24    (1.00, 4.98)   3.98

(c) $\tau = 0.4$
Parameter       $OR_{MCEM}$   95% CI         Length
$\beta_0$       0.46    (0.15, 1.41)   1.26
$\beta_1$       0.87    (0.44, 1.70)   1.25
$\beta_2$       1.35    (0.98, 1.84)   0.86
$\beta_{32}$    0.87    (0.24, 3.10)   2.86
$\beta_{33}$    0.60    (0.18, 2.01)   1.83
$\beta_{34}$    2.17    (0.66, 7.12)   6.46
$\beta_{35}$    0.63    (0.16, 2.56)   2.41
$\beta_4$       2.24    (1.00, 4.99)   3.98

(d) $\tau = 0.5$
Parameter       $OR_{MCEM}$   95% CI         Length
$\beta_0$       0.46    (0.13, 1.60)   1.47
$\beta_1$       0.87    (0.25, 3.03)   2.79
$\beta_2$       1.35    (0.98, 1.85)   0.87
$\beta_{32}$    0.86    (0.23, 3.18)   2.95
$\beta_{33}$    0.59    (0.17, 2.11)   1.94
$\beta_{34}$    2.16    (0.63, 7.40)   6.77
$\beta_{35}$    0.63    (0.12, 3.20)   3.07
$\beta_4$       2.24    (1.01, 5.00)   3.99

Recalling that we are primarily interested in $\beta_1$ and are only using the other covariates to adjust the model, we will first consider $\beta_1$, then the other parameters. From simulation study 2, we expect to see a substantial reduction in the bias, especially if the model is correctly specified across all model features. For all values of $\tau$ we see an approximate 33% reduction in the magnitude of the parameter. This is considerable, but not of the magnitude often seen in simulation study 2 when the model was correctly specified. It is reasonable to assume that we have not correctly specified the model in at least one, but perhaps more, ways. In simulation 2, we also saw that the bias associated with the MCEM approach for $\beta_1$ was, in general, positive. Transferring to this situation, we can assume that the MCEM estimate is larger than the true value, but we may also assume that we are closer to it than the estimate from the complete-case analysis.
Across the values of $\tau$, we see that $\beta_1$ ranges from -0.137 to -0.144, so pulling everything together, it is reasonable to conclude that the true value of the parameter, conditional on this particular functional form of the model, would be around -0.13.

Considering the other covariates which have been included to adjust the model and the intercept, we see a variety of movements. For the intercept, we see the magnitude of the estimate increase, but from simulation 2, we know that the bias associated with the estimate of the intercept tends to increase when going from the naive complete-case analysis to that of the MCEM. If we maintain that observation for this data set, then we would assume that the intercept has a larger bias than that of the naive analysis; thus by observing the direction of movement of the parameter estimate for the intercept we can make a reasonable guess as to where the true value should be. In this case, since the estimate becomes more negative, we would surmise that the true value for the intercept should be closer to zero than that observed for the complete-case analysis. Outside of the estimate of the parameter associated with the imperfect covariate and the intercept, there are few clear and decisive results from simulation 2 about the relationship between the MCEM estimate, the complete-case estimates, and the true values. In general, we expect a slight increase in the bias for these covariates, but this is not uniform across all the scenarios implemented in simulation study 2. We can observe that some of the parameter estimates do have large changes in the estimates, with $\beta_{32}$ and $\beta_{33}$ being most notable.

Table 5.4 converts the estimates to the odds-ratio scale.
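The conversion to the odds-ratio scale can be sketched as follows. This is a hedged illustration, assuming Wald-type intervals that are computed on the log-odds scale with the 1.96 normal critical value and then exponentiated; the function name `odds_ratio_ci` is hypothetical, and the inputs are the naive complete-case estimate and standard error for income reported in Table 5.2.

```python
import math

def odds_ratio_ci(estimate, std_error, crit=1.96):
    """Exponentiate a log-odds estimate and its Wald interval.

    Returns (odds ratio, lower limit, upper limit, interval length).
    """
    odds_ratio = math.exp(estimate)
    lower = math.exp(estimate - crit * std_error)
    upper = math.exp(estimate + crit * std_error)
    return odds_ratio, lower, upper, upper - lower

# Naive complete-case estimate (SE) for income (beta_1): -0.207 (0.194).
or_, lo, hi, length = odds_ratio_ci(-0.207, 0.194)
print(f"OR = {or_:.2f}, 95% CI = ({lo:.2f}, {hi:.2f}), length = {length:.2f}")
```

With these inputs the sketch returns an odds ratio of 0.81 with interval (0.56, 1.19) and length 0.63, matching the complete-case row for income. Because the interval is symmetric on the log-odds scale, it is asymmetric about the point estimate on the odds-ratio scale, and its length grows quickly with the standard error.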
It is noted that we are not in a rare case situation, so the odds-ratio is not an approximation to the relative risk in this case and should not be interpreted in that manner. For the estimates of the odds-ratio, we constructed the 95% confidence interval for each parameter estimate and the length of the confidence interval. The first noticeable feature is that the lengths of the confidence intervals for the MCEM approach are smaller than those of the complete-case methodology. This is due to the smaller standard errors observed earlier. From simulation study 2, we observed that the large sample asymptotic properties appeared to be "kicking in" around a sample size of 250. The shorter intervals could be evidence that for this study we are beginning to experience some of these benefits. The difference in lengths is most noticeable for $\beta_4$, where the MCEM based length is almost half that of the naive approach. Also, we observe that the length of the confidence interval for $\beta_1$ increases as $\tau$ increases, but this is not unexpected: as we increase $\tau$ we are introducing more variability into the model and, in particular, more variability associated with $x_1$. Finally, from a confidence interval point of view, only $\beta_4$ presents substantive interest.

For most parameters the difference on the odds-ratio scale between the complete-case and the MCEM approaches is slight and would result in little difference in substantive interpretation of the model. Focusing on $\beta_1$, we see what appears to be a minor difference. For the complete-case analysis we have an odds-ratio of 0.81 and for the MCEM method we have 0.87. From our previous observations, we may assume that the true odds-ratio may be around 0.90. With the large confidence interval and the proximity of both estimates to the null value of one, we may quickly decide that there is little of interest in this covariate.
From a substantive point of view, the complete-case estimate is much more compelling than the MCEM estimate. The confidence interval for the naive estimate is (0.56, 1.19) with a point estimate of 0.81. Knowing that the data are fraught with problems, this may be enough evidence to divert resources to the acquisition of better data in order to further explore the effect of income on the location of death for this population. It may be argued that with better and more data, the confidence interval may be shortened, resulting in a significant finding. With the MCEM adjustment, which takes into account the data imperfection, we see that the point estimate is 0.87 with a potential true odds-ratio of around 0.9, based on carrying the results of an analogous simulation study over to the substantive problem. Furthermore, the confidence interval is rather large. By diverting more resources to investigating the effect of income on the place of death for this population, we may be able to shorten the confidence interval, but a shorter interval now looks less likely to produce a significant finding; thus the diversion of resources becomes much more questionable. In fact, in light of the MCEM adjustment, the question about the effect of living on or off a reservation, $\beta_4$, becomes much more compelling and may be a more sensible place to focus limited resources. Here we see that the MCEM approach, informed by an analogous simulation study, not only provides statistical methodological insight, but can also provide a means for substantive researchers to make better data acquisition decisions.

Moving from the MAR to the NMAR assumption, we again see an overall stability in the parameter estimates (Table 5.5). We see a similar stability in the estimated standard deviation across the parameters, except for the estimated standard deviations associated with $\beta_1$ and $\beta_{35}$, the indicator for the Northern Health Authority.
In both cases we see an increase in the associated estimate of the standard deviation. Perhaps, in this case, the underlying assumption that the imperfect random variable and the covariates are pairwise independent is untenable for at least $x_1$ and $x_{35}$. Finally, we see that $\beta_4$ is the only significant covariate. If we were exploring the data from a hypothesis generation point of view in order to determine the important subset of data in which to invest both time and money, it would be reasonable to focus on the location of residence (on/off reservation) rather than income. Furthermore, there is moderate evidence for the inclusion of $\beta_2$. On the odds ratio scale, we see similar conclusions, with particular note of the evidence supporting the odds ratio associated with $\beta_4$ (Table 5.6).

Table 5.5: Parameter estimates, standard error, z-score, and associated p-value for the MCEM methodology assuming that the missing data mechanism is NMAR. Four levels of $\tau$ are explored to check the sensitivity of the model to the assumption on $\tau$.

(a) $\tau = 0.2$
Parameter       $\hat\beta_{MCEM}$ (SE)   z-score   p-value
$\beta_0$       -0.812 (0.557)    -1.457    0.146
$\beta_1$       -0.189 (0.230)    -0.820    0.412
$\beta_2$        0.299 (0.161)     1.859    0.064
$\beta_{32}$    -0.112 (0.646)    -0.173    0.862
$\beta_{33}$    -0.482 (0.614)    -0.785    0.432
$\beta_{34}$     0.803 (0.601)     1.337    0.182
$\beta_{35}$    -0.377 (0.683)    -0.552    0.580
$\beta_4$        0.821 (0.409)     2.006    0.044

(b) $\tau = 0.3$
Parameter       $\hat\beta_{MCEM}$ (SE)   z-score   p-value
$\beta_0$       -0.811 (0.560)    -1.448    0.148
$\beta_1$       -0.201 (0.261)    -0.768    0.442
$\beta_2$        0.300 (0.161)     1.860    0.062
$\beta_{32}$    -0.110 (0.647)    -0.170    0.866
$\beta_{33}$    -0.485 (0.615)    -0.788    0.430
$\beta_{34}$     0.804 (0.603)     1.333    0.182
$\beta_{35}$    -0.376 (0.691)    -0.545    0.586
$\beta_4$        0.824 (0.410)     2.012    0.044

(c) $\tau = 0.4$
Parameter       $\hat\beta_{MCEM}$ (SE)   z-score   p-value
$\beta_0$       -0.801 (0.571)    -1.403    0.160
$\beta_1$       -0.191 (0.338)    -0.566    0.572
$\beta_2$        0.300 (0.161)     1.859    0.064
$\beta_{32}$    -0.122 (0.650)    -0.188    0.852
$\beta_{33}$    -0.496 (0.620)    -0.800    0.424
$\beta_{34}$     0.796 (0.608)     1.310    0.190
$\beta_{35}$    -0.403 (0.713)    -0.565    0.572
$\beta_4$        0.821 (0.409)     2.005    0.044

(d) $\tau = 0.5$
Parameter       $\hat\beta_{MCEM}$ (SE)   z-score   p-value
$\beta_0$       -0.798 (0.613)    -1.302    0.192
$\beta_1$       -0.203 (0.568)    -0.358    0.720
$\beta_2$        0.299 (0.162)     1.846    0.064
$\beta_{32}$    -0.124 (0.663)    -0.187    0.852
$\beta_{33}$    -0.500 (0.639)    -0.783    0.434
$\beta_{34}$     0.793 (0.626)     1.266    0.206
$\beta_{35}$    -0.413 (0.800)    -0.517    0.606
$\beta_4$        0.822 (0.410)     2.004    0.046

Table 5.6: Point estimate, 95% confidence interval, and confidence interval length on the odds ratio scale for the MCEM methodology assuming that the missing data mechanism is NMAR. Four levels of $\tau$ are explored to check the sensitivity of the model to the assumption on $\tau$.

(a) $\tau = 0.2$
Parameter       $OR_{MCEM}$   95% CI         Length
$\beta_0$       0.44    (0.15, 1.32)   1.17
$\beta_1$       0.83    (0.53, 1.30)   0.77
$\beta_2$       1.35    (0.98, 1.85)   0.87
$\beta_{32}$    0.89    (0.25, 3.17)   2.92
$\beta_{33}$    0.62    (0.19, 2.06)   1.87
$\beta_{34}$    2.23    (0.69, 7.25)   6.56
$\beta_{35}$    0.69    (0.18, 2.62)   2.44
$\beta_4$       2.27    (1.02, 5.07)   4.05

(b) $\tau = 0.3$
Parameter       $OR_{MCEM}$   95% CI         Length
$\beta_0$       0.44    (0.15, 1.33)   1.18
$\beta_1$       0.82    (0.49, 1.37)   0.87
$\beta_2$       1.35    (0.98, 1.85)   0.87
$\beta_{32}$    0.90    (0.25, 3.19)   2.93
$\beta_{33}$    0.62    (0.18, 2.06)   1.87
$\beta_{34}$    2.23    (0.69, 7.28)   6.59
$\beta_{35}$    0.69    (0.18, 2.66)   2.48
$\beta_4$       2.28    (1.02, 5.09)   4.07

(c) $\tau = 0.4$
Parameter       $OR_{MCEM}$   95% CI         Length
$\beta_0$       0.45    (0.15, 1.37)   1.23
$\beta_1$       0.83    (0.43, 1.60)   1.18
$\beta_2$       1.35    (0.98, 1.85)   0.87
$\beta_{32}$    0.89    (0.25, 3.16)   2.92
$\beta_{33}$    0.61    (0.18, 2.05)   1.87
$\beta_{34}$    2.22    (0.67, 7.29)   6.62
$\beta_{35}$    0.67    (0.17, 2.70)   2.54
$\beta_4$       2.27    (1.02, 5.07)   4.05

(d) $\tau = 0.5$
Parameter       $OR_{MCEM}$   95% CI         Length
$\beta_0$       0.45    (0.14, 1.50)   1.36
$\beta_1$       0.82    (0.27, 2.48)   2.21
$\beta_2$       1.35    (0.98, 1.85)   0.87
$\beta_{32}$    0.88    (0.24, 3.24)   3.00
$\beta_{33}$    0.61    (0.17, 2.12)   1.95
$\beta_{34}$    2.21    (0.65, 7.55)   6.90
$\beta_{35}$    0.66    (0.14, 3.17)   3.03
$\beta_4$       2.28    (1.02, 5.09)   4.07

It is reasonable to assume that although we have done a sensitivity analysis we have not correctly specified the model in terms of $\tau$; thus from simulation study 2, we can infer that the bias associated with $\beta_1$ will range in magnitude from 0.05 to 0.15. Predicated on the results from simulation study 2, it is feasible that the true value may be in the range of -0.13 to -0.03, which would have ramifications for the substantive conclusions of this study.
Considering this on the odds-ratio scale, we would conclude that the MCEM and the complete-case approaches are similar enough to warrant the expenditure of resources in the acquisition of higher quality data, but a caveat exists in that it may be difficult to sufficiently shrink the confidence interval. If the expenditure of funds to obtain a non-significant finding was permissible within the research framework, then this would be a worthy venture. Otherwise, it would be more productive to relegate this investigation to a hypothesis generation category and use it to identify variables with greater potential.

There is much similarity in the conclusions that can be drawn when comparing the two missing data mechanisms. Both support the inclusion of $\beta_4$ in the model and both have moderate evidence for $\beta_2$. From an exploratory analysis point of view, it would be worth the expenditure of time and money to acquire better data surrounding these two covariates: age and residence on a reserve. Both models suggest that income is not a statistically significant predictor of the location of death for this population; thus there may be little interest in pursuing substantive questions pertaining to income for this population.

5.2 Conclusions

Although the estimate for $\beta_1$ was smaller under the NMAR assumption than under the MAR assumption, the conclusions about the significance of $\beta_1$ in the model are the same. The variation between the two approaches is more of a nuance in the sensibility of pursuing the acquisition of higher quality data to pursue the substantive question. A better role for the income covariate is to adjust for socio-economic status in other investigations. If we change the perspective of the study from a designed investigation to one of exploration, the MCEM adjustment appears to better define research opportunities.
For example, the MCEM adjustment strongly suggests that $\beta_2$ and $\beta_4$ would be good candidates for further investigation into their effect on the location of death.

We observed that for this study, the specification of $\tau$ appears to have less of an impact on the estimate of $\beta_1$ than the assumption about the missing data mechanism. Also, we observed little difference between the two missing data mechanisms. This suggests a degree of stability in the estimates even when the model is misspecified. Although we may not be able to identify the true model and compare the effect of misspecification, we were able to use results from a simulation for which the structure was identical in order to construct some plausible scenarios for the true parameter value. Finally, the shorter confidence intervals and the smaller standard errors associated with the MCEM approach suggest that we may have "kicked in" the asymptotic properties, which may result in the MCEM being more efficient.

Chapter 6

Conclusion

A commonality between missing data and mismeasurement is the inability to observe realizations from the desired or target random variable. Although the desired realizations may not be observed, something is observed through the course of the experiment: a realization or the absence of the realization. Recognizing this provides a means by which the two areas can be synthesized into a unified conceptual framework.
In the spirit of both sub-disciplines, the idea of an experimentally observable random variable, $X^E$, was introduced. The random vector of the target random variable and the experimentally observable random variable, $(X, X^E)$, was associated with each subject, as was a functional relationship which linked the two. With the use of indicator variables, a naive model was used to glue together aspects of missing data and mismeasurement to ensure that the more general problem of imperfection would not only have both missing data and mismeasurement as subsets but also provide a mechanism to characterize imperfections which do not fall neatly into one of the two types of data deficiencies. Although the introduction of the experimentally observable random variable appeared to be superfluous, it provided much needed notational simplicity when specifying the particulars of the Monte Carlo Expectation-Maximization algorithm needed to obtain the adjusted maximum likelihood estimates.

Two situations were considered. The first had two covariates, each affected by a single deficiency. The expected attenuation in the covariate suffering from mismeasurement was observed across various missing data mechanisms and specifications of $\tau$, but statistically significant bias was observed in the estimates associated with the covariate affected by missing data when the missing data mechanism was MAR. Also, an overly optimistic estimate of the variation in the data resulted. Furthermore, it appears as if only the covariate suffering from mismeasurement has problems with the attainment of the nominal coverage rate.

The second situation considered multivariable binary logistic regression as well, but there was one covariate which suffered from both problems. Across various missing data mechanisms we saw that the naive parameter estimates experienced attenuation and an overly optimistic estimation of the variation in the data.
The estimation of the standard deviation for the imperfect covariate was similar, for both MAR and NMAR assumptions, to the estimated standard deviation for the accurate covariates. This was a feature not seen when the two problems affected different covariates. As expected, the coverage rate for the parameter associated with the imperfect covariate failed to reach the nominal rate, but other parameter estimates were also affected.

In the first simulation, with two imperfect covariates each with a single deficiency, the MCEM approach attempts to mitigate the bias of $\beta_2$, associated with the mismeasured covariate, by striking a set of trade-offs with the other estimates. The algorithm appears to adjust first for the mismeasured covariate, then for the covariate suffering from missing data, and finally for the intercept. The adjustment appears to be making a trade-off between the overall fit of the model and the accuracy of the individual estimates. The specification of the missing data mechanism appears to have little impact on parameter estimation, but we do see that under-modelling the missing data mechanism is in general worse than over-modelling it. When the systematic component of the missing data mechanism is considered, we do see evidence that including the response in the systematic component does influence the quality of the estimates (Mechanism C). In this situation, the MCEM approach has a much more difficult time reducing the bias for all three parameter estimates and may sacrifice the accuracy of the estimates following the aforementioned priority. Finally, it was observed that the larger the sample size, the better the performance of the MCEM adjustment.

For the second simulation study, MCEM trends are clearer with a slight step back from the details buried in the many tables.
In general, the MCEM approach managed to reduce the bias associated with the parameter estimates, but this was not uniformly observed across all missing data mechanisms or missing data assumptions. An unexpected finding was that the relative efficiency of the MCEM approach was much better when both problems plagued a single covariate than when the problems affected separate covariates. We observe decreasing estimates of the standard deviation, MSE, and mean length of the confidence intervals as the sample size increases. Furthermore, we see that the MCEM approach becomes more efficient as the sample size increases. When we consider the issue of bias, the details can create much confusion, but by taking a broader look at the trend we can see the possibility that for all parameters the bias is reduced as the sample size increases. Finally, the misspecification of $\tau$ appears to have a minimal effect on parameter estimation.

Across both simulation studies, we see the effect of attenuation on the imperfect covariates. The exact “mix” of imperfections appears to affect how this manifests. We also see that when missing data and mismeasurement problems are not co-mingled in a single covariate, we may be able to rely on standard results from each area. The caveat is that we do observe nuances which suggest that standard knowledge may break down; thus it serves only as a guide and ceases to hold a definitive explanatory role about the effects.

The MCEM adjustment worked well in general and has many promising features that warrant further exploration of the method. In particular, the emerging large sample properties were very attractive, thus it is reasonable to conclude that with a more sensitive application of the MCEM algorithm, estimates with better properties may emerge. Although the MCEM adjustment does not mark a novel algorithmic approach, a novel perspective about data was required in order to make the exposition of the details less foreboding.
It was shown that standard approaches can work well in novel contexts, given the right perspective about the problem. This raises the question as to what is needed more: new algorithms or a better understanding of the basic problem?

From a substantive point of view, we see that there is utility in using this approach. Although current sample sizes are restricted by computational limitations, it is reasonable to use this method on a subset of data for the purpose of data exploration. Using simulation studies which emulate the structure of the data, it is possible to gain an understanding of how the biases may be working and of the sensibility of inferences. Furthermore, such insights will be a valuable tool for researchers who are in a position to seek out more costly data. By providing a reasonable guide as to which aspects of the data may be most profitable in terms of meaningful outcomes, research resources can be administered in a much more focused manner.

6.1 Future work

Given the small number of peer reviewed articles which consider the combined problem of missing data and mismeasurement, there is much work to be done. From a computational point of view, if the MCEM approach were to be retained, future work would need to involve a detailed investigation into the automation of the Monte Carlo sample size for each subject and at each iterative step of the EM algorithm. Also, methods for managing very large design matrices would need to be explored in order to permit sample sizes greater than n = 250.

Dempster et al. [22] indicate that computational efficiencies can be gained by understanding the structure of the likelihood. It may be worth pursuing a more analytic understanding of the likelihood in order to identify ridges or plateaux that may exist.
Furthermore, it would be of interest to consider methodologies which allow for ridges, but utilize a pulse mechanism to “eject” the algorithm away from the ridge and into a more profitable area of the likelihood.

A natural progression of this work would be to consider hierarchical models, the modelling of causal pathways, and time-varying covariates. Given that the motivating substantive problem involved cultural covariates, which themselves are considered to vary with time, it would be feasible to consider an integration of the imperfect variable framework with the Andersen model of health services [2], and Liu and Wu's work with time-varying covariates in the context of HIV trials [71]. Finally, any integration of hierarchical models and imperfect covariates would be an asset to substantive researchers utilizing geocoded aggregate data.

A final area of future research is to consider the basic problem of data imperfection and its impact. This is in the spirit of Gustafson's work [40], where the impact of measurement error and misclassification is considered before turning to corrective methodologies. Given the complexities of relationships that may arise with imperfection and the limited research on the topic, a more detailed look at the basic problem may be profitable.

Bibliography

[1] A. Agresti. Categorical Data Analysis. New York: Wiley, 1990.
[2] R M Andersen. Revisiting the behavioural model and access to medical care: Does it matter? Journal of Health and Social Behaviour, 36:1–10, 1995.
[3] Stuart G. Baker. A simple method for computing the observed information matrix when using the EM algorithm with categorical data. Journal of Computational and Graphical Statistics, 1(1):63–76, 1992.
[4] L E Baum and J A Eagon. An inequality with applications to statistical estimation for probabilistic functions of Markov processes and to a model for ecology. Bulletin of the American Mathematical Society, 73:360–363, 1967.
[5] L E Baum and T Petrie.
Statistical inference for probabilistic functions of finite Markov chains. Annals of Mathematical Statistics, 37:1554–1563, 1966.
[6] L E Baum, T Petrie, G Soules, and N Weiss. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Annals of Mathematical Statistics, 41:164–171, 1970.
[7] J G Booth and J P Hobert. Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo algorithm. Journal of the Royal Statistical Society. Series B, 61:265–285, 1999.
[8] S F Buck. A method of estimation of missing values in multivariate data suitable for use with an electronic computer. Journal of the Royal Statistical Society. Series B, 22:302–306, 1960.
[9] Frederick Burge, Beverley Lawson, and Grace Johnston. Trends in the place of death of cancer patients, 1992-1997. CMAJ, 168(3):265–270, February 2003.
[10] Andrea Burton, Douglas G. Altman, Patrick Royston, and Roger L. Holder. The design of simulation studies in medical statistics. Statistics in Medicine, 25:4279–4292, 2006.
[11] Statistics Canada. Postal Code Conversion File (PCCF): Reference Guide. Statistics Canada Catalogue no. 92F0153GIE. Ottawa, September 2006.
[12] Statistics Canada. 2006 Census Dictionary. Statistics Canada Catalogue no. 92-566-XWE. Ottawa, February 14, 2007.
[13] Raymond J. Carroll, Douglas Midthune, Laurence S. Freedman, and Victor Kipnis. Seemingly unrelated measurement error models, with application to nutritional epidemiology. Biometrics, 62:75–84, 2006.
[14] Raymond J. Carroll, David Ruppert, Leonard A. Stefanski, and Ciprian M. Crainiceanu. Measurement Error in Nonlinear Models: A Modern Perspective. Chapman & Hall/CRC, 2006.
[15] George Casella and Robert L. Berger. Statistical Inference. Duxbury, second edition, 2002.
[16] George Casella and Edward I. George.
Explaining the Gibbs sampler. The American Statistician, 46(3):167–174, 1992.
[17] Richard Chamberlayne, Bo Green, Morris L Barer, Clyde Hertzman, William J Lawrence, and Samuel B Sheps. Creating a population based linked health database: A new resource for health services research. Canadian Journal of Public Health, 89(4):270–273, 1998.
[18] K S Chan and J Ledolter. Monte Carlo estimation for time series models involving counts. Journal of the American Statistical Association, 90:242–252, 1995.
[19] H Y Chen and R J A Little. Proportional hazards regression with missing covariates. Journal of the American Statistical Association, 94:896–908, 1999.
[20] N E Day. Estimating the components of a mixture of normal distributions. Biometrika, 56:463–474, 1967.
[21] Hakan Demirtas. Simulation driven inferences for multiply imputed longitudinal datasets. Statistica Neerlandica, 58(4):466–482, 2004.
[22] A P Dempster, N M Laird, and D B Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39(1):1–38, 1977.
[23] Paul S. Dwyer. Some applications of matrix derivatives in multivariate analysis. Journal of the American Statistical Association, 62(318):607–625, June 1967.
[24] Bradley Efron. The two sample problem with censored data. In Proceedings of the 5th Berkeley Symposium of Mathematical Statistics and Probability, volume 4, pages 831–853. University of California Press, 1967.
[25] Bradley Efron and David V. Hinkley. Assessing the accuracy of the maximum likelihood estimator: Observed versus expected Fisher information. Biometrika, 65(3):457–482, 1978.
[26] Bradley Efron and Robert J. Tibshirani. An Introduction to the Bootstrap. Chapman & Hall/CRC, 1998.
[27] Julian J. Faraway. Extending the Linear Model with R. Chapman & Hall/CRC, 2006.
[28] R A Fisher. Theory of statistical estimation. Proceedings of the Cambridge Philosophical Society, 22:700–725, 1925.
[29] L W Fung.
Implementing the Patient Self-Determination Act (PSDA): How to effectively engage Chinese-American elderly persons in the decision of advance directives. Journal of Gerontological Social Work, 22(1/2):161–174, 1994.
[30] B Ganguli, J Staudenmayer, and M P Wand. Additive models with predictors subject to measurement error. Australian and New Zealand Journal of Statistics, 47:193–202, 2005.
[31] Alan E Gelfand and Adrian F M Smith. Sample based approaches to calculating marginal densities. Journal of the American Statistical Association, 85(410):398–409, 1990.
[32] Andrew Gelman, John B Carlin, Hal S Stern, and Donald B Rubin. Bayesian Data Analysis. Chapman & Hall/CRC, 2nd edition, 2004.
[33] Stuart Geman and Donald Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:721–741, 1984.
[34] A T Geronimus, J Bound, and L J Neidert. On the validity of using census geocode characteristics to proxy individual socioeconomic characteristics. Journal of the American Statistical Association, 91(434):529–537, 1996.
[35] W. R. Gilks and P. Wild. Adaptive rejection sampling for Gibbs sampling. Applied Statistics, 41(2):337–348, 1992.
[36] R J Glynn and N M Laird. Regression estimates and missing data: complete-case analysis. Technical report, Harvard School of Public Health, Department of Biostatistics, 1986.
[37] Sander Greenland. Ecological versus individual-level sources of bias in ecological estimates of contextual health effects. International Journal of Epidemiology, 30:1343–1350, 2001.
[38] Sander Greenland and William D Finkle. A critical look at methods for handling missing covariates in epidemiological regression analysis. American Journal of Epidemiology, 142(12):1255–1264, December 1995.
[39] D Gu, G Liu, D A Vlosky, and Z Yi. Factors associated with place of death among the Chinese oldest old. Journal of Applied Gerontology, 26(1):34–57, 2007.
[40] Paul Gustafson.
Measurement Error and Misclassification in Statistics and Epidemiology: Impacts and Bayesian Adjustments. Chapman & Hall/CRC, 2004.
[41] W K Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57:97–109, 1970.
[42] C Hsiao and Q K Wang. Estimation of structural nonlinear errors-in-variables models by simulated least-squares method. International Economic Review, 41(2):523–542, 2000.
[43] Joseph G. Ibrahim. Incomplete data in generalized linear models. Journal of the American Statistical Association, 85(411):765–769, Sept 1990.
[44] Joseph G. Ibrahim, Ming-Hui Chen, and Stuart R. Lipsitz. Monte Carlo EM for missing covariates in parametric regression models. Biometrics, 55(2):591–596, June 1999.
[45] Joseph G. Ibrahim, Ming-Hui Chen, and Stuart R. Lipsitz. Missing responses in generalised linear mixed models when the missing data mechanism is nonignorable. Biometrika, 88(2):551–564, June 2001.
[46] Joseph G. Ibrahim, Ming-Hui Chen, Stuart R. Lipsitz, and Amy H. Herring. Missing-data methods for generalized linear models: A comparative review. Journal of the American Statistical Association, 100(469):332–346, March 2005.
[47] Joseph G. Ibrahim and Stuart R. Lipsitz. Parameter estimation from incomplete data in binomial regression when the missing data mechanism is nonignorable. Biometrics, 52(3):1071–1078, Sept 1996.
[48] Joseph G. Ibrahim, Stuart R. Lipsitz, and Ming-Hui Chen. Missing covariates in generalized linear models when the missing data mechanism is non-ignorable. Journal of the Royal Statistical Society. Series B (Methodological), 61(1):173–190, 1999.
[49] M Jamshidian and R I Jennrich. Conjugate gradient acceleration of the EM algorithm. Journal of the American Statistical Association, 88:221–228, 1993.
[50] Daijin Ko. Estimation of the concentration parameter of the von Mises-Fisher distribution. The Annals of Statistics, 20(2):917–928, June 1992.
[51] Nancy Krieger.
Overcoming the absence of socioeconomic data in medical records: validation and application of a census-based methodology. American Journal of Public Health, 82(5):703–710, 1992.
[52] Nancy Krieger. Theories for social epidemiology in the 21st century: an ecosocial perspective. International Journal of Epidemiology, 30:668–677, 2001.
[53] Nancy Krieger. Choosing area based socioeconomic measures to monitor social inequalities in low birth weight and childhood lead poisoning: The Public Health Disparities Geocoding Project (US). Journal of Epidemiology and Community Health, 57:186–199, 2003.
[54] Nancy Krieger. Defining and investigating social disparities in cancer: critical issues. Cancer Causes and Control, 16:5–14, 2005.
[55] Nancy Krieger. Race/ethnicity and breast cancer estrogen receptor status: impact of class, missing data, and modeling assumptions. Cancer Causes and Control, 19:1305–1318, 2008.
[56] John M Lachin. Biostatistical Methods: The Assessment of Relative Risks. John Wiley and Sons, 2000.
[57] N M Laird. Nonparametric maximum likelihood estimation of a mixing distribution. Journal of the American Statistical Association, 73:805–811, 1978.
[58] K Lange. A gradient algorithm locally equivalent to the EM algorithm. Journal of the Royal Statistical Society. Series B, 57:425–437, 1995.
[59] K Lange. A quasi-Newton acceleration of the EM algorithm. Statistica Sinica, 5:1–18, 1995.
[60] D Lansky, George Casella, C E McCulloch, and D Lansky. Convergence and invariance properties of the EM algorithm. In American Statistical Association Proceedings of the Statistical Computing Section, pages 28–33. American Statistical Association, 1992.
[61] Richard A. Levine and George Casella. Implementations of the Monte Carlo EM algorithm.
Journal of Computational and Graphical Statistics, 10(3):422–439, Sept 2001.
[62] Richard A. Levine and Juanjuan Fan. An automated (Markov chain) Monte Carlo EM algorithm. Journal of Statistical Computation and Simulation, 74(5):349–360, May 2004.
[63] Hua Liang, Suojin Wang, and Raymond J. Carroll. Partially linear models with missing response variables and error-prone covariates. Biometrika, 94(1):185–198, 2007.
[64] Stuart R. Lipsitz and Joseph G. Ibrahim. A conditional model for incomplete covariates in parametric regression models. Biometrika, 83(4):916–922, Dec 1996.
[65] Stuart R. Lipsitz, Joseph G. Ibrahim, Ming-Hui Chen, and Harriet Peterson. Non-ignorable missing covariates in generalized linear models. Statistics in Medicine, 18:2435–2448, 1999.
[66] Stuart R. Lipsitz, Joseph G. Ibrahim, and Lue Ping Zhao. A weighted estimating equation for missing covariate data with properties similar to maximum likelihood. Journal of the American Statistical Association, 94(448):1147–1160, Dec 1999.
[67] Roderick J. A. Little and Donald B. Rubin. Statistical Analysis with Missing Data. John Wiley and Sons, New Jersey, 2002.
[68] Roderick J. A. Little and Mark D. Schluchter. Maximum likelihood estimation for mixed continuous and categorical data with missing values. Biometrika, 72(3):497–512, 1985.
[69] Chuanhai Liu, Donald B. Rubin, and Ying Nian Wu. Parameter expansion to accelerate EM: The PX-EM algorithm. Biometrika, 85(4):755–770, 1998.
[70] Thomas A. Louis. Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 44(2):226–233, 1982.
[71] Wei Liu and Lang Wu. A semiparametric nonlinear mixed-effects model with non-ignorable missing data and measurement errors for HIV viral data. Computational Statistics and Data Analysis, 52:112–122, 2008.
[72] M Malstrom, J Sundquist, and S E Johansson. Neighbourhood environment and self-rated health status: a multilevel analysis. American Journal of Public Health, 89(8):1181–1186, 1999.
[73] P.
McCullagh and J. A. Nelder. Generalized Linear Models. Chapman & Hall/CRC, 2nd edition, 1989.
[74] Geoffrey J. McLachlan and Thriyambakam Krishnan. The EM Algorithm and Extensions. John Wiley and Sons, 2008.
[75] I Meilijson. A fast improvement to the EM algorithm on its own terms. Journal of the Royal Statistical Society. Series B, 51:127–138, 1989.
[76] X L Meng and D B Rubin. Using EM to obtain asymptotic variance-covariance matrices: the SEM algorithm. Journal of the American Statistical Association, 86:899–909, 1991.
[77] Xiao-Li Meng and David van Dyk. The EM algorithm - an old folk-song sung to a fast new tune. Journal of the Royal Statistical Society. Series B (Methodological), 59(3):511–567, 1997.
[78] Nicholas Metropolis, Arianna W Rosenbluth, Marshall N Rosenbluth, Augusta H Teller, and Edward Teller. Equation of state calculations by fast computing machines. The Journal of Chemical Physics, 21(6):1087–1091, June 1953.
[79] Nicholas Metropolis and S Ulam. The Monte Carlo method. Journal of the American Statistical Association, 44(247):335–341, 1949.
[80] H. Morgenstern. Uses of ecological analysis in epidemiological research. American Journal of Public Health, 72(12):1336–1344, 1982.
[81] Simon Newcomb. A generalized theory of the combination of observations so as to obtain the best result. American Journal of Mathematics, 8(4):343–366, Aug 1886.
[82] David Oakes. Direct calculation of the information matrix via the EM algorithm. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 61(2):479–482, 1999.
[83] Terence Orchard and Max A Woodbury. A missing information principle: Theory and applications. In Proceedings of the 6th Berkeley Symposium of Mathematical Statistics and Probability, volume 1, pages 697–715. University of California Press, 1972.
[84] M S Pepe. Inference using surrogate outcome data and a validation sample. Biometrika, 79(2):355–365, 1992.
[85] Christian P. Robert and George Casella. Monte Carlo Statistical Methods.
Springer, second edition, 2004.
[86] J. M. Robins, A. Rotnitzky, and L. P. Zhao. Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association, 89:846–866, 1994.
[87] Paulino Perez Rodriguez. Adaptive Rejection Sampling, 2007.
[88] L L Roos, J Magoon, S Gupta, D Chateau, and P J Veugelers. Socioeconomic determinants of mortality in two Canadian provinces: Multilevel modelling and neighbourhood context. Social Science and Medicine, 59(7):1435–1447, 2004.
[89] A V Roux. The study of group-level factors in epidemiology: rethinking variables, study designs and analytic approaches. Epidemiologic Reviews, 26:104–111, 2004.
[90] Donald B. Rubin. Inference and missing data. Biometrika, 63(3):581–592, 1976.
[91] Donald B. Rubin. The analysis of transformed data: Comment. Journal of the American Statistical Association, 79(386):309–312, 1984.
[92] S Schwartz. The fallacy of the ecological fallacy: the potential misuse of a concept and its consequences. American Journal of Public Health, 84(5):819–824, 1994.
[93] M R Segal, P Bacchetti, and N P Jewell. Variance for maximum penalized likelihood estimates obtained via the EM algorithm. Journal of the Royal Statistical Society. Series B, 56:345–352, 1994.
[94] P A O Strickland and B F Crabtree. Modelling effectiveness of internally heterogeneous organizations in the presence of survey nonresponse: An application to the ULTRA study. Statistics in Medicine, 26(8):1702–1711, 2007.
[95] Amy L. Stubbendick and Joseph G. Ibrahim. Maximum likelihood methods for nonignorable missing responses and covariates in random effects models. Biometrics, 59:1140–1150, December 2003.
[96] R Sundberg. Maximum likelihood theory for incomplete data from an exponential family. Scandinavian Journal of Statistics: Theory and Applications, 1:49–58, 1974.
[97] R Sundberg. An iterative method for solution of the likelihood equations for incomplete data from exponential families.
Communications in Statistics - Simulation and Computation, 5:55–64, 1976.
[98] M Susser. The logic in ecological: I. The logic of analysis. American Journal of Public Health, 84(5):825–829, 1994.
[99] Lingqi Tang, Juwon Song, Thomas R. Belin, and Jurgen Unutzer. A comparison of imputation methods in a longitudinal randomized clinical trial. Statistics in Medicine, 24:2111–2128, 2005.
[100] S T Tang and R McCorkle. Determinants of place of death for terminal cancer patients. Cancer Invest., 19(2):165–180, 2001.
[101] C Thomas, S M Morris, and D Clark. Place of death preferences among cancer patients and their carers. Social Science and Medicine, 58(12):2431–2444, 2004.
[102] B W Turnbull. Nonparametric estimation of a survivorship function with doubly censored data. Journal of the American Statistical Association, 69:169–173, 1974.
[103] B W Turnbull. The empirical distribution with arbitrarily grouped, censored, and truncated data. Journal of the Royal Statistical Society. Series B, 38:290–295, 1976.
[104] Werner Vach and Maria Blettner. Bias estimation of the odds-ratio in case-control studies due to the use of ad hoc methods of correcting for missing values for confounding variables. American Journal of Epidemiology, 134(8):895–907, 1991.
[105] Werner Vach and Martin Schumacher. Logistic regression with incompletely observed categorical covariates: A comparison of three approaches. Biometrika, 80(2):353–362, 1993.
[106] C Y Wang, Yijian Huang, Edward C Chao, and Marjorie Jeffcoat. Expected estimating equations for missing data, measurement error, and misclassification, with application to longitudinal nonignorable missing data. Biometrics, 64:85–95, March 2008.
[107] C Y Wang and M S Pepe. Expected estimating equations to accommodate covariate measurement error. Journal of the Royal Statistical Society. Series B (Methodological), 62:509–524, 2000.
[108] Jing Wang. EM algorithms for nonlinear mixed effects models. Computational Statistics and Data Analysis, 51:3244–3256, 2007.
[109] G. C. Wei and M. A.
Tanner. A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms. Journal of the American Statistical Association, 85:699–704, 1990.
[110] G. C. Wei and M. A. Tanner. Posterior computations for censored regression data. Journal of the American Statistical Association, 85:829–839, 1990.
[111] P. Wild and W. R. Gilks. Algorithm AS 287: Adaptive rejection sampling from log-concave density functions. Applied Statistics, 42(4):701–709, 1993.
[112] D M Wilson, J C Anderson, R L Fainsinger, H C Northcott, S L Smith, and M J Stingi. Social and health care trends influencing palliative care and the location of death in twentieth-century Canada. Technical report, University of Alberta, 1998.
[113] C Wong. Personal communication with data analyst, March 2005.
[114] C F J Wu. On the convergence properties of the EM algorithm. Annals of Statistics, 11:95–103, 1983.

Appendix A
British Columbia Vital Statistics location of death

All locations of death are coded on the death certificate by the BCVS. In 2000, the place of death codes changed from following the International Classification of Diseases (ICD) version 9 to version 10. Coding for deaths at home (also called place of usual residence) remained the same across the change, but the codes for other locations changed. The hospital code was 7 prior to 2000 and 2 after. It is clear that these categories represent general locations of death based on some similarity across the locations.
The place of usual residence, called home, is its own distinct category, but comparison of this location with others becomes problematic due to the composite nature of the other categories.

Prior to 2000, the location of death codes as reported on the death certificate are
• (0) HOME: includes apartment, boarding house, caravan (trailer) park, farmhouse, home premises, house (residential), non-institutional place of residence, private driveway to home, garage, garden of home, swimming pool in private house or garden
• (1) RESIDENTIAL INSTITUTION: includes children's home, dormitory, home for the sick, hospice, military camp, nursing home, old people's home, orphanage, pensioners' home, prison, reform school
• (2) SCHOOL, OTHER INSTITUTION, PUBLIC ADMINISTRATIVE AREA: building and adjacent grounds used by the general public or by a particular group of the public, which includes hospital, assembly hall, campus, church, cinema, institute for higher education, court-house, day nursery, post-office, public hall, school, library, youth centre (the vast majority of deaths with this place code are hospital deaths)
• (3) SPORTS AND ATHLETIC AREA: includes baseball field, basketball court, football field, stadium, public swimming pool, etc.
(fairly self explanatory)
• (4) STREET AND HIGHWAY
• (5) TRADE AND SERVICE AREA: includes airport, bank, cafe, hotel/motel, office buildings, public transit stations, railway stations, shops, malls, service station, commercial garage, radio/television stations
• (6) INDUSTRIAL AND CONSTRUCTION AREA: includes any building under construction, dockyard, factory premises, industrial yard, mine, oil rig and other offshore installations, gravel/sand pit, power station, shipyard, tunnel under construction, workshop, commercial garage
• (7) FARM: includes farm buildings, land under cultivation, ranch
• (8) OTHER SPECIFIED PLACES: including but not limited to beach, campsite, derelict house, wilderness areas and natural bodies of water (mountain, forest, lake, marsh, river, sea, seashore, etc.), public park, parking lots, railway line, zoo, water reservoir, unspecified public place
• (9) UNSPECIFIED PLACE

After 2000, the location of death codes used for the British Columbia death certificates are
• (0) HOME: includes apartment, boarding house, caravan (trailer) park, farmhouse, home premises, house (residential), non-institutional place of residence, private driveway to home, garage, garden of home, swimming pool in private house or garden
• (1) FARM: includes farm buildings and land under cultivation.
Excludes farm house and home premises of farm.
• (2) MINE AND QUARRY: including gravel and sand pits, tunnels under construction
• (3) INDUSTRIAL PLACE AND PREMISES: includes any building under construction, dockyard, factory premises, industrial yard, mine, oil rig and other offshore installations, gravel/sand pit, power station, shipyard, tunnel under construction, workshop, commercial garage
• (4) PLACE FOR RECREATION AND SPORT: includes lake and mountain resorts, vacation resorts, public parks, playgrounds, baseball field, basketball court, football field, stadium, public swimming pool, etc.
• (5) STREET AND HIGHWAY
• (6) PUBLIC BUILDING: includes assembly hall, campus, church, cinema, institute for higher education, court-house, day nursery, post-office, public hall, school, library, youth centre, airport, bank, cafe, hotel, office buildings, public transit stations, shops, malls, commercial parking garage, clinic
• (7) RESIDENTIAL INSTITUTION: includes hospitals, children's home, dormitory, home for the sick, hospice, military camp, nursing home, old people's home, orphanage, pensioners' home, prison, reform school
• (8) OTHER SPECIFIED PLACES: including but not limited to beach, campsite, derelict house, wilderness areas and natural bodies of water (mountain, forest, lake, marsh, river, sea, seashore, etc.), public park, parking lots, railway line, zoo, water reservoir
• (9) UNSPECIFIED PLACE

Appendix B
Supplemental material for the methodological development

This appendix contains material which is either background information not directly necessary for the main expository flow of the thesis, or methodological development which, if included in the main body of the thesis, would distract from the narrative points being made.

B.1 Adaptive Rejection Sampling

Gilks and Wild [35] proposed a method for rejection sampling from any univariate log-concave probability density function. Details for rejection sampling, adaptive rejection sampling and log-concavity follow.
Intended to be a black-box technique for sampling from any univariate log-concave distribution, an upper hull, called the rejection envelope, and a lower hull, called the squeezing function, are constructed about the target distribution. As sampling proceeds, the rejection envelope and squeezing function converge to the target distribution.

The formal goal is to generate a sample of independent realizations from a density, $f(x)$, which only needs to be known up to a constant of proportionality through another function $g(x)$, that is $g(x) = c f(x)$. This is particularly useful when $c = \int_V g(x)\,dx$ is not known in closed form, where $V$ is the domain of $f(x)$, for which $f(x) > 0$ for all $x \in V$.

B.1.1 Rejection sampling: Envelope Accept-Reject

One approach, called Envelope Accept-Reject, is to construct an envelope about $g(x)$ consisting of two functions and then use this envelope to determine if a sampled point is from the function under consideration, $f(x)$. First, define an envelope function $g_u(x)$ such that

$$g_u(x) \ge g(x) \quad \forall x \in V$$

and a squeezing function $g_l(x)$ such that

$$g_l(x) \le g(x) \quad \forall x \in V.$$

Now, sample an observation, $x^*$, from the density proportional to $g_u(x)$ and independently sample an observation, $w$, from $U(0, 1)$. If we have defined $g_l(x)$ then we can perform the squeeze test,

if $w \le g_l(x^*) / g_u(x^*)$ then accept $x^*$,

otherwise perform the following rejection test,

if $w \le g(x^*) / g_u(x^*)$ then accept $x^*$,

otherwise reject $x^*$. This is repeated until a sample of $m$ points is obtained.

B.1.2 Adaptive Rejection Sampling

Frequently, only one sample is required when using Gibbs sampling, but often we need one sample from a large number of different probability densities.
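As a reference point for the adaptive scheme, the envelope accept-reject test of Section B.1.1 can be sketched in Python. This is only an illustration under stated assumptions, not code from the thesis: the function name `envelope_accept_reject`, the unnormalized standard normal target, the Laplace-shaped envelope, and the quadratic squeezing function are all hypothetical choices made so that $g_l(x) \le g(x) \le g_u(x)$ holds on the whole real line.

```python
import numpy as np

def envelope_accept_reject(g, g_upper, g_lower, draw_from_upper, m, rng):
    """Return m accepted draws and the number of times g itself was evaluated."""
    samples, n_evals = [], 0
    while len(samples) < m:
        x_star = draw_from_upper(rng)   # candidate from the density proportional to g_upper
        w = rng.random()                # independent U(0, 1) draw
        if w <= g_lower(x_star) / g_upper(x_star):
            samples.append(x_star)      # squeeze test passed, g is not evaluated
            continue
        n_evals += 1
        if w <= g(x_star) / g_upper(x_star):
            samples.append(x_star)      # rejection test passed
    return np.array(samples), n_evals

# Hypothetical example: unnormalized standard normal target g(x) = exp(-x^2 / 2).
g = lambda x: np.exp(-0.5 * x * x)
# Envelope: exp(0.5 - |x|) >= exp(-x^2 / 2) because (|x| - 1)^2 >= 0.
g_upper = lambda x: np.exp(0.5 - abs(x))
# Squeezing function: max(0, 1 - x^2 / 2) <= exp(-x^2 / 2) because exp(-y) >= 1 - y.
g_lower = lambda x: max(0.0, 1.0 - 0.5 * x * x)
# The normalized envelope is the Laplace(0, 1) density.
draw_from_upper = lambda rng: rng.laplace(0.0, 1.0)

draws, n_evals = envelope_accept_reject(
    g, g_upper, g_lower, draw_from_upper, m=5000, rng=np.random.default_rng(0)
)
print(draws.mean(), draws.std(), n_evals)
```

The count `n_evals` makes the role of the squeezing function visible: every candidate that passes the squeeze test is accepted without evaluating $g$, which is the saving that the adaptive scheme tries to increase.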
Adaptive rejection sampling (ARS) reduces the number of evaluations required to obtain the desired sample size by
• assuming log-concavity, which avoids the need to find the supremum of the function, and
• reducing the probability of needing to evaluate $g(x)$ further because, after each rejection, the envelope and squeezing functions are updated to incorporate the new information obtained from the rejected observation.

Along with the log-concavity assumption of the function, adaptive rejection sampling requires the additional assumptions that
• the domain $V$ of the function is connected, that is, it cannot be represented as the disjoint union of two or more non-empty open subsets,
• $g(x)$ is continuous and differentiable everywhere in $V$, and
• $h(x) = \log g(x)$ is concave everywhere in $V$.

Given that we are working in a standard Cartesian coordinate system, the $x$ coordinates are termed the abscissae. In order to construct the upper and lower hulls, which are then squeezed towards the target function, a set of $k$ abscissae is selected where $x_1 < x_2 < \cdots < x_k$. We will retain the notation of Gilks and Wild [35] and denote the set of abscissae $T_k = \{x_i : i = 1, \ldots, k\}$, where the subscript $k$ denotes the cardinality of the set $T$. The function $h(x)$ and its derivative $h'(x)$ are evaluated for $x_i \in T_k$, where $i = 1, \ldots, k$ and $T_k \subset V$.

Now we construct the upper hull, known as the rejection envelope, on $T_k$. Define the rejection envelope as $\exp u_k(x)$, where $u_k(x)$ is a piecewise linear upper hull formed by the tangents to $h(x)$ at $x_i \in T_k$ for $i = 1, \ldots, k$. Notice that for each update of the abscissae, $T_k$ changes its cardinality by the introduction of the point that was rejected. This in turn results in the construction of a new upper hull, thus the upper hull function $u(x)$ is indexed by the cardinality of the abscissae, $u_k(x)$.

The upper hull, constructed from the tangent lines, is easy to construct using the point-slope form of a line. Given the function $h(x)$ we find the derivative $h'(x)$.
Furthermore, the tangent line, $u_k(x)$, and $h(x)$ share the point $(x_i, h(x_i))$ for $i = 1, \dots, k$. Finally we have the point-slope form of the tangent line associated with $x_i \in T_k$,
\[ u_k(x) = h(x_i) + (x - x_i)\,h'(x_i). \]
The $k - 1$ intersections between the $x_i$ and the $x_{i+1}$ tangent lines, denoted $z_i$, are
\[ z_i = \frac{h(x_{i+1}) - h(x_i) - x_{i+1}h'(x_{i+1}) + x_i h'(x_i)}{h'(x_i) - h'(x_{i+1})}. \]
Now with the construction of the upper hull, we normalize $\exp u_k(x)$ to construct a density from the hull envelope function, thus
\[ s_k(x) = \frac{\exp u_k(x)}{\int_{\mathcal{D}} \exp u_k(x')\,dx'}. \]
The squeezing function, $l_k(x)$, is constructed in a similar manner to that of the envelope function, except that we use the $k - 1$ chords (secants) between $x_i, x_{i+1} \in T_k$. Notice that, as with the envelope function, the squeezing function is also indexed by the cardinality of the abscissae. The slope of the secant between the abscissae $x_i$ and $x_{i+1}$ is
\[ \frac{h(x_{i+1}) - h(x_i)}{x_{i+1} - x_i} \]
and with the point-slope form of a line we have
\[ l_k(x) - h(x_i) = \frac{h(x_{i+1}) - h(x_i)}{x_{i+1} - x_i}\,(x - x_i), \]
\[ l_k(x) = \frac{h(x_{i+1})(x - x_i) - h(x_i)(x - x_i) + h(x_i)(x_{i+1} - x_i)}{x_{i+1} - x_i} = \frac{h(x_i)(x_{i+1} - x) + h(x_{i+1})(x - x_i)}{x_{i+1} - x_i}. \]
Under the assumption of concavity of $h(x)$ we ensure that $l_k(x) \le h(x) \le u_k(x)$ for all $x \in \mathcal{D}$. The ARS algorithm, utilizing these functions, involves three steps: initialization, sampling, and updating [35, 111].

B.1.2.1 Initialization

We initialize the abscissae, $T_k$, by choosing a set of points in the domain of $f(x)$. If $\mathcal{D}$ is unbounded from below, choose $x_1$ such that $h'(x_1) > 0$, and if $\mathcal{D}$ is unbounded from above, then choose $x_k$ such that $h'(x_k) < 0$. Calculate the functions $u_k(x_i)$, $s_k(x_i)$, and $l_k(x_i)$ for $x_i \in T_k$ where $i = 1, \dots, k$.

B.1.2.2 Sampling

First sample an observation, $x^*$, from the distribution constructed from the upper hull, $s_k(x)$, then independently sample an observation, $w$, from the uniform distribution, $U(0, 1)$. Now perform the squeeze test,
\[ \text{if } w \le \exp\{l_k(x^*) - u_k(x^*)\} \text{ then accept } x^*, \]
otherwise evaluate $h(x^*)$ and $h'(x^*)$ and perform the rejection test,
\[ \text{if } w \le \exp\{h(x^*) - u_k(x^*)\} \text{ then accept } x^*, \]
otherwise reject $x^*$.

B.1.2.3 Updating

If the functions $h(x^*)$ and $h'(x^*)$ are evaluated, then include the abscissa $x^*$ in $T_k$, thus we have
\[ T_{k+1} = \{x_1, \dots, x_k, x^*\}. \]
The abscissae in $T_{k+1}$ are ordered and the functions $u_{k+1}(x)$, $s_{k+1}(x)$ and $l_{k+1}(x)$ are evaluated with the new abscissae. If a sample of size $m$ is not yet obtained, then return to the sampling step.

B.1.3 Application to Gibbs sampling

With Gibbs sampling, we have a sequence of full conditional distributions from which samples are drawn. Adaptive rejection sampling becomes the process by which samples are drawn from the full conditional distributions as specified by the Gibbs sampler. Here, we require that the full conditional distribution is proportional to some function $g(\cdot)$. Unless the full conditional distribution is expressible in terms of conjugate distributions, it will not correspond to a common distribution and it will not be possible to obtain a closed form for the proportionality constant. Additionally, since the full conditional distribution will commonly be constructed from the product of many terms, it may be computationally expensive to sample from it, hence the utility of the adaptive rejection sampling method for the Gibbs sampler [35].

B.1.4 Concavity of the likelihood

Although the adaptive rejection sampling algorithm requires several assumptions, the one of particular concern is the assumption of log-concave functions. A log-concave function is simply a function that is concave on the logarithmic scale. Using Gilks and Wild's [35] notation, recall that $h(x) = \log g(x)$, thus $g(x)$ is log-concave if $h(x)$ is concave. For differentiable functions, $h(x)$ is concave if the derivative $h'(x) = \frac{d}{dx}h(x)$ decreases monotonically with increasing $x \in \mathcal{D}$. Furthermore, this can be modified to consider functions over a specified interval, thus a differentiable function is concave on an interval $\mathcal{D}^*$ if $h'(x)$ is monotonically decreasing on that interval. This translates to the familiar concept that a concave function has a negative second derivative, $f''(x) < 0$. This situation is known as a strictly concave function.
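To make the role of concavity concrete, the initialization, sampling, and updating steps of section B.1.2 can be sketched in Python. This is a minimal sketch under stated assumptions, not the implementation used in the thesis: the domain is taken to be the whole real line, $h$ is assumed strictly concave, no tangent is assumed to have slope exactly zero, and the function name `ars` and the standard normal example are hypothetical choices for illustration.

```python
import bisect
import numpy as np

def ars(h, dh, init, m, rng):
    """Draw m values from the density proportional to exp(h) on the real line."""
    xs = sorted(init)                      # abscissae T_k, kept ordered
    hs = [h(x) for x in xs]
    ds = [dh(x) for x in xs]
    assert ds[0] > 0 > ds[-1], "need h'(x_1) > 0 and h'(x_k) < 0"
    draws = []
    while len(draws) < m:
        x, hx, d = np.array(xs), np.array(hs), np.array(ds)
        # intersections z_i of neighbouring tangent lines
        z = (hx[1:] - hx[:-1] - x[1:] * d[1:] + x[:-1] * d[:-1]) / (d[:-1] - d[1:])
        lo = np.concatenate(([-np.inf], z))
        hi = np.concatenate((z, [np.inf]))
        e_lo = np.exp(hx + (lo - x) * d)   # exp(u_k) at the left end of each piece
        e_hi = np.exp(hx + (hi - x) * d)   # exp(u_k) at the right end of each piece
        mass = (e_hi - e_lo) / d           # integral of exp(u_k) over each piece
        i = rng.choice(len(x), p=mass / mass.sum())
        # inverse-CDF draw within piece i of the normalised upper hull s_k
        u_star = np.log(e_lo[i] + rng.uniform() * (e_hi[i] - e_lo[i]))
        x_star = x[i] + (u_star - hx[i]) / d[i]
        # lower hull l_k(x*): chord between neighbours, -inf outside [x_1, x_k]
        j = bisect.bisect_right(xs, x_star) - 1
        if 0 <= j < len(xs) - 1:
            l_star = (hs[j] * (xs[j + 1] - x_star) + hs[j + 1] * (x_star - xs[j])) / (xs[j + 1] - xs[j])
        else:
            l_star = -np.inf
        w = rng.uniform()
        if w <= np.exp(l_star - u_star):   # squeeze test: h is not evaluated
            draws.append(x_star)
            continue
        h_star, d_star = h(x_star), dh(x_star)
        if w <= np.exp(h_star - u_star):   # rejection test
            draws.append(x_star)
        pos = bisect.bisect_left(xs, x_star)   # updating: T_{k+1} includes x*
        xs.insert(pos, x_star)
        hs.insert(pos, h_star)
        ds.insert(pos, d_star)
    return np.array(draws)

# Example: standard normal, h(x) = -x^2 / 2, starting from two abscissae.
rng = np.random.default_rng(1)
draws = ars(lambda t: -0.5 * t * t, lambda t: -t, [-1.0, 1.0], 3000, rng)
```

The sketch rebuilds both hulls on every pass for readability; an implementation aimed at speed would update only the pieces adjacent to the new abscissa.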
Gilks and Wild consider concave functions in general and relax the condition to include straight lines, thus we have $f''(x) \le 0$. More formally, if $h(x)$ is twice differentiable, then $h(x)$ is concave if and only if $h''(x) \le 0$. A desirable property of concave functions is that the sum of concave functions is itself concave, thus if each component of the likelihood is log-concave, then the likelihood itself is log-concave.

Gilks and Wild [35] provide a list of common probability distributions and their log-concavity properties. Table B.1 reproduces the concavity features for those distributions considered in this document.

Table B.1: Log-concave properties for probability distributions found in this document [35]

p(x) | Parameters | log p(x) concave with respect to | log p(x) not concave with respect to
Normal | Mean $\mu$, Variance $\sigma^2$ | $x$, $\mu$, $\log\sigma$ | $\sigma$
Bernoulli | Proportion $p$ | $p$, $\mathrm{logit}(p)$ |
Binomial | Proportion $p$ | $p$, $\mathrm{logit}(p)$ |

B.1.5 Concavity of the binary logistic regression model

Here, we will show that the exponential family for generalized linear models is log-concave in terms of the covariates. Specifically, for adaptive rejection Gibbs sampling, we need to generate a sample based on the full conditional of the $x$ variable, thus the response model and the model for the mechanism of imperfection need to be log-concave in terms of $x$.

Recall from equation 2.5.11 that the form of the exponential family is
\[ p(Y = y \mid \theta, \phi) = \exp\left\{\frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi)\right\} \]
with the log-likelihood, in terms of $\theta$, given by
\[ l(\theta \mid y, \phi) = \frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi) \]
and for our context, the log-likelihood is
\[ l(\theta_i \mid y_i, x_i, \phi) = \frac{y_i\theta_i - b(\theta_i)}{a(\phi)} + c(y_i, \phi) \]
where the canonical parameter is a function of the data, $\theta_i = g(\beta, x_i)$.

Recognizing that our outcome is distributed as a Bernoulli random variable, we will assume a binary logistic regression form of the generalized linear model. Recall that we assumed that the model is not over- or under-dispersed ($\phi = 1$), which resulted in $a(\phi) = 1$.
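Furthermore, the canonical link was assumed for the model, thus
\[ \theta = \log\left(\frac{\pi}{1 - \pi}\right) \]
where $\pi$ is the probability of success as given in section 2.5.5 and $\theta = \eta$. For the $i$th subject this relationship is given as
\[ \eta_i = x_i^T\beta = \sum_{j=0}^{p} x_{ij}\beta_j \]
where $x_{i0} = 1$ for all $i = 1, \dots, n$ and $\beta_0$ is the intercept. The function $b(\theta)$ for binary logistic regression is
\[ b(\theta_i) = -\log(1 - \pi_i) = \log(1 + e^{\theta_i}) \]
and
\[ c(y_i, \phi) = \log\binom{1}{y_i} \]
since $m = 1$ for Bernoulli random variables and $y = 0, 1$.

The concavity of a Bernoulli random variable under the generalized linear model is required, thus we want to determine whether $\frac{\partial^2}{\partial x_{ij}^2} l(\theta_i \mid y_i, x_i, \phi)$ is negative for all $x \in \mathcal{D}$. Obtaining $\frac{\partial^2}{\partial x_{ij}^2} l(\theta_i \mid y_i, x_i, \phi)$ requires the application of the chain rule,
\[ \frac{\partial l}{\partial x_{ij}} = \frac{\partial l}{\partial \theta_i}\,\frac{\partial \theta_i}{\partial \pi_i}\,\frac{\partial \pi_i}{\partial \eta_i}\,\frac{\partial \eta_i}{\partial x_{ij}} \]
where for the $i$th subject we have
\[ \frac{\partial l}{\partial \theta_i} = y_i - b'(\theta_i) \quad \text{where} \quad b'(\theta_i) = \frac{e^{\theta_i}}{1 + e^{\theta_i}}, \]
\[ \frac{\partial \theta_i}{\partial \pi_i} = \frac{1}{\pi_i(1 - \pi_i)}, \]
and, with $\pi_i = \frac{e^{\eta_i}}{1 + e^{\eta_i}}$,
\[ \frac{\partial \pi_i}{\partial \eta_i} = \frac{e^{\eta_i}}{(1 + e^{\eta_i})^2} = \pi_i(1 - \pi_i), \quad \text{and} \quad \frac{\partial \eta_i}{\partial x_{ij}} = \beta_j, \]
thus
\[ \frac{\partial l}{\partial x_{ij}} = \left[y_i - \frac{e^{\theta_i}}{1 + e^{\theta_i}}\right]\beta_j \]
for a binary logistic regression model. Applying the chain rule again for the second derivative yields
\[ \frac{\partial^2 l}{\partial x_{ij}^2} = -b''(\theta_i)\,\beta_j^2 \quad \text{where} \quad b''(\theta_i) = \frac{e^{\theta_i}}{(1 + e^{\theta_i})^2}, \]
thus
\[ \frac{\partial^2 l}{\partial x_{ij}^2} = -\frac{e^{\theta_i}}{(1 + e^{\theta_i})^2}\,\beta_j^2 = -\pi_i(1 - \pi_i)\,\beta_j^2. \]
Since $\pi_i(1 - \pi_i) > 0$, this second derivative is never positive, so it is clear that for $x_{ij}$, $i = 1, \dots, n$ and $j = 1, \dots, p$, the binary logistic model is log-concave.

B.2 EM maximization details

B.2.1 Component wise maximization

Recall that the maximization of $Q(\xi \mid \xi^{(t)})$ reduces to the maximization of each component of $Q(\xi \mid \xi^{(t)})$. Although this is intuitively satisfying, let us consider a more detailed verification of this proposition.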
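As a brief numerical aside on the result of section B.1.5, before turning to that verification: the closed form $-\pi_i(1 - \pi_i)\beta_j^2$ can be compared with a finite-difference second derivative of the Bernoulli log-likelihood in $x_{ij}$. The coefficient and covariate values below are arbitrary illustrative numbers, and the function names are hypothetical.

```python
import numpy as np

def loglik(x, y, beta):
    """Bernoulli log-likelihood with canonical (logit) link for one subject."""
    eta = x @ beta
    return y * eta - np.log1p(np.exp(eta))   # y*theta - b(theta), with c(y, phi) = 0

def d2_loglik_dx2(x, beta, j):
    """Closed form from section B.1.5: -pi (1 - pi) beta_j^2."""
    pi = 1.0 / (1.0 + np.exp(-(x @ beta)))
    return -pi * (1.0 - pi) * beta[j] ** 2

# Arbitrary illustrative values; x[0] = 1 plays the role of the intercept column.
beta = np.array([-0.5, 1.2, -0.8])
x = np.array([1.0, 0.3, -1.1])
y, j, step = 1.0, 1, 1e-3

# Central second difference in the j-th covariate.
e_j = np.zeros_like(x)
e_j[j] = step
numeric = (loglik(x + e_j, y, beta) - 2.0 * loglik(x, y, beta) + loglik(x - e_j, y, beta)) / step**2
analytic = d2_loglik_dx2(x, beta, j)
```

The second derivative does not depend on $y_i$, which is why the same curvature bound holds for successes and failures alike.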
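The basic requirement of the EM algorithm is that for each step of the algorithm, the expectation of the complete log-likelihood increases (Equation 3.3.5), that is
\[ Q(\xi^{(t+1)} \mid \xi^{(t)}) \ge Q(\xi^{(t)} \mid \xi^{(t)}). \]
To verify that the maximization of each component of equation 3.4.5 applied to equation 3.3.12 results in the maximization of $Q(\xi \mid \xi^{(t)})$, we will begin with the assumption that each component of equation 3.4.5 can be maximized, thus
\[ \beta^{(t+1)} = \arg\max_{\beta} Q(\beta \mid \beta^{(t)}), \quad \gamma^{(t+1)} = \arg\max_{\gamma} Q(\gamma \mid \gamma^{(t)}), \quad \text{and} \quad \psi^{(t+1)} = \arg\max_{\psi} Q(\psi \mid \psi^{(t)}). \]
Without loss of generality, begin with $Q(\beta \mid \beta^{(t)})$. Given that $\beta^{(t+1)}$ maximizes $Q(\beta \mid \beta^{(t)})$, then
\[ Q(\beta^{(t+1)} \mid \beta^{(t)}) \ge Q(\beta^{(t)} \mid \beta^{(t)}). \]
From this we have
\[ Q(\beta^{(t+1)} \mid \beta^{(t)}) + Q(\gamma^{(t+1)} \mid \gamma^{(t)}) \ge Q(\beta^{(t)} \mid \beta^{(t)}) + Q(\gamma^{(t+1)} \mid \gamma^{(t)}) \ge Q(\beta^{(t)} \mid \beta^{(t)}) + Q(\gamma^{(t)} \mid \gamma^{(t)}) \]
since
\[ Q(\gamma^{(t+1)} \mid \gamma^{(t)}) \ge Q(\gamma^{(t)} \mid \gamma^{(t)}). \]
Finally, since
\[ Q(\psi^{(t+1)} \mid \psi^{(t)}) \ge Q(\psi^{(t)} \mid \psi^{(t)}) \]
we have
\[ Q(\xi^{(t+1)} \mid \xi^{(t)}) \ge Q(\xi^{(t)} \mid \xi^{(t)}) \]
which is the requirement for the EM algorithm.

It is clear that the assumption that each probability distribution is uniquely parameterized, and that each density can be maximized in terms of its unique parameter, allows a complex maximization problem to be partitioned into a series of smaller and potentially easier ones. Furthermore, it is recognized that $Q(\gamma \mid \gamma^{(t)})$ and $Q(\psi \mid \psi^{(t)})$ are themselves sums of uniquely parameterized distributions. In this regard, proving that if we can maximize each of these distributions in terms of their unique parameters then we maximize $Q(\xi \mid \xi^{(t)})$ involves a more elaborate, but identical in spirit, version of that which has been given above.

B.2.2 Score and Hessian for the binary logistic regression model

Beginning with the binary regression model given in section B.1.5, and the response model as specified in equation 2.5.4, we write the log-likelihood of the response model as $l(\beta \mid x_i, y_i, \phi)$. With the application of the chain rule, as given in B.1.5, we can obtain the score function. We have
\[ \frac{\partial}{\partial \beta_j} l(\beta \mid x_i, y_i, \phi) = \frac{\partial}{\partial \beta_j} l(\beta \mid x_i, y_i) = \frac{\partial l(\beta \mid x_i, y_i)}{\partial \theta_i}\,\frac{\partial \theta_i}{\partial \pi_i}\,\frac{\partial \pi_i}{\partial \eta_i}\,\frac{\partial \eta_i}{\partial \beta_j} \]
where $\eta_i$ is ultimately a function of $\beta_j$. Since we are dealing with a binary logistic regression model as in section B.1.5, it is self-evident that the product of the middle two terms equals 1.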
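Additionally, the first partial derivative yields the same form as in section B.1.5, thus the only new part is the final partial derivative,
\[ \frac{\partial \eta_i}{\partial \beta_j} = x_{ij}. \]
Pulling all the pieces together yields,
\[ \frac{\partial}{\partial \beta_j} l(\beta \mid x, y) = \sum_{i=1}^{n}\left[y_i - \frac{e^{\theta_i}}{1 + e^{\theta_i}}\right] x_{ij} \]
which is the $j$th component of the score function.

The $jk$th component of the Hessian matrix has the second and third components of the chain rule identical to those given in section B.1.5, which cancel, thus we are left with
\[ H_{jk} = \frac{\partial^2 l}{\partial \beta_j\,\partial \beta_k} = \sum_{i=1}^{n} \frac{\partial}{\partial \beta_k}\left[y_i - b'(\theta_i)\right] x_{ij} = -\sum_{i=1}^{n} b''(\theta_i)\,x_{ij}x_{ik} \]
and from section B.1.5 we have
\[ b''(\theta_i) = \frac{e^{\theta_i}}{(1 + e^{\theta_i})^2} \]
thus
\[ H_{jk} = -\sum_{i=1}^{n} \frac{e^{\theta_i}}{(1 + e^{\theta_i})^2}\,x_{ij}x_{ik}. \]

B.3 Details for Louis standard errors for the EM algorithm

B.3.1 Derivation of the observed score

Here we present the derivation of equation 3.5.1. For the observed data, we have the log-likelihood
\[ l^*(\xi \mid x_i^{P}, r_i, y_i) = \log p(X_i^{P}, R_i, Y_i \mid \xi) \]
and the score is given by the first derivative of the log-likelihood,
\[ S^*(\xi \mid x_i^{P}, r_i, y_i) = \frac{\partial}{\partial \xi} l^*(\xi \mid x_i^{P}, r_i, y_i) = \frac{p'(X_i^{P}, R_i, Y_i \mid \xi)}{p(X_i^{P}, R_i, Y_i \mid \xi)} = \frac{\frac{\partial}{\partial \xi}\int\!\cdots\!\int p(X_i, Y_i \mid \xi)\,dx_i^{U}\,dx_i^{E}}{\int\!\cdots\!\int p(X_i, Y_i \mid \xi)\,dx_i^{U}\,dx_i^{E}} \]
and under the regularity conditions required for differentiation under the integral sign we have
\[ S^*(\xi \mid x_i^{P}, r_i, y_i) = \frac{\int\!\cdots\!\int p'(X_i, Y_i \mid \xi)\,dx_i^{U}\,dx_i^{E}}{\int\!\cdots\!\int p(X_i, Y_i \mid \xi)\,dx_i^{U}\,dx_i^{E}} = \frac{\int\!\cdots\!\int \frac{p'(X_i, Y_i \mid \xi)}{p(X_i, Y_i \mid \xi)}\,p(X_i, Y_i \mid \xi)\,dx_i^{U}\,dx_i^{E}}{\int\!\cdots\!\int p(X_i, Y_i \mid \xi)\,dx_i^{U}\,dx_i^{E}} \]
\[ = \int\!\cdots\!\int S(\xi \mid x_i, y_i)\,\frac{p(X_i, Y_i \mid \xi)}{\int\!\cdots\!\int p(X_i, Y_i \mid \xi)\,dx_i^{U}\,dx_i^{E}}\,dx_i^{U}\,dx_i^{E} = E\left[S(\xi \mid x_i, y_i) \mid x_i^{P}\right] \]
so
\[ S^*(\xi \mid x_i^{P}, r_i, y_i) = E\left[S(\xi \mid x_i, y_i) \mid x_i^{P}\right]. \]
Furthermore, the fixed point, $\hat{\xi}$, is the solution to
\[ S^*(\hat{\xi} \mid x_i^{P}, r_i, y_i) = 0. \quad \text{(B.3.1)} \]

B.3.2 Derivation of the complete-data information matrix

The matrix of second derivatives is commonly known as the Hessian matrix. For the $i$th subject this is
\[ H_i(\xi) = \frac{\partial^2}{\partial \xi\,\partial \xi^T} l(\xi \mid x_i) = \frac{\partial}{\partial \xi^T} S(\xi \mid x_i) \]
where the Hessian over all subjects is
\[ H(\xi) = \sum_{i=1}^{n} H_i(\xi). \]
The Fisher information function for a $p$-dimensional vector of parameters is a $p \times p$ matrix defined as
\[ \mathcal{I}(\xi) = E\left(-\frac{\partial^2}{\partial \xi\,\partial \xi^T} l\right) \]
with the $jk$th element
\[ \mathcal{I}(\xi)_{jk} = E\left(-\frac{\partial^2}{\partial \xi_j\,\partial \xi_k} l\right). \]
An alternate expression for the information is
\[ \mathcal{I}(\xi) = E\left(-H(\xi)\right). \]
The observed information is defined as the negative of the Hessian, so
\[ I(\xi) = -H(\xi). \]
The Fisher information can then be written as
\[ \mathcal{I}(\xi) = E\left(I(\xi)\right). \]
Now we will consider the second derivative of $l^*(\xi \mid x_i^{P}, r_i, y_i)$, which is
\[ \frac{\partial^2}{\partial \xi\,\partial \xi^T} l^*(\xi \mid x_i^{P}, r_i, y_i) \]
and equals
\[ \frac{\partial}{\partial \xi^T}\left[\frac{p'(X_i^{P}, R_i, Y_i \mid \xi)}{p(X_i^{P}, R_i, Y_i \mid \xi)}\right] = \frac{p''(X_i^{P}, R_i, Y_i \mid \xi)\,p(X_i^{P}, R_i, Y_i \mid \xi) - \left[p'(X_i^{P}, R_i, Y_i \mid \xi)\right]\left[p'(X_i^{P}, R_i, Y_i \mid \xi)\right]^T}{\left[p(X_i^{P}, R_i, Y_i \mid \xi)\right]^2} \]
\[ = \frac{p''(X_i^{P}, R_i, Y_i \mid \xi)}{p(X_i^{P}, R_i, Y_i \mid \xi)} - S^*(\xi \mid x_i^{P}, r_i, y_i)\,S^*(\xi \mid x_i^{P}, r_i, y_i)^T \]
\[ = \frac{\int\!\cdots\!\int p''(X_i, Y_i \mid \xi)\,dx_i^{U}\,dx_i^{E}}{\int\!\cdots\!\int p(X_i, Y_i \mid \xi)\,dx_i^{U}\,dx_i^{E}} - S^*(\xi \mid x_i^{P}, r_i, y_i)\,S^*(\xi \mid x_i^{P}, r_i, y_i)^T \]
\[ = \int\!\cdots\!\int \left[\frac{\partial^2}{\partial \xi\,\partial \xi^T} l(\xi \mid x_i, y_i) + S(\xi \mid x_i, y_i)\,S(\xi \mid x_i, y_i)^T\right]\frac{p(X_i, Y_i \mid \xi)}{\int\!\cdots\!\int p(X_i, Y_i \mid \xi)\,dx_i^{U}\,dx_i^{E}}\,dx_i^{U}\,dx_i^{E} - S^*(\xi \mid x_i^{P}, r_i, y_i)\,S^*(\xi \mid x_i^{P}, r_i, y_i)^T \]
then,
\[ = E\left[\frac{\partial^2}{\partial \xi\,\partial \xi^T} l(\xi \mid x_i, y_i) \,\Big|\, x_i^{P}\right] + E\left[S(\xi \mid x_i, y_i)\,S(\xi \mid x_i, y_i)^T \mid x_i^{P}\right] - S^*(\xi \mid x_i^{P}, r_i, y_i)\,S^*(\xi \mid x_i^{P}, r_i, y_i)^T \]
therefore,
\[ H^*(\xi \mid x_i^{P}, r_i, y_i) = E\left[H(\xi \mid x_i, y_i) \mid x_i^{P}\right] + E\left[S(\xi \mid x_i, y_i)\,S(\xi \mid x_i, y_i)^T \mid x_i^{P}\right] - S^*(\xi \mid x_i^{P}, r_i, y_i)\,S^*(\xi \mid x_i^{P}, r_i, y_i)^T \]
which is the Hessian matrix associated with the observed-data log-likelihood. The first term is the conditional expectation of the complete-data Hessian, the second term is the conditional expectation of the outer product of the complete-data score functions, and the final term is the outer product of the observed-data score function.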
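The three-term decomposition can be checked numerically on a problem small enough that every conditional expectation is available in closed form. The sketch below is a toy model chosen only for illustration and is not one of the thesis models: an unobserved binary covariate $z_i$ with known $P(z_i = 1) = 0.4$, an observed outcome $y_i \mid z_i \sim N(\mu z_i, 1)$, and a single parameter $\mu$. The complete-data score is $z_i(y_i - \mu)$ and the complete-data Hessian is $-z_i$, so the observed information is assembled from the same three pieces and compared with a finite-difference second derivative of the observed-data log-likelihood.

```python
import numpy as np

THETA = 0.4  # known P(z = 1); an assumption of this toy model

def posterior_weight(mu, y):
    """w_i = P(z_i = 1 | y_i), the conditional distribution of the missing z_i."""
    a = THETA * np.exp(-0.5 * (y - mu) ** 2)
    b = (1.0 - THETA) * np.exp(-0.5 * y ** 2)
    return a / (a + b)

def observed_loglik(mu, y):
    """Observed-data log-likelihood, up to an additive constant."""
    return np.sum(np.log(THETA * np.exp(-0.5 * (y - mu) ** 2)
                         + (1.0 - THETA) * np.exp(-0.5 * y ** 2)))

def louis_observed_information(mu, y):
    """Sum over subjects of E[-H | y] - E[S^2 | y] + (S*)^2."""
    w = posterior_weight(mu, y)
    expected_complete_info = w                # complete-data -H = z
    expected_score_sq = w * (y - mu) ** 2     # complete-data S = z (y - mu), z^2 = z
    observed_score = w * (y - mu)             # S* = E[S | y]
    return np.sum(expected_complete_info - expected_score_sq + observed_score ** 2)

rng = np.random.default_rng(2)
z = rng.uniform(size=200) < THETA
y = 2.0 * z + rng.standard_normal(200)

mu0, step = 1.7, 1e-3   # evaluated away from the maximum on purpose
louis = louis_observed_information(mu0, y)
numeric = -(observed_loglik(mu0 + step, y) - 2.0 * observed_loglik(mu0, y)
            + observed_loglik(mu0 - step, y)) / step ** 2
```

Evaluating at a point other than the maximizer is deliberate: the agreement then depends on the outer product of the observed-data score, which would vanish only at the exact solution of the score equation.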
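Louis uses the observed information which, under frequentist justification, is preferred over the Fisher information [25]. Beginning with the definition of the observed information for the observed data we have
\[ I(\xi \mid x_i^{P}, r_i, y_i) = -H^*(\xi \mid x_i^{P}, r_i, y_i) \]
\[ = -E\left[H(\xi \mid x_i, y_i) \mid x_i^{P}\right] - E\left[S(\xi \mid x_i, y_i)\,S(\xi \mid x_i, y_i)^T \mid x_i^{P}\right] + S^*(\xi \mid x_i^{P}, r_i, y_i)\,S^*(\xi \mid x_i^{P}, r_i, y_i)^T \]
\[ = E\left[I(\xi \mid x_i, y_i) \mid x_i^{P}\right] - E\left[S(\xi \mid x_i, y_i)\,S(\xi \mid x_i, y_i)^T \mid x_i^{P}\right] + S^*(\xi \mid x_i^{P}, r_i, y_i)\,S^*(\xi \mid x_i^{P}, r_i, y_i)^T \]
\[ = I(\xi \mid x_i^{P}) - E\left[S(\xi \mid x_i, y_i)\,S(\xi \mid x_i, y_i)^T \mid x_i^{P}\right] + S^*(\xi \mid x_i^{P}, r_i, y_i)\,S^*(\xi \mid x_i^{P}, r_i, y_i)^T \]
\[ = I(\xi \mid x_i^{P}) - I_m(\xi \mid x_i^{P}) \]
where $I_m(\xi \mid x_i^{P})$ is the information matrix associated with the missing information (Equation 3.5.2).

The solution to the score equation B.3.1, $\hat{\xi}$, is the value for which an estimate of variance is desired. The information function evaluated at $\hat{\xi}$ is the central feature in the derivation of the standard error of $\hat{\xi}$, where
\[ I(\hat{\xi} \mid x_i^{P}, r_i, y_i) = I(\hat{\xi} \mid x_i^{P}) - I_m(\hat{\xi} \mid x_i^{P}) = I(\hat{\xi} \mid x_i^{P}) - E\left[S(\hat{\xi} \mid x_i, y_i)\,S(\hat{\xi} \mid x_i, y_i)^T \mid x_i^{P}\right] + S^*(\hat{\xi} \mid x_i^{P}, r_i, y_i)\,S^*(\hat{\xi} \mid x_i^{P}, r_i, y_i)^T \quad \text{(B.3.2)} \]
and $S^*(\hat{\xi} \mid x_i^{P}, r_i, y_i) = 0$, so
\[ I(\hat{\xi} \mid x_i^{P}, r_i, y_i) = I(\hat{\xi} \mid x_i^{P}) - E\left[S(\hat{\xi} \mid x_i, y_i)\,S(\hat{\xi} \mid x_i, y_i)^T \mid x_i^{P}\right]. \quad \text{(B.3.3)} \]
At the solution to the score equation, the information for the original incomplete-data problem is obtained from the gradient and Hessian (curvature) of the complete-data log-likelihood function that is used in the EM algorithm. Although the first and second order partial derivatives must still be derived, this relationship simplifies the incomplete-data problem by exchanging difficult derivations for much simpler ones.

On a more pragmatic level, there is an implicit suggestion within the missing data literature that all three components should be used within the MCEM framework. From a conceptual point of view, the solution to $S^*(\xi \mid x_i^{P}, r_i, y_i) = 0$ occurs at a local or global maximum, denoted $\hat{\xi}$. Within the MCEM framework, we have an algorithmically based solution to $S^*(\xi \mid x_i^{P}, r_i, y_i) = 0$, which we can denote $\xi^{(t)}$ on the $t$th iteration of the MCEM algorithm. Recall that the sequence of likelihood values, denoted $\{L(\xi^{(t)})\}$, being bounded above, converges monotonically to $L(\hat{\xi})$, where $\hat{\xi}$ is the local or global maximum.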
Wu [114], in his presentation of two convergence theorems for the EM algorithm, notes that if
\[ \|\xi^{(t+1)} - \xi^{(t)}\| \to 0 \quad \text{as } t \to \infty \]
is assumed to be a necessary condition for $\xi^{(t)}$ to tend to $\hat{\xi}$, then the assumption that the set of stationary points for the EM algorithm needs to consist of a single point can be relaxed. The relevant feature of this necessary condition is that it suggests one of the commonly used stopping criteria for the EM algorithm. As emphasized in section 3.3.1, the stopping criterion $\|\xi^{(t)} - \xi^{(t-1)}\| < \delta$ for some small $\delta$ is a measure of lack of progress, with the solution occurring only asymptotically. In a finite number of steps it may be unreasonable to assume that we have reached the target stationary point as suggested by Wu's theory on convergence. Recognizing this limitation of restricting our algorithm to a finite number of steps controlled by the size of $\delta$ suggests that when we reach the pragmatic convergence criterion on the $t$th iteration of the EM algorithm, we have $\xi^{(t)} \neq \hat{\xi}$ and therefore $S^*(\xi^{(t)} \mid x_i^{P}, r_i, y_i) \neq 0$. This understanding is implicit in the use of Louis standard errors for missing data problems within a generalized linear model framework, where equation B.3.2 is used instead of the asymptotic result given in equation B.3.3 [43, 44, 47, 48].

Appendix C
Supplemental material for the simulation studies

C.1 Simulation 1

C.1.1 Maximization of the multivariate normal

The MCEM based maximum-likelihood type estimators for a multivariate normal distribution with imperfect variables will be derived. To begin, we will assume that $X_i$ is independently and identically distributed as a multivariate normal of dimension $p$ with mean $\mu$ and variance-covariance $\Sigma$, $X_i \sim N(\mu, \Sigma)$. For a sample of size $n$, the likelihood for the $i$th subject is
\[ L(\mu, \Sigma \mid x_i) = \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}} \exp\left\{-\frac{1}{2}(x_i - \mu)^T\Sigma^{-1}(x_i - \mu)\right\} \]
with the log-likelihood
\[ l(\mu, \Sigma \mid x_i) = \frac{1}{2}\left[-p\log 2\pi - \log|\Sigma| - (x_i - \mu)^T\Sigma^{-1}(x_i - \mu)\right]. \quad \text{(C.1.1)} \]
From equation 2.5.4 we have
\[ l_c(\xi \mid x_i, r_i, y_i) = \log p(R_i \mid x_i, y_i, \gamma) + \log p(Y_i \mid x_i, \beta) + \log p(X_i \mid \psi) \]
where $\psi = (\mu, \Sigma, \tau)$.
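The expectation step for imperfect variables, given by equation 3.3.12, becomes
\[ Q(\xi \mid \xi^{(t)}) = \sum_{i=1}^{n}\left[\left(\prod_{j=1}^{p} r_{ij}^{U} r_{ij}^{E}\right)\left[\log p(R_i \mid x_i, y_i, \gamma) + \log p(Y_i \mid x_i, \beta) + \log p(X_i^{*} \mid x_i, \tau) + \log p(X_i \mid \mu, \Sigma)\right] + \left(1 - \prod_{j=1}^{p} r_{ij}^{U} r_{ij}^{E}\right)\left[Q_i(\gamma \mid \gamma^{(t)}) + Q_i(\beta \mid \beta^{(t)}) + Q_i(\tau \mid \tau^{(t)}) + Q_i(\mu, \Sigma \mid \mu^{(t)}, \Sigma^{(t)})\right]\right] \]
thus it is possible to isolate the portion which pertains to the covariates. We also observe that one component pertains to $\tau$, which is assumed to be known, thus no estimation is required. This will result in a further simplification. For the joint distribution of $X_1$ and $X_2$ we have,
\[ Q(\mu, \Sigma \mid \mu^{(t)}, \Sigma^{(t)}) = \sum_{i=1}^{n}\left[\left(\prod_{j=1}^{p} r_{ij}^{U} r_{ij}^{E}\right)\log p(X_{i1}, X_{i2} \mid \mu, \Sigma) + \left(1 - \prod_{j=1}^{p} r_{ij}^{U} r_{ij}^{E}\right) Q_i(\mu, \Sigma \mid \mu^{(t)}, \Sigma^{(t)})\right] \]
\[ = \sum_{i=1}^{n}\left[\left(\prod_{j=1}^{p} r_{ij}^{U} r_{ij}^{E}\right)\log p(X_{i1}, X_{i2} \mid \mu, \Sigma) + \left(1 - \prod_{j=1}^{p} r_{ij}^{U} r_{ij}^{E}\right)\frac{1}{m}\sum_{l=1}^{m}\log p(X_{i1l}, X_{i2l} \mid \mu, \Sigma)\right]. \quad \text{(C.1.2)} \]
The two components of $Q(\mu, \Sigma \mid \mu^{(t)}, \Sigma^{(t)})$ are almost identical except for the indexing. With the use of Dwyer's [23] results, the first partial derivatives in terms of $\mu$ and $\Sigma$ are reasonably straightforward to obtain for the multivariate normal, thus obtaining the MCEM maximum likelihood type estimators $\hat{\mu}$ and $\hat{\Sigma}$ will pose few challenges. Since the two components of equation C.1.2 come from a multivariate normal, the first order partial derivatives for the multivariate normal will be obtained before the first order derivative of $Q(\mu, \Sigma \mid \mu^{(t)}, \Sigma^{(t)})$ is found. Taking equation C.1.1 as our starting point, for $\mu$ we have
\[ \frac{\partial}{\partial \mu} l(\mu, \Sigma \mid x_i) = \Sigma^{-1}(x_i - \mu) \]
and summing over the $n$ subjects and solving for the critical point yields
\[ \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i. \]
The first order partial derivative with respect to $\Sigma$ is
\[ \frac{\partial}{\partial \Sigma} l(\mu, \Sigma \mid x_i) = -\frac{1}{2}\Sigma^{-1} + \frac{1}{2}\Sigma^{-1}(x_i - \mu)(x_i - \mu)^T\Sigma^{-1} \]
and summing over the $n$ subjects and solving for the critical point yields
\[ n\Sigma^{-1} = \Sigma^{-1}\left[\sum_{i=1}^{n}(x_i - \mu)(x_i - \mu)^T\right]\Sigma^{-1}, \quad \text{so} \quad \hat{\Sigma} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \hat{\mu})(x_i - \hat{\mu})^T. \]
With these results, we have all the basic pieces for the derivation of the first order partial derivatives of equation C.1.2 and can comment on their form when compared with the complete case scenario. The derivative with respect to $\mu$ is
\[ \frac{\partial}{\partial \mu} Q(\mu, \Sigma \mid \mu^{(t)}, \Sigma^{(t)}) = \sum_{i=1}^{n}\left[\left(\prod_{j=1}^{p} r_{ij}^{U} r_{ij}^{E}\right)\Sigma^{-1}(x_i - \mu) + \left(1 - \prod_{j=1}^{p} r_{ij}^{U} r_{ij}^{E}\right)\frac{1}{m}\sum_{l=1}^{m}\Sigma^{-1}(x_{il} - \mu)\right]. \]
Solving for the critical point yields
\[ n\Sigma^{-1}\mu = \Sigma^{-1}\sum_{i=1}^{n}\left[\left(\prod_{j=1}^{p} r_{ij}^{U} r_{ij}^{E}\right)x_i + \left(1 - \prod_{j=1}^{p} r_{ij}^{U} r_{ij}^{E}\right)\frac{1}{m}\sum_{l=1}^{m}x_{il}\right] \]
and with left multiplication by $\Sigma$, the estimate of $\mu$ is
\[ \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n}\left[\left(\prod_{j=1}^{p} r_{ij}^{U} r_{ij}^{E}\right)x_i + \left(1 - \prod_{j=1}^{p} r_{ij}^{U} r_{ij}^{E}\right)\bar{x}_{i\cdot}\right] \quad \text{(C.1.3)} \]
where $\bar{x}_{i\cdot} = \frac{1}{m}\sum_{l=1}^{m}x_{il}$ is the intra-Monte Carlo sample average, with the dot emphasizing the index over which the average is taken.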
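The derivative with respect to $\Sigma$, $\frac{\partial}{\partial \Sigma} Q(\mu, \Sigma \mid \mu^{(t)}, \Sigma^{(t)})$, is equivalent to
\[ \sum_{i=1}^{n}\left[\left(\prod_{j=1}^{p} r_{ij}^{U} r_{ij}^{E}\right)\frac{1}{2}\left[-\Sigma^{-1} + \Sigma^{-1}(x_i - \mu)(x_i - \mu)^T\Sigma^{-1}\right] + \left(1 - \prod_{j=1}^{p} r_{ij}^{U} r_{ij}^{E}\right)\frac{1}{m}\sum_{l=1}^{m}\frac{1}{2}\left[-\Sigma^{-1} + \Sigma^{-1}(x_{il} - \mu)(x_{il} - \mu)^T\Sigma^{-1}\right]\right] \]
\[ = -\frac{n}{2}\Sigma^{-1} + \frac{1}{2}\sum_{i=1}^{n}\left[\left(\prod_{j=1}^{p} r_{ij}^{U} r_{ij}^{E}\right)\Sigma^{-1}(x_i - \mu)(x_i - \mu)^T\Sigma^{-1} + \left(1 - \prod_{j=1}^{p} r_{ij}^{U} r_{ij}^{E}\right)\frac{1}{m}\sum_{l=1}^{m}\Sigma^{-1}(x_{il} - \mu)(x_{il} - \mu)^T\Sigma^{-1}\right]. \]
Solving for the critical point yields
\[ n\Sigma^{-1} = \sum_{i=1}^{n}\left[\left(\prod_{j=1}^{p} r_{ij}^{U} r_{ij}^{E}\right)\Sigma^{-1}(x_i - \mu)(x_i - \mu)^T\Sigma^{-1} + \left(1 - \prod_{j=1}^{p} r_{ij}^{U} r_{ij}^{E}\right)\frac{1}{m}\sum_{l=1}^{m}\Sigma^{-1}(x_{il} - \mu)(x_{il} - \mu)^T\Sigma^{-1}\right] \]
and with pre- and post-multiplication by $\Sigma$, the estimate for $\Sigma$ is
\[ \hat{\Sigma} = \frac{1}{n}\sum_{i=1}^{n}\left[\left(\prod_{j=1}^{p} r_{ij}^{U} r_{ij}^{E}\right)(x_i - \hat{\mu})(x_i - \hat{\mu})^T + \left(1 - \prod_{j=1}^{p} r_{ij}^{U} r_{ij}^{E}\right)\frac{1}{m}\sum_{l=1}^{m}(x_{il} - \hat{\mu})(x_{il} - \hat{\mu})^T\right]. \]
Let
\[ S_i^2 = \frac{1}{m - 1}\sum_{l=1}^{m}(x_{il} - \hat{\mu})(x_{il} - \hat{\mu})^T \]
be the intra-Monte Carlo sample variance, then
\[ \hat{\Sigma} = \frac{1}{n}\sum_{i=1}^{n}\left[\left(\prod_{j=1}^{p} r_{ij}^{U} r_{ij}^{E}\right)(x_i - \hat{\mu})(x_i - \hat{\mu})^T + \left(1 - \prod_{j=1}^{p} r_{ij}^{U} r_{ij}^{E}\right)\frac{m - 1}{m}S_i^2\right] \quad \text{(C.1.4)} \]
and with large $m$
\[ \hat{\Sigma} = \frac{1}{n}\sum_{i=1}^{n}\left[\left(\prod_{j=1}^{p} r_{ij}^{U} r_{ij}^{E}\right)(x_i - \hat{\mu})(x_i - \hat{\mu})^T + \left(1 - \prod_{j=1}^{p} r_{ij}^{U} r_{ij}^{E}\right)S_i^2\right]. \quad \text{(C.1.5)} \]

C.1.2 Example of variability in the point estimates and the associated confidence intervals for simulation 1

With the first simulation study, a high degree of variability in the estimates was observed. This was not uniform over all the scenarios, but was frequent enough to warrant special comment. Figure C.1 is an example of the noise seen with the point estimates derived from the MCEM approach. Although many of the estimates remain within a well defined and reasonably tight boundary, some of the estimates appear to be outside this band. Enough of the estimates appear to be outliers that robust measures were used to see if there were substantial differences between the standard and the robust measures.

[Figure: point estimate with 95% confidence interval (Louis method), plotted against Index.]

Figure C.1: MCEM point estimates and Louis confidence intervals for case 4 using missing data mechanism A.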
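Returning to the closed-form maximization of section C.1.1, the updates for the mean (equation C.1.3) and for the variance-covariance matrix (the expression immediately preceding equation C.1.4) can be sketched with NumPy. The function name `mcem_mvn_update`, the array layout, and the simulated inputs are hypothetical choices for illustration. The boolean vector `fully_observed` stands in for the per-subject indicator product, and `x_mc` holds $m$ Monte Carlo draws of the complete covariate vector for every subject (only the rows of imperfectly observed subjects are used).

```python
import numpy as np

def mcem_mvn_update(x_obs, fully_observed, x_mc):
    """One M-step update of (mu, Sigma) for the multivariate normal covariate model.

    x_obs          : (n, p) covariates; rows are used only where fully_observed is True
    fully_observed : (n,) boolean indicator that a subject has no imperfect covariate
    x_mc           : (n, m, p) Monte Carlo draws of the complete covariate vector
    """
    n, m, _ = x_mc.shape
    full = np.asarray(fully_observed, dtype=bool)
    xbar_mc = x_mc.mean(axis=1)                          # intra-Monte Carlo average
    # Mean update: observed x_i where available, Monte Carlo average otherwise.
    mu = np.where(full[:, None], x_obs, xbar_mc).sum(axis=0) / n
    # Outer products about mu for observed rows (placeholder 0 where unused).
    r_obs = np.where(full[:, None], x_obs, 0.0) - mu
    outer_obs = np.einsum("ij,ik->ijk", r_obs, r_obs)
    # Monte Carlo average of outer products about mu.
    r_mc = x_mc - mu
    outer_mc = np.einsum("ilj,ilk->ijk", r_mc, r_mc) / m
    sigma = np.where(full[:, None, None], outer_obs, outer_mc).sum(axis=0) / n
    return mu, sigma

# Illustrative inputs: 60 subjects, 2 covariates, 25 draws per subject.
rng = np.random.default_rng(3)
x = rng.multivariate_normal([1.0, -1.0], [[1.0, 0.5], [0.5, 2.0]], size=60)
full = rng.uniform(size=60) < 0.5
x_mc = x[:, None, :] + 0.3 * rng.standard_normal((60, 25, 2))
mu_hat, sigma_hat = mcem_mvn_update(x, full, x_mc)
```

When every subject is fully observed the update reduces to the usual complete-data maximum likelihood estimators, which gives a convenient sanity check on the implementation.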

