EvaluatingthePerformanceofHypothesisTestingin Case-ControlStudies withExposureMisclassification,using Frequentistand BayesianTechniquesbyMohammad EhsanulKarimB.Sc., University ofDhaka, 2004M.S., University of Dhaka,2005A THESIS SUBMITTED INPARTIAL FULFILLMENTOFTHE REQUIREMENTS FOR THEDEGREE OFMASTER OF SCIENCEinThe Faculty of Graduate Studies(Statistics)THE UNIVERSITY OF BRITISHCOLUMBIA(Vancouver)August 2009©Mohammad Ehsanul Karim 2009AbstractIn epidemiologic studies, measurement error in the exposure variable canhave large effects on the power of hypothesis testing for detecting the impactof exposure in the development of a disease. As it distorts the structure ofdata, more uncertainty is associated with the inferential procedure involvingsuch exposure variables. The underlying theme of this thesis is the adjustment for misclassification in the hypothesis testing procedure. We considerproblems involving a correctly measured binary response and a misclassified binary exposure variable in a retrospective case-control scenario. Weaccount for misclassification error via validation data under the assumptionof non-differential misclassification. The objective here is to develop a testto check whether the exposure prevalence rates of cases and controls are thesame or not, under the frequentist and Bayesian point of view. To evaluate the test developed under the Bayesian approach, we compare that withan equivalent test developed under the frequentist approach. Both theseapproaches were developed in two different settings: in the presence or absence of validation data, to evaluate whether there is any gain in hypothesistesting for having such validation data. The frequentist approach involvesthe likelihood ratio test, while the Bayesian test is developed from posteriordistribution generated by a mixed MCMC algorithm and a normal priorunder realistic assumptions. The comparison between these two approachesis conducted using different simulated scenarios, as well as two real case-control studies having partial validation (internal) data. Different scenariosinclude settings with varying sensitivity and specificity, sample sizes, exposure prevalence and proportion of unvalidated and validated data. Oneother scenario that was considered is to evaluate the performance under afixed budgetary constraint. In the scenarios under consideration, we reachthe same conclusion from the two hypothesis testing procedures. The simulation study suggests that the adjusted model (with validation data model)is always better than the unadjusted model (without validation data model).However, exception is possible in the fixed budget scenario.UTable of ContentsAbstract iiTable of Contents iiiList of Tables vList of Figures viSelected Notations viiiAcknowledgements ix1 Prelude 11.1 Introduction 11.2 The Impact of Misclassification 21.3 Suggested Correction of Misclassification in the Literature 31.4 Settings of the Problem under Investigation 41.5 Basic Terminologies used to Evaluate Misclassification andMeasures of Association 71.6 Existing Literature on Misclassification 101.7 Motivation and Outline of the Current Work 122 Frequentist Adjustment 132.1 Introduction 132.2 Likelihood Functions 132.2.1 Without Validation Data 152.2.2 With Validation Data 152.3 Variance Estimates 172.4 Likelihood-ratio Tests 173 Bayesian Adjustment 193.1 Introduction 193.1.1 Bayes’ Theorem 19111Table of Contents3.2 MCMC Algorithms 193.2.1 Metropolis-Hastings Algorithm . . . 213.2.2 Gibbs Algorithm 223.2.3 Mixed Algorithm 233.3 MCMC Diagnostics 293.3.1 Conventional Graphical Diagnosis 313.3.2 Gelman-Rubin Method for Monitoring Convergence 314 Simulation Results 334.1 Data Generation 334.2 Scenario Settings Under Frequentist Adjustment 334.2.1 Under Different Values of Sensitivity and Specificity 354.2.2 Under Different Sample Sizes 364.2.3 Under Different Exposure Prevalence Rates 364.2.4 Under Different Proportion of Validation and MainPart of the Data 394.2.5 Comparison under Budgetary Constraint 394.3 Scenario Settings Under Bayesian Adjustment 484.3.1 Under Different Values of Sensitivity and Specificity 494.3.2 Under Different Sample Sizes 494.3.3 Under Different Exposure Prevalence Rates 494.3.4 Under Different Proportion of the Validation Data 494.3.5 Diagnostics 545 Application in Epidemiological Studies 645.1 Introduction 645.2 Study of Sudden Infant Death Syndrome (SIDS) 645.3 Cervical Cancer and Herpes Simplex Virus Study 686 Conclusions and Further Research 756.1 Overall Conclusions 756.2 Further Research and Recommendations 77Bibliography 79ivList of Tables1.1 Structure for main (unvalidated) part of the data 61.2 Structure for validation part of the data 71.3 Relationships among the basic terminologies in a 2 x 2 table 74.1 Scenarios under consideration 354.2 Scenarios under constant cost = $2400, assuming that collecting validated data costs three (3) times as much as collectingunvalidated (main) data 434.3 Scenarios under constant cost = $2400, assuming that collecting validated data costs five (5) times as much as collectingunvalidated (main) data 444.4 Scenarios under constant cost $2400, assuming that collecting validated data costs ten (10) times as much as collectingunvalidated (main) data 455.1 Data from the study of sudden infant death syndrome (SIDS)and antibiotic prescription 655.2 Frequentist Estimates of the model parameters in the SIDSstudy 655.3 Bayesian Estimates of the model parameters in the SIDS study 685.4 Data from Herpes Simplex Virus-2 study 685.5 Frequentist Estimates of the model parameters in the HSV-2study 705.6 Bayesian Estimates of the model parameters in the HSV-2study 71VList of Figures4.1 Power curves under different sensitivity and specificity values:0.6, 0.7, 0.8 and 0.9 respectively 374.2 Power curves under different sample sizes: 400, 600, 1000 and2000 respectively (validation sub-sample size is fixed at 200in each situation) 384.3 Power curves under different Exposure Prevalences: 0.25,0.30, 0.35 and 0.4 respectively 404.4 Power curves under smaller Exposure Prevalences: 0.005,0.01, 0.05 and 0.10 respectively 414.5 Power curves under different proportions of validation partand main part of the data: (1:9, 1:3, 1:1 and 3:1) respectively 424.6 Power curves under fixed amount of cost = $2400 assumingthat collecting validated data costs three (3) times as muchas collecting unvalidated (main) data 444.7 Power curves under fixed amount of cost = $2400 assumingthat collecting validated data costs five (5) times as much ascollecting unvalidated (main) data 454.8 Power curves under fixed amount of cost = $2400 assumingthat collecting validated data costs ten (10) times as much ascollecting unvalidated (main) data 464.9 Bayesian analysis results for different sensitivity and specificity values (.6, .7, .8, .9): Proportion of credible intervalsexcluding null value 504.10 Bayesian analysis results for different sample sizes (400, 600,1000, 2000): Proportion of credible intervals excluding nullvalue 514.11 Bayesian analysis results for different exposure prevalence(.25, .3, .35, .4): Proportion of credible intervals excludingnull value 52viList of Figures4.12 Bayesian analysis results for different ratio of data splits (1:9,1:3, 1:1, 3:1 respectively for validation and main part): Proportion of credible intervals excluding null value 534.13 Diagnosis of convergence of Bayesian analysis results: TracePlots for r0 in 4 chains with different starting values (for10,000 iterations, with half burn-in) for a single dataset . . 554.14 Diagnosis of convergence of Bayesian analysis results: TracePlots for r1 in 4 chains with different starting values (for10,000 iterations, with half burn-in) for a single dataset . . 564.15 Diagnosis of convergence of Bayesian analysis results: TracePlots for SN in 4 chains with different starting values (for10,000 iterations, with half burn-in) for a single dataset . . 574.16 Diagnosis of convergence of Bayesian analysis results: TracePlots for SF in 4 chains with different starting values (for10,000 iterations, with half burn-in) for a single data.set . . 584.17 Sequence of the mean of posterior for r0 for the four MarkovChain Monte Carlo Chains for 10,000 iterations 594.18 Sequence of the mean of posterior for r1 for the four MarkovChain Monte Carlo Chains for 10,000 iterations 604.19 Sequence of the mean of posterior for SN for the four MarkovChain Monte Carlo Chains for 10,000 iterations 614.20 Sequence of the mean of posterior for SF for the four MarkovChain Monte Carlo Chains for 10,000 iterations 624.21 Sequence of the Gelman-Rubin R for the four Markov ChainMonte Carlo Chains for 10,000 iterations 635.1 MCMC for the with and without validation data model parameters in the SIDS study 675.2 Prior and Posterior Distributions of all the Parameters underConsideration in the SIDS study 695.3 MCMC for the with and without validation data model parameters in the HSV-2 study 725.4 Prior and Posterior Distributions of all the Parameters underConsideration in the HSV-2 study 74viiSelected NotationsY Outcome variable of a studyV = Categorical explanatory variableV*Recorded surrogate variable which is collected insteadof V forpractical reasonsObserved part of the dataUnobserved part of the dataExposure prevalence& Apparent exposure prevalenceSN SensitivitySF Specificity‘I’ Odds-ratioPPV Positive predictive valueNPV Negative predictive valueL(.) Likelihood functionf(.) density functionH Logit transformation of exposure prevalenceP Logit transformation of sensitivityT Logit transformation of specificitye Logit transformation of apparent exposure prevalencei,j,K,t Indexvi”AcknowledgementsI would like to thank everyone of the Department of Statistics, from facultyto staff and fellow graduate students, for making my M.Sc. program suchan enriching and pleasant experience. In particular, it gives me a greatpleasure to express my sincere gratitude and deepest appreciation to mysupervisor Professor Paul Gustafson. His unique way of mentoring, valuablesuggestions regarding my research, technical issues of research, hints forprogramming calculations, careful draft revision, constant inspiration andwords of wisdom helped me a great deal to complete this dissertation intime. It truly has been an honor and a privilege to work with him.Nonetheless, I would like to express my sincere gratitude and appreciation to Professors John Petkau, Jiahua Chen, Matfas Salibián-Barrera,Ruben H. Zamar, Arnaud Doucet, Lang Wu and Michael Schulzer for theirexcellent teaching and mentorship. Again, I would like to thank ProfessorJohn Petkau for kindly agreeing to be the second reader of my thesis, and forhis numerous suggestions regarding clarity and consistency in my writing.On a personal note, I am eternally grateful to my parents who haveprovided me with abundance of opportunities and freedom all my life, andto my wife Suborna for her love.Vancouver, Mohammad Ehsanul KarimCanada ehsancstat.ubc.caAugust 27, 2009ixTo my parents, and my wife SubornaxChapter 1Prelude1.1 IntroductionA health outcome, often simply presence or absence of disease, is usuallythe central issue of an epidemiological inquiry, whereas the exposure is arelated factor that is possibly involved in development of disease. The scopeof an exposure assessment is much broader in the sense that it can originatefrom various sources such as some physiological characteristic, psychological characteristic, genetic factor, social or environmental element or geneticattribute. We can use some biological test or even self-reported survey instrument to assess exposure status. Intuition suggests that, whatever toolwe use to evaluate that exposure status, there is always a possibility of having mismeasurement. We can have a gold standard method of exposurestatus evaluation with a well-set definition of superior or ideal exposure assessment. However, since such superior assessment may not be possible toimplement on the whole study sample for various practical reasons,suchas available resources or ethical considerations, an operational method ofassessment has to be settled upon so that we can use that method on theentire sample. This operational definition is basically an indirect measureof the exposure of ultimate interest. The methodologies of assessing diseaseand evaluating exposure are quite different from one another. Therefore, themechanisms by which measurement errors will occur from these two sourcesare very different.Evaluating the affect of the exposure to a given risk factor in the development of a disease or infection is usually the goal in epidemiologicalstudies. While making the causal association between an outcome variablethat defines the disease and the exposure variable(s), it is crucial that bothare recorded without error. However, due to restriction of resources, oftensuch quantification by any association measure is hindered by the lack ofpreciseness of the measures of relevant exposures which are collected using the operational definition. When there exist any sources of error, it ispossible that the researcher’s interpretations or findings of causal inference11.2. The Impact of Misclassificationhave alternative explanations. A rich literature suggests that this has longbeen identified as a problem and there has been considerable interest in thisproblem. Most applied work still ignores this issue and suffers detrimentaleffects. In the current work, this issue is acknowledged and addressed.This problem is relevant as much to continuous as to categorical measures — although the terminology differs slightly. ‘Measurement error’ is theterminology used when the predictor variable under consideration is continuous in nature. On average, the closer the true explanatory variable value andthe measured value from the surrogate variable (error-corrupted variable)are, tho less measurement error exits, If the predictor variable is categoricalinstead (with two or more categories), we call it a problem of ‘misclassification’. In this case, the probabilities of classifying a subject into the correctcategory are considered. The impact of both of the mismeasurement casesare somewhat similar, although the expressions and terminologies to evaluate them are quite different.For the sake of clarity, let us define some notation: the goal of the studyis to explain the relationship between the outcome variable (Y) and exposurevariable. In the current work, we will restrict ourselves to a binary exposurevariable and denote it as V. That is, we will only consider the situationwhether a subject is either exposed or not. However, for practical reasonssuch as cost, time factors or unavailability of a gold standard, V might not bemeasured precisely or directly. Therefore, a cruder classification method isapplied and a corresponding surrogate variableV*is recorded instead. Thisis mostly the case when the exposure status is unobservable or cannot bemeasured precisely within reasonable cost. Nevertheless, although plugging-in a surrogate variable by using an imprecise but cheap classification toolmight seem a very intuitive solution, this is are not without consequences.The phenomenon of such error on the measure of association is sometimesreferred to as information bias. A question of accuracy of the estimate ofthe measure of association between disease and exposure arises, and hencewe need to evaluate the impact of such replacements.1.2 The Impact of MisclassificationMismeasurement in the explanatory variables, when ignored, can have detrimental effects on statistical analysis such as: making the estimates of the21.3. Suggested Correction of Misclassification in the Literatureparameters biased in the model under investigation, reducing the discriminative power, and masking various features of the data [Carroll et al., 2006].In the case of obtaining an estimate of a measure of association, misclassification presents a serious problem. Naive analysis that just substitutesthe apparent exposure status for the unobserved true exposure status canproduce highly biased estimates. When misclassification probabilities areequal for the two compared groups (exposed and unexposed), the estimatesof measures of association such as relative risk, odds ratio, are biased towardthe null value [Copeland et al., 1977].However, the effect of misclassification error on hypothesis testing procedures might not be as detrimental as that on estimation, as mentionedin Bross [1954]. In this paper, it is argued that, if similar misclassificationprevails in both exposed and unexposed populations, then the validity of thetest of finding whether two proportions, that is, the exposure prevalencesare different or not, is not affected. However, this does not come without aprice - and the price is the power of the test, which is reduced in the presence of misclassification. Usually the extent of loss depends on the amountof misclassification.1.3 Suggested Correction of Misclassification inthe LiteratureFortunately, reasonable estimates of measures of association are still attainable, even though the exposure variable under consideration is corrupted.For that, the researchers must have some knowledge about the nature oferror to be able to correct or account for it. Identifiability becomes an issuefor the likelihoods - if we have no clue regarding the extent of misclassification [Walter and Irwig, 1988]. A number of methods for the correctionof measurement error have been developed throughout the years, both indesign and analysis stages. Methodologies in the design stage include replicated measurements, validation studies, etc. In the validation study, thevalidated sub-sample is derived randomly from the same population underinvestigation (either internal or external to those included in the primarysample) and a superior method of exposure assessment is implemented oneach subject in the sub-sample. All these methods have their own pros andcons. Taking into account such information, correction for misclassificationor measurement error can be performed either in frequentist or Bayesian31.4. Settings of the Problem under Investigationways. We will discuss this topic in the current work from the ‘test of hypothesis’ point of view.It is worth mentioning that we will be using internal validation sub-samples throughout this work. Although external validation sample sometimes helps generalizing the results to larger extent, it suffers from variousother limitations as well, especially in the situations when cause is dependent on many factors, not only on the predictor variable under investigation,which is a very common circumstance in the disease-exposure relationships.Also, in terms of cost, internal validation sub-samples are cheaper than external validation samples since inferior method of exposure assessment isalready applied on the subjects of internal validation sub-samples.1.4 Settings of the Problem under InvestigationLet Y be the outcome of interest:for diseased subjectsfor disease free subjects.To keep the problem simple, it is assumed that the outcome variable Y ismeasurement error free. That is, we will deal with exposure misclassification, not disease misclassification.The simplest setting in misclassification is a binary exposure variable,which is frequently the case in epidemiological studies. The binary variableV is used to denote the true exposure status:— J1 for truly exposed0 for truly unexposed.V” is a surrogate variable that denotes the exposure status observed by someinstrument or measurement that is subject to a certain amount of error:— f1 for apparently exposed0 for apparently unexposed.Here the exposure variable V is considered to be replaced by the surrogatevariableV*with considerable measurement error. It is also assumed thatsuch exposure measurements are independent of other errors.41.4. Settings of the Problem under InvestigationTo obtain information about the degree of mismeasurement, a validationsub-sample is necessary, where complete information is available about trueexposure status (V), along with surrogate variableV*status (through animperfect assessment on exposure). This is a small fraction of the main sample, where only the surrogate variableV*status is available. Throughoutthis work, we used various compositions of data by varying this fraction.We will discuss this further in Chapter 4.Although it is known that prospective study data are usually preferablestudy data, researchers have to make certain trade-offs due to feasibility.Retrospective designs are more popular because the secondary data sourcesare usually much cheaper. However, (unmatched) retrospective case-controlstudies are more subject to errors of measurement or misclassification, whichoften leads to invalid results. Therefore, we consider a retrospective case-control scenario, where n1 subjects are sampled from the diseased population(cases), and n0 subjects are sampled from the disease free population (controls).To make valid causal inference from a retrospective study, a number ofassumptions need to be appreciated. Consideration of the type or patternof measurement error is very crucial in evaluating its likely impact on ameasure of association. Researchers should be able to distinguish the consequences of different patters of misclassification: such as differential andnondifferential misclassification - which are based on whether the pattern oferror in exposure assessment varies with respect to disease status. Misclassification probabilities of exposure vary with respect to disease status in caseof differential misclassification. Errors arising due to recall bias and perception are common sources for misclassification probabilities being different inrelation to disease status. The presence of disease may have great influenceon how subjects interpret or report about the exposure status. In this case,the conditional distribution of the surrogate exposure variable (or measurement by ‘imperfect’ exposure assessment method), given the true exposurevariable and outcome variable, that is, VV, Y, depends on Y. This is thecase for many realistic situations. However, to simplify the problem, wesometimes assume that the conditional distribution ofV*V, Y does not depend on Y, that is, misclassification probabilities are invariant with respectto disease status (all cases and controls have the same probability of being misclassified). This is the definition of nondifferential misclassification.Throughout the current work, we will maintain the assumption of nondifferential misclassification, and the conclusions are valid under this particular51.5. Basic Terminologies used to Evaluate Misclassification and Measures of Associationassumption. The reason for this assumption is basically due to some of thesimple features that are established in literature, such as “bias toward thenull” in absence of other forms of error for a dichotomous exposure variableand its simplicity compared to relatively unpredictable effects of differentialmisclassification, Researchers usually go through more sophisticated designslike blinding (of exposure assessment with respect to the disease outcome) orsome of its advanced variants to attempt to ensure that the nondifferentialassumption holds.The notation we will use for the unvalidated sub-sample and validationsub-sample data structures under consideration is given in Table 1.1 andTable 1.2 respectively. The unvalidated and validation sub-samples are separate parts of the entire data. The validation sub-sample is the part of thedata where we implemented both the inferior and superior methods of exposure assessment. On the other hand, the unvalidated sub-sample is thepart on which we used only the inferior method of exposure assessment,excluding those subjects who were randomly selected for the validation subsample. For convenience, we will use the phrase ‘unvalidated sub-sampledata’ and ‘main data’ interchangeably from now on. In each of these tables,the nj’s are observed, where i = 0, 1, j = 1,2,3,4, 5,6, but in Table 1.1,the uj ‘s are unobserved. Although the marginal totals ofV*are observable, we do not have direct information on how those subjects are classified with respect to V. The total number of subjects in the case group isflu + nj + fl13 + n14 + ni +ni = niand similarly, the total number ofsubjects in the control group isoi + fl02 +fl03 + fl04 + no5 +floe = floTable 1.1: Structure for main (unvalidated) part of the data[yy=iJ_____V/V*V=1 V’=0 V=1 V’=0V = 1 U02V = 0 u03Total u11+ U13 U12 + U14 UOl + UO3 U02 + U04= nlS= fl16 = o5 =61 5. Basic Terminologies used to Evaluate Misclassification and Measuresof AssociationTable 1.2: Structure for validation part of the dataY Y=1 Y=oV/V*v=i V’=oIV=l VOV 1 fljjfl12 O1V = 0 n13 n14 n03 n04Total[[n11 + n13 n12+ 7114Ino1 +o3 fl1J2+‘O4Table 1.3: Relationships among the basic terminologies in a 2 x 2 tableTest ConditionExposed UnexposedExposed True Positive False NegativeTrue ConditionUnexposed False Positive True Negative1.5 Basic Terminologies used to EvaluateMisclassification and Measures of AssociationLet us denote the true exposure prevalence as:= P(V 1Ywhere i = 0, 1 for control and case respectively. As V in this case is unobserved, the apparent exposure prevalence is defined as= P(V*=1Y = i).Sensitivity and specificity are commonly used statistical measures of theperformance of a binary classification test. In the current context, sensitivity(SN) measures the proportion of actual exposed people which are correctlyidentified as such. Specificity (SP) measures the proportion of unexposedpeople which are correctly identified. Thus, by definition,SN=P(V*=1V = 1,Y=i)SP= P(V*=OIV=0, Y = i)Notice that we are characterizing misclassification in terms of classificationprobabilities. Therefore, SN2 and SP. range between 0 and 1, and the extentto which these are less than 1 indicates the intensity of the misclassificationproblem.71.5. Basic Terminologies used to Evaluate Misclassification and Measures of AssociationWhen the conditional distribution ofV*Iv,Y does not depend on Y(i.e., nondifferential misclassification condition), then we get SN0 = SN1 =SN and SP0 = SP1 = SF. The apparent exposure prevalence é1 can beexpressed in terms of r1,SN1,SF1:= F(V*=1IY=i)=P(V =1, V= kIY= i)= F(V*=1iV=k,Y=i)P(VkIYi)= F(V*=liV=1,Y=i)P(V:=zliY=i)+P(V*liv= 0,Y = i)P(V 0Y = i)= SN1 r1 + (1 — 5P1)(l — r1) (1.1)= SNr+(1—SF)(l—r),denoting common sensitivity by SN and common specificity by SF, underthe assumption of nondifferential classification. Simple algebraic manipulation from Equation (1.1) shows that(rI,SN1,SF1) and (1 — r1, 1 — SN1,1 —SF1) leads to sameFrom Youden’s Index [Youden,19501,we know that if the sensitivityand specificity are such that SN + SF — 1 <0, then the test is misleading.SN + SF = 1 would mean that the test is no more useful than a coin-flipguess. That is, the test has no discriminative power on the exposure group,and reports same proportion of positive tests for both exposed and unexposed groups. Therefore, a common assumption is SN + SF> 1.In our scenario, where both the response Y and the exposure variable Vare binary, the odds-ratio is defined as:— F(V 1Y 1)/F(V =OIY=1)— F(V=1Y = 0)/F(V=0Y =0)= ri/(l—ri)(1.2)ro/(l — ro)which is a common measure of association between disease and exposurestatus for retrospective case-control studies. However, if exposure variablev is subject to misclassification error, an intuitive substitute is:*— 8/(l— i)0 081.5. Basic Terminologies used to Evaluate Misclassification and Measures of AssociationThus, the attenuation factor is defined as:I1*AF=r--,which gives us an idea of the magnitude of bias introduced by misclassification.An alternative formulation for expressing degree of misclassification requires us to use the Positive Predictive Value (FP) and the Negative Predictive Value (NFV). Positive Predictive Value (FFV) is the proportionof subjects with a positive test result from the inferior method of exposure ase.sment, who actually is exposed, determined by superior methodof exposure assessment. Similarly, Negative Predictive Value (NPV) is theproportion of subjects with a negative test result from the inferior methodof exposure assessment, who actually is unexposed, as indicated by superior method of exposure assessment. These two quantities can be calculatedfrom a 2 x 2 table. By implementing Bayes’ Rule as discussed in Equation(3.1), the relationships of (ri, SN, SF,) with FFT4 and NFV are derivedas follows:PPV, = P(V=1IV* =1,Y=i)— P(V*= iV= 1,Y=i)P(V=1IY=i)— P(V = 1V= 1,Y=i)P(V=1IY =i)+P(V= 1IV=O,Y=i)P(V=OIY=i)SNr1 4SNr + (1 — SP)(1 —NPV P(V=OIV* =O,Y=i)— P(V*=OIV=O,Y=i)P(V=OIY=i)— P(V*=0IV=o,Y=i)P(V=0IY=i)+P(V*=1IV=o,Y=i)P(V=OJY=i)SP(1—r)(15— SP(1—r)+(1--SN)rHowever, unlike the implication of nondifferential misclassification withrespect to sensitivity SN and specificity SF, FFV0does not have to be equalto PPV1,nor does NPV0 has to be equal to NPV1 under nondifferentialmisclassification.9- 1.6. Existing Literature on Misclassification1.6 Existing Literature on MisclassificationNondifferentiality is a recurring assumption in the epidemiologic literaturedue to some of its interesting results. Bross [1954] discussed the difficultiesof inferences on a single proportion or the difference between two proportions from a 2 x 2 classification table in the presence of misclassification. Heindicated the distortion of estimation and thepower reduction in hypothesistesting. He justified his statements under the assumption of nondifferentialmisclassification. Newell [1962] further substantiated the fact that nondifferential misclassification errors will always tend to produce results biasedtowards the null (that is, the difference between the two rates will shrinkwhile applying inferential procedures on the data with nondifferential misclassification). Also, Gullen et al. [1968] suggested that under broad assumptions, classification error never results in the apparent difference beinglarger than the real difference in rates. Dosemeci et al. [1990] and Diamondand Lilienfeld [1962a, b] showed with some numerical examples that exceptions are possible and that nondifferentiality is not always tenable. Keys andKihlberg [1963] tried to identify the reasons of such unusual deviation. Toimplement this result, the measurement error has to be independent of allother errors. A few good reviews of such unusual phenomenon are availablein the literature, such as Thomas [1995] and Jurek et al. [2005].Roih7nan et al. [2008] discussed misuse of the “bias toward the null”result, mostly when the assumptions for this result are not met. Even ifthe assumptions are met, it is not necessarily true for hypothesis testing: pvalues need not have upward bias as reported by Greenland and Gustafson[2006].The situation gets even more complicated for more than two categories,i.e., when exposure is polytomous. [Gladen and Rogan, 1979] provided expressions for bias under nondifferential assumption. Early literature on theimpact of misclassification includes Koch [1969] and Goldberg [1975]. Mostof these describe the effect on association measures obtained from a 2 x 2exposure-disease classification table. Goldberg [1972] discusses the issue withregard to hypothesis testing.Historically, the development of adjustments for mismeasurement weremostly under the nondifferentiality assumption. Copeland et al. [1977] suggested extension of the “bias toward the null” result to ratio effect measuresof association, such as the risk ratio and odds ratio and derived adjust101.6. Existing Literature on Misclassificationment formulas to correct for misclassification given the nondifferential assumption. Barron [1977] suggested a matrix method for such adjustment.Greenland [1980] further extended the adjustment to difference effect measures and also considered the possibility of misclassification of confounders.Greenland [1988b] discussed the basic methods for constructing varianceestimators for the various parameters after adjusting for misclassification.Marshall [1990] proposed inverse matrix methods by reparameterizing themisclassification problem. Morrissey and Spiegelman [1999] discussed boththe matrix and inverse matrix methods under various circumstances. Lyles[2002] reparameterized the likelihood of the problem and suggested a relatively more convenient solution to the problem which does not require numerical optimization. If all the parameters are unknown, nonidentifiabilitymakes the inference impossible. A reasonable estimate of the misclassification probabilities is required to carry on the inference. Adjustments formisclassification using replicated samples are provided by Walter and Irwig[1988]. Greenland [1988a] provided formulas for adjustment when a validation sample is present. More recent works include Greenland and Gustafson[2006], Greenland [2008] and Marshall [1997]. Marshall [1989] pointed outthat the estimates of measures of association that adjust for misclassification are very sensitive to the estimates of misclassification probabilities andeven small discrepancies with actual probabilities can lead to misadjustment.Recent developments in the rapidly advancing field of computing made itpossible to use the numerical approaches and simulation techniques to solvethese problems in a more elegant way. The problems of mismeasurementwere explored from a Bayesian context in Rahme et al. [2000], Joseph et al.[1995] and Prescott and Garthwaite [2002]. Gustafson et al. [2001] checkedthe point made by Marshall [1989] and suggested a Bayesian solution ofthe problem by incorporating some uncertainty about the misclassificationprobabilities by means of having a prior distribution of those parametersinstead of a particular guess. Gustafson and Greenland [2006] showed thatimplementing such prior may provide narrower interval estimates of measureof association. Chu [2005] incorporated such uncertainty or randomness byimplementing various prior distributions on the prevalence and misclassification probabilities and assessed the Bayesian adjustment of estimates ofvarious parameters of misclassified data under various assumptions when validation data is available. Estimates obtained from the Bayesian approachis then compared with estimates from previously developed methods suchas the maximum likelihood estimates [Lyles, 2002] and SIMEX (simulationextrapolation method).111.7. Motivation and Outline of the Current WorkA general overview of the methods for misclassified categorical data andsome extensions to higher dimensions are provided in Willett [1989] andChen [1989]. Overall general discussion of these issues and the ways tocombat such problems are documented in chapters 3 and 5 of Gustafson[2004].1.7 Motivation and Outline of the Current WorkAlthough comparisons between the frequentist method with specific estimates of parameters (misclassification probabilities and prevalences) andthe Bayesian method with prior distributions on parameters (to incorporateuncertainty) provided in the literature, such comparisons have not yet beenmade for hypothesis testing. In this thesis, we will assess the impact of misclassification of dichotomous exposure on hypothesis testing for two settings- without considering validation data and its counterpart after adjustmentsusing the estimates from validation data - under the nondifferential misclassification assumption. The Bayesian adjustments for hypothesis testing willbe compared with standard frequentist methods.In Chapter 1, we have discussed the historical developments, basic definitions and terminologies for misclassification error. The motivations forcorrection and some methods of adjusting for such errors are also discussed.The problem under investigation is specified. In Chapter 2 and 3, we willexplain the models and methodologies of hypothesis testing in the presenceof misclassification error from the frequentist and Bayesian points of viewrespectively. In chapter 4 we will show the simulation results under a setof scenarios and compare the classical and Bayesian methods. We use somereal epidemiological datasets to implement these methods in Chapter 5 andconclude with general findings and further recommendations for future researches in Chapter 6.12Chapter 2Frequentist Adjustment2.1 IntroductionMaximum likelihood estimation (MLE) is a popular method used for fittinga statistical model to data. Pioneered by various statisticians including R. A.Fisher at the beginning of the last century, it has widespread applications invarious fields. If the sample observations are available, this estimation procedure searches over various possible population characteristics and eventuallyobtains the most likely value as the estimate of that population characteristic. Having drawn a sample of ri values x1, x2, ..., z from a distributionwhere4is the parameter of interest, we form L(çb) = f(xi, x2,.. . , x,j. Themethod of maximum likelihood estimates ‘ by finding the value of thatmaximizes L() or, more commonly, the logarithmic transformed version ofit. The solution can be found numerically using various optimization algorithms. The popular alternatives to this estimation procedure are leastsquares procedure and method of moments. However, those estimates arenot very efficient in various circumstances, whereas maximum likelihoodestimates possess various desirable features such as consistency and asymptotically efficiency, if solution exists. The maximum likelihood estimationprocedure can also be used on non-random samples, if certain adjustmentsare made, such as conditioning on the clusters or correlated groups, etc.2.2 Likelihood FunctionsPreviously in §1.2, we discussed the impact of misclassification. The estimates of(So,9) obtained from the entire sample will be biased towardthe null, under certain conditions. As described in §1.3, there are variousmethods suggested in the literature for adjusting the consequences of misclassification. We will use the method that uses a validation sub-sample.By using validation data, we can have an estimate of ro, r1, SN and SF.Therefore, given the observed data, under nondifferential misclassification,we can consider(ro, ri)or(So, Si)as the unknown parameters in the statis132.2. Likelihood Functionstical models. As mentioned in §1.5, the true exposure prevalence is definedas r = P (V=iY = i), whereas, the apparent exposure prevalence is expressed as 8P(V*=1Y = i). From Equation (1.1), we can see that Ocan be expressed as a linear function of (ri, SN, SF). For the same dataset -when r, SN and SF remains fixed, we can use H0 := ifor without validation data settings (by ignoring all the true exposure status V, but usingall the apparent exposure statusV*obtainable from the entire sample) andequivalently, H0 : ro = r1 for with validation data settings (by incorporatingthe true exposure status V from the validation sub-sample and the apparentexposure statusV*obtainable from the entire sample). Since both of thesehypotheses are applied on the same dataset, the total number of subjectsunder consideration are the same in each test. The stated hypotheses aresimply variants of the following hypotheses respectively: H0 : ‘I’ 1 andH0 : =1.The notable distinction between these two models is that r can take anyvalue between (0, 1), whereas é1 can take values between min(SN, 1 — SF)and max(SN, 1 — SF). We will discuss the likelihoods and the solutionmethods in the following subsections.One important point is worth mentioning: even in absence of validationdata (that is, when true r0, ri , SN and SF are not estimable), due to theequivalence of hypotheses mentioned above, we can test H0 :== 0and we can conclude the same about H0 :ro = r1 = r. However, whenvalidation data is not present, such equivalence is not true for estimationpurposes, because when SN and SF are unknown, the relationship between(Oo, 0)and (ro, ri) is not known respectively (see Equation 1.1). Therefore,when a validation sub-sample is not available, we can not estimate (ro,r1),but from the entire sample we can estimate (Oo, O).142.2. Likelihood Functions2.2.1 Without Validation DataA standard way to express the likelihood in terms of the parameters(6o, 6)for problems consisting of misclassified data without validation part is:L(60,9iIV,Y)cc = 1Y= o)(flol+flo3+flo5)x P(V = OY= O)(flO2+flO4+flOG)xP(V*= iY= i)(flhl+fl13+fl15)xP(V*OY= i)(fl12+fl14+fl16)= 6nol+no3+no5)< {i.— 0 }(flO2+flO4+flO)x(nn+n13+n15)x {i—6}(fl12+fl14+fl1)(2.1)The maximum likelihood estimates of6, 61respectively are given by:floi+fl03+fl05=fl01 + flhJ2 +no3 + no4 + no5+ no6flu + ri3 + 9215il+ fl12 + fll3 + fl14 +fll5 + Th16Under the null hypothesis H0 : 6= 6 0, the maximum likelihoodestimate is given by -— flOi+fl03+flQ5+flhl +nl3+fll5—Oi +flo2 +fl03 +fl04 +flo5 +9206 +flui +fli2 +fll3 +fl14 + l5 + l62.2.2 With Validation DataA standard way to express the likelihood in terms of the parameters (ro, r1,SN, SP) for problems consisting of misclassified data with validation part152.2. Likelihood Functionsis provided in Equation (2.2) under nondifferential misclassification:L(ro, r, SN,SPV*,V Y){P(v= 1Y= O)P(V*0V 1,Y= 0)}01{P(v1IY= O)P(V*= 1V 1,Y= 0)}02{P(v=OY= O)P(V*1V = 0, Y={P(v 0Y0)P(V*OV = 0, Y= 0)}T04{P(v = 1Y= 1)P(V*0V = 1,Y= 1)}nul{P(v 1Y1)P(V*= 1V ,Yi)}712{P(v 0Y1)P(V*1V = 0, Y ={P(v=0Y= 1)P(V*= 0,Y ={P(v*= 1IY= 0)}0{1 — (P(v*iY={P(v*=1Y 1)}’{1— (P(v*= 1IY= l))}16= {roSN}’°’{ro(1 — SN)}’°{(1 — ro)(1 — SP)}’°3 x{(i — ro)SP}°{riSN}Th11{ri(l — SN)}Th12 x{(i — ri)(1 — SP)}”3{(i — ri)SP}’ x{r0SN + (1 — ro)(1 — SP)}’205 x{i — (r0SN + (1 —r0)(1— SP))}?06x{r1SN + (1 — ri)(1 — SP)}’15 x{i— (riSN+(1—ri)(1—SP))}’’6. (2.2)This likelihood does not lead to a closed form for the maximum likelihoodestimates of ro, r1, SN and SF. In quasi-Newton methods, the Hessian matrix of second derivatives of the function to be optimized is not required.That is why, a general-purpose optimization based on quasi-Newton methodsor a variable metric algorithm is used to optimize Equation (2.2), specificallythe algorithm that was published simultaneously in 1970 by Broyden [1970],Fletcher [1970], Goldfarb [1970] and Shanno [1970] (that is the origin ofthe name Broyden - Fletcher - Goldfarb - Shanno or BFGS method). Thisalgorithm uses function values and gradients to build up a picture of thesurface to be optimized.However, for differential misclassification, we do have closed form expression for the maximum likelihood estimates of ro, ri, SN and SF.162.3. Variance Estimates2.3 Variance EstimatesThe numerical approximation to the Hessian matrix can be obtained fromthe BFGS algorithm (implemented in opt im function of R). The negative ofthe Hessian matrix is the observed Fisher information matrix. The inverseof the observed Fisher information matrix yields the asymptotic covariancematrix of the maximum likelihood estimates. By the use of multivariatedelta method, one can easily obtain the asymptotic variance of the log oddsratio, given the estimated prevalence rates.2.4 Likelihood-ratio TestsA likelihood-ratio test is a statistical test for making a decision between twohypotheses based on the value of the ratio of the likelihood under two different hypotheses. The null hypothesis is often stated by saying the parameteris in a specified subset ‘I of the parameter space . The likelihood function is L() = L(çbx) is a function of the parameter with x held fixed atthe value that was actually observed, i.e., the data. The likelihood ratio isA— supL(q5Ix):— supLx):EThe numerator corresponds to the maximum likelihood of the observed result under the null hypothesis H0. The denominator corresponds to themaximum likelihood of the observed result under the alternative hypothesisH1. Lower values of the likelihood ratio mean that the observed result wasless likely to occur under the null hypothesis. Higher values mean that theobserved result was more likely to occur under the null hypothesis. Thelikelihood ratio A is between 0 and 1. The likelihood ratio test rejects thenull hypothesis if A is less than a critical value which is chosen to obtaina specified significance level c. Usually it is difficult to determine the exact distribution of the likelihood ratio for a specific problem. However, asthe sample size n approaches , the test statistic —2 log(A) will be asymptoticallyx2distributed with degrees of freedom equal to the difference indimensionality ofoand I. In the current context, for without validation data, I = 6 and I=(6o, 6k). Similarly, for with validation data,(r, SN, SP) and 1=(ro, r1,SN, SF). Eventually from these tests, weobtain p-values.A convenient measure of the performance of any hypothesis test is to findthe probability of not making type II errors (1—3), or in other words, not172.4. Likelihood-ratio Testsmaking the error of “not rejecting null hypothesis when it is false” - powerof the test. Powers can be thought as the ability of the hypothesis test todetect a false null hypothesis. In Chapter 4, we will use the power curveas a tool to compare the tests based on with and without validation data.Also we will try to identify whether frequentist methods perform better thanBayesian methods or not. We will discuss relevant Bayesian methodologyin Chapter 3.18Chapter 3Bayesian Adjustment3.1 Introduction3.1.1 Bayes’ TheoremBayes’ Theorem is a simple mathematical formula used for calculating conditional probabilities of random events. For the random variable X, that isdistributed asL@5x),where is the parameter of interest, let fx(x) is themarginal distribution and hence a function of the observed X alone, whileg() is the distribution of before observing X. Then Bayes’ Theorem saysthat the form of posterior distribution is:ir@Ix)= f(x,)fx(x)— g()L(Ix)fx(x)c g()LQx). (3.1)Although Equation (3.1) seems simple, it is a fundamental theorem whichhas deep impact in statistical theory. It is often the case that the posterior‘w(ØJx) is non-standard or high dimensional, involving a lot of parameters.Then it is difficult to evaluate summaries such as the mean, variance, moments, etc. which require integration. Although analytically this is a difficult problem, algorithms discussed in the following sections help us findsolutions numerically. It should be noted that simpler methods such asLaplace approximation can also be used to evaluate such summary quantities, but they require restrictive assumptions such as normal approximationto the posterior distribution and so on. Therefore, we consider algorithmsthat can be applied in broader contexts.3.2 MCMC AlgorithmsFrom §2.4, frequentist likelihood ratio test results are based on the asymptotic assumption, that is, ax2approximation for the sampling distribution193.2. MCMC Algorithmsof the test statistic —2 log(A), since the exact distribution of the statistic ishard to determine and varies from problem to problem. Monte Carlo methods can be an alternative to this approach. These methods can even beapplied in the cases where the distributions are not in conventional formator unknown.To explain the idea of Monte Carlo, suppose that is a collection ofmodel parameters or unknowns, and h() is a function of . We want toevaluate the expected value of the given function h() over a pdf ir(). Inother words, we want to evaluate E(h()) fh(b)Tr(q)db. If ir has avery complex form, we proceed with the Monte Carlo integration technique.Here, we draw samples(2), (n)independently from Tr(). Thenwe estimate h()), which can be made as accurate asdesired by increasing sample size. Therefore, the fundamental idea behindMonte Carlo methods is that, by repeatedly drawing random samples fromthe target population ir(), we can gain insight regarding the behavior of astatistic. When we observe the behavior for a very long time, we obtain anestimate of the sampling distribution of the statistic. But this added advantage is not without a price - time and computer resources are big issues forthese algorithms. However, recent advances in computing technologies haveled to enormous popularity of Monte Carlo simulation as a powerful alternative to formula-based analytic approaches, especially where the solutionrequires a lot of assumptions.In the Bayesian context, this Tr() is the posterior densityTr(Ix),whichmay have a nonstandard, complicated form. Here x denotes the observedinformation, and is high dimensional. Sampling independently from theposterior densityir(Ix)is generally not feasible, and closed form solutionsare not usually possible. Therefore, we generate a chain or dependent samples(1), (2),(‘)from the posterior using a Markov Chain scheme.This Markov chain generates each iteration)taking into account of theprevious value(i_1)only. We want to create a Markov Chain whose stationary or limiting or equilibrium distribution is the desired posterior Tr(q5 x).Here the posterior distribution ir(g5Ix) is the target distribution. To obtainthe stationary distribution of the Markov Chain, we need to run the burn-infor a long time. Here, burn-in refers to the series of initial samples that arenot expected to have yet converged to the target distribution and are henceexcluded from any subsequent analysis. In brief, the basic idea of MarkovChain Monte Carlo is to iteratively produce parameter values that are representative samples from the joint posterior. For large number of iterations,203.2. MCMC Algorithmsthis scheme will provide samples)from its stationary distribution, thatis, when the successive samples becomes uncorrelated. This way, it is surprisingly easy to approximate the posterior distributionir(Ix). However,one added disadvantage to this entire procedure is that we have to monitorconvergence. We will discuss this further in §3.3.The Markov Chain Monte Carlo algorithms that are used in the Bayesianversion of the test under consideration are described in §3.2.3. But fordetailed understanding of the procedure, we start with a general descriptionof the Metropolis-Hastings algorithm and Gibbs algorithm. However, forbasic terminologies and definitions used in these Markov Chain Monte Carloalgorithms, we refer the readers to Gelman [2004].3.2.1 Metropolis-Hastings AlgorithmSuppose we need to estimate a parameter vector with k-elements, E 4and the posterior, 1r(c). When the chain reaches the position at thettIstep, we draw ‘ from a distribution over the same support and we nameit the proposal or jumping distribution,Pt(4I),according to which a newvalue ‘ (candidate point) is proposed given the new current value].Onething to keep in mind is thatP(’I) should be easy to sample from. Weare producing a multidimensional candidate value. The condition here isthat the reverse function value,Pt(I’)should also exist. In the literature,the acceptance ratio is defined as follows:/ ir(’)Pt(’)32(.)The Metropolis-Hastings algorithm does not necessarily move on every iteration. The probabilistic rule that decides whether the candidate point isaccepted or not, i.e., transition from t to (t + l)th point, is:— f‘ with probability min{a(,[t1),l}[tjwith probability 1 — min{c(’,[t1),1}We only need to know the posterior distribution ir() up to a constant ofproportionality. This is considered as the most attractive feature of theMetropolis-Hastings sampler.A single Metropolis-Hastings iteration proceeds with the following steps:1. Initialize the chain with any arbitrary value.213.2. MCMC Algorithms2. Generate a candidate point // from(‘I),where is the currentlocation.3. Sample u from uniform (0, 1) distribution.4. If cYç/) > u then accept ‘.5. Otherwise keep 4 as the new location and repeat until convergence.From the obtained chain, we truncate burn-in samples, and the rest of thechain is used to estimate the posterior distribution.3.2.2 Gibbs AlgorithmThe Gibbs sampler is a special case of Metropolis-Hastings where we alwaysaccept a candidate value. The idea of Gibbs sampling is that, given a multi-variate distribution, sample from a conditional distribution. This samplingis generally simpler than integrating over a joint distribution. Hence, theGibbs sampler is simply a Markovian updating scheme, based on a sequenceof conditional probabilistic statements.We will give a brief outline of Gibbs algorithm in its simplest form.Let the joint distribution of interest be ir(), where is a vector of of kparameters. The aim is to create a Markov chain that cycles through someconditional statements. A requirement for use of this sampler is that wemust know the full conditional distributions. This is a major limitation ofthis algorithm, especially for the cases where the conditional distributionsare hard to derive. The full set of required conditional distributions forare denoted by and defined by 7r()=ir(iI&, 2, . . . j4, j+1,. .. k)fori = 1, .. . , k. It should be possible to draw samples from these conditionaldistributions. At each iteration of the Gibbs sampling, the algorithm cyclesthrougn these conditionals based on the most recent version of all otherparameters. The order is not important, but it is important that the mostrecent draws from the other samples be used. The algorithm is as follows:1. Decide on the starting values:=[01, ..,2. At thettIiteration, a single cycle is completed by drawing values from223.2. MCMC Algorithmsthe following k distributions:_ijtl,4jti A.It1 _t—•1 ,4j11‘P1 ‘P2 ‘ ‘P3 , ‘P4 ‘‘ ‘Pki ‘ ‘Pk[tj[t] jt—i] [t—1][t—i1 [t—i]2Y2 , ‘P3 ,, ‘ ‘Pk[tj jt] [t] [t—i] jt—i] [t—1]P3 ‘P1 ‘ ‘P2 ‘ ‘P4 ‘•‘ Pk—1 ‘ ‘Pk[tj ,[t] ,[t] ,[tj [tJ ,[t—i)“k—ipk—1 ‘P1 , ‘P2‘ ‘P3 ‘ “‘ ‘Pk—2, ‘Pk[tJ,[t] ,4[tl ,[tj ,[t] ,,[tjk‘Pk ‘P1 ‘ ‘P2 ‘P3 ,‘ ‘Pk2, ‘Pk1Here çb can be a multidimensional vector.3. Set t = (t + 1) and repeat until convergence.If the Gibbs sampler has run for sufficiently long time, it produces samplesfrom the desired stationary distribution. The attractive feature of the Gibbssampling algorithm is that these conditional distributions contain enough information to eventually produce samples from the desired joint distribution.3.2.3 Mixed AlgorithmWith Validation DataLikelihood Function: First, we define the parameter space:(ro,ri,SN,SP),where r0 is the exposure prevalence for controls and r1 is the same forcases, SN is the sensitivity and SF is the specificity under nondifferentialclassification.The cell counts u of the main data as shown by Table 1.1 are generated from a binomial distribution. To be more specific, the actual numberof subjects that are in positive exposure status (u1) amongst those whoare exposed in the groups of cases or controls (n5) follows a binomial withparameters n5 and PPI4 (as defined in Equation (1.4)). Likewise, conditioning on the number of cases or controls with negative exposure status(ri6), the number of truly unexposed subjects (u4) follows a binomial withparametersj6and NPV (as defined in Equation (1.5)).The likelihood function for this setting is given in Equation (3.3). Here,the data Y is updated as Y = {Y, Y}={(m1,n,2,fl3, n4), (u1,u2,u3,233.2. MCMC AlgorithmsU4)f(Y,,YI2) L(ro,ri,SN,SPIYn,Yu)ñ[pv= iv = i,Y = i)P(V ={P(v*OV 1,Y = i)P(V 1Y{P(v*=1V = O,Y = i)P(V= OY ={P(v*OV = O,Y i)P(V = OY{P(v*iV = 1,Y = i)P(V = 1Y— j)}Uil{F(v*OV = 1,Y = i)P(V 1Y{P(v*1V = O,Y = i)P(V = OJY= j)}U3{P(v*OV = O,Y = i)P(V OY= i)}Ui4][{SNjr}nx {(i - SN)r}’2x{(1- SP)(1 —{SP(1 — x {SNrI}” x {(i— SNj)rj}U2{(i — SP1)(1— r)}us3x {SP(1— rj)}U4][{sNjri}+Ulx {(i— SNj)rj}Th2+2{(i — SP)(1 — x {SP(1 — ri)}44]=x (1—x SN’”X (1— SN)2+2x (1_ X SP4+4].Under nondifferential misclassification, SN0 = SN1 — SN and SF0 =SP1 = SF (according to the definition that we used). Therefore, in Equation(3.3), we could have used SN and SF, instead of SN and SP. But wepreferred to keep the general format to present the likelihood function forthe broader context.Prior Specification: We are interested in(ro, ri,SN, SF), as defined in section 3.2.3. Each of these parameters can possibly range from 0to 1. To cover the whole real line from —oo to , we make a logit transformation of each of these parameters. To keep the problem manageable, we243.2. MCMC Algorithmsassume the following:/ “ II__\// \ /2(H= (LO9r\NI (/L0‘ I °oP°oi“H)— \ logy) )‘k puouiF (log1N)N(2,u),I(iog1Sp)(34)whereH,H, F, I are just the logit transformed versions ofro, r1, SN, SFrespectively. Here(ll, Hi)’is assumed to follow a bivariate normal distribution with hyperparameterso,and cr0,u1, p. Similarly, F and I followindependent normals with hyperparameters ,u2,2 andj,u3.Also conditional distribution of a bivariate normal variable remains normal, therefore, given F, I, we haveHoliliN({[Lo+p(Hi_1L1)},U(l_p2))flubN({,u + p-(Ho —io)},(1— p2))It should be noted that these assumptions of independence among theparameters and normal distribution structures of them are purely based onmathematical convenience. Researchers can think of other possible distributions if they find them suitable for the purpose. Also if one thinks that theassumption of independence of the parameters is inappropriate, it is possibleto impose correlation among the parameters by means of some multivariatedistribution with defined correlation structure.We assume that the analyst’s prior beliefs about the logit transformedparameters can be represented by the hyperparameters mentioned in Equation (3.4). These beliefs may be gained from relevant examples from thegiven subject area. Under fairly general conditions, we have empirical reason to believe that bothro and ru usually lie between Tmjn = 0.02 andTmax 0.50; wewill assume a median being 0.125. Then i= ILi =0.125. Within 2 standard deviation, on logit scale, we have orj =i ={logit(u) — logit(rmjn)}/3 under normality with 95% probability. Also assume a mild value for p, say, 0.3 to allow relatively large standard deviationof log OR around the mean of 0. For SN and SF, we usually see them lying253.2. MCMC Algorithmsbetween 0.60 and 0,99; we will assume median of 0.80. Using the same logicas before, we determine the hyperparameters. It should be emphasized thatthe user can choose any hyperparameters of interest. The above is just anexample of how we can construct the prior from the mentioned empiricalbeliefs. Often the posterior is robust to the assumed prior. We will discussthis point further in Chapter 5.Posterior: Since Equation (3.3) is a complex one, simulating 1 directlyfrom the joint posterior distribution is troublesome. Therefore, we will sample sequentially from the conditional distributions as follows:(ro, riI, SN, SF) cc fr(ro, ri)fi[r1+2+Uul+Ui2(1— rj)nh3+4+3+ui4],(SNI,ro,ri,SP) cc fsN(SN)fl[SN’1x(1_SN)22],(SP, SN, r0,ri) cc fsp(SP)fl[5pi4+ui4x (1— SP)Thi3+Ui3](35)using the prior distributionfT, fsNandfsas already described.Since the densities are not conditionally conjugate, we implement univariate Metropolis-Hastings jumps embedded in the Gibbs sampling. Thisalgorithm will update each component in the pairs of parameters, (ro,r1),and the same for SN and SP. For satisfactory performance of the MCMC,we need o make suitable choice of jumping distribution. If we examine thelikelihood function in Equation (3.3), and think of r, SN and SF separately, it looks similar to a beta density. Hence we assume a beta jumpingdistribution. This simplifies calculation of the acceptance rate by cross canceling the ratio of proposed versus current likelihoods and the ratio betweentwo jumping densities. For example, consider the acceptance probability forthe one-dimensional M-H jump on ro in Equation (3.5). The jumping rule isspecified as r’0 ‘-‘s Beta(noi+fl02 +U1+u2+1, rio3+n04 +u3+u4+1), closeto the conditional sampling distribution, where t is the index for iteration263.2. MCMC Algorithmsnumber. The ratio in Equation (3.2) becomesir(r,lri,SNt,SPt,Yt)— 7r(rIr,SNt,SPt,Yt)— Pt(rbr,r,SNt,SPt)Pt(rIrb,r,SNt,SP)r(r’o Ir,SNt ,SPt)L(rb,r ,SNt,SPt)— 7r(r Ir,SNt ,SPt)L(r,r ,SNt ,SPt)— L(ro,r,SNt ,SPt)L(r ,r ,SNt ,SPt)— 7r(r’oIr,SNt, SPt)— rIr,SNt, SPt)— ir(rbr)— ir(rrTherefore, we are left with merely the ratio between two prior distributions.Thus, using this mixed algorithm, we proceed as follows:1. Set starting values of (r,r?,SN°, SP°).2. At thett/iteration,• Given parametersQt= (ri, r,SNt, SPt),generate new data asunobserved actual exposure data = {u} based on binomialdistributions, for i = 0,1, j = 1,2,3,4.• Based on the updated cell counts {u} at thettiteration, modelparameters are generated as follows:(1) Simulate r conditioning on(r,SNt_l,SPt_)using theM-H algorithm, the proposed jumping rule is rj Beta(noi+nO2tui,no3+no4+u),with acceptance rate mm{ }.(ii) Similate r conditioning on (r,,SNt_l,Spt_1)using the MH algorithm. The proposed jumping rule is r’1 “.‘ Beta(nii+fl12 +U1+U2+ 1, nla + ri14 +u3+u4+ 1), with acceptanceI ir(r’ir)rate miniIir(r1 Iro,1)(iii) SimulateSNtconditioning on (ri, r,SPt_1)using the M-Halgorithm. The proposed jumping rule is SN ‘- Beta(noi +U11 +7111+1-41 +1, no2+u2+n12+u2+ 1), with acceptanceI ir(SVSN’’)ratemlnr(SNt_1ISNt_1)(iv) Simulate5ptconditioning on (r,, r,SNt)using the M-Halgorithm. The proposed jumping rule is SF Beta(no4+273.2. MCMC AlgorithmsUj4 + fl14 + U4 + 1,o3 +u3 + flj3 +u3 + 1), with acceptanceI r(S7VSP’)ratemlncr(SPt_1ISPt_1). Calculate the log odds ratio at thettiteration.3. Repeat the step (2) at subsequent iterations, for t = 1,. . . , m+ n, tosimulate target parameters using the hybrid algorithm.The procedure is run for sufficiently long m+ n iterations, where m is thenumber of burn-in iterations and n is the number of target iterations.Without Validation DataLikelihood Function: At first, we define the parameter space:(8o,9i),where 8o is the apparent exposure prevalence for controls and9iis the samefor cases.As for validation data case, let us assume that the cell counts from themain data as shown by Table 1.1 are generated from a binomial distribution.The likelihood function from this setting becomesL(= {o91}Y, Y) H1+fl3+flh5(1— .)fl2+fl4+fl6= 8fl01+fl03+fl05(1—00)fl02+fl04+fl06x91111+fl13+fl15(l —80)fl12+fl14+fl16Prior Specification: We are interested in(8o, Ox).Each of theseparameters ranges from 0 to 1. To cover the whole real like from —oo to oo,we make a logit transformation of each of these parameters. To keep theproblem manageable, we assume the following:(0- (1ogN°Ie1) logy-)’ 1)’0oo?))‘where e0, e1 are just the logit transformed versions of6, 6irespectively.Here(Os, 1) is assumed to follow bivariate normal distribution with h.yperparametersJo, itiando, 8i,. For the hyperparameters, the logic is thesame as for the prior of(fl,Hi)in §3.2.3. The conditional distributions ofa bivariate normal can be similarly derived.283.3. MCMC DiagnosticsPosterior: Similar to the case of data with a validation sub-sample, weproceed as follows for the mixed algorithm:1. Set starting values of(08,9).2. At thettiteration,a Given parameters=(0, 0), update the unobserved actualexposure data.• Based on the updated cell counts {uj} at thetthiteration, modelparameters are generated as follows:(i) Simulate 0 conditioning on 0’ using the M-H algorithm.The proposed jumping rule is9Beta(noi + no3 +7105 +1, no2+no4+no6+1), with acceptance rate mm{(Ot_1t1)}(ii) Simulate 14 conditioning on 0. The proposed jumping ruleis0 Beta(nn + fl13 + n15 + 1, n12 + nl4 + fl16 + 1), withI r(O’iI)acceptance rate mini1 0’• Calculate the log odds ratio at thetthiteration.3. Repeat step (2) at subsequent iterations, for t = 1,.. .,m + n, tosimulate target parameters alternately using the hybrid algorithm.3.3 MCMC DiagnosticsFormal convergence diagnostic techniques are addressed here, to identifyvarious frequently occurring issues regarding mixing and coverage of theMCMC algorithms discussed in §3.2. There are several common issues asdiscussed by Gill [2008]:• There is no formal way to ensure that the chain at currently in thetarget distribution for a given Markov chain at a given time.• It is not possible to ensure that a Markov chain will explore all areasof the target distribution in finite time.• Slow convergence. Although theoretically this is not a problem, it is apractical issue.Fundamentally these concerns can be summarized as setting up the parameters of the process appropriately, ensuring satisfactory mixing throughout293.3. MCMC Diagnosticsthe whole sample space, and obtaining convergence at some point. There aresome design issues that must be taken into consideration before constructingand running the chain. Some of these considerations are taking decisionslike determining where to start the chain, judging how long to burn-in thechain before recording values for inference, and determining whether to thinthe chain values by discarding portions of the output.1. Initialization: When little is known about the process, some researchersrandomly distribute initial values through the state space. Usually itis best to try several different starting points in the state space andobserve whether they lead to noticeably different descriptions of theposteriors. This is an obvious sign of non-convergence of the Markovchain. Unfortunately the reverse is not true: it is not the case that ifone starts several Markov chains in different places in the state spaceand they congregate for a time in the same region that this is the region that characterizes the stationary distribution. It is possible thatall of the chains are influenced by the same local maxima.2. Burn-In: The beginning set of runs are discarded as they are notexpected to be representative of the target distribution. Unfortunately,there is no formal way to calculate the appropriate length of the burn-in period. Assessing diagnostic plots or other convergence statisticsdescribed in the literature are the usual ways to determine the burn-inperiod.3. Mixing: A chain that has not fully explored the stationary distributionwill tend to give biased results since it is based on only a subset ofthe state space. Often slow mixing through the target distribution canbe attributed to high correlation between model parameters - hencechecking autocorrelation is a good idea. This is particularly the casewith the Gibbs sampling algorithm. High intra-parameter correlationis also an issue with the Metropolis-Hastings algorithm since it willalso induce slow mixing, due to observing too many rejected candidatevalues.4. Chain thinning: In the very long simulations, storage of the observedvalues on the computer becomes a huge problem. Not only the storage, but also the process of storing the high dimensional parameterrealizations will slow down the computation. The idea of thinning thechain is to run the chain in an usual fashion, but record only every c-thvalue of the chain, thus reducing the storage demands while still preserving the general trend of the Markov process. Here c is some small303.3. MCMC Diagnosticsinteger. It is worth mentioning that this approach does not improvethe quality of the estimate, speed up the chain or help in convergence -rather it is the other way around - the variance estimate will be somewhat distorted due to use of less observations. Still, it is a tool fordealing with possibly limited computer resources. Given the tradeoffsbetween storage and accuracy as well diagnostic ability, the value of cshould be carefully chosen in any given problem.Keeping all the above aspects in mind, we still need to find the numberof iterations that would be sufficient for approximating the convergence tothe target distribution or the length of burn-in sample. Various methods areproposed in the literature for monitoring the convergence of Markov ChainMonte Carlo chains; see Cowles and Garlin [1996], Brooks et al. [1997],Geyer [1992], Raftery and Lewis [1992], Hastings [1970], Robert [1995] andRizzo [2007] for more detailed discussion. We will discuss the graphicaldiagnosis and the approach suggested by Gelman and Rubin [1992] and gel;Gelman [2004].3.3.1 Conventional Graphical DiagnosisGraphically, trace plots are the most popular way to assess convergence. Ifthe iterations are run for fairly long time, the trend will move from initialvalues to the desired density. Other plots that are popularly used includethe mean graph - which plots the mean scores of the previous values versusthe iteration number. If the chain is stable, a flat line will be produced.This does not evidently prove convergence, but if the chain is not producinga fiat line, this indicates that the chain has not yet converged. Also, densityplots of the estimates after burn-in can be drawn.3.3.2 Gelman-Rubin Method for Monitoring Convergencegel suggested that the lack of convergence can be appropriately detectedby comparing multiple sequences (at least two) with initial points beingwidely dispersed in the target distribution. The Gelman-Rubin statistic R(shrink factor) of monitoring convergence of a Markov chain is based oncomparing the behavior of a group of chains with respect to the varianceof a given scalar summary statistic. The estimates of the variance of thestatistic are analogous to estimates based on between-sample and withinsample mean squared errors in a one-way analysis of variance. It uses thebetween sequence variation of the summary statistic as an upper boundand the within-sequence variance as a lower bound. The idea behind this313.3. MCMC Diagnosticsmethod is that, if the chain converges to the target distribution, both thevariances will also converge. It is recommended that the sequence be rununtil R for all the summaries are less than 1.2 at most. If it is less than 1.1,the convergence is even better.32Chapter 4Simulation Results4.1 Data GenerationTo generate data for our setting as described in Table 1.1 and Table 1.2, thesteps are:1. We independently generate the true exposure status (V = 0 and 1)at Y i for i = 0, 1. The generating distribution is Bernoulli withparameter r.2. We generate surrogate measurementsV*IVfrom Bernoulli based onthe fact thatP(V*=1IV=0)= 1-SNF(V*=1IV=1)= SF,where r1 is the exposure prevalence, SN is the seusitivity and SF is thespecificity under nondifferential classification. Now we cross-tabulate to getthe validation table. The main data generation is exactly the same - but inthis case we omit the true exposure status (V) from the classification - it isonly about apparent measurements.4.2 Scenario Settings Under FrequentistAdjustmentWhile dealing with frequentist adjustments, we utilize all the n’s (observedvalues) but not the u’s (unobserved values) as mentioned in Tables 1.1 and1.2. Tn the model without validation data or the two parameter model (thesetwo names of this model will be used interchangeably throughout the entirework) discussed in §2.2.1, we simply use the column totals from the tables.But in the model with validation data (or four parameter model) discussedin §2.2.2, we also use the n’s that are inside the validation table. Hence,when we make comparison, for example - say, for sample size 2000 where334.2. Scenario Settings Under Frequentist Adjustment200 are in the validation and 1800 are in the main (unvalidated) part, thenfor without validation data we use the 2000 subjects aggregately as if therewere no validation part (marginal total for the surrogate variable are knownfor both parts and the without validation model uses just these marginaltotals). But with the validation model, we can recognize 200 subjects ascomprising the validation part and the rest as the main part.To understand the performance of frequentist adjustment for nondifferential misclassification in the simplest possible way, several scenarios areconsidered, as shown in Table 4.1, varying the level of exposure prevalence,or sensitivity and specificity, or sample size, or sample proportion of thevalidation and main parts of the data. Notice that, the whole process isvery complex. Here, the factors are merely assessed in an uni-dimensionalway in all these cases, that is, all other factors are held constant when weswitch from one scenario setting to another, so we will not be able to assessthe possibility of interactions among the factors. That would require morecombinations of scenarios and a huge amount of data would have to be generated. However, there would be some limitations to that approach as well- such as computing time and storage and, above all, comprehending andinterpreting all those data would be challenging. As our objective of assessing impacts on hypothesis testing is a relatively new one in epidemiologicresearch, this simplified approach should provide some rough ideas aboutthe effects which will suffice as a first step in the process.We use power curves as the tool of comparison for this frequentist approach. Therefore, the null hypothesized value (difference between the exposure prevalences is zero) is the mid-value on the horizontal axis. On theright and left side of it, four other equidistant difference points are selectedin each direction based on the difference of the exposure prevalences fromcase and control groups, according to alternative hypothesis. In this work,the considered absolute difference between exposure prevalences from caseand control groups were 0.05, 0.10, 0.15, 0.20 (fixing r0 and changing r1 toachieve the desired difference). Therefore, we have nine points in total inone power curve. The process of getting the estimated power is as followsfor any one point: 10,000 datasets are generated according to the hypothesized difference in exposure prevalence from two groups. We implement thehypothesis test on each dataset and evaluate the p-value. The estimatedpower is given by the proportion of the datasets that provide a p-value lessthan the chosen level of significance 0.05. In theory, with a large numberof datasets, the lowest point of the power should be the chosen level of sig344.2. Scenario Settings Under Frequentist Adjustmentnificance at the null hypothesized point. This is under the assumption thatasymptotic cut offs are accurate.Power curves from the hypotheses of H0 := 6i = 6 (from withoutvalidation data) and H0 : r0 = r1 = r (from with validation data) are showntogether in each graph because of their equivalence as described in §2.2. Ona technical note, to allow reproducibility of the results, the seed is chosenarbitrarily and kept the same throughout the entire analysis.Table 4.1: Scenarios under considerationFactor changed SN, SF Sample SizeScenarios A B C D B F C BValidated data 200 200 200 200 200 200 200 200Unvalidated data 1800 1800 1800 1800 200 400 800 1800To = Ti 0.40 0.40 0.40 0.40 0.40 0.40 0.40 0.40SN = SF 0.60 0.70 0.80 0.90 0.70 0.70 0.70 0.70Factor changed Exposure Prevalence Proportion of dataScenarios I J K B B N 0 PValidated data 200 200 200 200 200 500 1000 1500Unvalidated data 1800 1800 1800 1800 1800 1500 1000 500r0 = 0.25 0.30 0.35 0.40 0.40 0.40 0.40 0.40SN SF 0.70 0.70 0.70 0.70 0.70 0.70 0.70 0.704.2.1 Under Different Values of Sensitivity and SpecificityWe generated 10,000 datasets with exposure prevalence 0.4 for both caseand control groups, where, in each dataset, 200 were in the validation part(50% in the case group, and the rest in the control group) of the data aswas described in Table 1.2, and 1800 were in the main dataset (again 50%in the case group and the rest are in the control group) as was describedin Table 1.1. Four different sets of sensitivity and specificity values wereconsidered: 0.60, 0.70, 0.80 and 0.90. We implemented the likelihood ratiotest (discussed in §2.2) for the two parameter (6o, 6) model for data without validation part and the four parameter (ro,Ti, SN, SF) model for datawith validation part. The estimated power curves out of these tests for allthe cases under consideration are shown in Figure 4.1. From the figure, itis evident that the power of the two parameter model is always dominatedby that of the four parameter model. The situation is much exacerbated354.2. Scenario Settings Under Frequentist Adjustmentwhen the values of sensitivity and specificity are low. However, when themisclassification is at least 0.8, the powers of both tests are almost the same.According to the theory, for the exact tests, the power at the null point(where the difference between the exposures of case and control groups arezero) should be equal to the level of significance, which is 0.05. Due tosimulation error, this might deviate a bit. From the Figure 4.1, we canalso see that the lowest point of powers do match at 0.05 in each setting.Therefore, the number of simulations considered here are adequate to showthe power curves nicely.4.2.2 Under Different Sample SizesLike the previous scenario, we generated 10,000 datasets. The exposureprevalence for both case and control groups was 0.4. Sensitivity and specificity of both groups was 0.7. The sample sizes varied in this scenario asfollows: 400, 600, 1000 and 2000, where in each situations, we had 200 asthe validation part of the data and rest were the main part of the data(again half are allocated in case group, and rest are in control group in eachsetting). Still the four parameter model is superior considering the power ofthe likelihood ratio tests, as shown in Figure 4.2. Naturally, as the samplesize increases, the power of both the tests increases.4.2.3 Under Different Exposure Prevalence RatesIn this scenario, we considered 10,000 datasets. Again sensitivity and specificity were set to be 0.7. In each dataset, we had 200 as the validation partof the data and 1800 as the main part of the data (50% in case group, andrest are in control group). The hypothesis regarding the exposure prevalencewas always H0 :ro == r or equivalently H0 : = 0, where r-j =— ri.Alternative hypothesis in this case would be that the difference of r0 and r1is not zero. To draw a complete power curve, we assume that the possibledifferences in horizontal axis are 0.05, 0.10, 0.15, 0.20 in both directions, sothat we get nine points in total to draw a power curve. There were fourvalues of r under consideration: r = 0.25, r = 0.30, r = 0.35 and r = 0.40.From Figure 4.3, in all the cases, the power of the two parameter model isless than the four parameter models, and the power does not seem to varymuch under different exposure prevalence values r. In practical situations,sometimes we see much less prevalence. Hence, we construct power curvesfor lower prevalence rates such as r = 0.005, r = 0.01, r = 0.05 and r 0.10,3642. Scenario Settings Under Frequentist AdjustmentFigure 4.1: Power curves under different sensitivityand specificity values: 0.6,0.7, 0.8 and 0.9 respectively\dtoD7Parameter Difference Parameter DifferenceParameter Difference Parameter Difference374.2. Scenario Settings Under Frequentist AdjustmentFigure 4.2: Power curves under different sample sizes: 400, 600, 1000 and 2000respectively (validation sub-sample size is fixed at 200 in each situation)Parameter DifferenceParameter Difference Parameter Difference384.2. Scenario Settings Under Frequentist Adjustmentwhich are shown in Figure 4.4. In all the case, we find the previous conclusion is still valid.One point we should mention is that, for lower exposure prevalencesuch as 0.005, while finding the maximum likelihood estimators, sometimesthe optim function goes out of bound. Therefore, for finding maximumlikelihood estimators of 10,000 simulations in the null hypothesis situation(rO = r1 = r), we had to iterate the process of generating new datasets132,134 times for r = 0.005, 42,798 times for r 0.01, 10,859 times forr = 0.05 and 10,047 times for r = 0.10. However, for higher exposureprevalence rates, we never had this problem of non-convergence.4.2.4 Under Different Proportion of Validation and MainPart of the DataAs for all the other scenarios, we generated 10,000 datasets, with sensitivityand specificity 0.7 and exposure prevalence 0.4. But, keeping the totalsample size fixed at 2000, we changed the proportions of the validation andthe main (unvalidated) part of the dataset, — which are 1:9 (200:1800), 1:3(500:1500), 1:1 (1000:1000) and 3:1 (1500:500) respectively. From Figure4.5, the two parameter model has an identical power curve in all situations,but as the proportion of main data decreases for the four parameter model,power increases sharply.4.2.5 Comparison under Budgetary ConstraintCost effectiveness is obviously an important measure of the ultimate worthof a study design. While designing a study, we aim to obtain the best quality of information for a given resource, say, in terms of money or time. Ofcourse, the optimal solution for a given study design is hard to obtain, because not all the parameters are usually known and there might be externalconstraints. Nonetheless, for our study, by considering the stated assumptions and the parameters of the described models, we tried to find whichmodel performs better under a fixed budgetary constraint.Validation data is costly to collect. The high cost of validation datalimits the size of the validation sub-sample in a fixed cost design. From theprevious scenarios we considered, the model without validation data couldbe at best as good as the model with validation data given favorable conditions, but never better. The critical issue we wanted to investigate here is to394.2. Scenario Settings Under Frequentist AdjustmentFigure 4.3: Power curves under different Exposure Prevalences: 0.25, 0.30, 0.35and 0.4 respectivelyParameter Difference Parameter Difference404.2. Scenario Settings Under Frequentist AdjustmentFigure 4.4: Power curves under smaller Exposure Prevalences: 0.005, 0.01, 0.05and 0.10 respectivelyPower Curve for HO: r = 0.005Power Curve for HO: r.0.lPower Curve for HO: r 0.012poromelwMod— 4porovwlerModD 0b5 0.0 0.5 00PwomwwvePower Corre for HO: r — 0.05•- 2 po,uvulor Modd— 400,urnolerModdibo o.b5 0.10 0.15 0 0Jwenoo2 pa,umolur Model— 4 parumotur Model0.00 005 0.10 0.15 020ParamelurJ2erecur-. 2 parunrole, Model— 4 parumoler Model0.00 0.55 0.10 0.15 0.20Pa,ameler0De,enoe414.2. Scenario Settings Under Frequentist AdjustmentFigure 4.5: Power curves under different proportions of validation part and mainpart of the data: (1:9, 1:3, 1:1 and 3:1) respectively424.2. Scenario Settings Under Frequentist Adjustmentfind out whether there is any point where the model without validation databecomes superior to the model with validation data. In other words, howcostly the validation data have to be to abandon the model with validationdata, or whether a researcher can always choose the model with validationdata without any trade-off. We investigated using a particular example asfollows.Say, we have $2400 as a budget for designing a retrospective study usingeither the model with validation data or the model without validation data.We arbitrarily set $1 as the cost of an unvalidated observation. We considerthree pricing choices:1. Collecting validated data costs three (3) times cost as much as collecting unvalidated (main) data. The allocations of validated and unvalidated data considered are provided in Table 4.2.2. Collecting validated data costs five (5) times cost as much as collectingunvalidated (main) data. The allocation of validated and unvalidateddata considered are provided in Table 4.3.3. Collecting validated data costs ten (10) times cost as much as collectingunvalidated (main) data. The allocations of amount of validated andunvalidated data considered are provided in Table 4.4.Table 4.2: Scenarios under constant cost = $2400, assuming that collecting validated data costs three (3) times as much as collecting unvalidated (main) dataScenario Validated data Unvalidated data CostQ.S 2 x 50 2 x 1050 2 x (3 x 50 + 1050) = 2400R.S 2 x 100 2 x 900 2 x (3 x 100 + 900) = 2400S.3 2 x 200 2 x 600 2 x (3 x 200 + 600) = 2400T.3 2x300 2x300 2x(3x300+300)=2400In Tables 4.2, 4.3 and 4.4, we only consider situations where sample sizesare equal for cases and controls in both validated and unvalidated parts.From Figure 4.6, the with validation data model is still superior in allscenarios despite the fact that validation sample costs three times more tocollect compared to an unvalidated sample. However, when the cost is fivetimes as much, both models have almost the same utility, as shown in Figure4.7. But from Figure 4.8, it is evident that the model without validation434.2. Scenario Settings Under Frequentist AdjustmentFigure 4.6: Power curves under fixed amount of cost = $2400 assuming thatcollecting validated data costs three (3) times as much as collecting unvalidated(main) data—6.2 —6.1 olo oi 012Parameter Difference—0.2 —0.1 0.0 0 1Parameter Difference—6.2 —6.1 olo cli 02Parameter Difference—0.2 —0.1 00 0.1Parameter DifferenceTable 4.3: Scenarios under constant cost = $2400, assuming that collecting validated data costs five (5) times as much as collecting unvalidated (main) dataScenario Validated data Unvalidated data CostQ.5 2 x 50 2 x 950 2 x (5 x 50 +950) =2400R.5 2x100 2x700 2x(5x 100+700)=2400S.5 2 x 150 2 x 450 2 x (5 x 150 + 450) 2400T.5 2 x 200 2 x 200 2 x (5 x 200+ 200) = 2400- - -- Without Validation Data— With Validation Data02Without Vetidation DataWith Validation Data012444.2. Scenario Settings Under Frequentist AdjustmentFigure 4.7: Power curves under fixed amount of cost = $2400 assuming thatcollecting validated data costs five (5) times as much as collecting unvalidated(main) dataParameter Difference Parameter DifferenceO1O1O2Parameter Difinrence Parameter DiffemneeTable 4.4: Scenarios under constant cost = $2400, assuming that collecting validated data costs ten (10) times as much as collecting unvalidated (main) dataScenario Validated data Unvalidated data CostQ.1O 2x25 2x950 2x(10x25+950)=2400R.1O 2x50 2x700 2x(10x50+700)=2400Sb 2x75 2x450 2x(10x75+450)=2400T.bO 2 x 100 2 x 200 2 x (10 x 100+ 200) = 2400454.2. Scenario Settings Under Frequentist AdjustmentFigure 4.8: Power curves under fixed amount of cost = $2400 assuming thatcollecting validated data costs ten (10) times as much as collecting unvalidated(main) data0201000102Parameter Difference Parameter Difference0202Parameter Difference Parameter Difference464.2. Scenario Settings Under Frequentist Adjustmentdata can be superior to the with validation data model, givena fixed totalcost and a much higher cost for validation data. This is the only case amongthe scenarios we have considered, where the model without validationdatacan possibly be superior - when the cost of a validated observation is muchhigher than the cost of an unvalidated observation. This is one practicallimitation of the model with validation data that the researchers shouldkeep in mind when designing a study.474.3. Scenario Settings Under Bayesian Adjustment4.3 Scenario Settings Under BayesianAdjustmentWhile dealing with Bayesian adjustments, we utilize all the n’s (observedvalues) and the u’s (unobserved values) in Tables 1.1 and 1.2, and the models (two parameter for data without validation part and four parametermodels for data with validation part) discussed in §3.2.3 are utilized. Asfor frequentist adjustment, for the two parameter model, we simply use thecolumn totals from the tables, while in the four parameter model, we usethe n’s that are inside the validation table as they are observed. To ensurecomparability, both models in Bayesian adjustment utilize the same amountof data. The only difference is that the four parameter model recognizes thevalidation part, while the two parameter model ignores the true classification information of the validation part.Exactly the same scenarios discussed in §4.2 are considered to understandthe performance of Bayesian adjustment to nondifferential misclassification,varying the level of exposure prevalence, or sensitivity and specificity, orsample size, or sample proportion of the validation and the main (unvalidated) part of the data.We used the power curve from the likelihood ratio tests as the comparison tool in assessing the frequentist adjustments. However, finding sucha tool for Bayesian adjustment models was not straightforward. Instead,this is what we have done: Once we have generated the data (as discussedin §4.1), we implemented the mixed algorithm as described in §3.2.3 for10,000 Markov Chain Monte Carlo iterations. Half of the Markov ChainMonte Carlo iterations were discarded as burn-in (we will justify the lengthof chain and burn-in in §4.3.5). Using the retained chains, we constructeda 95% credible interval for the odds-ratio and checked whether this credibleinterval contained the null value (OR = 1) or not (where OR is a functionof r0 and r1 for four parameter model as given in Equation (1.2), and also,for two parameter model, OR is a function of9oand f as given in Equation(1.3)). One other way of serving this same purpose would be to construct95% credible interval for the logarithmic transformation of the odds-ratioand test whether the constructed credible interval contains the null value(log OR 0) or not. 2,000 datasets for each set of parameters in a particular case of the scenario. To produce a graph for the cases of each scenario,we do it for nine points (corresponding to the differences of the alternative484.3. Scenario Settings Under Bayesian Adjustmenthypothesis, as was discussed in §4.2 for the frequentist approach of powercurve construction procedure) for each of the models. This information ofwhat proportion of credible intervals excluded the null value was used to finda probabilistic solution of the problem of comparison. This tool could alsobe labeled as a kind of power curve since this also uses the similar theme “reject H0 if the credible interval excludes null value”, instead of the statement“reject H0 if the p-value is less than significance level”. From the deviationfrom one model’s curve to another, one can have some understanding of theperformance of the two models in these situations.Again, to allow reproducibility of the results, the seed is chosen arbitrarily and kept the same throughout the entire analysis. Initial values needto be provided for Markov Chain Monte Carlo algorithms. Experience suggests that the initial values does not have much impact on the final results.Details of this comment are shown in §4.3.5.4.3.1 Under Different Values of Sensitivity and SpecificityThe same cases as considered in frequentist adjustment are carried out.From the Figure 4.9, it is evident that the two parameter model is alwaysdominated by the four parameter model.4.3.2 Under Different Sample SizesFigure 4.10 shows that the tests get better as the sample size increases, butthe four parameter model is always better than the two parameter model.4.3.3 Under Different Exposure Prevalence RatesThe graphs under consideration are shown in Figure 4.11. In all the cases,two parameter models are dominated by the four parameter models, and thegraphs of credible intervals excluding null values do not seem to vary muchfor the various prevalence rates under consideration.4.3.4 Under Different Proportion of the Validation DataFrom Figure 4.12, the two parameter model in all cases have almost similarcurves, but the powers increase sharply as the proportion for the validationpart increases for four parameter model.494.3. Scenario Settings Under Bayesian AdjustmentFigure 4.9: Bayesian analysis results for different sensitivity and specificity values(.6, .7, .8, .9): Proportion of credible intervals excluding null value504.3. Scenario Settings Under Bayesian AdjustmentFigure 4.10: Bayesian analysis results for different sample sizes (400, 600, 1000,2000): Proportion of credible intervals excluding null valueParameter DifferenceParameter Difference Parameter Difference514.3. Scenario Settings Under Bayesian AdjustmentFigure 4.11: Bayesian analysis results for different exposure prevalence (.25, .3,.35, .4): Proportion of credible intervals excluding null value-0.2 —0.1 0.0Parameter Difference0.1 0.2Parameter DifferenceParameter Difference Parameter Difference524.3. Scenario Settings Under Bayesian AdjustmentFigure 4.12: Bayesian analysis results for different ratio of data splits (1:9, 1:3,1:1, 3:1 respectively for validation and main part): Proportion of credible intervalsexcluding null valueItSSta101SantS-0.2 —0.1 0.0Parameter Difference0.1 0.2 -0.2 —0.1 0.0 01Parameter Difference02S20tap1- - - -- Withoat Validation DataIdatonDateItSSSantS—0.2 —0.1 0.0 0.1Parameter Difference0.2 —0.2 —0.1 0.0 0.1 0.2Parameter Difference534.3. Scenario Settings Under Bayesian AdjustmentIn all four situations considered here, we have the same conclusion aboutthe respective situations from both the frequentist and Bayesian approaches.In fact, if we compare Figure 4.9 with Figure 4.1 and Figure 4.10 with Figure4.2 and Figure 4.11 with Figure 4.3 and Figure 4.12 with Figure 4.5, theshapes of curves from the respective situations are strikingly similarity.4.3.5 DiagnosticsFor diagnostic purposes, we generate datasets with exposure prevalence 0.3for both case and control groups and sensitivity and specificity both equalsto 0.7. As shown in Figures 4.13, 4.14, 4.15 and 4.16 for 10,000 MCMCiterations, the trace plot of all the parameters r, r1, SN and SF look stable after the burn-in in four chains with different starting values (0.2, 0.4,0.6, 0.8 for each parameters). All these figures are obtained using one singledataset as an example. The burn-in is colored as grey and after burn-in, theestimates are colored as black in each of these graphs.Sometimes graphical diagnostics are not very reliable, Therefore, we resort to some statistics that are used for such chain diagnosis, such as Gelmanand Rubin’s convergence diagnostic statistic which was discussed in §3.3.2.This statistic requires more than one chain, and hence we used the fourchains with four different set of initial values. Theory says that the statisticshould not go beyond 1.2. For this particular dataset, for the parametersr0,r1, SN and SF, we had the Gelman and Rubin’s convergence diagnosticstatistic, R 1.011, 1.003, 1.011 and 1.008 respectively, for 10,000 iterations in each. Figure 4.21 indicates the evolution of Gelman and Rubin’sconvergence diagnostic statistic as the number of iterations increases. Fromthis figure, it is evident that the chain is very satisfactory after burn-in.Also, to check whether the initial value has any effect on estimates, weplotted the means of each of the four chains which started from differentinitial values. As shown in Figures 4.17, 4.18, we can see that both convergeto 0.3, which was the original exposure prevalence value used to generatethe considered dataset. Similarly, from Figures 4.19, 4.20, we see that boththe sensitivity and specificity eventually converge to 0.7, which was theparameter value used to generate the datasets under consideration.544.3. Scenario Settings Under Bayesian AdjustmentFigure 4.13: Diagnosis of convergence of Bayesian analysis results: Trace Plotsfor ro in 4 chains with different starting values (for 10,000 iterations, with halfburn-in) for a single datasetr0 r05543. Scenario Settings Under Bayesian AdjustmentFigure 4.14: Diagnosis of convergence of Bayesian analysis results: Trace Plotsfor r1 in 4 chains with different starting values (for 10,000 iterations, with halfburn-in) for a single datasetO 20O 400 doo 8d00 ioOoo9!iLiiIi!P1gJihihi.I I!‘J-- it: -1:-r_‘Li’0 2000 4000 0000Ii0 00 4000 00oo 00oo io00111111‘II iiII1’I’ll!6000 10 2000 4000Ii0000 0000 10000564.3. Scenario Settings Under Bayesian AdjustmentFigure 4.15: Diagnosis of convergence of Bayesian analysis results: Trace Plotsfor SN in 4 chains with different starting values (for 10,000 iterations, with halfburn-in) for a single datasetO 2d00 4coo 6d00 8d00 ioOooSN11a0 2000 4000 6000 8000 10004SN SNISN574.3. Scenario Settings Under Bayesian AdjustmentrI96 2d00 4d00 oooo odoo ioOoosPFigure 4.16: Diagnosis of convergence of Bayesian analysis results: Trace Plotsfor SF in 4 chains with different starting values (for 10,000 iterations, with halfburn-in) for a single datasetwI6 zobo 4000 doo adoo ioOoosPiIi.U 0I II10 2000 4000 6000 8000 10000sP‘I0 2000 4000 6000 8000 10000sp584.3. Scenario Settings Under Bayesian AdjustmentFigure 4.17: Sequence of the mean of posterior forro for the four Markov ChainMonte Carlo Chains for 10,000 iterationsr0 r00o .________________________00C.,oC.,1’000CN°Ô 2d00 4OO 6d00 8d00 ioöc Ô 2d00 4d00 6d00 8OO ioôooChain 1 Chain 2r0 r001000a000__________C.° ó 2d00 4d00 6d00 8d00 ioôoo 6 2d00 4d00 6d00 8d00 ioôooChain 3 Chain 4594.3. Scenario Settings Under Bayesian AdjustmentFigure 4.18: Sequence of the mean of posterior for r1 for the four Markov ChainMonte Carlo Chains for 10,000 iterationsr1 r1C 0d 0d 0CD0a- a(‘1’00 0c,J C.,°a 2d00 4000 60h0 8d00 ioöoo O 2d00 4d00 6d00 8OO bOaChain 1 Chain 2r1000000_______________0-a 2d00 4000 60h0 8d00 ioôoo 0 2000 4000 6000 8000 10000Chain 3 Chain 4604.3. Scenario Settings Under Bayesian AdjustmentFigure 4.19: Sequence of the mean of posterior for SN for the four Markov ChainMonte Carlo Chains for 10,000 iterationsSN SNh______0 d00CDo0C)002d00 4000 6d00 8d00 1060006 2000 4d00 6d00 8d00 ioôooChain 1 Chain 2SN SNCDCDo CDr.CDoCD0cc?0000__________6 2d00 4d00 6d00 8fiOO ioOoo 6 2000 4d00 6d00 8d00 ioôuoChain 3 Chain 4614.3. Scenario Settings Under Bayesian AdjustmentFigure 4.20: Sequence of the mean of posterior for SP for the four Markov ChainMonte Carlo Chains for 10,000 iterationssP sP0Cr,00°00020O 4000 6d00 8000 ioôoo02d00 4d00 6d00 80b0 ioOooChain 1 Chain 2sP sPC.- 0F-.CD CDCDF—0 0CD F-,Cc’J,00C0 F-____________002d00 4d00 6d00 80b0 ioôoo 5 2d00 4d00 6d00 8d00 ioóooChain 3 Chain 4624.3. Scenario Settings Under Bayesian AdjustmentFigure 4.21: Sequence of the Celman-Rubin E for the four Markov Chain MonteCarlo Chains for 10,000 iterationsBum—in Samples— Retained SamplesCut—point at 1.1-Cut—point at 1.2Burn—in Semples— Retained SempleeCut—point atll- Cut—point etIZno- - -- Porn-in Samples— Ratainad SamptaeCut—point at 1.1--- Cut-pdntatl26 2600 4655 6650 ados ioOstSNSum—inSamptes— Retained SamplesCot—point at 1.1Cot—point at 126 265o 4600 sdoo Bobs ioOooSpS 2000 4000 6000 a000 10000 6 odoo 4600 6600 esbo ioboo63Chapter 5Application inEpidemiological Studies5.1 IntroductionIn almost all epidemiological studies, some amount of error in assessment isinevitable. The extent of such error depends on various factors, such as thenature of the exposure, and the instrument error associated with collectingthe information. In this chapter, we will consider two epidemiologic datasetswhere challenges specifically arise in accurately identifying the outcome ofexposure. The methods discussed in the previous chapters are consideredand applied to these datasets.5.2 Study of Sudden Infant Death Syndrome(SIDS)The performance of the methods described in the previous chapters areillustrated using a case-control study of antibiotic prescription during pregnancy and subsequent occurrence of Sudden Infant Death Syndrome (SIDS)[Greenland, 1988b, a, 2008], [Marshall, 1990, 1997], [Kraus et al., 1989].The association of interest is between the prescription of antibiotics duringpregnancy (V) and SIDS (Y). The surrogate exposure or error-prone measurement(V*)was an interview response, whereas the true exposure (V)was derived from medical records. The validation studies, in cases (Y = 1)and controls (Y = 0), were joint(V*,V) designs done as sub-studies thatresulted in the data presented in Table 5.1.Frequentist estimates of parameters of the two models under consideration for the SIDS study data are reported in Table 5.2. From this table, wecan see that the apparent prevalence rates are close estimates of the prevalence rates obtained while considering the validation data. The validationdata model shows that the data has low sensitivity (0.6), but high speci645.2. Study of Sudden Infant Death Syndrome (SIDS)Table 5.1: Data from the study of sudden infant death syndrome (SIDS) andantibiotic prescriptionY Cases (Y = 1)JControls (Y = 0)Validated PartV*= 1 V = 0V*= 1 V = 0V=r1 29 17 2116V=0 22 143 12 168Unvalidated (main) 122 442 101 479Total173 602 134 663ficity (0.9). The log-odds ratios in both groups are positive numbers. Forwithout validation data, the estimate of odds ratio is 1.422 and 95% Waldconfidence limits are (1.11, 1.83), calculated using the formula provided byMarshall [1997]. Not surprisingly, the likelihood ratio p-value obtained fromthis model is small (0.006). These results match with the case discussed byGreenland [2008]. On the other hand, for with validation data model, theestimated odds ratio is 1.49 with 95% Wald confidence limits (1.02, 2.16),which is coherent with the findings of Greenland and Gustafson [2006]. Thelikelihood ratio p-value is also small in this model (0.035). Therefore, theconclusions from both models are the same. They suggest that the hypothesis H0 : r0 Ti is rejected at cv 0.05. That is, the true log(OR) issignificantly far away from 0 based on the evidence provided by the SIDSstudy data.Table 5.2: Frequentist Estimates of the model parameters in the SIDS studyNot considering_Validation data Considering Validation dataParameters Estimate S.E. Parameters Estimate S.E.9o0.168 0.013. r0 0.163 0.0218 0.223 0.015 r1 0.225 0.024SN 0.603 0.047SP 0.903 0.013log(OR) 0.352 0.128 log(OR) 0.398 0.191P-value 0.006 P-value 0.035The Bayesian estimates and standard errors are reported in Table 5.3.The priors used here are very general and similar to those described in §3.2.3.These results are very similar to those obtained using maximum likelihood.655.2. Study of Sudden Infant Death Syndrome (SIDS)Both the 95% credible intervals of the odds ratio obtained from the withand without validation data model fail to include the null value 1 insidethe interval. Moreover, the estimates and credible intervals are very similarto those obtained by frequentist methods. Therefore, the null hypothesis isstill rejected by the Bayesian tools. That means the data suggests a positiveassociation between the prescription of antibiotic and consequent incidenceof SIDS, under the assumption of equality of misclassification probabilities.For the Bayesian estimates and hypothesis testing results reported in Table 5.3 and trace plots in Figure 5.1, the initial values of r0,r1, SN and SPwere set to 0.4, 0.4, 0.7 and 0.7 respectively. For6oand 6, it was 0.2 and 0.2.One interesting issue needs to be addressed here. Other than a few spacial cases, it is well known that under the nondifferential misclassificationassumption, in absence of any other errors, the estimates of measure of association, such as odds ratio should be biased towards the null ‘on average’.However, in this particular data, we notice that the estimate of odds ratioslightly goes away from null (1.42 to 1.49), as the theory suggests, but thep-value from Wald test gives us the opposite message - it increases from0.0029 to 0.0184 respectively (which dictates towards the null behavior afteradjustment). This might be due to the fact that the posterior variance isbeing iinrierestimated in the without validation data situation, and hencethe posterior variance increases after adjustment, providing an even widercredible interval. Such phenomenon of increment of uncertainty even thoughthe odds ratio moves away from null after adjustment, is already noted byGustafson and Greenland [2006]. The likelihood ratio test acts similarly tothe approximate Wald test. This might be one indication that the assumption of nondifferentiality was not completely satisfied, if we rule out theexplanation of random variation due to chance in this particular example.Since both p-values are small enough to reject the null hypothesis, this doesnot alter the conclusion in this example.The Gelman and Rubin convergence diagnostic statistic, 1? value for thefour parametersro, r, SN and SP are 1.002, 1.003, 1.045 and 1.009 respectively. Also, for 8 and 8o, R gives 1.002 and 1.003 respectively. All thesevalues are much less than 1.2. Here, various initial values were set to checkthe convergence - such as 0.2, 0.4, 0.6 and 0.8 for each of the parameters under consideration. 10, 000 iterations were performed and half were retainedafter burn-in to estimate each parameters. Also, from Figure 5.2, we cansee that the posterior distributions does not have any multimodality, which665.2. Study of Sudden Infant Death Syndrome (SIDS)Figure 5.1: MCMC for the with and without validation data model parametersin the SIDS study0 2000 4000 4 8000 1— 4IiI!9ç10 2000 4000 0000 8004 00080]FILtI.[Iii0 .i0I675.3. Cervical Cancer and Herpes Simplex Virus StudyTable 5.3: Bayesian Estimates of the model parameters in the SIDS studyNot considering Validation settingJConsidering Validation settingParameters Estimate SD Parameters Estimate SDo0.168 0.013 r0 0.161 0.0209 0.222 0.015 r1 0.221 0.024SN 0.609 0.046SP 0.901 0.012log(OR) 0.351 0.129 log(OR) 0.395 0.18695%C.I. Does not include H0 value 95%C.I. Does not include Ho value(OR) (1.103, 1.830) (OR) (1.038, 2.153)is a sign of good convergence.5.3 Cervical Cancer and Herpes Simplex VirusStudyThis data is listed in Carroll et al. [1993] and discussed in Prescott andGarthwaite [2002], Carroll et al. [2006]. The research question is whetherexposure to herpes simplex virus contributes to the risk of cervical cancer.The response variable Y is an indicator of cervical cancer, V is exposureto type 2 herpes simplex virus (HSV-2) measured by a refined western blotprocedure andV*is exposure to HSV-2 measured by the western blot procedure. The data is provided in Table 5.4.Table 5.4: Data from Herpes Simplex Virus-2 studyY Cases (Y = 1)]Controls (Y = 0)Validated PartV*1 V = 0 V = 1 V 0V=1 18 5 16 16V=0 3 13 11 33[Unvalidated (main) 375 318 535 701[Total 396 336 562 750Frequentist estimates of parameters of the two models under consideration for the HSV-2 study data are reported in Table 5.5. From this table, wecan see that the apparent prevalence rates and estimates of the prevalence685.3. Cervical Cancer and Herpes Simplex Virus StudyFigure 5.2: Prior and Posterior Distributions of all the Parameters under Consideration in the SIDS study04 0:5 06 07 08 09 10sPpijot— Posterior- - -- prior— Postorior- --. Prior— Posterior00 0:1 02 0:3 0:4r001 0:2 0:3 04r1---- Prite— OSt000r0:4 0:6 08 i’oSN---- Prior— Posterioroo o1 0:2 0:3 0:4900o 01 0:2 03 0:4695.3. Cervical Cancer and Herpes Simplex Virus Studyrates for the case group obtained in the presence of the validation data arenot in complete agreement. Especially the prevalence rates for case groupare much higher than the control group, both in the before and after adjustments. For without validation data, the estimated odds ratio is 1.57and 95% Wald confidence limits are (1.31, 1.89). Also, the likelihood ratiop-value obtained from this model is very small. On the other hand, forthe with validation data model, the estimated odds ratio is 2.61 with95%Wald confidence limits (1.62, 4.18). The likelihood ratio p-value is also verysmall in this model. Since the p-values obtained from both models are verysmall, the conclusions from both models are the same. They suggest thatthe hypothesis H0 : To = r is rejected at c = 0.05. The validation datamodel shows that the exposure assessment has moderate sensitivity, as wellas moderate specificity.Table 5.5: Frequentist Estimates of the model parameters in the HSV-2 studyNot considering Validation setting] Considering Validation settingParameters Estimate S.E. Parameters Estimate S.E.9 0.428 0.014 0.418 0.0469 0.541 0.018 r1 0.652 0.053SN 0.679 0.041SP 0.743 0.043log(OR) 0.453 0.093 log(OR) 0.958 0.237P-value 9.966 x iO P-value 1.48 x 10The Bayesian estimates and standard errors are reported in Table 5.6.For the model without validation data, these results are almost the same asthose obtained using maximum likelihood. However, the estimates obtainedfrom the model with validation data are not nearly as close. Nonetheless,both the 95% credible intervals of the odds ratio obtained from the withand without the validation data model fail to include the null value 1 insidethe interval. Therefore, even with the Bayesian method, the null hypothesisis rejected. Moreover, the conclusions obtained from hypothesis testing arevery similar to those obtained by frequentist methods. As a result, we canconclude that the exposure of HSV-2 is positively associated with increasedrisk of developing cervical cancer.Using the same prior that we used in simulations in chapter 4, we cansee that the exposure prevalences are greatly underestimated in prior den705.3. Cervical Cancer and Herpes Simplex Virus Studysities. The posterior exposure prevalences are very different than suggestedin the prior. From Figure 5.4 it is evident that the posterior results are notdominated by the given prior.Table 5.6: Bayesian Estimates of the model parameters in the HSV-2 studyNot considering Validation settingj Considering Validation settingParameters Estimate SD Parameters Estimate SD8 0.426 0.014 0.383 0.0468 0.537 0.018 0.605 0.052SN 0.700 0.043SF 0.733 0.041log(OR) 0.445 0.092 log(OR) 0.912 0.23395%C.I. Does not include H0 value 95%C.I. Does not include H0 value(OR) (1.302, 1.870) (OR) (1.654, 4.085)For the Bayesian estimates and the trace plots, the initial values of r0,r1, SN and SP were set to 0.4, 0.4, 0.7 and 0.7 respectively. For8o and 8,it was 0.45 and 0.45.The Gelman and Rubin convergence diagnostic statistic, 1? value for thefour parameters r0, r1, SN and SF are 1.23, 1.16, 1.11 and 1.26 respectively. Also, for 8 and 8, R gives 1.26 and 1.28 respectively. Notice that,most of these values are over 1.2 for 10, 000 iterations considering half ofthese as burn-in. Hence we can conclude that the convergence is not goodfor the cases under consideration for 10, 000 iterations. If we increase thenumber of iterations to 40,000 and R value for the four parameters r0, r1,SN and SF becomes 1.19, 1.16, 1.05 and 1.15 respectively. For8oand 8, Rnow gives 1.01 and 1.10 respectively. As all of these R values are less than1.2, we can conclude that the convergence is satisfactory for the cases underconsideration for 40, 000 iterations, Therefore, we report the trace plots andthe Bayesian estimates of the parameters for 40, 000 iterations in Table5.6and Figure 5.3. However, it should be noted that the changes in estimationare very small (changes mostly in third decimal places) and the standarderrors are almost the same despite the larger number of iterations.As before, the initial values were set to be 0.2, 0.4, 0.6 and 0.8 for eachof the parameters under consideration. One possible reason for this analysisrequiring such large number of iterations could be due to the fact that some715.3. Cervical Cancer and Herpes Simplex Virus StudyFigure 5.3: MCMC for the with and without validation data model parametersin the HSV-2 studyij [iBefore Burn—in— 3000 Burn-inii11iI’’‘16 io60o o6oo 30600 4060Before Barn-in— her Burn-rnbr’6 1 Ooo 206 30603 4060BBIB.BB rn-rn— 200cr Bum—rn1Frr,irM’ rr6 10600 20600 30603 4060CBBI000B rn-rn— re Burn-rnIrNi Ir6 10600 20600 30600 4060Before B urn-in— AflerBurn-rn0 10000 20000 30000 40000 0 0000 20000 30000 40000725.3. Cervical Cancer and Herpes Simplex Virus Studycell counts of the Table 5.4 are 5 or less. Another possibility is that thenondifferential assumption does not hold in this case.735.3. Cervical Cancer and Herpes Simplex Virus StudyFigure 5.4: Prior and Posterior Distributions of all the Parameters under Consideration in the HSV-2 study74-V PflQf— P0ii0r----Pta-0.0 0.2 0.4 0.6 00-. -V Pta— P004 05 06 07 0.6 09 10SN0.4 0.5 0.6 0.7 0.9 OS 1.0SP---V P599— Posterior60Chapter 6Conclusions and FurtherResearch6.1 Overall ConclusionsVarious practical issues force researchers to use inferior measures of exposure assessment. When an ideal exposure measurement is replaced by anoperational method or a surrogate variable, it is well known in the literature that due to the disparity between these two measures, there are severalconsequences of such compromise. Of course the extent of disparity plays arole in the consequences. To understand the extent to which the measure ofassociation differs, a validation sub-sample is used to get some insight aboutthe misclassification probabilities. Using the added information obtainedfrom a validation sub-sample, adjustment measures are possible to correctfor such bias and the subsequent power loss in hypothesis testing procedures.The nondifferentiality assumption is very popular in the epidemiologicliterature due to its various attractive features. Two adjustment techniquesare considered in this thesis under this assumption. One is based on frequentist methods, power curves were derived for the likelihood ratio testboth with and without validation data. This is basically a standard routine, used here as a benchmark. The detailed procedure is discussed inChapter 2. The main goal is to evaluate the Bayesian counterpart whichis based on a MCMC algorithm after reasonable diagnostic checks as discussed in Chapter 3. Both these methods are implemented in two settings:considering validation data and without considering validation data. In thefrequentist method, estimates from the validation sub-sample are used toadjust for exposure misclassification, but in the Bayesian implementation,instead of having specific estimates of parameters, a set of priors are usedso that some randomness or uncertainty is induced in the inferential processamongst cases and controls.756.1. Overall ConclusionsThe main focus of this research is to identify the adjustment methodologythat performs better under fairly general conditions in hypothesis testing.A set of scenarios are considered so that both methods canbe comparedusing simulation study. These scenarios were constructed by varying thelevel of misclassification, prevalence, sample size, proportion of validationpart ii1 the whole sample and under fixed cost constraint. Since a lot ofscenarios are in possible, to simplify the problem, only the one dimensionaleffects due to the one of the parameters, sample size or sample compositionchange is considered at a time. Both methods are applied on all of thesescenarios. Details are provided in Chapter 4. As a tool of evaluation, powercurves are drawn for the frequentist method and the proportion of credibleintervals that exclude the null value are plotted for Bayesian method. Fromthese plots, it is clear that the with validation data model is always better.The without validation data (two parameter) model can be as good as withvalidation data (four parameter) model in extreme cases, but can never getbetter. We showed that this is true for hypothesis testing settings. Theonly case when the without validation data model can be superior to thewith validation data model is under fixed budget, if the cost of collectingvalidated data is much higher than collecting usual unvalidated data. Howhigh is high? This depends on the various parameters, considered samplesizes, composition of sample and budget for the study. We just showed byexample that such an exception is possible.It is worth mentioning that the settings considered by Greenland andGustafson [2006] are slightly different than those considered in this work,although they also address the issue of adjusting for misclassification in thecontext of hypothesis testing. In that paper, it is shown that given known orreasonably assumed (say, from educated guesses) values of sensitivity andspecificity, the power does not improve after adjustment under nondifferential misclassification error (assumed to be free from any other sources oferrors). This suggestion was based on the analysis of a single dataset. Incontrast, in the current work, we showed that in presence of validation data,which enables us to estimate the true exposure prevalences, sensitivity andspecificity, we have more power after adjustment, subject to the conditionthat the nondifferential misclassification assumption is satisfied.If the plots of the frequentist and the Bayesian method results for respective scenario are superimposed, they are almost indistinguishable. Comparing these plots under each scenario, it is evident that both methods performexactly the same way. Having both methods producing the same conclusion,766.2. Further Research and Recommendationsit is worth mentioning that the Bayesian framework, although very easy togeneralize to other extensions of this problem, are very demanding in termsof resources and computing time to attain results without any MCMC diagnostic anomaly. On the other hand, with the frequentist methods usedhere, although closed forms are not always attainable, simple numerical routines can optimize these likelihoods very quickly. To give real life flavor, twoepidemiologic datasets are also analyzed using the above methodologies inchapter 5, which are coherent with the simulation results.6.2 Further Research and RecommendationsFurther research could focus on extending some of the simplistic assumptions that were considered, adapting the proposed models for problems withsimilar specifications and generalizing the simulation scenario to broadercontexts.• One can consider larger combinations of the scenario setting than considered in this work to describe the effects in a broader sense. Onecould organize this effort by developing an experimental design (e.g.,a fractional factorial design) involving the factors of interest.• One immediate extension of the work is to go beyond nondifferentialassumption and check the results under differential misclassification,which is more realistic in many fields. For Bayesian adjustment, thiscan be easily done by considering the general model where the misclassification probabilities are different with respect to case and controland imposing a joint prior for those parameters with an assumed covariance structure.• To make the problem more realistic, additional exposures that are correctly measured are worth adding in the model. A logistic regressionmodel can be a start in this direction.• This dissertation only deals with dichotomous exposure misclassification. Polytomous exposure misclassification can also be another extension to this research. Instead of binomial assumption of misclassifiedexposure, a multinomial assumption will be used in that case.776.2. Further Research and Recommendations• The models used in this work can be modified to allow using replicated sub-set of data or data obtained from an alternative source inthe absence of a benchmark scorer or gold standard method of exposure assessment, instead of validation data, which could be more costeffective, especially when the cost of validation data is very high.• It is also worth investigating other tools to analyze the continuous exposure data directly, instead of dichotomizing it to make it categorical,and try to identify how much sensitivity does one loose by categorizingthe exposure variable. Spline analysis can be one way to move in thisdirection.• The Bayesian hypothesis testing could be accomplished by using theBayes factor, and then compared with the standard likelihood techniques to find out whether there is any discrepancy in those twomethodologies.78BibliographyBarron, B., The effects of misclassification on the estimation of relativerisk,Biometrics, 93(2), 414—418, 1977.Brooks, S., P. Dellaportas, and G. Roberts, An approach to diagnosing totalvariation convergence of MCMC algorithms, Journal of Computationaland Graphical Statistics, 6(3), 251—265, 1997.Bross, I., Misclassification in 2 x 2 tables, Biometrics, 10(4), 478—486,1954.Broyden, C., The convergence of a class of double-rank minimization algorithms 1. general considerations, IMA Journal of Applied Mathematics,6(1), 76—90, 1970.Carroll, R., M. Gail, and J. Lubin, Case-control studies with errors in covariates, Journal of the American Statistical Association, 88(421), 185—199,1993.Carroll, R., D. Ruppert, L. Stefanski, and C. Crainiceanu, Measurement Error in Nonlinear Models: A Modern Perspective, Chapman & Hall/CRC,2006.Chen, T., A review of methods for misclassified categorical data in epidemiology, Statistics in Medicine, 8(9), 914—921, 1989.Chu, R., Bayesian Adjustment for Exposure Misclassification in Case-Control Studies, Master’s thesis, Department of Statistics, University ofBritish Columbia, 2005.Copeland, K., H. Checkoway, A. McMichael, and R. Holbrook, Bias due tomisclassification in the estimation of relative risk, American JournalofEpidemiology, 105(5), 488—495, 1977.Cowles, M., and B. Carlin, Markov chain Monte Carlo convergence diagnostics: a comparative review, Journal of the American Statistical Association, 91(434), 883—904, 1996.79Chapter 6. BibliographyDiamond, E., and A. Lilienfeld, Effects of errors in classification and diagnosis in various types of epidemiological studies, American Journal of PublicHealth, 52(7), 1137—1145, 1962a.Diamond, E., and A. Lilienfeld, Misclassification errors in 2 x 2 tables withone margin fixed: some further comments, American Journal of PublicHealth, 52(12), 2106—2111, 1962b.Dosemeci, M., S. Wacholder, and J. Lubin, Does nondifferential misclassification of exposure always bias a true effect toward the null value?,American Journal of Epidemiology, 132(4), 746—748, 1990.Fletcher, R., A new approach to variable metric algorithms, The ComputerJournal, 13(3), 317—322, 1970.Gelman, A., Bayesian Data Analysis, CRC Press, 2004.Gelman, A., and D. Rubin, Inference from iterative simulation using multiplesequences, Statistical Science, 7(4), 457—472, 1992.Geyer, C., Practical markov chain monte carlo, Statistical Science, 7(4),473—483, 1992.Gill, J., Bayesian Methods: A Social and Behavioral Sciences Approach,Chapman & Hall/CRC, 2008.Gladen, B., and W. Rogan, Misclassification and the design of environmentalstudies, American Journal of Epidemiology, 109(5), 607—616, 1979.Goldberg, J., The Effects of Misclassification on the Analysis of Data in 2 X2 Tables, Unpublished dissertation, Harvard University School of PublicHealth. Boston, 1972.Goldberg, J., The effects of misclassification on the bias in the difference between two proportions and the relative odds in the fourfold table, Journalof the American Statistical Association, 70(351), 561—567, 1975.Goldfarb, J., A family of variable metric updates derived by variationalmeans, Mathematics of Computing, 24(109), 23—26, 1970.Greenland, S., The effect of misclassification in the presence of covariates,American Journal of Epidemiology, 112(4), 564—569, 1980.80Chapter 6. BibliographyGreenland, S., Statistical uncertainty due to misclassification: implicationsfor validation substudies., Journal of Clinical Epidemiology, 41(12), 1167—1174, 1988a.Greenland, S., Variance estimation for epidemiologic effect estimates undermisclassification, Statistics in Medicine, 7(7), 745—757, 1988b.Greenland, S., Maximum-likelihood and closed-form estimators of epidemiologic measures under misclassification, Journal of Statistical Planning andInference, 138(2), 528—538, 2008.Greenland, S., and P. Gustafson, Accounting for independent nondifferentialmisclassification does not increase certainty that an observed associationis in the correct direction, American Journal of Epidemiology, 164(1),63—68, 2006.Gullen, W., J. Bearman, and E. Johnson, Effects of misclassification inepidemiologic studies., Public Health Reports, 83(11), 914—918, 1968.Gustafson, P., Measurement Error and Misclassification in Statistics andEpidemiology: Impacts and Bayesian Adjustments, CRC Press, 2004.Gustafson, P., and S. Greenland, Curious phenomena in Bayesian adjustment for exposure misclassification, Statistics in Medicine, 25(1), 2006.Gustafson, P., N. Le, and R. Saskin, Case-control analysis with partialknowledge of exposure misclassification probabilities, Biometrics, 57(2),598—609, 2001.Hastings, W., Monte Carlo sampling methods using Markov chains and theirapplications, Biometrika, 57(1), 97—109, 1970.Joseph, L., T. Gyorkos, and L. Coupal, Bayesian estimation of disease prevalence and the parameters of diagnostic tests in the absence of a gold standard, American Journal of Epidemiology, 141 (3), 263—272, 1995.Jurek, A., S. Greenland, G. Maldonado, and T. Church, Proper interpretation of non-differential misclassification effects: expectations vs observations, International journal of epidemiology, 34(3), 680—687, 2005.Keys, A., and J. Kihlberg, Effect of misclassification on estimated relative prevalence of a characteristic: Part I. Two populations infallibly distinguished. Part II. Errors in two variables, American Journal of PublicHealth, 53(10), 1656, 1963.81Chapter 6. BibliographyKoch, G., The Effect of Non-Sampling Errors on Measures of Association in2 x 2 Contingency Tables, Journal of the American Statistical Association,64 (327), 852—863, 1969.Kraus, J., S. Greenland, and M. Bulterys, Risk factors for sudden infantdeath syndrome in the US Collaborative Perinatal Project, InternationalJournal of Epidemiology, 18(1), 113—120, 1989.Lyles, R., A note on estimating crude odds ratios in case-control studieswith differentially misclassified exposure, Biometrics, 58(4), 1034—1037,2002.Marshall, J., The use of dual or multiple reports in epidemiologic studies,Statistics in Medicine, 8(9), 1041—1049, 1989.Marshall, R., Validation study methods for estimating exposure proportionsand odds ratios with misclassified data., Journal of Clinical Epidemiology,43(9), 941—947, 1990.Marshall, R., Assessment of exposure misclassification bias in case-controlstudies using validation data, Journal of Clinical Epidemiology, 50(1),15—19, 1997.Morrissey, M., and D. Spiegelman, Matrix methods for estimating oddsratios with misclassified exposure data: extensions and comparisons, Biometrics, 55(2), 338—344, 1999.Newell, D., Errors in the interpretation of errors in epidemiology, AmericanJournal of Public Health, 5, 1925—1928, 1962.Prescott, G., and P. Garthwaite, A simple Bayesian analysis of misclassifiedbinary data with a validation substudy, Biometrics, 58(2), 454—458, 2002.Raftery, A., and S. Lewis, How many iterations in the Gibbs sampler,Bayesian Statistics,4,763—773, 1992.Rahme, E., L. Joseph, and T. Gyorkos, Bayesian sample size determinationfor estimating binomial parameters from data subject to misclassification,Applied Statistics, 49(1), 119—128, 2000.Rizzo, M., Statistical Computing with R, Taylor & Francis, Inc., 2007.Robert, C., Convergence control methods for Markov chain Monte Carloalgorithms, Statistical Science, 10(3), 231—253, 1995.82Chapter 6. BibliographyRothman, K., S. Greenland, and T. Lash, Modern Epidemiology, LippincottWilliams & Wilkins, 2008.Shanno, D., Conditioning of quasi-Newton methods for function minimization, Mathematics of Computation, 2 (111), 647—656, 1970.Thomas, D., When will nondifferential misclassification of an exposure preserve the direction ofatrend?, American Journal of Epidemiology, .L4.2(7),782—784, 1995.Walter, S., and L. Irwig, Estimation of test error rates, disease prevalenceand relative risk from misclassified data: a review., Journal of ClinicalEpidemiology, 41(9), 923—937, 1988.Willett, W., An overview of issues related to the correction of nondifferential exposure measurement error in epidemiologic studies, Statistics in Medicine, 8(9), 1031—1040, 1989.Youden, W., Index for rating diagnostic tests, Cancer, 8(1), 32—35, 1950.83
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Evaluating the performance of hypothesis testing in...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Evaluating the performance of hypothesis testing in case-control studies with exposure misclassification,.. Karim, Mohammad Ehsanul 2009-12-31
pdf
Page Metadata
Item Metadata
Title | Evaluating the performance of hypothesis testing in case-control studies with exposure misclassification, using frequentist and Bayesian techniques |
Creator |
Karim, Mohammad Ehsanul |
Publisher | University of British Columbia |
Date | 2009 |
Date Issued | 2010-03-24T20:15:00Z |
Description | In epidemiologic studies, measurement error in the exposure variable can have large effects on the power of hypothesis testing for detecting the impact of exposure in the development of a disease. As it distorts the structure of data, more uncertainty is associated with the inferential procedure involving such exposure variables. The underlying theme of this thesis is the adjustment for misclassification in the hypothesis testing procedure. We consider problems involving a correctly measured binary response and a misclassified binary exposure variable in a retrospective case-control scenario. We account for misclassification error via validation data under the assumption of non-differential misclassification. The objective here is to develop a test to check whether the exposure prevalence rates of cases and controls are the same or not, under the frequentist and Bayesian point of view. To evaluate the test developed under the Bayesian approach, we compare that with an equivalent test developed under the frequentist approach. Both these approaches were developed in two different settings: in the presence or absence of validation data, to evaluate whether there is any gain in hypothesis testing for having such validation data. The frequentist approach involves the likelihood ratio test, while the Bayesian test is developed from posterior distribution generated by a mixed MCMC algorithm and a normal prior under realistic assumptions. The comparison between these two approaches is conducted using different simulated scenarios, as well as two real case-control studies having partial validation (internal) data. Different scenarios include settings with varying sensitivity and specificity, sample sizes, exposure prevalence and proportion of unvalidated and validated data. One other scenario that was considered is to evaluate the performance under a fixed budgetary constraint. In the scenarios under consideration, we reach the same conclusion from the two hypothesis testing procedures. The simulation study suggests that the adjusted model (with validation data model) is always better than the unadjusted model (without validation data model). However, exception is possible in the fixed budget scenario. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Collection |
Electronic Theses and Dissertations (ETDs) 2008+ |
Date Available | 2010-03-24 |
Provider | Vancouver : University of British Columbia Library |
DOI | 10.14288/1.0069386 |
URI | http://hdl.handle.net/2429/22472 |
Degree |
Master of Science - MSc |
Program |
Statistics |
Affiliation |
Science, Faculty of Statistics, Department of |
Degree Grantor | University of British Columbia |
Graduation Date | 2009-11 |
Campus |
UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |
Download
- Media
- ubc_2009_fall_karim_mohammad.pdf [ 1.65MB ]
- [if-you-see-this-DO-NOT-CLICK]
- [if-you-see-this-DO-NOT-CLICK]
- Metadata
- JSON: 1.0069386.json
- JSON-LD: 1.0069386+ld.json
- RDF/XML (Pretty): 1.0069386.xml
- RDF/JSON: 1.0069386+rdf.json
- Turtle: 1.0069386+rdf-turtle.txt
- N-Triples: 1.0069386+rdf-ntriples.txt
- Original Record: 1.0069386 +original-record.json
- Full Text
- 1.0069386.txt
- Citation
- 1.0069386.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Country | Views | Downloads |
---|---|---|
China | 21 | 26 |
United States | 18 | 0 |
Canada | 10 | 1 |
Russia | 6 | 0 |
Italy | 6 | 0 |
United Kingdom | 2 | 0 |
Germany | 2 | 0 |
Unknown | 2 | 0 |
France | 2 | 0 |
India | 1 | 0 |
Norway | 1 | 0 |
City | Views | Downloads |
---|---|---|
Shenzhen | 17 | 15 |
Unknown | 8 | 0 |
Saint Petersburg | 6 | 0 |
Milan | 6 | 0 |
Ashburn | 6 | 0 |
Vieux-Saint-Laurent | 5 | 0 |
Guangzhou | 4 | 0 |
Pittsburg | 4 | 0 |
Cheyenne | 4 | 0 |
Kingston | 3 | 0 |
Washington | 2 | 0 |
London | 2 | 0 |
University Park | 1 | 0 |
{[{ mDataHeader[type] }]} | {[{ month[type] }]} | {[{ tData[type] }]} |
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.24.1-0069386/manifest