UBC Faculty Research and Publications

Bayesian adjustment for measurement error in continuous exposures in an individually matched case-control… Espino-Hernandez, Gabriela; Gustafson, Paul; Burstyn, Igor May 14, 2011

RESEARCH ARTICLE Open Access

Bayesian adjustment for measurement error in continuous exposures in an individually matched case-control study

Gabriela Espino-Hernandez1, Paul Gustafson1* and Igor Burstyn2

Abstract

Background: In epidemiological studies explanatory variables are frequently subject to measurement error. The aim of this paper is to develop a Bayesian method to correct for measurement error in multiple continuous exposures in individually matched case-control studies. This is a topic that has not been widely investigated. The new method is illustrated using data from an individually matched case-control study of the association between thyroid hormone levels during pregnancy and exposure to perfluorinated acids. The objective of the motivating study was to examine the risk of maternal hypothyroxinemia due to exposure to three perfluorinated acids measured on a continuous scale. Results from the proposed method are compared with those obtained from a naive analysis.

Methods: Using a Bayesian approach, the developed method considers a classical measurement error model for the exposures, as well as the conditional logistic regression likelihood as the disease model, together with a random-effect exposure model. Proper and diffuse prior distributions are assigned, and results from a quality control experiment are used to estimate the perfluorinated acids’ measurement error variability. As a result, posterior distributions and 95% credible intervals of the odds ratios are computed. A sensitivity analysis of the method’s performance in this particular application with different measurement error variability was performed.

Results: The proposed Bayesian method to correct for measurement error is feasible and can be implemented using statistical software.
For the study on perfluorinated acids, a comparison of the inferences which are corrected for measurement error to those which ignore it indicates that little adjustment is manifested for the level of measurement error actually exhibited in the exposures. Nevertheless, a sensitivity analysis shows that more substantial adjustments arise if larger measurement errors are assumed.

Conclusions: In individually matched case-control studies, the use of conditional logistic regression likelihood as a disease model in the presence of measurement error in multiple continuous exposures can be justified by having a random-effect exposure model. The proposed method can be successfully implemented in WinBUGS to correct individually matched case-control studies for several mismeasured continuous exposures under a classical measurement error model.

* Correspondence: gustaf@stat.ubc.ca
1 Department of Statistics, University of British Columbia, Vancouver, BC, Canada
Full list of author information is available at the end of the article

Espino-Hernandez et al. BMC Medical Research Methodology 2011, 11:67
http://www.biomedcentral.com/1471-2288/11/67

© 2011 Espino-Hernandez et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

Measurement error refers to the variation of the observed measurement from the true value, and consists of two components, random error and systematic error. The first component, the random error, is caused by any factors that randomly affect the measurement across a sample, and usually arises from inaccuracy in a measuring laboratory instrument or random fluctuations in the environmental conditions. The second error component, the systematic error, is caused by any factors that systematically affect the measurement across a sample, and can be attributed to non-random problems in the system of measurement (e.g. wrong use or improper calibration of the measurement instrument).

In many scientific areas where statistical analysis is performed, the problem of dealing with explanatory variables subject to measurement error is present. In particular, in epidemiologic studies, the explanatory variables (or ‘exposures’) that reflect exposure to suspected risk factors associated with a disease (the outcome variable) are commonly measured with error. These errors can be either differential or non-differential, according to whether they depend on the values of other variables in the study, for instance the outcome variable [1,2]. As has been discussed by many authors [3-6], measurement error reduces the power to detect relationships between exposures and disease, and ignoring this error may bias the assessment of the association between health outcome and exposure variables. In particular, ordinary logistic regression can lead to biased estimates of odds ratios (ORs) when the covariates are subject to measurement error [7]. Researchers have proposed non-Bayesian methods to correct for measurement error in exposures in individually matched case-control studies. For instance, Guolo et al. [8] used conditional likelihood methods to correct for measurement error in a single continuous exposure using simulated data. These authors compared the performance of the likelihood methods with two other correction techniques (regression calibration [7] and simulation-extrapolation (SIMEX) [9]), observing that the likelihood approach outperforms the alternative methods when a single continuous exposure is measured with error. McShane et al. [5] proposed a conditional scores procedure to correct for measurement error in some components of one or more continuous covariates.
In that study, the authors treated the true covariates as fixed unknown parameters, which were removed from the likelihood by conditioning on a sufficient statistic and estimated together with the unknown parameters. However, the conditional scores procedure experienced convergence problems in the presence of large relative risks or when large measurement errors were considered. Also, conditional scores procedures are typically not very generalizable when data structures are changed even slightly. In addition, Liu et al. [6], Prescott and Garthwaite [10], and Rice [11] proposed Bayesian adjustments for misclassification of a binary exposure variable. Nevertheless, to our knowledge, very little attention has been given to measurement error in multiple continuous exposures in matched case-control studies, except for McShane et al. [5] whose procedure may be challenging numerically, and which is quite dependent on the settings of the problem.

Thus, in this paper, we develop a Bayesian method to correct for measurement error in multiple continuous exposures in individually matched case-control studies that may be generalized to different settings, where information regarding the measurement error variability is available from additional experiments. The methodology is illustrated using data from a study of association of perfluorinated acids (PFAs) with disruption of thyroid homeostasis in pregnant women [12]. PFAs are global contaminants of human blood and environment [13]. The objective of the motivating study was to examine the risk of maternal hypothyroxinemia due to exposure to three PFAs measured on a continuous scale. No human study has previously examined the influence of PFAs on the development of hypothyroxinemia, but there are reports on the relationship between PFAs and thyroid hormones and thyroid disease. In a sample from the US general population from 1999 to 2006, both men and women exposed to some PFAs had higher prevalence of physician-diagnosed thyroid disease [14].
However, a small study in a highly contaminated community failed to find a similar association [15], and two other studies also did not report associations [16,17]. Dallaire et al. [18] reported a mixture of negative and positive associations of thyroid hormones with PFAs. The extent to which measurement error may contribute to apparent heterogeneity among these reports is unknown, but it certainly should be considered as an explanation.

We start this paper by describing the data in the motivating example in detail, followed by derivation of an estimate of the random error variability from percent recovery experiments, description of the proposed Bayesian model and justification of the conditional logistic disease model for measurement error correction. Next, application of the method is illustrated along with a sensitivity analysis of the impact on the results if greater-than-estimated random error was present. The proposed Bayesian method is implemented in WinBUGS software and inferences are compared to those drawn from a naive analysis, which ignores measurement errors in the exposures.

Methods

Data

The developed Bayesian method is illustrated using individually matched case-control data from a study of Chan et al. [12]. The objective of the study was to examine the risk of maternal hypothyroxinemia due to exposure to three PFAs: perfluorooctanoic acid (PFOA), perfluorooctane sulfonate (PFOS) and perfluorohexane sulfonate (PFHxS). Chan et al. [12] extracted PFAs from maternal sera samples from 271 pregnant women, aged 18 or older, who elected to undergo a second trimester prenatal “triple screen”, delivered at 22 weeks gestation or more to live singletons without evidence of malformations, and were referred by a physician who made at least eight recommendations for the “triple screen” over the study period.
The exposure variables were reported on a continuous scale, and censored/non-detectable values (about 5.4% of the total number of records) were recorded as half the value of the limit of detection. Concentrations of the PFAs were transformed to log-molar units, and it was seen that after this transformation the measured exposures approximately follow a normal distribution. A quality control experiment was performed in order to assess the amount of error incurred in the measurement of the exposures. In this experiment, percentages of recovery were calculated for each exposure and the results revealed the presence of a random error in the measurements. Details of this procedure and results are presented in Appendix I.

Chan et al. [12] classified the subjects into cases or controls, based on the analysis of their thyroid stimulating hormone (TSH) and free thyroxin (T4) concentrations. The hypothyroxinemia cases correspond to women exhibiting normal TSH concentrations with no evidence of hyperthyroidism (between 0.15 and 4.0 mU/L) and free T4 in the 10th percentile (less than 8.8 pmol/L). Meanwhile, the controls correspond to women with normal TSH concentrations but having free T4 concentrations between the 50th and 90th percentiles (between 12.0 and 14.1 pmol/L). Each case was matched to between one and three controls on the basis of two matching factors: maternal age at blood draw (± 3 years) and referring physician (a total of 29 physicians). Further details on the construction of the data can be found in Chan et al. [12].

In summary, the matched case-control data used to illustrate the Bayesian method to correct for measurement error contain information from 96 cases and 175 individually matched controls. For the purpose of this paper, it is assumed there is no misclassification of control/case status.
In addition, the data contain, for each subject, the corresponding exposure to PFOA, PFOS and PFHxS, which are reported on a continuous scale in log-molar units and are assumed to be subject only to random measurement error. Moreover, four potential confounders which are precisely measured are reported: maternal age (years), maternal weight (pounds), maternal race (Caucasian and non-Caucasian) and gestational age (days). All potential confounders except for maternal race were reported on a continuous scale. The maternal age variable is retained despite its use as a matching factor, in case the matching is too coarse to fully eliminate confounding.

Measurement model

Generally, in observational studies, a vector of imprecise surrogate exposures W is recorded instead of the vector of true exposures X itself. Therefore, in order to understand the relationship between the disease risk and the explanatory variables X, having data on Y and W, it is necessary to account for measurement error in the exposures. In this paper, the attention is concentrated on the problem of having only random error, by assuming zero systematic error. However, the present methodology can be adapted to introduce the effect of a systematic error.

Assume the vector of independent surrogates W arises from a classical additive measurement error model, which can be expressed as

$$W = X + U, \qquad (1)$$

where U refers to the measurement error component. This classical model assumes the true exposures are recorded with an additive, independent error. In addition, it can be assumed the measurement error is non-differential and unbiased. The assumption of non-differential measurement error refers to the fact that the distribution of the surrogate exposures depends only on the actual exposure variables and not on the response variable or other variables in the model. As a result, the conditional distribution of (W|X,Y) is identical to the conditional distribution of (W|X). The unbiased assumption E(U|X) = 0 implies E(W|X) = X.
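As a rough illustration of model (1), the following minimal sketch simulates a single exposure on a log scale and adds an independent error term. It is only a sketch under stated assumptions: the error is taken to be Gaussian, and the function name `add_classical_error`, the seed, the exposure mean and the standard deviations are all hypothetical choices, not values from the study. It also shows that an additive error on the log scale corresponds to a positive multiplicative factor exp(U) on the original scale.

```python
import math
import random

def add_classical_error(true_log_exposures, error_sd, rng):
    """Return surrogates W = X + U, with U ~ N(0, error_sd**2) drawn independently of X."""
    return [x + rng.gauss(0.0, error_sd) for x in true_log_exposures]

rng = random.Random(2011)  # fixed seed so the sketch is reproducible

# Hypothetical true exposures on a log scale (illustrative values, not study data).
x_log = [rng.gauss(-8.0, 0.5) for _ in range(5000)]
w_log = add_classical_error(x_log, error_sd=0.2, rng=rng)

# Unbiasedness: the average error W - X should be close to zero.
mean_error = sum(w - x for w, x in zip(w_log, x_log)) / len(x_log)

# On the original scale the same error acts as a positive multiplicative factor exp(U).
factors = [math.exp(w) / math.exp(x) for w, x in zip(w_log, x_log)]

print(f"mean additive error on the log scale: {mean_error:.4f}")
print(f"range of multiplicative factors: {min(factors):.3f} to {max(factors):.3f}")
```

Because U is generated without reference to any outcome variable, the simulated error is also non-differential by construction.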
Typically, the measurement error component is also assumed to be normally distributed with constant variance, i.e. $U \sim N_P(0, \Sigma)$, where $\Sigma$ is a diagonal matrix with the main diagonal entries given by $\Sigma_{pp} = \sigma_p^2$, for $p = 1, \ldots, P$.

Under the stated assumptions, W|X follows a P-dimensional multivariate normal distribution with a mean vector given by the vector of true exposures and a covariance matrix $\Sigma$, which in this case is known. Thus, the measurement model is given by $W \mid X, \Sigma \sim N_P(X, \Sigma)$. Therefore, under the assumptions of individually matched case-control data, the density of the measurement model is given by

$$\prod_{i=1}^{N} \prod_{j=1}^{n_i} f(w_{ij} \mid x_{ij}, \Sigma), \qquad (2)$$

where $w_{ij}$ and $x_{ij}$ correspond to the vectors of surrogate and true exposure variables, respectively, for the j-th member of the i-th matched set, and N refers to the number of matched sets.

For the particular case of data used in the study on PFAs, the surrogate variables are measured concentrations of PFOA, PFOS, and PFHxS, which correspond to the exposures to the compounds reported on a continuous scale in log-molar units. Consequently, an additive measurement error model for the exposures in log-molar units translates into a multiplicative error structure, in which the corresponding error term is proportional to the true exposure in molar scale. In many epidemiological studies, positive explanatory variables are subject to this sort of measurement error. Using available validation data from the quality control procedure performed by Chan et al. [12], the covariance matrix $\Sigma$ of the measurement model can be estimated. In Appendix I we present a statistical argument for estimating $\Sigma$ from this particular form of quality control data.
The argument is based on the multivariate version of the delta method [19] and uses the estimated standard deviation of the percentages of recovery for the concentrations of the three compounds in parts-per-billion to obtain information about the incurred error in the measurement of the exposures.

Disease model

In order to describe a relationship between the true exposures and the probability associated with the response variable, it is necessary to specify a disease model. Since the study analysed in this paper involves matched sets, the conditional logistic regression likelihood is adopted.

Consider a study having N matched sets, such that the j-th member ($j = 1, \ldots, n_i$) of the i-th set ($i = 1, \ldots, N$) has P associated continuous exposures $X_{ij} = (X_{ij1}, \ldots, X_{ijP})^T$. In addition, let $Y_i = (Y_{i1}, \ldots, Y_{in_i})^T$ be a vector of response variables associated with the i-th matched set, such that $Y_{ij} = 1$ for the cases and $Y_{ij} = 0$ for the controls. Without loss of generality, the subjects can be labelled such that $Y_{i1} = 1$ and $Y_{ij} = 0$, for $j = 2, \ldots, n_i$. Thus, the underlying objective is to model the retrospective probabilities for the case (i.e. $P(X_{i1} \mid Y_{i1} = 1)$) and the controls (i.e. $P(X_{ij} \mid Y_{ij} = 0)$, for $j = 2, \ldots, n_i$), which can be accomplished by using the conditional logistic regression model.

The conditional likelihood is obtained by conditioning on the number of cases in each matched set, i.e. conditioning on $\sum_{j=1}^{n_i} Y_{ij}$. In the particular case of individual matching, the number of cases is one; therefore, the conditional likelihood for the i-th matched set ($i = 1, \ldots, N$) is given by [20,21]

$$L_{C_i}(\beta) = f\Big(y_i = (1, 0, \ldots, 0) \,\Big|\, x_i, \sum_{j=1}^{n_i} y_{ij} = 1\Big) = \frac{\exp(\beta^T X_{i1})}{\sum_{j=1}^{n_i} \exp(\beta^T X_{ij})},$$

where $\beta = (\beta_1, \ldots, \beta_P)^T$ corresponds to the log ORs associated with unit changes in each of the exposures. Under the assumptions of a matched case-control study, the full conditional likelihood is the product of $L_{C_i}(\beta)$ over the N strata or matched sets, which is

$$L_C(\beta) = \prod_{i=1}^{N} L_{C_i}(\beta). \qquad (3)$$

The parameter $\beta$ is assumed to be constant across matched sets, and it is the target of statistical inference.

Bayesian model

Consider retrospectively collected matched case-control data where each case is matched to one or more controls based on suspected confounders as matching factors. Let Y be the response variable, such that Y = 1 for cases and Y = 0 for controls, let $X = (X_1, \ldots, X_P)^T$ be the P-dimensional vector of the true, latent, continuous exposures which are subject to measurement error, and let $W = (W_1, \ldots, W_P)^T$ be the P-dimensional vector of surrogate exposures.

The aim of this subsection is to develop a Bayesian method to understand the association between the vector of continuous exposures X and the probability of the response variable Y, after correcting for random measurement error in the exposures.

Under the Bayesian paradigm, the posterior density of the unknown quantities is given by

$$f(x, \theta \mid w, y) \propto f(x, w \mid y, \theta)\, f(\theta), \qquad (4)$$

where $\theta$ refers to the vector of unknown parameters. The first term on the right hand side of (4) refers to the joint distribution of the true exposures X and the surrogate variables W, given the response and the parameters. As will be shown in Appendix II, this term contains the densities of the measurement model, disease model and exposure model. Meanwhile, the second term corresponds to the prior distribution of the unknown parameters.

Exposure model

The conditional logistic regression model has been successfully applied in matched retrospective case-control studies, and the use of this procedure has been statistically justified using Bayesian (see for example [11,22,23]) and non-Bayesian (see for instance [20,21]) approaches. This justification is based on the fact that the likelihood term describing the distribution of the total number of cases within-stratum given the exposures can be discarded. The reason for this is that it does not provide information about the parameter of interest, since the likelihood is only a function of the unknown parameter $\beta$.
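The conditional likelihood in equation (3) can be sketched in a few lines. The example below is an illustrative sketch rather than the authors' WinBUGS implementation: the function names and the toy exposure values are hypothetical, each matched set is a list of exposure vectors with the case listed first, and a log-sum-exp shift is used purely for numerical stability.

```python
import math

def set_log_likelihood(beta, exposures):
    """Log of L_Ci(beta) for one matched set; exposures[0] belongs to the case."""
    scores = [sum(b * x for b, x in zip(beta, row)) for row in exposures]
    top = max(scores)  # shift by the largest score before exponentiating
    log_denominator = top + math.log(sum(math.exp(s - top) for s in scores))
    return scores[0] - log_denominator

def conditional_log_likelihood(beta, matched_sets):
    """Log of L_C(beta): the sum of the per-set log-likelihoods."""
    return sum(set_log_likelihood(beta, s) for s in matched_sets)

# Toy data with P = 3 exposures (made-up numbers, not study data).
matched_sets = [
    [[0.2, -0.1, 0.4], [0.0, 0.3, 0.1]],                    # one case, one control
    [[-0.5, 0.2, 0.0], [0.1, 0.1, 0.1], [0.3, -0.2, 0.2]],  # one case, two controls
]

null_value = conditional_log_likelihood([0.0, 0.0, 0.0], matched_sets)
print(null_value)
```

With all coefficients equal to zero, every member of a set is equally likely to be the case, so each set contributes log(1/n_i); this gives a quick sanity check on the implementation.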
However, this justification is no longer directly applicable when adjusting for measurement error in exposures, since the omitted likelihood term might also contain information about these exposures [6], i.e., the likelihood is a function of $\beta$ and the unobserved exposures. As a result, the use of a conditional likelihood approach in the presence of measurement error in multiple continuous exposures has not been widely adopted.

We justify the use of conditional logistic regression likelihood as a disease model when adjusting for measurement error in an individually matched case-control study via a random-effect exposure model; details are presented in Appendix II. A different approach that does not involve a random-effect exposure model is provided by Guolo et al. [8].

In order to describe the random-effect exposure model, we assume that the vector of exposures for the j-th subject from the i-th matched set follows a multivariate normal distribution around the vector of exposure means of the corresponding matched set. Moreover, since the vector of exposure means of a matched set is unknown, we assume that this vector follows a multivariate normal distribution centered on the across-set exposure means. That results in

$$X_{ij} = \mu_i + \gamma_{ij} \quad \text{and} \quad \mu_i = \bar{\mu} + \lambda_i,$$

where $\gamma_{ij} \overset{iid}{\sim} N_P(0, V_W)$ and $\lambda_i \overset{iid}{\sim} N_P(0, V_B)$, such that $\gamma_{ij}$ and $\lambda_i$ are mutually independent. Also notice that the within-stratum covariance matrix $V_W$ is assumed to be constant across matched sets. As a result, $X_{ij} \mid \mu_i \overset{iid}{\sim} N_P(\mu_i, V_W)$ and $\mu_i \overset{iid}{\sim} N_P(\bar{\mu}, V_B)$. Thus, the density corresponding to the random-effect exposure model is given by

$$f(x \mid \mu_i, \bar{\mu}, V_W, V_B) = \prod_{i=1}^{N} \prod_{j=1}^{n_i} f(x_{ij} \mid \mu_i, V_W) \times \prod_{i=1}^{N} f(\mu_i \mid \bar{\mu}, V_B). \qquad (5)$$

It has been assumed that the vector of true exposures follows a P-dimensional multivariate normal distribution. However, in observational studies, exposures often have a skew distribution [24].
Therefore, it is important to keep in mind that incorrect model specification may lead to biased estimates. To overcome potential misspecifications, for the univariate case some authors [24-26] have proposed the use of flexible distributions to increase the robustness to model specification. However, implementation of such methods can be quite challenging in the context of multivariate exposures.

Joint posterior density

For the particular case of this paper, the data considered consist of N = 96 matched sets, such that the i-th matched set has $j = 1, \ldots, n_i$ subjects, with $n_i \in \{2, 3, 4\}$. Thus one subject per set is the case and the remaining $n_i - 1$ subjects are the controls. Let $\theta = (\beta, \mu_i, \bar{\mu}, V_W, V_B)$ be the vector of unknown parameters. Therefore, by (4) and (AII.3), and using the densities in (2), (3) and (5), it follows that the posterior density of the unknown quantities can be expressed as

$$f(x, \theta \mid w, y) \propto \prod_{i=1}^{96} \left\{ \prod_{j=1}^{n_i} f(w_{ij} \mid x_{ij}, \Sigma) \times \frac{\exp(\beta^T X_{i1})}{\sum_{j=1}^{n_i} \exp(\beta^T X_{ij})} \times \prod_{j=1}^{n_i} f(x_{ij} \mid \mu_i, V_W) \times f(\mu_i \mid \bar{\mu}, V_B) \right\} \times f(\beta, \bar{\mu}, V_W, V_B).$$

It is commonly assumed that the unknown parameters are independent of each other a priori, so that $f(\beta, \bar{\mu}, V_W, V_B) = f(\beta) \times f(\bar{\mu}) \times f(V_W) \times f(V_B)$. In order to carry out Bayesian inference via Markov chain Monte Carlo (MCMC), it is necessary to assume prior distributions for the unknown parameters. Proper prior distributions are assumed for all the parameters. Moreover, the corresponding hyperparameters are chosen so that the priors reflect prior ignorance:

$$\beta \sim N_P(0, 10000 I_P),$$
$$\bar{\mu} \sim N_P(\mu, 10000 I_P),$$
$$V_W^{-1} \sim W_P(I_P, P + 2),$$
$$V_B^{-1} \sim W_P(I_P, P + 2),$$

where $W_P(R, b)$ indicates a P-dimensional Wishart distribution with a positive definite inverse scale matrix R and b degrees of freedom, and $I_P$ is an identity matrix of size P.
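To make the structure of the joint posterior concrete, the sketch below evaluates the logarithm of the bracketed term for a single matched set. It is a deliberately simplified illustration and not the authors' code: it assumes a single exposure (P = 1) so that all covariance matrices reduce to scalar variances, it omits the prior term, and the function and argument names are hypothetical.

```python
import math

def log_normal_pdf(value, mean, variance):
    """Log density of a univariate normal distribution."""
    return -0.5 * (math.log(2.0 * math.pi * variance) + (value - mean) ** 2 / variance)

def set_log_density(w, x, beta, mu_i, mu_bar, sigma2, v_within, v_between):
    """Log of the bracketed posterior term for one matched set with P = 1.

    w and x hold the surrogate and latent exposures; index 0 is the case.
    """
    # Measurement model: w_ij | x_ij ~ N(x_ij, sigma2)
    measurement = sum(log_normal_pdf(wj, xj, sigma2) for wj, xj in zip(w, x))
    # Disease model: conditional logistic term, case score over the sum of all scores
    scores = [beta * xj for xj in x]
    top = max(scores)
    disease = scores[0] - (top + math.log(sum(math.exp(s - top) for s in scores)))
    # Exposure model: x_ij | mu_i ~ N(mu_i, v_within), mu_i ~ N(mu_bar, v_between)
    exposure = sum(log_normal_pdf(xj, mu_i, v_within) for xj in x)
    set_mean = log_normal_pdf(mu_i, mu_bar, v_between)
    return measurement + disease + exposure + set_mean

example = set_log_density(
    w=[0.1, -0.2, 0.3], x=[0.0, -0.1, 0.2], beta=0.5,
    mu_i=0.05, mu_bar=0.0, sigma2=0.04, v_within=1.0, v_between=1.0,
)
print(example)
```

An MCMC sampler would sum this quantity over the matched sets, add the log prior, and update the latent exposures x alongside the parameters; in the paper that work is delegated to WinBUGS.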
For the particular case of the matched case-control data from the epidemiological study on PFAs, P = 3 and $\mu$ is estimated using the across-set sample mean of the corresponding observed exposures.

Adjustment for additional confounders

Considering the possibility that confounding is only partially addressed by matching, further potential confounders can be introduced in the disease model. In general, potential confounders should also be included in the exposure model; however, for simplicity these confounders are not considered in our random-effect exposure model, keeping it as presented in equation (5). For the case of the PFA data, this simplification might be justified by the fact that the exposures and the confounders exhibit small correlations (less than 0.18), so we do not expect the potential confounders to be very helpful in reconstructing the true exposures. In addition, due to the assumption of non-differential measurement error, the measurement model also remains as presented in equation (2).

Consider the situation where the j-th member of the i-th set has K associated potential confounding variables $Z_{ij} = (Z_{ij1}, \ldots, Z_{ijK})^T$ which are precisely measured. Therefore, the full conditional likelihood corresponding to the disease model in (3) can be rewritten as

$$L_C(\beta, \delta) = \prod_{i=1}^{N} \frac{\exp(\beta^T X_{i1} + \delta^T Z_{i1})}{\sum_{j=1}^{n_i} \exp(\beta^T X_{ij} + \delta^T Z_{ij})},$$

where $\delta = (\delta_1, \ldots, \delta_K)^T$ is the vector of parameters associated with the confounding effect.

Thus, the posterior density of the unknown quantities can be rewritten as

$$f(x, \theta \mid w, y, z) \propto \prod_{i=1}^{96} \left\{ \prod_{j=1}^{n_i} f(w_{ij} \mid x_{ij}, \Sigma) \times \frac{\exp(\beta^T X_{i1} + \delta^T Z_{i1})}{\sum_{j=1}^{n_i} \exp(\beta^T X_{ij} + \delta^T Z_{ij})} \times \prod_{j=1}^{n_i} f(x_{ij} \mid \mu_i, V_W) \times f(\mu_i \mid \bar{\mu}, V_B) \right\} \times f(\beta, \delta, \bar{\mu}, V_W, V_B),$$

where $\theta = (\beta, \delta, \mu_i, \bar{\mu}, V_W, V_B)$ is the new vector of unknown parameters, and a proper and diffuse prior distribution is assumed for the parameter $\delta$, by having $\delta \sim N_K(0, 10000 I_K)$, where K = 4 for the particular case of the motivating example.

Results and Discussion

In this section, the proposed Bayesian method to correct for measurement error is illustrated using data from the study of Chan et al. [12]. Inferences drawn from a naive analysis and an analysis correcting for measurement error are presented. The naive analysis ignores error in exposure measurements, by pretending the observed exposures (PFOA, PFOS, and PFHxS) are precisely measured. Meanwhile, in the analysis accounting for measurement error, the surrogate exposures are corrected for random measurement error. Two models are considered in each analysis: a simple model assuming the only confounding is via matching factors, and a model adjusted by four further potential confounders (maternal age, maternal weight, maternal race, and gestational age).
In summary, the results from four Bayesian models are compared: a simple model under the naive analysis (N-S), a model adjusted for confounders under the naive analysis (N-A), a simple model under the measurement error analysis (ME-S), and a model adjusted for confounders under the measurement error analysis (ME-A).

The models are implemented in WinBUGS software, version 1.4.3 [27], which is freely distributed and can be downloaded from the web [28]. Our WinBUGS code is available (Additional file 1). The analysis of the results was carried out using the statistical package R, version 2.11.1, which is also freely distributed on the web [29]. Two MCMC chains of length 55,000 were run for each model, using different initial values. The first 5,000 “burn-in” iterations were discarded from each chain and the last 50,000 MCMC iterations were used to perform Bayesian statistical inference. The computer running times on an Intel Core 2 Duo CPU at 2.10 GHz with 3.00 GB of RAM for N-S and N-A were approximately 1.5 and 4 minutes, respectively. Meanwhile, running times for ME-S and ME-A were about 9 and 13 minutes, respectively. The convergence to the posterior distributions and mixing of the two chains were assessed from the trace, autocorrelation, and Gelman-Rubin convergence statistic plots. Moreover, under both types of analysis the estimated Monte Carlo standard errors of the posterior log ORs were smaller than 0.0026 for the simple models (N-S and ME-S) and smaller than 0.0030 for the models adjusted by the confounding variables (N-A and ME-A).

Posterior means and 95% equal-tailed credible intervals of the ORs obtained for the models under the naive Bayesian analysis and the proposed Bayesian method to correct for measurement error are presented in Table 1. This table indicates that under the two types of analysis, there is an absence of evidence for an association between the risk of maternal hypothyroxinemia and exposure to the PFAs.
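For readers unfamiliar with how such summaries are obtained from MCMC output, the following sketch turns posterior draws of a log OR into a posterior mean and a 95% equal-tailed credible interval for the OR. It is illustrative only: the draws are simulated placeholders rather than output from the models above, the function name is hypothetical, and the quantiles are taken from simple order statistics.

```python
import math
import random

def summarise_odds_ratio(log_or_draws, level=0.95):
    """Posterior mean and equal-tailed credible interval of exp(beta) from draws of beta."""
    ors = sorted(math.exp(b) for b in log_or_draws)
    n = len(ors)
    tail = (1.0 - level) / 2.0
    lower = ors[int(math.floor(tail * (n - 1)))]        # lower-tail order statistic
    upper = ors[int(math.ceil((1.0 - tail) * (n - 1)))]  # upper-tail order statistic
    return sum(ors) / n, lower, upper

rng = random.Random(67)
# Placeholder draws standing in for post burn-in MCMC samples of one log OR.
draws = [rng.gauss(-0.1, 0.15) for _ in range(20000)]

post_mean, lower, upper = summarise_odds_ratio(draws)
print(f"OR posterior mean {post_mean:.3f}, 95% interval ({lower:.3f}, {upper:.3f})")
```

Exponentiating each draw before averaging gives the posterior mean of the OR itself, which generally differs from the exponential of the posterior mean of the log OR.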
The point and credible interval estimates of the ORs under both analyses are very similar, suggesting that only a slight adjustment is manifested for the level of measurement error exhibited in the PFAs. However, a priori, there was no intuition that the adjustment would necessarily be slight.

Table 1 Comparison of posterior means and credible intervals of the ORs

Exposure   OR      95% Cred. Int.    Adjusted OR   95% Cred. Int.
Naive analysis
PFOA       0.905   (0.661, 1.209)    0.828         (0.584, 1.127)
PFOS       0.802   (0.495, 1.214)    0.752         (0.445, 1.181)
PFHxS      1.315   (0.964, 1.755)    1.302         (0.934, 1.779)
Measurement error analysis
PFOA       0.904   (0.656, 1.212)    0.821         (0.568, 1.131)
PFOS       0.794   (0.482, 1.221)    0.743         (0.431, 1.191)
PFHxS      1.333   (0.960, 1.816)    1.329         (0.938, 1.856)

Posterior means and 95% equal-tailed credible intervals of the ORs for the simple model and the model adjusted for confounding variables. Results are presented for the Bayesian naive analysis and for the Bayesian method proposed to correct for measurement error.

Figure 1 presents the posterior densities and 95% credible intervals of the ORs for the simple model and the model adjusted by confounders, both under the measurement error analysis. Plots indicate there is no substantial association between the exposure to any PFA and maternal hypothyroxinemia. In addition, plots show a wider posterior distribution of the OR for the exposure to PFHxS, suggesting a bigger uncertainty in the risk associated with this exposure. Moreover, the posterior distributions for the simple model and the model after adjusting for the confounding variables are quite similar, in particular for the exposures PFOS and PFHxS, suggesting little or no further confounding effect.
This conclusion is confirmed after calculating 95% equal-tailed credible intervals of the estimated parameter $\delta$ (not shown).

A sensitivity analysis of the measurement error variability $\Sigma$ estimated in Appendix I is carried out in order to validate the performance of the method. Figure 2 gives posterior means and 95% equal-tailed credible intervals for the ORs after increasing up to ten times the assumed measurement error variability for the exposures. Plots show that, for all the exposures, more substantial measurement error adjustments arise if larger measurement errors are assumed. Furthermore, the estimated ORs move in the anticipated directions, i.e., away from the null. However, in all cases the credible intervals widen, so that they still include the value of one, providing little evidence of any association between exposures and the outcome, regardless of the assumed measurement error magnitudes. The aforementioned MCMC diagnostics indicated that MCMC convergence and mixing worsened slightly as the assumed measurement error magnitude increased. However, chains of length 55,000, with the first 5,000 iterations used as a “burn-in” period, were still adequate.

Conclusions

We propose a Bayesian method to correct for measurement error in multiple continuous exposures for individually matched case-control studies. This method assumes a classical measurement model in order to account for random error in the exposures. It uses the conditional logistic regression likelihood as a disease model. We justify the use of this model in the presence of measurement error in the exposures by having a random-effect exposure model.

The proposed method can be implemented in WinBUGS software, which manages the computational complexity associated with likelihood-based approaches, to which Guolo et al. [8] referred. Moreover, as was pointed out by Guolo et al.
[8], the likelihood-based methods, such as Bayesian and maximum-likelihood methods, perform well under different measurement error structures, can provide accurate inferential results, and outperform other correction techniques (regression calibration and SIMEX). Furthermore, unlike the method proposed by McShane et al. [5] to correct for measurement error in continuous exposures, the Bayesian method proposed in this paper is neither prone to convergence errors nor highly dependent on the settings of a particular individually matched case-control study.

For the particular case of the study on PFAs, Bayesian inference of ORs indicates that little adjustment for exposure measurement error is needed for the magnitude of error determined from the quality control experiment. However, bigger adjustments arise if larger measurement errors are assumed.

Figure 1 Posterior distributions of ORs and credible intervals. [Panels: PFOA, PFOS, PFHxS; axes: odds ratio and density.] Posterior distributions (curves), and corresponding posterior means and 95% equal-tailed credible intervals (vertical lines) of ORs based on the two models under the measurement error analysis. The solid curves/lines correspond to the simple model (ME-S) and the dashed curves/lines correspond to the model adjusted for confounding variables (ME-A).

Some avenues for future research are suggested by our results. First, the method assumes a multivariate normal distribution on the exposures. However, it is important to keep in mind that a model misspecification may lead to biased estimates. In this context some authors have proposed the use of parametric and non-parametric flexible models. Nevertheless, some complications are involved in their implementations. For instance, Richardson et al.
[24] proposed using a normal mixturemodel under a Bayesian approach and found that in theabsence of validation data, their approach requires verystrong priors on the mixture parameters to obtain rea-sonable estimates. Carroll et al. [25] suggested the useof a Bayesian approach in order to avoid the compli-cated implementation of the Estimation-Maximization(EM) algorithm under a traditional frequentist analysisonce the normal mixture model is implemented into thelikelihood. Furthermore, they advised to use partiallyproper priors in order to avoid improper posteriors.Guolo [26] suggested the use of skew-normal family ofdistributions as long as this distribution is a goodapproximation of the distribution of the unobservedexposures in the case-control sampling. Generally, how-ever, the implementation of flexible exposure models formultivariate exposures remains challenging.Second, we have not made explicit comparisonsbetween our method and other methods. We have, how-ever, considered implementation issues for our methodversus others. Particularly, we considered regressioncalibration techniques which impute best-guess expo-sure values and then plug these in to the disease model.While this is a simple procedure with some data for-mats, it would be no simpler that our method in thepresent format. The imputation involves estimating E(X|W), which in turn requires estimating variance compo-nents from a multivariate random effect model appliedto unbalanced data, in order to acknowledge variationbetween and within matched sets, in a similar fashion to[5]. Thus fitting a model similar to our exposure modelis required, for which software options are somewhatlimited. 
Moreover, regression calibration requires post-fitting adjustment of standard errors, say by bootstrapping, which would be very burdensome computationally in the present setting.

Finally, using available information from the quality control experiment performed on the PFA concentrations and the multivariate version of the delta method, we present a statistical approach to estimate the measurement error variability. However, different assumptions and estimation methods can be developed in the presence of additional validation data or a different structure of quality control data. For instance, the complicated structure of the percent recovery experiments necessitated a 'plug-in' approach to dealing with the measurement error covariance matrix. Simpler data structures for informing the measurement error variance, such as a validation subsample, replicates, or an instrumental variable, would much more easily lend themselves to incorporating uncertainty about this covariance matrix as part of the overall Bayesian analysis.

Figure 2 Sensitivity analysis of measurement error variability. Posterior means and 95% equal-tailed credible intervals of the ORs for different scenarios of measurement error variability. The solid lines correspond to the simple model (ME-S) and the dashed lines to the model adjusted for confounding variables (ME-A). (Plot not reproduced. Surviving labels: an odds-ratio axis, an axis label ending in "variance" with ticks from 1 to 10, and the panel title PFHxS.)

Appendix I. Measurement error variability estimation

In reference to the epidemiological matched case-control study on PFAs, Chan et al. [12] performed a quality control procedure on the PFAs in ppb concentrations. Serum samples for subjects were divided into batches of approximately 16 per set for analysis of PFAs.
Each set had a pooled sample consisting of a paired sample of spiked serum (50 ppb of mixed standard in pooled serum) and unspiked serum (only pooled serum), besides a gold standard sample (50 ppb in methanol). Percentages of recovery were calculated by comparing the spiked concentration (i.e., the difference between the paired spiked and unspiked samples) to the gold standard sample. The results showed that the standard deviations of the percent recoveries for PFAs in ppb concentrations were: 0.157 for PFOA, 0.139 for PFOS and 0.252 for PFHxS.

Let $W^*_p$, $W^*_{p,\mathrm{spiked}}$ and $W^*_{p,\mathrm{gold}}$ be the unspiked serum, spiked serum and gold standard measurements corresponding to exposure $p$ in ppb concentration, with $p \in \{1,2,3\} = \{\mathrm{PFOA}, \mathrm{PFOS}, \mathrm{PFHxS}\}$. Using this notation, the percent recovery corresponding to exposure $p$ is given by

$$Q_p = \frac{W^*_{p,\mathrm{spiked}} - W^*_p}{W^*_{p,\mathrm{gold}}}.$$

Let $m_p$ be the factor used to convert molar units to ppb concentrations, corresponding to exposure $p$. Therefore, the percent recovery can be expressed as

$$Q_p = \frac{m_p \exp(W_{p,\mathrm{spiked}}) - m_p \exp(W_p)}{m_p \exp(W_{p,\mathrm{gold}})}, \qquad \text{(AI.1)}$$

where $W_p$, $W_{p,\mathrm{spiked}}$ and $W_{p,\mathrm{gold}}$ correspond to the unspiked serum, spiked serum and gold standard samples in log-molar concentrations, respectively.

Using the normality and homogeneity assumptions of the error component, the corresponding equation (1) for a particular exposure $p$ is equivalent to

$$W_p = X_p + \sigma_p \varepsilon_p, \quad \text{with } \varepsilon_p \sim N(0, 1).$$

Therefore, it follows that

$$W_{p,\mathrm{spiked}} = X_{p,\mathrm{spiked}} + \sigma_p \varepsilon_{p,\mathrm{spiked}}, \quad \text{with } \varepsilon_{p,\mathrm{spiked}} \sim N(0, 1),$$
$$W_{p,\mathrm{gold}} = X_{p,\mathrm{gold}} + \sigma_p \varepsilon_{p,\mathrm{gold}}, \quad \text{with } \varepsilon_{p,\mathrm{gold}} \sim N(0, 1).$$

By substituting these three equations into (AI.1), it is possible to see that

$$Q_p = \frac{m_p \exp(X_{p,\mathrm{spiked}}) \exp(\sigma_p \varepsilon_{p,\mathrm{spiked}}) - m_p \exp(X_p) \exp(\sigma_p \varepsilon_p)}{m_p \exp(X_{p,\mathrm{gold}}) \exp(\sigma_p \varepsilon_{p,\mathrm{gold}})}. \qquad \text{(AI.2)}$$

Moreover, according to the description of the samples in the quality control procedure, $X_{p,\mathrm{spiked}}$ and $X_{p,\mathrm{gold}}$, which correspond to the true spiked serum and gold standard samples in log-molar concentrations, have the following underlying structures

$$m_p \exp(X_{p,\mathrm{spiked}}) = m_p \exp(X_p) + 50, \qquad m_p \exp(X_{p,\mathrm{gold}}) = 50. \qquad \text{(AI.3)}$$

Substitution of (AI.3) into equation (AI.2) yields

$$Q_p = \frac{(50 + c_p) \exp(\sigma_p \varepsilon_{p,\mathrm{spiked}}) - c_p \exp(\sigma_p \varepsilon_p)}{50 \exp(\sigma_p \varepsilon_{p,\mathrm{gold}})},$$

where $c_p = m_p \exp(X_p)$. Therefore, the percentage of recovery corresponding to exposure $p$ is a twice-differentiable function of $T_p = (\varepsilon_p, \varepsilon_{p,\mathrm{spiked}}, \varepsilon_{p,\mathrm{gold}})^T$. Assuming $(\varepsilon_p, \varepsilon_{p,\mathrm{spiked}}, \varepsilon_{p,\mathrm{gold}})$ are independent, $T_p$ follows a trivariate normal distribution with a mean vector of zeros and a covariance matrix equal to the identity matrix. Thus, based on the multivariate delta method, the variance of the percent recovery is given by

$$\mathrm{Var}(Q_p) \cong \nabla Q_p(0)^T I_3 \nabla Q_p(0) \cong \left[\left(\frac{c_p}{50}\right)^2 + \left(1 + \frac{c_p}{50}\right)^2 + 1\right] \sigma_p^2, \qquad \text{(AI.4)}$$

where $\nabla Q_p(T_p)$ is the gradient of $Q_p$. The three terms in the bracket come from the partial derivatives of $Q_p$ at $T_p = 0$, which are $-(c_p/50)\sigma_p$ with respect to $\varepsilon_p$, $(1 + c_p/50)\sigma_p$ with respect to $\varepsilon_{p,\mathrm{spiked}}$, and $-\sigma_p$ with respect to $\varepsilon_{p,\mathrm{gold}}$. Using the results of the quality control procedure (standard deviations of the percentages of recovery for PFAs in ppb concentrations), and by taking $c_p$ as the sample average of the ppb concentrations recorded for exposure $p$ across samples, estimates for the measurement error variability for each exposure can be obtained as follows

$$\hat{\sigma}_p^2 \cong \frac{\widehat{\mathrm{Var}}(Q_p)}{\left(\frac{\hat{c}_p}{50}\right)^2 + \left(1 + \frac{\hat{c}_p}{50}\right)^2 + 1}.$$

Using that information, the estimate of the covariance matrix $\Sigma$ in (2) is given by

$$\hat{\Sigma} \approx \mathrm{diag}(\hat{\sigma}_1^2, \hat{\sigma}_2^2, \hat{\sigma}_3^2) \approx \mathrm{diag}(\hat{\sigma}_{\mathrm{PFOA}}^2, \hat{\sigma}_{\mathrm{PFOS}}^2, \hat{\sigma}_{\mathrm{PFHxS}}^2) \approx \mathrm{diag}(0.01180007, 0.007957311, 0.03042517).$$

Appendix II. Justification for conditional likelihood in matched case-control studies with measurement error in continuous exposures

Bayesian justifications for using conditional likelihood when actual exposure is observed are given by Rice [11,22,23], but the situation is less clear when the actual exposure is unobserved and treated as an unknown quantity inside the posterior distribution. Thus we provide the following argument for using the conditional likelihood as a disease model, as long as the model for exposure acknowledges both across-stratum and within-stratum variation. For simplicity the argument is presented in the situation without confounders that vary within matched sets, i.e., all confounding is addressed via matching.

Under the Bayesian paradigm, for individually matched case-control data retrospectively collected and subject to measurement error, the joint posterior model of the true exposure and surrogate variables for a specific stratum (matched set) $s$ is given by

$$f(x, w \mid y = (1, 0, \ldots, 0), s) = f(w \mid x, y = (1, 0, \ldots, 0), s) \times f(x \mid y = (1, 0, \ldots, 0), s) = f(w \mid x) \times f(x \mid y = (1, 0, \ldots, 0), s). \qquad \text{(AII.1)}$$

The first term on the right hand side of (AII.1) is obtained under the assumption of a nondifferential measurement error model. Regarding the second term, since $\sum_j Y_j$ depends explicitly on the values of the elements of the vector $Y$, the distribution of $(X, Y = (1, 0, \ldots, 0), S)$ is the same as the distribution of $(X, Y = (1, 0, \ldots, 0), S, \sum_j Y_j = 1)$. Therefore, the retrospective probability in the second term on the right hand side of (AII.1) can be rewritten as

$$f(x \mid y = (1, 0, \ldots, 0), s) = f\Big(y = (1, 0, \ldots, 0) \,\Big|\, x, \sum_j y_j = 1, s\Big) \times \frac{f\big(\sum_j y_j = 1 \mid x, s\big)}{f(y = (1, 0, \ldots, 0) \mid s)} \times f(x \mid s). \qquad \text{(AII.2)}$$

Notice the distribution of $Y$ is independent of the specific stratum, and it is mainly determined by the exposure variables. Therefore, by standard arguments, under a prospective logistic regression model with stratum-specific intercept, the probability of $\big(Y = (1, 0, \ldots, 0) \mid X, \sum_j Y_j = 1, S\big)$ is simply the conditional likelihood term. On the other hand, arguably the dominant variation in the prospective distribution of $\sum_j Y_j$ given $(X, S)$ will be with $S$ rather than $X$, via the stratum-specific intercept. As a result,

$$f\Big(\sum_j y_j = 1 \,\Big|\, x, s\Big) \approx f\Big(\sum_j y_j = 1 \,\Big|\, s\Big).$$
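The "standard arguments" step above can be checked numerically. The following Python sketch is only an illustration under simplifying assumptions that are not part of the paper: a single scalar exposure instead of three, and hypothetical values for the exposures, the log odds ratio `beta` and the stratum intercept `alpha`. It enumerates all outcome patterns with exactly one case in a matched set and shows that the resulting conditional probability agrees with the conditional likelihood term and does not change with the stratum-specific intercept.

```python
import itertools
import math


def prospective_prob(y, x, alpha, beta):
    """P(Y = y | x) for independent outcomes under a logistic model with
    stratum-specific intercept alpha and log odds ratio beta."""
    prob = 1.0
    for yj, xj in zip(y, x):
        pj = 1.0 / (1.0 + math.exp(-(alpha + beta * xj)))
        prob *= pj if yj == 1 else 1.0 - pj
    return prob


def conditional_term_by_enumeration(x, alpha, beta):
    """P(Y = (1, 0, ..., 0) | x, sum_j Y_j = 1), computed by brute force."""
    n = len(x)
    case_first = (1,) + (0,) * (n - 1)
    one_case = [y for y in itertools.product((0, 1), repeat=n) if sum(y) == 1]
    denom = sum(prospective_prob(y, x, alpha, beta) for y in one_case)
    return prospective_prob(case_first, x, alpha, beta) / denom


def conditional_likelihood(x, beta):
    """Closed form of the same term: exp(beta * x_1) / sum_j exp(beta * x_j)."""
    weights = [math.exp(beta * xj) for xj in x]
    return weights[0] / sum(weights)


# Hypothetical matched set: the case first, then two controls.
x = [0.8, -0.3, 1.1]
beta = 0.5  # hypothetical log odds ratio

for alpha in (-4.0, -1.0, 2.0):  # hypothetical stratum intercepts
    brute = conditional_term_by_enumeration(x, alpha, beta)
    print(alpha, brute, conditional_likelihood(x, beta))
```

The intercept cancels because every one-case pattern shares the same denominator, which is why this factor can be written without reference to the stratum.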
With this approximation, (AII.2) becomes

$$f(x \mid y = (1, 0, \ldots, 0), s) \approx f\Big(y = (1, 0, \ldots, 0) \,\Big|\, x, \sum_j y_j = 1\Big) \times \frac{f\big(\sum_j y_j = 1 \mid s\big)}{f(y = (1, 0, \ldots, 0) \mid s)} \times f(x \mid s).$$

Since $Y_1, \ldots, Y_{n_s} \mid S$ are exchangeable, it is possible to see that

$$\frac{f(y = (1, 0, \ldots, 0) \mid s)}{f\big(\sum_j y_j = 1 \mid s\big)} = f\Big(y = (1, 0, \ldots, 0) \,\Big|\, \sum_j y_j = 1, s\Big) = 1/n_s.$$

Thus, the joint posterior density of the true exposure and surrogate variables for a specific stratum $s$ can be expressed as

$$f(x, w \mid y = (1, 0, \ldots, 0), s) \propto f(w \mid x) \times f\Big(y = (1, 0, \ldots, 0) \,\Big|\, x, \sum_j y_j = 1\Big) \times f(x \mid s), \qquad \text{(AII.3)}$$

where the conditional density of $(W \mid X)$ corresponds to the measurement model, which describes how the surrogate vector of explanatory variables arises from the true values of $X$, and the density of $(X \mid S)$ refers to the within-stratum density of the exposure model. The within-stratum density of the exposures can be implemented as a random-effect model. Therefore, the use of a conditional logistic regression model, when the exposures are measured with error, is justified by having a random-effect exposure model.

Additional material

Additional file 1: WinBUGS Code. Code used to perform the Bayesian adjustment for measurement error in a matched case-control study with multiple continuous covariates.

Acknowledgements

The authors thank JW Martin, E Chan, F Bamforth and NM Cherry of the University of Alberta (Edmonton, Canada) for their contribution to generating data that motivated our work. This research was financially supported by the Canadian Institutes for Health Research (Funding Reference Number 62863).

Author details

1 Department of Statistics, University of British Columbia, Vancouver, BC, Canada. 2 Department of Environmental and Occupational Health, School of Public Health, Drexel University, Philadelphia, PA, USA.

Authors' contributions

GEH developed and implemented the Bayesian method under supervision of PG. IB designed the epidemiological study, oversaw its conduct and identified the need to better understand the impact of measurement error on naive analysis.
All authors contributed to writing of the manuscript, read its final form and approved it.

Competing interests

The authors declare that they have no competing interests.

Received: 5 January 2011. Accepted: 14 May 2011. Published: 14 May 2011.

References

1. Thomas D, Stram D, Dwyer J: Exposure Measurement Error: Influence on Exposure-Disease Relationships and Methods of Correction. Annual Review of Public Health 1993, 14:69-93.
2. Fosgate GT: Non-differential measurement error does not always bias diagnostic likelihood ratios towards the null. Emerging Themes in Epidemiology 2006, 3:7.
3. Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM: Measurement Error in Nonlinear Models. A Modern Perspective. Chapman & Hall; 2006.
4. Gustafson P: Measurement Error and Misclassification in Statistics and Epidemiology: Impacts and Bayesian Adjustments. Chapman and Hall/CRC Press; 2004.
5. McShane LM, Midthune DN, Dorgan JF, Freedman LS, Carroll RJ: Covariate measurement error adjustment for matched case-control studies. Biometrics 2001, 57(1):62-73.
6. Liu J, Gustafson P, Cherry N, Burstyn I: Bayesian analysis of a matched case-control study with expert prior information on both the misclassification of exposure and the exposure-disease association. Statistics in Medicine 2009, 28:3411-3423.
7. Rosner B, Spiegelman D, Willett WC: Correction of Logistic Regression relative risk estimates and confidence intervals for measurement error: The case of multiple covariates measured with error. American Journal of Epidemiology 1990, 132(4):734-745.
8. Guolo A, Brazzale AR: A simulation-based comparison of techniques to correct for measurement error in matched case-control studies. Statistics in Medicine 2008, 27:3755-3775.
9. Cook JR, Stefanski LA: Simulation-Extrapolation Estimation in Parametric Measurement Error Models. Journal of the American Statistical Association 1994, 89(428):1314-1328.
10. Prescott GJ, Garthwaite PH: Bayesian analysis of misclassified binary data from a matched case-control study with a validation sub-study. Statistics in Medicine 2005, 24(3):379-401.
11. Rice K: Full-likelihood approaches to misclassification of a binary exposure in matched case-control studies. Statistics in Medicine 2003, 22(20):3177-3194.
12. Chan E, Burstyn I, Bamforth F, Cherry NM, Martin JW: Perfluorinated acids and hypothyroxinemia in pregnant women. Environmental Research 2011, 111(4):559-564.
13. Hansen KJ, Clemen LA, Ellefson ME, Johnson HO: Compound-Specific, Quantitative Characterization of Organic Fluorochemicals in Biological Matrices. Environmental Science & Technology 2001, 35(4):766-770.
14. Melzer D, Rice N, Depledge MH, Henley WE, Galloway TS: Association between Serum Perfluorooctanoic Acid (PFOA) and Thyroid Disease in the U.S. National Health and Nutrition Examination Survey. Environ Health Perspect 2010, 118(5):686-692.
15. Emmett EA, Zhang H, Shofer FS, Freeman D, Rodway NV, Desai C, Shaw LM: Community exposure to perfluorooctanoate: relationships between serum levels and certain health parameters. Journal of Occupational & Environmental Medicine 2006, 48(8):771-779.
16. Inoue K, Okada F, Ito R, Kato S, Sasaki S, Nakajima S, Uno A, Saijo Y, Sata F, Yoshimura Y, Kishi R, Nakazawa H: Perfluorooctane Sulfonate (PFOS) and Related Perfluorinated Compounds in Human Maternal and Cord Blood Samples: Assessment of PFOS Exposure in a Susceptible Population during Pregnancy. Environ Health Perspect 2004, 112(11):1204-1207.
17. Peck JD, Robledo C, Neas B, Calafat AM, Sjodin A, Wild R, Cowan LD: Effects of persistent organic pollutants on thyroid hormone levels during pregnancy [abstract]. Annual Meeting of the Society for Epidemiologic Research June 24-27, 2008. Volume 167. Chicago, Illinois, USA: American Journal of Epidemiology; 2008, s103.
18. Dallaire R, Dewailly E, Pereg D, Dery S, Ayotte P: Thyroid function and plasma concentrations of polyhalogenated compounds in Inuit adults. Environ Health Perspect 2009, 117(9):1380-1386.
19. Lachin JM: A.3 Delta Method. Biostatistical Methods: The Assessment of Relative Risks. John Wiley and Sons; 2000, 455-457.
20. Breslow NE, Day NE: Statistical methods in cancer research, Volume 1 - The analysis of case-control studies. Lyon, France: International Agency for Research on Cancer (IARC); 1980.
21. Lachin JM: 7. Logistic Regression Models. Biostatistical Methods: The Assessment of Relative Risks. John Wiley and Sons; 2000, 247-316.
22. Rice K: Equivalence between conditional and mixture approaches to the Rasch model and matched case-control studies, with applications. Journal of the American Statistical Association 2004, 99(466):510-522.
23. Rice K: Equivalence between conditional and random-effects likelihoods for pair-matched case-control studies. Journal of the American Statistical Association 2008, 103(481):385-396.
24. Richardson S, Leblond L, Jaussent I, Green PJ: Mixture Models in Measurement Error Problems, with Reference to Epidemiological Studies. Journal of the Royal Statistical Society Series A (Statistics in Society) 2002, 165(3):549-566.
25. Carroll RJ, Roeder K, Wasserman L: Flexible Parametric Measurement Error Models. Biometrics 1999, 55(1):44-54.
26. Guolo A: A Flexible Approach to Measurement Error Correction in Case-Control Studies. Biometrics 2008, 64(4):1207-1214.
27. Lunn DJ, Thomas A, Best N, Spiegelhalter D: WinBUGS – a Bayesian modelling framework: concepts, structure, and extensibility. Statistics and Computing 2000, 10:325-337.
28. The BUGS Project. [http://www.mrc-bsu.cam.ac.uk/bugs/].
29. The Comprehensive R Archive Network (CRAN). [http://cran.r-project.org/].

Pre-publication history

The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/11/67/prepub

doi:10.1186/1471-2288-11-67

Cite this article as: Espino-Hernandez et al.: Bayesian adjustment for measurement error in continuous exposures in an individually matched case-control study. BMC Medical Research Methodology 2011 11:67.
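As a worked illustration of the variance calculation in Appendix I (equation AI.4), the following Python sketch inverts the delta-method formula to obtain the measurement error variance from the standard deviation of the percent recoveries, and compares the delta-method variance with a simple Monte Carlo estimate. It is a sketch under stated assumptions, not the authors' code: the function names are made up for this example, and the value used for the average concentration `c_p` is a placeholder, because the sample averages used in the paper are not reported in this section. The printed values are therefore not expected to reproduce the estimated covariance matrix given in Appendix I.

```python
import math
import random
import statistics


def variance_inflation(c_p, spike=50.0):
    """Bracketed factor in (AI.4): (c_p/50)^2 + (1 + c_p/50)^2 + 1."""
    r = c_p / spike
    return r ** 2 + (1.0 + r) ** 2 + 1.0


def sigma2_from_recovery_sd(sd_recovery, c_p, spike=50.0):
    """Solve (AI.4) for sigma_p^2 given the SD of the percent recovery."""
    return sd_recovery ** 2 / variance_inflation(c_p, spike)


def simulate_recovery_variance(sigma, c_p, n_draws=100_000, spike=50.0, seed=1):
    """Monte Carlo variance of Q_p with three independent N(0, 1) errors."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        e = rng.gauss(0.0, 1.0)
        e_spiked = rng.gauss(0.0, 1.0)
        e_gold = rng.gauss(0.0, 1.0)
        numerator = (spike + c_p) * math.exp(sigma * e_spiked) - c_p * math.exp(sigma * e)
        draws.append(numerator / (spike * math.exp(sigma * e_gold)))
    return statistics.variance(draws)


# Standard deviations of percent recovery as reported in Appendix I.
recovery_sd = {"PFOA": 0.157, "PFOS": 0.139, "PFHxS": 0.252}

# Placeholder average concentration in ppb (hypothetical, not from the paper).
c_placeholder = 2.0

for name, sd in recovery_sd.items():
    print(name, sigma2_from_recovery_sd(sd, c_placeholder))

# Compare the first-order delta-method variance with simulation for one setting.
sigma = 0.1
delta_var = variance_inflation(c_placeholder) * sigma ** 2
print(delta_var, simulate_recovery_variance(sigma, c_placeholder))
```

Because the delta method is a first-order approximation, the simulated variance is expected to be close to, but not identical with, the value from (AI.4), and the gap should grow as `sigma` increases.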

