UBC Theses and Dissertations

Bayesian adjustments for disease misclassification in epidemiological studies of health administrative databases, with applications to multiple sclerosis research

Högg, Tanja


With disease information routinely established from diagnostic codes or prescriptions in health administrative databases, the topic of outcome misclassification is gaining importance in epidemiological research. Motivated by a Canada-wide observational study into the prodromal phase of multiple sclerosis (MS), this thesis considers the setting of a matched exposure-disease association study where the disease is measured with error. We initially focus on the special case of a pair-matched case-control study. Assuming non-differential misclassification of study participants, we give a closed-form expression for asymptotic biases in odds ratios arising under naive analyses of misclassified data, and propose a Bayesian model to correct association estimates for misclassification bias. For identifiability, the model relies on information from a validation cohort of correctly classified case-control pairs, and also requires prior knowledge about the predictive values of the classifier. In a simulation study, the model shows improved point and interval estimates relative to the naive analysis, but is also found to be overly restrictive in a real data application. In light of these concerns, we propose a generalized model for misclassified data that extends to the case of differential misclassification and allows for a variable number of controls per case. Instead of prior information about the classification process, the model relies on individual-level estimates of each participant's true disease status, which were obtained from a counting process mixture model of MS-specific healthcare utilization in our motivating example. Lastly, we consider the problem of assessing the non-differential misclassification assumption in situations where the exposure is suspected to impact the classification accuracy of cases and controls, but information on the true disease status is unavailable. 
Motivated by the non-identified nature of the problem, we consider a Bayesian analysis and examine the utility of Bayes factors to provide evidence against the null hypothesis of non-differential misclassification. Simulation studies show that for a range of realistic misclassification scenarios, and under mildly informative prior distributions, posterior distributions of the exposure effect on classification accuracy exhibit sufficient updating to detect differential misclassification with moderate to strong evidence.

Attribution-NonCommercial-NoDerivatives 4.0 International