UBC Theses and Dissertations


Bayesian methods for alleviating identification issues with applications in health and insurance areas

Xia, Chaoxiong


In areas such as health and insurance, data limitations can cause an identification problem in statistical modeling, and ignoring this problem may bias statistical inference. Bayesian methods have proven useful in alleviating identification issues by incorporating prior knowledge. In health areas, the existence of hard-to-reach populations in survey sampling biases population estimates of disease prevalence, medical expenditures and health care utilization. For these three types of measures, we propose four Bayesian models based on binomial, gamma, zero-inflated Poisson and zero-inflated negative binomial distributions. Large-sample limits of the posterior mean and standard deviation are obtained for the population estimators. Through extensive simulation studies, we demonstrate that the posteriors converge to their large-sample limits in a manner comparable to that of an identified model. In the regression context, hard-to-reach populations bias the assessment of risk factors such as smoking. For the corresponding regression models, we obtain theoretical results on the limiting posteriors. Case studies are conducted on several well-known survey datasets. Our work confirms that sensible results can be obtained using Bayesian inference despite the nonidentifiability caused by hard-to-reach populations.

In insurance, specific issues such as misrepresentation of risk factors may result in biased estimates of insurance premiums. In particular, for a binary risk factor, the misclassification occurs in only one direction. We propose three insurance prediction models based on Poisson, gamma and Bernoulli distributions to account for this effect. Through theoretical studies of the form of the posterior distributions and of method-of-moments estimators, we confirm that model identification depends on the distribution of the response.
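To make one-directional misclassification concrete, the following Python sketch works out, under assumed values, what happens to a Poisson claim-frequency model when a true binary risk factor V = 1 (for example, smoking) can be reported as V* = 0 but never the reverse. The function names, the prevalence `theta`, the misrepresentation probability `p` and the coefficients `b0`, `b1` are hypothetical illustrations, not the thesis's notation or its models. Because a reported V* = 1 is always truthful, only the V* = 0 group is contaminated: its response is a two-component mixture, which pulls a naive rate ratio toward 1.

```python
import math


def misreport_weight(theta: float, p: float) -> float:
    """P(V = 1 | V* = 0) when only a true V = 1 can be misreported as V* = 0.

    theta: prevalence of the true risk factor V (hypothetical value)
    p:     probability that a true V = 1 is reported as V* = 0 (hypothetical value)
    """
    return theta * p / ((1.0 - theta) + theta * p)


def moments_given_reported_zero(b0: float, b1: float, theta: float, p: float):
    """Mean and variance of Y among V* = 0 when Y | V ~ Poisson(exp(b0 + b1 * V))."""
    w = misreport_weight(theta, p)
    lam0, lam1 = math.exp(b0), math.exp(b0 + b1)
    mean = (1.0 - w) * lam0 + w * lam1
    # Law of total variance: E[Var(Y | V)] + Var(E[Y | V]); for Poisson E[Var(Y | V)] = mean.
    var = mean + w * (1.0 - w) * (lam1 - lam0) ** 2
    return mean, var


def naive_log_rate_ratio(b0: float, b1: float, theta: float, p: float) -> float:
    """Log rate ratio obtained by treating the reported V* as if it were the true V.

    V* = 1 implies V = 1, so E[Y | V* = 1] = exp(b0 + b1) exactly;
    the V* = 0 group is a mixture, so its mean is inflated when b1 > 0.
    """
    mean0, _ = moments_given_reported_zero(b0, b1, theta, p)
    return (b0 + b1) - math.log(mean0)


if __name__ == "__main__":
    b0, b1, theta, p = 0.0, math.log(2.0), 0.3, 0.4
    mean0, var0 = moments_given_reported_zero(b0, b1, theta, p)
    print("P(V=1 | V*=0):", round(misreport_weight(theta, p), 4))
    print("true b1:", round(b1, 4), "naive:", round(naive_log_rate_ratio(b0, b1, theta, p), 4))
    print("V*=0 mean, variance:", round(mean0, 4), round(var0, 4))
```

With `b0 = 0`, `b1 = log 2`, `theta = 0.3` and `p = 0.4`, the naive log rate ratio falls to about 0.56 instead of log 2 ≈ 0.69, and the V* = 0 group is overdispersed (its variance exceeds its mean). This hints informally at why identification can depend on the response distribution: a Poisson mixture has a variance that carries information about the mixing weight, whereas a mixture of Bernoulli responses is itself Bernoulli and offers only one moment. It is an intuition only, not a reproduction of the thesis's identification results.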
Furthermore, we propose a binary model in which the misclassified variable is used as the response. Through simulation studies of the four models, we demonstrate that acknowledging the misclassification improves the accuracy of parameter estimation. In road collision modeling, measurement errors in annual traffic volumes may attenuate the regression coefficients. We propose two Bayesian models and theoretically confirm that the gamma models are identified. Simulation studies are conducted for finite-sample scenarios.
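The attenuation effect in road collision models can be illustrated in a textbook structural setting. The sketch below assumes, hypothetically, that the covariate is centred log annual traffic volume X, recorded as X* = X + U with X and U independent normals and non-differential error, and that log E[Y | X] = b0 + b1 X for a collision count or gamma-distributed response. Under these assumptions, X given X* is normal with a mean linear in X* (slope var(X) / (var(X) + var(U))) and a constant variance, so log E[Y | X*] is also linear in X* with slope b1 times that factor. The function names and parameter values are illustrative and are not taken from the thesis.

```python
import math
import random


def attenuation_factor(var_x: float, var_u: float) -> float:
    """Reliability ratio var(X) / (var(X) + var(U)) for classical error X* = X + U."""
    return var_x / (var_x + var_u)


def naive_slope(b1: float, var_x: float, var_u: float) -> float:
    """Slope of log E[Y | X*] on X* when log E[Y | X] = b0 + b1 * X.

    Assumes X and U are independent normals and Y is independent of X* given X.
    Then E[X | X*] is linear in X* with slope attenuation_factor and
    Var(X | X*) is constant, so E[exp(b1 * X) | X*] is log-linear in X*
    with slope b1 * attenuation_factor.
    """
    return b1 * attenuation_factor(var_x, var_u)


def simulated_slope_x_on_xstar(var_x: float, var_u: float, n: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo OLS slope of true X on observed X*; approaches attenuation_factor."""
    rng = random.Random(seed)
    xs, xstars = [], []
    for _ in range(n):
        x = rng.gauss(0.0, math.sqrt(var_x))  # hypothetical true (centred) log volume
        xs.append(x)
        xstars.append(x + rng.gauss(0.0, math.sqrt(var_u)))  # volume recorded with error
    mean_x = sum(xs) / n
    mean_s = sum(xstars) / n
    cov = sum((s - mean_s) * (x - mean_x) for s, x in zip(xstars, xs))
    var_s = sum((s - mean_s) ** 2 for s in xstars)
    return cov / var_s


if __name__ == "__main__":
    var_x, var_u, b1 = 1.0, 0.25, 0.9
    print("attenuation factor:", attenuation_factor(var_x, var_u))
    print("naive slope:", round(naive_slope(b1, var_x, var_u), 4))
    print("simulated E[X | X*] slope:", round(simulated_slope_x_on_xstar(var_x, var_u), 3))
```

With var(X) = 1 and var(U) = 0.25 the factor is 0.8, so in large samples a true slope of 0.9 would appear as roughly 0.72 in a naive fit on X*. Bayesian models that treat the true volume as latent aim to undo this shrinkage; the code only illustrates the bias, not the correction.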



Attribution-NonCommercial-NoDerivatives 4.0 International