BIRS Workshop Lecture Videos

Handling missing data in observational studies: challenges for teaching and research
Carpenter, James


Missing data present an inevitable, if unwelcome, challenge to analysts of observational data. Such analysts typically come from a variety of backgrounds, often with limited formal statistical training. Furthermore, they are increasingly looking to go beyond standard regression models and perform relatively complex analyses, e.g. using propensity scores, hierarchical models, and non-linear models.

Alongside this, the methodological literature on missing data is vast and often relatively inaccessible. Despite excellent reviews [e.g. 1, 2, 3], it is often far from clear to practitioners which methods are essentially equivalent, or what the relative strengths of the different approaches and software are. This is even more true when we move to sensitivity analyses.

To move things forward, in this talk I propose some principles for analysts of all levels, and illustrate how they may be implemented in increasingly complex examples. Beginning with analysts with limited statistical training (level 1), I will argue that STRATOS guidance should highlight:
• the necessity of performing and reporting a careful complete records analysis, and in particular guidance around how the mechanisms giving rise to the missing data impact the validity of the results [4, Ch 1; 5, 6];
• the importance of awareness of the scientific context, which should be kept in mind when faced with the results of complex statistical analysis [7];
• the value of including information from appropriate additional variables, not in the primary scientific model [8];
• the usefulness of simple sensitivity analysis [e.g. 4, Ch 10; 9];
• the complications which necessitate going beyond a relatively standard analysis and seeking further assistance; and
• how analyses of partially observed data should be reported [10].
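
The first point above, that the missingness mechanism determines whether a complete records analysis is valid, can be illustrated with a small simulation. This is a minimal sketch and not taken from the talk: the linear model, the selection rules, the sample size, and the names `ols_slope` and `simulate` are all assumptions chosen for illustration. It relies on the standard result that a complete records regression remains valid when, given the covariates, missingness does not depend on the outcome.

```python
import random


def ols_slope(pairs):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    return sxy / sxx


def simulate(n=20000, seed=1):
    """Compare complete records slopes under three illustrative mechanisms.

    Assumed data-generating model (hypothetical): y = 1 + 2x + e,
    with x and e independent standard normal, so the true slope is 2.
    """
    rng = random.Random(seed)
    full = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1.0 + 2.0 * x + rng.gauss(0.0, 1.0)
        full.append((x, y))

    # Records observed completely at random (about half are dropped).
    completely_at_random = [p for p in full if rng.random() < 0.5]
    # Records observed only when the covariate is below a threshold.
    depends_on_x = [(x, y) for x, y in full if x < 0.5]
    # Records observed only when the outcome is below a threshold.
    depends_on_y = [(x, y) for x, y in full if y < 1.0]

    return {
        "full data": ols_slope(full),
        "missing completely at random": ols_slope(completely_at_random),
        "missingness depends on x": ols_slope(depends_on_x),
        "missingness depends on y": ols_slope(depends_on_y),
    }


slopes = simulate()
for mechanism, slope in slopes.items():
    print(f"{mechanism}: slope = {slope:.2f}")
```

In this setup the first three slopes should be close to the true value of 2, while the slope from records selected on the outcome should be clearly attenuated, which is the kind of distinction a carefully reported complete records analysis needs to address.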

I will argue that multiple imputation, though not the ‘best’ solution in all cases, has the widest applicability, and therefore should be considered as the primary approach, indicating how it relates to other approaches, such as direct likelihood and the EM algorithm.
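
For readers unfamiliar with multiple imputation, its final pooling step can be sketched as follows. This is an illustrative sketch of the standard combining rules (Rubin's rules) for a scalar estimate, not code from the talk; the function name `pool_rubin` and the input numbers are hypothetical. Each imputed data set is assumed to have been analysed already, giving one point estimate and one squared standard error per data set.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation results for a scalar parameter.

    estimates: point estimates, one per imputed data set
    variances: squared standard errors, one per imputed data set
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")

    pooled = mean(estimates)        # average of the point estimates
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # sample variance of the estimates
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical results from five imputed data sets.
estimate, total_variance = pool_rubin(
    [0.42, 0.47, 0.39, 0.45, 0.44],
    [0.010, 0.011, 0.009, 0.010, 0.012],
)
print(f"pooled estimate {estimate:.3f}, standard error {total_variance ** 0.5:.3f}")
```

The between-imputation term is what carries the extra uncertainty due to the missing values; with no missing data all imputed data sets would coincide and the total variance would reduce to the usual within-analysis variance.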

As the talk progresses, the examples will become more complex, and I will indicate where I believe guidance would be helpful for level 2 analysts, both in terms of methods and software. I will also briefly discuss how missing data are an example of a broader class of data-dependent sampling [7], and the implications of this for developing guidance for researchers.

[1] Little, R.J. (1992). Regression with Missing X's: a Review. Journal of the American Statistical Association, 87, 1227–1237.
[2] Horton, N. J. and Kleinman, K. P. (2007) Much ado about nothing: a comparison of missing data methods and software to fit incomplete data regression models. The American Statistician, 61, 1–12.
[3] Hogan, J. W., Roy, J. and Korkontzelou, C. (2004). Tutorial in biostatistics: handling drop-out in longitudinal studies. Statistics in Medicine, 23, 1455–1497.
[4] Carpenter, J. R., and Kenward, M. G. (2013) Multiple Imputation and its Application. Chichester: Wiley.
[5] Little, R. J. and Zhang, N. (2011) Subsample ignorable likelihood for regression analysis with missing data. Journal of the Royal Statistical Society, Series C, 60, 591–605.
[6] Bartlett, J. W., Harel, O. and Carpenter, J. R. (2015) Asymptotically unbiased estimation of exposure odds ratios in complete records logistic regression. American Journal of Epidemiology, 182, 730–736.
[7] Morris, T. P., White, I. R., Royston, P., Seaman, S.R., and Wood, A. M. (2014) Multiple imputation for an incomplete covariate that is a ratio. Statistics in Medicine, 33, 88–104.
[8] Spratt, M., Carpenter, J. R., Sterne, J. A. C., Carlin, J. B., Heron, J., Henderson, J. and Tilling, K. (2010) Strategies for Multiple Imputation in Longitudinal Studies. American Journal of Epidemiology, 172, 478–487.
[9] Hogan, J., Daniels, M. J. and Hu, L. (2015) Bayesian Sensitivity Analysis. In Handbook of Missing Data Methodology, eds Molenberghs, G., Fitzmaurice, G., Kenward, M. G., Tsiatis, A. and Verbeke, G., pages 405–431. New York: CRC Press.
[10] Sterne, J. A. C., White, I. R., Carlin, J. B. et al. (2009). Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. British Medical Journal, 339, 157–160.
[12] Molenberghs, G., Kenward, M. G., Aerts, M., Verbeke, G., Tsiatis, A. A., Davidian, M. and Rizopoulos, D. (2014) On random sample size, ignorability, ancillarity, completeness, separability, and degeneracy: sequential trials, random sample sizes, and missing data. Statistical Methods in Medical Research, 23, 11–41.

Attribution-NonCommercial-NoDerivatives 4.0 International