Title	Handling missing data in observational studies: challenges for teaching and research
Creator	Carpenter, James
Publisher	Banff International Research Station for Mathematical Innovation and Discovery
Date Issued	2016-07-05T08:28
Description	Missing data present an inevitable, if unwelcome, challenge to analysts of observational data. Such analysts typically come from a variety of backgrounds, often with limited formal statistical training. Furthermore, they are increasingly looking to go beyond standard regression models and perform relatively complex analyses, e.g. using propensity scores, hierarchical models, and non-linear models. Alongside this, the methodological literature on missing data is vast, and often relatively inaccessible. Despite excellent reviews [e.g. 1, 2, 3], it is often far from clear to practitioners which methods are essentially equivalent, or what the relative strengths of different approaches and software are. This is even more true when we move to sensitivity analyses.
To move things forward, in this talk I propose some principles for analysts of all levels, and illustrate how they may be implemented in increasingly complex examples. Beginning with analysts with limited statistical training (level 1), I will argue that STRATOS guidance should highlight:
• the necessity of performing and reporting a careful complete records analysis, and in particular guidance around how the mechanisms giving rise to the missing data impact the validity of the results [4, Ch 1; 5, 6];
• the importance of awareness of the scientific context, which should be kept in mind when faced with the results of complex statistical analysis [7];
• the value of including information from appropriate additional variables, not in the primary scientific model [8];
• the usefulness of simple sensitivity analysis [e.g., 4, Ch 10; 9];
• the complications which necessitate going beyond a relatively standard analysis and seeking further assistance; and
• how analyses of partially observed data should be reported [10].
I will argue that multiple imputation, though not the ‘best’ solution in all cases, has the widest applicability, and therefore should be considered as the primary approach, indicating how it relates to other approaches, such as direct likelihood and the EM algorithm. As the talk progresses, the examples will become more complex, and I will indicate where I believe guidance would be helpful for level 2 analysts, both in terms of methods and software. I will also briefly discuss how missing data are an example of a broader class of data-dependent sampling [7], and the implications of this for developing guidance for researchers.
References:
[1] Little, R. J. (1992). Regression with Missing X's: a Review. Journal of the American Statistical Association, 87, 1227–1237.
[2] Horton, N. J. and Kleinman, K. P. (2007). Much ado about nothing: a comparison of missing data methods and software to fit incomplete data regression models. The American Statistician, 61, 1–12.
[3] Hogan, J. W., Roy, J. and Korkontzelou, C. (2004). Tutorial in biostatistics: handling drop-out in longitudinal studies. Statistics in Medicine, 23, 1455–1497.
[4] Carpenter, J. R. and Kenward, M. G. (2013). Multiple Imputation and its Application. Chichester: Wiley.
[5] Little, R. J. and Zhang, N. (2011). Subsample ignorable likelihood for regression analysis with missing data. Journal of the Royal Statistical Society, Series C, 60, 591–605.
[6] Bartlett, J. W., Harel, O. and Carpenter, J. R. (2015). Asymptotically unbiased estimation of exposure odds ratios in complete records logistic regression. American Journal of Epidemiology, 182, 730–736.
[7] Morris, T. P., White, I. R., Royston, P., Seaman, S. R. and Wood, A. M. (2014). Multiple imputation for an incomplete covariate that is a ratio. Statistics in Medicine, 33, 88–104.
[8] Spratt, M., Carpenter, J. R., Sterne, J. A. C. et al. (2010). Strategies for Multiple Imputation in Longitudinal Studies. American Journal of Epidemiology, 172, 478–487.
[9] Hogan, J., Daniels, M. J. and Hu, L. (2015). Bayesian Sensitivity Analysis. In Handbook of Missing Data Methodology, eds Molenberghs, G., Fitzmaurice, G., Kenward, M. G., Tsiatis, A. and Verbeke, G., pages 405–431. New York: CRC Press.
[10] Sterne, J. A. C., White, I. R., Carlin, J. B. et al. (2009). Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. British Medical Journal, 339, 157–160.
[12] Molenberghs, G., Kenward, M. G., Aerts, M., Verbeke, G., Tsiatis, A. A., Davidian, M. and Rizopoulos, D. (2014). On random sample size, ignorability, ancillarity, completeness, separability, and degeneracy: sequential trials, random sample sizes, and missing data. Statistical Methods in Medical Research, 23, 11–41.
Extent	44 minutes
Subject	Mathematics; Statistics; Computer science
Type	Moving Image
File Format	video/mp4
Language	eng
Notes	Author affiliation: London School of Hygiene & Tropical Medicine (UK)
Series	BIRS Workshop Lecture Videos (Banff, Alta)
Date Available	2017-02-01
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0340504
URI	http://hdl.handle.net/2429/60149
Affiliation	Non UBC
Peer Review Status	Unreviewed
Scholarly Level	Faculty
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace
