BIRS Workshop Lecture Videos


Use and misuse of predicted values in epidemiologic data analyses (TG4) Shaw, Pamela


Pamela A. Shaw, Paul Gustafson, Daniela Sotres-Alvarez, Victor Kipnis, and Laurence Freedman

In many epidemiologic settings, the principal exposure or outcome under study can only be measured imprecisely. To address this errors-in-variables problem, analysts sometimes adjust these variables, for example through a calibration or prediction equation, and use the resulting predicted value in the analysis in place of the observed value. When a predicted quantity replaces an observed value in a data analysis, the impact of the uncertainty in the predicted quantity on the study results needs to be considered, but this is not always done in practice. Such predicted variables usually have Berkson error. In some settings, ignoring this uncertainty, or prediction error, can bias the parameter estimates, the standard errors, or both. We examine three common examples of how predicted values are used in an analysis in place of an error-prone variable: 1) to estimate the distribution of a variable, 2) to compare values of a variable between groups, by using the predicted value in a two-group statistic (e.g., a t-statistic) or as the outcome variable in a regression, and 3) to estimate the effect of an error-prone variable on an outcome, where the predicted quantity is used as the exposure variable in a regression. For each example, we present an overview of the potential consequences of using a predicted quantity in place of the true value without appropriate statistical adjustment. We further illustrate some concepts with data from a large population-based cohort, the Hispanic Community Health Study/Study of Latinos (HCHS/SOL).
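As a rough illustration of examples 1 and 3, the following is a minimal simulation sketch and is not taken from the talk. It assumes a normally distributed true exposure, classical additive measurement error, a linear outcome model, and a calibration subsample in which the true exposure is known. The function name `simulate`, the parameter `n_calib`, and all variances are hypothetical choices made for this sketch.

```python
import numpy as np


def simulate(n=20000, n_calib=2000, beta=1.0, seed=0):
    """Toy comparison of true, error-prone, and predicted exposures.

    All distributions and sizes here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    x = rng.normal(0.0, 1.0, n)             # true exposure (unobserved in practice)
    w = x + rng.normal(0.0, 1.0, n)         # error-prone measurement (classical error)
    y = beta * x + rng.normal(0.0, 1.0, n)  # outcome depends on the true exposure

    # Calibration equation x ~ a + b*w, fitted in a subsample where x is
    # assumed to be available (np.polyfit returns slope first, then intercept).
    b, a = np.polyfit(w[:n_calib], x[:n_calib], 1)
    x_hat = a + b * w                        # predicted exposure for everyone

    return {
        "var_true": x.var(ddof=1),           # spread of the true exposure
        "var_pred": x_hat.var(ddof=1),       # spread of the predicted exposure
        "slope_naive": np.polyfit(w, y, 1)[0],     # y regressed on error-prone w
        "slope_pred": np.polyfit(x_hat, y, 1)[0],  # y regressed on predicted x
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions, the predicted values are less variable than the true exposure, so a distribution estimated directly from them would be too narrow. In this particular linear setup the slope from regressing the outcome on the predicted value is close to `beta`, while the slope from the error-prone measurement is attenuated. The sketch does not address whether the usual standard errors are valid, since they ignore the uncertainty in the estimated calibration equation.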


Attribution-NonCommercial-NoDerivatives 4.0 International