While working on our guidance paper for biostatisticians, we came across some little-known facts about a type of measurement error known as Berkson error, named after Joseph Berkson, who wrote about it in a JASA paper published in 1951. The main characteristic of this type of error is that it is independent of the mismeasured observation. In other words, if we denote the true measurement by $X$, the mismeasured one by $X^{*}$, and the error by $e$ (with mean zero and constant variance), then $X=X^{*}+e$, where $e$ is independent of $X^{*}$. This contrasts with the more commonly occurring classical error, where $X^{*}=X+e$ and $e$ is independent of $X$.

The common perception is that when a covariate $X^{*}$ in a regression model has Berkson error, the regression coefficients in the model are estimated without bias. In other words, if outcome $Y$ is related to $X$ and other covariates $Z$ in a regression model, e.g. $E(Y|X,Z)=\beta_0+\beta_X X+\beta_Z Z$, then it is also true that $E(Y|X^{*},Z)=\beta_0+\beta_X X^{*}+\beta_Z Z$. The only condition thought to be required for this result was the "non-differentiality" of the error $e$, i.e. the condition that the conditional distribution $Y|X^{*},X,Z$ is equal to $Y|X,Z$. We have found that this is not generally true: a further condition is required, namely that the Berkson error $e$ is also independent of $Z$.
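
To see where the further condition enters, the following is a sketch of the argument for the linear model above, using only the notation already introduced. It assumes the model for $E(Y|X,Z)$ is correct and uses non-differentiality only through the conditional mean:

$$
\begin{aligned}
E(Y|X^{*},Z) &= E\{E(Y|X,X^{*},Z)\,|\,X^{*},Z\} \\
&= E\{E(Y|X,Z)\,|\,X^{*},Z\} \\
&= \beta_0+\beta_X E(X|X^{*},Z)+\beta_Z Z \\
&= \beta_0+\beta_X X^{*}+\beta_X E(e|X^{*},Z)+\beta_Z Z.
\end{aligned}
$$

The second line uses non-differentiality and the last line uses $X=X^{*}+e$. The extra term $\beta_X E(e|X^{*},Z)$ vanishes when $E(e|X^{*},Z)=0$, which holds, for example, when $e$ has mean zero and is independent of $X^{*}$ and $Z$ jointly. If $e$ depends on $Z$, the term generally does not vanish and is absorbed into the fitted coefficients.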

This fact may affect many studies in occupational health. Workers' industrial exposures are often estimated from the mean exposures of specific subgroups. For example, those who perform hands-on laboratory work may be at one level of exposure, whereas office workers who occasionally walk through the laboratory are at a different (lower) level. Ascribing the subgroup mean exposure $X^{*}$ to each worker in a specific subgroup induces Berkson error. But suppose now that the outcome in question is a form of cancer that is related to gender, so that the risk model includes gender among the covariates $Z$. If the subgroups include both men and women and the true exposure of men is higher than that of women, then the Berkson error $e$ will be related to gender, and our second condition will not be satisfied. Our group will explore whether this type of problem indeed arises in occupational health studies. In addition, we are working on related problems with Berkson error arising from the use of prediction equations in place of observed exposure values, as Pamela Shaw will explain.
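
The occupational example can be illustrated with a small simulation. This is only a sketch under assumed values: the function name `fit_with_group_means`, the coefficients, the subgroup baselines (1 for office, 5 for laboratory), the gender proportions, and the use of a continuous outcome with ordinary least squares are all hypothetical choices made for illustration, not taken from any real study. The parameter `delta` is the assumed extra exposure of men within a subgroup.

```python
import numpy as np

def fit_with_group_means(delta, p_male_office=0.5, p_male_lab=0.5,
                         n=200_000, seed=0):
    """Return OLS estimates (intercept, exposure, gender) when the
    subgroup mean X* is used in place of the true exposure X."""
    rng = np.random.default_rng(seed)
    beta_0, beta_x, beta_z = 1.0, 2.0, 0.5   # hypothetical true coefficients

    lab = rng.integers(0, 2, size=n)          # 1 = laboratory, 0 = office
    p_male = np.where(lab == 1, p_male_lab, p_male_office)
    z = (rng.random(n) < p_male).astype(float)  # gender covariate, 1 = male

    # True exposure: subgroup baseline, plus delta for men, plus noise.
    x = np.where(lab == 1, 5.0, 1.0) + delta * z + rng.normal(size=n)

    # Assigned exposure X*: the mean of X within each subgroup, so the
    # error e = X - X* averages to zero within every subgroup.
    subgroup_mean = np.array([x[lab == g].mean() for g in (0, 1)])
    x_star = subgroup_mean[lab]

    # Outcome follows the true model in X and Z (non-differential error).
    y = beta_0 + beta_x * x + beta_z * z + rng.normal(size=n)

    # Regress Y on (1, X*, Z) by ordinary least squares.
    design = np.column_stack([np.ones(n), x_star, z])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

independent = fit_with_group_means(delta=0.0)   # e unrelated to gender
dependent = fit_with_group_means(delta=1.0)     # e related to gender
unbalanced = fit_with_group_means(delta=1.0, p_male_office=0.2, p_male_lab=0.8)

for label, coef in [("independent", independent),
                    ("dependent", dependent),
                    ("unbalanced", unbalanced)]:
    print(label, np.round(coef, 2))
```

Under these assumptions, with `delta=0` the estimates should land close to the true values. With `delta=1` and the same gender mix in both subgroups, the gender coefficient should move toward $\beta_Z+\beta_X\delta$ while the exposure coefficient stays close to $\beta_X$. When the gender mix also differs between subgroups, the exposure coefficient itself should be pulled away from $\beta_X$.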
