UBC Theses and Dissertations

The assesment and control of non-differential and differential exposure misclassification in a case-control… Newburn-Cook, Christine Valerie 1995

THE ASSESMENT AND CONTROL OF NON-DIFFERENTIAL AND DIFFERENTIAL EXPOSURE MISCLASSIFICATION IN A CASE-CONTROL STUDY OF BREAST CANCER

by

CHRISTINE VALERIE NEWBURN-COOK

M.S.N., The University of British Columbia, 1978
B.ScN., Queen's University, 1976
B.A.Sc., Queen's University, 1974

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES (INTERDISCIPLINARY STUDIES)

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
December 1995
(c) Christine Valerie Newburn-Cook, 1995

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at The University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the Head of my Department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

INTERDISCIPLINARY STUDIES: Health Care and Epidemiology, Medicine, Research Design and Methodology, Sociology and Statistics
The University of British Columbia
2075 Wesbrook Place
Vancouver, Canada V6T 1W5
Date: 7 December 1995

ABSTRACT

This research study was designed to investigate some of the methodological issues involved in the design, conduct and analysis of a case-control study. The overall objective was to determine the reliability and validity of exposure data collected in a nested case-control study of breast cancer (N=1,177).
The study was designed specifically to determine if the retrospective (post-diagnostic) reports of exposure provided by the cases and the controls were systematically different, to assess the impact of any resulting exposure misclassification on the estimates of relative risk, and (most importantly) to develop and to evaluate a 'Validity Scale' as a possible design standard for the measurement and the statistical control of differential exposure misclassification in future case-control studies. To answer these questions, exposure information was collected prospectively and retrospectively by means of a self-administered questionnaire. When the retrospective (post-diagnostic) and prospective (pre-diagnostic) exposure assessments were compared, the reported levels of exposure were assessed to be both reliable and consistent, although some inconsistencies (i.e., random exposure misclassification) were noted. The data provided no strong and conclusive evidence that the knowledge of diagnosis (i.e., case versus control status) resulted in the differential reporting of past exposure and antecedent events by the cases and the controls. To determine the impact of exposure misclassification on the estimates of association, the prospective and retrospective odds ratios and their 95% confidence intervals were compared. Both the pre- and post-diagnostic odds ratios were found to be comparable. Therefore, the odds ratio (OR) estimates for the various study factors had not been biased towards or away from the null value (OR=1.00) by either the systematic overreporting or the underreporting of exposure by the cases and the controls. Furthermore, these data did not provide empirical evidence for the existence of either non-differential or differential exposure misclassification. 
This was apparently the first study to explore directly the impact of different control groups on the estimates of association, and in particular, whether or not a particular control group would have a tendency to bias odds ratio estimates. Two control groups were recruited for the case-control comparisons: healthy controls (i.e., women with a normal mammogram) and anamnestically equivalent controls (i.e., women with an abnormal mammogram but no breast cancer). Correlation, Kappa and McNemar analyses showed similar levels of agreement and inconsistency between the prospective and retrospective reports of exposure among the three study comparison groups. The results suggested that no advantage was obtained by using a control group which was anamnestically equivalent to the cases (except for diagnosis); that is, a group that had experienced the same trauma (an abnormal mammogram), had undergone the same diagnostic procedures to determine a diagnosis, and had the same motivation to participate in the research study and to report their past exposures both completely and reliably. In addition, within the context of this research study, an 'Exposure Data Validity Scale' as conceptualized by Raphael (1987) was designed, implemented and evaluated as a design strategy for the measurement and the control of differential exposure misclassification (i.e., recall bias). Overall, the validity scale appeared to be an effective means of assessing the propensity of the cases and the controls to report past exposures differently, and whether or not the estimates of effect had been subject to distortion (bias) as a result of differential exposure misclassification. Replication studies will be required to determine both the utility and effectiveness of an 'Exposure Data Validity Scale' as a design strategy to be included routinely in future case-control studies.
TABLE OF CONTENTS

Abstract
Table of Contents
List of Tables and Figures
List of Appendices
Acknowledgments

Chapter 1 - Introduction
1.1 Epidemiology of Breast Cancer: The Magnitude of the Problem in Canada and British Columbia
1.2 Research Problem and the Significance of the Study
1.3 Overview of the Dissertation
Tables for Chapter 1

Chapter 2 - Background to the Research Problem
2.1 Methodological Limitations of Case-Control Studies
2.2 The Measurement of Exposure: The Problem of Misclassification and Its Impact on Risk Estimates
2.3 Factors Affecting Exposure Reporting and Recall Accuracy: The Respondent as a Source of Measurement Error
2.3.1 Encoding, Retrieval and Judgment (Inferential) Errors: An Overview
2.3.2 Cognitive Perspectives on Coding and Retrieval of Autobiographical Memory
2.3.3 Impact of the Properties of Memory on the Retrieval of Autobiographical Facts
2.3.4 Judgment of the Appropriate Answer: Inferential Processing Strategies and Associated Errors
2.3.5 Respondent Rule Effects
2.3.6 Nonattitudes and Acquiescent Response Behavior
2.3.7 Sociodemographic Correlates of Respondent Error
2.4 Factors Affecting Exposure Reporting and Recall Accuracy: The Interviewer as a Source of Measurement Error
2.5 Factors Affecting Exposure Reporting and Recall Accuracy: Task Variables as a Source of Measurement Error
2.6 Review of the Literature: Studies of Recall Accuracy and Recall Bias (Differential Exposure Misclassification)
2.7 Raphael's Proposal for the Measurement and the Control of Recall Bias: The Development and Implementation of an 'Exposure Data Validity Scale'
2.8 Methodological Considerations for the Design of an 'Exposure Data Validity Scale'
2.9 Summary
Tables for Chapter 2

Chapter 3 - Research Design and Methods
3.1 Study Objectives
3.2 Study Design
3.3 Subject Recruitment
3.4 Case Definition and Selection
3.5 Control Groups: Definition and Selection
3.6 Matching Criteria
3.7 Sample Size and Power Calculations
3.8 Data Collection Procedures
3.8.1 Questionnaire One
3.8.2 Questionnaire Two
3.9 Procedure for Handling Non-Respondents
3.10 Data Handling and Analysis
3.10.1 Data Coding and Entry
3.10.2 Analysis of Non-Respondents
3.10.3 Test-Retest Reliability Analysis
3.10.4 Kappa Analysis
3.10.5 McNemar Analysis
3.10.6 Prospective and Retrospective Relative Risk Assessments
3.11 Stages in the Development of an 'Exposure Data Validity Scale'
3.11.1 Search for the Candidate Variables
3.11.2 Selection of the Validity Scale Exposure Variables and Assignment of Weighting Factors
3.11.3 Administration and Analysis of the Validity Scale
3.11.4 Evaluation of the 'Exposure Data Validity Scale'
3.12 Summary

Chapter 4 - Results
4.1 Response Rate
4.2 Description of the Study Groups
4.3 Analysis of the Non-Respondents
4.4 Test-Retest Reliability: Agreement Between the Prospective and the Retrospective Exposure Reports
4.5 Kappa Analysis
4.6 McNemar Analysis for Directional Discordance
4.7 Retrospective Versus Prospective Reports of Exposure: A Comparison of the Exposure-Disease Odds Ratio Estimates
4.8 The 'Exposure Data Validity Scale': Development and Analysis
4.9 Summary
Tables and Figures for Chapter 4

Chapter 5 - Discussion
5.1 The Evidence for Non-Differential and Differential Exposure Misclassification
5.2 Evaluation of the Effectiveness of the 'Exposure Data Validity Scale'
5.3 Study Validity
5.3.1 Internal Validity
5.3.2 External Validity
5.4 Limitations of the Study
5.5 Recommendations for Future Research
5.6 Conclusions
Table for Chapter 5

Bibliography
Appendices 1-9

LIST OF TABLES AND FIGURES

Table 1: Risk Factors for Breast Cancer
Table 2: Findings of Selected Studies of Recall Accuracy and Recall Bias (Differential Exposure Misclassification)
Table 3: Estimated Response Rates for the Three Study Comparison Groups
Table 4: Reasons Given for Non-Response
Table 5: Comparison of the Study Population to the Screening Mammography First-time and Returning Participants Regarding Various Demographic Factors and the Risk Factor Profile
Table 6: Demographic, Medical, Reproductive and Lifestyle Characteristics of the Study Groups
Table 7: Demographic, Medical, Reproductive and Lifestyle Characteristics of the Study Respondents and Non-Respondents
Table 8: Pearson Product Moment and Spearman Rank Correlations for Prospective and Retrospective Reports of Exposure for Study Variables and Conditions Possibly Related to Breast Cancer Risk - A Summary by Magnitude
Table 9: The Agreement Between Prospective and Retrospective Reports of Exposure as Measured by Kappa Values - A Summary by Magnitude
Table 10.1: A Comparison of Prospective and Retrospective Reports of Exposure by Breast Cancer Cases: Level of Agreement Between Reports (Kappa) and Directional Discordance (McNemar's Test)
Table 10.2: A Comparison of Prospective and Retrospective Reports of Exposure by Controls (Group One): Level of Agreement Between Reports (Kappa) and Directional Discordance (McNemar's Test)
Table 10.3: A Comparison of Prospective and Retrospective Reports of Exposure by Controls (Group Two): Level of Agreement Between Reports (Kappa) and Directional Discordance (McNemar's Test)
Table 11: Exposure-Disease Odds Ratio Estimates for Prospective (Pre-Diagnostic) and Retrospective (Post-Diagnostic) Reports of Exposure for Study Factors Possibly Related to Breast Cancer Development
Table 12: A Comparison of the Prospective and Retrospective Exposure-Disease Odds Ratios: Study Factors for Which the Odds Ratio Estimates Differed
Table 13: Estimation of Exposure-Disease Odds Ratios for the Selected Exposures and Conditions Included in the Validity Scale
Table 14.1: Exposure Data Validity Scale Analysis: The Determination of Study Group Differences for Individual Scale Exposure Variables and the Aggregate Validity Scale Summary Score (Part 1)
Table 14.2: Exposure Data Validity Scale Analysis: The Determination of Study Group Differences for Individual Scale Exposure Variables and the Aggregate Validity Scale Summary Score (Part 2)
Table 14.3: Exposure Data Validity Scale Analysis: Exposure Factors Identified by Breast Cancer Patients as Being Relevant For Disease Occurrence (Part 3)
Table 15: Contingency Table of Subject Responses Classified by Risk Factor and Perceived Level of Risk (Etiologic Importance)
Table 16: Definition of Risk Factor Codes
Table 17: Response Frequency Expressed as Percentages of Marginal Row Totals
Table 18: Analysis of the Frequency of Response Percentages: Factors Identified as "Plausible" Risk Factors for Breast Cancer Development Using the Subjective Inclusion Rule
Table 19: Numerical Output from the Correspondence Analysis: The Principal Inertia (Eigenvalues) and Total Inertia, the Percentages of Inertia and Cumulative Percentages
Table 20: The Analysis of Row and Column Coefficients: Absolute Contributions (CTR), Squared Correlations (COR), Distance of the Profiles from the Origin (FACT T), Profile Masses (MASS) and the Quality of the Representation of the Row and Column Profiles (QLT)
Table 21: Definitions of the Column and Row Coefficients
Table 22: Assessment of the Accuracy of the Two-Dimensional Graphical Representation: QLT Analysis
Table 23: The Evaluation of the Accuracy of the Two-Dimensional Graphical Display: CTR (Contributions to Inertia) and COR (Contributions to the Principal Axis) Analyses
Table 24: A Comparison of the Prospective Risk Estimates of this Study with the Retrospective Estimates of Association Reported in Other Case-Control Studies
Figure 1: A One-Dimensional Display of Row (Risk Factor) and Column (Levels of Risk) Profiles
Figure 2.1: A Two-Dimensional Display of Row and Column Points: A Simultaneous Display
Figure 2.2: A Two-Dimensional Display of Row and Column Points: A Simultaneous Display Increased by Magnification

LIST OF APPENDICES

Appendix 1: Validity Scale Development - Selection of Exposure Variables and Weighting Factors: Letter of Information to Potential Subjects
Appendix 2: The Validity Scale Questionnaire Used to Evaluate the Plausibility and Perceived Etiologic Importance of Several Exposure Variables and Conditions Being Considered for Possible Inclusion in Questionnaire Two
Appendix 3: The Enrolment Screening Questionnaire Administered to Mammography Clients by the Screening Mammography Program of British Columbia. (This Questionnaire is referred to in this thesis as Questionnaire One (Q1). Pre-diagnostic exposure data were generated from responses to Q1 items.)
Appendix 4: Letter of Introduction Sent to Potential Subjects from the Executive Director of the Screening Mammography Program of BC. (This letter introduces the 'purpose' of the research study as well as the researcher who would be calling the potential subjects to determine their willingness to participate.)
Appendix 5: Letter of Information Accompanying the Study Questionnaire (Questionnaire Two)
Appendix 6: The Study Questionnaire - Questionnaire Two (Q2). This Questionnaire was used to Collect Retrospective (Post-diagnostic) Reports of Exposure as well as Subject Responses to Included Validity Scale Items
Appendix 7: Reminder Letter Sent to Non-Respondents
Appendix 8: Codebook Developed for Questionnaire One and Two Data Processing
Appendix 9: Power Calculations

ACKNOWLEDGMENTS

I am grateful to the members of my thesis committee for their individual and collective contributions to the completion of my research and dissertation. I am indebted to my research supervisor, Dr. Robert Corny, who provided ongoing guidance, support and encouragement, as well as constructive criticism and suggestions during the final editing of the manuscript.
I am grateful as well to Dr. Gregory Hislop, the Senior Epidemiologist at the British Columbia Cancer Agency, and also a member of my thesis committee, for assisting me to gain access to the Screening Mammography Program of British Columbia for the conduct of this study. He provided invaluable assistance in the development of the study protocol and the preparation of the study questionnaire, and important input regarding changes to the research design prior to data collection. The other members of my thesis committee were Dr. Michael Schulzer, Dr. Martin Schechter, Dr. Nancy Waxler-Morrison and Dr. Gordon Page. I wish to express to them my sincere gratitude for participating on my thesis committee, for their interest in the study and also for providing me with useful comments and suggestions at the different stages of the dissertation. I also thank Dr. Linda Warren and Ms. Lisa Kahn, the Executive Director and research coordinator respectively of the Screening Mammography Program of British Columbia, for assisting me in the recruitment of the cases and controls, and for providing access to the SMP BC data which were required for this thesis project. There are several other people who generously provided assistance with this thesis, and to whom I am indebted. Dr. Jonathan Berkowitz was an invaluable resource for numerous questions regarding data analysis. I would also like to thank Reverend Amethyst Campbell for her assistance in the coding of data and Mrs. Sherry Mihamoto for data entry. I especially wish to thank the generous women who participated in this study. As well, I acknowledge the assistance of the University of British Columbia for financial support provided by means of a graduate fellowship. My family and friends have provided continuous support and encouragement. In particular, I would like to thank Commander Robert Blakely who encouraged me at a crucial stage in this endeavour to rethink my priorities and to complete this dissertation.
And last but not least, to my husband and 'best friend', Brian Cook, I thank you for your love, your belief in me and my abilities, your constant support and encouragement, as well as your numerous editorial comments.

To the Memory of My Grandmother - Mary Newburn-Redhead

Chapter 1
INTRODUCTION

1.1 Epidemiology of Breast Cancer: The Magnitude of the Problem in Canada and British Columbia

The incidence of breast cancer is rising around the world. It is a major public health problem for women in the more developed countries, especially in North America and western Europe (Miller and Bulbrook, 1986). Until approximately 1985, breast cancer was the principal cause of cancer deaths in Canadian women. However, while breast cancer mortality rates have remained stable over the past decade, mortality rates for lung cancer have increased rather dramatically. Consequently, lung cancer has surpassed breast cancer as the most frequent and fatal neoplasm in women in the industrialized western countries (National Cancer Institute of Canada, 1995). Furthermore, the National Cancer Institute of Canada (1995) projects that 5,800 Canadian women will die from lung cancer and 5,400 from breast cancer in 1995. The 1995 age-standardized breast cancer mortality rates for Canada and British Columbia are 31 per 100,000 and 27 per 100,000 respectively. Therefore, it is estimated that of the 5,400 Canadian women who will die from breast cancer in 1995, 830 will be women in British Columbia (National Cancer Institute of Canada, 1995). In Canada, breast, colorectal and lung cancers are responsible for at least 55% of the new cases of cancer in women. The age-standardized incidence rate (ASIR) for cancer of the female breast in Canadian women is 103 cases per 100,000, whereas the ASIR in British Columbia is 117 cases per 100,000. The BC rate is comparable to the Canadian age-standardized incidence rate.
Given the national and provincial ASIRs, it is estimated that in 1995 there will be 17,700 new cases of breast cancer diagnosed in Canada, of which 2,600 will be in women from British Columbia (National Cancer Institute of Canada, 1995). In 1989, the number of new cases of breast cancer in Canada was 12,300. A comparison of the 1989 and 1995 figures demonstrates clearly that breast cancer incidence continues to rise, and more new cases are expected each year. Age-standardized breast cancer incidence in Canada increased by an average of 1.3% per year from 1983 to 1990. Only the incidence of female lung cancer is rising faster, at a rate of 3.7% (National Cancer Institute of Canada, 1989; Friedenreich, 1990; National Cancer Institute of Canada, 1995). Miller and Bulbrook (1986) projected that if the incidence rates continue to increase in young women under the age of 50 years worldwide, "the number of breast cancer cases will increase from 541,000 (in 1975) to over 800,000 by the year 2000, and this figure could even exceed one million cases" (p.173). In addition, "over half of these cases will be diagnosed in countries where breast cancer is not currently the most frequent cancer in women" (p.173): in Asian countries including Japan and Singapore, central Europe, and some South American countries (Miller and Bulbrook, 1986, p.173; Friedenreich, 1990, p.5). Breast cancer is an epidemic which is responsible for more morbidity than any other disease (Papaioannou, 1974; Wallis, 1991). The observed trends of a decrease in mortality, along with an increase in incidence of female breast cancer over the past decade, may be related to several factors, including: earlier detection, mammographic examinations (used since the mid-1980s), more sensitive diagnostic techniques, and improvements in cancer registration (Wigle et al., 1986; Friedenreich, 1990; National Cancer Institute of Canada, 1995).
The National Cancer Institute of Canada (1995) noted that although cancer (for all sites) is primarily a disease of elderly Canadians, female breast cancer is "more frequent at earlier ages with about one third of all cases occurring in women aged 40-59 years, and another third in women aged 70 years and older" (p.38). In fact, breast cancer is the leading cause of death for women aged 35-50 years (Paffenbarger et al., 1980). A Canadian woman has a 1 in 3 chance of developing cancer over the course of her lifetime. In contrast, the same woman has a 1 in 9 chance of developing breast cancer, and a 1 in 24 chance of dying from a breast neoplasm. Unfortunately, the probability of developing breast cancer during one's lifetime also continues to increase (National Cancer Institute of Canada, 1995, p.42). While breast cancer was the principal cause of premature mortality in women in 1989, lung cancer is replacing it as the leading cause of premature mortality (i.e., years of life lost before age 75) (Bisch et al., 1989; Friedenreich, 1990). However, together, they still pose a major health concern to women and to health care providers and policy-makers. According to the 1995 Canadian Cancer Statistics, the potential years of life lost due to breast cancer in 1992 was estimated at 95,000, which is equivalent to 21.6% of premature mortality from all causes. "Although more men than women die from cancer every year, women generally live longer than men and many of the cancer deaths among women occur at younger ages", as is the case with breast cancer. Consequently, the loss of potential years of life is slightly higher for women (National Cancer Institute of Canada, 1995, p.47). The five-year survival rate for breast cancer is approximately 75%. Only 63% of the breast cancer patients are alive 10 or more years after diagnosis (Wallis, 1991).
Breast cancer remains a major public health concern, and is indeed an epidemic affecting women in Western industrial countries (Paffenbarger et al., 1980) for several reasons: its morbidity, the years of life lost due to premature death, a rising incidence rate, relative ignorance regarding its etiology, and conflicting research reports on exposure-disease associations for the risk factors believed responsible for breast cancer development. Multiple variables have been identified as potential risk factors for breast cancer. These factors, and their influence on risk, are identified and summarized in Table 1 (pp.13-17). It must be noted that very few variables have been identified with absolute certainty as risk factors for breast cancer development. A plethora of etiologic investigations (both case-control and cohort) has been conducted, and these have produced conflicting rather than supporting evidence for these putative factors. Relative risk estimates for the exposure variables differ considerably from study to study. Governments continue to invest grant monies, and researchers continue to explore the question of breast cancer etiology, in order to identify the precipitating factors. Limitations in research methodology may be responsible for the many contradictions, and for the significant lack of progress made in our understanding of the etiology and natural history of breast cancer.

1.2 The Research Problem and Significance of Study

Considering the fact that the incidence of breast cancer in Canada and British Columbia is among the highest in the world, and continues to increase, there is a clear need to complete well-conducted and methodologically sound etiologic studies to determine the factors responsible for disease occurrence, and when possible, to initiate prevention programs.
The case-control design is particularly well-suited for studying a disease like breast cancer which occurs many years after exposure to the suspected etiologic factors. Unlike cohort studies and clinical trials in which subjects are followed in a forward direction from exposure to a particular outcome or disease, case-control studies begin with the recruitment of subjects who already have a particular outcome or disease (referred to as the 'cases') and those without the disease (the 'controls'). The two groups are then compared with respect to the prevalence of the exposures and antecedent conditions thought to be associated with the development of the outcome or disease under investigation. By employing the case-control approach to diseases like breast cancer, which have a long latency, investigators can quickly and efficiently mount and conduct a study because they begin immediately to search for and to recruit women with breast cancer (i.e., the cases). Unlike the situation with cohort studies and clinical trials, relatively little money and effort are expended on the follow-up of subjects who remain free of disease; as well, "there is no need to wait for time to elapse between an exposure and the manifestation of disease" (Schlesselman, 1982, pp.18-19). In addition, Mantel (1973) indicates that comparatively fewer subjects are required in order to test for exposure-disease associations (i.e., there is no requirement to follow a large number of subjects to get sufficient numbers of individuals who develop the particular disease); and, more than one potential risk factor can be studied at the same time. Another advantage to a retrospective study is that the standard error of the odds ratio is smaller than that found in a prospective study, or a cross-sectional study of the same size (Fleiss, 1981).
The stated advantages of case-control studies make them the preferred design for the study of rare diseases and those with a long latency period (e.g., breast cancer). However, the ability of a case-control study to generate valid estimates of association between the risk factor(s) and disease occurrence (i.e., the exposure-disease odds ratio) depends on the capacity of the cases and the controls to provide complete and accurate personal histories regarding past events and the exposures of interest. By definition, the exposure-disease odds ratio (OR) is the ratio of the odds of exposure among the cases to the odds of exposure among the controls (Last, 1995, p.118). For rare conditions (e.g., most cancers), the odds ratio provides a valid estimate of the relative risk (RR), which is a measure of the magnitude of the association between exposure and disease, and indicates the likelihood of developing the disease in those subjects who were exposed relative to those who were not exposed (Hennekens and Buring, 1987; Miettinen, 1976). If the OR=1.0, the odds of exposure among the cases are equal to the odds of exposure among the controls, and there is no association between the exposure and the disease. An OR>1.0 indicates a positive association, whereas an OR<1.0 represents a negative association.
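As a worked illustration of the odds ratio definition above, the following minimal Python sketch computes the OR from a 2x2 table together with an approximate 95% confidence interval based on Woolf's logit method. The counts are hypothetical and are not taken from this study; the function name `odds_ratio` and the choice of Woolf's method are assumptions made for illustration only.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio and approximate confidence interval for a 2x2 table.

    a: exposed cases      b: unexposed cases
    c: exposed controls   d: unexposed controls
    All four cells must be greater than zero.
    """
    # Odds of exposure among cases (a/b) divided by odds among controls (c/d).
    estimate = (a * d) / (b * c)
    # Woolf's method: standard error of the natural log of the odds ratio.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical counts: 40 of 100 cases and 25 of 100 controls report exposure.
estimate, lower, upper = odds_ratio(a=40, b=60, c=25, d=75)
print(f"OR = {estimate:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

With these illustrative counts the point estimate is 2.0, a positive association in the sense described above; an interval that includes 1.0 would be compatible with no association.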
Methodologists challenge the credibility of case-control research, in particular the reliability and validity of exposure data, and consequently the study's findings with respect to the relationship between an exposure and disease (i.e., OR estimates) for several reasons: 1) its non-experimental approach, and the "backwards directionality" of reasoning from effect (disease outcome) to cause (risk factors) (Rothman, 1986; Kramer, 1988); 2) the fact that subjects are requested to provide exposure information after they are aware of their disease status (Mackenzie, 1986; Friedenreich, 1990); 3) the observation that the compared groups (i.e., those subjects with the disease (cases) and those without it (controls)) are selected from two separate populations, so that the researcher cannot be confident that the groups are similar with respect to extraneous risk factors and other sources of distortion (Kramer, 1988; Kleinbaum et al., 1982); and, 4) the number, diversity and legitimacy of variables that have been proposed as possible contributors to inequivalent and faulty recall of exposure by the cases and controls: the salience and emotional impact of the outcome event, the presence or absence of disease, the length of recall, "telescoping", respondent characteristics (motivation, age, sex, education, socioeconomic status), the research design employed, the type of controls used (e.g., an anamnestic equivalent, diseased referent, population controls, proxy respondents), the method of data collection (e.g., interview, self-administered questionnaire), trait desirability, the need for social approval when the requested information is either sensitive or embarrassing, as well as time, memory and judgment factors. Critics of the case-control method believe that this research design is susceptible to several methodological problems and hidden biases (Cole, 1979).
Of particular concern is the possibility of exposure misclassification (measurement errors which occur in the process of obtaining the required exposure information from the cases and the controls) and specifically, the differential recall of exposure by the cases and the controls (i.e., recall bias). If differential exposure misclassification is present, the odds ratio may be biased in unpredictable ways (i.e., either the underestimation or overestimation of the association between a potential risk factor and the occurrence of disease). It is also conjectured that the fluctuations in the odds ratio estimates either towards or away from the null value (i.e., OR=1.0, which implies no association between the risk factor and the disease outcome) may result in either type I or type II errors, and partially explain the problem of discrepant and contradictory results found among etiologic case-control and cohort studies on the same topic (Hayden et al., 1982; Morabia, 1990; Austin et al., 1994). However, such criticisms of the case-control method may lack empirical justification. In view of the large number of case-control studies completed, it is notable that very few of them have evaluated both the reliability and validity of the exposure information collected and the impact of exposure misclassification on the estimates of effect (i.e., ORs). The few studies which do exist have not provided conclusive and convincing evidence that differential exposure misclassification is as significant a problem as it is thought to be. This lack of evidence does not convince the critics, often due to the inherent methodological limitations of the research itself (Mackenzie, 1986). The warnings about case-control methodology persist, and confidence in case-control results is clouded by the fear that bias does exist, and has in fact invalidated study conclusions.
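The claim that differential misclassification can move the odds ratio in either direction can be illustrated with expected counts. The sketch below is a hypothetical example, not an analysis from this study: it assumes a "true" table with OR = 2.0 and applies different recall sensitivities (the proportion of truly exposed subjects who report exposure) to cases and controls. The function names and all numerical values are assumptions chosen for illustration.

```python
def observed_counts(exposed, unexposed, sensitivity, specificity):
    """Expected reported (exposed, unexposed) counts under imperfect recall."""
    # Truly exposed who report exposure, plus truly unexposed who over-report.
    obs_exposed = sensitivity * exposed + (1 - specificity) * unexposed
    obs_unexposed = (exposed + unexposed) - obs_exposed
    return obs_exposed, obs_unexposed


def odds_ratio(a, b, c, d):
    """a, b: exposed/unexposed cases; c, d: exposed/unexposed controls."""
    return (a * d) / (b * c)


# Hypothetical true exposure status: cases 40/60, controls 25/75 (OR = 2.0).
true_or = odds_ratio(40, 60, 25, 75)

# Scenario 1: controls under-report exposure more than cases do.
a, b = observed_counts(40, 60, sensitivity=0.95, specificity=1.0)
c, d = observed_counts(25, 75, sensitivity=0.70, specificity=1.0)
or_away_from_null = odds_ratio(a, b, c, d)

# Scenario 2: cases under-report exposure more than controls do.
a, b = observed_counts(40, 60, sensitivity=0.70, specificity=1.0)
c, d = observed_counts(25, 75, sensitivity=0.95, specificity=1.0)
or_toward_null = odds_ratio(a, b, c, d)

print(f"true OR          = {true_or:.2f}")
print(f"controls forget  = {or_away_from_null:.2f}")
print(f"cases forget     = {or_toward_null:.2f}")
```

Under these assumptions the first scenario overestimates the association and the second underestimates it, which is why the direction of recall bias cannot be presumed without data on how each group reports.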
These doubts will persist as long as researchers fail to demonstrate the contrary directly within the context of case-control research. Case-control methodology is nevertheless an important epidemiological tool for the investigation of cause-effect relationships in situations in which neither randomized clinical trials nor cohort studies can be performed. Therefore, it is necessary in the routine conduct of case-control studies for researchers to address formally the reliability and validity of their data and the exposure estimates, and to adjust the estimates of effect when distortion exists.

Accordingly, the overall objective of this study is to determine the reliability and validity of exposure information collected in a case-control study of breast cancer, and the impact of any resulting exposure misclassification on the exposure-disease odds ratios for the various study factors. This research extends the work of Klemetti and Saxen (1967), Mackenzie (1986), Friedenreich (1990), and others who have investigated the nature and impact of differential misclassification (recall bias) in case-control studies by also investigating the presence and consequences of non-differential exposure misclassification (NDEM) on estimates of effect, as well as the use of an exposure data validity scale to measure and to control differential exposure misclassification (DEM).

Rothman (1986) noted that epidemiologists have generally found it more acceptable to underestimate than to overestimate effects. This may partly explain the focusing of attention on differential exposure misclassification in substantive research. However, this investigator believes that NDEM, which occurs when exposure "misclassification is incorrect for an equal proportion of the cases and the controls" (Rothman, 1986, p.87), must also be considered in the conduct of etiologic case-control studies.
Non-differential misclassification bias and its effect on the odds ratio estimates have not received enough attention from researchers and methodologists. NDEM must be systematically and empirically investigated in case-control studies. Its presence may obscure subtle but real risk effects (i.e., the occurrence of type II error), and account for the failure to find significant risk effects for factors which have biological plausibility and have been demonstrated in animal studies. This researcher contends that data validity studies must consider both type I and type II errors: that is, the declaration that an association is significant when it is not (a type I error), and the failure to find an effect when one exists (a type II error).

The significance of this study also rests in its attempts to design, implement and evaluate a validity scale as a design strategy for use in case-control studies to measure and to control for exposure misclassification. If successful, the validity scale concept could be adopted and modified for routine inclusion in a case-control study. Thus, researchers could (in every case) verify directly and empirically the epidemiological adequacy of the exposure data on which their research depends, or show that exposure measurement error and/or bias exists and, if it is present, whether or not it poses a plausible threat to the study's conclusions. Information on the magnitude and the direction of exposure misclassification could then be used statistically to correct estimates of association (odds ratios), so that valid study findings could be generated: conclusions in which both the researcher and the scientific community could have confidence. This will increase the design's credibility and enhance its acceptance as an important research strategy for the study of chronic and rare disease etiology, while furthering our understanding of the natural history and the etiologic basis of disease.
This study is also significant in its application of an infrequently used multivariate statistical technique, correspondence analysis (developed by the French analyst Jean-Paul Benzécri in the early 1960s), for the construction of the 'exposure data validity scale', and specifically, the selection of the exposure items for inclusion in the scale and their relevant weighting factors.

1.3 Overview of the Dissertation

This dissertation is divided into five chapters. Chapter One provides the general background to this study. It begins with a brief discussion of the epidemiology of breast cancer to demonstrate the magnitude of this problem, and its impact on women's health. The multiple risk factors identified in breast cancer etiology are then summarized, followed by comments about the lack of progress in understanding the natural history of breast cancer, and about the existence of conflicting evidence for the etiologic importance of the various risk factors; these are provided to substantiate the rationale for this study, and for the selection of the specific research questions. Finally, there is a brief argument supporting the significance of this study for the evaluation and interpretation of previously conducted cancer case-control studies, as well as for the refinement and improvement of future etiologic studies.

Chapter Two discusses the limitations of case-control methodology, and examines the problem of exposure misclassification, with its impact on the estimation of relative risk. The review of the literature in Chapter Two deals specifically with two critical content areas: 1) the processes of human memory responsible for the encoding, retrieval and reporting of exposure and other health related events, as well as the factors which affect recall accuracy; and, 2) the substantive research which has examined the reliability of exposure data, and the presence, direction and magnitude of differential exposure misclassification (recall bias).
Chapter Three provides a detailed discussion of the research methods employed in this study to address the research questions. The areas discussed include: the choice of a nested case-control study design, the recruitment and the selection of the study groups, the use of multiple control groups and anamnestic equivalents, as well as the specific procedures used to collect and to analyze the data. This chapter also describes the stages in the construction of the 'exposure data validity scale' which was to be evaluated in this study as a possible design strategy to measure and to control for recall bias.

Chapter Four reports the findings of the study. It includes a description and comparison of the study groups, and an analysis of the non-respondents regarding demographic, medical, reproductive, exposure, anthropometric and lifestyle variables. The results of the data analytic procedures are presented to address several key issues: the reliability of prospective versus retrospective exposure data, whether or not differential exposure misclassification exists, the impact of exposure misclassification on the exposure-disease odds ratios, and the validity of study conclusions regarding the association of study factors to the occurrence of breast cancer. Chapter Four concludes with the results of the 'exposure data validity scale', and an evaluation of its effectiveness.

Chapter Five examines and interprets the results of this study with respect to the agreement between prospective and retrospective exposure data, and to the extent and impact of non-differential and differential exposure misclassification on risk estimates. As well, it provides recommendations and conclusions regarding the use of a validity scale to measure case-control differences in exposure reporting accuracy. The validity and generalizability of the findings are discussed, along with the strengths and limitations of this study.
The chapter concludes with important recommendations for future studies.

Table 1: Risk Factors for Breast Cancer

1. Female sex
   Influence on breast cancer risk: Increases risk
   Comments: Most important risk factor

2. Age at menarche (early vs late)
   Influence on breast cancer risk: Inversely related to risk. Protective: late menarche
   Comments: Early menarche: menarche before age 12 has twice the risk relative to menarche after age 12. Late menarche: ≥ 15 yrs [Relative risk (RR) = 1.1 to 1.9] (source 2)

3. Age at menopause (late vs early)
   Influence on breast cancer risk: Directly related to risk
   Comments: Late menopause (after age 45) results in twice the risk relative to early menopause [RR = 1.1 to 1.9] (source 2)

4. Age at first full term pregnancy (late vs early)
   Influence on breast cancer risk: Directly related to risk
   Comments: Late first full term pregnancy (older than 30 years) or no pregnancy (nulliparous) results in 2-3x the risk relative to early first pregnancy (i.e., younger than 22 years), which appears to be protective [RR = 2.2 to 4.0] (source 2)

5. Marital status
   Influence on breast cancer risk: High risk: never married. Low risk: ever married
   Comments: Probably related to childbearing practices [RR = 1.1 to 1.9] (source 2)

6. Country of residence
   Influence on breast cancer risk: High risk: industrialized countries (North America and Northern Europe). Low risk: Asia, Africa
   Comments: [RR > 4.0] (source 2)

7. Socioeconomic status
   Influence on breast cancer risk: High risk: upper class. Low risk: lower class
   Comments: [RR = 1.1 to 2.0] (source 3)

8. Oophorectomy
   Influence on breast cancer risk: Decreases risk
   Comments: Protection is inversely related to age at oophorectomy. Low risk (protection) if surgery occurs before age 40 years [RR = 2.0 to 4.0] (source 2)

9. Ionizing radiation
   Influence on breast cancer risk: Increases risk if the radiation to the chest is in moderate to high doses (source 3)
   Comments: Dose response is probably linear. Sensitivity varies with age. The greatest sensitivity occurs during puberty and mammary development [RR = 2.0 to 4.0] (source 2)

10. Benign breast disease
    Influence on breast cancer risk: Risk is increased with fibrocystic disease. Proliferative lesions most important
    Comments: Women with benign breast disease may have increased risk up to 5x with demonstrated severe hyperplasia with atypia [RR = 2.0 to 4.0] (source 2)

11. Family history of breast cancer
    Influence on breast cancer risk: Increases risk
    Comments: Having first degree relatives (mother, sister, grandmother) with breast cancer gives 2x to 3x the relative risk. Risk is increased (higher) if relative has had early breast cancer (premenopausal) and/or bilateral breast cancer [RR > 4.0] (source 2)

12. Early abortion
    Influence on breast cancer risk: Increases risk
    Comments: Inconclusive evidence in research studies

13. Body build/Weight/Obesity (Anthropometric)
    Influence on breast cancer risk: Obese vs normal weight (thin) may increase risk of postmenopausal cancer. Breast cancer at ≥ 50 yrs (high risk: obese; low risk: thin). Breast cancer at < 50 years (high risk: thin; low risk: obese)
    Comments: Concerns only postmenopausal women; may be opposite premenopausally. Weak evidence that premenopausal breast cancer related to excess body weight. Negatively associated with premenopausal cancer [RR = 2.0 to 4.0] (source 2)

14. Race/Ethnicity
    Influence on breast cancer risk: High risk: white. Low risk: black. Breast cancer at ≥ 45 yrs (high risk: white; low risk: Hispanic/Asian). Breast cancer at < 40 yrs (high risk: black; low risk: Hispanic/Asian)
    Comments: This relationship exists for women over age 40 years. Under 40 years, black women have a higher risk for breast cancer [RR = 1.1 to 1.9] (source 2)

15. Place of residence
    Influence on breast cancer risk: High risk: urban. Low risk: rural
    Comments: Effect mediated through several other factors [RR = 1.1 to 1.9] (source 2)

16. History of primary cancer in ovary or endometrium
    Influence on breast cancer risk: High risk: yes. Low risk: no
    Comments: [RR = 2.0 to 4.0] (source 2)

17. Age
    Influence on breast cancer risk: Increases risk. High risk: old. Low risk: young. Effect starts at puberty and diminishes after menopause
    Comments: Breast cancer is extremely uncommon in women under age 25. For women under 30, the risk is about 1%, but once women are in their 30s, they are in a 15% risk category. After age 40, women enter the period in which 85% of breast cancers occur (source 4) [RR > 4.0] (source 3)

18. Lactation
    Influence on breast cancer risk: High risk: no. Low risk: yes (protective)
    Comments: Research evidence is inconclusive. Very weak relationship, if any

19. Parity
    Influence on breast cancer risk: Inversely related to risk
    Comments: Effect of additional pregnancies small compared to age at first full term birth

20. Exogenous hormone use: a. Estrogen replacement therapy (ERT); b. Oral contraceptives
    Influence on breast cancer risk: If taken under age 45 years: nil risk. Evidence inconclusive or controversial
    Comments: The overall opinion is that estrogen used for contraception or for the treatment of menopausal symptoms is not associated with an increased risk of breast cancer. This is a controversial area and has proponents on both sides. Prolonged use of ERT may increase risk. Birth control pills taken after age 45 years may increase risk

21. Lifestyle factors
    a. Smoking: No overall association with breast cancer
    b. Alcohol consumption: Increased risk with alcohol consumption. Inconclusive and conflicting research evidence. Dose-response data are inconsistent
    c. Emotional stress: Inconsistent results. Nil risk
    d. Exercise: Inconsistent results confounded by diet and obesity effects

22. Diet
    Influence on breast cancer risk: Fat: saturated fat positively associated with risk in postmenopausal patients. Protein: no association after adjustment for fat. Carbohydrates: no association after adjustment for fat. Total calories: no association after adjustment for fat. Fiber/Fruits/Vegetables: negative association, strongest in postmenopausal women
    Comments: High fat diet probably increases risk

23. Atypical epithelial cells in nipple aspirate fluid (source 3)
    Influence on breast cancer risk: Increases risk. High risk: cells present. Low risk: no fluid produced
    Comments: [RR > 4.0] (source 3)

24. Nodular densities on the mammogram (source 3)
    Influence on breast cancer risk: Increases risk. High risk: densities occupying > 75% of breast volume. Low risk: parenchyma composed entirely of fat
    Comments: [RR = 2.0 to 4.0] (source 3)

25. Hyperplastic epithelial cells without atypia in nipple aspirate fluid (source 3)
    Influence on breast cancer risk: Increases risk. High risk: cells present. Low risk: no fluid produced
    Comments: [RR = 2.0 to 4.0] (source 3)

26. Religion
    Influence on breast cancer risk: High risk: Jewish. Low risk: Seventh-day Adventist, Mormon (source 3)
    Comments: [RR = 1.1 to 2.0] (source 3)

Risk factor data were taken from the following sources:
1. Vihko R, Apter D. Endogenous steroids in the pathophysiology of breast cancer. Critical Reviews in Oncology/Hematology 1989; 9: 1-16.
2. Kelsey JL, Hildreth NG. Breast and Gynecologic Cancer Epidemiology. Boca Raton, Florida: CRC Press, 1983.
3. Kelsey JL. Breast Cancer Epidemiology: Summary and Future Directions. Epidemiologic Reviews 1993; 15: 256-263, p. 257.
4. Dunn B, Hislop TG, Anthony V. Breast Cancer Risk and Prognosis (Handout - BCCA).
5. Petrakis NL, Ernster VL, King MC. In Cancer Epidemiology and Prevention (D Schottenfeld and JF Fraumeni, eds.). Philadelphia: WB Saunders, 1982, pp. 855-870.

Chapter 2 BACKGROUND TO THE RESEARCH PROBLEM

A case-control study is classified as a retrospective, observational and non-experimental research design, in which two groups are studied: one with a particular disease or outcome ('cases'), and the other without the disease or outcome ('controls' or 'referents'). The two study groups are then compared regarding the prevalence of prior exposures, characteristics and conditions hypothesized as putative (risk) factors for the outcome event. In other words, case-control methodology seeks to compare exposure frequencies between diseased and non-diseased study groups; the inclusion of one or more control groups provides an estimate of the frequency of exposure expected in subjects free of the disease (Schlesselman, 1982). In a discussion of the historical development of case-control methodology, McFarlane et al.
(1986) noted that this research design was developed by epidemiologists to examine cause-effect relationships in situations where experimental randomized clinical trials and cohort studies could not be conducted due to lack of feasibility, or to logistical, ethical, and cost limitations.

2.1 Methodological Limitations of Case-Control Studies

There are several significant advantages to using the case-control design: the simultaneous study of multiple risk factors; the relative simplicity of design, implementation and evaluation; its statistical efficiency (relatively few subjects required, no extended follow-up or loss of subjects to follow-up); its cost-efficiency; and, the lack of harm to subjects. Combining these virtues with the development of, and improvements in, statistical procedures for the handling of case-control data has resulted in widespread use and increased acceptance of case-control methodology (Cole, 1979; Breslow, 1982; Schlesselman, 1982; Rothman, 1986). Sackett (1979) noted a fourfold increase in the number of case-control studies being completed and published in medical journals. In fact, case-control methodology remains the design of choice for the study of rare diseases (e.g., congenital anomalies) and chronic diseases such as cancer, where the incubation or latency period between exposure and disease outcome extends over many years (Mackenzie, 1986).

However, the disadvantages of using case-control methodology, and its associated deficiencies, have provoked intense scrutiny of this research design, dividing the research community into supporters and opponents of this investigative tool. The deficiencies and/or limitations of case-control methodology include:

1) The possibility of unreliable and incomplete exposure data being collected. In a case-control study, the reliability and validity of exposure data are challenged because the researcher must rely on subjects' recall or on records for information on past exposure.
Records are not always available equally for both the cases and controls, and exposure data may be missing or recorded in a format that is not useful in etiologic studies. Problems also exist because the exposure data are collected after diagnosis (group status) has been determined (Mackenzie, 1986). Mackenzie (1986) and Friedenreich (1990) also note that cases and controls differ in many ways because of their disease experience; these differences may affect recall accuracy. The recall of past events and exposures is susceptible to substantial human errors (Kleinbaum et al., 1982; Cole, 1979). For example, diagnosis and treatment may impede the memory processes of case subjects, preventing complete and accurate recall of past exposures; the controls may be less motivated to remember and to report exposures because the outcome event is not important to them; the cases (on the other hand) may be more motivated because of the salience of the outcome event (disease diagnosis), and the need to understand why this has happened to them, and what has caused their disease. There may be increased stimulation of 'search for cause' cognitive processes within the case subjects (Raphael, 1987; Raphael and Cloitre, 1994). Consequently, with the causal search model, one would expect that "causes or prior exposures connected plausibly with a disorder should be reported more completely by the cases than the controls" (Raphael, 1994, p.555). Furthermore, when subjects are aware of the exposure-disease associations being investigated, or when the exposure information being requested is either threatening or embarrassing to them, differences in recall may occur.
Lastly, differences in hospitalization and diagnostics may help cases to remember and report antecedent exposures more completely and reliably because they have been prompted through increased questioning and examinations by several health care practitioners;

2) The validation of exposure information is difficult or sometimes impossible because case-control research must rely on either the recall of the cases, controls or proxy respondents, or exposure data which are recorded in hospital, physician or pharmacy records (Schlesselman, 1982);

3) The temporal direction of the investigation (directionality of inference testing) is from effect (disease) back to cause (risk factors). This 'backwards directionality' is the focal point of case-control controversy because it is the most significant methodological difference from the classical experimental approach, which investigates disease causation in a forward direction from cause to effect, through a process of deductive reasoning. It is assumed that under these circumstances, complete and accurate exposure histories for the cases and controls cannot be acquired (Schneiderman and Levin, 1973; Rothman, 1986; Mackenzie, 1986; Kramer, 1988); and,

4) The fact that comparison groups are selected from two separate populations (i.e., those with the disease (cases), and those without (controls)). In this case, the researcher cannot be certain that the two comparison groups are comparable regarding "extraneous risk factors and other sources of distortion" (confounders) (Kleinbaum et al., 1982, p.70).

Cole (1979) and Sackett (1979) conclude that the retrospective approach of case-control methodology, and the deficiencies previously discussed, leave case-control studies subject to a wide range of sampling and measurement biases (including exposure misclassification and recall bias) and to methodological problems which may bias estimates of association between exposures and disease.
If significant, these biases may create or obscure effect estimates, and consequently invalidate the study's findings. Research into design deficiencies and possible biases is clearly justifiable given both the wide use and importance of case-control methodology in the study of rare and chronic diseases (including cancer), and its important limitations. Methodologists such as Feinstein (1979a), Cole (1979) and Ibrahim and Spitzer (1979) have called for the systematic and empirical investigation of case-control design and its validity as a paradigm for the determination of etiologic associations. Dorn (1959) summarized this need when he stated that there was an ongoing requirement for researchers to ensure study validity by improving upon the design, execution and analysis phases of case-control studies, and most importantly by providing strategies to assess for, to minimize or to control for the errors and biases to which they are susceptible.

Sackett (1979) observed that both case-control and cohort analytical studies are susceptible to bias, but of the two, the case-control design is both affected by more sources of bias and less able to guard against them. He advocated that "the continued development and refinement of methodological standards for case-control studies becomes a high priority, especially in view of their increasingly frequent execution and appearance in the scientific literature" (p.59).

This dissertation seeks to do what Dorn (1959) and Sackett (1979) advocate: it addresses the appropriateness of case-control methodology for the etiologic investigation of breast cancer; it assesses the reliability and validity of retrospectively collected exposure data, and considers the nature and effect of any exposure misclassification on the estimates of effect. Finally, it proposes the design, implementation and evaluation of an 'exposure data validity scale' for the measurement and the control of differential exposure misclassification.
2.2 The Measurement of Exposure: The Problem of Misclassification and Its Impact on Risk Estimates

In a case-control study, once the cases and controls have been recruited, a primary task of the investigator is to collect exposure information from the study groups for comparison. Rothman (1986) noted that the collection of exposure information from the cases and the controls may be subject to error which results in "information bias", and the distortion (biasing) of the estimates of effect (p.84). Rothman also differentiated between two types of information bias, non-differential and differential misclassification, as well as their consequences on exposure-disease odds ratios. The basis for distinguishing between these two types of misclassification error according to Rothman (1986) is "whether the classification error on one axis of classification (either exposure or disease) is independent of the classification on the other axis. The existence of classification errors that are not independent of the other axis is referred to as differential misclassification, whereas the existence of classification errors for either exposure or disease that are independent of the other axis is considered non-differential misclassification" (p.84).

Differential exposure misclassification bias is regarded as a major threat to the validity of case-control studies because it can result in either an exaggeration or an underestimation of an exposure-disease association. Differential exposure misclassification is sometimes referred to in epidemiological texts as information or response bias (Checkoway et al., 1989). As discussed in Section 2.1, there are multiple factors which may influence recall accuracy, and result in exposure misclassification by the cases and the controls. The differences in the completeness and accuracy of exposure histories provided by the cases and controls may be random, systematic, or both random and systematic.
Raphael (1987) distinguishes between simple memory failure in which recall is equivalently poor (non-differential) among cases and controls, and 'anamnestic inequivalence' (i.e., differential memory failure) in which the cases and controls differ with respect to the completeness and accuracy of exposure recall. Section 2.3 will detail the various factors that may account for 'anamnestic inequivalence' and the possible differences in the recall accuracy of the cases and the controls.

In the case of non-differential exposure misclassification (NDEM), the prevalence of exposure reporting errors (misclassification) is similar for the cases and the controls. Furthermore, exposure misclassification errors are independent of the case-control (disease) status of the study subjects. Both the cases and controls experience 'memory failure', and are unable to remember or report exposures either accurately or completely. This type of exposure misclassification results in measurement error and usually a loss of statistical power (Raphael, 1987, p.167). As a consequence of NDEM, the exposure-disease odds ratios are biased in a predictable way, towards the null value (i.e., an assessment of no association between an exposure and the disease/outcome event) (Copeland et al., 1977). In other words, the presence of NDEM results in the weakening or the masking of an association, and a type II error. Methodologists have noted that non-differential exposure misclassification can obscure a relationship between exposure and disease, but cannot create a statistically significant association when none exists (Gullen et al., 1968; Marshall et al., 1981; Mackenzie, 1986; Rothman, 1986; Chu et al., 1989).

Conversely, differential exposure misclassification (DEM) due to recall bias occurs when the probability of exposure misclassification is different for the cases and the controls.
The prevalence of false positive and false negative reports of exposure (patterned error misclassification) differs systematically by group (case vs control). DEM occurs as a result of differential memory failure (Raphael, 1987). Here, the systematic misclassification of exposure is related to outcome (disease status): the resulting bias leads either to an underestimation or overestimation of the strength of the association between the outcome and the hypothesized risk variables. In other words, DEM results in a systematic departure from the 'truth', and biases the odds ratios either towards or away from the null value, in an unpredictable manner (Checkoway et al., 1989; Coughlin, 1990).

Several studies which have attempted to study the nature and effects of DEM have incorrectly conceptualized what recall bias is (i.e., assuming it to be the over-reporting of exposure by cases and the under-reporting of exposure by controls), and therefore have only looked for odds ratio distortions that are biased away from the null. Any study which hopes to determine if differential exposure misclassification must be taken into account will have to determine both the prevalence of false positive and false negative reports, which are a function of the sensitivity and specificity of exposure classification, and subsequently, whether or not the exposure-disease odds ratios are biased towards or away from the null value.

The effect of non-differential exposure misclassification on estimates of effect has been discussed by Bross (1954) and Copeland et al. (1977). Rothman (1986) and Austin et al. (1994) commented on the fact that NDEM, and not just DEM (i.e., recall bias), is also a threat to study validity.
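The dependence of the observed odds ratio on the sensitivity and specificity of exposure classification can be illustrated with a short numerical sketch. The counts and the sensitivity and specificity values below are hypothetical and are chosen only for illustration; they are not data from this study, and the function names are assumptions of the sketch.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a, b: exposed and unexposed cases; c, d: exposed and unexposed controls.
    """
    return (a * d) / (b * c)


def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected observed (exposed, unexposed) counts in one study group.

    sensitivity: probability that a truly exposed subject reports exposure.
    specificity: probability that a truly unexposed subject reports none.
    """
    obs_exposed = sensitivity * exposed + (1 - specificity) * unexposed
    obs_unexposed = (1 - sensitivity) * exposed + specificity * unexposed
    return obs_exposed, obs_unexposed


def observed_or(cases, controls, case_se_sp, control_se_sp):
    """Expected observed OR given true counts and per-group (Se, Sp)."""
    a, b = misclassify(*cases, *case_se_sp)
    c, d = misclassify(*controls, *control_se_sp)
    return odds_ratio(a, b, c, d)


# Hypothetical true exposure counts: (exposed, unexposed)
true_cases = (200, 300)
true_controls = (100, 400)
true_or = odds_ratio(*true_cases, *true_controls)

# NDEM: the same sensitivity and specificity in cases and controls
ndem_or = observed_or(true_cases, true_controls, (0.8, 0.9), (0.8, 0.9))

# DEM: cases recall exposure better than controls (bias away from the null)
dem_away = observed_or(true_cases, true_controls, (0.95, 0.9), (0.7, 0.9))

# DEM: cases recall exposure worse than controls (bias towards the null)
dem_toward = observed_or(true_cases, true_controls, (0.6, 0.9), (0.9, 0.9))

print(f"true OR = {true_or:.2f}")
print(f"NDEM OR = {ndem_or:.2f}")
print(f"DEM OR (cases recall better) = {dem_away:.2f}")
print(f"DEM OR (cases recall worse)  = {dem_toward:.2f}")
```

Under these assumed values, the non-differential scenario moves the odds ratio towards 1.0, whereas the two differential scenarios move it in opposite directions, which is why a study that looks only for distortions away from the null may miss part of the problem.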
Rothman (1986, p.86) stated that "non-differential misclassification has generally been considered less a threat to validity than differential exposure misclassification, since the bias introduced by non-differential misclassification is always in a predictable direction: toward the null condition [Bross, 1954; Copeland et al., 1977]". For example, if in reality there was a significant difference between the two comparison groups with respect to smoking, the existence of NDEM could result in a certain proportion of the truly exposed cases being misclassified as unexposed, while at the same time the same proportion of the truly unexposed controls could be misclassified as exposed. The result of the exposure misclassification in different directions, but in the same proportion for both the cases and controls, could decrease the real differences between the two groups (cases and controls) regarding nicotine exposure through smoking, and result in an estimated OR=1.0 when, in fact, the true OR > 1.0. Here, the researcher would fail to find a significant association. This example demonstrates that NDEM may be responsible for a type II error: a failure to detect a subtle and weak association between a risk factor (nicotine exposure) and the disease.

Unfortunately, studies on exposure misclassification routinely exclude an examination of exposure data for NDEM. Rothman (1986) further observed that researchers are more concerned about erroneously claiming a significant association when one does not exist (i.e., type I error), than they are about underestimating an odds ratio, and failing to find a significant effect (i.e., type II error).
Rothman further noted that NDEM will be present in every epidemiological study, and that investigators should show greater concern for the consequences of NDEM, especially in studies that indicate no effect, so that they may be able "to determine to what extent a real effect might have been obscured" (p.88). NDEM may result in the obscuring of real effects, especially if they are weak or subtle (Rothman, 1986; Austin et al., 1994). Elwood (1988) noted "that the greater the error the more 'noise' there is in the system, and therefore the more difficult it will be to detect a true difference in the factor being assessed between the groups being compared" (p.60). Consequently, it will be more difficult to detect a true difference in exposure prevalence and to find the cause of an effect.

When compared to DEM, NDEM is considered less serious because it mainly masks marginal associations, while major exposure-disease associations will be detected even in the presence of NDEM. It is also acknowledged that type I error (which is associated with DEM) is more serious than type II error (which is associated with NDEM). This explains why researchers sometimes concentrate exclusively on DEM and its effect on study conclusions. However, it goes without saying that researchers must ensure the reliability and validity of the exposure data, consider the nature and impact of both NDEM and DEM, and then correct their estimates of effect for any resulting distortions due to exposure misclassification.

In summary, exposure misclassification is a potential problem in every study, regardless of design, but especially in case-control research. Therefore, it is very important to assess for its presence, to determine if it is non-differential or differential, and to estimate its magnitude and direction so that estimates of effect may be statistically adjusted.
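One common way in which such a statistical adjustment can be made for a dichotomous exposure is to back-calculate the expected true counts from the observed counts, given estimates of sensitivity and specificity. The sketch below illustrates that general approach only; it is not the validity scale procedure evaluated in this study. The observed counts and the sensitivity and specificity values are hypothetical, and in practice they would have to come from a validation sub-study.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio: a, b = exposed/unexposed cases; c, d = exposed/unexposed controls."""
    return (a * d) / (b * c)


def correct_counts(obs_exposed, obs_unexposed, sensitivity, specificity):
    """Back-calculate expected true (exposed, unexposed) counts for one group.

    Uses obs_exposed = Se * E + (1 - Sp) * (N - E), solved for E.
    Requires Se + Sp > 1 (classification better than chance).
    """
    total = obs_exposed + obs_unexposed
    denom = sensitivity + specificity - 1
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    true_exposed = (obs_exposed - (1 - specificity) * total) / denom
    true_unexposed = total - true_exposed
    if true_exposed < 0 or true_unexposed < 0:
        raise ValueError("assumed Se/Sp are incompatible with observed counts")
    return true_exposed, true_unexposed


# Hypothetical observed counts: (exposed, unexposed)
obs_cases = (190, 310)
obs_controls = (120, 380)

# Assumed non-differential sensitivity and specificity for both groups
se, sp = 0.8, 0.9

observed = odds_ratio(*obs_cases, *obs_controls)
corrected = odds_ratio(
    *correct_counts(*obs_cases, se, sp),
    *correct_counts(*obs_controls, se, sp),
)

print(f"observed OR  = {observed:.2f}")
print(f"corrected OR = {corrected:.2f}")
```

If the misclassification is differential, separate sensitivity and specificity values can be supplied for the cases and for the controls. The corrected estimate is only as good as the assumed values, so in practice a range of plausible values would normally be examined.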
2.3 Factors Affecting Exposure Reporting and Recall Accuracy: Respondent as a Source of Measurement Error

The gathering of information from and about survey respondents through interviews and questionnaires has a long history and is widely used in both the social sciences and health care disciplines (Moss and Goldstein, 1975; Fienberg and Tanur, 1983). In fact, the sample survey has been described as the "single most important information gathering invention of the social sciences" (Adams et al., 1982, p.64). Many disciplines have come to depend on survey data for explanation of disease etiology, for evaluation of treatment protocols, for input to policy-making, for governmental and business administration, as well as for basic and applied social sciences research. However, from a methodological perspective, this research design is susceptible to problems of reliability and validity. Because a significant proportion of survey data, both factual and attitudinal, is derived from self-reports, which ask respondents to recall past events or attitudes, researchers know that such reports may be highly inaccurate. They have identified several response bias variables as possible sources of invalid conclusions for studies which rely on retrospective, self-reported data. These include: trait desirability, the need for social approval, the salience and emotional impact of the event, the method of data collection, the questionnaire format and context, the interview situation, respondent motivation, as well as time, memory and judgment factors. Hauser (1969) emphasized the potential magnitude of the error associated with survey data when he questioned if "...more misinformation than information had been gathered on subjects by means of survey methodology". Some methodologists believe that the response bias variables act as a form of systematic bias which significantly distorts the relationship observed between the independent and dependent variable(s).
Several social scientists have realized that although the literature indicates such distortion may occur, it has not been demonstrated conclusively (Gove and Geerken, 1977; Sudman and Bradburn, 1974). Sackett (1979) commented that methodologists must go beyond the mere cataloging of biases. He also stated that there was an urgent requirement for "the empiric elucidation of the dynamics and results of these biases. Methodologists have too long ignored their responsibility to measure the occurrence and magnitude of bias" (p.59). Many social scientists and health care researchers agree that there should be research on the biases themselves. Accuracy of reporting, specifically the precision and validity of retrospective recall data, is a serious methodological concern. It affects the research endeavours in many disciplines including survey research, health care research, sociology, psychology, demography, market research and statistics. Although there is a body of methodological work assessing the validity and precision of survey methods, the resulting information has not been integrated across the disciplines. Researchers are not fully aware of what is available in subject areas other than their own. For instance, although an experimental psychologist may know very little about collecting retrospective data in surveys, her study of human memory has a direct bearing on survey data because the accuracy of such data relies crucially on a respondent's memory, that is, on the retrieval and communication of recalled responses to the questions posed. On the other hand, survey researchers must be scientifically self-critical, overcoming any complacency, such as assuming that if large numbers of people give apparently definite answers to a straightforward recall question, the combined result must be treated as valid and accurate (Moss and Goldstein, 1975).
The purpose of this section of the dissertation is to review the pertinent cognitive and social sciences literature with respect to the problems associated with the collection and interpretation of retrospective recall survey data. It will attempt to summarize and integrate much of what is known about each of the error sources and response bias variables through the writings in three research domains: cognitive and experimental psychology, social psychology, and survey research methodology. It will first discuss relevant observations about human memory, using a schema of encoding, retrieval and judgment, as elaborated by such writers as Alba and Hasher (1983). Next, it will review the literature of social psychology which deals with three distinctive sources of survey error (the respondent, the interviewer and the "task") and the known response bias variables pertinent to these sources which affect the accuracy of reporting after information has been retrieved from memory, and which may jeopardize the validity of study conclusions. In this latter regard the researcher is interested in the impact of threatening or embarrassing questions on the truthfulness of the respondents' answers, the characteristics and behavior of the interviewer and respondent, the questionnaire design, the interview situation, as well as the motivation of respondents to participate and provide precise answers. Thirdly, the literature in survey research methodology delineates the many sources of response bias; as well, it identifies those respondent groups and tasks most susceptible to such errors. The framework for discussion of response effects or errors is as follows: 1) the respondent as a source of measurement error; 2) the interviewer as a source of measurement error; and, 3) the task variables associated with measurement error ("task variables" refer to the conditions under which the required information is given by the respondent to the interviewer).
This framework is derived from the work of Groves (1989) and from Sudman and Bradburn (1974). Of the three sources of response effects, Sudman and Bradburn (1974) believe that the "task variables" are the most important source of response effects in survey data. In epidemiology, however, respondent variables are considered to have a considerable effect upon recall accuracy. As Mackenzie (1986) notes, "given the number and variety of influences on reporting (and their potential to interact with each other), it is not difficult to construct scenarios in which cases and referents might be differentially influenced, and might produce reports of differing validity. However, there is little experimental evidence to support or refute any of the possibilities" (p.14).

2.3.1 Encoding, Retrieval and Judgment (Inferential) Errors: An Overview

Retrospective, non-experimental, survey research often questions respondents for both qualitative and quantitative facts about prior events, behaviors and attitudes. Respondents' attributes and actions may affect the quality of the data collected. Examples of such questions include: How old were you when you first had intercourse? What was your age at menarche? When was the last time you visited your doctor for a PAP smear? Would you describe your childhood as happy? What caused your maternal grandmother's death? These questions do not just require simple recall of unambiguous facts from one's memory. They actually require sophisticated mental (information) processing. Due to limits on their ability to recall and enumerate specific autobiographical information, individuals often find these questions difficult to answer. Respondents forget details associated with specific events, and often combine similar incidents into a single generalized memory (Linton, 1982). In these cases, respondents rely on inferences based on partial recall of information from memory to construct their answers (Bradburn et al., 1987).
Cognitive psychology can provide insight into these sources of error. Specifically, how do subjects encode information into their memories; how do they later retrieve it; and, how do they combine the recalled information into a single integrated response by inferential processing? As Groves (1989) has observed, practitioners of the new cognitive science perspectives on survey response, such as Hastie and Carlston (1980), have identified five sources of survey response measurement error:

1) the respondent does not possess the knowledge required to respond to the survey questions (i.e., the information was never encoded, or the respondent has simply forgotten the details required);

2) the respondent does not engage in the appropriate cognitive activities (i.e., there is failure of the retrieval processes) at the time of response. Memory studies (Alba and Hasher, 1983) indicate that exact copies of personal events are never stored in memory. Furthermore, retrieval results in the recall of partial information, which can then be either reconstructed accurately or distorted and left incomplete;

3) the respondent does not understand the intended meaning of the survey question. Groves (1989) notes that the meanings of questions are not "fixed properties constant over all persons in a population" (p.419). The meaning assigned by the respondent to the question depends on the behaviour and characteristics of both the interviewer and respondent, consistent with the perspectives of symbolic interactionism (Stryker and Statham, 1985), on the context and form of the questions, and on the characteristics of the interview environment;

4) the respondent does not attend to the request for information and lacks the motivation to engage in the deep cognitive processing required to produce complete and accurate responses; and

5) the respondent does not communicate the appropriate response, once the information is retrieved.
Many psychological processes affect what is actually articulated in response to the survey questions. These include the perceived sensitivity or threat of the questions being asked, the social desirability of the respondent's answers, the perceived expectations of the interviewer, and the question itself in the context of the respondent's general knowledge and understanding of the purpose of the study.

The first source of error cannot be addressed by design strategies. Nothing can be done to overcome the loss of information which has never been retained, or subsequently has been forgotten. The researcher can only deal with the situation whereby the respondent has encoded and retained the required information, but is having some difficulty retrieving it. If the survey researcher understands cognitive theory about how knowledge and events are stored in memory (by information processing), she can construct survey procedures to access this information. Here, the emphasis is on the retrieval process and finding the cues (relevant schema) necessary to evoke the memory.

2.3.2 Cognitive Perspectives on Encoding and Retrieval of Autobiographical Memory

Freeman et al. (1987) examined the matter of respondent accuracy of recall within the framework of the principles of memory organization in cognitive theory. They concluded that neither forgetting nor false recall is random; they found systematic bias, which seems to lie somewhere in the cognitive processes of the respondent, somewhere within memory, between perception and recall. [The following discussion relies heavily upon the analysis in Freeman et al. (1987).] Contemporary cognitive theory provides no general overall explanation of the storage and retrieval of experience and information. Psychologists have come to accept that humans may be able to change memory storage and retrieval strategies to suit differing demands. Nevertheless, a set of five fairly general principles has emerged in the research literature on memory:

1. Human memory is organized; humans create mental structures that impose patterns on information;

2. The organization in a mental structure is revealed in free recall. A categorical form of organization is usually imposed somewhere between stimulus and recall. Once a person has established a structure to organize a class of experiences, any new experience is then perceived and processed in terms of expectations imposed by that structure;

3. The organization of memory is based on experience. Mandler (1979) has said that "the mind creates order and structure out of a welter of stimulation, seeks for and finds regularities, and comes to expect them in the future" (p.260). But individuals vary in experiences. They differ in their exposure to, and knowledge about, the regularities exhibited in and among the elements found in a class of events;

4. The ability of a person to recall an element that occurred within an event depends on two factors: the amount of elaboration of his mental structure; and, the degree to which the element is "typical" in, and of, the event being examined;

5. The tendency of a person to falsely recall an element that did not actually occur depends on two factors: the amount of elaboration of the person's mental structure, and the degree to which the element is typical in events of the kind being studied.

Although elaborate mental structures aid in recall, they may also have a cost in accuracy; what seems to happen is that with increased experience and increased mental structure, there occurs also an increasing tendency for "default" processing of those typical elements. People with well-developed mental structures will process incoming information about the typical elements of an event only superficially; their attention focuses on the untypical elements; they will see what they expect to see of insignificant elements, based on prior experience.
In such cases, a request for retrieval cannot be met with genuine recall of the elements; rather, there will be a constructive process that taps into the general structure, as well as into specific memory. If the structure and event do not match exactly, false recall occurs. However, to the degree that the "normal" elements in an event are statistically typical, use of the model embedded in the cognitive structure as a substitute for actual perception will introduce very few errors. Freeman et al. (1987) describe this as the "organizational" view of human memory, which suggests a way to reconstruct event details from the data provided by informants. High knowledge respondents (those with well developed mental structures) forget little but they do create errors in recall by reporting typical elements which did not actually occur in the particular case. However, as their errors tend toward the long term pattern, their collective judgment about a particular event should provide the best possible index of that pattern. The cognitive literature on memory emphasizes the fact that although one's memory of complex events and autobiographical information is sometimes very accurate, it is also frequently incomplete and highly distorted (Groves, 1989; Alba and Hasher, 1983).

Schema Theory

The following discussion is found in Alba and Hasher (1983), who analyzed the strengths and weaknesses of the influential schema theory. There is no single, well-accepted cognitive theory which provides a totally satisfactory explanation of such response measurement errors. However, "schema theories" (proposed originally by Bartlett, 1932) appear to give excellent insight into the nature of such errors, as well as to account for accuracy in recall. Schema-guided encoding theory describes four central processes for the encoding of complex events/experiences (selection, abstraction, interpretation and integration) and one central retrieval process of reconstruction.
These, taken together, can explain the inaccuracy, incompleteness or distortion of survey data. The theory posits that incompleteness in recall is attributed to the failure of retrieval processes, whereas distortions are due to associative encoding processes (Alba and Hasher, 1983). By definition, 'schemata' are "sets of interrelated memories organized so that relationships are represented among attributes of events or pieces of information". A schema refers to the general knowledge that an individual possesses about a particular domain; and it is the vehicle which permits the encoding, storage and retrieval of information related to the specific domain (Groves, 1989, p.410). It is proposed that what is encoded or stored in memory is determined by a guiding schema or 'knowledge framework' that selects the central elements and modifies the experience in order to arrive at "a coherent, unified, expectation-conforming and knowledge-consistent representation" of an experience (Alba and Hasher, 1983, p.203). The four central encoding processes (selection, abstraction, interpretation and integration) are responsible for schema-guided encoding of complex information, behaviour and attitudes. Explication of each of the four stages of the multistage encoding process follows to show how 'schema theory' accounts for the potential accuracy of memory, incompleteness in recall, and distortions in the data recalled and reported.

(1) Selection

Only some of the incoming information of an event or experience will be encoded and stored as part of the memory representation of that event/experience. Three factors will determine what information is selected: (1) the existence of a relevant schema or 'knowledge frame'; (2) activation of that schema at the time of encoding; and, (3) the importance of the incoming information with respect to the activated schema. The first condition requires the presence of existing relevant information.
Prior knowledge, whether semantic or structural, increases the probability that new information will be encoded. In other words, specific domain-related prior knowledge is required for the acquisition of new domain-related information. The encoding of new information can be seen as a 'mapping process' of new information onto old; it depends on a sufficiently well-developed knowledge base or schema. The amount of new information which can be assimilated depends not only upon the amount of prior relevant domain-related knowledge, but also on the degree to which the incoming information matches the existing knowledge structure. In the absence of domain-related prior knowledge, there is no schema into which this new information can be readily integrated or subsumed. Thereupon the information is quickly lost or distorted. In this condition, memory is poor. For example, an end-stage kidney patient who has received dialysis three times per week over the past 5 years, and been subjected to frequent and diverse diagnostic workups, will be better able to assimilate information about a new diagnostic procedure than a patient hospitalized for the first time for a diagnostic workup which includes this new procedure. The experienced kidney patient has a well-developed, extensive schema for such diagnostic procedures. He can more easily integrate information about a new procedure. It is likely that his recall of information about diagnostic tests, including the new one, will be more accurate than the information provided by the inexperienced subject.

The second factor emphasizes that the possession of prior domain-specific knowledge is not sufficient in itself to guarantee the encoding and storage of the new domain-related information. There is a further requirement that the relevant schema or knowledge frame be concurrently activated at the time of encoding. Experimental evidence shows the importance of schema activation during the encoding process.
Bransford and Johnson (1973) postulate that when knowledge structures are inactive during the encoding process, new knowledge cannot be integrated easily: the "absence of the appropriate semantic context can seriously affect the acquisition process" (p.397). Furthermore, Bransford and Nitsch (1978) speculate that less experienced subjects (e.g., the patient hospitalized for the first time for a diagnostic workup) will have greater difficulty than more experienced subjects (e.g., the chronic renal patient who has been subjected to frequent diagnostics) in determining the situational cues that can lead to the activation of an appropriate schema. Anderson and Pichert (1978) provided experimental evidence that respondents preferentially recalled information and events which were congruent with their perspective at the time they were encoded; that is, ideas important to an activated schema are more likely to have a selection advantage for storage. The research was designed to study the independent effects of the nature of cues/schema on recall by using two distinct cueing strategies for the same text of a story. The results corroborated the hypothesis that recall of information was consistent with the perspective taken at the time the information was encoded. Analysis of the interview protocols suggested that the shift in perspective led respondents to invoke a different schema which provided implicit cues for different categories of story information. In other words, the schema which was activated would determine the kinds of material recalled. Anderson and Pichert (1978) concluded that these data clearly show that retrieval processes are independent of encoding processes, and that apparently forgotten material can be remembered through a shift in perspective. Also, different schemata operate at retrieval to influence what is recalled (Groves, 1989).
Information irrelevant to the activated schema may never be permanently encoded; or it may be encoded but processed less elaborately than more relevant information (Anderson and Pichert, 1978; Alba and Hasher, 1983). Even when information agrees with two different schemata, only one schema is activated during encoding; and recall is more accurate and complete for the information that is consistent with the activated schema (Alba and Hasher, 1983). Tulving and Thomson (1973) stress the importance of the "encoding specificity principle" (p.353). The specific information stored is a function of what is perceived and how it is encoded. Further, what is stored determines the specific retrieval cues that can be effective in accessing the stored information. In other words, the memory representation and the properties of effective retrieval cues are determined by the specific encoding processes used on the incoming stimuli. The implication of this research is that implementation of schema-driven questioning strategies (i.e., diverse recall cueing strategies) by survey researchers is important for eliciting responses relevant to the purpose of their studies. Tulving and Psotka (1971) showed that multiple cueing at the time of recall improved the quality of the material recalled. Therefore, when a survey is asking respondents to remember detailed or complex information, it would seem advantageous for the interviewer to ask the respondent to recall the required data from different perspectives in an attempt to trigger different schemata in order to build up a complete and accurate recall of the information being requested. The experimental work of Richardson and Gropper (1964) also suggests that if respondents must recall complex details/events, they should be asked to recall the same information on successive attempts in order to improve the quality of recall.
It has been shown experimentally in studies on 'bounded recall' that response is improved when subjects are interviewed on more than one occasion and are reminded of what was recalled in previous sessions (Neter and Waksberg, 1964b). These strategies may not always be feasible given an analysis of the cost-benefit ratio of additional interview time against the amount and precision of the additional information obtained. Schema theory suggests that research should be directed at learning which schemata are used to organize the information sought.

The third factor which influences encoding in human information processing is the relative importance of the incoming stimuli in relation to the activated schema. Only important elements of the incoming stimulus are focused on for encoding. Because more attention is devoted to these elements than to less important ones, it is these same elements which are likely to be learned and represented accurately in the individual's memory. Traditional schema theory predicts that only the relevant information will be encoded; the remainder will be either rejected and lost, or distorted to fit the existing schema (Owens et al., 1979). This would account, in some measure, for the incomplete and inaccurate reporting of events. Two selection principles have been postulated to account for the information that is selected and encoded. First, it is proposed that the ideas most important to the theme of the information, and which cannot be derived from previously encoded information, will be given special attention during encoding and subsequently will be recalled best (Owens et al., 1979; Spiro, 1980b). The second selection principle is derived from script theory and the work of Schank and Abelson (1977) and is discussed in Alba and Hasher (1983). It focuses on how subjects process information related to frequently occurring events (e.g., visiting the doctor, eating out).
A "script" is a temporally ordered set of detailed memories containing the normal sequence of actions performed during the event under the usual circumstances (Groves, 1989, p.411). This means that not every detail of the experienced event is stored. Only the distinguishing features or 'atypical information' will be selected and encoded in memory. The theory predicts that the memory traces representing the highly typical events of the particular episode will be forgotten, or simply will not be encoded. These details can be derived by recalling that a scripted event occurred, and then by recalling highly probable elements from the "prototypical script" (Schank and Abelson, 1977). As a result of the selection process, the memory trace of any event is likely to be incomplete. Thus, it is impossible for a respondent to reproduce a carbon copy of an event, even when fully motivated to do so (Alba and Hasher, 1983). Instead, she will try to reconstruct the event from the information which has been encoded and recalled. Bartlett (1932) stated that event reconstruction will consist of the recall of stored information plus "probable detail" from general schematic knowledge. Research suggests that reconstruction using script information (i.e., probable detail) can lead to poorer, imprecise reporting. For instance, Bower et al. (1979) found in a series of experiments with students concerning stories about routine activities (e.g., visiting a doctor or eating in a restaurant) that the respondents tended to recall attributes of the stories that were never communicated. They recalled and reported 'typical' actions that never happened. Instead, the 'scripts' or the respondent's prior personal knowledge regarding the activity (i.e., going to the doctor) provided additional details for the memory trace of the particular story. In other words, subjects confused what was said with what the script strongly implied when remembering script-based texts.
Distortions of the original event will occur when the probable detail generated during reconstruction is not actually part of the original event. Alba and Hasher (1983) then note that Spiro (1977, 1980b) has shown that the reconstruction process is most likely to result in distorted/imprecise recall when the respondent encounters additional schema-relevant knowledge contradicting the encoded schema. Subsequent recall will depend on these two sources of information, one correct and the other wrong. In this situation, the recalled information includes additional, incorrect information which is a byproduct of the reconstruction process that is attempting to resolve the inconsistencies between the two sources of information. Recall will contain additional information that was not a part of the original event. In summary, selection process theory suggests that a significant proportion of the original event is neither encoded nor represented in an individual's memory. Therefore, the selection process can account for incomplete respondent recall during survey research, whereas inaccuracy and distortion in recall can be attributed to failure of the reconstruction process which is believed to operate in retrieval (Alba and Hasher, 1983).

(2) Abstraction

During the abstraction stage of encoding, the information selected by the activated schema is further reduced. Only the semantic content or meaning will be abstracted. As a result, the surface structure will be lost. Thus, only an abstracted memory trace of the original stimulus is stored. Because significant detail is lost during abstraction, this process can account for a respondent's incomplete recall and distortion of complex events/experiences. The further concern, however, is to explain accurate recall. Alba and Hasher (1983) note that psycholinguistic research findings help schema theorists do this with the hypothesis that speakers of a language share preferred means of expressing information.
If both speaker and listener have the same preferences or biases, the listener's reproduction may later seem to be accurate but is really only the imposition of the shared language structure. On the other hand, "distortions" would result from the abstracting process when the sender and receiver do not speak the same language, do not share the same biases. This distortion is a common occurrence in medicine. For a newly hospitalized patient with little previous health care experience, what he abstracts about the diagnostic workup may vary greatly from what the doctor intended him to understand, even when the doctor has tried to forestall this problem through the use of careful, apparently non-medical terms and painstaking explanations. However, distortion is better explained by two other schema-theory processes, interpretation and integration. Abstraction can be tangentially linked to recall distortion because it is a precondition for these last two stages in the encoding process (Alba and Hasher, 1983, p.209).

(3) Interpretation

The discussion to this point has attributed the distortions in the recall of events/experiences to the encoding processes (selection and abstraction) which reduce the information encoded and stored. This loss of information is partially compensated at recall by reconstruction and the addition of 'probable detail'. Distortions also occur because the semantic information encoded is in fact only an interpretation of the explicitly presented stimuli, albeit one that is consistent with the activated domain-specific schema. Distortions due to faulty interpretation are referred to as constructive errors because additional information is added to the explicit information during or shortly after encoding. As a direct consequence of interpretation, there will be an elaboration of the memory trace of a complex experience/event. Harris and Monaco (1978) note that respondents' interpretations are typically inferences of two general types.
The first, 'pragmatic implication', involves transforming explicit information into its probable underlying intent. The second involves inferences made during comprehension when there is a need to (1) concretize vague information; (2) provide missing detail; and, (3) simplify complex and detailed information. The possibility of distorting an original experience/event occurs during this stage of encoding because the respondent is able to add or change the information that is conveyed by the stimulus (Alba and Hasher, 1983). For example, when a patient is told by his doctor that he has a tumor which requires biopsy, he may later recall this discussion as his doctor actually telling him that he has cancer.

(4) Integration

The information which remains after selection, abstraction and interpretation will then be integrated with the previously acquired, related information activated during the current encoding episode. A single integrated memory representation is created. Individual detail exists only as a part of a complex semantic whole. Integration processes occur either when a new schema is formed, or when an existing schema is modified. Gentner and Loftus (1979) give experimental evidence that once integration occurs and prior knowledge has been modified or updated, accurate recall of the original stimulus/information becomes highly unlikely. New information will be integrated into the old knowledge frames; thereafter, distinct traces of a 'to-be-remembered' event do not exist; only an integrated memory trace remains. In the experiments of Gentner and Loftus (1979), subjects were shown either a film or slides of various traffic situations. Afterwards, a question was posed about the traffic scene that either implied the presence of additional information which was never actually present, or contradicted information that had been presented.
In subsequent memory tests, the subjects often misrecognized new slides containing the additional or contradictory information. The new and contradictory data replaced the individual's knowledge of the original traffic scene, resulting in a single, integrated memory of the scene. This distortion has received much research attention in relation to eyewitness testimony in court cases. In summary, schema theory plays a very useful role in understanding human information processing. It provides a framework for understanding how information is encoded and stored in memory. It also permits an understanding of why recalled material can be accurate, or instead can be incomplete or distorted. According to this theory only some highly selected subset of all the possible stimulus information is encoded; the selection is guided by the knowledge schema activated at the time of stimulus presentation. Memory traces are abstract representations of the original stimuli, and as such, make recall of the exact events/experiences impossible. Memory is an organized, hierarchical structure based on economy of storage, in which the semantic meaning of stimuli appears to have the highest priority for storage. Furthermore, memory is interpretive in that schema serve to add missing details or distort others so as to be schema-consistent. Memory is integrative in that the abstracted information is combined with prior knowledge, and any subsequent information, to create a single, unified memory trace of a detailed and complex event (Groves, 1989; Alba and Hasher, 1983, p.212). Reconstruction of memory traces is the method by which subjects retrieve and recall event characteristics. The subjects add detail to the memory trace; they will reconstruct the event using the probable details of the larger generic group of events of which the target event is a subset.
The reconstruction process is a source of respondent error in the recall of event characteristics because there is a tendency to report event characteristics that are prototypical of those of the general class (Alba and Hasher, 1983; Yekovich and Thorndyke, 1981). Recall accuracy depends upon many factors, including the elements of the original event selected for encoding and storage, the nature of the schematic connections established during encoding, respondent-interviewer biases in expressing information, differences in the number and timings of rehearsals (i.e., the number of times that the same information is requested and recalled by the respondent), the nature and number of cues/schema used to retrieve the requested information, the order in which elements are recalled, and chance matches between the reconstruction process and the original event/experience (Groves, 1989; Anderson and Pichert, 1978; Alba and Hasher, 1983). For example, accurate recall is more likely for high probability events (i.e., high importance events/information) because these events develop more cognitive connections with other similar events, are referred to by others, and tend to be called into working memory more frequently. As a result of the rehearsal processes which occur in the working memory and the connections with other schema, important and highly probable events can more easily be retrieved by many diverse cues. The quality of recall can be good. In other words, these memory traces are more accessible in memory and are retrieved first. These events also serve to provide the probable detail in the reconstruction of the original event (Kintsch and Van Dijk, 1978; Black and Bower, 1979). 
Distortion can be due to several factors including: the process by which the original event is reconstructed, the addition of information to the memory representation of an event (constructive error), interpretations not actually intended, and the integration of memory traces over time (Alba and Hasher, 1983). Incompleteness in recall is attributed to two encoding processes, selection and abstraction: not all the elements of the original stimulus are selected for representation or become part of the abstracted meaning. Thus, incomplete details of the event are stored in memory. Because encoding involves the reduction of information for memory storage (economy of storage concept), the amount of information that can be recalled about the original event is reduced and incomplete.

2.3.3 Impact of the Properties of Memory on the Retrieval of Autobiographical Facts: A Review of the Literature

Based on the preceding discussion, it will be recognized that responses to autobiographical questions could be accurate, but they could also be incomplete and distorted. Linton (1982) demonstrated that subjects may not be able to recall an event, even when they have numerous cues and the event is distinguishable from others. In this study of recall of personal events, 20% of the critical details (selected at time of occurrence to be important and to be "certainly" remembered if the events were recognized) were irretrievable after 12 months; 60% were irretrievable after 5 years. In 1977 Cannell summarized the results of studies on interviewing methods which were designed to identify patterns of response bias, and to provide a basis for creating procedures to improve reporting. He recognized the inadequacies of the case-control method, and wrote that there was a need for improved data collection, as well as for research into the possibility of widespread biases plaguing case-control design.
One interpretation of reporting errors (over- and underreporting) is that such errors result from poor memory. Cannell (1977) reviewed three theories. The "disuse" theory of Thorndike suggests that events from the more distant past are more likely to be forgotten than are more recent events, and that reporting errors arise from poor memory. Gestalt theory suggests that, generally, a respondent will forget events of low "salience" (or personal impact), especially with the passage of time; high salience events are better remembered. The "interferences" theory deals with the phenomenon of forgetting; forgetting does not occur absolutely. Information does not disappear completely from memory; rather, it may be more difficult to retrieve due to competing associations or interferences. Only the accessibility of information decreases, resulting in a lessening probability of recall from the storehouse of memory (Cannell, 1977). This third theory suggests that underreporting is really a problem of retrieval, which can be alleviated by manipulating conditions which facilitate the recall of information. There are two critical steps for a respondent asked to report information from memory. First, she has to search for and retrieve it; secondly, she must transmit it to the researcher through a questionnaire or interview. Investigators have long been aware of the limited time span over which a person gives accurate reports. Some studies have shown a forgetting curve over time. However, the decrease in reporting may be due not so much to forgetting as to a tendency to misplace the event in time, and then recall it as being outside the reference period. Such misplacement may be only a minor factor, though. From the earliest memory studies, it has been recognized that the greater the impact of the event upon the person, the more readily the respondent recalls it. "Impact" generally refers to the personal importance of the event.
Psychologically, this suggests that certain events occupy a greater part of one's psychic life. Cannell (1977) also recognized that another factor affecting accuracy of reporting is the level of threat of embarrassment which the requested information holds for the respondent. Social psychological research has revealed the effectiveness of group norms in bringing about and maintaining approved behaviour. As well, one's self-image tends to censor communications. Research has shown a predictable and significant relationship between some characteristics of the information sought and the respondent's reporting behaviour. Cannell (1977) concluded though that the characteristics of the respondent are not as consistent nor strong in their influence on underreporting as are the characteristics of the event. His general view is that research on improving reporting can best be devoted to the nature of the events and the factors underlying them, the most significant of which appear to be: elapsed time, impact, and the threat of embarrassment through revelations. Cannell (1977) also concluded that research shows that the actual behaviour during an interview was the main variable that correlated with the index of reporting quality. This possibility is discussed in the later portions of this chapter dealing with "interviewer as a source of measurement error" and "task variables". Coughlin, as well, has more recently reviewed survey research which has suggested that recall ability is related to the salience of an event; frequency, vividness, duration and meaningfulness of an event contribute to ease of recall (Coughlin, 1990). No consistent relationship has been found between accuracy of recall and demographic factors; this perhaps reflects the differences in study populations, the questions being asked, or the nature of the exposure under study.
His overall thesis was that the factors which contribute to bias due to differential recall between cases and controls in retrospective studies have not yet been examined very thoroughly. The National Centre for Health Statistics (1965) publication, 'Reporting of Hospitalization in the Health Interview Survey', focused on the underreporting of hospitalizations, and discussed hypotheses about the working of memory and about motivation as major variables possibly responsible for recall error. The survey accepted the validity of the two principles discussed above: memory is better for recent events than for those farther back in time; and, events having a greater impact on the person will be remembered better than those of minor impact. In general, there may be a "decaying" of experiences over time. However, to consider memories as fixed and "lifeless" is unrealistic. Memory is active and dynamic, following patterns which can be predicted. Motivation is one of the most important forces in memory. Persons integrate events into their psychological life so that they fit most comfortably with past experience and with self-image. Numerous experiments testify to the selectivity and distortion which occur in the recollection of an event. Decay of memory is modified by many factors, including the meaningfulness of the initial experience, the degree to which it was "learned", and the interference of other experiences. Motivation will affect how much effort a respondent will make to give an accurate report. To do this, a person must relive or review carefully his experience, constantly checking memory or using other aids such as reference to records. A more serious problem arises if the goals of the respondent are better served by inaccurate reporting, such as for embarrassing or socially undesirable behaviours. Respondents may also have low motivation due to negative reactions to the study or its objectives.
To participate in an interview requires the respondent to accept the goals of the survey, and to react positively to the interviewer. A negative reaction to either factor may be expected to result in inaccurate data. The survey concluded that there is a strong relation between memory and motivation; they react dynamically. Findings included: the threat or embarrassment of a diagnosis starts a motivational pattern leading to suppression, and thus to underreporting of threatening episodes; the longer the hospital stay, and the more serious it is, the harder it is to forget; the elapsed time between the episode and the interview provides the opportunity for threat and decay factors to become effective. As time passes, perceptions are reshaped to fit one's total pattern of experiences. People remember selectively. It is found that the greatest underreporting is among episodes that provide the motive and opportunity for "forgetting". The survey found that re-interviewing elicited a sizable number of additional episodes remembered. The second visit provided additional stimulus to recall, and may also have increased the motivation to do so. Wagenaar (1986) demonstrated that everyday personal events are forgotten very slowly, and that no event entirely disappears from memory. In addition, his work confirmed that the probability of recall depended on the number of retrieval cues used, as well as on the nature and the particular combination of these cues. A prompt about what occurred on a particular occasion (who was involved or where the event took place and/or the social occasion) improved recall. Asking for the date of an event is a poor cue to use if accuracy is the focus of the cueing strategy (Barsalou, 1987). Retention was significantly related to the perceived salience of the events, to their pleasantness, and to the degree of emotional involvement. The suppression of unpleasant memories was only significant for the shorter retention periods (p.249).
The work of Wagenaar (1986) and Bahrick (1983) demonstrated that the "forgetting function", the percentage of correct information recalled as a function of the retention period (time measured in years), depended on the nature of the material being queried. Surveys requiring detailed and complex information (such as number and duration of hospitalizations, personal expenses, drug consumption and cost, etc.) often ask the subjects by advance letter to collect and review personal records, and then have them available during the interview. Other studies use recall aids (e.g., lists of products) at the time of the interview, thus enabling respondents to use recognition rather than recall as a strategy for reporting events and/or behavior. Aided recall appears to become more important as the length of the recall period increases. These methods have opposite effects on memory errors. The use of records generally controls for overreporting due to telescoping errors, but has an insignificant effect on errors of omission because records were never meant to be 100% complete. Aided recall, on the other hand, tended to reduce the number of errors due to omission, but did not reduce (perhaps may even have increased) telescoping effects (Sudman and Bradburn, 1974, p.68). The use of records does not guarantee that respondents' reports will be accurate. Williams and Hollan (1981) demonstrated that successive attempts at recalling a specific event/experience can result in additional recall of event characteristics. Furthermore, experiments on autobiographical memory have shown that respondents achieve better levels of recall if they are required to begin with the most recent event and then work backwards to the earliest occurring event.
However, individual differences have been noted: some subjects prefer to recall in a forward direction; and often the direction of recall depends on the nature of the material being recalled (Whitten and Leonard, 1981; Williams and Hollan, 1981). Groves and Kahn (1979) hypothesized that even though an event is never entirely forgotten, the effort to retrieve this information may be so onerous as to exceed the capacity of even the most attentive and motivated respondent. Reiser et al. (1985) demonstrated that it takes on the order of several seconds to consider and retrieve specific information about even commonly occurring events (i.e., going to the barber, going for a walk or a visit to the dentist). This means that if a survey researcher asks too many questions within the limited time period during which respondents are motivated to participate and answer survey questions, the accuracy of the data will certainly diminish. Tourangeau et al. (1986) and Cannell et al. (1977) suggest design strategies which allocate more time per response and the use of longer questions. Accuracy will be increased because subjects have more time to use different retrieval strategies, more time to recall events, more time to consider their response, as well as more cues to stimulate recall. Recall of autobiographical events is harder if memory contains many similar events. Initially distinguishable events can become confused; that is, the characteristics of the event become distorted, or the event itself becomes irretrievable due to interference from later events (Linton, 1982; Wagenaar, 1986). Brown et al. (1987) note that events and personal experiences are organized temporally in discrete groupings of autobiographical sequences. 
For instance, a hospitalized patient may remember a visit to a doctor as part of an "extended causal sequence" beginning with the identification of a breast lump, making the initial doctor's appointment, then being referred for diagnostics, and ending up in hospital recuperating from a mastectomy. Evidence for 'autobiographical sequences' has been generated from studies on the effects of calendars on recall of personal events associated with work/school, as well as from 'free-recall' studies of personal events. The latter research demonstrates how subjects order events/experiences while reporting their recollections. Taken together, these studies demonstrate the temporal organization of personal information. In addition, autobiographical sequences provide reference or anchoring points in time, that could be used as a design strategy for locating other events in time. Autobiographical sequences provide a means by which subjects can organize their memories of personal events/experiences temporally. The survey researcher can use these sequences to counteract deficiencies in respondents' temporal inferences, that is, errors associated with estimations of the frequency or recency of an event.

2.3.4 Judgment of the Appropriate Answer: Inferential Processing Strategies and Associated Errors

Questions about the frequency or probability of occurrence of autobiographical events are difficult for a respondent to answer even when fully motivated to do so. Cognitive research has shown that inference plays a vital role in what a respondent reports, and in the accuracy of response. In general, the respondent remembers a few facts relevant to a particular survey question, and then uses inductive inference to produce a reasonable answer. Inference, which adds detail to what the respondent can recall, can be inexact and misleading (Kahneman and Tversky, 1972).
An understanding of the inferential processes used in answering survey questions and their cognitive limitations can aid survey researchers to implement design strategies which will improve the accuracy of recall. As Groves states (1989, p.434), although there is no generally accepted theory on how humans make judgments, much cognitive research has been completed on how subjects form judgments about alternatives, and draw inferences from personal experiences (Kahneman and Tversky, 1972; Tversky and Kahneman, 1974; Nisbett and Ross, 1980). In surveys, subjects tend to put forth the minimal effort to meet the data requirements of the particular study. Krosnick and Alwin (1987) coined the term "cognitive misers" to refer to survey respondents exhibiting this characteristic behavior in recall tasks. In general, respondents tend to avoid burdensome, intensive cognitive processing when forced to choose among alternatives. Rather, they use more readily accessible information about the alternatives to determine if sufficient discrimination can be made among them. Kahneman and Tversky (1972) note that in these situations, subjects are willing to accept the risk that they will be incorrect in return for the decreased effort and time needed to make their decisions. Various "heuristics" (rules) have been proposed to explain the shortcuts taken by subjects to reduce cognitive processing (i.e., to reduce the required judgments to simpler ones) in decision-making, and in the evaluation of the frequency and probability of events. Although these heuristics are efficient, and in most cases yield judgments consistent with more intensive thought, they can also be sources of measurement error in survey research (Tversky and Kahneman, 1974). The "availability heuristic" is often the first cited source of systematic bias in survey research. This inferential strategy refers to the tendency to choose as most important/recent/relevant that alternative which is most accessible in memory.
It also influences the validity and consistency of judgments about frequency and probability. This judgment strategy relies on the quantity of information recalled, and the ease with which relevant instances come to mind. The respondent is attracted to the accessible memory as an effort-saving strategy. Availability is considered a valid criterion for judging frequency because, in general, frequent events are easier to recall or imagine than infrequent ones (Tversky and Kahneman, 1973). The most accessible memory is often the most 'vivid' (i.e., rich in detail and cognitive connections with other schema/events/ideas because it has been 'rehearsed' most often), the most recently accessed, and most connected to strong emotions. Often the respondent takes this easy accessibility as a good indicator of relative importance and recency when a date must be put on an event. Furthermore, a survey question, and the cognitive problem it poses, may resemble another one just performed (and thus "available"). Consequently the respondent may form a judgment using the procedural format just followed in the preceding task (Kahneman and Tversky, 1972; Tversky and Kahneman, 1974; Brown et al., 1985). In many situations the availability heuristic works well. An individual may come to trust it in guiding his judgments. However, reliance on it can result in poor judgments and inaccurate responses when the most accessible, relevant memory is atypical of the respondent's experiences, and his answer is overly influenced by it. For instance, consider the survey question which asks respondents to determine the frequency of acute health problems in the past 12 months. If a respondent has recently experienced a bout of chest pain requiring emergency outpatient treatment, this easily recalled incident stimulates him to search his memory for other similar incidents in the requested reference period.
To the extent that these episodes are seen as related events (i.e., similar in duration, diagnosis and effect), recall is improved for a question of this kind, one designed to measure the total frequency of occurrence in a specific reference period. On the other hand, if the most easily accessible event is unrelated to an acute health problem, or if the episode remembered occurred early in the reference period, the answer to the frequency question may be incorrect. In the first instance, both over- and underreporting frequencies are possible; in the second case, there will be fewer reports of acute health episodes. Groves (1989) speculates that easily accessible memory is a poor indicator of an individual's experience throughout a reference period, and one which can lead to errors with respect to frequency, recency and importance. Brown et al. (1985) propose that an inferential process, the "accessibility principle", is used by respondents for the estimation of the probability or frequency of an event, as well as for inferring event dates and recency. According to the accessibility principle, the subjective dating of events/experiences, and the judgment of frequency and probability, depend in part on the amount that can be recalled. Events with more facts accessible will appear to be more recent, more probable, and more frequent in occurrence. These authors see a connection between their "accessibility hypothesis" and Tversky and Kahneman's "availability heuristic". In the former, the subjects tend to base estimates of frequency, probability and recency of an event on the amount known about an event; in the latter, the estimate is based on how easily the event can be recalled. The strength of the memory trace (accessibility) and the ease of information recall (availability) can lead to errors in recency, probability and frequency judgments.
For instance, when two events occur at the same time, the one that is more retrospectively memorable will be estimated to have occurred more recently. Further, when two events occur at the same time, but recall of one event generates more detail, it is that episode which will be mistakenly judged to have occurred more recently (Kahneman and Tversky, 1972; Brown et al., 1985). Frequency and probability judgments are also affected by another judgment strategy called the "representativeness heuristic". Kahneman and Tversky (1972) describe it as a tendency to overgeneralize from incomplete information or small samples (Groves, 1989). An example of this inferential mechanism comes from the controlled experiments of Dawes (1988). He noted that when subjects were presented with evidence that the conditional probability of teenagers using marihuana was high given that they used any drugs, the subjects inferred then that the conditional probability of teenagers using other drugs was also high given that they used marihuana. Another inferential strategy to determine frequency of occurrence is "decomposition" (Armstrong et al., 1975). Respondents tend to break a problem into its subcomponents. For instance, if a subject were asked to determine the frequency with which members of his family ate out in a restaurant over the past 12 months, typically he would determine a general rate of occurrence, and then multiply the rate generated by the time period requested (i.e., the multiplicative approach). Bradburn et al. (1987) describe another decomposition approach, additive decomposition. The respondent calculates values for mutually exclusive and exhaustive components of the desired quantity (in this case the number of breakfasts, lunches and dinners). Both of these techniques can be included in the survey design to increase response accuracy by means of guided decompositions controlled by the researcher.
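The two decomposition strategies just described can be illustrated with a minimal sketch. The function names and the restaurant-meal counts below are hypothetical, chosen only for illustration; they are not taken from Armstrong et al. (1975) or Bradburn et al. (1987).

```python
from typing import Mapping


def multiplicative_estimate(rate_per_month: float, months: int) -> float:
    """Multiplicative decomposition: a general rate of occurrence
    multiplied by the length of the reference period."""
    return rate_per_month * months


def additive_estimate(component_counts: Mapping[str, float]) -> float:
    """Additive decomposition: sum of counts for mutually exclusive
    and exhaustive components of the desired quantity."""
    return sum(component_counts.values())


# Hypothetical respondent: "we eat out about twice a month".
by_rate = multiplicative_estimate(rate_per_month=2, months=12)

# The same respondent guided through meal-by-meal components
# (illustrative counts for the past 12 months).
by_components = additive_estimate(
    {"breakfasts": 4, "lunches": 10, "dinners": 12}
)

print(by_rate, by_components)
```

The two estimates need not agree; a gap between them is one simple signal that a guided decomposition is prompting recall of episodes that the general-rate answer omitted, or vice versa.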
Response errors can also be associated with an individual's explanation of the nature of memory. For instance, if a respondent has difficulty remembering an event (i.e., the memory trace is inaccessible or weak), he will infer that the event occurred infrequently, long ago, or not at all (Tversky and Kahneman, 1973; Lichtenstein et al., 1978). This incorrect reasoning strategy is identified as a factor responsible for "telescoping", the incorrect estimation of event frequency within a given reference period. Telescoping occurs when a respondent incorrectly includes into the reference period events which actually happened earlier. For instance, a respondent might erroneously include an episode of the flu experienced 16 months ago in answering a question about the frequency of acute health problems in the past year (Bradburn and Sudman, 1979). Brown et al. (1985) suggest that one explanation for errors due to telescoping is that subjects recall episodes of the class of events requested (i.e., acute health problems), but cannot remember specific dates. If they recall an incident that actually occurred before the time period requested, but the memory is easily recalled and detailed (according to the accessibility principle and availability heuristic), subjects may incorrectly infer that the episode was recent enough to be included. Brown et al. (1985) note that autobiographical sequences (the temporal organization of personal events/experiences) can be used as a recall strategy to diminish the bias generated by these faulty inferences. They argue that error is minimized because the sequences anchor events onto a 'personal time frame'. As such, they can provide additional information about dates. Judgment goes beyond inference based only on ease or detail of recall. Neter and Waksberg (1964b) also show that "bounded recall" can reduce response errors due to telescoping. It is used frequently in health and consumer expenditure studies.
This strategy uses data from a previous interview as recall cues in the next time period. Here, respondents report on events over an extended time interval (usually a year), but are interviewed periodically (every 3 months) about expenditures during that particular time period. The interviewer gives the respondent the data from the prior period and asks about expenditures since the last interview. The previous interview acts as an anchor as well as a retrieval cue to reduce faulty recall of events from an earlier period. Finally, overreporting and underreporting of event frequency can be explained in reference to "anchor and adjustment heuristics" (Tversky and Kahneman, 1974). Anchoring and adjustment relates to questions requiring estimations. According to these heuristics, a question is answered by choosing a preliminary estimate or approximation (referred to as an 'anchor'); the respondent then adjusts it to specific differences implied by the question. For instance, if a woman is questioned about the frequency of pap smears in the last 5 years, she may take as her anchor the normative expectation of 'once per year', and then adjust that by an awareness of deviations (i.e., consistency or inconsistency with the stated norm; if she goes more or less often for the checkups). Groves (1989) noted that anchoring and adjustment heuristics are most often used when respondents must estimate event frequency (total occurrence) over a long reference period, and also when the enumeration task is judged to be error-prone. For shorter reference periods, decomposition-enumerative techniques (additive or multiplicative) are used. Inaccuracy of recall may result from incomplete memory, or from retrospective distortion of information after reflection on the issue, or from a combination thereof (Ross, 1980). "Faulty" recall is unintentional false reporting due to poor memory or changing perceptions of past events.
However, a subject might also be biased toward the researcher, desiring to help the project or even desiring to conform to societal expectations about proper behaviour. "Falsified accounts" involve intentional false reporting. Explanations for such behaviour include the fear of being honest, or a desire to project a false image for ego enhancement. Threats to accuracy of recall in survey reports arise from both cognitive and motivational factors. In order to report accurately, the respondent must understand and remember the information on the one hand, but must also be willing to report it, on the other (Rodgers and Herzog, 1987). The authors surmise that with the aging of society, there are greater concerns about the validity of surveys of elderly populations. Some experimental evidence suggests that increase in memory loss is a function of aging; this seems to happen for distant as well as recent events. Also, older respondents may resist reporting embarrassing information, but overreport desirable behaviours. Although the accuracy with which people can recall past events is a crucial scientific issue in case-control research, it has received relatively little attention. Investigators seldom report the results of either large or small studies in which interviews were repeated at some suitable time after the original encounter, in order to determine the variability of responses to the same set of questions (Feinstein, 1985a, pp.501 et seq). As such studies are done after the outcome events have occurred, much time may have elapsed and the subjects may have great trouble remembering exactly what happened. The research subject has two different tasks. First, she must try to recall, as accurately as possible, what actually happened; secondly, she must make the "anamnestic" effort with adequate vigour, regardless of personal status as a case or as a control.
Additionally, because controls have not experienced an outcome event that might stimulate recall of exposure, they may not clearly remember what happened. A woman who has been diagnosed with breast cancer is much more likely to ruminate about lifetime exposures and to read about breast cancer etiology than is a woman with a normal mammogram. Feinstein (1985a) suggests that an important first step in health surveys is a crude assessment of the subject's "sensorial" competence; this is the ability to understand questions, remember events, and respond accurately. The researcher can use various approaches to stimulate memory and improve recall: remind subjects of occasions on which exposures might have happened; provide lists of commercial names of possible agents to which subjects might have been exposed. However, such tactics may create bias rather than improve accuracy. Such multiple-choice questions may best be left until after the subject responds to more open-ended questions about exposure. Feinstein (1985a) suggests that a researcher can attack the problem of anamnestic bias by choosing a control group who are likely to have reviewed their history with a vigour similar to that of the cases (p.508).

Summary

The previous sections presented some findings of cognitive psychology and survey research regarding the memory and judgment factors which influence the reporting accuracy for autobiographical events/experiences. These factors can be used to understand and to explain why survey response may be accurate, incomplete or distorted. However, these theoretical observations and hypotheses about factors relevant to survey response effects cannot be applied directly to survey research in general, or to the potential findings of my proposed research, should recall bias indeed be found to exist. 
The methodological limitations, and the resulting non-generalizability of the completed cognitive research discussed above, demand prudence in assessing the applicability of such findings to other kinds of research. For instance, in evaluating different cognitive theories, researchers have often used a biased sample: subjects who were for the most part university students. Due to the homogeneity of this study sample, the resulting data could not be used to generalize the results with confidence to other populations. These researchers are often reduced to generalizing their results on the assertion of their theory alone. In the experiments on judgment heuristics, specifically the "availability heuristic", there appeared to be little or no concern with measuring the level of effort (i.e., subject motivation) which respondents were willing to invest in providing an answer. In several cases, respondents were allocated only brief time intervals to answer questions. Because of such shortcomings, the applicability of judgment errors to survey results might well be limited. The irrelevance of many tasks used in these laboratory experiments also prevents the direct application of the results of cognitive psychological research to survey research, and in particular, the question-answering tasks given to survey respondents. Many of the retrieval tasks involved the recall of lists of words, nonsense syllables, or visual images. These certainly cannot be equated to the retrieval of real-life, personal/autobiographical events or experiences. Also, the time frame for retrieval tasks does not reflect the reference periods normally encountered in survey research. In the laboratory setting, the researcher designs retrieval tasks after relatively short time periods, often only minutes or hours between exposure to the material and the measurement of recall. 
Results based on such data may not reflect the nature or magnitude of the problems faced by survey researchers, who require recall of events from very-long-term memory. Finally, such cognitive research does not consider the sociological factors which influence what is actually communicated once retrieved. Groves (1989) argues that cognitive research "often implicitly assumes homogeneity of cognitive and response behavior across persons given the same task". There is a need for cognitive psychologists to investigate the effects of various sociological variables, such as social desirability, task difficulty and the salience of the event, on the accurate recall and subsequent communication of autobiographical events, behaviors and attitudes (p.409). It would be fruitful for the cognitive psychologist to work on interdisciplinary research projects with the social psychologist, whose interests in the impact of social context (desirability), the characteristics of the interview situation, and the effect of the characteristics and behaviors of the interviewer on respondent behavior would add a new dimension to the study of report accuracy.

2.3.5 Respondent Rule Effects

"Respondent rule" refers to the eligibility criteria of surveys: specifically, who may answer the survey questions (Groves, 1989, p.414). Here, the debate is focused on the relative accuracy of self reports versus proxy reports. Respondent rule effects are an issue only in non-attitudinal research, which seeks information about individual behavior or observable characteristics. (In attitudinal research only the individual is considered an acceptable respondent.) Although it is generally believed by researchers that self-reports are more accurate than reports obtained by proxy, this is not necessarily so. 
Cognitive and social psychology offer very different theoretical insights into the process of memory storage and the communication of responses, insights that may help to explain respondent rule effects in surveys. The first perspective draws upon inferences from schema theory about the nature of the encoding process for information about self, as opposed to information about others. Social psychology, on the other hand, focuses on the differential influences of social desirability upon responses which one gives about oneself and those which one gives about others. In addition to the effects of social desirability, social psychologists note that different 'roles' provide to their incumbents different information about different events. From the perspective of the cognitive psychologist, the respondent rule controversy must be addressed by looking at the encoding and organization of memories. The purpose of any research is the generation of accurate data. Therefore, one necessary attribute of a good respondent is that she has encoded and retained in memory the information which is relevant to the survey questions. Schema theory research has shown that there are differences both in how an individual perceives his own behavior as opposed to the behavior of others, and in how such information is encoded. Jones and Nisbett (1972) argue that self-schemata differ from others-schemata because the images possible for the self are limited. Schema theorists argue that this perceptual difference affects the stimulus information encoded, and the general organization of memories about ourselves versus others. Groves (1989), in discussing Lord's experiments (Lord, 1980), suggests that memories about oneself are organized around "central emotional states or other internalized attributes", whereas memories about others are organized about "observable traits and actions" (p.415). 
Furthermore, these findings may be generalized to naturally occurring events which are often the focus of survey measurement. If so, the implications are that proxy reports would be more accurate when the information requested covers characteristics of a person that involve physical action (e.g., episodes of acute health care, visits to the dentist). Self-reports would be better for information concerning more internalized states (e.g., frequency of chronic health problems, looking for work). Self-respondents may be inaccurate when an event is inconsistent with the internalized states dominating the self-schemata (e.g., underreporting of alcohol consumption when the individual believes that he has no drinking problem). Mathiowetz and Groves (1985) reviewed the literature on respondent rule effects for health reporting and found that, contrary to the prevailing beliefs, self-respondents are not consistently found to provide more accurate health data than proxies. They do, however, delineate several reasons why self-reporters might be more accurate than proxy reporters: (1) proxies may not possess the knowledge about the event or characteristic in question; (2) because events occurring to others are usually not as salient as events which occur to oneself, some events may not be reported, or only the most memorable (i.e., a serious health condition) is reported. Saliency may also affect a respondent's ability to date events accurately when reporting for others (Groves, 1989). On the other hand, there are circumstances where proxy reports may be more accurate. Mathiowetz and Groves (1985) discuss role function within families as a factor affecting report accuracy. Health researchers have argued that knowledge about health-related events is more compatible with some self-schemata (roles) than others. The role of family 'health monitor' is therefore cited as an example where proxy reporting may be more accurate. 
In general, different roles provide individuals with different information about different events. According to this hypothesis, it may be argued that the responsibility inherent in the 'health monitor role' may heighten for her the salience of events occurring to other household members, and lead her, as proxy, to provide more accurate reports. In other words, the health monitor may be better motivated, and also have the relevant knowledge, to answer the survey questions. Depending on the nature of material being requested in a survey, other role definitions may also be able to provide accurate proxy reports. Social psychologists also identify social desirability effects as another factor influencing the accuracy of reporting about oneself and about others. They argue that when levels of information held by two persons are the same, if the trait being reported is perceived to be socially undesirable, it will be less often reported about oneself by oneself than by another. It has been demonstrated that respondents find it easier to report embarrassing or threatening information about someone else than about themselves. In general, response effect studies suggest that self-reporting is not necessarily better. Estimates generated from survey research may differ depending on the respondent rule that is chosen (Mathiowetz and Groves, 1985, p.639). Furthermore, the choice of respondent rule may depend on the nature or purpose of the survey, and specifically, the type of information that is being sought. The respondent rule chosen should reflect the best source of information while taking into consideration social desirability effects.

2.3.6 Nonattitudes and Acquiescent Response Behavior

Another source of nonsampling respondent error is found in attitudinal research, and can be categorized as errors associated either with failure to comprehend the survey questions, or with the possession of nonattitudes. 
Converse (1970), in studies of respondent opinions on various political issues, observed a group of subjects who provided inconsistent reports when questioned over repeated trials. It was argued that the inconsistency reflected a non-stable, non-permanent attitudinal state. Converse labeled these individuals as holders of "nonattitudes" because for them the survey measures concerned issues to which they had given little prior consideration. Within this group of respondents, some subjects provided a substantive response while others answered that they had no opinion on the issue in question. In trying to account for these inconsistencies, Converse proposed that either the persons did not understand the questions in a consistent manner, or they lacked all the information/knowledge to be able to take a consistent position on the issue. When responses were given, Converse noted that they were random, crossing different response categories. [This discussion is found in Groves, 1989, at p.417] Groves (1989) went on to discuss Converse's use of metaphor in cognitive psychology to describe human information storage. Converse (1964) explained the nature of this nonsampling error by using cognitive terminology to describe the problems which occur in information storage when nonattitudes are held. He conceived respondent knowledge as an interlocking system consisting of pieces of semantic memories, concepts and arguments pertinent to the issue. A survey question then acts as a stimulus to the retrieval of information from this system. The respondent will either report an opinion that has been well-rehearsed in the past, or she forms an opinion by weighing previously encoded arguments and counter arguments. However, when the respondent has little or no information encoded and stored about the issue, there is no strong network of concepts, arguments and counterarguments. 
The same survey question would in this case activate only weak ties between concepts and arguments of secondary relevance to the issue. Which of these ties is assessed by the respondent as critical in forming an opinion or stance will not be consistent over time, because few of these knowledge frames can be differentiated on their strength. This theoretical perspective can thus account for the random, inconsistent responses over replications of the same question. Smith (1984), in a review of nonattitude research literature, delineated two design strategies that can be implemented to correct nonsampling errors related to nonattitudes. The first employs "don't know" filters; the researcher would, before beginning the actual interview, question the respondent about whether he has carefully considered the issue prior to the interview. This would encourage the subject to provide a "don't know" response if she had not formed an opinion on an item. The second strategy builds on the work of Schuman and Presser (1981). Their research demonstrated that specific indicators of intensity of feeling about an issue could be used as a good predictor of the consistent reporting of opinions. As a consequence, Smith (1984) suggests the use of follow-up questions about the intensity of respondents' expressed opinions, that is, how strongly they hold their positions on a particular issue. Abelson (1986) also suggests follow-up questions about respondents' experiences in defending their positions, and their attempts to convince others of the relative merit of their position. However, the proposal for longer questionnaires, which ask for more detail about the issues of interest, has not been favored due to the increased costs and possibility of non-response caused by the very length of the questionnaire/interview, with the consequent loss of subject motivation (Groves, 1989, p.419). 
Acquiescent respondent behavior (also referred to as yea/nay-saying behavior) has also been identified as a source of respondent error. Agreeing-response bias primarily refers to the tendency of respondents to agree with attitude statements presented to them, but has been extended to yes/no attitude questions (Groves, 1989; Schuman and Presser, 1981, Chapter 8). Research in this general area has been generated from psychological measurement studies on closed questions which require "yes" or "no" answers, with statements to which the respondent is to "agree" or "disagree", and with questions using scales from which the respondent must choose a category from an ordered set (e.g., strongly agree... neutral... strongly disagree). In personality studies, social psychologists such as Couch and Keniston (1960) found a tendency for some subjects to agree entirely with the statements of another person, thus apparently disregarding the content and context of those statements. They regarded acquiescence as a personality trait and studied it as such. Converse (1964) described it as the tendency for less-educated respondents to be "uncritical of sweeping statements and to be 'suggestible' where inadequate frames of reference are available". Both interpretations relate education level to acquiescent behavior, but suggest different dynamics. The first focuses on interviewer-respondent interaction and regards education as one indicator of social status. The second implies deference to interviewers and to interview statements because of poor cognitive abilities (i.e., poorer education), or simply lack of opinions (nonattitudes) on the issues. Rorer (1965) and Nunally (1978) questioned the importance of agreeing-response bias. Their studies showed that the magnitude of the agreement response bias was not significant as a measure of personality nor as a source of systematic invalidity in measures of personality or attitudes. 
However, what must be emphasized is that this questioning of the importance of acquiescence in psychological research is not incompatible with the assumption of survey researchers that acquiescence is a significant source of response bias in survey research. Researchers who take this position argue that the phenomenon plays a significant role when educationally heterogeneous populations are interviewed but it disappears entirely when student samples and self-administered questionnaires are used (Schuman and Presser, 1981). Further, Schuman and Presser (1981) find that agreement due to acquiescent behavior can be remedied in the design of the study. The problem is minimized or eliminated when, instead of a statement, the subject is offered a forced choice between the statement and its opposite. This observation would imply that acquiescent behavior may be more a function of the questionnaire format than a characteristic of the respondent.

2.3.7 Sociodemographic Correlates of Respondent Error

Four respondent attributes have been identified as potential sources of response error in retrospective survey research: education, sex, age and race. The impact of each respondent characteristic upon response effects will be discussed separately. Then this section will look at response effects due to the interaction of respondent variables and "task" variables. Interaction between respondent attributes and interviewer characteristics will be discussed in Section 2.4, dealing with the interviewer as a source of measurement error. Sudman and Bradburn (1974) comprehensively reviewed survey research studies which specifically investigated different sources of response errors. They then summarized all these studies and determined the distribution of response effects due to age, sex, education and race. They concluded that none of these variables is statistically significant when examined separately (p.98). 
The largest response effect among these variables is for education (measured as the number of years of formal education). However, as a measurement criterion, the number of years of formal education is criticized by researchers as a poor choice, because previous psychological research has shown it to be a poor indicator of crystallized intelligence (Groves, 1989; Schuman and Presser, 1981). Much of the following discussion is taken from Groves (1989, pp.441-448). As a source of respondent error, education is often considered as a 'proxy' variable for a respondent's cognitive abilities, i.e., the ability to comprehend survey questions and the ability to retrieve, reconstruct and communicate responses to survey questions (Krosnick and Alwin, 1987; Schuman and Presser, 1981). Because researchers believe that formal education is indeed indicative of an individual's cognitive abilities and general knowledge, they then make several assumptions: that less educated respondents are slower or unable to comprehend survey questions, lack the knowledge or opinions to answer them, will have difficulty communicating their responses, and might be influenced by the interviewer or be more apt to have their choice influenced by irrelevant cues (Schuman and Presser, 1981). A review of the literature suggests that the education response effect was most pronounced for adult respondents with less than eight years of schooling. Bradburn and Sudman (1979) note that adult subjects with grade school education provide more missing data (i.e., a higher non-response rate and an increase in "don't know" responses) in surveys. In addition, these subjects are more susceptible to interviewer effects, such as being influenced by the perceived differences in status between interviewer and respondent. 
Converse (1970), in a multivariate analysis of "no opinion" behavior in Gallup and Harris Surveys, included education, the length of the questions (>30 words), whether the question forced a choice between two or more response categories, and whether the topic concerned material related to foreign political affairs as predictors of "don't know" response rate. He found that education was the strongest predictor of "don't know" responses. But, as this study did not control for age, other researchers think that age differences may account for the observed differences over the education levels (Bradburn and Sudman, 1979; Groves, 1989). The study was incomplete in that it did not identify the source of the problem: whether due to lack of comprehension of the survey questions, lack of knowledge of the topic, inability to retrieve the information, the complexity of the language used in the questions, or interviewer effects. Groves did speculate that less educated subjects were more willing to answer "don't know", possibly due to less perceived pressure to appear informed relative to the more educated subjects. However, Schuman and Presser (1981) note that these results are not uniformly obtained when other studies are reviewed. In some studies asking for information on obscure topics, subjects with college education were found to provide a higher percentage of "don't know" responses. Sudman and Bradburn (1974) also found that women tended to answer "don't know" more often than men, except on topics related to birth control and morality (p.100). They suggest that this difference reflects the different roles played by men and women, hypothesizing that it is more acceptable in our society for women to admit that they "don't know" or have "no opinion", than it is for men. They also indicate that this observation could correlate with level of education. 
As mentioned previously, when researchers discuss education effects, they often propose that less-educated respondents will be slower to understand the context and meaning of the question, and consequently will be more apt to have their choice influenced by irrelevant cues (Schuman and Presser, 1981). In studies on question structure and order, Schuman and Presser found very mixed support for the hypothesis that the less-educated are sensitive to question effects. What they did observe was that less-educated respondents gave different responses to open-ended survey questions than to equivalent close-ended questions. They "explained" this observation by noting some evidence that those with less education are either more influenced in their choice of answers by the very fact of being forced to make choices (closed responses); or they have difficulty communicating answers to open-ended questions. Age is another respondent attribute associated with response error. It is an important source of measurement error and the only one related to memory. It is however difficult to integrate the results of research upon the effects of age upon response errors, due to the disparate and inconsistent definitions of the "elderly" age group. The term "elderly" spans the 55 - 70+ years range (Groves, 1989, p.441). In general, it has been observed that with aging there is a decrease in response performance characterized by increasing failure to retrieve information from memory (Craik, 1977; Sudman and Bradburn, 1974; Groves, 1989). From an extensive review of the literature on age effects and memory, Craik (1977) compared and summarized the results in performance for elderly versus younger subjects and concluded that: (1) elderly subjects have larger deficits in recall from secondary or long-term memory than in recall from primary or short-term memory. 
This observation may have little if any impact on surveys designed to retrieve autobiographical information because the conclusions were based on recall of information from semantic memory only; (2) elderly subjects' performance is poorer on recall tasks than on recognition tasks. In their studies on recall and recognition in the elderly, Herzog and Rodgers (1989) asked elderly respondents at the end of an interview session to name six physical functions that were the subject of the questions earlier in the interview. Afterwards, they presented 20 survey questions to the subjects (only 10 had been previously presented), and asked the respondents to identify which, if any, had been asked of them. The data from these studies suggest that both recall and recognition decrease over the age groups; but comparatively speaking, recognition tasks were performed better. Even when Herzog and Rodgers adjusted for educational differences, these effects remained. Researchers such as Groves (1989, p.441) and Schuman and Presser (1981, pp.91-92) have also observed that it is often more difficult to focus and keep the attention of the elderly on the interview task; often, they stray off topic and fail to follow the interview protocol. Their answers are often only "tangentially relevant" because they do not respond to the particulars of the survey questions. Sudman and Bradburn (1974) also note that the non-response rate and the percentage of "don't know" responses increase as a function of age. In explaining this, they argue that the elderly disengage from societal activities as they age. Acceptance of this age-trend effect is not unanimous. Glenn (1969) argued against it; his work suggested that when strict educational controls were introduced, differential effects due to age disappeared. However, the weight of the research would still favor Sudman and Bradburn's observation. 
Therefore, in summary, the percentage of "don't know" responses decreases with respondent education; and, the percentage of "don't know" responses increases with respondent age. Because of these two trends, it is assumed that there is a correlation between education and age in the occurrence of non-response. A final observation in the literature is that elderly subjects tend to be more susceptible to interviewer bias effects in their responses to survey questions (Sudman and Bradburn, 1974; Schuman and Presser, 1981). Groves observes further that different theoretical perspectives are offered to explain response effects related to age. Smith (1980) suggests that errors could be due to: (1) poorer organization of memories during the encoding stage; (2) decreased attention and cognitive processing at the acquisition and retrieval stages; (3) interference during recall because of the extensive, rich links between any particular retrieval cue and information in long-term memory. With older subjects, who have years of experiences, their memories are as plentiful as the connections between them. This may result in diminished, distorted recall. Groves (1989), however, recognizes that there is really no causal model for the effect of aging upon encoding and retrieval processes. Hulicka (1967) attributes memory response effects to physical changes occurring with chronological age and poor health. It is suggested that chronological age is a proxy measure for physical attributes such as loss of brain tissue and poor vascular circulation; these changes may affect brain functioning (i.e., encoding and retrieval processes); as well, poor health may account for diminished motivation and attention in elderly subjects. 
Sudman and Bradburn (1974) found that sex differences interact with several task variables: the threat posed by the question, the structure of the questions (open versus closed format), the length and difficulty of the questions, the method of administration, and whether or not there is a preferred or socially desirable answer to a question. The research on response effects due to sex differences indicates that response variance is larger for females with close-ended, threatening questions where a socially desirable answer is possible. The response effect for women is twice as high as that for men when the questions are threatening; the reverse situation exists when the questions are non-threatening. The method of administration (face-to-face interview versus a self-administered questionnaire) and two respondent characteristics, sex and race, influence measurement error in surveys. Male respondents, both black and white, find face-to-face interviews more threatening, especially when some questions have preferred or socially desirable answers. Here, the response effects are the most pronounced. When the race of the respondent is considered separately, response effects are larger for black than white respondents on face-to-face interviews, and black subjects are influenced more by interviewer effects (i.e., deference is higher for black respondents). When a socially desirable answer is possible, the response effects are larger for white respondents. Sudman and Bradburn (1974, pp.102-109) also note that five task variables interact with respondent attributes to produce measurement error in survey research: the degree of threat posed by a question, the method of data collection (face-to-face interview vs self-administered questionnaire), question structure (open-ended vs close-ended questions), question length, and whether or not a socially desirable answer is possible. They also found that women were influenced a little more than men by question structure. 
They postulated that the difference may be related to different interpretations of the question. Close-ended questions with forced choices are a greater source of response error for women. The length of the question may also affect response. Longer questions are often more difficult; respondents have difficulty understanding what is being asked of them. Also, the longer the question, the more likely it is that interviewer effects (i.e., intonation cues, different wording, etc.) will be activated. Furthermore, the length of the question often influences the length of responses, and can lead to incomplete or inaccurate reporting. Sudman and Bradburn (1974) found that two respondent characteristics (race and education) interact with question length. Racial effects are the largest source of error. When questions are short (<12 words), there are no differences between black and white respondents. The response effect is larger for blacks when the question is longer (>28 words). For the education variable, measurement error is greatest when the respondent is less educated (high school education or less), and when the sentences are long and complex (>18 words) (Sudman and Bradburn, 1974, p.110). Two respondent characteristics, sex and race, interact with social desirability. The response effect is greatest for women (nearly twice as large as that for men), and for white respondents when a socially desirable answer is possible.

2.4 Factors Affecting Exposure Reporting and Recall Accuracy: Interviewer as a Source of Measurement Error

Because the survey interview is the means by which response measurements are obtained, the interviewer can play a significant positive or negative role in the process of recall, in the interpretation of what is recalled, and in the actual recording of the response information. The interviewer herself can generate response measurement error at any one of these stages. 
Despite the potential for interviewer bias, concern about it has decreased in the last 20 years because its overall impact on study variance has been minimized through such improvements in survey design protocols as the standardization of question wording and administration, and the use of nondirective probing procedures. As well, there is more rigorous training of interviewers; training teaches interviewers how to be objective, neutral and accepting of all responses when collecting and recording data. Methodologists are more concerned now with survey measurement error arising from task variables (Sudman et al., 1977).
Sudman and Bradburn (1974, pp. 13-16) state that if a researcher wants to understand 'how' interviewers contribute to response errors by introducing "variable" measurement across interviewers, she must consider the problem from three perspectives: interviewer role demands, interviewer role behavior and the extra-role characteristics of the interviewer (i.e., race, sex, education, and social class).
First, interviewers can influence subject responses by the way they carry out the interview role demands: how they read survey questions, how they clarify respondent misunderstanding, how they probe incomplete answers, and how they handle subjects' questions. Training of interviewers is designed to produce complete uniformity of behavior among the interviewers, thereby removing any effects which variation in interviewer behavior might have on respondent answers. The greater the degree of structure in the interviewer's role, the lower the relative response effect will be (Sudman and Bradburn, 1974). However, training programs cannot always ensure that interviewer behavior will be consistent with the study protocol.
Secondly, interviewers may administer the questionnaire differently to different subjects.
They have been known to reword or eliminate some questions; to record responses incompletely, inaccurately or falsely; and to use different probing strategies when the respondent does not comprehend a question or is having difficulty communicating a response to an open-ended question. Interviewer effects are therefore possible, even if the study protocol is adhered to and all questions are administered and read correctly. Furthermore, interviewers can unconsciously change their intonation, or use different, unplanned words. In these diverse ways, an interviewer's behavior is not always consistent with interviewer role demands. The survey questions to the respondents will vary across interviewers, producing measurement error. A general finding is that the greater the degree to which the interviewer adheres to the role demands required by the interview/study protocol, the lower the relative response effect will be across interviews (Sudman and Bradburn, 1974).
Thirdly, a survey interview is a structured social interaction (Kahn and Cannell, 1957) conducted within the context of a complex set of social norms which guide interactions among individuals. Therefore, it is subject to the same social factors that influence other interactions, such as the interviewer's and respondent's sociodemographic profiles (i.e., their race, education, sex and socioeconomic status). These variables often act as cues which help respondents make decisions about their own behaviour while helping the interviewer to interpret responses (Groves, 1989, p. 359).
In Cannell's 1977 summary of the results of studies on interviewing methods (discussed earlier with respect to the respondent as source of measurement error), the author concluded that research has shown that the actual behaviour in an interview is the main variable which correlates with the index of reporting quality.
The "cue-search model of interview interaction" posits that the respondent looks to the interviewer, or to some other source, for cues about expected behaviour; the interviewer is in a similar situation, searching for cues. Research has also shown that changing the characteristics of the process, including interviewer behaviour, can have marked effects on both the amount and accuracy of health data reported by respondents (Cannell 1977, p.37). Cannell (1977) states that interviewer feedback can bias answers, or it can improve response validity. The effects of verbal reinforcement on respondents can be divided into three categories: cognitive effect, conditioning effect, and motivational effect. These categories overlap and interact. The first effect occurs when verbal reinforcement supplies cues about the interviewer's expectations and about how the respondent is meeting them. The second effect is important in many studies of the psychology of learning. In the simplified model of interview, the researcher's evaluation immediately follows the respondent's answer; this can reinforce the response or can also alter the frequency of the behaviour that preceded it. This process can thus either strengthen or weaken the probability of eliciting that behaviour in subsequent trials. The third possible effect is motivational - the intensity or psychological effort which the respondent gives to the reporting task, and to other behaviours which may interfere with the adequacy of response. Cannell (1977) concludes that reporting accuracy may be improved by manipulating the conditions under which retrieval occurs. The conditions of recall have a crucial impact on "what" is reported, and on accuracy. Different questioning strategies can improve reporting by changing the conditions under which the respondent is invited to search for past events. However, these studies were not concerned directly with the cognitive processes involved in recall. 
Like other researchers, Cannell (1977) believes that an experienced event is not merely recorded in original form, as on a tape; rather, it is organized into a perceptual field. Its meaning depends upon how it is perceived, and with what other events it becomes associated in memory. What an interviewer might see as a simple item may, in fact, be organized in several frames of reference by the respondent; a single question about the event may not be the best stimulus to recall; several questions from different reference points may be necessary.
Cannell (1977) sets out a model of information processing (p. 53). This shows the respondent's cognitive organization and the researcher's questionnaire design as two diverging paths, which lead to two independent informational states (memory trace and stimulus question) whose interaction in the interview is expected to produce the retrieval of the original information. This model suggests that the probability of accurate recall is a function of the ability of the stimuli questions to interact adequately with the respondent's cognitive organization. The appropriateness of the stimuli questions is itself a function of the researcher's ability to comprehend the nature of the respondent's cognitive path, and to use this knowledge in framing the questions. An event may be stored in memory under various states so distant from the original information state that a question stimulus merely traced from the original event, or from its straightforward conceptualization by the researcher, may not elicit the stored information. For instance, memory can process an "illness" in ways that transform and organize it around such varying concepts as pain, incapacity, costs, doctor's visits, hospitalization, medication, treatment and symptoms, or even more generally around other causal, circumstantial, or consequential events.
Sudman and Bradburn (1974) refer to the variables which act as cues in interviews as "extra-role characteristics".
Their study conclusions are that higher social status interviewers induce a larger response effect than do interviewers of lower social class status, if and only if the respondent is aware of the interviewer's socioeconomic status. The status of the interviewer is important in school studies, where status is recognized. The resulting error is associated with an incorrect interviewer perception of the respondent's answer: it is speculated that an interviewer may unintentionally "hear" and record answers more consistent with his own views. The response effect is most pronounced with questions dealing with social class. The nature of the question also determines the impact of interviewer characteristics on response effects. Results of numerous studies, including Katz (1942), show that the greater the saliency of the interviewer's extra-role characteristics for the subjects being investigated, the larger the relative response effect will be (Sudman and Bradburn, 1974, pp. 15, 110-111).
The interviewer's sex and race have no significant impact on response errors. However, there is a trend that suggests a higher "don't know" response rate with female rather than male interviewers, and for inexperienced lower social class interviewers. Furthermore, the "don't know" rate diminishes as the education and experience of the interviewer increase, regardless of sex. The race of interviewer and respondent influences survey error when the questions deal with racial attitudes and issues. Stronger, more militant answers are given to black interviewers by black respondents when questions deal with race. Differences disappear with non-racial questions.
There is a paucity of data regarding response effect interactions between age of interviewer and respondent. One general observation is that older, female interviewers get lower response effects in face-to-face interviews than do younger interviewers, especially inexperienced undergraduate university students.
Sudman and Bradburn (1974) have also discussed the impact on measurement error of the joint effects of interviewer characteristics and a number of task variables. Their observations can be summarized as follows:
a. Method of Administration:
(1) The age of the interviewer is relevant when the interview is face-to-face. Response effects decrease as the age of the interviewer increases.
(2) In self-administered questionnaires, the education of the interviewer is the more important variable. The more educated this individual is, the larger the response effects, at least in school settings, where status is based on education. To extrapolate, for example, a pregnant woman given a dietary questionnaire by a doctoral research student may indicate that she is consuming milk when her daily fluid intake consists of soda pop and coffee exclusively. Here, trait desirability and social approval factors play a role in what is reported on the self-administered questionnaire.
(3) The interviewer's sex is important only when survey data are collected by means of a personal interview. Somewhat larger response effects are attributed to male interviewers.
b. Structure of Interview Questions:
(1) Close-ended questions have a larger influence on respondents than do open-ended, but they minimize interviewer bias by providing more structure.
(2) Concerning the race of the interviewer, it was found that response effects obtained by white interviewers are higher than for black interviewers on open-ended questions. But the studies are small.
c. Possibility of a Socially Desirable Answer:
(1) When a socially desirable answer is very possible, response effects are larger for white interviewers. The reverse is true when no socially desirable answer is possible.
(2) Results for interviewers by sex were the opposite of those found for respondents.
When there is a possibility of a socially desirable answer, the response effects are more than twice as large for male rather than for female interviewers. If there is little possibility of a socially desirable answer, there is no difference between male and female interviewers for response effects.
Several research studies have been completed about the influence of question type (factual versus attitudinal) on interviewer-generated bias (discussed in Groves, 1989, p. 373). It was often assumed that factual questions with knowable and verifiable answers would be less influenced by such interviewer variations as differences in question wording, question administration, and delivery/intonation. The results of studies comparing interviewer effects on factual and attitudinal measures are in fact mixed, with only some of them corroborating the original assumption. For example, O'Muircheartaigh (1976) found larger response effects for attitudinal questions, and in particular, those stated as open-ended questions. Collins and Butcher (1982) found factual questions less susceptible to interviewer effects. Fowler and Mangione (1970) used a regression model predicting Kish intraclass correlation coefficients (i.e., a unit-free measure expressing the ratio of variance between interviewers to the total variance). The predictor variables were defined as specific characteristics of the survey question and included the following: the difficulty of the question (degree of cognitive processing required), the vagueness of the terms in the questions, the threat or sensitivity of the topic, whether the question was factual or attitudinal, and whether the question was open-ended or close-ended. The most important predictor of the Kish intraclass correlation coefficient was the task difficulty imposed by the question. Fowler and Mangione found no evidence that factual items are subject to lesser interviewer effects than are attitudinal questions.
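The ratio of between-interviewer variance to total variance described above can be illustrated with a small numerical sketch. The Python function below uses the standard one-way analysis-of-variance estimator of an intraclass correlation for a balanced design (every interviewer has the same number of respondents). The function name, the balanced-design restriction and the toy data are assumptions made for illustration; the text does not state which estimator Fowler and Mangione actually applied.

```python
from statistics import mean


def interviewer_icc(groups):
    """One-way ANOVA estimate of the intraclass correlation.

    groups: list of lists, one inner list of numeric answers per
    interviewer. Assumes a balanced design (equal list lengths);
    this restriction is an illustrative simplification.
    """
    k = len(groups)          # number of interviewers
    m = len(groups[0])       # respondents per interviewer
    if k < 2 or m < 2 or any(len(g) != m for g in groups):
        raise ValueError("need >= 2 interviewers with equal workloads >= 2")

    grand_mean = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]

    # Between-interviewer and within-interviewer mean squares
    ss_between = m * sum((gm - grand_mean) ** 2 for gm in group_means)
    ss_within = sum(
        (x - gm) ** 2 for g, gm in zip(groups, group_means) for x in g
    )
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (k * (m - 1))

    denominator = ms_between + (m - 1) * ms_within
    if denominator == 0:
        raise ValueError("no variation in the data")
    # Share of total variance attributable to interviewers
    return (ms_between - ms_within) / denominator


# Toy data: answers differ only between interviewers, never within
all_between = interviewer_icc([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
# Toy data: identical answer sets for every interviewer
no_between = interviewer_icc([[1, 2, 3], [1, 2, 3], [1, 2, 3]])
```

When all variation lies between interviewers the estimate is 1; when interviewers obtain identical sets of answers the estimate falls to its lower bound, which is negative for this estimator. Values near zero therefore indicate that interviewers contribute little to response variance.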
Open-ended questions are not susceptible to greater interviewer variance; however, the number of answers obtained to an open-ended question is quite sensitive to interviewer effects (i.e., variation due to different probing behavior). Probing by interviewers resulted in additional information.
In social psychology and sociology, there is literature about how interviewers' expectations influence response variation: they affect the manner in which questions are presented to the respondent, including word changes, variations in intonation, and other attributes of questionnaire administration that influence respondents in different ways. Interviewer expectations may influence both the answer given by the respondent and what is recorded by the interviewer. Hyman (1954) was the first to investigate the role that an interviewer's prior expectations might play in invalidating survey data (Groves, 1989, p. 395). He identified three kinds of interviewer expectations: (1) role expectations (i.e., the interviewer expects certain responses from different groups of individuals such as women, blacks, clergymen, laborers); (2) attitude structure expectations, in which the interviewer expects respondents' views to be internally consistent; and (3) probability expectations. Hyman suggested that prior to the commencement of the survey, interviewers have "probable expectations"; they expect a certain distribution of answers congruent with their own beliefs about the prevailing sentiments in the general population. Subsequently, their behavior during the interview may effect such a distribution. In other words, Hyman argues that the interviewer expects a certain distribution of responses and then unconsciously tries to fulfill that distribution expectation.
Work by Rosenthal (1966) and Rosenthal and Rosnow (1969) suggests possible interviewer effects related to both their opinions and to their expectations of the respondent.
Interviewers' expectations might cause biased data collection in several ways. First, bias may occur because the interviewer's opinions are communicated to the respondent; the respondent then modifies her own responses accordingly to fulfill the expectations of the interviewer. Secondly, the interviewer might ask leading questions to probe the respondent's replies, or she may fail to probe unclear or inappropriate answers, or she might be biased in which replies are probed and how. Thirdly, the interviewer might be selective in which responses are recorded; she may even record what the respondent "meant" to say. Fourthly, the interviewer could also bias survey data during sampling through the choices of whom to interview. It is evident that survey responses may be incomplete, inaccurate or totally false if interviewer effects are significant (Cannell et al., 1977; Sudman et al., 1977).
Sudman et al. (1977) argue that prior expectations may also relate to anticipated difficulty in asking questions of the survey respondents, to subject uneasiness about answering threatening or sensitive questions, or to expectations about the levels of under- and overreporting or the percentage of "no opinion"/"don't know" responses. Here, the hypothesis would be that interviewers who anticipate difficulty with a study or high item non-response may not probe incomplete or ambiguous responses, or will communicate a lack of confidence to the respondent. To test this assumption, Sudman et al. (1977) had all the interviewers complete a questionnaire prior to the survey. It measured the interviewers' perceptions of how difficult it would be to ask the survey questions, how inhibited the respondents would be in answering the questions, what the non-response rate would be for certain questions, and how large the underreporting of sensitive or threatening information might be.
Their results do not support the hypothesis that interviewers' prior expectations result in interviewer variance and response effects. Interviewer expectations concerning reporting errors and difficulty of administration were not predictive of respondent behavior. There was a slight but insignificant tendency toward lower reporting of sensitive/threatening information to interviewers who anticipated difficulties with these questions. However, the results of the study must be considered only tentative, because of a serious design limitation: the interviewers were not assigned randomly to the respondents. This makes the interpretation of interviewer effects problematic.
If an interviewer is not "blind" to the outcome, he might pursue exposure information more vigorously for known cases than for controls who show no abnormality (Levin, 1983). However, information gathered by trained interviewers is more reliable than that collected by self-administered questionnaire because accurate interpretation of the questions is important. Levin also believes that design will allow for the demonstration of interviewer bias, if it exists. Grichting and Caltabiano (1986) conclude, however, that no standard procedures are applied to measure and evaluate the amount and direction of bias resulting from the interviewing experience. From an extensive review of the literature on the dynamics of interviews, they had little doubt that interviewing does change a respondent's opinions, beliefs, and action tendencies.
Because Feinstein (1985a) believes that both an interviewer's attitude and mode of investigation can substantially affect the response of a subject to a question, for example about prior exposures, he says that where possible it is best for the researcher to be "blinded" from the research hypothesis and from the subject's status (case or control). To reduce interviewer bias, he also recommends that a rigorous and relatively rigid format be used for data acquisition.
Phrasing of relevant questions (those whose answers provide crucial research information), as well as the methods of recording answers, should be applied uniformly in all interviews; the format should also allow for each pertinent positive answer to be followed by additional questions, which are also uniformly arranged and phrased (Feinstein, 1985b).
In conclusion, it is important to consider that interviewer bias is not as critical as the impact of various task variables on response effects. However, certain characteristics of the interviewer, either alone or in combination with the characteristics of the respondent and relevant task variables, might be sources of survey measurement error. Sudman and Bradburn (1974) suggest that interviewer bias must be addressed by considering: interviewer role demands, interviewer role behavior and the extra-role characteristics of the interviewer. They give three guidelines which provide assistance in understanding and preventing this source of error (p. 15):
(1) The greater the degree of structure in the interviewer's role, the lower the relative response effects. Interview protocols and interviewer training are important.
(2) The greater the consistency between interviewer role demands and interviewer role behavior, the lower the response effects will be. This emphasizes the need for extensive interviewer training.
(3) The greater the saliency of the interviewer extra-role characteristics for the questions being asked, the greater the relative response effects will be.
2.5 Factors Affecting Exposure Reporting and Recall Accuracy: Task Variables as a Source of Measurement Error
Sudman and Bradburn (1974) found few studies about the effects of "task variables" on survey measurement error: these are the response effects due to the conditions under which information is acquired by the interviewer and responses are generated by the respondent.
The authors' synthesis of a large body of research on response bias suggested that the task -- retrieval and reporting of information — and the conditions under which it is performed is the largest source of response effects. From their analysis, they distinguished three categories of task variables which appear to influence the accuracy or the variance of responses: (1) task structure; (2) problems of self-presentation; and (3) the saliency of the requested information to the respondent. Within these categories, they identify the following factors as having the largest influence on survey measurement error: level of psychological threat, the possibility of a socially desirable answer, the saliency of the questions to the respondent, the method of administration (personal interview or self-administered questionnaire), the location of the interview, as well as the position and structure of the questions. Self-reporting inventories are a cheap and efficient means of data collection (Furnham, 1986). They can be administered by research workers with clinical experience. However, like all self-response measures they are open to response bias, which must somehow be dealt with. The usual means have taken one of three forms. First, to provide a "lie scale" within the questionnaire itself to detect unreliable subjects, and expose differential reporting between cases and controls; this should be sensitive to both over- and underreporting. Secondly, to emphasize "honesty" in responses. Thirdly, to reduce the face validity of some questions, so that respondents are not as aware of what the assessors are trying to measure. In 1990 Coughlin reviewed significant literature and concluded that little study has yet been done into the factors contributing to bias due to differential recall between cases and controls in retrospective studies. He too thought that interviewing techniques do influence recall. 
He noted that Schlesselman (1982) and Rossi (1983) believe that the content and the form of questions both may affect recall accuracy. Supplementary devices such as introductions to sections of the questionnaire may improve responses, possibly due to the stimulus and time provided to the respondents.
Schuman and Presser (1981) noted that the greatest potential for nonsampling error lies in how the questionnaire is constructed, that is, the choice of wording, the use of open-ended versus close-ended questions, and the characteristics of the interview situation in which the questions are delivered. The length of time since the event occurred (i.e., the total duration of the survey's reference period) and the referent person about whom the questions are asked (respondent rule effects) also influence response effects, but their impact on response variance is believed to be less significant.
Oksenberg and Cannell (1977) are concerned that the nature of the respondent's task (first comprehending the meaning of the question, and then retrieving, reconstructing and reporting the required information) may create demands which the subject is unable or unwilling to meet because they exceed, or seem to exceed, his memory or his ability to process and integrate information. Some respondents will not show the requisite motivation, and will subsequently perform these tasks with minimal effort; they approach the task demands as "cognitive misers". When the respondent has not understood the question, or is not sufficiently motivated to retrieve and reconstruct the necessary information, extraneous cues (such as the status, behavior and appearance of the interviewer, the respondent's beliefs, values and goals, or his assumptions about the intended meaning of an ambiguous question) may drive the selection process, determining what is reported, and the degree to which it approximates the true response.
The normal complexity and demands of the information and response processing may be further increased when the respondent considers the response to be embarrassing, sensitive or personally threatening/uncomfortable. Here, the psychological implications of providing responses which accurately reflect the respondent's beliefs, attitudes, values or behavior may lead him to suppress the information (underreporting) or distort it into a more acceptable response for protection of his self-esteem (overreporting of desirable behavior/attitudes).
Raphael and Cloitre (1994) also discuss the impact of the respondent's affect on memory and memory retrieval. They suggest that a 'mood-congruence model of memory' may partially explain the occurrence of differential recall of prior exposures (i.e., recall bias). Commenting on the research of Blaney (1986) and Ucros (1989), these authors suggest that "negatively-toned prior exposures are recalled more easily by respondents in a negative mood state at the time of recall than those in a positive mood state" (p. 556). They also note that this may be a significant problem in psychiatric epidemiology because "negative mood or demoralization is often an indicator of case status" (p. 556), and the specific research requirements in this domain often involve the ascertainment of negative life experiences and events. Raphael and Cloitre (1994) also stated that "mood congruent patterns may not occur when recognition memory processes (i.e., respondents indicate that a proposed exposure did or did not occur) are invoked in the context of an epidemiological study" (p. 556). Research suggests that mood congruent effects would be more likely to happen when the memory demands are low, that is, when the subject is required to remember only a small amount of information, and there is no time delay between the exposure and the request to recall the information (p. 556).
They further comment that mood congruent effects may not be relevant in epidemiological research due to the high memory demands: the need to remember and to report relatively large amounts of information about remote and low-salience life experiences and events. Lastly, they report that "mood may have an impact beyond memory retrieval: it may influence respondents' reconstruction or subjective evaluation of details about recalled experiences", specifically, the assessment of "how frequent, how important, or how positive/negative a prior exposure was" (p. 556).
In summary, the 'mood congruence model of memory' as an explanation for recall bias proposes that the particular mood state of the respondent affects recall of prior exposures, and that the mood state may often differ between the cases and the controls, unless the research design uses a 'mood equivalent' control group. Furthermore, a depressed mood predicts poorer recall of instances of prior exposure; however, subjective assessment of those exposures tends to be distorted in a "mood congruent manner". Raphael and Cloitre (1994) also noted that a "mood related memory deficit may reduce effect sizes artifactually" (p. 555). Consequently, they recommended that "the recall of event occurrence must be considered separately from subjective appraisal of event characteristics" (p. 555).
These situational factors (task variables) can definitely lead to biased or distorted data. The most frequent distortion in survey data is the failure to report information (i.e., false negative reports). This can be due to a failure in the information retrieval process, a true memory lapse, or carelessness/unwillingness to make the effort necessary to retrieve the information. False negative reports are quite common in reports of past behaviour or experiences, especially when the time between event occurrence and interview is long.
Another common distortion involves making false positive reports, that is, falsely reporting events, behavior or other factual information as having occurred. This distortion often appears when there is a reference to time; such telescoping (overreporting) errors result from compression of time: the event is remembered as having occurred more recently than it did. False positive reports may also reflect faulty recall or may be related to the need for social approval and the possibility of giving socially desirable answers (Cannell et al., 1977).
As Cannell noted earlier, a respondent's task in answering a broad question is enormous. She must create appropriate frames of reference to guide recall, and create cues to reactivate traces of possibly low salience. One cannot expect the motivation to invest substantial effort to be high, especially if the questionnaire has no immediate benefit to the respondent (Cannell, 1977). Questioning only allows short periods of time to complete this process. Within this framework error is predictable. The broad question is not an adequate stimulus. Instead of asking one standard question derived from a simple conceptualization of the event, several questions may be needed, from various hypothesized states of memory processing. That is, instead of requiring the respondent to build up her own cues, the researcher should try to create these recall aids and build them into the questionnaire. If the researcher can successfully predict and design the relevant cues and frames of reference, the respondent's recall process should be significantly assisted, with resulting improvement of recall.
Cannell (1977) suggests that an extensive questionnaire, containing a large number of questions providing multiple and overlapping cues, may assist retrieval. However, it also might inhibit participation due to the time involvement and effort obviously required.
Nevertheless, he concluded that the cue-giving approach, for instance using symptomatic manifestations of illness as a frame of reference, is more productive in eliciting the report of illness than are standard general questions. He thought that the involvement of the interviewer and the length of questionnaire might convey the message that the recall task was important, heightening motivation. However, he concluded that very little was known about the asking of appropriate questions. It might be that reporting errors are often the result of questioning errors. Long questions might elicit both more information and a more accurate report, contrary to common assumptions. Question length may have a cueing effect.
Findings from a number of validity studies (Neter and Waksberg, 1964b; Sudman and Bradburn, 1974 (Chapter 3); and Cannell et al., 1977) on the effect of these task variables on response variance are summarized as follows:
(1) Response accuracy/variance is influenced by where and how the interview is conducted. Regarding the method of administration, self-administered questionnaires are better than personal interviews (i.e., are associated with more accurate and complete reporting) when the questions to be asked are personally threatening, or when a socially desirable (preferred) answer is possible. Face-to-face interaction is an important factor in the generation of socially desirable responses; there is a tendency for respondents to present themselves favorably to the interviewer, or to report behavior and attitudes that conform to the socially acceptable norms. Socially desirable behavior is likely to be overreported (i.e., false positive reports). Behavior or attitudes which are sensitive, embarrassing or threatening, and which therefore conflict with a norm of "self-presentation", are likely to be underreported (Phillips and Clancy, 1972). Social desirability also works in conjunction with factors such as threat or saliency.
Attitude questions rated as highly threatening and having a strong possibility of a socially desirable answer have much larger response effects than any other category of attitude items. Among behavioral items, the effects are largest for items with a strong possibility of a socially desirable answer and which are somewhat highly threatening (DeMaio, 1984; Sudman and Bradburn, 1974).

(2) Differential response effects occur for the different subjects studied regardless of the conditions of the interview. Sudman and Bradburn (1974) note that factors such as threat, social desirability, and memory factors are probably responsible for these effects. Their analysis found that threat and saliency work in opposite directions on response variance. The largest response effects are associated with threatening questions. Conversely, events important to the respondent are recalled more easily (in accordance with the availability heuristic and accessibility principle) and reported more completely and accurately than those of lesser psychological significance. Questions about salient events are more likely to motivate the respondent to follow the retrieval and memory reconstruction processes. When these two task variables are considered in combination, the largest response effects would occur where saliency is low and threat is high.

(3) The age of the respondent and of the interviewer can each create response effects when survey questions deal with behavioral/attitudinal information perceived as threatening (e.g., illegal behavior, racist attitudes), sensitive or embarrassing (e.g., sexual practices). Here, the largest response effects are found with young respondents and interviewers, and in particular, college students. Self-administered questionnaires are the method of choice when highly threatening questions are to be asked and anonymity is required.
(4) For threatening questions or those with a socially desirable answer, the analysis of Sudman and Bradburn (1974) suggests that close-ended questions increase the threat by forcing the respondent to choose one of the response alternatives; the result can be large response effects. Underreporting or false negative reports are noted when a personal interview is used rather than a self-administered questionnaire, and also when the interview is conducted at home when others are present. Furthermore, short questions have a strong negative (underreporting of behavior) effect on reports for threatening behavioral/attitudinal questions. Research has shown that threatening questions should be asked towards the end of an interview, when rapport is established between the interviewer and the respondent. It is believed that if the topic is threatening, sensitive or embarrassing, the greatest threat would occur at the beginning of the interview, with the threat diminishing as the interview progresses and rapport is established.

(5) Respondents are more likely to report socially undesirable, sensitive and embarrassing attitudes/behaviors about others than about themselves. Therefore, self-reports are likely to be less accurate than proxy reports under these circumstances.

(6) False negative reports (underreporting rates) are related to the time elapsed between the occurrence of the events to be recalled and reported and the interview, to the salience of the events for the respondents, and to the perceived social desirability of the events (Sudman and Bradburn, 1974). Wickelgren (1970) reported that the majority of research in experimental psychology suggests that short-term and intermediate memory decays exponentially with time. Cannell et al. (1963) found that the failure to report visits to physicians over a two week period increased from 15% after one week to 30% in interviews two weeks later.
There are no data available for long-term memory effects as a function of the reference time period. In general, as the time increases between the event and the interview, there is increased underreporting of information about that event (i.e., errors of omission). Because of the greater time lapse, the cognitive demands to define what information is relevant, to recall it and to reconstruct it are greater; extraneous cues (interviewer characteristics, respondent goals, etc.) may then erroneously affect the accuracy and completeness of the information reported.

Because remembering events in the distant past can be taxing to an individual's cognitive skills and capacities, the use of a personal interview and probing techniques could result in fewer omissions; however, personal interviews are associated with telescoping errors. It is interesting to note that errors due to omissions and telescoping can occur simultaneously during recall. For very long periods, there will be more errors of omission than overreporting due to telescoping. Errors of omission also depend on the saliency of the event: memory is better for highly salient items.

The cognitive approach to questionnaire design conceptualizes the response to a survey question as involving four distinct stages, each of which can involve erroneous reporting (Jobe and Mingay, 1989). The first is comprehension, interpreting the meaning of the question. Secondly, there is retrieval, in which the respondent searches long-term memory for relevant information. Thirdly, there is estimation or judgment, the evaluation of the retrieved information as to relevance; the respondent may then combine separate information items to form a response, or alternatively she may decide that the recalled information is inadequate, using that decision as a start point in forming an adequate response. The fourth stage is response.
The subject weighs such factors as sensitivity of the question, social desirability of the answer, and probable accuracy. The authors reviewed three reports from the US National Center for Health Statistics, which discussed ways to minimize reporting errors. Respondent comprehension rose when simpler terms were used, even though the original wording was considered to be comprehensible. Respondent recall was also improved by providing additional cues for hard-to-remember information. Researchers also used "decomposition" to lead subjects to break down generic memories so as to recall individual events such as health visits. Another technique is the creation of a personal timeline of accurately dated landmark events in the subject's life, against which he can try to place, for instance, particular health events such as visits to doctors.

Kalton and Schuman (1982) studied the effect of the question on survey response. Such responses may be sensitive to the precise wording, format and placement of the question. Their conclusions are that questioning is not a precision tool; there is ample evidence that serious response errors can and do occur. Although much research has been done, we remain largely ignorant of the nature of question wording and/or form effects. Reviewing some of the authorities discussed above, such as Sudman and Bradburn (1974) and Cannell (1977), the authors discuss in particular the effective construction of "factual" questions for such surveys as case-control studies. The start point must be a precise definition of the fact/information to be collected. It has often been shown that apparently small changes in definition can have large effects on survey results. Of concern is that a precise definition may lead to an unwieldy question, which the respondent cannot or will not make the effort to absorb. A respondent needs to understand both what is being asked of him, and what is an appropriate response.
Such problems as telescoping and social desirability effects have already been canvassed. The authors also consider that the random response technique can protect a respondent's privacy, particularly when threatening or embarrassing questions are asked. The respondent chooses which of two or more questions he answers by a random device; he answers the chosen question, without the interviewer being aware which is being answered (Kalton and Schuman, 1982). Several studies have obtained higher rates of reports of sensitive information from random response techniques than from traditional questioning. However, any gain in bias reduction has to be set against a sizable increase in sampling error; the technique also hampers analyses of the relationships between the responses to the threatening question and other variables.

The authors consider that the various approaches of Cannell (1977) to the problems of memory errors and of sensitive questions have resulted in improved reporting. Although longer questions may sometimes yield fuller answers, they can be a cumbersome tool. Experiments on a carefully thought out mix of long and short questions show an increased yield of reports on health events. By essentially stating important questions twice, the questionnaire improves the respondent's understanding of what is required by giving more time to marshal one's thoughts and recall; as well, a respondent may interpret the length of the question as a sign of its importance and give it greater consideration. Another technique involves the use of instructions to the respondent at the beginning of her task, to think carefully, search her memory, take her time to check records, and answer as completely as possible. Researchers may also use feedback, and deliberately secure the respondent's commitment to respond conscientiously.
Evidence of experiments on the utility of these techniques suggests that each leads to improved reporting, with a combination of all three techniques giving the best results of less under- and overreporting (Kalton and Schuman, 1982).

For Feinstein (1985b) the format of the health interview is the most important scientific instrument of many case-control studies. In a well-conducted study, the investigator may take additional pains to check the consistency and accuracy of the interview process. This may involve data collection of the same information by the same method several months later; or it may require checking data with a family member who knows the subject, as well as any archival material (Feinstein, 1985b). He is also concerned that if the exposure is not well specified during data collection, inaccuracy may arise. For instance, the subject may have used certain pharmaceutical substances, such as aspirin or food additives, without being aware. Unless the researcher has established a complete list of all the ways that exposure might have occurred, and unless the subject is asked about all those possibilities, the occurrence of an exposure might not be recognized. As well, if these inquiries are not then applied equally for all subjects, whether case or control, the results may be biased.

In conclusion, despite four decades of academic discussion about the nature, prevalence, characteristics, causes and indeed the very existence of recall bias, many conclusions still seem tentative. Perhaps this is understandable, as the basic problem is rooted in the nature of human memory and in the social motivations of humans, often when subjected to the additional stress of involvement in a disease process. Nevertheless, the literature discussed above does, at the least, provide both cognitive and sociological schemata within which to place the anecdotal findings of recall bias.
It would appear that the most important lessons for an epidemiologist or designer of health research studies centre on the conclusion that both respondent and task variables are likely the largest source of response bias effects. There certainly are enough perceptive insights about the demonstrated shortcomings of case-control research to apply to future studies in an effort to eliminate or at least minimize recall bias effects, if they exist.

2.6 Review of the Literature: Studies of Recall Accuracy and Recall Bias (Differential Exposure Misclassification)

In the previous section of this chapter, the cognitive, psychological and social sciences literature was canvassed, and the factors responsible for recall and recall accuracy were reviewed, along with a discussion of their possible influence on the reliability and validity of the data collected. The field of cognitive psychology provides the most thorough understanding of the processes of human memory, how memory errors occur, and the means to control them through research design. Part two of this literature review examines epidemiological and other health-related studies which specifically assess the reliability of the exposure data, and then determines the nature and impact of any resulting exposure misclassification (recall bias) on the estimates of effect. Harlow and Linet (1989) and Austin et al. (1994, pp.65-75) provide a fairly complete listing of the studies which have examined recall accuracy and recall bias, and have provided useful guidance in the compilation of this literature review in a table format. The evidence for the existence of differential exposure misclassification (recall bias), and the effects of misclassification bias on relative risk estimates in completed case-control studies, are reviewed and summarized in Table 2 (pp.102-145) of this chapter.
Overall, none of these studies provides strong and consistent evidence for the existence of appreciable recall bias, or of significant distortion of the relative risk estimates. Methodologists such as Coughlin (1990) recommend further studies of exposure misclassification in different research domains with respect to the effects of different exposures, length of recall, and other factors which may account for differential recall. In addition, Austin et al. (1994) concluded that future case-control studies must be evaluated for their ability to detect subtle and weak associations, and when possible, their methodology improved to be "more sensitive and specific to weak and moderate associations" (p.74). These authors also note the importance of considering non-differential exposure misclassification when they identify it as one of three "biggest threats to the validity in case-control studies"; one of the others is recall bias resulting from differential recollection of past events for cases and controls (p.75). Thus, there is a need to study both non-differential and differential exposure misclassification in the context of case-control studies. Following on this, the next logical step for investigation is the feasibility of developing a validity scale for the measurement and control of non-differential and differential exposure misclassification. This dissertation is designed to address these areas of concern.

2.7 Raphael's Proposal for the Measurement and Control of Recall Bias: The Development and Implementation of an Exposure Data Validity Scale

During the review of the literature, it became apparent that only a limited number of studies had directly addressed the problem of exposure misclassification and recall bias in case-control studies, and that the empirical evidence for the existence of recall bias was inconclusive (Lippman and Mackenzie, 1985; Mackenzie, 1986; Mackenzie and Lippman, 1989; Friedenreich, 1990).
Furthermore, the findings of these studies did not provide strong evidence that differential exposure misclassification (recall bias) was as serious a case-control deficiency as it was conjectured to be. There were no significant group differences in the reporting of past exposures or in the biasing of the estimates of effect. Those studies which provided findings in support of recall bias were themselves often subject to methodological problems (e.g., insufficient study power, failure to assess the validity of both false positive and false negative reports of exposure for both the cases and controls, lack of (suitable) controls for case-control comparisons, the use of different data sources for both the collection and the comparison of prospective and retrospective exposure reports, the salience of the outcome event, the length of recall, etc.) and their evidence had to be called into question.

Nevertheless, the opponents of case-control studies have persisted in their strong criticisms, and in their challenges to the scientific structure and the credibility of retrospective, observational research to provide unbiased estimates of association and to generate valid study conclusions. At the same time, the case-control paradigm is acknowledged by these same critics as the design of choice for the study of rare and chronic diseases such as cancer, where the latency period between exposure and disease occurrence is long, and logistical and ethical reasons preclude the implementation of randomized clinical trials or cohort studies. At first, these contradictions and ambiguities seemed irreconcilable. However, Raphael (1987) provided the insight and the methodological guidance to resolve these issues. She proposed the development of a validity scale for the measurement and control of recall bias.
If successful, this scale would improve case-control methodology overall, while increasing its acceptance as a valid research paradigm for the etiologic investigation of the determinants of health and disease. The development, implementation and evaluation of an 'exposure data validity scale' became the very motivation for, and a primary focus of, this dissertation. If a 'validity scale' could be easily developed, and shown to be effective as a design standard for the estimation and statistical adjustment of exposure misclassification, then it could be used in every case-control study. Its routine inclusion would meet the requirements of the investigator to provide evidence regarding the reliability and validity of the exposure data collected in a case-control study, the existence, magnitude and direction of any existing exposure misclassification, and if significant, the means to statistically adjust the relative risks for any distortions (Raphael, 1987, p.168). Consequently, researchers and critics would be more confident and accepting of the ability of case-control studies to generate valid study conclusions.

Raphael's (1987) proposal was "adapted from the logic of the validity scales of the Minnesota Multiphasic Personality Inventory (MMPI) which attempted to adjust some of the other scales (in the inventory), based on a measure of each respondent's test-taking attitude or response set" (1987, p.168). According to Raphael (1987), the exposure data 'validity scale' should consist of 'plausible but fake' risk factors for the disease under study. The exposure variables that are included in the scale must also meet the following criteria: 1) the exposures and events "should be of approximately equal plausibility when compared to the exposures which are the putative risk factors of the research study.
Unless they are equally plausible, the validity scale will not appropriately measure 'search for cause' cognitive processes" (p.169); and, 2) the exposures and events cannot be related to the development of the study disease. Once the scale items are selected, the respondents would then be questioned regarding their previous exposure to each item on the validity scale. Subsequently, their responses would be used to estimate the presence of exposure misclassification, particularly differential recall (i.e., the tendency of cases and controls to either over- or underreport antecedent events and exposures). If a specific exposure variable is not a risk factor for breast cancer, then the proportion of cases and controls exposed to this 'plausible but fake' risk factor should be approximately the same, and the resulting estimate of association (odds ratio) should be equal to 1.00, thus indicating no risk for disease development. According to Raphael (1987), differential exposure misclassification (recall bias) would be suggested, for example, when "case respondents positively endorse an excessively large number of validity scale items in comparison to control respondents" (p.169). She goes on to argue that "...the endorsement would likely be due to overreporting recall bias rather than actual higher rates of exposure" (p.169). Here, the estimated odds ratios for these variables would be significantly different from 1.00. Raphael (1987) suggests that "by comparing the total validity scores for cases versus controls", the researcher will be able to determine if recall bias exists, as well as its impact on the measures of association (odds ratio estimates) (p.169) [my emphasis].
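The logic of checking a single validity scale item against the null value of 1.00 can be illustrated with a minimal sketch. It is not Raphael's procedure or the analysis used in this study: the counts are hypothetical, the function names are invented for illustration, and the Woolf (log-based) 95% confidence interval is an assumed choice for judging whether an odds ratio differs from 1.00.

```python
import math


def odds_ratio_ci(exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls, z=1.96):
    """Odds ratio and Woolf (log-based) confidence interval for a 2x2 table.

    The Woolf interval is an assumption of this sketch, not a method
    taken from the thesis.
    """
    a, b, c, d = (exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls)
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    or_hat = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_hat) - z * se_log)
    upper = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, lower, upper


def excludes_null(lower, upper):
    """True when the interval excludes 1.00, which for a 'plausible but
    fake' item would point to differential reporting rather than risk."""
    return lower > 1.0 or upper < 1.0


# Hypothetical counts for two 'plausible but fake' items:
# (exposed cases, unexposed cases, exposed controls, unexposed controls)
items = {
    "fake item A": (30, 70, 30, 70),  # equal reporting by cases and controls
    "fake item B": (60, 40, 30, 70),  # cases endorse the item more often
}

for name, counts in items.items():
    or_hat, lower, upper = odds_ratio_ci(*counts)
    flag = excludes_null(lower, upper)
    print(f"{name}: OR={or_hat:.2f} (95% CI {lower:.2f}-{upper:.2f}), "
          f"differs from 1.00: {flag}")
```

With these invented counts, item A yields an odds ratio of 1.00 and item B an odds ratio of 3.50 whose interval excludes 1.00, the pattern that would be read as overreporting by cases.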
Because "the validity scale score is a function of the extent of each respondent's recall bias", i.e., the subject's tendency to over/underreport previous exposures and events, the summary within groups validity scale score "may be entered into the final analysis as a statistical control for recall bias" (p.169). In summary, Raphael's validity scale proposal was intuitively appealing because it offered case-control researchers the opportunity to assess and to control for the effects of differential exposure misclassification (recall bias) in any case-control study: the scale construction appeared to be straightforward. Section 3.11 of Chapter 3 describes the stages in the construction of the exposure data validity scale, as well as the statistical program used to assess the etiologic importance and specific weights for the exposure factors selected for inclusion in the scale.

2.8 Methodological Considerations for the Design of an Exposure Data Reliability and Validity Study to Assess Exposure Misclassification

Mackenzie (1986) provided design clarification as to how the question of exposure misclassification (recall bias) could best be studied. She noted that the problem of reporting (recall) bias should be examined by collecting exposure information prospectively "when subjects are at risk of becoming cases", and then collecting the same information, by the same data collection method, retrospectively, once the subjects are cognizant of their disease status (Mackenzie, 1986, p.35). By using this design, the researcher is able to assess the impact of group membership on recall, and specifically, whether cases and controls recall their past exposures similarly or differently.
If group differences exist in the prospective and retrospective reports of exposure, the researcher can conclude with increased certainty that any existing exposure misclassification is due in fact to systematic case-control differences in recall accuracy, and not to differences in the way the data were collected. Other aspects of study design are discussed in Section 3.1.

2.9 Summary

In this chapter, I have attempted to provide a review of the methodological limitations of case-control research, and its susceptibility to biases which could invalidate study conclusions. In addition, an extensive overview of the various subject response, task and interviewer variables which could affect respondents and their ability to recall past events and exposures both accurately and reliably was provided. This background information was included because of its importance for a complete understanding of 'how' and 'why' exposure misclassification occurs, 'why' the subjects in a case-control study may be predisposed to remember and report personal information differently, as well as the way these factors may contribute to exposure misclassification (i.e., the overreporting/overstating and/or the underreporting/understating of exposure), and their impact on the biasing of the exposure-disease odds ratios (i.e., towards or away from the null value).

This chapter has also provided the background justification for and the significance of this study. As discussed in Chapter 2, very little research has been completed on the reliability and validity of exposure data, including non-differential and differential exposure misclassification. The research which has been done has not provided strong and consistent evidence that exposure misclassification is as significant a problem in case-control research as it is suspected to be.
Furthermore, studies have not demonstrated that the exposure-disease odds ratios have been biased by either non-differential or differential exposure misclassification so that study conclusions were invalidated. Given the susceptibility of case-control studies to inaccurate recall of past exposures and events, the possibility that odds ratios may have been biased, and the relative lack of empirical research in this area, a research need was clearly identified in the area of case-control methodology. Therefore, this study was designed to determine the suitability of case-control research for studying disease etiology. Of particular concern was the requirement for an assessment of the reliability and validity of exposure data, the determination of the presence or absence of non-differential and differential exposure misclassification, and an evaluation of the impact of any resulting exposure misclassification on the estimates of effect. Raphael's proposal (1987) for the development of a validity scale to measure and control recall bias was also investigated within the context of this dissertation.

In the next chapter the specific research design and methods that were used in this study to address these questions will be outlined. As noted in Section 1.3, the discussion of the specific research methods will include such topics as: the choice of a nested case-control study design, the recruitment and selection of cases and controls, the use of multiple control groups and anamnestic controls, as well as the specific procedures used to collect and analyze the data. Chapter 3 also describes the specific steps in the development and construction of an exposure data validity scale which will be evaluated as a possible design strategy for the measurement and control of differential exposure misclassification (i.e., recall bias).
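The distinction drawn in this chapter between non-differential and differential exposure misclassification, and the direction in which each can bias the odds ratio, can be made concrete with a short numerical sketch. All of the values below (true exposure prevalences, sensitivities and specificities) are hypothetical and chosen only for illustration, and the function names are invented here. The calculation uses the standard relation that the observed proportion exposed equals sensitivity times the true proportion plus (1 - specificity) times the true unexposed proportion.

```python
def observed_prevalence(true_prev, sensitivity, specificity):
    """Expected proportion classified as exposed, given imperfect reporting."""
    return sensitivity * true_prev + (1.0 - specificity) * (1.0 - true_prev)


def odds_ratio(prev_cases, prev_controls):
    """Exposure odds ratio from the exposed proportions in cases and controls."""
    odds_cases = prev_cases / (1.0 - prev_cases)
    odds_controls = prev_controls / (1.0 - prev_controls)
    return odds_cases / odds_controls


# Hypothetical true exposure prevalences (true odds ratio = 2.0).
TRUE_CASES, TRUE_CONTROLS = 0.40, 0.25
true_or = odds_ratio(TRUE_CASES, TRUE_CONTROLS)

# Non-differential: cases and controls report with the same accuracy.
nondiff_or = odds_ratio(
    observed_prevalence(TRUE_CASES, sensitivity=0.80, specificity=0.90),
    observed_prevalence(TRUE_CONTROLS, sensitivity=0.80, specificity=0.90),
)

# Differential (recall bias): cases recall true exposures more completely
# but also give more false positive reports than controls.
diff_or = odds_ratio(
    observed_prevalence(TRUE_CASES, sensitivity=0.95, specificity=0.80),
    observed_prevalence(TRUE_CONTROLS, sensitivity=0.80, specificity=0.90),
)

print(f"true OR            = {true_or:.2f}")
print(f"non-differential   = {nondiff_or:.2f}")
print(f"differential       = {diff_or:.2f}")
```

Under these assumed values the non-differential scenario pulls the odds ratio from 2.00 towards the null (about 1.62), while the differential scenario, in which cases overreport, pushes it away from the null (about 2.64). Other choices of sensitivity and specificity can bias a differential scenario in either direction.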
Table 2: Findings of Selected Studies of Recall Accuracy and Recall Bias (Differential Exposure Misclassification)

Columns: References; Study Design; Consideration of Exposure Misclassification/Recall Bias; Evidence of Exposure Misclassification/Recall Bias; Exposures/Conditions; Results

1. Klemetti and Saxen, 1967
Study design: A case-control study of the association of non-chronic maternal disease and drug usage in early pregnancy with a deviant pregnancy outcome (i.e., neonatal death, abortion, stillbirth and congenital malformations). Prospective and retrospective exposure reports obtained by personal interview were compared with information recorded in the clinical records. The prospective data regarding antenatal drug usage and non-chronic maternal disease were collected during the fifth month of pregnancy.
Consideration of exposure misclassification/recall bias: Yes
Evidence of exposure misclassification/recall bias: No
Exposures/conditions: Antenatal drug usage; non-chronic disease
Results:
1. It was concluded that both the prospective and retrospective exposure reports were unreliable. Overall, there were no significant group differences regarding recall accuracy.
2. Only 25% of the prospectively collected exposure information (i.e., drug usage and non-chronic diseases) was recalled and reported accurately in the retrospective postnatal interview (Klemetti and Saxen, 1967, p. 2075).
3. Sixty six percent of the retrospective positive exposure reports "could not be confirmed from the prospective interview or information collected from other sources" (Klemetti and Saxen, 1967, p. 2075).

Note: The section on 'Evidence of Exposure Misclassification/Recall Bias' refers to the researcher's interpretation of the study results, and whether or not the researcher's assessment of the data provided evidence that recall bias was present.
1. Klemetti and Saxen, 1967 (continued)
Study design (continued): The results of the prospective and retrospective interviews were compared to determine if recall bias was present. N = 406 (203 case mothers and 203 controls, matched to the cases for time of birth and clinic).
Results (continued):
4. Klemetti and Saxen (1967) noted that "new" and incorrect exposure information was provided retrospectively by the mothers (p. 2074).
5. There were no significant case-control differences in the percentage of identical replies. The pregnancy outcome (deviant vs normal) and the condition of the child did not affect recall accuracy (Klemetti and Saxen, 1967, p. 2074).
6. There was no empirical evidence of recall bias.
Remarks: The results of this study are inconclusive and cannot support the finding of no evidence of recall bias. Mackenzie (1986) noted several methodological limitations which prevented the proper assessment of these prospective and retrospective exposure reports for recall bias. These included: (1) Only the positive (+) exposure reports were considered. Properly conducted recall bias studies must evaluate the prevalence of both false (+) and false (-) reports of exposure by the cases and the controls, and then determine if groups differ systematically regarding overall level of recall accuracy. (2) Case-control distinction was not maintained in this study. "The units studied changed from the women (i.e., the cases and controls) who provided the reports, to the reports themselves" (Mackenzie 1986, p. 27).
Here, the problem was the absence of case-control comparisons to determine if the prevalence of discrepant reports was sufficient to bias the estimates of association (exposure-disease odds ratios) for the various study factors. (3) Klemetti and Saxen (1967) also stated that approximately "two-thirds of the positive replies in the retrospective study could not be confirmed from the prospective interview or information collected from other sources" (p. 2075). Here, it must be emphasized that data reliability studies as well as studies of exposure misclassification (recall bias) must use the same source of data for the prospective and retrospective comparisons. Health and pharmacy records are not 'gold standards'; their accuracy and completeness may vary for the cases and the controls. Health records will only contain what the subject reports, or what is observed and reported by the health care provider. As such, case-control discrepancies could be related to the method used for data collection rather than real differences in recall accuracy (Mackenzie, 1986; Lippman and Mackenzie, 1985; Mackenzie and Lippman, 1989).

2. Hewitt et al., 1966
Study design: A case-control study of the relationship between antenatal x-rays (i.e., abdominal and chest), toxemia and anemia and the subsequent development of childhood cancers. Personal interview data were compared with antenatal records.
Consideration of exposure misclassification/recall bias: Yes
Evidence of exposure misclassification/recall bias: No
Exposures/conditions: Antenatal x-rays, toxemia and anemia
Results: The authors concluded that there was no general tendency for mothers of live children to report fewer prenatal events when compared with mothers of dead children. These conclusions were based on a sensitivity and specificity analysis of "checked statements" (pp. 82-83).
Remarks: The results of this study are inconclusive due to methodological problems. The use of antenatal records and the radiologists' reports weakened this study and its conclusions. The authors noted that the antenatal records did not contain any information about events which happened after admission to hospital, or before a woman sought antenatal care. X-rays during labour or during the early weeks of pregnancy for non-obstetric reasons were missing. In fact, antenatal records were missing for 43% of the sample and were incomplete for the remaining 57%. When these records were used as the standards for comparing the maternal exposure reports, the relative risk estimates would have been upwardly biased for abdominal x-rays and toxemia, and downwardly biased for anemia and chest x-rays. False conclusions would have been based on discrepancies related to inadequate data collection procedures, and specifically, missing documentation. The study emphasizes the requirement to use the same data source for exposure data reliability studies, and for those studies designed to assess exposure misclassification/recall bias.

3. Hopwood and Guidotti, 1988
Study design: A case series study. The authors assessed the recall of symptoms in 22 of 31 subjects. These workers were exposed to nitric acid fumes from drums ruptured during a hazardous waste site clean-up operation in 1983. Symptoms recalled at 6 months were compared to symptoms reported at the time of the incident.
Consideration of Exposure Misclassification/Recall Bias: Yes
Evidence of Exposure Misclassification/Recall Bias: Yes
Exposures/Conditions: Symptoms related to nitric acid exposure: dizziness, headaches, respiratory problems (shortness of breath, sore throat, cough, sputum production), lightheadedness, unusual taste, eye discomfort, fatigue, nausea, pruritus, abdominal discomfort, paresthesia, and anxiety.
Results: The authors observed substantial disagreement which exceeded that expected on the basis of chance alone. This discordance was consistent, and in the direction of more prevalent reporting of symptoms with the passage of time. They concluded that a high level of recall bias was present. Six months post-outcome, the authors noted that symptoms were more likely to be recalled and reported retrospectively than forgotten. False positive reports were more prevalent than false negative reports.
Remarks: This was a small case series: only 71% of the subjects who were originally exposed were found and re-interviewed at 6 months. This study was unable to assess for the presence of recall bias because no control subjects were included for the required case-control comparison. Recall bias is defined as differential reporting of exposure status by cases and controls (i.e., a phenomenon of differential reporting accuracy which is dependent on group membership). As such, this study was an exposure data reliability study (i.e., a determination of the consistency of reports given by the 22 cases at the time of the incident versus 6 months later). The authors should have concluded that recalled symptoms at 6 months were unreliable and lacked precision.

4. Tilley et al., 1985
Study Design: A case-control study of the effects of diethylstilbestrol (DES) exposure during fetal life. The authors compared prenatal records with obstetric histories. These histories were collected by means of a self-administered questionnaire which was completed by the women 10 to 30 years after the birth of their daughters. N=3650 (3078 case mothers and 572 control mothers). The case mothers also included DES-exposed women who were walk-ins or referrals to the project centres.
Consideration of Exposure Misclassification/Recall Bias: Yes. The authors considered recall accuracy, which was defined as the level of agreement between prenatal records and the reports provided by a self-administered questionnaire.
Evidence of Exposure Misclassification/Recall Bias: No
Exposures/Conditions: Drug use during pregnancy, pregnancy history, parity, miscarriages, threatened abortion, hospitalization during pregnancy, trunk x-ray and birth weight.
Results: With the exception of data on treatment (i.e., hospitalization during pregnancy and trunk x-ray) and drug use, there were no statistically significant differences in agreement (obstetric history vs antenatal record) between the group of DES-exposed mothers identified through review of their prenatal records and the unexposed mothers. Agreement was better for DES-exposed mothers regarding treatment and drug use. Recall accuracy (i.e., the level of agreement between the prenatal record and obstetric history) was slightly better for the walk-ins/referrals when compared with the two groups identified by the review of prenatal records. According to the results of the Kappa analysis, this study found good to excellent agreement for all groups when the mother's recall of her reproductive history was compared with medical records.
The agreement was poor for the following variables: medical treatment, x-rays and drug usage during pregnancy. 37% of the DES-exposed mothers either could not remember (29%) or denied (8%) using DES although it was recorded in their antenatal record (p. 269). The accuracy of recall is dependent on the type of exposure to be reported, as well as the level of detail that is requested. Clinical records were more complete when compared with physicians' office charts.
Remarks: (1) The study population was homogeneous: predominantly Caucasian and middle class. Therefore, the results of this study cannot be generalized to other populations. (2) Sample size was insufficient to test the study's underlying hypotheses. (3) The impact of case-control differences in recall accuracy was not evaluated by means of odds ratio comparisons.

5. Preston-Martin et al., 1985
Study Design: A case-control study of the association of dental radiation and the occurrence of parotid gland tumors. Telephone interview information was compared to dental records. N=163 (84 cases and 79 controls). Length of recall: up to 30+ years.
Consideration of Exposure Misclassification/Recall Bias: Yes
Evidence of Exposure Misclassification/Recall Bias: No
Exposures/Conditions: Dental radiation
Results: The authors conclude from the comparisons of chart and interview information that exposure recall appears to be unbiased. The measures of agreement between the two data sources were similar for cases and controls.

6. Mackenzie and Lippman, 1989
Study Design: A nested case-control study of the association of 39 potential risk factors and possible adverse reproductive outcomes. N=747 (85 case mothers (whose infant died, had malformations, or was admitted to the intensive care nursery for longer than 24 hours for serious complications); 217 mothers (intermediate group) with infants of intermediate health status; and 445 controls (normal healthy infants)). Pregnant women provided reports of exposure prospectively and retrospectively for the 39 study factors by means of a self-administered questionnaire. Prenatal and postnatal responses were compared (reliability study) for the three pregnancy outcome groups. Changes in the odds ratio estimates were also evaluated: case vs normal control; intermediate case group vs control.
Consideration of Exposure Misclassification/Recall Bias: Yes
Evidence of Exposure Misclassification/Recall Bias: No
Exposures/Conditions: 39 potential risk factors for adverse reproductive outcomes: chronic illness; stress; coffee, wine, liquor consumption; smoking; poor nutrition; nausea; medications; contraception; reproductive history; acute illness; family history of malformations, etc.
Results: The data from this study did not provide any evidence for the existence of recall bias. The authors found that:
1. Inconsistency in the reporting of the study variables was evident; however, these discrepancies were similar for the study groups. The retrospective reports were subject to more post-delivery deletion of exposure information than post-delivery addition.
2. There were no statistically significant differences in the frequencies or prevalence changes for the 39 exposure variables for the 3 study groups. In other words, there were no significant case-control differences in the groups' tendency to add or delete exposure information postnatally.
3. The changes in exposure reporting were not related to group status (i.e., pregnancy outcome, maternal concern about the baby, or maternal socio-demographic characteristics) (Mackenzie and Lippman, 1989, p. 65).
4. A comparison of the odds ratios from the prospective and retrospective data did not show a tendency to increase or decrease the estimates of association between the risk factors and pregnancy outcome.
5. The estimates of association were not biased by the resulting changes in exposure reports.
Remarks: (1) Sample size was insufficient: the study lacked the statistical power to test the research hypothesis and to demonstrate significantly biased reporting. (2) Mackenzie and Lippman (1989) noted that if one assumes that biased reporting is dependent on the salience and emotional impact of the outcome event (i.e., severe infant malformations which would stimulate biased reporting), then the failure of this study to provide evidence of recall bias may be a function of the small case population and the low incidence of very sick/malformed infants (8.2%) (p. 74). Cases were originally defined as mothers experiencing stillbirth or having a child with severe medical complications or malformations. However, the majority of the cases were not consistent with the inclusion criteria. Consequently, there was a loss of statistical power due to insufficient sample size for case group definition.
(3) The cases may not have been different from the controls because the case mothers' infants required only transitional NICU care. This homogeneity may account for the similarity of reporting among the cases and controls. The number of cases needed to be increased, and the case group needed to comprise only stillbirths, abortions, and severely ill or malformed infants so that recall bias could be studied. (4) The length of recall was of shorter duration than that usually encountered in case-control studies of chronic and rare disease (a few months vs many years). (5) The study subjects were unrepresentative of the general population. Less educated women and immigrants were underrepresented or excluded because they were unable to complete the study questionnaire. The study population was predominantly Canadian born, highly educated and sought obstetrician-based prenatal care. This was significant because lower SES women and less-educated women were at an increased risk for deviant pregnancy outcome (Mackenzie, 1986). It was concluded that the lack of evidence to support the existence of recall bias "does not prove that the bias does not, or cannot, exist" (Mackenzie and Lippman, p. 74).

7. Werler et al., 1989
Study Design: A case-control study of malformations. Interview data collected during the postpartum period were compared to exposure information collected during pregnancy and then recorded in the mother's obstetric record. N=270 (105 cases (mothers of malformed infants) and 165 controls (mothers of non-malformed infants)). The medical record information was considered the 'truth' (or gold standard) for the determination of case-control differences in the reporting of the eight exposure variables. The researchers assessed the proportion of case mothers who gave positive reports given that the exposure was recorded in the chart. The researchers assessed for the presence of recall bias by estimating relative sensitivity (RS) (i.e., the ratio of reporting accuracy for mothers of malformed infants to that of mothers of normal, healthy infants). If the RS measure > 1.0, recall accuracy is better for the case mothers (Werler et al., p. 415).
Consideration of Exposure Misclassification/Recall Bias: Yes
Evidence of Exposure Misclassification/Recall Bias: Yes (for some of the factors and not for others)
Exposures/Conditions: Medications taken and illnesses during pregnancy.
Results: Compared with the controls, the cases recalled a greater proportion of documented exposure for two of the eight exposures: periconceptual birth control and urinary tract or yeast infection. For birth control after conception, case reports were 8 times more complete. The proportion of agreement was equal in the two groups for over-the-counter drug usage and elective abortion, and lower for cases for nausea and vomiting. The authors concluded that recall accuracy was better for the cases, thereby suggesting the presence of recall bias.
Remarks: The results of this study have been criticized by Swan and Shaw (1990) and Berg (1990) for the following deficiencies: (1) The impact of case-control differences in exposure recall was not evaluated by comparing the odds ratios for the two data sources (i.e., medical records and personal interview data). (2) The study had a high rate of non-participation for both the cases and the controls. Therefore, sample distortion bias may be responsible for case-control differences in recall accuracy.
(3) The study failed to consider potential overreporting by the cases (a function of specificity), as well as underreporting by the controls. (4) Critics were opposed to the suggestion by Werler and coworkers to use malformed controls due to the possibility of selection bias. Swan and Shaw (1990) noted that etiologic agents could cause increased risk for different malformations, and that the exposure-disease odds ratio would be biased towards the null. (5) The use of obstetric records as the standard 'criterion' against which the maternal reports were compared was considered inappropriate. This record can only be used to estimate the prevalence of false (-) reports; medical records report exposures only (if complete) and fail to document non-exposures. Two different data sources were used to measure recall accuracy. Therefore, the method of data collection could explain the resulting reporting discrepancies. The same source of data (maternal reports) must be used to study the nature and impact of recall bias (Mackenzie, 1986). (6) The findings of this study suggest only small differences in recall accuracy. Therefore, only weak evidence exists for maternal recall bias. The methodological limitations of this study would negate even this minimal source of evidence.

8. Stolley et al., 1978
Study Design: A case-control study of thromboembolic disease and oral contraceptive use. Interview data were compared with physicians' records. N=276 (79 cases and 197 controls (women with a history of oral contraceptive use within the previous two years)). Length of recall was up to 10 years.
Consideration of Exposure Misclassification/Recall Bias: Yes
Evidence of Exposure Misclassification/Recall Bias: Yes (for some of the exposure categories)
Exposures/Conditions: Oral contraceptives (OCs) (10 brands): total duration, brand name, start date, and stop date.
Results: Case-control differences existed regarding the subjects' recall of duration of use, and dates for the use of the drugs. For total duration of use of OCs, the cases showed a higher rate of agreement with prescriber records than the controls. Among the controls, a higher percentage reported a longer duration of use. Agreement rates between subject reports and physician records were poorer regarding the dates of usage. Cases tended to have a higher agreement rate with their prescriber on the starting date of use (60.6%) compared with the controls (48.1%). Recall accuracy depended on the types of information and level of detail requested.
Remarks: (1) Results are not generalizable to the general population because the study population was homogeneous regarding ethnicity: the cases and controls were primarily Caucasian (71%). (2) Sample size was too small to complete a full study of recall bias, and to test the relevant hypotheses (i.e., study power was insufficient). As well, only 52.4% of the study population had physician records which could be used for comparative purposes in this study.
(3) The impact of case-control differences in exposure recall was not evaluated (i.e., no odds ratio estimates were calculated for comparison).

9. Rosenberg et al., 1983
Study Design: A case-control study of the association of oral contraceptive use and the diagnostic outcome of hepatocellular carcinoma (HCA). Study population consisted of N=130 (61 cases and 69 controls). Only 43% of the original study was available for this follow-up study. Length of recall: 4-16 years previously. Interview reports were compared to questionnaire information provided by the physician (i.e., the prescriber of the oral contraceptives).
Consideration of Exposure Misclassification/Recall Bias: Yes
Evidence of Exposure Misclassification/Recall Bias: Yes
Exposures/Conditions: Oral contraceptives
Results: Overall, the agreement between the 2 data sources for: (1) month-specific duration of use; (2) duration of use and brand; and (3) duration, brand, and dose was 90%, 62% and 54% respectively. Agreement was significantly better for the cases than for the controls in all 3 areas (i.e., duration, dose and brand). When analyzing agreement for all 3 variables combined, the percent agreement for the cases versus the controls was 62% vs 47% respectively. "These differences in agreement did not change appreciably when adjusted for race, education, marital status, religion or age at index date" (p. 85).
Remarks: (1) The impact of case-control differences in recall accuracy was not assessed by odds ratio comparisons calculated separately for the two data sources. (2) The study sample was too small. Thus, there was a loss of statistical power for hypothesis testing. (3) Only 66% of the original HCA study group were included in this analysis. Women who could not remember their prescribing physician were excluded. It is assumed that these women may not recall contraceptive use as well. Their exclusion would incorrectly inflate the percentage overall agreement. (4) The sample was predominantly Caucasian and educated (high school or better), and therefore may have been unrepresentative of the general population. These subjects may have been more motivated to participate, and therefore, better prepared to remember and to recall what was being requested of them. Sample distortion (selection) bias may have been responsible for the high levels of agreement that were found in this study.

10. Roht et al., 1985
Study Design: A household health survey of residents living near 2 hazardous waste sites in Louisiana (1981-1982) compared with an unexposed community.
Consideration of Exposure Misclassification/Recall Bias: Yes
Evidence of Exposure Misclassification/Recall Bias: Yes
Exposures/Conditions: Eye, respiratory, upper and lower gastrointestinal symptoms.
Results: Results of the health survey indicated that residents living in the exposed communities reported more symptoms than residents of the comparison community. There was a statistically significant main effect for the respondents' opinion about waste site effects on health and the reporting of associated symptoms regardless of the location of residence (pp. 426-427).
Among subjects who believed that waste disposal sites affect the environment, reports of chronic illness were 2 to 3 times more prevalent than among individuals who did not believe in this association (p. 428). Meteorological and hydrologic data demonstrated that residents near the waste sites were not directly exposed to the hazardous substances which were released from the sites.
Remarks: This study does not provide conclusive evidence about the existence of 'reporting bias'. There is no measure of reported symptoms and chronic illness for the comparison communities prior to the media coverage which focused the residents' attention on health problems and local environmental hazards.

11. Hertzman et al., 1987
Study Design: A prospective morbidity survey of workers and residents from the Upper Ottawa Landfill Site in Hamilton, Ontario. The objective of the study was to determine health effects associated with occupational exposure to a landfill site. Workers and unexposed controls completed a health questionnaire. To validate the cases' and controls' self-reported health problems, their medical records were reviewed for confirmatory documentation of reported exposures, health problems, and visits to the doctor.
Consideration of Exposure Misclassification/Recall Bias: Yes
Evidence of Exposure Misclassification/Recall Bias: No
Exposures/Conditions: Possible health problems related to landfill site exposure and visits to the doctor for any resulting health problems.
Results: There were no statistically significant differences in the distribution of confirmed, possibly confirmed and not confirmed events in either time period (i.e., pre-publicity vs post-publicity, when there was intense concern in the media regarding exposure and health problems related to the landfill site). For example, the percent of problems lacking chart confirmation was small and non-differential between the cases (7.9%) and the unexposed controls (7%) preceding publicity regarding the hazards of landfill sites. However, post-publicity, the proportion of unconfirmed events rose 9.9% in the exposed cases and 4.5% in the unexposed controls. There was no evidence of increased physician utilization by the exposed cases. None of the conditions of interest showed trends toward overreporting among the exposed cases. Overall overreporting rates were unbiased between the study groups. There was no evidence for recall bias.
Remarks: (1) The response rate for exposed workers was higher (84.5%) and significantly different from the response rate for controls (71.9%). Here, the possibility exists for selection bias. (2) The study did not consider underreporting of exposures, health problems and visits to the family physician. Studies of recall bias must assess the case-control differences in false (+) and false (-) reports of exposure, which are a function of sensitivity and specificity. This problem may have threatened the study's validity in view of the fact that 36.5% of the medical records documented a visit to the doctor which was not reported on the questionnaire.
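The requirement that studies of recall bias assess false (+) and false (-) reports separately for cases and controls can be illustrated with a short calculation. The following is a minimal sketch only, not a method taken from any of the studies reviewed: the function names, the 2x2 counts, and the sensitivity and specificity values are all hypothetical and chosen purely for illustration. It computes the exposure-disease odds ratio that would be expected after misclassification, first when reporting accuracy is the same in both groups (non-differential) and then when the cases report true exposures more completely than the controls (differential). The ratio of case to control sensitivity corresponds to the relative sensitivity measure described for Werler et al. (1989).

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls.
    """
    return (a * d) / (b * c)


def reported_counts(exposed, unexposed, sensitivity, specificity):
    """Expected reported-exposed and reported-unexposed counts for one group.

    Sensitivity governs false (-) reports among the truly exposed;
    specificity governs false (+) reports among the truly unexposed.
    """
    reported_exposed = sensitivity * exposed + (1 - specificity) * unexposed
    reported_unexposed = (1 - sensitivity) * exposed + specificity * unexposed
    return reported_exposed, reported_unexposed


def observed_odds_ratio(true_table, case_accuracy, control_accuracy):
    """Expected odds ratio after group-specific misclassification.

    case_accuracy and control_accuracy are (sensitivity, specificity) pairs.
    """
    a, b, c, d = true_table
    a_obs, b_obs = reported_counts(a, b, *case_accuracy)
    c_obs, d_obs = reported_counts(c, d, *control_accuracy)
    return odds_ratio(a_obs, b_obs, c_obs, d_obs)


# Hypothetical true exposure distribution (illustrative numbers only).
true_table = (40, 60, 20, 80)
true_or = odds_ratio(*true_table)

# Non-differential: identical sensitivity/specificity in cases and controls.
nondifferential_or = observed_odds_ratio(true_table, (0.80, 0.90), (0.80, 0.90))

# Differential: cases recall true exposures better than controls.
case_accuracy, control_accuracy = (0.95, 0.90), (0.70, 0.90)
differential_or = observed_odds_ratio(true_table, case_accuracy, control_accuracy)
relative_sensitivity = case_accuracy[0] / control_accuracy[0]

print(f"true OR             = {true_or:.2f}")
print(f"non-differential OR = {nondifferential_or:.2f}")
print(f"differential OR     = {differential_or:.2f}")
print(f"relative sensitivity = {relative_sensitivity:.2f}")
```

Under these assumed values, equal reporting accuracy in both groups pulls the estimate toward the null, whereas better recall among the cases pushes it above the true value. This is why agreement statistics alone are not sufficient, and why the odds ratios from the two data sources need to be compared directly.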
12. Jain et al., 1980
Study Design: A case-control study of the association of diet and bowel cancer. N=52 (26 cases of bowel cancer and 26 matched neighbourhood controls). Personal interview data were compared with information recorded in a dietary history questionnaire.
Consideration of Exposure Misclassification/Recall Bias: Yes
Evidence of Exposure Misclassification/Recall Bias: Yes (modest)
Exposures/Conditions: Mean daily intake of 13 nutrients
Results: Jain et al. (1980) concluded that the cases were more likely to decrease intake after diagnosis. Therefore, cases had a tendency post-diagnosis to underreport intake for the various food items/nutrients. The authors concluded that current diet affects reporting of past dietary patterns.
Remarks: (1) Case-control differences regarding participation rate (i.e., the possibility of selection bias) may have adversely affected the study results/conclusions. 80% of the eligible cases vs 52% of the eligible controls participated in the original study. A low participation rate is a potentially serious threat to the study's validity because the controls differ from the cases by virtue of the fact that they are disease free. The salience of the outcome event is not sufficient to stimulate their motivation to recall and to report past diet. (2) There was no evaluation of the effect of case-control differences on the estimates of association (odds ratios) between the nutrients and the outcome (i.e., bowel cancer).

13. Hislop et al., 1990
Study Design: A nested case-control study of the relationship of diet and breast cancer among a cohort of women from an earlier case-control study in 1980-1982. This study was designed to evaluate dietary recall and the presence of differential misclassification. N=463 (263 cases and 200 controls (i.e., neighbors/acquaintances of the case women)). Self-reported dietary information from a food frequency questionnaire completed in 1980-1982 was compared with data re-reported in 1986 by means of a self-administered food frequency questionnaire.
Consideration of Exposure Misclassification/Recall Bias: Yes
Evidence of Exposure Misclassification/Recall Bias: No
Exposures/Conditions: Dietary components reported in a food frequency questionnaire
Results: The authors found little difference in the responses for both cases and controls regarding dietary recall for the distant past. Systematic differences were noted for the recall of recent diet by the cases. Here it was suggested that recent dietary changes would be more frequent and likely to affect recall in the cases because they have had to alter their diets as a consequence of disease and its treatment.

14. Baumgarten et al., 1983
Study Design: A validation of self-reported work histories of cases and controls in a study of the relationship between various occupational variables and the diagnostic outcome of cancer. N=297 (274 cases and 23 controls).
Consideration of Exposure Misclassification/Recall Bias: Yes
Evidence of Exposure Misclassification/Recall Bias: No
Exposures/Conditions: Occupational factors
Results:
1. 82% of the subject reports agreed with the records.
2. The extent of agreement did not differ between the subgroups defined by age, education, and social class.
3. There was no evidence of differential reporting of occupational factors by the cases and the controls. The data provides no evidence to support the finding of no recall bias.
Remarks: (1) Sample size was insufficient.
There was inadequate study power to test the specific research hypothesis. (2) Work histories were validated for 274 cases but only 23 controls.

15. Weinstock et al., 1991
Study Design: In a case-control study nested in the Nurses' Health Study Cohort, Weinstock and coworkers assessed for the presence of recall bias in the reporting of two risk factors for melanoma (i.e., hair color and the ability to tan). Cases and controls provided risk factor data prospectively (before diagnosis) and retrospectively (after melanoma diagnosis) by means of a questionnaire. N=459 (143 cases and 316 age-matched controls) randomly sampled from the cohort. Response rate: 85% cases and 81% controls.
Consideration of Exposure Misclassification/Recall Bias: Yes
Evidence of Exposure Misclassification/Recall Bias: Yes
Exposures/Conditions: Hair colour and ability to tan
Results: The authors concluded that recall bias was observed among female nurses with cutaneous melanoma regarding their assessment of tanning ability. Cases differentially reported a reduced ability to tan when questioned after the diagnosis of melanoma.
Prospective OR = 0.7 (95% CI 0.3-1.5)
Retrospective OR = 1.6 (95% CI 0.8-3.5)
Remarks: (1) The format of the questionnaire (i.e., non-identical wording of the questions) may have been responsible for the biased reporting of ability to tan. (2) The participants were highly motivated as evidenced by their responding to multiple questionnaires.
The findings are probably not generalizable to the general population: the participants may be more or less susceptible to recall bias. (3) Melanoma was diagnosed before the return of the first questionnaire in 104 of the cases. According to the authors, the diagnosis of melanoma may have affected the baseline exposure history.

16. Lindsted and Kuzma, 1989
Study Design: A case-control study nested within the Adventist Mortality Study in California. This study examined the relationship between diet and cancer. In addition, the authors examined the recall reliability over 24 years and the differences in recall between the cases and the controls. Subjects who completed a 21-item food frequency questionnaire in 1960 were asked to recall this diet in 1984 using a subset of the original questionnaire. N=216 (117 incident cases and 99 controls).
Consideration of Exposure Misclassification/Recall Bias: Yes
Evidence of Exposure Misclassification/Recall Bias: No
Exposures/Conditions: Usual frequency of consumption of 21 foods.
Results:
1. Recall scores were similar for both the cases and the controls. The mean and median food frequencies did not show systematic group differences in recall ability after the researchers controlled for factors that were possibly related to recall ability (e.g., age, education, sex).
2. Twenty-four year recall ability was dependent on two factors: vegetarian status and the stability of one's diet. The authors postulated that vegetarians had better recall because they were more health conscious, ate fewer of the foods listed on the diet questionnaire, and were more aware of their own dietary intake (pp. 145-146).
3. Therefore, it was concluded that the data did not provide evidence for the existence of recall bias.
17. Lindsted and Kuzma, 1989
Study design: This nested case-control study was also a part of the Adventist Mortality Study. The same sample of cases and controls was used to examine the determinants of long-term diet recall with respect to the following variables: vegetarian status, diet stability and selected demographic characteristics. N=216 (101 vegetarians and 115 non-vegetarians). Length of recall: 8 years (short-term) vs 12 years (long-term).
Consideration of exposure misclassification/recall bias: Yes
Evidence of exposure misclassification/recall bias: No
Exposures/conditions: Mean frequency per week of 35 foods restricted to vegetarians.
Results: The authors investigated the determinants of long-term recall and observed the following relationships: (1) Better recall was noted for vegetarians who had stable diets, were educated, went to church and did not watch television regularly. (2) For length of recall (8 years vs 24 years), diet stability, vegetarian status and education were related to recall accuracy.

18. Fenster et al., 1991
Study design: The reliability of exposure data was examined in a case-control study of spontaneous abortion in Santa Clara County, California. Because of the concern about differential reporting of water consumption in regions with publicized water contamination, detailed information during pregnancy was collected and analyzed. Exposure data were collected prospectively and retrospectively by means of a telephone interview and then compared. N=300 (100 cases and 200 controls).
Consideration of exposure misclassification/recall bias: Yes
Evidence of exposure misclassification/recall bias: No
Exposures/conditions: Factors possibly related to pregnancy outcome: caffeine, tap and bottled water consumption, cigarette smoking, employment, pregnancy history, occupational exposures, and exposure to video display terminals.
Results: The authors noted case-control differences in the prevalence of exposures reported on the two occasions. "However, the degree of differential reporting was not sufficient to appreciably alter the measures of association between water consumption during pregnancy and spontaneous abortion" (p. 477).
Remarks: (1) Interviewers were not blind to case-control status. Case-control differences may be related to interview bias rather than differences in recall accuracy. (2) Cases were questioned about events that occurred during the entire pregnancy. Controls were questioned on the first 20 weeks of gestation. (3) Small sample size and insufficient study power to study recall bias properly.

19. Drews et al., 1990
Study design: A case-control study of sudden infant death syndrome (SIDS). N=452 (226 cases and 226 controls). This study examined case-control differences in the accuracy of maternal recall, and evaluated the impact of maternal reporting errors on the observed measures of association (odds ratios). Personal interview information was compared with medical record data.
Consideration of exposure misclassification/recall bias: Yes
Evidence of exposure misclassification/recall bias: No
Exposures/conditions: Events which had occurred during the mother's pregnancy, labour and delivery, as well as events or sickness happening to their infants within 5 weeks of the death of the matched case infant.
Results: The authors concluded overall that case-control differences in recall accuracy were not significant enough to create "spurious associations with SIDS, or to bias most associations away from the null hypothesis". There were large case-control differences in the estimated sensitivity of recall. However, overall, cases did not report events more completely than controls. Controls were more likely to report events documented in their records. Specificity of recall was at least 10% higher for the controls. These results would seem to indicate that enhanced recall among cases is not universal across factors. They were opposite to the results reported by Werler et al. (1986), who noted better recall among the case subjects.

20. Lindefors-Harris et al., 1991
Study design: An analysis of exposure data from 2 independent Swedish studies to determine if response bias could explain the tendency for an increased risk of breast cancer associated with induced abortion. Study 1: case-control study in which data were collected via personal interview. Study 2: cohort record linkage study using registry information on abortion. N=828 (317 cases (breast cancer) and 512 controls) randomly selected from the Swedish population register.
Consideration of exposure misclassification/recall bias: Yes
Evidence of exposure misclassification/recall bias: Yes
Exposures/conditions: History of spontaneous or induced abortions, reproductive histories and contraceptive drug usage.
Results: The authors concluded that the results of this study suggested that there was a statistically significant bias in the underreporting of induced abortions among healthy controls compared with incident cases of breast cancer.

21. Spengler et al., 1981
Study design: A case-control study of endometrial cancer and its relationship to exogenous estrogen use. Personal interview data were compared with hospital and/or physician records. N=265 (88 cases and 177 age-matched neighbourhood controls).
Consideration of exposure misclassification/recall bias: Yes
Evidence of exposure misclassification/recall bias: No
Exposures/conditions: Exogenous estrogen use (any dose taken for a duration of at least one month).
Results: The level of agreement between interview data and medical records was similar (83% cases vs 81% controls). Interview vs hospital record: 85% cases, 65% controls. The false-negative rate was lower for cases (21% vs 35%), showing slightly better recall among the cases. Two thirds of the disagreement between interview and hospital records was due to women reporting usage with no record documentation (false positives).
Remarks: (1) Validation of estrogen use from medical and hospital records was completed for all cases, but only 50% of the controls. (2) It cannot be assumed that medical and hospital records are equally complete and reliable for both the cases and the controls. Case-control differences in the reporting of estrogen use may be an artifact related to the method of data collection rather than real differences in recall. (3) There was no estimation of the impact of case-control differences on the odds ratios calculated separately for interview and record data for the subjects' exposure to conjugated estrogen.
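As a rough illustration of how the impact of case-control differences in recall on an odds ratio could be estimated, the following Python sketch applies assumed recall sensitivities to a hypothetical 2x2 table. The sensitivities (0.79 for cases, 0.65 for controls) are simply one minus the false-negative rates quoted for Spengler et al.; the true exposure counts, the perfect specificity, and all function names are assumptions made for illustration and are not taken from any of the studies reviewed.

```python
def reported_counts(true_exposed, true_unexposed, sensitivity, specificity):
    """Expected (reported exposed, reported unexposed) counts for one group."""
    # Truly exposed subjects who recall exposure, plus truly unexposed
    # subjects who falsely report it.
    reported_exposed = (sensitivity * true_exposed
                        + (1.0 - specificity) * true_unexposed)
    reported_unexposed = (true_exposed + true_unexposed) - reported_exposed
    return reported_exposed, reported_unexposed


def odds_ratio(case_exposed, case_unexposed, control_exposed, control_unexposed):
    """Cross-product odds ratio for an unmatched 2x2 table."""
    return (case_exposed * control_unexposed) / (case_unexposed * control_exposed)


# Hypothetical true counts (exposed, unexposed); the true odds ratio is 1.0.
TRUE_CASES = (40.0, 60.0)
TRUE_CONTROLS = (40.0, 60.0)


def observed_or(case_sensitivity, control_sensitivity, specificity=1.0):
    """Odds ratio that would be observed given each group's recall sensitivity."""
    cases = reported_counts(*TRUE_CASES, case_sensitivity, specificity)
    controls = reported_counts(*TRUE_CONTROLS, control_sensitivity, specificity)
    return odds_ratio(*cases, *controls)


# Differential recall: cases recall better than controls.
differential_or = observed_or(0.79, 0.65)
# Non-differential recall: both groups recall equally poorly.
nondifferential_or = observed_or(0.65, 0.65)

print(f"differential recall OR:     {differential_or:.2f}")
print(f"non-differential recall OR: {nondifferential_or:.2f}")
```

Under these assumed inputs, better recall among the cases moves the observed odds ratio away from the null (to roughly 1.3) even though the true odds ratio is 1.0, whereas equal under-reporting in both groups leaves it at 1.0.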
22. Horwitz et al., 1980
Study design: These authors completed two case-control studies which investigated the etiological relationship between estrogen use and the development of endometrial cancer. Study 1: N=238 (119 cases, 119 controls, with only 50 controls being interviewed). Study 2: N=298 (149 cases, 149 controls); but 104 cases and 87 controls were interviewed. Personal interview data were compared with data in medical records.
Consideration of exposure misclassification/recall bias: No
Evidence of exposure misclassification/recall bias: No
Exposures/conditions: Use of oral estrogen > 3 mg for at least 6 months.
Results: There was no evidence of recall bias. Disagreements between the interview and the medical records were similar for the cases and the controls. The authors concluded that "The results demonstrated that the odds ratio found in a case-control study may vary considerably according to the source of data used to define exposure . . . if substantial differences are noted in proportions of people from the basic groups who are available for interview, major variation can be expected in the odds ratio".
Remarks: (1) Problems were encountered in this study regarding the selection of cases and controls. Fewer controls than cases were available for interview. For example, in the samples of patients from a tumor registry, more controls had died before the interview could be conducted; fewer control patients overall could be located; more controls refused to participate.
(2) Significantly more estrogen users were interviewed than those who did not take this drug. Availability for interview was positively correlated both with estrogen exposure and a diagnosis of endometrial cancer. The authors suggested that the availability for interview plus the increased reporting of estrogen use by cases may lead to a falsely elevated odds ratio. However, this was not investigated. (3) The refusal rate was too high. Consequently, the sample size and study power were too low to test adequately the research hypotheses.

23. Friedenreich et al., 1993
Study design: A nested case-control study conducted within the Canadian National Breast Screening Study. The overall objective of this study was to determine if there was evidence for recall bias in the reporting of past micronutrient intake. N=953 (325 cases (breast cancer) and 628 matched controls (i.e., matched by age, clinic and date of enrollment)). Dietary data collected prospectively during enrollment (1982-85) were compared with data collected retrospectively in 1988 after the diagnosis of breast cancer. Data were collected by questionnaire.
Consideration of exposure misclassification/recall bias: Yes
Evidence of exposure misclassification/recall bias: No
Exposures/conditions: Dietary factors (86 food items).
Results: The authors state that the data from this study do not provide evidence for recall bias in the reporting of previous food intake. The accuracy of recall of food intake patterns was comparable for the case and the control subjects. The odds ratios for the association of the various food groups/items and the occurrence of breast cancer for the prospective and retrospective dietary data were similar in magnitude.
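Several of the studies above judge recall bias by comparing an odds ratio computed from prospectively collected exposure data with one computed from retrospectively collected data. The Python sketch below shows one way such a comparison could be set up, using the cross-product odds ratio with a Woolf (logit) 95% confidence interval. All counts are hypothetical, the function name is illustrative, and the unmatched formula is a simplifying assumption: matched designs such as the one used by Friedenreich et al. would normally call for a matched analysis (e.g., conditional logistic regression).

```python
import math


def odds_ratio_with_ci(exposed_cases, unexposed_cases,
                       exposed_controls, unexposed_controls, z=1.96):
    """Unmatched odds ratio with a Woolf (logit) confidence interval.

    z=1.96 corresponds to an approximate 95% interval.
    """
    estimate = ((exposed_cases * unexposed_controls)
                / (unexposed_cases * exposed_controls))
    # Standard error of log(OR): square root of the summed reciprocal counts.
    se_log = math.sqrt(1.0 / exposed_cases + 1.0 / unexposed_cases
                       + 1.0 / exposed_controls + 1.0 / unexposed_controls)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical 2x2 counts for the same subjects classified from
# prospective and from retrospective exposure data.
prospective = odds_ratio_with_ci(120, 205, 230, 398)
retrospective = odds_ratio_with_ci(130, 195, 228, 400)

for label, (estimate, lower, upper) in (("prospective", prospective),
                                        ("retrospective", retrospective)):
    print(f"{label}: OR = {estimate:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

A marked shift between the two estimates would be consistent with differential misclassification, although overlapping intervals alone do not rule it out, particularly when study power is low.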
24. Giovannucci et al., 1993
Study design: A nested case-control study conducted within the Nurses' Health Study cohort to determine the association of diet (dietary fats) and breast cancer. N=902 (300 cases and 602 controls). Participation rates: 77% for both the cases and the controls. Dietary data were collected prospectively and retrospectively by means of a food frequency questionnaire.
Consideration of exposure misclassification/recall bias: Yes
Evidence of exposure misclassification/recall bias: Yes
Exposures/conditions: Mean daily intake of 12 nutrients.
Results: Retrospective estimates showed positive and significant associations between intakes of total fat and saturated fat and breast cancer. Prospective assessments, on the contrary, showed no association. The authors stated that "apparently small biases of 2-5% in mean intakes of saturated fat and red meat resulted in biases of 50% or greater in odds ratio of breast cancer between extreme quintiles of intake" (p. 508). The authors concluded that several features of their study indicate that their estimate of bias may be representative, if not an underestimate of the degree of potential bias in a typical c