UBC Theses and Dissertations

Gender differential item functioning effects on various item response formats of the CES-D Gelin, Michaela Nicole 2001

GENDER DIFFERENTIAL ITEM FUNCTIONING EFFECTS ON VARIOUS ITEM RESPONSE FORMATS OF THE CES-D

by

MICHAELA NICOLE GELIN

B.A., University of British Columbia, 1998
Diploma in Guidance Studies, University of British Columbia, 1999

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF ARTS
in
THE FACULTY OF GRADUATE STUDIES
DEPARTMENT OF EDUCATIONAL AND COUNSELING PSYCHOLOGY, AND SPECIAL EDUCATION
With Specialization in MEASUREMENT, EVALUATION, AND RESEARCH METHODOLOGY

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
April 2001
©Michaela Nicole Gelin, 2001

Authorization Form

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Educational and Counseling Psychology, and Special Education
The University of British Columbia
Vancouver, Canada
Date [illegible]

Abstract

The present study investigated potentially biased scale items on the Center for Epidemiologic Studies Depression (CES-D) scale in a sample of 600 community-dwelling adults between the ages of 17 and 87 years. The mean age was 46 years for males (N=310) and 42 years for females (N=290).
The 20-item CES-D was scored using two binary methods (presence and persistence) and one ordinal method. Gender differential item functioning (DIF) was explored using Zumbo's (1999) ordinal logistic regression method and its corresponding logistic regression effect size estimator, applied to all three scoring methods. After statistically matching males and females on the underlying ability, gender DIF was found with the CES-D item crying for the ordinal and presence methods of scoring. The persistence scoring method identified two DIF items (effort and hopeful); however, this scoring method was of limited use due to low response rates on some items. Overall, the results indicate that the scoring method has an effect on DIF; thus DIF is a property of the item, scoring method, and purpose of the instrument.

Table Of Contents

Abstract
List of Tables
List of Figures
Acknowledgements
Chapter I: Introduction to the Problem
1.1 Introduction
1.2 What is Differential Item Functioning (DIF)?
1.3 Ordinal Logistic Regression Method
Chapter II: Literature Review
2.1 Gender Differences with the CES-D
2.2 Previous Applications of DIF to Depression Measures
2.3 Explanations for Gender Differences
2.4 Research Questions
Chapter III: Methodology
3.1 Participants
3.2 Measure
3.2.1 CES-D Item and Total Scoring
3.3 Analysis
Chapter IV: Results
4.1 Introduction
4.2 Assumptions
4.3 Ordinal Scored
4.3.1 Classical Analysis
4.3.2 Gender DIF Analysis
4.4 Presence Scored
4.4.1 Classical Analysis
4.4.2 Gender DIF Analysis
4.5 Persistence Scored
4.5.1 Classical Analysis
4.5.2 Gender DIF Analysis
Chapter V: Discussion
5.1 Gender DIF for the CES-D Items
5.2 The Effect of Different Scoring Methods on DIF
5.3 Implications
References
Appendix
Appendix A: SPSS Syntax for the Ordinal Logistic Regression and Corresponding Effect Size estimator

List Of Tables

Table 1: Results from Zumbo's (1999) ordinal logistic regression method and corresponding effect size measure for Ordinal scoring method - CES-D items
Table 2: Results from Zumbo's (1999) ordinal logistic regression method and corresponding effect size measure for Presence scoring method - CES-D items
Table 3: Results from Zumbo's (1999) ordinal logistic regression method and corresponding effect size measure for Persistence scoring method - CES-D items
Table 4: Item endorsement proportions by gender for the CES-D scale (310 males; 290 females) using the persistence scoring method

List Of Figures

Figure 1: Center for Epidemiology Depression Scale (CES-D)
Figure 2: Scoring the CES-D

Acknowledgements

I first wish to acknowledge my mentor and supervisor, Dr. Bruno Zumbo, for his guidance, statistical expertise, enthusiasm, and enduring support throughout all stages of this thesis. His assistance has been invaluable - from the earliest ideas to the final drafts. I would also like to acknowledge my committee member, Dr.
Anita Hubley, for providing insightful and thoughtful suggestions throughout this thesis. I would also like to thank Dr. Beth Haverkamp for being the external examiner. Additional thanks are due especially to my parents for their unwavering support and confidence as I pursued both my undergraduate and graduate degrees. Finally, I would like to acknowledge the Institute for Social Research and Evaluation at the University of Northern British Columbia for providing the data used in this thesis.

Chapter I
Introduction to the Problem

1.1 Introduction

The Center for Epidemiologic Studies Depression scale (CES-D; Radloff, 1977) is a widely used self-report measure developed for use in studies exploring the epidemiology of depressive symptomology in the general population. This scale has also been used in numerous studies to compare the prevalence of depressive symptomology in different groups such as (1) racial/ethnic groups (Aneshensel, Frerichs, & Huba, 1984; Snyder, Cervantes, & Padilla, 1990; Vera et al., 1991), (2) age groups (Gatz & Hurwicz, 1990; Hertzog, Van Alstine, Usala, Hultsch, & Dixon, 1990; Kessler, Foster, Webster, & House, 1992; Liang, Tran, Krause, & Markides, 1989), (3) nonpsychiatric (Devins et al., 1988) and psychiatric (Weissman, Sholomskas, Pottenger, Prusoff, & Locke, 1977) medically ill groups, and (4) between men and women (Clark, Aneshensel, Frerichs, & Morgan, 1981; Krause, 1986; Roberts, Andrews, Lewinsohn, & Hops, 1990; Snyder et al., 1990; Sonnenberg, Beekman, Deeg, & Van Tilburg, 2000; Stommel et al., 1993; Zunzunegui, Beland, Llacer, & Leon, 1998). The majority of these studies reported differences between the groups of interest by comparing the mean scale scores. However, the scale mean differences could indicate that the two groups have a different probability of endorsing the items or that the test items function differently for the two groups (e.g., males and females).
Therefore, in order to accurately interpret these comparisons one needs to investigate whether scale score differences are correctly attributable to the construct of interest or whether they are being erroneously attributed to the construct of interest (hence being spurious). That is, if DIF results are erroneously attributed to the construct of interest, so that one infers real differences when the differences are actually due to an irrelevant factor, item bias is evident. Conversely, if DIF results are correctly attributed to the construct of interest, so that one infers real differences between the groups, item impact is evident. The first step required to investigate whether there is item bias or item impact is to investigate differential item functioning (DIF). In this thesis, gender based DIF will be explored with the CES-D to identify test items that function differently between males and females. These comparison groups were chosen because of the large number of studies reporting that women are much more likely than men to report high levels of depressive symptoms.

The CES-D comprises 20 items that reflect various aspects of depressive symptomology. Furthermore, the CES-D is used with three types of scoring: ordinal, presence, and persistence. The ordinal scoring method uses a Likert type format that is intended to identify the presence and severity of depressive symptoms (Radloff & Locke, 1986). Alternatively, the presence and persistence scoring methods use a binary scoring format. The essential difference between these two methods is that the latter requires that the symptomology be present longer than the former. DIF should be investigated for each of these scoring methods because it is unclear whether the scoring methods affect DIF.
In essence, if an item is found to be DIF with the ordinal scoring method, will it still perform differentially with one of the binary scoring methods or will re-scoring the measure remove the DIF? Furthermore, if scoring methods affect DIF, then inferences derived from DIF results based on one scoring method will not be appropriate or valid for a different scoring method. No previous study has compared DIF for various scoring methods with the same instrument.

The purpose of this thesis is to investigate whether, for each scoring method, any CES-D scale items exhibit gender DIF. The DIF will be explored using Zumbo's (1999) ordinal logistic regression method along with his corresponding effect size estimator, which are used in combination to help identify differentially functioning items. In particular, this technique is used to investigate whether or not males and females have a different probability of endorsing the items on the CES-D. If DIF is found with some items, then further investigations are needed to determine if the DIF is because of item bias or item impact.

Determining whether an item displays bias or impact has a number of significant implications for researchers, selection personnel, test takers, and policy makers. The primary issue is one of consequential matters of test fairness and equity. That is, there should be a level playing field where men and women have equal opportunities (e.g., in the personnel selection context) and are treated equitably (e.g., in screening for depression it is inappropriate to portray women as being more depressed than men if the difference is an artifact of the measurement process).

To this end, the following section will briefly review differential item functioning (DIF), the relevant terminology found in the literature on bias, and Zumbo's ordinal logistic regression method for the detection of DIF.
In the next chapter, the few studies that have explored gender biased items of the CES-D will be reviewed. Next, Chapter III will describe the study methodology. This will include the study participants, a description of the CES-D measure and the different scoring methods. It will also include the analysis steps, criteria for determining a DIF item, and a brief discussion of the odds ratio statistic used to determine the direction of responding. The results of the study will be reported in Chapter IV. The final chapter will discuss the study results in terms of the research questions: (a) gender DIF for the CES-D items and (b) the effect of different scoring methods on DIF. Next, implications for research and practice are considered, including what to do with an item that displays DIF. This is followed by limitations of the study and future directions. The thesis concludes with comments on the contribution of this study for measuring depression with the CES-D, gender differences on depression, and for DIF analyses with various scoring methods of the same instrument.

1.2 What is Differential Item Functioning (DIF)?

Differential item functioning (DIF) is a statistical technique that is used to identify differential item response patterns between groups of test-takers (e.g., male versus female, Caucasian versus African American). In assessing response patterns, the comparison groups, conceptualized in this study as men and women, are first statistically matched on the underlying construct of interest (e.g., depressive symptomology), then the DIF methods evaluate the response patterns to individual test items. Thus, as Zumbo (1999) states, DIF occurs when examinees with the same underlying ability on the construct measured by the test, but who are from different groups, have a different probability of endorsing (or correctly answering) the item.
He continues with a conceptualization of the basic principle of DIF: "If different groups of test-takers (e.g., males and females) have roughly the same level of something (e.g., knowledge), then they should perform similarly on individual test items regardless of group membership" (p. 5). In this thesis, DIF matches males and females on depressive symptomology, measured by the CES-D total score. This study could have also matched males and females on a different criterion measure for the latent variable, such as a medical diagnosis of depression from a clinician. This matching alternative would be useful if, for example, the CES-D total score was found to be misleading or inappropriate, as would be the case if DIF were found for many CES-D items.

DIF differs from previous classical test theory techniques used to assess bias because DIF matches the groups of interest on the latent variable of interest; previous bias studies either compared mean scores without any matching technique or simply compared the factor structure for the groups of interest. Previous studies that found group differences on observed scores, such as group comparisons of scale means, may be misleading because respondents were not first matched on the construct of interest. Thus, matching groups on the variable measured by the test is important for determining whether item responses are equally valid for different groups.

However, it should be noted that DIF is a statistical method to flag potentially problematic items. Therefore, it is the first step in determining whether there is item bias or item impact. Further study would be needed by content experts to determine whether one has bias or impact. Item bias is a value judgment with social, political, and ethical implications, and thus, takes into account the purpose of the test.
Specifically, item bias requires that the source of the differential functioning of the item be irrelevant to the purpose of the test and/or interpretation of the measure. In essence, item bias is an artifact of the testing procedure. That is, item bias would occur if one group of test-takers (e.g., males) were less likely to endorse the item than the comparison group of test-takers (e.g., females) because the item is tapping a factor over-and-above the factor of interest. For example, if females were less likely to endorse an item from an achievement test of mathematical ability than men because the question required prior knowledge of basketball scores (assuming females do not know the point system used in basketball and males do) then the item is biased. Thus, for item bias to occur, DIF must be apparent; however, as Zumbo (1999) reminds us, "DIF is a necessary, but not sufficient, condition for item bias" (p. 12).

Item impact is evident when one group of examinees is found to endorse the item more than the other group of examinees because the two groups truly differ on the underlying ability or factor being measured by the test. That is, item impact occurs when the item measures a relevant characteristic of the test, and 'real' differences between the two groups of interest are found. For example, if females were less likely to endorse an item from an achievement test of mathematical ability than men matched on mathematical ability, and men and women truly differed on mathematical aptitude, item impact is present.

The distinction between whether the group differences are based on irrelevant or relevant characteristics of the measure is really a question about the purpose of the measure. Therefore, one needs to be clear about the purpose of the test before conducting the analysis. As well, it is important to note that if an item is flagged as displaying DIF, it does not mean that the item should be automatically omitted from the scale.
Rather, items that are flagged as displaying DIF should be carefully analyzed by experts in the appropriate area. For example, if a CES-D item were flagged as displaying DIF then depression researchers should carefully analyze why the item was flagged.

Taken as a whole, methodologically, the DIF analysis is computed by initially matching the two groups of interest on their underlying ability as determined by their overall performance on the test (i.e., the total test score). Next, the DIF statistic is computed and this indicates the extent to which members of one group perform differently from members of some other group who are of comparable overall ability. If DIF is found, further analyses are needed to determine if the DIF is because of item bias or item impact. This thesis does not continue with the investigation of the source of item bias or impact because it would take experts in the area of depression to conduct that study. Moreover, a study of item bias or impact with differential functioning items of the CES-D for a general population has not been reported in the literature.

1.3 Ordinal Logistic Regression Method

In order to calculate DIF in binary and ordinal scored items, this study used Zumbo's (1999) ordinal logistic regression method. To date, Zumbo's ordinal logistic regression method and corresponding measure of effect size is the only method available for both binary and ordinal scored items. Thus, one advantage of this method is that it allows for a direct comparison of the results from binary and ordinal scored items because only this one method is required. That is, statistical method effects do not influence the results. In addition, this method has a corresponding effect size estimator that can be used with binary and ordinal items to help determine the magnitude of DIF.
The effect size estimator is extremely important for this particular study because DIF is based on a large sample size and, without an examination of the effect size, trivial effects may appear to be statistically significant.

The ordinal logistic regression method of DIF will be used for the items on the CES-D that are scored using the ordinal method and it will be repeated for the items that are scored using the two binary methods. As described in Zumbo's (1999) handbook, this procedure uses the item response as the dependent variable, with the grouping variable (characterized as variable GRP), total scale score for each examinee (characterized as variable TOTAL) and a group by total interaction as independent variables. This can be expressed as a linear regression of predictor variables on a latent continuously distributed random variable, y*. The ordinal logistic regression equation is

$$y_i^* = b_0 + b_1\,\mathrm{TOTAL}_i + b_2\,\mathrm{GRP}_i + b_3\,(\mathrm{TOTAL}_i \times \mathrm{GRP}_i) + e_i$$

Zumbo's (1999) ordinal logistic regression method provides a test of DIF that measures the effect of group and the interaction, over-and-above the total scale score while, at the same time, statistically matching on the total scale score. This DIF method has a natural hierarchy of entering variables into the model in which the conditioning variable (i.e., the total score) is entered first. Next, the grouping variable (e.g., gender) is entered. This step measures the effect of the grouping variable while holding constant the effect of the conditioning variable. Finally, the interaction term (e.g., TOTAL*GENDER) is entered into the equation which describes whether the difference between the group responses on an item varies over that latent variable continuum. Each of these steps provides a Chi-squared statistic which is used in the statistical test of DIF. The DIF computation is basically the difference between the Chi-squared value for Step #3 and the Chi-squared value for Step #1.
That is, the Chi-squared value for Step #1 is subtracted from the Chi-squared value for Step #3 giving a resultant two degrees of freedom Chi-squared value. The two degrees of freedom arises because it is the difference between the three degrees of freedom at Step #3 and the one degree of freedom at Step #1. Next, the p-value for this resultant two degrees of freedom Chi-squared test is determined by using a Chi-squared probability table that is found in most statistical textbooks.

Just as a Chi-squared statistic is computed for each step in Zumbo's (1999) ordinal logistic regression method, the corresponding effect size estimator is computed for each step. This corresponding effect size value is calculated as an R-squared which can be applied to both binary and ordinal items. Using these R-squared values, the magnitude of DIF can be computed by subtracting the R-squared value for Step #1 from that for Step #3.

Lastly, in order to classify an item as displaying DIF, one must consider both the two degrees of freedom Chi-squared test of DIF and Zumbo's corresponding effect size measure. Zumbo (1999) proposed two criteria that must be met for an item to be classified as displaying DIF. First, the two degrees of freedom Chi-squared test for DIF must have a p-value less than or equal to 0.01. Second, the corresponding effect size measure must have an R-squared (R²) value of at least 0.130. However, Jodoin and Gierl's (in press) investigation of DIF effect size measures suggests that this R² value is very conservative, and thus they propose a more liberal R² value for detecting DIF. Specifically, Jodoin and Gierl propose R² values below 0.035 for negligible DIF, between 0.035 and 0.070 for moderate DIF, and above 0.070 for large DIF.
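As an illustration of the two-part decision described above, the following Python sketch computes the two degrees of freedom Chi-squared difference and the R-squared difference between Step #1 and Step #3, then applies the cutoffs quoted in the text. It is not taken from the thesis, whose Appendix A uses SPSS syntax; the function names `chi2_sf_2df` and `dif_test`, their parameters, and the example values are hypothetical. In place of a printed probability table, the p-value uses the closed form exp(-x/2) for the upper tail of a Chi-squared distribution with two degrees of freedom.

```python
import math


def chi2_sf_2df(x: float) -> float:
    """Upper-tail probability of a Chi-squared variate with 2 degrees of freedom.

    For exactly 2 df the distribution is exponential with mean 2,
    so P(X > x) = exp(-x / 2).
    """
    return math.exp(-x / 2.0)


def dif_test(chi2_step1: float, chi2_step3: float,
             r2_step1: float, r2_step3: float,
             alpha: float = 0.01, min_r2: float = 0.130) -> dict:
    """Combine the 2-df Chi-squared test and the R-squared effect size.

    Defaults follow the criteria attributed to Zumbo (1999) in the text:
    p <= 0.01 and an R-squared difference of at least 0.130.
    """
    # Step #3 minus Step #1: joint test of the group and interaction terms.
    chi2_diff = chi2_step3 - chi2_step1
    p_value = chi2_sf_2df(max(chi2_diff, 0.0))

    # Effect size: R-squared gained by adding group and interaction.
    r2_diff = r2_step3 - r2_step1

    # Magnitude labels as attributed to Jodoin and Gierl in the text.
    if r2_diff < 0.035:
        magnitude = "negligible"
    elif r2_diff <= 0.070:
        magnitude = "moderate"
    else:
        magnitude = "large"

    flagged = (p_value <= alpha) and (r2_diff >= min_r2)
    return {"chi2_diff": chi2_diff, "p_value": p_value,
            "r2_diff": r2_diff, "magnitude": magnitude, "flagged": flagged}


if __name__ == "__main__":
    # Hypothetical step statistics for a single item, not results from the thesis.
    print(dif_test(310.2, 351.7, 0.412, 0.461))
    # Same item with a more liberal effect size cutoff of 0.035.
    print(dif_test(310.2, 351.7, 0.412, 0.461, min_r2=0.035))
```

Keeping the effect size cutoff as a parameter makes it easy to see how the same item can be flagged under a liberal cutoff yet not under the more conservative 0.130 value.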
Taken together, this thesis will require that (1) an item must have a p-value less than or equal to 0.01 with the two degrees of freedom Chi-squared test, and (2) the corresponding R² must be greater than or equal to 0.035 for an item to be classified as displaying DIF. If both of these criteria are met, Jodoin and Gierl's effect size criteria will be used to quantify the magnitude of DIF.

Furthermore, if DIF exists for an item, the steps computed in the calculation of DIF using Zumbo's (1999) ordinal logistic regression will be reviewed to determine if the DIF is uniform or non-uniform. Uniform DIF occurs when there is no interaction between the probability of endorsing an item and the group membership being tested. That is, DIF functions in a uniform fashion across the latent continuum of variation (i.e., depression). In other words, uniform DIF may occur when the DIF is attributable to differences in item difficulty only. This can be determined by comparing the R-squared values between steps #2 and #1 "to measure the unique variation attributable to the group differences over-and-above the conditioning variable (the total score)" (Zumbo, 1999, p. 26). Uniform DIF can also be graphically illustrated as two nonlinear regression lines (one for each group) with a substantial area between the two curves that do not cross over each other. The regression lines typically characterize the probability of endorsing an item as a function of an underlying construct. If uniform DIF is found, the odds ratio will be used to interpret the direction of the DIF (i.e., are females or males more likely to endorse the item?). For the ordinal scoring method, the odds ratio is computed from Step #2 of Zumbo's (1999) ordinal logistic regression (i.e., the regression model adding uniform DIF to the model).
Next, a χ²(1) test (Step #2 - Step #1) with corresponding p-value and the R² effect size values are computed and the odds ratio is computed from the regression coefficient (more technically, the odds ratio is computed as the exponentiation of the regression coefficient).

Conversely, non-uniform DIF occurs when there is an interaction between group membership and the criterion variable (CES-D total score). Non-uniform DIF reflects a situation in which an item might differentially favor a group of respondents (e.g., males) at one end of the latent continuum and disfavor the comparison group (e.g., females) at the other end of the spectrum. In terms of the ordinal logistic regression used for detecting DIF in this thesis, non-uniform DIF can be determined by comparing the R-squared values at step #3 to the R-squared values at step #2. An item is considered uniform DIF if the difference between steps #2 and #3 is statistically non-significant and has a trivial effect size. Non-uniform DIF can also be graphically illustrated as two nonlinear regression lines (one for each group) that cross over each other and characterize the probability of endorsing an item as a function of an underlying construct. This graph would look similar to an interaction plot from an ANOVA.

Chapter II
Literature Review

2.1 Gender Differences with the CES-D

Since the introduction of the CES-D in 1977, numerous studies have documented that women tend to have higher scale scores than men on the CES-D. That is, women tend to endorse depressive symptoms more than men (e.g., Callahan & Wolinsky, 1994; Clark et al., 1981; Krause, 1986; Sonnenberg et al., 2000).
Furthermore, several studies exploring general depression have reported that women experience depression twice as frequently as men whether one looks at depressive symptoms or depressive disorders, and whether referred or non-referred samples are used (e.g., Culbertson, 1997; Leon, Klerman, & Wickramaratne, 1993; Nolen-Hoeksema, 1987). Moreover, this 2:1 prevalence ratio frequently has been found with individuals in the age range between late adolescence and approximately 64 years of age (Nolen-Hoeksema, 1990).

Item level gender differences in depression self-report measures have also been documented. A number of studies report problematic items on the CES-D by comparing mean score differences or the factor structure between males and females. For example, Roberts et al. (1990) found the CES-D items "crying" and "appetite" had different factor loadings for males and females in a sample of adolescents. Unfortunately, these differences often are documented as a form of bias (e.g., gender bias) but the groups are not first matched. However, as mentioned previously, comparing mean score differences on items or total scores between two groups provides uninterpretable results. As Santor et al. (1994) state, "Finding an overall mean difference between two groups does not demonstrate bias, nor does failing to find a difference preclude the possibility of bias" (p. 256).

To date only one study (Cole, Kawachi, Maller, & Berkman, 2000) has presented item-level gender DIF with the CES-D and this study only explored the CES-D with the ordinal scoring method. Although Cole et al. label their methodology as an extension of the Mantel-Haenszel method of detecting DIF, a closer look reveals that they are using Zumbo's method of modelling DIF through ordinal logistic regression, except that they do not use the R² effect size method. Instead, they report the odds ratio as an effect size.
Using this technique with a sample of 2340 community dwelling adults 65 years of age or older, Cole et al. found that the CES-D item "crying" functioned differently; the proportional odds of women responding higher on the "crying" item were 2.14 times that of men matched on overall depressive symptoms. That is, the odds of women endorsing the "crying" item were more than twice those of men.

While the study by Cole et al. (2000) is the only item-level study of gender differences with the CES-D, Stommel et al. (1993) assessed item bias by gender on the CES-D using factor analysis. Using a series of multi-sample confirmatory factor analysis models on a sample of 1212 subjects (708 cancer patients between the ages of 19-89, average age of 61; 504 caregivers of chronically ill elderly between the ages of 18-88, average age of 63), Stommel et al. (1993) found that the items "crying" and "talked less" were gender biased. Females were more likely to endorse the item "crying spells" compared to males, while females were less likely to endorse the item "talked less" compared to males matched on overall depression. The technique they used was an ordinary least-squares multiple regression in which (a) the item of interest was the dependent variable, (b) gender and the remaining items were predictors, and (c) the t-test of the gender variable was examined to see if it was statistically significant. This approach is a primitive form of DIF analysis that only allows for uniform DIF and treats the dependent variable (i.e., the item of interest) as a continuous variable.

Although scale-level methods of factor analysis and reliability, such as the method used by Stommel et al. (1993), are commonly used on their own to investigate item bias, a recent paper by Zumbo (in press) demonstrates that item-level DIF does not manifest itself in scale-level methods. In other words, factor analysis by itself will not necessarily detect DIF.
Accordingly, Zumbo (in press) recommends that one must do item-by-item analysis to identify differentially functioning items. He also states that factor analysis can still be conducted; however, it answers a different question. It can only confirm that the test is measuring the same thing in both groups.

However, differential item functioning by gender with the CES-D has not been explored using a sample with a broad age range from the general population, nor has it been explored using the various scoring methods of the CES-D. The Cole et al. (2000) study only used the ordinal scoring. In fact, the study reported in this thesis is the first to compare DIF across various scoring methods - ordinal and binary.

2.2 Previous Applications of DIF to Depression Measures

A broader literature review on the use of DIF methods (e.g., Rasch, Mantel-Haenszel, and logistic regression methods) applied to other depression measures such as the Beck Depression Inventory (BDI: Beck, Ward, Mendelson, Mock, & Erbaugh, 1961), Geriatric Depression Scale (GDS: Yesavage et al., 1983), Hamilton Rating Scale for Depression (HRSD: Hamilton, 1960), Hudson's Generalized Contentment Scale (GCS: Hudson, 1982) and/or Zung's Self-rating Depression Scale (SDS; Zung, 1965) was carried out. This extensive literature search revealed that DIF techniques have been used only with the BDI and HRSD.

Using nonparametric item response modelling, Santor, Ramsay, and Zuroff (1994) examined gender item bias on the BDI with a sample of depressed outpatient (N=648) and nonpatient college (N=1182) individuals. Each of the 21 items on the BDI consisted of four graded statements that were scored from 0 to 3, and a total depression score was computed by summing all of the scaled responses.
Examining gender bias as a function of the severity of depression with the depressed outpatient sample resulted in three BDI items demonstrating DIF: item 6 (sense of punishment), item 10 (crying), and item 14 (distortion of body image). "Overall bias was defined as the weighted squared difference between the [option characteristic] curves for men and women" (Santor et al., 1994, p. 261). A similar analysis was conducted with the college sample. Results from this sample also revealed that item 14 (distortion of body image) demonstrated the greatest amount of gender bias in expected item score; however, no gender bias was found for items 6 (sense of punishment) or 10 (crying).

In a more recent study by Santor and Ramsay (1998), item 1 (depressed mood) from the HRSD was used as an example to illustrate DIF between depressed (N=418) and nondepressed (N=238) individuals. After matching individuals from the two groups on depressive severity (e.g., total HRSD score using an ordinal scoring method), clinically depressed individuals were more likely to endorse item 1 (depressed mood) than nondepressed individuals. The other items on the HRSD have not been examined for DIF.

2.3 Explanations for Gender Differences

The prevalence of findings on gender differences has led to several proposed explanations designed to account for the apparent differences. Such explanations include genetic causes (i.e., depression may be genetically transmitted), social and personality factors, as well as the role of endocrine factors. Another possibility is that the prevalence rates of depression are, in fact, equal in men and women, and the apparent gender differences are believed to reflect an artifact as opposed to true differences.
That is, the apparent gender differences do not reflect differences in depression per se, but rather reflect differences in the way men and women express depressive symptoms (e.g., Winokur & Clayton, 1967), recall depressive symptoms (e.g., Angst & Dobler-Mikola, 1984), or are willing to report depressive symptoms (e.g., Frank, Carpenter, & Kupfer, 1988), and/or reflect diagnostic biases among mental health professionals (e.g., Lopez, 1989; Potts, Burnam, & Wells, 1991; Wrobel, 1993). However, there is no clear evidence that men and women express depressive symptoms differently (Nolen-Hoeksema, 1987). Similarly, there are inconclusive findings that gender differences are accounted for by a greater tendency for women to remember symptoms (e.g., Coryell, Endicott, & Keller, 1992; Fennig, Schwartz, & Bromet, 1994; Wilhelm & Parker, 1994) and/or admit to experiencing depressive symptoms (e.g., King & Buchwald, 1982; Tousignant, Brosseau, & Tremblay, 1987). On the other hand, a number of studies have supported the possibility that mental health practitioners tend to overdiagnose depression in women and underdiagnose depression in men (Lopez, 1989; Wrobel, 1993; Loring & Powell, 1988; Potts et al., 1991), thereby contributing to artificial prevalence rates.

What is unclear from this literature is whether the differences being found are (a) real differences (impact), (b) differences due to a measurement artifact (bias), or (c) real differences that have been minimized or exaggerated by measurement artifacts.
2.4 Research Questions

Given that (a) few studies have explored gender DIF in depression measures, (b) only one study has investigated gender DIF for the CES-D and that study (Cole et al., 2000) focused on one sample of seniors 65 years of age or older, and (c) no study has compared DIF for ordinal and binary item formats on the same scale, the present study is needed and will contribute to the literature on depression and gender differences, the CES-D, and the psychometrics of DIF. The research questions investigated in this thesis are:

i) Does gender DIF exist for the CES-D for the ordinal, presence, and persistence scoring formats?
ii) Are any CES-D items found to display DIF irrespective of the scoring method (i.e., for all the scoring methods)?
iii) Are any CES-D items found to display DIF for only some of the scoring methods?

Therefore, the two purposes of this thesis are (a) to investigate gender DIF for the CES-D items, and (b) to investigate whether different scoring methods affect the DIF results. Given Cole et al.'s (2000) findings with a sample of seniors, I expect that the crying item will demonstrate gender DIF for the ordinal scoring method in a general population. However, given that there has been no empirical or theoretical work comparing the effect of scoring method, it is unclear whether the different scoring methods have an effect on DIF.

Chapter Three
Methodology

3.1 Participants

Individuals who were included in this study were obtained from the Health and Health Care Survey carried out by the Institute for Social Research and Evaluation (ISRE) at the University of Northern British Columbia, Canada, in the fall of 1998. The sample comprised 600 community-dwelling adults living in Northern British Columbia: 290 females and 310 males, who were drawn randomly from the Dominion phone list.
The mean age of female participants was 42 years (SD = 13.4, range = 18 to 87 years), and the mean age of male participants was 46 years (SD = 12.1, range = 17 to 82 years).

3.2 Measure

The Center for Epidemiologic Studies - Depression (CES-D) scale used in this study is a 20-item self-administered instrument originally introduced by Lenore Radloff (1977). This scale was designed to measure current feelings of depression in the general population. Although this scale has been applied to various clinical samples (e.g., Craig & Van Natta, 1976; Weissman et al., 1977), it was never designed to be used as a screening tool for identifying clinical depression (e.g., within standardized systems such as DSM-IV; American Psychiatric Association, 1994) or for discriminating among subtypes of depression. The CES-D has also been translated into many different languages (e.g., Caetano, 1987) and it has been validated for use with a number of different ethnic groups (e.g., Roberts, 1980), as well as for specific age groups such as children (e.g., Weissman, Orvaschel, & Padian, 1980), adolescents, and the elderly (e.g., DeForge & Sobal, 1988; Gatz & Hurwicz, 1990).

3.2.1 CES-D Item and Total Scoring

The CES-D, reproduced in Figure 1, asks respondents to indicate the frequency/duration with which they have experienced a specific symptom associated with depression (e.g., My sleep was restless) during the previous week. Each item has four options with specific anchors that correspond to the frequency with which each of the 20 symptoms was experienced. These anchors are intended to reflect differences in the presence and severity of depressive symptoms and are usually labelled as: Option 0, rarely or none of the time / less than 1 day; Option 1, some or a little of the time / 1-2 days; Option 2, occasionally or a moderate amount of the time / 3-4 days; and Option 3, most of the time / 5-7 days.
Using this response scale, the CES-D can be scored in three different ways: an ordinal scoring format and two dichotomous scoring formats (see Figure 2). Originally, the four options are scored 0, 1, 2, or 3, respectively, which may be termed the "ordinal" method of scoring. Next, the scoring of the positively worded items (items 4, 8, 12, and 16) is reversed, and then all 20 scaled responses are summed for a possible range of scores from 0 to 60.

Alternatively, the options may be scored dichotomously with respect to a specific threshold. The most popular dichotomous scoring method is termed the "presence" method of scoring. This method refers to a respondent's report of having experienced the symptom at least some of the time during the preceding week (i.e., for 1 to 7 days). This method is used when researchers are interested only in the presence or absence of any depressive symptomology. In this case, Option 0 is assigned a score of 0, indicating no depression, and all other response options (Options 1, 2, and 3) are assigned a score of 1, indicating depression. Then the positively worded items are reverse scored (ones are recoded as zeros, and vice versa), and lastly the scaled responses are summed for a possible total scale score of 20.

An alternative dichotomous format, termed the "persistence" method, is used when researchers are interested only in whether an individual is likely to be depressed. The "persistence" of a symptom usually refers to the respondent's report of having experienced the symptom for 3-7 days during the preceding week. For this method, Option 0 and Option 1 are scored as 0, and Option 2 and Option 3 are scored as 1. Next, the positively worded items are reverse scored, and then the scaled responses are added for a possible range of scores between 0 and 20.[1] Studies using dichotomous formats have been presented by Clark et al.
(1981), Craig and Van Natta (1976), Myers and Weissman (1980), Roberts and Vernon (1983), and Santor and Coyne (1997).

[1] An extreme version of the "persistence" method of scoring is also used with the CES-D, and refers to the respondent's report of having experienced the symptom for 5-7 days during the preceding week. This is extreme because an individual must endorse Option 3 for a symptom to be scored as indicating depression. This scoring method was not used in this thesis because there was not enough variability in the item responses for the ordinal logistic regression to be computed. That is, as expected in a general population survey, few respondents select 5-7 days.

Figure 1. Center for Epidemiologic Studies Depression (CES-D) Scale.

Center for Epidemiologic Studies Depression (CES-D) Scale: Format for Self-Administered Use

INSTRUCTIONS: Using the scale below, please circle the number for each statement that best describes how often you felt or behaved this way during the past week.

0 = Rarely or none of the time (less than 1 day)
1 = Some or a little of the time (1-2 days)
2 = Occasionally or a moderate amount of time (3-4 days)
3 = Most or all of the time (5-7 days)

DURING THE PAST WEEK (columns: less than 1 day, 1-2 days, 3-4 days, 5-7 days):

 1. I was bothered by things that usually don't bother me.    0  1  2  3
 2. I did not feel like eating; my appetite was poor.         0  1  2  3
 3. I felt that I could not shake off the blues even with
    help from my family or friends.                           0  1  2  3
 4. I felt that I was just as good as other people.           0  1  2  3
 5. I had trouble keeping my mind on what I was doing.        0  1  2  3
 6. I felt depressed.                                         0  1  2  3
 7. I felt that everything I did was an effort.               0  1  2  3
 8. I felt hopeful about the future.                          0  1  2  3
 9. I thought my life had been a failure.                     0  1  2  3
10. I felt fearful.                                           0  1  2  3
11. My sleep was restless.                                    0  1  2  3
12. I was happy.                                              0  1  2  3
13. I talked less than usual.                                 0  1  2  3
14. I felt lonely.                                            0  1  2  3
15. People were unfriendly.                                   0  1  2  3
16. I enjoyed life.                                           0  1  2  3
17. I had crying spells.                                      0  1  2  3
18. I felt sad.                                               0  1  2  3
19. I felt that people dislike me.                            0  1  2  3
20. I could not get "going".                                  0  1  2  3

Note: Items are summed after reverse scoring of items 4, 8, 12, and 16. Total CES-D scores range from 0-60, with higher scores indicating higher levels of general depression.

Figure 2. Scoring the CES-D.

Scoring the CES-D

Each item has four options:
Option 0, rarely or none of the time / less than 1 day
Option 1, some or a little of the time / 1-2 days
Option 2, occasionally or a moderate amount of the time / 3-4 days
Option 3, most of the time / 5-7 days

ORDINAL scoring method
> All four options are scored 0, 1, 2, or 3, respectively.
> The total score ranges from 0 - 60.

PRESENCE scoring method
> The respondent's report of having experienced the symptom at least some of the time during the preceding week (i.e., for 1 to 7 days).
  Option 0: assigned 0 (indicating no depression)
  Options 1, 2, and 3: assigned 1 (indicating depression)
> The total score ranges from 0 - 20.

PERSISTENCE scoring method
> The respondent's report of having experienced the symptom for 3-7 days during the preceding week.
  Options 0 and 1: assigned 0 (indicating no depression)
  Options 2 and 3: assigned 1 (indicating depression)
> The total score ranges from 0 - 20.

3.3 Analysis

Based on the observation that reliability coefficients are often used as a standard of comparison to show equivalence of measures, they will be reported in this thesis. Reporting reliability coefficients is such common practice that not reporting them gives the impression of an incomplete analysis. Specifically, for all three scoring methods of the CES-D, coefficient alpha reliabilities will be reported for males and females separately, and for the overall scale (males and females combined).
However, although this classical test statistic is computed as an estimate of the test reliability, it should be interpreted cautiously because item-level DIF does not manifest itself in the reliability coefficient (Zumbo, in press). That is, the reliability estimates may be the same for males and females even though some items display DIF.

Next, gender DIF will be investigated for each item of the CES-D using Zumbo's (1999) ordinal method of logistic regression and corresponding effect size measure for each scoring method. In all three gender DIF analyses, gender (coded as 0=female, 1=male) was the grouping variable. This can be expressed as

$$Y^* = b_0 + b_1\,\mathrm{TOTAL} + b_2\,\mathrm{GENDER} + b_3\,(\mathrm{TOTAL} \times \mathrm{GENDER}) + \varepsilon$$

As this equation demonstrates, female and male respondents initially are matched according to their total test score (characterized as variable TOTAL) on the CES-D. As discussed previously, this total test score depends on the scoring format. Appendix A provides the SPSS syntax file used to calculate the ordinal logistic regression and corresponding effect size estimator. It should be noted that this syntax file calls for a public domain SPSS macro (filename: ologit.inc) written by Prof. Dr. Steffen Kuhnel, and modified by John Hendricks, University of Nijmegen, The Netherlands (see Appendix A).

As noted earlier in this thesis, the criterion for a DIF item is that (a) the $\chi^2(2)$ has a p-value less than .01, and (b) the $R^2$ for this 2-df test is greater than or equal to 0.035. Jodoin and Gierl (in press) showed that this is a statistically powerful criterion. In addition, Jodoin and Gierl's effect size criteria will be used to quantify the magnitude of DIF: $R^2$ values below 0.035 for negligible DIF, between 0.035 and 0.070 for moderate DIF, and above 0.070 for large DIF. In addition, the proportional odds ratio for each item will be presented. The odds ratio will be used to help determine the direction of responding.
That is, it will identify whether men or women are more likely to endorse the item. Moreover, it can be used to determine the odds of one group responding higher to an individual item than those in the corresponding group, after matching on overall depressive symptomology. For example, a proportional odds ratio of 2.0 can be translated to mean that those in group one (e.g., females) are twice as likely to endorse the item as those in the comparison group (e.g., males coded as zero).

Chapter Four
Results

4.1 Introduction

The results of the analyses are reported for each scoring method separately, starting with the ordinal method, and followed by the presence and persistence methods. For each scoring method, the analyses were calculated using SPSS 10.0 for Windows. Moreover, for each scoring method, results from Zumbo's ordinal logistic regression method and corresponding effect size measure, R-squared, are presented in a tabular format. Each table lists the CES-D item number, 1 through 20, the Chi-squared test statistic and the corresponding R-squared effect size measure for each step in the model. The final column reports the DIF computation, which includes the two degrees of freedom Chi-squared test statistic value with its p-value, as well as the corresponding R-squared effect size value.

4.2 Assumptions

The key assumption in using ordinal logistic regression for DIF is essential unidimensionality, which presumes that the items on the CES-D measure only one dominant factor. In the present case, results from a confirmatory factor analysis study support the unidimensionality of the scale (Zumbo, Gelin, & Hubley, in press). These authors found that a unidimensional model with method effects modelled for the four positively worded items was the best fit.

4.3 Ordinal scored

4.3.1 Classical analyses

Using coefficient alpha, the reliabilities for the ordinal scored CES-D scale were 0.91 overall, 0.91 for males and 0.90 for females.
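As a rough illustration of the scoring rules in Section 3.2.1 and of the coefficient alpha values reported here, the following Python sketch recodes one respondent's 20 response options under each scoring method and computes alpha from a respondents-by-items score matrix. It is not the SPSS procedure used in this thesis: the function names (score_item, score_respondent, coefficient_alpha) are hypothetical, and the use of sample variances in the alpha formula is an assumption.

```python
from statistics import variance

# Positively worded CES-D items (1-based item numbers), which are reverse scored.
POSITIVE_ITEMS = {4, 8, 12, 16}


def score_item(option, item_number, method):
    """Score one response option (0-3) under the given scoring method."""
    if option not in (0, 1, 2, 3):
        raise ValueError("option must be 0, 1, 2, or 3")
    if method == "ordinal":
        score, max_score = option, 3
    elif method == "presence":       # symptom reported for 1 to 7 days
        score, max_score = int(option >= 1), 1
    elif method == "persistence":    # symptom reported for 3 to 7 days
        score, max_score = int(option >= 2), 1
    else:
        raise ValueError("unknown scoring method: " + method)
    # Reverse scoring is applied after recoding, as described in Section 3.2.1.
    if item_number in POSITIVE_ITEMS:
        score = max_score - score
    return score


def score_respondent(options, method):
    """Return the item scores for one respondent's list of 20 options."""
    return [score_item(opt, i, method) for i, opt in enumerate(options, start=1)]


def coefficient_alpha(item_scores):
    """Coefficient alpha for a list of respondents, each a list of item scores.

    Assumes sample variances; needs at least two respondents and a nonzero
    variance of the total scores.
    """
    k = len(item_scores[0])
    item_variances = [variance([row[j] for row in item_scores]) for j in range(k)]
    total_variance = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# A respondent choosing Option 0 on every item still scores on the four
# reverse-scored items: 4 items x 3 points under ordinal scoring.
print(sum(score_respondent([0] * 20, "ordinal")))  # 12
```

Applying score_respondent to every respondent and passing the resulting rows to coefficient_alpha, once per scoring method and once per gender group, would mirror the layout of the reliabilities reported in this chapter.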
4.3.2 Gender DIF analyses

As displayed in Table 1, the results from the ordinal scoring method show that item 17 (crying) displays large gender DIF (DIF $R^2$ = .218). Moreover, comparing the R-squared values at Steps #2 and #3, the data suggest that the "crying" item shows predominantly uniform DIF. Uniform DIF means that there is no interaction between the total score and group membership (e.g., gender) in predicting the response to item 17. That is, for the "crying" item, DIF functions in a uniform fashion across the latent variable continuum. Moreover, the proportional odds of women responding higher on the item "I had crying spells" were 9.31 times those of men matched on the total score. That is, women had over nine times the odds of scoring higher on this item. It should be noted that items 2 (eating) and 18 (sad) showed Chi-squared p-values less than 0.01, but their effect sizes were small according to Jodoin and Gierl's (in press) criterion.

[Table 1. Ordinal logistic regression gender DIF results for the 20 CES-D items under the ordinal scoring method (Chi-squared, p-value, and R-squared at each step, and the 2-df DIF test). The table body is not recoverable from the extracted text.]

4.4 Presence scored

4.4.1 Classical analyses

Using coefficient alpha, the reliabilities for the presence scored CES-D scale were 0.86 overall, 0.85 for males and 0.87 for females.

4.4.2 Gender DIF analyses

The results for the presence scoring method, displayed in Table 2, show that only item 17 (crying) shows large gender DIF. Furthermore, the difference in R-squared values from Step #2 to Step #3 was relatively small, suggesting that the DIF is predominantly uniform.
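The DIF column of these tables can be derived from the per-step statistics. The sketch below is a minimal illustration, assuming (following the description of Zumbo's method in Section 3.3) that Step #1 enters the total score, Step #2 adds gender, and Step #3 adds the total-by-gender interaction, so that the 2-df DIF test is the Step #3 minus Step #1 Chi-squared difference. The function name dif_statistics and the example numbers are hypothetical; they are not values from Tables 1 to 3.

```python
import math


def dif_statistics(chi2_steps, r2_steps):
    """Combine per-step fit statistics into the 2-df DIF test and effect sizes.

    chi2_steps and r2_steps each hold three values: Step #1 (total score),
    Step #2 (adding gender), Step #3 (adding the total-by-gender interaction).
    """
    chi2_1, _chi2_2, chi2_3 = chi2_steps
    r2_1, r2_2, r2_3 = r2_steps
    dif_chi2 = max(chi2_3 - chi2_1, 0.0)
    # For a Chi-squared variable with 2 degrees of freedom, P(X > x) = exp(-x / 2).
    p_value = math.exp(-dif_chi2 / 2)
    return {
        "dif_chi2": dif_chi2,
        "p_value": p_value,
        "dif_r2": r2_3 - r2_1,         # overall DIF effect size
        "uniform_r2": r2_2 - r2_1,     # contribution of the gender main effect
        "nonuniform_r2": r2_3 - r2_2,  # contribution of the interaction
    }


# Illustrative (made-up) step statistics for a single item.
result = dif_statistics((50.0, 62.0, 63.0), (0.300, 0.340, 0.345))
print(round(result["dif_chi2"], 1), round(result["dif_r2"], 3))  # 13.0 0.045
```

In this made-up example most of the R-squared change occurs between Step #1 and Step #2, which is the pattern described in the text as predominantly uniform DIF.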
The proportional odds of women responding higher on the item "I had crying spells" were 7.51 times that of men matched on overall depressive symptoms. That is, women were 7.51 times more likely to endorse this item in a presence format than men. Unlike the ordinal scoring method, none of the other items had a Chi-squared p-value less than 0.01 (the threshold for DIF).

[Table 2. Ordinal logistic regression gender DIF results for the 20 CES-D items under the presence scoring method (Chi-squared, p-value, and R-squared at each step, and the 2-df DIF test). The table body is not recoverable from the extracted text.]

4.5 Persistence scored

4.5.1 Classical analyses

Using coefficient alpha, the reliabilities for the persistence scored CES-D scale were 0.87 overall, and 0.89 for males and 0.85 for females.

4.5.2 Gender DIF analyses

For the persistence scoring method, the results from Table 3 show that items 7 (effort) and 8 (hopeful) show moderate gender DIF. Furthermore, comparing the R-squared values at Step #2 and #3, the data suggest that the DIF for these items is predominantly uniform.
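Labels such as "moderate" and "large" follow from the flagging rule and the Jodoin and Gierl effect size bands stated in Section 3.3. A minimal sketch of that rule is given below; the function name classify_dif is hypothetical, and treating a value of exactly 0.070 as moderate is an assumption, since the text does not say which band the boundary belongs to.

```python
def classify_dif(p_value, dif_r2, alpha=0.01, moderate_cut=0.035, large_cut=0.070):
    """Return (flagged, magnitude) for one item.

    flagged:   True when the 2-df Chi-squared p-value is below alpha AND the
               DIF R-squared is at least moderate_cut.
    magnitude: 'negligible', 'moderate', or 'large' from the R-squared alone.
    """
    if dif_r2 < moderate_cut:
        magnitude = "negligible"
    elif dif_r2 <= large_cut:  # boundary handling is an assumption
        magnitude = "moderate"
    else:
        magnitude = "large"
    flagged = p_value < alpha and dif_r2 >= moderate_cut
    return flagged, magnitude


# The ordinal-scored "crying" item has a reported DIF R-squared of .218;
# the p-value used here is only a placeholder below the .01 threshold.
print(classify_dif(0.001, 0.218))  # (True, 'large')
```

An item with a significant Chi-squared test but a small R-squared, as described for items 2 and 18 under ordinal scoring, would come back as not flagged and negligible.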
The proportional odds of men responding higher on item 7 were 2.03 times those of women matched on the total score, and the proportional odds of men responding higher on item 8 were 1.90 times those of women matched on the total score. It should also be noted that the calculation for items 2, 3, 9, 10, 15, 17, and 19 could not be computed because there was a low probability of endorsing the items. In other words, there was not enough variability in the item responses for the ordinal logistic regression to be computed. It is very important to note that item 17 (crying) is among the items that have very low variability.

Table 4 shows the endorsement proportions for each item on the CES-D using the persistence scoring method. In classical test theory, these generally are defined as the proportion of examinees who answer an item correctly and, thus, are a measure of item difficulty. In this context, however, endorsement proportions are more accurately indicative of how much of the latent variable is required before an individual endorses a particular item. For example, using the persistence scoring method, an item is only endorsed by an individual if they experience the symptom for 3-7 days. In other words, this scoring method requires that an individual have a lot of the latent variable, depression, before they endorse an item. Thus, with such a strict criterion for endorsing an item, it is no surprise that all 20 CES-D items have very low endorsement proportions, ranging from 3.8% to 27%. Table 4 also shows that items 2, 3, 9, 10, 15, 17, and 19 have low endorsement proportions ranging from 3.8% to 6.7%. Consequently, these items exhibit very little variability.

Endorsement proportions by gender were also calculated. As shown in Table 4, females had higher endorsement proportions than males for 16 of the 20 CES-D items.
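These summary statements can be checked directly against Table 4. The short sketch below uses the proportions transcribed from Table 4; the variable names and the 0.07 cutoff used to pick out the low-endorsement items are illustrative assumptions, not part of the original analysis.

```python
from statistics import mean

# Persistence-scoring endorsement proportions for items 1-20, from Table 4.
OVERALL = [0.098, 0.067, 0.065, 0.200, 0.132, 0.118, 0.125, 0.230, 0.048, 0.067,
           0.272, 0.193, 0.098, 0.120, 0.057, 0.193, 0.050, 0.095, 0.038, 0.112]
MALE = [0.094, 0.032, 0.061, 0.184, 0.129, 0.090, 0.139, 0.248, 0.052, 0.052,
        0.242, 0.181, 0.084, 0.106, 0.061, 0.174, 0.026, 0.071, 0.029, 0.103]
FEMALE = [0.103, 0.103, 0.069, 0.217, 0.134, 0.148, 0.110, 0.210, 0.045, 0.083,
          0.303, 0.207, 0.114, 0.134, 0.052, 0.214, 0.076, 0.121, 0.048, 0.121]

# Items (1-based) on which females have the higher endorsement proportion.
female_higher = [i for i, (m, f) in enumerate(zip(MALE, FEMALE), start=1) if f > m]

# Items with very low overall endorsement (cutoff of 0.07 is an assumption).
low_endorsement = [i for i, p in enumerate(OVERALL, start=1) if p < 0.07]

print(len(female_higher))                            # 16
print(low_endorsement)                               # [2, 3, 9, 10, 15, 17, 19]
print(round(mean(MALE), 2), round(mean(FEMALE), 2))  # 0.11 0.13
```

As the note to Table 4 cautions, these unmatched proportions describe endorsement only and say nothing about DIF.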
However, according to Cohen's (1992) effect size criteria, the difference in endorsement proportions between males and females was trivial. Moreover, as noted in the introduction of this thesis, one should interpret these gender differences cautiously because of the absence of matching males and females on the latent trait, which is why DIF analyses are needed. The implications of these very low endorsement proportions will be presented in the discussion section of this thesis, Chapter V.

[Table 3. Ordinal logistic regression gender DIF results for the CES-D items under the persistence scoring method (Chi-squared, p-value, and R-squared at each step, and the 2-df DIF test). The table body is not recoverable from the extracted text.]

Table 4. Item endorsement proportions by gender for the CES-D scale (310 males; 290 females) using the persistence scoring method.

CES-D item    Overall        Male           Female
number        endorsement    endorsement    endorsement
              proportion     proportion     proportion
 1            0.098          0.094          0.103
 2*           0.067          0.032          0.103
 3*           0.065          0.061          0.069
 4            0.200          0.184          0.217
 5            0.132          0.129          0.134
 6            0.118          0.090          0.148
 7            0.125          0.139          0.110
 8            0.230          0.248          0.210
 9*           0.048          0.052          0.045
10*           0.067          0.052          0.083
11            0.272          0.242          0.303
12            0.193          0.181          0.207
13            0.098          0.084          0.114
14            0.120          0.106          0.134
15*           0.057          0.061          0.052
16            0.193          0.174          0.214
17*           0.050          0.026          0.076
18            0.095          0.071          0.121
19*           0.038          0.029          0.048
20            0.112          0.103          0.121
Mean          0.12           0.11           0.13
SD            0.07           0.07           0.07

Note: The items marked with an asterisk (bold in the original) could not be computed with the ordinal logistic regression (see Table 3) because there was a low probability of endorsing the items. This table is only intended to show endorsement proportions and should not be used to investigate or interpret DIF.

Chapter V
Discussion

The primary objective of this thesis was to conduct gender DIF analyses for the CES-D scale with each scoring method. In doing so, this thesis explored whether (a) gender DIF existed for the CES-D for the ordinal, presence, and persistence scoring formats, (b) any CES-D items were identified as displaying DIF irrespective of the scoring method, and (c) any CES-D items were found to display DIF for only some of the scoring methods.

5.1 Gender DIF for the CES-D Items

The results in Chapter IV showed that, depending on the scoring method used, at least three of the 20 CES-D items functioned differently among males and females. For the ordinal scoring method, the "crying" item (item 17) showed predominantly uniform DIF. Similarly, the "crying" item also showed uniform DIF when the CES-D was scored using the presence method. These findings were consistent with those of Cole et al.'s (2000) study of seniors.
However, it is not known whether the "crying" item functions differently for males and females when the persistence scoring method is used, because this sample does not have enough variability in the responses to these items for the ordinal logistic regression to be computed. Although an item showing almost no variability may be considered a poor indicator of the construct being measured, this is unlikely in this context because the items hold up well using the other two scoring methods. It is more likely that these are good items, and that the low variability for these items arises only because of the strict criterion used with the persistence method of scoring. Furthermore, given that these data came from the general population, it is unlikely that enough people in this sample would experience symptoms such as "I had crying spells" for 3-7 days, creating very little variability. That is, individuals from the general population are more likely to experience such symptoms for less than two days, an endorsement indicating that an individual is unlikely to be depressed under the persistence scoring method.

The results found with the persistence scoring method showed that item 7 (everything was an effort) and item 8 (felt hopeful) were flagged as showing gender DIF. Although these items were flagged as showing gender DIF, this result should be interpreted with some caution because of the low variability found in the responses (item 7, p = 0.125; item 8, p = 0.230). In fact, a close look at all the responses with this scoring format reveals low variability for all the items, which supports the notion that using the persistence scoring method with data from a general population is not appropriate. One needs variability in the item responses for an item to have psychometric utility; that is, the aim of psychometric methods is to differentiate among respondents, and therefore an item without variability does not aid in this differentiation.
To this end, it can be concluded that the "crying" item displays gender DIF for both the ordinal and presence scoring methods. That is, an individual's gender influences how one endorses the "crying" item on the CES-D. In other words, re-scoring the CES-D from the ordinal method to the presence method does not remove the gender DIF of the "crying" item. On the other hand, items 7 and 8 display gender DIF only with the persistence method.

5.2 The Effect of Different Scoring Methods on DIF

The findings in this thesis not only demonstrate that DIF is a property of the item (e.g., the crying item), they also show that DIF is a property of the scoring method. As mentioned in the introduction of this thesis, no previous study had compared DIF for various scoring formats with the same instrument. That is, prior to this thesis, it was not known whether DIF was dependent on various scoring formats. Given that the CES-D has various scoring formats, this instrument was used to ascertain whether DIF was dependent on the scoring method. As presented in the results section, Chapter IV, different items displayed gender DIF depending on the scoring format used. Thus, when one thinks about DIF one must think about the items, the scoring method, and the interaction of the items and scoring method used. Furthermore, because the scoring method is dependent on the purpose of the instrument, DIF is also a property of the purpose of the measure. With this in mind, researchers exploring why an item displays DIF must consider not only the item, but also the scoring format used and the purpose of the instrument. This is relevant in practice because there are several other instruments that have a Likert-type response format and a variety of suggested binary scoring methods, such as the widely used General Health Questionnaire (Goldberg, 1972; Goldberg & Williams, 2000).
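To illustrate how one set of responses yields different data under each scoring format, the sketch below recodes ordinal CES-D responses (0 to 3) into the two binary formats. The cut-points are assumptions made for illustration: presence is taken as any response above 0, and persistence as a response of 2 or 3, following the "3-7 days" wording in Section 5.1. The rules actually applied are those defined in the methods chapter; if persistence counts only the highest category, the threshold would be 3. The function names and the example responses are hypothetical.

```python
from typing import List


def score_presence(responses: List[int]) -> List[int]:
    """Score 1 if the symptom was reported at all (response > 0), else 0."""
    return [1 if r > 0 else 0 for r in responses]


def score_persistence(responses: List[int], threshold: int = 2) -> List[int]:
    """Score 1 if the response is at or above the threshold, else 0.

    threshold=2 is an assumption for this sketch, not a value taken
    from the thesis's methods chapter.
    """
    return [1 if r >= threshold else 0 for r in responses]


def endorsement_proportion(scores: List[int]) -> float:
    """Proportion of respondents scored 1 on a binary item."""
    return sum(scores) / len(scores)


# Hypothetical ordinal responses (0-3) from eight respondents to one item.
ordinal = [0, 0, 1, 1, 2, 0, 3, 1]

presence = score_presence(ordinal)
persistence = score_persistence(ordinal)

print("presence:   ", presence, endorsement_proportion(presence))
print("persistence:", persistence, endorsement_proportion(persistence))
```

With these made-up responses the same item is endorsed by most respondents under presence scoring and by few under persistence scoring, which is why a DIF analysis can reach different conclusions for the two formats.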
5.3 Implications

Exploring why an item displays DIF is pertinent for decisions regarding what to do with items that function differently for different groups. That is, an item displaying DIF should not be discarded from the instrument before experts in the area clearly understand why the item is endorsed differently by different groups. An item displaying DIF may reflect item impact (i.e., the groups truly differ on the underlying factor being measured). For example, if the "crying" item is tapping depressive symptomology (i.e., a relevant characteristic of the measure), item impact may be present. For instance, women may be more depressed than men and therefore endorse the crying item more. However, if the "crying" item is tapping a factor other than depressive symptomology, item bias may be present. For example, the crying item would be biased if men endorsed it less often than women because men are socialized not to express their emotions.

In order to explore this issue, a talk-aloud protocol may be used wherein individuals orally describe what they are thinking as they respond to an item. This method provides information about the process of responding to an item. Moreover, it lends itself nicely to answering the question "What is it, in the process of responding to items, that causes bias?" That is, are there differences in the reasoning processes people use in biased versus unbiased items? Although this talk-aloud protocol approach has not yet been used in the DIF literature, the information it provides may help researchers decide what to do with an item that displays DIF, and it may suggest why the item displays DIF.

In addition, one also needs to consider the characteristics of the sample with which the DIF item was found. One should make sure that the item displays DIF across different samples.
For example, an item may show DIF between groups of seniors and young adults, but may not show DIF between groups of seniors and children (e.g., irritability). In terms of this thesis's findings, although DIF was found with a sample from the general population, comparisons and generalizations across other populations may be problematic. Specifically, this sample came from Northern British Columbia, an area that is composed of numerous small, rural towns and one city (population approximately 80,000). In order to generalize these findings across populations, it is necessary that future studies replicate this study with different populations (such as those from larger urban areas).

Future studies could explore gender DIF for other depression inventories and compare whether similar items display gender DIF. Comparisons across different items from different depression scales may provide a great deal of information regarding the types of items that are found to display gender DIF, and this information may be used as a basis for making informed judgements and decisions pertaining to the future use and role of such items.

References

American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author.

Aneshensel, C.S., Frerichs, R.R., & Huba, G.J. (1984). Depression and physical illness: A multiwave, nonrecursive causal model. Journal of Health and Social Behavior, 25, 350-371.

Angst, J., & Dobler-Mikola, A. (1984). Do the diagnostic criteria determine the sex ratio in depression? Journal of Affective Disorders, 7, 189-198.

Beck, A.T., Ward, C.G., Mendelson, M., Mack, J., & Erbaugh, J. (1961). An inventory for measuring depression. Archives of General Psychiatry, 4, 561-571.

Caetano, R. (1987). Alcohol use and depression among U.S. Hispanics. British Journal of Addictions, 82, 1245-1251.

Callahan, C.M., & Wolinsky, F.D. (1994). The effect of gender and race on the measurement properties of the CES-D in older adults. Medical Care, 32, 341-356.

Clark, V.A., Aneshensel, C.S., Frerichs, R.R., & Morgan, T.M. (1981). Analysis of effects of sex and age in response to items on the CES-D scale. Psychiatry Research, 5, 171-181.

Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155-159.

Cole, S.R., Kawachi, I., Maller, S.J., & Berkman, L.F. (2000). Test of item-response bias in the CES-D scale: Experience from the New Haven EPESE study. Journal of Clinical Epidemiology, 53, 285-289.

Coryell, W., Endicott, J., & Keller, M. (1992). Major depression in a nonclinical sample: Demographic and clinical risk factors for first onset. Archives of General Psychiatry, 49, 117-125.

Craig, T.J., & Van Natta, P.A. (1976). Presence and persistence of depressive symptoms in patient and community populations. American Journal of Psychiatry, 133, 1426-1429.

Culbertson, F.M. (1997). Depression and gender: An international review. American Psychologist, 52, 25-31.

DeForge, B.R., & Sobal, J. (1988). Self-report depression scales in the elderly: The relationship between the CES-D and Zung. International Journal of Psychiatry in Medicine, 18, 325-338.

Devins, G.M., Orme, C.M., Costello, C.G., Binik, W.M., Frizzell, B., Stam, H.J., & Pullin, W.M. (1988). Measuring depressive symptoms in illness populations: Psychometric properties of the Center for Epidemiologic Studies Depression (CES-D) scale. Psychology and Health, 2, 139-156.

Fennig, S., Schwartz, J.E., & Bromet, E.J. (1994). Are diagnostic criteria, time of episode and occupational impairment important determinants of the female:male ratio for major depression? Journal of Affective Disorders, 30, 147-154.

Frank, E., Carpenter, A.B., & Kupfer, D.J. (1988). Sex differences in recurrent depression: Are there any that are significant? American Journal of Psychiatry, 145, 41-45.
Gatz, M., & Hurwicz, M.L. (1990). Are old people more depressed? Cross-sectional data on Center for Epidemiological Studies Depression Scale factors. Psychology and Aging, 5, 284-290.

Goldberg, D.P. (1972). The detection of psychiatric illness by questionnaire (Maudsley Monograph No. 21). Oxford: Oxford University Press.

Goldberg, D., & Williams, P. (2000). A user's guide to the General Health Questionnaire. NFER-NELSON.

Hamilton, M. (1960). A rating scale for depression. Journal of Neurology, Neurosurgery and Psychiatry, 23, 56-65.

Hertzog, C., Van Alstine, J., Usala, P.D., Hultsch, D.F., & Dixon, R. (1990). Measurement properties of the Center for Epidemiological Studies Depression scale (CES-D) in older populations. Psychological Assessment: A Journal of Consulting and Clinical Psychology, 2, 64-72.

Hudson, W. (1982). A measurement package for clinical workers. Journal of Applied Behavioral Science, 18, 229-238.

Jodoin, M.G., & Gierl, M.J. (in press). Evaluating Type I error and power rates using an effect size measure with the logistic regression procedure for DIF detection. Applied Measurement in Education.

Kessler, R.C., Foster, C., Webster, P.S., & House, J.S. (1992). The relationship between age and depressive symptoms in two national surveys. Psychology and Aging, 7, 119-126.

King, D.A., & Buchwald, A.M. (1982). Sex differences in subclinical depression: Administration of the Beck Depression Inventory in public and private disclosure situations. Journal of Personality and Social Psychology, 42, 963-969.

Krause, N. (1986). Stress and sex differences in depressive symptoms among older adults. Journal of Gerontology, 41, 727-731.

Leon, A.C., Klerman, G.L., & Wickramaratne, P. (1993). Continuing female predominance in depressive illness. American Journal of Public Health, 83, 754-757.

Liang, J., Tran, T.V., Krause, N., & Markides, K.S. (1989). Generational differences in the structure of the CES-D scale in Mexican Americans. Journal of Gerontology: Social Sciences, 44, S110-S120.

Lopez, S.R. (1989). Patient variable biases in clinical judgement: Conceptual overview and methodological considerations. Psychological Bulletin, 106, 184-203.

Loring, M., & Powell, B. (1988). Gender, race, and DSM-III: A study of the objectivity of psychiatric diagnostic behavior. Journal of Health and Social Behavior, 29, 1-22.

Myers, J.K., & Weissman, M.M. (1980). Use of a self-report symptom scale to detect depression in a community sample. American Journal of Psychiatry, 137, 1081-1084.

Nolen-Hoeksema, S. (1987). Sex differences in unipolar depression: Evidence and theory. Psychological Bulletin, 101, 259-282.

Nolen-Hoeksema, S. (1990). Sex differences in depression. Stanford, CA: Stanford University Press.

Potts, M.K., Burnam, M.A., & Wells, K.B. (1991). Gender differences in depression detection: A comparison of clinical diagnosis and standardized assessment. Psychological Assessment: A Journal of Consulting and Clinical Psychology, 3, 609-615.

Radloff, L.S. (1977). The CES-D scale: A self-report depression scale for research in the general population. Applied Psychological Measurement, 3, 385-401.

Radloff, L.S., & Locke, B.Z. (1986). The community mental health assessment survey and the CES-D Scale. In A.E. Slaby (Series Ed.), Community surveys of psychiatric disorders (pp. 177-189). New Brunswick, NJ: Rutgers University Press.

Roberts, R.E. (1980). Reliability of the CES-D scale in different ethnic contexts. Psychiatry Research, 2, 125-134.

Roberts, R.E., Andrews, J.A., Lewinsohn, P.M., & Hops, H. (1990). Assessment of depression in adolescents using the Center for Epidemiological Studies Depression scale. Psychological Assessment, 2, 122-128.

Roberts, R.E., & Vernon, S.W. (1983). The Center for Epidemiological Studies Depression scale: Its use in a community sample. American Journal of Psychiatry, 140, 41-46.

Santor, D.A., & Coyne, J.C. (1997). Shortening the CES-D to improve its ability to detect cases of depression. Psychological Assessment, 9, 233-243.

Santor, D.A., & Ramsay, J.O. (1998). Progress in the technology of measurement: Applications of item response models. Psychological Assessment, 10, 345-359.

Santor, D.A., Ramsay, J.O., & Zuroff, D.C. (1994). Nonparametric item analyses of the Beck Depression Inventory: Evaluating gender item bias and response option weights. Psychological Assessment, 6, 255-270.

Snyder, V.N.S., Cervantes, R.C., & Padilla, A.M. (1990). Gender and ethnic differences in psychosocial stress and generalized distress among Hispanics. Sex Roles, 22, 441-453.

Sonnenberg, C.M., Beekman, A.T.F., Deeg, D.J.H., & Van Tilburg, W. (2000). Sex differences in late-life depression. Acta Psychiatrica Scandinavica, 101, 286-292.

Stommel, M., Given, B.A., Given, C.W., Kalaian, H.A., Schulz, R., & McCorkle, R. (1993). Gender bias in the measurement properties of the Center for Epidemiologic Studies Depression Scale (CES-D). Psychiatry Research, 49, 239-250.

Tousignant, M., Brosseau, R., & Tremblay, L. (1987). Sex biases in mental health scales: Do women tend to report less serious symptoms and confide more than men? Psychological Medicine, 17, 203-215.

Vera, M., Alegria, M., Freeman, D., Robles, R.R., Rios, R., & Rios, C.F. (1991). Depressive symptoms among Puerto Ricans: Island poor compared with residents of the New York City area. American Journal of Epidemiology, 134, 502-510.

Weissman, M.M., Orvaschel, H., & Padian, N. (1980). Children's symptoms and social functioning self-report scales: Comparison of mother's and children's reports. Journal of Nervous and Mental Disease, 168, 736-740.

Weissman, M.M., Sholomskas, D., Pottenger, M., Prusoff, B.A., & Locke, B.Z. (1977). Assessing depressive symptoms in five psychiatric populations: A validation study. American Journal of Epidemiology, 106, 203-214.

Wilhelm, K., & Parker, G. (1994). Sex differences in lifetime depression rates: Fact or artefact? Psychological Medicine, 24, 97-111.

Winokur, G., & Clayton, P. (1967). Sex differences and alcoholism in primary affective illness. British Journal of Psychiatry, 113, 973-979.

Wrobel, N.H. (1993). Effect of patient age and gender on clinical decisions. Professional Psychology: Research and Practice, 24, 206-212.

Yesavage, J.A., Brink, T.L., Rose, T.L., Lum, O., Huang, V., Adey, M.B., & Leirer, V.O. (1983). Development and validation of a geriatric depression screening scale: A preliminary report. Journal of Psychiatric Research, 17, 37-49.

Zumbo, B.D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Ottawa, ON: Directorate of Human Resources Research and Evaluation, Department of National Defense.

Zumbo, B.D. (in press). Does item-level DIF manifest itself in scale-level analyses?: Implications for translating language tests. Language Testing.

Zumbo, B.D., Gelin, M.N., & Hubley, A.M. (in press). The construction and use of psychological tests and measures. Encyclopedia of Life Support Systems. United Nations Educational, Scientific and Cultural Organization Publishing (UNESCO-EOLSS Publishing), France.

Zung, W.W.K. (1965). A self-rating depression scale. Archives of General Psychiatry, 12, 63-73.

Zunzunegui, M.V., Beland, F., Llacer, A., & Leon, V. (1998). Gender differences in depressive symptoms among Spanish elderly. Social Psychiatry and Psychiatric Epidemiology, 33, 195-205.

Appendix A

SPSS Syntax File for the Ordinal Logistic Regression and Corresponding Effect Size Estimator

* SPSS SYNTAX written by: Bruno D. Zumbo, PhD .
* University of British Columbia .
* e-mail: brunozumbo@ubc.ca .
* Instructions .
* Copy this file and the file "ologit2.inc", and your SPSS data file into the same folder .
* Change the filename, currently 'ordinal.sav' to your file name .
* Change 'item', 'total', and 'grp', to the corresponding variables in your file.
* Run this entire syntax command file.

include file='ologit2.inc'.
execute.

GET FILE='C:\ordinal.sav'.
EXECUTE.

compute item= item1.
compute total= cesdtot.
compute grp= gender.

* Regression model with the conditioning variable, total score, in alone.
ologit var = item total
  /output=all.
execute.

* Regression model adding uniform DIF to model.
ologit var = item total grp
  /contrast grp=indicator
  /output=all.
execute.

* Regression model adding non-uniform DIF to the model.
ologit var = item total grp total*grp
  /contrast grp=indicator
  /output=all.
execute.
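The three nested models in the syntax above feed a DIF decision in the logistic regression framework of Zumbo (1999): the chi-square difference between the first and third models is tested on 2 degrees of freedom, and the difference in R-squared between models serves as an effect size. The Python sketch below shows only that arithmetic and is an illustration under stated assumptions, not the thesis's procedure verbatim. The function names, the example chi-square and R-squared values, and the alpha level are all hypothetical; in practice the inputs would be read from the output of the three ologit runs.

```python
import math


def chi2_sf_2df(x: float) -> float:
    """Upper-tail probability of a chi-square variate with 2 df.

    For 2 degrees of freedom the survival function has the closed
    form exp(-x / 2), so no statistics library is needed.
    """
    if x < 0:
        raise ValueError("chi-square difference must be non-negative")
    return math.exp(-x / 2.0)


def dif_summary(chisq_m1, chisq_m3, r2_m1, r2_m2, r2_m3, alpha=0.01):
    """Summarise a DIF test from three nested models.

    m1: total score only; m2: adds group (uniform DIF);
    m3: adds the total-by-group interaction (non-uniform DIF).
    alpha=0.01 is an assumed significance level for this sketch.
    """
    chisq_diff = chisq_m3 - chisq_m1
    p_value = chi2_sf_2df(chisq_diff)
    return {
        "chisq_diff": chisq_diff,
        "p_value": p_value,
        "r2_total": r2_m3 - r2_m1,       # overall DIF effect size
        "r2_uniform": r2_m2 - r2_m1,     # share due to the group term
        "r2_nonuniform": r2_m3 - r2_m2,  # share due to the interaction
        "flagged": p_value < alpha,
    }


# Hypothetical model output for one item (not values from the thesis).
result = dif_summary(
    chisq_m1=250.0, chisq_m3=262.0,
    r2_m1=0.400, r2_m2=0.430, r2_m3=0.435,
)
print(result)
```

Comparing the uniform and non-uniform R-squared increments indicates which kind of DIF predominates; how large the total increment must be to count as meaningful depends on the effect-size guideline adopted.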
