UBC Theses and Dissertations

Gender differential item functioning effects on various item response formats of the CES-D
Gelin, Michaela Nicole
2001

GENDER DIFFERENTIAL ITEM FUNCTIONING EFFECTS ON VARIOUS ITEM RESPONSE FORMATS OF THE CES-D

by

MICHAELA NICOLE GELIN

B.A., University of British Columbia, 1998
Diploma in Guidance Studies, University of British Columbia, 1999

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF ARTS
in
THE FACULTY OF GRADUATE STUDIES
DEPARTMENT OF EDUCATIONAL AND COUNSELING PSYCHOLOGY, AND SPECIAL EDUCATION
With Specialization in MEASUREMENT, EVALUATION, AND RESEARCH METHODOLOGY

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
April 2001
© Michaela Nicole Gelin, 2001

Authorization Form

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Educational and Counselling Psychology
The University of British Columbia
Vancouver, Canada
Date: [illegible]

Abstract

The present study investigated potentially biased scale items on the Center for Epidemiologic Studies Depression (CES-D) scale in a sample of 600 community-dwelling adults between the ages of 17 and 87 years.
The mean age was 46 years for males (N=310) and 42 years for females (N=290). The 20-item CES-D was scored using two binary methods (presence and persistence) and one ordinal method. Gender differential item functioning (DIF) was explored under all three scoring methods using Zumbo's (1999) ordinal logistic regression method and its corresponding effect size estimator. After statistically matching males and females on the underlying ability, gender DIF was found for the CES-D item crying under the ordinal and presence methods of scoring. The persistence scoring method identified two DIF items (effort and hopeful); however, this scoring method was of limited use due to low response rates on some items. Overall, the results indicate that the scoring method has an effect on DIF; thus DIF is a property of the item, scoring method, and purpose of the instrument.

Table of Contents

Abstract
List of Tables
List of Figures
Acknowledgements

Chapter I: Introduction to the Problem
    1.1 Introduction
    1.2 What is Differential Item Functioning (DIF)?
    1.3 Ordinal Logistic Regression Method

Chapter II: Literature Review
    2.1 Gender Differences with the CES-D
    2.2 Previous Applications of DIF to Depression Measures
    2.3 Explanations for Gender Differences
    2.4 Research Questions

Chapter III: Methodology
    3.1 Participants
    3.2 Measure
        3.2.1 CES-D Item and Total Scoring
    3.3 Analysis

Chapter IV: Results
    4.1 Introduction
    4.2 Assumptions
    4.3 Ordinal Scored
        4.3.1 Classical Analysis
        4.3.2 Gender DIF Analysis
    4.4 Presence Scored
        4.4.1 Classical Analysis
        4.4.2 Gender DIF Analysis
    4.5 Persistence Scored
        4.5.1 Classical Analysis
        4.5.2 Gender DIF Analysis

Chapter V: Discussion
    5.1 Gender DIF for the CES-D Items
    5.2 The Effect of Different Scoring Methods on DIF
    5.3 Implications

References

Appendix
    Appendix A: SPSS Syntax for the Ordinal Logistic Regression and Corresponding Effect Size Estimator

List of Tables

Table 1: Results from Zumbo's (1999) ordinal logistic regression method and corresponding effect size measure for Ordinal scoring method - CES-D items
Table 2: Results from Zumbo's (1999) ordinal logistic regression method and corresponding effect size measure for Presence scoring method - CES-D items
Table 3: Results from Zumbo's (1999) ordinal logistic regression method and corresponding effect size measure for Persistence scoring method - CES-D items
Table 4: Item endorsement proportions by gender for the CES-D scale (310 males; 290 females) using the persistence scoring method

List of Figures

Figure 1: Center for Epidemiology Depression Scale (CES-D)
Figure 2: Scoring the CES-D

Acknowledgements

I first wish to acknowledge my mentor and supervisor, Dr. Bruno Zumbo, for his guidance, statistical expertise, enthusiasm, and enduring support throughout all stages of this thesis. His assistance has been invaluable, from the earliest ideas to the final drafts.
I would also like to acknowledge my committee member, Dr. Anita Hubley, for providing insightful and thoughtful suggestions throughout this thesis. I would also like to thank Dr. Beth Haverkamp for being the external examiner. Additional thanks are due especially to my parents for their unwavering support and confidence as I pursued both my undergraduate and graduate degrees. Finally, I would like to acknowledge the Institute for Social Research and Evaluation at the University of Northern British Columbia for providing the data used in this thesis.

Chapter I
Introduction to the Problem

1.1 Introduction

The Center for Epidemiologic Studies Depression scale (CES-D; Radloff, 1977) is a widely used self-report measure developed for use in studies exploring the epidemiology of depressive symptomology in the general population. This scale has also been used in numerous studies to compare the prevalence of depressive symptomology in different groups such as (1) racial/ethnic groups (Aneshensel, Frerichs, & Huba, 1984; Snyder, Cervantes, & Padilla, 1990; Vera et al., 1991), (2) age groups (Gatz & Hurwicz, 1990; Hertzog, Van Alstine, Usala, Hultsch, & Dixon, 1990; Kessler, Foster, Webster, & House, 1992; Liang, Tran, Krause, & Markides, 1989), (3) nonpsychiatric (Devins et al., 1988) and psychiatric (Weissman, Sholomskas, Pottenger, Prusoff, & Locke, 1977) medically ill groups, and (4) men and women (Clark, Aneshensel, Frerichs, & Morgan, 1981; Krause, 1986; Roberts, Andrews, Lewinsohn, & Hops, 1990; Snyder et al., 1990; Sonnenberg, Beekman, Deeg, & Van Tilburg, 2000; Stommel et al., 1993; Zunzunegui, Beland, Llacer, & Leon, 1998). The majority of these studies reported differences between the groups of interest by comparing the mean scale scores.
However, the scale mean differences could indicate that the two groups have a different probability of endorsing the items or that the test items function differently for the two groups (e.g., males and females). Therefore, in order to interpret these comparisons accurately, one needs to investigate whether scale score differences are correctly attributable to the construct of interest or whether they are being erroneously attributed to the construct of interest (and hence are spurious). That is, if DIF results are erroneously attributed to the construct of interest, leading to the inference that there are real differences when the differences are actually due to an irrelevant factor, item bias is evident. Conversely, if DIF results are correctly attributable to the construct of interest, leading to the inference that there are real differences between the groups, item impact is evident. The first step required to investigate whether there is item bias or item impact is to investigate differential item functioning (DIF).

In this thesis, gender-based DIF will be explored with the CES-D to identify test items that function differently between males and females. These comparison groups were chosen because of the large number of studies reporting that women are much more likely than men to report high levels of depressive symptoms.

The CES-D is comprised of 20 items that reflect various aspects of depressive symptomology. Furthermore, the CES-D is used with three types of scoring: ordinal, presence, and persistence. The ordinal scoring method uses a Likert-type format that is intended to identify the presence and severity of depressive symptoms (Radloff & Locke, 1986). Alternatively, the presence and persistence scoring methods use a binary scoring format. The essential difference between these two methods is that the latter requires that the symptomology be present longer than the former.
DIF should be investigated for each of these scoring methods because it is unclear whether the scoring methods affect DIF. In essence, if an item is found to display DIF with the ordinal scoring method, will it still perform differentially with one of the binary scoring methods or will re-scoring the measure remove the DIF? Furthermore, if scoring methods affect DIF, then inferences derived from DIF results based on one scoring method will not be appropriate or valid for a different scoring method. No previous study has compared DIF for various scoring methods with the same instrument.

The purpose of this thesis is to investigate whether, for each scoring method, any CES-D scale items exhibit gender DIF. The DIF will be explored using Zumbo's (1999) ordinal logistic regression method along with his corresponding effect size estimator, which are used in combination to help identify differentially functioning items. In particular, this technique is used to investigate whether or not males and females have a different probability of endorsing the items on the CES-D. If DIF is found with some items, then further investigations are needed to determine if the DIF is because of item bias or item impact.

Determining whether an item displays bias or impact has a number of significant implications for researchers, selection personnel, test takers, and policy makers. The primary issue is one of consequential matters of test fairness and equity. That is, there should be a level playing field where men and women have equal opportunities (e.g., in the personnel selection context) and are treated equitably (e.g., in screening for depression it is inappropriate to portray women as being more depressed than men if it is an artifact of the measurement process).
To this end, the following section will briefly review differential item functioning (DIF), the relevant terminology found in the literature on bias, and Zumbo's ordinal logistic regression method for the detection of DIF. In the next chapter, the few studies that have explored gender-biased items of the CES-D will be reviewed. Next, Chapter III will describe the study methodology. This will include the study participants, a description of the CES-D measure and the different scoring methods. It will also include the analysis steps, criteria for determining a DIF item, and a brief discussion on the odds ratio statistic used to determine the direction of responding. The results of the study will be reported in Chapter IV. The final chapter will discuss the study results in terms of the research questions: (a) gender DIF for the CES-D items and (b) the effect of different scoring methods on DIF. Next, implications for research and practice are considered, including what to do with an item that displays DIF. This is followed by limitations of the study and future directions. The thesis concludes with comments on the contribution of this study for measuring depression with the CES-D, gender differences on depression, and for DIF analyses with various scoring methods of the same instrument.

1.2 What is Differential Item Functioning (DIF)?

Differential item functioning (DIF) is a statistical technique that is used to identify differential item response patterns between groups of test-takers (e.g., male versus female, Caucasian versus African American). In assessing response patterns, the comparison groups, conceptualized in this study as men and women, are first statistically matched on the underlying construct of interest (e.g., depressive symptomology), then the DIF methods evaluate the response patterns to individual test items.
Thus, as Zumbo (1999) states, DIF occurs when examinees with the same underlying ability on the construct measured by the test, but who are from different groups, have a different probability of endorsing (or correctly answering) the item. He continues with a conceptualization of the basic principle of DIF: "If different groups of test-takers (e.g., males and females) have roughly the same level of something (e.g., knowledge), then they should perform similarly on individual test items regardless of group membership" (p. 5).

In this thesis, DIF matches males and females on depressive symptomology, measured by the CES-D total score. This study could have also matched males and females on a different criterion measure for the latent variable, such as a medical diagnosis of depression from a clinician. This matching alternative would be useful if, for example, the CES-D total score was found to be misleading or inappropriate, as would be the case if DIF were found for many CES-D items.

DIF differs from previous classical test theory techniques used to assess bias because DIF matches the groups of interest on the latent variable of interest; previous bias studies compared mean scores either without any matching technique or simply compared the factor structure for the groups of interest. Previous studies that found group differences on observed scores, such as group comparisons of scale means, may be misleading because respondents are not first matched on the construct of interest. Thus, matching groups on the variable measured by the test is important for determining whether item responses are equally valid for different groups.

However, it should be noted that DIF is a statistical method to flag potentially problematic items. Therefore, it is the first step in determining whether there is item bias or item impact. Further study would be needed by content experts to determine whether one has bias or impact.
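
The matching idea described above can be illustrated with a small sketch. This is not the method used in this thesis, which relies on ordinal logistic regression; it is a simplified illustration that groups respondents into total-score bands and compares item endorsement rates by gender within each band. The function name `endorsement_by_band`, the band width, and the toy records are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def endorsement_by_band(records, band_width=10):
    """Proportion endorsing an item, by total-score band and group.

    records: iterable of (group, total_score, endorsed) tuples, where
    endorsed is 0 or 1. The band width is an arbitrary illustrative choice.
    """
    counts = defaultdict(lambda: [0, 0])  # (band, group) -> [endorsed, n]
    for group, total, endorsed in records:
        band = total // band_width  # respondents in the same band are "matched"
        cell = counts[(band, group)]
        cell[0] += endorsed
        cell[1] += 1
    return {key: yes / n for key, (yes, n) in counts.items()}

# Hypothetical toy data: (gender, total score, endorsed the item?)
toy = [
    ("F", 12, 1), ("F", 15, 1), ("F", 18, 0), ("F", 3, 0),
    ("M", 11, 0), ("M", 16, 1), ("M", 19, 0), ("M", 4, 0),
]
rates = endorsement_by_band(toy)
for (band, group), rate in sorted(rates.items()):
    print(f"band {band}, group {group}: {rate:.2f}")
```

A gap between the groups within the same band, as in the higher band of the toy data, is the kind of pattern a DIF method would go on to test formally.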
Item bias is a value judgment with social, political, and ethical implications, and thus, takes into account the purpose of the test. Specifically, item bias requires that the source of the differential functioning of the item is irrelevant to the purpose of the test and/or interpretation of the measure. In essence, item bias is an artifact of the testing procedure. That is, item bias would occur if one group of test-takers (e.g., males) were less likely to endorse the item than the comparison group of test-takers (e.g., females) because the item is tapping a factor over-and-above the factor of interest. For example, if females were less likely to endorse an item from an achievement test of mathematical ability than men because the question required prior knowledge of basketball scores (assuming females do not know the point system used in basketball and males do) then the item is biased. Thus, for item bias to occur, DIF must be apparent; however, as Zumbo (1999) reminds us, "DIF is a necessary, but not sufficient, condition for item bias" (p. 12).

Item impact is evident when one group of examinees is found to endorse the item more than the other group of examinees because the two groups truly differ on the underlying ability or factor being measured by the test. That is, item impact occurs when the item measures a relevant characteristic of the test, and 'real' differences between the two groups of interest are found. For example, if females were less likely to endorse an item from an achievement test of mathematical ability than men matched on mathematical ability, and men and women truly differed on mathematical aptitude, item impact is present.

The distinction between whether the group differences are based on irrelevant or relevant characteristics of the measure is really a question about the purpose of the measure. Therefore, one needs to be clear about the purpose of the test before conducting the analysis.
As well, it is important to note that if an item is flagged as displaying DIF, it does not mean that the item should be automatically omitted from the scale. Rather, items that are flagged as displaying DIF should be carefully analyzed by experts in the appropriate area. For example, if a CES-D item were flagged as displaying DIF then depression researchers should carefully analyze why the item was flagged.

Taken as a whole, methodologically, the DIF analysis is computed by initially matching the two groups of interest on their underlying ability as determined by their overall performance on the test (i.e., the total test score). Next, the DIF statistic is computed and this indicates the extent to which members of one group perform differently from members of some other group who are of comparable overall ability. If DIF is found, further analyses are needed to determine if the DIF is because of item bias or item impact. This thesis does not continue with the investigation of the source of item bias or impact because it would take experts in the area of depression to conduct that study. Moreover, a study of item bias or impact with differential functioning items of the CES-D for a general population has not been reported in the literature.

1.3 Ordinal Logistic Regression Method

In order to calculate DIF in binary and ordinal scored items, this study used Zumbo's (1999) ordinal logistic regression method. To date, Zumbo's ordinal logistic regression method and corresponding measure of effect size is the only method available for both binary and ordinal scored items. Thus, one advantage of this method is that it allows for a direct comparison of the results from binary and ordinal scored items because only this one method is required.
That is, statistical method effects do not influence the results. In addition, this method has a corresponding effect size estimator that can be used with binary and ordinal items to help determine the magnitude of DIF. The effect size estimator is extremely important for this particular study because DIF is based on a large sample size and, without an examination of the effect size, trivial effects may appear to be statistically significant.

The ordinal logistic regression method of DIF will be used for the items on the CES-D that are scored using the ordinal method and it will be repeated for the items that are scored using the two binary methods. As described in Zumbo's (1999) handbook, this procedure uses the item response as the dependent variable, with the grouping variable (characterized as variable GRP), total scale score for each examinee (characterized as variable TOTAL) and a group by total interaction as independent variables. This can be expressed as a linear regression of predictor variables on a latent continuously distributed random variable, y*. The ordinal logistic regression equation is

$$y^* = b_0 + b_1\,\mathrm{TOTAL}_i + b_2\,\mathrm{GRP}_i + b_3\,(\mathrm{TOTAL} \times \mathrm{GRP})_i + \varepsilon_i$$

Zumbo's (1999) ordinal logistic regression method provides a test of DIF that measures the effect of group and the interaction, over-and-above the total scale score while, at the same time, statistically matching on the total scale score. This DIF method has a natural hierarchy of entering variables into the model in which the conditioning variable (i.e., the total score) is entered first. Next, the grouping variable (e.g., gender) is entered. This step measures the effect of the grouping variable while holding constant the effect of the conditioning variable. Finally, the interaction term (e.g., TOTAL*GENDER) is entered into the equation, which describes whether the difference between the group responses on an item varies over that latent variable continuum.
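
The entry order described above can be written out as three nested models. This is a sketch in the notation of the regression equation given earlier in this section; the step labels are added here for clarity.

```latex
\begin{align*}
\text{Step 1:}\quad y^* &= b_0 + b_1\,\mathrm{TOTAL}_i + \varepsilon_i \\
\text{Step 2:}\quad y^* &= b_0 + b_1\,\mathrm{TOTAL}_i + b_2\,\mathrm{GRP}_i + \varepsilon_i \\
\text{Step 3:}\quad y^* &= b_0 + b_1\,\mathrm{TOTAL}_i + b_2\,\mathrm{GRP}_i
                           + b_3\,(\mathrm{TOTAL} \times \mathrm{GRP})_i + \varepsilon_i
\end{align*}
```

Each step adds one term to the previous one, so the three models are nested and can be compared with one another.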
Each of these steps provides a Chi-squared statistic which is used in the statistical test of DIF. The DIF computation is basically the difference between the Chi-squared value for Step #3 and the Chi-squared value for Step #1. That is, the Chi-squared value for Step #1 is subtracted from the Chi-squared value for Step #3, giving a resultant two degrees of freedom Chi-squared value. The two degrees of freedom arise because it is the difference between the three degrees of freedom at Step #3 and the one degree of freedom at Step #1. Next, the p-value for this resultant two degrees of freedom Chi-squared test is determined by using a Chi-squared probability table that is found in most statistical textbooks.

Just as a Chi-squared statistic is computed for each step in Zumbo's (1999) ordinal logistic regression method, the corresponding effect size estimator is computed for each step. This corresponding effect size value is calculated as an R-squared which can be applied to both binary and ordinal items. Using these R-squared values, the magnitude of DIF can be computed by subtracting the R-squared value for Step #1 from that for Step #3.

Lastly, in order to classify an item as displaying DIF, one must consider both the two degrees of freedom Chi-squared test of DIF and Zumbo's corresponding effect size measure. Zumbo (1999) proposed two criteria that must be met for an item to be classified as displaying DIF. First, the two degrees of freedom Chi-squared test for DIF must have a p-value less than or equal to 0.01. Second, the corresponding effect size measure must have an R-squared (R²) value of at least 0.130. However, Jodoin and Gierl's (in press) investigation of DIF effect size measures suggests that this R² value is very conservative, and thus they propose a more liberal R² value for detecting DIF.
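
The arithmetic just described can be sketched in a few lines. This is a minimal illustration, assuming the Step #1 and Step #3 Chi-squared and R-squared values have already been obtained from fitted models; the function name `dif_test` and the input numbers are hypothetical. Instead of a printed table, the sketch uses the fact that with exactly two degrees of freedom the Chi-squared upper-tail probability equals exp(-x/2).

```python
import math

def dif_test(chi2_step1, chi2_step3, r2_step1, r2_step3,
             alpha=0.01, min_r2=0.130):
    """Two-criterion DIF screen from Step #1 and Step #3 model summaries.

    alpha and min_r2 default to the two criteria attributed to Zumbo (1999)
    in the text; both are parameters so other cut-offs can be substituted.
    """
    chi2_diff = chi2_step3 - chi2_step1   # 3 df - 1 df = 2 df
    # For exactly 2 degrees of freedom the chi-squared upper-tail
    # probability has the closed form exp(-x / 2).
    p_value = math.exp(-max(chi2_diff, 0.0) / 2.0)
    r2_diff = r2_step3 - r2_step1         # effect size (magnitude of DIF)
    flagged = p_value <= alpha and r2_diff >= min_r2
    return chi2_diff, p_value, r2_diff, flagged

# Hypothetical values, not results from this thesis.
chi2_diff, p, r2_diff, flagged = dif_test(210.4, 225.4, 0.410, 0.450)
print(f"chi2(2) = {chi2_diff:.1f}, p = {p:.5f}, "
      f"R2 change = {r2_diff:.3f}, DIF = {flagged}")
```

With these hypothetical inputs the two degrees of freedom test is statistically significant, but the R-squared change falls below the cut-off, so the item is not flagged. This is the situation the effect size criterion is meant to guard against in large samples.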
Specifically, Jodoin and Gierl propose R² values below 0.035 for negligible DIF, between 0.035 and 0.070 for moderate DIF, and above 0.070 for large DIF. Taken together, this thesis will require that (1) an item must have a p-value less than or equal to 0.01 with the two degrees of freedom Chi-squared test, and (2) the corresponding R² must be greater than or equal to 0.035 for an item to be classified as displaying DIF. If both of these criteria are met, Jodoin and Gierl's effect size criteria will be used to quantify the magnitude of DIF.

Furthermore, if DIF exists for an item, the steps computed in the calculation of DIF using Zumbo's (1999) ordinal logistic regression will be reviewed to determine if the DIF is uniform or non-uniform. Uniform DIF occurs when group membership affects the probability of endorsing an item but does not interact with the matching variable. That is, DIF functions in a uniform fashion across the latent continuum of variation (i.e., depression), as may occur when the DIF is attributable to differences in item difficulty only. This can be determined by comparing the R-squared values between steps #2 and #1 "to measure the unique variation attributable to the group differences over-and-above the conditioning variable (the total score)" (Zumbo, 1999, p. 26). Uniform DIF can also be graphically illustrated as two nonlinear regression lines (one for each group) with a substantial area between the two curves that do not cross over each other. The regression lines typically characterize the probability of endorsing an item as a function of an underlying construct.

If uniform DIF is found, the odds ratio will be used to interpret the direction of the DIF (i.e., are females or males more likely to endorse the item?). For the ordinal scoring method, the odds ratio is computed from Step #2 of Zumbo's (1999) ordinal logistic regression (i.e.,
the regression model adding uniform DIF to the model). Next, a χ²(1) test (Step #2 - Step #1) with corresponding p-value and the R² effect size values are computed, and the odds ratio is computed from the regression coefficient (more technically, the odds ratio is computed as the exponentiation of the regression coefficient).

Conversely, non-uniform DIF occurs when there is an interaction between group membership and the criterion variable (CES-D total score). Non-uniform DIF reflects a situation in which an item might differentially favor a group of respondents (e.g., males) at one end of the latent continuum and disfavor the comparison group (e.g., females) at the other end of the spectrum. In terms of the ordinal logistic regression used for detecting DIF in this thesis, non-uniform DIF can be determined by comparing the R-squared values at step #3 to the R-squared values at step #2. An item is considered to display uniform DIF if the difference between steps #2 and #3 is statistically non-significant and has a trivial effect size. Non-uniform DIF can also be graphically illustrated as two nonlinear regression lines (one for each group) that cross over each other and characterize the probability of endorsing an item as a function of an underlying construct. This graph would look similar to an interaction plot from an ANOVA.

Chapter II
Literature Review

2.1 Gender Differences with the CES-D

Since the introduction of the CES-D in 1977, numerous studies have documented that women tend to have higher scale scores than men on the CES-D. That is, women tend to endorse depressive symptoms more than men (e.g., Callahan & Wolinsky, 1994; Clark et al., 1981; Krause, 1986; Sonnenberg et al., 2000).
Furthermore, several studies exploring general depression have reported that women experience depression twice as frequently as men whether one looks at depressive symptoms or depressive disorders, and whether referred or non-referred samples are used (e.g., Culbertson, 1997; Leon, Klerman, & Wickramaratne, 1993; Nolen-Hoeksema, 1987). Moreover, this 2:1 prevalence ratio frequently has been found with individuals in the age range between late adolescence and approximately 64 years of age (Nolen-Hoeksema, 1990).

Item-level gender differences in depression self-report measures have also been documented. A number of studies report problematic items on the CES-D by comparing mean score differences or the factor structure between males and females. For example, Roberts et al. (1990) found the CES-D items "crying" and "appetite" had different factor loadings for males and females in a sample of adolescents. Unfortunately, these differences often are documented as a form of bias (e.g., gender bias) but the groups are not first matched. However, as mentioned previously, comparing mean score differences on items or total scores between two groups provides uninterpretable results. As Santor et al. (1994) express, "Finding an overall mean difference between two groups does not demonstrate bias, nor does failing to find a difference preclude the possibility of bias" (p. 256).

To date only one study (Cole, Kawachi, Maller, & Berkman, 2000) has presented item-level gender DIF with the CES-D and this study only explored the CES-D with the ordinal scoring method. Although Cole et al. label their methodology as an extension of the Mantel-Haenszel method of detecting DIF, a closer look reveals that they are using Zumbo's method of modelling DIF through ordinal logistic regression, except that they do not use the R² effect size method. Instead, they report on the odds ratio as an effect size.
Using this technique with a sample of 2340 community-dwelling adults 65 years of age or older, Cole et al. found that the CES-D item "crying" functioned differently; the proportional odds of women responding higher on the "crying" item were 2.14 times that of men matched on overall depressive symptoms. That is, the odds of endorsing the "crying" item were more than twice as high for women as for men.

While the study by Cole et al. (2000) is the only item-level study of gender differences with the CES-D, Stommel et al. (1993) assessed item bias by gender on the CES-D using factor analysis. Using a series of multi-sample confirmatory factor analysis models on a sample of 1212 subjects (708 cancer patients between the ages of 19-89, average age of 61; 504 caregivers of chronically ill elderly between the ages of 18-88, average age of 63), Stommel et al. (1993) found that the items "crying" and "talked less" were gender biased. Females were more likely to endorse the item "crying spells" compared to males, while females were less likely to endorse the item "talked less" compared to males matched on overall depression. The technique they used was an ordinary least-squares multiple regression with (a) the item of interest as the dependent variable, (b) gender and the remaining items as predictors, and (c) the t-test of the gender variable examined to see if it is statistically significant. This approach is a primitive form of DIF analysis that only allows for uniform DIF and treats the dependent variable (i.e., the item of interest) as a continuous variable.

Although scale-level methods of factor analysis and reliability, such as the method used by Stommel et al. (1993), are commonly used on their own to investigate item bias, a recent paper by Zumbo (in press) demonstrates that item-level DIF does not manifest itself in scale-level methods. In other words, factor analysis by itself will not necessarily detect DIF.
Accordingly, Zumbo (in press) recommends that one must do item-by-item analysis to identify differentially functioning items. He also states that factor analysis can still be conducted; however, it answers a different question. It can only confirm that the test is measuring the same thing in both groups.

However, differential item functioning by gender with the CES-D has not been explored using a sample with a broad age range from the general population, nor has it been explored using the various scoring methods of the CES-D. The Cole et al. (2000) study only used the ordinal scoring. In fact, the study reported in this thesis is the first to compare DIF across various scoring methods, ordinal and binary.

2.2 Previous Applications of DIF to Depression Measures

A broader literature review on the use of DIF methods (e.g., Rasch, Mantel-Haenszel, and logistic regression methods) applied to other depression measures such as the Beck Depression Inventory (BDI; Beck, Ward, Mendelson, Mock, & Erbaugh, 1961), Geriatric Depression Scale (GDS; Yesavage et al., 1983), Hamilton Rating Scale for Depression (HRSD; Hamilton, 1960), Hudson's Generalized Contentment Scale (GCS; Hudson, 1982) and/or Zung's Self-rating Depression Scale (SDS; Zung, 1965) was carried out. This extensive literature search revealed that DIF techniques have been used only with the BDI and HRSD.

Using nonparametric item response modelling, Santor, Ramsay, and Zuroff (1994) examined gender item bias on the BDI with a sample of depressed outpatient (N=648) and nonpatient college (N=1182) individuals. Each of the 21 items on the BDI consisted of four graded statements that were scored from 0 to 3, and a total depression score was computed by summing all of the scaled responses.
Examining gender bias as a function of the severity of depression in the depressed outpatient sample revealed three BDI items demonstrating DIF: item 6 (sense of punishment), item 10 (crying), and item 14 (distortion of body image). "Overall bias was defined as the weighted squared difference between the [option characteristic] curves for men and women" (Santor et al., p. 261). A similar analysis was conducted with the college sample. Results from this sample also revealed that item 14 (distortion of body image) demonstrated the greatest amount of gender bias in expected item score; however, no gender bias was found for item 6 (sense of punishment) or item 10 (crying).

In a more recent study, Santor and Ramsay (1998) used item 1 (depressed mood) from the HRSD as an example to illustrate DIF between depressed (N = 418) and nondepressed (N = 238) individuals. After individuals from the two groups were matched on depressive severity (e.g., total HRSD score using an ordinal scoring method), clinically depressed individuals were more likely than nondepressed individuals to endorse item 1 (depressed mood). The other items on the HRSD have not been examined for DIF.

2.3 Explanations for Gender Differences

The prevalence of findings on gender differences has led to several proposed explanations for the apparent differences. These explanations include genetic causes (i.e., depression may be genetically transmitted), social and personality factors, and the role of endocrine factors. Another possibility is that the prevalence rates of depression are, in fact, equal in men and women, and that the apparent gender differences reflect an artifact rather than true differences.
That is, the apparent gender differences do not reflect differences in depression per se, but rather reflect differences in the way men and women express depressive symptoms (e.g., Winokur & Clayton, 1967), recall depressive symptoms (e.g., Angst & Dobler-Mikola, 1984), or are willing to report depressive symptoms (e.g., Frank, Carpenter, & Kupfer, 1988), and/or diagnostic biases among mental health professionals (e.g., Lopez, 1989; Potts, Burnam, & Wells, 1991; Wrobel, 1993). However, there is no clear evidence that men and women express depressive symptoms differently (Nolen-Hoeksema, 1987). Similarly, findings are inconclusive as to whether gender differences are accounted for by a greater tendency for women to remember symptoms (e.g., Coryell, Endicott, & Keller, 1992; Fennig, Schwartz, & Bromet, 1994; Wilhelm & Parker, 1994) and/or to admit to experiencing depressive symptoms (e.g., King & Buchwald, 1982; Tousignant, Brosseau, & Tremblay, 1987). On the other hand, a number of studies have supported the possibility that mental health practitioners tend to overdiagnose depression in women and underdiagnose depression in men (Lopez, 1989; Loring & Powell, 1988; Potts et al., 1991; Wrobel, 1993), thereby contributing to artificial prevalence rates.

What is unclear from this literature is whether the differences being found are (a) real differences (impact), (b) differences due to a measurement artifact (bias), or (c) real differences that have been minimized or exaggerated by measurement artifacts.
2.4 Research Questions

Given that (a) few studies have explored gender DIF in depression measures, (b) only one study has investigated gender DIF for the CES-D, and that study (Cole et al., 2000) focused on one sample of seniors 65 years of age or older, and (c) no study has compared DIF for ordinal and binary item formats on the same scale, the present study is needed and will contribute to the literature on depression and gender differences, the CES-D, and the psychometrics of DIF. The research questions investigated in this thesis are:

i)   Does gender DIF exist for the CES-D for the ordinal, presence, and persistence scoring formats?
ii)  Are any CES-D items found to display DIF irrespective of the scoring method (i.e., for all the scoring methods)?
iii) Are any CES-D items found to display DIF for only some of the scoring methods?

Therefore, the two purposes of this thesis are (a) to investigate gender DIF for the CES-D items, and (b) to investigate whether different scoring methods affect the DIF results. Given Cole et al.'s (2000) findings with a sample of seniors, I expect that the crying item will demonstrate gender DIF for the ordinal scoring method in a general population. However, given that there has been no empirical or theoretical work comparing the effect of scoring method, it is unclear whether the different scoring methods have an effect on DIF.

Chapter Three

Methodology

3.1 Participants

Participants in this study were drawn from the Health and Health Care Survey carried out by the Institute for Social Research and Evaluation (ISRE) at the University of Northern British Columbia, Canada, in the fall of 1998. The sample comprised 600 community-dwelling adults living in Northern British Columbia (290 females and 310 males) who were drawn randomly from the Dominion phone list.
The mean age of female participants was 42 years (SD = 13.4, range = 18 to 87 years), and the mean age of male participants was 46 years (SD = 12.1, range = 17 to 82 years).

3.2 Measure

The Center for Epidemiologic Studies Depression (CES-D) scale used in this study is a 20-item self-administered instrument originally introduced by Lenore Radloff (1977). This scale was designed to measure current feelings of depression in the general population. Although this scale has been applied to various clinical samples (e.g., Craig & Van Natta, 1976; Weissman et al., 1977), it was never designed to be used as a screening tool for identifying clinical depression (e.g., within standardized systems such as DSM-IV; American Psychiatric Association, 1994) or for discriminating among subtypes of depression. The CES-D has also been translated into many languages (e.g., Caetano, 1987), and it has been validated for use with a number of different ethnic groups (e.g., Roberts, 1980), as well as for specific age groups such as children (e.g., Weissman, Orvaschel, & Padian, 1980), adolescents, and the elderly (e.g., DeForge & Sobal, 1988; Gatz & Hurwicz, 1990).

3.2.1 CES-D Item and Total Scoring

The CES-D, reproduced in Figure 1, asks respondents to indicate the frequency/duration with which they have experienced a specific symptom associated with depression (e.g., "My sleep was restless") during the previous week. Each item has four options with specific anchors that correspond to the frequency with which each of the 20 symptoms was experienced. These anchors are intended to reflect differences in the presence and severity of depressive symptoms and are usually labelled as: Option 0, rarely or none of the time / less than 1 day; Option 1, some or a little of the time / 1-2 days; Option 2, occasionally or a moderate amount of the time / 3-4 days; and Option 3, most of the time / 5-7 days.
Using this response scale, the CES-D can be scored in three different ways: an ordinal scoring format and two dichotomous scoring formats (see Figure 2). Originally, the four options are scored 0, 1, 2, or 3, respectively, which may be termed the "ordinal" method of scoring. Next, the scoring of the positively worded items (items 4, 8, 12, and 16) is reversed, and then all 20 scaled responses are summed for a possible range of scores from 0 to 60.

Alternatively, the options may be scored dichotomously with respect to a specific threshold. The most popular dichotomous scoring method is termed the "presence" method of scoring. This method refers to a respondent's report of having experienced the symptom at least some of the time during the preceding week (i.e., for 1 to 7 days). It is used when researchers are interested only in the presence or absence of any depressive symptomatology. In this case, Option 0 is assigned a score of 0, indicating no depression, and all other response options (Options 1, 2, and 3) are assigned a score of 1, indicating depression. Then the positively worded items are reverse scored (ones are recoded as zeros, and vice versa), and lastly the scaled responses are summed for a possible total scale score of 0 to 20.

An alternative dichotomous format, termed the "persistence" method, is used when researchers are interested only in whether an individual is likely to be depressed. The "persistence" of a symptom usually refers to the respondent's report of having experienced the symptom for 3-7 days during the preceding week. For this method, Option 0 and Option 1 are scored as 0, and Option 2 and Option 3 are scored as 1. Next, the positively worded items are reverse scored, and then the scaled responses are summed for a possible range of scores between 0 and 20.[1] Studies using dichotomous formats have been presented by Clark et al. (1981), Craig and Van Natta (1976), Myers and Weissman (1980), Roberts and Vernon (1983), and Santor and Coyne (1997).

[1] An extreme version of the "persistence" method of scoring is also used with the CES-D, and refers to the respondent's report of having experienced the symptom for 5-7 days during the preceding week. This is extreme because an individual must endorse Option 3 for a symptom to be scored as indicating depression. This scoring method was not used in this thesis because there was not enough variability in the item responses for the ordinal logistic regression to be computed. That is, as expected in a general population survey, few respondents select 5-7 days.

Figure 1. Center for Epidemiologic Studies Depression (CES-D) Scale.

Center for Epidemiologic Studies Depression (CES-D) Scale: Format for Self-Administered Use

INSTRUCTIONS: Using the scale below, please circle the number for each statement that best describes how often you felt or behaved this way during the past week.

0 = Rarely or none of the time (less than 1 day)
1 = Some or a little of the time (1-2 days)
2 = Occasionally or a moderate amount of time (3-4 days)
3 = Most or all of the time (5-7 days)

DURING THE PAST WEEK (response columns: less than 1 day, 1-2 days, 3-4 days, 5-7 days):

1. I was bothered by things that usually don't bother me.   0  1  2  3
2. I did not feel like eating; my appetite was poor.   0  1  2  3
3. I felt that I could not shake off the blues even with help from my family or friends.   0  1  2  3
4. I felt that I was just as good as other people.   0  1  2  3
5. I had trouble keeping my mind on what I was doing.   0  1  2  3
6. I felt depressed.   0  1  2  3
7. I felt that everything I did was an effort.   0  1  2  3
8. I felt hopeful about the future.   0  1  2  3
9. I thought my life had been a failure.   0  1  2  3
10. I felt fearful.   0  1  2  3
11. My sleep was restless.   0  1  2  3
12. I was happy.   0  1  2  3
13. I talked less than usual.   0  1  2  3
14. I felt lonely.   0  1  2  3
15. People were unfriendly.   0  1  2  3
16. I enjoyed life.   0  1  2  3
17. I had crying spells.   0  1  2  3
18. I felt sad.   0  1  2  3
19. I felt that people dislike me.   0  1  2  3
20. I could not get "going".   0  1  2  3

Note: Items are summed after reverse scoring of items 4, 8, 12, and 16. Total CES-D scores range from 0 to 60, with higher scores indicating higher levels of general depression.

Figure 2. Scoring the CES-D.

Scoring the CES-D

Each item has four options:
Option 0, rarely or none of the time / less than 1 day
Option 1, some or a little of the time / 1-2 days
Option 2, occasionally or a moderate amount of the time / 3-4 days
Option 3, most of the time / 5-7 days

ORDINAL scoring method
> All four options are scored 0, 1, 2, or 3, respectively.
> The total score ranges from 0 to 60.

PRESENCE scoring method
> The respondent's report of having experienced the symptom at least some of the time during the preceding week (i.e., for 1 to 7 days).
  Option 0 is assigned 0 (indicating no depression)
  Options 1, 2, and 3 are assigned 1 (indicating depression)
> The total score ranges from 0 to 20.

PERSISTENCE scoring method
> The respondent's report of having experienced the symptom for 3-7 days during the preceding week.
  Options 0 and 1 are assigned 0 (indicating no depression)
  Options 2 and 3 are assigned 1 (indicating depression)
> The total score ranges from 0 to 20.

3.3 Analysis

Because reliability coefficients are often used as a standard of comparison to show equivalence of measures, they will be reported in this thesis. Reporting reliability coefficients is such common practice that not reporting them gives the impression of an incomplete analysis.
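As a rough illustration of the scoring rules described in Section 3.2.1 and summarized in Figure 2, the following Python sketch scores one respondent's 20 circled options under the ordinal, presence, and persistence methods. The function names and the example respondent are hypothetical; they are not part of the SPSS syntax used in this thesis, and the sketch assumes that dichotomous recoding is applied before reverse scoring, as described above.

```python
# Sketch of the three CES-D scoring methods (ordinal, presence, persistence).
# All names here are illustrative assumptions, not the thesis's own syntax.

REVERSED_ITEMS = {4, 8, 12, 16}  # positively worded items (1-based item numbers)


def score_item(option, item_number, method="ordinal"):
    """Score one circled option (0-3) for one item under the given method."""
    if option not in (0, 1, 2, 3):
        raise ValueError("option must be 0, 1, 2, or 3")
    if method == "ordinal":
        score, top = option, 3
    elif method == "presence":      # symptom reported for 1 to 7 days
        score, top = (1 if option >= 1 else 0), 1
    elif method == "persistence":   # symptom reported for 3 to 7 days
        score, top = (1 if option >= 2 else 0), 1
    else:
        raise ValueError("unknown scoring method")
    # Positively worded items are reversed after the recoding step.
    return top - score if item_number in REVERSED_ITEMS else score


def total_score(options, method="ordinal"):
    """Sum the 20 scored items; `options` holds the circled options for items 1-20."""
    if len(options) != 20:
        raise ValueError("the CES-D has 20 items")
    return sum(score_item(opt, i, method) for i, opt in enumerate(options, start=1))


# Hypothetical respondent who circles option 1 on every item.
responses = [1] * 20
print(total_score(responses, "ordinal"))      # 24
print(total_score(responses, "presence"))     # 16
print(total_score(responses, "persistence"))  # 4
```

The same respondent receives quite different totals under the three methods, which is why the total score used for matching in the DIF analyses depends on the scoring format.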
Specifically, for all three scoring methods of the CES-D, coefficient alpha reliabilities will be reported for males and females separately, and for the overall scale (males and females combined). However, although this classical test statistic is computed as an estimate of the test reliability, it should be interpreted cautiously because item-level DIF does not manifest itself in the reliability coefficient (Zumbo, in press). That is, the reliability estimates may be the same for males and females even though some items display DIF.

Next, gender DIF will be investigated for each item of the CES-D using Zumbo's (1999) ordinal method of logistic regression and corresponding effect size measure for each scoring method. In all three gender DIF analyses, gender (coded as 0 = female, 1 = male) was the grouping variable. The model can be expressed as

$$y^{*} = b_0 + b_1\,\mathrm{TOTAL} + b_2\,\mathrm{GENDER} + b_3\,(\mathrm{TOTAL} \times \mathrm{GENDER}) + \varepsilon$$

As this equation demonstrates, female and male respondents are initially matched according to their total test score (variable TOTAL) on the CES-D. As discussed previously, this total test score depends on the scoring format. Appendix A provides the SPSS syntax file used to calculate the ordinal logistic regression and corresponding effect size estimator. It should be noted that this syntax file calls a public domain SPSS macro (filename: ologit.inc) written by Prof. Dr. Steffen Kuhnel and modified by John Hendricks, University of Nijmegen, The Netherlands (see Appendix A).

As noted earlier in this thesis, the criterion for a DIF item is that (a) the χ²(2) has a p-value less than .01, and (b) the R² for this 2-df test is greater than or equal to 0.035. Jodoin and Gierl (in press) showed that this is a statistically powerful criterion.
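To make the flagging rule concrete, the following Python sketch applies the two-part criterion to a single item. It assumes, following Zumbo's (1999) description of the method, that the 2-df chi-squared and the DIF R² are obtained as differences between the full model (total score, gender, and their interaction) and the model containing the total score only. The function names and all numeric inputs are hypothetical and are not results from this thesis.

```python
import math


def chi2_sf_2df(x):
    """Upper-tail p-value for a chi-squared statistic with 2 degrees of freedom."""
    # With 2 df the chi-squared survival function has the closed form exp(-x / 2).
    return min(1.0, math.exp(-x / 2.0))


def dif_test(chi2_step1, chi2_step3, r2_step1, r2_step3,
             alpha=0.01, r2_cutoff=0.035):
    """Return (chi2_dif, p_value, r2_dif, flagged) for one item.

    Step 1 is assumed to contain the total score only; step 3 is assumed to
    add gender and the total-by-gender interaction (hence 2 df).
    """
    chi2_dif = chi2_step3 - chi2_step1
    r2_dif = r2_step3 - r2_step1
    p_value = chi2_sf_2df(chi2_dif)
    # Both conditions must hold: a significant 2-df test and a non-negligible R².
    flagged = (p_value < alpha) and (r2_dif >= r2_cutoff)
    return chi2_dif, p_value, r2_dif, flagged


# Made-up values: significant and with a sizeable effect, so the item is flagged.
print(dif_test(100.0, 120.0, 0.30, 0.34)[3])  # True
# Made-up values: significant but with a small effect, so the item is not flagged.
print(dif_test(100.0, 112.0, 0.30, 0.31)[3])  # False
```

Requiring both conditions means that an item with a statistically significant test but a small R² difference is not treated as a DIF item.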
In addition, Jodoin and Gierl's effect size criteria will be used to quantify the magnitude of DIF: R² values below 0.035 for negligible DIF, between 0.035 and 0.070 for moderate DIF, and above 0.070 for large DIF. The proportional odds ratio for each item will also be presented. The odds ratio will be used to help determine the direction of responding; that is, it will identify whether men or women are more likely to endorse the item. Moreover, it can be used to determine the odds of one group responding higher to an individual item than those in the corresponding group, after matching on overall depressive symptomatology. For example, a proportional odds ratio of 2.0 can be translated to mean that those in group one (e.g., females) are twice as likely to endorse the item as those in the comparison group (e.g., males coded as zero).

Chapter Four

Results

4.1 Introduction

The results of the analyses are reported for each scoring method separately, starting with the ordinal method, followed by the presence and persistence methods. For each scoring method, the analyses were calculated using SPSS 10.0 for Windows. Moreover, for each scoring method, results from Zumbo's ordinal logistic regression method and corresponding effect size measure, R-squared, are presented in tabular format. Each table lists the CES-D item number, 1 through 20, and the Chi-squared test statistic and corresponding R-squared effect size measure for each step in the model. The final column reports the DIF computation, which includes the two-degrees-of-freedom Chi-squared test statistic with its p-value, as well as the corresponding R-squared effect size value.

4.2 Assumptions

The key assumption in using ordinal logistic regression for DIF is essential unidimensionality, which presumes that the items on the CES-D measure only one dominant factor.
In the present case, results from a confirmatory factor analysis study support the unidimensionality of the scale (Zumbo, Gelin, & Hubley, in press). These authors found that a unidimensional model with method effects modelled for the four positively worded items provided the best fit.

4.3 Ordinal scored

4.3.1 Classical analyses

Using coefficient alpha, the reliabilities for the ordinal scored CES-D scale were 0.91 overall, 0.91 for males, and 0.90 for females.

4.3.2 Gender DIF analyses

As displayed in Table 1, the results from the ordinal scoring method show that item 17 (crying) displays large gender DIF (DIF R² = .218). Moreover, comparing the R-squared values at Steps #2 and #3, the data suggest that the "crying" item shows predominantly uniform DIF. Uniform DIF means that the effect of group membership (here, gender) on the probability of endorsing item 17 does not interact with the level of the matching variable. That is, for the "crying" item, DIF functions in a uniform fashion across the latent variable continuum. Moreover, the proportional odds of women responding higher on the item "I had crying spells" were 9.31 times those of men matched on the total score. That is, women were over nine times more likely to score higher on this item. It should be noted that items 2 (eating) and 18 (sad) showed Chi-squared p-values less than 0.01, but their effect sizes were small according to Jodoin and Gierl's (in press) criterion.

[Table 1: ordinal logistic regression DIF results by item for the ordinal scoring method. The table body could not be recovered from the source text.]
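The proportional odds ratios reported in this chapter can be related to the logistic regression coefficients and to the choice of comparison group. The short Python sketch below assumes the usual logistic regression convention that an odds ratio is the exponential of the corresponding coefficient and that reversing the comparison group inverts the ratio. The helper names are hypothetical, and the sign convention of any particular software package is not addressed.

```python
import math


def odds_ratio(coefficient):
    """Odds ratio implied by a logit coefficient (assumed convention: OR = exp(b))."""
    return math.exp(coefficient)


def swap_groups(ratio):
    """Express the same effect with the other group as the comparison group."""
    return 1.0 / ratio


# Odds ratio reported for women relative to men on item 17 (ordinal scoring).
women_vs_men = 9.31
implied_coefficient = math.log(women_vs_men)  # the same effect on the log-odds scale
men_vs_women = swap_groups(women_vs_men)

print(round(men_vs_women, 3))  # 0.107
```

An odds ratio above 1 for one group and the reciprocal value below 1 for the other describe the same effect, so the direction of DIF should always be read together with the group coding.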
4.4 Presence scored

4.4.1 Classical analyses

Using coefficient alpha, the reliabilities for the presence scored CES-D scale were 0.86 overall, 0.85 for males, and 0.87 for females.

4.4.2 Gender DIF analyses

The results for the presence scoring method, displayed in Table 2, show that only item 17 (crying) shows large gender DIF. Furthermore, the difference in R-squared values from Step #2 to Step #3 was relatively small, suggesting that the DIF is predominantly uniform. The proportional odds of women responding higher on the item "I had crying spells" were 7.51 times those of men matched on overall depressive symptoms. That is, women were 7.51 times more likely than men to endorse this item in a presence format. Unlike the ordinal scoring method, none of the other items had a Chi-squared p-value less than 0.01 (the threshold for DIF).

[Table 2: ordinal logistic regression DIF results by item for the presence scoring method. The table body could not be recovered from the source text.]
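The coefficient alpha values reported for each scoring method can be computed from a respondent-by-item matrix of scored responses. The Python sketch below uses the standard formula, alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores), where k is the number of items. The function name and the toy data are hypothetical and are not the thesis data.

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """Coefficient alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across respondents.
    item_variances = [pvariance([row[j] for row in rows]) for j in range(k)]
    # Variance of the summed (total) scores across respondents.
    total_variance = pvariance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Toy data: three items that rank four respondents identically.
consistent = [[0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 3, 3]]
# Toy data: two items that are unrelated to each other.
unrelated = [[0, 0], [0, 1], [1, 0], [1, 1]]

print(round(cronbach_alpha(consistent), 6))  # 1.0
print(round(cronbach_alpha(unrelated), 6))   # 0.0
```

To mirror the analyses reported here, the same function would be applied three times per scoring method: to all respondents, to males only, and to females only.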
4.5 Persistence scored

4.5.1 Classical analyses

Using coefficient alpha, the reliabilities for the persistence scored CES-D scale were 0.87 overall, 0.89 for males, and 0.85 for females.

4.5.2 Gender DIF analyses

For the persistence scoring method, the results from Table 3 show that items 7 (effort) and 8 (hopeful) show moderate gender DIF. Furthermore, comparing the R-squared values at Steps #2 and #3, the data suggest that the DIF for these items is predominantly uniform. The proportional odds of men responding higher on item 7 were 2.03 times those of women matched on the total score, and the proportional odds of men responding higher on item 8 were 1.90 times those of women matched on the total score. It should also be noted that the calculation for items 2, 3, 9, 10, 15, 17, and 19 could not be completed because there was a low probability of endorsing the items. In other words, there was not enough variability in the item responses for the ordinal logistic regression to be computed. It is very important to note that item 17 (crying) is among the items that have very low variability.

Table 4 shows the endorsement proportions for each item on the CES-D using the persistence scoring method. In classical test theory, these generally are defined as the proportion of examinees who answer an item correctly and, thus, are a measure of item difficulty. In this context, however, endorsement proportions more accurately indicate how much of the latent variable is required before an individual endorses a particular item. For example, using the persistence scoring method, an item is endorsed by an individual only if they experience the symptom for 3-7 days. In other words, this scoring method requires that an individual have a lot of the latent variable, depression, before they endorse an item.
Thus, with such a strict criterion for endorsing an item, it is no surprise that all 20 CES-D items have very low endorsement proportions, ranging from 3.8% to 27%. Table 4 also shows that items 2, 3, 9, 10, 15, 17, and 19 have low endorsement proportions ranging from 3.8% to 6.7%. Consequently, these items exhibit very little variability.

Endorsement proportions by gender were also calculated. As shown in Table 4, females had higher endorsement proportions than males for 16 of the 20 CES-D items. However, according to Cohen's (1992) effect size criteria, the difference in endorsement proportions between males and females was trivial. Moreover, as noted in the introduction of this thesis, one should interpret these gender differences cautiously because males and females have not been matched on the latent trait, which is why DIF analyses are needed. The implications of these very low endorsement proportions will be presented in the discussion section of this thesis, Chapter V.

[Table 3: ordinal logistic regression DIF results by item for the persistence scoring method. The table body could not be recovered from the source text.]
n cs cs  r-  CS CS in "-J © CS cn ri-H CN  rt rrt rt  Tt ©'  NO  rt rt  rt rt rt CN  m cs  ccj +-> 'fi  -a  >>  cn  a CJ  T t  £ CJ  >n NO  a CJ  E E CJ  OC ON  CS c n  ©  rt rt rt rt  i-H  in  NO  rt rt  1—1  00  ON  rt rt  CD  ts  00'  o  l-H  Vi  © CN  saaaaaa aaaaaa CJ  CJ  CJ  CJ  CJ  CJ  CJ  CJ  CJ  CJ  CJ  CJ  Vi  _fi  CD  •s rt CS E s CJ CJ  C CD 60  *&,  en  r-  CD  T3  %  I  T t  m  cn  .  c2  T t  m ©  NO NO  Vi  O  1  a.  rt  N E o  cn  CJ  CH  vs]  c  13  rt CS  T t  "c3  O  r~  0  CD >  00  61 O  o  T t  r-  rt CN  a  O  CCJ  13  *o X)  _c Vi  E CD  CD rt O  33  Table 4. Item endorsement proportions by gender for the C E S - D scale (310 males; 290 females) using the persistence scoring method.  Overall endorsement proportions  Male endorsement proportions  Female endorsement proportions  1  0.098  0.094  0.103  2 3  0.067 0.065  0.032 0.061  0.103 0.069  4 5 6 7 8  0.200 0.132 0.118 0.125 0.230  0.184 0.129 0.090 0.139 0.248  0.217 0.134 0.148 0.110 0.210  9 10  0.048 0.067  0.052 0.052  0.045 0.083  11 12 13 14  0.272 0.193 0.098 0.120  0.242 0.181 0.084 •0.106  0.303 0.207 0.114 0.134  15  0.057  0.061  0.052  16  0.193  0.174  0.214  17  0.050  0.026  0.076  18  0.095  0.071  0.121  19  0.038  0.029  0.048  20  0.112  0.103  0.121  0.12 0.07  0.11 0.07  0.13 0.07  CES-D item number  Mean SD  Note: The items i n bold could not be computed with the ordinal logistic regression (see Table 3) because there was a low probability o f endorsing the items. This table is only intended to show endorsement proportions and should not be used to investigate or interpret DIF.  34 Chapter V Discussion The primary objective o f this thesis was to conduct gender DEF for the C E S - D scale with each scoring method. 
In doing so, this thesis explored whether (a) gender DIF existed for the CES-D for the ordinal, presence, and persistence scoring formats, (b) any CES-D items were identified as DIF irrespective of the scoring method, and (c) any CES-D items were found to display DIF for only some of the scoring methods.

5.1 Gender DIF for the CES-D Items

The results in Chapter IV showed that, depending on the scoring method used, at least three of the 20 CES-D items functioned differently for males and females. For the ordinal scoring method, the "crying" item (item 17) showed predominantly uniform DIF. Similarly, the "crying" item also showed uniform DIF when the CES-D was scored using the presence method. These findings were consistent with those of Cole et al.'s (2000) study of seniors. However, it is not known whether the "crying" item functions differently for males and females when the persistence scoring method is used, because this sample does not have enough variability in the responses to these items for the ordinal logistic regression to be computed. Although an item showing almost no variability may be considered a poor indicator of the construct being measured, this is unlikely in this context because the items hold up well using the other two scoring methods. It is more likely that these are good items, and that the low variability for these items arises only from the strict criteria used with the persistence method of scoring. Furthermore, given that these data came from the general population, it is unlikely that enough people in this sample would experience symptoms such as "I had crying spells" for 3-7 days, creating very little variability. That is, individuals from the general population are more likely to experience such symptoms for less than two days, an endorsement that, under the persistence scoring method, indicates that an individual is unlikely to be depressed.
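The effect of the three scoring formats on item variability can be made concrete with a small sketch. The Python below is illustrative only and is not part of the original analysis. It assumes the standard 0-3 CES-D response coding, assumes that the presence method counts any non-zero response as an endorsement, and assumes (from the "3-7 days" wording above) that the persistence method counts responses of 2 or 3 as an endorsement. The function names, the `persistence_cutoff` parameter, and the response values are hypothetical.

```python
def score_item(response, method, persistence_cutoff=2):
    """Score one CES-D item response (assumed coded 0-3) under a scoring method.

    persistence_cutoff is an assumption: responses at or above it count as
    an endorsement under the persistence method.
    """
    if response not in (0, 1, 2, 3):
        raise ValueError("response must be 0, 1, 2, or 3")
    if method == "ordinal":
        return response                      # keep the full 0-3 score
    if method == "presence":
        return 1 if response >= 1 else 0     # symptom present at all
    if method == "persistence":
        return 1 if response >= persistence_cutoff else 0
    raise ValueError("unknown scoring method: " + method)


def endorsement_proportion(responses, method, persistence_cutoff=2):
    """Proportion of respondents endorsing an item under a binary method."""
    if method == "ordinal":
        raise ValueError("endorsement proportions apply to binary methods only")
    scores = [score_item(r, method, persistence_cutoff) for r in responses]
    return sum(scores) / len(scores)


# Made-up responses from ten respondents to a single item.
example_responses = [0, 0, 1, 1, 2, 3, 0, 1, 0, 0]
presence_p = endorsement_proportion(example_responses, "presence")
persistence_p = endorsement_proportion(example_responses, "persistence")
```

For these invented responses the presence proportion is 0.5 and the persistence proportion is 0.2, which shows how the stricter criterion pushes endorsement toward zero and leaves less variability for a regression to work with.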
The results found with the persistence scoring method showed that item 7 (everything was an effort) and item 8 (felt hopeful) were flagged as showing gender DIF. This result should be interpreted with some caution, however, because of the low variability found in the responses (overall endorsement proportions: item 7, p = 0.125; item 8, p = 0.230). In fact, a close look at all the responses with this scoring format reveals low variability for all the items, which supports the notion that using the persistence scoring method with data from a general population is not appropriate. One needs variability in the item responses for an item to have psychometric utility; that is, the aim of psychometric methods is to differentiate among respondents, and an item without variability does not aid in this differentiation.

To this end, it can be concluded that the "crying" item displays gender DIF for both the ordinal and presence scoring methods. That is, an individual's gender influences how one endorses the "crying" item on the CES-D. In other words, re-scoring the CES-D from the ordinal method to the presence method does not remove the gender DIF of the "crying" item. On the other hand, items 7 and 8 display gender DIF only with the persistence method.

5.2 The Effect of Different Scoring Methods on DIF

The findings in this thesis not only demonstrate that DIF is a property of the item (e.g., the crying item), they also show that DIF is a property of the scoring method. As mentioned in the introduction of this thesis, no previous study had compared DIF for various scoring formats with the same instrument. That is, prior to this thesis, it was not known whether DIF was dependent on various scoring formats. Given that the CES-D has various scoring formats, this instrument was used to ascertain whether DIF was dependent on the scoring method.
As presented in the results section, Chapter IV, different items displayed gender DIF depending on the scoring format used. Thus, when one thinks about DIF one must think about the items, the scoring method, and the interaction of the items and the scoring method used. Furthermore, because the scoring method is dependent on the purpose of the instrument, DIF is also a property of the purpose of the measure. With this in mind, researchers exploring why an item displays DIF must consider not only the item, but also the scoring format used and the purpose of the instrument. This is relevant in practice because there are several other instruments that have a Likert-type response format and a variety of suggested binary scoring methods, such as the widely used General Health Questionnaire (Goldberg, 1972; Goldberg & Williams, 2000).

5.3 Implications

Exploring why an item displays DIF is pertinent for decisions regarding what to do with items that function differently for different groups. That is, an item displaying DIF should not be discarded from the instrument before experts in the area can clearly understand why the item is endorsed differently by different groups. An item displaying DIF may reflect item impact (i.e., the groups truly differ on the underlying factor being measured). For example, if the "crying" item is tapping depressive symptomology (i.e., a relevant characteristic of the measure), item impact may be present. For instance, women may be more depressed than men and therefore endorse the crying item more. However, if the "crying" item is tapping a factor other than depressive symptomology, item bias may be present. For example, the crying item would be biased if men endorsed it less than women because men are socialized not to express their emotions. In order to explore this issue, a talk-aloud protocol may be used wherein individuals orally describe what they are thinking as they respond to an item.
Accordingly, this method will provide information about the process of responding to an item. Moreover, this method lends itself nicely to answering the question "What is it, in the process of responding to items, that causes bias?" That is, are there differences in the reasoning processes people use in biased versus unbiased items? Although this talk-aloud protocol approach has not yet been used in the DIF literature, the information provided by this approach may help researchers decide what to do with an item that displays DIF, and it may suggest why the item displays DIF.

In addition, one also needs to consider the characteristics of the sample in which the DIF item was found. One should make sure that the item displays DIF across different samples. For example, an item may show DIF between groups of seniors and young adults, but may not show DIF between groups of seniors and children (e.g., irritability). In terms of this thesis's findings, although DIF was found with a sample from the general population, comparisons and generalizations across other populations may be problematic. Specifically, this sample came from Northern British Columbia, an area that comprises numerous small, rural towns and one city (population approximately 80,000). In order to generalize these findings across populations, it is necessary that future studies replicate this study with different populations (such as those from larger urban areas).

Future studies could explore gender DIF for other depression inventories and compare whether similar items display gender DIF. Comparisons across different items from different depression scales may provide a great deal of information regarding the types of items that are found to display gender DIF, and this information may be used as a basis for making informed judgements and decisions pertaining to the future use and role of such items.

References

American Psychiatric Association. (1994).
Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author.
Aneshensel, C.S., Frerichs, R.R., & Huba, G.J. (1984). Depression and physical illness: A multiwave, nonrecursive causal model. Journal of Health and Social Behavior, 25, 350-371.
Angst, J., & Dobler-Mikola, A. (1984). Do the diagnostic criteria determine the sex ratio in depression? Journal of Affective Disorders, 7, 189-198.
Beck, A.T., Ward, C.G., Mendelson, M., Mack, J., & Erbaugh, J. (1961). An inventory for measuring depression. Archives of General Psychiatry, 4, 561-571.
Caetano, R. (1987). Alcohol use and depression among U.S. Hispanics. British Journal of Addictions, 82, 1245-1251.
Callahan, C.M., & Wolinsky, F.D. (1994). The effect of gender and race on the measurement properties of the CES-D in older adults. Medical Care, 32, 341-356.
Clark, V.A., Aneshensel, C.S., Frerichs, R.R., & Morgan, T.M. (1981). Analysis of effects of sex and age in response to items on the CES-D scale. Psychiatry Research, 5, 171-181.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155-159.
Cole, S.R., Kawachi, I., Maller, S.J., & Berkman, L.F. (2000). Test of item-response bias in the CES-D scale: Experience from the New Haven EPESE study. Journal of Clinical Epidemiology, 53, 285-289.
Coryell, W., Endicott, J., & Keller, M. (1992). Major depression in a nonclinical sample: Demographic and clinical risk factors for first onset. Archives of General Psychiatry, 49, 117-125.
Craig, T.J., & Van Natta, P.A. (1976). Presence and persistence of depressive symptoms in patient and community populations. American Journal of Psychiatry, 133, 1426-1429.
Culbertson, F.M. (1997). Depression and gender: An international review. American Psychologist, 52, 25-31.
DeForge, B.R., & Sobal, J. (1988). Self-report depression scales in the elderly: The relationship between the CES-D and Zung.
International Journal of Psychiatry in Medicine, 18, 325-338.
Devins, G.M., Orme, C.M., Costello, C.G., Binik, W.M., Frizzell, B., Stam, H.J., & Pullin, W.M. (1988). Measuring depressive symptoms in illness populations: Psychometric properties of the Center for Epidemiologic Studies Depression (CES-D) scale. Psychology and Health, 2, 139-156.
Fennig, S., Schwartz, J.E., & Bromet, E.J. (1994). Are diagnostic criteria, time of episode and occupational impairment important determinants of the female:male ratio for major depression? Journal of Affective Disorders, 30, 147-154.
Frank, E., Carpenter, A.B., & Kupfer, D.J. (1988). Sex differences in recurrent depression: Are there any that are significant? American Journal of Psychiatry, 145, 41-45.
Gatz, M., & Hurwicz, M.L. (1990). Are old people more depressed? Cross-sectional data on Center for Epidemiological Studies Depression Scale factors. Psychology and Aging, 5, 284-290.
Goldberg, D.P. (1972). The Detection of Psychiatric Illness by Questionnaire. Maudsley Monograph No. 21. Oxford: Oxford University Press.
Goldberg, D., & Williams, P. (2000). A User's Guide To The General Health Questionnaire. NFER-NELSON.
Hamilton, M. (1960). A rating scale for depression. Journal of Neurology, Neurosurgery and Psychiatry, 23, 56-65.
Hertzog, C., Van Alstine, J., Usala, P.D., Hultsch, D.F., & Dixon, R. (1990). Measurement properties of the Center for Epidemiological Studies Depression scale (CES-D) in older populations. Psychological Assessment: A Journal of Consulting and Clinical Psychology, 2, 64-72.
Hudson, W. (1982). A measurement package for clinical workers. Journal of Applied Behavioral Science, 18, 229-238.
Jodoin, M.G., & Gierl, M.J. (in press). Evaluating Type I error and power rates using an effect size measure with the logistic regression procedure for DIF detection. Applied Measurement in Education.
Kessler, R.C., Foster, C., Webster, P.S., & House, J.S. (1992). The relationship between age and depressive symptoms in two national surveys. Psychology and Aging, 7, 119-126.
King, D.A., & Buchwald, A.M. (1982). Sex differences in subclinical depression: Administration of the Beck Depression Inventory in public and private disclosure situations. Journal of Personality and Social Psychology, 42, 963-969.
Krause, N. (1986). Stress and sex differences in depressive symptoms among older adults. Journal of Gerontology, 41, 727-731.
Leon, A.C., Klerman, G.L., & Wickramaratne, P. (1993). Continuing female predominance in depressive illness. American Journal of Public Health, 83, 754-757.
Liang, J., Tran, T.V., Krause, N., & Markides, K.S. (1989). Generational differences in the structure of the CES-D scale in Mexican Americans. Journal of Gerontology: Social Sciences, 44, S110-S120.
Lopez, S.R. (1989). Patient variable biases in clinical judgement: Conceptual overview and methodological considerations. Psychological Bulletin, 106, 184-203.
Loring, M., & Powell, B. (1988). Gender, race, and DSM-III: A study of the objectivity of psychiatric diagnostic behavior. Journal of Health and Social Behavior, 29, 1-22.
Myers, J.K., & Weissman, M.M. (1980). Use of a self-report symptom scale to detect depression in a community sample. American Journal of Psychiatry, 137, 1081-1084.
Nolen-Hoeksema, S. (1987). Sex differences in unipolar depression: Evidence and theory. Psychological Bulletin, 101, 259-282.
Nolen-Hoeksema, S. (1990). Sex differences in depression. Stanford, CA: Stanford University Press.
Potts, M.K., Burnam, M.A., & Wells, K.B. (1991). Gender differences in depression detection: A comparison of clinical diagnosis and standardized assessment. Psychological Assessment: A Journal of Consulting and Clinical Psychology, 3, 609-615.
Radloff, L.S. (1977).
The CES-D scale: A self-report depression scale for research in the general population. Applied Psychological Measurement, 3, 385-401.
Radloff, L.S., & Locke, B.Z. (1986). The community mental health assessment survey and the CES-D Scale. In A.E. Slaby (Series Ed.), Community surveys of psychiatric disorders (pp. 177-189). New Brunswick, NJ: Rutgers University Press.
Roberts, R.E. (1980). Reliability of the CES-D scale in different ethnic contexts. Psychiatry Research, 2, 125-134.
Roberts, R.E., Andrews, J.A., Lewinsohn, P.M., & Hops, H. (1990). Assessment of depression in adolescents using the Center for Epidemiological Studies Depression scale. Psychological Assessment, 2, 122-128.
Roberts, R.E., & Vernon, S.W. (1983). The Center for Epidemiological Studies Depression scale: Its use in a community sample. American Journal of Psychiatry, 140, 41-46.
Santor, D.A., & Coyne, J.C. (1997). Shortening the CES-D to improve its ability to detect cases of depression. Psychological Assessment, 9, 233-243.
Santor, D.A., & Ramsay, J.O. (1998). Progress in the technology of measurement: Applications of item response models. Psychological Assessment, 10, 345-359.
Santor, D.A., Ramsay, J.O., & Zuroff, D.C. (1994). Nonparametric item analyses of the Beck Depression Inventory: Evaluating gender item bias and response option weights. Psychological Assessment, 6, 255-270.
Snyder, V.N.S., Cervantes, R.C., & Padilla, A.M. (1990). Gender and ethnic differences in psychosocial stress and generalized distress among Hispanics. Sex Roles, 22, 441-453.
Sonnenberg, C.M., Beekman, A.T.F., Deeg, D.J.H., & Van Tilburg, W. (2000). Sex differences in late-life depression. Acta Psychiatrica Scandinavica, 101, 286-292.
Stommel, M., Given, B.A., Given, C.W., Kalaian, H.A., Schulz, R., & McCorkle, R. (1993).
Gender bias in the measurement properties of the Center for Epidemiologic Studies Depression Scale (CES-D). Psychiatry Research, 49, 239-250.
Tousignant, M., Brosseau, R., & Tremblay, L. (1987). Sex biases in mental health scales: Do women tend to report less serious symptoms and confide more than men? Psychological Medicine, 17, 203-215.
Vera, M., Alegria, M., Freeman, D., Robles, R.R., Rios, R., & Rios, C.F. (1991). Depressive symptoms among Puerto Ricans: Island poor compared with residents of the New York City area. American Journal of Epidemiology, 134, 502-510.
Weissman, M.M., Orvaschel, H., & Padian, N. (1980). Children's symptoms and social functioning self-report scales: Comparison of mother's and children's reports. Journal of Nervous and Mental Disease, 168, 736-740.
Weissman, M.M., Sholomskas, D., Pottenger, M., Prusoff, B.A., & Locke, B.Z. (1977). Assessing depressive symptoms in five psychiatric populations: A validation study. American Journal of Epidemiology, 106, 203-214.
Wilhelm, K., & Parker, G. (1994). Sex differences in lifetime depression rates: Fact or artefact? Psychological Medicine, 24, 97-111.
Winokur, G., & Clayton, P. (1967). Sex differences and alcoholism in primary affective illness. British Journal of Psychiatry, 113, 973-979.
Wrobel, N.H. (1993). Effect of patient age and gender on clinical decisions. Professional Psychology: Research and Practice, 24, 206-212.
Yesavage, J.A., Brink, T.L., Rose, T.L., Lum, O., Huang, V., Adey, M.B., & Leirer, V.O. (1983). Development and validation of a geriatric depression screening scale: A preliminary report. Journal of Psychiatric Research, 17, 37-49.
Zumbo, B.D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores.
Ottawa, ON: Directorate of Human Resources Research and Evaluation, Department of National Defense.
Zumbo, B.D. (in press). Does item-level DIF manifest itself in scale-level analyses?: Implications for translating language tests. Language Testing.
Zumbo, B.D., Gelin, M.N., & Hubley, A.M. (in press). The construction and use of psychological tests and measures. Encyclopedia of Life Support Systems. United Nations Educational, Scientific and Cultural Organization Publishing (UNESCO-EOLSS Publishing), France.
Zung, W.W.K. (1965). A self-rating depression scale. Archives of General Psychiatry, 12, 63-73.
Zunzunegui, M.V., Beland, F., Llacer, A., & Leon, V. (1998). Gender differences in depressive symptoms among Spanish elderly. Social Psychiatry and Psychiatric Epidemiology, 33, 195-205.

Appendix A
SPSS Syntax File for the Ordinal Logistic Regression and Corresponding Effect Size Estimator

* SPSS SYNTAX written by: Bruno D. Zumbo, PhD.
* University of British Columbia.
* e-mail: brunozumbo@ubc.ca.

* Instructions.
* Copy this file and the file "ologit2.inc", and your SPSS data file into the same
* Change the filename, currently 'ordinal.sav' to your file name.
* Change 'item', 'total', and 'grp', to the corresponding variables in your file.
* Run this entire syntax command file.

include file='ologit2.inc'.
execute.

GET FILE='C:\ordinal.sav'.
EXECUTE.

compute item= item1.
compute total= cesdtot.
compute grp= gender.

* Regression model with the conditioning variable, total score, in alone.
ologit var = item total
 /output=all.
execute.

* Regression model adding uniform DIF to model.
ologit var = item total grp
 /contrast grp=indicator
 /output=all.
execute.

* Regression model adding non-uniform DIF to the model.
ologit var = item total grp total*grp
 /contrast grp=indicator
 /output=all.
execute.
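The syntax above fits three nested models: total score only, total score plus group (uniform DIF), and total score plus group plus the total-by-group interaction (non-uniform DIF). One common way to compare nested models of this kind is a likelihood-ratio chi-square test, and the Python sketch below shows that comparison. It is a generic illustration under stated assumptions, not a reproduction of what the "ologit2.inc" macro reports: it assumes the log-likelihood of each model has already been obtained from whatever software fitted it, the function names are hypothetical, and the log-likelihood values in the example are invented.

```python
import math


def chi2_upper_tail(x, df):
    """Upper-tail probability of a chi-square statistic, for df = 1 or 2 only."""
    if x < 0:
        raise ValueError("chi-square statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))   # closed form for 1 df
    if df == 2:
        return math.exp(-x / 2.0)              # closed form for 2 df
    raise ValueError("only df = 1 or df = 2 is supported in this sketch")


def nested_dif_tests(ll_total, ll_uniform, ll_nonuniform):
    """Likelihood-ratio tests across the three nested models.

    ll_total      : log-likelihood, model with total score only
    ll_uniform    : log-likelihood, model adding the group term
    ll_nonuniform : log-likelihood, model adding the total*group term
    Returns {label: (chi-square, p-value)}.
    """
    def lr(ll_small, ll_big):
        # Guard against tiny negative values caused by numerical noise.
        return max(0.0, 2.0 * (ll_big - ll_small))

    overall = lr(ll_total, ll_nonuniform)       # both DIF terms, 2 df
    uniform = lr(ll_total, ll_uniform)          # group term, 1 df
    nonuniform = lr(ll_uniform, ll_nonuniform)  # interaction term, 1 df
    return {
        "overall": (overall, chi2_upper_tail(overall, 2)),
        "uniform": (uniform, chi2_upper_tail(uniform, 1)),
        "nonuniform": (nonuniform, chi2_upper_tail(nonuniform, 1)),
    }


# Invented log-likelihoods for a single item, for illustration only.
example = nested_dif_tests(-500.0, -490.0, -489.5)
```

In this invented example most of the 2-df chi-square comes from adding the group term, the pattern one would describe as predominantly uniform DIF. A significance test alone says nothing about magnitude, so an effect size measure would still be needed alongside it.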
