MEASUREMENT OF POST TRAUMATIC STRESS DISORDER: AN A POSTERIORI CONTENT VALIDATION OF THE CLINICIAN-ADMINISTERED PTSD SCALE (CAPS-dxsx)

by LYNDA GAIL THIESSEN
B.A.A., Kwantlen University College, 2004

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF ARTS in THE FACULTY OF GRADUATE STUDIES (Measurement, Evaluation, and Research Methodology)

THE UNIVERSITY OF BRITISH COLUMBIA
November, 2006
© Lynda Gail Thiessen, 2006

Abstract

Many people in society today suffer from the consequences of exposure to a traumatic event in their own life or the life of a loved one. These persons are at risk for developing a condition known as Post Traumatic Stress Disorder (PTSD). PTSD is a potentially debilitating disorder that is characterized by three symptom clusters: reexperiencing, numbing and avoidance, and hyperarousal. To assess PTSD, practitioners often use semi-structured interviews. The current "gold-standard" instrument for assessing PTSD is the Clinician-Administered Post Traumatic Stress Scale (CAPS; Blake et al., 1995). However, the CAPS has gaps in its validity evidence. In particular, no content validation study has been performed to assess the ability of the CAPS to provide content-related validity evidence. The purpose of the present study was to evaluate (a) the adequacy of the CAPS in sampling the entire content domain of PTSD, (b) how relevant each of the elements of the CAPS was, and (c) whether our current definitions of trauma and PTSD in the DSM-IV-TR (APA, 2000) are considered adequate by our subject matter experts. Results indicated that the CAPS received high ratings for the relevance of its traumatic events questions, individual items, and Life Events Checklist; however, it received poor ratings for its format, rating scales, global ratings, associated features, time-sampling parameter, scoring and summary chart, and summary sheet. Moreover, the CAPS failed to achieve minimum acceptable levels of representativeness for the content domain of PTSD. Concordantly, subject matter experts rated the current definition of PTSD in the DSM-IV-TR (2000) as inadequate. Experts indicated that the current conceptualization of PTSD does not capture the full range of symptoms seen in clients with PTSD. Implications and suggestions for future research are proposed.
TABLE OF CONTENTS

Abstract
Table of Contents
List of Tables
Acknowledgments

CHAPTER I: Introduction
  Psychometric Measurement
  Other Measurement Issues Associated with Post Traumatic Stress Disorder
  PTSR
  Complex PTSD
  Missed Diagnosis of PTSD
  Strategies for Assessing PTSD
  Accurate Measurement of PTSD
  Importance of Validity

CHAPTER II: Review of the Literature
  Current View of Validity
  Dynamic Nature of Validity
  Content Validity
  Historical Emergence of Content Validity
  Content Validity versus Face Validity
  The Purpose of Content Validation
  The Need for Content Validation
  A Priori Content Validation
  A Posteriori Content Validation
  Subject Matter Experts' Ratings
  Defining the Content Domain
  Other Considerations for Content Validation Studies
  Interrater Reliability versus Interrater Agreement
  Indices of Content Validity

CHAPTER III: Method
  Participants
  Procedure
  Measures
  Qualitative Questions
  The Rating Process for the Quantitative Questions
  Content Validity Analysis
  Interrater Agreement Analysis

CHAPTER IV: Results
  Content Validity Ratio
  Content Validity Index
  Interrater Agreement Analysis
  Qualitative Questions' Analysis

CHAPTER V: Discussion
  Conclusions
  Strengths of this Study
  Limitations of this Study
  Future Research

References
Appendix A
Appendix B
Appendix C
Appendix D
Appendix E
Appendix F

List of Tables

Table 1. Content Validity Ratio Highest and Lowest Values
Table 2. CAPS Subscales Averaged Values for Content Validity and Interrater Agreement Indexes

Acknowledgments

I would like to express my deep gratitude to my supervisor, Dr. Kadriye Ercikan, whose expertise, guidance, and encouragement helped make this thesis project a reality. I appreciate her vast knowledge and skills in the area of measurement and research methodology and her willingness to accept me halfway through my thesis and see this project through to completion. I would also like to acknowledge my previous supervisor, Dr. Anita Hubley, for her initial hard work and input on the first three chapters of this thesis before she went away on sabbatical, and my committee members, Dr. Bruno Zumbo and Dr. Marv Westwood, for their contributions, their support, and their encouragement along the way. I would also like to thank my family for the support they provided me during this thesis and throughout my academic career, and for the many sacrifices they made in letting me have the time to pursue this goal. I would also like to thank my many friends in the MERM program, in the Adult Development and Psychometrics lab, in the CARME lab, and from my undergraduate program at Kwantlen University College for their encouragement and support along the way. I especially appreciate their steadfast support and encouragement to keep going regardless of whatever challenges arose, and their belief in me. Finally, I would like to thank the Social Sciences and Humanities Research Council of Canada (SSHRC) for the Canada Graduate Scholarship, and the University of British Columbia for the University Graduate Fellowship and financial assistance that made this project possible.

Chapter One
Introduction

Many individuals in society today struggle in their daily lives with the deleterious after-effects of exposure to a severe traumatic event.
Persons who experience or witness traumatic events (e.g., war, rape, torture, or severe natural disasters) are at risk for developing a potentially debilitating condition known as Post Traumatic Stress Disorder (PTSD; Levine, 1999). According to the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV-TR, 2000), PTSD is a debilitating disorder in which individuals experience characteristic symptoms of reexperiencing the traumatic event, avoidance and numbing, and increased arousal. Those who experience these characteristic sequelae of symptoms often require therapeutic interventions to overcome the negative after-effects of experiencing trauma. However, most individuals are unaware that the symptoms they are currently experiencing are related to previous trauma (Carlson & Dutton, 2003). Therefore, it is up to professional practitioners from various fields (e.g., psychological, psychiatric, and medical) to accurately assess an individual's condition in order for that individual to receive the treatment necessary to deal with this debilitating disorder.

Psychometric Measurement

One of the ways to assess such psychological disorders is with psychological tests. Psychological testing endeavors to provide accurate measurement of abstract constructs such as traumatic stress or anxiety using instruments with sound psychometric properties. Choosing a psychometrically sound instrument for measuring post-traumatic stress disorder, however, can be a difficult challenge. Despite a plethora of measurement instruments available for assessing PTSD, few to date have been validated via systematic content validation studies (Sprang, 2004). Without information regarding content validity, it is difficult to judge whether a particular instrument's items adequately represent the content domain of interest, or whether the instrument is appropriate for a particular population or purpose. For instance, many currently available instruments were originally designed to measure PTSD in veteran or military populations (Krinsley & Weathers, 1995); however, nonmilitary trauma exposure in civilian populations accounts for the largest percentage of PTSD in the greater population (Stein, 2002). As a result, many standard assessment instruments may not adequately assess the full range of PTSD symptoms in victims of nonmilitary types of trauma, such as female rape victims (Carlson & Dutton, 2003). Subsequently, it becomes imperative that content validity evidence be provided on PTSD assessment instruments to demonstrate their utility with various afflicted PTSD populations, thereby providing researchers and clinicians with psychometrically sound choices for a wider range of afflicted populations. Appropriate choice of an assessment instrument depends on linking the type of instrument chosen with the purpose of the assessment (Weathers, Keane, & Ruscio, 1999). For example, a test designed as a brief screen for PTSD would not be appropriate for diagnosing PTSD. Concordantly, measures designed to confirm a diagnosis of PTSD or to provide a differential diagnosis regarding other psychological disorders would not be the best choice for screening purposes. Kraemer (1992) identified three different types of tests to match with three different assessment purposes. The first type is the optimally sensitive test, which minimizes false negatives and is best for screening purposes.
The second type is the optimally specific test, which minimizes false positives and is best for confirming a diagnosis; the third type is the optimally efficient test, which minimizes the overall number of diagnostic errors and is best suited for differential diagnosis (Weathers, Keane, & Ruscio, 1999). Sensitivity, specificity, and efficiency are important considerations that affect the accuracy of the measurement results and the conclusions drawn from those results. Thus, it is of primary importance for clinicians and researchers to link the designed purpose of the instrument with the measurement objectives in order to obtain meaningful results.

Other Measurement Issues Associated with Post Traumatic Stress Disorder

Several other issues arise in the measurement of PTSD, one of which involves changes in the definition of the construct itself. Since its inception into the nomenclature of the DSM-III in 1980, the construct of PTSD has undergone several changes. As new information is discovered about PTSD, current definitions of the construct are refined and current editions of the DSM reflect those changes. In the latest definition, delineated in the DSM-IV-TR (2000), there are three main changes in criteria. First, the definition of a traumatic event (Criterion A) was elaborated into a two-part definition: A.1 requires that "the person experienced, witnessed, or was confronted with an event or events that involved actual or threatened death or serious injury, or a threat to the physical integrity of self or others"; A.2 requires that the "person's response involved intense fear, helplessness, or horror" (DSM-IV-TR, 2000, p. 467). Second, physiological reactivity was moved from Criterion D (arousal symptoms) to Criterion B (reexperiencing symptoms). Third, a new criterion was added requiring clinically significant distress or functional impairment as part of the diagnosis (Criterion F). In one review, Weathers, Keane, and Ruscio (1999) describe these changes as "relatively minor" and state that they have minimal impact on most studies measuring PTSD. However, these changes may affect the diagnosis of PTSD and subsequent prevalence rates of the disorder in the following ways. By expanding Criterion A.1 to include those who have witnessed or been confronted with a traumatic event, prevalence rates may increase over the previously more stringent standard of having personally experienced the event. Now, individuals who witness a traumatic event, such as innocent bystanders during a crime or emergency room personnel, are included under this criterion, which increases the range of individuals who qualify to be diagnosed with PTSD and, subsequently, prevalence rates. Concordantly, earlier criteria also required that the traumatic event be "outside the range of normal experience"; changing this requirement may increase the range of experiences that qualify as a traumatic event. Subsequently, the inclusion of "common" events will likely also increase prevalence rates. However, Criterion A.2 requires that the person's response involve intense fear, helplessness, or horror, which may act as a qualifier that somewhat curbs any spurious inflation. Relatively common experiences, such as minor car accidents, would have to produce clinically significant fear responses for the experience to qualify as a traumatic event. The third change in criteria, moving physiological reactivity from the arousal symptoms category to the reexperiencing category, may have a negative effect on prevalence rates.
The reexperiencing category (Criterion B) requires only one symptom for a diagnosis of PTSD, whereas the arousal category (Criterion D) requires two. Moving physiological reactivity out of the arousal category makes it more difficult to meet that criterion, since fewer symptoms qualify. As a result, Type II errors may occur as fewer cases meet diagnostic standards even though the person may truly suffer from the disorder. In addition, the new criterion requiring clinically significant distress or functional impairment as part of the diagnosis may also disqualify some individuals and decrease prevalence rates. A related criticism of the current understanding of PTSD is that the number of symptoms required to meet criterion diagnosis is arbitrary and indefensible (Burstow, 2005). For example, why does the DSM-IV require only one reexperiencing symptom to meet Criterion B (e.g., distressing dreams, recurrent thoughts, or flashbacks), yet three symptoms to meet Criterion C (e.g., numbing and avoidance) and two symptoms for Criterion D (e.g., increased arousal, irritability, and difficulty concentrating)? A suitable answer to these critiques has not been given in the literature and remains elusive at present. As with many psychological constructs, most researchers and experts in the field of PTSD agree that our current understanding of the construct is incomplete. Many aspects are as yet unknown and extend beyond the range of those covered in the DSM. Burstow (2005) argues that current DSM-IV criteria are not exhaustive and that the current conceptualization of PTSD is "reductionistic" in nature. Contemporary definitions of PTSD mainly focus on symptomology and fail to elaborate on environmental factors, such as the negative effects on interpersonal relationships or occupational functioning that may occur. Though Criterion F explicitly states that PTSD "causes clinically significant distress or impairment in social, occupational, or other important areas of functioning," there is no description of what those impairments entail. Many aspects of the effects of PTSD on daily functioning require future research to adequately define the range of effects that can occur. Further criticisms contend that the symptoms described in the DSM may not adequately represent the manifest symptomology of individuals from other cultures. Carlson and Dutton (2003) argue that the DSM criteria are culturally bound and may be inadequate for assessing other ethnicities. For example, Vietnamese refugees experiencing PTSD may not be asked culturally relevant questions, such as whether they saw family or loved ones disappear or lost all of their possessions, events that may have caused individuals from their culture significant distress. Current DSM criteria are limited to local American types of experiences and may not adequately assess other cultures' experiences. In addition to the limited cross-cultural application of current conceptualizations of PTSD, many researchers in the field of post-traumatic stress argue that PTSD is not a broad enough category to encompass all of the symptom sequelae that appear in the aftermath of trauma (Herman, 1992b; Roth, Newman, Pelcovitz, van der Kolk, & Mandel, 1997; van der Kolk, 1997). Leading researchers call for the development and inclusion of new, broader categories into the nomenclature regarding post-traumatic stress.
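The symptom-count thresholds at issue in these critiques can be stated as a simple decision rule. The following sketch is purely illustrative (the function name and example values are hypothetical, and this is not the CAPS scoring algorithm); it encodes only the DSM-IV-TR counts discussed above: at least one Criterion B symptom, three Criterion C symptoms, and two Criterion D symptoms.

    def meets_symptom_thresholds(b_count: int, c_count: int, d_count: int) -> bool:
        # b_count: reexperiencing symptoms (Criterion B, at least 1 required)
        # c_count: avoidance/numbing symptoms (Criterion C, at least 3 required)
        # d_count: hyperarousal symptoms (Criterion D, at least 2 required)
        return b_count >= 1 and c_count >= 3 and d_count >= 2

    # The reclassification argument above, made concrete: a person whose two
    # arousal symptoms were physiological reactivity and irritability met the
    # old two-symptom arousal threshold; once physiological reactivity is
    # counted under Criterion B instead, only one arousal symptom remains.
    print(meets_symptom_thresholds(b_count=2, c_count=3, d_count=1))  # False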
PTSR

Post Traumatic Stress Response (PTSR) is one overarching, broad category that has been proposed to describe the widespread constellation of symptoms that follow experiencing a traumatic event (Roth, Newman, Pelcovitz, van der Kolk, & Mandel, 1997). PTSR includes various categories that include, but are not necessarily limited to, PTSD, complex PTSD, disorders of extreme stress (DES), and disorders of extreme stress not otherwise specified (DESNOS). PTSD is but one of the many responses that may occur after experiencing trauma; however, many experts now say that, due to the idiosyncratic nature of symptom presentation or manifestation, our current conceptualization of PTSD is not comprehensive enough to account for the various presentations and stress response syndromes that appear in many individuals after a trauma (Horowitz, 2001). Though some of the symptoms have been incorporated into the nomenclature under associated features of PTSD, many experts argue that most of the psychological sequelae that occur are not well represented in our current conceptualizations of PTSD, and they call for a revision of our current definitions as well as the inclusion of other disorders of extreme stress (currently conceptualized under acute stress disorder) and complex PTSD under an overarching category of PTSR (Briere, 1997; Herman, 1992b; Pelcovitz, van der Kolk, Roth, & Mandel, 1997).

Complex PTSD

Since the initial conceptualization of complex PTSD (Herman, 1992a, 1992b) in the early 1990s, numerous studies have begun to provide support for this construct and for the complex nature of symptom presentation, especially in those who have been exposed to chronic sexual and physical abuse at an early age (Briere, 1997; Roth, Newman, Pelcovitz, van der Kolk, & Mandel, 1997). Field trials for the DSM-IV (1994) in 1991-1992 sought to investigate proposed changes to the PTSD category and to explore the psychopathology of chronic developmental trauma to account for symptomology not covered in PTSD (van der Kolk & Courtois, 2005). Seven categories of symptoms were included in the complex PTSD/DESNOS conceptualization: (a) alterations in the ability to modulate emotions, (b) alterations of identity and sense of self, (c) alterations in ongoing consciousness and memory, (d) alterations in relations with the perpetrator, (e) alterations in relations with others, (f) alterations in physical and medical status, and (g) alterations in systems of meaning. Recent studies by van der Kolk and colleagues (1996) demonstrate the profound effect that interpersonal trauma, in particular, has on self-regulation, self-perception or self-definition, interpersonal relationships, self-destructiveness, dissociative symptoms, and somatization; when that interpersonal trauma has an early age of onset and/or is prolonged or chronic, individuals manifest higher levels of symptomology and greater symptom severity. In addition, individuals who experience both sexual and physical trauma are at greater risk for developing complex PTSD than those who experience either sexual or physical abuse alone (Roth et al., 1997). From these various studies, it becomes clear that our current conceptualization of PTSD is too limited to comprehensively account for all the post traumatic stress reactions that occur in the wake of trauma.
Field trials for the DSM-IV tried to account for the growing range of symptoms being discovered by including them as associated features of PTSD; however, experts contend that these symptoms are not peripheral manifestations, but rather inherent characteristics. van der Kolk and Courtois (2005) argue that these "problems do not constitute comorbid diagnoses, but rather, are somatic, affective, behavioral, and characterological manifestations of chronic interpersonal trauma and thus, are part of the primary disorder" (p. 386). As such, they also recommend that a new category be created, suggesting names such as Complex PTSD/DESNOS or Developmental Trauma Disorder to account for the various effects of complex adaptations to trauma over the lifespan.

Missed Diagnosis of PTSD

Despite high prevalence rates in the general population, other factors may also result in missed diagnoses of PTSD. Often, missed diagnoses occur because PTSD is typically not the primary presenting problem when patients are seeking help (Franklin, Sheeran, & Zimmerman, 2002). Patients typically present with somatic complaints, interpersonal problems, substance abuse, or comorbid psychological problems, such as depression or borderline personality disorder, but many are unaware that their symptomology may be related to PTSD (Carlson & Dutton, 2003; Litz, Penk, Gerardi, & Keane, 1990). Consequently, patients are treated for somatic, comorbid, or interpersonal complaints, but their PTSD goes unrecognized. Primary care physicians, as well as psychologists and psychiatrists, often do not routinely assess for trauma in their patients; thus, PTSD may go undiagnosed for many years (Frueh, Cousins, Hiers, Cavenaugh, Cusack, & Santos, 2002). Professional practitioners may be unaware of the link between presenting symptomology and prior trauma; therefore, they may not assess their patients for PTSD. Poor routine assessment procedures, combined with a lack of specialized training in recognizing the signs of PTSD, account for much of the missed diagnoses (Frueh et al., 2002).

Strategies for Assessing PTSD

Many agree that appropriate strategies for assessing PTSD vary depending on the purpose of the evaluation. Evaluations for research, clinical, or forensic purposes use different types of instruments based on the purpose of the evaluation (Carlson & Dutton, 2003). Research evaluations may require either assessment or testing instruments depending on the research question. If a study aims to investigate the etiological factors involved in the development of PTSD, then a comprehensive assessment instrument would be the appropriate choice. However, if a study were interested only in testing for current symptomology, then a briefer screening instrument would likely be sufficient. Studies wanting to sample individuals who currently meet the criteria for PTSD would require diagnostic inventories such as clinician-administered scales (Carlson & Dutton, 2003). For clinical evaluations, a comprehensive assessment is often more desirable. Comprehensive assessment instruments generally give a more elaborate view of both the client's present and past situations, which aids in treatment planning strategies. Systematic assessment of prior trauma aids in the prediction and implementation of treatment strategies and helps practitioners identify cues that trigger trauma reactions in patients (Carlson & Dutton, 2003).
Knowledge of past experiences gives clinicians a deeper understanding of the client's case and can provide invaluable insight into what factors may influence current symptomology. Comprehensive assessment of an individual's trauma history, including the nature of the trauma and any cumulative effect from multiple traumas or repeat victimization, is crucial for a thorough understanding of the individual's situation before effective treatment strategies can be implemented (Carlson & Dutton, 2003). In these types of clinical situations, brief screening instruments would supply insufficient information regarding the client's case. However, one of the current arguments in the literature concerning the assessment of PTSD is whether comprehensive assessment of an individual's past trauma history is necessary. On the one hand, many would argue that a detailed trauma history assessment is crucial for providing valuable insight regarding precipitating factors that may be related to current symptomology (Carlson & Dutton, 2003; Litz, Penk, Gerardi, & Keane, 1990). On the other hand, some would argue that a comprehensive trauma history assessment is unnecessary for testing current PTSD symptoms. Advocates of the testing view would contend that inventories measuring current symptomology are sufficient and that knowledge of prior traumas is not only unnecessary, but may be unwise. For example, when assessing crime victims, Carlson and Dutton (2003) caution against assessing past trauma histories because such information can be used against the victim in child custody cases as evidence of being an unfit parent. For other forensic purposes, comprehensive and detailed assessments may be preferred. Forensic cases, such as disability or compensation claims or criminal cases involving mitigation of the death penalty or other serious sentencing situations, require detailed information with which to build a solid case (Carlson & Dutton, 2003). A criminal's past trauma history would be relevant in such instances as potential evidence bearing on current criminal behaviour. Thus, the more detailed the information an assessment instrument can provide, the better for such forensic purposes.

Accurate Measurement of PTSD

Regardless of whether an instrument is being used for research, clinical, or forensic purposes, using standardized measurement instruments with good psychometric properties is essential for accurate measurement. Both screening and assessment purposes require standardized instruments that provide good reliability and validity evidence while measuring the variable of interest (Carlson & Dutton, 2003). Subsequently, any study or testing occasion endeavoring to accurately assess PTSD must address issues of validity to ensure the veracity of the results. Accurate measurement is essential in order to achieve the purposes for which the testing is proposed. Measurement is defined as "the standardized quantification of cognitive, personality, emotion, academic achievement, or behavioral variables" (Mayo Clinic, 2005). In other words, measurement is the assigning of numbers to abstract variables in a standardized fashion. Concordantly, accurately measuring PTSD involves addressing issues of reliability and validity in order to enhance the fidelity of the results and the inferences drawn from those results.
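Before elaborating on these two properties below, a small numerical sketch may make the distinction concrete (the data are hypothetical and not drawn from this study): an instrument intended to measure PTSD severity that in fact tracks some unrelated but stable characteristic of the respondent will look highly consistent across administrations while telling us nothing about the construct.

    import statistics

    true_severity = [2, 8, 5, 9, 1]            # the construct we want to measure
    score_time1 = [5.1, 6.0, 2.1, 7.0, 7.9]    # first administration of the flawed instrument
    score_time2 = [4.9, 6.1, 1.9, 7.1, 8.0]    # second administration

    # Reliable: the scores are highly consistent across administrations...
    print(statistics.correlation(score_time1, score_time2))    # close to 1.0

    # ...but not valid: they are essentially unrelated to the target construct.
    print(statistics.correlation(score_time1, true_severity))  # close to 0.0

(statistics.correlation requires Python 3.10 or later.)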
For a test score to be considered a good measurement or observation, it must have two essential features: first, it must be reliable, which is synonymous with consistency, stability, and predictability; second, it must be valid, which is synonymous with truthfulness, accuracy, authenticity, and soundness (Hubley & Zumbo, 1996). Reliable results are those that can be reproduced consistently and predictably by other researchers replicating one's study. Valid results are those that accurately measure the construct of interest that the study was designed to measure. However, as Hubley and Zumbo (1996) noted, a score can be reliable without being valid, but it cannot be valid without first being reliable: "reliability is a necessary, but not sufficient, condition for validity" (p. 208). In other words, a test may consistently reproduce a score that is specious, which can lead to spurious conclusions. Thus, it is essential that evidence for both the reliability of a score and its validity be demonstrated by the researcher using sound validation procedures in order to enhance the fidelity of the results and the inferences drawn from those results.

The Importance of Validity

Psychometricians acknowledge that one of the most fundamental and important considerations when conducting any type of psychometric testing is the concept of validity (Gregory, 2004). Validity has historically been defined as "the fidelity with which [a test] measures what it purports to measure" (Garrett, 1937, p. 324). However, many contemporary testing specialists consider this view too restrictive. Current views on validity endorse a broader perspective that incorporates both the characteristics of the test's scores and the inferences drawn from those scores. Based on the current definition in the Standards for Educational and Psychological Testing (AERA, APA, NCME, 1985, 1999), a test is considered valid to the extent that the scores and the inferences drawn from the scores are appropriate, meaningful, and useful. From this definition, it appears that the current view of validity has changed from a narrow functional definition focusing on the test itself to a broader contextual focus on the meaning of the scores. Rather than considering validity an inherent characteristic of the test, contemporary specialists such as Messick (1995) argue that validity is a dynamic property of test scores and the inferences drawn from those scores. This view represents a major shift from the traditional conceptualization of validity. The purpose of this study was to conduct a content validation of the CAPS-dxsx (Blake et al., 1995). The CAPS-dxsx is considered the current "gold standard" diagnostic interview for PTSD in the literature (Blake et al., 1995), yet it has never undergone a content-related validation process. Concordantly, the goal was (a) to see if the CAPS adequately sampled the entire content domain of PTSD, (b) to assess the relevance of each element of the CAPS, and (c) to see whether the current definition of PTSD espoused in the DSM-IV-TR (APA, 2000) is considered representative by our subject matter experts.

Chapter Two
Review of the Literature

Traditional conceptualizations of validity have undergone some major changes over the last 50 years. Traditionally, validity was conceptualized as an inherent property of the test.
Moreover, the concept of validity consisted of four separate categories, or "types," of validity: content validity, predictive validity, status validity (concurrent validity), and congruent validity (construct validity). These four categories were first introduced in the Technical Recommendations for Psychological Tests and Diagnostic Techniques: A Preliminary Proposal (APA, 1952). Status validity and congruent validity were later changed to concurrent validity and construct validity in the follow-up report in 1954. Cronbach and Meehl (1955) described these four "types" of validity: content validity was considered the degree to which the test items sample a universe of possible behaviours. Predictive and concurrent validity referred to a systematic relationship between test scores and one or more outcome criteria (Yun & Ulrich, 2002). Outcome criteria collected in the future, through psychometric testing or job performance reviews, for instance, provided evidence of predictive validity; outcome criteria obtained at the same time provided evidence of concurrent validity. Construct validity referred to an evaluation of tests designed to measure hypothetical (but behaviorally relevant) psychological traits or constructs (Angoff, 1988; Hubley & Zumbo, 1996). These four separate categories were considered different "types" of validity. However, not all psychometricians agreed with this discrete view. Loevinger (1957) argued that "all validity is construct validity" (p. 24). From Loevinger's perspective, having four separate categories of validity was an artificial partitioning of construct validity. However, this view did not gain popular support until the 1970s, when psychometricians such as Messick (1975) and Guion (1977) endorsed its position (as cited in Hubley & Zumbo, 1996). Another fundamental change to the traditional conceptualization of validity occurred in the mid 1960s. In 1966, predictive and concurrent validity were subsumed under the term criterion-related validity (Angoff, 1988). Thus, the four-category view of validity evolved into a trinitarian view. According to Hubley and Zumbo (1996), there was some disagreement as to whether the trinitarian concept was meant to introduce three aspects of validity (Guion, 1980) or three types of validity (Angoff, 1988). However, the dominant view at the time was that validity still consisted of separate categories.

Current View of Validity

Current conceptualizations by educational-measurement specialists and psychometricians consider validity a unitary concept (AERA, APA, NCME, 1985, 1999). Rather than consisting of independent types or categories, validity is now considered a unified construct in which each aspect can join with the others to lend cumulative support for evidence of validity. This unified view of validity is more of an additive model and reflects a fundamental change in the way validity was traditionally construed. Another traditional view that has undergone major changes over the years is the perspective on test validation. Validation is now considered an ongoing process rather than an exclusive step of the development process (Hubley & Zumbo, 1996). Traditionally, validation was considered an a priori procedure conducted by test developers during the initial development of an instrument.
Since validity was conceptualized as an inherent characteristic of the instrument that could be demonstrated by providing its construct, content, and criterion-related validity evidence, test validation was considered necessary only at the outset of development. However, validation is now considered an ongoing process that is necessary both during the initial phases of test development and whenever an instrument is being used with a different population or for a different purpose than that for which it was originally designed (Haynes, Richard, & Kubany, 1995; Hubley & Zumbo, 1996). Validation procedures, especially content validation, are necessary on an ongoing basis in order to provide cumulative evidence for validity.

Dynamic Nature of Validity

Validity is dynamic by its definition and its conceptualization. Validity evidence can degrade over time as previous conceptualizations of a construct change and new information is learned (Haynes, Richard, & Kubany, 1995); thus, older validity evidence may become obsolete and new content-related validity evidence will need to be gathered. As new information emerges, assessment instruments need to be re-evaluated and revised to reflect those changes. According to Haynes, Richard, and Kubany (1995), "it should rarely be assumed that an assessment instrument has unconditional validity" (p. 241). An instrument may have demonstrated acceptable validity in the past; however, dynamic changes in the understanding of a construct can affect the current relevance or representativeness of the items that sample the content domain, and the instrument may thus need to be revised. Due to the dynamic nature of validity, ongoing validation is necessary to ensure that the scores obtained, and the inferences drawn about those scores, are valid for the intended purposes of the current testing situation (Sireci, 1998a). Ongoing validation of an assessment instrument via multiple sources of validity evidence is, therefore, necessary to support the veracity of the scores and the claims made from those scores. According to current views on validity theory, content validation is an ongoing process: one that needs to be updated over time due to the dynamic nature of many constructs, and one that may lead to the emergence of improved conceptualizations of constructs such as PTSD. This is particularly true in the case of PTSD, where numerous subject matter experts in the literature contend that our previous understanding of the disorder as specified in the DSM-IV-TR is truncated and unrepresentative of the entire content domain of PTSD (Carlson, 2001; van der Kolk, 2002). As a result, up-to-date information regarding the full extent of the disorder and the various characteristic symptom sequelae that may manifest in individuals with the disorder needs to be explored. Concomitantly, the current study sought to examine the ability of the CAPS to provide strong content-related validity evidence for our current, more up-to-date conceptualization of PTSD. Thus, all aspects of the CAPS instrument were subjected to content validation, including the administration instructions, the event questions, the individual PTSD items, the associated items, the rating scales, the format, the scoring charts, the summary sheet, the Life Events Checklist, and the time-sampling parameters.

Content Validity

One of the sources of validity evidence that is essential to gather for any measurement purpose (and that will be the focus of the rest of this paper) is content validity.
Sireci (1998a) states that content validity is "crucial for sound measurement in the behavioural sciences" (p. 83). According to the Standards for Educational and Psychological Testing, content validity is defined as "the degree to which the samples of items, tasks, or questions on a test represent some defined universe or domain of content" (AERA, APA, NCME, 1985, p. 10). Content validity refers to how adequately the test items and other elements of the instrument (e.g., instructions, rating scales, and response formats) represent appropriate ways of sampling data for the content domain of interest. If a measurement instrument uses items or elements not relevant to PTSD, it likely will not produce scores representative of PTSD. For example, a question about depression is not relevant to the construct of PTSD and will not produce results representative of the construct. Subsequently, the question will have a negative effect on the results by attenuating the content validity rating of the study. Though depression may be a valid comorbid condition with PTSD, it will not add to the content validity of a measurement occasion designed solely to assess PTSD. The content validity of the measurement occasion also depends on the purpose of the testing, which has been recognized ever since the concept of content validity emerged.

The Historical Emergence of Content Validity

As early as 1946, Rulon argued that when assessing the validity of instruments, an assessment of the content of the instrument and its relation to the measurement purpose is necessary. Rulon recommended an operational approach to instrument validation that required that both the appropriateness of the content and the purpose of testing be evaluated during the validation process (Sireci, 1998a). In addition, Gulliksen (1950) proposed that evaluations of instruments should be empirically based and should include evaluation of expert judgements regarding test content (Sireci, 1998b). Cureton (1951) concurred and, the following year, wrote one of the earliest synopses of the various conceptualizations of validity and content validation. In fact, Cureton was one of the first to introduce the term content validity into the nomenclature of educational and psychological testing and to describe the process of content validation (Sireci, 1998a). Cureton's (1951) chapter entitled "Validity," which appeared in the first edition of Educational Measurement (Lindquist, 1951), states that one should:

ask those who know the job to list the concepts which constitute job knowledge, and to rate the relative importance of these concepts. Then when the preliminary test is finished, we may ask them to examine its items. If they agree fairly well that the test items will evoke acts of job knowledge, and that these acts will constitute a representative sample of all such acts, we may be inclined to accept these judgments (p. 664).

It appears evident from these early writings that the basic concepts of content validity and content validation were in place as early as 1951.

Content Validity versus Face Validity

Since its inception, content validity has often been confused with face validity. Ever since Rulon (1946) stated that some instruments are "obviously valid," confusion has occurred over the difference between the two. However, Anastasi (1988) states that:
The latter is not validity in the technical sense; it refers, not to what the test actually measures, but to what it appears superficially to measure. Face validity pertains to whether the test "looks valid" to the examinees that take it, the administrative personnel who decide on its use and other technically untrained observers (p. 144). Face validity is defined as "validity by assumption" (Sireci, 1998a). If the content of an instrument appears "to bear a common-sense relationship to the objective of the test" then it is assumed to be valid (Mosier, 1947, p. 208). However, Mosier (1947) expressed concern over the veracity of face validity and over assumptions made based on the "appearance of validity" (p. 208). Mosier (1947) did acknowledge that face validity was a highly desirable attribute of tests; however, he argued that "validity by definition" was the only type of face validity that was defensible. "Validity by definition" referred to when test questions defined the objective of testing and could be verified by subject matter experts (Sireci, 1998a). Ultimately, measurement 20 specialists from Mosier (1947) to Messick (1995) argue that face validity is not a real form of validity and emphasize the need for assessing content validity. The Purpose of Content Validation The purpose of conducting a content validation study is to assess whether the content of the test is consistent with the intended purposes of the test (Sireci, 1998b). For instance, i f a test was designed to screen for PTSD, items on the test should represent related characteristics of the disorder and adequately sample a representative amount of the known symptomology. However, if the intended purpose of the test was diagnostic, the content of the instrument would have to consist of enough relevant items to meet the criteria requirements for making an accurate diagnosis. Ultimately, content validity must be demonstrated to support the utility of the instrument and the defensibility of the claims inferred from the data gathered (Sireci, 1998b). A diagnosis of PTSD could not be empirically supported by a content validation study when based on data from a screening instrument (though this situation often occurs). Both the data and the judgments based on the data are the primary object of content validation studies (Sireci, 1998b). Therefore, the assessment instrument elements, the conclusions, and the purpose of the testing situation must be congruent with the intended purposes of the instrument to provide evidence for content validity during a validation study. The Need for Content Validation Many psychological assessment instruments have not been subjected to systematic, quantitative content validation (Haynes, Richard, & Kubany, 1995). Though most have been rationally derived, many have not been empirically validated. Content validation of instruments helps to establish (a) the relative degree to which the instrument taps the targeted construct, (b) their most appropriate functions, (c) the inferences that can be drawn from the resultant data, and 21 (d) the elements that may benefit from refinement (Haynes, Richard, & Kubany, 1995). Content validation can occur at any time during the life of an assessment instrument, whether that is a priori during the development stage, or a posteriori, after it is developed. A Priori Content Validation Content validation is a process, which may occur during the development stage of an assessment instrument. This is known as a priori content validation. 
During the development stage, the validation process involves: 1) identifying the dimensions of the construct of interest; 2) generating items; and 3) assembling the items into a usable form (Lynn, 1986). In this initial phase, the purpose is to minimize potential error variation and to increase the probability of obtaining supportive construct validity indices later on (Haynes, Richard, & Kubany, 1995). By minimizing error variance, the fidelity of the responses is enhanced and the inferences drawn from those responses are more likely to be valid and meaningful. When measuring a psychological construct, sources of error vary. Some sources are associated with the targeted construct, some with the method of assessment, some with the function of assessment, and others with the methods of content validation used (Haynes, Richard, & Kubany, 1995). A priori content validation can address these sources of error variance and reduce them before an instrument's form is finalized. As a result, the content validity of the subsequent scores, and the conclusions inferred from them, may be enhanced. A priori methods for building content validity into an instrument right from the start are highly recommended by testing specialists (Beck & Gable, 2001). However, content validation is a dynamic, ongoing process; therefore, a posteriori content validation is also necessary and will be the focus of the remainder of this paper.

A Posteriori Content Validation

A posteriori content validation refers to validating the items and elements of an instrument after its initial formulation by the developer. A posteriori content validation is often conducted when a priori validation was not performed, or when an instrument is used in a different context than that for which it was originally designed. In addition, a posteriori content validation is required when an instrument needs updating due to changes in the construct over time. These latter two situations are particularly relevant in the assessment of PTSD. Many of the current PTSD instruments were originally designed for use with male military populations and may not adequately assess civilian or female post traumatic experiences. Concordantly, the construct of PTSD has changed over time since its inception in the early 1980s, and many instruments developed according to previous definitions may no longer provide content validity evidence according to the updated definitions. The most commonly used approach for a posteriori content validation in the extant literature is the use of subject matter experts (SMEs). SMEs are experts in the relevant content domain who have considerable experience and expertise regarding the construct of interest. Consequently, choosing a panel of subject matter experts with high levels of content expertise is a crucial step in content validation studies (Haynes, Richard, & Kubany, 1995). The veracity of their content validity judgments depends on their levels of expertise and experience with the particular content domain. Yun and Ulrich (2002) recommend that criteria for choosing judges be specified based on their professional qualifications and experience, and that any biases that may affect their ratings be considered. Clear, concrete criteria that specify years of experience and levels of expertise are critical for enhancing the quality of their ratings.
Poor criteria that do not adequately distinguish between content experts with adequate professional experience and non-professionals with inadequate experience will result in spurious judgements and lower validity ratings (Yun & Ulrich, 2002). Burns (1995) warns that "collecting the opinions of the non-professional SMEs and asking them to assign numbers to their opinions produces 'data,' but does not remove the process from the face validity category; the data is simply a quantification of their specious opinion." Therefore, selecting a panel of judges must be done with careful consideration of their relevant qualifications for the content domain they will be assessing. Selecting a panel of experts involves identifying their position (i.e., job title), experience, knowledge, and availability to complete the task in the allotted time frame (Yun & Ulrich, 2002). Selection also involves determining how many experts will be needed on the panel to meet the minimum content validity ratio values required for statistical significance. According to Lynn (1986), "a minimum of five experts would provide a sufficient level of control for chance agreement" (p. 383). However, more experts are preferable, as larger panels lower the minimum level of agreement required to achieve adequate interrater agreement among ratings: "If there are five or fewer experts, all must agree on the content validity for their rating to be considered a reasonable representation of the universe of possible ratings" (p. 383). The more experts involved on the panel of judges, the lower the minimum amount of agreement that is acceptable. Yun and Ulrich (2002) suggest a multidisciplinary strategy for selecting panels of experts. Multidisciplinary experts bring different perspectives to the content validation process that have the potential to expand current awareness of the topic of interest in a more comprehensive fashion. One advantage of using multiple raters from multiple domains is that each may bring a distinct perspective on the content domain, which may provide unique insight of which the other domains are unaware. In the area of PTSD, for example, where current understanding of the disorder is still considered incomplete, diverse disciplines may contribute valuable domain-specific knowledge that increases our understanding of the topic area. By assembling a multidisciplinary team of experts from such fields as psychology, psychiatry, and neuropsychology, for instance, the potential exists to broaden our current understanding of PTSD and enhance future assessments of this debilitating disorder. Multidisciplinary perspectives during a content validation study may contribute valuable knowledge in the areas of domain definition, item generation, item revision, and adequate content representation that single disciplines may miss. However, one limitation of using multidisciplinary teams is the increased potential for discrepant ratings. Discrepancies among raters can occur due to a lack of complete understanding of the content domain within certain disciplines and lead to lower validity ratings in the study (Yun & Ulrich, 2002). Incomplete understanding of certain aspects of the content domain may result in judges guessing at certain ratings. Guessing, in turn, lowers the validity of ratings.
In order to adequately control for lack of understanding and guessing, selection criteria should focus on a minimum amount of experience, to enhance the likelihood that SMEs have adequate expertise and to deter guessing. Another problem with using multidisciplinary teams is that, in so doing, one may introduce into the ratings biases inherent in the different disciplines. For instance, psychiatrists assessing PTSD criteria may be biased towards a biological model of understanding PTSD, whereas psychologists may take a cognitive perspective with its own inherent biases. Such diverse perspectives may add an additional source of error variation to the judgments and lower validity ratings. Careful consideration must therefore be given to whether multidisciplinary teams may be more appropriate for a construct validation study than for a content validation study, which depends on consensus.

Subject Matter Experts' Ratings

SMEs evaluate an instrument's items and elements and rate them according to their relevance to, and representativeness of, the content domain measured (Beck & Gable, 2001; Sireci, 1998a). The higher the ratings of relevance and representativeness, the higher the content validity rating. Domain relevance refers to the relevance of the test items to the content domain they are intended to measure (Sireci, 1998b). SMEs assess the degree to which an item is associated with the particular content domain and distinguish items that are not highly related. Items that receive low relevance ratings are either eliminated or revised to increase their relevance, thereby increasing their content validity. In addition, Haynes, Richard, and Kubany (1995) argue that the relevance of an assessment instrument also depends on the appropriateness of its elements for the targeted construct and function of assessment. For instance, if the instructions to participants or the rating scale chosen are not relevant to the content domain, content validity will be diminished. For example, in the case of PTSD, if an instrument developer chose a dichotomous rating scale indicating the presence or absence of a particular symptom, but the function of the instrument was to rate the severity of symptoms, the dichotomous scale would not be an appropriate choice and the content validity would be attenuated. In that example, a continuous rating scale would be the more appropriate choice, congruent with the purpose of the assessment, and would increase the content validity. Therefore, not only the test items but also the other elements of the assessment instrument need to be relevant to the content domain. As well as rating domain relevance, SMEs also rate domain representativeness. Domain representativeness refers to evaluating the degree to which test items represent the content domain specifications (Sireci, 1998b). In other words, content experts assess the amount of the entire content domain that is covered by the test items on the instrument. As Haynes, Richard, and Kubany (1995) state, "the representativeness of an assessment instrument refers to the degree to which its elements are proportional to the facets of the targeted construct" (p. 239), or the degree to which the entire construct domain can be reproduced. Adequate content coverage encompassing aspects of the entire content domain yields high representativeness ratings and higher content validity ratings. In addition, content representativeness and content relevance are considered directly related.
Sireci (1998b) states that if an instrument has high content representativeness, the items should also have a high degree of relevance. Items that are representative of a domain are likely to be highly relevant. In order for SMEs to rate the representativeness and relevance of an instrument, they must first be provided with a clear conceptual and operational definition of the content domain. Content experts can then rate the correspondence between the conceptual and operational definitions of the content domain (Beck & Gable, 2001; Lynn, 1986).

Defining the Content Domain

One of the essential first steps in a content validation study is to clearly define the content domain. Domain definition refers to specifying the conceptual and operational definitions of the particular content domain (Sireci, 1998b). The accuracy of a content validation depends on how precisely the construct is defined and the degree to which the experts agree about the domain and its facets (Haynes, Richard, & Kubany, 1995). If a construct is poorly defined, SMEs do not have a reliable basis for judging the content validity of the items. If a construct has fuzzy definitional boundaries or inconsistent definitions, content validation is negatively affected (Haynes, Richard, & Kubany, 1995). Judges may rely on disparate definitions, which will lead to incongruent ratings. Unreliability will negatively impact the amount of agreement in the judges' ratings and will lower overall content validity ratings. According to Haynes, Richard, and Kubany (1995), "a construct that is poorly defined, undifferentiated, and imprecisely partitioned will limit the content validity of the assessment instrument" (p. 244). However, clearly defining the concept to be measured and establishing the objectives outlining the purpose of the instrument will positively impact content validation. A clear conceptual definition justifies the content of an instrument and relates it to the broader theory, thereby providing a means of interpreting the results of a study (McKenzie, Wood, Kotecki, Clark, & Brey, 1999). Clear, concrete definitions make judgements about items easier and make subsequent interpretations more straightforward. Clear definitions help limit the universe of possible items to sample and make selection of possible items much easier (McKenzie, Wood, Kotecki, Clark, & Brey, 1999). Limiting the universe of possible items refers to narrowing the available choices of items to those that are the most relevant. Moreover, carefully articulating, differentiating, and examining facets of the construct can facilitate the representativeness of the item content of the assessment instrument (Haynes, Richard, & Kubany, 1995). Therefore, clear domain definitions are likely to increase the relevance and representativeness of the instrument's items, which will subsequently enhance content validity ratings. Identifying and defining the domain of content is accomplished through a comprehensive literature review and through consultation with content experts (Wynd, Schmidt, & Schaefer, 2003). During this process, all known aspects of the phenomenon of interest are investigated and identified through a systematic, thorough review of the literature. In addition, experts from a particular content area can also provide invaluable information based on their clinical experience and professional training to expand the developer's (or researcher's) awareness of the phenomenon of interest.
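The item-level expert ratings described above are commonly summarized quantitatively. The sketch below (hypothetical ratings and function name) follows the convention behind Lynn's (1986) content validity index, reported later in this thesis, in which ratings of 3 or 4 on a 4-point relevance scale count as relevant:

    def item_cvi(ratings: list[int]) -> float:
        # Proportion of experts rating the item relevant (3 or 4 on a 1-4 scale).
        relevant = sum(1 for r in ratings if r >= 3)
        return relevant / len(ratings)

    # Hypothetical ratings from a panel of six experts for two items:
    print(item_cvi([4, 4, 3, 4, 3, 4]))  # 1.00 -- unanimous; strong evidence of relevance
    print(item_cvi([4, 2, 3, 1, 3, 2]))  # 0.50 -- a candidate for revision or removal

Consistent with Lynn's criterion quoted earlier, a panel of five or fewer experts would require a proportion of 1.00 for an item's rating to be considered defensible.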
Other Considerations for Content Validation Studies

According to Haynes, Richard, and Kubany (1995), content validation is a multimethod process that is applicable to all elements of the assessment situation, not just the instrument's items. Elements of an instrument refer to "all the aspects of the measurement process that can affect the obtained data" (p. 238), which include the instructions to participants, response formats, response scales, audio-visual material, and situational depictions. Each can affect not only the data obtained, but also the relevance, representativeness, and clinical inferences drawn from the data (Haynes, Richard, & Kubany, 1995). For example, in work with combat veterans suffering from PTSD, van der Kolk et al. (1996) state that the depicted battle scenes used to elicit physiological responses were reviewed by other combat veterans for relevance and face validity; in addition, the psychophysiological measures used were reviewed by PTSD experts and psychophysiologists. Elements of the measurement process also include the behavioural observation codes, time-sampling parameters, and the situation in which the observation occurs, because these too can affect the obtained data (Haynes, Richard, & Kubany, 1995). Observation codes used on assessment forms should be relevant to the construct of interest and representative of behaviours that are known to occur in the disorder. For example, if a PTSD measure is assessing an individual's observed behaviours related to PTSD, such observational codes may include exaggerated startle response, inability to recall events, or difficulty concentrating. In addition, time-sampling parameters should be adequate for the testing situation given the applicable features of the domain. For instance, on self-report measures, if PTSD patients tend to decompensate under time pressure and that decompensating behaviour is not of research interest, then adequate time should be allowed to measure relevant behaviours and preclude irrelevant ones. Every element of an assessment instrument should be judged by multiple experts for relevance, representativeness, specificity, and clarity (Haynes, Richard, & Kubany, 1995). Specificity is related to representativeness and determines to what degree certain areas of the content domain are adequately and concretely targeted. Ambiguous items, for instance, may mistakenly assess a construct for which they were not intended and thereby attenuate the content validity ratings of a study. Concordantly, lack of clarity in any element of the assessment instrument can lead to confusion in the participants and subsequently lower the validity of the responses based on those elements. For example, if the instructions regarding the time frame during which the behaviours of interest should have occurred (e.g., within the last week, or last month) are not clear, participants may mistakenly respond based on the wrong temporal parameters. Thus, evaluation by experts with experience in detecting such details is important to improve the clarity of the instructions and other elements, and may significantly increase the content validity of the responses.

Interrater Reliability versus Interrater Agreement

Two related properties of quantitative ratings by SMEs regarding content validity are the concepts known as interrater agreement and interrater reliability. Goodwin (2001) contends that interrater agreement and interrater reliability do not indicate the same properties of the ratings. Interrater agreement is defined as the extent of the match between two or more raters' scores assigned to performances, behaviours, or observations, whereas interrater reliability is the extent to which scores obtained from two or more raters, scorers, observers, or judges are consistent (Goodwin, 2001). That is, interrater agreement concerns whether the judges assign an object the same rating, while interrater reliability concerns the degree to which each rater's ratings can be reliably reproduced. Concordantly, "indexes of agreement are conceptually different from reliability estimates derived from either classical test theory or G theory" (Goodwin, 2001, p. 16). The underlying structures of the equations used to estimate interrater agreement and interrater reliability are highly dissimilar, and they produce distinct results. Therefore, it is important to choose the appropriate index for the appropriate interrater concept.
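To make this distinction concrete, here is a minimal sketch in Python; the ratings are invented for illustration and are not this study's data. Two raters who differ only by a constant leniency offset are perfectly consistent, and hence perfectly reliable in the correlational sense, yet never agree exactly.

```python
# Hypothetical illustration: perfect interrater reliability (consistency)
# can coexist with zero exact interrater agreement.
import numpy as np
from scipy.stats import pearsonr

rater_a = np.array([0, 1, 1, 2, 0, 2, 1, 2])   # ratings on a 0-3 scale
rater_b = rater_a + 1                          # rater B is uniformly one point more lenient

reliability = pearsonr(rater_a, rater_b)[0]    # consistency of rank ordering
agreement = np.mean(rater_a == rater_b)        # proportion of identical scores

print(f"interrater reliability (Pearson r): {reliability:.2f}")  # 1.00
print(f"interrater agreement (exact match): {agreement:.2f}")    # 0.00
```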
In this study, interrater agreement among SME ratings will be examined as one necessary, but not sufficient, piece of evidence of content validity. Consensual validation by subject matter experts is considered the preferred method for establishing adequate content validity (Veneziano & Hooper, 1997). According to Goodwin (2001), sources of error that may affect SME ratings include:
1. Intraindividual factors (e.g., fatigue, anxiety, lack of motivation).
2. Characteristics of the measure (e.g., unclear or ambiguous items, too few items).
3. Administration factors (e.g., unclear instructions, insufficient time to complete tasks).
4. Scoring errors.
Other factors that could produce discrepancies in interrater agreement are:
1. Different theoretical orientations of the experts.
2. Misunderstanding of the rating task.
3. Differences in leniency and strictness of ratings (Overall & Magee, 1992; Yun & Ulrich, 2002).
Yun and Ulrich (2002) suggest that researchers should decide a priori on a method for analyzing discrepancies among the raters. For example, will one rater's judgement be ignored if the majority of the other panel members agree on an item with which that judge disagrees? Or will the decision be based on the match with the theoretical construct or criteria? One quantitative evaluation process, developed by Lawshe (1975) and expanded by Veneziano and Hooper (1997), asks jurors to rate the appropriateness of each item by stating whether it is essential; useful, but not essential; or not necessary. Interestingly, an instrument that provides inadequate evidence of content validity may still provide positive, albeit incomplete, evidence for validity (Haynes, Richard, & Kubany, 1995). Even with inadequate content validity evidence, an instrument may demonstrate excellent interobserver agreement and good criterion validity if the indices of shared variance between the instrument and criterion result from shared variance with extraneous elements outside the construct domain (Haynes, Richard, & Kubany, 1995). Using an instrument with incomplete evidence of content validity calls into question the inferences drawn from the obtained data, because the variance in obtained scores cannot be explained to a satisfactory degree by the construct (Haynes, Richard, & Kubany, 1995).
The data may over-represent, omit, or under-represent some facets of the construct and reflect variables outside the construct domain (Haynes, Richard, & Kubany, 1995). In addition, an instrument with a low degree of content-related validity evidence could erroneously indicate the occurrence or nonoccurrence of clinically significant treatment effects (Haynes, Richard, & Kubany, 1995). "Similarly, erroneous inferences could be drawn about causes for panic attacks (e.g., the immediate triggers for attacks or the factors that affect the severity or duration of attacks) because estimates of shared variance would be based on erroneous measures of the construct" (Haynes, Richard, & Kubany, 1995, p. 240). Inferences made from assessment instruments with unsatisfactory content validity will be suspect, even when other indices of validity are satisfactory (Haynes, Richard, & Kubany, 1995).

Indices of Content Validity

A second approach to a posteriori content validation involves empirical techniques to quantify the degree of content-related validity evidence (Beck & Gable, 2001). "Content validity indices are specific to a particular function of the assessment instrument and to other factors such as the population to which the instrument is applied and the assessment situation in which the instrument is used" (Haynes, Richard, & Kubany, 1995, p. 245). As a result, the following recommendations are made regarding content validation: (a) indices of content validity cannot be presumed to remain stable across time; (b) the content validity of psychological assessment instruments should be examined periodically; (c) psychological assessment instruments should be revised periodically to reflect revisions in the targeted construct; and (d) researchers should recognize that erroneous inferences regarding revised constructs may be drawn from unrevised assessment instruments (Haynes, Richard, & Kubany, 1995). In reviewing the various content validity indices, the Content Validity Ratio (CVR; Lawshe, 1975) and the Content Validity Index (CVI; Lynn, 1986; Waltz & Bausell, 1981) were chosen based on recommendations in the literature and their suitability for the level of measurement in this study (see pages 40 and 41 for further explanation). Both the CVR and the CVI are appropriate indices for use with ordinal data. Other indices, such as the intraclass correlation, are more appropriate for interval or ratio level data and were thus eliminated from the choices for this study. In addition, the Average Deviation index (AD) was chosen to evaluate the amount of disagreement in the SMEs' ratings. The AD index is also appropriate for ordinal data, and recommendations in the literature suggest reporting both the mean and the median scores (Burke et al., 1999; see Table 2).

Chapter Three

Method

The purpose of this study was to conduct a content validation of the Clinician-Administered Post Traumatic Stress Scale (CAPS-dxsx; Blake et al., 1995). The goal was to evaluate (a) the CAPS' adequacy in sampling the entire content domain of PTSD, (b) how relevant each of the elements of the CAPS was, and (c) whether our current definitions of trauma and PTSD in the DSM-IV-TR (APA, 2000) are considered adequate by our subject matter experts.

Participants

Participants in this study consisted of two separate groups.
The first group, for the pilot study, consisted of two graduate students (one male, 52 years old, and one female, 44 years old); both had master's degrees in psychology and were in the age range of the subject matter experts in the main study. Pilot study participants were randomly sampled from students in a psychology class on PTSD. The second group of participants, for the main study, was a panel of six subject matter experts (SMEs) (males n = 5, females n = 1) in the content domain of PTSD. Because the number of subject matter experts in PTSD is limited, purposive sampling was used in order to obtain enough well-qualified experts who met the inclusion criteria explained below. Participants ranged in age from 32 to 54 years (M = 44, SD = 9.12 years). All six participants in this study had a Ph.D. in psychology, and their years of experience working with people with PTSD (and/or researching it academically) ranged from eight to 20 years (M = 13 years, SD = 4.87 years). The types of trauma populations with which the experts work are unknown at the present time. Selection criteria for this study were as follows: completion of a minimum of a master's degree in a profession relevant to PTSD (e.g., psychology or psychiatry); specialized training in PTSD (e.g., accredited course work, continuing education workshops); and a minimum of five years of recent (i.e., within the last five years) clinical or research experience working with clients having PTSD or on published research projects about PTSD (respectively). Sireci (1998) recommends that selection criteria for SMEs be based on relevant knowledge and recent experience regarding the content domain of interest, to ensure competence in the subject matter and to enhance the quality of the ratings. Lynn (1986) recommends a minimum of three panelists to control for chance agreement, but recommends five to ten as optimal. However, she acknowledges that determination of the number of experts needed is somewhat arbitrary.

Procedure

A pilot study was conducted with two participants (one male, 52 years old; one female, 44 years old) having similar ages and education levels as our subject matter experts. Subjects were recruited via personal solicitation by the researcher at a local university in order to pretest the fidelity and clarity of the content validation study's questionnaire. Pilot study participants were asked to assess the clarity, comprehensibility, and accuracy of the study's questionnaire and to make suggestions or recommendations for improving the quality of the survey instrument. In addition, participants were timed in order to provide an estimate of how long it would take SMEs to complete the questionnaire. The two participants took an average of 75 minutes to complete the questionnaire. Recommendations and suggestions made by the participants included correcting some typographical mistakes in the spelling and numbering of questions, and fixing an inaccuracy in question 136 regarding which question on the Life Events Checklist the item referred to. Recommendations were reviewed and incorporated into the final product. Subsequently, subject matter experts (SMEs) were recruited using a purposive sampling technique in order to obtain enough qualified experts in PTSD, who can be quite scarce. Participants were contacted via telephone or letter (see Appendix A) by the researcher.
Upon consent, participants received a package via mail including a cover letter (Appendix A), an informed consent form (see Appendix B), a copy of the questionnaire (Appendix C), the DSM-IV's definition of PTSD (Appendix D), specific instructions for completing the review, and a self-addressed stamped envelope in which to return the completed questionnaire. Written informed consent was obtained prior to data collection. Data were collected in one administration in order to minimize the workload on SMEs and to enhance the likelihood of their participation. Originally, ten subject matter experts agreed to participate in this study and were sent a study questionnaire. Of the ten questionnaires mailed out, only six were returned. Panelists were informed in the cover letter of the tasks to be completed, the estimated amount of time to complete the tasks, and the conceptual underpinnings of the rating task. Financial incentives were not offered; however, internal rewards were highlighted. Participants were informed of the importance of their participation and the contribution it would make to the extant knowledge regarding PTSD. Participants were asked at the time of recruitment if they would be willing to come to the University of British Columbia for a consensus-building meeting regarding the study. However, due to time constraints and the upcoming summer holiday break, participants declined the offer. Voluntary withdrawal of participation from the consensus-building meeting was respected, and participants were thanked for their willingness to participate in the survey.

Measures

The Clinician-Administered PTSD Scale (CAPS-dxsx; Blake et al., 1990, 1995; Appendix E) is a 30-item semi-structured interview intended for use by clinicians, clinical researchers, and appropriately trained mental health paraprofessionals with a working knowledge of PTSD. Developed at the National Center for PTSD, the CAPS-dxsx was designed as a general diagnostic instrument to assess and diagnose current and lifetime PTSD status. It was also designed to overcome limitations of previously existing clinical interviews for assessing PTSD, which often yielded only categorical/dichotomous data (e.g., presence/absence of PTSD) rather than dimensional ratings (e.g., frequency and intensity of symptoms). The CAPS-dxsx assesses the 17 core symptoms of PTSD outlined in the DSM-IV, as well as eight associated symptoms of PTSD outlined in the DSM-III-R and derived from the clinical research literature on PTSD, and five additional items assessing (a) impact on social functioning; (b) occupational functioning; (c) global PTSD symptom severity; (d) global changes in symptoms; and (e) overall response validity. The frequency and intensity of each symptom are rated on separate 5-point Likert-type scales ranging from 0 to 4 (lowest frequency or intensity to highest). Standard prompt questions, suggested follow-up questions, and behaviourally anchored rating options are provided for each item. Prompt questions and behavioural anchors were incorporated to enhance reliability by increasing rating precision (Blake et al., 1995). Responses on the rating scale yield both dichotomous (presence/absence) scores for diagnostic purposes and continuous (frequency and intensity) scores for a finer-grained analysis of PTSD symptom severity.
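To illustrate how dichotomous presence/absence scores can be derived from the dimensional ratings, here is a minimal Python sketch of the F1/I2 scoring rule discussed in the next paragraph, under which a symptom counts as present when frequency is at least 1 and intensity is at least 2. The data structure and item values are invented for illustration; the rule name follows Weathers et al. (1999).

```python
# Minimal sketch of the F1/I2 scoring rule (Weathers et al., 1999): a CAPS
# symptom is scored as present when its frequency rating is at least 1 and
# its intensity rating is at least 2 (both scales range from 0 to 4).
# The (frequency, intensity) pairs below are invented for illustration.

def symptom_present_f1i2(frequency: int, intensity: int) -> bool:
    return frequency >= 1 and intensity >= 2

items = [(2, 3), (1, 1), (0, 4), (3, 2)]
present = [symptom_present_f1i2(f, i) for f, i in items]

print(present)        # [True, False, False, True]
print(sum(present))   # number of symptoms endorsed under F1/I2
```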
The CAPS is specifically designed to establish that all endorsed symptoms occurred within the same one-month time frame in order to distinguish between current and lifetime PTSD status ratings. Follow-up questions regarding the time of symptom occurrence are included to further delineate between current and lifetime PTSD time frames. A 3-point rating scale (unlikely, probable, and definite) is used for the last nine items on the CAPS, which allows the clinician to assess whether the reported symptom is the result of a specific traumatic event. Data were coded on a 17-item summary sheet included with the CAPS, which was designed to ease scoring and interpretation tasks. In addition, nine different diagnostic scoring rules can be employed to assess the data. Review of the scoring rules is strongly suggested by Weathers et al. (1999), since different scoring rules produce different outcomes. In all cases, higher scores indicate more frequent or more intense symptoms of PTSD, while lower scores indicate less frequent or less intense symptoms (or their absence). In comparing the diagnostic criteria for assessing PTSD according to DSM-III-R and DSM-IV standards, Weathers et al. (1999) found that both yielded near-identical results. However, they discovered that the results varied depending on which scoring rule was chosen. Weathers et al. (1999) recommend, as the optimal selection strategy, choosing among several different scoring rules depending on the purpose of the assessment and evaluating the impact of the various rules on the outcome of the study. The general guidelines suggested by Weathers and colleagues (1999) are:
1. Use a lenient rule (e.g., F1/I2) for screening purposes, when false negatives are to be avoided.
2. Use a moderate rule (e.g., SXCAL) for differential diagnosis, when false positives and false negatives are equally undesirable.
3. Use a stringent rule (e.g., F1/I2/SEV65 or CR60) for confirming a diagnosis or creating a homogeneous group of individuals with unequivocal PTSD.

Qualitative Questions

Part two of this study involved four open-ended questions to survey the subject matter experts' opinions regarding the current conceptualizations of PTSD and trauma in the DSM-IV-TR (APA, 2000) and their suggestions for improving those definitions. Answers were analyzed by looking for emergent themes in the data, as suggested by Denzin and Lincoln (2000), using the SMEs' own words as data.

The Rating Process for the Quantitative Questions

Reviewers were asked to rate:
1. the clarity of the administration instructions;
2. the relevance, representativeness, and clarity of the items; and
3. the clarity and appropriateness of the CAPS scoring procedures.
Haynes, Richard, and Kubany (1995) state that content validity depends on the relevance, representativeness, and clarity of every element of the assessment instrument. Ratings were made on a four-point Likert-type scale ranging from 0 to 3 with anchor labels (Not at all relevant, Somewhat relevant, Mostly relevant, Very relevant). Use of a four-point scale is based on recommendations by Sireci (1998b) and Lynn (1986) in order to avoid neutral responses. In addition to the items on the CAPS, the instructions, response formats, and scoring procedures were also assessed for their clarity, ease of understanding (of what to do), and appropriateness. Discrepancies in the SME ratings of content validity could arise from several sources.
Discrepancies could be a result of:
1. different theoretical orientations of the SMEs,
2. misunderstanding of the rating task, or
3. differential leniency/strictness tendencies.
Several questions regarding the SMEs' theoretical orientations and beliefs about PTSD were asked on the demographic questionnaire, and these were matched with the experts' ratings, as well as compared with the other SMEs' ratings, to see if a pattern of responding emerged that could account for discrepancies in the ratings. Each judge's overall pattern of rating was also examined for patterns that might indicate excessive leniency or strictness. In extreme cases, discrepancies due to one individual judge's extreme leniency or strictness may warrant discarding his or her data in order to enhance the quality of the ratings. However, final decisions regarding handling discrepancies in the data were made on a case-by-case basis depending on the possible factors involved.

Content Validity Analysis

The first step in the analysis process involved calculating content validity indices for the items. According to Goodwin (2001), the choice of indices depends on the level of measurement. In this study, ordinal data were collected, which ruled out many of the indices designed for nominal, interval, or ratio level data. Data from the completed questionnaires were transformed into several indices of content validity appropriate for use with ordinal data. Crocker, Miller, and Franks (1989) also recommend thoughtful consideration of the purpose of the content validation study in order to decide which content validity index to use. According to Crocker, Miller, and Franks (1989), there are three distinct purposes for quantifying degree of fit:
1. To assess the overall fit between test and content domain, in which case they recommend either a percentage agreement index or Klein and Kosecoff's (1975) correlation index.
2. To assess the fit of individual items to a content domain, in which case they recommend Hambleton's (1980) Item-Objective Congruence Index, Aiken's (1980) Validity Index, or Lawshe's (1975) Content Validity Ratio.
3. To assess the impact of test specifications on examinee performance, in which case Jarjoura and Brennan's (1982) variance component estimate is recommended.
Since the purpose of this study was to assess the fit of individual items to a content domain, Lawshe's (1975) Content Validity Ratio (CVR) was chosen, as recommended by Crocker, Miller, and Franks (1989). Lawshe (1975) developed the CVR as a direct linear transformation of the SME ratings. The formula for the individual item CVR is:

CVR = (ne - N/2) / (N/2)

where ne is the number of panelists indicating an item is "essential" and N is the total number of panelists. The CVR is calculated for each item by tallying the number of the highest ratings (2 or 3 on our scale) that the item received from the subject matter experts, subtracting half the number of judges in the study, and then dividing the result by half the number of judges. The resulting value is the content validity ratio for that item, and higher values indicate stronger content validity. The range of the CVR is from -1.0 to +1.0, and values approaching +1.0 are considered strong content-related validity evidence.
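As a computational illustration, here is a minimal Python sketch of the item-level CVR as defined above. The ratings are hypothetical, and, following this study's convention, a rating of 2 or 3 on the 0-3 scale counts as "essential".

```python
# Minimal sketch of Lawshe's (1975) Content Validity Ratio:
# CVR = (n_e - N/2) / (N/2), where n_e is the number of panelists rating
# the item as "essential" (2 or 3 on this study's 0-3 scale) and N is the
# total number of panelists.

def content_validity_ratio(ratings, essential_min=2):
    n = len(ratings)
    n_essential = sum(1 for r in ratings if r >= essential_min)
    return (n_essential - n / 2) / (n / 2)

print(content_validity_ratio([3, 2, 3, 2, 1, 3]))   # (5 - 3) / 3 = 0.67
print(content_validity_ratio([3, 3, 2, 3, 2, 2]))   # all essential -> 1.0
```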
Subsequently, once all the individual content validity ratios have been calculated, their average is taken as the content validity index for the whole instrument. This scale-level content validity index represents the proportion of items or elements on the whole instrument that are judged to provide strong content-related validity evidence; higher values represent stronger content validity. This scale-level version of the content validity ratio/index is not to be confused with Waltz and Bausell's (1981) CVI used in the following part of the study. The choice of content validity index depends on several factors, such as the level of measurement involved in the study, the research question, and the complexity of the calculations for the index. Crocker, Miller, and Franks (1989) suggest that Lawshe's CVR, Aiken's Validity Index (V), or Hambleton's (1980) Item-Objective Congruence Index could each have served the purposes of this study with ordinal data. Aiken's V was not chosen because of its potential complexity: as the number of rating categories increases, its calculation becomes more complex and may require the use of a special computer program (Crocker, Miller, & Franks, 1989). Similarly, Hambleton's (1980) Item-Objective Congruence Index was not chosen because it is more time-consuming for judges during the rating task, since it requires judges to match every item to a single objective (Crocker, Miller, & Franks, 1989). The second content validity index chosen for this study was Waltz and Bausell's (1981) version of the Content Validity Index (CVI). According to Lynn (1986), the CVI is the most widely used index of content validity. To calculate this version of the CVI at the item level, tally the number of ratings of 2 or 3 that each item receives from the judges and then divide that number by the total number of judges. For example, in this study each item was rated by six judges, so one would count the number of ratings of 2 or 3 that a particular item received and then divide that total by six. The resulting value represents the proportion of judges who rate the item as having strong content-related validity (Rubio, Berg-Weger, Tebb, & Lee, 2003). For instance, if four of the six judges in this study rate the item as a 2 or 3 on the rating scale, the proportion for that item is four divided by six, a CVI value of .66. The higher the proportion, the higher the degree of content validity the item is judged to have. The theoretical range of the CVI is from 0 to +1.0. Thus, the CVI is a proportion agreement statistic of content validity. However, strong criticism of proportion agreement statistics exists in the extant literature. Opponents argue that such proportions are largely inflated by chance agreement (Goodwin, 2001). Consequently, when using proportion agreement statistics, many recommend using some type of control for chance agreement. For instance, Lynn (1986) recommends that with six raters, five of the six would have to rate the item as a 2 or 3 to account for chance agreement and to be considered strong evidence of content-related validity. Five out of six works out to a proportion of .83; therefore, a cut score of .83 for the CVI was used in this study to represent an acceptable level of content-related validity evidence.
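A minimal Python sketch of this item-level CVI and the .83 cut score, again with hypothetical ratings:

```python
# Minimal sketch of Waltz and Bausell's (1981) item-level CVI: the proportion
# of judges rating the item 2 or 3 on the 0-3 scale. With six judges, Lynn's
# (1986) chance-corrected criterion requires at least 5 of 6 (CVI >= .83).

def content_validity_index(ratings, relevant_min=2):
    return sum(1 for r in ratings if r >= relevant_min) / len(ratings)

ratings = [3, 2, 3, 2, 1, 3]   # hypothetical ratings from six judges
cvi = content_validity_index(ratings)

print(f"CVI = {cvi:.2f}")                                # CVI = 0.83
print("acceptable" if cvi >= 5 / 6 else "below cutoff")  # acceptable
```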
Interrater Agreement Analysis

A third index was calculated to assess interrater agreement and disagreement in the ratings. The Average Deviation index (AD) was developed by Burke, Finkelstein, and Dusig (1999). The ADM or ADMd represents the average absolute deviation of each judge's rating on a target from the mean or median (respectively) of the ratings. The ADM and ADMd are indices that measure the dispersion of responses about the mean (or median); therefore, a smaller score indicates better interrater agreement. Because they average absolute deviations, the ADM and ADMd are bounded below by zero, which represents perfect agreement; Burke and Dunlap (2003) provide guidance for interpreting observed values. The ADM and ADMd indices allow researchers to establish a priori ranges of agreement, or to specify acceptable ranges of disagreement in the ratings, which may represent random error or chance responding (Burke & Dunlap, 2003). To calculate the ADM or ADMd, each judge's rating of an item is subtracted from the mean or median rating for that item; the absolute values of those deviations are then summed and divided by the number of judges rating that item. A higher value of the ADM or ADMd represents greater disagreement in the ratings. Burke et al. (1999) recommend a cut-off for an acceptable level of interrater agreement of c/6, where c represents the number of points on the rating scale (the constant six is not explained by the authors). In this study, the rating scale has four points (ranging from 0 to 3); therefore, 4/6 works out to a cut score of .66. In contrast to the CVI, any AD score lower than the cut score (.66) is taken as an acceptable level of interrater agreement.
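A minimal Python sketch of the ADM and ADMd with hypothetical ratings on this study's 0-3 scale, using the c/6 cutoff described above:

```python
# Minimal sketch of the Average Deviation index (Burke et al., 1999): the
# mean absolute deviation of the judges' ratings from their mean (AD_M) or
# median (AD_Md). Smaller values indicate better agreement; on a 4-point
# scale the acceptability cutoff is c/6 = 4/6, i.e., roughly .66.
import statistics

def ad_index(ratings, center=statistics.mean):
    c = center(ratings)
    return sum(abs(r - c) for r in ratings) / len(ratings)

ratings = [3, 2, 3, 2, 3, 3]   # hypothetical ratings from six judges

print(round(ad_index(ratings), 2))                      # AD_M  = 0.44
print(round(ad_index(ratings, statistics.median), 2))   # AD_Md = 0.33
```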
Chapter Four

Results

The purpose of this study was to gather and examine content validity evidence for the latest version of the Clinician-Administered Post Traumatic Stress Scale (CAPS-dxsx). The CAPS is currently the most widely used semi-structured clinical interview for assessing PTSD and is considered the "gold standard" in the extant literature. However, no investigation to date had conducted a content validation study of this popular instrument. Content validation is an important aspect of both test development and a posteriori validation procedures, and although subject matter experts participated in the CAPS' development (Blake et al., 1995), that alone does not provide content validity evidence for the instrument. Hence, the ability of the CAPS to provide content-related validity evidence needed to be explored in order to fill the existing gap in the validity evidence and to justify its use as an assessment tool for measuring PTSD. Preliminary analyses consisted of calculating three different indices (two content validity indices and one deviation index) in order to (a) assess the ability of the CAPS-dxsx to produce content-related validity evidence for PTSD and (b) assess levels of agreement in the ratings (see Appendix F). Content validity was assessed by calculating the Content Validity Ratio (CVR; Lawshe, 1975) for each item and averaging that statistic to represent the content validity index for the entire measure; secondly, Waltz and Bausell's (1981) version of the Content Validity Index (CVI) was calculated for each item (see Appendix F) and then for the various subscales of the CAPS. Finally, the Average Deviation index (AD; Burke et al., 1999) was calculated to assess the degree of disagreement in the ratings (see Appendix F).

Content Validity Ratio

The initial content validation analysis was conducted by calculating Lawshe's Content Validity Ratio (CVR) for each item of the CAPS (see Appendix F). The CVR is an item statistic that represents a direct linear transformation of the subject matter experts' (SMEs') ratings of content validity (Lawshe, 1975; Sireci & Geisinger, 1992). To qualify as strong content validity evidence, an item had to receive a rating of 2 or 3 on the 4-point Likert-type rating scale (ranging from 0 to 3) for its relevance, representativeness, and/or clarity from the SMEs. Items receiving a rating of 0 or 1 were considered weak evidence of content-related validity (Lynn, 1986; Waltz & Bausell, 1981). The underlying assumptions of the Content Validity Ratio are: "any item... perceived to be 'essential' by more than half of the panelists, has some degree of content validity"; and "the more panelists (beyond 50%) who perceive the item as 'essential', the greater the extent or degree of its content validity" (Lawshe, 1975, p. 567). Thus, the higher the content validity ratio value, the greater the degree of content-related validity evidence according to the raters. In this study, items rated by all the SMEs as a two or three achieved the highest value of 1.0 under Lawshe's CVR formula (see page 40). The theoretical range of the CVR is -1.0 to +1.0; however, Lawshe (1975) recommends a practical range of 0 to +1. The items in this study earning the highest and lowest CVR values are summarized in Table 1 (below). The sections with the highest CVR values include the overall administration instructions, the traumatic event questions, and the individual CAPS items. In contrast, the items receiving the lowest CVR values of 0 in this study were the representativeness items, the "administer checklist" instruction, the format, the intensity questions, the impairment of functioning and global severity questions, the summary chart, survivor guilt, and the CAPS Summary Sheet (see Table 1).

Table 1
Content Validity Ratio Highest and Lowest Values

CAPS questions with highest values                         CVR
  Administration instructions                              1.0
  Traumatic event questions                                1.0
  CAPS individual items                                    1.0
  Life Events Checklist questions                          1.0

CAPS questions with lowest values                          CVR
  Representativeness items                                 0
  "Administer checklist" instruction                       0
  Intensity questions                                      0
  Impairment of Functioning and Global Severity questions  0
  Format                                                   0
  Summary chart                                            0
  Survivor guilt                                           0
  Summary Sheet                                            0

After calculating the CVR for each item, the CVI for the entire instrument was calculated by taking the mean of all the CVR statistics. This overall averaged CVI statistic differs from Waltz and Bausell's (1981) individual CVI statistic described below. In this part of the study, the CVI value for the entire CAPS instrument (based on an average of the CVR ratings) was a moderate .50. Thus, overall, the CAPS provided moderate content-related validity evidence. Generally, the higher the content validity ratio, the higher the degree of content validity; in the literature, overall CVI values of .80 or higher are considered strong validity evidence. Thus, a CVI value of .50 for the CAPS-dxsx does not meet the minimum acceptable level to be classified as strong content-related validity evidence. Subsequent analyses of various subscales of the CAPS were conducted in order to determine whether various sections of the CAPS provided stronger content-related validity evidence than the overall (averaged) rating (Table 2).
The CAPS questionnaire was broken down into the following categories: 1) Administration Instructions; 2) Traumatic Event Questions; 3) Rating Scales; 4) Format of the CAPS; 5) Individual Items; 6) Global Ratings; 7) Scoring Summary Chart; 8) Associated Features of PTSD; 9) CAPS Summary Sheet; 10) Time-sampling Parameter; 11) Life Events Checklist; and 12) Overall Adequacy of the CAPS to measure PTSD according to the DSM-IV's definition (see Table 2). As seen in Table 2, only one subscale of the CAPS, the traumatic event questions subscale, received a significant CVR rating as relevant to PTSD. All other subscales and elements failed to meet minimum acceptable levels of content-related validity; thus, strong evidence of content validity for the other areas of the instrument was not supported by the data. However, of all the subsections, the Life Events Checklist (LEC) received the best overall CVR ratings across categories. The LEC obtained a CVR of .55 for relevance, .46 for representativeness, and .84 for ease of understanding. In contrast, the lowest CVR rating of 0 was given for the representativeness of the CAPS traumatic event questions.

Table 2
CAPS Subscales: Averaged Values for the Content Validity and Interrater Agreement Indices
(Values are listed as Ave. CVR / Ave. CVI / Ave. ADM / Ave. ADMd; a dash indicates a value that was not assessed or not reported.)

Administration Instructions
  Relevance: .45 / .83* / .67 / .33*   Representativeness: -   Clarity: - / .92* / .58* / .38*
Traumatic Event Questions
  Relevance: 1.0* / 1.0* / .40* / .33*   Representativeness: 0 / .50 / .67 / .67   Clarity: -
Rating Scales
  Relevance: .33 / .67 / .89 / .83   Representativeness: -   Clarity: -
Format
  Relevance: .17 / .50 / .83 / .83   Representativeness: -   Clarity: - / .67 / .50* / .55*
Individual Items
  Relevance: .31 / .87* / .67 / .67   Representativeness: .53 / .76 / .62* / .62*   Clarity: .53 / .72 / .57* / .57*
Global Ratings
  Relevance: 0 / .50 / .72 / .72   Representativeness: -   Clarity: - / .50 / .83 / .83
Scoring Summary Chart
  Relevance: .50 / .75 / .91 / .91   Representativeness: -   Clarity: - / .60 / .64 / .64
Associated Features
  Relevance: .30 / .65 / .71 / .71   Representativeness: .17 / .55 / .70 / .65*   Clarity: .17 / .83* / .58* / .58*
Summary Sheet
  Relevance: .13 / .50 / .77 / .77   Representativeness: - / .67 / .67 / .67   Clarity: - / .53 / .53* / .52*
Time-sampling Parameter
  Relevance: .33 / .67 / .67 / .67   Representativeness: -   Clarity: -
Life Events Checklist
  Relevance: .55 / .76 / .58* / .58*   Representativeness: .46 / .72 / .64* / .64*   Clarity: .46 / .92* / .48* / .48*
Overall Adequately Met Objective (DSM-IV)
  Relevance: .67 / .83* / .50* / .55*   Representativeness: - / .67 / .83 / .83   Clarity: -

Note: * = significant at the .05 level. Ave. = average. CVR values need to approach 1.0 to be considered strong content-related validity evidence. CVI values are expected to be .83 or larger to be considered strong content-related validity evidence. AD values are expected to be < .66 to be considered strong content-related validity evidence.

A second content validity analysis of the data was conducted using another variation of the Content Validity Index (Waltz & Bausell, 1981). According to Lynn (1986) and Wynd, Schmidt, and Schaefer (2003), this version of the Content Validity Index (CVI) is the most widely used index of content validity in the extant literature. The CVI represents, for each item or element, the proportion of subject matter experts rating it as highly content relevant or representative of the domain of interest. Thus, the CVI is a proportion agreement index based on the proportion of raters giving the item the highest ratings of 2 or 3 on our 4-point Likert-type scale (Lynn, 1986; Waltz & Bausell, 1981; Wynd, Schmidt, & Schaefer, 2003). As seen in Table 2, only seven CVI values out of 26 on the CAPS instrument reached the required minimum agreement of .83. These include the relevance of the administration instructions, the traumatic event questions, and the individual items, as well as the ability of the measure to assess PTSD according to the DSM-IV's definition (see Table 2).
In contrast, three areas of the CAPS instrument were rated highly by our experts for their clarity: 1) the administration instructions, 2) the associated features, and 3) the Life Events Checklist (see Table 2). Each received high ratings for the clarity of its wording and the ease of understanding what was being asked.

Interrater Agreement Analysis

Interrater agreement amongst the ratings was evaluated using the Average Deviation index (Burke et al., 1999). The Average Deviation (AD) index represents the average absolute deviation from the mean of each of the judges' ratings on a particular target. Lower AD scores represent higher interrater agreement; in this study, AD scores lower than .66 were considered acceptable levels of interrater agreement. As seen in Table 2, the AD values for the traumatic event questions, the Life Events Checklist (LEC), and the "ability of the CAPS to measure PTSD according to the DSM-IV" question demonstrate low levels of disagreement. The administration instructions and the time-sampling parameter received borderline ratings of .67, just above the cutoff for being considered strong evidence of content validity. In the representativeness category, the CAPS items and the Life Events Checklist both demonstrated high levels of interrater agreement. Interestingly, the evidence for the representativeness of the CAPS items was not supported by the data from any of the other three indices. In contrast, several areas of the CAPS demonstrated high levels of interrater agreement for clarity. In particular, the administration instructions, the format, the individual items, the associated features, the summary sheet, and the Life Events Checklist were all rated as easy to understand (see Table 2). Overall, in examining the various ratings of the CAPS elements by our expert judges, several interesting findings stand out. First, the traumatic event questions received high content validity ratings for their relevance, as indicated by both the CVR and CVI indices. Second, the administration instructions received high content validity ratings for their relevance and clarity as indicated by the CVI, but not the CVR. Third, the individual items on the CAPS demonstrated strong content-related validity evidence for relevance by the CVI, but not the CVR. Fourth, both the Life Events Checklist and the Associated Features received high ratings for clarity, but not for representativeness or relevance. Finally, the CAPS was rated as meeting its overall objective to measure PTSD according to the current DSM-IV definition by the CVI, but not the CVR. In addition, there were low levels of disagreement about the relevance ratings for the Traumatic Event questions, the LEC questions, and the "Adequately Met Objective" questions. The SMEs agreed that the content of the traumatic event questions and the "Adequately Met Objective" questions was relevant, and that the LEC questions almost met the minimum agreement for relevance. In addition, there were low levels of disagreement in the representativeness ratings for the individual items and the LEC; both were rated as near the threshold for representativeness of the entire content domain. Finally, there were low levels of disagreement in the clarity ratings for the format, the individual items, the associated features, the summary sheet, and the LEC.
Both the Associated Features and the LEC were rated as very clear, and the Individual Items, the format, and the Summary Sheet were rated just below the threshold for clarity. In conclusion, the global ratings, summary sheet, and format of the CAPS received the lowest overall content validity ratings in this study.

Qualitative Questions' Analysis

A second, very important part of this study was to survey the subject matter experts in PTSD for their opinions of the adequacy of the DSM-IV's definitions of PTSD and trauma. Noteworthy in their responses was that none of the SMEs rated the DSM-IV's definition of PTSD as "Very Adequate". Only two of the six SMEs said the DSM-IV's definition of PTSD was "Mostly Adequate", and twice as many rated it as only "Somewhat" adequate. Each expert agreed that vital inherent characteristics of PTSD are missing from the current DSM-IV-TR definition. In fact, two themes emerged strongly from the experts' comments. The first theme was that "dissociative symptoms" are an inherent part of PTSD, not just an "associated feature", in the clients they see. The second theme was that "interpersonal trauma" or "relational trauma" should also be included among the important traumatic events that can lead to the development of PTSD. Moreover, experts also said that one's "sense of self" and "identity" are often affected in those with PTSD, yet this is currently unrecognized in the DSM-IV-TR; one expert described the effect as "a shattering of assumptions about self". In addition, the theme of "one's sense of safety" being "damaged" in those with PTSD emerged as a very real consequence of trauma. Most experts also agreed that "abandonment" and the various forms of "abuse" should be included in our current definitions of PTSD and trauma as well. Experts were also asked to provide an alternate definition of PTSD based on their professional clinical experience. For those who did provide an alternate definition, the results were as follows. PTSD is:
1) the development of characteristic symptoms following an event experienced as extremely traumatic, often, but not always, involving threatened death or serious injury to self or others. The traumatic experience may also result from a shattering of assumptions about self or the world that is so severe as to affect daily functioning, but may result from experiences in which there was no physical threat to self or others.
2) a variety of symptoms that may present following an event (as outlined in the DSM-IV definition) or following abuse, neglect, or abandonment of different types.
3) (as outlined in the DSM-IV definition) but adding the dissociative criteria currently reserved for acute stress disorder, and also adding relational traumas in childhood.
Experts were also asked to provide alternate definitions of trauma, which were:
1) any threat to the organism that overwhelms its capacity for processing/functioning adequately.
2) something that shatters our sense of safety in the world, violates our assumptions about the world and our place in it, and remains unprocessed by us.
3) events which cause one to feel powerless, overwhelmed, and confused and which are negative and unexpected. This would include relational traumas such as sexual, emotional, and physical abuse, along with neglect and abandonment. It is especially important to note that the younger the age, the less severe the stressor needs to be to traumatize
(e.g., a 3-year-old lost for three hours in a mall; abandonment, criticism, rejection, betrayal, humiliation, embarrassment, ritual abuse, and torture).
4) any event that threatens or damages one's sense of self, safety (e.g., physical, emotional, psychological), or ability to live life fully.
Others, who did not provide an actual definition of PTSD or trauma, commented that our current understanding of these concepts is "not broad enough" or "not inclusive enough" of all the characteristic symptoms and manifestations that they have seen in their patients. Recommendations were made for future studies to explore the full extent of the symptomology associated with the disorder and to develop other categories of syndromes that account for all the complex responses that may occur in the aftermath of trauma.

Chapter Five

Discussion

The purpose of this study was to gather content-related validity evidence for the latest version of the CAPS (dxsx; Appendix E). Subject matter experts (SMEs) in the field of post-traumatic stress were surveyed in order to assess the relevance, representativeness, and clarity of all elements of the CAPS instrument. Elements such as the administration instructions, traumatic event questions, rating scale, format, items, and other inclusive features (see Table 2) were evaluated for their relevance, representativeness, and clarity, and some surprising results were found, whose possible explanations and implications are discussed below. In addition, SMEs were queried regarding their professional opinions on the adequacy of our current conceptualization of PTSD as delineated in the DSM-IV-TR (APA, 2000). Experts were asked to rate the representativeness of the current definition of PTSD based on their many years of academic or clinical experience working with people with the disorder. Moreover, they were asked to comment on what they felt was missing from the current definition and, if possible, to provide an alternate definition that was more representative of the full range of symptomology that individuals may experience with PTSD. Several important points stand out from this content validation study of the CAPS-dxsx. First, the CAPS-dxsx received many poor relevance ratings: for its format, rating scales, global ratings, associated features, time-sampling parameter, scoring summary chart, and summary sheet. In fact, the global ratings and the summary sheet received the lowest relevance ratings of all of the sections. However, SMEs rated the relevance of the traumatic event questions, individual items, associated features, Life Events Checklist, and overall objective as meeting acceptable levels of strong content validity. Second, and most strikingly, the CAPS did not meet minimum acceptable levels of content-related validity evidence on any of the subscales surveyed for representativeness (Table 2). Neither the traumatic event questions subscale, the individual items subscale, the associated features subscale, the summary sheet subscale, the Life Events Checklist subscale, nor the overall objective met the minimum acceptability level for representativeness. These poor representativeness ratings support previous research by numerous experts who argue that our current definition of PTSD is very unrepresentative of the full range of symptom sequelae involved in PTSD (Herman, 1992a; Horowitz, 2001; Lasiuk & Hegadoren, 2006; Roth, Newman, Pelcovitz, van der Kolk, & Mandel, 1997; Wilson, 2004).
Third, the global ratings and scoring summary chart received the lowest clarity ratings of all areas of the CAPS in this study. However, the CAPS did receive high clarity ratings for the administration instructions, associated features, and Life Events Checklist, and moderate clarity ratings for its individual items, format, and summary sheet. One of the most noteworthy findings for the field of post-traumatic stress is that none of this study's subject matter experts rated the current definition of PTSD in the DSM-IV as very adequate. Most rated it as only "Somewhat Adequate", the third lowest rating on the scale used in this study. One possible explanation for this poor rating is the growing consensus in the field of post-traumatic stress that our current conceptualization of PTSD is inadequate and unrepresentative of the entire content domain of post-traumatic stress responses that can occur following exposure to a traumatic event (Herman, 1992a; Horowitz, 2001; Lasiuk & Hegadoren, 2006; Roth, Newman, Pelcovitz, van der Kolk, & Mandel, 1997). If this premise is true, one would expect subject matter experts to rate the CAPS low for representativeness, since it was designed to measure PTSD according to the DSM-IV's definition. Concordantly, the representativeness ratings in this study were the lowest ratings overall and seem to support the above conclusions. Further support was demonstrated in the qualitative data in part two of the study, which asked subject matter experts for their opinions regarding the current DSM-IV definitions of PTSD and trauma. In their qualitative responses, some of the experts stated that our current conceptualization is "not broad enough" or "not inclusive enough" to encompass all of the symptoms that occur following exposure to a traumatic event. These conclusions support previous research by Roth et al. (1997) criticizing the current definition as too narrow. In response, Roth and colleagues propose a new, broader category, which they call Post Traumatic Stress Responses (PTSR), building on previous work by Horowitz (2001), to account for all of the possible post-traumatic reactions that can occur after trauma. Another of the strongest themes to emerge from the qualitative data in this study was that, according to our subject matter experts, "dissociative symptoms" should be considered an inherent part of PTSD and not just an "associated feature". In addition, our experts said that "interpersonal trauma" should be highlighted as a strong precipitating event, and that "a shattering of assumptions about self" is a pervasive consequence of experiencing trauma. The theme of dissociative symptoms as inherent to PTSD supports previous research by Briere (1997), who found that individuals with PTSD who had histories of chronic sexual abuse manifest dissociative symptoms in adulthood. In addition, the theme of interpersonal trauma supports previous research by Herman (1992a), who linked interpersonal trauma to the manifestation of complex PTSD symptomology. Other qualitative themes noted that issues of abandonment and "one's sense of safety" were also prevalent in those with PTSD and needed to be addressed in new definitions of PTSD. This study's findings regarding shattered assumptions of the self support previous research by Janoff-Bulman (1985), who describes in detail the shattering effect that trauma has on an individual's beliefs and their sense of self and safety.
Overall, this study found that the CAPS-dxsx has relevant and clear administration instructions, relevant traumatic event questions, relevant individual items, and clarity amongst its associated features and the items on the Life Events Checklist (though the latter did not meet the minimum required ratings for relevance or representativeness). Even more interesting, however, is that the CAPS-dxsx did not meet the minimum requirements for representativeness in any of its categories. These findings support previous research stating that our current conceptualization of PTSD is inadequate and should be revised (Herman, 1992a; Roth, Newman, Pelcovitz, van der Kolk, & Mandel, 1997; van der Kolk, 1997).

Conclusions

Several conclusions can be drawn from this study regarding the overall topic of PTSD and, in particular, the CAPS-dxsx instrument. First, as previously stated, many experts in the field of post traumatic stress today consider our current conceptualization of PTSD in the DSM-IV-TR (2000) to be inadequate and unrepresentative of the entire content domain (Herman, 1992a; Horowitz, 2001; Lasiuk & Hegadoren, 2006; Roth, Newman, Pelcovitz, van der Kolk, & Mandel, 1997; Wilson, 2004). Second, and consistent with this view, many of our content experts rated the CAPS-dxsx as unrepresentative of the entire content domain of PTSD, since it was designed to measure PTSD according to the DSM-IV definition. Third, some experts consider PTSD but one of several possible post traumatic stress responses that can occur after a traumatic event (Herman, 1992a; Horowitz, 2001; Lasiuk & Hegadoren, 2006; Roth et al., 1997). Many of these experts assert that constructs such as PTSD, complex PTSD, and disorders of extreme stress not otherwise specified (DESNOS) should be subsumed under an overarching category called PTSR (Post Traumatic Stress Responses; Roth et al., 1997; Wilson, 2004). Fourth, in this study, the CAPS-dxsx did not provide strong content-related validity evidence except in the areas of (1) the relevance, representativeness, and clarity of the items on the Life Events Checklist; (2) the relevance of the traumatic events questions at the beginning of the measure; (3) the relevance and clarity of the administration instructions; and (4) the relevance, representativeness, and clarity of some of the individual items. Fifth, the CAPS-dxsx received poor ratings for content representativeness in almost every area (with the exception of the ADM values for the individual items and the LEC); many experts felt that it did not adequately cover the entire content domain of PTSD. The instrument's authors should take note and consider revising the instrument in light of new developments in the conceptualization of PTSD. Sixth, the Rating Scales, Format, Global Ratings, Scoring Summary Chart, Associated Features, Summary Sheet, and Time-sampling Parameters received poor ratings in all three areas (i.e., relevance, representativeness, and clarity), with the exception of the clarity of the Associated Features. The developers of the CAPS may want to revise these areas to make them more relevant to and representative of PTSD. According to Haynes, Richard, and Kubany (1995), every element of an instrument counts in a content validation study, and the subject matter experts in this study judged these elements to be less than adequate.

Strengths of this Study

There are four major strengths of this study that distinguish it from previous research in the field of post traumatic stress.
First, to our knowledge, this is the only content validation study that has been conducted on the CAPS-dxsx. The CAPS' internal consistency and its convergent and discriminant validity have been investigated, but as far as is known, there are no existing studies examining its content validity. For an instrument considered the "gold standard" in the industry, this gap in the validity evidence needed to be rectified. Another strength of this study was the ability to draw upon subject matter experts with extensive experience in the field of post-traumatic stress. The average amount of experience among the subject matter experts was 13 years, with the longest being 20 years. This represents a wealth of expert knowledge to draw upon. The experts had extensive experience working with people with PTSD and had seen many varied forms of symptom sequelae over the course of their careers. Thus, they were able to speak to the manifestations of symptoms they had witnessed, rather than relying solely on book knowledge, which may be antiquated. A third strength of this study was its combination of quantitative and qualitative design, which allowed us to discover emerging themes from experts in the field of PTSD. The quantitative data provided numerical representations of the judges' ratings of the CAPS' various components, but the qualitative data added layers of rich description for relevant themes emerging in the field. Without the qualitative data, we would not have known what the experts thought was missing from current conceptualizations of PTSD. Instead, we can now say with confidence that numerous experts, both in the extant literature and in this study, believe three main things. First, they argue that "dissociative" symptoms should be included as an inherent part of our current understanding of PTSD. Second, they believe that interpersonal trauma needs to be recognized as playing a more important role in the etiology of PTSD than it currently does. Third, they believe that other categories, such as complex PTSD or PTSR, should be accepted into the nomenclature. Though these themes appear to some extent in the extant literature through various proponents, they were strongly endorsed by our subject matter experts, who had directly witnessed them in the field. A final strength of this study is its thoroughness in examining all elements of the CAPS-dxsx in the content validation process. According to Haynes, Richard, and Kubany (1995), every element of an instrument, from the administration instructions to the scoring rubrics, should be evaluated in a thorough content validation study. This study sought to address every area that could affect the instrument's scores, and thereby the content validity ratings. As such, we gained valuable insights into what current clinicians and practitioners in the field consider noteworthy regarding these elements and which ones need to be improved in the future.

Limitations of this Study

Four inherent limitations of this study became apparent. One limitation is that the subject matter experts' ratings are subjective; some therefore consider them a threat to validity. However, others may argue that the ratings come from experienced experts in the field who are trained to be objective in their judgments.
A second limitation is that we cannot definitively say what the sources of error in the ratings may be. Since we do not have the SMEs' feedback from a consensus-building meeting, we can only speculate. Possible sources of error include intraindividual factors (e.g., fatigue or lack of motivation), ambiguity surrounding the rating task, problematic administration procedures, or scoring errors (Goodwin, 2001). Other sources may be differing philosophical views, differential leniency or strictness tendencies, or subjective bias on the part of the raters. One thing is certain, however: more study is needed to investigate the relevant issues surrounding Post Traumatic Stress Disorder, such as its inherent characteristics versus its associated features.

A third limitation was the considerable variability in the experts' ratings, which may indicate only moderate reliability overall. Moderate levels of reliability limit the strength of the inferential claims that can be made from this study. Moderate reliability may result from problems of measurement or problems of sampling (Zumbo, in press). Problems of measurement in this study may stem from construct underrepresentation or construct underidentification (Messick, 1995; Zumbo, in press). The construct of PTSD delineated in the DSM-IV-TR (APA, 2000) has been implicated in this study as not representative of the entire content domain related to post traumatic stress. As a result, underidentification of the construct of PTSD limits the representativeness ratings of the CAPS in this study and attenuates the overall content validity ratings. Problems of measurement such as construct underidentification represent special cases of a missing data problem, whereby the data one has are not a representative subset of the entire content domain related to PTSD (Zumbo, in press).

A fourth limitation of this study relates to the dichotomization of categories on the questionnaire's rating scale. A four-point rating scale was used to survey the subject matter experts' responses; however, the statistics used in the analysis required collapsing the four categories into two dichotomous categories, which results in some loss of information that may have affected the results. Future studies may want to use statistics that do not require collapsing categories, thereby retaining more variability in the data.

In addition to problems of measurement, problems of sampling may also be responsible for some of the variability in the data. Problems of sampling in this study may be related to the differing theoretical orientations of our six subject matter experts, which may account for a sizable portion of the discrepancies in the ratings of the relevance and representativeness of the CAPS. Underlying theoretical differences may cause some judges to rate items more stringently or leniently than others. A judge with a narrowly defined theoretical orientation may consider the current definition of PTSD quite relevant and representative of the entire content domain and would likely rate the CAPS elements highly, whereas a judge with a broader theoretical orientation may consider the scope of the CAPS severely truncated and would rate it poorly for representativeness.
Given the low representativeness ratings in this study, this explanation is a plausible account of some of the discrepancies among the ratings. That underlying theoretical differences affect ratings is an explanatory model that accords with previous research by Yun and Ulrich (2002), who state that differing theoretical orientations may be the basis for much of the variability in the data, yet can also be a source of more comprehensive findings. Differing theoretical orientations may fill gaps that exist in any one theory's knowledge and may lead to a more comprehensive understanding of a construct or a disorder. By combining knowledge from several perspectives, a deeper, richer understanding may emerge that expands our current conceptualization of the construct. This may be especially true for PTSD, where previous research and this study's findings indicate that conceptual gaps exist in our current definition of the disorder.

Ultimately, problems of measurement and problems of sampling weaken the inferences that can be made from this study. According to Zumbo (in press), weak inferential strength and limited generalizability of inferences correspond to Initial Calibrative Inference, the lowest level of inference. Initial Calibrative Inference limits one to simple calibration statements about the items or behaviours involved in the current study and means that the data are likely not exchangeable with the greater population (Zumbo, in press). For example, the particular mix of theoretical orientations in this study's sample may not be replicated in another random sample from the same population; as a result, another study may produce different findings.

Future Research

Future research should assess whether problems of measurement or problems of sampling have biased the findings of the present study. Future studies should explore whether another sample of subject matter experts would contradict or replicate the low reliability and content validity ratings found here for the CAPS-dxsx. In addition, future studies may want to replicate this study with a larger sample size to see whether that affects the results. If other studies do replicate these findings, the authors of the CAPS-dxsx may want to revise the instrument to make it more representative of the entire content domain of PTSD symptomology. Future studies should explore the full extent of the characteristic symptom sequelae that may manifest in trauma clients, as well as all of the relevant precipitating events and after-effects associated with the various types of trauma that can occur (e.g., interpersonal, military, natural disasters). If, in fact, our current conceptualization of PTSD according to the DSM-IV is inadequate, as this study's findings suggest, that has important implications for practitioners who use this definition as a template to assess PTSD: inaccurate diagnoses could occur, resulting in false negative (Type II) errors in which individuals who have PTSD are misdiagnosed because their presenting symptoms are not recognized as fitting the current DSM-IV definition. Future studies should strive to define the domain accurately according to all of the symptoms that should be considered inherent. Both the quantitative and the qualitative data in this study support recommendations for an updated and more comprehensive understanding of trauma and post traumatic stress responses.
As a result, future investigations should explore the full extent of these content domains; such studies have the potential to yield valuable knowledge and to advance our current understanding of this important topic area. Moreover, since the CAPS is held to be the current "gold standard" clinical interview for PTSD, this study shows that more research is needed to examine whether the CAPS-dxsx can consistently provide strong content-related validity evidence, or whether it needs to be revised in light of new developments in our understanding of the construct of PTSD. As Haynes, Richard, and Kubany (1995) state, as previous conceptualizations become outdated, new content validation studies are necessary to ensure that the entire content domain of interest is adequately surveyed. In the case of PTSD and PTSR, new research is urgently needed to define the entire domain relevant to post traumatic stress and to identify all of the relevant symptom sequelae that may manifest in clients afflicted by this debilitating disorder. Then, and only then, can we hope to find more effective solutions that may bring much-needed healing to those who are traumatized and afflicted by this disorder.

References

Aiken, L.R. (1980). Content validity and reliability of single items or questionnaires. Educational and Psychological Measurement, 40, 955-959.

Anastasi, A. (1988). Psychological testing (6th ed.). New York, NY: Macmillan.

Angoff, W.H. (1988). Validity: An evolving concept. In H. Wainer & H. Braun (Eds.), Test validity (pp. 19-32). Hillsdale, NJ: Erlbaum.

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1985; 1999). Standards for educational and psychological testing. Washington, DC: American Psychological Association.

American Psychological Association. (1952). Technical recommendations for psychological tests and diagnostic techniques: A preliminary proposal. American Psychologist, 7, 461-465.

American Psychological Association. (1954). Technical recommendations for psychological tests and diagnostic techniques. Psychological Bulletin Supplement, 51(2, Pt. 2), 1-38.

American Psychological Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed., pp. 426-429). Washington, DC: Author.

American Psychological Association. (2000). Diagnostic and statistical manual of mental disorders (4th ed., text revision). Washington, DC: Author.

Beck, C.T., & Gable, R.K. (2001). Ensuring content validity: An illustration of the process. Journal of Nursing Measurement, 9(2), 201-215.

Blake, D.D., Weathers, F.W., Nagy, L.M., Kaloupek, D.G., Klauminzer, G., Charney, D.S., & Keane, T.M. (1990). A clinician rating scale for assessing current and lifetime PTSD: The CAPS-1. Behavior Therapist, 13, 187-188.

Blake, D.D., Weathers, F.W., Nagy, L.M., Kaloupek, D.G., Gusman, F.D., Charney, D.S., & Keane, T.M. (1995). The development of a clinician-administered PTSD scale. Journal of Traumatic Stress, 8(1), 75-90.

Briere, J. (1997). Psychological assessment of adult posttraumatic states. Washington, DC: APA.

Burke, M.J., Finkelstein, L.M., & Dusig, M.S. (1999). On average deviation indices for estimating interrater agreement. Organizational Research Methods, 2, 49-68. Retrieved from Ebsco Database on September 21, 2006.

Burke, M.J., & Dunlap, W.P. (2003). Estimating interrater agreement with the average deviation index: A user's guide. Organizational Research Methods, 5(2), 129-131. Retrieved from Ebsco Database on September 21, 2006.
Burstow, B. (2005). A critique of posttraumatic stress disorder and the DSM. Journal of Humanistic Psychology, 45(4), 429-445.

Carlson, E.B. (2001). Psychometric study of a brief screen for PTSD: Assessing the impact of multiple traumatic events. Assessment, 8(4), 431-441.

Carlson, E.B., & Dutton, M.A. (2003). Assessing experiences and responses of crime victims. Journal of Traumatic Stress, 16(2), 133-148.

Crocker, L.M., Miller, M.D., & Franks, E.A. (1989). Quantitative methods for assessing the fit between test and curriculum. Applied Measurement in Education, 2(2), 179-194.

Cronbach, L.J., & Meehl, P.E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281-302.

Cureton, E.E. (1951). Validity. In E.F. Lindquist (Ed.), Educational measurement. Washington, DC: American Council on Education.

Denzin, N.K., & Lincoln, Y.S. (Eds.). (2000). Handbook of qualitative research (2nd ed.). Thousand Oaks, CA: Sage.

Franklin, C.L., Sheeran, T., & Zimmerman, M. (2002). Screening for trauma histories, posttraumatic stress disorder (PTSD), and subthreshold PTSD in psychiatric outpatients. Psychological Assessment, 14(4), 467-471.

Frueh, B.C., Cousins, V.C., Hiers, T.G., Cavenaugh, S.D., Cusack, K.J., & Santos, A.B. (2002). The need for trauma assessment and related clinical services in a state-funded mental health system. Community Mental Health Journal, 38(4), 351-356.

Garrett, H.E. (1937). Statistics in psychology and education. New York: Longmans, Green.

Goodwin, L.D. (2001). Interrater agreement and reliability. Measurement in Physical Education and Exercise Science, 5(1), 13-34.

Gregory, R.J. (2004). Psychological testing: History, principles, and applications. Boston: Pearson.

Guion, R.M. (1977). Content validity: The source of my discontent. Applied Psychological Measurement, 1, 1-10.

Gulliksen, H. (1950). Intrinsic validity. American Psychologist, 5, 511-517.

Haynes, S.N., Richard, D.C.S., & Kubany, E.S. (1995). Content validity in psychological assessment: A functional approach to concepts and methods. Psychological Assessment, 7(3), 238-247.

Hambleton, R.K. (1980). Test score validity and standard setting methods. In R.A. Berk (Ed.), Criterion-referenced measurement: The state of the art (pp. 351-379). Baltimore, MD: Johns Hopkins University Press.

Herman, J.L. (1992a). Complex PTSD: A syndrome in survivors of prolonged and repeated trauma. Journal of Traumatic Stress, 5(3), 377-392.

Herman, J.L. (1992b). Trauma and recovery. New York: Basic Books.

Horowitz, M.J. (2001). Stress response syndromes (4th ed.). New Jersey: Jason Aronson.

Hubley, A.M., & Zumbo, B.D. (1996). A dialectic on validity: Where we have been and where we are going. Journal of General Psychology, 123(3), 207-216.

Jarjoura, D., & Brennan, R.L. (1982). A variance components model for measurement procedures associated with a table of specifications. Applied Psychological Measurement, 6(2), 161-171.

Janoff-Bulman, R. (1985). The aftermath of victimization: Rebuilding shattered assumptions. In C.R. Figley (Ed.), Trauma and its wake: The study and treatment of post-traumatic stress disorder. New York: Brunner/Mazel.

Klein, S.P., & Kosecoff, J.P. (1975). Procedures and issues in the validation of criterion-referenced tests. Paper presented at the annual meeting of the National Council on Measurement in Education, Washington, DC, March 31-April 2, 1975.

Lasiuk, G.C., & Hegadoren, K.M. (2006). Posttraumatic stress disorder part II: Development of the construct within the North American psychiatric taxonomy. Perspectives in Psychiatric Care, 42(2), 72-81.

Lawshe, C.H. (1975). A quantitative approach to content validity. Personnel Psychology, 28, 563-575.

Levine, P.A. (1999). Healing trauma: Restoring the wisdom of the body. Boulder: Sounds True Publishing.

Lindquist, E.F. (Ed.). (1951). Educational measurement. Washington, DC: American Council on Education.

Litz, B.T., Penk, W.E., Gerardi, R.J., & Keane, T.M. (1990). Assessment of posttraumatic stress disorder. In P.A. Saigh (Ed.), Post-traumatic stress disorder: A behavioral approach to assessment and treatment. Boston: Allyn & Bacon.

Loevinger, J. (1957). Objective tests as instruments of psychological theory. Psychological Reports, 3(Suppl. 9), 635-694.

Lynn, M.R. (1986). Determination and quantification of content validity. Nursing Research, 35(6), 382-385.

Mayo Clinic. (2005). Department of Psychiatry and Psychology web page. Retrieved December 13, 2005, from http://mayoresearch.mayo.edu/mayo/research/psych/psychometric_resource.cfm

McKenzie, J.F., Wood, M.L., Kotecki, J.E., Clark, J.K., & Brey, R.A. (1999). Establishing content validity: Using qualitative and quantitative steps. American Journal of Health Behavior, 23(4), 311-318.

Messick, S. (1975). The standard problem: Meaning and values in measurement and evaluation. American Psychologist, 30, 955-966.

Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons' responses and performances as scientific inquiry into score meaning. American Psychologist, 50(9), 741-749.

Mosier, C.I. (1947). A critical examination of the concept of face validity. Educational and Psychological Measurement, 7, 191-205.

Overall, J.E., & Magee, K.N. (1992). Estimating individual rater reliabilities. Applied Psychological Measurement, 16(1), 77-85.

Roth, S., Newman, E., Pelcovitz, D., van der Kolk, B.A., & Mandel, F.S. (1997). Complex PTSD in victims exposed to sexual and physical abuse: Results from the DSM-IV field trial for posttraumatic stress disorder. Journal of Traumatic Stress, 10(4), 539-555. Retrieved from Ebsco Database on September 21, 2006.

Rubio, D.M., Berg-Weger, M., Tebb, S.S., Lee, E.S., & Rauch, S. (2003). Objectifying content validity: Conducting a content validity study in social work research. Social Work Research, 27(2), 94-104. Retrieved from Ebsco Database on September 21, 2006.

Rulon, P.J. (1946). On the validity of educational tests. Harvard Educational Review, 16, 290-296.

Sireci, S.G. (1998a). Gathering and analyzing content validity data. Educational Assessment, 5(4), 299-321.

Sireci, S.G. (1998b). The construct of content validity. Social Indicators Research, 45, 83-117.

Sireci, S.G., & Geisinger, K.F. (1992). Analyzing test content using cluster analysis and multidimensional scaling. Applied Psychological Measurement, 16(1), 17-31.

Sprang, G. (2004). Construct validity and method variance in the measurement of posttraumatic stress disorder: An examination of the Traumatic Experiences Inventory (TEI). Journal of Loss and Trauma, 9, 189-199.

Statistics Canada. (2002). Canadian Community Health Survey 2002. Retrieved from Ebsco Database March 20, 2005.

Stein, M.B. (2002). Taking aim at posttraumatic stress disorder: Understanding its nature and shooting down myths. The Canadian Journal of Psychiatry, 47(10), 921-922.

van der Kolk, B.A. (1997). Traumatic memories. In P.S. Appelbaum, L.A. Uyehara, & M.R. Elin (Eds.), Trauma and memory. New York: Oxford University Press.
van der Kolk, B.A. (2002). Assessment and treatment of complex PTSD. In R. Yehuda (Ed.), Treating trauma survivors with PTSD. Washington, DC: American Psychiatric Publishing.

van der Kolk, B.A., & Courtois, C.A. (2005). Editorial comments: Complex developmental trauma. Journal of Traumatic Stress, 18(5), 385-388.

van der Kolk, B.A., Pelcovitz, D., Roth, S., Mandel, F.S., McFarlane, A., & Herman, J.L. (1996). Dissociation, affect dysregulation, and somatization: The complexity of adaptation to trauma. American Journal of Psychiatry, 153, 83-93.

Veneziano, L., & Hooper, J. (1997). Research notes: A method for quantifying content validity of health-related questionnaires. American Journal of Health Behavior, 21(1), 67-70.

Waltz, C., & Bausell, R.B. (1991). Nursing research: Design, statistics, and computer analysis. Philadelphia, PA: F.A. Davis.

Weathers, F.W., Keane, T.M., & Ruscio, A.M. (1999). Psychometric properties of nine scoring rules for the Clinician-Administered Posttraumatic Stress Disorder Scale. Psychological Assessment, 11(2), 124-133.

Wilson, J.P. (2004). PTSD and complex PTSD: Symptoms, syndromes, and diagnoses. In J.P. Wilson & T.M. Keane (Eds.), Assessing psychological trauma and PTSD (2nd ed.). London: Guilford Press.

Wynd, C.A., Schmidt, B., & Schaefer, M.A. (2003). Two quantitative approaches for estimating content validity. Western Journal of Nursing Research, 25(3), 508-518.

Yun, J., & Ulrich, D. (2002). Estimating measurement validity: A tutorial. Adapted Physical Activity Quarterly, 19, 32-47.

Zumbo, B.D. (in press). Validity: Foundational issues and statistical methodology. In C.R. Rao & S. Sinharay (Eds.), Handbook of statistics, Vol. 27: Psychometrics. The Netherlands: Elsevier Science B.V.

Appendix B

THE UNIVERSITY OF BRITISH COLUMBIA
Department of Educational and Counselling Psychology, and Special Education
2125 Main Mall, Vancouver, BC, Canada V6T 1Z4
Tel: (604) 822-0242  Fax: (604) 822-3302
www.ecps.educ.ubc.ca

Informed Consent Sheet

This informed consent is required to ensure that you understand the purpose of this study, what you are being asked to do, and your rights as a participant. If you have any questions about the research or the tasks that are requested of you, please ask!

Title: Measurement of Post Traumatic Stress Disorder: A content validation study of the Clinician-Administered PTSD Scale (CAPS-dxsx)

Principal Investigator: Dr. Kadriye Ercikan, Dept. of Educational and Counselling Psychology, and Special Education at the University of British Columbia (UBC).

Co-Investigator: Lynda Thiessen, a graduate student in the Measurement, Evaluation, and Research Methodology program at UBC. This project is for Lynda's Master's thesis.

Purpose: The purpose of this study is to gather content validity evidence for the latest version of the CAPS scale, which measures Post Traumatic Stress Disorder (PTSD).

Task Requirements: You will be asked to complete a questionnaire regarding the clarity, relevance, and representativeness of the questions and instructions on the CAPS measure; and, if available, you are invited to come to a consensus-building meeting with a group of other experts on PTSD. Your participation is invaluable and will help to advance current knowledge regarding the assessment of PTSD.

Duration: There is just one session required.
The questionnaire task should take approximately 90 minutes; if applicable, the consensus-building meeting will be conducted immediately after the questionnaire is administered and will last approximately 60 minutes.

Feedback: Due to time constraints, no specific feedback will be given regarding your individual performance on the tasks in this study. If you are interested in a summary of the overall results, you may request one from the principal investigator when the research is completed.

Anonymity/Confidentiality: All information collected during this study is kept confidential. Your name will not appear on any of the data collected. Participants will not

Appendix C

CAPS Content Validation Study Instructions

Dear Participant,

On the following questionnaire you will be asked to rate the (1) relevance, (2) representativeness, and (3) clarity of the items on the latest version of the Clinician-Administered PTSD Scale (CAPS-dxsx). Your task is to rate how well the items pertain to the entire content domain for PTSD. As well, you will be asked to rate the appropriateness and ease of understanding of the instructions for administering the CAPS and of the scoring procedures. Finally, you will be asked for your professional opinion on any items that should be (1) added, (2) deleted, or (3) revised on the CAPS, and also to provide alternate definitions of PTSD and trauma based on your professional experience.

Before you begin the questionnaire, please complete the following demographic information to help us in this study.

Name: (Please print)
Institution name:
Address: City: Postal Code:
Phone #: E-mail: Fax #:
Your age: Gender:
Highest level of education: From what institution:
Type of current occupation:
Number of years of postgraduate experience working with PTSD:
Have you taken specific course work or training about PTSD? Yes / No (circle one)
What kind? Where?

Below are three examples of the questions you will be asked to rate on the following questionnaire about the content domain of PTSD. Your task is to rate each question on a 4-point scale ranging from 0 (Not at all) to 3 (Very). The purpose of this study is to assess how well the Clinician-Administered PTSD Scale (CAPS) measures the content domain of PTSD. The CAPS instrument was designed to assess PTSD according to the DSM-IV's definition and criteria. As an expert on PTSD, we are interested first of all in your professional opinion of the DSM-IV's definition. On the next page you will find a copy of the current DSM-IV definition of PTSD; please take a moment now to read it over and then answer the following three questions (rated 0 = Not at all, 1 = Somewhat, 2 = Mostly, 3 = Very):

a. How adequate is the DSM-IV's definition of PTSD? (see Appendix)
b. How representative is the DSM-IV's definition of the entire content domain of PTSD symptoms?
c. In your professional opinion, how adequate is the DSM-IV's definition of trauma (as specified by Criterion A 1-2)? (see below)

Criterion A. The person has been exposed to a traumatic event in which both of the following were present:
1. The person experienced, witnessed, or was confronted with an event or events that involved actual or threatened death or serious injury, or a threat to the physical integrity of self or others.
2. The person's response involved intense fear, helplessness, or horror.
(Note: in children this may be expressed as disorganized or agitated behaviour).

d. Based on your experience with PTSD, please provide an alternate definition that you feel, in your professional opinion, best describes PTSD: (if more space is needed, use the back of the page)

e. In your professional opinion, please provide an alternate definition of trauma:

Date: Name:

CAPS Content Validation Questionnaire

On the following pages you will find questions relating to the (1) relevance, (2) representativeness, and (3) clarity of an instrument designed to measure Post Traumatic Stress Disorder (PTSD). The instrument is entitled "The Clinician-Administered Post Traumatic Stress Scale" (CAPS). Based on your knowledge and experience working with PTSD over the years, we want your professional opinion regarding how well the items sample the entire content domain pertaining to PTSD and how appropriate the instructions and scoring procedures are. You will be asked to rate each item or element on a 4-point Likert-type scale from 0 (Not at all) to 3 (Very) by circling the response that best reflects your professional opinion regarding PTSD. If you rate any item as 0 or 1, we would like to know why, so please write the question number and your reasoning below the item or at the bottom (or back) of the page. This is very important information for this study, so please be sure to include your comments wherever you can fit them in. Please return this completed questionnaire in the self-addressed envelope included in your package within 2 weeks of receiving it. The deadline for collecting the data for this project is ____, so we thank you in advance for your prompt reply.

Below are the first questions of the survey, which pertain to the administration instructions contained in the instruction manual for clinicians. Please circle the number on the rating scale beside each question that best reflects your professional opinion. (All questions that follow are rated 0 = Not at all, 1 = Somewhat, 2 = Mostly, 3 = Very.)

Administration Procedure Instructions

Instruction #1: Using the example narrative on page 2 of the CAPS interview, the clinician should introduce the fact that they will be asking the patient about such events. [Example narrative from page 2: "I'm going to be asking you about some difficult or stressful things that sometimes happen to people ... As we go along, if you find yourself becoming upset, let me know and we can slow down and talk about it. Also, if you have any questions or you don't understand something, please let me know. Do you have any questions before we start?"]

1.a) How easy to understand are these instructions?
2.b) How appropriate is the narrative for persons who potentially may have PTSD?

On the preceding page to the left (below the box for Criterion A), please read the two paragraphs of instructions that begin "I'm going to ..." and "Some of these ..."

1. How easy to understand are these instructions?
2. How appropriate are these instructions for persons who potentially may have PTSD?

Read the next paragraph that begins "ADMINISTER CHECKLIST"

3. How clear are these instructions to test administrators (or clinicians)?
4. In the box on the left entitled "Event 1," how relevant is the question "What happened?"
5. How relevant are the questions that follow in the brackets?
6. How relevant is the question "How did you respond emotionally?"
7. How relevant are the questions that follow in the brackets ("Were you very ...?")

Look at the page to the left, in the box on the right-hand side.

8. How relevant is the question "Describe (e.g., event type, victim, perpetrator, age, frequency)" for people who may have PTSD?
9. How representative are the items in the brackets of all the content that patients should describe?
10. Below that question is a rating scale for clinicians. How appropriate is that rating scale?
11. How appropriate is the format of these boxes on CAPS pages 2 and 3?
12. How easy to understand is this format?

At the bottom of the page are two paragraphs of instructions to be read to participants.

13. How easy to understand are these instructions?
14. How appropriate are these instructions for people who potentially may have PTSD?

On CAPS page 4 to the left, read the set of questions in the box entitled Frequency that begins with "Have you ever..."

15. How relevant are these questions?
16. How representative are these questions of the entire content domain for criterion B-1?
17. How easy to understand are these questions?
18. How relevant are the Intensity questions in the box beside?
19. How representative are these questions of the entire content domain for intensity questions to assess criterion B-1?
20. How easy to understand are these intensity questions?
21. How relevant are the questions in the bottom Frequency box?
22. How representative are these questions of the entire content domain for criterion B-2?
23. How easy to understand are these questions?
24. How relevant are the Intensity questions in the box beside?
25. How representative are these questions of the entire content domain for intensity questions to assess criterion B-2?
26. How easy to understand are these intensity questions?

On CAPS page 5 to the left, read the set of questions in the box entitled Frequency that begins with "Have you ever..."

27. How relevant are these questions?
28. How representative are these questions of the entire content domain for criterion B-3?
29. How easy to understand are these questions?
30. How relevant are the Intensity questions in the box beside?
31. How representative are these questions of the entire content domain for intensity questions to assess criterion B-3?
32. How easy to understand are these intensity questions?
33. How relevant are the questions in the bottom Frequency box?
34. How representative are these questions of the entire content domain for criterion B-4?
35. How easy to understand are these questions?
36. How relevant are the Intensity questions in the box beside?
37. How representative are these questions of the entire content domain for intensity questions to assess criterion B-4?
38. How easy to understand are these intensity questions?

On CAPS page 6 to the left, read the set of questions in the box entitled Frequency that begins with "Have you ever..."

39. How relevant are these questions?
40. How representative are these questions of the entire content domain for criterion B-5?
41. How easy to understand are these questions?
42. How relevant are the Intensity questions in the box beside?
43. How representative are these questions of the entire content domain for intensity questions to assess criterion B-5?
44. How easy to understand are these intensity questions?
45. How relevant are the questions in the bottom Frequency box?
46. How representative are these questions of the entire content domain for criterion C-1?
47. How easy to understand are these questions?
48. How relevant are the Intensity questions in the box beside?
49. How representative are these questions of the entire content domain for intensity questions to assess criterion C-1?
50. How easy to understand are these intensity questions?

On CAPS page 7 to the left, read the set of questions in the box entitled Frequency that begins with "Have you ever..."

51. How relevant are these questions?
52. How representative are these questions of the entire content domain for criterion C-2?
53. How easy to understand are these questions?
54. How relevant are the Intensity questions in the box beside?
55. How representative are these questions of the entire content domain for intensity questions to assess criterion C-2?
56. How easy to understand are these intensity questions?
57. How relevant are the questions in the bottom Frequency box?
58. How representative are these questions of the entire content domain for criterion C-3?
59. How easy to understand are these questions?
60. How relevant are the Intensity questions in the box beside?
61. How representative are these questions of the entire content domain for intensity questions to assess criterion C-3?
62. How easy to understand are these intensity questions?

On CAPS page 8 to the left, read the set of questions in the box entitled Frequency that begins with "Have you been..."

63. How relevant are these questions?
64. How representative are these questions of the entire content domain for criterion C-4?
65. How easy to understand are these questions?
66. How relevant are the Intensity questions in the box beside?
67. How representative are these questions of the entire content domain for intensity questions to assess criterion C-4?
68. How easy to understand are these intensity questions?
69. How relevant are the questions in the bottom Frequency box?
70. How representative are these questions of the entire content domain for criterion C-5?
71. How easy to understand are these questions?
72. How relevant are the Intensity questions in the box beside?
73. How representative are these questions of the entire content domain for intensity questions to assess criterion C-5?
74. How easy to understand are these intensity questions?

On CAPS page 9 to the left, read the set of questions in the box entitled Frequency that begins with "Have there been..."

75. How relevant are these questions?
76. How representative are these questions of the entire content domain for criterion C-6?
77. How easy to understand are these questions?
78. How relevant are the Intensity questions in the box beside?
79. How representative are these questions of the entire content domain for intensity questions to assess criterion C-6?
80. How easy to understand are these intensity questions?
81. How relevant are the questions in the bottom Frequency box?
82. How representative are these questions of the entire content domain for criterion C-7?
83. How easy to understand are these questions?
84. How relevant are the Intensity questions in the box beside?
85. How representative are these questions of the entire content domain for intensity questions to assess criterion C-7?
86. How easy to understand are these intensity questions?

On CAPS page 10 to the left, read the set of questions in the box entitled Frequency that begins with "Have you had..."

87. How relevant are these questions?
88. How representative are these questions of the entire content domain for criterion D-1?
89. How easy to understand are these questions?
90. How relevant are the Intensity questions in the box beside?
91. How representative are these questions of the entire content domain for intensity questions to assess criterion D-1?
92. How easy to understand are these intensity questions?
93. How relevant are the questions in the bottom Frequency box?
94. How representative are these questions of the entire content domain for criterion D-2?
95. How easy to understand are these questions?
96. How relevant are the Intensity questions in the box beside?
97. How representative are these questions of the entire content domain for intensity questions to assess criterion D-2?
98. How easy to understand are these intensity questions?

On CAPS page 11 to the left, read the set of questions in the box entitled Frequency that begins with "Have you found..."

99. How relevant are these questions?
100. How representative are these questions of the entire content domain for criterion D-3?
101. How easy to understand are these questions?
102. How relevant are the Intensity questions in the box beside?
103. How representative are these questions of the entire content domain for intensity questions to assess criterion D-3?
104. How easy to understand are these intensity questions?
105. How relevant are the questions in the bottom Frequency box?
106. How representative are these questions of the entire content domain for criterion D-4?
107. How easy to understand are these questions?
108. How relevant are the Intensity questions in the box beside?
109. How representative are these questions of the entire content domain for intensity questions to assess criterion D-4?
110. How easy to understand are these intensity questions?

On CAPS page 12 to the left, read the set of questions in the box entitled Frequency that begins with "Have you had..."

111. How relevant are these questions?
112. How representative are these questions of the entire content domain for criterion D-5?
113. How easy to understand are these questions?
114. How relevant are the Intensity questions in the box beside?
115. How representative are these questions of the entire content domain for intensity questions to assess criterion D-5?
116. How easy to understand are these intensity questions?
117. How relevant is question 18, which begins "When did you...?"
118. How easy to understand is this question?
119. How relevant is question 19, which begins "How long...?"
120. How easy to understand is this question?
121. How relevant is question 20?
122. How easy to understand is this question?

On CAPS page 13 to the left, read the set of questions in the box for item number 22, which begins with "Have these..."

123. How relevant are these questions to PTSD?
124. How easy to understand are these questions?
125. Read the next questions for item number 22: How relevant are these questions to PTSD?
126. How easy to understand are these questions?

On CAPS page 14 to the left, read the set of instructions in the box for item number 23.

127. How appropriate is this global validity rating for PTSD?
128. How easy to understand are these instructions?
129. Read the next questions for item number 24: How appropriate is this global severity rating for PTSD?
130. How easy to understand are these instructions?
131. Read the next questions for item number 25: How appropriate is this global severity rating for PTSD?
132. How easy to understand are these instructions?

On CAPS page 15 to the left, please read the scoring summary chart under the heading "Current PTSD Symptoms."

133. How clear is this scoring chart for understanding what you are supposed to do?
134. How appropriate is this type of scoring chart for assessing PTSD?
135. How clear are the instructions that begin "IF CURRENT CRITERIA ARE MET..." as to what you should do next (where you are instructed to go)?
136. How clear are the instructions that begin "IF CURRENT CRITERIA ARE NOT MET..."?
137. How clear are the next instructions, which begin "Since the (EVENT)...", as to whom they are intended for (or what is to be done)?
138. How clear is the chart at the bottom, under the heading "Lifetime PTSD Symptoms," as to when it should be used?
139. How appropriate is this type of chart for assessing PTSD symptoms?
140. How easy to use is this type of chart?

On CAPS page 16 to the left,

141. How relevant is assessing (#26) "guilt over acts of commission or omission" to the content domain of PTSD?
142. How representative are the questions in the Frequency box to the content domain of PTSD?
143. How clear are the questions in the Frequency box?
144. How relevant are the questions in the Intensity box (on the right) for assessing the content domain of PTSD?
145. How representative are the questions in the Intensity box to the content domain of PTSD?
146. How clear are the questions in the Intensity box?
147. How relevant is assessing (#27) "survivor guilt" to the content domain of PTSD?
148. How representative are the questions in the Frequency box under "survivor guilt" to the content domain of PTSD?
149. How clear are the questions in the Frequency box?
150. How relevant are the questions in the Intensity box (on the right) for assessing the content domain of PTSD?
151. How representative are the questions in the Intensity box to the content domain of PTSD?
152. How clear are the questions in the Intensity box?

On CAPS page 17 to the left,

153. How relevant is assessing (#28) "a reduction in awareness of his or her surroundings" to the content domain of PTSD?
154. How representative are the questions in the Frequency box to the content domain of PTSD?
155. How clear are the questions in the Frequency box?
156. How relevant are the questions in the Intensity box (on the right) for assessing the content domain of PTSD?
157. How representative are the questions in the Intensity box to the content domain of PTSD?
158. How clear are the questions in the Intensity box?
159. How relevant is assessing (#29) "derealization" to the content domain of PTSD?
160. How representative are the questions in the Frequency box under "derealization" to the content domain of PTSD?
161. How clear are the questions in the Frequency box?
162. How relevant are the questions in the Intensity box (on the right) for assessing the content domain of PTSD?
163. How representative are the questions in the Intensity box to the content domain of PTSD?
164. How clear are the questions in the Intensity box?

On CAPS page 18 to the left,

165. How relevant is assessing (#30) "depersonalization" to the content domain of PTSD?
166. How representative are the questions in the Frequency box to the content domain of PTSD?
167. How clear are the questions in the Frequency box?
168. How relevant are the questions in the Intensity box (on the right) for assessing the content domain of PTSD?
169. How representative are the questions in the Intensity box to the content domain of PTSD?
170. How clear are the questions in the Intensity box?
171. Overall, how appropriate is the format of the CAPS questions (e.g., title above, Frequency box, Intensity box, usually two per page) for use by clinicians or practitioners for assessing PTSD?
172. How easy to use is this format?

173. How clear is the format of the "CAPS SUMMARY SHEET"?
174. How appropriate is the format of the "CAPS SUMMARY SHEET" for use by clinicians assessing PTSD?
175. How easy to use is the format of the "CAPS SUMMARY SHEET"?
176. How clear is the rating scale of the "CAPS SUMMARY SHEET"?
175. How appropriate is the rating scale of the "CAPS SUMMARY SHEET" for assessing PTSD?
176. How easy to use is the rating scale of the "CAPS SUMMARY SHEET" for assessing PTSD?
177. Overall, how representative are the items on the "CAPS SUMMARY SHEET" for assessing PTSD?
178. How appropriate is the format of the "global ratings" for assessing the content domain of PTSD?
179. How clear is the rating scale of the "global ratings"?
180. How appropriate is the rating scale of the "Associated Features" for assessing PTSD?
181. How easy to use is the rating scale of the "Associated Features"?
182. How appropriate is the "time sampling parameter" used throughout the CAPS of "Past Week; Past Month; or Lifetime" for assessing PTSD?

Please read the instructions at the top of the "LIFE EVENTS CHECKLIST" page to the left and answer the following:

183. How appropriate are these instructions for the target population of persons with PTSD?
184. How clear are these instructions for people with PTSD?

Now, please read item number 1:

185. How relevant is item (#1) "Natural disaster..." to the content domain of PTSD?
186. How representative are the items on number (#1) for the content domain of PTSD?
187. How clear are the items?
188. How relevant is item (#2) "Fire or explosion" to the content domain of PTSD?
189. How representative are the items on number (#2) for the content domain of PTSD?
190. How clear are the items?
191. How relevant are the items on (#3) "transportation accident..." to the content domain of PTSD?
192. How representative are the items on number (#3) for the content domain of PTSD?
193. How clear are the items?
194. How relevant is item (#4) "Serious accident..." to the content domain of PTSD?
195. How representative are the items on number (#4) for the content domain of PTSD?
196. How clear are the items?
197. How relevant is item (#5) "Exposure to..." to the content domain of PTSD?
198. How representative are the items on number (#5) for the content domain of PTSD?
199. How clear are the items?
200. How relevant is item (#6) "Physical assault..." to the content domain of PTSD?
201. How representative are the items on number (#6) for the content domain of PTSD?
202. How clear are the items?
203. How relevant is item (#7) "Assault with a..." to the content domain of PTSD?
204. How representative are the items on number (#7) for the content domain of PTSD?
205. How clear are the items?
206. How relevant is item (#8) "Sexual assault..." to the content domain of PTSD?
207. How representative are the items on number (#8) for the content domain of PTSD?
208. How clear are the items?
209. How relevant is item (#9) "Other unwanted..." to the content domain of PTSD?
210. How representative are the items on number (#9) for the content domain of PTSD?
211. How clear are the items?
212. How relevant is item (#10) "Combat exposure..." to the content domain of PTSD?
213. How representative are the items on number (#10) for the content domain of PTSD?
214. How clear are the items?
215. How relevant is item (#11) "Captivity..." to the content domain of PTSD?
216. How representative are the items on number (#11) for the content domain of PTSD?
217. How clear are the items?
218. How relevant is item (#12) "Life threatening..." to the content domain of PTSD?
219. How representative are the items on number (#12) for the content domain of PTSD?
220. How clear are the items?
221. How relevant is item (#13) "Severe human suffering" to the content domain of PTSD?
222. How representative are the items on number (#13) for the content domain of PTSD?
223. How clear are the items?
224. How relevant is item (#14) "Sudden, violent..." to the content domain of PTSD?
225. How representative are the items on number (#14) for the content domain of PTSD?
226. How clear are the items?
227. How relevant is item (#15) "Sudden, unexpected..." to the content domain of PTSD?
228. How representative are the items on number (#15) for the content domain of PTSD?
229. How clear are the items?
230. How relevant is item (#16) "Serious injury..." to the content domain of PTSD?
231. How representative are the items on number (#16) for the content domain of PTSD?
232. How clear are the items?
233. How relevant is item (#17) "Any other..." to the content domain of PTSD?
234. How representative are the items on number (#17) for the content domain of PTSD?
235. How clear is the item?

Please look over the LIFE EVENTS CHECKLIST page:

236. How clear is the format of the "LIFE EVENTS CHECKLIST"?
237. How appropriate is the format of the "LIFE EVENTS CHECKLIST" for use by clinicians assessing PTSD?
238. How easy to use is the format of the "LIFE EVENTS CHECKLIST"?
239. How clear is the rating scale of the "LIFE EVENTS CHECKLIST"?
240. How appropriate is the rating scale of the "LIFE EVENTS CHECKLIST" for assessing PTSD?
241. How easy to use is the rating scale of the "LIFE EVENTS CHECKLIST" for assessing PTSD?
242. Overall, how representative are the items on the "CAPS SUMMARY SHEET" for assessing PTSD?
243. Overall, how adequately did the CAPS questionnaire itself meet its own objective to assess PTSD according to the DSM-IV's definition and criteria?

Appendix D

DSM-IV Diagnostic Criteria for PTSD

A. The person has been exposed to a traumatic event in which both of the following were present:
1. The person experienced, witnessed, or was confronted with an event or events that involved actual or threatened death or serious injury, or a threat to the physical integrity of self or others.
2. The person's response involved intense fear, helplessness, or horror. (Note: in children this may be expressed as disorganized or agitated behaviour).

B. The traumatic event is persistently reexperienced in one (or more) of the following ways:
1. Recurrent and intrusive distressing recollections of the event, including images, thoughts, or perceptions. (Note: In young children, repetitive play may occur in which themes or aspects of the trauma are expressed).
2. Recurrent distressing dreams of the event.
3. Acting or feeling as if the traumatic event were recurring. (Includes a sense of reliving the experience, illusions, hallucinations, and dissociative flashback episodes. May also occur on awakening or when intoxicated).
4. Intense psychological distress at exposure to internal or external cues that symbolize or resemble an aspect of the traumatic event.
5. Physiological reactivity on exposure to internal or external cues that symbolize or resemble an aspect of the traumatic event.
C. Persistent avoidance of stimuli associated with the trauma and numbing of general responsiveness (not present before the trauma), as indicated by three or more of the following:
1. Efforts to avoid thoughts, feelings, or conversations associated with the trauma
2. Efforts to avoid activities, places, or people that arouse recollections of the trauma
3. Inability to recall an important aspect of the trauma
4. Markedly diminished interest or participation in significant activities
5. Feeling of detachment or estrangement from others
6. Restricted range of affect (e.g., unable to have loving feelings)
7. Sense of a foreshortened future (e.g., does not expect to have a career, marriage, children, or a normal life span)

D. Persistent symptoms of increased arousal as indicated by two or more of the following:
1. Difficulty falling or staying asleep
2. Irritability or outbursts of anger
3. Difficulty concentrating
4. Hypervigilance
5. Exaggerated startle response

E. Duration of the disturbance is more than 1 month (symptoms in B, C, and D).

F. The disturbance causes clinically significant distress or impairment in social, occupational, or other important areas of functioning.

Specify as: Acute - if duration of symptoms is less than 3 months; Chronic - if duration of symptoms is 3 months or more; Delayed Onset - if onset of symptoms is at least 6 months after the stressor.

(DSM-IV, 1994, pp. 426-429).

Appendix E

National Center for PTSD
CLINICIAN-ADMINISTERED PTSD SCALE FOR DSM-IV
Name: ___ ID #: ___ Date: ___ Interviewer: ___ Study: ___

Dudley D. Blake, Frank W. Weathers, Linda M. Nagy, Danny G. Kaloupek, Dennis S. Charney, & Terence M. Keane
National Center for Posttraumatic Stress Disorder
Behavioral Science Division - Boston VA Medical Center
Neurosciences Division - West Haven VA Medical Center
Revised July 1998

CAPS Page 2

Criterion A. The person has been exposed to a traumatic event in which both of the following were present: (1) the person experienced, witnessed, or was confronted with an event or events that involved actual or threatened death or serious injury, or a threat to the physical integrity of self or others; (2) the person's response involved intense fear, helplessness, or horror. Note: In children, this may be expressed instead by disorganized or agitated behavior.

I'm going to be asking you about some difficult or stressful things that sometimes happen to people. Some examples of this are being in some type of serious accident; being in a fire, a hurricane, or an earthquake; being mugged or beaten up or attacked with a weapon; or being forced to have sex when you didn't want to. I'll start by asking you to look over a list of experiences like this and check any that apply to you. Then, if any of them do apply to you, I'll ask you to briefly describe what happened and how you felt at the time. Some of these experiences may be hard to remember or may bring back uncomfortable memories or feelings. People often find that talking about them can be helpful, but it's up to you to decide how much you want to tell me. As we go along, if you find yourself becoming upset, let me know and we can slow down and talk about it. Also, if you have any questions or you don't understand something, please let me know. Do you have any questions before we start?

ADMINISTER CHECKLIST, THEN REVIEW AND INQUIRE UP TO THREE EVENTS. IF MORE THAN THREE EVENTS ENDORSED, DETERMINE WHICH THREE EVENTS TO INQUIRE (E.G., FIRST, WORST, AND MOST RECENT EVENTS; THREE WORST EVENTS; TRAUMA OF INTEREST PLUS TWO OTHER WORST EVENTS, ETC.)
IF NO EVENTS ENDORSED ON CHECKLIST: (Has there ever been a time when your life was in danger or you were seriously injured or harmed?)
IF NO: (What about a time when you were threatened with death or serious injury, even if you weren't actually injured or harmed?)
IF NO: (What about witnessing something like this happen to someone else or finding out that it happened to someone close to you?)
IF NO: (What would you say are some of the most stressful experiences you have had over your life?)

EVENT #1
What happened? (How old were you? Who else was involved? How many times did this happen? Life threat? Serious injury?)
How did you respond emotionally? (Were you very anxious or frightened? Horrified? Helpless? How so? Were you stunned or in shock so that you didn't feel anything at all? What was that like? What did other people notice about your emotional response? What about after the event - how did you respond emotionally?)
Describe (e.g., event type, victim, perpetrator, frequency):
A. (1) Life threat? NO YES [self ___ other ___]  Serious injury? NO YES [self ___ other ___]  Threat to physical integrity? NO YES [self ___ other ___]
A. (2) Intense fear/help/horror? NO YES [during ___ after ___]
Criterion A met? NO PROBABLE YES

Appendix F

Question  rater1 rater2 rater3 rater4 rater5 rater6   CVR    CVI    ADmed   ADm     p
1a        3      3      3      3      2      3        0.67   1      0.167   0.278   0.0096
2b        2      2      3      3      1      3        0.67   0.83   0.67    0.67    0.3048
1         2      3      3      3      2      2        1      1      0.5     0.50*   0.0584
2         3      2      3      3      1      2        0.67   0.83   0.67    0.67    0.3112
3         2      1      0      2      2      1        0      0.67   0.67    0.67    0.3072
4         3      3      3      3      2      3        1      1      0.167   0.278   0.0096
5         3      2      3      3      2      2        1      1      0.5     0.5     0.0617
6         2      3      3      3      2      3        1      1      0.333   0.444   0.0408
7         2      3      3      3      2      3        1      1      0.333   0.444   0.0408
8         3      3      3      3      2      2        1      1      0.333   0.444   0.0408
9         2      2      1      3      1      1        0      0.5    0.67    0.67    0.3022
10        3      2      1      3      0      2        0.33   0.67   0.83    0.89    0.5437
11        3      2      1      3      1      1        0      0.5    0.83    0.83    0.49
12        3      1      3      3      2      1        0.33   0.67   0.83    0.83    0.49
13        2      2      3      2      2      2        1      1      0.375   0.469   0.023
14        1      2      3      1      1      2        0      0.5    0.67    0.67    0.2966
15        3      3      3      3      2      3        1      1      0.167   0.278   0.0096
16        2      1      2      3      1      2        0.33   0.67   0.5     0.556   0.1253
17        2      3      1      3      2      2        0.67   0.83   0.5     0.556   0.1212
18        2      3      3      3      1      3        0.67   0.83   0.5     0.667   0.3042*
19        3      2      3      3      1      2        0.67   0.83   0.67    0.67    0.3041
20        2      1      1      1      2      3        0      0.5    0.67    0.67    0.3037
21        3      3      3      3      1      2        0.67   0.83   0.5     0.667   0.2998
22        3      2      3      3      1      2        0.67   0.83   0.67    0.67    0.3027
23        3      2      3      3      2      3        1      1      0.333   0.444   0.0452
24        3      3      3      3      1      3        0.67   0.83   0.333   0.556   0.077
25        3      3      3      3      2      2        1      1      0.333   0.444   0.0413
26        3      2      1      3      2      2        0.67   0.83   0.5     0.556   0.1179
27        3      3      3      3      2      2        1      1      0.333   0.444   0.0405
28        3      3      3      3      1      2        0.67   0.83   0.5     0.667   0.303
29        3      2      3      3      2      1        0.67   0.83   0.67    0.67    0.3078
30        2      3      3      3      1      2        0.67   0.83   0.571   0.653   0.1655
31        2      3      3      3      1      2        0.67   0.83   0.67    0.67    0.2996
32        2      2      1      3      2      1        0.33   0.67   0.50*   0.556   0.1234
33        3      3      2      3      1      3        0.67   0.83   0.50*   0.667   0.3009
34        2      3      3      3      1      2        0.67   0.83   0.67    0.67    0.2996
35        3      3      3      3      2      2        1      1      0.333   0.444   0.0405
36        3      3      3      3      1      2        0.67   0.83   0.50*   0.667   0.3009
37        2      3      2      3      1      2        0.67   0.83   0.50*   0.556   0.1193
38        3      3      1      3      2      2        0.67   0.83   0.67    0.67    0.2996
39        3      3      2      3      1      3        0.67   0.83   0.50*   0.667   0.2963
40        3      2      2      3      1      2        0.67   0.83   0.50*   0.556   0.1233
41        3      3      3      3      2      2        1      1      0.333   0.44*   0.0391
42        3      3      3      3      1      2        0.67   0.83   0.5     0.667   0.2985
43        3      2      3      3      1      1        0.33   0.67   0.83    0.83    0.5046
44        3      2      1      3      2      2        0.67   0.83   0.5     0.556   0.1221
45        3      3      3      3      1      3        0.67   0.83   0.333   0.556   0.0728
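For readers who wish to verify the values in this table, the following is a minimal sketch (in Python; not part of the original study materials) of how the columns appear to be computed from the six experts' ratings on the 0-3 scale. The assumptions are inferred from the table itself: a rating of 2 ("Mostly") or 3 ("Very") counts as an endorsement (the dichotomization discussed under Limitations), CVR follows Lawshe (1975), CVI is the proportion of endorsing experts, and ADmed and ADm follow Burke, Finkelstein, and Dusig (1999) as the mean absolute deviation from the item median and item mean, respectively. The function names are illustrative; the printed values reproduce the row for question 2b.

    # Sketch of the Appendix F index computations (assumptions noted above).
    from statistics import mean, median

    def cvr(ratings, cutoff=2):
        # Lawshe's content validity ratio: (n_e - N/2) / (N/2),
        # where n_e is the number of experts endorsing the element.
        n = len(ratings)
        n_e = sum(1 for r in ratings if r >= cutoff)
        return (n_e - n / 2) / (n / 2)

    def cvi(ratings, cutoff=2):
        # Content validity index: proportion of experts endorsing the element.
        return sum(1 for r in ratings if r >= cutoff) / len(ratings)

    def ad_index(ratings, center=median):
        # Average deviation from a central value:
        # pass median for ADmed, mean for ADm.
        c = center(ratings)
        return sum(abs(r - c) for r in ratings) / len(ratings)

    item_2b = [2, 2, 3, 3, 1, 3]  # the six raters' scores for question 2b
    print(round(cvr(item_2b), 2))               # 0.67
    print(round(cvi(item_2b), 2))               # 0.83
    print(round(ad_index(item_2b, median), 2))  # 0.67 (ADmed)
    print(round(ad_index(item_2b, mean), 2))    # 0.67 (ADm)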
