@prefix vivo: . @prefix edm: . @prefix ns0: . @prefix dcterms: . @prefix dc: . @prefix skos: . vivo:departmentOrSchool "Education, Faculty of"@en, "Educational and Counselling Psychology, and Special Education (ECPS), Department of"@en ; edm:dataProvider "DSpace"@en ; ns0:degreeCampus "UBCV"@en ; dcterms:creator "Perot, Josette Anne-Marie"@en ; dcterms:issued "2008-09-05T16:46:20Z"@en, "1992"@en ; vivo:relatedDegree "Master of Arts - MA"@en ; ns0:degreeGrantor "University of British Columbia"@en ; dcterms:description "The primary purpose of this study was to investigate psychologists' natural, interactive decision-making behaviour while scoring difficult verbal responses on the Wechsler Intelligence Scale for Children-Revised. A total of 23 psychologists participated in the study. First of all, in order to obtain scoring information descriptive of the sample, psychologists scored a WISC-R protocol. This protocol comprised four verbal scale subtests: the Vocabulary, Similarities, Information, and Comprehension subtests. In order of difficulty, the Vocabulary, Comprehension, Similarities, and Information subtests were found to be most prone to scoring differences. The Verbal IQ was found to vary by 11 points. Differences in point assignment within subtests accounted for variance in scoring. Following the completion of the first measure, a sub-sample of 8 psychologists provided think-aloud protocols in a separate session while scoring a second fabricated Comprehension subtest. The complexity of the task involved the consideration of administration errors and response judgment while scoring. Rather than focus solely on quantitative analysis of error differences as has been done in prior research, this study conceptualized these sources by providing additional analysis of specific strategies psychologists used while making scoring decisions. The results of the verbal protocol analysis identified cognitive strategies inherent in the scoring of difficult type responses. 
The type and frequency of cognitive strategies identified in the study appear to be related to individual scoring accuracy. At the end of the session, psychologists were asked to identify strategies that were useful to them in difficult scoring situations. All psychologists identified the manual as the primary heuristic; however, percentage frequencies of verbalized strategies across subjects indicated that only four of the subjects used the manual as their primary aid on this task. These findings are further discussed, as well as their implications and inferences."@en ; edm:aggregatedCHO "https://circle.library.ubc.ca/rest/handle/2429/1639?expand=metadata"@en ; dcterms:extent "5235405 bytes"@en ; dc:format "application/pdf"@en ; skos:note "We accept this thesis as conforming COGNITIVE STRATEGIES AND HEURISTICS UNDERLYING PSYCHOLOGISTS' JUDGMENTS ON THE WISC-R VERBAL SCALES: A PROTOCOL ANALYSIS by JOSETTE ANNE-MARIE PEROT B.A., YORK UNIVERSITY, 1988 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF ARTS in THE FACULTY OF GRADUATE STUDIES Department of Educational Psychology and Special Education THE UNIVERSITY OF BRITISH COLUMBIA March 1992 © Josette Anne-Marie Perot, 1992 Department of [illegible] The University of British Columbia Vancouver, Canada In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. 
It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. (Signature) Date [illegible] DE-6 (2/88) ABSTRACT The primary purpose of this study was to investigate psychologists' natural, interactive decision-making behaviour while scoring difficult verbal responses on the Wechsler Intelligence Scale for Children-Revised. A total of 23 psychologists participated in the study. First of all, in order to obtain scoring information descriptive of the sample, psychologists scored a WISC-R protocol. This protocol comprised four verbal scale subtests: the Vocabulary, Similarities, Information, and Comprehension subtests. In order of difficulty, the Vocabulary, Comprehension, Similarities, and Information subtests were found to be most prone to scoring differences. The Verbal IQ was found to vary by 11 points. Differences in point assignment within subtests accounted for variance in scoring. Following the completion of the first measure, a sub-sample of 8 psychologists provided think-aloud protocols in a separate session while scoring a second fabricated Comprehension subtest. The complexity of the task involved the consideration of administration errors and response judgment while scoring. Rather than focus solely on quantitative analysis of error differences as has been done in prior research, this study conceptualized these sources by providing additional analysis of specific strategies psychologists used while making scoring decisions. The results of the verbal protocol analysis identified cognitive strategies inherent in the scoring of difficult type responses. The type and frequency of cognitive strategies identified in the study appear to be related to individual scoring accuracy. At the end of the session, psychologists were asked to identify strategies that were useful to them in difficult scoring situations. 
All psychologists identified the manual as the primary heuristic; however, percentage frequencies of verbalized strategies across subjects indicated that only four of the subjects used the manual as their primary aid on this task. These findings are further discussed, as well as their implications and inferences. William T. McKee, Ph.D. Research Supervisor TABLE OF CONTENTS ABSTRACT, ii; LIST OF TABLES, vi; LIST OF FIGURES, vii; ACKNOWLEDGMENTS, viii; CHAPTER I INTRODUCTION, 1; Context of the Study, 3; Purpose of the Study, 4; Assumptions of the Study, 4; Justification of the Study, 5; II REVIEW OF THE LITERATURE, 8; Theoretical Framework: Cognitive Psychology and the Psychometric Link, 8; A Perspective: The WISC-R as a Cognitive Task, 11; Psychologists' Task Performance on the WISC-R, 13; Problems of the Verbal Scales, 14; Nature of the WISC-R Verbal Scales, 17; Task Summary, 21; The Administration Process, 22; Cognitive Strategies and Heuristics, 25; Cognitive Psychology and the Laboratory Method, 26; Verbal Protocol Analysis, 29; III METHODOLOGY, 39; Sample, 39; Procedures, 40; The Stimulus Protocol, 41; Development of Verbal Categories, 43; Analysis of Semantic Units, 44; Training of the Coder, 45; Summary of Instrumentation, 46; IV RESULTS, 49; Demographic Characteristics of Sample, 49; Session One WISC-R I Results, 51; Session Two WISC-R II Results, 56; V SUMMARY AND CONCLUSIONS, 68; Summary of Results and Discussion, 68; Conclusions, 82; Limitations of the Study, 83; Implications of the Study, 84; REFERENCES, 88; APPENDIX A, 101; WISC-R I, 102; WISC-R II, 108; Instructions, 110; Consent form, 112; Background form, 113; APPENDIX B, 114; Script for Thinking-Aloud Protocol, 115; APPENDIX C, 117; Examples of Segmented Units and Complete Protocols, 118; APPENDIX D, 130; Table D1: Frequencies of Verbalizations for Non-Problematic Items, 131; Table D2: Frequencies of Verbalizations for Difficult Items, 132. LIST OF TABLES TABLE I Demographic Characteristics of Sample, 50; II Types of Errors Across Subtests, 52; III Comparison of Scaled Scores to Slate's Key, 53; IV Means and 
Standard Deviations and Standard Errors of Measurement for Scale Scores and Verbal IQ, 54; V Comparison of Point Differences Across Groups, 55; VI Comparison of Total Errors Between Groups, 56; VII Frequency and Percentage Categories of Verbal Behaviour, 62; VIII Frequencies and Percentages of Cognitive Strategies Across Subjects in each Category, 63; IX Patterns of Scoring on the WISC-II Measure, 64. LIST OF FIGURES I Scoring Dilemma, 2; II Model of Psychologists' Judgmental Processes, 22; III Encoding Process, 33. ACKNOWLEDGMENTS I wish to express my sincere appreciation to Dr. William McKee, my research supervisor, for his open door policy, and his unwavering support throughout the course of this research, from the problem formulation, insightful commentary on the drafts, to the finished product. I am also especially grateful to Dr. Marion Porath for her constant support, her valuable comments, insights, and critical analysis throughout the proposal and thesis stage that has contributed immeasurably to the quality of the work. I gratefully acknowledge the contribution of Dr. Nand Kishor, who provided valuable feedback on my proposal and who planted the \"conceptual seed\" that helped formulate the framework for this study. I wish to express my appreciation to Dr. John Slate for his interest in this study as well as for his generosity in permitting me to use his fabricated protocols for the purpose of this project. I gratefully acknowledge the help of Drs. Suzanne Jacobsen and John Carter and the school board officials who have been instrumental in the subject solicitation process. A word of appreciation goes to the psychologists who volunteered for this study without whose efforts this study would not have come to be. Thanks folks! Finally, I wish to acknowledge the love and encouragement I have received from my family: my parents, Elton and Victoria Perot, my sister, Giovanna, and my brother, Jean-Francois who have been patient in my academic pursuits from a distance. 
I also wish to thank Indar, who was here for the writing of the thesis, for his moral support and encouragement during this period. CHAPTER 1 INTRODUCTION School psychologists often make decisions under uncertain conditions. For example, when a child is under consideration for placement in special services, school psychologists must make judgments based on different sources of information, then weigh the probabilities and outcomes as to whether the child needs these services (Fagley, 1988, p.311). Given the prominence of the Wechsler Scales and standardized tests, one of the subareas within psychologists' profession which contributes to this uncertainty is their judgments regarding the scoring of verbal responses on the Wechsler Intelligence Scale for Children-Revised (WISC-R). Research has demonstrated that there is a high degree of subjectivity involved in the scoring of responses on the Verbal subtests (Slate & Hunnicutt, 1988). The verbal subtests are prone to elicit problematic responses. These are usually ambiguous responses that demand considerable judgment on the part of the examiner (Brannigan, 1975) and therefore are difficult to score. Additionally, the difficulty of scoring \"novel\" responses has been widely acknowledged. Sattler (1988, p. 147) amplifies the challenge posed in the scoring of verbal responses as illustrated in Figure 1. [Figure 1. Scoring Dilemma: HOW WOULD YOU SCORE THIS? IN THE CANNED VEGETABLE DEPARTMENT? Used with the permission of Sattler.] In this respect, it is inevitable that psychologists often have differences of opinion in their evaluations of the same response. Despite the knowledge that psychologists differ in their judgments of verbal responses, there is a lack of descriptive evidence in the literature linking these differences to the actual judgmental strategies and heuristics that psychologists habitually employ in their task of scoring difficult-to-score verbal responses on the WISC-R. 
This is unfortunate since psychologists make extensive use of the WISC-R in their practice, and knowledge of the heuristics that they employ as well as their related thought processes may shed light on how they cope with areas that are not clearly delineated in the test manual. In the broader scope of psychologists' professional judgments in making complex decisions, Barnett (1988) calls for conceptual links in order to analyze psychologists' behaviours in \"problem framing, planning and implementing strategies\" (p.667). In this regard, one may extend such conceptual links to the analysis of psychologists' task performance on the WISC-R from an information processing framework. Context of the Study As highlighted, the heuristics or \"guiding strategies\" that psychologists engage in have not been well articulated in the context of scoring WISC-R verbal responses. A probable reason for the absence of such data is that studies involving scoring differences on the WISC-R have not viewed the testing process from the perspective of the psychologist, that is, as a cognitive task requiring significant judgment. Consequently, with the focus on standardization and objectivity in the testing practice (Hanna, Bradley, & Holen, 1981; Slate & Jones, 1989), psychologists themselves have often been overlooked as an active part of the measuring process - a test process that calls for specific cognitive skills in making often complex judgments. For example, the difficulty of the scoring task increases when the psychologist must make subjective judgments surrounding the \"appropriateness\" of a child's response to a certain test item. 
The role of subjectivity is especially increased by responses that are not clearly scorable by the test manual (Slate & Hunnicutt, 1988). According to Wechsler (1974), such exercises do indeed rely on the professional judgment capabilities of the examiner. Yet, psychologists are not specifically trained in optimal judgment strategies; therefore there may be a discrepancy between the guidelines in the manual and what psychologists actually do. The Purpose of the Study The purpose of this study was to investigate psychologists' judgments from the standpoint of cognitive psychology. Through analysis of verbal protocols this study investigated the strategies utilized by a group of psychologists while engaged in making evaluative judgments of difficult-to-score responses. The aim of the study was to describe the particular cognitive strategies arising from the verbal data. Since this study was mainly descriptive in nature, it is hoped that specific cognitive strategies identified in this research can be more fully detailed in the future. Assumptions of the Study An underlying assumption of this research is that the type and frequency of psychologists' underlying judgment processes affect the accuracy of their scoring of verbal responses. DeNisi, Cafferty, and Meglino (1984) have suggested that some strategies lead to more accurate ratings in the appraisal process than others. Therefore, some psychologists may more ably appraise problematic responses than others as a result of the efficiency and effectiveness of the judgmental processes they employ. A second assumption is that the degree of clinical experience is not a factor in attaining accuracy. Both graduate students (Slate & Chick, 1989; Slate & Jones, 1989; Warren & Brown, 1972) and experienced psychologists (Brannigan, 1975; Miller & Chansky, 1972; Oakland, Lee, & Axelrad, 1975; Plumb & Charles, 1955) alike have been found to be prone to errors in scoring the verbal subtests. 
In other words, scoring difficulties do not seem to diminish with experience. Justification of the Study The WISC-R is one of the most commonly administered tests in clinical practice (Slate & Chick, 1989). Additionally, the extensive use of the WISC-R is reflected in the graduate classroom where it is the most frequently taught individual intelligence test (Slate & Chick, 1989). The use of the Wechsler Scales may be traced as far back as 1939 (Wechsler-Bellevue Scales) which has afforded these tests a lengthy history in assessment (Plumb & Charles, 1955). Moreover, regarding the assessment of children, the Wechsler Scales have continued in the form of the WISC and WISC-R (and more recently, the WISC III). The functional use of the WISC-R has emerged not only as an IQ test but as the most widely used diagnostic tool in making important decisions regarding educational placement and special services (Bradley, Hanna, & Lucas, 1980). Furthermore, especially with the emergence of the WISC III, it appears that the Wechsler Scales will continue to be used extensively in clinical practice as an important diagnostic aid. Although psychologists go through extensive training on the WISC-R, they do differ in their scoring. Psychologists' scoring differences impact negatively on the integrity of the test scores which in turn affects the validity of subsequent decisions based on these scores. Thus, if the processes behind psychologists' differences in scoring can be studied, then greater understanding as to why there is such high variability in marking verbal responses may be helpful for training. Moreover, according to Pitz and Sachs (1984), \"Errors in judgment suggest ways in which performance might be improved, especially if one understands why the errors occurred\" (p.141). Summary It is apparent that scoring differences persist despite traditional training and detailed scoring guidance from the test manuals. 
The same differences have been encountered across several similar measures where verbal responses are scored (Warren & Brown, 1972). In light of this evidence, this study sought to investigate the cognitive strategies employed by psychologists as they engaged in the scoring of verbal scale responses. The next chapter introduces a conceptual framework relevant to the interaction of cognitive psychology and the psychometric tradition. Included in the review of literature is a discussion on the usefulness of verbal reports as data since this form of data was used in the study. CHAPTER 2 REVIEW OF THE LITERATURE Theoretical Framework: Cognitive Psychology and the Psychometric Link The process of making evaluative judgments and the strategies underlying this process have traditionally been studied within the field of work or organizational psychology. In such instances where judgment is required, employers are often called upon to evaluate employee performance (DeNisi, Cafferty, & Meglino, 1984; Kishor, 1987; Mount & Thompson, 1987; Murphy & Balzer, 1986). In employee evaluation two separate traditions that bear on the issue of the judgmental process have been studied (Feldman, 1981). These are the instrument-psychometric tradition and the social psychology tradition. The former deals with the study of random error and systematic biases in performance ratings while the latter focuses on the cognitive systems underlying attribution processes, person perception, and stereotyping. In the past, both schools of thought have been kept separate; however, more interest is being developed in the interactions between the individual's cognition in making a rating and the psychometric instrument that the individual uses. According to Krzystofiak, Cardy, and Newman (1988): A significant body of appraisal research has approached the problem of error and bias by focusing on improved instrumentation. 
Because this attention to format has been of limited success, appraisal research has been focusing on the role of the rater as an information processor. (p.515) The psychometric tradition has until recently overlooked the fact that the individual him/herself comprises part of the rating process. This is because the act of successfully making judgments requires individuals to sample from more than one source of information. Anderson (1977) sees this success as being contingent upon the \"ability to interpret, integrate, and differentially weight information to arrive at an appropriate decision\" (p.68). Additionally, a number of investigators have combined both traditions to study the relationship between cognition and the processes of making rating judgments (Borman, 1977; DeNisi, Cafferty, & Meglino, 1984; Mount & Thompson, 1987; Murphy & Balzer, 1986). Questions such as, \"[What] cognitive processes [or strategies] are engendered by the various types of rating scales ...?\" are being asked (Feldman, 1980, p.128). And, how does a rater's cognition intercede with an evaluative judgment to produce a specific rating judgment? For example, in judging an employee's performance, schematic processing is a fundamental cognitive mechanism used in human judgment that may affect the final rating (Kishor, 1987). It has been suggested that an employer-rater has certain schematic categorizations that guide him/her to notice specific employee attributes; these influence how he/she makes a rating judgment (Mount & Thompson, 1987). 
As in the perception of people, the perception of nonsocial stimuli which are ambiguous \"is often determined by what the perceiver expects to see\" (McArthur, 1981, p.204). For instance, decision frames may guide the way an individual conceptualizes \"acts, outcomes, contingencies associated with a particular choice\" (Tversky & Kahneman, 1984). Therefore, rating judgments on the same employee by different supervisors do not necessarily have to agree. Along the same lines, Slate and Jones (1988) have suggested that psychologists may conceptualize verbal item responses differently and, as a result, need to learn to clarify response categories. Because psychologists may conceptualize information differently, it is possible that they may rely heavily on individual strategies and heuristics and exhibit variations in these processes to categorize responses. Such heuristics may be reflective of the systematic scoring patterns that fit their method of weighing and integrating information. Much like employee rating behaviour, these mental processes are a part of the WISC-R rating behaviour. However, the mode of judging will differ between tasks because of the basic task structure and requirements of each. For instance, Payne (1982) found that judgments change as the presentation of the task itself changes. Hence, one may infer that the mode of processing information in WISC-R ratings is reflected by the contextual judgmental strategies employed, and knowledge of the task structure. However, since the study of judgment processes is a relatively new endeavour within the field of cognitive psychology (Rappoport & Summers, 1973), models describing encoding processes, judgmental heuristics and decision strategies are not as yet common. Most of the work in this area has been aimed primarily at providing descriptions of task heuristics in the hope that, at a later date, a \"systematic theoretical presentation\" can be developed (Pitz & Sachs, 1984, p.146). 
However, if appraisal practice is to advance, then the information processing involved in making appraisals needs further investigation (Cardy & Dobbins, 1986). A Perspective: The WISC-R as a Cognitive Task The information-processing revolution that has taken place in cognitive psychology over the past 2 decades has been characterized by an increased emphasis on the processes rather than the products of task performance. The major goal of the task-analytic approach that has dominated the study of information processing has been to discover the elementary processes people use in performing tasks and to understand the strategies into which these processes and strategies act. As a result of this often successful pursuit of this goal, we now have a good understanding, at least at some level, of how people approach a large variety of tasks. (Sternberg & Ketron, 1982, p.399) Information processing psychologists study the mind \"in terms of mental representations and the processes that underlie observable behavior\" (Sternberg, 1985, p.1). In other words, information processing theory attempts to describe the processes and strategies that underlie human judgment and problem solving (Schulman & Elstein, 1975). These may be qualitative processes such as the strategies that individuals employ in acquiring information and the ways they use this information in certain problem-solving situations (Pitz & Sachs, 1984). Since \"cognitive tasks vary dramatically in well-definedness and specificity\" (Ericsson & Simon, 1974, p.119), it is possible to conceptualize the scoring of WISC-R verbal responses as a cognitive task; the scoring of responses of the Verbal Scale requires psychologists to actively process, weigh, and integrate information when judging a response. To reiterate, an advantage to this approach is that cognitive activities can be related to observed performance or behaviours. 
Therefore, one can then make inferences as to what strategies the individual used to perform the task. Classic studies of problem-solving in chess (Chase & Simon, 1973; de Groot, 1965, 1966; Simon & Chase, 1973) and physics (Chi, Feltovich, & Glaser, 1981) from the perspective of expert and novice knowledge bases (cognitive content methodology approach) have provided insight into cognitive processes. Such an approach often studies the comparative performance between experts and novices in different content domains. More recently, an information-processing framework has been found useful in research on human performance in clinical diagnostic settings such as medicine and in interactive instructional contexts, such as teaching (Fogarty, Wang, & Creek, 1983). These studies have been useful in describing the interaction between knowledge, cognitive processes and heuristics in problem-solving and decision making. Sternberg (1985) summarizes the basic questions of interest to researchers of this persuasion: 1. What are the mental processes that constitute intelligent performance on various tasks? 2. How rapidly and accurately are these processes performed? 3. Into what strategies for task performance do these mental processes combine? 4. Upon what forms of mental representation do these processes and strategies act? 5. What is the knowledge base that is organized into these forms of representation, and how does it affect, and how is it affected by, the processes, strategies, and representations that individuals use? (pp.1-2) Psychologists' Task Performance on the WISC-R Questions such as those posed by Sternberg may be helpful in the investigation of the factors involved in psychologists' evaluations and scoring of WISC-R responses. That is, the study of the underlying cognitive variables that affect psychologists' WISC-R task performance may be fruitful in conceptualizing errors in scoring. 
The traditional manner of conceptualizing psychologists' performance on their use of the Wechsler Scales has been from a quantitative or psychometric perspective. This approach has tallied the number and kinds of errors committed by psychologist-examiners. The usefulness of this approach has been that it has identified and quantified a significant problem within the psychological profession; that is, the Wechsler Scales have been found to lend themselves to highly variable scoring because of the difficult-to-score responses that they elicit. However, despite the recurrent scoring problems evidenced in the Wechsler Scales, studies focusing on these tests have simply continued to acknowledge this problem through the documentation of quantitative data underlying scoring accuracy in test protocols. Such studies have not focused research on understanding the nature and causes which give rise to these problems. Problems of the Verbal Scales The problems of scoring the Wechsler Verbal Scales have been acknowledged since the inception of the Wechsler-Bellevue Scales in 1939 (Plumb & Charles, 1955). Scoring problems are also apparent in the WAIS, WAIS-R and the WISC as well as other standardized tests. An extensive body of research has shown that psychologists frequently commit serious errors when administering and scoring test protocols (Franklin, Stollman, Burpeau, & Sabers, 1982; Hunnicutt, Slate, Gamble, & Wheeler, 1990; Miller & Chansky, 1972; Miller, Chansky, & Gredler, 1970; Oakland, Lee, & Axelrad, 1975; Plumb & Charles, 1955; Slate & Jones, 1989; Walker, Hunt, & Schwartz, 1965; Warren & Brown, 1972). More particularly, the most problematic tests have been shown to be those which comprise the Verbal Scale (Oakland, Axelrad, & Lee, 1975; Slate & Chick, 1989; Slate & Jones, 1988). 
The studies that have examined the nature or types of errors on the Wechsler Scales demonstrate with an overwhelming consensus that the Verbal subtests are the most difficult to score and that the source of much variability in scoring stems from these subtests rather than from the Performance subtests. Slate and Jones (1988) ranked the WISC-R subtests in terms of scoring difficulty. In the order of most difficult to least difficult to score, these were the Vocabulary, Comprehension, and Similarities subtests. Information was ranked seventh out of the ten scales. Digit Span and Mazes were omitted. The problem of scoring protocols has been illustrated in studies where psychologists have been given identical protocols to score and have awarded different scores to the same items (Slate & Jones, 1988). The nature of these errors usually involved giving more credit than required. For example, psychologists are more prone to giving 2 points for a 1-point response, and 1 point for a 0-point response (Slate & Jones, in press). Errors also occur in failing to record subject responses verbatim (Warren & Brown, 1972), as well as through differences in questioning of ambiguous subject responses (Brannigan, 1975). Additional questioning usually occurs for item responses that are not clearly scorable by the test manual. Although a wide range of empirical evidence addresses the impact of scoring difference on the Full Scale and Verbal IQ, a brief review of some studies relating to this problem is warranted. In an early study, Miller, Chansky, and Gredler (1970) investigated the degree of agreement among 32 psychologists-in-training in the scoring of WISC protocols containing fabricated responses. Although the authors hypothesized that ratings would be highly comparable, they found a wide range of scoring. The full scale IQ ranged from 76 to 93. They also found that verbal subtests lend themselves to highly variable scoring. 
The Comprehension and Vocabulary subtests were found to be most vulnerable to scoring errors. A later study was conducted by the same authors (Miller & Chansky, 1972) in which they investigated the agreement among professionals in the scoring of WISC protocols. Surprisingly, professional psychologists seemed to fare no better. Sixty-four professional psychometricians scored identical WISC protocols. Again the greatest interscorer variability was produced by the Verbal subtests. This same protocol elicited an IQ range from 78 to 95 points, which indicates that psychologists typically vary in their scoring. The authors commented that psychologists seem to use additional criteria other than the manual; however, they did not expand on these criteria. Similarly, Kasper, Throne, and Schulman (1968) have suggested that individuals may rely more readily on memory and experience than on the manual as they gain experience. Again, in a study involving 94 psychologists, Oakland, Axelrad, and Lee (1975) found the Verbal Scale to have a lower interrater agreement than the Performance Scale on WISC protocols. Additionally, Babad, Mann, and Mar-Hayim (1974) investigated the effects of experimenter bias. Eighteen graduate students scored a prepared protocol. They were told that the responses were those either of an under-achieving disadvantaged child or a high-achieving upper middle class child. For both subtests, results indicated that means for the Comprehension subtest differed significantly, as well as the Verbal IQ score. As seen, variability in performance on the Verbal subtests is a serious and recurrent problem, such that significant IQ discrepancies have often been brought to light after corrections. 
This is extremely worrisome since intelligence tests are routinely administered to children who are functioning at a marginal level (Boem, Duker, Haesloop, & White, 1974; Warren & Brown, 1973). Nature of the WISC-R Verbal Scales The WISC-R Verbal Scale consists of six subtests: Information, Comprehension, Arithmetic, Similarities, Vocabulary, and Digit Span. However, only Information, Comprehension, Similarities, and Vocabulary will be described since these were the subtests used in the study. These subtests are untimed. Information measures memory of a wide range of general information and knowledge gained from experience and education (Sattler, 1988; Searles, 1975; Truch, 1989). Such information gives the psychologist an idea of the child's \"general range of information, alertness to the environment, social or cultural background, and attitudes towards school and school-like tasks\" (Sattler, 1988, p. 147). The questions asked are \"questions concerning names of objects, dates, historical and geographical facts, and other such information\" (Sattler, 1988, p.147). An example of an Information item is, \"What are the four seasons of the year?\" [item 11]. A more difficult question is, \"Who was Charles Darwin?\" [item 29]. The starting point of the test is determined by the age of the child. Each item is given either 0 or 1 point depending on the quality of the response. The psychologist is allowed to question the child by saying \"Explain what you mean\" or \"Tell me more\" if the response is not clear. This subtest consists of 30 items, and is discontinued after 5 consecutive failures. The Similarities subtest measures essential relationships between facts and ideas, namely associated relationships between word-pairs (Searles, 1975; Truch, 1989). The Similarities subtest consists of 17 items. For each item, the psychologist asks the child how two words are alike. 
For example, for the first item of the Similarities subtest, the psychologist asks, \"In what way are a wheel and a ball alike? How are they the same?\" \"They are both round\" would be a correct answer (Wechsler, 1974, p. 74). All children are administered the first item. The first four items are given either 0 or 1 point. Additional items are given a score of either 2, 1, or 0 depending on the sophistication or conceptual level of the child's response. Two points are given for a general classification which is primary to both words, 1 point for less pertinent but specific properties common to both words, and 0 points for clearly wrong responses. For specific items, additional questioning is permitted to clarify ambiguous responses. This subtest is discontinued after 3 consecutive failures.

Vocabulary consists of words that need to be defined. This subtest measures learning ability, word knowledge acquired from experience, education, richness of ideas, kind and quality of language, and level of abstract thinking. This subtest is considered to be the best single measure of intelligence of all the subtests (Searles, 1975; Truch, 1989). The Vocabulary subtest consists of 32 items arranged in increasing order of difficulty. The psychologist asks, \"What does ___ mean?\" or \"What is a ___?\" (Wechsler, 1974, p. 89). A score of 2, 1, or 0 is credited to each item depending on the level of sophistication of the response. Examples of a 2 point response are \"a good synonym\", \"a major use\", and \"definitive...or primary features of objects\" (Wechsler, 1974, p. 161). One point is given for partially correct responses, synonyms that are less pertinent, or a definition of a minor use of an object (Sattler, 1988, p. 151). Psychologists are allowed to question vague responses. Some responses must be queried if a (Q) appears in the scoring rules, and some responses indicated in the manual must be scored without further questioning or clarification. 
Similar to the Information subtest, the starting point of this subtest is also determined by the age of the child.

Comprehension questions reflect the child's level of moral development and understanding of surrounding societal conventions. Success depends on social judgment, practical information, as well as knowledge pertaining to past experiences in reaching solutions (Sattler, 1988, p. 153). Knowledge of one's body and interpersonal relations are also reflected in the questions. This subtest consists of 17 items. Examples of items are, \"What is the thing to do when you cut your finger?\" [item 1], or, \"What are you supposed to do if you find someone's wallet or pocket-book in a store?\" [item 2]. Responses to items are either scored 2, 1, or 0. The child must express at least two of the general ideas listed in the manual in order to be awarded 2 points. The child receives only 1 point for one idea, and 0 points for an incorrect response. The psychologist is permitted to question vague responses in accordance with the querying procedures in the manual. Additionally, if the child replies with only one idea, the psychologist may ask for a second response. This subtest is discontinued after 4 consecutive failures.

Task Summary

Each of the above subtests requires the examiner to present test items orally to the child. The test questions are presented as written in the manual so that the examiner does not depart from standardized procedures. The child is expected to answer and the examiner immediately records the child's response as accurately as possible in the test booklet. The examiner is not expected to indicate the appropriateness of the child's answer by providing feedback. However, if the examiner is uncertain as to what the child has said, the examiner may ask the child to repeat the response in order to clarify ambiguous responses. 
If the child's response is absolutely wrong as stated in the manual then the child's response should not be questioned further. The examiner is expected to ensure that the child is comfortable in the testing situation so that the child may do his/her best. The examiner should be aware of instances of distractibility that may affect performance. The examiner can increase his/her awareness of the overall testing situation by knowing the \"task well enough, so that the test flows almost automatically, leaving [the examiner] maximally free to observe all aspects of the child's behavior\" (Sattler, 1988, p. 103). Each subtest is discontinued when the child reaches the ceiling.

Although this study focuses on the scoring judgment aspect of the test administration process, the procedures leading up to and following this phase are outlined for their contextual value.

The Administration Process

In the administration process, the psychologist usually progresses through various decision points (see Figure 2). During an input phase, the psychologist must administer the test item as well as record the child's response as accurately as possible.

Figure 2. Model of Psychologists' Judgmental Process. [Flowchart; recoverable labels: Input phase (Administration: record response); Judgment phase (Interpretation: scorable? Yes / No, clarify); Behavioural phase (Score response; Administer next item; Ceiling: next subtest).]

The judgment phase of the process involves reading the child's response, and determining whether the response is scorable according to standardized procedures. If the response is clearly scorable, the psychologist may automatically proceed to the behavioural phase and score the response. As established in the literature, in many cases the verbal response may not be clearly scorable by the manual and hence be difficult to judge. Therefore, in the actual testing situation the psychologist must seek clarification with additional questioning and record the new response. 
The psychologist must then reinterpret the answer to the item in light of the new response given by the child. The psychologist may then score this new response. What the psychologist actually does at this point - the interpretive phase of the model - is the focus of this study.

Finally, the psychologist proceeds to the next item and the administration cycle begins again. If a ceiling is obtained, the next subtest is administered instead.

An underlying assumption is that a smooth administration requires a high level of competence. The psychologist must be able to access specific knowledge relevant to task performance. \"Intelligent performance of complex tasks means doing the tasks correctly, with little or no waste [sic] motion - with few or no mistakes or detours along the way\" (Simon, 1976, p. 65). Psychologists must then have a solid foundation of declarative knowledge [the background or factual knowledge of a particular domain (Gavelek & Raphael, 1985)] as well as procedural knowledge. Procedural or conditional knowledge refers to knowing the specific steps involved in carrying out the task (Lesgold, 1990). Expert performance requires an enormous amount of such knowledge (Simon, 1976).

The task domain related to the scoring of verbal responses is knowledge of the basic rules needed to perform the task properly. These basic rules may include knowledge of the scoring procedures, knowing the starting point of each subtest according to the age of the child, when a ceiling is obtained, and when probing for clarification is appropriate. Such knowledge would determine how well the model works for each psychologist. However, it is necessary to mention that the basic declarative knowledge necessary to do well on the task may be interactive with other sources of complex declarative knowledge that are related to the field, but not necessarily task-specific. 
An example of this type of knowledge would be what the psychologist already knows about the child from school reports, or teacher conferences. Such knowledge may influence psychologists' interpretations of responses. For example, Sattler (1988) states that \"the child's performance on the WISC-R should be interpreted in relation to all other sources of data\" (p. 182). However, when this information is known beforehand it may influence the scoring of responses, especially in instances where responses are marginal.

Cognitive Strategies and Heuristics

One of the ways in which cognitive psychologists study the manner in which individuals treat information is by investigating the strategies that individuals employ when heeding information during problem solving. However, as Webb (1975) suggests, \"The language used to describe problem-solving processes is cumbersome. The meaning of strategy or heuristic varies from study to study\" (pp. 103-104). Tversky and Kahneman (1983) define a judgmental heuristic as \"a strategy - whether deliberate or not - that relies on a natural assessment to produce an estimation or prediction\" (p. 294). Additionally, Burns (1990) defines heuristics as \"cognitive shortcuts\" (p. 343) and Fischoff (cited in Kahneman, Slovic, & Tversky, 1982) extends this definition to include individual strategies or non-optimal rules of thumb which are effective in some cases in guiding judgments. For the purpose of this study Fischoff's definition provides the definitional framework. This is because one may think of a heuristic as a cognitive strategy that sometimes leads to systematic bias in making judgments. In other words, not all cognitive strategies are effective in bringing about appropriate judgments, because some rest on an incorrect problem-solving procedure - these types of strategies are called heuristics.

An example of a biased cognitive strategy, or heuristic, is illustrated in the errors that children sometimes make in mathematical problem-solving. 
Buggy algorithms are apparent where children fail to borrow in subtraction problems (Van Haneghan & Baker, 1989, cited in McCormick, Miller, & Pressley). For example, Brown and Burton (1978, cited in Gagne, 1985) found that some children consistently used the incorrect procedure of subtracting the two numbers in each column. These children ignored the position of the smaller number. For example, if the smaller number was in the top position of the column the children still proceeded to subtract the smaller number from the larger in order to find the difference without first borrowing. It may be suggested, then, that psychologists' judgments may involve the application of similar types of biases in the difficult task of scoring verbal responses.

The next section discusses the limitations and advantages of the cognitive psychology laboratory approach to the study of heuristics.

Cognitive Psychology and the Laboratory Method

One criticism of cognitive psychology is that the study of strategies that individuals employ in the laboratory lacks external validity; that is, these strategies may not necessarily generalize to everyday problem solving tasks. Laboratory experiments give rise only to \"anecdotal evidence\" (Galotti, 1989) in relation to people's actual heuristics away from the laboratory (Burns, 1990). For instance, Galotti (1989) describes the limitations of the laboratory approach to problem solving in the following way:

The premises are usually already identified, the amount of irrelevant information...[is] restricted, the number of inferences to be performed...[are limited] to one or a few, there often exist normatively correct answers. (p. 343)

In making decisions under uncertain conditions, trained professionals performing tasks in their fields do rely on judgmental heuristics to make their exercises easier (Fagley, 1988). 
However, since all aspects of a problem situation cannot be studied at once, an advantage of the laboratory approach is that external variables can be controlled so that an identifiable aspect of a problem can be more ably studied. This is especially useful in an exploratory study. For example, in this study all subjects were given the same protocols to score under similar conditions. This methodology allowed for the comparative study of the cognitive strategies and heuristic processes among psychologists engaged in the same task.

Therefore, one may speculate that, in scoring the verbal responses, psychologists do have systematic ways of indexing information. Where little ambiguity exists, assignment of a stimulus to a category should be an automatic process for most psychologists (Feldman, 1981). Judgmental strategies in these instances are instantaneous and perhaps similar across psychologists. A response that is easily decipherable and not cognitively demanding will automatically be awarded a consensual point value. On the other hand, heuristic processing - a more rudimentary application of judgment - is more prevalent when a judgmental situation is cognitively demanding. Strategies in these instances are more deliberate (Pitz & Sachs, 1984) such that, in the face of difficult-to-score responses, one may speculate that psychologists do sometimes find it necessary to use certain \"rules of thumb\" or heuristics, for example, referring to the test manual to match with similar examples. 
Because heuristics often result in systematic errors or biases (Svenson, 1985), individual psychologists may become insensitive to variations in data, which may account for scoring errors. Interestingly, Kasper, Throne, and Schulman (1968, cited in Conner & Woodall, 1983, p. 378) have suggested that, as a psychologist generally becomes more experienced in the scoring of WISC-R protocols, \"s/he may rely more heavily on his/her memory than on the manual...resulting in individual scoring patterns\".

Cognitive psychologists generally refer to tasks as problem situations to which a solution is sought. In the context of this study, such a solution is a judgment choice regarding a specific point value to award a response. Pitz and Sachs (1984) reiterate that \"[whenever] information processing occurs as part of the [judgment and decision-making process], the only observable behavior is a response - usually a... choice\" (p. 152). In order to better understand why individuals make certain choices, psychologists try to trace the solution paths by analyzing the underlying thinking and mental strategies that are associated with solving a particular task. One way of studying the thinking processes involved in the judgmental process has been by means of verbal protocols (Ericsson & Simon, 1984; Klein, 1983; Pitz & Sachs, 1984).

Verbal Protocol Analysis

The seminal work of Ericsson and Simon (1984) has drawn attention to the benefits and the applicability of verbal reports to investigate the underlying cognitive processes in decision-making tasks. Concurrent self-report techniques such as thinking aloud and talking aloud have traditionally been used to provide this verbal data. The self-report technique requires that subjects verbally express all thoughts which come into their minds as they perform a task (Ericsson & Simon, 1984). 
In thinking-aloud, the more complex of the two self-report techniques, subjects are asked to verbalize both simple and complex thoughts while engaged in the particular task. Complex thoughts may include detailed information pertaining to sub-goals, goals, motives, reasons, and comments on the domain-specific knowledge necessary to complete the task. Additionally, think-aloud reports are detailed enough that decision rules to the solution process can be inferred. Alternatively, subjects may also be asked to report these decision rules (Crow, Olshavsky, & Summers, 1980; Klein, 1983) or to report any hypotheses they used in problem-solving (Ericsson & Simon, 1980). As opposed to think-aloud techniques, talk-aloud techniques are most useful when the experimenter is interested in general types of information related to cognitive processes. This technique requires subjects simply to say out loud whatever they are saying silently to themselves in a problem-solving episode. Although there appears to be an overlap between think-aloud and talk-aloud techniques, they do seem to differ in the conceptual level of information they generate. This is because the type of instruction and probing questions asked by the experimenter guides the subject as to whether events should be reported in general or more particular terms. This affects the depth of information present in the verbal protocol. However, the usefulness of both techniques is such that they \"can reveal in remarkable detail what information [subjects] are attending to while performing the tasks...[they] can provide an orderly picture of the exact way in which the tasks are being performed: the strategies employed, the inferences drawn from information, the accessing of memory by recognition\" (Ericsson & Simon, 1984, p. 220).

On the other hand, the underlying assumption that verbal reports are a reflection of what is in awareness or working memory has been the subject of criticism by some cognitive psychologists. 
Nisbett and Wilson (1977) hypothesized that the very act of verbalizing while engaged in the task changes the task environment and, therefore, the nature of the data. However, Ericsson and Simon (1984) have argued that thinking aloud does not alter, and therefore does not interfere with, ongoing thinking processes. For example, Newell and Simon (1972) compared the think-aloud protocols of 7 subjects discovering proofs in propositional logic exercises to 64 subjects under the same conditions. When the structure of search trees and solution paths were compared, no differences were found between the two groups. Other work is in agreement that verbal protocols do not change judgmental behaviour (Payne, 1980; Karph, 1973). Earlier research found that verbalizing actually aids in improving performance in terms of uncovering general problem-solving principles and new reasons for specific choices (Benjafield, 1969; Dansereau & Gregg, 1966; Davis, 1968; Gagne & Smith, 1962).

Methodology of Verbal Protocol Analysis

Ericsson and Simon (1984) make the important point that \"thinking aloud does not by itself enforce an analytical approach\" (p. 88) to understanding cognitive processes. To bring some level of conceptual understanding to verbalized thoughts, this raw data must be treated in some manner so that conceptual interpretations can be made. The question, then, is, \"How do we characterize cognitive structures, or thoughts?\" In verbal protocol analysis, thoughts are usually characterized by separating the protocol into smaller units, or segments.

Segmentation

Segmentation refers to the breaking apart of an entire protocol into smaller units. Protocols may be segmented in different ways depending on the nature of the study and the research question. However, when a protocol is segmented, each segment usually represents one instance of a general cognitive process (Ericsson & Simon, 1984). 
According to Ericsson and Simon (1984), the appropriate and most frequently used cues in segmentation are: \"pauses, intonation, contours...as well as syntactical markers for complete phrases and sentences - the cues for segmentation in ordinary discourse\" (p. 205). In some cases, relying solely on this type of segmentation may be inappropriate. Where the actual content of the protocol is the important factor, it may be more appropriate to use idea or semantic units as the criteria underlying segmentation (Smith, 1971). Therefore, a segmented element in a protocol may be defined as a semantic unit judged to represent a complete thought. After each protocol is segmented, the next step is to encode these segments.

Encoding

The process of encoding may be described as the actual matching of a segment to a category. In model-based encoding, categories are usually already defined. The choice of categories may be based on existing theory, or categories already existing in the literature (Glaser, 1978). Alternatively, categories may be constructed through knowledge and procedures of the experimenter, such as pilot studies (Ericsson & Simon, 1984; Kilpatrick, 1968). For example, in a study investigating the relationship between the thinking aloud technique and problem-solving ability on mathematics problems, Flaherty (1974) was able to devise and revise categories that were appropriate to the task through a pilot study. A schematic representation of this process is seen in Figure 3.

Input (Segment) -> Encoding -> Output (Category)

Figure 3. 
Encoding Process. Adapted from Ericsson & Simon, 1984, p. 276.

In contrast, where a study is exploratory and appropriate categories do not exist or existing categories are not suitable, it is not uncommon to segment and code simultaneously (Kilpatrick, 1968; Glaser, 1978; Glaser & Strauss, 1967).

The next section describes methods for deriving the validity of verbal data.

Deriving Validity from Think-Aloud Data

The type of validation procedure used in verbal report methodologies generally depends on the nature of the study, and the question asked. However, validation of data obtained from verbal reports generally begins with the credibility of constructing conceptual codes from transcribed data (Glaser & Strauss, 1967). Thus, validation usually involves the comparison between the coded verbal report and some other type of measure that refers or is related to the same events in the same fashion (White, 1980). This procedure allows for a valid comparison between two related events, each concerned with the same question.

In studies involving numerous subjects and where quantitative data is available, external validity may be measured by predicting success on a criterion variable. For example, in a study involving problem-solving patterns of 8th grade students, Kilpatrick (1968) used verbal protocol data coded according to two schemas, heuristic strategies and processes. He used the methodology of correlating the verbal data with performance scores, such as solution times and solution scores. Through this method he was also able to refine his data by eliminating artifactual correlations resulting from properties inherent in the coding system.

Validation may also be a measure of a subject's self-awareness of the particular activity studied (Smith & Miller, 1978). 
In a study involving students' ability to report their decision criteria surrounding the choice of what college to attend, Berl, Lewis, and Morrison (1976; cited in Smith & Miller, 1978, p. 360) found that students were able to report on the cognitive processes that underpinned their decision concerning what school to attend. They found that the students' reports correlated well with reports relating to the actual criteria that governed their decisions.

Another way to validate verbal protocol data is to construct a subject's problem space from the initial step to the solution in such detail that each minute step in problem solving can be evaluated. Newell and Simon (1972) used this methodology when they investigated subjects' ability to solve crypto-arithmetic problems. Subjects were required to decode arithmetic problems where digits were replaced by letters. In this way the extent of the subject's awareness of problem-solving could be determined. It was found that subjects were able to clearly report on their steps and also give reasons as to why they were taking certain steps. In exercises as detailed as these, Smith and Miller (1978) report that \"there is no reason to believe that anything is going on besides what subjects can report\" (p. 360). However, White (1980) warns that since an individual's cognitive activity and strategies are the variables of interest, comparison between group means in control and experimental situations does not best represent the nature of the question.

Thinking Aloud Methodology

The motivation for employing a thinking aloud methodology in this study was to acquire a veridical description of the thought processes involved in making scoring judgments and the explanations underlying these judgments. 
In a study involving intransitive preferences, Montgomery (1977) found \"that think-aloud protocols from single individuals can give valuable information about decision processes inasmuch as it is possible to describe the [subjects'] choices by means of choice rules that were derived from the think aloud reports\" (p. 360). Additionally, the thinking-aloud technique has been used successfully to study cognitive processes reflecting financial decisions of bank trust officers, chess moves of chess players, and diagnostic reasoning of clinical psychologists and physicians (Peterson, Marx, & Clark, 1978).

Ericsson and Simon (1984) have found that verbal reports from even a small number of subjects are useful in terms of generality of cognitive processes and strategies within-subjects and generalizability between-subjects over tasks.

Lastly, since the scoring of the WISC-R verbal subtests is based on psychologists' professional experiences and activities, familiarity of the task environment is a methodological advantage with regard to the facility and the accuracy with which subjects report their thoughts. White (1980) suggests that:

The making of the judgment and of the report should be a matter of some ease for the subject, not a task in itself requiring concentration and effort. The subject should not have to be preoccupied with the mechanics of an unfamiliar task or the problems of comprehending difficult instructions. (p. 107)

Summary

This chapter has focused on the importance of WISC-R scores in psychological testing. The literature reviewed established that psychologists make errors in scoring, particularly on the Verbal subtests. The types and frequencies of errors have been identified in previous research, but the underlying reasons have not been analyzed. In order to help explain these differences in scoring, a cognitive psychology framework was adopted, and the nature of research in cognitive psychology on other tasks was represented. 
One assumption is that the task of scoring presents cognitive demands which might account for differences in scoring.

Finally, the methodology of verbal protocol analysis was reviewed.

The research questions are presented below:

1) What are the common mental strategies that underlie psychologists' judgments in their approach to the scoring of the same task?
2) To what degree are their strategies similar?
3) To what degree are psychologists aware of their own cognitive strategies?

In the following chapter the methodology for the investigation is described.

CHAPTER 3

METHODOLOGY

This chapter presents a description of the sample of psychologists who volunteered for this study. Also presented is a description of the procedures underlying the development of the categories for the verbal data, the analysis of this data, and the training of the independent coder.

Sample

A sample of 23 psychologists was solicited from 4 school districts in the Greater Vancouver area. The majority of the psychologists worked primarily within the school system. Two also held positions in hospitals. Subjects were solicited through representatives in their respective school districts. The names of potential volunteers were then given to the investigator. The investigator contacted each person by phone in order to gain their participation in the study.

The median number of years that subjects worked as psychologists was 5.00 years. The psychologists' levels of education varied. One psychologist had a bachelor's degree; fifteen were master's level psychologists; and the remainder were doctoral level psychologists. 
Seventy-eight percent of the psychologists were trained on the WISC-R as well as the Wechsler Preschool and Primary Scale of Intelligence (WPPSI); 70% also had training on the Stanford-Binet IV. Seven psychologists described themselves primarily as school psychologists, two as educational psychologists, two as developmental psychologists, and one as a special educator. The remaining 11 in the sample described their jobs as eclectic in nature. That is, they worked in the capacity of at least two of the following categories: school psychologist, educational psychologist, counselling psychologist, or special educator.

Procedures

Session 1: The 23 psychologists were mailed a completed WISC-R protocol that contained fabricated responses from four verbal subtests (Information, Similarities, Vocabulary, and Comprehension). Also included in the package were two consent forms (one to be retained by the participant) which described the nature of the study and a background information form. The subjects scored the subtests at their own convenience and mailed them back to the investigator with the background form and a copy of the consent form. The psychologists were asked also to indicate on the consent form whether they were willing to participate in the think-aloud session. Each psychologist was assigned an identification number and all data was coded with this number in order to preserve subject confidentiality.

The Stimulus Protocol

The stimulus protocol is a fairly new instrument. It is one of a series of protocols developed by Dr. J. Slate of Arkansas State University. The protocols are part of an unpublished text, Guide to administering and scoring the WISC-R (Slate, 1991). The protocols are currently being used for research purposes at other universities and for training in Dr. Slate's Intelligence Testing course.

The protocols were employed in a study by Dr. Slate in the summer of 1991. 
Although the results have not been written up as yet, the mean error rate was found to be about 3 errors per protocol. The protocols were constructed to be as difficult as possible to score correctly (J. Slate, personal communication, Nov. 22, 1991).

Session 2: A subsample of 9 psychologists participated in session two. These persons were contacted to set up a time and a place of convenience to participate in the think-aloud exercise. Usually, the investigator met with the psychologists in the district for the exercise.

One protocol was spoiled, yielding a total of 8 protocols for session two. The median number of years that subjects worked as psychologists was 5.00 years. Five were master's level psychologists; the other three were doctoral level psychologists. The sample described themselves mainly as district school psychologists. They all had training on the WISC-R as well as on the Stanford-Binet IV; all except one psychologist had undergone training on the WPPSI.

A Comprehension subtest, the WISC-R II measure (see Appendix B), was selected for this think-aloud session since it is one of the verbal subtests that have been shown to produce a large amount of scoring variability (Slate, 1988). Although the Vocabulary subtest has been shown to produce the most variability in scoring, this test was not appropriate due to its length. Time constraints were a concern of the subjects.

For this exercise, subjects were first given a brief warm-up think-aloud task. This task involved analogy-type questions that required the psychologist to reason out loud. In order to acquaint them with the think-aloud method, the experimenter first demonstrated this exercise for the subject. When subjects were comfortable with the think-aloud method they then proceeded to the actual think-aloud task on the Comprehension WISC-R II measure, that is, to verbalize what they would normally be thinking as they were judging a response. 
At the end of the session, the subjects were asked this final probing question, \"Were there any strategies that you were conscious of that aided you in deciding what point value to award a response?\" Such additional information acted to validate the think-aloud data through comparison with actual performance on the task, in response to the third research question. The duration of session two was, on average, 25 minutes (see Appendix B for complete script for warm-up exercise).

The next section describes the treatment of the verbal data derived from this sample.

Development of Verbal Categories

The methodology behind category development first involved transcribing all tape-recorded responses of each protocol into written text. The written text was then segmented into semantic units for analysis. A semantic unit was previously defined as a phrase or sentence representing a complete thought (Smith, 1971). All units were coded from categories that were derived from the data itself. It is customary to derive coding categories from the data present in the protocols themselves (Ericsson & Simon, 1980). According to Glaser (1978) it is desirable to enter research without predetermined ideas; this methodology allows the investigator to remain open to the data generation process.

Merely selecting data for a category that has been established by another theory tends to hinder the generation of new categories, because the major effect is not generation but data selection. (Glaser & Strauss, 1967, p. 37)

The analysis of verbal protocols proceeded in three stages. In the initial stage, before segmentation, a pilot study was conducted involving a logical task-analysis of think-aloud protocols of a student-psychologist and a practising psychologist. The basic coding scheme was derived from this method. 
Seven basic coding categories that reflected trends across both protocols were derived: self-questioning behaviour, self-regulatory behaviour, general metastrategic statements, memory, manual, recommendations, and evaluations.

The second stage of category development involved category fine-tuning. This stage involved the analysis of protocols that were subsequently collected and then segmented into semantic units. The basic method of fine-tuning involved matching examples (segmented units) from these protocols with an appropriate category derived from the pilot study. The investigator noted instances where an example could not be matched to a category, or seemed to fall within two categories. Consequently, two categories were altered, and two additional categories were added to the set. Self-questioning behaviour and self-regulatory behaviour were collapsed into monitoring statements, recommendations and evaluations were also collapsed into one category, and planning behaviour and self-explanations categories were added. Again, a total of seven categories was obtained. Such revision is not uncommon in the treatment of verbal data. According to Glaser and Strauss (1967), jointly collecting, coding, and analyzing data should be an interactive process.

Analysis of Semantic Units

Each protocol was first segmented according to idea or semantic units. For illustrative purposes, an example of the segmented units from two protocols, as well as the complete protocols, is presented in Appendix C. Each segment was then assigned to a category by two coders. In order to prevent coders from using contextual information and to preserve objectivity in the coding scheme, all segments were printed on separate cards and then coded in random order.

Training of the Coder

Second party verification was necessary to obtain a reliability index regarding coding. Therefore, a description of coder training is given.

The investigator first defined the categories for the coder.
The coder was able to ask questions at this point so that nuances in category definitions could be clarified. So that further misconceptions could be cleared up, the coder was first trained on practice units developed by the investigator for this purpose. The coder first sorted about 15 cards out loud into categories and gave reasons for specific category choices. If the coder made an error during this process, the investigator stopped the exercise and clarified the categories. The coder sorted 10 more cards without interruptions. The investigator went through any corrections with the coder. Next the coder sorted 10 more cards, and at this point the coder was ready for the actual coding exercise.

The coder then proceeded to sort the 281 segments derived from all subjects into the seven categories. A coding reliability index was obtained by computing the percentage agreement between the two coders. The percentage agreement was found to be 93% (261 of the 281 units). The 20 units on which no agreement was found were dropped from further analysis. This did not materially affect any subsequent analysis, since the units were few in number and the reliability was already quite high.

Summary of Instrumentation

1. DEMOGRAPHIC (BACKGROUND) QUESTIONNAIRE - Psychologists were asked to provide information relating to professional experience, level of education, and formal training in testing. This information was required in order to have descriptive data for the sample (see Appendix A for a copy of the questionnaire).

2. WISC-R MEASURE I consisted of four verbal subtests: Vocabulary, Comprehension, Similarities, and Information. An age was attached to the protocol. Each subtest contained some administrative errors, which consisted of responses that were inappropriately cued and items administered beyond a ceiling level. The scoring criteria then required the psychologist to assign the correct point value to each response item.
If an item was inappropriately queried (Q), the psychologist was expected to assign a point value according to the procedure outlined in the manual. For example, if a one point response was given initially, and it was incorrectly probed on the protocol and a two point answer was subsequently recorded, it should still be assigned one point. On the other hand, if a zero point answer was given initially, and an incorrectly queried response elevated the point value, the point value should still be recorded as zero. Other errors on the subtest included inaccurate starting points (see Appendix B for a copy of the WISC-R I measure). Each subject's protocol was checked against Slate's scoring key for point assignment differences per item. Each subject provided a total raw score, which was checked for addition errors. There were no errors in addition. For each subtest the total raw score was converted into a scaled score. A prorated Verbal IQ was also calculated for each subtest.

3. WISC-R MEASURE II consisted of a single Comprehension subtest from a different fabricated protocol. No age was attached to this subtest. As with the WISC-R I measure, the complexity of the task involved the simultaneous consideration of whether an item was correctly administered, that is, a decision as to whether a response was correctly or incorrectly queried, and response scoring judgment. The task also involved the judgment of ambiguous responses, for example, \"treat it (Q) treat it with things at home\", and a multiple response, \"catch bad people, arrest crooks, enforce laws\" [item 4]. Lastly, the measure also included items administered beyond the ceiling (see Appendix A for copies of instrumentation).

As with the WISC-R I measure, the same scoring verification procedure was followed here, except that a Verbal IQ was not computed since this measure consisted of only one subtest.
Additional data obtained from this measure involved the development of verbal categories from audiotape transcriptions.

The following chapter summarizes the demographic data for the sample and presents the results of the analysis of the WISC-R I and WISC-R II data.

CHAPTER 4

RESULTS

This chapter presents the research findings of the two sessions. There were two sources of data for this study. Twenty-two WISC-R verbal protocols were obtained from session one. One subject was not able to participate in the first session but was able to do so in the second session. These protocols comprised the Information, Similarities, Vocabulary, and Comprehension subtests, which provided scoring information descriptive of the sample. Secondly, eight verbal protocols were analyzed from a think-aloud exercise using a Comprehension subtest from another measure. This exercise provided general descriptive information about psychologists' cognitive strategies. The complete results of the demographic questionnaire are also presented in this chapter.

Demographic Characteristics of Sample

The demographic data pertaining to the 23 subjects are presented in Table 1. Separate demographic data are also presented for the 8 psychologists whose data were analyzed from the think-aloud session. The primary variables of interest included experience, educational level, and professional training.

The median number of years of experience indicated that the subsample had a slightly higher degree of experience than the total sample; however, all members of the second group had formal training on the WISC-R, as opposed to 78% of the total sample. In addition, about half (52%) of the total sample described their profession as varied, whereas 62% of the second sample preferred the label of school psychologist.

Table 1
Demographic Characteristics of Sample

Characteristic                  Total Sample (23)   Subsample (8)
Experience (years)
  Median                        5.00                6.00
  Range                         1-27                1-12
Educational Level
  B.A.                          1 (4%)              0
  M.A.                          16 (70%)            5 (62%)
  PhD/EdD                       6 (26%)             3 (38%)
Formal Training
  WISC-R                        18 (78%)            8 (100%)
  WPPSI                         18 (78%)            7 (88%)
  Stanford-Binet IV             16 (70%)            8 (100%)
Profession
  School Psychologist           7 (30%)             5 (62%)
  Educational Psychologist      2 (13%)             2 (25%)
  Counselling Psychologist      0                   0
  Psychometrician               0                   0
  Special Educator              1 (4%)              0
  Eclectic                      11 (52%)            1 (13%)
  Developmental Psychologist    2 (9%)              0

As some members of the total sample did not have formal classroom training on the WISC-R, a two-tailed t-test was conducted to compare the total number of errors made by those subjects who had formal training and those who did not. The t-test revealed no significant difference between the two groups, t(20)=.17, p>.1. The nature of these errors is presented in the next section.

Session One WISC-R I Results

The number and types of errors were computed across subtests for the overall sample. The analyses revealed that the Vocabulary subtest was most prone to scoring errors. The Comprehension subtest was found to be the next highest in errors, followed by Similarities and Information. Additionally, when psychologists erred, they were more likely to give too much credit than too little. This result was obtained by comparing instances where psychologists awarded more credit when they should have given less (64 instances) with instances of giving less credit when more credit was necessary (50 instances). Frequencies of types of errors are summarized in Table 2.

Next, in order to determine how great inter-psychologist differences were, scaled scores as well as a prorated Verbal IQ for each protocol were calculated. The Verbal IQ (PRO) was found to vary by as much as 11 points. The average Verbal IQ for the sample as well as the average scaled scores for each subtest are presented in Table 3.
Slate's scoring key for the WISC-R I measure is also presented in Appendix B for comparative purposes.

Table 2
Types of Errors Across Subtests

Error type                                Info   Sim   Voc   Comp*
0 point for a 2 point answer              0      1     6     4
0 point for a 1 point answer              2      0     7     7
1 point for a 2 point answer              0      0     15    7
1 point for a 0 point answer              10     6     2     3
2 points for a 1 point answer             0      9     12    4
2 points for a 0 point answer             0      8     2     1
Inappropriate questioning**               6      24    36    44
Failure to question***                    10     6     15    14
Failure to obtain a correct ceiling       3      0     0     11
Failure to credit items below basal       3      NA    0     NA
Total (n=22)                              44     54    95    85

*   Comprehension scores based on 21 subjects
**  Instances where the subject agreed with inappropriate questioning on the protocol, as well as introduced inappropriate additional questioning
*** Failure to indicate on protocol additional (correct) questioning

Table 3
Comparison of Scaled Scores to Slate's Key

Subtest          Mean     SD     Range   Scoring key
Information      7.23     .92    6-9     7
Similarities     9.00     1.54   7-13    8
Vocabulary       7.05     .58    6-8     7
Comprehension    5.43     .87    4-7     6
Verbal IQ        82.52    3.88   80-91   81

As a second check of scoring variability, standard deviations [SD] were computed for each subtest and were compared with the Wechsler norms, that is, the standard error of measurement for each of the scaled scores. This comparison was also done for the Verbal IQ (PRO). A similar data analysis was previously used by Oakland, Lee, and Axelrad (1975). They suggested that the \"SDs represent a range of scores reflecting the degree of interrater variability. The higher the SD, the greater the variability; thus higher SDs reflect lower interrater consistency or reliability\" (p.229). Therefore, if the standard deviation of each subtest is less than the standard error of measurement as reported by the manual, then the scoring may be said to be relatively homogeneous and within the bounds of measurement error.
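The comparison rule described above can be sketched in a few lines of Python. This is only an illustrative check, not the procedure the study used: the SD values are those reported in Table 3, the SEM values are those this chapter reports from the manual in Table 4, and the names OBSERVED_SD, MANUAL_SEM, and scales_exceeding_sem are hypothetical.

```python
# Inter-scorer SDs (Table 3) and manual SEMs (Table 4) as reported in this chapter.
OBSERVED_SD = {
    'Information': 0.92,
    'Similarities': 1.54,
    'Vocabulary': 0.58,
    'Comprehension': 0.87,
    'Verbal IQ': 3.88,
}
MANUAL_SEM = {
    'Information': 1.12,
    'Similarities': 1.28,
    'Vocabulary': 0.87,
    'Comprehension': 1.51,
    'Verbal IQ': 3.57,
}


def scales_exceeding_sem(sd_by_scale, sem_by_scale):
    '''Return the scales whose inter-scorer SD is larger than the SEM.'''
    return [name for name, sd in sd_by_scale.items() if sd > sem_by_scale[name]]


if __name__ == '__main__':
    flagged = scales_exceeding_sem(OBSERVED_SD, MANUAL_SEM)
    for name, sd in OBSERVED_SD.items():
        verdict = 'exceeds SEM' if name in flagged else 'within SEM'
        print(f'{name}: SD={sd:.2f}, SEM={MANUAL_SEM[name]:.2f} ({verdict})')
```

Under this rule, any scale the function returns is one where scoring spread among psychologists is larger than the measurement error the manual already allows for.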
Results indicated that, except for the Similarities subtest and the Verbal IQ, all standard deviations were less than the SEMs reported in the manual, although they were still substantial. The results of these analyses are summarized in Table 4.

Table 4
Means, Standard Deviations, and Standard Errors of Measurement for Scaled Scores and Verbal IQ

Subtest          Mean     SD     SEM
Information      7.23     .92    1.12
Similarities     9.00     1.54   1.28
Vocabulary       7.05     .58    .87
Comprehension    5.43     .87    1.51
Verbal IQ        82.52    3.88   3.57

Ancillary Analyses

When only differences from Slate's key in the credit assigned to specific items were considered, differences averaged 6.45 items per protocol for the total sample. A breakdown of the sample revealed that those who participated in session one alone (n=15) had a mean of 6.27 point differences, while those who also participated in the second session (n=7) had a slightly higher mean of 6.85 point differences (Table 5). A two-tailed t-test between these two means revealed no significant difference between the two samples, t(20)=.35, p>.1.

Table 5
Comparison of Point Differences Between Groups

Error of Item Credit     Mean Errors          Mean Errors
Differences              Session One Alone    Both Sessions
6.45                     6.27                 6.85
(SD=3.65)                (SD=3.92)            (SD=3.24)
(n=22)                   (n=15)               (n=7)

Secondly, a Bartlett-Box F test was conducted to test for homogeneity of variance between point differences for all groups. The test revealed no significant difference between the group variances, F(2)=.14, p>.1.

A second analysis compared the mean number of errors of subjects who participated in session one alone with that of subjects who participated in both sessions. Those who also participated in the second session did not differ considerably from those who did not. Both groups made a similar number of errors when results were averaged across protocols. Table 6 summarizes these results.
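The chapter does not say which form of the t-test was computed. As a minimal sketch, assuming an equal-variance (pooled) two-sample t-test, the statistic for the Table 5 comparison can be recovered from the reported means, SDs, and group sizes alone. The function name pooled_t is hypothetical.

```python
import math


def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    '''Two-sample t statistic and degrees of freedom from summary statistics.

    Assumes equal population variances (pooled-variance Student t-test).
    '''
    df = n_a + n_b - 2
    # Pooled variance: each group's variance weighted by its degrees of freedom.
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    # Standard error of the difference between the two group means.
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_b - mean_a) / std_err, df


if __name__ == '__main__':
    # Table 5: session one alone (n=15) versus both sessions (n=7).
    t_value, df = pooled_t(6.27, 3.92, 15, 6.85, 3.24, 7)
    print(f't({df}) = {t_value:.2f}')
```

With the Table 5 values this gives a t of about .34 on 20 degrees of freedom, close to the reported t(20)=.35; a gap of that size is what rounding in the published means and SDs would be expected to produce.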
Furthermore, a two-tailed t-test found no significant difference between the two group means, t(20)=.08, p>.1.

Table 6
Comparison of Total Errors Between Groups

Mean Errors          Mean Errors          Mean Errors
of Total Group       Session 1 Alone      Both Sessions
12.64                12.73                12.43
(SD=7.74)            (SD=8.76)            (SD=5.53)
(n=22)               (n=15)               (n=7)

A Bartlett-Box F test likewise found no significant difference between the three group variances for total errors, F(2)=.74, p>.1. The results of the variance tests indicated that the groups were relatively homogeneous with respect to the overall errors made as well as differences in assigning credit to items. However, the overall results also indicated that psychologists do exhibit differences in scoring, as illustrated by the Verbal IQ range (Table 3). In order to explain this variability within the sample, it was therefore important to investigate psychologists' descriptions of their thought processes through verbal protocols.

Results of Session Two

This section presents the types of categories derived from the verbal data. These categories represent the common mental processes and strategies that underlie psychologists' scoring judgments. Following the presentation of the development of category types, results of the analyses of category use within and across subjects are presented.

Types of Verbal Categories

1) Monitoring Statements (MS): This category reflects either differing states of comprehension or assessments of one's progress, that is, reflecting upon what one knows or does not know.
The subject may verbalize statements referring to a lack of information in the response, an unclear response, or a response needing clarification that hinders this progress. These statements may refer to instances of:

a) \"self-questioning\" behaviour.
Examples:
- I wonder whether that should have been queried.
- I'm just not too sure what to do.
- I'm not sure.

b) \"self-regulatory\" behaviour: includes checking, revision, monitoring, or regulating one's progress.
Examples:
- I'll just do a quick run through.
- I'm going back here.
- I'll just confirm that.
- I don't need that.

2) Planning (PL): This category refers to the subject's awareness of the task demands and goals that help make the problem easier to solve. Mechanisms to resolve the problem include organizing ideas into goals for attacking the problem. That is, subjects may break up the problem into constituent parts. These are usually stated as intentions.
Examples:
- Okay, now I have to do two things.
- I remind myself that I'm looking for a couple of things here.
- Okay for two points you need both of those general ideas.

3) Self-explanation (SE): In this category the subject must go beyond the information given. This is because \"the quantity and quality of a response is not sufficient to warrant a confident judgment\" (Flavell, 1979). The subject overcomes the incompleteness of an example by deriving implications and/or making inferences by expanding on the information present. The subject may also explain the inadequacy of an answer by giving reasons. This means that more conceptual information is necessary if the subject is to properly evaluate a response.
Examples:
- We can imply that \"to go over myself\" would be to get help.
- That really doesn't expand on the information.

4) General Metastrategic Statements (GMS): This category refers to any individualized or personal compensatory strategies or personal feelings that subjects fall back on in the evaluation process. These types of strategies may be personalized strategies.
Such strategies may include additional probing and/or proceeding beyond the ceiling.
Examples:
- I always give more points.
- When in doubt I always give more.
- I never score as I administer the test, but even when I think I have a ceiling I still go on.
- I would like to know more about the kid.
- What a great answer!

5) Memory (MEM): The subject may first recall information from memory before checking the manual. That is, does the child's response match the subject's concept in memory? This is indicated by a fairly quick response indicating a concept already held in memory. For instance, the subject may automatically classify the response by giving it a point value before checking the manual.
Examples:
- It's a fairly clear response, it's somewhere in the manual.
- It's in the guidebook; I'll check it though.
- It sounds/looks/appears like a one point answer.
- My initial reaction is that...

6) Manual (MAN): In this category the subject actually reads from the manual. The subject tries to find a corresponding match between the child's response and a concept in the manual. This is done when the subject matches the response to the sample response in the manual or to the general criteria.
Examples:
- I always consult the manual.
- That falls under the \"Insulation\" category.
- \"Treat it\" is in the manual.

7) Recommendations/Evaluations (R/E): This category refers to the domain-specific information that reflects the subject's interpretations about rules, procedures, and scoring practices that bring about accurate scoring.
Examples:
- This kid should have been questioned at the protocol.
- That didn't need to be cued.
- It was administered correctly.

Category Frequency of Verbal Categories Across Subjects

An analysis of category frequencies indicated that psychologists depended quite often on the WISC-R manual, although not overwhelmingly. This was represented by 27.6% of the reported verbalizations.
In instances where psychologists actually consulted the manual but did not verbalize this behaviour, the investigator took note of this fact. So as not to misrepresent the verbal data, this was accommodated in the actual coding.

Next to the use of the manual, psychologists most often engaged in cognitive activity that made reference to evaluations and recommendations. Comments were usually made about whether items were appropriately queried. This category accounted for about 22.2% of the verbalizations.

Memory accounted for 14.2% of the verbalizations, and monitoring statements were reflected in 13.4% of verbalizations. As for self-explanations, psychologists generally did not make their own interpretations of seemingly problematic responses before checking the manual. This was done only 9.9% of the time. Planning statements were minimal; they accounted for only 6.5% of the reported verbalizations. Similarly, general metastrategic statements (personal statements) were also among the least frequent, at 6.5% of the verbalizations. A percentage summary of the cognitive strategies is presented in Table 7.

Strategy Use Across Subjects

An index of the degree of similarity of strategy use was reflected in category frequency data across subjects. An inter-subject comparison across categories indicated that psychologists varied widely in the extent to which they used particular strategies on the scoring task. For example, the highest percentage frequency of using the manual was 54% for one psychologist (Subject #8), as opposed to 18% for another psychologist (Subject #13).
One psychologist monitored his/her work 23% of the time (Subject #2), another only 4% of the time (Subject #8), and one psychologist made no references at all to general metastrategic statements (Subject #8) while all the others did make some metastrategic references. Percentage-wise, Subject #15 accounted for most of the metastrategic statements, such as \"But let's just try to find out more about the kid\" or \"Sometimes I'm able to put thing in context if I know how old he is\". Interestingly, one psychologist never verbalized planning statements (Subject #13) and another did so only 3% of the time (Subject #11). These results are summarized in Table 8.

Table 7
Frequency and Percentage Categories of Verbal Behaviours

Categories                          Frequency (Percentage)
Manual                              72 (27.9%)
Recommendations/Evaluations         58 (22.2%)
Memory                              37 (14.2%)
Monitoring Statements               35 (13.4%)
Self-explanations                   26 (9.9%)
Planning                            17 (6.5%)
General Metastrategic Statements    17 (6.5%)

Table 8
Frequencies and Percentages* of Cognitive Strategies Across Subjects in each Category

                                      Subjects
          2       8       10      11      12      13      15      18
MAN       5(23)   15(54)  11(31)  11(31)  6(21)   3(18)   12(19)  9(29)
R/E       5(23)   3(11)   5(14)   9(25)   8(29)   8(47)   14(22)  6(19)
MEM       2(9)    1(4)    6(17)   8(22)   4(14)   1(6)    10(16)  5(16)
MS        5(23)   1(4)    7(20)   3(8)    4(14)   1(6)    9(14)   5(16)
SE        1(4)    5(18)   2(6)    4(11)   3(11)   3(18)   5(8)    3(10)
PL        3(14)   3(11)   2(6)    1(3)    2(7)    0       4(6)    2(6)
GMS       1(4)    0       2(6)    1(3)    1(4)    1(6)    10(16)  1(3)
Total
Number    22      28      35      36      28      17      64      31

*percentages are presented within parentheses

Scoring by Item on the WISC-R II Measure

Psychologists' scoring on the Comprehension WISC-R II measure is presented in Table 9. The total raw score for each subject is presented, as well as the total scoring difference from Slate's key. This difference is also represented as a distribution reflecting the degree of lenience in scoring. For example, Subject #2's total raw score was 3 points greater than Slate's key (Total scoring diff).
The source of scoring differences is represented by a scoring distribution (Score dist): this subject gave more points on 5 items and fewer points on 2 items, for a total of 7 items scored differently from the key.

Table 9
Patterns of Scoring on WISC-R II Measure

                                  Subjects
Items          Key   2     8     10    11    12    13    15    18
1              1     1     1     1     1     1     1     1     1
2              2     2     2     2     2     2     2     2     2
3              1     2     1     1     1     1     1     1     1
4              1     1     2     2     1     1     1     1     2
5              2     2     2     2     2     2     2     2     2
6              2     0     2     0     0     0     2     2     2
7              2     2     2     0     2     2     2     2     2
8              1     1     1     1     1     0     1     1     1
9              0     0     0     0     0     0     0     0     0
10             0     0     0     0     0     0     0     0     0
11             0     1     1     0     0     0     0     1     1
12             0     0     0     0     0     0     0     0     0
13                   1     0                             0     0
14                   1     0                             1     0
15                   1     1                             1     1
Raw Score
Total          12    15    14    9     10    9     12    15    15
Total
Scoring Diff         +3    +1    -3    -2    -3    0     +3    +3
Score Dist
  more               +5    +3    +1                      +3    +3
  fewer              -2          -4    -2    -3    0
  (total)            (7)   (3)   (5)   (2)   (3)         (3)   (3)

Overall, only one subject scored in agreement with the key. Four of the others awarded more points than the key; the other three psychologists awarded fewer points. Additionally, four of the psychologists awarded points past the ceiling (Subjects 2, 8, 15, and 18); the four others obtained the correct ceiling (Subjects 10, 11, 12, and 13).

Non-Problematic Items

Items on which psychologists scored consensually were 1, 2, 5, 9, 10, and 12. All psychologists awarded 1 point for the response to item 1, 2 points for the responses to items 2 and 5, and 0 points for the responses to items 9, 10, and 12. As for items 3, 7, and 8, only one psychologist's point assignment differed from the others on each item.

Difficult Items

Items 4, 6, and 11 proved more difficult to score. Three psychologists awarded 2 points for the response to item 4, while the other five gave 1 point to the same response. As for item 6, half of the psychologists gave no credit to the response, while the other half gave 2 points. The scoring on item 11 was also divided. Half of the psychologists gave 1 point while the other half awarded no points.
For illustrative purposes, the number and percentage of verbalized responses across subjects for three of the non-problematic items (1, 2, and 9) and three of the difficult items (4, 6, and 11) are presented in Appendix D in Tables D1 and D2. Strategies varied across items; however, memory seemed to be more frequent for the less difficult items, while recommendations/evaluations was more common for the more difficult items. For examples of psychologists' actual verbalizations to these items, see Appendix C. Appendix C presents the actual verbalized statements of the psychologist who had no errors (Subject #13) and the psychologist who erred the most (Subject #2).

General Strategies

At the end of the think-aloud session, psychologists were asked to relate any general strategies that were helpful to them in difficult scoring situations. Not surprisingly, all psychologists said that their general strategy was to refer to the manual. However, the percentage frequencies of only four of the subjects (Subjects 8, 10, 11, and 18) indicated that the manual was a primary decision aid.

Summary

The results of both sessions showed that, on average, psychologists participating in this study were within an acceptable range of error (Table 3). However, there were some individual scoring differences, as represented by the 11 point spread for the Verbal IQ. To explore this variance, inter-individual scoring patterns were represented in the second session (Table 9). Scoring strategies were identified in the second session (Table 7), as was the frequency of strategy use across subjects (Table 8). These results are discussed in the next chapter.

Chapter 5

SUMMARY AND CONCLUSIONS

The primary purpose of this study was to explore psychologists' use of cognitive strategies when scoring difficult verbal responses on the WISC-R.
Because the psychometric approach does not address mental processes, it was helpful to look at how these processes might affect psychologists' judgments, and ultimately their scoring. This study investigated the processes behind psychologists' decision-making behaviour as it pertained to judging verbal responses.

Summary of Results and Discussion

Session one and session two psychologists scored similarly on the WISC-R I measure. Therefore, with respect to their performance, one may infer that the results on the WISC-R II measure have some degree of generalizability to those psychologists who only participated in the first session. T-tests found no significant differences for point differences and errors between session one only and session two psychologists. Although the Verbal IQ was found to vary by 11 points (80-91), overall group differences in scoring were not sufficient to affect subtest scaled scores or the Verbal IQ (PRO) score. Collectively, the psychologists in this study were found to assign scores comparable to those on Slate's fabricated protocol. Additionally, the fact that the SDs of the Verbal IQ and of each subtest approached their SEM values suggests that scoring variability is an error component that should be incorporated into the SEM (Table 4). This suggests, in turn, that although a high degree of objectivity is assumed in the administration and scoring of WISC-R verbal responses, test users need to be cognizant of the limitations inherent in the tests themselves. Limitations in the test are only acknowledged in the form of content sampling (estimated from studies of internal consistency) and time sampling (estimated from score stability studies) as sources of measurement error (Hanna, Bradley, & Bolen, 1981; Slate & Chick, 1989).

Although the net scoring differences did not appear to affect the overall IQ, there was apparent inter-subject variability. Scores often cancel themselves out within a protocol, and items that reflect differences of opinion are not identified.
In other words, the range of scores on these verbal subtests reflected psychologists' differences in attaching credit to specific items.

Lastly, in order of scoring difficulty, the Vocabulary, Comprehension, Similarities, and Information subtests were found to be prone to scoring errors. These results are consistent with those obtained by others (Miller, Chansky, & Gredler, 1970; Oakland, Lee, & Axelrad, 1975; Slate & Chick, 1989; Slate & Jones, 1988). Next, the results of session two will be discussed in more detail since this session was the primary focus of the study. The Comprehension subtest used in the think-aloud exercise was a test that the psychologists had not used in the first session. This test was employed so as not to repeat previous material.

Discussion of Research Question 1: Are there common mental processes and strategies that underlie psychologists' scoring judgments?

The results of the second session indicated that psychologists generally relied on the manual; some psychologists referred to the manual more than others. Psychologists referred to the manual's principles and criteria as an aid in judging responses, although at times this strategy was not productive. Recommendations/evaluations, the next most frequently used category, reflected psychologists' knowledge of task-specific rules, for example, \"I think it should have been queried\" (Subject #18). Furthermore, some responses seemed to remind psychologists of an example already held in memory. After reading a response, one psychologist simply responded, \"That looks like a hit on a one point response\" (Subject #15). Thus, psychologists seemed to have a repertoire of responses from which they were able to learn from examples and make abstractions across categories. Thirdly, because psychologists did not engage frequently in self-questioning and self-regulatory behaviour, one may speculate that performance in this regard may be highly automatic.
Additionally, the lack of planning statements again indicated that steps in the task were highly proceduralized. Psychologists generally did not make their own interpretations of seemingly problematic responses before checking the manual. When interpretations were made, it was to generate more conceptual information in order to evaluate a response more clearly: \"We can imply to go over myself would be to get help\" (Subject #12). Psychologists generally found it helpful to verbalize planning statements when the task became difficult. This was a key heuristic when psychologists had to search for at least two key ideas in a response. Moreover, general metastrategic statements were found not to account for a large portion of judgments pertaining to scoring behaviour. However, the low percentage of metastrategic statements does not adequately reflect the influence of this strategy, which can lead to errors by affecting ceilings. This point will be discussed further.

Discussion of Research Question 2: To what degree are psychologists' cognitive strategies similar?

Individual cognitive strategies were found to vary depending on the complexity of the response. Because intra-individual processes were diverse across the group, two of the psychologists' cognitive patterns are highlighted in particular: the cognitive patterns of the psychologist who had no scoring errors and the cognitive activity of the psychologist who erred the most.

Psychologist With No Errors (Subject #13)

Ironically, the psychologist who referred to the manual least often (18%) made no errors. Rather, this psychologist spent most of the time verbalizing statements referring to recommendations and evaluations (47%). For example, verbalizations included such statements as, \"that kind of response is not meant to be queried\" or, \"okay on the first one I would give it a one, it was appropriately queried\".
The manual was consulted only for clarification. The underlying reason was that this psychologist was also more likely to notice errors within the protocol itself and to mark accordingly. This was also indicated by the relatively high rate of recommendation/evaluation responses, which was an index of task knowledge. Additionally, in ambiguous circumstances, such as in item 1, \"treat it (Q) treat it with things at home\", scoring accuracy was increased because the psychologist also tried to make judgments based on self-reasoning when the manual was of no help. For example, \"the additional answer doesn't add anything that I feel tells me that the child knows any more than what he knew in the first answer.\" This is evidenced by this psychologist having the highest frequency of self-explanation episodes. In sum, 65% of the psychologist's verbalizations referred to self-explanations and recommendations/evaluations. Additionally, since this psychologist took the least amount of time to score the subtest, and did so with no errors, this indicated a high level of proceduralized knowledge.

Psychologist with the Most Errors (Subject #2)

Although Subject #2 used the manual more frequently (23%) than Subject #13, this psychologist erred the most overall. The differentiating factor was that although this psychologist used the manual more often, s/he made half as many recommendations and evaluations (23%). This appears to reflect a lack of the declarative knowledge necessary for accurate scoring, that is, the questioning rules for scoring. For example, for the response, \"so people can get meat (Q) it might be bad\", this person failed to disregard the incorrect cue and awarded the latter part of the response one point. This action impeded obtaining the correct ceiling. Additionally, this psychologist engaged in the least amount of self-explanatory behaviour (4%) when presented with ambiguous responses.
In contrast, for Subject #13 self-explanatory behaviour seemed helpful in differentiating between category responses and vague responses.

Psychologists' Scoring by Item on the WISC-R II

As expected, responses clearly scorable by the manual were awarded a consensual point value by all psychologists. An example is question two, on which there were no errors: \"What are you supposed to do if you find someone's wallet or pocket-book in a store?\". The response given in the fabricated protocol was \"Turn it into lost and found\". In these types of situations psychologists would usually rely on memory-based examples, that is, instances from previous experience. In other words, it would appear that the structural aspects of the response reminded psychologists of examples already held in long-term memory. Examples of memory-based comments on this question were as follows:

It's a fairly clear response, I'll give it a two - it's somewhere in the manual it's as close as such (Subject #10).

The second one is straight forward. It's the same as in the book (Subject #13).

Turn it into lost and found is two points. So that's a two (Subject #2).

On the other hand, after a memory-based impression was made, some psychologists preferred to use an additional monitoring strategy by also checking the criteria in the manual:

Number two. Turn it into lost and found. My first reaction is I think it's a two pointer (memory)...and when I check it actually is (manual; Subject #18).

Thus, in some cases the manual was never consulted; in other cases, the manual functioned primarily as a \"backup\" aid if psychologists were confident of a response that resembled a memory-based example.

Problematic Responses

For responses that were more problematic to score, psychologists' judgments may be explained by their personal frames of reference, the way they conceptualized responses, and their rule-based interpretations. The difficulty of scoring \"novel\" subject responses has been widely acknowledged. 
Sattler (1988) amplifies the challenge posed in the scoring of verbal responses, as illustrated in Figure 3. Some of the items in the Comprehension verbal exercises seemed problematic in this regard since there was variance in scoring among subjects. Examples of such items, discussed below, are items 4, 6, and 11.

Personal Frames of Reference

Psychologists' personal frames of reference are reflected in their general metastrategic statements (personalized strategies). Although such statements were few in number (5.4% of the overall verbalizations), they were still found to exert an influence on psychologists' scoring. This was shown by the fact that half of the subjects obtained ceilings by the 12th response, while the others did not obtain ceilings. Obtaining the correct ceiling is an important aspect of accurate scoring since subsequent scoring is determined by the correct finishing point. Again, this was evidenced in item 11. Those who did not obtain a ceiling assigned a one point credit to the 11th response to the question \"Why is it important for the government to hire people to inspect the meat in meat packing plants?\" The fabricated response was \"So people can get meat (Q) It might be bad\". In these instances psychologists ignored the query and preferred to award the child a point, as in the following examples: \"He [the child] demonstrated that he knew it\" (Subject #15), \"So, it's to the child's advantage if you make this kind of mistake because you can then raise the score\" (Subject #2), and \"I think it's verging on the concept of protecting the consumer\" (Subject #18). Generally, the psychologists were interested in whether the child really knew the correct response despite the knowledge that the response should not have been queried. A more general account of one psychologist's personal philosophy is summed up in the following by Subject #1:

The administration sets up a \"structured interview\" in order to get to know the child. 
I look at the WISC-R as a tool to get to know the child. I test a lot of ESL children. You know that they are intelligent but they may not know the words, so you try to get the global response. I also test a lot of language pathology children. You have to get the global response of what they mean because they may not be able to express it (Subject #1).

Subject #2 expressed similar thoughts. \"I write all over my protocol. I tend to make observations because the numbers themselves don't have any meaning on their own\". For instance, this psychologist said:

The questions in the Comprehension subtest basically are morally loaded questions. What a child learns at home is basically his/her view of the world. The questions in the WISC-R may not be reflective of that. For the question, \"What would you do if someone younger than you hits you?\" Well, I knew a child whose younger brother was always hit upon. When the family system is set up this way it's hard to ignore. So if you pose such a question to the child, the natural response would be to hit him.

Thus, it appears that psychologists' personal frames of reference affect their individual judgment strategies as well. For example, psychologists appear to be lenient if the child's response shows promise or if it seems that the child may know the correct answer depending on the context or situation. One psychologist suggested that, \"If this is an older person I think I would give that two points; if it was a younger person I would consider it to be queried...to see if they know what they are talking about\" (Subject #18). In such circumstances, one psychologist (Subject #15) said it is better to always give more points. 
On the other hand, another psychologist related a different approach:

I don't know exactly if I come to a situation where I really don't know...I mean I know some people tend to score up and some people tend to score down and I guess if I were stuck in that situation I would tend to score down...and then I would just keep that in mind how many times I had to do that when I'm coming up with the scores to see if they would have an effect on any of the scales or the general scores (Subject #8).

A Multitude of Responses: Differences in Conceptualizations of Responses

Another item on which psychologists did not reach a consensus was item four. Differences in how psychologists conceptualized responses were reflected primarily in self-questioning behaviour and monitoring behaviour. Self-questioning served an important function by offering clarifications and giving reasons in instances of problematic responses. Monitoring behaviour reflected psychologists' checking of their actions in the face of such unclear responses. For the question, \"What are some reasons we need policemen?\", the fabricated response was, \"Catch bad people, arrest crooks, enforce laws\". Five of the subjects categorized the three responses as similar and awarded the response one point as a unitary response. Although psychologists checked the manual for matching criteria, it did not seem to be overly helpful to all. For example, Subject #13 said, \"I would have queried that as well to look for a second answer, they all fall within the same category.\" Subjects 8, 10, and 18 awarded two points. They gave reasons such as \"I think that categorically catching bad people and arresting crooks are probably one, and enforcing laws is definitely another\" (Subject #18). \"The first two are the same level...I'm checking to see if the two ones are the same as 'enforce laws'... 
I think they're two separate categories so I'm giving it two points\" (Subject #10).

To (Q) or Not to (Q): Rule-based Interpretations

When psychologists made reference to rule-based interpretations (22.2% of the time), it was generally in reference to examiner errors already present on the protocol. This aspect of scoring was reflected in the recommendations/evaluations category, which revolved mainly around whether a response should or should not be cued. An important finding emerged involving a biased heuristic: always scoring the response after a (Q). Half of the subjects used this biased heuristic for question #6, \"What is the thing to do if a boy (girl) much smaller than yourself starts to fight with you?\". For the response, \"Let him be (Q) and get mad\", four subjects gave 0 points and the other four 2 points. Psychologists gave reasons such as, \"...I think that he would get a zero when he spoils it like this\" (Subject #2), as opposed to \"I'd give that a two because his first answer is correct and it should not have been queried, and while his second answer is inadequate it should not have been asked in the first place so his first answer is worth a two\" (Subject #13), the correct algorithm. An application of an incorrect heuristic would involve always crediting a second response to an incorrectly cued first response. To reiterate, the correct algorithm would be to award credit only to the first response, in accordance with standardization procedures, irrespective of a subsequent answer that would raise the credit value. Differences in rule-based interpretations, and thus in scoring judgments, were also apparent in the scoring of item eleven, \"So people can get meat (Q) It might be bad\". Half of the psychologists awarded no credit; the other half awarded 1 point credit. Again, the biased heuristic involved crediting the latter response if it elevated the value of the first response despite inappropriate cueing. 
As discussed previously, for this example a personal frame of reference that favours leniency may explain this biased heuristic of awarding credit to the more sophisticated response, because some psychologists did note that the response should not have been cued. According to Brehmer, Hagafors, & Johansson (1980):

Subjects do not perform optimally in a judgment task even when they know exactly what rule to use for making optimal judgments ... knowing the rule for a task is not enough for producing a correct response...being told how to perform a judgment task does not guarantee perfect judgments. (p. 373)

Therefore, being told how to score responses does not guarantee correct scoring judgments. Alternatively, general knowledge deficiencies pertaining to scoring rules are an explanatory factor underlying differences in judgments (Subject #2).

Discussion of Research Question 3: To what degree are psychologists aware of their own cognitive processes and strategies?

Results of percentage frequencies indicated that only half of the subjects accurately reported the manual as their primary heuristic. It could be that although psychologists are aware that the manual is the most important heuristic aid for checking responses, they are unaware that other underlying factors, or alternative primary strategies, influence their scoring judgments just as much. Especially in instances where the manual is not overly helpful, psychologists may unknowingly have developed secondary strategies to aid in their scoring judgments. This is evidenced by the fact that psychologists did not report extensively on individual metastrategic statements (5.4%), as there is pressure to conform to the mandatory standardization procedures in the scoring of WISC-R responses.

One subject did have a better idea than the others that the manual was the most important decisive factor in scoring judgments and commented upon this:

Well, I guess the first thing I would do is to try and stick to the manual. 
So, if I can see that there is something clearly in the manual like in the example where responses weren't queried even though they should have been then I don't because it's been drilled into me that you have to stick to the standardization because otherwise it's not as valid and reliable.

However, this psychologist differed from Slate's key by 4 points, the second highest error rate. One may then conclude that use of the manual does not in itself guarantee flawless scoring. As seen, there seem to be important cognitive variables, such as the strategies identified in this study, that psychologists are largely unaware of in their scoring experience.

Conclusions

An advantage of a study of this nature is that it identified general cognitive strategies and a biased heuristic that conceptualized sources of error. This was done by relating cognitive strategies to scoring differences. Secondly, through the validity question (research question 3), it brought to light that psychologists are largely unaware of how their underlying cognitive variables affect judgments. Thirdly, a biased heuristic used by some psychologists was identified, involving an inclination towards leniency even if it meant that standardized procedures were not adhered to. One may speculate, then, that other psychologists in the general population may use similar processes in their scoring, especially for difficult or ambiguous verbal responses. This leniency aspect of scoring also became apparent in the quantitative performance data in session one, where subjects were more prone to award more credit to responses of lesser value (Table 2).

Another point is that in the present study psychologists generally found the think-aloud exercise helpful to them in looking at their underlying reasons in difficult scoring situations. In reference to an inappropriately questioned item on a protocol, one psychologist commented that, \"What do I do there...this is good review for me...this is actually an interesting exercise... 
to do, because it is interesting to look at why do I think that...\" (Subject #2).

Limitations of the Study

Although the majority of educational research is conducted with volunteer subjects (Borg & Gall, 1989), the fact that a volunteer sample was used in the study raises several considerations. The first is that the psychologists who volunteered may have been different from those who did not. Those who volunteered were perhaps more confident in their scoring, and perhaps were more at ease with opening themselves up to a stranger for evaluation. Others who did not volunteer perhaps declined to do so due to the anxiety of having their work evaluated. Additionally, those who volunteered for the thinking aloud exercise may be a different group in themselves. Although think-aloud exercises generally involve a small number of subjects (Ericsson & Simon, 1984), participation of all 23 subjects was not obtained for the verbal analysis. Caution should therefore be maintained in generalizing the results of this study to the general population of Greater Vancouver psychologists. Ideally, if a larger sample of scorers were obtained, then perhaps more general kinds of heuristics and scoring biases would have become apparent, as reflected in different category types and frequencies.

A second limitation of the study is that the protocols were fabricated. In a real testing situation psychologists often find alternative cues helpful in their scoring. For example, Subject #12 commented that \"in looking at these answers, it's like dealing with the kid, and the strategies that the kid goes through to get these answers are helpful, so that's what you miss when you're looking just at black and white\". An alternative methodology to overcome this disadvantage would be to videotape an assessment process involving the administration of the WISC III to a child. A videotape would capture verbatim responses as well as make visible important testing facets that would be lost through fabricated protocols. 
A further research question may ask, \"To what extent would scoring judgments obtained from a more realistic setting differ from those obtained on a fabricated protocol constructed with the same verbal responses?\" For instance, for a multiple response answer in this study, one psychologist said that s/he may have marked differently if the child had taken a breath in the middle of a response, indicating two concepts rather than one. Such cues would be apparent on videotape.

Another limitation related to the analogue nature of the protocols is that psychologists were already presented with cued responses. These facets may have actually functioned as distracters in this exercise.

Implications of the Study

Differences among individuals who are in a judgment role have usually been treated as error (Slovic & Lichtenstein, cited in Rappoport & Summers, 1973) irrespective of the underlying mental processes. This is because researchers have paid minimal attention to the inferences and the particular cognitive processes that are responsible for examiners' performance judgments (Kishor, 1987). Consequently, it is often overlooked that examiners are an active part of a test process that calls for specific cognitive skills surrounding interpretive judgment. Unfortunately, it does not appear as though studies have systematically explored the systems underlying psychologist-examiners' scoring behaviours. According to Barnett (1988, cited in Burns, 1990), applications of judgment research in cognitive psychology to school psychology are lacking. Rather, the end product of the scoring process, that is, the behavioural or scoring phase, has received the attention, not the systems underlying this process. The study of these systems may help in the understanding of psychologists' differential judgments of difficult-to-score and ambiguous verbal responses. Slate and Hunnicutt (1988) analyzed literature pertaining to Wechsler protocol errors in order to identify probable reasons. 
They suggested that ambiguity in test manuals and poor instruction were contributing factors; however, again, empirical studies addressing these specific issues are lacking. One may speculate, as suggested in this study, that, in practice, psychologists are often left to their own devices in terms of scoring problematic verbal responses. It follows that studies need to focus on the cognitive activity surrounding psychologists' actions in doubtful situations. Do they often rely on their own compensatory strategies when evaluating difficult responses? Psychologist-examiners must often make subjective judgments because many responses given by children and adults alike are not clearly scorable by the test manual (Slate & Hunnicutt, 1988).

Furthermore, this study brings to light that variations in psychologists' cognitive activity appear to affect scoring outcomes. This was demonstrated by comparing the performance of Subject #8 and Subject #13. An implication for future training is that, in order to fine-tune scoring, a review of the questioning rules by psychologists, as well as clarification of such rules in the manual, may be helpful for some psychologists. Additionally, these problematic areas should be stressed in the teaching of the Wechsler Scales in the graduate classroom.

Finally, the similar task structure of the new version of the WISC-R, the WISC III, implies that the findings of this study will have some degree of generalizability to the new measure. This is because the nature of the underlying cognitive processes was studied, rather than an item analysis. Therefore, the decision-making strategies and techniques that psychologists employ in scoring may also transfer to problem areas on the WISC III. However, since this was primarily an exploratory study, the possibility remains for more in-depth research using the similar WISC III task.

References

Anderson, B. (1977). Differences in teachers' judgment policies for varying numbers of verbal and numerical cues. 
Organizational Behavior and Human Performance, 19, 68-88.
Babad, E., Mann, M., & Mar-Hayim, M. (1974). Bias in scoring the WISC subtests. Journal of Consulting and Clinical Psychology, 43, 268.
Barnett, D.W. (1988). Professional judgment: A critical appraisal. School Psychology Review, 17, 658-672.
Benjafield, J. (1969). Evidence that \"thinking aloud\" constitutes an externalization of inner speech. Psychonomic Science, 15, 83-84.
Berl, J., Lewis, G., & Morrison, R.S. (1976). Applying models of choice to the problem college selection. In E.R. Smith & Miller, F.D. (1978). Limits on Perception of Cognitive Processes: A Reply to Nisbett and Wilson. Psychological Review, 85, 355-362.
Boehm, A., Duker, J., Haesloop, M., & White, M. (1974). Behavioral objectives in training for competence in the administration of individual intelligence tests. Journal of School Psychology, 12, 150-157.
Borg, W., & Gall, M. (1989). Educational Research: An Introduction, Fifth Edition. New York: Pitman.
Borman, W.C. (1977). Consistency of rating accuracy and rating errors in the judgment of human performance. Organizational Behavior and Human Performance, 20, 238-252.
Bradley, F., Hanna, G., & Lucas, M. (1980). The reliability of scoring the WISC-R. Journal of Consulting and Clinical Psychology, 48, 530-531.
Brannigan, G. (1975). Scoring difficulties on the Wechsler intelligence scales. Psychology in the Schools, 12, 313-314.
Brehmer, B., Hagafors, R., & Johansson, R. (1980). Cognitive skills in judgment: Subjects' ability to use information about weights, function forms, and organizing principles. Organizational Behavior and Human Performance, 26, 373-385.
Brown, J.S., & Burton, R. (1978). Diagnostic models for procedural bugs in basic mathematical skills. Cognitive Science, 2, 155-192.
Burgess, R. (1985). Field Methods in the Study of Education. Philadelphia: Falmer.
Burns, C.W. (1990). Judgment theory and school psychology. Journal of School Psychology, 28, 343-349.
Cardy, R., & Dobbins, G. (1986). 
Affect and appraisal accuracy: Liking as an integral dimension in evaluating performance. Journal of Applied Psychology, 72, 672-678.
Chase, W.G., & Simon, H.A. (1973). Perception in chess. Cognitive Psychology, 4, 55-81.
Chi, M.T.H., Feltovich, P., & Glaser, R. (1981). Categorization and representation of physics problems by experts and novices. Cognitive Science, 5, 121-152.
Conner, R., & Woodall, F.E. (1983). The effects of experience and structured feedback on the WISC-R error rates made by student-examiners. Psychology in the Schools, 20, 376-370.
Crow, L., Olshavsky, R., & Summers, J. (1980). Industrial buyers' choice strategies: A protocol analysis. Journal of Marketing Research, 17, 34-44.
Dansereau, D., & Gregg, L. (1966). An information processing analysis of mental multiplication. Psychonomic Science, 6, 71-72.
Davis, H. (1968). Verbalization, experimenter presence and problem solving. Journal of Personality and Social Psychology, 8, 299-302.
de Groot, A.D. (1965). Thought and Choice in Chess. The Hague: Mouton.
DeNisi, A.S., Cafferty, T.P., & Meglino, B.M. (1984). Organizational Behavior and Human Performance, 33, 360-396.
Ericsson, K.A., & Simon, H.A. (1984). Protocol Analysis: Verbal Reports as Data. Cambridge: The MIT Press.
Ericsson, K.A., & Simon, H.A. (1980). Verbal Reports as Data. Psychological Review, 87, 215-251.
Fagley, N.S. (1988). Judgmental heuristics: Implications for the decision making of school psychologists. School Psychology Review, 17, 311-321.
Feldman, J.M. (1981). Beyond attribution theory: Cognitive processes in performance appraisal. Journal of Applied Psychology, 66, 127-148.
Flavell, J.H. (1979). Metacognition and cognitive monitoring: A new area of cognitive developmental enquiry. American Psychologist, 34, 906-911.
Franklin, M., Stollman, P., Burpeau, M., & Sabers, D. (1982). Examiner error in intelligence testing: Are you a source? Psychology in the Schools, 19, 563-569.
Fogarty, J.L., Wang, M.C., & Creek, R. (1983). 
A descriptive study of experienced and novice teachers' interactive thoughts instructional thoughts and actions. Journal of Educational Research, 77, 22-32.
Forrest-Pressley, D.L., MacKinnon, G.E., & Waller, T.G. (1985). Metacognition, Cognition, and Human Performance: Instructional Practices (Vol. 2). New York: Academic Press.
Gagne, E.D. (1985). The Cognitive Psychology of School Learning. Boston: Little, Brown and Company.
Gagne, M., & Smith, E., Jr. (1962). A study of the effect of verbalization on problem solving. Journal of Experimental Psychology, 63, 12-18.
Galotti, K. (1989). Approaches to studying formal and everyday reasoning. Psychological Bulletin, 105, 331-351.
Gavelek, J.R., & Raphael, T.E. (1985). Metacognition, instruction, and the role of questioning activities. In D.L. Forrest-Pressley, G.E. MacKinnon, & T.G. Waller (Eds.), Metacognition, Cognition, and Human Performance: Instructional Practices (Vol. 2). New York: Academic Press.
Glaser, B.G., & Strauss, A.L. (1967). The Discovery of Grounded Theory: Strategies for Qualitative Research. Chicago, IL: Aldine.
Glaser, B.G. (1978). Theoretical Sensitivity. Mill Valley, CA: The Sociology Press.
Hanna, G., Bradley, F., & Holen, M. (1981). Estimating major sources of measurement error in individual intelligence scales: Taking our heads out of the sand. Journal of School Psychology, 19(4), 370-376.
Higgins, E., Herman, C., & Zanna, M. (1981). Social Cognition: The Ontario Symposium (Vol. 1). Hillsdale, NJ: Erlbaum.
Hunnicutt, L., Slate, J., Gamble, C., & Wheeler, M. (1990). Examiner errors in the Kaufman Assessment Battery for Children: A preliminary investigation. Journal of School Psychology, 28, 271-278.
Kasper, J.C., Throne, F.M., & Shulman, J.L. (1968). A study of the interjudge reliability in scoring the responses of a group of mentally retarded boys to three WISC subscales. Educational and Psychological Measurement, 28, 469-477.
Karph, D.A. (1973). Thinking-aloud in Human Discrimination Learning. 
Unpublished doctoral dissertation, State University of New York at Stony Brook.
Kaufman, A. (1979). Intelligent Testing with the WISC-R. New York: John Wiley and Sons.
Kilpatrick, J. (1968). Analyzing the solution of word problems in mathematics: An exploratory study (Doctoral dissertation, Stanford University, 1967). Dissertation Abstracts International, 42, 4380-A.
Kishor, N. (1987). Cognitive Strategies in Judgment. Unpublished doctoral dissertation, The University of British Columbia, Vancouver.
Klein, N. (1983). Utility and decision strategies: A second look at the rational decision maker. Organizational Behavior and Human Performance, 31, 1-25.
Krzytofiak, F., Cardy, R., & Newman, J. (1988). Implicit personality and performance appraisal: The influence of trait inferences on evaluations of behavior. Journal of Applied Psychology, 73, 515-521.
Lesgold, A. (1990). Problem solving. In R.J. Sternberg & E.E. Smith (Eds.), The Psychology of Human Thought (pp. 188-213). Cambridge: Cambridge University Press.
McArthur, L. (1981). What grabs you? The role of attention in impression formation and causal attribution. In E. Higgins, C. Herman, & M. Zanna (Eds.), Social Cognition: The Ontario Symposium (Vol. 1). Hillsdale, NJ: Erlbaum.
McCormick, C., Miller, G., & Pressley, M. (1989). Cognitive Strategy Research. New York: Springer-Verlag.
Miller, C., & Chansky, N. (1972). Psychologists' scoring of WISC protocols. Psychology in the Schools, 9, 144-152.
Miller, C., Chansky, N., & Gredler, G. (1970). Rater agreement on WISC protocols. Psychology in the Schools, 7, 190-193.
Montgomery, H. (1977). A study of intransitive preferences using a think aloud procedure. In H. Zungermann & G. De Zeeuw (Eds.), Decision Making and Change in Human Affairs. Boston: D. Reidel.
Mount, M.K., & Thompson, D.T. (1987). Cognitive categorization and quality of performance ratings. Journal of Applied Psychology, 72, 240-246.
Murphy, K.R., & Balzer, W.K. (1986). 
Systematic distortions in memory-based behavior ratings and performance evaluations: Consequences for rating accuracy. Journal of Applied Psychology, 71, 39-44.
Newell, A., & Simon, H.A. (1972). Human Problem Solving. New Jersey: Prentice-Hall.
Nisbett, R., & Wilson, D. (1977). Verbal reports on mental processes. Psychological Review, 84, 231-259.
Oakland, T., Lee, S., & Axelrad, K. (1975). Examiner differences on actual WISC protocols. Journal of School Psychology, 13, 227-233.
Payne, J. (1980). Information processing theory: Some concepts applied to decision research. In T.S. Wallsten (Ed.), Cognitive Processes in Choice and Decision Behavior (pp. 95-115). Hillsdale, NJ: Erlbaum.
Peterson, P.L., Marx, R.W., & Clark, C.M. (1978). Teacher planning, teacher behavior, and student achievement. American Educational Research Journal, 15, 417-432.
Pitz, G., & Sachs, N. (1984). Judgment and decision: Theory and application. Annual Review of Psychology, 35, 139-163.
Plumb, G., & Charles, D. (1955). Scoring difficulty of Wechsler comprehension responses. Journal of Educational Psychology, 46, 179-183.
Rappoport, L., & Summers, D. (1973). Human Judgment and Social Interaction. New York: Holt, Rinehart, & Winston.
Sattler, J.M. (1988). Assessment of Children (3rd ed.). San Diego: Jerome M. Sattler.
Searles, E. (1985). How to Use WISC-R Scores in Reading Diagnosis. Newark, DE: International Reading Association.
Sherrets, G., Gard, G., & Langer, H. (1979). Frequency of clerical errors on WISC protocols. Psychology in the Schools, 16, 495-496.
Schulman, L.S., & Elstein, A.S. (1975). Studies of problem solving, judgment, and decision making: Implications for educational research. In F. Kerlinger (Ed.), Review of Research in Education (Vol. 3, pp. 3-42). Itasca, Illinois: F.E. Peacock.
Simon, H.A., & Chase, W.G. (1973). Skill in chess. American Scientist, 61, 394-403.
Simon, D.P., & Simon, H.A. (1978). Individual differences in solving physics problems. In R. Smith, E.T. (1985, September 30). Are you creative? 
Business Week, 80-84.
Slate, J.R. (1991). Guide to administering and scoring the WISC-R. Unpublished manuscript, Arkansas State University, Arkansas.
Slate, J.R., & Chick, D. (1989). WISC-R examiner errors: Cause for concern. Psychology in the Schools, 26, 78-83.
Slate, J.R., & Hunnicutt, L. (1988). Examiner errors on the Wechsler scales. Journal of Psychoeducational Assessment, 6, 280-288.
Slate, J.R., & Jones, C.H. (In press). Errors on the Wechsler scales: Commonly mis-scored examinee responses. Social and Behavioral Sciences Documents.
Slate, J.R., & Jones, C.H. (1990). Identifying students' errors in administering the WAIS-R. Psychology in the Schools, 27, 83-87.
Slate, J.R., & Jones, C.H. (1989). Examiner errors in the WAIS-R: A source for concern. The Journal of Psychology, 124, 343-345.
Slate, J.R., & Jones, C.H. (1989). Can teaching of the WISC-R be improved? Quasi-Experimental Exploration. Professional Psychology: Research and Practice, 20, 408-410.
Slate, J.R., & Jones, C.H. (1988). Strategies to reduce examiner error on the Wechsler scales. Social and Behavioral Sciences Documents, 18 (Ms. No. 2840).
Slovic, P., & Lichtenstein, S. (1973). Comparison of Bayesian and regression approaches to the study of information processing in judgment. In L. Rappoport & D. Summers, Human Judgment and Social Interaction (pp. 15-109). New York: Holt, Rinehart, & Winston.
Smith, C.O. (1971). The Structure of Intellect Processes Analyses System: A Technique for the Investigation and Quantification of Problem Solving Processes. Unpublished doctoral dissertation, University of Houston.
Smith, E.R., & Miller, F.D. (1978). Limits on Perception of Cognitive Processes: A Reply to Nisbett and Wilson. Psychological Review, 85, 355-362.
Smith, R., E.T. (1985, September 30). Are you creative? Business Week, 80-84.
Sternberg, R.J. (1985). Introduction: What is an information-processing approach to human abilities? In R. Sternberg (Ed.), Human Abilities: An Information-Processing Approach (pp. 1-4). New York: W.H. 
Freeman.
Sternberg, R.J. (1981). Testing and cognitive psychology. American Psychologist, 36, 1181-1189.
Sternberg, R.J., & Ketron, J.L. (1982). Selection and implementation of strategies in reasoning by analogy. Journal of Educational Psychology, 74, 399-413.
Svenson, O. (1985). Cognitive strategies in a complex judgment task: Analyses of concurrent verbal reports and judgments of cumulated risk over different exposure times. Organizational Behavior and Human Decision Processes, 36, 1-15.
Truch, S. (1989). The WISC-R Companion. Calgary: Foothills Educational Materials.
Tversky, A., & Kahneman, D. (1983). Extensional versus intuitive reasoning: The conjunction fallacy in probability judgment. Psychological Review, 90, 293-315.
Tversky, A., & Kahneman, D. (1984). The framing of decisions and the psychology of choice. In G. Wright (Ed.), Behavioral Decision Making. New York: Plenum.
Van Haneghan, J.P., & Baker, L. (1989). Cognitive monitoring in mathematics. In C. McCormick, G. Miller, & M. Pressley, Cognitive Strategy Research (pp. 215-238). New York: Springer-Verlag.
Wallsten, T.S. (1980). Cognitive Processes in Choice and Decision Behavior. Hillsdale, NJ: Erlbaum.
Walker, R., Hunt, W., & Schwartz, M. (1965). The difficulty of WAIS comprehension scoring. Journal of Clinical Psychology, 21, 427-429.
Warren, S., & Brown, W. (1972). Examiner scoring errors on individual intelligence tests. Psychology in the Schools, 10, 118-122.
Webb, N.L. (1975). An exploration of mathematical problem-solving processes. Dissertation Abstracts International, 36, 3689A. (University Microfilms No. 75-25625).
Wechsler, D. (1974). Manual for the Wechsler Intelligence Scale for Children-Revised. New York: Psychological Corporation.
White, P. (1980). Limitations on verbal reports of internal events: A refutation of Nisbett and Wilson and of Bem. Psychological Review, 87, 105-112.
Wiggins, N. (1973). Individual differences in human judgements: A multivariate approach. In L. Rappoport, & D. 
Summers (Eds.), Human Judgment and Social Interaction. New York: Holt, Rinehart and Winston. Woods, P. (1985). Ethnography and theory construction in educational research. In R. Burgess (Ed.), Field Methods in the Study of Education (pp. 51-78). Philadelphia: Falmer. Wright, G. (1984). Behavioral Decision Making. New York: Plenum. Zungermann, H., & De Zeeuw (1977). Decision Making and Change in Human Affairs. Boston: D. Reidel. APPENDIX A WISC-R I 1. INFORMATION: 1. Finger. 2. Ears. 3. Legs. 4. Boil. 5. Nickel. 6. Cow. 7. Week: SEVEN. 8. March: APRIL. 9. Bacon: CALF. 10. Dozen: 13. 11. Seasons: SP, SU, W, F. 12. America: C. COLUMBUS. 13. Stomach: GROWLS. 14. Sun: WEST. 15. Leap Year: FEBRUARY. 16. Bulb: EDISON. 17. 1776: BRITISH. 18. Oil: LIGHTER. 19. Border: DK. 20. Ton: 2000. 21. Chile: N. AMERICA. 22. Glass: LIME. 23. Greece: ROME. 24. Tall: 5'11 & 1/2\". 25. Barometer: DK. 26. Rust. 27. Los Angeles: 2799. 28. Hieroglyphics: DK. 29. Darwin: NR. 30. Turpentine: SAP.
3. SIMILARITIES: 1. Wheel-ball: ROUND AND ROLL. 2. Candle-lamp: LIGHT UP. 3. Shirt-hat: WEAR BOTH. 4. Piano-guitar: MAKE WONDERFUL MUSIC. 5. Apple-banana: FRUITS. 6. Beer-wine: [illegible] DRINKS (Q) GET YOU DRUNK. 7. Cat-mouse: MAMMALS. 8. Elbow-knee: JOINTS. 9. Telephone-radio: MAKE NOISE (Q) COMMUNICATION DEVICES. 10. Pound-yard: BOTH AMOUNTS. 11. Anger-joy: FEELINGS (Q) HOW SOME PEOPLE REACT. 12. Scissors-copper pan: BOTH COPPER. 13. Mountain-lake: ANIMALS IN BOTH (Q) BOTH NATURE. 14. Liberty-justice: CIVIL RIGHTS. 15. First-last: [illegible] (Q) IN A SERIES. 16. The numbers 49 and 121: ODD NUMBERS. 17. Salt-water: CHEMICAL ELEMENTS THAT ARE USED. 7. VOCABULARY: 1. Knife. 2. Umbrella. 3. Clock. 4. Hat. 5. Bicycle. 6. Nail: SHARP OBJECT. 7. Alphabet: ALL THE LETTERS IN THE ALPHABET FROM A TO Z. 8. Donkey: ANIMAL. 9. Thief: SOMEONE WHO STEALS. 10. Join: PUT TOGETHER. 11. Brave: LIKE WHEN YOU SAVE SOMEBODY FROM DYING. 12. Diamond: GEM (Q) SHINY THING. 13. Gamble: PLAY WITH DICE (Q) CHEATING ON CARDS AND STUFF. 14. Nonsense: FOOLISHNESS. 15. Prevent: KEEP SOMETHING (Q) FROM HAPPENING. 16. Contagious: LOTS OF PEOPLE GET THE FLU (Q) INFECTIOUS. 17. Nuisance: A BOTHER OR PEST (Q) MY KID BROTHER WON'T LEAVE ME ALONE. 18. Fable: BOOK (Q) BOOK. 19. Hazardous: POISONOUS (Q) IT COULD KILL YOU. 20. Migrate: TO FLY SOUTH. 21. Stanza: SOMETHING TO DO WITH PAPER AND WRITING. 22. Seclude: IN, LIKE INCLUDE SOMEBODY. 23. Mantis: ANIMAL. 24. Espionage: MISSION (Q) A SPY WITH A MISSION FOR THE GOVERNMENT. 25. Belfry: A CHURCH TOWER LIKE THING. 26. Rivalry: DK. 27. Amendment: NR. 28. Compel: DK. 29. Affliction: YOU HURT SOMEBODY, YOU INFLICT SOMETHING ON THEM. 30. Obliterate. 31. Imminent. 32. Dilatory.
9. COMPREHENSION: 1. Cut finger: LET IT BLEED (Q) PUT A BAND-AID ON IT. 2. Find wallet: CALL A RADIO STATION. 3. Smoke: CALL THE FIRE DEPARTMENT (Q) CALL THE POLICE. 4. Policemen: TO STOP CRIME (Q) HELP PEOPLE WITH PROBLEMS THAT AREN'T ILLEGAL. 5. Lose ball: BUY ANOTHER ONE TO REPLACE THE ONE I LOST. 6. Fight: BEAT THEM UP, NOBODY IS GOING TO HIT ME, LET ME TELL YOU. 7. Build house: COOLER IN SUMMER AND WARMER IN WINTER. 8. License plates: IDENTIFY THE CAR AND TO REPORT ACCIDENTS WHEN YOU HAVE TO DO SO. 9. Criminals: AS AN EXAMPLE, FOR PUNISHMENT, SO THEY WON'T DO IT AGAIN, BAD PEOPLE. 10. Stamps: TO PAY FOR POSTAGE. 11. Inspect meat: TO PROTECT THE MEAT FROM BEING BAD FOR PEOPLE. 12. Charity: TAX DEDUCTIBLE, SAFER FOR YOU TO DO SO. 13. Secret ballot: IT IS THE RIGHT WAY. 14. Paperbacks: CHEAPER, NOT SO BAD IF YOU LOSE A PAPERBACK BOOK. 15. Promise: IT IS A MATTER OF TRUST AMONG PEOPLE (Q) A MATTER OF CONSCIENCE. 16. Cotton: SOFT AND WARM (Q) CONVENIENT (Q) COMFORTABLE AND COOL. 17. Senators.
WISC-R I KEY 1. INFORMATION: 1. Finger. 2. Ears. 3. Legs. 4. Boil. 5. Nickel. 6. Cow. 7. Week: SEVEN (score illegible). 8. March: APRIL (score 1). 9. Bacon: CALF (score illegible). 10. Dozen: 13 (score illegible). 11. Seasons: SP, SU, W, F (score illegible). 12. America: C. COLUMBUS (score illegible). 13. Stomach: GROWLS (score illegible). 14. Sun: WEST (score illegible). 15. Leap Year: FEBRUARY (score illegible). 16. Bulb: EDISON (score illegible). 17. 1776: BRITISH (score illegible). 18. Oil: LIGHTER (score illegible). 19. Border: DK (score 0). 20. Ton: 2000 (score 1). 21. Chile: N. AMERICA (score illegible). 22. Glass: LIME (score illegible). 23. Greece: ROME (score 0). 24. Tall: 5'11 & 1/2\" (score illegible). 25. Barometer: DK (score 0). 26. Rust (score illegible). 27. Los Angeles: 2799 (score illegible). 28. Hieroglyphics: DK (score illegible). 29. Darwin: NR (score 0). 30. Turpentine: SAP (score 0). 3. SIMILARITIES: 1. Wheel-ball: ROUND AND ROLL (score 1). 2. Candle-lamp: LIGHT UP (score 1). 3. Shirt-hat: WEAR BOTH (score 1). 4. Piano-guitar: MAKE WONDERFUL MUSIC (score illegible). 5. Apple-banana: FRUITS (score illegible). 6. Beer-wine: [illegible] DRINKS (Q) GET YOU DRUNK (score 2). 7. Cat-mouse: MAMMALS (score 2). 8. Elbow-knee: JOINTS (score 2). 9. Telephone-radio: MAKE NOISE (Q) COMMUNICATION DEVICES (score illegible). 10. Pound-yard: BOTH AMOUNTS (score illegible). 11. Anger-joy: FEELINGS (Q) HOW SOME PEOPLE REACT (score 2). 12. Scissors-copper pan: BOTH COPPER (score illegible). 13. Mountain-lake: ANIMALS IN BOTH (Q) BOTH NATURE (score 0). 14. Liberty-justice: CIVIL RIGHTS (score 1). 15. First-last: [illegible] (Q) IN A SERIES (score illegible). 16. The numbers 49 and 121: ODD NUMBERS (score 1). 17. Salt-water: CHEMICAL ELEMENTS THAT ARE USED (score 0). 7. VOCABULARY: 1. Knife (score 2). 2. Umbrella (score 2). 3. Clock (score 2). 4. Hat (score 2). 5. Bicycle (score 2). 6. Nail: SHARP OBJECT (score illegible). 7. Alphabet: ALL THE LETTERS IN THE ALPHABET FROM A TO Z (score illegible). 8. Donkey: ANIMAL (score 2). 9. Thief: SOMEONE WHO STEALS (score 2). 10. Join: PUT TOGETHER (score 2). 11. Brave: LIKE WHEN YOU SAVE SOMEBODY FROM DYING (score illegible). 12. Diamond: GEM (Q) SHINY THING (score illegible). 13. Gamble: PLAY WITH DICE (Q) CHEATING ON CARDS AND STUFF (score 1). 14. Nonsense: FOOLISHNESS (score 2). 15. Prevent: KEEP SOMETHING (Q) FROM HAPPENING (score 2). 16. Contagious: LOTS OF PEOPLE GET THE FLU (Q) INFECTIOUS (score 1). 17. Nuisance: A BOTHER OR PEST (Q) MY KID BROTHER WON'T LEAVE ME ALONE (score 2). 18. Fable: BOOK (Q) BOOK (score illegible). 19. Hazardous: POISONOUS (Q) IT COULD KILL YOU (score 2). 20. Migrate: TO FLY SOUTH (score illegible). 21. Stanza: SOMETHING TO DO WITH PAPER AND WRITING (score illegible). 22. Seclude: IN, LIKE INCLUDE SOMEBODY (score 0). 23. Mantis: ANIMAL (score illegible). 24. Espionage: MISSION (Q) A SPY WITH A MISSION FOR THE GOVERNMENT (score illegible). 25. Belfry: A CHURCH TOWER LIKE THING (score 1). 26. Rivalry: DK (score illegible). 27. Amendment: NR (score 0). 28. Compel: DK (score 0). 29. Affliction: YOU HURT SOMEBODY, YOU INFLICT SOMETHING ON THEM (score 0). 30. Obliterate. 31. Imminent. 32. Dilatory. 9. COMPREHENSION: 1. Cut finger: LET IT BLEED (Q) PUT A BAND-AID ON IT (score illegible). 2. Find wallet: CALL A RADIO STATION (score 1). 3. Smoke: CALL THE FIRE DEPARTMENT (Q) CALL THE POLICE (score 1). 4. Policemen: TO STOP CRIME (Q) HELP PEOPLE WITH PROBLEMS THAT AREN'T ILLEGAL (score 2). 5. Lose ball: BUY ANOTHER ONE TO REPLACE THE ONE I LOST (score 2). 6. Fight: BEAT THEM UP, NOBODY IS GOING TO HIT ME, LET ME TELL YOU (score 0). 7. Build house: COOLER IN SUMMER AND WARMER IN WINTER (score illegible). 8. License plates: IDENTIFY THE CAR AND TO REPORT ACCIDENTS WHEN YOU HAVE TO DO SO (score 1). 9. Criminals: AS AN EXAMPLE, FOR PUNISHMENT, SO THEY WON'T DO IT AGAIN, BAD PEOPLE (score 2). 10. Stamps: TO PAY FOR POSTAGE (score 2). 11. Inspect meat: TO PROTECT THE MEAT FROM BEING BAD FOR PEOPLE (score illegible). 12. Charity: TAX DEDUCTIBLE, SAFER FOR YOU TO DO SO (score illegible). 13. Secret ballot: IT IS THE RIGHT WAY (score illegible). 14. Paperbacks: CHEAPER, NOT SO BAD IF YOU LOSE A PAPERBACK BOOK (score 1). 15. Promise: IT IS A MATTER OF TRUST AMONG PEOPLE (Q) A MATTER OF CONSCIENCE (score 2). 16. Cotton: SOFT AND WARM (Q) CONVENIENT (Q) COMFORTABLE AND COOL (score illegible). 17. Senators (score illegible).
WISC-R II 9. COMPREHENSION: 1. Cut finger: TREAT IT (Q) TREAT IT WITH THINGS AT HOME. 2. Find wallet: TURN IT INTO LOST AND FOUND. 3. Smoke: ASK AN ADULT TO HELP AND GO OVER MYSELF. 4. Policemen: CATCH BAD PEOPLE, ARREST CROOKS, ENFORCE LAWS. 5. Lose ball: LOOK ALL OVER FOR THE BALL (Q) PAY FOR IT IF LOST. 6. Fight: LET HIM BE (Q) AND GET MAD. 7. Build house: COOLER (Q) FIREPROOF. 8. License plates: SO THE GOVERNMENT CAN KEEP TRACK OF CARS (Q) WON'T GO TO JAIL. 9. Criminals: BAD PEOPLE AND AREN'T NICE. 10. Stamps: IT'S THE LAW. 11. Inspect meat: SO PEOPLE CAN GET MEAT (Q) IT MIGHT BE BAD. 12. Charity: CHARITY NEEDS IT MORE I THINK. 13. Secret ballot: SO POLICE DO NOT CATCH YOU (Q) DEMOCRATIC WAY. 14. Paperbacks: IT IS OKAY TO BEND THEM HOWEVER YOU WANT (Q) DK. 15. Promise: PEOPLE ARE DEPENDING ON YOU TO KEEP YOUR WORD. 16. Cotton. 17. Senators. WISC-R II KEY 9. COMPREHENSION: 1. Cut finger: TREAT IT (Q) TREAT IT WITH THINGS AT HOME (score 1). 2. Find wallet: TURN IT INTO LOST AND FOUND (score 2). 3. Smoke: ASK AN ADULT TO HELP AND GO OVER MYSELF (score 1). 4. Policemen: CATCH BAD PEOPLE, ARREST CROOKS, ENFORCE LAWS (score illegible). 5. Lose ball: LOOK ALL OVER FOR THE BALL (Q) PAY FOR IT IF LOST (score illegible). 6. Fight: LET HIM BE (Q) AND GET MAD (score 2). 7. Build house: COOLER (Q) FIREPROOF (score illegible). 8. License plates: SO THE GOVERNMENT CAN KEEP TRACK OF CARS (Q) WON'T GO TO JAIL (score illegible). 9. Criminals: BAD PEOPLE AND AREN'T NICE (score illegible). 10. Stamps: IT'S THE LAW (score illegible). 11. Inspect meat: SO PEOPLE CAN GET MEAT (Q) IT MIGHT BE BAD (score illegible). 12. Charity: CHARITY NEEDS IT MORE I THINK (score illegible). 13. Secret ballot: SO POLICE DO NOT CATCH YOU (Q) DEMOCRATIC WAY (score illegible). 14. Paperbacks: IT IS OKAY TO BEND THEM HOWEVER YOU WANT (Q) DK (score illegible). 15. Promise: PEOPLE ARE DEPENDING ON YOU TO KEEP YOUR WORD (score illegible). 16. Cotton. 17. Senators.
WISC-R RECORD FORM (Wechsler Intelligence Scale for Children-Revised) NAME [illegible] AGE [illegible] SEX [illegible] ADDRESS PARENT'S NAME SCHOOL GRADE PLACE OF TESTING TESTED BY REFERRED BY WISC-R STUDY INSTRUCTIONS Dear Psychologists, There are four subtests to be scored in this protocol. Please score the responses of each subtest, giving a total after you have scored each one. This protocol does contain some administrative errors. For example, some responses may not have been probed (Q) properly. You may wish to indicate whether you thought the probing was appropriate or not by writing or briefly commenting on the sheets. I am interested mainly in your decisions/judgments in arriving at a score for those responses, not in the score itself. Some abbreviated responses present in the protocol are: SP = Spring, SU = Summer, W = Winter, F = Fall; DK = don't know, and NR = no response. Once again thank you for your time volunteered. THE UNIVERSITY OF BRITISH COLUMBIA Department of Educational Psychology and Special Education, Faculty of Education, 2123 Main Mall, Vancouver, B.C. Canada V6T 1Z4 Tel: (604) 822-8229 Fax: (604) 822-3302 WISC-R STUDY Psychologists, Please find enclosed the following materials: Two consent forms; A background information form; Four fabricated WISC-R Verbal subtests. Please complete the package, and return all materials to the envelope. However, please retain the second consent form for your own records. If you have any questions regarding the study procedures or general comments, please feel free to contact me at 737-0866, or Dr. Bill McKee at 822-6572. We will be pleased to hear from you. All information will be treated as strictly confidential.
Thank you for your time volunteered. Yours sincerely, Josette Perot CONSENT FORM (COPY TO BE RETAINED BY PARTICIPANT) Project Title: Psychologists' Scoring on the WISC-R Verbal Scales Subject no. Thank you for cooperating in this project regarding the differences in psychologists' decision-making in the scoring of Verbal responses on the WISC-R. The data obtained from this study will provide helpful scoring information for other psychologists. Secondly, participation in this study will have a significant bearing on the training of student psychologists. Therefore, your experience in the WISC-R testing practice will be an important contribution to this study. Participation in this study will require you to provide some personal descriptive information, and score the Verbal Scales of a WISC-R protocol. Arrangements can be made to score the WISC-R protocol at your convenience. The estimated time for this exercise is no more than 30-40 minutes. In a second session, a subsample of psychologists will be asked to participate in a short talk-aloud exercise while scoring some items. The estimated time of the second session is approximately 30 minutes. Again, arrangements will be made in accordance with a time and place that is convenient for you. Participants will be asked to provide a code name of their own choosing and a phone number only as a means for contact. Additionally, all participants will be provided with a code number to preserve their identities since confidentiality is of the utmost concern in this study. It is the right of any subject to refuse to participate or withdraw from the study at any time. Such a decision will neither jeopardize nor influence you in any way. Please indicate your willingness to participate in this project by providing your consent below. Please also sign and retain this copy for your records. If you have any questions or enquiries about this study please feel free to contact me at this number: 737-0866. Or, you may contact Dr. Bill McKee at 822-6572.
We will be pleased to hear from you. Thank you for your cooperation. Josette Perot (Master's student, U.B.C.) I consent to participate in the study of psychologists' decision-making, and agree to allow the use of data acquired in this study, and possibly recorded data for research purposes. I acknowledge that I have received a copy of this consent form. Signature Date If you are selected for the second session please indicate whether you are willing to participate: yes or no (please circle appropriate response). Please retain this copy for your records. BACKGROUND INFORMATION Directions: Please provide the following information about yourself. Your responses will be coded and used to provide information descriptive of participants. This information, as well as all other data you provide during this project, will be treated as confidential. Subject number: 1. Please provide a code name of your choice, and a telephone number where you may be reached. (You may use your first name if you wish; this information will only be used to contact you if you wish to participate in the study.) Name Telephone number 2. Years of college: 3. Highest degree earned: 4. Number of years as a psychologist: 5. Which of the following best describes your professional training (please check the appropriate spot): a) School psychologist b) Educational psychologist c) Counseling psychologist d) Psychometrician e) Special educator f) Other (please specify) 6. Please indicate the appropriate date if you have received formal training (e.g., coursework, supervised administration) on the administration and scoring of each of the following instruments: WISC-R, WPPSI-R, Stanford-Binet IV 7. Name of school district: The information you provided will remain confidential. Thank you for your cooperation. APPENDIX B SCRIPT FOR THINKING-ALOUD PROTOCOL INTRODUCTION: 1) \"You have scored the WISC-R many times in the past. What we are going to do during this time is not much different from your previous scoring experience.
You are going to score some items on a Verbal Scale. The only difference is that I am going to ask you to think aloud as you do this scoring. Sometimes when we are working on a problem alone we say out loud whatever passes into our head. Here let me show you. Let's take this example: TROUT:FISH :: WHALE: ____ Example of experimenter's think-aloud thoughts to above problem: - They all live in the water - a trout is a kind of fish - a whale is a kind of fish too, but at the same time it is different from a fish, in the sense that it does not really belong to the fish class - if I remember correctly from Biology class in high school a whale is a warm-blooded animal, I think it has mammary glands too - a whale therefore belongs to the general class of mammals - I think that's the correct choice - yes mammal. 2) WARM-UP TASK FOR SUBJECT: Before you begin the Verbal Scale think-aloud exercise, why don't you practice on this warm-up exercise first. Say whatever comes into your head as you try to think about the response. This is just to get you accustomed to thinking aloud. CLOCK:TIME :: YARDSTICK: ____ HAIR:SCALP :: TOOTH: ____ FRAGILE:BREAK :: FLEXIBLE: ____ 3) When the subject is ready for the actual exercise: \"Please say your thoughts as you decide on what specific point value to award the response to the item. Your thoughts can be simple or complex. You may give reasons for your choices. Some responses have a (Q) beside them. The (Q) means that these particular responses have been probed. You may or may not agree with whether the probing was appropriate. You may make comments on this too. You have the freedom to verbalize, or make reference to whatever you feel will help you make the best choice. No matter how irrelevant it may seem, I am interested in all that you have to say.
Please begin.\" 4) Final end of session probing question: \"The information that you have volunteered will be very valuable, however, to summarize this experience, can you tell me what strategies generally help you most in scoring responses?\" APPENDIX C Segmented Units for Subject #13 Item 1: Okay on the first one I would give it a one, it was appropriately queried (recommendations/evaluations). The additional information doesn't add anything that I feel tells me that the child knows any more than what he knew in the first place (self-explanation). Item 2: The second answer is straight forward, it's the same answer as in the book (manual). Item 3: I think those are basically the same type of answer (self-explanation). I would have queried to see if there is an additional type of answer because in my opinion those are the same (recommendations/evaluations). ...but I would have queried in the actual testing situation (recommendations/evaluations). Item 5: I'd give that a two because he explained it (self-explanation). Item 6: And number six, I'd give that a two because his first answer is correct and it should not have been queried (recommendations/evaluations). ...and while the second answer is inadequate it should not have been asked in the first place (recommendations/evaluations). Item 7: And number seven is a two, he's given two different answers (memory). Item 8: So the government can keep track of cars, um...I'm not really sure what he means by that, um...it's very similar to the way the government keeps a record of vehicles (manual). I don't know if that would have been given to clarify what this child was saying (monitoring statements). ...that kind of response is not meant to be queried (recommendations/evaluations). Item 9: Number nine would be a zero because bad people is a zero answer (manual). Item 10: And number ten that is a zero answer but should have been queried because there is no query you'd have to treat it as a zero (recommendations/evaluations). Item 11: ...that's a zero
response and shouldn't have been queried, although that was an appropriate answer the first one was not (recommendations/evaluations). Item 12: I would stop as soon as I hit the ceiling, I would not mark the rest of them (general metastrategic statements). Protocol for Subject #13 *(E): Okay the first thing that I want to say is that you've scored the Comprehension subtest many times, and today is not much different except that I'm just going to ask you to think aloud or talk aloud as you do this scoring so that I can follow what you are doing and your thoughts as you approach the task of scoring. But before we begin that and just to get you used to thinking aloud I've prepared a few cards, and what I'm going to do I'm going to practice one just to show you how it's done, and then we can do a couple more. This is just to let you see how I solve the problem. **(S): Okay. (E): Okay when I look at this card trout is to fish as whale is to something, I know that they're all sea animals. But if I remember correctly from Biology class a long, long time ago, a whale is a little bit different from the other two because it's warm blooded and it has mammary glands. So I think that even though a whale is to a fish as a trout is to a fish a whale is from the mammal class. So that's just my example of what goes through my head as I'm thinking about the problem. And, I'll just ask you to try one or two before we begin. (S): Hair is to scalp as tooth is to...okay hair sits on the scalp so I'd say tooth is to mouth, it's in the mouth. (E): Good, okay, just one more. (S): Clock is to time as yardstick is to measurement because a yardstick measures something. (E): Good, okay before we begin I'm just going to read you a few specific instructions. Okay what I want you to do is just to please say your thoughts as you decide on what specific point value to award the response to the item. Your thoughts can be simple or complex. You may give reasons for your choices. Some responses have a (Q) beside them.
The (Q) means that these particular responses have been probed or queried. You may or may not agree with whether the probing was appropriate. You may make comments on this too. You have the freedom to verbalize or make reference to whatever you feel will help you make the best possible choice. No matter how irrelevant it may seem, I am interested in all that you have to say. It's important to me. So, if you don't have any questions you may begin and start scoring, and just tell me whatever comes into your head as you are doing this task. (S): So you are only worried about what I'm thinking not what's written here so I don't really need to read those as I'm going through. *Experimenter **Subject (E): That's right. (S): Okay. (E): If it helps to read you can. (S) [Number one]: Okay on the first one I would give it a one it was appropriately queried, but the additional answer doesn't add anything that I feel tells me that the child knows anymore than what he knew in the first answer. So I'd only give that a one. [Number two]: The second one is straight forward. It's the same answer as in the book (gives a two). [Number three]: And number three, um, I think that those are really basically the same type of answer, um, I would have queried to see if there is an additional type of answer because in my opinion those are the same.
So I would just have to give a one because that second answer is not there, but I would have queried it in the actual testing situation. [Number four]: And number four I would have given it a one and I would have queried that as well to look for a second answer because they all fall within the same category. [Number five]: Number five I'd give that a two because he explained it, if he couldn't find it after the query then he'd pay for it...(gives a two). [Number six]: And number six, I'd give that a two because his first answer is correct and it should not have been queried, and while the second answer is inadequate it should not have been asked in the first place so his first answer is worth a two. [Number seven]: And number seven is a two, he's given two different answers. [Number eight]: So the government can keep track of cars, um...I'm not really sure what he means by that, um...it's very similar to the way the government keeps a record of vehicles, um, the query not having given the test myself, the query, I don't know if that would have been given to clarify what this child is saying but even if it was that kind of response is not meant to be queried.
So, I think I would give a one for the first part, the won't go to jail doesn't mean anything. [Number nine]: Number nine would be a zero because bad people is a zero answer. [Number ten]: And number ten that is a zero answer but should have been queried because there is no query you'd have to treat it as a zero. [Number eleven]: Okay again in number eleven, that's a zero response and shouldn't have been queried, although that was an appropriate answer the first one was not. [Number twelve]: Number twelve is a zero response again...now that would be a ceiling, do you still want the other ones marked? (E): If you would usually stop there. (S): I would stop as soon as I hit a ceiling, I would not mark the rest of them. (E): Alright, so we can stop here, but just to kind of summarize your experience, if you are presented with the kinds of difficult type response that you're not sure about are there any general kinds of strategies that you try to fall back on? (S): What I do is that I read through all the responses that are given in the manual and try and identify the quality...first of all does that fit into any one of the responses that are given in the manual, and if it does not then I try and identify the quality of the answer with what I think the intent of the quality was in the manual.
And, usually I'm only marking these if I'm the one who's given it, so I try and recall what the child was talking about that I might not have written everything down, but I do try and write it all down, but mostly I just weigh it against the quality in the book versus the quality of what the child said. (E): Okay, thank you very much. (S): That's all. (E): That's it. Segmented Units for Subject #2 Item 1: So...that doesn't really expand...so that should be a one still (self-explanation). Item 2: I'd give that a two, go over myself, go see what's the matter (manual). Item 4: So we need two reasons (planning). Arrest crooks is the same one...and enforce laws...is the same category so we need to ask for another response on that one (planning). Item 5: Okay, look all over for the ball is part of a one, yes and there's a (Q), pay for it if lost makes it a two (manual). Item 6: Let him be is two points, so that doesn't need to be cued (recommendations/evaluations). And if it's cued and he spoils it with get mad...you see he spoiled it and I'm just not sure what to do with that, so what do I know about spoiling responses (monitoring statements). Actually I'm curious, I'm curious [about spoiling rules] (general metastrategic statements). Item 8: But that's interesting you see because keeping track of cars you might even cue 'cause it really...what it says here is, um, the way the government keeps a record of the vehicles is different (manual). Item 9: Bad people are criminals, or bad people aren't nice is zero...and that's a zero with no question (memory). Item 10: It should be questioned (recommendations/evaluations). Item 11: That should not have been questioned (recommendations/evaluations). I'm going to check again though (monitoring statements). ...I'll just check out my thinking...see where my thinking is coming from (monitoring statements). Item 12: Charity needs it more I think, well that's another one that's scorable.
That's just a zero (memory). Item 13: So people do not catch you, what does that mean (monitoring statements)? I would question that one just like they did (recommendations/evaluations). It's the democratic way doesn't give you any of the criteria (manual). I would have questioned that one. That was not clear (monitoring statements). Item 14: It's okay to bend them however you want...that's one to question because...that's under the cheaper category...it's okay to bend it or fold it (manual). ...that one should not have been questioned (recommendations/evaluations). Item 15: People are depending on you to keep your word...it's a one, and now we have to ask for another [response] (planning). Protocol for Subject #2 *(E): What I'd like to say is that you've scored the WISC many times in the past but today we're not going to do anything that much different except I'm going to ask you to think aloud as you do this scoring, but just in case you're not quite aware of what a think aloud is, I'm going to try a little exercise. **(S): Certainly, sure. (E): I brought along just a couple of cards of some analogy type questions, and I'm going to do a little think aloud myself which will take just a few seconds. What it involves is saying whatever comes into your mind, for instance: trout, fish, whale, it's an analogy question...and I'm going to tell you whatever flows into my mind.
First of all, they're all from the fish family, um, but at the same time a whale is just a little bit different if I remember from Biology class in high school because a whale is warmblooded and it has mammary glands, therefore, it's not quite from the fish class but I would say it's from the mammal class, so I think the answer is mammal. Trout is to fish, as whale is to mammal. Now I just want you just to try one more on your own, just to get accustomed to saying these things out loud. (S): Okay, so I should try it like you did? (E): Sure, yes whatever comes into your mind. (S): Okay, um, so a clock is to time as a yardstick is to...okay, a clock, a clock tells time, a yardstick measures, so measure? (E): Okay, just one more before we begin. Whatever comes into your mind as you are thinking about the problem. (S): Hair is to scalp, hair is on the scalp, a tooth is in the mouth. Mouth. (E): Good. Okay now I'll just read you a few written instructions I have before we begin the actual scoring. And, what I want you to do is just to please say your thoughts as you decide on what specific point value to award the response to the item. Your thoughts can be simple or complex. You may give reasons for your choices. Some responses have a (Q) beside them. The (Q) means that these responses have been probed. You may or may not agree with whether the probing was appropriate. You may make comments on this too. You have the freedom to verbalize, or make reference to whatever you feel may help you make the best choice. And, no matter how irrelevant it may seem, I am interested in all that you have to say. *Experimenter **Subject (S): Sure. (E): Okay, so you can begin when you're ready, whatever comes into your mind as you're scoring. I'll just follow along. (S): So, I'll just talk out loud. (E): Sure, yes. (S): [Number one]: Okay. So, um, treat it, you treat it.
Treat it is a cue, so that's right, you treat it with things at home. So...that doesn't really expand on what so that would be a one still. [Number two]: Find a wallet, okay. Ah, what do you do, turn it into the lost and found, ah, yes. Turn it into the lost and found is a two points. So that's a two. [Number three]: Smoke, okay. Um, what should you do if you see thick smoke? Ask an adult to help is a one, and go over myself...um...go see what's the matter. I'd give that one a two, go over myself, go see what's the matter...okay, so I'd give that a two...um. [Number four]: Okay, number four. What are some reasons why we need policemen? So, we need two reasons, catch bad people...ah, is a one, is one of them. Arrest crooks is the same one...and enforce laws...is the same category so we need to ask for another response on that one. And, so that is a one as it is. [Number five]: And, okay. What's the thing to do if you lose a ball that belongs to one of your friends? Okay, look all over for the ball is part of a one, yes and there's a (Q), pay for it if lost makes it into a two. [Number six]: What's the thing to do if a boy much smaller than yourself starts a fight with you? Let him be is a two points, so that doesn't need to be cued, and if it's cued and he spoils it with get mad...you see he spoiled it and I'm just not too sure with what I would do with that, so what do I know about spoiling responses, I think what would happen is that he gets a zero when he spoils it like that...yah, I think that he spoiled it and it's going to have a zero on that one. Actually, I'm curious now, I'm curious...[looks for rules about spoiling in the manual], a zero because he spoiled it, yah okay so that must have been into, into my repertoire here. Things to do... [Number seven]: Okay, seven. In what ways is a house built of brick or stone better than one built of wood?
Okay, it's cooler, that's one, and I guess (Q) is the same as an (R) here is it, I'm not sure, there's no R's here. (E): You would ask for another response. (S): Yes, so would that mean, are you signalling that by a (Q)? (E): You can change it if you want. (S): No, it doesn't matter, is that a signal? (E): That's a (Q), yes. (S): So, and it's fireproof is good too, so that's two. So that's two points. [Number eight]: License plates. So the government can keep track of cars, um, um...that's one, and won't go to jail isn't one. So, it's a one. But that's interesting you see because keeping track of cars you might even cue 'cause it really, what it says here is, um, the way the government keeps a record of the vehicles is different, so to clarify that I might cue and then say, now tell me another way, so that's what I'm just wondering about there. So, anyways that would be a one. [Number nine]: Bad people are criminals, or bad people aren't nice is zero...and that's just a zero with no question, it's, that's a zero. [Number ten]: Stamps, it's the law is a zero. Oh no, yes it is, it's a zero, and it should be questioned, it should be questioned. And, but as it is there it's a zero. [Number eleven]: Okay, so people can get meat. No...it's just a zero, and that didn't need to be, it shouldn't have been questioned. When it's questioned and it might be bad, what do I do there? So the student wasn't really allowed to be questioned there under the standardized way, so I don't think I can score that, I think that's a zero. And, I'm going to check again though because this is good review for me.
So, if you don't mind me taking the time...
(E): Sure...
(S): ...that would be my thinking on that one, and then I'll just check out my thinking...see where my thinking is coming from...[checks manual]...this is actually an interesting, an interesting exercise to do.
(E): A few people told me they've enjoyed it.
(S): Yes, it is interesting, because it is interesting to look at why do I think that, you know...um....so if...um, [reads manual out loud] the score may stay at zero or it may be raised to one or even two depending on the quality of the child's elaboration - so, it's to the child's advantage if you make this kind of mistake because you can then raise the score. Okay, so he will get a two because it might be, no it might be bad is a one, that's a one isn't it...I think it's one point for that one.
[Number twelve]: Charity needs it more. More, charity needs it more I think, well that's another one that's scorable. That's just a zero, and you just leave it at that [checks manual].
[Number thirteen]: Okay, secret ballot. So people cannot catch you what does that mean...so people do not catch you...so the police don't catch you, so people do not catch you... I would question that one just like they did, and um it's the democratic way, doesn't give you a...it doesn't give you what any of those criteria...it's the democratic way, now that's a one there, okay so, yes, that was a good question because then you get a one point out of that...now, I would have questioned that one. That was not clear.
[Number fourteen]: Is it okay to...okay, paperbacks. It's okay to bend them however you want...um, um, that's one to question because...that's under the cheaper...it's okay to bend or fold it, it's okay to bend or fold it, doesn't have a question though, so you can bend it however you want, so, that one shouldn't have been questioned 'cause that's one that's already there, that's already worth one point. And, then, oh I'm sorry the (Q) is an R here as well, that's confusing for me. 
So when that becomes an R, so then that's when I would say tell me another way, so that's an R would be tell me another way. And, don't know so it's a one.
[Number fifteen]: People are depending on you to keep your word...it's a one, and now we have to ask for another, no this isn't a two, this is the promise. So the promise is good, that's a one point, people are depending on you to keep your word is a one. So, now we've stopped, and we don't have a ceiling. So, I'm not sure why we've stopped here. It needs to go on to sixteen and seventeen. Yes.
(E): So you would go on.
(S): I would go on.
(E): Okay, just to summarize your experience, what are any general strategies you use in the face of very difficult type responses?
(S): Yah, I'm just aware of what I did do. First of all, I think about it, give it my...I come up with, well I come up with an idea well I think of what that is, and then I would go back and have a look. If I'm really, if it's one I'm really stuck on, look back and double-check the criteria first the ones right by the answers, and then back to the book, yah, yah. You don't get very many where, I don't have experience questioning where, where it's not a questioning response. So, that's why that's an unusual experience because for me I always have that part of my book open to, even after all these years. I always use this, I don't take anything for granted. I always have this open. So, I would check it then. So, that's a difficult one, um, but if I did, then I would have to go through just what I did, I would have to go through and see if that's the way that it was right. Like I mean this was interesting for me, number eleven, but then because it's the opposite, the other way if you question it they can lose it, but if you also, if you question it they can also gain it so really that's interesting for me because there's nothing to lose by questioning, really in that case, is there, if you're unclear. 
If you question and they've given you a response that's accurate, you question they can below [lower] that. But, if they aren't or they're not clear or they've given you an inaccurate response and you question and then get it, what that suggests to me is that you're better off to question than not to question. Yes, so that's interesting.

APPENDIX D

Table D1
Frequencies (Percentages) of Verbalizations for Non-Problematic Items

Item 1
Category                            Frequency (Percent)
Memory                              7 (29)
Self-explanations                   5 (21)
Monitoring Statements               3 (13)
General Metastrategic Statements    3 (13)
Manual                              3 (13)
Recommendations/Evaluations         3 (13)
Planning                            0
n=24

Item 2
Memory                              6 (56)
Manual                              3 (33)
Monitoring Statements               1 (11)
Planning                            0
Self-explanations                   0
General Metastrategic Statements    0
Recommendations/Evaluations         0
n=9

Item 9
Manual                              5 (29)
Memory                              4 (24)
Monitoring Statements               4 (24)
Recommendations/Evaluations         2 (12)
Monitoring Statements               1 (6)
Planning                            1 (6)
General Metastrategic Statements    0
n=17

Table D2
Frequencies (Percentages) of Verbalizations for Difficult Items

Item 4
Manual                              8 (30)
Planning                            5 (19)
Self-explanations                   5 (19)
Recommendations/Evaluations         4 (15)
Monitoring Statements               3 (11)
General Metastrategic Statements    1 (4)
Memory                              1 (4)
n=27

Item 6
Recommendations/Evaluations         8 (36)
Memory                              5 (23)
General Metastrategic Statements    2 (9)
Monitoring Statements               1 (5)
Manual                              6 (27)
Planning                            0
Self-explanations                   0
n=22

Item 11
Recommendations/Evaluations         6 (33)
Manual                              5 (28)
Monitoring Statements               4 (22)
Planning                            1 (6)
Self-explanations                   1 (6)
General Metastrategic Statements    1 (6)
Memory                              0
n=18"@en ; edm:hasType "Thesis/Dissertation"@en ; vivo:dateIssued "1992-05"@en ; edm:isShownAt "10.14288/1.0103795"@en ; dcterms:language "eng"@en ; ns0:degreeDiscipline "Special Education"@en ; edm:provider "Vancouver : University of British Columbia Library"@en ; dcterms:publisher "University of British Columbia"@en ; dcterms:rights "For non-commercial purposes only, such as research, private study and education. 
Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use."@en ; ns0:scholarLevel "Graduate"@en ; dcterms:title "Cognitive strategies and heuristics underlying psychologists’ judgments on the WISC-R verbal scales : a protocol analysis"@en ; dcterms:type "Text"@en ; ns0:identifierURI "http://hdl.handle.net/2429/1639"@en .