UBC Theses and Dissertations


Expertise in nurses' clinical judgments: the role of cognitive variables and experience. Christie, Lynda A. (1996)


Full Text

EXPERTISE IN NURSES' CLINICAL JUDGMENTS: THE ROLE OF COGNITIVE VARIABLES AND EXPERIENCE

by

LYNDA A. CHRISTIE

BSN, The University of British Columbia, 1967
MA (Education), Simon Fraser University, 1986

A DISSERTATION SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES, DEPARTMENT OF EDUCATIONAL PSYCHOLOGY AND SPECIAL EDUCATION

We accept this dissertation as conforming to the required standard

UNIVERSITY OF BRITISH COLUMBIA
August 1996
© Lynda A. Christie, 1996

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

The University of British Columbia
Vancouver, Canada

ABSTRACT

Many researchers have failed to find a relationship between experience and judgment accuracy. In this study the purpose was to understand the relationship between experience and expertise in clinical judgment. Common sense suggests that experienced subjects make better quality judgments, compared to novices. Clinical judgments, however, are ill-structured and characterized by uncertainty; they take place in a dynamic context, with delayed or nonexistent feedback and are difficult to learn. Cognitive operations that translate "cues" (such as risk factors, signs, and symptoms) into judgments are not fully understood.
Cognitive constructs (conceptual structure, sensitivity to patterns in data, and judgment process) and individual differences in age, education, and experience were explored to identify their relationship to judgment expertise. Indicators of judgment quality were: accuracy, consistency, latency, confidence, calibration, and knowledge accessibility. In Phase 1 of this study, cues were identified that best predicted healing time for 258 surgical patients with abdominal incisions. In Phase 2, the subjects were 36 nurses with a range of experience caring for surgical patients. Generating both quantitative and qualitative data, subjects made judgments about incisional healing on the basis of information from actual patients. Multidimensional scaling was used to reveal conceptual structure, and lens modeling was applied to assess sensitivity to broad patterns. An information board task with think-aloud protocols demonstrated judgment process. The selection of tasks was based on their analysis- or intuition-inducing features, using K. R. Hammond's (1990) cognitive continuum theory. Experience accounted for only a small proportion of variance in performance, whereas confidence in judgment was more strongly related to experience. Taken together, these findings replicated previous research. Protocol data showed that metacognition, knowledge accessibility, and reflectivity increased with experience. Conceptual structure predicted judgment accuracy under intuitive conditions. Support was found for Dreyfus and Dreyfus' (1986) hypothesized transition in cognition, from deliberate processing of discrete cues, to intuitive processing of patterns of cues encoded in memories for specific cases. This study has theoretical significance by adding to knowledge about clinical judgment, and by increasing understanding of cognitive changes associated with expertise.
This study has practical significance in providing direction for the development of teaching methods aimed to increase learning from experience in probabilistic contexts.

TABLE OF CONTENTS

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements
Dedication

CHAPTER 1 INTRODUCTION TO THE STUDY
    Background
    The Context
    The Problem Relating Experience and Judgment Accuracy
    Explanations for the Lack of Relationship
    Conceptual Structure
    Sensitivity to Patterns in Data
    Judgment Process
    Assessment of Experience
    Summary of the Problem
    Purposes of the Study
    Significance of the Study
    Research Questions

CHAPTER 2 REVIEW OF THE LITERATURE
    Section A: Expertise In Clinical Judgment
        Characteristics of Expertise
        Dreyfus' Model of Expertise
        Characteristics of Clinical Judgment
        Lens Model for Clinical Judgment
    Section B: Summary of Relevant Research
        Judgment Accuracy as an Indicator of Expertise
        Consistency of Clinical Judgment
        Additional Indicators of Expertise
    Section C: Findings from the Literature
        Conceptual Structure
        Sensitivity to Patterns in Data
        Judgment Process
        Individual Differences in Education and Age
        Individual Differences in Experience
    Section D: Additional Factors Influencing Judgment Results
        Heuristics and Cognitive Biases
        Experience and the Use of Heuristics
        Research Assumptions and Design Factors
    Section E: Summary

CHAPTER 3 RESEARCH QUESTIONS, METHODS, AND ANTICIPATED PATTERNS OF RELATIONS
    Section A: Research Questions, Design, and Procedures
        Overall Research Question
        Specific Research Questions
        Design and Procedures
        Sample Size
    Section B: The Lens Model
        Description of the Lens Model
        Critical Discussion of Linear Modeling
    Section C: Research Tasks, Conditions, and Measures
        Overview of Methods
        Research Tasks and Conditions
        Measures of Cognitive Constructs
        Indicators of Expertise
        Measures of Experience
    Section D: Anticipated Patterns of Relations
    Section E: Summary

CHAPTER 4 ANALYSES AND RESULTS
    Section A: Lens Model Analyses [Phase 1 of the Study]
        Data Screening
        Evaluation of Assumptions
        Variables Predictive of Healing Time
        Relative Importance of Variables
    Section B: Results Related to Indicators of Expertise [Phase 2 of the Study]
        Description of Nurse Subjects
        Overview of Data Analysis
        Judgment Accuracy
        Confidence
        Judgment Consistency
        Calibration of Confidence
        Judgment Latency
        Knowledge Accessibility
    Section C: Results Related to Cognitive Constructs and Individual Differences
        Conceptual Structure
        Sensitivity to Overall Patterns in Data
        Judgment Process
        Age, Education, and Experience
    Section D: Results Related to Multivariate Analyses
        Results Related to Research Conditions
        Results Related to Methods for Confidence Assessment
    Section E: Summary
        Prediction of Judgment Expertise
        Summary of Patterns of Relations

CHAPTER 5 DISCUSSION
    Section A: Discussion of Major Findings
        Research Question 1
        Research Question 2
        Research Question 3
        Research Questions 4 and 5
        Research Question 6
    Section B: Discussion Related to Methods
        Findings Related to Methods
        Findings Related to Research Traditions
        Limitations of the Study
    Section C: Contribution to Theory Development
    Section D: Implications for Education and Practice
    Section E: Summary
        Summary Related to Research Questions
        Recommendations for Further Research
        Conclusion

REFERENCES

Appendix A: Glossary of Terms
Appendix B: Examples of Lens Model Cases
Appendix C: Task Presentation Sequence
Appendix D: Knowledge Accessibility Scale
Appendix E: Conceptual Coherence Scale
Appendix F: Stress and R-Squared for Multidimensional Scaling of Similarity Judgments
Appendix G: Lens Model and Accuracy Measures for Normal Order and Reverse Order Cases
Appendix H: Regression Coefficients (Standardized) for Cues for Normal and Reverse Order Cases
Appendix I: Detailed Results Related to the Information Board Task
Appendix J: Judgment Process Scale

LIST OF TABLES

Table 3-01: Means and Standard Deviations for the Dependent Variable, Healing Time, and Regression Model R-Squared for Normal Order, Reverse Order, and Cross-Validation Cases
Table 3-02: Outline of Tasks, Measures, and Associated Constructs
Table 4-01: Intercorrelations Between Significant Predictors and Log Transformation of Healing Time
Table 4-02: Indices of Variable Importance for Significant Predictors of Healing Time
Table 4-03: Classifications of Subjects According to Experience Level, Age Category, and Education
Table 4-04: Means and Standard Deviations for Accuracy Assessed by Error Variance and Percent Correct for Normal Order and Reverse Order Cases, by Experience Level
Table 4-05: Means and Standard Deviations for Average Confidence (Concurrent Method) and Overall Confidence (Retrospective Method) for Normal Order and Reverse Order Cases, by Experience Level
Table 4-06: Nonparametric Summaries of Consistency for Repeated Judgments of Healing Time and Repeated Judgments of Confidence (Concurrent) for Normal Order and Reverse Order Cases, by Experience Level
Table 4-07: Means and Standard Deviations for Two Methods of Assessing Judgment Calibration for Normal and Reverse Order Cases, by Experience Level
Table 4-08: Means and Standard Deviations for Two Methods of Assessing Judgment Quality for Normal Order and Reverse Order Cases, by Experience Level
Table 4-09: Means and Standard Deviations for Confidence Averaged Over Correct and Incorrect Judgments and for Confidence-Accuracy Correlations for Normal Order and Reverse Order Cases, by Experience Level
Table 4-10: Means and Standard Deviations for Latency (in Seconds) Averaged Over Cases for Normal Order and Reverse Order Cases, by Case Sequence and by Experience Level
Table 4-11: Means and Standard Deviations for Knowledge Accessibility Scale, by Experience Level
Table 4-12a: Intercorrelation Matrix for Experience and Indicators of Expertise Based on Performance on Normal Order Cases
Table 4-12b: Intercorrelation Matrix for Experience and Indicators of Expertise Based on Performance on Reverse Order Cases
Table 4-13: Means and Standard Deviations for Conceptual Coherence Scale from Protocol Data, by Experience Level
Table 4-14: Means and Standard Deviations for Average Distance Between Concept Pairs in Selected Clusters, by Experience Level
Table 4-15: Nonparametric Measures of Performance on the Lens Model Task for Normal Order and Reverse Order Cases, by Experience Level
Table 4-16: Means and Standard Deviations for Measures Derived from Lens Model Analyses for Normal Order and Reverse Order Cases, by Experience Level
Table 4-17: Means and Standard Deviations for Accuracy of Cross-Validation of Judgment Policies (Based on Error Variance Measures and Percent Correct Measures), by Experience Level
Table 4-18: Means and Standard Deviations for Data Search Measures on the Information Board Task and Number of Subjects with Correct Judgments, by Experience Level
Table 4-19: Means and Standard Deviations for Judgment Process Scale, by Experience Level
Table 4-20: Means and Standard Deviations for Measures of Calibration of Healing Time and Omega-Squared from the Orthogonal Judgment Task, by Experience Level
Table 4-21: Means and Standard Deviations for Measures of Configural Cue-Use, by Experience Level
Table 4-22a: Intercorrelations Among Measures of Configural Cue-Use, Accuracy Measures, and Experience, for Normal Order Cases
Table 4-22b: Intercorrelations Among Measures of Configural Cue-Use, Accuracy Measures, and Experience, for Reverse Order Cases
Table 4-23a: Intercorrelations Between Experience Measures, Age, Education, and Judgment Accuracy for Normal Order Cases
Table 4-23b: Intercorrelations Between Experience Measures, Age, Education, and Judgment Accuracy for Reverse Order Cases
Table 4-24: Correlation Coefficients Between Cognitive Measures and Measures of Judgment Accuracy for Normal Order and Reverse Order Cases
Table 4-25: Multivariate Analysis of Variance for the Effects of Slide Condition and Paragraph Order (Normal Order and Reverse Order) on Judgment Accuracy (Error Variance Measure)
Table 4-26: Multivariate Analysis of Variance for the Effects of Slide Condition and Paragraph Order (Normal Order and Reverse Order) on Judgment Accuracy (Percent Correct Measure)
Table 4-27: Means and Standard Deviations for Judgment Accuracy (Percent Correct for all cases -- Normal Order and Reverse Order), by Slide Condition and Experience Level
Table 4-28a: Intercorrelations for Predictor Variables for Accuracy for Normal Order Cases (Percent Correct Measure), and Experience
Table 4-28b: Intercorrelations for Predictor Variables for Accuracy for Reverse Order Cases (Error Variance Measure), and Experience
Table 4-28c: Intercorrelations for Predictor Variables for Accuracy for Reverse Order Cases (Percent Correct Measure), and Experience

LIST OF FIGURES

Figure 1: Brunswik's Lens Model, showing conceptual relations
Figure 2: Brunswik's Lens Model, showing mathematical relations

ACKNOWLEDGEMENTS

I wish to acknowledge and thank many individuals and groups for the help I was given with this research. First, I want to thank my Committee members, Dr. Patricia Arlin, Dr. Nand Kishor, and Dr. Janet Jamieson, for assisting me to bring this study to completion. They helped me develop and clarify my ideas, and express them more concisely. They also provided useful feedback on an earlier draft of this dissertation. In particular, Dr. Arlin's influence as a researcher, as a mentor, and as a friend was very much appreciated. I also want to thank Dr. Arleigh Reichl for his assistance with applying the calibration equations, and for helping me run the various SPSS computer programs. In the summer of 1993, I was fortunate to receive an Educational Leave. I thank the administration at Langara for that privilege. In addition, I thank the Langara Research Committee for the grant I received.
I thank the staff at the Registered Nurses' Association of BC for providing assistance in advertising for experienced nurses as subjects, and for giving me a research bursary. I want to thank the surgical patients and the nurses who volunteered for this study. Without their participation, the research would not have been possible. I acknowledge the assistance given by Dr. Chris Bradley, Director of Research and Evaluation, at Vancouver Hospital, and Mariette Carmen, Enterostomal Therapist, at St. Paul's Hospital. I also thank Dr. Diane Cooper, clinical nurse scholar from the University of Texas, for her input regarding the wound healing concepts.

To all my colleagues from the Langara Nursing program, I owe a very heart-felt "thanks" for their encouragement and support. In the early years, Karen helped by sharing clinical groups, so I could attend classes during the day, Anna was always asking how I was coming along, and Alice listened to my ideas for a proposal. Jane and Gail were in the clinical area providing encouragement as I began the long process of finding 281 surgical patients who would be willing to volunteer for my study. More recently, Ken and Wanda tested out my tasks, Glenda and Maureen helped me to better organize myself, and Jack reminded me to "focus". Lately, I appreciate the fact that many colleagues stopped asking how I was coming along. Audrey, Pam, Mary, and Barb wished me well as I prepared for my oral defense.

I need to say a special thanks to Roberta for the many ways she has facilitated this research. During the term, we did some joint planning and team teaching -- Roberta helped me keep on track, as I combined teaching and research. In preparation for attending AERA in New York, Roberta re-coded several of the verbal protocols so I could calculate Cohen's Kappa, and we worked together constructing the poster.
She was a resource for spelling, grammar, and APA style (including the removal of anthropomorphisms); she taught me to use WordPerfect, patiently listened to my many problems, and also shared in the exciting times. I am hoping my motivation for dissertation research might be contagious, and I can do the same for her one day.

Many people provided technical support. Ted Morris from Ubiquitous Software developed the computer programs that were used to present two of the tasks to subjects, and Sue transcribed tapes of nurses' dialogue. Tom did great work with the lens model diagram, and Joe constructed the chart showing the sequencing of tasks. Blair transformed my numbers into neat, readable tables, and Atsuko from the Langara library obtained papers by FAX from Ohio and Alberta in order for me to meet my deadlines. Finally, my son, Tim, caught errors as he proof-read.

I want to express my sincere appreciation to Dr. Robert M. Hamm, Director of the Program in Clinical Decision Making at the University of Oklahoma College of Medicine. Dr. Hamm was the external reviewer for this dissertation. I thank him for reviewing the entire document so thoroughly, and making many excellent recommendations. I also value his suggestions regarding future publications.

Last, but not least, I must thank my family for their patience with my total absorption in this dissertation over the years. I thank John for letting me print out many drafts at his store, and for not noticing how late it was or how much paper I used. I realize that while I have been involved in this research, I have not had time to be a traditional "Mom"; I hope my children -- Ruth, Dan, Tim, and Jeannene -- can understand how much it has meant to me to be able to pursue this goal and to finally reach it. I thank everyone who helped me make the completion of this research a reality.
DEDICATION

I wish to dedicate this dissertation to the memory of my father, Eric Schwab, who passed away in August, 1988, at the age of 74. He encouraged me from an early age to love learning, to seek academic challenges, and to complete what I started. He was the best father anyone could have had.

I also dedicate this dissertation to my mother, Joyce Schwab, who has been a constant source of support and inspiration during all the time I have spent on this research. She has listened patiently to my difficulties, and encouraged me to move ahead a little each week. She also shared in my happiness as each chapter was completed. I know I was always in her prayers, and for that I am thankful.

I. INTRODUCTION TO THE STUDY

Making judgments is a fundamental human activity in which cognitive operations are used to integrate separate elements of information (cues) into a unitary response. This response is a person's attempt to weigh the importance of information and combine relevant cues in an optimal manner. Examples of judgments include predicting future events, estimating the value of a product or idea, and evaluating the relative merits of various courses of action in a particular situation. Stating the aggregate meaning of a number of equivocal informational cues such as test results, history, risk factors, and signs and symptoms constitutes acts of judgment common in educational and medical contexts.

People sometimes reserve the term 'judgment' for those assessment situations where processing of information can be consciously recalled. Such a restriction is limiting. A person in a familiar context may synthesize cues into judgments without necessarily being able to articulate how such a task is achieved. In fact, studies have revealed that the mind uses nonconsciously acquired (or tacit) knowledge to make judgments (Kihlstrom, 1987; Lewicki, Czyzewska, & Hoffman, 1987; Mitchell & Beach, 1990).
Over the past 25 years, considerable judgment research has been carried out (Hogarth, 1980; Slovic, Fischhoff, & Lichtenstein, 1977; Slovic & Lichtenstein, 1971; Tversky & Kahneman, 1974). The literature includes many examples where judgments have been modeled (or policies were captured¹). Examples of such studies include the prediction of children's reading achievement (Cooksey & Freebody, 1985, 1987), the selection of students for graduate school (Dawes & Corrigan, 1974), the identification of students' preferences for jobs (Einhorn, 1971), and physicians' judgments of severity of congestive heart failure (LaDuca, Engel, & Chovan, 1988).

¹ Terms peculiar to the literature on judgment are included in the glossary in Appendix A.

Background

In the health-care field, professionals must process vast quantities of data when making judgments. In this dynamic setting, both normal variations and pathological manifestations are monitored. Making judgments can be extremely challenging. In the past, difficulty with clinical judgment has been attributed to the lack of information. Currently, however, problems seem more often related to an abundance of data, together with the well-known limitations imposed by information-processing constraints. Furthermore, difficulty exists because clinical judgment tasks are ill-structured, and characterized by uncertainty: most data have a probabilistic relationship to the judgment (Christensen & Elstein, 1991; Eddy, 1988).

Not all aspects of clinical judgment are fully understood. In particular, how a clinician weighs and combines various cues has been a source of unresolved inquiry. One reason for the difficulty in studying clinical judgments is that actions taken based on judgment may alter subsequent events.
Also, it is difficult to simulate judgment tasks in the laboratory; yet, very often, conducting a judgment study in the clinical setting poses ethical and practical problems.

The study of clinical judgment has a long history as an important area of research (Bieri et al., 1975; Einhorn, 1974; Meehl, 1954; Oskamp, 1962; Sarbin, Taft, & Bailey, 1960). In addition, recent studies aimed to capture physicians' judgment policies are available (Deber, 1986; Poses & Anthony, 1991; Speroff, Connors, & Dawson, 1989; Wigton, 1988).

The Context

Clinical judgment takes place in many settings with a variety of clinicians. Nurses were the subjects in this study, and their judgments about wound healing in surgical patients were investigated. As health care professionals, nurses make clinical judgments on a regular basis. Many studies of nurses' judgments are found in the literature (Benner, 1984; Benner, Tanner, & Chesla, 1992; Corcoran, 1986b; Gordon, 1980; Gordon, Murphy, Candee, & Hiltunen, 1994; K. R. Hammond, 1964, 1966a; K. R. Hammond, Kelly, Schneider, & Vancini, 1966; Itano, 1989; Kelly, 1964).

The context of this study was nurses' judgments related to wound healing. Cooper (1990a, 1990b), a clinical nurse scholar with expertise in this field, viewed assessment of wound healing as fundamental to caring for surgical patients. Cooper (1990b) argued that "only with a clear grasp of the intricacies of the healing trajectory are clinicians able to provide optimal, goal-directed wound care" (p. 168). Judgments of wound healing are based on physiological theory that reveals connections between the cues and the time that incisions take to heal (Cruse & Foord, 1973, 1980; Irvin, 1981; P. L. Jones & Millman, 1990; Viljanto, 1991). This connection with theory is hypothesized to provide a basis for causal schemas that link particular data to healing time.
In addition, cues that are diagnostic of problems associated with healing should, over time, become perceived as a pattern.

Considerable research in cognitive psychology has been focused on understanding performance in problem solving, judgment, and decision making (Groen & Patel, 1991; Kahneman, 1991; Pitz & Sachs, 1984; Shulman & Elstein, 1975). Research on judgment quality within a clinical judgment context has potential to reveal cognitive changes that contribute to the understanding of expertise in general (Elstein & Bordage, 1988; Norman, Patel, & Schmitt, 1990). Making judgments can be considered as a type of unstructured problem solving where no algorithm exists to calculate a correct response. Selected studies related to problem solving, as well as research directly related to judgment performance, therefore, have been utilized. Although nursing judgments are substantively different from medical judgments, they are similar in process; pertinent literature on judgment from both nursing and medical perspectives has been reviewed.

The Problem Relating Experience and Judgment Accuracy

In this study, the problem being investigated is that of understanding the relationship between experience and expertise in clinical judgment performance. Common sense suggests that experienced nurses will make better quality judgments, compared to novices. Nurses interact with a variety of patients where exposure to patterns of clinical data occurs repeatedly. It seems logical to expect that, with experience, nurses would learn to weigh and integrate cues in a manner that reflected the clinical ecology. Many studies on clinical judgment, however, fail to demonstrate a positive relationship between experience and judgment accuracy. For example, Goldberg (1968), in his thorough review, concluded that "the amount of professional training and experience does not relate to judgmental accuracy" (p. 484).
More recently, Garb (1989, 1992) provided evidence that supported Goldberg's claim. In addition, Camerer and Johnson (1991) claimed "experts, paradoxically, are no better at making predictions than novices" (p. 210). Literature in clinical medicine revealed the same findings; for example, Dawson, Connors, Hsiue, and Shaw (1987) found no relationship between experience and judgment accuracy of patients' hemodynamic status. Studies consistently fail to reveal what is generally assumed to be true: that clinicians increase their clinical judgment accuracy with experience.

Explanations for the Lack of Relationship

Brehmer (1980b), Dawes (1989), and Einhorn (1980) examined the difficulties involved in learning from experience in probabilistic environments. These authors explained the absence of a relationship between experience and judgment accuracy shown in previous studies: when outcomes are dependent on the judgments made, clinicians are prevented from obtaining accurate feedback needed to learn optimal decision rules. In addition, Einhorn and Hogarth (1978) proposed that prolonged clinical experience can induce high confidence in the accuracy of judgments, exceeding that warranted by judgment validity; such confidence functions to maintain an illusion of validity.

These explanations, however, may not fully account for previous results. The possibility exists that previous findings represented type II errors (a positive relationship could exist, but researchers have failed to reveal the true state). Schmidt (1992, 1996) explained that researchers in psychology and other social sciences have traditionally relied on a variety of statistical tests of significance in interpreting the meaning of data. The prevailing decision rule has been as follows: if the statistic (for example, t or F) is significant, there is an effect or a relation.
If the statistic is not significant, then the conclusion is made that there is no effect or relation. When traditional procedures for interpretation are used, the focus is on controlling type I error. Cohen (1990, 1994) and Schmidt (1992, 1996) claimed that beta levels (type II errors) are frequently in the 50% to 80% range.

Asher (1993) made the point that meaningfulness must be assessed on the basis of theory. Serlin (1987) stated that "it is only on the basis of theory that one can decide . . . whether the experimental results can be generalized" (p. 365). In the present study, the theoretical conceptions included the ideas that the relationship between experience and judgment expertise revealed through laboratory research is influenced by the conditions under which judgment performance is assessed and how experience is defined. In addition, cognitive factors may interact with indicators of expertise and with the conditions. The particular cognitive factors examined are conceptual structure, sensitivity to patterns in data, and judgment process. It is possible that small, but nevertheless meaningful relations (that is, relations that are meaningful within the context of a particular theoretical perspective), are present that are not revealed through traditional means.

Conceptual Structure

Conceptual structure refers to one's mental organization of domain-specific concepts, including the inter-relationships among concepts. This structure significantly influences the interpretation a person makes in response to combinations of clinical cues. Several researchers have identified particular changes in conceptual structure with expertise (Bordage & Lemieux, 1991; Homa, Rhoads, & Chambliss, 1979; Medin, 1989).
It is not the configurations of cues, in themselves, that function as causal stimulus for judgment performance, but rather, the meaning of cue configurations constructed within the conceptual structure. It seems reasonable, therefore, to propose that conceptual structure could be an important variable to consider when studying judgment performance.

One explanation for the failure of previous judgment studies to demonstrate a relationship between clinical judgment performance and experience is that these studies may not have considered that certain changes in clinicians' conceptual structure may be essential for expert judgment to be demonstrated. Changes in conceptual structure do not occur automatically with experience; in previous studies, subjects with expert-like conceptual structures may not have been adequately represented.

Sensitivity to Patterns in Data

Reber (1989) claimed that "when a stimulus environment is structured, people learn to exploit that structure" (p. 221). Experts demonstrate a high degree of sensitivity to patterns in the data. One rationale for previous investigators failing to find a relationship between experience and judgment performance is that typical patterns in clinical data may not have been included in experimental contexts; these omissions would tend to reduce experience-related differences obtained from performance in laboratory tasks.

Birnbaum (1982) pointed out the importance of the structure of the laboratory task. If the structure designed for research is different from that of the clinical ecology, intuitive (tacit) knowledge acquired from experience may not be useful as a guide to judgment performance. Clinicians who make judgments in such a setting would not benefit from using the typically effective cognitive strategies and heuristics that they normally use (Tversky & Kahneman, 1974).
Two categories of cues examined in this study were referred to as context cues and individuating cues. Context cues included global information about patients (such as age, gender, diagnosis, and surgery) that facilitated the construction of an appropriate context. Individuating cues included specific patient data (such as surgery time, complications, and blood loss). In making judgments, clinicians frequently use the context cues as an anchor to make a rough estimate, while the other cues are used for making precise adjustments (Tversky & Kahneman, 1974). In this study, the order of cue-presentation was manipulated. K. R. Hammond (1990) claimed that task factors influence the type of cognition that subjects employ: Hammond considered tasks that involved rapid judgments as intuition-inducing, whereas he regarded tasks that required slow deliberation as analysis-inducing. If expertise in judgment were contingent upon clinicians' sensitivity and responsiveness to patterns of cues, then altering the sequence of cue presentation was theorized to have a selective influence on performance accuracy for experienced subjects. Performance for these subjects was expected to decrease when cases were presented in an atypical order as this manipulation was anticipated to interfere with intuitive processing.

Structural differences between the cues represented in research tasks and those naturally available in life are often unavoidable. In clinical settings, information is not formatted in terms of discrete cues, but is an integral aspect of phenomena and events in a continuous environment. Friedman, Howell, and Jensen (1985) stated:

Policy-capturing studies generally use numerical values as cues - in essence, providing the judge with data that have already been processed to a degree. In their natural habitat, of course, decision problems are not conveniently decomposed into these elements (p. 666).
Because life does not come "pre-packaged" so precisely (Kolers, 1977, 1979a), the presentation of labelled cues for research has potential to alter task structure. Experienced clinicians may be prevented from exhibiting the degree of excellence of which they are capable in making judgments from the "raw" sensory input. Tasks structured for research purposes often contain little ambiguity; reality is translated into symbolic description. With such reduction, the difficulty for novices decreases, and an artificial ceiling for experienced clinicians is created. Any relationship between experience and expertise becomes attenuated. Recently, Lamond, Crow, Chase, Doggen, and Swinkels (1996) raised this concern about the presentation format of information with respect to research in nursing.

To address these problems, the researcher may consider employing alternative design features such as conducting studies in naturalistic settings with authentic patients; such a solution, however, poses numerous ethical and practical difficulties. A possible approach to minimize the effects of the difference in cue-structure between low-fidelity tasks and actual judgments that occur in the clinical setting may be to prime subjects' memories for previous cases. Showing slides of critical stimuli to subjects as part of the research method may be sufficient to act as a "prime" for subjects who have had relevant experience. If experts implicitly use their rich store of memory-based cases to facilitate judgment performance, this deliberate triggering of mental context may be a way to tap into implicit, or tacit, knowledge that guides expert judgment. In consumer research, Bettman and Sujan (1987) and Herr (1989) found differential priming effects on judgment for high- and low-knowledge subjects.
In this study, case-based memories are theorized to be part of an internal context, or schema, that influences performance (Groen & Patel, 1986; Patel, Groen, & Frederiksen, 1986). If presenting slides is successful in instantiating this mental context, judgment performance in the laboratory may be selectively enhanced. Subjects who have implicit knowledge of cue-criterion relations that reflect the clinical ecology should have greater facilitation of judgment performance, compared to other subjects.

Judgment Process

A third explanation for the lack of a relationship between experience and judgment performance in previous studies is that traditional methods to assess performance under research conditions may not have been designed to capture the judgment process used by expert clinicians in the clinical setting. During the time when clinicians learn to make particular judgments, information processing is theorized to be initially an application of proposition-like rules (declarative knowledge, or "knowing that"). Later the rules become transformed into a dynamic network of relations (procedural knowledge, or "knowing how"). Unless research tasks used to reveal expertise have sufficient flexibility to allow for such processing differences, expertise-experience relationships as measured in the laboratory may not be a valid indicator of actual relationships.

It is not clear from the judgment literature how patterns of information are processed, and how such processing varies with experience. The topic of pattern recognition has been of interest to cognitive psychologists for many years (Campbell, 1966; Neisser, 1976; Reed, 1972). When clinicians work in an area for a period of time, they acquire an implicit understanding of the patterns of cues in the ecology which is reflected in their judgment process.
This knowledge may be used procedurally as a basis for processing the data from each case, not as separate pieces, but interactively (or configurally). Perceptual recognition is enhanced by the meaningfulness of cue patterns (Reicher, 1969). Experienced judges often claim that they process cues in terms of their patterns (Brehmer, 1969; Brooks, Norman, & Allen, 1991; Goldberg, 1969; Hoffman, 1968). Meehl (1950), over 45 years ago, stated:

One of the most important words in the vocabulary of clinicians is the word 'pattern'. We speak of ourselves as thinking in terms of totalities, organizations, configurations; and we . . . look upon case material in this patterned, non-atomistic way (p. 165).

In the literature, there are major differences in interpretation of research on the role of configural processing in judgment performance. The authors of many studies using regression methods demonstrated that patterned processing of cues accounted for only a small proportion of variance in judgment performance (K. R. Hammond, Hursch, & Todd, 1964; K. R. Hammond & Summers, 1965). Other authors, such as N. H. Anderson (1969, 1972) and N. H. Anderson and Shanteau (1977), claimed that configurality in judgment is common, but cannot be measured by regression methods. To the extent that experts process cues configurally as a means of attaining accuracy, researchers who use approaches which are insensitive to configural processing may fail to find differences with experience. Methods of assessing configurality in addition to regression have been explored.

Assessment of Experience

A fourth possibility for failure to find a relationship between experience and judgment performance is that experience may not have been assessed optimally. In this study, experience was measured in three ways.
First, the number of years since nursing graduation was obtained, providing a time-based measure with good variability. The second index of experience was the number of months caring for surgical patients; this method had a possible advantage over the first in that the relevance of experience to the judgment was taken into account. The third approach was to obtain an estimate of the number of times the subject viewed an incision during his or her career.

Age and education were important variables to include in order to make correct interpretations of the relations between experience and judgment performance. The experienced nurse is typically older, but may, or may not, have had more education compared to the inexperienced nurse. Any difference in expertise associated with experience, therefore, may be more reflective of the effects of age and/or education than length of work experience. By obtaining data on both age and education, and using three indices of experience, a more complete understanding of the relationships may be attained.

Summary of the Problem

The problem that has been addressed in this research is summarized as follows: several researchers have searched for, but have failed to find, a relationship between experience and clinical judgment performance. On one hand, the conclusion that no relationship exists may be valid, with explanations provided by Brehmer (1980b), Dawes (1989), Einhorn (1980), and Einhorn and Hogarth (1978) for why accuracy in judgment performance does not increase with experience. On the other hand, these findings may represent only partial truth; a positive relationship may in fact exist, but for many reasons, it is difficult to demonstrate in the laboratory context.
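The difficulty of detecting a small positive relationship can be illustrated numerically. The following is a minimal sketch and not part of the analyses reported in this study: it approximates the power of a two-sided test of a zero correlation using the Fisher z transformation. The function name `approx_power_for_correlation`, the assumed true correlation of .20, and the assumed sample size of 50 are hypothetical values chosen only for illustration.

```python
from math import atanh, sqrt
from statistics import NormalDist


def approx_power_for_correlation(rho: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided test of H0: rho = 0.

    Relies on the Fisher z approximation: atanh(r) is treated as normal
    with mean atanh(rho) and standard deviation 1 / sqrt(n - 3).
    """
    if n <= 3:
        raise ValueError("n must exceed 3 for the Fisher z approximation")
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    shift = atanh(rho) * sqrt(n - 3)                # expected test statistic
    upper = 1.0 - std_normal.cdf(z_crit - shift)    # reject in the upper tail
    lower = std_normal.cdf(-z_crit - shift)         # reject in the lower tail
    return upper + lower


# Illustrative (assumed) values: true correlation .20, 50 subjects.
power = approx_power_for_correlation(0.20, 50)
beta = 1.0 - power  # type II error rate under these assumptions
print(f"power = {power:.2f}, beta = {beta:.2f}")
```

Under these assumed values the type II error rate is roughly 70%, which falls within the range that Cohen and Schmidt described. A modest true relationship between experience and judgment accuracy could therefore often go undetected in a study of this size.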
A deeper understanding of the patterns of relationships among cognitive factors, individual differences, and clinical judgment performance is necessary to reveal the nature of the relationship between judgment expertise and experience. It may be that it is not experience assessed in years or months that is directly related to judgment performance, but rather the cognitive impact that such experience has had. Such impact varies with differences in cognition as demonstrated through conceptual structure, sensitivity to patterns in data, and judgment process. In this study these explanations were explored.

Purposes of the Study

The three purposes of this study were:

1) to demonstrate differences in accuracy of performance in a clinical judgment task and identify patterns of relations between indicators of expertise from the literature (including judgment accuracy) and experience;

2) to assess whether cognitive constructs (conceptual structure, sensitivity to patterns in data, and judgment process), and individual differences in age, education, and experience were predictors of clinical judgment performance;

3) to assess whether conditions (order of cue-presentation condition and memory-priming condition) revealed patterns in clinical judgment performance.

Significance of the Study

The study has both theoretical and practical importance. First, it has potential to add to knowledge about how people search for, weight, and integrate cues to make clinical judgments in a probabilistic context. Because there is considerable evidence that such judgments are difficult to make, and may not be made very accurately, this is an important aspect of judgment research to pursue. The second aspect of theoretical significance is that this study has potential to contribute to knowledge about changes in cognition with expertise.
An attempt to identify possible reasons why many clinical judgment studies have failed to find a relationship between judgment accuracy and experience ought to illuminate some of the means by which judgment expertise occurs.

The major practical significance of this study is that better understanding of the reasons for previous findings will provide some direction for educators in the health care field. These educators are responsible for developing curricula and teaching clinical judgment to student health care professionals. The knowledge gained about the influence of cognitive factors may be useful to assist educators in encouraging ways to increase students' learning from experience. In addition, the findings may constitute evidence that for some judgment contexts, experience does not automatically bring about increased judgment performance. This study may provide an impetus for the development of methods to enhance appropriate learning from experience.

Another more specific aspect of this study that has potential for practical importance is that related to judgment confidence. From the literature, many experienced practitioners have high confidence in the accuracy of their judgments that may be unwarranted. As Schwartz and Griffin (1986) pointed out, clinical judgments are risky, with costs associated with errors of both types (misses and false alarms); this research into the influence of cognitive variables and experience is a preliminary step towards identifying ways to improve judgment calibration.

In summary, a judgment study is important in educational psychology. Lewis and Anderson (1985) claimed: "As educators we would like to make the acquisition of expertise more efficient; as psychologists, our interests lie in defining the processes which underlie complex skill acquisition" (p. 26-27).
Because making judgments is an attempt to go "beyond the data" in a way that leads to accurate outcomes, research in transition in judgment may reveal human reasoning at its best.

Overall Research Question

What are the patterns of relationships among measures of selected cognitive constructs (conceptual structure, sensitivity to patterns in data, and judgment process), individual difference variables (age, education, and experience), task conditions, and performance in a clinical judgment task?

Specific Research Questions

In a domain-specific, probabilistic clinical judgment task, where outcomes are only moderately predictable:

1. What are the patterns of relationships among various measures of judgment performance (indicators of expertise) and experience for a group of subjects in a clinical judgment task?

2. What are the patterns of relationships among measures of conceptual structure, sensitivity to patterns in data, judgment process, and performance in a clinical judgment task?

3. What are the patterns of relationships among individual differences in age, education, and experience and performance in a clinical judgment task?

4. To what extent does cue-presentation condition (context cues followed by individuating cues, or the reverse) reveal patterns of relationships in performance in a clinical judgment task?

5. To what extent does memory-priming condition (exposure to relevant domain-specific visual stimuli, versus no exposure) reveal patterns of relationships in performance in a clinical judgment task?

6. Of all the measures included in the study, which measures are most predictive of clinical judgment performance?

II. REVIEW OF THE LITERATURE

Chapter 2 is divided into five sections. Section A is an introduction to the major concepts of expertise and clinical judgment from a cognitive perspective.
In section B, findings from judgment studies related to various indicators of judgment quality, including accuracy and consistency, are reported. In section C, findings from the literature associated with both expertise and clinical judgment are described. Additional factors (such as cognitive biases) that influence judgment performance are examined in section D. The last section is a summary of concepts from the literature that have provided a theoretical base for the present study.

Section A: Expertise in Clinical Judgment

Expertise has become an intriguing subject for investigation across many disciplines. Studies have been conducted that contrast novice and expert performance in many fields. Examples include: physics (Chi, Feltovich, & Glaser, 1981; Larkin, McDermott, Simon, & Simon, 1980); nursing (Benner, 1984; Tanner, 1984); mathematics (Schoenfeld & Herrmann, 1982); teaching (Arlin, 1993; Clarridge & Berliner, 1991); and accounting (Choo, 1989; Ashton, 1991). The discussion in this section includes characteristics related to expertise, a description of the Dreyfus model, characteristics of judgment performance, and the use of the lens model for clinical judgment.

Characteristics of Expertise

Several authors have described characteristics of expertise (Benner, 1984; Chan, 1982; Glaser, 1986, 1989; Hampton, 1994; Kennedy, 1987; Rabinowitz & Glaser, 1985; Shanteau, 1987, 1988). The following is an overview and synthesis of many accounts from the literature.

Expertise requires extensive domain-specific knowledge. Glaser (1986), Posner (1988) and others have identified that knowledge is critical to expertise; experts must have extensive, up-to-date content knowledge in their field. Biomedical knowledge is fundamental to medical reasoning for medical students and physicians (Patel, Evans, & Groen, 1989; Patel & Groen, 1986).
Expertise varies dramatically as a function of the elaborateness and richness of the connections between units of knowledge. An integrated network of domain-specific information is required. Benner and Tanner (1987) described what constitutes domain-specific knowledge in nursing:

The language of disease is a language of pathology and tests to rule out or confirm hypotheses. The language of illness is a human language with emotions and lived experience . . . . Nurses deal with both illness and disease, but they do not limit their knowledge of patients to disease facts or physiological states. They have learned, over time, to find a "grasp" or "understanding" of patients' illnesses . . . Personal histories and the contexts of the illness are as necessary to them as the traditional signs and symptoms of the disease (p. 25).

Nurses need a level of biomedical knowledge as well as knowledge of human science to be able to provide holistic care to patients in a safe and caring manner. Clinical knowledge has a high degree of domain-specificity (Elstein, Shulman, & Sprafka, 1978; Palchik, Wolf, Cassidy, Ike, & Davis, 1990). Often knowledge is associated with memories of patients who may have been encountered years, or even decades, earlier (Allen, Norman, & Brooks, 1992).

Expertise requires competent use of knowledge. Simple possession of knowledge is insufficient for expertise: there is, in addition, skilled use of knowledge to address practice problems. Patel and Groen (1991) suggested that knowledge progression occurred in three stages: the first involved the development of adequate representations of knowledge, the second stage involved learning to distinguish between relevant and irrelevant data, and the third stage required using relevant representations in an efficient manner. Competent behavior requires that relevant information is readily available when needed.
Schon (1983) described this knowledge as tacit, or implicit, in patterns of action. It is a specialized form of procedural knowledge, or "knowing how", as distinguished from merely "knowing that" (Ryle, 1949). This "knowing-in-action" is apparent in applied professions such as teaching and nursing. Jenks (1993) provided an example of this "knowing-in-action" as she described the ways nurses use personal knowledge of patients in their care.

Expertise involves well-developed metacognition. Metacognition can be defined as "thinking about thinking, or knowledge related to self-appraisal and self-regulation of one's thinking and actions" (Paris & Ayres, 1994, p. 167). In studies of expertise, researchers have reported higher levels of metacognition in experts compared to novices (VanLehn, 1989). Chi, Glaser, and Rees (1982), for example, found that experts were more accurate than novices at rating the difficulty of physics problems. More recently, Etelapelto (1993) claimed that experts were superior to novices in their metacognitive knowledge, task-specific awareness, and cognitive monitoring.

One form of metacognition that is particularly important in relation to expert performance is conditional knowledge. Such knowledge is an aspect of metacognition which is informative about the appropriate conditions and situations where knowledge and skills are best applied; conditional knowledge helps people to know when, where, and why to apply their strategies (Paris & Ayres, 1994). Paris and Byrnes (1989) stated that conditional knowledge may be fundamental for the spontaneous transfer of appropriate strategies to new situations, a skill that is critical to expert performance.

Expertise involves skilled attention. Considerable research has been carried out with respect to attention (Nosofsky, 1987a, 1987b).
In a study of expert medical practitioners and novices, Norman, Brooks, and Allen (1989) found differences in attention between the two groups, but only under implicit memory conditions; under explicit conditions, no differences were revealed. Barroso (1985) investigated how attentional strategies of medical students were modified by experience; he reported that experts learned to discriminate relevant and irrelevant data, and focused their attention accordingly. Krogstad, Ettenson, and Shanteau (1984) and Ettenson, Shanteau, and Krogstad (1987) found that two groups of expert auditors focused attention on one cue, whereas the novices used several different cues, with no group consensus. Such behavior was viewed as skilled omission rather than a cognitive limitation.

Clarridge and Berliner (1991), in a study of novice and expert teachers, reported differences between the teacher groups in ability to pay attention and notice unacceptable classroom behaviors. Experts were able to focus attention on aspects that non-experts either overlooked or were unable to see. When already extracted information is presented to novices (as is often the case in judgment studies), these subjects are often capable of making judgments that are nearly as good as those of experts.

Benner and Wrubel (1982) described the ways in which nurses' attention and perceptual abilities become honed; expert nurses learn salient qualitative distinctions, and achieve a perceptual grasp of whole situations, which often include what action to take. "Expert nurse clinicians who have spent many hours observing such subtle patient changes as those related to progress in labor, septic shock, or wound healing, become discriminating judges of these states" (Benner & Wrubel, p. 12).

Expertise involves sensitivity to meaningful patterns.
Research carried out by Chase and Simon (1973) revealed that expert chess players could be distinguished from those who were less skilled by the experts' ability to correctly reproduce meaningful patterns of chess pieces. Memory performance for these subjects for randomly placed chess pieces was only average. This finding led to the notion of "chunking", which helped account for performance that would exceed non-experts' working memory capacity. Shanteau (1987, 1988) argued that experts are able to see patterns that novices cannot, and make use of these patterns in judgment.

Benner and Tanner (1987) described pattern recognition from a nursing context. These authors claimed that skilled pattern recognition is intuitive knowledge based on background understanding and memories of skilled clinical observations of past whole situations. For clinicians with expertise, pattern recognition is useful in understanding and possibly resolving unstructured problem situations.

Expertise often involves fast performance. Speed associated with expertise can be accounted for, in part, by the compilation of separate components of a skill into larger units as automatization occurs. This process has been described by Shiffrin and Schneider (1977). Such automatized processing is fast, effortless, and not available to awareness, and thus, experts are often inarticulate in describing judgment strategies. Berliner (1986) attributed difficulty with articulation to automatization. Not all aspects of judgment performance, however, become faster with expertise: experts take longer to acquire a qualitative understanding of a novel problem (Chi et al., 1981), but the time from problem representation to solution is usually rapid.

Dreyfus Model of Expertise

Holyoak (1991) described the progression of research with respect to expertise.
First generation research began with the early insights of Newell and Simon (1972), one of the earliest teams to investigate differences in cognition between novices and experts. These researchers assumed that humans operated as limited information processing systems, and argued that problem solving took place by search in a "problem space". Newell and Simon contended that heuristic methods, such as means-end analysis, could be applied to a broad range of domains. The expert was viewed as someone particularly skilled at general solution methods. These researchers focused on well-defined problems such as cryptarithmetic and the Tower of Hanoi problem. Later it became evident that detailed, domain-specific knowledge was essential to expertise. General problem solving methods were recognized as more characteristic of novice rather than expert performance.

The second generation of theories emphasized the power of highly automatized knowledge, which involved compiling separate components of a skill into larger units. Chase and Simon's (1973) hypothesis that expertise involved the development of large integrated "chunks" of related and meaningful knowledge became understood as an instantiation of compilation. The expert was someone who had practiced a task and was able to solve problems and make good judgments efficiently.

It is possible that a third generation of expertise theories has begun (Holyoak, 1991; Patel & Groen, 1991; Schneider, 1987). Second generation theories focused on routine expertise where experts were outstanding at speed and accuracy. In contrast, performing well in unpredictable situations required what Hatano and Inagaki (1986) referred to as adaptive expertise. They realized there was more to expertise than simply a high skill level.
An adaptive expert is one who handles novel situations well, has insight and imagination, and can reason from basic principles. Kennedy (1987) proposed that expertise goes beyond technical skill: she summarized the additional characteristics of expertise as the ability to apply general principles, to know when to change the rules, and to engage in interactive (dialectical) relationships between analysis and action. This type of expert is sensitive to the meanings of patterns of data and can see fundamental problems and also envision new possibilities, where others do not. Kennedy's account of expertise is similar to Arlin's (1990) description of wisdom in terms of problem finding. Arlin proposed that "having a sense of taste for problems that are of fundamental importance" (p. 235) may be an aspect of problem finding that is associated with wisdom.

Links between people's thinking and the quality of their problem-solving and judgment ability have also been identified in earlier literature (Arlin, 1986). Using a developmental perspective, Arlin examined formal and postformal thinking and related performance of young adults in ill-defined (or unstructured) problem situations. Arlin characterized adult thinking as possibly representing both contractions of logical systems and expansions of those same systems. The use of faulty logic and simplified representations of reality (both of which compromise judgment and decision making) constitute examples of the former. The ability to think in relativistic terms in a manner that is receptive to new information is an example of the latter. Relativistic thinking may be particularly helpful to the development of expertise in judgment, because good judgments are based on the appropriate integration of probabilistic cues.
Theories with flexibility to account for both routine and adaptive expertise are needed, including descriptions of the thinking and performance at various levels, and the ways in which the development of expertise takes place. The model of expertise developed by Dreyfus and Dreyfus (1986) was selected for this study. This model is based on the progression of expertise in terms of cognitive changes. Bechtel (1988) and Holyoak (1991) have elaborated on these changes associated with expertise, and have claimed that with expertise comes a qualitative change in cognition from the deliberate focus on declarative knowledge governed by analytic rules, to the more automatic processing of domain-specific patterns of data.

Researchers in nursing (Benner, 1984) and in teaching (Berliner, 1988; Clarridge & Berliner, 1991) have used this model to investigate expertise in their respective domains. Empirical support for the model has been found: experience appears necessary but not sufficient for expertise; the changes in cognition are critical. A summary of the model follows, based on Dreyfus and Dreyfus (1986), Benner (1984), and Benner, Tanner, and Chesla (1992).

Novice level. The novice employs precise rules which apply to objectively specifiable circumstances that are recognizable without experience. Because the rules are context-free (independent of other aspects of the situation), behavior is relatively inflexible. The novice lacks a coherent sense of the overall task, and often evaluates performance by how well the learned rules were followed.

Advanced beginner level. The advanced beginner starts to recognize the role of context and modifies the rules in some situations. These exceptions are specified in terms of previously encountered situations. The advanced beginner melds book knowledge with on-the-job experience, building up case or episodic knowledge in association with semantic knowledge.
Thus, although this individual's performance becomes more flexible than that of the novice, it remains slow, uncoordinated, and laborious.

Competent level. The competent performer of a skill is distinguished by the development of goals that facilitate the coordination of rules and known facts. Rules are applied not simply because they are the rules, but because they are perceived to be helpful in reaching goals. The person actively directs efforts rather than responds passively to events. An individual at this mid-range level of expertise displays rational planning, conscious assessment of elements that are salient with respect to the plan, and analytical, rule-guided choice of action.

Proficient level. The proficient performer moves beyond the deliberate mode of reasoning and begins to rely on recalling previous events similar to the current one. This recognition is based not on specific features, but on holistic similarity. Once the recollection occurs, the proficient performer may proceed analytically as would a competent performer. Where the proficient performer exceeds the one who is competent is in the ability to bring relevant past situations to bear, and use these in establishing goals and applying rules.

Expert level. The expert no longer exhibits the deliberativeness of competent performance; the whole process of responding becomes smooth and fluid. The expert sees the situation and sees what to do, and often cannot articulate reasons behind the judgment. The expert responds intuitively. Having enough experience with a variety of situations, the mind of the expert decomposes the class of situations into subclasses that share the same goal and decision or tactic. Thus, a situation, when seen as similar to members of this class, is not only thereby understood, but simultaneously the associated decision or tactic presents itself.
Dreyfus and Dreyfus (1986) argued for a qualitative difference between the cognitive activity of individuals who have attained competent performance and that of people who have achieved either of the two highest levels. Using nursing subjects, Benner (1984) found that a transformation in thinking and performance was required for nurses to progress beyond the competent level. The use of holistic recognition of similarity is a crucial element of this change. Experts have the ability to recognize situations as similar to those encountered previously and to categorize them appropriately (Dreyfus & Dreyfus, 1986).

Although there is considerable support in nursing for Benner's (1984) research on the progression of novice to expert based on Dreyfus and Dreyfus (1986), there is also some criticism. English (1993) contended that the definitions Benner used for intuition and expertise were inadequate. He also disputed the idea that expert nurses use intuition in making clinical judgments. Using a cognitive psychology perspective, English provided alternative interpretations of Benner's examples of intuition demonstrated by expert nurses.

Characteristics of Clinical Judgment

The literature related to reasoning processes reveals considerable overlap in the use of terms such as problem solving, judgment, and decision making (Elstein, 1992). Within a medical context, Deber and Baumann (1992) used the term problem solving to refer to the search for the best solution to a problem, and decision making to refer to a situation in which a choice must be made from several alternatives. In practice, problem solving is often used to determine what has led to a current situation that is in some respects not ideal, and decision making is used to identify what is likely the best approach to take, given the situation.

Siegel-Jacobs and Yates (1996) defined judgment as "an opinion about the status of some event in the real world" (p. 4).
These authors considered judgments to be more fundamental than decisions. Judgments usually refer to constructing a response where none is ready-made, whereas decision making implies selecting alternatives from a set. Abernathy and Hamm (1995) defined judgment as "the mental ability to combine simultaneously information from multiple sources, to go beyond well-defined, sharply bounded categories by interpolating and extrapolating" (pp. 217-218).

Problem solving, judgment, and decision making all can take place at varying levels. For example, with one global problem, the goal may be to make an overall decision about what action to take based on judgments of the relative values of outcomes, given varying circumstances; there may, however, be many micro-decisions to execute about which data to collect, and micro-judgments about the meaning of the data. There even may be micro-problems to solve in the course of obtaining the data. Clinical judgments can be characterized as follows:

Clinical judgments are based on cues. Hogarth (1980) stated that predictive judgments are based on cues; a prediction is the extrapolation of a relationship between a set of cues and a target event. Examples of target events include making judgments about pain management for palliative care patients (Corcoran, 1986b), diagnosing pulmonary embolism from clinical data (Wigton, Hoellerich, & Patil, 1986), making judgments about patients who have suspected bacteremia (Poses & Anthony, 1991), and predicting outcomes in coronary disease (K. L. Lee et al., 1986).

Elstein (1976) defined clinical as "any of the artful, informal, qualitative, or not explicitly quantitative strategies generally employed by the clinician for [judgment] tasks" (p. 696). Clinical judgment involves detecting subtle patterns and weighing conflicting evidence.
Making judgments can be viewed as responding optimally to a particular set of domain-specific cues where there is no algorithm to derive an absolutely correct answer. The cues contain only some of the information about the clinical state being judged; frequently, there are both redundant cues and missing cues.

Clinical judgments are probabilistic. Medical judgments are characterized as "risky" and "probabilistic" (N. H. Anderson & Shanteau, 1970; Schwartz & Griffin, 1986). Garb (1989) claimed that in order for clinicians to learn from their experience to make good judgments, they needed to think in probabilistic terms. Compared to learning deterministic rules, such thinking is very difficult. Christensen-Szalanski and Bushyhead (1981) reported an example of probabilistic information processing where physicians showed sensitivity to the predictive value of the symptoms of pneumonia, but overestimated its probability. Elstein (1976) claimed "the relationship between the disease states and symptoms . . . is one of probability rather than logical necessity" (p. 697). Many of the clinical aspects of patient care in nursing involve processing probabilistic information (Crow & Spicer, 1995). Tanner (1984) described the environment for nursing judgment as one where cues were fallible, low-validity information was redundant, situations were dynamic, and new data were constantly arriving.

Clinical judgments contain uncertainty. Peterson and Pitz (1988) distinguished between confidence and uncertainty. These authors defined confidence as a person's subjective probability that his or her judgments were accurate. In contrast, they defined uncertainty as the person's beliefs about the variability of possible outcomes. If beliefs about a set of these outcomes can be conceptualized as forming a subjective probability distribution, then uncertainty corresponds to the variance of that distribution.
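The variance interpretation of uncertainty can be made concrete with a small numerical sketch. The Python fragment below is an illustration only and is not drawn from Peterson and Pitz (1988): the healing-time outcomes, the two belief distributions, and the function name `subjective_variance` are all hypothetical and were chosen simply to show that beliefs spread over more outcomes yield a larger variance.

```python
def subjective_variance(outcomes, probabilities):
    """Variance of a discrete subjective probability distribution.

    outcomes      -- possible values of the judged quantity
    probabilities -- the judge's belief in each outcome (must sum to 1)
    """
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    mean = sum(o * p for o, p in zip(outcomes, probabilities))
    return sum(p * (o - mean) ** 2 for o, p in zip(outcomes, probabilities))


# Hypothetical beliefs about healing time in days (illustrative numbers only).
healing_days = [10, 20, 30, 40, 50]
narrow_beliefs = [0.0, 0.1, 0.8, 0.1, 0.0]  # few outcomes seem plausible
wide_beliefs = [0.1, 0.2, 0.4, 0.2, 0.1]    # many outcomes seem plausible

narrow = subjective_variance(healing_days, narrow_beliefs)
wide = subjective_variance(healing_days, wide_beliefs)
print(f"narrow beliefs: variance = {narrow:.1f}")
print(f"wide beliefs:   variance = {wide:.1f}")
```

Both hypothetical judges expect the same mean healing time (30 days); they differ only in how widely their beliefs are spread, which is the sense of uncertainty described here.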
Uncertainty is a function of the range of different outcomes that appear plausible to a judge. The greater this range, the more uncertainty is perceived in a task. Experienced clinicians are often more aware of the uncertainty which is actually present, whereas novices tend to perceive much less uncertainty in the same situation. An example of modeling medical judgments under conditions of uncertainty was presented by Boreham (1989); this author illustrated physicians' judgments about appropriate drug dosages of phenytoin for a variety of patients with epilepsy. Uncertainty was demonstrated by the uniqueness of each patient, and the unpredictability of responses to medication.

Clinical judgment depends upon skilled use of knowledge. In order to make good clinical judgments, it is not enough to have textbook knowledge. The meaning of the cues as a set, within the particular context, must be understood at a deep level. Much of the domain-specific knowledge is linked to previous cases where some particular aspect was seen as salient. Skill in clinical judgment, however, goes beyond the mere possession of specific knowledge. Clinicians must have a sense of what the appropriate patterns are, and yet be receptive to perceive what is actually happening as events unfold; they must simultaneously ignore what does not matter, and tune in to what is critical as they construct particular meanings that represent a synthesis of the relevant data.

Clinical judgments constitute both art and science. Elstein (1976) regarded expert clinical judgment as more of an art than a science. Brehmer (1976) considered clinical judgments to be inductive inferences where the state of a criterion variable is inferred on the basis of a set of information cues such as signs and symptoms. Expert judgments, however, are not always logical, rule-governed inferences of this type.
Such a definition excludes implicit judgments that demonstrate procedural knowledge where the rules cannot be articulated. Dreyfus and Dreyfus (1986) stated: "The expert is simply not following any rules! He [or she] is . . . recognizing thousands of special cases" (p. 108). Brehmer's definition may apply more to judgments made by novice clinicians than to those made by experts. Dreyfus and Dreyfus claimed that expertise may consist not so much in being able to carry out complex inferences better than novices do, but in making the most plausible responses.

Lens Model for Clinical Judgment

The lens model was developed by Brunswik (1955, 1956) and described by K. R. Hammond (1955, 1966b, 1972, 1990), Hammond, Stewart, Brehmer, and Steinmann (1975) and Hammond and other colleagues. Brunswik called his model the "lens model" because its pictorial representation resembled the focusing process of a lens. Brunswik believed that a person interacted and made judgments in an external environment (an "ecology") in which the actual state of affairs (or true meanings) is hidden by the presence of numerous ambiguous (proximal) cues that have a probabilistic relationship with the actual (distal) state. Thus, in the judgment situation, a person must aggregate, integrate, or re-combine into a single response multiple fallible cues from the environment, including those which are separated in time. Hammond (1990) suggested that this conception of the judgment process gave rise to the use of a lens as an analogy because of its condensing features.

See Figure 1, which characterizes the conceptual aspects of the model for a single person. The convergent properties of the lens are illustrated by the lines in the figure.

Figure 1. Brunswik's lens model, showing conceptual relations.

K. R. Hammond (1966b) stated that the lens model was the fundamental theoretical basis for Brunswik's view of psychology.
In the present study, the lens model was employed as one way to conceptualize the components of the judgment task: the cues, the criterion to be judged, and the judgment of the criterion. The basic idea is that the clinician weights cue dimensions and then integrates them into a unitary judgment. The mathematics related to this model are described in chapter 3. The conceptual aspects are described here to illustrate the relationships between the cues and the criterion state being judged.

One assumption of the lens model is that judgment performance is a function of the characteristics of both the person and the task system, or ecology. Each judge comes to the task with unique experience and background, and literally sees the cues in a different way (L. K. Hammond, 1970). Expectations, memories, and educational experiences influence the perception of cues, their weighting, and their integration. The notion of the "lens" can be considered in a metaphoric way, in the sense that each person engaging in an interaction in an ecology perceives the cues through his or her own perspective, or "lens". Examining this source of difference is essential in understanding how individual differences in cognition influence judgment performance.

The circles in the center of Figure 1 represent the informational cues upon which the judgments are based. The criterion state is the actual clinical state (in this study, healing time), represented by the circle on the left. The lines from the cues to the criterion state vary in thickness, indicating that some cues have stronger relations to the criterion than others; that is, they have more predictive potential. A dotted line reflects no cue-criterion relationship. Some, but not all, information about the judged state is contained in the probabilistic cues.
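The two sides of the lens can be illustrated with simple correlations. The Python sketch below is a simplified illustration under stated assumptions and does not reproduce the mathematics of chapter 3: it uses plain Pearson correlations in place of any regression weights, and the cue names, case values, and variable names are all hypothetical. Cue-criterion correlations stand in for the lines on the left of Figure 1, cue-judgment correlations for the lines on the right, and the judgment-criterion correlation for overall agreement between judgment and criterion.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data for six cases (illustrative numbers only).
cues = {
    "strong_cue": [1, 2, 3, 4, 5, 6],  # closely tied to the criterion
    "weak_cue": [3, 1, 4, 1, 5, 2],    # only loosely tied to the criterion
}
criterion = [12, 14, 19, 21, 26, 28]   # actual state, e.g. healing time in days
judgment = [10, 15, 18, 20, 27, 30]    # one judge's predictions

# Left side of the lens: how informative each cue is about the criterion.
validity = {name: pearson(values, criterion) for name, values in cues.items()}
# Right side of the lens: how strongly the judge's predictions follow each cue.
utilization = {name: pearson(values, judgment) for name, values in cues.items()}
# Agreement between the judgments and the actual criterion state.
achievement = pearson(judgment, criterion)

for name in cues:
    print(f"{name}: validity={validity[name]:.2f}, "
          f"utilization={utilization[name]:.2f}")
print(f"achievement={achievement:.2f}")
```

In this made-up example the judge relies most on the cue that is most informative in the ecology, which is the kind of correspondence between the two sides of the lens that the model is used to examine.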
The lack of perfect predictability of a criterion therefore reflects the uncertainty that is inherent in the task. The judgments (predictions) made about the criterion state are represented by the circle on the right side of the diagram. Lines with varying thickness that extend from the cues to the judgment represent a clinician's weighting of the cues. The weighting of relevant cues according to their informativeness in the actual ecology constitutes evidence for sensitivity to overall clinical patterns in the ecology.

Applying the lens model is one way to examine subjects' sensitivity to these broad, clinical patterns. This model provided several measures of clinical judgment performance, including overall scores that described how closely subjects' judgments were related to the criterion state (lens model achievement), a set of cue-weights for each judge, and an index that assessed the overall degree to which the subjects' weighting of cues matched their importance in the ecology. Hammond et al. (1964) found that this latter index had "promise as an important component of clinical inference" (p. 446).

Section B: Summary of Relevant Research

Research question 1 is: What are the patterns of relationships among various measures of judgment performance (indicators of expertise) and experience for a group of subjects in a clinical judgment task? Research has been selected for review that is relevant to judgment accuracy and consistency (and other indicators of expertise) with the goal of clarifying the relationship between experience and expertise.

Judgment Accuracy as an Indicator of Expertise

Thompson, Ryan, and Kitzman (1990), writing from a nursing perspective, maintained that the basis upon which expertise is defined varies considerably among studies. Some researchers select clinicians who have been working in the field for many years as "experts".
The problem with this approach is that, as Faust et al. (1988), Garb (1989), and Ridderikhoff (1991) argued, longevity in a domain does not automatically qualify one as an expert. Shanteau and Stewart (1992) and Shanteau (1992) acknowledged that the definition of expert is an obvious precondition to any analysis of expertise. Shanteau recommended that people in a domain be allowed to define the experts. In his research, Shanteau defined experts as "those who have been recognized within their profession as having the necessary skills and abilities to perform at the highest level" (Shanteau, p. 255).

In some studies, expertise is defined by credentials (P. E. Johnson, Hassebrock, Duran, & Moller, 1982; Schvaneveldt et al., 1985). This method is effective in fields such as chess; in other fields, however, Clarridge and Berliner (1991) argued that social criteria often lack validity for research purposes.

Implications of the definition of expert are illustrated in a study conducted by Patel and Groen (1986). These researchers investigated the diagnostic reasoning processes of 7 cardiologists with a simulated case of acute bacterial endocarditis. Four of these experts made a correct diagnosis on this task, and 3 did not. All of these cardiologists, however, were considered to be using expert reasoning. Moskowits, Kuipers, and Kassirer (1988) analyzed the decision-making processes of three expert physicians (pulmonary specialists), and described the problem-solving strategies and knowledge representations that these subjects employed to make decisions. The ways in which these subjects dealt with risks and difficult tradeoffs were reported. Large individual differences were noted. The authors pointed out that the cognitive procedures employed likely contributed to errors in decision making under uncertainty. Both of these studies are excellent examples of research in cognitive processing.
The authors, however, have predetermined that individuals who are deemed "experts" on the basis of credentials will (by definition) employ expert reasoning in a given judgment task. This practice may not be as useful for the purpose of identifying process-outcome links as defining expertise on the basis of performance criteria.

A further example of the importance of the definition of expertise is found in a study of novice and expert mathematics teachers. Berliner et al. (1988) developed an innovative predictive task, using multiple choice mathematics questions (representative of those used in the American National Assessment Tests) as stimuli. While thinking aloud, subjects predicted the percentage of students who correctly responded to such questions. This task revealed interesting processing differences between teacher groups, but no differences in accuracy. Thus, it is not safe to assume that experienced professionals are more skilled on a particular task than less experienced professionals.

In the present study, the criterion for expertise in clinical judgment for a particular task was stipulated to be judgment accuracy for that task. Ericsson and Smith (1991) explained the expertise approach:

[This] approach is an attempt to describe the critical performance under standard conditions, to analyze it, and to identify the components of the performance that make it superior. . . . Whereas other approaches can use social indicators as criterion variables, . . . [this] approach requires the design of . . . tasks wherein the superior performance can be demonstrated (p. 8).

In order to make comparisons about judgment performance, it is important to examine judgments made by subjects who have a similar professional role. M. U. Smith (1990) pointed out that in some research, professors are identified as the experts, and students are the novices.
The professors, however, may not necessarily demonstrate the same cognitive processing as expert practitioners in the field, and thus comparisons may be limited. For example, in investigating a range of expertise levels, Murphy and Wright (1984) studied patient descriptions from subjects with different background preparation (practicing psychologists, counsellors for children, and novice undergraduates). Comparison of these descriptions would have limited usefulness in illuminating the development of cognition along an expertise continuum.

Similarly, because clinical judgment demands certain domain-specific knowledge, unless all subjects in a study have at least a base-line knowledge level, any performance differences identified likely should be attributed to these knowledge differences. In one study conducted by Dawson, Zeitz, and Wright (1989), the novices were undergraduates selected on the basis of their lack of knowledge of children, and the experts were experienced supervisory staff at a treatment center for disturbed children. The experts had advanced degrees in psychology, special education, or social work. Although differences in judgment performance were found, they most likely resulted from the large differences in knowledge.

Consistency of Clinical Judgment

Consistency is a necessary (but insufficient) criterion for judgment accuracy. Goldberg and Wertz (1966) identified various types of consistency (or reliability) as important in clinical judgment, including consensus over judges (with the same data), and convergence over data sources (with the same judge). A review of the early literature on judgment reliability has revealed that both of these types of consistency are low (Hunt, Jones, & Hunt, 1957; Watts, 1980). Koran (1975) presented evidence of a lack of reliability of judgment in clinical medicine and surgery.
For example, 3 surgeons assessed the clinical progress of patients being evaluated for surgery. Over several days, each physician made a total of 72 examinations on the same patients. Assessments for dehydration and abdominal rigidity could not be reliably judged; only 51% of the time was agreement reached on the judgment of whether patients were improving or getting worse. Koran (1975) also documented that agreement between clinical and radiological judgments of liver enlargement was no greater than chance. Eddy (1988) reported that in a study of 13 pathologists making judgments on more than 1,000 biopsy specimens, inter-judge agreement was 51% (intra-judge agreement, however, was found to be moderately good, at 68%). In a study of nursing judgment, K. R. Hammond et al. (1966a) found that the nurses studied did not employ cues in a consistent fashion. In other investigations where judgment performance has been assessed (Brenner & Howard, 1976; Hoffman, Slovic, & Rorer, 1968), the same findings emerge.

Not all inter-judge variation, of course, can be assumed to be unreliability. There are times when actual differences in the phenomena being judged can account for what may initially appear to be a lack of consistency. For example, Tape, Heckerling, Ornato, and Wigton (1991) reported that differences in physicians' judgments about the prevalence of pneumonia reflected regional variation in incidence.

Some judgment studies show adequate reliability. Einhorn (1974) investigated the judgments of 3 experienced pathologists for 193 biopsy slides and found inter-judge reliability of .69. Einhorn attributed special importance to the fact that these slides were authentic; interactions and context effects could play whatever role they normally did.
Another investigation demonstrating consistency is a policy-capturing study in which 67 nurses made performance evaluations (Zedeck & Kafry, 1977). Ettenson et al. (1987) reported good reliability with a group of auditors, with mean intra-judge consistency of .83 for experts and .66 for novices; inter-judge reliability was .74 and .41 for experts and novices, respectively.

Additional Indicators of Expertise

Additional indicators of expertise used in this study were confidence, appropriateness of confidence, judgment latency, and knowledge accessibility. Conceptual aspects of these indicators are presented in this section; descriptions of their methods of measurement are presented in chapter 3.

Confidence. Confidence referred to a stated belief about the probability that one's judgment was correct. In the present study, confidence was seen as metacognitive knowledge; it was viewed as a belief about the degree to which one's judgments were accurate. Confidence in clinical judgment has been of interest for many years (O'Connor, 1989; Oskamp, 1962; Ryback, 1967). Paese and Sniezek (1991) examined factors that influenced confidence in judgment and found that increased practice and effort, and an abundance of relevant data, tended to increase confidence, without necessarily bringing about an increase in competence.

Appropriateness of confidence. Oskamp (1962) used the term appropriateness of confidence, which he recommended as a major criterion for expertise in clinical judgment. He proposed that expert clinicians should be capable of distinguishing when they were apt to be right from when they were apt to be wrong. A measure of confidence in relation to accuracy, therefore, was suggested as a way to supplement the use of accuracy alone. Garb (1986) and Oskamp reported appropriateness of confidence was significantly related to professional experience.
In contrast, Grebstein (1963) and Wedding (1983) found no relationship between subjects' accuracy of neuropsychological judgments and confidence.

One way to validate degrees of confidence is to examine the calibration of a set of confidence statements. Overconfidence occurs when, over a series of judgments, stated probabilities exceed the actual proportion of correct judgments (Paese & Sniezek, 1991). A robust finding from the judgment literature is that of overconfidence (Fischhoff & MacGregor, 1982; Fischhoff, Slovic, & Lichtenstein, 1977; Lichtenstein & Fischhoff, 1977). Oskamp (1965) found increasing confidence as the amount of information increased, without increases in judgment accuracy. Einhorn and Hogarth (1978) reported that even though experienced clinicians demonstrated a substantial lack of clinical ability, they had great confidence in their fallible judgment. These authors stated: "Neither the extent of professional training nor the amount of information available to clinicians necessarily increases predictive accuracy" (Einhorn & Hogarth, p. 395).

Not many studies focusing on confidence and accuracy of judgment have been conducted in nursing. An early study was one by K. R. Hammond et al. (1966a) in which confidence ratings of nurses were reported as moderate, with little variation over 100 cases. More recently, Baumann, Deber, and Thompson (1991) found overconfidence in a group of critical care nurses. Although there is a preponderance of evidence for the finding of overconfidence in the literature, Vreugdenhil and Koele (1988) found under-confidence in subjects when predicting future events.

An interesting study on judgment confidence is one conducted by Gigerenzer, Hoffrage, and Kleinbolting (1991).
These researchers investigated confidence from a Brunswikian perspective; they contended that, contrary to arguments otherwise, people are good judges of the accuracy of their knowledge, provided that knowledge was representatively sampled from a specified reference class. Gigerenzer et al. (1991) made the assumption that if subjects cannot make a judgment based on information known directly, they construct what these authors referred to as a "probabilistic mental model" (PMM). "A PMM connects the specific structure of the task with a probability structure of the corresponding natural environment (stored in long-term memory)" (p. 507). This PMM is the functional equivalent of Newell and Simon's (1972) "problem space." Gigerenzer and colleagues predicted confidence based on this theoretical perspective, and obtained data to support their position.

Similarly, Justlin (1994) proposed that a researcher's choice of stimuli or questions may contribute to the extent of bias observed in some studies. He found that nonrepresentative items may lead subjects to overconfidence, and argued for taking an ecological approach to item selection when conducting research in confidence. Erev, Wallsten, and Budescu (1994) also presented evidence that the method of analysis contributes to the conclusions drawn about over- and under-confidence. These authors stated:

In the revision-of-opinion literature, subjective probability (SP) judgments have been analyzed as a function of objective probability (OP) and generally have found to be conservative, that is, to represent under-confidence. In the calibration literature, analyses of OP (operationalized as relative frequency correct) as a function of SP have led to the opposite conclusion, that judgment is generally overconfident. . . . Both results can be obtained from the same set of data, depending on the method of analysis (Erev et al., p. 519).
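The calibration-style analysis mentioned in the quotation (relative frequency correct as a function of stated probability) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in any of the studies cited: the confidence statements and outcomes are hypothetical, the function names are invented for the example, and overconfidence is summarized simply as mean stated confidence minus the proportion of correct judgments.

```python
from collections import defaultdict


def calibration_table(confidences, correct):
    """Relative frequency correct at each stated confidence level."""
    groups = defaultdict(list)
    for stated, is_correct in zip(confidences, correct):
        groups[stated].append(1 if is_correct else 0)
    return {stated: sum(hits) / len(hits)
            for stated, hits in sorted(groups.items())}


def overconfidence(confidences, correct):
    """Mean stated confidence minus proportion correct.

    Positive values indicate overconfidence, negative values under-confidence.
    """
    mean_confidence = sum(confidences) / len(confidences)
    proportion_correct = sum(1 for c in correct if c) / len(correct)
    return mean_confidence - proportion_correct


# Hypothetical series of ten judgments (illustrative numbers only).
confidences = [0.6, 0.6, 0.8, 0.8, 0.8, 0.8, 1.0, 1.0, 1.0, 1.0]
correct = [True, False, True, True, False, True, True, True, True, False]

table = calibration_table(confidences, correct)
bias = overconfidence(confidences, correct)
for stated, observed in table.items():
    print(f"stated {stated:.1f} -> observed {observed:.2f}")
print(f"overconfidence = {bias:+.2f}")
```

For a well-calibrated judge the observed frequencies would match the stated levels; in this made-up series every stated level exceeds the observed frequency, so the summary index is positive.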
The literature on judgment confidence reveals that more research is needed to understand how accuracy and confidence relate in particular contexts, using various methods.

Judgment latency. Experts typically are quick to perform judgment tasks in their domain. With reference to psychomotor tasks, Rabbitt and Banerji (1989) reported that very long practice improved decision speed. The literature is not as clear with respect to conceptual tasks. Berliner (1986), Chi et al. (1981), and Glaser (1989) found that experts were slower in constructing mental representations, compared to novices. Experts were very fast, however, with familiar tasks which could be performed intuitively. Dreyfus and Dreyfus (1986) reported that experts can play chess at the rate of 5 to 10 seconds a move if they rely on intuition and employ little analysis and comparison of alternatives.

Barrows, Norman, Neufeld, and Feightner (1982) reported that experienced physicians initiated hypotheses very early in a clinical encounter, and the earliness of hypothesis generation was associated with correct diagnosis. The total time spent on the task was not related to diagnostic accuracy. Similarly, Muzzin et al. (1982) found that expert physicians generated diagnostic hypotheses almost immediately. With nurses as subjects, Westfall, Tanner, Putzier, and Padrick (1986) found that clinical inferences were activated quickly; the timing of these judgments, however, was not related to accuracy.

Hogarth (1975b) found the relationship between decision time and perceived task complexity to be concave; with both complex and simple tasks, typical decision time was relatively short, whereas with tasks of intermediate complexity, judgment time was longer. Hogarth attributed this finding to information processing limitations.

The literature gives good support for including judgment latency as an indicator of expertise to explore in relation to accuracy.
M. U. Smith (1992), however, warned that it is not safe to assume that the time devoted to a task is a direct measure of processing time. A source of difficulty in interpreting judgment latency in research where reading case material is required is that the time taken for reading will unavoidably be included in the measure. Although it might be possible to remove subjects' differences in reading rate, such adjustment would not be wise. As Patel and Frederiksen (1984) and Coughlin and Patel (1987) explained with respect to medical students and physicians, when subjects read clinical cases, two aspects of comprehension were found to be important: a "bottom-up" (text-based) processing of the cues from the case, and a "top-down" (experience-based) instantiation of relevant prior knowledge. To make judgments that have potential to reveal differences associated with experience, both of these processes must be considered together. Clinicians construct a representation, while reading, in an interactive manner. Any attempt to remove differences in reading rate could also remove variation essential in achieving differential comprehension and mental representation of cases.

Accessibility of domain-specific knowledge. Prawat (1989) defined knowledge access as "the ability to draw on or utilize one's intellectual resources in situations where these may be relevant" (p. 1). Experts have highly accessible domain-specific knowledge. For familiar tasks, experts do not need to employ deliberate attentional resources to retrieve needed information. For example, in describing access, Dreyfus and Dreyfus (1986) stated that "not only situational understandings spring to mind, but also associated appropriate actions" (p. 324).
Barrows and Bennett (1972) noted that, in their study of neurologists, hypotheses seemed to "pop" into the clinician's head, suggesting strong links in memory between salient cues and associations triggered by these cues. Tanner (1984) also found that expert nurses described the immediate access of relevant data as knowledge "popping" into mind. Kihlstrom (1987), Lewicki et al. (1987), Lewicki, Hill, and Bizot (1988), and Lewicki, Hill, and Czyzewska (1992) have studied information processing with implicit knowledge; these researchers claimed that it is unnecessary to limit the concept of access in relation to one's knowledge to conscious access. Currently, controversy exists in the literature about the nature and significance of the role of unconscious perception and cognition (Kihlstrom, Barnhart, & Tataryn, 1992; Merikle, 1992).

Brown (1982) pointed out that the ability to use knowledge flexibly required multiple access. Access to knowledge and organization of conceptual structure may be related: because of the many links and interconnections, access to appropriate knowledge is enhanced through hierarchical structuring. Brown (1982) believed that the development of reflective access, which involved being able to use knowledge, and to contemplate knowledge as an object of thought, was critically important for expertise.

Section C: Findings From the Literature

In this section, literature related to the second and third research questions is discussed. Research question 2 is: What are the patterns of relationships among measures of conceptual structure, sensitivity to patterns in data, judgment process, and performance in a clinical judgment task? The cognitive constructs are discussed in sequence.

Conceptual Structure

Shavelson (1974) defined structure as "an assemblage of identifiable elements and the relationships between those elements" (p. 231).
He defined cognitive structure as a hypothetical construct related to the organization of concepts in memory. The terms cognitive structure and conceptual structure are used interchangeably in this study.

Several researchers have investigated the conceptual structure of individuals (L. K. Hammond, 1970; Larkin et al., 1980; Mitchell & Chi, 1984; Shavelson & Stanton, 1975; M. U. Smith, 1992). Research has revealed that conceptual organization changes with expertise (Benner, 1984; Boshuizen & Claessen, 1984). From a linear array with little coherence or interconnections, knowledge becomes greatly elaborated and successively transformed into integrated hierarchical units based on fundamental categories. Evidence suggests that these categories are not classical in nature, with sets of necessary and defining features, but rather are prototypical (Bordage & Zacks, 1984; Crow & Spicer, 1995; Grant & Marsden, 1987, 1988; Patel & Frederiksen, 1984).

This transformation in knowledge organization has been critical to the evolution of expertise. Small knowledge units are "chunked" into larger components, thereby allowing a person to perceive patterns in the data as a particular event or phenomenon. These patterns of data are known as schemata, the building blocks of cognition (Rumelhart, 1980). Glaser (1986) defined a schema as "a modifiable information structure that represents generic structures of concepts. . . . Schemata represent knowledge that we experience" (p. 921). Schemata are like prototypes of frequently experienced situations in memory that individuals use to integrate instances of related knowledge. Like internal theories or models, schemata enable individuals to impute meaning and make predictions (Kahneman & Tversky, 1973; Tversky & Kahneman, 1980).
One major difference between novices and experts is that the experts have well-developed schemata for domain-specific situations. As expertise is gained, people modify their knowledge base, which facilitates more advanced thinking. An increase in the hierarchical organization of knowledge makes many implications of the data available as logical inferences. Solving problems and making judgments become a matter of categorizing the specific situations according to basic type (Chi et al., 1981). Using category knowledge as well as the values of specific variables, the expert clinician can generate highly useful inferences. In contrast, novices know separate facts, but often needed inferences are not generated.

Results of investigations of expertise in judgment performance may be obscured if subjects have large differences in knowledge. Chi, Hutchinson, and Robin (1989) explored knowledge organization and the effect of structure on the use of the information. These researchers selected a task for which novice and expert subjects knew the same number of attributes. Because they ensured that both groups had a certain knowledge base, they were able to discern differences in knowledge organization that they otherwise would not have detected. The aim of the present study is not to show that knowledge is essential to expert judgment. That point is already known. What is of interest is to investigate differences in judgment performance between novices and experienced subjects, when the novices have basic knowledge.

Sensitivity to Patterns in Data

Kolers (1970, 1979b) described three stages of learning to read, which may be analogous to novice to expert changes in perceiving information in a domain. Initially, the beginning reader perceives each separate letter (like each specific cue the novice sees), and then integrates letters to form words.
These words are often read mechanistically, without meaning. Much later the skilled reader "reads meaning directly from the words within a language; he [or she] does not read the words themselves" (Kolers, 1970, p. 116). Berliner (1986) also used the reading analogy to describe the expert teacher: "We regard the reading of a classroom, like the reading of a chessboard, [italics added], to be in part a pattern recognition phenomenon based on hundreds and thousands of hours of experience" (p. 11). Possibly clinical experts in some real sense read the clinical situation: they no longer process each cue separately but are sensitively perceptive to the meanings constructed on the basis of the set of related cues as a whole.

Kolers and Smythe (1984) and Kolers and Brison (1984) argued that perceiving the meaning of cues is not a "given", but is an achievement or a construction. The meaning perceived depended on one's concepts and skill in perceiving as to how the stimulus flux of continuous life events is segmented into units. People do not encode equally all features of objects or events (or their pictures or symbolic description), but encode in relation to what they have learned to be useful. One's skill in perceiving meaning in sets of cues has a large influence on one's representational ability.

Chi et al. (1982) and Chi, Glaser, and Farr (1988) attributed expertise in problem solving, in part, to changes in representational ability. To the extent that judgment performance is similar to problem solving, literature in this area has relevance for the present study. Chi and colleagues claimed that problem representation depended not only on task characteristics, but also upon observers' conceptual categories brought to the task.
These researchers described this interaction as an outcome of both the initial categorization process (arising from "bottom-up" analysis of cues), and generic ("top-down") category knowledge. Bordage and Lemieux (1991) demonstrated with medical judgments that "the successful diagnosticians . . . are those who use the most diversified sets of abstract relationships and, therefore, who have broader or deeper representations of the problem" (p. 71). Similarly, Lesgold (1984) found that with radiologists, existing schemata formed rules by which new data were interpreted; these schemata functioned as triggers when conditions were satisfied: "The meaning of any given film feature is determined, in part, by surrounding context" (p. 53). Trigger mechanisms become refined with practice.

Several authors (Evans, 1989; Evans & Gadd, 1989; N. F. Jones, 1957; Lesgold, 1989; Margolis, 1987; Ofir & Lynch, 1984; Rock, Bransford, Maistro, & Morey, 1987) have argued for the importance of context in judgment. Margolis claimed that in making judgments, people who are in a familiar ecology rely on a rich variety of patterned cues. Margolis suggested that people learn to perceive individual aspects of a situation only in the context of a whole, which is often implicit or imputed rather than overt. Many relations of stereotypical patterns are constructed over time (Stelmachers & McHugh, 1964). Initially, however, when novices make what are ostensibly the same judgments, no mental patterns are yet available; these judges must rely on deliberately retrieving the meaning of each cue and estimating its influence without benefit of such context. Novice judgments reflect declarative knowledge, where each piece of information is assessed separately and the judgment consists of mentally combining positive, negative, and neutral cues.
In contrast, expert judgments are highly contextual; that is, the cues are interpreted within context, and the meaning constructed on the basis of particular cue-patterns changes with an alteration in context. The expert is sensitive to what is truly relevant in the context at a certain point in time. Elstein, Shulman, and Sprafka (1990), in their ten-year retrospective review of medical problem solving, argued for a view of situated, context-dependent cognition that is consistent with their earlier findings on case-specificity. Most researchers agree that in studying problem solving and clinical judgment, context must be considered. Hobus, Boshuizen, and Schmidt (1991) examined the influence of context variables such as age, gender, and previous diseases, on judgment performance. They reported that "the development of expertise seems associated with the development of illness scripts . . . . resulting in better diagnostic accuracy" (p. 3).

In this study, the primary judgment task involved reading brief case information about surgical patients and predicting healing time. This dependent variable was selected, in part, on methodological grounds described in chapter 3. The decision to use healing time as the dependent variable also has conceptual rationale. Nurses working in surgical settings encounter hundreds of patients each year. People who are admitted to these units vary in terms of age, general health, diagnoses, life style factors, and in a host of other ways. Over time, there is implicit learning of what patterns of cues tend to be associated with particular outcomes, such as rapid healing, wound infection, or wound separation. Following the example of Cleeremans and McClelland (1991), a patient's admission to hospital, surgical procedure, recovery period, discharge from hospital, and home recovery can be viewed as an "event sequence".
These researchers found that with experience, people learned typical event sequences to which they were frequently exposed. Cleeremans and McClelland's subjects could complete partial patterns; they were, however, not necessarily able to verbalize their knowledge. Such "event sequences" can be compared to "illness scripts" that Hobus et al. (1991) described. In the present study, nurses' encounters with surgical patients were theorized to be important in the development of implicit knowledge of "healing scripts". When presented with case data, nurses with more experience were expected to have developed schematic knowledge needed for relating various cues and completing patterns (making judgments of healing time).

Experts solve problems and make judgments based on the detection of similarity between the given problem or case and relevant situations from experience (Patel et al., 1989). As expertise increases, some studies show that this assessment is related to perceiving similarity more in conceptual relations than in perceptual features. For example, Chi et al. (1982) reported that whereas an expert physicist may identify a problem as one based on "Newton's second law," a novice may label the same problem in literal terms as a "pulley problem." Because conceptual relations depend on subjects' perceptions of the stimuli, it is not possible for an experimenter to dictate their existence in a research context. In the present study, these relations were determined by the way subjects experienced the stimuli, as Whittlesea (1987) and Whittlesea and Brooks (1988) described.

Chase and Simon's (1973) finding of perceptual chunking of meaningful chess patterns illustrated how sensitivity to patterns varies with expertise. The chess expert and physics expert have the ability to abstract relevant tacit knowledge elicited by external cues.
The chess master's expertise is derived, in part, from the ability to impose a cognitive structure on the pattern of chess pieces. Similarly, the expert physicist can "see" the deep structure that underlies the terms in a physics problem. Chi et al. (1981) stated that "even though the same set of key words may be deemed important by subjects of both skill groups, the actual cues used by the experts are not the words themselves, but what they signify" [italics added] (p. 149). It is not the words, but the meaning that functions as cues. This same idea was expressed by L. K. Hammond (1970) in relation to clinical judgment. Novices have a limited ability to see underlying meanings and relations, compared to more experienced clinicians.

Research on concepts has shown that an individual's ability to categorize depends on knowledge which may be encoded in patterns rather than in sets of propositions. Rosch (1975), for example, proposed an approach to semantic categories that treats categories like patterns. Barsalou (1985), as well, argued that familiar concepts are not propositions stored in memory, but are dynamic constructions of continuous, interrelated knowledge, tailored to current needs and goals.

Judgment performance is influenced by the way in which mental representations evolve with expertise. Novices examine each case and make judgments about them as separate entities. Experts, in contrast, no longer think atomistically; they think in terms of a relevant category of similar cases. They perceive the present case within the context of representative cases, based on perceived typicality (Medin & Schafer, 1978). Knowledge of complex interaction is distributed across many special cases, and thus this knowledge is easier to learn (Brooks, 1987). Differentiation is taking place on a wider scale. L. B.
Smith (1989) found that similarities were at first perceived globally, and then became more refined with experience. As each situation is compared to previously-experienced situations in memory, individuals construct (or re-construct) meaningfully differentiated subgroups (Hayes-Roth & Hayes-Roth, 1977; Rosch & Mervis, 1975; Rosch, Simpson, & Miller, 1976).

From studies reporting novice-expert differences, it is suggested that representations of problem situations evolve with expertise (B. White & Frederiksen, 1986). Novices construct mental representations with little elaboration; the information related to each cue is initially identified only in binary terms, as a positive or negative impact related to the judgment. Later, mental representations develop so that information for each cue is represented as sets of two-dimensional continua; thus, the degree to which information is positive or negative can be represented. Finally, mental representations of experts are evolved further, containing information in an integrated, multidimensional form, referred to as mental models. These models include not only knowledge related to the variables of the particular case, but also "distributional" knowledge: that is, knowledge of typicality compared to cases of a similar category and knowledge of correlated features (Mellers, Richards, & Birnbaum, 1992).

If representations that experts construct are distributional in form, it is of interest to determine how expectations derived from past experience might influence judgment performance. In previous nursing research, Tanner (1984) found that expert nurses were sensitive to patterns of cues that did not match expectations. It may be that this "mismatch" signals some anomalous input and guides further reasoning for expert clinicians. In a series of experiments, Flannagan, Fried, and Holyoak (1986) examined how categories were learned from observation of exemplars.
These authors assessed the possible role of prior expectations on the way these exemplars were mentally distributed. They used the category density model to account for their findings. In this model, learning was treated as the acquisition of knowledge about the distribution of category exemplars over a feature space. A central assumption of this model was that the learner used presented instances as a sample to induce a density function over the feature space for the population of potential category members. Category exemplars are represented as configurations of values corresponding to points in a multidimensional feature space. A further assumption was that novel instances would be classified on the basis of distributional knowledge. The probability of classifying an instance into a particular category was viewed as proportional to the relative likelihood that the item was generated by that category's distribution, compared to that of the alternative categories. Flannagan and colleagues claimed that whenever a stimulus was observed, the frequencies of its attributes were incremented appropriately, building up a mental "record" that, over time, faithfully reflected the distributions of attributes in the observed stimuli. These researchers found that people who had learned a distribution through experience could classify new items more accurately when these new items matched distributional expectations than when they did not.

The research of Flannagan et al. (1986) is relevant for the present study for several reasons. First, if experienced clinicians have cases encoded as multidimensional distributions, this would provide a large advantage when assessing new cases, provided the cases were selected representatively. Not only higher-order category information, but typicality and associated characteristics would become available.
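
The classification rule of the category density model described in this passage can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from Flannagan et al. (1986): the feature space is reduced to a single hypothetical cue, the induced density is assumed to be Gaussian, and the exemplar values and category names are invented.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical remembered exemplars: one cue value per previously seen case.
# The values and category labels are invented for illustration only.
EXEMPLARS = {
    "rapid_healing": [6, 7, 7, 8, 9, 8, 7],
    "delayed_healing": [12, 14, 15, 13, 16, 14, 15],
}


def induce_density(values):
    """Treat presented exemplars as a sample and induce a density over the cue.

    Assumption: the density is Gaussian; the model itself does not require this.
    """
    return NormalDist(mean(values), stdev(values))


def classification_probabilities(x, densities):
    """Probability of each category, proportional to the relative likelihood
    that x was generated by that category's density."""
    likelihoods = {name: d.pdf(x) for name, d in densities.items()}
    total = sum(likelihoods.values())
    return {name: value / total for name, value in likelihoods.items()}


DENSITIES = {name: induce_density(vals) for name, vals in EXEMPLARS.items()}

if __name__ == "__main__":
    for cue_value in (8, 11, 14):
        print(cue_value, classification_probabilities(cue_value, DENSITIES))
```

A new case whose cue value lies near the centre of one category's remembered distribution is assigned to that category with high probability, whereas a case lying between the two distributions receives a more divided classification.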
Second, this account of categorization, in which the role of experience-related changes in the weights of features (cues) is emphasized, is consistent with an explanation of expertise based on the Dreyfus model. In addition, this approach would help to account for experts' use of the representativeness heuristic, including its functional characteristics in natural continuous environments, and its dysfunction in the laboratory setting where discrete, nonrepresentative judgments are often studied.

Judgment Process

Einhorn (1974) stated that "the combining of information lies at the core of expertise" (p. 570). The impact of information processing on judgment performance has been recognized as an important area of study for many years; yet, it is difficult to specify any one processing strategy that is uniformly "best", because what is optimal depends on: (1) task characteristics (the complexity, the presentation of cue information, and the nature of the cue-criterion relationships); and (2) characteristics of the judge (ability or skill, familiarity with the task, use of weighting and integrating strategies, and information-processing style).

Task characteristics. Several researchers have examined task difficulty as a factor in judgment performance (Abdolmohammadi & Wright, 1987; Corcoran, 1986b). The complexity of a task is viewed as a product of an interaction between task characteristics and subject factors which cannot be completely separated. Such task characteristics include the number of cues, where a greater number tends to increase complexity (Payne, 1976). Task structure is also important in determining complexity; Patel, Groen, and Arocha (1990) attributed difficulty in a medical reasoning task to structural factors. Most tasks become easier with expertise; Adelson (1984), however, found that the difficulty of a task can increase with expertise.
The degree to which the structure of simulated tasks used in the laboratory for research matched the structure of actual tasks in the ecology may have influenced the validity of the results of information processing studies. Hamm (1988a), K. R. Hammond (1987), and Hammond, Hamm, Grassia, and Pearson (1987) have categorized tasks in terms of where they are best located on a task continuum; at one end of this continuum are intuition-inducing tasks, and at the other end are analysis-inducing tasks, with tasks requiring both forms of cognition in between. Hammond and colleagues argued that judgment performance is enhanced when there is a match between the preferred means of cognition of the clinician and the analysis- or intuition-inducing properties of the task. Structural features of the task significantly influence the type of processing used. If, as Dreyfus and Dreyfus (1986) claimed, cognition changes from analysis to a dialectical relationship of both analysis and intuition, then it may be illuminating if subjects at various skill levels performed tasks that induced intuition as well as tasks that induced analysis.

The nature of the structural relationship between cues and the criterion state in predictive tasks can influence information processing (Connolly, 1977). Some cues have a causal structure, whereas other cues have a reflective structure, in which the cues are indicative of the criterion. One structural feature that has information-processing consequences is dimensionality (Peterson & Scott, 1983). Garner (1978, 1983) and Medin and Schwanenflegel (1981) investigated the influence of integral and separable task dimensions; these researchers argued that a subject may be able to re-define integral stimulus dimensions into a new dimension, particularly when dimensions are correlated.
Tasks which encourage the processing of information as holistic or gestalt stimuli have potential to increase the efficiency of the processing and reduce the perceived complexity. This type of transition seems relevant to the change in the perception of tasks that occurs with expertise. Garner (1970) pointed out, however, that not all tasks are equivalent in this regard: only some stimuli form good patterns.

It is not clear from the literature how the presentation of cues (verbal or phenomenal) influences information processing. The use of verbal cues, as opposed to abstract cues that provided no context, has been shown to facilitate accuracy in prediction tasks (Koele, 1980; Miller, 1971; Muchinsky & Dudycha, 1975; Sniezek, 1986); these researchers, however, did not compare verbal cues with phenomenal cues, nor did they investigate how subjects' familiarity with each cue type could influence information processing. Carlstrom (1989) used both verbal and pictorial cue-presentation formats in a study of army aviators. Results revealed that two cues were used in the verbal format, and only one in the pictorial presentation.

Phelps and Shanteau (1978) also examined cue format. These researchers studied experts who made judgments about the quality of livestock. One format was verbal (cue-dimensions of verbally-specified attributes relevant to judging livestock); the other format was phenomenal (the livestock displayed as photographs). The results showed that 9 to 11 cue dimensions, compared to three dimensions, were processed in each condition, respectively. No conclusions about the format of cue expression can be reached, however, because the verbal-phenomenal condition was confounded with orthogonal-correlated cue format. The verbal condition displayed cues that were artificially made to be orthogonal.
The photographs displayed information that was naturally correlated; only three distinct dimensions were available. In addition, subjects made ratings for eight photographs on two occasions. Multiple regression was used with the 16 ratings of quality as the dependent variable, and 11 predictor variables. Eight photographs may be an insufficient number of stimuli upon which to make a conclusion about the number of dimensions used.

The reason that format of cue presentation is important is that in clinical settings, the cues are patterns of phenomena and events familiar to experienced clinicians. During a clinical encounter, clinicians care for patients who display these cues (such as pain), not as printed words, but as dynamic patterns of particular phenomena in context. Yet, judgment studies are usually conducted with descriptions of cues presented in print. Experienced clinicians who perform in the laboratory may be disadvantaged; novices (who are likely more familiar with verbal cues), may perform better in the laboratory than they would have in the clinical setting.

The nature of cue-criterion relationships has a potential to influence information processing in judgment (Brehmer, 1969; Brehmer & Slovic, 1980). For example, when high validity cues are not available, or when the predictive validity of the cues is low, tasks are seen as having considerable complexity (Brehmer, 1980a).

Two further task features that influence processing are the nature of the relationships between cues and the criterion, and the mathematical function by which cues are combined. When cue-values vary directly or inversely with criterion values, the relationship between a cue and the criterion is described as linear. When cue-values and criterion values are systematically related in a manner that reveals a curved line when the data are plotted, the relationship is described as nonlinear.
Cue integration functions are frequently classified as additive (net effect of the summation of positive, negative and neutral cues, each assessed separately) and nonadditive (all other forms of cue integration, including multiplicative). An essential feature of nonadditive cue integration function is that to some extent, cues are processed configurally. K. R. Hammond et al. (1975) defined configurality as a cue-integration method in which cues were combined in a manner that the use of one cue depended on the value of other cues.

Mellers (1980) pointed out the independence of cue-criterion relations and cue integration function. Although in the literature, the terms linear and additive are often seen together, additive functions can be comprised of cues which are related to the criterion either in a linear or a nonlinear manner. The same is true of nonadditive functions.

Hoffman (1968) reported that a large number of empirical studies have demonstrated that performance frequently can be well-described by linear cue-utilization, and additive integration for both experienced and naive subjects. Phenomena which the researcher has deliberately constructed to relate in a nonlinear way can nevertheless often be re-framed and well-described by a linear model (Dawes, 1979). Hoffman pointed out that modeling the linearity in a task system can overwhelm any nonlinearity.

The type of cue-criterion relations in a task influences how subjects process information. If expertise is dependent on clinicians' ability to better detect nonlinear relations compared to novices, it is important for judgment tasks that are designed to reveal expertise to contain cues that relate to the criterion in nonlinear ways. Such a precaution, of course, would not preclude other descriptions from being used, but nonlinear modeling would at least be a possibility.
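
The point that a linear, additive model can describe judgments generated by a configural rule can be made concrete with a small simulation. The sketch below is illustrative only: the two cues, their 1 to 10 scale, the weights of the additive judge, and the multiplicative rule of the configural judge are all assumptions, not values from any study cited here.

```python
from itertools import product
from statistics import mean


def additive_judge(cue1, cue2):
    """Additive integration: each cue contributes separately (weights are assumed)."""
    return 2.0 * cue1 + 3.0 * cue2


def configural_judge(cue1, cue2):
    """Nonadditive (multiplicative) integration: the effect of one cue
    depends on the value of the other."""
    return cue1 * cue2


def linear_additive_r2(cases, judge):
    """Proportion of judgment variance captured by the best linear additive model.

    Assumes a balanced factorial design, so the cues are uncorrelated and each
    slope can be estimated separately as cov(cue, judgment) / var(cue).
    """
    judgments = [judge(c1, c2) for c1, c2 in cases]
    cue1 = [c[0] for c in cases]
    cue2 = [c[1] for c in cases]
    m_j, m_1, m_2 = mean(judgments), mean(cue1), mean(cue2)
    b1 = sum((a - m_1) * (j - m_j) for a, j in zip(cue1, judgments)) / sum(
        (a - m_1) ** 2 for a in cue1
    )
    b2 = sum((b - m_2) * (j - m_j) for b, j in zip(cue2, judgments)) / sum(
        (b - m_2) ** 2 for b in cue2
    )
    predicted = [m_j + b1 * (a - m_1) + b2 * (b - m_2) for a, b in cases]
    sse = sum((j - p) ** 2 for j, p in zip(judgments, predicted))
    sst = sum((j - m_j) ** 2 for j in judgments)
    return 1.0 - sse / sst


# Hypothetical task: two cues, each taking the values 1..10, fully crossed.
CASES = list(product(range(1, 11), repeat=2))

if __name__ == "__main__":
    print("additive judge   R^2:", round(linear_additive_r2(CASES, additive_judge), 3))
    print("configural judge R^2:", round(linear_additive_r2(CASES, configural_judge), 3))
```

Under these assumptions the linear additive model fits the additive judge exactly and still captures most of the variance in the multiplicative judge's responses, which is why a good linear fit alone cannot rule out configural processing.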
There are many previous studies where researchers have tried to capture the cognitive processing of clinicians with configural tasks (Ogilvie & Schmitt, 1979; Meehl, 1950, 1954; Wiggins & Hoffman, 1968a, 1968b); with most of these attempts to demonstrate configurality, the results have been small.

Characteristics of the judge. Characteristics of the judge influence the manner in which information is processed. Having particular knowledge or skill may be critical to processing information in a task. The importance of perceptual skill to many tasks has been recognized in nursing (Benner & Wrubel, 1982) and in teaching (Carter, Cushing, Sabers, Stein, & Berliner, 1988; Sabers, Cushing, & Berliner, 1991). Gilovich (1981) and Tyszka (1986) have found that subject familiarity increased performance on judgment tasks. Mellers (1980), in her studies on learning in probabilistic tasks, found that subjects' ability to predict the criterion was task dependent; tasks with linear cue-criterion relations and additive cue integration functions were learned more quickly than those tasks characterized by nonlinearity and/or nonadditivity.

There is evidence for differences between experts and novices in their weighting and integration of cues. For example, Wallsten and Budescue (1981) reported that in a task where Minnesota Multiphasic Personality Inventories [MMPI profiles] were judged, experienced subjects used information configurally; in contrast, novices used an additive strategy. The ability to successfully process cues with nonlinear relations may increase with experience (N. H. Anderson, 1972). E. J. Johnson (1988) found experts used nonlinear cues to increase accuracy of financial judgment. Experts refined their judgments depending on the combined influence of all the relevant cues as a set; the use of this process led to variable weighting of cues.
Novices may have known the direction of influence of each cue separately, but they had major difficulty making optimal adjustment for interactive cues or for nonlinear relations. They tended to weight each cue the same, regardless of the values of other cues.

As the amount of information increases, judges tend to use simplifying strategies for integration of information. Einhorn (1970, 1971) reported that the use of interactions and complex configurations is widespread in the judgment process; he pointed out that integration functions that are extremely difficult to model may be easier to use from a cognitive perspective, compared to less mathematically complex functions. Payne (1976), using a process tracing methodology, found that high task complexity led to differential use of strategies for judgment. Subjects tended to switch from an initial compensatory strategy to one that is noncompensatory, as information load increased.

Information processing style. Consideration of information processing style may be important in understanding the progression of expertise. Two particular styles discussed are intuitive processing and analytical processing.

Many researchers investigating expertise identify the importance of intuition in judgment (Abernathy & Hamm, 1995; Benner & Tanner, 1987; Elstein, 1988; Hamm, 1988a, 1988b; Mitchell & Beach, 1990). Glaser (1986) argued that experts develop the ability to perceive meaningful patterns, which are seen in the course of everyday activities. Pattern recognition occurs so rapidly that the phenomenological experience takes on the character of intuitions. Elstein suggested that understanding of intuitive reasoning processes is an important objective: "the most valuable output of both AI [Artificial Intelligence] and behavioral decision research may be to give us insight into how we intuitively deal with complex choices and tradeoffs" (Elstein, 1988, p. 155).
Brooks (1978, 1984) and Jacoby and Brooks (1984) used the term nonanalytic, rather than intuitive processing. These authors claimed nonanalytic processing is based on immediate assessment of holistic similarity of an incoming stimulus to a previously encountered situation retrieved from memory. These authors, as well as Dreyfus and Dreyfus (1986), regarded the transformation from the processing of elements to the processing of the whole situation as critical to the development of expertise. Dreyfus and Dreyfus stated:

[T]he new level of performance [expert level] coincides with a shift from the logical processing of atomic facts to the recognition without recourse to isolable elements, of the similarity between a current situation and a stored image-like representation of a previous situation it resembles (p. 66).

Similarly, Schmidt, Norman, and Boshuizen (1990) described expert reasoning in medicine as pattern recognition.

[Pattern recognition is] based on the similarity between the presenting situation and some previous patient available from memory. . . . The final stage of expertise is nonanalytic (pp. 617-618).

Abernathy and Hamm (1995) acknowledged that expert intuition exceeded any scheme used to describe it; it could not be completely captured by rules. Abernathy and Hamm demonstrated that the experts whom they studied responded to situations for which they did not have a script, and seemed to elude the bounds of any analytic framework.

Reber and Lewis (1977) and Reber (1989) related intuition to implicit learning. These authors claimed that this learning represents the epistemic core of intuition; it is the process by which tacit knowledge is acquired.
Characteristics of such learning include three critical features: (1) It fosters the construction of tacit knowledge that is abstract and representative of the structure of the environment in which the subjects have been immersed; (2) implicit learning is acquired without conscious attempts to learn; and, (3) this knowledge can be used implicitly to solve problems and make accurate judgments about novel stimulus circumstances. Implicit learning can be distinguished from explicit learning, which consists of the acquisition of declarative knowledge and deliberately-learned rules for application. In contrast, when people solve problems in their practice settings over long time periods, they develop implicit knowledge that allows them to behave in ways that reflect their knowledge of environmental structures and patterns.

Intuition is the end product of an implicit learning experience. Introspectively, it seems compelling and obvious. Yet, Reber (1989) claimed that, from empirical and theoretical perspectives, intuition has not been well understood. Implicit learning provides individuals with a strong sense of what is the appropriate or inappropriate response to make, but people are largely unaware of the reasons for their mental state. The knowledge is "tacit", as Schon (1983) described.

Intuition was defined by Benner and Tanner (1987) as "understanding without a rationale" (p. 22). These authors acknowledged that intuition has seldom been granted legitimacy as a sound approach to clinical judgment; yet, they believed that it distinguished expert judgment from that of beginners. Intuitive processing, however, is not always accurate (Aspinall, 1976; Bordage, 1984; Kassirer & Kopelman, 1989; LaFortune, 1988; Moskowitz et al., 1988).
Borak and Veilleux (1982) cited many examples where health care professionals (including 85 physicians, 43 of which were statistically sophisticated, and 43 clinical nurses) were assessed for accuracy of their intuitive reasoning. Subjects with statistical knowledge performed best, but all subjects made many errors of intuitive logic. These results constitute evidence of the difficulty of clinical reasoning, and the danger of over-using intuition in contexts where analysis is more likely accurate.

Einhorn and Hogarth (1978) explained that in order to learn to make good intuitive judgments, it is necessary to consider judgments, actions, and outcome feedback together. Yet, these authors claimed that people who make judgments intuitively frequently lack awareness of environmental effects. Wason (1960) pointed out that "in real life there is no authority to pronounce judgment on inferences: the inferences can only [sic] be checked against the evidence" (p. 139). In addition, Wason claimed that people have difficulty in making use of disconfirming information: they are often unwilling to attempt to falsify hypotheses, and thus test those intuitive ideas which carry the feeling of certitude.

In contrast to intuitive processing, analytic processing is the deliberate processing of separate features (such as signs and symptoms) and weighting and combining them to make judgments about specific cases. Explicit learning fosters analytic processing of information. Declarative knowledge resulting from explicit learning episodes is drawn upon, often in a slow and systematic manner. Using decision analysis, Corcoran (1986a) presented a good example of an analytic approach to making clinical judgments in a nursing context; this author stated that analysis can be an effective guide "in complex, troublesome situations where there are mutually exclusive courses of action" (p. 149).
Benner and Tanner (1987) referred to the type of analytic information processing that proficient and expert nurses used as "deliberative rationality".

The literature related to the third research question is reviewed next. Research question 3 is: What are the patterns of relationships among individual differences in age, education, and experience and performance in a clinical judgment task?

Individual Differences in Education and Age

The influences of individual differences in education and age on judgment performance are difficult to separate. Tanner (1987) summarized clinical judgment studies in nursing from 1966 to 1987. Tanner reported that performance was positively related to the academic degree held; for subjects with more than 6 years experience, however, age and performance were negatively related. Aspinall (1976) found a decline in judgment performance for nurses with more than 10 years experience. In a review of studies of physician performance, Lockyer (1992) reported that older age of physician (50 to 55 years) was associated with lower competence, compared to younger physicians; the specific measures of competence were not described. Similarly, Salem-Schatz, Avorn, and Soumerai (1990) found that older physicians had comparatively low knowledge scores, yet high confidence.

In a study comparing senior nursing students, Brooks and Shepherd (1990) found almost identical mean decision-making scores for students from three types of programs, which differed greatly in length and philosophy. Sanford, Genrich, and Nowotny (1992) also reported similar findings.

Individual Differences in Experience

The relationship between experience and judgment accuracy is a major question in this study.
Many researchers regard the attribution of expertise to experience as undisputable; however, it is not always clear what it is about experience that makes a difference, or how experience influences cognition and performance. Berliner (1988) stated: "In any domain of expertise, one must learn through experience. . . . Experience seems to change people so that they literally "see" differently" (p. 49).

The research in this area reveals mixed findings: Goldstein, Deysach, and Kleinknecht (1973) found that clinicians, including those who were experienced, performed poorly on judgments of cerebral impairment. Kundel, Nodine, and Carmody (1978), in investigating subjects' judgments of the presence and location of lung nodules, reported no difference in accuracy of performance for subjects with a wide range of experience. Using a policy capturing approach, Borko and Cadwell (1982) demonstrated much individual difference in teachers' decision strategies; no consistent policy could be identified in the group studied. Subjects were 41 elementary school teachers with a wide range of experience. Lesgold (1984) reported that experience beyond what is considered basic for radiologists did not correlate with expertise.

Faust et al. (1988) studied a nationally representative sample of 155 clinical neuropsychologists. Subjects made appraisals related to neuropsychological disorders such as Alzheimer's disease, head injury, or epilepsy. Faust et al. found that (except for a tendency among experienced practitioners to over-diagnose abnormality) "no systematic relations were obtained between training, experience, and accuracy across a series of neuropsychologic judgments" (p. 145-146).

In research in nursing, del Bueno (1990) and J. E. White, Nativio, Kobert, and Engberg (1992) found very little difference in the accuracy of clinical judgment associated with experience.
Tanner, Padrick, Westfall, and Putzier (1987) reported that there were basically no differences among subjects in the accuracy of diagnostic hypotheses using simulated patients presented in videotaped vignettes. The three groups of subjects studied (beginning students, senior students, and experienced nurses with baccalaureate degrees) had large differences in both knowledge and experience. K. R. Hammond et al. (1966) studied 6 nurses with a range of experience and found large individual differences in inference patterns that seemed unrelated to experience; the nurses did not discriminate among the cues on the basis of their usefulness in judgment. Additional studies, for example, those conducted by Dana, Cocking, and Dana (1970) and Elstein et al. (1993) revealed no relation between experience and judgment accuracy.

On the positive side, Papa, Shores, and Meyer (1990) studied 173 subjects with varied clinical experience in medicine and found that the number of months of clinical experience was significantly related to diagnostic accuracy in simulations of patients with chest pain. The subjects for this study had relatively little experience; the results may indicate that experience makes more contribution to judgment performance early in the learning process.

In making diagnostic judgments about dermatologic conditions displayed as slides, Norman, Rosenthal, Brooks, Allen, and Muzzin (1989) demonstrated increased accuracy with experience. Two studies where physicians made accurate clinical judgments about respiratory conditions with an outpatient population were reported (Christensen-Szalanski, Diehr, Bushyhead, & Wood, 1982). Nissila (1992), in a simulation study of 5 experienced orthopedic nurses and 5 inexperienced nurses, found that 2 of the former group reached correct decisions in all three cases; none of the inexperienced nurses were correct on all cases.
One of the difficulties in determining the effects of experience is that experience is defined differently in different studies. For example, del Bueno (1990) defined experienced nurses as those having worked at least three months in a particular clinical area. At the other extreme, Phelps and Shanteau (1978) considered experienced subjects to be individuals with 21 to 25 years of experience in judging livestock. Thus, a "novice" judge in one study may be an "experienced" judge in another study.

A few studies revealed a relation between clinical judgment accuracy and experience, but such a finding is not dependable. It seems that this area of investigation has demonstrated very little development of research knowledge. Schmidt (1996) attributed the slow growth of cumulative knowledge in psychology to the over-reliance on statistical testing. In many studies the magnitude of relations may be small, and thus there is a high probability that traditional methods of data analysis will not reveal them. The consequence is that there is an accumulation of studies in which researchers have concluded (possibly erroneously) that there is no relationship. Schmidt (1996) argued that "traditional methods based on significance testing make it impossible to reach correct conclusions about the meaning of these studies" (p. 118).

It is difficult to know how to synthesize the results of these studies which have been reviewed. Unless synthesis is achieved, however, discovery of the underlying regularities cannot take place (Schmidt, 1996). Such regularities are the foundation for scientific progress. Schmidt criticized what he referred to as the "voting method" in which conclusions were determined by the results of the majority of the studies. Schafer (1993) also criticized this "box score approach" and recommended a qualitative synthesis be carried out.
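Schmidt's point about small effects and significance testing can be illustrated with a short calculation. The sketch below is not drawn from any of the studies reviewed: it assumes a hypothetical true correlation of .20 between experience and accuracy, a hypothetical sample of 40 subjects per study, and the standard Fisher z approximation for the power of a two-sided test at the .05 level. The function names are illustrative only.

```python
import math


def normal_cdf(x: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def power_for_correlation(rho: float, n: int, z_crit: float = 1.96) -> float:
    """Approximate two-sided power to detect a population correlation rho
    with n subjects, using the Fisher z transformation."""
    # Under the alternative, the test statistic is roughly N(mean_z, 1).
    mean_z = math.atanh(rho) * math.sqrt(n - 3)
    upper_tail = 1.0 - normal_cdf(z_crit - mean_z)
    lower_tail = normal_cdf(-z_crit - mean_z)
    return upper_tail + lower_tail


# Hypothetical values chosen only for illustration.
rho, n, n_studies = 0.20, 40, 10
power = power_for_correlation(rho, n)
expected_significant = power * n_studies

print(f"power per study: {power:.2f}")
print(f"expected significant results in {n_studies} studies: {expected_significant:.1f}")
```

Under these assumed values, roughly a quarter of studies would reach significance, so a tally of significant versus nonsignificant results would favor "no relationship" even though a relation exists. This is the weakness of the "voting method" described above.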
Section D: Additional Factors Influencing Judgment Results

Research questions 4 and 5 are: To what extent does cue-presentation condition (context cues followed by individuating cues, or the reverse), reveal patterns of relationships in performance in a clinical judgment task? To what extent does memory-priming condition (exposure to relevant domain-specific visual stimuli, versus no exposure), reveal patterns of relationships in performance in a clinical judgment task?

In addition to the cognitive variables and individual differences already discussed, the use of cognitive biases and heuristics may influence judgment performance. Assumptions related to research design and methods may have an impact on conclusions reached. These topics are discussed next.

Heuristics and Cognitive Biases

Clinicians make both random and systematic error when predicting a criterion in a judgment context. As has already been documented, reliability figures constitute evidence of inconsistency in judgment. Because cues are related to the criterion probabilistically rather than deterministically, task uncertainty makes random error inevitable.

The human mind is subject to cognitive constraints such as limited attentional resources and finite working memory capacity (Hogarth, 1975a, 1980). Simplifying cognitive strategies, called heuristics, are used to reduce information-processing load. Examples of heuristics include availability and anchoring and adjustment (Kahneman & Tversky, 1982). To the extent that these strategies lead to systematic error in judgment, they are considered as biases. Thus, a bias in this judgment context refers to systematic errors that stem from individuals' perceptual processes, and/or information-processing strategies (Tversky & Kahneman, 1974).
Individuals may use these heuristics in a way that does not lead to error in normal circumstances but, rather, increases judgment performance. Hogarth (1981) specified conditions under which heuristics have potential to be valid. In particular, he has argued that several biases identified in discrete incidents result from heuristics that are functional in the natural continuous environment: "Judgment and choice depend crucially upon the context in which they occur and the cognitive representation of that context" (Hogarth, p. 213). The heuristics discussed include: availability and representativeness heuristics, anchoring and adjustment, failure to consider regression to the mean, failure to perceive true correlations, and the perception of illusory correlations.

Availability heuristic. In a problem solving or judgment context, the ease with which relevant instances come to mind is known as availability (Tversky & Kahneman, 1973). When people use availability as the basis for judgments of the probability of events or the frequency of particular classifications of entities or states of affairs, they are applying the availability heuristic. In general, availability is correlated with ecological frequency, but because it is also affected by additional factors, the use of this heuristic in judgment may lead to bias. One factor with particular influence is the differential salience of available cases (Nisbett & Ross, 1980). Often the extreme cases (for this study, cases with delayed wound healing), because of their salience, exert greater influence on judgment than their actual frequency warrants. In the present study, experienced clinicians are expected to be more likely to use the availability heuristic. Whether judgment performance in the laboratory becomes biased or is facilitated will depend on the extent to which the presented cases reflect the type of experience the subjects have had.
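The way salience can pull a frequency judgment away from ecological frequency can be pictured with a toy calculation. This is only an illustrative sketch, not a model proposed by the authors cited: the case counts and the salience weights are hypothetical, and the assumption that recall is proportional to frequency times salience is made purely for the example.

```python
# Toy illustration: judged frequency when recall is weighted by salience.
# All numbers and the weighting rule itself are hypothetical assumptions.

# Cases a clinician has actually encountered (ecological frequency).
encountered = {"typical_healing": 90, "delayed_healing": 10}

# Assumed relative ease of recall; salient, extreme cases come to mind
# more readily than routine ones.
salience = {"typical_healing": 1.0, "delayed_healing": 3.0}


def actual_share(case_type: str) -> float:
    """Proportion of encountered cases that are of the given type."""
    return encountered[case_type] / sum(encountered.values())


def judged_share(case_type: str) -> float:
    """Proportion of *recalled* cases of the given type, when each case's
    chance of coming to mind is scaled by its salience weight."""
    recalled = {k: encountered[k] * salience[k] for k in encountered}
    return recalled[case_type] / sum(recalled.values())


actual = actual_share("delayed_healing")
judged = judged_share("delayed_healing")
print(f"actual share of delayed healing: {actual:.2f}")
print(f"judged share of delayed healing: {judged:.2f}")
```

With equal salience weights the judged share equals the actual share, which corresponds to the case where availability tracks ecological frequency and the heuristic is not biasing.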
Representativeness. Representativeness is a heuristic which people often employ as they make judgments about the likelihood of uncertain events. Using this heuristic, the subjective probability of an event is determined by the degree to which the event is similar in characteristics to a relevant population (Kahneman & Tversky, 1972). People consistently judge a more representative event as the more likely event. A problem with this reasoning in terms of judgment accuracy is that sample size, or base rate, is often ignored. Dawes (1986) explained that when the heuristic of representativeness is operating, a schema is accessed automatically, but there is generally no intuition of the schema's prevalence tied to the schema itself.

Anchoring with adjustment. This commonly used judgment strategy occurs when a person makes an estimate of a criterion state based on some initial value, and then subsequently adjusts the estimate as additional information is processed. Hogarth (1980) identified one potential risk of this technique as arising from the way the original anchor is generated. People have been known to be influenced by anchors generated by completely random processes (Tversky & Kahneman, 1974). Shapiro (1977) proposed that subjects categorized cases according to similar cases from memory. In this study, if subjects categorize in this way, their estimates of average healing times for category members may act as an anchor; whether such an anchor is helpful or not, would depend on the match between clinicians' experience and the cases presented. A potential source of error common with this heuristic is conservatism, or failure to make sufficient adjustment for the particular case being considered (Kahneman & Tversky, 1972). Considerable adjustment would be required for cases at the periphery of the distributions.
Experts may be those subjects who select good anchors and make appropriate adjustments.

Regression to the mean. In making clinical predictions, failure to consider regression to the mean may lead to overly extreme, or "nonregressive", predictions (Kahneman & Tversky, 1973). What people sometimes fail to consider is that measures of cues include error. When cues have extreme values, the likelihood is that some cues, if measured without error, are in actual fact, more moderate.

Another reason for predictions that appear nonregressive is that action in the clinical setting is taken based on cue values associated with extreme predicted outcomes; such intervention may make the outcome more moderate than it would have been without the action. The presence of such cues leads to aggressive treatment. Thus, if the treatment is effective, paradoxically, extreme cues may lead to more moderate clinical outcomes than what is suggested, given the extreme predictors. Cases with cue values that are not as extreme do not attract the same attention; hence, they may not receive extra treatment. Einhorn and Hogarth (1978) stated that because action taken on the basis of particular cue-values alters the outcome, clinicians have difficulty in learning accurate cue-criterion relations in probabilistic environments.

Perception of actual and illusory correlation. Chapman and Chapman (1967) defined illusory correlation as a systematic error in perceiving co-occurrences, where, in fact, no actual relationship existed. Such error, as well as the failure to perceive true correlations, may lead to inaccuracies in clinical judgments. With experience, people can become sensitive to correlations among features (Medin, Altom, Edelson, & Freko, 1982). Lewicki (1986) demonstrated that the mind can implicitly learn to detect covariation in data.
Distinguishing between data-based and theory-based covariation estimates, Jennings, Amabile, and Ross (1982) found the former estimates to be extremely conservative. Only when objective correlations approached .85 did subjects consistently rate relationships as strongly positive. In contrast, theory-based estimates led to reports of correlations consistent with theory that were not present in the data; in other words, these were illusory correlations.

Smedslund (1963) reported a study of clinical inference conducted with a group of nursing students; a series of cases were presented to subjects in which the presence and absence of a symptom were associated equally often with a diagnosis; that is, the correlation between the symptom and the disease was zero. The subjects concluded that the correlation was positive and could cite many examples that "confirmed" their hypothesis, a finding which also revealed confirmation bias.

There is research evidence which reveals that accurate covariation detection is difficult (Jennings et al., 1982). With experience, however, people learn to see recurring patterns. For example, Medin et al. (1982) reported that subjects were able to use correlations in symptoms as a cue for certain artificial diagnoses. Experienced clinicians perceive patterns of signs, symptoms, and associated actions which can function as global cues.

Experience and the Use of Heuristics

The literature is not clear with respect to the link between experience and the extent of bias in judgment: Shanteau (1978) argued that "bias increases as the unfamiliarity of the stimulus increases" (p. 581). Heller, Saltzstein, and Caspe (1992) maintained that bias in medical judgment arising from heuristics increases with experience.
O'Neill (1994) found, in a sample of 214 community health nurses, that the nurses who were more experienced demonstrated more bias in diagnostic reasoning, compared to less experienced nurses. Christensen and Elstein (1991) claimed that "both experts [who were more experienced] and novices are believed to be equally susceptible to biases" (p. 25). Richards and Wierzbicki (1990) found anchoring effects varied inversely with confidence, but these researchers' subjects were undergraduate students, and, therefore, the results may not be applicable to clinicians.

Christensen, Heckerling, MacKesy, Bernstein, and Elstein (1991) examined medical students, residents, and experienced physicians for the presence of a framing bias. The results are interesting in that whereas students and physicians did not show bias, the residents [intermediate level group] demonstrated framing bias on 5 of 11 cases. The authors proposed that the medical students may have had too little knowledge to make the frame relevant, whereas the experienced physicians have highly stable knowledge, not easily altered by framing manipulations.

It seems reasonable to consider that some experience is necessary for a clinician to experience certain biases, such as anchoring and adjustment, and representativeness, because such bias requires a level of schematic knowledge; this knowledge takes at least some experience to develop.

Research Assumptions and Design Factors

The factors being addressed here were those which influenced the study design and task selection: contextual factors, implications of representativeness, assumptions about cognition, and assumptions arising from research traditions.

Contextual factors. Context has a powerful influence on any research results.
Birnbaum (1982) identifies two types of context: the context provided in the laboratory, including the type of environment and tasks, and the directions, and the internal context the subject brings to the laboratory in the form of knowledge, expectations, and memories. Subjects' judgments depend on both types of contexts. This discussion will focus on the internal context.

Birnbaum (1982) argued that when a stimulus is presented in a research task, each subject's mental context from experience is differentially brought to the task. Birnbaum recommended that these effects be considered, not as biases to be eliminated, but as integral aspects to be studied. He used the term "systextual design" to refer to research in which context is systematically manipulated. In this study, priming was used to influence each subject's internal context in an effort to induce greater availability of prior cases; in addition, cues were presented in two sequences in an attempt to induce either analysis or intuition. Manipulating cue-presentation sequence was anticipated to influence anchoring, similar to research conducted by Friedlander and Stockman (1983).

Implication of representativeness. According to K. R. Hammond (1990), a particular set of stimuli or cues that carry a simple meaning to novice subjects may, in addition, communicate to experienced subjects what are referred to as "secondary" cues. Bouwman (1982) provided a good example of two accountants' protocols that revealed expertise-related differences in response to such cues: based on expectations derived from knowledge of representativeness, the expert used a particular contradiction to guide further exploration; in contrast, the novice, having little experience from which to make expectations, reported the contradiction, but apparently never recognized its informativeness.
The mismatch functioned as a secondary cue in the former instance, but did not in the latter. Knowledge associated with representativeness of the cue configuration can often function as secondary cues. In real-life situations, to those who can detect these secondary cues, the information gained can be useful in judgment.

Whether or not secondary cues facilitate performance in a research context, however, depends on the particular design employed. If representative design is used, such cues may be helpful because the mental context of experienced subjects and the configuration of stimuli used in the study have a high degree of correspondence. In contrast, designs in which cues are orthogonalized (or are otherwise nonrepresentative) could lead to reduced judgment accuracy because the particular composition of presented cases would not match expectations.

The essence of the principle of representative design is that the judge is best studied using cues and cue-weightings that occur in the natural ecology (Brunswik, 1955). Rock et al. (1987), as well, argued for taking an ecological approach to the study of clinical judgment. Experienced subjects learn to consider a range of particular phenomena as normal or typical; phenomena outside this range are viewed as atypical. Natural environments are comprised of ongoing, inter-related events, rather than a series of static, separate states; Hogarth (1981) argued that "representative design is even more important in assessing human capabilities in continuous as opposed to discrete situations" (p. 212).

Kahneman (1992) described the concept of a cognitive norm as the internal variability of an attribute within a category. For experienced subjects, the relative activation of different values of an attribute is governed more by the rules of memory than by the rules of logic or statistics.
This activation forms the basis for various heuristics that are employed to ease the burden of information processing. These heuristics, however, will be effective only in an ecology that is similar to that experienced by the clinician.

Some researchers have maintained that the reason for representative design is to generalize to the clinical ecology. Mook (1983) pointed out that many experimental studies are criticized for lack of external validity, and yet achieving external validity is not always the research goal. With the present study, the goal was to understand clinical judgment at a deep level, and to fairly compare subjects at different experience levels in terms of performance on a task in which various clinical cues are weighed and integrated. To the extent that experts use the representativeness heuristic as a means of achieving their expertise, representative design becomes a necessary element.

One aspect of representativeness in the research setting is the presence of ambiguity corresponding to that which exists in the natural clinical setting. In attempting to understand how human beings accomplish judgment tasks, Brunswik (1956) and K. R. Hammond (1955, 1990) recommended that ambiguity be present in the conditions under which judgment is studied. Tasks are commonly referred to as "ambiguous" if they are not able to be clearly perceived or readily interpreted. In this study, ambiguity is viewed as an interaction between a subject and a particular case, not as an objectively-defined attribute of a stimulus alone; thus, the perception of ambiguity is contingent upon each subject's unique knowledge and experience. Which stimulus is seen as ambiguous very much depends on who is looking.

Einhorn and Hogarth (1985) distinguished between first- and second-order ambiguity.
The former is present where a judge is aware of the uncertainty in the judgment context, and the probabilistic nature of the relationship between the cues and the criterion. The latter is present when the judge is uncertain about the extent and nature of these relations. Experienced clinicians tend towards first-order ambiguity, whereas novices likely experience second-order. This distinction may be important in determining optimal judgment strategies: second-order ambiguity may induce novices to weight cues equally and use an additive rule. In contrast, first-order ambiguity may tend to elicit configural strategies, exceeding the limits of information processing. This more complex strategy is not always associated with greater accuracy (Camerer & Johnson, 1991). Because outcome feedback is often limited in clinical situations, and confidence tends to increase with experience, clinicians may not realize when their strategies are ineffective.

Assumptions about knowledge. The present study is an investigation of the extent to which cognitive constructs and experience account for variation in performance in a clinical judgment task. The assumptions made about knowledge, therefore, are critical in interpreting the results obtained and the conclusions made. Research cannot reveal what restrictive assumptions have precluded from the outset.

Bechtel and Abrahamsen (1991) examined two assumptions about knowledge that are relevant for this study: one is that knowledge is comprised of symbols under the constraint of rules and is expressed in sentences or propositions that have a truth value. These symbols are assumed to be enduring entities which are stored in, and retrieved from, memory. The second assumption about knowledge is that it may be expressed in non-propositional formats; that is, in a nonverbal, or non-symbolic form.
Researchers with this theoretical perspective who model cognition using this assumption referred to such models variously as "PDP" [parallel distributed processing], connectionist, or neural network models. Recently, there has been much interest in this area of cognitive science (Bereiter, 1991; Caspar, Rothenfluh, & Segal, 1992; Feldman & Ballard, 1982; McClelland & Rumelhart, 1986; Quinlan, 1991).

The proponents of these PDP models argued that the mind functions as a network of elementary nodes connected to each other so that active units excite or inhibit each other in a dynamic system. Networks reflect patterns by encoding regularities in weighted connections that are modified by experience. Within a connectionist framework, pattern recognition plays a fundamental role at all levels of cognitive processing. Learning consists of altering the weights of connections between units so that small adjustments in the way in which inputs are processed on subsequent occasions are made. No stored symbols or rules are required. Both symbolic and connectionist systems are computational systems: in the former, computation involves the transformation of symbols according to rules, whereas in the latter, computation is implemented by units that excite and inhibit one another.

The basic aspects of these views are not altogether new: in fact, Neisser (1963) distinguished between sequential and multiple mental processes; the former involved a step-by-step process useful when there is little uncertainty, and the latter involved perceiving input "as a whole", important in recognizing ambiguous cues and patterns. Similarly, Luria (1966) described the integrative nature of mental functioning in terms of successive (serial) and simultaneous (parallel) processing.
Recently, Sloman (1996) supported these earlier views: he stated "The mind has dual aspects, one of which conforms to the associationist view and one of which conforms to the analytic, sequential view" (p. 3). Researchers are exploring the issue of integrating symbolic and PDP approaches (J. A. Anderson, 1990; Bechtel, 1988; Estes, 1988; Wolters & Phal, 1990). Lesgold (1989) argued that both approaches were useful in medicine:

[S]ymbolic models, as currently developed and tested, are well suited to capturing physicians' accounts of the reasons for their diagnoses, while connectionist models are perhaps better suited to mimicking the behavior of physicians (Lesgold, p. 395).

A comprehensive study of clinical judgment performance demands both of these assumptions. When novices make judgments, cognition can be properly described as symbol manipulation; idea-units and rules for action are apprehended, encoded, and retrieved in verbal codes. Limiting expert judgment to symbol manipulation, however, would not reveal cue-judgment connections that are encoded, not in verbal form, but in domain-specific patterns of information. These assumptions, fundamental to the Dreyfus model, are critical for describing progression in judgment performance from novice to expert.

An assumption often made in research on judgment expertise is that analysis is the predominant type of cognitive processing employed. Brooks et al. (1991) pointed out that an assumption of many studies in medical education is that skill consists of learning about separate features which are weighed and combined when clinicians make judgments about specific cases.
This analytic assumption underlies regression models where researchers attempt to capture expertise with changes in the weights applied to a set of cues; it is assumed that these cues have a linear relationship with the criterion and are combined in an additive manner. Abernathy and Hamm (1995) have discussed the use of a combination of analysis and intuition with respect to physicians. In addition, there is recognition in the literature in nursing education that nurses require both analysis and intuition (Miller & Rew, 1989; Radwin, 1995).

Considerable evidence suggests that the analytic approach and the intuitive approach are used interactively. For example, Dreyfus and Dreyfus (1986) described the way in which analysis and intuition are combined as people progress towards expertise:

[A]nalysis and intuition work together in the human mind. Although intuition is the final fruit of skill acquisition, analytic thinking is necessary for beginners learning a new skill. It is also useful at the highest levels of expertise where it can sharpen and clarify intuitive insights (p. xiv).

In early medical studies, models employed did not capture the inter-dependency in problem features and data items, which in all medical fields is critical. An example is the independent cues model referred to by Medin and Smith (1984). The causal relationships among items were not represented, nor was the dynamic nature of illness as a process that was acquired and changed over time. Yet, as Murphy and Medin (1985) argued, it is precisely this type of theoretical knowledge that promotes conceptual coherence. Expert clinical judgment involves being able to perceive theoretically-derived similarity among cases. Perception of such similarity gives direction to judgment that has potential to be much superior to judgments based on unconnected cues and superficial features.
To reveal expertise, the researcher must select tasks that provide opportunity for subjects to demonstrate both analysis and intuition, as well as the processing of similarity relations.

Section E: Summary

Research question 6 is: Of all the measures included in the study, which measures are most predictive of clinical judgment performance?

Convincing evidence exists that conceptual structure changes with expertise; as knowledge organization becomes more highly elaborated, it is restructured into a hierarchical form, allowing rapid access to (or construction of) patterns of relations and inferences. As expertise increases, sensitivity to domain-specific patterns becomes greater. With familiar tasks, experts' "chunking" of separate (but related) pieces of data allows cues to be available together in working memory, which may encourage pattern recognition by facilitating the comparison of relations among task elements. Mental representations evolve with expertise from a static unidimensional internal "problem space" to a dynamic multidimensional mental model containing distributional information; this transformation is accompanied by corresponding increases in functional ability.

Judgment process also changes with expertise, from a focus on specific cues and linear relations to an emphasis on meaningful patterns of cues. Experts' increased familiarity with the ecology and with typical cases promotes greater automatization of routine aspects of a task, allowing attention to be focused on the incongruent or novel aspects. Because of information processing constraints, and the need to process a great deal of data, clinicians frequently employ a number of heuristics when making judgments. Whether the use of these mental strategies leads to cognitive bias, or leads to good judgments, depends on the context.
In the laboratory setting, there are many examples where heuristics (which are functional in the natural ecology) become dysfunctional.

Researchers investigating judgment performance in the laboratory have often failed to find differences in judgment accuracy with clinicians of varying experience levels. This study is an examination of possible reasons; it may be that the laboratory context is not conducive to revealing expertise, or that the artificial tasks often used do not elicit the same thinking and responses that authentic judgment situations would. Although experts may have greater explicit knowledge compared to novices, it may be in their considerable tacit, or implicit, knowledge shown in action, where performance differences will prove to be more detectable.

Revealing patterns of relationships among the indicators of expertise, the cognitive constructs, the individual differences, and the research conditions are matters that cannot be addressed on the basis of the existing literature. In order to identify what measures (if any) are predictive of judgment accuracy, empirical study is required. In the next chapter, the methods used to address these research questions are described.

III. RESEARCH QUESTIONS, METHODS, AND ANTICIPATED PATTERNS OF RELATIONS

This chapter is organized into four sections. The research questions and a description of the design and procedures are in Section A. An explanation of the lens model from a methodological perspective, followed by some critical discussion of linear modeling, is in Section B. An outline of the methods employed (research tasks and conditions, and measures of constructs) is included in Section C. A list of anticipated patterns of relations is available in Section D.

Section A: Research Questions, Design, and Procedures

The following overall research question and specific research questions have provided direction to the study.
Overall Research Question: What are the patterns of relationships among measures of selected cognitive constructs (conceptual structure, sensitivity to patterns in data, and judgment process), individual difference variables (age, education, and experience), task conditions, and performance in a clinical judgment task?

Specific Research Questions: In a domain-specific, probabilistic clinical judgment task, where outcomes are only moderately predictable:

1. What are the patterns of relationships among various measures of judgment performance (indicators of expertise) and experience for a group of subjects in a clinical judgment task?
2. What are the patterns of relationships among measures of conceptual structure, sensitivity to patterns in data, judgment process, and performance in a clinical judgment task?
3. What are the patterns of relationships among individual differences in age, education, and experience and performance in a clinical judgment task?
4. To what extent does cue-presentation condition (context cues followed by individuating cues, or the reverse), reveal patterns of relationships in performance in a clinical judgment task?
5. To what extent does memory-priming condition (exposure to relevant domain-specific visual stimuli, versus no exposure), reveal patterns of relationships in performance in a clinical judgment task?
6. Of all the measures included in the study, which measures are most predictive of clinical judgment performance?

Design and Procedures

In phase 1 of the study, multiple regression analysis was used to derive an equation that characterized the relationships between the cues and healing time in the clinical ecology. The equation was based on clinical data obtained from patients and their medical charts, and from direct reports of healing.
The data included: personal history cues (age, gender, and occupation); medical cues (weight, height, diagnosis, and medical conditions); surgery cues (type of surgery, length of surgery, and complications), and incision cues (length, approximation, and dressing type). The dependent variable was healing time, assessed in days. The equation derived from multiple regression was comprised of the set of cues that best predicted healing time in this patient population.

In phase 2 of the study, the subjects were nurses who completed a number of tasks that required judgments about incisional healing in surgical patients. The primary task was the one in which judgments about healing time were made, and policies were captured for each subject. These policies constituted a mathematical description of the relationships between the cues and the judgments. Equations for each subject describing his or her judgment policy were computed and compared to the equations derived from the ecology. This judgment task was also used to elicit judgment confidence and to measure judgment latency.

For the healing time judgment task, two sets of parallel cases were developed; data from representative patients from phase 1 were used, and the sets of cases were constructed to be as equivalent as possible. One set was developed with the paragraph containing the global cues first, followed by the paragraph containing the specific cues. This order was referred to as normal order. The other set of cases was structured with the specific cues first, followed by the global cues. This order was referred to as reverse order. Nurses who had a range of experience caring for abdominal surgical patients predicted healing times for both sets of cases; sequencing of the administration of the two sets of cases (normal order and reverse order) was counterbalanced.
Subjects were informed that the case data used in the tasks were obtained from actual patients. For each case, immediately after making a judgment of healing time, subjects rated their confidence that each judgment of healing time was within the allowable range of the actual healing time. Subjects were tested individually by the researcher. Directions were given prior to each task, and rest breaks were scheduled to reduce fatigue. Testing time was about 3 hours. Subjects were paid $50.00 each for their participation.

Sample Size

Based on previous research, the assumption was made that in the population a small effect size was most likely. An effect size in this context refers to the magnitude of the relationship being investigated. Cohen (1988) and Olejnik (1984) recommended that the power (the probability of rejecting a false null hypothesis) in a research study should be .70 to .85. The number of randomly sampled subjects needed to attain sufficient power would range from 80 to several hundred, depending on the design, the alpha level selected, and other factors (Cohen, 1988; Olejnik, 1984; Trattner & O'Leary, 1980). There were inadequate resources to conduct a study of such magnitude. In addition, it was not possible to sample subjects randomly. Shaver (1993) argued that "a test of statistical significance used without randomization, . . . does not yield valid information about the probability of a result under the null hypothesis" (p. 299).

The goal of the study was to assess for meaningful patterns between various measures of expertise in judgment performance and subjects' experience. The measures of performance were obtained under two conditions, using tasks which were theorized to induce analysis or intuition. There were theoretically-derived rationales for predicting particular patterns.
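To give a sense of where sample-size figures of the kind cited in the Sample Size discussion come from, the following is a minimal sketch only. It assumes a two-tailed test of a single correlation and uses the Fisher z approximation; neither is stated in the text as the method behind the cited range. The function name and the effect sizes tried (.10 and .30, Cohen's conventional "small" and "medium" correlations) are illustrative choices.

```python
import math
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate N needed to detect a population correlation r.

    Uses the Fisher z approximation for a two-tailed test.
    Illustrative only; not the calculation used in the study.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_power = std_normal.inv_cdf(power)          # quantile for desired power
    fisher_z = math.atanh(r)                     # Fisher transform of r
    return math.ceil(((z_alpha + z_power) / fisher_z) ** 2 + 3)


n_small = n_for_correlation(0.10)   # small effect: several hundred subjects
n_medium = n_for_correlation(0.30)  # medium effect: on the order of 85
print(n_small, n_medium)
```

Under these assumptions the required N falls from several hundred for a small effect to roughly 85 for a medium one, which is consistent in order of magnitude with the range reported above and far above the 36 subjects available.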
Thirty-six subjects who varied with respect to their experience in caring for surgical patients volunteered for the study. It was recognized that this size of a sample was inadequate to achieve the level of power recommended. Other studies where researchers have investigated novice-expert differences, however, have had small sample sizes: Hammond et al. (1964) analyzed data for three groups (5 subjects were naive, 5 were semi-sophisticated, and 5 were sophisticated), and Coughlin and Patel (1987) investigated information processing for 16 medical students and 16 family medicine practitioners. Sample size is small also when data are analyzed by means of protocol analysis; for example, Moskowits et al. (1988) carried out an in-depth analysis on the protocols for 3 subjects.

In the present study, the goal was to assess for patterns that may be meaningful from a clinical and educational perspective. Meaningfulness is not determined by statistical significance (Cohen, 1994; Schafer, 1993; Shaver, 1993). Schmidt (1992, 1996) and Thompson (1987) argued that statistical significance alone does not permit evaluation of the importance of a finding. Schmidt (1992, 1996) stated that even though a study has inadequate power, it may have potential to contain valuable information when combined with similar studies in meta-analysis. This researcher suggested that each small-scale study be considered as a data point to contribute to a later meta-analysis. A sample size of 36 subjects, therefore, was considered to be small, but adequate for the specified purpose.

Section B: The Lens Model

Description of the Lens Model

The lens model is based on research carried out by Brunswik (1955, 1956) and extended by K. R. Hammond (1955), Hammond et al. (1964), Hammond and Summers (1965, 1972), Deane, Hammond, and Summers (1972), and Hursch, Hammond, and Hursch (1964).
Brunswik (1955) argued that in order to obtain an adequate analysis of judgment performance, a description of the environment as well as that of the individual subject's response was required. The lens model provided such quantitative description.

The lens model has had much discussion in the literature (Castellan, 1972, 1973; Groner, 1972; J. C. Lee & Tucker, 1962; Petrinovich, 1979; Stewart, 1976; Tucker, 1964). This description represents a synthesis from these readings. The lens model is both a conceptual model and a method to illustrate the judge's weighting and integration of cues that have varying degrees of relevance for a particular judgment. An advantage of using this method is that subjects at various experience levels can make judgments for the same cases, and comparisons in judgment performance can be made. Another advantage is that the cue-weights are derived in an objective way (using a least squares criterion) that can be compared to the judge's subjective weighting of particular cues.

In any task in which a criterion Y_e (referred to as the criterion state, or distal criterion) must be judged, the subject's response is based on an analysis of informational cues, X_i (referred to as proximal stimuli). In many judgment tasks, the cues are probabilistically related to the criterion, and the judgment made is probabilistically related to the cues. The basic premise of the model is that the environment contains uncertainty, and that judgments are made about a distal criterion on the basis of proximal cues that lack perfect validity. When applied in a clinical context, the cues (X_i) are the pieces of information about patients, such as age, diagnosis, state of health, and surgery, available for consideration.

The lens model can be described in terms of the value of the multiple correlation coefficient, R_e, for the environment, and R_s, for the subject.
R_e describes the relation between the cues and Y_e (the criterion state, or the variable being judged); R_s describes the relation between the cues and Y_s (each subject's response). Conceptually, R_e reflects the extent to which the criterion values can be predicted by the cues, or task predictability, whereas R_s refers to the subject's cognitive control, or consistency. Cues vary in ecological validity (correlation with the clinical state), and in utilization validity (correlation with the subject's judgment of the criterion state).

One assumption of this model is that the criterion state and the judgments can be expressed as a linear function of the cues:

Y_e = b_e' X, and   (1)

Y_s = b_s' X,   (2)

where b_e' and b_s' are vectors of cue weights (regression coefficients) and X is the matrix of cues.

One characteristic of cues in the clinical judgment ecology is that they are frequently intercorrelated. That is, there are typically a number of cues, rather than only one, where certain cue-values are associated with high or low criterion scores. These cues have some degree of intersubstitutability. In situations where the judge is sensitive to the correlations in the cues, judgment performance may be enhanced; intercorrelated cues may be used interchangeably.

A mathematical description of the relationships between the cues and the ecology is known as the left side of the lens model. In contrast, the right side of the model is the corresponding representation of each judge's cue utilization policy. Obtaining such an equation is known as policy capturing (Mazen, 1990; Slovic, 1966; Slovic et al., 1977).

Lens model achievement (r_a) is the correlation between the subject's judgments and the criterion scores.
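The indices just defined can be made concrete with a short sketch. What follows is an illustration under stated assumptions, not the analysis code used in the study: the data are simulated, the cue weights and noise levels are arbitrary, and the function names are hypothetical. Both sides of the lens are fitted by ordinary least squares with an intercept, as in equations (1) and (2). The sketch also computes the matching indices G (correlation between the two sets of linear predictions) and C (correlation between the two sets of residuals) and checks numerically that they recombine into achievement through the lens model equation.

```python
import numpy as np


def fit_linear(cues, target):
    """Least-squares fit with intercept; returns the fitted (predicted) values."""
    design = np.column_stack([np.ones(len(cues)), cues])
    weights, *_ = np.linalg.lstsq(design, target, rcond=None)
    return design @ weights


def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


def lens_model_indices(cues, criterion, judgments):
    """Compute R_e, R_s, r_a, G, and C for one judge."""
    pred_e = fit_linear(cues, criterion)   # left side: ecology
    pred_s = fit_linear(cues, judgments)   # right side: captured policy
    return {
        "R_e": corr(criterion, pred_e),    # task predictability
        "R_s": corr(judgments, pred_s),    # cognitive control
        "r_a": corr(criterion, judgments), # achievement
        "G": corr(pred_e, pred_s),         # linear matching
        "C": corr(criterion - pred_e, judgments - pred_s),  # residual matching
    }


# Simulated (hypothetical) data: 35 cases, 3 cues.
rng = np.random.default_rng(0)
cues = rng.normal(size=(35, 3))
criterion = cues @ np.array([0.6, 0.3, 0.1]) + rng.normal(scale=0.5, size=35)
judgments = cues @ np.array([0.4, 0.4, 0.2]) + rng.normal(scale=0.5, size=35)

idx = lens_model_indices(cues, criterion, judgments)
rebuilt = idx["G"] * idx["R_e"] * idx["R_s"] + idx["C"] * np.sqrt(
    (1 - idx["R_e"] ** 2) * (1 - idx["R_s"] ** 2)
)
print(idx, rebuilt)
```

Because both regressions use the same cues, the recombined value agrees with r_a up to floating-point error; the decomposition is an algebraic identity rather than an empirical finding.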
The lens model equation is:

r_a = G R_e R_s + C \sqrt{(1 - R_e^2)(1 - R_s^2)}   (3)

This equation is a mathematical description indicating that achievement from a lens model perspective, r_a, is limited by the degree to which the task is predictable, R_e. Beyond that, such achievement is partly determined by knowledge of the task, G, and also by cognitive control, R_s. A measure of nonlinear variance-matching, C, has potential to increase achievement, but only if significant systematic nonlinear or unmodeled variance exists in the environmental system which can be detected and correctly used (Cooksey, 1996). Stenson (1974) defined C as the linear correlation between the variance in the task system and the subject's judgment system that is unaccounted for by the linear component. Thus, if all the systematic variance in the criterion scores can be accounted for by a linear combination of cues, then C will equal zero. If a subject were sensitive to nonlinearity, but used it inappropriately, C would be negative (Dudycha & Naylor, 1966).

The matching index, Σd, was described by Hammond et al. (1964), and referred to by Hammond and Summers (1965) and Hoffman (1968); this index is a measure of the extent to which a subject uses the cues, relative to the validity of the cues. Hammond et al. (1964) found that Σd became smaller with experience and was highly related to lens model achievement. If matching were perfect, Σd would be zero. See Figure 2.

[Diagram not reproduced. Left panel, Clinical Ecology: predictability (R_e), clinical state (Y_e), and predicted clinical state (Ŷ_e). Centre, Cues (X_i), linked to the ecology by ecological validities (r_e) and to the judge by cue utilization validities (r_s). Right panel, Judgment Policy: policy consistency (R_s), judged state (Y_s), and predicted judged state (Ŷ_s). The two sides are joined by Achievement (r_a) and Ecology-Policy Match (G).]

Figure 2. Brunswik's lens model, showing mathematical relations.
r_a = Achievement
X_i = Informational cues
Y_e = Criterion state being judged
Ŷ_e = Predicted criterion state
Y_s = Subject's judgments of the criterion state
Ŷ_s = Predicted judgments of the criterion state
G = Ecology-policy matching (linear knowledge)
R_s = Policy consistency (cognitive control)
R_e = Predictability of ecology (task control)

Note. From "A social judgment theory perspective on clinical problem solving," by J. D. Engel, R. Wigton, A. LaDuca, and R. S. Blacklow, 1990, Evaluation & the Health Professions, 13, p. 65-66. Copyright 1990 by Sage Publications, Inc. Adapted with permission.

Critical Discussion of Linear Modeling

Separating cognitive processes from task effects. Hoffman (1968) argued that regression approaches to modeling judgments related inputs (data) and outputs (judgments) mathematically; such models were not a representation of cognitive processes. He considered these linear models to be paramorphic (Hoffman, 1960, 1968). That is, regression equations constituted a description of the judgments "as if" the cognitive processing were such. The fact that linear models work quite well may be more because they model the task rather than the judge (Westenberg & Koele, 1994). Thus, when performance is demonstrated through regression approaches, the findings may not be informative about the actual cognitive processes used. In this study, additional tasks were used (information board and protocol analysis) to allow the nature of the cognition involved in judgment to be assessed.

Problems of interpretation of model parameters. Diamond (1989) critiqued the use of the regression method as reported in a recent medical study (Speroff et al., 1989). Schneck and Naylor (1968) also identified important limitations of the modeling process. In any sample, outlying cases have potential to significantly influence the cues and their weights.
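The sensitivity of cue weights to outlying cases can be seen in a deliberately small example. This is a sketch with made-up numbers, not data from the study: six cases follow an exact two-cue linear rule (weights 2 and 1), and a seventh case with an aberrant outcome is then added. The helper name `cue_weights` is hypothetical.

```python
import numpy as np


def cue_weights(cues, outcome):
    """Least-squares cue weights (intercept fitted, then dropped)."""
    design = np.column_stack([np.ones(len(cues)), cues])
    weights, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return weights[1:]


# Six hypothetical cases that follow outcome = 2*cue1 + 1*cue2 exactly.
cues = np.array(
    [[1, 1], [2, 0], [3, 1], [4, 0], [5, 1], [6, 0]], dtype=float
)
outcome = 2 * cues[:, 0] + 1 * cues[:, 1]
clean = cue_weights(cues, outcome)

# Add one outlying case: extreme on cue1, with an outcome of 0
# where the rule would give 15.
cues_out = np.vstack([cues, [7.0, 1.0]])
outcome_out = np.append(outcome, 0.0)
shifted = cue_weights(cues_out, outcome_out)

print("weights without outlier:", clean)
print("weights with outlier:   ", shifted)
```

In this toy sample a single case is enough to shrink the first weight well below its original value and to reverse the sign of the second, which is the kind of instability the critique describes.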
Minor differences in cue-intercorrelations can lead to large differences in the particular set of cues which enters the equation. Furthermore, results from one sample of subjects often do not cross-validate well. Relationships between the data and outcome criteria have been found to be unstable, unless the sample size is very large.

Definition and measurement of configurality. As was discussed in chapter 2, the distinction between configural and analytic processing may be an important source of difference in judgment process. Kaplan (1975) defined configural processing as "[processing] where properties of a given stimulus are determined with reference to the other stimuli in the array" (p. 150). Stimulus importance is affected by the relationships in the configuration. In linear processing, stimulus values are simply added or subtracted, and patterning on the basis of the stimulus array is not involved. Camerer and Johnson (1991) pointed out that experienced subjects often use configural rules, whereas novices behave more like regression models: they weight cues and add them up. One problem for experienced subjects, however, is that often even elaborate configural rules have little positive impact on performance.

In the traditional lens model analyses, nonlinear cue-use is assessed by the value of C. In previous studies, comparisons of novices and experienced individuals have revealed little difference in C-scores; the reason may be that, as Green (1968) explained, configural effects are often masked by overwhelming linear effects. N. H. Anderson (1972) argued that clinical judgment can be considered as information integration; the task facing the clinician is to integrate separate pieces of informational stimuli into some unitary judgment. Anderson claimed that "one main difficulty with previous work has been its reliance on standard multiple regression methodology. . . .
The failure to detect configurality seems to have resulted . . . in part from the inherent limitations of that methodology" (p. 93). Goldberg (1968) found evidence lacking for configurality in judgment, and concluded that "the power of the linear regression model is so great that it serves to obscure the real configural processes in judgment" (p. 488).

Considerable research has been carried out in relation to linear and nonlinear models (Brehmer, 1969; L. C. Johnson & Mai, 1979; Shanteau & Anderson, 1972). Although the lens model has provision for both linear and nonlinear cue-use, the variance explained by linear modeling has traditionally been identified first; any systematic variance remaining that matches variation in the task system is then considered nonlinear. Rozenboom (1972) stated: "The extent to which the relation between cues and focus variables is linear rather than curvilinear is very much an artifact [emphasis added] of how we choose to span the cue space" (p. 324).

To the extent that expert clinicians' differential weighting of cues for different cases is required for expertise, the lens model may not be completely adequate to reveal differences associated with expertise. Regression methods are based on the assumption that cue-weights are invariant across cases. To identify if configural processing of cues is associated with expertise, additional methods for determining configural cue-use may be necessary. In this study, the orthogonal-cue judgment task was used as an alternative method for this purpose.

Correlational measures of performance. Lindell (1978) expressed a warning about the interpretation of R^2 as an index of performance; this measure is the proportion of variance accounted for by linear regression.
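The masking of configural processes by linear fit, noted above by Green and by Goldberg, bears directly on how R^2 should be read. As a minimal sketch, assume a hypothetical judge whose rule is purely configural (the judgment is the product of two cues, with no additive component) and two cues that each take the values 1 to 5 in a fully crossed design. These choices are illustrative and are not drawn from the study.

```python
import numpy as np

# Fully crossed design: two hypothetical cues, each taking values 1..5.
levels = np.arange(1, 6)
cue1, cue2 = [grid.ravel() for grid in np.meshgrid(levels, levels)]

# A purely configural (multiplicative) judgment rule.
judgment = (cue1 * cue2).astype(float)

# Fit an additive linear model with intercept to those judgments.
design = np.column_stack([np.ones(cue1.size), cue1, cue2])
weights, *_ = np.linalg.lstsq(design, judgment, rcond=None)
residual = judgment - design @ weights

# Proportion of judgment variance reproduced by the additive model.
r_squared = 1 - residual.var() / judgment.var()
print(round(r_squared, 3))
```

Under these assumptions the additive model reproduces about nine tenths of the variance in judgments that contain no additive process at all, so a high R^2 by itself says little about whether cues were combined linearly.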
Lindell argued that any factor which influenced the magnitude of the response variance affected the R^2 index: "To the degree that subjects' responses are differentially affected by these factors, differences in levels of R^2 become uninterpretable" (p. 71). Lindell recommended that in comparing different subjects, the absolute amount of residual variance for each subject was more useful than an index based on R^2. In the lens model, multiple R (R_s) is used to operationalize the construct of cognitive control. O'Grady (1982) also pointed out cautions and limitations regarding measures of explained variance in psychological research. In this study, the R^2 index and multiple R were not used as performance measures, but as independent variables, tapping subjects' sensitivity to broad patterns of data in the ecology.

The main indicator of judgment expertise in this study was stipulated to be accuracy in judgments of healing time. Accuracy was determined by two measures. The first was the square root of the mean of squared error scores, as recommended by Lindell (1976). This measure, similar to the one employed by W. F. Wright and Anderson (1989), represents how many days, on average, each subject's judgments deviate from the actual healing time. Because the time for incisions to heal was not precise, an interval measure of healing time was used: a 10% interval was constructed around two days preceding and two days following the reported healing time. Subjects were informed about this measure. This procedure was used so that an interval of at least five days would be assured when assessing subjects' calibration. The judgments were difficult, and a liberal interpretation of the meaning of healing time was appropriate.
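A root-mean-square error measure against an interval criterion might be computed along the following lines. This is a sketch under assumptions rather than the scoring procedure used in the study: the interval limits are taken as given inputs (their exact construction is not reproduced here), a judgment falling inside the interval is assumed to carry zero error, and a judgment outside it is scored by its distance in days to the closest limit. All names and numbers are hypothetical.

```python
import math


def deviation(judged_days, interval):
    """Distance in days from a judgment to the closest interval limit.

    Assumed to be zero when the judgment lies inside the interval.
    """
    low, high = interval
    if judged_days < low:
        return low - judged_days
    if judged_days > high:
        return judged_days - high
    return 0.0


def rms_error(judgments, intervals):
    """Square root of the mean of squared deviation scores."""
    squared = [deviation(j, iv) ** 2 for j, iv in zip(judgments, intervals)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical judgments (days) and healing-time intervals for four cases.
judged = [10, 14, 20, 7]
intervals = [(8, 13), (8, 13), (15, 20), (10, 15)]

error_days = rms_error(judged, intervals)
print(round(error_days, 2))
```

Expressing the result in days keeps the measure directly interpretable: it states how far, on average, a subject's judgments fell outside the acceptable range, independent of how variable that subject's responses were.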
For each case, the "error", or deviation score, was defined as the difference between each subject's judgment of healing time and the closest limit of the interval measure of actual healing time. This measure of accuracy is residual variance, where smaller scores indicated higher accuracy. For greater interpretability, a simple linear transformation was carried out by subtracting each score from 100, so that high scores indicated higher accuracy. The equation for accuracy (transformed) assessed by mean error variance was:

Accuracy = 100 - \sqrt{\frac{\sum (HT - JHT)^2}{n}}

Where:
HT = Healing Time (closest limit of interval measure)
JHT = Judgment of Healing Time
n = Number of cases (35 normal order, 35 reverse order)

The second measure of judgment accuracy was the number of correct judgments as a percent (or proportion) of total judgments. The proportion measure was used to be consistent with the literature in assessing confidence in relation to accuracy.

Error variance and the percent correct measures have an advantage over correlational measures for comparing novice and experienced clinicians on performance accuracy. Correlation methods are noted to be insensitive to differences between sets of scores that increase monotonically (Pritchard & Roth, 1991). For example, Castellan (1992) demonstrated that the use of G (task knowledge) obtained from lens model analyses had limited use as an index of performance: this author stated that "G can be large for even the most perverse or least attentive subjects" (p. 380). Evidence that G-scores quickly reached ceiling levels is found in Cooksey and Freebody (1987), where first- and second-year student teachers' G-scores were found to be extremely high (.86 to .99) in a study where reading achievement was predicted.

The problem of the criterion.
In trying to determine expertise in clinical judgment performance, there frequently is difficulty in defining judgment accuracy in an objective manner. This point is not a criticism that is peculiar to linear modeling; any method of investigating judgments may be subject to such a problem. Shanteau (1992) stated that external standards are seldom available and that is why expert judgment is needed; the fact that standards are defined from subjective opinion of experts (and not the other way around) can be problematic. Salthouse (1991), in a discussion about research on expertise, and Ericsson and Smith (1991) pointed out the importance of obtaining measures of actual competence, rather than relying on consensual judgment or other index.

Situations which constitute judgment performance cannot be treated as a simple algorithm, which, if followed faithfully and systematically, will lead unequivocally to accurate responses. By definition, and by their very nature, judgments are probabilistic and will contain error. How can expert judgment be distinguished from random guesses? Chan (1982) characterized good judgment as having coherence, veridicality, and reliability. In some contexts such as weather forecasting, the judgments made can be compared to the weather that actually occurs. Over time, a measure of accuracy can be calculated. Clinical judgments, however, frequently do not have an externally verifiable criterion. Judgments about the severity of congestive heart failure (LaDuca et al., 1988), the nature of cardiac pain (Jacavone & Dostal, 1992), and patients' need for blood transfusion therapy (Salem-Schatz et al., 1990) are difficult to assess for quality because there is no absolute basis for correct judgments. In this study, the judgment task involved predicting healing times for a number of patients.
Because an actual criterion existed, levels of judgment accuracy for different subjects could be compared. In some studies, expert judgment is substituted for the criterion, a practice which may be justifiable in some circumstances. For example, Carlson (1989) investigated the degree of risk of a child for abuse, and Kirwan, Chaput de Saintonge, Joyce, and Currey (1983) judged current arthritis activity. When the purpose of the study, however, is to examine aspects of expertise in judgment, an objectively verifiable criterion is essential.

Section C: Research Tasks, Conditions, and Measures

This section is a description of the methods used, including research tasks, conditions, and measures.

Overview of Methods

In the literature review, it was demonstrated that judgment researchers often focused on one phase of the judgment task, and considered only a few variables. In this study, the intent was to explore the patterns of relations between experience and a number of indicators of clinical judgment performance on one task, with a group of subjects who had a wide range of experience. The patterns of relations identified for both judgment process and outcome under two different conditions were anticipated to be helpful in interpreting the results of previous judgment research.

The methods selected were intended to achieve the following goals: (1) to model the relationships between cues and a criterion in one clinical ecology; (2) to reveal aspects of conceptual structure, sensitivity to patterns in data, and judgment process; (3) to obtain data about age, education, and experience; (4) to capture subjects' judgment policies under either the enhanced memory-priming or the baseline memory-priming condition, and in both cue-presentation orders; and (5) to identify variables that were most predictive of expertise in clinical judgment performance.

Research Tasks and Conditions

Association task.
Benner's (1984) research on novice and expert nurses provided direction for this task. The purpose of the task was to encourage subjects to describe their experiences with surgical patients, with a particular focus on incisional healing. This was the initial research task; because the subjects had not yet been influenced by the other tasks in the study, it was of interest to pay attention to the level of specificity of the cues to which they referred, and the structure of their verbalization. In his research on expertise, Shanteau (1992) found that experts' knowledge was readily accessed through stories about past cases. Anecdotal accounts provided a mnemonic to remember and a convenient way to organize the information. After reading a card with information about a patient with an abdominal incision, subjects were asked to say what came to mind. They were encouraged to use their experience in any way that might be helpful. Three cards were used and responses were tape-recorded. The types of cues and their level in a conceptual hierarchy, and evidence of the extent to which a coherent theory guided subjects' thinking, were assessed, as suggested by Murphy and Medin (1985).

Information board task. This task was based on research carried out by Payne (1976) and reviewed by Ford, Schmitt, Schechtman, Hults, and Doherty (1989), and by Harte, Westenberg, and von Someren (1994). Information search and judgment processes were assessed. The goal of the task was to have subjects rank order four simulated patients (with data based on actual case data) with respect to healing time. Cues which might provide information about the four patients were printed on overturned cards; only the labels for each cue-category were in view initially. Thirty-two pieces of information were potentially available; subjects decided which cue to examine at each point, up to a maximum of 24 pieces.
Pilot testing this task showed that some data restriction was necessary in order to elicit strategic information processing. Once sufficient data were selected and subjects integrated the cue information, they made rank order judgments of healing time. During this search and judgment task, subjects were asked to think aloud. Verbalizations were tape-recorded. Process tracings were constructed which revealed depth, variability, and pattern of information search. Examples of strategic search processes, and compensatory and noncompensatory integration strategies, were identified. Verbalizations were assessed using protocol analysis according to descriptions from Ericsson and Simon (1980, 1993), Olson and Biolsi (1991), and Payne, Bettman, and Johnson (1993a, 1993b).

Healing time judgment task. This task was based on research by K. R. Hammond et al. (1964). Subjects responded to two sets of 40 cases, each consisting of 35 cases for analysis, and five repeated cases to assess consistency. The cues were expressed in two brief paragraphs: one containing context cues (such as age, history, diagnosis, and surgery) and one containing the individuating or specific cues (such as blood loss, surgery time, complications, and incision data). Cases were presented one at a time by personal computer, using a program developed for this purpose. The paragraphs were typed in white letters on a blue background for ease in reading. Once the subject signalled readiness, the initial paragraph was presented on the screen. The second paragraph appeared when the subject again indicated readiness; both paragraphs remained in view until the subject was ready to make a judgment. Subjects had control over presentation time, to a maximum of 30 seconds per paragraph; from pilot testing, such time was determined to be sufficient.
Subjects read each case and made their best judgments of healing time and confidence. Latency was timed by the computer for each case, and subjects were aware of this.

In the normal order cases, context cues were presented first, followed by the individuating cues. Because this paragraph sequence would be highly familiar (particularly for experienced subjects), it was theorized that normal order presentation would encourage the use of well-learned scripts. In the reverse order cases, the individuating cues were presented first, followed by the context cues. Reverse order presentation was theorized to require more attention and encourage more deliberate processing because the individuating cues would have little meaning until they were understood in terms of the context cues. In addition, reversing the order of the paragraphs was thought to potentially interfere with the formation of a good anchor to use as a heuristic for prediction (Friedlander & Stockman, 1983). Based on the research of K. R. Hammond and colleagues, the normal order cases were theorized to be intuition-inducing, whereas the reverse order cases were expected to be analysis-inducing; it was anticipated, therefore, that experienced subjects would demonstrate reduced performance in the reverse order cases, compared to their performance in the normal order cases. Examples of both normal order cases and reverse order cases are provided in Appendix B.

Presentation sequence for paragraph order (normal order and reverse order cases) was counterbalanced between subjects. Subjects assigned to case sequence 1 completed all normal order cases, then reverse order cases. Subjects assigned to case sequence 2 completed all reverse order cases, then normal order cases. Each subject performed both sets of cases; paragraph order was a within-subject factor.
Subjects were randomly assigned to case sequence, within experience levels. The presentation sequence for all tasks is shown pictorially in Appendix C.

Healing time was defined as the earliest time that incision surfaces were fully approximated, with no stitches or staples, no redness, swelling, or drainage. This definition (consistent with the definition used with surgical patients in phase 1 of the study) was explained to each of the subjects. Expertise in judgment performance was measured by accuracy, calculated as the percent (or proportion) correct, or the square root of the mean of squared errors, as stated previously. For the error variance measure, an upper limit (or "ceiling") was imposed so that a few extremely inaccurate judgments would not have excessive influence on the score for the entire set of 35 judgments. Deviation scores of 4 weeks or more for any one case were considered irrelevant to the assessment of judgment performance.

The following assessments for the lens model measures were made.
(1) Knowledge of task relations, G (the extent to which the subject correctly detected the properties of the task), was obtained by correlating the prediction of the criterion with the prediction of each subject's judgment.
(2) Cognitive control, $R_s$ (the predictability of the subject's responses based on the cues), was obtained from the multiple correlation between $Y_s$ and the cues.
(3) Nonlinear variance-matching, C (an indication of configural cue-use), was the correlation between the residuals of the linear predictions of $Y_e$ and $Y_s$.
(4) Ecological validity coefficients, $r_{ei}$ (measures of the importance of the cues in the ecology), were calculated by correlating cue-values with actual criterion-values in the ecology.
(5) Utilization coefficients, $r_{si}$ (the importance placed on the cues by the subject), were calculated by correlating cue-values with judged criterion-values in the subject system.
(6) The matching index, $\Sigma d$ (the degree of match between the ecological validities and the utilization coefficients), was calculated in the following manner. First, two sets of differences from the lens model cases were determined: those between the model-based $r_{ei}$ (representing the ecology) and the subject-based $r_{si}$ (representing the subject system), and the differences between the model-based regression coefficients and the corresponding regression coefficients from the subject system. The products of these differences were computed, and then added. If a subject had perfect matching ability, then $\Sigma d$ would be zero. Thus, as a matching index, $\Sigma d$ assesses the degree of mismatch. The equation for $\Sigma d$, based on Hammond et al. (1964), is as follows:

$$\Sigma d = \sum_i \left[ (r_{ei} - r_{si})(\beta_{ei} - \beta_{si}) \right] \qquad (5)$$

Where:
$r_{ei}$ = Ecological validity coefficients
$r_{si}$ = Utilization coefficients
$\beta_{ei}$ = Model-based regression weights
$\beta_{si}$ = Subject system-based regression weights

Cue-importance task. Several researchers (Birnbaum & Stegner, 1981; Brehmer & Qvarnstrom, 1976; Cotton, Jacobs, & Grogan, 1983; Nystedt & Magnusson, 1975; Schmitt & Levine, 1977) recommended that, in terms of understanding the psychology of the judgment process, research directed at understanding the subjective weights of the cues be pursued. Therefore, in this study (in addition to the two regression policies), a third judgment policy was derived for each subject based on subjective weights of cues. In a series of 10 lens-model type cases, for each case, subjects predicted healing time based on the cues given, and rated the importance of each of the cues to the judgment. Subjects rated cue-importance on a scale from 1 to 100. This method was used because of its simplicity.
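The lens model indices and the matching index defined in this section can be illustrated with a short sketch. It is not the analysis program used in the study: the function names are hypothetical, ordinary least squares on standardized variables is assumed for both the ecology and the subject system, and the regression weights entering the matching index are assumed to be standardized weights.

```python
import numpy as np


def _standardize(a):
    """Convert values to z-scores (column-wise for a 2-D array)."""
    a = np.asarray(a, dtype=float)
    return (a - a.mean(axis=0)) / a.std(axis=0)


def _fit(z_cues, z_y):
    """Least-squares fit of z_y on the cues: weights, predictions, residuals."""
    design = np.column_stack([np.ones(len(z_y)), z_cues])
    coef, *_ = np.linalg.lstsq(design, z_y, rcond=None)
    predicted = design @ coef
    return coef[1:], predicted, z_y - predicted


def _r(a, b):
    """Pearson correlation between two vectors."""
    return float(np.corrcoef(a, b)[0, 1])


def lens_model_indices(cues, criterion, judgments):
    """Return G, Re, Rs, C, cue correlations, and the matching index."""
    z_cues = _standardize(cues)
    y_e = _standardize(criterion)   # ecology (actual healing time)
    y_s = _standardize(judgments)   # subject system (judged healing time)
    beta_e, pred_e, resid_e = _fit(z_cues, y_e)
    beta_s, pred_s, resid_s = _fit(z_cues, y_s)
    n_cues = z_cues.shape[1]
    r_e = np.array([_r(z_cues[:, i], y_e) for i in range(n_cues)])
    r_s = np.array([_r(z_cues[:, i], y_s) for i in range(n_cues)])
    return {
        "G": _r(pred_e, pred_s),     # knowledge of task relations
        "Re": _r(y_e, pred_e),       # linear predictability of the ecology
        "Rs": _r(y_s, pred_s),       # cognitive control
        "C": _r(resid_e, resid_s),   # nonlinear variance-matching
        "r_e": r_e,                  # ecological validity coefficients
        "r_s": r_s,                  # utilization coefficients
        "sum_d": float(np.sum((r_e - r_s) * (beta_e - beta_s))),
    }
```

Under these assumptions a subject whose judgments reproduce the criterion exactly obtains G = 1 and a matching index of zero, which agrees with the interpretation of the matching index as a measure of mismatch.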
Furthermore, in a study comparing seven methods for obtaining the subjective importance of cues, which included the rating method used here, Cook and Stewart (1975) reported no significant differences between methods. A subjective policy was derived for each subject. These policies were used (with regression-based policies) to predict healing time for a new set of 35 cases as a means to assess the degree to which the subjective policy compared with the regression-based policies in accuracy of cross-validation. The new set of cases was developed from phase 1 data to be parallel to the other two sets.

Einhorn, Kleinmuntz, and Kleinmuntz (1979) and Surber (1985) identified that reduction in the range of cue-values can lead to inaccurate interpretation of the lens model equation. In developing each of the sets of cases, every attempt was made to obtain a range of cue-values that was representative of the clinical ecology. In Table 3-01, the means, standard deviations, and R² for normal order, reverse order, and cross-validation cases are reported, to demonstrate the extent to which the three sets of cases were matched on these features.

Table 3-01
Means and Standard Deviations for the Dependent Variable, Healing Time, and Regression Model R-Squared for Normal Order, Reverse Order, and Cross-Validation Cases

Summary Description for Three Sets of Cases

Types of Cases             Mean Healing Time   SD for Healing Time   Variance Explained (R²)
Normal Order (n=35)        18.40               15.37                 .70
Reverse Order (n=35)       19.66               15.32                 .68
Cross-Validation (n=35)    19.02               14.72                 .64

Note. SD = Standard Deviation

Slide task. Subjects viewed 15 slides of patients with abdominal incisions; each slide was a close-up view of the incision so that fine detail was possible to see. A rear-projection viewer with a 16 inch screen was used. Subjects sat about three feet from the screen. Lighting in the room was dimmed to create maximum visibility.
All slides were shown in the same sequence. The first 12 slides were shown for 5 seconds; the last three slides were shown for about a minute. Each subject described and interpreted these latter three slides. Responses were tape-recorded.

This task had two purposes. The first purpose was to attempt to enhance memory-priming. The slides were anticipated to elicit memories of relevant past cases. One of the criticisms of conducting research in a laboratory context is that cues that are normally present in the natural setting are absent in the laboratory. Because clinicians make use of such cues as a guide to judgment, their absence may (in part) account for previous researchers' findings that judgment accuracy in the laboratory setting is modest. In order to reduce the performance-depressing effect that the laboratory context may have, the association task and the information board task were scheduled to precede the healing time judgment task for all subjects. The slide task, however, was theorized to instantiate an appropriate schema more effectively compared to tasks that rely on verbal cues alone. In addition, the visual detail about actual incisions available in the slides was predicted to be more helpful to experienced subjects, compared to novices, because experienced subjects likely have many more memories. An interaction between slide condition and experience was predicted. The slide task was scheduled prior to the healing time judgment task for half the subjects at each experience level in order to assess whether this attempt at enhanced memory-priming was effective.

Two memory-priming conditions were used: enhanced memory-priming and baseline memory-priming. Subjects in condition 1 saw the slides immediately prior to engaging in the healing time judgment task. In condition 2, subjects saw the slides after completing this task.
Thus, memory-priming was a between-subjects factor. Subjects at each experience level were randomly assigned to the priming condition. If evoked memories included accurate cue-criterion relationships, judgment accuracy was theorized to increase in the enhanced memory-priming condition. If memories were not available, or not in correspondence with the selected cases, such exposure to slides should not facilitate judgment performance.

The second purpose of this slide task was to obtain a measure of accessibility of particular domain-specific knowledge. This task was derived from research carried out by Myles-Worsley, Johnston, and Simons (1988), Norman, Brooks, Allen, and Rosenthal (1990), and Norman et al. (1989). Myles-Worsley and colleagues used X-ray films as stimuli, and Norman and colleagues examined diagnostic performance from slides of dermatology conditions. In the present study, verbalizations (both descriptive and evaluative comments) about the incisions seen in each slide were tape-recorded.

Orthogonal-cue judgment task. This task was referred to as the card-sorting task. It was based on the research conducted by Ashton (1974), Hoffman, Slovic, and Rorer (1968), Slovic (1969), and Slovic, Rorer, and Hoffman (1971). The purpose of this task was to assess the use of cues where nonlinearity was present. Case stimuli were generated by orthogonal combination of four cues, three of which were well known to influence healing: age (two levels: 32, 78); diabetes (two levels: absent, present since age 30); weight (four levels, expressed in pounds: 90, 150, 210, 270); height, expressed in feet and inches (three levels: 5' 0", 5' 5", 5' 10"). Four levels of weight were used to include an example of a nonlinear cue, where both extremes could lead to problems with wound healing.
Also, the weight cue and the diabetes cue were configural: the importance of the variable weight to the judgment of healing time depended on the value of height, and the importance of diabetes depended on the value of age.

In this study, subjects made judgments about healing for 48 non-representative, hypothetical patients. Patient data (cues) were printed on cards. Subjects were informed that each case referred to a hypothetical female patient who had undergone a major abdominal operation. They were encouraged to use their knowledge and experience to help them estimate healing times for patients with particular combinations of data. A meter-long time-line with markers at 10-day increments was placed on a desk surface. The goal was to assess each card with patient data and place the cards judged similar with respect to healing time together in a category. Subjects could construct as many categories as they could discriminate. To encourage analytical processing, subjects were informed that they could take whatever time they needed, apply any strategy they wished, and continue to sort the cards until they were satisfied with their placement.

Subjects' judgments of healing time in hypothetical patients reflected main effects of the information cues and interaction effects, or configural cue-use. Because the patients were hypothetical, there were no objectively correct responses for this task. Using the responses given by each subject as dependent variables, the data from this task were analyzed first using a single-subject Analysis of Variance (ANOVA), and then by calculating omega-squared (ω²). Omega-squared provided an estimate of the proportion of the total variance in the subject's judgments that could be attributed to main effects and to interaction terms (Hayes, 1981; Slovic, 1969).
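A single-subject omega-squared analysis of the 2 x 2 x 4 x 3 card-sorting design might be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used in the study: the function names are hypothetical, each of the 48 cells holds one judgment, and the three-way and four-way interactions are pooled to serve as the error term (with one observation per cell there is no within-cell error, and the source does not state which error term was used).

```python
from itertools import combinations

import numpy as np


def effect_sums_of_squares(y):
    """Sum of squares for every main effect and interaction in a balanced
    factorial array with one observation per cell (one axis per cue)."""
    axes = tuple(range(y.ndim))
    effects = {(): np.full(y.shape, y.mean())}  # grand mean
    ss = {}
    for order in range(1, y.ndim + 1):
        for subset in combinations(axes, order):
            others = tuple(a for a in axes if a not in subset)
            marginal = y.mean(axis=others, keepdims=True) if others else y
            effect = np.broadcast_to(marginal, y.shape).copy()
            # Remove the grand mean and all lower-order effects.
            for lower_order in range(order):
                for lower in combinations(subset, lower_order):
                    effect -= effects[lower]
            effects[subset] = effect
            ss[subset] = float((effect ** 2).sum())
    return ss


def omega_squared(y, error_order=3):
    """Return (omega-squared for main effects, omega-squared for two-way
    interactions), pooling terms of order >= error_order as error."""
    y = np.asarray(y, dtype=float)
    ss = effect_sums_of_squares(y)
    df = {s: int(np.prod([y.shape[a] - 1 for a in s])) for s in ss}
    ss_total = float(((y - y.mean()) ** 2).sum())
    error_terms = [s for s in ss if len(s) >= error_order]
    ms_error = sum(ss[s] for s in error_terms) / sum(df[s] for s in error_terms)
    omega = {
        s: (ss[s] - df[s] * ms_error) / (ss_total + ms_error)
        for s in ss if len(s) < error_order
    }
    main = sum(v for s, v in omega.items() if len(s) == 1)
    interactions = sum(v for s, v in omega.items() if len(s) == 2)
    return main, interactions
```

Here `y` would be a 2 x 2 x 4 x 3 array of one subject's judged healing times indexed by age, diabetes, weight, and height. Judgments that are a purely additive function of the cues yield an interaction value near zero, whereas a weight-by-height dependency raises it.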
Omega-squared for main effects was a measure of linear information utilization; ω² for interactions assessed configural information utilization. In this study, the latter measure was of most interest, as it provided an assessment of nonlinear cue-use that was an alternative to lens model C-scores. Omega-squared for interactions represented interactive processing under analytic conditions.

Concept similarity judgment task. This task was based on the research conducted by Schvaneveldt et al. (1985) on the structure of expertise, and that of Roth, Gabel, Brown, and Rice (1992), who identified changes in students' cognitive structure associated with studying texts. Each subject's conceptual structure (or cognitive structure) was revealed through multidimensional scaling (MDS). MDS is a mathematical technique in which a complex matrix of numbers representing proximities is reduced into a simpler picture. The distances between points reflected the psychological proximity of the concepts (Schiffman, Reynolds, & Young, 1981).

Similarity judgments were made of all possible unique pairs of 16 concepts related to wound healing (120 judgments in total). The particular concepts used were obtained through reading the literature on wound healing, and by consulting with a clinical nurse specialist with expertise in this field. Each pair of concepts was displayed in sequence by personal computer, using a program developed for this purpose. An assumption was made that the proximity judgments were symmetrical; thus, the similarity rating for the concept pair Erythema Inflammation was assumed to be the same as that for Inflammation Erythema. The ordering of concept pairs was the same for all subjects. The concept names were presented on the same line in white letters on a blue background half way down the screen.
Subjects controlled the presentation rate and were encouraged to respond quickly.

The verbalizations arising from the association task (discussed previously) constituted the second source of data for conceptual structure. Verbal protocols are discussed in relation to judgment process.

Measures of Cognitive Constructs

Conceptual structure. Data pertaining to this construct were obtained from two sources. One source was the R² associated with the multidimensionally scaled ratings generated from the similarity judgment task. The other source was the measure of coherence of theory from the association task. Conceptual structure was inferred from the relationships among concepts, the structure of verbalizations, and the level of key terms subjects used (superordinate, basic, or subordinate) as subjects described experiences caring for surgical patients.

Sensitivity to patterns in data. Sensitivity to overall patterns of data in the ecology was assessed by the lens model measures, supplemented by the verbal protocols. Sensitivity to fine-grained patterns involving more subtle case-by-case variation in cue importance was assessed by the standard deviations of the subjective importance of cues and by the variability of information search for each alternative. The influence of cue-presentation order (context cues followed by individuating cues, or the reverse) on the patterns of relations was assessed by comparing lens model measures and judgment accuracy separately in the two conditions.

Judgment process. This construct was assessed by the following measures: depth, variability, and pattern of search from the process tracing derived from the information board task, and configurality measured by the interaction terms obtained from the orthogonal-cue judgment task, and by lens model C-scores. Judgment process was assessed also by means of protocol analysis.
Protocol analysis is a technique that has been used successfully in many studies of judgment process (Elstein et al., 1978, 1990, 1993; Ericsson & Simon, 1980, 1993; Newell & Simon, 1972; Svenson, 1989). P. E. Johnson et al. (1982) used think-aloud protocols to distinguish lines of reasoning for novice and expert physicians. Nissila (1992) used this method with 5 experienced nurses and 5 less experienced nurses to identify lines of reasoning in clinical nursing decisions. Protocol analysis was used in this study as a supplement to regression approaches, so that data could be obtained on judgment process as well as on judgment outcome. Protocols generated from the information board task were assessed for evidence of data search and cognitive processing strategies.

Making inferences about cognitive processes from verbal protocol data, however, is not without its difficulties; in fact, some authors criticize this method (Nisbett & Wilson, 1977). There are three main concerns regarding the use of think-aloud protocols as data in judgment research. The first relates to reactivity (whether the thinking aloud influences the judgment process). The second concern is validity; some authors question whether the protocols obtained are accurate as indicators of cognitive processes. The third concern is that, because experienced subjects have more automatized cognitive processes, compared to novices, both reactivity and validity may vary with experience level.

These criticisms of protocol analysis have had much debate in the literature (Ericsson & Simon, 1993; Fidler, 1983; Nisbett & Wilson, 1977; Russo, Johnson, & Stephens, 1989; P. A. White, 1988).
To avoid reactivity, and to obtain maximum validity, the method must be carried out correctly; Carroll and Johnson (1990) reminded researchers that asking subjects to explain their thinking while engaged in the task represented a major threat to the validity of verbal protocols. Henry, LeBreck, and Holzemer (1989) assessed the influence of thinking aloud with 60 pediatric nurses and found no apparent effect on cognitive processing. Biggs, Rosman, and Sergenian (1993), in a study of judgments of investment quality, reported that verbalizations did not affect the amount or pattern of data acquisition, or accuracy of judgments; earlier findings indicating that verbalization increased task latency were replicated.

Results from previous studies have revealed that judgment processes are adaptive and constructive (Payne, Bettman, Coupey, & Johnson, 1992; Payne, Bettman, & Johnson, 1993a). A qualitative analysis was carried out to determine whether the subjects in this study demonstrated these characteristics. For example, the process tracings were examined together with the transcribed verbal protocols to identify examples of adaptive strategies. In addition, the process tracings were divided into three parts: the first eight cues, the second eight cues, and the remainder. Each section was classified in terms of strategy use. Outcome performance in terms of accuracy was determined.

Indicators of Expertise

Accuracy of judgments. This variable was assessed using accuracy based on the error variance measure and on the percent correct measure, using the judgments of healing time from the lens model task. Judgment accuracy was determined by comparing each judgment with the actual clinical state in the ecology.
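The two accuracy measures can be illustrated with a minimal sketch. Several details are assumptions made for the illustration and are not confirmed by the source: the function names are hypothetical, healing times are taken to be in days, a judgment inside the interval is given a deviation of zero, and the 4-week ceiling is applied by capping each deviation at 28 days.

```python
import math


def deviation(judgment, low, high):
    """Distance from a judgment to the closest limit of the interval of
    actual healing time; zero when the judgment lies within the interval."""
    if judgment < low:
        return low - judgment
    if judgment > high:
        return judgment - high
    return 0.0


def accuracy_scores(judgments, intervals, ceiling=28.0):
    """Return (transformed error measure, percent correct) for one subject.

    judgments: judged healing times (assumed to be in days)
    intervals: (low, high) limits of actual healing time for each case
    ceiling:   assumed cap on any single deviation (4 weeks = 28 days)
    """
    capped = [
        min(deviation(j, low, high), ceiling)
        for j, (low, high) in zip(judgments, intervals)
    ]
    root_mean_square = math.sqrt(sum(d * d for d in capped) / len(capped))
    percent_correct = 100.0 * sum(d == 0 for d in capped) / len(capped)
    # Subtracting from 100 makes higher scores indicate higher accuracy.
    return 100.0 - root_mean_square, percent_correct
```

Under these assumptions a single very large miss contributes at most 28 days to the error measure, so it cannot dominate the score for a set of 35 cases.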
Even though nurses caring for surgical patients are expected to predict which patients have highest risk for delayed healing, and they frequently make judgments about treatment based on the level of perceived risk, these judgments were anticipated to be difficult. As a means to increase the ability of the subjects to make the required judgment, two formats for response were used in this study. One was an analogue scale anchored with "extremely shorter than average" on the left, and "extremely longer than average" on the right. To indicate their response regarding healing time, for each case, subjects moved the cursor to the desired point (which was converted to a number by the computer program). The second response format was numeric. Subjects entered their estimate of the healing time, expressed in number of days. Based on research carried out by Tversky, Sattath, and Slovic (1988), the use of two types of response formats was theorized to help subjects provide the most precise estimate of healing time of which they were capable.

Judgment of confidence. Subjects' confidence in judgment was determined as a component of the healing time judgment task. Following the research tradition of G. Wright (1982), Keren (1991), and Paese and Sniezek (1991), the assumption was made that confidence expressed as a proportion can be treated as a subjective probability. Reported confidence reflected subjects' degree of certainty that their judgments were within the interval measure of healing time.

Data related to confidence in judgment were obtained by two methods. The first was a concurrent measure, based on research carried out by Fischhoff et al. (1977) and Lichtenstein, Fischhoff, and Phillips (1982). Each subject moved the cursor to a point on a scale which best represented his or her confidence that the preceding judgment was correct (within the interval range of the actual healing time).
The second method was retrospective, and was motivated by research conducted by Gigerenzer et al. (1991). Each subject was asked to make two retrospective estimates of the number of judgments of healing time that were within range. Subjects made these overall estimates at the conclusion of each of the two sets of 40 judgments. Bjorkman (1992, 1994) and Juslin (1994) made assumptions based on a Brunswikian perspective. In the present study, these authors' research methods have been drawn upon because they are consistent with the lens model approach. Bjorkman (1994) proposed that beliefs in the form of internal cue validities mediated the processing of ecological cue validities in the assessment of confidence.

Judgment consistency. The judgments of healing time and the judgments of confidence were assessed for consistency. Five cases in each set of normal order and reverse order cases were repeated. For each subject (and for each set of cases), consistency for healing time was obtained by correlating the repeated judgments of healing time. Similarly, consistency in judgments of confidence was obtained by correlating the repeated judgments of confidence.

Calibration of confidence. This indicator of judgment expertise was based on research carried out by Garb (1986, 1989) and Oskamp (1962, 1965). Judgments of confidence were aggregated over confidence intervals and compared with the proportion of accurate judgments to see how well individuals were able to judge the accuracy of their predictions of healing time. Calibration measures the extent to which the subjective probabilities (confidence judgments) are realized in terms of relative frequencies. Probabilistic judgments are said to be well-calibrated when the corresponding judged probabilities match the relative frequencies of events (Ronis & Yates, 1987).
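The comparison of aggregated confidence with proportion correct can be sketched as follows. The sketch is illustrative only: the function name, the number of confidence intervals, and the summary index (a weighted mean squared difference between mean confidence and proportion correct, which is one common calibration index) are assumptions and are not taken from the study.

```python
def calibration_summary(confidences, correct, n_bins=5):
    """Group confidence judgments into equal-width intervals and compare
    mean confidence with the proportion of correct judgments in each.

    confidences: subjective probabilities between 0 and 1
    correct:     True/False (or 1/0) for whether each judgment fell
                 within the interval measure of healing time
    Returns (rows, index); each row is (mean confidence, proportion
    correct, number of judgments); index is 0 for perfect calibration.
    """
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        position = min(int(conf * n_bins), n_bins - 1)
        bins[position].append((conf, 1.0 if ok else 0.0))

    rows = []
    weighted_sum = 0.0
    for members in bins:
        if not members:
            continue  # no judgments fell in this confidence interval
        n = len(members)
        mean_confidence = sum(c for c, _ in members) / n
        proportion_correct = sum(k for _, k in members) / n
        rows.append((mean_confidence, proportion_correct, n))
        weighted_sum += n * (mean_confidence - proportion_correct) ** 2
    return rows, weighted_sum / len(confidences)
```

For a subject who reports a confidence of .30 on ten judgments and is correct on three of them, mean confidence and proportion correct coincide and the index is zero; larger values indicate poorer calibration.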
Chan (1982) argued that for a judge who is well-calibrated, the realized frequency distribution should not deviate too much from the predicted probability distribution. For example, on those days when a weather forecaster predicts a 30% chance of rain, he or she should be correct about 30% of the time. That is, in retrospect, the historical record should show that on 30% of these days it did in fact rain. In the present study, to be able to make this type of assessment, a proportion measure of accuracy was used. Such a measure was obtained by dichotomous scoring: all judgments within the interval measure of healing time (as described earlier) were scored as correct, and all judgments outside this interval were scored as incorrect. Subjects were aware of this interval measure. All 40 judgments in each set of cases (including repeated cases) were used in the calculation of proportion correct.

Judgment latency. Latency was determined for the first paragraph and for the two paragraphs presented together. The two time periods for each case were added to determine the number of seconds from the time subjects indicated readiness to read a case to the time they were ready to enter their numeric judgment. For each subject, latency was averaged over cases.

Accessibility of knowledge. The Knowledge Accessibility scale was developed from subjects' responses to the three slides showing close-up views of abdominal incisions; subjects provided both descriptive and evaluative data. Verbalizations were analyzed for particular knowledge, structure, and style.

In Table 3-02 an outline of the tasks, measures, and related constructs used in this study is presented.

[Table 3-02: Outline of tasks, measures, and related constructs used in this study]

Measures of Experience

Experience was assessed by the number of months caring for surgical patients, by the estimates of the number of surgical incisions seen during their nursing career, and by the number of references to experience from the protocol data.

Section D: Anticipated Patterns of Relations

Based on the literature, with a focus on Dreyfus and Dreyfus' (1986) theoretical views pertaining to novice to expert changes in cognition, the following patterns of relations were anticipated:

1. Experienced subjects were anticipated to have clinical judgment performance characterized by higher accuracy and consistency, greater confidence, better calibration, and shorter latency, compared to less experienced subjects.
2. Experienced subjects were expected to have more sophisticated conceptual structure, greater sensitivity to patterns in data, and more complex judgment process compared to less experienced subjects.
3. Individual differences in age and education were expected to attenuate the patterns of relations between experience and accuracy in judgment performance.
4. Cue-presentation condition was anticipated to reveal a decrement in performance for experienced subjects, compared to their performance in the normal order cases.
5. Memory-priming condition was expected to reveal higher performance for experienced subjects in the enhanced memory-priming condition, compared to that in the baseline memory-priming condition.
6. Cognitive variables were anticipated to be most predictive of expertise in clinical judgment performance, compared to other variables.

Section E: Summary

In this chapter, a description of the method has been provided. The methods designed to address the research questions raised in chapter 1 have been described. Each of the tasks and measures has been selected to reveal different aspects of clinical judgment performance, or to address a problem with revealing expertise in the laboratory context. The results from one group of subjects on these tasks were anticipated to demonstrate various patterns of relations. When examined together, these patterns were anticipated to provide greater understanding of expertise in clinical judgment, and possibly to add to knowledge about changes in cognition with expertise. The next chapter is a report of the analyses and results.

IV. ANALYSES AND RESULTS

This chapter is organized in four major sections and a summary. In Section A, the analyses and results of phase 1 of the study where the ecology was modeled (the left side of the lens model) are presented. In Section B, the sample of subjects who made judgments is described, an overview of the data analysis is provided, and the first research question is addressed. The analyses and results relating to indicators of expertise (judgment accuracy, confidence, consistency, calibration, latency, and knowledge accessibility) are reported. In Section C, the analyses and results related to the cognitive constructs (conceptual structure, sensitivity to patterns in data, and judgment process), and individual differences in age, education, and experience are presented. In Section D, the results related to paragraph order condition and memory-priming condition (research questions four and five) are given.
The last section is a summary of the main findings: the variables which best predict expertise in clinical judgment are identified.

Section A: Lens Model Analyses [Phase 1 of the Study]

To obtain phase 1 data (the left side of the lens model), a sample of 281 patients with abdominal incisions was investigated as described in chapter 3. The dependent variable Healing Time was missing for 13 subjects; the data from these cases, therefore, were excluded from analysis. An additional 10 cases with extreme healing times were also omitted because these scores on the dependent variable could have excessive influence on the magnitude of the regression coefficients (Barnett & Lewis, 1978; Hatch & Prihoda, 1992). Data from 99 male patients and 159 female patients (258 cases) were analyzed. The data were screened and the degree to which the data met the assumptions underlying regression was evaluated, as recommended by Tabachnick and Fidell (1989). The variables which predicted healing time and their relative importance were then determined.

Data Screening

Thirty-five variables were assessed for accuracy of data entry. Six variables were dropped from analysis because of missing data. The variable Occupation was converted to a numeric socioeconomic index as described by Blishen, Carroll, and Moore (1987). The variables Weight and Height were combined to form a single variable, Body Mass Index, following examples in medical research (Revicki & Israel, 1986; Shetty & James, 1994). The use of this index eliminated variation in weight associated with differences in height. The frequency distributions for all variables were assessed. The distributions for Age, Hemoglobin, and Occupation were close to normal, whereas the distribution for Surgery Time was found to have a high kurtosis value of 27.60 (standard error of 0.30).
An examination of the histogram revealed that the distribution for Surgery Time included one particularly deviant (but accurate) score of 933 minutes; in all other 257 cases, surgery times were between 45 and 514 minutes. To address problems associated with departure from normality, Hawkins (1980) recommended "Winsorization", which involved truncating the deviant score to one unit beyond its nearest neighbour. Hawkins stated that, providing the outlier represented a valid observation, an advantage of Winsorization was that this technique allowed the researcher to make partial use of the data and yet avoid excessive influence. The kurtosis was reduced to 5.49 (standard error of 0.30). Two other variables with high kurtosis values were: Blood Loss (9.73, standard error of 0.30) and Body Mass Index (6.85, standard error of 0.30). Various transformations were attempted, with some degree of success, but they all increased the complexity of interpretation. These independent variables, therefore, were not transformed.

The following seven variables were dichotomous: Gender (men/women), Contamination (clean/contaminated), Diabetes (no diabetes/diabetes), Dressing (dry/packing), Infection (no infection/infection), Incision Length (short/long) and Incision Approximation (approximated/open). Ten variables were converted to dichotomies because of uneven distributions in the original categories: Race (Caucasian/other), Blood Transfusion (none/some), Reason For Surgery (other/cancer), Prior Operations (none/some), History Of Illnesses (no/yes), Number Of Drains (none/some), Nutritional Status (good/poor), Recent Weight Loss (none/some), Severity Of Surgery (minor/major) and Complications (none/some). For all of these variables, the categories were coded as 0 and 1, respectively.
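The Winsorization of Surgery Time described above can be illustrated with a minimal sketch. It assumes a single high outlier and treats "one unit" as one minute. The function name `winsorize_top` is hypothetical, and apart from the 514 and 933 minute values reported in the text the data are invented for illustration.

```python
def winsorize_top(values, unit=1):
    """Truncate the single largest value to one unit beyond its nearest neighbour.

    Assumes exactly one high outlier, as with the 933-minute surgery time.
    """
    ordered = sorted(values)
    largest, neighbour = ordered[-1], ordered[-2]
    result = list(values)
    # Replace the outlier in place so the order of cases is preserved.
    result[result.index(largest)] = neighbour + unit
    return result


# Only 514 and 933 come from the text; the other values are invented.
surgery_minutes = [45, 120, 210, 514, 933]
print(winsorize_top(surgery_minutes))  # [45, 120, 210, 514, 515]
```

The outlying case stays in the data set and remains the largest value, but it no longer dominates the shape of the distribution.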
The variable Hospital Service was converted to three categorical variables, using dummy coding as illustrated by Draper and Smith (1981). One cross-product term included was that between Gender and Severity of Surgery.

The dependent variable, Healing Time, had a positive skewness of 2.07 (standard error of 0.15) and a kurtosis value of 5.11 (standard error of 0.30). To improve distributional properties this variable was logarithmically transformed. The values after transformation were 0.37 and -0.57 (standard errors of 0.15 and 0.30), respectively. Subsequent analysis was based on 258 cases, 30 independent variables, and one dependent variable.

Evaluation of Assumptions

Regression analysis was conducted to obtain the lens model equation. The dependent variable, Healing Time, was a continuous variable measured in days; the independent variables consisted of categorical and continuous variables. Fox (1984) identified the assumptions underlying the regression model as linearity, independence of residuals, normality of the distribution of residuals, and homoscedasticity (equality of residual variance at each level of the dependent variable). These assumptions were evaluated.

Variables were assessed for nonlinear relationships with Healing Time. Nonlinearity in the data would not invalidate the analysis, but such a relationship would not be captured by a linear model. Plots of the independent variables and Healing Time were inspected, and possible deviations from linearity were assessed by eta-squared (η²). Howell (1992) defined eta as the correlation coefficient associated with curvilinear regression, and illustrated that η² represented the proportion of variance accounted for in the dependent variable by an independent variable (including non-linear relationship). The variable Admission-To-OR-Time was found to have a non-linear relationship with Healing Time.
Patients who had emergency surgery, as well as patients who had surgery several days following admission, tended to experience delayed healing. This variable was re-coded to a dichotomy. Stenson (1974) pointed out that it is theoretically possible to make the nonlinear variance vanish by applying appropriate transformations to the cue-measures. Because nonlinearity is an important measure in phase 2 of the study, excessive variable transformation was viewed as unwise. In this ecology, the level of predictability was only moderate, and the re-coding of this one variable was thought to be advisable in order to better meet the assumptions of the linear model.

There is evidence that the relationship between body-mass index and healing time might be nonlinear: emaciated patients heal relatively slowly, patients of optimal weight tend to heal quickly, and obese patients heal much more slowly, other factors being equal. A comparison of R² and η² revealed that the distribution for Body Mass Index for this sample did not show significant nonlinearity. A possible reason for this finding is that patients with extremely low Body Mass Indices were too ill to be asked to volunteer for this study.

The second assumption underlying regression that was evaluated was the independence of residuals. All patients were independently (but not randomly) sampled from four surgical units in a large teaching hospital. Some of the patients likely had the same surgeon, which could give rise to correlated residuals. Data regarding surgeons were not obtained. Cooksey (1996) described a graphical method and a statistical method for examining the independence of residuals assumption. An assessment of the residual plot revealed no patterns which might suggest correlations. Cooksey claimed that the Durbin-Watson statistic is useful as a test of the extent to which successive residuals are correlated.
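As a rough illustration of the Durbin-Watson statistic mentioned above, the sketch below computes it directly from a sequence of residuals. It assumes the residuals are listed in the order in which the cases were collected. The function name and the example residuals are hypothetical and are not taken from the thesis data.

```python
import numpy as np


def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))


# Invented residuals that alternate in sign (negative autocorrelation).
print(durbin_watson([1, -1, 1, -1]))  # 3.0
```

The statistic ranges from 0 to 4. Values near 2 indicate little first-order autocorrelation, values toward 0 indicate positive autocorrelation, and values toward 4 indicate negative autocorrelation.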
In this data set, this statistic was 2.16; compared to the upper and lower critical levels extrapolated from Durbin and Watson (1951), this value of 2.16 indicated that there was no significant autocorrelation present. The assumption of independence was considered to be satisfied.

The third assumption assessed was normality. Normality of the distribution of residuals was assessed in several ways. First, the studentized residuals were compared to critical values from Lund (1975), and all were within range. Studentized residuals are a more sensitive index compared to standardized residuals (Stevens, 1984). Second, the normal probability plot of observed and expected standardized residuals was inspected. With Healing Time transformed, this plot revealed the desired linear pattern, with a slope of one. Finally, the Mahalanobis' distance scores were examined. This measure is the distance between two populations: one containing a possible outlier, and one with this particular case deleted (Kleinbaum, Kupper, & Muller, 1988). As the model fits outlying scores, excessively large residuals may result, some of which may violate the normality assumption (Rousseeuw & van Zomeren, 1990). Two multivariate outliers were revealed by Mahalanobis' distance. Not all outliers, however, are influential, and the automatic rejection of outliers is not recommended (Andrews & Pregibon, 1978). Cook's distance (C_i²) is a measure used to detect cases which have excessive influence (Cook, 1977, 1979; Cook & Weisberg, 1980; Hadi, 1993). In this data set, C_i² was within range. Because the two outlying cases met the inclusion criteria, and because of the danger of over-fitting a regression model if all outlying cases were removed, the two cases revealed as multivariate outliers were retained.

The final assumption related to regression that was assessed was homoscedasticity.
The plot of standardized residuals and predicted Healing Time demonstrated moderate heteroscedasticity. A logarithmic transformation of Healing Time led to good improvement.

Variables Predictive of Healing Time

To identify variables that best predicted Healing Time, a stepwise multiple regression was computed using the Regression program from the Statistical Package for the Social Sciences (SPSS). With an inclusion criterion of p = .05, the equation included seven variables; the R² was .59, and adjusted R² was .58. The seven variables were: Severity Of Surgery, Dressing Type, Complications, Body Mass Index, Admission-To-OR-Time, Drains, and Occupational Index.

In Table 4-01, the intercorrelations between these seven independent variables and Healing Time (logarithmically transformed) are reported. The two variables with the highest correlation with Healing Time were Severity of Surgery and Dressing Type. These variables have substantial shared variance as well. The sign of the variable Occupation was negative, indicating that people with low scores on the Blishen Index tended to have somewhat higher healing times, compared to people with high scores. Low scores for Occupation reflect low socioeconomic status, and all that is entailed: the likelihood for poorer nutrition, less education, less desirable occupation, and possibly a less healthy life style, compared to those with high scores.

Table 4-01
Intercorrelations Between Significant Predictors and Log Transformation of Healing Time

Variables           1      2      3      4      5      6      7      8
1 Severity          -
2 Dressing         .33*    -
3 Complications    .23*   .21*    -
4 BMIndex          .01    .11    .07     -
5 Admission        .26*   .13*   .15*   .13*    -
6 Drains           .29*   .01    .07   -.07    .04     -
7 Occupation       .09    .03   -.04   -.04   -.15*   .16*    -
8 H-Time           .55*   .54*   .43*   .29*   .31*   .21*  -.06     -

Note.
1 = Severity of Surgery (Severity)
2 = Dressing Type (Dressing)
3 = Complications
4 = Body Mass Index (BMIndex)
5 = Admission-to-OR Time (Admission)
6 = Drains
7 = Occupational Index (Occupation)
8 = Healing Time, in Log Transformation of Days (H-Time)
Correlations were based on data from 258 surgical patients.
*p < .05.

The product term Gender-Severity was not significant. The correlation between gender and severity of surgery (r = -.40) was interpreted as an artifact of the particular sample of patients selected. One setting was a gynecology ward where a large number of women (coded as 1) had had minor surgery.

Relative Importance of Variables

Lane, Murphy, and Marques (1982) identified three indices of cue importance between a cue and a criterion: the unstandardized and standardized regression coefficients, and the squared semi-partial correlation coefficients. These indices of variable importance were derived from a stepwise regression using the logarithm of Healing Time in days as the dependent variable. See Table 4-02.

Table 4-02
Indices of Variable Importance for Significant Predictors of Healing Time

                             Indices of Variable Importance (a)
Significant Predictors         b       β      sr²
1 Severity of Surgery         .21     .32     .08
2 Dressing Type               .35     .35     .10
3 Complications               .15     .23     .05
4 Body Mass Index             .01     .22     .05
5 Admission-To-OR-Time        .08     .10     .01
6 Drains                      .08     .12     .01
7 Occupational Index         -.00    -.09     .01

Note.
b = Unstandardized Regression Coefficient
β = Standardized Regression Coefficient
sr² = Squared Semi-partial Correlation Coefficient
(a) Indices were based on Stepwise Regression using data from 258 patients.

Dressing Type, followed closely by Severity of Surgery, was the best predictor, using both the standardized beta weights and the squared semi-partial correlation coefficients as criteria.
The last three variables (Admission-to-OR-Time, Drains, and Occupational Index), although statistically significant, accounted for negligible variance in healing time.

In summary of this section pertaining to the left side of the lens model, the results are consistent with the literature (Cooper, 1990a, 1990b; Cruse & Foord, 1973, 1980; Irvin, 1981; P. L. Jones & Millman, 1990; Viljanto, 1991). Variables such as Severity of Surgery and Complications were found to be important predictors of Healing Time. Approximately 59% of the variance in healing time for one sample of patients was accounted for by seven variables. A task requiring judgments of healing time was judged to have sufficient predictability to be useful to reveal differences in nurses' sensitivity to overall patterns in data. In addition, there was considerable unexplained variance, which was expected to give subjects with expertise the possibility of attaining scores exceeding those obtained from a linear model.

Section B: Results Related to Indicators of Expertise [Phase 2 of the Study]

Description of Nurse Subjects

Thirty-six subjects volunteered for this study. Of these subjects, all but 1 were women; gender was not a factor being investigated. Six subjects were senior nursing students, 6 were recent graduates with work experience of 2 months or less, and 24 were Registered Nurses who had a range of work experience from 1 year to 25 years. In terms of general education, 15 subjects had high school as the highest level attained, 3 had college diplomas, 2 had baccalaureate degrees, and 1 had a master's degree (not in nursing); a further 15 subjects had some college or university courses. For analysis purposes, subjects were classified into one of two categories: those with basic high school preparation only, and those with courses or degrees beyond high school.
With reference to the highest level of nursing education attained, 6 subjects were students from nursing programs who were within one month of graduation, 27 had diplomas, 1 had a baccalaureate degree in nursing, and 2 had a master's degree in nursing. With nursing education, subjects were classified into two groups: the students and nurses with a diploma in one group, and the nurses with more education in the other.

Experience assessed by the number of years since nursing graduation ranged from 0 to 25. The number of months that subjects cared for surgical patients ranged from 2 to 300. The use of subjects' estimates of the total number of surgical incisions viewed during their nursing career as an indicator of experience resulted in a variable with a range from 20 to 20,000, with a median of 2640. The indicator of experience used for subsequent analysis was the number of months of surgical nursing experience reported. Subjects with 2 months experience or less were categorized as level 1; those with greater than 2 months to 60 months experience comprised level 2; and subjects with more than 60 months of surgical experience were classified as level 3. This categorization resulted in three groups of 12 subjects each. In Table 4-03, the number of subjects categorized in three different ways (by age category, general education, and nursing education) is reported, by experience level.

Table 4-03
Classifications of Subjects According to Experience Level, Age Category, and Education

                    Number of Subjects in Age Categories              Number of Subjects in Relation to Education
                                                                      General             Nursing
Experience Level    <25 years   25-34 years   35-44 years   45+ years   HS     Extra      Basic    Extra
1 (n=12)                5            7             0            0        4       8          12       0
2 (n=12)                1            4             6            1        3       9           8       4
3 (n=12)                0            4             6            2        8       4           8       4

Note. HS = High School

As anticipated, inexperienced subjects tended to be younger, compared to experienced subjects; although these novices had basic nursing education only, about two thirds of them had extra general education. In contrast, only one third of the experienced group had extra general education. The middle group was the most heterogeneous, with a greater number of subjects with extra general and nursing education. These subjects also had the largest range in age.

Overview of Data Analysis

The data were initially analyzed using ANOVA or MANOVA to assess for a significant effect of experience on the various indicators of judgment performance. As anticipated, because of the low power associated with the sample size, very few of the results were statistically significant. The goal of this study was to assess for patterns of relations. Chow (1988) stated that "the statistical significance of a set of data is not informative about the practical importance (or substantive significance) of the findings" (p. 106). Schmidt (1992, 1996) and Cohen (1990) recommended the reporting of effect sizes. Snyder and Lawson (1993) and Tatsuko (1992) described several measures of magnitude-of-effect, including eta-squared (η²), which is a measure of association strength. Eta-squared is the proportion of variance in the dependent variable accounted for by the particular variable in question. Fowler (1985) claimed that η² was the least biased estimate of the population correlation coefficient and was of greatest interest to behavioral researchers. Thompson (1987) also recommended this measure. Eta-squared, therefore, was selected because the central question in this study was to assess the strength of relationship between experience and judgment performance.
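Eta-squared as described above, the proportion of variance in a dependent variable accounted for by a grouping factor, can be sketched for the simple one-way case as the between-groups sum of squares divided by the total sum of squares. The example below is a minimal sketch only: it assumes a one-way design with independent groups, the function name is hypothetical, and the scores are invented rather than taken from the study.

```python
import numpy as np


def eta_squared(groups):
    """Between-groups sum of squares divided by the total sum of squares."""
    arrays = [np.asarray(g, dtype=float) for g in groups]
    all_scores = np.concatenate(arrays)
    grand_mean = all_scores.mean()
    ss_total = np.sum((all_scores - grand_mean) ** 2)
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in arrays)
    return float(ss_between / ss_total)


# Invented scores for three experience levels.
level_1 = [1, 2, 3]
level_2 = [2, 3, 4]
level_3 = [3, 4, 5]
print(eta_squared([level_1, level_2, level_3]))  # 0.5
```

In this invented example half of the total variation in scores lies between the three groups, so η² is .50. The values reported in this chapter are far smaller.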
Cohen (1973) suggested η² for use with ANOVA designs; the formula he provided is as follows, where SS_A is the between sums of squares for factor A [which, in this study, is experience] and SS_T is the total sums of squares:

$$\eta^2 = \frac{SS_A}{SS_T} \tag{6}$$

Many researchers recommended the reporting of confidence intervals in addition to some measure of effect (Cohen, 1994; Fowler, 1985; Schmidt, 1992, 1996). Frick (1995), however, identified some difficulties in the interpretation of 95% confidence levels and questioned their use. Because the effect sizes in this study were very small (which gave rise to large confidence intervals having extremely limited usefulness), confidence intervals were not reported.

Research Question 1 is: What are the patterns of relationships among various measures of judgment performance (indicators of expertise) and experience for a group of subjects in a clinical judgment task? The indicators of expertise considered in this study were judgment accuracy, confidence, consistency, calibration of confidence, judgment latency, and knowledge accessibility.

Judgment Accuracy

Judgment accuracy was determined from the healing time judgment task. In each set of cases judgment accuracy was assessed in two ways: by the square root of the mean error variance, and by the percent of correct judgments. The interval measure of healing time as described in chapter 3 was used for both measures. For normal order cases (cases presented in natural order with a paragraph containing context cues first, followed by specific cues), accuracy assessed by error variance ranged from 77.00 to 91.70; accuracy assessed by percent correct ranged from 10.00% to 57.50%.
For reverse order cases (cases presented with a paragraph of specific cues first, followed by context cues), accuracy assessed by error variance ranged from 77.00 to 93.70; accuracy assessed by percent correct ranged from 10.00% to 62.50%. In Table 4-04 the means and standard deviations for measures of accuracy are presented, by experience level.

Table 4-04
Means and Standard Deviations for Accuracy Assessed by Error Variance and Percent Correct for Normal Order and Reverse Order Cases, by Experience Level

                    Normal Order Cases (n=35)                Reverse Order Cases (n=35)
Experience Level    Mean Error Variance   Mean % Correct     Mean Error Variance   Mean % Correct
                    (SD)                  (SD)               (SD)                  (SD)
1 (n=12)            87.59 (3.65)          40.17 (12.55)      88.88 (3.05)          42.67 (14.74)
2 (n=12)            86.93 (3.30)          38.75 (9.80)       88.20 (4.47)          36.50 (7.49)
3 (n=12)            87.99 (2.03)          45.08 (9.92)       88.97 (2.41)          42.42 (9.52)

Note. SD = Standard Deviation. Error Variance Measure has been transformed; high scores indicate high accuracy.

Experience accounted for slightly more variance in performance using the percent correct measure, compared to the error variance measure (η² = .07 and η² = .02, respectively). The proportion of variance associated with the within-subject factor (normal and reverse order cases) was much smaller using the proportion correct measure compared to the error variance measure (η² = .01 and η² = .18, respectively). Although a definite pattern in the scores could not be identified, the middle group had the lowest scores in both normal and reverse order cases. The standard deviations showed greater variability for the novices, compared to the other two groups.

Confidence

As described in chapter 3, judgment confidence was assessed in two ways: by a concurrent method, and by a retrospective method.
In the concurrent method, immediately after each judgment of healing time, each subject indicated his or her confidence (in percent) that the preceding judgment of healing time was within the interval measure of actual healing time. In the retrospective method, following the set of 40 judgments, each subject reported the number of judgments that he or she estimated to be within the interval measure of healing time. Means and standard deviations for concurrent confidence and retrospective confidence are reported in Table 4-05, by experience level.

Table 4-05
Means and Standard Deviations for Average Confidence (Concurrent Method) and Overall Confidence (Retrospective Method) for Normal Order and Reverse Order Cases, by Experience Level

                    Normal Order Cases (n=40) (a)               Reverse Order Cases (n=40) (a)
Experience Level    Concurrent Method   Retrospective Method    Concurrent Method   Retrospective Method
                    Mean (SD)           Mean (SD)               Mean (SD)           Mean (SD)
1 (n=12)            60.83 (16.43)       47.92 (18.64)           56.42 (17.34)       44.83 (19.08)
2 (n=12)            68.33 (16.05)       60.50 (20.24)           64.83 (16.11)       53.25 (18.64)
3 (n=12)            75.00 (11.86)       57.55 (17.10)           69.25 (20.10)       54.82 (19.01)

Note. SD = Standard Deviation
(a) Repeated cases were included in the assessment of confidence.

Data from Table 4-05 revealed that experience accounted for a small proportion of the variance in confidence (η² = .12 with the concurrent method; η² = .07 with the retrospective method). There was a pattern for confidence to be somewhat higher in the normal order cases, compared to the reverse order cases.

Judgment Consistency

As described in chapter 3, for each subject, consistency was assessed for both judgments of healing time and judgments of confidence in normal order and reverse order cases. Consistency measures indicated by the correlations for the corresponding sets of five repeated judgments are presented. Nonparametric summaries of central tendency and variability are reported in Table 4-06.
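The consistency index described above can be sketched as a correlation between a subject's first and repeated judgments on the five repeated cases, followed by a median and interquartile range across subjects. This is a minimal sketch under stated assumptions: the text does not name the correlation coefficient, so Pearson's r is assumed; the quartiles use NumPy's default linear interpolation, which may differ from the method used in the study; the function names and all data are hypothetical.

```python
import numpy as np


def consistency(first, repeat):
    """Pearson correlation between first and repeated judgments (assumed index)."""
    return float(np.corrcoef(first, repeat)[0, 1])


def median_and_iqr(values):
    """Median and interquartile range of per-subject consistency scores."""
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    return float(median), float(q3 - q1)


# Invented healing-time judgments (in days) for five repeated cases.
first_judgments = [10, 14, 21, 7, 30]
repeat_judgments = [12, 14, 20, 8, 28]
r = consistency(first_judgments, repeat_judgments)

# Invented consistency scores for a small group of subjects.
group_median, group_iqr = median_and_iqr([0.1, 0.2, 0.3, 0.4, 0.5])
```

With only five repeated cases per subject, each correlation rests on very few points, which is one reason a median and interquartile range are a safer group summary than a mean and standard deviation.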
Table 4-06
Nonparametric Summaries of Consistency for Repeated Judgments of Healing Time and Repeated Judgments of Confidence (Concurrent) for Normal Order Cases and Reverse Order Cases, by Experience Level

Correlations Between Repeated Judgments of Healing Time and Repeated Judgments of Confidence

                    Repeated Normal Order Cases (n=5)     Repeated Reverse Order Cases (n=5)
Experience Level    Healing Time      Confidence          Healing Time      Confidence
                    Mdn (IQR)         Mdn (IQR)           Mdn (IQR)         Mdn (IQR)
1 (n=12)            .83 (.21)         .55 (.60)           .88 (.17)         .53 (.62)
2 (n=12)            .92 (.22)         .26 (.79)           .95 (.12)         .22 (.78)
3 (n=12)            .93 (.15)         .80 (.41)           .97 (.05)         .73 (.45)

Note. Mdn = Median; IQR = Interquartile Range

Overall, there was good consistency for the judgments of Healing Time. There was a pattern for consistency to increase slightly, and for the variability in consistency to decrease, with experience. The medians for consistency of judgments of Healing Time tended to be slightly higher in the reverse order than in the normal order cases. Consistency in judgments of confidence, however, was extremely variable. As shown by the large interquartile ranges, some of the correlations were negative. The middle group had the most diverse correlations. The most experienced subjects had the best consistency in their confidence judgments, compared to other subjects, but this level of consistency was not nearly as high as that obtained for their judgments of healing time. Medians for consistency of confidence tended to be slightly higher in the normal order cases than in the reverse order cases.

Calibration of Confidence

There are many ways to analyze the quality of confidence judgments: Brier (1950) proposed what is now known as the "Brier score"; Oskamp (1962) developed the "appropriateness of confidence"; G. Wright (1982) referred to "realism of confidence". These measures are all related to calibration.
As discussed in chapter 3, calibration is a measure of how well the level of confidence in one's judgment corresponds to the proportion of correct judgments, within a particular range of confidence. For example, considering the range of confidence between .7 and .8, for a perfectly calibrated judge, about 75% of the judgments in that range should, in fact, be correct.

Confidence calibration was assessed by comparing confidence and the proportion measure of accuracy in three ways. The first method was the traditional method used by Lichtenstein and Fischhoff (1977), G. Wright (1982), and others; it was included to allow comparison with previous research. The equation for concurrent judgments of confidence was:

$$C = \frac{1}{N} \sum_{t=1}^{T} n_t (x_t - c_t)^2 \qquad (7)$$

Where:
$C$ = Calibration
$N$ = Total number of judgments made
$T$ = Total number of different response categories
$n_t$ = Number of judgments in response category $t$
$x_t$ = Reported confidence of each judgment in category $t$
$c_t$ = Proportion correct for items with confidence $x_t$.

With this method, calibration can be expressed as the sum of the squared differences between subjects' confidence (expressed as a probability) and the proportion correct within each confidence category, weighted by the number of judgments in the category and divided by the total number of judgments.

The second method of obtaining calibration [Bjorkman's (1994) method] also used concurrent ratings of confidence that had a one-to-one correspondence to each judgment made. Bjorkman decomposed calibration (C) into three additive parts: bias, the squared difference between mean confidence and proportion correct (D²); resolution, the squared difference between the standard deviations of confidence judgments and proportion correct (R²); and deviation from linearity, the departure of the calibration curve from a linear function (L).
The equation used was:

$$C = (\bar{x} - \bar{c})^2 + (s_x - s_c)^2 + 2 s_x s_c (1 - r_{xc}) \qquad (8)$$

Where:
$C$ = Calibration
$\bar{x}$ = Overall mean confidence
$x$ = Reported confidence of each judgment
$\bar{c}$ = Overall proportion of correct responses
$c$ = Judgment accuracy, where judgments were scored dichotomously
$s_x$ = Standard deviation of mean confidence scores over confidence intervals
$s_c$ = Standard deviation of proportion correct within confidence intervals, over confidence intervals
$r_{xc}$ = Correlation of reported confidence and judgment accuracy, where the latter was scored dichotomously.

The simplified equation for calibration from Bjorkman (1994) is:

$$C = D^2 + R^2 + L \qquad (9)$$

The first term in equations 8 and 9 is bias (D). The most common bias that has been reported in the literature is that of overconfidence. Overconfidence is demonstrated when mean confidence exceeds the mean proportion of correct judgments, whereas the reverse shows underconfidence. Overconfidence has been a robust finding (Fischhoff et al., 1977; Koehler, 1994; Zakay & Glickson, 1992). Whether overconfidence represents a cognitive bias, however, or is to some extent an artifact of the method used to select items for study is currently being debated (Bjorkman, 1992, 1994; Juslin, 1994). Test items which have not been selected representatively may give rise to overconfidence. In this study, the use of representative sampling may provide a more accurate assessment of bias.

The second term in equations 8 and 9 is resolution (R), which reflects the subject's ability to discriminate between correct and incorrect cases. The third term is deviation from linearity (L), which describes how closely confidence judgments would fit a line drawn to represent expected proportion correct, if calibration were perfect. Ideally, this deviation is zero.
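The two calibration formulas can be illustrated with a minimal Python sketch. The function names and the ten-judgment data set are hypothetical and are not taken from the study. As an assumption, the three-part decomposition is applied here to pairs of (reported confidence, proportion correct in that confidence category); under that reading the three terms sum exactly to the traditional score. The different values reported for the two methods in Table 4-07 suggest that the study's own computation used somewhat different inputs, so this sketch shows only the algebra.

```python
import numpy as np

def category_proportions(confidence, correct):
    """For each judgment, the proportion correct within its confidence category."""
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=float)
    props = np.empty_like(confidence)
    for x_t in np.unique(confidence):
        in_category = confidence == x_t
        props[in_category] = correct[in_category].mean()
    return props

def traditional_calibration(confidence, correct):
    """(1/N) * sum over categories of n_t * (x_t - c_t)^2."""
    confidence = np.asarray(confidence, dtype=float)
    c_t = category_proportions(confidence, correct)
    return float(np.mean((confidence - c_t) ** 2))

def three_part_decomposition(x, c):
    """Bias, resolution, and linearity terms for paired values x and c."""
    x = np.asarray(x, dtype=float)
    c = np.asarray(c, dtype=float)
    s_x, s_c = x.std(), c.std()          # population standard deviations
    r_xc = np.corrcoef(x, c)[0, 1]
    bias = (x.mean() - c.mean()) ** 2
    resolution = (s_x - s_c) ** 2
    linearity = 2 * s_x * s_c * (1 - r_xc)
    return float(bias), float(resolution), float(linearity)

# Hypothetical data: confidence as a probability, accuracy scored 0/1.
confidence = [0.6, 0.6, 0.6, 0.6, 0.8, 0.8, 0.8, 0.8, 1.0, 1.0]
correct = [1, 0, 1, 0, 1, 1, 1, 0, 1, 1]

calibration = traditional_calibration(confidence, correct)
parts = three_part_decomposition(confidence, category_proportions(confidence, correct))
print(calibration, sum(parts))
```

Lower scores indicate better calibration; a score of zero would mean that confidence matched the proportion correct in every category.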
In Table 4-07, the means and standard deviations for judgment calibration assessed by the traditional method and by Bjorkman's method are presented, for normal order and reverse order cases.

Table 4-07
Means and Standard Deviations for Two Methods of Assessing Judgment Calibration for Normal and Reverse Order Cases, by Experience Level

Judgment Calibration

                     Normal Order Cases (n=40)a         Reverse Order Cases (n=40)a
Experience Level     Traditional      Bjorkman          Traditional      Bjorkman
                     Mean (SD)        Mean (SD)         Mean (SD)        Mean (SD)
1 (n=12)             .09 (.07)        .12 (.08)         .10 (.10)        .13 (.10)
2 (n=12)             .12 (.10)        .14 (.10)         .13 (.10)        .15 (.10)
3 (n=12)             .14 (.09)        .14 (.05)         .15 (.07)        .15 (.07)

Note. For both methods, higher scores indicate lower calibration. SD = Standard Deviation.
a Repeated cases were included in the assessment of Judgment Calibration.

These methods both employ concurrent assessments of confidence. Responses from both methods were similar. There was a very weak relationship between experience and calibration (η² = .06 for the traditional method and η² = .02 for Bjorkman's method). An examination of the means suggested that calibration became somewhat worse with experience.

A third method of examining the quality of confidence judgments was based on the work of Gigerenzer et al. (1991). These authors proposed the hypothesis that the robust finding of overconfidence in the literature arises not only from the item selection procedure (items that are not representative of a reference class) but also from the conversion of confidence estimates into probabilities. Gigerenzer and colleagues argued that there is a psychological distinction between assessing confidence concurrently and making a retrospective estimate of the percent of correct judgments. The method that Gigerenzer et al.
(1991) recommended for assessing calibration consisted of computing the difference between the retrospective estimate (in percent) and the actual percent of correct judgments for each subject.

In Table 4-08, the means and standard deviations for judgment quality assessed by two different methods are reported. Method 1 was the difference between the mean of the concurrent confidence judgments and the percent of correct judgments. Method 2 was the difference between the retrospective estimate of the percent of correct judgments and the percent of correct judgments, as proposed by Gigerenzer and colleagues.

Table 4-08
Means and Standard Deviations for Two Methods of Assessing Judgment Quality for Normal and Reverse Order Cases, by Experience Level

Assessment of Judgment Quality

                     Normal Order Cases (n=40)a         Reverse Order Cases (n=40)a
Experience Level     Method 1b        Method 2c         Method 1b        Method 2c
                     Mean (SD)        Mean (SD)         Mean (SD)        Mean (SD)
1 (n=12)             20.67 (19.89)     7.75 (21.55)     13.75 (23.43)     2.17 (28.59)
2 (n=12)             29.58 (14.49)    21.75 (16.89)     28.33 (17.25)    16.75 (20.20)
3 (n=12)             29.92 (15.61)    12.27 (18.99)     26.83 (21.78)    14.00 (21.71)

Note. SD = Standard Deviation. For both methods, high scores indicate overconfidence.
a Repeated cases were included in the assessment of judgment quality.
b Method 1 = Average of 40 concurrent confidence judgments (in percent) minus percent of correct judgments.
c Method 2 = Retrospective assessment of percent of correct judgments minus percent of correct judgments.

The pattern of subjects' scores revealed that there was a weak relationship between experience and judgment quality (η² = .10 and η² = .08 for method 1 and method 2, respectively). The retrospective method revealed a similar pattern with experience, but showed somewhat less overconfidence.
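The two difference scores defined above can be shown with a short Python sketch. The function name and the numbers are hypothetical; they illustrate the arithmetic only and are not data from the study.

```python
def overconfidence_scores(concurrent_confidence, retrospective_percent, correct):
    """Return (method_1, method_2) difference scores, in percentage points.

    concurrent_confidence: one confidence rating (in percent) per judgment
    retrospective_percent: the subject's single overall estimate of percent correct
    correct: one 0/1 accuracy score per judgment
    """
    percent_correct = 100.0 * sum(correct) / len(correct)
    mean_confidence = sum(concurrent_confidence) / len(concurrent_confidence)
    method_1 = mean_confidence - percent_correct          # concurrent method
    method_2 = retrospective_percent - percent_correct    # retrospective method
    return method_1, method_2

# Hypothetical subject: 40 judgments, 16 of them correct (40%).
correct = [1] * 16 + [0] * 24
concurrent = [60] * 20 + [70] * 20      # mean concurrent confidence of 65%
method_1, method_2 = overconfidence_scores(concurrent, 50.0, correct)
print(method_1, method_2)
```

Positive scores indicate overconfidence and negative scores indicate underconfidence. In this hypothetical case the retrospective score is smaller than the concurrent score, which is the direction of the pattern described above.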
To demonstrate subjects' ability to discriminate their level of confidence, subjects' mean confidence was calculated separately for correct and incorrect judgments, and their confidence-accuracy correlations were computed. In Table 4-09, these means and standard deviations for confidence for correct and incorrect judgments and for confidence-accuracy correlations for normal and for reverse order cases are reported, by experience level.

[Table 4-09: table body not recoverable from the source text.]

As described by Howell (1992), prior to addition of the correlations, a Fisher transformation (r to r') was carried out; the means were then transformed back to the correlational scale.
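The averaging procedure described above (transform each correlation, average, then transform back) can be sketched in Python as follows. The function name and the three example correlations are hypothetical.

```python
import math

def mean_correlation(correlations):
    """Average correlations on the Fisher scale, then convert back to r."""
    # Fisher transformation: r' = atanh(r) = 0.5 * ln((1 + r) / (1 - r))
    transformed = [math.atanh(r) for r in correlations]
    mean_transformed = sum(transformed) / len(transformed)
    # Inverse transformation back to the correlational scale
    return math.tanh(mean_transformed)

example = [0.10, 0.50, 0.90]
fisher_mean = mean_correlation(example)
arithmetic_mean = sum(example) / len(example)
print(round(fisher_mean, 3), round(arithmetic_mean, 3))
```

The Fisher-based mean differs from the simple arithmetic mean because the transformation stretches the scale near the extremes, which gives large correlations more influence than they have when averaged directly.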
The proportion of variance associated with the within-subject factor (correct and incorrect cases) was substantial (η² = .51 in the normal order cases and η² = .25 in the reverse order cases). It was anticipated that experienced subjects would be better able to differentiate (in terms of confidence) between correct and incorrect cases. The proportion of variance associated with the interaction between experience and the within-subject factor (confidence in correct and incorrect cases), however, was small (η² = .06 for normal order cases, and η² = .12 for reverse order cases), and in the opposite direction to what was anticipated. The mean judgment-accuracy correlations were low, particularly in the reverse order cases. The standard deviations revealed that variation in confidence-accuracy correlations was not well captured by experience level.

Judgment Latency

For each subject, a preliminary average judgment latency was obtained for each set of cases by omitting the first five judgments from the average. Subjects were learning the task during these initial judgments, and the latency was elevated. Each subject's mean latency for that set was substituted for the latency as timed for these five judgments. The latency used in the analysis was an average for the 35 judgments. Latencies for repeated judgments were not included.

The sequence in which the two sets of cases were presented made a difference to subjects' latency. Case sequence 1 was the presentation of the normal order cases first, followed by the reverse order cases. Case sequence 2 was the reverse. In Table 4-10, the means and standard deviations for average latency are reported for normal order and reverse order cases, by sequence condition and by experience level.
Table 4-10
Means and Standard Deviations for Latency (in seconds) Averaged Over Cases for Normal Order and Reverse Order Cases, by Case Sequence and by Experience Level

                     Latency for Normal Order Cases (n=35)      Latency for Reverse Order Cases (n=35)
Experience Level     Case Sequence 1a     Case Sequence 2b      Case Sequence 1a     Case Sequence 2b
                     Mean (SD) nc         Mean (SD) nc          Mean (SD) nc         Mean (SD) nc
1 (n=12)             28.14 (8.30) n=7     24.00 (6.28) n=5      22.00 (6.38) n=7     24.00 (5.00) n=5
2 (n=12)             22.83 (4.17) n=6     21.83 (5.04) n=6      17.00 (2.45) n=6     22.83 (4.54) n=6
3 (n=12)             22.60 (6.54) n=5     19.00 (4.24) n=7      17.60 (6.43) n=5     21.29 (4.82) n=7

Note. SD = Standard Deviation.
a Case Sequence 1: Subjects completed normal order cases, then reverse order cases.
b Case Sequence 2: Subjects completed reverse order cases, then normal order cases.
c Number of subjects in each Case Sequence.

Experience accounted for a small proportion of variance in latency (η² = .16). The effect of sequence condition was complex: the direct effect was negligible, but the interaction between sequence condition and the within-subject factor (normal and reverse order cases) was considerable (η² = .48). Average judgment latency decreased somewhat for the set of cases presented second, particularly for subjects assigned to case sequence 1; the variance associated with the effect of the within-subject factor (latency for normal order and reverse order cases) was notable (η² = .30).

Knowledge Accessibility

The purpose of the slide task related to the Knowledge Accessibility scale is reported here; the second purpose (memory priming) is reported later in section D. Subjects' tape recordings of verbalizations for three slides showing close-up views of abdominal incisions were transcribed and analyzed according to the framework shown in Appendix D.
The categories included in this scale were perceptual knowledge, specific item knowledge, relational knowledge, and contextual knowledge. In Table 4-11, the means and standard deviations for the number of references from each sub-scale are reported, by experience level.

Table 4-11
Means and Standard Deviations for Knowledge Accessibility Scale, by Experience Level

Knowledge Accessibility Scale

Experience Level     Perceptual       Item             Relational       Contextual
                     Knowledge        Knowledge        Knowledge        Knowledge
                     Mean (SD)        Mean (SD)        Mean (SD)        Mean (SD)
1 (n=12)             2.75 (1.22)      3.17 (1.64)      12.17 (4.57)     2.17 (1.80)
2 (n=12)             2.67 (1.50)      3.75 (1.48)      12.67 (3.47)     1.83 (1.59)
3 (n=12)             3.50 (1.93)      5.08 (2.07)      12.25 (4.14)     3.58 (2.47)

Note. SD = Standard Deviation.

There was a pattern for knowledge accessibility to increase with experience, with the exception of relational knowledge. The proportion of variance accounted for by experience in the total scores was smaller than what was anticipated (η² = .07).

In summary of this section, intercorrelations for experience and performance measures that were considered as possible indicators of judgment expertise are reported in Tables 4-12(a) and 4-12(b).

Table 4-12(a)
Intercorrelation Matrix for Experience and Indicators of Expertise Based on Performance on Normal Order Cases

Variables     1       2       3       4       5       6       7       8
1             -
2            .03      -
3            .32     .53**    -
4            .23     .14     .23      -
5            .08     .15    -.21     .72**    -
6           -.12    -.16    -.05     .03    -.09      -
7            .21     .25     .25     .01     .10    -.29      -
8            .16     .06     .13     .44     .30    -.10     .00      -

Note. 1 = Experience, in Months; 2 = Accuracy, Error Variance Measure; 3 = Accuracy, Percent Correct Measure; 4 = Mean Confidence; 5 = Calibration (Traditional Method); 6 = Mean Latency; 7 = Knowledge Accessibility; 8 = Consistency of Healing Time Judgments. Correlations were based on 36 subjects.
**p < .01.

Table 4-12(b)
Intercorrelation Matrix for Experience and Indicators of Expertise Based on Performance on Reverse Order Cases

Variables     1       2       3       4       5       6       7       8
1             -
2           -.03      -
3            .09     .63**    -
4            .24    -.11    -.01      -
5            .05    -.42*   -.53**   .68**    -
6            .01     .13     .11     .13     .04      -
7            .21     .12     .31    -.02    -.22    -.15      -
8            .20    -.01     .12     .27     .23     .12     .33      -

Note. 1 = Experience, in Months; 2 = Accuracy, Error Variance Measure; 3 = Accuracy, Percent Correct Measure; 4 = Mean Confidence; 5 = Calibration (Traditional Method); 6 = Mean Latency; 7 = Knowledge Accessibility; 8 = Consistency of Healing Time Judgments. Correlations were based on 36 subjects.
*p < .05; **p < .01.

Several patterns of relations, all of very small magnitude, were evident. In both sets of cases, confidence and experience were positively related. The correlation between experience and accuracy was slightly higher in the normal order cases than in the reverse order cases. Knowledge accessibility demonstrated a weak relationship with judgment accuracy in both sets of cases. As anticipated, knowledge accessibility had a negative relationship with latency. Contrary to expectation, consistency of judgments of healing time was not related to judgment accuracy, and had little association with experience.

Section C: Results Related to Cognitive Constructs and Individual Differences

Research question 2 is: What are the patterns of relationships among measures of conceptual structure, sensitivity to patterns in data, judgment process, and performance in a clinical judgment task?

Conceptual Structure

Two tasks were used to obtain data about conceptual structure. The initial one was an association task where subjects were encouraged to talk about their experiences with surgical patients; the purpose was to identify the way in which subjects verbalized their experiences, prior to being influenced by the other tasks included in the study. In the second task, subjects made similarity judgments for pairs of concepts. As a means to reveal information about the organization of concepts, multidimensional scaling (MDS) was carried out on the similarity judgments.
Association task. The link between conceptual coherence and conceptual structure has been documented by Medin (1989), Medin and Edelson (1988), and Murphy and Medin (1985). Subjects were tape-recorded as they responded to three brief case situations. The verbalizations were transcribed and coded according to the framework entitled Conceptual Coherence scale, available in Appendix E. This scale has four sub-scales that were labelled as follows: conceptual familiarity, conceptual boundaries, conceptual links, and conceptual structure. For three of these sub-scales, high scores represented greater coherence, and for one sub-scale (subjects' references to lack of knowledge or experience), high scores reflected less coherence. For ease in interpretation and to achieve consistency with the other sub-scales, the polarity of this latter sub-scale was reversed. In Table 4-13, the means and standard deviations for the number of references coded in each category, by experience level, are displayed.

Table 4-13
Means and Standard Deviations for Conceptual Coherence Scale from Protocol Data, by Experience Level

Conceptual Coherence Scale

Experience Level     Conceptual       Conceptual       Conceptual       Conceptual
                     Familiarity      Boundariesa      Connections      Structure
                     Mean (SD)        Mean (SD)        Mean (SD)        Mean (SD)
1 (n=12)              9.58 (3.78)      9.58 (2.57)     6.33 (3.06)      2.92 (1.51)
2 (n=12)             11.67 (6.05)     10.83 (1.47)     7.58 (3.65)      3.17 (1.85)
3 (n=12)             16.50 (6.59)     11.08 (1.24)     9.67 (4.46)      7.33 (2.99)

Note. SD = Standard Deviation.
a Scores on this subscale have been transformed; high scores indicate greater coherence.

For three of the measures of conceptual coherence derived from the protocol data (conceptual familiarity, conceptual connections, and conceptual structure), the effect of experience was notable. The second sub-scale (conceptual boundaries) showed only a slight increase with experience.
Using subjects' total scores for this scale, experience level accounted for a substantial proportion of variance in conceptual coherence (η² = .46).

Concept similarity judgment task. To determine how subjects structured particular concepts, MDS was carried out using similarity judgments for all unique pairs of 16 concepts selected for study. Gonzalvo, Canas, and Bajo (1994) found that MDS captured the global changes in structural representations of knowledge with learning. Wilkinson, Gimbel, and Koepke (1982) used MDS to graph symptoms of illness. Studies carried out by these groups of researchers provided rationales and guidelines for this method.

Each subject's conceptual structure (as revealed through MDS of concepts relevant to healing) was examined. The pairs of concepts for this similarity judgment task were displayed by personal computer. Subjects entered numbers from 0 to 9 expressing judgments of similarity. Using the SPSS Alscal program, data were analyzed at the ordinal level, with the program allowed to "untie" ties.

Basically, the aim of MDS is to optimize the fit between any configuration and the data; the stress of the best-fitting configuration is a measure of goodness of fit (Kruskal, 1964). Mathematically, the stress is the square root of the normalized residual sum of squares. Small values of stress indicate a good fit. The Alscal algorithm was used to compute a fit measure called S-Stress, which has two variations, S-Stress formula 1 and S-Stress formula 2. These formulae are available in Davison (1983). As recommended by Kruskal and Wish (1978) and Davison (1983), S-Stress formula 1 was used because the data were similarities rather than preferences. Each subject's similarity matrix was multidimensionally scaled using the unweighted Euclidean model.
The unweighted model is the most commonly used spatial distance model, and was selected because the concepts were considered to be unitary; Weinberg (1991) recommended a Euclidean metric for such stimuli. Other models are more suited to stimuli that can be readily decomposed into constituent attributes, for example, separable features such as size and shape.

To make the decision of dimensionality, several steps were followed. First, solutions were derived in one, two, three, and four dimensions. One dimension was included because Shepard (1974) pointed out that there was a tendency for investigators using MDS to extract more dimensions than were necessary. In this study, the assessment of four dimensions was considered the upper limit; this decision was based on the criterion proposed by Kruskal and Wish (1978) that the ratio of the number of stimuli to the number of dimensions should be at least 4:1. For each subject, the stress was plotted as a function of dimensionality. If error were minimal, plots should show a bend (or "elbow") at a dimensionality that is considered to be the maximum (Schiffman et al., 1981). A slight bend either at two or three dimensions was observed in the plots for 10 subjects; no bend was noted for the other subjects.

The next step was to assess the R² values (proportion of variance of the scaled data which was accounted for by the MDS model distances). R² was plotted as a function of dimension, as recommended by Young and Hamer (1987). A sudden rise in R² suggests a good fit of the MDS model to the data. In this study, only a few subjects' plots showed such a rise. In Appendix F, the stress and R² for one to four dimensions are reported for all subjects.
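The fit measures and the dimensionality limit discussed above can be sketched in Python. This is an illustration under stated assumptions and not the Alscal computation: the stress function follows the verbal definition given earlier (square root of the normalized residual sum of squares, in the form of Kruskal's stress formula 1), whereas Alscal's S-Stress is defined on squared distances; R² is taken here to be the squared correlation between the scaled data (disparities) and the model distances. All function names and numbers are hypothetical.

```python
import numpy as np

def stress_formula_1(distances, disparities):
    """Square root of the residual sum of squares, normalized by the sum of squared distances."""
    d = np.asarray(distances, dtype=float)
    d_hat = np.asarray(disparities, dtype=float)
    return float(np.sqrt(np.sum((d - d_hat) ** 2) / np.sum(d ** 2)))

def r_squared(distances, disparities):
    """Squared correlation between model distances and scaled data."""
    r = np.corrcoef(distances, disparities)[0, 1]
    return float(r ** 2)

def max_dimensions(n_stimuli, min_ratio=4):
    """Largest dimensionality that keeps stimuli:dimensions at or above min_ratio:1."""
    return n_stimuli // min_ratio

n_concepts = 16
n_unique_pairs = n_concepts * (n_concepts - 1) // 2   # pairs judged by each subject
upper_limit = max_dimensions(n_concepts)               # 16 stimuli at a 4:1 ratio

# Tiny hypothetical example with two inter-point distances.
example_stress = stress_formula_1([3.0, 4.0], [3.0, 2.0])
print(n_unique_pairs, upper_limit, example_stress)
```

With 16 concepts, the 4:1 criterion gives four dimensions as the upper limit, which matches the range of solutions examined above.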
The third method used to determine dimensionality was to compare the stress values obtained at each dimension to figures computed through Monte Carlo studies from Spence and Ogilvie (1973) and Spence (1979). All subjects' MDS solutions in two and three dimensions had stress values low enough to meet these criteria.

Kruskal and Wish (1978) cautioned against accepting solutions with stress values above .1, unless the solution were in one dimension. Using this strict criterion, only 3 subjects had two-dimensional solutions with sufficiently low stress values; when three dimensions were used, this number increased to 14. Davison (1983), however, advised that if data contained considerable error, stress values over .1 may be accepted. Departure from monotonicity could arise from the fact that the stimuli used in this study were mental concepts pertaining to a particular field of knowledge; there is no definitively correct way to express the interrelationships among the concepts. Many MDS studies use objects with varying colour, hue and intensity, or physical stimuli such as Morse Code symbols, where objective, measurable differences among the features exist. It is reasonable to consider that the error would be less in these latter situations. In addition, Shepard (1974) expressed the opinion that investigators often place too much emphasis on numerical indicators of departure from monotonicity (stress) to the exclusion of more important considerations such as substantive interpretability of the derived configurations.

The final approach to the question of dimensionality was to assess whether the configurations were interpretable.
Because two-dimensional solutions are much more easily interpreted, and because some subjects were identified as very likely to have solutions in only two dimensions (based on stress measures and R²), two-dimensional representations of the concepts were obtained for each subject, and compared. They demonstrated a clear dimension that represented a continuum that could be labelled healthy-unhealthy differentiation. A second dimension appeared to be phase in the healing process.

The plots were analyzed for patterns that were predicted from the literature. One cluster of three concept-pairs was expected to become more differentiated with experience. Another cluster was anticipated to become closer together as subjects attained greater associative connection. A final pair was conceptualized to be related by the phase of the healing process (early to late). In Table 4-14, the means and standard deviations for the average distances for the concept-pairs comprising the clusters are shown, by experience level. The measures of the distance between each of the constituent concept-pairs were taken from the two-dimensional plots.

Table 4-14
Means and Standard Deviations for Average Distance Between Concept Pairs in Selected Clusters*, by Experience Level

Selected Concept Clusters

Experience Level     Differentiation     Phase of Healing     Association
                     Mean (SD)           Mean (SD)            Mean (SD)
1 (n=12)             8.33 (1.86)         6.50 (3.11)          6.17 (2.55)
2 (n=12)             8.34 (2.73)         4.71 (2.62)          5.23 (2.91)
3 (n=12)             9.44 (2.06)         6.17 (2.48)          4.38 (1.64)

Note. SD = Standard Deviation. Thirty-six plots (one per subject) were analyzed.
*The measures, in centimeters, were taken from two-dimensional plots derived from multidimensional scaling of similarity judgments.

With Differentiation and Association concepts, there was very little difference in mean scores, although the differences were in the predicted direction.
With Phase of Healing, the middle group had the smallest distances, which did not conform to any pattern.

Schvaneveldt et al. (1985), in their research using MDS to reveal the conceptual structure of pilots, found that an analysis of all subjects' weighting of conceptual dimensions demonstrated an expertise dimension. Such an analysis was carried out in the present study, using the Indscale procedure available in SPSS. The axes were the two dimensions found relevant in the individual analyses. Coordinate points for all subjects were identified. The resulting plot based on derived subject weights showed a negative linear trend, with a wider dispersion to the left of center. Points representing subjects with varying experience levels appeared to be intermixed. Several points representing subjects with high accuracy scores were located to the lower right side of the plot. No clear expertise dimension, however, emerged.

Sensitivity to Overall Patterns in Data

To assess subjects' differential sensitivity to data patterns, Brunswikian lens model methodology was applied. In phase 1 of the study, the broad patterns related to incisional healing in the ecology were identified. In phase 2, the extent to which subjects of varying experience levels matched these patterns was revealed.

Lens model task. As described in chapter 3, subjects made judgments of healing time for two sets of 40 cases. Analysis was based on 35 cases; repeated cases were not used.

Traditionally in the lens model paradigm, judgment has been considered an intra-individual phenomenon captured in isolation from other individuals (K. R. Hammond et al., 1975). To be consistent with tradition, the examples of Carlstrom (1989) and Cooksey and Freebody (1987) have been followed: non-parametric indices are reported to provide summarization, and yet preserve the idiographic character of the data.
Medians and interquartile ranges for performance on normal order and reverse order cases are presented in Table 4-15.

Considering the level of task predictability, subjects' performance was good in both the normal order and reverse order cases. For normal order cases, 13 subjects attained (or exceeded) the model R² of .70: 5 subjects from experience level 1, 3 from experience level 2, and 5 from experience level 3. For reverse order cases, 12 subjects attained (or exceeded) the model R² of .68: 5 subjects from experience level 1, 2 from experience level 2, and 5 from experience level 3.

Using the lens model measures as indicators of sensitivity to broad patterns in the ecology, no patterns emerged for increased sensitivity with experience. For both lens model achievement and G-Scores, the medians for the middle group were lower, compared to the other two groups. The median variance explained (R²) and median G-Scores for the reverse order cases increased somewhat with experience, but the other lens model performance measures demonstrated no pattern.

[Table 4-15: Medians and interquartile ranges for lens model performance measures on normal order and reverse order cases, by experience level. Table body not recoverable from the source text.]

In Table 4-16, means and standard deviations for R² (variance explained) and for the matching index, Σd (the degree to which subjects matched ecological validities with utilization coefficients), are reported, by experience level.

Table 4-16
Means and Standard Deviations for Measures Derived from Lens Model Analyses for Normal Order and Reverse Order Cases, by Experience Level

Variance Explained (R²) and Ecological Validity-Matching Index (Σd) Derived From Lens Model Analysis

                     Normal Order Cases (n=35)          Reverse Order Cases (n=35)
Experience Level     R²               Σd                R²               Σd
                     Mean (SD)        Mean (SD)         Mean (SD)        Mean (SD)
1 (n=12)             .63 (.14)        .32 (.15)         .52 (.22)        .27 (.13)
2 (n=12)             .63 (.13)        .40 (.22)         .54 (.15)        .24 (.18)
3 (n=12)             .63 (.11)        .27 (.12)         .65 (.09)        .24 (.12)

Note. SD = Standard Deviation.

The means for R² increased with experience in the reverse order cases, but did not increase in the normal order cases. The Σd was slightly lower (more ideal) in the reverse order cases.

Hammond et al. (1964) expressed a caution about the interpretation of this matching index. Although it may be tempting to examine each difference product associated with each cue, and make inferences about the relative contribution of each cue to Σd, this would not be wise. Hammond and colleagues explained that Σd is not a simple sum of the mismatches, but is rather a result of interdependent difference products, or vectors.
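The lens model indices referred to in this section can be illustrated with a minimal Python sketch on simulated data. Everything here is hypothetical: the function name, the three cues, the simulated criterion and judgments, and the weights used to generate them. As an assumption, the matching index is computed as the sum over cues of (ecological validity minus utilization coefficient) multiplied by (ecological beta weight minus subject beta weight), which is one reading of the "difference products" described above.

```python
import numpy as np

def lens_model(cues, criterion, judgments):
    """Lens model indices from cues, an ecological criterion, and one subject's judgments."""
    def standardize(a):
        a = np.asarray(a, dtype=float)
        return (a - a.mean(axis=0)) / a.std(axis=0)

    X = standardize(cues)
    y_e, y_s = standardize(criterion), standardize(judgments)
    n = len(y_e)

    beta_e = np.linalg.lstsq(X, y_e, rcond=None)[0]   # ecological beta weights
    beta_s = np.linalg.lstsq(X, y_s, rcond=None)[0]   # subject beta weights
    r_e = X.T @ y_e / n                               # ecological validities
    r_s = X.T @ y_s / n                               # utilization coefficients

    pred_e, pred_s = X @ beta_e, X @ beta_s
    return {
        "r_a": np.corrcoef(y_e, y_s)[0, 1],                   # achievement
        "Re2": float(beta_e @ r_e),                           # task predictability
        "Rs2": float(beta_s @ r_s),                           # variance explained in judgments
        "G": np.corrcoef(pred_e, pred_s)[0, 1],               # matching of linear models
        "C": np.corrcoef(y_e - pred_e, y_s - pred_s)[0, 1],   # residual correlation
        "sum_d": float((r_e - r_s) @ (beta_e - beta_s)),      # assumed matching index
    }

# Simulated example: 35 cases and 3 hypothetical cues.
rng = np.random.default_rng(0)
cues = rng.normal(size=(35, 3))
criterion = cues @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=35)
judgments = cues @ np.array([0.8, 0.2, 0.3]) + rng.normal(size=35)
m = lens_model(cues, criterion, judgments)
print({name: round(float(value), 3) for name, value in m.items()})
```

Under these definitions, achievement decomposes as r_a = G·R_e·R_s + C·√(1 − R_e²)·√(1 − R_s²), and the linear component satisfies G·R_e·R_s = (R_e² + R_s² − Σd)/2. The second identity shows why Σd cannot be read cue by cue: each difference product depends on beta weights that are themselves determined jointly by all of the cues.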
Subjects' scores for lens model measures (r_a, R_s², G, C, and Σd) and accuracy measures (proportion correct and error variance) for normal and reverse order cases are available in Appendix G. Subjects' beta weights for significant variables are available in Appendix H. No pattern was evident in relation to the beta weights and experience.

Cue importance task. This task was used to obtain subjective policies for the judgment of healing time. Ten new cases from phase 1 of this study, with a range of healing times, were selected for this task. The subjective weights of the cues were obtained by having subjects read case information, make a judgment of healing time, and rate the importance of each cue for each case, as described in chapter 3.

To convert these cue-weights into a policy, the following procedure was followed. First, for continuous variables, to make subjects' cue-weights comparable between cases, a variable called Caseweight was created by summing the raw cue-weights for each case; raw cue-weights were converted to relative cue-weights by dividing by Caseweight. Second, relative cue-weights were multiplied by the judged healing time to reflect the relationship with the subject's judgment of the criterion, then divided by the average value for that cue. Division was necessary so that the subjective weight could be multiplied by the value of the appropriate cue for a new case. The intermediate results were referred to as adjusted weights; these weights were components of the judged criterion in a proportion derived by the subjective weighting of cues. Finally, for each subject, the adjusted weights were averaged over the 10 cases to obtain contextual weights.

The categorical variables were treated in the same way, except that the adjusted weights were calculated only from cases coded as 1 for that variable.
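The steps described above for continuous cues can be sketched in Python. The function names and the small two-case, two-cue data set are hypothetical, and one detail is an assumption: the "average value for that cue" is taken to be the cue's mean over the rated cases. The separate handling of categorical cues is not shown.

```python
import numpy as np

def contextual_weights(raw_weights, judged_times, cue_values):
    """Convert importance ratings into contextual weights for continuous cues.

    raw_weights:  (cases x cues) importance ratings given by one subject
    judged_times: (cases,) the subject's judged healing time for each case
    cue_values:   (cases x cues) the cue values shown in each case
    """
    raw = np.asarray(raw_weights, dtype=float)
    judged = np.asarray(judged_times, dtype=float)
    cues = np.asarray(cue_values, dtype=float)

    caseweight = raw.sum(axis=1, keepdims=True)      # Caseweight: sum of ratings per case
    relative = raw / caseweight                      # relative cue-weights
    # Assumption: divide by each cue's mean over the rated cases.
    adjusted = relative * judged[:, None] / cues.mean(axis=0)
    return adjusted.mean(axis=0)                     # average over cases

def predict_healing_time(weights, new_case):
    """Additive policy: sum of contextual weight times cue value."""
    return float(np.dot(weights, new_case))

# Hypothetical data: two cases, two continuous cues.
raw = [[2, 2], [1, 3]]
judged = [10, 20]
cues = [[50, 4], [70, 6]]
weights = contextual_weights(raw, judged, cues)
prediction = predict_healing_time(weights, [60, 5])
print(weights, prediction)
```

A useful property of this construction is that a new case with every cue at its average value is predicted to have the subject's mean judged healing time, which here is 15.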
The policy based on subjective weights consisted of integrating these contextual weights for both continuous and categorical variables in an additive manner. When applying the policy derived from subjective weights to predict healing time, the polarity of the variable Occupation (Blishen et al., 1987) was reversed so that lower numbers reflected higher socioeconomic status, rather than the opposite. Einhorn and Hogarth (1975) pointed out that when predicting a criterion using policies based on subjective weights, cues must be scaled so that they have a consistent relationship to the criterion.

The quality of the subjective policy was compared with the quality of the two regression policies. The three policies from each subject were used to predict healing times for the new set of 35 cross-validation cases. Experienced subjects with considerable schematic knowledge of the ecology were theorized to have greater accuracy on cross-validation with subjective policies, compared to inexperienced subjects. The predictions from the three policies were compared to the actual healing time obtained from phase 1 data, using the interval measure of healing time, as before. In Table 4-17 data are presented for accuracy of cross-validation, using accuracy based on the error variance and percent correct measures, by experience level.

[Table 4-17: Means and standard deviations for cross-validation accuracy (error variance and percent correct measures) of the regression-based policies and the subjective policy, by experience level. Table values are not recoverable from the source text.]

All policies cross-validated on the new set of 35 cases moderately well, although there was little relationship with experience. Experience accounted for a negligible proportion of variance using the error variance measure. Using the proportion correct measure, a very small proportion of variance was explained (η² = .05 for the analysis of the policy derived from regression on normal order cases and the subjective policy; η² = .02 for the comparison of the policy derived from regression on reverse order cases and the subjective policy). The effect of experience was in the direction predicted only with the regression-based policy derived from reverse order cases. Contrary to expectation, the means for accuracy for cross-validation were highest for novices' subjective policies and for this group's regression policies based on the normal order cases.

Judgment Process

Judgment process was assessed by two tasks. The first one was the information board task where subjects engaged in information seeking and thinking aloud; these activities provided information about the sequence, amount, and variability of data selected.
Analysis of verbal protocols provided evidence about the ways in which subjects at each experience level processed information and used strategies. The second task used to assess judgment process was the orthogonal judgment task. Measures of the subjects' ability to discriminate healing times, and a measure of configural cue-use that was considered an alternative to the lens model C-scores, were derived from this task.

Information board task. Analysis of the information board task was similar to that described by Payne (1976). The sequence of information search was recorded on a process tracing map. The pattern of search associated with selecting the first eight cards, the second eight, and the remainder, was classified as inter-dimensional (search proceeded over the various cue-dimensions, within the same case), intra-dimensional (search proceeded within the same cue-dimension, over the set of cases), or mixed (elements of both strategies present). The depth of search was defined as the total number of cards turned over. A numeric index of search pattern was computed, using the method developed by Payne (1976). This index was defined as the number of alternative-wise moves minus the number of attribute-wise moves, divided by the sum of these two numbers. The resulting index has a range from -1, indicating purely intradimensional (or between-alternative) search, to +1, indicating purely interdimensional (or within-alternative) search. Shifts were defined as moves that were neither alternative-wise nor attribute-wise; such moves were counted. The variability of search was indexed by the standard deviation of the proportion of cards examined within the set of alternatives. Harte et al.
(1994) and Zakay (1990) noted that searching a constant number of dimensions for each alternative indicated a compensatory strategy, whereas searching a variable number of dimensions indicated a noncompensatory strategy. Using this definition, almost all subjects in this study used a compensatory strategy. The depth of search ranged from 15 to 24. Twenty-eight subjects responded accurately and 8 did not. About two thirds of the subjects began the task with an intra-dimensional strategy. No relationship between strategy-use and accuracy was evident. Detailed data about information search are reported in Appendix I. The means and standard deviations for data related to performance on the information board task are available in Table 4-18, by experience level.

Table 4-18
Means and Standard Deviations for Data Search Measures on the Information Board Task and Number of Subjects with Correct Judgments, by Experience Level

                          Judgment Performance
Experience   Depth of Search   Number of Shifts   Search Pattern(a)   Number of Subjects
Level        Means (SD)        Means (SD)         Means (SD)          with Correct Judgments
1 (n=12)     21.83 (2.44)      5.25 (2.22)        -.13 (.78)           8
2 (n=12)     20.33 (2.35)      3.92 (1.16)         .11 (.86)           9
3 (n=12)     19.67 (3.45)      3.83 (1.47)        -.34 (.64)          11

Note. SD = Standard Deviation.
(a) Search Pattern: Positive scores reflect more within-alternative (or interdimensional) search; negative scores reflect more between-alternative (or intradimensional) search.

There was a pattern for experience to account for a small proportion of variance in Depth of Search (η² = .10); the relationship was inverse, in that as experience increased, there was a small decrease in Depth of Search. The percent of subjects with correct judgments in each group increased with experience. Subjects varied with respect to the degree to which they were systematic in searching for the data necessary to make their judgments.
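The search pattern index and the shift count used for the information board task can be illustrated with a short sketch. The index follows the definition given for this task (alternative-wise moves minus attribute-wise moves, divided by their sum). The function name, the representation of each card as an (alternative, attribute) pair, and the treatment of a repeated look at the same card as a shift are assumptions made for illustration only.

```python
def search_measures(moves):
    """Summarize an information-board search sequence.

    moves is the ordered list of cards turned over, each given as a
    (alternative, attribute) pair, e.g. (case number, cue number).
    """
    alternative_wise = attribute_wise = shifts = 0
    for (alt0, attr0), (alt1, attr1) in zip(moves, moves[1:]):
        if alt0 == alt1 and attr0 != attr1:
            alternative_wise += 1   # same case, different cue
        elif attr0 == attr1 and alt0 != alt1:
            attribute_wise += 1     # same cue, different case
        else:
            shifts += 1             # neither type of move (assumed to include repeats)
    total = alternative_wise + attribute_wise
    # Index is undefined when there are no classifiable moves.
    index = (alternative_wise - attribute_wise) / total if total else None
    return {"depth": len(moves), "shifts": shifts, "pattern_index": index}
```

For the hypothetical sequence `[(0, 0), (0, 1), (0, 2), (1, 2), (2, 0)]` there are two alternative-wise moves, one attribute-wise move, and one shift, so the index is (2 - 1) / 3, a mildly within-alternative pattern.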
The most interesting subjects were those who initially applied a systematic, top-down strategy, and then became sensitive to particular data, and allowed bottom-up processing to interact in a reciprocal way. Such an approach demonstrated subjects' adaptiveness to the unique demands in the situation.

Analysis of the verbal protocols was carried out according to Payne (1976), Payne et al. (1993a, 1993b) and Ericsson and Simon (1993). The protocols and process tracing maps were assessed for the number of instances of 18 features of information processing. Features were categorized into a Judgment Process scale, with three sub-scales: metacognition, or executive functioning (including planning, identifying data as missing or not required, and indicating curiosity about particular data); input generation (including constructing hypotheses or possibilities in working memory, making inferences, and perceiving clinical data as a particular pattern); and cognitive operations (including connecting two or more pieces of data in working memory, and weighing the relative importance of data). The Judgment Process scale, including 18 coding categories and examples from the protocols, is available in Appendix J. Information processing profiles were constructed for each subject. In Table 4-19, the means and standard deviations for the number of references associated with each sub-scale of the Judgment Process scale are presented, by experience level.

Table 4-19
Means and Standard Deviations for Judgment Process Scale, by Experience Level

                      Judgment Process Subscales
Experience   Metacognition   Input Generation   Cognitive Operations
Level        Means (SD)      Means (SD)         Means (SD)
1 (n=12)      6.42 (4.66)    4.17 (3.04)        5.83 (3.79)
2 (n=12)      5.83 (4.15)    4.00 (2.30)        5.33 (2.90)
3 (n=12)     10.17 (5.46)    8.42 (5.07)        7.42 (3.23)

Note. SD = Standard Deviation.
There was a clear pattern evident for mean scores on each of the sub-scales to increase with experience. Using scores for the total Judgment Process scale, experience accounted for a notable proportion of variance in this measure (η² = .21). The increases in scores were associated primarily with the most experienced group.

To assess the degree of consistency in data coding, as recommended by Harte et al. (1994) and described by Fleiss (1971), Cohen's Kappa was computed. Cohen's Kappa quantifies inter-rater agreement beyond that which is likely to occur by chance. A second person re-coded three of the protocols derived from the information board task. Cohen's Kappa was .72.

Because not all aspects of the verbal protocols could be captured in a numerical form, qualitative analysis of the verbal protocol data was carried out. In this section, some of the strategies that subjects used to search for and process information have been identified. Examples of how subjects weighed and integrated information as they made judgments about the data are presented.

One of the strategies used was generalization based on category membership. The following are examples:

Subject 28: "This client here has ulcerative colitis . . . just because ulcerative colitis patients always have a lot of problems, it's going to be long, ongoing problems with this guy."

Subject 20: "I always find out age first, and it helps me to judge how long they're going to take to heal because younger people heal a lot faster than older people."

Another similar strategy used was making backward inferences, based on data available. Examples include:

Subject 35: "The fellow with the ulcerative colitis . . . He might have been on prednisone . . . so he would have impaired healing."
Subject 28: "This patient with the lower abdominal transverse incision, with a supra-pubic catheter, . . . that's probably been a Burch repair."

A third strategy was making knowledge available by deductive logic, based on the cues already in view. Examples include the following:

Subject 25: "I can see that I have at least two women here, and so I don't need to turn over the age and gender on these bottom two." [This was said after turning over the cue cards for surgery; the two cards to which this subject referred were oophorectomy and Burch repair.]

Subject 25: "I can tell by the surgeries what their diagnoses are."

The experienced subjects seemed to have a better idea of the relevance of particular data, compared to novice subjects. For example, when weighing the impact of particular problems and integrating data, an experienced subject [28] was highly aware of the relationship of the factors to healing time, whereas a novice subject [05] acknowledged the data, but did not seem able to integrate the data in relation to a goal:

Subject 28: "This patient with the Burch repair has stomach hyperacidity and hypertension - she may have a lot of problems voiding, getting her catheter out, and may need to go home on home care. But in terms of the incision, she's going to be fine. There's no problems there."

Subject 05: "The patient in for a Burch repair who has hypertension and stomach hyperacidity . . . Hm, . . . She is probably a little anxious. . . . It would be really hard to say why they would do this on an 80 year old. I don't know . . ."

One difficulty subjects experienced was in ignoring data. Some subjects realized that particular data were not relevant to the judgments, and yet the influence was evident in the judgment made. For example:

Subject 12: "I'll take the vital signs, just to compare . . . The temperature is getting up there a little. Well, this is on admission so this, Um . . .
I don't know if that would tell me too much about the way they'd heal because I've seen vital signs change so much while people are in the hospital . . ."

This inexperienced subject subsequently did not give the correct rank order for healing times for the four patients, and gave the rationale that the one particular patient's vital signs "were a little off". The protocol data provided a useful means to illustrate the clinical reasoning employed by subjects who have a range of experience.

Orthogonal-cue judgment task. There were two goals for this task: (1) to identify to what extent subjects make discriminations in healing time; and (2) to determine the proportions of variance associated with the main effects (Age, Weight, and Diabetic status) and with the interaction terms. Subjects' calibration of healing times was assessed by discrimination (the number of categories that subjects created when sorting the cards of patient data), and by estimates of rapid and delayed healing times.

The orthogonal-cue judgment task was analyzed for configural cue-utilization using ANOVA, following the procedure described by Hoffman et al. (1968) and Phelps and Shanteau (1978). The four cues used in this task were regarded as fixed categorical factors, each with a number of levels: Age had two levels, Diabetes had two levels, Height had three levels and Weight had four levels. The dependent variable was the judgment made for each case. The cases were constructed from an orthogonal combination of cues in a completely crossed factorial design. A separate Analysis of Variance (ANOVA) was carried out for each subject. The proportion of variance in the dependent measure explained by interactions represented the patterned use of cues. Omega-squared (ω²) was used to achieve the goals related to variance explained; this measure was computed as recommended by Slovic (1969).
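As an illustration of how ω² is obtained from an ANOVA summary, the sketch below uses the standard textbook estimator for a fixed effects design: the sum of squares for the effect, less its degrees of freedom times the error mean square, divided by the total sum of squares plus the error mean square. It is an assumption that this matches the computation used in the study term by term; the function names and the numbers in the example are hypothetical.

```python
def omega_squared(ss_effect, df_effect, ms_error, ss_total):
    """Omega-squared for one effect in a fixed effects ANOVA.

    Standard estimator: (SS_effect - df_effect * MS_error) / (SS_total + MS_error).
    """
    return (ss_effect - df_effect * ms_error) / (ss_total + ms_error)


def omega_squared_sum(effects, ms_error, ss_total):
    """Total omega-squared over several effects, e.g. all interaction terms.

    effects is a list of (ss_effect, df_effect) pairs (hypothetical layout).
    """
    return sum(omega_squared(ss, df, ms_error, ss_total) for ss, df in effects)
```

For example, an effect with a sum of squares of 30 on 2 degrees of freedom, an error mean square of 2.5, and a total sum of squares of 100 gives (30 - 5) / 102.5, or about .24. Summing the values for all interaction terms gives a single per-subject index of configural cue-use of the kind described for this task.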
The equations for ω² for main effects and interaction terms for a fixed effects model were taken from Howell (1992, p. 407).

In Table 4-20, the means and standard deviations for these information processing measures and for ω² are displayed by experience level. There was a pattern of relations evident with the information processing data. Experience accounted for a small proportion of the variance in the number of categories generated (η² = .10). Assessment of the mean scores revealed that this relationship was inverse: experienced subjects tended to generate fewer categories, compared to subjects with less experience. Experience accounted for a very small proportion of variance in subjects' estimates of low healing times (η² = .05), and a notable proportion of variance in the high healing times (η² = .26). Experience accounted for a negligible proportion of variance in ω² for interaction (η² = .02); the slight relationship observed was in the direction opposite to what was anticipated. There was variability in these mean scores, but only a small part of it was captured by experience.

[Table 4-20: Means and standard deviations for information processing measures from the orthogonal-cue judgment task (number of categories generated, estimates of low and high healing times) and for ω², by experience level. Table values are not recoverable from the source text.]

Three additional measures of configural cue-use available were C-scores from the lens model analyses, the variability of data search from the information board task, and variability of the measure of the importance of information from the cue importance task. The means and standard deviations for these indices of configural cue-use are available in Table 4-21. Because the C-scores were correlations, a Fisher's transformation (r to r') was applied prior to addition; the mean scores were transformed back to the correlational scale for reporting.

Table 4-21
Means and Standard Deviations for Measures of Configural Cue-Use, by Experience Level

             C-Scores(a)                      Configural Index   Configural Index
             Normal          Reverse          Variability        Variability
             Order Cases     Order Cases      of Information     of Cue
             (n=35)          (n=35)           Search             Importance
Experience   Means           Means            Means              Means
Level        (SD)            (SD)             (SD)               (SD)
1 (n=12)      .02 (.27)      .43 (.28)        .95 (.62)          2.3 (1.5)
2 (n=12)     -.04 (.26)      .43 (.40)        .86 (.41)          3.3 (1.6)
3 (n=12)      .04 (.22)      .38 (.23)        .62 (.76)          2.9 (1.4)

Note. SD = Standard Deviation.
(a) C-Scores were transformed using a Fisher r to r' transformation prior to the calculation of means, then transformed back to correlations.

In contrast to expectations, the variability of information search decreased with experience (η² = .05), as did the variability of cue-importance (η² = .08). The means for C-Scores were considerably larger in the reverse order cases compared to those in the normal order cases.

In Table 4-22(a) the correlations among the four measures of configural cue-use, accuracy, and experience for normal order cases are displayed.

Table 4-22(a)
Intercorrelations Among Measures of Configural Cue-Use, Accuracy Measures, and Experience, for Normal Order Cases

Variables     1      2      3      4      5       6      7
1             -
2            .03     -
3            .17   -.03     -
4            .01   -.04   -.03     -
5           -.24    .16    .09    .24     -
6            .17    .15    .08    .12    .53**    -
7           -.16    .07   -.08    .08    .03              -

Note.
1 = Omega-Squared for Interaction Terms
2 = C-scores from Lens Model Task
3 = Variability of Information Search
4 = Variability of Cue Importance
5 = Accuracy, Error Variance Measure
6 = Accuracy, Percent Correct Measure
7 = Experience, in Months
Correlations were based on 36 subjects.
** p < .01

In Table 4-22(b) the correlations among the four measures of configural cue-use, accuracy, and experience for reverse order cases are reported.

Table 4-22(b)
Intercorrelations Among Measures of Configural Cue-Use, Accuracy Measures, and Experience, for Reverse Order Cases

Variables     1      2      3      4      5       6      7
1             -
2           -.07     -
3            .17   -.07     -
4            .01    .01   -.03     -
5           -.34*   .27    .03    .06     -
6           -.07   -.13    .08    .28    .63**    -
7           -.16   -.16   -.08    .08   -.03     .09     -

Note.
1 = Omega-Squared for Interaction Terms
2 = C-scores from Lens Model Task
3 = Variability of Information Search
4 = Variability of Cue Importance
5 = Accuracy, Error Variance Measure
6 = Accuracy, Percent Correct Measure
7 = Experience, in Months
Correlations were based on 36 subjects.
* p < .05. ** p < .01.

With the exception of the two accuracy measures, these correlations tended to be very small, as anticipated with a sample size of 36. A negative relationship was observed between ω² for the interaction term and accuracy assessed by error variance; this was not the direction predicted. The variability of cue importance and accuracy assessed by error variance demonstrated a positive relationship in the normal order cases, but not in the reverse order cases. The variability of cue importance and accuracy assessed by the percent correct measure were positively related in the reverse order cases.
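The averaging of C-scores reported in Table 4-21 relied on Fisher's transformation, because correlations are not on an interval scale and should not be averaged directly. A minimal sketch of that step follows; the function name and the example values are hypothetical. The transformation itself is the standard one (the inverse hyperbolic tangent of r), and the mean is converted back with the hyperbolic tangent.

```python
import math


def mean_correlation(correlations):
    """Average correlations on Fisher's transformed scale.

    Each r is transformed with atanh, the transformed values are averaged,
    and the mean is converted back to the correlation scale with tanh.
    Values of exactly +1 or -1 cannot be transformed.
    """
    transformed = [math.atanh(r) for r in correlations]
    return math.tanh(sum(transformed) / len(transformed))
```

For two hypothetical subjects with C-scores of .00 and .80, the transformed mean converts back to .50, whereas the simple arithmetic mean would be .40. The difference grows as the correlations become larger or more variable.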
Age, Education and Experience

Research question 3 is: What are the patterns of relationships among individual differences in age, education, and experience and performance in a clinical judgment task?

The description of subjects with respect to age category and education was reported in Section B. One subject was over 55 by a month; to prevent this one score from having undue influence on correlations, this subject's age was adjusted to the immediately preceding age category. Because the subjects' precise ages were not known, each age was approximated by using the midpoint of the scale from the questionnaire. Thus, the ages used in the analysis became 20, 30, 40, and 50.

Experience was assessed by the number of months working in a surgical setting. Experience was also indexed from the estimate of the total number of incisions viewed during the subject's career. A further means of tapping experience was extracted from qualitative data (verbal references to experience lacking from the protocols). This measure was linearly transformed so that low scores indicated little experience.

The intercorrelations between these experience measures, age, education, and accuracy for normal order cases and reverse order cases are provided in Tables 4-23(a) and 4-23(b).

Table 4-23(a)
Intercorrelations Between Experience Measures, Age, Education, and Judgment Accuracy for Normal Order Cases

Variables     1       2       3      4      5      6      7       8
1             -
2            .87**    -
3            .14     .03      -
4            .62**   .51**   .35*    -
5           -.46**  -.36*   -.02   -.20     -
6            .34*    .38*    .00    .34*   .06     -
7            .03    -.01    -.06   -.14   -.19    .18     -
8            .32     .29    -.11    .17   -.22    .13    .53**    -

Note.
1 = Experience, In Months
2 = Experience Expressed as Number of Incisions Seen
3 = Experience, Protocol Measure
4 = Age
5 = General Education
6 = Nursing Education
7 = Accuracy, Error Variance
8 = Accuracy, Percent Correct
Correlations were based on 36 subjects.
* p < .05. ** p < .01.
Table 4-23(b)
Intercorrelations Between Experience Measures, Age, Education, and Judgment Accuracy for Reverse Order Cases

Variables     1       2       3      4      5      6      7       8
1             -
2            .87**    -
3            .14     .03      -
4            .62**   .51**   .35*    -
5           -.46**  -.36*   -.02   -.20     -
6            .34*    .38*    .00    .34*   .06     -
7           -.03    -.02     .00   -.15    .03    .20     -
8            .09     .02    -.23   -.20   -.19    .03    .63**    -

Note.
1 = Experience, In Months
2 = Experience Expressed as Number of Incisions Seen
3 = Experience, Protocol Measure
4 = Age
5 = General Education
6 = Nursing Education
7 = Accuracy, Error Variance
8 = Accuracy, Percent Correct
Correlations were based on 36 subjects.
* p < .05. ** p < .01.

Experience was strongly related to the number of incisions seen, nursing education, and to age. The percent correct measure of accuracy had a weak association with experience in the normal order cases.

The correlations between the cognitive measures, experience assessed in a variety of ways, and measures of accuracy are reported in Table 4-24.

Table 4-24
Correlation Coefficients Between Cognitive Measures and Measures of Judgment Accuracy for Normal Order and Reverse Order Cases

                     Correlation Coefficients for Judgment Accuracy
             Normal Order Cases (n=35)          Reverse Order Cases (n=35)
             Error Variance   Percent Correct   Error Variance   Percent Correct
Variables    Measure          Measure           Measure          Measure
1            -.02              .22              -.27             -.08
2             .13              .25               .13              .21
3            -.07              .09               .21              .19
4             .15              .27               .36              .05
5            -.20             -.34*             -.20             -.10
6            -.08              .14               .14              .34*
7             .04             -.15              -.19             -.11
8            -.15             -.14              -.19             -.16

Note.
1 = R-Squared, from Similarity Judgment Task
2 = Conceptual Coherence Scale, from Protocols
3 = R-Squared, from Lens Model Task
4 = Achievement, from Lens Model Task
5 = Matching Index (Σd) from the Lens Model Task
6 = Judgment Process Scale, from Protocols
7 = Depth of Search from Information Board Task
8 = Pattern of Search from Information Board Task
Correlations were based on 36 subjects.
* p < .05.
With the normal order cases, Achievement from the lens model task, the Conceptual Coherence scale from the protocol data, and the R² from the similarity judgment task were the three variables with positive relations with the percent correct measure of accuracy. The matching index, Σd, was negatively related to the percent correct measure of accuracy, which showed that subjects with more ideal matching also (as expected) tended to be more accurate.

With the reverse order cases, Achievement from the lens model task was related to the error variance measure. The Judgment Process scale was related to the percent correct measure. R² from the similarity judgment task was inversely related to the error variance measure, which was surprising because this latter measure had been transformed so that high scores reflected greater accuracy.

The error variance measure and the percent correct measure appear to be tapping different aspects of judgment performance quality. Subjects with extreme errors in prediction were being penalized when the error variance measure was used, whereas with the percent correct measure, there was no such penalty.

Section D: Results Related to Multivariate Analyses

Results Related to Research Conditions

Research Questions 4 and 5 are: To what extent does cue-presentation condition (context cues followed by individuating cues, or the reverse) reveal patterns of relationships in performance in a clinical judgment task? To what extent does memory-priming condition (exposure to relevant domain-specific visual stimuli, versus no exposure) reveal patterns of relationships in performance in a clinical judgment task?

In the healing time judgment task, 18 of the subjects (those in case sequence 1) completed the normal order cases first, and then the reverse order cases; the other 18 subjects (those in case sequence 2) completed the cases in the opposite sequence.
With 36 subjects, the inclusion of both between-subject factors (slide condition and case sequence), and experience level, made cell sizes extremely small. Multivariate analyses of variance (MANOVA), therefore, were carried out twice, using an alpha level of .05. The first time case sequence was included to determine that it was not significant; the second time, case sequence was omitted. The mean squares, degrees of freedom, F-value and significance of F related to these analyses are provided in Table 4-25.

Table 4-25
Multivariate Analysis of Variance for the Effects of Slide Condition and Paragraph Order (Normal Order and Reverse Order) on Judgment Accuracy (Error Variance Measure)

MANOVA Summary Figures

Source                        df      MS       F      Sig of F
Between Subjects
  Slide Condition (S)          1      .76     .05     .83
  Experience (E)               2     2.78     .17     .85
  S by E                       2    44.08    2.68     .09
  Error                       30    16.47
Within Subjects
  Paragraph Condition (P)      1    28.20    7.69     .01
  S by P                       1      .18     .05     .83
  E by P                       2      .26     .07     .93
  S by P by E                  2     1.97     .54     .59
  Error                       30     3.67

There was a significant difference between measures of performance based on error variance related to paragraph order. Paragraph order was a within-subject factor. Subjects in slide condition 1 viewed slides of incisions immediately prior to performing the healing time judgment task. Subjects in slide condition 2 saw the same slides following this task. Because in this analysis case sequence was not significant, collapsing across sequence was carried out.

A MANOVA was performed, using accuracy assessed by the percent correct measure on the two sets of cases as dependent variables. The mean squares, degrees of freedom, F-value, and significance of F can be seen in Table 4-26.
Table 4-26
Multivariate Analysis of Variance for the Effects of Slide Condition and Paragraph Order (Normal Order and Reverse Order) on Judgment Accuracy (Percent Correct Measure)

MANOVA Summary Figures

Source                        df      MS        F      Sig of F
Between Subjects
  Slide Condition (S)          1    186.16    1.27     .27
  Experience (E)               2    170.50    1.16     .33
  S by E                       2    566.45    3.85     .03
  Error                       30    147.15
Within Subjects
  Paragraph Condition (P)      1     18.03     .27     .60
  S by P                       1     84.12    1.28     .27
  E by P                       2     62.52     .95     .40
  S by P by E                  2     19.25     .29     .75
  Error                       30     65.74

There was a significant interaction between experience level and slide condition. Because paragraph order was not significant, the total percentage of correct judgments was computed and used in the calculation of means. In Table 4-27, the means and standard deviations for judgment accuracy are presented, by experience level. The cell means reveal that experienced subjects (as anticipated) tended to perform better with enhanced memory-priming; contrary to expectations, however, novice subjects who were not exposed to the slides also had high scores.

Table 4-27
Means and Standard Deviations for Judgment Accuracy (Percent Correct for all Cases - Normal Order and Reverse Order Taken Together), by Slide Condition and Experience Level

                 Judgment Accuracy (Total Percent Correct Measure)
             Slide Condition 1(a)            Slide Condition 2(b)
Experience   Means            No. of         Means            No. of
Level        (SD)             subjects       (SD)             subjects
1 (n=12)     34.33 (13.53)    6              48.50 (4.75)     6
2 (n=12)     36.75 (8.58)     4              38.06 (7.44)     8
3 (n=12)     45.56 (8.91)     8              40.13 (2.17)     4

Note. SD = Standard Deviation
(a) Subjects viewed slides prior to the judgment task.
(b) Subjects viewed slides after the judgment task.

Results Related to Methods for Confidence Assessment

Measures of confidence were significantly influenced by the method used to determine confidence.
A compar ison of the results of the concurrent and retrospective methods for assess ing conf idence using M A N O V A revealed that (using an a lpha level of .05), there were significant dif ferences between the two methods. In normal order c a s e s , F(1 , 29) = 15.27; M S E = 3039.87. In the reverse order c a s e s , F(1 , 29) = 9.43; M S E = 1310.30. Section E: Summary R e s e a r c h quest ion 6 is: Of all the measures inc luded in the study, wh ich measures are most predictive of cl inical judgment per formance? Prediction of Judgment Expertise Normal order cases. Us ing a stepwise approach with an inclusion criterion of £ = .05, and the error var iance measures of accuracy as the dependent var iable, no cognit ive var iables or exper ience var iables entered the equat ion. None of the variability in accuracy was accounted for by the var iables included in the study. If, however, the percent correct measures of accuracy were used as the dependent var iable, seven predictors entered the equat ion. T h e s e var iables were: P h a s e Of Heal ing, Conceptua l Assoc ia t ion , Wound C losure Concep ts , Conceptua l Structure R 2 for Three Dimens ions (from the mult idimensionally sca led concepts from the similarity judgments task), High Est imate Of Heal ing T ime, and O m e g a -Squared For Main Effects (from the orthogonal judgments task), and Genera l Educat ion. The R 2 was .67 (adjusted R 2 was .59). 179 In Tab le 4-28(a), the intercorrelations between these predictor var iables, exper ience, and the percent correct measures of accuracy are reported. Table 4-28(a) Intercorrelations for Predictor Variables for Accuracy for Normal Order Cases (Percent Correct Measure), and Experience Variables 1 2 3 4 5 6 7 8 9 1 2 .01 ~ 3 -.06 .21 -4 -.10 -.01 -.32 -5 -.04 -.02 .30 .11 -6 -.14 -.01 .03 -.00 .03 -7 -.05 -.10 -.10 -.18 -.10 .11 8 .07 -.16 -.47** .11 -.16 -.46** -.01 9 .40* -.36* -.38* -.24 .10 -.22 -.12 Note. 
1 = Phase of Healing from MDS of Similarity Judgments
2 = Conceptual Association from MDS of Similarity Judgments
3 = Estimate of Delayed Healing Time from Orthogonal Judgment Task
4 = Wound Closure Concepts from MDS of Similarity Judgments
5 = Omega-Squared for Main Effects from Orthogonal Judgment Task
6 = General Education, Scored Dichotomously (Basic, Extra)
7 = R² for Three-dimensional Solutions for MDS of Similarity Judgments
8 = Experience, in Months
9 = Accuracy (Percent Correct Measure)
Correlations were based on 36 subjects.
*p < .05. **p < .01.

The variable Estimate of Delayed Healing was most highly correlated with experience; contrary to expectations, however, this correlation was negative. Phase of Healing and Conceptual Association were most highly related to accuracy.

Reverse order cases. Using a stepwise approach with an inclusion criterion of p = .05, and the error variance measures of accuracy as the dependent variable, three variables entered the equation. Significant predictors were: Phase Of Healing (related to the concepts from the similarity judgments task), Omega-Squared For Interactions (from the orthogonal judgment task), and Achievement (from the lens model task). The R² was .46 and the adjusted R² was .41. The correlations between these predictor variables, experience, and accuracy are given in Table 4-28(b).

Table 4-28(b)
Intercorrelations for Predictor Variables for Accuracy for Reverse Order Cases (Error Variance Measure), and Experience

Variables   1       2       3       4       5
1           -
2           .05     -
3           .31     .14     -
4           .07    -.16    -.07     -
5           .52**  -.35*    .36*   -.03     -

Note.
1 = Phase of Healing from MDS of Similarity Judgments
2 = Omega-Squared for Interaction Effects from Orthogonal Judgment Task
3 = Achievement from the Lens Model Task
4 = Experience, in Months
5 = Accuracy (Error Variance Measure)
Correlations were based on 36 subjects.
*p < .05. **p < .01.
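As a rough check on the regression summaries reported above, the adjusted R² values can be recomputed from the reported R² values. This is only a sketch: it assumes the conventional adjustment, 1 - (1 - R²)(n - 1)/(n - k - 1), and assumes n = 36 (the number of subjects on which the correlations were based). The thesis does not state which formula was used, and the function name is illustrative.

```python
def adjusted_r2(r2, n, k):
    """Conventional adjusted R-squared for k predictors and n cases."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Assumed: n = 36 subjects; k = number of predictors that entered the equation.
normal_order = adjusted_r2(0.67, n=36, k=7)   # normal order, percent correct measure
reverse_order = adjusted_r2(0.46, n=36, k=3)  # reverse order, error variance measure

print(round(normal_order, 4), round(reverse_order, 4))
```

Under these assumptions the recomputed values agree with the reported .59 and .41 to within rounding.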
If the percent correct measures of accuracy were used, only one variable entered the equation, giving an R² of .13 (adjusted R² of .10). This variable was Low Estimate Of Healing Time. The intercorrelations between this predictor variable, experience, and percent correct measures of accuracy are presented in Table 4-28(c).

Table 4-28(c)
Intercorrelations for Predictor Variables for Accuracy for Reverse Order Cases (Percent Correct Measure), and Experience

Variables   1       2       3
1           -
2          -.22     -
3          -.35*    .09     -

Note.
1 = Estimate for Fast Healing Time from Orthogonal Judgment Task
2 = Experience, in Months
3 = Accuracy (Percent Correct Measure)
Correlations were based on 36 subjects.
*p < .05.

Summary of Patterns of Relations

This chapter has been a report of the analyses and results related to clinical judgment performance. A number of interesting patterns of relations were identified. In summary, differences in conceptual structure were found to predict accuracy in an intuition-inducing task, but not in an analysis-inducing task. With judgment process, experienced subjects showed greater metacognition, more reflectivity, and a higher level of adaptability, compared to novice subjects. Configural cue-utilization assessed by lens model C-Scores was higher in the reverse order cases, compared to the normal order cases. Configural processing assessed by omega-squared for interaction terms decreased with experience in a task where cues were orthogonalized. Given the level of task unpredictability, subjects performed well in the lens model task: subjects seemed to have good sensitivity to patterns of data related to incisional healing. Subjects' ability to match ecological validities was somewhat better in the reverse order cases, compared to the normal order cases.
The relationship between Σd and the percent correct measure of accuracy was slightly stronger under intuition-inducing conditions. When scores were assessed in terms of absolute accuracy, however, performance would be described as moderately good. Although there was little association between performance and experience in the reverse order cases, these variables were positively related in the normal order cases. A discussion of these results is presented in chapter 5.

V. DISCUSSION

This chapter has five sections. In section A, the results related to each of the tasks were interpreted, and related to the literature. Possible reasons for some of the results have been identified. Findings were organized by the research questions. Section B is a discussion of the findings related to the tasks and methods and their implications with respect to the philosophical tradition with which they are associated. An attempt was made to achieve unification. Some limitations to the study were identified. Section C is a commentary on how the findings add a number of important contributions to the research that has been conducted in clinical judgment and expertise. The implications of the findings for professional education and practice have been presented in section D. The final section includes a summary of the study and some suggestions for future research.

Section A: Discussion of Major Findings

Research Question 1

What are the patterns of relationships among various measures of judgment performance (indicators of expertise) and experience for a group of subjects in a clinical judgment task?

The following indicators of expertise are discussed: judgment accuracy, judgment confidence and calibration, consistency of judgment, judgment latency, and knowledge accessibility.

Judgment accuracy. There was very little difference in the mean accuracy of judgments of healing time with experience (Table 4-04).
This finding replicated previous research (Camerer & Johnson, 1991; Faust et al., 1988; Garb, 1989; Goldberg, 1968). Subjects in experience level 2, at times, performed less well compared to the other two groups. Lesgold et al. (1988), in their research with physicians on the interpretation of x-ray films, investigated the development of expertise in a complex skill. These researchers found that, whereas traditional learning theory would predict performance improving steadily with practice, for certain problems, performance became worse, prior to improving. Lesgold and colleagues found that subjects' performance was not a monotone function of experience, and gave examples where third- and fourth-year residents performed less well than either experts or first- and second-year residents. Lesgold et al. (1988) noted that a similar phenomenon has been reported by developmental psychologists; in a variety of developmental studies (cited in Lesgold, 1984), a skill is observed to be present at one age, is missing somewhat later, and is again present some time after that. The example was given of children's use of irregular past tense verbs and plural nouns. These researchers explained the nonmonotonicity effect they had observed in physicians by comparing it to these developmental irregularities. As practitioners develop increasing skill, the transition to more sophisticated approaches may decrease performance temporarily until the more advanced strategies have been practised sufficiently.

In the present study, the performance of the intermediate group was, at times, anomalous. For example, examination of Tables 4-04, 4-21, and 4-25 revealed that subjects in this middle group had scores that deviated from the trend shown by the other two groups, a pattern that demonstrated "U-shaped" distributions.

Judgment confidence and calibration.
Confidence was conceptualized as metacognitive knowledge: an estimate of the degree of trustworthiness of one's own judgments. A probabilistic task with a high level of uncertainty ought to induce limited confidence (Keren & Wagenaar, 1987). Even though subjects were informed about the probabilistic nature of the task, experienced subjects appeared unable to moderate their confidence ratings accordingly. This finding is consistent with the results of a study conducted by Baumann et al. (1991) with 40 intensive care nurses. This pattern of confidence increasing with experience (Table 4-05) has also been observed in other studies (Fischhoff et al., 1977; Koehler, 1994; Lichtenstein & Fischhoff, 1977; Einhorn & Hogarth, 1978).

An assessment of judgment calibration revealed deviations from perfect calibration (see Tables 4-07 and 4-08). This finding also replicated previous research related to overconfidence in judgment (Lichtenstein et al., 1982; Oskamp, 1962, 1965). Novice subjects were more modest in their confidence ratings, and thus they demonstrated better overall calibration, compared to experienced subjects.

In relation to confidence for correct and incorrect judgments reported separately (Table 4-09), confidence was consistently higher for correctly judged cases, compared to cases incorrectly judged. Based on the differences shown between mean scores for correct and incorrect cases, however, most subjects did not make large distinctions. Confidence for reverse order cases was somewhat lower (and thus more appropriate) compared to that shown for normal order cases. Based on these findings, it seems that analytic conditions induced less overconfidence, compared to intuitive conditions.

Consistency of judgments.
Correlations between repeated judgments of healing time were considerably higher and less variable, compared to the correlations of repeated judgments of confidence (Table 4-07). One factor contributing to the low consistency of confidence judgments was that, for many subjects, the range for confidence judgments was low. Another factor was that the number of cases (five) may have been insufficient to assess consistency, a point made recently by Siegel-Jacobs and Yates (1996). A third factor was that confidence in one's judgment is a subjective state that may lack stability over time. The experience of having made many judgments could have altered subjects' confidence for subsequent judgments.

Judgment latency. There were small differences in mean latency associated with experience in the healing time judgment task (Table 4-10). There was also an effect of sequence: subjects responded more quickly to the set of cases in the healing time judgment task that was presented second. This tendency may have been a "practice" effect, or it may have represented subjects' motivation for not prolonging the research session any longer than necessary. The relationship between latency and experience was more complex than what can be shown by these data: short latencies were observed with subjects who were very capable, as well as with those who made careless errors; long latencies were noticed with subjects who were highly motivated to perform to the best of their ability, as well as with those who deliberated, but were less able to make accurate predictions, compared to other subjects.

Knowledge accessibility. The Knowledge Accessibility scale is available in Appendix D. The means increased only slightly with experience, indicating that experience accounted for only a small proportion of variance in scores (Table 4-11).
The small mean differences may be due to the fact that for expert clinicians, knowledge is encoded in dynamic patterns which are difficult to articulate. Revealing them may require the instantiation of an authentic context, and close physical interaction with the phenomenon of interest. Several experienced subjects stated that they wished they could directly evaluate an area of the incision using forceps or a probe: they wanted "hands-on" assessment.

The findings related to knowledge accessibility were important not so much because of the quantitative measures of particular knowledge from each sub-scale, but rather because this task revealed qualitative aspects of the translation of perceptual knowledge into language. Differences in knowledge accessibility may be difficult to detect by an examination of the dialogue, in part because some knowledge may be so deeply internalized that it is not expressed overtly. This conception, consistent with Reber's (1989) and Dreyfus and Dreyfus' (1986) views about the use of intuition, constitutes a possible explanation for the findings.

There is much literature on knowledge access. Brown (1982) discussed access of mental processing, using Rozin's (1976) theory: In the course of evolution, cognitive programs have become more accessible and, therefore, can be used flexibly in a variety of situations. Brown viewed conscious control as being the highest level. Yet, with experts, there often is no conscious awareness of retrieval processes. It is possible that tacit knowledge of implicitly learned patterns could be regarded as higher than consciously-controlled knowledge.

Reber (1989) discussed the commonly assumed epistemic priority of consciousness. He argued:

[K]nowledge acquired from implicit learning procedures is knowledge that . . . is always ahead of the capability of its possessor to explicate it. . . .
The implicitly acquired epistemic contents of mind are always richer and more sophisticated than what can be explicated (Reber, 1989, p. 229).

In order to address the issue of which type of knowledge is higher, the purpose must be considered. If the goal is teaching, then having consciously-controlled knowledge ought to be regarded as the highest order. If the goal is clinical practice, however, implicit knowing-in-action, as Schön (1983) described, may be most useful.

Relationships among indicators of expertise. When accuracy was assessed by the percent correct measure in the normal order cases, the correlation between accuracy and experience was .32. In the reverse order cases the comparable correlation was .09 [Tables 4-12(a) and 4-12(b)]. It is possible that task factors may have influenced the strength of the relationship between experience and performance. The normal order cases could be performed through the use of either an intuitive or analytic strategy; in contrast, the structure of the reverse order cases would make intuitive processing very difficult, if not impossible. Dreyfus and Dreyfus (1986) claimed that as expertise is acquired, the use of intuition increases. The difference in the experience-performance relationship between normal order and reverse order cases constituted support for Dreyfus and Dreyfus' ideas about the use of intuition by experienced practitioners. It is also possible, however, that the particular cases selected for normal order and reverse order presentation evoked the observed differences.

The response to research question 1 can be summarized as follows: the findings replicated what has been found in the literature: little relationship was observed between performance accuracy and experience.
When accuracy was assessed using an intuition-inducing task, however, experience accounted for a greater proportion of variance in performance compared to assessment under analysis-inducing task conditions. Judgment consistency and confidence tended to increase with experience; judgment calibration and latency tended to decrease.

Research Question 2

What are the patterns of relationships among measures of conceptual structure, sensitivity to patterns in data, judgment process, and performance in a clinical judgment task?

Conceptual structure. Two tasks were used to obtain data about conceptual structure: the association task, and the similarity judgments task.

The association task provided protocol data to reveal thinking related to clinical judgment. The Conceptual Coherence scale is available in Appendix E. The anticipated pattern of changes with experience was demonstrated (Table 4-13), particularly with two of the sub-scales (concept familiarity and conceptual structure). For some subjects, the predicted differences in the structure of verbalizations (for example, expression of clinical knowledge in the form of a story) were found. Such structural differences constituted evidence for more sophisticated conceptual coherence.

The similarity judgment task demonstrated small differences in subjects' conceptual structure. The multidimensional scaling of similarity judgments revealed some patterns of relations (Table 4-14) in the proximities of concept clusters with experience.

Sensitivity to overall patterns in data. The pattern of medians for R² and G-Scores (Table 4-15) showed that experience had a stronger relationship under analysis-inducing conditions, compared to intuition-inducing conditions. If there were a significant number of subjects with high levels of expertise, then performance in the normal order cases should have revealed much differentiation.
One explanation for this finding is that there were few true experts (as described by Dreyfus & Dreyfus) in this sample of subjects. Considering the experience differences for each group of subjects, there was a remarkable consistency in the means of the R² for normal order cases (Table 4-16).

Hammond et al. (1964) proposed that the matching index, Σd, had promise as an important component of clinical inference. In this study there was good variation in this index of the degree to which subjects matched ecological validities with utilization coefficients (from .79 to .08 for normal order cases and .66 to .07 for reverse order cases). Hammond and colleagues recommended that Σd is best interpreted within the context of other lens model parameters. This measure was strongly related to achievement (r = -.77 for normal order cases, and r = -.84 in reverse order cases). Subjects who had high achievement tended to have low (more ideal) scores on Σd. Σd was also related to C-scores (r = -.35 in the normal order cases and r = -.37 in the reverse order cases), indicating that subjects who used nonlinearity effectively also tended to have good Σd scores.

Σd scores were found to be multicollinear (within rounding error) with the lens model G-scores. In this data set, G-scores were calculated by correlating the model-predicted healing times with predictions based on each subject's unique regression policy on the basis of the same cases. The calculation of Σd involved the same two sources of data. Thus, both the Σd scores and the G-scores were measures of the same linearity; the difference between the two measures was that G represented the degree of match, whereas Σd represented the degree of mismatch.

Capturing subjects' policies by means of a regression equation demonstrated that different policies gave rise to similar outcomes (Appendix H).
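The lens model indices discussed above can be illustrated with a short sketch. It is not the analysis code used in the study: the data are synthetic, the function and variable names are hypothetical, and Σd is assumed here to be the sum of absolute differences between each cue's ecological validity (cue-criterion correlation) and its utilization coefficient (cue-judgment correlation). G is computed as described in the text, by correlating the predictions of the ecology model with the predictions of the judge's own regression policy over the same cases.

```python
import numpy as np

def _r(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

def lens_model(cues, criterion, judgments):
    """Lens model indices for one judge (illustrative sketch only)."""
    cues = np.asarray(cues, dtype=float)
    y_e = np.asarray(criterion, dtype=float)   # ecology: actual outcomes
    y_s = np.asarray(judgments, dtype=float)   # subject: judged outcomes

    # Least-squares linear models of the ecology and of the judge.
    X = np.column_stack([np.ones(len(cues)), cues])
    pred_e = X @ np.linalg.lstsq(X, y_e, rcond=None)[0]
    pred_s = X @ np.linalg.lstsq(X, y_s, rcond=None)[0]

    validities = np.array([_r(c, y_e) for c in cues.T])     # cue-criterion r
    utilizations = np.array([_r(c, y_s) for c in cues.T])   # cue-judgment r

    return {
        "achievement": _r(y_e, y_s),              # judgment vs. criterion
        "G": _r(pred_e, pred_s),                  # match of the linear models
        "R_e": _r(y_e, pred_e),                   # linear predictability of ecology
        "R_s": _r(y_s, pred_s),                   # linear consistency of the judge
        "C": _r(y_e - pred_e, y_s - pred_s),      # correlation of residuals
        "sum_d": float(np.sum(np.abs(validities - utilizations))),  # assumed form
    }

# Synthetic data only: 35 cases, 3 cues, made-up weights and noise.
rng = np.random.default_rng(0)
cues = rng.normal(size=(35, 3))
criterion = cues @ np.array([0.6, 0.3, 0.1]) + rng.normal(scale=0.5, size=35)
judgments = cues @ np.array([0.2, 0.5, 0.3]) + rng.normal(scale=0.5, size=35)
demo = lens_model(cues, criterion, judgments)
```

With both linear models fitted by least squares on the same cases, these quantities satisfy the lens model equation, achievement = G·R_e·R_s + C·√(1 − R_e²)·√(1 − R_s²), which offers one way to sanity-check such a computation.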
Brehmer (1994) claimed that in making judgments, subjects' particular weighting of cues does not matter much. The large inter-individual differences found in subjects' policies in this study constituted supporting evidence for Brehmer's claim. Brehmer attributed policy variation to the fact that conditions under which subjects acquired their policies were different. Most of the available cues were correlated, and the judges were not focusing on the weights.

A comparison of the regression weights from subjects' policies (Appendix H) with the regression weights from the left side of the lens model (Table 4-02) demonstrated an interesting difference. In the ecology, when 258 cases were analyzed, Severity of Surgery was the most significant predictor for healing time; in the subject systems, when two sets of 35 cases were analyzed, only one subject perceived Severity of Surgery as the most relevant cue. The interpretation here is not that most subjects failed to detect the most important cue. Rather, because each set of lens model judgments contained 35 cases for analysis, subjects partitioned the cue Severity of Surgery into the components with which it was correlated: length of time for surgery, blood transfusions needed, complications, size of incision, and so on. The subjects possibly made qualitative distinctions among cases that related more to these other constituent cues available rather than to a classification of cases in terms of minor or major surgery.

The particular cues which emerge as important depend not only on the subject's weighting policy and the intercorrelations among the cues, but also upon the ways the subjects conceptualize the cues, and the numbers of cases being analyzed. Severity of Surgery had status as the most important cue when the analysis included 258 cases, and when this variable was constrained to be dichotomous (minor or major surgery).
Such simplification of a continuous variable into two levels reduces the ambiguity in the ecology. It appears that the subjects in the study did not apply this simplifying strategy.

The accuracy of subjective policies of novice subjects on cross-validation was similar to that of experienced subjects, using the error variance measure of accuracy (Table 4-17). The success of cross-validation was somewhat better for the inexperienced subjects using the percent correct measure of accuracy. It may be that looking for differences in the weighting of cues is taking too microscopic a view, as Brehmer (1994) claimed. It is possible that these novice subjects were knowledgeable and highly motivated, and applied their knowledge to the research task in a manner that was more consistent compared to experienced subjects.

One point about lens model achievement that warrants caution is that at the same time lens model achievement can be high, accuracy in an absolute sense can be low. For example, one subject in this study had good lens model achievement scores (.73 and .72 for normal order and reverse order cases, respectively), but had low accuracy scores, assessed by percent correct (28% and 18%). With this subject, high achievement scores were accomplished by providing responses that were in the appropriate direction, but were extreme; a large variance resulted, which was translated into good lens model achievement. When assessed for accuracy, however, few of the predictions were within the interval measure of healing time. This subject appeared to have a good sense of the overall patterns in the ecology, but had poorly calibrated knowledge of typical healing times. Other subjects had low correlational measures, but good accuracy measures; these subjects' judgments were closer to the base rate and showed little variation in response to cases with large differences in cue-values.
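The contrast described above, between a correlational index such as achievement and an absolute index such as percent correct, can be made concrete with a small sketch. All numbers are hypothetical and are not data from the study; in particular, the tolerance of plus or minus 2 days merely stands in for the interval measure of healing time, whose actual width is not given in this section.

```python
import numpy as np

def percent_correct(judged, actual, tolerance):
    """Percent of judgments within +/- tolerance of the actual value."""
    judged = np.asarray(judged, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return 100.0 * float(np.mean(np.abs(judged - actual) <= tolerance))

# Hypothetical healing times in days (not study data).
actual = np.array([17, 18, 19, 20, 21, 22, 23], dtype=float)

# Judge A: responses in the appropriate direction, but four times too extreme.
extreme = 20 + 4 * (actual - 20)
# Judge B: responses stay close to a base rate of about 20 days.
base_rate = np.array([20, 20, 20, 21, 20, 20, 20], dtype=float)

r_extreme = float(np.corrcoef(actual, extreme)[0, 1])   # correlational index
r_base = float(np.corrcoef(actual, base_rate)[0, 1])
pc_extreme = percent_correct(extreme, actual, tolerance=2)  # absolute index
pc_base = percent_correct(base_rate, actual, tolerance=2)

print(r_extreme, pc_extreme, r_base, pc_base)
```

In this constructed case the extreme judge tracks the criterion almost perfectly in the correlational sense yet lands within the tolerance for only one case in seven, whereas the base-rate judge shows essentially no correlation but is within tolerance for five cases in seven.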
What seems surprising is the small variation in lens model achievement across experience groups. One possible reason why novice subjects' performance on the healing time judgment task was as good as it was may be because, in the absence of experience, it is reasonable to propose that novices used a theory-based strategy; the cues employed in this study were all familiar in theory. In contrast, the experienced subjects likely used a strategy based on the patients they had encountered during their careers. Several researchers have characterized expertise in physicians as "case-based", rather than generalized (Elstein, Shulman, & Sprafka, 1978; Feltovich & Barrows, 1984). Many of the nurses in this study seemed to have knowledge that could be characterized as case-specific. Scores may have been reduced for some experienced subjects because the match between the variety of surgical cases used in this study and the types of patients they have cared for may not have been optimal.

Judgment process. This construct was assessed by the information board task and by the orthogonal judgment task.

Data from the information board task demonstrated a small effect of experience with the Depth of Search and Number of Shifts. There was a small increase in the number of accurate subjects with experience (Table 4-18). Subjects in the middle group had a positive search pattern, which indicated that on average, they used an interdimensional search pattern more frequently, compared to other subjects. The experienced subjects' search pattern was more often intradimensional.

The verbal protocols were examined to determine which cues were most emphasized. There seemed to be no relationship between the weight a cue was given in the protocol data and the weight derived from regression.
For example, one subject emphasized on four occasions the importance of old age as a factor in healing (with reference to the 80 year old patient), yet Age did not enter in this subject's regression policy.

The Judgment Process scale is presented in Appendix I. The examples of metacognition, input generation, and cognitive operations were illuminating in terms of how nurses think as they make clinical judgments (Table 4-19). Nine of the protocols demonstrated reflectivity in relation to salient data; this type of verbalization was sometimes characterized by a particular affective quality that could be detected in the tone of voice from the audio-tapes. Examples of the types of reflective responses included surprise when expectations were not met, curiosity when subjects had uncovered partial data, perplexity when information seemed incongruous, satisfaction when expectations were fulfilled, or apprehension when cues of a particularly serious nature were revealed. Protocols from other subjects included the verbal processing of the cues, but no such reflection or affectivity could be detected. The following excerpts are examples from the protocols:

S. 16: "I want to see if the diagnosis for patient A is ulcerative colitis. [Turns over card.] Just as I thought!" [Satisfaction].

S. 21: "I want to know the medical history. Oh! My goodness! I would have to watch him." [Apprehension].

S. 28: "This one has swollen thighs. At this point I can't see the connection between the thighs and the incision." [Perplexity].

This reflectivity and affective responsiveness may simply indicate individual differences in communication style, or perhaps differential ability to carry out the thinking-aloud component of the task. These features, however, could represent real differences in information processing.
The reflectivity may be generating an internal response which the subject is using as a "secondary cue", as described by K. R. Hammond (1990). The reflective and responsive subjects tended to demonstrate motivation for the tasks, and good judgment performance. Experienced subjects provided all but one of these protocols.

A similar example of responsiveness to internally-generated [secondary] cues is presented in Abernathy and Hamm (1995), where a resident physician paid attention to his own reactions to a patient with internal injuries, including a pelvic fracture, using his own responses as a gauge of the seriousness of the case and as a basis for clinical judgments. The notion of secondary cues is not new; in fact, over 35 years ago, Sarbin et al. (1960) claimed that an important source of information in making clinical judgments was "the inferrer's own viscero-somatic reactions to other inputs" (p. 176). It appears that as technology becomes more sophisticated, however, much legitimacy is given to cues such as test results and laboratory data, and little value is placed on the experienced clinicians' internal cues which may have proven validity. Unless these internal cues are understood and valued, clinicians may not develop and properly calibrate their internal sensors, and fail to reach their potential in terms of implementing the art of clinical judgment. Abernathy and Hamm (1995) stated that "intuitive thinking has atrophied as medicine and surgery have become more oriented towards numbers in laboratory data, numbers in critical care data, and the use of radiology reports as 'hard data'" (p. 4).

There has been considerable research into the manner by which subjects integrate cues into judgments. Based on evidence from the protocol data, subjects in this study seemed to use a compensatory function; they examined a certain number of cues, and then integrated them (seemingly) in an additive fashion.
There were instances from the protocols where subjects processed two cues interactively, but no examples where subjects used extremely complex configural integration strategies. Whether this failure to apply such integration was related to subjects' information-processing capacities or constraints, or whether it ought to be attributed to the nature of the task, was not possible to determine.

The information board task was too easy to function as a test of the form of cue-integration employed by subjects. The four alternatives used in this task did not sufficiently challenge working memory, nor was there any time pressure. Unlike many information boards from the literature where preferences were elicited (for example, a choice of apartments among a set of possibilities), the board in the present research required judgments based on domain-specific knowledge [rank ordering of healing times]. With preferences, there is no normatively correct response, and thus subjects likely would feel no constraint in terms of the data they seek and the strategies they apply. In contrast, in this task, judgments were required. Subjects knew there was a normatively correct response and, as nurses, they may have felt accountable for applying appropriate and defensible processes because their judgments were being scrutinized (Siegel-Jacobs & Yates, 1996). Such differences between preferences and judgments may help to account for the lack of clear patterns for data search and cue-integration strategies with experience.

Payne et al. (1992) described a constructive process view of judgment. These researchers claimed that in making judgments, individuals often develop strategies opportunistically; they alter their processing at the time, depending on the information they encounter and the contingencies inherent in the task. Thus, strategies were not merely revealed through a task, but were constructed on-line. Payne et al.
(1993a, 1993b) described decision makers who flexibly applied both top-down and bottom-up strategies effectively. In the present study, the verbal protocols constituted evidence that the subjects varied in their constructiveness and adaptiveness. Most subjects were strategic in their information processing.

Discrimination of healing times and configural cue-use were assessed by the orthogonal judgment task (Table 4-20). This task required making judgments about healing times for hypothetical patients. Most of the nurses had worked in hospital settings, where they did not see patients with extremely delayed wound healing. Such patients are frequently discharged and cared for by community nurses. Although all the subjects were informed that the total time for healing was required (not just the time while the patient was in hospital), the experienced nurses apparently had difficulty in calibrating their judgments to incorporate this data, as shown in Table 4-20. On average, experienced subjects had less accurate estimates related to healing times, compared to novice subjects. Yet, such estimates very likely reflected their experience, from their hospital setting perspective.

The orthogonal judgment task required analytic thinking for subjects to consider the overall effects of the cues (main effects) as well as the joint impact of cues (interactive effects). The percentages of variance accounted for by main effects and interaction effects were assessed. Omega-squared for interactions was used to assess configural processing. The omega-squared for main effects and interaction effects tended to decrease with experience level, as shown in Table 4-20. This finding raises a number of questions. Does configural processing actually decrease with experience? Or, is the observed decrease in omega-squared a reflection of the experienced subjects' preference for intuitive task conditions?
If, as Dreyfus and Dreyfus (1986) claimed, experience fosters the intuitive processing of information, then experienced subjects' performance may have been reduced by the lack of congruence between the task demands and the subjects' preferred means of processing. K. R. Hammond et al. (1987) described how such incongruence can significantly influence performance.

The means and standard deviations for various measures of configural cue-use showed no increasing pattern with experience (Table 4-21). Variability of information search, and C-Scores in the reverse order cases demonstrated the opposite pattern: the scores decreased with experience level. Correlations among various indices of cue-use tended to be low and did not show the relationship with expertise predicted by Dreyfus and Dreyfus' (1986) theoretical position [Tables 4-22(a) and 4-22(b)]. It seems interesting that there is considerable anecdotal evidence of configural cue-use, an extensive body of judgment literature describing configurality in judgment, and yet when attempts are made to carry out empirical research, it is extremely elusive. It may be that the methods of measurement used to assess configurality interfere with the configural use of cues. An alternative approach to the study of configural processing is needed; verbal protocols may have potential in this regard.

Research question 2 can be summarized as follows: The patterns of relations between cognitive measures and judgment accuracy which are notable in the normal order cases are those between Z-d and performance and between lens model achievement and performance. In the reverse order cases, the patterns which are strongest are those between scores on the Judgment Process scale and performance, and the lens model achievement scores and performance (Table 4-24).
Research Question 3

What are the patterns of relationships among individual differences in age, education, and experience and performance in a clinical judgment task?

Age, education, and measures of experience. The sample size was too small to fully address the questions about age and education (Table 4-03). Three main measures of experience were the number of months of surgical experience, the number of incisions viewed, and the protocol measure that reflected experience. The correlation between experience in months and number of incisions was substantial, but the protocol measure was unrelated [Tables 4-23(a) and 4-23(b)].

Research Questions 4 and 5

To what extent does cue-presentation condition (context cues followed by individuating cues, or the reverse), reveal patterns of relationships in performance in a clinical judgment task?

To what extent does memory-priming condition (exposure to relevant domain-specific visual stimuli, versus no exposure), reveal patterns of relationships in performance in a clinical judgment task?

Impact of paragraph order condition. The patterns of findings regarding paragraph order were interesting. Small differences in performance in the two sets of paragraphs were identified, using the error variance measure of accuracy (Table 4-25). The correlational pattern in the normal and reverse order cases was different [Tables 4-12(a) and 4-12(b)]. The median lens model achievement scores were lower in the normal order cases compared to comparable medians in the reverse order cases, as shown in Table 4-15. Curiously, the medians for R² were higher in the normal order cases compared to those in the reverse order cases (except for experience level 3 where scores were essentially the same). This pattern was not anticipated. It may be that lens model achievement was lower in the normal order cases because these scores are correlations with a real world criterion.
Performance in an intuition-inducing task was not as accurate, compared to performance in an analysis-inducing task. On the other hand, measures of R² have no connection to the real world; these scores tap internal consistency within the task, which was higher under intuition-inducing conditions. It is also possible that it was the specific content of each set of cases (rather than the manipulation of paragraph order) which elicited the small differences in responses.

The fact that C-Scores were larger under analytic conditions (see Table 4-21) provided some supporting evidence that the reversal of paragraph order was successful in altering subjects' information processing. This difference suggested that subjects were better at non-linear variance matching under analytic conditions. Such a finding is consistent with what K. R. Hammond et al. (1987) described in their study of expert engineers making judgments about road safety. These authors argued that when a task is intuition-inducing, a linear model can often account for the data, whereas, under analysis-inducing task situations, subjects can better detect and use nonlinearity. This interpretation by Hammond is different from the theoretical view proposed by Dreyfus and Dreyfus (1986). These latter researchers argued that individuals with high levels of expertise could detect and use nonlinearity best in an intuition-inducing task, providing the task was relevant and familiar. The source of this difference in prediction based on theory appears to be, at least in part, a difference in the definition of expert. Dreyfus and Dreyfus applied performance criteria, whereas Hammond used experience and credentials. K. R. Hammond et al.
(1987) also claimed that confidence in the accuracy of responses would be higher under intuition-inducing task conditions; an examination of Table 4-09 showed that the means for average confidence illustrated such a pattern. Overall, confidence increased with experience, and was somewhat higher when assessed by an intuition-inducing task, compared to an analysis-inducing task.

Impact of memory-priming condition. When accuracy was assessed by the percent correct measure, there was a significant slide-experience interaction, as shown in Table 4-26. It can be seen in Table 4-27 that under enhanced memory-priming conditions, subjects' performance tended to increase with experience as predicted; under baseline memory-priming, however, the subjects who did not view the slides unexpectedly achieved very good performance. If the slides had functioned as a memory prime, novice subjects were predicted to show little difference in performance with priming. However, this was not the case. One [post hoc] interpretation for this anomalous result is that the slides may have been educative for inexperienced subjects. The slide set was not completely representative of abdominal incisions; there were more slides of abnormal incisions compared to normal ones. It could be that viewing such slides caused greater inaccuracies for the novice subjects. The experienced subjects may have been better able to withstand such an influence. This explanation, however, does not account for the high scores for novices who did not receive enhanced memory-priming.

There was a significant effect of priming condition on judgment performance. As Birnbaum (1982) described in relation to internal context, subjects have unique backgrounds in the form of memories of past and recent experiences that can (and presumably do) interact with particular cues, likely in ways unanticipated by the experimenter.
Newell (1968) expressed long ago that "humans add to the information available in the situation in order to make their judgments, even when they have no internal information relevant to the situation" (p. 7). The point here is not so much that memory influences information processing (as that is already known), but to alert judgment researchers that it is not always possible to have control over all the cues utilized in a judgment task. When Brehmer (1994) stated that "linear modelling requires that a subject be provided with all cues for all cases" (p. 150) [emphasis added], he may have been referring to all external cues, and was likely not precluding the possibility of internal cues being influential. Care needs to be taken to realize that any salient experiences immediately prior to a lens model type task, as well as memories evoked by the cues, have potential to influence judgment performance in the laboratory as well as in the clinical setting.

Research questions 4 and 5 are summarized as follows: Differences in the observed patterns of scores were associated with paragraph order condition. In general, these patterns of scores provided modest support for K. R. Hammond's theory regarding a cognitive continuum. There was a significant slide-experience interaction in which experienced subjects demonstrated increased performance under enhanced memory-priming conditions, but the pattern of results in the baseline memory-priming condition was not interpretable.

Research Question 6

Of all the measures included in the study, which measures are most predictive of clinical judgment performance?

Predictors of clinical judgment performance. The configurations resulting from multidimensionally scaled similarity judgments constituted a positive finding. Four of these measures demonstrated significance in predicting judgment accuracy [Tables 4-28(a), 4-28(b), and 4-28(c)].
To the extent that the plots reflected the organization of cognitive structure, this finding demonstrated that cognitive structure predicted accuracy in judgment performance. This idea was proposed more than 35 years ago by Sarbin et al. (1960) in their discussion of the sources of variation in clinicians: "Variations in inference . . . are primarily due to individual differences in the dimensions in the cognitive organization of the inferrer" (p. 224). The similarity judgment task was an intuition-inducing task; subjects rated the similarity of pairs of concepts very quickly, without analysis. It may not be coincidental that these measures predicted accuracy only in the normal order cases where an intuitive strategy was likely used.

In the normal order cases, experience in months had a stronger relationship with judgment accuracy (when accuracy was assessed by the percent correct measure) compared to the comparable relationship in the reverse order cases. The question that initiated this study was whether experience was related to judgment accuracy. The conclusion on the basis of this judgment task is that there is a weak relationship between experience and judgment accuracy that was revealed by an intuition-inducing task but was not revealed by an analysis-inducing task. Experience accounted for only a small proportion of variance in performance. Experience did not predict judgment accuracy in this task.

Section B: Discussion Related to Methods and Analysis

Findings Related to Methods

Assessment of confidence and calibration. Subjects demonstrated significantly better calibration from Gigerenzer and colleagues' retrospective method for assessing confidence, compared to the traditional method as shown in Table 4-05. There were no differences, however, for this group of subjects between the traditional method and Bjorkman's (1994) method for determining calibration (Table 4-07).
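The contrast between the two ways of eliciting confidence can be illustrated with a minimal sketch. It assumes that the traditional method averages case-by-case confidence ratings, and that the retrospective method asks the judge, after the whole set, how many cases were judged correctly; in both versions a positive score indicates overconfidence. The function names and the data are hypothetical, and the scoring rules actually applied in this study are those described in the methods chapter, not this sketch.

```python
def item_overconfidence(confidences, correct):
    """Traditional index: mean case-by-case confidence minus proportion correct."""
    mean_confidence = sum(confidences) / len(confidences)
    proportion_correct = sum(correct) / len(correct)
    return mean_confidence - proportion_correct


def retrospective_overconfidence(estimated_number_correct, correct):
    """Frequency index: estimated minus actual number correct, as a proportion."""
    return (estimated_number_correct - sum(correct)) / len(correct)


# Hypothetical judge: five cases, confidence stated as a probability per case.
confidences = [0.9, 0.8, 0.7, 0.6, 0.5]
correct = [1, 0, 1, 0, 0]  # 1 = judgment scored as correct

traditional = item_overconfidence(confidences, correct)
retrospective = retrospective_overconfidence(3, correct)  # judge guesses "3 of 5"

print(round(traditional, 2), round(retrospective, 2))
```

In this invented example the same judge appears more overconfident under the case-by-case index than under the retrospective frequency estimate, which is the direction of the difference reported above.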
The methods used to determine calibration differed significantly in the extent of overconfidence they indicated. Scores reflecting the use of the two confidence measures to calculate judgment quality can be seen in Table 4-08. Subjects were most accurate in estimating the number of correctly judged cases when the retrospective method was used in the reverse order cases. Inspection of the means for judgment calibration revealed that, for both methods, subjects in experience level 1 had somewhat better calibration scores; this finding was attributed to these subjects' more modest confidence, compared to experienced subjects. Whether or not the representative selection of cases according to a natural reference class, as recommended by Bjorkman (1994) and Justlin (1994), had any beneficial effect on calibration could not be determined. The healing time judgment task did have representative sampling of cases, but it also had a high difficulty level; most subjects did not seem to consider this latter feature.

Correlational and absolute measures of accuracy. It is important that the differences between accuracy assessed by absolute measures (error variance and percent correct) and correlational measures derived from the lens model (achievement and variance explained) be understood. The subjects made judgments that were analyzed in two different ways. When absolute measures were examined, the judgments ranged in accuracy from 77 to 94 using the average error variance measure, and 10% to 63% when using the percent correct measure. When these same judgments were assessed by the lens model approach, however, 13 out of 36 subjects attained R² measures as high as, or higher than, the regression model in the normal order cases. In the reverse order cases 12 out of the 36 subjects attained R² measures as high as, or higher than, the model.
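Why the two families of measures can disagree is easiest to see in a small constructed case. The sketch below assumes a hypothetical judge who orders the cases perfectly but underestimates every healing time by the same amount. Mean squared error stands in for the error variance measure, and a tolerance of two days stands in for the percent correct scoring rule; both are assumptions made for illustration, as are all the numbers.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation, used here as a stand-in for lens model achievement."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_squared_error(judgments, criterion):
    """Absolute measure: average squared distance from the criterion."""
    return sum((j - c) ** 2 for j, c in zip(judgments, criterion)) / len(criterion)


def percent_correct(judgments, criterion, tolerance):
    """Absolute measure: share of judgments within `tolerance` of the criterion."""
    hits = sum(abs(j - c) <= tolerance for j, c in zip(judgments, criterion))
    return 100.0 * hits / len(criterion)


# Hypothetical healing times in days, and a judge who is always 6 days too low.
criterion = [10, 14, 18, 22, 26]
judgments = [c - 6 for c in criterion]

achievement = pearson_r(judgments, criterion)
mse = mean_squared_error(judgments, criterion)
pct = percent_correct(judgments, criterion, tolerance=2)

print(round(achievement, 3), mse, pct)
```

The correlational index is at its maximum because the judge tracks the criterion exactly, while both absolute indices show poor accuracy because every estimate is displaced. A constant displacement of this kind is invisible to a correlation.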
In each set of cases, 5 out of the 12 subjects from experience level 1 had scores on R² that met or exceeded the lens model R². Lens model achievement scores, particularly for novices, were impressive. These high R² scores indicated that the subjects' cognitive control was higher than the predictability of the ecology. Compared to an external standard, performance appears fair to moderate, but when compared to a regression model, performance seems more favorable. This difference arises because the lens model yields correlational measures that contain no element of calibration.

Value of a multi-method approach. Green (1968) stated:

"Whether one accepts the first approximations as good descriptions of reality or as fictions contributed by the method of analysis depends partly on one's purpose. If the goal is prediction in some practical situation, an adequate description will serve. But if the goal is to understand... then we must beware of analyses that mask complexities" (p. 98). [Emphasis added].

Any one set of methods and analyses highlights certain aspects and obscures other aspects. In this study, the practical goal was to assess a number of cognitive variables and experience to determine which, if any, would predict accuracy in clinical judgment. Prediction, of course, would not mean causation, only an association; nevertheless, an associative link could be useful as a basis for further research. The most important goal of this study, however, was to understand judgment at a deep level: to comprehend how a person seeks, weights, and integrates data into a unitary response. This goal involved understanding of both judgment process and outcome, and required the collection and analysis of both quantitative and qualitative data. With such a comprehensive goal, a multi-method approach was necessary.

Using the lens model methodology took considerable time and effort.
A major benefit was being able to compare the same subjects' lens model scores with scores derived from other methods, as well as with scores from other lens model studies. Another benefit was obtaining healing times from authentic patients, which allowed for assessing the accuracy of the subjects' judgments.

The analysis of the verbal protocols also took a great deal of time. The value of the protocols, particularly in understanding how these subjects integrated the various cues, goes beyond what can be detected from mean scores, or other numeric indicators. Their value is that they help to illuminate the processes by which subjects seek particular data in a probabilistic context, and construct meaning from a pattern of cues, with the goal to make clinical judgments. Protocol data available in published studies (Abernathy & Hamm, 1994; Kassirer, Kuipers, & Gorry, 1982; Moskowitz et al., 1988) were observed to be similar to the data generated by subjects in this study.

Recently, Cooksey (1996) pointed out the substantial impact of measurement in judgment research:

[A] very important implication for judgment research . . . parallels Werner Heisenberg's notion in physics that uncertainty exists at a fundamental level and that the act of observation changes what is observed. As in quantum physics, where it is not possible . . . to know simultaneously where a particle is in space and how fast that particle is moving (precisely measuring one characteristic alters the second characteristic), one cannot be simultaneously certain, in judgment and decision making research, of the what, where, how, and why of observed judgment behavior. In judgment research, one tends to lose sight of the fact that methodological procedures may alter the very phenomenon being observed (Cooksey, p. 331).
One example of the difficulty in measurement that may be attributed to the influence of method may be the assessment of configurality, which has been discussed previously. In chapter 4, the two measures of accuracy (error variance and percent correct) were found to be tapping different aspects of expertise. This difference illustrates how the researchers' choices of methods and measures have a significant influence on the conclusions made. For example, if only the error variance measure had been used, the conclusions for this study would have been that the variance in the normal order cases was not explainable, paragraph order was significantly associated with accuracy of performance, and there was no slide-experience interaction. In contrast, if only the percent correct measure had been used, the conclusion would have been that the variance in the normal order cases was much more explainable, compared to that in the reverse order cases, the paragraph order condition was not related to performance accuracy, and there was a significant slide-experience interaction.

These contradictory results could be viewed as evidence that the methods were inadequate; another interpretation is that because all methods and measures have some effect on results, the fact that these different methods and measures generated inconsistent results provides evidence for the need for higher-order research that will help to account for such variation. Conflicting results have potential to act as an incentive for further study. A major value in using a multi-method approach is that by revealing the extent to which findings vary with method, researchers are in a better position to understand the many aspects of a question or problem, and to avoid unwarranted generalizations.

Findings Related to Research Traditions

Implications of data analysis methods.
In this study, the decision was made to use less traditional approaches to the analysis of data rather than implement strict significance testing methods. The implications of this decision were far-reaching. If a criterion of statistical significance had been adopted and maintained, there would be no effects to report, nothing to discuss, and no opportunity to construct and comment upon possible connections between the findings in the present study and results of past research. In fact, following guidelines based on power analysis would mean this study would not have been conducted. In contrast, paying attention to effect sizes (whether they are statistically significant or not), as Carver (1993) recommended, allowed the reporting of small differences. Patterns were reported not with the aim to make a strong knowledge claim, but to reflect on their possible meaning within the context of a theoretical conception, and to motivate others to discover and report interesting associated regularities so that growth in knowledge may occur.

Breaking with tradition is difficult. Schmidt (1996) claimed that "we have continued to emphasize significance testing in the training of graduate students despite clear demonstrations of the deficiencies of this approach to data analysis" (p. 116). Traditionally, statistical testing has been the predominant approach for data analysis in research in psychology. Shaver (1993) stated that a perusal of research journals, educational and psychological statistics textbooks, and doctoral dissertations will confirm that tests of statistical significance continue to dominate the interpretation of quantitative data in social science research. Yet, criticism of statistical significance testing has been reported for more than twenty years (Carver, 1978; Spielman, 1974).
Huberty (1993) examined this issue from a historical perspective, and illustrated how entrenched a method can become. Thompson (1989) noted that paradigms, including the accepted ways of thinking about research, come to be taken for granted as natural thought, and they carry normative implications about what is appropriate thinking. Thompson characterized paradigms as highly resistant to change. This study has demonstrated that although change is difficult to bring about, small changes are not impossible.

Integration of research traditions. Christensen and Elstein (1991) compared and contrasted two of the main research traditions where expert judgment has been investigated: information processing and judgment and decision making (JDM). Researchers from both traditions have attempted to capture the cognitive activity of experts as they make judgments under uncertainty, and they both evaluate judgment expertise. Several critical differences have been noted: whereas researchers from the information processing approach tend to focus on information search, and use verbal expressions of rules, researchers from the JDM tradition focus on the integration of weighted information. In the information processing paradigm, the expert is given ample time to analyze and use a deliberative cognitive strategy; in the JDM approach, the expert is given a set of tasks to perform intuitively without time for analysis. The most consequential difference between these two views is that the information processing researchers have demonstrated the superiority of expert judgment, whereas the JDM researchers have found expert judgment to be poor, usually no better than a simple linear model, and often not this accurate.

K. R. Hammond (1987, 1990) proposed that the two groups examined expert judgment from different perspectives; they each present a version of "truth" from a different point of view.
Adherents from the information processing perspective use a theory that bases truth on coherence relations of similarity among features. JDM researchers use a theory in which truth is based on correspondence with some external standard, such as functional relations between proximal and distal variables. K. R. Hammond (1987) argued that it is important to understand both views, and see the place of each in relation to the other. In seeking to construct the complementarity of these disparate perspectives, Hammond (1990) argued for a unified approach. Hammond is not alone in suggesting integration: Einhorn et al. (1979), as well as Billings and Marcus (1983), have applied both process tracing and policy capturing methodologies, illustrating advantages of a multi-method approach. Furthermore, Hammond believed that each approach used separately would provide only partial knowledge: "Unification might well provide new and important information about cognitive activity that would not have been produced by either approach alone" (Hammond, 1987, p. 14).

Research findings depend on the research tradition from which they arise. What research questions are considered important to investigate, what ontology is used to conceptualize and name the relevant variables, and what type of methods are advocated, may determine research findings and conclusions to a larger extent than what is recognized. In this study, K. R. Hammond's (1990) recommendation to conduct an investigation of clinical judgment performance using a number of methods from both the JDM and the information-processing tradition was followed. What was attempted was an investigation from two different research paradigms to see to what extent findings transcended the tradition or were determined by it.

From the JDM perspective the performance of the subjects was comparable to that reported in the literature.
The conclusion is that experience accounted for only a small proportion of variance in judgment accuracy. From an information processing perspective, the subjects searched for information in adaptive ways, and were constructive in terms of their use of strategies. Results from this paradigm suggested a much more positive conclusion. Thus, it may seem that the evaluation of processes is more favorable than that of outcomes. Yet, it may be that some methods are more suited to revealing certain aspects or phases of the total judgment process than others.

In educational psychology what is needed is a metatheoretical perspective in which both research paradigms are integrated. As the twenty-first century approaches, and the amount of available information continues to accelerate, learning to process data optimally in probabilistic contexts and to make effective use of relevant information in judgment is critical. Emphasizing either process or outcome is limiting. What is needed is to understand the connections between certain characteristics of judgment process and accurate outcomes in particular contexts. Without understanding such connections, it is difficult to make a case that one process is better than another. Without examining context, variations in process and outcome that are associated with context are not clear, and may be attributed elsewhere. Such a broad theoretical perspective is necessary in order to capture the dynamic and responsive qualities of the human mind as judgments are made.

Limitations of the Study

Three limitations of this study were related to the following aspects: the sample of surgical patients who volunteered for phase 1 of the study; the artificial nature of the judgment tasks; and the narrow focus of expertise.

Limitations related to the sample of surgical patients.
During phase 1 of the study the focus was the left side of the lens model; data were obtained from a large, representative sample of patients who had had abdominal surgery. The goal was to identify cues that predicted wound healing time, and to determine the relative importance of the cues as predictors. Patients could not be sampled randomly. The surgical patients who volunteered for this phase of the study deviated from a truly representative sample for a number of reasons. First, because of ethical considerations and the eligibility criteria, extremely ill patients were not approached, and thus not admitted to the study. Second, because this sample was comprised of volunteers, the fact that certain patient-groups consented in greater proportions than other groups led to some degree of non-representativeness. Examples of under-represented groups included patients with drug and alcohol problems, and patients whose first language was not English. Third, departure from representativeness occurred also because follow-up was more successful with particular groups, such as highly educated, Caucasian subjects. These factors likely attenuated the relationships between predictor variables and healing time.

Limitations related to the judgment tasks. The tasks used in this study were qualitatively different from authentic judgment tasks that nurses generally perform. Schwartz, Griffin, and Fox (1989) claimed that the clinical judgments physicians made were most typically categorical; similarly, most nursing judgments are categorical in nature (Radwin, 1995). They are often judgments related to one unique individual. Nurses are not accustomed to making quantitative judgments in relation to a series of 30 or 40 cases, as lens model judgments require.
Because performance was assessed as subjects engaged in these artificial tasks rather than in authentic clinical judgments, no conclusions can be made about how nurses would respond in the clinical setting. The purpose of the study, however, was to add to the body of knowledge about how people search for, weight, and integrate information when making judgments. These differences associated with tasks and setting were acknowledged as qualifications to the study.

Limitations related to the narrow focus of expertise. In the practice setting, the concept expertise in clinical judgment conveys a sense of high quality pertaining globally to a wide variety of nurses' judgments. In this study, only one judgment context was investigated. By tapping into implicit learning from experience, and demonstrating the rich networks of knowledge that experts possess, the healing time judgment task was anticipated to reveal the global nature of clinical expertise. The moderate range of accuracy scores constituted evidence that these measures did not capture as much of this global nature of expertise as was anticipated. The findings nevertheless are still useful as evidence to link particular cognitive concepts to demonstrated judgment quality.

Based on Dreyfus and Dreyfus' (1986) and Benner's (1984) framework of novice to expert, subjects included in this study seemed best categorized at the advanced beginner, competent, and proficient levels. Neither true experts nor true novices were well represented. No generalizations can be made based on one judgment context with this small sample of subjects. Mook (1983) argued, however, that generalization can be accomplished through replication, and that external validity must be considered in relation to the purpose of the research. In the present investigation, description and understanding, not generalization, were primary goals.
Section C: Contribution to Theory Development

Castellan (1993) argued that what is needed are unified theories and models that allow explanation as well as prediction in judgment and decision making. Good theories and models can lead researchers to new insights. The present study has potential to make a contribution in relation to several theoretical points.

Linear modeling of judgment. Brehmer (1994), in discussing the psychology of linear judgment models, stated "despite the wealth of studies showing that such models fit judgement data quite well, there has, however, been little progress in our theoretical understanding of the psychological processes that produce these data" (p. 137). For decades, K. R. Hammond and colleagues, and Hoffman (1960, 1968) have both proposed linear models as a way to address the difficult methodological problem in the study of cognitive processes. To Hoffman, the choice was pragmatic; he asserted that the model was paramorphic, that is, it was not intended that cognition functioned like a linear model. K. R. Hammond (1955) proposed a linear model for the study of clinical judgment based on theoretical principles. He was attempting to apply the general framework of probabilistic functionalism to the study of clinical judgment, as proposed by Brunswik (1955, 1956). The lens model framework was not an arbitrary choice. It was motivated by the fact that the cues upon which judgments are based have, to some extent, intersubstitutability. The reason that linear models fit so well is that such models allow for such cues, and capture the capacity for vicarious functioning, a fundamental property of human cognition. To explain vicarious functioning, Hammond (1955) stated:

Higher organisms may substitute one form of behavior for another in order to achieve a goal. . . . In the biological literature, this phenomenon has been termed equifinality. . . .
Concerning the perception of the environment, . . . cues may substitute for one another. This phenomenon has been termed equipotentiality of cues. Thus, the concept of vicarious functioning refers to the variability in behavioral "output" (equifinality) and "input" (equipotentiality) of cues for an organism (p. 258).

Brehmer (1994) explained that the main reason why linear models are appropriate is that judgment tasks demand vicarious functioning and linear models capture this form of cognition.

Brehmer (1994) provided some advice for developing a theoretical understanding of clinical judgment. First, he suggested that the use of verbal protocols in addition to linear modeling would provide a useful method for studying judgment, especially for capturing the experience of making judgments. Second, he recommended that psychologists adopt the Brunswikian approach of studying the interaction between the judge and the judgment task. A third recommendation was to avoid focusing analysis at the level of cue-utilization coefficients. Analysis of cue weights is too microscopic a level to establish stable empirical relations for theory development. Where the focus ought to be, Brehmer claimed, is on achievement or accuracy. "The clinician's focus is the extent to which his or her judgments agree or disagree with the distal state" (Brehmer, p. 148).

The first two suggestions have been taken in this study. An important caveat can be added to his third suggestion. The focus should be on achievement or accuracy, but, whenever possible, this ought to include comparison to an absolute measure of a criterion. Correlational measures of accuracy as demonstrated by the lens model achievement, although important, are insufficient for achieving a full understanding of the subjects' performance or the demands of the task.

Novice-expert differences.
Patel and Groen (1991) described the progression of expertise in diagnostic reasoning in medical students and physicians. These researchers distinguished three different kinds of medical expertise: generic, specific, and domain-independent. The results related to task performance in the present study can be examined in light of these expertise categories. As practitioners become experienced, their knowledge becomes more specialized, and more difficult to employ in a generic way. Patel and Groen (1991) concluded their paper by stating:

Although we have a performance theory of expertise, we have no adequate description of the learning mechanism. It seems clear that theories based purely on knowledge expansion or on the development of better and better representations are inadequate (p. 122).

From an educational psychology perspective, a description of the learning mechanism is of paramount importance in all programs for professional education. In this study, evidence is presented about the significant role of differences in conceptual structure associated with expertise that has potential to add to the development of these learning theories.

Comparison of clinical and statistical prediction. For several decades a controversy in the literature has existed as to the relative accuracy of clinical methods for judgment and mechanical methods (expert systems, linear models, or other computerized approaches). Kleinmuntz (1990) framed the issue as using one's head (intuition) or using a formula (statistical or mechanical procedure) for clinical judgment. Many authors have written about this issue over the years (Dawes, 1988, 1988; Einhorn, 1972, 1986; Goldberg, 1970; Holt, 1958; Kleinmuntz, 1968, 1984; K. L. Lee et al., 1986; Meehl, 1954; J. Sawyer, 1966; Wiggins, 1981). It is obvious that human judges are required to select which variables are relevant to study in a particular situation.
As early as 1972, Einhorn proposed that computers were more accurate at combining or integrating data, compared to clinicians. Even though there seems to be considerable research evidence on this point, many expert clinicians believe that their clinical predictions are superior to those made by computer. The fact that the controversy has existed for so long indicates that there are some major philosophical foundations to the debate. Holt (1986) argued that the issue is related to the mechanist metaphysics of behaviorism. The latest resolution is to advise the use of computerized approaches where they are available and where they work best, and use human experts otherwise. Levi (1989) pointed out that expert systems ought to be more accurate because they are developed on the basis of the best experts in the field. Kleinmuntz (1990) and Holt both encouraged integration of the two approaches.

The data from this study revealed that one third of the subjects made predictions where the variance explained was higher than that derived from the regression model. In tasks where there is little predictive potential, human judges likely can out-perform a linear model, because there may be additional idiosyncratic cues available that could not be entered into any model. On the other hand, on tasks with highly valid cues, the computer will most likely perform more accurately compared to human judges because the computer applies cue-weights in a perfectly consistent manner.

K. R. Hammond's cognitive continuum. One example where this study has potential to add to understanding of the theoretical basis for judgment is in relation to K. R. Hammond's ideas about a cognitive continuum. Basically, Hammond (1987), Hammond et al.
(1987), and Hamm (1988a) demonstrated that tasks can be located on a task continuum that ranges from analysis-inducing to intuition-inducing; these researchers recommended that, in order to better understand judgment performance, subjects should perform several tasks at a variety of locations on this continuum. The research carried out with expert engineers who judged road safety using different formats illustrated these ideas. The identification of tasks for their relative analysis- and intuition-inducing characteristics was carried out in this study.

To add to theory development, particularly from an educational psychology perspective, the expertise level of the subjects also must be considered. There is considerable evidence (for example, Dreyfus and Dreyfus, 1986) that changes in cognition occur with expertise. In K. R. Hammond and colleagues' (1987) study of engineers, all of their subjects were experts. It is likely that the point on the task continuum along which a task is located is contingent on the level of expertise of the subject. Hammond found that performance was better when there was a match between the preferred means of cognition of the judge and the intuition-inducing or analysis-inducing properties of the task. By including subjects with a range of experience, there is potential to learn more about the cognitive changes that occur as expertise is developed.

Dreyfus and Dreyfus' (1986) claim that in making judgments, an experienced person with good performance often uses intuition where an inexperienced person employs analysis has had some support in this study. This does not mean, however, that the use of intuition will bring about expertise in judgment. Hamm (1988a) argued that without experience based on an analytic foundation, intuitive performance will be poor. "Not using rules is a privilege of the expert, not a route to becoming expert more quickly" (p. 95) [Emphasis added].

Role of experience.
Rotter (1967) raised the question of whether it is possible for clinicians to learn from experience. He believed it was possible, provided motivation was appropriate and feedback was used properly. He claimed that some clinicians did not seem interested in learning.

It is rare that clinicians make valiant attempts to obtain systematic feedback so that they can change or improve. . . . More often they are concerned in demonstrating to others and perhaps to themselves that their clinical judgments . . . are valid (Rotter, 1967, p. 13).

Rotter maintained that motivation to prove that one is right is hardly the most appropriate incentive for discovering what one is doing wrong. This author suggested that educators ought not to try to impress students with how knowledgeable experts are, but rather with how much everyone has to learn in order to achieve reasonable prediction.

Not all the questions that could be asked about the role of experience in judgment have been answered. For example, why is it that some people require minimal experience and others need much more experience to reach a particular level of performance? When individuals have experience with particular events and phenomena, why is it that sometimes the knowledge gained becomes generalized and abstractions are constructed, whereas, other times, the knowledge is wedded to only the particular events and phenomena encountered? More questions have been raised than have been addressed.

In this study, there is some evidence from the protocols that subjects varied in the degree of reflectiveness as they processed information. Encouraging clinicians to reflect on their clinical experiences may enhance novices' ability to carry out judgments in a way that demonstrates what Schön (1983) called "reflection-on-action". Experiences that are conducive to developing reflective judgment ought to be encouraged.
Considerable research has been carried out in the area of reflective judgment (King & Kitchener, 1994). Further research could be done not only linking the judgment process with the outcome, but also investigating the stage of reflective judgment as assessed by King and Kitchener, or the stage of development in terms of postformal reasoning, as studied by Arlin and Fung (1995) and Yan and Arlin (in press). It may be that increased reflectiveness has a causal role in bringing about changes to conceptual structure. It is only in reflection that people have awareness of the organization and structure of their own concepts, and the connection of such organization to successful reasoning. It may be the case that people who reach postformal reasoning are capable of reconstructing internal conceptual organization upon reflection, and that such conceptual change has potential to promote more sophisticated reasoning and judgment. Research that demonstrates causal links is of particular importance in educational psychology in order to be able to subsequently apply findings in teaching-learning contexts.

Confidence in judgment. Einhorn and Hogarth (1978) demonstrated that confidence in judgment persisted, despite judgment performance that is often shown to be lacking in quality. Experience does not necessarily lead to increased accuracy. These authors demonstrated how the concept "my judgment is accurate" is both learned and maintained even though judgments may be invalid.

An additional point to Einhorn and Hogarth's very thorough argument can be added. This point is based on research by Koehler (1994) and by Nelson (1996) as well as the results of this study. Judgment confidence can be considered metacognitive knowledge (knowledge about the trustworthiness of one's knowledge).
One of the functions of this knowledge is to act as a cue (or signal) when individuals are at risk of making an erroneous judgment, for example when facing complex, unfamiliar tasks where the data exceed the limits of information processing. In the course of a day, people make numerous judgments intuitively, without the need for careful attention or deliberation. Such automaticity saves cognitive effort. For people whose metacognitive knowledge includes well-calibrated judgment confidence, when a drop in confidence reaches a certain threshold, a shift from automatic intuitive processing to deliberate processing is initiated. The lowered confidence can function as a warning to interrupt intuitive processing, to pay close attention, and to analyze the situation carefully. By so doing, judgments have potential to improve. Confidence, however, is useful in this way only if it is properly calibrated to the level of one's competence.

Not many people use confidence metacognitively. Often people fail to perceive the potential value of low confidence as a signal; they learn to avoid the negative feelings associated with low confidence by never acknowledging that it occurs. The consequence is that the signal for when to reflect and deliberately process information becomes tuned out. Without such a signal, people tend to process data intuitively and are not as open to environmental feedback: any activity that threatens the comfortable feeling of high confidence is avoided. For example, attempting to determine if past judgments were good, seeking disconfirming data, and being open to alternative perspectives are all practices that might suggest low confidence. Engaging in these cognitive activities would mean admitting (at least to oneself) that one's judgment confidence was less than complete.
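
The notion of calibrated confidence discussed here can be made concrete with a small sketch. One common index of overconfidence is the mean stated confidence minus the proportion of judgments that turned out to be correct; a positive value suggests overconfidence. This is offered as an illustration only and is not necessarily the measure used in this study. The function names, the 0.6 threshold, and the data values are all hypothetical.

```python
def overconfidence(confidences, correct):
    """Mean stated confidence minus proportion correct.

    confidences: probabilities in [0, 1] that each judgment is right.
    correct: booleans recording whether each judgment was right.
    A positive result suggests overconfidence, a negative one underconfidence.
    """
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need two non-empty sequences of equal length")
    mean_confidence = sum(confidences) / len(confidences)
    proportion_correct = sum(1 for c in correct if c) / len(correct)
    return mean_confidence - proportion_correct


def should_deliberate(confidence, threshold=0.6):
    """Hypothetical rule: low confidence signals a shift from
    intuitive to deliberate processing. The threshold is an assumption."""
    return confidence < threshold


# Made-up example: a judge who is confident but right only half the time.
stated = [0.9, 0.8, 0.9, 0.7]
outcomes = [True, False, True, False]
score = overconfidence(stated, outcomes)  # about 0.325 for these values
```

In this made-up example the positive score indicates that stated confidence exceeds demonstrated accuracy, which is the situation in which a low-confidence signal would rarely be triggered.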
It is ironic that the cognitive activities with potential to improve judgment quality (and thus increase confidence indirectly) are the very ones which are less likely to be carried out in the efforts to maintain high confidence directly.

Section D: Implications for Education and Practice

Several implications for professional education are derived from the literature that has been reviewed, and from the results of this study.

Increase knowledge of the judgment process. Faust (1986) suggested that improper judgment habits and cognitive limitations which restrict clinicians' ability to use feedback productively make learning from experience difficult. It is, therefore, important to develop approaches to increase students' and practitioners' knowledge of the judgment process, including biases and corrective measures. Students and practitioners need to learn ways to minimize biases in judgment. By recognizing the consequences of human tendencies for seeking confirming data, failing to consider base rates, failing to ignore irrelevant data, being overconfident, and using salience of data as an indicator of the importance of data, there is potential to increase the quality of the clinical judgments that are made. If judgments are being made intuitively, there is still merit in examining their basis and consequences in order to learn as much as possible about the factors which contribute to performance.

Recognize trends in confidence with experience. Practitioners may not realize that confidence in judgment tends to increase with experience, at times without any necessary relationship to increased competence. Such increasing confidence may be a consequence of the professional socialization process for students and practicing professionals; the impact on practice is that overconfidence can result, which could lead to inappropriate judgments.

Encourage judgment calibration. Vast quantities of data are available to practitioners.
Given the rapid advances in biological, psychological, and social sciences, and technology, the goal of being completely knowledgeable about all potentially relevant aspects of professional practice is clearly an impossibility. Therefore, it makes sense to advocate that practitioners aim for good discrimination between when they are capable of making quality judgments independently, and when to seek consultation and collaboration. At no previous time in history has it been so imperative that practitioners learn to differentiate between what they know and what they do not know.

Section E: Summary

Summary Related to the Research Questions

The overall research question being addressed is: What are the patterns of relationships among measures of selected cognitive constructs (conceptual structure, sensitivity to patterns in data, and judgment process), individual difference variables (age, education, and experience), task conditions and performance in a clinical judgment task?

This study has been an examination of clinical judgment performance by 36 nurses in a probabilistic clinical judgment task. The relevant experience of the subjects ranged from 2 months to 25 years. In phase 1 of the study, 258 patients with abdominal incisions were assessed and variables that were predictive of healing time were identified. A lens model approach was employed, using representative design. In phase 2 of the study, nurses made a variety of judgments about healing in tasks which varied in terms of their analysis- or intuition-inducing properties. The judgments were based on data from abdominal surgery patients obtained from phase 1 of the study. Policy capturing was used, and subjects' judgments were compared to the lens model equation. Subjects also made judgments about the similarity of concepts, and viewed slides of incisions, describing and evaluating what they perceived.
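
The comparison of judgments with the lens model equation can be illustrated with a minimal sketch. It assumes the standard regression form of the equation, r_a = G * R_e * R_s + C * sqrt(1 - R_e^2) * sqrt(1 - R_s^2), where r_a is achievement, R_e and R_s are the multiple correlations of the environmental and judge models, G is the correlation between the two models' predictions, and C is the correlation between their residuals. The data below are simulated, and the names (`lens_model`, `ols_fitted`, the cue weights) are illustrative assumptions; they are not the study's data, variables, or analysis code.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_fitted(cues, y):
    """Least-squares fitted values of y regressed on the cues plus an intercept."""
    x = [[1.0] + list(row) for row in cues]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return [sum(b * v for b, v in zip(beta, r)) for r in x]


def corr(u, v):
    """Pearson correlation between two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def lens_model(cues, criterion, judgments):
    """Return the lens model components for one judge."""
    fit_e = ols_fitted(cues, criterion)   # environmental (criterion) model
    fit_s = ols_fitted(cues, judgments)   # captured policy of the judge
    res_e = [y - f for y, f in zip(criterion, fit_e)]
    res_s = [y - f for y, f in zip(judgments, fit_s)]
    return {
        "r_a": corr(criterion, judgments),  # achievement
        "R_e": corr(criterion, fit_e),      # task predictability
        "R_s": corr(judgments, fit_s),      # judge consistency
        "G": corr(fit_e, fit_s),            # matching of the two linear models
        "C": corr(res_e, res_s),            # agreement of the residuals
    }


# Simulated data only: 200 cases, 3 cues, arbitrary weights and noise.
rng = random.Random(0)
cues = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
criterion = [0.6 * c[0] + 0.3 * c[1] + 0.1 * c[2] + rng.gauss(0, 1) for c in cues]
judgments = [0.5 * c[0] + 0.1 * c[1] + 0.4 * c[2] + rng.gauss(0, 0.7) for c in cues]

parts = lens_model(cues, criterion, judgments)
rebuilt = (parts["G"] * parts["R_e"] * parts["R_s"]
           + parts["C"] * math.sqrt(1 - parts["R_e"] ** 2)
           * math.sqrt(1 - parts["R_s"] ** 2))
```

When both models are ordinary least-squares fits on the same cue set, `rebuilt` should equal `parts["r_a"]` up to rounding, which is the decomposition the lens model equation expresses. Because achievement is a correlation, it says nothing about absolute error, which is the reason an absolute accuracy measure is also worth reporting.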
With the use of an information board, data search strategies were revealed, and thinking that accompanied judgments about incisional healing in surgical patients was captured as subjects thought aloud.

Half of the subjects observed slides of incisions immediately prior to the lens model task. The purpose of this enhanced memory-priming condition was to attempt to reveal differences in the patterns of relations between experience and judgment expertise in novice and experienced subjects. The other subjects viewed the slides after the lens model task. When accuracy was measured by the percent correct measure, the expected pattern of performance under enhanced memory-priming conditions was observed, but the pattern of results in the baseline memory-priming condition was not interpretable.

Based on subjects' task performance, measures of cognitive structure, sensitivity to data patterns, and judgment process were obtained. The cognitive variable which was predictive of accuracy in judgment performance was conceptual structure, assessed by applying multidimensional scaling (MDS) to the similarity judgments.

The influence of experience was found to be complex. Qualitative aspects of experience (experience with kinds or types of patients) likely had greater impact on performance, compared to quantity of experience, provided the quantity is beyond some minimum. Kolodner (1983) pointed out that it is not the amount of experience per se that is important, but what is learned from particular experience. Under intuitive task conditions experience had a stronger relationship with clinical judgment performance, compared to analytic task conditions. This finding lends some support to Dreyfus and Dreyfus' theoretical ideas regarding the changes in cognition with experience.
If, in this study, there had been a greater number of experts as defined by Dreyfus and Dreyfus (1986), experience might have emerged as a stronger predictor of expertise.

Performance in the lens model task was influenced by the ordering of the paragraphs. Tasks were selected on the basis of their ability to induce intuition or analysis. Performance was interpreted in light of K. R. Hammond's theory regarding a cognitive continuum, and Dreyfus and Dreyfus' (1986) theory of progression of novice to expert based on changes in cognition. In general, the differences that were observed were in the direction predicted and thus provided support to these theories. Replication to other subjects and a greater variety of clinical judgments would be required in order to make any generalizations.

Judgment process was shown to be adaptive and constructive. Subjects with minimal experience tended to use an analytic, theory-based approach, whereas those with experience employed an intuitive, experience-based approach. Confidence in judgment tended to increase with experience, even though accuracy remained stable. Considering the high task uncertainty, lens model measures were good, but judgment accuracy assessed in absolute terms was fair to moderate.

Recommendations for Future Research

Based on the literature and the findings of this study, recommendations for future research can be made in the following areas:

Issues related to confidence. Further research is needed to determine how to best measure confidence in judgment performance, and to interpret the degree of overconfidence that actually exists. The relationship of representative design to measurement of confidence appears to be an interesting topic, not only because of practical importance, but also because of the connections to theory related to information processing and cognition generally.

Issues related to judgment process.
Research is needed to understand the process of making judgments. Currently, this aspect of cognition, particularly with intuitive judgments, is sometimes imbued with a certain mysticism associated with expertise. Practitioners and students should discuss their thinking in relation to their judgments. Such discussion could include the data that were sought and their importance, the interpretations made and the alternative interpretations considered, and the ways in which the information was (or could have been) combined, as well as the judgment outcome and consequence (Arkes, 1981). If this were done, cognitive activity would become less of a "black box".

It may not be possible to recapture from the expert in explicit, formal steps the mental processes or all the elements that characterize expert clinical judgment performance. Much of the basis for intuitive judgment is implicit knowledge. Expertise often results from synthesis, not by decomposing whole situations by analytic means. It is still possible, however, to communicate some aspects of intuitive judgment. Benner (1984) recommended that expert nurses be encouraged to capture in narrative form the accomplishments and interpretative reflections on their practice. Listening to stories about paradigm cases has potential to help novice nurses gain insight into aspects of judgments, and to illuminate issues of fundamental importance in terms of professional practice.

Information processing and connectionism. When Brunswik initially proposed his ideas about the environment-subject interaction, and vicarious functioning, computers were extremely primitive. K. R. Hammond and colleagues used the statistical procedure of regression as a means of implementing Brunswik's ideas, not only because they believed the integrative mechanism was additive, as Brehmer (1994) claimed, but also because the choices of statistical programs were limited.
Now, with much more sophisticated computer hardware and software, researchers are using connectionist methodology for modeling judgments. For example, Brickley and Shepherd (1996) described how they trained a neural network to make particular treatment-planning decisions that would provide reliable decision support for clinicians. There is potential for carrying out Brunswik's ideas using more modern technology. Features such as secondary cues are possible to model in a connectionist system. Research in the area of novice-expert differences in judgment performance using a combination of symbolic and connectionist approaches could prove to be more sensitive than regression in revealing changes in cognition with experience. If, as Dreyfus and Dreyfus (1986) and Benner (1984) claim, experts excel in knowledge that is tacit, encoded in patterns, it would seem appropriate that researchers learn to incorporate new methodologies in attempting to capture the complex changes that occur in cognition.

Future directions for judgment research. Research needs to be conducted to tap implicit knowledge embedded in practice. It may be that the researcher must move from the laboratory to the practice setting, so that judgments of practitioners can be studied in a more appropriate context. K. R. Hammond (1993) provided encouragement to move to a naturalistic setting, which is consistent with representative design proposed by Brunswik. Such a change to high fidelity tasks could also be accompanied by a shift from an emphasis on quantitative designs, and statistical significance of results, to a focus on qualitative approaches and significance based on criteria that may prove more suitable for clinical judgments made by nurses, such as authenticity (Guba & Lincoln, 1989). Such a change would seem to afford excellent possibilities for enhancing understanding of clinical judgment in particular contexts. K. R.
Hammond and Adelman (1976) understood judgment research in the context of human values as well as scientific aims. Referring to the prevailing paradigms in nursing and how they are changing, Newman (1993) encouraged clarification of scientific values and methods that shape the discipline. A move to shift emphasis from one perspective to another, however, should not be interpreted to mean that a quantitative approach to judgment is not useful; it seems that it is through understanding the quantitative aspects (including advantages and limitations) that one can more fully understand and appreciate the qualitative aspects. Both approaches have potential to be useful in relation to each other.

Another shift in emphasis is to move from obtaining a snapshot view of discrete, static judgments to monitoring judgments in continuous, dynamic environments over periods of time. Lusk and K. R. Hammond (1991) provided a good example where a lens model approach has been taken with weather persons' forecasts of microbursts in a dynamic environment; these researchers demonstrated how subjects changed their judgments in response to changing conditions. In addition, Hamm (1988b) provided evidence regarding dynamic variation in cognition (analysis and intuition) during task performance.

Debate over configurality. The subject of configurality in judgment has been a topic of debate for decades (Brannick & Brannick, 1989; Brannick & Darling, 1991; Brehmer, 1969; Edgell, 1993; Einhorn, 1970, 1971; Meehl, 1950; J. E. Sawyer, 1993). The reason that previous attempts to detect nonlinearity in clinical judgment have not succeeded, according to Ganzach (1995), is that good nonlinear models were not available. This author reanalyzed Meehl's data using a model that Ganzach referred to as a scatter model; he claimed this model provided a better fit to the data, compared to a linear model.
He found patterns of nonlinearity for which he could provide psychological interpretation. The results from the present study were not helpful in resolving the extent to which clinicians use configurality in making judgments. Further research could be carried out, reanalyzing these data, using Ganzach's (1995) scatter model.

Teaching of judgment. Once more research of a basic nature is carried out with respect to the theoretical aspects of clinical judgment performance, research is needed into ways to more effectively teach people to make judgments. More practical research needs to be done using methods that have validity in the settings in which they would be applied. In addition, educators need to make use of the judgment research that has already been carried out. Faust (1986) pointed out that there is considerable under-use of research on human judgment. Greater application of such research is needed to prepare students and practitioners for the cognitive challenges that are inherent in professional practice. In any professional education program, helping students to learn to make good judgments is an important and demanding goal. Faust maintained that "there is far more to know than what is known about methods for improving judgment. Continuing research on human judgment . . . can produce knowledge that helps clinicians better serve their clients" (p. 428).

Hammond and Adelman (1976) stated that "the key element . . . in the process of integrating social values and scientific facts is human judgment" (p. 389). There are distinct advantages in being able to make wise judgments. Arlin (1993) discussed the importance of wisdom and expertise in teaching; she described wisdom as entailing good judgment, manifesting itself in planning, managing, and reflecting on teaching and learning activities. Her points are relevant to nursing and other professions.
In no other time in history has the teaching of judgment been of greater importance. From an evolutionary perspective, survival of the species involves groups of people (communities) making wise judgments in relation to all important aspects of life such as health and safety, education, food production, and issues related to the environment. Hogarth (1980) stated "in the not so distant past human survival and progress depended on physical skills. There can be little doubt that the need today is for conceptual skills, that is, the ability to process information and make judgments" (p. 3).

Conclusion

The problem identified at the beginning was the finding that many studies in the literature showed no relationship between experience and expertise in judgment. To some extent, this lack of relationship may be accounted for by claiming that there actually is a relationship, but it is difficult to demonstrate in the laboratory. Such factors as the use of laboratory studies with nonrepresentative designs, failure to consider changes in cognition with expertise, and the use of tasks that have little ecological validity constitute reasons for this difficulty. On the other hand, Einhorn and Hogarth (1978), Dawes (1989), and Brehmer (1980b) explained the lack of relationship found in many studies by claiming that there is truly little relationship to be found. These authors demonstrated that factors such as the difficulty of learning in probabilistic contexts and the increase in confidence in one's judgment with experience tend to create the illusion of validity.
Based on the findings in this study, a synthesis of these two explanations is proposed: Although people tend to increase in their ability to make judgments as they gain experience, standard research conditions usually do not reveal this, and because confidence increases considerably, the self-perceived level of expertise in a particular judgment context (revealed by confidence) may be greater than what is valid when compared to absolute indicators of judgment quality.

The initial question raised was to determine the extent to which experience and clinical judgment accuracy were related; this question has provided good direction to this study. A number of interesting patterns of relations were found between experience and various indicators of expertise in clinical judgment performance. Experience was related to judgment performance under intuitive conditions; in a comparable task under analytic conditions, experience was not related to judgment performance. Performance differences associated with task condition may contribute to the explanation for previous results of a lack of a relationship between experience and judgment performance. As well, such differences in performance associated with task characteristics lend support to Dreyfus and Dreyfus' (1986) model for the development of expertise.

REFERENCES

Abdolmohammadi, M., & Wright, A. (1987). An examination of the effects of experience and task complexity on audit judgments. The Accounting Review, 62, 1-13.

Abernathy, C. M., & Hamm, R. M. (1994). Surgical scripts: Master surgeons think aloud about 43 common surgical problems. Philadelphia: Hanley & Belfus.

Abernathy, C. M., & Hamm, R. M. (1995). Surgical intuition: What it is and how to get it. Philadelphia: Hanley & Belfus.

Adelson, B. (1984). When novices surpass experts: The difficulty of a task may increase with expertise.
Journal of Experimental Psychology: Human Learning and Memory, 10, 483-495.

Allen, S. W., Norman, G. R., & Brooks, L. R. (1992). Experimental studies of learning dermatology diagnosis: The impact of examples. Teaching and Learning in Medicine, 4, 35-44.

Anderson, J. A. (1990). Hybrid computation in cognitive science: Neural networks and symbols. Applied Cognitive Psychology, 4, 337-347.

Anderson, N. H. (1969). Comment on "An analysis of variance model of the assessment of configural cue-utilization in clinical judgment". Psychological Bulletin, 72, 63-65.

Anderson, N. H. (1972). Looking for configurality in clinical judgment. Psychological Bulletin, 78, 93-102.

Anderson, N. H., & Shanteau, J. C. (1970). Inf