Open Collections

UBC Theses and Dissertations

An experimental investigation of the use of explanations provided by knowledge-based systems. Dhaliwal, Jasbir S., 1993.



AN EXPERIMENTAL INVESTIGATION OF THE USE OF EXPLANATIONS PROVIDED BY KNOWLEDGE-BASED SYSTEMS

by

JASBIR SINGH DHALIWAL
B. Acc. (Honours), University of Malaya, 1982
MBA, University of British Columbia, 1986

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES (Business Administration - Management Information Systems)

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
June 1993
© Jasbir Singh Dhaliwal, 1993

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Management Information Systems
The University of British Columbia
Vancouver, Canada
Date: June 22, 1993

ABSTRACT

Ever since MYCIN introduced the idea of computer-based explanations to the artificial intelligence community, it has come to be taken for granted that all knowledge-based systems (KBS) need to provide explanations. While this widely held belief has led to much research on the generation and implementation of various kinds of explanations, there has, however, been no theoretical or empirical evidence to suggest that 1) explanations are used by users of KBS, and 2) the use of explanations benefits KBS users in some way.
In view of this situation, this study investigates the use of explanations that are provided by a knowledge-based system, from the perspective of understanding both the specific factors that influence it, as well as its effects.

The first part of this dissertation proposes a cognitive learning theory based model that both clarifies the reasons as to why KBS need to provide explanations and serves as the basis for conceptualizing the provision of KBS explanations. Using the concepts of the feedforward and feedback operators of cognitive learning, it develops strategies for providing KBS explanations and uses them to classify the various types of explanations found in current KBS applications. The roles of feedforward and feedback explanations within the context of the theory of cognitive skill acquisition and a model of expert judgment are also analyzed. These, together with past studies of KBS explanations, suggest that user expertise, the types of explanations provided, and the level of user agreement are significant factors that influence the explanation seeking behavior of users. The dissertation also explores the effects of the use of KBS explanations in judgmental decision making situations supported by a KBS. It identifies and considers four distinct categories of potential effects of the use of explanations: learning effects, perceived effects, behavioral effects, and effects on judgmental decision making.

The second part of the dissertation empirically evaluates the explanation provision strategies in a laboratory experiment in which 80 novice and expert subjects used a KBS for financial analysis to make judgments under conditions of uncertainty.
The experiment was designed specifically to investigate the following fundamental research questions: 1) To what extent are the various kinds of explanations used? 2) How do user expertise, the feedforward and feedback provision of explanations, and the level of user agreement influence the amount and the types of explanations that are used? 3) Does the use of explanations affect the accuracy of judgmental decision-making and user perceptions of usefulness?

Some of the major results relating to the determinants of the use of KBS explanations include: 1) user expertise is not a determinant of the proportion of explanations used but influences the types of explanations that are used, 2) explanation provision strategy is a critical determinant of the use of KBS explanations, with feedback explanations being used significantly more than feedforward explanations, and 3) the three types of explanations are used in different proportions, with the Why and How explanations being used significantly more than the Strategic explanations. It was also found that the level of user agreement with the KBS had an "inverted-U" shaped relationship with the use of explanations: the fewest explanations are used when the level of user agreement is either very high or very low. The major results relating to the effects of the use of explanations include the following: 1) the increased use of feedback explanations improves the accuracy of judgmental decision-making but has no effect on user perceptions of usefulness, 2) the increased use of feedforward explanations, while having no impact on the accuracy of judgments, is positively correlated with user perceptions of usefulness, and 3) the use of the Why explanation as feedback improves the accuracy of judgmental decision-making. There was also evidence that the use of the KBS benefited both experts and novices.
Considering that an understanding of the determinants and effects of the use of KBS explanations is a critical prerequisite for the design of KBS explanations, these and other findings of the study contribute both towards the development of a theoretical basis for the provision of KBS explanations, as well as the practical design of such explanation facilities.

TABLE OF CONTENTS

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgement

Chapter 1: Introduction and Motivation
1.0 Introduction
1.1 Importance of KBS Explanations Research
1.2 Approach Taken By the Study
1.3 Research Questions
1.4 Organization of the Dissertation

Chapter 2: A Cognitive Learning Approach to Knowledge-Based System Explanations
2.0 Introduction
2.1 Explanations and Cognitive Learning
2.1.1 Reasons For the Provision of Explanations
2.1.2 The Learning Versus Working Conflict
2.2 Explanations As Cognitive Feedback Within the Lens Model
2.2.1 KBS Explanations Within the Lens Model
2.2.2 A Cognitive Learning Basis For the Provision of KBS Explanations
2.2.3 The Feedforward and Cognitive Feedback Learning Operators of the Cognitive Feedback Paradigm
2.2.3.1 Components of Cognitive Feedback and Feedforward
2.2.3.2 Differences Between Cognitive Feedback and Feedforward
2.2.3.3 Effectiveness of Feedforward and Feedback In Fostering Learning
2.2.4 The Cognitive Feedback Paradigm and the Provision of KBS Explanations
2.3 Feedforward and Cognitive Feedback Strategies for the Provision of KBS Explanations
2.4 The Role of Feedforward and Feedback Explanations In the Acquisition and Application of Expertise
2.4.1 Role of Feedforward and Feedback In Hogarth's Model of Judgment
2.4.2 Role of Feedforward and Feedback In the ACT Theory of Skill Acquisition
2.5 Summary of the Chapter

Chapter 3: A Framework for Investigating the Use of Knowledge-Based System Explanations and Research Hypotheses
3.0 Introduction
3.1 The Use of the Various Types of KBS Explanations
3.1.1 Selection Versus Use of KBS Explanations
3.1.2 Types of Explanations
3.1.3 Feedforward and Feedback Explanation Provision and the Types of Explanations
3.1.4 Literature Review of Past Studies of the Use of KBS Explanations
3.2 A Framework of the Factors that Influence the Use of KBS Explanations
3.2.1 Task Characteristics
3.2.2 Characteristics of the Explanations that are Provided
3.2.3 Interface Design and Provision Strategies
3.2.4 User Characteristics
3.3 A Framework for Investigating the Effects of the Use of KBS Explanations
3.3.1 Effects on Judgmental Decision Making
3.3.2 Behavioral Effects
3.3.3 Perceived Effects
3.3.4 Learning Effects
3.4 Hypotheses
3.5 Summary of the Chapter

Chapter 4: The Task Domain And the Development and Validation of the Experimental System
4.0 Introduction
4.1 Selection of the Domain and the Task Application
4.1.1 The Domain
4.1.2 Selection of the Task Application
4.1.3 Decision Support Aids for Financial Statement Analysis
4.2 Development of the Experimental Material
4.2.1 Development of the Canacom Experimental Case
4.2.2 Development of the FINALYZER-XS Experimental System
4.2.2.1 Decision Process Used in Financial Analysis
4.2.2.2 Interface Design Issues
4.2.2.3 Knowledge Acquisition Issues
4.2.2.4 Development of the Explanations Provided by FINALYZER-XS
4.3 Validation of the Explanations Developed
4.4 Summary of the Chapter

Chapter 5: Research Design and Experimental Procedures
5.0 Introduction
5.1 Independent Variables
5.1.1 Explanation Provision Strategy
5.1.2 User Expertise
5.1.3 Types of Explanations
5.1.4 Level of User Agreement
5.2 Dependent Variables
5.2.1 Use of the Explanations
5.2.2 Accuracy of Judgmental Decision Making
5.2.3 Perceptions of Usefulness
5.2.3.1 Perceived Usefulness of the System
5.2.3.2 Perceived Usefulness of the Explanations
5.3 Research Design
5.4 Experimental Procedures and Subjects
5.4.1 Sample Size and Recruitment of Subjects
5.4.2 Experimental Procedures
5.4.3 Performance Incentives and Payment to Subjects
5.5 Basic Approaches to Data Analysis
5.6 Summary of the Chapter

Chapter 6: Analysis of the Results
6.0 Introduction
6.1 Evaluation of the Assumptions Underlying the Statistical Tests
6.2 Assessment of the Validity and Reliability of Measurement
6.2.1 Reliability of the Independent Variables
6.2.1.1 Years of Experience in Financial Statement Analysis
6.2.1.2 Professional Qualifications
6.2.2 Assessment of the Reliability of the Dependent Variables
6.3 Results Pertaining to the Factors Influencing the Use of Explanations
6.3.1 Hypothesis 1: Feedback and Feedforward Explanation Provision
6.3.2 Hypothesis 2: User Expertise
6.3.3 Hypothesis 3: Types of Explanations
6.3.4 Hypothesis 4: Interaction of User Expertise and Explanation Provision
6.3.5 Hypothesis 5: Interaction of Explanation Provision and Types of Explanations
6.3.6 Hypothesis 6: Interaction of User Expertise and Types of Explanations
6.3.7 Hypothesis 7: User Expertise by Explanation Provision by Explanation Types Interaction
6.3.8 Summary of the Results Relating to the Factors Influencing the Use of Explanations
6.4 Results Relating the Level of User Agreement to the Use of Explanations
6.5 Results Relating to the Effects of the Use of Explanations
6.5.1 Effect of the Use of Feedforward/Feedback Explanations on the Accuracy of Judgments
6.5.2 Effect of the Types of Explanations Used on the Accuracy of Judgments
6.5.3 Effect of the Use of Feedforward/Feedback Explanations on Perceptions of Usefulness
6.5.4 Effect of the Types of Explanations Used on Perceptions of Usefulness
6.6 Results of Other Statistical Analysis Undertaken
6.6.1 Results of Structural Equation Modelling
6.6.2 Results of Validation Tests
6.6.2.1 Ease of Use of the KBS
6.6.2.2 Past Familiarity With KBS
6.6.2.3 Motivation to Use the Explanations
6.6.2.4 Expertise Contained In the KBS
6.6.2.5 Expertise of the Explanations Provided
6.6.2.6 Accuracy of Judgments Before and After Using the KBS
6.7 Summary of the Chapter

Chapter 7: Discussion of the Findings and Conclusion
7.0 Introduction
7.1 Implications of the Research Findings
7.1.1 Are KBS Explanations Used in Judgmental Decision-Making Situations?
7.1.2 What Factors Influence The Use of KBS Explanations?
7.1.2.1 Influence of Explanation Provision: Feedforward Vs. Feedback
7.1.2.2 Influence of User Expertise
7.1.2.3 Influence of The Types of Explanations
7.1.2.4 Influence of The Level of User Agreement With The KBS
7.1.3 Does the Use of KBS Explanations Empower KBS Users?
7.1.3.1 Effect on the Accuracy of Judgmental Decision-Making
7.1.3.2 Effect on User Perceptions of Usefulness
7.1.4 Implications for a Theory of the Provision and Use of KBS Explanations
7.2 Limitations of the Research
7.3 Contributions of the Research
7.4 Directions For Future Research
7.5 Concluding Comments

BIBLIOGRAPHY

LIST OF APPENDICES
1. Material Used For Recruiting Subjects
2. Experimental Task Material Used
3. Data Collection Instructions to Research Assistants
4. Screens of the MOUSE Tutorial
5. Screens of the CREDIT-ADVISOR Expert System Tutorial
6. Screens of the FINALYZER-XS Expert System
7. Examples of the Feedforward and Feedback Explanations Provided by FINALYZER-XS
8. Structural Equation Modelling: Comparison of PLS and LISREL & Summary of PLS Models
9. Task Material and Summary Data of the Pilot Test Undertaken to Validate the Explanations

LIST OF TABLES
3.1 Definitions of Explanations
3.2 Explanation Provision Strategies Used For the Three Types of Explanations
3.3 Hypotheses Relating to the Determinants of the Use of Explanations
3.4 Hypotheses Relating to the Influence of User Agreement
3.5 Hypotheses Relating to the Influence of the Use of Explanations on the Accuracy of Judgments
3.6 Hypotheses Relating to the Influence of the Use of Explanations on User Perceptions of Usefulness
4.1 Items Used to Assess the Expertise of the Experimental System
4.2 Summary Statistics For Explanation Attributes
4.3 Statistics for the Understandability of Explanations
4.4 Statistics for the Readability of Explanations
4.5 Statistics for How Accurately the Explanations Reflect Their Definitions
4.6 ANOVA Results: Validation of Explanations
5.1 Explanation Provision Strategies
5.2 Denominators Used for Computing Explanation Usage Proportions
5.3 Items Used for Measuring Usefulness of the System
5.4 Items Used for Measuring Usefulness of the Explanations Provided
5.5 Research Design
5.6 Experimental Format
5.7 Power Estimates for a Sample Size of 80
5.8 Basic Statistical Models
6.1 Tests of the Assumptions Underlying the ANOVA Model
6.2 Scale Reliability Coefficients and Factor Analysis Results
6.3 Item Reliability Statistics & Factor Loadings: Usefulness of the System
6.4 Item Reliability Statistics & Factor Loadings: Usefulness of the Explanations
6.5 Item Reliability Statistics & Factor Loadings: Ease of Use
6.6 Rotated Factor Matrix for All Items
6.7 ANOVA Results for the Use of Explanations
6.8 Detailed Statistics For the Use of Explanations
6.7A ANOVA Results for the Use of Feedforward Explanations (Model 2)
6.7B ANOVA Results for the Use of Feedback Explanations (Model 3)
6.9 Aggregate Statistics of the Use of Explanations
6.10 Hypotheses and Statistical Models Relating to User Agreement
6.11 ANOVA Results and Summary Statistics of the Number of Explanations Used For the Levels of User Agreement and User Expertise
6.12 Means and Standard Deviations of User Agreement for the Treatment Groups
6.14A Influence on Judgmental Accuracy: Results for Model 1
6.14B Influence on Judgmental Accuracy: Results for Models 2 and 3
6.15 Influence of the Types of Explanations on Judgmental Accuracy: Results for Models 4 and 5
6.17A Influence on User Perceptions of Usefulness: Results for Model 1
6.17B Influence on User Perceptions of Usefulness: Results for Models 2 and 3
6.18 Influence on Perceptions of Usefulness: Summary Statistics for Models 2 and 3
6.19 Influence of the Types of Explanations on Perceptions of Usefulness: Results for Models 4 and 5
6.20 ANOVA Results: Accuracy Before and After Using the KBS
6.21 Summary Statistics of Accuracy Before and After Using the KBS

LIST OF FIGURES
2.1 The Lens Model
2.2 Lens Model Equation
2.3 Typical User-Knowledge Based System Interaction
2.4 Alternative Explanation Provision Strategies
2.5 Role of Explanations In the Use of Expertise
3.1 Determinants of the Use of KBS Explanations
3.2 Dependent Variables For Investigating the Effects Of the Use of KBS Explanations
3.3 Research Model
4.1 Modular Structure of the FINALYZER-XS System
4.2 ANOVA Models Used to Validate the Explanations
5.1 Research Variables
5.2 Hierarchy of the Measures of Explanation Usage
5.3 Summary of Experimental Procedure
5.4 Main Structural Equation Model
5.5 Second Structural Equation Model
6.1 Normal Density Plots For the Use of Explanations Dependent Variables
6.2 Normal Density Plots For the Accuracy and User Agreement Dependent Variables
6.3 Statistical Model and Hypotheses Relating to the Use of Explanations
6.4 The Use of Feedforward and Feedback Explanations
6.3A Additional Models Used to Investigate Explanation Provision Strategy
6.5 Use of the Three Types of Explanations
6.6 Use of Explanations By Experts and Novices
6.7 Types of Explanations Used For Explanation Provision Strategies
6.8 Types of Explanations Used By Experts and Novices
6.10 Level of User Agreement and the Use of Explanations
6.11 User Expertise By Level of User Agreement
6.13 Hypotheses and Statistical Models Relating to the Influence of the Use of KBS Explanations on the Accuracy of Judgments
6.13A Plot of Accuracy by the Use of Feedback Explanations (Model 1)
6.13C Expert-Novice Differences in the Influence of the Use of Feedforward Explanations on the Accuracy of Judgments
6.13B Plots of Accuracy by the Use of Feedback Explanations (Model 3)
6.14 Plot of Accuracy by the Use of the Why Explanation Provided As Feedback
6.15 Hypotheses and Statistical Models Relating to the Influence of the Use of KBS Explanations on User Perceptions of Usefulness
6.16 Plot of User Perceptions of the Usefulness of the Explanations by the Use of Feedforward Explanations
6.16B Plot of User Perceptions of KBS and Explanation Usefulness by the Use of Feedforward Explanations
6.17A Plots of Feedforward Explanations Used and Perceptions of KBS Usefulness By Explanation Provision Strategy
6.17B Plots of Feedback Explanations Used and Perceptions of KBS Usefulness By Explanation Provision Strategy
6.19 Overall Structural Equation Model of the Determinants & Consequences of the Use of Explanations

ACKNOWLEDGEMENT

This thesis represents the culmination of an exciting period of personal transformation and rapid change.
It involved changes in my personal philosophy, immediate family, home, conception of nationality, political outlook, and my relationships with most other individuals as well as society as a whole. Many people contributed to this transformation, often in ways that remain unfathomable to us all. Among them I am particularly indebted to the following individuals who were a major influence: John Fossum for showing me the depth of the human soul, individual integrity and right-wing economics; M. Ramesh for his support, opinions on Marx and how the world should run itself; Jo-Anne Dillabough for her struggle with post-modernism and deconstruction; Louie Lefebre for his passion for Quebec nationalism and Charlie Parker; Sarb Ner for his DNA-based view of the world and his understanding of the cynical aspects of science; Margo Fallon for her enlightening views on feminism and the Irish immigrant mentality; Jinro Ukita both for his ability to merge it all --- arts, science, music, business and human emotions --- as well as his ability to reject it all; John Heaman who impressed me with his cooking, painting, and the plantation lifestyle; Hugh Miller for teaching me the intricacies of Canadian wildlife and the character-building aspects of life on the Prairies; and Mukesh Bhargava for his discussions on the joys of data interpretation. The love shown by these people had more of an impact on my transformation than they truly realise.

More directly, I wish to acknowledge the contribution of my thesis supervisor, Izak Benbasat. His biggest achievement was in showing me the value of depth, as against breadth, in rigorous scientific research. My thanks also to my committee --- Joy Begley, Andy Trice and Al Dexter --- for their ideas and insights. I also owe an obligation to my family for their support and understanding, especially the critical roles played by my brother Manjit and my sister Jasbinder.
Last and most importantly, this thesis would not have been completed without the love and support of my wife, Anita. Her calmness and her "so what?" attitude kept my sanity during the darkest moments of the transformation. This thesis represents just as much an achievement of hers as it is mine.

Ik Onkar, Satnam, Nirbhau, Nirvair, Akal-murat, Ajuni Saibhang, Gurprasad, Jap, Aad Sach, Jugad Sach, Haibhi Sach, Nanak Hosi Bhi Sach.
--- Japji Sahib

This thesis is dedicated to my parents Dilbara and Amritpal and my sons Simren and Runpal.

Ilm Amal

With the beggar's bowl of my skull
I hankered after those with garnered learning,
Begging crumbs of their knowledge,
Stuffing them into this bowl.
Conceited, puffed with scraps,
Learning's mantle, I fancied, had fallen upon me,
Strutting like one far gone.
One day I placed my surfeited bowl before the Master:
"Leavings, corrupted crumbs!" he cried,
Emptying it out on the highway;
Then he rubbed it clean of this pollution:
See how it shines now, its lotus-freshness.
--- By Bhai Vir Singh, Poet of the Sikhs

CHAPTER 1: INTRODUCTION AND MOTIVATION

1.0 Introduction

The focus of this dissertation is on understanding the factors that influence, as well as the effects of, the use of explanations that are provided by knowledge-based systems (KBS). Knowledge-based systems are computer-based software tools that use artificial intelligence techniques to capture, represent and apply expert knowledge, so as to be able to mimic the behavior of human experts in specific narrowly-defined problem domains. The ability to explain knowledge and reasoning, often referred to as the explanation facility, is considered to be one of their most powerful components. For example, in a survey capturing the preferences of medical practitioners for KBS design features, Teach and Shortliffe [1981] found them to rate the system's ability to provide explanations of its knowledge and functioning as the primary requirement for a KBS in medicine.
However, while substantial research efforts have been directed to the implementation of explanation facilities and the generation of explanation texts [see Abu-Hakima and Oppacher, 1990 for a comprehensive summary], surprisingly little is known about the behavioral consequences of explanations as part of the KBS interface. For example, given a choice in acquiring explanatory information, not everyone chooses to use this information. It has been argued that the scant attention paid to such phenomena, and to the requirements for KBS explanations in general, is a major reason for the limited impact that knowledge-based systems technology has had on computing in general [Kidd and Cooper, 1985]. Similarly, Carroll and McKendree [1987] argue that far too little behavioral work has been invested in research on the design of advice-giving systems and suggest that it is pointless to build such interface facilities unless "we take into account the behavioral requirements of their usefulness and usability."

1.1 Importance of KBS Explanations Research

Ever since explanations were introduced as part of the interface of MYCIN [Scott, Clancey, Davis, and Shortliffe, 1977], it has come to be taken for granted by the artificial intelligence community that all KBS should provide explanations [Duda and Shortliffe, 1983; Buchanan and Shortliffe, 1984]. However, although there exists a substantial amount of literature on the nature and structure of explanation in the philosophy of science [Toulmin, 1958; Hempel, 1965; Achinstein, 1971], the provision of KBS explanations has not been theoretically or empirically studied thoroughly to date. There exist no theories that address fundamental concerns such as: 1) why should KBS provide explanations, and 2) what benefits are there for users of KBS in making use of the explanations that are provided? As well, there is little empirical evidence to indicate that, when they are provided, KBS explanations are used and useful in problem solving situations.
As a matter of fact, the most troubling aspect of the KBS explanations literature is that the goals for providing explanations are poorly explicated, if specified at all, and it is commonly assumed that KBS explanations are relevant to, and necessary for, all problem-solving situations.

Considering that large amounts of time and system development resources are necessary to implement explanation facilities [cf. AAAI Workshop on Explanations, 1988], it is imperative that we should first seek to understand: 1) the circumstances in which explanations are selected for use, and 2) how the use of explanations affects the user as well as the quality and process of decision making. Such an understanding of both the factors that influence, and the effects of, the use of explanations will be invaluable to KBS developers for making design decisions relating to: 1) the optimal design features and strategies to be used for providing explanations, and 2) the appropriate amount of resources that should be expended on developing explanation facilities.

While dictionary definitions of explaining focus on "making clear or intelligible", "giving meaning to", "accounting for", and "making known in detail" [Oxford, 1984], Hayes and Reddy [1983] argue that humans explain for one or more of the following three reasons: 1) to clarify, 2) to instruct, and 3) to convince. For example, explanations are used to clarify why a particular question is being asked, instruct as to how a particular conclusion is reached, and convince by giving reasons as to why all other alternatives are infeasible. While clarifying, teaching and convincing can reasonably be expected to increase the understanding of the person who is provided with the explanations, the fundamental question of whether such an understanding translates directly into improved problem solving or decision making also needs to be investigated.
It is therefore critical that studies be undertaken to demonstrate how the use of explanations improves the quality of decision making.

The study of explanations is also important from the perspective of human-computer interaction and interface design. It is argued that the exact manner in which KBS explanations are provided will directly influence KBS use, as well as the value that users derive from them. The importance of this is underscored by studies of cognitive learning that suggest that the exact nature and timing of explanatory information made available to learners can be expected to significantly affect both their explanation seeking behaviour, and their understanding and comprehension of the domain being learned. Wensley [1989, p. 261] notes that most of the expert systems that have been produced to date have poor interfaces for explanations. For instance, he observes that:

"it seems to be assumed that providing the users with a single type of explanation is sufficient. Usually the explanation is in the form of a tortuous chain of facts and rules which lead to a particular conclusion. Like a mathematical proof, an explanation of this type is unlikely to inspire confidence. Even more dangerous is the assumption that a single generic explanation can be provided to novice and experts alike. This is clearly unacceptable. There is much research which needs to be done into the nature of explanation from a practical human-computer interface standpoint."

The user interface needs of KBS users are fundamentally different from, and go beyond, those of users of conventional information systems, such as decision support systems. For example, Ye [1990] argues that because traditional programs automate well-specified computational procedures, users are more concerned about operating a system effectively to obtain outputs than about understanding the validity of the system's functioning and the usability of its outputs.
Designers of KBS interfaces, however, must go beyond operating concerns such as ease-of-use, response time, and error recoverability, to consider those interface needs that are likely to affect user understanding and acceptance of the KBS and its outputs [Hendler and Lewis, 1988; Wexelblat, 1989].

1.2 The Approach Taken By This Study

KBS explanations are used in three different contexts: 1) as part of KBS use in decision making, 2) as part of debugging activities carried out by knowledge engineers, and 3) as part of system validation activities carried out by users, domain experts and knowledge engineers. The distinctions between these three contexts are critical and stem from the fact that while explanations are commonly incorporated into most end-user applications of KBS, they can also play a significant role in KBS development. In recognition of this, most current KBS development shells and environments include tools that utilize explanations to aid efficient and effective system development.

The practice of incorporating explanations in end-user applications of KBS is based on the dual beliefs that: 1) intelligent advisory systems should be able to explain themselves like human experts, and 2) explanations increase user confidence and understanding by revealing domain knowledge and internal rules leading to system conclusions. Lamberti and Wallace [1990, p. 302] suggest that an explanation facility is useful at several levels: 1) explanatory information aids the decision maker in formulating problems and models for analysis, 2) it can assure the sophisticated user that the system's knowledge and reasoning process is appropriate, and 3) it can instruct the novice user about the knowledge in the system as it is applied to solve a particular problem.

The logic behind including explanation facilities in KBS shells and development environments is that they provide enhanced debugging and validation abilities in systems development.
For example, the REPORT command in the VPExpert [Paperback Software, 1988] shell lists in sequential order all the explanations attached (using the BECAUSE clause) to rules that 'fired' as part of a consultation. Such a listing assists in the debugging of processing logic by knowledge engineers. It also allows users and domain experts, who may not be familiar with representation schemes and inference engines, to participate in the validation of a knowledge base.

Thus, the use of KBS explanations in systems development is motivated by a different set of objectives than when they are used as part of an end-user application. This study limits itself to the study of KBS explanations that are included as part of end-user applications of KBS.

This study takes a "decision support system (DSS)" approach to the investigation of KBS explanations, i.e., it investigates the use of explanations in the specific situation where a decision maker uses a KBS as a decision support tool to support judgmental decision making under conditions of uncertainty. While KBS are used in various roles, e.g., as decision takers, intelligent assistants, etc., this research focuses on the decision support role of KBS. Thus, KBS explanations are viewed as decision aids that are provided to enhance the quality of decision making in judgmental situations. As well, of the various types of decision situations that exist, e.g., those ranging from unstructured to structured, those with varying numbers of alternatives and/or attributes, those with varying levels of uncertainty, etc., this study focuses on, and is primarily relevant to, judgmental decision making situations under conditions of uncertainty.

The study also takes a cognitive learning approach to KBS explanations. It focuses specifically on the learning that takes place when a decision maker uses KBS explanations, by assuming that users undergo a learning experience when they use a KBS-type decision aid.
Unlike conventional DSS, which generally provide decision makers with information and tools to better analyze a decision problem, KBS that are used as decision aids provide decision makers with specific advice or recommendations. Hollnagel [1987] argues that, if users are to remain responsible for the decisions made, they will not accept advice and recommendations based on reasoning they do not understand. Thus, not only must KBS provide such advice but they must also ensure that the user learns about the basis for the advice. KBS explanations have this critical role of ensuring that users learn about or develop an understanding of the functioning of the KBS. This study focuses on this learning role of explanations. Thus, it heeds Wensley's [1989, p. 251] call for more research into the design of systems to assist the learning process of individuals, by focusing on the learning effects of the use of decision aids.

1.3 Research Questions

By taking a cognitive learning theory based approach to the provision of KBS explanations, this study has the goal of addressing three fundamental research questions:

1) To what extent are explanations used in judgmental decision-making situations supported by the use of a KBS?
2) What factors influence the use of KBS explanations? and
3) Does the use of explanations benefit KBS users in any way?

The objective of the first research question is to uncover what proportion of the explanations provided are actually used. Unlike situations where the KBS is used as a training tool, there is reason to believe that in decision making situations, decision makers will not utilize all the explanations that are provided to them. Considering that the relevance of KBS explanations to such decision situations has not been empirically demonstrated to date, the study will attempt to provide evidence of the extent to which explanations are used in such situations.

With respect to the second research question, the study will specifically investigate the influence of four factors.
First, it will test the influence of the provision of explanations as "feedforward" and "feedback". As will be discussed in Chapter 2, these are two concepts that have been identified, by research based on the cognitive feedback paradigm, as being the primary cognitive learning operators that foster learning in judgmental situations. Second, the influence of user expertise will be investigated. Prior studies of KBS explanations and theories of skill acquisition suggest that there will be expert-novice differences in the use of explanations. Third, the influence of three generic types of explanations that are currently implemented in KBS will be analyzed. These include Why explanations that justify a KBS state or action by revealing the underlying reasons that are based on causal models, How explanations that describe or trace the contents and reasoning of a KBS, and Strategic explanations that clarify overall problem-solving strategy and meta-knowledge. Fourth, the study will examine the relationship between the level of user agreement with recommendations of a KBS and the use of explanations. While prior research has shown that this is a significant relationship, the precise nature of the relationship has not been studied yet.

The third research question is motivated by the desire to understand the benefits that accrue to users of KBS explanations. The effort and cost incurred in the development of an explanation facility can only be justified if there are significant benefits accruing from its use. This question has not been investigated to date, although some researchers have observed that "the effect of explanations on performance is methodologically more difficult to investigate, but is certainly more relevant" [Ye, 1990, p. 164].
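The distinction between the three generic explanation types can be made concrete with a small sketch. This is an illustrative example only; the rule identifier, thresholds, and explanation texts are invented and do not come from the experimental KBS described in this dissertation.

```python
# Illustrative sketch: the three generic explanation types (Why, How,
# Strategic) attached to one hypothetical rule of a financial-analysis KBS.

rule = {
    "id": "LIQ-3",
    "conclusion": "liquidity = adequate",
    "explanations": {
        # Why: justifies the conclusion via underlying causal domain knowledge.
        "why": "Current assets exceeding current liabilities indicates the firm "
               "can meet short-term obligations, so liquidity is judged adequate.",
        # How: traces the contents and reasoning that produced the conclusion.
        "how": "Concluded from: current_ratio = 2.1 (above the 2.0 threshold) and "
               "quick_ratio = 1.3 (above the 1.0 threshold), via rule LIQ-3.",
        # Strategic: reveals the overall problem-solving strategy / meta-knowledge.
        "strategic": "Liquidity is assessed before leverage because short-term "
                     "solvency gates the remainder of the credit evaluation.",
    },
}

def explain(rule, kind):
    """Return the requested type of explanation for a rule's conclusion."""
    return rule["explanations"][kind]

print(explain(rule, "why"))
```

The point of the sketch is that all three texts attach to the same conclusion but answer different user questions: justification (Why), trace (How), and problem-solving strategy (Strategic).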
While Chapter 3 identifies various behavioral, perceptual, learning, and judgmental effects of the use of explanations, this study will focus on the effect of the use of explanations on the accuracy of judgments and on user perceptions of usefulness.

1.4 Organization of the Dissertation

The dissertation is organised as follows. Chapter 2 takes a cognitive learning approach to understanding the reasons as to why explanations are needed in judgmental decision making situations involving the use of a KBS. By using the feedforward and feedback concepts of cognitive learning, it develops a theoretical model for conceptualizing the provision of KBS explanations. As well, it analyzes the roles of feedforward and feedback within the context of the ACT theory of cognitive skill acquisition [Anderson, 1982] and Hogarth's [1981] model of expert judgment. Based on the theoretical model developed in Chapter 2, Chapter 3 classifies the various types of explanations found in current applications of KBS. Additionally, it presents a two-part framework for investigating the use of KBS explanations that identifies both the factors that influence the explanation selection behavior of KBS users, as well as the potential effects of the use of the explanations. It also presents a review of prior studies that empirically investigated issues related to the use of KBS explanations, and proposes specific hypotheses that are empirically tested in this study. Chapter 4 describes the various considerations involved in the selection of the task domain, as well as the development and validation of an experimental KBS that is used to test the hypotheses.
Chapter 5 outlines the research design and operationalization details of the experiment conducted, while Chapter 6 presents the results of the statistical analysis of the data collected. Finally, Chapter 7 discusses the implications of the findings, limitations of the research, its contributions, as well as directions for future research.

CHAPTER 2: A COGNITIVE LEARNING APPROACH TO KNOWLEDGE-BASED SYSTEM EXPLANATIONS

2.0 Introduction

The study utilizes a cognitive learning theory based approach for conceptualizing knowledge-based system (KBS) explanations. Concepts related to the two cognitive learning operators of feedforward and feedback are used to characterize the provision of KBS explanations. The next section presents the reasons for taking a cognitive learning perspective to the study of KBS explanations and distinguishes between two different situations in which learning takes place. The following section details the lens model [Brunswik, 1956] and the cognitive feedback paradigm [Balzer, Doherty, and O'Connor, 1989] that provide a basis for understanding the nature of learning in judgmental situations under uncertainty. Section 2.3 utilizes the two cognitive learning operators of the cognitive feedback paradigm, feedforward and cognitive feedback, to conceptualize strategies for the provision of knowledge-based system explanations. Section 2.4 discusses the role of these strategies in both the acquisition and application of domain expertise in judgmental situations. This is done in the context of Anderson's [1982] ACT theory for the acquisition of domain expertise, and Hogarth's [1987] model of the judgmental process for the application of expertise.

2.1 Explanations and Cognitive Learning

Ever since explanations were introduced as part of the interface of MYCIN [Scott, Clancey, Davis, and Shortliffe, 1977], it has come to be taken for granted by the artificial intelligence community that all KBS should provide explanations [Buchanan and Shortliffe, 1984].
However, although there exists a substantial amount of literature on the nature and structure of explanation in the philosophy of science [Toulmin, 1958; Hempel, 1965; Achinstein, 1971], the provision of KBS explanations has not been theoretically or empirically studied thoroughly to date. There exist no theories that answer fundamental questions such as: 1) why should KBS provide explanations, or 2) what benefits are there for users of KBS to make use of explanations that are provided? Considering that the general aim of cognitive science and artificial intelligence, particularly the study of expertise and knowledge-based systems, is to mimic human behavior and performance [Davis and Lenat, 1983; Hayes-Roth, 1983; Buchanan and Shortliffe, 1984], it is imperative that human explanation should serve as a model for the design of KBS explanation. Unfortunately, however, the study of human explanation, as a psychological or behavioral phenomenon, is itself poorly understood and has not been the focus of much research. The few exceptions, all of which were motivated by the advent of KBS explanation facilities, are Coombs and Alty [1980], Goguen, Weiner, and Linde [1983], and Paris [1987]. In view of this situation, Ye [1990] has proposed a model based on user models of the task domain as a basis for the study of KBS explanation. Prior to the assessment and discussion of what would constitute an appropriate theoretical framework for KBS explanations, it is important to assess the reasons why explanations are used.

2.1.1 Reasons for the Provision of Explanations

Considering that MYCIN introduced and set the standard for the explanation facility being an integral component of the concept of a knowledge-based system, it is important to first comprehend the original motivations of MYCIN's designers.
Buchanan and Shortliffe [1979] incorporated explanations in MYCIN because they felt that providing reasonable explanations, in addition to good advice, was critical for the system to be "acceptable to users." Underlying this concept of user acceptance was their belief that for the system to be acceptable it had to be "understood by clients." Clancey [1985, p. 216] notes that this approach was in stark contrast to common Bayesian programs, which did not capture nor explain an expert's reasoning, and were therefore not easily understood by their users. He further notes that using MYCIN for teaching medical students was consistent with its design goals that its explanations had to be "educational" to naive users. While Buchanan, Shortliffe, and Clancey have not precisely defined what they meant by being "understood by clients" or "educational to users", it is quite clear that MYCIN's explanation facility was incorporated primarily because they saw a "teaching" role for the KBS. The explanations were provided so that KBS users could learn from the use of the system, and it was believed that this could then somehow translate into user acceptance of the system.

This "teaching" role of KBS explanations is exemplified by the close relationship between the concepts of explaining and learning. Explaining is defined as making clear what is not understood, while learning involves gaining an understanding of a skill or a task, etc. [Clancey, 1985]. Thus, explaining a rule from the teacher's perspective is equivalent to understanding it from a learner's perspective. Hayes and Reddy [1983] provide further support for this dual teaching/learning role of explanations by arguing that humans explain for three reasons. First, explanations are given to clarify particular intentions. For example, clarifications are given when learners are unsure about why particular information is being asked for or a specific analysis is being performed. Second, explanations are given to teach or instruct a learner.
Abu-Hakima and Oppacher [1990] argue that for this, it is necessary for the explainer to judge the level of knowledge and understanding possessed by the learner and to construct an explanation to fit that level. Third, explaining is used to convince learners. For this, sound arguments need to be provided that yield supporting information as to why a particular piece of information is required and how a particular hypothesis is true or why competing hypotheses are false. Goguen, Weiner, and Linde [1983] further identify that naturally occurring explanations foster understanding by providing justifications: by giving reasons, giving examples, and eliminating alternatives.

2.1.2 The Learning Versus Working Conflict

Given that explanations are provided primarily for the purposes of fostering understanding or learning, it is important to separate two distinct learning situations in which KBS explanations can be used. First, there is the situation where the only objective is to learn or to gain an understanding. This can be termed the instructional situation, for example when a KBS is used for training purposes. The second situation is where there are other, and usually more important, performance objectives besides the gaining of a better understanding. This can be termed the working situation, for example when a KBS is used for problem-solving or to make particular judgments or decisions. In such situations, the use of explanations to foster an improved understanding of the KBS or its domain is just as important as in the instructional situation. However, it can be expected that the use of explanations will be significantly different. Rational users will only use those explanations that they believe will, by improving their understanding, have a direct impact on their performance objectives.
This highlights a critical, implicit assumption that underlies the provision of KBS explanations in all non-instructional situations and that also explains the incorporation of explanations in MYCIN for the purposes of making it "understandable and educational". This is the assumption that the learning that occurs inevitably in working situations will lead to improved performance. In the use of a KBS, explanations are the primary means by which such learning is facilitated.

A primary characteristic of working situations that has a direct bearing on the use of KBS explanations is termed the learning versus working conflict [Carroll and McKendree, 1987]. It can be described as follows. Users want to use a KBS and its explanations because they want to get some task accomplished. This gives them a focus for their interaction with the KBS and increases the likelihood of their receiving concrete reinforcement from their use of the system. However, this same pragmatism, or focus on task accomplishment, also makes them unwilling to spend much time learning the KBS and its domain on its own terms. After all, to consult on-line, self-instructional explanations is for a time to cease working effectively. There is therefore a conflict between learning and working that is inherent in the use of explanations in working situations. This conflict often leads novice users to try and skip learning altogether by relying completely on the KBS's conclusions without really understanding them. It also causes more experienced users to stagnate in terms of domain expertise. When situations occur that are more effectively described by new explanations, they are more likely to stick to explanations they already know, regardless of their efficacy in terms of explaining new situations.
Thus, these tradeoffs between actual problem-solving and using explanations to improve performance through learning can have the potential effect of undermining the motivation required to learn and to improve users' knowledge and understanding of a domain. This motivational "cost" of learning can, however, be reduced through the design of better explanation facilities and interfaces. These will mitigate the learning versus working conflict by better integrating the cost of learning with the actual use of the KBS for performance. For this, it is imperative that the design of KBS explanations be based on a sound understanding of the learning that occurs in working situations. This is necessary if we are to overcome the trouble users have in learning about the KBS/domain, and to avoid having their knowledge and understanding of the KBS/domain stagnate at the current level. These can be characterized as the "deskilling" effects of the use of KBS.

2.2 Explanations As Cognitive Feedback Within the Lens Model

While there are various theories of learning for instructional situations, such as cybernetic theory [Novak, 1977], assimilation theory [Ausubel, 1963], conversation theory [Pask, 1976], and ACT theory [Anderson, 1982], they shed little light on the learning that takes place in working situations, largely because they fail to consider the learning versus working conflict described in the last section. However, research based on the cognitive feedback (CFB) paradigm [for a summary see Balzer, Doherty, and O'Connor, 1989], which was developed within the framework of the lens model [Brunswik, 1956], is relevant to learning in working situations. While this work has focused on the comparative efficacy of concepts such as outcome feedback, cognitive feedback, and feedforward on the accuracy of judgmental decision-making under conditions of uncertainty, it has also been identified as being "central to the psychology of learning under uncertainty" [Balzer, Doherty, and O'Connor, 1989, p. 410].
Much of this research has used judgmental accuracy as a surrogate for the measurement of learning in a variety of multiple cue probabilistic learning (MCPL) tasks [for reviews see Castellan, 1977, and Slovic, Fischhoff and Lichtenstein, 1977].

2.2.1 KBS Explanations Within The Lens Model

Brunswik [1952] proposed the lens model, depicted in Figure 2.1, as a general model of human behavior in uncertain situations. He argued that in most decision-making situations, there is a sharp distinction between the person making the judgment and the environment. Judgments about events or objects in the task environment must be made by decision-makers in the absence of direct contact with the particular events or objects. For example, most prospective purchasers of a particular stock have to make their judgments and investment decisions without being in direct contact with the particular firm that is represented by the stock.

[Figure 2.1: The Lens Model. The figure depicts the task environment, a set of five cues, and the decision maker (judge), linked by three sets of relations: environment-to-cue relations (Re), judge-to-cue relations (Rs), and cue-to-cue relations (G).]

This separation of the judge from the object or event being judged means that decision-making has to be based on a set of imperfect cues or "lenses" through which the judge views the object or event. These are imperfect in the sense that they are not perfect representations or predictors of the particular object or event. For example, the prospective investor relies upon available financial information such as stock prices, price-earnings ratios, net sales over time, capital structure ratios, etc. as "lenses" through which the firm is understood. The relationships (termed Re) between these financial indices or cues and the firm represented by the stock (the object) are at best imperfect or probabilistic; thus the judgments have to be made in conditions of "uncertainty".
As well, there is a second set of probabilistic relationships in the model. Considering that the various cues comprise a set of related yet different indicators, the relationships between these cues (termed G) are in themselves probabilistic or uncertain. For example, stock prices will contain information that will be redundant with that provided by the price-earnings ratios, while the net sales over time may contain completely different information. As well, while for some industries there may be a positive relationship between price-earnings ratios and capital structure ratios, in others it may be the reverse. There is also a third set of uncertain relationships in the model, that between the cues and the judge (termed Rs). For example, cues, such as financial ratios, will be perceived and used differently by different judges as well as by the same judge at different points in time or under different circumstances. Thus, the relative reliance on different cues can be affected by human cognitive factors such as learning, fatigue, individual differences, etc.

The lens model also suggests that accurate judgment or "achievement", which can be represented by the strength of the correlation between the objects/events and the judgments made by the judge, is a function of the three uncertain relationships. The more knowledge or understanding possessed by a decision-maker about these relationships, the better will be the decisional performance. The model can be summarized in Libby's [1981] simplification of the lens model regression equation that was developed by Hammond, Hursh, and Todd [1964], as presented in Figure 2.2.

ra = Fn {G, Re, Rs}

Legend: ra = achievement
        G  = accuracy of the cue weighting
        Re = predictive ability of the cue set, and
        Rs = predictability of the individual (consistency)

Figure 2.2: Lens Model Equation

The equation states that judgmental performance or achievement (ra) is a function of three variables: G, Re, and Rs. G represents the set of probabilistic relationships between the various cues or "lenses". The more knowledge possessed by the decision-maker about the correct weightings between the cues, the better will be judgmental performance. Re represents the set of relationships between the cues and the object or event in the environment being evaluated or judged. The amount of information or understanding possessed by a decision-maker about the predictability of the cues has a direct bearing on judgmental performance. Rs represents the uncertain relationship between the cues and the decision-maker. The greater the understanding that decision-makers possess about the strengths and weaknesses of their own perceptual and cognitive information processing in relation to the cues, the better will be judgmental performance. This is because such an understanding will enable the decision-makers to apply their knowledge with sufficient consistency and completeness. This is often termed cognitive consistency [Hammond and Summers, 1972].

The pivotal idea underlying the model is Brunswik's assumption that the basic units of cognition, or knowing, are relations [Hammond, McClelland, and Mumpower, 1980]. In an uncertain judgment environment, learning about these different relations will be helpful for improving a decision-maker's understanding of the judgment task, and this is the key to improved judgmental performance. The lens model equation also precisely specifies how learning about these relations will lead to more accurate judgmental performance or achievement. Its terms correspond to the components of the lens model, i.e., the environment, the cues, and the judge. The model can therefore be restated as follows: "Achievement is a function of knowledge, calibration, and cognitive consistency." Knowledge relates to the understanding of the object or event in the task environment.
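The quantities in the lens model equation can be made concrete with a small simulation. This is an illustrative sketch only, not part of the dissertation's method: the cue weights, noise levels, and sample size are invented, and linear cue use is assumed on both the environment and judge sides.

```python
# Illustrative sketch (invented weights and noise): estimate the lens-model
# quantities from simulated paired environment/judgment data. Re is how well
# a linear composite of the cues predicts the criterion, Rs is how
# consistently the judge's judgments follow his or her own linear policy,
# G is the match between the two weighting schemes, and ra is achievement
# (the criterion-judgment correlation).

import random
import statistics

random.seed(0)

def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

n = 200
cue1 = [random.gauss(0, 1) for _ in range(n)]
cue2 = [random.gauss(0, 1) for _ in range(n)]

# Environment: criterion depends on the cues plus noise (so Re < 1).
criterion = [0.7 * a + 0.3 * b + random.gauss(0, 0.5) for a, b in zip(cue1, cue2)]

# Judge: uses somewhat different weights, plus inconsistency (so Rs < 1).
judgment = [0.5 * a + 0.5 * b + random.gauss(0, 0.7) for a, b in zip(cue1, cue2)]

# Linear composites of the cues under each side's weights.
env_pred   = [0.7 * a + 0.3 * b for a, b in zip(cue1, cue2)]
judge_pred = [0.5 * a + 0.5 * b for a, b in zip(cue1, cue2)]

Re = corr(env_pred, criterion)    # predictive ability of the cue set
Rs = corr(judge_pred, judgment)   # consistency of the judge
G  = corr(env_pred, judge_pred)   # accuracy of the cue weighting
ra = corr(criterion, judgment)    # achievement

print(f"Re={Re:.2f}  Rs={Rs:.2f}  G={G:.2f}  ra={ra:.2f}")
```

With independent noise on the two sides, achievement comes out close to the product G·Re·Rs, which illustrates the equation's claim that accuracy of cue weighting, cue predictability, and judge consistency jointly bound judgmental performance.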
Information can therefore be provided to make the judge more knowledgeable. Calibration relates to the understanding of the accuracy of the match between the cue weightings perceived by the judge and those that truly exist in the task environment. Information can therefore be provided that will make the judge more calibrated. Similarly, cognitive consistency, which is the understanding of the judge's cognitive decision strategy, can also be improved; information can be provided that will make the judge more consistent in the application of his or her strategy.

The lens model is not a theory of learning for instructional situations and does not offer any prescriptions as to how learning can be fostered in those situations. Rather, it is a model of the structure of judgmental problem-solving (working) situations, and it specifies precisely the three types of relationships that decision-makers need to understand, have clarified, or learn about during such situations. If information about these relationships is provided to decision-makers during problem-solving, it will lead to improved judgmental performance. The challenge to developers of decision aids is to determine precisely how and when this information should be provided, during a problem-solving session involving the use of such decision aids, to maximize the understanding of the user. In the case of a KBS being used as a decision aid in a working situation, the explanation facility can make these three kinds of information available to users. As an example, consider the case of a KBS that performs financial statement analysis. First, explanations that clarify why a particular ratio (cue), such as the debt-equity ratio, is related to an aspect of the firm's financial position, e.g., capital structure, will represent the provision of Re information, i.e., it will facilitate the understanding of the relationship between the cue and the environment.
Second, another set of explanations may explain how various liquidity ratios, such as the current ratio, inventory turnover, days sales in receivables, etc., are combined to evaluate the various sub-aspects of liquidity analysis. This represents the provision of G information, and it will facilitate the understanding of the relationships between the various cues used to represent liquidity. Third, by keeping track of the ratios that are used by users in performing a particular analysis and matching these usage patterns to predefined profiles of typical users, the explanation facility may provide Rs explanations by revealing information about the cues that are most important to such users. These will facilitate the learning of the relationships between the cues and the judge.

In summary, the lens model provides guidelines as to three specific types of information that can be provided as explanations to foster learning in judgmental problem-solving (working) situations. A theoretical basis for exactly how and when this explanatory information can be provided, and prescriptions as to the relative efficacies of the three kinds of information, is provided by the cognitive feedback paradigm that has evolved out of the lens model.

2.2.2 A Cognitive Learning Basis for the Provision of KBS Explanations

Research into cognitive learning operators has been motivated by the desire to provide a psychological answer to the following question: "How might persons learn to improve the accuracy of their judgments?" [Balzer, Doherty, and O'Connor, 1989, p. 10].
To a great extent, this question has arisen out of concerns that people have difficulty inferring environmental relationships from unaided experience [Brehmer, 1980] and, more generally, that people are limited in their ability to process information in uncertain environments [Nisbett and Ross, 1980; Kahneman, Slovic and Tversky, 1982]. The traditional answer to this question has been "provide knowledge of results", or what is commonly termed the feedback cognitive learning operator. This represents an extension, to the learning of cognitive skills, of the critical role that feedback has traditionally played as a means of reinforcement in the learning of physical skills, e.g., the notion of reinforcement schedules in Skinner's philosophy of radical behaviorism and Pavlov's notion of classical conditioning in the learning of reflexes or respondent behaviors [Bower, 1981]. These early conceptualizations of feedback were oriented towards outcome or performance reporting for purposes of facilitating control. For example, Annett [1969] asserts that "at its most basic level feedback is information received by an individual about his or her past behaviour." It provides an evaluation of the correctness, accuracy or adequacy of the individual's response. This reflects the influence of cybernetic theorists [e.g., Wiener, 1948] who introduced the concept of the feedback loop to the behavioral sciences. Such feedback loops are considered to be central to the ability of self-regulating systems to correct deviant behaviour in reaching stable states. As the primary focus of this feedback was on reporting the outcomes of actions, it has generally come to be termed outcome feedback (OFB). It has been asserted that outcome feedback enables individuals to understand and improve their judgments, improve their expertise in judgment tasks, and reduce commitment to incorrect judgment strategies [Hogarth, 1981].
As well, it has often been provided to people on the assumption that it will lead to improved decisions.

However, various reviews of studies that have used the multiple-cue probability learning task paradigm to investigate the effects of outcome feedback on judgmental decision-making under uncertainty reveal that outcome feedback is generally ineffective in fostering learning and improved judgmental performance [Hammond, Stewart, Brehmer and Steinmann, 1975; Brehmer, 1980; Hoffman, Earle, and Slovic, 1981]. The consensus is that, with the exception of simple tasks such as those involving two-cue linear function relationships, individuals do not learn from OFB in complex, uncertain tasks. On the contrary, there is evidence that OFB was detrimental to cognitive learning in certain tasks [Hammond, Summers, and Deane, 1973]. There are several reasons that explain this phenomenon. First, when the relationship between cues and criterion is probabilistic, the erroneous information (noise) in outcome feedback results in the lack of cognitive control (Rs in the lens model equation) [Libby, 1981, p. 29]. The constant adjustments made by learners to the random error variances in outcome feedback lead to them not being able to apply their own knowledge of judgment strategies in a consistent manner. Second, outcome feedback fails to provide sufficient task information to enable decision-makers to form a suitable model of the environment [Brehmer, 1987; Sterman, 1989]. Third, in the case of dynamic decision-making situations, it does not allow decision-makers to correctly perceive key relationships and significant changes in the system [Brehmer, 1990; Sengupta and Abdel-Hamid, 1991].
Additionally, Hammond [1971] has also demonstrated that some complex tasks may not be learned at all when only outcome feedback is provided.

Considering that multiple-cue probability learning tasks are similar to more complex diagnostic tasks, in the sense that both require individuals to make holistic judgments based on a number of multiple-cue decision profiles in uncertain conditions, these results for OFB have serious implications for KBS. Most importantly, they provide a theoretical basis for the belief held by the KBS community, and by the original developers of MYCIN, that KBS must provide other output, such as explanations, in addition to outcome recommendations. This is especially critical when they are used as decision aids. In relation to the deskilling effect of the learning versus working conflict that was discussed in Section 2.1, it implies that if regular users of KBS do not want their domain expertise to stagnate at a mediocre level, they must use explanatory information provided by the KBS to foster increased learning and not just rely on the system's recommendations. This lends theoretical support to Southwick's [1991, p. 14] assertion that "we have to move from an expert system that produces a solution as a result, to a consultation system where the entire consultation is the result. Explanation has to progress from the verification of an expert system's solution, to being the solution itself."

2.2.3 The Feedforward and Cognitive Feedback Learning Operators of the Cognitive Feedback Paradigm

The deficiencies of outcome feedback as a cognitive learning operator in judgmental decision-making have led to a search for alternatives to it. This search has resulted in a widening of the feedback concept. Subsequent work has revealed another dimension of feedback. Ilgen et al. [1979] refer to the "information value" of feedback, which they say depends on the incremental increase in knowledge about performance that the feedback provides the recipient.
This is consistent with the literature on the decision theoretic value of information [Hilton, 1981] and extends the earlier definition of feedback from the notion of "accuracy of performance for cybernetic control" to "knowledge about performance." As well, it explicitly recognizes the role of cognitive learning in improving judgmental performance in working situations. The two distinct types of feedback can therefore be recognized as providing either 1) outcome variance information for calibrating performance (OFB), or 2) information increasing the understanding and comprehension of the recipient. The study of the latter has come to be termed the cognitive feedback paradigm [Todd and Hammond, 1965], and the lens model [Brunswik, 1956] has been used to understand the specific types of information that, by increasing the understanding of decision-makers, improve the accuracy of judgmental performance. This paradigm proposes the two cognitive learning operators of cognitive feedback (CFB) and feedforward (FF) as alternatives to outcome feedback. Theoretical support for these originates in Bjorkman's [1972] analysis of feedforward and cognitive feedback in the study of cognitive learning processes. He identified them as being the two critical "cognitive learning operators" that facilitate improved understanding and comprehension. As well, he argued that while both have the same function, namely to reduce uncertainty about the task, there are also critical distinctions between them.

2.2.3.1 Components of Cognitive Feedback and Feedforward

The lens model equation, discussed in Section 2.2.1, specified the three kinds of information that need to be provided to decision-makers to help them improve judgmental performance by increasing their understanding of the judgmental task.
The cognitive feedback paradigm takes these to be the precise components of both the cognitive feedback and feedforward learning operators. They are conceptualized as being information provided to the decision-maker about (a) the relations in the decision environment, (b) relations perceived by the decision-maker about that environment, and (c) relations between the environment and the decision-maker's perceptions [Balzer et al., 1989]. Such information seeks to improve decision making by enhancing a decision maker's understanding of the task structure, his or her cognitive system, and the fit between the two [Hammond, Stewart, Brehmer and Steinmann, 1975]. While these correspond respectively to the Re, Rs, and G types of information of the lens model equation, in the language of the cognitive feedback paradigm they are termed task information (TI), cognitive information (CI), and functional validity information (FVI), respectively.

Numerous studies have compared the relative value of the three types of information that can be provided as feedforward or cognitive feedback [see Balzer, Doherty, and O'Connor, 1989 for an extensive summary]. The results indicate that task information (TI) is the most effective amongst them in fostering the learning of accurate judgmental performance [Newton, 1965; Adelman, 1981; Hoffman et al., 1981]. The provision of cognitive information (CI) has little effect either in itself or as an addition to task information [Schmitt et al., 1976]. The unique effects of functional validity information (FVI), however, have not yet been adequately studied for any consensus to emerge [Balzer et al., 1989]. These results have serious implications for the provision of KBS explanations. They suggest that task information should be given the greatest priority for inclusion in explanation facilities. The explanations provided by MYCIN, as well as by most other currently implemented KBS, do focus on the provision of task information.
As well, the finding that CI is ineffective in fostering learning suggests that intelligent explanation facilities that provide customized explanations to users based on user models of the task [Rich, 1983; Ye, 1990; Southwick, 1991, p. 8] are not likely to be effective. However, some researchers argue that while CI may be ineffective in fostering learning in judgmental situations, its value in limiting the dysfunctional effects of poor cognitive control increases as the task environment becomes richer and more complex [Sengupta and Te'eni, 1991].

2.2.3.2 Differences Between Cognitive Feedback and Feedforward

While TI, CI, and FVI can be provided as part of both the feedforward and cognitive feedback learning operators, there are critical distinctions between the two. Cognitive feedback generally focuses on providing TI, CI, and FVI information that clarifies case-specific outcome feedback, i.e., it uses outcome feedback (OFB) as the starting reference point for improving the decision-maker's understanding of the task. Feedforward, on the other hand, is not case-specific and is generally unrelated to the outcomes of the specific case (OFB) being considered [Bjorkman, 1972]. It is for this reason that the terminology of the CFB paradigm, while carefully distinguishing between outcome feedback and cognitive feedback, does not decompose feedforward information in a similar fashion. All feedforward information is, by definition, "cognitive feedforward," i.e., it has the objective of improving understanding of the task by providing TI, CI, and FVI information to reduce task uncertainty. Bjorkman [1972] argues that knowledge learned through feedforward will be more accurate and consistent since it does not suffer from the various errors and biases associated with the trial-by-trial transmission of cognitive feedback information through outcome feedback.

Another critical difference focuses on the temporal order in which they are provided.
By definition, feedforward is always provided prior to task performance, while feedback is always presented on the completion of the task and subsequent to the specific outcome feedback with which it is associated [Bjorkman, 1972]. Largely focusing on this distinction, some researchers have chosen to operationalize feedforward as training provided prior to task performance. For example, Malloy, Mitchell, and Gordon [1987] provided subjects with a set of heuristics for effectively performing the task, and Sengupta and Abdel-Hamid [1991] used an hour-long training session on Brooks's Law of project management. This temporal distinction between feedforward and cognitive feedback can be used to identify advantages of one over the other. Skinner's [1968] argument that for "individuals to learn from explanatory information that is provided to them, they must be able to associate it with their actions," suggests that cognitive feedback will be superior to feedforward. Being provided just after the outcomes of an individual action become known, it should be easier to associate and will therefore facilitate greater understanding of the task. Along these same lines, Anderson et al. [1989] assert that since cognitive feedback is provided just after outcome feedback, it eliminates the memory burden of holding all the information that is relevant to a whole range of possible outcomes that may result from a particular action. Being provided before task performance, information provided as feedforward adds to this memory burden. On the other hand, however, it can be argued that feedforward relieves the learner of a certain amount of cognitive strain, as the information that is primed in memory during task performance will allow the learner to better understand the task requirements during problem-solving.
This is supported by Bjorkman's [1972] assertion that feedforward in effect reduces the cognitive load on a subject because it provides, prior to task performance, a large amount of the information that the subject would otherwise have to infer from experience, i.e., outcome feedback.

A third distinction between feedforward and cognitive feedback is that they focus on different sets of cues. Generally, feedforward is information that relates to the cues that serve as input variables in a task, while feedback is information that relates to the outcomes of the task that is performed. To a large extent, this distinction arises from the fact that cognitive feedback is outcome-specific while feedforward is unrelated to outcomes and is provided at the start of task performance [Bjorkman, 1972]. Cognitive feedback uses the clarification of case-specific outcome cues (OFB) as a starting point and provides information that traces the reasoning backwards to the input cues. Feedforward information, on the other hand, focuses on the clarification of the input cues and traces the reasoning forwards to the outcome cues.

Thus, while feedforward and cognitive feedback are distinctive concepts, they are by no means mutually exclusive. Rather, they have a compensatory relationship to each other in relation to fostering learning in working situations [Bjorkman, 1972]. There is less to gain from feedback explanations when feedforward explanations are clear and comprehensive, and vice versa. This is not unlike the tradeoff between the concepts of learning-by-being-told (feedforward) and learning-by-doing (feedback) in skill acquisition. Bjorkman [1972, p.
156] further notes that "there is at present an almost complete lack of data on the interplay between feedforward and feedback in producing knowledge (learning) and forming policies (action)."

2.2.3.3 Effectiveness of Feedforward and Cognitive Feedback In Fostering Learning

A large number of studies using a wide variety of tasks have evaluated the efficacy of cognitive feedback in terms of how effectively subjects learn to improve the accuracy of judgmental performance. Balzer, Doherty, and O'Connor [1989] provide an extensive review of them and note that the bulk of these studies have used: (a) the outcome feedback condition as the control group, and (b) the lens model measures of achievement (judgment accuracy), knowledge, and cognitive consistency as dependent measures. There was considerable evidence that cognitive feedback, especially in the form of task information (TI), had a significant effect on such learning [Steinmann, 1974; Schmitt et al., 1976; Neville, 1978; Hoffman et al., 1981; and Adelman, 1981]. Generally, both knowledge and cognitive consistency improved as a result of the use of cognitive feedback and led to improvements in the accuracy of judgments. As well, Lindell [1976] found a significant cognitive feedback by task complexity interaction --- as task complexity increased, cognitive feedback led to significant improvements in achievement, knowledge and cognitive consistency. These results from psychological studies that used generic multiple-cue probabilistic learning tasks have also been confirmed in what Libby [1981, p. 30] terms "meaningful tasks," i.e., tasks that incorporate significant contextual empirical referents, e.g., Harrell [1977]; Regel [1985]; Luckett & Eggleton [1991]. As well, Kessler and Ashton [1981] found a significant effect for task properties (TI) cognitive feedback on the learning of the financial analysis task, which Libby [1981, p.
131] considers "a meaningful task where both theoretical and empirical relationships exist." Balke et al. [1973] had pairs of management and union negotiators make (a) judgment ratings of the acceptability of hypothetical labour contracts, and (b) predictions of their counterparts' ratings, before and after the provision of cognitive feedback. Their results also confirm the effectiveness of cognitive feedback. Interestingly, the negotiators who received cognitive feedback reported that it gave them insights into their own and their counterparts' judgmental processes, and went on to suggest that it would be valuable to receive such information at the start of the negotiation process (feedforward).

There is also evidence of the effectiveness of feedforward [Steinmann, 1976; Galbraith, 1984; Sengupta and Abdel-Hamid, 1991]. It is, however, less conclusive, largely because feedforward has been studied to a lesser extent. This is because it is a much more difficult construct to operationalize than cognitive feedback. Cognitive feedback, which seeks to explain the outcomes of a specific problem-solving case, is by definition more focused and therefore easier to operationalize, both in the context of the multiple-cue probability learning tasks that have been used in laboratory settings and in the more realistic "meaningful tasks." As a result, only a handful of studies have attempted to directly compare the relative efficacy of feedforward to that of cognitive feedback. Of the three types of information (TI, CI, and FVI), all these studies used task information to operationalize both feedforward and cognitive feedback. Using multiple-cue probability learning tasks, both Galbraith [1984] and Steinmann [1976] found that feedforward was just as effective as cognitive feedback in fostering higher achievement (the learning of judgmental accuracy) and knowledge.
For cognitive consistency, however, both studies found that feedforward was superior to cognitive feedback, in that it allowed subjects to better understand and control the execution of their problem-solving strategies. Steinmann [1976] also reported that feedforward was just as effective as cognitive feedback in simple as well as complex tasks. In reviewing these studies, Balzer et al. [1989, p. 423] have observed that because feedforward, which is provided prior to task performance, is just as effective as cognitive feedback, it will place a ceiling on the potential improvement that can be expected from the additional provision of cognitive feedback. Along these same lines, Steinmann [1976] suggests that future studies should attempt to "ascertain the point at which cognitive feedback is required in addition to feedforward." However, as neither of these studies manipulated the absence or presence of feedforward, but rather focused on comparing it to cognitive feedback, the unique contribution of feedforward cannot be isolated. Studies conducted using a wide variety of realistic tasks have demonstrated that feedforward, operationalized as the pre-task provision of problem-solving heuristics or training sessions, has a significant effect on learning and is more effective than outcome feedback [Cats-Baril and Huber, 1987; Malloy et al., 1987; Robertson, 1987].
However, Sengupta and Abdel-Hamid [1991] found that it was less effective than cognitive feedback in the context of a software project management task.

In summary, there is evidence that the provision of feedforward and cognitive feedback is effective in fostering learning in working situations, especially in comparison to outcome feedback. However, their relative efficacies, and the interplay between them in these situations, are less well understood.

2.2.4 The Cognitive Feedback Paradigm and the Provision of KBS Explanations

The merger of lens model judgment research and the psychological analysis of cognitive learning operators has led to the term CFB paradigm "achieving the status of a technical term in the literature" [Balzer, Doherty, and O'Connor, 1989]. It has been extensively studied in numerous laboratory and applied judgment situations, and its concepts have been borrowed by various other domains, e.g., the "outcome feedback" (OFB) and "cognitive feedback" (CFB) concepts of Social Judgment Theory [see Hammond, McClelland and Mumpower, 1980], which explains how individual decision-makers take into account the outcomes of decisions made by others.

In the mainstream management decision making literature, the distinction between outcome and cognitive feedback and their relationship to learning is also understood and recognized. For example, Argyris's [1982] characterization of double-loop learning is related to the notion of incorporating cognitive feedback loops for increasing management's understanding and comprehension of a decision situation. Within the context of decision support systems (DSS), however, it has had a lesser impact. For example, Te'eni [1990] asserts that feedback is information that 1) is provided by the DSS to the user, 2) concerns the current or previous decision-making process, decision, or outcome, and 3) is designed to help the user adjust the decision-making process to better accomplish the decision task.
This definition focuses primarily on outcome feedback for streamlining decision-making performance, and fails to recognize the cognitive learning effects of feedback. DSS researchers are now beginning to study its implications. Examples of these include the impact of alternative feedback strategies in dynamic decision making [Sengupta and Abdel-Hamid, 1991] and the role of cognitive feedback in improving control and convergence in group decision support systems [Sengupta and Te'eni, 1991].

It can be argued that the cognitive feedback paradigm is especially relevant to the development of KBS. This is because KBS go beyond the provision of outcome feedback, in the form of intermediate and final recommendations, to provide explanations. The provision of explanations, such as the Why, How, What, and Strategic explanations [Wick and Slagle, 1989], represents the provision of cognitive feedback. As was discussed in Section 2.1, the provision of KBS explanations has the objective of increasing user understanding and comprehension in working situations. This is also precisely the objective of the provision of feedforward and cognitive feedback in judgmental decision-making situations, and the reason why they have been so extensively studied. The distinction between outcome feedback and cognitive feedback is also evident in the case of KBS. For example, consider the case of a KBS that provides advice as to the selection or rejection of loan applications. The specific advice offered by the system is outcome feedback. Any other information provided to explain the KBS's reasoning or the task domain is either feedforward or cognitive feedback, e.g., how the outcome was reached, what its implications are, and why it was necessary to consider certain input variables or sub-analyses. For these reasons, this study uses the two cognitive learning operators of the cognitive feedback paradigm to conceptualize a theoretical basis for the manner in which KBS explanations can be provided.
This feedforward and feedback model for the provision of KBS explanations is expounded in the next section.

2.3 Feedforward and Cognitive Feedback Strategies for the Provision of KBS Explanations

This section considers the role of feedforward and cognitive feedback as explanations that are provided within a model of the typical user-KBS interaction. Figure 2.3 provides a conceptualization of such an interaction.

[Figure 2.3: Typical User - Knowledge-Based System Interaction. SO = System Operation; IE = Information Exchange (System Presentation + User Action)]

It can be viewed as an alternating series of two different kinds of stages. These are information exchanges, where the user and the KBS exchange information, and system operations, where the KBS "goes away" from the interaction to perform some analysis or function. Human-computer interface designers have generally focused on optimizing the information exchanges that take place. Other designers of hardware and software generally try to minimize the system's "going away to work" time, as it represents a wasted or inactive period from the user's perspective. As well, each information exchange involves a pairing of the system making a presentation and the user undertaking some action, and vice versa. These are often respectively termed the presentation language and the action language in the study of human-computer interaction [Bennett, 1983, p. 45]. System presentations generally involve the KBS asking for input data or control instructions and displaying questions, explanations, user-action options, and system conclusions. User actions generally include initiating a consultation, providing domain facts or input data, requesting explanations, selecting options, and reading system presentations.

According to this conceptualization, explanation provision is just one aspect of the total user-KBS interaction. It represents an information exchange, rather than a system operation, with the user requesting an explanation and the KBS providing it.
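The alternating exchange/operation structure described above can be sketched in code. The sketch below is purely illustrative --- the function, enum, and event strings are hypothetical names of our own, not part of the thesis or of any implemented KBS --- and shows the two junctures at which explanations can be attached to a system operation: the information exchange before it (feedforward) and the one after its outcome is displayed (feedback).

```python
from enum import Enum

class Strategy(Enum):
    """Whether explanations are offered before a system operation,
    after it, or at both junctures."""
    FEEDFORWARD_ONLY = 1
    FEEDBACK_ONLY = 2
    BOTH = 3

def consultation_trace(operations, strategy):
    """Model a consultation as alternating information exchanges (IE)
    and system operations (SO), recording where explanations appear."""
    events = []
    for op in operations:
        # IE preceding the SO: input gathering, plus any feedforward
        # explanation of the inputs the operation will use.
        events.append(f"IE: user supplies input for '{op}'")
        if strategy in (Strategy.FEEDFORWARD_ONLY, Strategy.BOTH):
            events.append(f"IE: feedforward explanation of '{op}'")
        events.append(f"SO: system performs '{op}'")
        # IE following the SO: outcome feedback first, then any
        # feedback explanation clarifying that outcome.
        events.append(f"IE: outcome of '{op}' displayed")
        if strategy in (Strategy.FEEDBACK_ONLY, Strategy.BOTH):
            events.append(f"IE: feedback explanation of outcome of '{op}'")
    return events

trace = consultation_trace(
    ["credit-history analysis", "loan recommendation"], Strategy.BOTH)
```

Under the both-junctures condition each system operation contributes two explanation events, one on each side of the operation; under the single-juncture conditions it contributes one.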
As well, based on Coombs and Alty's [1980] finding that human explanation-seeking behavior is context-driven, it can be assumed that explanations that are requested or provided will usually relate to the particular aspect of the overall task that is currently being resolved by the KBS. The resolution of these sub-tasks is represented by the system operations in Figure 2.3. Therefore, explanations relating to a particular system operation can be provided at two junctures. They can be provided as part of the information exchange preceding the particular system operation or as part of the information exchange directly following its completion (see Figure 2.4). In the former case it can be termed feedforward explanation provision and in the latter case feedback explanation provision.

[Figure 2.4: Alternative Explanation Provision Strategies]

Considering the distinctions between the feedforward and cognitive feedback learning operators that were discussed in Section 2.2.3.2, there are three fundamental differences between feedforward and feedback explanation provision. First, feedforward explanations are not case-specific, while feedback explanations are focused on case-specific outcomes. Second, feedforward explanations are made available prior to the KBS performing a particular system operation, while feedback explanations are provided subsequent to the system operation's conclusions (outcome feedback) being displayed by the KBS. Third, feedforward explanations relate to the inputs used in the particular system operation, while feedback explanations focus on clarifying the outcomes of that system operation.

This cognitive learning conceptualization of the provision of KBS explanations means that KBS designers have a choice of three possible strategies that can be used to provide explanations in an effort to increase user understanding during the use of the KBS in working situations.
These explanation provision strategies are: 1) provide only feedforward explanations, 2) provide only feedback explanations, and 3) provide both feedforward and feedback explanations. As well, these three strategies are pertinent to all the various types of explanations that are currently provided by KBS, e.g., the Why, How, and Strategic explanations. Each of these explanations can be provided in both feedforward and feedback forms as part of the three strategies. However, it can be expected that for each type of explanation, one of the three explanation strategies will be more effective than the others in increasing user understanding and comprehension. This suggests that in working situations involving the use of a KBS, the specific nature and timing of explanation provision will directly influence the type of explanation that is selected for use by KBS users. This issue will be further pursued in the next chapter.

In summary, just as outcome feedback is the primary learning operator for the acquisition of physical skills, e.g., learning to ride a bicycle, cognitive feedback and feedforward explanations are the primary learning operators for learning during judgmental decision making involving the use of a KBS. While the literature is equivocal in terms of the comparative value of feedforward and cognitive feedback, there are reasons to believe that one is more effective than the other at different stages of the cognitive skill acquisition process. For example, Steinmann [1976] found that more experienced judges used cognitive feedback to improve their judgments to a greater degree than did less experienced judges. Their comparative value can also be expected to differ during the various stages of the application or use of cognitive skill.
These aspects are discussed in the next section.

2.4 The Role of Feedforward and Feedback In the Acquisition and Application of Expertise

This section attempts to understand the roles that feedforward and cognitive feedback explanations play at the various stages of the judgmental process. As well, it analyzes how their effectiveness varies in relation to the level of expertise that is possessed by decision makers.

2.4.1 Role of Feedforward and Feedback Explanations In Hogarth's Model of Judgment (Cognitive Skill Application)

Explanatory information plays a central role in the acquisition and application of domain expertise in judgmental situations [Hogarth, 1981]. This is evident if one considers the different roles of feedforward and feedback explanations within a modified version of Hogarth's [1987] model of judgment that is presented in Figure 2.5. The model views judgmental decision making as the application of cognitive skill or expertise, taking place within a system composed of three elements --- the decision maker, the task environment, and the actions that result from judgment and that can subsequently affect both the decision maker and the task environment.

At the cognitive level, the exercise of judgment that leads to behavioral action can be decomposed into three operations, as depicted in the model. These are: 1) the acquisition of the expertise schema, 2) the retrieval and processing of the expertise, and 3) the judgment itself. This decomposition is similar to the computational theory of the mind [Marr, 1982], which treats the brain as an information processing system [Newell and Simon, 1972], and is consistent with Payne's [1982, p. 386] characterization of the decision making process as being composed of the three subprocesses of feedback/learning, information acquisition, and information evaluation. Additionally, the exercise of judgment takes place within a system of two subsystems.
[Figure 2.5: Role of Explanations in the Use of Expertise]

First, there is the task environment within which the decision maker makes the judgment. Second, there is the decision maker's schema, comprising declarative knowledge and problem-solving strategies. This schema symbolizes the decision maker's knowledge and beliefs concerning the task environment and his or her representation of it [Norman, 1983; Gentner and Stevens, 1983].

Feedforward and cognitive feedback explanations, together with outcome feedback, are critical to the carrying out of all three operations --- acquisition, processing, and judgment. While the provision of cognitive feedback and feedforward explanations directly impacts the individual's schema or representation of expertise, the impact of the outcome feedback loop is limited to the task environment. Thus, feedforward and cognitive feedback explanations are the primary operators in the acquisition operation. These explanations directly provide schema-shaping information to the decision maker. Otherwise, he or she would have to resort to inferring it indirectly by studying the impact of outcome feedback on the task environment. This would be an inefficient form of learning, as a large number of trials or practice runs would be necessary. As well, such learning is impeded by a variety of cognitive biases, such as the availability heuristic, the representativeness heuristic, the law of small numbers, anchoring and adjustment, and illusory correlation [see Hogarth, 1987, pp. 166-170, and Kahneman, Slovic and Tversky, 1982, for complete reviews]. There is also no guarantee that the multiple outcome feedback loops will facilitate the acquisition of all the different kinds of knowledge that constitute the schema. This is especially the case for the acquisition of structural and strategic meta-knowledge.
It is argued that explanations provided in the form of feedforward or cognitive feedback will be much more efficient for acquiring these kinds of knowledge than such trial-by-trial learning. This highlights the critical role that the explanations play in the efficient and effective acquisition of expertise schemas.

During the schema processing operation, domain expertise is retrieved and applied to resolve problems. Here the three learning operators serve different but critical functions. The outcome feedback loop provides the data for 'pattern-matching' with knowledge stored in the person's schema. Feedforward and cognitive feedback explanations, on the other hand, by fostering understanding and comprehension of the domain knowledge, facilitate efficient retrieval of expertise from long-term memory by directly reshaping the decision-maker's schema. This schema acts as a filtering and guiding aid in the retrieval and application of the relevant knowledge. The more developed the schema of the decision maker, the more likely it is that unguided traversal of long-term memory and exhaustive pattern-matching in short-term memory will be avoided during this schema processing operation. The strong priming effects of feedforward explanations, which are provided prior to task performance, will also facilitate schema processing during the judgmental process.

The output of the processing operation leads to some action being taken within the task environment. The consequences of this action impact both the task environment and the decision maker's schema. Outcome feedback impacts the decision maker's expertise schema only indirectly, through observation of the calibration that goes on in the task environment as a result of the outcome feedback. This is inevitable, as the task environment incorporates the schema within itself. Cognitive feedback explanations, on the other hand, do not provide the basis for a change in the task environment.
Rather, by providing explanatory information which is relevant to and triggered by the outcome, they directly reshape the decision maker's schema through improved understanding. Similarly, feedforward explanations also shape the schema directly without impacting the task environment. Unlike cognitive feedback explanations, however, they are not triggered by the outcome, nor do they attempt to explain the relationship of the outcome to the schema. Rather, they are triggered directly from outside the task environment and are independent of any particular outcome.

2.4.2 Role of Feedforward and Feedback Explanations In the ACT Theory of Cognitive Skill Acquisition

In an investigation of human face-to-face advisory service interactions, Coombs and Alty [1980] found that the expertise level of advice-seekers was the major determinant of 1) the specific nature of such advisory interactions, and 2) the explanations and advice sought by users and provided by the experts. This finding suggests that significant expert-novice differences can therefore be expected in relation to the use and the effectiveness of feedforward and cognitive feedback explanations. It is therefore useful to consider the different roles of feedforward and feedback explanations within the context of the ACT theory of skill acquisition [Anderson, 1982]. This theory views the cognitive skill learning process, i.e., the acquisition of judgmental expertise, as a sequence of phases termed the declarative phase, the knowledge compilation phase, and the procedural phase. Generally, individuals at the initial declarative phase can be regarded as novices in the particular task domain, while those at the final procedural phase can be regarded as experts. Those at the intermediate knowledge-compilation phase can be considered apprentices.

During the declarative phase, the individual's initial performance involves the operation of general strategies utilizing declarative knowledge to guide performance.
The individual learns from instruction and from the observation of the consequences of actions undertaken (outcome feedback). Feedforward explanations can serve to provide the individual with the requisite declarative knowledge and the initial problem-solving strategies at this phase. Feedforward explanations that clarify the reasons for the use of particular input data, and the manner in which these input data are structured, can be expected to be especially effective. As well, explanations that provide an understanding of the declarative knowledge relating to the various entities and input variables involved, e.g., definitional information, can be expected to be effective.

The knowledge compilation phase involves the conversion of slow and conscious declarative knowledge interpretations into faster compiled procedures. This is where the individual begins to develop an overall conceptualization (model) of the domain expertise, and where cognitive feedback explanations can be expected to be of greatest value to the individual from a learning perspective. For example, feedback explanations that explain how conclusions are reached and clarify the implications of these conclusions will facilitate the acquisition of the efficient compiled procedures by fostering a greater understanding of the relevant knowledge structure and inference procedure. Other explanations that describe the meta-knowledge and control strategies used to solve the task will expedite the acquisition of the overall model of the domain, including the relationships between the various components and the ordering or structuring amongst them. These explanations will be just as effective irrespective of whether they are provided as cognitive feedback or feedforward.

At the final procedural phase, the associations between the components of the individual's conceptual model become stronger and there is no longer the need to consciously initiate each separate action.
The newly acquired models are refined and tuned through the process of over-practising and reinforcement to the point of automation and unconsciousness. This is often termed the tacit dimension of expertise [Polanyi, 1966], and the individuals can now be considered to be true experts. Non-explanatory outcome feedback, which is central to the reinforcement process, is generally of critical value at this stage. Both feedforward and cognitive feedback explanations are of little value to the expert individuals as they will be performing instinctively and unconsciously and will have little time and patience for explanations both prior and subsequent to task performance. Therefore, it is reasonable to expect that individuals at this stage will use both feedforward and cognitive feedback explanations minimally. However, if these expert individuals are faced with anomalous system conclusions with which they disagree, they will largely utilize and benefit from the cognitive feedback explanations that clarify how the conclusions were reached. These explanations will act as recall triggers to the individual's own expertise, which would otherwise be unconscious and therefore unavailable. Feedforward explanations will not be effective at this stage.

In summary, this theory suggests that the level of expertise that an individual already possesses will exert a moderating influence on the effectiveness of the explanations that are provided as cognitive feedback or feedforward. For example, it is reasonable to expect that novices will make greater use of KBS explanations than experts. As well, they will benefit more from the use of feedforward explanations as compared to feedback explanations.
As well, cognitive feedback explanations will generally be more effective in fostering greater understanding of the task in individuals who are apprentices or experts in the task domain.

2.5 Summary of the Chapter

This chapter heeds recent calls to researchers of decision support systems to pay more attention to the learning that is fostered by the use of such aids [Mackay and Elam, 1992], and tackles the fundamental question of why knowledge-based systems have to provide explanations. By distinguishing the learning that occurs in instructional situations from that in working situations, it utilizes the lens model to identify the types of explanatory information that need to be provided to KBS users to foster learning in working situations. As well, it uses concepts relating to the three cognitive learning operators of the cognitive feedback paradigm (outcome feedback, feedforward, and cognitive feedback) to provide a theoretical basis for 1) why KBS need to provide explanations in addition to advice, and 2) a model of explanation provision strategies for KBS. The next chapter considers the feedforward and feedback provision of explanations within a more comprehensive framework of factors that influence the use of KBS explanations and the effects of using these explanations.

CHAPTER 3: A FRAMEWORK FOR INVESTIGATING THE USE OF KNOWLEDGE-BASED SYSTEM EXPLANATIONS AND RESEARCH HYPOTHESES

3.0 Introduction

This chapter presents the research hypotheses of the study within a two-part framework for evaluating aspects of the use of knowledge-based system (KBS) explanations. Initially, Section 3.1 distinguishes between the selection and use of KBS explanations, and reviews the various types of explanations that are commonly provided by KBS explanation facilities. It also provides a review of past studies that have investigated the use of KBS explanations.
The first part of the framework, which is presented in Section 3.2, identifies the various factors that influence both the amount and the types of KBS explanations that are used by KBS users. The second part comprises a model of the dependent variables that can be used for studying the effects of the use of KBS explanations, and is discussed in Section 3.3. Specific research hypotheses are developed in Section 3.4 by considering the feedforward and feedback explanation provision strategies that were described in Chapter 2 within the framework for studying the use of KBS explanations. These hypotheses are concerned with the following types of influences associated with the feedforward and feedback provision of KBS explanations: 1) To what extent are explanations utilized in judgmental decision-making situations involving the use of a KBS? 2) What factors influence the use of KBS explanations? and 3) Does the use of explanations empower KBS users in any way?

3.1 The Use of the Various Types of KBS Explanations

This section begins by defining what is meant by "the use of explanations" that are provided by a KBS. An understanding of this is critical to the design of explanation facilities that are usable and effective. Next, in Section 3.1.2, the various types of KBS explanations are presented and considered in relation to the feedforward and feedback explanation provision strategies. Finally, Section 3.1.4 summarizes past studies that have investigated the use of KBS explanations.

3.1.1 Selection Versus Use of KBS Explanations

Explanation selection and explanation use are two distinct but related constructs. The former represents the explicit behavior of selecting and viewing explanations which is exhibited by users of KBS. Explanation use refers to the cognitive processing of explanatory information in judgmental decision making and problem solving using a KBS.
It can reasonably be argued that only a subset of the explanatory information that is selected and viewed by KBS users is actually used in decision making. The cognitive processing of explanatory information is difficult to operationalize and measure, largely because it is visually unobservable. It therefore requires techniques for cognitive observation, as opposed to techniques for visual observation or physical measurement. While various process tracing techniques [Ericsson and Simon, 1984] provide reasonable approximations of the nature of such cognitive processing, they are less reliable in terms of providing estimates of the exact amount of processing that takes place. This is primarily because the completeness of the protocols that are collected is generally indeterminate [Nisbett and Ross, 1980]. For these reasons, it is proposed that the explanation selection behavior of KBS users, which is visually observable and physically measurable, be used as a measurable surrogate for the cognitive use of the information that is provided by such explanations. While this chapter focuses on the cognitive use of explanatory information that is provided by a KBS, it uses the two terms interchangeably; however, the distinctions between them, and the deficiencies of selection as a surrogate for use, are recognized and understood.

3.1.2 Types of KBS Explanations

The Why and How explanations, which were first introduced in MYCIN [Shortliffe, 1976], remain the foundation of most explanation facilities found in current KBS applications and development shells. Attempts have been made to incorporate other forms of explanations. These include the Strategic, What, and What-If explanations. The Strategic class of explanations provides insight into meta-knowledge, especially the control objectives and overall problem-solving strategies used by a system.
For example, the NEOMYCIN system explicitly outlines problem-solving strategies in its own knowledge base and makes them available for explanation [Hasling, Clancey and Rennels, 1984]. The What explanations are designed to give insight into object definitions or decision variables used by a system [Rubinoff, 1985]. They serve as responses to queries such as: "What do you mean by [object or variable name]?" The What explanation is significantly different from the What-If query facilities commonly found in decision support systems (DSS). These refer to the ability to re-run a consultation with changed model parameters. While such What-If facilities can be provided as part of the KBS interface, they are not viewed as being explanations per se, but rather as tools for sensitivity analysis. To be considered a distinct category of explanations, What-If has to be implemented as the direct and explicit provision of information about the sensitivity of decision variables to KBS users, instead of being a facility for performing sensitivity analysis.

Various classifications of the many types of explanations that should be provided by KBS have been suggested [Neches, Swartout, and Moore, 1985; Clancey, 1985; Wick and Slagle, 1989]. These classifications can be condensed as subscribing to one of two possible criteria for distinguishing between the types of explanations. The first criterion is the nature of the explanation queries. For example, Wick and Slagle [1989] discuss explanations whose queries begin with What, Why, How, When, Where, etc. As well, Neches, Swartout, and Moore [1985] consider the How, Why, When, and What range of queries as part of XPLAIN's explanation facility. The second criterion is the nature of the explanation responses. Swartout and Smoliar [1987] distinguish between explanations that provide terminological knowledge, domain descriptive knowledge, and problem-solving knowledge.
Wexelblat [1989] suggests that KBS users require information about procedures, reasoning traces, action goals, control, and self-knowledge. Gilbert [1988] presents two ways of distinguishing explanation responses. First, responses can provide case-specific knowledge, domain knowledge, or meta-knowledge. Second, they can provide taxonomic knowledge, formal knowledge, contingent knowledge, or control knowledge.

Irrespective of whether they are based on explanation queries or explanation responses, there is a major problem with all these classifications. Lacking a sound theoretical basis, the various types of explanations that comprise each of these classifications are neither consistently defined, nor is each classification comprehensive. However, largely based on Clancey's [1983] characterization of the epistemological roles that knowledge can play in KBS explanation, a consensus has emerged on the three primary types of explanations that KBS ought to provide [Southwick, 1991, p. 3; Ye, 1990]. Corresponding to the three epistemological roles of structure, support, and strategy, these three types of explanations are: 1) trace explanations that describe contents and reasoning (structure), 2) deep explanations that justify underlying reasons for a state or an action based on causal models (support), and 3) strategic explanations that clarify problem-solving strategy and meta-knowledge (strategy). This taxonomy of the three primary types of explanations has also led to a convergence of opinion on the matching of explanation queries with explanation responses. This is as follows: the How explanation queries are used to provide trace explanations, the Why explanation queries are used to provide causal justifications, and the Strategic explanation queries are used to provide clarifications of control strategies and meta-knowledge.
These three types of explanations will be adopted in this dissertation.

3.1.3 Feedforward and Feedback Explanation Provision and the Types of Explanations

The feedforward and feedback explanation provision strategies that were developed in Chapter 2 are relevant to all three of the Why, How and Strategic explanations, i.e., each of these types of explanations can be presented both as feedforward and feedback.

Table 3.1: Definitions of Explanations

Feedforward Why explanations justify the importance of, and the need for, input information to be used or a procedure that is to be performed.

Feedforward How explanations detail the manner in which input information is to be obtained for use and procedures that are to be performed.

Feedforward Strategic explanations clarify the overall manner in which input information to be used is organized or structured, and specify the manner in which each input cue to be used fits into the overall plan of assessment that is to be performed.

Feedback Why explanations justify the importance, and clarify the implications, of a particular conclusion that is reached by the system.

Feedback How explanations present a trace of the evaluations performed and intermediate inferences made in getting to a particular conclusion.

Feedback Strategic explanations clarify the overall goal structure used by a system to reach a particular conclusion, and specify the manner in which each particular assessment leading to the conclusion fits into the overall plan of assessments that were performed.
Table 3.1 presents definitions for the three types of explanations when they are presented as feedforward or feedback. The feedforward explanations differ from feedback explanations as follows: (a) the feedforward version is not case-specific while the feedback version explains a particular case-specific outcome, (b) the feedforward version is presented prior to an assessment or diagnosis being performed while the feedback version is presented subsequent to the assessment and after the presentation of the outcome of that assessment, and (c) the feedforward version focuses on the input cues while the feedback version focuses on the outcomes.

The concepts of feedforward and feedback explanation provision can also be used to classify the manner in which the three types of explanations are provided in current KBS. As an example, Table 3.2 presents the manner in which MYCIN and its descendants presented their explanations.

Table 3.2: Explanation Provision Strategies Used For the Three Types of KBS Explanations

Type of Explanation    Explanation Provision Strategy
Why                    Feedforward
How                    Feedback
Strategic              Feedforward/Feedback

The Why explanations were presented in feedforward form, but the How explanations were provided in feedback form. The Why explanations provided by MYCIN presented information clarifying the reasons a particular question was posed to the user. As these questions took the form of requests for input information and were posed prior to MYCIN performing a particular diagnosis, the explanations focused on providing declarative task information about the entities and relationships in a task domain. This is consistent with the definition of feedforward explanations. MYCIN's How explanations provided information as to the basis for particular conclusions that were presented to the user. They focused on providing post-hoc explanatory information about the specific inference procedures used to arrive at each particular conclusion. This conforms to feedback explanation provision.
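The six cells of this taxonomy (the three explanation types crossed with the two provision strategies) can be illustrated with a minimal sketch. The dispatch table, wording templates, and example cues below are hypothetical illustrations of the definitions in Table 3.1, not an implementation from MYCIN or from this study.

```python
# Illustrative sketch only: the 2 x 3 space of provision strategy
# (feedforward/feedback) crossed with explanation type (Why/How/Strategic).
# All template wording and example cue/outcome names are hypothetical.

FEEDFORWARD = "feedforward"   # presented before the assessment; not case-specific
FEEDBACK = "feedback"         # presented after the assessment; explains one outcome

def explain(strategy, etype, cue=None, outcome=None, trace=None):
    """Return explanation text for a (strategy, type) pair."""
    templates = {
        # Feedforward explanations refer only to input cues and procedures.
        (FEEDFORWARD, "why"): lambda:
            f"Input '{cue}' is needed because it bears on the assessment to be performed.",
        (FEEDFORWARD, "how"): lambda:
            f"Input '{cue}' is to be obtained and used in the following manner: ...",
        (FEEDFORWARD, "strategic"): lambda:
            f"Input '{cue}' fits into the overall plan of assessment as follows: ...",
        # Feedback explanations refer to a particular, case-specific outcome.
        (FEEDBACK, "why"): lambda:
            f"The conclusion '{outcome}' is important because of its implications: ...",
        (FEEDBACK, "how"): lambda:
            f"'{outcome}' was reached via: " + " -> ".join(trace),
        (FEEDBACK, "strategic"): lambda:
            f"'{outcome}' fits into the overall goal structure as follows: ...",
    }
    return templates[(strategy, etype)]()

# A feedforward explanation mentions only the cue, prior to any diagnosis;
# a feedback explanation mentions the case-specific outcome and its trace.
print(explain(FEEDFORWARD, "why", cue="credit history"))
print(explain(FEEDBACK, "how", outcome="approve loan",
              trace=["income adequate", "credit history clean"]))
```

The sketch makes the three distinguishing properties concrete: the feedforward templates take only input cues, the feedback templates take an outcome (and, for the How type, a reasoning trace), and the two groups are invoked at different points in a consultation.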
While MYCIN did not provide Strategic explanations, they were incorporated by some of its descendants, i.e., NEOMYCIN and GUIDON. In these systems, the Strategic explanations were provided both in feedforward and feedback form.

The exact reasons why these particular provision strategies were used for the various types of explanations remain unaddressed. However, this pattern suggests that there could be a close relationship between the types of explanations and the feedforward and feedback explanation provision strategies.

3.1.4 Literature Review of Past Studies of the Use of KBS Explanations

Three empirical studies of the use of explanations as part of the user-KBS interface were found in the literature.

Lerch et al. [1990] focused on some effects of the use of KBS explanations. They measured user agreement with, and confidence in, conclusions presented by a KBS. Subjects were told that these conclusions were obtained from one of three different "sources of advice": novices, experts, or a knowledge-based system. As well, they used three different treatment conditions: no explanations provided, explanations provided in the form of English sentences, and explanations provided in production rule form. They found that while the use of explanations had an impact on the level of user agreement with the conclusions, it did not change users' confidence in the source of advice on which the conclusions were modelled. The different types of explanations were not considered in this study; rather, a generic category closely resembling the How explanation was used.

In another study, Ye [1990] had subjects evaluate explanations presented in a fixed sequential order and compared user perceptions of usefulness and user preferences for the three different types of explanations. The types considered were labelled as the Trace, Justification, and Control explanations.
The first two are analogous to the How and Why explanations of MYCIN [Shortliffe, 1976], and the third to the Strategic explanations found in NEOMYCIN [Hasling et al., 1984]. Ye found that the use of explanations had a positive effect on user agreement with KBS conclusions, and that the Why explanation was the most preferred explanation across levels of user expertise and types of inference used for heuristic classification tasks. As well, experts and novices were found to have differing perceptions of usefulness for the various types of explanations presented. Experts perceived the How explanation as being most useful, and novices the Why explanation. The primary weakness of this study was that the context used to collect the measures of user preference and perception of usefulness was not representative of the usual context in which explanations are used. The primary goal of the experimental task that was performed by subjects was to evaluate and criticize the explanations that were presented, rather than to use the explanations in a problem solving or decision making situation involving the use of a KBS. As well, the study also assumed that expert and novice subjects would have a generic preference for one of the types of explanations, irrespective of the problem-solving context involved.

In a field study, Lamberti and Wallace [1990] investigated interface requirements for knowledge presentation in knowledge-based systems. They examined interactions between user expertise, knowledge presentation format (procedural versus declarative formats), question type (requiring abstract versus concrete answers), and task uncertainty, in terms of the speed and accuracy of decision making performance. They found that for highly uncertain tasks, response time and accuracy for questions with declaratively formatted explanations (as compared to procedural ones) were better for higher-skill users.
However, for low uncertainty tasks, the low-skill subjects performed equally fast as, but more accurately than, high-skill users when presented with declarative explanations to questions. Also, for explaining the procedures used in strategies of problem solving, both high- and low-skill users felt more confident with procedural explanations in contrast to declarative explanations. In relation to concrete versus abstract knowledge organization, the study found that low-skill users performed significantly faster and more accurately when answering questions requiring concrete knowledge organization. High-skill users performed faster, although not necessarily more accurately, when responding to questions requiring abstract knowledge organization, in contrast to concrete knowledge organization.

The Ye study focused on capturing user perceptions for three different types of explanations that were presented sequentially to users. The Lerch et al. study focused on comparing the effects of explanations with varying levels of source credibility. The Lamberti and Wallace study, while only indirectly investigating explanations, investigated issues relating to the content of explanations. The objective of this research is to add to this knowledge by focusing on other aspects of the provision and use of KBS explanations which have not been addressed as yet. These include investigating the influence of providing feedforward versus feedback types of explanations on the use of explanations, and the manner in which the use of KBS explanations affects both the accuracy of judgmental decision making as well as user perceptions of usefulness. Prior to a discussion of these, the various factors that influence the use of explanations are discussed in the next section.

3.2 A Framework of the Factors that Influence the Use of KBS Explanations

Figure 3.1 presents a host of factors that potentially determine both the amount and the types of KBS explanations that are used.
These factors can be classified into four separate categories relating to the characteristics of: 1) the task setting, 2) the explanations provided, 3) the interface design and explanation provision strategies used, and 4) the users.

Figure 3.1: Determinants of the Use of KBS Explanations

TASK CHARACTERISTICS
1. Task Type: analysis; synthesis
2. Context of Use: end user applications; knowledge base debugging; knowledge base validation

EXPLANATION CHARACTERISTICS
1. Explanation Type: Why; How; Strategic
2. Explanation Content: amount of information; abstract versus concrete; granularity and specificity; focus to user groups; emphasis

INTERFACE DESIGN & PROVISION STRATEGY
1. Provision Strategy: feedforward; feedback
2. Accessibility: active strategy; passive strategy
3. Communication Mode: audio/visual
4. Presentation Format: text/image/animation

USER CHARACTERISTICS
1. User Expertise: domain expertise; system expertise
2. Individual Differences: cognitive and personality
3. Level of User Agreement

3.2.1 Task Characteristics

The nature of the KBS task and the context in which the KBS is used comprise the first category of factors that will influence the amount and the types of explanations used. The types of tasks that KBS perform can be categorized by various classifications, including analysis tasks versus synthesis tasks [Gaines and Boose, 1985], and heuristic classification tasks versus heuristic configuration tasks [Clancey, 1985], etc. While these classifications overlap to some extent, each of them can be further decomposed into a larger hierarchy of many levels of tasks. For example, the heuristic classification task can be decomposed into the three inference processes (sub-tasks) of data abstraction, heuristic match, and solution refinement. Similarly, Waterman [1986] offers a breakdown of analysis tasks into sub-categories such as diagnosis, prediction, etc. The Ye [1990] study directly investigated the influence of varying task types on the preference for the three types of explanations, by utilizing the data abstraction and heuristic match levels of heuristic classification as independent variables. However, it did not find significant differences in the preference for explanations between the two levels. The use of explanations that are provided by KBS performing synthesis or heuristic configuration tasks, such as design or planning, has not been studied. Considering the critical differences between these tasks and the more common diagnostic tasks, it is reasonable to expect that they will result in different patterns of KBS explanation use.

The context in which a KBS is utilized will determine the purpose for which the explanation facility is used. This will directly influence the use of KBS explanations. Three contexts for the use of KBS explanations can be identified: 1) by end-users in problem-solving contexts, 2) by knowledge engineers in carrying out knowledge-base debugging activities [Southwick, 1991], and 3) as part of KBS validation activities carried out by domain experts and/or knowledge engineers. The distinctions between these three contexts are critical and stem from the fact that the use of KBS explanations in systems development is motivated by a different set of objectives than when used as part of end user applications. It can therefore reasonably be expected that end users of KBS applications will use explanations differently from when they are used during debugging, validation, or other KBS development activities.

While explanations are commonly incorporated into most end user applications of KBS, they also play a significant role in the development of KBS by offering enhanced debugging and validation abilities.
Most current KBS development shells and environments include tools that utilize explanations to aid efficient and effective system development, e.g., the Knowledge Engineering Environment (KEE) from Intellicorp. Another example is the REPORT command in the VP Expert [Paperback Software, 1988] shell. This command lists in sequential order all the explanations attached (using the BECAUSE clause) to rules that 'fired' as part of a consultation. Such a listing assists in the debugging of processing logic by knowledge engineers. It also allows users and domain experts, who may not be familiar with representation schemes and inference engines, to participate in the validation of a knowledge base.

In contrast to debugging and validation, the use of explanations by end users of KBS applications is motivated by a different set of reasons. For example, Lamberti and Wallace [1990, p. 302] suggest that an explanation facility is used: 1) by decision makers because it aids them in formulating problems and models for analysis, 2) by sophisticated users because it assures them that the system's knowledge and reasoning process is appropriate, and 3) by novice users because it can instruct them about the knowledge in the system as it is applied to solve a particular problem. There are also a variety of contexts in which end user applications of KBS are used. For example, while some applications are used as tools for training novices in a domain, others are used by experts to support their own decision making. The organizational context in which these end user applications are used will also affect the use of the explanations. Sviokla [1986] notes that some organizations institutionalize the use of such KBS applications for making certain critical decisions. The use of explanations when end-users are compelled to use the KBS will certainly be different from the situation when end-users utilize the KBS as a decision aid by choice.
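The debugging use of explanations described above, such as VP Expert's listing of BECAUSE clauses attached to fired rules, can be sketched in a few lines. The sketch below is a hypothetical illustration in Python, not VP Expert syntax, and the rules, facts, and explanation strings are invented for the example.

```python
# Illustrative sketch (not VP Expert syntax): each rule carries an attached
# explanation string, and the explanations of the rules that fired during a
# consultation are listed in firing order, in the spirit of the REPORT command.
# The rule base and facts below are hypothetical.

rules = [
    {"if": {"temperature high", "rash present"}, "then": "measles suspected",
     "because": "A high temperature together with a rash suggests measles."},
    {"if": {"measles suspected"}, "then": "refer to physician",
     "because": "A suspected case should be referred for confirmation."},
]

def consult(initial_facts):
    """Simple forward chaining: fire every rule whose conditions hold."""
    facts = set(initial_facts)
    fired = []
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule["if"] <= facts and rule["then"] not in facts:
                facts.add(rule["then"])
                fired.append(rule)
                changed = True
    return facts, fired

def report(fired):
    # Lists, in sequential (firing) order, the explanation attached to each
    # fired rule; such a trace aids debugging by knowledge engineers and
    # lets domain experts validate the logic without reading the rules.
    return [rule["because"] for rule in fired]

facts, fired = consult({"temperature high", "rash present"})
for line in report(fired):
    print(line)
```

Because the report contains only the natural-language explanation strings, in the order the rules fired, it can be read by a domain expert who knows nothing about the representation scheme or the inference engine.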
In summary, many different contexts of KBS use can be identified as potentially influencing the use of KBS explanations. However, the influence of this critical factor has not been investigated to date.

3.2.2 Characteristics of the Explanations that are Provided

The nature of the explanatory information that is provided by a KBS to its users will certainly influence the explanations that are used. These characteristics can be divided into two major categories: explanation type and explanation content. While the types of explanations were discussed in Section 3.1.2, there is considerable overlap between these two categories. The three types of explanations are by definition different in content. For example, the Why explanations focus on providing declarative information about the task, the How explanations provide procedural task information, and the Strategic explanations present meta-knowledge of the task. Similarly, the various types of explanations will also differ in content in relation to whether they are provided as feedforward or feedback. For example, feedback explanations, being outcome specific, will by definition be more concrete and at a lower level of specificity than the more generalized feedforward explanations.

While the investigation of the influence of the types of explanations is potentially more relevant, largely because both KBS developers and users distinguish clearly between them, it is also possible to study directly the influence of various dimensions of content, e.g., the Lamberti and Wallace [1990] study discussed in Section 3.1.4. Some relevant dimensions of explanation content include the following. The informational content of the explanations, in terms of the number of signals that are incorporated, represents the first dimension. The second dimension is the abstraction level of the explanations, i.e., how concrete or abstract they are from the perspective of users.
The third dimension is the granularity and specificity of the explanations, e.g., the lowest level will have the greatest amount of detail and vice versa. Fourth, explanations can be focused towards particular user groups, such as knowledge engineers, domain experts, and end users, or they can have a more general focus. Terminological differences in explanations can be expected depending on who the target users of the explanations are. Fifth, explanations can emphasize different aspects of that which is being explained, e.g., procedural aspects in contrast to declarative aspects.

3.2.3 Interface Design and Provision Strategies

The design features of the interface used to provide explanations, as well as the strategies used for providing explanations, will also influence the patterns of KBS explanation use. Specific aspects of the interface design include the following. First, the amount of effort required for users to access the explanations, i.e., the accessibility of the explanations, will influence their use. Two possible classes of strategies for accessing explanations can be identified. These include an active strategy, where the KBS presents explanations without the user having to request them, and a range of passive strategies that require the user to make varying levels of explicit physical effort to access the explanations. Such effort can range from clicking on specialized explanation icons presented on the screen to hitting predesignated function keys for accessing and scanning the explanations. Generally, an active strategy has the system interrupt the dialogue to provide explanations or make them available continuously as part of every screen of the KBS. In the design of KBS explanation facilities, it is important that interface designers consider the amount of effort that is required for users to request and access the explanations.
The results of recent studies of cost-benefit models of the effort involved in utilizing computerized decision aids [Todd and Benbasat, 1991] suggest that the accessibility of explanations, i.e., the cost of accessing them, will exert a salient influence on the use of explanations. Second, the communication mode used for presenting the explanations, e.g., audio and/or visual modes, will also influence the use of explanations. Third, the presentation format utilized for the explanations is also a factor, e.g., text explanations in contrast to image-based explanations that use graphical, iconic, and animation formats. The influence of all these aspects of the interface used to provide KBS explanations on the use of KBS explanations has not been investigated to date.

Considering that the primary reason for the provision of KBS explanations is to improve users' understanding of the KBS and its domain, the feedforward and feedback explanation provision strategies conceptualized in Chapter 2 will also influence the use of explanations. The importance of these explanation provision strategies, which are based on the cognitive feedback paradigm, becomes obvious if one considers the analogy of a child engaged in a learning process to improve his or her understanding. While "an explanation machine", in the form of a child's parents, may be continuously available to provide explanations about some phenomenon that is the target of the learning, the child will only seek, attend to, and benefit from explanations provided at particular stages of the learning process. At different stages of the process different types of explanations will be sought, and it can be expected that children at varying stages of cognitive development will seek different amounts and types of explanations. As well, it is also likely that explanations provided automatically, without being requested, will at times impede rather than encourage the learning that takes place.
This analogy therefore suggests that any evaluation of the influence of the explanation provision strategies must take into consideration the other factors that are identified in the framework of Figure 3.1.

3.2.4 User Characteristics

Three distinct categories of user characteristics that will impact the use of explanations can be identified: user expertise, individual differences, and the level of user agreement with the KBS. Of these, user expertise is potentially the most significant to the design and use of KBS explanations. Section 2.4.2 of Chapter 2, which considered the role of feedforward and feedback explanations within the three stages of Anderson's [1982] theory of skill acquisition, suggests that user expertise will influence the use of explanations. As well, all three empirical studies of KBS explanations that were discussed in Section 3.1.4 found significant effects for this factor. All these studies used users' knowledge of the task domain to operationalize user expertise. However, the human-computer interface literature reveals another aspect of user expertise that can be considered just as relevant. This is the level of users' expertise or familiarity with knowledge-based systems themselves [Lehner and Zirk, 1987]. This is termed in Figure 3.1 as systems expertise. The influence of this dimension of user expertise on the use of KBS explanations has not yet been studied.

Various types of cognitive and personality-based individual differences can also be identified, primarily from the literature on decision support systems (DSS), as potentially influencing the use of KBS explanations.
However, while much is known of their influence on human cognitive functioning, they suffer from a lack of an adequate and coherent theoretical basis [Huber, 1983]. Additionally, as is now recognized in the study of DSS, there is only a small likelihood that an individual differences approach to the design of decision aids will yield practical and cost-beneficial design requirements. These individual differences have not been studied in the context of the use of KBS explanations.

The final category of user characteristics that can be identified is the level of user agreement with the KBS. Both the Ye [1990] and the Lerch et al. [1990] studies found that the use of explanations increased the level of user agreement with KBS conclusions. This finding, together with the finding that experts were more likely to agree with a KBS's conclusions than novices [Ye, 1990], suggests that there could potentially be a reverse effect as well. The level of initial user agreement with KBS conclusions would influence, to some extent, the amount of explanations that are used. As the differences in the level of domain expertise can result in different levels of agreement with a KBS's conclusions, this suggests that the level of user agreement potentially moderates the influence of user expertise on the use of explanations.

In summary, this section has identified a variety of factors that could potentially influence the use of KBS explanations. Any study of a subset of these factors, e.g., the feedforward and feedback provision of explanations, has to consider and account for the potential influence of the other factors as well. The next section presents the second part of the investigative framework for the use of KBS explanations. It focuses on the effects of the use of KBS explanations.

3.3 A Framework for Investigating the Effects of the Use of KBS Explanations

KBS explanation facilities will only be utilized by KBS users if they obtain some real or perceived benefit from their use.
Conversely, it may be argued that some KBS users will utilize explanations because they are not able to perceive the detrimental effects of such use. Currently, we know very little about the effects of the use of KBS explanations, especially in applied settings where the KBS is used as a decision aid. Lerch et al. [1990] conducted the only study that has directly investigated some effects of the use of explanations. They found that the use of explanations has an effect on the level of user agreement with KBS conclusions but has no effect on users' confidence in the KBS. This study proposes to test the effects of the use of KBS explanations that are provided as feedforward and/or feedback in an applied setting where a KBS is used as a decision aid to make judgments under conditions of uncertainty. Prior to a discussion of this, however, it is important to consider all the possible effects that the use of explanations may have in such a setting. Viewing KBS explanations as a decision aid, this section therefore discusses potential dependent variables that can be used for investigating the effects of the use of KBS explanations.

Figure 3.2 depicts four categories of dependent variables that can potentially be used for the investigation of the effects of the use of KBS explanations. The first category relates to user behavior in interacting with the KBS. Various measures of the change in users' behavior arising from the provision of explanations comprise this category. As was discussed in Chapter 2, the use of explanations may lead to an increase in user understanding of the KBS and the task domain. This represents the category of learning effects. Learning effects moderate the relationship between the use of explanations and the categories pertaining to perceptions and judgmental decision-making (JDM). The perceptions category encompasses the changes in the perceptions and intentions of users caused by the learning fostered by the use of the explanations.
These can also be termed "perceived" effects. The category pertaining to judgmental decision-making (JDM) relates to the manner in which the increased user understanding of the task domain and the KBS influences the quality of the users' decision making. Considering that knowledge-based systems that are used as decision aids have the facilitation of decision making as their primary objective, this category can be regarded as the most important category of effects.

Figure 3.2: Dependent Variables for Investigating the Effects of the Use of KBS Explanations

3.3.1 Effects on Judgmental Decision Making

The success or failure of any information technology has to be evaluated in terms of the objectives of the use to which it is put. An assumption underlying the design of decision support systems is that the incorporation of a particular design feature, e.g., KBS explanations, will increase the effectiveness of decision performance and/or the efficiency of the decision process. Based on this, Keen and Scott Morton [1978] argue that the appropriate dependent measures for evaluating decision aids can be categorized as being either measures of effectiveness or efficiency. Typically, effectiveness indicates the quality or accuracy of the decisions made, and efficiency denotes the decision speed. For example, while laboratory studies of DSS have used measures such as the levels of profit, cost, and the number of alternatives considered as effectiveness criteria, decision efficiency has usually been measured by decision time [Sharda, Barr, and McDonnell, 1988]. There have been no studies to date that have investigated the effect of the use of KBS explanations on either decision effectiveness or efficiency.

Ideally, objective and observable measures of decision effectiveness and/or efficiency represent the best dependent variables for evaluating the effects of the use of any decision aid, including KBS explanations.
However, lacking valid measures that meet these critical criteria, DSS researchers often have to fall back on using other "surrogate" measures. These usually belong to either the category of perceived effects or the category of behavioral effects in Figure 3.2. In such situations, it is essential for these measures to map faithfully the underlying theoretical constructs in a valid and reliable manner.

3.3.2 Behavioral Effects

In the absence of good direct measures of decision effectiveness and/or efficiency, behavioral measures of the use of a decision aid have been used as substitutes. The underlying assumption has been that users will only use a decision aid if they perceive some value in terms of decision effectiveness or efficiency. While recent studies, which indicate that user perceptions of the value of decision aids can suffer from an "illusion of control" [Kotteman, Davis and Remus, 1993], suggest that this may be an invalid assumption, the "system use" construct has the advantage of being an observable and objective construct. Measures can generally be collected unobtrusively by having the computer keep counts of the frequency and the lengths of time that various features of the decision aid are accessed. While most DSS studies have utilized this approach to measurement, others have also relied on having participants report the frequency of their use.

In the context of KBS, the "system use" construct has also been considered to be synonymous with user acceptance [Shortliffe, 1984; Chandrasekaran, Tanner and Josephson, 1988], i.e., if users make use of a particular feature of a KBS they can be considered to have accepted it. In the context of KBS explanations, this argument has also been used as a justification for providing KBS explanations.
This has led some researchers to note that "an intriguing, but little researched, aspect of the expert systems literature is the argument that it is essential to provide a capability to explain results obtained by a program in order to increase user acceptance" [Lerch, Prietula and Kulik, 1990, p. 8]. KBS explanations can affect users' "system use" behavior in many ways. For example, measures of the total time that elapses when KBS explanations are accessed during a consultation can be used to assess the use being made of the explanations. As well, computer logs of users' interaction with the KBS can be used to assess whether 1) there are differences in the manner in which the various types of explanations are utilized, and 2) there are changes in user behavior as a result of being provided with feedforward and/or feedback explanations.

3.3.3 Perceived Effects of the Use of KBS Explanations

While perceived effects can also be used as measurement proxies for the effects on decision effectiveness or efficiency, they also represent an important set of effects in their own right. This is because users' future behavior, in relation to the use of a decision aid, is often motivated just as much by their perceptions of the benefits that accrued from past use as by the "real" benefits that were obtained. The information systems literature suggests that user intentions, and at least four categories of user perceptions or attitudes, will be affected by the use of explanations. These are shown in Figure 3.2. The critical role of intentions arises from the work of Davis, Bagozzi and Warshaw [1989]. They tested models based on Fishbein and Ajzen's [1975] Theory of Reasoned Action to demonstrate that system use can be predicted reasonably from people's intention to use. Additionally, they also demonstrated that perceptions of usefulness and ease-of-use are significant determinants of the intentions to use a system.
Recent work by Moore and Benbasat [1991] provides a further refinement of these categories of perceptions. They demonstrate that perceptions of voluntariness, trialability, compatibility, visibility, and result demonstrability underlie the intention to use a particular system and are significant determinants of the adoption of a system. While all these perceptions could be used as dependent variables in the study of the effects of the use of explanations, Figure 3.2 isolates the four that can be deemed to be the most relevant. These are the perceptions of trust, user satisfaction, ease of use, and usefulness.

The primary category of perceptions that will be affected by the use of KBS explanations is the perception of user trust in a system and its output. For example, Swartout [1983] argues that "trust in a system is developed not only by the quality of its results, but also by a clear description of how they are derived," and Muir [1988] suggests that "a decision aid, no matter how sophisticated or intelligent it may be, will be rejected by a decision maker who does not trust it". Various theoretical components of trust, such as dependability, predictability, and faith, are discussed in the literature [Rempel, Holmes, and Zanna, 1985]. However, unlike the case of user perceptions of usefulness, ease-of-use, and satisfaction, no concise and reliable measurement instrument for trust has been developed yet. Empirical investigations of trust have therefore largely utilized measures of 1) user confidence, and 2) user agreement with system recommendations, as operationalizations of the construct, e.g., see Lerch, Prietula and Kulik [1990].

While the nature of the theoretical association between confidence and trust is unclear, user confidence can be measured using two approaches. First, perceptions could be captured using questionnaire-type, multi-item instruments that measure its various theoretical sub-constructs.
While many studies of DSS have measured user confidence as a secondary dependent variable using this approach, most have relied on a single-item Likert-type scale and have not considered the theoretical basis for the measurement [Aldag and Power, 1986; Cats-Baril and Huber, 1987; and Sharda, et al., 1988]. No attempt has been made to develop a multi-item instrument for measuring confidence reliably and in a valid manner. The second approach to measuring confidence involves having users provide probability estimates of their confidence in judgments that they make. This approach has been used and investigated extensively in the context of the accuracy of judgment and choice in decision making [Lichtenstein, Fischhoff, and Phillips, 1982; Abelson and Levi, 1985; Sniezek, Paese and Switzer, 1990]. Using this approach, user confidence is decomposed into its two sub-constructs of calibration and resolution, which are considered to be independent skills that are closely related to the exercise of accurate judgmental performance [Yates, 1982]. Calibration is the decision maker's ability to assign appropriate probability levels to judgments, while resolution refers to the decision maker's ability to discriminate correct from incorrect judgments by differentially assigning confidence judgments to accurate and inaccurate judgments. The effects of the use of KBS explanations on the calibration and resolution sub-constructs of user confidence have not been studied yet. However, using the first approach, i.e., a single-item Likert-type scale, Lerch et al. [1990] found that the use of explanations had no effect on user confidence in a KBS.

The effect of the use of KBS explanations on user agreement with KBS conclusions has been investigated by both Lerch et al. [1990] and Ye [1990]. While only the Lerch et al. study directly considered the linkage of user agreement to user perceptions of trust, both studies found that the use of explanations improved user agreement with the KBS.
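As an aside on the second approach to measuring confidence, the calibration and resolution sub-constructs described above can be computed directly from a subject's probability judgments by partitioning the judgments by stated confidence level. The sketch below is illustrative only; the function name and data layout are not drawn from any instrument used in this study:

```python
from collections import defaultdict

def calibration_resolution(judgments):
    """judgments: list of (confidence, correct) pairs, where confidence is a
    probability in [0, 1] and correct is 1 (accurate) or 0 (inaccurate).

    Calibration is the mean squared gap between each stated confidence level
    and the hit rate actually achieved at that level (0 is perfect calibration).
    Resolution is the spread of the hit rates across confidence levels around
    the overall hit rate (larger values indicate better discrimination)."""
    n = len(judgments)
    base_rate = sum(correct for _, correct in judgments) / n
    groups = defaultdict(list)          # confidence level -> outcomes at that level
    for conf, correct in judgments:
        groups[conf].append(correct)
    calibration = sum(len(g) * (conf - sum(g) / len(g)) ** 2
                      for conf, g in groups.items()) / n
    resolution = sum(len(g) * (sum(g) / len(g) - base_rate) ** 2
                     for conf, g in groups.items()) / n
    return calibration, resolution
```

For example, a subject who says "90% sure" but is right only half the time at that level is poorly calibrated, while a subject whose hit rates differ sharply between high- and low-confidence judgments shows good resolution.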
In the two studies, user agreement was measured using a single-item, Likert-type scale.

User perceptions of satisfaction represent the second category of perceptions that will be affected by the use of KBS explanations. The use of various scales for capturing perceptions of user satisfaction, as a surrogate measure for the effectiveness of a decision aid, is well established in the DSS literature [Bailey and Pearson, 1983; Ives, Olson and Baroudi, 1983; Jenkins and Ricketts, 1985; Baroudi and Orlikowski, 1988]. While the theoretical underpinning of user satisfaction as a valid theoretical construct is weak, Melone [1990] has argued that there is a fundamental similarity between it and social and cognitive psychologists' notion of an attitude. While it could reasonably be hypothesized that the clarifications provided by KBS explanations will have a positive effect on user perceptions of satisfaction with the KBS, this effect has not been investigated yet.

In Figure 3.2, the perceptions of ease-of-use and usefulness constitute the third and fourth categories of the perceptual effects arising from the use of KBS explanations. Valid and reliable instruments exist in the information systems literature for measuring these constructs [Davis, 1986; and Moore and Benbasat, 1991]. The inclusion of explanations as part of the user-KBS interface increases its complexity. This could mean that explanations may have a negative effect on user perceptions of the ease of use of a KBS. This effect has not been investigated yet. Perceptions of usefulness focus on the degree to which users perceive that using a particular decision aid enhanced their task performance. Generally, the use of KBS explanations can be expected to lead to more positive user perceptions of usefulness. Ye [1990] investigated the effect of using the three types of explanations on user perceptions of usefulness.
He found that the use of the Why and How explanations resulted in more positive perceptions of usefulness than the use of Strategic explanations.

3.3.4 Learning Effects of the Use of KBS Explanations

In Figure 3.2, the learning construct of user understanding moderates the influence of the use of explanations on the perceived effects. For example, it is a precondition that before a system can be perceived as being trustworthy, easy-to-use, or useful, it must first be comprehensible and clear to its users. As well, users will only be satisfied and confident with systems that they understand and comprehend. This argument is consistent with the assertion of current theories of mental models [Norman, 1983], cognitive learning [Balzer et al., 1989], and framing effects [Minsky, 1975] that people's perceptions of the world are shaped by the understanding that they already possess. As a consequence of its role in moulding user perceptions, the user understanding that is fostered by KBS explanations can be expected to be a significant determinant of the intention to use a KBS and its explanations in the future. User understanding also moderates the influence of the use of explanations on the efficiency and effectiveness of judgmental decision making. It can be argued that users will demonstrate more effective and efficient decisional performance when using KBS explanations largely because the improved user understanding of the task that is fostered by the explanations will allow them to perform better. This is consistent with, and supported by, the findings of the research based on the cognitive feedback paradigm, which was discussed in Chapter 2. Measurement of the learning, i.e., the improved understanding that is fostered, can therefore be considered a critical dependent measure representing the effects of the use of KBS explanations.

There are two ways in which user understanding can potentially be operationalized in KBS studies.
First, it can be measured using multi-item scales to capture user perceptions of learning. Second, it can be operationalized as defined by the conversation theory of learning [Pask, 1976]. This theory views teaching as the passing of one's understanding to others and learning as the gaining of an understanding. Teaching and learning develop as the participants, e.g., the explanation facility and the user, reach agreements about the subject matter. It also defines understanding as the ability to reconstruct or develop the concept learned either when it is forgotten or when circumstances have changed. A critical implication of the theory is that the agreements that are reached by the participants can be demonstrated by testing. Thus, the approach to the measurement of user understanding should not be the collection of perceptual measures but rather testing whether subjects are 1) able to reconstruct the concepts when forgotten, or 2) able to adjust the concepts to changing situations.

3.4 Hypotheses

Chapter 2 provided a cognitive learning theoretical basis for KBS explanations by conceptualizing their provision as feedforward and feedback. These feedforward and feedback strategies for providing KBS explanations were evaluated within the comprehensive framework for investigating KBS explanations. In doing so, three fundamental research questions were addressed: 1) To what extent are explanations used in judgmental decision-making situations involving the use of a KBS? 2) What factors influence the use of KBS explanations? and 3) Does the use of explanations empower KBS users in any way? This section formulates and states several hypotheses that explore these research questions.

Figure 3.3 presents the research model that will be used. To investigate the first and second research questions, this research will study three of the factors that were identified in Section 3.2 (see Figure 3.1) as potential determinants of the use of KBS explanations.
It will investigate how explanation provision strategies, together with user expertise and the types of explanations provided, influence the use of KBS explanations. User expertise was identified in Section 2.4.2 of Chapter 2 as a relevant variable that could potentially affect the effectiveness of the feedforward and feedback explanation provision strategies. As well, Section 3.1.3 of this chapter, which discussed the relationship between the types of explanations and the explanation provision strategies, provided the basis for the inclusion of the types of explanations as a relevant variable. Table 3.3 presents the specific hypotheses that will be used.

Main Effects

H1: Feedback explanations will be used as much as the feedforward explanations.

H2: Novices will use more explanations than experts.

H3: The why explanation will be used the most, and the how explanation will be used more than the strategic explanation.

Interaction Effects

H4: Novices will use more feedforward explanations than feedback explanations, while experts will use more feedback explanations than feedforward explanations.

H5: For both feedforward and feedback explanation provision, the why explanation will be used the most, and the how explanation will be used more than the strategic explanation.

H6a: Novices will use the why explanation the most, and the how explanation more than the strategic explanation.

H6b: Experts will use the how explanation the most, and the strategic explanation more than the why explanation.

H7a: For both feedforward and feedback explanation provision, novices will use the why explanation the most, and the how explanation more than the strategic explanation.

H7b: For both feedforward and feedback explanation provision, experts will use the how explanation the most, and the strategic explanation more than the why explanation.

Table 3.3: Hypotheses Relating to the Determinants of the Use of Explanations
Hypotheses 1 through 3 directly test the influence of the three factors independently, while Hypotheses 4 through 7 focus on the influence of their interactions.

Considering that evidence from prior research indicates that 1) the level of user agreement has a significant relationship with the use of explanations, and 2) there are expert-novice differences in user agreement, the study will also investigate the influence of the initial level of user agreement with KBS conclusions on the use of KBS explanations. The specific hypotheses relating to this are presented in Table 3.4.

H8: There is an inverse relationship between the level of user agreement and the number of feedback explanations used.

H9: The relationship between the level of user agreement and the number of feedback explanations used will be the same for experts and novices.

H10: There is no difference in the level of user agreement between experts and novices.

Table 3.4: Hypotheses Relating to the Influence of User Agreement

Hypothesis 8 postulates that there will be an inverse relationship between the level of initial user agreement and the amount of explanations that are used. Hypothesis 9 tests the mediating influence of the level of user agreement on the relationship between user expertise and the use of KBS explanations. As well, Hypothesis 10 tests for expert-novice differences in the initial level of user agreement with KBS conclusions.

The third research question is investigated using two of the various effects of the use of KBS explanations that were identified in Section 3.3 (see Figure 3.2). As shown in Figure 3.3, the first of these is the effect on the accuracy of judgments, which represents the effectiveness of judgmental decision making.
Table 3.5 states the specific hypotheses that are used to assess this effect.

H11a: The use of feedforward explanations will improve the accuracy of judgments.

H11b: The use of feedback explanations will improve the accuracy of judgments.

H12a: The use of Why explanations will improve the accuracy of judgments.

H12b: The use of How explanations will improve the accuracy of judgments.

H12c: The use of Strategic explanations will improve the accuracy of judgments.

Table 3.5: Hypotheses Relating to the Influence of the Use of KBS Explanations on the Accuracy of Judgments

Hypothesis 11a tests for the influence of the use of feedforward explanations, while Hypothesis 11b tests the influence of the use of feedback explanations. Hypotheses 12a, 12b, and 12c focus on the influence of the use of the three types of explanations. The second effect explored is the effect on user perceptions of usefulness. This represents a perceived effect in Figure 3.3. The specific hypotheses relating to it are stated in Table 3.6.

H13a: The use of feedforward explanations will improve user perceptions of usefulness.

H13b: The use of feedback explanations will improve user perceptions of usefulness.

H14a: The use of Why explanations will improve user perceptions of usefulness.

H14b: The use of How explanations will improve user perceptions of usefulness.

H14c: The use of Strategic explanations will improve user perceptions of usefulness.

Table 3.6: Hypotheses Relating to the Influence of the Use of KBS Explanations on User Perceptions of Usefulness

Hypotheses 13a and 13b test the influence of the use of feedforward and feedback explanations, while Hypotheses 14a, 14b, and 14c test the influence of the use of the three types of explanations.
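To illustrate the form an inverse-relationship test such as Hypothesis 8 might take, the sketch below correlates initial agreement scores with counts of feedback explanations used. The data are entirely hypothetical and the computation is an illustration, not the statistical procedure actually employed in this study:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical subjects: initial agreement with KBS conclusions (1-7 scale)
# versus the number of feedback explanations each subject requested.
agreement = [7, 6, 5, 4, 3, 2, 1]
feedback_used = [0, 1, 2, 4, 5, 7, 9]
r = pearson_r(agreement, feedback_used)
# A strongly negative r would be consistent with the inverse
# relationship postulated by Hypothesis 8.
```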
The selection of the accuracy of judgment effect for investigation is based on the discussion, in Section 2.2.3.3 of Chapter 2, of past research that compared the relative efficacies of the feedforward and feedback cognitive learning operators in fostering the learning of judgmental decision making. User perceptions of usefulness were also selected as a dependent variable because Ye [1990] had found significant differences in these perceptions for the different types of explanations that were provided to KBS users. As well, it was felt that the selection of a "perceived" effect, together with a "real" one, would add to our understanding of how the use of KBS explanations empowered KBS users.

3.5 Summary of the Chapter

In this chapter, the use of explanations was first defined and significant prior research in the area was reviewed. The second and third sections then presented a two-part framework for studying the determinants and effects of the use of KBS explanations. This framework formed the basis for the research model that was then presented in Section 3.4. This research model comprises the pertinent variables identified, in both Chapter 2 and the first part of this chapter, as being relevant to the assessment of the provision of KBS explanations as feedforward and feedback. Section 3.4 also used the research model to state hypotheses about relationships among the variables. The chapters that follow describe an experiment that was carried out to test these hypotheses.

CHAPTER 4: THE TASK DOMAIN AND THE DEVELOPMENT AND VALIDATION OF THE EXPERIMENTAL SYSTEM

4.0 Introduction

This chapter describes the task domain that served as the context for the investigation of the determinants of the use of KBS explanations and their impact. It also discusses the development of the various stimulus material used in the investigation, including the Canacom financial analysis case, the MOUSE and CREDIT-ADVISOR tutorial systems, and the FINALYZER-XS experimental knowledge-based system (KBS).
As well, the results of various tests carried out to ascertain the experimental realism [Swieringa and Weick, 1982] of the experimental task material and the validity of the explanations provided by the experimental system are reported.

4.1 Selection of the Domain and the Task Application

Selection of the specific context that was used in this study was done in two steps. Initially, an appropriate administrative domain was identified. Next, the various types of task applications within this domain were evaluated to yield a suitable one. These two steps are discussed separately below.

4.1.1 The Domain

Several factors can be used to distinguish between application domains in which knowledge-based systems have been developed. These include the levels of complexity, the degree of uncertainty, the size of the problem space, and the levels of both perceived structure [Keen and Scott-Morton, 1978] and deep structure [Chomsky, 1971]. As KBS are used to solve complex, unstructured problems, it was important for the domain selected to have sufficient levels of complexity, uncertainty, and problem size so that users could perceive the need for a KBS of high quality. These levels can generally be controlled by the selection of application tasks within a domain. The two dimensions of domain structure, on the other hand, cannot be controlled in this manner. The underlying deep structure of a domain, which specifies the relationships in its problem space, is usually fixed, irrespective of the particular task selected. Perceived structure, as reflected by the extent to which there exists a well-defined and specified body of knowledge in the domain, is also independent of the task selected.

Considering the above factors, the financial evaluation domain was selected as the experimental domain. It offered an adequate level of deep structure to make the development and use of a KBS worthwhile from a user's perspective.
As well, the high specificity of its domain knowledge, which is based on the well-specified double-entry model of financial accounting, ensured that knowledge acquisition to develop such a system would be facilitated. The domain selected was validated against Zanakis and Evan's [1989] characterization of domains where the use of knowledge-based heuristics is most applicable. Considering that financial evaluation is too complex for simple optimization models and algorithms [Duchessi, Shawky and Seagle, 1988, p. 57], requires both quantitative and qualitative inputs, and involves both symbolic and numeric reasoning [Bouwman, 1983], it rated positively on this evaluation. Other considerations that also supported the selection of this domain include the following: 1) its extensive use in the financial sector meant that sufficiently large numbers of domain experts and experimental subjects would potentially be available to participate in the study; 2) the researcher's past training in financial accounting would be an asset in the design of the experimental system; and 3) the long history of the use of financial evaluation in the study of computer modelling of human diagnostic reasoning [Bouwman, 1978 and 1983] and the increasing interest in knowledge-based approaches to financial accounting applications [Elmer and Borowski, 1988; Duchessi et al., 1988; Srinivasan and Kim, 1988; and Shaw and Gentry, 1988] could serve to guide the selection and development of the experimental material.

4.1.2 Selection of the Task Application

Various tasks within the financial evaluation domain were considered, using both Clancey's [1985] heuristic classification and heuristic configuration task types and the taxonomy proposed by Hayes-Roth et al. [1983, p. 14]: interpretation, prediction, diagnosis, design, planning, monitoring, debugging, repair, instruction, and control.
Based on a review of the existing applications of financial knowledge-based systems [see, for example, the summary presented by Roy, 1989] and after consultation with several experts in financial management, the choices were narrowed down to two tasks. One was the "heuristic configuration" type task of designing financial investment plans for wealthy individuals by balancing a variety of financial objectives. This task is similar to that of the PlanPower expert system of Financial Designs, Inc. [Sviokla, 1986, p. 116]. The other was the "heuristic classification" type task of diagnosing the financial statements of companies to support loan granting and investment selection decisions.

Of the two options, the latter was selected as the experimental task for a variety of reasons. First, the bulk of current KBS perform diagnostic tasks that involve the identification of the state of an underlying system on the basis of a set of observable symptoms [Waterman, 1986]. As well, all the past studies of KBS interfaces have used diagnostic tasks. The selection of the financial analysis task over the financial planning task would therefore facilitate the comparison of the results of this study to those of the prior studies. The second consideration was that since expertise in financial planning involves the integration of specialized knowledge about financial investments, tax planning, insurance, estate planning, retirement planning, and related legal considerations, knowledge acquisition would be complex. A number of experts from these different fields would have to be recruited and their views consolidated. Sviokla [1986] notes that this can be an arduous task because these experts generally subscribe to significantly different principles and models of financial planning. Financial analysis expertise, however, is more focused and consistent.
A smaller number of experts would be needed, and they would generally be easier to recruit given the pervasiveness of the use of financial statement analysis.

Financial statement analysis involves the review of a company's financial data to evaluate various aspects of its financial standing and performance. Based on such an assessment of underlying financial health, a whole range of business decisions are made. For example, while it is used primarily in the two decision contexts of investment portfolio selection and loan evaluation, it is also used for estimating a firm's market risk (beta) and for predicting bond ratings [Brealey and Myers, 1984]. The loan evaluation decision context was selected to be the focus of the study. It was also decided that a commercial lending task would be more appropriate than a consumer credit task, as both the composition of the financial statements and the expertise required to evaluate them would be complex enough to warrant the use of a KBS.

Seven major categories of financial ratios are commonly used in financial analysis: liquidity ratios, leverage ratios, profitability or efficiency ratios, market value ratios, funds flow adequacy and reinvestment ratios, and common-size ratios of the individual items comprising both the balance sheet and the income statement. Additionally, other subjective factors affecting these ratios are also usually considered.
Examples of these include the possibilities of losing major litigation battles, having foreign subsidiaries nationalized, competitors introducing significantly superior products, etc. As there are no absolute standards for these financial ratios, relative standards are generally used. This involves comparing a firm's ratios to the same ratios in earlier years, and with the ratios of other firms in the same industry, which are often summarized into industry composites.

4.1.3 Decision Support Aids for Financial Statement Analysis

The conduct of financial statement analysis generally involves three steps. Initially, the various categories of ratios are computed. Next, each category of ratios is evaluated to yield judgments relating to each of the various sub-aspects of financial analysis, such as liquidity, capital structure, profitability, etc. Finally, these judgments form the basis for the making of specific decisions or predictions. Various computerised decision aids are currently available to support the first step, involving the computation of the ratios. Considering that it is a structured process [Gorry and Scott-Morton, 1971] with quantitative inputs and precisely defined transformation procedures, this step was the first to benefit from computerization. These decision aids generally take the form of spreadsheet-based financial analysis packages, such as FSAP [Stickney, 1990] and FISCAL [Halcyon Group, 1990], and are used widely in industry. They facilitate decision support functions such as what-if analysis, sensitivity analysis, and goal-seeking analysis [Sprague and Carlson, 1982].

The second step, involving the evaluation of the ratios to produce judgments, represents a more unstructured process. It is characterized by the use of specialized domain-specific knowledge. This generally takes the form of industry-specific knowledge about: 1) the relevance and value of each of the ratios computed, and 2) the relative standards by which the values of the ratios are to be judged.
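The structured first step (ratio computation) and the use of relative standards can be illustrated with a minimal sketch. The ratio definitions are standard, but the firm's figures, the industry composite values, and all function names are hypothetical:

```python
def liquidity_ratios(balance_sheet):
    """Compute two common liquidity ratios from a balance-sheet dict."""
    bs = balance_sheet
    current = bs["current_assets"] / bs["current_liabilities"]
    # Quick (acid-test) ratio excludes inventory, the least liquid current asset.
    quick = (bs["current_assets"] - bs["inventory"]) / bs["current_liabilities"]
    return {"current_ratio": current, "quick_ratio": quick}

def compare_to_composite(ratios, industry_composite):
    """Relative standards: flag each ratio as above or below the industry composite."""
    return {name: ("above" if value >= industry_composite[name] else "below")
            for name, value in ratios.items()}

# Hypothetical firm and industry composite figures
firm = {"current_assets": 600_000, "current_liabilities": 400_000, "inventory": 150_000}
ratios = liquidity_ratios(firm)
flags = compare_to_composite(ratios, {"current_ratio": 1.8, "quick_ratio": 1.0})
```

The unstructured second step would then attach industry-specific interpretations to these flags, which is the part of the process that calls for the knowledge-based support discussed below.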
Decision aids that incorporate such knowledge, such as knowledge-based systems, are needed to support this step of the financial analysis process. For example, Miller, Pople and Meyers [1982] have argued that a financial analyst would greatly benefit from knowledge-based systems that contain knowledge of specialized industries, just as a medical general practitioner greatly benefits from knowledge-based systems that cover medical specialties. Considering the large number of industries that exist and the fact that the boundaries between them are often fuzzy, especially given the large number of firms that have pursued diversification, it is no surprise that there is no single commercialized multi-industry KBS available that supports this step of the financial analysis process. This situation is likely exacerbated by the significant cost and difficulty associated with acquiring such domain knowledge as part of the systems development process.

There is evidence, however, that various organizations have attempted the in-house development of financial analysis KBS that are focused on a limited set of industries. As reported by Roy [1989, pp. 331-332] and in the Expert Systems Strategies newsletter of Cutter Information Corporation [1988], these include the Authorizer's Assistant of American Express, the Letter of Credit Adviser of the Bank of America, the Financial Analyzer of the Athena Group, and the Implode system of the Security Pacific Merchant Banking Group. Established systems development organizations have also started offering services relating to the development of such systems, e.g., Arthur Andersen & Company's development of the Financial Statement Analyzer for the US Securities and Exchange Commission [McGee and Porter, 1988] and Arthur D. Little Incorporated's development of the Indenture Analysis system.
More recently, smaller specialized systems development organizations have begun to appear in the marketplace to exploit this niche of industry-specific KBS for financial statement analysis. In addition to providing specialized KBS development assistance, they generally offer an integrated package of software capabilities, in the form of a skeletal system or "shell", that is specifically constructed for the financial statement analysis task. These shells can quickly and easily be customized or instantiated with the domain knowledge of a particular industry to yield specific knowledge-based systems. Examples include FAST ADVISOR of Financial Proformas Incorporated, which is currently being used at the Bank of Montreal [White, 1991]; Power 1 of BancA Corporation, which is used at the Canadian Imperial Banking Corporation [Shaw and Gentry, 1988, p. 46]; ANSWERS of Financial Audit Systems Incorporated [Financial Audit Systems, 1992]; and Lending Adviser of Syntelligence Incorporated, which is used at the Wells Fargo Bank [Syntelligence Marketing Material, 1990].

Step three of the financial analysis process involves the conversion of the various financial judgments into particular decisions or predictions. Decision aids for this step will therefore take the form of "decision-taking" systems, as compared to the "decision support" approach taken for the previous step as discussed above. While this step is similar to step two in terms of being an unstructured procedure requiring large amounts of domain knowledge for its resolution, the few commercial decision support systems used in industry do not take the form of knowledge-based system applications. Instead, algorithmic credit scoring approaches based on elaborate weighting schemes, multivariate statistical methodologies, e.g., logit and discriminant analysis [Srinivasan and Kim, 1988], and induction procedures [Shaw and Gentry, 1988] are used.
As well, the use of these decision aids is largely limited to applications relating to the analysis of personal financial statements in consumer credit decisions. Of the three steps of financial analysis, this step is the least automated in industry. It has been suggested that the financial industry is reluctant to adopt the use of these aids in the more substantive corporate lending and investment situations primarily because of: 1) legal liability and accountability considerations arising from the use of such "decision taking" systems, and 2) the failure of these systems to take into consideration subjective factors relating to the maintenance of long-term business relationships between financial institutions and their clients [Kerkovius, 1992; White, 1991].

4.2 Development of the Experimental Material

Considering the current availability of decision support aids for financial analysis, as discussed in the last section, attempts were initially made to obtain the use of a commercial KBS used to support step 2 of the process. Although none of these KBS were known to incorporate any elaborate explanation provision facilities, it was felt that such a capability could be added for the purposes of this research. As well, attempts were made to obtain the participation of a local financial institution that was in the process of implementing such a KBS. When these efforts proved fruitless, a simulation approach was decided upon, whereby a simulation of the interface of a hypothetical KBS for financial analysis would be developed and subjects using it convinced that they were interacting with a functional KBS. This simulation approach has been commonly utilized in laboratory studies testing new human-computer interfaces [Good, et al., 1984]. In the context of KBS interface design, it has been used by Ye [1990], as was discussed earlier. However, before a simulation could be developed, it was first necessary to determine the exact details of the experimental task to be used.
This was because the simulation to be developed had to meet all the requirements for completely solving this task. The development of this task is discussed next.

4.2.1 Development of the Canacom Experimental Case

Specific criteria used in the development of the case included the following. First, the task had to be non-trivial so that subjects could perceive the need for a KBS to support it. Glass [1992] has noted that non-triviality does not necessarily correspond to task complexity. Second, it had to be challenging enough to engage both expert and novice subjects. Third, it had to be a realistic task. Various researchers in management information systems have warned about the dangers of using unrealistic tasks in experimental designs [Jarvenpaa, Dickson and DeSanctis, 1985; Benbasat, 1989]. While it was not the intention to use completely new task material, this was made imperative by the fact that task material used in prior studies was either not available or did not meet the criteria specified above. Generally, the task material used in the study of the composition of financial expertise [e.g., Bouwman, 1990] and its decision processes [e.g., Bouwman, 1983] was found to be too artificial and trivial to meet the requirements for a KBS. Task material from field studies in accounting of the loan granting and bankruptcy prediction tasks [e.g., Danos, Holt and Imhoff, 1989] was generally not readily available, possibly for confidentiality reasons.

Initially, case problems in various texts of financial accounting [Stickney, 1990; Foster, 1986; Bernstein, 1989] and managerial finance [Butters, Fruhan, Mullins and Piper, 1981] were reviewed and evaluated to select an appropriate industry and to determine the minimal set of information that would have to be included in a comprehensive case. Based on this, the high-technology computer manufacturing and retail industry was selected as the context for the case. Using the Computer Industry Case Analysis [Stickney, 1990, p.
C-65], the Digital Equipment Case [ibid, p. AR-199], and the Tandy Corporation Case [Bernstein, 1989] as a guide, the financial statements and annual reports of various firms in this industry were sampled and evaluated using a CD-ROM based financial database. Only companies listed on the New York and American stock exchanges were considered in this evaluation. This was because of the limited number of Canadian companies in this industry. As well, it was felt that by using a disguised foreign company the chances of biases caused by some subjects being familiar with the case would also be minimized.

Of the various companies, one was selected (Tandy Corporation) to provide the basis for the case and was disguised as the Vancouver-based Canacom Corporation. Complete annual reports and market reports of the company for a period of ten years between 1978-1988 were obtained and used as the basis for the development of the case. Under the guidance of an accounting professor who teaches financial analysis, the financial statements were reorganized to be in the format prescribed for Canadian companies. As well, other information was modified to "Canadianize" the case and to minimize the time required to read and understand it. Careful attention was paid to the information cues that were to be provided, such as the fact that the auditor's report of the company had yielded an unqualified opinion in the last five years, etc. Appendix 2 presents a copy of the case developed, together with the financial statements and other experimental instructions.

The case was then refined and validated iteratively by having two doctoral students in accounting, two chartered accounting students, and two experienced financial analysts (both of whom would later serve as domain experts in the development of the experimental system) analyze and solve the case. Special attention was paid to the time they took to complete the case, the information cues they used, and their perceptions of the difficulty of the case.
Their opinions as to missing information cues that they would like added to the case, and information cues that were vague, were actively solicited and used to improve the case. For example, it was found that close to eighty percent of the time spent in completing the case was used to compute the various ratios that were required. While the four students did not particularly mind this and generally got straight down to computing the ratios, both the experienced analysts minimized the amount of effort they spent doing these computations. They suggested that a complete set of ratios should be provided as part of the case, as virtually all organizations in industry used financial accounting packages and spreadsheets to compute them. This led to the development of a comprehensive set of financial ratios which were subsequently added to the case. Four iterations involving the two financial analysts and the accounting professor were needed to complete this set of ratios. These iterations were necessary to resolve differences of opinion as to which ratios had to be included, what each ratio was to be called, and exactly how each ratio was to be computed. Ratios were computed for a five-year period (see Appendix 2). Another three suggestions that were incorporated into the case included the addition of the common-size financial statement ratios, the ratios for the competitor firm, and the industry composites. This ensured that the case contained all the information necessary to perform the three types of comparisons that are used in financial analysis: trend comparisons, comparisons to competitors, and comparisons to industry averages. The Standard Industry Code (SIC) of the real company on which the case is based was used to obtain these industry composites. Considering that the company participates in both the computer manufacturing and retailing segments, the composites of both these industry segments were included.
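Common-size ratios of the kind added to the case express each statement line item as a percentage of a base figure (total revenue for the income statement, total assets for the balance sheet). A minimal sketch of the computation, using hypothetical figures rather than the actual Canacom data:

```python
# Common-size statement sketch: each line item is re-expressed as a percentage
# of a base line item. The figures below are hypothetical and are not taken
# from the Canacom case.

def common_size(items, base_key):
    """Return each line item as a percentage of the base line item."""
    base = items[base_key]
    return {name: round(100.0 * value / base, 1) for name, value in items.items()}

income_statement = {
    "Revenue": 1000.0,
    "Cost of goods sold": 620.0,
    "Operating expenses": 250.0,
    "Net income": 130.0,
}

# Cost of goods sold appears as 62.0, i.e., 62 percent of revenue.
print(common_size(income_statement, "Revenue"))
```

Computed over each of the five years in the case, such percentages support the same trend, competitor, and industry-composite comparisons as the other ratio categories.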
The criteria used to select the competitor firm (Apple Corporation) included: the same approximate balance between computer manufacturing and retail operations, roughly the same overall scale of operations, and similarity in sales strategy (both used their own chains of retail stores to market their products).

4.2.2 Development of the FINALYZER-XS Experimental System

Considering that the simulation approach was being used, it was imperative that the system developed completely cover all aspects of the experimental case. Knowledge acquisition was therefore a critical aspect of the system development. It had to adequately elicit all the expertise that human experts utilized to completely analyze the Canacom case. As well, users had to be convinced that they were interacting with a "true" KBS. Failure to do this would mean that users' behavior in the use of the system would be biased and would therefore compromise the goals of the research. As user perceptions of system "expertise" are based primarily on the information cues provided by the user-system dialogue [Einhorn and Hogarth, 1986], interface design was the second major aspect in the development of the system. While knowledge acquisition and interface design were handled separately, and are therefore discussed in separate sections below, it was decided that both the domain expertise and the interaction sequence of the system would be modelled on the decision process obtained from an analysis of concurrent verbal protocols [Ericsson and Simon, 1984] elicited from a human expert in financial analysis. This approach is similar to that advanced by Bouwman [1983], who argued that computer programs could be made more "human-like" by basing them on human diagnostic behaviour.
Considering that none of the subjects in the study would have prior experience with the use of a financial KBS, but most would have interacted with human experts in financial analysis, it was felt that this approach to the systems development would improve users' perceptions of the validity and usability of the system. Therefore, the first step in the development of the system was to collect and analyze the verbal protocols. This is discussed next.

4.2.2.1 Decision Process Used in Financial Analysis

The following is the decision process of an experienced financial analyst evaluating the financial statements of a company to determine an acceptable loan size and to predict its future performance. It was derived from an analysis of concurrent protocols that were collected, using neutral probes [Ericsson and Simon, 1984], as the analyst performed the financial analysis task. The analyst, a partner in a medium-sized accounting firm, has been a chartered accountant for twenty years. The steps comprising the decision process were elicited by using the feedforward-feedback conceptualization of the user-KBS interaction, as discussed in Chapter 3, to code the verbal protocols. According to this conceptualization, the interaction is viewed as being analogous to an alternating series of steps involving the system exchanging information with the user, spaced by the system performing a particular operation (see Figure 3.2). The steps as viewed according to this conceptualization are highlighted below in bold.

Step 1: Compute a profitability ratio, initially the return on investment (ROI): information exchange (input)
Step 2: Evaluate it in terms of trends and possible qualitative factors: system operation
Step 3: Form conclusions as to profitability and make a decision as to the need for further profitability analysis.
If yes, decide on another ratio and go to Step 1: information exchange (outcome)
Step 4: Compute a leverage ratio, initially the debt-equity ratio: information exchange (input)
Step 5: Evaluate it in terms of trends and possible qualitative factors: system operation
Step 6: Form conclusions as to leverage and make a decision as to the need for further leverage analysis. If yes, decide on another ratio and go to Step 4: information exchange (outcome)
Step 7: Compute a liquidity ratio, initially the quick ratio: information exchange (input)
Step 8: Evaluate it in terms of trends and possible qualitative factors: system operation
Step 9: Form conclusions as to liquidity and make a decision as to the need for further liquidity analysis. If yes, decide on another ratio and go to Step 7: information exchange (outcome)
Step 10: Compute a market value ratio, initially the price-earnings (P/E) ratio: information exchange (input)
Step 11: Evaluate it in terms of trends, market risks, and other possible qualitative factors, including the impact on return, leverage and cash flow: system operation
Step 12: Form conclusions as to market value and make a decision as to the need for further market value analysis. If yes, decide on another ratio or risk factor and go to Step 10: information exchange (outcome)
Step 13: Compute a funds flow ratio or perform a funds flow item comparison: information exchange (input)
Step 14: Evaluate it in terms of trends, risks, and other possible qualitative factors, including the impact on return, leverage and liquidity: system operation
Step 15: Form conclusions as to funds flow and make a decision as to the need for further funds flow analysis.
If yes, decide on another ratio or comparison and go to Step 13: information exchange (outcome)
Step 16: Select a common-size financial statement of the company, initially the common-size income statement: information exchange (input)
Step 17: Evaluate the trends of the individual items both in relation to one another and to other possible qualitative factors: system operation
Step 18: Form conclusions as to the strengths and weaknesses of the items and make a decision as to the need for further common-size and trend analysis. If yes, select the common-size balance sheet and go to Step 16: information exchange (outcome)
Step 19: Select for consideration one of the specific judgmental tasks that has to be performed: information exchange (input)
Step 20: Consider the combined effect of all the financial analysis conclusions made earlier in relation to this specific judgment task: system operation
Step 21: Generate the judgment. Decide if another judgment is required; if yes, go to Step 19: information exchange (outcome)

This model of the decision process served as the basis of a modular structure comprising the types of financial analysis that the experimental system was to perform (see Figure 4.1). The protocols revealed that the analyst performed seven different types of analysis: leverage analysis, liquidity analysis, profitability analysis, market value analysis, funds flow analysis, common-size income statement analysis, and common-size balance sheet analysis. As well, each of these analyses was performed independently in a sequential manner. The combined impact of the various analyses was only considered at the end of the process, when each of the specific judgment tasks was resolved (Steps 19 through 21 above).
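The repeating pattern underlying Steps 1 through 21 — seven analyses, each an information-exchange (input), system-operation, information-exchange (outcome) triple — can be sketched as a simple loop. The module names are taken from the protocol analysis; the step strings are illustrative placeholders, not the system's actual rules:

```python
# Sketch of the three-step interaction cycle derived from the protocols:
# feedforward (inputs shown to the user), system operation (analysis performed),
# feedback (conclusions returned). The generated strings are placeholders and
# do not reproduce the system's actual domain expertise.

ANALYSIS_TYPES = [
    "leverage", "liquidity", "profitability", "market value",
    "funds flow", "common-size income statement", "common-size balance sheet",
]

def run_cycle(analysis_type):
    """One feedforward -> operation -> feedback cycle for a given analysis."""
    feedforward = f"Inputs to be used for {analysis_type} analysis"
    operation = f"Performing {analysis_type} analysis"   # the system 'goes away'
    feedback = f"Conclusions reached for {analysis_type} analysis"
    return [feedforward, operation, feedback]

# The full session is seven such cycles, one per analysis type.
session = [step for a in ANALYSIS_TYPES for step in run_cycle(a)]
print(len(session))  # 21 steps, matching Steps 1-21 above
```

Seven analyses of three steps each yield the twenty-one steps enumerated in the decision process.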
This modular structure served as a guide both to the manner in which the knowledge acquisition activities were carried out and to the internal design of the system.

The design of the sequence of the experimental system's screens was modelled directly on the decision process depicted above. Note that steps 1, 4, 7, 10, 13, 16 and 19 represent information exchanges where the system communicates with the user about the inputs that are to be used to perform a particular analysis. Considering that these information exchanges deal with input information and occur before a particular analysis is performed, they can be viewed as being feedforward steps. Steps 2, 5, 8, 11, 14, 17 and 20 are system operations where the system "goes away" from the control of the user to perform an action, i.e., a particular analysis. Finally, steps 3, 6, 9, 12, 15, 18 and 21 are information exchanges where the system presents its conclusions and recommendations to the user. These can be considered to be feedback steps as they relate to the exchange of output information and occur subsequent to a particular analysis being performed by the system. The interaction can therefore be viewed as being comprised of three-step cycles, one for each of the seven types of analysis performed.

[Figure 4.1: Modular structure of the FINALYZER-XS system. An Analysis Selection screen links to seven analysis modules (balance sheet analysis, income statement analysis, funds flow analysis, liquidity analysis, capital structure analysis, profitability analysis, and market value analysis), each comprising a feedforward screen, a system operation screen, and a feedback screen, together with introduction, financial statement presentation, overall summary, and exit screens.]
Each cycle involves starting with an information exchange about the inputs that are to be used, going on to the execution of a particular system operation, and ending with the system and the user exchanging information about the outcomes of the operation that the system performed. As can be seen in Figure 4.1, for each of the seven types of analysis that FINALYZER-XS performed, three different kinds of screens were developed to represent this three-step cycle: feedforward screens, system operation screens, and feedback screens. The feedforward screens listed all the inputs, in the form of ratios and comparison procedures, that the system was to use, together with the following message: "All the following inputs will be used by the system as part of its analysis...". The system operation screens comprised two screens presented sequentially. The first presented a short message such as "Computing asset utilization and profitability ratios..." and was displayed for about three seconds. Its objective was to give users the impression that the system was actually performing the computations. The second screen displayed the tables of ratios that had been computed and to which the system was applying its financial expertise. Finally, the feedback screens listed the various conclusions reached by the system, together with the message "The following conclusions have been reached by the system...". Appendix 6 presents these three kinds of screens for all the analysis modules of the experimental system.

The verbal protocols collected also served as the basis for the composition of the various screens. For example, conscious effort was made to ensure that the outcomes or conclusions presented in the feedback screens were similar to, as well as couched in the same language as, those used by the domain expert.
The following are some examples extracted from the protocols collected: at Step 2, "Has produced a respectable return on equity (ROE) in 3 of 5 years"; at Step 5, "Has leveraged close to 1 to 1, and is susceptible to a recession", and "The ROE is so steady because of the low debt-equity ratio"; and at Step 8, "The repayment structure will have to reflect that liquidity will be low in the initial two years."

4.2.2.2 Interface Design Issues

All of the screens comprising the FINALYZER-XS simulation were designed by the researcher. These were then implemented by a programmer using the Windows 3.0 [Microsoft, 1990] software running in a DOS environment on a 33 MHz 80386 microcomputer. A mouse-driven, multi-window interface was selected over a line-at-a-time, command language type interface for two reasons. First, it was necessary to make the system as easy to use as possible. As some of the subjects would have no prior experience in the use of computer interfaces, it was felt that a mouse-driven, multi-window interface would be easier to learn, easier to use, and less prone to user errors [see Majchrzak, et al., 1987 for a summary of this research]. Second, the mouse-driven, multi-window style has recently become the interface of choice for the bulk of current knowledge-based system applications. Virtually all of the leading high-end KBS development environments utilize it. For example, the personal computer version of NEXPERT Object [Neuron Data, 1991] offers such a mouse-driven, multi-window front-end to its knowledge base, based on the Windows 3.0 software. Along these same lines, Jones [1990] argues that while in the 1970s the user interface of a decision support system generally consisted of some form of line-at-a-time dialogue with the computer, beginning in the mid-1980s direct manipulation interfaces, with mouse-driven, icon-based action languages and multi-window presentation languages, are becoming the norm.
He goes on to argue that the growing availability of this interface style will make it a simple necessity in future decision support systems.

In an effort to keep the interface as simple as possible, only two types of iconic symbols were utilized as part of the action language by which users communicated with the system. These icons were the radio button and the push button [Borland Whitewater Resource Toolkit, 1990]. The feedback screens presented in Appendix 6 provide examples of both these icons. Users utilized radio buttons to select particular items from among a set of options, such as input ratios, system conclusions, or types of analysis. As well, by clicking on one of a string of seven radio buttons that comprised a seven-point Likert-type scale, users used the radio buttons to specify their level of agreement with system conclusions. Push buttons were used to switch from one screen to another (screen control), to trigger the start of a particular procedure or analysis, and to view a particular type of explanation that was provided.

Another critical decision that influenced the design of the interface was the role the KBS was to play in relation to the user. This is related to the "decision support" versus "decision taking" distinction that was discussed earlier in Section 4.1.3 in relation to currently available decision aids for financial analysis. Considering that the focus of this research was on the user support role as against the user replacement role [Wensley, 1989], the system was designed to be an expert consulting and analysis tool (decision aid) rather than a decision making tool. It aids the user by performing all the various computations required and by providing expert diagnostic conclusions relating to the various aspects of the financial health of a company, such as liquidity, capital structure, asset utilization, etc.
It does not provide the user with specific prediction estimates or judgmental decisions that the user may be attempting to resolve, e.g., whether the stock being analyzed should be bought or sold, or whether a loan should be granted to the company whose financial statements are being analyzed.

Another related interface design issue was the control of the user-KBS interaction. It is related to Silver's [1991, pp. 115-121] notion of system restrictiveness, i.e., the degree to which, and the manner in which, a system limits its users' decision-making processes to a subset of all possible processes. Three specific models of dialogue control can generally be conceptualized: system-driven, user-driven, and hybrid-control. In a system-driven interaction, the KBS leads the user through the various steps of the financial analysis task in a pre-determined and fixed sequential order. Many systems, especially those that are used in experimental studies such as this, utilize this mode as it facilitates control (especially experimental control) by minimizing user errors, user responsibility, and the chance that not all aspects of a particular task may be considered. Some authors have even argued that this is the optimal mode for KBS interfaces because it more closely resembles the manner in which human experts interact with their clients [Coombs and Alty, 1980]. From an experimental control perspective, however, there is the danger that it may cause users to behave "unnaturally" in the performance of the financial analysis task. User-driven interfaces, on the other hand, offer complete flexibility to users in terms of the kinds of analysis that they wish to perform, the order in which they wish to perform it, and even the total amount of analysis that they wish to perform. FINALYZER-XS was designed to utilize the hybrid-control mode. It offered flexibility in terms of the kinds of analysis the user wanted to undertake.
For example, as depicted in Figure 4.1, an Analysis Selection Screen was included that offered users the choice as to the type of analysis they wished to do. Appendix 6 presents the actual screen that was used. On the completion of a particular analysis, the system always returned to this "control" screen for the next selection by the user. Exiting the system, by first switching to the Overall Summary Screen, was treated as one of the choices on this Analysis Selection Screen (see Figure 4.1). However, for the purposes of experimental control, some restrictions had to be placed on the flexibility offered by the Analysis Selection Screen. It was decided that while users would have control over the sequential order in which they performed the various types of analysis, they were required to complete all seven types of analysis before they could exit the system through the Overall Summary Screen. This was to ensure consistency in the types of analysis performed by all the subjects.

The requirement that users had to be convinced that they were using a "functional" KBS and not just a simulation led to various refinements in the interface design. First, the initial Introduction Screen (page 363 of Appendix 6) was modified to include system requests for the names of computer data files from which the system would purportedly obtain: 1) the financial statements of the company to be analyzed, and 2) the industry-specific financial expertise that was to be applied to the financial statements. Since neither of these files was really needed or even existed, hypothetical file names were included as part of the screen: "Cana.KBS" for the financial statements of Canacom Corporation and "SIC6479" for the rules comprising the domain expertise of the computer manufacturing and retail industry. Second, intermediate screens with messages such as "Reading Balance Sheets...
", "Reading Income Statements ", and "Reading Statement ofChanges in Financial Position" were displayed for three seconds each to give the impression thatthe system was actually reading in the financial data or performing some particular computations.Casual exploration with various time intervals and involving the programmer and the researcher wasused to decide on the three second display interval for these screens. Further, besides all the aboveinterface design considerations, other aspects of the FINALYZER-XS design, e.g., the format forthe presentation of the ratio tables, and the inclusion of a logo screen for the system, etc., weremodelled to be similar to that of commercially available DSS-type financial analysis softwarepackages such as FSAP [Stickney, 1990] and FISCAL [Halcyon Group, 1990]. It was felt that thiswould help increase the realism of the experimental system.974.2.2.3 Knowledge Acquisition IssuesThe modular structure of Figure 4.1 was also used as the basis for the knowledge acquisitionactivities that were undertaken to complete the experimental system. Knowledge acquisition for eachof the seven types of financial analysis was performed independently in a sequential manner. Foreach of these analysis-types, the appropriate feedforward, system operation, and feedback screenswere developed with the help of a team of five experts in financial analysis. The first two of theseexperts were the same individuals who had earlier been involved in the development and refinementof the Canacom Corporation case. One of them (Expert 1) is a manager with the InvestmentManagement Division of Alberta Treasury, which is responsible for the management of the multi-billion dollar Alberta Heritage Savings Fund. He has a Masters degree in finance, holds theCertified Financial Analyst (CFA) designation, has fifteen years experience in corporate lending andinvestment management, and is a past-president of the Edmonton chapter of the Society forFinancial Analysts. 
The second expert (Expert 2) works as a Senior Equity Analyst in the same organization and is a subordinate of Expert 1. He is both a Chartered Accountant (CA) and a CFA, and has fourteen years of experience in financial planning and investment analysis. Considering that his current job requires him to monitor market developments in the high-technology computer manufacturing and retail industry, he possesses specialized domain knowledge of that industry. He served as the primary expert in the development of the experimental system. The third expert was the same individual who had earlier provided the concurrent verbal protocols from which the decision process of Section 4.2.2.1 was derived. He is a partner in a medium-sized accounting firm, holds the CA designation, and has twenty years of experience, primarily in the preparation and analytical review of financial statements. The fourth and fifth experts were recruited from the banking sector to provide a corporate lending perspective. This ensured that the expert team comprised expertise in all three primary areas in which financial statement analysis is heavily used: commercial lending, investment management, and analytical review in auditing. Expert 4 currently serves as the vice-president in charge of corporate lending at the Canadian Western Bank. He holds a Bachelors degree in the physical sciences and has twenty-three years of experience in commercial lending at all levels of the corporate hierarchy. Expert 5 is a subordinate of Expert 4, holds the CFA designation, and possesses twelve years of experience in commercial lending at two banks.

For each of the seven types of financial analysis, knowledge acquisition generally involved obtaining the expert team's consensus on the composition of the feedforward, system operation, and feedback screens. In this sense, the modular structure presented in Figure 4.1 greatly facilitated the knowledge acquisition. However, it was first necessary to validate the modular structure.
For this purpose, two more sets of concurrent verbal protocols were collected, provided by Experts 2 and 5. These were then analyzed by flowcharting the decision processes used, at the same level of analysis as the decision process that is summarized in Section 4.2.2.1. It was found that these experts generally used a similar decision process. They performed each of the same seven types of analysis independently; however, the order in which they performed them varied. While this validated the seven types of analysis included in the modular structure, it also made the case for flexibility in the design of the system in relation to the order in which the various analyses could be performed. There were also some minor differences in the classification and the terminology used for the seven types of analysis. For example, Expert 5 performed profitability analysis separately from asset utilization analysis and viewed it as being a distinct type of analysis. Another difference was in the use of cash flow analysis. Similar to Expert 3, who provided the earlier protocol, Expert 2 placed significant emphasis on it and used cash flow ratios as part of each of the seven types of analysis. Expert 5, however, viewed it as being strictly a subset of funds flow analysis. These two differences reflected varying levels of commitment to the use of: 1) analysis that relates return on investment to asset turnover, and 2) cash flow analysis. Such differences of opinion were generally resolved by consulting other experts and by consulting the accounting professor who helped design the Canacom case. In some situations where two different options were equally valid, the option favoured by the primary expert (Expert 2) was selected.

The next step involved designing the composition of the feedforward, feedback, and system operation screens.
It was decided that for each of the seven types of financial analysis, tables of the relevant ratios would be used as the system operation screens. While these tables had been developed earlier as part of the development of the Canacom case, the five experts reviewed them and all their suggestions were incorporated into the revised tables. The bulk of these suggestions related to the addition of particular ratios. The resulting tables can be viewed as being comprehensive collections of the ratios that are required to perform any of the analyses. This was important as it reinforced user perceptions of the completeness of the experimental system. During the subsequent data collection stage, none of the subjects complained about any particular ratio being unavailable. Next, the feedforward screens were developed. Considering that these screens were to display lists of the inputs that the system used in performing a particular analysis, they could be obtained in the first instance from the tables of ratios that comprised the system operation screens. The researcher did this for all seven of the feedforward screens initially and, just as with the system operation screens, had each of the five experts review them. The experts required few changes to these screens. Note that the expert team never met as a group for these review sessions, which were generally conducted individually with each expert by the researcher. However, there were instances when the researcher met Experts 1 and 2 jointly as they worked in the same office. This was not the case, however, for Experts 4 and 5 although they too worked at the same place.

The development of the feedback screens was more complicated and required much more of the experts' time. This was because the experts had to apply their expertise in solving the case to generate the conclusions that were to be included in the screens.
Knowledge acquisition was performed one module at a time, starting with liquidity analysis, and used a variation of the Delphi technique [Helmer and Rescher, 1959]. The experts were given the complete Canacom case and asked to analyze it to generate conclusions relating to one of the modules. They were also asked to provide reasons that explained the conclusions they reached. Three of the experts preferred doing this without the researcher being present and mailed their responses to the researcher. The other two experts did not mind being "put on the spot" and generally analyzed the case in the presence of the researcher to produce their conclusions. They generally provided their conclusions and their rationale in verbal form. These were taped and transcribed. For each module, the researcher compiled all the conclusions provided by the experts and, together with the primary expert, generated a combined list. As the objective of this compilation was to ensure that the combined list of conclusions was a comprehensive set, very few conclusions were dropped. These were then circulated back to the five experts for their evaluation together with the reasons provided by all five experts for the conclusions. The experts' evaluations of these were generally obtained by the researcher over the phone. Considering that 1) the compilation process used to generate the combined list was "inclusive", 2) no ranking of the conclusions was attempted, 3) the reasons for each conclusion included in the list were provided, and 4) the names of the experts from whom each conclusion originated were not revealed, experts seldom disagreed with the contents of the combined list and consensus was generally easily reached. However, they commonly provided suggestions as to how the conclusions could be further streamlined or structured. Based on these, the researcher generated a final set of conclusions which were then incorporated into the design of the feedback screens.
This process was repeated for all seven of the analysis modules of FINALYZER-XS and took approximately four months to complete. The fact that it was done one module at a time and that experts received the combined list of conclusions for their review helped to ensure that the motivation of the experts remained high. As each expert knew that the other team members were also performing the same task, they were always keen to know about the conclusions that were generated by the others.

On the completion of knowledge acquisition for all the screens (see Appendix 6), a prototype version of FINALYZER-XS was implemented. Only Experts 2 and 5 participated actively in the refinement of this prototype. Some of the design issues that had to be resolved at this stage included the number of conclusions to be included on each feedback screen, the layout of each type of screen, and the specific wording used for the conclusions. For example, a conscious effort was made to present an objective and "balanced" view in the feedback screens by incorporating both positive and negative conclusions relating to the Canacom case. Even in the design of the Overall Summary Screen (see Figure 4.1 and Appendix 6) an equal number of positive and negative evaluations were included. Silver [1991, p. 162] calls this the concept of informative decisional guidance, i.e., the system provides the pertinent information that enlightens the decision-maker's judgment, without suggesting what judgment to make or how to act.

Finally, two professors in accounting, three doctoral students in finance and accounting, and two junior financial analysts who worked with the primary expert were recruited to help in the refinement process. Their use of the prototype helped to establish its face validity, especially in relation to the quality of its "expertise".
These pilot users were also questioned after they had used the system to determine if they were able to detect the absence of a truly functional KBS underlying the FINALYZER-XS interface. None of them was able to detect it. The concern relating to the level of expertise displayed by the system was a critical factor and much attention was paid to it during the entire knowledge acquisition process. It was assumed that the knowledge acquisition would not be complete until the system could convince its users of its expertise, especially in relation to the range of inputs it used and the conclusions it produced. An assessment of the expertise of the final version of the experimental system was also solicited as a manipulation check in a post-study questionnaire that was used in the main study. This took the form of the three items presented in Table 4.1. Seven-point Likert scales were used for measurement and eighty subjects rated the items as follows:

Item 1: Mean = 1.93 and Standard Deviation = 1.23
Item 2: Mean = 2.24 and Standard Deviation = 1.26
Item 3: Mean = 2.52 and Standard Deviation = 1.69

These results suggest that the system had a high level of content validity, i.e., users were convinced of the expertise displayed by the system. It also implies that knowledge acquisition was successful in capturing the expertise required to solve the Canacom case. The next section discusses the development and validation of the explanations provided by FINALYZER-XS. It represented the most difficult aspect of the development of the experimental system.

1. I am impressed by the level of expertise displayed by FINALYZER-XS.
   Strongly Agree: 1 - 2 - 3 - 4 - 5 - 6 - 7 : Strongly Disagree

2. The quality of financial analysis performed by FINALYZER-XS can be rated as being equivalent to that of the best human experts in the field.
   Strongly Agree: 1 - 2 - 3 - 4 - 5 - 6 - 7 : Strongly Disagree

3.
Many of the conclusions and recommendations generated by FINALYZER-XS could only have been produced by the most experienced financial analysts in industry.
   Strongly Agree: 1 - 2 - 3 - 4 - 5 - 6 - 7 : Strongly Disagree

Table 4.1: Items Used to Assess the Expertise of the Experimental System

4.2.2.4 Development of the Explanations Provided by FINALYZER-XS

Of the five types of explanations that were discussed earlier in Chapters 2 and 3, only three were directly implemented in this research. In line with the hypotheses presented in Chapter 3, the WHAT and WHAT-IF explanations were omitted. The same five experts who were involved in the development of the experimental system assisted in the development of the explanations. Similarly, derivation of the explanations also involved using a variation of the Delphi method [Helmer and Rescher, 1959] to obtain the consensus of the five experts for each explanation. However, the overall procedure followed was different.

Initially, the verbal protocols, provided earlier in the development of the experimental system by Experts 2 and 5, were used to obtain material for the explanations. This was done by combining retrospective protocol analysis [Ericsson and Simon, 1984, p. 149] with the teachback interview [Johnson and Johnson, 1987] method of knowledge acquisition. Experts 2 and 5 were given their respective protocols in transcribed form and asked to step through them while thinking aloud about the reasons as to how and why they used each input or reached each conclusion. The researcher used directed probes and the ethnographic query, which is also known as the paraphrasing method [Meyer, Mniszewski, and Peaslee, 1989], in an effort to focus the verbalizations on specific explanations. As well, these retrospective verbalizations were taped by the researcher and subsequently analyzed.
This analysis formed the basis for the next meetings, where the researcher presented, or "taught back", to the experts his understanding of their reasons as to why and how they used each input or reached each conclusion. During these presentations, the experts often corrected the researcher whenever they disagreed with his assessment of their reasons. This two-step procedure, while sharpening the researcher's understanding of the task expertise, yielded rich material from which the researcher developed an initial set of explanations both for the inputs included on the feedforward screens and for the conclusions presented on the feedback screens. Considering that the current knowledge acquisition literature provides no guidelines as to how expertise for KBS explanations should be elicited, this combination of retrospective protocol analysis and the teachback interview method was found to be very useful. Other material from secondary sources, such as various textbooks on financial analysis and discussions with the other three domain experts, also helped the researcher to formalize the initial set of explanations.

Next, the Why, How, and Strategic explanations for both the feedforward and feedback screens of liquidity analysis (see Appendix 6) were given to all five domain experts for review, improvements, and comments. The experts evaluated these independently and provided written responses as to the changes they would like made to the explanations. Whenever the researcher was unclear about the reasons for any particular change suggested by any of the experts, these reasons were obtained over the phone from the appropriate expert. Based on all these suggestions, the explanations were improved and consolidated into a new set by the researcher. These were then circulated back to the five experts together with information as to the changes made and the reasons for each major change. The experts then provided further suggestions which formed the basis for the next iteration.
Between three and six iterations were necessary to obtain consensus, yielding a complete set of explanations for liquidity analysis. The whole process was then repeated for each of the six other types of financial analysis that comprised the experimental system. While most of the changes suggested by the experts related to the contents of the explanations, others related to presentation format and grammatical considerations. One major change that was suggested and incorporated related to the presentation of the Strategic explanation. The experts felt that the Strategic explanations would be more useful if presented in graphical rather than text format.

The experts participated unevenly in the iterative Delphi procedure and most found it to be a tedious exercise. Two of the experts opted out after the second iteration for the last three of the analyses as they felt that the explanations at that stage were satisfactory. However, the process was continued with the remaining experts as it was critical to the study that the explanations reflected a high level of expertise. As with the development of the experimental system, Expert 2 played a major role in the refinement of the explanations. In an effort to keep the experts motivated, two changes were made to the procedure. First, the explanations were incorporated into the experimental system after the second iteration and the implemented prototype was used in subsequent iterations. Second, experts were allowed to "sign off" on any particular explanation which they felt was in an acceptable form. These changes were very successful in increasing the motivation of the experts. The explanations were implemented in the experimental system by the researcher using the Notepad and Paintbrush accessories of Windows 3.1. This facilitated the improvement of the explanations at the end of each iteration.
Approximately three and a half months were required to complete both the development and implementation of all 234 of the explanations.

The Notepad and Paintbrush explanation files were linked to the appropriate feedforward and feedback screens of the experimental system by incorporating buttons for accessing these explanations at the bottom left corner of these screens. As can be seen on the screens displayed in Appendix 6, there were separate buttons for each of the three kinds of explanations and the explanations were not provided automatically to the users. Rather, users had to use the mouse input device to explicitly click on the button corresponding to the type of explanation they wished to consider. This corresponds to the passive accessibility format that was discussed in Chapter 2. Each explanation was displayed in its own distinct window when requested and only one explanation could be viewed at any one time. The three explanation buttons were presented side-by-side in the same order on both the feedforward and feedback screens: the Why button on the left, the How button in the middle, and the Strategic button on the right. While counter-balancing of this presentation order was considered for the purposes of controlling for ordering effects, it was rejected as it was felt that it would distract or confuse users as they interacted with the system to complete the experimental task. As well, it would have adversely affected the realism of the experimental system.

4.3 Validation of the Explanations Developed

In an effort to assess the face validity of the explanations that were incorporated into the prototype, two accounting professors and three doctoral students used the system together with the explanations. The informal assessments that they provided were positive. The explanations were then validated in a more formal pilot test using a small sample of individuals familiar with financial analysis.
This was necessary as the explanations developed were the main manipulation used in the research.

The first objective of the pilot study was to ensure that all the explanations developed were adequate in terms of three attributes that were identified as being possible confounding variables on the relationship between the provision of explanations and their use by users: 1) readability, 2) understandability, and 3) definitional accuracy. Readability refers to the correctness of the grammatical structure and vocabulary used to provide an explanation [Kintsch and Vipond, 1979]. It can reasonably be assumed that a low level of explanation readability will directly influence user perceptions of explanation usefulness. The understandability of an explanation refers to the semantics or meanings that users assign to an explanation in relating it to the real world. Users will not, therefore, select explanations that are not understandable to them. Definitional accuracy refers to how faithfully an explanation represents an operationalization of the definition of its class. It is related to what users expect to receive when they select a particular explanation for viewing. For example, a user who selects the How explanation for viewing has a certain expectation as to what he or she will receive. If the explanation provided does not accurately match this expectation it will not be considered useful by the user. Such expectations for an explanation are a direct function of the definition for the class of explanations to which the particular explanation belongs. In addition to ensuring that the explanations developed had adequate levels of readability, understandability, and definitional accuracy, the second objective of the pilot test was to ensure that there was a reasonable degree of equivalence both: 1) between feedforward and feedback explanations, and 2) between the Why, How, and Strategic explanations.
The details and results of this pilot test are discussed next.

Five graduating students, who were in the final weeks of the Bachelor of Commerce program at the University of Alberta, were recruited. It was required that they be majors in either accounting or finance and be familiar with financial statement analysis. They were told that they would be paid $10 an hour for their participation. The first five who volunteered were accepted and asked to fill out a background information questionnaire and a confidentiality form that were designed for the main study (see Appendix 1). Four were accounting majors who were going on to study for their Chartered Accountancy examinations, while the fifth was a finance major joining an insurance firm as an analyst.

The participants were brought together in a seminar room and formally lectured on the nature of expert systems and their interfaces, including the explanation facility. They were encouraged to ask as many questions as they wished and to learn as much as possible. Next, they were given a demonstration of the CREDIT-ADVISOR expert system simulation (see Appendix 5) that was developed to be a tutorial in the main study. They were also given an opportunity to obtain "hands-on" familiarity with the system to understand how the interface and explanations functioned. The Canacom case was also provided to them and they were asked to familiarize themselves with its details. This session lasted about 40 minutes. They were then led to another seminar room and seated around a VGA computer display screen that was similar to the one that was used in the main study. The seating layout ensured that one person's notes could not be easily viewed by another person. They were then given two information sheets that defined each of the three types of explanations in relation to both the feedforward and feedback screens of the system. These are presented in Appendix 9.
They were instructed to read them carefully and to understand the definition of each of these explanations. All questions were answered and any clarifications sought were provided until each participant felt that they understood all the definitions. They were instructed to place the definition sheets on the desks in front of them and were allowed to refer to the definitions as often as they wished. Next, they were given multiple copies of a rating sheet (also provided in Appendix 9) and told that they had to fill out one rating sheet for each of the explanations that was to be displayed to them on the computer screen. They then read this rating sheet and it was ascertained that they clearly understood all the items that were included in it.

The rating sheet comprised six items that utilized seven-point, Likert-type scales (see Appendix 9). Items 3 and 4 elicited ratings of the "readability" of the explanations. These measurements of the readability of the explanations were collected for two reasons. First, it was necessary to ensure that all the explanations had an acceptable level of readability. Second, it was critical that the various types of explanations be equivalent in terms of readability, i.e., feedforward explanations were as readable as feedback explanations, and the Why explanation was as readable as both the How and Strategic explanations. This calibration was critical to guard against the possible confounding effects of differing levels of readability among the explanations [Ye, 1990]. A high level of readability of an explanation does not, however, guarantee that it will be understandable to users. Therefore, items 1 and 6 were developed to elicit ratings of the "understandability" of the explanations. Considering that the literature was scarce in relation to instruments for the measurement of this understandability construct, an empirical approach that relied largely on the literature on text comprehension was used to develop these items.
Items 2 and 5 related to how well the explanations as displayed met the definition of their category as per the definition sheets placed before the participants. Calibration of the content of the explanation types, such as measurements of informational and computational equivalence [Simon, 1978], was not necessary as each type of explanation was by definition distinct from the others. While some overlap was to be expected in the contents of the three types of explanations, they certainly could not be viewed as being complete substitutes for one another.

Each of the 159 feedforward explanations and 75 feedback explanations was displayed on the computer screen one at a time in a randomized order. Feedforward explanations were displayed together with the input item (ratio or comparison procedure) that they explained, while feedback explanations were displayed together with the conclusion that they explained. Each participant was required to complete one copy of the rating sheet for each explanation displayed. Each explanation was displayed for as long as it took all five of the participants to complete their respective rating sheets. The category of explanations (Why, How, and Strategic) to which an explanation belonged was not displayed.

For each explanation that they read, participants initially had to specify on the rating sheet the category of explanations to which that explanation belonged. They had to specify both: 1) whether it was a feedforward or feedback explanation, and 2) whether it was a Why, How, or Strategic explanation. They were then immediately told the correct category to which the explanation belonged and the correctness of each participant's response was ascertained. This had the advantage of allowing the researcher to immediately elicit from a participant the underlying reasons as to why an explanation had been incorrectly classified. However, all explanations were correctly classified by all of the participants.
This suggested that the participants could clearly distinguish between the three types of explanations (Why, How, and Strategic) and also between feedforward and feedback explanations. The explanations developed were therefore consistent with the definitions provided for them.

Next, participants rated the particular explanation being displayed before them on the six items on the rating sheet. Finally, they were asked to write at the bottom of their rating sheets their comments and suggestions for improving the particular explanation, including aspects of its contents, readability, and presentation on the screen. The rating sheets were then collected from the participants before the next explanation was displayed. Considering the tediousness of the task (234 evaluations per person), two separate sessions were required and scheduled on separate days to minimize fatigue. While all five participants were present at the first session, the second session was conducted in two sections due to participant availability constraints. Three of the participants took part in one section and two in the other section. In the latter, one participant was lost due to attrition and replaced with another from the same graduating class with a similar finance major. In total, each participant took approximately seven hours and fifteen minutes to complete the exercise over two days. This included breaks of five minutes each that were given every hour, or at the participants' request, to reduce fatigue and boredom. At the end, subjects were debriefed, reminded about their consent to confidentiality, thanked, and paid for their participation.

Table 4.2: Summary Statistics for Explanation Attributes

             Understandability     Readability         Definitional Accuracy
             Item 1    Item 6      Item 3    Item 4    Item 2    Item 5
Mean          6.17      6.04        6.22      6.49      6.18      6.19
Std. Dev.     0.85      0.94        0.93      0.85      0.85      0.87

The 1,170 rating sheets that were collected (234 explanations evaluated by 5 participants) were then analyzed with SYSTAT (1990).
Table 4.2 presents the means and standard deviations for each of the six items on the rating sheets. Note that a mean value of 7.0 for each item would signify that the explanations were either completely understandable, readable, or accurate in relation to their respective definitions. The fact that all the mean values were greater than 6.0 suggests that the explanations were easy to read, easy to understand, and represented good operationalizations of their definitions. These mean values for the three attributes of the explanations represent the minimal levels that can be ascribed to the final set of explanations that were incorporated into the experimental system. This is because every explanation that yielded a rating of 5.0 or lower on any of the six items was re-evaluated subsequent to the pilot test and improved further based on the subjective comments provided by subjects at the bottom of each rating sheet. In relation to the understandability attribute, considering that students were used as subjects to provide these ratings, it can reasonably be argued that experts would find the explanations even easier to understand as they would have better understanding and knowledge of the financial analysis domain.

Further analysis of the explanations focused on each of the attributes independently.
Tables 4.3, 4.4, and 4.5 provide the detailed statistics for each of these attributes.

Table 4.3: Statistics for the Understandability of Explanations

Explanation Type    Feedforward     Feedback        Total
Why                 12.25 (1.42)    11.99 (1.74)    12.12 (1.54)
How                 12.70 (1.40)    12.24 (1.76)    12.47 (1.54)
Strategic           11.92 (1.83)    11.90 (2.08)    11.91 (1.91)
Total               12.29 (1.59)    12.05 (1.87)    12.21 (1.69)

Table 4.4: Statistics for the Readability of Explanations

Explanation Type    Feedforward     Feedback        Total
Why                 12.75 (1.34)    12.28 (1.76)    12.51 (1.82)
How                 13.21 (1.07)    12.59 (1.64)    12.90 (1.31)
Strategic           12.56 (1.68)    12.46 (2.10)    12.51 (1.82)
Total               12.84 (1.41)    12.44 (1.85)    12.71 (1.57)

Table 4.5: Statistics for How Accurately the Explanations Reflect Their Definitions

Explanation Type    Feedforward     Feedback        Total
Why                 12.37 (1.27)    12.17 (1.63)    12.27 (1.40)
How                 12.91 (1.38)    12.40 (1.49)    12.65 (1.43)
Strategic           12.04 (1.89)    12.06 (1.97)    12.05 (1.91)
Total               12.44 (1.57)    12.21 (1.71)    12.37 (1.62)

Legend: The data is in the form Mean (Standard Deviation)

Note that the six items were combined to yield single measurements for each of the three attributes of the explanations. For example, items 3 and 4 were summed to yield a common measure for readability and items 1 and 6 were summed for the measure of explanation understandability. Combining the items in this manner was possible because they utilized a similar scale for data collection. While only the means and standard deviations for the combined variables are presented here in Tables 4.3, 4.4, and 4.5, the detailed statistics for each of the six items are presented as part of Appendix 9. A cursory evaluation of the statistics suggests that the feedforward explanations were generally more readable, easier to understand, and more accurate operationalizations of their definitions than feedback explanations.
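The item-combining step described above is straightforward to sketch. The following Python fragment is purely illustrative (the item-to-attribute pairing follows the text; the function name and the sample ratings are invented, not the study's data):

```python
# Sketch of combining the six rating-sheet items into the three
# attribute scores described above: items 1 and 6 -> understandability,
# items 3 and 4 -> readability, items 2 and 5 -> definitional accuracy.
ITEM_PAIRS = {
    "understandability": (1, 6),
    "readability": (3, 4),
    "definitional_accuracy": (2, 5),
}

def combine_items(ratings):
    """ratings maps an item number (1-6) to its 1-7 Likert score."""
    return {
        attribute: ratings[i] + ratings[j]
        for attribute, (i, j) in ITEM_PAIRS.items()
    }

# One hypothetical rating sheet; each combined score ranges from 2 to 14.
sheet = {1: 6, 2: 7, 3: 6, 4: 7, 5: 6, 6: 5}
print(combine_items(sheet))
```

Summing rather than averaging is possible here because both items in each pair use the same seven-point scale, which is exactly the condition the text notes.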
Additionally, the How explanation received the highest ratings for readability, understandability, and definitional accuracy, while the Strategic explanation rated the lowest on all three of the attributes.

The statistical significance of these differences was then tested by using the data to run three separate ANOVA models. These are presented in Figure 4.2. For each of the three attributes of the explanations, a two-factor ANOVA analysis was performed. The first factor was the type of explanation and it had three discrete levels conforming to the Why, How, and Strategic explanations. The second factor had two levels: the feedforward and feedback provision of explanations.

Model 1: READABILITY = B0 + B1(WHS) + B2(FFFB) + B3(WHS*FFFB) + e
Model 2: UNDERSTANDABILITY = B0 + B1(WHS) + B2(FFFB) + B3(WHS*FFFB) + e
Model 3: DEFINITIONAL ACCURACY = B0 + B1(WHS) + B2(FFFB) + B3(WHS*FFFB) + e

Legend: B0 to B3 = Beta Weights
        e = Error Term
        WHS = Why, How, and Strategic Explanation Types
        FFFB = Feedforward or Feedback Explanations

Figure 4.2: ANOVA Models Used to Validate the Explanations

Prior to the analysis being conducted, the three dependent variables were evaluated in relation to the assumptions of multivariate normality. The specific tests and procedures used to perform this evaluation of the assumptions underlying the ANOVA model were similar to those used in the main study and are detailed in Section 7.1 of Chapter 7. The data satisfied all the assumptions with one exception: the assumption of distributional normality [Neter, Wasserman and Kutner, 1985]. As a result, all three of the variables had to be transformed using the natural logarithm to ensure that they conformed to this critical assumption [Lillefors, 1967; Systat, 1990].

The results of the three ANOVA models are presented in Table 4.6.
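The two-factor decomposition underlying the models in Figure 4.2 can be sketched as follows. This is a generic illustration of a balanced two-way ANOVA on synthetic data, labelled with the factor names from Figure 4.2; it is not a re-analysis of the study's data, and the function name is invented:

```python
from itertools import product

# Balanced two-factor ANOVA sums-of-squares decomposition, mirroring
# the structure of Figure 4.2: factor WHS (Why/How/Strategic) crossed
# with factor FFFB (feedforward/feedback), n replicates per cell.
def two_way_anova(cells):
    """cells maps (whs_level, fffb_level) to a list of n observations."""
    a_levels = sorted({k[0] for k in cells})
    b_levels = sorted({k[1] for k in cells})
    n = len(next(iter(cells.values())))
    all_obs = [y for ys in cells.values() for y in ys]
    grand = sum(all_obs) / len(all_obs)

    def mean(xs):
        return sum(xs) / len(xs)

    ss_total = sum((y - grand) ** 2 for y in all_obs)
    # Main effects: deviations of factor-level means from the grand mean.
    ss_a = len(b_levels) * n * sum(
        (mean([y for b in b_levels for y in cells[(a, b)]]) - grand) ** 2
        for a in a_levels)
    ss_b = len(a_levels) * n * sum(
        (mean([y for a in a_levels for y in cells[(a, b)]]) - grand) ** 2
        for b in b_levels)
    # Interaction: cell-mean variation not explained by the main effects.
    ss_cells = n * sum((mean(cells[(a, b)]) - grand) ** 2
                       for a, b in product(a_levels, b_levels))
    ss_ab = ss_cells - ss_a - ss_b
    ss_error = ss_total - ss_cells
    ms_error = ss_error / (len(a_levels) * len(b_levels) * (n - 1))
    f_a = (ss_a / (len(a_levels) - 1)) / ms_error
    f_b = (ss_b / (len(b_levels) - 1)) / ms_error
    f_ab = (ss_ab / ((len(a_levels) - 1) * (len(b_levels) - 1))) / ms_error
    return {"F_WHS": f_a, "F_FFFB": f_b, "F_interaction": f_ab}

# Invented ratings for the six cells of the 3x2 design, two per cell.
cells = {("Why", "FF"): [12, 13], ("Why", "FB"): [12, 12],
         ("How", "FF"): [13, 14], ("How", "FB"): [12, 13],
         ("Strategic", "FF"): [12, 12], ("Strategic", "FB"): [11, 12]}
print(two_way_anova(cells))
```

Each F-value would then be compared against the F distribution with the corresponding degrees of freedom to obtain the p-values of the kind reported in Table 4.6; in practice a statistical package such as the SYSTAT used in the study performs both steps.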
They reveal that at the0.05 confidence level there are no statistically significant differences in readability,understandability, and definitional accuracy, both: 1) between the feedforward and feedbackexplanations, and 2) between the Why, How, and Strategic explanations. As well, none of theinteraction effects of all the three models was significant. The p-values for the feedforward andfeedback provision of explanations (main effect) were consistently high across all three ANOVAmodels. This suggests that the null hypotheses, of there being no differences between the two levelsof explanation provision, could not be rejected. Further, it meant that for the purposes of this studyand from a statistical perspective, it was reasonable to make the assumption that the feedforwardand feedback explanations developed were equivalent in terms of readability, understandability, anddefinitional accuracy. The results for the type of explanation (WHS) main effect, while beingsimilar, are weaker, yielding p-values of 0.097, 0.119, and 0.124 respectively for the three models.Based on these results, it was decided that in the final iteration for improving the explanations, moreattention would be paid to improving the readability, understandability and definitional accuracy of116Table 4.6: ANOVA Results: Validation of ExplanationsVariables and Effects DF F-value I^P-valueA. Dependent Variable: Readability of ExplanationsType of Explanation (WHS) 2 1.66 0.097Feedforward/Feedback Explanations (FFFB) 1 0.12 0.903WHS*FFFB Interaction 2 1.53 0.125B. Dependent Variable: Understandability of ExplanationsType of Explanation (WHS) 2 1.56 0.119Feedforward/Feedback Explanations (FFFB) 1 0.02 0.985WHS*FFFB Interaction 2 0.92 0.359C. 
Dependent Variable: Definitional Accuracy of Explanations
   Type of Explanation (WHS)                        2     1.54      0.124
   Feedforward/Feedback Explanations (FFFB)         1     0.09      0.930
   WHS*FFFB Interaction                             2     0.85      0.399

the Strategic and Why explanations, as they had ranked lower than the How explanations. These improvements were made based on the written subjective comments provided by the five subjects at the bottom of the rating sheets used in the pilot study.

4.4 Summary of the Chapter

This chapter described the various considerations involved in the selection of the task domain and the development and validation of the experimental KBS and its explanations. The next chapter describes the research design and procedure utilized to investigate the research hypotheses of Chapter 3 within the context of the task domain that was selected and the KBS that was developed.

CHAPTER 5: RESEARCH DESIGN & EXPERIMENTAL PROCEDURES

5.0 Introduction

This chapter describes the experiment which was conducted to investigate the hypotheses relating to the determinants and the impact of the use of explanations that were provided by a knowledge-based system. The experimental design and procedures are discussed in detail and the approach taken to data analysis is outlined. The next sections discuss how the various independent and dependent variables were operationalized. The final two sections of the chapter will focus on the experimental procedures utilized and the approaches taken to data analysis.

5.1 Independent Variables

As can be seen in the research model presented in Figure 5.1, this study investigated four factors that influence the use of explanations: explanation provision strategy, user expertise, types of explanations, and the level of user agreement with the system. While the first three were the primary independent variables and were directly manipulated in the study, the level of user agreement was a moderating variable.
As well, accuracy of judgmental decision-making and user perceptions of usefulness served as the dependent variables for investigating the impact of the use of explanations. The independent variables were operationalized as follows:

5.1.1 Explanation Provision Strategy

As depicted in Table 5.1, explanation provision strategy was decomposed into four treatment conditions by crossing the two conditions (Yes and No) for each of its two levels: the feedback provision of explanations and the feedforward provision of explanations. These treatment conditions were operationalized by controlling the screens on which buttons for accessing explanations were provided to the subjects by the KBS (for some examples, see the screens presented in Appendix 6).

Table 5.1: Explanation Provision Strategies

                               Feedforward (FF) Explanations
                                    Yes          No
Feedback (FB)        Yes           BOTH        ONLY FB
Explanations         No           ONLY FF       NONE

As part of its interaction with the user, the KBS exchanged information with the user through two kinds of screens: 1) feedforward screens, on which it informed the user about the various inputs that it would use for its analysis and requested the user's permission to proceed with a particular sub-analysis, and 2) feedback screens, on which it presented its conclusions after the completion of the computation and evaluation required for the sub-analysis. As was depicted in Figure 4.1 of Chapter 4, for each type of sub-analysis that the system performed, the feedforward screen was first presented by the system prior to performing the necessary computation and evaluation. On the completion of this computation and evaluation it presented the feedback screen. Appendix 6 includes all the feedforward and feedback screens that were used for the seven sub-analyses that the KBS performed. Subjects in the "feedforward only" group had explanations made available to them on every "input" screen that they viewed.
Subjects in the "feedback only" group could access the explanations made available to them on every "conclusion" screen that they viewed. The "both feedforward and feedback" group had explanations provided on both the input and conclusion screens that they viewed. Finally, the "neither feedback nor feedforward" condition was operationalized by excluding access to explanations from all of the screens of the system.

It was important to ensure that each explanation presented in feedforward form was consistent, in terms of both the effort required to access it and the format in which it was presented on the screen, with its counterpart presented in feedback form. For this reason, all the explanations provided using either the feedforward or the feedback provision strategy were designed to be consistent in both respects. As was discussed in Chapter 4, the differences between them were limited to the differences between the feedforward and feedback cognitive learning constructs discussed earlier: 1) feedforward explanations were not case specific while feedback explanations were focused on case-specific outcomes, 2) feedforward explanations were made available prior to the system performing the relevant financial analysis computation/evaluation while the feedback explanations were presented subsequent to it, and 3) feedforward explanations related to the inputs used in the particular financial analysis computation/evaluation that was to be performed while feedback explanations related to the outcomes of that computation/evaluation.

5.1.2 User Expertise

The novice, apprentice, and expert levels were identified earlier in Chapter 2 as the three relevant theoretical dimensions of the user expertise construct. However, only two of these levels were measured in this study: the two end points of the user expertise continuum, the novice and expert levels.
The reason for this is that there are significant measurement difficulties inherent in pinpointing the exact cut-offs for the middle "apprentice" level. By focusing on the end levels, a more valid operationalization would result. It should be noted, however, that the disadvantage of this is that the results of the study would not be generalizable to the whole user expertise construct. Accordingly, two distinct groups of subjects were carefully selected to maximize the differences between them on the measurement scales used to operationalize user expertise, as discussed in the next paragraph. This was facilitated by the fact that the procedures required to recruit expert subjects differed significantly from those needed for novice subjects.

Bedard [1989, p. 114] asserts that as the construct of expertise is impossible to observe, it should be operationalized using multiple measures representing observable criteria such as years of experience, educational and professional qualifications, evaluation reviews, and the results of peer reviews. Of these, and because participation in the study was voluntary, only the first two were used. Novice subjects were recruited from amongst accounting and finance students who were familiar with the declarative knowledge and basic procedures of financial analysis. Subjects had to have taken the equivalent of at least one introductory-level course directly related to financial analysis, and to have little or no working experience in performing it routinely. Subjects recruited included undergraduate and graduate students at the University of British Columbia and the University of Alberta, and those studying to qualify as chartered accountants (CA) and certified general accountants (CGA) in Canada. Expert subjects were recruited from amongst experienced financial executives and loan analysts from various financial institutions in Vancouver and Edmonton.
As participation was voluntary, a conscious effort was made to select those who possessed related professional qualifications beyond the undergraduate degree, such as the CA and certified financial analyst (CFA) designations. As well, they had to possess at least three years of post-qualifying working experience that directly related to financial analysis.

5.1.3 Types of Explanations

All three of the Why, How and Strategic explanations were used in the study. Chapter 4 described the manner in which they were developed and operationalized in relation to the definitions developed for them in Chapter 3. It also described the pilot test that was used to validate them in terms of readability, understandability and definitional accuracy. All three of the explanations were always made available simultaneously, and users were free to use as many of these explanations as they wished, in any order that they chose.

5.1.4 Level of User Agreement

No attempt was made to manipulate or control this moderating variable. Rather, users' agreement with the conclusions presented by the KBS was captured for evaluating the hypothesized: 1) inverse relationship between the level of user agreement and the use of explanations, and 2) moderating influence of user agreement on the relationship between user expertise and the use of explanations. Users were asked to specify their level of agreement with each conclusion that the system presented to them on a single-item, seven-point Likert-type scale. This measurement was taken after the user had evaluated a particular system conclusion, but prior to his or her viewing any explanations related to that conclusion. There were 25 such measurements taken from each subject, one for each conclusion that was presented. In the interest of maintaining consistency of both the system interface and of user effort in using the system, subjects in all the treatment groups provided this data, including those who had no explanations made available to them.
While more detailed multi-item scales for measuring the level of agreement with a system conclusion could have provided a more reliable measure, the development and use of such an instrument was not considered feasible. The time required for a subject to complete such an instrument for each of the 25 conclusions would have been prohibitive and would have reduced user perceptions of the realism of the experimental system and the problem solving context used in the study.

5.2 Dependent Variables

There were three primary dependent variables: the use of explanations provided by the KBS, the accuracy of judgmental decision making, and user perceptions of usefulness. All three of these are multi-dimensional constructs and were operationalized using multiple measures.

Total Explanations Used
    Feedforward Explanations Used          Feedback Explanations Used
        Why / How / Strategic                  Why / How / Strategic
        Explanations Used                      Explanations Used

Figure 5.2: Hierarchy of the Measures of Explanation Usage

5.2.1 Use of the Explanations

The explanation selection behavior of users was used as a surrogate for measuring the use of explanations. For example, if a user selected a particular explanation for viewing, it was taken to mean that the particular explanation had been cognitively used by the user. This distinction between the cognitive use of explanatory information and the overt behavior of selecting it for viewing, as well as the advantages and disadvantages of using the latter as a surrogate for measuring the former, was discussed in Chapter 3.

The use of explanations measure was derived from the computer logs of the users' interaction with the KBS. The logs provided counts for feedforward explanations used and feedback explanations used, as well as separate sub-counts for each of the Why, How, and Strategic explanations used.
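The derivation of these counts from the interaction logs can be sketched as follows. The log record format shown is a hypothetical illustration, since the actual log layout is not described here.

```python
from collections import Counter

# Hypothetical log records: one (screen_kind, explanation_type) pair per
# explanation viewed.  The real system's log format is an assumption here.
log = [
    ("feedforward", "Why"), ("feedforward", "How"),
    ("feedback", "Strategic"), ("feedforward", "Why"),
]

total_used = len(log)                          # total explanations used
by_screen = Counter(kind for kind, _ in log)   # feedforward vs feedback counts
by_type = Counter(log)                         # sub-counts per (screen, type)

print(total_used, by_screen["feedforward"], by_type[("feedforward", "Why")])
# prints: 4 3 2
```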
Figure 5.2 summarizes the breakdown of these counts of explanation usage. Note that the count for total explanations used comprises the counts for the feedforward and/or feedback explanations used. As discussed earlier in Section 5.1.1, these were presented on separate screens of the system, and some treatment groups did not receive either or both of the feedforward and feedback explanations. These groups therefore had no scores for the use of either or both of the feedforward and feedback explanations. Whenever explanations were provided on a screen, irrespective of whether it was a feedforward or feedback screen, three types of explanations were provided together (see Appendix 7): the Why, How and Strategic explanations. Chapter 4 discussed how these explanations were designed as part of the overall development of the experimental system and provided the results of a pilot test undertaken to assess their comparative readability, understandability, and definitional accuracy.

The three levels of counts of explanation usage had to be further transformed before they could be used in the data analysis. This was necessitated by a difference between the total number of instances of feedforward explanations and the total number of instances of feedback explanations that were made available. Feedforward explanations were made available in 53 instances, feedback explanations in 25. Considering that in financial analysis a large number of inputs are combined to yield a smaller set of conclusions, it was inevitable that there were a greater number of feedforward explanations as compared to feedback explanations. Their exact numbers were determined by the requirement that the explanation facility of the experimental system had to adequately cover all the various aspects of the financial analysis task.
This meant that comparing the absolute counts of explanations used would be biased by the difference in the total number of explanations provided for the two categories of explanations. As a result, the explanation usage counts were converted into proportions of the total number of explanations that were provided for each category. This approach is consistent with other studies that have investigated the more general problem of the use made of information provided by a decision support system [see for example Todd and Benbasat, 1991]. Table 5.2 presents the denominators used in the computation of these proportions of the use of explanations. These numbers represent the total number of explanations that were provided by the KBS for each treatment condition that was discussed in Section 5.1.1.

Table 5.2: Denominators Used for Computing Explanation Usage Proportions

Measures of Explanations Provided      NONE   FF ONLY   FB ONLY   BOTH
Total Explanations Provided              -      159        75      234
Feedforward Explanations Provided        -      159         -      159
   FF-WHY Provided                       -       53         -       53
   FF-HOW Provided                       -       53         -       53
   FF-STRATEGIC Provided                 -       53         -       53
Feedback Explanations Provided           -        -        75       75
   FB-WHY Provided                       -        -        25       25
   FB-HOW Provided                       -        -        25       25
   FB-STRATEGIC Provided                 -        -        25       25

In summary, the three levels of measures for the use of explanations were obtained and analyzed separately. These measures took the form of ratio-type data of the proportions of explanations used.
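The conversion into proportions can be sketched as follows. The denominators come from Table 5.2 (BOTH condition), while the raw usage counts shown are hypothetical.

```python
# Denominators from Table 5.2 for the BOTH treatment condition.
provided = {
    "total": 234, "feedforward": 159, "feedback": 75,
    "ff_why": 53, "ff_how": 53, "ff_strategic": 53,
    "fb_why": 25, "fb_how": 25, "fb_strategic": 25,
}

def usage_proportion(measure, raw_count):
    """Convert a raw usage count into a proportion of the explanations provided."""
    return raw_count / provided[measure]

# Hypothetical raw counts for one subject in the BOTH condition:
ff_prop = usage_proportion("feedforward", 40)   # 40 of the 159 provided viewed
fb_prop = usage_proportion("feedback", 30)      # 30 of the 75 provided viewed -> 0.4
print(round(ff_prop, 3), fb_prop)
```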
It should also be noted that while these measurements of the use of explanations represented dependent variables in the study of the factors that influenced the use of explanations (see the research model of Figure 5.1 at the start of this chapter), they also served as the independent variables in the study of the effects of explanation use on the two dependent variables that are discussed next: the accuracy of judgmental decision making and user perceptions of usefulness.

5.2.2 The Accuracy of Judgmental Decision Making

The task used in the study involved making judgments under conditions of uncertainty [Kahneman, Slovic and Tversky, 1982], aided by the use of a diagnostic KBS for evaluating the financial state of a company. Users of financial statements routinely make a variety of such judgments in financial decision situations. These judgments can generally be classified into the evaluative and predictive categories proposed by Hogarth [1987]. Evaluative judgments are seen as reflecting individual preferences, while predictive judgments represent the combination of a set of evaluative judgments into predictions under conditions of uncertainty. Subjects were asked to make six evaluative judgments and two predictive judgments. The objective was to obtain multiple measures for the two judgment categories, as well as to ensure that all the subjects consistently considered all aspects of the problem solving case used in the experiment.

The six evaluative judgments related to the: 1) liquidity, 2) capital structure, 3) asset utilization and profitability, 4) market valuation, 5) financial management, and 6) operating management aspects of the experimental case. Each of these measures was collected using single-item ten-point Likert-type scales yielding ordinal data. The two predictive judgments were predictions of the net income in the coming year and the optimal loan size that would be appropriate for the company in the experimental case.
These measures were collected as ratio-type data in the form of amounts in millions of dollars. The Judgment Recording Sheets included in the experimental task material presented in Appendix 2 were used to collect all these judgments.

The accuracy of all eight judgments was determined in relation to a set of "correct" consensus estimates agreed upon by a panel of five expert judges. A variation of the Delphi method [Helmer and Rescher, 1959] was used to obtain these estimates. This involved circulating back to each of the judges a combined list of all the judgments initially made individually by each judge. Each judge could then change his or her initial estimates, and these formed the basis for the next round of evaluation. As there was minimal variation in the judgments, especially for the six evaluative judgments, only two rounds of evaluation were needed to reach the consensus set of estimates. As these were the same judges who participated in the development of the experimental system (as discussed in Chapter 4), the final estimates were agreed upon at a lunch meeting held at the end of the development of the system. Thus, there was an extensive exchange of ideas, opinions, and justifications between the judges prior to their agreement on the final set. A description of the qualifications and competence of this panel of judges, and a more detailed description of the procedure used, was given in Chapter 4.

Subjects' estimates of each judgment were compared to the correct consensus score, and the absolute deviation was obtained as the measure of accuracy. This is similar to the accuracy and bias measures, such as the mean absolute error, that have been suggested for such financial analysis judgments [Foster, 1986]. The absolute deviations for the six evaluative judgments were summed for each subject to yield an overall accuracy measure for the evaluative judgment category. The same was done for the two predictive judgments, with one difference.
The absolute deviation for each judgment was first converted to a proportion of the value of the correct consensus score. Only then were the deviations of the two judgments summed to yield one accuracy measure for predictive judgments. This was necessary to eliminate differences in the size of the deviations caused by differences in the size of the correct scores [Foster, 1986]. This procedure amounts to ensuring that there was equivalence in the scales being combined.

In summary, deviation scores were used as measurements for accuracy. As well, there were two composite measures, one for evaluative judgments and the other for predictive judgments. Given the considerable differences between the nature of these two categories of judgments, and the fact that one was measured using ordinal data and the other using ratio data, they were assessed separately.

5.2.3 Perceptions of Usefulness

Two perceptions of usefulness were measured. One focused on the usefulness of the overall knowledge-based system that the subjects used in the study. The other focused on the usefulness of the explanations that were provided by the system. It was necessary to separate the measurement of the two as they can be viewed as being at two separate levels of analysis, and little is currently known about how the perceived usefulness of one aspect of a system's interface impacts its overall perceived usefulness.

5.2.3.1 Perceived Usefulness of the System

This dependent variable was included to test the hypothesized effect of the use of explanations on the degree to which users perceived that using the system providing the explanations enhanced their task performance.
Capitalizing on the cumulative tradition [Keen, 1980] of recent information systems studies of system usefulness, and the fact that measurement instruments with high degrees of validity and reliability have been developed [Moore & Benbasat, 1991; Davis, 1986], a ten-item instrument was designed to measure this construct based on these prior studies. Considering that the definition of usefulness in this research was similar to that used by Davis [1986] and to what Moore and Benbasat [1991] term "relative advantage", a combination of the items on these two instruments was derived and adapted to the context of this study. Table 5.3 presents the items that comprised the instrument.

Table 5.3: Items Used for Measuring Usefulness of the System

1. Using FINALYZER-XS enabled me to accomplish the financial analysis task more quickly.
2. Using FINALYZER-XS improved the quality of the analysis I performed.
3. Using FINALYZER-XS made the financial analysis task easier to do.
4. Using FINALYZER-XS enhanced my effectiveness in completing the financial analysis task.
5. Using FINALYZER-XS gave me more control over the financial analysis task.
6. Using FINALYZER-XS increased my productivity.
7. Using FINALYZER-XS allowed me to accomplish more analysis than would otherwise have been possible.
8. The use of FINALYZER-XS greatly enhanced the quality of my judgements.
9. FINALYZER-XS conveniently supported all the various types of analysis required to complete the judgmental decision making tasks.
10. Overall, I found FINALYZER-XS useful in analyzing the financial statements.

Data was collected in the form of seven-point Likert-type scales. Of the ten items comprising the instrument, items 1, 2, 3, 4, 5, 6, and 10 represent variations adapted from items found on both the Moore & Benbasat and the Davis scales. Item 7 was adapted solely from the Davis scale, while items 8 and 9 were added to reflect two critical aspects of the use of knowledge-based systems for making judgments.
Item 8 focuses on the association between the use of a knowledge-based system and the quality of the judgments made using it. This link between the system and the resulting judgments (outputs) is much more direct and evident in this case as compared to the more generalized technological innovations, such as personal work stations, that were the focus of the prior studies. Similarly, item 9 focused on the completeness of the system in supporting all the various types of sub-analysis required to complete the judgmental task involved, an aspect that was not covered in the prior studies considering that they dealt with systems whose applications were not as problem specific as in this study. Additionally, three items on the Moore & Benbasat instrument and two on the Davis instrument were not used as they were deemed not directly relevant to the case of knowledge-based systems.

5.2.3.2 Perceived Usefulness of the Explanations Provided

This construct tested the hypothesized effect of the use of explanations on the degree to which users perceived that using the explanations enhanced their task performance. It was a more focused measure than that of the overall system usefulness discussed in the last section, in the sense that it directed the user to specifically consider the explanations component of the overall KBS. Development of the items and scales for this instrument was similar to that of the last section, and as before incorporated the sub-constructs used in the Moore & Benbasat and Davis instruments. Table 5.4 presents the nine items that were developed. The first seven items and the last item, while

1. Using the explanations provided by FINALYZER-XS enabled me to accomplish the financial analysis task more quickly.
2. Using the explanations provided by FINALYZER-XS improved the quality of the analysis I performed.
3. Using the explanations provided by FINALYZER-XS made the financial analysis task easier to do.
4.
Using the explanations provided by FINALYZER-XS enhanced my effectiveness in completing the financial analysis task.
5. Using the explanations provided by FINALYZER-XS gave me more control over the financial analysis task.
6. Using the explanations provided by FINALYZER-XS increased my productivity.
7. The explanations provided by FINALYZER-XS had a significant impact on my judgements.
8. My understanding of financial analysis has been enhanced by the use of the explanations provided by FINALYZER-XS.
9. Overall, I found the explanations provided by FINALYZER-XS useful in analyzing the financial statements.

Table 5.4: Items Used for Measuring Usefulness of the Explanations Provided

being worded slightly differently, are similar in content to their equivalents presented in Table 5.3 and discussed earlier. Item 8 was added to take into account the fact that improving the understanding of users about the task domain was a fundamental goal of providing explanations and would possibly have an impact on perceptions of explanation usefulness. Additionally, the item relating to the completeness of the system in terms of the various sub-analyses performed was not included.

Prior to the use of the two instruments in the data analysis, various validity and reliability tests were performed. These tests suggested that the instruments had a high degree of validity and reliability. They are reported in detail in Chapter 6 together with the results of the study.

5.3 Research Design

All three of Campbell and Stanley's [1963] true experimental designs were considered for use in conceptualizing the appropriate experimental design. The pretest-posttest control group design would have involved using a repeated measures design, with subjects in all treatment groups initially performing the experimental task in the NONE explanation provision condition, i.e., with no KBS explanations being available to them. This would have yielded a pretest score.
The subjects would then repeat the experimental task, this time with the explanations pertaining to their respective treatment groups being provided to them, yielding a posttest score. While this design would have minimized the number of treatment groups used by eliminating the need for the NONE (no explanations provided) control condition, it had the disadvantage of possibly causing the subjects to react differently to the treatments because the pretest would "prime" their responses during the posttest. To guard against this danger of a reactive pretest and the strong priming and presensitization effects that were to be expected, it was decided that this design option would not be appropriate. For the same reasons, and because the extra gains from using the Solomon four-group design would not have been worth the doubling of effort that would have been required, especially from the practical viewpoint of meeting sample size requirements, this second design option was also not deemed feasible.

Instead, after careful consideration of the research hypotheses of Chapter 3 and the operationalization and measurement of the independent and dependent variables discussed in Sections 5.1 and 5.2, a posttest-only control group 4 x 2 factorial design was selected, as shown in Tables 5.5 and 5.6. The eight treatment cells obtained by crossing the four levels of explanation provision strategy with the two levels of user expertise resulted in a primarily between-subjects design with a separate treatment group for each cell.

Table 5.5: Research Design

            No             Feedforward   Feedback   Both
            Explanations   Only          Only       Explanations
Novices     Group 1        Group 2       Group 3    Group 4
Experts     Group 5        Group 6       Group 7    Group 8

With this design, it was possible to evaluate all the research hypotheses in a between-subjects manner, with the exception of those relating to the types of explanations (Why, How and Strategic) used. These were evaluated within subjects.
It is also important to consider the role of the control groups, i.e., groups 1 and 5, whose subjects were not provided with any explanations at all. These groups served as control groups only for the investigation of the impact of explanation use. For the investigation of the determinants of the use of explanations they were of no use and were excluded. Having no explanations provided to these subjects ensured that their explanation scores were by default zero. While this had an unbalancing influence on the factorial design, it was felt that it was still the best option as compared to the alternatives.

Note in Table 5.6 that subjects were randomly assigned to the various treatment groups. This was performed in a two-step procedure to take into account the operationalization of the user expertise construct as discussed in Section 5.1.2. Subjects were initially deliberately assigned, or blocked, into either an expert or a novice pool depending on their qualifications and experience. Next, subjects in the novice pool were randomly assigned to one of the four novice groups, while those in the expert pool were assigned randomly to one of the four expert groups. Prior to the subjects performing the experimental task using a variation of the experimental system pertaining to their treatment condition (X), all subjects completed two tutorials. This was to ensure that all subjects reached an equivalent stable state of learning about the use of the experimental system and to guard against novelty effects.

Table 5.6: Experimental Format

R (novice)   X group 1: no explanations             O1
R (novice)   X group 2: feedforward explanations    O2
R (novice)   X group 3: feedback explanations       O3
R (novice)   X group 4: both explanations           O4
R (expert)   X group 5: no explanations             O5
R (expert)   X group 6: feedforward explanations    O6
R (expert)   X group 7: feedback explanations       O7
R (expert)   X group 8: both explanations           O8
These are discussed in detail in the next section as part of the experimental procedures.

5.4 Experimental Procedures and Subjects

This section will present details of the subject population and their recruitment, as well as the experimental procedure followed. Discussion of the financial analysis domain, the experimental task used, and the development of the experimental system is not included, as these are described in Chapter 4.

5.4.1 Sample Size and the Recruitment of Subjects

Statistical power analysis [see Chapter 8 of Cohen, 1988], estimates of the expected variance obtained from pilot subjects who completed the experimental task initially, and pragmatic considerations relating to the difficulty of recruiting a large number of experts in financial analysis were taken into consideration in deciding on the sample size that was required. As well, to facilitate statistical analysis by minimizing the complexity of the research design, a decision was made to have balanced group sizes for the eight treatment groups. Balancing these considerations, a target of 80 subjects was set, with ten subjects in each treatment group. Considering the largely between-subjects design, an assessment of this sample size using power analysis [Cohen, 1988, p. 284] yielded the results presented in Table 5.7.

Table 5.7: Power Estimates for a Sample Size of 80

Significance Level:                          0.01    0.05    0.10
Detecting a Medium Main Effect (f=0.25)       11%     29%     42%
Detecting a Large Main Effect (f=0.4)         45%     70%     80%

Based on these, it was felt that a power level of 70% for detecting a large main effect at alpha = 0.05 would be adequate. Additionally, it was also resolved that, in the analysis of the data, alpha would be controlled at both the commonly accepted 0.05 level as well as the less rigorous 0.10 level.

Recruitment of novice subjects was handled independently from the recruitment of expert subjects.
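As an aside, the 70% power figure in Table 5.7 for a large effect (Cohen's f = 0.4) at alpha = 0.05 can be checked informally by Monte Carlo simulation. The sketch below treats the eight cells as a one-way layout with ten subjects per group; the critical value F(7, 72) of about 2.14 at the 0.05 level is taken as given from standard F tables, and the group means are a hypothetical pattern chosen to produce f = 0.4.

```python
import random
import statistics

random.seed(42)

F_CRIT = 2.14             # approximate 0.05 critical value of F(7, 72), from F tables
K, N_PER = 8, 10          # 8 treatment cells, 10 subjects each (N = 80)
# Group means spread so that f = sd(means)/sigma = 0.4 with sigma = 1:
MEANS = [0.4] * 4 + [-0.4] * 4

def one_way_f(groups):
    """F statistic for a balanced one-way ANOVA."""
    n = len(groups[0])
    grand = statistics.mean(x for g in groups for x in g)
    gmeans = [statistics.mean(g) for g in groups]
    ms_between = n * sum((m - grand) ** 2 for m in gmeans) / (len(groups) - 1)
    ms_within = (sum((x - m) ** 2 for g, m in zip(groups, gmeans) for x in g)
                 / (len(groups) * (n - 1)))
    return ms_between / ms_within

reps = 2000
hits = sum(
    one_way_f([[random.gauss(mu, 1.0) for _ in range(N_PER)] for mu in MEANS]) > F_CRIT
    for _ in range(reps)
)
power_est = hits / reps   # expected to land in the vicinity of the 70% in Table 5.7
print(round(power_est, 2))
```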
Initially, two distinct information packages were developed for the two groups of subjects. These are included in Appendix 1, which presents all the material used for recruiting subjects. These packages comprised an information sheet detailing the study and specifying the criteria for participation and the tasks involved, a consent form, and a background information questionnaire to be filled out by prospective subjects. Also included were instructions as to how the consent form and questionnaire could be returned to the investigators. Participation was completely voluntary.

A total of 896 copies of the information package for novices were distributed to prospective subjects. Of these, 340 were distributed to undergraduate students and 46 to graduate students in accounting or finance. Distribution was generally done at the start of class, subsequent to the researcher making a short five-minute presentation on the objectives of the study and the benefits to students of participation. As well, the cooperation of the local chapters of the Canadian Certified General Accountants Association and the Canadian Institute of Chartered Accountants was obtained in distributing the packages to their students. A total of 250 copies of the package were distributed by the former organization and 260 copies by the latter. Out of this total of 896 distributed, 43 positive responses were received and 41 novices participated.

Information packages for expert subjects were distributed in two ways. Members of the Vancouver and Edmonton chapters of both the Society of Financial Analysts and the Financial Executives Institute were sent a copy of the package as part of their respective organizations' regular mail-outs. A total of 320 were mailed out to members of the Society of Financial Analysts and another 145 to members of the Financial Executives Institute.
Additionally, a further 74 were distributed to prospective subjects in financial and lending institutions who were contacted directly by letter or phone. Generally, this involved the researcher making contact with and obtaining the cooperation of one individual in each institution, who would then distribute copies of the package to prospective subjects among his or her peers or subordinates. Of a total of 539 distributed, 46 were returned and 43 experts were eventually scheduled.

5.4.2 Experimental Procedures

Figure 5.3 presents a summary of the experimental procedure followed. Sessions were scheduled by phone and were conducted individually by a laboratory assistant who was present throughout the whole session. Because data was collected at two locations, the two laboratory assistants were furnished with the same set of detailed written instructions for the conduct of each step of the data collection process. A copy of these instructions is attached as Appendix 3. The researcher also trained both assistants in the procedure to be followed and observed their performance at some of the sessions. To avoid mix-ups and the loss of experimental material, the experimental task material was sequentially pre-numbered and different coloured paper was used to signify the various treatment groups.
Copies of these experimental task packages are attached as Appendix 2, at the end of the dissertation.

FIGURE 5.3: Summary of Experimental Procedure

[Flowchart: consent form received; session scheduled by phone; subject arrives and reads the General Information Sheet; subject completes the MOUSE tutorial; subject completes the CREDIT-ADVISOR tutorial; subject analyzes the Canacom case manually and completes the judgments; subject uses the experimental system to analyze the Canacom case (Groups 1 & 5: no explanations provided; Groups 2 & 6: feedforward explanations provided; Groups 3 & 7: feedback explanations provided; Groups 4 & 8: feedforward and feedback explanations provided); subject completes the Judgment Recording Sheets; subject completes the post-study questionnaire; subject debriefed (and novices paid)]

Before the arrival of a participant, the laboratory assistant reviewed the Background Information Questionnaire sent in by each participant and noted information about the participant's familiarity with business computing, use of the mouse input device, and expert systems in general. This helped the assistant anticipate any problems that might occur during the session. On arrival, participants read a one-page General Information Sheet that told them that the objectives of the study were to 1) evaluate the use of a financial analysis expert system to complete a loan analysis case, and 2) evaluate the judgments made with the assistance of such a system. It was not revealed that explanation use was a focus of the research. The laboratory assistant also reviewed the contents of this information sheet with the participants to ensure that they understood it correctly. Particular attention was drawn to the overview of the complete session presented in the description to ensure that subjects knew what to expect at any stage of the session.
It was also emphasized that time was not a factor in the study and that they could take as long as they wished at any stage.

Participants were then seated facing a 33-megahertz DOS-type personal computer with a colour screen. They were given a step-by-step tutorial that familiarised them with the use of the mouse input device; all the screens of this tutorial are shown in Appendix 4. The application chosen for the tutorial provided information relating to the climate, the universities, and the sports teams of Western Canadian cities. It was a simple application providing information that would generally be familiar to the subjects. The objective was to minimize the stress and anxiety of those participants who were unfamiliar with the hardware and software while they learned the use of the mouse device in relation to the push-buttons, radio-buttons, and the multiple-windows concept [Borland International, 1991] that the experimental system used. At the same time, it had to keep the interest of those participants who were already familiar with such interfaces as they went through the motions of completing the tutorial. At the end of the tutorial, all subjects were allowed as much time as they wished to step through the tutorial system again until they felt comfortable and confident in the use of the system.

Next, participants were given another information sheet entitled "A Short Note on Expert Systems" (see Appendix 2). This sheet defined in simple terms what expert systems were, briefly explained how they were developed, and, using the analogy of a human expert in medical diagnosis, illustrated what could be expected from such a system in terms of input information and conclusions. Participants in the treatment groups that were to receive explanations were also told that expert systems could provide explanations, and the three kinds of explanations they were to receive were defined precisely for them.
For participants who were to receive only feedforward explanations, these definitions were couched in input-information terms, while for those who were to receive only feedback explanations the definitions were put in terms of output information. For participants receiving both feedforward and feedback explanations, the definitions were combined to be inclusive using the coordinating conjunction "or". Subjects were told that they could keep these definitions in front of them throughout the time they used the KBS. The laboratory assistants were instructed to ask the participants whether they clearly understood the three types of explanations. They were also told to use the analogy of a naive child asking questions about the world, such as "Why is the sky blue?", "How did the sky become blue?", and "What role does the blue sky play in the greater scheme of things?", to clarify and distinguish between these three explanations. In the event, few participants had trouble understanding them.

At the next step, participants were required to use the CREDIT-ADVISOR tutorial expert system simulation for evaluating consumer credit applications. As before, they were given a step-by-step tutorial to guide them, and all participants were required to complete it. They were also told that the FINALYZER-XS KBS they would be using later had an interface similar to this system. As was the case for the FINALYZER-XS KBS, it was not revealed to them that the CREDIT-ADVISOR system was a simulation. As described in Chapter 4, conscious effort was put into ensuring that this tutorial system had a high level of expertise and appeared as functional as the FINALYZER-XS KBS. This was critical to minimize the novelty effect of being overwhelmed or "seduced" by the quality of the expertise of the KBS at the later stage. Subjects were also told to take as much time as they wished to understand the CREDIT-ADVISOR system by stepping through it again.
This, combined with the fact that no problem-solving case was imposed on the participants at this stage, ensured that subjects inclined to tinker had ample opportunity to satisfy their curiosity before using the experimental system. This reduced another possible novelty effect and ensured that subjects reached a "steady state" prior to the main experimental session. The laboratory assistants also made sure that the participants who were to receive explanations used and familiarized themselves with all three kinds of explanations available to them. Participants were also told to relate the explanation definitions placed in front of them to the examples provided by the CREDIT-ADVISOR system. This was necessary to ensure that they adequately understood what to expect of each type of explanation. Just as with the main experimental system, the explanations that CREDIT-ADVISOR provided depended on the treatment group to which participants belonged. Copies of the CREDIT-ADVISOR screens are presented in Appendix 5.

The participants were then asked to move to another desk that was placed away from the computer. Here, they were given a copy of the Canacom Corporation Loan Analysis Case together with a set of Judgment Recording Sheets. They were told to familiarize themselves with the case and with the judgments that had to be made, and to ask any questions that they wished to have clarified. Although all the ratios and other computations were given to them in the form of tables, they were also provided with a financial calculator and paper to write on. They were then told to analyze the case without the use of the KBS and complete the eight judgments. Initially, it was felt that there was no need to have the participants make these eight judgments at this stage.
However, pilot testing revealed that doing so got them more involved and focused in their analysis, and better familiarized them with the task and the judgments. It also better replicated the real-life situation (external validity) of users employing an expert system for financial analysis: most users of such systems would manually analyze and form some judgments about a company before consulting an expert about it. Thus, all participants were required to complete the judgments manually prior to consulting the FINALYZER-XS knowledge-based system. While these judgments were not used in the main data analysis, they served as a manipulation check on how the KBS affected subjects' judgments. As before, participants were given as much time as they wished to complete this stage.

While participants were manually analyzing the Canacom case, the laboratory assistants were instructed to load onto the computer the relevant version of the experimental system, with the name of the log file specified correctly. On handing in their completed Judgment Recording Sheets, participants were given a new set of these sheets and reseated in front of the computer screen. They were then told to use the experimental system to help them re-analyze the case and make the eight judgments again. They were allowed as much time as they wished, and the computer recorded every keystroke they made. The laboratory assistants were instructed to sit unobtrusively at the back of the room during this stage. However, they were required to keep track of the seven sub-analyses of FINALYZER-XS that the participants used, and to ensure that the participants used all seven of them. The order in which the sub-analyses were used was completely dependent on the subject's preferences. During this entire stage, no mention was made to the participants of the explanations provided by the system, and they were free to use as many or as few of them as they wished.
When participants had finished using the system and were at the final "OVERALL SUMMARY" screen (see Appendix 6 for copies of all the screens that comprised the KBS), they were reminded to complete the eight judgments again. The completed Judgment Recording Sheets were then collected and verified for completeness.

A post-study questionnaire comprising the items measuring user perceptions of both the usefulness of the system and the usefulness of the explanations was then administered. Also included were measures of other secondary constructs that served as manipulation checks, e.g., the ease-of-use of the system and motivation to use the explanations. Their primary purpose was to determine whether participants had trouble using the system and whether they were properly motivated. As well, open-ended questions asked for subjects' opinions of the major strengths and weaknesses of the system they used, how they would improve the system, and what changes they would like to see in the way the system provided explanations. A copy of this post-study questionnaire is included in the experimental task material of Appendix 2.

At the end, participants were thanked for their participation and debriefed by being presented with a one-page debriefing protocol (see Appendix 2), which the laboratory assistant reviewed with them. This revealed to them the true objectives of the study and the nature of the KBS they used, as well as the reasons why this information was not revealed to them earlier. Novice subjects were also paid an honorarium, as is discussed in the next section.
The total elapsed time between the arrival of a participant and the completion of the debriefing typically ranged from 1.5 to 2.5 hours, with the average being slightly less than two hours.

5.4.3 Performance Incentives and Payment to Subjects

Monetary incentives have been shown to improve the accuracy of decision making, particularly on realistic judgment tasks that are reasonably complex [Wright and Aboul-Ezz, 1988]. Participants in each treatment group were given the same incentive for overall judgment performance. Specifically, they were informed that a $50 prize was to be awarded to the top twenty percent of participants, participating under similar circumstances, who obtained the most accurate scores. It was emphasized that there was a one-in-five chance of winning a prize. They were informed of this incentive in the General Information Sheet they read on arriving for the experimental session. It was also emphasized that the time taken to complete the task was not a criterion for the awarding of a prize. Because the primary dependent variable of judgmental accuracy is a measure of decision effectiveness rather than efficiency, this helped ensure that participants were not put into situations where they were forced to make tradeoffs between viewing explanations and completing the task quickly.

Novice participants were paid an honorarium of $12 for their participation. This amount was not contingent on the amount of time they spent on the experimental task or on the quality of their performance. The motivation for this payment was solely to encourage participation by partially reimbursing novices for their time and the costs they incurred in participating. Expert participants were not offered this payment, as it was felt that the amount was not significant enough to encourage their participation or even to reimburse them for their time.
Considering that the average amount of time participants took to complete all the steps of the experimental session was approximately two hours, this payment cannot be considered adequate reimbursement for the time of either category of participant.

5.5 Basic Approaches to Data Analysis

The primary methods of data analysis used were the three sub-categories of the Multivariate General Linear Hypothesis (MGLH) statistical model: regression, analysis of variance, and multivariate analysis. It was of course imperative that the data meet the assumptions underlying this model. Table 5.8 presents a summary of the basic statistical models that were run and whose results are presented in Chapter 6. As specified in Model 1, the determinants of the use of explanations were analyzed using ANOVA. The two independent variables, explanation provision strategy and user expertise, were evaluated between subjects, while the types of explanations were analyzed within subjects. As discussed earlier, the user expertise variable had two levels, the explanation provision strategy variable had three levels (the "none" group had no usage scores), and the types of explanations variable had three levels. Model 2 tested the hypotheses relating to the relationship between the level of user agreement and the use of explanations. The unit of analysis for this model was not each individual subject, as it was for Model 1; rather, it was each conclusion presented by the system for which the user had to specify his or her level of agreement. There were 25 such agreements provided by each participant. The dependent variable for this model was the number of explanations used that were related to each conclusion rated for agreement.
Hypotheses relating to the impact of the use of explanations on the accuracy of judgmental decision making and user perceptions of usefulness were evaluated using multiple regression analysis (Models 3 through 6). This was because the independent variables of explanation usage (USAGE) and types of explanations used (TYPES USED) were both non-categorical measures, and there were multiple measures for both the accuracy of judgments and user perceptions of usefulness. However, when user expertise was included as part of these analyses, ANOVA models were used.

Model 1: EXPLANATION USE = B0 + B1(EPS) + B2(UE) + B3(TYPES) + B4(EPS*UE) + B5(EPS*TYPES) + B6(UE*TYPES) + B7(EPS*UE*TYPES) + e
Model 2: EXPLANATION USE = B0 + B1(AGREEMENT) + B2(UE) + B3(AGREEMENT*UE) + e
Model 3: ACCURACY = B0 + B1(USAGE-FF) + B2(USAGE-FB) + e
Model 4: USEFULNESS = B0 + B1(USAGE-FF) + B2(USAGE-FB) + e
Model 5: ACCURACY = B0 + B1..n(TYPES USED) + e
Model 6: USEFULNESS = B0 + B1..n(TYPES USED) + e

Legend:  B0 = Alpha
         Bn = Beta Weights
         e = Error Term
         EPS = Levels of Explanation Provision Strategies
         UE = Levels of the User Expertise Construct
         TYPES = Three Types of Explanations Provided
         AGREEMENT = Level of User Agreement with a Conclusion
         USAGE = Proportions of Feedforward & Feedback Explanations Used
         TYPES USED = Proportions of Use of Each Explanation Type

Table 5.8: Basic Statistical Models

Structural equation modelling [Bagozzi, 1977] was also performed to test the complete research model as presented in Figure 5.1. The objective was to supplement the main statistical analysis while minimizing the overall measurement error involved. It represents a way of combining the analysis of the determinants of explanation use with the analysis of the impact of explanation use into one single model.

[Figure 5.4: Main Structural Equation Model]
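Before turning to the structural models, note that Models 3 and 4 in Table 5.8 are ordinary multiple regressions. The original analysis used SYSTAT's MGLH module; purely as an illustration of the model form, Model 3 could be estimated along the following lines. The column names and the synthetic data are invented for the sketch and are not the study's data.

```python
# Sketch of Model 3: regressing judgment accuracy on the two
# explanation-usage proportions. Hypothetical columns, synthetic data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 80
df = pd.DataFrame({
    "USAGE_FF": rng.uniform(0, 1, n),  # proportion of feedforward explanations used
    "USAGE_FB": rng.uniform(0, 1, n),  # proportion of feedback explanations used
})
# Construct the outcome with known weights so the fit is easy to verify.
df["ACCURACY"] = 0.5 + 0.3 * df["USAGE_FF"] - 0.1 * df["USAGE_FB"]

model3 = smf.ols("ACCURACY ~ USAGE_FF + USAGE_FB", data=df).fit()
print(model3.params)
```

With noiseless synthetic data the fitted coefficients recover the constructed weights exactly; with real data the betas B1 and B2 are of course estimated with error.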
A structural equation model, similar to that presented in Figure 5.4 and representing the causal modelling variation of the research model of Figure 5.1, was fitted using the Partial Least Squares (PLS) method of structural equation modelling [Wold, 1979]. Like LISREL [Joreskog and Sorbom, 1984], the PLS method can be characterized as a "second generation" multivariate analysis technique [Fornell, 1982].

PLS was selected over LISREL for several reasons. First, it better accommodates the small sample size of this research. Second, the model to be fitted includes both formative and reflective indicators: for example, the use of explanations is "formed" by the separate measures for the use of the three different types of explanations, while the accuracy-of-judgments latent variable is "reflected" in the eight accuracy measures that were taken. Third, the two independent variables of user expertise and explanation provision strategy were operationalized as categorical, non-interval measures, which cannot be handled easily by LISREL. Fourth, unlike LISREL, the estimates of PLS do not require adherence to any distributional assumptions. Fifth, considering that the study of the determinants and consequences of the use of KBS explanations is at the infancy stage of theory formation, PLS is more appropriate: unlike LISREL, it is just as suitable for exploratory analysis as for theory validation.

[Figure 5.5: Second Structural Equation Model]

The model in Figure 5.4 only included the paths that were directly hypothesized in this study. As well, there was the consideration that the various treatment groups received different sets of explanations and therefore had varying numbers of usage scores. For example, in cases where one or both of the feedforward and feedback explanations were not provided, some usage scores were not captured.
This meant that combining the data of all the groups into one set for the use-of-explanations latent variable was not completely consistent. To overcome these concerns, a second model, similar to that presented in Figure 5.5, was fitted. The explanation provision strategy latent variable of Figure 5.4 was dropped, and additional paths from user expertise to the accuracy of judgments and to perceptions of usefulness were added. This second model was run separately for each of the eight groups and the results were compared. Chapter 6 and Appendix 8 discuss in detail the specific models that were fitted.

5.6 Summary of the Chapter

This chapter presented the details of the experiment that was conducted to test the hypotheses developed in Chapter 3. Besides describing the operationalization of the independent and dependent variables, it presented the research design, the procedures used for data collection, and the basic statistical methods used for analyzing the data. The next chapter will present the statistical findings of the study.

CHAPTER 6: RESULTS OF THE EXPERIMENT

6.0 Introduction

This chapter reports the results of the laboratory experiment designed to test the hypotheses discussed in Chapter 3. Each hypothesis is assessed in the context of the conceptual model developed in Chapter 3, and the knowledge-based system (KBS) and task domain described in Chapter 4.
Four kinds of quantitative data collected in the experiment were used as dependent variables: 1) measures of the use of explanations, obtained from computer logs; 2) measures of the accuracy of judgmental decision-making, computed from the judgments made by subjects after using the KBS; 3) measures of user perceptions of "system usefulness" and "explanation usefulness", captured in the post-experimental questionnaire; and 4) measures of user agreement with the conclusions presented by the KBS, obtained from the computer logs of the experimental session.

The use of this data for the assessment of the research hypotheses is organized as follows. The next section details the manner in which the assumptions underlying the statistical model were verified. Section 6.2 describes how the reliability and validity of the data were assessed for both the independent and dependent variables. Section 6.3 presents the results of the tests relating to the factors influencing the use of KBS explanations. Section 6.4 focuses on the results pertaining to the relationship between the level of user agreement and the use of explanations. Section 6.5 offers the results of the tests relating to the impact of the use of explanations on the accuracy of judgmental decision-making and on user perceptions of usefulness. The last section reports the results of the secondary data analysis performed, including the structural equation modelling and other tests that served as manipulation checks.

6.1 Evaluation of the Assumptions Underlying the Statistical Tests

The analysis of variance (ANOVA) procedure of the Multivariate General Linear Hypothesis statistical model [SYSTAT, 1990] was used to analyze the experimental data. Three conditions must be satisfied before ANOVA can be used to test the equality of several population means: 1) each population must be normally distributed; 2) the variances of all populations must be equal; and 3) all sample observations must be independent [Mills, 1977].
The first two of these basic assumptions of the ANOVA model were formally tested in relation to each of the dependent variables. The third assumption, independence of observations, was satisfied through the research design of the study, by operationalizing both of the independent variables, i.e., user expertise and explanation provision strategy, as between-subjects variables. Having a separate treatment group for each combination of the levels of these independent variables (see the research design presented in Section 5.3), and exposing each subject to only one of the treatments, ensured that the requirements of this assumption were met. The independence of sample observations was further assured by requiring all subjects to agree that they would refrain from discussing any aspect of their participation in the study with other participants (see the Consent Form included in Appendix 1).

The assumption of multivariate distributional normality was tested prior to testing the homogeneity-of-variances assumption. This was done so that variables not conforming to the normality assumption could be appropriately transformed prior to the assessment of homogeneity of variances. Distributional normality was assessed in multiple ways, including: 1) evaluations of the differences between the means and medians; 2) assessments of the measures of skewness and kurtosis; 3) examinations of the shape of normal density plots; and 4) the use of the Kolmogorov-Smirnov test [Siegel, 1956] for the shape and location of a sample distribution, together with the Lillefors normal probability test [Lillefors, 1967] that utilizes the standard normal distribution [SYSTAT, 1990]. The variables that did not conform to the assumption of multivariate distributional normality were transformed using various transformation algorithms, e.g., the arc-sine, natural logarithm, and Fisher's Z algorithms [SYSTAT, 1990; Box and Cox, 1964].
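The normality checks and transformations described above can be sketched with present-day libraries. The following is a loose analogue of the SYSTAT procedures, not the original code; the proportion data is synthetic, and the arc-sine transform is written in its standard variance-stabilizing form, arcsin(sqrt(p)).

```python
# Sketch: Lilliefors-style normality test plus the two transformations
# most often applied to the proportion-valued dependent variables.
import numpy as np
from statsmodels.stats.diagnostic import lilliefors

rng = np.random.default_rng(1)
proportions = rng.beta(2, 5, size=80)             # skewed scores in (0, 1)

log_scores = np.log(proportions)                  # natural-logarithm transformation
arcsine_scores = np.arcsin(np.sqrt(proportions))  # arc-sine transformation

# Lilliefors test: p > 0.05 suggests the sample is consistent with normality.
for label, x in [("raw", proportions), ("log", log_scores), ("arc-sine", arcsine_scores)]:
    stat, pvalue = lilliefors(x, dist="norm")
    print(f"{label:8s} Lilliefors p = {pvalue:.2f}")
```

In the dissertation's procedure, the candidate transformation yielding the highest Lillefors probability (together with acceptable density plots and skewness/kurtosis measures) was retained.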
The homogeneity of variances assumption was assessed for each dependent variable through the use of Bartlett's test for the homogeneity of group variances [Neter, Wasserman and Kutner, 1985].

The results of the Kolmogorov-Smirnov (Lillefors) test are presented in Table 6.1. Lillefors p-values greater than 0.05 indicate that the distributions approximated the standard normal distribution. Only the two perceptual measures of usefulness conformed to the assumption of distributional normality. All the other dependent variables had to be transformed using tail-stretching transformations in an effort to stabilize their variances [Box and Cox, 1964]. The primary reason for this was that the data for all these variables were measured in the form of proportions [Cohen and Cohen, 1975, p. 254]. The four measures of the use of explanations and the user agreement measure were proportions of the number of explanations selected out of the total number of explanations available. The accuracy measures represented proportions of the differences between subjects' judgment scores and the correct scores to the absolute values of the correct scores. Section 5.2 of Chapter 5 discusses in detail the exact computation procedures for these measures.

TABLE 6.1: TESTS OF THE ASSUMPTIONS UNDERLYING THE ANOVA MODEL

                              Kolmogorov-Smirnov (Lillefors) Test
Dependent Variables           Initial Data   Transformation   Transformed     Bartlett Test for the
                              (p-value)      Applied          Data (p-value)  Homogeneity of Group
                                                                              Variances (p-value)
Use of Explanations:
  Why Explanations            0.00           Nat. Log.        0.55            0.34
  How Explanations            0.00           Nat. Log.        0.67            0.48
  Strategic Explanations      0.03           Arc-sine         0.51            0.18
  Total Explanations          0.00           Nat. Log.        0.78            0.65
Accuracy of:
  Evaluative Judgments        0.00           Nat. Log.        0.15            0.47
  Predictive Judgments        0.01           Arc-sine         0.16            0.36
Perceptions of Usefulness:
  System                      0.07           None             ---             0.56
  Explanations                0.06           None             ---             0.46
User Agreement                0.00           Nat. Log.        0.32            0.91

For each of the variables that required transformation, appropriate transformations were identified and applied based on an evaluation of the shape of the a priori normal density plot. The results of these transformations were then evaluated, and the transformation yielding the highest Lillefors probability, an acceptable normal probability plot, and the best measures of kurtosis and skewness was selected. For example, for the use of Strategic explanations, all three of the square root, arc-sine, and natural logarithm transformations yielded acceptable normal density plots and Lillefors probabilities greater than 0.05. However, the arc-sine transformation was selected because it yielded the highest Lillefors probability (0.51), an acceptable normal probability plot (see Figure 6.1), and the best measures of skewness and kurtosis.

FIGURE 6.1: NORMAL DENSITY PLOTS FOR THE USE OF EXPLANATIONS DEPENDENT VARIABLES
[Before/after plots: use of Why, How, and Total explanations (natural logarithm transformation); use of Strategic explanations (arc-sine transformation)]

FIGURE 6.2: NORMAL DENSITY PLOTS FOR THE ACCURACY AND USER AGREEMENT DEPENDENT VARIABLES
[Before/after plots: accuracy of evaluative judgments (natural logarithm transformation); accuracy of predictive judgments (arc-sine transformation); user agreement (natural logarithm transformation)]

Figure 6.1 presents normal density plots for the four use-of-explanations dependent variables prior and subsequent to their transformation using the relevant algorithms. Similarly, Figure 6.2 shows the normal density plots for the accuracy and user agreement dependent variables. The revised Lillefors probabilities after transformation for all these variables are also shown in Table 6.1.
Note that the Lillefors probabilities for all the transformed variables are greater than 0.05 and that the shapes of the normal probability plots after transformation resemble the "bell-shaped" curve of the normal distribution. These results suggest that the transformations succeeded in removing the problems associated with multivariate distributional normality. This is further backed by the fact that ANOVA models with equal cell sizes are robust to minor deviations from normality observed in normal probability plots [Neter, Wasserman and Kutner, 1985, p. 623].

Bartlett's test for the homogeneity of group variances was used to test the second assumption, relating to the homogeneity of variances. As can be seen in the last column of Table 6.1, there were no violations of this assumption at the 0.05 level for any of the dependent variables.

For the variables that required transformation, all subsequent statistical tests were performed using the transformed data. The test results for these variables, as reported in the rest of this chapter, are based on their transformed form. However, other secondary data that is reported, such as the tables and plots of means and standard deviations, is based on the initial, untransformed data. This was done to facilitate the contextual interpretation and analysis of the data.

6.2 Assessment of the Validity & Reliability of Measurement

Assessment of the reliability and validity of the independent variables is discussed in the next section, prior to the assessment of the dependent variables.

6.2.1 Reliability of the Independent Variables

An assessment of the measurement error involved in the operationalization of the independent variables was undertaken. Amongst the independent variables, user expertise was identified as possibly the weakest construct in this regard.
Unlike the "explanation provision strategy" variable, whose two levels of feedforward and feedback were directly manipulated without measurement error, user expertise was operationalized by categorizing subjects according to two individual characteristics: the number of years of experience in financial statement analysis and the possession of professional qualifications. These were used to classify subjects into the expert and novice groups, and both characteristics were analyzed post hoc to assess the validity of the categorization used to operationalize user expertise.

6.2.1.1 Years of experience in financial statement analysis

An independent samples t-test was performed to test the null hypothesis that the means for the number of years of experience were not significantly different between the expert and novice subjects.

Subjects    N     Mean (years of experience)    Std. Dev.
Novices     40    0.33                          0.94
Experts     40    9.68                          4.54

SEPARATE VARIANCES:  T = -12.76   DF = 42.3   PROB < 0.001
POOLED VARIANCES:    T = -12.76   DF = 78     PROB < 0.001

The 9.35-year difference in the means of the two groups was significant at the 0.05 level. This suggests that categorization based on years of experience in financial statement analysis was effective in distinguishing between the expert and novice subjects. Only 5 of the 40 novice subjects reported any experience, ranging from 1.5 to 3 years. All 40 expert subjects had work experience in financial analysis ranging between 4 and 20 years.

6.2.1.2 Professional Qualifications

             Professional Qualifications
Subjects     No      Yes     Total
Novices      33       7      40
Experts      11      29      40
Total        44      36      80

TEST STATISTIC                   VALUE    DF    PROB
LIKELIHOOD RATIO CHI-SQUARE      25.95     1    < 0.001

Only professional qualifications that went beyond the undergraduate degree and that were relevant to the application of financial analysis were considered.
These were the Chartered Accountant (CA), Chartered Financial Analyst (CFA), and Certified General Accountant (CGA) designations (see Section 5.2). The data was coded as a binary variable, with the value 1 denoting the possession of such qualifications and 0 their absence. Given the nature of this data, a Chi-square test was performed to test the null hypothesis that there was no significant difference between the expert and novice subjects in terms of the possession of professional qualifications. The result of this test is presented above, together with a cross-tabulation of the data. There was a significant difference between the expert and novice subjects in terms of their possession of professional qualifications.

In summary, the results of these two tests suggest that the categorization used to operationalize the user expertise construct was successful in differentiating between the expert and novice levels.

6.2.2 Assessment of the Reliability of the Dependent Variables

The measures relating to the use of explanations were captured reliably, without error, by recording them directly into the computer logs of the experimental sessions. The two accuracy measures also have a high degree of reliability, as they were computed directly from the judgment recording sheets completed by subjects. However, for the constructs relating to user perceptions of the usefulness of the system and of the explanations, a multi-item instrument in the form of the post-study questionnaire was used for measurement. The development of the items comprising this instrument was discussed in Section 5.2.3 of Chapter 5. Prior to the use of these measures of usefulness for the purposes of evaluating the hypotheses, the validity and reliability of the scales comprising this instrument were assessed.

The instrument comprised three multi-item scales for measuring three perceptual constructs: usefulness of the KBS, usefulness of the explanations provided, and the ease-of-use of the system.
Only the first two of these served as dependent variables and were discussed in Chapter 6. The ease-of-use scale was included primarily to serve as a manipulation check for assessing the possible mediating influence of differences in the system's usability between the various treatment groups. Evaluation of its reliability and construct validity is, however, also discussed in this section.

The overall reliability of the three scales was assessed using Cronbach's ALPHA [Cronbach, 1970]. As well, the items comprising each scale were evaluated using various item reliability statistics. These included the effect on Cronbach's ALPHA if an item was deleted, the item standard deviation score, the item-to-total scale correlation, and the correlation of items within each scale (using the Bonferroni probability matrices). The construct validity of the scales was assessed by performing Principal Components Factor Analysis utilizing the Varimax rotation to obtain the eigenvalues, factor loadings, and the percentage of variance explained. The objective was to evaluate the convergence and divergence of items within each scale. All the analysis was performed using the TESTAT module of the SYSTAT statistical package [SYSTAT, 1990].

TABLE 6.2: SCALE RELIABILITY COEFFICIENTS & FACTOR ANALYSIS RESULTS

  Scale                            No. of   Cronbach's   Eigenvalue   % of Variance explained
                                   Items    ALPHA        (Factor      by Rotated Factor
                                                         analysis)
  Usefulness of the System           10       0.85         4.37              45.8
  Usefulness of the Explanations      9       0.93         5.95              66.1
  Ease of Use                         5       0.76         2.65              53.0

Table 6.2 presents the Cronbach's ALPHA, eigenvalues, and the percentage of variance explained for each of the three scales. The values of ALPHA for the three scales are all within or above the 0.60 to 0.80 range, which Nunnally [1967, p. 226] argues is a sufficient reliability level for "basic research." While this assured that the scales were sufficiently reliable, the results of the factor analysis were used to assess construct validity.
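Cronbach's ALPHA used above has a simple computational form: alpha = k/(k-1) x (1 - sum of item variances / variance of total scores). A minimal sketch in Python follows (illustrative only; the thesis computed these values with SYSTAT's TESTAT module, and the `items` data here is hypothetical):

```python
from statistics import variance

def cronbach_alpha(items):
    # items: one list of scores per item, all over the same respondents in order.
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    return k / (k - 1) * (1 - sum(variance(i) for i in items) / variance(totals))

# Three positively correlated items over five respondents.
print(round(cronbach_alpha([[1, 2, 3, 4, 5],
                            [2, 2, 3, 5, 5],
                            [1, 3, 3, 4, 4]]), 2))  # -> 0.95, high internal consistency
```

When every item is a copy of the same scores, the formula returns exactly 1.0, which is the intuition behind using it as an internal-consistency index.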
For each of the three scales, the items comprising the scale were factor analyzed to assess the number of factors that emerged.

In the case of the usefulness of the system construct, the four largest eigenvalues that emerged were as follows: 4.37, 1.16, 0.92, and 0.75. Only the first eigenvalue was significantly greater than 1.0, and a scree plot of all the eigenvalues revealed a break after the first factor. This indicated that a single-factor solution was most likely. While this first factor alone accounted for 45.8% of the variance captured, the acceptance of a second factor (eigenvalue = 1.16) would have increased the percentage of variance explained by another 11.6%. Assessment of the "usefulness of the explanations" construct yielded only one eigenvalue that was greater than 1.0. A scree plot of all the eigenvalues confirmed again that a single factor was most appropriate. This first factor, with an eigenvalue of 5.95, explained 66.1% of the total variance. This suggests a significantly high level of construct validity. In the case of the third scale, which measured the ease-of-use construct, the results were generally similar. The four largest eigenvalues obtained were 2.65, 0.86, 0.67, and 0.43, and the one-factor solution that resulted explained 53% of the total variance captured.

Further confirmatory factor analysis was performed by conducting Principal Component analysis using the VARIMAX rotation on all 24 of the items that measured the three constructs. All three of the expected factors emerged with a reasonable degree of clarity, accounting for 61% of the total variance in the data set. A scree plot revealed a clear break after the first three eigenvalues, which were all significantly greater than the 1.0 cut-off: 10.04, 2.93, and 1.72.

Examination of the rotated factor loadings was performed jointly with the evaluation of the item reliability statistics for each of the three scales.
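The eigenvalue-based criteria above (the eigenvalue-greater-than-1.0 rule and the percentage of variance explained) come straight from the eigendecomposition of the item correlation matrix. A sketch of that step in Python with NumPy (illustrative; the `responses` matrix is hypothetical data, and no rotation is applied here):

```python
import numpy as np

def pca_eigenvalues(responses):
    # responses: (n_respondents, n_items) array of item scores.
    R = np.corrcoef(responses, rowvar=False)      # item correlation matrix
    vals = np.sort(np.linalg.eigvalsh(R))[::-1]   # eigenvalues, descending
    pct = 100 * vals / vals.sum()                 # percent of variance per component
    n_factors = int(np.sum(vals > 1.0))           # eigenvalue > 1.0 retention rule
    return vals, pct, n_factors

responses = np.array([[1, 2, 1], [2, 1, 2], [3, 5, 2], [4, 4, 4], [5, 7, 3]])
vals, pct, n_factors = pca_eigenvalues(responses)
print(vals, pct, n_factors)
```

A useful sanity check is that the eigenvalues of a correlation matrix always sum to the number of items, so the percentages of variance always sum to 100.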
The objective was to isolate items that reduced either the reliability or the construct validity of the scales. It was felt that the elimination of such items could enhance the subsequent statistical analysis conducted to test the research hypotheses. Items that did not load strongly on any factor (loadings of less than 0.5), or that loaded strongly on more than one factor, reduced the construct validity of the scales. Items that diminished scale reliability included those: 1) whose deletion would increase Cronbach's ALPHA; 2) with low item-to-item and item-to-total correlations; and 3) with low standard deviation scores. Tables 6.3 through 6.5 present the item reliability statistics, as well as the rotated factor loadings, for each of the scales. Note that the item labels used in the first column of these tables correspond to the question numbers of the items included in the post-study questionnaire, a copy of which is included in Appendix 2.

TABLE 6.3: ITEM RELIABILITY STATISTICS & FACTOR LOADINGS: USEFULNESS OF THE SYSTEM

  Item    Standard    Item-Total    Deletion Effect on    Rotated Factor
  Label   Deviation   Correlation   ALPHA (base = 0.85)   Loadings
  Q3        1.07        0.70             0.834               0.69
  Q5        0.99        0.79             0.825               0.81
  Q13       0.80        0.68             0.837               0.72
  Q15       1.22        0.66             0.840               0.67
  Q16       1.13        0.76             0.828               0.77
  Q20       0.95        0.75             0.829               0.77
  Q23       1.41        0.57             0.856               0.52
  Q24       0.77        0.58             0.845               0.61
  Q26       1.19        0.61             0.845               0.57
  Q31       1.22        0.60             0.847               0.57

Examination of the deletion effects on ALPHA presented in Table 6.3 reveals that there was only one item (Q23) whose elimination would improve the overall reliability of the scale measuring the usefulness of the system. However, examination of the standard deviation and item-to-scale correlation of this item provides some justification for its retention. The item-to-total correlation, while being the lowest among all the items in this scale, is substantially greater than the 0.40 cut-off that is generally used [Moore and Benbasat, 1991].
As well, its standard deviation is larger than that of the other items, suggesting that it has the most explanatory power. Considering that the increase in ALPHA from its elimination would have been slight (from 0.85 to 0.856), it was decided that the item would not be culled from the scale. None of the items that measured usefulness of the explanations posed any reliability problems (Table 6.4). All the item-to-total correlations were equal to or greater than 0.71, and Cronbach's ALPHA could not be increased by eliminating any of the items. A similar assessment of the items measuring ease-of-use (Table 6.5) revealed that item Q4 was a possible candidate for elimination. It had the lowest standard deviation and item-to-scale correlation amongst all the items. However, considering that the increase in ALPHA from its deletion would have been marginal (from 0.76 to 0.765), it was decided that the item would be retained.

As can be seen in the last column of Tables 6.3, 6.4, and 6.5, all the items in the three scales loaded strongly on their respective factors, indicating a high level of convergent validity. The lowest factor loading was 0.52, for item Q23 of the usefulness of the system construct. This is acceptable, as it is higher than the loading of 0.45 that is generally considered in the literature to be the lower cut-off level [Comrie, 1973]. As well, the majority of the loadings were in excess of 0.63, which Comrie [1973] considers to be the minimal cut-off level for very good loadings. The rotated factor loadings from the confirmatory factor analysis performed using all 24 items (see Table 6.6) were examined to identify items that reduced divergent validity, i.e., loaded strongly on more than one factor. A three-factor solution was specified.
This revealed that there were only two such items: item Q15, which measured usefulness of the system, and item Q25, which measured usefulness of the explanations.

TABLE 6.4: ITEM RELIABILITY STATISTICS & FACTOR LOADINGS: USEFULNESS OF THE EXPLANATIONS

  Item    Standard    Item-Total    Deletion Effect on    Rotated Factor
  Label   Deviation   Correlation   ALPHA (base = 0.93)   Loadings
  Q9        1.20        0.85             0.921               0.85
  Q21       1.23        0.71             0.930               0.71
  Q25       1.50        0.75             0.930               0.74
  Q27       1.05        0.85             0.921               0.86
  Q28       1.20        0.90             0.917               0.90
  Q30       1.28        0.82             0.922               0.82
  Q32       1.32        0.74             0.929               0.74
  Q33       1.16        0.81             0.923               0.82
  Q38       1.44        0.85             0.920               0.86

Besides loading strongly on their respective constructs, both of these items also loaded heavily on the ease-of-use construct, with loadings of 0.63 and 0.60 respectively. Examination of the contents of both items reveals that they measure a common dimension of usefulness that relates to the improvement in the efficiency, or quickness, with which the underlying problem-solving task can be performed. For example, a system or system feature that enhances the efficiency of the problem-solving process will be perceived as being useful. Item Q15 comprises the statement "Using the explanations provided by FINALYZER-XS allows me to accomplish the financial analysis task more quickly", while Q25 states that "Using FINALYZER-XS allows me to accomplish the financial analysis task more quickly". While prior studies [Davis, 1986; Moore and Benbasat, 1991] have considered this dimension to be solely a sub-construct of usefulness, subjects in this study clearly viewed it as being a measure of ease-of-use as well.
Considering that these two items did not clearly discriminate between the usefulness and ease-of-use constructs, they were culled from their respective scales.

TABLE 6.5: ITEM RELIABILITY STATISTICS & FACTOR LOADINGS: EASE-OF-USE

  Item    Standard    Item-Total    Deletion Effect on    Rotated Factor
  Label   Deviation   Correlation   ALPHA (base = 0.76)   Loadings
  Q4        0.61        0.54             0.765               0.58
  Q7        1.03        0.80             0.681               0.78
  Q10       1.26        0.79             0.727               0.73
  Q29       0.77        0.76             0.693               0.81
  Q34       0.87        0.71             0.718               0.73

In summary, the three scales comprising the instrument were found to be sound from the perspective of reliability of measurement. As well, in relation to the assessment of construct validity, there was substantial evidence of the convergent and divergent validity of the items comprising the scales. Aspects of the content validity were discussed earlier, in Section 5.2.3 of Chapter 5, in relation to the development of the scales. Considering that the items were assembled based on reliable and valid instruments that arose out of past studies that focused specifically on the theoretical dimensions of the usefulness and ease-of-use constructs [Davis, 1986; Moore and Benbasat, 1991], it can be argued that a reasonable degree of content validity is assured.

6.3 Results Pertaining to the Factors Influencing the Use of Explanations

The primary statistical model used to investigate the hypotheses relating to the use of explanations is presented in Figure 6.3.
It is a mixed ANOVA model, with "explanation provision strategy" and "user expertise" being between-subjects factors and the "types of explanations" being a within-subjects factor. The model was run using the MGLH module of SYSTAT [SYSTAT, 1990] with a sample size of 60 subjects. The 20 subjects who had no scores for the use of feedforward and feedback explanations, because they were not provided with any explanations, were not included. There were three levels of explanation provision strategy: (a) only feedforward explanations; (b) only feedback explanations; and (c) both feedforward and feedback explanations. User expertise was measured at two levels: experts and novices. The three types of explanations provided were the Why, How, and Strategic explanations.

Table 6.6: Rotated Factor Matrix for all 24 items

  Item    Factor1   Factor2   Factor3     Item    Factor1   Factor2   Factor3
  Q3a      0.72      0.27      0.06       Q9b      0.29      0.74      0.25
  Q5a      0.70      0.19      0.39       Q21b     0.48      0.62      0.13
  Q13a     0.60      0.28      0.33       Q25b     0.05      0.63      0.60
  Q15a     0.43      0.13      0.63       Q27b     0.22      0.84      0.11
  Q16a     0.76      0.25      0.14       Q28b     0.24      0.86      0.17
  Q20a     0.56      0.35      0.15       Q30b     0.13      0.76      0.32
  Q23a     0.43      0.47      0.14       Q32b     0.20      0.71      0.04
  Q24a     0.61      0.22      0.21       Q33b     0.06      0.83      0.03
  Q26a     0.59      0.26      0.11       Q38b     0.30      0.83      0.05
  Q31a     0.58      0.23      0.20       Q10c     0.28      0.07      0.72
  Q4c      0.32      0.16      0.68       Q29c     0.19      0.05      0.75
  Q7c      0.39      0.02      0.63       Q34c     0.40      0.16      0.66

  Note: "a" signifies that the item measures usefulness of the system (Factor1);
        "b" signifies that the item measures usefulness of the explanations (Factor2);
        "c" signifies that the item measures ease-of-use (Factor3).

Model 1: EXPLANATION USE = B1(EPS) + B2(UE) + B3(TYPES) + B4(EPS*UE) + B5(EPS*TYPES) + B6(UE*TYPES) + B7(EPS*UE*TYPES) + e

Legend: Bi = Beta Weights; e = Error Term
  EXPLANATION USE = Usage proportions of the Why, How, and Strategic explanations
  EPS = Three types of explanation provision strategies: Feedforward Only, Feedback Only, and Both Feedforward and Feedback
  UE = Expert and Novice levels of user expertise
  TYPES = Why, How, and Strategic explanations provided

Hypotheses: Main Effects
  H1: Feedback explanations will be used as much as the feedforward explanations (FB = FF).
  H2: Novices will use more explanations than experts (NOVICES > EXPERTS).
  H3: The why explanation will be used the most, and the how explanation will be used more than the strategic explanation (WHY > HOW > STRATEGIC).

Hypotheses: Interaction Effects
  H4: Novices will use more feedforward explanations than feedback explanations, while experts will use more feedback explanations than feedforward explanations.
  H5: For either feedforward or feedback explanation provision, the why explanation will be used the most, and the how explanation will be used more than the strategic explanation.
  H6a: Novices will use the why explanation the most, and the how explanation more than the strategic explanation.
  H6b: Experts will use the how explanation the most, and the strategic explanation more than the why explanation.
  H7a: For either feedforward or feedback explanation provision, novices will use the why explanation the most, and the how explanation more than the strategic explanation.
  H7b: For either feedforward or feedback explanation provision, experts will use the how explanation the most, and the strategic explanation more than the why explanation.

Figure 6.3: Statistical Model and Hypotheses Relating to the Use of Explanations

Table 6.7 presents the results of the ANOVA model, while the means and standard deviations for the various treatment groups are summarized in Table 6.8. Note that the measures for the use of explanations were proportions of the number of explanations selected to the number of explanations provided.
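The dependent measure just described, and the way it aggregates for subjects who received both kinds of explanations (reported later in this chapter as a weighted average of the feedforward and feedback proportions), reduces to simple counting. An illustrative sketch follows; the counts are hypothetical, not taken from the experiment:

```python
def usage_proportion(n_used, n_provided):
    # Proportion of the explanations provided that a subject actually selected.
    return n_used / n_provided

def combined_usage(used_ff, provided_ff, used_fb, provided_fb):
    # Overall proportion for a subject given both explanation kinds: equivalent
    # to averaging the two proportions weighted by the numbers provided.
    return (used_ff + used_fb) / (provided_ff + provided_fb)

# Hypothetical subject: 3 of 50 feedforward and 14 of 50 feedback explanations used.
print(usage_proportion(3, 50))        # 0.06
print(usage_proportion(14, 50))       # 0.28
print(combined_usage(3, 50, 14, 50))  # 0.17, the equal-weights average of 0.06 and 0.28
```

With equal numbers of each explanation type provided, the combined measure is simply the mean of the two separate proportions.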
Table 5.2 on page 127 presents the denominators used in the computation of these proportions, i.e., the total number of explanations provided for each treatment condition.

Table 6.7: ANOVA Results for the Use of Explanations

  EFFECTS                                 D.F.   F-value   Sigf.
  A. BETWEEN SUBJECTS
  User Expertise (UE)                       1      0.02     0.88
  Explanation Provision Strategy (EPS)      2     13.16     0.00
  UE by EPS Interaction                     2      0.16     0.85
  B. WITHIN SUBJECTS
  Types of Explanations (TYPES)             2      8.80     0.00
  TYPES by UE Interaction                   2      2.68     0.07
  TYPES by EPS Interaction                  4      2.39     0.06
  TYPES by UE by EPS Interaction            4      0.05     0.99

The results are discussed below in relation to each of the hypotheses (H1-H7) pertaining to the determinants of the use of explanations.

6.3.1 Hypothesis 1: Feedback and Feedforward Explanation Provision

There is a significant difference in the proportion of feedback and feedforward explanations used (F = 13.16; p < 0.001); H1 was rejected. The relevant group means from Table 6.8 are displayed in Figure 6.4. Subjects provided with only feedforward explanations used 7.4% of the available explanations, while subjects provided with only feedback explanations used 28.1%. The total percentage of explanations used by subjects who were provided with both feedforward and feedback explanations was 12.4%. A decomposition of this percentage reveals that subjects who were provided with both used 5.8% of the feedforward explanations and 26.5% of the feedback explanations that were provided.

TABLE 6.8: DETAILED STATISTICS FOR THE USE OF EXPLANATIONS

                                        Novices         Experts         Total
  Explanation Provision Strategy        Mean    S.D.    Mean    S.D.    Mean    S.D.
  Groups receiving only feedforward explanations (FF):
    FF Why                              .092    .108    .064    .051    .078    .084
    FF How                              .053    .064    .132    .183    .092    .139
    FF Strategic                        .051    .045    .049    .042    .050    .042
    Total FF                            .065    .067    .082    .081    .074    .072
  Groups receiving only feedback explanations (FB):
    FB Why                              .408    .347    .336    .279    .372    .308
    FB How                              .272    .225    .360    .339    .312    .272
    FB Strategic                        .160    .189    .149    .185    .154    .182
    Total FB                            .280    .209    .281    .184    .281    .192
  Groups receiving both feedforward and feedback explanations (BOTH):
    FF Why                              .113    .162    .051    .103    .082    .136
    FF How                              .062    .064    .051    .063    .057    .062
    FF Strategic                        .042    .036    .030    .035    .036    .035
    Total FF                            .072    .067    .044    .061    .058    .064
    FB Why                              .376    .351    .256    .271    .318    .316
    FB How                              .240    .240    .344    .308    .292    .274
    FB Strategic                        .228    .190    .140    .128    .184    .164
    Total FB                            .283    .206    .247    .205    .265    .201
    FF & FB Why                         .199    .201    .117    .153    .158    .179
    FF & FB How                         .119    .097    .145    .124    .132    .110
    FF & FB Strategic                   .101    .071    .065    .048    .083    .062
    Total FF & FB                       .140    .094    .109    .098    .124    .095

[Figure 6.4: The Use of Feedforward and Feedback Explanations -- bar chart of mean usage proportions for the FF Only, FB Only, Both (FF), and Both (FB) conditions.]

Pairwise contrasts between the three treatments using the Bonferroni adjustment matrix yield the following probabilities: Only FF vs. Only FB, p < 0.01; Only FF vs. BOTH (FF & FB), p = 0.03; and Only FB vs. BOTH (FF & FB), p = 0.01. These contrasts suggest that there is a significant difference in the proportion of explanations used by subjects in the three treatments.
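The Bonferroni-style adjustment behind the pairwise contrasts above controls the family-wise error rate by inflating each raw p-value by the number of comparisons made. A generic sketch of the standard correction (illustrative only; SYSTAT's adjustment matrices are not reproduced here, and the raw p-values below are hypothetical):

```python
def bonferroni_adjust(p_values):
    # Multiply each raw p-value by the number of comparisons, capped at 1.0.
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Three pairwise contrasts among the three provision strategies.
print(bonferroni_adjust([0.25, 0.125, 0.5]))  # -> [0.75, 0.375, 1.0]
```

With three treatment groups there are three pairwise comparisons, so each raw p-value is tripled before being compared with the 0.05 threshold.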
Overall, the use of feedback explanations is approximately four times greater than the use of feedforward explanations.

The measure of explanation use for the BOTH (FF and FB) explanation provision strategy was a weighted average of the proportions of feedforward and feedback explanations used. Therefore, two additional ANOVA models were run to ascertain whether the amount of feedforward or feedback explanations used when each was provided individually was significantly different from when they were provided together.

Model 2: FEEDFORWARD USE = B1(EPS) + B2(UE) + B3(TYPES) + B4(EPS*UE) + B5(EPS*TYPES) + B6(UE*TYPES) + B7(EPS*UE*TYPES) + e

Model 3: FEEDBACK USE = B1(EPS) + B2(UE) + B3(TYPES) + B4(EPS*UE) + B5(EPS*TYPES) + B6(UE*TYPES) + B7(EPS*UE*TYPES) + e

Legend: Bi = Coefficients; e = Error Term
  FEEDFORWARD USE = Proportions of the Why, How, and Strategic explanations used as feedforward
  FEEDBACK USE = Proportions of the Why, How, and Strategic explanations used as feedback
  EPS = Two types of explanation provision: provided alone or together
  UE = Expert and Novice levels of user expertise
  TYPES = Why, How, and Strategic explanations provided

Figure 6.3A: Additional Models Used to Investigate Explanation Provision Strategy

Table 6.7A: ANOVA Results for the Use of Feedforward Explanations (Model 2)

  EFFECTS                                 D.F.   F-value   Sigf.
  A. BETWEEN SUBJECTS
  User Expertise (UE)                       1      0.07     0.79
  Explanation Provision Strategy (EPS)      1      0.06     0.80
  UE by EPS Interaction                     1      0.09     0.77
  B. WITHIN SUBJECTS
  Types of Explanations (TYPES)             2      7.67     0.00
  TYPES by UE Interaction                   2      2.31     0.11
  TYPES by EPS Interaction                  2      0.41     0.67
  TYPES by UE by EPS Interaction            2      0.13     0.88

Table 6.7B: ANOVA Results for the Use of Feedback Explanations (Model 3)

  EFFECTS                                 D.F.   F-value   Sigf.
  A. BETWEEN SUBJECTS
  User Expertise (UE)                       1      0.08     0.79
  Explanation Provision Strategy (EPS)      1      0.50     0.49
  UE by EPS Interaction                     1      1.04     0.31
  B. WITHIN SUBJECTS
  Types of Explanations (TYPES)             2      2.85     0.06
  TYPES by UE Interaction                   2      2.78     0.07
  TYPES by EPS Interaction                  2      0.70     0.50
  TYPES by UE by EPS Interaction            2      0.77     0.47

While these models were generally similar to the main statistical model of Figure 6.3, they used different dependent measures and had only two levels of explanation provision strategy. The first of these models (see Model 2 of Figure 6.3A) had the amount of feedforward explanations used as the dependent variable and tested whether providing the feedforward explanations individually or together with feedback explanations (explanation provision strategy) led to different proportions of feedforward explanations being used. Thus, only the data from the 40 subjects who were provided with feedforward explanations was used for this model. The results of this model (see Table 6.7A) for explanation provision strategy indicate that there is no significant difference (F = 0.06, p = 0.80) in the amount of feedforward explanations used in the two conditions. Similarly, the second of these models (Model 3 in Figure 6.3A) was run using the data from the 40 subjects who were provided with feedback explanations and had the amount of feedback explanations used as the dependent variable. Its results for explanation provision strategy (see Table 6.7B) indicate that there is no significant difference (F = 0.50, p = 0.49) in the use of feedback explanations, whether provided individually or together with feedforward explanations.
These results suggest that there is no evidence of a tradeoff between feedforward and feedback explanations when they are provided together.

In summary, the analysis of explanation provision strategy in the three models indicates that 1) feedback explanations are used significantly more than feedforward explanations, and 2) the amounts of feedforward and feedback explanations used are constant, irrespective of whether they are provided individually or together.

6.3.2 Hypothesis 2: User Expertise

Contrary to the model of the determinants of the use of explanations postulated in Chapter 3, there is no significant difference in the total use of explanations by expert and novice subjects (F = 0.02, p = 0.878). The means and standard deviations are presented in the last row of Table 6.9, which summarizes the mean and standard deviation statistics for the 60 subjects who received explanations. Overall, novices used 16.2% of the explanations provided to them, while experts used 15.7%.

6.3.3 Hypothesis 3: Types of Explanations

There is a significant difference in the proportions of the How, Why, and Strategic explanations used (F = 8.80; p < 0.01). Figure 6.5 presents the overall proportions of use for the three kinds of explanations. It shows that 20.3% of the Why explanations provided, 17.9% of the How explanations provided, and 9.6% of the Strategic explanations provided were used.

TABLE 6.9: AGGREGATE STATISTICS OF THE USE OF EXPLANATIONS
(FOR ALL SUBJECTS WHO RECEIVED EXPLANATIONS: N = 60)

                                        Novices         Experts         Total
                                        Mean    S.D.    Mean    S.D.    Mean    S.D.
  Feedforward explanations used:
    Why                                 .103    .134    .058    .079    .080    .111
    How                                 .058    .062    .092    .139    .075    .108
    Strategic                           .046    .040    .040    .039    .043    .039
    Total                               .069    .065    .063    .072    .066    .068
  Feedback explanations used:
    Why                                 .394    .344    .296    .270    .345    .309
    How                                 .256    .227    .352    .315    .304    .275
    Strategic                           .194    .187    .144    .155    .169    .172
    Total                               .281    .202    .264    .190    .273    .194
  Total explanations used:
    Why                                 .233    .267    .172    .216    .203    .242
    How                                 .148    .169    .210    .240    .179    .208
    Strategic                           .104    .124    .087    .117    .096    .120
    Total                               .162    .161    .157    .153    .160    .156

Multiple contrasts were performed to ascertain if these usage proportions supported the results of past studies of the use of KBS explanations, which have found the Why explanation to be the most preferred type of explanation [Ye, 1990]. These contrasts reveal that there was no significant difference between the use of the Why and How explanations (F = 1.08, p = 0.39). However, the differences in usage between 1) the Why and Strategic explanations (F = 3.83, p < 0.01), and 2) the How and Strategic explanations (F = 3.70, p < 0.01), were significant.

[Figure 6.5: Use of the Three Types of Explanations -- bar chart of overall usage proportions for the Why, How, and Strategic explanations.]

6.3.4 Hypothesis 4: Interaction of User Expertise By Explanation Provision

It was hypothesized in Chapter 3 that novices would use a higher proportion of feedforward explanations as compared to feedback explanations. As well, it was hypothesized that experts would use more feedback explanations, as their more developed models of the domain would reduce their need for feedforward explanations. However, no evidence was found for this interaction effect (F = 0.16, p = 0.85). Experts and novices displayed similar behavior in relation to the use of feedback and feedforward explanations.
As displayed in Figure 6.6, when only feedforward or only feedback explanations were provided, both experts and novices used a greater proportion of the feedback explanations as compared to the feedforward explanations. When feedforward and

[Figure 6.6: Use of Explanations by Experts and Novices -- bar chart of mean usage proportions by explanation provision condition, with separate bars for novices and experts.]