UBC Theses and Dissertations
An experimental study of the use and effects of hypertext-based explanations in knowledge-based systems Mao, Jiye 1995


AN EXPERIMENTAL STUDY OF THE USE AND EFFECTS OF HYPERTEXT-BASED EXPLANATIONS IN KNOWLEDGE-BASED SYSTEMS

by

JIYE MAO

B.Eng., Renmin University of China, 1985
MBA, McGill University, 1989

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES (Business Administration - Management Information Systems)

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
June 1995
© Jiye Mao, 1995

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

ABSTRACT

Since MYCIN, explanation has become a fundamental feature of knowledge-based systems (KBS). Among the common deficiencies of KBS explanations, the most acute one is the lack of knowledge. This dissertation research investigates the use of explanations provided with hypertext for increasing the availability and accessibility of domain knowledge. The ultimate objective is to determine the behavioral and cognitive basis of the use of hypertext in providing KBS explanations. Two informationally equivalent KBS were comparatively studied in a laboratory setting: one used hypertext to provide explanations, while the other used conventional lineartext. The experiment involved 26 experienced professionals and 29 undergraduate and graduate students specializing in accounting.
Subjects used the experimental KBS to work on a realistic problem of financial analysis. Both the process and outcomes of explanation use were assessed. Outcome variables included improvement in decision accuracy, trust in the KBS, and perceived usefulness of explanations. In addition to questionnaires used to measure decision accuracy and perceptions, computer logs were used to capture the number, type, and context of explanation use. Thinking-aloud procedures were used to assess the nature of explanation use.

Results indicate that the use of hypertext for providing explanations significantly improved decision accuracy, and influenced users' preference for explanation types, and the number and context of explanation requests. Enhanced accessibility to deep explanations via the use of hypertext significantly increased the number of deep explanations requested by both novices and experts. Verbal protocol analysis shows that the lack of knowledge and means of accessing deep explanations could make it difficult to understand KBS recommendations, and that deep explanations could improve the understandability of KBS advice, especially in cases where unfamiliar domain concepts were involved. In the hypertext group, about 37% of the deep explanations were requested in the context of judgment making, rather than in the abstract. While only about 28% of the deep explanations requested by the lineartext group were the How type, 42% were the How type for the hypertext group. Experts and novices had different preferences for explanation types. Experts requested a much higher percentage of How, and lower percentages of Why and Strategic explanations, than novices. Verbal protocol analysis illustrates that experts and novices used explanations for different purposes.

TABLE OF CONTENTS

Abstract
Table of Contents
List of Tables
List of Figures

Chapter 1. Introduction
  1.1 Research Questions
  1.2 Importance of the Research
  1.3 Organization of This Dissertation

Chapter 2. A Review of Previous Research
  2.1 Explanations in KBS - An Artificial Intelligence Perspective
  2.2 Use of Hypertext to Provide Explanations
  2.3 Empirical Studies Related to Explanation Use

Chapter 3. Theoretical Perspectives
  3.1 Theories of Discourse Comprehension
    Dependent Variables
    Independent Variables
    Intervening Variables
  3.2 Other Theories
    Alleviation of the Production Paradox
    Contextualized Learning
    Meaningful Learning
    Factors Affecting the Use of Knowledge in Decision Making
    "Explanation Effect" and Reasoning-Trace Explanations

Chapter 4. Research Model and Hypotheses
  4.1 Research Model
  4.2 Research Hypotheses

Chapter 5. Research Method
  5.1 Research Method
  5.2 Task Domain and Experimental Materials
  5.3 Independent Variables
    Explanation Provision Method and Experimental Systems
    Level of Expertise
  5.4 Dependent Variables
    Decision Accuracy
    Perceived Usefulness of Explanations
    Trust in KBS
    Use of Explanations
  5.5 Experiment Design
  5.6 Subjects
  5.7 Experimental Procedures
    Pre-Experimental Phase
    Experimental Phase
    Post-Experimental Phase

Chapter 6. Data Analysis (Part I)
  6.1 Data Screening Prior to Analysis
  6.2 Analysis of Explanation Use
    Determinants of Explanation Use
    Effects of Hypertext on the Context of Deep Explanation Use
  6.3 Effects of Explanation Use: A Structural Equation Model
    The Use of PLS
    Formative Versus Reflective Constructs in the PLS Model
    Measurement Model
    Structural Model
    Discussion on the PLS Modelling
  6.4 Supplementary Analyses: User Preference for Explanation Types
    User Preference for Deep Explanation Types
    User Preference for Reasoning-Trace Explanation Types
    Discussion on User Preference for Explanation Types
  6.5 Conclusions

Chapter 7. Data Analysis (Part II): Verbal Protocol Analysis
  7.1 The Use of Verbal Protocol Analysis
  7.2 Verbal Protocol Analysis Used in This Study
  7.3 Method and Procedure to Verbalize
  7.4 Data Analysis Methods
  7.5 Classifications of the Use of Deep Explanations
  7.6 Classifications of the Use of Reasoning-Trace Explanations
  7.7 Pattern of Explanation Use
    Use of Deep Explanations
    Use of Reasoning-Trace Explanations
    Reliability of Coding and Analysis
  7.8 Linkage to the Theoretical Foundations
  7.9 Two Cases of "Difficult" Recommendations
    Analysis of Verbal Reports Related to C5.4
    Analysis of Verbal Reports Related to C10.2
  7.10 Conclusions

Chapter 8. Conclusions and Discussion
  8.1 Research Contributions and Implications
    Major Research Findings
    Theoretical Contributions
    Research Methods
  8.2 Limitations of This Study
  8.3 Directions for Future Research

Bibliography

List of Appendices
  A. Experimental Materials
  B. Step-by-Step Experimental Procedures and Verbal Instructions
  C. An Illustration of Hyper-FINALYZER
  D. Materials for Recruiting Subjects
  E. Data Screening Prior to Analysis
  F. Validity and Reliability of Perception Measures
  G. Statistical Power Analysis
  H. Materials Related to Verbal Protocol Analysis
  I. Nature of Explanation Use (Refined Categorization)

LIST OF TABLES

  2.1 Deep Knowledge in KBS
  4.1 Overview of the Operationalization of Explanation Use
  5.1 Definitions of Explanations Provided by FINALYZER and Hyper-FINALYZER
  5.2 Items Used to Measure Perceived Usefulness of Explanations
  5.3 Items Used to Measure User Trust in the KBS
  5.4 Classification of Explanation Use
  5.5 The Experimental Design
  6.1 Average Scores of Improvement in Decision Accuracy
  6.2 Statistics for the Number of Deep Explanation Use
  6.3 Statistics for the Number of Reasoning-Trace Explanation Use
  6.4 Difference Between Experts and Novices in Means and Context of DE Use
  6.5 Difference Between Experts and Novices in Using Hypertext for Requesting DE
  6.6 Effects of Expertise on the Means and Context of Deep Explanation Use
  6.7 Effects of Expertise on Different Use of Hypertext
  6.8 Means, Standard Deviations, and Internal Consistencies (Reliability)
  6.9 Correlation of Constructs (Hypothesized Model)
  6.10 Summary of Findings From PLS Modelling
  6.11 Determinants of Explanation Use: A Comparison Between ANOVA and PLS
  6.12 Effects of Expertise on Preference for Deep Explanation Types
  6.13 Effects of Explanation Methods on Preference for DE Types
  6.14 Effects of the Means of Explanation Request on Preference for DE Types
  6.15 Effects of DE Methods on Preference for DE Types (Expertise Controlled)
  6.16 Main Effects of Expertise and DE Methods on User Preference for DE Types
  6.17 Interaction Effects of Expertise and DE Methods on Preference for DE Types
  6.18 Effects of Expertise on Preference for RTE Types
  6.19 Effects of DE Provision Methods on Preference for RTE Types
  6.20 Effects of DE Methods on Preference for RTE Types (Expertise Controlled)
  6.21 Main Effects of Expertise and DE Methods on Preference for RTE Types
  6.22 Interaction Effects of Expertise and DE Methods on Preference for RTE Types
  6.23 Summary of Findings
  7.1 Effects of Expertise on DE Use
  7.2 Effects of Explanation Provision Methods on DE Use
  7.3 Effects of Expertise on RTE Use
  7.4 Effects of Explanation Provision Methods on RTE Use
  7.5 Differences in the Use of RTE-5.4 Between Various Treatment Groups
  7.6 The Relationship Between Agreement Rating and the Use of RTE-5.4
  7.7 Difference in the Use of DE-10.5 Between Various Treatment Groups
  7.8 The Relationship Between Agreement Rating and the Use of DE-10.5
  7.9 Effects of Thinking-Aloud on Performance
  7.10 Summary of Rationale for Explanation Use

LIST OF FIGURES

  2.1 Use of Hypertext to Provide Deep Explanations
  3.1 A Framework for Studying Learning from Text
  3.2 Formal Structure of a Textbase
  4.1 The Research Model
  6.1 Process and Outcomes of Explanation Use (Results From PLS Modelling)
  7.1 Explanation Use and Information Processing Variables

CHAPTER 1. INTRODUCTION

Two decades after their arrival onto the computing scene in the 1970s, knowledge-based systems (KBS) have become a fixture in industry. Many KBS are now in routine use across a wide spectrum of business and professional domains (cf., Feigenbaum, McCorduck, & Nii, 1988). They are increasingly assigned to principal roles such as assistants to human decision makers, autonomous decision-making components of complex systems, and generators, critics, and evaluators of information structures including designs, plans, and schedules (Hayes-Roth & Jacobstein, 1994).

In all of the above applications, few KBS are designed to completely replace human decision makers; most are built to assist or advise human decision makers with different technical backgrounds and levels of problem-solving ability. Because KBS must work cooperatively with human decision makers, it is essential that their function be transparent and understandable.
In general, the more power a system has, the greater its need to effectively communicate the intent of its actions, so that users can have an appropriate expectation of its competence and responsibility, and therefore interact with the system more effectively (Muir, 1987). When unexpected recommendations are given, users of KBS want to understand why and how. At the root of every human being's understanding is the ability to seek and create explanations (Schank, 1982).

Since MYCIN (Shortliffe, 1976), explanation has become a fundamental feature of KBS. It has been argued that explanations provided by KBS are essential for user acceptance because they can make the performance of KBS natural and transparent (Davis, Buchanan, & Shortliffe, 1977). Human decision makers must live with the consequences and risks associated with their decisions. They are unlikely to accept decisions based on reasoning that they are unaware of or do not understand. To influence human judgment and decision making, a KBS should present both explanations of its knowledge of the task domain and explanations of its reasoning processes employed to solve problems and make recommendations. Explanations really should be considered an indispensable part of the solution itself, in addition to being the principal mechanism for securing users' acceptance and trust.

However, generating effective explanations is not an easy task. Over the past two decades, a substantial amount of research has been conducted to explore alternative ways of explaining the actions and conclusions of KBS. Despite the significant advances that have been made, explanations generated by KBS still suffer from several flaws, e.g., they ignore user needs, lack interactivity, and are difficult to understand (cf., Gilbert, 1989; Maybury, 1992; Wick & Thompson, 1992).
The need for superior explanation capabilities may be one reason that the integration of KBS into practical settings has lagged behind the advice-giving capabilities of these systems. Among the common deficiencies of KBS explanations, the most acute one is the lack of knowledge. Often, the knowledge most needed for explanation is not stored in KBS. For example, in rule-based KBS, which are the most widely applied and best understood, problem-solving knowledge is represented as a set of rules in a compiled form, with all of the underlying justification removed. Moreover, production rules are inadequate for defining terms, and for describing domain objects and static relationships among objects (Fikes & Kehler, 1985). Therefore, the knowledge base is shallow and flat, and the KBS cannot justify its actions by referring to the knowledge used in the reasoning.

While many approaches have been tried to incorporate the underlying knowledge into explanations, there has been little impact on real-world applications, beyond a number of systems implemented in research laboratories. More practical solutions are yet to be identified and evaluated. It is also important to develop new modes of explanation.

The emergence of hypertext as a new paradigm for the design of information systems provides new opportunities to meet the challenge of generating quality explanations. A number of researchers have advocated and explored the use of hypertext for this purpose. It appears that hypertext offers a natural way of integrating explanatory knowledge into KBS. For example, if domain knowledge is represented with hypertext, in the form of "textbook-type" definitional and descriptive information, it can be conveniently offered to users when the knowledge behind the system is questioned.

This dissertation research investigates the benefits that can be expected from the use of hypertext to provide KBS explanations, and whether those expectations are indeed warranted.
It attempts to determine the behavioral and cognitive basis of the use of hypertext to increase the availability and accessibility of domain knowledge in KBS explanations, by examining the theoretical foundation and by collecting empirical evidence.

1.1 Research Questions

Explanations provided by KBS can be used by knowledge engineers, domain experts, and decision makers, for a variety of purposes such as debugging, verification, justification, and learning and training. This research is concerned with the use of explanations by individual decision makers using KBS to solve managerial problems. KBS are considered to be decision aids to support and improve decision-making processes and outcomes. Thus, explanation use is primarily for justification and for improving decision makers' understanding of KBS behaviour.

The principal research question is: What are the effects of the use of hypertext to provide explanations on the process and outcome of explanation use by KBS users?

Since the outcome and process of decision making usually depend on prior knowledge and experience, in order to generalize the principal research question to a wider spectrum of KBS users, a closely related secondary research question is: What are the effects of users' task domain knowledge on the process and outcome of explanation use?

1.2 Importance of the Research

This dissertation research investigates the effects of hypertext-based explanations, in response to the requirements and problems of existing KBS explanations, and the need for a new mode of explanation.
The research findings will be of practical value to the designers of KBS explanations, for the following reasons: (1) It is not a trivial task to implement hypertext-based explanations (Mao, Benbasat, & Dhaliwal, 1995); thus, the use of hypertext to provide KBS explanations can only be justified if there are significant benefits. (2) No prior research has systematically investigated the benefits of hypertext-based explanations. (3) In this research, the use of hypertext-based explanations is studied in the context of realistic decision making.

It has been argued that the technological capability to build a new system is only one condition for potential "real success" (Carroll & McKendree, 1987). Mere technological feasibility must be augmented by empirical studies of whether and how people will find a new technology useful and tractable. Carroll and McKendree also suggest that such behavioral assessments can be made even before the new technology is completely developed. They criticize the fact that (1) too little behavioral research has been done to date in the design of advice-giving systems; (2) the lack of behavioral assessment of usefulness and usability is increasingly being identified as a key reason for the limited impact that expert systems technology has had on real computing; and (3) the area has not yet incorporated a serious psychological theory base or empirical methodology.

This research addresses the serious problems criticized by Carroll and McKendree. While hypertext is being integrated into more and more information systems, the evaluation of its behavioral implications is usually not guided by psychological theories. The lack of a psychological theory base may result in a mismatch among the fundamental features of the technology, task, and measurement. In this research, hypertext is considered an operationalization of contextualized access to problem-solving knowledge, rather than merely a novel technology.
This research attempts to establish a theoretical foundation for the provision, utilization, and evaluation of hypertext-based KBS explanations. Such a foundation is important for understanding and analyzing the effects of hypertext on explanation use, and is useful for future research on KBS design. It is important to conceptualize the benefits of hypertext in a theoretical framework, and to specify and measure those benefits.

Empirical evidence collected in this research also adds to the cumulative knowledge of alternative explanation provision technologies. The results will be useful for KBS designers in terms of what problems hypertext can help solve, what factors are important for the usefulness and usability of KBS explanations, and what effects can be expected from the use of hypertext to provide explanations. The behavioral impact of hypertext is assessed in the context of explanation use, as well as in the broader framework of managerial judgment and decision making.

Research in the area of decision making has demonstrated that people often make significant and consistent errors in many decision situations. It has been shown that decision knowledge is often neglected (Powell, 1991). Increasing the use of decision knowledge is one strategy to help the decision maker make better decisions. When a KBS is being used to support decision making, it may be helpful to make the domain knowledge available upon request, in addition to advice. It is important to find out whether hypertext can be used to make the decision knowledge in KBS more transparent, and conveniently available in the context of decision making.

Hypertext is an emerging paradigm for information representation and user-interface design. Mode of presentation of information has long been considered an important variable in information systems (IS) research (Mason & Mitroff, 1973).
Over the years, a majority of studies that investigate the role of IS in management decision making have used presentation format (tabular, graphic, etc.), presentation medium (hard copy, CRT, etc.), and report features such as colour (e.g., Benbasat & Dexter, 1985) to represent IS variables. The comparative evaluation of the use of hypertext versus conventional lineartext can be considered a study of newly available information representation modes. The results of this study would be useful for assessing the effectiveness of hypertext as a generic mode of information representation and user-interface design.

In summary, this research is motivated by the following factors: (1) Existing methods for providing explanations are unsatisfactory. (2) The use of hypertext may have the potential of enhancing the usefulness and usability of KBS explanations by increasing the availability and accessibility of the knowledge needed for effective explanation, thus making the problem-solving process more understandable to users. (3) There exists no systematic empirical evaluation of hypertext as a method for providing explanations.

1.3 Organization of This Dissertation

This dissertation is organized as follows. Chapter 2 reviews prior research on explanations, the potential of hypertext-based explanations, and recent empirical studies of explanation use. Chapter 3 presents the theoretical foundations for the provision and use of hypertext-based explanations; cognitive theories of discourse comprehension, and other theories, are reviewed and shown to be useful for this research. Chapter 4 develops the research model, and presents hypotheses derived from it. Chapter 5 describes details of the research method, research design, experimental task and systems, and experimental procedures.
The research findings are presented in two chapters: Chapter 6 reports results of data analysis on explanation use and the effects of explanation use, and Chapter 7 describes results of the verbal protocol analysis. Lastly, Chapter 8 concludes the dissertation by discussing the major contributions, implications, and limitations of this research; it also comments on directions for future research.

Endnotes. The following three definitions are important for specifying the subjects studied in this research, and are used frequently throughout this dissertation. A brief explanation is provided for each of them:

Knowledge-based systems (KBS), also known as expert systems, are artificial intelligence programs that achieve expert-level competence in solving problems by utilizing a body of knowledge of the task domain (Feigenbaum, McCorduck, & Nii, 1988). Within a narrowly defined task domain, a knowledge-based system may perform at a level equivalent to or even higher than that of a human expert.

Explanations provided by a KBS are machine-generated descriptions of the KBS - what it does, how it works, and why its actions are appropriate (Swartout, 1987). Explanations are considered one of the fundamental characteristics of KBS; they are expected to increase users' understanding of the reasoning process and users' confidence in the solutions provided by KBS, so that users can take actions accordingly. Furthermore, explanations may play an important role in influencing users' acceptance of KBS, and ultimately have an impact on individual and organizational effectiveness and efficiency.

Hypertext is a new paradigm for information management. It can be considered an information representation scheme - a kind of semantic network that mixes informal textual material with more formal and mechanized operations and processes.
At the same time, hypertext is an interface modality, which provides a novel way of directly accessing data, featuring link buttons (icons) that can be flexibly embedded within the content material by the user (Conklin, 1987). The basic constructs of hypertext are nodes and links. A hypertext node usually corresponds to a single concept or idea, which can be referenced by other nodes through machine-supported links. The designer can link discrete elements in a system and set up a cross-referenced system through which users can navigate in any direction. Hypertext allows users to access non-linear information documentation according to interest, level of understanding, differing viewpoints, and other individually determined factors.

CHAPTER 2. A REVIEW OF PREVIOUS RESEARCH

This chapter reviews previous research conducted in the three major areas upon which this study is built. It begins with research in explanations by computer scientists from an artificial intelligence perspective. Next, it examines the complementary relationship between hypertext and explanation facilities. Lastly, empirical studies of explanation use are reviewed and summarized.

2.1 Explanations in KBS - An Artificial Intelligence Perspective

The functions that KBS explanations can or should perform may be described based on the type of expected user and task to be supported, according to the following categories (Swartout, 1983; Wick & Thompson, 1992): (1) Explanations can describe the reasoning process of a KBS so that a designer can verify its knowledge, and prove that the reasoning and conclusions are sound, especially when an unexpected result is produced. (2) Explanations must justify or convince users that the conclusion and suggestion provided by KBS are appropriate and relevant, since KBS are designed to influence the thinking and behaviour of users.
(3) Explanations can be used as a training aid to facilitate learning by novice users, by describing the system's domain knowledge and inference techniques. The objective is to teach or train users, i.e., to transfer the knowledge in KBS to novice users. (4) Most KBS have some limited capability to provide explanations to clarify why users should submit certain information to the system; however, few of them can explain the terminology used. Finally, (5) explanations can help a knowledge engineer debug KBS during development, since explanations often make apparent errors that are easily overlooked in more formal program code.

Typically, explanations may be generated either based on the problem-solving process (i.e., the reasoning trace), or from explicitly represented domain knowledge (often called deep knowledge) in the KBS. Thus, explanations generated from these two different sources can be called reasoning-trace explanations and deep explanations (Southwick, 1988), respectively. For instance, the Why and How explanations in MYCIN (Shortliffe, 1976) are provided in terms of the goals and premises of production rules. A Why question ("Why is this information useful?") would be translated internally as "In which rule is this information going to be used, and what goal does the rule conclude about?" The explanation facility would trace the reasoning chain upward, and present all the rules (Clancey, 1983). Similarly, a How question ("How is the goal reached?") is answered by descending the goal chain to explain how lower sub-goals were or could be achieved. However, the reasoning trace alone is not sufficient for making explanations useful and comprehensible. Explicitly represented deep knowledge must also be included (Clancey, 1983; Swartout, 1983). For instance, Clancey's GUIDON (1983) and Swartout's XPLAIN (1983) involve explicitly represented domain knowledge.
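The Why/How tracing described above can be sketched compactly. The rules and wording below are hypothetical (a toy financial-analysis flavour, not MYCIN's or any cited system's actual rules or code); the point is only the mechanism: a Why request walks the rule chain upward from a piece of information, while a How request descends from a goal to its supporting sub-goals.

```python
# A minimal sketch of reasoning-trace ("Why"/"How") explanations over a
# rule chain. Rules and text are illustrative assumptions only.

class Rule:
    def __init__(self, name, premises, goal):
        self.name = name          # rule identifier
        self.premises = premises  # facts the rule needs
        self.goal = goal          # fact the rule concludes

RULES = [
    Rule("R1", ["current ratio < 1.0"], "liquidity is weak"),
    Rule("R2", ["liquidity is weak"], "short-term solvency is at risk"),
]

def explain_why(fact):
    """Why is this information useful? Trace the chain upward: find the
    rule that consumes the fact, then the rule that consumes that rule's
    goal, and so on."""
    chain, current = [], fact
    while True:
        rule = next((r for r in RULES if current in r.premises), None)
        if rule is None:
            break
        chain.append(f"{current!r} is used by {rule.name} to conclude {rule.goal!r}")
        current = rule.goal
    return chain

def explain_how(goal):
    """How was this goal reached? Descend the goal chain to the rules
    (or raw inputs) that support it."""
    rule = next((r for r in RULES if r.goal == goal), None)
    if rule is None:
        return [f"{goal!r} was supplied as input"]
    lines = [f"{goal!r} concluded by {rule.name} from {rule.premises}"]
    for premise in rule.premises:
        lines.extend(explain_how(premise))
    return lines
```

For example, `explain_why("current ratio < 1.0")` yields the two-step chain through R1 and R2, while `explain_how("short-term solvency is at risk")` descends until it reaches a supplied input.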
The notion of deep knowledge is analogous to the meaning of "model" in the literature of model-based reasoning (e.g., Fikes & Kehler, 1985; Sticklen & Bond, 1991). For the purposes of this research, deep knowledge in KBS includes the three types of knowledge identified by Swartout and Smoliar (1987) (Table 2.1).

Table 2.1 Deep Knowledge in KBS

  Terminological Knowledge: knowledge of the concepts and relationships of a domain that domain experts use to communicate with each other. In order for one to understand a domain, one must understand the terms used to describe the domain.

  Domain Descriptive Knowledge: "textbook rudiments" which are required before one can solve problems. It provides abstract factual knowledge about a domain, typically represented declaratively.

  Problem-Solving Knowledge: knowledge about how tasks can be accomplished. It can be represented as plans and methods which consist of a sequence of steps to accomplish a goal.

In addition to the Why-How explanation paradigm, other types of explanation have been incorporated in various KBS. For example, the Strategic type of explanation provides insight into meta-knowledge, especially the control objective and overall problem-solving strategies used by a system (Hasling, Clancey, & Rennels, 1984). The What type of explanation describes domain concepts or decision variables used by a system (Rubinoff, 1985), in response to questions like "What do you mean by [object or variable name]?"

Common deficiencies of explanations are the lack of knowledge, irrelevancy, and low interactivity, which make explanations difficult to understand (Gilbert, 1989; Maybury, 1992; Southwick, 1989; Wick & Thompson, 1992). The most frequently cited problem is the lack of knowledge. Because the "deep model" of the domain is usually implicitly represented in KBS, a shallow knowledge base only gives shallow explanations (Southwick, 1991).
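Read operationally, the three knowledge types of Table 2.1 suggest that a deep explanation is a lookup into explicitly stored knowledge, rather than something reconstructed from the reasoning trace. A minimal sketch, with hypothetical content and function names (not taken from any system cited here):

```python
# Hypothetical sketch: deep knowledge of the three types in Table 2.1
# stored alongside the rule base, so deep explanations are available
# on request. Content is illustrative only.

DEEP_KNOWLEDGE = {
    "quick ratio": {
        "terminological": "Quick ratio = (current assets - inventory) / current liabilities.",
        "descriptive": "A stricter liquidity measure than the current ratio, "
                       "since inventory may be hard to liquidate.",
        "problem-solving": "Compare the ratio against an industry benchmark "
                           "before judging liquidity.",
    },
}

def deep_explain(concept, kind):
    """Return the requested type of deep explanation for a domain concept,
    or note that the knowledge base is shallow at this point."""
    entry = DEEP_KNOWLEDGE.get(concept, {})
    return entry.get(kind, f"No {kind} knowledge stored for {concept!r}.")
```

The fallback message in `deep_explain` illustrates the deficiency discussed here: wherever the store is empty, the system can do no better than its shallow rule trace.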
The consensus in the KBS field is that explanations need to go beyond the reasoning trace to provide backing or first principles for KBS actions. A number of researchers have tried to tackle the problem with various approaches (e.g., Abu-Hakima & Oppacher, 1990; Chandrasekaran & Mittal, 1983; Hasling, Clancey, & Rennels, 1984; Wick & Thompson, 1992).

The second major deficiency is irrelevancy. Usually, explanation facilities do not take users' needs and differences into account, and provide the same answer to all. Users do not have the same background; nor do they need the same level of detail. Those with different levels of experience may want to see different aspects or various levels of detail. The understanding of what a user already knows is often utilized by human explainers to aid inquirers in gaining new understanding. User profile information is sometimes used to develop user-specific explanation plans (e.g., Hasling, Clancey, & Rennels, 1984).

The third major problem with existing explanations is the low interactivity between users and KBS. Interactivity is important in the communication between a human expert and a client for a number of reasons: (1) The explainer's assumptions about the inquirer's understanding may at times be invalid; an interactive dialogue is required to correct such mistakes. (2) Interactivity is also important whenever the object of explanation is either multifaceted or ambiguous (Dollahite, 1991). (3) If the explainer and inquirer have different expectations as to the level of detail required, the explanation will be either too detailed or lacking in detail; interactivity allows the inquirer to interrupt the explainer, thereby permitting communication at the appropriate level of detail. And, (4) even when the level of detail is properly diagnosed, the terminology used at that level may result in a communication breakdown.
In such circumstances, the inquirer should be allowed to ask for clarification and definitions of terms before the explanation continues. Although impressive progress has been made since MYCIN, most of the current approaches to improving explanations are too complicated and domain-specific to be put into common use. The next section examines the complementary relationship between the fundamental features of hypertext and the requirements for providing explanations, and discusses the potential of hypertext to overcome, to various degrees, the above deficiencies of KBS explanations.

2.2 Use of Hypertext to Provide Explanations

The application of hypertext to the provision of KBS explanations has received considerable attention in recent years. For example, Rada and Barlow (1988) noted the complementary relationship between hypertext and KBS. Moore and Swartout (1990) proposed a hypertext-like interface that would allow users to point to a portion of an explanation for further clarification. The purpose was to provide explanation in a dialogue style while avoiding the difficult referential problems of natural language analysis. Basically, users would be able to point to the text that needs explanation, and then be presented with a menu of potential questions about the highlighted text. Moore and Swartout did not adopt a hypertext approach per se, emphasizing the need for dynamically generated links as opposed to predefined ones. The major limitation of their proposal was its exclusive focus on a hypertext-style interface, disregarding the virtue of hypertext as a knowledge representation scheme. Similarly, Kane (1990) described the development of a hypertext interface to a medical KBS. The user of the system could choose to read additional explanatory information through the interface during the consultation.
Abu-Hakima and Oppacher (1992) described a framework, RATIONALE, for building knowledge-based diagnostic systems that could explain their reasoning explicitly. They implemented RATIONALE with a graphical hypermedia user interface, mainly for its ease of use. Furthermore, system development tools such as KnowledgePro (Knowledge Garden, 1991) have incorporated hypertext capabilities. The functional match between hypertext and explanation provision in KBS can be argued from two related perspectives: hypertext as a knowledge representation scheme, and hypertext as a user interface modality that allows direct access to data through link icons (Conklin, 1987). As a knowledge representation scheme, hypertext is appropriate for knowledge storage and retrieval by virtue of its linking capabilities. Essentially, knowledge represented with hypertext preserves the rich associations among domain concepts, and corresponds directly to the expert's knowledge schemata (Jonassen, 1990). It has been claimed that hypertext is ideally suited for learning or knowledge acquisition in complex domains (Spiro, Coulson, Feltovich, & Anderson, 1988). Hypertext-based knowledge representation highlights the interdependence among domain concepts in complex domains, and KBS are designed to work in precisely such domains. Therefore, the domain knowledge necessary for explanation provision may be ideally represented and communicated with hypertext. As a user interface modality, hypertext can make domain knowledge more accessible to users from multiple perspectives, allowing explorations at varying levels of difficulty, amounts of detail, and granularity. Stanton and Stammers (1990) claim that hypertext: (1) allows for different levels of prior knowledge, (2) encourages exploration, (3) enables users to see sub-tasks as part of the whole task, and (4) allows users to adapt material to their own learning styles. Thus, users can engage in a personalized interaction with hypertext-based application systems.
Furthermore, explanations provided with hypertext allow timely access to relevant information in the problem domain. Domain concepts and principles can be mapped onto specific instances through hypertext. Whereas user profiles have previously been used to deal with the irrelevancy problem of explanations, a hypertext-based solution offers more choices to users, allowing them to control what to see and when to see it. Hypertext may also expedite the two-way communication between KBS and users, i.e., increase interactivity. Since explanations really involve a dialogue (Moore & Swartout, 1989), the user interface of a KBS should allow users to ask follow-up questions in the context of the ongoing dialogue. However, building such an interface poses a serious challenge for natural language understanding (Moore & Swartout, 1990). Therefore, hypertext, and its extension hypermedia, have become a promising alternative for the design of the intelligent interfaces needed for explanation provision, and are more economical than natural language interfaces (Chignell & Hancock, 1988). The central idea of using hypertext to provide explanations in KBS is to make deep knowledge more accessible in a contextualized manner, and to highlight the structure of domain knowledge. Users are thus provided with a rich context in which to understand and interpret KBS advice and reasoning. In principle, there are two ways in which the provision of explanations can take advantage of the flexibility of hypertext in linking information (Mao et al., 1995). First, deep knowledge can be linked to KBS output with hypertext links. Domain constructs (i.e., concepts and procedures, e.g., "working capital") appearing in KBS output can be highlighted with link markers indicating that deep explanations are available, thereby allowing a user to switch to a "free exploration" mode to pursue domain knowledge in a natural and seamless way.
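This first linking mechanism can be sketched in miniature. The following Python fragment (all names and knowledge content are hypothetical, purely for illustration) marks domain constructs in KBS output with link markers and lets a reader follow a marker into the deep-knowledge node behind it, which may itself link onward to related constructs (the "free exploration" mode):

```python
# Deep-knowledge nodes keyed by domain construct; each node may link
# onward to related constructs. Content is invented for illustration.
KNOWLEDGE_BASE = {
    "working capital": {
        "text": "Working capital is current assets minus current liabilities.",
        "links": ["current ratio"],
    },
    "current ratio": {
        "text": "The current ratio is current assets divided by current liabilities.",
        "links": ["working capital"],
    },
}

def mark_links(output: str) -> str:
    """Highlight constructs that have deep explanations with [brackets]."""
    for construct in KNOWLEDGE_BASE:
        output = output.replace(construct, f"[{construct}]")
    return output

def follow_link(construct: str) -> dict:
    """Retrieve the deep-knowledge node behind a link marker."""
    return KNOWLEDGE_BASE[construct]

advice = "The firm's working capital is adequate for short-term obligations."
print(mark_links(advice))
# → The firm's [working capital] is adequate for short-term obligations.
```

Following a link, and then the links of the retrieved node, corresponds to the user leaving the advice text to explore the domain knowledge and returning at will.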
In other words, hypertext is the means for bridging the gap between the surface structure and deep knowledge. Second, the various domain concepts and procedures involved in problem-solving can be linked to each other with hypertext links to reflect the rich associations and complexity of the task domain. Concepts and procedures that are used in the same sub-analysis by the KBS can be associated through links to reflect the domain structure. In general, hypertext links can be used to capture the hierarchical structure among the task, sub-tasks, and domain constructs, giving the user an overall picture of the strategic relationships among domain constructs.

[Figure 2.1 Use of Hypertext to Provide Deep Explanations: hypertext-based deep explanations supplementing reasoning-trace explanations and advice, with unshaded arrows indicating the contributions of hypertext]

Figure 2.1 illustrates conceptually the unique contributions of hypertext in making deep knowledge more accessible. The two unshaded arrows are enabled by hypertext, meaning that deep knowledge can optionally be used to supplement reasoning-trace explanations and KBS advice.

2.3 Empirical Studies Related to Explanation Use

Teach and Shortliffe (1981) conducted one of the earliest studies of user attitudes towards various KBS capabilities. Attitudes of teaching and practising physicians were measured using a questionnaire along three dimensions: (1) the acceptability of different medical computing applications, (2) expectations about the effect of computer-based consultation systems on medicine, and (3) demands regarding 15 performance capabilities of consultation systems. It was found that the highest demand was a program's ability to give explanations for its diagnoses, and the third highest was the ability to demonstrate an understanding of the particular medical domain. In contrast, the ability to diagnose perfectly was given a much lower ranking, 14th out of the 15 capabilities surveyed.
In recent years, there have been more empirical studies of the use of KBS explanations from the perspective of human-computer interaction. Lamberti and Wallace (1990) studied the impact of expert systems on user performance in a field experiment. Independent variables included users' domain expertise, explanation (knowledge) presentation format, and task characteristics (uncertainty, or degree of routinization). Explanations were presented in two ways: procedural presentation, which utilized IF...THEN premise-action pairs to describe the line of reasoning; and declarative presentation, which contained knowledge as a static collection of facts about objects, events, and situations. Dependent variables included problem-solving time and question-answering time, performance accuracy, confidence (rating) in the accuracy of the expert system's recommendations and lines of reasoning, and user satisfaction with the system. Performance accuracy was defined in terms of the number of errors made by a user when answering system questions. Two findings were particularly interesting: (1) When presented with declaratively formatted explanations, high-skill users performed better than low-skill users in response time and accuracy for tasks involving a higher degree of uncertainty, whereas low-skill users performed better than high-skill users for tasks involving a lower degree of uncertainty. (2) The use of the expert system had a greater impact on the performance of lower-skill employees, in terms of decreased decision time and increased accuracy. A number of doctoral dissertations involving experimental studies of explanation use have been completed over the past few years (e.g., Dhaliwal, 1993; Hsu, 1993; Moffitt, 1989; Ye, 1990). Considerable interest has been placed on users' requirements, by investigating the effects of providing explanations consistent with user mental models (Ye, 1990) and cognitive styles (Hsu, 1993).
Both of the studies by Ye and Hsu used the rule-trace/justification/control (strategic) categorization of explanations (Chandrasekaran, Tanner, & Josephson, 1989). Rule-trace explanations presented the reasoning trace leading to a conclusion, in terms of rules and goals. Justification explanations revealed the rationale behind the system's actions (deep knowledge). Control (strategic) explanations described the system's problem-solving strategies, in terms of high-level goals. Ye (1990) investigated the influence of task type (data abstraction versus heuristic match) and users' domain expertise on preferences for various types of explanation. A computer program was used to simulate a KBS user interface. Subjects were asked to use and evaluate the system for making a purchasing recommendation for their company. In Phase 1, the three types of explanation were displayed to subjects after conclusions. Subjects then rated the usefulness of each type of explanation and ranked them on a preference scale. In Phase 2, subjects were allowed to request any of the three types of explanation. Both experts and novices requested justification explanations more than the other two types; nevertheless, novices requested more explanations of each type than experts did. Explanations enhanced user understanding of, and confidence in, the system's conclusions and reasoning processes. Hsu (1993) investigated the effects of cognitive styles and interface designs on the use of expert systems, with an emphasis on knowledge transfer from KBS to users. A financial analysis expert system and related tasks were used in the experiment. It was conducted over a four-week period in which subjects completed four practice and testing sessions, and learning during this period was measured. Subjects' procedural knowledge acquired during the practice phase of the experiment was assessed as the independent variable.
Hsu found that the use of justification explanations resulted in a greater amount of knowledge transfer than the use of rule-trace explanations alone. One of the major conclusions was that the explanation presentation format (the availability of various types of explanation) was important for knowledge transfer. Throughout the practice and experiment process, subjects used the system with the expectation that they would be tested afterwards. As a result of this explicit focus on knowledge transfer (involving multiple tests of learning), the research findings may not be generalizable to the use of KBS explanations in work situations. Moffitt (1989) also studied the learning of domain knowledge in expert systems through explanations. Subjects received rule-trace explanations, canned-text explanations, context-specific embedded explanations (i.e., when the KBS requests a piece of information or provides a recommendation, a brief explanation is displayed automatically), or no explanations at all. Subjects then ranked the KBS in terms of learning effects, usefulness as a decision aid, and usefulness as a learning device. It was found that context-specific embedded explanations provided a higher degree of learning, and more useful information, than the other forms of explanation. In fact, only the context-specific embedded explanation group achieved a significantly higher degree of learning than the control group. Since this type of explanation can be considered a form of justification, it appears that justification explanations were preferred over rule-trace explanations. It is reasonable to speculate that the number of explanations used might also be related to the amount of learning, because users in the context-specific embedded group might have used more explanations. However, this is unclear because the number of explanations actually used was not measured.
Dhaliwal (1993) established the importance of making deep knowledge available to users, in addition to justifications and the trace of reasoning, based on theories of cognitive feedforward and feedback (e.g., Bjorkman, 1972). Explanations were classified into two categories: (1) deep knowledge about all input to the problem-solving process (input-related) was provided as feedforward explanations; (2) explanations of recommendations (outcome-related) were termed feedback explanations. Both of these general categories were further broken down into Why/How/Strategic explanations. The use of the various types of explanation, and user preferences for feedforward versus feedback explanations, were assessed in a lab experiment. Dhaliwal found that overall 16% of the explanations provided by the KBS were requested by users. In terms of explanation types, the Why and How types were used more than the Strategic type. Compared to Strategic explanations, novices used more Why explanations, while experts used more How explanations. This difference was interpreted as the result of different learning objectives between experts and novices. Novices used the Why explanations to gain declarative knowledge of the implications of various input cues and KBS conclusions, while experts used the procedural How explanations to probe and understand how the KBS performed its analysis, especially in comparison to their own analyses of the problem situation. Both experts and novices improved their performance by using explanations. More specifically, the use of feedback explanations (outcome-related) had a positive effect on the accuracy of judgment making. The use of feedforward explanations (input-related) had a positive effect on user perceptions of the usefulness of the explanations, but had no effect on judgment accuracy. In contrast to other studies of explanation use, Dhaliwal found that domain expertise had no effect on the use of explanations: experts used just as many as novices.
He speculated that the discrepancy between his study and previous ones was mainly due to the difference between explanation use in a working situation and in an instructional one. In summary, prior research has found across a number of studies that justification explanations were preferred over rule-trace explanations. Except for the Ye study, the other three dissertations emphasized that a KBS and its explanations could be used as a tool for knowledge transfer, and therefore affect user learning. User characteristics were considered an important independent variable that should be incorporated into the investigation of explanation use. However, some inconsistency existed regarding the effect of expertise on the use of explanations. A major limitation of the previous studies was that the cognitive use of explanations in judgment and decision making was not studied. Some of the studies did not even measure whether or not the users ever looked at any explanations, although they were given the opportunity to request them (e.g., Moffitt, 1989). Secondly, most of the studies focused on the use of KBS explanations for learning, rather than for problem-solving in a work situation. The third limitation was related to the use of different implementations of explanation types, which made the research results difficult to compare and generalize. While the Dhaliwal study made progress in making deep knowledge directly available to users, and illustrated that the availability of deep knowledge had an impact on user perceptions of the KBS, deep knowledge was not naturally integrated into the user-KBS interaction. Feedforward explanations were available only from separate, specially designed screens, which were independent of the rest of the KBS output. This research extends the Dhaliwal (1993) study by investigating whether hypertext can make deep knowledge more accessible and useful, and how hypertext changes the dynamics of explanation use and decision making.
CHAPTER 3. THEORETICAL PERSPECTIVES

As part of the evaluation of the effects of hypertext-based explanations, theoretical foundations were sought for understanding the underlying cognitive process and outcomes of explanation use. The objective was to build a theoretical framework to address the following four basic questions: (1) how can the outcome of explanation use be measured and evaluated? (2) what are the main intervening variables of information processing, and how do they relate to the outcome of explanation use? (3) are there any other important independent variables that can be manipulated and studied along with the explanation provision method? and (4) how do the independent variables relate to the intervening variables? Given that the central issue of this study pertains to the use of explanations provided through hypertext and lineartext, the cognitive theories of discourse comprehension of van Dijk and Kintsch (1978, 1983; Kintsch, 1988) and Mayer (1980, 1985; Bromage & Mayer, 1981) were found highly appropriate for guiding this research. Because explanations provided by KBS are a kind of optional, technical expository text displayed on a computer monitor, the use of explanations can be seen as the process of requesting, reading, comprehending, retaining, and applying the discourse produced by the KBS. To a certain extent, users of KBS can be considered learners of the problem-solving process used by the KBS. A variety of other theories are also relevant to this research. For example, Anderson's ACT* model (1982) may be used to justify the provision of deep knowledge (in terms of declarative versus procedural knowledge) in explanations.
The meaningful learning theory (Ausubel, 1968), the theory of cognitive flexibility (Spiro, Coulson, Feltovich, & Anderson, 1988), the notions of incidental learning (Heller, 1990), the production paradox (Carroll & Rosson, 1987), and contextualized learning (Fischer, Lemke, & McCall, 1990) may also be useful for understanding the effects of hypertext on explanation use. They all have their roots in cognitive theories, although none of them provides a unifying, comprehensive framework for this research. Therefore, discourse comprehension theories were used as the primary unifying theoretical foundation for analyzing the effects of hypertext versus lineartext and of domain expertise. Other relevant theories are also reviewed in this chapter, because they shed light on this research from different perspectives and levels.

3.1 Theories of Discourse Comprehension

Mayer's framework of learning from text is deemed useful because it is concerned with the relationship between comprehending technical prose and problem-solving. To gain insight into the relationship between the independent and dependent variables, the cognitive information processing variables in van Dijk and Kintsch's process model of discourse comprehension are adopted to replace the intervening variables in Mayer's framework (1985). Mayer's original intervening variables include memory encoding and retrieval processes, which are related to the independent variables under study, but not as directly as those of van Dijk and Kintsch. The intervening variables of discourse comprehension are closely related to the memory encoding and retrieval processes used in Mayer's framework. According to van Dijk and Kintsch (1983), memory is merely a by-product of processing: one remembers what one does. The depth of processing and its elaboration are important because deeper, more elaborate processes leave more traces in memory that can later be recovered.
Kintsch and van Dijk's work (1978, 1983) is particularly useful because it focuses on conceptual-level processes as opposed to lower-level perceptual processes or linguistic parsing processes. Kintsch and van Dijk believe that their theory provides a firm basis for investigating how learning from text proceeds. As an organized synthesis of Mayer's and van Dijk and Kintsch's work, a research framework for learning from text is depicted in Figure 3.1. Each element of this framework will be discussed next.

[Figure 3.1 A Framework for Studying Learning From Text. Intervening variables of discourse comprehension: number of bridging inferences, number of memory reinstatements, and macrostructure of text]

3.1.1 Dependent Variables

Mayer's research framework (1985) is concerned with increasing problem solvers' ability to get and use information from text for creative problem-solving. The target behaviour is knowledge transfer: the ability to use the information from the text in novel ways, going beyond what is presented in the text. Research objectives implied in Mayer's research framework typically include: (1) predicting retention performance, and (2) predicting problem-solving performance, i.e., creative problem-solving based on the information in the text. Problem-solving performance refers to the capability to put information together in a novel way to solve problems that are different from those a person has previously learned to solve (Mayer, 1980, 1985). It can be measured by testing the ability to creatively apply knowledge contained in the text to other problems. Retention scores can be obtained through a forced-choice test covering the basic content.

3.1.2 Independent Variables

Independent variables in Mayer's framework (1985) of learning from text consist of text characteristics, such as hierarchical structure and signals for key ideas, and learner characteristics, such as prior knowledge, past experience, or reading strategies.
To improve learning and problem-solving performance, text design can be manipulated, for instance, by organizing the text around key ideas or in a way that emphasizes or signals the main explanatory information. Readers with sufficient domain knowledge are able to process new information differently from those without such knowledge, because prior knowledge is used in understanding the meaning of a discourse. It is assumed that prior knowledge constrains the construction of discourse comprehension and provides part of the context within which a discourse is interpreted (Kintsch, 1988). The usefulness and relevance of Mayer's framework are clearly manifested in the close match between the key issues of this research and the framework's implied research objectives, independent variables, and dependent variables. The use of hypertext versus conventional lineartext to represent and access domain knowledge in KBS is indeed a kind of text design (a manipulation of text characteristics), which may affect learning and problem-solving performance.

3.1.3 Intervening Variables

Kintsch and van Dijk identified three information processing variables for their cognitive process model of discourse comprehension (1978, 1983). The first pair of intervening variables are bridging inferences and memory reinstatements, which are useful for predicting knowledge retention and analyzing the readability of text. Readability is the relative ease with which text can be read and remembered. The model introduces two determinants of readability, in addition to traditional readability variables (which are relatively superficial, such as word frequency and sentence length): the number of memory reinstatements that occur in processing the text, and the number of bridging inferences required to construct a coherent meaning from the text.
According to Kintsch and van Dijk (1978), the need to reinstate a text proposition that is no longer available in the short-term memory buffer arises when the textual input on a given cycle cannot be related to the propositions still held in the buffer. In such cases, the model assumes that the reader searches his or her long-term memory for possible antecedents to the current propositional input. If the search is successful, the result is reinstated in short-term memory, thus providing a coherent link between what was read before and the new input. If reinstatement searches are unsuccessful, a bridging inference is assumed to occur to infer the missing link. Both reinstatement searches and inferences are assumed to be resource-consuming operations and therefore likely sources of comprehension difficulty. Typically, a text leaves some crucial causal relationships implicit, and readers have to supply the missing links from their own knowledge. However, people are often not very good at this task, and arrive at misrepresentations that grossly distort the actual causal relations (Stevens, et al., 1979, cf. van Dijk & Kintsch, 1983, p. 46). Furthermore, if a key concept in the text is unfamiliar or not understood (i.e., vague or not in long-term memory), memory reinstatements are impossible and bridging inferences may be incorrect. The potential advantage of providing problem-solving knowledge through hypertext can be explained using the notions of bridging inferences and memory reinstatements. In KBS output (referring hereafter to recommendations and explanations), it is neither feasible nor desirable to include all essential problem-solving knowledge. An assumption of a certain level of prior domain knowledge must be made about the users of a KBS. However, it is possible, and likely, that there exists a gap between the assumed level of domain knowledge and the level actually possessed by a given user.
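The reinstatement-and-inference cycle described above can be rendered as a toy simulation. This is a drastic simplification of the Kintsch and van Dijk model: the buffer size of two, the encoding of propositions as tuples of concepts, and the overlap-based coherence test are all illustrative assumptions, not the model's actual parameters.

```python
def overlaps(p, q):
    """Two propositions cohere if they share an argument (concept)."""
    return bool(set(p) & set(q))

def process_cycle(new_props, stm, ltm, stats):
    """One comprehension cycle: relate new propositions to the short-term
    memory (STM) buffer; failing that, search long-term memory (LTM) and
    reinstate an antecedent; failing that, make a bridging inference."""
    for prop in new_props:
        if any(overlaps(prop, p) for p in stm):
            pass  # coherent with the buffer: no extra cost
        else:
            antecedent = next((p for p in ltm if overlaps(prop, p)), None)
            if antecedent is not None:
                stm.append(antecedent)             # memory reinstatement
                stats["reinstatements"] += 1
            else:
                stats["bridging_inferences"] += 1  # infer the missing link
        stm.append(prop)
    ltm.extend(new_props)  # processing leaves traces in LTM
    del stm[:-2]           # limited-capacity buffer (size 2 assumed)

# Buffer primed with a first proposition, then three cycles of input.
stm = [("firm", "liquidity")]
ltm = [("firm", "liquidity")]
stats = {"reinstatements": 0, "bridging_inferences": 0}
process_cycle([("liquidity", "ratio")], stm, ltm, stats)  # coheres with buffer
process_cycle([("audit", "risk")], stm, ltm, stats)       # no antecedent: bridging inference
process_cycle([("firm", "assets")], stm, ltm, stats)      # "firm" reinstated from LTM
```

Counting the reinstatements and bridging inferences a text forces on the reader is exactly what the model uses to predict comprehension difficulty; in this sketch the three cycles produce one of each.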
In other words, the assumption may be invalid from time to time, depending on a given user's expertise. Because hypertext-based explanations provide the user with multiple opportunities for contextualized, instant access to domain knowledge, the number of memory reinstatements and bridging inferences may be reduced. Moreover, memory reinstatements and bridging inferences may be easier to make, and more likely to be correct, based on the domain knowledge made available by hypertext. Even for knowledgeable users, hypertext-based access to unfamiliar or infrequently used domain concepts may help reduce the need to search long-term memory. The third intervening variable is the macrostructure of text. Most theories of text comprehension assume at least two levels of memory representation for processed text (e.g., Just & Carpenter, 1987; van Dijk & Kintsch, 1983). One is a surface level consisting of a verbatim representation of text fragments or a speech-like representation of text sentences. The other is a hierarchical structure, usually called the textbase, that reflects the meaning of the elements that can be extracted from the surface level. The textbase is composed of a "microstructure," with numerous "micropropositions" representing base meaning units, and a "macrostructure." The macrostructure can be represented as a set of "macropropositions," which are combinations and reductions of micropropositions. Generally, macrostructure is the theoretical account of what is usually called the gist, the upshot, the theme, or the topic of a text (van Dijk & Kintsch, 1983). Figure 3.2 sketches the formal structure of a textbase. The textbase combines two sources of information, the text itself and knowledge: knowledge about language as well as knowledge about the problem domain (Kintsch, 1988). Macrostructure is of central importance in the processing of complex information.
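The M-node/P-node hierarchy just described can be illustrated with a toy example. The propositions below are invented for illustration (in the actual theory, macropropositions are derived from micropropositions by formal macrorules); the point is only that M nodes summarize the P nodes beneath them, and that "zooming out" to a shallow depth recovers the gist:

```python
# A toy textbase: each node is (label, children). M nodes form the
# macrostructure; P nodes (leaves) form the microstructure.
textbase = ("M: the firm is financially healthy", [      # gist (root)
    ("M: liquidity is adequate", [                       # macroproposition
        ("P: current ratio is 2.1", []),                 # micropropositions
        ("P: quick ratio is 1.3", []),
    ]),
    ("M: profitability is improving", [
        ("P: net margin rose to 8%", []),
    ]),
])

def gist(node, depth=0, max_depth=1):
    """Summarize by keeping only nodes above a cut-off depth:
    'zooming out' to the macrostructure."""
    label, children = node
    if depth >= max_depth:
        return [label]
    lines = [label]
    for child in children:
        lines.extend(gist(child, depth + 1, max_depth))
    return lines

print(gist(textbase))               # macrostructure only (the gist)
print(gist(textbase, max_depth=2))  # zoomed in: all six propositions
```

Varying `max_depth` is a crude analogue of the "zooming-in and zooming-out" navigation among levels of domain knowledge discussed below.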
It has been asserted that without macrostructures, complex cognitive tasks (such as learning, recall, and action planning) could not possibly be performed (van Dijk & Kintsch, 1983, p. 195). It is likely that one of the pivotal events in the successful comprehension of text is the process of building a macrostructure for the text. When a user reads KBS output, the efficiency of the macrostructure-building process depends largely on the user's level of the domain knowledge involved in the KBS output. If the user has instant access to context-relevant domain knowledge enabled by hypertext, then the macrostructure-building process may be substantially enhanced.

[Figure 3.2 Formal Structure of a Textbase. The notation is adopted from van Dijk and Kintsch (1983): the macrostructure is labelled with M nodes, while P nodes form the microstructure.]

Hypermaps (Conklin, 1987), hierarchical charts of domain concepts that can be used as a navigation tool via hypertext links, may be used to provide strategic knowledge of problem-solving. Such hypermaps may be particularly useful for building the macrostructure of domain knowledge, because they support the "zooming-in and zooming-out" type of navigation among different levels of domain knowledge in the knowledge base, which can be represented by the structure in Figure 3.2. Comprehension of a text segment containing unfamiliar concepts should be deeper and faster than it would be without instant access to the domain knowledge contained in the explanations of those concepts (Lachman, 1989). Moreover, if hypertext is used to facilitate access to domain knowledge, important domain concepts appearing in KBS output will be marked with link icons indicating that explanations are available upon request. Such highlighting may have a signalling effect, which in itself may help the user appreciate the macrostructure of the domain knowledge. Furthermore, the interdependence among domain concepts is also highlighted through hypertext links.
These features are consistent with some of Mayer's recommendations (1985) for improving the understandability of technical text. For example, one recommendation is to underline technical terms and to provide a glossary of all technical terms, so that unfamiliar concepts do not make smooth reading impossible. It has been found that graphically highlighting the topic sentence is helpful for people who have difficulty finding titles for a paragraph. Another recommendation is to repeat important ideas in various wordings, building redundancy into the passage so that the reader has several opportunities to be exposed to the main points. These recommendations can easily be implemented using hypertext, because they are natural properties of hypertext-based knowledge representation and retrieval. In summary, cognitive theories of discourse comprehension illuminate the effects of using hypertext to represent and access domain knowledge. The emphasis placed on prior knowledge by discourse comprehension theories has two implications: it provides theoretical support for the provision of domain knowledge in KBS output, and it calls attention to the effects of users' prior knowledge on the process and outcome of learning and problem-solving. The research framework discussed in this section was used as the basis for deriving a research model for this study.

3.2 Other Related Theories

While the previous section discussed the foundation of this study from the perspective of cognitive psychology, this section reviews additional theories that may also shed light on this study. It will be shown that the principles emphasized by these theories are supported by the use of hypertext for learning and decision making.
3.2.1 Alleviation of the Production Paradox

The production paradox refers to the conflict between learning and working that is constantly present in work settings: learning is inhibited by lack of time, and working is inhibited by lack of knowledge (Carroll & Rosson, 1987; Carroll & McKendree, 1987). Consequently, productivity suffers. The notion of the production paradox has a direct bearing on the use of KBS, because the use of both explanations and hypertext is optional. Learning about the reasoning process of the KBS by requesting explanations may be instrumental for judgment making, but it also consumes extra resources — time and effort. Whether requests for explanations will result in savings in cognitive resources and improvements in judgment may depend on the usefulness and ease of use of the explanations. The motivational "cost" of learning may be reduced through the design of better explanation facilities and interfaces (Carroll & Rosson, 1987); more learning may occur with the same amount of time and effort if learning is encouraged and made convenient and easy. The use of hypertext to provide explanations may facilitate learning by effectively reducing this motivational cost, thus alleviating the production paradox. It does so by providing contextualized, easy access to task knowledge, and by making the exploration of task knowledge natural and convenient.

3.2.2 Contextualized Learning

The production paradox may also be alleviated by implementing systems consistent with the principles of contextualized learning, that is, learning within the context of work on real-world problems (Fischer, Lemke, & McCall, 1990). According to Fischer et al., operationalizing the notion of contextualized learning has three aspects. First, situated learning requires integrating learning into situations where it is useful or instrumental in solving real-world problems.
Second, problem-solving can be seen as repeated alternation between situated action and reflection. A breakdown occurs when the problem solver realizes that a situated action has resulted in unanticipated consequences; reflection is used to repair the breakdown, and then situated action continues. The third aspect relates to the integration of construction and argumentation. Construction is the process of shaping the solution, whereas argumentation is reasoning about the problem and its solution. Fischer et al. stress that a system must help problem solvers do the following: (1) see where their knowledge is inadequate (perceive breakdowns); (2) find the argumentation knowledge they need for such situations (ideally, all of, and only, the knowledge needed for the task at hand); (3) understand how generalized principles of design relate to their particular construction situations; and (4) understand how to perform the contextual elaboration needed to go beyond principles - to make intelligent exceptions and perform detailed construction.

The use of explanations in KBS can be reinterpreted according to the notion of contextualized learning: when reading KBS output, if a user's reasoning process "breaks down" (e.g., the user cannot follow the reasoning of the KBS), explanations are needed to "repair" the process by supplying general principles as well as situated constructive and argumentation knowledge. The first three requirements proposed by Fischer et al. are facilitated to various degrees by the features of hypertext. Furthermore, the notion of contextualized learning provides a strong argument for using hypertext to provide explanations: contextualized learning can only be effective if the flow of work is not disrupted (Fischer et al., 1990). Such learning requires rapid and timely access to relevant information in the knowledge base, and a collection of argumentation on recurring issues in the problem domain.
A system supporting contextualized learning makes users aware of issues, possible solutions, and argumentation. These requirements are easier to satisfy with hypertext than with conventional lineartext. The association capability of hypertext allows immediate access to explanatory, elaborative, and other types of background information and argumentation. Hypertext also accommodates individual differences in prior knowledge and learning styles, which are crucial contextual factors: users with little knowledge can explore the network in depth, while knowledgeable users can ignore links to known information. With systems facilitating contextualized learning, learning does not take place in a separate phase and place, but is integrated into the work process.

3.2.3 Meaningful Learning

Meaningful learning occurs when an individual connects new information in a non-arbitrary and substantive manner with knowledge that already exists in memory (Ausubel, 1968). Individuals must search long-term memory to retrieve appropriate anchoring ideas or context. Previous studies suggest that unless learners are provided with cues to help them retrieve appropriate concepts, they will often be unable to do so (Gick & Holyoak, 1983). In the context of learning to use word processors, Carroll et al. (1985) advocate techniques that facilitate learning by doing, and by actively assimilating novel experience to prior experience. They also criticize learning conditions that make no provision for learners to take the initiative to learn what they want, when they want, or to support the kinds of problems such initiatives can produce.
It can be argued that hypertext-based explanations promote meaningful learning because they provide the context needed to assimilate new knowledge: they highlight the interdependence and structure of domain knowledge, allow instant access to domain knowledge, and give the user control over the depth, path, and timing of exploration.

3.2.4 Factors Affecting the Use of Knowledge in Decision Making

Earlier in this chapter, the availability and accessibility of domain knowledge were addressed in the review of cognitive theories of discourse comprehension. From a slightly different perspective, it can be argued that the availability and accessibility of domain knowledge influence the utilization of deep knowledge in decision making. Higgins and Bargh (1987) reviewed a number of factors affecting the utilization of knowledge in the judgment process, including knowledge availability, knowledge accessibility, salience/visual access, perceived applicability, and motivational significance. These factors may lead a judgment's content to coincide with or deviate from a specific criterion (Kruglanski, 1989). The two factors most relevant to this study are reviewed next.

The first factor pertains to the availability of inference rules. A person's tendency to render accurate judgments should depend on the relevant inference rules that she or he has available in memory. Such inference rules could be "if-then" statements that link a given category of evidence with a given judgment (e.g., "if a credit applicant is a home owner, he or she may be considered more stable and reliable"). Some inference rules may be derived from people's conceptions of themselves; others may be derived from their conceptions of external sources of information. For example, a statement linked with a source perceived as authoritative is more likely to be accepted by an individual and hence adopted as this person's own opinion.
Experts and novices may have quite different conceptions in this regard.

The second factor deals with the accessibility of inference rules. For a rule to be used in a judgment, it must not only be available in an individual's long-term memory, but also be instantaneously accessible. Accessibility of relevant constructs is possibly involved in the encoding of situational evidence in ways that highlight the applicability of given inference rules. Support for this notion comes from studies in which specific packaging of the evidence may have primed the appropriate encoding categories and hence increased subjects' tendency to use otherwise under-utilized principles. For example, base rate information was found to be more likely to be used when it was interpreted as possessing causal significance (Ajzen, 1977). Kruglanski, Friedland, and Farkash (1984) found that statistically correct use of the regression-to-the-mean rule increased when such evidence was couched in familiar, everyday examples; presumably, such examples served to activate the appropriate constructs (chance factors, variability) that rendered the regression logic more apparent to subjects.

Both factors may be affected by the use of hypertext to represent and access domain knowledge, and by users' requests for explanations in general. Explanations in a KBS contain the inference rules and domain knowledge underlying the system's reasoning. Domain knowledge such as causal relationships can be made both available and accessible by explanations. Therefore, users with convenient, contextualized access to domain knowledge through hypertext may be able to utilize more knowledge in judgment and decision making.

3.2.5 "Explanation Effect" and Reasoning-Trace Explanations

While explanations of deep knowledge can make the domain knowledge (inference rules) used by the KBS available to users, explanations of the reasoning trace may directly influence judgment accuracy.
In social psychology, there is a well-known phenomenon called the "explanation effect": a causal explanation for the occurrence of an event may increase one's assessment of the judged likelihood of that event. U. Anderson and Wright (1988) looked for the explanation effect in the judgments of accounting students and experienced auditors on the likelihood that an account balance was materially in error. They found that written explanations for the occurrence of the target event did produce an explanation effect for the students, but not for the experienced auditors. The effect was present for the novices perhaps because they were developing an explanation for the first time, without the needed schematic organization of knowledge. Domain-specific experience mediated or eliminated the effect, probably because experienced professionals have explained, and received feedback on their explanations for, numerous situations.

The explanation phenomenon appears to influence information processing at both encoding and retrieval/construction, and extends from likelihood judgments to actions. Causal explanations provide a way to organize and understand events. Moreover, a causal explanation may have a stronger, longer-lasting effect than an explanation consisting only of a restatement of the facts (C. Anderson et al., 1980). In the context of KBS use, it may not be realistic to expect novice users to generate causal explanations themselves; explanations provided by the KBS and used by the user may have similar explanation effects on judgments and decisions.

CHAPTER 4. RESEARCH MODEL AND HYPOTHESES

4.1 The Research Model

A research model was derived from the theoretical foundations established in Chapter 3, the literature on explanations reviewed in Chapter 2, and the research objectives specified in Chapter 1. The model is depicted in Figure 4.1.
The two independent variables originate from the research framework of discourse comprehension (Figure 3.1). The explanation provision method variable (hypertext versus lineartext) is a special case of the text design variable, while the task domain knowledge variable (novices versus experts) is a key dimension of user characteristics. There are two additional reasons for including the task domain knowledge variable in this research: (1) the process of discourse comprehension relies heavily on the prior knowledge of the reader or learner (e.g., Kintsch, 1988); and (2) task domain knowledge has been a key independent variable in most empirical studies of KBS explanations (e.g., Ye, 1990; Dhaliwal, 1993), because users can be domain experts, apprentices, or novices with varying amounts of domain knowledge.

With respect to the dependent variables in the research framework of discourse comprehension (Figure 3.1), problem-solving performance was operationalized as decision accuracy, given that the purpose of using a KBS is to facilitate and enhance decision making. Knowledge retention was not measured directly because learning was not the primary concern of this study.

Figure 4.1 The Research Model

Two other important dependent variables related to KBS use were also included in the research model. Users' confidence or trust in KBS is an important measure of the effectiveness of explanations provided by KBS, and has been used in a number of empirical evaluations of explanation use (e.g., Lamberti & Wallace, 1990; Oz, 1990). It is commonly held that users must have sufficient confidence in the recommendations reached by a KBS before taking responsibility for acting on them. Lerch, Prietula, and Kim (1993) conducted a series of experiments to investigate user trust in KBS. In terms of building trust, explanations may be the most important instrument.
They provide access to the problem-solving process and to the domain knowledge necessary for understanding the solution. It is therefore particularly important and relevant to include user trust in KBS as a dependent variable in this study of KBS explanations.

Perceived usefulness of explanations was the third dependent variable. It has been argued that good explanation facilities increase user acceptance by assuring users that the system is logical (Shortliffe, 1976), and that KBS can only be accepted if they explain what they do and justify their actions in terms understandable to users (Swartout, 1983). Davis and his colleagues (Davis, Bagozzi, & Warshaw, 1989) established the theoretical importance of the perceived usefulness construct. They theorized that perceived usefulness of a system is one of the fundamental determinants of user acceptance, and conducted a longitudinal study in which subjects rated the usefulness and ease of use of word processing software after limited initial use (a demonstration and less than an hour of actual system use). These ratings were significantly correlated with current and future usage of the system. Given the theoretical and practical importance of the construct, perceived usefulness of explanations was considered important in this study of the use of hypertext to provide explanations.

Since the three dependent variables discussed above were studied as outcomes of explanation use, this research adopted a comprehensive measure of the process of explanation use. The detailed operationalization of the explanation use variable is discussed in Chapter 5; an overview of the process measures is presented in Table 4.1.
Specific Measures of the Use of KBS Explanations:

Number of total explanations requested: for both deep and reasoning-trace explanations
Context of explanation requests: for the use of deep explanations only; prior to problem solving versus in the process of problem solving
Preference for explanation types: for both deep and reasoning-trace explanations, in terms of Why, How, and Strategic explanations
Rationale for explanation use: for both deep and reasoning-trace explanations

Table 4.1 Overview of the Operationalization of Explanation Use

The use of KBS explanations was measured and analyzed using the following four specific measures. The first two were the number of explanations requested and the context of explanation requests. The latter was important for assessing the use of hypertext-based explanations in the context of problem solving, where the three intervening variables of discourse comprehension theories might be affected. The third was preference for explanation types, which might be influenced by (1) the level of domain expertise, i.e., the need for domain knowledge; and (2) the availability of hypertext-based explanations. Finally, the rationale for explanation use was important for understanding the extent to which explanations were used to help problem solving. The first two of these measures were used for testing hypotheses related to the research question; the last two were adopted for exploratory analysis.

4.2 Hypotheses

This section presents hypotheses derived from the research model, in light of the theories reviewed in Chapter 3. Each hypothesis is briefly discussed in relation to its theoretical foundation, the relevant literature, and its importance to this study. Hypothesis H1 concerns the determinants of explanation use (main effects).

H1: Users with access to hypertext-based explanations will use more explanations than users with access to lineartext-based explanations.
Since the KBS provides two general types of explanations, each with distinct functionality, H1 can be broken down into two more specific hypotheses. H1a is based on the discourse comprehension theories, in the sense that hypertext-based explanations will make KBS output more understandable by making the necessary domain knowledge more accessible to users; thus, as deep explanations become more accessible, more will be requested. H1b is indirectly related to the use of hypertext: if hypertext-based access to deep explanations is valued by users, reasoning-trace explanations may become more understandable and valuable, because deep explanations can be accessed from them.

H1a: Users with access to hypertext-based explanations will use more deep explanations than users with access to lineartext-based explanations.

H1b: Users with access to hypertext-based explanations will use more reasoning-trace explanations than users with access to lineartext-based explanations.

Hypothesis H2 is central to this research. It is the use of hypertext to provide explanations that makes it possible to access domain knowledge in the context of problem solving; thus, according to the discourse comprehension theories, users' understanding of KBS output can be facilitated. This hypothesis can also be derived from the notion of contextualized learning. It is expected that users will find contextualized access to deep knowledge more useful, and will shift the context of deep knowledge requests from before to during problem solving.

H2: Users with access to hypertext-based explanations will request more deep explanations in the context of problem solving than in the abstract (i.e., prior to problem solving).

Hypothesis H3 concerns the determinants of explanation use (interaction effects), based on the difference between expert and novice users.
H3: There will be an interaction effect between user domain expertise and the explanation provision method on the number of explanations requested.

H3 can be further specified in terms of the use of deep explanations and reasoning-trace explanations. According to cognitive theories of discourse comprehension, the level of prior knowledge affects the three intervening variables of information processing (Section 3.1); therefore, prior knowledge is likely to influence the number of requests for deep explanations. H3a is also grounded in the belief that expert users may be more confident in their command of domain knowledge, and more capable of identifying the knowledge needed to apply to the problem. According to Salthouse (1991), the performance limitations of non-experts in decision making and medical diagnosis include (1) not knowing what information is relevant and why, (2) lack of knowledge of the interrelations among variables, and (3) difficulty in combining or integrating information. Furthermore, Salthouse suggested, the structure of knowledge may be important for overcoming these limitations, by indicating how different variables are interrelated, by facilitating recognition of what information is relevant, and by supporting the formation of internal representations that suggest the actions to be performed and lead to expectations about their consequences. If novice users have a stronger need for convenient access to deep explanations, hypertext should be more effective in increasing deep explanation use for novice users than for expert users.

H3a: There will be an interaction effect between user domain expertise and the explanation provision method on the number of deep explanations requested.

Similarly, due to their lack of experience, novice users may have a stronger need for reasoning-trace explanations to bridge the gap between the recommendations and the data.
It would be difficult for novice users to create their own causal explanations.

H3b: There will be an interaction effect between user domain expertise and the explanation provision method on the number of reasoning-trace explanations requested.

Hypotheses H4a to H4c concern the effects of hypertext on the outcome of explanation use (main effects). H4a can be derived directly from the discourse comprehension theories (by specifying problem-solving performance as decision accuracy). H4b and H4c suggest that users should perceive the explanations as useful and trust the KBS, if contextualized access to domain knowledge is useful for understanding KBS output.

H4a: Users with access to hypertext-based explanations will have a bigger improvement in decision accuracy than users with access to lineartext-based explanations.

H4b: Users with access to hypertext-based explanations will have a higher level of perceived usefulness of explanations than users with access to lineartext-based explanations.

H4c: Users with access to hypertext-based explanations will have a higher level of trust in KBS than users with access to lineartext-based explanations.

Hypotheses H5a to H5c concern the effects of expertise on the outcome of explanation use (main effects). Although these hypotheses are not central to this study, they are interesting, and the research design allows them to be tested conveniently. Their foundation is similar to that of H3, H3a, and H3b.

H5a: Novice users will have a bigger improvement in judgment accuracy than expert users, as a result of using KBS.

H5b: Novice users will have a higher level of perceived usefulness of explanations than expert users.

H5c: Novice users will have a higher level of trust in KBS than expert users.
Furthermore, H5a is also based on previous findings that experts tend to be less influenced by the provision of decision rules than those with only moderate knowledge of the task domain (Arkes, Dawes, & Christensen, 1986). Lamberti and Wallace (1990) likewise found that the use of KBS had a greater impact on the performance of low-skill employees than on high-skill employees. As to H5b, enhanced accessibility of deep knowledge is not expected to be as useful for experts as for novices.

Hypotheses H6a through H6c address the direct relationship between the use of deep explanations and its outcomes (main effects). It has been argued in earlier chapters that it is important for decision aids to increase the availability and accessibility of normative knowledge in decision making. However, there is a concern about information overload as a result of making more information conveniently available: given human cognitive limitations, more choices and more information may cause cognitive overload, particularly for novices. Since hypertext is expected to increase the number of deep explanations requested, it is important to know whether requesting more deep explanations is indeed beneficial to problem solving, as predicted by discourse comprehension theories.

H6a: The use of deep explanations will be positively related to improvement in decision accuracy.

H6b: The use of deep explanations will be positively related to the level of perceived usefulness of explanations.

H6c: The use of deep explanations will be positively related to the level of trust in KBS.

Hypotheses H7a through H7c address the direct relationship between the use of reasoning-trace explanations and its outcomes (main effects). Discourse comprehension theories are useful for studying the effects not only of hypertext-based access to deep knowledge, but also of the use of reasoning-trace explanations.
If KBS users understand the reasoning process, requesting reasoning-trace explanations may facilitate the evaluation of KBS recommendations, and the comparison between a KBS recommendation and the user's own reasoning. Otherwise, reasoning-trace explanations can be requested to bridge the gap between the conclusions made by the KBS and the input data, reducing the number of bridging inferences. Reasoning-trace explanations may also highlight the macrostructure of the problem-solving process. Therefore, H7a to H7c predict positive effects of the use of reasoning-trace explanations. These hypotheses can also be derived from the literature on the functionalities of KBS explanations reviewed in Chapter 2.

H7a: The use of reasoning-trace explanations will be positively related to improvement in decision accuracy.

H7b: The use of reasoning-trace explanations will be positively related to the level of perceived usefulness of explanations.

H7c: The use of reasoning-trace explanations will be positively related to the level of trust in KBS.

In particular, hypothesis H7a is partly based on the "explanation effect" reviewed in Section 3.2.5. Because causal explanations provide a way of organizing and understanding events, reading reasoning-trace explanations may strengthen the effect of KBS recommendations on decision making, i.e., produce a stronger, longer-lasting effect than reading the recommendations alone. H7b and H7c can also be rationalized on the basis of the amount of relevant information used to make a judgment. The strength of the belief that one has made an "informed" judgment is likely to increase with the number of available predictors, and that belief may then serve as evidence that the judgment is accurate (Higgins & Bargh, 1987). Because reasoning-trace explanations justify the recommendations, requesting more of them implies increased access to information that reinforces or supplements the KBS recommendations.
Consequently, the level of confidence and trust in KBS should be higher as well.

CHAPTER 5. RESEARCH METHOD

Chapter 5 discusses the research method, including a description of the task domain and experimental materials, the experimental design, the operationalization of the independent and dependent variables, the recruitment of subjects, and the experimental procedures.

5.1 Research Method

Laboratory experimentation was used in this study for two primary reasons. First, laboratory experimentation allowed the manipulation and control essential for testing the behavioral impact of alternative methods for providing explanations (Carlsmith, Ellsworth, & Aronson, 1976; Kerlinger, 1986). Experimental systems could be implemented so that they differed only in the representation and accessibility of deep knowledge, while all other aspects were identical. Not only the two experimental KBS, but also the corresponding tutorial systems, could be designed consistently in terms of the content of recommendations and explanations. Furthermore, subjects were recruited from specifically chosen user populations based on predetermined criteria, and assigned to treatments at random. Consistency in the treatment conditions was relatively easy to maintain in a laboratory setting, compared to other research methods.

The second reason for using a laboratory experiment was practical constraints. No commercial KBS in existence provides explanations based on hypertext, as required for this study. It was impossible to find two versions of an existing KBS in use whose only difference was that one used hypertext for knowledge representation and explanation provision while the other made the same material available as conventional lineartext. A further difficulty was the need for log files of user-KBS interactions to determine the exact number and context of explanation requests.
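The kind of instrumentation such log files provide can be illustrated with a short sketch. The log format, phase names, and event codes below are hypothetical assumptions for illustration only (the thesis does not specify the actual file format used by the experimental systems); the sketch merely shows how such a log could yield the request counts and contexts operationalized in Table 4.1.

```python
from collections import Counter

# Hypothetical log format: one tab-separated event per line,
#   "<timestamp>\t<phase>\t<event>"
# where phase is "prior" or "problem_solving", and event codes such as
# "deep_why:liquidity" or "trace_how:loan_size" are illustrative assumptions.
def summarize_requests(log_lines):
    by_type = Counter()       # requests per explanation type (deep/trace, Why/How/Strategic)
    deep_context = Counter()  # context of deep requests: prior vs. during problem solving
    for line in log_lines:
        _, phase, event = line.rstrip("\n").split("\t")
        if event.startswith(("deep_", "trace_")):
            by_type[event.split(":")[0]] += 1
            if event.startswith("deep_"):
                deep_context[phase] += 1
    return by_type, deep_context

log = [
    "10:40:11\tprior\tdeep_why:liquidity",
    "10:42:03\tproblem_solving\tdeep_how:current_ratio",
    "10:43:55\tproblem_solving\ttrace_strategic:liquidity_assessment",
]
by_type, deep_context = summarize_requests(log)
print(dict(by_type))
print(dict(deep_context))
```

The two counters correspond directly to the first two process measures: total requests by explanation type, and the context (prior versus during problem solving) of deep-explanation requests.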
The use of a simulated KBS to deal with the above constraints was accommodated best by a laboratory environment.

5.2 Task Domain and Experimental Materials

A commercial lending task was chosen because both the composition of the financial statements and the expertise required to evaluate them are complex enough to warrant the use of a KBS. The task involved evaluating the financial statements of a company to predict its future performance and to determine an acceptable loan size. Financial statement analysis usually entails a review of a company's financial data to evaluate various aspects of its financial standing and performance. It is conducted by comparing a firm's financial ratios to the same ratios in earlier years, and to the ratios of other firms in the same industry, which are often summarized into industry composites. The evaluation of the ratios to produce judgments is an unstructured process, characterized by the use of specialized domain knowledge.

A financial analysis case was prepared, involving the evaluation of an application for senior borrowings by a hypothetical firm. Subjects were told to assume that they were corporate loan evaluation officers working for a large financial institution in Western Canada. They were provided with five-year financial statements of a hypothetical firm, "Canacom," and a completed set of common-size statements and ratios. The financial statements and case description were prepared and used in a previous study by Dhaliwal (1993). The company was applying for senior borrowings of $800 million to streamline its operations. Subjects were to use a KBS designed for loan evaluation to assess various aspects of the company's financial health and then, based on the assessment, make a recommendation on whether the loan should be approved and, if so, for what amount.
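As a rough illustration of the ratio-comparison step this task involves, the following sketch computes a few standard ratios from statement line items and flags each against an industry composite. All figures and composite values here are invented for illustration; they are not the Canacom case data.

```python
# Minimal sketch of financial ratio analysis: compute ratios from statement
# line items and compare them to industry composites. The specific items,
# values, and composites are made-up examples, not the actual case materials.
def ratios(fin):
    return {
        "current_ratio": fin["current_assets"] / fin["current_liabilities"],
        "debt_to_equity": fin["total_debt"] / fin["total_equity"],
        "net_margin": fin["net_income"] / fin["sales"],
    }

firm = {"current_assets": 1200.0, "current_liabilities": 800.0,
        "total_debt": 2400.0, "total_equity": 1600.0,
        "net_income": 150.0, "sales": 3000.0}
industry = {"current_ratio": 1.8, "debt_to_equity": 1.2, "net_margin": 0.06}

for name, value in ratios(firm).items():
    flag = "above" if value > industry[name] else "below or at"
    print(f"{name}: {value:.2f} ({flag} industry composite {industry[name]:.2f})")
```

Interpreting such flags (e.g., a current ratio below the composite, or leverage above it) is exactly the unstructured, knowledge-intensive judgment that the KBS subanalyses and their explanations were designed to support.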
The experimental material (Appendix A) was sequentially pre-numbered to facilitate the experimental process and record keeping. Textual descriptions of the experimental procedures and KBS features given to the subjects were kept to a minimum, to reduce the amount of reading during the experiment. As part of this effort, two charts (cf. Appendix A) were included to illustrate graphically the system structures and the major input and output elements, replacing lengthy textual description. A written experimental protocol covering each step of the data collection process (Appendix B) was prepared in advance to guide the conduct of the experiment and to maintain consistency. Using this protocol, the author of this dissertation trained a laboratory assistant, who ran some of the experimental sessions; the author ran the rest (when the laboratory assistant was not available).

5.3 Independent Variables

This section describes the operationalization of the independent variables, namely the explanation provision method and user expertise.

5.3.1 Explanation Provision Method and Experimental Systems

The method for providing explanations, as an independent variable of this research, was operationalized using two versions of a simulated KBS, developed around the alternative methods for providing explanations: hypertext versus conventional lineartext. The two versions were named FINALYZER and Hyper-FINALYZER; each is briefly described in this section. Explanations are provided in terms of Why, How, and Strategic types (Table 5.1). The term reasoning-trace explanations refers to all explanations related to the KBS recommendations (case specific), including How (usually called rule trace), Why (sometimes called justification), and Strategic (control) explanations.

Reasoning-Trace Explanations:

WHY justifies the importance, and clarifies the implications, of a particular conclusion reached by the system.
HOW presents a trace of the evaluations performed and intermediate inferences made to reach a particular conclusion.
STRATEGIC clarifies the overall goal structure used by the system to reach a particular conclusion, and specifies the manner in which each particular assessment leading to the conclusion fits into the overall plan of assessments that have been performed.

Deep Explanations:
WHY justifies the importance of, and the need for, a domain concept to be used or a procedure to be performed in the problem solving.
HOW describes the manner in which a domain concept is defined and obtained, or the manner in which a procedure is to be performed, as well as the way the concept or procedure is affected by others involved in the problem solving.
STRATEGIC clarifies the overall structure in which domain concepts and procedures are organized, and specifies the manner in which each domain concept or procedure fits into the overall plan of problem solving that is to be performed.

Table 5.1 Definitions of Explanations Provided by FINALYZER and Hyper-FINALYZER

Both versions of the KBS performed seven subanalyses, such as liquidity analysis and capital structure analysis (cf. the system flow chart in Appendix A). For each of the seven subanalyses, the KBS displayed the ratios to be used in the subanalysis, the data (values of the ratios), and a recommendation based on its "analysis." Correspondingly, users would see at least three basic types of screen (Appendix C), in the following sequence: an index screen of relevant domain concepts (Figure C1), a data screen of relevant financial ratios calculated from the financial statements of the hypothetical firm (Figure C2), and KBS recommendation screens (Figure C3). In addition to these three types of screen, optional explanations could be displayed on request.
From the index screen of domain concepts, users could request deep explanations on all relevant domain concepts and procedures to be used by the KBS in a subanalysis (Figures C4 and C5). From the recommendation screens, users could request reasoning-trace explanations on all the recommendations reached by the KBS (Figure C6). In Hyper-FINALYZER, deep explanations could be requested from all types of screen. Explanations provided in FINALYZER (Dhaliwal, 1993) were designed based on the cognitive feedforward and feedback paradigm for learning in the context of problem solving, which emphasizes a particular order among events (Bjorkman, 1972). Deep knowledge was presented as cognitive feedforward prior to analyses, and reasoning-trace explanations were accessible after system recommendations as cognitive feedback. The advantages of providing domain knowledge as cognitive feedforward include promoting more accurate and consistent knowledge acquisition, relieving the learner from certain cognitive strain, and favouring an analytical rather than intuitive mode of thought (Bjorkman, 1972). Therefore, following the feedforward-feedback paradigm of learning, FINALYZER encouraged a structured, step-by-step, linear process: (1) Users would interact with FINALYZER in a linear order (explanations on domain concepts, data, recommendations, and reasoning-trace explanations); explanations on domain concepts were available only through the index screen. (2) When examining deep knowledge, users could access domain concepts only one at a time, and only from the index screen. There was no mechanism in the KBS to highlight the interdependence among domain concepts, or to allow users to follow the semantic links among them. Explanations in Hyper-FINALYZER were provided with hypertext (Mao et al., 1995). This allowed the integration of deep explanations (cognitive feedforward) and reasoning-trace explanations (feedback).
Hyper-FINALYZER had the same recommendations and explanations as FINALYZER, i.e., an identical number of explanations and identical content for each explanation. Recommendations and reasoning-trace explanations in both systems were adopted from the previous version of FINALYZER with only minor enhancements, mainly to the wording. The reason for minimizing the changes was that FINALYZER had already been validated during the initial development process (Dhaliwal, 1993). However, given this study's emphasis on contextualized access to deep knowledge, the contents of the deep explanations in both versions of the KBS were substantially augmented. The rationale for and process of the augmentation are described in the following paragraphs. As in many other domains, understanding of domain concepts is fundamentally important in financial analysis. A large part of the analysis is based on the calculation and interpretation of key financial ratios. Many classification and definitional issues need to be addressed when computing financial ratios. For example, the terms "working capital" and "cash flow" have been used differently in many diverse contexts. In certain areas of financial analysis (e.g., capital structure analysis), the popular name of a ratio may not convey precisely its meaning, and hence the method of its computation. Before any measure or ratio is used, care must be taken that the method of its computation is thoroughly understood (Bernstein, 1993). Basic domain concepts are related to each other, and used jointly in financial analysis. For example, a widely used measure of liquidity is working capital (Bernstein, 1993).
Given the importance attached by credit grantors and investors to working capital as a measure of liquidity and solvency, financial analysts use three measures of the rate of activity in the three major working capital accounts, i.e., accounts receivable turnover, inventory turnover, and accounts payable turnover, in addition to the current ratio. The use of hypertext to represent these concepts makes their interdependency explicit, and provides additional linkages for accessing the concepts. The material contained in the deep explanations was augmented under the guidance of Dr. Joy Begley, a faculty member specializing in accounting. The intent was to make the explanations comprehensive and useful. In the previous version of FINALYZER, many deep explanations were very brief. The How type of explanation on financial ratios, in particular, usually consisted of only a single formula, which was not very useful. Thus, the How type of deep explanation was augmented more substantially than the Why type. Only minor changes were made to the Strategic type; these reflected the removal of two ratios that were rarely used and considered not well defined, and the addition of two ratios that were important and commonly used. The augmentation was done with care to maintain the basic characteristics of the How and Why types of explanation. The additional material was drawn from financial analysis textbooks (Bernstein, 1993; Foster, 1986; Fraser, 1988; Miller & Miller, 1991; Stickney, 1993), and from the expertise of Dr. Begley. Hyper-FINALYZER's explanations were implemented with hypertext features, for both representing and accessing deep knowledge. Figure C5 (Appendix C) shows a particular reasoning-trace How explanation in Hyper-FINALYZER. Concepts such as current ratio, acid-test ratio, and working capital are marked with circles (hypertext link markers). If the user selects current ratio first and then HOW, the user will see Figure C4.
The Why and Strategic types of explanation can be obtained in the same way. In Figure C4, domain concepts such as days sales in receivables and days to sell inventory can also be selected for further explanation. There were 53 domain concepts and 25 recommendations involved in the seven subanalyses performed by FINALYZER. Each domain concept had one How and one Why explanation. There was only one Strategic explanation for each subanalysis, common to all domain concepts involved in the subanalysis. Therefore, the total number of deep explanations was 113. Similarly, the total number of reasoning-trace explanations was 57. In summary, the two versions of the simulated KBS contained identical recommendations and explanations. The only difference was in the representation and accessibility of deep knowledge, based on the respective explanation provision methods. The use of hypertext to provide explanations made the deep knowledge more accessible in a contextualized manner, i.e., through recommendations, reasoning-trace explanations, and the explanations of other closely associated domain concepts, rather than merely through index screens before starting analysis and decision making. FINALYZER offered a linear structure and simplicity, whereas Hyper-FINALYZER provided a richer environment for explanation use. A tutorial KBS with hypertext-based explanations was also developed by modifying an existing one called Credit-Advisor. Thus, each version of the KBS had a corresponding tutorial system for training subjects.

5.3.2 Level of Expertise

The second independent variable relates to the level of task domain knowledge of KBS users. There is a continuum of expertise, regardless of the absolute levels of performance, and a relative definition of expertise is usually preferred. Patel and Groen (1991) provided a refined categorization of expertise with six levels, namely, layperson, beginner, novice, intermediate, subexpert, and expert.
Admittedly, the intermediate levels are hard to differentiate; there were inherent measurement difficulties in pinpointing the exact boundaries of the middle levels. Therefore, this study focused on the two extreme ends of the user expertise continuum. At the beginner end, the distinction between "novice" and "layperson" is important. Laypersons have little if any skill in making decisions in a given area. In comparison, novices may have considerable knowledge but lack experience. Many technical domains require prior knowledge for even minimal comprehension. For example, in chess, a novice is a beginner who knows the rules of chess, whereas a layperson does not. This study involved novice users, as opposed to laypersons. It would not make sense to have laypersons make judgments on the current liquidity and long-term solvency position of a company, because they simply would not be able to fully understand the task and related questions. The novice subject population comprised undergraduate students specializing in accounting, and MBA students who had taken accounting courses extensively or had an undergraduate degree in accounting. They had the general theoretical background of financial analysis, and were familiar with the terminological knowledge and basic procedures. They were "educated novices," with little or no working experience. At the expert end, self-designation or scores on short knowledge tests have been used as expert selection criteria in many studies; in other studies, job titles or academic credentials have been used. This study adopted the definition of expert given by Camerer and Johnson (1991): an expert is a person who is experienced at making predictions in a domain and has some professional or social credentials. Experts so defined would be called subexperts in Patel and Groen's (1991) terminology.
In this study, no special distinction was made between experts and extraordinary experts, i.e., experts acclaimed by peers in the same domain (cf. Shanteau, 1988). Expert subjects were recruited from experienced professionals whose work involved financial analysis. They possessed related professional qualifications beyond the undergraduate degree, such as the Certified General Accountant (CGA) and Chartered Financial Analyst (CFA) designations. It was clearly stated in the invitation (recruitment) letter that the required qualification was at least three years of post-qualifying working experience directly related to financial analysis.

5.4 Dependent Variables

Most of the dependent measures were adopted from prior empirical research on KBS use, and slightly modified to suit this study.

Decision Accuracy. Accuracy is an important issue in social judgment. According to Kruglanski (1989), the most prevalent definition of accuracy has been that of a correspondence between judgment and criterion. Funder (1987) suggested two external criteria for examining accuracy, namely, agreement with consensus (expert agreement), and the ability to predict current or future behaviour. The first is frequently used as a measure of judgment accuracy. In this study, decision accuracy was measured before and after using the KBS. The difference between the two measures was considered the improvement in decision accuracy attributable to users' task domain expertise and the use of explanations based on the different provision methods. Subjects made judgments regarding the experimental task with and without using the KBS. There were six judgmental questions, covering various aspects of the financial health of the hypothetical firm, in terms of current liquidity, capital structure, asset utilization and profitability, market valuation, financial management, and operating management (Appendix A). Each of these questions was answered on a single-item, ten-point, Likert-like scale.
For example, the question related to current liquidity was worded as follows:

Based on your analysis and under current economic and interest-rate conditions, rate Canacom's current liquidity position. Please circle the correct answer.

Very Weak Position: 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 :Very Strong Position

The accuracy of the six judgments was determined by comparing them with a pre-determined benchmark, which consisted of a set of "correct" consensus estimates agreed upon by a panel of five expert judges after a two-round Delphi process (Dhaliwal, 1993). Since each subject evaluated the case twice, before and after using the KBS, two sets of deviation scores were obtained. The improvement in decision accuracy could be assessed in two ways. First, the absolute deviations for the six evaluative judgments could be summed for each subject to yield an overall accuracy measure; the improvement in decision accuracy could then be taken as the overall pre-KBS score minus the post-KBS score, so that larger scores indicated larger improvements in decision accuracy. Alternatively, improvement in decision accuracy could be treated as a "latent variable" with multiple indicators. For each of the six questions, one indicator could be obtained from the difference between the pre- and post-KBS deviation scores. Each of these indicators thus reflects the improvement in decision accuracy in one particular aspect, making it possible to determine for which aspects the use of the KBS and explanations was more effective than for others. In either case, the improvement scores were preferred to the post-KBS scores as an indicator of the effects of explanation use, because the former took into account individual differences among the subjects.
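The overall scoring scheme just described can be sketched in a few lines. The benchmark and response values below are hypothetical placeholders; the actual Delphi-panel benchmark is not reproduced here:

```python
def overall_deviation(judgments, benchmark):
    """Sum of absolute deviations of the six judgments from the expert benchmark."""
    return sum(abs(j - b) for j, b in zip(judgments, benchmark))

# Hypothetical ten-point-scale responses for the six judgmental questions.
benchmark = [7, 5, 6, 4, 8, 6]   # panel consensus (illustrative values only)
pre_kbs   = [5, 4, 8, 6, 6, 5]   # a subject's judgments before using the KBS
post_kbs  = [6, 5, 7, 4, 7, 6]   # the same subject's judgments after using the KBS

# Improvement = pre-KBS deviation minus post-KBS deviation;
# larger scores indicate larger improvement in decision accuracy.
improvement = overall_deviation(pre_kbs, benchmark) - overall_deviation(post_kbs, benchmark)
print(improvement)
```

The latent-variable alternative would instead keep the six per-question differences as separate indicators rather than summing them.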
The measure of improvement in decision accuracy could be considered an indicator of the effectiveness of the KBS in convincing users, for two reasons: (1) The panel of five experts was involved in the development of not only the benchmark but also the experimental KBS, i.e., providing the basis for the recommendations and explanations. Because both the treatment (KBS) and the evaluation benchmark were based on the input of the same panel of experts, the evaluation was more a measure of the effectiveness of the treatment than an objective measure of absolute accuracy. (2) Given the high level of complexity and the subjective nature of the task, no single "correct" judgment existed. Two other, more general predictive questions were asked to make the case more challenging and realistic: subjects were asked to estimate the exact amount of the loan to be granted and the hypothetical firm's projected net income for the coming year. Because the KBS did not provide recommendations and explanations directly related to these two questions, answers to them were not included in the decision accuracy measure.

Perceived Usefulness of Explanations refers to the degree to which users perceived that using the explanations provided by the KBS enhanced their task performance. A decision was made to adopt instruments with high degrees of validity and reliability developed and used in recent IS studies of system usefulness (Moore & Benbasat, 1991; Davis, 1989). Dhaliwal (1993) developed a nine-item instrument based on these studies. Since the definition of usefulness in his research was similar to that used by Davis (1989) and to what Moore and Benbasat (1991) term "relative advantage," the items on these two instruments were combined and adapted. The same instrument was adopted for this research. Data were collected using seven-point Likert-type scales. Table 5.2 presents the items that comprised this instrument.

1. Using the explanations provided by FINALYZER improved the quality of the analysis I performed.
2. My understanding of financial analysis has been enhanced by the use of the explanations provided by FINALYZER.
3. Using the explanations provided by FINALYZER enhanced my effectiveness in completing the financial analysis task.
4. The explanations provided by FINALYZER had a significant impact on my judgments.
5. Using the explanations provided by FINALYZER enabled me to accomplish the financial analysis more quickly.
6. Using the explanations provided by FINALYZER made the financial analysis task easier to perform.
7. Using the explanations provided by FINALYZER gave me more control over the financial analysis task.
8. Using the explanations provided by FINALYZER increased my productivity.
9. Overall, I found the explanations provided by FINALYZER useful in analysing the financial statements.

Table 5.2 Items Used to Measure Perceived Usefulness of Explanations

Trust in KBS. Generally speaking, the trust that humans place in machines has not been systematically studied. Muir (1987) suggested that work done by psychologists and sociologists on interpersonal trust would be a good starting point for the study of trust between humans and machines as decision aids. Sociologists consider trust to be the degree of confidence one has in a relationship. For example, Barber (1983) conceptualized interpersonal trust in terms of three specific expectations: (1) one's general expectation of the persistence of the natural and moral social orders (e.g., natural laws will be constant, human life will survive); (2) one's expectation of technically competent role performance from those involved with us in social relationships and systems; and (3) one's expectation that partners in an interaction will carry out their fiduciary obligations and responsibilities.
Among these three expectations, the expectation of technical role performance is central to trust between humans and machines as decision aids (Muir, 1987). Barber (1983) further specified three types of technical competence that one can expect from another: expert knowledge, technical capability, and everyday routine performance. In the context of KBS applications, users' trust in a KBS may depend on the degree to which explanations make its technical competence transparent to the users. More specifically, users would be likely to determine their trust in a KBS by evaluating the level of expert knowledge and technical capability demonstrated by the KBS in its recommendations and explanations. Trust in KBS was measured in this study by adopting an instrument developed by Lerch et al. (1993), plus three additional items from McCroskey's (1985) scale on authoritativeness. Trust was defined as a generalized expectancy held by a user that recommendations provided by the KBS could be relied on (Lerch et al., 1993). The operationalization of this construct involved three dimensions: predictability, dependability, and faith. The first seven items presented in Table 5.3 were adopted from Lerch et al. The three additional items originating in McCroskey's authoritativeness scale were considered particularly relevant because they were specific indicators of the technical competence components of trust (Barber, 1983). Therefore, they were incorporated into the measure of trust (after modifying the subject words), as the last three items in Table 5.3.

1. FINALYZER behaves in a very consistent manner.
2. FINALYZER gives the same advice for the same situation over time.
3. FINALYZER provides good advice across different situations.
4. FINALYZER is dependable in important decisions.
5. When FINALYZER gives me unexpected advice, I am confident that the advice is correct.
6. FINALYZER helps users make good decisions.
7. I think users who have little expertise would trust the advice given by FINALYZER.
8. FINALYZER is a reliable source of knowledge for financial analysis.
9. FINALYZER has considerable knowledge of the factors involved in financial analysis.
10. The advice generated by FINALYZER could only have been provided by an expert in the industry.

Table 5.3 Items Used to Measure User Trust in the KBS

It is commonly believed that users of hypertext systems may experience disorientation and cognitive overload (Conklin, 1987). These problems might cause the hypertext version of the KBS to be perceived as slightly more difficult to use. Therefore, the ease of use measure used in the Dhaliwal (1993) study was also included in the questionnaire. The purpose was not to test any hypotheses, but to compare the level of difficulty involved in using the two experimental systems.

Use of Explanations. Information on the use of explanations was extracted from the computer logs of user-KBS interactions and from users' verbal protocols. The log files kept track of keystrokes and the time elapsed between keystrokes for each subject. A computer program, taking the computer logs as input, was written to extract the relevant information. The log files provided information on the number of each type of explanation requested, and noted the context in which deep explanations were requested (i.e., in the abstract versus in the context of analyses). Because the two experimental KBS had the same number of explanations, a direct comparison based on the number of explanations requested could be made to test various hypotheses. Explanations requested by each subject were categorized along two dimensions. First, explanations were classified into two general categories: deep explanations and reasoning-trace explanations.
Furthermore, since deep explanations could be accessed either through the index screen of relevant domain concepts or through hypertext links at different stages of KBS use, the use of deep explanations was also classified according to the context and means (modes) used to make the request. The purpose was to test the impact of hypertext on the number and context of explanation use. Second, each of the above categories of explanation use was further divided into Why, How, and Strategic explanations, the three most commonly available types of explanation. Such a categorization allowed an investigation of users' preferences for explanation types, and comparison between this research and previous studies. The detailed coding scheme is described in Table 5.4.

                                  WHY    HOW    STRATEGIC    TOTAL
DEEP EXPLANATIONS
  Index (Abstract) [1]
  Hypertext-Abstract [2]
  Hypertext-Contextualized [3]
REASONING-TRACE EXPLANATIONS

Note:
1. Index (Abstract) refers to the number of deep explanations requested through index screens of domain concepts, prior to data analysis and KBS recommendations.
2. Hypertext-Abstract refers to the number of deep explanations requested via hypertext links originating from index screens ONLY, prior to data analysis and KBS recommendations.
3. Hypertext-Contextualized refers to the number of deep explanations requested via hypertext links originating from data screens, KBS recommendations, and reasoning-trace explanations, in the context of analysis.

Table 5.4 Classification of Explanation Use

Each explanation could be retrieved more than once. Thus, two sets of statistics were generated. In one case, if a particular explanation was viewed more than once, each viewing was counted. In the other case, repeated use was counted only once, to reflect the degree of coverage of all the explanations available. As it turned out, the number of cases of multiple use of the same explanation was very small.
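The coding scheme of Table 5.4, together with the two counting conventions, can be sketched as a simple tally over log records. The record format and labels below are illustrative assumptions, not the actual log file format:

```python
from collections import Counter

# Hypothetical explanation requests extracted from one subject's log:
# (category, explanation_type, explanation_id) following the Table 5.4 scheme.
requests = [
    ("index-abstract", "HOW", "current ratio"),
    ("index-abstract", "WHY", "current ratio"),
    ("hypertext-contextualized", "HOW", "working capital"),
    ("hypertext-contextualized", "HOW", "working capital"),   # repeated view
    ("hypertext-abstract", "STRATEGIC", "liquidity analysis"),
    ("reasoning-trace", "WHY", "liquidity recommendation"),
]

# First convention: every request counted, including repeated views,
# tallied into the cells of Table 5.4 (category x type).
per_cell = Counter((cat, typ) for cat, typ, _ in requests)

# Second convention: repeated use of the same explanation counted once,
# reflecting coverage of the available explanations.
coverage = len(set(requests))

print(per_cell[("hypertext-contextualized", "HOW")])
print(coverage)
```

The same tally, run on both experimental groups, supports the direct between-group comparisons described above.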
Therefore, the data analyses presented in this chapter were based on the former case. The underlying assumptions were that if an explanation was requested repeatedly, it was needed more than once for the task, and that subjects working under a time constraint would not do things unnecessarily. In previous empirical studies of explanations (e.g., Dhaliwal, 1993; Ye, 1990), users' requests were used as a surrogate measure of the cognitive use of explanations: if a user selected a particular explanation for viewing, it was assumed to be cognitively used. In this study, the assessment of explanation use was expanded to include the nature of explanation use, in addition to the number of explanations used and the context of use. Verbal protocol data were used to interpret why users requested explanations and how explanations were used in understanding KBS advice and reasoning (e.g., to reduce the number of memory reinstatements and bridging inferences). A certain proportion of subjects in each treatment condition were asked to verbalize their thoughts throughout the whole process of KBS use and decision making (the experimental procedures are reported in detail in Chapter 7). The verbal protocols were tape recorded. The analysis was conducted mainly through scanning and scoring, to tabulate the major categories of reasons for explanation use and the frequencies in each category.

5.5 Experimental Design

A 2 x 2 factorial design was used, as shown in Table 5.5, to investigate the research questions. The explanation provision method factor had two levels: hypertext-based versus conventional lineartext-based. The task domain knowledge factor comprised expert users versus novice users. The four treatment cells were obtained by crossing the two levels of explanation provision method with the two levels of user expertise.
There was a separate treatment group for each cell, which resulted in a between-subjects design: both the process measures of explanation use and the outcome measures were between-subjects. A key dependent variable, decision accuracy, was measured before and after using the KBS; the difference between the two measures was considered the improvement in decision accuracy resulting from the use of explanations based on the different explanation provision methods. With this design, most of the research hypotheses were tested in a between-subjects manner. The exceptions were hypotheses related to user preferences for explanation types (Why/How/Strategic), which applied to both the hypertext and lineartext groups, and hypotheses related to the context and means of explanation requests, which applied to the hypertext groups only; these two classes of hypotheses were tested within subjects. Groups 1 and 3 served as the control groups for the investigation of the impact of hypertext, while groups 1 and 2 served as the control groups for that of user domain expertise.

                              User Task Domain Knowledge
                              Novice Users    Expert Users
Explanation Provision Method
  Lineartext-based            Group 1         Group 3
  Hypertext-based             Group 2         Group 4

Table 5.5 The Experimental Design

5.6 Subjects

The sample size was pre-determined using power analysis (Cohen, 1988). The minimum number of subjects in each cell was calculated so as to detect a large main effect (f = .40) at a significance level of 0.05 (alpha) with a power level of 80%. The calculations showed that thirteen subjects were necessary for each cell. Due to the high density of protocol data and the amount of resources required for protocol analysis, only five subjects in each cell were asked to provide verbal protocols. Altogether, 52 was the target number of subjects needed to complete the experiment. Several constraints made a larger sample size infeasible, e.g., the scarcity of domain experts and the limited resources available for this research.
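The sample-size figure can be approximated with a standard normal-approximation formula; the sketch below illustrates the logic rather than reproducing the exact noncentral-F computation behind Cohen's (1988) tables. For a two-level main effect, Cohen's f = .40 corresponds to a standardized mean difference of d = 2f = .80:

```python
import math
from statistics import NormalDist

# Normal-approximation sample size per level for a two-group comparison:
#   n per level = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2
f = 0.40                 # large effect size (Cohen, 1988)
d = 2 * f                # equivalent standardized mean difference for two levels
alpha, power = 0.05, 0.80

z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # approx. 1.96
z_power = NormalDist().inv_cdf(power)           # approx. 0.84

n_per_level = math.ceil(2 * ((z_alpha + z_power) / d) ** 2)
print(n_per_level)
```

This approximation gives 25 subjects per factor level; the exact small-sample calculation raises this to 26 per level, i.e., 13 per cell and 52 subjects in total, matching the figures reported above.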
Subjects were recruited by sending an information package to the targeted subject populations. The information package included an introduction to the general background of the study along with a description of the research procedure, and invited participation. Participation in this experiment was completely voluntary. Two distinct information packages were developed for the two groups of subjects (Appendix D includes all materials used for recruiting subjects). These packages comprised an information sheet describing the study and specifying the criteria for participation and the tasks involved. Novice subjects were Commerce undergraduate students specializing in accounting, or MBA students who had an accounting background or had taken accounting courses extensively. The other criterion was that they should have taken at least one course in financial analysis. Thus, subjects had roughly equivalent amounts of domain knowledge, but all lacked work experience; they resembled entry-level employees in financial analysis positions. Four hundred copies of the information package for novices were distributed to prospective subjects: 200 to undergraduate students and 200 to MBA students. Forty-three individuals agreed to participate by returning the invitation form to the researcher. Some dropped out later due to scheduling conflicts. Eventually, 32 individuals participated, three of them as pilot subjects. Among these participants, 20 were undergraduate students and 12 were MBA students. None of them belonged to any professional associations, and only four had short-term work experience related to financial analysis. Information packages for expert subjects were distributed through two professional societies. A total of 310 copies of the package were mailed to members of the Vancouver Society of Financial Analysts (VSFA), the prime target subject population.
With the endorsement of the executives of the Society, the package was sent as part of their regular newsletter. Sixteen members agreed to participate. The response rate was low, likely because this was the second study of this nature in two years. Since there were not enough participants from the VSFA, the cooperation of the local chapter of the Certified General Accountants of Canada (CGA) was sought. A total of 800 copies of the information package were sent to all the CGAs in the province of British Columbia. Twenty-six individuals responded, and those located within the Greater Vancouver area were scheduled to participate. Eventually, 26 expert subjects participated in the study: 12 were members of the VSFA and the other 14 were CGAs. On average, they had 9.6 years of work experience related to financial analysis. Because judgment accuracy was a primary dependent variable, monetary prizes were offered to ensure that every subject was doing his or her best. This decision was based on prior research findings that monetary incentives resulted in improvements in the accuracy of probability and frequency assessments, particularly on realistic judgment tasks that are reasonably complex (Wright & Aboul-Ezz, 1988). Incentives were awarded within each treatment group for better performance in terms of decision accuracy. Subjects were told in the invitation letter that a $50 prize would be awarded to the top 20% of individuals participating under similar circumstances, based on judgment accuracy scores. In addition to the awards, novice subjects were also promised an honorarium of $15 for their participation, based on the expectation that on average the experiment would take approximately two hours to complete. Expert subjects were not offered this payment, as the amount was not significant enough to encourage their participation or even to reimburse them for their time.
5.7 Experimental Procedures

Each novice and expert subject was randomly assigned to one of the two treatment groups with different explanation provision methods. The experiment was administered individually, with only the experimenter and the participant in the room. The details of the experimental procedures are described as follows.

Pre-experimental Phase. Upon arrival, subjects were given the same information sheet included in the recruiting information package. They had a chance to read the information sheet again before signing a consent form. All subjects signed the consent form, which also requested that they not reveal details of the experiment to anyone who might also participate in the study. They filled out a background information questionnaire (Appendix D) with respect to their computer literacy, attitudes towards computer applications (Kay, 1990), and familiarity with KBS. Subjects then read a one-page sheet of general information describing the objectives of the study, which were as follows: (1) to evaluate the use of a financial analysis expert system to complete a loan analysis case; and (2) to evaluate the judgments made with the assistance of such a system. Subjects were not told that explanation use was the focus of the research. The main points of the information sheet were also summarized verbally to the subjects, to ensure correct understanding. Subjects were informed that time was not a factor in the study and that they should take as long as they wished at any stage. However, subjects were also told about the optimal allocation of their time if they wished to complete the study in two hours. Prior to the experimental task, all subjects completed two tutorials pertaining to their treatment condition. This was to ensure that they were comfortable with using the experimental systems, and to eliminate possible novelty effects.
The first tutorial focused on the use of a mouse as the input device, and on the two types of buttons - push-buttons and radio-buttons (Borland International, 1991) - that the user needed to click on to interact with the experimental system. Almost all of the subjects had used a mouse before and went through this tutorial very quickly. Only a couple of them had never used a mouse, and they spent about five minutes learning to do so. The second tutorial related to the use of expert systems. Subjects were given a one-page description of expert systems. It defined in simple terms what expert systems were, briefly explained how they were developed, and illustrated what would be expected from such a system in terms of recommendations and explanations. Definitions of the various types of explanation - Why, How, and Strategic, for both deep and reasoning-trace explanations - were also included in the information sheet and explained verbally to subjects. Subjects were told to keep the definitions in front of them for reference when using the KBS. Subjects were subsequently given a step-by-step instruction sheet on the use of the CREDIT-ADVISOR tutorial expert system, a simulated KBS for evaluating consumer credit applications. Two versions of the tutorial system were developed, each of which had the same type of user interface (screens and procedures) as the corresponding experimental KBS. Subjects were told that the tutorial system had the same features as FINALYZER, the system to be used later in the experiment. Because the tutorial KBS was much smaller in scale than the experimental KBS, users were encouraged to try out as many of the explanations offered by the tutorial KBS as possible, so as to ensure a thorough understanding of each type of explanation and proficiency in using explanations.
Furthermore, they were instructed to spend as much time as they wished trying out the tutorial system, until they felt completely comfortable and confident in its use. This tutorial was critical to minimize the novelty effect that might unduly influence the request and use of explanations at a later stage. No problem-solving case was imposed upon the subjects at this stage, to ensure that they had ample opportunity to satisfy their curiosity before data collection started. The experimenter monitored the process and reminded the subjects to use, and familiarize themselves with, all three types of explanation. Subjects were also encouraged to refer back to the one-page description of expert systems (placed in front of them) for definitions of the various types of explanation while they were using the tutorial KBS. Subjects were then given the description of the Canacom Corporation Loan Analysis Case, along with the printout of the five-year financial statements and pre-calculated common-size statements, and tables containing the financial ratios of the Canacom company (Appendix A). They were told to familiarize themselves with the case and to write down their answers to the judgment questions on a separate set of judgment recording sheets. A financial calculator, a pencil, and scratch paper were also provided. A previous study (Dhaliwal, 1993) revealed that this manual analysis was essential for subjects to become familiar with the details of the task and data, to become engaged in the analysis, and to acquire a clear expectation of the type of judgment to be made. Once again, subjects were given as much time as they needed.

Experimental Phase. Upon finishing the manual analysis, subjects were told to use the experimental system (pertaining to their treatment condition) to help them re-analyze the case and make the judgments again.
They were given a new set of judgment sheets, and could still look at the printout of the financial statements and financial ratio tables. The experimenter sat unobtrusively beside the subject, three feet away, to provide help with potential technical problems as required. Subjects were asked to go through all seven subanalyses, in any order. Throughout this stage the experimenter did not intervene in the use of the system for most of the subjects; however, a few subjects had to be reminded not to reverse the agreement rating scale. There was no mention of the explanations provided by the system, and the subjects were free to use as many or as few as they wished. In one case, a subject in the hypertext group did not use any hypertext feature; still, there was no intervention to encourage the use of hypertext-based explanations. In another case, an expert subject appeared to make a couple of attempts to request explanations without success, then gave up. This was probably caused by cognitive overload, as he was also asked to provide verbal protocols. When questioned after the experiment was completed, he claimed that he was rushing, although he appeared to be tense and very concerned about his performance. Subjects who were assigned to provide thinking-aloud protocols were given additional training. Training materials and the experimental procedure related to thinking-aloud, i.e., the collection of verbal protocol data, are described in Chapter 7. In this experiment, the use of the KBS was more realistic than in the previous study involving FINALYZER (Dhaliwal, 1993). In that study, subjects were required to provide an agreement rating after receiving each recommendation, and the reasoning-trace explanations could not be requested until the agreement rating was provided. Furthermore, subjects were not allowed to change their agreement ratings after viewing explanations (some novice subjects had tried to do so).
That restriction was necessary because users' agreement was investigated as a variable moderating the request for explanations. The restriction was removed in this study. Although agreement rating was not investigated as an independent or control variable, subjects were still required to provide the rating, simply to increase the amount of thought and attention involved. Subjects were allowed to change their agreement rating after viewing explanations, which was also instrumental in stimulating verbal protocols (see details in Chapter 7).

Post-experimental Phase. Upon completion of the experimental task, a post-experiment questionnaire (Appendix A) was administered to measure user perceptions of the usefulness of explanations and trust in the KBS. Also included were measures of other secondary constructs that served as manipulation checks, e.g., the ease of use of the KBS. Finally, subjects were debriefed about the manipulation involved in the study - the use of a simulated KBS rather than a real one, and the focus of the study being on explanation use rather than system evaluation. Subjects were presented with a one-page debriefing protocol (Appendix A). They were also told the reasons this information had not been revealed to them earlier. Novice subjects were paid the honorarium of $15 at this stage. At each of the above stages, subjects were allowed as much time as they needed, and the computer recorded every keystroke they made. The total elapsed time between the arrival of a participant and the completion of the debriefing typically ranged between one and a half and two hours, with the average being slightly less than two hours.

CHAPTER 6. DATA ANALYSIS (Part I)

The results of the data analysis are reported in two chapters, each of which concentrates on one of two distinct types of data: quantitative measures of process and outcome variables, and qualitative verbal protocol data on the rationale for explanation use.
This chapter presents statistical analyses of explanation use measures, such as the number, context, and type of explanation requests, and of outcome measures, including improvement in decision accuracy and user perceptions. Chapter 7 reports on the analysis of verbal protocol data. A variety of statistical tools were used to analyze the process measures, ranging from analysis of variance (ANOVA) and contingency tables (crosstabulation) to log-linear modelling. The selection of statistical tools was based on the nature of the data, the hypotheses tested, and the underlying assumptions of the statistical models. For example, the determinants of the use of explanations were analyzed by ANOVA, to compare the average number of explanations used by subjects in the various treatment conditions. The effects of hypertext on the context of deep explanation use and on preference for explanation types were analyzed using contingency tables and log-linear models, because the analysis was based on frequency data from various classifications (Wickens, 1989). Structural, or causal, equation modelling was performed to analyze the outcome measures, as well as the overall research model. This integrated approach allowed the determinants of explanation use and the impact of explanation use to be analyzed together in one model. Thus, the overall predictive power of the research model could be examined while minimizing the overall measurement error involved.

This chapter is organized as follows. The first section details the data screening process used to identify outliers and potential violations of key assumptions of the various statistical models, along with the corresponding corrective transformations. Section 6.2 describes the effects of explanation provision methods and domain expertise on the number and context of explanation requests.
Section 6.3 reports on details of the structural equation modelling, with a focus on the effects of explanation use, while Section 6.4 provides supplementary analyses of the effect of hypertext on user preference for explanation types. Lastly, Section 6.5 provides a summary of the research findings.

6.1 Data Screening Prior to Analysis

Fifty-five subjects participated in this study; however, data for two of them could not be used. In one case, an expert subject who was asked to provide verbal protocols experienced some technical difficulty in requesting explanations. After trying to get explanations twice without success, he gave up and finished the experiment without using any at all. Thus, his data was not relevant for the analysis of explanation use. In the other case, a novice subject used a total of 63 deep explanations, which resulted in a z-score of 4.33. Because a z-score of this magnitude was well beyond the p = .001 criterion of 3.67 (2-tailed) (Tabachnick & Fidell, 1989), this case was deemed too extreme a departure from "normal" KBS use and was removed from further analysis. Consequently, data from 53 subjects was analyzed. Twenty-eight of the subjects were novices, and 25 were experts; 28 had access to hypertext-based explanations, and 25 used lineartext. Before ANOVA was run on the explanation request measures, with the number of deep explanations (DE) or reasoning-trace explanations (RTE) as dependent variables, the data was first checked for adherence to statistical assumptions. Table E1 in Appendix E shows that the normal distribution assumption was violated in the case of DE use. A number of transformation techniques were attempted; the square-root transformation was most effective in reducing the skewness and kurtosis. The post-transformation distribution is displayed in Table E2. The same problem was found with the number of RTE, so the same transformation was performed. The pre- and post-transformation distributions are shown in Tables E3 and E4.
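The screening steps described above - flagging extreme cases by standardized score and reducing positive skew in count data with a square-root transformation - can be sketched as follows. The counts below are made-up stand-ins, since the raw data are not reproduced here; only the |z| = 3.67 cutoff comes from the text.

```python
import numpy as np
from scipy import stats

# Hypothetical deep-explanation (DE) counts for 28 subjects;
# the last value mimics an extreme case like the one described above.
de_counts = np.array([6, 3, 8, 5, 12, 7, 4, 9, 2, 10, 6, 5, 8, 11,
                      3, 7, 9, 4, 6, 8, 5, 10, 7, 3, 6, 9, 4, 63])

# Outlier screening via standardized (z) scores: cases beyond the
# p = .001, 2-tailed criterion of |z| = 3.67 are candidates for removal.
z = (de_counts - de_counts.mean()) / de_counts.std(ddof=1)
keep = np.abs(z) <= 3.67
screened = de_counts[keep]

# Square-root transformation to reduce the positive skew typical of count data.
transformed = np.sqrt(screened)
print("cases removed:", int((~keep).sum()))
print("skewness before:", round(stats.skew(screened), 2))
print("skewness after: ", round(stats.skew(transformed), 2))
```

With real data, the post-transformation skewness and kurtosis would be checked against tables like E2 and E4 before proceeding to ANOVA.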
The homogeneity of variance assumption was also examined for the two explanation use measures, through two SPSS/PC+ tests. The statistics for the pre- and post-transformation results are presented in Table E5. The significance levels for DE before transformation were very small, indicating violation of the equal variance assumption. After the transformation, the significance levels were much larger. Although the variances remained somewhat different, the sample sizes in all groups were similar. This was not a major concern, since the ANOVA test is not particularly sensitive to violations of equality of variance under such conditions (Norusis, 1983, p. 113). There were no such problems with the RTE data. Improvement in decision accuracy was one of the key dependent variables in this study. Table 6.1 shows the average combined deviation scores for the six judgments made before using the KBS (compared to the benchmark), and the improvement scores after using the KBS. Contrary to what was expected, the average deviation score of experts was not significantly different from that of novices. A rationalization of this unexpected result was sought in the literature, where it turned out not to be particularly surprising. In general, prior research paints a rather bleak picture of the decision-making ability of experts (Shanteau, 1988). For example, Camerer and Johnson (1991) claimed that "experts know a lot but predict poorly." It was observed that the judgments of expert grain judges were frequently invalid and unreliable: nearly one-third of wheat samples were found to be misgraded, and when judged a second time, more than one-third of the samples were graded differently. It was also found that greater experience increased the confidence of judges, but was not necessarily related to the accuracy of grain inspection. Similar findings have been reported for other types of expert, such as medical doctors (Einhorn, 1974) and clinical psychologists (Goldberg, 1959).
One frequently cited reason for the low level of expert performance is that experts reportedly rely on heuristics in making judgments, which often leads to biases or judgmental errors relative to normative standards. It was found in many fields that training had some effect on accuracy, but expertise had almost none (see Camerer & Johnson, 1991, for a detailed review). Although findings on the cognitive limitations of expertise have been replicated, it is still believed that many experts are nonetheless able to make competent judgments (Shanteau, 1988).

                             Novices          Experts          t-value             Overall
Before KBS (Total Score)     9.96 (n = 28)    10.48 (n = 25)   t = -.48, p = .63   10.21 (n = 53)
Lineartext (Improvement)     0.54 (n = 13)    -0.08 (n = 12)   t = .35, p = .73    0.24 (n = 25)
Hypertext (Improvement)      2.13 (n = 15)    0.85 (n = 13)    t = .83, p = .41    1.54 (n = 28)
Overall (Improvement)        1.39 (n = 28)    0.40 (n = 25)    t = .85, p = .40    0.92 (n = 53)

Table 6.1 Average Scores of Improvement in Decision Accuracy

Since this study was concerned with the effects of different explanation provision methods, rather than the relationship between expertise and performance, attention was focused on the improvement in decision accuracy as a result of KBS use. Initially, the aggregated score (the sum of six indicators) of improvement in decision accuracy was to be used as the dependent variable in a number of ANOVAs. Table E6 in Appendix E shows the frequency table of the improvement in decision accuracy scores. Because the distribution was so different from a normal one, a contour of the bell shape could not be drawn by SPSS/PC+, and no suitable transformation was found. This indicated that statistical analysis techniques that are not stringent about the normal distribution assumption should be used, such as certain types of structural equation modelling. The analysis of the psychometric properties of the user perception measures was integrated into the structural equation modelling, reported in Section 6.3.
The reliability and validity of the user perception measures were examined using the conventional approach (based on Cronbach's alpha and factor analysis), and are reported in Appendix F as a supplementary analysis. The relationships between subjects' background data and the dependent variables were examined using a correlation coefficient matrix to identify potential confounding effects. No substantial correlation was found, so there was little to gain from including covariates in the analysis. In particular, there was no significant difference in user perceptions of the ease of use between the two experimental systems (t = -0.22, p = .82).

6.2 Analysis of Explanation Use

This section reports the effects of explanation provision methods and expertise on the use of explanations, in terms of the number and context of explanation requests.

6.2.1 Determinants of Explanation Use

All hypotheses are re-stated in the null form in this chapter. H1a0 and H3a0 are related to the determinants of DE use, in terms of main and interaction effects, respectively.

H1a0: Users with access to hypertext-based explanations will use the same number of deep explanations as users with access to lineartext-based explanations.

H3a0: There will be no interaction effect between user domain expertise and explanation provision methods on the number of deep explanations requested.

Summary data on DE use are presented on the left side of Table 6.2, without transformation, for easier interpretation. Transformed data were used for ANOVA, and the results are presented on the right side. As expected, H1a0 was rejected. Both explanation provision method and domain expertise significantly influenced the number of DE used. However, H3a0 could not be rejected. There was no evidence of an interaction effect, which implied that hypertext was equally effective for increasing the number of requests by both novices and experts.
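The two-way ANOVA structure used for these tests (explanation provision method x expertise) can be sketched as follows. The data below are synthetic, balanced, and deterministic, with made-up cell means chosen purely to illustrate the computation; the actual analysis used the square-root-transformed counts and unequal cell sizes.

```python
import numpy as np
from scipy.stats import f as f_dist

# Synthetic data for a balanced 2x2 design; the cell means (6, 2, 16, 12)
# and the noise pattern are invented for illustration, NOT the study's data.
n = 10
noise = np.array([-2., -1., 0., 1., 2.] * 2)   # same pattern in every cell, mean 0
cells = {("lineartext", "novice"): 6 + noise,
         ("lineartext", "expert"): 2 + noise,
         ("hypertext", "novice"): 16 + noise,
         ("hypertext", "expert"): 12 + noise}

y = np.concatenate(list(cells.values()))
grand = y.mean()

def level_mean(factor_idx, level):
    data = [v for k, v in cells.items() if k[factor_idx] == level]
    return np.concatenate(data).mean()

# Sums of squares for a balanced two-way ANOVA
ss_method = 2 * n * sum((level_mean(0, m) - grand) ** 2 for m in ("lineartext", "hypertext"))
ss_expert = 2 * n * sum((level_mean(1, e) - grand) ** 2 for e in ("novice", "expert"))
ss_cells = n * sum((v.mean() - grand) ** 2 for v in cells.values())
ss_inter = ss_cells - ss_method - ss_expert
ss_error = sum(((v - v.mean()) ** 2).sum() for v in cells.values())

df_err = 4 * (n - 1)
mse = ss_error / df_err
for name, ss in [("method", ss_method), ("expertise", ss_expert), ("interaction", ss_inter)]:
    F = ss / mse                  # each effect has 1 degree of freedom in a 2x2 design
    p = f_dist.sf(F, 1, df_err)
    print(f"{name}: F = {F:.2f}, p = {p:.4f}")
```

With identical noise in every cell, the interaction sum of squares is exactly zero here, mirroring the "Sig. of F" layout of Tables 6.2 and 6.3.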
              Novices                  Experts
Lineartext    μ = 6.62, σ = 5.06      μ = 2.42, σ = 1.83
              (n = 13)                 (n = 12)
Hypertext     μ = 17.33, σ = 10.55    μ = 9.85, σ = 11.48
              (n = 15)                 (n = 13)

                                        F        Sig. of F
Main Effects:        Expertise (E)      8.64     .005
                     Methods (M)        17.72    .000
Interaction Effect:  E x M              .05      .829

Table 6.2 Statistics for the Number of Deep Explanation Use

Hypotheses related to the determinants of RTE use include H1b0 and H3b0, in terms of main and interaction effects, respectively.

H1b0: Users who have access to hypertext-based explanations will use the same number of reasoning-trace explanations as users with access to lineartext-based explanations.

H3b0: There will be no interaction effect between user domain expertise and explanation provision methods on the number of reasoning-trace explanations requested.

Table 6.3 displays the pre-transformation data on the left side, and the results of ANOVA based on the transformed data on the right side.

              Novices                  Experts
Lineartext    μ = 18.07, σ = 12.43    μ = 16.67, σ = 14.33
              (n = 13)                 (n = 12)
Hypertext     μ = 16.33, σ = 11.84    μ = 15.62, σ = 12.64
              (n = 15)                 (n = 13)

                                        F        Sig. of F
Main Effects:        Expertise (E)      .12      .728
                     Methods (M)        .21      .649
Interaction Effect:  E x M              .05      .831

Table 6.3 Statistics for the Number of Reasoning-Trace Explanation Use

Surprisingly, neither H1b0 nor H3b0 could be rejected. It was expected that, when hypertext was used to represent and access deep knowledge, RTE might become more useful, and therefore be used more frequently. However, the ANOVA results clearly indicated that neither domain expertise nor hypertext made a difference in terms of the number of RTE used.

6.2.2 Effects of Hypertext on the Context of Deep Explanation Use

It was expected, given hypertext-based DE, that users would be more likely to request DE in the context of problem solving than in the abstract (prior to problem solving).
H20: Users with access to hypertext-based explanations will request fewer deep explanations in the context of problem solving than in the abstract (i.e., requesting deep explanations prior to problem solving).

Although some differences were expected between experts and novices in the preference for the context of explanation requests, there was no theoretical ground or prior empirical evidence to allow for a prediction. Therefore, no hypothesis was proposed in this regard, and statistical analyses were performed as an initial exploration. Contingency table analysis, or frequency comparison, was particularly appropriate in this case, because the data was represented as nominal scales or unordered categories. In fact, the most frequent application of chi-square in behavioral science is in contingency table analysis or frequency comparisons (Cohen, 1988). The analysis can be viewed as a test of the equality of two or more distributions over a set of two or more categories. Because it compares entire distributions rather than parameters (means, variances) of distributions, the chi-square test is a non-parametric test and is relatively free of constraining assumptions, other than the need to avoid very small hypothetical frequencies (Hays, 1981). The SPSS/PC+ crosstabulation procedure - CROSSTABS - was applied in the data analysis. Each column in Table 6.4 represents the total number (frequency) of explanations requested by all subjects. Forty-nine percent of the DE were requested through index screens prior to analysis, and the rest were requested using hypertext (Hypertext-Abstract plus Hypertext-Contextualized). Pearson's chi-square was only weakly significant, despite the large difference between novices and experts in the way hypertext was used. The significance of the chi-square test was apparently weakened by the similarity between experts and novices in the proportion of DE requested via index screens.
Since approximately the same proportion of DE was requested through index screens by both novice and expert subjects, further analyses focused on the use of hypertext for requesting explanations in the abstract versus in the context of analysis.

Explanation Type              Novices(1)      Experts        Total
Index (Abstract)              129 (49.6%)     61 (47.7%)     190 (49.0%)
Hypertext-Abstract            44 (16.9%)      12 (9.4%)      56 (14.4%)
Hypertext-Contextualized      87 (33.5%)      55 (43.0%)     142 (36.6%)
Total                         260 (67.0%)     128 (33.0%)    388 (100%)

Statistics                    Value    DF    Significance
Pearson's chi-square          5.57     2     .06

Note 1: the number of novices is greater than that of the experts by three (28 versus 25).

Table 6.4 Difference Between Experts and Novices in Means and Context of DE Use

Table 6.5 shows that experts were more likely than novices to use hypertext to request explanations in the context of decision making (82.1% versus 66.4%), whereas novices were more likely than experts to follow hypertext links to explore related domain concepts in the abstract. Pearson's chi-square was significant (p = .02), indicating that the preferred context of hypertext use was associated with domain expertise, i.e., experts were more likely to be engaged in the contextualized use of DE via hypertext. Although the above contingency table analyses were effective for testing the effects of hypertext and expertise on the preferred means and context of explanation requests, they provided no statistical estimates of the treatment effects. In the following paragraphs, results of log-linear modelling are reported to provide those estimates.

Explanation Type              Novices         Experts        Total
Hypertext-Abstract            44 (33.6%)      12 (17.9%)     56 (28.3%)
Hypertext-Contextualized      87 (66.4%)      55 (82.1%)     142 (71.7%)
Total                         131 (66.2%)     67 (33.8%)     198 (100%)

Statistics                    Value    DF    Significance
Pearson's chi-square          5.37     1     .02

Table 6.5 Difference Between Experts and Novices in Using Hypertext for Requesting DE
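The Pearson chi-square for the association between expertise and the context of hypertext use can be reproduced directly from the Table 6.5 frequencies; the sketch below is a modern equivalent of the SPSS/PC+ CROSSTABS computation.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed frequencies from Table 6.5
# (rows: Hypertext-Abstract, Hypertext-Contextualized; columns: novices, experts).
observed = np.array([[44, 12],
                     [87, 55]])

# Pearson's chi-square without Yates' continuity correction,
# which matches the statistic reported in the table.
chi2, p, dof, expected = chi2_contingency(observed, correction=False)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.3f}")  # chi-square = 5.37, df = 1, p = 0.020
```

The expected frequencies returned alongside the statistic confirm that no cell is small enough to threaten the validity of the test.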
Log-linear models are similar to multiple regression models, but are a special class of statistical techniques formulated to analyze categorical data. These models are useful for unveiling potentially complex relationships among the variables in a multiway crosstabulation, where all variables used for classification are independent variables, and the dependent variable is the frequency in a cell of the cross-classification. Neither analysis of variance nor multiple regression is appropriate for such categorical data, because the observations are not from populations of normal distributions with constant variance (Norusis, 1983). The statistical model used is the following(1):

    log(Expn_ij) = μ + λ_i^X + λ_j^E + λ_ij^XE

where Expn_ij is the observed frequency of DE use in cell (i, j); λ_i^X is the effect of the ith means of explanation use; λ_j^E is the effect of the jth expertise level; and λ_ij^XE is the corresponding interaction between explanation type and expertise.

Table 6.6 presents the results of the log-linear modelling, using the SPSS/PC+ HILOGLINEAR procedure, on the data in Table 6.4. In terms of the main effects, DE were most frequently requested through the index screens. Therefore, hypothesis H20 could not be rejected. Part of the reason could be that the sequential presentation of the DE index screen, data, recommendations, and RTE increased the tendency to use the index for requesting explanations. The index screen was always presented to users as the first source for DE; therefore, to a certain extent, users might have been primed to use it. Despite this potential priming effect, subjects did take advantage of hypertext to request DE in the context of analysis. Contextualized use of DE was clearly the second most preferred means of requesting DE. The result was significant at an alpha level of .05, since the 95% confidence interval (CI) was positive and did not include zero. The effect of expertise on DE use was consistent with the findings in Section 6.2.1.
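For a saturated log-linear model on a 2x2 table, the lambda parameters are effect-coded contrasts of the log cell counts, and the Table 6.7 estimates can be recovered directly from the Table 6.5 frequencies. Adding 0.5 to every cell before taking logs mirrors what is believed to be the HILOGLINEAR procedure's default delta adjustment; that adjustment is an assumption here, not something stated in the text.

```python
import numpy as np

# Cell counts from Table 6.5
# (rows: Hypertext-Abstract, Hypertext-Contextualized; columns: novices, experts).
counts = np.array([[44., 12.],
                   [87., 55.]])

# Assumed delta of 0.5 added to each cell before estimation, then effect-coded
# contrasts of the log counts give the saturated-model lambda parameters.
ell = np.log(counts + 0.5)
mu = ell.mean()
lam_means = ell.mean(axis=1) - mu        # row effects: HA, HC
lam_expertise = ell.mean(axis=0) - mu    # column effects: novices, experts
lam_inter = ell - mu - lam_means[:, None] - lam_expertise[None, :]

print("lambda (HA, HC):        ", np.round(lam_means, 3))      # approx [-0.542, 0.542]
print("lambda (novice, expert):", np.round(lam_expertise, 3))  # approx [0.431, -0.431]
print("lambda (HA x novice):   ", round(float(lam_inter[0, 0]), 3))  # approx 0.204
```

Note how the zero-sum constraints described in the footnote hold by construction: each set of lambda parameters sums to zero across the categories of its variable.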
Following hypertext links in the abstract (prior to problem solving) was the least frequently used alternative for requesting explanations. This finding implies that the main priorities of the subjects were making judgments and decisions; browsing the domain knowledge base was at best a secondary priority. Although following hypertext links in the abstract was not an overall preferred means of requesting DE, the analysis of interaction effects indicated that novices were more likely than experts to use this function.

Parameter                        Coefficient   Std. Err.   Z-Value   Lower 95 CI   Upper 95 CI
EXPERTISE:
  Novices (N)                    .412*         .066        6.25      .283          .541
  Experts (E)                    -.412*        .066        -6.25     -.541         -.283
MEANS:
  Index (I)                      .526*         .080        6.61      .370          .682
  Hypertext-Abstract (HA)        -.805*        .113        -7.09     -1.027        -.582
  Hypertext-Contextualized (HC)  .279*         .139        2.01      .007          .550
MEANS x EXPERTISE:
  I x N                          -.039         .080        -0.49     -.195         .117
  HA x N                         .223*         .113        1.97      .001          .446
  HC x N                         -.184         .139        -1.33     -.456         .088
  I x E                          .039          .080        0.49      -.117         .195
  HA x E                         -.223*        .113        -1.97     -.446         -.001
  HC x E                         .184          .139        1.33      -.088         .456
(* p < .05)

Table 6.6 Effects of Expertise on the Means and Context of Deep Explanation Use

(1) The lambda parameters must sum to zero across the categories of a variable, to obtain a unique estimate of each lambda parameter. Based on similar constraints, the lambda parameters of the interaction terms must sum to zero over all categories of a variable.

Table 6.7 presents the results of the log-linear modelling on the data in Table 6.5. This analysis focused on the relationship between expertise and the context of hypertext use in requesting DE. The analysis of interaction effects indicated that novices used hypertext links more to browse DE in the abstract, whereas experts used hypertext links more to request DE in the context of judgment and decision making.

Parameter                        Coefficient   Std. Err.   Z-Value   Lower 95 CI   Upper 95 CI
EXPERTISE:
  Novices (N)                    .431*         .091        4.75      .253          .609
  Experts (E)                    -.431*        .091        -4.75     -.609         -.253
MEANS:
  Hypertext-Abstract (HA)        -.542*        .091        -5.97     -.720         -.364
  Hypertext-Contextualized (HC)  .542*         .091        5.97      .364          .720
MEANS x EXPERTISE:
  HA x N                         .204*         .091        2.24      .026          .382
  HC x N                         -.204*        .091        -2.24     -.382         -.026
  HA x E                         -.204*        .091        -2.24     -.382         -.026
  HC x E                         .204*         .091        2.24      .026          .382
(* p < .05)

Table 6.7 Effects of Expertise on Different Use of Hypertext

6.3 Effects of Explanation Use: A Structural Equation Model

This section reports the results of structural, or causal, equation modelling of the overall research model, which combines both the determinants and the outcomes of explanation use. The nature of the research model dictated the use of some form of structural equation modelling, or path analysis, to investigate the overall relationships among the independent, intervening, and dependent variables. Data analysis was conducted using Partial Least Squares (PLS), a powerful multivariate analysis technique that is well suited to testing structural models with latent variables (see Wold, 1985, for a comprehensive review).

6.3.1 The Use of PLS

Structural equation modelling was chosen to analyze the effects of explanation use because the effects involved multiple dependent constructs with measurement error, such as perceived usefulness of explanations and trust in KBS. Structural equation modelling makes it possible to incorporate multiple dependent constructs, by explicitly recognizing measurement errors and "adjusting" relationships for these errors, as well as by integrating theory with empirical data (Barclay, Higgins, & Thompson, 1995). This could not be done with multiple regression and ANOVA, which are generally classified as "first-generation" statistical techniques (Fornell, 1982). In this study, using PLS, the analysis of the determinants of explanation use could be easily combined with the analysis of the consequences of explanation use.
Both PLS and the linear structural relationships (LISREL) model (Joreskog & Sorbom, 1986) belong to what has been called the "second generation" of multivariate analysis (Fornell, 1982). Although LISREL is the more popular method, PLS was selected primarily because of the difference between the objectives of the two techniques. LISREL is best used for theory testing and development, whereas PLS is oriented more towards predictive applications. More specifically, LISREL estimates models in an attempt to reproduce the covariance matrix of the measures, and incorporates overall goodness-of-fit measures to assess how well the hypothesized model "fits" the data. Such covariance structure analysis is "theory-oriented, and emphasizes the transition from exploratory to confirmatory analysis" (Wold & Joreskog, 1982, p. 270). In contrast, PLS has as its objective the explanation of variance in a regression sense; thus multiple R2 and the significance of relationships among constructs are the measures more indicative of how well a model is performing. "PLS is primarily intended for causal-predictive analysis in situations of high complexity but low theoretical information" (Wold & Joreskog, 1982, p. 270). Thus, PLS and LISREL are complementary, and in some cases PLS can be viewed as a precursor to the use of LISREL (Barclay, et al., 1995). The statistical analysis of the overall relationships derived from the research model in this study fits well into the application domain of PLS. In addition to the difference in objectives, a few other operational considerations also made the use of PLS more appealing. For example, multivariate normality is not a requirement for estimating PLS parameters; the distribution of the data needs to be considered only when the statistical significance of parameters is estimated. While sample size considerations and the associated "rules of thumb" in regression apply to PLS, small sample sizes with complex causal models can be effectively analyzed with PLS.
This ability to work with small samples is a further difference between LISREL and PLS. Given the early stage of theory development in KBS explanation use, PLS was used to assess the overall validity of the research model and, in particular, to determine the extent to which expertise, method of explanation provision, and the use of explanations account for the variance observed in the dependent variables. PLS simultaneously assesses the reliability and validity of the measures of the theoretical constructs and estimates the relationships among those constructs. A complete process of PLS modelling nonetheless involves two stages: (1) assessment of the measurement model, including the reliability and discriminant validity of the measures, and (2) assessment of the structural model. Details of each of these two stages are reported in this section, and the results of the PLS modelling are discussed in light of the research model and the various hypotheses.

6.3.2 Formative Versus Reflective Constructs in the PLS Model

PLS deals with two classes of constructs differently in its internal estimation process, which is another advantage over LISREL (Barclay, et al., 1995). The two classes are constructs with "reflective" indicators or measures, and those with "formative" ones. In general, a construct with reflective indicators is one where the observable variables are expressed as a function of the construct: the indicators "reflect," or are manifestations of, the construct, and the construct precedes the indicators in a causal sense. Reflective indicators are assumed in LISREL models. Typical reflective constructs in IT research include Perceived Ease of Use (Davis, 1989). These constructs are theoretically the source of the correlation among their empirical indicators (Cohen, Cohen, Teresi, Marchi, & Velez, 1990).
On the other hand, a construct with formative indicators is one where the construct is expressed as a function of the variables: the variables "form," cause, or precede the construct. For example, Personal Computer Utilization is considered to be formed by length and frequency of use, number of different software packages, etc. (Barclay, et al., 1995). The construct is specified as a summary index. Such constructs are theoretically the causal link between the measured indicators and other variables in a model. However, they do not explain the correlations among their measurable indicators, and theory does not even require that the indicators be positively correlated, or correlated at all (Cohen, et al., 1990). In practice it can be difficult to determine whether a construct should be considered formative or reflective; a theoretical basis may be helpful in making the judgment. The research model comprised two independent variables and five dependent variables (constructs), discussed in the following paragraphs. The two independent variables, domain expertise and explanation provision method, were modelled using formative indicators. Expertise and explanation provision method were in fact categorical variables, each formed with a single indicator. Domain expertise was defined at two levels, experienced professionals and novices; explanation provision method was defined in terms of lineartext or hypertext. Two dependent variables are related to user perceptions, namely perceived usefulness of explanations and trust in KBS; they are typical reflective constructs. The decision accuracy variable was considered to be formed by an index of six individual items, based on the way it was measured and on the development of the benchmark: the measure was based upon a set of six deviance scores, as opposed to a theoretical construct.
The six items were chosen to incorporate the six major aspects of the task, and a panel of experts provided the benchmark used to determine judgment accuracy (details are described in Chapter 5). The measure was initially developed as a summary of the six scores used in a multiple regression analysis (Dhaliwal, 1993), without regard to the correlation among the six items. There was neither a theoretical nor a prior empirical basis to suggest that the treatment effects of the experimental KBS would be equally effective on the six aspects, or that judgments on the six different aspects would be highly correlated. The use of DE and RTE were initially modelled with formative indicators. The use of DE could be formed either with three indicators for the three types of explanation (Why, How, and Strategic), or with three indicators for the means and context of DE requests (Index, Hypertext-Abstract, and Hypertext-Contextualized). Because the effects of contextualized access to deep knowledge were of interest in this study, the latter was adopted. The use of RTE was formed with three indicators describing the use of the three types of explanation (Why, How, and Strategic). However, in the PLS model, it turned out that the "communality" (shared variance) among the indicators was reasonably high if the indicators were treated as reflective. If the indicators were considered formative, the RTE construct would degenerate into a single-indicator construct dominated by the use of How explanations: the loading score for How would be above .9, and the weight score above 1, whereas the loading scores for the other two indicators would be negligible. The How indicator by itself was not as informative as How and Why together, because both types of explanation were essentially equally utilized, although How might have slightly more explanatory power (as indicated by the higher loading).
Conceptually, it was also appropriate to model the two explanation use constructs with reflective indicators, reflecting or revealing the tendency of the user to request DE and RTE. Eventually, these two constructs were modelled as reflective constructs. The ability to work with small sample sizes made PLS more appealing for structural equation modelling, and made this analysis possible. According to Barclay, et al. (1995), the required sample size can be determined by what is needed to support the most complex multiple regression PLS would encounter. This involves: (1) the number of indicators of the most complex formative construct, or (2) the largest number of antecedent constructs leading to a dependent construct in an ordinary least squares regression. Using the "rule of thumb" of ten cases per predictor, the minimum sample size is ten times the larger of (1) and (2). In this case, the most complex regression involved the formative Accuracy construct, with six items initially and four items in the final model; the sample size of 53 satisfied this rule of thumb for the final model.

6.3.3 Measurement Model

In traditional multivariate analysis, Cronbach's alpha is usually used as a measure of reliability, followed by exploratory principal component analysis to evaluate the convergent and discriminant validity of the scale (Appendix F). Aggregated factor scores from the measurement items are then used to test relationships among the factors, as in path analysis or multiple regression. A major assumption underlying such multivariate analysis is that the reliability and validity of the measures of a construct will "hold" across theoretical contexts (Fornell, 1982). This is considered a major limitation of traditional multivariate analysis: Barclay, et al. (1995) indicated that some well known instruments in the IS literature might not be reliable when used in alternate contexts.
This difficulty can be overcome partially by second-generation multivariate analysis tools such as PLS, which assess measurement errors within the context of the research model, rather than independently. In studies using PLS, item loadings and internal consistencies for the constructs are examined as a test of reliability. A rule of thumb is to accept items with loadings of .707 or more, which implies greater shared variance (50% or more) between the construct and its measures than error variance (Carmines & Zeller, 1979). However, researchers have been cautious in removing items with low loadings, because there can be underlying reasons for low loadings other than low reliability, e.g., multidimensionality of the construct (Barclay, et al., 1995). Furthermore, it is highly desirable to have four or more measurable indicators for each latent construct, and it is almost necessary to have at least three (Cohen, et al., 1990). In Barclay, et al.'s demonstration of PLS modelling (1995), ten of the twenty-one items had loadings less than .707; nine of these were kept in the final model, and only one was dropped from further analysis because of an extremely low loading. In this analysis using PLS, the initial results indicated that a few items loaded marginally on their respective constructs, with loading scores well below .707. Not only the formative construct of decision accuracy, but also previously validated reflective constructs, had indicators with low loadings. This happened in spite of the fact that both perception measures were validated in previous studies (Dhaliwal, 1993; Lerch, et al., 1993), and that the Cronbach's alphas obtained prior to using PLS were .86 and .85 for Usefulness of Explanations and Trust in KBS, respectively (Table F1 in Appendix F). Barclay, et al.
argued that some scales (or scale items) might not display the same psychometric properties when used in theoretical and research contexts distinct from those in which they were first developed (1995). While it was desirable to retain as many original items as possible, to cover all underlying dimensions of the constructs and to allow comparison between studies, it was also essential to reach an acceptable level of reliability. Consequently, items with extremely low loadings were dropped from the scales. Table E7 in Appendix E shows item loadings on the final constructs, along with weights for the items of the formative construct. For formative constructs, PLS estimates the weight of each indicator and transforms the weights into loadings. Other descriptive information about the constructs, such as Cronbach's alphas, is shown in Table 6.8. Fornell and Larcker's measure of internal consistency (1981), which is usually reported in empirical studies using PLS, is also shown in the same table; it is a better indicator of internal consistency than Cronbach's alpha in PLS modelling (Barclay, et al., 1995). Four of the nine Usefulness of Explanations indicators were removed from further analysis due to their low loadings. Q33, the overall measure of usefulness, was retained, with a loading score of .70. The resulting scale had a Cronbach's alpha of .80. After removing two of the ten items in the Trust in KBS scale, the remaining items had loading scores ranging between .54 and .83, and Cronbach's alpha was .87. Interestingly, all seven items from the original scale by Lerch, et al. (1993) remained, plus the additional Q13. Furthermore, the indicators kept in the PLS model for the Usefulness of KBS and Trust constructs were almost identical to those validated through conventional reliability and factor analysis (Table F7).

Construct | No. of Items | Mean | Std. Dev. | Fornell & Larcker's Internal Consistency | Cronbach's Alpha | Avg. Variance Extracted¹
Deep Explanation | 3 | 9.47 | 9.91 | .78 | .51 | .54
Trace Explanation | 2 | 13.42 | 11.28 | .84 | .63 | .73
Decision Accuracy | 4 | 0.87 | 3.39 | .57 | — | .26
Usefulness of Expl. | 5 | 11.08 | 3.88 | .84 | .80 | .52
Trust in KBS | 8 | 21.04 | 6.89 | .88 | .87 | .49
Note 1. Average variance extracted represents the amount of variance captured by the construct compared to the total variance (variance captured by the construct plus measurement error).
Table 6.8 Means, Standard Deviations, and Internal Consistencies (Reliability)

It was anticipated that some formative items of the Accuracy construct might not be highly correlated, and that some might have low loadings. Two of the six decision accuracy measures, namely E1 and E2, were dropped, with loading scores much lower than .30. The likely reason for the low loading scores could be that the benchmark for these two questions was somehow not consistent with the KBS recommendations and explanations. As shown in Table 6.8, the overall internal consistency for this construct was .57, lower than for the other constructs. Although it was possible to remove E6, which had the next lowest loading of .28, that would have left 50% of the original items out; this scale was not further reduced. All of the DE measures were kept because they all had reasonable loadings. However, the RTE scale was dominated by the How and Why explanation measures; the Strategic explanation measure, with a loading score under .30, was dropped. Consequently, all the remaining indicators had loading scores above .50, except E6 for the Accuracy measure. Fornell and Larcker's measures of internal consistency were generally at a reasonable level. The Accuracy construct, the only formative one, had internal consistency scores much lower than the others. This was a reflection of the difficulty of measuring judgment accuracy for the complex task involved in this study.
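The reliability statistics reported in Table 6.8 follow directly from the item loadings. A minimal sketch, using hypothetical standardized loadings (not the study's data), of Fornell and Larcker's (1981) internal consistency (composite reliability) and average variance extracted; note that the .707 loading cutoff corresponds to 50% shared variance, since a standardized loading squared is the variance an item shares with its construct:

```python
import math

def composite_reliability(loadings):
    # Fornell & Larcker (1981), for standardized loadings:
    # (sum of loadings)^2 / ((sum of loadings)^2 + summed error variances)
    s = sum(loadings)
    error = sum(1 - l ** 2 for l in loadings)
    return s ** 2 / (s ** 2 + error)

def average_variance_extracted(loadings):
    # AVE is the mean squared loading; .707 ** 2 is about .50
    return sum(l ** 2 for l in loadings) / len(loadings)

# Hypothetical standardized loadings for a three-indicator construct
loads = [0.85, 0.80, 0.75]
print(round(composite_reliability(loads), 2))
print(round(average_variance_extracted(loads), 2))

# Fornell-Larcker discriminant validity: the square root of a construct's
# AVE must exceed its correlations with other constructs. For example,
# with the reported AVEs of .52 (Usefulness) and .49 (Trust) and their
# reported correlation of .54:
assert math.sqrt(0.52) > 0.54 and math.sqrt(0.49) > 0.54
```

The function names here are illustrative; PLS software reports these quantities directly.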
Construct | EX | ME | DE | RTE | DA | UOE | TRUST
Expertise (EX) | 1.00 | | | | | |
Method (ME) | -.02 | 1.00 | | | | |
Deep Explanation (DE) | -.27 | .56 | .73 | | | |
Reasoning-Trace Explanation (RTE) | .04 | -.03 | .13 | .85 | | |
Decision Accuracy (DA) | -.27 | .23 | .42 | .12 | .51 | |
Usefulness of Explanations (UOE) | .06 | .10 | .22 | .03 | .00 | .72 |
Trust in KBS (TRUST) | -.02 | -.15 | .11 | .27 | .09 | .54 | .70
Note 1. Diagonal elements are the square roots of average variance extracted. Off-diagonal elements are the correlations among constructs. For adequate discriminant validity, diagonal elements should be greater than the corresponding off-diagonal elements.
Table 6.9 Correlation of Constructs (Hypothesized Model)

Discriminant validity was assessed using the average variance shared between the constructs and their measures, in terms of average variance extracted (AVE), compared to the variance shared between the constructs themselves (Fornell & Larcker, 1981). As shown in Table 6.8, the measurement model extracted at least as much variance as error variance (AVE close to or above .50), except in the case of Decision Accuracy, which extracted the lowest amount of variance (.26). Removing E6 would only have improved the AVE of this measure marginally, to .32. Nonetheless, the minimum discriminant validity requirement was satisfied, in the sense that the variance extracted by each of any two particular constructs was larger than the covariance captured in the model (Fornell & Larcker, 1981). All the diagonal elements in Table 6.9 were greater than the corresponding off-diagonal elements, indicating adequate discriminant validity. For example, while the Decision Accuracy measure extracted the least variance of all the constructs, none of its correlations with other constructs was higher than .51 (the square root of .26).

6.3.4 Structural Model

The results of the PLS modelling are shown in Figure 6.1 and Table 6.10. The results include path coefficients representing the hypothesized causal relationships between the variables, and multiple R²s.
The path coefficients can be interpreted as standardized coefficients (βs) in multiple regressions, and their magnitude indicates whether the relationship represented by the path is negligible. A path coefficient of .05 is suggested as the lower limit of substantive significance for regression coefficients; .10 and above is preferable (Pedhazur, 1982). The multiple R²s can be interpreted, as in multiple regression, as the percentage of the variance in the respective latent variables that can be explained by the overall model proposed. While path coefficients represent only the direct effect of each of the antecedent constructs in the PLS model, it is important to consider the total effects. The total effects are the sum of the direct and indirect effects, and are the overall indicators of the relative importance of antecedent constructs. Table 6.10 shows both the direct and the indirect effects of the antecedent constructs. The most noticeable indirect effects are the strong positive relationship between Methods (the hypertext factor) and improvement in Accuracy, and the negative relationship between Expertise and improvement in Accuracy.

Figure 6.1 Process and Outcomes of Explanation Use (Results From PLS Modelling) (*** p < .001, * p < .05)

The significance of the path coefficients was tested using the jackknifing technique (Tukey, 1958; Wildt, Lambert, & Durand, 1982), which tests the significance of parameter estimates from data that are not assumed to be multivariate normal. It calculates the probability of obtaining a particular coefficient if the true value of the coefficient were zero. Seven paths were found to be significant at the .001 level of alpha, two were significant at the .05 level, and the other two were not significant. In the following, the results of the PLS analysis are discussed in light of the research model, and specifically in terms of the research hypotheses.
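The jackknifing procedure is not spelled out in the text; as a rough illustration, a leave-one-out jackknife t-statistic can be computed as below. The sample and the estimator (a simple mean) are hypothetical, chosen only to show the mechanics:

```python
import math

def jackknife_t(data, estimator):
    """Leave-one-out jackknife t-statistic for H0: parameter = 0.
    `estimator` maps a sample (a list) to a scalar estimate."""
    n = len(data)
    theta = estimator(data)
    # Recompute the estimate n times, leaving out one observation each time
    loo = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    mean_loo = sum(loo) / n
    # Jackknife variance of the estimator
    var = (n - 1) / n * sum((t - mean_loo) ** 2 for t in loo)
    return theta / math.sqrt(var)

# Hypothetical sample; for a mean, the jackknife SE equals the usual SE
sample = [1.2, 0.8, 1.5, 0.9, 1.1, 1.4, 0.7, 1.3]
print(round(jackknife_t(sample, lambda xs: sum(xs) / len(xs)), 2))
```

In PLS practice the estimator would be a path coefficient re-estimated on each leave-one-out sample, which the PLS software performs internally.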
Path | Direct Effect | t-value | Indirect Effect | Total Effect¹
Expertise --> DE | -.27 | -7.65*** | | -.27
Expertise --> RTE | .04 | 2.50* | | .04
Expertise --> Accuracy | -.18 | -5.86*** | -.10 | -.28
Expertise --> Usefulness of Expl. | | | -.01 | -.01
Expertise --> Trust | | | -.01 | -.01
Methods --> DE | .55 | 18.93*** | | .55
Methods --> RTE | -.03 | -0.97 | | -.03
Methods --> Accuracy | | | .20 | .20
Methods --> Usefulness of Expl. | | | .04 | .04
Methods --> Trust | | | .04 | .04
DE --> Accuracy | .36 | 7.41*** | | .36
DE --> Usefulness of Expl. | .08 | 2.36* | | .08
DE --> Trust | .08 | 4.32*** | | .08
RTE --> Accuracy | .08 | 1.29 | | .08
RTE --> Usefulness of Expl. | .21 | 4.82*** | | .21
RTE --> Trust | .25 | 10.48*** | | .25
*** p < .001, * p < .05
Note 1. Total Effect = Direct Effect + Indirect Effect.
Multiple R²: DE .38; RTE .00; Accuracy .21; UOE .05; Trust .07
Table 6.10 Summary of Findings From PLS Modelling

H1a0: Users with access to hypertext-based explanations will use the same number of deep explanations as users with access to lineartext-based explanations.

H1a0 was rejected: the path from Methods to DE was .55, the largest coefficient produced by the PLS. Access to hypertext-based DE was associated with increased use of DE. Methods (the hypertext factor), along with Expertise, explained 38% of the variance in DE use, the largest proportion of explained variance for any single dependent variable in the structural equation model. The PLS results also indicated that experts used fewer DE, in agreement with Section 6.2.

H1b0: Users with access to hypertext-based explanations will use the same number of reasoning-trace explanations as users with access to lineartext-based explanations.

H1b0 was not rejected: not only was the path coefficient insignificant, but the multiple R² for RTE was virtually zero. These results were consistent with the ANOVA in Section 6.2, particularly the close-to-zero multiple R².
Given the high internal consistency and AVE scores of the RTE construct (.84 and .73), the low R² was more likely due to the absence of an effect of the independent variables than to a measurement problem in the PLS model. Although the path from Expertise to RTE was significant, the value of the path coefficient (.04) was too small to be noteworthy. This inconsistency between the PLS modelling and the ANOVA results of Section 6.2 might be caused by two factors: (1) expert users were more likely than novice users to use the How type of RTE (Section 6.4), and (2) the RTE construct was dominated by the use of the How type (Table E7 in Appendix E). However, the PLS modelling and the ANOVA were consistent in terms of the negligible (close-to-zero) multiple R² of the RTE variable.

H4a0: Users with access to hypertext-based explanations will have the same amount of improvement in decision accuracy as users with access to lineartext-based explanations.

As expected, H4a0 was rejected, because of the strong positive link (.20) from Methods (the hypertext factor) to Accuracy. This was the result of indirect effects, which mainly came from the path from Methods to DE to Accuracy; the effect of the path from Methods to RTE to Accuracy (Figure 6.1) was negligible. The strong positive link from Methods to Accuracy was demonstrated in the strength of the path coefficient (.20), and in the magnitude of variance in Accuracy jointly explained by Expertise and the use of DE.

H4b0: Users with access to hypertext-based explanations will have the same level of perceived usefulness of explanations as users with access to lineartext-based explanations.
H4c0: Users with access to hypertext-based explanations will have the same level of trust in KBS as users with access to lineartext-based explanations.

The indirect effects of Methods (the hypertext factor) on perceived usefulness of explanations and on trust in KBS were weak, both having a path coefficient of .04 as shown in Table 6.10. These two hypotheses could not be rejected. The mere technological availability of hypertext-based access to deep knowledge was not, in itself, an important factor affecting users' perceptions of the usefulness of explanations and their trust.

H5a0: Expert users will have the same amount of improvement in judgment accuracy as novice users.

H5a0 was rejected, because the path from Expertise to Accuracy was significant and negative. The total effect of expertise on decision accuracy was -.28 (-.18 direct and -.10 indirect). This implied that the KBS was less effective in influencing the judgment accuracy of experts than that of novices.

H5b0: Expert users will have the same level of perceived usefulness of explanations as novice users.
H5c0: Expert users will have the same level of trust in KBS as novice users.

H5b0 and H5c0 could not be rejected because, as Table 6.10 indicates, the indirect effects of Expertise on the corresponding dependent constructs were negligible.

H6a0: The use of deep explanations will have no effect on the amount of improvement in decision accuracy.

H6a0 was rejected because the path coefficient from DE to Accuracy was positive and very strong (.36). This supported the provision of DE to KBS users. It also highlighted the importance of providing explanation facilities conducive to accessing DE, as it was already found that hypertext might significantly increase DE use. Interestingly, PLS modelling gave a slightly larger weight to the contextualized use of DE, indicating that this item had a slightly larger impact than the Index and Hypertext-Abstract items on the outcome measures (Table E7).

H7a0: The use of reasoning-trace explanations will have no effect on the amount of improvement in decision accuracy.

Increased use of RTE was found to be positively related to improved decision accuracy, but the path coefficient (.08) was not significant, with a t-value of 1.29. The coefficient was much smaller than those of Expertise and DE.
Compared to the use of DE, using more RTE might help improve judgment accuracy, but to a much lesser extent.

H6b0: The use of deep explanations will have no effect on the level of perceived usefulness of explanations in KBS.
H7b0: The use of reasoning-trace explanations will have no effect on the level of perceived usefulness of explanations in KBS.

H6b0 and H7b0 were rejected on the grounds that the path coefficients from DE and RTE to Usefulness of Explanations were positive and significant. The link originating from DE was relatively weak, much weaker than the one from RTE (.08 versus .21). The relative importance of variables is usually inferred from the relative size of their coefficients; however, a high correlation between variables may pose a problem for such inferences (Alwin, 1988). In this analysis, the relative importance of DE and RTE could be assessed, given the absence of a high correlation between DE and RTE (as indicated by the nonsignificant correlation coefficient of .23). Relatively speaking, when it came to the evaluation of the Usefulness of Explanations, the use of RTE seemed to be the stronger determinant. However, the multiple R² for perceived Usefulness of Explanations was only at a marginal level of .05. No direct link was hypothesized from Expertise and Methods to perceptions, because the perceptions resulting from explanation use were of interest.

H6c0: The use of deep explanations will have no effect on the level of trust in KBS.
H7c0: The use of reasoning-trace explanations will have no effect on the level of trust in KBS.

The results for Trust in KBS were very similar to those for Usefulness of Explanations. Table 6.10 shows almost identical patterns of path coefficients and R²s for the two constructs. Once again, it appeared that the use of RTE was a relatively more important predictor of users' trust in KBS than the use of DE.
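The total effects reported in Table 6.10 decompose into the direct paths plus products of coefficients along the mediated routes. A sketch using the reported coefficients (small discrepancies from the tabled values reflect rounding in the table):

```python
# Direct standardized path coefficients from Figure 6.1 / Table 6.10
paths = {
    ("Methods", "DE"): 0.55, ("Methods", "RTE"): -0.03,
    ("Expertise", "DE"): -0.27, ("Expertise", "RTE"): 0.04,
    ("Expertise", "Accuracy"): -0.18,
    ("DE", "Accuracy"): 0.36, ("RTE", "Accuracy"): 0.08,
}

def indirect_effect(source, target, mediators, paths):
    # Sum the products of coefficients along each mediated route
    return sum(paths[(source, m)] * paths[(m, target)] for m in mediators)

# Methods has no direct path to Accuracy, so its total effect is indirect:
ind_methods = indirect_effect("Methods", "Accuracy", ["DE", "RTE"], paths)
# Expertise has both a direct (-.18) and an indirect component:
ind_expertise = indirect_effect("Expertise", "Accuracy", ["DE", "RTE"], paths)
total_expertise = paths[("Expertise", "Accuracy")] + ind_expertise

print(round(ind_methods, 2))      # close to the .20 reported in Table 6.10
print(round(total_expertise, 2))  # close to the -.28 reported in Table 6.10
```

The Methods-to-Accuracy route through DE (.55 x .36) dominates; the route through RTE contributes almost nothing, consistent with the discussion of H4a0.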
6.3.5 Discussion on the PLS Modelling

Domain expertise and explanation methods explained 38% of the variance in the use of DE. The provision of hypertext-based access to DE substantially increased DE use, and increased use of DE was positively related to improvement in judgment accuracy. Twenty-one percent of the variance in the Accuracy measure was explained by explanation methods, use of DE, and domain expertise. This R² was considered relatively large, given the difficulties associated with measuring judgment accuracy and the existence of other factors that could have affected it, such as individual differences in background and motivation. The use of RTE was more critical than the use of DE in forming users' opinions. The reason could be the following: while DE were important and useful for judgment making, DE (of the experimental KBS in particular) largely consisted of textbook information, provided to help users understand the meaning of KBS output. In contrast, RTE, being more specifically related to the problem under consideration, represented the application of domain knowledge. Therefore, RTE might be more relevant to users for assessing the quality and behaviour of a KBS, i.e., what was going on inside the KBS. However, only five and seven percent of the variance in user perceptions of the usefulness of the explanations and of trust in KBS, respectively, were explained by the use of explanations, despite the positive and significant path coefficients. R²s of this magnitude are not uncommon in the behavioral sciences (Cohen, 1988), or in prior research using structural equation modelling. The results of the structural equation modelling were generally in agreement with the other statistical analyses performed. For example, the findings on the effects of expertise and hypertext on the use of explanations were essentially consistent with the ANOVA in Section 6.2. The relevant statistics are compared in Table 6.11.
The user perception part of the final measurement model of the PLS model (Table E7 of Appendix E) was nearly identical to the result of conventional reliability and factor analysis (Table F7 of Appendix F). The convergence of these statistical methods gave validity to the results. In addition to reaffirming other statistical analyses, PLS modelling also allowed assessment of certain relationships that were difficult to handle without using structural equation models.

Path | PLS Std. Path Coefficient | PLS p-value of t | PLS R² | ANOVA p-value of F | ANOVA R²
Hypertext --> DE Requests | .55 | < .001 | .38 | .00 | .35
Expertise --> DE Requests | -.27 | < .001 | | .01 |
Hypertext --> RTE Requests | -.03 | > .30 | .00 | .65 | .01
Expertise --> RTE Requests | .04 | < .05 | | .73 |
Table 6.11 Determinants of Explanation Use: A Comparison Between ANOVA and PLS

From the PLS model, it was possible to identify the relative importance of indicators within a construct: e.g., the contextualized use of DE and the reasoning-trace How explanations were slightly more important than the other indicators. In a conventional multiple regression analysis, decision accuracy and the perception variables would have to be summarized into single-item dependent measures, and measurement errors could not be handled within the research model; consequently, such an analysis would not have sufficient statistical power. In contrast, the PLS modelling was much more powerful. In conclusion, the explanatory power of the PLS modelling was acceptable, and the modelling led to new insights into explanation use, in addition to reaffirming the other analyses.

6.4 Supplementary Analyses: User Preference for Explanation Types

The explanations for each domain concept and recommendation were further decomposed into Why, How, and Strategic explanations. These three types were proposed as part of a conceptual framework for providing explanations (Dhaliwal, 1993; also see Clancey, 1983), and empirically investigated by Dhaliwal (1993).
While hypertext made contextualized access to deep knowledge possible and significantly increased requests for DE, it would be interesting to assess the impact of its use on preference for explanation types. An exploratory analysis of users' preference for different types of explanation was conducted, and the results are presented in this section. The impact of both explanation provision methods and expertise was analyzed, for DE and RTE separately. Because the analysis was essentially a test of the homogeneity of two or more distributions over a set of two or more categories, contingency table analysis (frequency comparison) was particularly appropriate. Chi-square tests were performed to examine the effects of expertise and explanation provision methods on user preference for explanation types, and then log-linear models were used to determine the nature and strength of the effects.

6.4.1 User Preference for Deep Explanation Types

The numbers of DE requested by the novice and expert subjects are presented by Why/How/Strategic category. The first row in Table 6.12 shows the number of each of the three types of DE requested by the 28 novices, as well as the total for the three types; each of these numbers is also expressed as a percentage of the total. The second row consists of the numbers for the experts, and the third of the sums for both novice and expert subjects. Novices and experts had different preferences for explanation types (p = .02). Experts requested a much higher percentage of How explanations (48% vs. 35%), and lower percentages of Why and Strategic explanations, than novices. In terms of preference for explanation types (or frequency of explanation requests), the order of preference for novices was Why, How, and Strategic, whereas the order for experts was How, Why, and Strategic.
Expertise | Why | How | Strategic | Total
Novices¹ | 147 (42.5%) | 121 (35.0%) | 78 (22.5%) | 346 (68.8%)
Experts | 57 (36.3%) | 75 (47.8%) | 25 (15.9%) | 157 (31.2%)
Column Total | 204 (40.6%) | 196 (39.0%) | 103 (20.5%) | 503 (100%)
Pearson's chi-square = 7.87, DF = 2, significance = .02
Note 1: the number of novice subjects is larger than that of expert subjects by three.
Table 6.12 Effects of Expertise on Preference for Deep Explanation Types

Table 6.13 indicates that user preference for explanation types was influenced by explanation provision methods (p = .01). Hypertext was not only associated with an across-the-board increase in the number of DE requests, but also influenced the preference for explanation types. In particular, the proportion of How explanations increased from 27.8% to 42.3%, whereas the proportions of Why and Strategic explanations were reduced by 5% and 9.5%, respectively. One possible cause of the 14.5% difference in the preference for How explanations might be the contextualized request for DE enabled by hypertext. There could be a difference between the pattern of explanation requests in the abstract (i.e., prior to analysis) and in the context of analysis and decision making. In order to compare the preference for DE requested in the abstract with the preference in context, data were categorized as shown in Table 6.14.
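As a check on the contingency-table results, the Pearson chi-square for Table 6.12 can be recomputed directly from the cell counts; since the table has df = 2, the chi-square survival function reduces to exp(-x/2), so no statistical library is needed:

```python
import math

# Observed DE requests by expertise (Table 6.12): Why, How, Strategic
observed = [[147, 121, 78],   # novices
            [57, 75, 25]]     # experts

row_tot = [sum(r) for r in observed]
col_tot = [sum(c) for c in zip(*observed)]
grand = sum(row_tot)

# Pearson chi-square: sum of (O - E)^2 / E over all six cells,
# with expected count E = (row total x column total) / grand total
chi2 = sum((o - rt * ct / grand) ** 2 / (rt * ct / grand)
           for row, rt in zip(observed, row_tot)
           for o, ct in zip(row, col_tot))

# df = (rows - 1) * (cols - 1) = 2; for df = 2 the chi-square
# survival function is exactly exp(-x / 2)
p = math.exp(-chi2 / 2)
print(round(chi2, 2), round(p, 2))  # reproduces 7.87 (p = .02) of Table 6.12
```

The same computation applied to Tables 6.13 through 6.15 would reproduce their chi-square statistics, though for df other than 2 a chi-square distribution routine is needed for the p-value.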
Methods | Why | How | Strategic | Total
Lineartext¹ | 51 (44.3%) | 32 (27.8%) | 32 (27.8%) | 115 (22.9%)
Hypertext | 153 (39.4%) | 164 (42.3%) | 71 (18.3%) | 388 (77.1%)
Column Total | 204 (40.6%) | 196 (39.0%) | 103 (20.5%) | 503 (100%)
Pearson's chi-square = 9.21, DF = 2, significance = .01
Note 1: the number of hypertext subjects is larger than that of lineartext subjects by three.
Table 6.13 Effects of Explanation Methods on Preference for DE Types

Means and Context of Explanation Requests | Why | How | Strategic | Total
Index (Abstract) | 129 (42.2%) | 100 (32.7%) | 77 (25.2%) | 306 (60.8%)
Hypertext-Abstract | 22 (43.1%) | 24 (47.1%) | 5 (9.8%) | 51 (10.1%)
Hypertext-Contextualized | 53 (36.3%) | 72 (49.3%) | 21 (14.4%) | 146 (29.0%)
Column Total | 204 (40.6%) | 196 (39.0%) | 103 (20.5%) | 503 (100%)
Pearson's chi-square = 17.67, DF = 4, significance = .001
Table 6.14 Effects of the Means of Explanation Request on Preference for DE Types

The first and second rows of Table 6.14 represent requests for DE in the abstract (i.e., prior to receiving recommendations from the KBS), the second row being explanations requested by following hypertext links. The third row displays DE requested in the context of decision making, also by following hypertext links. The percentage of How explanations increased from 32.7% to 47.1% and 49.3%. In general, explanations requested through hypertext favoured the How type.
Novices
Explanation Type      Lineartext       Hypertext        Total
Why                    38 (44.2%)      109 (41.9%)      147 (42.5%)
How                    24 (27.9%)       97 (37.3%)      121 (35.0%)
Strategic              24 (27.9%)       54 (20.8%)       78 (22.5%)
Total                  86 (24.9%)      260 (75.1%)      346 (100%)
Pearson's chi-square = 3.17, DF = 2, significance = .20

Experts
Explanation Type      Lineartext       Hypertext        Total
Why                    13 (44.8%)       44 (34.4%)       57 (36.3%)
How                     8 (27.6%)       67 (52.3%)       75 (47.8%)
Strategic               8 (27.6%)       17 (13.3%)       25 (15.9%)
Total                  29 (18.5%)      128 (81.5%)      157 (100%)
Pearson's chi-square = 6.78, DF = 2, significance = .03

Table 6.15 Effects of DE Methods on Preference for DE Types (Expertise Controlled)

Table 6.15 presents the results of chi-square tests of the effects of hypertext on novices and experts separately, to show the interaction effects between expertise and explanation methods. Hypertext was associated with an increase in the How type of explanation requested by both novices and experts. However, the effect was significant only for experts. The above crosstabulation analyses indicated that the preference for DE types was related to domain expertise, and might be influenced by the method of providing DE. In the following paragraphs, results of log-linear modelling are reported to provide statistical estimates of the effects under study. The statistical model used was the following:

log(Expno_ijk) = mu + lambda_i^X + lambda_j^E + lambda_k^M + lambda_ij^XE + lambda_ik^XM + lambda_jk^EM + lambda_ijk^XEM

where Expno_ijk is the observed frequency of explanations requested (by all subjects) in the cell; lambda_i^X is the effect of the ith explanation type, lambda_j^E the effect of the jth expertise level, and lambda_k^M the effect of the kth explanation method; lambda_ij^XE is the corresponding interaction between explanation type and expertise, lambda_ik^XM the interaction between explanation type and method, lambda_jk^EM the interaction between expertise and explanation method, and lambda_ijk^XEM the three-way interaction. The results of the log-linear modelling are presented in Tables 6.16 and 6.17.
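For the saturated version of this model, the sum-to-zero effects can be computed directly as deviations of mean log cell counts; a rough numpy sketch using the cell counts from Table 6.15 (illustrative only — the estimates reported in Tables 6.16 and 6.17 come from the fitted log-linear model and differ slightly):

```python
import numpy as np

# counts[type, expertise, method]: types = (Why, How, Strategic),
# expertise = (novice, expert), method = (lineartext, hypertext) -- Table 6.15
counts = np.array([[[38, 109], [13, 44]],    # Why
                   [[24,  97], [ 8, 67]],    # How
                   [[24,  54], [ 8, 17]]])   # Strategic

log_m = np.log(counts)                     # saturated model: fitted = observed
mu = log_m.mean()                          # grand mean
lam_type = log_m.mean(axis=(1, 2)) - mu    # main effects of explanation type

# Effects sum to zero by construction; Why comes out positive and
# Strategic negative, matching the sign pattern of Table 6.16.
print(np.round(lam_type, 3))
```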
In terms of the main effects (Table 6.16), Why explanations appeared to be the most frequently requested, and Strategic explanations the least frequently requested. However, according to the totals in Table 6.12, the overall difference between the Why and How types was only about 2% of the total DE requested. The effects of expertise and explanation methods were consistent with the findings of Section 6.2.

Parameter          Coefficient   Std. Err.   Z-Value   Lower 95 CI   Upper 95 CI
TYPE
  Why                  .263*        .081        3.25        .105          .422
  How                  .110         .087        1.26       -.061          .281
  Strategic           -.373*        .119       -3.13       -.607         -.140
EXPERTISE
  Novices              .464*        .062        7.52        .343          .585
  Experts             -.464*        .062       -7.52       -.585         -.343
METHODS
  Lineartext          -.601*        .062       -9.73       -.722         -.480
  Hypertext            .601*        .062        9.73        .480          .722
(* p < .05)

Table 6.16 Main Effects of Expertise and DE Methods on Preference for DE Types

In terms of interaction effects (Table 6.17), a test of k-way interactions indicated that third-order effects were non-existent. Among the two-way interactions, the only significant finding was that hypertext increased the requests for How explanations.

Parameter                Coefficient   Std. Err.   Z-Value   Lower 95 CI   Upper 95 CI
TYPES x EXPERTISE
  Why x Novice (N)          .023         .081        .28        -.136         .182
  How x N                  -.108         .087      -1.23        -.279         .064
  Strategic x N             .075         .119        .63        -.129         .308
  Why x Expert (E)         -.023         .081       -.28        -.182         .136
  How x E                   .108         .087       1.23        -.064         .279
  Strategic x E            -.075         .119       -.63        -.308         .129
TYPES x METHODS
  Why x Linear (L)          .042         .081        .51        -.117         .200
  How x L                  -.262*        .087      -3.00        -.434        -.091
  Strategic x L             .221         .119       1.85        -.013         .454
  Why x Hyper (H)          -.042         .081       -.51        -.200         .117
  How x H                   .262*        .087       3.00         .091         .434
  Strategic x H            -.221         .119      -1.85        -.454         .013
EXPERTISE x METHODS
  N x L                     .063         .062       1.03        -.058         .184
  E x L                    -.063         .062      -1.03        -.184         .058
  N x H                     .063         .062       1.03        -.058         .184
  E x H                    -.063         .062      -1.03        -.184         .058
(* p < .05)

Table 6.17 Interaction Effects of Expertise and DE Methods on Preference for DE Types
6.4.2 User Preference for Reasoning-Trace Explanation Types

User preference for RTE was analyzed in the same fashion as that for DE. As can be seen from Table 6.18, experts requested a much higher proportion (54.8% versus 40.4%) of How explanations, and lower proportions of Why and Strategic explanations, than novices. This finding was similar to that for user preference for DE types, except that it was more significant. It is interesting to note that, in terms of the total number of all explanations requested, the How type of RTE was the only case where experts requested more explanations than novices, even though there were fewer expert subjects.

Explanation Type      Novices          Experts          Total
Why                   174 (36.3%)      122 (30.3%)      296 (33.5%)
How                   194 (40.4%)      221 (54.8%)      415 (47.0%)
Strategic             112 (23.3%)       60 (14.9%)      172 (19.5%)
Total                 480 (54.4%)      403 (45.6%)      883 (100%)

Pearson's chi-square = 20.05, DF = 2, significance = .000

Table 6.18 Effects of Expertise on Preference for RTE Types

In Table 6.19, Pearson's chi-square is relatively small, suggesting that hypertext had no impact on user preference for RTE types. This was expected, because the two experimental KBS had identical RTE, except that the hypertext version allowed access to deep knowledge from RTE.

Explanation Type      Lineartext       Hypertext        Total
Why                   146 (33.6%)      150 (33.5%)      296 (33.5%)
How                   199 (45.7%)      216 (48.2%)      415 (47.0%)
Strategic              90 (20.7%)       82 (18.3%)      172 (19.5%)
Total                 435 (49.3%)      448 (50.7%)      883 (100%)

Pearson's chi-square = .93, DF = 2, significance = .63

Table 6.19 Effects of DE Provision Methods on Preference for RTE Types

Table 6.20 further reveals the effects of hypertext on novices and experts separately. For both novices and experts, hypertext had no effect on the preference for Why explanations.
Furthermore, although hypertext did not influence novice users' preference for RTE types, it was associated with a marginal increase in the proportion of How explanations, and a reduction of about the same size in the proportion of Strategic explanations, requested by experts. Pearson's chi-square was significant at the .05 level of alpha.

Novices
Explanation Type      Lineartext       Hypertext        Total
Why                    86 (36.6%)       88 (35.9%)      174 (36.3%)
How                    99 (42.1%)       95 (38.8%)      194 (40.4%)
Strategic              50 (21.3%)       62 (25.3%)      112 (23.3%)
Total                 235 (49.0%)      245 (51.0%)      480 (100%)
Pearson's chi-square = 1.18, DF = 2, significance = .55

Experts
Explanation Type      Lineartext       Hypertext        Total
Why                    60 (30.0%)       62 (30.5%)      122 (30.3%)
How                   100 (50.0%)      121 (59.6%)      221 (54.8%)
Strategic              40 (20.0%)       20 (9.9%)        60 (14.9%)
Total                 200 (49.6%)      203 (50.4%)      403 (100%)
Pearson's chi-square = 8.67, DF = 2, significance = .01

Table 6.20 Effects of DE Methods on Preference for RTE Types (Expertise Controlled)

The log-linear model used in Section 6.4.1 was also used to examine the effects of expertise and explanation methods on the preference for the three types of RTE. In terms of the main effects (Table 6.21), the How type of RTE was requested most frequently, and the Strategic type least frequently. In terms of the expertise effect, although fewer RTE were requested by experts as a group, on average the number of RTE requested by each expert was about the same as the number requested by each novice (16.1 vs 17.1).

Parameter          Coefficient   Std. Err.   Z-Value   Lower 95 CI   Upper 95 CI
EXPLANATION TYPES
  Why                  .084         .051       1.67       -.015          .184
  How                  .432*        .047       9.16        .339          .524
  Strategic           -.516*        .069      -7.46       -.652         -.380
EXPERTISE
  Novices              .149*        .038       3.97        .076          .223
  Experts             -.149*        .038      -3.97       -.223         -.076
METHODS
  Lineartext           .022         .038        .58       -.052          .096
  Hypertext           -.022         .038       -.58       -.096          .052
(* p < .05)

Table 6.21 Main Effects of Expertise and DE Methods on Preference for RTE Types

There was an interaction effect between expertise and explanation types (Table 6.22). Novices requested a lower proportion of How explanations, and a higher proportion of Strategic explanations, than experts. There were no interaction effects between explanation types and methods, or between expertise and methods.

Parameter                Coefficient   Std. Err.   Z-Value   Lower 95 CI   Upper 95 CI
TYPES x EXPERTISE
  Why x Novice (N)          .027         .051        .54        -.072         .126
  How x N                  -.212*        .047      -4.50        -.304        -.120
  Strategic x N             .185*        .069       2.67         .049         .320
  Why x Expert (E)         -.027         .051       -.54        -.126         .072
  How x E                   .212*        .047       4.50         .120         .304
  Strategic x E            -.185*        .069      -2.67        -.320        -.049
TYPES x METHODS
  Why x Linear (L)         -.036         .051       -.71        -.135         .063
  How x L                  -.059         .047      -1.26        -.151         .033
  Strategic x L             .095         .069       1.37        -.041         .231
  Why x Hyper (H)           .036         .051        .71        -.063         .135
  How x H                   .059         .047       1.26        -.033         .151
  Strategic x H            -.095         .069      -1.37        -.231         .041
EXPERTISE x METHODS
  N x L                    -.054         .038      -1.45        -.128         .019
  E x L                     .054         .038       1.45        -.019         .128
  N x H                     .054         .038       1.45        -.019         .128
  E x H                    -.054         .038      -1.45        -.128         .019
(* p < .05)

Table 6.22 Interaction Effects of Expertise and DE Methods on Preference for RTE Types

6.4.3 Discussion on User Preference for Explanation Types

The difference in the preference for explanation types should be interpreted in light of the different functionalities of each type of explanation. The Why type of DE contained mainly information justifying the conceptual relevance and importance of domain concepts, whereas the How type of DE provided specific definitional information, including details and clarifications of domain concepts. Strategic explanations, being the least specific type, only depicted the global relationships among relevant domain concepts and how each concept related to the task. Because experts possessed more knowledge and conceptual background, their use of DE was concentrated on How explanations, probably for occasional clarification and definitional details. Experts paid less attention to the justification and strategic relationships of the various domain concepts. In contrast, novices preferred justification (Why) as well as specific definitional information (How).

Contextualized requests for DE were focused on How explanations, for specific definitions and clarifications of domain concepts, probably because users needed different types of information at different stages of KBS use. In the context of problem solving, the information most specific and relevant to judgment and decision making was needed. When DE were requested in the abstract, they were mainly used for general understanding. Hypertext seemed to be effective in enabling access to the most relevant knowledge in the context of decision making.

Among the three types of RTE, How explanations provided a detailed trace of the reasoning process, Why explanations justified the importance of each individual KBS recommendation, and Strategic explanations depicted the global relationships among all the recommendations in a particular sub-analysis. Experts' requests for RTE were more focused on How explanations, similar to their use of DE. This might be because the relevance and importance of KBS recommendations could be relatively easily understood by experts, and they had more knowledge and experience with which to judge the quality of reasoning by examining the elaboration of the line of reasoning (How). Thus, experts might perceive How explanations as the most useful type. They paid less attention to the global relationships between individual recommendations and the overall objective of the analysis tasks (Strategic).
In contrast, novices, who lacked experience in dealing with realistic financial analysis cases, equally preferred elaboration on the line of reasoning (How) and justification in terms of the importance and relevance of KBS recommendations (Why). Overall, the use of hypertext to represent and access DE did not influence the preference for RTE types.

The above findings should be interpreted and understood with caution, for two reasons. First, users' preference for explanation types might have been influenced by the presentation sequence. In the experimental KBS, the sequence of the buttons for accessing the explanations was always Why, How, and Strategic. However, such consistency was necessary, and the lack of it would have adversely affected the quality of the user interface of the KBS. Second, the analyses conducted in this section were based on computer logs of the numbers, types, and means of explanations requested by subjects. The cognitive use of explanations was analyzed based on verbal protocols, and the results are presented in Chapter 7, to determine the nature and motivation of explanation use.

6.5 Summary

Major research findings presented in this chapter are summarized in Table 6.23.

Hypothesis                                                             Result
H1a (No. of deep explanations): HT(1) > LT                             Supported
H1b (No. of reasoning-trace explanations): HT > LT                     Not Supported
H2 (No. of contextualized deep explanation use): Context > Abstract    Some Evidence
H3a (Interaction between expertise and explanation method on the
    number of deep explanations requested)                             Not Supported
H3b (Interaction between expertise and explanation method on the
    number of reasoning-trace explanations requested)                  Not Supported
H4a (Improvement in decision accuracy): HT > LT                        Supported
H4b (Perceived usefulness of explanations): HT > LT                    Not Supported
H4c (Trust in KBS): HT > LT                                            Not Supported
H5a (Improvement in decision accuracy): NV(2) > EX                     Supported
H5b (Perceived usefulness of explanations): NV > EX                    Not Supported
H5c (Trust in KBS): NV > EX                                            Not Supported
H6a (Use of deep explanations --> improvement in decision accuracy)    Supported
H6b (Use of deep explanations --> perceived usefulness of
    explanations)                                                      Supported
H6c (Use of deep explanations --> trust in KBS)                        Supported
H7a (Use of reasoning-trace explanations --> improvement in
    decision accuracy)                                                 Not Supported
H7b (Use of reasoning-trace explanations --> perceived usefulness
    of explanations)                                                   Supported
H7c (Use of reasoning-trace explanations --> trust in KBS)             Supported

Note: 1. HT = hypertext users; LT = lineartext users. 2. NV = novices; EX = experts.

Table 6.23 Summary of Findings

As expected, the use of hypertext significantly increased the requests for DE (H1a). Users with access to hypertext virtually doubled the number of DE requested (H1a and H2). While the context of DE requests did not shift completely from the abstract to the context of problem solving, about 37% of the DE use was actually contextualized. The improvement in decision accuracy (H4a) as a result of access to hypertext-based DE was also of major importance. It is also interesting to note the positive correlation between the use of DE and improvement in decision accuracy (H6a). Such effects were particularly significant for novices (H5a). The likely reason for this is that they used more DE, and using more DE led to greater improvement in decision accuracy. Expertise and explanation provision methods did not have a significant impact on user perceptions, although the use of DE and RTE had some weak effects.
It is not surprising that most of the significant results were related to hypertext and DE use. In this study, hypertext was used in the experimental systems to enhance the representation of, and access to, DE. The use of hypertext changed the representation and accessibility of DE to a greater extent than that of RTE. In other words, the RTE in the two experimental KBS were similar, except that the hypertext version allowed contextualized access to DE. Data analysis indicated that the feasibility of accessing deep knowledge from RTE did not have much effect on the use of RTE, although the use of DE and RTE had some weak effects.

Experts and novices had different preferences for DE and RTE types. Experts preferred the How type, whereas novices preferred both the Why and the How types. Access to hypertext-based DE substantially increased the proportion of the How type of DE requested, but had little effect on the preference for RTE types.

CHAPTER 7. DATA ANALYSIS (PART II): VERBAL PROTOCOL ANALYSIS

This chapter is a self-contained report of the analysis of verbal protocols related to explanation use, covering details ranging from method issues and experimental procedures to analysis results. The primary objective of the verbal protocol analysis was to understand the nature of explanation use. Verbal protocol data were analyzed mainly qualitatively, in the form of description and categorization, as well as quantitatively, using statistical tests wherever appropriate. In this chapter, general method issues associated with verbal protocol data are reviewed (Section 7.1), followed by a description of the objectives and characteristics of this particular verbal protocol analysis (Section 7.2). The methods and procedures used for collecting verbal reports (Section 7.3), and for coding and analyzing the data (Section 7.4), are reported in detail.
The rationale for the use of deep explanations (DE) and reasoning-trace explanations (RTE) is classified (Sections 7.5 and 7.6) and examined statistically (Section 7.7). The data are then examined in terms of the theoretical foundations of this study (Section 7.8), followed by the findings of the two case studies (Section 7.9). Lastly, conclusions are drawn through a discussion of the effects of verbalization on performance, lessons learned from this study, and major research findings and contributions (Section 7.10).

7.1 The Use of Verbal Protocol Analysis

Verbal protocol analysis is a method of data collection and analysis that relies on verbal reports "to gain information about the course and mechanisms of cognitive processes" of the subject's internal states (Ericsson & Simon, 1980, p. 215). In the field of information systems research, verbal protocol analysis can be used for a number of purposes, such as understanding general principles of human information processing, and evaluating human-computer interfaces and decision-making support (Todd & Benbasat, 1987).

There are two major concerns regarding verbal protocol analysis, namely, its validity and its completeness. Validity can be compromised if the instruction to verbalize and the researcher's probing affect the cognitive process itself. Concurrent verbalizing has a greater chance of interference than retrospective reports. Questioning (structured probing on specific aspects of the process) during the experiment may also be obtrusive to the cognitive process. The other concern is whether verbal data can be comprehensive enough to reflect the course and structure of the underlying cognitive process. Cognitive overload can cause omission when subjects are not able to report all the available information in short-term memory (STM) at one time, because information used in parallel tasks rapidly displaces other information in STM.
Another probable reason for the incompleteness of verbal reports may be that not all of the information previously available in STM has been retained, or is retrievable, in long-term memory (LTM).

Criticisms of verbal protocol analysis were carefully evaluated in Ericsson and Simon's (1980, 1993) analysis of a large body of think-aloud data. This analysis revealed the conditions under which verbal data provide an authentic trace of the task-related process, and the conditions that result in interference and incomplete protocols. They concluded that the contents of thinking aloud and immediate retrospective reports (vis-a-vis those collected at the end of the experiment) were valid, and that the validity and completeness of verbal reports could be enhanced by adopting appropriate methods. In general, concurrent verbal reports should be more informative than retrospective reports, and immediate retrospective reports should be preferable to those that are delayed. They also found that instructing subjects to think aloud on a wide range of tasks did not appear to change the cognitive processes, although it did result in additional time being needed to complete the verbalization.

Ericsson and Simon (1980, 1993) proposed a theoretical foundation for the analysis of the cognitive processes underlying verbal reports of thinking, using the framework of the information-processing theory of human cognition. This theory (Newell & Simon, 1972) postulates that a cognitive process can be seen as a sequence of internal states successively transformed by a series of information processes. Each state can be described, in large part, in terms of the small number of information structures, or chunks, that are attended to or heeded, or that are available in the limited storage capacity of STM. A critical assumption is that information must be heeded before it can be verbalized, and thereby made observable (Ericsson, 1988).
The optimal time for verbal reporting is when the thoughts first enter the subjects' attention as part of their effort to complete a task. This implies concurrent verbal reporting of thinking. Furthermore, concurrent verbalization can be classified into at least three different levels. Level 1 concurrent verbalization, "pure" thinking aloud, includes only the verbalization of thoughts naturally entering attention. The verbalized information consists of thoughts that are already encoded orally as inner speech, so no special effort is needed to communicate them. The corresponding instruction is sometimes called "talking aloud" (Ericsson & Simon, 1993). Level 2 involves description, which needs "to explicate or label information that is held in a compressed internal format or in an encoding that is not isomorphic with language" (Ericsson & Simon, 1993, p. 79). The additional information for description must be verbally encoded before it can be vocalized. The corresponding instruction is often known as "thinking aloud." Level 3 requires the explanation of the thought processes or thoughts. Simply recording information already in STM is not enough; special effort needs to be made to link information in STM to earlier thoughts and information attended to. The fundamental difference between levels 2 and 3 is whether or not additional information that is not present in STM needs to be brought into the focus of the subject's attention. According to Ericsson and Simon, level 1 and 2 verbalizations do not change the sequence of heeded information. Empirical studies have shown that they do not change the accuracy of performance, although level 2 may take more time. Verbalizations that cannot be categorized into these two levels may systematically change a subject's cognitive processes, as demonstrated by observable changes in their performance on the task.
7.2 Verbal Protocol Analysis Used in This Study

Verbal protocol analysis in this study was the instrument for exploring the cognitive process of users' requests for, and utilization of, KBS explanations. The focus was "the pre-decisional behaviour" (Todd & Benbasat, 1987) that took place when an individual requested KBS explanations. Verbal protocol data were collected to supplement the primary quantitative measures, based on computer logs and questionnaire data, of the process and outcome of explanation use. Since computer logs provided no indication of why explanations were requested, or of whether or not explanations were used for decision making, the cognitive process of explanation use could only be investigated using process tracing methods (Todd & Benbasat, 1987), such as verbal protocol analysis. Without the process tracing measures, a study of explanation use may need the following assumption, although it is often not explicitly stated in prior studies: if explanations are requested, they are read, comprehended, and applied in decision making. Another related assumption is that subjects work with limited time to finish a task, and intend to optimize their performance, i.e., to deliver the best quality judgment with the minimum amount of effort and time. Thus, subjects will use optional KBS features, such as explanations, only if they perceive that such features may improve the quality of their work.

The more specific objectives of this verbal protocol analysis will now be discussed. First, verbal protocol data were expected to provide evidence regarding why a user would request explanations given limited time and cognitive resources, how explanations were used for understanding KBS output, and what the common reasons for and nature of explanation use were. Second, verbal protocol data were expected to link explanation use behaviour to the underlying theoretical foundation.
For example, intervening variables of information processing, originating from discourse comprehension theories, were used to predict the effects of hypertext and domain expertise, e.g., in terms of reducing the number of bridging inferences and memory reinstatements. These variables are not directly observable; how the experimental treatments affect them may only be inferred from verbal reports. Third, verbal protocol data might give meaning to other quantitative measures, and help interpret the research findings reported in the previous chapter. For example, verbal protocol data might help explain the importance of contextualized access to DE, and why the use of DE would improve decision accuracy, as found in Chapter 6.

The above objectives differed from the usual goal when applying the thinking-aloud method, namely, understanding the strategy and process of problem solving (Ericsson, 1988). This research investigated the process of user-KBS interaction, rather than the complete cognitive process of reaching evaluative judgments, e.g., how a subject reached an agreement rating, and what information went into the cognitive process. Of particular interest were the reasons for and nature of explanation use, which accounted for only a small part of the whole decision-making process. Furthermore, the large amount of text processing associated with the use of the KBS also made the experimental task unique. The task was essentially a combination of text processing and diagnostic decision making. A subject read a large amount of text- and graphics-based output from a KBS, including recommendations and explanations. Explanations were not automatically displayed; in fact, a user could finish the task without requesting any explanations at all. The subject made diagnostic judgments regarding the problem at hand, taking into account the recommendations and relevant explanations provided by the KBS, and the subject's own analysis of the case.
The objectives and characteristics of this verbal protocol analysis were taken into account in the development of an appropriate verbal reporting method, the instructions to verbalize, and the coding and analysis of the verbal data.

7.3 Method and Procedure to Verbalize

The instructions to verbalize were designed based on the research objectives and a number of constraints. One of the key constraints was to minimize the danger of interfering with the user-KBS interaction process, since verbal protocol data were a secondary consideration in this research. Level 1 concurrent verbalization would minimize the chance of potential interference. However, it might generate few and uninformative comments. On the other hand, while probing, or having subjects explain the reason for each action taken during KBS use, might yield information directly related to the research question, there was the danger of substantially influencing subjects' behaviour. Consequently, the validity of the principal quantitative measures might be jeopardized. The goal was to select a form of verbalization that would not interfere with the cognitive process of task performance, and yet be informative enough about the cognitive processes that would normally occur. Level 2 concurrent verbalization (Ericsson & Simon, 1980, 1993), i.e., "thinking aloud," was selected as a compromise to balance the above concerns. Although level 2 methods might not generate as much specific information as probing or some level 3 methods, there was less danger of interfering with the cognitive process. Thinking-aloud protocols, and their correspondence to levels 1, 2, and 3, may be influenced by the wording of the instructions. The instructions for level 2 think-aloud recommended by Ericsson and Simon (1993) were adopted for this study. The key message in the instructions was "SAY OUT LOUD everything that passes through your mind for each step as you interact with the expert system" (H1 of Appendix H).
There was also a supplementary instruction: "it does not matter if your sentences are not complete, since you are not explaining to anyone else. Just act as if you are alone in the room speaking to yourself loudly." The purpose of including this supplementary instruction was to encourage the subjects to provide more thorough protocols, and to discourage them from explaining their actions and slipping into level 3 verbalization. In addition, subjects were told that if they were silent for more than 10 seconds, they would be reminded, so that they had a clear idea of what was expected.

There were two additional difficulties that were dealt with in the development of the instructions. First, subjects might read frequently and provide insufficient verbal reports, because it is difficult to read silently and yet think aloud (Waern, 1988). To solve this problem, subjects should be asked to think aloud and read aloud (Waern, 1988). Therefore, subjects were prompted with verbal instructions to read the text aloud as well. Second, it was known that think-aloud instructions given to subjects reading easy essay text tended to yield little verbalization beyond the reading of the text. The usual prescription for obtaining informative think-aloud reports was to break up the continuity of the reading process. Some researchers of text comprehension have used sentence-by-sentence reading and "thinking aloud." However, this method would create an unrealistic condition and thus undermine the ecological validity(1) of this research. Not only would this method depart from normal reading, it would also change the normal user-KBS interaction. Instead, subjects were asked to provide an agreement rating on a scale from 1 to 7 for each recommendation; thus, they had to slow down their reading, try to understand each recommendation, and become engaged in the task. The instructions (Appendix H1) were tested through the pilot study, involving two of the four pilot subjects.
The instructions appeared to be understood and followed by the subjects. As a result of the pilot study, one minor change was made in the verbal instructions to subjects: before the subjects were given the written instructions to verbalize, they were asked to relate reading and thinking aloud to their past experience of doing so while alone (Appendix H2). This instruction appeared to help subjects start verbalizing naturally.

Note 1: Ecological validity refers to the environment in which the evaluation takes place and the degree to which the evaluation may affect the results it obtains. The degree to which this occurs depends on the level of intrusion into users' work and the control exercised over the users' task by the evaluator. Obtrusive techniques are those where users are constantly aware that their behaviour is being monitored or where they have to interrupt their work to provide some information (Preece, et al., 1994, p. 698).

Two simple training tasks were prepared (Appendix H1), as suggested by Ericsson and Simon (1993). If a subject did well with the first one, the second one was skipped to save time. Training to use the experimental KBS and training to think aloud were separated into two tasks, although they could have been combined into one. The advantage of combining the two was that subjects would get training in a task similar to the experimental task, but the shortcoming was that it might cognitively overburden subjects. Thus, training for thinking aloud was given following the training for using the KBS. After the simple training task(s), subjects were asked to work on the tutorial KBS again, for just one screen, while verbalizing. The training usually took about five to ten minutes. The experimenter sat beside the subject unobtrusively to ensure that help was available in case the subject had any problem with the experimental system.
Whenever the subject became silent for a significant amount of time (more than 15 seconds), the experimenter would intervene with "What are you thinking of now?" or "Keep talking please," spoken in a neutral voice. Ericsson and Simon (1993) suggested that the experimenter should be cautious about interfering during the experiment, and recommended that the subject be reminded to speak after a pause of 15 seconds to one minute, with the interval depending on the type of study. Before the training session started, the experimenter would ask whether or not the subject had any objection to thinking aloud and being tape-recorded. No one objected. After the training session, most of the subjects could verbalize while working on the task without the need for prompting; some needed only very little prompting at the beginning of the task. However, a couple of them were very uneasy with thinking aloud, kept having long pauses, and sometimes verbalized in a low voice. Training and repeated reminding were not effective in overcoming these individuals' difficulty.
It was important to safeguard against any possible interference that thinking aloud might introduce into the process. Thus, five subjects in each treatment condition were randomly selected to think aloud while working on the experimental task. In the end, nineteen sets of verbal protocols were usable for analysis. In one case, the microphone was not set up properly, and the recording quality was so poor that the tape was impossible to transcribe.

7.4 Data Analysis Method

The author of this dissertation transcribed all of the verbal reports in two stages. Each tape was first transcribed to record the information on the tape. Then, in the second stage, with the KBS running, the verbal reports were matched against the printed computer logs and KBS screens. Thus, what was read from the KBS was differentiated from the thoughts of subjects, and each segment of the verbal data was labelled according to the corresponding KBS screen. Task features were considered in designing a relevant and meaningful coding scheme. The experimental KBS involved seven subanalyses of identical structure; each subanalysis consisted of an index screen of domain concepts, a relevant data table, recommendations, optional DE at various stages depending on the treatment condition, and RTE. In each of these seven subanalyses, subjects read three to four recommendations, and might have requested the corresponding RTE. Altogether, subjects received 25 recommendations related to the seven subanalyses. Given these recursive processes, a scoring and scanning approach was more appropriate than global modelling of the whole process. Generally speaking, scoring refers to tabulating the frequencies of certain key items of interest, while scanning involves examining the protocols for information that assists in interpreting other observations (see the review by Todd & Benbasat, 1987). Based on the results of scoring, various types of explanation use can be classified.
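The scoring step described above amounts to a frequency tabulation over coded protocol segments. The following is a minimal sketch of such a tally in Python; the subject identifiers and category labels are hypothetical stand-ins for illustration, not the actual codes from the scheme in Appendix H3:

```python
from collections import Counter

# Hypothetical coded segments: (subject id, coded category of an explanation
# request). Both the ids and the category labels are illustrative only.
coded_segments = [
    ("N04", "learning"), ("N04", "refreshing"), ("N14", "learning"),
    ("E03", "confirming"), ("E05", "refreshing"), ("E05", "confirming"),
]

# Scoring: tabulate how often each category occurs across all protocols.
category_counts = Counter(category for _, category in coded_segments)

# The resulting frequencies can then be compared across treatment groups.
for category, count in sorted(category_counts.items()):
    print(category, count)
```

Scanning, by contrast, has no such mechanical equivalent: it is the qualitative reading of the surrounding protocol that lets each request be assigned a category in the first place.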
Corresponding frequencies can also be compared to identify patterns of explanation use, and differences and similarities can be highlighted among the various treatment groups. A coding scheme was designed before conducting the experiment, based on the research questions, task features, and the theoretically expected effects of hypertext and domain knowledge (discussed in Chapter 3). Furthermore, prior research on discourse comprehension using verbal protocol data was also useful for conceiving the scheme. According to Just and Carpenter (1980), if texts are well adapted (formed) to the reader, reading proceeds rapidly and smoothly, with infrequent pauses and re-readings of text. The meaning emerges directly to the attention of subjects without any intermediate reportable states, and a think-aloud protocol from reading does not contain any information beyond that in the text itself (Ericsson, 1988). Texts can be difficult if they are poorly organized or poorly matched with readers' prior knowledge. To comprehend difficult texts, subjects need to actively retrieve and integrate their own relevant knowledge and information presented earlier in the text. Therefore, to classify the nature of the use of DE, it was important to identify verbal reports indicative of users' prior knowledge of the domain concepts involved in KBS output, before, at the moment of, and after requesting DE. To classify the nature of the use of RTE, it was important to assess the level of understanding of, or agreement with, the KBS recommendation before and at the moment of requesting RTE. The coding scheme for DE use was therefore developed on the key dimension of prior knowledge of domain concepts, and the coding scheme for RTE on prior understanding of, and agreement with, KBS recommendations. Ericsson (1988) argued that it is possible to specify the information that subjects must retrieve to generate the necessary inferences and relations needed to attain comprehension.
These retrieval activities correspond to a series of intermediate steps involving heeded information, which should be reflected in subjects' concurrent verbal reports on the comprehension process. The coding scheme focused on the circumstances of explanation requests. The nature of explanation use was inferred from verbal reports that were indicative of the type of comprehension difficulty. Therefore, indicators of comprehension difficulty were to be collected, including reading versus re-reading, pauses, and paraphrasing of KBS output. In this study, KBS output may be characterized as moderately difficult to understand, because specialized domain knowledge was required. Explanations were expected to provide some of the knowledge. Most of the domain concepts involved in KBS output were often used, and familiar to the subjects; but some were not frequently used and had been forgotten by most of the subjects. A couple of the domain concepts were so rarely used that even expert subjects were not familiar with them. The use of DE to learn about new or unfamiliar concepts was expected; thus, a corresponding category of DE use was created. If the user already had some prior knowledge, DE might be requested for additional knowledge. DE use was also expected for refreshing users' memory of certain domain knowledge, which corresponded to the memory reinstatement variable of the discourse comprehension theories. If the user had a substantial amount of prior knowledge, DE might be requested for confirming or comparing with what was known. Categories of DE use were created accordingly to capture these phenomena. KBS users were not expected to understand and agree with all the recommendations. The Why and How types of RTE were provided to help users understand the significance and reasoning processes of recommendations, respectively. Accordingly, two categories of RTE use were included in the coding scheme.
If more prior understanding of KBS recommendations was present, RTE might be requested for additional information, or to confirm or compare with one's own reasoning. Categories of RTE use were created accordingly to reflect these situations. The coding scheme was tested on the verbal reports collected from the two pilot subjects, then substantially modified. Some measures of initial interest were difficult to assess consistently and reliably, e.g., the difference between reading silently and pausing, between reading and re-reading silently, and the use of prior domain knowledge. Therefore, the scope of coding was reduced to include only verbal reports that were related to the selection, request, reading, and application of explanations, and that could be more reliably coded. Only a small, but the most relevant, part of the verbal protocols was examined in detail. Other measures were abandoned, including reading versus re-reading, pauses, and paraphrasing, as well as the use of prior domain knowledge versus the generation and use of new knowledge. Two additional categories of RTE request, namely surprise and disagreement, were derived from the pilot verbal reports and added to the coding scheme. The final coding scheme is attached as Appendix H3; it defines all of the major categories of explanation use and illustrates each category with examples. All verbal protocols were then coded using this scheme.

7.5 Classifications of the Use of Deep Explanations

This section and the next one present the major categories of DE and RTE use, respectively. Each category defined in the coding scheme is illustrated with the most representative and informative examples of verbal reports. The following notations are used in the examples of verbal reports presented in the next two sections: Cx.y refers to the yth recommendation (conclusion) in subanalysis x, where x is the subanalysis id. For example, C6.1 represents the first recommendation in the Liquidity Analysis (No. 6).
Similarly, DE-how-6.1 represents the How type of DE for the first ratio (current ratio) involved in the analysis, explaining how the ratio is defined; while DE-why-6.1 is the Why type, justifying the relevance of the ratio. RTE-how-6.1 refers to the reasoning-trace How explanation of C6.1, offering details of the reasoning process; whereas RTE-why-6.1 represents the corresponding reasoning-trace Why explanation, justifying the significance of C6.1. Verbal reports made by subjects are in quotation marks. KBS recommendations and explanations read by subjects may not be spelled out in full, to save space, and are in italics. Also, bold letters indicate domain concepts highlighted by hypertext markers in the hypertext version of the KBS. Words stressed by subjects are underlined. The classification was mainly based on information from verbal reports that was indicative of users' prior familiarity with the domain concepts, ranging from no prior knowledge at all to nearly complete prior knowledge, and of the intention at the moment of requesting explanations. All four major categories of explanation use defined in the coding scheme emerged from the coding as expected; they are reported as follows.

1. Learning about new or unfamiliar domain concepts occurred if the user seemed to have no prior knowledge. Typical indicators included, "What does it mean?" and "Never heard of that in my life, I'm going to ask Why." A user might also paraphrase the domain concepts explained by the KBS (indicating an effort to understand them), e.g., "What is liquidity index? So that's dollars by days in receivable. What does that give you? dollar times days, divided by dollars, so that's going to be days ..." Although most of the domain concepts should have been familiar to users, a few of them were seldom used in practice, and unheard of by most of the users (cf. H4 in Appendix H).
The other common characteristic of learning was that users tended to read both the Why and How DE if they attempted to fully understand the concept. Both of these features helped differentiate the use of explanations for learning new concepts from that for seeking additional information. For example, Expert No. 4 read C10.3, "Yeah, wow, how did they do that? I'll see Strategics this time first." Read RTE-Strategic-10, "back up, how?" Read RTE-how-10.3, "Funds reinvestment ratio, take a look!" He jumped to the corresponding DE-how-10.4, "Never heard of that before!" and started to paraphrase DE-how-10.4: "Look at the balance sheet. Capital, plant, equipment. So it is all your assets. Cash divided by all your assets, you can expect it will shrink as the company is getting larger." DE of domain concepts provided not only necessary information for users to understand and accept KBS recommendations, but also a basis for users, particularly expert users, to challenge the validity of the recommendations. For example, upon reading DE-how-10.5, Expert No. 3 remarked: "Funds flow adequacy ratio is a stupid calculation, 'cause it takes inventories out, and then, the numerator, and adds it back in as borrow, double counting, though. So, I'm not sure what that tells us." In either case, explanations helped users to understand, and then accept or reject, KBS recommendations.

2. Seeking additional information occurred if the user probably already knew something about the domain concept. This was a common type of DE use, as both novices and experts had basic training in, and theoretical background on, the task domain. Typical indicators included, "I want to see how they do trend evaluation, that is really important" and "See how it does it." When users started any one of the seven subanalyses, they would first be given a list of domain concepts (ratios) to be used in that analysis.
Usually, this was when users would scan through the list, looking for ratios new or unfamiliar to them, and for ratios on which they might need additional information. Users who used the hypertext version of the KBS were more likely to use contextualized access to DE. For example, while reading C6.1, Novice No. 4 noticed the hypertext marker attached to the term working capital, and clicked on it for more information. He started with "this says, is in a very favourable working capital position ... for a short-term loan. And, working capital, just to see how they computed that," then requested and read DE-how-6.10, "current assets less current liabilities, current ratio. Based on that statement, I would agree with that." Similarly, when Novice No. 14 was reading C10.2, she did the following: "Funds flow adequacy, would like to know a little bit more about that." She stopped reading the recommendation, requested DE-why-10.5 for an explanation of the concept, then came back to read C10.2.

3. Refreshing memory occurred if the domain concept was familiar to the user, but the user had either forgotten it or could not recall it completely when it was needed. Typical indicators included "I forget what it means" (before reading the explanation) and "I should have known that" (after reading the explanation). While all of the users should have had the theoretical background from prior training and experience, those who did not frequently conduct the type of financial analysis task performed in the experiment needed memory refreshment on some of the key concepts. For example, Novice No. 12 examined data on Table 7, "Equity/liabilities is OK. Equity/long-term, equity/plant, times interests earned. I forgot what the hell that means, so I'll check that. (DE-why-7.4) ... ability to earn profit sufficient to cover interest charges on debt. OK, that's all I wanted to know. That's, yeah, that's excellent ..." Similarly, looking at a list of ratios, Novice No.
6 questioned, "financial leverage, how did it calculate that?" She then requested and read DE-how-8.11, "Average total assets over total average shareholders' equity. OK, I should have known that." Novice No. 14 voiced the following when examining ratios used in the Liquidity Analysis, "I need to refresh on the acid-test, (DE-how-6.2), cash plus cash equivalents, plus account receivable ... Basically, current to current, well, let's look at the current (ratio), then. (DE-how-6.1) OK, just current, OK, so. Look at it again, see what is not included (DE-how-6.2), cash, cash equivalents, account receivable, oh, not inventories, there is a huge difference." Expert No. 5 used the hypertext version of the KBS. She read C10.2, could not understand the term funds flow adequacy, used the hypertext-based explanation, and then commented: "That's good, I like that, that's useful, that's useful. It's handy to have that quickly like that to remind you if you are ..." "You learn a lot from this over time, if you, like I forgot some of the stuff from when I was doing that, and we did use to pay much more attention to that (ratios). You wouldn't need to go through these all the time, these boxes. Once a while, you could refresh your knowledge about things. It would be good."

4. Confirming or comparing with one's own knowledge occurred when the user already had a definite idea about the domain concept. Often this type of explanation use was signified by "OK, that's the way I would do it" (after reading the explanation) or "that makes sense to me" (before reading the explanation). In many cases, users just wanted to use explanations to confirm what they remembered. For example, when looking at the list of ratios involved in the Liquidity Analysis, Novice No. 4 commented "OK, just to make sure that I originally did this, see how they computed some things," and went ahead to request and read a few DE. He concluded "That's fine." Expert No. 5 commented "about price-earning, ok.
Now, I want to know why this is important (DE-why-9.1), although I think I know why hopefully." Similarly, after requesting DE-why-6.4, Expert No. 5 uttered: "I know that, but, I'm just curious, to see what it's going to say to me."

5. The Others category included all other types of explanation use that did not fit into the above four categories. Each instance falling into this class was labelled by the coder. For example, occasionally, some users might request explanations just out of curiosity, to see what the explanations had to say about the domain concepts, e.g., "I'm curious about what they say about that?" Most of the DE use falling into the Others category might be labelled as browsing, and usually occurred when the user was reading index screens, often involving the Strategic type of explanation. The purpose might be to see if there was anything interesting or particularly useful. No additional major category of explanation use could be identified from the Others category after all the protocol data were coded.

6. The Inconclusive category was designed for situations where there was not enough information to determine to which of the above categories the use of an explanation belonged.

7. Not using DE was treated as a separate class to record situations where users did not request explanations. Usually, no verbal reports were provided if users did not use explanations, but occasionally a few words were verbalized. For example, Novice No. 10 skipped an index screen of domain concepts, where DE on the concepts could have been sought, since he was "very comfortable, very comfortable with the ratios." Expert users usually skipped index screens, thinking "This is the stuff I already know" (Expert No. 3), "I should know that already" (Expert No. 5), and "Everything there was straightforward" (Expert No. 17). Some would start by looking for "anything I don't know? or feel uncomfortable with" (Expert No. 17).
7.6 Classification of the Use of Reasoning-Trace Explanations

The classification of the use of RTE was based on information indicative of the extent to which users already had prior expectations, understanding, and opinions regarding a recommendation (they had all studied the case without KBS support first), as well as of the intention at the moment of requesting explanations.

1. Understanding the significance or relevance of a recommendation would be sought when something not thought of or not understood was given. Users might utter something like "Why is it important?" "Why do you say that?" or simply "Why?" For example, Novice No. 4 read and paraphrased C7.3, "... not maintain enough investment to be competitive. Why do you say that? why is that important?" Then, he went ahead to request and read RTE-why-7.3.

2. Understanding the reasoning process of a recommendation would be sought when one could more or less accept the recommendation, but had no idea about the reasoning process and would seek more details. The use of explanations for understanding the reasoning process was relatively easy to identify, usually evidenced by "How did they do that?" or simply "How?" For example, Novice No. 2 read C5.3, then said "Let's see how they got that conclusion," and requested, read, and paraphrased RTE-how-5.3. Another example: Novice No. 4 read C9.2, "The company's growth has peaked, ... diversify into newer ... products ... in the last five years." Having some difficulty understanding the recommendation, he re-read it, "diversify into newer markets, let's see how they came up with that." Then, he went on to request and read RTE-how-9.2. The use of RTE to understand the underlying knowledge and reasoning process did not always lead to agreement with the recommendation. This seemed to be particularly true for expert users. For example, upon reading C7.2, Expert No.
3 first examined the reading material printed on paper for relevant data, then questioned "how did they reach that conclusion?" He requested RTE-how-7.2, and contended "OK. That's wrong. High-tech is almost always equity financed, almost 100%, because of the risk." By contrast, the same user read C10.3, "how did you arrive at that conclusion?" He went on to request and read RTE-how-10.3, then, "OK, agree with that."

3. Confirming or comparing with one's own judgment would be sought when the user had some idea more or less consistent with the recommendation of the KBS, which was critical for decision-making. Common indicators were, "Makes sense" (after reading the recommendation but before reading the explanation), "That is what I expected," "I have to agree, let's see why and how," "Yes, I noticed," and "Yes, that's what I thought." Sometimes, even if the KBS reached the same conclusion as a user did, the user would still like to know the exact reasoning process of the KBS. For example, upon reading C10.1, which praised the cash flow management of the company, Novice No. 10 commented "I thought, they managed cash flows very well. Using the expert system to see why this is important," and proceeded to check the Why explanation, then the How, to compare with his own reasoning. Novice No. 6 read C6.1, then uttered "Well, I, kind of agree. But, what's, how come?" Then, she requested and read RTE-how-6.1. Similarly, Expert No. 2 commented on C6.2, "Certainly true, undisputable. Why? How?" He then requested and read RTE-how-6.2.

4. Seeking additional (missing) details would be tried when the user thought some specific details in the explanation might be useful to back up or justify a recommendation. Typical indicators included "Where do they get R&D numbers?" and "Let's take a look at problems of credit sales letting the accounts receivable build-up ... (read RTE-why-6.2) poor collection practices." For example, upon reading C7.3, Novice No.
10 commented "Yes, it seems to be true. Let's look at the figure." Then, he requested RTE-how-7.3. Similarly, upon reading C5.4, Novice No. 4 commented "How did they determine the level of investment in R&D?" then requested and read RTE-how-5.4, subsequently re-read it, and finally flipped through the reading material on paper for the missing R&D numbers. Novice No. 4 read the first half of C5.4, uttered "self-production, right." Then, continued, "that's a surprising part. It depends on what they call research and development." He then read the second half, "probably just re-assembly operations, they do not need R&D for that. Where are the R&D numbers? I do not see any R&D numbers here. Income statement? R&D. Cash flow. Where do they get R&D? Just take a look." He read RTE-why-5.4, "Right. Agree." Re-read the recommendation, "OK. They make sense. OK."

5. Surprise might cause a user to seek more information when the user was expecting something different, or something similar but to a different degree. Usually the verbal report included sentences implying surprise and disbelief, such as "Where did you get that? That's bizarre. How?" "I have no idea what they are talking about, let's see Why. ..., Let's see How" and, "How the devil can you know that?" For example, after reading C9.2, Expert No. 2 commented "that doesn't seem to be consistent with anything so far." Then, he requested and read RTE-how-9.2, "And how? you are going to tell me ... internal growth rate, return on equity. OK." Novice No. 6, who was a confident novice and an executive of the UBC Portfolio Management Society, read C7.3, "There is a danger ... not maintaining enough ... mature industries ... than rapidly expanding high-technology. Hmmm, how did it get that?" She requested and read RTE-how-7.3, then, "OK, read this again (C7.3), not maintaining enough, OK, well, I kind of agree, so 3."

6.
Disagreement might cause a user to seek more information when something contrary to what was expected was given. Disagreement was expected to be one of the most common reasons for requesting explanations. It was typically associated with verbal reports like "I don't agree with that at all, ..., How did it do that?" "I don't know. I don't agree with that one." And, "We have to disagree here. Let's take a look, why and how." For example, Novice No. 14 read C4.2, "Investment in current assets seems to be increasing at the expense of fixed assets," repeated "Investment in current assets seems to be increasing at the expense of fixed assets ... Depending on the industry, they are high-tech, distributing. I'm not so sure the current assets are a bad thing to be going up in that kind of industry. So, I am going to disagree slightly, and I'd like to see why they think that way." Then, she requested and read RTE-why-4.2. Similarly, Expert No. 2 read C9.4, and verbalized "... high level of debt ... That's ridiculous, they've already been complaining about the fact that they are not investing enough." He then read RTE-how-9.4, then "So, this looks at dividend yield, should be looking at the reinvestment rate, they have been complaining on the previous screen. OK, next screen." In two other cases, users explicitly indicated in their verbal reports that they would request explanations if they disagreed. Expert No. 5 mumbled "the whys and hows of it, I guess, once I see the conclusions then I would be interested in the explanation. So, that's the way I would feel. If I don't agree with." Expert No. 17 mentioned "if I see something negative, I want to know why or how they came to that conclusion."

7. The Others category included all other types of explanation use that did not fit into the above six categories. Each case falling into this class was labelled by the coder. No additional major category of explanation use emerged in the end.

8.
The Inconclusive category was designed for situations where there was not enough information to determine to which of the above categories the use of an explanation belonged.

9. Not using RTE was treated as a separate class to record situations where users did not request explanations. Often, if users agreed with the KBS, they were less likely to request explanations. In particular, if users had formed an opinion beforehand (e.g., when reviewing the data), once they found that the system agreed with their own opinion, they would almost certainly not request RTE. This was probably because the confirmation was sufficient for the user to make a judgment with confidence. For example, Novice No. 6 agreed with C10.2 without a prior opinion, "Yes, I agree. But, I want to see how you got that. Hrnmm," and then requested RTE-how-10.2. In contrast, when reviewing data, she revealed "the price-earning ratio has increased over time, so in the past five years. So, I guess either the stock is overvalued or something." This happened to be consistent with one of the KBS recommendations, C9.1. So, when she later read it: "Yeah, I agree, because stock is overvalued. OK. I agree, so it is 2." In a similar situation, Expert No. 4 proclaimed twice that the company should pay off its shareholders' dividend before reading C9.3, which proposed the same. When he read C9.3 at last, he immediately agreed strongly, and felt no need to request an explanation. In another example, Novice No. 14 had previously noted the very high level of inventory that the hypothetical company was carrying, and then read C6.3, "Inventory is very unfavourable. Yes!, yes, increasing trend ... tighter control over inventory management. Strongly agree." No explanation was sought for C6.3. Generally speaking, complete agreement or trust often caused a user not to request RTE, e.g., "OK, I don't need whys and hows on that one (C7.3). I trust that one" (Expert No. 2).
No RTE would be sought when verbal protocols included "I thought of that, too" (Novice No. 14, C4.1), "That's what I thought" (Expert No. 5, C10.3), "Yes, we noticed that before" (Novice No. 14, C10.1), and "Yes, I noticed, I strongly agree" (Expert No. 2). By the same token, having a good reason to strongly disagree with a recommendation sometimes seemed to cause a user not to request RTE either. For example, after reading C1.2, Novice No. 4 commented "Ah, yeah, but because they are applying for a loan, it seems irrelevant. Disagree with that (assigned a strong disagreement rating of 6)." There was not enough information in the verbal reports for systematically categorizing the reasons for not using explanations. Users tended to verbalize what they did or intended to do, not what they did not do.

7.7 Pattern of Explanation Use

The classifications of explanation use discussed in the previous two sections were statistically analyzed to identify patterns of use by the various treatment groups. Frequency data were obtained by classifying each explanation request according to its category and experimental treatment condition. Contingency table analysis was performed, because it is particularly appropriate for comparing the equality of two or more distributions over a set of two or more categories. This section presents the results of the statistical analysis. Among the nine experts who provided verbal protocols, three who used the lineartext version of the KBS did not request any DE. Thus the number of experts who requested DE was reduced to six: five in the hypertext condition, and only one in the lineartext condition. Therefore, it became impossible to examine the interaction effects between expertise and explanation provision method, although such interaction effects were not expected.

The Use of Deep Explanations. Before analysis, the six categories of explanation use were aggregated into three more general categories, for a number of reasons.
First, given the relatively small sample size, the data were too thin to be spread over six or more categories. As shown in Appendix I, 17% to 50% of the cells in the contingency tables had expected frequencies smaller than 5, which might result in inaccurate chi-square tests. The reason is that the contingency table model assumes a large sample size (Wickens, 1989), and the sample size should be larger when the marginal categories are not equally likely. Therefore, it was impossible to examine the pattern of explanation use in the original categories, and some kind of aggregation was necessary. Second, since similarities exist among the original six categories, some of the categories could be naturally combined. For example, it was sometimes difficult to distinguish whether a user was trying to learn something completely new or just seeking additional knowledge. These two categories could be combined to reflect a user's state of little or no knowledge about domain concepts. Furthermore, given the similarities between some of the categories, a refined categorization would not be as reliable as a coarse one. The analysis of aggregated data was a necessary tradeoff between coarse-grained but more reliable categorization and analysis, and fine-grained but less reliable categorization and analysis. Table 7.1 shows the overall pattern and the difference between novices and experts in the nature of DE use. On the "less knowledge" side, DE were requested for learning about new or unfamiliar domain concepts, or for seeking additional information. Overall, this category accounted for over 50% of DE use. On the "more knowledge" side, DE were requested for refreshing memory, or for confirming and comparing with what was already known.
The third category included "Others" (mainly browsing) and "Inconclusive" (when no indicative verbal clue was provided).

Table 7.1 Effects of Expertise on DE Use

  Types of DE Use                              Novices¹       Experts        Total
  Learning & Seeking Additional Information     78 (63.9%)    21 (30.4%)     99 (51.8%)
  Refreshing Memory & Confirming/Comparing      38 (31.1%)    36 (52.2%)     74 (38.7%)
  Others & Inconclusive                          6  (4.9%)    12 (17.4%)     18  (9.4%)
  Total                                        122 (63.9%)    69 (36.1%)    191 (100%)

  Pearson's chi-square = 21.85, DF = 2, significance = .00
  Note 1: the number of novices is greater than that of the experts by four (10 versus 6).

The majority of the DE requested by novices were for seeking new or unfamiliar domain knowledge, whereas over 50% of the DE requested by experts were for refreshing, comparing with, and confirming what was already known. For experts, a greater portion of verbal reports were categorized as Others and Inconclusive, implying that they were more likely either to use DE for browsing (most of the cases in the Others category) or to use DE without providing verbal protocols. Table 7.2 was constructed to identify the potential effect of explanation provision methods on the nature of DE use. As expected, there was no significant difference between the hypertext and lineartext groups. In other words, the use of hypertext to provide DE only increased the quantity of DE use, but did not affect the quality (nature or distribution of the major types of DE use).

Table 7.2 Effects of Explanation Provision Methods on DE Use

  Types of DE Use                              Lineartext¹    Hypertext      Total
  Learning & Seeking Additional Information    27 (57.4%)     72 (50.0%)     99 (51.8%)
  Refreshing Memory & Confirming/Comparing     19 (40.4%)     55 (38.2%)     74 (38.7%)
  Others & Inconclusive                         1  (2.1%)     17 (11.8%)     18  (9.4%)
  Total                                        47 (24.6%)    144 (75.4%)    191 (100%)

  Pearson's chi-square = 3.95, DF = 2, significance = .14
  Note 1: the number of hypertext users is greater than that of lineartext users by four (10 versus 6).
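The Pearson chi-square statistic reported under Table 7.1 can be reproduced from the observed frequencies alone. The following sketch computes it in plain Python, summing (observed − expected)² / expected over all cells, with expected counts derived from the row and column marginals; it assumes only the counts shown in the table:

```python
# Observed frequencies from Table 7.1: rows are the three aggregated
# DE-use categories, columns are novices and experts.
observed = [
    [78, 21],   # Learning & Seeking Additional Information
    [38, 36],   # Refreshing Memory & Confirming/Comparing
    [6, 12],    # Others & Inconclusive
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Pearson's chi-square: sum of (O - E)^2 / E over all cells,
# where E = (row total * column total) / grand total.
chi_square = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / grand_total) ** 2
    / (row_totals[i] * col_totals[j] / grand_total)
    for i in range(len(observed))
    for j in range(len(observed[0]))
)

# Degrees of freedom: (rows - 1) * (columns - 1).
dof = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(chi_square, 2), dof)  # 21.85 with 2 DF, as reported above
```

Replacing the observed counts with those of Tables 7.2 through 7.4 reproduces the other reported statistics in the same way.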
The patterns of the hypertext and lineartext groups were about the same.

The Use of Reasoning-Trace Explanations. Before statistical analysis, the initial refined categories of RTE use were aggregated, for the same reasons as for DE use. The four more general categories were: (1) understanding the significance or process of reasoning, where there was little or no prior understanding of KBS recommendations; (2) confirming or comparing with users' own understanding, or seeking additional details, further to some prior understanding; (3) dealing with surprise and disagreement, with evidence of a clear prior expectation or thorough understanding; and (4) other purposes, such as simply browsing, or requesting RTE without providing verbal reports. Table 7.3 shows the overall pattern, and the difference between novices and experts, in the nature of RTE use.

Table 7.3  Effects of Expertise on RTE Use

  Types of RTE Use                                         Novices(1)      Experts         Total
  Understanding Significance & Process                      84  (62.7%)     52  (45.6%)    136  (54.8%)
  Confirming/Comparing & Seeking Additional Information     26  (19.4%)     38  (33.3%)     64  (25.8%)
  Surprise & Disagreement                                   11   (8.2%)     13  (11.4%)     24   (9.7%)
  Others & Inconclusive                                     13   (9.7%)     11   (9.6%)     24   (9.7%)
  Total                                                    134  (54.0%)    114  (46.0%)    248  (100%)

  Pearson's chi-square: value = 8.56, DF = 3, significance = .04
  Note 1: the number of novices is greater than that of experts by one (9 versus 8).

Over 50% of the RTE requested were for understanding the significance and process of recommendations provided by the KBS, compared to about 25% for confirming, comparing, and seeking additional details. About 10% of the RTE were requested because of surprise and disagreement. Most of the RTE requested by novices were for a more basic understanding of the reasoning process and significance. For experts, while understanding the basics of the recommendations was also the most common use of RTE, a third of their RTE requests were for confirming or comparing KBS output with their own understanding.
Table 7.4 indicates that there was no significant difference in the pattern of RTE use between the hypertext and lineartext groups.

Table 7.4  Effects of Explanation Provision Methods on RTE Use

  Types of RTE Use                                         Lineartext(1)   Hypertext       Total
  Understanding Significance & Process                      54  (49.1%)     82  (59.4%)    136  (54.8%)
  Confirming/Comparing & Seeking Additional Information     31  (28.2%)     33  (23.9%)     64  (25.8%)
  Surprise & Disagreement                                   13  (11.8%)     11   (8.0%)     24   (9.7%)
  Others & Inconclusive                                     12  (10.9%)     12   (8.7%)     24   (9.7%)
  Total                                                    110  (44.4%)    138  (55.6%)    248  (100%)

  Pearson's chi-square: value = 2.87, DF = 3, significance = .41
  Note 1: the number of hypertext users is greater than that of lineartext users by one (9 versus 8).

Reliability of Coding and Analysis. The results presented in this section were based on the coding by the author of this dissertation. A graduate research assistant also coded the verbal protocols to verify the reliability of the coding. She was given the coding scheme and additional instructions (Appendix H4), and worked with the author to code the verbal protocols of a pilot subject, to ensure consistency. Then, she independently coded all the verbal protocols. The reliability of the coding was assessed on the aggregated categories, since statistical analyses were performed only on these. Over 70% of all the explanations requested were assigned to the same category by the two coders working independently: the agreement level was 71.7% for DE and 73.1% for RTE. Cohen's Kappa (1960), a commonly used measure of agreement for nominal scales, was also calculated. It is defined as the "proportion of agreement after chance agreement is removed from consideration" (p. 40). When obtained agreement equals chance agreement, kappa equals zero. The Kappa coefficients for DE and RTE were .51 (t = 9.6) and .54 (t = 11.8), respectively.
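Cohen's kappa as defined above is straightforward to compute from a coder-by-coder cross-tabulation. The sketch below is a minimal pure-Python implementation; the 2x2 confusion matrix shown is a made-up illustration, since the study's actual coder-by-coder table is not reproduced in the text.

```python
# Cohen's kappa for inter-coder agreement on nominal categories:
#   kappa = (p_o - p_e) / (1 - p_e)
# i.e., the proportion of agreement remaining after chance agreement
# (p_e, from the marginal totals) is removed from the observed agreement
# (p_o, the diagonal of the confusion matrix).
def cohens_kappa(confusion):
    n = sum(sum(row) for row in confusion)
    p_o = sum(confusion[i][i] for i in range(len(confusion))) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical cross-tabulation: rows are coder 1's categories,
# columns are coder 2's categories (not the study's actual data).
example = [
    [20, 5],
    [10, 15],
]
print(round(cohens_kappa(example), 2))  # 0.4
```

With 70% raw agreement and unbalanced marginals of this kind, kappa lands well below the raw agreement level, which is why the study's kappas of .51 and .54 accompany agreement rates above 70%.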
These two measures of coding consistency and reliability between the coders were reasonable, given the inevitable subjectivity in categorizing verbal protocols and the high complexity of the experimental task. To further verify the reliability of the analyses, the categorization by the second coder was subjected to the same analyses reported in this section. The conclusions for DE use were identical. For RTE use, the conclusion was also the same for the difference between novices and experts. The only discrepancy was that the chi-square test of the effect of explanation provision methods on RTE use was significant (p = .03). A further examination revealed that the discrepancy was mainly caused by reversed patterns of the two treatment groups in the Surprise plus Disagreement category and the Others plus Inconclusive category.

7.8 Linkage to the Theoretical Foundations

Discourse comprehension theories were used to predict the effects of hypertext and domain expertise on the use of explanations (see Chapter 3). However, the cognitive processes and intervening variables were not observable. In studies of text comprehension, the primary source of empirical data has been subjects' subsequent recall and summaries of the texts they read (van Dijk & Kintsch, 1983). Verbal protocol analysis is a more direct method that provides information about the content of thought processes during comprehension, and allows the researcher to identify the information to which people pay attention. According to van Dijk and Kintsch, to comprehend, readers must integrate the information in the sentence they are currently reading with information mentioned previously in the text and with their prior knowledge. It is assumed that only a small number of ideas or propositions from the previous text can be maintained in STM.
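The buffer assumption above can be made concrete with a toy simulation. Everything in the sketch below is illustrative — the buffer size, the string labels standing in for propositions, and the simple membership test are assumptions for exposition, not van Dijk and Kintsch's actual model parameters. Each cycle's propositions either connect to what is held in the STM buffer, are reinstated from LTM, or require a bridging inference.

```python
# Toy rendering of the van Dijk & Kintsch processing cycle: incoming
# propositions must connect to the small STM buffer; failing that, the
# reader reinstates from LTM (episodic memory of the text plus prior
# knowledge) or, if LTM has nothing, constructs a bridging inference.
BUFFER_SIZE = 4  # illustrative; the theory assumes "a small number"

def read_cycles(cycles, ltm):
    stm, episodic = [], set()
    reinstatements = bridging_inferences = 0
    for propositions in cycles:
        for p in propositions:
            if p in stm:
                continue                    # already active: no extra work
            elif p in episodic or p in ltm:
                reinstatements += 1         # retrieve from LTM into STM
            else:
                bridging_inferences += 1    # construct the missing link
            stm.append(p)
            stm = stm[-BUFFER_SIZE:]        # only the most recent items stay
        episodic.update(propositions)       # everything read enters LTM
    return reinstatements, bridging_inferences

# Made-up proposition labels for three reading cycles.
cycles = [["ratio-up", "liquidity"], ["liquidity", "solvency"],
          ["ratio-up", "funds-flow"]]
prior_knowledge = {"solvency"}              # domain knowledge already in LTM
print(read_cycles(cycles, prior_knowledge)) # (1, 3)
```

In these terms, a contextualized DE acts like a third retrieval path: it supplies the missing proposition directly, so neither a reinstatement nor a bridging inference is needed.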
The model proposed by van Dijk and Kintsch involves three intervening variables: the number of memory reinstatements, the number of bridging inferences, and the macrostructure of the text. The need to reinstate a piece of information that is no longer in the STM buffer arises when the textual input on a given cycle cannot be related to the information still held in the buffer. If the information is not available in LTM either, a bridging inference is created. In this research, it is assumed that a user's LTM contains most of the task domain knowledge, acquired through prior training and experience. Some additional knowledge of domain concepts (financial ratios) can be obtained by reading DE. To fully comprehend KBS output, a user must have all the necessary domain knowledge available in STM. In using van Dijk and Kintsch's theory to predict the effects of hypertext and domain expertise, two points need to be emphasized: (1) the contents of memory reinstatement from LTM to STM include not only information read from previous text (here, the reading material and KBS output read earlier), but also domain knowledge accumulated from long-term training and experience; and (2) although information can be reinstated from LTM to STM, it may be vague or incomplete compared to the information contained in KBS explanations, and in some cases the needed information may not exist in LTM at all. Requesting KBS explanations, particularly contextualized access to DE, makes it convenient to bring the necessary knowledge into STM, as an alternative to memory reinstatement and bridging inference. Verbal protocol data could be analyzed and interpreted within this theoretical framework by relating the data to the three intervening variables, to explain and understand the effects of the experimental treatment. Evidence related to each of the three variables is discussed in the following paragraphs.

Explanation Use and Memory Reinstatement.
The number of memory reinstatements may be reduced, and the reinstatements themselves facilitated, by certain types of explanation use. As discussed in Section 7.5, Refreshing Memory is one of the major categories of DE use, and it relates directly to memory reinstatement. If the needed knowledge was vague or partial in LTM, requesting DE supplied more comprehensive and relevant knowledge. Some of the DE use for Confirming/Comparing may also be considered an alternative to memory reinstatement, in the following sense: users knew they had learned about the domain concepts before, and the complete knowledge was retrievable from LTM. Rather than recalling the knowledge from LTM first and then comparing it with the DE, users tended to request the DE first and then judge whether the DE was consistent with what they understood or made sense to them (as evidenced by the sequence of verbal reports). In this case, some memory reinstatements were still needed, but they were facilitated rather than completely eliminated. Since hypertext made it possible to supply deep domain knowledge in a contextualized and instantaneous manner, users were more likely to request DE, and to reduce their dependence on LTM for domain knowledge. For example, Novice No. 12 looked at the list of ratios used in the Market Value subanalysis, "price earnings ratio, ok, ah, a couple of these are not too f - (amiliar?), I need a little refreshing." He then requested DE-why-9.1, "primary measure of the market value ... relates to the company's stock price level and net income." Similarly, all of the examples on refreshing memory given in Section 7.5 are of this nature. Some examples of confirming/comparing can be considered as facilitating memory reinstatement, e.g., "I want to know why this is important, although I think I know why hopefully." RTE were provided to make the recommendations understandable to users.
RTE, the How type in particular, usually recited the most relevant information used in the reasoning. Given the large amount of information available, users often forgot the relevant facts related to a recommendation, and had to refer to the hard copy of the financial statements. Some users used the explanations as a shortcut to the relevant information. The need for memory reinstatement might also be reduced by requesting explanations to Seek Additional Information not mentioned in KBS recommendations. In the cases of Confirming/Comparing with one's own reasoning, users tended to request RTE first and then make a judgment; thus, memory reinstatements were facilitated.

Explanation Use and Bridging Inference. There was evidence that explanation use, particularly the use of hypertext-based DE, reduced the number of bridging inferences. The use of DE to Learn About New Concepts definitely reduced the need to generate bridging inferences, as did the use of DE to Seek Additional Information, to a lesser extent. Hypertext contributed to this effect both by highlighting what was important and available, and by facilitating access to the needed information. For example, when Novice No. 12 was examining the data on Table 6, the following was uttered: "Ok, yeah, current ratio has increased, acid-test has increased ... Conversion period, not sure what that is, ahm, so get them explain it." He then requested and read the corresponding DE-why-6.7, which was enabled by hypertext, "indicates the average days to turn inventory into cash. OK, that's fine." Had he not had contextualized access to the explanations, he would have had to try to infer or guess the meaning of the concept. Novice No. 8, who used the lineartext version of the KBS, obviously had problems with the term price-earnings ratio in C9.3.
She read the related part of C9.3 repeatedly, and appeared to have generated her own bridging inference on the concept, "the consistent increase in the price-earnings ratio, price, price-earnings ratio, over the last five years, is an indication from the market's perspective, the company is doing significantly well. Price-earnings, how is price-earnings calculated again? How?" She requested RTE-how-9.3, which did not provide the needed information, "The conclusion was reached ... with the exception of a slight decline ..., ... evaluation of the price ... 16.28. Next screen. Let's see. So, price, price, you don't want price to be that high, want to be low." Before providing an agreement rating, she re-read C9.3. Had she used the hypertext version of the KBS, she could easily have obtained the required information. Some RTE use could be considered another form of reducing the number of bridging inferences, such as Understanding Significance and Understanding Process of a recommendation (see examples in Section 7.6). In these cases, since users could not follow the recommendation made by the KBS, they requested RTE to bridge the gap between the data and the recommendation, rather than trying to come up with their own bridging inferences. Therefore, users' understanding of KBS recommendations was facilitated, and RTE provided the basis for users to accept or reject the recommendations. Similarly, if a user partially understood a recommendation and wanted to Seek Additional Information, the use of RTE could also be considered a form of reducing the number of bridging inferences.

Explanation Use and Macrostructure of Problem. There was evidence related to the effects of hypertext on conveying the macrostructure of the problem under consideration. Hypermaps are potentially useful for conveying the structure of the underlying domain. It was expected that users would prefer hypermaps to index screens for accessing domain knowledge.
However, only a small number of users actually utilized hypermaps. One reason could be that users were preoccupied with problem solving as opposed to learning. Another possibility is that the index screen was always the first screen for each subanalysis, while hypermaps had to be requested intentionally at later stages. Although hypermaps were not as heavily used as expected, other hypertext features were helpful for constructing the macrostructure of KBS output. In some cases, contextualized access to DE on unfamiliar concepts could be the most crucial step in building up the macrostructure of a specific recommendation. The examples are related to C10.2, and will be discussed further in the next section (Section 7.9). It is worth noting here, however, that they also demonstrate that failure to understand a key concept could make it impossible to fully understand a recommendation. This could happen when the macrostructure of the key issues about domain concepts resided in the DE; in such cases, it would be difficult or even impossible to make a judgment without understanding the domain concepts. Hypertext markers attached to the domain concepts appeared to have a highlighting effect, catching the attention of users. Very often, when users read KBS output, they verbalized only the highlighted domain concepts, which were possibly the most important elements for building a global picture of the key issues involved. In fact, most of the RTE, particularly the How type, were constructed based on the interpretation of a number of financial ratios (domain concepts), which were highlighted in the hypertext version of the KBS. The phenomenon illustrated by the following example was a fairly common one. Novice No. 4 read C6.2, "increasing trend towards having a high proportion of sales on credit, Yeah, remains within a range well below industry level."
Then, "Just to examine again how it was done," he proceeded to view RTE-how-6.2, and produced the following verbal protocols: "days sales in receivable, yeah, OK. Accounts receivable turnover, just to see how they computed that, figure out how (DE-how-6.3), sales over average account receivable, OK. OK, based on that, again, I would agree with that. I'll continue on." The bold letters indicate domain concepts attached with hypertext markers, which caught the subject's attention. This was inferred from the verbal reports, which provided clues as to what information was heeded by the users (Ericsson & Simon, 1993). Furthermore, the highlighting and the signalling of availability triggered the user to request a DE. Sometimes, it was difficult to determine whether contextualized access to domain concepts reduced the number of memory reinstatements or the number of bridging inferences, improved understanding of the problem structure, or contributed to all of these. For example, Novice No. 4 read C8.1, "Canacom's management is following a policy of accepting a lower asset turnover for higher profit margins." "... Return on asset as the increase in the return on sales." He re-read C8.1, "policy of accepting a lower asset turnover for higher profit margins, it is paying off... return on asset, assets have been increasing, hmmm. Asset turnover, how has that been computed?" He then proceeded to request the corresponding DE-how-8.8, "... divided by total assets." Then, he repeated C8.1, "... Continue this policy in the future." "Hmmm, ... better-than-industry return on asset as the increase in the return on sales. OK, OK."

The relationship between the use of explanations and the intervening variables of information processing discussed in this section is summarized in Figure 7.1, which maps the categories of DE use (confirming/comparing, refreshing memory, seeking additional information, and learning new concepts) and of RTE use (confirming/comparing, seeking additional information, understanding process, and understanding significance) onto the information processing variables. The effect of hypertext on the accessibility of DE is shown explicitly in the figure. Expertise effects are implicit, because expertise would influence all categories of explanation use.

[Figure 7.1  Explanation Use and Information Processing Variables]

7.9 Two Cases of "Difficult" Recommendations

In a review of applications of verbal protocol analysis in text comprehension, Ericsson (1988) provides both theory-based predictions and empirical evidence that think-aloud instructions to subjects reading easy texts tend to yield little verbalization beyond the reading of the text itself, while more challenging texts produce much more informative think-aloud reports. To a large extent, however, the information in these reports pertains not to comprehension per se but to efforts to overcome failures and difficulties of comprehension. This conclusion is important here, as finding out how users dealt with difficulties of comprehension was the objective of this analysis. It also implies that verbal protocols associated with "difficult" recommendations may be more informative than others. In this section, the behavioral differences in dealing with two "difficult" recommendations are compared between the treatment groups, based on the relevant verbal reports. While 23 of the 25 recommendations provided by the experimental KBS were relatively easy to understand and less disputable, the other two were "difficult." This was initially noted by the author when he heard some unusual verbal protocols in response to these two recommendations. The verbal reports caught the author's attention again during the coding process. A thorough examination of these two special cases was subsequently undertaken, and informative verbal reports were found regarding the effects of hypertext and expertise.
Bereiter and Bird (1985) identified four main strategies that subjects relied on when they found text difficult to comprehend: (1) they would rephrase the text in simpler terms; (2) failure to comprehend the last segment could lead to back-tracking, i.e., re-reading the preceding text; (3) subjects would occasionally identify missing pieces of information by verbalizing questions (e.g., a text like "No one knows exactly when ..., but ..." may lead to the question, "why did ...?"); and (4) subjects would occasionally identify comprehension problems and engage in active problem solving. Levin and Addis (1979) also found that difficulties in the reading material elicited regressions and re-fixations on material viewed earlier. Some of the above strategies, as well as some additional ones, were found in the verbal reports related to the two difficult recommendations.

7.9.1 Analysis of Verbal Reports Related to C5.4

C5.4 was one of the two "difficult" recommendations: its first part was not obvious, and its second part could not be derived from the materials given to the users:

Recommendation (C5.4): The increase in manufacturing payroll suggests increased self-production at the company. However, the low level of investment in research and development does not bode well for competing in the rapidly changing high technology product industry.

The following is how Novice No. 10 tried to comprehend C5.4. He started by reading the recommendation, "Increase in manufacture... rapidly changing," and commented, "I think, I know the why, I'm going to the how." He then went on to request and read RTE-how-5.4 for the reasoning process, "the industry is currently going through a shake-out phase as ... players. Yes. integration ..." Then, he went back to re-read the difficult part of the recommendation, "So, low level of investment in research, I should go for Why," and read RTE-why-5.4.
Again, he was not completely satisfied, and started to re-read the recommendation, "Increase in manufacturing payroll, I'm a little puzzled by that. I'll slightly agree with that, I think. Go for 3 (a relatively low agreement rating, with 4 being neutral)." He initially tried the How explanation as the primary source of justification. Since the How explanation did not convince him, he then tried Why to get whatever additional information he could. Novice No. 14 read and re-read a certain part of this recommendation at least four times(2): "The increase in manufacturing payroll suggests increased self-production. Try that again. The increase in manufacturing payroll suggests increased self-production in the company. I don't know what they mean by that. However, the low level of investment in research development does not bode well for ... competing in the rapidly changing high-tech. I agree with that. Let's go to Why for the other part ... Why? (RTE-why-5.4) it assesses ... offer more sophisticated ... OK, let's go back. The increase in manufacturing payroll suggests increased self-production, they never explain what self-production is." She then requested and read DE-how-5.4, and subsequently uttered, "What was that conclusion?" "The increase in manufacturing payroll suggests increased self-production in the company. I can't strongly agree, 'cause I don't understand the first one still. Increased self-production, I agree with the second one, though."

Footnote 2: Re-reading was sometimes silent, without a verbal trace. Thus, information on re-reading would not be complete unless some form of eye-fixation tracking had been adopted.

Three experts (Expert No. 2, 17, and 22) out of the nine who provided verbal reports re-read this particular recommendation; two experts (Expert No. 2 and 4) questioned the R&D figure. Two novices (Novice No. 10 and 14) out of 10 re-read the recommendation, and one (Novice No. 4) questioned the R&D number.
The above finding triggered a statistical test based on contingency table analysis. Table 7.5 compares the use of RTE of C5.4 by the various treatment groups.

Table 7.5  Differences in the Use of RTE-5.4 Between Various Treatment Groups

  Number of RTE-5.4 Explanations Requested
                  0               1 or more       Total           chi-square
  Novices         20  (71.4%)      8  (28.6%)     28  (100%)
  Experts         11  (44.0%)     14  (56.0%)     25  (100%)      4.09 (p = .04)
  Hypertext       18  (64.3%)     10  (35.7%)     28  (100%)
  Lineartext      13  (52.0%)     12  (48.0%)     25  (100%)      0.82 (p = .36)
  Overall         31  (58.5%)     22  (41.5%)     53  (100%)

The majority of novices, 20 out of 28, did not request any RTE of C5.4, presumably because they had no comprehension difficulty and "trusted" the KBS. By contrast, the majority of expert subjects, 14 out of 25, requested one or more RTE to deal with comprehension difficulty. The difference between the expert and novice groups was significant (p = .04). Table 7.6 depicts the relationship between the level of agreement (after viewing explanations) and the number of RTE requested.

Table 7.6  The Relationship Between Agreement Rating and the Use of RTE-5.4

  Mean Level of Agreement, by Number of RTE-5.4 Explanations Requested
                  0             1             2 or more      Overall
  Novices         2.42 (19)     1.75 (4)      3.00 (4)       2.41 (27)
  Experts         2.00 (11)     2.22 (9)      3.60 (5)       2.40 (25)
  Overall         2.27 (30)     2.08 (13)     3.33 (9)       2.40 (52)

  F-test: F(1,50) = 6.1, p = .004
  Notes: 1. The ANOVA (F-test) has the level of agreement on C5.4 as the dependent variable, and the number of RTE-5.4 requested as the independent variable at three levels (0, 1, and 2 or more). 2. Lower agreement ratings indicate higher levels of agreement (1 = strongly agree, 7 = strongly disagree). 3. Numbers in parentheses indicate the number of subjects in the cells.

ANOVA showed a significant negative relationship between the number of RTE requested and the level of agreement rating. There was no expertise effect, nor an interaction effect between expertise and the number of explanations requested.
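The F statistic in Table 7.6 comes from a one-way ANOVA of agreement ratings grouped by the number of RTE requested. Since the raw ratings are not reproduced here, the sketch below shows the computation on made-up groups; it is a sketch of the technique, not a reproduction of the study's result.

```python
# One-way ANOVA F statistic from first principles: F is the ratio of
# between-group mean square to within-group mean square.
def one_way_anova_f(groups):
    all_values = [x for g in groups for x in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2
                    for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical agreement ratings for the three levels of the independent
# variable (0, 1, and 2-or-more RTE requested) -- illustrative data only.
groups = [[1, 2, 3], [2, 3, 4], [4, 5, 6]]
print(round(one_way_anova_f(groups), 2))  # 7.0
```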
Since the level of agreement before requesting explanations was not captured, the quantitative data alone could not establish whether a low level of agreement caused explanation seeking or the other way around. With the verbal protocol data, however, it can be concluded quite unambiguously that a low level of agreement and a high level of doubt caused explanation seeking. It appeared that users who had stronger doubts about C5.4 requested more explanations, and that, among these users, experts tended to end up agreeing less strongly with the recommendation than novices. Tables 7.5 and 7.6 suggest that expert users were more likely than novices to find inconsistency in KBS output and to request RTE to deal with their doubt, and that they were more difficult to convince. This result is consistent with prior studies on expertise: experts were found to be better able to integrate the information presented, discover inconsistencies, recall the relevant information, and give an integrated account of the underlying issues (cf. Ericsson, 1988).

7.9.2 Analysis of Verbal Reports Related to C10.2

Recommendation C10.2 was built around a rarely used concept, the funds flow adequacy ratio. Thus, it was poorly matched to users' expertise, which is a typical cause of comprehension difficulty. C10.2, along with the two DE for the term funds flow adequacy ratio, is shown in the following:

Recommendation (C10.2): The funds flow adequacy of the company is low. It is not generating sufficient cash from operations to cover capital expenditures and net investments in inventories, etc. There is a need to secure additional financing for operations.

Users of the lineartext version of the KBS could only read DE-how-10.5 and DE-why-10.5 from the index screen.
(DE-how-10.5) The Funds Flow Adequacy Ratio is computed as follows:

      5-year sum of cash from operations
      -----------------------------------------------------------------------
      5-year sum of capital expenditures, inventory additions, and cash dividends

A five-year total is used in the computation to remove cyclical and other erratic influences. A ratio of less than 1 suggests that internally generated cash is inadequate to maintain dividends and current operating growth levels. A level of 1 or more is considered satisfactory.

(DE-why-10.5) The funds flow adequacy ratio is computed to determine the degree to which an enterprise generates sufficient cash from operations to cover its basic needs based on attained levels of growth and without the need for external financing. For this purpose, the relevant needs considered are capital expenditures, additions to inventory, and cash dividends. Additions to accounts receivable are omitted, as it is assumed that they can be financed by a growth in accounts payable.

Typically, these users experienced difficulties with C10.2, as indicated by re-reading, multiple rephrasings, long pauses, resorting to RTE-why-10.2 and RTE-how-10.2 for additional information, and flipping through the hard copy of the financial statements. For example, Novice No. 10 started by reading "Funds flow adequacy" (stressing the words funds flow adequacy), "Funds flow adequacy in the company is low, is not generating sufficient cash from operations ... need to secure additional financing." "Mmmm, up to that, I don't think so. But, funds flow adequacy," followed by a long pause. Without contextualized access to the DE for the term, he resorted to RTE for more clues. He read RTE-why-10.2, "Problem in its business cash cycle. So, maybe they are saying what, because of big inventories, they need to be streamlining. Maybe they have big inventories, 'cause they still, made mistakes in the past." Essentially, Novice No. 10 started to generate his own bridging inference, guessing in order to make sense of the situation.
He was still not satisfied with the information he had, and went on to try RTE-how-10.2, "is based on the evaluation, ... total cash flow from operation is insufficient to cover the total of capital investments. That's true." He then re-read C10.2, "there is a need to secure additional, need to secure financing for operations. Yes, I guess, through the inventories. It's not generating sufficient cash, or cash is very low, inventory is very high, too high. So, I'm going to slightly agree with that, 3 (a relatively low agreement rating)." Similarly, Expert No. 4, another subject using the lineartext version of the KBS, read C10.2, "the funds flow adequacy is ... low ... How did you figure that? Mmmm." He requested both RTE-why-10.2 and RTE-how-10.2, had a long pause, then flipped through the reading material on paper (both important indicators of comprehension difficulty). "Net income, cash flow from operating activities, where, where change in cash. Oh, they were positive all the time. I don't know why this low, sufficient cash from operations, the cash position has gone up each year. I don't know how, how, how, in 88 to 92, value is .56, funds adequacy ratio. Five year. OK, disagree." It was clear that the terminology was causing comprehension difficulty, and that the subject made a judgment without all the information he needed. After reading C10.2, two other novices (Novice No. 25 and 26) disagreed, and both requested RTE for help. Novice No. 25 uttered, "Ah. I don't think so. Why do you say that? So, they say, why they say what it is, it seems a kind of funny." He then requested RTE-why-10.2, and checked the reading material on paper. Novice No. 26 read, "the funds flow adequacy of the company is low, OK, this is saying that they are not generating enough cash from operations, which is wrong! They've got sufficient cash flows. Let's take a look at Why (RTE-why-10.2)."
In both cases, the RTE could not supply the important information they needed, and the comprehension difficulty remained. It was expected that fewer symptoms of comprehension difficulty would be found in the verbal data of users of the hypertext version than in that of users of the lineartext version. Indeed, a completely different scenario was found. As soon as they started reading C10.2, the first thing most users noted was the term funds flow adequacy ratio, highlighted by a hypertext marker. They then used hypertext to access the corresponding DE to help understand the recommendation. For example, Novice No. 4 read, "The funds flow adequacy in the company is low. It is not generating sufficient cash ... for operation" (stressing the words funds flow adequacy). Then, "How funds flow?" Subsequently, he requested and read DE-how-10.5, "five year sum of cash from operations... divided by... to maintain dividends and current operating level. Mmmm. OK, that is low, capital expenditure, and investment in inventories, Mmmm. That's interesting. I would have to say, to get some further explanations." He then read RTE-how-10.2, and finally said, "Yeah, I agree with that, 3." Expert No. 2 read, "The funds flow adequacy, this is going to tell me How and Why. I'll look at that." He then requested and read DE-how-10.5. "Never heard of that, never in my life. Now, I am going to ask it why it is important." He also requested and read RTE-why-10.2, "... in this regard has an important bearing on its future ability to meet debt obligations. Of course, it does, 'cause they put everything in internal funding that you complained about earlier. That seems really odd. It doesn't have, you think it's going to grow and keep the same capital structure. Then you think it would borrow." The contextualized DE provided some information for him to evaluate the recommendation, although he was critical of the reasoning in this case.
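The funds flow adequacy computation quoted in DE-how-10.5 reduces to a single ratio of five-year sums. The sketch below uses made-up annual figures; the case company's actual financials (which produced the .56 value mentioned by Expert No. 4) are not reproduced here.

```python
# Funds flow adequacy ratio as defined in DE-how-10.5:
#     5-year sum of cash from operations
#     ---------------------------------------------------------------
#     5-year sum of capital expenditures, inventory additions,
#     and cash dividends
# A value below 1 suggests internally generated cash is inadequate to
# maintain dividends and current operating growth levels.
def funds_flow_adequacy(cash_from_ops, capex, inventory_additions, dividends):
    """All arguments are 5-year sequences of annual amounts."""
    outflows = sum(capex) + sum(inventory_additions) + sum(dividends)
    return sum(cash_from_ops) / outflows

# Illustrative (made-up) figures, in thousands:
ratio = funds_flow_adequacy(
    cash_from_ops=[120, 135, 140, 150, 155],
    capex=[180, 190, 200, 210, 220],
    inventory_additions=[40, 45, 50, 55, 60],
    dividends=[20, 20, 25, 25, 30],
)
print(f"{ratio:.2f}")  # 0.51 -> below 1, i.e., inadequate
```

Note how the definition explains the subjects' confusion: annual cash from operations can be positive every year (as Expert No. 4 observed) while the five-year ratio still falls below 1, because the denominator adds capital expenditures, inventory additions, and dividends together.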
Overall, users of the lineartext version of the KBS re-read (Novices No. 8 and 10), stressed the terminology (Novice No. 8), paused (Novice No. 10, Expert No. 4), and checked the reading material on paper (Expert No. 4) to deal with the comprehension difficulty. By contrast, those who used the hypertext version usually checked the DE to get a straightforward solution. Hypertext appeared to have made a difference in highlighting what could easily have been overlooked otherwise, and encouraged users to understand the terminology and the domain knowledge contained in the DE. Some users even re-read the DE through hypertext, after having read them from the index screen initially.

                        Number of DE-10.5 Requested          Total       Chi-square
                        0           1           2 or more    Subjects
Novices                 18 (64.3%)   6 (21.4%)  4 (14.3%)    28 (100%)
Experts                 14 (56.0%)   8 (32.0%)  3 (12.0%)    25 (100%)   0.76 (p = .68)
Hypertext                8 (28.6%)  14 (50.0%)  6 (21.4%)    28 (100%)
Lineartext              24 (96.0%)   0 (0%)     1 (4.0%)     25 (100%)   25.48 (p = .00)
Overall                 32 (60.4%)  14 (26.4%)  7 (13.2%)    53 (100%)

Table 7.7 Difference in the Use of DE-10.5 Between Various Treatment Groups

As can be seen from Table 7.7, 20 out of 28 users in the hypertext group requested at least one DE on the term funds flow adequacy ratio. By contrast, in the lineartext group, only one out of 25 users sought DE on the term. This comparison is much more significant than the aggregate analyses of DE use (all the subanalyses) reported in Chapter 6. It implies that contextualized access to deep knowledge via hypertext would be much more extensively used when users are working in an unfamiliar domain.

                Number of DE-10.5 Requested
                0           1           2 or more    Overall      F-test
Hypertext       2.88 (8)    3.14 (14)   2.17 (6)     2.85 (28)
Lineartext      3.57 (23)   --   (0)    2.00 (1)     3.50 (24)
Overall         3.39 (31)   3.14 (14)   2.14 (7)     3.15 (52)    F(1,50) = 1.5, p = .24

Notes: 1. The ANOVA (F-test) has the level of agreement on C10.2 as the dependent variable, and the number of DE-10.5 requested as the independent variable at three levels: 0, 1, and 2 or more. 2. Numbers in parentheses indicate the number of subjects in the cells.

Table 7.8 The Relationship Between Agreement Rating and the Use of DE-10.5

It is interesting to note that, while on average novices used many more DE than experts (see Chapter 6), this was not true in this particular case. The 25 experts as a group requested 17 DE on the funds flow adequacy ratio, more than the 15 DE requested by the 28 novices. The conclusion is that experts generally requested fewer DE, but their use of DE was more focused and selective. Table 7.8 illustrates the effect of the use of DE on the agreement rating, which was in the expected direction but not statistically significant according to the ANOVA.

7.10 Discussions

MANOVA was performed to test the effects of verbalizing on explanation use, task time, and decision accuracy. As shown in Table 7.9, thinking aloud had no significant effect on decision accuracy, RTE use, or the use of hypertext to request DE, but had some effects on other aspects of performance. The slight increase in the time of using the KBS (p = .07) was consistent with Ericsson and Simon's (1993) prediction of the effects of level 2 thinking-aloud. However, the thinking-aloud effect on DE use was a surprise, apparently caused by the significant increase in the number of DE requested from index screens. The explanation use data were double-checked, and the increase in DE and Index use was found in three of the four treatment groups (all except the expert-lineartext group). If the increase was induced by the instructions to think aloud, one possible explanation is that these subjects knew their use of the system was being monitored, and thus may have felt more obligated to use the index screens (as an essential feature of the KBS) than they would have otherwise.
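Returning to the contingency-table comparisons in Table 7.7, the chi-square statistics reported there can be recomputed directly from the cell counts alone. A minimal Python sketch (the helper function is written here for illustration; it is not part of the original analysis):

```python
def chi_square(table):
    """Pearson chi-square statistic for an r x c contingency table,
    given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column factors
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Subjects requesting 0, 1, and 2-or-more DE-10.5 (counts from Table 7.7)
hypertext_vs_lineartext = [[8, 14, 6],    # hypertext group
                           [24, 0, 1]]    # lineartext group
novices_vs_experts = [[18, 6, 4],         # novices
                      [14, 8, 3]]         # experts

print(round(chi_square(hypertext_vs_lineartext), 2))  # 25.48
print(round(chi_square(novices_vs_experts), 2))       # 0.76
```

The recomputed values match the reported statistics: 25.48 for hypertext versus lineartext, and 0.76 for novices versus experts.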
                         Time       DE     RTE    Improvement    Index    No. of
                         (Minutes)                in Accuracy             Subjects
Think-Aloud Group          37.3     12.6   16.2      0.05         8.6       19
Non-Think-Aloud Group      32.4      7.7   16.9      1.41         4.2       34
Overall                    34.2      9.5   16.7      0.93         5.8       53
F-value, F(1,49)           3.25     4.37   0.04      1.24         7.64
p-value                     .08      .04    .85       .27          .01
Multivariate F(5,45) = 2.26, p = .06

                         Hypertext-    Hypertext-         No. of
                         Abstract      Contextualized     Subjects
Think-Aloud Group          2.3            5.2               10
Non-Think-Aloud Group      1.9            4.9               18
Overall                    2.0            5.0               28
F-value, F(1,25)           0.18           0.04
p-value                     .68            .85
Multivariate F(2,24) = 0.10, p = .90

Notes: 1. For the upper panel, both explanation provision method and domain expertise were used as covariates. 2. For the lower panel, only the hypertext groups were included; domain expertise was used as a covariate.

Table 7.9 Effects of Thinking-Aloud on Performance

Major findings from the verbal protocol analysis are summarized as follows. There were slight differences in the pattern of explanation use between novices and experts. Novices used a higher proportion of DE to learn new knowledge or to acquire additional knowledge about domain concepts, while experts in general were more likely to request explanations for refreshing memory, confirming, or comparing. The underlying causes of this difference could be the disparity in the levels of domain knowledge, experience, and interest between novices and experts. However, in special cases where unfamiliar domain concepts were perceived as essential for understanding KBS output, experts and novices were equally likely to request DE. Novices used a higher percentage of RTE to understand the basic significance and reasoning process. In contrast, experts used a higher proportion of RTE for confirming, comparing, and seeking additional details. Experts were also more likely to identify inconsistencies in the KBS output, and to resort to RTE to deal with them. Hypertext had no effect on the overall patterns of use of DE and RTE. However, the contextualized access to domain knowledge enabled by hypertext was important for user-KBS interaction.
Hypertext-based explanations would be critical for understanding KBS output if the domain were difficult (involving unfamiliar concepts).

Use of Deep Explanations
  1. Learning and Understanding
     a. Learning a new concept
     b. Seeking additional information
  2. Refreshing and Confirming
     a. Refreshing memory
     b. Confirming/comparing

Use of Reasoning-Trace Explanations
  1. Learning and Understanding
     a. Understanding significance of recommendations
     b. Understanding process of reasoning
  2. Confirming and Comparing
     a. Confirming/comparing
     b. Seeking missing details
  3. Surprise and Disagreement
     a. Surprise (expecting something different)
     b. Disagreement (expecting something opposite)

Table 7.10 Summary of Rationale for Explanation Use

In conclusion, the major contributions of this verbal protocol analysis include the following four aspects. First, empirical evidence on the nature of explanation use was collected and categorized, for the first time in empirical studies of explanation use. The verbal data gave meaning to the results of the other process and outcome measures; without the verbal protocol analysis, it would have been difficult to link the quantitative process and outcome measures of explanation use and its effects (based on computer logs and questionnaires). Second, major categories of explanation use were identified, which give KBS designers some sense of the extent to which explanation facilities are utilized, and for what reasons (Table 7.10). Third, the verbal data provided a crucial linkage between the observable effects of the experimental treatments and the underlying theoretical constructs - the intervening variables of information processing. Thus, empirical results could be interpreted and understood within a theoretical framework based on cognitive theories of discourse comprehension.
And lastly, the verbal protocol analysis triggered additional statistical analyses of some special circumstances of explanation use, and produced informative and significant results that could easily have been neglected otherwise.

CHAPTER 8. CONCLUSIONS AND DISCUSSION

This chapter concludes the dissertation by discussing the major research contributions, implications of the research findings, limitations, and some directions for future research.

8.1 Research Contributions and Implications

This research investigated the use of hypertext to provide KBS explanations, in order to increase their usefulness and usability, in response to the requirements and problems of existing systems. The ultimate objective was to determine the behavioral and cognitive basis of the use of hypertext to provide KBS explanations. The major contributions made by this research include empirical evidence, a theoretical framework, and novel applications of appropriate research methods to the study of KBS explanation use.

8.1.1 Major Research Findings

The research questions investigated were: what are the effects of hypertext and domain expertise on the use of KBS explanations, and on the process and outcome of decision making? A number of interesting results were found with respect to these questions, demonstrating significant advantages of hypertext over conventional lineartext for representing and accessing deep knowledge in KBS. Since the underlying causes of the findings have been discussed in previous chapters, this section highlights the major findings and briefly discusses their practical implications.

Hypertext and the Number of DE Requests. Enhanced accessibility to DE via the use of hypertext significantly increased the number of DE requested. The average number of DE requested by subjects in the hypertext group was three times that of the lineartext group (13.9 versus 4.6); this was true for both novices and experts.
This result seems to imply that users' intention to access and use domain knowledge in decision making depends on the extent to which access to domain knowledge is provided in a contextualized and instantaneous manner. The use of hypertext also appears to have reduced the motivational cost of learning, in terms of the production paradox (Carroll & Rosson, 1987). If a KBS is designed as a decision aid for a domain in which access to domain knowledge is critical, hypertext would be a good choice for representing and accessing that knowledge.

Hypertext and the Context of DE Requests. This study asserts that one of the major advantages of using hypertext to provide DE is to facilitate contextualized access to domain knowledge. Data analysis shows that in the hypertext group, about 37% of the DE were requested in the context of judgment making, rather than in the abstract. This effect was stronger for experts, for whom contextualized access to DE accounted for 43% of total DE requested. The effect of hypertext on shifting the context of accessing DE is highly desirable, according to theories of contextualized learning (Fischer, et al., 1990) and meaningful learning (Ausubel, 1968).

Hypertext and Decision Accuracy. While the role of hypertext in increasing access to domain knowledge in the context of judgment making is in itself practically important, its effect on decision accuracy is equally so. The reason is that the use of hypertext in providing KBS explanations can only be justified if there are significant benefits. This study found that the use of DE is positively related to improvement in decision accuracy. As the verbal protocol analysis shows, DE provided with hypertext can influence the understandability of KBS recommendations, especially when unfamiliar domain concepts are involved.
In some cases, the lack of knowledge of, and means to access DE on, a particular domain concept can make it difficult or impossible to understand a KBS recommendation.

Hypertext and Preference for DE Types. User preference for DE types was influenced by the use of hypertext to represent and access deep knowledge. While only about 28% of the DE requested by users in the lineartext group were of the How type, 42% fell into this category for the hypertext group. This is explained by the fact that 48.7% of all DE requests via hypertext were of the How type (Table 6.15), and 49.3% of the DE requested via hypertext in the context of judgment making were of the How type. The How type of DE provides definitional information for domain concepts, which is fundamental for understanding KBS output. In the context of problem solving, the most needed explanations are the definitional ones. Justification for domain concepts (Why) seems to be secondary to How, probably because users tend to be satisfied with basic explanations. This study replicated Dhaliwal's (1993) findings on the preference for DE types by novices and experts. It was found that, for novices, the order of preference for explanation types was Why, How, and Strategic; whereas for experts, the ranking was How, Why, and Strategic. Traditionally, most KBS provided only Why explanations to justify input to the reasoning process, like those implemented in the experimental KBS. This study joins the Dhaliwal study in calling for the provision of the How type of DE, which is almost as important and as heavily used as the Why type. This is especially the case for expert users, who are more likely to use How than Why explanations, although overall they use fewer DE than novices.

Preference for RTE Types. In terms of overall user preference for RTE types, the ranking was How, Why, and Strategic (47%, 34%, and 20% of total RTE used).
The implication is that when a user is concerned with the results of reasoning by a KBS, the user is more likely to check the trace of the reasoning than the justification of the conclusion in terms of its significance and relevance. This is particularly the case for expert users, for whom the How type accounts for about 55% and the Why type for 30% of total RTE requests, whereas the corresponding percentages for novices are 40% and 36%, respectively. Experts have a stronger demand for the trace (i.e., How, as indicated in Tables 6.19 and 6.21), but novices request the How and Why types almost equally. Most existing KBS provide some form of reasoning-trace-based How explanations, but usually no justifications for the conclusions (Why). This study is consistent with previous ones in recognizing and emphasizing the importance of the justification of KBS conclusions (referred to as reasoning-trace Why in this study), which accounted for 33.5% of total RTE use. Both experts and novices would use Why explanations, if provided. If the users are primarily novices, the need for a good justification type of explanation is even stronger.

RTE and User Perceptions. Although the use of RTE had no significant effect on decision accuracy, it was more important than DE in influencing user trust in the KBS and perceived usefulness of explanations (according to the PLS modelling). This result is interesting. DE are important and useful for judgment making, probably because they help users understand the meaning of KBS output. In contrast, RTE, being more specifically related to the problem under consideration, are concerned with the application of domain knowledge. RTE may therefore be more relevant for users in assessing the quality and behaviour of the KBS, i.e., what is going on inside the KBS. Thus, in situations where a KBS is used as a decision aid, DE are important for accurate judgment, whereas RTE are more important for users in developing a sense of trust in the KBS.
Availability Versus Accessibility of Domain Knowledge. This study reaffirms that explanations are used in situations where a KBS serves as a decision aid for making complex judgments, as was also found in the Dhaliwal study (1993). Dhaliwal found that the availability of deep knowledge (DE provision strategies) substantially influenced the use of explanations. This research found that accessibility to deep knowledge, enhanced by the use of hypertext, substantially increased the number of DE requested. In other words, how the deep knowledge is provided also makes a difference. There is some inconsistency in previous research on the effects of expertise on explanation use. While some earlier studies concluded that expertise had some impact on explanation use (e.g., Lamberti & Wallace, 1990; Ye, 1990), Dhaliwal (1990) claimed the contrary, based on the observation that experts used just as many explanations as novices. While the inconsistency cannot be completely resolved by one study, in terms of the number of DE requested this study produced more evidence for the existence of an expertise effect, as novices used nearly twice as many DE as experts (12.3 versus 6.3). However, in terms of the number of RTE requested, this study is consistent with the Dhaliwal one, finding no difference between novices and experts. Nonetheless, novices and experts had different preferences for RTE types. The use of RTE by experts was concentrated in the How type, whereas novices used the How and Why types equally. Therefore, if the designated user group of a KBS includes novice users, the Why type of RTE (justification of KBS conclusions) would be more important than otherwise.
In summary, the use of hypertext for providing deep explanations has strong effects on the number of DE used, improvement in decision-making accuracy, preference for explanation types, and, to a large extent, the context of DE requests (i.e., shifting DE requests from the abstract to the context of judgment making). These results illustrate the benefits of using hypertext to provide KBS explanations, and offer incentives and clear expectations for KBS designers. Most previous empirical evaluations of explanation use focused on relatively narrow interests (e.g., the match between various user cognitive styles and explanation types, or between task types and explanation types), which were unlikely to lead to operational guidelines (cf. Huber, 1983) for KBS design. In contrast, this study investigated the effects of contextualized accessibility to deep knowledge resulting from the use of hypertext, which is a more generic construct. Therefore, the results of this study have broader implications for the design and application of KBS.

8.1.2 Theoretical Contributions

A theoretical framework was built for the provision and evaluation of hypertext-based explanations, by synthesizing cognitive theories of discourse comprehension and other related theories. Rather than as a novel technology, hypertext is positioned as an operationalization of contextualized access to problem-solving knowledge. The effects of the use of hypertext can thus be analyzed and explained in terms of intervening variables of information processing originating in cognitive theories of discourse comprehension. The theoretical framework should be useful for the continuing effort to develop a stronger theoretical basis for the provision of KBS explanations. It should also help system designers conceptualize the fundamental features of hypertext in order to design better explanation facilities.
It is within this theoretical framework that the effects of hypertext and domain expertise were analyzed and hypothesized. Intervening variables of information processing were linked to the verbal protocol data, to help explain the effects of the independent variables on the dependent variables. Therefore, the fundamental features of the independent variables, along with their effects, could be better understood than otherwise.

8.1.3 Research Methods

One of the major limitations of previous research on explanation use is the failure to capture information about the actual use of explanations. This limitation is dealt with in this study through the verbal protocol analysis. With verbal protocol data, general patterns of explanation use were identified and categorized. Explanation use was linked to the intervening variables of information processing, in terms of memory reinstatement, bridging inference, and the macrostructure of the task domain. The verbal data also triggered a detailed analysis of some particularly illustrative cases of explanation use. This detailed analysis provides evidence that contextualized access to DE can be critical for understanding KBS output. The results also imply that the use of hypertext for contextualized access to deep knowledge would be more effective and useful in domains where specialized knowledge of domain concepts is important. If there is a large disparity between the domain expertise of users and that of the KBS, there is a greater need for contextualized access to DE, and hypertext-based DE are more effective. In terms of the data collection method and procedures, the following observations and lessons may be useful for future studies using verbal protocol analysis. First, having users provide agreement ratings was effective to a certain degree for getting them engaged in the task, and for increasing the amount of verbal reports.
However, the usefulness and relevance of the protocols were not related to the amount of verbal reports. In fact, most of the verbal data were not relevant to the research questions of this study. A large proportion of the content of the verbal reports involved reading KBS output, probably because most of the KBS output was reasonably easy for the subjects to understand. There were also huge variations in the content of verbal reports across subjects. Some were informative of the cognitive process, while others were not. The amount and usefulness of verbal data appeared to be related to the personality and background of the subjects, in addition to training. For example, two expert subjects who had extensive experience in lecturing and public speaking were able to provide more informative verbal data. On the other hand, some subjects were nervous and unable to provide informative data. Training and repeated reminders were not effective for several of these subjects; additional training would have prolonged the experiment, and might also have made the whole experimental process less realistic. Some kind of structured probing or level 3 concurrent verbal reporting could have been used to elicit more specific information on the use of explanations, but there was always the danger of interfering with the process. Second, unlike methods that trace eye fixation, verbal protocols could not completely capture re-reading, which might have been a more useful variable for understanding the cognitive processes. Most previous research on explanation use involved unrealistic requirements and conditions. This research builds on progress made by Dhaliwal (1993) in striving for realism in the experimental KBS and task. The use of a simulated KBS allows the researcher to develop an experimental system that appears realistic to users, at minimum cost. The experimental task used in this study was considerably complex and "realistic."
Effort was made to ensure that the data collection procedure was also realistic. The restrictive requirement that an agreement rating for a recommendation be provided before requesting RTE was removed, so users could request explanations and interact with the KBS more naturally.

8.2 Limitations of This Study

A laboratory experiment was chosen as the research method for this study, with an emphasis on strong internal validity at the cost of weaker external validity (Cook & Campbell, 1979). Despite considerable efforts to minimize this limitation, external validity suffers from the artificiality of the laboratory setting and data collection procedures; the findings are thus less likely to be generalizable across time and settings. While the research setting was artificial, the subjects are representative of potential real-world KBS users. Since all novice subjects were undergraduate students specializing in accounting or MBA students with accounting backgrounds, and none had any substantial work experience related to financial analysis, the novice subjects should possess characteristics similar to those of entry-level financial analysts. The expert subjects were professional Certified Financial Analysts or Chartered General Accountants whose work was related to financial analysis, and they are representative of professionals in the financial analysis industry. Therefore, it may still be appropriate to generalize the findings to entry-level employees and experienced professionals in that industry. The use of different explanation types and implementations in other studies makes it difficult to generalize and compare results. As with other studies, the explanations implemented in the experimental KBS have their idiosyncrasies, which may affect external validity. This study adopted a system developed for the Dhaliwal study, so some results can be compared between the two studies.
Furthermore, the use of a simulated KBS in this study allowed the reasoning-trace explanations to be manually implemented using natural English sentences. Thus, some common deficiencies of explanations provided by existing KBS could be avoided to a certain degree. By doing so, the comparison between lineartext and hypertext could be made free of potential confounding factors related to deficiencies of existing explanation capabilities; but the generalizability of the results to existing KBS could also be limited. The design philosophy of the FINALYZER KBS can be characterized as based on theoretical foundations, rather than closely following the features of existing KBS. For example, FINALYZER includes explanation types that are common in existing KBS, such as the How type of RTE and the Why type of DE, but the Strategic and How types of DE have no direct correspondence in most existing KBS. The exact patterns of explanation use identified in this study might be influenced by the unique features of the experimental system (e.g., the implementation of DE versus RTE). However, there is no reason to expect that the categories of the nature of explanation use identified in this study are not generic. All of the dependent variables (both process and outcome) were measured based on the initial use of the KBS. It is not clear whether the treatment effects can be generalized to repeated use of KBS. For example, the effects of the availability of deep explanations (explanation provision strategy) and of the accessibility enabled by hypertext may be smaller for frequent users than for infrequent ones. Operational constraints limited the development of a more complete and functional KBS. The use of a simulated KBS necessitated a "one-shot, single task" approach as opposed to a "multi-trial, multi-task" longitudinal one.
The impact of the independent variables may evolve over time, although prior research has found that user perceptions based on initial use can reliably predict future use of a system. The potential novelty effect is another limitation of this study, related to the previous one. Generally speaking, KBS applications in practical financial analysis have not yet been extensively adopted in the industry, and most subjects were exposed to one for the first time. The use of hypertext for knowledge representation and access is also relatively new. Therefore, despite the training provided to minimize novelty effects, the behaviour captured can hardly be viewed as the "natural" behaviour that evolves over long periods of exposure to a technology. This is a general weakness of all behavioral investigations of any new technology or method, and needs to be kept in mind in interpreting the results. Improvement in decision accuracy was measured by having subjects work on the same task and fill out the same questionnaire twice, before and after using the KBS. There could therefore have been (1) a learning effect, in that subjects might improve their decision accuracy simply because they worked on the same problem twice; and (2) an "anchoring effect," in that subjects' previous decisions might influence the decisions made the second time. The effects of the independent variables (explanation provision method and domain expertise) were assessed based on between-treatment differences, i.e., improvement in decision accuracy was a relative measure. The experiment was not designed to detect any interaction effects between such learning and anchoring and the experimental treatments, for two reasons. First, there was no strong theoretical base or empirical evidence to suggest the existence of such interaction effects.
Second, the relationships among the independent variables, explanation use, and improvement in decision accuracy were explicitly tested using structural equation modelling (PLS). Thus, it is still appropriate to conclude that hypertext increased the number of DE requested, and that the use of DE was positively related to improvement in decision accuracy. While instructions to think aloud had no substantial effects on overall performance, task time increased slightly, and thinking aloud appeared to have slightly but significantly increased the use of DE, from index screens in particular. Since only about a third of the subjects went through the think-aloud procedure, and they were equally distributed across all treatment groups, a comparison can be made with the other subjects to identify possible thinking-aloud effects. Any potential effects of thinking aloud can thus be controlled and taken into account in the interpretation of the results.

8.3 Directions for Future Research

This study investigated the use of hypertext to improve the accessibility of deep knowledge, not the use of hypertext to enhance RTE directly. The focus was on the lack of deep knowledge in KBS explanations, and on natural and contextualized access to deep knowledge. RTE in FINALYZER had only one level of reasoning trace, as opposed to the multiple levels usually provided by existing KBS. It would be interesting to explore the effects of using hypertext to represent and access each step of reasoning and evidence. There is a need to develop a unifying framework for the provision of KBS explanations and the study of explanation use, by integrating theories and empirical findings in the areas of cognitive psychology, artificial intelligence, user-interface design, and explanation use.
As interest in KBS explanations continues to grow, such a unifying framework is critical for guiding the implementation of KBS explanations and the design of research on explanation use, and for facilitating the accumulation of knowledge in the field. In terms of research methods, this study used primarily quantitative measures, supplemented with qualitative data from the verbal protocols of a small proportion of subjects. In particular, level 2 thinking-aloud was targeted, with the objective of balancing the need for informative verbal protocols on explanation use against the need to minimize intrusion into "natural" explanation use. Other methods of verbalization may suit different research questions. For example, level 3 thinking-aloud may be targeted to acquire more informative data in a study focusing primarily on verbal protocols. Furthermore, since difficult KBS recommendations tend to generate more informative verbal protocols, the level of difficulty of KBS output may be manipulated in studies of explanation use. In particular, certain KBS recommendations could be deliberately made "difficult" in order to observe users' responses and use of explanations under such circumstances. With the advance of computer technologies, more KBS are being used in organizations. It will become possible to use real KBS, rather than simulated ones. Real KBS can be modified for field studies or longitudinal studies, to increase the external validity of research findings. Explanation is a key aspect of the user interface of KBS. While the lack of knowledge in KBS explanations has long been recognized, prior research has been unable to offer practical solutions. There has been more speculation than serious scientific investigation of the potential of hypertext to enhance KBS explanations.
This study has made some initial progress in investigating hypertext as an alternative method for representing and accessing deep knowledge in KBS, and has found some promising results. More work needs to be done to improve the usefulness and usability of KBS explanations.
APPENDIX A. EXPERIMENTAL MATERIALS

(The attached experimental materials were used by the Hypertext group. They differed from those of the Lineartext group only in the two illustrative flow charts of the tutorial KBS and the experimental KBS. Hyper-FINALYZER was also referred to as FINALYZER in this document.)

CONTENTS:
- General Instructions
- A Short Note on Expert Systems
- Tutorial on the CREDIT-ADVISOR Expert System
- The System Flow Chart of CREDIT-ADVISOR
- Canacom Corporation Loan Analysis Case
- Financial Statements and Tables of Financial Ratios
- Judgment Recording Sheets (Set 1)
- Using the FINALYZER Expert System
- The System Flow Chart of FINALYZER
- Judgment Recording Sheets (Set 2)
- Post-Study Questionnaire
- Debriefing Protocol

Date: Reference Code:

FINANCIAL ANALYSIS EXPERT SYSTEMS STUDY

GENERAL INSTRUCTIONS

The objective of this study is to evaluate: 1) the use of a financial analysis expert system to complete a loan analysis case, and 2) the judgments made with the assistance of such a system. We would like you to think carefully about the financial analysis case, the system's responses, and your judgments regarding the case as part of your participation. The accuracy of your judgments will be evaluated using computer and statistical analyses.
Individuals whose scores place in the top 20 percent of all subjects participating under similar conditions will be awarded a prize of $50 each. The prizes will be awarded strictly on the basis of the quality of your judgments and not the amount of time you spend.

The study will proceed as follows. You will start with two tutorials: one relating to the use of the "mouse" input device, and another relating to the CREDIT-ADVISOR expert system, which has an interface similar to that of the financial analysis expert system you will use later. Next, you will be asked to analyze a financial analysis case concerning Canacom Corporation and to make a set of judgments relating to it. You will then use the FINALYZER financial analysis expert system to complete your analysis and to make the same set of judgments again. Please take as much time as you need to perform a thorough and complete analysis. Finally, you will complete a short questionnaire to provide your opinions about the system you have used.

Note that you are requested to conduct your analysis under the assumption of current economic and interest rate conditions. Please do not bring to bear on your analysis and judgments any fixed loan-approval or other policies that may currently be used in the particular organization where you work. You should limit your analysis to the information provided to you. Should you have any questions regarding the case, the system, the judgments, or these instructions, please do not hesitate to ask the research assistant for a clarification immediately.

A SHORT NOTE ON EXPERT SYSTEMS

Expert systems are computer systems that make the specialized knowledge and expertise of a particular domain available to decision makers. They are generally developed by closely modelling the expertise and mode of operation used by human experts.
Similar to human experts interacting with their clients, expert systems attempt to provide users with relevant information about the inputs they use in their analysis, as well as the conclusions that they reach. For example, human medical experts commonly inform patients of the inputs they are using for their diagnosis (e.g., why they are requiring a blood test or a blood pressure reading) as well as their diagnostic conclusions. They provide, usually at the patients' request, explanations relating to the inputs used for the diagnosis or the conclusions reached. Similarly, expert systems go beyond the capability of conventional computer systems to provide users with various types of explanations. More specifically, the expert system you will be using will provide you with the following information: 1) the ratios and other inputs that it will use for each analysis, 2) the specific conclusions arising from each analysis that it has performed, and 3) explanations relating to both (1) and (2). For each input used or conclusion presented, three different types of explanations will be provided. These are the WHY, HOW, and STRATEGIC explanations:

WHY explanations either justify why a particular input or ratio is needed for an analysis, or rationalize why a particular conclusion that has been reached is important for the task.

HOW explanations either detail how a particular input or ratio is defined and computed, or reveal how a particular conclusion has been reached by presenting a trace of the evaluations.

STRATEGIC explanations either provide the overall structure in which all the relevant input information is organized, or the overall problem-solving strategies in which a particular conclusion fits.

CREDIT-ADVISOR SYSTEM TUTORIAL

The CREDIT-ADVISOR tutorial system has features that are identical to the financial analysis expert system (named FINALYZER) which you will use later.
The objective of using this tutorial system is to familiarize you with the types and sequences of the screens that you will encounter in the use of FINALYZER, and with the way you interact with FINALYZER. For now, you should focus on the system features of CREDIT-ADVISOR, rather than its contents. It is important to try ALL the features of the tutorial system, to know what to expect when clicking on any button, and to become comfortable with using the system. Note that your use of CREDIT-ADVISOR is NOT part of the evaluation of your performance.

The three types of screens that you will see are DATA SCREENS, INFORMATION SCREENS, and RECOMMENDATION SCREENS. On ALL three types of screens, there are WHY, HOW, and STRATEGIC explanations for each data item, analysis, or conclusion, respectively (see page 3 for definitions of these explanations). The system structure is illustrated on the next page. By the end of this tutorial, you should know exactly what to expect on the DATA SCREENS, INFORMATION SCREENS, and RECOMMENDATION SCREENS, and know what will happen when you click on the NEXT-SCREEN, PREVIOUS-SCREEN, BACK-TO-ANALYSIS, WHY, HOW, and STRATEGIC buttons.

Assume that you are a credit officer at a local bank responsible for the evaluation of credit-card applications. The bank has received the application of Mr. Robert Mortenstein, and the relevant data from his application have been input into a computerised data file. You now wish to use the CREDIT-ADVISOR expert system to evaluate the merit of the application. If you have any doubt regarding any system features of CREDIT-ADVISOR, please either ask the research assistant or try it out by yourself NOW!
[Figure: The System Flow Chart of CREDIT-ADVISOR. An Introduction Screen leads to a Data Screen, an Information Screen, and an Exit Screen, together with Recommendation Screens for the Repayment, Collateral/Risk, Credit Status, Credit Limit, and Default analyses; each screen carries Why, How, and Strategic explanation buttons.]

CANACOM CORPORATION LOAN ANALYSIS CASE

Assume that you are a corporate loan evaluation officer working for a large financial institution in Western Canada. Your supervisor has asked you to evaluate the attached financial statements of Vancouver-based Canacom Corporation. These statements are the critical components of a more complete application for senior borrowings of $800 million that has been filed by the company for the purposes of streamlining operations. Your assessment of the company will form the basis of a more comprehensive loan evaluation to be undertaken by the Corporate Loans Committee, of which your supervisor is a member. Note that you have been told that the total repayment period will not exceed three years and that there is the possibility of the loan being convertible into stock at the end of that period.

Canacom Corporation, through its chain of Computron Corner outlets, is one of North America's leading distributors of technology to individual consumers. Close to 75 percent of its business is in the United States, with 15 percent being in Canada, and the balance is mostly in Europe.
Through more than 5700 company-owned retail stores and 3000 dealer/franchise outlets, Canacom distributes a broad product line that includes microcomputers and related software; televisions, radios, audio equipment, tape recorders, and related accessories; toys, antennas, security devices, timers and calculators; and electronic parts, batteries, and test equipment, among other products. While microcomputers, software, and peripheral equipment were a negligible part of the company's product line five years ago, representing only 2.4 percent of sales, they represented the largest component, 34.6 percent, of total sales in 1992.

The financial statements of Canacom for 1988-92 and additional relevant qualitative information are presented below. The auditor's opinions on the financial statements have been unqualified for the past five years. To help you with your analysis, common-size financial statements and a complete set of the relevant financial ratios have been prepared for you using a computerised financial analysis package. Comparative information on Hightech Computer Corporation, a Pittsburgh-based major competitor of Canacom, is provided. Additionally, the industry composites of the electronic computing equipment manufacturing segment and the radios, televisions, and record players retail segment have been obtained for your use. Your supervisor has also suggested that you utilize the recently purchased FINALYZER financial analysis expert system to help you with the task. You have been specifically instructed to focus on all aspects of Canacom's valuation, liquidity, long-term solvency, asset utilization, and profitability.
As part of your report, you will have to provide specific judgments for: 1) the exact amount of the $800 million loan requested that you recommend as being allowable, assuming that no collateral or guarantees are provided; 2) a specific estimate of the expected total net earnings of the company in the coming year; 3) ratings of the quality of Canacom's financial management, operating management, liquidity position, long-term solvency position, and asset utilization performance; and 4) a rating of the value of Canacom stock as loan collateral. You will also have to provide your subjective probabilities of the correctness of these four judgments. Please review the attached Judgment Recording Sheets now to understand the exact format in which these judgments are to be recorded.

== Financial Statements and Ratio Tables Attached ==

TABLE 1
CANACOM CORPORATION
Balance Sheets (In Thousands)
June 30, 1988-1992

ASSETS                                  1988       1989       1990       1991       1992
Cash & Equivalents                    37,621     56,365    141,944    167,547    279,743
Accounts & Notes Receivable (Net)     15,841     25,725     42,088     83,616    107,530
Inventories                          381,649    435,160    513,709    670,568    844,097
Other Assets                          12,590     13,809     11,416     27,000     31,928
Total Current Assets                 447,701    531,059    709,157    948,731    983,555
Net Property and Equipment           156,670    165,140    190,429    224,995    257,620
Other Assets                           5,218     14,099     36,909     53,918     60,990
TOTAL ASSETS                         609,589    710,298    936,545  1,227,644  1,581,908

LIABILITIES AND CAPITAL
Notes Payable                         37,189     25,918     34,862     24,942     55,737
Accounts Payable                      34,390     58,926     54,560     63,641     64,640
Accrued Expenses                      52,343     59,170     67,206     92,125    115,054
Income Tax Payable                    13,931     24,703     47,152     52,160     50,668
Total Current Liabilities            137,853    168,717    203,780    232,868    286,099
Long-Term Notes Payable                8,688      6,523      3,903     20,642     15,482
Debentures (Net)                     222,045    222,175    122,428    122,666    122,938
Store Managers Deposits               16,718     14,045     11,972      9,306      8,490
Deferred Income Taxes                 10,978      8,902     12,069     18,886     17,682
Other Long-Term Liabilities            4,954      6,811     10,530     10,599     10,345
Total Liabilities                    401,146    427,173    364,682    414,967    461,036
Shareholders' Equity                 208,353    283,125    571,863    812,677  1,120,872
TOTAL LIABILITIES AND CAPITAL        609,589    710,298    936,545  1,227,644  1,581,908

TABLE 2
CANACOM CORPORATION
Income Statements (In Thousands)
June 30, 1988-1992

                                        1988       1989       1990       1991       1992
Net Sales                          1,215,483  1,384,637  1,691,373  2,032,555  2,475,188
Other Revenue                         11,403     11,360     15,697     28,657     38,109
Total Revenue                      1,226,886  1,395,997  1,707,070  2,061,212  2,513,297
Cost of Goods Sold*                  535,549    594,841    701,777    826,842  1,008,187
Gross Income                         691,337    801,156  1,005,293  1,234,370  1,505,110
Selling & Administrative
  Expenses**                         484,249    546,325    645,934    780,378    930,244
Depreciation & Amortization           17,121     19,110     23,288     29,437     38,679
Operating Income                     189,967    235,721    336,071    424,555    536,187
Interest Expenses***                  28,466     25,063     15,454      1,168      8,905
Net Income Before Tax                161,501    210,658    320,617    423,387    527,282
Provision for Taxes                   78,272     98,423    151,015    199,302    248,761
NET INCOME AFTER TAX                  83,229    112,235    169,602    224,085    278,521
Common Shares Outstanding            106,004    103,644    102,578    103,395    104,335
Net Income per Share                   $0.79      $1.08      $1.65      $2.17      $2.67
Stock Price                            $5.34     $10.38     $30.00     $27.50     $50.00

* Includes manufacturing
  payroll of:                         28,344     32,958     42,128     53,105     71,892
** Includes:
  Nonmanufacturing payroll           206,507    232,569    286,494    339,559    395,135
  Advertising expense                114,238    124,138    137,722    160,905    199,128
  Rental Expense                      54,606     61,491     73,857     89,732    106,970
  Foreign currency translation         3,230      1,722     -5,295      3,216        590
*** Net of interest income of:         1,234      2,334      7,179     20,946     15,139

TABLE 3
CANACOM CORPORATION
Statement of Changes in Financial Position (In Thousands)
Years Ending June 30, 1988-1992

                                        1988       1989       1990       1991       1992
Cash Flows from Operations
Net Income                           $83,229   $112,235   $169,602   $224,085   $278,521
Add (deduct) items not affecting cash
Depreciation Expense                 $17,121    $19,110    $23,228    $29,437    $38,679
Increase in Accounts Receivable      ($5,206)   ($9,884)  ($16,363)  ($41,528)  ($23,914)
Increase in Inventories             ($48,584)  ($53,511)  ($78,549) ($156,859) ($173,529)
Change in Accounts Payable          ($15,543)   $24,626    ($4,366)    $9,081       $999
Other                                 $3,772      ($536)    $7,637     $7,318     $3,272
Total of items not affecting cash   ($48,440)  ($20,195)  ($68,413) ($152,551) ($154,493)
Net Cash Flow from Operating
  Activities                         $34,789    $92,040   $101,189    $71,534   $124,028
Cash Flows used by Investing Activities
Net Purchases of Land and
  Equipment                         ($26,579)  ($31,063)  ($48,494)  ($67,678)  ($72,675)
Cash Flows from Financing Activities
Net Reductions in Long-Term Debt   ($106,669)   ($2,165)   ($4,022)   $16,739    ($5,223)
Purchase of Treasury Stock          ($27,396)  ($53,342)        $0         $0         $0
Sale of Treasury Stock to
  Employees                          $12,954    $15,833    $21,077    $29,048    $33,654
Issue of Debentures                  $98,875         $0         $0         $0         $0
Foreign Currency Adjustments              $0         $0         $0   ($10,688)   ($6,928)
Other                                 $7,032    ($7,578)  ($26,003)  ($17,825)   ($8,027)
Cash Provided by Financing
  Activities                        ($15,204)  ($47,252)   ($8,948)   $17,274    $13,476
Cash at the Beginning of the Year    $35,778    $28,784    $42,509    $86,256   $107,386
Change in Cash During the Year       ($6,994)   $13,725    $43,747    $21,130    $64,829
Cash at the End of the Year          $28,784    $42,509    $86,256   $107,386   $172,215

TABLE 4
CANACOM CORPORATION
Common-Size Balance Sheets
June 30, 1988-1992

                                                            Hightech  Manufacturing  Retailing
                                                               Corp.      Composite  Composite
ASSETS                             1988 1989 1990 1991 1992     1992           1992       1992
Cash & Equivalents                    6    8   15   14   18       26             10          9
Accounts & Notes Receivable (Net)     3    4    5    7    7       24             28         12
Inventories                          63   61   55   55   53       26             28         53
Other Assets                          2    2    2    1    2        8              3          1
Total Current Assets                 74   75   77   78   80       84             70         75
Net Property and Equipment           25   23   20   18   16       12             23         19
Other Assets                          1    2    4    4    4        4              7          6
TOTAL ASSETS                        100  100  100  100  100      100            100        100

LIABILITIES AND CAPITAL
Current Liabilities                  22   24   22   19   18       23             35         53
Debentures (Net)                     38   32   14   11    8        0             12         14
Store Managers Deposits               3    2    1    1    1        -              -          -
Deferred Income Taxes                 2    1    1    1    1        -              -          -
Other Long-Term Liabilities           1    1    1    2    1        9              5          1
Total Liabilities                    65   60   39   34   29       32             52         68
Shareholders' Equity                 34   40   61   66   71       68             48         32
TOTAL LIABILITIES AND CAPITAL       100  100  100  100  100      100            100        100

(A dash indicates a value not reported for that column.)

TABLE 5
CANACOM CORPORATION
Common-Size Income Statements
June 30, 1988-1992

                                                                  Hightech  Manufacturing  Retailing
                                                                  Computer      Composite  Composite
                                      1988   1989   1990   1991   1992   1992        1992       1992
Net Sales                           100.00 100.00 100.00 100.00 100.00 100.00      100.00     100.00
Other Revenue                         0.94   0.82   0.93   1.41   1.54    n/a         n/a        n/a
Total Revenue                       100.94 100.82 100.93 101.41 101.54 100.00      100.00     100.00
Cost of Goods Sold*                  44.06  42.96  41.49  40.69  40.73   51.5        59.7       66.6
Gross Income                         56.88  57.86  59.44  60.72  60.81  48.50       40.30      33.40
Selling & Administrative
  Expenses**                         39.84  39.46  38.19  38.40  37.58     33        34.5       29.3
Depreciation & Amortization           1.41   1.38   1.38   1.45   1.56    2.3         n/a        n/a
Operating Income                     15.63  17.02  19.87  20.87  21.66  13.20        5.80       4.10
Interest Expenses***                  2.34   1.81   0.91   0.06   0.36   -1.7         0.9        1.5
Net Income Before Tax                13.29  15.21  18.96  20.81  21.30  14.90        4.90       2.60
Provision for Taxes                   6.44   7.11   8.93   9.81  10.05    7.1         n/a        n/a
Net Income After Tax                  6.85   8.11  10.03  11.00  11.25   7.80         n/a        n/a

* Includes manufacturing
  payroll of:                         2.33   2.38   2.49   2.61   2.90    n/a         n/a        n/a
** Includes:
  Nonmanufacturing payroll           16.99  16.80  16.94  16.71  15.96    n/a         n/a        n/a
  Advertising expense                 9.40   8.97   8.14   7.92   8.04    n/a         n/a        n/a
  Rental Expense                      4.49   4.44   4.37   4.41   4.32    n/a         n/a        n/a
  Foreign currency translation        0.27   0.12  -0.31   0.16   0.02    n/a         n/a        n/a
*** Net of interest income of:        0.10   0.17   0.42   1.03   0.61    n/a         n/a        n/a

TABLE 6
CANACOM CORPORATION
Liquidity Ratios
June 30, 1988-1992

                                                                      Hightech  Manufacturing  Retailing
                                                                      Computer      Composite  Composite
                                  1988    1989    1990    1991    1992    1992           1992       1992
Current Ratio                     3.25    3.15    3.48    4.07    4.42    3.64           2.70       1.40
Acid-Test Ratio                   0.39    0.49    0.90    1.08    1.35    2.17           1.10       0.30
Accounts Receivable Turnover     76.73   53.82   40.19   24.31   23.02    7.20           5.60      45.50
Inventory Turnover                1.40    1.37    1.37    1.23    1.19    3.55           3.10       3.60
Day Sales in Receivables          4.80    6.80    9.10   15.00   15.90   50.70          65.20       8.00
Days to Sell Inventory          260.70  266.40  266.40  296.70  306.70  102.80         117.70     101.40
Conversion Period (days)        265.50  273.20  275.50  311.70  322.60  153.50         182.90     109.40
% Cash to Current Assets          8.40   10.61   20.02   17.66   22.14   30.55          14.80      11.60
% Cash to Current Liabilities    27.29   33.41   69.68   71.95   97.78 1112.57          29.60      16.50
Working Capital (millions $)     309.9  362.34  505.43  715.86   977.2  340.21            N/A        N/A
Liquidity Index                 233.10  230.20  203.40  228.10  222.50    51.1            N/A        N/A
Operating Cash Flow to Total
  Current Liabilities             0.73    0.78    0.95    1.09    1.11     N/A            N/A        N/A

TABLE 7
CANACOM CORPORATION
Capital Structure and Long-Term Solvency Ratios
June 30, 1988-1992

                                                                     Hightech  Manufacturing  Retailing
                                                                     Computer      Composite  Composite
                                       1988  1989   1990    1991   1992  1992           1992       1992
Equity to Total Liabilities            0.54  0.68   1.61    2.03   2.50   2.61           0.92       0.46
Equity to Long-Term Liabilities        0.83  1.13   3.73    4.76   6.80  15.71           3.89       2.19
Equity to Net Property, Plant,
  and Equipment                        1.36  1.74   3.03    3.65   4.39   6.00           2.10       1.69
Times Interest Earned                  6.67  9.41  21.75  363.49  60.21    N/A           3.10       2.40
Earnings Coverage of Fixed Charges     2.92  3.37   4.32    4.79   5.02   9.40            N/A        N/A
Earnings Coverage of Interest
  Expenses                             2.22  4.67   7.55   62.24  15.38    N/A            N/A        N/A

TABLE 8
CANACOM CORPORATION
Asset Utilization and Profitability Ratios
June 30, 1988-1992

                                                                    Hightech  Manufacturing  Retailing
                                                                    Computer      Composite  Composite
                                    1988   1989   1990   1991   1992    1992           1992       1992
Sales to Cash & Equivalents        32.30  24.60  11.90  12.10   8.80    6.90          13.00      34.80
Sales to Receivables               76.70  53.80  40.20  24.30  23.00    7.20           5.60      33.00
Sales to Inventories                3.20   3.20   3.30   3.00   2.90    6.90           4.70       5.70
Sales to Working Capital            3.90   3.80   3.30   2.80   2.50    2.90           4.50      13.00
Sales to Net Property, Plant,
  & Equipment                       7.80   8.40   8.90   9.00   9.60   14.70           8.20      27.10
Asset Turnover                      1.99   1.95   1.81   1.66   1.56    1.77           1.50       2.80
Return on Sales (%)                 6.90   8.10  10.00  11.00  11.30    7.80           4.40       1.30
Return on Assets (%)               13.70  15.80  18.10  18.30  17.60   13.80           6.60       3.60
Financial Leverage                  2.93   2.51   1.64   1.51   1.41    1.47            N/A        N/A
Return on Equity (%)               40.00  39.60  29.70  27.60  24.90   20.30            N/A        N/A
Return on Long-Term Liabilities
  & Equity (%)                     16.20  17.90  19.40  19.20  18.40   13.80            N/A        N/A
Before Tax Return on Total
  Assets (%)                       26.50  29.70  34.20  34.50  33.30   26.30           8.60       4.10

TABLE 9
CANACOM CORPORATION
Market Value Ratios
Years Ending June 30, 1988-1992

                                                                  Hightech    S&P 500
                                                                     Corp. Industrial
                                  1988   1989   1990   1991   1992    1992       1992
Price Earnings Ratio              6.28   9.27  18.18  12.67  18.73   27.01      13.55
Earnings Price Ratio              0.16   0.11   0.06   0.08   0.05    0.04       0.07
Price to Cash from Operations     5.44   8.22  15.35  10.90  16.28   20.90        N/A
Price to Book Value of Equity     2.72   3.80   5.38   3.50   4.65    5.48        N/A
Dividend Yield (%)                0.00   0.00   0.00   0.00   0.00    0.00       0.15
Dividend Payout Ratio (%)         0.00   0.00   0.00   0.00   0.00    0.00        N/A
Internal Growth Rate (%)         40.00  39.60  29.70  27.60  24.90   20.30        N/A

TABLE 10
CANACOM CORPORATION
Statement of Changes in Financial Position (In Thousands)
Years Ending June 30, 1988-1992

This table repeats the Statement of Changes in Financial Position shown in Table 3 and adds the following ratios:

                                  1988   1989   1990   1991   1992
Funds Reinvestment Ratio (%)      7.38  16.99  13.81   7.19   9.57
Funds Adequacy Ratio (5 Year: 1988-1992): 0.56

Reference Code: _____

JUDGMENT RECORDING SHEETS (Set 1)

Please answer the following questions:

Question 1
Based on your analysis and under current economic and interest-rate conditions, rate Canacom's current liquidity position. Please circle the correct answer.
Very Weak Position: 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 :Very Strong Position
How confident are you that this judgment is correct? Provide a number between 50% and 100%, with 50% meaning you are completely unsure and 100% meaning you are completely confident. _____ %

Question 2
Based on your analysis and under current economic and interest-rate conditions, rate Canacom's long-term solvency position. Please circle the correct answer.
Very Weak Position: 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 :Very Strong Position
How confident are you that this judgment is correct? Provide a number between 50% and 100%, with 50% meaning you are completely unsure and 100% meaning you are completely confident. _____ %

Question 3
Based on your analysis and under current economic and interest-rate conditions, rate Canacom's asset utilization performance. Please circle the correct answer.
Very Weak Performance: 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 :Very Strong Performance
How confident are you that this judgment is correct? Provide a number between 50% and 100%, with 50% meaning you are completely unsure and 100% meaning you are completely confident. _____ %

Question 4
Based on your analysis and under current economic and interest-rate conditions, rate the value of Canacom stock as loan collateral. Please circle the correct answer.
Very Low Value: 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 :Very High Value
How confident are you that this judgment is correct? Provide a number between 50% and 100%, with 50% meaning you are completely unsure and 100% meaning you are completely confident. _____ %

Question 5
Based on your analysis and under current economic and interest-rate conditions, rate the quality of Canacom's financial management. Please circle the correct answer.
Very Poor Quality: 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 :Excellent Quality
How confident are you that this judgment is correct? Provide a number between 50% and 100%, with 50% meaning you are completely unsure and 100% meaning you are completely confident. _____ %

Question 6
Based on your analysis and under current economic and interest-rate conditions, rate the quality of Canacom's operating management. Please circle the correct answer.
Very Poor Quality: 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 :Excellent Quality
How confident are you that this judgment is correct?
Provide a number between 50% and 100%, with 50% meaning you are completely unsure and 100% meaning you are completely confident. _____ %

Question 7
Based on your analysis and under current economic and interest-rate conditions, what is your estimate of Canacom's expected net income in the coming year?
My estimate of Canacom's expected net income is $ _____ million.
How confident are you that this judgment is correct? Provide a number between 50% and 100%, with 50% meaning you are completely unsure and 100% meaning you are completely confident. _____ %

Question 8
Based on your analysis and under current economic and interest-rate conditions, how much of the $800 million loan being requested would you recommend as being allowable to Canacom for the purposes of streamlining operations, assuming that it is unsecured?
I estimate that $ _____ million should be allowable to Canacom for the purposes of streamlining operations.
How confident are you that this judgment is correct? Provide a number between 50% and 100%, with 50% meaning you are completely unsure and 100% meaning you are completely confident. _____ %

USING THE FINALYZER EXPERT SYSTEM

The expert system you will be using is called FINALYZER (for FINAncial anaLYZER). It was developed by modelling the expertise and knowledge of several experts in the field. It conducts seven specific analyses and, for each analysis, informs you of the types of ratios to be used as part of that analysis, on an INFORMATION SCREEN. It also computes and presents tables of ratios, identical to the ones you were provided with earlier, on a DATA SCREEN. After applying its expertise to these ratios, it generates a set of recommendations relevant to the particular analysis, on a RECOMMENDATION SCREEN. The system structure is illustrated on the next page. The interface and format of FINALYZER are identical to those of the CREDIT-ADVISOR you have used earlier. It also makes available to you the WHY, HOW and STRATEGIC explanations on various types of screens.
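The fixed sequence described above (seven analyses, each stepping through an Information, a Data, and a Recommendation screen) can be sketched as a small data structure. This is an invented illustration, not FINALYZER's actual code; only the analysis and screen names are taken from the description:

```python
# Hypothetical sketch of FINALYZER's screen flow: seven analyses, each
# visiting its three screens in a fixed order, while the analyses
# themselves may be taken in any order the participant prefers.

ANALYSES = [
    "Balance Sheet", "Income Statement", "Funds Flow", "Liquidity",
    "Capital Structure", "Profitability", "Market Value",
]
SCREENS = ["Information Screen", "Data Screen", "Recommendation Screen"]

def session_plan(order=None):
    """Yield (analysis, screen) pairs for a full pass through the system."""
    for analysis in (order or ANALYSES):
        for screen in SCREENS:
            yield (analysis, screen)

plan = list(session_plan())  # 7 analyses x 3 screens = 21 screen visits
```

A participant who steps through every analysis, as instructed, therefore makes 21 screen visits regardless of the order in which the analyses are selected.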
PLEASE GO THROUGH ALL THE ANALYSES.

[Figure: The System Flow Chart of FINALYZER. From the Introduction Screen, the Financial Statement Presentation Screens, and the Analysis Selection Screen, each of the seven analyses (Balance Sheet, Income Statement, Funds Flow, Liquidity, Capital Structure, Profitability, and Market Value) proceeds through an Information Screen, a Data Screen, and a Recommendation Screen, leading to the Overall Summary Screen and the Exit Screen.]

Reference Code: _____

JUDGMENT RECORDING SHEETS (Set 2)

Please answer the following questions:

Question 1
Based on your analysis using FINALYZER and under current economic and interest-rate conditions, rate Canacom's current liquidity position. Please circle the correct answer.
Very Weak Position: 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 :Very Strong Position
How confident are you that this judgment is correct? Provide a number between 50% and 100%, with 50% meaning you are completely unsure and 100% meaning you are completely confident. _____ %

Question 2
Based on your analysis using FINALYZER and under current economic and interest-rate conditions, rate Canacom's long-term solvency position. Please circle the correct answer.
Very Weak Position: 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 :Very Strong Position
How confident are you that this judgment is correct? Provide a number between 50% and 100%, with 50% meaning you are completely unsure and 100% meaning you are completely confident. _____ %

Question 3
Based on your analysis using FINALYZER and under current economic and interest-rate conditions, rate Canacom's asset utilization performance. Please circle the correct answer.
Very Weak Performance: 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 :Very Strong Performance
How confident are you that this judgment is correct? Provide a number between 50% and 100%, with 50% meaning you are completely unsure and 100% meaning you are completely confident. _____ %

Question 4
Based on your analysis using FINALYZER and under current economic and interest-rate conditions, rate the value of Canacom stock as loan collateral. Please circle the correct answer.
Very Low Value: 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 :Very High Value
How confident are you that this judgment is correct? Provide a number between 50% and 100%, with 50% meaning you are completely unsure and 100% meaning you are completely confident. _____ %

Question 5
Based on your analysis using FINALYZER and under current economic and interest-rate conditions, rate the quality of Canacom's financial management. Please circle the correct answer.
Very Poor Quality: 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 :Excellent Quality
How confident are you that this judgment is correct? Provide a number between 50% and 100%, with 50% meaning you are completely unsure and 100% meaning you are completely confident. _____ %

Question 6
Based on your analysis using FINALYZER and under current economic and interest-rate conditions, rate the quality of Canacom's operating management. Please circle the correct answer.
Very Poor Quality: 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 :Excellent Quality
How confident are you that this judgment is correct? Provide a number between 50% and 100%, with 50% meaning you are completely unsure and 100% meaning you are completely confident. _____ %

Question 7
Based on your analysis using FINALYZER and under current economic and interest-rate conditions, what is your estimate of Canacom's expected net income in the coming year?
My estimate of Canacom's expected net income is $ _____ million.
How confident are you that this judgment is correct?
Provide a number between 50% and 100%, with 50% meaning you are completely unsure and 100% meaning you are completely confident. _____ %

Question 8
Based on your analysis using FINALYZER and under current economic and interest-rate conditions, how much of the $800 million loan being requested would you recommend as being allowable to Canacom for the purposes of streamlining operations, assuming that it is unsecured?
I estimate that $ _____ million should be allowable to Canacom for the purposes of streamlining operations.
How confident are you that this judgment is correct? Provide a number between 50% and 100%, with 50% meaning you are completely unsure and 100% meaning you are completely confident. _____ %

Reference Code: _____

POST STUDY QUESTIONNAIRE

INSTRUCTIONS

To what extent do the following statements reflect your views of the FINALYZER system? For each statement, CIRCLE the one number that best represents your agreement or disagreement with the statement (1 - Strongly Agree, 2 - Agree, 3 - Slightly Agree, 4 - Neutral, 5 - Slightly Disagree, 6 - Disagree, 7 - Strongly Disagree). PLEASE ANSWER ALL THE FOLLOWING ITEMS.

1. The use of FINALYZER greatly enhanced the quality of my judgments.
Strongly Agree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Disagree

2. Using FINALYZER gave me more control over the financial analysis task.
Strongly Agree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Disagree

3. FINALYZER provided good advice across different situations.
Strongly Agree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Disagree

4. My interaction with FINALYZER was clear and understandable.
Strongly Agree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Disagree

5. Using the explanations provided by FINALYZER improved the quality of the analysis I performed.
Strongly Agree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Disagree

6. FINALYZER is dependable in important decisions.
Strongly Agree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Disagree

7. It was easy to get FINALYZER to do what I wanted it to do.
Strongly Agree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Disagree

8. My understanding of financial analysis has been enhanced by the use of the explanations provided by FINALYZER.
Strongly Agree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Disagree

9. When FINALYZER gives me unexpected advice, I am confident that the advice is correct.
Strongly Agree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Disagree

10. Using FINALYZER made the financial analysis task easier to perform.
Strongly Agree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Disagree

11. Using FINALYZER enabled me to accomplish the financial analysis task more quickly.
Strongly Agree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Disagree

12. Using the explanations provided by FINALYZER enhanced my effectiveness in completing the financial analysis task.
Strongly Agree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Disagree

13. FINALYZER is a reliable source of knowledge for financial analysis.
Strongly Agree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Disagree

14. Using FINALYZER improved the quality of the analysis I performed.
Strongly Agree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Disagree

15. FINALYZER conveniently supported all the various types of analysis required to complete the judgmental decision making tasks.
Strongly Agree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Disagree

16. The explanations provided by FINALYZER had a significant impact on my judgments.
Strongly Agree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Disagree

17. Using FINALYZER increased my productivity.
Strongly Agree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Disagree

18. I think users who have little expertise would trust the advice given by FINALYZER.
Strongly Agree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Disagree

19. Overall, I found FINALYZER useful in analyzing the financial statements.
Strongly Agree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Disagree

20. Using the explanations provided by FINALYZER enabled me to accomplish the financial analysis more quickly.
Strongly Agree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Disagree

21. Using FINALYZER enhanced my effectiveness in completing the financial analysis task.
Strongly Agree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Disagree

22. Using the explanations provided by FINALYZER made the financial analysis task easier to perform.
Strongly Agree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Disagree

23. FINALYZER gave the same advice for the same situation over time.
Strongly Agree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Disagree

24. Using the explanations provided by FINALYZER gave me more control over the financial analysis task.
Strongly Agree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Disagree

25. Overall, I found FINALYZER to be easy to use.
Strongly Agree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Disagree

26. I would consider that the advice generated by FINALYZER could only have been provided by an expert in the industry.
Strongly Agree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Disagree

27. Learning to use FINALYZER was easy for me.
Strongly Agree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Disagree

28. Using FINALYZER allowed me to accomplish more analysis than would otherwise have been possible.
Strongly Agree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Disagree

29. Using the explanations provided by FINALYZER increased my productivity.
Strongly Agree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Disagree

30. FINALYZER behaved in a very consistent manner.
Strongly Agree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Disagree

31. FINALYZER helped me make good decisions.
Strongly Agree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Disagree

32. It was easy for me to perform the financial analysis using FINALYZER.
Strongly Agree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Disagree

33. Overall, I found the explanations provided by FINALYZER useful in analyzing the financial statements.
Strongly Agree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Disagree

34. FINALYZER has considerable knowledge of the factors involved in financial analysis.
Strongly Agree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Disagree

ESSAY QUESTIONS:

In the space provided, please address the following issues as completely and specifically as possible.

1. What are the major strengths of the FINALYZER expert system?

2. What are the major weaknesses of the FINALYZER expert system?

3. More specifically, what changes would you like to see in the way FINALYZER provides explanations?

Thank you very much for your participation. Your contribution to this research is greatly appreciated.

DEBRIEFING PROTOCOL

Thank you again for your participation. It is our hope that you have enjoyed working with FINALYZER in completing the case. Your judgments and evaluations of Canacom and the system, together with the data the system captured of your interaction with it, will be analyzed using a variety of computer and statistical analyses. We hope to learn more about the optimal manner in which expert systems for financial analysis should be designed to maximize their potential for users such as yourself. Studies like this, which focus on users' behaviour in using computer support tools, provide valuable feedback that helps us design more user-friendly and easy-to-learn expert systems.

The primary focus of this study is on the explanation component of the expert system. We are particularly interested in how and when you selected explanations, and in the relative effectiveness of various methods of providing explanations. Our goal is to learn how expert systems should provide explanations effectively. The FINALYZER system that you have used is not yet a complete and fully functional financial analysis expert system. Developing a complete and fully functional system requires a vast amount of time and resources, and is a difficult task beyond the scope of the current study. However, this study contributes to that goal by tackling some aspects of the complete problem.
The above information was not revealed to you at the start of the session in the interest of what is termed experimental control and validity. For a study like this to be successful, it is of critical importance that you, as the user of the system, behave as objectively and as naturally as you would in a real situation requiring the use of such a system. Because it was felt that knowing this information would distort or bias your behaviour in using FINALYZER, it was decided that this information would only be revealed to you now, at the end of your participation. If you have any additional comments, please inform the research assistant now.

NOTE PAGE

Research assistant name:
Participant name:
Time of arrival:
Time to start tutorials:
Time to start problem solving (manually):
Time to start using FINALYZER (including judgment questionnaire):
Time to start post-study questionnaire:
Time of finishing:
Special observations and comments:

APPENDIX B. STEP-BY-STEP EXPERIMENTAL PROCEDURES AND VERBAL INSTRUCTIONS

1. Consent Form
"The consent form is required by the University. Without your consent, we are not allowed to have you participate in this study."

2. Background Information Questionnaire
"Please take 2-3 minutes to give us some background information; it will help us analyze the results."
Once the participant has finished filling out the questionnaire, have a glance at it. Pay special attention to: (1) everything is complete; (2) if there is some previous work experience related to financial analysis, make sure its length is specified; (3) familiarity with the mouse and Windows-based applications.

3. General Instruction Sheet
"Essentially, you will do three things today: going through the tutorials, analyzing the case without using the expert system first, and then using the expert system."
"To use your time most effectively, we have found it works out best if you spend about 25 minutes on the tutorials, 25 minutes working on the case manually, and 35 minutes using the expert system. This is to ensure you get the most out of it and finish within 90 minutes. Of course, if you wish to spend more time on any part of the study, you are welcome to do so. After all, time is not a factor in the evaluation of your performance."
"If you have any questions in the process, please don't hesitate to ask me."

4. Short Note on Expert Systems
"It basically highlights the differences between expert systems and conventional computer systems."
When the reading is done: "You may leave this page beside the monitor for your later reference."

5. Mouse Tutorial
"All you need to do is click the LEFT button of the mouse ONCE (for every action)." (Make sure to mention this to people who have never used a mouse before.)
While the participant is trying the tutorial: "(Probably you are very familiar with the use of the mouse.) The key point is the two types of button: radio-buttons (round circles) and push-buttons (shaded boxes)."
When the participant has finished: "Clicking on these two types of buttons is the only way to interact with the expert system. The idea of this tutorial is to let you know that the buttons are clickable whenever and wherever you see them."

6. Credit-Advisor System Tutorial
a. The participant reads the instruction sheet first; then the research assistant (RA) explains the system flow chart and highlights the three types of screens. For hypertext users: "You will get three types of explanation on the three types of screen, by clicking on a radio-button to make a selection and a push-button to activate the selection." For lineartext users: "... on the two types of screens ..." "Feel free to change the agreement ratings."
b.
The RA will demonstrate the key system features (buttons and screens): explanations from data screens, information screens, and conclusion screens; the difference between BACK-TO-ANALYSIS and PREVIOUS-SCREEN; and strategic explanations on the information screen and the conclusion screen, to show the difference. Finally, go through the overall analysis, then let the participant RESTART the system.
c. While they are practising: "You agree by clicking on the numbers on the LEFT, and disagree on the RIGHT." (It is quite common for people to do it the other way around.)
d. Once the participant has become comfortable, let him/her finish through the overall analysis. By this time, it is roughly 25 minutes into the experiment session. (Don't EXIT; stay on the EXIT and RESTART screen for them to practise verbalizing until they have finished the manual analysis, and to initialize the system time.)

7. Loan Analysis Case
"Now we'd like you to analyze the case first without using the expert system. Here is the case, the tables, and the questionnaire - we'll give you an identical questionnaire again when you use the expert system. That's how we measure the impact of the expert system."
"Just do a quick analysis. The important thing is to get yourself familiar with the case and the numbers. Generally speaking, you may spend as much time as you prefer. We encourage you to finish in about 20-25 minutes. Although time is not a factor in the evaluation, we'd like to make sure you reserve at least 35 minutes to use the system. Would you like me to remind you at the 20-minute point?"
"A calculator, scrap paper, and a highlighter are all ready on the other desk. You may want to move over there to finish the analysis, without using the computer. Please feel free to mark up everything."
At the end, check the completeness of the questionnaire (all questions are answered).

8. Practising Verbalizing (Attached separately in Appendix H)

9.
Using FINALYZER

Make sure the 10 pages of tables are handy for the participant to reference. "Here is the second questionnaire, identical to the one you just had. You can start working on it either while using the expert system, or afterwards."

While the subject is reading the instructions, the RA EXITs from CREDIT-ADVISOR and initializes the system time to "00:00:00.00a". When the participant finishes reading, the RA explains the system flow chart, and highlights the three types of screens and the explanations available on them.

"The tutorial is like one of the seven analyses listed here."

"For each of the seven analyses, you will go through the INFORMATION SCREEN, DATA SCREEN, and RECOMMENDATION SCREEN, sequentially (as indicated by the dark arrow)."

"There are seven analyses, each with 3 to 4 specific recommendations; altogether, there are 25 specific recommendations."

When the analysis selection screen is on display: "Make sure you step through all of them, in whatever order you prefer." (There is no need to go through the sub-analyses sequentially; this allows the users more flexibility. If someone intentionally tries to go through an analysis twice, that is fine.)

If it appears that there is some serious misunderstanding of the system's key features, try to correct it as early as possible (during the first sub-analysis). If that does not work (or there is anything else unusual), let it go. During debriefing, ask the subject what the reason was and whether the system features were understood, and make a comment on the note page. Pay special attention at the beginning to whether the subject rates DISAGREE when he/she really means to agree.

At the end (or while the subject is working), it is very important to check the completeness of all the questionnaires.

10. Post-Study Questionnaire

"Thank you very much. That's almost it. Finally, we would appreciate it if you could spend a couple more minutes to give us a quick evaluation."
If time is a concern, skip the essay question page. If there is no hurry, ask for additional comments, which can lead into the debriefing naturally.

11. Debriefing Protocol

Either let the participant read the debriefing protocol, or explain the two main points to him/her (the explanation component is the focus, and the system is not fully functioning yet).

12. Note Page

It is more important to make sure the experiment is going smoothly than to keep track of time. An approximate time record is enough. If there is anything unusual, it would be helpful to note it down. Finally, staple the four questionnaires (background information, two judgment questionnaires, post-study evaluation) separately.

13. Miscellaneous

If someone uses the PREVIOUS-SCREEN key to come back from the 2nd conclusion screen to the DATA-SCREEN or the first conclusion screen, there is no need to provide an agreement rating for the 1st conclusion screen again when he/she goes back.

During the debriefing, participants can be told that existing expert support systems in the industry usually can only calculate ratios based on data downloaded from 10-K forms and other electronic sources, but fall short of providing specific recommendations. However, it is not uncommon for large banks to have their own expert systems like CREDIT-ADVISOR for consumer loan/credit card applications, which are much simpler than commercial loan applications. Expert subjects may be particularly interested in chatting about these things.

If they have questions related to the nature of the study that may be difficult to answer (e.g., why think-aloud), tell them those can be asked later: "You will be debriefed at the end of the session."

APPENDIX C. An Illustration of Hyper-FINALYZER

The six figures included in this appendix represent the six types of screen displayed by the experimental KBS.
Although the figures are taken from Hyper-FINALYZER, the only difference between the two versions of the KBS is the availability of additional hypertext markers (radio-buttons; cf. Borland International Inc., 1991) for requesting deep explanations from the data screens, recommendation screens, and reasoning-trace explanation screens. Therefore, screens of FINALYZER are not shown here to avoid duplication. Furthermore, the figures were reproduced using DrawPerfect for high resolution and easy integration into this document.

[Figure C1. Example of Input Screens: the Hyper-FINALYZER Liquidity Analysis input screen. It lists twelve inputs (Current Ratio, Acid-Test Ratio, Accounts Receivable Turnover, Inventory Turnover, Days Sales in Receivables, Operating Cash Flow to Total Current Liabilities, Days to Sell Inventory, Conversion Period, % Cash to Current Assets, % Cash to Current Liabilities, Working Capital, and Liquidity Index), each preceded by a radio-button for requesting explanations, with a START ANALYSIS push-button under Screen Control.]

[Figure C2. Example of Data Screens: the liquidity ratios of Canacom Corporation for June 30, 1988-1992, with 1992 comparison columns (competitor and industry benchmarks), radio-buttons beside each ratio, WHY/HOW/STRATEGIC explanation buttons, and PREVIOUS-SCREEN and NEXT-SCREEN controls.]

[Figure C3. Example of Recommendation Screens: the short-term liquidity conclusions (page 1 of 2), e.g., that Canacom is in a very favourable working capital position with little short-term risk of financial disaster and would be an optimal client for a short-term loan, with radio-buttons preceding each conclusion and embedded in the text for requesting explanations.]

[Figure C4. Example of Deep Explanations: a HOW explanation of the current ratio (Current Ratio = Current Assets / Current Liabilities), noting the limitations of the measure and cross-referencing, via embedded hypertext markers, the acid-test ratio, the percent cash to current liabilities ratio, days sales in receivables, and days to sell inventory.]

[Figure C5. Example of Strategic Explanations (Deep): a diagram decomposing the liquidity analysis into a current condition review (working capital composition, current and acid-test ratios) and an operational impact review (receivables, inventory, and cash flow turnover measures), with embedded hypertext markers on each concept.]

[Figure C6. Example of Reasoning-Trace Explanations: a HOW explanation listing the four evaluations behind a conclusion: the current and acid-test ratios are larger than 1, both are superior to the competitor and industry indices, there is a healthy increasing five-year trend in these ratios and in working capital, and the cash to current liabilities ratio of 97.78 percent suggests the enterprise could settle virtually all its current liabilities in cash.]

APPENDIX D. MATERIALS FOR RECRUITING SUBJECTS

CONTENTS: Page:
An Invitation to Professionals in Financial Decision Making 242
Confirmation Form (For Professionals) 243
An Invitation to Students in Accounting/Finance 244
Confirmation Form (For Students) 245
Consent Form (For Professionals) 246
Background Information Questionnaire (For Professionals) 247
Consent Form (For Students) 249
Background Information Questionnaire (For Students) 250

THE UNIVERSITY OF BRITISH COLUMBIA
Faculty of Commerce and Business Administration
2053 Main Mall
Vancouver, B.C.
Canada V6T 1Z2
Telephone: (604) 822-8964
Fax: (604) 822-9574

FINANCIAL ANALYSIS EXPERT SYSTEMS STUDY
An Invitation to Professionals in Financial Decision Making

We are currently studying issues related to the design of expert support systems for financial analysis. The objectives of the study are to measure their value to system users and to understand the manner in which they should be designed to be of greatest value to the users. In the initial stages of the study, such an expert system has been developed by modelling the expertise of several experts in the field. Plans are currently being made to evaluate this system in terms of facilitating financial analysis for investment and loan decision making. We now would like to invite professionals in the financial industry, whose jobs involve some measure of financial statement analysis, to participate in the project.

Your participation will involve the following. Initially, you will be trained in the use of the expert system. Next, your task will be to use the system to perform the financial statement analysis of a hypothetical company. At the end of your analysis you will make a set of judgements regarding the financial health of this company. You will also be asked to provide evaluations of the usefulness of various aspects of the system you have used.

It is expected that your participation will take about 1.5 hours. It can take place either at the University of British Columbia or at any other location of your choice that is most convenient to you. You may choose any day of the week (including weekends) and any time of day (including evenings), in May or June. A direct benefit of participation will be exposure to a leading-edge, user-friendly financial decision support tool that uses a "Windows-type" interface and expert financial knowledge. The top-performing 20% of the participants will receive a monetary award of $50 in cash.
Any information obtained in connection with your participation that can be identified with you will remain confidential. In any reports or publications, only aggregate data will be presented to preserve the anonymity of participants. There will also be no way to identify you or your company through any of your responses. Participation in this study is completely voluntary, and you will be free to discontinue your participation at any time and at any stage of the study. If you have any questions now or later, please feel free to call Professor Izak Benbasat at 822-8396 or Mr. Jiye Mao at 822-8964.

If you do decide to participate, please mail or fax the attached confirmation form to us. We will then call you to schedule your session. We need your assistance to complete this study. Your participation will be highly appreciated.

CONFIRMATION FORM

I would like to participate in the Financial Analysis Expert Systems Study. Please call me to schedule my participation.

Name:
Telephone Number:
Fax Number:
Date:

We thank you very much for your interest in this study. Please mail or fax this form to:

Professor Izak Benbasat
Financial Analysis Expert System Project
Faculty of Commerce and Business Administration
University of British Columbia
2053 Main Mall, Vancouver, B.C., V6T 1Z2
PHONE (604) 822-8396, FAX (604) 822-9574

THE UNIVERSITY OF BRITISH COLUMBIA
Faculty of Commerce and Business Administration
2053 Main Mall
Vancouver, B.C.
Canada V6T 1Z2
Telephone: (604) 822-8964
Fax: (604) 822-9574

FINANCIAL ANALYSIS EXPERT SYSTEMS STUDY
An Invitation to Students in Accounting/Finance

We are currently studying issues related to the design of computer-based expert support systems for financial statement analysis. In the initial stages of the study, an expert system called FINALYZER (for FINAncial anaLYZER) has been developed by modelling the knowledge and expertise of several experts in the field.
Plans are currently being made to evaluate the system in terms of facilitating financial analysis for loan decision making. To this end, we now would like to invite graduate and undergraduate students who have taken or are currently taking courses related to financial statement analysis, such as COMM 459, COMM 550, COMM 453, COMM 351, or COMM 293, to participate in the project.

The task will involve the following. Initially, you will be trained to use FINALYZER. Next, you will be asked to use the system to complete a specially designed financial analysis case about a company whose financial and other data will be provided to you. At the end of your analysis you will make a set of judgments regarding the financial health of the company. You will also be asked to provide scaled evaluations and written comments as to the usefulness of various aspects of the system.

It is estimated that your participation will take about 1.5 hours and will be scheduled at the Henry Angus Building on the University of British Columbia campus. You will be paid an honorarium of $15 for your participation and will have a one-in-five chance of winning a $50 cash award for best performance in the study. A direct benefit of participation will be exposure to a user-friendly financial decision support tool that uses a "Windows-type" interface and expert financial knowledge.

Any information obtained in connection with the study that can be identified with you will remain confidential. In any reports or publications only aggregate data will be presented to preserve the anonymity of participants. Note that participation in this study is completely voluntary and, if you wish to do so, you will be free to discontinue participation at any time and at any stage of the study. If you have any questions now or later, please call Mr. Ji-Ye Mao at 822-8964 or Professor Izak Benbasat at 822-8396; we will be happy to answer them.
If you do decide to participate, please complete the attached confirmation form and return it to us. We will then call you later to schedule your participation sometime in February, March, or April 1994, at your convenience, any day of the week (including weekends) and any time of day (including evenings).

CONFIRMATION FORM

I would like to participate in the Financial Analysis Expert Systems Study. Please call me to schedule my participation.

Name:
Telephone Number:
Date:

We thank you very much for your interest in this study. Please return this form to:

Professor Izak Benbasat
Financial Analysis Expert System Project
Faculty of Commerce and Business Administration
Room 452, Henry Angus Building
University of British Columbia
2053 Main Mall, Vancouver, B.C., V6T 1Z2
PHONE (604) 822-8964, FAX (604) 822-9574

FINANCIAL ANALYSIS EXPERT SYSTEMS STUDY
CONSENT FORM

This form is to be completed after you have read and understood the contents of the Information Sheet that is attached. If you wish to have a signed copy of this form for your own record, we will generate one for you.

Agreement to Participate: I have read and understood the contents of the Information Sheet provided and have decided to participate in the study.

Agreement to Confidentiality: I understand that some of my colleagues in the financial industry may also participate in this study. I realize that my discussion of the details of this study with them may distort the results. Therefore, I agree not to discuss with any other participant any aspect of the study prior to their participation.

Signature of Participant    Date

Participant Name:

BACKGROUND INFORMATION QUESTIONNAIRE (For Professionals)

1. Name (please print):
2. Contact Address:
3. Educational and Professional Qualifications: (please specify the number of years completed if in-progress)
Undergraduate: Area of Specialization:
Graduate: Area of Specialization:
Professional Affiliations (e.g., CFA, ACIB, CA, CGA, CMA, etc.):
4.
My current job designation and the number of years I have been at this job:
5. I perform financial statement analysis as part of my job (Yes/No):
My average frequency of such analyses is (circle one of the following, if Yes): Every day, Every other day, Every week, Every fortnight, Every month
The number of years of experience that I have in financial statement analysis is:
The industries of my specialization are (if any):
6. I use (or have used) financial modelling and analysis software (e.g., Lotus 1-2-3, FISCAL, FSAP, etc.) (state NONE otherwise):
7. I use (or have used) the following expert systems (briefly describe the system and the circumstances of the use, state NONE otherwise):
8. I use (or have used) Microsoft Windows or Macintosh-based applications (state NO if never used), and would rate my familiarity on the following scale as:
Not Familiar: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Very Familiar
9. I would rate my familiarity with the use of the "mouse" input device used with computer systems as:
Not Familiar: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Very Familiar
10. I would rate my familiarity with expert systems in general as:
Not Familiar: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Very Familiar
11. I would rate my familiarity with business computing in general as:
Not Familiar: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Very Familiar
12. I feel I need an experienced person nearby when I use the computer.
Strongly Disagree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Agree
13. I can make the computer do what I want it to do.
Strongly Disagree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Agree
14. I need someone to tell me the best way to use the computer.
Strongly Disagree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Agree
15. I feel confident about using the computer to store important information.
Strongly Disagree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Agree
16. If I had a problem using the computer, I could solve it one way or another.
Strongly Disagree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Agree
17.
When something goes wrong with the computer, I feel there would be little I could do about it.
Strongly Disagree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Agree

FINANCIAL ANALYSIS EXPERT SYSTEMS STUDY
CONSENT FORM

This form is to be completed after you have read and understood the contents of the Information Sheet that is attached. If you wish to have a signed copy of this form for your own record, we will generate one for you.

Agreement to Participate: I have read and understood the contents of the Information Sheet provided and have decided to participate in the study.

Agreement to Confidentiality: I understand that some of my classmates and friends may also be participating in this study. I realize that my discussion of the details of this study with them may distort the results. Therefore, I agree not to discuss with any other participant any aspect of the study prior to their participation.

Signature of Participant    Date

Participant Name:

BACKGROUND INFORMATION QUESTIONNAIRE (For Students)

1. Name (please print):
2. Contact Address:
3. Educational and Professional Qualifications (College, University, etc.): (please specify the number of years completed if in-progress)
Education:
Professional Affiliations (e.g., CFA, ACIB, CA, CGA, CMA, etc.):
4. I have taken (or I am currently taking) the following courses that relate to financial statement analysis:
5. I have held the following full-time, part-time, or summer jobs that involved computing or analyzing financial ratios (briefly describe the circumstances and the length of the period of each job, state NONE otherwise):
6. I use (or have used) financial modelling and analysis software (e.g., Lotus 1-2-3, FISCAL, FSAP, etc.) (state NONE otherwise):
7.
I use (or have used) the following expert systems (briefly describe the system and the circumstances of the use, state NONE otherwise):
8. I use (or have used) Microsoft Windows or Macintosh-based applications (state NO if never used), and would rate my familiarity on the following scale as:
Not Familiar: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Very Familiar
9. I would rate my familiarity with the use of the "mouse" input device used with computer systems as:
Not Familiar: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Very Familiar
10. I would rate my familiarity with expert systems in general as:
Not Familiar: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Very Familiar
11. I would rate my familiarity with business computing in general as:
Not Familiar: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Very Familiar
12. I feel I need an experienced person nearby when I use the computer.
Strongly Disagree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Agree
13. I can make the computer do what I want it to do.
Strongly Disagree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Agree
14. I need someone to tell me the best way to use the computer.
Strongly Disagree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Agree
15. I feel confident about using the computer to store important information.
Strongly Disagree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Agree
16. If I had a problem using the computer, I could solve it one way or another.
Strongly Disagree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Agree
17. When something goes wrong with the computer, I feel there would be little I could do about it.
Strongly Disagree: 1 - 2 - 3 - 4 - 5 - 6 - 7 :Strongly Agree

APPENDIX E. DATA SCREENING PRIOR TO ANALYSIS

Table E1. Deep Explanation Use (Pre-Transformation)
[Histogram omitted.] Mean 9.472 (std err 1.361), median 7.000, mode .000, std dev 9.908, variance 98.177, skewness 1.975, kurtosis 4.719 (S.E. kurt .644).

Table E2. Square Root of Deep Explanation Use (Post-Transformation)
[Histogram omitted.] Mean 2.643 (std err .219), median 2.646, mode .000, std dev 1.592, variance 2.533, skewness .341, kurtosis .259 (S.E. kurt .644).

Table E3. Reasoning-Trace Explanation Use (Pre-Transformation)
[Histogram omitted.] Mean 16.604 (std err 1.698), median 15.000, mode 2.000, std dev 12.362, variance 152.821, skewness .575, kurtosis -.570 (S.E. kurt .644).

Table E4. Square Root of Reasoning-Trace Explanation Use (Post-Transformation)
[Histogram omitted.] Mean 3.729 (std err .228), median 3.873, mode 1.414, std dev 1.659, variance 2.752, skewness -.130, kurtosis -.853 (S.E. kurt .644).

Table E5. Tests of Homogeneity of Variance

Variable                                               Cochran's C     Bartlett-Box F
                                                       (Probability)   (Probability)
Deep Explanations (pre-transformation)                 .487 (.030)     10.591 (.000)
Deep Explanations (post-transformation)                .356 (.477)     1.493 (.215)
Reasoning-Trace Explanations (pre-transformation)      .280 (1.00)     .151 (.929)
Reasoning-Trace Explanations (post-transformation)     .314 (.889)     .169 (.918)

Note: probabilities for Cochran's C are approximate. Very small probabilities raise concern over the homogeneity-of-variance assumption, causing rejection of the homogeneity-of-variance hypothesis.

Table E6. Frequency Table of Improvement in Decision Accuracy Scores
[Histogram omitted.] Mean .925 (std err .583), median 1.000, mode .000, std dev 4.242, variance 17.994, skewness -.554 (S.E. skew .327), kurtosis .064 (S.E. kurt .644), range 20.000, minimum -11.000, maximum 9.000, sum 49.000.

Table E7. Construct Measures (PLS)

Measure                                                             Loading   Weight
Usefulness of Explanations:
Q8. My understanding of financial analysis has been enhanced by       .67
    the use of the explanations provided by FINALYZER.
Q12. Using the explanations provided by FINALYZER enhanced my         .51
    effectiveness in completing the financial analysis task.
Q16. The explanations provided by FINALYZER had a significant         .86
    impact on my judgements.
Q22. Using the explanations provided by FINALYZER made the            .83
    financial analysis task easier to perform.
Q33. Overall, I found the explanations provided by FINALYZER          .70
    useful in analyzing the financial statements.
Trust in KBS:
Q3. FINALYZER provided good advice across different situations.       .82
Q6. FINALYZER is dependable in important decisions.                   .81
Q9. When FINALYZER gives me unexpected advice, I am confident         .82
    that the advice is correct.
Q13. FINALYZER is a reliable source of knowledge for financial        .83
    analysis.
Q18. I think users who have little expertise would trust the          .55
    advice given by FINALYZER.
Q23. FINALYZER gave the same advice for the same situation over       .57
    time.
Q30. FINALYZER behaved in a very consistent manner.                   .58
Q31. FINALYZER helped me make good decisions.                         .54
Improvement in Decision Accuracy:
E3. Based on your analysis and under current economic and             .57       .26
    interest-rate conditions, rate Canacom's asset utilization
    performance.
E4. ..., rate the value of Canacom stock as loan collateral.          .62      -.51
E5. ..., rate the quality of Canacom's financial management.          .28       .72
E6. ..., rate the quality of Canacom's operating management.         -.60       .38
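The effect of the square-root transformation summarized in Tables E1 through E4, pulling in the long right tail of a positively skewed count variable, can be illustrated with a short sketch. This is a generic illustration on synthetic data, not the study's data, and the use of `scipy.stats.skew` is an assumption (the original screening was done in SPSSx):

```python
import numpy as np
from scipy.stats import skew

# Synthetic right-skewed "usage count" variable: squares of 1..100.
raw = np.arange(1, 101, dtype=float) ** 2

# The square-root transform recovers the symmetric 1..100 sequence,
# so the positive skew of `raw` should shrink toward zero.
transformed = np.sqrt(raw)

print(round(float(skew(raw)), 3), round(float(skew(transformed)), 3))
```

As with Tables E1 vs. E2 and E3 vs. E4, the transformed variable is far less skewed, which is also what allowed the homogeneity-of-variance tests in Table E5 to pass after transformation.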
Hypertext (abstract) refers to deep explanations requested via the links .72 of hypertext originated from index screens ONLY, also prior to analysis. Hypertext (contextualized) refers to deep explanations requested via the .77 links of hypertext in the context of analysis. The Use of Reasoning-Trace Explanations: A Why explanation rationalizes why a particular conclusion that .77 has been reached is important for the task. A How explanation reveals how a particular conclusion has been .93 reached by presenting a trace of the evaluations. Table E7. Construct Measures (Continued) 256 APPENDIX F. VALIDITY AND RELIABILITY OF PERCEPTION MEASURES This appendix documents the reliability and validity assessment of the four user perception measures discussed in Chapter 5. The post-experimental questionnaire consisted of these four multi-item scales, measuring perceived usefulness of explanations provided by KBS, usefulness of KBS, trust in KBS, and ease of use of KBS. The first three were related to the research model, whereas the ease of use scale was included in the questionnaire to test if the two versions of KBS had different levels of ease of use, as a potential covariate. Reliability. The reliability of the four measures was assessed using the RELIABILITY module of SPSS" (1985). The overall reliability scores are shown in Table Fl. All of the four scales had Cronbach's alpha above or close to .80, an acceptable level of reliability (Nunnally, 1967). Individual items within each scale were further examined in terms of the effects of each item on Cronbach's alpha and overall scale variance if the item was removed, as well as item-to-total correlation. Details are included in Tables F2 to F5. Scale Cronbach's alpha Usefulness of Explanations .87 Usefulness of KBS .87 Trust in KBS .85 Ease of Use .76 Table Fl. 
Overall Reliability of the Four Scales Among the nine items of the Usefulness of Explanations scale, Q5 and Q8 had the lowest item-to-total correlation, much lower than the rest of the items in the scale. The deletion of these two items would increase scale variance and slightly increase Cronbach's alpha (Table F2). Similarly, among the ten items of the Usefulness of KBS scale, Q15 and Q28 had the lowest item-to-total correlation, the deletion of these two items would increase scale variance and slightly increase Cronbach's alpha (Table F3). 257 Q5 Q8 Q12 Q16 Q20 Q22 Q24 Q29 Q33 Scale Mean If Item Deleted 18.31 17.98 18.13 17.69 17.85 18.27 17.85 18.10 18.29 Table F2. Scale Variance If Item Deleted 39.20 37.43 36.51 35.47 32.96 35.73 34.33 36.44 37.50 Corrected Item-Total Corr. .46 .40 .75 .62 .60 .76 .73 .67 .58 Squared Multiple Corr. .44 .43 .70 .60 .62 .74 .58 .76 .55 Alpha If Item Deleted .87 .88 .84 .85 .86 .84 .84 .85 .86 Item-Total Statistics of Usefulness of Explanations Ql Q2 QIO Ql l Q14 Q15 Q17 Q19 Q21 Q28 Scale Mean If Item Deleted 20.17 20.56 20.85 20.42 20.54 20.38 20.50 20.73 20.83 20.81 Scale Variance If Item Deleted 49.00 47.43 50.41 45.46 52.02 51.30 46.37 48.95 52.73 51.10 Corrected Item-Total Corr. .56 .76 .65 .64 .61 .43 .74 .63 .60 .44 Squared Multiple Corr. .53 .65 .59 .80 .61 .44 .84 .64 .53 .29 Table F3. Item-Total Statistics of Usefulness of KBS Alpha If Item Deleted .86 .85 .86 .86 .86 .87 .85 .86 .86 .87 Among the ten items for the trust in KBS scale, Q26 and Q34 had the lowest item-to-total correlation, as shown in Table F4, the deletion of these two items from this scale would increase the scale variance and slightly increase Cronbach's alpha. Among the five items of the Ease of Use scale, Q32 had the lowest item-to-total correlation, as shown in Table F5, the deletion of this item from this scale would also increase scale variance. 
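The Cronbach's alpha and "alpha if item deleted" figures in Tables F1 through F5 were produced by the SPSSx RELIABILITY module; the underlying computation is simple enough to sketch directly. This is a minimal sketch using the standard alpha formula, k/(k-1) x (1 - sum of item variances / variance of scale totals); the function names are illustrative, not from the study:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_subjects x k_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # per-item variances
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the scale totals
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def alpha_if_deleted(items: np.ndarray) -> list:
    """Alpha of each reduced scale with one item removed (cf. Tables F2-F5)."""
    k = items.shape[1]
    return [cronbach_alpha(np.delete(items, j, axis=1)) for j in range(k)]
```

An item whose "alpha if deleted" exceeds the overall alpha is a candidate for removal, which is the criterion applied above to Q5, Q8, Q15, Q28, Q32, Q26, and Q34.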
Table F4. Item-Total Statistics of Trust in KBS

Item   Scale Mean    Scale Variance   Corrected          Squared          Alpha
       if Deleted    if Deleted       Item-Total Corr.   Multiple Corr.   if Deleted
Q3     23.40         45.97            .67                .56              .83
Q6     22.65         43.56            .77                .65              .82
Q9     22.19         44.35            .60                .59              .83
Q13    23.19         45.65            .70                .61              .82
Q18    23.21         45.86            .60                .38              .84
Q23    22.79         45.35            .62                .55              .83
Q26    23.00         52.35            .16                .14              .87
Q30    23.52         46.25            .60                .69              .83
Q31    23.15         47.86            .67                .58              .83
Q34    23.77         52.30            .35                .31              .85

Table F5. Item-Total Statistics of Ease of Use

Item   Scale Mean    Scale Variance   Corrected          Squared          Alpha
       if Deleted    if Deleted       Item-Total Corr.   Multiple Corr.   if Deleted
Q4     7.25          7.45             .48                .39              .74
Q7     7.10          6.87             .55                .36              .71
Q25    7.60          7.85             .68                .63              .67
Q27    7.88          8.93             .65                .60              .70
Q32    7.40          9.03             .41                .36              .75

Altogether, seven items with substantially low item-to-total correlations were eliminated from their respective scales through the reliability analysis, to increase scale variance and overall reliability: Q5, Q8, Q15, Q28, Q32, Q26, and Q34.

Validity. The FACTOR module of SPSSx (1985) was used to examine the extent to which the individual items of each scale would converge and load on the underlying factor. Because factor analysis is not sensitive to distribution problems, raw scores of the perception measures were used in
Subjects might have focused on the various aspects of usefulness and paid little attention to the subject word (i.e., what was really useful). Given the high level of overlap between the two scales, and the research focus being explanations, the Usefulness of KBS scale was dropped. The Usefulness of Explanations scale was kept as the single measure of perceived usefulness.

A confirmatory factor analysis (principal components) was conducted on the remaining 19 items, pre-specifying the number of factors as three, because the three scales were all adopted from previously validated measures. The three factors collectively accounted for 61.4% of the total variance in the 19 items (Table F6). Each factor had an eigenvalue well above the usual threshold level of one.

Table F6. Eigenvalues of the Three Factors

Factor   Eigenvalue   % of Variance   Cum. Variance (%)
1        7.26         38.2            38.2
2        2.62         13.8            52.0
3        1.78          9.4            61.4

Varimax rotation produced the rotated factor matrix; only loadings exceeding .4 are displayed in Table F7. All of the remaining seven items of the Usefulness of Explanations scale loaded on Factor 1. The eight items of the Trust scale loaded on the Trust factor (Factor 2), including the seven items from Lerch et al.'s (1993) original trust scale, in addition to Q13. All four items of the Ease of Use scale loaded on a single factor and were kept in the scale. The fact that all 19 items in Table F7 loaded on their corresponding underlying factors indicated that all three scales had a satisfactory level of convergent and discriminant validity.

Table F7. Rotated Factor Matrix (loadings below .40 suppressed)

Factor 1 (Usefulness of Explanations): Q12 (.88), Q29 (.82), Q22 (.78), Q24 (.77), Q20 (.73), Q33 (.64), Q16 (.62), plus one cross-loading (.46)
Factor 2 (Trust): Q3 (.79), Q6 (.75), Q13 (.75), Q9 (.71), Q30 (.70), Q23 (.64), Q31 (.60), Q18 (.54), plus one cross-loading (.48)
Factor 3 (Ease of Use): Q25 (.90), Q27 (.85), Q4 (.57), Q7 (.49), plus one cross-loading (.44)

APPENDIX G.
STATISTICAL POWER ANALYSIS

The power of a statistical test is the probability that it will yield a statistically significant result, i.e., a rejection of the null hypothesis (Cohen, 1988, p. 4). The notion of effect size is of central importance in statistical power analysis. Cohen (1988) defined effect size as "the degree to which the phenomenon is present in the population," or "the degree to which the null hypothesis is false" (p. 10). When used in calculating statistical power, the effect size should be pre-specified, based on theory or prior empirical evidence. Given external constraints such as the number of subjects available, the sample size of 13 in each treatment condition was determined so as to detect a "large" treatment effect size with sufficient statistical power (above .80) at a significance level of .05, as discussed in Chapter 5. For small and medium effect sizes, the power was known to be insufficient.

Cohen (1988) introduced detailed procedures of power analysis for the common statistical analyses utilized in this research, such as ANOVA, chi-square tests, and multiple regression. He also provided operational definitions of "small," "medium," and "large" effect sizes for those analyses. His procedures were followed to estimate the power of the three types of statistical analyses.

ANOVA. ANOVA tests were used in Section 6.2 for examining the effects of explanation provision method and expertise on explanation use. The effect size index f is the standard deviation of the standardized population means, indicating the degree of departure from the no-effect case. In this case, with two population means to be compared,

    f = d / 2, where d = (m_max - m_min) / s

m_max = the larger of the two means, m_min = the smaller of the two means, and s = the (common) standard deviation within the population.
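As a worked example of the formula above (a minimal sketch with round illustrative numbers, not the study's data):

```python
# Cohen's effect size index f for the two-group case: f = d / 2,
# where d = (m_max - m_min) / s is the standardized mean difference.
def cohens_f(m1, m2, s):
    d = abs(m1 - m2) / s   # standardized difference between the two means
    return d / 2           # f = d/2 when exactly two population means are compared

# Two group means of 10 and 18 with a common SD of 10 give d = 0.8,
# i.e. f = 0.4, Cohen's "large" effect size for ANOVA.
f = cohens_f(10, 18, 10)
```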
Given the .05 significance level used throughout this research, and the number of observations (n = 12 to 15) in each treatment condition, the power of the analysis (p) is:

    small f (.10):   p = .10
    medium f (.25):  p = .43
    large f (.40):   p = .82

While large effect sizes were expected, ANOVA in this research had less than a 50% chance of detecting a medium or small effect size.

Chi-Square Tests. Chi-square tests were used for assessing user preference for the context of explanation use (Section 6.2) and for explanation types (Section 6.4). Because the data were in the form of frequencies of various types of explanations, or of explanations requested in various contexts, the total number of observations was large, almost always greater than 200. The effect size index w measures the discrepancy between the distribution specified by the alternative hypothesis and that representing the null hypothesis. Because large or medium effect sizes were expected, the chi-square tests used in this research generally had sufficient power, as indicated by the following. Given the .05 significance level, and the number of observations (N = 200 or more), the power of the chi-square tests (p) is:

    For 2 x 3 tables:   small w (.10): p = .23   medium w (.30): p = .72   large w (.50): p = .97
    For 2 x 2 tables:   small w (.10): p = .29   medium w (.30): p = .99   large w (.50): p > .995

APPENDIX H. MATERIALS RELATED TO VERBAL PROTOCOL ANALYSIS

CONTENTS:
H1. Instructions to Verbalize
H2. Instructions to the Research Assistant
H3. Coding Scheme for Verbal Protocols and Work Sheets
H4. Instructions to the Coder

H1. INSTRUCTIONS TO VERBALIZE

INSTRUCTIONS FOR THINK-ALOUD

In this study, we are interested in your running commentary on what you are attempting to do and what is going through your mind while you interact with the expert system and work on the financial analysis case. Therefore, we will ask you to THINK ALOUD CONSTANTLY.
What we mean by think-aloud is that we want you to reason in a loud voice: SAY OUT LOUD everything that passes through your mind at each step as you interact with the expert system and work on the financial analysis case. It does not matter if your sentences are not complete, since you are not explaining to anyone else. Just act as if you are alone in the room speaking to yourself loudly. It is most important that you keep talking. If you are silent for more than 10 seconds I will remind you to keep talking aloud.

EXERCISE I
Before turning to the real task, we will start with a couple of practice problems. Please talk aloud while you work on these problems. First, please add two numbers in your head, and say aloud each step. Now TALK ALOUD while you calculate 476 + 688 = ?

EXERCISE II
Please multiply two numbers in your head, and say aloud each step. Now TALK ALOUD while you calculate 24 X 36 = ?

H2. INSTRUCTIONS TO THE RESEARCH ASSISTANT

The following are some suggested verbal instructions for asking the subjects to verbalize:

"We hope you don't mind speaking out your thoughts when using the expert system, and allowing us to tape-record the process. The purpose is to follow closely how the expert system affects your thinking."

It might be useful to relate think-aloud to people's past experiences. For instance: "When you read something alone, do you realize you sometimes read it out loud?" "Sometimes, when you think about something, you actually mumble to yourself." "We'd like you to do that now while using the expert system. The only difference is to keep your voice loud enough for the tape recorder."

Don't use buzzwords like "verbalize" or "verbal protocol," which may confuse the subjects.

"Here are the instructions." "Please keep your voice loud enough for the tape recorder to pick it up." "Try to behave naturally, as if talking to yourself; there is no need to explain or describe anything to anyone else."
"Don't be afraid to speak out, I'm not an expert in this domain."

When the subjects are working on the warm-up exercise, encourage them to speak out the whole process as completely as possible. "Try to speak out everything that comes to mind at each step of the process." Only when necessary, try the second arithmetic problem. It is OK if this takes a little while; the important thing is to train them to keep talking. Then, restart the CREDIT-ADVISOR tutorial. The subject will start thinking aloud by going through one sub-analysis. If necessary, try another one, until he/she is comfortable with doing it. The research assistant should remind the subject to keep talking if there is silence for a significant length of time (>15 seconds), with "keep talking" or "what are you thinking?" Don't start the real task until the subject becomes comfortable with verbalizing. All prompts during and after the training should come from the written instructions to the subjects and this sheet, to maintain consistency and the target level of verbalization.

H3. CODING SCHEME FOR VERBAL PROTOCOLS AND WORK SHEETS

1. Unit of Coding

The basic unit (or episode) of verbal protocol analysis corresponds to screens of expert system display, e.g., conclusions, explanations, data tables, and ratio screens. In other words, verbal protocols are first broken down consistently, and then analyzed, in units of screens of expert system display. Verbal protocols generated by a subject when working on conclusion screens, where usually most of the verbal data were triggered, are coded as Cx.y. Here, x is the corresponding analysis id and y indicates the id of the conclusion in analysis x. For example, C6.1 represents the first conclusion in the Liquidity Analysis (No. 6).
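The labelling scheme above is regular enough to be machine-checked. The following is a hypothetical helper (names and field labels are our own, not from the thesis) that splits a unit label such as "C6.1", or an explanation label such as "DE-how-6.1", into its parts:

```python
# Hypothetical parser for protocol unit labels such as "C6.1", "DE-how-6.1",
# or "RTE-why-10.4". The field names below are our own invention.
import re

LABEL = re.compile(r"(?:(DE|RTE)-(how|why)-)?C?(\d+)\.(\d+)")

def parse_label(label):
    m = LABEL.fullmatch(label)
    if m is None:
        raise ValueError(f"unrecognized unit label: {label!r}")
    kind, question, analysis, item = m.groups()
    return {
        "kind": kind or "C",        # C = conclusion screen; DE/RTE = explanation
        "question": question,       # "how" or "why" for explanation screens
        "analysis": int(analysis),  # e.g. 6 = Liquidity Analysis
        "item": int(item),          # ratio or conclusion number within it
    }
```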
Verbal protocols generated by a subject when working on deep explanation screens are coded in the same way as the explanations are named, e.g., DE-how-6.1 (meaning how the first ratio in the Liquidity Analysis, i.e., the current ratio, is defined). Similarly, RTE-how-6.1 refers to the reasoning-trace How explanation for the first conclusion in the Liquidity Analysis (No. 6). Verbal protocols generated by a subject when working on the judgment questionnaire are coded as Q1 to Q8, depending on which question is under consideration. If it is not possible to determine the exact question number, it is coded as Q.

2. Coding Scheme

Once the verbal protocols have been broken down into labelled basic units (episodes) corresponding to screens of display, each unit is analyzed to identify useful information. This section highlights what types of information will be extracted from the verbal protocols, with the focus on the nature (reasons) of explanation use. The following detailed coding scheme, with examples, is designed to identify the specific reasons for explanation use, and the frequency of explanation use for each reason, in the various experimental conditions.

2.1 Deep Explanation Use:

(1) Learning about unknown or unfamiliar domain concepts (with absolutely no prior knowledge)
(3) Refreshing memory. The domain concept is something familiar to the subject, but it is partially or completely forgotten. Examples. "I forget what it means" (before view explanations), "I should know that" (after viewing). (4) Confirming or comparing with one's own knowledge. The subject has a definite idea about the domain concept, but requests an explanation for confirmation or comparison. Examples. "Ok, that's the way I would do it." "That makes sense to me." (5) Others (to be specify by the coder) For example. Deep explanations could be sought when the subject perceives something is important. "I want to see how they do trend evaluation" (It is hard to determine the level of prior knowledge, in this case.) Browsing is probably the most likely case. Curiosity may be another. (6) Inconclusive In some cases, there is not enough information to determine to which of the above categories the use of an explanation belongs. The classification of the use of deep explanations is mainly based on a subject's prior familiarity with the domain concepts (from no prior knowledge at all to nearly complete prior knowledge.) 2.2 Reasoning-Trace Explanation Use: (1) Understanding the significance (importance) or relevance of a particular recommendation 268 (when something not thought of or unexpected is given) Examples. "Why is it important?" "Why do you say that?" "Why?" (2) Understanding the process of analysis of a particular recommendation. Wen one can more or less accept the conclusion, but has no idea about the reasoning process and would seek justification. Examples. "How did they do that?" "How?" (3) Confirming or comparing with one's own judgement. The subject has some idea more or less consistent with the system's recommendation, which is critical to decision making. Examples. "Makes sense." "That is what I expected." "I have to agree, let's see Why and How." "Yes, I noticed." "Yes, that's what I thought." 
(4) Seeking missing (additional) details, which are specific, and might be useful to back up or justify a recommendation. Examples. "Where do they get R&D numbers?" "Let's take a look at problems of credit sales letting the account receivable build-up... (read explanation) poor collection practices." (5) Surprise, causing the subject to seek more information. The subject might be expecting something different, or to a different degree. Examples. "I have no idea what they are talking about, let's see Why...., Let's see How." "Where did you get that? That's bizarre. How?" "What? This doesn't seem to be consistent with..." "How the devil can you know that?" (6) Disagreement, causing the subject to seek more information. The subject is expecting something complete contrary to what is given. Examples. "I don't agree with that at all, ..., How did it do that?" "I don't know. I don't agree with that one." "We have to disagree here. Let's take a look, why and how." (7) Others (to be specified by the coder) Examples. Curiosity. Again, browsing may be the most likely case. (8) Inconclusive In some cases, there is not enough information to determine to which of the above categories the use of an explanation belongs. 269 The classification of the use of reasoning-trace explanations is mainly based on the extent to which the subjects already have expectation, prior understanding, or opinion (they all have studied the case manually first). 3. Reasons for Not Using Deep Explanations (1) Familiarity with the task domain. Examples. "I don't need these ones (ratio explanations)." "I know all these." "These are just generic background, right?" (2) Others (to be specified by the coder) 4. Reasons for Not Using Reasoning-Trace Explanations: (1) Trust Examples. "I don't need Whys and Hows on that one, I trust that one (a conclusion)." (3) Distrust Examples. "This is a very difficult conclusion to reach." "There could be a lot of other reasons and possibilities." 
"This is an very iffy one." (2) Strong Agreement Examples. "Strongly agree" is usually not associated with any explanation use. (4) Strong Disagreement (kind of difficult to separate from Strong distrust.) (Relatively rare. It appears, usually disagreement would motivate one to seek explanations.) Examples. "Disagree." (5) Low Perceived Relevancy Examples. "I don't think it's relevant, here." 270 Subject No. The Use of Deep Explanations (Screen Id (Explanation Id), e.g., DE-how-4.1) 1. Learning a new concept: 2. Seek additional information: 3. Refreshing memory: 4. Confirming/comparing: 5. Others (specify): 6. Inconclusive (specify the most likely class): Total No. of Explanations (coded): Total No. of Explanations used: 271 Subject No. The Use of Reasoning-Trace Explanations (Explanation id, e.g., RTE-how-6.3) 1. Understanding significance: 2. Understanding process: 3. Confirming/comparing: 4. Seeking missing details: 5. Surprised (expecting something different): 6. Disagreement (expecting something opposite): 7. Others (specify, e.g., seeking additional justification): 8. Inconclusive (specify the most likely class): Total No. of Explanations (coded): Total No. of Explanations used: 272 Subject No. CODING SCHEME NOT USING EXPLANATIONS (Screen Id, e.g., C5.1) Deep Explanations 1. Familiarity: 2. Others (specify): Reasoning-Trace Explanations 1. Agree: 2. Strongly agree: 3. Disagreement: 4. Strongly disagree: 5. Low relevance: 6. Others (specify): ANECDOTAL INFORMATION 273 H4. INSTRUCTIONS TO THE CODER ADDITIONAL NOTES FOR CODING THE VERBAL PROTOCOLS The following is the instructions given to the subjects before they started using the expert system. It was provided to the subjects to help them understand what types of explanation should be sought when necessary. This may also be useful for analysing the protocols. 
The expert system you will be using will provide you with the following information: 1) the ratios and other inputs that it will use for each analysis, 2) the specific conclusions arising from each analysis that it has performed, and 3) explanations relating to both (1) and (2). For each input used or conclusion presented, three different types of explanations will be provided. These are the WHY, HOW, and STRATEGIC explanations:

WHY explanations either justify why a particular input or ratio is needed for an analysis, or rationalize why a particular conclusion that has been reached is important for the task.

HOW explanations either detail how a particular input or ratio is defined and computed, or reveal how a particular conclusion has been reached by presenting a trace of the evaluations.

STRATEGIC explanations provide either the overall structure in which all the relevant input information is organized, or the overall problem-solving strategy into which a particular conclusion fits.

1. Some Additional Guidelines

There are three general principles that we have agreed upon:

First, in determining the reason for explanation use, we give more weight to comments made BEFORE explanations were sought. For example, we want to know whether the subject already had some idea, by paying attention to whether the subject seemed to agree or to be surprised, and to verbal clues as small as "Ok" or "right." However, comments made during or subsequent to viewing the explanations can be used complementarily, to determine how much the subject knew about the ratio or conclusion. We want to infer from all the information whether there was some prior knowledge or a prior opinion. In short, we look for the intention behind requesting explanations, not conclusions drawn as a result of viewing explanations.

Second, and related to the first principle, it is always the level of prior knowledge of ratios, or prior opinion on conclusions, that plays the most important role in determining the coding.
Thus, we should first look for signs of prior knowledge or opinion. For example, in the case of reasoning-trace explanations, only if we know that the subject had some kind of prior judgment, understanding, or opinion can we determine whether the explanation use was a result of surprise, disagreement, seeking missing details, or confirmation. Otherwise, the subject was more likely seeking help in understanding the conclusions.

Third, in case there is not much of a clue to support a classification, we should put the explanation Id into the "INCONCLUSIVE" or "OTHERS" classes, rather than reading too much into the case. It is helpful if the most likely possibility is noted.

2. Additional Specific Guidelines for Reasoning-Trace Explanations

First, as subjects were told to use How explanations if they were concerned about the reasoning process, and Why explanations to find out about the significance of the conclusions (i.e., why they were important for decision making), it is usually safe to classify the use of RTE-how explanations as "understanding the reasoning process," and RTE-why as "understanding the significance" of a particular conclusion. These assumptions are valid if: (1) there is evidence that the subject did not have a firm understanding of or opinion regarding the conclusion; and (2) there is evidence that the subject clearly understood the difference between Why and How explanations. Sometimes the verbal clue can be as short as "Why?" (for significance) or "How?" "How come?" (for the reasoning process). However, people were sometimes confused between Why and How: when they requested RTE-why explanations they were actually concerned about the reasoning process (justification). Therefore, each case has to be analyzed based on the situation. When the above two assumptions are valid, a verbal comment of "Why?" indicates the subject probably knew less than a comment of "How?"
In the latter case, the subject at least knew the significance, and was probably trying to understand some details of the reasoning process.

Second, if the subject read both the RTE-how and the RTE-why explanation, it usually indicates the subject either had no prior opinion, or very strongly disagreed with that particular conclusion. This implies the subject was probably having some difficulty accepting the conclusion.

Third, if the subject said "agree" or "disagree" after reading an explanation, it should be treated differently from saying "agree" or "disagree" or showing surprise before reading the explanation. In the former case, we cannot assume a prior opinion on this basis alone and go ahead and classify the reason for using the explanation as confirming/comparing or disagreement; the subject might have said "agree" or "disagree" only because the system forced a choice between agreement and disagreement. In the latter case, however, it is a clear sign of prior understanding or opinion.

Fourth, if the subject used explanations without giving any verbal comments, put the explanation Id into the "INCONCLUSIVE" class for the time being. Usually, we can take it as simply looking for more justification for the conclusion.

3. Additional Specific Guidelines for Deep Explanations

First, if the use of an explanation could equally well be classified as either "CONFIRMING/COMPARING" or "SEEKING ADDITIONAL INFORMATION," put the explanation Id into the former, as most of the subjects knew the material and had the theoretical background. (Note this rule applies only to deep explanations.) However, the system is constructed such that it includes a number of ratios which are seldom used in practice, and which were new to most of the subjects. The subjects might use the explanations to learn about concepts new to them as well, e.g., as indicated by "How is that defined?" or "Never heard of it."
Second, it is sometimes difficult for a person who is not familiar with the domain to determine whether the subject was "SEEKING ADDITIONAL INFORMATION" (Class 2) or "LEARNING A NEW CONCEPT" (Class 1). In fact, all the ratios should be commonly known to the subjects, except the few listed here: funds adequacy ratio (DE-how-10.5, DE-why-10.5), funds reinvestment ratio (DE-how-10.4, DE-why-10.4), liquidity index (DE-how-6.11, DE-why-6.11), and possibly internal growth rate (DE-how-9.6, DE-why-9.6). For ratios other than these, even if a subject did not seem to have prior knowledge, he or she was most likely seeking additional information rather than learning about a new concept.

Third, another usually good indicator of a subject's attempt to learn about a new concept, as opposed to just browsing or seeking additional information, is that the subject might read both the DE-how and the DE-why explanation, to get as much information as possible; otherwise, he or she might read just one of the two. A similar indicator is whether the subject paraphrased, rephrased, and interpreted the explanation at length, versus just simply reading it, perhaps even silently (as reflected in the length of the corresponding protocol).

Fourth, when reading deep explanations, a verbal comment of "Why?" indicates the subject probably knew less than a comment of "How?" In the latter case, the subject at least knew of the existence of the ratio (and probably its function as well), and was trying to understand some details or refresh memory.

Fifth, it is possible that at the very beginning of system use (protocol), a subject might browse through a few explanations to test out how the system works, so there would be very little verbal protocol. Explanation use of this kind may be put into the "OTHERS" class with a note of "BROWSING."

Sixth, in case there is no verbal clue at all (not even a "Why?" or "How?"), the use of the explanation could also be put into the "INCONCLUSIVE" class, with a note of "BROWSING." This may emerge as a major class of reasons for explanation use later.

APPENDIX I. NATURE OF EXPLANATION USE (Refined Categorization)

Type of DE Use                              Novices        Experts        Total
Learning about new or unfamiliar concepts    20 (16.4%)     12 (17.4%)     32 (16.8%)
Seeking additional information               58 (47.5%)      9 (13.0%)     67 (35.1%)
Refreshing memory                            11 ( 9.0%)     10 (14.5%)     21 (11.0%)
Confirming / comparing                       27 (22.1%)     26 (37.7%)     53 (27.7%)
Others                                        1 ( 0.8%)      7 (10.1%)      8 ( 4.2%)
Inconclusive                                  5 ( 4.1%)      5 ( 7.2%)     10 ( 5.2%)
Total                                       122 (63.9%)     69 (36.1%)    191 (100.0%)

Statistics: Pearson's chi-square = 30.01, DF = 5, significance = .00
Note: Two out of the 12 cells (16.7%) have expected frequency less than 5.

Table I1. Effects of Expertise on the Use of DE

Table I1 illustrates patterns of DE use by novices and experts. All four major categories of explanation use were substantiated by the data: each accounted for at least 10% of the total use of DE. The smallest of the four (Refreshing Memory) was still greater than Others and Inconclusive together.

Type of DE Use                              Lineartext     Hypertext      Total
Learning about new or unfamiliar concepts     7 (14.9%)     25 (17.4%)     32 (16.8%)
Seeking additional information               20 (42.6%)     47 (32.6%)     67 (35.1%)
Refreshing memory                             6 (12.8%)     15 (10.4%)     21 (11.0%)
Confirming / comparing                       13 (27.7%)     40 (27.8%)     53 (27.7%)
Others                                        1 ( 2.1%)      7 ( 4.9%)      8 ( 4.2%)
Inconclusive                                  0 ( 0.0%)     10 ( 6.9%)     10 ( 5.2%)
Total                                        47 (24.6%)    144 (75.4%)    191 (100.0%)

Statistics: Pearson's chi-square = 5.20, DF = 5, significance = .39
Note: Two out of the 12 cells (16.7%) have expected frequency less than 5.

Table I2.
Effects of Explanation Provision Methods on the Use of DE

Table I2 illustrates patterns of DE use by hypertext and lineartext users.

Type of RTE Use                  Novices        Experts        Total
Understanding significance        32 (23.9%)     21 (18.4%)     53 (21.4%)
Understanding process             52 (38.8%)     31 (27.2%)     83 (33.5%)
Confirming / comparing            25 (18.7%)     35 (30.7%)     60 (24.2%)
Seeking missing details            1 ( 0.7%)      3 ( 2.6%)      4 ( 1.6%)
Surprise                           5 ( 3.7%)      3 ( 2.6%)      8 ( 3.2%)
Disagree                           6 ( 4.5%)     10 ( 8.8%)     16 ( 6.5%)
Others                             4 ( 3.0%)      2 ( 1.8%)      6 ( 2.4%)
Inconclusive                       9 ( 6.7%)      9 ( 7.9%)     18 ( 7.3%)
Total                            134 (54.0%)    114 (46.0%)    248 (100.0%)

Statistics: Pearson's chi-square = 10.89, DF = 7, significance = .14
Note: Six out of the 16 cells (37.5%) have expected frequency less than 5.

Table I3. Effects of Expertise on the Use of RTE

Table I3 illustrates patterns of RTE use by novices and experts. Only three of the eight initial categories emerged as major categories of RTE use. Four of them could be combined with others; none of the four accounted for more than 8% of total RTE use.

Type of RTE Use                  Lineartext     Hypertext      Total
Understanding significance        21 (17.6%)     32 (24.8%)     53 (21.4%)
Understanding process             42 (35.3%)     50 (38.8%)     92 (37.1%)
Confirming / comparing            31 (26.1%)     29 (22.5%)     60 (24.2%)
Seeking missing details            0 ( 0.0%)      4 ( 3.1%)      4 ( 1.6%)
Surprise                           3 ( 2.5%)      5 ( 3.9%)      8 ( 3.2%)
Disagree                          10 ( 8.4%)      6 ( 4.7%)     16 ( 6.5%)
Others                             3 ( 2.5%)      3 ( 2.3%)      6 ( 2.4%)
Inconclusive                       9 ( 7.6%)      0 ( 0.0%)      9 ( 3.6%)
Total                            119 (48.0%)    129 (52.0%)    248 (100.0%)

Statistics: Pearson's chi-square = 17.17, DF = 7, significance = .02
Note: Eight out of the 16 cells (50%) have expected frequency less than 5.

Table I4. Effects of Explanation Provision Methods on the Use of RTE

Table I4 illustrates patterns of RTE use by hypertext and lineartext users.
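The statistics reported with these tables can be recomputed directly from the raw frequencies. The sketch below (our illustration, not part of the thesis) reproduces the chi-square of 30.01 reported for the expertise-by-DE-use table, and the count of low-expected-frequency cells noted under the expertise-by-RTE-use table:

```python
# Illustrative recomputation of the statistics reported with the frequency
# tables above. The frequencies are copied from the appendix; the code is ours.

def pearson_chi_square(table):
    """Pearson's chi-square and degrees of freedom for a frequency table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    chi2 = sum((obs - rt * ct / n) ** 2 / (rt * ct / n)
               for row, rt in zip(table, row_tot)
               for obs, ct in zip(row, col_tot))
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return chi2, df

def low_expected_cells(table, threshold=5):
    """Count cells whose expected frequency under independence is < threshold."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    return sum(1 for rt in row_tot for ct in col_tot if rt * ct / n < threshold)

# DE use by novices vs. experts (six categories, two groups)
table_de = [[20, 12], [58, 9], [11, 10], [27, 26], [1, 7], [5, 5]]
chi2, df = pearson_chi_square(table_de)   # chi2 close to 30.01, df = 5

# RTE use by novices vs. experts (eight categories, two groups)
table_rte = [[32, 21], [52, 31], [25, 35], [1, 3], [5, 3], [6, 10], [4, 2], [9, 9]]
low = low_expected_cells(table_rte)       # 6 cells with expected frequency < 5
```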
