UBC Theses and Dissertations

Empirical Studies in Information Modeling: Interpretation of the Object Relationship. Siau, Keng Leng (1996)


Empirical Studies in Information Modeling: Interpretation of the Object Relationship

By Keng Leng Siau

B.Sc., National University of Singapore, 1988
B.Sc. (Hons.), National University of Singapore, 1989
M.Sc., National University of Singapore, 1991

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES (Commerce and Business Administration)

We accept this thesis as conforming to the required standard

The University of British Columbia
September 1996
© Keng Leng Siau, 1996

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

The University of British Columbia
Vancouver, Canada

ABSTRACT

Information modeling is the cornerstone of information systems analysis and design. Information models not only provide the abstractions required to facilitate communication between designers and end users, they also provide a formal basis for tools and techniques used in developing and using information systems. This dissertation reports on four empirical studies in information modeling. The four studies focus on an important, yet controversial, construct in information modeling -- the relationship construct. The theoretical foundation for the four experimental studies comprises theories and findings from the information systems, cognitive psychology, computer science, philosophy, and communication literature.
Because of the paucity of empirical research in the area, a two-stage research design, consisting of an exploratory and a formalized phase, is employed in this dissertation. Two studies were conducted in the exploratory phase. The first exploratory study investigated the effect of domain familiarity on modeling experts' selection of mandatory or optional connectivity for the relationship construct. The findings indicate that modeling experts tend to choose optional over mandatory relationships, even for domains that are totally unfamiliar to them. The second exploratory study analyzed the effect of conflicting textual information and structural constraints on modeling experts' selection of mandatory or optional connectivity. The results show that modeling experts tend to focus on the information depicted by the structural constraints and ignore the textual information. This exploratory phase allowed us to explore and develop empirical research methods and instruments for studying the relationship construct in information modeling.

In the second phase, two formalized studies were conducted. The first formalized study investigated the differences between modeling experts and novices in their interpretation of information models. The results show significant differences in the way modeling experts and novices interpret information models. Modeling experts focus mainly on the structural constraints and de-emphasize the textual information. Modeling novices, on the other hand, pay more attention to the textual information than modeling experts. The second formalized study examined the effect of different representations of the relationship construct on the interpretation of information models by modeling novices. The findings indicate that the explicitness of the relationship construct and the use of a verb versus a noun to describe the relationship have a profound impact on the accuracy of interpretation.
The best combination is one that uses an explicit relationship construct and a verb for the relationship description. The worst combination is one where the relationship construct is represented implicitly and described using a noun.

TABLE OF CONTENTS

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgment
Dedication

Chapter 1 Introduction
1.1 Information Modeling
1.2 The Pivotal Role of Information Modeling
1.3 Modeling Constructs
1.4 Problems with Modeling Constructs
1.5 Objectives of this Dissertation
1.6 Focus on the Relationship Construct
1.7 Contribution of this Dissertation
1.8 Organization of Dissertation

Chapter 2 The Odyssey of the Relationship Construct
2.1 The Beginning
2.2 The Crisis
2.3 The Rebirth
2.4 Reflections on the Odyssey

Chapter 3 Human Information-Processing System
3.1 Architecture of Adaptive Control of Thought (ACT)
3.2 Working Memory
3.3 Declarative Memory
3.4 Production Memory
3.5 Processes in ACT
3.6 Summary

Chapter 4 Research Approach and Method
4.1 Two-Stage Research Approach
4.2 Experimental Method
4.3 Experimental Task
4.4 Components of the Relationship Construct
4.5 Overview of the Studies

Chapter 5 Exploratory Phase -- Two Exploratory Studies
5.1 First Exploratory Study -- Effect of Domain Familiarity
5.2 Second Exploratory Study -- Effect of Conflicting Structural Constraints and Textual Information on Connectivity Selection

Chapter 6 Study 1 -- Differences in Interpretation of Information Models by Modeling Experts and Novices
6.1 Background
6.2 Research Question
6.3 Theoretical Foundation
6.4 Research Framework
6.5 Experimental Procedures
6.6 Research Hypotheses
6.7 Experimental Results
6.8 General Discussion

Chapter 7 Study 2 -- Effect of Relationship Representations on Interpretation of Information Models by Modeling Novices
7.1 Background
7.2 Research Questions
7.3 Theoretical Foundation
7.4 Research Framework
7.5 Experimental Results
7.6 General Discussion

Chapter 8 Conclusions and Suggestions for Future Research
8.1 Summary of Dissertation Research
8.2 Potential Contributions of Experimental Findings
8.3 Theoretical Contribution to Information Modeling
8.4 Possible Future Research Directions

References

Appendix A Consent Form and Demographic Questionnaire
Appendix B Training Materials for Exploratory Study
Appendix C Question Format and Set of Questions for Exploratory Study 1
Appendix D Question Format and Set of Questions for Exploratory Study 2
Appendix E Training Materials for Formalized Study 1
Appendix F Question Format and Set of Questions for Formalized Study 1
Appendix G Experimental Materials for Second Formalized Study
Appendix H Sample Coding Sheet

List of Tables

Table 3.1 A Production System to Understand an ER Diagram
Table 5.1 Subjects' Familiarity with ER Modeling Method and Constructs
Table 5.2 Number of Subjects and Observations for Each Domain
Table 5.3 Counts for Each Domain and Choice
Table 5.4 Analyses of Variance for Confidence Level and Perceived Domain Familiarity
Table 5.5 Means and Standard Deviations for Confidence Level and Perceived Domain Familiarity
Table 5.6 Confidence Level and Perceived Domain Familiarity by Question
Table 5.7 Research Design for Second Exploratory Study
Table 5.8 Number of Subjects and Observations for the Study
Table 5.9 Frequency of May and Must Choices for Each Group
Table 5.10 Analyses of Variance for Confidence Level and Perceived Domain Familiarity
Table 5.11 Means and Standard Deviations for the Two Variables
Table 5.12 Confidence Level and Perceived Domain Familiarity by Question
Table 5.13 Choice of Connectivity for Conflicting ST and its Corresponding NST Groups
Table 5.14 Analyses of Variance for Confidence Level and Perceived Domain Familiarity
Table 5.15 Means and Standard Deviations for the Two Variables
Table 5.16 Confidence Level and Perceived Domain Familiarity by Question
Table 6.1 Number of Subjects and Observations for Each Group
Table 6.2 Counts for Experts and Novices
Table 6.3 Analyses of Variance for Confidence Level and Perceived Domain Familiarity for Control (No Structural Constraints)
Table 6.4 Means and Standard Deviations for Confidence Level and Perceived Domain Familiarity for Control (No Structural Constraints)
Table 6.5 Counts for Experts and Novices for Conflicting Questions
Table 6.6 Counts for Experts and Novices for Non-Conflicting Questions
Table 6.7 Choice of Connectivity by Questions
Table 6.8 Analyses of Variance for Confidence Level
Table 6.9 Analyses of Variance for Perceived Domain Familiarity
Table 6.10 Means and Standard Deviations
Table 6.11 Confidence Level and Perceived Domain Familiarity by Questions
Table 7.1 Number of Subjects in Each Cell
Table 7.2 Number of Chunks in Each Cell
Table 7.3 Analyses of Variance for Accuracy of Interpretation
Table 7.4 Percentages (Means and Standard Deviations) of Correct Interpretations of Explicit and Implicit Relationship Constructs
Table 7.5 Percentages (Means and Standard Deviations) of Correct Interpretations of Verb and Noun Descriptions
Table 7.6 Percentages (Means and Standard Deviations) of Correct Interpretations
Table 7.7 List of Error Codes
Table 7.8 Error Counts
Table 7.9 Error Counts for Explicit and Implicit Relationship Constructs in Phase 1
Table 7.10 Error Counts for Verb and Noun Relationship Descriptions in Phase 1
Table 7.11 Error Counts for Explicit and Implicit Relationship Constructs in Phase 2
Table 7.12 Error Counts for Verb and Noun Relationship Descriptions in Phase 2
Table 7.13 Summary of Errors by Types

List of Figures

Figure 1.1 Information Modeling and Systems Development
Figure 3.1 The ACT Architecture
Figure 3.2 Diagrammatic Representation of Propositions for ER Approach
Figure 3.3 Network of Propositions for ER Approach
Figure 4.1 Relationship Construct and its Components
Figure 5.1 Experimental Framework
Figure 5.2 ER Diagrams Representing Familiar and Unfamiliar Domains
Figure 5.3 Percentage Difference Between Must and May Choices
Figure 5.4 Mean Scores for Confidence Level and Perceived Domain Familiarity
Figure 5.5 ER Diagram with "Natural" Mandatory Connectivity
Figure 5.6 ER Diagram with "Natural" Optional Connectivity
Figure 5.7 Experimental Framework
Figure 5.8 Examples of Conflicting and Non-Conflicting Relationships
Figure 5.9 Percentage of May and Must for Each Group
Figure 5.10 Perceived Domain Familiarity and Confidence Level for Each Group
Figure 5.11 Number of May Choices for the Two Groups
Figure 5.12 Number of Must Choices for the Two Groups
Figure 5.13 Confidence Level for Each Question
Figure 5.14 Perceived Domain Familiarity for Each Question
Figure 6.1 Johari Window
Figure 6.2 Three-Stage Learning Model
Figure 6.3 Research Framework
Figure 6.4 Examples of Conflicting and Non-Conflicting Relationships
Figure 6.5 Confidence Levels of Experts and Novices
Figure 6.6 Perceived Domain Familiarity of Experts and Novices
Figure 6.7 Interaction Graph for Confidence Level
Figure 6.8 Interaction Graph for Perceived Domain Familiarity
Figure 6.9 Optional Relationship Between Student and Book
Figure 6.10 Optional Relationship Between Instructor and Course
Figure 6.11 A Nonsensical Information Model
Figure 7.1 Explicit Representations of the Relationship Construct
Figure 7.2 Implicit Representations of the Relationship Construct
Figure 7.3 Examples of Relationship Descriptions
Figure 7.4 An ER Representation
Figure 7.5 Two Informationally Equivalent Information Models
Figure 7.6 Data Flow Diagram 1
Figure 7.7 Data Flow Diagram 2
Figure 7.8 Information Model 1
Figure 7.9 Information Model 2
Figure 7.10 Research Model
Figure 7.11 Many-to-many Instance Connection
Figure 7.12 Explicit and Implicit Representations of Relationship Construct
Figure 7.13 The Four Possible Combinations of Relationship Explicitness and Relationship Description
Figure 7.14 Graphical Representation for Percentages of Correct Interpretations of Explicit and Implicit Relationship Constructs
Figure 7.15 Graphical Representation for Percentages of Correct Interpretations of Verb and Noun Descriptions
Figure 7.16 Graphical Representation for Percentages of Correct Interpretations
Figure 7.17 Error Counts
Figure 7.18 Number of Errors for Each Representation
Figure 7.19 Number of Errors Per Chunk for Each Representation
Figure 7.20 Total Number of Errors for Each Representation
Figure 8.1 Future Research Directions
Figure 8.2 A Multi-Methodological Approach

ACKNOWLEDGMENT

To Fui Hoon -- You are always my best friend. I will always love you with all my heart.

To my Mom -- Thanks for your support and understanding. I love you.

To my Sis -- You are the best and I love you. I could not have done this Ph.D. without your understanding and you taking care of Mom.

To my dissertation committee:

Yair Wand -- You believe in me and have faith in my research ability. Not only are you involved in the research, you are also very concerned about my job search. Thanks for everything.

Izak Benbasat -- Thanks for being there from the start. Your thoroughness and demand for perfection make the dissertation what it is today.

Marshall Arlin -- You brought me into the world of psychology and statistics and now I am "trapped" in their beauty. Thanks for the guidance, encouragement, and understanding throughout my Ph.D. program.

To my good friends who stand by me in times of difficulty: Michael Brydon, Andrew Gemino, Samson Hui, Charalambos Iacovou, Suzanne Johnston, Robbie Nakatsu, Paula Newell, Gary Schwartz, Errol Smythe, Yih Sze Tan, Ah Quee Toh, Judy Yek

DEDICATION

Dedicated to my late father. I miss you.

CHAPTER 1
Introduction

A scientific field can arise only on the base of a system of concepts.
Russell L. Ackoff

1.1 Information Modeling

Information modeling is central to information systems analysis and design. It is defined as "the activity of formally describing some aspects of the physical and social world around us for the purpose of understanding and communication" (Mylopoulos 1992). Kilov and Ross (1994) defined it as the "process of creating an understandable and elegant specification of the business rules of an enterprise." Information modeling takes place in the early phases of the software development life cycle. It involves investigating the problems and requirements of the user community and, from that, building a requirements specification for the desired system (Rolland & Cauvet 1992, Kangassalo 1990). The product of the information modeling process is an information model (e.g., a data flow diagram or an entity-relationship diagram). On one hand, information models provide the conceptual basis for communicating and thinking about information systems (Willumsen 1993). On the other hand, they provide a formal basis for tools and techniques used in the design and development of information systems (Kung & Solvberg 1986). The information modeling and systems development processes are depicted in Figure 1.1 (adapted from Rolland & Cauvet, 1992).

[Figure 1.1 Information Modeling and Systems Development: user requirements pass through the modeling process (requirements engineering) to yield an information model, which the design process (design engineering) turns into an implemented system.]

As can be seen from the diagram, the information model serves as a bridge between information modeling and systems development (Rolland & Cauvet 1992, Willumsen 1993, Dahlbom & Mathiassen 1993).

1.2 The Pivotal Role of Information Modeling

Despite the rapid advancement of technology in the last few decades, accurate, on-time, and on-budget completion of software projects is still a vision rather than a reality.
The software crisis is still very much alive almost thirty years after it was first defined (Gibbs 1994, Kendall 1996, Brodie 1996). For instance, Yourdon (1989) reported application backlogs of four to seven years or more, and Jones (1986) observed that a typical project is one year late and 100% over budget. In addition, the maintenance phase typically consumes up to 70% of all programmer effort (Kendall 1996), and it is errors, not enhancements, that account for 40% of maintenance (Rush 1985). Page-Jones (1988) writes, "It looks as if traditionally we spend about half our time making mistakes and the other half of our time fixing them." In 1994, IBM's Consulting Group (Gibbs 1994) released the results of a survey of 24 leading companies that had developed large distributed systems. The numbers were unsettling: 55% of the projects cost more than budgeted, 68% overran their schedules, and 88% had to be substantially redesigned.

In the early days of computerized information systems, technological failure was the main cause of failure in business data processing systems (Avison & Fitzgerald 1995). Today, the failure of information systems is rarely due to technology, which is on the whole reliable and well tested. Failure is more likely to be caused by miscommunication and misspecification during the information modeling phase (Holtzblatt & Beyer 1995, Kendall & Kendall 1996). For example, Brooks (1987, p. 16) writes that "the hardest part of the software task is arriving at a complete and consistent specification." Gladden (1982) estimated that information systems failures run as high as 70% of all information systems, and Boar (1984) reported that 60-80% of these failures result from poor requirements specifications. Other researchers (e.g., Lientz & Swanson 1980, Ramamoorthy et al. 1984) have also indicated that about two-thirds of the maintenance cost can be attributed to misconception -- not identifying the real needs -- during information modeling.
Yourdon (1989) concurred, stating that 50% of errors and, more important, 75% of the cost of error removal can be attributed to errors in information modeling. The later in the project life cycle a change is made or an error is uncovered, the more it costs to incorporate the change or remove the defect (Halpin 1995, Kendall 1996). The importance of attaining accurate and correct requirements specification during information modeling is emphasized by Boehm (1976): "It is estimated that the relative cost of fixing problems detected during final testing or operation is 50-100 times greater than for problems detected during requirements specification." For example, a recent survey of hundreds of Digital's staff and an analysis of the corporate planning database revealed that, on average, 40% of the requirements specified in the feasibility and requirements phase of the life cycle were redefined in the later phases. This cost Digital an average of 50% more than budgeted (Hutchings & Knox 1995).

No one claims that doing a perfect job during the information modeling phase will single-handedly eliminate the software crisis. Nevertheless, as major problems in software development occur during information modeling, it makes sense to approach the problems by first understanding and resolving the obstacles in that phase.

1.3 Modeling Constructs

Modeling constructs are the pillars of any information modeling method. The expressive and modeling power of an information modeling method depend largely on its modeling constructs. Modeling constructs, as defined by Sernades et al. (1989), are semantic primitives that are used to organize and represent knowledge about the domain of interest. These constructs form the core of information models. Modeling constructs also directly affect the philosophy (i.e., ontology and epistemology) of information systems analysis and design.
Ontology is concerned with the essence of things and the nature of the world (Wand & Weber 1988, 1989, 1990, 1993, Avison & Fitzgerald 1995). In information modeling, the ontological position reflects the view of reality of information systems analysis and design (Klein & Lyytinen 1983). The nominalist position in ontology argues that "reality is not a given immutable 'out there', but is socially constructed. It is the product of human mind" (Hirschheim & Klein 1989). According to this nominalist position, the choice of modeling constructs directly influences what the modeling method regards as important and meaningful versus what it suggests as unimportant and irrelevant. For example, the use of the entity-relationship (ER) approach emphasizes entities and relationships but ignores the processes involved. The use of the object-oriented (OO) approach, on the other hand, emphasizes objects and the behavior of objects.

In addition, the choice of modeling constructs directly affects the epistemology of the modeling method. Epistemology relates to the way in which the world may be legitimately investigated and what may be considered as knowledge (Avison & Fitzgerald 1995). The choice of modeling constructs therefore constrains how one can know or learn about reality -- the basis of one's claim to knowledge (Klein & Lyytinen 1983, Walsham 1993). For example, users of the entity-relationship approach would focus on identifying entities and relationships, whereas users of data flow diagrams (DFD) would emphasize the eliciting of processes, data flows, external entities, and data stores from the problem domain.

1.4 Problems with Modeling Constructs

Despite their importance, research on modeling constructs has been largely neglected in the literature to date. Most modeling constructs are introduced based on the common sense, superficial observation, and intuition of researchers and practitioners.
Theoretical foundation and empirical evidence are either non-existent or considered non-essential. For example, Coad and Yourdon (1991, p. 16) nicely sum up the practitioners' scant concern: "It would be intellectually satisfying to the authors if we could report that we studied the philosophical ideas behind methods of organization, from Socrates and Aristotle to Descartes and Kant. Then, based on the underlying methods human beings use, we could propose the basic constructs essential to an analysis method. But in truth we cannot say that, nor did we do it." (emphasis added)

However, our common sense can be wrong and radically misleading at times. As researchers, we need to approach the problem scientifically and provide, at least, an explanation of why these constructs are or are not important. Common sense can never serve as a substitute for empirical research (Flanagan & Dipboye 1981). With this laissez-faire attitude, one cannot help but cast doubt on the usefulness and importance of some of these modeling constructs. It is probable that some of these constructs are not actually actors in the modeling drama, but merely incidental artifacts, created by researchers to help them categorize their observations. These artifacts may play no significant role whatsoever in modeling the real world.

The resulting effect of this haphazardness is the dramatic change in modeling constructs as one moves from one modeling paradigm to another. This scene has been played out on the modeling stage many times before. For example, one wonders why such varied constructs as activities (Kung & Solvberg 1986, Lundberg et al. 1981), processes (Jackson 1983), data flows (DeMarco 1979, Gane & Sarson 1979), entities (Chen 1976, Teorey 1990), and objects (Coad & Yourdon 1991, Embley et al. 1992, Rumbaugh et al. 1991, Yourdon et al. 1995) are used for describing information systems.
This point is also argued by Grant (1985), who writes: "While there is no question that some progress toward better systems development methods has been made during the last 30 years, there is equally no question that, overall, any process of change is slow and discontinuous. Speaking from a historical perspective, it seems that for true progress to be realized, there must be a periodic, collective rethinking of basic ideas."

1.5 Objectives of this Dissertation

The objectives of this dissertation are to explore, develop, and apply empirical methods for evaluating and studying modeling constructs in information modeling. The dissertation reports a series of experimental studies undertaken to study a modeling construct known as the relationship. Besides developing empirical techniques, these experimental studies allow us to understand the behavior of modeling experts and novices in using the relationship construct, and help us to identify better ways of presenting this construct to end users. It is our hope that the results will provide an elementary understanding of user behavior in information modeling and a background against which to place future studies in context.

With respect to theories, this dissertation draws heavily on theories from cognitive psychology in the hope of providing a scientific foundation for understanding and studying modeling constructs in information modeling. The theory of the human information-processing system (IPS) forms the main theoretical foundation for this research. Theories from communication, philosophy, and computer science are also used. Although our intention is to eventually adopt, integrate, and develop coherent theories and well-structured theoretical models for this area, this process of development requires extensive experimentation.
1.6 Focus on the Relationship Construct

In this dissertation, we focus on one popular and particularly controversial construct -- the relationship -- which was introduced by Chen (1976) to overcome the semantic deficiency of the relational model. A relationship is commonly defined as an association between two or more entities (e.g., Chen 1976, Elmasri & Navathe 1989, X3H7-93). For example, the IDEF standards (Laamanen 1994, p. 127) state that "relationships are associations between entities." It should be noted that the term relationship is used very loosely in the literature. The three common types of association that are defined as relationship in the literature are (Martin & Odell 1992, Embley et al. 1992): composition (Is-Part-Of or aggregation), generalization-specialization (IS-A), and object or class relationship (relationship or relationship set). For the purposes of this dissertation, we are interested only in the third category (i.e., the object or class relationship). Hereafter, the term relationship means the object or class relationship.

Although the emphasis on the relationship construct was undertaken as an initial step towards developing the empirical techniques required for such studies, the selection of the relationship construct as the focus of this dissertation is not an extemporaneous choice. Since its introduction by Chen (1976), the relationship construct has been hailed as a major construct by the entity-relationship (ER) community and is regarded by many researchers as natural and intuitive in modeling the real world. Although ignored by the object-oriented (OO) community in the beginning, the relationship construct has begun to be adopted by part of that community. This is evidenced in the recent revival of the relationship construct in new OO models (e.g., Booch 1991, Rumbaugh et al. 1991, Graham 1991, Embley et al. 1992, McGregor & Sykes 1992, Champeaux et al.
1993, Khoshafian 1993) and the inclusion of the relationship construct in OO standards (e.g., X3H7-93, Laamanen 1994). Despite claims by many researchers that the relationship construct is natural and intuitive, in practice the reality is quite different. Several recent studies (e.g., Goldstein & Storey 1990, Batra et al. 1990, Batra & Sein 1994) found the relationship construct to be problematic in information modeling, particularly the degree (e.g., unary, ternary) and connectivity (i.e., structural constraints) of the relationship. Consequently, the relationship construct has a good mix of importance and controversy to warrant an in-depth analysis to assess its value and usefulness in information modeling.

1.7 Contribution of this Dissertation

We believe that human factors and issues related to the modeling process deserve more attention in, and should become part of, the information modeling field. As one of the first attempts to evaluate modeling constructs, this research opens up a potentially rich and fertile research area. The research methods and techniques documented here provide a template for future studies. With the focus on the relationship construct, this research also seeks to make a significant contribution to the understanding of this construct, its role and value in information modeling, and better ways of representing the construct to end users. The results will be of particular interest to both researchers and practitioners in the field who are puzzled by the introduction, withdrawal, and revival of the relationship construct in information modeling. A better understanding of the relationship construct may also contribute to the development of ER and OO standards.

This research is of interest to IS management as well. Management of information resources is one of the most critical strategic issues in organizations (Brancheau & Wetherbe 1986).
A better understanding of modeling constructs in information modeling will undoubtedly contribute to improving the quality and productivity of systems development in organizations. The results of recent surveys by Niederman et al. (1991), McCormick (1991), and Grover and Goslar (1993) indicate management's concerns about software quality. A Delphi survey of senior IS executives (Niederman et al. 1991) ranks quality of software development as one of the top ten management issues in both service and manufacturing companies.

1.8 Organization of Dissertation

The organization of the dissertation is as follows. Chapter 2 discusses the research issues in information modeling and the research questions pursued in this dissertation. Chapter 3 outlines the theory of the human information-processing system and discusses a cognitive architecture known as Adaptive Control of Thought (ACT). Chapter 4 discusses the research approach and method employed in this dissertation. Chapter 5 describes the design of two exploratory studies and discusses the experimental results and lessons learned from the two studies. Chapters 6 and 7 each report on a "formalized" study carried out in this dissertation. Chapter 8 concludes the dissertation and proposes a future research agenda.

CHAPTER 2
The Odyssey of the Relationship Construct

I have a cat named Trash... If I were trying to sell him (at least to a computer scientist), I would not stress that he is gentle to humans and is self-sufficient, living mostly on field mice. Rather, I would argue that he is object-oriented.
Roger King

The relationship construct is the focus of this dissertation because of its prominent and controversial nature. Initially, it was regarded by many researchers, especially those in the ER community, as one of the major constructs in modeling the real world.
Several recent empirical studies on modeling, however, reveal that users face problems using the construct despite its reputation of being natural and intuitive. The "metamorphosis" of the construct in the object-oriented paradigm is itself very intriguing. The following sections trace the odyssey of the construct over the course of the last two decades.

2.1 The Beginning

The entity-relationship model, proposed by Chen (1976), combines simple knowledge representation techniques, borrowed from semantic networks, with database technology, leading to a modeling method that promises both modeling power and performance. The two basic modeling constructs in the ER model are the entity and the relationship. An entity is an object that exists in our mind and can be distinctly identified. Each entity has particular properties, called attributes, that describe it. A relationship is an association among entities. Chen (1976) claimed that the ER model is easier to use than previously proposed models because it incorporates relationships among entities. Similarly, Rolland and Cauvet (1992, p. 30) mentioned that the ER model corresponds "to a major improvement in the field of conceptual modeling, making possible a reasoning on real world entities and relationships between them, instead of building data structures." Kilov and Ross (1994, p. 47) stress that "the most important contribution of the ER approach is the acknowledgment that objects do not exist in isolation -- entities have relationships."

Although many researchers have accepted the fact that the relationship construct represents a natural view of the real world (Chen 1976, Teorey et al. 1986, Elmasri & Navathe 1989, Teorey 1990, Vossen 1991, Graham 1991), this belief is mainly based on common sense and intuition rather than empirical findings. As pointed out by Brodie (1984), the popularity of the ER model is due to "the widespread belief in entities and relationships as natural modeling concepts."
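The two basic ER constructs can be made concrete with a small sketch. The fragment below is purely illustrative and is not drawn from the dissertation's experimental materials; the Student/Course domain and every identifier in it are invented for the example. It renders entity types with their attributes and a binary relationship whose connectivity is either mandatory ("must") or optional ("may"), the distinction the exploratory studies examine.

```python
from dataclasses import dataclass

# Illustrative sketch of the two basic ER constructs. The Student/Course
# domain and all identifiers are invented; this is not material from the
# dissertation's experiments.

@dataclass(frozen=True)
class Entity:
    """A distinctly identifiable thing, described by its attributes."""
    name: str
    attributes: tuple

@dataclass(frozen=True)
class Relationship:
    """An association between two entity types.

    `mandatory` models the connectivity choice: True renders a "must"
    (mandatory) relationship, False a "may" (optional) one.
    """
    name: str
    source: Entity
    target: Entity
    mandatory: bool

    def describe(self) -> str:
        modal = "must" if self.mandatory else "may"
        return f"Each {self.source.name} {modal} {self.name} a {self.target.name}"

student = Entity("Student", ("student_id", "name"))
course = Entity("Course", ("course_code", "title"))

# An optional ("may") connectivity on the student side.
enrolls = Relationship("enroll in", student, course, mandatory=False)
print(enrolls.describe())  # Each Student may enroll in a Course
```

Reading a diagram's connectivity back as a "may"/"must" sentence, as `describe` does, mirrors the kind of interpretation task the experiments put to their subjects.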
Though the ER model seems quite simple and straightforward on first exposure, a closer look reveals some unexpected subtleties (Goldstein 1985). The relationship construct appears to be particularly problematic in the ER model because one is often not sure whether to use a relationship, an attribute, or even an entity to represent something in the real world. For example, Goldstein and Storey (1990) found that users of an automated database design tool had difficulty distinguishing between relationships and attributes. Similarly, Elmasri and Navathe (1989) note that "it is sometimes convenient to think of a relationship type in terms of attributes." According to Peckham and Maryanski (1988), "the relationship construct may appear in the model as an attribute, entity, independent element, or function." Codd (1990) also states that "one person's entity is another person's relationship." The empirical study by Batra et al. (1990), which compares the relational and ER representations, indicates that the "most notable error found in the solutions prepared by the subjects was the incorrect representation of connectivity of relationships." Thus, the naturalness and intuitiveness of the relationship construct come into question.

2.2 The Crisis

In the 1980s, we witnessed a paradigm shift from structured techniques (i.e., structured programming, structured design, structured analysis) to object-oriented (OO) techniques (Snyder 1993, Meyer 1996). Yourdon (1994, p. 14) writes "Object Orientation has been heralded as a silver bullet, a paradigm shift, a killer technology, and even a cure for whatever ails the U.S. software industry (and by extension, whatever ails the software industries of other countries.)" Modeling is not spared from the onslaught of object orientation.
With the proliferation and acceptance of object orientation, numerous OO analysis and design methodologies were proposed by researchers (e.g., Shlaer & Mellor 1988, 1992, Coad & Yourdon 1991, Rumbaugh 1991, Martin & Odell 1992, Yourdon 1994, Yourdon et al. 1995, Satzinger & Orvik 1996, Norman 1996, Dewitz 1996). An OO approach, simply put, is the use of OO concepts for modeling the domain. The basic unit in OO is the object, which encapsulates its data description (or attributes) and the operations that apply to it (Embley et al. 1992). Objects are grouped into classes that have common properties. Classes are organized into hierarchies in which the subclasses inherit the properties, including data definitions and operations, from their superclasses. In the OO approach, interactions between objects are handled by means of message passing. The operation that manipulates the encapsulated data in the object is called a method. Kim (1990) identified the core OO modeling constructs as consisting of object, object identifier, attributes, methods, encapsulation, message passing, class, class hierarchy, and inheritance. One important construct that is clearly missing from this core is relationship. What happened to the relationship construct that was once proclaimed as natural and intuitive in modeling the real world? As noted by some researchers (e.g., Elmasri & Navathe 1989, Graham 1991), one of the main distinctions between OO and ER models is that the abstraction of association (represented by relationship in the ER model) is not directly supported in the OO model but is achieved indirectly through interobject references. Elmasri and Navathe (1989, p. 444) conclude that this is "an inherent weakness of the OO approach and is due to the fact that this approach treats each object as a self-contained unit of information".
Kim (1990) also indicates that "we view a core object-oriented data model largely as a subset of a semantic data model; of course, semantic data models lack methods." The absence of the relationship construct in early object-oriented information modeling can be partially attributed to its inception in the programming community (Kent 1990, Satzinger & Orvik 1996). Languages such as Simula-67 and Smalltalk are pioneer object-oriented programming languages (Kim 1990). The object-oriented concepts then moved to areas such as artificial intelligence, databases, and information modeling. According to Wegner (1992), an object-oriented language is one which is based on objects, classes, and inheritance. There might not be a need for relationship (the type defined by Chen (1976)) in the data-intensive programming world. However, the progression of the OO approach from the implementation-intensive programming world to the high-level information modeling world results in a problem of fit. For example, Embley et al. (1995, p. 19) stress: "Analysis focuses on real-world problems, whereas design and implementation focus on computerized solution." According to Rumbaugh et al. (1991), the emphasis on implementation mechanisms rather than the underlying thought process that they support is a step backward. They (p. 4) argue that: "the real payoffs come from addressing front-end conceptual issues, rather than back-end implementation issues. Design flaws that surface during implementation are more costly to fix than those that are found earlier. Focusing on implementation issues too early restricts design choices and often leads to an inferior product." Similarly, Kilov & Ross (1994, p. 9) state that "the library of generic programming concepts is at too low a level for analysis. The programmer doing analysis is forced to understand the higher-level business concepts in terms of low-level programming concepts."
As such, the need for higher levels of abstraction in information models implies that certain constructs that are absent in the OO programming world may be needed in the information modeling world. The absence of such constructs may be detrimental to information modeling.

2.3 The Rebirth

Graham (1991, p. 182) argues that the "problem (with OO models) is the difficulty of modeling relationship types in an object-oriented database." Kim (1990, p. 12) also mentions that "the core (OO) model, though powerful, simply does not capture some of the semantic-integrity constraints and semantic relationships that are important to many types of applications." Ullman (1988) states that "to model the real world adequately, it is often necessary to classify relationships according to how many entities from one entity set can be associated with how many entities of another entity set." Even though the relationship construct is downplayed or ignored by some of the earlier OO models, it has resurfaced in some of the more recently proposed ones (e.g., Booch 1991, Rumbaugh et al. 1991, Graham 1991, Embley et al. 1992, McGregor & Sykes 1992, Champeaux et al. 1993, Khoshafian 1993, Satzinger & Orvik 1996) -- sometimes under different names such as link or association. The relationship construct has begun to appear in OO standards and reference models (e.g., OODBTG-91, X3H7-93, Laamanen 1994). The reason is nicely summarized by Booch (1991) -- "an object by itself is intensely uninteresting. Objects contribute to the behavior of a system by collaborating with one another." Similarly, Embley et al. (1992, p. 18) write: "objects... are often meaningless unless we understand some relationships among them." This leads to the proposal of the object-relationship model by Embley et al. (1992, 1995) and Yourdon (1994). The OO models by Champeaux et al. (1993) and Rumbaugh et al.
(1991) even include the notion of object-relationship, which is almost identical to the relationship construct proposed by Chen (1976). Martin and Odell (1992) also introduced the object-relationship diagram. Sometimes, when one deals with the OO models, it is as if one is seeing the shadows of the ER model. Yourdon et al. (1995, p. 316) write: "Martin/Odell include E-R analysis, and include more E-R modeling features than Rumbaugh or Coad/Yourdon. For example, the methodology allows N-ary relationships between objects (while Coad/Yourdon only allow binary relationships), as well as relationships between relationships." It seems clear that this controversy regarding the use of the relationship construct in information modeling has not been resolved. Our research attempts to fill this gap in knowledge by systematically and empirically analyzing how modeling experts and novices use the relationship construct. This was carried out through a series of experiments and verbal protocol analyses.

2.4 Reflections on the Odyssey

The main reason for this unsettling state is the lack of theoretical foundation and empirical evidence on the usefulness of the relationship construct in information modeling. Although many researchers have accepted, or once accepted, that relationship is natural in modeling the real world, this belief is based mainly on common sense and intuition rather than systematic and empirical evaluation. Churchland (1988) submits that "our common-sense psychological framework is a false and radically misleading conception of the cause of human behavior and the nature of cognitive activity." The rebirth of the relationship construct is intriguing. For users to fully understand an information model, the model should provide them with a high-level representation of the application domain rather than the implementation details of the information system (Kangassalo 1990, Willumsen 1993).
Everest (1986) also stresses that "the main criteria applied to the model is that it has image fidelity, that is, the model must conform to the user's views of the world...." If relationship is indeed one of the main constructs that humans use to understand and view the world, then the concept of relationship should exist in information modeling. On the other hand, if relationship does not correspond to the human view of the world, then there would be no reason to retain this ghostly construct that does not stand for anything real, but represents only descriptive approximations of observations in which it has no part to play. This, however, cannot be settled by argument or speculation. Systematic empirical studies are required.

CHAPTER 3
Human Information-Processing System

Defining concepts is frequently treated by scientists as an annoying necessity to be completed as quickly and thoughtlessly as possible. A consequence of this disinclination to define is often research carried out like surgery performed with dull instruments. The surgeon has to work harder, the patient has to suffer more, and the chances of success are decreased.
Russell L. Ackoff

According to Mylopoulos (1992, p. 50), the key concern in information modeling "is to structure the representation of knowledge about a subject consistently to the way humans structure that same knowledge and to make sure that the procedures that use these representations draw the same, or at least a subset of the inferences people would draw when confronted with the same facts." To understand the representation and use of knowledge by humans, we need to approach the issues from a human information-processing perspective. According to Newell and Simon (1972), all humans are information-processing systems (IPS) and hence come equipped with certain common, basic features.
The information-processing paradigm views thinking as a symbol-manipulating process and uses computer simulation as a way to build theories of thinking (Simon 1979). It attempts to map the flow of information that a human is using in a defined situation (Gagne et al. 1993) and tries to understand the general changes in human behavior brought about by learning (Anderson 1995). The information-processing position is currently the dominant viewpoint in cognitive psychology (Card et al. 1983, Simon & Kaplan 1989) and the most frequently employed framework for the study of memory (Hall 1989, Anderson 1995). Moray (1984) argued for the use of knowledge accumulated in cognitive psychology to understand and solve applied problems. Card et al. (1983, p. 1) also write "advances in cognitive psychology and related sciences lead us to the conclusion that knowledge of human cognitive behavior is sufficiently advanced to enable its applications in computer science and other practical domains." Researchers in human-computer interaction have also demonstrated that such an effort is valuable and essential in building a scientific understanding of the human factors involved in information modeling. The purpose of this chapter is to provide the theoretical foundation for this research by discussing the human information-processing system and its implications for information modeling. This background knowledge is necessary for the discussions of various experimental results in the later part of the dissertation. However, the specific theories and hypotheses associated with the individual studies are discussed in the respective chapters.

3.1 Architecture of Adaptive Control of Thought (ACT)

One of the most popular and well-known human information-processing models is the Adaptive Control of Thought (ACT) architecture proposed by Anderson (1983, 1993, 1995).
The ACT architecture consists of three types of memories -- declarative, production, and working -- as shown in Figure 3.1.

[Figure 3.1 The ACT Architecture: declarative, production, and working memories connected by the encoding, performance, storage, retrieval, application, and execution processes, with working memory interfacing to the outside world.]

The declarative and production memories are long-term memories. Declarative memory contains factual knowledge that humans can report or describe, whereas production memory is knowledge that can lead to performance (Anderson 1993). In plain language, declarative knowledge is knowing that something is the case and production knowledge is knowing how to do something (Gagne et al. 1993). For example, knowing that "rectangles represent entities" and "diamonds represent relationships" in the ER diagram is declarative knowledge, and our ability to interpret the ER diagram is production knowledge. Working memory contains information that the system currently has access to. It consists of information retrieved from long-term declarative memory as well as temporary structures deposited by encoding processes and the action of productions.

3.2 Working Memory

The working memory is activation based; it contains the activated portion of the declarative memory and the declarative structures generated by production firings and perception. Working memory is a temporary memory that cannot hold data over any extended duration. Information in this memory decays within about 10 seconds (Murdock 1961) unless it is rehearsed. In addition to being of limited duration, working memory is also of limited capacity. Miller (1956) claimed that working memory holds 7 ± 2 chunks of information while Simon (1974) claimed that it holds only about 5 chunks. Whatever the exact number (it may vary across individuals), the important point is that it is small. Because of its minuscule size, working memory is often referred to as the "bottleneck" of human information-processing systems.
3.3 Declarative Memory

The long-term declarative memory is represented in the form of a semantic net. A basic unit of declarative knowledge in the human information-processing system is a proposition, defined as the smallest unit of knowledge that can possess a truth value (Anderson 1983). Complex units of knowledge are broken down into propositions. For example, "Rectangles Represent Entities" and "Diamonds Represent Relationships" are two propositions related to ER diagrams. All propositions must have at least two parts (Kintsch 1974, Gagne et al. 1993). The first part of the proposition is called the argument, which corresponds to the nouns in a sentence. The second part is called the relation. The relation of a proposition can be determined by analyzing the parts of speech in a sentence. Verbs and adjectives typically make up the relations of a proposition. Although a proposition can have more than one argument, it always has only one relation. The relation of a proposition constrains the arguments. For example, in the sentence "Joe walked," "Joe" is the argument and "walked" is what constrains the argument. "Walked" constrains the topic of "Joe" in that it tells us that of all the information we have about Joe, we are attending only to information about Joe walking. In general, relations narrow the focus. For instance, "walked" narrows the focus of the argument "Joe." The declarative knowledge for the ER approach can be represented as propositions as shown below. Each proposition comprises a relation, followed by a list of arguments:

(i) represent, entity, rectangle
(ii) represent, relationship, diamond
(iii) comprise, ER, entity
(iv) comprise, ER, relationship

These four propositions can be depicted diagrammatically using Kintsch's system (1974) as shown in Figure 3.2.
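The relation-plus-arguments structure of these propositions, and the way shared ideas link propositions into a network, can be illustrated with a small sketch. Python is used here purely for illustration; the data structures are not part of the ACT theory itself:

```python
# Each proposition is a relation followed by its arguments,
# mirroring items (i)-(iv) above.
propositions = [
    ("represent", "entity", "rectangle"),
    ("represent", "relationship", "diamond"),
    ("comprise", "ER", "entity"),
    ("comprise", "ER", "relationship"),
]

# A propositional network associates propositions through shared ideas:
# for each idea (relation or argument), record the propositions it is in.
network = {}
for index, (relation, *arguments) in enumerate(propositions, start=1):
    for idea in (relation, *arguments):
        network.setdefault(idea, []).append(index)

# "entity" appears in propositions (i) and (iii), so those two
# propositions are linked through the "entity" node.
print(network["entity"])  # -> [1, 3]
```

The shared node is exactly the mechanism by which, for example, the ideas "rectangle" and "ER" become indirectly associated in the network.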
[Figure 3.2 Diagrammatic Representation of Propositions for ER Approach: each of the four propositions is drawn as a numbered node linked to its relation and its arguments.]

In ACT, individual propositions can be combined into networks of propositions. The nodes of the propositional network stand for ideas, and the linkages represent associations among the ideas (Anderson 1983). Figure 3.3 shows the network of propositions for the ER approach.

[Figure 3.3 Network of Propositions for ER Approach]

3.4 Production Memory

Unlike declarative knowledge, which is static, production knowledge (also known as procedural knowledge) is represented in the form of productions. When people acquire production knowledge they acquire "cognitive skill," or the ability to use knowledge to do things such as think, solve problems, and make decisions. Each piece of knowledge is called a production because it "produces" some bit of mental or physical behavior. Productions are formally represented as IF-THEN contingency statements in which the IF part of the statement contains the conditions that must exist for the rule to be fired and the THEN part contains the action that will be executed when the conditions are met. Productions are also known as condition-action pairs and are very similar to the IF-THEN statements in programming languages. For example, the following are the production rules for identifying entity and relationship constructs in the ER model:

IF symbol is a diamond THEN symbol represents a relationship
IF symbol is a rectangle THEN symbol represents an entity

Productions can be combined to form a set. A production system, or set, represents all the steps in a mental or physical procedure. The productions in a production system are related to one another by the goal structure.
In other words, each production contributes in some way to achieving the final goal behavior. The use of goals and subgoals in productions creates a goal hierarchy that interrelates the productions into an organized set. For example, Table 3.1 shows a production system to understand an ER diagram.

P1: IF goal is to understand ER diagram THEN set subgoal of identifying meaningful chunks of information in ER diagram
P2: IF subgoal is to identify meaningful chunks of information in ER diagram THEN set subgoal of identifying entity in ER diagram and set subgoal of identifying relationship in ER diagram
P3: IF subgoal is to identify entity in ER diagram and symbol is a rectangle THEN symbol represents an entity
P4: IF subgoal is to identify relationship in ER diagram and symbol is a diamond THEN symbol represents a relationship

Table 3.1 A Production System to Understand an ER Diagram

Production knowledge can be differentiated along two dimensions. The first is the domain-general/domain-specific dimension and the second is the automated versus controlled dimension. The following two subsections briefly discuss these two dimensions. These concepts will be further discussed in subsequent chapters.

3.4.1 Domain-General versus Domain-Specific Production Knowledge

The first dimension refers to the degree to which production knowledge is tied to a specific domain, with the anchor points of the continuum being termed domain-general and domain-specific (Gagne et al. 1993). Domain-general knowledge is knowledge that is applicable across domains and domain-specific knowledge is specialized because it is specific to a particular domain. The term domain refers to any defined area of content and can vary in its breadth. For example, our general ability to solve problems is domain-general knowledge whereas our ability to interpret ER diagrams is domain-specific knowledge.
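The goal hierarchy of a production system such as the one in Table 3.1 can be mimicked by a tiny interpreter that keeps a stack of (sub)goals and fires whichever production matches the current goal and the symbol being attended to. This is only an illustrative sketch of the condition-action idea, not an implementation of the full ACT architecture; unmatched subgoals are simply dropped for brevity:

```python
def interpret(symbol):
    """Classify an ER symbol using the productions of Table 3.1."""
    goals = ["understand ER diagram"]             # initial goal
    while goals:
        goal = goals.pop()
        if goal == "understand ER diagram":       # P1: decompose the goal
            goals.append("identify chunks")
        elif goal == "identify chunks":           # P2: set both subgoals
            goals.append("identify entity")
            goals.append("identify relationship")
        elif goal == "identify relationship" and symbol == "diamond":
            return "relationship"                 # P4 fires
        elif goal == "identify entity" and symbol == "rectangle":
            return "entity"                       # P3 fires
    return None  # no production matched the symbol

print(interpret("diamond"))    # -> relationship
print(interpret("rectangle"))  # -> entity
```

The point of the sketch is the control structure: nothing "calls" P3 or P4 directly; they fire only when their conditions (a pending subgoal plus a matching symbol) happen to hold, which is exactly how the goal structure interrelates the productions.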
3.4.2 Degree of Automation

The second dimension is "degree of automation," with the end points of the continuum being labeled automatic and controlled (or conscious) (Gagne et al. 1993). An automated process or procedure is one that consumes very few cognitive resources of the information-processing system. A controlled process, on the other hand, is knowledge that underlies deliberate thinking because it is under the conscious control of the thinker. For example, driving would be a controlled process for a learner whereas it would be an automated process for an experienced driver.

3.5 Processes in ACT

The processes in the architecture of ACT (as shown in Figure 3.1) are described as follows:

(i) The encoding process deposits information about the outside world into working memory.
(ii) The performance process converts commands in working memory into behavior.
(iii) The storage process creates permanent records in declarative memory of the contents of working memory and increases the strength of existing records in declarative memory.
(iv) The retrieval process retrieves information from declarative memory into working memory.
(v) The match process compares the data in working memory with the conditions of productions.
(vi) The execution process deposits the actions of matched productions into working memory.

The entire process of production matching followed by execution is known as production application. In the next subsection, we will discuss how these memories and processes interact when one is trying to understand an information model.

3.5.1 Information Processing Steps in Understanding an Information Model

There are three types of processes involved in understanding an information model (Larkin and Simon 1987). To illustrate the processes, we will assume that reader A is trying to locate a relationship in an ER diagram.
3.5.1.1 Encoding and Search Process

Encoding and search operate on the data elements and data structures of the information model, seeking to locate sets of elements that satisfy the conditions of one or more productions in the production memory. Using the example above, reader A will encode and search for data elements (e.g., entities and relationships) contained in the ER diagram. The search process requires attention management (see subsection 3.5.1.4).

3.5.1.2 Match Process

In this process, the condition parts of productions in production memory are matched to the data elements located through encoding and search (in the working memory). In our case, reader A will match the encoded data elements to the condition parts of productions in the production memory.

3.5.1.3 Execution Process

Once a match is identified, the production is executed (or fired) to produce new (inferred) knowledge from the data element. For example, if the data element encoded matches the symbol for a relationship (a diamond in Chen's ER diagram), reader A can infer that the data element encoded is a relationship.

3.5.1.4 Attention Management System

Since information models are complex and the working memory has limited capacity, an attention management system is required to determine what portion of the model is currently attended to and can trigger the productions in the production memory. For example, reader A needs the attention management system to focus on one or a subset of data elements (i.e., entities and relationships) in the ER diagram at any one time.

3.6 Summary

This chapter provides an overview of the human information-processing paradigm, which is used as the theoretical foundation in this dissertation for studying and understanding the relationship construct in information modeling. Most of the concepts introduced in this chapter will be further elaborated and expanded on in the subsequent chapters.
With the background information provided in this chapter, we are now in a position to discuss the various experimental studies. The next chapter discusses the research methodology used in this dissertation.

CHAPTER 4
Research Approach and Method

The need to discover inefficiency early makes it important to externalize (that is, make visible) an evolving design at each stage. Engineering blueprints, for instance, serve this purpose and are useful not only for a designer by calling his attention to trouble spots and potential inconsistencies, but also for a team or an entire organization developing a product.
L.A. Belady

4.1 Two-Stage Research Approach

As stated in Chapter 1, one of the goals of this dissertation is to develop empirical techniques for studying modeling constructs in information modeling. Because there is a paucity of "well-tested" empirical techniques in this area, this dissertation adopts a two-stage research approach suggested by Cooper and Emory (1995). In arguing for a two-stage approach, Cooper and Emory (1995) stressed that it was important to more fully understand the research problem before a major commitment of effort and resources was made. The first stage of this two-stage research approach is to explore the research area with the limited objectives of (Babbie 1992, Cooper & Emory 1995):

(i) more clearly defining and understanding the research problem;
(ii) testing the feasibility of a more elaborate study;
(iii) developing the research design, methods, and instruments.

The second stage involves more formal studies structured with stated hypotheses or investigative questions (Cooper & Emory 1995). Davis (1996) also suggested that the primary goal of exploratory research is to increase the researchers' understanding of the nature of the problem, and these exploratory studies are generally followed by more rigorous studies at a later date, when the situation is better understood by the researchers.
Adopting this two-stage approach for the dissertation, two exploratory experimental studies were conducted in the first stage to explore and develop the empirical research methods and instruments for studying modeling constructs in information modeling. In addition, the exploratory phase provided us with a better understanding of the research tasks and problems, and allowed the gathering of information on the sample size required for future studies. The exploratory studies are followed by a more rigorous second stage in which two experimental studies were carefully planned and conducted with stated hypotheses and specific research questions.

4.2 Experimental Method

No single empirical method (e.g., case study, survey, experiment) is ideal for all situations, and the choice of an empirical method should be driven by the research questions and the objectives of the study, rather than the other way round. Laboratory experimental studies are chosen for this research because of their superiority in controlling extraneous variables (Benbasat 1989) and their ability to manipulate the independent variable (Cooper & Emory 1995). Experimental designs have the advantage of permitting "separation of factors that do not come separately in Nature-as-you-find-it" (Mook 1983). Davis (1996, p. 136) argues that laboratory experiments "allow the researchers to control most major aspects of research design error; they are the most scientifically rigorous of all the designs..." These characteristics of the experimental method are particularly important in this research because we are studying the relationship construct, which forms part of various modeling methods. It is very difficult, if not impossible, to manipulate modeling constructs in a real-world setting because every modeling method comes with its own set of modeling constructs. In addition, we have to be able to control or eliminate the other modeling constructs which might confound the results.
For example, the incorporation of attributes in the relationship construct might confound the results. External validity, the major criticism of laboratory experiments, can be enhanced by replicating the studies using different groups of subjects (Cook & Campbell 1979) and different types of tasks.

4.3 Experimental Task

There are two categories of task in information modeling -- model interpretation and model construction. Model interpretation involves interpreting or validating information given in the information model, whereas model construction requires subjects to construct an information model from a given case description. The interpretation task is selected for this research and used in the four studies in this dissertation. The construction task is, undoubtedly, an important aspect of information modeling. However, we decided to focus on the interpretation task in this research for several reasons. First, despite the popularity of end-user computing, large-scale information models are still being developed by analysts and validated by end-users (Kim 1990). Thus, the ability of end-users to correctly understand and validate information models is vital to the success of systems analysis and design. Secondly, we strongly believe that, as a communication vehicle in information modeling, information models should be end-user driven rather than analyst driven. As professionals, analysts can afford the time and energy to learn the modeling methods, whereas the same cannot be said about end-users. Therefore, modeling constructs and construct representations that facilitate validation by end-users may be more important than their direct correspondence to design and development (which eases the development tasks of analysts and programmers). Thirdly, even for analysts, there is the constant need to refer back to the information models throughout the software development life cycle.
In other words, model interpretation is a task that is common to both analysts and end-users, whereas model construction is mainly the prerogative of analysts. Model interpretation is also a task that can be studied using the analytic tools available from cognitive psychology and experimental psychology. This is important, as many of the pitfalls (e.g., experimental control, confounding variables) in studying human behavior have already been encountered in the field of cognitive psychology, and the knowledge of these problems, particularly the knowledge of how to avoid them in experimental studies, is likely to be useful for our research. Model interpretation, thus, is a task of substantial, but manageable, complexity. Because of the intrinsic importance of the task itself, and the tractable complexity of the task, studies on information model interpretation are a natural starting point in the study of human factors in information modeling.

4.4 Components of the Relationship Construct

In information modeling, the relationship construct conveys three important pieces of information. These three segments of information are represented by the degree, connectivity, and textual component of the relationship. The degree is represented by the number of participating entities involved. For example, a binary relationship is an association between two entities, and a ternary relationship is an association between three entities. The connectivity captures the mapping between instances of the entities, and one popular way of doing so is through the use of structural constraints. A structural constraint is defined as the cardinality ratio and participation constraints taken together (Elmasri & Navathe 1994). It is represented by a pair of integer numbers (min, max) for each participation of an entity type E in a relationship type R, where 0 ≤ min ≤ max and max > 0.
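In operational terms, a (min, max) constraint requires that every instance of E participate in at least min and at most max instances of R, which can be checked mechanically by counting. The sketch below is illustrative only (the entity and relationship names are hypothetical), and checks one side of a binary relationship:

```python
def satisfies_constraint(entities, participations, min_p, max_p):
    """Check the structural constraint (min_p, max_p) on one side of a
    binary relationship: each entity instance must appear between min_p
    and max_p times among the relationship instances."""
    counts = {e: 0 for e in entities}   # entities with no instances count as 0
    for e in participations:            # one end of each relationship instance
        counts[e] += 1
    return all(min_p <= c <= max_p for c in counts.values())

employees = ["e1", "e2", "e3"]
works_for = ["e1", "e2", "e3"]          # each employee works for one employer

BIG = 10**9   # stands in for the unbounded "max" written as *

# Mandatory participation (1, *): every employee must appear at least once.
print(satisfies_constraint(employees, works_for, 1, BIG))        # -> True

# If e3 has no employer, (1, *) is violated but (0, *) still holds.
print(satisfies_constraint(employees, ["e1", "e2"], 1, BIG))     # -> False
print(satisfies_constraint(employees, ["e1", "e2"], 0, BIG))     # -> True
```

The mandatory/optional distinction introduced next corresponds exactly to whether min_p is greater than 0 or equal to 0 in this check.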
The numbers mean that for each entity e in E, e must participate in at least min and at most max relationship instances in R at all times. Structural constraints can be classified into mandatory and optional. If the minimum participation is 0, the relationship is optional; if the minimum participation is greater than 0, the relationship is mandatory. Henceforth, mandatory relationships will be depicted using the notation (1, *) whilst optional relationships will be represented using the notation (0, *). At times, we use the symbol * for the "max" participation, which can represent any number greater than "min".

Finally, the textual component is another vital piece of information. This information might not be as valuable in data modeling, where the main objective is to specify the structure of the database. The textual component of the relationship construct is, however, crucial in information modeling because it conveys the meaning of associations to the users. This information, expressed in words, helps to clarify ideas presented in the information model and facilitates understanding and communication between analysts and end-users. It should be noted that the textual component of the relationship construct needs to be combined with the textual components of the entities to make sense. The role of the textual component of the relationship construct is similar to the role of the relation in propositions, which constrains the arguments (discussed in Chapter 3). The three components of the relationship construct are shown in Figure 4.1.

Several recent studies (e.g., Batra et al. 1990, Batra & Sein 1994) have found that the degree (e.g., unary, ternary) and connectivity (i.e., structural constraints) of the relationship construct are particularly problematic in data modeling. For instance, the study by Batra et al.
(1990) shows that non-experts have little or no difficulty in modeling entities and attributes, but encounter considerable problems in modeling relationships. The novice users found the connectivity of relationships to be the most problematic. The findings from Batra & Sein's (1994) study also indicated that one of the main problems in modeling was the connectivity of relationships.

Figure 4.1 Relationship Construct and its Components (diagram: an Employee-Employer example annotated with the textual component, the connectivity (optional versus mandatory), and the degree)

The degree of relationships has been well investigated by several researchers (e.g., Batra et al. 1990, Batra & Sein 1994, Batra & Antony 1994) in the data modeling field and therefore will not be studied in this research. Research (e.g., Batra et al. 1990) has also shown that the most common form of relationship, the binary relationship (i.e., degree of 2), was not plagued by the problems facing its unary (i.e., degree of 1) and ternary (i.e., degree of 3) counterparts. For binary relationships, Batra and Davis (1992) found the performance of experts and novices to be similar. Although users faced problems with unary and ternary relationships, these relationships are not commonly used in information modeling. Binary relationships form the vast majority in information modeling (Embley et al. 1992). Embley et al. (1992, p. 28) also emphasize that "higher order associations are more complicated to draw, implement, and think about than binary associations and should be avoided if possible." As such, this dissertation investigates only binary relationships.

4.5 Overview of the Studies

The next three chapters will discuss the two exploratory studies and the two formalized studies. The two exploratory studies and the first formalized study are closely related in terms of experimental tasks and methods of analysis.
Based on the experience obtained in these experiments, a different experimental task and data collection technique were used in the second formalized study. Chapter 5 discusses the research design and the results of the two exploratory studies, and their implications for the subsequent studies. Because of the exploratory nature of these two studies, we did not provide detailed theoretical foundations for them beyond those discussed in Chapter 3. The studies in the exploratory phase focus on the connectivity of relationships, which was found to be problematic for users (Batra et al. 1990). The first exploratory study investigates the effect of domain knowledge on subjects' interpretation of the connectivity of the relationship construct. The second exploratory study looks at the effect of conflicting connectivity and textual information on subjects' interpretation of the connectivity of the relationship construct.

The two formalized studies conducted in the second phase are reported in Chapters 6 and 7. The first formalized study is a more formal and rigorous expansion of the second exploratory study. It investigates the differences between modeling experts and novices in their interpretation of the relationship construct when the connectivity and textual information are conflicting. One of the objectives of this study is to gain insight into the behavioral differences between modeling experts and novices. In addition to the theory of the human information-processing system, theories from the communication and learning literature are also used in formulating the research hypotheses and in interpreting the results. The second formalized study (reported in Chapter 7) deviates from the first three studies. It investigates the effect of different relationship representations on modeling novices' interpretation of information models and utilizes thinking-aloud protocols.
Since the introduction of the relationship construct in the ER model by Chen (1976), different symbolic representations for the construct have been introduced by different researchers for no apparent reason. This situation has become more acute with the adaptation of the relationship construct in OO modeling, where some researchers simply masquerade the relationship construct as objects or instance connections (as discussed in Chapter 2). Other than theories from cognitive psychology, this study also uses theoretical models from philosophy and computer science for the conception of research hypotheses and the explanation of results. Although the first three studies and the last one differ in experimental tasks and data collection methods, they are all related in their objective of exploring empirical methods to evaluate modeling constructs and applying those methods to the interpretation of the relationship construct.

CHAPTER 5
Exploratory Phase — Two Exploratory Studies

Thinking ... is possible only when a way has been found of breaking up the "massed" influence of past stimuli and situations.
F.C. Bartlett

The main objectives of this exploratory phase, as stated in Chapter 4, are to explore, develop, and investigate empirical methods and instruments for studying modeling constructs and methods in information modeling. The following sections describe the research design and findings of the two exploratory studies.

5.1 First Exploratory Study - Effect of Domain Familiarity¹

5.1.1 Background

Ideally, an information model should be self-contained so that users need not bring their own perceptions and knowledge to interpret the model. As argued

¹ This study has been published as "A Psychological Study on the Use of Relationship Concept — Some Preliminary Findings", Lecture Notes in Computer Science — Advanced Information Systems Engineering, Vol. 932, J. Iivari, K. Lyytinen, M. Rossi (eds.), 1995, Springer-Verlag, pp. 341-354. (with Wand, Y., and Benbasat, I.)
by Kung and Solvberg (1986), "the number of distinctions one can draw when classifying features of the world is in principle infinite." If users and analysts bring their own knowledge and perceptions of the real world to the interpretation of the model, the use of information models as a communication tool between analysts and end-users is likely to result in distortion, or even breakdown. However, an information model is an abstraction of the real world. During the construction of information models, irrelevant and less important information is suppressed so as to focus on the information that is more crucial to the current purpose (Yourdon et al. 1995, Norman 1996). Abstraction is, undoubtedly, a necessary and critical step in the construction of information models. For example, Yourdon et al. (1995) write "... all software development approaches incorporate some kind of abstraction." Abstraction is, however, not without its pitfalls. When we have to interpret an information model, which is an abstraction of the real world, more often than not we need to bring in our own perception of the real world in order to understand it. According to Borgida et al. (1985), "information representation is a kind of perception, because the only way for us to know about reality is through the perceptions of human beings." Wand and Weber (1988, p. 213) also argue that "humans may perceive systems that exist only in their minds."

This problem is likely to be more acute in information modeling because analysts and users have different sets of knowledge. Typically, end-users of the systems have a good understanding of their application environment, but they have little familiarity with the modeling methods (King & McLeod 1984). Analysts, on the other hand, are specialists in information modeling, but have little understanding of the application domain. This disparity is likely to aggravate the distortion and misinterpretation.
The first exploratory study attempts to investigate the effect of domain familiarity on the selection of connectivity for the relationship construct by modeling experts. The issue of domain familiarity is important for information modeling. For example, Vessey and Conger (1993) reported that knowledge of the application domain improved novice systems analysts' ability to specify information requirements. The study by Shaft & Vessey (1995) found that programmers used more top-down comprehension processes when they were familiar with the application domain and more bottom-up comprehension when the application domain was unfamiliar. It should, however, be noted that the main objectives of this study on domain familiarity are to explore and develop empirical methods and techniques. As such, the literature review and theoretical foundation, which are usually major components of formalized studies, are omitted in the exploratory phase.

5.1.2 Experimental Framework

For the first exploratory study, the subjects were given information models and were asked to select the appropriate connectivity for the models. The independent variable that we investigate in this exploratory study is domain characteristics, and this is a within-subjects factor. This one-factor repeated-measures design (also known as a within-subjects design) (Keppel 1983) was chosen so that each subject could act as his/her own control. The experimental design is summarized in the following diagram and discussed in more detail in the subsequent subsections.

Domain Familiarity
  - Familiar Domain
  - Unfamiliar Domain
Dependent Variables
  - Choice of Connectivity
  - Confidence Level
  - Perceived Domain Familiarity*

* As a manipulation check, we also measured the subjects' perceived familiarity with the domain.

Figure 5.1 Experimental Framework

5.1.2.1 Independent Variable

The independent variable for this study is domain familiarity, which consists of two levels — familiar versus unfamiliar domain.
This is a repeated-measures (or within-subjects) factor. The questions on familiar domains were taken from common real-life examples, such as university course selection and enrollment, and should be familiar to the subjects in our study. For unfamiliar domains, we used very specialized knowledge from areas such as neurocognition and psychophysics, which are alien to our subjects. For example, the first diagram in Figure 5.2 is of a familiar domain to our subjects whereas the second is of an unfamiliar domain.

Figure 5.2 ER Diagrams Representing Familiar and Unfamiliar Domains

All the subjects were presented with the same set of questions consisting of both familiar and unfamiliar domains. The entire set of questions (i.e., familiar and unfamiliar groups) was randomly ordered in each questionnaire — every questionnaire is unique in order. This ensures that any order effect is controlled by randomization. All the diagrams presented are without structural constraints and are shown in Appendix C.

5.1.2.2 Dependent Variables

The dependent variables in this study are the choice of connectivity (i.e., mandatory or optional relationship), the confidence level of the interpretation, and perceived domain familiarity. The choice of connectivity is a multiple-choice question. Subjects were presented with a choice of Must or May to describe the relationship construct. Must corresponds to a mandatory relationship whereas May corresponds to an optional relationship. We decided to use Must and May instead of (1,*) and (0,*) because the former is much more intuitive and understandable. This ease of understanding is important because this instrument might be used in subsequent studies that involve novice subjects. A total of 12 questions, consisting of 6 from familiar domains and 6 from unfamiliar domains, were given to each subject (see Appendix C for the question format).
To counter order bias in the response format, half of the questions listed Must as the first choice and the other half listed May as the first choice. This is to control for subjects who simply circled the first choice for each question. The confidence level of the interpretation ranges from 1 to 7 (1 — No confidence, 7 — Absolute confidence). As a manipulation check, we also captured, for each question, the subjects' perceived familiarity with the domain depicted by the ER diagram. This domain familiarity variable has values ranging from 1 to 7 (1 — Not familiar at all, 7 — Very familiar).

5.1.2.3 Subject Characteristics

We ran the two exploratory studies in the MIS department's weekly workshop. The weekly workshop is attended by MIS faculty members, and MIS Ph.D. and M.Sc. students. The typical attendance at the workshop is about 20-25 people. The use of a convenience sample is acceptable because of the exploratory nature of the studies.

A demographic information sheet was completed by each subject. The consent form and demographic questionnaire are shown in Appendix A. The subjects' self-reported expertise in information modeling was measured using a scale of 1 to 5 (1 — Totally unfamiliar, 5 — Very familiar). The following table summarizes their expertise with the ER modeling method and constructs:

Method/Constructs   Mean   Std. Dev.
ER Model            4.17   0.92
Entity              4.33   0.70
Relationship        4.21   0.72
Cardinality         3.67   1.31
Average             4.10

Table 5.1 Subjects' Familiarity with ER Modeling Method and Constructs

Based on the demographic information given by the subjects, we consider this group of subjects to be modeling experts. They were very comfortable with the ER model (i.e., a score of 4.17) and the various ER modeling constructs (i.e., an average familiarity score of 4.1). This was not surprising since most of the subjects had extensive training in information modeling methods from the courses they had taken or taught. Some of the subjects also had industry experience.
Training materials consisting of the basic modeling constructs in the ER model (e.g., entity, relationship, structural constraints) and examples of different structural constraints were attached to the front of the questionnaire (see Appendix B). No discussion was allowed as the training materials were designed to be self-explanatory and easy to understand. The training materials followed closely the coverage of the ER approach in various textbooks (e.g., Hawryszkiewycz 1991, Elmasri & Navathe 1994, Kendall & Kendall 1996). The subjects could refer to the training materials as and when necessary. The same group of subjects was used for the second exploratory study, which followed immediately after the first study.

5.1.3 Experimental Results and Discussion

The number of subjects and the number of observations for each domain are summarized in Table 5.2. The number of observations for the unfamiliar domain is less than 144 (i.e., 24 subjects X 6 questions) because some questions were left blank by the subjects.

                    No. of Subjects   No. of Questions Per Subject   No. of Observations
Familiar Domain     24                6                              144
Unfamiliar Domain   24                6                              131

Table 5.2 Number of Subjects and Observations for Each Domain

5.1.3.1 Choice of Connectivity

The first dependent variable, the choice of connectivity for the two domains, is analyzed using the nonparametric X² statistic. The results show X² (1, 275) = 0.018 with p > 0.89. This exceeds our critical level of p < 0.05. Thus, there is no significant association between domain familiarity and choice of interpretation. Table 5.3 shows the frequency table.

Frequency           Must Choice   May Choice   Total
Familiar Domain     15            129          144
Unfamiliar Domain   13            118          131
Total               28            247          275

Table 5.3 Counts for Each Domain and Choice

As can be seen, the number of May choices is overwhelmingly greater than the number of Must choices.
We test the difference between the total number of Must (i.e., 28) and May (i.e., 247) choices with the X² statistic, using a .50 probability, i.e., an expected count of 137.5 per choice. The X² value is 174.4 and is significant at the p < 0.01 level. Therefore, we can conclude that the number of May choices is significantly greater than the number of Must choices. There is, however, no significant difference between the two domains in the choice of connectivity. The percentage difference between the two domains is less than 1%, as illustrated in Figure 5.3.

A post-hoc review of the six questions with familiar domains revealed that all of them could be interpreted with optional connectivity. The majority of May choices for the familiar domain could possibly be due to the nature of the questions. The same, however, could not be said about the questions with unfamiliar domains. Because the subjects were not familiar with the domains, we expected them to select an equal number of May and Must connectivities. The fact that most subjects went for the May connectivity indicated the subjects' preference for optional connectivity.

Figure 5.3 Percentage Difference Between Must and May Choices (bar chart of the percentages of Must versus May choices)

5.1.3.2 Confidence Level and Perceived Domain Familiarity

The confidence level between the two domains was analyzed using the General Linear Model (SAS 1985). The results show that the confidence level for the familiar domain is significantly higher than that for the unfamiliar domain (p < 0.0001). Similarly, the perceived domain familiarity level associated with the familiar domain is significantly higher than that associated with the unfamiliar domain (p < 0.0001). The results on perceived domain familiarity show that the manipulation of domain characteristics achieved its desired effect. The statistics are summarized in the table below (DF — Degree of Freedom; MS — Mean Square; F — F Statistic).
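As a check on the arithmetic, the goodness-of-fit statistic reported above can be reproduced with a few lines of Python (an illustrative sketch, not part of the original study):

```python
# Goodness-of-fit X^2 for the observed Must/May totals (28 vs. 247),
# against an expected 50/50 split of the 275 responses (137.5 each).
observed = {"Must": 28, "May": 247}
expected = sum(observed.values()) / 2  # 137.5 under the null hypothesis

chi_square = sum((o - expected) ** 2 / expected for o in observed.values())
print(round(chi_square, 1))  # 174.4, matching the value reported in the text
```

With 1 degree of freedom, a value of 174.4 is far beyond the p < 0.01 critical value, consistent with the conclusion drawn in the text.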
Dependent Variable             Effect               DF    MS       F        Sig. F
Confidence Level               Domain Familiarity   1     346.90   148.06   0.0001
                               Error                250   2.34
Perceived Domain Familiarity   Domain Familiarity   1     750.75   340.00   0.0001
                               Error                250   2.21

Table 5.4 Analyses of Variance for Confidence Level and Perceived Domain Familiarity

The means and standard deviations for confidence level and perceived domain familiarity are depicted in Table 5.5 and Figure 5.4. The average confidence level for the familiar domain is almost twice that for the unfamiliar domain. As for perceived domain familiarity, the score for the familiar domain is more than twice the score for the unfamiliar domain. This is important as it shows that the subjects perceived a great difference between the familiar and unfamiliar domains. In other words, the manipulation was successful.

                    Confidence Level    Perceived Domain Familiarity
                    Mean (Std. Dev.)    Mean (Std. Dev.)
Familiar Domain     4.86 (2.10)         5.68 (1.49)
Unfamiliar Domain   2.53 (1.88)         2.26 (1.71)

Table 5.5 Means and Standard Deviations for Confidence Level and Perceived Domain Familiarity

Figure 5.4 Mean Scores for Confidence Level and Perceived Domain Familiarity (bar chart of the mean scores for the familiar and unfamiliar domains)

5.1.4 Discussion of Results

The research issue for this study is to investigate how modeling experts select the connectivity (i.e., structural constraints) for questions with familiar and unfamiliar domains. The results show an overwhelming tendency by modeling experts to select the May over the Must connectivity, for both familiar and unfamiliar domains. As shown in Table 5.6, the popularity of May over Must is evident in all the questions. The confidence level and perceived domain familiarity for the familiar domain are substantially higher than those for the unfamiliar domain.

                    No. of Must   No. of May   Confidence Level   Perceived Domain Familiarity
                    Choices       Choices      Mean (Std. Dev.)   Mean (Std. Dev.)
Familiar Domain
Q1                  1             23           4.71 (1.97)        5.79 (1.25)
Q2                  2             22           4.29 (2.26)        5.42 (1.79)
Q3                  6             18           5.00 (2.15)        5.71 (1.27)
Q4                  2             22           4.25 (1.94)        4.58 (1.61)
Q5                  4             20           5.42 (2.02)        6.25 (1.11)
Q6                  0             24           5.50 (2.15)        6.33 (1.20)
Unfamiliar Domain
Q7                  1             21           3.14 (1.91)        2.45 (1.47)
Q8                  4             18           2.05 (1.68)        1.77 (1.27)
Q9                  4             18           2.55 (1.99)        2.64 (1.99)
Q10                 0             22           1.91 (1.44)        1.18 (0.39)
Q11                 4             18           3.45 (2.22)        3.59 (2.09)
Q12                 0             21           2.10 (1.55)        1.90 (1.51)

Table 5.6 Confidence Level and Perceived Domain Familiarity by Question

The popularity of May choices for the familiar domain might be due to the nature of the questions. A post-hoc investigation of the six questions with familiar domains revealed that all of them could be interpreted as having "natural" optional connectivity. We, however, expected an equal number of May and Must connectivity selections for the questions with unfamiliar domains. The fact that most subjects went for the May connectivity indicated the subjects' preference for optional connectivity. One possible reason for the popularity of May is that these subjects realized that May can be considered a "superset" of Must. The use of Must precludes May, whereas the reverse is not necessarily true. The choice of Must means that a relationship must exist between the two entities. However, the choice of May means that the relationship can either exist or not, which is more general. Hence, it is likely that the subjects simply chose the optional relationship, just to be on the safe side.

For questions with familiar domains, the perceived domain familiarity is consistently higher than the corresponding confidence level (i.e., for all 6 questions). On the contrary, for questions with unfamiliar domains, the confidence level is usually higher than the perceived domain familiarity (i.e., for 4 out of 6 questions). It seems that perceptions of domain familiarity are more extreme than perceptions of confidence. However, we could not explain this phenomenon.
Another observation is that the highest confidence level in the unfamiliar domain (i.e., 3.45 for Q11) is still lower than the lowest confidence level in the familiar domain (i.e., 4.25 for Q4). Overall, the confidence level for the familiar domain is substantially higher than that for the unfamiliar domain, as shown in Table 5.4 and Figure 5.4. The familiarity of the domains probably gave the subjects more confidence in their choice of connectivity. In conclusion, the results indicate a predominant use of optional over mandatory relationships by expert subjects, irrespective of the familiarity of the domain.

5.1.5 Limitations of the Study

In this exploratory study, we limited our subject population to information modeling experts. It would be interesting to include both modeling experts and novices in future studies, as such studies would shed some light on the behavioral differences between analysts (i.e., modeling experts) and end-users (i.e., modeling novices). Also, we chose a convenience sample (i.e., participants of the MIS workshop) as our subjects. This is not a serious drawback for an exploratory study with limited objectives. Nevertheless, this might be a biased sample and might limit the generalizability of the findings. The results of this study need to be verified in future studies.

Theories were omitted in this study because the objectives of this phase were to explore, design, and develop empirical methods and techniques. The lack of a theoretical foundation, however, limits our understanding of some experimental results. With a strong theory to guide the research, we might be able to explain why the perceived domain familiarity is consistently higher than the confidence level for the familiar domain whereas this is not the case for the unfamiliar domain.

5.1.6 Implications for Research Designs, Methods and Instruments

Generally, the instruments (i.e., training materials, questionnaire) used in the study worked out well.
The questionnaire, however, could have been improved if we had done an "instrument calibration" before the actual experiment. Some of the questions used in the questionnaire could have been improved, or made more homogeneous within each group, if we had first tested the questions on another group of subjects. For example, Q11 (i.e., Floor-Associate-Button), which has an unusually high perceived domain familiarity rating among the questions with unfamiliar domains, could have been eliminated from the group.

Also, it might be necessary to pilot test the questions to pre-determine the "natural" connectivity of the relationships. For example, Figure 5.5 has a "natural" mandatory connectivity whereas Figure 5.6 has a "natural" optional connectivity.

Figure 5.5 ER Diagram with "Natural" Mandatory Connectivity

Figure 5.6 ER Diagram with "Natural" Optional Connectivity

For experimental purposes, it would be useful to pilot test these diagrams. This would prevent the problem we encountered in this study, where all the questions with familiar domains could be interpreted as having "natural" optional connectivity. In other words, we should have controlled the "natural" connectivity so that half of the questions have a "natural" mandatory connectivity and the other half a "natural" optional connectivity. This would ensure that the "natural" connectivity of the relationships does not confound the results of the study. In our case, we did not pilot-test the questions.

Due to the lack of prior research in the area, we could not perform a power analysis to determine the number of subjects required for this study. The sample size of this study, though small, was adequate to detect the differences between the familiar and unfamiliar domains. The effect sizes between the familiar and unfamiliar domains for confidence level and perceived domain familiarity are estimated to be 1.5 (i.e., (4.86-2.53)/sqrt(2.34)) and 2.3 (i.e., (5.68-2.26)/sqrt(2.21)) respectively.
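The effect-size estimates above divide the difference between the group means by the square root of the ANOVA mean square error (Table 5.4). The arithmetic can be checked with a short Python sketch (illustrative; the helper function is our own, not from the thesis):

```python
import math

def effect_size(mean_familiar, mean_unfamiliar, ms_error):
    # Standardized effect size: difference in means divided by the
    # error standard deviation estimated as sqrt(MS error) from the ANOVA.
    return (mean_familiar - mean_unfamiliar) / math.sqrt(ms_error)

print(round(effect_size(4.86, 2.53, 2.34), 1))  # 1.5 for confidence level
print(round(effect_size(5.68, 2.26, 2.21), 1))  # 2.3 for perceived familiarity
```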
These are large effect sizes. This information is useful in the design of subsequent "formalized" studies.

5.2 Second Exploratory Study - Effect of Conflicting Structural Constraints and Textual Information on Connectivity Selection²

5.2.1 Background

Though the connectivity of the relationship (i.e., optional or mandatory) stipulates some constraints on the relationship, it does not describe the underlying meaning of the association (Wand et al. 1993). The semantics of the linkage are conveyed through the textual information describing the entities/objects and relationships in the information model. Many researchers (e.g., Teorey 1990, Kilov & Ross 1994, Hull & King 1987, Peckham & Maryanski 1988) have argued that the strength of ER and OO models is their ability to capture the semantics of the real world. Although the textual information might not be as useful in the design and implementation of databases, it is vital in information modeling, where the main objective is to facilitate the communication process between analysts and end-users.

This study analyzes the effect of conflicting textual information and structural constraints on modeling experts' selection of connectivity (i.e., mandatory versus optional relationships). The focus on connectivity is again prompted by several recent studies (e.g., Batra et al. 1990, Batra & Sein 1994) which found the connectivity of relationships to be the most problematic for novice users. For example, Batra & Sein (1994) found that one of the main problems in model construction was the connectivity of relationships. In this

² Part of this study has been published as "When Parents Need Not Have Children — Cognitive Biases in Information Modeling", Lecture Notes in Computer Science — Advanced Information Systems Engineering, Vol. 1080, P. Constantopoulos, J. Mylopoulos, and Y. Vassiliou (eds.), 1996, Springer-Verlag, pp. 402-420 (with Wand, Y., and Benbasat, I.).
study, we investigated the use of structural constraints (i.e., connectivity) and textual information by modeling experts. The results of this study provide us with a better understanding of the use of these two types of information by modeling experts.

One difficulty encountered in this study is the separation of structural constraints from textual information. Without a clear separation, the research question could not be investigated because of the possible confounding effect of one type of information on the other. To differentiate the two types of information, we introduced the notion of conflicting structural constraints and textual information. For example, for the idea "Parents have children," we introduced a conflict into the information model by presenting the structural constraint (0,*), which indicates that "Parents need not have children." In other words, the textual information and structural constraints are now in contradiction. This design enables us to investigate the use of these two types of information by the subjects.

5.2.2 Experimental Framework

The research framework for this experiment is summarized in Figure 5.7.

Structural Constraints
  - Structural Constraints (ST) Group
  - No Structural Constraints (NST) Group
Relationship
  - Conflicting Relationship
  - Non-conflicting Relationship
Dependent Variables
  - Choice of Interpretation
  - Confidence in Interpretation
  - Perceived Familiarity with Domain

Figure 5.7 Experimental Framework

5.2.2.1 Independent Variables

The first independent variable is the presence of structural constraints, which consists of two levels — the Structural Constraints (ST) group and the No Structural Constraints (NST) group. This is a between-subjects factor. The subjects for this experiment were randomly assigned to one of these two groups. The ST group was given ER diagrams with structural constraints and the NST group was presented with ER diagrams with no structural constraints.
To prevent any bias, half of the questions in the ST group had mandatory relationships whereas the other half had optional relationships. The two groups received exactly the same set of diagrams except for the presence or absence of structural constraints. The entire set of questions was randomly ordered in each questionnaire (i.e., every questionnaire is unique in order). The question format and the set of diagrams for this study are shown in Appendix D.

The NST group served as a "baseline" group to calibrate the instrument. In other words, the NST group provides us with the "natural" connectivity (i.e., mandatory or optional relationship) for each question, which we can then use to compare with the ST group (recall that the lack of "natural" connectivity information was a limitation of the first exploratory study). This is possible because the NST group received ER diagrams with no structural constraints, so the subjects had to decide on the appropriate relationships based on the textual information alone. This design also enables us to investigate the effect of structural constraints on users' interpretation.

Another independent variable, called relationship, consists of two levels — conflicting versus non-conflicting relationship. This is a within-subjects factor. A conflicting relationship means that the structural constraints and textual information depicted by the ER diagram are in contradiction. For example, the first diagram in Figure 5.8 is conflicting since the textual information tells us that a parent must have children, but the structural constraints suggest an optional relationship. The second diagram is non-conflicting since a scientist does not necessarily teach any courses.
[Figure: Parent — Has (0,*) — Children, labeled "Conflicting Relationship"; Scientist — Teaches (0,*) — Course, labeled "Non-Conflicting Relationship"]

Figure 5.8 Examples of Conflicting and Non-Conflicting Relationships

We included eight questions with potentially conflicting relationships in the set of structural constraints (ST) questions. In other words, 8 of the 16 questions in the ST group contain potentially conflicting relationships. However, we could not introduce this variable into the No Structural Constraints (NST) group: when there are no structural constraints, the only information available is the textual information, and hence there can be no conflict. The design of the experiment is shown in Table 5.7.

             Conflicting                 Non-conflicting
ST Group     ST Group (8 Questions)      ST Group (8 Questions)
NST Group    NST Group* (8 Questions)    NST Group* (8 Questions)

* Since there are no structural constraints, there is no conflict.

Table 5.7 Research Design for Second Exploratory Study

Because the conflicting NST group does not exist (i.e., there is no conflict when there are no structural constraints), two separate ANOVAs were performed. One ANOVA compared the ST and NST groups (i.e., the two rows) and the second ANOVA compared the conflicting ST group with the corresponding NST group (i.e., the two cells in the first column).

5.2.2.2 Dependent Variables

The dependent variables in this study are the choice of connectivity, confidence level, and perceived familiarity with the domain. The choice of connectivity was captured using multiple-choice questions. Subjects were presented with a choice of Must or May (i.e., mandatory or optional) to categorize each relationship. These two choices were randomly ordered to counter a possible order bias. As in the first study, confidence in the interpretation and perceived familiarity with the domain were rated on scales from 1 to 7, with 1 indicating the lowest and 7 indicating the highest.
Familiar problem domains such as university course selection and enrollment were used for all the questions. These familiar domains were chosen so that subjects could use common-sense reasoning to help them interpret the ER models. The domain characteristics were controlled by presenting the same set of questions to the ST and NST groups. A total of 16 questions were given to each subject. The questions were randomly ordered so that each subject received a uniquely ordered questionnaire. The subjects in both the Structural Constraints (ST) and No Structural Constraints (NST) groups were asked to select the appropriate structural constraints (i.e., mandatory or optional) for a set of information models. They were also asked to rate their confidence in each selection and their perceived familiarity with the domain.

5.2.3 Experimental Results and Discussion: ST versus NST

A total of 24 subjects participated in this experiment. These were the same subjects who participated in exploratory study 1. The number of subjects and the number of observations for each group are summarized as follows:

                                  No. of     No. of Questions   No. of
                                  Subjects   Per Subject        Observations
Structural Constraints Group      13         16                 208
No Structural Constraints Group   11         16                 176

Table 5.8 Number of Subjects and Observations for the Study

5.2.3.1 Choice of Connectivity

The choice of connectivity was analyzed using the nonparametric χ² statistic, χ²(1, N = 384) = 17.079, p < 0.001. The presence of structural constraints has a significant impact on the choice of connectivity. The following table summarizes the result of the χ² test.

             No. of May Choices   No. of Must Choices   Total
ST Group     102                  106                   208
NST Group    123                  53                    176
Total        225                  159                   384

Table 5.9 Frequency of May and Must Choices for Each Group

[Figure: percentage of May and Must selections for the two groups]
Figure 5.9 Percentage of May and Must for Each Group (Note: Origin starts at 30%)

The above figure shows the percentage of Must and May (i.e., mandatory and optional) selections for each group. As shown in the diagram, the group with structural constraints (i.e., the ST group) selected an almost equal number of Must and May. However, the group that was not given structural constraints tended to select May significantly more often than Must, which is consistent with the findings from the first exploratory study. Recall that there are an equal number of mandatory and optional relationships in the ST group's questions. This points to the possibility that the subjects in the ST group simply followed the structural constraints depicted in the ER diagrams when deciding on a mandatory or optional relationship. A check of their responses indicated that this was indeed the case. This shows that they placed a great deal of emphasis on the structural constraints shown in the ER diagrams and tended to ignore the textual information.

5.2.3.2 Confidence Level and Perceived Domain Familiarity

Confidence level and perceived domain familiarity were analyzed using General Linear Models. The results are summarized in the table below.

Dependent Variable             Effect                   DF    MS       F       Sig F
Confidence Level               Structural Constraints   1     191.99   67.50   0.0001
                               Error                    382   2.84
Perceived Domain Familiarity   Structural Constraints   1     27.87    12.94   0.0004
                               Error                    382   2.15

Table 5.10 Analyses of Variance for Confidence Level and Perceived Domain Familiarity

ANOVA shows that the presence or absence of structural constraints has a significant effect on both confidence level and perceived domain familiarity. The following table shows the means and standard deviations for each group. The ST group rated confidence level and perceived domain familiarity significantly higher than the NST group.

             Confidence Level     Perceived Domain Familiarity
             Mean (Std. Dev.)     Mean (Std. Dev.)
ST Group     6.35 (1.04)          5.87 (1.07)
NST Group    4.93 (2.22)          5.33 (1.83)

Table 5.11 Means and Standard Deviations for the Two Variables

[Figure: group means on the 0 to 7 scale]
Figure 5.10 Perceived Domain Familiarity and Confidence Level for Each Group

The results suggest that the presence of structural constraints gives the subjects more confidence in their interpretation. This is consistent with our speculation (from the results on choice of connectivity) that subjects placed considerable weight on the structural constraints. For example, Table 5.12 indicates that for the ST group, most questions were rated with a higher confidence level than perceived domain familiarity, whereas the same is not true for the NST group. The likely explanation is that the presence of structural constraints increased the subjects' confidence in their responses. The presence of structural constraints also has a positive impact on perceived domain familiarity, which is somewhat puzzling. Although one can argue that the additional information provided by the structural constraints increases the subjects' confidence level, it is hard to apply the same logic to perceived domain familiarity, which is determined by the entities and connecting relationships. A question-by-question breakdown of the study is shown in Table 5.12.

       Structural    No. of    No. of    Confidence         Perceived Domain
       Constraints   Must      May       Level              Familiarity
       Depicted      Choices   Choices   Mean (Std. Dev.)   Mean (Std. Dev.)
ST Group
Q1     (0,1)         1         12        6.38 (1.12)        6.15 (0.80)
Q2     (1,*)         12        1         6.38 (0.87)        6.46 (0.52)
Q3     (0,*)         1         12        6.54 (0.66)        6.00 (0.82)
Q4     (0,*)         2         11        6.08 (1.66)        6.38 (0.77)
Q5     (0,*)         2         11        5.92 (1.75)        5.85 (0.90)
Q6     (1,*)         12        1         5.77 (1.88)        4.46 (1.66)
Q7     (1,*)         13        0         6.31 (1.11)        5.38 (1.33)
Q8     (0,*)         0         13        6.46 (0.66)        5.54 (0.97)
Q9     (0,*)         1         12        6.46 (0.78)        5.69 (1.11)
Q10    (1,*)         12        1         6.46 (0.66)        6.08 (0.86)
Q11    (0,*)         0         13        6.62 (0.51)        5.92 (0.76)
Q12    (0,*)         1         12        6.46 (0.78)        6.31 (0.48)
Q13    (1,*)         12        1         6.31 (0.95)        5.92 (1.44)
Q14    (1,1)         12        1         6.54 (0.66)        6.38 (0.65)
Q15    (1,*)         12        1         6.46 (0.78)        5.69 (0.95)
Q16    (1,*)         13        0         6.46 (0.78)        5.69 (1.03)

NST Group
Q1     N.A.          8         3         4.82 (2.18)        5.27 (1.90)
Q2     N.A.          0         11        5.45 (2.46)        5.45 (1.92)
Q3     N.A.          4         7         4.91 (2.30)        4.91 (1.70)
Q4     N.A.          6         5         5.18 (2.44)        6.18 (1.60)
Q5     N.A.          10        1         5.27 (2.37)        5.45 (1.86)
Q6     N.A.          0         11        5.00 (2.32)        4.18 (2.27)
Q7     N.A.          1         10        4.55 (2.25)        4.36 (1.96)
Q8     N.A.          2         9         4.55 (2.07)        5.00 (1.67)
Q9     N.A.          3         8         4.73 (2.45)        5.55 (1.44)
Q10    N.A.          4         7         4.55 (2.11)        5.64 (1.69)
Q11    N.A.          1         10        5.09 (2.39)        5.45 (2.16)
Q12    N.A.          0         11        5.18 (2.32)        5.73 (1.90)
Q13    N.A.          4         7         5.09 (2.39)        5.73 (1.85)
Q14    N.A.          4         7         4.73 (2.20)        6.00 (1.55)
Q15    N.A.          0         11        5.09 (2.39)        4.73 (2.00)
Q16    N.A.          6         5         4.73 (2.15)        5.64 (1.57)

Table 5.12 Confidence Level and Perceived Domain Familiarity by Question

The confidence level of the ST group is higher than that of the NST group. The lowest mean confidence level in the ST group (i.e., 5.77) is still higher than the highest mean confidence level in the NST group (i.e., 5.45). As for perceived domain familiarity, all the ST questions have a higher perceived domain familiarity than the corresponding NST questions. To summarize, the results suggest that the presence of structural constraints not only has a significant effect on the interpretation of the ER diagrams, but also increases the subjects' confidence and perceived familiarity with the domain.
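The χ² statistic reported in Section 5.2.3.1 can be recovered directly from the frequencies in Table 5.9. The sketch below is ours, not the original analysis script; it assumes SciPy is available and uses Pearson's test without Yates' continuity correction, which reproduces the reported value of 17.079.

```python
from scipy.stats import chi2_contingency

# Observed May/Must frequencies from Table 5.9
observed = [
    [102, 106],  # ST group:  May, Must
    [123, 53],   # NST group: May, Must
]

# correction=False gives the uncorrected Pearson chi-square statistic
chi2, p, dof, expected = chi2_contingency(observed, correction=False)
n = sum(sum(row) for row in observed)
print(f"chi2({dof}, N={n}) = {chi2:.3f}, p = {p:.3g}")
```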
5.2.4 Experimental Results and Discussion: Conflicting ST Group versus Corresponding NST Group

For this between-subjects analysis, we compare the choice of connectivity between the conflicting ST group and its corresponding NST group. The responses from the NST group serve as the "benchmark" for analyzing the responses from the conflicting ST group. In other words, the responses from the NST group allow us to evaluate the textual information independently of the structural constraints. These responses can then be compared to those from the conflicting ST group.

5.2.4.1 Choice of Connectivity

Table 5.13 depicts the choice of connectivity for the two groups. For each question, the choice selected by the majority is emphasized. There are two interesting observations in the table. First, consider the choices made by the conflicting ST group. Notice that for all eight questions, the choices made by the majority in this group are consistent with the structural constraints depicted in the information models. For example, Q7 (i.e., Airlines — Schedules — Cruise) has structural constraints of (1,*) and all the subjects in the conflicting ST group selected Must, whereas Q8 (i.e., Tour Agency — Organizes — Tour) has structural constraints of (0,*) and all the subjects selected May. Second, most of the choices made by the majority in the conflicting ST group are the opposite of those made by the majority in the corresponding NST group. Take Q6 (i.e., Bookstore — Sells — Vehicle) for example; all the subjects in the NST group selected May, whereas all except one in the conflicting ST group selected Must.
                     Conflicting ST Group      Corresponding NST Group
       Structural    No. of      No. of        No. of      No. of
       Constraints   Must        May           Must        May
       Depicted      Choices     Choices       Choices     Choices
Q1     (0,*)         1           12            8           3
Q2     (1,*)         12          1             0           11
Q3     (0,*)         1           12            4           7
Q4     (0,*)         2           11            6           5
Q5     (0,*)         2           11            10          1
Q6     (1,*)         12          1             0           11
Q7     (1,*)         13          0             1           10
Q8     (0,*)         0           13            2           9

Table 5.13 Choice of Connectivity for the Conflicting ST and its Corresponding NST Groups

The findings indicate that the structural constraints had a strong impact on the interpretation of the information models. When structural constraints were given, almost all the subjects made their connectivity selection by following the structural constraints depicted and ignoring the textual information. The following figures depict the number of May and Must selections made by the two groups.

[Figure: May selections per question (Q1 to Q8) for the two groups]
Figure 5.11 Number of May Choices for the Two Groups

[Figure: Must selections per question (Q1 to Q8) for the two groups]
Figure 5.12 Number of Must Choices for the Two Groups

As can be seen, most of the connectivity selections made by the two groups are the opposite of one another. The results indicate that subjects who were not given the structural constraints interpreted the information models based on the textual information. However, subjects who were given information models with structural constraints (even when those constraints conflicted with the textual information) simply followed the structural constraints in their interpretation. A discussion with two of the subjects in the conflicting ST group after the experiment revealed that they simply looked at the structural constraints to determine the connectivity. They did not even look at the wording that describes the entities and relationships.

5.2.4.2 Confidence Level and Perceived Domain Familiarity

The confidence level and perceived domain familiarity for the conflicting ST and its corresponding NST groups were analyzed using ANOVA.
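The F ratios reported in Table 5.14 below are simply the effect mean squares divided by the error mean squares. As a sketch of our own (assuming SciPy for the F-distribution tail probability), the reported values can be recovered from the mean squares alone:

```python
from scipy.stats import f as f_dist

def f_ratio(ms_effect: float, ms_error: float, df_effect: int, df_error: int):
    """Recompute an F statistic and its upper-tail p-value from mean squares."""
    F = ms_effect / ms_error
    return F, f_dist.sf(F, df_effect, df_error)

# Confidence level: MS(effect) = 76.26, MS(error) = 3.17, df = (1, 190)
F_conf, p_conf = f_ratio(76.26, 3.17, 1, 190)

# Perceived domain familiarity: MS(effect) = 21.82, MS(error) = 2.38
F_fam, p_fam = f_ratio(21.82, 2.38, 1, 190)

print(f"Confidence:  F = {F_conf:.2f}, p = {p_conf:.4f}")
print(f"Familiarity: F = {F_fam:.2f}, p = {p_fam:.4f}")
```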
The statistics show that there are significant differences between the conflicting ST and its corresponding NST groups for both confidence level and perceived domain familiarity (p < 0.0001 and p = 0.0028, respectively).

Dependent Variable             Effect   DF    MS      F       Sig F
Confidence Level               Group    1     76.26   24.09   0.0001
                               Error    190   3.17
Perceived Domain Familiarity   Group    1     21.82   9.17    0.0028
                               Error    190   2.38

Table 5.14 Analyses of Variance for Confidence Level and Perceived Domain Familiarity

The means and standard deviations for both groups are summarized in the table below. The conflicting ST group rated confidence level and perceived domain familiarity significantly higher than the NST group.

                           Confidence Level     Perceived Domain Familiarity
                           Mean (Std. Dev.)     Mean (Std. Dev.)
Conflicting ST Group       6.23 (1.28)          5.78 (1.17)
Corresponding NST Group    4.97 (2.23)          5.10 (1.89)

Table 5.15 Means and Standard Deviations for the Two Variables

A question-by-question analysis of the study is shown in Table 5.16. The confidence level for every question in the conflicting ST group is higher than that of the corresponding question in the NST group. Similarly, the domains of all the conflicting ST questions are perceived to be more familiar than their corresponding NST counterparts. There is a strong difference between the two groups in terms of confidence level. For example, the lowest mean confidence level in the conflicting ST group (i.e., 5.77) is still higher than the highest mean confidence level in the NST group (i.e., 5.45). A probable explanation is that the subjects treated the structural constraints as part of the information and felt more comfortable with their presence. This is not surprising, as these subjects are modeling experts. It is, therefore, likely that the additional information provided by the structural constraints increased their confidence level.

                   Conflicting ST Group    Corresponding NST Group
                   Mean (Std. Dev.)        Mean (Std. Dev.)

Confidence Level
Q1                 6.38 (1.12)             4.82 (2.18)
Q2                 6.38 (0.87)             5.45 (2.46)
Q3                 6.54 (0.66)             4.91 (2.30)
Q4                 6.08 (1.66)             5.18 (2.44)
Q5                 5.92 (1.75)             5.27 (2.37)
Q6                 5.77 (1.88)             5.00 (2.32)
Q7                 6.31 (1.11)             4.55 (2.25)
Q8                 6.46 (0.66)             4.55 (2.07)

Perceived Domain Familiarity
Q1                 6.15 (0.80)             5.27 (1.90)
Q2                 6.46 (0.52)             5.45 (1.92)
Q3                 6.00 (0.82)             4.91 (1.70)
Q4                 6.38 (0.77)             6.18 (1.60)
Q5                 5.85 (0.90)             5.45 (1.86)
Q6                 4.46 (1.66)             4.18 (2.27)
Q7                 5.38 (1.33)             4.36 (1.96)
Q8                 5.54 (0.97)             5.00 (1.67)

Table 5.16 Confidence Level and Perceived Domain Familiarity by Question

[Figure: confidence level per question for the two groups]
Figure 5.13 Confidence Level for Each Question

[Figure: perceived domain familiarity per question for the two groups]
Figure 5.14 Perceived Domain Familiarity for Each Question

Although the effect on confidence level can be explained by the additional information provided by the structural constraints, the impact on perceived domain familiarity is harder to explain. Perceived domain familiarity is determined by the entire information model (i.e., entities, relationships, and structural constraints, if present) and how closely the information conveyed by the model fits reality. One possible explanation is that the subjects placed too much faith in the structural constraints. They could conceivably have made up scenarios (however far-fetched) to convince themselves that the structural constraints were correctly depicted. Another plausible explanation is that the subjects did not even notice the conflict: they looked at the information models, but they did not see the contradictions.

5.2.5 Limitations of the Study

The subjects in this study were the same as those in the first exploratory study. This is not a serious drawback, as these subjects are modeling experts; the training and practice effects on them are minimal and should not confound the experimental results. Moreover, this study is exploratory in nature, so the use of a convenience sample is acceptable.
Since only expert subjects were involved in this study, the generalizability of the results is fairly restricted. The use of a convenience sample also limits the external validity. A more elaborate study that includes modeling novices as well as experts would be a worthwhile extension.

The conflicting ST versus corresponding NST analysis is a between-subjects comparison (i.e., two groups of subjects). Its power could be increased by running it as a within-subjects study so that the subjects could serve as their own controls.

The main purposes of this second exploratory study, like the first, were to explore and test empirical methods and instruments. As such, the research issues were derived from the literature and a theoretical foundation for the studies was omitted. The omission of theory in this study, though acceptable given our objectives, hinders our interpretation and understanding of the results. Subsequent formalized studies would need to be guided by theories, with the results explained in theoretical terms.

5.2.6 Implications for Research Methods and Instruments

As in the first exploratory study, the questionnaire used in the experiment worked well. The first and second exploratory studies show that a questionnaire, though simple, can be used as an instrument for such studies. The measure of confidence level, though a self-reported value, appears to be adequate for our purpose. With the benefit of hindsight, we also realize that we could have conducted an instrument calibration test prior to the actual study. This would have enabled us to filter out the weak questions. For example, in the conflicting ST versus corresponding NST analysis, Q3 (i.e., Landlady — Owns — House) and Q4 (i.e., Parent — Has — Children) were not as strong as the other questions. The test questionnaire could have been improved if we had done an "instrument calibration" before the experiment.
The use of information models with no structural constraints allows us to determine the "natural" connectivity of each relationship, which was a concern in the first exploratory study. Again, despite the small sample size, we were able to obtain significant differences in the dependent variables. The effect sizes between the ST and NST groups for confidence level and perceived domain familiarity are estimated to be 0.8 (i.e., (6.35 - 4.93)/sqrt(2.84)) and 0.4 (i.e., (5.87 - 5.33)/sqrt(2.15)), respectively. The effect sizes estimated from this study, together with those from exploratory study 1, indicate that the effect size for this type of study can be quite large. This information can be used to estimate the sample size in subsequent studies.

The next two chapters present our two formalized studies — designed and built with the knowledge gained from the exploratory phase.

CHAPTER 6
Study 1 — Differences in Interpretation of Information Models by Modeling Experts and Novices

An expert is one who does not have to think. He knows.
— Frank Lloyd Wright

6.1 Background

Communication problems between analysts and end-users (i.e., modeling experts and novices) in information modeling, as discussed in Chapter 1, are a major obstacle to achieving correct and accurate requirements specifications. As Holtzblatt and Beyer (1995) put it: "requirements definition is about people talking effectively with one another." To bridge this communication gap, we need to investigate the differences between modeling experts and novices in their interpretation of information models. Understanding these differences can facilitate the design and selection of modeling methods and constructs, which may help to eliminate or alleviate the communication problems between analysts and end-users. This knowledge may also be helpful in the development of training methods (Guerin & Matthews 1990) and in guiding the communication process between analysts and end-users during information modeling.
Gagne et al. (1993) mention that "expert-novice studies are extremely useful for identifying potential cognitive reasons for differences between experts and novices." Conger (1994, p. 50) also proposed that to ease the learning process, one should "try to think like an expert." This, however, is only possible if we know how experts think. The main objective of this study is, thus, to investigate the differences between modeling experts and novices.

Our first exploratory study investigated the effect of domain familiarity on modeling experts' selection of connectivity, whereas the second exploratory study analyzed the effect of conflicting textual information and structural constraints on the choice of mandatory versus optional relationships. The findings of the first study indicate that modeling experts tend to choose optional over mandatory relationships, even for domains that are totally unfamiliar to them. The results from the second study show that modeling experts tend to focus on the information depicted by the structural constraints and to ignore the textual information. Because of the exploratory nature of the two studies, formalized studies need to be carried out to verify the findings. Therefore, a secondary objective of this study is to verify the findings of the exploratory studies. The research designs, instruments, and findings from the two exploratory studies serve as the foundations for this study.

In the two exploratory studies, only modeling experts participated. This study extends the two exploratory studies by investigating the differences between modeling experts and novices in their selection of connectivity. As in the second exploratory study, two types of information models were used: the first type consists of conflicting textual information and structural constraints, and the second type consists of non-conflicting textual information and structural constraints.
The theoretical foundation, hypotheses, experimental design, and results are discussed in the following sections.

6.2 Research Question

We use a well-known model from the communication literature, Joseph Luft's (1969) model of human interaction, to illustrate the communication problem between modeling experts (i.e., analysts) and novices (i.e., end-users) in information modeling. The model is known as the Johari Window, and it provides a basic introduction to the ideas of disclosure and understanding between communicating parties. The Johari Window is shown in Figure 6.1.

                       Known to self    Not known to self
Known to others        1 Open           2 Blind
Not known to others    3 Hidden         4 Unknown

Figure 6.1 Johari Window

The Johari Window consists of four quadrants. Quadrant 1, the open quadrant, contains all the aspects known to self and others. Quadrant 2, the blind quadrant, is known to others but not to self. The hidden quadrant is known to self but not to others. The fourth, unknown quadrant, is known neither to self nor to others. The aim of communication is to increase the size of quadrant 1 (i.e., the open quadrant).

Communication requires mutual knowledge (Habermas 1984). However, the analysts and end-users in information modeling do not have much common knowledge. The communication problems can be discussed, based on Luft's model, from two points of view — the analysts' and the end-users'.

The end-users are the experts in the application domain and the primary source of organizational information requirements. Based on their perceived information requirements, end-users convey informal statements about their information needs to analysts (Kim 1990). These information requirements, however, are unknown and unfamiliar to the analysts. The situation is aggravated by the fact that the requirements specified by the end-users are often ambiguous, inconsistent, incorrect, and incomplete (Ackoff 1967).
Thus, the communication of requirements from end-users to analysts is an attempt to move from quadrant 3 (i.e., the hidden quadrant) to quadrant 1 (i.e., the open quadrant) of the Johari Window. In other words, the information requirements are known to the end-users but unknown to the analysts. As a result of this knowledge gap, the information models specified and the systems subsequently developed often do not meet the end-users' information needs (as discussed in Chapter 1).

The analysts are trained and skilled in using the information modeling methods. The various modeling constructs employed in the models are meaningful and understandable to them. End-users, on the other hand, have little familiarity with the modeling methods (King & McLeod 1984) and are unaccustomed to the modeling constructs. End-users, therefore, fall into the blind quadrant (i.e., quadrant 2) with respect to knowledge of modeling methods and constructs. For end-users to validate the information models, they need to move from the blind quadrant (i.e., quadrant 2) to the open quadrant (i.e., quadrant 1). To facilitate such movement, many researchers have proposed the use of intuitive and natural modeling constructs for information modeling (e.g., Chen 1976, Coad & Yourdon 1991). This, they claim, will enable end-users to better understand the information depicted in an information model and to pinpoint incomplete or incorrect information in the model. However, as mentioned in Chapter 1, most modeling constructs have been introduced based on the "gut feelings" of researchers and practitioners, without much scientific evidence or merit. Moreover, constructs that are perceived by modeling experts to be natural and intuitive may appear artificial and abstract to novices. It is, thus, important to look at the usage of the relationship construct by both modeling experts and novices and to understand the differences (if any) between them.
The research question of this study is, thus, to experimentally investigate the differences between modeling experts and novices in their interpretation of information models.

6.3 Theoretical Foundation

6.3.1 Three-Stage Learning Model

The knowledge representation of experts versus novices has been investigated in a number of domains over the course of the last two decades. Cognitive psychologists (e.g., Chase & Simon 1973, Best 1992, Gagne et al. 1993) have argued that the knowledge of experts is probably organized differently from the knowledge of novices. A theory that describes the cognitive changes that occur in the evolution of expertise is the "Three-Stage Learning Model" proposed by Anderson (1982, 1995). The three stages in the model are (Fitts 1964, Anderson 1982): the cognitive stage, the associative stage, and the automatic stage.

Stage 1 Cognitive -> Stage 2 Associative -> Stage 3 Automatic

Figure 6.2 Three-Stage Learning Model

To understand these three stages, we utilize the concepts from the human-information processing paradigm introduced in Chapter 3:

(i) The cognitive stage is characterized by the discovery of relevant aspects of the task and the storage of declarative knowledge about the skills.

(ii) During the associative stage, skills are chunked, or compiled, into procedural knowledge.

(iii) In the final, automatic stage, the procedures underlying the basic skills undergo a process of continual refinement (i.e., tuning) and strengthening, which results in increased speed and accuracy of performance.

As an example of this transformation, consider a golfer. A novice golfer, in preparing to hit the ball, may verbalize: "Bend your knees," "Head down," "Keep your left arm straight," and so on. An expert golfer, however, executes these processes in split seconds and without conscious awareness.
Research has shown that experts in a specific domain have a better conceptual or functional understanding of the domain, have automated their basic skills in the domain, and possess domain-specific problem-solving strategies (Gagne et al. 1993). Conceptual understanding is housed in declarative knowledge in the form of images, propositions, and schemas. In solving a problem, conceptual understanding helps the problem solver develop a meaningful representation of the problem and narrow the search for solutions by matching the schema with the conditions of productions in procedural memory. Domain-specific skills and domain-specific strategies are housed in procedural knowledge, with domain-specific skills having more localized elements in their conditions and strategies having more global, goal-directed elements. Domain-specific strategies help make both the search process and the evaluation of the outcome faster than would otherwise be the case. Automated basic skills allow the problem solver to perform necessary, routine mental operations without thinking about them very much.

We briefly look at the differences between experts and novices reported in the information systems, HCI, computer science, and cognitive psychology literature.

6.3.2 Cognitive Differences Between Experts and Novices

Researchers such as Mayer (1988), Glaser and Farr (1988), Glaser (1987), and Gagne et al. (1993) have pointed out that domain experts, in contrast to novices, have the ability to perceive large meaningful patterns, possess highly procedural and goal-oriented knowledge, have less need for memory search and general processing, and possess specialized schemas that drive performance.
6.3.2.1 Pattern Recognition and Chunking

Studies of chess masters and players at different skill levels (DeGroot 1965, Chase & Simon 1973) indicated that the more experienced or expert players could recall a relatively large number of pieces after a brief look at the chessboard, while the less experienced players tended to recall only one or two pieces at a time. Moreover, expert-level players recall pieces in functional chunks that carry some meaningful classification label; the chunks recalled by less experienced players are less organized. In terms of representation, experienced players use bigger and more meaningful chunks of information, a strategy that supports more efficient storage and retrieval of information (Miller 1956).

Research in human-computer interaction (Card et al. 1983, Carroll et al. 1985, Mayer 1988) has also investigated the cognitive processes and knowledge representation of computer users. A number of researchers (e.g., Adelson 1981, McKeithen et al. 1980, Vessey 1988, Shaft & Vessey 1995) have contrasted expert and novice cognitive skills and knowledge representation in the following domains: recognition of proper syntax, use of conceptual models, program comprehension, recall, and debugging. As in other domains, issues of pattern recognition, specialized knowledge structures, and special types of procedural knowledge appear to be related to the characterization of expert knowledge and useful for explaining the differences between experts and novices. For example, Wiedenbeck (1985) presented expert and novice Fortran programmers with brief displays of computer code and asked them to indicate whether the code was syntactically correct or incorrect. Experts were 25% faster on this identification task and made 40% fewer errors than novices. Wiedenbeck concluded that experts had automated lower-level skills such as recognition and could allocate attentional resources to the higher-order tasks of programming.
Quick recognition of recurring syntax is also related to the automation that has been found in other realms, such as the reading of text (LaBerge & Samuels 1974).

6.3.2.2 Knowledge Organization

Glaser (1987) suggested that expertise in a domain is associated with possession of, and access to, a well-organized body of interrelated information, concepts, or declarative knowledge. Doane (1986) studied the knowledge organization of expert and novice UNIX users, presenting the subjects with a number of UNIX commands and asking them to construct graphs depicting their particular model of the UNIX structure. Experts and novices did not differ from one another in the organization of lower-level UNIX functions and commands, but experts seemed to have a more integrated model of the system and represented it in a more coherent hierarchy. Novices, in contrast, seemed to represent the system as unrelated fragments. Thinking-aloud studies (e.g., Ehrlich & Soloway 1984, Jeffries et al. 1981) have also produced evidence that experts represent system objects such as pointers and flags in a more useful fashion than novices, who seem to lack an understanding of these and similar system objects. These studies indicated that experts have an integrated, conceptual-model-like organization of knowledge, whereas the organization of novices is more fragmented and less defined.

6.3.2.3 Speed in Skill Execution

Glaser and Chi (1988) identified two reasons for the faster performance of experts. First, experts are quicker at performing the "basic" skills of the domain by virtue of their many more hours of practice. Through practice, experts have automated the lower-level skills, and this automation frees up working memory and enables the experts to think about other aspects of the problem at hand. Second, experts sometimes solve problems more quickly through "opportunistic" reasoning (Gagne et al. 1993) or short-cuts.
Opportunistic reasoning is reasoning that is not part of the initial plan or problem representation, but arises as more information is gathered. For example, expert electronics troubleshooters may not plan every move before they start troubleshooting; during testing of various components, they may formulate a hypothesis that fits the information available so far and that leads quickly to a solution (Means & Roth 1988). Overall, the literature and empirical studies have shown that experts are at a different stage in the learning model than novices, and that they employ different problem-solving strategies because of their experience and training.

6.4 Research Framework

The research design for this study employs the expert-novice paradigm (Gagne et al. 1993) that is widely used in this area of research. In the expert-novice paradigm, modeling experts and novices are selected and given the same set of problems, and the two groups are then compared in terms of their performance (Gagne et al. 1993). The use of such psychological studies has emerged as an important aid to early design in HCI (e.g., Mayer 1979, Card et al. 1983). This study employs a 2 X 2 mixed factorial design with a repeated measure on one factor (i.e., Relationship) and independent group measures on the other (i.e., Modeling Expertise). This is also known as the split-plot design (Keppel 1982). The research framework is summarized in Figure 6.3.

    Modeling Expertise    Relationship                      Dependent Variables
    -- Expert             -- Conflicting Relationship       -- Choice of Connectivity
    -- Novice             -- Non-Conflicting Relationship   -- Confidence in Interpretation
                                                            -- Perceived Familiarity with Domain

    Figure 6.3 Research Framework

6.4.1 Independent Variables

6.4.1.1 Modeling Expertise

The first independent variable is modeling expertise, which consists of two levels — expert and novice. Patel and Groen (1991, p.
96) defined six categories of expertise:

Layperson — An individual who has only common sense or everyday knowledge of the domain
Beginner — An individual who has the prerequisite knowledge assumed by the domain
Novice — A layperson or a beginner
Intermediate — Anyone who is above the beginner level but below the subexpert level
Subexpert — An individual with generic knowledge, but inadequate specialized knowledge, of the domain
Expert — An individual with specialized knowledge of the domain

The above categorization makes a clear distinction between two types of novices — layperson and beginner. With respect to information modeling, a layperson is one who does not know the rules of the modeling method, whereas a beginner does. Information modeling is a domain-specific skill; in other words, subjects need some prerequisite knowledge of the modeling method. For example, in the case of ER modeling, subjects need to understand that rectangles represent entities and diamonds represent relationships. Thus, because of the training given as part of the experiment, a modeling novice in our case is defined as a beginner rather than a layperson.

For modeling experts, we recruited students in the Commerce Faculty who were completing a Systems Analysis and Design class. These students had studied and practiced the interpretation and construction of the modeling method used in our study. The choice of students as subjects allowed us better control of the subject population and ease of access to a substantial pool of subjects with specialized knowledge and experience in information modeling. It also enabled us to have populations of experts and novices who were similar in many other respects. In practice, modelers will usually have considerably more experience; however, we believe that our choice was justified by the nature of the experimental task. As mentioned above, Patel and Groen (1991) distinguished between experts and subexperts.
The difference between the two categories hinges on the notion of "adequacy of specialized knowledge." What counts as adequate specialized knowledge depends on the task and the knowledge required to perform it. The experimental task in our study is a fairly simple interpretation task. For this task, specialized knowledge of the ER model and some experience in interpreting and constructing information models suffice. Thus, the benefits of having a homogeneous subject population of sufficient size outweigh the benefits of recruiting practitioners with extensive (e.g., 10-20 years) systems analysis experience. Moreover, the difference between experts and novices depends on the amount of practice: according to the three-stage learning model, more experience leads to more automated behavior, and Best (1992, p. 517) argues that expertise "is acquired directly through experience." If significant differences can be found between our experts and novices, the differences are likely to be even more pronounced if practitioners, who typically have more modeling experience, were used as subjects. Thus, we believe our choice of experts does not jeopardize the external validity of the conclusions.

To summarize, we operationalized novice and expert as follows:

Novice — An individual who has some prerequisite knowledge to understand the information models but has no information modeling experience
Expert — An individual who has the specialized knowledge and classroom experience to interpret and construct information models

In this study, modeling novices were recruited from second-year Commerce students who had not taken any courses in MIS or related areas that might have exposed them to systems analysis and design methods. These subjects were given a reading training set to acquire the knowledge required for interpreting the information models. Modeling experts were recruited from the Systems Analysis and Design class.
At the time of recruitment, these students had completed 10 of the 12 weeks of the course and had acquired the specialized knowledge and classroom experience (including assignments and projects) to utilize the modeling method used in this study.

6.4.1.2 Relationship

The second independent variable is relationship, which can be either conflicting or non-conflicting. As in the second exploratory study, a conflicting relationship is one in which the structural constraints and the textual information are in conflict; for a non-conflicting relationship, there is no such conflict. Using the same examples illustrated in Chapter 5, the first diagram below is conflicting since the textual information tells us that a parent must have children, but the structural constraints suggest an optional relationship. The second diagram is non-conflicting since a scientist need not teach any courses.

    [Diagram: parent-children relationship with an optional structural constraint]  Conflicting Relationship
    [Diagram: Scientist (0,*) Teaches Course]                                       Non-Conflicting Relationship

    Figure 6.4 Examples of Conflicting and Non-Conflicting Relationships

In this study, relationship is a repeated-measures (also known as within-subjects) factor: all subjects in the expert and novice groups encountered both conflicting and non-conflicting relationships. This mixed factorial design has the strength of having each subject serve as his/her own control and is more sensitive (for the within-subjects factor) than a corresponding completely randomized design (Keppel 1982).

6.4.2 Dependent Variables

The dependent variables in this study are: (i) choice of interpretation, i.e., whether the connectivity for a relationship is identified as mandatory or optional, (ii) confidence in interpretation, and (iii) perceived familiarity with domain. As in the exploratory studies, subjects were presented with a choice of Must or May (i.e., mandatory or optional) connectivity to categorize the relationship. The choice of interpretation was captured using a multiple-choice question. These two choices (i.e., Must and May) were randomly
ordered to counter a possible order bias. Confidence in interpretation and perceived familiarity with domain were measured on scales ranging from 1 to 7, with 1 being the lowest and 7 the highest. Note that the perceived domain familiarity we measured refers to familiarity with the application domain, not familiarity with the modeling method. This question was carefully phrased and pilot-tested to ensure that subjects interpreted it correctly.

To sensitize the subjects to the textual information, we placed two similar diagrams on each page, one above the other. The diagram at the top had no structural constraints, whereas the diagram at the bottom included the structural constraints (see Appendix F for the question format and the set of questions used in the study). The subjects were told explicitly to work on the question at the top first. The research question concerns the use of structural constraints and textual information by modeling experts and novices, and their preferences. The top diagram directs the subjects' attention to the textual information. This is necessary because, as discovered in the second exploratory study, modeling experts tended to ignore the textual information (by focusing almost entirely on the structural constraints). In addition, the top diagram served as the control — the modeling experts and novices should not differ in their responses for these diagrams.

6.4.3 Subject Characteristics

The subjects in this study were Commerce students. The experiment was conducted over a period of two weeks. Experts were recruited from a Systems Analysis and Design course. Novices were recruited through postings in the Commerce building. The postings stated that subjects must not have taken any courses in MIS and must have no prior experience in Systems Analysis and Design.
A demographic information sheet was filled in by each subject prior to the start of the experiment to verify that these criteria were met. The subjects were given a remuneration of $10 for their participation. The average length of participation was about 40-45 minutes.

Based on effect sizes estimated from the two exploratory studies, an effect size of f = .4 is assumed for this study. With a 2 X 2 full factorial design, a sample size of 56 subjects is required to detect an effect size of f = .4 at a = .05 with a desired power of .8 (Cohen 1988). The power of .8 is recommended by Cohen and Cohen (1983), who suggest, "in the absence of some preference to the contrary, that power be set at .80. This value falls in the middle of the .70-.90 range and is a reasonable one to use as a convention when such is needed." Since a 2 X 2 mixed design is more sensitive (for the within-subjects factor) than a full factorial design, a sample size of 56 subjects is more than adequate for this study.

6.5 Experimental Procedures

As in the second exploratory study, familiar problem domains such as university course selection and enrollment were used for all the questions. Domain characteristics were controlled by presenting the two groups with the same set of questions. A total of 40 questions were designed based on observations in the two exploratory studies, and these questions were pilot-tested over a period of one month. Another objective of the pilot testing was to determine the "natural" connectivity of the relationships in these questions (i.e., the connectivity that is selected when the structural constraints are missing). Weak questions (i.e., questions where the choice of interpretation was not consistent among subjects) were filtered out of the set. In the final questionnaire, half of the diagrams had "natural" mandatory relationships and the other half had "natural" optional relationships.
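As an illustrative aside (not part of the original analysis, which predates these tools), the sample-size figure quoted above from Cohen (1988) can be checked numerically using the noncentral F distribution. The sketch below is ours, not Cohen's: it assumes a full-factorial error term (df = N - 4 for the 2 X 2 design) and the standard noncentrality parameter lambda = f^2 * N.

```python
# Numerical check of the power analysis quoted in Section 6.4.3: N = 56 should
# give power of at least .8 for a main effect (numerator df = 1) with effect
# size f = .4 at alpha = .05 in a 2 x 2 between-subjects design.
from scipy.stats import f as f_dist, ncf

def anova_power(f_effect, n_total, df_num, n_cells, alpha=0.05):
    """Power of an F test, given Cohen's effect size f (sketch, our assumption)."""
    df_den = n_total - n_cells          # error df for a full factorial design
    nc = (f_effect ** 2) * n_total      # noncentrality parameter lambda
    crit = f_dist.ppf(1 - alpha, df_num, df_den)
    return 1 - ncf.cdf(crit, df_num, df_den, nc)

print(anova_power(f_effect=0.4, n_total=56, df_num=1, n_cells=4))
```

With these inputs the computed power comes out slightly above the .8 target, consistent with the claim that 56 subjects are adequate for the full factorial (and more than adequate for the more sensitive mixed design).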
The training materials were also modified slightly from those used in the exploratory studies, based on feedback and observations gathered during the pilot test. The revised training set is shown in Appendix E. After a series of pilots, a total of 20 questions were selected for the actual study (the questions are shown in Appendix F). The set of 20 questions was given to all subjects. Of the 20 questions, 10 were conflicting and 10 non-conflicting. Within the conflicting and non-conflicting groups, 5 questions were pilot-tested to be mandatory and 5 optional. The questions were randomly ordered so that each subject received a uniquely ordered questionnaire. Attached to the beginning of each questionnaire was an instruction set that described the basic ER modeling constructs (e.g., entity, relationship, structural constraints). No discussion was allowed, as the training set was designed to be self-explanatory.

6.6 Research Hypotheses

Both modeling experts and novices were recruited for this study. Because experts have formed highly specialized strategies for interpreting information models, we posit that they are likely to focus more on the structural constraints depicted in the information models and less on the textual information (as found in the second exploratory study). For example, Gagne et al. (1993) stated that automated skills are executed almost unconsciously when certain conditions are met. Logan (1988) characterized automation as fast, effortless, autonomous, stereotypic, and unavailable to conscious awareness. These observations are consistent with the findings from the second exploratory study, where the experts simply focused on structural constraints. Modeling novices, on the other hand, have not automated the skills for interpreting information models.
We suggest that novices, at the cognitive or associative stage of the learning model, pay attention to both the structural constraints and the textual information. We therefore hypothesize that:

Hypothesis 1: Experts and novices differ in their choice of connectivity when interpreting conflicting information models.

For non-conflicting information models, there should be no difference between the two groups in their choice of connectivity. The experts will interpret the models based on the structural constraints. The novices will interpret the models based on both the textual information and the structural constraints, which are consistent with one another. Therefore, the interpretation of non-conflicting information models should be the same for both modeling experts and novices.

Hypothesis 2: Experts and novices do not differ in their choice of connectivity when interpreting non-conflicting information models.

This study deals with a domain-specific problem-solving task where the domain is information modeling. Modeling experts, being specialists in information modeling, have automated their basic skills, possess a better conceptual understanding of the task, and utilize highly specialized model-interpretation strategies. They will, therefore, be very confident in their responses. Modeling novices, on the other hand, are still learning the basic skills of interpreting information models. Thus, we hypothesize that modeling novices are less confident in their interpretation than modeling experts.

Hypothesis 3: Experts are more confident than novices in their interpretation of information models.

Experts, because of their expertise in the modeling method, have little difficulty in understanding information models.
They have automated the lower-level skills, such as interpreting modeling constructs and structural constraints, and can allocate more resources to the higher-order tasks of understanding the information depicted by the models and putting it in context. Experts are also superior in the perception of meaningful patterns (Gagne et al. 1993). Because modeling experts can perceive meaningful information from the models, they can organize these bits of information into larger groups, which increases their understanding of and familiarity with the domain. Modeling novices, on the other hand, are unfamiliar with the modeling method. The various pieces of information in the model might appear to them as unrelated fragments. Most of their time and "mental energy" is spent on lower-level processes such as identifying and decoding modeling constructs, which hinders their effort to understand the information models. Because modeling experts can easily and quickly perceive meaningful information from the models and put it in context, we hypothesize that:

Hypothesis 4: Experts perceive a higher level of domain familiarity than novices.

Because of their automated skills, modeling experts tend to select the connectivity based on the structural constraints. They tend to downplay the textual information in both the conflicting and non-conflicting models. As a result, their confidence level is consistently high for both conflicting and non-conflicting models. Modeling novices, on the other hand, pay more attention to the textual information than experts. When the textual information and structural constraints are in conflict, as in the case of conflicting models, modeling novices ponder over the conflict and register lower confidence. We therefore hypothesize an interaction effect between expertise and relationship (i.e., conflicting and non-conflicting) on confidence level.
Hypothesis 5: There is an interaction effect between expertise and relationship (conflicting and non-conflicting) on level of confidence. The difference in level of confidence between experts and novices for conflicting information models is greater than that for non-conflicting information models.

Similarly, we expect an interaction effect between expertise and relationship on perceived domain familiarity. When interpreting conflicting information models, modeling novices will deliberate over the two pieces of conflicting information (i.e., textual information and structural constraints). Modeling experts, on the other hand, might de-emphasize the textual information or create a scenario that eliminates the conflict (e.g., professors must own a car because they work for a special university). Conflicting information models thus might lower the perceived domain familiarity of modeling novices more than that of modeling experts. Thus, we hypothesize that:

Hypothesis 6: There is an interaction effect between expertise and relationship (conflicting and non-conflicting) on perceived domain familiarity. The difference in perceived domain familiarity between experts and novices for conflicting information models is greater than that for non-conflicting information models.

6.7 Experimental Results

6.7.1 Subjects

A total of 25 experts and 26 novices participated in the experiment. Two expert subjects attempted to find out the research hypotheses by asking the researcher several questions about how to answer the questions. The data from these two subjects were discarded, leaving us with 23 expert and 26 novice subjects. Although the total number of subjects (i.e., 49) is lower than the 56 derived from the power analysis, it should be sufficient to detect differences, as the sample size of 56 was derived for a 2 X 2 factorial design. The design for this study is a two-factor experiment with a repeated measure on the relationship factor.
The repeated-measures design is more powerful than the 2 X 2 factorial design for the repeated-measures factor. The number of observations for each group is shown in Table 6.1.

               No. of Subjects   No. of Questions* Per Subject   No. of Observations
    Experts          23                      20                          460
    Novices          26                      20                          520

    *Note that each question has two parts — the top diagram without structural
    constraints and the bottom diagram with structural constraints.

    Table 6.1 Number of Subjects and Observations for Each Group

6.7.2 Control — Questions with No Structural Constraints

The first part of each question (i.e., the top diagram without structural constraints) serves as the control. By placing the (same) diagram without structural constraints at the top of the page (i.e., the first part of the question), we attempt to sensitize the subjects to the textual information. This is to overcome the phenomenon discovered in the second exploratory study, where experts simply focused on the structural constraints and overlooked the textual information.

6.7.2.1 Choice of Connectivity

For the first part of the question (where the structural constraints are absent), the choice of connectivity between experts and novices should not differ. The chi-square statistic is X2 (1, N = 980) = 3.464 with p > 0.06, which does not reach our critical level of p < 0.05. Thus, there is no significant difference between experts and novices in their choice of connectivity when the information models are presented without structural constraints. Differences in their choice of connectivity in the second part of the question can then be attributed to the effect of the manipulation.

    Frequency   Must Choice   May Choice   Total
    Expert          287           173        460
    Novice          294           226        520
    Total           581           399        980

    Table 6.2 Counts for Experts and Novices

6.7.2.2 Confidence Level and Perceived Domain Familiarity

Analyses of variance were performed on confidence level and perceived domain familiarity. The between-subjects independent variable was Expertise.
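As an illustrative aside (the original analysis predates these tools), the chi-square statistics reported for the control above and for the treatment in Section 6.7.3.1 (Tables 6.2, 6.5, and 6.6) can be reproduced from the published counts. The sketch below uses scipy; Yates' continuity correction is disabled, since the reported values match the uncorrected statistic.

```python
# Recompute the reported chi-square statistics from the published 2 x 2 counts
# (rows: expert, novice; columns: Must choice, May choice).
from scipy.stats import chi2_contingency

tables = {
    "control (Table 6.2)":         [[287, 173], [294, 226]],   # reported 3.464
    "conflicting (Table 6.5)":     [[120, 110], [108, 152]],   # reported 5.548
    "non-conflicting (Table 6.6)": [[114, 116], [121, 139]],   # reported 0.448
}
for name, counts in tables.items():
    chi2, p, df, _ = chi2_contingency(counts, correction=False)
    print(f"{name}: chi2({df}) = {chi2:.3f}, p = {p:.3f}")
```

The recomputed values agree with the reported statistics to three decimal places.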
The results show that the confidence level of experts is not significantly different from that of the novices (p > 0.45). However, the perceived domain familiarity of experts is significantly higher than that of the novices (p < 0.0001).

    Dependent Variable              Effect       DF     MS       F      SigF
    Confidence Level                Expertise     1     1.26    0.56   0.4533
                                    Error       978     2.23
    Perceived Domain Familiarity    Expertise     1   286.56   99.86   0.0001
                                    Error       978     2.87

    Table 6.3 Analyses of Variance for Confidence Level and Perceived Domain
    Familiarity for Control (No Structural Constraints)

Since the confidence level for the control (without structural constraints) is not significantly different between experts and novices, we can attribute any differences detected in the second part of the question to the manipulation (conflicting and non-conflicting relationships). On the other hand, because perceived domain familiarity is significantly different between experts and novices in the first part, we have to be very careful in interpreting the results for perceived domain familiarity in the second part, as the differences (if any) could be confounded.

              Confidence Level     Perceived Domain Familiarity
              Mean (Std. Dev.)     Mean (Std. Dev.)
    Expert      5.57 (1.67)          5.97 (1.49)
    Novice      5.50 (1.32)          4.89 (1.86)

    Table 6.4 Means and Standard Deviations for Confidence Level and Perceived
    Domain Familiarity for Control (No Structural Constraints)

The next sections discuss the results for the second part of the questions, which is called the treatment (to distinguish it from the control).

6.7.3 Treatment — Questions with Structural Constraints

6.7.3.1 Choice of Connectivity

There is a significant difference between experts and novices in their choice of connectivity when there is a conflict. The chi-square statistic is X2 (1, N = 490) = 5.548 with p < 0.018. Hypothesis 1 is, therefore, supported. Table 6.5 shows that experts, when compared to novices, chose more mandatory connectivity (i.e., Must) than optional connectivity (i.e., May).
Novices, on the other hand, chose more optional connectivity than mandatory connectivity. For the experts, mandatory connectivity accounted for 52% of the choices and optional connectivity for 48% — an almost even distribution between the two connectivities (note that there were equal numbers of mandatory and optional structural constraints in the questionnaires). The majority of novice choices were for the optional connectivity, which accounted for 59% of their choices.

    Frequency   Must Choice   May Choice   Total
    Expert          120           110        230
    Novice          108           152        260
    Total           228           262        490

    Table 6.5 Counts for Experts and Novices for Conflicting Questions

Table 6.6 shows that for the information models without conflict, there is no significant difference between experts and novices in their choice of connectivity. The chi-square statistic is X2 (1, N = 490) = 0.448 with p > 0.5. Hypothesis 2 is, therefore, supported.

    Frequency   Must Choice   May Choice   Total
    Expert          114           116        230
    Novice          121           139        260
    Total           235           255        490

    Table 6.6 Counts for Experts and Novices for Non-Conflicting Questions

Table 6.7 shows the choice of connectivity for each question. Experts and novices differ for conflicting information models. This is especially obvious for questions 6-10, where the textual information of these models depicts a Must connectivity whereas the structural constraints indicate a May connectivity. For these questions, all of the experts interpreted the relationships based on the structural constraints. It is plausible that the modeling experts made up scenarios to convince themselves that the structural constraints were correctly depicted (as suggested in Chapter 5); they placed a lot of faith in the correctness of the structural constraints. A sizable number of novices (ranging from 5 to 11 per question), on the other hand, decided that the connectivity was mandatory despite the fact that the structural constraints depicted an optional connectivity.

                    Structural    No. of May Choices    No. of Must Choices
                    Constraints   Expert    Novice      Expert    Novice
    Conflicting
    Q1                (1,*)          1         3           22        23
    Q2                (1,*)          1         4           22        22
    Q3                (1,*)          1         3           22        23
    Q4                (1,*)          1         3           22        23
    Q5                (1,*)          1         3           22        23
    Q6                (0,*)         23        15            0        11
    Q7                (0,*)         23        18            0         8
    Q8                (0,*)         23        20            0         6
    Q9                (0,*)         23        18            0         8
    Q10               (0,*)         23        21            0         5
    Non-Conflicting
    Q11               (1,*)          0         0           23        26
    Q12               (1,*)          0         1           23        25
    Q13               (1,*)          0         1           23        25
    Q14               (1,*)          1         0           22        26
    Q15               (1,*)          0         2           23        24
    Q16               (0,*)         23        22            0         4
    Q17               (0,*)         22        24            1         2
    Q18               (0,*)         22        22            1         4
    Q19               (0,*)         23        26            0         0
    Q20               (0,*)         23        23            0         3

    Table 6.7 Choice of Connectivity by Question

6.7.3.2 Confidence Level and Perceived Domain Familiarity

Analyses of variance were performed on the dependent measures of confidence level and perceived domain familiarity. The results are shown in Tables 6.8 and 6.9.

                      Effect                   DF      MS       F      SigF
    Between Subjects  Expertise                 1    106.93    8.18   0.0063
                      Error                    48     13.08
    Within Subjects   Relationship              1     22.35   12.39   0.0010
                      Expertise*Relationship    1      7.74    4.29   0.0438
                      Error                    47      1.80

    Table 6.8 Analyses of Variance for Confidence Level

                      Effect                   DF      MS       F      SigF
    Between Subjects  Expertise                 1    176.84   11.72   0.0013
                      Error                    48     15.09
    Within Subjects   Relationship              1    662.89   48.83   0.0001
                      Expertise*Relationship    1    0.0004    0.00   0.9955
                      Error                    47     13.58

    Table 6.9 Analyses of Variance for Perceived Domain Familiarity

For confidence level, the results show that modeling experts and novices are significantly different (p < 0.0063), and there is a significant difference between conflicting and non-conflicting relationships (p < 0.0010). There is also an interaction effect between expertise and relationship (i.e., conflicting or non-conflicting) (p < 0.0438). Hypothesis 5 is, therefore, supported. For perceived domain familiarity, both expertise and relationship are statistically significant (p < 0.0013 and p < 0.0001 respectively), but the interaction effect is not significant (p = 0.9955). Hypothesis 6 is, therefore, not supported. The means and standard deviations are depicted in Table 6.10.

                       Confidence Level     Perceived Domain Familiarity
                       Mean (Std. Dev.)     Mean (Std. Dev.)
    Expert               6.67 (0.73)          5.54 (1.95)
    Novice               6.01 (1.38)          4.69 (1.94)
    Conflicting          6.17 (1.32)          4.27 (2.17)
    Non-Conflicting      6.47 (0.97)          5.91 (1.36)

    Table 6.10 Means and Standard Deviations

Table 6.10 shows that experts have a higher confidence level than novices, and they also have higher perceived domain familiarity. Hypotheses 3 and 4 are, therefore, supported. Table 6.11 depicts the confidence level and perceived domain familiarity for each individual question.

                  Confidence Level              Perceived Domain Familiarity
                  Mean (Std. Dev.)              Mean (Std. Dev.)
                  Expert        Novice          Expert        Novice
    Conflicting
    Q1            6.61 (1.08)   6.04 (1.11)     5.22 (1.91)   4.77 (1.42)
    Q2            6.61 (0.94)   5.73 (1.61)     3.87 (2.56)   3.00 (1.85)
    Q3            6.78 (0.42)   5.54 (1.70)     4.78 (2.33)   3.42 (1.60)
    Q4            6.74 (0.54)   5.92 (1.55)     5.26 (2.14)   5.04 (1.75)
    Q5            6.78 (0.52)   5.92 (1.62)     5.04 (2.14)   3.54 (2.10)
    Q6            6.48 (0.85)   5.54 (1.73)     4.22 (2.26)   4.23 (2.34)
    Q7            6.52 (0.79)   6.19 (1.23)     4.91 (2.31)   4.42 (2.00)
    Q8            6.52 (0.99)   6.04 (1.18)     4.61 (2.46)   3.62 (2.12)
    Q9            6.57 (0.73)   5.54 (1.70)     5.00 (2.11)   3.50 (1.70)
    Q10           6.52 (0.85)   5.27 (1.89)     4.30 (2.49)   3.15 (2.09)
    Non-Conflicting
    Q11           6.74 (0.62)   6.50 (0.81)     6.57 (0.79)   5.88 (1.53)
    Q12           6.70 (0.76)   6.12 (1.24)     6.35 (0.98)   5.35 (1.57)
    Q13           6.83 (0.39)   6.38 (0.94)     6.57 (0.79)   5.15 (1.67)
    Q14           6.74 (0.69)   6.23 (0.95)     6.65 (0.65)   5.46 (1.68)
    Q15           6.65 (1.07)   6.42 (0.90)     6.43 (1.16)   5.92 (1.29)
    Q16           6.74 (0.54)   6.46 (1.07)     6.43 (0.79)   5.92 (1.23)
    Q17           6.74 (0.54)   5.96 (1.59)     6.39 (1.34)   5.88 (1.34)
    Q18           6.78 (0.52)   6.23 (1.07)     6.39 (0.78)   5.73 (1.28)
    Q19           6.65 (0.65)   6.08 (1.16)     5.74 (1.71)   5.08 (1.44)
    Q20           6.70 (0.63)   6.04 (1.43)     6.13 (0.97)   4.77 (1.31)

    Table 6.11 Confidence Level and Perceived Domain Familiarity by Question

As can be seen from the table and Figure 6.5, the experts have higher confidence levels for all the questions.
Also, the confidence levels of the experts are fairly consistent across the questions, whereas novices have higher confidence levels for non-conflicting than for conflicting questions. As a result, the differences between experts and novices are larger for the conflicting questions (i.e., Q1-Q10) than for the non-conflicting questions (i.e., Q11-Q20).

    [Line graph of mean confidence level by question (Q1-Q20, conflicting then
    non-conflicting) for experts and novices]

    Figure 6.5 Confidence Levels of Experts and Novices

Figure 6.6 shows that experts also have higher perceived domain familiarity than novices for all the questions (except Q6, where the perceived domain familiarity is about the same for experts and novices). We should, however, be careful not to read too much into this result, as the experts and novices are significantly different in perceived domain familiarity even when structural constraints are not given. For both experts and novices, the perceived domain familiarity is higher for non-conflicting information models (i.e., Q11-Q20) than for conflicting information models (i.e., Q1-Q10).

    [Line graph of mean perceived domain familiarity by question (Q1-Q20,
    conflicting then non-conflicting) for experts and novices]

    Figure 6.6 Perceived Domain Familiarity of Experts and Novices

Analyses of variance also indicate a significant interaction effect between expertise and relationship on confidence level (p < 0.05) but no significant interaction effect between expertise and relationship on perceived domain familiarity (p > 0.99). Thus, hypothesis 5 is supported whereas hypothesis 6 is rejected. Figures 6.7 and 6.8 depict the interaction graphs for confidence level and perceived domain familiarity.

    [Interaction graph: mean confidence level for conflicting vs. non-conflicting
    models, plotted separately for experts and novices]

    Figure 6.7 Interaction Graph for Confidence Level

    [Interaction graph: mean perceived domain familiarity for conflicting vs.
    non-conflicting models, plotted separately for experts and novices]

    Figure 6.8 Interaction Graph for Perceived Domain Familiarity

Figure 6.7 shows that the difference between experts and novices is larger for conflicting information models than for non-conflicting information models. On the other hand, Figure 6.8 shows that the two lines are almost parallel — indicating that there is no interaction effect for perceived domain familiarity.

6.8 General Discussion

The results of this study show that experts and novices differ in their interpretation of information models. This is consistent with the findings of other behavioral research, which found that humans are not unbiased in their selection and use of data (Davis 1974, 1987, Tversky & Kahneman 1974, 1986). For example, in the field of HCI, Gentner & Grudin (1996, p. 28) write that "[designers'] inner views and biases are unconsciously reflected in the types of user interfaces they construct."

6.8.1 Explanation of Findings

Modeling experts and novices are not significantly different in their choice of interpretation when structural constraints are absent. Both groups, in this case, had to interpret the information models based on the textual information (i.e., wordings) available. There is also no significant difference between the two groups when the structural constraints are consistent with the textual information. The results, however, show that experts and novices differ significantly when the structural constraints contradict the textual information. Almost all the experts follow the structural constraints depicted in the information model, even in cases where the structural constraints apparently contradict the textual information.
This finding is the same as the results obtained from the second exploratory study, where expert subjects were found to simply interpret the information models based on the structural constraints. Novices also tend to interpret the information models based on the given structural constraints (even when the structural constraints are in apparent contradiction with the textual information), but to a significantly lesser extent than experts.

As for confidence level, the experts and novices are not significantly different in the control (the first part of the question, where there are no structural constraints). However, the confidence level of experts is significantly higher than that of novices in the treatment (the second part of the question, where there are structural constraints). This shows that experts are more confident than novices in their interpretation when structural constraints are present, irrespective of whether the structural constraints are contradictory to the textual information. This finding, similar to the second exploratory study, indicates that experts place a heavier weight on structural constraints than on textual information in their interpretation.

As explained in Chapter 5, one possible explanation is that modeling experts placed a lot of faith in the structural constraints. They could conceivably have made up scenarios (however far-fetched these might be) to convince themselves that the structural constraints were correctly depicted (e.g., parents need not have children because the children died). The behavior of ignoring the textual information given can be termed "textual information negligence" or "textual information fallacy." This is similar to the notion of base rate fallacy coined by Tversky and Kahneman (1974) in their study of cognitive biases. Research in behavioral decision making has identified a number of cognitive biases associated with human judgment and choice (see Einhorn & Hogarth 1981, Hogarth 1980, Kahneman et al.
1982, Nisbett & Ross 1980) such as base rate fallacy, attentional bias, layout of data, recency, and concreteness. It has been shown that even scientists, skilled in statistics, are susceptible to base rate fallacy (Tversky & Kahneman 1982). Thus, it is not surprising that modeling experts in our case, analogous to the expert statisticians in Tversky and Kahneman's experiment, are prone to textual information fallacy. We could also term this textual information fallacy "attentional bias": focusing almost exclusively on structural constraints and ignoring the textual information.

Novices, on the other hand, are less susceptible to ignoring the textual information. This finding can be explained, at least partly, by the three-stage learning model we discussed at the beginning of this chapter. Novices, in our case, were introduced to the information modeling method prior to the experiment; the necessary declarative and production knowledge were introduced in the training materials. During the experiment, novices were likely to be at the associative stage of the learning model (i.e., trying to associate the relevant concepts introduced in the training materials to interpret the information model). Because they were at the associative stage, they needed to associate all the information depicted in the information model, looking at both the textual information and the structural constraints. Experts, on the other hand, are likely to be at the automated stage of the learning model. At the automated stage, processes run without any conscious allocation of attention (Best 1992). Research shows that automated skills are executed almost unconsciously when certain conditions are met and that automation is fast, effortless, autonomous, stereotypic, and unavailable to conscious awareness (Logan 1988, Gagne et al. 1993, Anderson 1995).
Thus, these modeling experts, who are trained in interpreting information models and who have formed highly specialized strategies for interpreting them, simply attended to the structural constraints depicted in the information models and ignored the textual information depicted by the models. For these modeling experts, their production rules might have been "fine-tuned" to base the decision on mandatory versus optional connectivity solely on the structural constraints available, as they usually do, and to omit the examination of textual information. Anderson (1995, p. 319) stresses that at the autonomous stage, "the skill becomes continuously more automated and rapid, and cognitive involvement is gradually eliminated." Conversations with a few modeling experts after the second exploratory study, as well as after this study, revealed that they simply ignored the wordings, a seemingly established pattern.

Because the modeling novices and experts are at different stages of the learning model, their production rules for interpreting information models are different. The production rule that ensures the textual information and structural constraints are consistent does not appear to exist in the production sets of the experts. Novices, by comparison, seem to realize the contradiction. In other words, the experts appear to have eliminated that production rule (i.e., checking the consistency of textual information and structural constraints) from their production sets because, in general, the textual information and structural constraints are in agreement.

The tendency of the experts to focus on structural constraints can also be understood from the general use of information models, such as the ER model, as a database design tool. It is very common to perceive ER modeling as a means to address the underlying database structure and implementation issues. There is nothing wrong with this view of the ER model as a database design tool.
The ER model is, indeed, a good way to specify the database structure. Structural constraints, in database design, specify the referential and integrity constraints of the database. Textual information in the model, on the other hand, usually translates into names of database files or structures (for example, in translating an ER model to a relational schema, names of entities and relationships often become names of relations in the relational database). Expert subjects could, therefore, have regarded structural constraints as more important than textual information, especially if they viewed the information model as a database schema. However, the focus on database structure or implementation issues during information modeling can be detrimental to the objectives of obtaining accurate and correct requirements specifications, and of facilitating the communication process between analysts and end-users (as stated in Chapters 1 and 2). Such arguments have also been put forth by researchers such as Embley et al. (1995) and Rumbaugh et al. (1991). For example, Martin and Odell (1992) stressed that a model should be built in such a way that it helps both the analysts and end-users to understand reality. Design and implementation considerations should come only after requirements are correctly specified.

6.8.2 Implications for Information Modeling

We can infer from the results of this study that modeling experts and novices are significantly different in their cognitive processing of the information depicted in information models. Recalling that information modeling is carried out for the purpose of understanding and communicating requirements specifications (Mylopoulos 1992), these cognitive differences might be detrimental. To use the "blind men and the elephant" analogy, the one who felt the elephant's tail had quite a different view of the elephant from the one who felt its trunk.
Therefore, the analysts and end-users, during the information modeling phase, may have different views of the same information model and hence different understandings of the requirements. Referring back to the Johari window introduced at the beginning of this chapter, it appears that the problems in requirements specification are more complicated than they first appear. We argued that, based on the Johari window, communication between analysts and end-users is an attempt to move to the open quadrant of the window. For the analysts, the organizational or end-users' requirements are hidden from them. For the end-users, the information modeling methods and constructs are also in the hidden quadrant. Even if the analysts know the organizational requirements and the end-users understand the modeling methods and constructs, there is still the problem of cognitive differences between analysts and end-users in their use of the information presented in information models. Analysts tend to focus more on the structural constraints than on the textual information in the information model. These cognitive biases, however, can possibly be remedied.

6.8.3 Possible Remedies

These biases can interfere with the information modeling process and affect the quality of requirements specifications. It is, therefore, important for analysts and end-users to be aware of, and understand, their biases so that they can be compensated for (Davis 1988). Running experiments such as this one helps in identifying the cognitive biases involved in information modeling. Debiasing techniques include training and education to raise awareness. This is especially important for analysts as they are supposed to be the experts in the information modeling process. As information systems professionals, analysts can afford the time and energy to learn the modeling methods, whereas the same cannot be said of end-users.
Analysts, therefore, should ensure that these cognitive biases and cognitive differences between modeling experts and novices do not jeopardize the information modeling process. It is important for analysts to understand that structural constraints and textual information are both critical in information modeling. In a recent workshop [1] (the Workshop on Evaluation of Modeling Methods for Systems Analysis and Design) organized by the author in Crete, it was clearly evident that most of the participants (mainly with strong technical backgrounds) placed more emphasis on the structural constraints than on the textual information. One participant even argued that he would prefer to use "A", "B", and "C" rather than "Parents", "Own", and "Child." He was looking at the information modeling process as a preface to database design. He had forgotten that another critical, if not more important, objective of information modeling is to correctly and accurately depict the organizational requirements. This misconception can be corrected by training and education.

[1] The Workshop on Evaluation of Modeling Methods for Systems Analysis and Design was held in Crete, Greece, May 20-21, 1996, in conjunction with the 8th International Conference on Advanced Information Systems Engineering (CAiSE'96), Crete, Greece, May 20-24, 1996. The organizers of the workshop were Keng Siau (the author) and Yair Wand.

One way to resolve the possible ambiguity between mandatory and optional relationships is to describe the constraints clearly and precisely. For example, Wand, Storey, and Weber (1993) advocated that optional relationships be described as shown in Figures 6.9 and 6.10.

Student (0,*) may-borrow (0,1) Book

Figure 6.9 Optional Relationship Between Student and Book (taken from Wand, Storey, and Weber 1993, p. 23)

Instructor can-teach Course

Figure 6.10 Optional Relationship Between Instructor and Course (taken from Elmasri and Navathe 1989, p.
59). By unambiguously describing the types of connectivity between the two entities, this removes the burden on the reader (analyst or end-user) of interpreting the connectivity based on his/her common-sense knowledge. Moreover, by adopting this as a standard way of describing relationships, it also removes the problem of conflicting structural constraints and textual information. For example, cases such as the one in Figure 6.11 are easily identified by the analysts and end-users as nonsensical and misspecified. In the diagram, the structural constraints specify an optional relationship whereas the textual information states that the relationship is mandatory.

Figure 6.11 A Nonsensical Information Model

6.8.4 Concluding Remarks and Prelude to Next Experiment

Because humans have limited information processing capacity and information modeling is a cognitively intensive activity, information modeling is highly susceptible to cognitive biases. There is, therefore, a need for us to identify and understand these biases in order to recognize and control them (e.g., through awareness). Researchers such as Stacy (1995), and Stacy and MacMillan (1995), have also argued for the importance of understanding cognitive biases in software development. They argued that awareness of these human biases can lead to better software engineering practice and better understanding between analysts and end-users. Moreover, knowledge about how experts and novices approach and understand information models can aid researchers in designing better information modeling methods.

In the next chapter, we report an experiment carried out to evaluate the different relationship representations and their effect on end-users' comprehension of information models. A relatively new data collection technique, called verbal protocol, is used to "open the black box."
CHAPTER 7

Study 2: Effect of Relationship Representations on Interpretation of Information Models by Modeling Novices

Everything should be made as simple as possible, but not simpler. (Albert Einstein)

7.1 Background

Mylopoulos (1992) states that "a good model allows one to build a description of the subject matter — including the system, the information it will handle and the environment it will function in — that is consistent to the way humans conceptualize the same subject matter." The last point is especially important because it brings to attention a dimension that is often overlooked by researchers: modeling methods are built by humans for use by humans. Researchers develop information models, and information models can be redesigned. By contrast, we cannot change the design of the human. Although the human subsystem is intelligent and adaptive, we cannot change the basic properties that define his/her strengths and weaknesses. In other words, if an information model is to be easy and efficient to communicate with, then the model and its constructs must be compatible with and supportive of the information processing characteristics of the human mind. In designing modeling methods and their constructs, the designers should keep in mind that humans are directly involved in the modeling process, as creators and consumers of information models. Ignoring "human factors" in the design of modeling methods and constructs will miss an essential part of the picture. Although we strongly believe that the representation of models and constructs is a significant factor in information systems analysis and design, little attention has been paid to this aspect in the design of new modeling methods and the choice of constructs. This is analogous to the field of HCI, which only gained recognition in computer science as a critical component of software development about a decade ago (Gentner & Grudin 1996).
7.2 Research Questions

With the introduction of the OO approach in recent years, as discussed in Chapter 2, numerous implicit representations of the relationship construct have been introduced (such as instance connections and objects). This is because, in pure object-orientation, the only allowable construct is an object. As such, to bend the rule without breaking it, the relationship construct has been masqueraded as instance connections and objects in several OO models (e.g., Coad & Yourdon 1991, Embley et al. 1992). Deciding on a representation for a construct is not a trivial issue. For example, Larkin & Simon (1987, p. 65) argued that information-processing operators working on one representation may recognize features readily or make inferences directly that are difficult to realize in another representation. Moreover, disguising a relationship as an instance connection or an object might change the ontology and epistemology of the information systems analysis and development process (as discussed in Chapter 1). Olaisen (1996) notes that "the process of making things obvious is as important to human conduct as critical comprehension." The effect of explicit and implicit relationship constructs on end-users' understanding of information models, therefore, needs to be investigated.

The following examples illustrate the notions of explicit and implicit relationship constructs in information modeling. Figure 7.1 illustrates explicit representations of the relationship construct, where the relationship (e.g., Owner Purchase Vehicle) is depicted differently from the objects/entities. Figure 7.2 depicts implicit representations of the relationship construct, where the relationship is treated as an object.

Figure 7.1 Explicit Representations of the Relationship Construct (e.g., Chen 1976, Champeaux et al. 1993; Rumbaugh 1991, Embley et al. 1992)

Figure 7.2 Implicit Representations of the Relationship Construct (e.g., Coad & Yourdon 1990, 1991, Martin & Odell 1992; Yourdon, Whitehead, Thomann, Oppel & Nevermann 1995, p. 251)

The first research question of this study is, thus, to experimentally compare explicit and implicit representations of the relationship construct and to investigate their effects on end-users' understanding of information models.

Another observation from the literature on the relationship construct concerns its description. Most researchers (e.g., Chen 1976, Teorey 1990, Elmasri & Navathe 1994) advocate the use of a verb for a relationship description. This is in line with the use of a verb for describing the relation in a proposition (as discussed in Chapter 3). Some researchers (e.g., Goldstein 1985, Johannesson & Kalman 1990, Kroenke 1995), on the other hand, use a noun to describe a relationship. An example illustrating these two ways of describing relationships (e.g., Employee Assign Department) is given in Figure 7.3.

Figure 7.3 Examples of Relationship Descriptions (e.g., Chen 1976, Elmasri & Navathe 1994; Goldstein 1985)

Research in other areas of MIS has shown that differences in representations can sometimes have a profound effect on users' interpretation and perception of information. For example, the use of color and graphical representation has been widely investigated in the field of MIS (see Carter 1947, Benbasat & Dexter 1985, 1986, Benbasat, Dexter, & Todd 1986, Jarvenpaa 1989, Tan & Benbasat 1990). Therefore, the second research question of this study is to examine the use of verb versus noun relationship descriptions and to experimentally investigate their effects on end-users' interpretation and understanding of information models.
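To make the distinction concrete in code, the following sketch (our illustration only; the class and attribute names are invented and are not drawn from any of the cited OO methods) contrasts a relationship modeled as a construct of its own kind with one reified as just another object:

```python
# Illustrative sketch: explicit vs. implicit representation of the
# relationship construct. All names here (Entity, Relationship, "Purchase")
# are assumptions made for the example.

class Entity:
    def __init__(self, name):
        self.name = name

class Relationship:  # explicit style: a construct distinct from entities
    def __init__(self, name, source, target):
        self.name, self.source, self.target = name, source, target

owner, vehicle = Entity("Owner"), Entity("Vehicle")

# Explicit: Owner --<Purchase>-- Vehicle; the relationship has its own kind.
purchase_explicit = Relationship("Purchase", owner, vehicle)

# Implicit: the relationship is masqueraded as just another object,
# linked to its participants like any other object.
purchase_implicit = Entity("Purchase")
links = [(owner, purchase_implicit), (purchase_implicit, vehicle)]

def find_relationships_explicit(elements):
    # In the explicit model, relationships are recognizable by construct type.
    return [e.name for e in elements if isinstance(e, Relationship)]

print(find_relationships_explicit([owner, vehicle, purchase_explicit]))  # ['Purchase']
```

In the implicit model, by contrast, nothing in the construct type distinguishes "Purchase" from "Owner" or "Vehicle"; the reader must supply that distinction from outside knowledge.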
To increase the generalizability of the results, in addition to investigating the effects of explicit versus implicit relationship constructs and the use of verbs versus nouns for describing relationships, we introduced task complexity into the research model. Each subject is given information models of three levels of complexity to interpret: simple, moderate, and complex (discussed in greater detail later). This enables us to generalize the results to varying degrees of model complexity. The research model is described in Section 7.4.

7.3 Theoretical Foundation

The main theory for this study is Herbert Simon's (1978) theory on the equivalence of representations, which is discussed in Section 7.3.1. In addition, we draw upon a number of other theories to help us formulate the hypotheses and interpret the findings. These theories are discussed in Section 7.3.2.

7.3.1 Theory on Equivalence of Representations [1]

Before we discuss the theory on equivalence of representations, we first review Larkin and Simon's (1987) notion of a representation and the cognitive processes (also known as a program) involved in understanding a representation.

[1] The use of this theory for information modeling has been discussed in "Evaluating Information Modeling Methods — A Cognitive Perspective", Workshop on Evaluation of Modeling Methods in Systems Analysis and Design, Crete, Greece, May 1996, pp. M1-M13 (with Wand, Y., and Benbasat, I.).

7.3.1.1 Components of Representation

According to Larkin and Simon (1987), a representation is made up of two components: a data structure and the program operating on it to make new inferences.

7.3.1.1.1 Data Structure

In the case of information modeling, a data structure is some particular way of organizing information using the modeling method. For example, in the ER approach, information such as "John Owns a Car" can be organized using the data structure consisting of entities and a relationship, as shown in Figure 7.4.
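As a minimal illustration of such a data structure (the dictionary encoding below is our own assumption for the example; the ER approach prescribes the constructs, entities and relationship, not this particular syntax), the fact in Figure 7.4 might be organized as:

```python
# Hypothetical encoding of the ER-style data structure for "John Owns a Car":
# two entities joined by a relationship.
fact = {
    "entities": ("John", "Car"),
    "relationship": "Owns",
}

def as_sentence(f):
    # Re-linearize the structured fact into the original statement.
    subject, obj = f["entities"]
    return f"{subject} {f['relationship']} a {obj}"

print(as_sentence(fact))  # John Owns a Car
```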
7.3.1.1.2 Program

Based on the ACT architecture (as discussed in Chapter 3), there are three types of processes involved in understanding a given information model: (i) an encoding and search process, (ii) a match process, and (iii) an execution process. In addition, an attention management system is required. With these in mind, we now discuss the theory on equivalence of representations.

7.3.1.2 Informational and Computational Equivalence

Herbert Simon (1978), in proposing the theory on equivalence of representations, argued that it is impossible to find an entirely neutral language to describe representations of information, for a language is itself a form of representation. This difficulty can be overcome, at least in part, by not attempting to describe representations directly, but instead by discussing them in terms of equivalence of representations. At the core of this theory is the notion of informational and computational equivalence of representations (Simon 1978, Larkin & Simon 1987).

7.3.1.2.1 Informational Equivalence

Two representations are informationally equivalent if all of the information in one is also inferable from the other, and vice versa (Larkin & Simon 1987). In other words, the transformation from one to the other entails no loss of information. If each can be constructed from the other, then the representations are informationally equivalent. For example, Simon (1978) argued that the statements "Distance equals average velocity times time" and "S=W*T" are informationally equivalent in an appropriate information-processing system. The presupposition that this holds for an appropriate information-processing system is important for informational equivalence. The information-processing system not only needs to know the meanings of S, W, and T, but also has to understand that "=" is "equals" and "*" is the same as "times" in the other statement.
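Simon's example can be mimicked in code. The sketch below is illustrative only; the symbol table stands in for the knowledge that an "appropriate information-processing system" must possess, and maps both surface forms onto the same normalized proposition:

```python
# Illustrative sketch (not from Simon 1978): the symbol table is the
# knowledge the information-processing system needs to treat the formula
# and the sentence as informationally equivalent.
SYMBOLS = {"S": "distance", "W": "average velocity", "T": "time",
           "=": "equals", "*": "times"}

def normalize(statement):
    # Translate formula tokens into words, then canonicalize case.
    words = [SYMBOLS.get(tok, tok) for tok in statement.split()]
    return " ".join(words).lower()

a = normalize("Distance equals average velocity times time")
b = normalize("S = W * T")
print(a == b)  # True: both forms carry the same information
```

Without the symbol table (i.e., for a system lacking the presupposed knowledge), the two statements would not be interchangeable, which is exactly the qualification the text stresses.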
As an example, for an appropriate information-processing system, the two information models in Figure 7.5 (taken from Embley et al. 1992, pp. 28-29) are informationally equivalent.

Figure 7.5 Two Informationally Equivalent Information Models (Model 1 and Model 2 each depict "Person owns Vehicle" in a different notation)

7.3.1.2.2 Computational Equivalence

Two representations are computationally equivalent if the same information can be extracted from each (the same inferences drawn) with about the same amount of computation. There are two conditions to be satisfied for computational equivalence (Larkin & Simon 1987): (i) the two representations must be informationally equivalent, and (ii) any inference that can be drawn easily and quickly from the information given explicitly in one can also be drawn easily and quickly from the other, and vice versa. As an example, consider the following two DFDs (Figures 7.6 and 7.7). For DFD 1, all of the information is depicted on one level. For DFD 2, the leveling technique is used to manage the complexity of the diagram, resulting in a parent diagram and a child diagram. The two DFDs contain the same information and are, therefore, informationally equivalent.

Figure 7.6 Data Flow Diagram 1

Figure 7.7 Data Flow Diagram 2

The two DFDs are, however, not computationally equivalent. Because there are two levels in the second case (i.e., two diagrams), the reader needs to shift his/her attention between the two levels if the information he/she needs spans both diagrams (i.e., the attention management is more complicated). The reader is also required to trace the flow of data from the parent diagram to the child diagram, and this requires additional processing.

7.3.1.3 Implication of the Theory for Relationship Representations

7.3.1.3.1 Informational Equivalence

In one of our recent papers (Siau et al.
1996), we argued that information modeling methods and constructs can be compared and contrasted based on two criteria: informational and computational equivalence. Computational equivalence, however, is not relevant unless the information modeling methods and constructs are informationally equivalent. For example, consider the DFD and ER approaches. A DFD represents the processes and flows of data in an organization, whereas an ER diagram represents the data requirements of the organization. It is almost impossible to infer the organizational processes from an ER diagram. In other words, DFD and ER should not be compared in terms of computational equivalence because they are not informationally equivalent in the first place. For certain representations (e.g., ER and relational representations), training and experience could make two initially informationally inequivalent representations informationally equivalent. For example, we could make the explicit and implicit relationship representations informationally equivalent by providing the subjects with the necessary training. Once two representations are informationally equivalent, they can be compared in terms of computational equivalence.

7.3.1.3.2 Computational Equivalence

Larkin & Simon (1987) argued that the computational efficiency of a representation depends on three factors (data structure, program, and attention management) and on how well these three factors work together. Whether one representation is more computationally efficient than another depends on what productions are available for searching the data structure, for recognizing relevant chunks of information, and for deducing inferences from that information (Simon 1978). Take, for example, the two information models (Figures 7.8 and 7.9) on the following page.
For an appropriate information-processing system with the necessary declarative and procedural knowledge (e.g., nouns are entities and verbs are relationships) to understand the information models, the two information models are informationally equivalent, but not computationally equivalent. Although both use the same topology and wordings, the use of the diamond symbol to represent a relationship in Figure 7.8 facilitates the search process and attention management. In this case, readers can quickly form meaningful chunks of information (e.g., two entities with a connecting relationship) by identifying the relationships (i.e., diamond symbols) in the model. The special symbol for a relationship serves as a cue that readers can use to focus attention on part of a diagram and to guide shifts in attention (i.e., easier attention management). In Figure 7.9, readers have to analyze groups of wordings before they can form meaningful chunks of information. For example, the encoding of a group of words such as "Maintain", "Vehicle", and "Produce" would not result in a match with the conditions of productions in the production memory (as such a group is meaningless). Because both entities and relationships are represented using the same symbol (i.e., a rectangle), additional processing is required to identify meaningful groups of words.

Figure 7.8 Information Model 1

Figure 7.9 Information Model 2

Because the use of the explicit relationship construct facilitates the search process and attention management, we hypothesize that the use of the explicit relationship construct will result in fewer errors of interpretation by end-users than the use of the implicit relationship construct.

Hypothesis 1: The use of the explicit relationship construct results in higher accuracy of interpretation than the use of the implicit relationship construct.
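The search-and-chunking argument behind Hypothesis 1 can be caricatured in a small simulation (our illustration; the node shapes and the verb lexicon are assumptions for the example): with an explicit symbol, relationships are recognized by shape alone, whereas without it the reader must supply extra lexical knowledge to classify every label.

```python
# Illustrative simulation of the search process. In the explicit model each
# node carries a distinguishing shape; in the implicit model every node is
# a rectangle, so a verb lexicon (extra knowledge) is needed instead.
explicit_model = [("rectangle", "Factory"), ("diamond", "Maintain"),
                  ("rectangle", "Vehicle"), ("diamond", "Own"),
                  ("rectangle", "House")]
implicit_model = [("rectangle", label) for _, label in explicit_model]

VERBS = {"Maintain", "Own", "Employ"}  # knowledge the reader must bring along

def relationships_by_shape(model):
    # One perceptual pass: the diamond symbol "pops out".
    return [label for shape, label in model if shape == "diamond"]

def relationships_by_lexicon(model):
    # No visual cue: every label must be checked against the verb lexicon.
    return [label for _, label in model if label in VERBS]

print(relationships_by_shape(explicit_model))    # ['Maintain', 'Own']
print(relationships_by_lexicon(implicit_model))  # ['Maintain', 'Own']
```

Both procedures draw the same inferences (the models are informationally equivalent), but the second depends on knowledge external to the diagram, which is the sense in which the representations are not computationally equivalent.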
Cognitive psychologists have pointed out that human abilities to recognize information are highly sensitive to the exact form in which the information is presented to the senses (or to memory). Information that is presented directly is much easier to recognize than information that is presented indirectly, because translation is required in the latter case. The use of verbs for describing relationships is consistent with the use of a verb for describing the relation of a proposition (Gagne et al. 1993) (see Chapter 3). The use of a verb for describing a relationship is, therefore, more direct than the use of a noun. As such, we hypothesize that:

Hypothesis 2: The use of verb descriptions for relationships results in higher accuracy of interpretation than the use of noun descriptions.

This study focuses only on modeling novices (such as end-users), whose typical role in the analysis phase is to verify the information models constructed by modeling experts (such as analysts) to ensure that these models reflect the organizational needs correctly and accurately (as discussed in Chapter 1). The efficiency of computation (for any representation) can be improved with training and experience. Users reaching the automated stage will be able to perform the computations almost effortlessly. Information modeling experts such as systems analysts are likely to be at the automated stage, whereas end-users are typically at the cognitive or associative stage of the learning model (see Chapter 6). The reasons for studying modeling novices are discussed in Section 7.4.6.

7.3.2 Other Supporting Theories

7.3.2.1 Hutchins, Hollan, and Norman's CHI Model

The difference between the explicit and implicit representations of relationship can be discussed in terms of the Computer-Human Interface (CHI) model developed by Hutchins, Hollan and Norman (1985).
The CHI model explains the relationship between the cognitive effort required to accomplish a task and the distance between the user's goals and the way these goals must be specified to a system. One of the two distances discussed in the model is semantic distance, which has been utilized frequently in the IS literature (e.g., Batra et al. 1990, Suh & Jenkins 1992). In the case of information modeling, semantic distance refers to the distance between the user's intentions and the ability of the modeling constructs to represent those intentions.

For information model construction, semantic distance reflects the extent to which the modeling constructs can capture the intentions of the users. A small semantic distance means that thoughts are readily translated into modeling constructs and the translation is simple and straightforward. For example, if the user needs to represent a relationship and a modeling method provides a construct for representing that relationship, then the semantic distance for representing that relationship would be shorter using that modeling method than using one that does not provide an equivalent construct. Thus, a modeling construct that provides users with a natural way of encoding and expressing an idea will have a shorter semantic distance than one that is artificial.

For information model interpretation, semantic distance refers to the amount of processing that is required by the user to understand a construct. If the constructs do not directly reflect the intentions, the user will be required to translate the constructs into concepts that are compatible with the intentions in order to interpret the model. For example, if a relationship is represented as an object, the information needed is in the model, but it is not in a form that can be directly interpreted.
Moreover, the translation may not be straightforward and might be rife with opportunities for misunderstanding. In these cases, the semantic distance would be longer than if no translation were required. It follows, therefore, that it is easier for end-users to interpret an explicit representation of the relationship construct because of the shorter semantic distance. The semantic directness of the explicit relationship construct decreases the amount of translation required and provides a natural mapping. For the implicit relationship construct, additional cognitive effort is required to perform the transformation. Thus, the semantic distance theory supports the hypothesis (i.e., hypothesis 1) that the use of the explicit relationship construct reduces the semantic distance and in turn leads to better user performance. Batra et al. (1990) also argued that the use of a special symbol for the relationship construct would lead to a shorter semantic distance; the claim was, however, not empirically tested. Similarly, the use of verbs is semantically more direct than the use of nouns to describe relationships (i.e., consistent with the syntax of English), as users need not translate the noun descriptions of relationships into verbs. The semantic distance theory, therefore, also predicts that the use of verbs leads to better end-user performance than the use of nouns.

7.3.2.2 Visual Salience Theory

Salience refers to the phenomenon in which one's attention is differentially directed to portions of the environment (Taylor & Thompson 1982). Jarvenpaa & Machesky (1989) argued that the visual salience of a construct would affect ease of learning and performance in data modeling. Green (1980) also suggested that the obviousness of the syntactic notation influences learning and performance. Visually salient properties are the ones the visual system normally employs in its initial task of segregating one construct from another. According to Treisman (1986, p.
117), "a target that is distinct from its neighbors in its preattentive representation in the brain should 'pop out' of the display." Larkin and Simon (1987) also stressed that the exact form of representation presented to our senses has a critical impact on our recognition of information. Moreover, the use of the explicit relationship construct facilitates the forming of chunks (a chunk here being a relationship with its associated entities). The visual salience theory thus supports the hypothesis that the use of the explicit relationship construct results in a higher degree of accuracy of interpretation than the use of the implicit relationship construct.

7.3.2.3 Wand-Weber Ontological Model

Wand and Weber (1993) argued that a one-to-one mapping should exist between each ontological construct and each modeling construct. They introduced the notions of construct overload, construct redundancy, construct excess, and construct deficit (Wand & Weber 1993):

(i) Construct overload: one modeling construct maps into two or more ontological constructs.
(ii) Construct redundancy: two or more modeling constructs are used to represent a single ontological construct.
(iii) Construct excess: a modeling construct does not map to any ontological construct.
(iv) Construct deficit: an ontological construct does not have any corresponding modeling construct.

Wand and Weber (1993) argued that construct overload is undesirable because users must bring to bear other knowledge, which they might not possess, to determine which ontological construct is being represented by the modeling construct. According to the ontological model, object and relationship are separate ontological constructs (Wand, Storey & Weber 1993). Therefore, the use of a single construct for both object/entity and relationship will result in construct overload.
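The four mapping conditions are mechanical enough to check programmatically. The sketch below is our own illustration (the function and construct names are invented, not Wand and Weber's notation): given a mapping expressed as (modeling construct, ontological construct) pairs, it reports overload, redundancy, excess, and deficit. Representing relationships implicitly as objects makes the object symbol map to two ontological constructs, i.e., construct overload.

```python
# Sketch of the four Wand-Weber mapping checks over pairs of
# (modeling construct, ontological construct). Names are illustrative.
def check_mapping(pairs, modeling, ontological):
    # Ontological constructs each modeling construct maps to, and vice versa.
    maps_to = {m: {o for mm, o in pairs if mm == m} for m in modeling}
    mapped_by = {o: {mm for mm, oo in pairs if oo == o} for o in ontological}
    return {
        "overload":   [m for m, os in maps_to.items() if len(os) > 1],
        "redundancy": [o for o, ms in mapped_by.items() if len(ms) > 1],
        "excess":     [m for m, os in maps_to.items() if not os],
        "deficit":    [o for o, ms in mapped_by.items() if not ms],
    }

# In a notation that represents relationships implicitly as objects,
# the object symbol stands for both ontological constructs:
result = check_mapping(
    pairs=[("object symbol", "thing"), ("object symbol", "relationship")],
    modeling=["object symbol"],
    ontological=["thing", "relationship"],
)
print(result["overload"])  # ['object symbol']
```

A one-to-one mapping would make all four lists empty; here only "overload" is non-empty, which is exactly the defect the chapter attributes to the implicit relationship construct.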
The implicit representation of the relationship construct, for example as an object, might result in confusion and misunderstanding because users need to bring knowledge not depicted in the model to differentiate a relationship from an object. This additional knowledge is not required if the relationship construct is represented explicitly. The Wand and Weber model thus predicts that the explicit relationship construct is better for end-users' understanding than the implicit relationship construct (i.e., hypothesis 1).

7.3.2.4 Cognitive Capacity Theory

Another cognitive science theory relevant to our study is cognitive capacity theory. According to this theory, successful task performance is generally seen as a matter of resource requirements in relation to resource availability. Navon (1984) defines a resource as "any internal input essential for processing that is available in quantities that are limited at any point in time." Cognitive effort is defined as the percentage of the available capacity or resources allocated to a given task (Mitchell & Hunt 1989). Performance of any task will be influenced by the resource requirements of the task only if resource demand exceeds resource supply. That is, cognitive effort is theoretically relevant only when capacity requirements outstrip the available processing capacity. When resources are limited, the theory predicts that the fewer the resources required by a task, the greater the probability that sufficient resources will be available and the higher the probability of success.

It is hypothesized that the use of the explicit relationship construct results in less cognitive effort being required for interpretation. Because humans have limited working memory, the capacity model predicts that for complex tasks, the explicit relationship construct results in better performance than the implicit relationship construct.
Similarly, for complex tasks, the use of verb (versus noun) descriptions produces better end-user performance. At a higher level of complexity, the resulting demands on individuals may exceed their capacities to respond, creating a condition of cognitive overload (Wood 1986). Thus, we hypothesize that:

Hypothesis 3a: Simple models are interpreted with a higher degree of accuracy than intermediate models.

Hypothesis 3b: Intermediate complexity models are interpreted with a higher degree of accuracy than complex models.

Also, when the information models are simple and the resource capacity is not exceeded, there will be little difference between implicit versus explicit relationship constructs and verb versus noun descriptions. However, once the resources required exceed the capacity of working memory, as in the case of interpreting a complex model, performance will be better with the explicit relationship construct and verb descriptions for relationships because of the lesser demand on limited resources. We, therefore, predict that the difference in the degree of accuracy of interpretation between explicit and implicit relationship constructs will be greater for intermediate and complex models than for simple models. Thus,

Hypothesis 4: There is an interaction effect between model complexity and explicitness of the relationship construct.

Similarly, we also predict the difference in the degree of accuracy of interpretation between verb and noun descriptions of relationships to be greater for intermediate and complex models than for simple models.

Hypothesis 5: There is an interaction effect between model complexity and relationship description.

7.4 Research Framework

A 2 X 2 X 2 X 3 mixed factorial (also known as split-plot) design is used for this study. There are two within-subjects and two between-subjects factors. The research model is shown in Figure 7.10. The first factor is the explicitness of the relationship construct: explicit versus implicit.
The second factor is the relationship description: verb versus noun. Model complexity is the third factor and it consists of three levels: simple, intermediate, and complex. Finally, there are two phases in this experiment: Phase 1 and Phase 2. Model complexity and experimental phase are repeated-measures factors. Before the start of Phase 1, the subjects were only given an example of an information model and its interpretation, whereas training materials were given to the subjects prior to the start of Phase 2. The use of two phases enables us to investigate the effect of different degrees of training on users' comprehension of information models. The dependent variable, accuracy of interpretation, is captured using verbal protocols (also known as thinking-aloud protocols).

[Figure 7.10 Research Model: Relationship Explicitness (explicit vs. implicit construct), Relationship Description (verb vs. noun description), Model Complexity (simple, intermediate, complex), and Experimental Phase (Phase 1, Phase 2), with User Performance (accuracy of interpretation) as the dependent variable]

7.4.1 Relationship Explicitness

The relationship construct is manipulated to be either explicit or implicit. For the explicit relationship construct, we adopted the representation used by Chen (1976) for the ER model (i.e., a diamond symbol). This representation is widely adopted and known. For the implicit relationship construct, we used the representation proposed by Coad & Yourdon (1991). Coad & Yourdon (1991, p. 131) proposed that for a many-to-many instance connection, an "event remembered" object be added to the model. Using their example (p. 131), the instance connection between Owner and Vehicle in Figure 7.11 is many-to-many. An object Purchase is, therefore, added so that the attributes that describe the connection can be captured.
[Figure 7.11 Many-to-many Instance Connection (taken from Coad and Yourdon 1991, p. 131): the l,m instance connection between Owner and Vehicle is replaced by a Purchase object carrying the DateTime and Amount attributes]

Similar representations for the implicit relationship construct are also used by Martin and Odell (1992), Yourdon et al. (1995), and Norman (1996). For our study, we are interested in investigating the effect of explicit versus implicit relationship constructs. We, therefore, assume that the connectivities between objects/entities are many-to-many. In this way, information models with the explicit relationship construct and those with the implicit relationship construct look exactly the same except for the representation of the relationship construct (as shown in Figure 7.12). We also omit attributes and behavior (i.e., for the OO approach) in our information models as they are beyond the scope of this study and are potential confounding variables.

[Figure 7.12 Explicit and Implicit Representations of the Relationship Construct: Owner-Purchase-Vehicle drawn with Purchase as a diamond (explicit) and as a rectangle (implicit)]

7.4.2 Relationship Description

There are two levels of this factor: verb versus noun. For example, if the verb description for a relationship is "own", the corresponding noun description will be "ownership." And if the verb description is "produce", the noun counterpart will be "production." The use of nouns is consistent with the practice of some researchers (e.g., Goldstein 1985, Kroenke 1995). These verbs and nouns were carefully selected and pilot tested using expert subjects (e.g., MIS professors, Ph.D. students, and M.Sc. students). Figure 7.13 shows the four possible cells for the two factors discussed so far.
[Figure 7.13 The Four Possible Combinations of Relationship Explicitness and Relationship Description: Employee-Assign-Department and Employee-Assignment-Department, each drawn with the explicit and the implicit relationship construct]

7.4.3 Model Complexity

Model complexity in this study comprises three levels — simple, intermediate, and complex. This is a repeated-measures factor. Each subject is given a total of 12 questions in the test, taken in two phases (with different degrees of training). Each phase consists of 6 questions — 2 simple, 2 intermediate, and 2 complex. The rationale for the two phases is discussed in Section 7.4.4.

The complexity of a model is operationalized by the number of chunks of information. When more chunks of information are presented in the information model, the information load (Campbell 1988) and the amount of stimulus input (Wood 1986) are increased, and these in turn increase the task complexity. A chunk, in our case, is defined as a relationship with its two accompanying entities. This notion of a chunk is very similar to the concept of a proposition in the ACT theory and is supported by the literature on verbal learning tasks. For example, Bower (1969) considered a chunk of verbal material to be a highly integrated or coherent group of words, indexed by a strong tendency for the subject to recall the words together as a unit. Bower (1969, pp. 612-613) goes on to stress that "in almost every aspect, pre-established word groups (cliches) behave like single words in recall... Recall limits are expressible in chunks, not words." To maintain consistency, the following guidelines are followed: a simple model has 3 chunks of information; an intermediate model has 5 or 6 chunks of information; and a complex model has 9 chunks of information.
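Under this operationalization, a model's complexity follows directly from its relationship count. A minimal sketch (the function, thresholds between the stated levels, and the example model are our own illustration, not part of the experimental materials), treating each relationship with its two entities as one chunk:

```python
# One plausible encoding of the chunk-based guideline: 3 chunks = simple,
# 5-6 chunks = intermediate, 9 chunks = complex (intermediate values are
# interpolated here for illustration).
def complexity(relationships):
    chunks = len(relationships)  # one chunk = a relationship plus its two entities
    if chunks <= 3:
        return "simple"
    if chunks <= 6:
        return "intermediate"
    return "complex"

# Each relationship is (entity, verb description, entity); names invented.
simple_model = [("Owner", "own", "Vehicle"),
                ("Dealer", "sell", "Vehicle"),
                ("Bank", "finance", "Vehicle")]
print(complexity(simple_model))  # simple
```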
The reason for choosing these numbers is based on our knowledge of the size of working memory (discussed in Chapter 3). According to Miller (1956), there is a limit to the amount of information that can be simultaneously active in working memory. Because of this limitation, some ideas and facts are pushed out of working memory by the entry of new ones. This limitation of working memory has been widely recognized and used in the systems analysis and design area to determine the number of concepts to present to users at any one time (e.g., Kendall & Kendall 1996, p. 276, Mynatt 1990, pp. 145-146, Page-Jones 1988). Miller claimed that the number of chunks that can be stored in working memory at any one time is 7±2. Simon (1974), on the other hand, asserted that working memory holds only about 5 units. For the simple models, 3 chunks of information is below the threshold of working memory. For the intermediate models of 5-6 chunks, the amount of information is at the threshold representing the size of working memory, whereas the 9 chunks of information for complex models are above the threshold. This design enables us to evaluate the effect of model complexity on end-users' understanding of information models.

Ideally, we would like to randomize the order of the 12 information models. That is to say, some subjects would start with simple models, some with intermediate models, and some with complex models. This would allow us to overcome the learning effect. However, our pilots indicated that some subjects who were given complex models first were completely overwhelmed by the difficulty involved in understanding them, and subsequently did very poorly in the rest of the study. As a result, we decided to follow a standard sequence by starting with simple information models, followed by intermediate models, then by complex models.
This is not a severe limitation in our design because the main objective of this experiment is to investigate the effect of explicit versus implicit relationship constructs and verb versus noun relationship descriptions.

7.4.4 Experimental Phase

There are two phases in this experiment. Subjects went through both phases (i.e., a repeated-measures factor). The 12 questions were randomly split into two sets of 6 questions each (i.e., 2 simple, 2 intermediate, 2 complex). Half of the subjects in each cell were given the first set (say set A) in the first phase and the second set (say set B) in the second phase. The other half were given the opposite (i.e., set B in the first phase and set A in the second phase). This is to counter order effects. Before starting the first phase, an example consisting of an information model and its interpretation was given to the subjects (see Appendix G). No explanation or training was provided by the researchers. The subjects, after reviewing the example, were asked to think aloud as they tried to understand the 6 information models (2 simple, 2 intermediate, 2 complex) in the first phase (training in thinking aloud was provided before the start of the experiment — see Appendix G). The verbal responses were tape-recorded. After completing the first phase, training materials similar to those used in our previous studies were provided to the subjects (as shown in Appendix G). The subjects were asked to read the training materials. The second set, consisting of 6 information models, was given to the subjects after they had reviewed the training materials. Again, they were asked to think aloud and the verbal responses were tape-recorded. There was no time pressure or limit on the number of trials that could be attempted by the subjects in either phase. The subjects could also refer to the given example (in Phase 1) and training materials (in Phase 2) as and when needed. The entire set of 12 questions is shown in Appendix G.
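The splitting and counterbalancing procedure can be sketched as follows. This is an illustrative reconstruction, not the authors' actual script; the model identifiers and the seed are invented:

```python
import random

random.seed(42)  # for reproducibility of this illustration only

# Four models per complexity level (IDs 1-12 are invented); two of each
# level go into each set, matching the 2 simple / 2 intermediate / 2 complex
# composition described in the text.
models = {"simple": [1, 2, 3, 4],
          "intermediate": [5, 6, 7, 8],
          "complex": [9, 10, 11, 12]}

set_a, set_b = [], []
for level, ids in models.items():
    shuffled = random.sample(ids, len(ids))  # random split within each level
    set_a += shuffled[:2]
    set_b += shuffled[2:]

def phase_order(subject_index):
    """Half of the subjects see set A first; the other half see set B first."""
    return (set_a, set_b) if subject_index % 2 == 0 else (set_b, set_a)

phase1, phase2 = phase_order(0)
print(len(phase1), len(phase2))  # 6 6
```

Alternating the set order across subjects is what counters order effects: any difference between the two question sets is balanced across the two phases.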
7.4.5 User Performance

The dependent variable, user performance, is measured by the accuracy of interpretation for each chunk of information in the information models. The tape-recorded verbal responses from the subjects were transcribed by a secretary. The transcripts were then coded by two coders using a standard coding scheme (see Appendix H). A set of standard error codes was also used by the two coders (to be discussed later). One of the coders was the author; the other was a second-year M.Sc. student who was not involved in the experiment and had no knowledge of the research hypotheses.

Verbal protocol analysis (Ericsson & Simon 1993) was employed in this study because of its ability to capture the cognitive processes of the subjects while interpreting the information models. Initially, another approach involving True/False responses was tested. The subjects, in that case, were shown the information models on a monitor with five True/False questions for each information model. The accuracy of the responses as well as the time required to make the responses were captured by a program written in Visual Basic. The pilot studies showed that the True/False questions provided information cues for interpreting the information models, and the results were distorted by the presence of these cues in the questions (compared to using thinking-aloud protocols). As a result, verbal protocols were employed for data collection. It should be pointed out that if modeling experts were involved in this study, verbal protocol might not be an appropriate data collection technique. Anderson (1995) mentions that in the autonomous stage, where such experts would be, "sometimes a person even loses the ability to verbally describe the skill."
7.4.6 Subject Characteristics

Subjects who had no prior training or experience with information modeling were recruited for this study, since the objective was to show the effects of various representations on end-users' understanding of information models. There are two main reasons for not including modeling experts in this study. Firstly, the prior knowledge and experience of experts would be a potential confounding variable that cannot be controlled or accounted for. It would be almost impossible to find experts who are skilled in only one of the representations and ignorant of the others. For example, modeling experts who are familiar with the ER approach would have little difficulty understanding the various representations used in this study. Secondly, experts would have developed competence in understanding the representations because of their training and experience. For example, expert users may argue that the method they are using fully satisfies their intentions, even when novice users have problems understanding the method. To modeling experts, the invocation of the mental structure or problem-solving procedures is automated (as discussed in Chapter 6). Hutchins et al. (1985) have argued that automation, though compensating for the deficiencies of the interface, does not reduce the semantic distance that must be spanned — the gulfs between a user's intentions and the interface must still be bridged by the user. Although practice and the resulting experience can make the crossing less difficult, they do not reduce the magnitude of the gulfs. This magnitude of the gulfs is what modeling novices experience.

User characteristics, such as sex and age, are controlled by randomization. Each subject is randomly assigned to one of the four cells — explicit versus implicit relationship constructs, and noun versus verb descriptions for relationships.
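Random assignment to the four between-subjects cells can be sketched as below. This is one balanced way to do it (shuffling subjects and filling cells evenly, so that the target of 20 subjects per cell described in the next paragraphs is met); it is our illustration, not the authors' actual procedure:

```python
import random

random.seed(1)  # seed chosen only to make this illustration reproducible

# The four between-subjects cells: explicitness x description.
cells = [(e, d) for e in ("explicit", "implicit") for d in ("verb", "noun")]

subjects = list(range(80))   # target sample of 80 subjects
random.shuffle(subjects)     # randomize who lands in which cell
assignment = {s: cells[i % 4] for i, s in enumerate(subjects)}  # 20 per cell

per_cell = {c: sum(1 for a in assignment.values() if a == c) for c in cells}
print(sorted(per_cell.values()))  # [20, 20, 20, 20]
```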
A questionnaire relating to personal demographics and modeling experience was completed by the subjects prior to the test (similar to those used in our previous studies). This was to verify that the subjects were indeed modeling novices.

Based on the effect sizes estimated from the two exploratory studies, we adopted an effect size of .4 for this study. In normal circumstances, Cohen and Cohen (1983) proposed power in the .70-.90 range. They went on to say that "in the absence of some preference to the contrary, that power be set at .80. This value falls in the middle of the .70-.90 range and is a reasonable one to use as a convention when such is needed." Cohen (1965, 1977) also indicates that "α = .05 is used as a convention for significance, power = 1 - β = .80 be used as a convention for power." From the power table, a 2 X 2 design will require a sample size of 56 subjects. Although our design is a 2 X 2 X 2 X 3 design, the last two factors are repeated-measures factors and there is no power analysis available for repeated-measures designs. Keppel (1982) states that "a within-subjects design requires fewer subjects and is more sensitive than a corresponding completely randomized design." To be more conservative, we decided on a sample size of 80 subjects with 20 subjects per cell.

Novice subjects were recruited for the experiment. These subjects had no experience in information modeling and had not taken any course in Systems Analysis and Design or Databases. The subjects were randomly assigned to one of the four cells. Each subject received $10 for participating in the experiment, which lasted approximately 40-60 minutes. In addition, we tried to motivate the subjects by giving the top 20% of participants in each experimental condition a cash prize ranging from $20 to $50 (i.e., 1st - $50, 2nd - $40, 3rd - $30, 4th - $20). The evaluation criterion for the prizes was based on the number of chunks that were correctly interpreted.
7.5 Experimental Results

7.5.1 Subjects

A total of 81 subjects participated in the experiment. The statistical tests revealed that the performance of two subjects was more than 3 standard deviations away from the cell means (after the coding of the transcripts). These two outliers (one from the Explicit/Verb cell and the other from the Implicit/Verb cell) were removed from the analyses. This leaves a total of 79 subjects, distributed in the manner shown in Table 7.1.

                      Verb Description   Noun Description
Explicit Construct           19                 20
Implicit Construct           19                 21

Table 7.1 Number of Subjects in Each Cell

7.5.2 Inter-Rater Reliability

Two coders were involved in the coding of the transcripts. To ensure a common understanding of the coding procedure and error codes, the two coders first coded the transcripts of eight subjects (i.e., 2 from each cell) separately. They then met, discussed the problems, and established a common understanding of the error codes. The transcripts of the remaining 73 subjects were coded independently. The inter-rater reliabilities (i.e., Pearson correlations) for the 73 subjects are shown below.

(i) Based on the total number of correct interpretations by each subject (combining the three levels of complexity and the two phases) (N=73), it is 0.92.

(ii) Based on the number of correct interpretations by each subject (combining the two phases but separating the three levels of complexity) (N=219), it is 0.95.

(iii) Based on the number of correct interpretations by each subject (combining the two phases but separating the three levels of complexity and breaking down by each cell), they are:

Explicit and Verb    0.98 (N=54)
Explicit and Noun    0.97 (N=54)
Implicit and Verb    0.97 (N=54)
Implicit and Noun    0.92 (N=57)

As can be seen, the inter-rater reliabilities are high. The slight decrease in the inter-rater reliability for the cell "Implicit and Noun" is probably due to the increased number of inaccurate and ambiguous interpretations by the subjects.
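Agreement figures of this kind are easy to verify mechanically. As a sketch, the percent agreement (88%) and Cohen's Kappa (0.7) reported below can be recomputed from the coder-agreement counts of Table 7.2:

```python
# Percent agreement and Cohen's Kappa from a 2x2 coder-agreement table.
# Counts are those of Table 7.2 (rows = coder 1, columns = coder 2).
ww, wr = 1072, 246   # coder 1 "wrong": coder 2 wrong / right
rw, rr = 364, 3501   # coder 1 "right": coder 2 wrong / right

n = ww + wr + rw + rr                 # 5183 chunks coded in total
p_observed = (ww + rr) / n            # proportion of agreements (the diagonal)

# Chance agreement from each coder's marginal totals.
p_chance = ((ww + wr) * (ww + rw) + (rw + rr) * (wr + rr)) / n ** 2
kappa = (p_observed - p_chance) / (1 - p_chance)

print(round(p_observed, 2))  # 0.88, the diagonal share reported in the text
print(round(kappa, 2))       # 0.7, matching the SPSS value
```

Kappa discounts the agreement expected by chance from the coders' marginal rates, which is why it is lower than the raw 88% agreement.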
For the cell "Explicit and Verb", the inter-rater reliability is high because most of the chunks were correctly interpreted. We also used the non-parametric Cohen's Kappa statistic to compute the reliability of the coding by the two coders. Cohen's Kappa is a measure of association that provides an index of agreement between two raters (Siegel & Castellan 1988, Norusis 1995). In the Kappa statistic, the unit of analysis is a chunk. The Kappa index, computed using SPSS 6.1, is 0.7, indicating high agreement between the two raters. The data for running the Cohen's Kappa statistics are shown in Table 7.2. For example, the number 1072 in cell 1 means that a total of 1072 chunks was coded by both coders 1 and 2 as wrongly interpreted by the subjects. As for the number 246 in cell 2, these 246 chunks were coded by coder 1 as incorrectly interpreted by the subjects but coded by coder 2 as correct interpretations. As can be seen from Table 7.2, 88% (i.e., (1072+3501)/5183) of the ratings fall on the diagonal, indicating a high degree of agreement between coders 1 and 2.

                            Coder 2
Count                 Wrong       Right
Coder 1   Wrong        1072         246
          Right         364        3501

Table 7.2 Number of Chunks in Each Cell

7.5.3 Accuracy of Interpretation

Because of the high inter-rater reliabilities and agreement between the two coders, the data from one of the coders (i.e., the author) were used for the subsequent statistical analyses. These statistical analyses were carried out using the General Linear Model (GLM) procedure (SAS 1985). Table 7.3 shows the results of the analyses of variance for the four factors and their interactions.
Effect                                          DF        MS       F    Sig F
Between subjects
  Explicitness                                   1  12752.05   10.40   0.0019
  Description                                    1  42521.69   34.67   0.0001
  Explicitness*Description                       1    855.30    0.70   0.4063
  Error                                         75   1226.40
Within subjects
  Complexity                                     2   4691.53   22.63   0.0001
  Complexity*Explicitness                        2    254.25    1.23   0.2963
  Complexity*Description                         2     20.57    0.10   0.9056
  Complexity*Explicitness*Description            2    219.38    1.06   0.3497
  Error (Complexity)                           150    207.32
  Phase                                          1  30809.40   41.20   0.0001
  Phase*Explicitness                             1    466.59    0.62   0.4321
  Phase*Description                              1   6095.16    8.15   0.0056
  Phase*Explicitness*Description                 1    582.28    0.78   0.3804
  Error (Phase)                                 75    747.88
  Complexity*Phase                               2      9.79    0.05   0.9531
  Complexity*Phase*Explicitness                  2     96.41    0.47   0.6240
  Complexity*Phase*Description                   2     65.33    0.32   0.7263
  Complexity*Phase*Explicitness*Description      2     59.55    0.29   0.7471
  Error (Complexity*Phase)                     150    203.82

Table 7.3 Analyses of Variance for Accuracy of Interpretation

All four main effects are statistically significant. The experimental phase is statistically significant (p < 0.0001), which means that the accuracy of interpretation is much better in Phase 2 than in Phase 1. This is expected, as the accuracy of interpretation should improve after reading the training materials prior to Phase 2 and through practice in Phase 1. In Phase 1, the subjects had to work from the given example to see how the task was to be performed. With the added training and experience, we argue that the subjects' interpretation skill makes a transition from declarative knowledge to procedural knowledge (see Chapter 3). The skill becomes much more fluid and error-free (Anderson 1995).

Complexity is also statistically significant (p < 0.0001). The Duncan statistics show that there is a significant difference (p < 0.05) between simple and intermediate information models. Hypothesis 3a (i.e., simple models are interpreted with higher accuracy than intermediate models) is, therefore, supported.
There is, however, no significant difference between intermediate and complex information models at the α = 0.05 level. It appears that once working memory is loaded (e.g., using Simon's notion of 5 chunks of information), there is little difference in the accuracy of interpretation. In these cases, only part of the information model could be attended to at any one time. Hypothesis 3b (i.e., intermediate models are interpreted with higher accuracy than complex models) is not supported.

The analyses of variance also show the use of explicit versus implicit relationship constructs to be significantly different (p < 0.0019). The use of verb versus noun descriptions of relationships is also significantly different (p < 0.0001). Therefore, hypothesis 1 (i.e., the explicit relationship construct is interpreted with higher accuracy than the implicit relationship construct) and hypothesis 2 (i.e., the verb description is interpreted with higher accuracy than the noun description) are supported.

The only significant interaction effect is the interaction between the experimental phase and the relationship description (p < 0.0056). In other words, there is a difference in improvement from Phase 1 to Phase 2 for verb versus noun relationship descriptions. Hypothesis 4 (i.e., interaction between relationship explicitness and complexity) and hypothesis 5 (i.e., interaction between relationship description and complexity) are not supported.

To summarize, of the six hypotheses in this study, hypotheses 1, 2, and 3a are supported whereas hypotheses 3b, 4 and 5 are not. These results will be discussed in more detail in subsequent sections.

7.5.3.1 Results on Relationship Explicitness

The percentages of correct interpretations of explicit and implicit relationship constructs for the two phases and the three levels of complexity are shown in Table 7.4. These percentages are depicted graphically in Figure 7.14.
As can be seen from the table and figure, the explicit relationship construct is consistently better than the implicit relationship construct for both phases and the three levels of complexity.

                          Phase 1                            Phase 2
              Simple    Interm.   Complex        Simple    Interm.   Complex
Explicit      76.07      69.97     67.38          94.44     86.89     87.18
             (33.94)    (28.45)   (28.82)        (11.04)   (15.01)   (11.90)
Implicit      70.83      57.27     59.72          84.17     73.81     73.06
             (32.85)    (25.70)   (22.71)        (19.23)   (23.28)   (17.68)

Table 7.4 Percentages (Means and Standard Deviations) of Correct Interpretations of Explicit and Implicit Relationship Constructs

[Figure 7.14 Graphical Representation for Percentages of Correct Interpretations of Explicit and Implicit Relationship Constructs: the explicit line lies above the implicit line across simple (S), intermediate (I), and complex (C) models in both phases]

7.5.3.2 Results on Relationship Description

Table 7.5 shows the percentages of correct interpretations of verb and noun descriptions of relationships for the two phases and the three levels of complexity. These percentages are depicted graphically in Figure 7.15.

                          Phase 1                            Phase 2
              Simple    Interm.   Complex        Simple    Interm.   Complex
Verb          87.28      76.89     77.05          94.30     87.40     86.40
             (20.66)    (19.06)   (16.35)        (11.80)   (14.06)   (12.64)
Noun          60.57      51.16     50.95          84.55     73.65     74.12
             (37.59)    (28.81)   (27.16)        (18.78)   (23.46)   (17.77)

Table 7.5 Percentages (Means and Standard Deviations) of Correct Interpretations of Verb and Noun Descriptions

There are two observations from Figure 7.15. Firstly, verb descriptions of relationships result in a consistently higher degree of accuracy of interpretation than the noun descriptions of relationships for both phases and the three levels of complexity. Secondly, the difference between the degree of accuracy in interpreting verb and noun descriptions of relationships in Phase 1 is larger than that in Phase 2. This difference is statistically significant (p < 0.0056).
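The narrowing verb-noun gap behind this phase-by-description interaction can be checked directly from the cell means of Table 7.5; a small sketch averaging the gap over the three complexity levels in each phase:

```python
# Cell means (percent correct) from Table 7.5: simple, intermediate, complex.
verb = {1: [87.28, 76.89, 77.05], 2: [94.30, 87.40, 86.40]}
noun = {1: [60.57, 51.16, 50.95], 2: [84.55, 73.65, 74.12]}

for phase in (1, 2):
    gaps = [v - u for v, u in zip(verb[phase], noun[phase])]
    print(phase, round(sum(gaps) / len(gaps), 1))  # 1 26.2, then 2 11.9
```

The average verb-noun gap shrinks from about 26 percentage points in Phase 1 to about 12 in Phase 2, which is the numerical face of the interaction effect.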
In other words, there is an interaction effect between the experimental phase and the relationship description.

[Line graph omitted: percentage of correct interpretations for verb and noun descriptions across the three complexity levels (S, I, C) in Phases 1 and 2]

Figure 7.15 Graphical Representation for Percentages of Correct Interpretations of Verb and Noun Descriptions

7.5.3.3 Accuracy of Interpretation for Combinations of Relationship Explicitness and Description

Table 7.6 and Figure 7.16 show the percentages of correct interpretations for combinations of the two levels of relationship explicitness and the two levels of relationship description.

                    Phase 1                                        Phase 2
                Simple         Interm.        Complex        Simple         Interm.        Complex
Explicit/Verb   92.98 (16.02)  83.69 (17.61)  84.50 (15.00)  99.12 (3.82)   93.50 (9.71)   94.74 (7.05)
Explicit/Noun   60.00 (38.77)  56.93 (30.94)  51.11 (29.60)  90.00 (13.68)  80.60 (16.61)  80.00 (11.17)
Implicit/Verb   81.58 (23.50)  70.10 (18.41)  69.59 (14.39)  89.47 (14.92)  81.30 (15.29)  78.07 (11.49)
Implicit/Noun   61.11 (37.39)  45.67 (26.19)  50.79 (25.35)  79.37 (21.67)  67.03 (27.29)  68.52 (21.11)

Table 7.6 Percentages (Means and Standard Deviations) of Correct Interpretations

[Line graph omitted: percentage of correct interpretations (40-100%) for the four combinations of relationship explicitness and description across the three complexity levels (S, I, C) in Phases 1 and 2]

Figure 7.16 Graphical Depiction of Percentages of Correct Interpretations

There are a number of interesting observations from Figure 7.16. Firstly, in Phase 1, the verb description seems to have a major impact on the accuracy of interpretation, with the lines representing Explicit/Verb and Implicit/Verb much higher on the scale than those representing Explicit/Noun and Implicit/Noun. One possible explanation is that the training example given to the subjects prior to Phase 1 did not discuss the meaning of the notations (e.g., that a diamond represents a relationship). The subjects therefore relied heavily on everyday usage of the English language to understand the information in the models.
In this case, the use of verbs is very helpful. Explicit/Noun and Implicit/Noun are interpreted with almost the same level of accuracy for simple and complex information models. This shows that with very minimal training (e.g., only one example), the use of noun descriptions is detrimental to the understanding of information models.

In both phases, Explicit/Verb is the best and Implicit/Noun is the worst. Referring back to the theories, it appears that Explicit/Verb is the best because both the explicit relationship construct and the use of verb descriptions for relationships serve as information cues that facilitate the subjects' interpretation of information models. At the other extreme, for Implicit/Noun, these information cues are missing. The end-users in this case have to resort to common sense or bring additional knowledge to help them interpret the information models. This results in a lower level of accuracy.

In Phase 2, Implicit/Verb and Explicit/Noun have almost the same number of correct interpretations for all three levels of complexity. This suggests that with some training (e.g., through reading training materials) and experience (e.g., that gained in Phase 1), the information cues provided by the explicit relationship construct (e.g., knowing that a diamond means a relationship) and by verb descriptions of relationships produce about the same result.

The results obtained so far will be further discussed and explained in Section 7.6 using the findings from a verbal protocol analysis of the types of errors made.

7.5.4 Types of Errors

In addition to the accuracy of interpretation, the coding scheme also captures the types of errors made by the subjects. Ten error codes were used to categorize the errors committed by the subjects (shown in Table 7.7). One or more error codes could be used to categorize a single chunk.
More than one error code could be associated with a chunk because a chunk could be attempted by the subjects many times, resulting in different error codes. As a result, Cohen's Kappa is not an appropriate statistic for analyzing the agreement between the two raters. Based on the total number of counts for each error code (separating the two phases, the two relationship constructs, and the two relationship descriptions), the Pearson correlation between the two coders is 0.93. In the following sections, we use qualitative interpretation to analyze the error types.

7.5.4.1 Description of Error Types

Table 7.7 shows the ten error codes used in the coding. These ten error codes will be described, and examples will be taken from the transcripts to illustrate them. It should be noted that the benefit of the doubt was given to the subjects as much as possible. Also, there is usually more than one correct interpretation for a chunk, and we accepted all reasonable interpretations. Once a chunk was correctly interpreted, we ignored subsequent interpretations of the chunk (whether the subsequent interpretations were right or wrong).

Code   Description
1      Interpret in the reverse direction
2      Confuse relationships with entities/objects
3      Bring/add additional information
4      Give up
5      Link more than one chunk together
6      Incomprehensible sentence or missing keywords
7      Go through all the words (or most of the words) at the beginning
8      Express confusion or difficulty
9      Others (will be documented in the coding sheet by coder)
10     Missing interpretation

Table 7.7 List of Error Codes

(i) Interpret in the reverse direction

This error code is used when subjects interpret the chunk in the reverse direction and the interpretation is meaningless. For example, the chunk depicted as "Driver -- Assign -- Shipment" was interpreted by one subject as "Shipment assigns driver." This interpretation is meaningless; one of the correct interpretations is "Drivers are assigned to shipments."
Another subject interpreted the chunk "Juror -- Inquire -- Case" as "Case inquires juror."

(ii) Confuse relationships with entities/objects

In this case, the subjects treat a relationship as an entity/object. For example, the chunk "Book -- Writer -- Researcher" is supposed to be interpreted as "Researchers write books." One subject interpreted it as "Book has a writer and a researcher." She treated the relationship "writer" as an entity/object.

(iii) Bring/add additional information

Some subjects brought additional information into their interpretation of the information models. For example, one subject commented: "Umm... somehow mortgage comes to mind. Banking." This information (i.e., mortgage, banking) was not in the information model. Another subject said: "When you insure a car, the car is then registered. And because ICBC is owned by the province, that's why the registration of the car is connected by province." This subject added her own knowledge about the vehicle registration procedure in British Columbia to the interpretation (ICBC, which stands for Insurance Corporation of British Columbia, was not in the information model).

(iv) Give up

This error code is used when a subject gives up interpreting the information model. For example, one subject said: "Umm... I don't understand this one, so I'll go on to the next one." Another subject commented: "Hmm. I don't know. It's too hard, I don't know. It's confusing.", and moved on to the next information model.

(v) Link more than one chunk together

This error code is used when subjects link more than one chunk of information together, resulting in an erroneous interpretation. For example, the three chunks of information "College -- Regulation -- Club", "Club -- Organization -- Activity", and "Student -- Participation -- Activity" were interpreted by a subject as "... college, who, which, ahh, regulated the... club organization that student participated in, and, ahh... activity in the organization."
(vi) Incomprehensible sentence or missing keywords

This error code is used when the coders cannot understand or make sense of the interpretations. For example, one subject said "Maintenance, manufacturing or construction." Another example is "Organization... suggestion, attachment." These statements are incomprehensible.

(vii) Go through all the words (or most of the words) at the beginning

Some subjects went through all or most of the words in the information models at the beginning of their interpretation. This is not really an error and might mean that they were simply scanning the information model while thinking aloud. For example, one subject read out most of the words in the information model: "Admittance, college. Sponsorship, club. College, regulation, participation, activity. Approval. Organization. Club."

(viii) Express confusion or difficulty

This error code is used when subjects express difficulty during interpretation. This is different from the error code "Give up," where the subject stops interpreting the information model. For example, the following are some of the comments made by subjects while they were interpreting the models:

"Ahh... that one... umm, I'm not sure, allocation and truck."
"Hmm, I don't understand this case."
"I don't... really understand how these things are related or what they're for."
"But again it doesn't make a lot of sense to me the way it's drawn, it's rather confusing."

(ix) Others (will be documented in the coding sheet by coder)

This error code is used by the coder to indicate something out of the ordinary. For example, a coder used the code to bring attention to a subject who repeated his interpretation of the information models all over again after he had finished interpreting them. This code was also used by a coder to flag a subject performing the task incorrectly.
For example, the following is part of the transcript of a subject who, instead of interpreting the information model, was criticizing the layout of the information in the model: "Personally for me to make sense of what this is supposed to be, I'd do it differently, may be something more on the lines of pyramid where the most important thing is at the top and branched out from there and was interconnected from there."

(x) Missing interpretation

This error code is used when the subjects omit the interpretation of a chunk of information.

7.5.4.2 Results on Error Types

The total numbers of errors for explicit versus implicit relationship constructs and verb versus noun relationship descriptions for the two phases are shown in Table 7.8 and Figure 7.17. As can be seen from Figure 7.17, in both phases the implicit relationship construct resulted in more errors of interpretation than the explicit relationship construct. Also, verb descriptions are better than noun descriptions in both phases.

                     Phase 1                    Phase 2
Code             Noun   Verb  Total 1      Noun   Verb  Total 2    Row Total
Explicit
  1                11     41       52        22     32       54          106
  2               202      5      207        46      1       47          254
  3                88     36      124        12     12       24          148
  4                 0      1        1         0      0        0            1
  5                86     48      134        35      8       43          177
  6                45     19       64        38     18       56          120
  7                 4     12       16         7     17       24           40
  8                 3      9       12         1      1        2           14
  9                 0      4        4         1      1        2            6
  10               15     42       57        14     12       26           83
  Total           454    217      671       176    102      278          949
Implicit
  1                18     27       45        40     23       63          108
  2               201     34      235       129     30      159          394
  3                88     23      111        31     33       64          175
  4                 1      0        1         0      1        1            2
  5               164    127      291        80     95      175          466
  6                30     48       78        29     26       55          133
  7                 3      4        7         0      3        3           10
  8                 4      3        7         2      2        4           11
  9                 0      0        0         1      0        1            1
  10               41     27       68        14     16       30           98
  Total           550    293      843       326    229      555         1398
Column Total
  1                29     68       97        62     55      117          214
  2               403     39      442       175     31      206          648
  3               176     59      235        43     45       88          323
  4                 1      1        2         0      1        1            3
  5               250    175      425       115    103      218          643
  6                75     67      142        67     44      111          253
  7                 7     16       23         7     20       27           50
  8                 7     12       19         3      3        6           25
  9                 0      4        4         2      1        3            7
  10               56     69      125        28     28       56          181
  Total          1004    510     1514       502    331      833         2347

Table 7.8 Error Counts
[Bar chart omitted: total error counts for explicit versus implicit constructs and verb versus noun descriptions in Phases 1 and 2]

Figure 7.17 Error Counts

In terms of percentages, in Phase 1 the explicit relationship construct has 20% (i.e., (843-671)/843) fewer error counts than the implicit relationship construct, and the verb description has 49% (i.e., (1004-510)/1004) fewer error counts than the noun description. It thus appears that for subjects with very minimal training, the use of a verb over a noun description has a more profound impact than the use of the explicit over the implicit relationship construct (i.e., 49% vs. 20%).

In Phase 2, the explicit relationship construct has 50% (i.e., (555-278)/555) fewer error counts than the implicit relationship construct, and the verb description has 34% (i.e., (502-331)/502) fewer error counts than the noun description. This result is in contrast to the percentages obtained in Phase 1. In Phase 2, where the subjects had more training (i.e., training materials) and more experience (compared to Phase 1) in interpreting information models, the use of the explicit over the implicit relationship construct seems to reduce the error counts more than the use of a verb over a noun description.

The results indicate that with very minimal training, the use of a verb versus a noun description for a relationship is relatively more important than the use of explicit versus implicit relationship constructs. With training and experience in interpreting the information models, the problems encountered in interpreting a noun description can be partly overcome, increasing the relative importance of the use of the explicit over the implicit relationship construct. These differences will be discussed further in Section 7.6.2.

7.5.4.2.1 Explicit Versus Implicit Relationship Constructs in Phase 1

The counts for each error type encountered in interpreting explicit versus implicit relationship constructs in Phase 1 are shown in Table 7.9.
The three main errors in interpreting information models where the relationship is depicted explicitly are:
(i) Confuse relationships with entities/objects (31%, 207 counts),
(ii) Link more than one chunk together (20%, 134 counts),
(iii) Bring/add additional information (18%, 124 counts).

Together these three types of errors accounted for 69% of all errors encountered in the interpretation of information models with the explicit relationship construct in Phase 1.

Error Description                                   Explicit    Implicit
Confuse relationships with entities/objects         207 (31%)   235 (28%)
Link more than one chunk together                   134 (20%)   291 (35%)
Bring/add additional information                    124 (18%)   111 (13%)
Incomprehensible sentence or missing keywords        64 (10%)    78 (9%)
Missing interpretation                               57 (8%)     68 (8%)
Interpret in the reverse direction                   52 (8%)     45 (5%)
Go through all/most words at the beginning           16 (2%)      7 (1%)
Express confusion or difficulty                      12 (2%)      7 (1%)
Others                                                4 (1%)      0 (0%)
Give up                                               1 (0%)      1 (0%)

Table 7.9 Error Counts for Explicit and Implicit Relationship Constructs in Phase 1

The three main errors in interpreting information models where the relationship is depicted implicitly are:
(i) Link more than one chunk together (35%, 291 counts),
(ii) Confuse relationships with entities/objects (28%, 235 counts),
(iii) Bring/add additional information (13%, 111 counts).

These three types of errors are the same as those encountered in interpreting models with the explicit relationship construct, and they accounted for the majority of errors (76%) coded in the interpretation of information models with the implicit relationship construct. The reversal of the top two errors for the implicit versus the explicit relationship construct seems to imply that the subjects encountered difficulties in identifying chunks in models with the implicit relationship construct.
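The percentage reductions quoted in Section 7.5.4.2 all follow the same formula, reduction = (higher - lower) / higher, applied to the phase totals from Table 7.8. A minimal arithmetic check in Python (the helper name is ours; the dissertation reports only the resulting percentages):

```python
# Quick check of the reduction percentages in Section 7.5.4.2,
# using the phase totals from Table 7.8.

def reduction_pct(higher, lower):
    """Percentage reduction of `lower` relative to `higher`, rounded to an integer."""
    return round(100 * (higher - lower) / higher)

# Phase 1 totals: explicit 671 vs. implicit 843; verb 510 vs. noun 1004
print(reduction_pct(843, 671))    # explicit vs. implicit, Phase 1 -> 20
print(reduction_pct(1004, 510))   # verb vs. noun, Phase 1 -> 49

# Phase 2 totals: explicit 278 vs. implicit 555; verb 331 vs. noun 502
print(reduction_pct(555, 278))    # explicit vs. implicit, Phase 2 -> 50
print(reduction_pct(502, 331))    # verb vs. noun, Phase 2 -> 34
```

The same helper reproduces the Phase 1 to Phase 2 drops reported later in this section (e.g., (671-278)/671 gives the 59% reduction for the explicit construct).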
7.5.4.2.2 Verb Versus Noun Relationship Descriptions in Phase 1

Table 7.10 depicts the error counts for each error type for noun and verb relationship descriptions in Phase 1. The three main errors in interpreting information models where the relationship is described using a noun are:
(i) Confuse relationships with entities/objects (40%, 403 counts),
(ii) Link more than one chunk together (25%, 250 counts),
(iii) Bring/add additional information (18%, 176 counts).

Error Description                                   Noun        Verb
Confuse relationships with entities/objects         403 (40%)    39 (8%)
Link more than one chunk together                   250 (25%)   175 (34%)
Bring/add additional information                    176 (18%)    59 (12%)
Incomprehensible sentence or missing keywords        75 (7%)     67 (13%)
Missing interpretation                               56 (6%)     69 (14%)
Interpret in the reverse direction                   29 (3%)     68 (13%)
Go through all/most words at the beginning            7 (1%)     16 (3%)
Express confusion or difficulty                       7 (1%)     12 (2%)
Give up                                               1 (0%)      1 (0%)
Others                                                0 (0%)      4 (1%)

Table 7.10 Error Counts for Verb and Noun Relationship Descriptions in Phase 1

Together these three types of errors accounted for 83% of all errors encountered in the interpretation of information models where a relationship is described using a noun. The three main errors in interpreting information models where the relationship is described using a verb are:
(i) Link more than one chunk together (34%, 175 counts),
(ii) Missing interpretation (14%, 69 counts),
(iii) Interpret in the reverse direction (13%, 68 counts).

These three types of errors accounted for 61% of all error counts in the interpretation of information models with a verb description. It should be noted, however, that the total error count for these three main errors is only 312. This is still lower than the most common error for the noun description (i.e., 403 counts for "Confuse relationships with entities/objects").
This number one error in interpreting the noun description is almost a non-issue in interpreting the verb description (i.e., 403 vs. 39 counts). The use of a verb therefore reduces the confusion between entities/objects and relationships. This is consistent with Larkin and Simon's (1987) argument that the use of an exact form of representation (e.g., a verb for a relationship) is important for user comprehension.

7.5.4.2.3 Explicit Versus Implicit Relationship Constructs in Phase 2

The counts for each error type encountered in interpreting explicit versus implicit relationship constructs in Phase 2 are shown in Table 7.11.

Error Description                                   Explicit    Implicit
Incomprehensible sentence or missing keywords        56 (20%)    55 (10%)
Interpret in the reverse direction                   54 (19%)    63 (11%)
Confuse relationships with entities/objects          47 (17%)   159 (29%)
Link more than one chunk together                    43 (15%)   175 (32%)
Missing interpretation                               26 (9%)     30 (5%)
Bring/add additional information                     24 (9%)     64 (12%)
Go through all/most words at the beginning           24 (9%)      3 (1%)
Express confusion or difficulty                       2 (1%)      4 (1%)
Others                                                2 (1%)      1 (0%)
Give up                                               0 (0%)      1 (0%)

Table 7.11 Error Counts for Explicit and Implicit Relationship Constructs in Phase 2

The three main errors in interpreting information models with the explicit relationship construct in Phase 2 are:
(i) Incomprehensible sentence (20%, 56 counts),
(ii) Interpret in the reverse direction (19%, 54 counts),
(iii) Confuse relationships with entities/objects (17%, 47 counts).

These three types of errors accounted for 56% of all errors encountered in interpreting information models with the explicit relationship construct. Compared to Phase 1, where the most common error alone (i.e., confusing relationships with entities/objects) registered a count of 207, Phase 2 has a much lower error count: its three main types of errors registered a total count of only 157.
There is a substantial reduction in the total number of errors from Phase 1 to Phase 2, with the total error count reduced by 59% (i.e., (671-278)/671).

The three main errors in interpreting information models with the implicit relationship construct in Phase 2 are:
(i) Link more than one chunk together (32%, 175 counts),
(ii) Confuse relationships with entities/objects (29%, 159 counts),
(iii) Bring/add additional information (12%, 64 counts).

These three types of errors accounted for 73% of all errors encountered in interpreting information models with the implicit relationship construct. These three categories are the same (and in the same order) as those encountered in Phase 1. The total number of error counts dropped by 34% (i.e., (843-555)/843) from Phase 1 to Phase 2 for the implicit relationship construct. Compared to the 59% reduction in error counts (from Phase 1 to Phase 2) for the explicit relationship construct, the reduction for the implicit relationship construct is not as substantial (i.e., 25% less than for the explicit relationship construct).

It thus appears that training and experience in interpretation have a more profound impact on interpreting the explicit than the implicit relationship construct. One possible reason is that the constructs used in the information models were discussed in the training sets prior to Phase 2, whereas they were not discussed in the training example given to the subjects prior to Phase 1. The explicit relationship construct was, therefore, meaningful to the subjects in Phase 2 but may have been an "unknown" symbol to the subjects in Phase 1.

The error "Link more than one chunk together" is consistently the top error type for the implicit relationship construct in both phases (291 counts in Phase 1 and 175 counts in Phase 2). Comparatively, this error is substantially reduced for the explicit relationship construct in Phase 2 (134 counts in Phase 1 and 43 counts in Phase 2).
This is not surprising, as the explicit relationship construct facilitates the forming of a chunk. For example, by glancing at the information models, the subjects could easily locate a relationship (i.e., the diamond symbol) and its two accompanying entities to form a chunk. Information models with the implicit relationship construct, on the other hand, appear the same throughout. It is very difficult to know where one chunk ends and the next one begins unless one reads the textual information.

7.5.4.2.4 Verb Versus Noun Relationship Descriptions in Phase 2

Table 7.12 depicts the error counts for each error type for noun and verb relationship descriptions in Phase 2.

Error Description                                   Noun        Verb
Confuse relationships with entities/objects         175 (35%)    31 (9%)
Link more than one chunk together                   115 (23%)   103 (31%)
Incomprehensible sentence or missing keywords        67 (13%)    44 (13%)
Interpret in the reverse direction                   62 (12%)    55 (17%)
Bring/add additional information                     43 (9%)     45 (14%)
Missing interpretation                               28 (6%)     28 (8%)
Go through all/most words at the beginning            7 (1%)     20 (6%)
Express confusion or difficulty                       3 (1%)      3 (1%)
Others                                                2 (0%)      1 (0%)
Give up                                               0 (0%)      1 (0%)

Table 7.12 Error Counts for Verb and Noun Relationship Descriptions in Phase 2

The three main errors in interpreting information models where the relationship is described using a noun are:
(i) Confuse relationships with entities/objects (35%, 175 counts),
(ii) Link more than one chunk together (23%, 115 counts),
(iii) Incomprehensible sentence (13%, 67 counts).

Together these three types of errors accounted for 71% of all errors encountered in interpreting information models with noun relationship descriptions. The top two errors in this phase are the same as those in Phase 1. Overall, there is a 50% reduction (i.e., (1004-502)/1004) in error counts from Phase 1 to Phase 2.
The three main errors in interpreting information models where the relationship is described using a verb are:
(i) Link more than one chunk together (31%, 103 counts),
(ii) Interpret in the reverse direction (17%, 55 counts),
(iii) Bring/add additional information (14%, 45 counts).

These three types of errors accounted for 62% of all errors encountered in interpreting information models with verb descriptions. The total error count of 331 in Phase 2 is a 35% (i.e., (510-331)/510) reduction from Phase 1 (compared to a 50% reduction in error counts for the noun description). This shows that subjects benefited more from the training materials (provided to them before the start of Phase 2) and the experience gained (from Phase 1) when interpreting information models with noun descriptions than with verb descriptions. One reason is that in Phase 1 the verb description was already easy enough for the subjects to comprehend because of its semantic directness to the English language.

7.6 General Discussion

7.6.1 Summary of Findings

The statistical analysis shows that the explicit relationship construct is significantly better than the implicit relationship construct (p < 0.0019). Similarly, the use of a verb description for a relationship results in a significantly higher degree of accuracy of interpretation than the use of a noun description (p < 0.0001). Hypotheses 1 and 2 are supported.

The explicit relationship construct is consistently better than the implicit relationship construct for all three levels of complexity and both phases. This indicates that the salience of the explicit relationship construct is important in end-users' interpretation of information models. Similarly, the verb description is consistently better than the noun description for all three levels of complexity and both phases. This shows that the semantic directness of the verb description is also important in end-users' comprehension of information models.
Results also indicate that the use of a verb description is particularly helpful for subjects who had very little training (Phase 1 of our study).

A Duncan test reveals that there is a significant difference (p < 0.05) in accuracy between the interpretation of simple and intermediate information models, but there is no significant difference between interpreting intermediate and complex information models. Hypothesis 3a is supported whereas hypothesis 3b is not. A distinct break occurred at the intermediate level.

This finding bears some resemblance to the one obtained by Graesser and Mandler (1978). In that experiment, the researchers gave subjects sets of nouns ranging in number from 1 to 12. The subjects' task was to find a dimension, feature, or attribute that linked the set members together. For example, if a particular set contained dog and goat, the subjects might say that they were both animals or that they both had a "g" in them. The latencies of these responses were measured. The results showed that the time taken by the subjects to find some linking dimension or feature increased in a linear fashion up to about 6 nouns per set. Then a distinct break occurred, and after that (from about 6 to 12 items per set) the rate of increase in the time required to find a response (i.e., the slope) was lower than the increase observed up to 6 nouns per set. The researchers explained the decreased slope this way: with up to 6 nouns per set, the subjects were trying to find a dimension or feature that would link all items in the set. But when the subjects were given more than 6 nouns per set, they could not handle them all at once; the task exceeded their limited capacity. In the latter cases, the subjects began to find dimensions, or features, that would link up to 6 nouns but would not cover the extra nouns that the experimenter had included.
This explanation proposed by Graesser and Mandler (1978) could also be used to explain the distinct break that occurred at the intermediate level in our case. For simple information models, the subjects tried to accommodate all the data elements in their short-term memory. However, when dealing with intermediate and complex information models, the subjects could only attend to data elements in the information models up to the limit of short-term memory at any one time. As a result, a complex model is no more than a combination of intermediate models. Therefore, there is a difference in accuracy of interpretation between simple and intermediate models, whereas there is almost no difference in accuracy of interpretation between intermediate and complex models.

Another possible factor contributing to the lack of significant differences between intermediate and complex models is the learning effect: when subjects encountered the complex models, they had already accumulated enough experience from interpreting simple and intermediate models. Similarly, we would expect an even larger drop in accuracy of interpretation from simple to intermediate models if not for this learning effect.

Surprisingly, even for simple models, the use of the explicit relationship construct and the verb description had a profound impact on subjects' performance. Our analysis of error types revealed that subjects had problems forming meaningful chunks of information with the implicit relationship construct or the noun description. It is to be noted that even simple information models contain at least six data elements (i.e., objects/entities or relationships). Thus, if subjects could not form meaningful chunks but instead viewed each individual element (i.e., object/entity or relationship) as a chunk, even simple information models would require substantial cognitive effort to understand. This may have occurred for some subjects.
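The complexity effect discussed above was established with analysis of variance followed by a Duncan multiple-range test. As a rough illustration of the underlying F computation only (the accuracy scores below are invented for illustration and are not the study's data, which came from a full factorial design):

```python
# Illustrative one-way ANOVA F statistic across three complexity levels.
# Hypothetical data; not the dissertation's actual analysis or scores.

def one_way_anova_f(groups):
    """Return the F statistic for a one-way ANOVA over the given groups."""
    k = len(groups)                          # number of groups
    n = sum(len(g) for g in groups)          # total observations
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares (weighted by group size)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical accuracy percentages for simple, intermediate, complex models
simple       = [94, 90, 88, 92, 95]
intermediate = [84, 80, 78, 82, 85]
complex_     = [83, 79, 77, 81, 84]

f_stat = one_way_anova_f([simple, intermediate, complex_])
print(round(f_stat, 2))  # -> 22.56
```

With these invented scores, the simple group stands apart while the intermediate and complex groups are close together, mirroring the pattern the Duncan test found in the study.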
The interaction between model complexity and relationship explicitness is not statistically significant (p > 0.29). Similarly, the interaction between model complexity and relationship description is also not statistically significant (p > 0.9). Hypotheses 4 and 5 are, therefore, not supported.

7.6.2 Further Discussion on Error Types

Table 7.13 lists the number of errors for each of the cells and the number of errors per chunk. Figure 7.18 depicts graphically the total number of errors in interpreting the four representations in the two phases. It should be noted that because there are fewer chunks in simple and intermediate models, the error counts are lower for these two levels of complexity. From Figure 7.18, we can see that for all four representations, the number of errors in Phase 2 is less than the number of errors in Phase 1. This is expected because of the additional training materials given to the subjects before the start of Phase 2 and the practice they had in Phase 1.

[Bar chart omitted: number of errors for each representation (Explicit/Verb, Explicit/Noun, Implicit/Verb, Implicit/Noun) across the three complexity levels (S, I, C) in Phases 1 and 2]

Figure 7.18 Number of Errors for Each Representation
                        Phase 1                     Phase 2
    Code            Simple Interm. Complex     Simple Interm. Complex
Explicit/Noun
    1                  2      4      5            1     10     11
    2                 27     73    102            1     12     33
    3                 24     23     41            2      3      7
    4                  0      0      0            0      0      0
    5                  9     23     54            4     18     13
    6                 10     16     19            8      3     27
    7                  0      3      1            2      0      5
    8                  0      3      0            0      0      1
    9                  0      0      0            0      1      0
    10                 1      2     12            0      5      9
    Total             73    147    234           18     52    106
    Errors/Chunk    0.61   0.64   0.65         0.15   0.23   0.29
Explicit/Verb
    1                  7     16     18            4     12     16
    2                  3      1      1            0      0      1
    3                  8     12     16            3      4      5
    4                  0      1      0            0      0      0
    5                  1     22     25            0      4      4
    6                  1      8     10            3      6      9
    7                  4      3      5            2      8      7
    8                  4      3      2            0      0      1
    9                  0      2      2            1      0      0
    10                 6     13     23            0      3      9
    Total             34     81    102           13     37     52
    Errors/Chunk    0.30   0.37   0.30         0.11   0.17   0.15
Implicit/Noun
    1                  3      7      8            6     11     23
    2                 38     65     98           12     43     74
    3                 18     32     38            6      7     18
    4                  0      1      0            0      0      0
    5                 16     61     87            8     33     39
    6                  2     13     15            5      6     18
    7                  0      0      3            0      0      0
    8                  0      1      3            1      0      1
    9                  0      0      0            0      0      1
    10                 2     10     29            1      4      9
    Total             79    190    281           39    104    183
    Errors/Chunk    0.63   0.79   0.74         0.31   0.43   0.48
Implicit/Verb
    1                  7      8     12            4      5     14
    2                  1     10     23            1     12     17
    3                  6      6     11            5      7     21
    4                  0      0      0            0      0      1
    5                 12     48     67            9     34     52
    6                 13     12     23            6      8     12
    7                  0      0      4            1      2      0
    8                  1      1      1            0      0      2
    9                  0      0      0            0      0      0
    10                 0      6     21            0      4     12
    Total             40     91    162           26     72    131
    Errors/Chunk    0.35   0.42   0.47         0.23   0.33   0.38
Grand Total          226    509    779           96    265    472

Table 7.13 Summary of Errors by Types

Figure 7.19 depicts the number of errors per chunk for each representation. From the figure, we can see that the use of the verb description (i.e., Explicit/Verb and Implicit/Verb) leads to a lower error rate in Phase 1, whereas the use of the explicit relationship construct (i.e., Explicit/Verb and Explicit/Noun) produces fewer errors in Phase 2 (as discussed in Section 7.5.4).

[Line graph omitted: errors per chunk (0-0.9) for each representation across the three complexity levels (S, I, C) in Phases 1 and 2]

Figure 7.19 Number of Errors Per Chunk for Each Representation

Figure 7.20 shows the total number of errors for each representation (i.e., collapsing across the three levels of complexity). The figure shows that Explicit/Verb has the lowest number of errors in Phase 1, followed by Implicit/Verb, Explicit/Noun, and Implicit/Noun.
In Phase 2, however, the order is as follows: Explicit/Verb, Explicit/Noun, Implicit/Verb, and Implicit/Noun. Although the best and worst representations in both phases are consistent, Explicit/Noun outperforms Implicit/Verb in Phase 2. This again indicates that with minimal training, the verb description is more useful in helping the subjects interpret the information models. However, with more training and practice, the explicit relationship construct becomes more useful.

[Figure 7.20: bar chart of the total number of errors for each representation (Explicit/Verb, Explicit/Noun, Implicit/Verb, Implicit/Noun) in Phases 1 and 2]

Figure 7.20 Total Number of Errors for Each Representation

The findings of this study suggest that the explicit relationship construct should be used in the design of modeling methods, as suggested by the various theories reviewed earlier. Using the same construct to represent both object/entity and relationship results in construct overload (Wand & Weber 1993) and leads to confusion. Equally importantly, the use of the verb description facilitates end-users' interpretation of information models. To achieve the best results, the explicit construct and verb descriptions for relationships should be used. The implicit relationship construct, such as masquerading a relationship as an object or an instance connection, should be avoided. The worst combination is the implicit relationship construct with the noun relationship description. The use of the explicit relationship construct may create some problems in the design of information systems or databases or in the use of OO programming languages; but the success of information systems projects depends critically on correct requirements specification. If end-users can better validate the accuracy and correctness of information models that employ the explicit relationship construct, then the explicit relationship construct should be used.
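The orderings read off Figure 7.20 can be recomputed from the Total rows of Table 7.13, collapsing each representation across the three complexity levels. A minimal sketch (the sums below are transcribed from the table; variable names are ours):

```python
# Phase totals per representation, collapsed across the three complexity
# levels; each sum is the corresponding 'Total' row of Table 7.13.
phase1 = {"Explicit/Verb": 34 + 81 + 102,   # 217
          "Explicit/Noun": 73 + 147 + 234,  # 454
          "Implicit/Verb": 40 + 91 + 162,   # 293
          "Implicit/Noun": 79 + 190 + 281}  # 550
phase2 = {"Explicit/Verb": 13 + 37 + 52,    # 102
          "Explicit/Noun": 18 + 52 + 106,   # 176
          "Implicit/Verb": 26 + 72 + 131,   # 229
          "Implicit/Noun": 39 + 104 + 183}  # 326

def rank(totals_by_rep):
    """Representations ordered from fewest to most errors."""
    return [rep for rep, _ in sorted(totals_by_rep.items(),
                                     key=lambda kv: kv[1])]

print(rank(phase1))  # Explicit/Verb, Implicit/Verb, Explicit/Noun, Implicit/Noun
print(rank(phase2))  # Explicit/Verb, Explicit/Noun, Implicit/Verb, Implicit/Noun
```

The two rankings match the orderings stated in the text: Explicit/Noun overtakes Implicit/Verb once the subjects receive the additional Phase 2 training.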
Even for database design, some researchers (e.g., Nachouki et al. 1992, Narashimhan et al. 1993, Biskup et al. 1996) have proposed the use of the ER model as a front-end to OO databases, by translating the ER schema to an OO schema. Shoval (1996), for example, advocates that the information model be designed using an ER model before mapping it to an equivalent OO schema. Based on the results of this study, it appears that this might be a good strategy for database design.

7.6.3 Remarks on Coding Scheme and Error Codes

The use of a coding scheme and error codes has been shown to be very valuable and comprehensive. As can be seen from Table 7.8, error code number 9, labeled as "Others (will be documented in the coding sheet by coder)", accounted for only 7 out of a total of 2347 error counts (about 0.3%). On the whole, there are six main categories of errors (in order of error counts): (i) Confuse relationships with entities/objects, (ii) Link more than one chunk together, (iii) Bring/add additional information, (iv) Incomprehensible sentence or missing keywords, (v) Interpret in the reverse direction, (vi) Missing interpretation. These six types of errors accounted for 96.4% of all the errors (i.e., 2262/2347). The other four error types are responsible for only 3.6% of the errors. Thus, for similar studies in the future, seven categories of errors would suffice: (i) Confuse relationships with entities/objects, (ii) Link more than one chunk together, (iii) Bring/add additional information, (iv) Incomprehensible sentence or missing keywords, (v) Interpret in the reverse direction, (vi) Missing interpretation, (vii) Others (will be documented in the coding sheet by coder).
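The coverage percentages quoted above follow directly from the aggregate counts reported in the study (2262 of 2347 coded errors fall in the six main categories); a quick arithmetic check:

```python
# Cross-check of the coverage figures for the coding scheme: the six main
# error categories account for 2262 of the 2347 coded errors in total.
main_six, total_errors = 2262, 2347
main_share = 100 * main_six / total_errors
rest_share = 100 * (total_errors - main_six) / total_errors
print(f"six main categories: {main_share:.1f}%")   # 96.4%
print(f"other four categories: {rest_share:.1f}%")  # 3.6%
```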
CHAPTER 8
Conclusions and Suggestions for Future Research

Every revolutionary idea — science, politics, art or whatever — evokes the same stages of reaction:
    it is impossible — don't waste my time
    it is possible but it is not worth doing
    I said that it was a good idea all along
                                                — Anon

Information modeling is an indispensable and critical component of systems analysis and design (Olle et al. 1988, Dahlbom & Mathiassen 1993). Researchers and practitioners are beginning to realize that although the term information modeling conjures up images of models, formulas, algorithms, and "hard" scientific approaches, it is the behavioral and human factors issues, not the technical issues, that are the main culprits for incorrect or inaccurate requirements specification (Lucas 1975, Davis 1982, Valusek & Fryback 1987, Coad & Yourdon 1991, Avison & Fitzgerald 1995).

8.1 Summary of Dissertation Research

The underlying thrust of this research is to develop an understanding, through empirical studies, of the human factors and behavioral issues involved in information modeling. The focus on the relationship construct was undertaken as an initial step towards developing the methods and techniques for such research. The research design adopted the two-stage approach suggested by Cooper and Emory (1995). In the exploratory stage, we explored the research area with the objectives of understanding the area and testing the research design and instruments. The lessons learned and experience gained from the exploratory studies contributed to the experimental design and instrument development in the two formalized experimental studies of the second stage. The findings of the experimental studies lead to a better understanding of the relationship construct, its usage by modeling experts and novices, and better ways of representing the relationship construct.
In addition, the experimental procedures, methods, and instruments documented in the dissertation (e.g., questionnaire format, verbal protocol technique, coding scheme) are also valuable and can serve as templates for future studies. The following sections briefly summarize the four experimental studies and their findings.

8.1.1 First Exploratory Study

This study investigated the effect of domain knowledge on modeling experts' selection of connectivity for the relationship construct. For both familiar and unfamiliar domains, the number of May choices selected by the subjects was overwhelmingly greater than the number of Must choices. A post-hoc investigation of the questions with familiar domains revealed that all six questions could be interpreted as having a "natural" optional connectivity. Thus, the fact that the majority of the subjects selected May choices for questions with familiar domains may be due to the nature of the questions. This explanation, however, could not be applied to the questions with unfamiliar domains. We speculated that the reason for the popularity of May for questions with unfamiliar domains was that these subjects realized that May can be considered a "superset" — they selected the May choices just to be safe. As for confidence level, the questions with familiar domains had a significantly higher confidence level than the questions with unfamiliar domains. The perceived familiarity of the domain gave the subjects more confidence in their answers.

8.1.2 Second Exploratory Study

This study analyzed the effect of conflicting textual information and structural constraints on modeling experts' selection of connectivity (i.e., mandatory versus optional relationship). The results showed that the presence of structural constraints had a significant impact on the choice of connectivity, confidence level, and perceived domain familiarity.
The group of subjects that were given information models with structural constraints made an almost equal number of Must and May selections. However, the group that was given information models with no structural constraints selected significantly more May than Must, which was consistent with the findings from the first exploratory study. Further analysis indicated that when structural constraints were present, almost all the subjects followed the structural constraints depicted and ignored the textual information given on the information model (even when the structural constraints and textual information were in conflict). The group of subjects that were given information models with structural constraints reported significantly higher confidence and perceived domain familiarity than the group that were given information models with no structural constraints.

8.1.3 First Formalized Study

This study investigated the differences between modeling experts and novices in their interpretation of information models. Modeling experts and novices were not significantly different in their choice of interpretation when structural constraints were absent. Both groups, in this case, had to interpret the information models based on the textual information (i.e., wordings) available. There was also no significant difference between the two groups when structural constraints were consistent with the textual information. The results, however, showed that experts and novices differed significantly when the structural constraints contradicted the textual information. Almost all the expert subjects followed the structural constraints depicted in the information model in their interpretation, even in cases where the structural constraints were apparently contradictory to the textual information. We termed this behavior exhibited by modeling experts "attentional bias" and "textual information negligence."
The findings also showed that experts were more confident than novices in their interpretation when structural constraints were present — irrespective of whether the structural constraints were contradictory to the textual information.

8.1.4 Second Formalized Study

This study differed from the other three: it investigated the effect of relationship representations on the interpretation of information models by modeling novices. Verbal protocol analysis was employed to determine the number of errors made in the interpretation and to categorize these errors through the use of a coding scheme. The statistical analyses indicated that the explicit relationship construct was significantly better than the implicit relationship construct, and that the use of the verb description for relationships resulted in a significantly higher degree of accuracy of interpretation than the use of the noun description. The results also showed that Explicit/Verb had the lowest number of errors in Phase 1 (i.e., after minimal training), followed by Implicit/Verb, Explicit/Noun, and Implicit/Noun. In Phase 2 (i.e., after more training), the order (from lowest to highest number of errors) was: Explicit/Verb, Explicit/Noun, Implicit/Verb, and Implicit/Noun. Although the best and worst representations (i.e., Explicit/Verb and Implicit/Noun) were consistent in both phases, Explicit/Noun outperformed Implicit/Verb in Phase 2, but not in Phase 1. This indicated that with training and practice, the explicit relationship construct becomes more useful in helping subjects correctly interpret the information models. The use of the verb description, on the other hand, is more useful for subjects who have very little training (as in Phase 1).

8.2 Potential Contributions of Experimental Findings

The experimental findings from this dissertation indicate that human factors and behavioral studies could contribute to our understanding and design of information modeling methods and constructs.
First, these studies could help us identify differences among users of information models. For example, our first formalized study showed that there were significant differences between modeling experts and novices in their interpretation of information models. Modeling experts focused almost entirely on the structural constraints depicted in the information model and tended to ignore the textual information. An understanding of these differences would help in the design of training materials for analysts and end-users, and in the design of information models and constructs to suit specific users. For example, the training materials for analysts may stress the importance of consistency between textual information and structural constraints in an information model.

Second, these studies help in identifying constructs that more closely match the information processing characteristics of the users. For example, our second formalized study showed that the use of the explicit relationship construct and the verb description facilitated the comprehension of information models by modeling novices. Masquerading relationships as objects or instance connections was detrimental to users' understanding of information models.

Third, these studies may suggest ways of improving user training and learning. For example, our first formalized study indicated that modeling experts were susceptible to cognitive biases. To counter these biases, such incidents should be documented and made known to analysts. Our second formalized study also suggested that for users with very minimal training (e.g., users shown only one example), using the verb description for relationships helps in improving the accuracy of interpretation.

Fourth, the findings of such studies can provide guidelines for the design of information modeling methods and constructs, and make design choices and assumptions explicit. One of the new directions in this area is Method Engineering.
It has recently been recognized that behavioral studies are important for guiding the engineering of modeling methods and tools. However, there is still a paucity of behavioral and human factors studies in this area. For example, we mentioned in Chapter 1 that most of the constructs used in the various modeling methods are introduced based on intuition or common sense, and have little support from theories or empirical evidence. The results of behavioral and human factors studies can provide the necessary empirical evidence to make known the reasons for these design decisions.

Fifth, the results of such studies can provide a predictive evaluation of newly proposed information modeling methods. There are often alternative constructs and representations to be considered in the design process, for example, whether or not to use an explicit construct for relationships. By predicting user behavior, these studies can help in choosing among competing options.

Finally, these studies can guide the design of future experiments and help in the interpretation of results. For example, the studies in this dissertation can provide research ideas (e.g., investigating the use of the object construct by modeling experts and novices) and templates (e.g., the use of protocol analysis) for future research in information modeling.

8.3 Theoretical Contribution to Information Modeling

The theoretical foundation for this dissertation is drawn mainly from the cognitive psychology literature. It also draws upon theories from communication, philosophy, and computer science. Based on the findings of our studies, we believe that the lack of a strong theoretical foundation in the field of information modeling can be partly overcome by adopting theories from cognitive psychology and other reference disciplines. The theories used in this dissertation, for instance, provide good predictions of human behavior in information modeling.
Theories and models from cognitive psychology, such as the human information-processing system, the three-stage learning model, and the Adaptive Control of Thought (ACT), were found to be particularly valuable to this research — in guiding the research design and in interpreting the experimental findings. Over the last few decades, cognitive psychologists have developed a large body of knowledge on human cognitive processes and structures. This knowledge covers areas such as perception, performance, memory, learning, problem solving, cognitive biases, and information processing. Much of what cognitive psychologists have discovered about human information processing and behavior can help us understand information modeling processes (i.e., the constraints of the human mind, cognitive biases, etc.). Many of the problems encountered in studying human behavior have already been explored, studied, and documented in the cognitive psychology literature, and knowledge of these problems and the ways to resolve or avoid them is valuable to the study of information modeling. One of our future goals is to bridge the gap between cognitive psychology and information modeling by identifying theories (e.g., ACT, Cognitive Capacity Theory) that are relevant to information modeling, and experimental techniques that have direct applications to the study of human factors in information modeling.

8.4 Possible Future Research Directions

This dissertation focuses on individual cognition. There are two main reasons for this choice. First, we see individual cognition as the first step in building a scientific basis for understanding human behavior in information modeling; the experimental designs and theories from cognitive psychology provide a good foundation for research on information modeling. Secondly, it is very difficult to interpret the results obtained from multi-individual studies without a sufficient understanding of the information modeling processes of individuals.
[Figure 8.1: a tree of research directions. "Human Factors in Systems Analysis & Design" branches into Individual Cognition and Group Processes. Under Individual Cognition: effect of other constructs (e.g., object); effect of domain knowledge; effect of different levels of training; differences between modeling experts and novices; effect of different representations; cognitive biases in information modeling. Under Group Processes: communication processes between analysts and users; group dynamics; power structure; effect of group support technology; CASE tools as support technology. The Ph.D. research and possible future directions are marked on the tree.]

Figure 8.1 Future Research Directions

Figure 8.1 depicts possible future research directions. We would like to further investigate individual cognition in subsequent studies. For example, the study of the popular object construct would be an interesting follow-up. At the same time, we will also develop theories and empirical designs for studying group processes in information modeling. For instance, we are interested in studying the communication process between modeling experts and novices, and their group dynamics and power structure. The use of group support technology (e.g., GDSS) and CASE tools for information modeling will also be investigated.

[Figure 8.2: the Ph.D. research (laboratory experiments) shown extending towards future endeavours in field settings]

Figure 8.2 A Multi-Methodological Approach

Another logical extension of the current research is to conduct the studies in a realistic field setting (as shown in Figure 8.2). Field studies on the use of various modeling methods would provide much insight into the strengths and weaknesses of the methods. Running experiments in the field is not an easy task: many confounding variables have to be controlled (thereby compromising the realism of the studies). There are other research methodologies that are well suited to field studies. Case studies and action research could be used to validate the findings from laboratory experiments and to derive new research questions.
For example, in action research, the researchers could be part of a systems analysis and design project team. Case studies and action research will be pursued in the near future to complement our current research. With this dissertation and future research, we hope to eventually develop an explanatory model of human behavior in information modeling, a goal that may take years or decades of research to achieve. Human factors are important variables that warrant more research attention in the field. It is also our hope that this dissertation will draw the attention of researchers to the need for empirical studies in this field and serve as a spur to increased research effort in this area.

References

Ackoff, R., Management Misinformation Systems, Management Science, Vol. 14, No. 4, December 1967, pp. 147-156.

Ackoff, R.L., Towards a System of System Concepts, Management Science, Vol. 17, July 1971.

Adelson, B., Problem Solving and the Development of Abstract Categories in Programming Languages, Memory and Cognition, 9, 1981, pp. 422-433.

Adelson, B., When Novices Surpass Experts: The Difficulty of a Task May Increase With Expertise, Journal of Experimental Psychology: Learning, Memory, and Cognition, Vol. 10, No. 3, 1984, pp. 483-495.

Agassi, J., Ontology and Its Discontent, in: Studies on Mario Bunge's Treatise, Weingartner, P., and Dorn, G.J.W. (eds.), Rodopi, Amsterdam, pp. 105-122.

Airchinnigh, M.M.A., Tutorial Lecture Notes on the Irish School of the VDM, in: VDM 1991: Formal Software Development Methods, Lecture Notes in Computer Science, Vol. 552, Berlin: Springer-Verlag, 1991, pp. 141-237.

Alexander, J.H., Freiling, M.J., Shulman, S.J., Rehfuss, S., and Messick, S.L., Ontological Analysis: An Ongoing Experiment, Knowledge-Based Systems, Vol. 2, 1988, pp. 25-37.

Alford, M., A Requirements Engineering Methodology for Real Time Processing Requirements, IEEE Transactions on Software Engineering, SE-3, 1, January 1977, pp. 60-69.
Alford, M., SREM at the Age of Eight: The Distributed Computing Design System, Computer, Vol. 18, No. 4, April 1985, pp. 36-46.

Allport, A., Visual Attention, in: Foundations of Cognitive Science, Posner, M.I. (ed.), The MIT Press, 1989, pp. 631-682.

Anderson, J.R., Acquisition of Cognitive Skill, Psychological Review, Vol. 89, No. 4, 1982, pp. 369-406.

Anderson, J.R., Cognitive Psychology and Its Implications, Second Edition, W.H. Freeman and Company, 1985.

Anderson, J.R., Language, Memory and Thought, Hillsdale, NJ: Erlbaum, 1976.

Anderson, J.R., Learning and Memory: An Integrated Approach, John Wiley & Sons, 1995.

Anderson, J.R., Rules of the Mind, Lawrence Erlbaum Associates, 1993.

Anderson, J.R., Skill Acquisition: Compilation of Weak-Method Problem Solutions, Psychological Review, Vol. 94, No. 2, 1987, pp. 192-210.

Anderson, J.R., The Architecture of Cognition, Cambridge, MA: Harvard University Press, 1983.

Avison, D.E., and Fitzgerald, G., Information Systems Development: Methodologies, Techniques, and Tools, Second Edition, McGraw-Hill, London, 1995.

Babbie, E., The Practice of Social Research, Sixth Edition, Wadsworth Publishing Company, 1992.

Banerjee, J., Chou, H.T., Garza, J.F., Kim, W., Woelk, D., Ballou, N., and Kim, H.J., Data Model Issues for Object-Oriented Applications, ACM Transactions on Office Information Systems, Vol. 5, No. 1, January 1987, pp. 3-26.

Batini, C., Ceri, S., and Navathe, S.B., Conceptual Database Design: An Entity-Relationship Approach, Benjamin/Cummings, 1992.

Batra, D., and Srinivasan, A., A Review and Analysis of the Usability of Data Management Environments, International Journal of Man-Machine Studies, 36, 1992, pp. 395-417.

Batra, D., A Framework for Studying Human Error Behavior in Conceptual Database Modeling, Information & Management, Vol. 25, 1993, pp. 121-131.

Batra, D., and Antony, S.R., Novice Errors in Conceptual Database Design, European Journal of Information Systems, Vol. 3, No. 1, 1994, pp. 57-69.
Batra, D., and Davis, J.G., Conceptual Data Modeling in Database Design: Similarities and Differences Between Expert and Novice Designers, International Journal of Man-Machine Studies, Vol. 37, 1992, pp. 83-101.

Batra, D., and Sein, M.K., Improving Conceptual Database Design Through Feedback, International Journal of Human-Computer Studies, Vol. 40, 1994, pp. 653-676.

Batra, D., and Zanakis, S.H., A Conceptual Database Design Approach Based on Rules and Heuristics, European Journal of Information Systems, Vol. 3, No. 3, 1994, pp. 228-239.

Batra, D., Hoffer, J.A., and Bostrom, R.P., Comparing Representations with Relational and EER Models, Communications of the ACM, Vol. 33, No. 2, February 1990, pp. 126-139.

Beard, J.W., and Peterson, T.O., A Taxonomy for the Study of Human Factors in Management Information Systems (MIS), in: Carey, J.M. (ed.), Human Factors in MIS, Norwood, NJ: Ablex, pp. 7-26.

Bell, T.E., Bixler, D.C., and Dyer, M.E., An Extendible Approach to Computer-aided Software Requirements Engineering, IEEE Transactions on Software Engineering, SE-3, 1, January 1977, pp. 49-60.

Benbasat, I., and Dexter, A.S., An Experimental Evaluation of Graphical and Color-enhanced Information Presentation, Management Science, Vol. 31, No. 11, 1985, pp. 1348-1364.

Benbasat, I., and Dexter, A.S., An Investigation of Color and Graphical Information Presentation Under Varying Time Constraints, MIS Quarterly, Vol. 10, No. 1, 1986, pp. 59-83.

Benbasat, I., Dexter, A.S., and Todd, P., An Experimental Program Investigating Color-Enhanced and Graphical Information Presentation: An Integration of the Findings, Communications of the ACM, Vol. 29, No. 11, 1986, pp. 1094-1105.

Benbasat, I., Laboratory Experiments in Information Systems Studies with a Focus on Individuals: A Critical Appraisal, in: The Information Systems Research Challenge: Experimental Research Methods, Vol. 2, Benbasat, I. (ed.), Harvard Business School, 1990, pp. 33-47.
Berger, P.L., and Luckman, T., The Social Construction of Reality: A Treatise in the Sociology of Knowledge, Harmondsworth: Penguin, 1971.

Best, J.B., Cognitive Psychology, Third Edition, West Publishing Company, 1992.

Beyer, H.R., and Holtzblatt, K., Apprenticing with the Customer, Communications of the ACM, Vol. 38, No. 5, May 1995, pp. 45-52.

Biskup, J., Menzel, R., and Polle, T., Transforming an Entity-Relationship Schema into Object-Oriented Database Schemas, in: Eder, J., and Kalinichenko, L.A. (eds.), Advances in Databases and Information Systems, Springer-Verlag, 1996 (to appear).

Boar, B.H., Application Prototyping: A Requirements Definition Strategy for the 80's, New York: John Wiley & Sons, 1984.

Booch, G., Object Oriented Design with Applications, The Benjamin/Cummings Publishing Company, 1991.

Booch, G., Object-Oriented Analysis and Design — With Applications, Second Edition, Benjamin/Cummings, 1994.

Booth, P., An Introduction to Human-Computer Interaction, Lawrence Erlbaum Associates Ltd., 1989.

Broadbent, D.E., Decision and Stress, London: Academic Press, 1971.

Borgida, A., Knowledge Representation and Semantic Modeling: Similarities and Differences, in: Entity-Relationship Approach: The Core of Conceptual Modeling, Kangassalo, H. (ed.), Elsevier Science Publishers, 1991, pp. 1-24.

Borgida, A., Greenspan, S., and Mylopoulos, J., Knowledge Representation as the Basis for Requirement Specifications, Computer, April 1985, pp. 82-90.

Bower, G.H., Chunks as Interference Units in Free Recall, Journal of Verbal Learning and Verbal Behavior, Vol. 8, 1969, pp. 610-613.

Brancheau, J., and Wetherbe, J., Information Architecture: Methods and Practices, Information Processing and Management, Vol. 22, No. 6, 1986, pp. 453-463.

Brodie, M.L., On the Development of Data Models, in: On Conceptual Modelling, Brodie, M.L., Mylopoulos, J., and Schmidt, J.W. (eds.), Springer-Verlag, 1984.
Brodie, M.L., Silver Bullet Shy On Legacy Mountain: When Neat Technology Just Doesn't Work, Keynote Speech, Conference on Advanced Information Systems Engineering, Crete, Greece, May 20-24, 1996.

Brooks, F.P., No Silver Bullet: Essence and Accidents of Software Engineering, IEEE Computer, April 1987, pp. 10-19.

Bubenko, J.A., and Wangler, B., Research Directions in Conceptual Specification Development, in: Loucopoulos, P., and Zicari, R. (eds.), Conceptual Modeling, Databases, and Case: An Integrated View of Information Systems Development, John Wiley & Sons, Inc., 1992, pp. 389-412.

Bubenko, J.A., Information Systems Methodologies — A Research Review, in: Information Systems Design Methodologies: Improving the Practice, Olle, T.W., Sol, H.G., and Verrijn-Stuart, A.A. (eds.), Elsevier Science Publishers, North-Holland, IFIP, 1986, pp. 289-318.

Bunge, M., Treatise on Basic Philosophy: Vol. 3: Ontology I: The Furniture of the World, Reidel, Boston, 1977.

Bunge, M., Treatise on Basic Philosophy: Vol. 4: Ontology II: A World of Systems, Reidel, Boston, 1979.

Card, S.K., Moran, T.P., and Newell, A., The Psychology of Human-Computer Interaction, Hillsdale, NJ: Erlbaum, 1983.

Carroll, J.M., Mack, R.L., Lewis, C.H., Grischkowsky, N.L., and Robertson, S.R., Exploring a Word Processor, Human-Computer Interaction, Vol. 1, 1985, pp. 283-307.

Carter, L.F., An Experiment on the Design of Tables and Graphs Used for Presenting Numerical Data, Journal of Applied Psychology, Vol. 31, 1947, pp. 640-650.

Cash, J.I., McFarlan, F.W., McKenney, J.L., and Applegate, L.M., Corporate Information Systems Management: Text and Case, Third Edition, Richard D. Irwin Inc., 1992.

Caverni, J.P., Fabre, J.M., and Gonzalez, M., Cognitive Biases, North-Holland, 1990.

Chaffin, R., and Herrmann, D.J., Effects of Relation Similarity on Part-Whole Decisions, The Journal of General Psychology, Vol. 115, No. 2, 1988, pp. 131-139.
Chaffin, R., and Herrmann, D.J., Relation Element Theory: A New Account of the Representation and Processing of Semantic Relations, in: Memory and Learning — The Ebbinghaus Centennial Conference, Gorfein, D.S., and Hoffman, R.R. (eds.), Lawrence Erlbaum Associates, 1987, pp. 221-245.

Chaffin, R., and Herrmann, D.J., The Similarity and Diversity of Semantic Relations, Memory & Cognition, Vol. 12, No. 2, 1984, pp. 134-141.

Champeaux, D.D., Lea, D., and Faure, P., Object-Oriented System Development, Addison-Wesley Publishing Company, 1993.

Chan, H.C., Wei, K.K., and Siau, K.L., Conceptual Level versus Logical Level User-Database Interaction, Twelfth Annual International Conference on Information Systems (ICIS 91), 1991, pp. 29-40.

Chan, H.C., Wei, K.K., and Siau, K.L., User-Database Interface: The Effect of Abstraction Levels on Query Performance, Management Information Systems Quarterly (MISQ), Vol. 17, No. 4, December 1993, pp. 441-464.

Chang, T.M., Semantic Memory: Facts and Models, Psychological Bulletin, Vol. 99, No. 2, 1986, pp. 199-220.

Chase, W.G., and Simon, H.A., Perception in Chess, Cognitive Psychology, 4, 1973, pp. 55-81.

Checkland, P., From Optimising to Learning: A Development of Systems Thinking for the 1990s, Journal of the Operational Research Society, Vol. 36, No. 9, 1985.

Checkland, P., Systems Thinking, Systems Practice, Wiley, Chichester, 1981.

Chen, P.P., The Entity-Relationship Model: Toward a Unified View of Data, ACM Transactions on Database Systems, Vol. 1, No. 1, 1976, pp. 9-36.

Churchland, P.M., Matter and Consciousness, Cambridge, MA: MIT Press, 1988.

Coad, P., and Yourdon, E., Object-Oriented Analysis, Second Edition, Prentice Hall, 1991.

Codd, E.F., The Relational Model for Database Management: Version 2, MA: Addison-Wesley, 1990.

Cohen, J., and Cohen, P., Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences, Second Edition, Lawrence Erlbaum Associates, 1983.

Cohen, J.,
Some Statistical Issues in Psychological Research, in: Handbook of Clinical Psychology, Wolman, B.B. (ed.), McGraw-Hill, 1965.

Cohen, J., Statistical Power Analysis for the Behavioral Sciences (Rev. ed.), Academic Press, 1977.

Cohen, J., Statistical Power Analysis for the Behavioral Sciences, Second Edition, Lawrence Erlbaum Associates, 1988.

Collins, A.M., and Loftus, E.F., A Spreading-Activation Theory of Semantic Processing, Psychological Review, 82, 1975, pp. 407-428.

Conger, S., The New Software Engineering, Wadsworth, 1994.

Cook, T.D., and Campbell, D.T., Quasi-Experimentation: Design and Analysis Issues for Field Settings, Boston: Houghton Mifflin, 1979.

Cooper, D.R., and Emory, C.W., Business Research Methods, Fifth Edition, Irwin, 1995.

Cupp, C.G. (editor and principal author), Integrated Process Capture and Process Analysis Tools in Support of Business Re-engineering Applications, Proceedings of the CE & CALS Conference, Washington, D.C., June 1993, pp. 275-330.

Currie, W., The Strategic Management of a Large Scale IT Projects in the Financial Services Sector, New Technology Work and Employment, Vol. 9, No. 1, 1994, pp. 19-29.

Curtis, B., Krasner, H., and Iscoe, N., A Field Study of the Software Design Process for Large Systems, Communications of the ACM, Vol. 31, No. 11, 1988, pp. 1268-1287.

Curtis, B., Objects of our Desire: Empirical Research on Object-Oriented Development, Human-Computer Interaction, Vol. 10, No. 2/3, 1995, pp. 337-344.

Dahlbom, B., and Mathiassen, L., Computers In Context: The Philosophy and Practice of Systems Design, Blackwell Publishers, Cambridge, Massachusetts, 1993.

Davenport, T.H., and Short, J.E., The New Industrial Engineering: Information Technology and Business Process Redesign, Sloan Management Review, Summer 1990, pp. 11-27.

Davenport, T.H., and Stoddard, D.B., Reengineering: Business Change of Mythic Proportions?, MIS Quarterly, June 1994, pp. 121-127.
Davis, D., Business Research for Decision Making, Fourth Edition, Wadsworth Publishing Company, 1996. Davis, F.D., A Technology Acceptance Model for Empirically Testing New End-user Information Systems: Theory and Results, Ph.D. Dissertation, MIT, 1986. Davis, G.B., and Olson, M., Management Information Systems: Conceptual Foundations, Structure and Development, Second Edition, McGraw-Hill, 1985. Davis, G.B., Management Information Systems: Conceptual Foundations, Structure, and Development, New York: McGraw-Hill, 1974. Davis, G.B., Strategies for Information Requirements Determination, in: Information Analysis: Selected Readings, Galliers R. (ed.), Addison-Wesley, 1987, pp. 237-265. Davis, G.B., Strategies for Information Requirements Documentations, IBM Systems Journal, Vol. 21, No. 1, 1982, pp. 4-30. Davis, J.S., Experimental Investigation of the Utility of Data Structure and ER Diagrams in Database Query, International Journal of Man-Machine Studies, 1990, 32, pp. 449-459. De Marco, T., Structured Analysis and System Specification, Englewood Cliffs, NJ: Prentice-Hall, Inc., 1979. DeGroot, A.D., Thought and Choice in Chess, New York: Basic Books, 1965. Dervin, B., Information as a User Construct: The Relevance of Perceived Information Needs to Synthesis and Interpretation, in: Ward, S.A., and Reed, L.J. (eds.), Knowledge Structure and Use: Implications for Synthesis and Interpretation, Philadelphia, Pa: University Press, 1983, pp. 153-183. -213-Dewitz, S.D., Systems Analysis and Design and the Transition to Objects, McGraw Hill, 1996. Dijkstra, E.W., A Manuscript for the Coahuila Student Chapter of the ACM, EWD 1115, 29 November 1991. Doane, A Longitudinal Study of UNIX Users' Expertise: UNIX Mental Models and Task Performance, Unpublished Doctoral Dissertation, University of California, Santa Barbara, 1986. DTI, Knowledge Based Systems: A Survey, Touche Ross, London, 1992. 
Einhorn, H.J., and Hogarth, R.M., Confidence in Judgment: Persistence of the Illusion of Validity, Psychological Review, Vol. 85, No. 5, 1978, pp. 395-476. Elmasri R., and Navathe, S.B., Fundamentals of Database Systems, Second Edition, Benjamin/Cummings, 1994. Elmasri, R. and Navathe, S.B., Fundamentals of Database Systems, Addison Wesley, 1989. Elmasri, R., Hevner, A. and Weeldreyer, J., The Category Concept: An Extension to the Entity-Relationship Model, Data Knowledge Engineering, Vol. 1, No. 1, 1985, pp. 75-116. Embley, D.W., Jackson, R.B., and Woodfield, S.N., IEEE Software, Vol. 12, No. 4, 1995, pp. 19-32. Embley, D.W., Kurtz, and Woodfield, S.N., Object-Oriented System Analysis — A Model-Driven Approach, Yourdon Press, Englewood Cliffs, New Jersey, 1992. Ericsson, K.A., and Simon, H.A., Protocol Analysis: Verbal Reports As Data, Revised Edition, The MIT Press, 1993. Erlich, K., and Soloway, E., An Empirical Investigation of the Tacit Plan Knowledge in Programming, in: Human Factors in Computer Systems, J.C. Thomas, and M.L. Schneider (eds.), Norwood, NJ: Ablex, 1984, pp. 113-133. Everest, G.C., Database Management: Objectives, System Functions, and Administration, McGraw-Hill, New York, 1986. Ewusi-Mensah, K., and Przasnyski, Z.H., On Information Systems Project Abandonment: An Exploratory Study of Organizational Practices, MIS Quarterly, Vol. 15, No. 1, 1991, pp. 67-86. Falkenberg, E., and Lindgreen, P., Proposal to TC8 for the establishment of the FRISCO task group, IFIP WG 8.1 memo February, 1987. Fitts, P.M., Perceptual-motor Skill Learning, in: A.W. Melton (ed.), Categories of Human Learning, New York: Academic Press, 1964. Flanagan, M. , and Dipboye, R., Research Settings in Industrial and Organizational Psychology: Facts, Fallacies and The Future, Personnel Psychology, 34, 1981, pp. 37-47. -214-Gagne, E.D., Yekovich, C.W., and Yekovich, F.R., The Cognitive Psychology of School Learning, Harper Colins, 1993. Gane, C. 
and Sarson, T., Structured Systems Analysis: Tools and Techniques, Englewood Cliffs, NJ: Prentice Hall, Inc., 1979. Gentner, D., and Grudin, J., Design Models for Computer-Human Interfaces, IEEE Computer, June 1996, pp. 28-35. Gentner, D., and Stevens, A.L., Mental Models, Hillsdale, NJ: Erlbaum, 1983. Gibbs, W., Software's Chronic Crisis, Scientific American, September 1994, pp. 86-95. Gingras, L., and McLean, E., Designers and Users of Information Systems: A Study of Differing Profiles, In Proceedings of Third International Conference on Information Systems, Ann Arbor, Michigan, December 1982, pp. 169-181. Gladden, G.R., Stop the Life Cycle, I Want to Get Off, Software Engineering Notes, Vol. 7, No. 2, April 1982. Glaser, R., and Chi, M.T.H., in: M.T.H., Chi, R. Glaser, and M.J. Farr (eds.), The Nature of Expertise, Hillsdale, NJ: Erlbaum, 1988, pp. xv-xxxvi. Glaser, R., Thoughts on Expertise, in: Cognitive Functioning and Social Structure Over the Life Course, in: C. Schooler and K.W., Schaie (eds.), NJ: Ablex, 1987, pp. 81-95. Goldstein, R.C. and Storey, V., Some Findings on the Intuitiveness of Entity-Relationship Concepts, Entity-Relationship Approach to Database Design, Lochovsky, F.H. (ed.), ER Institute, 1990, pp. 9-23. Goldstein, R.C., Database: Technology and Management, John Wiley & Sons, 1985. Graham, I., Object Oriented Methods, Addison-Wesley, 1991. Grandon, G.T., Early Expert Systems: Where are they now?, MIS Quarterly, Vol. 19, No. 1, 1995, pp. 51-81. Grant, F.J., Twenty-First Century Software, Datamation, April 1, 1985 Grasser, A., and Mandler, G., Limited Processing Capacity Constrains the Storage of Unrelated Sets of Words and Retrieval from Natural Categories, Journal of Experimental Psychology: Human Learning and Memory, Vol. 4, 1978, pp. 86-100. Gray, P.M.D., Kulkarni, K.G. and Paton, N.W., Object-Oriented Databases: A Semantic Data Model Approach, Prentice-hall, 1992. 
Green, T.R.G., Programming as a Cognitive Activity, in: Human Interaction with Computers, Smith, H.T., and Green , T.R.G. (eds.), London: Academic Press, 1980. Gregg, L.W., Perceptual Structures and Semantic Relations, In: Gregg, L.W. (ed.), Knowledge and Cognition, Lawrence Erlbaum Associates, 1974, pp. 1-16. -215-Grover, V., and Goslar, M M Information Technologies for the 1990s: The Executives View, Technical Correspondence, Communications of the ACM, Vol. 36, No. 3, March 1993. Guelin, B., and Matthews, A., The Effects of Semantic Complexity on Expert and Novice Computer Program Recall and Comprehension, The Journal of General Psychology, Vol. 117, No. 4, Oct. 1990, pp. 379-389. Guha R. V., and Lenat, D. B., Cyc: A Midterm Report, Al Magazine, Fall, 1990, pp. 33-59. Gulden, G.K., and Ewers, D.E., Is your ESS Meeting the Need? ComputerWorld, Vol. 23, No. 28, 1989, pp. 85-91. Hall, J.F., Learning and Memory, Second Edition, Allyn and Bacon, 1989. Halpin, T. Conceptual Schema & Relational Database Design, Second Edition, Prentice Hall, Australia, 1995. Hammer, M., Champy, J., Reengineering The Corporation: A Manifesto for Business Revolution, Harper Business, 1993. Hawryszkiewycz, L.T., Introduction to Systems Analysis and Design, Second Edition, Prentice Hall, 1991. Hayes, J.R., Three problems in Teaching General Skills, in: Segal, J., Chipman, S., and Glaser, R., (eds.), Thinking and Learning, Vol. 2, Hillsdale, NJ:Erlbaum, 1985. Hehner, E.C.R., The Logic Programming, Hemel Hempstead, UK: Prentice-Hall, 1993. Herrmann, D.J., Chaffin, R., and Winston, M.E., "Robins are a part of birds": The Confusion of Semantic Relations, Bulletin of the Psychonomic Society, Vol. 24, No. 6, 1986, pp. 413-415. Hirschheim, R., and Klein, H.K, Four Paradigms of Information Systems Development, Communications of the ACM, Vol. 32, No. 10, 1989. Hoare, C.A.R., Essays in Computing Science, ed. C.A.R. Hoare and C.B. Jones, Hemel Hempstead, UK: Prentice Hall, 1989. 
Hogarth, R.M., Judgment and Choice: The Psychology of Decision. Chichester: John Wiley & Sons, 1980. Holtzblatt, K., and Beyer, H.R., Requirement Gathering: The Human Factor, Communications of the ACM, Vol. 38, No. 5, May 1995, pp. 31-32. Hughes, J.G., Object-Oriented Databases, Prentice-Hall, UK, 1991. Hull, R., and King, R., Semantic Database Modeling: Survey, Applications, and Research Issues, ACM Computing Surveys, Vol. 10, No. 3, September 1987, pp. 201-260. Hunt, E.B., Frost, N.H., and Lunneborg, C , Individual Differences in Cognition: A New Approach to Intelligence, The Psychology of Learning and Motivation, 7, 1973, pp. 87-123. -216-Hutchings, A.F. , and Knox, S.T., Creating Products — Customers Demand, Communications of the ACM, Vol. 38, No. 5, May 1995, pp. 72 - 80. Hutchins, E.L., Hollan, J.D., and Norman, D.A., Direct Manipulation Interfaces, Human-Computer Interaction, Vol. 1, 1985, pp. 311-338. Jackson, M. A., System Development, Englewood Cliffs, NJ: Prentice-Hall, 1983. Jacobson, I., Ericsson, M., Jacobson, A., The Object Advantage — Business Process Reengineering with Object Technology, Addison Wesley, 1995. Jarvenpaa, S.L., and Machesky, J.J., Data Analysis and Learning: An Experimental Study of Data Modeling Tools, International Journal of Man-Machine Studies, 31, 1989, pp. 367-391. Jarvenpaa, S.L., Effect of Task Demand and Graphical Format on Information Processing Strategies, Management Science, Vol. 35, No. 3, 1989, pp. 285-303. Jayaratna, N., Understanding and Evaluating Methodologies, NIMSAD: A Systemic Framework, McGraw-Hill, Maidenhead, 1994. Jeffries, R., Yurner, A., Polosn, P., and Atwood, M., The Processes Involve in Designing Software, in: Cognitive Skills and Their Acquisition, J. Anderson (ed.), Hillsdale, NJ: Erlbaum, 1981, pp. 255-283. Jih, K. 
W.J., Bradbard, D.A., Snyder, C.A., and Thompson G.A., The Effects of Relational and Entity-Relationship Data Models on Query Performance of End Users, International Journal of Man-Machine Studies, Vol. 31, 1989, pp. 257-267. Johannessen, J.A., The Cognitive Authority of Information: Information Science, the Theory of Science and Ethics, in: Olaisen, J., Munch-Petersen, E., and Wilson, P. (eds.), Information Science: From the Development of the Discipline to Social Interaction, Oslo: Scandinavian University Press, 1996, pp. 113-134. Johannesson, P., and Kalman, K., A Method for Translating Relational Schemas into Conceptual Schemas, in: F.H. Lochovsky (ed.) Entity-Relationship Approach to Database Design and Querying, 1990, pp. 271-285 Jones, C , Programming Productivity, New York: McGraw-Hill, 1986. Juhn, S.H., and Naumann, J.D., The Effectiveness of Data Representation Characteristics on User Validation, Proceedings of the Sixth International Conference on Information Systems, 1985, pp. 212-226. Kahneman D., Slovic P. and Tversky A. (eds.). Judgment Under Uncertainty: Heuristics and Biases. Cambridge: Cambridge University Press, 1982. Kangassalo, H , Foundations of Conceptual Modeling: A Theory Construction View, In: Information Modeling and Knowledge Bases, Amsterdam, IOS Press, 1990, pp. 20-29. Kendall, K.E., and Kendall, J.E., Systems Analysis and Design, Third Edition, Prentice-Hall, 1995. -217-Kendall, P.A., Introduction to Systems Analysis & Design: A Structured Approach, Third Edition, Irwin, 1996. Kent, W., Fact-based Data Analysis and Design, in Entity-Relationship Approach to Software Engineering, ed. C.G. Davis, S. Jajodia, P. Ng, and R. Yeh, Amsterdam: North-Holland, 1983, pp. 3-52. Kent, W., The Leading Edge of Database Technology, in Entity-Relationship Approach to Database Design and Querying, F.H., Lochovsky (ed.), Elsevier Science Publishers, 1990, pp. 3-7. 
Keppel, G., Design and Analysis: A Researcher's Handbook, Second Edition, Prentice-Hall, 1982. Khoshafian, S. Object-Oriented Databases, John Wiley & Sons, 1993. Kilov, H., and Ross, J., Information Modeling — An Object-Oriented Approach, Prentice-Hall, 1994. Kim, W., Introduction to Object-Oriented Databases, The MIT Press, 1990. Kim, W., Object-Oriented Databases: Definition and Research Directions, IEEE Transactions on Knowledge and Data Engineering, Vol. 2, No. 3, September 1990, pp. 327-341. Kim, Y.G., and March, ST., Comparing Data Modeling Formalisms, Communications of the ACM, Vol. 38, No. 6, June 1995, pp. 103-115. Kim, Y.G., Effects of Conceptual Data Modeling Formalisms on User Validation and Analyst Modeling of Information Requirements, Unpublished Dissertation, University of Minnesota, 1990. King, R., and McLeod, D., A Unified Model and Methodology for Conceptual Database Design, in: On Conceptual Modeling: Perspectives from Artificial Intelligence, Databases, and Programming Languages, Brodie, M.L. Mylopoulos, J., Schmidt, J.W., (eds.), Springer-Verlag, New York, 1984, pp. 313-327. Klatzky, R.L., Human Memory: Structures and Processes, Second Edition, Freeman, 1980. Klein, H.K., Lyytinen, K., The Poverty of Scientism in Information Systems, in: Mumford, E. et al. (eds.), Research Methods in Information Systems, North Holland, Amsterdam, 1983. Kornatzky, Y., and Shoval, P., Conceptual Design of Object-Oriented Database Schemas using the Binary-Relationship Model, Data & Knowledge Engineering, Vol. 14, 1994, pp. 265-288. Kroenke, D.M., Database Processing: Fundamentals, Design, and Implementation, Fifth Edition, Prentice Hall, 1995. Kung, C.H., and Solvberg, A., Activity Modeling and Behavior Modeling, In: T.W. Olle, H.G. Sol, and A.A. Verrijn-Staut (eds.), Information Systems Design Methodologies: Improving the Practice, Amsterdam, North-Holland, IFIP, 1986, pp. 145-171. -218-Laamanen, M.T., The I DEF Standards, in: Verrijn-Stuart, A.A. and Olle, T.W. 
(eds.), Methods and Associated Tools for the Information Systems Life Cycle, Elsevier Science B.V., North-Holland, 1994, pp. 121-130. LaBerge, D., and Samuels, S.J., Toward a Theory of Automatic Information Processing in Reading, Cognitive Psychology, 6, 1974, pp. 293-323. Lanham, F.W., Stewart, M.M., and Zimmer, K., Business English and Communication, McGraw-Hill, 1977. Larkin, J.H., and Simon, H.A., Why a Diagram is (Sometimes) Worth Ten Thousand Words, Cognitive Science, 11, 1987, pp. 65-99. Lewis, T., The Big Software Chill, IEEE Computer, March, 1996, pp. 12-14. Lientz, B.P., and Swanson, E.B., Software Maintenance Management, Addison-Wesley. Reading, Mass., 1980. Loftus, E.F., Eyewitness Testimony, Cambridge, Massachusetts, Harvard University Press, 1979. Logan, G.D., Toward an Instance Theory of Automatization, Psychological Review, Vol. 95, No. 4, 1988, pp. 492-527. Lucas, H.C., Why Information Systems Fail, Columbia University Press, 1975. Luft, J., Of Human Interaction, Palo Alto: National Press Books, 1969. Lundberg, M, Goldkuhl, G., and Nillson, A., Information Systems Development: A Systematic Approach, Englewood Cliffs, NJ: Prentice Hall Inc., 1981. Maddison, R.N., Information Systems Methodologies, Wiley Heyden, Chichester, 1983. Manola, F„ X3H7 Object Model Features Matrix, X3H7-93-007v6, February 23, 1994. Martin , J., and Odell, J.J., Object-Oriented Analysis and Design, Prentice Hall, 1992. Martin, M.P, Analysis and Design of Business Information Systems, Second Edition, Prentice Hall, 1995. Mayer, R.E., A Psychology of Learning BASIC Computer Programming: Transactions, Prestatements, and Chunks, Communications of the ACM, 22, 1979, pp. 589-593. Mayer, R.E., From Novice to Expert, in: Handbook of Human-Computer Interaction, M. Helender (ed.), 1988, pp. 569-580. Mayer, R.E., The Promise of Cognitive Psychology, San Francisco: Freeman, 1981. Mayer, R.E., Thinking, Problem Solving, Cognition, Second Edition, Freeman, 1992. 
McCormick, J.J., CIOs reassess Priorities, Information Week, December 16, 1991, p. 13. -219-McFarlan, F. W., Information Technology Changes the Way You Compete, HBR May-June 1984, p. 98 McGregor J.D., and Sykes, D.A., Object-Oriented Software Development: Engineering Software for Reuse, Van Nostrand Reinhold, 1992. Mckeithen, K.B., Reitman, J.S., Rueter, H.H., and Hirtle, S.C., Knowledge and Organization Skill Differences in Computer Programmers, Cognitive Psychology, 13, 1981, pp. 307-325. McNamara, T.P., and Miller, D.L., Attributes of Theories of Meaning, Psychological Bulletin, Vol. 106, No. 3, 1989, pp. 355-376. Means, B., and Roth, C , Some Outcomes of a Cognitive Analysis of Troubleshooting, American Psychological Association Convention, Atlanta, GA, 1988. Mendes, K.S., Structured Systems Analysis: A Technique to Define Business Requirement, Sloan Management Review, Summer 1980, pp. 51-63. Meyer, B., Gurus Share Insights on Objects, IEEE Computer, June 1996, pp. 95-98. Miller, G., The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information, Psychological Review, 63, 1956, pp. 81-97. Miller, J., Conceptual Models for Determining Information Requirements, Proceedings AFIPS 1964 Spring Joint Computer Conference, Vol. 25, Spartan Books Inc., 1964, pp. 609-620. Minsky, M., The Society of Mind, Simon & Schuster, 1986. Mook, D.G., In Defense of External Invalidity, American Psychologist, April 1983, pp. 279-387. Moore, G. C , and Benbasat, I., Development of an Instrument to Measure the Perceptions of Adopting an Information Technology Innovation, Information Systems Research, Vol. 2, No. 3, 1991, pp. 192-222. Moray, N., The Usefulness of Experimental Psychology, in: Psychology in the 1990's, Lagerspetz, K., and Niemi, P. (eds.), North Holland, 1984, pp. 225-235. Murdock, B.B., Jr., The retention of Individual Items, Journal of Experimental Psychology, 62, 1961, pp. 618-625. Mylopoulos, J. 
Conceptual Modeling and Telos, In: P. Loucopoulos and R. Zicari (eds.), Conceptual Modeling, Databases and Case, New York, John Wiley & Sons, 1992, pp. 49-68. Mynatt, B.T., Software Engineering with Student Project Guidance, Prentice-Hall, 1990. Nachouki, L., Chastang, M., and Briand, H., From Entity-Relationship Diagram to an Object-Oriented Database, International Conference on ER Approach, 1992, pp. 459-481. Narashimhan, B., Navathe, S., and Jayaraman, S., On Mapping ER and Relational Models onto OO Schemas, International Conference on ER Approach, 1993, pp. 402-413. -220-Navon, D., "Resources - A Theoretical Soapstone ?", Psychological Review, 91, 1984, pp. 216-234. Neches, R., Fikes, R., Finin, T., Gruber, T., Patil, R., Senator, T., and Swartout, W.R., Enabling Technology for Knowledge Sharing, Al Magazine, Fall, 1991, pp. 37-56. Newell, A., and Simon, H.A., Human Problem Solving, Englewood Cliffs, NJ: Prentice Hall, 1972. Newell, A., Unified Theories of Cognition, Cambridge, MA: Harvard University Press, 1990. Niederman, F., Brancheau, J.C., and Wetherbe, J.C., Information Systems Management Issues for the 1990's, MIS Quarterly, December 1991, pp. 475-500. Nisbett, R.E. and Ross, L., Human Inference: Strategies and Shortcomings of Social Judgment. Englewood Cliffs, NJ: Prentice Hall, 1980. Norman, D.A., Cognitive Engineering, in: User-centred System Design: New Perspectives on Human-Computer Interaction, D.A. Norman and S. Draper (eds.), Hillsdale, New Jersey, 1986. Norman, R.J., Object-Oriented Systems Analysis and Design, Prentice Hall, 1996. Norusis, M.J., SPSS: SPSS 6.1 Guide to Data Analysis, Prentice Hall, 1995. Oei, J.L.H., van Hemmen, L.J.G.T, Falkenberg, E.D., and Brinkkemper, S., The Meta Model Hierarchy: A Framework for Information Systems Concepts and Techniques, Katholieke Universiteit Nijmegen, Technical Report No. 92-17, July 1992. 
Olaisen, J., Information, Cognitive Authority and Organizational Learning, in: Olaisen, J., Munch-Petersen, E., and Wilson, P. (eds.), Information Science: From the Development of the Discipline to Social Interaction, Oslo: Scandinavian University Press, 1996, pp. 7-20. Olive, I., and Langford, H , Myths of Demons and Users, in: Information Analysis: Selected Readings, Galliers R. (ed.), Addison-Wesley, 1987, pp. 113-124. Olle, T.W., Hagelstein, J., MacDonald, I.G., Rolland, C , Sol, H.G., Van Assche, F.J.M., and Verrijn-Stuart, A.A., Information Systems Methodologies: A Framework for Understanding, Addison-Wesley, 1988. Olle, T.W., Sol., H.G., and Verrijn-Stuart, Information Systems Design Methodologies: A Comparative Review, Proceedings of the CRIS 82 Conference, North-Holland, Amsterdam, 1982. OODBTG, General Characteristics of Object Models, section 3.2 in: Kent, B., Otis, A., Thompson, C , (eds.), Object Data Management Reference Model, section 3 in: Fong, E., Kent, W., Moore, K., and Thompson, C , (eds.), X3/SPARC/DBSSG/OODBTG Final Report, September 1991. Page-Jones, M., The Practical Guide to Structured Systems Design, Second Edition, Prentice Hall, 1988. -221 -Parnas, D.L., On the Criteria to be used in Decomposing Systems into Modules, Communications of the ACM, Vol. 15, No. 12, 1972, pp. 1053-1058. Patel, V.L., and Grown, G.J., The General and Specific Nature of Medical Expertise: A Critical Look, in: Ericsson, K.A., & Smith, J. (eds.), Towards A General Theory of Expertise, New York: Cambridge University Press, 1991, pp. 93-125. Peckham, J., and Maryanski, F., Semantic Data Models, ACM Computing Surveys, Vol. 20, No. 3, September 1988. Porter, M.E., and V.E. Millar, How Information Gives You Competitive Advantage, Harvard Business Review, July-August 1985, pp. 149-160 Prietula, M.J., and March, S.T., Form and Substance in Physical Database Design: An Empirical Study, Information Systems Research, Vol. 2, No. 4, December 1991, pp. 287-314. 
Ramamoorthy, C.V., Prakash, A., Tsai, W., and Usuda, Y., Software Engineering: Problems and Perspectives, Computer, Vol. 17, No. 10, Oct. 1984, pp. 191-209. Reisner, P., Human Factors Studies of Database Query Languages: A Survey and Assessment, Computing Surveys, Vol. 13, No. 1, March 1981, pp. 13-31. Rolland, C , and Cauvet, C , Trends and Perspectives in Conceptual Modeling, In: P. Loucopoulos and R. Zicari (eds.), Conceptual Modeling, Databases and Case, New York, John Wiley & Sons, 1992, pp. 27-32. Rosch, E.C., Cognitive Representations of Semantic Categories, Journal of Experimental Psychology: General, Vol. 4, 1975, pp. 192-206. Ross, D.T., and Schoman, K.E., Structured Analysis for Requirements Definition, IEEE Transactions on Software Engineering, SE-3, 1, January 1977, pp. 1-15. Ross, D.T., Applications and Extensions of SADT, Computer, 18, 4, April 1985, pp. 25-35. Ross, D.T., Structured Analysis (SA): A Language for Communicating Ideas, IEEE Transactions on Software Engineering, SE-3, 1, January 1977, pp. 16-34. Rumbaugh, J., Blaha, M., Premerlani, W., Eddy, E , and Lorensen, W., Object-Oriented Modeling and Design, Prentice Hall, Englewood Cliffs, NJ, 1991. Rush, G., A Fast Way to Define System Requirements, Computerworld, Vol. XIX, 40, October 7, 1985. SAS User's Guide: Statistics, Version 5 Edition, SAS Institute Inc., 1985. Satzinger, J.W., and Orvik, T.U., The Object-Oriented Approach: Concepts Modeling and System Development, Boyd & Fraser, 1996. Sernadas, C , Fiadeiro, J., Meersman, R., and Sernadas, A., Proof-Theoretic Conceptual Modeling: The NIAM Case Study, In: E.D. Falkenberg and P. Lindgreen (eds.), -222-Information System Concepts: An In-depth Analysis, Elsevier Science Publishers B.V., North-Holland, 1989, pp. 1-30. Shaft, T. M., and Vessey, I., The Relevance of Application Domain Knowledge: The Case of Computer Program Comprehension, Information Systems Research, Vol. 6, No. 3, 1995, pp. 286-299. 
Shelton, R, From the Editor, Hotline on Object-Oriented Technology, November 1992, p.2. Shlaer, S., Mellor, S.J., Object Lifecycles — Modeling the World in States, Yourdon Press, 1992. Shlaer, S., Mellor, S.J., Object-Oriented Systems Analysis — Modeling the World in Data, Yourdon Press, 1988. Shneiderman, B., and Mayer, R.E., Syntatic / Semantic Interactions in Programming Behavior A Model and Experimental Results, 1979. Shoval, P., and Frumermann, I., OO and EER Conceptual Schemas: A Comparison of User Comprehension, Journal of Database Management, Vol. 5, No. 4, Fall 1994, pp. 28-38. Shoval, P., and Shiran, S., Entity-Relationship and Object-Oriented Data Modeling — An Experimental Comparison of Design Quality, Data & Knowledge Engineering, 1996 (forthcoming). Shoval, P., Experimental Comparisons of Entity-Relationship and Object-Oriented Data Models, in: K Siau and Y. Wand (eds.), Workshop on Evaluation of Modeling Methods in Systems Analysis and Design, Crete, Greece, May 1996, pp. L l - L l l . Siau, K., Chan, H.C., and Tan, K.P, Visual Knowledge Query Language, IEICE Transactions on Information and Systems, Volume E75-D, No 5, 1992, pp. 697 - 703. Siau, K., Chan, H.C., and Tan, K.P., Visual Knowledge Query Language as a Front-end to Relational Systems, The Fifteenth Annual International Computer Software & Applications Conference (COMPSAC 91) 1991, pp. 373 - 378. Siau, K., Chan, H.C., and Wei, K.K., The Effects of Conceptual and Logical Interfaces on Visual Query Performance of End Users, Sixteenth Annual International Conference on Information Systems (ICIS'95), Amsterdam, The Netherlands, December 1995, pp. 225 -236. Siau, K., Wand, Y., and Benbasat, I., A Psychological Study on the Use of Relationship Concept ~ Some Preliminary Findings, Lecture Notes in Computer Science — Advanced Information Systems Engineering, Vol. 932, J. Iivari, K. Lyytinen, M. Rossi (eds.), 1995, Springer-Verlag, pp. 341-354. 
Siau, K., Wand, Y., and Benbasat, I., Evaluating Information Modeling Methods — A Cognitive Perspective, in: K Siau and Y. Wand (eds.), Workshop on Evaluation of Modeling Methods in Systems Analysis and Design, Crete, Greece, May 1996, pp. M l -M13. -223-Siau, K., Wand, Y., and Benbasat, I., When Parents Need Not Have Children — Cognitive Biases in Information Modeling, Lecture Notes in Computer Science — Advanced Information Systems Engineering, Vol. 1080, P. Constantopoulos, J. Mylopoulos, and Y. Vassiliou (eds.), 1996a, Springer-Verlag, pp. 402-420. Siegel, S., and Castellan, N.J., Nonparametric Statistics for the Behavioral Sciences, Second Edition, McGraw-Hill, 1988. Simon, H.A., and Craig, A.K., Foundations of Cognitive Science, in: Posner, M.I. (ed.), Foundations of Cognitive Science, 1989, pp. 1-47. Simon, H.A., How Big is a Chunk ?, Science, 183, 1974, pp. 482-488. Simon, H.A., Models of Thought, New Haven, CT:Yale University Press, 1979. Simon, H.A., On the Forms of Mental Representation, in: Minnesota Studies in the Philosophy of Science, Vol. IX: Perception and Cognition: Issues in the Foundations of Psychology, Savage, C.W. (ed.), Minneapolis: University of Minnesota Press, 1978. Simsion, G., Data Modeling Essentials: Analysis, Design, and Innovation, Thomson Computer Press, 1994. Smith, E.E., Theories of Semantic Memory, In: W.K. Estes (ed.), Handbook of Learning and Cognitive Processes, Vol. 6, Hillsdale, NJ: Erlbaum, 1978, pp. 1-56. Spencer R. , Teorey T., and Hevia E., ER Standard Proposal, Entity-Relationship Approach: The Core of Conceptual Modeling, H. Kangassalo (ed.), Elsevier Science Publishers, 1991, pp. 425-432. Stacy, W., and MacMillian, J., Cognitive Bias in Software Engineering, Communications of the ACM, Vol. 38, No. 6, June 1995, pp. 57-74. Stacy, W., Cognition and Software Development, Communications of the ACM, Vol. 38, No. 6, June 1995, pp. 31. 
Stillings, N.A., Feinstein, M.H., Garfield, J.L., Rissland, E.L., Rosenbaum, D.A., Weisler, S.E., and Ward, L.B., Cognitive Science: An Introduction, MIT Press, 1987, pp. 142-157. Sully, P., Modeling the World with Objects, Prentice Hall, 1993. Synder, A., The Essence of Objects: Concepts and Terms, IEEE Software, January 1993, pp. 31-42. Tan, J.K.H., and Benbasat, I., Processing of Graphical Information: A Decomposition Taxonomy to Match Data Extraction Tasks and Graphical Representations, Information Systems Research, Vol. 1, No. 4, 1990, pp. 416-439. Tapson, D., Caston, A., Paradigm Shift, McGraw-Hill, USA, 1993. Taylor, S.E., and Thompson, S.C., Stalking the Elusive "Vividness" Effect, Psychological Review, 89, 1982, pp. 155-181. -224-Teichroew, D., and Hershey, E.A., PSL/PSA: A Computer-aided Technique for Structured Documentation and Analysis of Information Processing Systems, IEEE Transactions on Software Engineering, SE3-1, 1, January 1977, pp. 41-48. Teorey, T.J., Database Modeling and Design — The Entity-Relationship Approach, Morgan Kaufmann Publishers, 1990. Teorey, T.J., Yang, D, and Fry, J.P., A Logical Design Methodology for Relational Databases Using the Extended Entity-Relationship Model, ACM Computing Survey, Vol. 18, No. 2, June 1986. Treisman, A., Features and Objects in Visual Processing, Scientific American, Vol. 255, No. 5, Nov. 1986, pp. 114-125. Tulving, E., Elements of Episodic Memory, New York: Oxford University Press, 1983. Tulving, E., Episodic and Semantic Memory, in: Organization of Memory, Tulving, E., and Donaldson, W. (eds.), New York: Academic Press, 1972, pp. 381-403. Tversky, A. and Kahneman, D., Judgment Under Uncertainty: Heuristic and Biases, in: Judgment under Uncertainty, Cambridge University Press, 1982, pp. 3-20. Tversky, A. and Kahneman, D., Judgment under Uncertainty: Heuristics and Biases. Science, Vol. 185, 1974, pp. 1124-1131. Tversky, A. and Kahneman, D., Rational Choice and the Framing of Decision. 
The Journal of Business Vol. 59, No. 2, 1986, pp. S251-S278. Tversky, B. and Hemenway, K., Objects, parts and categories, Journal of Experimental Psychology: General, 113, 1984, pp. 170-197. Ullman, J.D., Principles of Database and Knowledge-base Systems Volume I, Computer Science Press, 1988. Valusek, J.R., and Fryback, D.G., Information Requirements Determination: Obstacles Within, Among and Between Participants, in: Information Analysis: Selected Readings, Galliers R. (ed.), Addison-Wesley, 1987, pp. 139-152. Vessey, I., and Conger, S.A., Learning to Specify Information Requirements: The Relationship Between Applications and Methodology, Journal of Management Information Systems, Vol. 10, No. 2, 1993, pp. 177-201. Vessey, I., and Sravanapudi, A.P., CASE Tools as Collaborative Support Technologies, Communications of the ACM, Vol. 38, No. 1, 1995, pp. 83-95. Vessey, I., Expert-Novice Knowledge Organization: An Empirical Investigation using Computer Program Recall, Behavior and Information Technology, Vol. 7, No. 2, 1988, pp. 153-171. Vossen, G., Data Models, Database Languages and Database Management Systems, Addison-Wesley, UK, 1991. Walsham, G., Interpreting Information Systems in Organizations, John Wiley & Sons, New York, 1993. - 225 -Wand, Y., A Proposal for a Formal Model of Objects, In: Object-Oriented Concepts, Databases, and Applications, Lochovsky, F. and Kim, W. (eds.), ACM Press, Addison-Wesley, Reading, Mass., 1989, pp. 537-559. Wand, Y., and Weber, R., An Ontological Analysis of Some Fundamental Information Systems Concepts, Proceedings of the Ninth International Conference on Information Systems, December 1988, pp. 213-225. Wand, Y., and Weber, R., An Ontological Evaluation of Systems Analysis and Design Methods, Proceedings of the IFIP WG 8.1 Working Conference on Information Systems Concepts: An In-Depth Analysis, Namur, Belgium, October 1989, pp. 79-107. 
Wand, Y., and Weber, R., An Ontological Model of an Information System, IEEE Transactions on Software Engineering, Vol. 16, No. 11, 1990, pp. 1282-1292. Wand, Y., and Weber, R., On the Ontological Expressiveness of Information Systems Analysis and Design Grammars, Journal of Information Systems, Vol. 3, 1993, pp. 217-237. Wand, Y., and Weber, R., Toward a Theory of the Deep Structure of Information Systems, Proceedings of the Eleventh International Conference on Information Systems, December 1990, pp. 61-71. Wand, Y., and Woo, C , Object-Oriented Analysis - Is it Really that Simple?, Proceedings of the Workshop on Information Technologies and Systems, Orlando, December 1993. Wand, Y., Storey V.C. and Weber, R., Analyzing the Meaning of a Relationship, Working Paper 92-MIS-011, Faculty of Commerce and Business Administration, The University of British Columbia, February, 1993. Wand, Y., Storey, V., and Weber, R., Analyzing the Meaning of a Relationship, Working Paper 92-MIS-011, University of British Columbia, February 1993. Wasserman, A.I., Information System Design Methodology, Journal of the American Society for Information Science, Vol. 31, No. 1, Jan 1980. Weber, R., and Zhang, Y., An Ontological Evaluation of NIAM's Grammar for Conceptual Schema Diagrams, Proceedings of the Eleventh International Conference on Information Systems, December 1991, pp. 75-82. Wegner, P., Dimensions of Object-oriented Modeling, IEEE Computer, October 1992, pp. 12-20. Wetherbe, J.C., Executive Information Systems: Getting it Right, MIS Quarterly, Vol. 15, No. 1, 1991, pp. 51-65. Wiedenbeck, S., Novice/Expert Differences in Programming Skills, International Journal of Man-Machine Studies, Vol. 23, 1985, pp. 383-390. Willumsen, Geir, Conceptual Modeling in IS Engineering, In: Executable Conceptual Models in Information Systems Engineering, Trondheim, Nov. 1993, pp. 11-21. 
-226-Winston, M.E., Chaffin, R.J.S., and Herrmann, D.J., A Taxonomy of Part-Whole Relations, Cognitive Science, Vol. 11, 1987, pp. 417-444. Wood, R.E., Task Complexity: Definition of the Construct, Organizational Behavior and Human Decision Processes, 37, 1986, pp. 60-82. X3H7-93-007v9, X3H7 Object Model Features Matrix, Frank Manola (ed.), October 10, 1994. Yadav, S.B., Bravoco, R.R., Chatfield, A.T., and Rajkumar, T.M., Comparison of Analysis Techniques for Information Requirement Determination, Communications of the ACM, Vol. 31, No. 9, September 1988, pp. 1090-1097. Yourdon, E., Modern Structured Analysis, Prentice Hall, 1989. Yourdon, E., Object-Oriented Systems Design: An Integrated Approach, Englewood Cliffs NJ: Yourdon Press/Prentice Hall, 1994. Yourdon, E., Whitehead, K., Thomann, J., Oppel, K., and Nevermann, P., Mainstream Objects: An Analysis and Design Approach for Business, Prentice Hall, 1995. -227-APPENDIX A The University of British Columbia Faculty of Commerce and Business Administration Conceptual Modeling Study — Consent Form February 1993 We are conducting research on the effectiveness of certain conceptual models. Your participation will help us to advance knowledge about conceptual modeling. We are seeking your consent to use this questionnaire as part of our research data. All data gathered will be kept confidential. When the data are being reported in research papers, they will be aggregated and hence cannot be associated with individuals. Your consent is completely voluntary. If you have any questions now or later, please call Prof. Yair Wand at 822-8395 or Keng Siau at 221-1290. Yair Wand Izak Benbasat Keng Siau I received a copy of this consent form and understand the information provided. I understand that participation is completely voluntary and agree to participate in this experiment. I agree that my questionnaire and data provided in this experiment be used in the study and any future research. 
Name (Please print) Signature Date -228-The University of British Columbia Faculty of Commerce and Business Administration Modeling Experiment February 1993 The conceptual level is concerned with the fundamental information needs of an organization quite independent of how data is collected, stored, or used. This experiment is designed to enable us to learn more about conceptual modeling. All information provided in this experiment will be kept confidential. Please fill in the following information: 1. Faculty/Department: 2. Position: Undergraduate / MBA / M.Sc. / Ph.D. / Staff / Faculty 3. If student, year of study: 1 / 2 / 3 / 4 / 5 4. Sex: M / F 5. Age: < 20 / 20-25 / 26-30 / 30-40 / > 40 6. Rate the following modeling constructs according to your familiarity with them. Place a number (1 to 5) in the space to the left of the constructs named below, with 1 meaning that the constructs are totally unfamiliar to you, and 5 meaning that you are very familiar with the constructs. You can use the same number more than once. (1 — Totally unfamiliar, 5 — Very familiar) Entities Relationships Attributes Cardinalities Roles -229-Rate the following techniques according to your familiarity with them. Place a number (1 to 5) in the space to the left of the techniques named below, with 1 meaning that the techniques are totally unfamiliar to you, and 5 meaning that you are very familiar with the techniques. You can use the same number more than once. (1 — Totally unfamiliar, 5 — Very familiar) Data Flow Diagram (DFD) Structure Chart Binary Data Model Entity Relationship Model (ER) Object-Oriented Model (OO) Higher-Order System (HOS) Warnier-Orr Diagram Michael Jackson Model Flowchart Nijssen's Information Analysis Method (NIAM) Ontological Model Others (Please specify: ) -230-APPENDIX B Some Basic Modeling Concepts An entity is a "thing" in the real world with an independent existence. 
An entity may be an object with a physical existence — a particular person, car, house, or employee — or it may be an object with a conceptual existence — a company, a job, a bank account or a university course. An entity type is represented as a rectangular box. A relationship is an association among entities. For example, Ownership is a relationship between a student and a book. Being employed is a relationship between an employer and an employee. A relationship type is represented as a diamond. The structural constraints associate a pair of integer numbers (min, max) with each participation of an entity type in a relationship type, where 0 <= min <= max and max >= 1. The numbers mean that each entity must participate in at least min and at most max relationship instances in the relationship type at all times. -231-Symbol Meaning Entity Type Relationship Type Structural Constraint (min, max) on participation of E in R -232-Examples Symbol Meaning Department Department Entity Type Manages Manages Relationship Type Department (1,5) Manages (1,3) Project The structural constraint (1,5) means that each Department entity must participate in at least 1 and at most 5 relationship instances in the Manages relationship type. In other words, each department must manage at least 1 and at most 5 projects. Department (2,*) Manages (1,3) Project The structural constraint (2,*) means that each Department entity must participate in at least 2 relationship instances in the Manages relationship type. In this case, * can be any number >= 2. In other words, each department must manage at least 2 projects. Department (0,2) Manages (1,3) Project The structural constraint (0,2) means that each Department entity may or may not participate in the Manages relationship type. Furthermore, each Department entity can participate only up to 2 relationship instances in the Manages relationship type. In other words, not every department manages projects. 
If a department manages projects, it can manage up to a maximum of 2 projects. -233-APPENDIX C Question Format Employee Assigns Department Circle one of the following two options that more correctly reflects the participation of the Employee entity type in the Assigns relationship type: 1. must assign 2. may assign What is the confidence level of your above choice? No confidence 1 2 3 4 5 6 7 Absolute confidence What is your familiarity with the domain depicted in the diagram? Not familiar at all 1 2 3 4 5 6 7 Very familiar -234-Appendix C (Con't) Questions with Familiar Domain Employee Course Supplier Project Instructor Person -235-Appendix C (Con't) Unfamiliar Group Photon Source Matches Sound Source Nucleus Leads Dendrite -236-APPENDIX D Question Format Circle one of the following two options that more correctly reflects the participation of the Shareholder entity type in the Owns relationship type: 1. may own 2. must own What is your confidence level in the above choice? No confidence 1 2 3 4 5 6 7 Absolute confidence What is your familiarity with the domain depicted in the diagram? Not familiar at all 1 2 3 4 5 6 7 Very familiar -237-Appendix D (Con't) Conflicting Relationship Soldier Professor Landlady Parent Shareholder Bookstore Airlines Tour Agency -238-Appendix D (Con't) Non-Conflicting Relationship Doctor (0,*) Treats Patient Publisher (0,*) Publishes Journal Scientist (0,*) Teaches Course Student (1,*) Studies Institution Employee (1,1) Works Company Museum (1,*) Collects Artifact -239-APPENDIX E Reading Materials Introduction Systems analysis is an important step in information systems development. It involves identifying the requirements for the new system, and defining the logical specifications of the new system. Because of its importance, numerous methods have been proposed to help analysts in the process. One of the popular techniques is the entity-relationship model. 
The concepts in the entity-relationship model are designed to reflect users' perceptions of the real world and are not meant to describe the way in which data will be stored in the computer. We will introduce three of the entity-relationship (ER) concepts here — entity, relationship, and structural constraints. An entity is a "thing" which exists in the real world. An entity may be an object with a physical existence, such as a particular person, car, house, or employee. It may also be an object with a conceptual existence such as a company, a job, a bank account or a university course. An entity is represented by a rectangular box. Entity Symbols Meaning Department A Department entity -240-Relationship A relationship is an association that describes the interaction between the entities. For example, Teaches is a relationship between instructor and student. Purchases is a relationship between customer and item. A relationship is represented by a diamond. Meaning A Manages Relationship The following diagram contains two entities named Department and Project and a relationship Manages. The relationship Manages indicates that a department manages projects. -241-Structural Constraint A structural constraint is a pair of integer numbers (min, max) that indicates the minimum and maximum number of relationships an entity can participate in. For example, how many projects can a department manage? The numbers mean that each Department entity must participate in at least "min" and at most "max" Manages relationships. In other words, each Department must manage at least "min" projects and up to a maximum of "max" projects. At times, we use the symbol * for the "max" participation, which can represent any number greater than "min". 
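The (min, max) participation rule described above can be expressed as a small executable check. The following Python sketch is illustrative only (it is not part of the original experimental materials, and the function name and list representation are assumptions); it tests whether every entity's participation count falls within a given (min, max) structural constraint, with None playing the role of the * symbol.

```python
# Illustrative sketch (not part of the original ER materials): verifying a
# (min, max) structural constraint against observed relationship instances.

def satisfies_constraint(participation_counts, min_count, max_count=None):
    """Return True if every entity participates in at least min_count and,
    unless max_count is None (the '*' symbol), at most max_count
    relationship instances of the relationship type."""
    return all(
        count >= min_count and (max_count is None or count <= max_count)
        for count in participation_counts
    )

# Each number is how many Manages instances one Department participates in.
print(satisfies_constraint([1, 3, 5], 1, 5))   # (1,5) satisfied: True
print(satisfies_constraint([0, 2], 1, 5))      # one department manages no project: False
print(satisfies_constraint([2, 9], 2, None))   # (2,*) satisfied: True
```

Under this reading, (1,5) is a mandatory participation with an upper bound, while (0,2) would be checked with min_count=0 and max_count=2, so entities that do not participate at all are still acceptable.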
Symbol Department (min, max) Manages Note: 0 <= min <= max and max >= 1 Meaning Structural Constraint (min, max) on participation of Department entity in Manages relationship -242-Examples Example 1 Since we know that a hospital must have at least one nurse, we can represent it using the structural constraints (1,*) [i.e., the minimum participation is 1]. In this case, * can be any number greater than or equal to 1. Hospital (1,*) Has Nurse Example 2 Since we know that an apartment may have no balcony, we can represent it using the structural constraints (0,*) [i.e., the minimum participation is 0]. In this case, * can be any number greater than 0. Apartment (0,*) Has Balcony -243-APPENDIX F Question Format Professor In your opinion, which of the following options is more correct? 1. Professor must own at least one car 2. Professor may own no car What is the confidence level of your above choice? Zero confidence 1 2 3 4 5 6 7 Absolute confidence How familiar are you with the scenario depicted by the option you have selected? Not familiar at all 1 2 3 4 5 6 7 Very familiar In your opinion, which of the following options is more correct? 1. Professor must own at least one car 2. Professor may own no car What is the confidence level of your above choice? Zero confidence 1 2 3 4 5 6 7 Absolute confidence How familiar are you with the scenario depicted by the option you have selected? Not familiar at all 1 2 3 4 5 6 7 Very familiar -244-Appendix F (Con't) Conflicting Relationship Professor Bookstore Airline Adult Student House Owner Library Pharmacy Shareholder Apartment -245-Appendix F (Con't) Non-Conflicting Relationship -246-APPENDIX G This set consists of experimental materials used in the second formalized study. 
The materials are organized in the following sequence: (i) Instructions for Think-Aloud (ii) Examples provided to the subjects before Phase 1 — only one example per subject (depending on the treatment) (iii) Instruction for Phase 1 (iv) Question sets for Phase 1 — only one set per subject (depending on the treatment) (v) Reading Materials — only one set of reading materials per subject (vi) Instruction for Phase 2 (vii) Question sets for Phase 2 — only one set per subject (viii) Agreement to Confidentiality Form -247-Instructions for Think-Aloud In this study, we are interested in your running commentary on what is going through your mind while you work on the cases. Therefore, we will ask you to THINK ALOUD CONSTANTLY. What we mean by think-aloud is that we want you to reason in a loud voice, SAY OUT LOUD everything that passes through your mind for each step as you work on the cases. It does not matter if your sentences are not complete, since you are not explaining to anyone else. Just act as if you are alone in the room speaking to yourself loudly. It is most important that you keep talking. If you are silent for more than 10 seconds I will remind you to keep talking aloud. EXERCISE I Before turning to the real task, we will start with a couple of practice problems. Please talk aloud while you work on these problems. First, please add two numbers in your head, and say out aloud each step. Now TALK ALOUD while you calculate 476 + 688 = ? EXERCISE II Please multiply two numbers in your head, and say out aloud each step. Now TALK ALOUD while you calculate 24 X 36 = ? -248-Example The following diagram depicts the case that "factories produce vehicles", "factories maintain vehicles", "factories employ workers", and "workers own houses." Vehicle House -249-Example The following diagram depicts the case that "factories produce vehicles", "factories maintain vehicles", "factories employ workers", and "workers own houses." Maintainance 
Factory Vehicle Production Employment Worker -250-Example The following diagram depicts the case that "factories produce vehicles", "factories maintain vehicles", "factories employ workers", and "workers own houses." Maintain Factory Produce Vehicle Employ Worker Own House Example The following diagram depicts the case that "factories produce vehicles", "factories maintain vehicles", "factories employ workers", and "workers own houses." Maintainance Factory Vehicle Production Employment Worker Ownership House -252-Phase I This is the first phase of the experiment. In this phase, there will be one practice case (i.e., the first case) and six experimental cases. Your task is to continuously verbalize your thought process as you attempt to understand the cases. As a reminder, you should continuously verbalize your thought process. The structure of the sentences does not matter and incomplete sentences are fully acceptable. Also, do not miss out any component in the cases. Time is not a factor in this study. -253-Vehicle House Case 0 Pay Manufacturer Produce Customer Consume Goods Case 1 Insurance Corp. Case 2 Case 3 Case 4 Student Course Instruct Research Professor Sponsor Organization Consult Department Case 5 Case 6 Vehicle House Case 0 Payment Manufacturer Customer Consumption Goods Case 1 Insurance Corp. Case 2 Case 3 Student Activity Case 4 Organization Case 5 Drug Prescription Treatment Specification Pharmacy Test Laboratory Patient Attendance Recommendation Physician Hospital Insurance Case 6 Maintain Factory Produce Vehicle Employ Worker Own House Case 0 Pay Produce Manufacturer Customer Consume Goods Case 1 Own Car Customer Register Insure Insurance Corp. 
Province Case 2 Driver Assign Shipment Store Coordinate Allocate Truck Receive Warehouse Customer Serve Sales-Rep Case 3 Student Approve Participate Activity Organize Admit College Sponsor Regulate Club Case 4 Attach Student Participate Research Enroll Sponsor Supervise Course Instruct Organization Professor Consult Suggest Recruit Department Case 5 Drug Prescribe Treatment Dispense Pharmacy Specify Test Receive Patient Attend Recommend Laboratory Analyze Locate Hospital Cover Physician Insurance Case 6 Maintainance Factory Production Vehicle Employment Worker Ownership House Case 0 Payment Manufacturer Production Customer Consumption Goods Case 1 Ownership Car Insurance Insurance Corp. Customer Registration Province Case 2 Driver Allocation Assignment Shipment Truck Storage Receipt Coordination Warehouse Customer Service Sales-Rep Case 3 Student Approval Participation Admittance Activity Organization College Sponsorship Regulation Club Case 4 Attachment Student Participation Research Enrollment Sponsorship Supervision Course Instruction Organization Professor Suggestion Consultation Recruitment Department Case 5 Drug Prescription Treatment Dispensation Pharmacy Specification Test Receipt Patient Attendance Recommendation Laboratory Analysis Location Hospital Coverage Physician Insurance Case 6 Information Systems Analysis Study Reading Materials Introduction Systems analysis is an important step in information systems development. It involves identifying the requirements for the new system, and defining the specifications of the system. Numerous techniques have been proposed to help analysts in systems analysis. One of the popular techniques is the Entity-Relationship model. The concepts in the Entity-Relationship model are designed to reflect users' perceptions of the real world and are not meant to describe the way in which data will be stored in the computer. 
Two of the main constructs in Entity-Relationship (ER) modeling are the entity, and relationship. Entity An entity is a "thing" which exists in the real world. An entity may be an object with a physical existence, such as a particular person, car, house, or employee. It may also be an object with a conceptual existence such as a company, a job, a bank account or a university course. An entity is represented by a rectangular box. Symbols Meaning Department A Department entity House A House entity © Keng Siau & Yair Wand -282-Information Systems Analysis Study Relationship A relationship is an association that describes the interaction between entities. For example, "Instruct" is a relationship between instructors and students. "Own" is a relationship between car-owners and cars. A relationship is represented by a diamond. Symbol Meaning An Instruct Relationship A Manage Relationship © Keng Siau & Yair Wand -283-Information Systems Analysis Study Examples Example 1 The following diagram describes the case of "departments manage projects." It contains two entities named Department and Project and the relationship Manage. Note that the diagram could also be drawn as shown below. It is read as "projects are managed by departments." The two diagrams represent the same case and are equivalent. © Keng Siau & Yair Wand -284-Information Systems Analysis Study Example 2 The following diagram depicts the case of "instructors instruct students." It contains two entities named Instructor and Student, and the relationship Instruct. Instructor Instruct Student Note that the diagram could also be drawn as shown below. It is read as "students are instructed by instructors." The two diagrams represent the same case and are equivalent. Student Instruct Instructor © Keng Siau & Yair Wand -285-Information Systems Analysis Study Example 3 The following diagram depicts the case that "factories produce and maintain vehicles", "factories employ workers", and "workers own houses." 
Vehicle House © Keng Siau & Yair Wand -286-Information Systems Analysis Study Reading Materials Introduction Systems analysis is an important step in information systems development. It involves identifying the requirements for the new system, and defining the specifications of the system. Numerous techniques have been proposed to help analysts in systems analysis. One of the popular techniques is the Entity-Relationship model. The concepts in the Entity-Relationship model are designed to reflect users' perceptions of the real world and are not meant to describe the way in which data will be stored in the computer. Two of the main constructs in Entity-Relationship (ER) modeling are the entity, and relationship. Entity An entity is a "thing" which exists in the real world. An entity may be an object with a physical existence, such as a particular person, car, house, or employee. It may also be an object with a conceptual existence such as a company, a job, a bank account or a university course. An entity is represented by a rectangular box. Symbols Meaning Department A Department entity House A House entity © Keng Siau & Yair Wand -287-Information Systems Analysis Study Relationship A relationship is an association that describes the interaction between entities. For example, "Instruction" is a relationship between instructors and students. "Ownership" is a relationship between car-owners and cars. A relationship is represented by a diamond. Symbol Meaning An Instruction Relationship A Management Relationship © Keng Siau & Yair Wand -288-Information Systems Analysis Study Examples Example 1 The following diagram describes the case of "departments manage projects." It contains two entities named Department and Project and the relationship Management. Note that the diagram could also be drawn as shown below. It is read as "projects are managed by departments." The two diagrams represent the same case and are equivalent. 
© Keng Siau & Yair Wand -289-Information Systems Analysis Study Example 2 The following diagram depicts the case of "instructors instruct students." It contains two entities named Instructor and Student, and the relationship Instruction. Instructor Instruction Student Note that the diagram could also be drawn as shown below. It is read as "students are instructed by instructors." The two diagrams represent the same case and are equivalent. Student Instruction Instructor © Keng Siau & Yair Wand -290-Information Systems Analysis Study Example 3 The following diagram depicts the case that "factories produce and maintain vehicles", "factories employ workers", and "workers own houses." Vehicle House © Keng Siau & Yair Wand -291 -Information Systems Analysis Study Reading Materials Introduction Systems analysis is an important step in information systems development. It involves identifying the requirements for the new system, and defining the logical specifications of the new system. Numerous techniques have been proposed to help analysts in systems analysis. One of the popular techniques is the Object-Oriented model. The concepts in the Object-Oriented model are designed to reflect users' perceptions of the real world and are not meant to describe the way in which data will be stored in the computer. The main construct in Object-Oriented (OO) modeling is the object. Object An object is defined as a concept, abstraction, or thing which is meaningful for the problem at hand. An object may be physical or conceptual. For example, an object could have a physical existence, such as a particular person, car, house, or employee. An object may also have a conceptual existence such as a company, a job, a bank account or a university course. An object is represented by a rectangular box. 
Symbols Meaning Department A Department object Employ An Employ object © Keng Siau & Yair Wand - 292 -Information Systems Analysis Study Examples Example 1 The following diagram represents the case of "owners own vehicles." The three objects in the case are Owner, Own, and Vehicle Owner Own Vehicle Note that the diagram could also be drawn as shown below. It is read as "vehicles are owned by owners." The two diagrams represent the same case and are equivalent. Vehicle Own Owner © Keng Siau & Yair Wand 293 -Information Systems Analysis Study Example 2 The following diagram represents the case of "companies employ employees." The three objects in the case are Company, Employ, and Employee. Company Employ Employee Similarly, the same case could be represented as shown below. This is read as "employees are employed by companies." The two diagrams represent the same case and are equivalent. Employee Employ Company © Keng Siau & Yair Wand - 294 -Information Systems Analysis Study Example 3 The following diagram depicts the case that "factories produce and maintain vehicles", " factories employ workers", and "workers own houses". Maintain Factory Vehicle Produce Employ Worker Own House © Keng Siau & Yair Wand -295-Information Systems Analysis Study Reading Materials Introduction Systems analysis is an important step in information systems development. It involves identifying the requirements for the new system, and defining the logical specifications of the new system. Numerous techniques have been proposed to help analysts in systems analysis. One of the popular techniques is the Object-Oriented model. The concepts in the Object-Oriented model are designed to reflect users' perceptions of the real world and are not meant to describe the way in which data will be stored in the computer. The main construct in Object-Oriented (OO) modeling is the object. Object An object is defined as a concept, abstraction, or thing which is meaningful for the problem at hand. 
An object may be physical or conceptual. For example, an object could have a physical existence, such as a particular person, car, house, or employee. An object may also have a conceptual existence such as a company, a job, a bank account or a university course. An object is represented by a rectangular box. Symbols Department Meaning A Department object Employment An Employment object © Keng Siau & Yair Wand -296-Information Systems Analysis Study Examples Example 1 The following diagram represents the case of "owners own vehicles." The three objects in the case are Owner, Ownership, and Vehicle Owner Ownership Vehicle Note that the diagram could also be drawn as shown below. It is read as "vehicles are owned by owners." The two diagrams represent the same case and are equivalent. Vehicle Ownership Owner © Keng Siau & Yair Wand -297-Information Systems Analysis Study Example 2 The following diagram represents the case of "companies employ employees." The three objects in the case are Company, Employment, and Employee. Company Employment Employee Similarly, the same case could be represented as shown below. This is read as "employees are employed by companies." The two diagrams represent the same case and are equivalent. Employee Employment Company © Keng Siau & Yair Wand -298-Information Systems Analysis Study Example 3 The following diagram depicts the case that "factories produce and maintain vehicles", " factories employ workers", and "workers own houses". Maintainance Factory Vehicle Production Employment Worker Ownership House © Keng Siau & Yair Wand -299-P h a s e I I This is the second phase of the experiment. In this phase, there will be one practice case (i.e., the first case) and six experimental cases. Your task is to continuously verbalize your thought process as you attempt to understand the cases. As a reminder, you should continuously verbalize your thought process. 
The structure of the sentences does not matter and incomplete sentences are fully acceptable. Also, do not miss out any component in the cases. Time is not a factor in this study. -300-Vehicle House Case 0 Case 1 Juror Case 2 Case 3 Case 4 Own Publisher Book Topic-Area Write Collect Prof.-Society Researcher Institution Associate Case 5 Case 6 House Case 0 Ownership Account Client Case 1 Case 2 Case 3 Case 4 Publisher Book Writer Prof.-Society Researcher Classification Specialization Topic-Area Employment Institution Case 5 Employee Employment Requisition Part Machine Manufacturer Case 6 Maintain Factory Produce Vehicle Employ Worker Own House Case 0 Own Account Register Client Issue Transaction Case 1 Appoint Case Inquire Juror Judge Allocate Courtroom Case 2 Cooperate Airline Arrange Flight Assign Agency Organize Tour Reserve Airplane Case 3 Affiliate Library Branch Collect Subscribe Journal Contribute Member Volume Donate Duplicate Book Case 4 Publisher Own Publish Book Write Prof.-Society Affiliate Researcher Associate Classify Specialize Topic-Area Employ Collect Institution Case 5 Employee Operate Employ Assign Company Invest Project Generate Order Allocate Machine Maintain Request Part Manufacturer Construct Case 6 Maintainance Factory Production Vehicle Employment Worker Ownership House Case 0 Ownership Account Registration Client Issuance Transaction Case 1 Appointment Case Inquiry Judge Allocation Courtroom Case 2 Cooperation Airline Arrangement Flight Assignment Agency Organization Tour Reservation Airplane Case 3 Affiliation Library Contribution Member Branch 
Collection Volume Donation Subscription Journal Duplication Book Case 4 Publisher Ownership Publication Book Writer Prof.-Society Affiliation Researcher Classification Specialization Topic-Area Association Employment Collection Institution Case 5 Employee Assignment Employment Company Investment Operation Project Allocation Machine Generation Maintainance Order Requisition Manufacturer Part Construction Case 6 Information Systems Analysis Study Agreement to Confidentiality I understand that some of my classmates and friends may also be participating in this study. I realize that my discussion of the details of this study with them may distort the results. Therefore, I agree not to discuss with any other participants any aspect of the study. Signature of Participant Date Participant's Name: We thank you very much for your assistance in this study. If you are one of the top 20% participants, we will inform you at a later date. -329-APPENDIX H Coding Sheet Case 1 Subject Name: Set Number: 1 / 2 Cell Type: Explicit-Verb / Explicit-Noun / Implicit-Verb / Implicit-Noun Phase Number: 1 / 2 Chunk Right Interpretation Wrong Interpretation Missing 1 Attempt > 1 Attempts 1 Attempt > 1 Attempts Unable to form chunk Unable to understand chunk Unable to form chunk Unable to understand chunk Unable to form chunk Unable to understand chunk Customers Pay Manufacturers Manufacturers Produce Goods Customers Consume Goods Comments: 
