UBC Theses and Dissertations


Ontological and cognitive principles on information systems modelling Saghafi, Arash 2016


Ontological and Cognitive Principles on Information Systems Modelling

by

Arash Saghafi

BSc, Sharif University of Technology, 2008
MM, The University of British Columbia, 2010
MScB, The University of British Columbia, 2012

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES (Business Administration)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

November 2016

©Arash Saghafi, 2016

Abstract

Information systems are representations or models of real-world applications. Based on this premise, the success of an information system is contingent on how effectively and faithfully the representations are generated and interpreted by analysts and designers. Prior research has suggested using ontology, a branch of philosophy that deals with the order and structure of reality in the broadest sense, as guidance for the modelling process. It is expected that improving the ontological expressiveness of conceptual models will make them more faithful and effective representations of the real world. This thesis focuses on information models that are rooted in ontology and on users’ performance of cognitive tasks when using such models.

Following the three-study structure of doctoral theses, my first study synthesized the prior work that had empirically evaluated the impact of ontological guidance on users’ understanding of conceptual models. The analysis indicated a strong effect of ontological guidance on improving users’ understanding of the “conceptual domain models”, particularly for tasks that required a deeper level of understanding. This provides scientific evidence in favour of incorporating ontological guidance in the education and practice of systems analysis.
My second and third studies investigated a data modelling approach that is based on ontological principles, namely the instance-based paradigm, which is an alternative to the traditional data management method. Unlike the traditional approach, the instance-based paradigm requires neither a pre-defined structure imposed on the data nor central control and planning. Study 2 evaluated users’ performance in (the cognitive task of) information retrieval. It indicated that users of instance-based representations are able to formulate queries more accurately compared with users of class-based representations.

Study 3 broadened the scope and focused on knowledge discovery and exploration of information (that was not necessarily created for the intended application). Results of a laboratory experiment demonstrated that users of instance-based data were able to identify more potentially interesting patterns compared with users of class-based data.

With the current emphasis on information analytics and the importance of incorporating insights from organizational data into decision-making, the latter two studies show that the instance-based model is a promising approach to satisfy the emerging needs of information users.

Preface

The research work presented in this dissertation, including framing research questions, designing and conducting the experiments, data collection and interpretation, as well as drafting and preparing the dissertation, was completed by the author, Arash Saghafi, under the direct supervision of Professors Yair Wand and Izak Benbasat at the Sauder School of Business, The University of British Columbia, Vancouver, BC, Canada, and Professor Jeffrey Parsons at the Faculty of Business Administration, Memorial University of Newfoundland, St. John’s, NL, Canada. The present work has significantly benefited from the feedback and revisions from my supervisors.
A version of Chapter 2 is under final preparation for submission to a research journal for peer review. Chapters 3 and 4 are currently under a second round of review (as one manuscript) at a peer-reviewed research journal. Additionally, parts of Chapters 2 and 3 have been published in the proceedings of the international conferences listed below:

Saghafi, A., and Wand, Y. 2014. “Do Ontological Guidelines Improve Understandability of Conceptual Models? A Meta-Analysis of Empirical Work,” 2014 47th Hawaii International Conference on System Sciences (HICSS), Waikoloa, HI, pp. 4609–4618.

Saghafi, A., Wand, Y., and Parsons, J. 2016. “Querying Instances — A Protocol Study”, accepted to be included in the proceedings of the Americas Conference on Information Systems, San Diego, United States.

The UBC Behavioural Research Ethics Board has approved the empirical parts of this work. The ethics approval certificate number is H14-02233.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgements
Dedication
Chapter 1: Introduction
Chapter 2: A Meta-analysis of the Effect of Ontological Guidance on Users’ Understandability of Conceptual Models
  2.1 Synopsis
  2.2 Introduction
  2.3 Review of Studies on Ontological Guidance
    2.3.1 Ontological Guidance
    2.3.2 Introducing the Papers Gathered for the Meta-analysis
    2.3.3 Cognitive Theories
      2.3.3.1 Cognitive Theory of Multimedia Learning (CTML)
      2.3.3.2 Theory of Semantic Networks
      2.3.3.3 Problem Solving Theory and the Theory of Cognitive Fit
  2.4 Meta-analysis Method
    2.4.1 Choice of Analysis Model
    2.4.2 Selection of Papers in the Pool
    2.4.3 Variables Used in the Study
    2.4.4 Data Analysis
  2.5 Summary of Findings, Discussion, and Implications
    2.5.1 Limitations, Strengths, and Weaknesses of the Study
  2.6 Conclusion and Future Research
Chapter 3: Role of Instances in Understanding and Querying Data
  3.1 Synopsis
  3.2 Introduction
  3.3 Theoretical Foundations of the Instance-based Paradigm
    3.3.1 Independence of Instances from Classes
    3.3.2 Classification: Users Create Classes (and Views) on Demand
  3.4 Representation of Instance Data
  3.5 Research Model
  3.6 Experiment 1
    3.6.1 Design and Experimental Material
    3.6.2 Participants
    3.6.3 Procedure
    3.6.4 Data Analysis and Results
      3.6.4.1 Perceived Restrictiveness and Intentions to Use
      3.6.4.2 Task Completion Time
      3.6.4.3 Post hoc Analysis
  3.7 Experiment 2
    3.7.1 Design, Participants, Experimental Material, and Procedure
    3.7.2 Data Analysis and Results
  3.8 Limitations and Validity Threats
    3.8.1 Limitations of the Instance-based Approach
    3.8.2 Validity of the Experiment and Possible Improvements
  3.9 Implications
    3.9.1 Theoretical Implications
    3.9.2 Practical Implications
  3.10 Summary and Future Research
Chapter 4: Visual Analytics on Instance-based Data Models
  4.1 Synopsis
  4.2 Introduction
  4.3 Prior Research and Relevant Foundations
  4.4 Research Model
  4.5 Experiment
    4.5.1 Design and Experimental Material
    4.5.2 Participants
    4.5.3 Experimental Task and Procedure
    4.5.4 Data Analysis and Results
    4.5.5 Additional and post hoc Analyses
  4.6 Challenges and the Validity of the Study
  4.7 Potential Implications
  4.8 Concluding Remarks
Chapter 5: Summary and Conclusions
References
Appendices
Appendix A Material (Experiments in Chapters 3 and 4)
  A.1 Pre-experiment Questionnaire
  A.2 Post-experiment Questionnaire
  A.3 Travel Agency Domain (Experiment I)
  A.4 Questions (both Control and Treatment)
  A.5 Consulting Firm Domain
  A.6 Material for the Control and Treatment Groups
  A.7 Questions (for both Control and Treatment)
Appendix B Answer Keys: Experiment 1 and 2 (Chapter 3)
Appendix C Examples of Marked Responses From Subjects in Experiment I (Chapter 3)
Appendix D Testing Distribution Normality and MVA Tables
  D.1 Experiment 1 from Chapter 3
  D.2 Experiment from Chapter 4
Appendix E Material for Experiment in Chapter 4
  E.1 Human Resource Data: Case Description and Task
  E.2 Material for the Control Group
  E.3 Material for the Treatment Group
Appendix F Examples of Visual Analytics Done by Subjects from Chapter 4
  F.1 Statement and Visualization from Subject #18 (Instance-based)
  F.2 Statement and Visualization from Subject #21 (Class-based)
  F.3 Statement and Visualization from Subject #35 (Instance-based)
  F.4 Statement and Visualization from Subject #12 (Class-based)
Appendix G Converting Class-based Tables to Instance-based

List of Tables

Table 1. Dependent variables in the meta-analysis
Table 2. Studies included in the meta-analysis
Table 3. Meta-analyses done in this study
Table 4. Fail-safe N for each iteration of the meta-analysis
Table 5. Example of information retrieval using instance-based and traditional representations
Table 6. Experimental design and subject allocation
Table 7. Measures of prior knowledge
Table 8. Descriptive statistics
Table 9. Between and within factors in the experiment
Table 10. Covariates in the experiment
Table 11. Regression analysis related to usage intentions
Table 12. Task completion time
Table 13. Breakdown of travel agency domain questions
Table 14. Breakdown of the consulting domain questions
Table 15. Subjects’ usage of available resources in the experiment
Table 16. Measures of prior knowledge
Table 17. Agreement between coders
Table 18. Results of the protocol analysis
Table 19. Challenges reported by the subjects in the protocol analysis
Table 20. Applications that can benefit from the instance-based approach
Table 21. Measures of prior knowledge
Table 22. Descriptive statistics
Table 23. Performance with respect to number of true statements
Table 24. Performance with respect to reliability
Table 25. An example of coding for insightful statements
Table 26. Performance with respect to insights gained from patterns
Table 27. Regression analysis related to usage intentions
Table 28. Task completion time
Table A1. Pre-experiment questionnaire
Table A2. Post-experiment questionnaire (taken from Wang and Benbasat (2009) and Xu et al. (2013), reworded to reflect experimental task)
Table B1. Answer key to travel agency case
Table B2. Answer key to the consulting case
Table C1. Sample scoring from answers of participants in the class-based group
Table C2. Sample scoring from answers of participants in the instance-based group
Table D1. Multivariate tests table for Experiment 1 in Chapter 3
Table D2. Between-subjects effects for Experiment 1 in Chapter 3
Table D3. Within-subjects contrasts from Experiment 1 in Chapter 3
Table D4. Tests of between-subjects effects from Experiment 1 of Chapter 3
Table E1. List of properties relevant in the domain

List of Figures

Figure 1. Thesis roadmap
Figure 2. Performance measures observed in the meta-analysis
Figure 3. Average effects’ sizes of the meta-analysis based on performance measures. d: Cohen’s d; CI: 95% confidence interval
Figure 4. Relational representation of Hospitals A and B
Figure 5. Instance-based representation of data in Hospital A and Hospital B
Figure 6. Two things in the domain that are connected to each other
Figure 7. (a) Generic view of travel agency data and (b) sample data in an instance-based system
Figure 8. Travel agency schema in traditional (class-based) representation
Figure 9. Impact of domain familiarity on performance
Figure 10. Schema of the dataset used in the experiment
Figure 11. Tableau interface for the class-based group
Figure 12. Tableau interface for the instance-based group
Figure A1. Travel agency domain, control group general schema
Figure A2. Travel agency domain, control group actual data
Figure A3. Travel agency domain, treatment group general schema
Figure A4. Travel agency domain, treatment group actual data
Figure A5. Consulting firm domain, control group general schema
Figure A6. Consulting firm domain, control group actual data
Figure A7. Consulting firm domain, treatment group general schema
Figure A8. Consulting firm domain, treatment group actual data
Figure D1. Testing normality for the t-test, experiment (Chapter 3)
Figure D2. Testing normality for the t-test, experiment (Chapter 4)
Figure E1. Schema of ACME
Figure E2. Treatment group
Figure F1. Visual analytics by Subject #18 (instance-based)
Figure F2. Visual analytics by Subject #21 (class-based)
Figure F3. Visual analytics by Subject #35 (instance-based)
Figure F4. Visual analytics by Subject #12 (class-based)

Acknowledgements

My everlasting debt of gratitude is owed to Professor Yair Wand. Yair has been the best supervisor to my student, mentor to my mentee, and master to my apprentice. He has taught me how to write, present, analyze, and think. Yair’s legacy in the information systems discipline is one of the richest, and I hope to continue it in my career with Yair’s help. I also need to express gratitude to Professor Jeffrey Parsons, who along with Yair supervised two of the studies in my thesis. Jeff’s wealth of knowledge and advice has helped me greatly in the progression of my research. I greatly appreciate his support, and I look forward to many more collaborations in the future. Also, I would like to acknowledge Professor Izak Benbasat’s role in my research identity. Because of Izak’s influence on my education, I decided to focus on empirical as well as behavioural aspects of the information systems field. His advice has greatly enriched my work.

Professor Carson Woo has also had an important role throughout my years as a graduate student at the Management Information Systems division. Collaborations and his support have provided great opportunities for me, and I am grateful for them. In addition, I am grateful for the mentorship and advice that was generously given to me by Professor Andrew Burton-Jones. Andrew shared the data that was of integral importance to one of my studies. The conversations we had, and also the great opportunities he provided for me to review papers at the highest level, had a significant role in my development as a researcher.
I would like to extend my gratitude to Professors Ronald Cenfetelli, Hasan Cavusoglu, and Ning Nan (the esteemed faculty of the MIS division) for their support, constructive comments, and positive feedback during my PhD studies. Also, my sincere thanks and appreciation to Paula Chang and Elaine Cho for their help and support during my MSc and PhD studies.

Financial support from the Social Sciences and Humanities Research Council (SSHRC) of Canada has enabled me to conduct experiments and travel to conferences to present my work. Its support is greatly appreciated.

My dear friends in the MIS division also deserve acknowledgement for their consultations, company, and friendship throughout this journey. I would like to thank Amin, Usman, Amin, Moksh, Camille, and Michael. I should also mention my dear friends who are like family to me. Amir Mehdi, Bahman, Farshad, and Alireza have been by my side during my graduate studies; their support through the worst moments, and encouragement through the best, will not be forgotten.

I have saved my family for last; my love and gratitude go to my grandparents, Mahin and Ali, for their unconditional love. They have always been there for me, even at times when all other doors had closed on me! Finally, my utmost gratitude to my parents, Maryam and Mojtaba, whose love and support cannot be described by any words! They are the best parents one could have, and I owe everything to them. I will strive to be like them when I have my own family.

Dedication

To my parents: Maryam and Mojtaba

Chapter 1: Introduction

Information systems are representations or models of real-world applications (Wand and Weber 1989). Based on this premise, the success of an information system is contingent on how effectively and faithfully the representations are generated and interpreted by analysts and designers (Moody 2005, Wand and Weber 2002).
Prior research has suggested using ontology, a branch of philosophy that deals with the order and structure of reality in the broadest sense (Angeles 1981), as guidance for the modelling process. It is expected that improving the ontological expressiveness of conceptual models [1] will make them more faithful and effective representations of the real world (Wand and Weber 1989). Various ontological theories exist; some of the ontological foundations used in the information systems literature include Bunge (1977, 1979), Searle (2006), and GOL (Degen et al. 2001). According to Allen and March (2006a) and Fonseca (2007), the most widely used ontology in systems analysis and design and conceptual modelling research is Bunge’s ontology (Bunge 1977), with most empirical evaluations choosing Bunge’s ontology as their focus; hence, it is the ontology chosen as the theoretical foundation of this thesis.

Wand and Weber (1989) adapted Bunge’s ontology to the information systems domain. A large body of work has focused on development and evaluation of ontological guidelines for different conceptual modelling grammars (e.g., Recker et al. (2006) for Business Process Model and Notation (BPMN), Evermann and Wand (2006) for Unified Modelling Language (UML), and Bera et al. (2011) for Web Ontology Language (OWL)).

[1] Ontological expressiveness is discussed in detail in Section 2.2.1.

The overall objective of this work is to focus on some of the information modelling approaches that are rooted in ontological principles and to study their effect on human cognition. The three studies in this thesis (described briefly below and in depth in their respective chapters) show that information models that are faithful to ontological principles improve users’ performance of cognitive tasks. The results contribute to research by providing a theoretical contribution as well as empirical evidence in favour of employing ontology in information modelling.
The empirical evidence presented in this work could also inform practitioners of the advantages of employing ontological principles in systems analysis and design.

The first study of my thesis (Chapter 2) synthesizes the prior work that had empirically evaluated the impact of ontological guidance on users’ performance of cognitive tasks (e.g., problem solving) when using conceptual models as well as data models [2]. The motivation for focusing on this topic was the discussion in the literature regarding the trade-off between ontological expressiveness of conceptual models and their simplicity (Wand and Weber 1993, Bowen et al. 2009). More specifically, it is expected that making conceptual models more ontologically expressive may increase the number of constructs in the model, thus making it more complicated. Chapter 2 studies whether ontological expressiveness could enhance users’ performance of cognitive tasks. As mentioned, several ontological theories have been discussed in the information systems literature; however, the scope of Chapter 2 was set to empirical studies on Bunge’s ontology, owing to its wide application in the information systems literature (Allen and March 2006a, Fonseca 2007).

[2] Conceptual models are formal representations of a domain, used for the purpose of understanding the domain and communicating the requirements (Mylopoulos 1992); in other words, they are models of the business entities or real-world domains (Wand and Weber 2002). Conceptual models are typically used in the analysis phase of the systems development lifecycle (SDLC). The next step in the SDLC is design, in which blueprints of an information system are designed (Elmasri and Navathe 2011) based on the requirements identified in the conceptual models. These design blueprints or data models could be representations of the structure of the information in a database.
The analysis indicates a strong effect of ontological guidance on improving users’ understanding of the “conceptual domain models”. However, the results related to the performance of users of ontologically guided “data models” have been inconclusive. Allen and March (2006b) did not find a significant difference between the accuracy of queries (a measure of performance) of subjects using a data model based on the Bunge–Wand–Weber (BWW) ontology (i.e., data organized around things) versus models with different principles (i.e., event-based or hybrid data models). Bowen et al. (2004) showed a positive impact of ontological guidance on users’ performance (i.e., having fewer semantic errors in query formulation). However, in a later paper, Bowen et al. (2009) found different results; they showed that for more complex data models, the users of ontologically guided models have more semantic errors compared with users of non-guided models. All in all, this study showed that employing ontological principles in conceptual modelling provides value in the education as well as the practice of systems analysis.

To address the aforementioned inconclusive results regarding ontologically guided data models, Chapters 3 and 4 investigate a data modelling approach that is actually based on ontological and cognitive principles, namely the instance-based paradigm (Parsons and Wand 2000). The instance-based approach focuses on “things” (the building blocks of the world in Bunge’s ontology) rather than classes. More specifically, unlike the traditional approach, the instance-based paradigm does not require imposing pre-defined structure over the data, nor does it entail central control and planning. These chapters study users’ ability to perform the cognitive tasks of routine information retrieval (i.e., exploitation of information) and discovery and identification of patterns [3] (i.e., exploration of information).
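The idea that instances exist independently of classes, and that classes can be defined later over data that already exists, can be illustrated with a minimal Python sketch. It is not taken from the thesis: the sample data, the property names, and the helpers define_class and members are hypothetical, and the membership rule (an instance belongs to a class if it possesses all of the class's defining properties) is a simplifying assumption made here for illustration.

```python
from typing import Any, Callable, Dict, List

# An instance is an identifier plus whatever properties it happens to have.
Instance = Dict[str, Any]

# Hypothetical sample data: no shared schema is imposed on the instances.
instances: Dict[str, Instance] = {
    "i1": {"name": "Alice", "salary": 5000, "speaks": "French"},
    "i2": {"name": "Bob", "license_no": "X-17"},
    "i3": {"name": "Carol", "salary": 6200, "license_no": "Q-02"},
}


def define_class(required: List[str]) -> Callable[[Instance], bool]:
    """Return a membership test: an instance belongs to the class
    if it possesses every property listed in `required`."""
    def is_member(instance: Instance) -> bool:
        return all(prop in instance for prop in required)
    return is_member


def members(is_member: Callable[[Instance], bool]) -> List[str]:
    """List the ids of all stored instances that satisfy a class definition."""
    return sorted(iid for iid, inst in instances.items() if is_member(inst))


# Classes are created on demand, after the data already exists.
employee = define_class(["name", "salary"])
driver = define_class(["name", "license_no"])

print(members(employee))  # -> ['i1', 'i3']
print(members(driver))    # -> ['i2', 'i3']
```

In this sketch the instance i3 falls under both classes without any change to how it is stored. A class-based (relational) design would typically have to decide in advance which table i3 belongs to, or introduce a schema that anticipates the overlap.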
Building on previous research on the instance-based paradigm (Parsons and Wand 2000, 2013; Lukyanenko et al. 2014), Chapters 3 and 4 propose a theoretical model regarding the potential of the instance-based approach to facilitate understanding the semantics of information for the purpose of identifying new phenomena as well as routine retrieval of information by “content consumers” (Parsons and Wand 2014). Within the scope of the current work, content consumers are users who are familiar with the domain but do not necessarily have extensive knowledge of databases and data models. These users can retrieve information from the information system but have no control over the design of the database (i.e., are not database administrators). Morton et al. (2014) call this group “data enthusiasts”. According to their definition, data enthusiasts are a growing group of users “who need to analyze the data, [however] these users are without formal training in data science” (p. 453). These users could assume a wide variety of roles, such as business analysts, students using data for a course project, or journalists who survey the data to support their stories. They are not necessarily professional database designers or data scientists. From this point forward, the terms “content consumers” and “business users” are used interchangeably to refer to this group of information users.

[3] According to Frawley et al.’s (2009) definition of pattern, “given a set of facts (data) F, a language L, and some measure of certainty C, a pattern is a statement S in L that describes relationships among a subset F_s of F with certainty C, such that S is simpler (in some sense) than the enumeration of all facts in F_s”.

Chapter 3 evaluates users’ performance in information retrieval tasks.
It demonstrates that users of the instance-based representations are able to formulate queries more effectively than users of class-based representations. The study also measures users’ intentions to adopt a given representation. Findings show that intentions to adopt instance-based representations are not significantly different from intentions to adopt class-based representations, even though subjects had already been trained in class-based methods. Chapter 4 focuses on the more interesting aspect of knowledge discovery and exploring information (that was not necessarily created for the intended application). In a more open-ended setting, users of instance-based representations managed to report patterns of higher quality and of more interest to potential stakeholders of the information system. In this study, intentions to adopt instance-based representations were significantly higher than intentions to adopt class-based representations. The latter two studies show that the instance-based representation is a superior approach compared with traditional class-based methods for both exploitative and exploratory tasks performed by (novice) content consumers within an organization. Figure 1 illustrates a roadmap of how these studies are related to each other.

Figure 1. Thesis roadmap

Chapter 2: A Meta-analysis of the Effect of Ontological Guidance on Users’ Understandability of Conceptual Models[4]

2.1 Synopsis

Information system applications, as representations of real-world applications, are intended to be faithful accounts of the relevant aspects of the domains. As an integral part of the development process, systems analysts create conceptual models in order to understand the application domains and communicate system requirements with stakeholders, analysts, designers, and implementers. Failure to understand the domain requirements has been a prominent reason for the failure of information technology (IT) projects.
Improving the quality of models representing domains would have a major impact on the success of development and adoption of information systems. To guide the modelling process, researchers have suggested using ontology, a branch of philosophy that deals broadly with the structure of reality. Applying ontological guidance would make the model more expressive and, hence, a better representation of reality; however, it tends to make the models more complicated. A large body of work has focused on the trade-off between expressiveness and simplicity of conceptual models. This paper describes a statistical synthesis (i.e., meta-analysis) of selected empirical research that examines the application of Bunge’s ontology to guide modellers and its impact on users’ understandability of conceptual models. The result of the meta-analysis shows that ontological guidance can indeed improve users’ performance of cognitive tasks, especially those requiring deeper understanding of the models, thus providing evidence for the value of applying ontological guidance in conceptual modelling.

[4] A preliminary version of Chapter 2 has been published in Saghafi and Wand (2014). The current chapter extends that work by increasing the number of studies in the meta-analysis pool (from eight to 22) and proposing different theoretical contributions, which are discussed at length in this chapter.

2.2 Introduction

Systems analysts create conceptual models in order to understand information system application domains and communicate system requirements (Mylopoulos 1992) with stakeholders, analysts, designers, and implementers. Failing to understand the domain requirements is considered one of the major causes of failure in information systems development projects (Wand and Weber 2002, p. 363).
In addition, correcting an error in understanding user requirements after the information system has been implemented is “100 times more costly than it is to correct it during requirements analysis” (Moody 2005, p. 245). Thus, by enhancing the quality of conceptual models and making them more understandable, one could expect a major impact on the success of information systems projects. Conceptual models, as representations of real-world domains, are intended to provide a faithful representation of the relevant aspects of the domain (Wand and Weber 2002). Use of ontology, “a branch of philosophy that deals with the order and structure of reality in the broadest sense possible” (Angeles 1981), has been proposed to guide analysts as to what aspects of the world ought to be modelled (Wand and Weber 1989). Conceptual models that have been mapped to ontological concepts are considered to be more faithful to reality and thus clearer and more complete (Wand and Weber 1993) than models that were not based on the structure of reality (i.e., models that are not ontologically valid); we consider such models to be ontologically guided. However, these models tend to get more complicated; hence, modellers will face a trade-off between expressiveness and simplicity (Wand and Weber 1993, p. 235). This raises the question of whether ontological guidance should be adopted in conceptual modelling.

We should emphasize that evaluating the clarity and completeness aspects of a representation (e.g., a conceptual model) is not contingent on using a particular ontological theory, as the representation argument would hold true with respect to the various ontological theories that exist. The choice of an ontology, however, could be a point of contention, as will be discussed later.
The focus of the current work is on Bunge’s ontology (Bunge 1977), adapted for use in the information systems domain by Wand and Weber (1989, 1993, 1995, 2002), as it is considered the most widely used ontology in systems analysis and design and in conceptual modelling research (Allen and March 2006a, Fonseca 2007). A large body of work has focused on developing rules to guide modellers in creating more ontologically valid models using different conceptual modelling grammars (Recker et al. 2006, Evermann and Wand 2006, Bera et al. 2011), as well as on evaluating the effect of the guidelines on users’ performance of cognitive tasks using conceptual models, in particular studying the trade-off between simplicity and expressiveness. For example, Bodart et al. (2001) evaluated the ontological expressiveness of entity-relationship (ER) diagrams using different measures of understanding; they showed that for certain measures (e.g., recall) simpler models led to better performance by subjects, while for problem solving tasks, the more expressive models were advantageous. Along the same lines, Bowen et al. (2009) investigated the effect of ontological guidance on models with varying complexities and found that users who write queries on larger but non-ontologically guided models tend to outperform those using guided models. The current paper investigates the value of applying ontological guidance (based on Bunge’s ontology) by conducting a rigorous and quantitative review of previous empirical work that evaluated the effect of ontological guidance on users’ performance; the purpose of this review is to better understand the trade-off between simplicity and expressiveness. To fulfill this objective, we performed a statistical synthesis (i.e., meta-analysis) to reflect the findings of the previous research and broaden its base (Borenstein et al. 2011).
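As a rough illustration of what such a statistical synthesis involves, the sketch below pools per-study effect sizes with inverse-variance weights under a random-effects model. The choice of the DerSimonian-Laird estimator for the between-study variance, the function name `pool_random_effects`, and all numbers are assumptions made for illustration; they are not the procedure or the data of the studies reviewed in this chapter.

```python
import math
from typing import List, Tuple


def pool_random_effects(effects: List[float],
                        variances: List[float]) -> Tuple[float, float, float]:
    """Pool study effect sizes; returns (pooled effect, standard error, tau^2).

    Uses the DerSimonian-Laird estimate of the between-study variance tau^2
    (an assumption; other estimators exist).
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")

    # Fixed-effect (inverse-variance) weights and pooled estimate.
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w

    # Cochran's Q measures how much the studies disagree.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each study's own variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    std_error = math.sqrt(1.0 / sum(re_weights))
    return pooled, std_error, tau2


# Made-up effect sizes and variances, for illustration only.
pooled, std_error, tau2 = pool_random_effects([0.0, 1.0], [0.01, 0.01])
print(pooled, std_error, tau2)
```

When the studies agree closely, the estimated between-study variance falls to zero and the result coincides with the fixed-effect estimate; when they disagree, each study's weight shrinks toward equality and the standard error of the pooled effect grows.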
Saghafi and Wand (2014) gathered a number of empirical studies about the impact of ontological guidance on users’ understanding of conceptual models. In that previous version we analyzed eight papers. To address the limited number of papers in the previous meta-analysis, we extended it by adding 14 more papers, for a pool of 22 studies, and used a different meta-analysis model (namely, the random effects model, which is discussed in Section 2.3.1). We then statistically synthesized the reported effect sizes from the pooled papers and presented the results using an elaborate categorization of dependent variables based on similarity of measures (i.e., five levels of analysis, as opposed to three in the initial version of this work, described further in Sections 2.2.2 and 2.3.4). The analysis shows that ontological guidance can indeed improve users’ understanding of conceptual models, particularly in tasks that require deeper understanding.

In the current paper, we further the theoretical discussion on the usage of different ontological foundations. We then study the effect of Bunge’s ontological guidance on different aspects of users’ performance by synthesizing the prior empirical research on this topic in a meta-analysis. Our findings address the trade-off between simplicity and expressiveness of conceptual models across different levels of users’ understanding of models. In addition, our results could assist practitioners by providing evidence related to the application of ontological guidance and possible improvements to the quality of conceptual models. This might encourage them to incorporate ontological analysis into their training and analysis methodologies.

2.3 Review of Studies on Ontological Guidance

2.3.1 Ontological Guidance

Philosophers have been studying the question of what exists in the world since ancient times (Almeida 2013), from Aristotle to contemporary philosophers (e.g., Bunge 1977, Searle 2006).
The branch of philosophy that deals with the structure of reality in the broadest sense is called ontology (Angeles 1981). Performing an exhaustive survey of all the ontologies that have been described over the ages may be an impossible task and is also beyond the scope of the current work. Here, we focus only on applications of ontology in information systems.

The underlying premise is that information systems are representations or models of real-world applications (Wand and Weber 1993). Thus, the success of an information system is contingent on how effectively and faithfully the representations are generated and interpreted by analysts and designers (Wand and Weber 2002). In order to achieve this goal (i.e., building faithful representations with respect to reality), modellers can seek guidance from ontologies (Wand and Weber 1989, Allen and March 2006a, Fonseca 2007) to base their models on the structure of reality.

As mentioned, several ontological theories exist, but according to Fonseca (2007) the most widely used ontology in systems analysis and design and conceptual modelling research is Bunge’s ontology (Bunge 1977). Wand and Weber (1989) adapted this approach to the information systems domain in order to theoretically examine different aspects of representation in information systems. In Bunge’s ontology, we consider the world to be made of things that possess properties. The properties are represented as attributes, and the values of these attributes constitute the state of the thing at a given time (Wand and Weber 1990). The combinations of attribute values that are possible within a domain are called lawful states. If an information system is itself considered a thing (one that represents the real world), “lawful states of the information system should reflect the lawful states of the real-world system” (Wand and Wang 1996, p. 89). We note that scholars in the information systems discipline have used other ontologies as well.
Most notable are Allen and March (2006a, 2006b, 2012), who are proponents of using Searle’s (2006) ontology instead of Bunge’s. The focus of Searle’s (2006) ontology is on social or institutional reality. The institutional reality is the “world of conceptual objects created by human intentionality and the characteristics ascribed to material or conceptual objects for human purposes” (Allen and March 2006a, p. 1). Allen and March (2006a) believe that “the domain of Bunge’s ontology is the physical world, and it has no place for human intentions, interpretations, or meaning” (p. 1); in their view, it is therefore unfit to represent social concepts. As an example, March and Allen (2014) refer to the inception of the state of Utah as an institutional fact that Bunge’s ontology, according to them, fails to represent. Based on Searle’s (2006) ontology, a speech act (declaration) by the United States government on January 4, 1896, made Utah a state. Acceptance of this declaration by citizens of the United States (i.e., collective intentionality) transformed this declaration into an institutional fact.

Shanks and Weber (2012), however, disagree with Allen and March’s interpretation. They refer to Bunge’s (1977, p. 58) distinction between conceptual objects and substantial objects (or things) and posit that “the real world of substantial objects ultimately is unknowable. As a result, humans use conceptual objects to express their understanding of the real world” (Shanks and Weber 2012, p. 968). In fact, “the only way humans can engage in discourse about or think about concrete things (and events that occur to concrete things) is via the concepts (conceptual models) that humans have devised to describe the things (and the events that occur to them)” (Shanks and Weber 2012, p. 968). Based on this argument, they believe that Bunge’s ontology can indeed be used as a theoretical foundation for modelling conceptual objects.
Thus, Shanks and Weber (2012) disagree with Allen and March’s claim that Bunge’s ontology cannot be used in modelling conceptual objects.

Referring back to the example of the state of Utah (from March and Allen 2014), we posit that Bunge’s ontology recognizes the existence of the state of Utah as a concept devised by humans to describe the world. Thus, based on Shanks and Weber’s explanation, Bunge’s ontology is indeed capable of representing a conceptual object such as the state of Utah.

As an alternative ontological theory that could guide conceptual modellers, Guizzardi et al. (2002, 2004) propose a type of guidance that is rooted in the general ontological language (GOL) proposed by Degen et al. (2001). GOL is based on set theory as it divides “the world into two sorts of entities. On the one hand are urelements, which form an ultimate layer of entities lacking any set-theoretic structure in their make-up. On the other hand are sets, which rise above these urelements in the familiar cumulative hierarchy” (Degen et al. 2001, p. 35). Urelements, in other words, are “individuals” with properties such as quantity, space, and time. Sets are ontological constructs that are made of individuals, akin to the “kind” concept in Bunge’s ontology.

GOL is similar to Bunge’s ontology in the sense that it considers entities as building blocks of the world. To demonstrate the similarities, Guizzardi et al. (2002) mapped constructs of GOL to Bunge’s ontology and identified counterparts in the respective ontologies. Later, Guizzardi et al. (2004) proposed ontological guidelines based on GOL to be used in the UML grammar. Similarly, Evermann and Wand (2006) have also proposed ontological guidelines to be used in UML, although based on Bunge’s ontology. To compare these guidelines (based on GOL and Bunge’s ontology), Hadar and Soffer (2006) performed a quantitative analysis of the UML class diagrams created by 11 professionals (i.e., software developers with 2–12 years of experience).
In their analysis, Hadar and Soffer (2006) tried to identify different types of modelling variations[5] associated with vagueness in the guidelines “for deciding how to map reality into modelling constructs” (p. 568). They identified seven variation types (six related to syntax and semantics and one related to different naming conventions). They examined how ontological guidelines can reconcile (or prevent) these variations. The framework by Evermann and Wand (2006) had conclusive rules for five of those variation types and implicit guidance for one. The guidelines developed by Guizzardi et al. (2004) provided conclusive rules for two types of variations, partial rules for another two, implicit guidance for one, and no guidance at all for the remaining variation type. Based on Hadar and Soffer’s (2006) analysis, guidelines rooted in Bunge’s ontology have wider coverage and applicability in reconciling modelling variations.

[5] According to Soffer and Hadar (2007), different analysts create different models given the same domain. They defined model variations as “the differences in constructs and relations between adequately constructed models” (p. 599).

In addition to the debates on the types of ontologies (e.g., Bunge, Searle, or GOL) used as guidance, there are empirical works demonstrating that employing ontological guidance could make the models complicated and thus negatively affect users’ performance. Bodart et al. (2001) showed that users had lower recall of ontologically guided models in comparison with non-guided models, as ontologically guided models tend to have more constructs present. Bowen et al. (2009) studied users’ ability to formulate queries using conceptual models. They varied the complexity of their tasks by presenting simple models (with fewer constructs) and complex models (with a higher number of constructs) to their subjects.
Their experiment showed that ontological guidance could improve the accuracy of users’ queries (i.e., correctness of the query result) based on simple models, while it had a negative effect on the task based on the more complex models. They concluded that ontological guidance could overcomplicate the models, and this cost may not justify the benefits of ontological guidance.

The meta-analysis done in this paper intends to study the trade-off of using ontological guidance (i.e., conceptual models becoming more faithful to reality and, at the same time, more complex) by aggregating the previous empirical work on this topic. As mentioned earlier, the only focus in this thesis is on Bunge’s ontology (due to its importance in the field, and also to limit the scope).

Prior research focused on the development of ontological guidelines, based on Bunge’s ontology, for different conceptual modelling grammars (e.g., Recker et al. (2006), Evermann and Wand (2006), Bera et al. (2011)). The approach has also been used to evaluate the effect of the guidelines on users’ performance of cognitive tasks using conceptual models (Bodart et al. 2001, Burton-Jones and Meso 2006). In the literature, ontological guidance has been manifested in two forms. The first and foremost is the idea of “ontological expressiveness”: an ontologically clear and complete conceptual modelling grammar can generate better scripts[6] (Wand and Weber 1993). Ontological expressiveness has been used to guide conceptual modellers when using grammars such as ER (Wand et al. 1999, Bodart et al. 2001, Bowen et al. 2004), UML (Evermann and Wand 2006), and business process models (Recker et al. 2011). Second, ontological guidance was applied to good decomposition principles[7] (Wand and Weber 1990) and evaluated by Burton-Jones and Meso (2006, 2008).
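The notions of ontological completeness and clarity (construct overload, construct redundancy, and construct excess, as defined in the footnote on ontological expressiveness) can be illustrated with a small sketch that checks a mapping between grammar constructs and ontological concepts. The function name `analyse_mapping`, the toy grammar, and the reading of completeness as "every ontological concept is represented by at least one construct" are assumptions made for illustration; the mapping shown is hypothetical and is not an analysis of any real grammar such as ER or UML.

```python
from typing import Dict, Iterable, Set


def analyse_mapping(mapping: Dict[str, Set[str]],
                    ontology_concepts: Iterable[str]) -> Dict[str, object]:
    """Report clarity and completeness problems in a grammar-to-ontology mapping.

    mapping: grammar construct -> set of ontological concepts it represents.
    """
    # Invert the mapping: ontological concept -> constructs that represent it.
    constructs_by_concept: Dict[str, Set[str]] = {}
    for construct, concepts in mapping.items():
        for concept in concepts:
            constructs_by_concept.setdefault(concept, set()).add(construct)

    return {
        # Overload: one construct models two or more ontological concepts.
        "overload": {c for c, concepts in mapping.items() if len(concepts) >= 2},
        # Redundancy: two or more constructs map to the same concept.
        "redundancy": {concept: constructs
                       for concept, constructs in constructs_by_concept.items()
                       if len(constructs) >= 2},
        # Excess: a construct that maps to no ontological concept.
        "excess": {c for c, concepts in mapping.items() if not concepts},
        # Incompleteness (assumed reading): concepts with no construct at all.
        "uncovered": set(ontology_concepts) - set(constructs_by_concept),
    }


# Hypothetical toy grammar, for illustration only.
toy_grammar = {
    "box": {"thing"},
    "line": {"property", "event"},
    "label": {"property"},
    "cloud": set(),
}
report = analyse_mapping(toy_grammar, ["thing", "property", "event", "state"])
print(report)
```

In this toy case, "line" is overloaded, "property" is represented redundantly by "line" and "label", "cloud" is an excess construct, and "state" has no construct, so the grammar would be neither clear nor complete under the assumed reading.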
We should note that the theoretical principles behind ontological expressiveness and the good decomposition model are not solely reliant on the Bunge–Wand–Weber ontology. In other words, the same principles that are used to evaluate ontological expressiveness of conceptual models (Wand and Weber 1993) could be applied to evaluate the expressiveness of models referencing other ontologies (e.g., Searle’s ontology or GOL). Thus, the same methods applied here could be used to evaluate effects of other ontologies.

[6] More specifically, an ontologically complete grammar is one in which there is a total mapping between constructs of the conceptual modelling grammar and the ontological concepts (such as thing, property, and event). Ontological clarity is described based on three types of deficiencies: (i) construct overload occurs when a construct from the grammar is used to model two or more concepts from ontology, (ii) construct redundancy occurs if two or more constructs map to the same ontological concept, and (iii) construct excess occurs when a construct from the grammar does not map to any ontological concepts.

[7] Decomposition is a top-down process used to identify the phenomena to be modelled in a conceptual model, for the sake of systems analysis (Wand and Weber 1995, Weber 1997).

2.3.2 Introducing the Papers Gathered for the Meta-analysis

The studies gathered for the meta-analysis investigated various aspects of the application of ontological guidance in information systems modelling; some studies focused on ontological clarity, some on completeness, and some on both dimensions. The theoretical guidelines that they evaluated were not necessarily the same. In addition, the conceptual modelling languages that the papers examined were different (e.g., ER, UML, data flow diagram (DFD), BPMN, and OWL). Similarly, the dependent variables focused on different aspects of performing cognitive tasks.
The authors of many of the papers in our pool referred to theories of cognition (Section 2.2.3) to justify their hypotheses regarding users’ performance when using ontologically guided models vs. non-guided models. Table 1 presents a list of the various dependent variables used by the studies in our meta-analysis pool along with the definition of each measure. The dependent variables used in these studies focus on different aspects of using information systems models (e.g., query formulation and knowledge identification; names are as in the source material). Thus, we categorized the measures based on similarity of scope in order to analyze these variables from a higher level of abstraction (Figure 2), based on different levels and dimensions of cognition that are discussed in some of the cognitive theories used by the papers in our meta-analysis pool (Section 2.2.3). As illustrated in Figure 2, the highest level of abstraction considered all variables as dimensions or aspects of user “performance”.

Figure 2. Performance measures observed in the meta-analysis

At the second level, we distinguished between variables related to query formulation (using database models) and variables that measured users’ performance with regard to conceptual models. According to Mylopoulos (1992), conceptual models are formal representations of the domain used for the purpose of understanding and communication; the objects in a conceptual model are usually business components. Database models, however, model information that is stored in the system (usually in the form of tables); the purpose of data models is querying databases. The third item on this level was task completion time. Some of the studies examining this measure had not hypothesized about time (e.g., Allen and March (2006b)), whereas Bowen et al. (2006) attributed task completion time to efficiency of using the models.
However, none of the studies made any predictions regarding users’ response time based on cognitive theories from the literature. Task completion time was measured in the research on database models (Allen and March 2006b) as well as in research on conceptual models (Bodart et al. 2001). Because there are no theoretical predictions regarding task completion time (as mentioned), we decided to situate this variable at a higher level in our abstraction hierarchy (as described above).

For the third level in this abstraction, we focused on variables related to users’ understanding of conceptual models, as the majority of the papers had reported measures from this category (Table 2). For this level, the variables that objectively measured users’ understanding of conceptual models (by rating users’ answers to some questions) were placed in the category of actual understanding, while the variables that relied on self-reported subjective evaluation by participants (e.g., perceived ease of understanding or confidence in correctness of answers) were placed in the category of perceived understanding. The fourth level of this hierarchy focuses on different types of actual understanding of conceptual models. Using the distinction made by Mayer (2003) (discussed later in Section 2.2.3.1 in more detail), the tasks that could be performed based solely on the presented material were categorized as surface-level understanding, whereas tasks that required integration of prior knowledge with information presented in the experimental material were classified as deep-level understanding (manifested in variables such as problem solving, knowledge identification, and quality evaluation effectiveness, as defined in Table 1).

Table 1. Dependent variables in the meta-analysis

Dependent variable | Explanation | Grouping in meta-analysis
Cloze test and domain comprehension | Questions about the domain that are directly answerable from the model (Gemino and Wand 2005) | Actual performance on conceptual models: surface level / domain
Model comprehension | Evaluating what is directly observable from the model. This evaluation is also used for models that are void of semantics (e.g., Parsons (2011)) | Actual performance on conceptual models: surface level / model
Query formulation accuracy | “Ratio of required semantic elements included in a subject’s query to the total number of semantic elements required in a correctly formulated query” (Allen and March 2006b, p. 273) | Cognitive performance on database models (querying)
Confidence in correctness of answers | Subjects’ prediction of the correctness of their answers using a Likert scale (as in Allen and March (2006b), Bowen et al. (2009), and Parsons (2011)) | Perceived cognitive performance
Quality evaluation effectiveness | Correct identification of (all the) defects in a model (Milton et al. 2012). This task requires reference to prior knowledge in order to distinguish correct versus defective models | Actual performance on conceptual models: deep level
Verification accuracy | “The ability to identify discrepancies between a data model and a given set of user requirements” (Moody 2002a, p. 487). This task requires elaboration and making inferences for the purpose of verification (as “inference” in the theory of semantic networks) | Actual performance on conceptual models: deep level
Model recall accuracy | Proportion of conceptual model constructs (e.g., entities, relationships) that participants recalled correctly, divided by the total number of constructs in the presented model (Bodart et al. 2001). Note: to measure this variable, the conceptual model diagrams were removed from the participants | Actual performance on conceptual models: surface level / recall
Documentation correctness | The level of completeness of written documentation that was based on a conceptual model (Moody 2002b) | Actual performance on conceptual models: deep level
Perceived understanding (of model or domain) | The effort required to understand a diagram (Burton-Jones and Meso 2006) | Perceived cognitive performance
Perceived ease of understanding (of model or domain) | The degree to which the subject “believes that using a particular [conceptual model] would be free of effort” or easy to use (Moody 2002b, p. 2019) | Perceived cognitive performance
Problem solving | Answering questions that require integration of prior knowledge with what is observable from the presented material (Gemino and Wand 2005) | Actual performance on conceptual models: deep level
Knowledge identification | Knowledge is used by agents to determine the actions required to attain their goals (Newell 1982, Bera et al. 2011). Knowledge identification is “the task of asking the right questions to determine what actions need to be taken to change the current state of affairs” (Bera et al. 2011, p. 885) | Actual performance on conceptual models: deep level
Response or completion time | Time taken by a participant to answer questions (Bodart et al. 2001) | Task completion time

Finally, the fifth level of the hierarchy categorizes variables that measure performance of tasks that can be done by referring solely to the material from the presented models (i.e., surface-level understanding). One of the categories in this level is recall accuracy. This variable, as used by Bodart et al. (2001), measures the number of constructs from the model (e.g., entities, relationships, cardinalities) that participants can recall from memory after the conceptual model is taken from them.
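Two of the measures in Table 1, query formulation accuracy and model recall accuracy, are defined as simple proportions, which the sketch below makes concrete. The function names, the representation of semantic elements and constructs as plain strings matched exactly, and the example data are assumptions made for illustration; the original studies scored answers with their own coding procedures.

```python
from typing import Iterable


def proportion_covered(required: Iterable[str], produced: Iterable[str]) -> float:
    """Share of the required items that also appear among the produced items."""
    required_set = set(required)
    if not required_set:
        raise ValueError("at least one required item is needed")
    return len(required_set & set(produced)) / len(required_set)


def query_formulation_accuracy(required_elements: Iterable[str],
                               query_elements: Iterable[str]) -> float:
    """Required semantic elements present in the query / all required elements."""
    return proportion_covered(required_elements, query_elements)


def recall_accuracy(model_constructs: Iterable[str],
                    recalled_constructs: Iterable[str]) -> float:
    """Correctly recalled constructs / all constructs in the presented model."""
    return proportion_covered(model_constructs, recalled_constructs)


# Hypothetical data, for illustration only.
required = {"customer", "order", "customer-places-order", "order_date"}
in_query = {"customer", "order", "discount"}  # "discount" is not required
print(query_formulation_accuracy(required, in_query))

model = {"Customer", "Order", "Product", "places", "contains"}
recalled = {"Customer", "Order", "places", "Invoice"}  # "Invoice" is a wrong recall
print(recall_accuracy(model, recalled))
```

Both measures ignore extra items that were not required (such as "discount" or "Invoice" above); penalizing such items would be a different scoring choice.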
Other categories at this level distinguish between objective evaluations of tasks that can be done using the presented model. If the experimental questions were related to some aspects of the domain, the variables were categorized as surface-level understanding of the domain (as appearing in the model itself). However, some variables were related to tasks that can be done without relying on the semantics of the domain (e.g., models that were void of semantics (Parsons and Cole 2005)); these were categorized as surface-level understanding of the model. In short, although the independent variables and the experiments in these papers were not identical, they were focusing on one abstract question: whether the presence of ontological guidance can improve users’ understandability of conceptual models. This motivated us to synthesize the findings of these papers and report our analysis and findings. Table 2 provides a list of the studies gathered for the meta-analysis. This table draws on the framework introduced by Gemino and Wand (2004) for evaluating and comparing conceptual modelling grammars.

Table 2. Studies included in the meta-analysis

Authors | Independent variable(s) | Nature of study | Task | Dependent variable(s) | Cognitive foundations
Allen and March (2006b) | Ontological foundation: state-based vs. event-based | Intra-grammar – ER | Interpreting the model and writing SQL code | SQL accuracy; Confidence and prediction of accuracy; Completion time | References prior research, but no specific cognitive theory was used
Bera et al. (2011) | Ontological guidance on OWL ontologies | Intra-grammar – OWL ontologies | Interpreting the model | Knowledge identification; Perceived understanding; Perceived ease of understanding | Cognitive fit; Cognitive theory of multimedia learning
Bera et al. (2014) | Ontology-based rules; Prior domain knowledge | Intra-grammar – ER | Interpreting the model | Problem solving | Cognitive theory of multimedia learning
Bodart et al. (2001) | Optional vs. mandatory properties | Intra-grammar – ER | Interpreting the model | Recall accuracy; Response accuracy; Response time; Problem solving | Semantic networks; Theory of multimedia learning
Bowen et al. (2004) | Ontological clarity | Inter-grammar – ER | Interpreting the model and writing SQL code | Number of errors (SQL accuracy); Completion time; Confidence | References prior research, but no specific cognitive theory was used
Bowen et al. (2006) | Ontological clarity | Intra-grammar – ER | Interpreting the model and writing SQL code | Number of errors (SQL accuracy); Completion time; Confidence | Cognitive fit
Bowen et al. (2009) | Ontological clarity; Task complexity | Intra-grammar – ER | Interpreting the model and writing SQL code | Number of errors (SQL accuracy); Completion time; Confidence | References prior research, but no specific cognitive theory was used
Burton-Jones and Weber (1999) | Ontological clarity; Domain knowledge | Intra-grammar – ER | Interpreting the model | Problem solving; Perceived ease of understanding | Problem solving theory
Burton-Jones and Weber (2003) | Ontological clarity | Intra-grammar – ER | Interpreting the model | Domain comprehension; Confidence | None
Burton-Jones and Meso (2006) | Good decomposition model | Intra-grammar – ER | Interpreting the model | Problem solving; Cloze test (domain comprehension); Perceived ease of understanding | Semantic networks; Problem solving theory
Burton-Jones and Meso (2008) | Good decomposition model | Intra-grammar – ER | Interpreting the model | Domain comprehension; Problem solving; Perceived ease of understanding; Multiple forms of information | Cognitive fit; Cognitive theory of multimedia learning; Cognitive economy (Payne et al. 1993)
Burton-Jones et al. (2012) | Optional vs. mandatory properties | Intra-grammar – UML | Interpreting the model | Domain comprehension | Cognitive complexity (Genero et al. 2008)
Evermann and Wand (2006) | Ontological guidance in UML | Intra-grammar – UML | Interpreting the model | Problem solving; Model comprehension | Semantic networks; Cognitive theory of multimedia learning; Problem solving theory
Gemino and Wand (2005) | Optional vs. mandatory properties; Task complexity | Intra-grammar – ER | Interpreting the model | Problem solving; Cloze test; Perceived ease of understanding | Cognitive theory of multimedia learning
Khatri et al. (2006) | Ontological completeness | Intra-grammar – ER | Interpreting the model | Problem solving; Model comprehension; Perceived ease of understanding; Completion time | Cognitive fit
Milton et al. (2012) | Ontological clarity | Intra-grammar – UML | Interpreting the model | Quality evaluation | None
Moody (2002a) | Ontological clarity | Intra-grammar – DFD | Interpreting the model | Model comprehension; Verification accuracy; Completion time | None
Moody (2002b) | Ontological clarity | Intra-grammar – DFD | Interpreting the model | Model comprehension; Documentation correctness; Perceived ease of understanding; Perceived understanding; Completion time | None
Parsons (2011) | Ontological deficiency; Model semantics; Task complexity | Intra-grammar – ER | Interpreting the model | Model comprehension; Confidence | References prior research, but no specific cognitive theory was used
Recker et al. (2011) | Ontological clarity | Intra-grammar – BPMN | Interpreting the model | Perceived ease of understanding; Perceived understanding | None
Shanks et al. (2003) | Ontological clarity | Intra-grammar – ER | Interpreting the model | Problem solving; Domain comprehension; Completion time | None
Shanks et al. (2008) | Modelling composites as entities or relationships | Intra-grammar – ER | Interpreting the model | Problem solving; Completion time; Perceived ease of understanding | None

2.3.3 Cognitive Theories

The ontological expressiveness theory predicts levels of completeness and clarity of models.
Since the dependent variables (in studies in the meta-analysis pool) are all related to some aspect of cognition, some researchers use cognitive theories to justify their hypotheses regarding users’ understanding of conceptual models. Here, for the purpose of review, we briefly describe the cognitive theories that are commonly used in making predictions regarding understandability of conceptual models in the domain of information systems (IS). In the following subsections, the theories of multimedia learning, semantic networks, and cognitive fit are discussed.

2.3.3.1 Cognitive Theory of Multimedia Learning (CTML)
Mayer’s theory of multimedia learning (Mayer 2003) from the field of educational psychology has been used frequently to evaluate conceptual modelling grammars. The underlying premise is that “when conceptual models include both words and graphic elements, they can be considered multimedia messages” (Gemino and Wand 2005). The theory of multimedia learning makes two propositions: (1) the processing of information in the human mind is done through visual (eyes) and auditory (ears) channels and (2) the human cognitive capacity is limited (Mayer 2003). The theory suggests that learning is achieved when the presented material is integrated with previous knowledge (Mayer 2003). CTML measures learning through two variables:
1) Retention or comprehension refers to the ability to use visual and verbal models in the working memory. Gemino and Wand (2005) suggest assessing retention of domain information by asking questions answerable directly from the presented material. They further suggest that since human cognitive capacity is limited, even a simpler model might be too much to retain in working memory and hence may not be significantly advantageous over an expressive model. On the other hand, they predicted that the more expressive model could produce better verbal and visual models in the working memory and thus could improve domain comprehension by users.
2) Transfer or problem solving refers to the ability to use knowledge gained from the material to solve problems that are related but not directly answerable from the presented material (Gemino and Wand 2005). Problem solving is based on integration of verbal and visual models with prior knowledge in long-term memory. Gemino and Wand (2005) and Bodart et al. (2001) posited that a clearer and more complete model would be better integrated with prior knowledge. This led to the prediction that subjects receiving models guided by ontological rules would perform problem-solving tasks better than subjects who use models not constructed with ontological guidance (Gemino and Wand 2005). Some of the papers in our meta-analysis (e.g., Bodart et al. (2001), Gemino and Wand (2005), Burton-Jones and Meso (2006)) used the distinction made by Mayer (2003) between “surface-level” understanding and “deep-level” understanding. Surface-level understanding refers to tasks that are at the retention level and can be performed using the presented material that is retained in the working memory. Deep-level understanding, in addition to the information from the presented material, requires integration of the cognitive model formed in the working memory with prior knowledge in the long-term memory.  26 2.3.3.2 Theory of Semantic Networks The theory of semantic networks (Collins and Quillian 1969) posits that the “human semantic memory is structured as a network of nodes linked via directed pathways” (Bodart et al. 2001, p. 388). In Bodart et al. (2001), nodes could refer to entities, attributes of entities, classes, or attributes of classes, and the paths could be any type of relationship between nodes. This theory was also used for making predictions by Burton-Jones and Meso (2006) and Evermann and Wand (2006). 
When the theory of semantic networks is considered in conjunction with the theory of spreading activation (Anderson and Pirolli 1984), three factors emerge as relevant for making predictions about the relationship between ontological expressiveness and user understanding (Bodart et al. 2001): 1) Number of constructs: As the number of constructs in the conceptual model increases, the likelihood decreases that constructs will be recalled by users. This factor therefore predicts that using a more ontologically expressive model decreases the likelihood that constructs will be recalled by users, because it might have more constructs (i.e., it is more complex). 2) Facilitating elaboration: Elaboration is the cognitive process of establishing paths between nodes in the semantic network (Bodart et al. 2001). Using a more expressive model can lead to a clearer and more complete cognitive model. A better cognitive model can facilitate identification of alternative paths between nodes and hence improve the elaboration process.  27 3) Inferential reconstruction: This is the ability to infer what is plausible in light of information remembered from the model (Bodart et al. 2001). The theory of semantic networks predicts that using a more expressive model can facilitate inferring routes between nodes that might not be directly connected. This aspect is also related to elaboration, because better inferential reconstruction can be done when better elaboration is achieved. In short, the first factor predicts that recalling the more comprehensive model would be more difficult (i.e., surface-level model recall), while the second and third factors predict that the more comprehensive model would lead to creation of a more complete semantic model in the user’s mind, thus improving the performance of tasks that require elaboration and inference (i.e., deep-level understanding, as in combining material from the model with prior knowledge in order to solve problems). 
2.3.3.3 Problem Solving Theory and the Theory of Cognitive Fit Newell and Simon (1972) suggest that a person’s ability to reason (or solve problems) about a domain depends on the quality of his/her mental representation of the domain. They assume that the mental representation of the domain is constructed as a “problem space” in the person’s memory. Similar to the theory of problem solving, the theory of cognitive fit (Vessey 1991) suggests that when individuals need to solve problems in a domain, their performance will improve when the mental representation of the problem matches the representation of the real-world domain (Shaft and Vessey 2006).  28 Based on the theories of problem solving and cognitive fit, one can predict that using a more ontologically expressive model would be a better representation of the real world, and hence could provide a better cognitive fit and lead to better problem solving. 2.4 Meta-analysis Method 2.4.1 Choice of Analysis Model A meta-analysis could be conceptualized as either a fixed- or random-effects model (Borenstein et al. 2011). A fixed-effects model assumes that all the studies in the meta-analysis are identical and they share a common effect size. Any variation that exists between the findings of different studies in the pool would be due to sampling error. “Put another way, all factors that could influence the effect size are the same in all the studies” (Borenstein et al. 2011, p. 63). The random-effects model, on the other hand, incorporates a group of studies in meta-analysis, assuming that they have “enough in common that it makes sense to synthesize the information, but there is generally no reason to assume that they are identical” (Borenstein et al. 2011, p. 69). The variation between different studies is attributed to sampling error as well as to the random effects variable. The random effects variable includes the variation between studies, such as the chosen variables for the study, and the experimental methods. 
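The difference between the two analysis models can be made concrete with a small numerical sketch. The code below is illustrative only: it uses the common inverse-variance weighting with a DerSimonian-Laird estimate of the between-study variance, which is an assumption on our part and not necessarily the estimator used in this study. The function name `pool_effects` and its inputs are hypothetical.

```python
def pool_effects(effects, variances):
    """Pool study effect sizes under fixed- and random-effects assumptions.

    effects   -- list of study effect sizes (e.g., Cohen's d)
    variances -- list of the corresponding sampling variances
    Returns (fixed_mean, tau2, random_mean).
    """
    k = len(effects)
    # Fixed-effects model: all variation is sampling error, so each study
    # is weighted by the inverse of its sampling variance.
    w = [1.0 / v for v in variances]
    fixed_mean = sum(wi * d for wi, d in zip(w, effects)) / sum(w)

    # Cochran's Q measures how much the studies disagree beyond sampling error.
    q = sum(wi * (d - fixed_mean) ** 2 for wi, d in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # DerSimonian-Laird estimate of between-study variance (assumed estimator).
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects model: the between-study variance is added to each
    # study's sampling variance before weighting.
    w_re = [1.0 / (v + tau2) for v in variances]
    random_mean = sum(wi * d for wi, d in zip(w_re, effects)) / sum(w_re)
    return fixed_mean, tau2, random_mean
```

When the studies agree with one another, the estimated between-study variance is zero and the two models give the same pooled effect; as the studies diverge, the random-effects weights become more even across studies.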
As mentioned earlier, the studies gathered in our pool had different independent and dependent variables, yet they all focused on the influence of ontological guidance on some measures of cognitive tasks performed by users. Because the studies in the pool are not identical, we chose the random-effects model for this meta-analysis. In our initial study (Saghafi and Wand 2014), we chose the fixed-effects model (as recommended by Borenstein et al. 2011) in order to somewhat reduce the variations caused by the already small sample size, even though the studies in the pool were not identical.

2.4.2 Selection of Papers in the Pool
Using online databases (namely Business Source Complete, Web of Science, and JSTOR, searched with the keywords “Ontology”, “Bunge”, and “Empirical”) and a working paper by Burton-Jones, Green, Indulska, Recker, and Weber, titled “Information Systems as Representations: A 25-year Review and an Agenda for the Future”, we ended up with a pool of 314 papers. From this pool, we selected papers where the researchers had actually conducted empirical experiments using ontologically-guided conceptual models that were rooted in Bunge’s ontology. We set the scope of the meta-analysis to empirical studies that focused on model interpretation rather than creation. Thus, we eliminated papers describing research on users’ ability to create models (e.g., Hadar and Soffer 2006). We conducted the meta-analysis with 22 papers.

2.4.3 Variables Used in the Study
As illustrated in Figure 2 and Table 1, the studies in the meta-analysis reported different dependent variables. We excluded recall accuracy and task completion time. Recall accuracy was excluded because we assume that analysts in the real world will have access to models during the analysis and design process and rarely need to recall models from memory (Parsons and Cole 2005). As for task completion time, there were no predictions in the cognitive and ontological theories about the relationship between the presence of ontological guidance and task completion time. Table 1 provides a list of the dependent variables used by the studies in our pool, which were incorporated in our meta-analysis. For the purpose of this study, we chose a meta-dependent variable called “cognitive performance”, which incorporated all the dependent variables from Table 1 (excluding recall accuracy and task completion time for the reasons given above).

2.4.4 Data Analysis
As mentioned earlier (in Section 2.2), the papers in our meta-analysis pool had measured different dependent variables in their empirical experiments. Besides the difference in types of variables, the findings were also reported using different statistical measures (e.g., t-values, F-values, and regression coefficients). To synthesize them, we converted all the reported measures to Cohen’s d, which is the standardized mean difference between two groups (in this case ontologically guided models vs. non-guided models). We abstracted all the (included) dependent variables to “cognitive performance” and synthesized the reported effect sizes, as reported in Table 3 and discussed below in more detail. In addition to the first round of analysis (i.e., encompassing all the variable types), we grouped similar dependent variables (according to the schema in Figure 2) and performed four additional rounds of analysis. The five rounds of analysis are summarized in Table 3 and explained further below.

The first round of meta-analysis used 22 papers, with 68 reported effect sizes in total (Table 3). The unbiased Cohen’s d was 0.45, which indicates that on average, ontological guidance can improve performance of cognitive tasks by 0.45 standard deviations (compared with the performance of subjects who used non-guided models).
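As an illustration of how differently reported statistics can be placed on a common scale, the sketch below converts a t-value or a two-group F-value to Cohen's d using the standard textbook conversions for two independent groups, and then applies the usual small-sample correction. The function names are hypothetical, and these are generic formulas rather than a record of the exact conversions applied to each paper in the pool (regression coefficients, for instance, need additional information and are not covered here).

```python
import math

def d_from_t(t, n1, n2):
    """Cohen's d from an independent-samples t statistic with group sizes n1, n2."""
    return t * math.sqrt(1.0 / n1 + 1.0 / n2)

def d_from_f(f, n1, n2, positive=True):
    """Cohen's d from a two-group F statistic (1 numerator degree of freedom).

    F carries no sign, so the direction of the effect must be supplied.
    """
    d = math.sqrt(f * (n1 + n2) / (n1 * n2))
    return d if positive else -d

def unbiased_d(d, n1, n2):
    """Small-sample correction; assumes n1 + n2 > 2."""
    df = n1 + n2 - 2
    return d * (1.0 - 3.0 / (4.0 * df - 1.0))
```

Because F equals t squared in the two-group case, `d_from_t(2.0, 20, 20)` and `d_from_f(4.0, 20, 20)` describe the same effect; the correction in `unbiased_d` shrinks the estimate slightly, and more so for small samples.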
For this random-effects meta-analysis (which contains reported effect sizes that are non-identical), we report a 95% credibility interval while taking into account the sampling errors of the studies in the meta-analysis pool. The credibility interval contains the distribution where 95% (as is common in the literature) “of true effects are expected to be found” (Borenstein et al. 2011, p. 350). For the first round of the meta-analysis, the credibility interval of the average effect size (with 95% certainty) was from –0.81 to 1.72. In other words, 95% of the time the range of impact of ontological guidance on subjects’ performance is from –0.81 to 1.72 standard deviations (compared with average performance without ontological guidance). Negative values in the credibility interval of the effect of ontological guidance (based on our meta-analysis) are likely due to the random effects component that varies between different studies (as incorporated into the random-effects model in the meta-analysis). In order to reduce the randomness in the analysis, we grouped reported effect sizes based on similarity of scope (following the abstraction hierarchy in Figure 2).

Table 3. Meta-analyses done in this study

Round | Focus of analysis (as named in the abstraction hierarchy, Figure 2) | No. of papers included | No. of reported effect sizes | No. of dependent variables | No. of subjects by cumulated reported effect sizes | Average Cohen’s d | Credibility interval (95%)
1 | Cognitive performance | 22 | 68 | 11 | 2047 | 0.45 | –0.81 to 1.72
2 | Cognitive performance on conceptual models (CM) | 18 | 60 | 9 | 1648 | 0.55 | –0.51 to 1.61
2 | Cognitive performance on database models (querying) | 4 | 8 | 2 | 399 | –0.11 | –2.2 to 1.97
3 | Actual cognitive performance on CM | 17 | 40 | 7 | 1025 | 0.65 | –0.58 to 1.88
3 | Perceived cognitive performance on CM | 12 | 20 | 2 | 1216 | 0.37 | –0.58 to 1.32
4 | Actual performance on CM – surface level | 10 | 18 | 2 | 603 | 0.6 | –1.07 to 2.25
4 | Actual performance on CM – deep level | 15 | 22 | 5 | 791 | 0.73 | 0.03 to 1.34
5 | Actual performance on CM – surface level – domain | 7 | 11 | 1 | 502 | 0.91 | –1.06 to 2.88
5 | Actual performance on CM – surface level – model | 4 | 7 | 1 | 181 | 0.1 | –0.25 to 0.47

In the second round of the meta-analysis, we separated the four papers that had focused on query performance (Allen and March 2006b, Bowen et al. 2004, 2006, 2009). Writing of queries is based on database models, whereas the other tasks in our pool of papers were done using conceptual models. The four papers that had focused on database models reported a total of eight effect sizes. The average Cohen’s d of this analysis was –0.11, with a 95% credibility interval of –2.2 to 1.97. In the group of studies focusing on conceptual models, there were 18 papers with 60 reported effect sizes. The average Cohen’s d of the group was 0.55, with a 95% credibility interval of –0.51 to 1.61. The third round (based on studies that focused on conceptual models) separated actual measures of cognitive performance (i.e., problem solving, knowledge identification, quality evaluation, comprehension) from perceived measures (i.e., perceived understanding, perceived ease of understanding, and confidence in answers). In the meta-analysis on measures of actual cognitive performance, 17 papers were represented containing 40 reported effect sizes. Cohen’s d was 0.65 with a 95% credibility interval of –0.58 to 1.88.
The analysis of perceived measures of cognitive performance was performed on 12 papers with 20 reported effect sizes. Cohen’s d was 0.37 with a 95% credibility interval of –0.58 to 1.32. The fourth round focused on the actual measures of understanding. Following Mayer’s (2003) distinction between surface-level and deep-level understanding, we created two groups. Meta-analysis of surface-level understanding was done on nine papers, with 18 reported effect sizes. Cohen’s d was 0.6, with a 95% credibility interval of –1.07 to 2.25. The most reliable effect was observed in the group of measures focusing on deep-level understanding. This analysis included 15 papers, with 22 reported effect sizes. The average Cohen’s d was 0.73, with a 95% credibility interval of 0.03 to 1.34. That both the lower and upper bounds of the interval are positive indicates that ontological guidance has a consistently positive effect on deep-level understanding. The fifth round focused on the measures of surface-level understanding. In the pool of papers used in the meta-analysis, the authors had distinguished between comprehension of conceptual models at the domain level and at the model level. We followed their suggestions and created two groups representing different dimensions of surface-level understanding (i.e., domain and model). The effect on surface-level understanding of domains was strong (0.91), although the credibility interval is very wide (from –1.06 to 2.88). The surface-level understanding of models (or model comprehension) was represented by only three papers. The average effect size was 0.1 with a 95% credibility interval of –0.25 to 0.47 (i.e., 95% of the time, ontological guidance will change the average performance by –0.25 to 0.47 standard deviations compared with the average performance in the absence of ontological guidance). Figure 3 summarizes the findings of the meta-analysis based on different measures of performance.  Figure 3.
Average effects’ sizes of the meta-analysis based on performance measures. d: Cohen’s d; CI: 95% confidence interval  35 2.5  Summary of Findings, Discussion, and Implications In this section the meta-analysis outcomes are summarized and then the findings analyzed in more depth. The following three points summarize our findings. 1) Ontological guidance has a strong effect on improving users’ deep level understanding (Cohen’s d = 0.73) and their surface level understanding of domains (Cohen’s d = 0.91). 2) The effect of ontological guidance is moderate to low on perceptions of understanding (Cohen’s d = 0.37). 3) Ontological guidance has weak or no effect on users’ (actual) surface level understanding of models (Cohen’s d = 0.1) and their ability to formulate queries (Cohen’s d = –0.11). The strong effect of ontological guidance on deep-level understanding — which according to Mayer (2003) requires integration of working memory with prior knowledge — was predicted by the cognitive theory of multimedia learning as well as the theory of semantic networks. According to the theory of multimedia learning, the ontologically expressive models lead to formation of higher quality models in working memory; the higher quality mental models will be better integrated with prior knowledge and hence would result in better deep-level understanding. The theory of semantic networks (Collins and Quillian 1969), together with the theory of spreading activations (Anderson and Pirolli 1984), also predicts that the more expressive model will lead to better elaboration and inferential reconstruction in the minds of users (Bodart et al. 2001).  36 Interesting results are observed regarding surface-level understanding. Measures of surface-level understanding of domains show that they benefit strongly from ontological guidance, whereas surface-level understanding of models is not affected. 
To explain this phenomenon we refer to the distinction between evaluations of the syntax, semantics, and pragmatics of conceptual modelling grammars proposed by Burton-Jones et al. (2009). Syntactic evaluation might “involve examining valid ways in which scripts can be created using a grammar or examining alternative ways that individuals form scripts using the grammar” (p. 497). Semantic evaluation examines “the meaning of the constructs in the grammar” (Burton-Jones et al. 2009, p. 497) and how the meaning “can be conveyed more clearly and completely” (Bera et al. 2014, p. 1). Pragmatic evaluation of the conceptual domain reflects the context; more specifically, the contextual conditions “in which models are more likely to be understood or preferred” (Bera et al. 2014, p. 1). Using this distinction, we contend that surface-level understanding of the model reflects mainly the syntactic evaluation of the constructs. This was done by Parsons (2011), who removed meaning from the constructs and instead used symbols such as alpha and beta for constructs and relationships. One could argue that syntactic model comprehension might have weaker ties in this case to the real world, thus weakening the effect of ontological expressiveness (or completeness and clarity) on the model’s representation of the real world. However, surface-level understanding of the domain reflects meaning as well as the context of usage, specifically the application domain. In this case, the impact of ontological guidance on users’ performance is strong because such guidance results in a clearer and more complete model. As for the moderate to weak influence of ontological guidance on perceived understanding, we should note that the measures of perceived understanding were collected under varied conditions. For example, Recker et al. (2011) conducted a survey to measure perceptions of professionals without asking them to perform any experimental tasks.
Burton-Jones and Meso (2006, 2008) measured perceptions of users after they had completed both problem solving (deep-level understanding) and cloze-test (surface-level understanding) tasks. Hence, the moderate to weak effect of ontological guidance might be due to the variations in the conditions under which the users’ perceptions were measured. If the perceptions from only the subjects who performed problem solving tasks (i.e., deep-level understanding) had been measured, the effect might have been stronger. Finally, the negative average effect of ontological guidance on query formulation can be attributed to the fact that queries are created using database models and not conceptual models. Conceptual models are by definition made for the purposes of understanding the domain and communication (Mylopoulos 1992). Hence, formulating queries is not one of the primary purposes of conceptual models, and similarly, understanding and communication may not be the primary purposes of database models. The effect of ontological guidance on database models, although negative in our meta-analysis, needs to be investigated in greater detail, as we found the credibility interval of this effect in our analysis to be wide (from –2.2 to +1.97).

2.5.1 Limitations, Strengths, and Weaknesses of the Study
As shown in Table 3, the only significant result (with a credibility interval that is entirely positive) was related to measures of deep-level understanding. The wide credibility intervals for other dimensions of understanding could be associated with the limited number of papers that represented those measures. For example, we only had four papers that had used query accuracy as their dependent variable, and the credibility interval for this measure was from –2.2 to 1.97. Acknowledging this limitation, we may have observed significant results at other levels of analysis had we had more papers in the meta-analysis pool.
In a systematic review such as our meta-analysis, a particular concern to address is publication bias, described as “the studies with statistically significant results are more likely to find their way into the published literature than studies that report results that are not statistically significant” (Borenstein et al. 2011, p. 278). Unpublished studies are metaphorically “archived in the file-drawer”. To address this “file-drawer threat”, we can calculate fail-safe N. This measure estimates the number of studies with non-significant effect sizes (that might be in the file drawer), and which, if incorporated into the meta-analysis, can reduce the overall effect size (Borenstein et al. 2011, p. 284) to a pre-determined criterion effect size. The literature suggests setting the criterion effect size at 0.1 (Orwin 1983). In other words, fail-safe N would be the number of studies with a zero effect size that, if incorporated into the meta-analysis, could reduce the overall effect size (i.e., $d_u$, the unbiased Cohen’s d) to the criterion effect size (e.g., $d_c = 0.1$). We calculated the fail-safe N with respect to the number of reported effect sizes included in the meta-analysis (K), using the equation below (Rosenthal and DiMatteo 2002):

$$N_{fs,1} = \frac{K\,(d_u - d_c)}{d_c}$$

For example, the first round has K = 68 and $d_u = 0.45$, so $N_{fs,1} = 68(0.45 - 0.1)/0.1 = 238$. Table 4 provides the fail-safe N corresponding to each iteration of our meta-analysis. Having a large fail-safe N for the overall meta-analysis (238) indicates the strength of the effect of ontological guidance. The fail-safe N for variables related to subjects’ performance using database models and actual surface-level comprehension of models are, respectively, 1 and 0. This could indicate the weak effect of ontological guidance on the aforementioned variables. Hence, one could deduce that even without including those studies in the metaphorical file-drawer, ontological guidance has minimal impact on these measures of performance.

Table 4. Fail-safe N for each iteration of the meta-analysis

Round | Focus of analysis (from the abstraction hierarchy of Figure 2) | Number of reported effect sizes (K) | Average Cohen’s d | Credibility interval (95%) | Fail-safe N
1 | Cognitive performance | 68 | 0.45 | –0.81 to 1.72 | 238
2 | Cognitive performance on CM | 60 | 0.55 | –0.51 to 1.61 | 270
2 | Cognitive performance on database models | 8 | –0.11 | –2.2 to 1.97 | 1
3 | Actual cognitive performance on CM | 40 | 0.65 | –0.58 to 1.88 | 220
3 | Perceived cognitive performance on CM | 20 | 0.37 | –0.58 to 1.32 | 54
4 | Actual performance on CM – surface level | 18 | 0.6 | –1.07 to 2.25 | 90
4 | Actual performance on CM – deep level | 22 | 0.73 | 0.03 to 1.34 | 139
5 | Actual performance on CM – surface level – domain | 11 | 0.91 | –1.06 to 2.88 | 89
5 | Actual performance on CM – surface level – model | 7 | 0.1 | –0.25 to 0.47 | 0

Another limitation, mentioned earlier, was that the studies in our pool were not identical (i.e., they used different independent and dependent variables and various modelling grammars). Thus, the heterogeneity in our meta-analysis due to this randomness leads to credibility intervals that include 0. This means that some of the results are not significant. Nevertheless, we believe that our grouping schema and findings regarding deep-level understanding are valuable contributions that could inform researchers and practitioners alike.

2.6 Conclusion and Future Research
We discussed the application of ontological theories as guidance in conceptual modelling in this work. Scholars in the information systems domain have debated whether ontological guidance should be employed, as the ontological expressiveness of models comes at the price of losing simplicity (i.e., the trade-off that was discussed earlier in Section 2.1). There are also various ontological theories that could be used as the foundation for such guidance. The Bunge–Wand–Weber ontology (Wand and Weber 1989) is the most widely used ontological theory in the information systems field (Allen and March 2006a, Fonseca 2007).
However, other theories have also been discussed in the literature, most notably by Allen and March (2006a, 2006b, 2012), who are the proponents of using Searle’s (2006) ontology. Comparing the two ontologies, we recognize that Bunge’s ontology does not have the constructs that could directly represent individual and collective intentionality (e.g., people’s acceptance of Utah as a state). We need to point out that the purpose of the current work is not to refute applications of Searle’s ontology (or any other ontologies) in information systems analysis and design. Concepts of Searle’s ontology are indeed important, and employing this theory is valuable in understanding social domains. However, in this work we focused on the Bunge–Wand–Weber ontology, the most widely used ontology in information systems. We performed a meta-analysis in order to synthesize past empirical work that evaluated the merits of guidance based on this ontology. Based on five rounds of meta-analysis in this study (Table 4), one could claim that Bunge’s ontological guidance overall has a moderate (i.e., neither strong nor weak) effect on improving subjects’ cognitive task performance (i.e., the highest level of abstraction in the meta-analysis, using all the reported measures in the pool of papers). In other words, we can conclude that ontological guidance improves users’ performance. This could indicate the advantages of including ontological guidance in training of analysts. More specifically, the effect of ontological guidance on deep-level understanding (i.e., integration of cognitive models in working memory with prior knowledge) is significantly strong (with Cohen’s d of 0.73 and a credibility interval of 0.03 to 1.34). This means that users accessing ontologically guided models on average outperform users who have access to non-guided models by 0.73 standard deviations in tasks that require deep-level understanding.
Because each study in our pool focused on a different aspect of ontological guidance, we could not consider them identical, and hence we chose the random-effect model for this meta-analysis. The random effects variable component between the studies leads to wide credibility intervals for the findings (the negative to positive range in Table 4). Thus, we cannot claim that ontological guidance has a uniform impact (either positive or negative). However, we noticed the strong effect of ontological guidance on deep-level understanding of conceptual models. For such tasks the credibility interval of this effect was positive 95% of the time.  42 As for future research, the results related to other dimensions of understanding (such as surface-level understanding of domains and perceptions of understanding) were not conclusive. In statistics terminology, the confidence intervals for the effect size of ontological guidance in those dimensions of understanding were wide enough to show that ontological guidance could have both negative and positive effects. Reasons for this phenomenon, and the causal and moderating factors that influence the impact of ontological guidance, could be the subjects of future research in order to find more conclusive results related to application of ontological guidance under certain circumstances. Moreover, studies similar to that of Hadar and Soffer (2006) could be conducted in order to evaluate the guidelines rooted in different ontological theories. Hadar and Soffer focused on professional software developers and studied the coverage of guidelines for UML drawn from BWW versus GOL. Their findings showed that BWW guidelines provide more comprehensive solutions for rectifying variations in modelling (compared with guidance based in GOL). Following that example, future research could compare guidelines based on different ontologies (e.g., BWW vs. 
Searle) for different modelling grammars (e.g., ER, BPMN); such a comparison could help researchers and practitioners choose the ontological theory that best suits their requirements.

As another avenue for future research, we could refer to Bowen et al.’s (2009) work. They used the size of the model as a moderating factor for the impact of ontological guidance on users’ understanding of conceptual models. We call for additional studies on this issue (i.e., size of the model) in order to better understand the trade-off between model expressiveness and simplicity.

Chapter 3: Role of Instances in Understanding and Querying Data

3.1 Synopsis

The evolution of information system (IS) applications in recent years has exposed deficiencies of traditional data management practices in accommodating rapid changes in data requirements and evolving information needs. This has motivated researchers and practitioners to embark on a quest for more flexible and effective data management methods. We propose that organizing data based on instances, rather than classes, can provide a more flexible and usable structure for understanding and retrieving data. This approach does not impose any pre-defined structure over the data at the outset, and it does not require central control and planning. This paper introduces a graphical formalization of the instance-based representation and explains its advantages. To demonstrate the potential advantages of the proposed approach, we conducted a laboratory experiment to evaluate the effectiveness of the instance-based approach compared with the traditional class-based (relational) approach. Users achieved higher accuracy in identifying the correct procedure for querying data when they used instance-based representations than when they used class-based representations. In addition, we conducted a protocol analysis study that demonstrates the effectiveness of the cognitive process of users in their interaction with instance-based representations.
Our results indicate that the instance-based paradigm is a more flexible and effective data management approach compared with the incumbent data management practices.

3.2 Introduction

Traditionally, information systems are developed within organizations based on a shared understanding of information requirements that form the basis for a well-defined conceptual model (schema). This schema is typically expressed in terms of classes (types) of entities of interest and relationships among these classes, describing the data relevant to one or more predetermined applications (Elmasri and Navathe 2011) and guiding subsequent database design. The database structure, in turn, supports routine transactions as well as ad hoc queries. In the latter case, the conceptual model provides the metadata necessary to understand and retrieve data effectively. As a database generally is used for many purposes, its schema reflects a consensus view of the phenomena of interest to users in the organization by integrating potentially diverse perspectives of different groups of users. Such an integrated schema may not correspond to the view of any single user or group in the organization, resulting in difficulties in understanding aspects of the global schema that do not match the local view of a user or group and in using such a representation (Parsons and Wand 2000, Parsons 2002). Furthermore, a single conceptual schema may inhibit alternative conceptualizations of the phenomena in the domain.

Effective information management increasingly relies on understanding data in contexts other than those for which the data were originally collected. For example, the “open data” movement emphasizes a desire to make data (especially government data) freely available for (re)use by anyone (Gurstein 2011). This repurposing of data is a marked departure from traditional ways in which information was collected and used and suggests the need to provide flexibility in retrieving data.
Positioned against the traditional view, Parsons and Wand (2000) proposed the “instance-based data model” as an alternative to the traditional class-based data management approach. Unlike class-based approaches, the instance-based paradigm does not require imposing a well-defined structure over the data, nor does it necessitate central control and planning. Thus, it appears well-suited to the requirements for use in understanding information that goes beyond a local view (Parsons and Wand 2014). Moreover, the instance-based approach gives users the ability to dynamically organize the data in classes useful for their purposes. Research on instance-based paradigms indicates that this approach supports information requirements agility (Parsons and Wand 2014), provides flexibility (Parsons and Wand 2000, Saghafi 2012), and can improve the quality of data collected outside organizational boundaries (Lukyanenko et al. 2011, 2014). This is particularly important as organizations increasingly turn to external sources of information for analysis and decision-making (Halevy et al. 2009), sources that may or may not have a well-specified schema.

This paper examines the potential of the instance-based approach to facilitate understanding of the semantics of information for the purpose of querying data sources. We begin by reviewing the instance-based data model. Next, we consider approaches for representing instance-based data. We describe the design of two experiments that compare an instance-based representation with a traditional class-based representation. We then present results of our first experiment, showing that the instance-based approach leads to better understanding of phenomena and, despite being unfamiliar to experimental participants, produces reported adoption intentions similar to those for the class-based approach.
Following this, we report the results of a follow-up study using protocol analysis to show how participants in the instance-based condition are able to approach the problem more effectively (and efficiently) compared with users in the class-based group. We conclude by discussing the implications of our findings and opportunities for future research.

3.3 Theoretical Foundations of the Instance-based Paradigm

The instance-based approach offers a natural way to understand data by separating instances and their properties from predetermined, fixed classification (Parsons and Wand 2000). The instance-based paradigm describes an instance as a symbol designating the perceived existence of “a material object, action, event, or any other phenomenon” (Parsons and Wand 2008b, p. 842). We discourse about instances in terms of their properties. A property “refers to any statement about the characteristics of an instance” (Parsons and Wand 2008b, p. 842). We recognize properties that belong to an instance alone (i.e., intrinsic properties), and properties that are shared between two or more instances (i.e., mutual properties). In this section we review the core concepts of the approach and illustrate how they support alternative ways of conceiving the phenomena in a domain.

3.3.1 Independence of Instances from Classes

An underlying premise of the instance-based paradigm is that “information modelling reflects humans’ view of existing or possible reality” (Parsons and Wand 2000, p. 236).9 From an ontological perspective, the world (which is represented in information models) is made of things that possess properties (Bunge 1977); in the context of this paper, we refer to two types of properties: intrinsic and mutual. Properties that are inherent to a thing are called intrinsic properties.
Mutual properties, however, are only meaningful in the context of two or more things (Wand and Weber 1993, p. 222). The existence of things and their properties in the world is independent of any classification constructed by humans. Thus, “to properly reflect the semantics of a domain, it is necessary to represent instances and their properties independent of any classification” (Parsons and Wand 2000, p. 239). Based on this ontological principle, the instance-based paradigm proposes separation of instances from classification by creating a two-layered architecture, with instances and their properties in one layer and class definitions in another.

9 This premise refers to the representation theory, which considers information systems as representations or models of real-world phenomena (Wand and Weber 1989, 2002).

The independence of instances from classes also provides many advantages from a data management point of view by reducing challenges that result from a fixed classification structure. These challenges include the integration of multiple sources of data (i.e., interoperability between two systems), support of multiple users, and accommodation of multiple applications (Parsons and Wand 2000). To illustrate the challenge related to integrating multiple sources, consider the database designs of two hypothetical hospitals; Figure 4 shows this scenario in a class-based (i.e., relational) model, and Figure 5 depicts the same situation using a possible instance-based representation scheme.10 When the instances and their properties are separate from classification, data instances from multiple information sources can be integrated at the instance-layer.
Classification structure of each source can be reconciled at the class-level, based on information semantics; details of schema reconciliation, however, are beyond the scope of this paper.

10 Names used in Figure 4, or in other examples throughout the chapter, are fictitious. Any similarity with real cases is purely coincidental.

Figure 4. Relational representation of Hospitals A and B

Figure 5. Instance-based representation of data in Hospital A and Hospital B

3.3.2 Classification: Users Create Classes (and Views) on Demand

Classification is a fundamental cognitive process used by humans to comprehend phenomena by grouping them based on similarity (Lakoff 1987). Classes, in addition to reflecting a repeating pattern of attributes, also indicate that relationships might exist between the attributes (Parsons and Wand 2008a). In other words, classes not only represent similarity of phenomena, but also “make it possible to infer additional information” (Parsons and Wand 2008a). By enabling inference, “classes serve a utilitarian role” (Parsons and Wand 2000, p. 238). From this perspective, classification structure is not inherent in the real world (i.e., things do not “belong” to classes); rather, classification is a human activity (Parsons and Wand 2008b), done for the purpose of providing useful abstractions (i.e., classification is utilitarian). Thus, human cognition’s role is not identifying the “correct” classification of things, “as there is no single correct way to classify a given set of instances”11 (Parsons and Wand 2000, pp. 238–9).

Therefore, rather than having a database designer construct the “correct” classification structure, the instance-based paradigm suggests that users should create their required classes to identify particular similarities among instances on demand. Delegating this task to users allows them to construct frameworks of concepts (i.e., schemas) based on their prior knowledge.
The user-constructed schemas would be aligned with their information needs and provide “context for interpreting experience and assimilating new knowledge” (Derry 1996). Moreover, from a cognitive economy point of view, when classes are pre-defined (by a database designer) some of the classes may not be congruent with the requirements of some users. This could lead to an increase of cognitive load for such users (as some of the defined classes in the structure may not be useful for a particular user or information need). Since human cognitive capacity is limited (Mayer 2003), allowing users to create classes on demand could also lead to better management of cognitive load.

11 This view is also reflected in Simsion et al.’s (2012) survey of expert data modelling practitioners; according to their study, practitioners also agree that “there is no infallibly correct” (Simsion et al. 2012, p. 152) design for data models (i.e., a classification scheme).

As a counterargument, one could posit that in some situations having a fixed schema (i.e., pre-defined classes) could reduce cognitive effort. While we acknowledge that possibility, resolving these contradicting arguments requires comparing the cognitive effort required to create classes on demand by users with the effort required for understanding a schema defined by a database designer. This comparison is addressed to some extent in our second experiment. In general, our emphasis is that giving users the ability to define their own classes will enhance assimilation of information (Derry 1996) and also reduce the cognitive load by avoiding the need to process classes that may not be relevant to a user’s particular needs (à la class-based methods). In short, this principle provides flexibility to the users by allowing them to construct their own schemas based on the knowledge they require for a particular application.
Moreover, it helps them manage their cognitive load by ensuring that only useful classes are created. As users define their own classes or views to reflect their individual requirements, they can apply these views as “lenses” to the pool of instances in order to access the information that is relevant to their needs (i.e., view the instance layer based on their individual classifications).

Moreover, in the instance-based paradigm, applications that are not even anticipated can be supported later by simply adjusting the views (or “lenses”) to incorporate information relevant to the new application. An example of an unanticipated application in the healthcare domain could be a contagion outbreak in a certain geographical location. Epidemic researchers (who may not have been the intended users of the data) may need to aggregate personal location data with patients’ medical records and hospitals’ patient-admission data to learn about the characteristics of the contagion. This scenario exemplifies emergence of new applications that require integration of data from multiple sources to be used by unanticipated users for a newly emerged application. This situation could be supported by logical data independence in class-based representations. Logical data independence is a facility afforded by database management systems to change the logical structure of the data (i.e., the classification scheme) without changing existing user views (Gray et al. 2005). Although previously defined user views may not require updating, supporting an emerging application would entail restructuring the data schema to reflect the new application (which in turn might require moving or duplicating instances between tables). This situation could impose a great challenge in a class-based setting (schema evaluation and database operation problems, as summarized by Parsons and Wand (2000)).
However, if the data are stored using the principles of the instance-based paradigm, new views can be defined over the data that incorporate observed symptoms of patients, hospital admission records, and personal location data; this lens could be applied to view the existing data without a need for data restructuring. In other words, by not binding data to a pre-defined structure, new applications can be run using existing data (as there was no predefined class structure). Users can define views reflecting their needs; the information they require can be accessed from the instance-layer based on the definition of their views.12

12 Note that this approach is consistent with both column-based (e.g., Abadi et al. 2009) and graph-based database approaches. Unlike those approaches, though, the instance-based paradigm emphasizes the cognitive rationale for separating instances from classes and provides the underlying cognitive and ontological semantics to understand what real-world phenomena are represented in a database.

3.4 Representation of Instance Data

To model information in a manner consistent with the instance-based principles introduced in Section 3.3, we suggest that a thing (an instance) in the domain be represented as a node. In addition, its intrinsic properties are grouped together and depicted on top of the node. If two things have a connection with each other, there should be one or more mutual properties describing the nature of the connection. To model the connection, the two things are linked to each other via an arc, and the mutual properties relevant to that connection are grouped together and depicted on top of the arc. Figure 6 illustrates this representation.

Figure 6. Two things in the domain that are connected to each other

This representation is free of concerns, such as primary keys, foreign keys, and cardinality constraints, which are important in understanding relational representations.
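The node-and-arc representation just described can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not the thesis's implementation: the class names `Thing` and `Link`, the helper `links_of`, and all sample values are hypothetical. Things carry only their intrinsic properties and no class label; arcs carry the mutual properties of the connection.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity semantics: two things stay distinct even with equal properties
class Thing:
    """A node: one instance with its intrinsic properties and no class label."""
    intrinsic: dict = field(default_factory=dict)


@dataclass(eq=False)
class Link:
    """An arc between two things; mutual properties are grouped on the arc."""
    source: Thing
    target: Thing
    mutual: dict = field(default_factory=dict)


# Hypothetical sample data, invented for illustration only.
physician = Thing({"Name": "Dr. A", "Specialty": "Cardiology"})
patient = Thing({"Name": "Patient B", "Date of Birth": "1980-02-14", "Allergy": "Penicillin"})
treats = Link(physician, patient, {"Treats since": "2015-09-01"})


def links_of(thing, links):
    """Return every link attached to a thing, at either end."""
    return [link for link in links if link.source is thing or link.target is thing]
```

Note that nothing in this sketch plays the role of a primary key, a foreign key, or a cardinality constraint; a connection is simply an arc that points at two things.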
The benefits of the instance-based approach apply to the representation in Figure 6 as well (e.g., supporting information requirements agility and flexibility in accommodating emerging applications).

We need to point out that the proposed representation is not free of structure in the way that free-form text is. Our proposed grammar does indeed have some structure (e.g., things as nodes, mutual properties as links). However, unlike the class-based representations, this structure is not fixed. For example, in class-based representations, two records in a table would have the same number of properties; some properties might have the null value, but the number of cells that are allocated to a record in a row is fixed. In the instance-based grammar, however, two instances that represent the same ontological “kind” do not need to have exactly the same number of properties (e.g., one might have 17 properties, while the other instance may have only five).

A visualized representation of a fictional travel agency, using our proposed instance-based modelling approach, is provided in Figure 7a. We show a list of links that could connect two things and a list of properties that a thing or a link may have. Figure 7b shows some actual data that could exist in a fictional travel agency. For the sake of comparison, Figure 8 illustrates the same domain (i.e., travel agency) using the traditional relational approach.

Figure 7. (a) Generic view of travel agency data and (b) sample data in an instance-based system

Users of the instance-based representation can use three reasoning mechanisms to make inferences regarding the data, provided that the user has some knowledge about the domain and the task that he/she is performing. First, a user can make inferences based on the properties of an instance; knowing some of the properties of an instance might be enough to infer what it is (based on the requirements of the task).
For example, knowing that an instance has feathers and wings might be sufficient to infer that it is a bird. Second, inference can be made based on the relationships (or links13) between two instances; based on the location of an instance with respect to a connecting link, whether at the origin of the link or at the end, users can make inferences regarding the instance. For example, if we have a directed link labeled “Employs”, which originates from a node with name property of “Sarah” and leads to a node with name property of “John”, one can infer that the instance that has the “Employs” link emanating from it (i.e., Sarah) is the employer, and the instance that the link leads to (i.e., John) is the employee. Finally, a user can make inferences based on the properties of an instance and the links connected to it. Building on the previous example, knowing that an instance has an outgoing link titled “Employs” might be enough to identify it as an employer. However, if one wants to distinguish between corporate employers and individual employers, then properties of the instance need to be observed. If the said instance has a property like date of birth, one can infer that it is an individual employer; similarly, a property like head office location might lead to the inference that the instance is a corporate employer.

13 As mentioned earlier, we model mutual properties using properties of directed links. Some mutual properties are modelled as unidirectional (e.g., instance-A produces instance-B), and some might be bidirectional (e.g., instance-A marries instance-B).

To illustrate how information can be retrieved from the instance-based representation, we provide the step-by-step procedure of performing a task related to the travel agency domain in Table 5. For comparison, the procedure required for performing the same task using the traditional approach (amounting to specifying several joins) is also provided.
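The three reasoning mechanisms described above (inference from properties, from the position of an instance on a directed link, and from both combined) can be sketched in a few lines. This is only an illustrative sketch: the dictionary layout, the function names, and the extra sample instances ("Acme Ltd.", "Polly") are assumptions introduced here, and the rules simply encode the bird and employer examples from the text.

```python
# Things are plain dicts of intrinsic properties, keyed by a hypothetical id.
things = {
    "t1": {"Name": "Sarah", "Date of Birth": "1975-06-01"},
    "t2": {"Name": "John"},
    "t3": {"Name": "Acme Ltd.", "Head Office Location": "Vancouver"},
    "t4": {"Name": "Polly", "Feathers": True, "Wings": True},
}
# Directed links as (label, source id, target id) triples.
links = [("Employs", "t1", "t2"), ("Employs", "t3", "t2")]


def looks_like_bird(thing_id):
    """Mechanism 1: infer what an instance is from its properties alone."""
    props = things[thing_id]
    return "Feathers" in props and "Wings" in props


def employment_roles(thing_id):
    """Mechanism 2: infer a role from the instance's position on a directed link."""
    roles = set()
    for label, source, target in links:
        if label != "Employs":
            continue
        if source == thing_id:
            roles.add("employer")
        if target == thing_id:
            roles.add("employee")
    return roles


def employer_kind(thing_id):
    """Mechanism 3: combine link position with properties."""
    if "employer" not in employment_roles(thing_id):
        return None
    props = things[thing_id]
    if "Date of Birth" in props:
        return "individual employer"
    if "Head Office Location" in props:
        return "corporate employer"
    return "employer (kind unknown)"
```

In this sketch no instance is ever assigned to a class in advance; each "class" is just a predicate that a user applies on demand, which is the sense in which the text speaks of views as lenses over the instance layer.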
Consider the task of generating a list of passengers with allergies who will travel with a particular airline. The performance of this task is based on the information models provided in Figures 7 and 8 (i.e., instance-based and traditional representations of the travel agency). For another example, see Appendix A.

Figure 8. Travel agency schema in traditional (class-based) representation

Table 5. Example of information retrieval using instance-based and traditional representations

Task: Describing the procedure for generating a list of all Air Canada passengers with allergies

Instance-based:
a. Look for the thing with the “Name” property in which the value is “Air Canada”.
b. Locate all the links titled “Travels With” that are attached to the thing found in step a.
c. At the other end of “Travels With” links, there is an instance (thing) that flies with Air Canada. Note down the value of the “Name” property on those instances that also have the “Allergy” property.

Traditional:
a. From the “Service Provider” table, look up “Name = Air Canada” and note the corresponding “ServiceProvider_ID”.
b. In the “Itinerary_Detail” table, look for the records with “ServiceProvider_ID” from step a and note their “Itinerary_No”s.
c. Look up the “Itinerary_No”s found in step b in the “Itinerary” table, and note the corresponding “Customer_ID”s.
d. If those “Customer_ID”s (from step c) are in the “Customers with Additional Considerations” table, then look up the value of the “Name” property, corresponding to the same “Customer_ID”s, in the “Customer” table.

To summarize, our proposed graph-oriented representation of instance-based data is one possible representation of instances and their connections. There are similarities between our proposed grammar and other approaches discussed in the literature, such as the Object-Role Modelling (ORM) language (Halpin 1998). ORM “pictures the world in terms of objects (entities or values) that play roles (parts in relationships)” (Halpin 1998, p. 82).
The objects are modelled as nodes, and the relationships are modelled as links. ORM has similarity to our proposed approach in the sense that it is a graph-oriented grammar that uses the constructs of nodes and links. However, our proposed representation, unlike ORM, does not assign classes or labels to the nodes. Moreover, we based the proposed representation on the underlying cognitive and ontological principles of the instance-based paradigm (discussed in Section 3.3).

3.5 Research Model

We believe this is the first study of the usage of instance-based representations for retrieving information (queries). We focus on “information consumers” (users who are not experts in database design) in an organizational setting and objectively evaluate their ability to use instance-based representations effectively. Moreover, as instance-based representation is not widely practiced in the IS domain (i.e., users with prior familiarity with information representation are likely to be familiar with traditional approaches based on the relational model), we are interested in participants’ subjective evaluation of the experience interacting with the instance-based representation. To collect this evaluation, we asked participants to report their intentions to use the system in the future (if given the chance). By conducting objective as well as subjective evaluation of the instance-based paradigm, we hoped to gain insight into whether it is a suitable (or even superior) alternative to traditional data management methods.

We assume that users have some familiarity with the domain (as a regular user in an organization would be). As for the context of use, we draw on a taxonomy of system usage types proposed by March (1991); in this taxonomy, uses of information are categorized into “exploitation” and “exploration” of information.
Exploitation is described as “routine execution of knowledge, whereas exploration refers to the search for novel and innovative ways of doing things” (Burton-Jones and Straub 2006, p. 236). This study sets the task context to exploitation of information rather than exploration, since exploitation’s usage outcomes are more immediate and predictable in the short run (March 1991), which makes it more appropriate for laboratory experiments (more on this in Section 3.6). Moreover, exploitation reflects the traditional task of querying databases in organizations. This is the setting for which traditional class-based models are best suited. As a consequence, it provides a conservative test of the value of the instance-based approach (i.e., stronger results would be expected for exploration tasks). Chapter 4 extends this work to the context of exploration tasks.

Consistent with the differences between the class-based and instance-based paradigms, two representations were designed. One was a manifestation of the traditional approach, in which the information is organized (i.e., structured) based on pre-defined classes. The second depicted things in the domain, along with their intrinsic properties and mutual properties (relationships) between things; this representation was free of predefined classification or structure (as can be seen in the example in Figure 7). We compare users’ ability to effectively use a system when they view the data using the instance-based representation versus a class-based representation as well as their subjective evaluation of usage intentions. Effective use is operationalized by performance (Burton-Jones and Grange 2012). To measure performance, we asked subjects to describe (or verbalize) the cognitive procedure to perform a data retrieval task via the representation to which they had access (instance-based or traditional).
We rely on a general verbal description of the procedure, as there is no accepted query language for the instance-based data model. This evaluation method (i.e., subjects describing the cognitive procedure) was proposed by Ford (2004) and was also used by Bera et al. (2011) to measure subjects’ ability to utilize knowledge management systems. An example of the step-by-step cognitive procedure of performing an information retrieval task was presented in Table 5.

Intention to use is mediated by perceived restrictiveness.14 The concept of “system restrictiveness” is defined by Silver (2006) as the extent to which a system allows or constrains user discretion to use the system. This study manifests a situation where users’ discretion to use a system is manipulated in the experiment, that is, whether they use the traditional representation and their discretion is restrained or have access to the instance-based representation and are allowed to have flexible mental schemas of the information according to their own knowledge and needs.

14 There are other antecedents to usage intentions, such as perceived ease of use and perceived usefulness (Davis 1986), which have been studied extensively (Xu et al. 2013, Tan et al. 2013, Wixom and Todd 2005). System restrictiveness was used in this study, as it is highly relevant to the claimed advantages (notably, flexibility) of the instance-based approach.

In short, there are two propositions tested in this chapter:

Proposition 1: Content consumers who use the instance-based system will be able to use the system more effectively than content consumers who use the class-based representation.

Proposition 2: Content consumers who use the instance-based system will perceive the instance-based representation as less restrictive and consequently have equal or higher adoption intentions than content consumers who use the class-based representation.
Users may be more familiar with the traditional paradigm; however, the instance-based representation is expected to be perceived as less restrictive by subjects, as users will be free to think and reason about the data using their own mental frameworks rather than using a fixed classification scheme created by a database designer.

As mentioned earlier, the first dependent variable of the study is effective use, defined by Burton-Jones and Grange (2012) as using a system “in a way that helps attain the goals for using the system” (p. 633). The goal in general terms is considered a “cognitive representation of a desired end point” (Fishbach and Ferguson 2007, p. 491). Specific identification of a system’s goal is considered to be complicated; however, attaining the goal can be operationally assessed in terms of performance (Burton-Jones and Grange 2012). This study will measure subjects’ performance by evaluating their ability to correctly formulate the necessary steps to acquire the required information.

The justification for Proposition 1 rests on the premise that the subjects using the instance-based representation will be able to construct their own mental frameworks of concepts or schemas. The user-defined data schemas are predicted to be more congruent with their prior knowledge and experiences. Based on schema theory (Derry 1996), when users define the structure of information based on their prior knowledge, they are able to assimilate the information more effectively. If the structure of information is pre-determined by database designers, the data structure may not necessarily be congruent with users’ mental schema of prior knowledge (Parsons 2002).

As an example of a simple exploitative usage task, consider supply chain management of a small business. In this scenario users need to keep track of the number of backordered units of different products.
Using a traditional representation may require going over at least two classes: an inventory class that represents the number of available units of a given product (i.e., quantity on hand) and an order-detail table that shows quantity ordered.15 Users of the instance-based representation, however, have the ability to construct their own mental models, and by doing so they can keep track of the two relevant attributes (i.e., quantity on hand and quantity ordered) within one query. This not only is more convenient but also may reduce the chances of error.

15 If a product was ordered through multiple suppliers, then the number of classes could be more than two.

The second proposition predicts that the instance-based representation will be perceived as less restrictive, and hence it will be an attractive option to users (i.e., lead to as high or higher usage intentions, despite lack of prior familiarity with the approach). More specifically, providing users with the ability to define their own schemas is likely to reduce their perceptions of system restrictiveness.16 Based on reactance theory (Brehm 1966), restrictions trigger psychological reactance in users and lead to behavioural avoidance. In the context of this study, we expect that the traditional representation will be perceived as more restrictive than the instance-based representation and, hence, influence subjects’ usage intentions negatively. However, this prediction may only be valid for content consumers who are not deeply familiar with a representation type (such as the subjects in our experiment). A more experienced user, who has been using the class-based representation for a while, may have become used to the restrictions (i.e., has passed the learning curve) and although finding the approach restrictive, continue using the representation due to familiarity with the class-based approach. A similar proposition (to Proposition 2) was studied in Antony et al.
(2005) as well, and their empirical findings refuted the hypothesis. They studied novice designers’ process of data modelling using knowledge-based systems: one with a guidance interface and one with a restrictive interface. They showed that the subjects found the restrictive interface easier to use (and hence they had higher adoption intentions), since the non-restrictive interface overwhelmed them with too much freedom in customizability.

Two factors could explain the competing predictions from Silver (2006) and Antony et al. (2005): (i) nature of the task and (ii) subjects’ prior knowledge and experience. Antony et al. (2005) had subjects perform a data-modelling task using CASE tools, which involves creation of information, whereas Silver (2006) as well as Wang and Benbasat (2009) studied users’ ability to gather information. The proposed study is more similar to the latter approach (as the task requires retrieval of information). Subjects’ prior knowledge could also influence their reaction to a restrictive system; as mentioned earlier, an experienced user may find mechanisms to overcome restrictions. In our experiment the subjects are novice system users, and their level of familiarity with the domain is moderate.

16 Tentatively, it is expected that content consumers’ perception of system restrictiveness would be independent of their level of domain knowledge. Perception of system restrictiveness (as operationalized in Appendix A) measures whether the users believe they had control over how they could view the information in the system. Traditional representation is expected to be perceived as more restrictive by the users, as the fixed class-based schema does not allow users to change the way they view the information.
3.6 Experiment 1

As this is the first empirical test of information consumers’ use of instance-based representations,17 internal validity is of critical importance; thus a laboratory experiment is preferred to a field experiment in this case (Calder et al. 1981). In addition, as the instance-based approach is novel, it is not clear how to design a field study in contexts where the traditional class-based designs have been implemented and used over long periods.

17 Although some recent research has examined intensional and extensional representations, it is reliant on the premise of having classified data, a significant point of departure from our study (Samuel 2011).

3.6.1 Design and Experimental Material

The experiment used a 2×2×2 mixed-subjects design. The first two factors were between-subjects, and the last factor was within-subjects. For the first factor, subjects were randomly assigned to either the class-based or instance-based representation. The second factor manipulated whether subjects received only a general structure (schema) of the data (in the class-based condition, this was a UML class diagram, while the instance-based condition consisted of a list of properties that a thing could have and the types of links that connect different things) or the general schema as well as actual data. The allocation of subjects to the different conditions is shown in Table 6. The experimental material is provided in Appendix A and the answer key in Appendix B.

The second factor (i.e., schema vs. schema and data) in our experimental design was included to provide an ecologically valid experimental task. A schema without data is the “natural” form of representing and retrieving class-based information; thus, it was important to have this representation of the data in our study.
On the other hand, for an instance-based representation, it is the actual data (instances) that are integral for reasoning (based on the three mechanisms discussed in Section 3.3); thus, it was necessary to provide both schema and actual data to the users. While we could have just used these two representational forms, we believe the resulting comparison between class-based schema (in the form of tables) and instance-based data (in the form of graphs) would have confounded any findings, as it would be impossible to tease out whether results were due to representing classes versus instances or to representing structure only versus structure and data. In short, to ensure a meaningful comparison between the key conditions, we needed to have correspondence between the “natural” form of each representation (data for instance-based and schema for class-based), and a structurally equivalent representation in the other paradigm.

The within-factor was the subjects’ familiarity level with a given domain for performing the task. We also wanted to ensure that the experimental outcome was not an artifact of the task domain. Hence, subjects in each group performed tasks for two separate domains (a travel agency and a consulting firm). Using two domains in the experiment follows other empirical studies focusing on users’ understanding of conceptual models for a similar reason (i.e., experiment not being context-sensitive) as well as improving external validity of the study (Bodart et al. 2001, Gemino and Wand 2005, Bera et al. 2011). In our experimental design, we expected the consulting case to be less familiar to our subjects than the travel agency case. Our measures of prior knowledge in the pilot as well as in the main experiment (Table 7) also confirmed the statistically significant difference in subjects’ familiarity with these domains (both at p-values < 0.0001).
This factor (i.e., domain familiarity) is added for exploratory reasons, rather than confirmatory; hence, we do not make any predictions about the effect of domain familiarity.18

18 For an in-depth analysis of the impact of domain familiarity on users’ understanding of conceptual models, refer to Bera et al. (2014).

Experimental tasks were exemplars for the type of tasks a typical business might perform. We tried to make the task generalizable to realistic cases. Many of the questions were formulated in terms of classes (e.g., find “Clients” or “Agents”), which might favour the control group (i.e., class-based). However, we tried to make the conditions informationally equivalent. Particularly, for the condition in which subjects received both schema and actual data, we provided the same amount of information (i.e., individual instances and their relevant properties) in both instance-based and class-based groups (see Appendix A). We did not have a specific criterion to decide how many instances should be presented. However, we believe that this issue does not limit our findings, as we also studied a condition in which the subjects had access only to the schema of the respective representation (without any instances from actual data).

We presented the experimental material using a non-interactive visual representation of the data, similar to Bera et al. (2011) and Gupta and Jain (1997).
Although a working implementation of the instance-based representation (discussed in Section 3.3) had been developed using a graph database management system (Saghafi 2012), we did not use it in the experiment. Confounding factors such as responsiveness of the interface, placement of buttons, colour of the background, and other similar concerns would have been a hindrance, because the experiment was intended to evaluate subjects’ ability to meet their information (retrieval) needs when interacting with instance-based representations (compared with traditional ones) rather than their ability to learn how to use a particular implementation. Likewise, because there is no accepted query language for the instance-based approach (unlike SQL for the class-based approach), we asked our subjects in both conditions to provide syntax-neutral statements regarding how they would utilize the information provided to them, following the second step19 of query formulation as defined by Ogden (1986).

19 Ogden (1986) described three steps for query formulation. These steps, as discussed in Allen and Parsons (2010), are “query formulation (stating the information need in natural language), query translation (stating the query in terms of the data schema independent of the query language syntax), and query writing (producing a syntactically valid, executable statement)” (Allen and Parsons 2010, p. 58).

3.6.2 Participants

We recruited subjects from an undergraduate-level course titled “Information Systems Technology and Development” at a large North American university. This course required students to spend two hours in lecture and one hour in the laboratory per week. They learned database concepts (e.g., ER diagrams and creating tables, forms, and queries). The laboratory was taught using Microsoft Access®, and students were required to deliver assignments based on Access®. When we conducted the
experiment, students had already spent 5 to 6 weeks in the laboratory and submitted two phases of the Access® assignment; they had already learned about data modelling (entity-relationship grammar, as well as creating tables and relationships in MS Access®), query principles (only in lecture), designing forms, and generating reports. Volunteers received a 2% bonus on their final mark in the course.

The pilot was done with 14 subjects, testing only one of the factors (i.e., both class-based and instance-based users had access to schema and data). Eight subjects were in the class-based group and six were in the instance-based group. For the travel agency case, subjects in the treatment group achieved a statistically significantly higher average (two-tailed p-value = 0.03). However, for the consulting case, the difference between subjects’ performance in control and treatment groups was not statistically significant (two-tailed p-value = 0.35). After analyzing participants’ answers, we reworded some of the questions.

The main experiment was conducted with 130 subjects. The allocation of subjects to different experimental groups is described in Table 6.

Table 6. Experimental design and subject allocation

Condition                            Schema    Schema and data
Control (class-based), n = 63        n = 32    n = 31
Treatment (instance-based), n = 67   n = 36    n = 31

3.6.3 Procedure

The experiment began with a 20-minute training session. We demonstrated the process of retrieving information and how to verbalize it using the representation given to the group (i.e., traditional representation for the control group, and instance-based for the treatment group; examples can be seen in Figures 7 and 8). The training material was based on a fictional movie rental store (i.e., different from the domains in the experiment). Participants were encouraged to ask questions if there was any confusion.
After the training, the participants were asked to fill in a short questionnaire (available in Appendix A as the pre-experiment questionnaire) to measure their prior domain knowledge as well as their familiarity with database systems based on a 7-point Likert scale. Table 7 summarizes these data; note that no statistically significant difference was observed among the measures of prior knowledge between the class-based and instance-based groups.

Table 7. Measures of prior knowledge

Group            Written queries before (%)   Database knowledge   Travel knowledge   Consulting knowledge
Class-based      37%                          2.86 / 7             4.49 / 7           2.98 / 7
Instance-based   44%                          2.95 / 7             4.19 / 7           2.84 / 7

The order of domain assignments was random. Half the subjects in each condition received the travel agency case first, and the other half received the consulting firm first. For each domain there were four questions that varied in difficulty (e.g., Question 3 in the travel agency case required joining two tables, whereas Question 4 required joining five tables; see Appendix A). The order of difficulty was also randomized to control for learning effect.

At the end of the experiment participants received a post-experiment questionnaire (available in Appendix A) to answer five short questions regarding their perceived restrictiveness of the representation and their intention to use the presented representation in the future when given the chance.

3.6.4 Data Analysis and Results

Each subject answered four questions related to the travel agency domain and four questions related to the consulting domain. The answers described the steps required to solve a problem (such as the example in Table 5). Marking was done on a 5-point scale. If the answer was completely irrelevant, a score of 0 was awarded. If less than half the steps were correct, a score of 0.25 was awarded. If half or a majority of the steps were correct, 0.5 and 0.75 were assigned, respectively.
If all the steps were correctly identified, the subject received 1.00 for that question (examples of marked responses from actual subjects are available in Appendix C). In other words, for each domain, a subject received a score out of 4.00, based on four questions.

Subjects’ responses were evaluated by two coders. To measure inter-rater reliability,20 we calculated the intra-class correlation (ICC) between the ratings of the two coders. The ICCs for the travel agency and consulting cases were 83% and 87%, respectively, indicating a high level of consistency between the two coders.

20 Following Hallgren’s (2012) guidelines regarding inter-rater reliability measures, Cohen’s kappa is more appropriate for nominal variables, while intra-class correlation (ICC) is better suited for ordinal and interval variables. Since our performance scores (discussed later) are calculated in 0.25 increments from 0 to 4, we consider them interval variables; hence our selection of ICC to measure inter-rater reliability.

Perceived restrictiveness and intention to adopt the system were measured by using a post-experiment questionnaire. The items21 (available in Appendix A) were measured using a 7-point Likert scale. The time to complete the experiment (performing the tasks and completing the post-experiment questionnaire) was also measured. Table 8 provides descriptive statistics relevant to each measured variable.

Based on Propositions 1 and 2 above, we tested the following specific hypotheses:

H1: Performance will be higher in the instance-based condition than in the class-based condition.
H2a: Reported adoption intention will be higher in the instance-based condition than in the class-based condition.
H2b: Perceived restrictiveness will be a mediator in the relationship between representation type and adoption intentions.
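The five-level marking scheme used to score each answer can be illustrated with a short sketch. This is only a minimal illustration of the rubric as worded in the text, not the coders' actual procedure: the function names `score_answer` and `domain_score`, the step counts, and the `relevant` flag are hypothetical, and the sketch assumes that a relevant answer with no correct steps falls under "less than half".

```python
def score_answer(correct_steps: int, total_steps: int, relevant: bool = True) -> float:
    """Map a coded answer to one of the five marks (0, 0.25, 0.5, 0.75, 1.0).

    Hypothetical helper: the thesis describes the rubric in prose only.
    """
    if total_steps <= 0 or not 0 <= correct_steps <= total_steps:
        raise ValueError("need 0 <= correct_steps <= total_steps and total_steps > 0")
    if not relevant:
        return 0.0                      # completely irrelevant answer
    if correct_steps == total_steps:
        return 1.0                      # all steps correctly identified
    if 2 * correct_steps > total_steps:
        return 0.75                     # a majority of the steps correct
    if 2 * correct_steps == total_steps:
        return 0.5                      # exactly half of the steps correct
    return 0.25                         # less than half (assumption: includes zero)


def domain_score(answers) -> float:
    """Sum the marks of the four questions of one domain (score out of 4.00)."""
    return sum(score_answer(*answer) for answer in answers)


# Hypothetical subject: (correct steps, total steps) for four questions.
example = [(4, 4), (2, 4), (1, 4), (3, 4)]
print(domain_score(example))
```

Because each question contributes a mark in 0.25 increments, the domain score also moves in 0.25 increments between 0 and 4, which is the interval-style variable on which the ICC was computed.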
To test the hypotheses, we conducted repeated measures ANOVA.22 The corresponding tables of between-subjects and within-subjects effects are provided in Appendix D. In the following, we present our analysis of the data for each variable (i.e., performance, restrictiveness, adoption intention, and time). The descriptive statistics are provided in Table 8.

21 We also validated the reliability of the scales. Cronbach’s alpha for perceived restrictiveness and adoption intention items were 60% and 89%, respectively.

22 Since the assumption of sphericity was rejected in our repeated measures ANOVA model, we used the Greenhouse-Geisser correction to interpret the data. To test the assumption of homogeneity of variances we performed Levene’s test, which was not statistically significant for either the travel agency (p = 0.538) or the consulting case (p = 0.442). To test for normality, we drew Q-Q plots, which are available in Appendix D.

In summary, our first hypothesis was supported, since subjects in the treatment group (i.e., instance-based representation) demonstrated higher effective use (performance) than subjects in the control group (i.e., class-based representation); the two-tailed p-value was less than 0.0001.

Table 8. Descriptive statistics

Condition                            Travel agency    Consulting       Restrictiveness   Usage intention   Time (min)
                                     mean (SD) / 4    mean (SD) / 4    mean (SD) / 7     mean (SD) / 7     mean (SD)
Control (class-based), n = 63        2.82 (0.82)      1.96 (0.71)      4.23 (0.91)       4.69 (0.94)       33.65 (9.70)
  Schema, n = 32                     2.85 (0.79)      1.92 (0.76)      4.52 (0.69)       4.53 (1.03)       27.78 (6.37)
  Schema & data, n = 31              2.78 (0.86)      2.00 (0.67)      3.94 (1.02)       4.84 (0.94)       39.71 (8.80)
Treatment (instance-based), n = 67   3.42 (0.52)      2.95 (0.70)      3.98 (1.17)       4.55 (1.27)       38.13 (8.30)
  Schema, n = 36                     3.43 (0.54)      2.94 (0.65)      4.04 (1.28)       4.48 (1.32)       35.81 (7.42)
  Schema & data, n = 31              3.39 (0.49)      2.96 (0.78)      3.92 (1.04)       4.62 (1.23)       40.84 (8.56)

Scores for travel agency and consulting are out of 4; restrictiveness and usage intentions were measured on a 7-point Likert scale; time was measured in minutes.

Our second between-factor, which compared the performance of subjects using only the schema with subjects using schema and the data, did not show a significant difference (p-value = 0.948, not significant). Note that we did not hypothesize about such a difference, as we included the “schema and data” option in the class-based condition and the “schema only” condition in the instance-based condition to avoid potential confounds in comparing the “natural” version in each condition. A possible explanation for not finding a statistically significant difference in this factor of the study (i.e., schema vs. schema and data) can be described by examining the advantages of each level. On the one hand, we thought participants who had access to schema and actual data would have a more concrete mental model of the domain. Based on the theory of semantic networks (Collins and Quillian 1969) via spreading activation (Anderson and Pirolli 1984), having a more elaborate representation of a domain would lead to better inference by the users (as was similarly theorized in Bodart et al. (2001)).
Thus, as a consequence, greater cognitive fit (Vessey 1991) between the presented models and subjects’ mental models might be achieved. On the other hand, there is an extra cognitive load in trying to integrate multiple models (Kim et al. 2000). Based on the results of our experiment, a possible explanation is that the cognitive load of integrating multiple diagrams cancelled out the advantages of having a more elaborate model of the domain. This is just a tentative discussion, as we did not have the data to verify this claim in the experiment.

However, the within-subjects effect of domain familiarity as well as the interaction effect between representation type (i.e., class-based vs. instance-based) and prior domain familiarity were significant (p-value < 0.0001 for familiarity, and p-value = 0.004 for the interaction effect). This means that there was a larger difference in performance of instance-based users vs. class-based users when they were working with the less familiar domain (i.e., consulting firm), compared with the difference in performance on the more familiar domain (i.e., travel agency). However, the overall performance was better in the travel agency domain (the more familiar one) by subjects in both groups; Figure 9 illustrates the results. As we mentioned earlier, investigating the effect of familiarity as a within-factor was an exploratory study (rather than confirmatory). We leave the explanation of this phenomenon for future research.

Figure 9. Impact of domain familiarity on performance

Table 9 summarizes the findings discussed previously.

Table 9. Between and within factors in the experiment

Factor                                                                                  F-value (df = 1)   P-value
Instance-based vs. Class-based (between-factor)                                         57.300             < 0.0001
Schema vs. Schema & data (between-factor)                                               0.004              0.948  N.S.
Familiarity (within-factor)                                                             14.969             < 0.0001
Familiarity * Instance-based vs. Class-based (interaction)                              8.764              0.004
Familiarity * Schema vs. Schema & data (interaction)                                    0.818              0.368  N.S.
Instance-based vs. Class-based * Schema vs. Schema & data (interaction)                 0.011              0.915  N.S.
Familiarity * Instance-based vs. Class-based * Schema vs. Schema & data (interaction)   0.009              0.924  N.S.

df: degrees of freedom, N.S.: Not Significant

We also analyzed the effect of prior database knowledge on performance. We used the prior database knowledge measures reported by subjects (Table 7) as a covariate in our model. The results showed that prior database knowledge did not have a significant impact on users’ performance of the experimental tasks.

In addition, we recorded the order of case assignments to subjects (i.e., whether they received the travel agency case or the consulting case first). The effect of this covariate was not significant either. Based on this, we can assume that we successfully controlled for learning effect by randomizing the order of case assignment. Table 10 provides the F-values and significance levels of the factors discussed above.

Table 10. Covariates in the experiment

Covariate                   F-value (df = 1)   P-value
Prior database knowledge    0.621              0.432  N.S.
Order of case assignment    0.111              0.740  N.S.

df: degrees of freedom, N.S.: Not Significant

We also studied the effect of prior domain knowledge (both travel agency and consulting) on users’ performance. For this moderation analysis, we used the performance scores of travel agency and consulting firm separately (instead of using prior domain knowledge as a covariate in repeated measures ANOVA, as it would have taken the effect on overall performance on both cases rather than on individual domains). More specifically, we examined the interaction effect between prior travel agency knowledge and experimental condition on performance in travel agency and in a separate analysis examined the interaction effect between prior consulting knowledge and experimental condition on performance in consulting.
The moderating effect of prior travel agency knowledge on performance was not significant (p-value of 0.77); however, prior consulting knowledge had a significant effect on performance (p-value of 0.03). This means that higher levels of prior consulting knowledge strengthened the effect of instance-based representation. We leave further investigation on the cause of this observation for future research.

3.6.4.1 Perceived Restrictiveness and Intentions to Use

Our secondary hypotheses (H2a and H2b) predicted that users of the instance-based representation would report equal or higher usage intentions compared with class-based users. They also predicted that perceived restrictiveness would have a mediating role; more specifically, perceived restrictiveness would mediate the relationship between representation type (i.e., class-based vs. instance-based) and users’ adoption intentions.

As mentioned in the experimental procedure section, we used a questionnaire to measure subjects’ perceived restrictiveness and adoption intentions after they had answered questions related to both domains. In other words, unlike our task, we do not consider perceived restrictiveness and adoption intentions to be repeated measures; thus, we used simple regression for this analysis.

The results show that the direct relationship between representation type (i.e., class-based vs. instance-based) and intentions to adopt the representation was not significant (standardized beta coefficient = -0.063, p-value = 0.478, not significant). Then, we proceeded to investigate the indirect path for mediation analysis (Baron and Kenny 1986), that is, the effect of representation type on adoption intentions through perceived restrictiveness. The first step of the indirect path (from representation type to perceived restrictiveness) was not significant either (standardized beta coefficient = -0.117, p-value = 0.186, not significant).
To further analyze the mediation effect, Baron and Kenny (1986) suggested continuing the analysis by regressing the dependent variable on the independent variable and the mediator and comparing the coefficients from the direct path and indirect path. Since the first step in analyzing the indirect path was not significant in our case, we did not proceed with the subsequent steps. We conclude that our data does not corroborate the presumed mediation effect of perceived restrictiveness on adoption intentions of users of class-based vs. instance-based representations. Based on this analysis, hypotheses H2a and H2b are not supported in our experiment. Table 11 provides a summary of this analysis.

Table 11. Regression analysis related to usage intentions

Path                                                  Standardized beta   P-value
Representation type and Adoption intentions           -0.063              0.478  N.S.
Representation type and Perceived restrictiveness     -0.117              0.186  N.S.

N.S.: Not Significant

3.6.4.2 Task Completion Time

As mentioned earlier, the task completion time measured in this experiment included the time taken to perform the tasks related to both domains of travel agency and consulting firm and to fill out the post-experiment questionnaire; thus, we used a simple t-test to compare the task completion time between the two groups (rather than repeated measures ANOVA).

As shown in Table 12, our results show that subjects using the class-based representation completed the task in less time than subjects using the instance-based approach. We need to point out that we do not consider task completion time as a measure of cognition (which was the reason for excluding this variable from the meta-analysis in Chapter 2). However, some empirical evaluations of conceptual models in the past (e.g., Allen and March 2006b, Bodart et al. 2001, Shanks et al. 2008, Khatri et al. 2006) have considered task performance time as a measure of efficiency.
We do not make the same claim (i.e., one representation being more efficient than the other), since our subjects had already been trained in the class-based approach for 5 to 6 weeks prior to the experiment and their familiarity with that approach was of course higher.

In comparing the second between-factor of our study (schema vs. schema and data), participants in the class-based group who had access to only the schema performed their task more quickly than participants in the instance-based group (with the same condition of only using schema). This could be a consequence of familiarity; subjects had been trained in class-based data management methods for 5 to 6 weeks prior to their participation in the experiment, compared with only 20 minutes of training in the instance-based paradigm. However, when we gave both groups schema and some actual data, the completion times of the class-based and instance-based groups were not significantly different (as shown in Table 12).

Table 12. Task completion time

                     Schema                         Schema and data                Two-tailed P-value
Class-based          M = 27.78, SD = 6.37, n = 32   M = 39.71, SD = 8.80, n = 31   < 0.0001  Significant
Instance-based       M = 35.81, SD = 7.42, n = 36   M = 40.84, SD = 8.56, n = 31   0.012  Significant
Two-tailed P-value   < 0.0001  Significant          0.611  N.S.

M: Mean, SD: Standard Deviation, n: Sample Size, N.S.: Not Significant

In general, subjects in both class-based and instance-based groups using only the schema completed their tasks in a significantly shorter time than corresponding subjects who used both schema and the data.

3.6.4.3 Post hoc Analysis

We discussed the overall performance of subjects in each domain earlier. Here, we will break down the performance of subjects for each individual question in each domain. In the travel agency domain, one can observe that overall instance-based representation enabled higher effective use by subjects. However, the difference is not significant for the first question (see Table 13).
For this analysis, we did not consider the second between-factor of the study (i.e., schema vs. schema and data) as the results were similar and thus did not provide new insights.

Table 13. Breakdown of travel agency domain questions

Travel agency question   Control, n = 63, M (SD)   Treatment, n = 67, M (SD)   Two-tailed P-value
1                        0.87 (0.26)               0.91 (0.17)                 0.3  N.S.
2                        0.47 (0.32)               0.69 (0.30)                 < 0.0001
3                        0.76 (0.33)               0.90 (0.20)                 0.004
4                        0.72 (0.32)               0.92 (0.17)                 < 0.0001

M: Mean, SD: Standard Deviation, n: Sample Size, N.S.: Not Significant

Similarly, we broke down the questions related to the consulting firm. Table 14 shows that for the first question the difference between class-based and instance-based conditions is not significant, even though the subjects in the treatment group had a higher average. The second question had a low average in the class-based group (0.13 out of 1). We intentionally asked a question that was complicated to answer in the traditional representation because of cardinality constraints. More specifically, answering that question required observing a many-to-many relationship; however, the class diagram depicted a one-to-many relationship (for details see Appendix A). Out of 63 subjects in the class-based group, eight identified that answering this question was not possible owing to cardinality constraints or found the answer in an adjacent table. We gave full marks to those subjects. The rest of the subjects in the class-based group received 0 for not realizing this limitation. For questions 3 and 4 we also observed that the users of instance-based representation performed better than the users of the class-based representation.

Table 14. Breakdown of the consulting domain questions

Consulting question   Control, n = 63, M (SD)   Treatment, n = 67, M (SD)   Two-tailed P-value
1                     0.72 (0.27)               0.78 (0.21)                 0.16  N.S.
2                     0.13 (0.34)               0.66 (0.32)                 < 0.0001
3                     0.48 (0.23)               0.71 (0.27)                 < 0.0001
4                     0.63 (0.33)               0.80 (0.32)                 0.003

M: Mean, SD: Standard Deviation, n: Sample Size, N.S.: Not Significant

After the completion of the experiment we asked the subjects (in class-based as well as instance-based groups) who received both schema and data to report which resource was most useful to them in performing the tasks. Table 15 provides the statistics regarding subjects’ use of these resources. Subjects in both control and treatment groups found actual data the most useful resource (even though the use of data did not improve their performance compared with the groups that received only the schema, as can be seen in Tables 9 and 11).

Table 15. Subjects’ usage of available resources in the experiment

Condition                            Actual data    General schema   Both
Control (class-based), n = 31        45% (14/31)    39% (12/31)      16% (5/31)
Treatment (instance-based), n = 31   71% (22/31)    16% (5/31)       13% (4/31)

3.7 Experiment 2

To gain a deeper understanding of the advantages an instance-based representation provides with respect to effective use (compared with a class-based representation), we conducted a second study using protocol analysis. We expect that first-time users (whether instance-based or class-based) will require some effort to understand a domain representation. Based on cognitive schema theory (Derry 1996), we expect that users of instance-based representation are able to assimilate information more effectively using their own mental schema (based on their prior knowledge). However, users of the class-based representations may incur additional cognitive load (on top of the initial learning effort) to adapt to a structure defined by a database designer. Our protocol analysis investigates the effectiveness of the cognitive process of users in solving problems using both representations.
3.7.1 Design, Participants, Experimental Material, and Procedure

Following Burton-Jones and Meso (2006) and Bera et al. (2011), we designed a process tracing study with a small sample of 12 subjects; six were randomly assigned to the control group (receiving a class-based representation) and six were in the treatment group (receiving an instance-based representation). We recruited subjects who had a profile similar to the participants in the first experiment. We therefore advertised to students of an undergraduate-level course titled “Information Systems Technology and Development” 5 weeks into the semester (i.e., the same condition as in experiment 1).23 Using a pre-experiment questionnaire (available in Appendix A), we measured subjects’ prior database knowledge and domain knowledge. The measures of prior knowledge in Table 16 are comparable with prior knowledge of subjects in the first experiment (Table 7).

Table 16. Measures of prior knowledge

Group                    Written queries before (%)   Database knowledge   Travel knowledge   Consulting knowledge
Class-based (n = 6)      66%                          2.66 / 7             4.33 / 7           3.16 / 7
Instance-based (n = 6)   50%                          2.83 / 7             4.66 / 7           3.66 / 7

We used the same material and procedure as in experiment 1 (Section 3.6). First, subjects were trained for 20 minutes in the instance-based or class-based method, based on the group to which they were assigned. Then participants were provided with descriptions, general data schema, and actual data from the travel agency and consulting cases (order of case assignment was random). As there were no differences between schema-only and schema-plus-data conditions in experiment 1, we restricted this study to the second condition. We asked the subjects to verbalize their mental process (i.e., think out loud) while performing the experimental task. Audio recordings were taken from each subject and analyzed by two coders; Table 17 shows the inter-coder reliability statistics.
23 The participants each received $20 for their time. The top two performers were awarded an additional $20 gift card.

3.7.2 Data Analysis and Results

To investigate the effectiveness of users’ cognitive process in solving problems using instance-based or class-based representations, we measured three dependent variables: “performance”, “breakdown”, and “recovery”. Performance was measured based on the same 4-point scale used in experiment 1 (examples available in Appendix A). Answers to each of the four questions of the case (whether travel agency or consulting) received a score between 0 and 1 based on the correctness of the answer. Thus, subjects received a mark out of 4 for each case. The breakdown variable, first defined by Newell and Simon (1972) and also used by Burton-Jones and Meso (2006) and Bera et al. (2011), is defined as a failure in the line of thought of an individual when they are searching their problem space. Recovery (from a breakdown) is when a subject returns to an earlier step in their problem-solving process to continue solving the problem. Based on our proposition justification (Section 3.4), we expect that users of the instance-based representation will have fewer breakdowns, since they are able to assimilate information based on their own mental schemas rather than a schema defined by a database designer (as in class-based methods). We also expect that users of the instance-based representation will be more successful in recovering from a breakdown owing to the flexibility that this approach provides in viewing and organizing information according to one’s mental model (Derry 1996). To demonstrate our application of breakdown and recovery concepts, one can think of three hypothetical scenarios. (1) A subject starts answering a question and follows the steps in a metaphorical flow chart (or a business process) but gets to a dead-end.
If he/she abandons that line of thought (i.e., the metaphorical flow chart), we consider that a breakdown with no recovery. The subject may start approaching the problem from a different angle (i.e., new line of thought) or give up on answering that question. (2) A subject gets to a dead-end in his/her solution but goes a few steps back and continues on the same line of thought (i.e., metaphorical flow chart). This is considered a breakdown with a recovery. (3) Without any breaks or failures, a subject goes from the start to end of the solution flow chart. While the answer may not be correct, we consider this a solution with no breakdowns (and hence no recoveries).

As mentioned earlier, two coders were involved in evaluating the protocol analysis results. They calibrated their rating scheme using the data from a pilot (of the protocol analysis) with two subjects. The intra-class correlations between the two coders are shown in Table 17. These numbers indicate high levels of agreement between the two coders on the measured variables of the experiment.

Table 17. Agreement between coders
Domain | Variable | Intra-class correlation
Travel agency | Performance | 82%
Travel agency | Breakdowns | 98%
Travel agency | Recoveries | 88%
Consulting | Performance | 85%
Consulting | Breakdowns | 97%
Consulting | Recoveries | 94%

Table 18 summarizes the results from our protocol analysis for the travel agency and the consulting domains.

Table 18. Results of the protocol analysis
Domain | Condition | Performance (SD) | Breakdowns | Recovery | Recovery percentage
Travel agency | Class-based (n = 6) | 2.88 (1.13) | 5.83 | 2.17 | 37%
Travel agency | Instance-based (n = 6) | 3.50 (0.69) | 4.17 | 2.33 | 56%
Consulting | Class-based (n = 6) | 2.13 (0.83) | 7.00 | 2.67 | 38%
Consulting | Instance-based (n = 6) | 3.63 (0.47) | 7.00 | 3.50 | 50%
Performance is out of 4, SD: Standard Deviation, n: Sample Size, Recovery percentage = Recovery/Breakdowns.
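The recovery percentage column of Table 18 can be re-derived from the other two columns. The following is a minimal sketch, assuming the reported breakdown and recovery values are group means and that percentages were rounded to the nearest whole number; the function and variable names are illustrative and do not come from the study's materials.

```python
# Mean breakdowns and recoveries per (domain, condition), copied from Table 18.
table_18 = {
    ("Travel agency", "Class-based"): (5.83, 2.17),
    ("Travel agency", "Instance-based"): (4.17, 2.33),
    ("Consulting", "Class-based"): (7.00, 2.67),
    ("Consulting", "Instance-based"): (7.00, 3.50),
}


def recovery_percentage(breakdowns: float, recoveries: float) -> int:
    """Recovery percentage = Recovery / Breakdowns, as a whole-number percent."""
    if breakdowns <= 0:
        raise ValueError("breakdowns must be positive")
    return round(100 * recoveries / breakdowns)


for (domain, condition), (breakdowns, recoveries) in table_18.items():
    pct = recovery_percentage(breakdowns, recoveries)
    print(f"{domain:14s} {condition:15s} {pct}%")
```

Under these assumptions the computed values match the percentages reported in the table.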
The protocol analysis results are consistent with our proposition that users of an instance-based representation are able to assimilate the information more effectively and efficiently than users of a class-based representation. This is evidenced by the higher performance and the higher success rate in recovering from a breakdown (i.e., recovery percentage) of instance-based users in both cases, and by their fewer breakdowns in the travel agency case (the mean number of breakdowns was equal in the consulting case; see Table 18). After they completed the experimental task, we asked subjects to describe the biggest challenge they faced while interacting with the representation that was assigned to them. Table 19 lists the subjects’ self-reported challenges. We categorized these challenges based on similarity of the reasons.

Table 19. Challenges reported by the subjects in the protocol analysis
Condition | Challenge | Actual quotes
Class-based (n = 6) | Finding the required information attributes | “Pinpointing the location of attributes was difficult” (Subject #3); “Couldn’t find the information in the tables” (Subject #4); “Finding information and tables, […] was challenging” (Subject #11)
Class-based (n = 6) | Understanding the relationships between classes | “Questions were challenging. Understanding the relationships between classes were difficult, in particular one-to-many relationships” (Subject #6); “It takes time. Accessing and searching for information was difficult” (Subject #10); “[...] understanding their relationships was challenging” (Subject #11); “Seeing the connection between records was difficult. Connecting foreign keys to primary keys adds overhead in data retrieval” (Subject #12)
Instance-based (n = 6) | Confusion due to prior familiarity with class-based approach, or lack of familiarity with instance-based approach | “I found the approach easy, however, concepts from the course confused me” (Subject #1); “Understood the system, but describing the sequence of actions threw me off guard” (Subject #2); “Not knowing what operations were allowed in this view was challenging” (Subject #5); “I was so used to class-based that switching to this view became difficult for me” (Subject #9)
Instance-based (n = 6) | Finding the required information attributes | “Also locating what property is on what thing was difficult” (Subject #5); “Visualizing what properties to look at based the question was challenging, but graphics and links make finding connections between objects easier” (Subject #8)
Instance-based (n = 6) | Understanding the relationships between instances | “Too many links become confusing. However, actual data made understanding the question easier by providing example” (Subject #7)

3.8 Limitations and Validity Threats

3.8.1 Limitations of the Instance-based Approach

Implementing the principles of the instance-based approach entails restructuring the organizational databases, as it requires separating instances from predefined classes. In addition, users need to undergo training in order to learn how to use such a system (i.e., how to create their own views, retrieve information, and add new information to the system). In short, adopting the instance-based paradigm might require further investments in personnel training in order to generate positive returns (Brynjolfsson and Saunders 2010). Moreover, an underlying premise of this study is that the users of the instance-based approach are capable of defining their own views based on their needs.
Cognitive schema theory (Derry 1996) predicts that when the problem schema is compliant with the mental schema of prior knowledge, users’ ability to assimilate information improves. However, constructing schemas might be considered a complicated task for some users, which might become a hindrance in their adoption and use of an instance-based system. It may be that users who employ information systems only for tasks that have been anticipated in the design of a system find the traditional approach easier to use owing to habit. Nonetheless, our results show that even for routine or closed-ended tasks, users of an instance-based representation were able to perform better than those using a class-based representation. Moreover, the reported intentions to adopt the instance-based representation were not significantly lower than intentions to use the traditional approach. This is noteworthy given that our subjects were already trained in class-based (traditional) methods but had no previous exposure to the instance-based approach.

3.8.2 Validity of the Experiment and Possible Improvements

To improve the validity of the experiments and reduce biases, two modelling experts went over the experimental material. Subjects’ prior knowledge of the databases, as well as their domain knowledge, was controlled. As mentioned in Section 3.5, the measurement scales for the dependent variables are available and have been verified before (by Wang and Benbasat (2009) and Xu et al. (2013)). The scales were reworded to reflect the experimental task and are available in Appendix A. In short, as it was a laboratory experiment, we believe internal validity is high. However, as the subjects are students or novice users, there is no assurance of external validity. To address this threat, the generalizability of the approach as well as its implications in the real world need to be considered. The experimental task and the scenarios were designed as realistically as possible.
In future research, the benefits of the instance-based paradigm in real settings need to be tested and evaluated.

3.9 Implications

3.9.1 Theoretical Implications

To the best of our knowledge, this is the first study to focus on the ability of information consumers to query instance-based information systems effectively. The theoretical contributions of this work are two-fold. First, this empirical study, as a rigorous evaluation of the instance-based paradigm, is a contribution to design science research. Second, it can be considered a human–computer interaction (HCI) study in which the design principles are borrowed from the instance-based paradigm. This study uses cognitive and behavioural theories, namely, cognitive schema and reactance theory, to study users’ ability to effectively use an instance-based representation as well as their intentions to use the system given the chance. Studying the restrictiveness of interfaces is also interesting, as there are opposing predictions in the literature: Brehm (1996) and Silver (2006) predict that users react negatively to a restrictive system, while Antony et al. (2005) showed that the users preferred the restrictive system over the less restrictive system, as the latter overwhelmed them with too much customizability. As mentioned in Section 3.4, the differences in findings of Silver (2006) and Antony et al. (2005) could be due to the nature of the task (e.g., creation vs. consumption of information by human subjects) as well as users’ level of domain and systems knowledge. Our results showed no significant difference in perceived restrictiveness and intention to use between the two representations. However, the scope of the current work was limited to consumption of information by users (i.e., non-experts in systems design) with moderate levels of domain knowledge, in an information exploitation usage context.
In a broader sense, support for the research hypotheses provides empirical evidence that users of instance-based systems can achieve higher effective use (than comparable users in a class-based setting) in exploitative tasks; this contributes to informing practitioners of the benefits of the instance-based paradigm.

3.9.2 Practical Implications

Principles of the instance-based paradigm can provide flexibility that is not afforded by the traditional data management methods. The flexibility needs are most apparent in environments with data sources that contain large volumes and great variety of data, which constitute a growing suite of applications in “big data”. However, adopting the instance-based approach comes with some challenges and limitations, such as costs to restructure legacy systems (from traditional to instance-based), need for user training, and incorporating possible changes in business processes, as might be necessary for successful adoption of any new technology (Brynjolfsson and Saunders 2010). The corroboration of the propositions provides empirical evidence that the users of the proposed information system can fulfill their informational requirements or, in other words, use the information system effectively to formulate information requests (queries). The empirical evidence regarding the benefits of this approach can help justify the costs of restructuring the organizational databases, providing staff training, and devising the required security policies. Some applications that could benefit from the instance-based approach are listed in Table 20.

We should point out that we do not refute the utilities provided by classes, such as facilitating inference (Parsons and Wand 2008a) and its use in documentation and standardization of systems (Elmasri and Navathe 2011). The instance-based approach would still support such utilities by allowing classes to exist at a separate layer from instances.
In other words, by having a separate classification layer, database operation and schema evolution problems could be reduced (Parsons and Wand 2000), while the utilities of classification (e.g., inference and standardization) are still realized.

Table 20. Applications that can benefit from the instance-based approach
Application | Flexibility requirement | Benefits from the approach
Citizen science | Information might be integrated from multiple sources | Separation of instances from classes: challenges of integrating multiple sources are reduced
Business intelligence | Multiple applications exist, some of them might emerge over time (e.g., identification of new patterns, and incorporating them into the system) | Separation of instances from classes: semantic changes do not affect the information. The operations can continue on existing data. Users define their classes: new concepts can be incorporated in users’ views without restructuring the data
Medical databases | Multiple users, with varying needs, access the data (e.g., doctors and administrators). Similar concepts might be defined differently by users (e.g., patient in critical condition in different units) | Users define their classes: concepts can be defined differently within each view. Separation of instances from classes: users define their views without knowing how the data are organized

3.10 Summary and Future Research

The instance-based paradigm, as an alternative to traditional data management approaches, has been shown to be more flexible, providing agility in changing requirements (Parsons and Wand 2013) and enabling generation of higher quality data by data contributors in open settings (Lukyanenko and Parsons 2014). In our study, we investigated a visual model of the instance-based paradigm, rooted in a previous implementation (Saghafi 2012).
In that work, an instance-based system was developed using semantic-web technology, and a proof of concept implementation showed that the proposed implementation was able to run all the necessary database operations on an instance base (as was done in Parsons and Wand (2000), p. 246). To evaluate the effective use of our proposed model, we performed one empirical study with 130 subjects and a second protocol analysis experiment to gain additional insights into differences between instance-based and class-based representations. Our results showed a significant improvement in performance (or effective usage) of subjects in conditions where they used the instance-based representation in comparison with the class-based one. Moreover, in our design, we explored the possibility of providing only a schema to subjects or schema and data at the same time. Results show that addition of data does not lead to a significant improvement in subjects’ performance; moreover, subjects who had access to both schema and data took longer to perform the task (Table 11). In short, our results corroborated the proposition that the instance-based representation will lead to higher effective use by consumers of information in information retrieval tasks (i.e., exploitative usage contexts). Our study was done under the constraint of informational equivalence between the instance-based and the traditional representation. Future research could focus on facilities provided by the instance-based approach that are not achievable in the traditional method. Another interesting area for research would be studying users who are domain or technology experts; evaluation of their usage of instance-based systems would complement the findings of the current study.
Chapter 4: Visual Analytics on Instance-based Data Models

4.1 Synopsis

The diversity and expansion of information system (IS) applications in recent years have exposed shortcomings of traditional data management practices in catering to the requirements of the modern IS landscape. This has motivated researchers as well as practitioners to embark on a quest for more flexible and effective data management methods. Driven by the same inclination, the instance-based data management paradigm is proposed as a more agile and flexible alternative method for information management. This approach does not require imposing well-defined structure over the data, nor does it entail central control and planning. This chapter investigates the application of instance-based representation in the context of knowledge discovery and exploration of information, which is aligned with current needs in data analytics. An empirical experiment shows that users derive higher quality knowledge from instance-based compared with class-based data.

4.2 Introduction

In the current “Age of Information”, advances in data collection have surpassed organizations’ ability to analyze the data collected for various purposes such as evaluating marketing campaign effectiveness or strategizing for competitive positioning (Thomas and Cook 2006). As the applications of data evolve, effective management and analysis of information becomes contingent on providing facilities to understand the meaning of data in contexts other than those for which the data were originally collected (Lukyanenko et al. 2014). This is exemplified by the “open data” movement, which emphasizes freely providing data to be used by members of the online community (Gurstein 2011). Such information may come from multiple sources outside traditional organizational boundaries, created for varying purposes. Being able to effectively utilize information has become a sought-after goal by organizations.
This is evidenced by the fact that many companies have incorporated data-driven analysis in their decision-making process (Brown et al. 2011). Traditional data management approaches are based on the assumption of having an understanding of information requirements of an organization, which forms the basis for a well-defined schema (Elmasri and Navathe 2011). These schemas provide the conceptual framework (i.e., meta-data) necessary for managing data for pre-determined and well-defined applications of information. Such conclusive structures might be able to fulfill requirements of tasks with minimal variability. However, in the current open information environments, the traditional approaches do not afford the flexibility required to effectively manage information in alternating contexts and purposes (Parsons and Wand 2000). Facilitating different uses and applications of information requires adjusting to varying perspectives on the domain of interest, which might be inhibited as a consequence of fixed schemas. As an alternative to traditional data management methods, Parsons and Wand (2000) proposed the instance-based paradigm. Unlike the traditional class-based methods, the instance-based paradigm does not require imposing well-defined structure over the data, nor does it necessitate central control and planning. Thus, it could be an appropriate method for understanding and using information created beyond the boundaries of an organization (Parsons and Wand 2013). Moreover, the instance-based paradigm gives users the ability to dynamically organize the data to reflect their usage purposes, as discussed in the previous chapter. For an example, please refer to Figure 5. The objective of the proposed research is to investigate the adoption of the instance-based approach by organizational users in the context of knowledge discovery and information exploration.
4.3 Prior Research and Relevant Foundations

The instance-based approach provides a natural foundation for understanding data by separating instances of information from pre-defined structures (Parsons and Wand 2000). This view is based on the premise that existence of phenomena (and their properties) in the world is independent of any pre-determined categorization structure. Classification is not inherent to real world phenomena, but is an artifact of the human mind. Classes are created in order to comprehend phenomena by grouping them based on similarity (Lakoff 1987). Consequently, it could be posited that “there is no single correct way to classify a given set of instances” (Parsons and Wand 2000, pp. 238–9) as each person could come up with a classification that is useful for their own purposes. Thus, a database designer cannot possibly come up with the “correct” classification structure of information in the domain for everything future users may need. In short, the instance-based paradigm is considered to be more congruent with the order and structure of reality, as it represents things and their properties independent of pre-defined classifications constructed by humans. Moreover, from a cognitive point of view, Derry’s (1996) schemata theory predicts that individuals will assimilate information more effectively when they have the ability to view it according to their own mental frameworks (which are shaped based on their prior knowledge). This is in contrast with the class-based method, in which users need to adapt to class structures that are usually defined by a database designer. Based on these premises, the instance-based paradigm proposes separation of instances from classification by creating a two-layered architecture: instances and their properties in one layer and class definitions in another.
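The two-layered architecture described above can be illustrated with a minimal sketch. It assumes, for illustration only, that a class is defined by a set of properties an instance must possess; the type names, property names, and sample data are hypothetical and are not taken from the thesis or from any existing implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Layer 1: a thing and the properties it possesses; no class is attached."""
    id: str
    properties: dict = field(default_factory=dict)


@dataclass(frozen=True)
class ClassDefinition:
    """Layer 2: a user-defined class, given here as a set of required properties."""
    name: str
    required_properties: frozenset

    def members(self, instances):
        # An instance belongs to the class if it possesses every required property.
        return [i for i in instances
                if self.required_properties.issubset(i.properties)]


# Hypothetical instance layer: stored once, independent of any classification.
instances = [
    Instance("i1", {"name": "Alice", "passport_no": "P123", "salary": 50000}),
    Instance("i2", {"name": "Bob", "passport_no": "P456"}),
    Instance("i3", {"name": "Carol", "salary": 62000}),
]

# Two users classify the same instances differently without changing the data.
traveller = ClassDefinition("Traveller", frozenset({"name", "passport_no"}))
employee = ClassDefinition("Employee", frozenset({"name", "salary"}))

print([i.id for i in traveller.members(instances)])
print([i.id for i in employee.members(instances)])
```

In this sketch, adding or redefining a class only touches the second layer, which is one way to read the claim that classification changes need not restructure the stored instances.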
Research done so far on the instance-based paradigm indicates that this approach supports information requirements agility (Parsons and Wand 2013), provides flexibility (Parsons and Wand 2000, Saghafi 2012), and can improve the quality of data collected outside organizational boundaries (Lukyanenko et al. 2014). In addition, in Chapter 3 of this thesis, the effective use of instance-based representation in information retrieval tasks was evaluated. The results show that (novice) subjects using the instance-based representation can achieve higher accuracy in identifying the correct procedure for querying data (compared with the ones using the traditional representations). From a subjective evaluation perspective, participants’ intention to adopt the instance-based approach was not statistically different from the adoption intentions of users of the traditional method, even though the subjects had already been educated in class-based methods for 6 weeks, whereas the users of the instance-based representation had received only 20 minutes of training. In other words, subjects’ preference to use the less familiar approach was not significantly different from their preference for the more familiar one (i.e., class-based), contrary to what some might have predicted. In summary, Chapter 3 demonstrated that the instance-based paradigm is a more flexible and effective data management approach compared with the incumbent data management practices.

4.4 Research Model

For the proposed study, the focus will be on regular users acting as “information consumers” (users who are not necessarily generating the information and are non-experts in database design) in an organizational setting. As for the context of the study, one could seek guidance from the taxonomy of system usage types proposed by March (1991). In this taxonomy, uses of information are categorized into exploitation and exploration of information.
Exploitation is described as “routine execution of knowledge, whereas exploration refers to the search for novel and innovative ways of doing things” (Burton-Jones and Straub 2006, p. 236). This study sets the task context to exploration of information rather than exploitation. Exploration of information is aligned with the current emphasis on discovering new knowledge from data and utilizing the insights in organizational decision-making (Brown et al. 2011). Referring to the theory discussed earlier in Chapter 1 regarding content consumers’ effective use of instance-based data, the study in this chapter tests the following hypothesis:

H3: Quality of patterns identified by users of the instance-based representation will be higher than that of patterns identified by users of the class-based representation.

Justification for this proposition is based on the human cognition literature (as summarized by Davern et al. (2012) and Browne and Parsons (2012)). The study predicts that subjects using the instance-based representation will be able to construct their own mental frameworks of concepts or schemas congruent with their prior knowledge and experiences and thus understand the domain better and explore the information more effectively. The current work is related to the categorization literature, as well as to the impact of heuristics on human decisions. These theoretical lenses are taken from Browne and Parsons’ (2012) review of cognitive research in systems analysis and design. Each aspect is discussed in more detail below. From a categorization perspective, the prediction is rooted in the cognitive schema theory (CST) by Derry (1996). CST is taken from the information processing perspective in the education and learning literature. This theory posits that humans form mental models to construct an understanding of the phenomena that they observe. According to CST, each encountered concept is modelled as a memory object (or a building block) in working memory.
When humans solve problems, they connect, reorganize, and map these active memory objects onto components of real-world phenomena based on their previously learned schemas (i.e., prior knowledge). In fact, learning is a form of active construction of mental models. One can argue that instance-based representations give more freedom to users to form their own mental models and, hence, process the information (and reason about the domain) more effectively (namely, when they attain a desired end-point or goal successfully). Regardless of the number of instances or properties stored in the instance repository, users have the ability to view the ones that are aligned with their own mental schemas. In the class-based approach, however, the user needs to understand a schema created by a database designer in accordance with the designer’s prior knowledge and environmental input, which may not necessarily be congruent with users’ mental schema of prior knowledge (Parsons 2002). From the human decision-making literature, we refer to Tversky and Kahneman’s (1974) work on heuristics in decisions and judgment. “Heuristics are cognitive short cuts, or rules of thumb, that allow people to act and decide” (Browne and Parsons 2012, p. 1006). In the present research, class-based data are considered heuristics that are provided to users in advance (as anchors). Human users will use the anchor as a starting point in their process of evaluating the problem. These starting points bias the end results toward the initial value (Tversky and Kahneman 1974). Parsons and Saunders (2004) and Allen and Parsons (2010) examined the role of anchoring in reuse of existing code and SQL queries, respectively. Their results indicate a negative impact of anchoring, manifested in misalignment between the artifact (IS, code, or SQL query) and the initial requirements as well as propagation of errors (within the SQL codes).
In the context of the present study, one expects that under the instance-based condition subjects’ cognition is not anchored to an initial starting point (in contrast to the class-based schemas created by database designers). Hence, the performance of instance-based users will be less biased compared with users of class-based representations. We assume that by reducing anchoring biases, we can mitigate the potentially negative impact of class-based representation and thus predict a more effective usage by users of the instance-based representation.

4.5 Experiment

Chapter 3 demonstrated that the instance-based representation provided for better “standard” query performance. The current chapter expands the previous empirical test by evaluating content consumers’ ability to use instance-based representation in exploratory tasks. To study exploration and knowledge discovery, and consistent with the recent literature (Paolini and Di Blas 2014, Morton et al. 2014), this study used visual data analytics by content consumers as the setting of this experiment. Visual data analytics is defined as human users performing “analytical reasoning facilitated by interactive visual interfaces” (Thomas and Cook 2006, p. 10). Visual analytics is performed when users need to derive insight from data, “detect the expected, and discover the unexpected” (Thomas and Cook 2006). Particularly in problem domains where the specifications are not well defined, employing human visual capabilities in understanding the domain and exploring data might be a more feasible approach than using an automated textual analytics or data mining process (Munzner 2014). Examples of visual analytics (performed by subjects of this experiment) are available in Appendix F. Data analytics may lead to detection of numerous patterns where not all patterns are of equal quality. Quality measures are needed for “selecting and ranking patterns according to their potential interest to the user” (Geng and Hamilton 2006, p. 1).
These measures could evaluate patterns objectively or subjectively (Pipino et al. 2002). Objective pattern quality measures the statistical strength of the discovered pattern (McGarry 2005, p. 39) based on raw data, with “no knowledge about the user or application” (Geng and Hamilton 2006, p. 3). Subjective pattern quality measures the quality of a pattern by taking into account “users’ beliefs or expectations of their particular problem domain” (McGarry 2005, p. 39). Both these measures will be used and further discussed in this section.

The experiment investigates H3 from three different aspects:
(i) ability of users from each condition to identify true statements;
(ii) overall reliability of patterns that in turn is afforded by the method;
(iii) insights gained from the patterns.
For each of these aspects, there is a hypothesis and a dependent variable to test:

H3a: On average, users of the instance-based representation will be able to identify a greater number of correct (true) patterns compared with users of class-based representation.

This study first evaluates the correctness of statements identified by each subject with respect to data. Each subject will receive a score corresponding to the number of true statements they identified (e.g., score of 11 for 11 patterns that are correct with respect to the data). H3a will be tested by comparing the average score of subjects in the instance-based group with subjects in the class-based condition.

H3b: Reliability of patterns identified by users of the instance-based representation will be higher than those from the class-based representation.

Reliability is defined as the “consistency and dependability of the output information” (Wand and Wang 1996, p. 93). Following the recommendation from Pipino et al. (2002), “simple ratio” was used to “measure the ratio of desired outcomes to total outcomes” (p. 213) to measure the reliability of information.
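The two objective measures (the per-subject score for H3a and the group-level simple ratio for H3b) can be sketched as follows. This is an illustrative sketch only: the coded statements are invented for the example and are not the study's data, and the function names are assumptions.

```python
from statistics import mean

# Hypothetical coding results: one list per subject; True means the statement
# was judged correct with respect to the data.
instance_based = [[True, True, False], [True, True, True, True]]
class_based = [[True, False, False], [True, False]]


def subject_scores(group):
    """H3a measure: number of true statements identified by each subject."""
    return [sum(statements) for statements in group]


def simple_ratio(group):
    """H3b measure: true statements in a group over all statements in that group."""
    total = sum(len(statements) for statements in group)
    if total == 0:
        raise ValueError("group produced no statements")
    return sum(subject_scores(group)) / total


for label, group in [("instance-based", instance_based),
                     ("class-based", class_based)]:
    print(label, mean(subject_scores(group)), round(simple_ratio(group), 2))
```

The mean of the per-subject scores is what H3a compares across conditions, whereas the ratio pools all statements of a group, so a group can score well on one measure and poorly on the other.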
The desired outcome was a true statement, and total outcomes included all statements generated by subjects. The dependent variable was the ratio of the number of true statements from each group to the total number of statements generated by the subjects in that group.

H3c: Users of instance-based representations will identify more insightful patterns.

While the previous two variables were objective measures, insightfulness of a pattern is subjective. To identify those that we call insightful patterns, a coder or evaluator acts as a consultant and identifies the statements that might be surprising and may provide intelligence that is important to the company. This subjective analysis takes into account expectations from a domain. As an example from the retail domain, a pattern stating “sales in December are high” is probably something that the executive team already expects (because of the holiday season), whereas “shipping furniture to customers in California takes longer than the average shipping time” might be an insightful finding and instigate an investigation in the supply chain. Each of these hypotheses will be tested in Section 4.4.4.

4.5.1 Design and Experimental Material

This experiment employs a simple 2×1 control–treatment design. Subjects in the control group received a class-based dataset, and subjects in the treatment group worked with instance-based data. The dataset was taken from an analytics challenge advertised by the Association of Information Systems (AIS) Student Chapter.24 The particular dataset we used for this experiment was from the Lockheed Martin challenge, and it included employee data, their performance records, as well as telephone, travel, and Internet access logs. The original dataset adhered to the class-based principles.

24 http://aisnet.org/news/news.asp?id=207282 accessed on 29/02/2016.
Although one might question the design quality of the database schema,[25] it was designed by the sponsors of the competition (i.e., Lockheed Martin). In other words, we had no involvement in the design of the dataset, and the design is independent of our research view. Figure 10 illustrates the classification schema of this dataset.

Figure 10. Schema of the dataset used in the experiment

[25] In the future, one could vary the design of the classification structure. Studying users’ ability to discover knowledge using classification structures of varying qualities and comparing them with the instance-based representations would be an opportunity for future research.

As mentioned earlier, the purpose of this experiment is to evaluate content consumers’ ability to visually analyze the data and identify patterns. We selected Tableau, a business analytics application that the 2016 Gartner annual report (Parenteau et al. 2016) identifies as a leader in the market for business intelligence and analytics platforms. Tableau provides the features necessary for the task; it facilitates analytical reasoning by providing an interactive visual interface for human users (Thomas and Cook 2006). Moreover, it has a feature that enables users to categorize the properties based on their own requirements or to define their own classification. In other words, this option strips the data of predefined classes and provides only a list of all the attributes that are known in the domain. This feature allowed us to declassify the data and provide subjects with a dataset that is in line with the principles of the instance-based paradigm.[26] Users could group attributes together and define new classes, whether in the Tableau interface (as presented during the training procedure of the experiment) or in their working memories while performing the task.
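The idea of presenting data free of predefined classes can be made concrete with a small sketch. The Python below is purely illustrative: it is neither Tableau's mechanism nor the structure proposed in this thesis, and the attribute names, sample records, and the helpers `attribute_list` and `members` are hypothetical. It stores each instance as a bag of attributes, exposes a flat attribute list, and lets a user define a class after the fact as a set of attributes that an instance must possess.

```python
# Hypothetical instances: each one is simply the set of attributes it possesses.
instances = [
    {"employee_id": 1, "department": "Sales", "call_duration_min": 12.5},
    {"employee_id": 2, "department": "IT", "destination": "Lima"},
    {"log_id": 77, "url": "example.org", "employee_id": 1},
]


def attribute_list(records):
    """Return every attribute known in the domain as a flat, sorted list."""
    return sorted({name for record in records for name in record})


def members(records, defining_attributes):
    """Return the instances that possess all attributes of a user-defined class."""
    required = set(defining_attributes)
    return [record for record in records if required.issubset(record.keys())]


# A user groups two attributes into an ad hoc class while exploring the data.
staff = members(instances, ["employee_id", "department"])
print(attribute_list(instances))
print(len(staff), "instances belong to the user-defined class")
```

In this sketch the class is a view chosen by the user at analysis time, so a different user could group the same attributes differently without any change to the stored instances.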
Referring to our discussion of the experiment’s hypothesis (in Section 4.3), we predict that human users are more effective in assimilating new information when they can construct mental models that are congruent with their prior knowledge, rather than trying to understand a fixed classification scheme created by a database designer.

Subjects in the class-based group received the classification schema (Figure 10) and worked with the class-based version of the data in Tableau. As illustrated in Figure 11, Tableau shows a list of classes that are defined in the dataset; users can expand any class on the list to view its defining attributes.

[26] As for the underlying structural layer to represent instance-based data, we propose one method that is aligned with Parsons and Wand (2000), available in Appendix G.

Figure 11. Tableau interface for the class-based group. Classes can be expanded to view the defining attributes

Subjects in the instance-based group, however, viewed the non-classified list of attributes. As Figure 12 shows, the attributes are sorted alphabetically. Users could drag and drop attributes onto each other to form classes. Although this feature was covered in the training, only one subject (in the pilot) used it, forming a single class; the rest of the subjects performed the task by focusing only on the instances and the attributes they possessed. We also provided subjects with a possible conceptualization of how data could be organized using the instance-based representation (congruent with the grammar discussed in Chapter 3), which is available in Appendix E.

Figure 12. Tableau interface for the instance-based group

We need to point out that Tableau provided identical functionalities and features to users of both groups (i.e., instance-based and class-based).
The only difference between the experimental material that the subjects in the two groups worked with was how the information was organized, i.e., whether the attributes were grouped into classes (Figure 11) or presented free of classification (Figure 12). This way we evaluated subjects’ ability to explore the information and identify patterns using different information representation methods (class-based vs. instance-based). In other words, the only factor manipulated in this experiment was the way data was organized (classes vs. instances). Tableau’s functionalities had no variation across the two groups.

4.5.2 Participants

Subjects were business school students from a large North American university who were registered in third and fourth year Business Technology Management stream courses. This group can be considered appropriate participants for this experiment as they are within the target populations of the experiment (see the description of content consumers and data enthusiasts in Chapter 1). A pilot was performed with 14 subjects (evenly split between class-based and instance-based groups). Based on the pilot, we reworded the experimental task to clarify it further. For the main experiment, 42 subjects[27] were recruited from third and fourth year students who were either registered in the “Information Systems Technology and Development” course or had already completed that course. Their prior database and domain knowledge levels (based on a 7-point Likert scale) are presented in Table 21. These numbers are similar to prior knowledge levels of subjects from experiment 1 from Chapter 3 (Table 7); in particular, the database knowledge measure is almost identical.

[27] Using the same reliability measures as in the main experiment (discussed later), we calculated effect size from the pilot, which was Cohen’s d = 0.77. Based on the desired a priori statistical power level of 0.8 (which is considered high), the minimum total sample size would be 44. Owing to limitations in recruiting, we conducted the analysis based on data from 42 subjects.

Table 21. Measures of prior knowledge

                 Written queries before (%)   Database knowledge   Human resources knowledge
Class-based      57%                          2.86 / 7             2.81 / 7
Instance-based   52%                          2.90 / 7             3.33 / 7

4.5.3 Experimental Task and Procedure

The experiment included a 20-minute training video to teach subjects how to explore information and identify patterns using Tableau (based on a fictional online retailer’s data). Subjects could revisit the training video at any time if they needed to review Tableau’s functions and operations. During the course of the experiment, subjects were also encouraged to ask questions in case they needed further clarification. Following training, subjects were instructed to answer a pre-experiment questionnaire to report their prior database and domain knowledge levels (Appendix A).

As mentioned, the dataset was provided by an international analytics challenge with the objective of identifying intellectual property theft by employees. We made the task more generic by asking subjects to report all patterns with respect to the data that might be worth investigating further by the stakeholders of the company (Appendix E). We only experimented in one domain because the current experiment required (after 20 minutes of training) over an hour of interaction with the system on average (Table 22). Performing two tasks might have been too taxing on the subjects and might have had a negative impact on their performance. In addition, experiment 1 from Chapter 3 indicated consistent performance across various domains.
Thus, we decided that one domain would suffice for this experiment. At the end of the experimental task, participants received a post-experiment questionnaire (the same questionnaire used in the previous study; see Appendix A) to answer questions regarding their perceived restrictiveness and usage intentions.

4.5.4 Data Analysis and Results

The study operationalized information exploration and knowledge discovery by asking subjects to identify patterns. Subjects reported a total of 357 patterns (see Table 24). The dependent variable for this experiment was “pattern quality”, which is a measure for “selecting and ranking patterns according to their potential interest to the user” (Geng and Hamilton 2006, p. 1). The study evaluated the quality of patterns objectively with respect to the data source to determine whether they were true or false. This evaluation was done by an impartial research assistant (RA) who had expertise in statistics and databases. The RA’s reports were verified by the author for correctness. Table 22 reports descriptive statistics from the experiment.

Table 22. Descriptive statistics

Condition                            True statements   All statements   Restrictiveness   Usage intention   Time (min)
                                     Mean (SD)*        Mean (SD)        Mean (SD) / 7     Mean (SD) / 7     Mean (SD)
Control (class-based), n = 21        4.52 (3.17)       7.43 (3.74)      4.65 (1.22)       4.10 (1.49)       66.43 (13.15)
Treatment (instance-based), n = 21   7.71 (4.78)       9.75 (6.00)      4.10 (1.47)       5.13 (1.25)       59.05 (16.93)
* Out of the total responses per person

To test H3a (i.e., the number of correct patterns identified), we compared the average number of true statements produced by subjects in each group using the t-test.[28] The results (Table 23) corroborated the hypothesis that subjects using the instance-based representation were able to identify a greater number of correct patterns.

[28] To test the assumptions of the t-test, we performed Levene’s test, which was not statistically significant for the number of statements (p = 0.065). To test for normality, we drew Q–Q plots, which are available in Appendix D.

Table 23. Performance with respect to number of true statements

                 No. of subjects   No. of true statements   Mean (SD)
Class-based      21                95                       4.52 (3.17)
Instance-based   21                162                      7.71 (4.78)
Two-tailed P-value: 0.015 (significant)

For the overall reliability of statements generated by users of instance-based vs. class-based representations (H3b), Table 24 demonstrates a statistically significant advantage[29] for the instance-based group.

[29] We believe this comparison is justified owing to having a normal distribution of data and homogeneity of variances. If these conditions were not met, comparing the ratios may not have been appropriate (e.g., comparing 3/5 with 300/500 would be meaningless).

Table 24. Performance with respect to reliability

                 No. of subjects   No. of reported statements   No. of true statements   Reliability (true/all), Mean (SD)
Class-based      21                156                          95                       60% (0.49)
Instance-based   21                201                          162                      81% (0.40)
Two-tailed P-value: 0.0001 (significant)

The next hypothesis (H3c) is based on a subjective measure of pattern quality: whether the insight gained from the pattern can be of use to stakeholders. From the set of correct (true) patterns, we first identified the unique patterns in each group. We provided the subset of unique and true patterns to two judges (PhD students with professional master’s degrees and a minimum of two years of industry experience in business intelligence). The judges had to identify the patterns that they considered insightful or of value to the stakeholders. The judges referred to the dataset and removed patterns that may not be of material importance or that were probably already known by the stakeholders.
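The group comparisons reported in Tables 23 and 24 can be approximated from the published summary statistics alone. The sketch below is an illustration under stated assumptions, not the analysis script used in the study: it assumes an equal-variance (pooled) independent-samples t-test, and the helper names `simple_ratio`, `pooled_t`, and `two_tailed_p` are hypothetical. The p-value is obtained by numerically integrating the Student t density with Simpson's rule so that only the standard library is needed.

```python
import math


def simple_ratio(desired, total):
    """Simple ratio of desired outcomes to total outcomes (reliability measure)."""
    return desired / total


def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Equal-variance t statistic (group b minus group a) and degrees of freedom."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_b - mean_a) / std_err, df


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value via Simpson integration of the t density on [0, |t|]."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1 - 2 * area)


# Summary statistics as printed in Table 23 (true statements per subject).
t, df = pooled_t(4.52, 3.17, 21, 7.71, 4.78, 21)
p = two_tailed_p(t, df)
print(f"t({df}) = {t:.2f}, two-tailed p = {p:.3f}")

# Statement counts as printed in Table 24.
print(f"class-based reliability: {simple_ratio(95, 156):.3f}")
print(f"instance-based reliability: {simple_ratio(162, 201):.3f}")
```

With the Table 23 figures this yields 40 degrees of freedom and a t statistic of about 2.55, whose two-tailed p-value is in line with the reported 0.015. Working from rounded summary statistics can differ slightly from a computation on the raw data.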
To set the context for the following example, consider that the dataset included records of 10,000 employees; 49 employees were citizens of Peru, and the majority (97%) were from the USA. A pattern stating “all Peruvian employees are female” was true according to the data but was eliminated by the judges considering the scale. An example of a pattern that may already be known to the stakeholders was “majority of promoted employees are female”; according to the data, 77% of employees were female, so it might be expected that the number of promotions would be proportional to the employee base (a fact that is probably known by the stakeholders). Two useful examples are “Corporate department employees is seeing an increase in demotions since 2009” and “number of demoted male employees in IT department is greater than females”, which is not proportional to the employee base (of 77% female). To demonstrate this stage of analysis, Table 25 shows answers provided by one of the subjects (Subject #21 in the class-based group), the statements that were true with respect to the data, and the ones that our coders found insightful.

Table 25. An example of coding for insightful statements

Statement                                                                             True / False   Insightful
Corporate department tends to have the most amount of pay increase                    True           0
Sales department has the least amount of pay increase                                 True           1
Employee ID 117501 has significantly longer call duration compared with others        True           1
Number of female employees is much larger than males                                  True           0

The descriptive statistics related to this round of analysis are shown in Table 26. Significance testing was not possible because of the small number of statements that qualified as insightful. The reliability of the scales[30] generated by the two coders was relatively high, as the Cohen’s kappa coefficient was 70%. We did not take any measures to resolve the disagreement between the two coders; rather, the content of Table 26 is based on the data from the first coder.

[30] Following Hallgren (2012), we used Cohen’s kappa since the variable was binary (true/false) as opposed to an interval variable (similar to the performance measure in Chapter 3).

Table 26. Performance with respect to insights gained from patterns

                 No. of reported statements   No. of true statements   No. of insightful statements   Insightful / No. of statements   Insightful / subjects
Class-based      156                          95                       9                              0.05                             0.43
Instance-based   201                          162                      15                             0.07                             0.71

4.5.5 Additional and post hoc Analyses

In order to test H2 of this thesis (described in Sections 3.4 and 3.5.4), we performed a regression analysis to study the relationship between representation type (i.e., class-based vs. instance-based) and adoption intentions. We found users had a significant preference for instance-based representations (standardized beta = 0.384, p-value = 0.012, significant).

To investigate H2a (the mediation effect of perceived restrictiveness on the relationship between representation type and users’ adoption intentions), we followed Baron and Kenny’s (1986) guidelines and started by analyzing the indirect path through the mediating factor (i.e., perceived restrictiveness in this case). The first relationship investigated was the effect of representation type on perceived restrictiveness. This relationship was not significant (standardized beta = –0.267, p-value = 0.09, not significant). Since the variations in the independent variable (i.e., representation type) do not account for variations in the presumed mediator (i.e., perceived restrictiveness), the data does not support H2a. In other words, although users have favourable adoption intentions towards instance-based representations compared with class-based ones, perceived restrictiveness is not a significant factor in mediating this relationship. Table 27 provides the summary of statistics used for this analysis.

Table 27. Regression analysis related to usage intentions

Path                                                Standardized beta   P-value
Representation type and adoption intentions         0.384               0.012
Representation type and perceived restrictiveness   –0.267              0.09 (N.S.)
N.S.: Not Significant

The adoption intention results in this experiment are interesting when compared with the non-significant difference in adoption intentions in the experiment from Chapter 3. We cannot provide reasons based on data or theory to explain the difference, but we could tentatively relate it to the longer time that subjects spent on the tasks. The average time that subjects (in both instance-based and class-based groups) spent on this experiment was 63 minutes, compared with 36 minutes in Experiment 1 from Chapter 3. Data was taken from Tables 8 and 28.

As for task completion time, users of the instance-based representation were on average 7 minutes faster, but the difference was not statistically significant, as shown in Table 28.

Table 28. Task completion time

                 Completion time
Class-based      M = 66.43, SD = 13.15, n = 21
Instance-based   M = 59.05, SD = 16.93, n = 21
Two-tailed P-value: 0.122 (N.S.)
M: Mean, SD: Standard Deviation, n: Sample Size, N.S.: Not Significant

The study also investigated the moderating effect of prior database knowledge and prior domain knowledge on the number of true statements generated by each subject. We looked at the interaction effect of prior database knowledge and experimental conditions on subjects’ performance and at the interaction effect of prior domain knowledge and experimental conditions on subjects’ performance. No interaction was found in this analysis (p-values of 0.678 and 0.762, respectively).

4.6 Challenges and the Validity of the Study

Using student subjects in a laboratory experiment might be considered a limitation. However, students are within the target population of our theory (Compeau et al. 2012); hence, we consider it a justified choice.
In addition, since this experiment was the first study of content consumers of instance-based representations in exploratory tasks, we considered the internal validity to be of critical importance. Thus, performing a laboratory experiment instead of a field experiment in this case was also justified (Calder et al. 1981).

In addition, the theoretical justification from this work posited that the instance-based approach may impose a lighter cognitive load than class-based representations because users might not need to process irrelevant classes. A valid concern could be raised that instance-based representations may include more irrelevant instances than there are irrelevant classes in the class-based paradigm. In this chapter’s experiment, the issue of having too many instances does not apply, as subjects studied only the instances that were relevant to their needs. Drawing support from our experiment, we believe that users of the instance-based approach will not be overwhelmed by the sheer number of instances that may exist in the repository. Instance-based users are able to view only the instances and properties that are congruent with their current task and the mental model that they have built in the process of solving the problem. Unlike users of the traditional class-based approaches, instance-based users do not need to understand a schema developed by database designers (which was developed based on the needs that they anticipated of the system and in congruence with their own mental frameworks).

On a more general note, the study of users’ ability to perform tasks on large databases (whether instance-based or class-based) is an interesting research question. Users will be challenged to find the required information in an instance-based database with thousands of attributes or, similarly, in a class-based database with hundreds of classes.
Despite the challenge, however, users need to identify only the few attributes that are required to answer a certain question in an information retrieval task; thus, they may not need to refer to the possibly large number of attributes available. We tentatively argue that locating attributes within a complex classification structure is probably more difficult than when attributes are presented in a large list (as in the instance-based approach; see Figure 12). Studying this issue in more depth could be a prime opportunity for future research. As for databases of smaller scale, however, our experiment demonstrated that users of instance-based representations achieved better performance in exploration of information compared with class-based users.

4.7 Potential Implications

As mentioned earlier, researchers and practitioners alike have embarked on a quest for more flexible and effective data management methods that are able to cater to the requirements of modern open information environments. Traditional methods, however, lack the requisite flexibility to accommodate these environments in the current IS landscape. This study demonstrates the effectiveness of the instance-based paradigm in knowledge discovery and information exploration tasks. It also provides scientific evidence of the superiority of this approach to incumbent methods. We need to point out that the results of the laboratory study are presented as proof of concept for the practical utility and efficacy of the instance-based approach in knowledge discovery tasks. For more generalizable results with higher external validity, we call for field experiments in the future.

From a theoretical perspective, the rigorous evaluation of the instance-based paradigm is a contribution to design science as well as classification research. Moreover, the formalization of the instance-based approach as well as the findings of the study could guide the design of user-facing interfaces in IS research.
For example, in e-commerce research, users could be given the ability to flexibly organize the products based on their attributes (rather than classes), and consequently they might be able to understand the environment better and make better (shopping) decisions that may require further exploration of information.

From a practical point of view, validation of this study’s hypotheses informs organizations of the effectiveness and flexibility of the instance-based paradigm. As mentioned earlier, in the current age of information, being able to support varying applications by different users, over multiple sources of data, is of paramount importance (Parsons and Wand 2013, Zikopoulos and Eaton 2011). By adopting the instance-based paradigm, organizations can improve the effectiveness of information retrieval and knowledge discovery tasks by their users, and this improvement in data management can enable new insights that guide organizational decision-making (Brown et al. 2011), which in turn translates to better performance by the companies. It should be noted that adopting the instance-based paradigm would be a challenging undertaking for companies as it requires migration of legacy data (from traditional to instance-based) and training users in the principles of the instance-based paradigm.

4.8 Concluding Remarks

This chapter empirically evaluated users’ ability to discover knowledge using the instance-based representation. The study showed that instance-based users were able to identify more correct patterns on average. Instance-based users were also more reliable in the knowledge discovery task (i.e., a more dependable information output as defined by Wand and Wang 1996), demonstrating a significantly higher percentage of correct patterns over the total number of patterns reported.

Advantages of the instance-based paradigm could inform future research in classification (as discussed in Section 4.6).
Moreover, the instance-based paradigm as a more effective data management approach could be adopted by organizations in order to improve their ability to manage information in the era of big data. The instance-based approach could have advantages over traditional methods, particularly when human users are in charge of performing analytical reasoning (i.e., visual analytics), as opposed to automated mining algorithms run on datasets. The instance-based representation is expected to improve the efficiency of the human cognitive process in assimilating knowledge, thus leading to enhanced performance by the users in knowledge discovery and information exploration tasks.

Chapter 5: Summary and Conclusions

The three studies in this dissertation explored the role of ontological principles in information modelling and their impact on users’ performance of cognitive tasks. The first study (i.e., the meta-analysis) synthesized prior empirical work that had studied ontological guidance in conceptual models and its impact on users’ understanding of the application domain. The results of the study indicate that ontological guidance does indeed improve users’ performance, more prominently in tasks that require integrating information from a conceptual model with prior knowledge (i.e., a deep level of understanding). This could be scientific evidence in favour of incorporating ontological guidance in education as well as in the practice of systems analysis. However, results related to the usage of ontologically guided data models and users’ ability to formulate database queries were inconclusive. This motivated studying a data modelling approach that is rooted in ontological principles, namely the instance-based paradigm.

Following March’s (1991) taxonomy of system usage types, we focused on users’ ability to exploit and explore instance-based information.
The instance-based paradigm allows users to form their own mental frameworks in order to assimilate the presented information with their prior knowledge. Owing to this facility, the second study demonstrated that users of instance-based representations were able to perform better in information retrieval tasks. The third study focused on knowledge discovery tasks and how they could be improved using the instance-based approach.

The theoretical benefits of the instance-based approach have been discussed in the literature (Parsons and Wand 2000, 2013). The studies in my thesis provide empirical support that the instance-based method is a superior alternative to traditional data management approaches that enhances users’ performance in querying the data and in visual analytics tasks (i.e., reasoning on data using human judgment, as opposed to running an automated data mining algorithm). One needs to point out that the experiments were conducted in a laboratory using student subjects. While the experiments were of high internal validity, future field studies need to be conducted in order to fully investigate the practical utility of the instance-based paradigm.

As for future research, in the first study the results related to other dimensions of understanding (such as surface-level model understanding and perceptions of understanding) were not statistically conclusive. Studying the reasons for this phenomenon and investigating the causal and moderating factors that influence the impact of ontological guidance could be the focus of future research.

There are also many opportunities to continue on studies 2 and 3. To broaden the scope of the current studies, one could explore how users who are experts in the application domain, as well as in technology, could utilize the instance-based paradigm. Moreover, the instance-based method could be applied as a solution to other existing challenges in managing organizational data.
For example, the healthcare domain has lagged in fully utilizing the available organizational data owing to some challenges in information management (Groves et al. 2013). Namely, in this domain, information “remains siloed within one group or department because organizations lack procedures for integrating data and communicating findings” (Groves et al. 2013, p. 2). Being able to share data across different pools (e.g., hospitals and pharmaceutical and insurance companies) with fewer complications (when using instance-based representations compared with class-based representations) can be helpful in improving the effectiveness of treatments as well as reducing costs. In addition, the diverse user base (from administrators to physicians with different specializations) may have different applications of the data, such as identifying diseases in earlier stages, reducing readmission rates, and accelerating research and development. Because of such requirements in the healthcare domain, the instance-based approach would be a viable method to be incorporated and tested in future research.

Another prime situation for the instance-based approach could be in modelling time-series data. In many scientific fields (e.g., chemistry) and industries (e.g., finance), phenomena are measured and recorded at different points in time. Organizing, querying, and mining time-series data has become the focus of recent studies (Esling and Agon 2012, Wang et al. 2013). One of the biggest challenges in the storage of time-series data is the high number of dimensions for each variable (e.g., the stock value of a company recorded every hour over a decade-long period). Modelling these dimensions using traditional data modelling methods is a challenge that has been addressed by various researchers (Van Wijk and Van Selow 1999, Heer et al. 2009).
Applying the instance-based paradigm to modelling time-series data could potentially offer the ability to model the multiple dimensions of time-series data more effectively by freeing the information from predetermined structures that would limit modelling and representation of data. Future research could investigate the flexibility afforded by the instance-based model to time-series data and its impact on users’ ability to perform a diverse range of exploratory tasks.

Research on larger scale and more complex databases could also be performed to investigate users’ ability to retrieve or explore information when the possibilities are vast. In a class-based database with thousands of classes or an instance-based one with thousands of attributes, users will certainly be challenged to find the required information. Whether the scale of the database would lead to different results than the findings in Chapters 3 and 4 needs to be studied in the future.

In conclusion, the three studies in this thesis contribute to the systems analysis and design domain. Research on ontologically guided conceptual models could shed some light on the factors that could modify the effect of ontological guidance on different aspects of users’ understanding. Similarly, there are various opportunities to continue the research on the instance-based method in order to investigate the viability of the approach in different contexts and settings.

References

Abadi, D. J., Boncz, P. A., & Harizopoulos, S. 2009. “Column-oriented database systems.” Proceedings of the VLDB Endowment, 2(2), 1664–1665.
Allen, G. N., & March, S. T. 2006a. “A critical assessment of the Bunge–Wand–Weber ontology for conceptual modeling.” 16th Annual Workshop on Information Technologies & Systems (WITS) Paper. Available at SSRN: http://ssrn.com/abstract=951803 or http://dx.doi.org/10.2139/ssrn.951803, accessed on 05/05/2016.
Allen, G. N., & March, S. T. 2006b. “The effects of state-based and event-based data representation on user performance in query formulation tasks.” MIS Quarterly, 30(2), 269–290.
Allen, G. N., & March, S. T. 2012. “A research note on representing part–whole relationships in conceptual modeling.” MIS Quarterly, 36(2), 945–964.
Allen, G., & Parsons, J. 2010. “Is query reuse potentially harmful? Anchoring and adjustment in adapting existing database queries.” Information Systems Research, 21(1), 56–77.
Anderson, J. R., & Pirolli, P. L. 1984. “Spread of activation.” Journal of Experimental Psychology: Learning, Memory, and Cognition, 10(4), 791–798.
Angeles, P. A. 1981. Dictionary of Philosophy. New York: Barnes & Noble Books.
Antony, S., Batra, D., & Santhanam, R. 2005. “The use of a knowledge-based system in conceptual data modeling.” Decision Support Systems, 41(1), 176–188.
Baron, R. M., & Kenny, D. A. 1986. “The moderator-mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations.” Journal of Personality and Social Psychology, 51, 1173–1182.
Bera, P., Burton-Jones, A., & Wand, Y. 2011. “Guidelines for designing visual ontologies to support knowledge identification.” MIS Quarterly, 35(4), 883–908.
Bera, P., Burton-Jones, A., & Wand, Y. 2014. “Research note: How semantics and pragmatics interact in understanding conceptual models.” Information Systems Research, 25(2), 401–419.
Bodart, F., Patel, A., Sim, M., & Weber, R. 2001. “Should optional properties be used in conceptual modelling? A theory and three empirical tests.” Information Systems Research, 12(4), 384–405.
Borenstein, M., Hedges, L. V., Higgins, J. P., & Rothstein, H. R. 2011. Introduction to Meta-analysis. John Wiley & Sons.
Bowen, P. L., O’Farrell, R. A., & Rohde, F. H. 2004. “How does your model grow? An empirical investigation of the effects of ontological clarity and application domain size on query performance.” In Proceedings of the 25th International Conference on Information Systems. Association for Information Systems. pp. 77–90.
Bowen, P. L., O’Farrell, R. A., & Rohde, F. H. 2006. “Analysis of competing data structures: Does ontological clarity produce better end user query performance.” Journal of the Association for Information Systems, 7(8), 141–156.
Bowen, P. L., O’Farrell, R. A., & Rohde, F. H. 2009. “An empirical investigation of end-user query development: the effects of improved model expressiveness vs. complexity.” Information Systems Research, 20(4), 565–584.
Brehm, J. W. 1966. A Theory of Psychological Reactance. Oxford, England: Academic Press.
Brown, B., Chui, M., & Manyika, J. 2011. “Are you ready for the era of ‘big data’?” McKinsey Quarterly, 4, 24–35.
Browne, G. J., & Parsons, J. 2012. “More enduring questions in cognitive IS research.” Journal of the Association for Information Systems, 13(12), 1000–1011.
Brynjolfsson, E., & Saunders, A. 2010. Wired for Innovation. The MIT Press.
Bunge, M. 1977. Treatise on Basic Philosophy: Ontology I: The Furniture of the World (Vol. 1). Springer.
Burton-Jones, A., Clarke, R., Lazarenko, K., & Weber, R. 2012. “Is use of optional attributes and associations in conceptual modeling always problematic? Theory and empirical tests.” ICIS 2012 Proceedings, pp. 3041–3056.
Burton-Jones, A., & Grange, C. 2012. “From use to effective use: A representation theory perspective.” Information Systems Research, 24(3), 632–658.
Burton-Jones, A., & Meso, P. N. 2006. “Conceptualizing systems for understanding: An empirical test of decomposition principles in object-oriented analysis.” Information Systems Research, 17(1), 38–60.
Burton-Jones, A., & Meso, P. N. 2008. “The effects of decomposition quality and multiple forms of information on novices’ understanding of a domain from a conceptual model.” Journal of the Association for Information Systems, 9(12).
Burton-Jones, A., & Straub Jr., D. W. 2006. “Reconceptualizing system usage: An approach and empirical test.” Information Systems Research, 17(3), 228–246.
Burton-Jones, A., Wand, Y., & Weber, R. 2009. “Guidelines for empirical evaluations of conceptual modeling grammars.” Journal of the Association for Information Systems, 10(6), Article 1.
Burton-Jones, A., & Weber, R. 1999. “Understanding relationships with attributes in entity-relationship diagrams.” In Proceedings of the 20th International Conference on Information Systems. Association for Information Systems. pp. 214–228.
Burton-Jones, A., & Weber, R. 2003. “Properties do not have properties: Investigating a questionable conceptual modeling practice.” In D. Batra, J. Parsons, and V. Ramesh (eds.), Proceedings of the Second Annual Symposium on Research in Systems Analysis and Design, Miami.
Calder, B. J., Phillips, L. W., & Tybout, A. M. 1981. “Designing research for application.” Journal of Consumer Research, 8(2), 197–207.
Collins, A. M., & Quillian, M. R. 1969. “Retrieval time from semantic memory.” Journal of Verbal Learning and Verbal Behavior, 8(2), 240–247.
Compeau, D., Marcolin, B., Kelley, H., & Higgins, C. 2012. “Research commentary-generalizability of information systems research using student subjects-a reflection on our practices and recommendations for future research.” Information Systems Research, 23(4), 1093–1109.
Davern, M., Shaft, T., & Te’eni, D. 2012. “Cognition matters: Enduring questions in cognitive IS research.” Journal of the Association for Information Systems, 13(4), 273–314.
Davis Jr., F. D. 1986. A Technology Acceptance Model for Empirically Testing New End-user Information Systems: Theory and Results. Doctoral dissertation, Massachusetts Institute of Technology, Cambridge, MA.
Degen, W., Heller, B., Herre, H., & Smith, B. 2001. “GOL: toward an axiomatized upper-level ontology.” In Proceedings of the International Conference on Formal Ontology in Information Systems - Volume 2001, pp. 34–46.
Derry, S. J. 1996. “Cognitive schema theory in the constructivist debate.” Educational Psychologist, 31(3–4), 163–174.
Elmasri, R., & Navathe, S. B. N. 2011. Database Systems: Models, Languages, Design, and Application Programming. Pearson.
Esling, P., & Agon, C. 2012. “Time-series data mining.” ACM Computing Surveys (CSUR), 45(1), 12.
Evermann, J., & Wand, Y. 2006. “Ontological modeling rules for UML: An empirical assessment.” Journal of Computer Information Systems, 46(5), 14–29.
Fishbach, A., & Ferguson, M. J. 2007. “The goal construct in social psychology.” In Social Psychology: Handbook of Basic Principles (2nd ed.). E. T. Higgins (ed.). New York: Guilford Press. pp. 490–515.
Fonseca, F. 2007. “The double role of ontologies in information science research.” Journal of the American Society for Information Science and Technology, 58(6), 786–793.
Ford, N. 2004. “Modeling cognitive processes in information seeking: From Popper to Pask.” Journal of the American Society for Information Science and Technology, 55(9), 769–782.
Frawley, W. J., Piatetsky-Shapiro, G., & Matheus, C. J. 1992. “Knowledge discovery in databases: An overview.” AI Magazine, 13(3), 57–70.
Gemino, A., & Wand, Y. 2004. “A framework for empirical evaluation of conceptual modeling techniques.” Requirements Engineering, 9(4), 248–260.
Gemino, A., & Wand, Y. 2005. “Complexity and clarity in conceptual modeling: comparison of mandatory and optional properties.” Data & Knowledge Engineering, 55(3), 301–326.
Genero, M., Poels, G., & Piattini, M. 2008. “Defining and validating metrics for assessing the understandability of entity–relationship diagrams.” Data & Knowledge Engineering, 64(3), 534–557.
Geng, L., & Hamilton, H. J. 2006. “Interestingness measures for data mining: A survey.” ACM Computing Surveys (CSUR), 38(3), Article 9.
Gray, J., Liu, D. T., Nieto-Santisteban, M., Szalay, A., DeWitt, D. J., & Heber, G. 2005. “Scientific data management in the coming decade.” ACM SIGMOD Record, 34(4), 34–41.
Groves, P., Kayyali, B., Knott, D., & Van Kuiken, S. 2013. The Big Data Revolution in Healthcare: Accelerating Value and Innovation. New York: McKinsey Global Institute.
Guizzardi, G., Herre, H., & Wagner, G. 2002. “On the general ontological foundations of conceptual modeling.” In Conceptual Modeling—ER 2002. Springer Berlin Heidelberg, pp. 65–78.
Guizzardi, G., Wagner, G., & Sinderen, M. 2004. “A formal theory of conceptual modeling universals.” Proceedings of the Workshop on Philosophy and Informatics (WSPI), Cologne, Germany. pp. 1–10.
Gupta, A., & Jain, R. 1997. “Visual information retrieval.” Communications of the ACM, 40(5), 70–79.
Gurstein, M. 2011. “Open data: Empowering the empowered or effective data use for everyone?” First Monday, 16(2). Available at http://firstmonday.org/ojs/index.php/fm/article/view/3316/2764, accessed on 05/05/2016.
Hadar, I., & Soffer, P. 2006. “Variations in conceptual modeling: classification and ontological analysis.” Journal of the Association for Information Systems, 7(8), 568–592.
Halevy, A., Norvig, P., & Pereira, F. 2009. “The unreasonable effectiveness of data.” IEEE Intelligent Systems, 24(2), 8–12.
Hallgren, K. A. 2012. “Computing inter-rater reliability for observational data: an overview and tutorial.” Tutorials in Quantitative Methods for Psychology, 8(1), 23–34.
Halpin, T. 1998. “Object-role modeling (ORM/NIAM).” In Handbook on Architectures of Information Systems. Springer Berlin Heidelberg.
Heer, J., Kong, N., & Agrawala, M. 2009. “Sizing the horizon: the effects of chart size and layering on the graphical perception of time series visualizations.” In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ACM. pp. 1303–1312.
Khatri, V., Vessey, I., Ram, S., & Ramesh, V. 2006. “Cognitive fit between conceptual schemas and internal problem representations: The case of geospatio-temporal conceptual schema comprehension.” IEEE Transactions on Professional Communication, 49(2), 109–127.
Kim, J., Hahn, J., & Hahn, H. 2000. “How do we understand a system with (so) many diagrams? Cognitive integration processes in diagrammatic reasoning.” Information Systems Research, 11(3), 284–303.
Lakoff, G. 1987. Women, Fire, and Dangerous Things: What Categories Reveal about the Mind. University of Chicago Press.
Lukyanenko, R., Parsons, J., & Wiersma, Y. 2011. “Citizen science 2.0: Data management principles to harness the power of the crowd.” In Service-Oriented Perspectives in Design Science Research. Springer Berlin Heidelberg. pp. 465–473.
Lukyanenko, R., Parsons, J., & Wiersma, Y. 2014. “The IQ of the crowd: Understanding and improving information quality in structured user-generated content.” Information Systems Research, 25(4), 669–689.
March, J. G. 1991. “Exploration and exploitation in organizational learning.” Organization Science, 2(1), 71–87.
March, S. T., & Allen, G. N. 2014. “Toward a social ontology for conceptual modeling.” Communications of the Association for Information Systems, 34, Article 20. Available at http://aisel.aisnet.org/cais/vol34/iss1/70, accessed on 24/05/2016.
Mayer, R. E. 2003. “The promise of multimedia learning: using the same instructional design methods across different media.” Learning and Instruction, 13(2), 125–139.
McGarry, K. 2005. “A survey of interestingness measures for knowledge discovery.” The Knowledge Engineering Review, 20(01), 39–61.
Milton, S. K., Rajapakse, J., & Weber, R. 2012. “Ontological clarity, cognitive engagement, and conceptual model quality evaluation: An experimental investigation.” Journal of the Association for Information Systems, 13(9). Available at http://aisel.aisnet.org/jais/vol13/iss9/2, accessed on 28/05/2016.
Moody, D. L. 2002a. “Complexity effects on end user understanding of data models: An experimental comparison of large data model representation methods.” In ECIS 2002 Proceedings. pp. 482–496.
Moody, D. L. 2002b. “Comparative evaluation of large data model representation methods: The analyst’s perspective.” In Conceptual Modeling — ER 2002. S. Spaccapietra, S. T. March, & Y. Kambayashi (eds.). Springer-Verlag Berlin Heidelberg. pp. 214–231.
Moody, D. L. 2005. “Theoretical and practical issues in evaluating the quality of conceptual models: current state and future directions.” Data & Knowledge Engineering, 55(3), 243–276.
Morton, K., Balazinska, M., Grossman, D., & Mackinlay, J. 2014. “Support the data enthusiast: Challenges for next-generation data-analysis systems.” Proceedings of the VLDB Endowment, 7(6), pp. 453–456.
Munzner, T. 2014. Visualization Analysis and Design. CRC Press.
Mylopoulos, J. 1992. “Conceptual modelling and Telos 1.” In Conceptual Modeling, Databases and CASE: An Integrated View of Information System Development, P. Loucopoulos & R. Zicari (eds.). New York: John Wiley & Sons. 20 pp.
Newell, A. 1982. “The knowledge level.” Artificial Intelligence, 18(1), 87–127.
Newell, A., & Simon, H. A. 1972. Human Problem Solving. Englewood Cliffs, NJ: Prentice-Hall.
Ogden, W. C. 1986. “Implications of a cognitive model of database query: comparison of a natural language, formal language and direct manipulation interface.” ACM SIGCHI Bulletin, 18(2), 51–54.
Orwin, R. G. 1983. “A fail-safe N for effect size in meta-analysis.” Journal of Educational Statistics, 8(2), 157–159.
Paolini, P., & Di Blas, N. 2014. “Exploratory portals: The need for a new generation.” In Proceedings of the 2014 International Conference on Data Science and Advanced Analytics (DSAA), IEEE. pp. 581–586.
Parenteau, J., Sallam, R. L., Howson, C., Tapadinhas, J., Schlegel, K., & Oestreich, T. W. 2016. Magic Quadrant for Business Intelligence and Analytics Platforms. Available at https://www.gartner.com/doc/reprints?id=1-2XXET8P&ct=160204&st=sb, accessed on 29/02/2016.
Parsons, J. 2002. “Effects of local versus global schema diagrams on verification and communication in conceptual data modeling.” Journal of Management Information Systems, 19(3), 155–183.
Parsons, J. 2011. “An experimental study of the effects of representing property precedence on the comprehension of conceptual schemas.” Journal of the Association for Information Systems, 12(6), 441–462.
Parsons, J., & Cole, L. 2005. “What do the pictures mean? Guidelines for experimental evaluation of representation fidelity in diagrammatical conceptual modeling techniques.” Data & Knowledge Engineering, 55(3), 327–342.
Parsons, J., & Saunders, C. 2004. “Cognitive heuristics in software engineering: applying and extending anchoring and adjustment to artifact reuse.” IEEE Transactions on Software Engineering, 30(12), 873–888.
Parsons, J., & Wand, Y. 2000. “Emancipating instances from the tyranny of classes in information modeling.” ACM Transactions on Database Systems (TODS), 25(2), 228–268.
Parsons, J., & Wand, Y. 2008a. “A question of class.” Nature, 455(7216), 1040–1041.
Parsons, J., & Wand, Y. 2008b. “Using cognitive principles to guide classification in information systems modeling.” MIS Quarterly, 32(4), 839–868.
Parsons, J., & Wand, Y. 2013. “Cognitive principles to support information requirements agility.” In Advanced Information Systems Engineering Workshops. Springer Berlin Heidelberg, pp. 192–197.
Parsons, J., & Wand, Y. 2014. “A foundation for open information environments.” Proceedings of the 22nd European Conference on Information Systems, Tel Aviv, pp. 1–9.
Payne, J. W., Bettman, J. R., & Johnson, E. J. 1993. The Adaptive Decision Maker. Cambridge University Press.
Pipino, L. L., Lee, Y. W., & Wang, R. Y. 2002. “Data quality assessment.” Communications of the ACM, 45(4), 211–218.
Recker, J., Indulska, M., Rosemann, M., & Green, P. 2006. “How good is BPMN really? Insights from theory and practice.” In Proceedings of the 14th European Conference on Information Systems, June 12–14, 2006, Goeteborg, Sweden. J. Ljungberg and M. Magus (eds.). Paper 135.
Recker, J., Rosemann, M., Green, P. F., & Indulska, M. 2011. “Do ontological deficiencies in modeling grammars matter?” MIS Quarterly, 35(1), 57–79.
Rosenthal, R., & DiMatteo, M. R. 2002. “Meta-analysis.” Stevens’ Handbook of Experimental Psychology, Vol. 4 (3rd ed.). John Wiley and Sons.
Saghafi, A. 2012. Using Semantic Web Technologies to Implement Flexible Information Management Systems. Electronic Theses and Dissertations (ETDs), University of British Columbia.
Saghafi, A., & Wand, Y. 2014. “Do ontological guidelines improve understandability of conceptual models? A meta-analysis of empirical work.” In 47th Hawaii International Conference on System Sciences (HICSS), IEEE. pp. 4609–4618.
Samuel, B. 2011. “Conceptual model understanding: The role of instance information.” Proceedings of the SIGSAND Symposium on Research in Systems Analysis and Design, Bloomington, IN.
Searle, J. R. 2006. “Social ontology: Some basic principles.” Anthropological Theory, 6(1), 12–29.
Shaft, T. M., & Vessey, I. 2006. “The role of cognitive fit in the relationship between software comprehension and modification.” MIS Quarterly, 30(1), 29–55.
Shanks, G. G., Nuredini, J., Tobin, D., Moody, D. L., & Weber, R. 2003. “Representing things and properties in conceptual modelling: An empirical evaluation.” In ECIS 2002 Proceedings, pp. 1775–1785.
Shanks, G., Tansley, E., Nuredini, J., Tobin, D., & Weber, R. 2008. “Representing part-whole relations in conceptual modeling: an empirical evaluation.” MIS Quarterly, 32(3), 553–573.
Shanks, G., & Weber, R. 2012. “The hole in the whole: A response to Allen and March.” MIS Quarterly, 36(3), 965–980.
Silver, M. S. 2006. “Decisional guidance: Broadening the scope.” Human-Computer Interaction and Management Information Systems: Foundations, 90–119.
Simsion, G., Milton, S. K., & Shanks, G. 2012. “Data modeling: Description or design?” Information & Management, 49(3), 151–163.
Soffer, P., & Hadar, I. 2007. “Applying ontology-based rules to conceptual modeling: a reflection on modeling decision making.” European Journal of Information Systems, 16(5), 599–611.
Tan, C. W., Benbasat, I., & Cenfetelli, R. T. 2013. “IT-mediated customer service content and delivery in electronic governments: An empirical investigation of the antecedents of service quality.” MIS Quarterly, 37(1), 77–109.
Thomas, J. J., & Cook, K. A. 2006. “A visual analytics agenda.” IEEE Computer Graphics and Applications, 26(1), 10–13.
Tversky, A., & Kahneman, D. 1974. “Judgment under uncertainty: Heuristics and biases.” Science, 185(4157), 1124–1131.
Van Wijk, J. J., & Van Selow, E. R. 1999. “Cluster and calendar based visualization of time series data.” In Proceedings of the 1999 IEEE Symposium on Information Visualization (Info Vis ’99). pp. 4–9.
Vessey, I. 1991. “Cognitive fit: A theory-based analysis of the graphs versus tables literature.” Decision Sciences, 22(2), 219–240.
Wand, Y., Storey, V. C., & Weber, R. 1999. “An ontological analysis of the relationship construct in conceptual modeling.” ACM Transactions on Database Systems (TODS), 24(4), 494–528.
Wand, Y., & Wang, R. Y. 1996. “Anchoring data quality dimensions in ontological foundations.” Communications of the ACM, 39(11), 86–95.
Wand, Y., & Weber, R. 1989. “An ontological evaluation of systems analysis and design methods.” Information System Concepts: An In-Depth Analysis. Elsevier Science Publishers BV, North-Holland. pp. 79–107.
Wand, Y., & Weber, R. 1990. “An ontological model of an information system.” IEEE Transactions on Software Engineering, 16(11), 1282–1292.
Wand, Y., & Weber, R. 1993. “On the ontological expressiveness of information systems analysis and design grammars.” Information Systems Journal, 3(4), 217–237.
Wand, Y., & Weber, R. 1995. “On the deep structure of information systems.” Information Systems Journal, 5(3), 203–223.
Wand, Y., & Weber, R. 2002. “Research commentary: information systems and conceptual modeling—a research agenda.” Information Systems Research, 13(4), 363–376.
Wang, W., & Benbasat, I. 2009. “Interactive decision aids for consumer decision making in e-commerce: The influence of perceived strategy restrictiveness.” MIS Quarterly, 33(2), 293–320.
Wang, X., Mueen, A., Ding, H., Trajcevski, G., Scheuermann, P., & Keogh, E. 2013. “Experimental comparison of representation methods and distance measures for time series data.” Data Mining and Knowledge Discovery, 26(2), 275–309.
Weber, R. 1997. Ontological Foundations of Information Systems. Melbourne: Coopers & Lybrand and the Accounting Association of Australia and New Zealand.
Wixom, B. H., & Todd, P. A. 2005. “A theoretical integration of user satisfaction and technology acceptance.” Information Systems Research, 16(1), 85–102.
Xu, J. D., Benbasat, I., & Cenfetelli, R. T. 2013. “Integrating service quality with system and information quality: an empirical test in the e-service context.” MIS Quarterly, 37(3), 169–196.
Zikopoulos, P., & Eaton, C. 2011. Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data. McGraw-Hill Osborne Media.

Appendices

Appendix A  Material (Experiments in Chapters 3 and 4)

A.1 Pre-experiment Questionnaire

Table A1. Pre-experiment questionnaire

Have you ever written queries using a database management system?
a) Yes b) No

Compared to an average database user, I would rate my level of experience in database usage as:
1) Very low 2) Low 3) Somewhat low 4) Neither low nor high 5) Somewhat high 6) High 7) Very high

Compared to a regular traveler, I would rate my level of knowledge of activities in arranging trips as:
1) Very low 2) Low 3) Somewhat low 4) Neither low nor high 5) Somewhat high 6) High 7) Very high

Compared to someone who works in a consulting firm, I would rate my level of project management knowledge as:
1) Very low 2) Low 3) Somewhat low 4) Neither low nor high 5) Somewhat high 6) High 7) Very high

A.2 Post-experiment Questionnaire

Table A2. Post-experiment questionnaire (taken from Wang and Benbasat (2009) and Xu et al. (2013), reworded to reflect experimental task). Q1–Q3 measure restrictiveness, while Q4–Q6 measure adoption intentions.

Each item was answered on the same 7-point scale:
1) Strongly disagree 2) Disagree 3) Somewhat disagree 4) Neither agree nor disagree 5) Somewhat agree 6) Agree 7) Strongly agree

Q1. I had limited control over the way the representation presented the information.
Q2. In terms of my preferred way of viewing the information, the representation was confined.
Q3. In terms of my preferred way of viewing the information, the views were restricted.
Q4. Next time I need to perform such tasks, I would like to use this kind of representation.
Q5. Assuming I had access to the representation, I intend to use it to perform such tasks when needed.
Q6. Given that I had access to the representation, I predict that I would use it to perform such tasks if needed.

A.3 Travel Agency Domain (Experiment I)

Description: In this hypothetical domain, customers of the travel agency plan trips with the help of travel agents. The travel agency acts as an intermediary between customers and service providers (airlines, train services, etc.). The travel agents create itineraries for customers according to their preferences. The agent needs to collect customers’ payment information and, in some cases, information about special considerations such as allergies. The itinerary information is shared between the service providers, terminals, and travel agencies; it includes information such as travel locations (to and from), dates, and price.

Material for the control and treatment groups

The material for the control group included a general schema (see Figure A1) and actual data (see Figure A2).

Figure A1. Travel agency domain, control group general schema
Figure A2. Travel agency domain, control group actual data

The material for the treatment group included a general schema (Figure A3) and actual data (Figure A4).

Figure A3. Travel agency domain, treatment group general schema
Figure A4. Travel agency domain, treatment group actual data

A.4 Questions (both Control and Treatment)

1. Jennifer Nelson, one of the customers, called the agency and asked for a change in her itinerary. She is currently booked to fly on October 10 to England, but she needs to postpone the departure date to October 12. Please describe the procedure (i.e., changes of information) to fulfill Jennifer’s request.
2. Canada Border Services has asked the agency to identify all trips facilitated by two different types of service providers (e.g., a trip whose first half is by train and whose second half is by airplane). Describe the procedure for retrieving that information.
3. The manager wants to allocate commissions earned by one of the agents (named Harry Miller) at the end of the month. Describe the procedure for evaluating an agent’s monthly sales performance (e.g., in January 2014).
4. Air Canada needs a list of all passengers (flying with Air Canada) who have allergies of some sort. Describe the procedure for generating that list.

A.5 Consulting Firm Domain

Description: SimPro is a multi-national consulting firm with various divisions all over North America. Each division is responsible for delivering projects defined by its diverse range of clients (from individuals to corporations). The projects are completed and managed by firm employees within each division. For each project, the company keeps track of budget, start and end date, as well as information about the project’s owner (i.e., client) and manager (i.e., one of the employees). Employees need to be involved with a project at any given time. Their performance is evaluated using a timesheet, which stores the number of hours that they worked on a project.

A.6 Material for the Control and Treatment Groups

The material for the control group included a general schema (see Figure A5) and actual data (see Figure A6).

Figure A5. Consulting firm domain, control group general schema
Figure A6. Consulting firm domain, control group actual data
Figure A7. Consulting firm domain, treatment group general schema
Figure A8. Consulting firm domain, treatment group actual data

A.7 Questions (for both Control and Treatment)

1. Edward McKay, one of the employees, has asked for overtime pay for his effort in completing the market research project for the EZLink company. Describe the procedure to identify the average hours worked per day by Edward on the market research project (for the sake of simplicity, weekends and holidays are also included).
2. The headquarters wants to identify the clients that have worked with two or more divisions of SimPro. Please describe the procedure to identify such clients.
3. The manager assigned to the auditing project of Vancouver Canucks has left the firm. The managing partner at the firm has decided to remove Edward from the project that he is currently involved with, and assign him as the project manager of the Canucks’ audit. Describe the procedure for performing this task.
4. Headquarters also wants to identify SimPro divisions that serve clients that are individuals (not corporations). Please describe the process to identify those divisions.

Appendix B  Answer Keys: Experiment 1 and 2 (Chapter 3)

This Appendix gives the answer keys to Experiment I. Note that the questions are abbreviated to save space in both Tables B1 and B2.

Table B1. Answer key to travel agency case

Q1. Describe the procedure required for helping a client postpone her departure date.
Control (class-based):
a) Look up Jennifer Nelson’s “Customer_ID” from the customer table.
b) Access Jennifer’s itinerary from the “Itinerary” table, by searching for her “Customer_ID” (i.e., foreign key).
c) Using the “Itinerary_No” found in step b, find the “Departure Date” field in the “Itinerary_Details” table; the current value should be October 10.
d) Change the “Departure Date” on Jennifer’s itinerary to October 12.
Treatment (instance-based):
a) Locate the thing that has the “Name” property with the value of “Jennifer Nelson”.
b) Look up the “Travels With” link that originates from the thing representing “Jennifer Nelson”. This link should have “October 10” as the value for “Departure Date” property.
c) Considering the available dates on the service provider’s website, a new “Departure Date” will be set (e.g., October 12).
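
The two Q1 procedures above can be sketched in code to make the contrast concrete. The following Python sketch is an illustration only and is not part of the experimental material: the dictionary layout, the IDs, the provider named “Provider X”, and the function names are all assumptions introduced here, while the table, field, and link names are taken from the answer key. It assumes a single booking for the customer.

```python
# Hypothetical miniature data. Table, field, and link names follow the
# answer key; the IDs, the provider, and the single booking are invented.

# Class-based representation: three tables related through keys.
customer = [{"Customer_ID": 1, "Name": "Jennifer Nelson"}]
itinerary = [{"Itinerary_No": 100, "Customer_ID": 1}]
itinerary_details = [{"Itinerary_No": 100, "Departure Date": "October 10"}]


def postpone_class_based(name, old_date, new_date):
    """Control steps a-d: customer -> itinerary -> itinerary details."""
    customer_ids = {c["Customer_ID"] for c in customer if c["Name"] == name}
    itinerary_nos = {i["Itinerary_No"] for i in itinerary
                     if i["Customer_ID"] in customer_ids}
    changed = 0
    for row in itinerary_details:
        if (row["Itinerary_No"] in itinerary_nos
                and row["Departure Date"] == old_date):
            row["Departure Date"] = new_date
            changed += 1
    return changed


# Instance-based representation: things with properties, plus links that
# carry their own properties.
things = {
    "t1": {"Name": "Jennifer Nelson"},
    "t2": {"Name": "Provider X", "Service Type": "Airline"},
}
links = [{"label": "Travels With", "from": "t1", "to": "t2",
          "Departure Date": "October 10"}]


def postpone_instance_based(name, old_date, new_date):
    """Treatment steps a-c: thing -> outgoing link -> link property."""
    sources = {tid for tid, props in things.items()
               if props.get("Name") == name}
    changed = 0
    for link in links:
        if (link["label"] == "Travels With" and link["from"] in sources
                and link["Departure Date"] == old_date):
            link["Departure Date"] = new_date
            changed += 1
    return changed
```

Under these assumptions, the class-based version has to pass through two key matches before reaching the field to change, whereas the instance-based version goes from the thing directly to the property on its outgoing link. This mirrors the difference in the number of steps between the two answer keys.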

Q2. Describe the procedure for identifying trips that are facilitated by two different types of service providers (e.g., train and airplane).
Control (class-based):
a) In the Itinerary_Details table, locate entries with repeated itinerary numbers.
b) Look up the service providers that facilitate the trip (i.e., refer to the Service Provider table and look up the same ServiceProvider_ID from Itinerary_Detail).
c) For itineraries facilitated by more than one provider, display the records in which the Service Types are distinct from each other.
Treatment (instance-based):
a) Locate all the “Travels With” links that have the same “Itinerary_No”.
b) Go to the thing connected to the aforementioned links (i.e., the end of the link).
c) Short list the ones that have different values for the “Service Type” property (two service providers that are both “Airlines” are not considered different).
d) Display the “Itinerary_No” of the links that satisfied the conditions in a and b.

Q3. Describe the procedure for evaluating an agent’s monthly sales performance (01/2014).
Control (class-based):
a) Look up Harry Miller’s “Agent_ID” from the “Agent” table.
b) In the “Itinerary” table, list all the records that have Harry’s “Agent_ID” (i.e., foreign key) as one of their fields.
c) Select the itineraries with the confirmation dates within 01/01/2014 and 31/01/2014. Then, add up the prices of each itinerary.
Treatment (instance-based):
a) Search for a thing that has the “Name” property with the value of “Harry Miller”.
b) Look up all the links titled “Communicates With” that go to the thing that has “Harry Miller” as the “Name” property.
c) For the links that have “Confirmation Date”s within the range of 01/01/2014 to 31/01/2014, add up the values related to “Sales Price” property.

Q4. Describe the procedure for generating a list of all Air Canada passengers with allergies.
Control (class-based):
a) From the “Customers with Additional Considerations” table, look up the “Customer_ID” of all the customers that have allergies.
b) From the “Itinerary” table, find the “Itinerary_No”s belonging to “Customer_ID”s with allergies.
c) From the “Service Provider” table, look up AirCanada’s “ServiceProvider_ID”.
d) In the “Itinerary_Detail” table, look for the records with “Itinerary_No” from step b, and “ServiceProvider_ID” from step c.
e) Look up the “Itinerary_No”s found in step d in the “Itinerary” table.
f) The “Customer_ID”s corresponding to the “Itinerary_No”s should be looked up in the “Customer” table.
Treatment (instance-based):
a) Look for all things that have a “Travels With” link going out from them.
b) Short list the ones that have the “Allergy” property.
c) The thing at the end of the “Travels With” link would have the “Name” property and the value of it should be “Air Canada”. Print the value of the “Name” property of every thing that matches the pattern of “Thing A” mentioned above.

Table B2. Answer key to the consulting case

Q1. Describe the procedure to identify the average hours worked per day by an employee on a completed project.
Control (class-based):
a) Locate Edward McKay in the “Employee” table. Note his “Employee_ID”.
b) From the “Timesheet” table, find Edward’s timesheet using his “Employee_ID”. Note his “WorkHours”.
c) From the “Project” table, look up EZLink’s market research project.
d) Using the start and finish date of the project, calculate the number of days it took to complete the project.
e) Divide the work hours of the employee (e.g., Patrick) by the number of days it took to complete the project (for sake of simplicity, weekends and holidays are also included).
Treatment (instance-based):
a) Locate the thing that has “Edward McKay” as the value of its “Name” property.
b) Follow the “Serves” link that goes out from Edward’s node. Note the “WorkHours” property on the “Serves” link as well as “Project Name”.
c) From the thing at the end of the “Serves” link from step b, look for “Has Project” links that have the same “Project Name” value. Note “Start Date” and “End Date”.
d) Divide the work hours of the employee (e.g., Edward) by the number of days it took to complete the project (for sake of simplicity, weekends and holidays are also included).

Q2. Describe the procedure for identifying clients that have worked with two or more divisions.
Control (class-based):
a) Looking at the “Project” table, find “Client ID”s that have been paired with two or more different “Division ID”s.
b) Look up those “Client ID”s in the “Client” table and report the value under the “Name” property.
Treatment (instance-based):
a) Look up all the things that have two or more “Has Project” links going out of them.
b) Short list the ones for which the “Has Project” links lead to at least two different things (i.e., divisions).
c) Report the value of “Name” property from the things found in step b.

Q3. Describe the procedure of removing an employee from one project and assigning him as the manager of a different project.
Control (class-based):
a) Look up the particular project (e.g., audit for Vancouver Canucks) from the “Project” table and remove the current project manager ID.
b) Locate the new candidate for the project manager position (i.e., Edward McKay) from the “Employee” table. Note Edward’s “Employee_ID”.
c) Insert Edward’s “Employee_ID” as the manager of the project (e.g., audit for Vancouver Canucks).
d) Update Edward’s timesheet by going to the “Timesheet” table and putting the Canucks’ audit ID as the Project_ID.
Treatment (instance-based):
a) Locate the thing with “Edward McKay” as the value of “Name” property.
b) Remove the current “Serves” link and establish a new “Serves” link to the thing that has the “Name” property of “Canucks”.
c) On the “Serves” link, set the value of “Manages” to true.
d) Search for the “Serves” link going out of the previous manager’s node and set the value of “Manages” to false.

Q4. Describe the process to identify divisions that serve non-corporate clients.
Control (class-based):
a) Consider both the “Client” and “Corporate Client” tables.
b) From the “Client” table, exclude all the “Client_ID”s that are also present in the “Corporate Client” table.
c) From the client records that remained after step b, note their “Division_ID”.
d) In the “Division” table, look up the “Division_ID”s from step c. Print the name of the qualifying divisions.
Treatment (instance-based):
a) Look up all the things that have a “Has Project” link going out of them.
b) Short list the ones that do not have the “CorporateAccount_No” property.
c) Display the value of the “Name” property of the things that qualified from step b.

Appendix C  Examples of Marked Responses From Subjects in Experiment I (Chapter 3)

The 5-point marking scheme is demonstrated in Tables C1 and C2. For the sake of this example, we focused on only one particular question in the travel agency case. First we provide examples from subjects in the class-based group and then actual answers to the same question from participants in the instance-based group. Regarding the evaluation of the class-based condition, we should note that even though we stressed matching primary and foreign keys during the training (as well as in the course for 5 to 6 weeks), we accepted answers from users that described the steps in terms of natural joins (i.e., without discussing primary and foreign keys), as can be seen in the tables below.

Question: Canada Border Services has asked the agency to identify all trips facilitated by two different types of service providers (e.g., a trip whose first half is by train and whose second half is by airplane). Describe the procedure for retrieving that information.

Table C1. Sample scoring from answers of participants in the class-based group

Answer: a) In “Itinerary Detail” table, find “Itinerary_Nos” that occur multiple times. b) List the ones that have multiple “Service_Provider_IDs”. c) Run those Service_Provider_IDs through “Service Provider” table. d) Note the “Itinerary_Nos” that have two different types of service providers. (Subject #33)
Mark: 1.00
Explanation: Procedure leads to correct answer.

Answer: 1. In Itinerary Detail, look for service provider ID that has two different IDs for one itinerary no. 2. Note service provider ID. Under Service Provider, you will see the service type. (Subject #87)
Mark: 0.75
Explanation: The steps are correct. However, did not mention that service types should be distinct.

Answer: Look up itinerary table. Find service provider ID. Go to service provider table and identify service type with more than two types of services. (Subject #61)
Mark: 0.5
Explanation: Did not consider the fact that two service providers should facilitate a single trip (i.e., looking for the same itinerary-no constant). However, subject understood that two tables need to be joined.

Answer: a) Search under the Service Provider table for service provider IDs that have two service providers. b) Print the page. (Subject #87)
Mark: 0.25
Explanation: Subject provided only one step required to find the answer, and for that, s/he received 0.25.

Answer: Locate from Itinerary Detail the Itinerary No. (Subject #30)
Mark: 0
Explanation: Not a complete step.

Table C2. Sample scoring from answers of participants in the instance-based group

Answer: Locate a thing that has the name of a person and see if they have two “Travels With” arrows. Then look at the departure dates in the properties of the arrows. If the value is the same, then the trip is facilitated by two different types of service providers if the service type of the adjacent thing is different. (Subject #24)
Mark: 1.00
Explanation: Procedure leads to correct answer.

Answer: 1. Search for Travels With links that have itinerary_no appear more than once. 2. Find associated thing’s “Service Type”. 3. Provide list of links with the same Itinerary_No and the related thing’s “Service Type”. (Subject #124)
Mark: 0.75
Explanation: Service types should be different. Otherwise a trip that is facilitated by two airlines, for example, would also be qualified according to this procedure.

Answer: 1. Find node with two links of “Travels With” going away from the node. 2.
From those links, find “itinerary_no_ and record the values. (Subject #22) 0.5 Did not consider the value of “Service Type” property or the requirement that types should be different. Locate Itinerary_Nos occurring more than three times (connected to more than three things). If more than one is a service provider, it is correct. (Subject #39) 0.25 Does not lead to the correct answer, but the subject understood that the pattern for identifying those trips would include repetition of the Itinerary_No property.  1. Find all data associated with Canada Border Services with links to Service Type. 2. Identify all Service Type names related to Canada Border Services. (Subject #76)  0 Irrelevant answer.    152 Appendix D  Testing Distribution Normality and MVA Tables D.1 Experiment 1 from Chapter 3   Figure D1. Testing normality for the t-test, experiment (Chapter 3)  153 Table D1. Multivariate testsa table for Experiment 1 in Chapter 3 Effect Value F Hypoth-esis df Error df Sig. Partial eta squared Noncent. 
parameter Observed powerc Familiarity Pillai’s Trace .109 14.969b 1.000 122.000 .000 .109 14.969 .970 Wilks’ lambda .891 14.969b 1.000 122.000 .000 .109 14.969 .970 Hotelling’s trace .123 14.969b 1.000 122.000 .000 .109 14.969 .970 Roy’s largest Root .123 14.969b 1.000 122.000 .000 .109 14.969 .970 Familiarity * DatabaseK Pillai’s trace .012 1.446b 1.000 122.000 .231 .012 1.446 .222 Wilks’ lambda .988 1.446b 1.000 122.000 .231 .012 1.446 .222 Hotelling’s trace .012 1.446b 1.000 122.000 .231 .012 1.446 .222 Roy’s largest root .012 1.446b 1.000 122.000 .231 .012 1.446 .222 Familiarity * TravelK Pillai’s trace .011 1.339b 1.000 122.000 .249 .011 1.339 .209 Wilks’ lambda .989 1.339b 1.000 122.000 .249 .011 1.339 .209 Hotelling’s Trace .011 1.339b 1.000 122.000 .249 .011 1.339 .209 Roy’s largest root .011 1.339b 1.000 122.000 .249 .011 1.339 .209 Familiarity * ConsultingK Pillai’s trace .001 .154b 1.000 122.000 .696 .001 .154 .068 Wilks’ lambda .999 .154b 1.000 122.000 .696 .001 .154 .068 Hotelling’s trace .001 .154b 1.000 122.000 .696 .001 .154 .068 Roy’s largest root .001 .154b 1.000 122.000 .696 .001 .154 .068 Familiarity * Dummy_T_C Pillai’s trace .001 .160b 1.000 122.000 .689 .001 .160 .068 Wilks’ lambda .999 .160b 1.000 122.000 .689 .001 .160 .068 Hotelling’s trace .001 .160b 1.000 122.000 .689 .001 .160 .068 Roy’s largest root .001 .160b 1.000 122.000 .689 .001 .160 .068 Familiarity * ControlTreatment01 Pillai’s trace .067 8.764b 1.000 122.000 .004 .067 8.764 .836 Wilks’ lambda .933 8.764b 1.000 122.000 .004 .067 8.764 .836 Hotelling’s trace .072 8.764b 1.000 122.000 .004 .067 8.764 .836 Roy’s largest root .072 8.764b 1.000 122.000 .004 .067 8.764 .836  154 Effect Value F Hypoth-esis df Error df Sig. Partial eta squared Noncent. 
parameter Observed powerc Familiarity * SchemaData01 Pillai’strace .007 .818b 1.000 122.000 .368 .007 .818 .146 Wilks’ lambda .993 .818b 1.000 122.000 .368 .007 .818 .146 Hotelling’s trace .007 .818b 1.000 122.000 .368 .007 .818 .146 Roy’s largest root .007 .818b 1.000 122.000 .368 .007 .818 .146 Familiarity * ControlTreat-ment01 * SchemaData01 Pillai’s trace .000 .009b 1.000 122.000 .924 .000 .009 .051 Wilks’ lambda 1.000 .009b 1.000 122.000 .924 .000 .009 .051 hotelling’s trace .000 .009b 1.000 122.000 .924 .000 .009 .051 Roy’s largest root .000 .009b 1.000 122.000 .924 .000 .009 .051 a Design: Intercept + DatabaseK + TravelK + ConsultingK + Dummy_T_C + ControlTreatment01 + SchemaData01 + ControlTreatment01 * SchemaData01   Within Subjects Design: Familiarity b Exact statistic c Computed using alpha = 0.05     Table D2. Within-subjects effects for Experiment 1 in Chapter 3 Tests of Within-Subjects Effects Measure: Performance  Source Type III Sum of squares df Mean square F Sig. Partial eta squared Noncent. parameter Observed powera Familiarity Sphericity assumed 4.485 1 4.485 14.969 .000 .109 14.969 .970 Greenhouse-Geisser 4.485 1.000 4.485 14.969 .000 .109 14.969 .970 Huynh-Feldt 4.485 1.000 4.485 14.969 .000 .109 14.969 .970 Lower-bound 4.485 1.000 4.485 14.969 .000 .109 14.969 .970 Familiarity * DatabaseK Sphericity assumed .433 1 .433 1.446 .231 .012 1.446 .222 Greenhouse-Geisser .433 1.000 .433 1.446 .231 .012 1.446 .222 Huynh-Feldt .433 1.000 .433 1.446 .231 .012 1.446 .222 Lower-bound .433 1.000 .433 1.446 .231 .012 1.446 .222  155 Tests of Within-Subjects Effects Measure: Performance  Source Type III Sum of squares df Mean square F Sig. Partial eta squared Noncent. 
parameter Observed powera Familiarity * TravelK Sphericity assumed .401 1 .401 1.339 .249 .011 1.339 .209 Greenhouse-Geisser .401 1.000 .401 1.339 .249 .011 1.339 .209 Huynh-Feldt .401 1.000 .401 1.339 .249 .011 1.339 .209 Lower-bound .401 1.000 .401 1.339 .249 .011 1.339 .209 Familiarity * ConsultingK Sphericity assumed .046 1 .046 .154 .696 .001 .154 .068 Greenhouse-Geisser .046 1.000 .046 .154 .696 .001 .154 .068 Huynh-Feldt .046 1.000 .046 .154 .696 .001 .154 .068 Lower-bound .046 1.000 .046 .154 .696 .001 .154 .068 Familiarity * Dummy_T_C Sphericity assumed .048 1 .048 .160 .689 .001 .160 .068 Greenhouse-Geisser .048 1.000 .048 .160 .689 .001 .160 .068 Huynh-Feldt .048 1.000 .048 .160 .689 .001 .160 .068 Lower-bound .048 1.000 .048 .160 .689 .001 .160 .068 Familiarity * ControlTreat-ment01 Sphericity assumed 2.626 1 2.626 8.764 .004 .067 8.764 .836 Greenhouse-Geisser 2.626 1.000 2.626 8.764 .004 .067 8.764 .836 Huynh-Feldt 2.626 1.000 2.626 8.764 .004 .067 8.764 .836 Lower-bound 2.626 1.000 2.626 8.764 .004 .067 8.764 .836 Familiarity * SchemaData01 Sphericity assumed .245 1 .245 .818 .368 .007 .818 .146 Greenhouse-Geisser .245 1.000 .245 .818 .368 .007 .818 .146 Huynh-Feldt .245 1.000 .245 .818 .368 .007 .818 .146 Lower-bound .245 1.000 .245 .818 .368 .007 .818 .146 Familiarity * ControlTreat-ment01 * SchemaData01 Sphericity assumed .003 1 .003 .009 .924 .000 .009 .051 Greenhouse-Geisser .003 1.000 .003 .009 .924 .000 .009 .051 Huynh-Feldt .003 1.000 .003 .009 .924 .000 .009 .051  156 Tests of Within-Subjects Effects Measure: Performance  Source Type III Sum of squares df Mean square F Sig. Partial eta squared Noncent. parameter Observed powera Lower-bound .003 1.000 .003 .009 .924 .000 .009 .051 Error (Familiarity) Sphericity assumed 36.553 122 .300      Greenhouse-Geisser 36.553 122.000 .300      Huynh-Feldt 36.553 122.000 .300      Lower-bound 36.553 122.000 .300      a Computed using alpha = 0.05   Table D3. 
Within-subjects contrasts from Experiment 1 in Chapter 3 Tests of Within-Subjects Contrasts Measure: Performance Source Familiarity Type III Sum of squares df Mean square F Sig. Familiarity Linear 4.485 1 4.485 14.969 .000 Familiarity * DatabaseK Linear .433 1 .433 1.446 .231 Familiarity * TravelK Linear .401 1 .401 1.339 .249 Familiarity * ConsultingK Linear .046 1 .046 .154 .696 Familiarity * Dummy_T_C Linear .048 1 .048 .160 .689 Familiarity * ControlTreatment01 Linear 2.626 1 2.626 8.764 .004 Familiarity * SchemaData01 Linear .245 1 .245 .818 .368 Familiarity * ControlTreatment01 * SchemaData01 Linear .003 1 .003 .009 .924 Error(Familiarity) Linear 36.553 122 .300      157 Table D4. Tests of between-subjects effects from Experiment 1 of Chapter 3 Tests of Between-Subjects Effects Measure: Performance  Transformed Variable: Average Source Type III Sum of squares df Mean square F Sig. Intercept 97.916 1 97.916 140.951 .000 DatabaseK .432 1 .432 .621 .432 TravelK .001 1 .001 .001 .973 ConsultingK .132 1 .132 .191 .663 Dummy_T_C .077 1 .077 .111 .740 ControlTreatment01 39.805 1 39.805 57.300 .000 SchemaData01 .003 1 .003 .004 .948 ControlTreatment01 * SchemaData01 .008 1 .008 .011 .915 Error 84.751 122 .695    D.2 Experiment from Chapter 4  Figure D2. Testing normality for the t-test, experiment (Chapter 4)   158 Appendix E  Appendix E. Material for Experiment in Chapter 4 E.1 Human Resource Data: Case Description and Task ACME Corporation has provided data about personal information of its employees, their performance records, travel and telephone records, and Internet access logs. Your challenge is to report patterns that might be worth investigating further by the stakeholders of the company. For the purpose of this task a pattern is a consistent and recurring characteristic or trait that helps in the identification of a phenomenon or problem. E.2 Material for the Control Group Material for the control group included the schema of ACME (see Figure E1).  
Figure E1. Schema of ACME

E.3 Material for the Treatment Group

Data is not bound to any structure. A possible conceptualization of the information is presented in Figure E2 and Table E1:

Figure E2. Treatment group

Table E1. List of properties relevant in the domain

Property          Description
Action            Action taken in accordance with an employee’s performance
Address
Airline
Area Code
Arrival City
Arrival Date
Arrival Time
Birth Country
Birth Date
Business Cell
Business E-Mail
Business Phone
Caller Name
Citizen Country   A two-letter code identifying one’s citizenship
Citizen Status    Country of citizenship
City
Country
Date and Time     Date and time of accessing servers
Department
Departure City
Departure Date
Departure Time
Destination       Destination country of a phone call
Destination No.   Destination number of a phone call
Domain            Domain name of local servers
Duration          Duration of a phone call
Effective         Effective date for an executive decision (e.g., promotion)
Employee ID
Fax
First Name
Flight Number
Gender
Host IP           IP address of a web host
Host Name         Name of a web host
In/Out            Incoming or outgoing call
Job Code          A four-character code assigned to a position
Last Name
Loc. Area Code    A four-character code assigned to a location
Marital Status
Name
NTID              Network ID assigned to users (a la username)
Personal Cell
Personal E-Mail
Reason            Reason for an executive decision
Region
Source            Source country of a phone call
Source Number     Source number of a phone call
ST                State
Start Date        Date of employment
Status            Employment status (e.g., active)
Time
Zip Code

Appendix F  Examples of Visual Analytics Done by Subjects from Chapter 4

Examples of visual analytics are given in Figures F1 to F4.

F.1 Statement and Visualization from Subject #18 (Instance-based)

In IT department demotion due to performance is more than double males over females

Figure F1. Visual analytics by Subject #18 (instance-based)

F.2 Statement and Visualization from Subject #21 (Class-based)

Corporate department tends to have the most amount of pay increase.

Figure F2. Visual analytics by Subject #21 (class-based)

F.3 Statement and Visualization from Subject #35 (Instance-based)

2:00 PM always sees the greatest number of users accessing the servers in the day.

Figure F3. Visual analytics by Subject #35 (instance-based)

F.4 Statement and Visualization from Subject #12 (Class-based)

Employees travel most frequently in March and least frequently in February.

Figure F4. Visual analytics by Subject #12 (class-based)

Appendix G  Converting Class-based Tables to Instance-based

In a previous research study (under review), we proposed principles for representing instance-based data. In the proposed grammar, each “thing” in the domain (in the Bungean sense; Bunge 1977) is represented as a node along with the intrinsic properties it possesses. If two things are related to each other, information about their relation (i.e., their mutual property) is modelled as a link connecting the two things (or nodes). With this representation in mind, we discuss here a method for storing instance-based data in a tabular structure (informed by Parsons and Wand (2000)). We emphasize that in the proposed experiment users will analyze and make inferences regarding the data using the aforementioned representation (i.e., nodes and links). The method discussed below is used only by the database management system (DBMS) for storing the information.

Given access to class-based data, we gather a list of all the intrinsic properties that a thing could possess in that domain (e.g., customer name, customer telephone number, product brand, and product weight). A universal table can then be created that includes all of the intrinsic properties known in the domain. Each row in the table represents a “thing”, which could be of different kinds (or classes). For every intrinsic property that the thing possesses, we record the corresponding value, while the cells for properties that a given “thing” does not possess remain null (i.e., the null indicates that the thing lacks the property, not that the value is unknown).

For mutual properties that can exist in the domain, we create individual tables (one table per mutual property) and populate each table with the information that is shared between two things, along with pointers to the things (i.e., their row numbers or identifiers).

We have used this representation in a sample domain (the database of a retailer) and found in our initial tests that it was informationally equivalent to the class-based representation. More specifically, we ran some queries on the instance-based data and found the same answers as in the class-based representation. No information was lost following the proposed transformation method (from class-based to instance-based). However, we should mention that prior research has shown instance-based approaches to be more flexible than class-based approaches, providing agility in the face of changing requirements (Parsons and Wand 2013) and enabling generation of higher quality data by data contributors in user-generated content settings (Lukyanenko et al. 2014).
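The transformation described in this appendix can be sketched in a few lines of Python. This is an illustration only, not the procedure or DBMS used in the experiments: the tables, column names, values (such as "Region"), and helper functions below are all hypothetical, and dropping the surrogate key columns from the universal table is an assumption made here for brevity. The sketch builds one universal table of intrinsic properties, one table for a single mutual property ("Has Project"), and then runs the same question (which clients have worked with two or more divisions) against both representations.

```python
# Hypothetical class-based tables for a small consulting domain.
# All names and values are illustrative; none come from the experiments.
clients = [
    {"Client_ID": 1, "Name": "EZLink"},
    {"Client_ID": 2, "Name": "Vancouver Canucks"},
]
divisions = [
    {"Division_ID": 10, "Name": "Audit", "Region": "West"},
    {"Division_ID": 20, "Name": "Market Research", "Region": "East"},
]
projects = [  # each row relates one client to one division
    {"Client_ID": 1, "Division_ID": 10, "Project Name": "Annual audit"},
    {"Client_ID": 1, "Division_ID": 20, "Project Name": "Market research"},
    {"Client_ID": 2, "Division_ID": 10, "Project Name": "Audit"},
]


def build_universal_table(class_tables):
    """Merge class tables into one table with a column per intrinsic property.

    class_tables maps a class name to (key column, rows). Key columns are
    dropped (an assumption of this sketch); row numbers act as pointers.
    """
    columns = sorted(
        {col for key, rows in class_tables.values()
         for row in rows for col in row if col != key}
    )
    things, thing_id = [], {}
    for cls, (key, rows) in class_tables.items():
        for row in rows:
            thing_id[(cls, row[key])] = len(things)
            # Properties the thing does not possess stay None (null).
            things.append({col: row.get(col) for col in columns})
    return things, thing_id


def build_mutual_table(rows, source, target, thing_id):
    """Turn a relating table into links: two pointers plus shared properties."""
    (s_cls, s_fk), (t_cls, t_fk) = source, target
    return [
        {"from": thing_id[(s_cls, row[s_fk])],
         "to": thing_id[(t_cls, row[t_fk])],
         **{k: v for k, v in row.items() if k not in (s_fk, t_fk)}}
        for row in rows
    ]


def multi_division_clients_class_based():
    """Clients paired with two or more distinct divisions in the project table."""
    seen = {}
    for p in projects:
        seen.setdefault(p["Client_ID"], set()).add(p["Division_ID"])
    return sorted(c["Name"] for c in clients
                  if len(seen.get(c["Client_ID"], ())) >= 2)


def multi_division_clients_instance_based(things, links):
    """Things whose outgoing links lead to two or more different things."""
    targets = {}
    for link in links:
        targets.setdefault(link["from"], set()).add(link["to"])
    return sorted(things[i]["Name"] for i, ends in targets.items()
                  if len(ends) >= 2)


things, thing_id = build_universal_table(
    {"Client": ("Client_ID", clients), "Division": ("Division_ID", divisions)}
)
has_project = build_mutual_table(
    projects, ("Client", "Client_ID"), ("Division", "Division_ID"), thing_id
)
class_answer = multi_division_clients_class_based()
instance_answer = multi_division_clients_instance_based(things, has_project)
print(class_answer == instance_answer)
```

In this toy domain the two queries return the same list of client names, which mirrors the kind of equivalence check described above. It does not demonstrate equivalence in general.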
