Graphics: Theories & Experiments. Tan, Joseph K. H., 1988.

GRAPHICS: THEORIES & EXPERIMENTS

by

JOSEPH K. H. TAN
M.S., The University of Iowa, 1982

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES, Faculty of Commerce & Business Administration

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
28 December 1988
© Joseph K. H. Tan, 1988

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

ABSTRACT

GRAPHICS: THEORIES & EXPERIMENTS
by Joseph K. H. Tan, December 1988
Supervisor: Dr. Izak Benbasat
Division: Management Information Systems

The primary justification of this research lies in the current thinking among graphics theorists and Management Information Systems researchers that different forms of information representation facilitate different types of tasks, and that it is the task characteristics which essentially influence performance with a given information presentation. Three experiments were designed to investigate hypotheses drawn from the literature testing the relative strengths and weaknesses of various graphical representations for answering a series of questions. Three graph formats were studied: bars, symbols, and lines. Time is the primary dependent variable of interest in this research; accuracy is a secondary criterion.
The tasks investigated involved the extraction of relationships among elementary classes of information depicted on various attribute components of time series data: (1) the Dependent Variable (DV) component (namely, information on scale-values, level relationships, and trends); (2) the Primary Independent Variable (PIV) component (namely, information on abscissa time period); and (3) the Secondary Independent Variable (SIV) component (namely, information on dataset classification). Experiment 1 tasks involved the extraction of the DV scale-value (Q1), DV level relationship (Q2), and DV trend (Q3) based on specific time period information on the PIV component. Results indicated that lines took longest for Q1 when compared to bars and symbols. In contrast, experiment 2 tasks involved the extraction of time period information based on a specific DV scale-value (Q1), the DV level relationship between two points (Q2), or the DV trend among several points (Q3). No statistically significant time differences were found among the various graph formats; however, lines were less accurate to use than bars for answering Q1. Experiment 3 tasks involved the extraction of dataset information from the SIV component based solely on a specific DV scale-value (Q1), the DV level relationship between two points (Q2), or the DV trend among several points (Q3). Results revealed that the time required for answering either Q2 or Q3 was longest with bars. Together, these results strongly indicated that the degree of support provided by a particular graph format for a particular task is heavily dependent upon the matching of task characteristics with graph format characteristics. Having information related to either the answer or the question anchored on the x-axis and/or y-axis was found to influence task performance with the different graph formats investigated. Also, the information complexity of graphics was found to be a function of time periods and/or datasets.
There was only partial evidence to suggest the influence of individual characteristics on performance.

TABLE OF CONTENTS

ABSTRACT
List of Tables
List of Figures
Acknowledgements
I. INTRODUCTION
   A. Objective of the Research
   B. Scope of the Research
   C. Importance of the Research
   D. Overview of the Dissertation
II. LITERATURE REVIEW
   A. Classifying the Empirical Literature
   B. Existing Theoretical Development
      1. Theories
      2. Extensibility of Graphics Theories
   C. Chernoff's List of Attributes
      1. Chernoff's Theory
      2. Implications of Chernoff's Theory
   D. Bertin's Image Theory & Taxonomies
      1. Definitions
      2. Bertin's Theory
      3. Implications of Bertin's Theory
   E. Graphics Research at Indiana University
      1. The Question Construct
      2. The Information Set Complexity Construct
      3. Implications of Recent MIS Graphics Research Findings
   F. Cleveland's Theory of Graphical Perception
      1. The Cleveland-McGill Theory
      2. Application of the Cleveland-McGill Theory to Graphing Data
      3. Implications of Cleveland's Theory
   G. The Kosslyn-Pinker Theory of Graph Comprehension
      1. The Kosslyn et al. Analytical Scheme
      2. The Kosslyn-Pinker Process Model of Graph Comprehension
         a. Structures & Processes
         b. Pinker's Graph Difficulty Principle
         c. Pinker's Treatment of Information Extraction
      3. Graph Schema Models
         a. A Bar Chart Schema
         b. A Symbol Chart Schema
         c. A Line Graph Schema
      4. Implications of the Kosslyn-Pinker Theory
III. THEORETICAL PROPOSITIONS
   A. Critical Factors
      1. Graph Format
      2. Information Complexity
      3. The Task Variable
      4. Learning
   B. Tasks Investigated in this Research
      1. Experiment 1 Tasks
      2. Experiment 2 Tasks
      3. Experiment 3 Tasks
   C. Theory & Propositions
      1. The Theory Investigated
         a. Proposition 1
         b. Proposition 2
         c. Proposition 3
   D. The Anchoring Concept
      1. Task Characteristics
      2. Graph Format Characteristics
      3. Matching Formats to Tasks
         a. Proposition 4
         b. Proposition 5
         c. Proposition 6
         d. Proposition 7
IV. EXPERIMENTAL METHODOLOGY
   A. Experimental Variables
      1. The Dependent Variables
         a. Time
         b. Accuracy
      2. The Independent Variables
         a. Graph Format
         b. Question Type
         c. Information Complexity
      3. The Session Variable
      4. The Covariate
   B. Experimental Hypotheses
   C. Experimental Design
   D. Experimental Procedures
   E. Experimental Stimuli
V. DATA ANALYSIS: THE REPEATED MEASURES DESIGN
   A. The Repeated Measures Design
   B. Statistical Analysis Procedures
   C. The Experimental Raw Data
   D. Examination of the Data Structure
      1. The Normality Assumption
      2. Homogeneity of Variance/Covariance
      3. The Symmetry Condition
      4. The Univariate-Multivariate ANOVA/ANCOVA Issue
      5. Multiple Comparison Techniques
   E. Summary
VI. RESULTS: EXPERIMENT 1
   A. Time Performance for Combined Sessions
      1. The Session Effect
      2. The GEFT Measure
      3. Additional Outliers
      4. The Power Analysis
   B. Time Performance for Separate Sessions
      1. Significant Effects on Time for Session 1
      2. Significant Effects on Time for Session 2
         a. Main Factor Effects on Time for Session 2
         b. Two-way Interactions on Time for Session 2
         c. Three-way Interactions on Time for Session 2
   C. Accuracy Performance for Combined Sessions
      1. Main Effects on Accuracy for Transformed Data
      2. Two-way Interactions on Accuracy for Transformed Data
   D. Summary of Experiment 1 Results
VII. RESULTS: EXPERIMENT 2
   A. Time Performance for Combined Sessions
      1. The Session Effect
      2. The GEFT Measure
      3. Additional Outliers
      4. The Power Analysis
   B. Time Performance for Separate Sessions
      1. Significant Effects on Time for Session 1
      2. Significant Effects on Time for Session 2
         a. Main Factor Effects on Time for Session 2
         b. Two-way Interactions on Time for Session 2
         c. Three-way Interactions on Time for Session 2
   C. Accuracy Performance for Combined Sessions
      1. Main Effects on Accuracy for Transformed Data
      2. Two-way Interactions on Accuracy for Transformed Data
   D. Summary of Experiment 2 Results
VIII. RESULTS: EXPERIMENT 3
   A. Time Performance for Combined Sessions
      1. The Session Effect
      2. The GEFT Measure
      3. Additional Outliers
      4. The Power Analysis
   B. Time Performance for Separate Sessions
      1. Significant Effects on Time for Session 1
      2. Significant Effects on Time for Session 2
         a. Main Factor Effects on Time for Session 2
         b. Two-way Interactions on Time for Session 2
   C. Accuracy Performance for Combined Sessions
      1. Main Effects on Accuracy for Transformed Data
      2. Two-way Interactions on Accuracy for Transformed Data
   D. Summary of Experiment 3 Findings
IX. INTEGRATION OF RESULTS
   A. Overview of Key Findings
      1. Effects for Time Performance in Session 1
      2. Effects for Time in Session 2
         a. Main Factor Effect
         b. 2-way Interactions
   B. Integration of Findings with the Current Literature
      1. Learning
      2. The Individual Difference Characteristics
      3. Task Characteristics
      4. Graph Format
      5. Information Complexity
      6. Perceptual-Cognitive Mechanisms in Graphics Processing
   C. Summary
X. CONCLUSIONS
   A. Summary of Key Findings and Major Contributions
      a. Contributions
      b. Findings
   B. Review of Limitations
      a. Limitations
      b. Implications of Specific Limitations
   C. Suggestions for Future Studies
XI. BIBLIOGRAPHY
XII. APPENDIX A: GLOSSARY OF TERMS
XIII. APPENDIX B: GRAPHICS & QUESTIONS FOR TRIALS IN EXPERIMENT 1
XIV. APPENDIX C: GRAPHICS & QUESTIONS FOR TRIALS IN EXPERIMENT 2
XV. APPENDIX D: GRAPHICS & QUESTIONS FOR TRIALS IN EXPERIMENT 3
XVI. APPENDIX E: SUBJECT RECRUITMENT FORM
XVII. APPENDIX F: SUBJECT CONSENT FORM
XVIII. APPENDIX G: OUTLINE OF EXPERIMENTAL PROCEDURES
XIX. APPENDIX H: INSTRUCTIONS FOR SUBJECTS
XX. APPENDIX I: PILOT TESTING REPORT
XXI. APPENDIX J: QUESTIONNAIRE FOR SUBJECTS
XXII. APPENDIX K: CERTIFICATE OF APPROVAL: ETHICAL REVIEW COMMITTEE
XXIII. APPENDIX L: MAIN TURBO PASCAL PROGRAM
XXIV. APPENDIX M: SAMPLE BMDP STATISTICAL PROGRAM

LIST OF TABLES

3.1: A Classification Scheme for Information Complexity Factors
3.2: A General Classification of Graphics Research Tasks
3.3: Classes of Elementary Comprehension Tasks
3.4: A Comparison of Task Activities for Experiments 1, 2, and 3
3.5: Status of Graphical Component Information for Task Activities in Experiments 1, 2, and 3
3.6: Information Complexity Manipulated in Experiments E1 and E2
3.7: Information Complexity Manipulated in Experiment E3
3.8: The Anchoring Concept
4.1: A Multi-factor Repeated Measures Experimental Design
5.1: Summary of Experimental Raw Datasets
6.1: Initial ANCOVA Results for the Full Dataset (Experiment 1)
6.2: Time-Accuracy Correlations for Identifying Additional Outliers (Experiment 1, Session 2)
6.3: Comparison of ANOVA Results Among Sessions (Experiment 1, Additional Outliers Excluded)
6.4: Tables of Means for All Treatment Combinations (Experiment 1, Outliers Excluded)
6.5: Summary of Bonferroni Results for Graph Format x Question Type Interaction (Experiment 1, Session 1)
6.6: Summary of Bonferroni Results for Graph Format x Question Type Interaction (Experiment 1, Session 2)
6.7: Summary of Bonferroni Results for Question Type x Time Period Interaction (Experiment 1, Session 2)
6.8: Summary of Bonferroni Results for Graph Format x Dataset Interaction (Experiment 1, Session 2)
6.9: Data Table of Question Type x Time Period x Dataset Interaction (Experiment 1, Session 2)
6.10: Summary of Bonferroni Tests for the Question Type x Time Period x Dataset Interaction (Experiment 1, Session 2)
6.11: Comparison of ANCOVA Results for Sessions 1, 2, and the Transformed Dataset (Experiment 1, Outliers Excluded)
6.12: Mean Values of the Question Type x Time Period Interaction for Transformed Data (Experiment 1, Outliers Excluded)
7.1: Initial ANCOVA Results for the Full Dataset (Experiment 2)
7.2: Time-Accuracy Correlations for Identifying Additional Outliers (Experiment 2, Session 2)
7.3: Comparison of ANOVA Results Among Sessions (Experiment 2, Additional Outliers Excluded)
7.4: Tables of Means for All Treatment Combinations (Experiment 2, Additional Outliers Excluded)
7.5: Summary of Bonferroni Results for Graph Format x Question Type Interaction (Experiment 2, Session 1)
7.6: Summary of Bonferroni Results for Graph Format x Dataset Interaction (Experiment 2, Session 2)
7.7: Summary of Bonferroni Results for Time Period x Dataset Interaction (Experiment 2, Session 2)
7.8: Mean Values of Graph Format x Question Type x Time Period Interaction (Experiment 2, Session 2)
7.9: Mean Values of Question Type x Time Period x Dataset Interaction (Experiment 2, Session 2)
7.10: Summary of Bonferroni Results for Question Type x Time Period x Dataset Interaction (Experiment 2, Session 2)
7.11: Comparison of ANCOVA Results for Sessions 1, 2, and the Transformed Dataset (Experiment 2, Outliers Excluded)
7.12: Mean Value Tables for Significant Two-factor Interactions for the Transformed Dataset (Experiment 2, Outliers Excluded)
8.1: Initial ANCOVA Results for the Full Dataset (Experiment 3)
8.2: Time-Accuracy Correlations for Identifying Additional Outliers (Experiment 3, Session 2)
8.3: Comparison of ANOVA Results Among Sessions (Experiment 3, Additional Outliers Excluded)
8.4: Tables of Means for All Treatment Combinations (Experiment 3, Outliers Excluded)
8.5: Summary of Bonferroni Results for Graph Format x Question Type Interaction (Experiment 3, Session 1)
8.6: Summary of Bonferroni Results for Graph Format x Question Type Interaction (Experiment 3, Session 2)
8.7: Summary of Bonferroni Results for Question Type x Time Period Interaction (Experiment 3, Session 2)
8.8: Summary of Bonferroni Results for Graph Format x Dataset Interaction (Experiment 3, Session 2)
8.9: Summary of Bonferroni Results for Time Period x Dataset Interaction (Experiment 3, Session 2)
8.10: Comparison of ANCOVA Results for Sessions 1, 2, and the Transformed Dataset (Experiment 3, Outliers Excluded)
8.11: Bonferroni Test Results and Mean Value Table for Question Type x Time Period Interaction of Transformed Data (Experiment 3)
8.12: Mean Value Table for Question Type x Dataset Interaction of Transformed Data (Experiment 3, Outliers Excluded)
8.13: Bonferroni Test Results and Mean Value Table for Time Period x Dataset Interaction of Transformed Data (Experiment 3)
9.1: Overview of Key Findings for Experiments E1, E2, and E3 (Session 1 Results)
9.2: Overview of Main Factor Effect on Graph Format Characteristics for Experiments E1, E2, and E3 (Session 2 Results)
9.3: Overview of Two-Factor Interaction Effects on Graph Format Characteristics for Experiments E1, E2, and E3 (Session 2 Results)

LIST OF FIGURES

2.1: Three Visual Information Processing Stages
2.2: Kosslyn et al.'s Basic Level Constituents
2.3: Pinker's Graphic Notation
2.4: Pinker's Process Model of Graph Comprehension
2.5: Pinker's Information Extraction from a Bar Chart
2.6: Pinker's Proposed Bar Chart Schema
3.1: An Illustration of Major Graphical Components
3.2: Pinker's Illustrations of Graph Designs for Trend Reading
4.1: An Experimental Procedure Flowchart
5.1: A Repeated Measures Design
5.2: A Guide for Selection of Multiple-Comparison
6.1: Plot and Mean Values of Graph Format x Question Type Interaction (Experiment 1, Session 1)
6.2: Plot and Mean Values of Graph Format x Question Type Interaction (Experiment 1, Session 2)
6.3: Plot and Mean Values of Question Type x Time Period Interaction (Experiment 1, Session 2)
6.4: Plot and Mean Values of Graph Format x Dataset Interaction (Experiment 1, Session 2)
7.1: Plot and Mean Values of Graph Format x Question Type Interaction (Experiment 2, Session 1)
7.2: Plot and Mean Values of Graph Format x Dataset Interaction (Experiment 2, Session 2)
7.3: Plot and Mean Values of Time Period x Dataset Interaction (Experiment 2, Session 2)
8.1: Plot and Mean Values of Graph Format x Question Type Interaction (Experiment 3, Session 1)
8.2: Plot and Mean Values of Graph Format x Question Type Interaction (Experiment 3, Session 2)
8.3: Plot and Mean Values of Question Type x Time Period Interaction (Experiment 3, Session 2)
8.4: Plot and Mean Values of Graph Format x Dataset Interaction (Experiment 3, Session 2)
8.5: Plot and Mean Values of Time Period x Dataset Interaction (Experiment 3, Session 2)

ACKNOWLEDGEMENTS

I dedicate this work to my dear wife, Ms. Leonie Tan.
Certainly, without her patient understanding, love, continuing inspiration, and assistance, this project would not have come to an end. The responsibility for all omissions and errors must, of course, remain with me.

In terms of my dissertation committee, I must surely give credit and make acknowledgement to Dr. Izak Benbasat, chairman of the MIS division as well as of this research committee, whose valuable assistance in all aspects of this effort can never be sufficiently emphasized. I must also give credit to the other members of my committee, namely, Dr. Al Dexter and Dr. Larry Ward, whose continuing support, guidance, and encouragement (however great or small) at the various stages of this research are most dearly appreciated. I would also like to express my sincere appreciation to the members of my Examination committee, all MIS faculty members, and all experimental subjects, because without their participation this dissertation would not have been a success. I would like to give special thanks also to those faculty members of the Psychology, Statistics, and Health Care and Epidemiology Departments and staff members of the Computing Center who have assisted in the design, statistical analyses, and programming of this project. In particular, I wish to express my gratitude to my colleagues in the Department of Health Care and Epidemiology: Dr. Godwin Eni, for his genuine concern, encouragement, and support; and Dr. John Milsum, for his many excellent suggestions, comments, and insights in proof-reading the script. Finally, I must give credit to my parents and other members of my immediate family for their continuing patience in following me to the end of this journey.

I. INTRODUCTION

Interest in the use and study of graphics can be found across many disciplines such as Statistics, Education, Cartography, Psychology, Human Factors Engineering, and Management Information Systems (MIS).
Yet, graphics research over the last several years has been criticized as consisting of atheoretical studies that are plagued by serious methodological problems and controversial findings (see Ives, 1982; Kosslyn, Pinker, Simcox, & Parkin, 1983; Jarvenpaa, Dickson, & DeSanctis, 1985). Even so, isolated contributions in the area have come from a wide spectrum of theoretical perspectives that are sometimes very difficult to reconcile, including: a semiological approach (e.g. Bertin, 1973, 1983; Wainer, Lono, & Groves, 1982); the application of psychophysical laws (e.g. Weber's and Stevens's laws: Baird & Noma, 1978; Cleveland & McGill, 1984); and an integration of general perceptual and cognitive theory (e.g. Kosslyn et al., 1983; Pinker, 1981, 1983). While several of these recent works, and others (e.g. Benbasat, Dexter, & Todd, 1986; Jarvenpaa & Dickson, 1988), are beginning to offer promising guidelines for graphics designers based on theory and research rather than on debatable rules of thumb, there remains a host of identifiable gaps among the theories that must be filled if knowledge about graphics is to advance (see Vessey, 1987).

A. OBJECTIVE OF THE RESEARCH

The purpose of this investigation is to address the following question: What are the relative strengths and weaknesses of various graphical representations for different types of managerial data extraction tasks? Bertin (1983; p. 100) suggests that the basic problem in graphics is to choose the most appropriate graphic design for representing a given set of information. In parallel, the current thinking expressed among MIS graphics researchers (e.g. DeSanctis, 1984; Dickson, DeSanctis, & McBride, 1986; Benbasat et al., 1986; Jarvenpaa & Dickson, 1988) and other graphics theorists (e.g. Pinker, 1981; Kosslyn et al., 1983; Cleveland, 1985) is that different graphical representations facilitate the performance of different tasks.
Accordingly, a major objective of this research is to test hypotheses based on current beliefs about information representations. These hypotheses are developed in the next few chapters. The method of investigation is a program of laboratory experiments testing possible relationships between various graphic designs and performance on various elementary perceptual-cognitive tasks.

In this research, the relative strengths and weaknesses of various graphical designs will be evaluated principally on the basis of Time as the major dependent variable for each experiment. This measure is defined in terms of latency of responses. Accuracy, as measured by the percentage of correct responses,† will be a secondary criterion. Thus, for a generally acceptable level of accuracy in task performance, the objective is to minimize the time required to extract the appropriate data from a graphical presentation.

Different individuals may exhibit different degrees of time-accuracy tradeoff. Researchers should control for this effect, all the more so when relatively complex tasks are to be investigated, in order to maintain high internal validity and an unambiguous interpretation of experimental results. This could be done, for example, with the use of control groups whose performance is compared to that of experimental groups. In this research, the time-accuracy tradeoff effect is controlled, first, by the inclusion of both time and accuracy measures, and, second, by the elimination from the final analyses of those subjects who show a significant time-accuracy tradeoff.

Independent variables of interest included in this research are:
1. Graph Format
2. Information Complexity, which includes
   a. Variations in Time Period
   b. Variations in Dataset Category
3. Question Type (i.e. task)

A Session variable is used to control effects due to learning. This implies not only that subjects assigned to each experiment will undergo an initial practice session, but also that they will replicate all thirty-six different treatment combinations, or trials, in the experimental session. Finally, the GEFT (Group Embedded Figures Test) developed by Witkin, Oltman, & Raskin (1971) is to be used as a covariate measure to control effects due to individual differences among subjects. Important terms used in the Thesis are defined in the Glossary (Appendix A).

† The more general term Accuracy is used instead of the engineering term Precision so as to be consistent with the MIS literature on graphics evaluation (e.g. Wainer et al., 1982; Davis, 1985; Yoo, 1985; Lauer, 1986). Hence, Perf (Accuracy) = No. of Correct Responses / Total No. of Responses x 100%.

B. SCOPE OF THE RESEARCH

The scope of the research program is limited to determining the effects of a selected set of information representation characteristics in a controlled setting. The individual is the unit of analysis; group effects are not addressed. The research concerns only the most usual purpose for which business time-series graphics are employed: the extraction of elementary quantitative information. Its focus is on graph comprehension tasks rather than on composite problem-solving or complex decision-making tasks (see Davis, Groomer, Jenkins, Lauer, & Kwan, 1985; Davis, 1985; Vessey, 1987 for critiques) or on the recall of quantitative graphical information (see Macdonald-Ross, 1977b).

C. IMPORTANCE OF THE RESEARCH

Although MIS graphics research focuses primarily on the efficiency and effectiveness of computer graphics as decision aids (e.g. DeSanctis, 1984; DeSanctis & Jarvenpaa, 1985; Davis & Olson, 1985), it is important to note that the use of graphics to aid decision-making may essentially be a composite process of many disaggregated processes.† It is essential, then, if we are to understand the use of graphics as decision-making aids, that we be able to decompose complex tasks (see Benbasat & Dexter, 1985, 1986; Benbasat et al., 1986) to a level at which we can understand the underlying mechanisms (see Blalock, 1969). Perhaps this explains why progress in the understanding of graphics as decision aids has been slow even though many one-shot studies on the use of graphics and/or color at complex levels of decision-making and problem solving have been conducted. Indeed, there has been a growing concern over the concentration of prior MIS graphics research on the mainly macro level of decision-making tasks (see Benbasat et al., 1986), when there appears to be an equal, if not more important, need to investigate tasks at the more micro level of graph comprehension. It is time that MIS researchers also become acquainted with the use of graphics at the level of visual and logical extraction of quantitative and/or qualitative information‡ in answering specific questions.

† Both Bertin's and Pinker's general propositions on graphics processing strongly support such a viewpoint.
‡ Kosslyn et al. (1983; p. 272) contend that charts usually convey information about qualitative relationships (e.g. "is a member of" or "occurs after") whereas graphs always convey information about quantitative relationships (e.g. "x has more than y").

Research on the use of various graphical designs for performing fundamental tasks can benefit both the practicing and academic communities:

1. Despite conflicting and weak empirical evidence on the use of graphics as a decision support tool (Dickson et al., 1986), the use of business graphics continues to proliferate (Lehman, Vogel, & Dickson, 1984).

2. Graphics research has failed to build on previous work. Perhaps this is partly responsible for the limited progress made in the field. On the one hand, graphics researchers interested in human perceptual-cognitive processes have ignored the findings presented in the MIS field, probably because the field is still in a developing stage; on the other hand, graphics researchers in MIS are generally unaware of existing theories of graphics information processing advanced in related disciplines (e.g. Chernoff, 1978; Bertin, 1981; Pinker, 1981; Kosslyn et al., 1983; Cleveland & McGill, 1984).† MIS, as an interdisciplinary field, is especially suited to bridging existing gaps among graphics related disciplines, thereby cutting down on duplication.

† There is hope that this situation will improve. For instance, a stream of recent MIS graphics research with the focus of formalizing a theory of information presentation has been based on Bertin's theory (see Yoo, 1985; Davis et al., 1985; Lauer, Davis, Groomer, Jenkins, & Kwan, 1985).

3. Although this research program is primarily driven by the need to test existing claims by graphics theorists so that the study of graphics can advance, a large part of its motivation comes also from the need to fill in obvious gaps among current theories. Neither the theoretical nor the empirical literature, for example, provides clear statements on the effects of information complexity factors and their interactions with other variables (e.g. graph format) when different types of questions are asked of conventional time series graphics. In addressing these issues, the research program has also attempted to identify some of the knowledge gaps existing among the theories as well as contribute towards filling those gaps.

4. Major methodological problems with earlier MIS graphics research include the lack of adequate control of the task variable, a key factor identified in current theories of graph perception and comprehension, and the lack of a priori, theory-based predictions about performance with various graphical designs. This lack of theoretical perspective among earlier graphics researchers in MIS is most evident in their failure to distinguish among the uses of various graph formats and the lack of strong rationalization as to why one graph format would be better suited to a particular task than another. This program of research aimed at overcoming these drawbacks.

5. Despite much graphics research, little is understood of how people read and understand graphs at the level of extracting information to answer specific questions. Although some attention has been paid to the importance of the question variable (e.g. Wainer et al., 1982; Powers, Lashley, Sanchez, & Shneiderman, 1984; Davis et al., 1985), there still is little success in empirically validating those critical factors that influence performance with an information presentation. The problem appears also to lie in the lack of cumulative effort and evidence towards resolving important issues such as: a rigorous specification of the critical questions, or tasks, that should be tested; an operational definition of factors contributing to the information complexity of a graph; and a set of meaningful propositions about the characteristics of various graphical designs and their effects on performance. A review of existing graphics theories suggests that this problem can be overcome by classifying tasks, identifying a number of possible factors affecting the information complexity of graphics, and formulating a number of interesting a priori hypotheses that can be tested empirically. Results from such a research program would thus contribute to a much needed understanding of the extent to which these theoretical propositions may be used to explain graphical perception and comprehension.

6. While a large proportion of prior MIS graphics research has been devoted to the study of individual differences in the context of information presentation design, the failure to focus on the role of task is believed to be the underlying cause of conflicting results among various graphics experiments (see Ives, 1982; DeSanctis, 1984; Benbasat et al., 1986). A further motivation for such a research program, therefore, is to deal precisely with specific and well-defined tasks so that the underlying processes may be understood (Blalock, 1969).

7. Finally, research findings about the relative strengths and weaknesses of various forms of graphical designs for performing different elementary tasks may often be integrated for drawing higher inferences. For example, if we know that isolated symbols strongly facilitate the extraction of exact point values and that unbroken lines strongly facilitate trend perception, then we may infer that perhaps the most appropriate representation for performing a composite task requiring both the extraction of point values and the reading of a trend is a connected symbol graph.†

† A connected symbol graph merely connects isolated symbols with an unbroken line. See Cleveland (1985, pp. 180-183) for illustrations of all the different representations discussed here.

D. OVERVIEW OF THE DISSERTATION

Chapter 2 will provide a classification scheme for the empirical literature, as well as a comprehensive review of the theoretical literature on graphics representations. Then, based on the literature review, a set of factors believed to influence task performance with the use of a graphical presentation will be identified, and a set of propositions that may be translated into testable hypotheses will be advanced in chapter 3. Following this, chapter 4 will discuss the experimental methodology and the statistical design used in this research program.
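As a rough sketch of the factorial structure that chapter 4 details, the enumeration below reproduces the thirty-six treatment combinations noted in chapter 1: Graph Format crossed with Question Type and the two Information Complexity factors. The level labels themselves are invented for illustration, since this chapter does not spell them out:

```python
from itertools import product

# Assumed factor levels (illustrative labels only, not the thesis's wording):
graph_formats = ["bars", "symbols", "lines"]                          # Graph Format
question_types = ["Q1 scale-value", "Q2 level relation", "Q3 trend"]  # Question Type
time_periods = ["fewer periods", "more periods"]    # Information Complexity (a)
datasets = ["fewer datasets", "more datasets"]      # Information Complexity (b)

# Full crossing of the within-subject factors: 3 x 3 x 2 x 2 = 36 trials,
# each replicated in the practice session and the experimental session.
trials = list(product(graph_formats, question_types, time_periods, datasets))
print(len(trials))  # -> 36
```

Each subject sees every combination, which is what makes the design a total within-subject (repeated measures) design.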
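Chapter 1 also describes two performance controls: Accuracy scored as the percentage of correct responses, and the exclusion of subjects who show a significant time-accuracy tradeoff. A minimal sketch of such screening follows; the per-trial data, the correlation-based criterion, and the 0.5 cutoff are all invented for illustration (the thesis's actual screening relied on the time-accuracy correlation analyses reported in chapters 6 to 8), and Python stands in for the Turbo Pascal and BMDP programs listed in the appendices:

```python
# Sketch of the two controls described in chapter 1: accuracy scoring and a
# simple time-accuracy tradeoff screen. All numbers are invented examples.

def accuracy(correct):
    """Perf(Accuracy) = no. of correct responses / total responses x 100%."""
    return 100.0 * sum(correct) / len(correct)

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient (no external libraries)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def shows_tradeoff(times, correct, cutoff=0.5):
    """Flag a subject whose response speed is strongly tied to correctness.
    A strong |correlation| between per-trial latency and correctness is taken
    here as evidence of a time-accuracy tradeoff (an assumed criterion)."""
    return abs(pearson_r(times, [float(c) for c in correct])) > cutoff

# One invented subject: per-trial latencies (seconds) and correctness (1/0).
times = [2.1, 1.0, 3.4, 0.9, 2.8, 1.1]
correct = [1, 0, 1, 0, 1, 0]
print(accuracy(correct))               # -> 50.0
print(shows_tradeoff(times, correct))  # -> True (exclude this subject)
```

A subject retained in the final analyses would instead show a correctness pattern largely unrelated to response latency.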
An examination of the overall structure of the experimental data that will be gathered, and of how well the data conform to the various assumptions underlying the analysis of variance-covariance for a total within-subject design, will be discussed in chapter 5. Chapters 6, 7, and 8 form a series of chapters covering in detail the findings for experiments 1, 2, and 3 respectively. Chapter 9 will then attempt to integrate the findings as well as provide generalizations of the results. Finally, chapter 10 will conclude with a summary of key findings, major contributions and limitations of the research, and suggestions for future MIS graphics researchers. A comprehensive bibliography of quoted references is included as a separate section, followed by a series of appendices. Included in the appendices are the Glossary of Terms, defining the specialized terms used in the context of the thesis; the actual questions with their corresponding graphics displayed in the experimental sessions of each study conducted in the research program; the respective instructions, forms, and questionnaires administered to subjects; the pilot testing report; and samples of the programs designed for running the experiments and the statistical analyses.

II. LITERATURE REVIEW

A shift from experiments with complex problem solving tasks to a directed and programmed research approach, with each experiment dealing with an in-depth exploration of some level of decision-related task abstraction,† is currently recommended in the MIS literature evaluating computer graphics and color (e.g. Benbasat & Dexter, 1985, 1986; Benbasat et al., 1986; Dickson et al., 1986; Jarvenpaa, 1986; Jarvenpaa & Dickson, 1988). Incidentally, this latter research strategy is not new among color and graphics researchers in the areas of human factors and human information processing (e.g.
Croxton, 1927; Croxton & Stryker, 1927; Carter, 1947, 1948a, 1948b; Schutz, 1961a, 1961b; Wainer & Reiser, 1979; Wainer & Thissen, 1981; Wainer et al., 1982; Cleveland et al., 1983; Cleveland & McGill, 1984). Indeed, such a methodological shift (see Remus, 1984; Jarvenpaa, Dickson, & DeSanctis, 1985; Dickson et al., 1986) is long overdue, since the complex problem solving approach advocated by prior MIS graphics researchers has often yielded equivocal and piecemeal results without resolving the degree of influence of several key factors, among them: the Question asked (see Wainer et al., 1982; Pinker, 1983); the Information Set presented (see Bertin, 1983; Lauer et al., 1985); Color (see Christ, 1975; Tullis, 1981; Cleveland & McGill, 1983; Benbasat & Dexter, 1985, 1986); Learning (see DeSanctis & Jarvenpaa, 1985); and Cognitive Style.‡

† I.e., designing the complex decision task as a more refined set of sequentially executable elementary task activities.
‡ The concept of Cognitive Style is explained briefly in the Glossary.

A. CLASSIFYING THE EMPIRICAL LITERATURE

Two approaches in MIS graphics experimentation provide a clear basis for classifying the empirical literature (see Davis, 1985; Yoo, 1985; Lauer, 1986; Vessey, 1987):

1. Those using complex problem solving tasks (e.g. Lucas & Nielsen, 1980; Lucas, 1981; Ghani, 1981). In Davis' words, the approach adopted by this first set of studies faces such problems as

... a failure to specify and control the variables which affect performance with an information presentation ... (and) ... the use of confounding experimental tasks ... (which resulted in) ... a series of studies which are fatally flawed and whose findings are in conflict with each other (Davis, 1985, p. 41).

2. Those focusing primarily on the process of information extraction from various displays using various question types (e.g. Price, Martuza, & Crouse, 1974; Lusk & Kersnick, 1979; Wainer et al., 1982).
Davis observes that the results of this second set of studies have been restricted by their failure to provide a sound taxonomy of question type. Indeed, a review of the empirical literature on graphics research shows that it is more a heuristic source of suggestions than a genuine foundation for a body of research. Evidence for this contention can be found in the extensive review of the psychological literature on graphics research by Kosslyn et al. (1983), as well as in past reviews (e.g. Macdonald-Ross, 1977a; Ives, 1982; DeSanctis, 1984; Jarvenpaa et al., 1985). Kosslyn et al. could not find any systematic approach examining the various aspects of the graph comprehension process. Instead, they found many methodological problems in earlier studies, including such flaws as confounding of perception with memory; tallying of errors rather than psychophysically scaling perceived values of a graphed variable; failure to counterbalance the order of presentation of conditions; use of a single dataset as the graphics stimulus; provision of ambiguous instructions to subjects; and neglecting to inform subjects of what should be attended to in the graph. There has been a definite lack of either theoretical integration or an accumulated body of empirical evidence to determine the circumstances under which different forms of presentation may be more appropriate for different types of tasks. Experiments conducted by Wainer et al. (1982) and by researchers from Indiana University (e.g. Davis et al., 1985; Davis, 1985; Yoo, 1985; Lauer et al., 1985; Lauer, 1986) have been based narrowly on Bertin's theory alone. Their works are not without weaknesses of their own, as will be pointed out in a later discussion. On the other hand, Pinker's (1983) empirical tests of his graph difficulty principle, as well as the works of Simcox (1981, 1983a, 1983b, 1984), have been limited by their use of unconventional graphics.
Moreover, their experiments employ graphic stimuli that may be considered overly simple in an administrative context. It is debatable whether results based on the novel graphics stimuli they used could be generalized to real-world business applications (see also Davis, 1985; Lauer, 1986). The focus of Cleveland's experimental tasks is simply on the accuracy of graphical perception, not on the effective use of graphs to support graph comprehension or decision making. Moreover, aside from the work done at Indiana University, no one has actually attempted to measure effects due to the complexity of graphical information presentations. Consequently, instead of trying to draw valid conclusions from still fairly shaky and disjointed bodies of empirical literature, it is argued that MIS graphics researchers need a rigorous and common theoretical perspective on information systems and task characteristics to guide their research. Future research should therefore contribute to the building of a cumulative graphics discipline with this common goal. Accordingly, the first major concern is to conduct a comprehensive survey of the literature in the narrower area of theories of graphics rather than of the whole empirical literature.

B. EXISTING THEORETICAL DEVELOPMENT

The term theory has different meanings for different writers.† For some, it implies a detailed, systematic and comprehensive approach to a particular area (see Dubin, 1978); for others, it may just be a set of plausible statements that describe a phenomenon (see Cleveland & McGill, 1984). The purpose of this review, therefore, is to introduce to the reader the literature on existing and current theories of information representation (e.g. Chernoff, 1978; Bertin, 1967, 1973, 1981, 1983; Pinker, 1981, 1983; Kosslyn et al., 1983; Cleveland & McGill, 1984; Cleveland, 1984, 1985; Kosslyn, 1985).

† A definition of this term is provided in the Glossary.
The intent is: (a) to provide a rigorous and scientific foundation for this dissertation; (b) to extend current theoretical works (e.g. Pinker, 1981; Kosslyn et al., 1983) so as to provide specific models for proposed graph schema structures and accompanying processes; and (c) to draw a priori predictions from these theories with respect to performance on various elementary tasks for different designs of time series graphics. Research based on a systematic approach of theoretical development and empirical testing is more likely to yield better direction to graphics designers on the application of the various design elements as effective decision making aids than are "one-shot" studies.

1. Theories

As Kosslyn (1982) points out, different kinds of theories are meant to illuminate different facets of the same phenomena. For instance, Kosslyn (1982) distinguishes: a theory of computation, which specifies what is computed without regard for how it is computed; a theory of functional architecture, which specifies the structures and processes that are available for performing the computation; and a theory of the algorithm, which specifies how computations are carried out within the confines of the functional architecture. For every theory of computation, numerous theories of the functional architecture could be used to explain actual processing. Further, given a theory of functional architecture, several different algorithms are possible within its confines, such as varying the order in which specific operations are performed (see Kosslyn, 1982).

2. Extensibility of Graphics Theories

Some newer theories of graphical information representation (e.g. Kosslyn et al., 1983; Pinker, 1981; Kosslyn, 1985) have claimed that their treatments of the perceptual and cognitive processes underlying graph reading may be naturally extended to accommodate other, less constrained forms of information representation. In this regard, Kosslyn et al.
argue that since graphs are among the most general forms of information representation and yet are also the most constrained,† it is simply a matter of "...relaxing various strictures for making a good chart or graph when considering making a good map, diagram, or table." (Kosslyn et al., 1983, p. 15). Even so, these recent applied cognitive science theories on graphics have yet to be mentioned in the mainstream MIS literature. Because they form a foundation in which much graphics research should have been rooted, a more detailed treatment will be given of these theories than of the others, even though the other theories are also important in contributing to the ideas developed in this dissertation.

† I.e., they function to communicate information that is well-structured. Accordingly, Kosslyn et al. (1983) observe that "graphs are the most constrained form, with two scales always being required and values or sets of values being associated via a paired with relation that is always symmetrical." (p. 14).

C. CHERNOFF'S LIST OF ATTRIBUTES

Chernoff (1978) provides one of the earliest steps towards the development of a graphics discipline. Building on Schmid's (1954; Schmid & Schmid, 1979) three key attributes for classifying charts and graphs, Chernoff proposes the following list of 17 attributes:‡

‡ Chernoff believes that more attributes could be added to this initial list. He introduces Schmid's three attributes as the first three in his list.

1. Illustrate or communicate
2. Analyze or comprehend
3. Compute
4. Impact
5. Mnemonic Character
6. Attraction
7. Accuracy (Precision)
8. Accuracy (Lack of Distortion)
9. Compactness
10. Comprehensiveness
11. Self Explanatory
12. Time
13. Dimensionality
14. Theoretical versus Data
15. Contrast or Sensitivity
16. Ease of Application
17.
Audience

In his seminal paper, Chernoff (1978) offers brief but descriptive explanations of each of the above constructs, supporting his definitions by illustrating concisely how his list of attributes may be applied to a graphical method of his own creation: Chernoff faces. While Chernoff's list, with its common characterization of all information representations, provides a rich source of variables, his characterization is nonetheless intuitive and subjective. In fact, there appears to be an interesting parallel between Chernoff faces and his attribute list: both require much subjectivity and intuition to use, and both are seriously weakened by the presence of built-in dependencies among their features. Yet Chernoff's list forms the skeleton for a general multi-attribute scheme, useful perhaps for evaluating (via self-reporting) the various strengths and weaknesses of different graphical representation methods. Chernoff's list, however, lacks the robustness of an empirical or theoretical construction. It is, at best, good intuition, and appears to be driven fundamentally by three major categories of tasks in which graphical representations may be useful:

1. Perceptual Tasks
2. Comprehension Tasks
3. Memory Tasks†

† These three major categories of tasks follow essentially from the three major purposes of all graphs/charts: communication, analysis, and storage (see e.g. Kosslyn, 1985; Bertin, 1983; cf. Schmid, 1954). The object of communication is to impart to the audience

Finally, Chernoff's list may be regarded as forming elements of his theory, since it provides insight as to how graphical representation methods as well as their applications may be clustered.

1.
Chernoff's Theory

Chernoff expresses his theory of using appropriate graphical methods for different applications in the following statement:

The key to the successful use of graphics should involve a matching of method and application in terms of the extents of the attributes required by the application and how well the method supplies these attributes (Chernoff, 1978, p. 6).

In short, Chernoff believes that it is possible to define both the graphical methods and the applications requiring the use of these methods in terms of his attribute list. According to his theory, the optimal result comes from using the information presentation whose characteristics or attributes best match the nature of the application task being supported.

2. Implications of Chernoff's Theory

The key point in Chernoff's theory relevant to this research is the focus on the application task as a critical factor when dealing with effective graphical methods or tools. He argues that it is important not only to be able to tell which attributes are effectively supported by the graphical method in question but, more importantly, to know the nature of the application (i.e., the task) and the kinds of supporting attributes it requires. Unfortunately, in spite of the importance of the notion of attribute matching between the application and the graphical representation method, Chernoff's (1978) work fails to suggest any empirically testable or operationally rigorous definitions of either the application task set or the graphical methods available with respect to a common attribute set.

† (cont'd) information which has typically been analyzed and understood, whereas that of analysis is to find a representation permitting the understanding of what conclusions may be drawn and what relations and regularities exist in the information presented. The object of storage is to assist the reader in remembering the information presented.
Hence, his theory appeals to researchers particularly because it identifies the application task as a key determinant when using a specific graphical method, in addition to presenting a way of thinking about characterizing the common properties of information representations as well as of task applications. It is worth noting that Chernoff's view is in agreement with the current literature on MIS graphics research (e.g. Jarvenpaa et al., 1985; Benbasat & Dexter, 1985, 1986; Dickson et al., 1986). Many MIS researchers increasingly assert that a plausible explanation of the contradictory findings among prior graphics research is the lack of control of the task variable. For example, Jarvenpaa (1986) has indicated the need for a common taxonomy of basic decision tasks before there can be any meaningful comparison of research findings. Yet MIS researchers have been slow to propose a well-defined set of decision making tasks, and there is not yet any accepted operational definition of the term task at the level of managerial decision making or problem solving, although some researchers appear to be working on the issue (see Dickson et al., 1986).

D. BERTIN'S IMAGE THEORY & TAXONOMIES

Bertin (1967, 1973, 1981, 1983) advances the first rigorous foundation for the study of graphics and graphics information processing. Yet it is the English translation of his Semiology of Graphics that popularized his theoretical treatment of graphics information processing. In reviewing the book, Kosslyn (1985) observes that Bertin "leaves virtually no graphical stone unturned" and that the only general complaint he has about the book is its difficulty to read.

1. Definitions

Bertin uses a semiological† approach to define graphics in relation to mathematics, music and other sign systems.

† I.e., a theory of signs and symbols.

For instance, he observes that both graphics and mathematics are monosemic systems
since a graphic can be comprehended only when the unique meaning of each sign has been specified, just as an equation can be comprehended only when the unique meaning of each term has been specified. But, as Bertin points out, all of the sign-systems intended for the ear are auditory, linear and temporal, whereas those intended for the eye are visual, spatial and atemporal. This leads Bertin to conclude that the true purpose of graphics within the framework of logical reasoning lies in the monosemic domain of spatial perception. Moreover, Bertin distinguishes the information to be transmitted in a graphic system (i.e. the content) from the properties of the graphic system (i.e. the container). Information, according to Bertin, is constituted essentially by one or several pertinent correspondences between a finite set of variational concepts, which he calls the components, and an invariant. For example, consider the following two statements: On July 8, 1964, stock X on the Paris exchange is quoted at 128 francs. On July 9, it is quoted at 135 francs. Each statement involves a pertinent correspondence between two variational concepts (the number of francs and the date) and an invariant (stock X) which constitutes the common ground relating the francs to the dates. The complexity of the graphical information designed is considered by Bertin to be a function of the number of identifiable elements in each component. Bertin uses the term length to characterize the number of identifiable elements in a given component or variable, and offers a taxonomy of levels of organization of the components, or visual variables, based on the notion of nominal, ordinal, and interval-ratio scale values (Baird & Noma, 1978). Information analysis, in Bertin's view, comprises three stages: (1) determining the number of components, (2) identifying the number of elements or categories in a given component, and (3) defining the level of organization among the components.
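Bertin's three-stage information analysis can be made concrete with the stock X example above. The sketch below is illustrative only: the data structure and variable names are my own, and only the terminology (components, invariant, length, levels of organization) is Bertin's.

```python
# A minimal sketch of Bertin's information analysis applied to the
# stock X example. Each pertinent correspondence relates values of the
# two components (date, price in francs) for the invariant (stock X).
invariant = "stock X"
correspondences = [
    {"date": "1964-07-08", "francs": 128},
    {"date": "1964-07-09", "francs": 135},
]

# Stage 1: determine the number of components.
components = list(correspondences[0].keys())
print(len(components))  # 2 components: date and francs

# Stage 2: the "length" of each component, i.e. the number of
# identifiable elements (distinct categories) it contains.
lengths = {c: len({corr[c] for corr in correspondences}) for c in components}
print(lengths)  # each component has length 2 in this tiny example

# Stage 3: the level of organization of each component, assigned by
# the analyst on Bertin's nominal / ordinal / interval-ratio scales.
levels = {"date": "ordinal", "francs": "interval-ratio"}
```

On this reading, the complexity of a graphic grows with the lengths found in stage 2, which is the sense in which Bertin treats complexity as a function of the number of identifiable elements per component.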
As for the properties of the graphic system, Bertin identifies eight ways in which a visible mark expressing a pertinent correspondence can vary: it can vary in relation to the two planar dimensions, and in relation to the six retinal variables: size, value, texture, color, orientation, and shape.† Again, Bertin offers an encyclopedic and useful taxonomy and treatment of the variations of a mark with respect to the retinal variables. A mark, Bertin argues, can be represented as a point, a line, or an area within the plane.

2. Bertin's Theory

The underlying concept of Bertin's theory is his difficulty metric, which is given by the number of perceptual glances necessary to extract the relevant information from an information representation. Applying Zipf's (1935) notion of mental cost to visual perception, Bertin formulates an image theory which deals with rules of the graphic system that will aid the designer in choosing the variables required for constructing the most efficient representation. Bertin (1983, p. 9) defines efficiency in terms of a question and its answer:

If, in order to obtain a correct and complete answer to a given question, other things being equal, one construction requires a shorter period of perception than another construction, we can say that it is more efficient for this question.

Basically, Bertin's image theory argues that the reader extracts information from an information representation by visually isolating one or more images, or Gestalts (see also Kosslyn et al., 1983), to answer a specific question. Reading involves, first, an external and an internal identification of the relevant components and their respective variables and, second, the perception of the information itself. In turn, the perception of information (i.e. a series of pertinent correspondences) depends chiefly on the type of question and the level of reading required by the question.
In this respect, Bertin categorizes questions according to types as well as levels of reading,‡ and distinguishes among three levels of reading, ranging from the elementary (i.e. at the level of individual values), through the intermediate (i.e. at the level of homogeneous categories which are less numerous than the original categories), to the overall level (i.e. at the level of the global pattern). Operationalization of Bertin's taxonomy of question types yields the following categories:*

1. Exact Questions
2. Trend Questions
3. Comparison Questions

Moreover, Bertin argues that the process of answering a question requires, first, the input identification of the values that have been provided in the question asked; second, the perception of the appropriate correspondences between the components; and, finally, the output identification of the required answer. The ease with which a viewer can perform all of these processes determines the speed with which the question can be answered. The most efficient display is that which, given any type or level of question, allows the reader to extract the answer in one glance. Put together, efficient displays are those which require the fewest "perceptual glances" to comprehend well enough to answer a given question. Kosslyn (1985) recommends Bertin's Semiology of Graphics to every serious student and researcher interested in graphics. Yet, in their review of graphical displays, Wainer & Thissen (1981) conclude that Bertin's propositions form a "rudimentary" untested theory in an area which is devoid of organized research and theory. Furthermore, there are ambiguities in some aspects of Bertin's theory.

† See the Glossary for definitions of these terms.
‡ Refer to Lauer (1986, p. 43) for a brief discussion of this issue.
For example, Pinker (1981) argues that Bertin's difficulty metric, given by the number of perceptual glances, is ambiguous, since Bertin does not define what constitutes a perceptual unit identifiable or recognizable in one perceptual glance. Similarly, Bertin's rules of image construction, the rules for selecting the variables with which to construct the most efficient representation, are difficult to use. Regarding this aspect of Bertin's theory, Kosslyn (1985) comments: "The rules were somewhat disappointing...; they are really general goals and are not algorithmic."

* Wainer et al., 1982.

3. Implications of Bertin's Theory

Fundamentally, Bertin's theory on the use of an information representation to aid decision making operates at the level of specifying the question which the decision maker wishes to answer with the information representation. This puts the decision making task at a level of abstraction that is rigorous and empirically testable, and it avoids important methodological issues faced by prior MIS researchers. Indeed, a number of recent MIS empirical works have been based on Bertin's theory.

E. GRAPHICS RESEARCH AT INDIANA UNIVERSITY

In an unpublished paper, Davis et al. (1985) argue that Bertin's taxonomy of question types, especially at the intermediate level, is ambiguous because it lacks an operational definition of what constitutes a homogeneous category as given in Bertin (1983) (see also Lauer, 1986). They contend that this same ambiguity appears in Wainer et al.'s (1982) operationalization of Bertin's question construct, and that the results of the Wainer et al.
study with regard to the effects of different questions are:

...ambiguous in that performance as measured by the number of correctly answered questions was greatest for exact (their example of an Elementary Question), then comparison (their example of a Comprehensive Question), and then trend (their example of an Intermediate Question) questions; but, when performance was measured by the time required to answer questions, the ordering was exact, trend, and comparison. (Davis et al., 1985, p. 13).

They attempt, therefore, to formulate what they claim are more rigorous and operationally testable definitions of the question construct and the information-set-complexity construct. Formalizing Bertin's (1973, 1983) general propositions on the process of extracting the relevant question-answer, Davis et al. (1985) advance the following statement of relationship connecting four major constructs:

P = f(I, Q, F)

where:
P = Performance, measured by the time required to isolate the image needed to answer a question
I = The information set presented
Q = The question asked
F = The form of presentation

1. The Question Construct

To avoid the problem found in Bertin's taxonomy of questions, Davis (1985) proposes that the question construct be operationalized by the number and frequency of the set of ordered steps taken to perform the necessary question-answering. These steps, found by analyzing subjects' protocols (see Davis et al., 1985) and arranged in increasing order of complexity, are:†

1. Identifications
2. Scans
3. Comparisons
4. Estimations
5. Computations

† Davis (1985) gives brief descriptions and definitions for each of these steps.

Accordingly, Davis (1985) and Yoo (1985) ran experiments using questions spanning a complexity continuum defined by the number and frequency of the ordered steps required to extract the correct answer from an information presentation.
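The step-based operationalization just described can be sketched in code. The step sequences for the two hypothetical questions below are my own illustrative assumptions, not protocols reported by Davis et al.; only the five step types and their ordering come from the text.

```python
# A minimal sketch of the Davis et al. (1985) question-complexity metric:
# a question's complexity is characterized by the number and frequency
# of the ordered steps needed to answer it.

# The five step types, in increasing order of complexity.
STEP_ORDER = ["identification", "scan", "comparison", "estimation", "computation"]

def complexity_profile(steps):
    """Return the frequency of each step type in a question's protocol."""
    return {s: steps.count(s) for s in STEP_ORDER}

# Hypothetical protocols for two questions asked of a time-series graph.
exact_question = ["identification", "scan", "estimation"]
comparison_question = ["identification", "identification", "scan",
                       "comparison", "computation"]

print(complexity_profile(exact_question))
# The comparison question requires more steps, so on this metric it is
# the more complex of the two, regardless of presentation format.
print(len(comparison_question) > len(exact_question))
```

Note that the metric, as sketched, is deliberately format-independent; that independence is exactly what the subsequent discussion calls into question.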
Their most basic hypothesis is that

...the form of presentation and the complexity of a question are independent; that is, there is no interaction between the steps performed to answer a question and the form in which the information is presented.

Results of their experiments, however, fail to support this reasoning. It is debatable whether Davis et al. should have treated question complexity separately from information presentation format. For example, Pinker's (1981, 1983) arguments strongly suggest that the same question will not be easier or more difficult across the board: the answer depends very much on the sorts of information to be extracted as well as the sorts of representation format used. In other words, each graph format may correspond to different steps for different questions. Hence, tables normally do not support, in Davis' terminology, visual cues the way graphs support them. The lack of robustness in the Davis metric (see Davis, 1985; Yoo, 1985) warrants either its refinement or, more appropriately, a revision of the basic approach. Davis notes that although the content validity of the metric was demonstrated earlier (Davis et al., 1985), a taxonomy of question difficulty generated from the metric nonetheless lacks predictive validity (see Davis, 1985). Indeed, a more critical view of the Davis et al. (1985) and Davis (1985) metric reveals a number of possible problems. First, as already argued, a question classified as relatively more difficult by Davis' metric may be relatively easier with graphs than with tables, and vice versa. That is because encoding of numerical magnitudes may well be more effortful than encoding of, say, a bar's level (see Pinker, 1981, 1983; Kosslyn et al., 1983). In fact, the relative effectiveness of graphs over tables lies precisely in their capability to capitalize on the power of the human visual system (Kosslyn et al., 1983; see also Pinker, 1983).
Even so, besides the need to encode numerical magnitudes when using tables, it appears that all other computations with tables must necessarily involve calculations (and possibly some heuristics), whereas graph perception might instead require what Rock (1984) called visual intelligence. Further, it would be wrong to suggest that the Davis et al. (1985) algorithmic processes for answering various questions are the only possible ones (see Kosslyn, 1982). Moreover, the use of protocol analysis to validate hypothesized step processes in visual information processing is unlikely to prove as psychologically sound as desired, since low-level visual and/or perceptual processes are often found to be unconscious and therefore generally inaccessible to verbalization,† except, of course, for the outputs of such processes. The methodology is therefore more appropriately applied to processes at a higher level of consciousness, such as performing calculations, decision making, and/or problem solving. Empirical evidence supporting this view may be found in Pisoni & Tash (1974) and Fodor (1983). Finally, the decomposition of tasks into steps defined by the Davis metric may differ across individuals. In summary, Davis' (1985) generation of a taxonomy of question difficulty that is independent of variations in information presentation format basically ignores the geometry into which various sorts of extractable information will be translated owing to representational variations and individual differences in perception. To complicate the task decomposition process further, psychologists in the area of human perception have suggested that some kinds of perceptual processes, including visual and auditory ones, are instantaneous and automatic, whereas others may need the focus of selective attention and the use of conscious effort (see Fodor, 1983; Rock, 1984).
Put simply, the weakness of the Davis metric for a taxonomy of question type based on the complexity of computational steps lies in its failure to accommodate confounding effects due to the pertinent principles underlying the perceptual organization of various geometrical formats.

† See Morton's (1967) argument using the psychological party trick, and also Nickerson & Adams (1979).

2. The Information Set Complexity Construct

Lauer et al. (1985) define the information set complexity‡ construct for time series as influenced by the following four factors:

1. Length of the Ordinal Variable
2. Length of the Nominal Variable
3. Percent of possible Rank Changes
4. Percent of possible Slope Changes

While the first two factors are based somewhat on Bertin's observations on the complexity of a figure, there is little theoretical support or any strong basis for the variations of the last two factors. Moreover, results from their studies (e.g. Yoo, 1985; Lauer et al., 1985; Lauer, 1986) reveal that although certain complexity factors (e.g. length of the nominal variable) are significantly correlated with performance, there is very little evidence to conclude that the regularity component of the information set complexity construct, operationalized by the last two factors listed above, is a determinant of performance (see Lauer, 1986, pp. 139-140).

3. Implications of Recent MIS Graphics Research Findings

Taken together, the results of recent MIS graphics experimentation based on Bertin's theory point to a further need for adequate control of those critical factors that are likely to affect subjects' performance with an information presentation. For example, the Lauer (1986) and Lauer et al. (1985) studies indicate that although certain factors of complexity do affect performance with an information presentation significantly, there is uncertainty about regularity as one of the influencing factors.
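The two regularity factors can at least be made computable. The sketch below is one plausible operationalization, written to clarify what "percent of possible rank changes" and "percent of possible slope changes" might measure for a multi-series time series; the exact formulas are my assumption, and Lauer et al. (1985) should be consulted for their definitions.

```python
# Illustrative operationalization of the two "regularity" factors of
# information set complexity for a multi-series time series.

def pct_rank_changes(series):
    """series: dict of series name -> list of values over the same periods.
    Percent of adjacent period pairs at which the rank ordering of the
    series (by value) changes."""
    names = list(series)
    n_periods = len(next(iter(series.values())))
    changes = 0
    for t in range(n_periods - 1):
        rank_now = sorted(names, key=lambda s: series[s][t])
        rank_next = sorted(names, key=lambda s: series[s][t + 1])
        if rank_now != rank_next:
            changes += 1
    return 100.0 * changes / (n_periods - 1)

def pct_slope_changes(values):
    """Percent of interior points at which the slope reverses direction."""
    slopes = [b - a for a, b in zip(values, values[1:])]
    changes = sum(1 for s1, s2 in zip(slopes, slopes[1:])
                  if (s1 > 0) != (s2 > 0))
    return 100.0 * changes / (len(slopes) - 1)

# Hypothetical two-series dataset over four periods.
data = {"product A": [10, 12, 11, 15], "product B": [9, 13, 14, 12]}
print(pct_rank_changes(data))            # ranks flip at some period boundaries
print(pct_slope_changes(data["product A"]))  # slope direction reverses twice
```

Under this reading, both factors rise with the visual irregularity of the display, which is why they were proposed as drivers of complexity even though, as noted above, the evidence for their effect on performance is weak.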
Similarly, Davis' (1985) findings indicate that question difficulty is not independent of visual representations and that different questions have different effects on performance depending on the format of information representation. Finally, the Wainer et al. (1982) study indicates that while the question asked affects performance, there appears to be evidence of a possible time-accuracy performance tradeoff when various visual presentations are tested for the extraction of various quantitative information. Thus it is important that there be adequate control in the experimental design for such a tradeoff effect in future research and in the present study. Overall, the need may be to identify a set of critical tasks in the form of specific questions to be asked of an information presentation, and to examine these questions systematically. In this case, an alternative classification of question type to that offered by Bertin or even Davis, together with a system for identifying influencing factors of information complexity, becomes the more critical as well as challenging issue facing current and future MIS graphics researchers.

† This term is used to mean the complexity with respect to the descriptive content of the information set (see Lauer, 1986).

F. CLEVELAND'S THEORY OF GRAPHICAL PERCEPTION

In recent years, Cleveland and his associates (e.g. Cleveland, 1984, 1985; Cleveland & McGill, 1984, 1985) have proposed a paradigm of graphical perception which involves three basic premises:
1. A specification and an ordering of elementary graphical-perception tasks
2. A statement on the role of distance in graphical perception
3. A statement on the role of detection in graphical perception

1. The Cleveland-McGill Theory

The focus of the Cleveland-McGill (1984) theory is on the accuracy of quantitative judgments with regard to a proposed set of elementary perceptual tasks. Based on a combination of psychophysical theory (e.g.
Weber's and Stevens' laws: Baird & Noma, 1978) and other experimental evidence, Cleveland & McGill hypothesize an ordering, according to the accuracy of human quantitative judgments, of ten elementary perceptual tasks ranked from the most to the least accurate:
1. Position along a common scale
2. Positions along nonaligned scales
3. Length, direction, angle
4. Area
5. Volume, curvature
6. Shading, color saturation

Apparently, the lack of sufficient information to separate ties in the rank ordering forces Cleveland & McGill to place more than one task in three of the ranks above. A lengthy discussion of the theory and how it applies to the extraction of quantitative information from a variety of common graph forms may be found in Cleveland & McGill (1984). Cleveland (1985) classifies all mental-visual tasks involved in graphical quantitative information extraction into two major categories:
1. Those requiring judgments of geometrical aspects of graphical elements such as position and size: the graphical-perception tasks, which are the kinds of tasks dealt with by his paradigm.
2. Those involving the scanning of points to read off values using the scale lines and tick mark labels, and those that require conscious rapid mental calculation and quantitative reasoning: the graphical-cognition tasks.

To test the predictive validity of their theory, Cleveland & McGill (1984, 1985) ran several related experiments in which participants performed different kinds of judgments using dots, angles, lines, bars, pies, and other forms of representations.
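The Cleveland-McGill ordering lends itself to a simple lookup table. The sketch below is illustrative only: the rank values and task names are paraphrased from the list above, and the comparison function is an assumption about how a designer might apply the ordering, not part of the theory itself.

```python
# The Cleveland-McGill ranking of elementary perceptual tasks
# (rank 1 = judged most accurately; tied tasks share a rank, as in the text).
PERCEPTUAL_TASK_RANK = {
    "position along a common scale": 1,
    "positions along nonaligned scales": 2,
    "length": 3, "direction": 3, "angle": 3,
    "area": 4,
    "volume": 5, "curvature": 5,
    "shading": 6, "color saturation": 6,
}

def preferred_encoding(task_a, task_b):
    """Return whichever elementary task the theory predicts will
    support the more accurate quantitative judgment."""
    return min(task_a, task_b, key=PERCEPTUAL_TASK_RANK.__getitem__)

# The theory predicts position judgments (e.g. dot charts) over
# angle judgments (e.g. pie charts):
print(preferred_encoding("angle", "position along a common scale"))
```

Such a table directly encodes the prediction tested in the position-angle and position-length experiments discussed next.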
Evidence drawn from subjects' judgments in what they labeled the position-length experiment (Cleveland & McGill, 1984) supports the claim that position judgments are more accurate than length judgments by factors varying from 1.4 to 2.5; similarly, results drawn from subjects' judgments in what they labeled the position-angle experiment confirm that position judgments are 1.96 times more accurate than angle judgments. These results and others led Cleveland & McGill to conclude boldly that their theory has "correctly predicted the outcome". While Cleveland & McGill's experiments validated elements of their theory, the findings suggested the need for a revision of their proposed hierarchy of elementary tasks. For example, in the position-length experiment, it was found that as the distance between the two values to be judged increased along an axis perpendicular to the common scale, subjects became less accurate in their judgments. In this regard, Cleveland & McGill commented that the position task should be expanded into a whole range of tasks.† In effect, they observed that a revision of their theory appears warranted.

2. Application of the Cleveland-McGill Theory to Graphing Data

In applying their theory, Cleveland & McGill argue for the following basic supporting principle of data display:‡ "Encode data on a graph so that the visual decoding involves tasks as high as possible in the ordering"§ (Cleveland, 1985, p. 255). Simply stated, graphs should employ elementary tasks as high in the ordering as possible since, Cleveland & McGill claim, by presenting elementary perceptual tasks as high as possible in the hierarchy, the graph will elicit judgments that are as accurate as possible and will therefore maximize the viewer's ability to detect patterns and organize the quantitative information.
Consequently, the Cleveland-McGill theory and their experimental results led Cleveland (1984) to propose the replacement of simple bar charts with dot charts. Similarly, divided bar charts could be replaced with clustered dot or symbol charts. Cleveland (1984) argued that such a replacement basically permits all values to be compared by making judgments of position along a common scale, in contrast to making judgments of length or area. As supporting evidence, he reinterpreted one of the Cleveland-McGill experimental results to show that errors of length-area judgments are 40-250% greater than those of position judgments along a common scale.

† I.e. as a range of tasks for judging differences of two values along a continuum of the axis perpendicular to the common scale.
‡ It is not, however, their claim that this principle offers a complete prescription for graphics designers (Cleveland, 1985).
§ This refers to the ordering of the ten elementary perceptual tasks discussed earlier.

3. Implications of Cleveland's Theory

It may be argued that Cleveland's theory deals more with the accuracy of graphical perception as applied to various elements of graphics construction than with the processes of graphical decision making using various forms of graphical representation. Yet this line of research is still relevant and important to the study of the relative efficacy of various graph formats as representational aids, since the process of data extraction from an information presentation is often regarded as essentially one of perception (see Davis et al., 1985; Bertin, 1983; Pinker, 1981). Therefore, MIS graphics designers can also benefit from this line of contributions. However, further empirical evidence is still needed to strongly support claims for the replacement of bar charts by dot charts, or of divided bar charts by dot charts with grouping.
More importantly, there is a need to include empirical testing of comprehension tasks if the Cleveland paradigm is to be of greater value to MIS graphics designers. For example, the question of whether the Cleveland theory holds when the emphasis is placed on time performance will be of interest. Indeed, how quickly a person is able to read and understand a standard and unambiguously drawn graph indicates the effectiveness of that graph format for the task in question.

G. THE KOSSLYN-PINKER THEORY OF GRAPH COMPREHENSION

Kosslyn et al. (1983; Pinker, 1981; Kosslyn, 1985) advance both a general computational theory of human visual information processing and a functional architectural model, which they apply to explain the underlying perceptual and cognitive mechanisms involved in the reading and understanding of graphs. They also propose a diagnostic scheme for evaluating information representations that is based not only on human processing of visual input but also on the so-called theory of symbols. Based on the contemporary canonical theory of human visual information processing (e.g. Marr, 1982), Kosslyn et al. view the visual processing of information representations as occurring at three levels, as depicted in figure 2.1:
1. Perceptual Image: Sensory Information Store
2. Short-Term Memory
3. Long-Term Memory

In the first phase, an information representation is processed syntactically, or pre-semantically, leading to the formation of what is known as a visual sketch. However, processing at this level is limited by several factors, including adequate discriminability, variations of visual properties, processing priorities and perceptual distortions. The output from the first phase of processing is then organized into perceptual units, which are operated upon in the second phase of processing.
Thus, three lines that meet to form an enclosed figure are seen as a triangle and not simply as three lines, since the formation of these perceptual units respects the Gestalt laws of organization such as proximity, good continuation, similarity and common fate, and other principles of structural dimensions (e.g. Garner, 1970; Garner & Felfoldy, 1970). Visual information held as perceptual units in short-term or working memory (see Anderson & Bower, 1973; Lindsay & Norman, 1977) can be re-organized and interpreted in various ways. In any case, memory capacity limitations often make visual displays difficult to read, either because too much material is presented or because too much material is placed in a key (i.e. a legend), which requires the reader to engage in an arduous memorization task. Kosslyn argues that reader recognition of an information representation is only possible when the appropriate stored information in long-term memory is contacted. This information constitutes a person's knowledge about how charts and graphs serve to convey information and is a critical step in the graph comprehension process: if a person has never seen a particular display type before, it constitutes a problem to be solved rather than a display to be read. Importantly, displays should have description compatibility.

Figure 2.1: Three Visual Information Processing Stages (sensory information store, limited by discriminability, distortion, organization and priorities; short-term memory, limited by reorganization and capacity limits; and long-term memory, drawing on knowledge and description compatibility)
Source: Kosslyn et al., 1983, pp. 321-322, reproduced with permission.
In other words, each part of the display, such as the labels and the legend symbols, should not be ambiguous or subject to more than one semantic interpretation; displays should not lead a person to access inappropriate information and thus cause him to draw incorrect inferences.

1. The Kosslyn et al. Analytical Scheme

Accordingly, Kosslyn et al. (1983) generate an analytical scheme, on the basis of the theory of human visual information processing discussed above and the theory of symbols, to provide a detailed diagnostic description of any given chart and graph at three levels:
1. Syntax: where syntactic properties of form classes† corresponding to the major basic level constituents of charts and graphs (figure 2.2) can be described
2. Semantics: where the literal reading of each of the components of a chart or graph, and the literal meaning that arises from the relations among these components, can be described
3. Pragmatics: where the ways in which meaningful symbols convey information above and beyond the direct semantic interpretation of those symbols can be described

Essentially, the Kosslyn et al. descriptive scheme has been developed as a diagnostic questionnaire useful for revealing how any given chart or graph may violate pertinent principles at the syntactic, semantic and pragmatic levels. Kosslyn et al. claim that if a graph or chart is unambiguously designed, each question in the questionnaire should be easily answered, and that trouble arriving at a straightforward answer to any of these questions alerts us that one or more of the operating principles has been violated. The objective, therefore, is to discover which operating principles may be violated in a graph or chart and to specify what changes are to be made in the design of the information representation to make its basic level constituents, as well as the relations among these constituents, as unambiguous as possible.
† These refer to the framework, the background, the specifier and the labels, which correspond to the four basic level graphic constituents identified by Kosslyn et al. (1983) and illustrated in figure 2.2.

Figure 2.2: Kosslyn et al.'s Basic Level Constituents of a Graph
Source: Kosslyn et al., 1983, p. 323, reproduced with permission.

A detailed explanation of all the pertinent principles posited by Kosslyn et al. would demand substantial space (Kosslyn et al., 1983, pp. 1-170); these principles will therefore only be summarized briefly. Further, owing to the large number of terms used to describe these principles, the Glossary discusses or defines only key terms. At the syntactic level, Kosslyn et al. classify their operating principles as (a) Principles pertinent to seeing the lines, which include those of adequate discriminability and perceptual distortion, (b) Principles pertinent to organizing marks into units, which include the Gestalt laws and Garner's (1970) structural dimensional principles, and (c) Principles of processing priorities and limitations. At the semantic level, Kosslyn et al. identify two major groups of operating principles: (a) Principles of surface compatibility, which include those of representativeness and congruence, and (b) Principles of concept and graph schema availability. In addition, Kosslyn et al. argue that since a graphic display may be treated as a complex symbol, two formal principles underlying the theory of symbols become applicable to an unambiguous interpretation of the display:
1. The external mapping principle, which means that every mark should map into one and only one semantic category, and every piece of information necessary to read the intended information should be indicated unambiguously.
2. The internal mapping principle, which specifies that the correspondence between portions of a display should be unambiguous.
At the pragmatic level, Kosslyn et al. draw their operating principles from corresponding principles underlying language (e.g. Grice, 1975), which include:
1. Principles of invited inference
2. Principles of contextual compatibility†

The development of a shorter and more directed diagnostic instrument replacing the original lengthy scheme, and the results of its application to a substantial and representative sample of charts and graphs, are presented in Kosslyn et al. (1983, pp. 171-214). Based on independent evaluations of ten different graphs by means of the short questionnaire, and the aggregation of the possible outcomes from two analysts (one well experienced with the scheme and the other naive at the outset), Kosslyn et al. claim to achieve an agreement rate between the analysts of 96.58%. This high rate, they claim, confirms the reliability of their questionnaire. The pattern of results reported in Kosslyn et al. suggests that the interpretation of a graph is often contaminated either by the addition of too much information or by the absence of relevant information. Evidence shows that the two most frequently occurring faults among graphs are (1) violations of the principle of external mapping and (2) violations of principles pertinent to the organization of marks. In addition, the greatest proportion of faults found pertains to the specifier alone and to its interaction with the framework (see figure 2.2). One interesting though cautionary note revealed by their study is that graphs found in the business area exhibited a much higher number of faults than graphs found in other content areas (e.g. math, physical science, life science, social sciences and "General Interest", a catch-all category containing such items as magazines, newspapers and "How to" books). This strongly suggests that graphics designers in the business area are seriously in need of proper guidance.

† See the Glossary for definitions of these terms.
2. The Kosslyn-Pinker Process Model of Graph Comprehension

Drawing on Bertin's (1967, 1973) observation that any depicted object on charts, graphs and maps may be described simultaneously by its values along a number of visual dimensions, Pinker characterizes a graph as trying to:

...communicate to the reader a set of pairings of values on two or more mathematical scales, using depicted objects whose visual dimensions (i.e. length, position, lightness, shape, etc.) correspond to the respective mathematical scales and whose values on each dimension (i.e. an object's particular length, position, and so on) are proportional to the values on the corresponding scales.

Based on this, Pinker argues that the depicted objects in the graph are mentally represented in two ways:
1. As a visual description that encodes the marks depicted on the page in terms of their physical dimensions
2. As a graph schema that spells out how the physical dimensions will be mapped onto the appropriate mathematical scales

As Pinker suggests, these structures are used by a graph reader to extract different sorts of information, such as the actual value on some scale paired with a given value on another scale, the difference between the scale values of two data entities, or the rate of change of values on one scale within a range of values on another, and so on.

a. Structures & Processes

Pinker (1981) and Kosslyn et al. (1983) note that many visual languages (e.g. images, pictures, graphics) proposed in the psychological and artificial intelligence literature (e.g. Palmer, 1975; Marr & Nishihara, 1978; Hinton, 1979; Winston, 1975; Miller & Johnson-Laird, 1976) describe a scene using propositions. In these works, perceived visual entities or objects are assumed to be represented internally as variables, with predicates used to specify attributes of and relations among the entities.
Such predicate specifications may be a one-place predicate specifying a simple property of an object, such as Circle(x) (i.e. "x is a circle"); a two-place predicate specifying the relation between two objects, such as Above(x,y) (i.e. "x is above y"); or a three- or higher-place predicate indicating the relation among groups of objects, such as Between(x,y,z) (i.e. "x is between y and z"). Kosslyn et al. (1983) list four broad principles grounded in basic psychological research to specify the form of the visual description most likely to be constructed from the input of an information representation. Briefly, these principles are:
1. The Indispensability of Space, which states that our perceptual systems pick out a "unit" or an "object" in a visual scene as any set of light patches that share the same spatial position, rather than other attributes such as intensity, texture, or wavelength (Kubovy, 1981).
2. The Gestalt Laws of Perceptual Organization, which govern how variables representing visual entities will be related to one another in visual descriptions† (e.g. Wertheimer, 1938).
3. Magnitude Representation, which states how quantities may be represented in memory, for example by one of a set of discrete symbols or by a continuous interval-ratio scale. This leads Kosslyn and Pinker to distinguish between two forms of magnitude: the interval-value or ratio-value, where the quantity is represented continuously but the units are arbitrary, and the absolute-value, where the units are discrete and well-defined.‡
4. The Distributed Coordinate Systems, which states that memory representations of shape are specified with respect to object-centered cylindrical coordinate systems that are also distributed (e.g. Marr & Nishihara, 1978).

In practice, however, Pinker and Kosslyn et al. argue that two factors limit the size of a visual

† I.e. how the atomic perceptual units will be integrated into a coherent percept (Pinker, 1981, p. 10; see also Kosslyn et al., 1983).
‡ See the Glossary for an example.

description:
1. Processing Capacity, which limits the number of nodes (i.e. to roughly four perceptual units) that are available at any one time in the short-term visual description (see Kosslyn et al., 1983; Ericsson, Chase, & Faloon, 1980).
2. Encoding Likelihood, which specifies that different predicates may have different probabilities of being encoded. It proposes that some visual predicates are automatically encoded while others are not, and that the probability of a given true predicate entering into a visual description is a function of such automatic processes multiplied by a constant relative to the availability of processing capacity. It also assumes that the level of node activation† decreases steadily as soon as it is activated, but that the reader can repeatedly re-encode the description by reattending to the graph.

Incidentally, a graphic notation has been devised by Pinker (1981) whereby variables are denoted as nodes, one-place predicates are printed next to the appropriate nodes, and two-place predicates are printed alongside an arrow linking the two nodes representing the predicate's two arguments. Hence any particular scene can be graphically represented using Pinker's convention, as illustrated in Pinker (1981, p. 49; see figure 2.3). Additionally, Pinker and Kosslyn et al. argue that the structure known as a graph schema, and the processes that work over it, specify (a) the type of graph currently being viewed, (b) how the information currently found in the visual description is to be translated into the conceptual message, and (c) how the request found in a conceptual question is to be translated into a process that accesses the relevant parts of the visual description for the required but unknown piece of information.

† This refers to the instantiation of particular variables in a visual description.
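The propositional style of visual description discussed above can be sketched as a small data structure. This is a toy illustration only: the scene encoded (a circle above a triangle) and the helper function are assumptions for the sake of the example, not part of Pinker's or Kosslyn et al.'s formalism.

```python
# A toy propositional "visual description": entities are variables
# (nodes); one-place predicates tag single nodes; two-place predicates
# relate ordered pairs of nodes.
one_place = {
    ("circle", "x"),      # Circle(x): "x is a circle"
    ("triangle", "y"),    # Triangle(y): "y is a triangle"
}
two_place = {
    ("above", "x", "y"),  # Above(x,y): "x is above y"
}

def holds(*proposition):
    """True if the given predicate appears in the visual description."""
    return proposition in one_place or proposition in two_place

print(holds("circle", "x"))      # True
print(holds("above", "y", "x"))  # False: Above(x,y) is an ordered relation
```

In Pinker's graphic notation the same content would be drawn as two nodes with the one-place predicates printed beside them and "above" printed alongside an arrow from x to y.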
Figure 2.3: Pinker's Graphic Notation (nodes labeled triangle and circle)
Source: Pinker, 1981, p. 49, reproduced with permission.

According to schema theories (e.g. Minsky, 1975; Winston, 1975; Norman & Rumelhart, 1975; Bregman, 1977; Schank & Abelson, 1977), a schema is a representation in memory embodying knowledge in some domain; it consists of a description capable of specifying both the information that must be true of some represented object of a given class and the sorts of information that will vary from one exemplar of the class to another. In general, it is believed that a graph schema results from the basic human ability to associate a scale of values in one domain with a scale of values in another domain by relating the positive ends of the two respective scales. It is suggested that people also create schemas for specific types of graphs using a general graph schema embodying knowledge of what graphs are for and how they are generally interpreted.† Pinker (1981) and Kosslyn et al. (1983) define four major procedures or processes that access the structures representing graphic information:
1. A Match process that recognizes an individual graph as belonging to a specific type
2. A Message Assembly process that creates a conceptual message out of the instantiated graph schema
3. An Interrogation process that retrieves or encodes new information on the basis of conceptual questions
4. A set of Inferential processes that apply mathematical and logical inference rules to the entries of the conceptual message

Essentially, the Kosslyn-Pinker model of graph comprehension posits that a visual array of incoming information is first translated directly into a visual description via bottom-up encoding mechanisms. The visual description in turn primes the appropriate graph schema in memory via the Match process.
The visual encoding mechanisms then detect the presence of any predicates that are automatically encoded in the visual processes and perform assembly of conceptual messages. In short, the availability of these predicates causes certain conceptual message equations to be flagged when the necessary information to be extracted is also available. Otherwise, if the information is unavailable, the interrogation processes, via top-down encoding mechanisms, may be used to aid in answering the particular conceptual question‡ posed. Finally, the reader has the additional option of performing mathematical and logical operations on the entries in the conceptual message via the inferential processes, if necessary, though this consumes more time and effort. Figure 2.4 illustrates the Kosslyn-Pinker model of the graph comprehension process.

† For example, Kosslyn's analysis of graphs reveals that they consist generally of pictorial material, a framework, and a set of labels (see Kosslyn et al., 1983).
‡ See the Glossary for the definition of this term.

b. Pinker's Graph Difficulty Principle

The most critical aspect of the Kosslyn-Pinker model of graph comprehension is Pinker's (1981) Graph Difficulty Principle, which simply states that

A particular type of information will be harder to extract from a given graph to the extent that inferential processes and top-down encoding processes, as opposed to conceptual message lookup, must be used.

Pinker (1981, 1983) contends that his Graph Difficulty Principle has frequently been noted in the graph comprehension literature (e.g. Macdonald-Ross, 1977a) in such statements as:

...different types of graphs are not easier or more difficult across-the-board, but are easier or more difficult depending on the particular class of information that is to be extracted.
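The fallback structure just described can be caricatured in code. In this sketch the process names come from the model, but everything else (the query format, the example entries, and the numeric step count standing in for extraction difficulty) is an illustrative assumption, not Pinker's formalism.

```python
# Illustrative caricature of the Kosslyn-Pinker comprehension flow:
# try conceptual-message lookup first (entries the schema has flagged),
# fall back to top-down interrogation of the visual description, and
# finally to inferential processes. Per the Graph Difficulty Principle,
# a later fallback (higher step number) means harder extraction.

def answer(query, flagged_entries, visual_description):
    if query in flagged_entries:          # 1. conceptual message lookup
        return flagged_entries[query], 1
    if query in visual_description:       # 2. top-down interrogation
        return visual_description[query], 2
    return None, 3                        # 3. inference over other entries

flagged = {"trend(Mar-Jun)": "decreasing"}   # automatically encoded flags
described = {"value(Jan)": "$20/oz"}         # readable off the display

print(answer("trend(Mar-Jun)", flagged, described))  # ('decreasing', 1)
print(answer("value(Jan)", flagged, described))      # ('$20/oz', 2)
print(answer("value(Jun)", flagged, described))      # (None, 3)
```

The third case mirrors Pinker's graphium-in-June example: no flagged entry or directly readable value exists, so slower inferential processes must supply the answer.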
Figure 2.4: Pinker's Process Model of Graph Comprehension (a visual array is translated via bottom-up encoding into a visual description, which through a MATCH process instantiates a graph schema; message assembly produces the conceptual message, while top-down encoding, interrogation and inferential processes operate on these structures)
Source: Pinker, 1981, p. 63, reproduced with permission.

Based on the fact that our language system offers a variety of descriptions for the shapes of lines as well as pairs of lines (e.g. straight, curved, parallel, x-shaped, etc.), and that the availability of these predicates affords a rich set of message flags for trends in a line graph schema, Pinker (1981, 1983) argues that line graphs are especially suited to the extraction of trend information. Hence they are the preferred method of displaying multidimensional scientific data, where cause-and-effect relations, quantitative trends, and interactions among variables are at stake. On the other hand, Pinker (1981) notes that it is sometimes preferable to use a bar chart rather than a line graph to determine the difference between two levels of one variable corresponding to a pair of values on another, since the desired values are specified individually in bar graphs but not in line graphs. Yet Pinker (1983) admits that one of the major problems with this observation is that

...there is no independent evidence for the putative perceptual effects alluded to (e.g., effortless perception of line shape and single bar length versus effortful perception of a set of relative bar lengths and the height of segment of a curve).

In a similar vein, there has been no independent source of empirical evidence to suggest that the use of symbol charts to support scale value extraction is superior to the use of other graphs.
In other words, there is a definite need to gather direct and independent empirical evidence to resolve this very basic argument about the alluded-to perceptual effects.

c. Pinker's Treatment of Information Extraction

Pinker's treatment of the information extracted from graphs follows closely that of Bertin (1967, 1973). According to him, the information extracted can be expressed in a representation comprising a list of numbered entries, each of which specifies a pair† of variables, the type or extent of the independent variable (i.e. single value, pair, range), and the value (or difference or trend) of the corresponding dependent variable.

† For more complex graphs, an n-tuple of variables must be specified instead (see Pinker, 1981, p. 17).

For instance, in the case of figure 2.5, taken from Pinker, the following conceptual messages may have been assembled:
1. The price of graphium is very high in March.
2. The price is higher in March than in the preceding months.
3. The price steadily declined from March to June.
4. The price is $20/ounce in January.
5. The price in June is x (where x is a quantity about half of that for January, about a fifth of that for May, etc.).

As a result, Pinker tabulates the set of paired observations assumed to be extracted from the graph as follows:

1: Va absolute-value = March,        Vb level = high
2: Va pair = March & February,       Vb difference = higher
3: Va range = March - June,          Vb trend = decreasing
4: Va absolute-value = January,      Vb absolute-value = $20/oz.
5: Va absolute-value = June,         Vb ratio-value = x.

Thus, it has been surmised that conceptual messages may generally be expressed as an input-output pair of

i: Va ratio-value/absolute-value/pair/range = A, Vb ratio-value/absolute-value/pair/range = B

where the subscript "i" represents an arbitrary number of entries and the subscripts "a" and "b" the respective variables.‡

‡ See Pinker (1981, p. 18) for more detail.
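The numbered-entry representation above translates naturally into a record type. In this sketch the field values follow the graphium tabulation in the text, but the class name, field names, and the use of the string "?" for an unknown slot are modeling choices made here for illustration.

```python
# Sketch of Pinker's conceptual-message notation as a data structure.
from dataclasses import dataclass

@dataclass
class Entry:
    iv_extent: str   # extent of the independent variable:
                     # "absolute-value", "pair", or "range"
    iv: object       # its value, e.g. "March" or ("March", "June")
    dv_kind: str     # "level", "difference", "trend", "absolute-value", ...
    dv: object       # the dependent-variable reading; "?" marks the
                     # desired but unknown slot of a conceptual question

message = [
    Entry("absolute-value", "March", "level", "high"),
    Entry("pair", ("March", "February"), "difference", "higher"),
    Entry("range", ("March", "June"), "trend", "decreasing"),
    Entry("absolute-value", "January", "absolute-value", "$20/oz"),
]

# A conceptual question -- "what is the price in June?" -- is the same
# entry shape with the unknown piece of information left as "?":
question = Entry("absolute-value", "June", "absolute-value", "?")
print(question.dv == "?")  # True
```

This makes concrete the point made in the text that a conceptual question is simply a conceptual message entry with A or B replaced by the "?" symbol.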
Figure 2.5: Pinker's Information Extraction from a Bar Chart ("Monthly Price of Graphium": price per ounce plotted against the months January through June)
Source: Pinker, 1981, p. 56, reproduced with permission.

Consequently, the notation for conceptual questions may be expressed simply by replacing the A or B in the generalized entry presented earlier with the "?" symbol, indicating that it is the desired but unknown piece of information. Finally, a good grasp and illustration of the Kosslyn-Pinker model of graph comprehension requires a careful examination of their depicted bar chart schema, which is discussed next.

3. Graph Schema Models

a. A Bar Chart Schema

Figure 2.6 presents a substantial chunk of a schema for interpreting bar charts (Pinker, 1981; Kosslyn et al., 1983). It shows how the scene of a bar chart is mentally divided into its "L-shaped" framework† and its pictorial material: the bars. From there, the framework is subdivided into the abscissa and the ordinate, which are further subdivided into the actual line and the text printed alongside it. There are also explicit listings for the "pips" cross-hatching the ordinate and the numbers associated with them. All of the relevant information associated with each bar (e.g. height, position, etc.) is specified with respect to the coordinate systems centered on the respective axes of the framework. Pinker uses an asterisk to show that a node, together with its connections to other nodes, can be duplicated any number of times in the visual description.
Conceptual information that is available for "reading off" the instantiated graph schema, such as the psychophysical ratio-value of the independent variable (IV), equated with the horizontal position of the bar with respect to the abscissa, and that of the dependent variable (DV), equated with the height of the bar with respect to the ordinate, is specified by means of equation flags appended to the respective nodes or arrows. More interestingly, Pinker notes that in the bar chart schema, deriving the DV absolute-values* may require the mediation of one's psychophysical ratio-values, but not so in the case of the IV absolute-values, due to the fact that bars stand physically on the abscissa framework. Similarly, the extremeness levels of the DV, including the maximum or minimum levels, and the DV "staircase-trend" for a simple entity plot or the level differences of an adjacent bar pair, are presumed to be automatically available. Even then, Pinker hypothesizes that higher-level inferential processes may have to be used when converting between ratio-values and absolute-values of unknown entries† in the conceptual message, which means more time and effort.

† This is the two planar dimensions described in Bertin (1983): the "x-" and "y-" axes.
* Pinker uses the term absolute-value to denote a scale whose units are discrete and well-defined, such as the number of bars in a bar chart; however, he uses the term ratio-value

Taken together, this bar chart schema claims that readers (at least those who are experienced) are able to translate directly a higher-order perceptual pattern (e.g.
a group of bars comprising a staircase) into a quantitative trend, to translate efficiently differences between a pair of adjacent bars into an entry expressing a difference in the symbolized values, and to translate a salient perceptual entity into an entry expressing the extremeness of its corresponding variable value, all without the mediation of one's psychophysical ratio-scale values.

b. A Symbol Chart Schema

The Kosslyn-Pinker graph schema presented here is extensible to modeling comparable graph schemas: a pie chart schema, a dot or symbol chart schema, a line graph schema, and so on. In what follows, this paper offers a brief but concise extension of the Kosslyn-Pinker model to the symbol‡ and line representations, since Pinker's version of the bar chart schema is sufficient for the purpose of this research.

* (cont'd) to mean those quantities whose units could be changed to other units without any loss of information (e.g. yards-feet-inches). See also the Glossary.
† E.g., the graphium price in June in figure 2.5.
‡ For a dot chart, each dot may be treated as a specific kind of symbol.

Figure 2.6: Pinker's Proposed Bar Chart Schema. Source: Pinker, 1981, pp. 61-62; reproduced with permission.

A natural follow-up for the symbol chart schema model is to divide its visual description into its "L-shaped" framework and its pictorial material: the dots or symbols. Again, the framework should be subdivided into the abscissa and the ordinate, and further into the actual line and the text printed alongside it. Such a schema should be very similar to the bar chart schema. Thus, its visual description should contain explicit listings for the "pips" cross-hatching the ordinate and the abscissa, as well as the letters or numbers associated with them. All of the relevant information (e.g.
vertical and horizontal positions) of each symbol should be specified via the coordinate systems centered on the respective axes of the framework. Again, an asterisk may be used to indicate that a node, together with its connection to all other nodes, can be duplicated any number of times in the visual description. Conceptual information that is available for "reading off" the instantiated schema, such as one's psychophysical ratio-value of the independent variable, equated with the horizontal position of the symbol with respect to the abscissa, and that of the dependent variable, equated with the vertical position with respect to the ordinate, should be specified by means of equation flags appended to the respective nodes or arrows.

The key differences between the bar chart schema and that of the symbol chart lie in the anchoring of bars to the abscissa framework and in the stronger linkages among the symbols. Bars are more likely to be perceived as rectangular objects strongly anchored to the abscissa, whereas symbols have only a weak anchoring to the abscissa framework. In addition, symbols with the same shape show a greater cohesiveness than bars, especially when they represent multiple datasets. Finally, Garner's (1970) structural dimensional principles would postulate that the vertical and horizontal positions of bars as well as symbols are integral, but the simplicity of the symbol chart should facilitate the reader's ability to translate interpolated scale values of a symbol's exact location more effectively than those of either a bar or a point on a line, as observed in Cleveland & McGill (1984). With respect to a pair of symbols, rather than translating a level difference into a judgement of difference in anchored length, as in the case of a pair of bars, level difference information would translate into a judgement of positional difference.
Again, the latter task is performed more accurately by human visual processes (Cleveland, 1985). As for trend perception, a series of bars translates into either an ascending or descending staircase, or else merges into a large rectangular shape.† In contrast, a series of symbols may either produce a series of meaningful patterns if they are placed close together, or else result in an array of scattered patterns, especially if they are widely spread out.

† See figure 2.6.

c. A Line Graph Schema

In the case of a line graph schema, its visual description should again be naturally decomposed into its "L-shaped" framework and its pictorial material: the lines. Like other graph schemas, the framework will be subdivided into the abscissa and the ordinate, which are further subdivided into the actual line and the text printed alongside it. Evidently, the line graph schema does not have any straightforward way of deriving absolute-values for either the DV or the IV, the lines being totally disjoint from the "L-shaped" (i.e. axes) framework. Instead, one seems to assess the locations of segmented "points" along both the horizontal and vertical spatial dimensions in terms of the reader's psychophysical scales, and then pips are searched along both the abscissa and the ordinate for those closest to specifying the point's location. Thus, the point's positional scale-values are spelled out by the corresponding numbers or letters that must be disembedded and matched onto the appropriate pips via an "interpolation" process (i.e. one of the inferential processes). Consequently, it requires effort to extract single datapoints on a line. The Gestalt Laws of organization of marks would also argue that each line has strong point cohesiveness (i.e. it is fully connected), and thus patterns or trends of successive pairs or ranges of points are seen together. A longer† range of abscissa values would also enhance trend and pattern perception, in direct contrast to bars.
This is especially true for schemas of multiple-dataset plots, since an additional dataset represented as bars will create multiple additional bar entries, but only one additional line entry when represented as a line. Hence, conceptual messages on patterns and trends will naturally be assembled faster with lines than with a bar or symbol schema. In summary, the good aspect of the line graph schema is its rich set of message flags for trends: richer, in fact, than in any of the other types of graph or chart schemas discussed. For example, Pinker argues that in the case of the standard Cartesian line graph representing a dependent variable on the ordinate and two independent variables on the other components:

The absence of an effect of the independent variables on the dependent variable translates into flat, overlapping lines; an effect of one of the independent variables translates into two lines with a slope, and an effect of the other independent variable translates into non-overlap of the two lines; additivity of the effects of the two independent variables translates into parallel lines and non-additivity into nonparallel lines, and so on (Pinker, 1983, pp. 6-7).

4. Implications of the Kosslyn-Pinker Theory

The Kosslyn-Pinker process theory of information representation uncovers a number of stages that a graph reader undergoes in the comprehension process. Essentially, time performance for a particular task is longer with top-down as opposed to bottom-up processing. Experience or familiarity with extracting various data from certain graph formats will shorten processing time. The use of a certain format may also induce faster processing strategies for one task than for another. The important implication of the theory is, then, the appropriate mapping of formats to tasks in order that the most efficient processing strategies may be executed.
† By that I mean a larger number of "time periods" to be depicted along the abscissa of a time-series graph.

The findings in the literature appear to generally support such a view. For example, with time-series data, subjects were found to read trends better with line graphs than with horizontal or vertical bar charts (Schutz, 1961). However, tables generally led to better performance than graphs for point value reading tasks (Washburne, 1927; Carter, 1947; Lusk & Kersnick, 1979; Benbasat et al., 1986). When comparing points and patterns, the evidence in the literature is that graphs, as compared to tables, lead to better performance (Washburne, 1927; Carter, 1948; Feliciano, Powers, & Bryand, 1963). A more critical implication of the theory that is of relevance to this research, however, is the focus on principles (e.g. the Gestalt Laws, Pinker's Graph Difficulty Principle) governing why some forms of graphical representation should be more supportive of performance with some tasks but not with others. For example, Pinker's Graph Difficulty Principle allows many of the phenomena about the ease or difficulty of reading graphs to be explained, and ideas discussed in earlier theories to be integrated for empirical validation. Several a priori hypotheses are thus drawn from these theories and evaluated in this research program so as to add to the accumulated evidence (e.g. Pinker, 1983; Cleveland & McGill, 1984) supporting (or refuting) the general predictive validity of the theories.

III. THEORETICAL PROPOSITIONS

The literature review of the preceding chapter indicates that it is (a) the task, in terms of the conceptual question asked, (b) the format of graphical representation, and (c) the graphical information complexity of the display that affect performance (e.g. time) with an information presentation. A fourth factor, the experience of the graph reader, could also influence performance.
This factor will not be manipulated; rather, replications of the experimental sessions will be used to minimize learning and experience effects.

A. CRITICAL FACTORS

Accordingly, this chapter begins with a discussion of those factors believed to critically affect the use of an information presentation:

1. Graph Format
2. Information Complexity
3. The Task Variable

1. Graph Format

Different writers use different terms, such as modes, forms, formats, and/or representations, to convey more-or-less the same meaning (Lusk & Kersnick, 1979; Lucas & Nielsen, 1980; Kosslyn, 1982; Larkin & Simon, 1987). For example, the term visual factors (Washburne, 1927, p. 468) was used to refer to those factors that have to do with similarities and/or dissimilarities of the geometrical patterns used in graphic numerical representations. Since the perception of various graph formats is constrained by various operating principles governing human visual information processing (Kosslyn et al., 1983), the key issue regarding the use of one graphic format instead of another lies precisely in the resulting pattern(s) that the human visual system can automatically extract (Pinker, 1981, 1983; Kosslyn et al., 1983). For example, Kosslyn (1985) argues that three lines that meet to form an enclosed figure are perceived as a triangle, rather than simply three lines, apparently because the human visual system is governed by the perceptual operating principles known as the Gestalt Laws of organization discussed in chapter 2. Other examples where Gestalt principles apply include the automatic registration of entire lines in a line graph, the reading of spatially isolated symbols in a symbol graph, and the perception of discrete rectangular bars anchored along the abscissa in a bar chart.
Accordingly, the relative effectiveness of a graph in conveying its informational content depends greatly on the choice of its representational format: circles, triangles, shaded figures, unfilled or filled dots, symbols, wedges, bars, pictures, or lines. It is this choice that complicates important design issues for the color- and graphics-enhanced information support systems available to end-users and/or decision makers. This research is limited to dealing with the three graph formats that are the most widely used in time-series representations: bars, symbols, and lines.

2. Information Complexity

Bertin (1983, p. 6) treats the complexity of graphics as a function of the number of identifiable elements in each variable component or dimension.† He uses the term component to refer to the two planar dimensions and six retinal variables, including value, shape, size, color, texture, and orientation,‡ which, he claims, are the only possible variations available in graphics designs. As noted in the preceding chapter, a major facet of information complexity in time-series graphics corresponds to variations in the "length" of the various graphical components (e.g. the "quantity scale" represented by the ordinate scale pips, the "time period variation" represented by the abscissa time axis pips, and the "dataset category" represented by the classification of data groupings).*

† Please consult the Glossary (appendix A) for the definition of a Dimension.
‡ These terms are defined in the Glossary (appendix A).
* Refer to the Glossary for detailed definitions of these variational concepts.
Owing to the lack of either a comprehensive theory or sufficient empirical evidence to suggest how these various factors of graphical information complexity could affect performance with a graphical presentation, and how they would interact with each other, a logical approach to exploring their influence would be to combine what might constitute discriminable values or levels of the various complexity factor treatments, as proposed in table 3.1.†

One advantage of this classification scheme (table 3.1) over that offered by Lauer et al. (1985), when applied to time-series graphics, is its parsimony. The scheme also focuses on the more significant notion of length‡ and ignores the less important construct of the regularity* factor. More critically, by investigating each of the component lengths at both the high and low level combinations relative to each other, independent contrasts and comparisons of the different factorial effects may be achieved, rather than the possible confounding of effects due to the various factor combinations proposed in the Lauer et al. scheme. In fact, a one-to-one correspondence may be drawn between the factors of graphical information complexity presented in table 3.1 and the major graphical components characterizing a time-series graph, as illustrated in figure 3.1:

† See also figure 3.1 for an illustration of the various components affecting the complexity factors listed in table 3.1.
‡ The term length as defined by Bertin refers to the number of identifiable elements in a given component or variable. For example, the PIV component length of a time-series graph equals the number of time periods depicted along its abscissa, while its SIV component length would be the number of data groupings represented in a key (i.e. legend). For time-series graphics, the ordinate length could be described as an inverse function of the number of interpolating pips (i.e.
quantity representation) or the number of significant digits used in the DV scale component. See also Lauer et al. (1985), Lauer (1986), and Yoo (1985) for further discussion of the issue of "length."
* Lauer (1986) and Lauer et al. (1985) operationally defined regularity as the degree of fit as well as the percent rank changes in slope. In contrast to the more-or-less demonstrated effects due to the lengths of time-series graphic components, there is little evidence to indicate the significance of a regularity construct (Yoo, 1985; Lauer, 1986). Even so, the definition of this construct is somewhat more speculative than that of Schutz's degree of line-crossing (or confusability factor), which was also found to contribute to the increasing complexity of graphics (see Lauer et al., 1985; Schutz, 1961b). One way to control the Schutz effect is to ensure that the data sets used in constructing the graphics stimuli do not result in any form of line-crossing.

1. The Dependent Variable (DV) component, characterizing the quantity scaling represented along the ordinate scale (i.e. y-axis)
2. The Primary Independent Variable (PIV) component, characterizing the time period variation represented along the abscissa time axis (i.e. x-axis)
3. The Secondary Independent Variable (SIV) component, characterizing the dataset category represented as a coding scheme in a key (i.e. legend)

The importance of manipulating complexity factors in the study of information presentations may be attributed to:

1. The difference between extracting the same information from a complex as opposed to a simple time-series representation. Complexity is important because of its relationship to the amount of information a reader can assimilate and understand in a presentation. Bertin comments that the results of the Wainer et al. study (see Wainer et al., 1982) may vary as data amount and complexity increase.
Pinker (1981, 1983) observes that the effects of extracting information such as relative levels or trends from various forms of graphs may differ when complex rather than simple graphs are used.

2. The increasing size of the organizational information resource (see Lauer, 1986). There is a need for guidelines for IS designers on how increasingly complex information may be effectively presented. Knowledge of the relative effectiveness of various complex graphics for various kinds of tasks can contribute to a better understanding of graphical design principles. Undoubtedly, complex time-series graphics are often found among real-world applications.

3. Finally, results of task performance comparing simple versus complex graphics will yield better generalizations than findings based on only simple or only complex graphs. Further, graphics used in an MIS context have been far more complex than what has been tested by researchers in areas outside of the MIS field. Consequently, to produce useful results for all practical purposes, graphs ranging from the fairly simple to the complex are used as experimental stimuli in this research.
Table 3.1: A Classification Scheme for Information Complexity Factors

Information    Quantity    Time Period    Dataset
Complexity     Scaling     Variation      Category
a              Low         Low            Low
b              High        Low            Low
c              Low         High           Low
d              High        High           Low
e              Low         Low            High
f              High        Low            High
g              Low         High           High
h              High        High           High

Figure 3.1: [Time-series graph titled "Gross Proceeds of Various Corporate Security Issues from the Year 1910 to the Year 1982," annotated to show the three attribute components: the Dependent Variable component (quantity scaling, along the ordinate), the Primary Independent Variable component (time period variation, along the abscissa), and the Secondary Independent Variable component (dataset category, in the legend: public bonds, stocks, private bonds).]

Owing to the limited number of factors that can be effectively studied at once, the two factors of "time period variation" and "dataset category" of time-series graphics, as defined in the Glossary, are the only manipulations of graphical information complexity performed in this research.

3. The Task Variable

Essentially consistent with Bertin's taxonomy of question-types, Pinker (1981) and Kosslyn et al. (1983) provide a taxonomy of the basic classes of information, for both the independent and the dependent variables, that is extractible from any information representation. Their independent variable (IV) classification includes: (a) a single datapoint; (b) a pair of adjacent datapoints; and (c) a range of successive datapoints. Their dependent variable (DV) information categories include: (a) a level; (b) a ratio-value; (c) an absolute-value; (d) a difference; and (e) a trend.
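The eight cells (a-h) of table 3.1 are simply the full 2 x 2 x 2 factorial crossing of the three component-length factors. A short sketch (the labels and variable names are mine) generates the same scheme programmatically:

```python
# Generate the eight information-complexity treatment cells of table 3.1:
# every Low/High combination of the three component-length factors.
factors = ("quantity scaling", "time period variation", "dataset category")
levels = ("Low", "High")

cells = {}
for i in range(2 ** len(factors)):
    label = chr(ord("a") + i)
    # Bit j of i picks the level of factor j; the first factor varies
    # fastest, reproducing the a-h row order of table 3.1.
    cells[label] = {f: levels[(i >> j) & 1] for j, f in enumerate(factors)}
```

Enumerating the cells this way makes it easy to verify that every factor appears at each level in exactly half of the treatments, which is what permits the independent contrasts discussed above.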
The difference between the concept of a ratio-value and that of an absolute-value is explained in the Glossary.† The table below compares and contrasts the classifications proposed by Bertin (1981) and those of Pinker (1981) regarding the classes of primitive symbols* and/or the relationships among these primitives that are extractible from a presentation.

Bertin's Levels   Bertin's Classes          Pinker's IV                        Pinker's DV
Elementary        A Primitive Symbol        A Single Datapoint                 A Scale Value; A Level
Intermediate      A Homogeneous Cluster     A Pair of Adjacent Datapoints      Scale Values; A Difference; A Trend
Overall           A Comprehensive Cluster   A Range of Successive Datapoints   Scale Values; Relative Levels; A Trend

† These terms are also discussed in a preceding footnote under the section on Pinker's proposed bar chart schema.
* Refer to the Glossary for a definition of this term. Note also that since the interest of this research is only in the y-axis scale values of the various time-series datapoints, the term scale-value is preferred to the more specialized terms absolute-value and/or ratio-value as defined in the Glossary.

Indeed, a good grasp of the kinds of critical and fundamental questions that may be asked about the extraction of various sorts of quantitative information from a graphical presentation can be arrived at by carefully examining the above table. For a single datapoint on a time-series, the fundamental information to extract is its DV quantity or scale value. Examples of conceptual questions on single datapoints for time-series graphics, with the usual quantity-time correspondence represented along the y- and x-axes as depicted in figure 3.1, are:

1. What are public bonds' proceeds in 1934? In 1928? In 1952?
2. When did proceeds from stocks first reach the $400 million mark? The $250 million mark?
3. Which investment type, in any year, comes closest to the $500 million mark?
To find the quantities or scale values of two or more datapoints would just be a natural extension of this fundamental task. For a pair of adjacent datapoints, on the other hand, information about its DV level difference appears basic and important. Examples of conceptual questions on adjacent pairs of datapoints for time-series graphics (see figure 3.1) with the usual quantity-time correspondence are:

1. Is the level of proceeds from stocks in 1958 higher than that in 1964?
2. Did the greatest change in levels of proceeds from stocks occur between the years 1958 and 1964?
3. Which investment has the largest change in levels of proceeds between two consecutive time periods?

The concept of a level difference is meaningless for an isolated datapoint. The level differences for a range of successive datapoints are, evidently, another natural extension of the basic task discussed here. Finally, for a range of successive datapoints, information regarding its DV trend appears to be most critical. Examples of conceptual questions on successive ranges of datapoints for time-series graphics with the usual quantity-time correspondence (see figure 3.1) are:

1. What is the general trend of proceeds from private bonds from 1940 to 1982?
2. What is the maximum number of time periods over which proceeds from stocks continued to rise?
3. Which investment has the longest falling trend in proceeds?

First, it is not possible to specify a trend for a single datapoint. Moreover, the concept of a trend for a pair of adjacent datapoints is simply embedded in the more basic concept of the level difference between them. What appear to have been left out on purpose, however, are cases of pairs or ranges of non-adjacent datapoints. This is because when two or more datapoints are non-adjacent, there must be some other datapoints in between.
Hence, the different kinds of questions that can be asked about them may be regarded as further extensions and/or generalizations of what has already been discussed. Accordingly, these cases are best treated as composite tasks, to be studied as extensions of the elementary questions described so far. As a further complication, since each datapoint in a time-series with just one dataset depicts only a time-quantity (x, y) attribute correspondence, when there are two or more datasets the need arises for an additional attribute characterization: that of a dataset classification. In such an event, the resulting attribute characterizations would then correspond to, first, the DV quantity scale component, then the PIV time period component, and the SIV dataset categorization component.†

† Figure 3.1 illustrates the meanings of these attribute characterizations.

Collectively, then, a task may be classified by:

1. Attribute component, depending on the attribute component from which the answer is to be disembedded: the DV, the PIV, and/or the SIV component†
2. Question type, such as questions on scale values, level differences, trends, cluster relationships, and so on*

Table 3.2 shows a general classification of graphics research tasks. Limiting question types to the three elementary classes discussed so far, six different elementary task combinations are thus identified for a time-series with only one dataset, and nine elementary task combinations for a time-series with two or more datasets.* Table 3.3 shows the classes of elementary questions that are studied. These include:

1. Exact Questions (i.e. scale values of single datapoints)
2. Relationship Questions (i.e. level differences of pairs of consecutive datapoints)
3. Trend Questions (i.e. trends of ranges of successive datapoints)

This research program investigates, in a systematic order, the nine different fundamental task combinations presented in table 3.3.
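The six- and nine-task counts follow directly from crossing the attribute components with the three elementary question types, with the SIV component dropping out for single-dataset graphs. A minimal sketch (the function and names are mine, introduced only for illustration) makes the counting explicit:

```python
from itertools import product

QUESTION_TYPES = ("exact", "relationship", "trend")  # scale value, level difference, trend
COMPONENTS = ("DV", "PIV", "SIV")

def elementary_tasks(n_datasets: int):
    """Cross the graph attribute components with the elementary question
    types; SIV questions apply only when more than one dataset is shown."""
    applicable = COMPONENTS if n_datasets > 1 else ("DV", "PIV")
    return list(product(applicable, QUESTION_TYPES))

# One dataset yields 6 task combinations; two or more yield the 9 studied here.
```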
† Note that questions pertaining to extracting embedded answers from the SIV component (tables 3.2 and 3.3) are applicable only when more than one dataset (figure 3.1) is involved.
* See table 3.2.
* See table 3.3.

Table 3.2: A General Classification of Graphics Research Tasks

Graph Attribute Components   General Information-Extraction Tasks
                             Scale Values | Relative Levels | Trends | ...etc.
DV Component
Primary IV Component
Secondary IV Component

Table 3.3: Classes of Elementary Comprehension Tasks

Graph Attribute Components   Classes of Elementary Questions Studied
                             Exact Questions | Relationship Questions | Trend Questions
DV Component
Primary IV Component
Secondary IV Component

4. Learning

It is believed that experienced graph readers hold in long-term memory the correspondences between quantitative trends and visual patterns for various types of graphs. For example, crossing lines in a line graph would convey to an expert graph reader the existence of an interaction effect, whereas parallel lines in the same graph would translate into the absence of such an effect. Similarly, a descending staircase in a bar graph would indicate a falling trend to the efficient bar chart reader, whereas an ascending staircase would indicate a growing trend. As it is very difficult to monitor people's experience and training in the use of various types of graphs, some of the methods available to researchers for investigating as well as reducing the possibility of a strong confounding learning effect would, for example, be: to restrict the study to conventional graphics (i.e. graphs whose construction is based on established rules or standards); to choose appropriate candidates (e.g.
subjects who have been exposed to the graphs under investigation); and to apply a time-repeated-measures design (i.e. experimental replications). Replications of the experimental sessions should not only provide an indication of the significance of learning, but should also help to partial out spurious factorial effects attributable to a lack of practice or unfamiliarity. In this research, each participant undergoes two replicated experimental sessions, each consisting of a total of thirty-six treatment combinations or trials.

B. TASKS INVESTIGATED IN THIS RESEARCH

The characterization of the tasks examined in this research is based on the graph attribute component (table 3.3) from which answers to the questions presented in each respective experiment are to be extracted. Hence, one may classify experimental task characteristics according to the variable from which pertinent information is chiefly to be extracted, namely:

1. Dependent variable (DV) component (in experiment 1)
2. Primary independent variable (PIV) component (in experiment 2)
3. Secondary independent variable (SIV) component (in experiment 3)†

Tables 3.4 and 3.5 compare and contrast the task activities for each of the three experiments conducted in this research program. It should be noted that the processing of task activities for E1, E2, and E3 is expected to be in an order of increasing complexity, because explicit time period and dataset information anchoring is afforded in the tasks for E1 and E2 but not for E3. Tables 3.4 and 3.5 suggest that the amount of search processing needed for performing E3 tasks, due to unknown dataset information, should increase the time required as well as reduce accuracy relative to performing E1 and E2 tasks; that is, in E3 subjects will have to search all datasets in order to answer the question, whereas in E1 and E2, subjects need only examine information related solely to one dataset.
Similarly, E2 tasks are expected to take longer to perform than E1 tasks, due to the fact that while time period information is explicitly given for E1 tasks, the appropriate time-period information has to be extracted for E2 tasks. This requires the search of multiple points of the given dataset in E2.

1. Experiment 1 Tasks

As presented in table 3.4, the task activities for experiment 1 comprise:

1. Q1 -- Finding the DV scale-value of a single datapoint with an explicitly defined time period (i.e. PIV information) on a particular dataset (i.e. SIV information) and comparing it to a given DV scale-value
2. Q2 -- Finding the DV level difference pattern of a pair of adjacent datapoints with explicitly defined time periods (i.e. PIV information) on a particular dataset (i.e. SIV information)
3. Q3 -- Finding the DV trend of a range of successive datapoints with explicitly defined time periods (i.e. PIV information) on a particular dataset (i.e. SIV information)

Questions for this experiment have answers that are to be extracted specifically from the DV component (i.e. scale-value, level difference, and trend information) based on given PIV (i.e. time period) and SIV (i.e. dataset) information.† As highlighted in a later section of this chapter, the key characteristic of the tasks in this experiment is the strong anchoring of time-period information on the abscissa component; that is, all tasks in this experiment begin with explicit time-period information and work towards uncovering the respective DV component information. The treatment combinations for the factors of graphical information complexity are presented in table 3.6. Appendix B provides the 36 different treatment combinations (trials) to be undertaken in this experiment. Actual questions and accompanying graphics are shown.

† See figure 3.1 and tables 3.4 & 3.5.

2. Experiment 2 Tasks

As presented in table 3.4, the task activities for experiment 2 comprise:

1.
1. Q1 -- Examining the DV scale-values of single datapoints across all time periods of a particular dataset for a given DV scale-value
2. Q2 -- Examining the DV level difference patterns of pairs of adjacent datapoints across all time periods of a particular dataset for a given DV level difference pattern (increasing or decreasing)
3. Q3 -- Examining the DV trends of ranges of successive datapoints across all time periods of a particular dataset for a given DV trend

Table 3.5 indicates that questions for experiment E2 have answers that are to be extracted from the PIV component (i.e. time periods) based on given SIV information (i.e. dataset) and characteristics of DV component information. Similar to tasks in experiment E1, the key characteristic of tasks in E2 is a strong anchoring of time-period information on the abscissa component.†

† See table 3.5.

Table 3.4: A Comparison of Task Activities for Experiments E1, E2 and E3

Experiment 1
- Exact questions (Q1): Finding the DV scale-value of a single point with explicitly defined IV time period and dataset.
- Relationship questions (Q2): Finding the DV levels of two adjacent points with explicitly defined IV time periods and dataset.
- Trend questions (Q3): Finding the DV trend of a range of points with explicitly defined IV time periods and dataset.

Experiment 2
- Exact questions (Q1): Examining the DV scale-values of isolated points across time periods of a particular dataset to find a specific IV value.
- Relationship questions (Q2): Examining the DV level difference pattern of adjacent points across time periods of a particular dataset to find a pair of IV values.
- Trend questions (Q3): Examining the DV trends of successive points across time periods of a particular dataset to find a range of IV values.

Experiment 3
- Exact questions (Q1): Examining the DV scale-values of isolated points across time periods of all datasets to find a dataset value.
- Relationship questions (Q2): Examining the DV level difference pattern of adjacent points across time periods of all datasets to find a dataset value.
- Trend questions (Q3): Examining the DV trends of successive points across time periods of all datasets to find a dataset value.

Table 3.5: Status of Information on the Various Attribute Components of Time-Series Graphics

E1:
- Q1 -- Primary IV: given (specific x-axis value); Secondary IV: given (specific dataset value); DV: find (compare derived DV scale value to a given specific scale value).
- Q2 -- Primary IV: given (consecutive pair of specific x-axis values); Secondary IV: given (specific dataset value); DV: find (relationship of two scale values).
- Q3 -- Primary IV: given (successive range of specific x-axis values); Secondary IV: given (specific dataset value); DV: find (trend of a range of scale values).

E2:
- Q1 -- Primary IV: find (specific x-axis value); Secondary IV: given (specific dataset value); DV: given (specific DV scale value).
- Q2 -- Primary IV: find (consecutive pair of specific x-axis values); Secondary IV: given (specific dataset value); DV: given (relationship of two scale values).
- Q3 -- Primary IV: find (successive range of specific x-axis values); Secondary IV: given (specific dataset value); DV: given (trend of a range of scale values).

E3:
- Q1 -- Primary IV: not applicable; Secondary IV: find (specific dataset value); DV: given (specific DV scale value).
- Q2 -- Primary IV: not applicable; Secondary IV: find (specific dataset value); DV: given (relationship of two scale values).
- Q3 -- Primary IV: not applicable; Secondary IV: find (specific dataset value); DV: given (trend of a range of scale values).

The only difference, however, is that E2 tasks all begin with known characterizations of DV component information and work towards uncovering time-period information on the abscissa. In short, time-period information is not explicit in E2 as it is in E1. Treatment combinations for factors of graphical information complexity are similar to experiment E1 (table 3.6). Appendix C lists the 36 different treatment combinations (trials) to be undertaken in this experiment, including both questions and accompanying graphics for each trial.

3. Experiment 3 Tasks

As presented in table 3.4, task activities for experiment 3 comprise:

1. Q1 -- Examining the DV scale-values of single datapoints across all time periods and all datasets depicted for a given DV scale-value
2. Q2 -- Examining the DV level difference patterns of pairs of adjacent datapoints across all time periods and all datasets depicted for a given DV level difference pattern
3. Q3 -- Examining the DV trends of ranges of successive datapoints across all time periods and all datasets depicted for a given DV trend

As observed in table 3.5, questions for the third experiment have answers that are to be extracted from the SIV component (i.e. dataset information) based solely on given characteristics of DV information.
That is, subjects need to search along both the PIV attribute component (i.e. all time periods) as well as the SIV attribute component (i.e. all datasets) in order to perform the tasks in this experiment. In a later section, it will be shown that the key characteristic of tasks in this experiment is the absence of a strong anchoring of time-period information on the abscissa component. Treatment combinations for factors of graphical information complexity are provided in table 3.7. Appendix D presents the 36 different treatment combinations (trials) to be undertaken in this experiment, with actual questions and accompanying graphics for each trial.

Table 3.6: Information Complexity Manipulated in Experiments E1 and E2

Information complexity | Time period variation | Dataset category | Total datapoints plotted
a | 7 | 1 | 7
b | 14 | 1 | 14
c | 7 | 3 | 21
d | 14 | 3 | 42

Table 3.7: Information Complexity Manipulated in Experiment E3

Information complexity | Time period variation | Dataset category | Total datapoints plotted
a | 7 | 2 | 14
b | 7 | 3 | 21
c | 14 | 2 | 28
d | 14 | 3 | 42

Note that the variables manipulated in E3 differ from those of experiments E1 and E2. First, only multiple representations of time-series graphics are used in E3, since the emphasis is on asking questions dealing with the SIV attribute component. Second, this change of factor levels of graphical information complexity in the experiment should result in a greater demand on the time and effort of the participants. Indeed, time and accuracy results of pilot testing indicate that subjects found the task activities of E3 more demanding than those of E1 and E2.
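As a check on the two tables, the total number of datapoints plotted at each complexity level is simply the product of the time-period variation and the dataset category. The following sketch reproduces both tables (the variable and function names are mine, not the thesis's):

```python
# Complexity levels as (time periods, datasets), following tables 3.6 and 3.7.
E1_E2_LEVELS = {"a": (7, 1), "b": (14, 1), "c": (7, 3), "d": (14, 3)}
E3_LEVELS = {"a": (7, 2), "b": (7, 3), "c": (14, 2), "d": (14, 3)}

def total_points(levels):
    """Total datapoints plotted = time periods x datasets for each level."""
    return {name: periods * datasets for name, (periods, datasets) in levels.items()}

print(total_points(E1_E2_LEVELS))  # {'a': 7, 'b': 14, 'c': 21, 'd': 42}
print(total_points(E3_LEVELS))     # {'a': 14, 'b': 21, 'c': 28, 'd': 42}
```

Note that level d (14 periods, 3 datasets, 42 points) is shared by all three experiments, while E3 drops the single-dataset levels entirely, consistent with its exclusive use of multiple representations.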
Further supporting evidence for this lies in the increasingly larger mean reaction times and higher percentages of incorrect responses found for task performance during the actual sessions of E3 as compared to the other two experiments. In this research, the rationale for using three experiments instead of a single experiment lies precisely in this variation of task complexity across the experiments. Within each experiment, the tasks (i.e. Q1, Q2, and Q3) are generally designed to be equally complex, and task complexity is not a variable of interest.

C. THEORY & PROPOSITIONS

As noted in chapter 1, further progress in the area of MIS graphics research will likely come about with: (1) the introduction of a scheme by which complex tasks can be decomposed to a level where their underlying mechanisms may be characterized (Vessey, 1987); (2) the specification and validation of a sound taxonomy of question types (i.e. tasks) by which results from graphics studies can be compared and integrated (Davis, 1985; Jarvenpaa & Dickson, 1988); and (3) the generation of a priori hypotheses regarding why certain results may or may not be expected (Benbasat et al., 1986). The final sections of this chapter address these problems based on the concept of matching characteristics between tasks and graphical representations. It is this knowledge which will contribute to our understanding of the relative strengths and weaknesses of various graph formats for performing different tasks (Jarvenpaa, 1986; Vessey, 1987; Jarvenpaa & Dickson, 1988). The concept of matching appropriate formats to appropriate tasks is one that has continued to intrigue graphics theorists as well as researchers (see chapters 1 and 2).
A review of the current MIS empirical literature showed that there is a strong and growing interest among researchers in accumulating empirical evidence on the different circumstances in which various types of information representations may prove to be better or worse, based on criteria such as decision time, decision quality, interpretation accuracy, etc. Since reviews of empirical evidence comparing tabular and other types of graphical representations have recently appeared in the mainstream of MIS publications (e.g. Jarvenpaa & Dickson, 1988), this discussion will focus on views expressed among the theorists. First, Cleveland (1984) and Cleveland & McGill (1984) observe that the use of bar graphs should generally be avoided, and thus recommend that they be replaced by dot charts. Among the reasons cited for this claim are:

1. The elementary perceptual task(s) people are likely to perform on a dot chart would be that of judging position along a common scale, but both area and length judgments, which are found to be less accurate than positional judgments, are likely to play important roles when bar charts are used (Cleveland & McGill, 1984, p. 532-533);
2. Some data values will also be given more visual emphasis than other data values in bar representations, but not in dot representations (Cleveland, 1984, p. 277).

Second, the Kosslyn-Pinker model of graph comprehension appears to argue that a bar chart schema would allow more immediate extraction of values of the PIV attribute component for single bars than the disembedding of abscissa values for segmented points on a line. The process theory (e.g. Pinker, 1981; Kosslyn et al., 1983) actually implies that it is difficult to identify the relative levels of respective points on a line graph because that information is still part of a unitary Gestalt, the line. In contrast, identifying trends on a line is automatic.
Trend perception for bars may become effortful, however, owing to the occasional need to perform a serial identification and combination of relevant isolated bars. For time-series, Bertin's rules of graphic construction specifically recommend the use of: symbol graphs for overall vision (correlation); line graphs for general trend perception; and symbol-connected graphs or bar graphs for precision reading, depending on the way the conceptual question is phrased (see Bertin, 1983, p. 215; 1981, p. 114-115). In addition, Bertin (1981, p. 107) claims that the privileged domain of scatterplots is the discovery of clusters or groupings of objects, and/or the relationship between two characteristics. However, as the scope of this research is limited to studying time-series graphics, identifying DV cluster relationships falls outside the domain of tasks that are of interest. According to current theories, then, it is easier to extract scale-values from a dot or bar chart than from a line graph, and to extract trends from a line graph than from a dot or bar chart. But it appears uncertain whether symbol plots, dot charts, or bar graphs would best facilitate the extraction of level differences. In other words, effective communication can be achieved with the help of graphics only when the information intended to be read is represented in the most appropriate format. The table below brings together the various facets of theoretical support for the different sorts of information that should be optimally extractable from the various choices of graph format.
Graph Format | Question Type | Supporting Theories | Supporting Authors
Symbols/Dots, Bars | DV Scale Values | Task ordering; Process theories; Rules of construction | Cleveland; Kosslyn et al.; Pinker; Bertin
Bars, Dots/Symbols | DV Level Differences | Process theories; Task ordering | Kosslyn et al.; Pinker; Cleveland
Lines | DV Trends | Process theories | Kosslyn et al.; Pinker
Scatterplots | DV Clusters | Rules of construction | Bertin

The above table reveals that while some theories have a narrower focus, others, like the Kosslyn-Pinker process theories, appear to provide a more comprehensive explanation of the use of graphics over a wider range of tasks. One could also list the corresponding empirical evidence in an additional column indicating whether the theoretical predictions were empirically supported or refuted. However, because much of the empirical literature has focussed on tabular versus graphical representations, whereas this particular research emphasizes solely the use of different graphical methods for data extraction, it is argued that more evidence on the superiority of one graphical format over another still needs to accumulate before the suggested column is added. Finally, the theories are, unfortunately, much less explicit regarding complexity issues. Certainly, the introduction of graphical information complexity factors would further complicate the above table. For instance, Pinker (1981, p. 36) argues that the advantage of line graphs over bar graphs for perceiving DV trends would be expected to be even more pronounced in situations where grouped bar charts are used instead of multi-line graphs. This suggests a further reclassification of question types into those performed with single representations versus those performed with multiple representations.
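The theoretical mappings in the table above can be captured as a simple lookup structure, useful, for instance, when generating a priori predictions for a given question type. The dictionary below is my own illustrative encoding of the table, not part of the thesis:

```python
# Illustrative encoding of the summary table:
# question type -> (favoured formats, supporting theories, supporting authors).
PREDICTIONS = {
    "DV scale values": (
        ["symbols/dots", "bars"],
        ["task ordering", "process theories", "rules of construction"],
        ["Cleveland", "Kosslyn et al.", "Pinker", "Bertin"],
    ),
    "DV level differences": (
        ["bars", "dots/symbols"],
        ["process theories", "task ordering"],
        ["Kosslyn et al.", "Pinker", "Cleveland"],
    ),
    "DV trends": (["lines"], ["process theories"], ["Kosslyn et al.", "Pinker"]),
    "DV clusters": (["scatterplots"], ["rules of construction"], ["Bertin"]),
}

def favoured_formats(question_type):
    """Return the formats the theories predict are best for a question type."""
    formats, _theories, _authors = PREDICTIONS[question_type]
    return formats

print(favoured_formats("DV trends"))  # ['lines']
```

Adding the suggested empirical-support column would amount to extending each tuple with a fourth element once enough evidence has accumulated.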
Cleveland (1985) claims that the further apart a pair of entities are in one planar dimension, the less accurate will be the perception of their relationship along the corresponding perpendicular dimension. Here, the distance between points would become a crucial variable when reclassifying question types. Among the theorists, Bertin's view appears the most complicated: for example, Bertin (1981, p. 115-121) argues that the rules of construction differ for "reorderable" objects† with one, two, or three characteristics.* Even so, he extends his rules to apply to such diagrammatic constructions as networks, maps and images. The reader is referred to his work for details (e.g. see Bertin, 1981, p. 100-175).

† I.e. objects, quantities or datapoints belonging to a specified order which can be changed.
* The number of characteristics bears a one-to-one correspondence with the concept of the number of dataset groupings.

1. The Theory Investigated

The principal thesis which effectively summarizes the theory under investigation is Pinker's (1981) Graph Difficulty Principle. Briefly stated, this principle argues that

A particular type of information will be harder to extract from a given graph to the extent that inferential processes and top-down encoding processes, as opposed to conceptual message lookup, must be used.

In a separate paper, Pinker (1983, p. 3) summarizes the implication of his theory and principle as,

...the ease of reading a certain type of information from a certain graph format will depend on the extent to which that graph format translates that trend† into a single visual pattern that the visual system can automatically extract, and on the extent to which the reader knows that the correspondence in that format between the quantitative trend and the visual pattern holds.
In this context, Pinker's theory claims that no one form of information representation is dominant across the board: instead, one representation may be better suited to yielding the answer to one kind of question but poorly suited to yielding the answer to another, depending on which visual pattern conveys the answer and on the ability of the graph reader to encode that pattern. To illustrate the implications of this theory, Pinker (1981) argues that when trends no longer correspond to single attributes of a distinct perceptual entity but must be inferred from successive intervals (see Pinker, 1981, p. 38), then the extraction process naturally becomes more time-consuming and effortful. The representation given in figure 3.2b (see Pinker, 1981) presents a case in point. If the figure were re-constructed as a line graph using variable 3 (i.e. A vs B) as the abscissa, and variable 1 as the parameter, Pinker asserts that this new graph would not portray the linear and accelerating trends as transparently as the present line graph, as these trends must be inferred from successive intervals. Similarly, the corresponding trends of the bar-chart counterpart given in figure 3.2a are not as transparent as in the recommended line graph for precisely the same reason: the trends may have to be deduced from successive pairwise comparisons of bars. Obviously, not only is it important to realize how Pinker's general theory stated above pulls together the earlier discussion on graphics theories, but it is also critical to articulate the kinds of plausible propositions and hypotheses that may be drawn from this and the other theories.

† I.e. the sorts of information to be extracted.

Figures 3.2(a,b): Pinker's Illustrations of Graph Designs for Trend Reading
Thus, the rest of this section focusses on advancing a set of plausible and testable propositions based on the theories, as well as providing the underlying reasoning for these hypothesized effects. The rationale for this approach is to avoid the major lack of a priori reasoning, found in past graphics-related research, concerning why a certain graph format would be better suited to performing a given task. Note also that, in general, propositions regarding the interaction of graph format with factors of graphical information complexity are speculative rather than theory-based, owing to the current lack of theoretical development in those directions.

a. Proposition 1

Proposition 1.1: The extraction of the correspondence between a specific DV scale value and value(s) of IV(s) for a single datapoint† is better suited to symbol or bar graphs than to line graphs.

The current theoretical argument for this proposition is that in a line graph schema, the scale values of both the DV and the IV of a datapoint are effortful to extract because the datapoint forms an integral part of a line. In contrast, a symbol, when uniquely determined, is an isolated and well-defined 'perceptual unit' and not, on the basis of Gestalt laws, an integral part of a larger perceptual unit. Indeed, the symbol may well vary in size, which would then determine the nature of the 'perceptual unit' seen. Furthermore, the extraction of DV scale values for bars should be faster and more accurate than for lines simply because the top rectangular base of a bar is discrete and flat for interpolating DV values on the ordinate scale, whereas each datapoint on a line is fully embedded and difficult to isolate or detect for reading its value on the DV scale.

† E.g. Q1 of E1 as discussed in tables 3.4 and 3.5.
Proposition 1.2: Effects of graph format are expected to interact with factors of graphical information complexity when answering questions on the extraction of DV scale values of a single datapoint.

Moreover, it is hypothesized that with increasing information complexity, the expected effects would become more pronounced, for the simple reason that when more time period subdivisions are plotted on the abscissa of time-series graphics, datapoints on a line inevitably become closer together and thus even harder to isolate. Similarly, with an increasing number of datasets, which translates into multiple lines, the disembedding task correspondingly requires even more effort.† Yet, owing to the incompleteness of evidence in the literature, further research is desirable to clarify effects due to complexity factors. To date, there have been few studies on the effects of graphical information complexity on task performance (e.g. Lauer, 1986; Yoo, 1985), and their results have not, moreover, been satisfactory.

† This justifies treating the variables manipulated here as, in fact, dimensions of the graphical information complexity construct.

b. Proposition 2

Proposition 2.1: The extraction of the correspondence between a DV level difference and values of IVs for an adjacent pair of datapoints* is better suited to symbol or bar charts than to line graphs.

* E.g. Q2 of E1 as discussed in tables 3.4 and 3.5.

The theoretical rationale for this proposition is that for a line, decoding the relative level of an adjacent pair of datapoints would still involve breaking up part of the overall line trend -- a 'Gestalt' on its own -- although it is reasonable to expect that performance of this task with lines would be faster and more accurate than that of extracting DV scale-values on the ordinate scale.
In discussing his bar chart schema, Pinker (1981) hypothesizes that the DV level of each bar on a bar chart is easily decoded because not only is each bar complete in itself, with an enclosed area and length,† but the primary independent variable value (i.e. time period) of each bar is also instantly identified. Hence, "bar graphs are better than line graphs ... for illustrating differences between dependent variable values corresponding to specific independent variable values since the desired values are specified individually in bar but not the line graph ..." (Pinker, 1981, p. 39). For an adjacent pair of symbols, decoding their relative levels translates into judging their positional differences along a common scale,* an elementary task which is ordered as more accurate to perform than judging lengths or directions in the Cleveland-McGill (1984) task hierarchy. Clearly, the views of the different theories are not always in agreement. Although the theories appear to indicate that both symbol and bar charts are expected to be superior to line graphs for extracting DV level differences, whether symbols or bars are the more suitable format for such a task is still a debatable issue. Indeed, the fact that symbols appear to combine a characteristic of bars, in terms of discreteness, with one of lines, in terms of connectedness, places them as the prime representation for the rapid and accurate extraction of DV level difference information. However, the findings from this research will contribute to clarifying this reasoning.

† I.e. it forms a unitary Gestalt.
* In this study, this common scale refers to the ordinate scaling, that is, the DV quantity attribute scaling represented along the y-axis of time-series graphics.

Proposition 2.2: The effects of graph format are expected to interact with factors of information complexity in answering questions about level differences of an adjacent pair of datapoints.
With more complex graphical representations, such as an increasing number of time periods being plotted along the abscissa, the advantage of symbols and bars over lines for extracting level difference information may be reduced, because many more level differences may have to be examined in the case of a complex graph; hence, the effectiveness of lines for the perception and comparison of level relationships, in contrast to the other representations in a complex graphical context, may have to be recognized. Similarly, if complexity is due to plotting a large number of datasets, there exists the possibility of a strong adverse effect for bars. This is because, as the number of categories in the dataset classification increases, it becomes harder to perceive level differences in bars belonging to a specified category than in lines or symbols, because bars belonging to the same category in the classification would be represented in a grouped bar chart as isolated bars. This is not true for symbols or lines. Again, these speculations need the support of empirical evidence.

c. Proposition 3

Proposition 3.1: The extraction of the DV trend for a range of datapoints† is better suited to line graphs than to symbol or bar graphs.

† E.g. Q3 of E1 as discussed in tables 3.4 and 3.5.

The theoretical reasoning is that DV trend extraction is best suited to line graphs rather than any other representation because

In line graphs, trends translate into the shape of a line or of a configuration formed by a set of lines, which is an easily avoidable [sic]* property. However, in bar graphs, especially those that encode more than two variables, trends translate into a particular pattern of lengths of different bars, which, not forming a unitary Gestalt, must be examined and compared one or two at a time (Pinker, 1983).

* The correct word, as I see it, should be "encodable" and not "avoidable".
Simply stated, this means that either a bar-to-bar or a symbol-to-symbol comparison may have to be performed in graph schemas other than that of a line graph. Moreover, human information processing is limited by the number of bars or symbols that can be encoded simultaneously (Ericsson et al., 1980). However, a range of several points on a line may easily be encoded as having a single attribute.

Proposition 3.2: The effects of graph format are expected to interact with factors of graphical information complexity when answering questions about trends of a range of datapoints.

Naturally, as more and more datasets are plotted, the advantage of lines over bars and symbols for trend perception should increase, as the power of trend perception in lines is further "capitalized" upon. Conversely, increasing the number of symbol arrays or bar series to be processed could easily confuse or overwhelm any graph reader, owing to their discreteness.

D. THE ANCHORING CONCEPT

The concept of information anchoring, now described, is used to provide the theoretical basis for characterizing the experimental tasks as well as the graph formats specifically investigated in this research.

1. Task Characteristics

Table 3.8 illustrates how the experimental tasks examined in this research can be characterized based on the anchoring concept. First, task activities characterized as Group I tasks are those with a strong or high anchoring of information on both the ordinate (y-axis) and the abscissa (x-axis). This kind of task is usually limited to drawing (x,y) relationships, although it does not matter whether the corresponding relationships to be studied begin from the x-axis and work towards the y-axis or vice versa; namely, when a question starts with a time period (a value on the x-axis) as in Q1 of E1 (see table 3.5), or when the answer is to be a time period (a value on the x-axis) based on a given y-value as in Q1 of E2 (see table 3.5).
In contrast, task activities characterized as Group IV are those with a weak or low anchoring of information on both axes. Since the concept of strong information anchoring on a graphical component relates directly to the disembedding of information on the respective dimensional axis, exact questions regarding specific (x,y) correspondences are, by definition, excluded where tasks are classified as having only weak or low anchoring on both the abscissa (x-axis) and the ordinate (y-axis). Examples of such tasks are Q2 and Q3 of E3 (see table 3.5). Between Group I and Group IV tasks, which represent the ends of a continuum, lie tasks whose activities are characterized as having a strong anchoring of information on one component (e.g. the x-axis) but a weak anchoring of information on the other axis (e.g. the y-axis). Note that all tasks in this research have a strong anchoring on the dataset component. Specifically, Group II tasks are those having a high anchoring of information on the abscissa but a low anchoring of information on the ordinate, and Group III tasks the reverse. Table 3.8 classifies all the experimental tasks investigated according to the anchoring concept (i.e. Groups I, II, III, and IV). Details of these tasks have been provided earlier (see tables 3.4 and 3.5). Note that the matrix for classifying the task activities investigated can be extended from the 2 x 2 information anchoring characteristics on the planar dimensions (i.e. x-axis and y-axis) to a 2 x 3 anchoring of information on the x-, y-, and z-components of three-dimensional graphics. In the next section, the characteristics of bars, symbols, and lines are discussed using the same concept of information anchoring on the various graphical components representing time-series data.
Table 3.8: The Anchoring Concept -- A Classification of the Tasks Investigated in this Research onto the Anchoring Framework

Ordinate (y-axis) anchoring | Abscissa (x-axis) anchoring: High (+) | Abscissa (x-axis) anchoring: Low (-)
High (+) | Group I tasks: Q1 (E1), Q1 (E2) | Group III tasks: Q1 (E3)
Low (-) | Group II tasks: Q2, Q3 (E1), Q2, Q3 (E2) | Group IV tasks: Q2, Q3 (E3)

2. Graph Format Characteristics

Figure 3.1 illustrates the three major components for information anchoring of a time-series graphical representation:

1. Dependent variable (DV) component (i.e. y-axis anchoring)
2. Primary independent variable (PIV) component (i.e. x-axis anchoring)
3. Secondary independent variable (SIV) component (i.e. dataset anchoring)

Accordingly, it is possible to characterize bars, symbols, and lines on the basis of whether they show high, moderate, or low anchoring of information on these respective graphical components. For bars, since each bar stands on a pip that is anchored strongly to the abscissa, bars are said to have a high x-axis anchoring. On the other hand, the top of each bar provides a flat platform (the length of which varies from one design to another) which usually helps in interpolating DV scale values on the ordinate reliably. It can thus be argued that bars have a moderate y-axis anchoring. Finally, as bars belonging to the same dataset are discrete and isolated from one another, so that a series of appropriate bars must be processed to derive information on the dataset, bars are seen to exhibit a low dataset anchoring. For symbols, the x-axis and y-axis anchoring appear to be about equal because, unlike bars, symbols do not connect directly to either of the two planar dimensions. Hence, symbols are considered to have a moderate x-axis and a moderate y-axis anchoring relative to bars.
Yet different symbols belonging to the same dataset, although still discrete and isolated from one another, are more easily grouped than separate rectangular bars because of their relative sizes and the similarity which they bear to lines in the case of multiple dataset representations. Therefore, symbols may also be said to have a moderate dataset anchoring. For lines, as all points on a line are simply embedded, and isolating a point from any part of a line is literally breaking up a unitary Gestalt, lines are argued to have both a low x-axis and a low y-axis anchoring. However, relative to either bars or symbols, lines have the best and highest dataset anchoring.

3. Matching Formats to Tasks

In this section, several additional propositions are postulated that relate essentially to the above assumptions regarding the various characteristics of the tasks and graph formats studied in this research. In other words, they may be regarded as a priori predictions on the matching characteristics of the various graph formats (i.e. bars, symbols, and lines) with the various task categories (i.e. Group I, II, III, and IV tasks) specific to the context of this research.

a. Proposition 4

Proposition 4: Performance of task activities with characteristics of strong information anchoring on both the abscissa and ordinate framework (i.e. Group I tasks) is best achieved with the use of bars or symbols, but not lines.

Group I tasks (x-axis anchoring = high, y-axis anchoring = high), as defined, will be limited to locating (x,y) points for either question or answer (e.g. Q1 of E1 and E2, tables 3.5 and 3.8). For these tasks, bars are expected to help with the exact location of points, whether expressed in the question and/or required in the answer, because among all three formats investigated they provide the best x-axis anchoring (e.g. compare figures 3.2a,b). Locating points with symbols will be easier than with lines because effort is required to separate points embedded on a line (i.e. cutting a point on a line). Since lines have the lowest x-axis and y-axis anchoring, cutting points on lines will be among the most difficult of tasks. Indeed, the theory argues that when lines are read, predicates of patterns and trends are stored or assembled in memory, but not exact locations of points. Overall, lines, being continuous and disjointed from both axis components, would be the most undesirable format to use for performing Group I tasks. At this point, note that because of the types of scaling depicted on the y-axis (ratio scale) as opposed to the x-axis (ordinal scale) of time-series graphics, interpolation on the y-axis appears to be harder to perform than interpolation on the x-axis. This observation will also be useful in distinguishing between the difficulty of using lines to answer Q1 in E1 as opposed to Q1 in E2.

b. Proposition 5

Proposition 5: Performance of task activities with characteristics of a strong information anchoring on the abscissa framework but a weak information anchoring on the ordinate framework (i.e. Group II tasks) is best achieved with the use of bars when the task at hand requires only the extraction of single datapoints, and with the use of symbols when the task at hand requires the simultaneous extraction of multiple datapoints.

Group II tasks (x-axis anchoring = high, y-axis anchoring = low) are limited to those with specific time period information in either answer or question (table 3.5) but do not involve y-axis values (e.g. Q2 and Q3 in E1 and E2, table 3.8). The strength of bars for these tasks lies in their having a high x-axis anchoring. However, since bars are the most discrete and individually processed among the graph formats, their use will only be appropriate when singular datapoints are to be extracted one at a time.
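The matching argument above can be made concrete with a small scoring sketch: give each format the anchoring profile characterized in the preceding section, score a task group by the format's anchoring strength on the axes that the group requires to be high, and rank the formats for each group. The numeric scores below are my own illustrative stand-ins for high/moderate/low; only the orderings they produce are claimed by the text.

```python
# Anchoring strength per format on each component, as characterized above:
# bars (high x, moderate y, low dataset); symbols (moderate on all three);
# lines (low x, low y, high dataset).
LEVEL = {"high": 2, "moderate": 1, "low": 0}

FORMAT_ANCHORING = {
    "bars":    {"x": "high",     "y": "moderate", "dataset": "low"},
    "symbols": {"x": "moderate", "y": "moderate", "dataset": "moderate"},
    "lines":   {"x": "low",      "y": "low",      "dataset": "high"},
}

# Task groups from table 3.8: the axes on which each group demands strong anchoring.
GROUP_NEEDS = {
    "I": ("x", "y"),  # high anchoring needed on both axes
    "II": ("x",),     # high anchoring needed on the abscissa only
    "III": ("y",),    # high anchoring needed on the ordinate only
    "IV": (),         # no strong axis anchoring needed
}

def rank_formats(group):
    """Rank formats by summed anchoring strength on the axes the group needs."""
    def score(fmt):
        return sum(LEVEL[FORMAT_ANCHORING[fmt][axis]] for axis in GROUP_NEEDS[group])
    return sorted(FORMAT_ANCHORING, key=score, reverse=True)

# For Group I tasks, lines rank last, consistent with proposition 4.
print(rank_formats("I"))  # ['bars', 'symbols', 'lines']
```

A simple additive score cannot, of course, capture the single-versus-multiple-datapoint distinction drawn in proposition 5; that would require weighting the dataset anchoring component as well when several points must be extracted simultaneously.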
Locating points with symbols will be easier than with lines because effort is required to separate points embedded on a line (i.e. cutting a point on a line). Since lines have the lowest x-axis and y-axis anchoring, cutting points on lines will be one of the most difficult tasks. Indeed, the theory argues that when lines are read, predicates of patterns and trends are stored or assembled in memory, but not exact locations of points. Overall, lines, being continuous and disconnected from both axis components, would be the most undesirable format to use for performing Group I tasks. At this point, note that because of the types of scaling depicted by the y-axis (ratio scale) as opposed to the x-axis (ordinal scale) for time-series graphics, interpolation on the y-axis appears to be harder to perform than interpolation on the x-axis. This observation will also be useful in distinguishing between the difficulty of using lines to answer Q1 in E1 as opposed to that of Q1 in E2.

b. Proposition 5

Proposition 5: Performance of task activities with characteristics of a strong information anchoring on the abscissa framework but a weak information anchoring on the ordinate framework (i.e. Group II tasks) is best achieved with the use of bars when the task at hand requires only the extraction of single datapoints, and with the use of symbols when the task at hand requires the simultaneous extraction of multiple datapoints.

Group II tasks (x-axis anchoring = high, y-axis anchoring = low) are limited to those with specific time period information in either answer or question (table 3.5) but are not involved with y-axis values (e.g. Q2 and Q3 in E1 and E2, table 3.8). The strength of bars for these tasks lies in their having a high x-axis anchoring. However, since they are the most discrete and individually processed among the graph formats, their use will only be appropriate when singular datapoints are to be extracted one at a time.
Conversely, symbols, which are more nearly continuous, will have an edge over bars when multiple datapoints are to be extracted simultaneously.

c. Proposition 6

Proposition 6: Performance of task activities with characteristics of strong information anchoring on the ordinate framework but weak information anchoring on the abscissa framework (i.e. Group III tasks) is best achieved with the use of symbols.

Group III tasks (x-axis anchoring = low, y-axis anchoring = high) are limited to those where time-period information is of no concern either in question or answer but the explicit focus is on a specific y-axis value (e.g. Q1 in E3, tables 3.5 and 3.8). For such tasks, performance is expected to be best achieved with the use of a representation like horizontal bars. Since only vertical bars are used in this research, symbols provide the second best alternative because they have been found to yield a more accurate anchoring of information on the ordinate (y-axis) than either bars or lines (Cleveland, 1984; Cleveland & McGill, 1984). Note that as all tasks in this research have a strong dataset anchoring characteristic, the use of symbols can be expected to have a slight advantage over that of bars because bars have a low dataset anchoring whereas symbols have a moderate dataset anchoring. Hence, the use of symbols for Q1 of E3 should prove superior to either bars or lines.

d. Proposition 7

Proposition 7: Performance of task activities with characteristics of weak anchoring of information on both the ordinate and abscissa framework (i.e. Group IV tasks) is best achieved with the use of lines and worst with the use of bars.

Group IV tasks (x-axis anchoring = low, y-axis anchoring = low) are restricted to those focussing on the dataset information, such as trends or general patterns (e.g. Q2 and Q3 in E3, tables 3.4, 3.5 and 3.8).
Characteristics of lines have a corresponding match to these kinds of tasks because, first, points on lines have no anchoring characteristics to the x- and y-axis components, and second, lines have the best dataset anchoring. Bars, being the most strongly anchored to the abscissa framework as well as having the worst dataset anchoring among the graph formats studied, will be expected to result in the worst match to these tasks.

In summary, the anchoring framework provides a strong basis for classifying the decomposable tasks investigated in this research, besides facilitating the generation of relevant propositions based on matching the characteristics of tasks with those of graphical information representations.

IV. EXPERIMENTAL METHODOLOGY

In chapter 3, several major factors believed to influence the use of a graphical presentation were identified from the literature and several propositions were drawn from the theories. In this chapter, the experimental methodology used for studying those factors and for testing those propositions will be discussed. First, it should be noted that the adoption of a cumulative experimental approach in this research program has many advantages. They include the generalizability of pervasive effects and the opportunity to manipulate important variables differently in different experiments. For instance, manipulation of many levels of the task variable is more easily accommodated with a series of related experiments than with a single complex experiment. In addition, findings drawn from a program of experiments allow progressive examination of a particular hypothesis.
Finally, results based on a program of research could usually be generalized beyond the confines of individual experiments, whereas results from a single experiment would be considerably more limited.† The series of experiments comprising this research will: involve related sets of experimental variables; test similar experimental hypotheses, with the emphasis placed on the graph format by question-type interaction effect; follow an identical experimental plan and procedures; conform to a general statistical model; and use closely resembling graphics stimuli.

A. EXPERIMENTAL VARIABLES

As indicated previously, elapsed time was the principal dependent variable in this research. Accuracy, a secondary criterion, was included to ensure adequate control over possible time-accuracy tradeoff effects that might surface and present difficulties in the interpretability of results (see chapter 1).

† Other advantages of using a cumulative experimental approach may be found in Dickson et al. (1986) and Jarvenpaa & Dickson (1988). See also Weick (1965) on advantages associated with using an experimental methodology.

The independent variables in each experiment were:
1. Graph Format
2. Information Complexity
   a. Variations in Time Period
   b. Variations in Dataset Category
3. Question Type

A "session" variable was included to control for learning. An individual difference variable based on the field-dependence-independence construct (Witkin et al., 1971) was included as a covariate in the experimental design.

1. The Dependent Variables

a. Time

Time, the principal criterion, was measured as the duration between (1) the time at which the graphical presentation and the question to be answered appeared on the CRT screen and (2) the time at which the subject pressed the 'answer' key (i.e. "1" or "2") to record his response (see figure 4.1). It was captured unobtrusively by the computer.

b. Accuracy

Accuracy was a secondary criterion.
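The elapsed-time measure defined under (a) above reduces to two timestamps around each trial. A minimal sketch follows, assuming hypothetical `show_stimulus` and `wait_for_answer_key` stand-ins for the CRT display and keyboard routines (the original experiments ran on a Packard-Bell micro-computer, not in Python):

```python
import time

def run_trial(show_stimulus, wait_for_answer_key, clock=time.monotonic):
    """One trial: timing starts when graph and question appear, stops at the keypress."""
    show_stimulus()                 # graph and question appear on the screen
    start = clock()                 # timer starts with the display
    answer = wait_for_answer_key()  # blocks until '1' or '2' is pressed
    elapsed = clock() - start       # captured unobtrusively, as in the experiments
    return answer, elapsed

# Demonstration with a fake clock advancing from 10.0 to 12.5 seconds:
ticks = iter([10.0, 12.5])
answer, elapsed = run_trial(lambda: None, lambda: "1", clock=lambda: next(ticks))
# answer == "1", elapsed == 2.5
```

A monotonic clock is used in the sketch because interval timing should be immune to wall-clock adjustments; the thesis's own timing was handled by the experiment software itself.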
A score of "1" was assigned to each correct answer picked from the binary-choice questions and a "0" to others. This scoring scheme was chosen as being easy to code, although it resulted in a distribution with undesirable departures from normality. Binary-choice questions were used instead of multiple-choice questions because timing was critical, and participants tended to spend more time searching the keyboard when multiple-choice questions were used than with binary-choice questions. On the issue of normality, as subjects were asked to replicate each experimental session twice, an alternative scoring scheme was to combine their scores over the two sessions.*

2. The Independent Variables

a. Graph Format

Graph Format was a prime factor of interest studied in each of the experiments. The different types of graph format investigated were:
1. Symbols
2. Bars
3. Lines

b. Question Type

Question type, which was based on the classes of quantitative information that could be extracted from a presentation, was another key factor of interest in this research. Accordingly, questions were designed on the basis of the fundamental classes of information to be extracted:
1. Exact Questions (Q1) -- These were questions testing the reading and understanding of the relationship between a given DV scale value and the exact scale value of a single datapoint
2. Relationship Questions (Q2) -- These were questions testing the reading and understanding of the relationship between the DV level differences of a pair of adjacent datapoints
3.
Trend Questions (Q3) -- These were questions testing the reading and understanding of the DV trend among a range of successive datapoints

As discussed in chapter 3, the key characteristic of tasks in experiments E1 and E2 was that of a strong abscissa anchoring, although E1 tasks begin with an x-axis value and work towards the DV attribute component while E2 tasks begin with information characterizing the DV component and end with values on the x-axis. The key characteristic of E3 tasks was the absence of this strong abscissa anchoring noted for E1 and E2 tasks. In other words, time period information is neither provided nor asked for in experiment E3 tasks (chapter 3).

* This issue is discussed further in chapter 5.

c. Information Complexity

Factors of graphics information complexity formed the secondary factors of interest in this research program. Owing to the limitation on the number of sub-factors that could be effectively studied at once, the construct of information complexity for time-series graphics was operationalized as variations due to two attribute components:
1. Variations in Time Period -- This factor was manipulated at two levels:
   a. 7 time periods
   b. 14 time periods
Presumably, an increase in the number of time periods represented along the abscissa would correspondingly increase the amount of irrelevant information that must be processed in order to extract the relevant answers embedded in the experimental graphics displays.
2. Variations in Dataset Category -- This factor was manipulated differently for different experiments.† For experiments E1 and E2, the treatment levels for this factor were:
   a. One dataset
   b. Three datasets
For experiment E3, the treatment levels for this factor* were:
   a. Two datasets
   b. Three datasets

† Refer to tables 3.6 and 3.7, which show the different factorial treatment levels for all three experiments.
* The limited size of the Packard-Bell micro-computer monitor used in this research (which is similar to the size of a typical micro-computer monitor) allows only a maximum of three 14-period datasets to be plotted simultaneously in any one display.

It is expected that an increase in the number of datasets plotted would result in a corresponding increase in the amount of information to be processed and thus in impairment of task performance. In particular, only multiple datasets were used for the task activities examined in experiment E3 because these tasks were concerned only with information extraction from the SIV attribute component (i.e. dataset information).

3. The Session Variable

Each subject went through a practice period followed by two experimental sessions. Each experimental session consisted of 36 treatment combinations.† The purpose of the replication was to control for possible effects due to learning.

4. The Covariate

Since tasks of identifying trends, finding specific point values, and comparing level differences involve perceptual disembedding, it is possible that performance may be influenced by individual characteristics. Witkin et al. (1971, p. 4) described such individual characteristics in terms of a person's "perceptual style", to which they applied the construct of "field-dependence-independence":

In a field-dependent mode of perceiving, perception is strongly dominated by the overall organization of the surrounding field, and parts of the field are experienced as "fused". In a field-independent mode of perceiving, parts of the field are experienced as discrete from organized ground.

Indeed, individual difference or user characteristics had always been considered an important variable to be controlled in MIS laboratory experiments, particularly those investigating the characteristics of

† See appendices B, C, and D for the 36 trials tested in experiments E1, E2, and E3 respectively.
the human-computer interface (see Mason & Mitroff, 1973; Dickson et al., 1977; Benbasat et al., 1986). Consequently, the individual difference construct of field-dependence and field-independence, as operationalized by the GEFT† score (Witkin et al., 1971), was introduced as a covariate in the statistical model used to analyze the data collected for the series of studies conducted for this research program. A variety of equivocal results had emerged in the literature dealing with effects of individual differences on system utilization and performance (see Dickson, 1971; Benbasat & Taylor, 1978; Zmud, 1979; Mock & Vasarhelyi, 1983; Huber, 1983). The purpose of this research was to limit the study of individual differences to that of a possible covarying factor affecting time and/or accuracy performance. Interactions of the individual difference factor with the other independent variables investigated (e.g. graph format) would be outside the scope of this research.

B. EXPERIMENTAL HYPOTHESES

The experimental hypotheses were based on the theoretical propositions advanced in chapter 3. In general, null hypotheses (i.e. hypotheses of no difference between means among the subpopulations) as defined among the factors of interest were tested at the α = 0.05 level of significance. Attention was focussed on a priori effects that were expected to be significant (e.g. the Graph Format by Question Type interaction). A more stringent criterion of α = 0.01 was also applied for distinguishing among levels of significance. On the basis of current theories and the earlier discussion, the following general effects* were expected to be significant:

† i.e. Group Embedded Figures Test.
* Interaction effects of no relevance to the theories will be excluded.

1. Session -- The presence of learning is expected (see DeSanctis & Jarvenpaa, 1985). Hence, performance in the second session should improve compared to performance in the first session.
2.
GEFT Scores -- Subjects who scored high in tasks requiring the isolation and/or differentiation of relationships from a context (i.e. field-independents) were expected to outperform their counterparts (i.e. field-dependents). In other words, the field-dependence-independence construct as measured by Group Embedded Figures Test (GEFT; see Witkin et al., 1971) scores was expected to influence task performance significantly. While there had been some controversy over the inclusion of cognitive style variables in MIS graphics research (Huber, 1983; Robey, 1983), some recent evidence provided support for the superiority of field-independent subjects over field-dependent subjects in the performance of disembedding tasks regardless of the format of information presentation (e.g. see Lusk, 1979; Benbasat & Dexter, 1979, 1982, 1985).
3. Graph Format -- No particular form of graph format was expected to be significantly different from the others; that is, a significant Graph Format main effect was not expected in any of the experiments.
4. Question Type -- Task activities within each experiment were not expected to vary significantly, although it was expected that task activities across the three experiments might differ significantly; that is, E3 tasks would be more complex than E2 tasks and E2 tasks would be more complex than E1 tasks (chapter 3).
5. Time Period -- An increasing number of time periods represented along the abscissa (x-axis) of time-series graphics was expected to impair task performance.
6. Dataset -- An increase in the number of datasets depicted was expected to affect task performance adversely.
7. Graph Format x Question Type Interaction -- A highly significant Graph Format x Question Type interaction was expected for all experiments (Pinker, 1983). Different graph formats were expected to facilitate different types of tasks. In particular, support for the a priori propositions advanced in chapter 3 was expected.
The key hypothesis was the matching of Graph Formats to Tasks.† The above hypotheses included only those effects of key interest. Other interactions, such as Graph Format by Dataset, Graph Format by Time Period, Question Type by Dataset, and Question Type by Time Period, were of secondary interest. Limitations of current knowledge about graphics made it impossible to state explicitly which of these effects were expected to be significant at this point.

C. EXPERIMENTAL DESIGN

A full within-subject repeated measures factorial design was planned for each of the experiments.* Subjects were provided with all treatment combinations, which were completely randomized for each subject during each experimental session. Table 4.1 shows the 36 different treatment combinations undertaken by each participant in each experimental session. As shown, these combinations consisted of totally crossed treatments of 3 levels of graph format, 3 levels of question type, 2 levels of time period, and 2 levels of dataset category.

† i.e. Groups I, II, III, and IV tasks.
* See Table 4.1.

Table 4.1: A Multi-factor Repeated Measures Experimental Design

Session One:
                             Bars          Symbols       Lines
                             Q1 Q2 Q3      Q1 Q2 Q3      Q1 Q2 Q3
  Information Complexity
    a                         .  .  .       .  .  .       .  .  .
    b                         .  .  .       .  .  .       .  .  .
    c                         .  .  .       .  .  .       .  .  .
    d                         .  .  .       .  .  .       .  .  .

Session Two: the same 36 treatment combinations repeated.

For experiments 1 and 2, the information complexity levels comprised:
a. A single dataset with 7 time periods
b. A single dataset with 14 time periods
c. Three datasets with 7 time periods
d. Three datasets with 14 time periods
For experiment 3, they comprised:
a. Two datasets with 7 time periods
b. Two datasets with 14 time periods
c. Three datasets with 7 time periods
d. Three datasets with 14 time periods

The actual sets of treatment combinations for experiments E1, E2, and E3 are provided in appendices B, C, and D respectively.
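The fully crossed design can be generated mechanically. The following sketch (factor names are my own shorthand, not the thesis's) builds the 3 x 3 x 2 x 2 = 36 combinations for E1/E2 and shuffles them independently for each subject, as the completely randomized administration requires:

```python
import itertools
import random

FORMATS = ["bars", "symbols", "lines"]   # 3 levels of graph format
QUESTIONS = ["Q1", "Q2", "Q3"]           # 3 levels of question type
PERIODS = [7, 14]                        # 2 levels of time period
DATASETS = [1, 3]                        # 2 levels of dataset category (E3 would use [2, 3])

def session_trials(subject_seed=None):
    """All 36 crossed treatment combinations, randomized anew for each subject."""
    trials = list(itertools.product(FORMATS, QUESTIONS, PERIODS, DATASETS))
    random.Random(subject_seed).shuffle(trials)
    return trials

trials = session_trials(subject_seed=7)
# len(trials) == 36, with every format/question/period/dataset combination exactly once
```

Seeding per subject simply makes each subject's random ordering reproducible; the thesis itself only requires that the 36 trials be individually randomized within each session.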
These treatment combinations were administered as 36 completely randomized experimental trials for each subject during each experimental session. Each subject performed two sessions of 36 trials. Taking the replication of the two experimental sessions as a separate factor in itself led to a total within-subject design of a 2³ × 3² repeated measures factorial model with one covariate for each of the experiments in the series. Adopting the kind of standardized notation advocated by Winer (1962, 1971, p. 540), the following statistical model was adopted:

Y_ijklmn = μ + α_i + β_j + (αβ)_ij + γ_k + (αγ)_ik + (βγ)_jk + (αβγ)_ijk + δ_l
         + (αδ)_il + (βδ)_jl + (γδ)_kl + (αβδ)_ijl + (αγδ)_ikl + (βγδ)_jkl + (αβγδ)_ijkl
         + φ_m + (αφ)_im + (βφ)_jm + (γφ)_km + (δφ)_lm + (αβφ)_ijm + (αγφ)_ikm + (αδφ)_ilm
         + (βγφ)_jkm + (βδφ)_jlm + (γδφ)_klm + (αβγδφ)_ijklm + π_n + C(x_n - x̄) + ε_ijklmn

Where:
Y = dependent variable measures (Time, Accuracy)
μ = grand mean
α_i = Session, i = 2 levels
β_j = Graph Format, j = 3 levels
γ_k = Question Type, k = 3 levels
δ_l = Time Period Variation, l = 2 levels
φ_m = Dataset Category, m = 2 levels
π_n = nth subject, a random factor, n = s subjects
C = regression coefficient for the individual difference (i.d.) factor, or covariate
x_n = nth individual's GEFT score
x̄ = mean GEFT score
ε = within-subject error term

D. EXPERIMENTAL PROCEDURES

The experimental task procedures were similar across all experiments. The basic experimental task concerned the answering of binary-choice questions presented on a display screen (see appendices B, C, and D). Reasons for using binary-choice questions and the mix of subjects recruited for each experiment were discussed elsewhere. Subjects, tested individually, were given a set of instructions, as shown in appendix H.
They were asked specifically to respond as quickly and accurately as they possibly could for each trial in each of the experiments assigned to them, no matter whether the session was actual or practice.† Moreover, subjects were encouraged to ask any questions they might have during the practice period to reduce possible delays or interruptions during the course of the actual experimental run. Subjects were also informed verbally before beginning the actual experimental session that it was possible to have their responses discounted in the final analysis of results should their overall time or accuracy performance fall below some predetermined cutoff points.

† Note that to adequately control the time-accuracy tradeoff, subjects should be aware that they must attend to both time and accuracy to the best of their ability. The more complex a task, the easier it would be for them to trade off time for accuracy or vice versa.

While subjects were paid $10 bonuses as incentives for participation, further incentives were provided by awarding additional cash prizes of $25, $20, $15, $10 and $5 to the top five performers† in each of the experiments. A secondary set of data, similar in structure to those used in the actual experimental trials, was designed for the practice period. Feedback on both accuracy and total time taken was provided by the experimenter during the practice period to encourage subjects to perform better. Subjects who answered 3 out of the 12 questions incorrectly during the practice period were requested to redo the 12 practice trials. For the experimental session, the following procedures, as flowcharted in figure 4.1, were adopted:
1. Subjects were asked to read the question on a CRT display monitor, without time constraint.
2. They were subsequently asked to hit the 'return' key to receive a graph display, with the same question which they were to answer remaining at the bottom part of the display screen.
3.
As soon as they felt certain of their intended answer to a particular question, they were to hit an 'answer' key (i.e. either "1" or "2"), whereupon the time was automatically recorded by the computer.
4. The recording of an answer quickly cleared the current display and question. The next question was shown automatically as subjects entered the next trial (i.e. back to step 1).

† i.e. based on equally weighted time and accuracy performance.

The step procedures followed in the actual experimentation are provided in appendix G. Altogether, about 36 responses were collected from each subject during the experimental session and about 12 responses during the initial practice period. Furthermore, each subject was asked to perform two sessions of the same experiment. The advantage of capturing this additional information was to guard against spurious effects due to learning alone rather than to the administration of different treatment combinations. These procedures were pilot tested. No complaints of fatigue were reported. Indeed, a few subjects who performed more than 100 continuous question-and-answer trials due to an earlier programming 'bug' were surprisingly oblivious to the fact that they had undergone so many trials. This was because the experimental tasks were neither highly demanding nor trivial.

E. EXPERIMENTAL STIMULI

In each experiment, the stimuli were sets of graphs constructed from variations of trend patterns that were representative of a wide range of time-series. These trends consisted, in general, of simple non-crossing variations of upward and downward slopes. Four sets of data were utilized for each study. These data sources were not of immediate interest but were constructed according to specific rules, some of which would be described later.
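Two of the construction rules described later (the datasets must not overlap, and slope variation must be kept relatively regular) could be realized with a generator along these lines. This is an illustrative reconstruction only, not the thesis's actual data-construction procedure; every name and constant is an assumption:

```python
import random

def make_datasets(n_series, n_periods, band_width=20, step=5, seed=None):
    """Generate non-crossing time-series with bounded, regular slope variation.

    Each series moves up or down by a fixed step per period (regular slope
    changes), and each series is confined to its own disjoint vertical band
    so that the plotted datasets can never cross. Constants are assumptions.
    """
    rng = random.Random(seed)
    series = []
    for s in range(n_series):
        lo = s * band_width                   # this series owns [lo, lo + band_width)
        y = lo + band_width // 2              # start mid-band
        values = []
        for _ in range(n_periods):
            values.append(y)
            move = rng.choice([-step, step])  # constant-magnitude slope change
            # clamp so the series never leaves its band
            y = min(max(y + move, lo), lo + band_width - step)
        series.append(values)
    return series

data = make_datasets(3, 7, seed=2)
# Each of the 3 series stays within its own band, so the plots cannot cross.
```

The band-per-series trick is only one way to guarantee non-overlap; the thesis states the rules but not the algorithm actually used.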
For instance, changes in slopes were averaged out to take care of possible effects due to still-unknown factors of complexity (see Wainer et al., 1982; Lauer et al., 1985). Basically, these data sources corresponded to the different treatment combinations of the information complexity factors manipulated:
1. For experiments E1 and E2, the data comprised:
   a. A single dataset with 7 time periods
   b. A single dataset with 14 time periods
   c. Three datasets with 7 time periods
   d. Three datasets with 14 time periods
2. For experiment E3, the data comprised:
   a. Two datasets with 7 time periods
   b. Two datasets with 14 time periods
   c. Three datasets with 7 time periods
   d. Three datasets with 14 time periods

Figure 4.1: An Experimental Procedure Flowchart
(The flowchart shows the trial cycle: the subject sees a question on the lower part of the screen; hits CR to show the graph on the upper screen, which starts the timer; considers the answer to the question; hits the answer key, which stops the timer and records the answer; both halves of the CRT are then cleared and a new question appears for the next trial.)

The next step was to represent these data sources in 3 types of graph forms, which resulted in a total of 12 graphs. Finally, the 36 trials were made up of these 12 graphs repeated thrice -- once for each of the three question types. Applications of these data sources had, in fact, been counterbalanced across the 36 trials according to the different treatment combinations. Rules of construction for these data sources were:
1.
The questions must be unambiguous -- This feature was tested by the following procedure during pilot testing: whenever subjects responded wrongly to a particular question in one of the trials, the same treatment combination would be administered to them again after the completion of all 36 normal trials. In this way, the data source(s) were modified for those questions which all of the pilot subjects answered wrongly during their first attempts, as well as for those particular questions which were subsequently repeated several times to a pilot subject, indicating that those questions were not easy to understand.
2. The different data sources should not produce overlapping slopes -- As noted, the elimination of cross-overs for the different datasets would help to control effects that might be attributed to Schutz's (1961b) confusability factors.
3. The variations in slope of each data source were to be relatively constant -- In effect, this controlled the regularity of slope changes (Lauer, 1986) so that no one data source would be at a different level of difficulty.

Appendices B, C, and D illustrate the different types of time-series displays used as experimental stimuli. Finally, a stand-alone micro-computing environment was chosen in preference to a mainframe environment. This was to avoid the problem of shared CPU resources, besides ensuring that the timing data would not be affected by the load on the system. Moreover, to cope with possible confounding due to color, only monochrome graphics displays were used. Every effort had been made to ensure that these displays conformed to pertinent principles of graphics design, such as those laid out in Kosslyn et al. (1983), Ives (1982), Tufte (1983) and Bertin (1983).

V. DATA ANALYSIS: THE REPEATED MEASURES DESIGN

The research model to be used was discussed in chapter 4. This chapter begins with an examination of the structure of the repeated measures design.
More importantly, since there are advantages and disadvantages associated with different types of repeated measures design (Elashoff, 1986), it is essential to discuss clearly the type of repeated measures design used and why it was chosen. Indeed, the advantages of using such a design have been strongly emphasized in the literature (e.g. Kerlinger, 1973; Shneiderman, 1980), but the kinds of potential problems that may affect the validity of results from this kind of design are very often neglected (Elashoff, 1985). In any case, many of the problems that are discussed may, in one way or another, be overcome in a controlled experimental setting. Also included in this discussion are the various steps and methods used in the analysis of the experimental datasets, as well as an evaluation of how well the data structure conforms to the various assumptions underlying the analysis of variance-covariance for a within-subject design. Issues regarding the normality assumption, homogeneity of variance/covariance, the symmetry condition, the choice of univariate versus multivariate statistical procedures, and the use of different multiple-comparison techniques are therefore discussed in this chapter. It is important to note from the outset that the focus of statistical analysis is on the interaction effects of interest, particularly the graph format by question type interaction. Moreover, contrasts among means for this interaction are limited to those planned contrasts that are of critical concern.

A. THE REPEATED MEASURES DESIGN

Owing to the lack of a standardized terminology in the literature on experimental designs, a working definition of Repeated Measures Design is provided first.
Elashoff (1986) defines a repeated measures design as one that involves G groups of experimental units or subjects and in which responses are measured at k time points or under k different conditions, where k is greater than 1.† The type of repeated measures design that most closely resembles the current experimental design is that of comparison of treatments applied in sequence. In this type of design, each experimental subject receives several different stimuli or treatment combinations, and thus several responses are recorded for each subject. In this research, an experimental session has 36 treatment combinations. Since subjects are asked to replicate the session twice, the design combines time course and treatment comparisons (see Elashoff, 1986). A major advantage of using the repeated measures design is its efficiency. Not only does this design greatly economize on the number of subjects that must be recruited for each of the experiments, but its use also eradicates all sources of experimental bias due to variability between subjects. In other words, each subject serves fully as his own control (see Keppel, 1980). More critically, such a design has the advantage of yielding powerful F-tests even for restricted sample sizes (Cohen, 1977). As a further advantage of using a repeated measures design, the need to worry about departures from the assumed normal distribution is also reduced (Norton, 1952). In fact, this design is especially suited to obtaining precise measurements when many factors have to be investigated at the exploratory stage and when only a limited sample of the subject population is available. With limited resources, this is exactly the case with this research. Indeed, the number of subjects that would have to be recruited should a full "between-subject" factorial model be used would be excessive.
For instance, with only 20 observations in each of the 36 cells for each of the experiments to reach at least an acceptable power level for the F-tests of the various effects, the full factorial model would require a total subject sample of 20 (observations) x 36 (cells) x 3 (experiments), or 2,160 subjects. Besides, there is also a large source of experimental error due to variability between subjects in selecting this alternative design.

† See figure 5.1. If k equals 1 and the response y is measured only once for each subject, then the situation is that of an ordinary analysis of variance.

Obviously, the repeated measures design is not without potential problems, including fatigue, learning, carryover and/or order effects. Yet, these special problems are not uncontrollable within an experimental setting. Indeed, pilot testing revealed little sign of fatigue among the participants. The tasks are reasonably interesting, and none of the pilot or the actual experimental subjects complained of discomfort about the number of experimental trials they had been asked to perform. Instead, subjects frequently showed an improvement in performance when session 2 results were contrasted with those of session 1, indicating that the experience of fatigue was not pervasive. Apparently, subjects experienced learning going from the first to the second session. Performance during the second session of each experiment resulted in not only faster but usually also more accurate responses. The introduction of a Session variable to control for learning was thus fulfilling its purpose. It is believed that a third session might begin to induce an undesirable fatigue effect on the part of the participants, with only marginal improvement expected. Finally, the possibility of a carryover and/or an order effect is minimized by having the treatment combinations counterbalanced in each of these experiments (see Shneiderman, 1980; Simcox, 1981).
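The sample-size contrast can be made concrete using the figures quoted in this chapter (20 observations per cell, 36 cells, 3 experiments, versus the 24 subjects per experiment noted in the statistical-analysis section):

```python
# Between-subject full factorial: each subject contributes one observation,
# so every cell needs its own 20 subjects in every experiment.
between_subject_total = 20 * 36 * 3   # observations per cell x cells x experiments

# Within-subject repeated measures: each subject supplies all 36 treatment
# combinations (twice), so only 24 subjects per experiment are needed.
within_subject_total = 24 * 3         # subjects per experiment x experiments

print(between_subject_total, within_subject_total)  # 2160 72
```

That is, the repeated measures design reaches the same cell coverage with roughly one-thirtieth of the subject pool, which is the efficiency argument made above.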
In fact, the various factor levels studied are not only balanced across the different treatments administered, yielding equal cell responses in every case, but the treatments are also administered in an individually randomized order within each experimental session. In other words, the experimental design itself ensured that the carryover problem would be well under control (Davis, 1985; Yoo, 1985; Lauer, 1986).

Having discussed the advantages of using a repeated measures design in experiments where a large number of factors may be explored in a controlled setting, the various problems such a design has, and how these problems may be overcome, the next section describes the steps that were used in the process of data analysis.

Figure 5.1: A Repeated Measures Design. [Layout showing G groups of experimental units (subjects), with n subjects per group and responses y recorded at each of k points or conditions; figure not reproduced here.] Source: Elashoff, 1986, p. 6, reproduced with permission.

B. STATISTICAL ANALYSIS PROCEDURES

The following major steps constituted the process of statistical analysis for the experimental datasets:

1. The experimental data were first screened for missing values, errors, obvious patterns of correlations among the variables, and for unexpected values or outliers.

2. Descriptive statistics, histograms, and scatter plots showing the relationships among the variables of interest were examined to provide the initial base for any inferences. The inclusion of the Session variable in the initial analysis provided the basis for detecting possible confounding due to learning. Emphasis was placed on the analysis and interpretation of Session 2 data, since these data were more representative of stabilized responses.

3.
The GEFT measure was initially treated as a covariate in the ANCOVA model to assess its effect on the dependent measures. In addition, correlations between the GEFT scores and each subject's mean performance scores combined over the 72 total treatment combinations were calculated to assess how the GEFT measure might relate to performance. In this way, a decision could be made on whether to exclude the GEFT measure from the final statistical model.

4. More importantly, the possibility of a time-accuracy tradeoff effect was assessed by correlating the means of the time and accuracy measures combined across the two sets of 36 treatment combinations for each subject. Since each experiment was assigned a total of 24 subjects, only two sets of 24 time and accuracy mean scores were correlated, so the final correlation obtained could possibly be a reflection of chance. Therefore, to ensure that the procedure yielded a meaningful value, its validity was further ascertained by correlating randomly selected time and accuracy measures among the 24 subjects, and correlations between the time and accuracy performance scores of each individual were also used as the basis for detecting the presence of any high time-accuracy tradeoff. Outliers detected, as well as observations exceeding the cutoff criteria selected for time and accuracy, were not included in further statistical analysis.

5. Interactions between graph format, question type, and other variables of interest were tested using the initial MANOVA procedure as well as the more commonly used univariate ANCOVA approach. The level of significance for testing null hypotheses on time and accuracy was set at the nominal 5% level. A more stringent criterion of α = 0.01 was also used for distinguishing among the significance of effects.
Interpretation of the experimental results based on the validity of the F-tests conducted took into account how the various assumptions were met, such as the normality condition, the homogeneity of variance-covariance, and the independence of the residuals, or, in the case of the repeated measures design, the sphericity or symmetry condition.† Power analysis of the various F-tests was also performed by means of an approach advocated by Cohen (Cohen, 1977; see also Baroudi & Orlikowski, 1987). This augmented confidence in accepting hypotheses of no difference between means (Glass & Stanley, 1970, p. 283).

6. Following the ANOVA F-tests, contrasts between the means of subpopulations of interest were analyzed using the Dunn-Bonferroni method (Dunn, 1961) of multiple comparisons.*

7. Finally, the alternative scheme of combining subjects' performance on accuracy for sessions 1 and 2, with twice as much weighting given to results obtained in session 2, was adopted so as to achieve a more normal distribution of the data.* Again, the criterion of α = 0.05 was chosen for testing the null hypotheses among the means of the subpopulations of interest.

C. THE EXPERIMENTAL RAW DATA

The experimental raw datasets were derived from random assignments of seventy-two subjects, divided into three pools of twenty-four, one pool assigned to each of the experiments. The subject population consisted of approximately equal numbers of mostly second-year commerce undergraduate and first-year MBA students. All of them were enrolled in introductory MIS courses. The overall average age of subjects was approximately 25 years. The only criterion of selection was that the subject

† These ANOVA terms will be discussed further in separate sections.

* The reason for adopting this method is discussed in the final section of this chapter.
Briefly, for planned contrasts between two or more means following an ANOVA procedure, this specialized method offers a greater amount of flexibility and is more powerful than most others (Dunn, 1961, pp. 54-58).

* Refer to the discussion of the accuracy measures in chapter 4.

volunteered for the study. Appendix E shows the subject recruiting form and appendix F the consent form, which subjects were required to sign before participating in the study.

Corresponding to the series of three experiments, the summary statistics for the raw datasets gathered in this research are presented in table 5.1. Application of the BMDP1D data screening procedure revealed only a small pocket of initial outliers. These represent atypical observations with respect to the two performance measures of time and accuracy. Thus, subjects whose performance on accuracy fell below the 80% cutoff criterion were removed from further analysis. Analysis of the time data suggested, however, the use of different cutoff criteria for different experiments. It was also determined that these criteria should result in the removal of the fewest possible outliers. Hence, a 5-second cutoff criterion was used in experiment E1, a 10-second criterion in experiment E2, and a 15-second criterion in experiment E3 to eliminate relatively slow responders.† There were no missing data in the three experimental datasets collected. In table 5.1, outliers are highlighted. In general, subjects' performance scores were within reasonable ranges (table 5.1). Since no major interruptions, such as errors in program execution or a system crash, occurred during the actual administration of the experiments, all of the original data, with the exception of the few identified outliers discarded, were included in the statistical analyses.
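The screening rule above amounts to two simple per-experiment cutoffs. The sketch below illustrates it; the function and the sample records are ours for illustration (only the cutoff values and the 0.694 accuracy figure come from the text and table 5.1), not the actual screening program.

```python
# Screening criteria described in the text: subjects below 80% mean
# accuracy, or above a per-experiment mean-latency cutoff, are outliers.
TIME_CUTOFF_SECONDS = {"E1": 5.0, "E2": 10.0, "E3": 15.0}
ACCURACY_CUTOFF = 0.80

def screen(subjects, experiment):
    """subjects: iterable of (id, mean_accuracy, mean_time_seconds)."""
    kept, outliers = [], []
    for sid, accuracy, latency in subjects:
        if accuracy < ACCURACY_CUTOFF or latency > TIME_CUTOFF_SECONDS[experiment]:
            outliers.append(sid)
        else:
            kept.append(sid)
    return kept, outliers

# Illustrative E3 check: an accuracy of 0.694 falls below the 80% criterion.
kept, outliers = screen([("SD", 0.694, 2.362), ("SE", 0.931, 8.542)], "E3")
print(outliers)  # ['SD']
```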
Table 5.1 shows the mean latency responses, the percentages of correct responses, and the respective standard errors for all three groups of 24 participants.* These are average scores combined over the two experimental sessions.** They comprise the responses to the initial 36 attempts in each session of the experiments. No subsequent error-correction attempts during the experimental sessions were included in the tabulated means.

† Note that the outliers identified also exhibited relatively high standard errors of the mean.

* Entirely different groups of subjects were used in the different experiments.

** I.e., the 72 treatment combinations.

Table 5.1: Summary of Experimental Raw Datasets
[Mean accuracy scores and mean time scores in seconds, with standard errors of the mean, for each of the 24 participants (SA through SX) in each of experiments 1-3; the tabulated values are not cleanly recoverable from this copy.]
** Outliers with high latency time response.
* Outliers below 80% accuracy.

D. EXAMINATION OF THE DATA STRUCTURE

Assumptions required for the validity of F-tests in the fixed-effects analysis of variance model are that the observations are mutually independent and normally distributed, and that the probability distributions within each level of the factors have the same variance.
In the case of a repeated measures design, measures made on the same subject are usually correlated; the independence assumption therefore becomes critical. The F-tests conducted are, however, normally computed in a way that allows for some relaxation of the various assumptions. First, the normality assumption may usually be overlooked in an ANOVA or ANCOVA procedure because of the general robustness† of such a procedure (see Glass et al., 1972, p. 246). Second, considerations for the homogeneity of variance assumption include the use of conservative F-tests with adjusted degrees of freedom as well as the use of equal cell observations. Finally, the independence-of-error-terms assumption is replaced by symmetry assumptions,* one for each error sum of squares for which there is more than one degree of freedom for a within factor.

1. The Normality Assumption

BMDP5D, designed and programmed by Chasen (BMDP Software Manual, 1985), provides an examination of the raw data structure to assess the normality assumption by means of histograms, normal probability plots, half-normal plots, and detrended normal probability plots for all subpopulations and combined groups. It was found that the distribution of the accuracy measures was non-normal. This did not come as a surprise, since the original data were essentially coded as binary. It indicated, however, that a recoding or transformation of the data appeared warranted

† Indeed, "robustness" studies (e.g., Rider, 1929; Pearson, 1929, 1931; Cochran, 1947; Hack, 1958) have confirmed that violation of the normality assumption should not be of any great concern. Refer to Glass et al. (1972) for a comprehensive review of this issue. Read also the section on the normality assumption in this dissertation.

* This is equivalent to the condition that orthogonal polynomials for any within factor are independent and have equal variances (1985 BMDP Software Manual, p. 379; see also Anderson, 1958, p. 259; Winer, 1971, pp.
594-599). According to the BMDP Software Manual, the symmetry assumption is not required for F-tests in an orthogonal polynomial breakdown or for those that include only two levels of a repeated measure.

before they could be submitted to further statistical analyses such as the ANOVA or ANCOVA procedures. The latency time response data, on the other hand, approximated normal distributions more closely, with expected occasional departures from the normality condition. One reason for such departures might be that there were generally one or two subjects in each cell who showed a greater latency of time response for the various treatment combinations. It was therefore not unreasonable to find that the overall distributions tended, most often, to be moderately skewed to the right. Yet, dramatic evidence showing the relative unimportance of the normality assumption for the ANOVA and ANCOVA procedures, as reviewed by Glass et al. (1972), may be summarized as:

Normality has negligible consequences on type-I and type-II error probabilities unless populations are highly skewed, n's are small, and directional ("one-tailed") tests are employed. (Glass & Hopkins, 1984, p. 351)

As none of these warning conditions applied to the time observations gathered in this research, and as there were enough cell observations for the various subpopulations in the repeated measures design to ensure that the normal distribution was a good approximation of the unknown distribution from which the observations were drawn, the validity of the ANOVA tests conducted for these experiments should not be threatened (Lindman, 1974; Glass et al., 1972).† As for the relatively skewed distributions of accuracy scores, one method of normalizing the score distribution is to recode the binary "0" and "1" scores to a greater range of scores.
This could easily be achieved through the combined scores for both sessions 1 and 2, as will be discussed in the analysis of accuracy data for the individual experiments.

† The rationale for this is, again, the robustness of the F-tests to violation of this assumption. Refer to previous footnotes.

2. Homogeneity of Variance/Covariance

The computation of F-statistics assumes that the data are sampled from normal populations with equal variances. When the sample sizes of the cells and the population variances are not equal, the distribution of F can be strongly affected and the validity of the F-tests becomes questionable. While many researchers are acquainted with Bartlett's test of equality of variances for groups with nonzero variances (Dixon & Massey, 1969), it should also be noted that this test is sensitive to the assumption of normality and may improperly reject the null hypothesis too often when the distribution of the data is not normal. Hence, both Bartlett's test and Levene's test, which is less sensitive to the normality assumption, were computed for the various subpopulations using BMDP7D and BMDP9D. Again, the results of the tests were mixed with regard to homogeneity of variances across the various subpopulations for both the time and accuracy performance scores. Since this assumption was not found to be fully upheld for each of the experiments, all reported analyses were conducted with conservative F-tests using modified degrees of freedom,† which protect ANOVA tests for repeated measures against violation of this assumption. Furthermore, it should be noted that subjects, randomly assigned to experiments in this research, came from a homogeneous population;* errors due to between-subject differences were eliminated in the experimental design; and error sources were derived from cells with equal numbers of observations. In fact, Glass et al.
(1972) have also claimed that, within the degree of variance heterogeneity one is apt to encounter in practice, violation of the assumption discussed has negligible consequences on probability statements (type-I error) or power when n's are equal (see Glass & Hopkins, 1984, p. 353). Moreover, since the only variability in the total repeated measures design used in this case is within subjects, there is less to worry about regarding violation of this assumption.

† E.g., the Greenhouse-Geisser approach or the Huynh-Feldt approach.

* ANOVA results with a grouping factor of undergraduate versus graduate students showed no significant effects for the grouping factor or its interaction with the other factors studied.

3. The Symmetry Condition

It has been argued that the interpretation of ANOVA/ANCOVA results for a repeated measures design should rest on corresponding test results of the symmetry condition† for each error term used in each specific F-test (Davidson, 1980). It has also been pointed out that while the symmetry test of BMDP2V is similar to the notion of Winer's compound symmetry (Winer, 1971, pp. 594-599), it is nonetheless less restrictive in that it is not only a sufficient condition but also a necessary one. Tests of the symmetry condition for the specific error sums of squares relevant to a particular F-test therefore provide clear indications with respect to the independence and equal variances of the orthogonal polynomial decomposition for the within factors in a repeated measures design.* An examination of the various sphericity test results for all of the error terms used in each of the experiments revealed that the symmetry assumption was violated in only a very small number of cases. According to the 1985 BMDP Statistical Software Manual (p.
379),

When there is reason to doubt the symmetry assumption, either because of the sphericity test or because of compelling theoretical considerations, such as the fact that the within factor is time, for which there is a suspected carryover effect from one level of the within factor to the next, tests can be made by reducing the degrees of freedom contributed by the within factors. These adjustments are due to Greenhouse and Geisser (1959) and Huynh-Feldt. See Frane (1980) for more discussion.

In other words, the conservative F-test results will still provide the required statistics for drawing inferences in cases where the symmetry assumptions are violated. Nevertheless, a simple and conservative approach is to report all main factor effects and their interactions for significance based on either the Huynh-Feldt or Greenhouse-Geisser adjustments to degrees of freedom. This research uses the Greenhouse-Geisser probability values, which are calculated to yield very conservative results, for consideration of significant effects whenever possible.

† This condition has been discussed previously. However, refer to Elashoff (1986, pp. 15-16) for an explanation of the general structure of a covariance matrix satisfying such a condition.

* The analysis of the within factors in a repeated measures design involves an orthogonal polynomial decomposition for the within factors (see appendix A.18 in the 1985 BMDP Software Manual).

4. The Univariate-Multivariate ANOVA/ANCOVA Issue

Owing to the lack of knowledge about violations of the symmetry conditions, the BMDP2V ANOVA procedure and the BMDP4V MANOVA procedure were initially used to analyze the original datasets.
However, when it was found that both analyses showed precisely the same set of significant main and interaction effects for α chosen at the 0.05 level as a general cutoff point, performing both procedures was considered redundant. Moreover, a study of the BMDP2V outputs revealed that many of the sphericity tests for the various error terms were not rejected, and it was therefore argued that the univariate ANOVA results could be used to report the findings. The reason is that when the sphericity assumption is met, as it usually is, the univariate approach is considered more powerful than the multivariate approach, even in cases where the sample sizes are limited (Davidson, 1980). In theory, the univariate approach for a repeated measures design requires a more restrictive assumption regarding the variances and covariances of the repeated measurements than the multivariate approach. Violation of this assumption may thus result in a less powerful test when there is an effect, or may result in a test that is too liberal† (refer to the 1985 BMDP Software Manual, p. 395). Yet, the advantage of using the univariate approach for a repeated measures design is its ease of accommodating covariates as well as testing for carryover (residual) effects, period effects, and order effects.

† I.e., one which rejects the null hypothesis of no difference too frequently.

In fact, these effects do not have a clear definition in the multivariate approach. Even so, while the multivariate approach is more flexible in its assumption regarding the covariance matrix of the repeated measurements, it does require that the repeated measurements have a multivariate normal distribution.
Such an assumption could, conceivably, be very difficult to test, and yet violation of this assumption, such as the presence of multivariate outliers, could have serious consequences (Davidson, 1980). Hence, it appears that the simpler univariate approach is a better choice than the multivariate approach for the repeated measures design in most cases.

5. Multiple Comparison Techniques

Figure 5.2, taken from Glass & Hopkins (1984) with permission, shows a flow chart guide for the selection of multiple-comparison (MC) techniques. Since Glass & Hopkins (1984) provide a comprehensive review of the various techniques and when they are used, the discussion in this section focuses on why the Dunn (1961) method was chosen for data analysis in this research. Trend analysis was not used because no underlying continuum was posited for any of the independent variables included. There was also no requirement that all contrasts to be performed be orthogonal. This is why the ANOVA procedure was performed prior to the mean contrasts of interest.

Figure 5.2: A Guide for Selection of Multiple-Comparison Techniques. Source: Glass & Hopkins, 1984, p. 393. Reprinted by permission of Prentice Hall, Inc., Englewood Cliffs, New Jersey.

The Dunn-Bonferroni (1961) multiple-comparison technique uses the Bonferroni inequality to determine the critical t-ratios. The Dunn method is perhaps best distinguished from the Scheffé method in that the Dunn method relies on predetermined or planned contrasts, whereas the Scheffé method is a very flexible post hoc data-snooping method (Dunn, 1961; Glass & Hopkins, 1984, pp. 381-383). The advantage of the Scheffé method, then, is that it can be used for making any simple or complex contrast even after inspecting the means. However, Dunn argues that it is, ...
possible in using the t-intervals to select as the intervals to be estimated a very large set of linear combinations which includes all those which might conceivably be of interest. Then, on looking at the data, one may decide on actually computing intervals for only some of this set. (Dunn, 1961, p. 56)

Dunn (1961) provides tables to show that, for a fairly large number of means, the t-intervals (Dunn method) are shorter than the corresponding intervals using the F-distribution (Scheffé method) for any reasonable number of linear combinations (see tables 3 and 4 in Dunn, 1961, pp. 56-57). As such, Miller (1966, p. 54) argued that for a prespecified subset of the possible contrasts, the Dunn method is normally more powerful than the Scheffé method. While the Scheffé method (also known as F-projections or the S-method) is the most widely published statistical technique for multiple comparisons (Hopkins & Anderson, 1973), Glass & Hopkins claimed that "the flexibility of the Scheffe method such that post hoc data snooping for any number of contrasts is allowed, causes it to be a very conservative and inefficient procedure in the usual research circumstance in which there is interest in only a limited subset of possible contrasts, such as all pairwise contrasts" (Glass & Hopkins, 1984, p. 383). In this research, the Dunn-Bonferroni (1961) method of multiple mean comparison is used only on those pairwise contrasts among subpopulation means that are of key interest and planned a priori.

E. SUMMARY

In summary, there is little concern about the various assumptions underlying the F-statistics to be computed for the experimental data captured in this research. The initial screening of the experimental data revealed only a very small number of outliers. Moreover, the use of a repeated measures design helps to ensure that sufficient power can be achieved even with small sample sizes.
Finally, similar statistical procedures, with emphasis placed on the univariate ANOVA approach and the Dunn-Bonferroni (1961) multiple-comparison method, are adopted for analyzing and interpreting all experimental data captured in this research.

VI. RESULTS: EXPERIMENT 1

This chapter presents the results of experiment 1 (E1). The subjects for this experiment comprised nineteen males and five females. Of these twenty-four candidates, thirteen were second-year commerce undergraduates and eleven were first-year MBA students. Their ages ranged from 20 to 32, with the average falling close to 24.83 years. At the time of participation, all subjects were enrolled in MIS courses at the introductory level.

In addition to the discussion of chapters 4 and 5, a number of key issues concerning data analysis still need to be addressed. The first of these considerations is whether separate analyses should be performed on time for the session 1 and session 2 datasets. An apparent justification for this decision would be, for instance, the presence of strong learning as indicated by a significant difference between time performance in sessions 1 and 2. The second consideration is whether to exclude the GEFT variable from the final statistical model used for analyzing the various datasets. For example, the lack of a significant GEFT effect when included as a covariate, or the lack of a strong correlation between GEFT scores and performance scores, would justify such an action. The third consideration is whether there are more outliers than those already identified. For example, including a subject with a high time-accuracy tradeoff might well contribute to less interpretable results, because in the face of a high time-accuracy tradeoff, factors or combinations of factors found to contribute to high time outcomes might also be interpreted as those contributing to lower accuracy, thus confounding the findings.
Yet, such confounding can only be detected when there is simultaneous tracking of time and accuracy performance. The final consideration is the issue of statistical power. This is especially important when several additional outliers are found and a decision has to be made on whether to discard all of them, or to retain those who appear to present only marginal problems. The purpose of maintaining high power values for the various F-tests is, of course, to increase the level of confidence in not rejecting the null hypotheses. These issues are discussed in sequence in the following few sections, prior to discussing the detailed results on time for task performance during sessions 1 and 2. There follows a presentation of accuracy performance statistics for the two sessions combined. The chapter then concludes with a summary of key findings for E1.

A. TIME PERFORMANCE FOR COMBINED SESSIONS

An initial ANCOVA model was run on the full dataset, which included both the session 1 and session 2 data. The model was a repeated measures design with the following major classification factors:

1. S: Session (Session 1 or Session 2)
2. G: Graph Format (Bars, Symbols, or Lines)
3. Q: Question Type (Q1, Q2, or Q3)
4. T: Time Period (7 or 14 Periods)
5. D: Dataset Category (1 or 3 Datasets)

The GEFT scores were treated as a covariate in the model.

1. The Session Effect

Analysis of time performance on the full dataset, with just one outlier excluded (see table 5.1) for this first experiment, revealed a highly significant (F = 60.99, p < .01) Session effect, as shown in table 6.1. Mean time response dropped from 2.8 seconds in session 1 to 2.2 seconds in session 2. Subjects appeared to experience learning, and the implication of this effect is therefore the need for separate analyses of the time data captured during sessions 1 and 2.

2.
The GEFT Measure

Results of the analysis (table 6.1) did not reveal a significant GEFT effect (F = 1.78; p > .05), although there was a need to further substantiate the evidence before a final decision could be made on whether to exclude the GEFT variable from the analysis model. Accordingly, when the GEFT scores of all subjects were correlated with their respective mean time performance scores combined over the 72 treatment combinations, BMDP6D revealed only a low and insignificant correlation (R = -.29; P(R) = .17; mean time = 2.67 seconds; mean GEFT = 14.3). Moreover, correlations of corresponding scores between and within sessions did not produce significant GEFT-time relationships. An R-square value that was essentially zero (R-square = 0.0003) was also obtained when the GEFT measure was placed solely in a regression model with the time response specified as the only dependent measure. In summary, the evidence collectively supported the relative unimportance of the GEFT measure in explaining subjects' time performance. Hence, a decision was made to drop the GEFT scores from the final statistical model used to analyze effects on latency of reaction.

3. Additional Outliers

Since the presence of outliers exhibiting a high time-accuracy tradeoff could contribute only to less interpretable results, the degree of time-accuracy tradeoff among individuals was assessed by correlating their time performance scores with their accuracy performance scores over all experimental trials. Those indicating a high time-accuracy tradeoff were to be highlighted and considered for removal.

Table 6.1: Initial ANCOVA Results for the Full Dataset (Experiment 1)
Dependent Variable: Time Performance

Sources of Variance     F        Conventional     Greenhouse-Geisser
                                 p-values         Prob.
GEFT                    1.78     0.1971
S: Session              60.99    0.0000**
G: Graph Format         6.42     0.0037**         0.0044**
Q: Question Type        17.42    0.0000**         0.0000**
T: Time Period          62.
1 4 0.0000** D: Dataset 73 . 1 8 0.0000** G*Q 21 .29 0.0** 0.0** G*T 0 . 38 0.6371 0.6370 Q*T 17 .0 1 0.0000** 0.0000** G*D 7 .65 0.0015** 0.0017** Q*Q 0 .31 0.7370 0.7165 T*D 1 . 69 0.2078 G*Q*T 1 .95 0.1104 0.1302 G*Q*D 3 .52 0.0 104* 0.0195* G*T*D 1 .75 0.18 63 0.1867 Q*T*D 1 1 .44 0.0001** - 0.0002** G*Q*T*D 3 .23 0.0 163* 0.0220* Significant at p = 0.05 level * Significant at p = 0.01 level-RESULTS: EXPERIMENT 1 / 1 2 7 Yet, the possibility that a particular subject could have a high time-accuracy tradeoff during the combined sessions but not for the separate sessions made it more difficult to decide how these additional outliers were to be identified. It was decided that session 2 data should be used for detecting these outliers since significant effects found during session 2 were of greater interest than effects found significant during session l . t This was due to the possibility that some of the effects found to be significant during session 1 might just be temporal, or even spurious, whereas those more permanent effects would be expected to show again during later sessions regardless of the amount of training subjects had undergone. Additional outliers found for E1 based on individual time-accuracy correlations and their respective significance during session 2 have been marked in table 6.2. Whether all of these additional outliers should be eliminated depends again on how much power would be lost due to their subsequent removal. Consequently, it appeared most appropriate, at this point, to discuss briefly the relatively important concept of the statistical power of the F-tests associated with the ANOVA/ANCOVA procedures (see Cohen, 1977; Baroudi & Orlikowski, 1987). Deciding on a power level desirable for the various F-tests can also help to determine the benefits of still retaining some of those additional "outliers" who might just present marginal problems. 
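The screening rule just described (correlate each subject's trial times with trial accuracy over the 36 session 2 trials, flag significant correlations, and treat perfect-accuracy subjects as having no computable correlation) can be sketched as follows. The critical value 2.032 (the two-tailed t cutoff for df = 34 at the .05 level) is an assumption of this sketch, not a figure taken from the thesis:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation; returns None when either variable is
    constant (the "PERFECT ACCURACY -- NOT COMPUTABLE" case)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

def tradeoff_outlier(times, accuracies, t_crit=2.032):
    """Flag a subject whose time-accuracy correlation is significant
    at alpha = .05, two-tailed.  t_crit = 2.032 assumes df = 34
    (36 trials); it is converted to a critical |r| below."""
    r = pearson_r(times, accuracies)
    if r is None:
        return False
    df = len(times) - 2
    r_crit = t_crit / math.sqrt(t_crit ** 2 + df)  # about .33 for df = 34
    return abs(r) > r_crit

times = [1.5 + 0.05 * i for i in range(36)]
accuracy = [1.0] * 36
print(tradeoff_outlier(times, accuracy))  # False: correlation not computable
```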
† The concern with those effects found significant during the combined sessions was already ruled out on the basis of the presence of the significant (p < .01) Session effect discussed earlier.

Table 6.2: Time-Accuracy Correlations for Identifying Additional Outliers (Experiment 1, Session 2)
Dependent Variable: Time Performance Scores

[Per-subject table listing, for each of the remaining E1 subjects (01-22, 24), the session 2 time-accuracy correlation R, its probability P(R), and the sample size (36 trials per subject). Subjects with perfect accuracy are marked "(PERFECT ACCURACY -- NOT COMPUTABLE)". Entries marked * (significant at p = 0.05) or ** (significant at p = 0.01) identify the additional outliers.]

* Significant at p = 0.05 level
** Significant at p = 0.01 level

4. The Power Analysis

Cohen (1977) defines the power† of a statistical test as the probability of yielding statistically significant results. Baroudi & Orlikowski (1987) reviewed 57 management information systems articles published in such leading journals as Communications of the ACM, Decision Sciences, Management Science, and MIS Quarterly, and found that, on average, the statistical power of inference testing in these articles fell substantially below the acceptable norm of .80 (Cohen, 1965, 1977; Welkowitz et al., 1982). According to Baroudi, not only does power analysis provide a measure of confidence when interpreting F-test results, but it also ensures that the decision to support the null hypothesis is not a misrepresentation.
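Cohen's tabled power values can be approximated numerically from the noncentral F distribution. A minimal sketch, assuming SciPy is available and using Cohen's noncentrality parameter lambda = f^2 * (df_num + df_den + 1); the degrees of freedom in the example call are illustrative, not values from the thesis:

```python
from scipy.stats import f as f_dist, ncf

def f_test_power(f_effect, df_num, df_den, alpha=0.05):
    """Approximate power of an ANOVA F-test for Cohen's effect size f."""
    nc = f_effect ** 2 * (df_num + df_den + 1)        # Cohen's lambda
    f_crit = f_dist.ppf(1.0 - alpha, df_num, df_den)  # rejection cutoff
    return float(ncf.sf(f_crit, df_num, df_den, nc))  # P(reject | effect)

# e.g. a 3-level factor (2 numerator df) with the conservative f = .25:
print(round(f_test_power(0.25, 2, 40), 2))
```

Power rises with denominator degrees of freedom (more subjects or trials), which is why retaining a few marginal "outliers" can be weighed against the power lost by removing them.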
An approach for calculating power is given in Cohen (1977), and this was used in the analysis of pilot results for this experiment.‡ Based on this approach, with α = .05 and the effect size f estimated rather conservatively at .25 (see Cohen, 1977, pp. 277-281), power analysis was performed for all of the F-tests conducted. Even taking into consideration all the additional outliers that were identified for subsequent removal (see table 6.2), most if not all of the F-tests conducted showed power values clearly above the conventional benchmark of .80.§ Consequently, this eases the need to report only results for those tests where the decision was to reject the hypotheses of no differences between group means for various subpopulations. The argument for such an approach is simply that when the statistical tests conducted have power values close to that conventionally acceptable, a "no effect" conclusion may be confidently stated.

† See the Glossary for an alternative definition of this term.
‡ Appendix I contains a summary report of the pilot test results.
§ Power values were predetermined for this research as stated in the pilot test report (see appendix I).

B. TIME PERFORMANCE FOR SEPARATE SESSIONS

Evidence of learning, as well as the independence of the GEFT measure from the dependent variable of time performance, justified the use of a simpler model to analyze the combined dataset separately: one for session 1, and the other for session 2. As observed earlier, the three additional outliers uncovered for session 2 were subsequently discarded on the basis that this would have only a negligible effect on the power values of the various F-tests conducted. Moreover, to ensure that findings for the various datasets (i.e. the full and separate datasets for sessions 1 and 2) are comparable, these additional outliers were also purged from the full dataset, as well as the dataset for session 1, before performing further statistical analyses.
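The per-session models remain repeated measures designs in which Subject is a random factor crossed with the fixed factors, so each fixed effect is tested against its interaction with Subject rather than a pooled residual. A minimal one-within-factor sketch of that error-term logic, on synthetic data (not thesis data):

```python
def repeated_measures_f(data):
    """One-within-factor repeated measures ANOVA F ratio.

    data[s][c] is the score of subject s under condition c.  The
    denominator is the condition-by-subject interaction mean square,
    the standard error term when Subject is random and crossed with
    a fixed factor (a one-factor slice of the full mixed design)."""
    n = len(data)       # subjects
    k = len(data[0])    # conditions
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(data[s][c] for s in range(n)) / n for c in range(k)]
    subj_means = [sum(row) / k for row in data]
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_err = sum((data[s][c] - cond_means[c] - subj_means[s] + grand) ** 2
                 for s in range(n) for c in range(k))
    ms_cond = ss_cond / (k - 1)
    ms_err = ss_err / ((k - 1) * (n - 1))
    return ms_cond / ms_err

# Four subjects, three conditions, a clear condition effect plus
# subject-level noise: the F ratio comes out large.
effect = [[1.0, 2.0, 3.0],
          [1.1, 2.0, 3.2],
          [0.9, 2.1, 2.8],
          [1.2, 1.9, 3.1]]
print(repeated_measures_f(effect) > 10.0)  # True
```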
Since BMDP2V computed its own error term for each respective F-test in the specified repeated measures model, corresponding analyses of the datasets were conducted using the SAS ANOVA procedure to cross-examine results. For the SAS analyses, a mixed effect model design† was used: a five-way analysis of variance in which the four fixed factors of Graph Format, Question Type, Time Period, and Dataset were totally crossed with a fifth, random Subject factor. The standard error terms for such a model with repeated measures were applied accordingly for the various specific F-tests. Results of both the BMDP and SAS packages were in agreement, indicating that the correct error terms had been computed by the BMDP software. Table 6.3 compares those main factors and their interactions that were significant for the combined as well as separate sessions (i.e. session 1 and session 2).

† This design combines the fixed effect model with the random effect model (see the 1985 SAS User's Guide: Statistics, Version 5 Edition).

Table 6.3: Comparison of ANOVA Results Among Sessions (Experiment 1, Additional Outliers Excluded)
Dependent Variable: Time Performance

Sources of Variance    p-values             p-values      p-values
                       (Combined Sessions)  (Session 1)   (Session 2)
S: Session             0.0000**
G: Graph Format        0.0148*              0.0145*       0.0888
Q: Question Type       0.0000**             0.0003**      0.0020**
T: Time Period         0.0000**             0.0000**      0.0002**
D: Dataset             0.0000**             0.0000**      0.0000**
G*Q                    0.0**                0.0**         0.0012**
G*T                    0.3509               0.8039        0.1463
G*D                    0.0027**             0.0240*       0.0256*
Q*D                    0.8802               0.8262        0.4567
Q*T                    0.0000**             0.0000**      0.0243*
T*D                    0.2218               0.6321        0.2100
G*T*D                  0.1152               0.4175        0.3684
G*Q*T                  0.2905               0.1653        0.3131
G*Q*D                  0.0307*              0.0333*       0.7287
Q*T*D                  0.0007**             0.0111*       0.0005**
G*Q*T*D                0.0241*              0.0373*       0.1590

* Significant at p = 0.05 level
** Significant at p = 0.01 level

Table 6.4: Tables of Means for All Treatment Combinations (Experiment 1, Outliers Excluded)
Dependent Variable: Time Performance (seconds)

Graphical Information         Bars                 Symbols              Lines
Complexity                 Q1    Q2    Q3       Q1    Q2    Q3       Q1    Q2    Q3

a  Session 1              1.88  1.89  1.97     1.89  1.95  2.79     3.06  2.02  2.39
   Session 2              1.65  1.54  1.66     1.54  1.68  1.80     2.44  1.42  1.92

b  Session 1              2.79  2.13  2.06     3.01  3.22  2.03     3.60  2.39  2.26
   Session 2              2.28  1.72  1.75     2.39  1.90  1.84     2.51  2.10  1.17

c  Session 1              2.65  3.02  3.16     2.56  3.24  2.66     3.76  3.16  2.60
   Session 2              2.26  2.24  2.29     1.89  2.27  2.11     2.66  2.45  1.68

d  Session 1              3.25  3.17  4.27     3.73  2.85  2.88     5.49  2.85  2.66
   Session 2              2.86  2.81  2.99     2.40  2.04  2.42     3.27  2.36  2.14

Treatment Combinations of Information Complexity:
a: 1 Dataset with 7 Time Periods
b: 1 Dataset with 14 Time Periods
c: 3 Datasets with 7 Time Periods
d: 3 Datasets with 14 Time Periods

A comparison of table 6.1 with table 6.3 shows that generally similar factors or combinations of factors were significant for the full dataset (i.e. including the additional outliers) as for the reduced dataset (i.e. excluding all outliers). Mean scores for time performance combined across all the initial 36 treatment combinations for sessions 1 and 2 for the reduced dataset are presented in table 6.4. Note the apparent improvement over each treatment combination, each of which also represents a separate experimental trial, when mean values between sessions are compared.

1.
Significant Effects on Time for Session 1

With the observation made earlier that session 2 findings are to be treated as more critical than session 1 results for this research, only significant effects of key interest are discussed in this section. Note, however, that findings for session 1 could add to our knowledge of graphics users in situations where readers retrieve computer graphics displays only infrequently; for example, executives who use graphical support systems occasionally, or less experienced first-time graph users who might still require a substantial amount of training in reading and understanding graphical charts. Table 6.3 shows the following factors or combinations of factors to be significant for time during session 1:

1. Graph Format effect
2. Question Type effect
3. Time Period effect
4. Dataset effect
5. Graph Format by Question Type interaction
6. Graph Format by Dataset interaction
7. Question Type by Time Period interaction
8. Graph Format by Question Type by Dataset interaction
9. Question Type by Time Period by Dataset interaction
10. Graph Format by Question Type by Time Period by Dataset interaction

Among these, only the Graph Format by Question Type interaction is of key interest, since the major purpose of this research is the understanding of how different graph formats affect performance on various tasks. Figure 6.1 depicts the plot for this two-way interaction during session 1 of E1 and shows the corresponding mean values. Table 6.5 gives results of the multiple mean comparisons produced by the BMDP7D† software on those sets of differences among means that are of major interest; that is, those contrasts that are within each question type or within each graph type, but not those across different question and graph types (table 6.5). Hence, out of a total of 36 possible contrasts, only 9 contrasts among means were planned and tested by the Dunn-Bonferroni method.
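The Dunn-Bonferroni approach above divides the familywise α across only the planned contrasts. A sketch of the bookkeeping, using the (*1)-(*9) cell numbering of figure 6.1; exactly which 9 of the 36 pairs were planned is the author's choice and is not encoded here:

```python
from itertools import combinations

# Cell numbering for the 3 x 3 Graph Format x Question Type table,
# matching figure 6.1: bars are cells 1-3 (Q1-Q3), symbols 4-6,
# lines 7-9.
cells = {}
number = 1
for graph in ("Bars", "Symbols", "Lines"):
    for question in ("Q1", "Q2", "Q3"):
        cells[number] = (graph, question)
        number += 1

all_pairs = list(combinations(cells, 2))  # 36 possible pairwise contrasts

# Dunn-Bonferroni: only the planned subset is tested, and the
# familywise alpha is divided by the number of planned contrasts.
m_planned = 9                 # per the text, 9 of the 36 pairs
alpha_family = 0.05
alpha_per_contrast = alpha_family / m_planned

print(len(all_pairs), round(alpha_per_contrast, 4))
```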
Study of figure 6.1 and table 6.5 reveals that the key significant differences were between a particular treatment combination (i.e. the Line-Q1 combination) and the other treatment combinations compared. Subjects using line graphs took significantly longer to extract scale-values (Q1) than to extract level differences (Q2) or trends (Q3). Moreover, when lines were used for performing Q1, subjects were significantly slower than when either symbols or bars were used.

† The BMDP7D procedure performs multiple means comparisons of subpopulation observations based on the Dunn-Bonferroni technique.

Figure 6.1: Plot and Mean Values of Graph Format x Question Type Interaction (Experiment 1, Session 1)
Dependent Variable: Time Performance

Question Type    Bars         Symbols      Lines
Q1               2.64 (*1)    2.80 (*4)    3.98 (*7)
Q2               2.56 (*2)    2.82 (*5)    2.60 (*8)
Q3               2.87 (*3)    2.59 (*6)    2.48 (*9)

* Numbering in cells corresponds to treatment combinations for the Dunn-Bonferroni method of multiple means comparison.

Table 6.5: Summary of Dunn-Bonferroni Results for Graph Format x Question Type Interaction (Experiment 1, Session 1)
Dependent Variable: Time Performance

1. Differences found among means at the α = 0.01 level of significance:
   a. (*1,*7)
   b. (*4,*7)
   c. (*8,*7)
   d. (*9,*7)
2. No other differences among the 9 mean comparisons of interest were significant at either the α = 0.01 or the α = 0.05 level of significance. (The reader should refer to the discussion in the main body of text on which 9 out of 36 mean comparisons were planned and of interest.)
3. Note that the above numbering of treatment combinations corresponds to those marked in the table of means given in figure 6.1.

In general, these results are in line with major aspects of the Kosslyn-Pinker theory (Kosslyn et al., 1983).
Symbols as well as bars (see table 6.5) proved to be better suited than lines for performing Q1, primarily because they have the characteristics of being isolated and discrete, and they are probably processed individually (i.e. one bar after another). In contrast, lines are not well suited to this particular task, as the task involves the breaking up of a 'Gestalt' -- the line. This result is also in agreement with the anchoring concept of matching appropriate formats to appropriate tasks discussed in chapter 3. Since lines have the worst x-axis as well as y-axis anchoring characteristics compared to either bars or symbols, they proved to be the least appropriate format for tasks with a strong anchoring on both of the two major dimensional axes (i.e. the characteristics of Q1 of this experiment).

2. Significant Effects on Time for Session 2

At this point, it should be noted that emphasis in this dissertation is placed on session 2 results for several reasons. First, session 2 findings are more generalizable to situations where graphics are in constant use or among frequent graphics users. Second, by session 2, all participants had been sufficiently exposed to the various graphics stimuli; therefore, effects found during session 2 should be much less contaminated by a lack of experience or training with the different types of graph format tested. More importantly, the findings will contribute to an accumulation of general knowledge about the type of task activities that are best supported by various types of graph format and design. In short, they add to our knowledge of the reading and understanding of graphics at a more "practical" level than the results found in session 1. Interestingly enough, the significant main and interaction effects found in session 2 were factors, or combinations of factors, that had also turned out to be significant in session 1 of this experiment (E1).
More importantly, the number of effects found to be significant in session 2 was considerably reduced compared to those found in session 1. This also confirmed that the learning curve is a marginally diminishing one.

For session 2, the following significant main and interaction effects were found:

1. Question Type effect
2. Time Period effect
3. Dataset effect
4. Graph Format by Question Type interaction
5. Question Type by Time Period interaction
6. Graph Format by Dataset interaction
7. Question Type by Time Period by Dataset interaction

a. Main Factor Effects on Time for Session 2

1. Question Type -- This factor turned out to be highly significant (p < 0.01). Subjects took approximately 1.98 seconds for questions on trend extraction (Q3), 2.05 seconds for questions on level difference extraction (Q2), and 2.35 seconds for questions on scale-value extraction (Q1). The Dunn-Bonferroni tests on session 2 indicated significant differences between performing Q1 and Q2 and between performing Q1 and Q3. In short, there are differences in extracting different types of data from standard time-series graphics. In particular, Q1 appeared to be a more difficult task than Q2 and Q3 in this experiment. Perhaps the differences would have been nonsignificant had tabular displays been included, since results from prior research have shown that the best form of information presentation for reading scale-values (Q1) is a table (see Jarvenpaa & Dickson, 1988).

2. Time Period -- The effect of this highly significant factor (p < .01) was as expected. As the number of time periods depicted along the abscissa of the time-series graphics increased, time performance deteriorated correspondingly. Average time for using 7-period graphics was close to 2 seconds, compared to 2.3 seconds for 14-period graphics.

3.
Dataset -- As expected, an increase in latency of responses was found as the number of datasets depicted on a single plot increased. For graphs with 3 datasets, time performance averaged 2.4 seconds, but for graphs with only a single dataset, average time performance dropped to 1.9 seconds.

b. Two-way Interactions on Time for Session 2

1. Graph Format x Question Type -- This interaction is of central focus to the study. Just as in session 1, this interaction was found to be highly significant (p < .01) during session 2. Figure 6.2 shows the plot and mean values for this interaction, and table 6.6 shows the corresponding Dunn-Bonferroni results. According to the Dunn-Bonferroni tests for session 2, subjects who used line graphs took significantly longer (p < .01) to extract scale-values of single datapoints (Q1) than to read trends (Q3). Lines also took significantly longer (p < .05) for extracting scale-values (Q1) than for reading level differences (Q2). In addition, symbols were significantly (p < .05) faster to use than lines for reading scale-values (Q1). No other significant differences that were of interest were found at the nominal α = 0.05 level (table 6.6). Since this same interaction was also found to be significant during session 1, and the details of the Dunn-Bonferroni results were more or less similar to those found in session 1, both trained and untrained subjects alike found lines difficult to use for extracting scale-values. This is consistent with the view that subjects tend to read each 'line' on a line graph as a Gestalt, so that isolating single points on a line to read their scale values is a time consuming and effortful process.
Figure 6.2: Plot and Mean Values of Graph Format x Question Type Interaction (Experiment 1, Session 2)
Dependent Variable: Time Performance

Question Type    Bars         Symbols      Lines
Q1               2.26 (*1)    2.05 (*4)    2.72 (*7)
Q2               2.08 (*2)    1.97 (*5)    2.08 (*8)
Q3               2.17 (*3)    2.04 (*6)    1.73 (*9)

* Numbering in cells corresponds to treatment combinations for the Dunn-Bonferroni method of multiple means comparison.

Table 6.6: Summary of Dunn-Bonferroni Results for Graph Format x Question Type Interaction (Experiment 1, Session 2)
Dependent Variable: Time Performance

a. Significant differences among means at the α = 0.01 level:
   1) (*9,*7)
b. Significant differences among means at the α = 0.05 level:
   1) (*4,*7)
   2) (*8,*7)
c. No other contrasts that were of interest were significant. The reader should also refer to table 6.5, which presents results for session 1. Note that the above numbering of treatment combinations corresponds to those marked in the table of means given in figure 6.2.

The rationale underlying these results has already been explained in the earlier section on the same significant Graph Format by Question Type interaction for session 1. That lines were also found to take significantly longer for performing Q1 when compared to Q2, but that no significant differences were found on time performance between Q2 and Q3, simply highlights the similarity of Q2 and Q3 in this experiment. That is, it takes just two points (whether adjacent or not) to produce a slope or trend. This is also consistent with the grouping of Q2 and Q3, on the basis of the anchoring concept, as belonging to the same task category (i.e. Group II tasks).

2.
Question Type x Time Period -- Unlike the disordinal† Graph Format x Question Type interaction, this two-factor interaction appeared to be strictly ordinal.‡ The plot and mean value table for this significant interaction during session 2 are shown in figure 6.3. The Dunn-Bonferroni results are summarized in table 6.7. Results of the Dunn-Bonferroni tests for session 2 indicated that with more time periods (14 periods), performance on Q1 alone was adversely affected (p < .01). No statistically significant adverse effects of increasing time periods were found on either level difference questions (Q2) or trend questions (Q3). In addition, for 14-period graphics, Q1 was found to take significantly (p < .01) longer to perform than either Q2 or Q3. Consistent with the previous finding on the highly significant direct effect of the Time Period factor, this result supported the notion that the complexity of graphics increases with more time periods depicted along the abscissa of a time-series plot. However, together with the finding on the Question Type effect, it also indicated that more time periods had greater adverse effects on certain task activities in this experiment (E1) (e.g. Q1) than on others (e.g. Q2 and Q3).

† See figures 6.1 and 6.2.
‡ This term is defined in the Glossary.

Figure 6.3: Plot and Mean Values of Question Type x Time Period Interaction (Experiment 1, Session 2)
Dependent Variable: Time Performance

Question Type    Q1           Q2           Q3
7 Periods        2.07 (*1)    1.94 (*3)    1.91 (*5)
14 Periods       2.62 (*2)    2.16 (*4)    2.05 (*6)

* Numbering in cells corresponds to treatment combinations for the Dunn-Bonferroni method of multiple means comparison.

Table 6.7: Summary of Dunn-Bonferroni Results for Question Type x Time Period Interaction (Experiment 1, Session 2)
Dependent Variable: Time Performance

a.
Significant differences among means at the α = 0.01 level:
   1) (*1,*2)
   2) (*2,*4); (*2,*6)
b. No other contrasts of key interest were significant. Note that the above numbering of treatment combinations corresponds to those marked in the table of means given in figure 6.3.

3. Graph Format x Dataset -- Like the Question Type x Time Period interaction, this was another ordinal† interaction, as plotted in figure 6.4, which also shows the mean value table for this interaction. Table 6.8 gives a summary of the Dunn-Bonferroni tests for this interaction. Analysis of this interaction for session 2 revealed that using different types of graph format resulted in different degrees of additional time and effort for multiple dataset versus single dataset representations. The Dunn-Bonferroni tests indicated that for bars, a highly significant increase in latency time performance was found, whereas the effect with lines was of lesser significance (table 6.8). No significant difference was found between using singular symbol plots and multiple symbol plots. On the other hand, multiple (3 datasets) bars were found to yield significantly greater latency responses when compared to multiple symbol representations. One reasoning that might explain these observations is that with multiple datasets, multiple bars that are also isolated from one another are used.‡ In other words, bars belonging to a particular dataset category are represented as isolated bars.§ The same is not true of multiple symbol charts or multiple line graphs. In line graphs, each dataset is depicted as one line, which corresponds to a Gestalt by itself, since the various points belonging to the same dataset are fully connected on each line. In symbol charts, although the various symbols are not fully connected, symbols belonging to the same dataset are distinct from those belonging to other datasets.
Moreover, these symbols normally form a chain pattern which the human eye can automatically link. In contrast, differently coded bars, which are similar in shape to each other, are often used to depict different datasets. This explains why, with bars, increasing the number of datasets depicted resulted in a lot more time and effort to read and understand than with the other formats. An important implication of these results is that the use of bars should be limited to representing single datasets. When multiple datasets must be represented, it is advisable to use either symbols or lines. The choice between using multiple lines or symbols appears to depend on the anchoring characteristics of the tasks at hand. In this experiment, since there was a strong abscissa anchoring for all tasks, multiple symbols proved to be faster to use than multiple lines. Hence, the difference on time between using multiple symbols and singular symbols was not found to be significant, but that between using multiple lines and a singular line was (table 6.8).

† Essentially, this means that relatively higher effects are found for the same level of one factor across all levels of the other factor in a two-way interaction. In other words, the lines representing the interaction do not cross each other.
‡ Illustrations of how multiple bar charts were coded in this research are given in appendices B, C, and D.
§ For the purpose of future reference, the term "categorical isolation" of multiple bars will be used to describe this situation.

Figure 6.4: Plot and Mean Values of Graph Format x Dataset Interaction (Experiment 1, Session 2)
Dependent Variable: Time Performance

Graph Format    Bars         Symbols      Lines
1 Dataset       1.77 (*1)    1.86 (*3)    1.93 (*5)
3 Datasets      2.58 (*2)    2.19 (*4)    2.43 (*6)

* Numbering in cells corresponds to treatment combinations for the Dunn-Bonferroni method of multiple means comparison.

Table 6.8: Summary of Dunn-Bonferroni Results for Graph Format x Dataset Interaction (Experiment 1, Session 2)
Dependent Variable: Time Performance

a. Significant differences among means above the α = 0.01 level:
   1) (*1,*2)
b. Significant differences among means at the α = 0.01 level:
   1) (*2,*4)
   2) (*5,*6)
c. No other contrasts of interest were significant. Note that the above numbering of treatment combinations corresponds to those marked in the table of means given in figure 6.4.

c. Three-way Interactions on Time for Session 2

The ANOVA results for session 2 produced one significant (p < .01) three-way interaction (see table 6.3): the Question Type x Time Period x Dataset effect. The data table for this interaction is given in table 6.9, and results of the Dunn-Bonferroni tests on those contrasts that are of interest in session 2† are displayed in table 6.10. An examination of the Dunn-Bonferroni test results revealed a significant and consistent increase in latency time for performing the same tasks (i.e. Q1, Q2, and Q3) at higher levels of the time period and dataset category variables. In other words, performing Q1, Q2, and Q3 on plots with only 1 dataset and 7 periods took significantly less time than performing these tasks on plots with 3 datasets and 14 periods, as shown in table 6.10. Even so, performance of Q2 was adversely affected by increasing datasets on 7-period plots, whereas performance of Q3 was adversely affected by increasing datasets on 14-period plots.
In addition, it was also found that only with 14-period single dataset plots did the task activities associated with Q1 take significantly (p < .01) longer than those associated with Q3. No other meaningful comparisons were significant. In summary, these results were consistent with the earlier findings that increasing the number of time periods and datasets adversely affected time performance with graphics displays, although the extent of these adverse effects depended largely on the type of task activities and on whether time periods or datasets were increased.

† The cell means are marked out in table 6.9 with a number to correspond to the Dunn results on contrasts reported in table 6.10.

Table 6.9: Data Table of Question Type x Time Period x Dataset Interaction (Experiment 1, Session 2)
Dependent Variable: Time Performance

                      7 Time Periods                  14 Time Periods
Question Type    One Dataset    Three Datasets    One Dataset    Three Datasets
Q1 (Session 2)   1.88 s (*1)    2.27 s (*2)       2.39 s (*3)    2.84 s (*4)
Q2 (Session 2)   1.55 s (*5)    2.32 s (*6)       1.91 s (*7)    2.40 s (*8)
Q3 (Session 2)   1.79 s (*9)    2.03 s (*10)      1.58 s (*11)   2.51 s (*12)

* Numbering in cells corresponds to treatment combinations for the Dunn-Bonferroni method of multiple means comparison.

Table 6.10: Summary of Dunn-Bonferroni Tests for the Question Type x Time Period x Dataset Interaction (Experiment 1, Session 2)
Dependent Variable: Time Performance

1. Significant differences among means at the α = 0.01 level:
   a. (*1,*4)
   b. (*3,*11)
   c. (*5,*8)
   d. (*11,*12)
2. Significant differences among means at the α = 0.05 level:
   a. (*5,*6)
   b. (*9,*12)
3. No other contrasts of interest were significant.
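The ordinal versus disordinal distinction drawn for the two-way interactions above can be checked mechanically from the cell means: an interaction is (at least weakly) ordinal when the rank order of one factor's levels is the same at every level of the other factor, so the profile lines never cross. A sketch using the session 2 means reported in figures 6.2 and 6.3:

```python
def interaction_is_ordinal(cell_means):
    """cell_means[a][b] is the mean for level a of factor A and level
    b of factor B.  Returns True when every row induces the same rank
    order over factor B's levels (profile lines do not cross)."""
    def order(row):
        return tuple(sorted(range(len(row)), key=row.__getitem__))
    return len({order(row) for row in cell_means}) == 1

# Question Type x Time Period means, session 2 (figure 6.3):
# rows = 7 and 14 periods; columns = Q1, Q2, Q3.
q_by_t = [[2.07, 1.94, 1.91],
          [2.62, 2.16, 2.05]]
# Graph Format x Question Type means, session 2 (figure 6.2):
# rows = Bars, Symbols, Lines; columns = Q1, Q2, Q3.
g_by_q = [[2.26, 2.08, 2.17],
          [2.05, 1.97, 2.04],
          [2.72, 2.08, 1.73]]

print(interaction_is_ordinal(q_by_t))  # True: ordinal
print(interaction_is_ordinal(g_by_q))  # False: disordinal
```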
Note that the above numbering of treatment combinations corresponds to those marked in the table of means given in table 6.9.

Finally, the results of this interaction indicated that the finding of Q1 taking longer to perform than Q3 in this experiment was due solely to the case of single dataset plots with 14 periods. In other words, more time periods on single dataset plots facilitated trend perception. But with graphics that were highly complex, difficulties were found with the performance of all tasks.

C. ACCURACY PERFORMANCE FOR COMBINED SESSIONS

In this experiment, since subjects whose accuracy fell below the 80% mark and whose time-accuracy scores were significantly correlated had been removed from all of the experimental datasets, it was expected that few, if any, accuracy effects would be of significant interest. Moreover, as the emphasis of theory testing in this research was placed primarily on time (i.e. the Kosslyn-Pinker theory), as discussed in the other chapters, it seemed reasonable not to be overly concerned with accuracy scores. Indeed, an analysis of the accuracy scores for the original dataset collected in session 2 alone revealed only one significant three-factor interaction (table 6.11). The information available from the analysis of the session 2 dataset alone would, therefore, be insufficient to provide any useful insights. In fact, when multiple comparisons of means were performed for all main and two-factor combinations for the session 2 dataset alone, no significant effects at the nominal level were uncovered. This confirmed the relatively low informational value of the accuracy scores as compared to the time scores for session 2 of this experiment. The alternative of performing the data analysis for accuracy on the combined dataset, with more emphasis placed on session 2 scores, was thus adopted.
Furthermore, to minimize departure from the normality assumption caused by the binary nature of the accuracy performance scores as originally captured, these scores were transformed by the following coding scheme. Each correct initial attempt in session 2 was reassigned a score ("2") which doubled the score ("1") assigned to each correct initial attempt in session 1. Errors committed during initial attempts, regardless of the session in which they were committed, were assigned "0"s. In this way, the final recoded data would take scores of 0, 1, 2, or 3 instead of the original "0"s and "1"s. This new coding scheme yielded a much more nearly normal distribution of the original data.

It seemed important also to consider whether the statistical model to be used should include the GEFT measure; in other words, whether to use an ANCOVA or an ANOVA model. Accordingly, the GEFT measure was correlated with the transformed accuracy scores of individual subjects, with all of the identified outliers excluded. The result was a highly significant correlation (R = .71, P(R) = 97E-6, Mean Accuracy = 0.94, Mean GEFT = 14.05). This indicated that accuracy performance was largely explained by individual differences: subjects who were more field-independent performed more accurately than those who were more field-dependent. To verify this high degree of correlation between the GEFT measure and accuracy performance, an ANCOVA procedure with the GEFT measure as the covariate was used to analyze the datasets for sessions 1 and 2 as well as the set of transformed scores. Results of the ANCOVA procedure indicated a highly significant GEFT effect at the α = 0.01 criterion for the session 2 dataset as well as for the combined (transformed) dataset, as shown in table 6.11.
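The 0-to-3 recoding described above amounts to a weighted sum of the two binary session scores, with session 2 correctness counting double. A minimal sketch of the scheme (the function name is hypothetical, not from the thesis):

```python
def recode_accuracy(session1_correct, session2_correct):
    """Combine two binary accuracy flags into a single 0-3 score.

    A correct initial attempt in session 2 is worth 2 points, one in
    session 1 is worth 1 point, and errors are worth 0, as in the
    transformation described above.
    """
    return (1 if session1_correct else 0) + (2 if session2_correct else 0)

# The four possible outcomes for one item:
scores = [recode_accuracy(s1, s2) for s2 in (0, 1) for s1 in (0, 1)]
# -> [0, 1, 2, 3]
```

Summing the two sessions this way spreads the originally binary scores over four levels, which is what makes the recoded distribution closer to normal.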
Table 6.11: Comparison of ANCOVA Results for Sessions 1, 2, and the Transformed Dataset (Experiment 1, Outliers Excluded)
Dependent Variable: Accuracy Performance

Sources of Variance   p-values (Combined)   p-values (Session 1)   p-values (Session 2)
GEFT                  0.0004**              0.3326                 0.0008**
G: Graph Format       0.0025**              0.0059**               0.1240
Q: Question Type      0.0986                0.3028                 0.1548
T: Time Period        0.4458                0.2731                 0.8037
D: Dataset            0.0481*               0.0004**               0.8409
G*Q                   0.2176                0.1597                 0.1765
G*T                   0.2905                0.0400*                0.2882
G*D                   0.8459                0.5918                 0.9492
Q*D                   0.1592                0.3603                 0.3746
Q*T                   0.0159*               0.0244*                0.0830
T*D                   0.1099                0.1419                 0.4120
G*T*D                 0.0131*               0.2114                 0.0190*
G*Q*T                 0.1817                0.2772                 0.5488
G*Q*D                 0.7850                0.8070                 0.7735
Q*T*D                 0.1040                0.1806                 0.2474
G*Q*T*D               0.2454                0.9832                 0.2566

* Significant at p = 0.05 level
** Significant at p = 0.01 level

Apart from the GEFT effect, the following significant effects were found for accuracy:
1. Graph Format effect
2. Dataset effect
3. Question Type by Time Period effect
4. Graph Format by Time Period by Dataset effect

On the basis of the highly significant GEFT variable (F = 18.69; P(F) = 0.0004) found in this experiment (E1) for explaining accuracy performance, as well as the relatively small number of significant effects found for accuracy as compared to time, the discussion of accuracy scores will focus only on those significant effects that are of interest. Higher-order interactions, which are normally very difficult to interpret properly, are not discussed.

1. Main Effects on Accuracy for Transformed Data

1. Graph Format -- Analysis of the data showed that higher percentages of correct responses were obtained when subjects were given bar charts (95%) or symbol graphs (96%) than when they were given line graphs (92%). This finding indicated that line graphs were harder to use than the others.
This should not be surprising given that the time results for this experiment indicated Q1 to be more difficult to answer than the other tasks; the two results appeared complementary rather than conflicting.

2. Dataset -- It was found that increasing datasets adversely affected accuracy. The percentage of correct responses was 95% when only single dataset plots were used but dropped to 93% when multiple dataset (3 datasets) plots were used. Again, this finding is consistent with the results on time performance, where it was found that more datasets led to more time and effort. As usual, the complexity of graphics is greater with more datasets.

2. Two-way Interactions on Accuracy for Transformed Data

1. Question Type x Time Period -- Analysis of this significant interaction revealed that although increasing time periods adversely affected accuracy performance for task activities associated with Q1, accuracy for Q2 and Q3 actually improved slightly.† The underlying cause of this phenomenon lies perhaps in the fact that with more time periods (14 periods) depicted on a graph, datapoints belonging to the same category are brought closer together. This facilitated, rather than obstructed, a more accurate perception of trends (Q3) and level differences (Q2). Table 6.12 shows the mean values for this interaction. Application of the Dunn-Bonferroni tests to the transformed data, however, failed to reveal any differences significant at the α = 0.05 criterion. The differences among the treatment combinations with respect to accuracy performance were thus weaker than expected. The overall results were nonetheless consistent with the findings on time performance reported earlier.
Together, the significant effects indicated that more time periods depicted on the abscissa of the time series graphs used in this experiment led to an increase in latency time and lower accuracy only for the extraction of DV scale-values (Q1), but had little or no adverse effect on the identification of relative levels (Q2) or of a trend (Q3). Conversely, more time periods appeared to yield faster as well as more accurate reading of trends, although these improvements in time and accuracy were not statistically significant.‡

† This improvement was, however, not statistically significant (table 6.12).
‡ Refer to table 6.9 and compare treatment combination (*9) with (*11); refer also to table 6.12 and compare treatment combination (*5) with (*6).

Table 6.12: Mean Values of the Question Type x Time Period Interaction for Transformed Data (Experiment 1, Outliers Excluded)
Dependent Variable: Accuracy Performance

               Q1     Q2     Q3
7 Periods      97%    92%    95%
14 Periods     91%    94%    97%

The application of the Dunn-Bonferroni multiple-comparison method failed to identify any statistically significant key contrasts.

D. SUMMARY OF EXPERIMENT E1 RESULTS

One unexpected finding of this experiment was that trend reading tasks (Q3) were performed faster than scale-value reading tasks (Q1). Although this unexpected result appears to contradict the hypothesis that no one form of information to be extracted is uniformly easy or difficult, it should be remembered that only time series graphics were compared and contrasted in this experiment. Indeed, had tables been included as an alternative form of representation among the stimuli used, the expected result might have been realized.
In general, this finding sheds some light on the effectiveness of graphics for reading trends as compared to exact values (i.e. when DV scale-values are to be extracted and compared, as in Q1). In spite of this, the results of this experiment provided clear evidence that no one graph format is superior to the others for all situations. In fact, the key finding of this study is that the ease or difficulty of using a particular graph format depends strongly on the characteristics of the task activities to be performed. Hence, significantly less time was required to extract single scale-values (Q1) from bars or symbols than from lines. In a similar vein, lines were found to be faster for reading trends (Q3) or level differences (Q2) than for reading scale-values (Q1). No significant differences were found among the various graph formats for extracting level difference information (Q2), although there was some evidence of similarity between Q2 and Q3 in this experiment.

In addition, the results of this experiment confirmed that the complexity of graphics increases with the number of time periods depicted as well as with the number of dataset categories. As for the interaction of complexity factors with tasks, it was found that more time periods exerted a greater adverse effect on the performance of certain task activities (i.e. Q1) but had virtually no effect on others (e.g. Q2 and Q3). These claims were supported by evidence drawn from the results for time as well as for accuracy.

In general, multiple bar representations were found to yield significantly greater latency times than multiple symbol representations. More importantly, increasing datasets was found to affect bars more adversely than the other formats.
Finally, the highly significant GEFT measure as a covariate in the ANCOVA model for analyzing accuracy performance in this experiment, together with the significant positive correlation found between the GEFT measure and the accuracy performance measure, pointed to individual characteristics as a key determinant of accuracy in this experiment. That is, field-independents were found to perform more accurately than their counterparts (field-dependents).

In summary, there appears to be converging evidence supporting the hypotheses drawn in chapters 3 and 4. For example, lines yielded the worst accuracy performance in this experiment because most errors were committed on questions which required subjects to read scale-values of single points on lines (see also Appendix I). Lines have characteristics of low axes anchoring, and thus Q1 was particularly difficult to answer using lines as compared to the other graph formats.

VII. RESULTS: EXPERIMENT 2

This chapter covers the results for experiment E2. The subject population comprised thirteen second-year commerce undergraduates and eleven first-year MBA students. Seventeen of these candidates were males and seven were females. Their average age was 25.21 years.

As a starting point for the discussion, it may be appropriate to review briefly the major characteristics of the task activities investigated in E1 and E2.† In E1, all questions or tasks begin with given PIV attribute information (i.e. time period information) anchored on the abscissa framework and work towards uncovering DV attribute information (i.e. scale-value, level difference, and trend information).‡ In E1, Q1 alone has strong anchoring of information on both the x-axis and y-axis, while Q2 and Q3 have strong anchoring of information only on the x-axis.
Although the questions or tasks examined in E2 also have characteristics of strong anchoring of information on the abscissa framework, in contrast to E1 tasks they all begin with some known characteristics of DV attribute information and work towards uncovering PIV attribute information (i.e. time period information). Yet it is interesting to note that while the tasks examined in E1 and E2 differ in terms of the information specified in the question and the information requested in the answer, the classification of E1 and E2 tasks based on the anchoring concept is, nonetheless, identical.

* The layout of this chapter is similar to chapter 6, as both are concerned with presenting experimental results for the individual studies. Accordingly, key issues of statistical analysis are discussed at the beginning of this chapter. Again, emphasis is placed on session 2 results for time if a significant Session effect is found. Results on accuracy for this experiment are also based on the combined session dataset, following the reasoning discussed in earlier chapters.
† A detailed discussion of the differences in the tasks examined among the experiments was presented in chapter 3. Thus, only a brief note on the chief characteristics of the task activities performed in E1 and E2 is given here.
‡ See tables 3.4, 3.5, and 3.8.

A. TIME PERFORMANCE FOR COMBINED SESSIONS

First, average time performance for E2 was about 6.14 seconds. This was a substantial increase from the average time performance found for E1. Consistent with expectation and the reasoning provided in chapters 3 and 4, this result confirmed that the task activities tested in E2 were apparently more complex than those evaluated in E1. Results of the initial ANCOVA procedure on the full dataset, excluding the single outlier identified in table 5.1, are shown in table 7.1.

1. The Session Effect

A highly significant (F = 63.98, p < .01) Session effect was found for E2.
Time performance during session 1 averaged 7 seconds but dropped to only about 5 seconds during session 2. As before, this was interpreted as indicative of an ongoing adjustment or learning process between sessions. Therefore, separate analyses of the session 1 and session 2 datasets were conducted. Significant interactions between the Session variable and other factors were also found in the analysis (see table 7.1); these other factors included the graph format, question type, and dataset variables. Analysis of these significant interactions revealed, however, that they were strictly ordinal. In other words, even though these interactions were significant, the strong Session effect, which was indicative of learning, could still be generalized across all levels of the classification factors included in the initial statistical model.

2. The GEFT Measure

The relationship between the GEFT measure and the time performance scores for E2 was not significant: results of the analysis (table 7.1) indicated a clear independence of the GEFT measure from the time performance measure (F = 3.45, p > .05). Moreover, when the GEFT measure was correlated with the corresponding time scores for all subjects combined over the full 72 treatment combinations, the result was also not statistically significant (R = -.1390, P(R) = .5023, Mean Time = 6.14 s, Mean GEFT = 15.63). In addition, correlations of corresponding scores between and within sessions for these relationships were not significant. The GEFT scores were therefore excluded from the final statistical model used for analyzing the E2 datasets.

3. Additional Outliers

The same ground rules used for detecting additional outliers in E1 were used for E2. That is, these outliers were identified based on the extent of individual time-accuracy correlations for scores captured during session 2 only.
Table 7.2 presents the results of this analysis, with separate indications for correlations that were highly as well as marginally significant.

4. The Power Analysis

Table 7.2 indicates that two of the eight additional outliers identified for this experiment exhibited only a marginally significant time-accuracy tradeoff at the α = 0.05 criterion. To assess the impact of their presence on the findings, statistical analyses of the data including and excluding these marginally significant outliers (subjects 05 and 06; see table 7.2) were run separately. The results revealed negligible effects on the number or type of factors and their combinations found to significantly affect time performance; in other words, the same significant effects and interactions were found with or without them included in the data.

Table 7.1: Initial ANCOVA Results for Full Dataset (Experiment 2)
Dependent Variable: Time Performance

Sources of Variance    F        Conventional p-values    Greenhouse-Geisser Prob.
GEFT                   3.45     0.0772
S: Session             63.98    0.0000**
G: Graph Format        5.53     0.0069**                 0.0124*
Q: Question Type       4.03     0.0236*                  0.0241*
T: Time Period         83.57    0.0000**
D: Dataset             100.10   0.0000**
S*G                    12.87    0.0000**                 0.0000**
S*Q                    14.95    0.0000**                 0.0000**
S*D                    11.24    0.0029**
G*Q                    8.40     0.0000**                 0.0001**
G*T                    3.62     0.0350*                  0.0379*
G*D                    22.65    0.0000**                 0.0**
Q*T                    1.05     0.3535                   0.3467
Q*D                    6.47     0.0034**                 0.0092**
T*D                    33.81    0.0000**
G*Q*T                  3.12     0.0183*                  0.0285*
G*Q*D                  0.33     0.5105                   0.4865
G*T*D                  4.26     0.0203*                  0.0263*
Q*T*D                  22.64    0.0000**                 0.0**
G*Q*T*D                2.99     0.0231*                  0.0451*

* Significant at p = 0.05 level
** Significant at p = 0.01 level

Table 7.2: Time-Accuracy Correlations for Identifying Additional Outliers (Experiment 2, Session 2)
Dependent Variable: Time Performance Scores

E2 Subject   Correlation R   Probability P(R)   Sample Size
01           (perfect accuracy -- not computable)
02           .5653           15E-5**            36
03           .5863           71E-6**            36
04           .1534           .3604              36
05           .3210           .0490*             36
06           .3200           .0497*             36
07           .1107           .5109              36
08           .4700           .0025**            36
09           .5492           26E-5**            36
10           .3499           .0376*             36
11           .0132           .9376              36
12           .6591           29E-7**            36
13           .1998           .2309              36
14           (perfect accuracy -- not computable)
15           .0746           .6583              36
16           -.0138          .9349              36
17           -.1247          .4582              36
18           -.0610          .7173              36
19           .0472           .7799              36
20           -.0119          .9438              36
21           (perfect accuracy -- not computable)
23           .0078           .9634              36
24           (perfect accuracy -- not computable)

* Significant at p = 0.05 level
** Significant at p = 0.01 level

Yet the difference between six and eight additional outliers had a much greater impact on the power values of the various F-tests. Consequently, the choice was made to include in the data analysis for time performance the two subjects whose time-accuracy correlations were only at the threshold of significance. Moreover, a check on the time-accuracy tradeoff for all subjects subsequently included in the analysis confirmed the overall absence of any significant accuracy-time correlations. The retention of the two marginally significant outliers thus helped to maintain power values above the conventionally acceptable .80 benchmark for most of the F-tests conducted. This justified limiting the remainder of the discussion on time performance to null hypotheses that were rejected.

B.
TIME PERFORMANCE FOR SEPARATE SESSIONS

As indicated in the previous chapter, the rationale for excluding the same set of outliers from all datasets (i.e. the combined dataset and the session 1 and 2 datasets) before further analysis was to ensure comparability of the resulting analyses. Table 7.3 presents all of the main and interaction effects found to be significant for the various datasets. A table of mean values for time performance scores combined across all the initial 36 treatment combinations for sessions 1 and 2 is presented in table 7.4. Note that each cell in the table shows a marked improvement in performance in session 2 as compared to session 1.

Table 7.3: Comparison of ANOVA Results Among Sessions (Experiment 2, Additional Outliers Excluded)
Dependent Variable: Time Performance

Sources of Variance   p-values (Combined)   p-values (Session 1)   p-values (Session 2)
G: Graph Format       0.0037**              0.0000**               0.6385
Q: Question Type      0.1442                0.0313*                0.3743
T: Time Period        0.0000**              0.0001**               0.0000**
D: Dataset            0.0000**              0.0000**               0.0000**
G*Q                   0.0086**              0.0090**               0.1637
Q*T                   0.7388                0.5670                 0.9794
G*T                   0.0875                0.0475*                0.6253
G*D                   0.0000**              0.0000**               0.0067**
Q*D                   0.0622                0.0539                 0.4101
T*D                   0.0000**              0.0001**               0.0003**
G*Q*T                 0.1442                0.5527                 0.0312*
G*T*D                 0.0287*               0.1294                 0.0940
G*Q*D                 0.3333                0.3380                 0.6923
Q*T*D                 0.0000**              0.0003**               0.0038**
G*Q*T*D               0.0053**              0.0163*                0.1001

* Significant at p = 0.05 level
** Significant at p = 0.01 level

Table 7.4: Table of Means for All Treatment Combinations (Experiment 2, Additional Outliers Excluded)
Dependent Variable: Time Performance

                     Bars                   Symbols                Lines
                 Q1     Q2     Q3       Q1     Q2     Q3       Q1     Q2     Q3
a  Session 1     3.43   6.70   5.30     4.83   5.15   8.03     5.21   6.07   4.84
   Session 2     2.67   3.09   4.14     3.20   3.66   4.10     3.83   3.42   3.38
b  Session 1     4.56   4.45   5.42     6.40   4.77   4.67     6.58   5.54   3.96
   Session 2     3.37   3.05   3.41     4.54   3.13   4.00     4.57   2.73   3.29
c  Session 1     6.88   6.98   7.91     6.57   6.13   5.42     5.28   5.49   5.59
   Session 2     5.07   3.56   4.70     4.65   4.30   3.68     4.28   4.94   4.57
d  Session 1     8.31   14.5   12.7     6.55   8.59   9.18     6.68   8.98   9.27
   Session 2     5.71   7.61   7.50     4.69   5.96   6.11     6.91   6.16   6.04

Treatment combinations of information complexity:
a: 1 dataset with 7 time periods
b: 1 dataset with 14 time periods
c: 3 datasets with 7 time periods
d: 3 datasets with 14 time periods

1. Significant Effects on Time for Session 1

As observed from table 7.3, the following main and interaction effects were found to be significant for time during session 1:

1. Graph Format effect
2. Question Type effect
3. Time Period effect
4. Dataset effect
5. Graph Format by Question Type interaction
6. Graph Format by Time Period interaction
7. Graph Format by Dataset interaction
8. Time Period by Dataset interaction
9. Question Type by Time Period by Dataset interaction
10. Graph Format by Question Type by Time Period by Dataset interaction

Among these, only the significant Graph Format by Question Type interaction for session 1 will be discussed, as it is of major interest in this research and the findings will be applicable to occasional or first-time graph users for the kinds of task activities investigated in this experiment.
Figure 7.1 shows the plot for this 2-way interaction and the corresponding mean values for the respective treatment cells. Table 7.5 summarizes the results of the Dunn-Bonferroni tests on key contrasts for the interaction. Examination of figure 7.1 and table 7.5 indicated that the significance of this two-factor interaction rested on one particular treatment combination: the most time consuming combination, Bar Graph and Q2. Accordingly, it was found that, with bars, performing Q2 took significantly longer than Q1 (figure 7.1). Similarly, Q3 took longer than Q1, but that difference was not statistically significant among the contrasts of interest examined by the Dunn-Bonferroni tests at the α = .05 level (see table 7.5).

That subjects took longer to perform Q2 than Q1 with bars in this experiment was due primarily to the different characteristics of the respective tasks. Since Q1 was concerned with extracting the single time period whose scale-value was closest to another given value, whereas Q2 was concerned with extracting the two consecutive time periods showing the largest level difference, this result was expected: Q2 involved numerous pairwise comparisons among bars, whereas Q1 involved the isolation of a single bar. The latter task exploited the characteristics of bars better than the former; that is, bars were more easily processed individually than pairwise because of their discrete, isolated character. Further, the anchoring concept discussed in chapter 3 makes it seem logical that bars, rather than lines or symbols, should require longer time for performing Q2 as compared to Q1.
This is because the extraction of level relationships between consecutive pairs of time periods of the same dataset (Q2) required not only a strong x-axis anchoring characteristic on the part of the graph format being used but a substantial dataset anchoring characteristic as well. Because bars rated worst relative to symbols or lines on their dataset anchoring characteristics, it appeared reasonable that Q2 should take longer than Q1 for bars but not for lines or symbols.

Although it appeared logical to find bars requiring longer time for Q2 (or even Q3) than for Q1, one wonders why lines did not show a strong (i.e. statistically significant) difference in their expected disadvantage relative to the other formats when used to answer Q1 in this experiment.† A reasonable explanation of this phenomenon is that Q1 in E1 was more difficult for lines than Q1 in E2 because the abscissa scale, from which the value was to be extracted for Q1 in E2, was undoubtedly more discrete, and thus easier to read, than the ordinate scale, from which the value was to be extracted for Q1 in E1. The type of scale thus becomes a facilitating factor for disembedding points on lines. Future research may include control of the scaling factor (e.g. DeSanctis & Jarvenpaa, 1985) when evaluating graphical presentations. Note that most other key contrasts were in the expected direction although not statistically significant at the nominal level (cf. figure 7.1 with table 7.5).

† Note that the E1 results clearly showed lines to be the worst format for performing Q1, compared to the other graph formats.
Figure 7.1: Plot and Mean Values of Graph Format x Question Type Interaction (Experiment 2, Session 1)
Dependent Variable: Time Performance
[Figure: plot of mean time by graph format, with question type as the legend.]

Question Type   Bars         Symbols      Lines
Q1              5.80 (*1)    6.09 (*4)    5.94 (*7)
Q2              8.16 (*2)    6.16 (*5)    6.52 (*8)
Q3              7.84 (*3)    6.82 (*6)    5.91 (*9)

* Numbering in cells corresponds to treatment combinations for the Dunn-Bonferroni method of multiple means comparison.

Table 7.5: Summary of Dunn-Bonferroni Results for Graph Format x Question Type Interaction (Experiment 2, Session 1)
Dependent Variable: Time Performance

1. Significant differences among means at the α = 0.05 level: (*1,*2)
2. No other key contrasts were found to be significant.

Note: The numbering of treatment combinations refers to the marked cells in the table of means given in figure 7.1.

2. Significant Effects on Time for Session 2

For session 2, the following significant main and interaction effects were found (table 7.3):

1. Time Period effect
2. Dataset effect
3. Graph Format by Dataset interaction
4. Time Period by Dataset interaction
5. Graph Format by Question Type by Time Period interaction
6. Question Type by Time Period by Dataset interaction

Overall, fewer significant effects were found in session 2 than in session 1. Moreover, the effects that were significant in session 2 were generally also highly significant (p < .01) in session 1.† This observation indicated that the significant effects found in session 2 represented the more permanent effects that would be expected regardless of the amount of training or learning previously undertaken by participants.

a. Main Factor Effects on Time for Session 2

1.
Time Period -- This highly significant (p < .01) effect indicated that the complexity of graphics increased with more time periods depicted. The average time taken with a 7-period plot was 4 seconds, in contrast to about 5 seconds with a 14-period plot.

2. Dataset -- This main effect was also highly significant (p < .01). Graphics with multiple datasets took longer to process (about 5.4 seconds) than those with only single datasets (about 3.5 seconds).

Although findings on the time period and dataset variables might be regarded as of lesser interest than those on graph format and question type, these factors were nonetheless included because whether, and if so how, they interact with the more crucial variables (i.e. graph format and question type) would provide additional insights into the use and design of time-series graphics.

† The exception here was the Graph Format by Question Type by Time Period interaction.

b. Two-way Interactions on Time for Session 2

1. Graph Format x Dataset -- This 2-way interaction was highly significant (p < .01) as well as strictly ordinal. The plot and mean value table for the interaction are provided in figure 7.2, and table 7.6 shows the Dunn-Bonferroni results for the contrasts of key interest. Analysis of this interaction revealed that the use of multiple dataset representations, regardless of graph format, required additional time and effort as compared to the use of single dataset representations (see figure 7.2). Results also showed that the different graph formats incurred different degrees of additional time and effort when multiple versus single dataset representations were compared (see table 7.6).
For instance, with an increasing number of dataset categories, bars yielded a significantly greater increase in latency than did lines or symbols when single dataset graphs were compared with multiple dataset graphs. The underlying reason for this important interaction would be what has been termed the "categorical isolation" effect of multiple bars, described in chapter 6 (refer to the same interaction effect found significant in session 2 of E1). Briefly stated, the separation of bars belonging to the same dataset category in multiple bar graphs tended to make them difficult to read. The absence of such an effect for symbols and lines made them less vulnerable, and thus the resulting adverse effect on time was less significant for these representations than for bars.

2. Time Period x Dataset -- This was another highly significant (p < .01) interaction found during session 2, plotted in figure 7.3. A table of mean values for the interaction is provided along with the figure, and the Dunn-Bonferroni results for key contrasts are summarized in table 7.7. More time was generally required as more time periods and more datasets were depicted (table 7.7). This was consistent with the findings on the main effects, namely the time period and dataset factors. Note that an interesting parallel could be drawn between these complexity factors and the factors associated with additional rows and columns in tabular representations.

Figure 7.2: Plot and Mean Values of Graph Format x Dataset Interaction (Experiment 2, Session 2)
Dependent Variable: Time Performance
[Figure: plot of mean time by graph format, with dataset category as the legend.]

                Bars         Symbols      Lines
1 Dataset       3.29 (*1)    3.78 (*3)    3.52 (*5)
3 Datasets      5.69 (*2)    4.90 (*4)    5.49 (*6)

* Numbering in cells corresponds to treatment combinations for the Dunn-Bonferroni method of multiple means comparison.

Table 7.6: Summary of Dunn-Bonferroni Results for Graph Format x Dataset Interaction (Experiment 2, Session 2)
Dependent Variable: Time Performance

1. Significant differences among means at the α = 0.05 level:
   a. (*3,*4)
   b. (*5,*6)
2. Significant differences among means at the α = 0.01 level:
   a. (*1,*2)
3. No other key contrasts were found to be significant.

Note that the above numbering of treatment combinations corresponds to the marked cells in the table of means given in figure 7.2.

Figure 7.3: Plot and Mean Values of Time Period x Dataset Interaction (Experiment 2, Session 2)
Dependent Variable: Time Performance
[Figure: plot of mean time by time period, with dataset category as the legend.]

Time Period     1 Dataset    3 Datasets
7 Periods       3.49 (*1)    4.42 (*2)
14 Periods      3.57 (*3)    6.30 (*4)

* Numbering in cells corresponds to treatment combinations for the Dunn-Bonferroni method of multiple means comparison.

Table 7.7: Summary of Dunn-Bonferroni Results for Time Period x Dataset Interaction (Experiment 2, Session 2)
Dependent Variable: Time Performance

1. Significant differences among means at the α = 0.05 level:
   a. (*1,*2)
2. Significant differences among means at the α = 0.01 level:
   a. (*2,*4)
   b. (*3,*4)
3. No other key contrasts were found to be significant.

Note that the above numbering of treatment combinations corresponds to the marked cells in the table of means given in figure 7.3.

c.
Three-way Interactions on Time for Session 2

Although the session 2 results indicated that two three-factor interactions were statistically significant, the multiple means comparison tests performed on key contrasts underlying the Graph Format by Question Type by Time Period interaction did not yield any significant differences. A table of cell means for this interaction is nonetheless provided for the reader in table 7.8. The other three-factor interaction found to be significant (p < .01) based on the ANOVA result was the Question Type by Time Period by Dataset interaction. Table 7.9 provides the cell means and table 7.10 shows the Dunn-Bonferroni test results for this interaction. Examination of the Dunn-Bonferroni results revealed that for all task activities performed (Q1, Q2, and Q3), significant differences were found with increasing datasets (3 datasets) on 14-period plots but not on 7-period plots. Results also showed a similarity between Q2 and Q3 in the pattern of significant differences found among the various types of graphical designs investigated (see table 7.10); that is, the most complex plots (14 periods, 3 datasets) were significantly harder to use than any of the simpler plots. In fact, with highly complex graphs (3 datasets, 14 periods), performance of all tasks was difficult relative to simple graphics (1 dataset, 7 periods). Finally, there was some evidence to indicate that increasing time periods (14 periods) on a single dataset plot actually facilitated trend and pattern perception, although these differences were not statistically significant.† A possible explanation could be that perception of continuity among points belonging to the same dataset is also highly dependent on how closely these points are placed to one another. This, of course, depends on the number of time periods to be shown on a single plot.

† Compare mean values of treatment combinations (*5) with (*7) and (*9) with (*11) in table 7.9.
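The Dunn-Bonferroni logic behind the contrast tables in this chapter can be illustrated with a minimal sketch, assuming the standard familywise form of the procedure in which m planned contrasts are each judged against α/m; the contrast labels and p-values below are hypothetical and are not thesis data.

```python
# Minimal sketch of the Dunn-Bonferroni (planned contrasts) criterion:
# with m planned pairwise contrasts and familywise level alpha, each
# individual contrast is judged against alpha / m.

def dunn_bonferroni(contrast_pvalues, alpha=0.05):
    """Return the contrasts that remain significant at familywise level alpha.

    contrast_pvalues: dict mapping a contrast label, e.g. ("*1", "*2"),
    to its raw (unadjusted) p-value."""
    m = len(contrast_pvalues)
    per_contrast_alpha = alpha / m          # Bonferroni-adjusted criterion
    return {pair: p for pair, p in contrast_pvalues.items()
            if p < per_contrast_alpha}

# Hypothetical raw p-values for three planned contrasts (not thesis data):
pvals = {("*1", "*2"): 0.001, ("*3", "*4"): 0.012, ("*5", "*6"): 0.030}
significant = dunn_bonferroni(pvals, alpha=0.05)
```

At α = 0.05 with three planned contrasts, each raw p-value is compared against 0.05/3 ≈ 0.0167, so in this hypothetical set the first two contrasts survive and the third does not.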
Table 7.8: Mean Values of Graph Format x Question Type x Time Period Interaction (Experiment 2, Session 2)
Dependent Variable: Time Performance

Question Type        Bars         Symbols       Lines
Q1  7 Periods        3.87 (*1)    3.92 (*7)     4.05 (*13)
    14 Periods       4.54 (*2)    4.61 (*8)     5.74 (*14)
Q2  7 Periods        3.32 (*3)    3.98 (*9)     4.18 (*15)
    14 Periods       5.33 (*4)    4.57 (*10)    4.44 (*16)
Q3  7 Periods        4.42 (*5)    3.89 (*11)    3.98 (*17)
    14 Periods       5.45 (*6)    5.05 (*12)    4.66 (*18)

Table 7.9: Mean Values of Question Type x Time Period x Dataset Interaction (Experiment 2, Session 2)
Dependent Variable: Time Performance

Treatments of Information Complexity Factors

                  7 Time Periods                   14 Time Periods
Question Type     One Dataset    Three Datasets    One Dataset     Three Datasets
Q1 (Session 2)    3.23 s (*1)    4.67 s (*2)       4.16 s (*3)     5.77 s (*4)
Q2 (Session 2)    3.39 s (*5)    4.27 s (*6)       2.99 s (*7)     6.53 s (*8)
Q3 (Session 2)    3.85 s (*9)    4.34 s (*10)      3.56 s (*11)    6.55 s (*12)

* Numbering in cells corresponds to treatment combinations for the Dunn-Bonferroni method of multiple means comparison.

Table 7.10: Summary of Dunn-Bonferroni Results for Question Type x Time Period x Dataset Interaction (Experiment 2, Session 2)
Dependent Variable: Time Performance

1. Significant Differences among Means at the α = 0.05 level:
a. (*3,*4)
2. Significant Differences among Means below the α = 0.01 level:
a. (*1,*4)
b. (*5,*8)
c. (*6,*8)
d. (*7,*8)
e. (*9,*12)
f. (*10,*12)
g. (*11,*12)
3. No other key contrasts were found to be significant.
Note: Numbering of treatment combinations above corresponds to the marked cells in the table of means given in table 7.9.

C. ACCURACY PERFORMANCE FOR COMBINED SESSIONS

In this experiment, correlations of the GEFT scores with the corresponding accuracy scores for the combined as well as the separate sessions (i.e. sessions 1 and 2) did not reveal any significant relationships (table 7.11). Consistent with these results were the similarly non-significant GEFT effects found when the GEFT measure was included as a covariate in the statistical models used to analyze the accuracy scores for the separate session datasets (i.e. sessions 1 and 2) as well as for the transformed (combined) dataset. Table 7.11 shows the resulting significant effects for accuracy performance with respect to the session 1 and 2 datasets as well as the transformed dataset, based on the same rules of data transformation adopted in chapter 6. In this chapter, data analysis of the accuracy scores for session 2 of E2 did not yield any interesting significant effects. The following main factor and two-factor interactions were, however, found to be significant based on the analysis of accuracy scores coded as the combined (transformed) dataset:

1. Time Period effect
2. Graph Format by Question Type interaction
3. Question Type by Dataset interaction

1. Main Effects on Accuracy for Transformed Data

As expected, accuracy was lower with more complex graphics. Results showed that the overall mean accuracy score for the 14-period graphics was 91%, whereas it was 96% for the 7-period graphics. This finding is generally consistent with the results on time performance, in that an increasing number of time periods had always led to longer processing of the graphics stimuli.
Table 7.11: Comparison of ANCOVA Results for Sessions 1, 2, and the Transformed Dataset (Experiment 2, Outliers Excluded)
Dependent Variable: Accuracy Performance

Sources of Variance    p-values (Combined)    p-values (Session 1)    p-values (Session 2)
GEFT                   0.9716                 0.2313                  0.6526
G: Graph Format        0.3287                 0.4648                  0.4441
Q: Question Type       0.0642                 0.0403*                 0.0863
T: Time Period         0.0269*                0.0145*                 0.1857
D: Dataset             0.3952                 0.2826                  0.3050
G*Q                    0.0266*                0.0015**                0.4453
G*T                    0.9136                 0.9209                  0.8512
G*D                    0.1470                 0.0034**                0.4131
Q*D                    0.0308*                0.0059**                0.1814
Q*T                    0.8728                 0.9113                  0.8541
T*D                    0.5397                 0.6145                  0.8264
G*T*D                  0.8433                 0.2731                  0.8040
G*Q*T                  0.1636                 0.0132*                 0.6434
G*Q*D                  0.0738                 0.5668                  0.0508
Q*T*D                  0.0831                 0.0012**                0.6063
G*Q*T*D                0.0958                 0.3988                  0.1900

* Significant at p = 0.05 level
** Significant at p = 0.01 level

2. Two-way Interactions on Accuracy for Transformed Data

1. Graph Format x Question Type -- Table 7.12(a) provides the cell means for this 3 x 3 interaction. No table is provided for the Dunn-Bonferroni results because no key contrasts among means were significant at the generally acceptable criterion. The largest mean difference found in the cross-tabulation of means for this interaction was the lower percentage of correct responses found with line graphs for performing Q1 (88%) as compared to bar charts (97%). Symbols were the best format for answering Q2 and Q3 based on a visual comparison of cell means (table 7.12a). These observations are in agreement with the anchoring concept (chapter 3). For example, bars are expected to be more accurate than lines for comparing scale-values (Q1) because of their strong x-axis anchoring. Lines have low anchoring to the axes, making them unsuitable for reading the specific location of points (i.e. x-value, y-value).

2.
Question Type by Dataset -- Analysis of the accuracy data for this particular interaction revealed that none of the key contrasts among means was statistically significant. Table 7.12(b), however, shows the cell means for this interaction.

Tables 7.12(a,b): Mean Value Tables for Significant Two-factor Interactions for the Transformed Dataset (Experiment 2, Outliers Excluded)
Dependent Variable: Accuracy Performance

(a) Graph Format x Question Type

Question Type    Bars    Symbols    Lines
Q1               97%     91%        88%
Q2               93%     99%        96%
Q3               91%     96%        92%

(b) Question Type x Dataset

              Q1     Q2     Q3
1 Dataset     90%    99%    96%
3 Datasets    94%    93%    90%

D. SUMMARY OF EXPERIMENT 2 RESULTS

The results of E2 generally supported the experimental hypotheses presented in chapters 3 and 4. First, that graph format was not found to be a significant main effect confirmed the expectation that no one form of graph format would be superior overall. Second, the results for the interaction of graph format and dataset were consistent with the finding that, among the graph formats evaluated, bars had the worst "dataset" anchoring. Consequently, a highly significant difference in time was found between using multiple versus single bar representations, whereas a less significant difference was found between using multiple versus singular line graphs. The mean contrast between singular versus multiple symbol graphs was not statistically significant. In addition, the results showed that the information complexity of graphics increased with an increasing number of time periods and datasets.
The three-factor Question Type by Time Period by Dataset interaction also provided some support for the notion that an increasing number of time periods could, in the case of single dataset representations, actually facilitate the performance of Q2 and Q3, owing to the greater proximity among points belonging to the same dataset when more time periods are to be depicted.† Finally, the key finding regarding the task activities studied in this experiment is that when anchoring information on the axes framework formed the main aspect of the task activities to be performed (e.g. Q1),‡ bars usually facilitated but lines inhibited task performance. Conversely, when this information anchoring became slack, as in Q2 and Q3, bars inhibited whereas lines facilitated task performance. In session 1, bars were found to be faster to use for answering Q1 than Q2. For the combined sessions, lines yielded lower accuracy when used to answer Q1 as compared with the other graph formats.

† Compare also tables 6.9 and 7.9 for treatment combinations (*5) with (*7) and (*9) with (*11).
‡ Q1 involved not only anchoring information on the abscissa or x-axis (i.e. time period information) but also anchoring information on the ordinate or y-axis (i.e. scale-value information), as discussed in chapter 3.

In summary, there appears to be converging evidence to support the hypotheses drawn earlier (chapters 3 and 4). Yet the interpretation of the results requires that particular attention be paid to graph format characteristics and the characteristics of the task at hand. Accuracy scores were, however, not found to be highly correlated with the GEFT scores in this experiment.

VIII. RESULTS: EXPERIMENT 3

This chapter discusses the findings for experiment 3. The subject population for this experiment came from the same pool as those of E1 and E2; that is, they were students enrolled in MIS courses at the introductory level.
Among them, twelve were second-year undergraduate students and twelve were first-year MBA students. Nineteen of the twenty-four subjects were males and five were females. Their average age was 24.67 years. As discussed in chapter 3,† the chief characteristic of the task activities tested in E1 and E2 is that of having strong information anchoring on the PIV attribute component. In E1, this anchoring information is specified in the question, whereas in E2, it is required in the answer. The key difference between E1 and E2 tasks, therefore, lies in the class of information to be retrieved. In E1, it comes chiefly from the DV component (i.e. scale-value, level difference, and trend information), whereas in E2, it comes chiefly from the PIV attribute component (i.e. time period information). In contrast, the information to be retrieved for the task activities evaluated in E3 comes chiefly from the SIV attribute component (i.e. dataset information). Specific time period information anchored on the abscissa component is neither provided, as in E1 tasks, nor requested, as in E2 tasks. In other words, the key difference between the tasks examined in the other two experiments (i.e. E1 and E2) and those tested in E3 lies in their respective nature of anchoring information on the abscissa component. Apart from this key difference, E3 also differs from E1 and E2 in that only multiple representations of time-series graphics are used, rather than both singular and multiple dataset representations. The reason lies solely in the nature of the tasks to be examined in E3.

† See tables 3.4, 3.5 and 3.8 in particular.

RESULTS: EXPERIMENT 3 / 190

A. TIME PERFORMANCE FOR COMBINED SESSIONS

The initial ANCOVA model run on the full dataset for E3 used the same repeated measures design and classification factors that were specified for E1 and E2, with a minor exception:†

1. S: Session (Session 1, or Session 2)
2. G: Graph Format (Bars, or Symbols, or Lines)
3. Q: Question Type (Q1, or Q2, or Q3)
4.
T: Time Period Variation (7, or 14 Periods)
5. D: Dataset Category (2, or 3 Datasets)

† I.e., the levels of the dataset category factor.

As usual, the GEFT scores were treated as a covariate.

1. The Session Effect

Analysis of the full dataset for E3, with the exclusion of the initially identified outliers (see table 5.1), yielded a highly significant (F = 19.63, p < .01) Session effect on time (table 8.1). Therefore, the experimental datasets for time captured in sessions 1 and 2 were analyzed separately. This experiment resulted in the longest average time response among the three experiments. Approximately 9.9 seconds was taken for a typical trial in session 1 and about 7.6 seconds in session 2. Thus, the results confirmed the expectation that the tasks designed for E3 were more complex than those for E1 and E2 (see chapters 3 and 4).

2. The GEFT Measure

The initial analysis of the E3 data (table 8.1) revealed a non-significant GEFT effect (F = 0.16; p > .05) for time. This was further substantiated by the equally non-significant correlations of GEFT scores and average time performance scores for individual subjects between and within sessions. For example, the GEFT-time relationships based on the BMDP6D analysis revealed only a low and non-significant correlation for the full dataset (R = -.31; P(R) = .13; Mean Time = 8.58 s; GEFT = 17). Hence, it was clear that the GEFT measure was relatively unimportant in explaining (or predicting) time performance. The GEFT scores were thus excluded from further statistical analyses of the time data.

3. Additional Outliers

As in previous chapters, additional outliers that exhibited a high time-accuracy tradeoff effect were identified based on individual time-accuracy correlations for session 2 only. Individuals indicating a significant or highly significant time-accuracy tradeoff are marked in table 8.2 with an asterisk (*) or a double asterisk (**) respectively.
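The screening rule described above can be sketched as follows. This is a minimal stand-in for the BMDP6D correlation runs actually used in the thesis, and the critical value is an assumption (roughly the two-tailed .05 criterion for n = 36 trials), not the thesis's exact cutoff.

```python
# Sketch of the time-accuracy tradeoff screen: for each subject, correlate
# per-trial response time with accuracy and flag significantly negative
# correlations.  Illustrative only; not the thesis's BMDP6D procedure.
import math

def pearson_r(xs, ys):
    """Plain Pearson correlation; returns None when undefined, e.g. a
    subject with perfect accuracy has zero variance in the accuracy
    scores, matching the NOT COMPUTABLE entries of table 8.2."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)

def tradeoff_outlier(times, accuracies, r_crit=0.33):
    """Flag a subject whose time and accuracy scores correlate negatively
    beyond |r_crit| across trials (an assumed critical value)."""
    r = pearson_r(times, accuracies)
    return r is not None and r < -r_crit
```

A subject whose per-trial accuracy and time move in opposite directions yields a negative r and is flagged; a perfectly accurate subject yields None and is retained, as in the thesis.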
Table 8.2 shows four additional outliers whose time-accuracy correlations might be considered highly undesirable. As such, all data scores belonging to these outliers were discarded from the dataset used for the subsequent analysis and interpretation of the findings for this experiment. Of course, subjects whose time-accuracy correlations could not be computed (table 8.2) because they had achieved perfect accuracy (e.g. subjects 14 and 22) were included in the dataset used for the statistical analyses.

4. The Power Analysis

The removal of all additional outliers together with those already identified (table 5.2) was found to have only a negligible effect on the power values of the various F-tests conducted. In this research, high power values were achieved chiefly because of the use of the repeated measures design (see chapter 5).

Table 8.1: Initial ANCOVA Results for the Full Dataset (Experiment 3)
Dependent Variable: Time Performance

Sources of Variance    F        Conventional p-values    Greenhouse-Geisser Prob.
GEFT                   0.16     0.6924
S: Session             19.63    0.0004**
G: Graph Format        49.43    0.0000**                 0.0**
Q: Question Type       15.61    0.0000**                 0.0001**
T: Time Period         31.09    0.0000**
D: Dataset             31.08    0.0000**
G*Q                    34.34    0.0000**                 0.0**
G*T                    3.20     0.0539                   0.0587
Q*T                    5.90     0.0066**                 0.0086**
G*D                    16.71    0.0000**                 0.0001**
Q*D                    0.28     0.7597                   0.7447
T*D                    7.66     0.0137*
G*Q*T                  1.07     0.3789                   0.3637
G*Q*D                  3.10     0.0215*                  0.0458*
G*T*D                  0.95     0.3990                   0.3947
Q*T*D                  13.53    0.0001**                 0.0002**
G*Q*T*D                3.39     0.0142*                  0.0285*

* Significant at p = 0.05 level
** Significant at p = 0.01 level

Table 8.2: Time-Accuracy Correlations for Identifying Additional Outliers (Experiment 3, Session 2)
Dependent Variable: Time Performance Scores

E3 Subjects (Session 2)    Correlation R    Probability P(R)    Sample Size
01                         .3105            .0575               36
02                         -.0127           .9401               36
03                         .0776            .6455               36
05                         -.1546           .3566               36
06                         -.6105           .28E-6**            36
07                         .1477            .3787               36
09                         -.1263           .4506               36
10                         .0555            .7426               36
11                         -.5072           .94E-5**            36
12                         -.0376           .8237               36
13                         .0837            .6196               36
14                         (PERFECT ACCURACY -- NOT COMPUTABLE)
15                         -.4969           .0013**             36
16                         .0715            .6719               36
17                         .1716            .3051               36
18                         -.0792           .6337               36
19                         .1539            .3536               36
20                         -.3907           .0146*              36
21                         -.1995           .2316               36
22                         (PERFECT ACCURACY -- NOT COMPUTABLE)
23                         -.0086           .9594               36
24                         -.0192           .9098               36

* Significant at p = 0.05 level
** Significant at p = 0.01 level

B. TIME PERFORMANCE FOR SEPARATE SESSIONS

Table 8.3 compares the effects of the main factors and their interactions as analyzed for the combined as well as the separate sessions (i.e. session 1 and session 2). Table 8.4 shows the cell means for the various treatment combinations tested in sessions 1 and 2. As expected, learning, or improvement, was generally evident for each treatment combination (i.e. the same experimental trial) across sessions (i.e. comparing the mean values of session 1 to the corresponding cell means of session 2).

1. Significant Effects on Time for Session 1

For session 1, the following significant effects were found on time (table 8.3):

1. Graph Format effect
2. Question Type effect
3. Time Period effect
4. Dataset effect
5. Graph Format by Question Type interaction
6. Graph Format by Time Period interaction
7. Graph Format by Dataset interaction
8. Question Type by Time Period interaction
9. Question Type by Time Period by Dataset interaction

Among these, only the significant Graph Format by Question Type interaction will be discussed, as it is of key interest for this session (session 1). Figure 8.1 depicts the plot with the corresponding mean value table for this two-way interaction. Table 8.5 shows the results of the multiple means comparison tests produced by BMDP7D on the key contrasts of interest.
Table 8.3: Comparison of ANOVA Results Among Sessions (Experiment 3, Additional Outliers Excluded)
Dependent Variable: Time Performance

Sources of Variance    p-values (Combined)    p-values (Session 1)    p-values (Session 2)
G: Graph Format        0.0**                  0.0**                   0.0**
Q: Question Type       0.0000**               0.0000**                0.0001**
T: Time Period         0.0000**               0.0000**                0.0001**
D: Dataset             0.0000**               0.0000**                0.0001**
G*Q                    0.0**                  0.0**                   0.0000**
G*T                    0.0376*                0.0550*                 0.3180
Q*T                    0.0060**               0.0201*                 0.0194*
G*D                    0.0000**               0.0001**                0.0002**
Q*D                    0.7470                 0.5844                  0.1933
T*D                    0.0074**               0.1297                  0.0044**
G*T*D                  0.3323                 0.5191                  0.2977
G*Q*T                  0.3032                 0.4172                  0.1423
G*Q*D                  0.0206*                0.0384                  0.2254
Q*T*D                  0.0000**               0.0000**                0.0571
G*Q*T*D                0.0228*                0.0568                  0.1604

* Significant at p = 0.05 level
** Significant at p = 0.01 level

Table 8.4: Tables of Means for All Treatment Combinations (Experiment 3, Outliers Excluded)
Dependent Variable: Time Performance

Graphical Information       Bars                  Symbols               Lines
Complexity                  Q1    Q2    Q3        Q1    Q2    Q3        Q1    Q2    Q3
a  Session 1                5.99  9.26  9.68      3.25  6.89  7.32      9.45  3.95  4.06
   Session 2                4.59  7.04  7.57      2.95  4.72  5.28      5.66  2.94  3.27
b  Session 1                5.49  16.7  14.7      4.56  10.9  9.77      8.12  12.5  9.77
   Session 2                5.02  10.9  11.6      4.38  7.56  7.03      5.34  8.38  10.4
c  Session 1                8.75  18.8  13.3      6.12  10.9  9.60      8.19  7.91  7.87
   Session 2                8.37  13.5  10.4      5.61  10.1  8.38      6.96  5.57  5.80
d  Session 1                11.3  20.5  19.8      7.12  10.2  13.1      14.8  6.22  10.3
   Session 2                8.06  13.0  13.2      6.49  10.7  10.2      7.72  6.29  7.67

Treatment Combinations of Information Complexity:
a: 2 Datasets with 7 Time Periods
b: 2 Datasets with 14 Time Periods
c: 3 Datasets with 7 Time Periods
d: 3 Datasets with 14 Time Periods

A study of figure 8.1 and table 8.5 reveals that most of the significant differences found among the key contrasts were between two particular treatment combinations (the Bar-Q2 and Bar-Q3 combinations) and the other treatment combinations. Subjects using multiple bar graphs took significantly longer to extract the dataset to which a pair of consecutive datapoints with the largest level difference (Q2), or a range of successive datapoints with the longest uni-directional trend (Q3), belonged, when compared to using the other forms of multiple representations (i.e. multiple symbol and line graphs). No difference was found between using symbols and lines for performing either Q2 or Q3. With respect to Q1 (i.e. identifying the dataset with a single point that comes closest to a given scale-value), only lines were found to take significantly longer compared to symbols, consistent with the propositions advanced in chapter 3. In addition, bars as well as symbols were found to be faster to use for Q1 than for either Q2 or Q3. No significant differences were found between using bars and lines, or between using bars and symbols, for performing Q1. Altogether, these results showed bars to take longest for performing Q2 and Q3 in this experiment (E3) as compared to the other formats. It is believed that two main reasons contributed to this. First, there was no reference in either question or answer to anchoring 'time period' information on the abscissa for the tasks tested. This lack of a strong x-axis anchoring appeared to make bars less suitable than the other graph formats (i.e.
lines and symbols) for the task activities tested in E3, particularly Q2 and Q3, because for Q1, at least some anchoring information was provided on the ordinate (y-axis) (i.e. scale-values). Second, the use of only multiple dataset representations in this experiment also added to the disadvantage of using bars relative to the other forms of representations. On the one hand, lines were generally faster to use for Q2 and Q3 than bars in this experiment because of their characteristic of point continuity, a characteristic which facilitated the extraction of difference patterns, trends, and other forms of Gestalt information (e.g. 'dataset' information). On the other hand, symbols were faster to use for Q2 and Q3 in this experiment than bars simply because not only are symbols easier to string together than bars, owing to their relatively greater connectedness (see chapter 3), but also, for multiple dataset representations, they look more like multiple line graphs than multiple bar charts. As for Q1, symbols turned out to be a better format to use than lines (or bars)† simply because, in addition to having a reasonable dataset anchoring characteristic (see chapter 3), symbols also possess moderately strong information anchoring characteristics to the axes framework. In other words, their special advantage over lines (as well as bars) lies in their effectively combining the characteristics of both discreteness (as in bars) and apparent continuity (as in lines). In summary, the findings were generally supportive of the notion that for occasional or first-time graph users, it is the task characteristics together with the characteristics of the graphical representation being used that meaningfully determine time performance for the tasks investigated.

2. Significant Effects on Time for Session 2

For session 2, the following significant main effects and two-factor interactions were found:

1. Graph Format effect
2. Question Type effect
3.
Time Period effect
4. Dataset effect
5. Graph Format by Question Type interaction
6. Question Type by Time Period interaction
7. Graph Format by Dataset interaction
8. Time Period by Dataset interaction

No higher-order interactions were found to be significant for time in this experiment.

† The mean difference showing symbols to be better than bars is not, however, statistically significant.

Figure 8.1: Plot and Mean Values of Graph Format x Question Type Interaction (Experiment 3, Session 1)
Dependent Variable: Time Performance

Question Type    Bars          Symbols      Lines
Q1               7.89 (*1)     5.26 (*4)    10.14 (*7)
Q2               16.30 (*2)    9.73 (*5)    7.65 (*8)
Q3               14.37 (*3)    9.94 (*6)    7.99 (*9)

* Numbering in cells corresponds to treatment combinations for the Dunn-Bonferroni method of multiple means comparison.

Table 8.5: Summary of Dunn-Bonferroni Results for Graph Format x Question Type Interaction (Experiment 3, Session 1)
Dependent Variable: Time Performance

1. Significant Differences among Means at the α = 0.01 level:
a. (*1,*2); (*1,*3)
b. (*2,*5); (*2,*8)
c. (*3,*6); (*3,*9)
d. (*4,*5); (*4,*6); (*4,*7)
2. No other key contrasts among means were found to be significant.

Note that the above numbering of treatment combinations corresponds to those marked in the table of means given in figure 8.1.

a. Main Factor Effects on Time for Session 2

1. Graph Format -- This factor was found to be highly significant (p < .01) for session 2. Bars generally took longer to process than lines or symbols. The average time taken for bars was approximately 9.5 seconds, compared to only 7 seconds for symbols and slightly over 6 seconds for lines. The difference in time between using symbols and lines was not statistically significant based on the Dunn-Bonferroni tests.

2.
Question Type -- This factor was highly significant (p < .01) for session 2. Subjects took longer to perform Q2 and Q3 as compared to Q1. The average time for Q2 and Q3 was about equal (8 seconds), whereas that for Q1 was about 6 seconds.

3. Time Period -- The effect of this highly significant (p < .01) factor was consistent with expectation. As the number of time periods depicted along the abscissa of the time-series graphics increased, the time for extracting data increased correspondingly. The average time for using 7-period graphics was close to 6.6 seconds, compared to 8.6 seconds for 14-period graphics.

4. Dataset -- As expected, an increase was found in the latency of reaction times as the number of datasets depicted on a single plot increased. For graphs with 3 datasets, time performance averaged 8.8 seconds, but for graphs with only 2 datasets, 6.4 seconds.

b. Two-way Interactions on Time for Session 2

1. Graph Format x Question Type -- This two-factor interaction was of central focus in the study. The analysis revealed almost identical results for this interaction in session 2 as those found in session 1. Figure 8.2 shows the plot and mean values for this interaction. The Dunn-Bonferroni results for the interaction are shown in table 8.6. Only key contrasts among means were evaluated. A study of figure 8.2 and table 8.6 revealed that, consistent with the session 1 results, subjects using multiple bar graphs took significantly longer to perform Q2 and Q3 of this experiment compared to the other forms of multiple representation. For Q1, no differences between the various formats were significant at the α = 0.05 level (table 8.6). Similar to the session 1 results, bars as well as symbols were also found to be faster to use for Q1 than for either Q2 or Q3. That both bars and symbols, but not lines, were found to be faster to use for Q1 than the other tasks (i.e.
Q2, or Q3) in this experiment is consistent with the notion that when strong anchoring of information exists on some axis component (e.g. Q1 of both E1 and E2 had strong anchoring of information on both axes, but Q1 of this experiment exhibited strong anchoring of information only on the ordinate),† the use of more discrete representations such as bars and symbols would produce better time performance scores than the use of purely continuous representations such as lines. In summary, the key finding for this two-way interaction in this second session, as well as in the first session (discussed earlier), is that the degree of support provided by a particular graph format for a particular task is very much dependent on the matching characteristics of the graph format and the task at hand. In short, bars and symbols were more suited to tasks characterized by strong anchoring of information on either one or both of the major axes (e.g. Q1) than to those in which such an anchoring of information was unavailable (e.g. Q2 and Q3). The characterization of tasks as well as graph formats based on the anchoring concept in chapter 3 was well supported.

† See tables 3.5 and 3.8 regarding the classification of tasks based on the anchoring concept.

Figure 8.2: Plot and Mean Values of Graph Format x Question Type Interaction (Experiment 3, Session 2)
Dependent Variable: Time Performance

Question Type    Bars          Symbols      Lines
Q1               6.51 (*1)     4.86 (*4)    6.42 (*7)
Q2               11.13 (*2)    8.26 (*5)    5.80 (*8)
Q3               10.71 (*3)    7.73 (*6)    6.80 (*9)

* Numbering in cells corresponds to treatment combinations for the Dunn-Bonferroni method of multiple means comparison.

Table 8.6: Summary of Dunn-Bonferroni Results for Graph Format x Question Type Interaction (Experiment 3, Session 2)
Dependent Variable: Time Performance

a.
Significant Differences among Means at the α = 0.01 level:
1) (*1,*2); (*1,*3)
2) (*2,*8)
3) (*3,*9)
4) (*4,*5)
b. Significant Differences among Means at the α = 0.05 level:
1) (*2,*5)
2) (*3,*6)
3) (*4,*6)
c. No other contrasts of interest were found to be significant.

Note that the above numbering of treatment combinations corresponds to those marked in the table of means given in figure 8.2.

2. Question Type x Time Period -- Unlike the disordinal† Graph Format x Question Type interaction, this two-factor interaction appeared to be strictly ordinal.‡ The plot and the respective cell means for this significant interaction are shown in figure 8.3. The results of the Dunn-Bonferroni tests on key contrasts are summarized in table 8.7. The Dunn-Bonferroni tests for session 2 indicated that: (a) the effect of increasing time periods (14 periods) on Q3 performance was highly significant (p < .01); (b) the effect of increasing time periods (14 periods) on Q2 performance was significant only at the α = 0.05 level; and (c) no significant adverse effect of increasing time periods was found on Q1. In addition, only for 14-period graphics were Q2 as well as Q3 found to take significantly longer to perform than Q1. No differences were found among the tasks in the case of 7-period graphics. Together with the previous findings on the significance of the time period effect, this result supported the notion that the effect of the information complexity of graphics is stronger with more time periods. In spite of this, the results also indicated that, in effect, an increasing number of time periods had varying adverse effects on the different task activities: that is, more adverse effects of increasing time periods were found on Q2 and Q3 performance than on Q1 performance.
Probably, this was due to the presence of a strong anchoring of information on the ordinate scale in the case of Q1, which provided a mechanism to quickly filter out all of the irrelevant information that came with more time periods. In the absence of such an anchoring, as in the cases of Q2 and Q3, no such mechanism was available to facilitate information processing.

† See figures 8.1 and 8.2.
‡ This term is explained in the Glossary.

Figure 8.3: Plot and Mean values of Question Type x Time Period Interaction (Experiment 3, Session 2) Dependent Variable: Time Performance

Question Type    Q1             Q2             Q3
7 Periods         5.69 (*1)      7.31 (*3)      6.79 (*5)
14 Periods        6.17 (*2)      9.49 (*4)     10.03 (*6)

* Numbering in cells corresponds to treatment combinations for the Dunn-Bonferroni method of multiple means comparison.

Table 8.7: Summary of Dunn-Bonferroni Results for Question Type x Time Period Interaction (Experiment 3, Session 2) Dependent Variable: Time Performance

a. Significant Differences among Means below the α = 0.01 level: 1) (*2,*4); (*2,*6) 2) (*5,*6)
b. Significant Differences among Means at the α = 0.05 level: 1) (*3,*4)
c. No other key contrasts among means were found to be significant.

Note that the above numbering of treatment combinations corresponds to those marked in the table of means given in figure 8.3.

3. Graph Format x Dataset -- Like the Question Type x Time Period interaction, this was another ordinal interaction, as plotted in figure 8.4. Figure 8.4 also shows the cell means for this two-factor interaction. Table 8.8 shows a summary of the Dunn-Bonferroni tests performed on key contrasts. Essentially, the ordinal interaction for this effect and the highly significant main effect for dataset category indicated that even with a minor increase in the number of datasets depicted (i.e.
from two to three datasets), task performance was adversely affected. Specifically, the Dunn-Bonferroni tests for this session (table 8.8) indicated that increasing datasets significantly impaired task performance when either bars or symbols were used, but not when lines were used. Moreover, subjects took more time with bars as compared to lines for 3 datasets, and with bars as compared to symbols for 2 datasets. These findings were consistent with the characterization of bars based on the anchoring concept (chapter 3). The underlying rationale is that an increasing number of datasets contributed a large number of bars that had to be processed together, whereas for lines, an increase in the level of the dataset variable from two (i.e. low) to three (i.e. high) yielded only one additional line. In other words, dataset anchoring characteristics are low for bars but high for lines. Consequently, this special strength of lines over bars was demonstrated more dramatically in this experiment than in the others, as only multiple dataset representations were evaluated here.

Figure 8.4: Plot and Mean values of Graph Format x Dataset Interaction (Experiment 3, Session 2) Dependent Variable: Time Performance

Graph Format     Bars           Symbols        Lines
2 Datasets        7.80 (*1)      5.32 (*3)      6.01 (*5)
3 Datasets       11.10 (*2)      8.58 (*4)      6.67 (*6)

* Numbering in cells corresponds to treatment combinations for the Dunn-Bonferroni method of multiple means comparison.
Table 8.8: Summary of Dunn-Bonferroni Results for Graph Format x Dataset Interaction (Experiment 3, Session 2) Dependent Variable: Time Performance

a. Significant Differences among Means below the α = 0.01 level: 1) (*1,*2) 2) (*2,*6) 3) (*3,*4)
b. Significant Differences among Means at the α = 0.01 level: 1) (*1,*3)
c. No other key contrasts among means were found to be significant.

Note that the above numbering of treatment combinations corresponds to those marked in the table of means given in figure 8.4.

Moreover, the relatively greater adverse effect of an increasing number of datasets found on symbols as compared to lines (see figure 8.4 and table 8.8) provided additional evidence on the importance of matching the characteristics of the task to the graph format. Indeed, the relative strength of lines over bars and symbols in this experiment was evidenced by the stronger anchoring of dataset information found on lines as compared to the other formats. It was also evidenced by the absence of time period anchoring for the task activities to be performed in this experiment. In summary, the findings imply that some reservations should be placed on the unrestricted use of multiple bar representations in business reporting. Note also that the characteristics of the task at hand determine how adversely an increasing number of datasets can affect the graph format chosen.

4. Time Period x Dataset -- This significant two-factor interaction is plotted in figure 8.5 and was found to be strictly ordinal. Table 8.9 provides the Dunn-Bonferroni test results on key contrasts for the interaction.
Analysis of this interaction revealed that increasing numbers of datasets and time periods led to increasing time required for task performance. In line with the a priori hypotheses given in chapters 3 and 4, the evidence on hand, together with the highly significant main effects of the time period and dataset variables, clearly indicated that the information complexity of graphics increased with increasing levels of both the time period and dataset variables.

Figure 8.5: Plot and Mean values of Time Period x Dataset Interaction (Experiment 3, Session 2) Dependent Variable: Time Performance

Time Period      2 Datasets     3 Datasets
7 Periods         4.89 (*1)      8.30 (*2)
14 Periods        7.86 (*3)      9.26 (*4)

* Numbering in cells corresponds to treatment combinations for the Dunn-Bonferroni method of multiple means comparison.

Table 8.9: Summary of Dunn-Bonferroni Results for Time Period x Dataset Interaction (Experiment 3, Session 2) Dependent Variable: Time Performance

1. Significant Differences among Means below the α = 0.05 level: a. (*2,*1) b. (*3,*1)
2. No other key contrasts among means were found to be significant.

Note that the above numbering of treatment combinations corresponds to those marked in the table of means given in figure 8.5.

C. ACCURACY PERFORMANCE FOR COMBINED SESSIONS

The correlations of the GEFT measure with the corresponding accuracy scores for the combined (transformed) as well as the separate sessions (sessions 1 and 2) did not reveal any significant relationships. This was further substantiated by the non-significant GEFT effects when the GEFT variable was included as a covariate in the statistical models for analyzing both the session and the combined datasets† (see table 8.10).
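The covariate screening step described above -- correlating each subject's GEFT score with accuracy before deciding whether GEFT belongs in the model -- amounts to a plain Pearson product-moment correlation. A minimal stdlib sketch follows; the subject scores are hypothetical and `pearson_r` is an illustrative helper, not the BMDP routine actually used in this research.

```python
# Sketch of the GEFT covariate screening: compute the Pearson correlation
# between GEFT scores and mean accuracy per subject. All data hypothetical.
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical per-subject GEFT scores and mean accuracy (proportion correct):
geft = [9, 12, 15, 17, 18]
accuracy = [0.90, 0.88, 0.93, 0.89, 0.91]

r = pearson_r(geft, accuracy)
# A weak, non-significant correlation here would echo the null GEFT-accuracy
# relationships reported above, supporting the decision to treat GEFT as an
# inert covariate.
```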
For the transformed dataset, the significant effects found with respect to accuracy for this experiment (E3) included:
1. Question Type effect
2. Time Period effect
3. Question Type by Time Period effect
4. Question Type by Dataset effect
5. Time Period by Dataset effect

As usual, this discussion focuses on those key contrasts that are of interest to this research, particularly those effects found to be significant at the nominal level when the Dunn-Bonferroni method of multiple comparison was used.

† The method of transformation used for combining the datasets captured for accuracy performance has been described in earlier chapters.

Table 8.10: Comparison of ANCOVA Results for Sessions 1, 2, and the Transformed Dataset (Experiment 3, Outliers Excluded) Dependent Variable: Accuracy Performance

Sources of Variance   p-values (Combined)   p-values (Session 1)   p-values (Session 2)
GEFT                  0.8514                0.3064                 0.8250
G: Graph Format       0.0627                0.0615                 0.2787
Q: Question Type      0.0034**              0.0060**               0.0570
T: Time Period        0.0051**              0.0185*                0.0483*
D: Dataset            0.3737                0.1142                 0.2980
G*Q                   0.1594                0.0717                 0.3393
G*T                   0.5709                0.3456                 0.8652
G*D                   0.5165                0.4804                 0.3039
Q*D                   0.0084**              0.1947                 0.0208*
Q*T                   0.0000**              0.0000**               0.0000**
T*D                   0.0034**              0.2387                 0.0037**
G*T*D                 0.1083                0.8717                 0.0456*
G*Q*T                 0.3808                0.5479                 0.2941
G*Q*D                 0.2287                0.1716                 0.2349
Q*T*D                 0.5956                0.3200                 0.9163
G*Q*T*D               0.1753                0.7063                 0.1692

* Significant at p = 0.05 level
** Significant at p = 0.01 level

1. Main Effects on Accuracy for Transformed Data

1. Question Type -- Analysis of the data showed that a significantly higher percentage of correct responses was obtained for performing Q2 (95%) as compared to performing Q3 (87%). No other key contrasts among means were found to be significant at the α = 0.05 criterion as revealed by the Dunn-Bonferroni tests. Note that the percentage of correct responses found for Q1 performance was 91%.

2.
Time Period -- More time periods (14 periods) were found to affect accuracy adversely: the percentage of correct responses was 94% for 7 periods but dropped to 88% for 14 periods. This finding is consistent with the results on time performance, in that more time periods led to longer processing time.

2. Two-way Interactions on Accuracy for Transformed Data

1. Question Type x Time Period -- Table 8.11 shows the mean percentages of correct responses for the various treatment combinations of this two-factor interaction and the Dunn-Bonferroni results among key contrasts of interest that were found to be significant below or at the α = 0.05 criterion level. Table 8.11 reveals that a significantly adverse effect on accuracy was found, with an increasing number of time periods (14 periods), for task activities performed under Q3, but not for those performed under Q1 or Q2.† The other significant key contrasts were the higher accuracy found with Q1 compared to Q3, and with Q2 compared to Q3, in the case of 14 periods. No significant differences were found among tasks in the case of 7-period graphics.

† More time periods also affected Q2 performance adversely, but this effect was not statistically significant as revealed by the Dunn-Bonferroni tests.

Table 8.11: Dunn-Bonferroni Test Results and Mean Value Table for Question Type x Time Period Interaction of Transformed Data (Experiment 3, Outliers Excluded) Dependent Variable: Accuracy Performance

Question Type    Q1           Q2           Q3
7 Periods        86% (*1)     97% (*3)     98% (*5)
14 Periods       96% (*2)     92% (*4)     77% (*6)

a. Significant Differences among Means below the α = 0.01 level: 1) (*2,*6) 2) (*5,*6)
b. Significant Differences among Means at the α = 0.01 level: 1) (*4,*6)
c. No other contrasts of interest among means were found to be significant.

Note that the above numbering of treatment combinations corresponds to those marked in the table of means above.
These results were consistent with the findings on time performance for this experiment: that is, the adverse effects of more time periods on Q3 were greater than those on Q2. Q1 in this experiment was not adversely affected by an increasing number of time periods, whether for time or for accuracy. Moreover, differences among the various tasks were only significant for 14 periods, with Q3 standing out as the most difficult task to perform.

2. Question Type x Dataset -- Table 8.12 shows the mean percentages of correct responses for the various treatment combinations of this two-factor interaction. No summary of the Dunn-Bonferroni results is given, as no major contrasts among means were found to be significant below or at the α = 0.05 criterion level as revealed by the BMDP7D software.

3. Time Period x Dataset -- Table 8.13 shows the cell means on percentages of correct responses for this two-factor interaction and the results of the Dunn-Bonferroni tests. The only significant contrast of interest was the lower accuracy found with 14 periods as compared to 7 periods in the case of graphics with only 2 datasets. This was consistent with the idea that the complexity of graphics increases with increasing time periods. Perhaps the lack of significant differences for accuracy between using graphics with only 2 datasets versus those with 3 datasets indicated that subjects had undergone sufficient training on multi-dataset displays to use them accurately.
Table 8.12: Mean Value Table for Question Type x Dataset Interaction of Transformed Data (Experiment 3, Outliers Excluded) Dependent Variable: Accuracy Performance

Question Type    Q1      Q2      Q3
2 Datasets       96%     93%     85%
3 Datasets       87%     96%     89%

Table 8.13: Dunn-Bonferroni Test Results and Mean Value Table for Time Period x Dataset Interaction of Transformed Data (Experiment 3, Outliers Excluded) Dependent Variable: Accuracy Performance

Time Period      2 Datasets     3 Datasets
7 Periods        97% (*1)       91% (*2)
14 Periods       85% (*3)       91% (*4)

* Numbering in cells corresponds to treatment combinations for the Dunn-Bonferroni method of multiple means comparison.

a. Significant Differences among Means below the α = 0.01 level: 1) (*1,*3)
b. No other key contrasts among means were found to be significant.

Note that the above numbering of treatment combinations corresponds to those marked in the table of means given above.

D. SUMMARY OF EXPERIMENT 3 FINDINGS

Strong evidence showing the need to match the characteristics of graph formats to those of the tasks to be supported underlies the results of the graph format by question type interaction for sessions 1 and 2 of this experiment (E3). Indeed, the highly significant interaction effect of graph format by question type for time, as depicted in figure 8.2, indicated that both symbols and bars took longer for Q2 and Q3, as these tasks had low anchoring of information on the (x,y) axes framework. Hence, significantly longer time was taken to use bars than either symbols or lines for performing Q2 or Q3.
For Q1, as there was a strong anchoring of information on the y-axis, no significant differences were found among the various formats, although symbols appeared to have a slight advantage over bars or lines for Q1. All of these results strongly supported the a priori hypotheses stated on the basis of the anchoring concept discussed in chapters 3 and 4. Even with an increase of the dataset variable from 2 to 3 datasets, time performance with bars was significantly and adversely affected (figure 8.4). No significant adverse effect was found with singular versus multiple lines. The adverse effect of increasing datasets on symbols was of a lesser degree than that found on bars. In addition, strong evidence was produced to support the characterization of graph format and task based on the anchoring concept presented in chapter 3: bars may be characterized as having moderately strong x- and y-axis anchoring but weak dataset anchoring; lines may be characterized as having strong dataset anchoring but weak anchoring on the two major axes; and symbols may be characterized as having moderate anchoring on all graphical components (i.e. x-axis, y-axis, and dataset). Accordingly, the results of this experiment showed that the use of bars should be strongly questioned in multiple dataset representations as well as for task activities where there is no strong anchoring of information on the axes. Indeed, multiple bar representations should generally be avoided. But the decision to use multiple lines versus multiple symbols would largely depend on the nature of the task activities to be performed: that is, in the presence of a strong anchoring of information on the axes (x-axis, y-axis, or both), multiple symbols should be used, whereas the absence of such anchoring information should argue for the use of multiple lines (e.g. E3).
Finally, the different characteristics of the tasks investigated in this experiment were also found to be differentially affected by the various factors of information complexity. For example, while performance of Q1 was not generally found to be adversely influenced by either an increasing number of time periods or datasets in this experiment, performance of Q2 and Q3 was found to be gravely influenced by more time periods, but not by more datasets. In fact, there was some indication that subjects learned to use 3-dataset representations as accurately as 2-dataset representations. Inevitably, understanding how the characteristics of various graph formats can best be matched to the characteristics of various tasks will help to unveil the relative merits and/or demerits of using various forms of graphs for performing various tasks. Task differences are also useful in providing insights as to why certain results are found in certain experiments but not in others. The next chapter discusses these important issues in greater depth and also attempts to integrate the results of all three experiments with the current literature.

IX. INTEGRATION OF RESULTS

Results of experiments E1, E2, and E3 were presented respectively in chapters 6, 7, and 8. This chapter attempts to combine the time performance results by, first, drawing general conclusions on the various graph formats studied based on key findings observed in the individual experiments, and, second, integrating these and other findings with the current graphics literature.†

A. OVERVIEW OF KEY FINDINGS

The Graph Format by Question Type interaction effect was of central focus in this research as it provided information regarding the relative strengths and weaknesses of various graph formats for performing different task activities under different experimental conditions. Hence, the significance of this particular interaction effect on time was discussed for both experimental sessions 1 and 2 in the previous chapters.
As noted, the difference between interpreting results for sessions 1 and 2 lies in the implication that session 1 findings are more applicable to occasional or first-time graph users, whereas session 2 findings apply to experienced or frequent graph users. Accordingly, this interaction effect will form the basis for discussing the relative strengths and weaknesses of the various graph formats in terms of session 1 results across the experiments. As for session 2 results, the different graph formats will be evaluated based on several criteria, including: Graph Format as a main factor effect; the Graph Format by Question Type interaction effect; and the Graph Format by Dataset interaction effect.‡

† Since a mega-analysis of the total raw dataset (i.e. E1, E2, and E3 data) suggests significant differences between experiments on the "Experiment" factor and its interactions with other major factors, detailed statistical comparisons among effects found between experiments are not drawn. Instead, comparisons are drawn at a more general level and the discussion focuses on how the results fit into the existing literature.

‡ Examination of the detailed experimental results for E1, E2, and E3 revealed no other graph format related interaction effect to be statistically significant for session 2 on time, with the exception of a significant 3-way Graph Format by Question Type by Time Period interaction found in that session for E2 based on standard ANOVA procedures. This exception was, however, ignored because further statistical analysis using the Dunn-Bonferroni method of multiple mean comparisons on planned contrasts of key interest among cell means (table 7.8) for this interaction did not yield any significant differences among the contrasts considered.

INTEGRATION OF RESULTS / 224

1. Effects for Time Performance in Session 1

Table 9.1 provides an overview of the relative strengths and weaknesses found for the various formats based on the time results for session 1 across the experiments.
First, the results of experiment E1 provided strong evidence that occasional or first-time graph users found bars as well as symbols relatively faster to use than lines for performing task activities associated with Q1 (i.e. tasks which required both high x-axis and y-axis anchoring: Group I tasks).† In addition, lines were faster to use for task activities associated with either Q2 or Q3 (i.e. tasks which required both high x-axis and dataset anchoring: Group II tasks) than with Q1. Conversely, E3 results indicated that bars took longer to use than either symbols or lines for task activities associated with either Q2 or Q3 (i.e. tasks which required high dataset but low x-axis anchoring: Group IV tasks).‡ In addition, symbols were faster to use than lines for task activities associated with Q1 (i.e. tasks which required both high y-axis and dataset anchoring: Group III tasks). Moreover, for both bars and symbols, response time for Q1 was shorter than for either Q2 or Q3. Finally, E2 results revealed that bars took significantly longer for performing Q2 (i.e. tasks which required both high x-axis and dataset anchoring: Group II tasks) compared to Q1.§ Overall, these results indicated that the relative strengths and weaknesses of the various formats depended critically on the type and nature of the information anchoring characteristics of the tasks to be performed. For example, one of the major strengths†† of bars lies in their strong x-axis anchoring characteristic. But this characteristic is good only for performing those tasks with a similar characteristic. Hence, bars are generally good for extracting single data elements (e.g. Q1 in E1 and

† See tables 3.8 and 3.9 as well as figure 6.1 and table 6.5.
‡ See tables 3.8 and 3.9 as well as figure 8.1 and table 8.5.
§ See figure 7.1 and table 7.5.
†† Actually, this is a relative notion depending on the type of task: that is, what may be considered a "strength" (e.g.
strong x-axis anchoring) relative to one task (e.g. a Group I task) may conversely be considered a "weakness" relative to a different task (e.g. a Group IV task).

E2). In contrast, the key characteristic of lines that differs from the other formats is their strong dataset anchoring. They also have weak anchoring on the axes (see chapter 3). Consequently, E1 results provided strong evidence of the difficulty of disembedding points on lines. Hence, lines should never be recommended for identifying or locating point values (cf. chapters 3 and 6). However, the results indicated that lines are generally best for detecting trends and patterns. As for symbols, they appear to provide an alternative to either bars or lines, especially in situations where the task has only a partial anchoring on some major axis but requires a moderately high anchoring on the dataset component (e.g. Q1 in E3). In these task settings, the disadvantage of bars due to their low dataset anchoring, as well as the disadvantage of lines due to their low axes anchoring, leaves symbols as the best choice. In summary, the relative strengths and weaknesses of various graph formats for various tasks applicable to first-time graph users are well characterized by the concept of information anchoring on the major components of time-series graphics (chapter 3). Further progress in knowledge about the merits and demerits of various graph formats will likely come from an active theory-based research program evaluating a variety of tasks and graph formats. The present discussion provides a stepping stone to accumulating knowledge regarding the use of bars, symbols, and lines among novice graph readers. The next section discusses findings that are more applicable to expert graph readers.
Table 9.1: Overview of Key Findings for Experiments E1, E2, and E3 (Session 1 Results) Dependent Variable: Time Performance

Experiment E1 --
  Bars: Q1: Bars better than Lines.
  Symbols: Q1: Symbols better than Lines.
  Lines: Q1: Lines worst among all formats; Q2 & Q3 easier than Q1.
Experiment E2 --
  Bars: Q2 faster to answer than Q1.
  Symbols: No significant contrasts.
  Lines: No significant contrasts.
Experiment E3 --
  Bars: Q1 faster to answer than Q2 & Q3; Q2, Q3: Bars worst of the graph formats.
  Symbols: Q2, Q3 took longer to answer than Q1; Q1: Symbols better than Lines; Q2, Q3: Symbols better than Bars.
  Lines: Q1: Lines worse than Symbols; Q2, Q3: Lines better than Bars.

2. Effects for Time in Session 2

The objective of this section is not to give another review of the results discussed in the preceding chapters (i.e. chapters 6, 7, and 8), but to focus on Graph Format related findings which may be used to address the research problem (chapter 1). As noted earlier, only main factor and two-factor effects with respect to graph format and its interactions with other factors are discussed, as no higher-order interactions related to graph format were found to yield significant contrasts of interest for this session (i.e. session 2).

a. Main Factor Effect

1. Bars -- Table 9.2 provides an overview of findings on the Graph Format effect found across the experiments when applied separately to bars, symbols, and lines.
It is apparent that for E1 and E2 tasks, bars were no better or worse than the other formats, but they were inferior for the task activities investigated in E3. Accordingly, it was observed that E1 and E2 tasks differed from E3 tasks in that strong anchoring of abscissa information was available for tasks associated with experiments E1 and E2 but not with experiment E3 (chapter 3). Moreover, the tasks tested in E3 employed only multiple dataset representations (cf. appendices B, C, and D). It appears, thus, that bars become difficult to use when they are not anchored strongly to the abscissa, as in the E3 tasks. Moreover, they become confusing to read when they are used in multiple dataset representations.

2. Symbols -- Table 9.2 shows that symbols were more quickly processed than bars for tasks evaluated in experiment E3 but not in E1 and E2. Because strong anchoring was provided on the abscissa for all E1 and E2 tasks, bars were not inferior to symbols when time performance scores were averaged across all tasks examined in these experiments. For experiment E3 tasks, however, both the absence of such strong abscissa anchoring and the increase in the number of datasets placed bars at a disadvantage when compared to symbols.

3. Lines -- The same general comments regarding symbols compared to bars apply to lines compared to bars (table 9.2).
Table 9.2: Overview of Main Factor Effect on Graph Format Characteristics for Experiments E1, E2, and E3 (Session 2 Results) Dependent Variable: Time Performance

Experiment E1 --
  Bars: No significant contrasts. Symbols: No significant contrasts. Lines: No significant contrasts.
Experiment E2 --
  Bars: No significant contrasts. Symbols: No significant contrasts. Lines: No significant contrasts.
Experiment E3 --
  Bars: Bars are the worst format. Symbols: Symbols better than Bars. Lines: Lines better than Bars.

b. 2-way Interactions

1. BARS -- Evidence provided by the significant Graph Format by Question Type interaction on the relative strengths and weaknesses of the various graph formats for performing various task activities in session 2 of E1 and E3 was generally consistent with that provided by session 1 of these experiments.† The Graph Format by Question Type effect was not significant for session 2 of E2. Table 9.3 provides an overview of findings from this two-factor interaction as applied to the different graph formats evaluated. Analysis of the E1 data indicated that bars were faster than lines for isolating single points (i.e. Q1), although this particular contrast did not reach statistical significance at the α = 0.05 level for session 2 (see figure 6.2 and table 6.6). This might probably be attributed to learning. In contrast, analysis of the E3 results revealed strongly that bars took longest for answering Q2 and Q3 as compared to either lines or symbols. However, the same expected disadvantage of bars relative to the other formats for answering Q2 and Q3 in E2 was not supported by statistically significant contrasts.
This was probably due to the presence of strong information anchoring on the abscissa provided by the E2 task setting. The principal finding, therefore, is that bars should be used for performing tasks with a strong anchoring of time period information on the abscissa (i.e. Group I and II tasks -- see tables 3.8 and 3.9). This is because every bar on a bar graph has an anchoring base on the x-axis component. Consequently, the absence of a strong anchoring of information on the axes framework caused bars to become an inappropriate format (i.e. Group IV tasks -- see tables 3.8 and 3.9). In addition, the results of the significant Graph Format by Dataset interaction across the experiments strongly indicated that increasing the number of datasets depicted had a greater adverse effect on bars than on either symbols or lines (cf. the detailed findings on the significant Graph Format by Dataset interactions found for all the experiments). In fact, when only

† Refer to the earlier discussion on session 1 results as applied to bars.

multiple dataset representations were used,‡ bars resulted in higher time performance when compared to multiple lines (or multiple symbols). Because multiple bars, unlike multiple lines or symbols, have elements belonging to the same dataset depicted as isolated bars (i.e. one bar separated from another), more effort is required to string together individual bars belonging to the same dataset than to link together symbols or points on lines belonging to the same dataset. It is interesting to note that although the literature argues that bars have discrete and isolated characteristics which make them suitable for extracting scale-values of single points and unsuitable for extracting trends of many points (see Pinker, 1981, 1983), there has been no claim for an advantage of bars due to their strong anchoring to the abscissa.
Moreover, warnings against the use of multiple bars are also only beginning to appear in the empirical graphics literature. For example, Cleveland (1984) argued that it is harder to compare fractional graph area data on a divided bar chart than on a grouped dot chart because all comparisons on a grouped dot chart can be done by position judgement rather than the less accurate length-area judgement required on the divided bar chart (Cleveland & McGill, 1984).

2. SYMBOLS -- The significant Graph Format by Question Type interaction found in E1 indicated that symbols were faster to use than lines for answering Q1 (i.e. isolating single elements from a dataset). No significant difference was found between symbols and the other formats for E2 tasks. However, E3 results showed that both Q2 and Q3 took longer to process with symbols than Q1. Results of the significant Graph Format by Dataset interaction found in E1 revealed that multiple symbols were faster to read than multiple bars. However, multiple symbol graphs required more time to process than single symbol graphs in the case of E2.

‡ E.g. the graph stimuli presented in E3.

Table 9.3: Overview of Two-Factor Interaction Effects on Graph Format Characteristics for Experiments E1, E2, and E3 (Session 2 Results) Dependent Variable: Time Performance

Experiment E1 --
  Bars: 1 dataset faster than 3 datasets; 3 datasets: Bars worse than Symbols.
  Symbols: Q1: Symbols better than Lines; 3 datasets: Symbols better than Bars.
  Lines: Q1: Lines worse than Symbols; Q2 & Q3 easier than Q1; 1 dataset faster than 3 datasets.
Experiment E2 --
  Bars: 1 dataset faster than 3 datasets.
  Symbols: 1 dataset faster than 3 datasets.
  Lines: 1 dataset faster than 3 datasets.
Experiment E3 --
  Bars: Q1 easier than Q2 & Q3; Q2, Q3: Bars worst among all graph formats; 2 datasets faster than 3 datasets; 2 datasets: Bars worse than Symbols; 3 datasets: Bars worse than Lines.
  Symbols: Q1 easier than Q2 & Q3; Q2, Q3: Symbols better than Bars; 2 datasets faster than 3 datasets; 2 datasets: Symbols better than Bars.
  Lines: Q2, Q3: Lines better than Bars; 3 datasets: Lines better than Bars.

As well, graphs with three datasets led to more time than those with only two datasets in experiment E3. In E3, the results also revealed that among the two-dataset plots, symbol charts were faster to use than bar charts. In summary, these results indicated that the advantages of symbols relate to their combining characteristics belonging to both bars and lines. On the one hand, symbols, like bars, have the advantage over lines of being more strongly anchored to the axes components of time-series graphics. That was why lines took longer than symbols for answering Q1 in E1 (i.e. Group I tasks).
On the other hand, symbols, like lines, have the advantage over bars of being more strongly anchored to the dataset component of time-series graphics representations. Accordingly, when multiple dataset representations are to be used, symbols are recommended over bars. However, the issue of multiple symbols versus multiple lines has to be resolved based on the characteristics of the tasks to be performed. In fact, symbols are best in situations where the use of either bars or lines becomes inadequate because the nature of the task requires both high axes and dataset anchoring characteristics (i.e. Group III tasks in table 3.8). Thus, symbols yielded quicker responses when Q1 of E3 was to be answered as compared to either Q2 or Q3 of that experiment. The implication of these findings is that symbols always offer an alternative to bars or lines, and a graphics designer should decide on the best representation only after careful consideration. In fact, Cleveland (1984) advised replacing bars with dots (or symbols). Particularly in tasks where both the 'whole' and the 'parts' are to be extracted, such as the need to compare, first, exact values and, second, to identify to which dataset these values belong, the use of symbols would be justifiable. In other words, symbols should be considered for tasks requiring only partial anchoring on the axes (i.e. Group II and III tasks in table 3.8).

3. LINES -- Results of E1 strongly indicated that disembedding a point from a line (i.e. Q1) was an effortful and time-consuming task (chapter 6). For example, Q1 took significantly longer to answer than Q2 or Q3 when lines were used. Similarly, line graphs required significantly longer processing time than symbol charts for answering Q1 in E1. In contrast, E3 results provided evidence that lines were faster to use than bars for answering both Q2 and Q3.
Put together, these results suggest that the general "weakness" of lines for performing Q1 of E1 was due to their being continuous as well as completely disjointed from the axes framework (i.e. the abscissa and ordinate components). However, this so-called "weakness" of lines for isolating single points should be seen as only one side of the coin. Indeed, the same characteristics of continuity (i.e. a strong dataset anchoring characteristic) and separation from the axes (i.e. weak x-axis and y-axis anchoring characteristics) apparently became a distinguishing "strength" of lines relative to bars for performing Q2 and Q3 of E3, as observed in the above discussion. The Graph Format by Dataset interaction results indicated that, apart from the influence of task characteristics, lines generally have an advantage over bars because they are read as a 'whole' (i.e. they have a strong dataset anchoring characteristic), and so several datasets may be represented as several lines on a single plot. Results, therefore, revealed that although multiple lines were found to adversely affect task performance in the E1 and E2 task settings, no adverse effect of increasing the number of datasets on lines was found in the E3 setting (table 9.3); that is, increasing the number of datasets had a negligible adverse effect on lines, but not on bars or symbols. Lines were found to be superior to bars for extracting both level difference information (Q2) and trend information (Q3) in E3. Of course, task characteristics also play a part in determining the graph format appropriate for representing multiple datasets. Based on the accumulated evidence, the choice between multiple line and multiple symbol representations appears to depend on whether the main characteristic of the task at hand is a higher information anchoring on the axes framework (e.g. E1 and E2 task settings) or on the dataset component (e.g.
E3 task setting). These findings are consistent with the graphics theory literature, which argues that, on the basis of Gestalt principles, datapoints on a line appear to be seen together and are difficult to isolate. For E3 tasks, increasing the number of datasets on a line graph appears to have little or no adverse effect on task performance compared with doing so on a bar chart or a symbol plot. Essentially, this is because each additional dataset depicted on a line graph results in only one additional line to be plotted, whereas on a bar or symbol chart it results in many separated bars or isolated symbols. Moreover, since each line is more or less seen as a whole, processing several datasets on a line graph involves less effort than on a multiple bar or symbol chart. The implication of these findings for graphics designers is to use multiple line graphs as opposed to multiple bar charts for tasks where no strong anchoring of information is provided on the axes framework (i.e. Group IV tasks in table 3.8). Note that, by definition, the absence of a strong anchoring of information on the axes framework inevitably rules out exact questions or tasks dealing only with a single point.

B. INTEGRATION OF FINDINGS WITH THE CURRENT LITERATURE

To generalize these findings, we now attempt to integrate the experimental results with the current graphics literature. Only results found to be both common across all experiments and of particular interest will be discussed. This discussion emphasizes the time performance results found in session 2 of the experiments.†

† Accuracy results as well as session 1 results on time have been of marginal interest throughout this research.

1. Learning

First, a highly significant (p < 0.01) Session effect, which was interpreted as indicative of 'Learning',† was found to permeate all the experimental trials.
Consistent with session 1 results, results of session 2 showed the same relative effect of a selected treatment combination yielding a higher or lower time when compared to another selected treatment combination. Over sessions, there was evidence of an across-the-board improvement in time performance. In short, as found in recent graphics experimentation (e.g. DeSanctis & Jarvenpaa, 1985), subjects in the present experiments adjusted rapidly to the different graph formats over the repeated experimental sessions. Learning with respect to accuracy performance was also found when the mean scores of the two separate experimental sessions were compared.* Accordingly, it is argued that more attention should be paid to the importance of the 'learning' effect in MIS graphics research (see Lusk & Kersnick, 1979; DeSanctis & Jarvenpaa, 1985) and that future researchers should attempt to manipulate and/or adequately control effects due to the knowledge and experience of subjects in terms of their familiarity with the graphics stimuli (see Pinker, 1981, 1983). As already noted, the use of a repeated measures design is appropriate for examining learning effects over time. Moreover, it is important to choose the right subjects, such as frequent users of the graphics representations being investigated, as opposed to first-time or occasional users. In this research, all subjects were asked to undergo a 'practice' session as well as a preliminary 'actual' session to ensure adequate exposure to the experimental graphics stimuli. Finally, the choice of graphics material presented should abide by the standards established for such material, so as to avoid

† Refer to the Glossary for a definition of this term.

* Analysis of accuracy data, however, used transformed data in order to avoid the highly skewed distribution of the originally captured "0's" and "1's" (see Glass et al., 1972, on the violation of the normality assumption when binary data are used in ANOVA/ANCOVA procedures).
* (cont.) One of the key reasons why the combined dataset was used, besides the normality assumption, was that only small significant effects were found with respect to accuracy when analyses were limited to the session 2 datasets alone for the various experiments (chapters 6, 7, and 8).

possible confounding of factors due to 'poor' or 'illusive' design (see Wainer, 1984). Graphics designed in this research were pilot tested to avoid possible violations of pertinent principles laid out in Kosslyn et al. (1983) and other sources (see Kosslyn, 1985).†

2. The Individual Difference Characteristics

Among the experiments, only E1 results showed a highly positive and significant GEFT-accuracy correlation: field independents were found to perform more accurately than their counterparts (i.e. field dependents). Lusk (1979), for example, found that high analytics (as classified using Witkin's EFT) outperformed low analytics in a disembedding task regardless of the format of information presentation. On the other hand, all three experiments failed to provide any strong evidence of a statistically significant GEFT-time correlation. These results are interpreted to mean that although field independents were found to outperform field dependents in some situations, there is a need for caution about any generalized claim of such superiority. In fact, previous MIS studies have produced only mixed results concerning the effects of individual differences on task performance (see Chervany & Dickson, 1974; Benbasat & Schroeder, 1977; Huber, 1983). In short, the accumulated evidence has not provided strong support for concerns about the cognitive style variable in MIS graphics research (see Benbasat et al., 1986).

3. Task Characteristics

It has been argued that the equivocal results from various graphics experiments often cited in the literature (e.g. Ives, 1982; DeSanctis, 1984) could be explained by differences in task characteristics (Benbasat et al., 1986).
That task characteristics were a prime factor influencing performance in this research was evidenced by the directions of the Graph Format by Question Type interaction effects found across the three experiments and discussed earlier. In fact, task differences within as well as among the experiments have been used throughout to explain the various significant effects found across experiments 1, 2, and 3. Consequently, future graphics researchers should pay more attention to providing adequate control over the experimental task variable.

† Other aspects of the time-accuracy tradeoff are also discussed elsewhere (e.g. Vessey, 1987).

In line with the results of recent studies (e.g. Dickson et al., 1986), the usefulness of a particular graph format seems to be largely a function of the characteristics of the task at hand. Hence, when there was a strong anchoring of information on the axes framework (i.e. Group I tasks -- Q1 of E1 and E2),† bars as well as symbols were found to be faster and more accurate to use for such tasks than lines (chapters 6 and 7). In the absence of such anchoring (i.e. Group IV tasks -- Q2 and Q3 of E3),* lines appeared to have a relative edge over bars and symbols (chapter 8). Symbols were found to be most appropriate in situations where there was only partial anchoring of information on the axes together with strong anchoring of information on the dataset component (e.g. Group III tasks -- Q1 of E3). Most of these results were generally supportive of principles underlying the Kosslyn-Pinker theory, such as the Gestalt Laws and Pinker's Graph Difficulty Principle (discussed in chapters 2, 3, and 4). In addition, the concept of information anchoring also provided a strong basis for matching task characteristics** to graph format characteristics (table 3.8).
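The anchoring-based matching of task groups to graph formats described above can be restated as a simple lookup. The group labels follow table 3.8, but the function itself is only a hypothetical summary of the findings, not an artifact of the research:

```python
def recommended_format(task_group):
    """Return the graph format the experiments found fastest for a
    task group, per the information-anchoring argument (table 3.8):
    Groups I and II carry strong anchoring on the abscissa (bars),
    Group III combines partial axes anchoring with strong dataset
    anchoring (symbols), and Group IV has no strong axes anchoring
    and relies on the dataset component (lines)."""
    table = {"I": "bars", "II": "bars", "III": "symbols", "IV": "lines"}
    return table[task_group]

print(recommended_format("I"))    # bars
print(recommended_format("III"))  # symbols
print(recommended_format("IV"))   # lines
```

Stated this way, the matching is a pure function of task characteristics, which is precisely the claim that the interaction effects across the three experiments were taken to support.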
In summary, future graphics researchers should not only understand the characteristics of the experimental task investigated, such as the composition of its activities, but should also pay particular attention to developing an extended framework or taxonomy of tasks such as that developed in this dissertation (chapter 3).

† See tables 3.4, 3.8, and 3.9.
* See tables 3.4, 3.8, and 3.9.
** Refer to chapter 3 for the detailed discussion.

4. Graph Format

It is believed that the current controversy over the use of different modes of information presentation can be resolved by a more careful analysis of the match (or mismatch) between the task employed and the presentation format (Benbasat et al., 1986; Vessey, 1987). It is precisely this concern which motivated the present effort. Three types of graph format, namely bars, symbols, and lines, as well as three types of elementary cognitive-perceptual task, namely the scale-value comparison and extraction task (Q1), the level difference comparison and extraction task (Q2), and the trend comparison and extraction task (Q3), were investigated in this research. Except for the results on time performance in E3, the main Format effect was generally found to be insignificant. For the E1 and E3 results on time (for session 2), the interaction of the Graph Format and Question Type variables, as well as that of the Graph Format and Dataset variables, was found to be statistically significant, as postulated. For E2, while the Graph Format by Dataset interaction was also significant, the Graph Format by Question Type interaction was not significant for session 2 results.
Collectively, these findings provided general support for Pinker's Graph Difficulty Principle (Pinker, 1981, 1983; Kosslyn et al., 1983), which claims that while no one type of information format may be deemed superior to another overall, different types of information presentation format are expected to facilitate the extraction of different classes of information.

5. Information Complexity

The construct of graphical information complexity was found to be multi-faceted, consistent with the current literature (e.g. Davis et al., 1985; Lauer et al., 1985). For example, it was observed that the information complexity of time-series graphics depends on several factors, including: the number of time periods plotted along the abscissa; the number of datasets plotted in a single graph; the frequency of pip markings for interpolating quantities along the ordinate scale; and the total number of plotted points in a single display (see chapter 3). For all experiments, there was strong and direct evidence that the effects of information complexity were stronger at higher levels of the dataset and time period variables. One implication for designers is to restrict the number of time periods or datasets depicted on a single plot. This is consistent with the notion that a major limitation on graphics processing is the number of nodes that can be encoded simultaneously in a particular graph schema, as well as the finite size of the visual description that can be stored in short-term memory (Kosslyn et al., 1983). Hence, a graphics designer should always be prepared to simplify a complex and confusing graph into a number of elementary plots that would be easier to read and understand. Another concept associated with information complexity is that of task complexity, a construct which also appears to be multi-factorial.
For example, the difficulty of task performance in this research was found to depend on several factors, including the type of question asked, the number of graphical components to be searched in order to answer the question,† and the type and nature of the processing mechanisms involved. In general, it has been shown that higher levels of information complexity in a graphics presentation lead to greater task complexity.

† This differentiated among the experimental tasks for E1, E2, and E3. See also tables 3.8 and 3.9.

6. Perceptual-Cognitive Mechanisms in Graphics Processing

Little attention has been paid to the perceptual and cognitive behavior of the graph reader in the MIS graphics literature. According to Cleveland (1985), the graph reader executes a variety of mental-visual tasks when extracting data from a display. Some of these tasks can be performed effortlessly and almost instantaneously, e.g. tasks executed by pre-attentive vision (Julesz, 1981), while others require more conscious thought (e.g. tasks involving mental calculation and quantitative reasoning). The tasks examined in this research involved both perception and cognition. It should be noted that in graphics information processing, the performance of cognitive tasks is often enhanced by the performance of associated perceptual tasks. For example, reading the DV scale-value of a point is facilitated by the perceptual task of locating the position of the point relative to the ordinate scale. It is believed, therefore, that subjects in this research found different types of graphics representation to be helpful in extracting different classes of information, because some kinds of information can be extracted visually from a bar chart whereas the same information has to be mentally decoded from a line graph, and vice versa.
For instance, as bars have relatively strong axes anchoring compared to lines (see chapter 3), performing tasks requiring the identification of axes information (e.g. Q1 in E1) with bars would only involve the appropriate conceptual messages being "flagged" (retrieved) via a bottom-up encoding mechanism, as opposed to the top-down interrogative process required should the same task be performed with lines (see figure 2.4; Kosslyn et al., 1983; Pinker, 1983). In fact, scale-value and time period information on lines cannot be extracted automatically but requires mental effort to read the interpolated value on the respective axes. Consequently, a longer time was taken for answering Q1 in E1 with lines than with the other representations. Note that symbols also have better anchoring on the axes compared to lines, and thus conceptual messages on time period and/or scale-value information associated with depicted symbols are more easily assembled via bottom-up mechanisms than with lines. The key implication of the theory is, thus, the appropriate matching of formats to tasks in order to solve a problem with the least amount of top-down processing. Experience or familiarity with extracting various data from various graph formats could also enhance processing time, because repeated activation of a graph schema enhances the priming of the appropriate nodes attached to its visual description (Pinker, 1981, 1983). Task performance in session 2, therefore, was both faster and more accurate than in session 1. Moreover, this explains why some of the significant differences found in session 1 among the various graph formats for performing different tasks were not significant in session 2.
In fact, processing strategies were enhanced and stabilized over sessions; for example, several subjects appeared to become comfortable with rapid visual scanning of the displays during session 2, instead of pointing fingers at various places on the graphical displays while performing tasks during session 1. Finally, future graphics researchers may also want to consider monitoring eye movements, as so much visual processing is involved in extracting data from graphical representations.

C. SUMMARY

In summary, both the theoretical discussion and the empirical evidence gathered in this research provided complementary as well as converging evidence to indicate that different types of graph format can facilitate different types of task. More importantly, it is the matching of characteristics between the task at hand and the format of the presentation which determines the relative strengths and/or weaknesses of the various representations. The next chapter concludes the dissertation with a brief review of key findings, contributions, and limitations of the research. In addition, directions for future graphics research will also be suggested.

X. CONCLUSIONS

This chapter concludes the dissertation by providing (1) a discussion of the major contributions of the research and a summary of key findings; (2) a review of limitations associated with conducting experimental graphics research and the implications of these limitations as applied to the present studies; and (3) a general overview of how the present research may be extended, as well as suggestions for future studies based on questions that remain to be answered or issues which arise as a result of this work.

A. SUMMARY OF KEY FINDINGS AND MAJOR CONTRIBUTIONS

Much has been said about the lack of a cumulative and theory-based research discipline in the field of Management Information Systems (Keen, 1980).
Indeed, one of the most often cited reasons for the failure of prior MIS graphics research has been this lack of a cumulative and theoretical perspective (see Davis et al., 1985; Benbasat et al., 1986; Vessey, 1987). Together with these limitations, a number of other methodological problems found among prior MIS graphics research have resulted in a set of guidelines and a strong recommendation for the use of programs of laboratory experiments (Jarvenpaa et al., 1985; Dickson et al., 1986; Jarvenpaa & Dickson, 1988). It is further contended that such an approach, aimed at studying a variety of tasks at various levels of complexity and examining both outcomes and processes, would contribute to developing a cumulative and coherent body of knowledge in the area of graphics (Benbasat et al., 1986; Jarvenpaa & Dickson, 1988).

a. Contributions

The primary purpose of this research was to provide solutions to the basic problem of choosing the most appropriate graphical representation(s)† for displaying a given set of data (Bertin, 1983), based on theory and empirical evidence rather than on opinion and intuition. Since the current thinking among MIS graphics researchers is that different modes of information presentation should facilitate different types of tasks (e.g. DeSanctis, 1984; Jarvenpaa et al., 1985; Dickson et al., 1986; Benbasat et al., 1986), the central focus of this research program has been on examining the relative strengths and weaknesses of various graph formats on the performance of various tasks. A series of three experiments was conducted to test hypotheses drawn from the theories reviewed in chapter 2, so as to provide an independent source of empirical validation of the current views of graphics scientists, as well as to uncover new ground concerning the strengths and weaknesses of various graph formats.

† Such alternative representations include tabular, bar, line, symbol, pie, and even pictorial representations.
In addition, elementary perceptual-cognitive tasks were examined, as opposed to complex decision making tasks, because it is believed that they provide the means for laying the foundation of a graphics discipline. The most important contributions of this research, therefore, are the accumulation of empirical evidence and the provision of a systematic approach for the accrual of knowledge that could be of long term benefit to the research community in the area of graphics and graphics information processing. Accordingly, the primary contribution of the empirical component of this research is that of adding to the knowledge that needs to be compiled for completing the matrix of task environments by presentation formats at the micro level of graph comprehension tasks. An attempt is also made to relate findings with respect to these micro-level tasks to the more macro-level tasks performed in organizational decision making. This is best done by developing empirical guidelines that have been validated by this research as a set of

.. initial guidelines .. to provide the basis to intelligently move forward .. (in the area of graphics; that is,) .. researchers should be able to replicate and accept certain graphical practice as fact (the base), and add to our knowledge of good graphic practice predicated upon a concrete set of priorities (Jarvenpaa & Dickson, 1988, p. 765).

In short, the implications arising from the results of this research can be used to provide new rules for graphics practitioners as well as to form the basis for future research. For example, the finding that line graphs have poor axes anchoring could be translated into a rule which states that if line graphs are to be used for reading point values, some form of redundant coding should be added. This coding could be a grid (which would provide excellent anchoring on the axes components), or distinguishing symbols and numerals at the locations of critical points on the line to be read.
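The redundant-coding rule above can likewise be written down programmatically; the vocabulary used here (grid, symbols and numerals at critical points) comes from the paragraph itself, while the function and its names are merely an illustrative sketch of the guideline, not a tool used in the research:

```python
def redundant_coding(graph_format, task):
    """Suggest redundant coding per the derived rule: line graphs have
    poor axes anchoring, so when they must support point-value reading,
    add a grid (to restore axes anchoring) or distinguishing symbols
    and numerals at the critical points on the line."""
    if graph_format == "lines" and task == "read_point_value":
        return ["grid", "symbols_and_numerals_at_critical_points"]
    return []  # other pairings need no extra coding under this rule

print(redundant_coding("lines", "read_point_value"))
print(redundant_coding("bars", "read_point_value"))  # []
```

Encoding design rules this way makes them directly testable against future empirical results, which is in the spirit of the "base plus priorities" approach quoted from Jarvenpaa & Dickson (1988).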
Apart from these contributions, this dissertation has also provided a framework for classifying tasks (as applied to the tasks experimented with in this research), as well as drawing several hypotheses, both independently and interdependently, from the literature on theories of graphics scattered across many disciplines (chapters 2 and 3). Equally important, this dissertation has also contributed towards: a sound methodology for investigating a large number of critical variables believed to influence task performance in the use of a computerized graphics interface; a systematic approach for examining hypothesized relationships; and an efficient design for collecting and analyzing a large empirical database (chapters 4 and 5). It should be noted, however, that any claim to a complete test of a theory such as the Kosslyn-Pinker model of graph comprehension would, in reality, encompass many phases of systematic experimentation. Experiments conducted in this research focussed merely on the predictive validity aspect of the various theories. Specifically, the primary claim tested was that of Pinker's Graph Difficulty Principle. Finally, the empirical evidence provided in chapters 6, 7, and 8 for experiments E1, E2, and E3 respectively was integrated and summarized in chapter 9 to provide guidelines for graphics designers as well as future researchers.

b. Findings

The following key findings are based on the analysis and interpretation of the empirical data accumulated from experiments E1, E2, and E3. A brief note on the implications of each of these results is also provided.

1. Learning -- Learning was found to significantly affect task performance with the use of all graphical information systems. Experimental replications were used to stabilize effects due to learning.

2.
Individual Difference Characteristics -- There is only partial and weak evidence of an effect of individual difference characteristics on task performance with graphical systems.

3. Time-Accuracy Tradeoff -- Different individuals were found to exhibit different degrees of time-accuracy tradeoff. Further, each experimental setting should be controlled separately for a possible time-accuracy tradeoff effect in order to maintain high internal validity and an unambiguous interpretation of experimental results. This could be done, for example, by including both time and accuracy measures and appropriate statistical techniques to test the relationship between the time and accuracy scores captured.

4. Task Characteristics -- Overall, the results of the three experiments indicated strongly that the degree of support provided by a particular graph format for a particular task depends very much on the match between the characteristics of the graph format and those of the task at hand. Among other things, it is argued that the most critical need for the progress of a graphics discipline is to channel efforts into the development of a comprehensive framework for classifying tasks (see chapter 3) and the accumulation of empirical evidence closely associated with such a task taxonomy (see Jarvenpaa & Dickson, 1988).

5. Graph Format -- The properties of the various graph formats investigated may be summarized by the following observations: (1) bars are characterized by strong x-axis anchoring, moderate y-axis anchoring, and low dataset anchoring; (2) symbols are characterized by moderate x-axis, y-axis, and dataset anchoring; and (3) lines are characterized by low x-axis and y-axis anchoring but high dataset anchoring. In addition, multiple lines and symbols were found to be easier to read and understand than multiple bars.
Consequently, the accumulated evidence showed that bars should generally be restricted to tasks where a strong anchoring of information exists on the abscissa, and that multiple bars should generally be avoided. In contrast, lines should be used for tasks where little or no anchoring of information is provided on the axes frame and where a high anchoring on the dataset component is required. Finally, symbols appeared to be most suitable when there is only partial anchoring on the axes or where a combination of the characteristics of bars and lines is desired. This implies that symbols should always be considered as a possible alternative to both bars and lines.

6. Information Complexity -- The construct of graphical information complexity was found to be multi-factorial. At the very least, this research provided converging evidence that information complexity, as evidenced by the longer elapsed times of graphics use, increased with more time periods and/or datasets plotted on a single display. As well, factors of information complexity were found to interact significantly with the graph format variable. For instance, multiple bar graphs were found to have the greatest adverse effects on task performance compared with either multiple symbol or multiple line graphs (see chapter 9).

B. REVIEW OF LIMITATIONS

Generality, or external validity, is the degree to which experimental findings may be extrapolated to other populations, settings, and times (see Cook & Campbell, 1979). It may be argued, therefore, that the use of student subjects, the use of elementary task settings at the individual level of graph comprehension, the use of monochrome time series graphics stimuli, and, more generally, the use of the laboratory experimental method all represent limitations of this research program.

a.
Limitations

First, it should be noted that the use of students as surrogates for managers in the series of experiments was more or less justified on the ground that similar task performance results were to be expected, because the tasks were elementary graph comprehension tasks rather than complex managerial decision making tasks. Second, the use of elementary information extraction tasks also made the findings more generalizable to tasks performed in real-world organizations, as many uses of graphics in the real world must necessarily involve elements of these tasks or their combinations. Finally, the reason for using only monochrome graphics, and graphs with no crossing of lines, was to control possible confounding effects due to color and other complexity factors so as to maintain the high internal validity of the experimental results. These variables (i.e., color and line-crossing) could always be introduced and manipulated in future studies, if desired.

While some weaknesses, such as an "unnatural setting", are unavoidable in experimental research, it should be pointed out that, in weighing the significance of external validity (i.e., the generality of results), the nature of the study should also be carefully considered. The purpose of this research program was to test theory, as summarized in Pinker's (1981) principle of graph difficulty, using a variety of graph formats and different experimental task settings. In this context, time was therefore emphasized as the dependent measure, as opposed to accuracy, although the inclusion of an accuracy measure was important to control for possible significant time-accuracy tradeoffs.
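The tradeoff check mentioned above, testing the relationship between the time and accuracy scores captured, can be sketched in present-day terms as follows. This is a minimal illustration with hypothetical per-subject data (the variable names and numbers are invented, not taken from the experiments): a strong correlation between elapsed time and accuracy would flag a tradeoff that should be accounted for before time differences between graph formats are interpreted.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical per-subject scores: mean elapsed time (seconds) and
# accuracy (% of questions answered correctly) on the same task.
times = [4.1, 5.3, 3.8, 6.0, 4.7, 5.5]
accuracy = [88, 95, 82, 97, 90, 93]

r = pearson_r(times, accuracy)
print(f"time-accuracy correlation r = {r:.3f}")
```

A value of r near zero would support interpreting the time scores on their own, whereas a large positive r (slower subjects scoring higher) would indicate that subjects traded speed for accuracy, clouding any time-based comparison between graph formats.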
Moreover, for theory testing, external validity may be considered of relatively less importance (Cook & Campbell, 1979) than internal validity.† Consequently, the laboratory experiment was the most appropriate research strategy for performing this series of studies (see Jenkins, 1985; Benbasat, 1988) because it afforded high internal validity.

† This refers to the validity with which the relationship existing, or not existing, between the independent and dependent variables investigated may be inferred (Benbasat, 1988).

b. Implications of Specific Limitations

The major limitations specific to the research, and their implications for the applicability of the findings, are:

1. Graphics -- This research evaluates various graph formats. It is not intended to compare tables with graphs, although the tabular format could well be tested as an extension of this research. It is also limited to monochrome time series graphics displays generated on a micro-computer screen. The applicability of the findings from this investigation is therefore more or less restricted to choosing among alternative graphical representations for depicting time series data on a micro-computer.

2. Tasks -- The research concerns only the most usual purpose for which business time series graphics are used: extracting elementary quantitative information. The focus is on graph comprehension rather than on complex decision making. The individual is the unit of analysis; group effects are not addressed. Findings based on this research, therefore, may not be directly applicable to complex graphical analysis tasks involving group decision making in organizations, although the findings may certainly be used as a foundation for generating a priori hypotheses for relatively complex tasks.

3.
Information Complexity -- Due to the limited number of variables that can be investigated simultaneously in any one experiment, factors of information complexity have been limited to variations in the depicted time periods and datasets. Other potential complexity factors, such as the Schutz "degree of line-crossing" effect, have been controlled. The applicability of the findings to complex forms of time series graphs with multiple line crossings and irregularities is thus limited (see Schutz, 1961a,b; Lauer, 1986; Yoo, 1985).

Finally, evaluations of the various graph formats for performing various tasks are based only on time and accuracy. Some subjective data were collected from closed and open-ended questions administered to participants† after completion of the computerized session, but these data have not been statistically analyzed as they lie outside the intended scope of the research. No findings concerning the subjective opinions of participants are thus reported in this write-up.

† I.e., Appendix J.

C. SUGGESTIONS FOR FUTURE STUDIES

The experimental design, task, subject, and treatment conditions used in the present studies allow for easy replications, refinements, and extensions. In fact, E2 and E3 were simply replications or extensions of E1. At the conceptual level, researchers interested in theories may work towards a more refined version of existing theories to accommodate current and previous findings. Studies may also be done on building a taxonomy of tasks or on how complex tasks may be decomposed. The contributions of time-and-motion studies on task analysis in the field of Industrial Engineering may be of relevance for such a line of research. There is also a need to identify and evaluate other factors of information complexity controlled and/or neglected in the present studies (see Schutz, 1961a,b; Lauer et al., 1985).
At the empirical level, studies are still needed to test various other aspects of existing graphics theories, such as the way data are organized in memory and the processes or computations that may be carried out in reading various graphics. Another area for future researchers concerns the learning effects of various forms of information presentation (see DeSanctis & Jarvenpaa, 1985). This relates also to the need for graphics researchers to examine issues regarding the prior knowledge, experience, and familiarity of subjects with the graphics presentations to be investigated (see Pinker, 1981; Simcox, 1981; 1983b). As a direct extension of the current research program, practicing managers and professionals, expert graph readers, or others could be studied as replacements for student subjects. As well, new and interesting tasks such as data or relationship clustering may be studied. Moreover, composite tasks which combine elements of the fundamental tasks investigated could be introduced. There is also the possibility of replicating this series of experiments with colored graphics so as to study the influence of color on graph comprehension. In addition, as newer forms of graph format are developed, they can be tested and compared with the older forms.

To conclude, there are numerous possible extensions of the current research program. The overall objective of such research programs should be the testing of existing theories as well as the development of a cumulative and coherent body of knowledge in the information systems areas of color and graphics. Ultimately, the validity of any graphics theory must rest on a continuous cycle of refinements and substantiations based on empirical testing.

XI. BIBLIOGRAPHY

Anderson, J. R., & Bower, G. H. Human Associative Memory. Washington, DC: Hemisphere Press, 1973.
Anderson, T. W. An Introduction to Multivariate Statistical Analysis. New York, NY: Wiley, 1958.
Baird, J. C., & Noma, E.
Fundamentals of Scaling and Psychophysics. Wiley-Interscience Publication, 1978.
Baroudi, J. J., & Orlikowski, W. J. The Problem of Statistical Power: A Meta-Analysis of MIS Research, 1987.
Benbasat, I. Laboratory Experiments in Information Systems Studies with a Focus on Individuals: A Critical Appraisal. WP-88-MIS-023, 1988.
Benbasat, I., & Dexter, A. S. Value and Events Approaches to Accounting: An Experimental Evaluation. Accounting Review, 54(4), 1979, 735-749.
Benbasat, I., & Dexter, A. S. Individual Differences in the Use of Decision Support Aids. Journal of Accounting Research, 20(1), 1982, 1-11.
Benbasat, I., & Dexter, A. S. An Experimental Evaluation of Graphical and Color-enhanced Information Presentation. Management Science, 31(11), 1985, 1348-1364.
Benbasat, I., & Dexter, A. S. An Investigation of Color and Graphical Information Presentation Under Varying Time Constraints. MIS Quarterly, 10(1), 1986, 59-83.
Benbasat, I., Dexter, A. S., & Todd, P. An Experimental Program Investigating Color-Enhanced and Graphical Information Presentation: An Integration of the Findings. Communications of the ACM, 29(11), 1986, 1094-1105.
Benbasat, I., & Schroeder, R. An Experimental Investigation of Some MIS Design Variables. MIS Quarterly, 1(1), 1977, 37-49.
Benbasat, I., & Taylor, R. N. The Impact of Cognitive Styles on Information System Design. MIS Quarterly, 2(2), 1978, 43-54.
Bregman, A. S. Perception and Behavior as Compositions of Ideals. Cognitive Psychology, 9, 1977, 250-292.
Bertin, J. Semiologie Graphique: Les Diagrammes - Les Réseaux - Les Cartes, 1967; La Semiologie Graphique. The Hague: Mouton-Gautier, 1973.
Bertin, J. Graphics and Graphic Information Processing. Berlin: Walter de Gruyter & Co., 1981.
Bertin, J. The Semiology of Graphics. Madison: University of Wisconsin Press, 1983.
Blalock, H. M. Theory Construction: From Verbal to Mathematical Formulations. Englewood Cliffs, NJ: Prentice-Hall, 1969.
Carter, L. F.
An Experiment on the Design of Tables and Graphs Used for Presenting Numerical Data. Journal of Applied Psychology, 31, 1947, 640-650.
Carter, L. F. Relative Effectiveness of Presenting Numerical Data by the Use of Tables and Graphs. Washington, DC: U.S. Department of Commerce, 1948a.
Carter, L. F. Study of the Best Design of Tables and Graphs Used for Presenting Numerical Data. Washington, DC: U.S. Department of Commerce, 1948b.
Chasen, S. BMDP:P5D - Histograms and Univariate Plots. In Dixon et al. (eds.) BMDP Statistical Software Manual, 1985 Reprinting. Berkeley, CA: University of California Press, 1985.
Chernoff, H. Graphical Representations as a Discipline. In Wang, P. C. C. (ed.) Graphical Representation of Multivariate Data. New York, NY: Academic Press, 1978.
Chervany, N. L., & Dickson, G. W. An Experimental Evaluation of Information Overload in a Production Environment. Management Science, 20, 1974, 1335-1344.
Chervany, N. L., Dickson, G. W., & Kozar, K. A. An Experimental Gaming Framework for Investigating the Influence of Management Information Systems on Decision Effectiveness. Management Information Systems Research Centre, Working Paper 7H2, University of Minnesota, 1971.
Christ, R. E. Review and Analysis of Color Coding Research for Visual Display. Human Factors, 17(6), 1975, 542-570.
Cleveland, W. S. Graphical Methods for Data Presentation: Full Scale Breaks, Dot Charts, and Multibased Logging. The American Statistician, 38(4), 1984, 270-280.
Cleveland, W. S. The Elements of Graphing Data. Monterey, CA: Wadsworth, 1985.
Cleveland, W. S., & McGill, R. A Color-Caused Optical Illusion on a Statistical Graph. The American Statistician, 1983.
Cleveland, W. S., & McGill, R. Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods. Journal of the American Statistical Association, 79(387), 1984, 531-554.
Cleveland, W. S., & McGill, R.
Graphical Perception and Graphical Methods for Analyzing Scientific Data. Science, 229, 1985, 828-833.
Cleveland, W. S., Harris, C. S., & McGill, R. Experiments on Quantitative Judgments of Graphs and Maps. The Bell System Technical Journal, 62(6), July-Aug. 1983, 1659-1674.
Cochran, W. G. Some Consequences When the Assumptions for the Analysis of Variance Are Not Satisfied. Biometrics, 3, 1947, 22-38.
Cohen, J. Statistical Power Analysis for the Behavioral Sciences. New York, NY: Academic Press, 1965; 1977.
Cook, T. D., & Campbell, D. T. Quasi-Experimentation: Design and Analysis Issues for Field Settings. Boston, MA: Houghton Mifflin, 1979.
Croxton, F. E. Further Studies in the Graphic Use of Circles and Bars. Journal of the American Statistical Association, 22, 1927, 36-39.
Croxton, F. E., & Stein, H. Graphical Comparison by Bars, Squares, Circles, and Cubes. Journal of the American Statistical Association, 27, 1932, 54-60.
Croxton, F. E., & Stryker, R. E. Bar Charts versus Circle Diagrams. Journal of the American Statistical Association, 22, 1927, 473-482.
Culbertson, H. M., & Powers, R. D. A Study of Graph Comprehension Difficulties. AV Communication Review, 7, 1959, 97-100.
Davis, G. B., & Olson, M. H. Management Information Systems: Conceptual Foundations, Structure, and Development. New York, NY: McGraw-Hill, 1985.
Davis, L. R. The Effects of Question Complexity and Form of Presentation on the Extraction of Question-Answers from an Information Presentation. Ph.D. Dissertation, Indiana University, 1985.
Davis, L. R., Groomer, S. M., Jenkins, A. M., Lauer, T. W., & Kwan, Y. Content Validation of a Metric of Question Complexity. Discussion Paper, Indiana University, 1985.
Davidson, M. L. The Multivariate Approach to Repeated Measures. Paper presented at the 1980 meetings of the American Statistical Association, Houston, TX, August 11-14, 1980.
DeSanctis, G. Computer Graphics as Decision Aids: Directions for Research.
Decision Sciences, 15(4), 1984, 463-487.
DeSanctis, G., & Jarvenpaa, S. L. An Investigation of the 'Tables versus Graphs' Controversy in a Learning Environment. In Proceedings of the Sixth International Conference on Information Systems, December 1985.
Dickson, G. W. Management Information System Definitions, Problems and Research. Society for Management Information Systems Newsletter, 1, 1971, 6-12.
Dickson, G. W., DeSanctis, G., & McBride, D. J. Understanding the Effectiveness of Computer Graphics for Decision Support: A Cumulative Experimental Approach. Communications of the ACM, 29(1), 1986.
Dickson, G. W., Senn, J. A., & Chervany, N. L. Research in Management Information Systems: The Minnesota Experiments. Management Science, 23(9), 1977, 913-923.
Dixon, W. J., et al. BMDP Statistical Software Manual, 1985 Reprinting. Berkeley, CA: University of California Press, 1985.
Dixon, W. J., & Massey, F. J. Introduction to Statistical Analysis (3rd ed.). New York, NY: McGraw-Hill, 1969.
Dubin, R. Theory Building. New York, NY: The Free Press, 1978.
Dunn, O. J. Multiple Comparisons Among Means. Journal of the American Statistical Association, 56, 1961, 52-64.
Eells, W. C. The Relative Merits of Circles and Bars for Representing Component Parts. Journal of the American Statistical Association, 21, 1926, 119-132.
Ehrenberg, A. S. C. Graphs or Tables. The Statistician, 27, 1978.
Ehrenberg, A. S. C. What We Can and Can't Get from Graphs, and Why. London Business School, Discussion Paper, 1985.
Elashoff, J. D. Analyzing Repeated Measures Designs Requires More Than Tests on Means. Proc. Am. Stat. Assoc., 1985.
Elashoff, J. D. Analysis of Repeated Measures Designs. BMDP Technical Report #83, 1986.
Ericsson, K. A., Chase, W. G., & Faloon, S. Acquisition of a Memory Skill. Science, 208, 1980, 1181-1182.
Feliciano, G. D., Powers, R. D., & Bryand, E. K. The Presentation of Statistical Information. AV Communication Review, 11(3), 1963, 32-39.
Frane, J.
W. The Univariate Approach to Repeated Measures: Foundation, Advantages, & Caveats. BMDP Technical Report #69, 1980.
Fodor, J. A. The Modularity of Mind. Cambridge, MA: MIT Press, 1983.
Garner, W. R. The Stimulus in Information Processing. American Psychologist, 25, 1970, 350-358.
Garner, W. R., & Felfoldy, G. Integrality and Separability of Stimulus Dimensions in Information Processing. Cognitive Psychology, 1, 1970, 225-241.
Geisser, S., & Greenhouse, S. W. On Methods in the Analysis of Profile Data. Psychometrika, 24, 1959, 95-112.
Ghani, J. A. The Effects of Information Representation and Modification on Decision Performance. Ph.D. Dissertation, University of Pennsylvania, 1981.
Gibson, E. J., Bergman, R., & Purdy, J. The Effect of Prior Training with a Scale of Distance on Absolute and Relative Judgments of Distance over Ground. Journal of Experimental Psychology, 50, 1955, 97-104.
Glass, G. V., & Hopkins, K. D. Statistical Methods in Education and Psychology (2nd ed.). Englewood Cliffs, NJ: Prentice-Hall, 1984.
Glass, G. V., Peckham, P. D., & Sanders, J. R. Consequences of Failure to Meet Assumptions Underlying the Fixed Effects Analyses of Variance and Covariance. Review of Educational Research, 42(3), 1972, 237-287.
Glass, G. V., & Stanley, J. C. Statistical Methods in Education and Psychology. Englewood Cliffs, NJ: Prentice-Hall, 1970.
Grice, H. P. Logic and Conversation. In P. Cole & J. L. Morgan (eds.) Syntax and Semantics 3: Speech Acts. New York, NY: Academic Press, 1975.
Hack, H. R. B. An Empirical Investigation into the Distribution of the F-Ratio in Samples from Two Non-normal Populations. Biometrika, 45, 1958, 260-265.
Hinton, G. E. Some Demonstrations of the Effects of Structural Descriptions in Mental Imagery. Cognitive Science, 3, 1979.
Hopkins, K. D., & Anderson, B. L. Multiple Comparisons Guide. Journal of Special Education, 7, 1973, 319-328.
Huber, G. P.
Cognitive Style as the Basis for MIS and DSS Designs. Management Science, 29(5), 1983, 567-579.
Ives, B. Graphical User Interfaces for Business Information Systems. Management Information Systems Quarterly, Special Issue, 1982, 15-42.
Jarvenpaa, S. L. An Investigation of the Effects of Choice Tasks and Graphics on Information Processing Strategies and Decision Making Performance. Seminar Paper, University of Minnesota, 1986.
Jarvenpaa, S. L., & Dickson, G. W. Managing the Use of Computer Graphics in Organizations. MISRC-WP-85-11.
Jarvenpaa, S. L., & Dickson, G. W. Graphics and Managerial Decision Making: Research Based Guidelines. Communications of the ACM, 31(6), 1988, 764-774.
Jarvenpaa, S. L., Dickson, G. W., & DeSanctis, G. Methodological Issues in Experimental IS Research: Experiences and Recommendations. MIS Quarterly, 9, 1985, 141-156.
Jenkins, A. M. Research Methodologies and MIS Research. Indiana University, Discussion Paper No. 277, 1985.
Julesz, B. Textons, the Elements of Perception, and Their Interactions. Nature, 290, 1981, 91-97.
Keen, P. G. W. The Implications of Cognitive Style for Individual Decision-Making. Ph.D. Dissertation, Harvard University, 1973.
Keen, P. G. W. MIS Research: Reference Disciplines and a Cumulative Tradition. Paper presented at the First International Conference on Information Systems, 1980.
Keen, P. G. W., & Scott Morton, M. S. Decision Support Systems: An Organizational Perspective. Reading, MA: Addison-Wesley, 1978.
Keppel, G. Introduction to Design & Analysis. San Francisco, CA: W. H. Freeman & Co., 1980.
Kerlinger, F. N. Foundations of Behavioral Research. New York, NY: Holt, Rinehart, & Winston, 1973.
Kosslyn, S. M. Mental Representation. In J. R. Anderson & S. M. Kosslyn (eds.) Tutorials in Learning and Memory: Essays in Honor of Gordon H. Bower. San Francisco, CA: Freeman, 1982.
Kosslyn, S. M.
Graphics and Human Information Processing: A Review of Five Books. Journal of the American Statistical Association, 80(391), 1985, 499-512.
Kosslyn, S. M., Pinker, S., Simcox, W. A., & Parkin, L. P. Understanding Charts and Graphs: A Project in Applied Cognitive Science. NIE 400-79-0066, National Institute of Education (ED), Washington, D.C., 1983.
Kubovy, M. Concurrent Pitch Segregation and the Theory of Indispensable Attributes. In M. Kubovy & J. Pomerantz (eds.) Perceptual Organization. Hillsdale, NJ: Lawrence Erlbaum Press, 1982.
Larkin, J. H., & Simon, H. A. Why a Diagram Is (Sometimes) Worth Ten Thousand Words. Cognitive Science, 11, 1987, 65-99.
Lauer, T. W. The Effects of Variations in Form of Presentation and Information Complexity on Performance in an Information Extraction Task. Ph.D. Dissertation, Indiana University, 1986.
Lauer, T. W., Davis, L. R., Groomer, S. M., Jenkins, A. M., & Kwan, Y. Establishment of the Content Validity of a Metric of Information Set Complexity. Discussion Paper, Indiana University, 1985.
Lehman, J., Vogel, D., & Dickson, G. Nine Trends in Business Graphics Use. Datamation, 30(19), 1984, 119-122.
Lindman, H. R. Analysis of Variance in Complex Experimental Design. San Francisco, CA: W. H. Freeman & Co., 1974.
Lindsay, P. H., & Norman, D. A. Human Information Processing. New York, NY: Academic Press, 1977.
Lucas, H. C. An Experimental Investigation of the Use of Computer-based Graphics in Decision Making. Management Science, 27(7), 1981, 757-768.
Lucas, H. C., & Nielsen, N. R. The Impact of the Mode of Information Presentation on Learning and Performance. Management Science, 26(10), 1980, 982-993.
Lusk, E. J. A Test of Differential Performance Peaking for a Disembedding Task. Journal of Accounting Research, 17(1), 1979, 286-294.
Lusk, E. J., & Kersnick, M. The Effect of Cognitive Style and Report Format on Task Performance: The MIS Design Consequences. Management Science, 25(8), 1979, 787-798.
Macdonald-Ross, M. How Numbers Are Shown: A Review of Research on the Presentation of Quantitative Data in Texts. Audiovisual Communication Review, 25(4), 1977a, 359-409.
Macdonald-Ross, M. Research in Graphic Communication. Review of Research in Education, 5, 1977b, 49-85.
Mason, R. O., & Mitroff, I. I. A Program for Research on Management Information Systems. Management Science, 19(5), 1973, 475-487.
Marr, D. Vision. San Francisco, CA: W. H. Freeman, 1982.
Marr, D., & Nishihara, H. K. Representation and Recognition of the Spatial Organization of Three-Dimensional Shapes. Proceedings of the Royal Society, 200, 1978, 269-294.
Miller, G. A. The Magical Number 7±2: Some Limits on Our Capacity for Processing Information. Psychological Review, 63(2), 1956, 81-97.
Miller, G. A., & Johnson-Laird, P. Language and Perception. Cambridge, MA: Harvard University Press, 1976.
Miller, R. G. Simultaneous Statistical Inference. New York, NY: McGraw-Hill, 1966.
Minsky, M. A Framework for Representing Knowledge. In P. H. Winston (ed.) The Psychology of Computer Vision. New York, NY: McGraw-Hill, 1975.
Mock, T. J. Concepts of Information Value and Accounting. Accounting Review, Oct. 1971, 765-778.
Mock, T. J., & Vasarhelyi, M. A. Context, Findings, and Methods in Cognitive Style Research: A Comparative Study. Research Working Paper 531A, Columbia University Graduate School of Business, Sept. 1983.
Moriarity, S. Communicating Financial Information through Multidimensional Graphics. Journal of Accounting Research, 17, 1979, 205-223.
Morton, J. A Singular Lack of Incidental Learning. Nature, 215, 1967, 203-204.
Nickerson, R., & Adams, M. Long-Term Memory for a Common Object. Cognitive Psychology, 11, 1979, 287-307.
Norman, D. A., & Rumelhart, D. E. (eds.) Explorations in Cognition. San Francisco, CA: W. H. Freeman & Company, 1975.
Norton, D. W. An Empirical Investigation of Some Effects of Nonnormality and Heterogeneity on the F-distribution. Ph.D.
Dissertation, State University of Iowa, 1952.
Palmer, S. E. Visual Perception and World Knowledge: Notes on a Model of Sensory-Cognitive Interaction. In D. A. Norman & D. E. Rumelhart (eds.) Explorations in Cognition. San Francisco, CA: Freeman, 1975.
Pearson, E. S. The Distribution of Frequency Constants in Small Samples from Non-normal Symmetrical and Skew Populations. Biometrika, 21, 1929, 259-286.
Pearson, E. S. The Analysis of Variance in Cases of Non-normal Variation. Biometrika, 23, 1931, 114-133.
Peterson, L. V., & Schramm, W. How Accurately Are Different Kinds of Graphs Read? AV Communication Review, 2, 1954, 178-189.
Pinker, S. A Theory of Graph Comprehension. Occasional Paper #15, Cambridge, MA: MIT Center for Cognitive Sciences, 1981.
Pinker, S. Pattern Perception and the Comprehension of Graphs. NIE 400-79-0066, National Institute of Education (ED), Washington, D.C., 1983.
Pinker, S., & Kosslyn, S. M. Theories of Mental Imagery. In A. A. Sheikh (ed.) Imagery: Current Theory, Research, and Application. New York, NY: Wiley, 1983.
Pisoni, D., & Tash, J. Reaction Times to Comparisons within and across Phonetic Categories. Perception and Psychophysics, 15(2), 1974, 285-290.
Powers, M., Lashley, C., Sanchez, P., & Shneiderman, B. An Experimental Comparison of Tabular and Graphic Data Presentation. International Journal of Man-Machine Studies, 20, 1984, 545-566.
Price, J. R., Martuza, V. R., & Crouse, J. H. Construct Validity of Test Items Measuring Acquisition of Information from Line Graphs. Journal of Educational Psychology, 66(1), 152-156.
Remus, W. An Empirical Investigation of the Impact of Graphical and Tabular Data Presentations on Decision Making. Management Science, 30(5), 1984, 533-542.
Rider, P. R. On the Distribution of the Ratio of Mean to Standard Deviation in Small Samples from Non-normal Populations. Biometrika, 21, 1929, 124-143.
Robey, D. Cognitive Style and DSS Design: A Commentary on Huber's Paper. Management Science, 29(5), 1983, 580-582.
Rock, I. Perception. New York, NY: W. H. Freeman, 1984.
Sage, A. P. Behavioral and Organizational Considerations in the Design of Information Systems and Processes for Planning and Decision Support. IEEE Transactions on Systems, Man, and Cybernetics, SMC-11(9), 1981, 640-678.
Schank, R., & Abelson, R. Scripts, Plans, Goals, and Understanding. Hillsdale, NJ: Lawrence Erlbaum, 1977.
Schmid, C. F. Handbook of Graphic Presentation. New York, NY: Ronald Press, 1954.
Schmid, C. F., & Schmid, S. E. Handbook of Graphic Presentation. New York, NY: John Wiley, 1979.
Schutz, H. G. An Evaluation of Formats for Graphic Trend Displays. Human Factors, 3(3), 1961a, 99-107.
Schutz, H. G. An Evaluation of Methods for Presentation of Graphic Multiple Trends. Human Factors, 3(2), 1961b, 108-119.
Shneiderman, B. Software Psychology: Human Factors in Computer and Information Systems. Winthrop Pub., 1980.
Simcox, W. A. Cognitive Considerations in Display Design. NIE 400-79-0066, National Institute of Education (ED), Washington, D.C., 1981.
Simcox, W. A. Configural Properties in Graphic Displays and Their Effects on Processing. NIE 400-79-0066, National Institute of Education (ED), Washington, D.C., 1983a.
Simcox, W. A. Memorial Consequences of Display Coding. NIE 400-79-0066, National Institute of Education (ED), Washington, D.C., 1983b.
Simcox, W. A. A Method for Pragmatic Communication in Graphic Displays. Human Factors, 26(4), 1984, 483-487.
Streufert, S., & Streufert, S. C. Behavior in the Complex Environment. Washington, DC: V. H. Winston & Sons, 1978.
Takeuchi, H., & Schmidt, A. H. New Promise of Computer Graphics. Harvard Business Review, 58(1), Jan.-Feb. 1980, 122-131.
Thorndyke, P. W. Applications of Schema Theory in Cognitive Research. In J. R. Anderson & S. M. Kosslyn (eds.) Tutorials in Learning and Memory. W. H. Freeman & Co., 1984.
Tufte, E. R. The Visual Display of Quantitative Information. Cheshire, CT: Graphics Press, 1983.
Tullis, T. S.
An Evaluation of Alphanumeric, Graphic, and Color Information Displays. Human Factors, 23(5), 1981, 541-550.
Vasarhelyi, M. A. Information Processing in a Simulated Stock Market Environment. In Proceedings of the Second International Conference on Information Systems, 1981.
Vernon, M. D. The Use and Value of Graphical Material in Presenting Quantitative Data. Occupational Psychology, 26, 1952, 22-34.
Vessey, I. The Tables vs. Graphs Controversy: An Information Processing Analysis. Working Paper, Graduate School of Business, University of Pittsburgh, 1987.
Vicino, F. L., & Ringel, S. Decision-making with Updated Graphic vs. Alphanumeric Information. Washington, D.C.: Army Personnel Research Office, Technical Research Note 178, Nov. 1966.
Wainer, H. How to Display Data Badly. The American Statistician, 38(2), 1984, 137-147.
Wainer, H., & Reiser, M. Assessing the Efficacy of Visual Displays. Proceedings of the American Statistical Association, Social Statistics Section, 1, 1979, 89-92.
Wainer, H., Lono, M., & Groves, C. On the Display of Data: Some Empirical Findings. Washington, DC: The Bureau of Social Science Research, 1982.
Wainer, H., & Thissen, D. Graphical Data Analysis. Annual Review of Psychology, 32, 1981, 191-241.
Washburne, J. N. An Experimental Study of Various Graphic, Tabular and Textual Methods of Presenting Quantitative Material. Journal of Educational Psychology, 18(6), 1927, 361-376; 465-476.
Weick, K. E. Laboratory Experimentation with Organizations. In J. G. March (ed.) Handbook of Organizations. Chicago, IL: Rand McNally, 1965.
Wertheimer, M. Laws of Organization in Perceptual Forms. In W. D. Ellis (ed.) A Source Book of Gestalt Psychology. London: Routledge and Kegan Paul Ltd., 1938.
Wilcox, W. Numbers and the News: Graph, Table or Text? Journalism Quarterly, 41(1), 1964, 38-44.
Winer, B. J. Statistical Principles in Experimental Design. New York, NY: McGraw-Hill, 1962; 1971.
Winston, P. H.
Learning Structural Descriptions from Examples. Artificial Intelligence Report MAC TR-76, MIT, 1975.
Witkin, H. A., Oltman, P. K., & Raskin, E. Manual: Embedded Figures Test, Children's Embedded Figures Test, Group Embedded Figures Test. Palo Alto, CA: Consulting Psychologists Press Inc., 1971.
Yoo, K. H. The Effects of Question Difficulty and Information Complexity on the Extraction of Data from an Information Presentation. Ph.D. Dissertation, Indiana University, 1985.
Zipf, G. K. The Psycho-biology of Language. Boston, MA: Houghton-Mifflin, 1935.
Zmud, R. W. Individual Differences and MIS Success: A Review of the Empirical Literature. Management Science, 25(10), 1979, 966-979.
Zmud, R. W., Blocher, E., & Moffie, R. P. The Impact of Color Graphic Report Formats on Decision Performance and Learning. Proceedings of the Fourth International Conference on Information Systems, 1983, 179-193.

XII. APPENDIX A: GLOSSARY OF TERMS

This Glossary defines certain of the specialized terms used in this dissertation. When pertinent definitions were readily available from published sources, they have either been quoted verbatim or adapted. In all instances, however, the definitions reflect the meaning of the term as used in the context of this dissertation.

1. Absolute-Value -- A scale whose units are discrete and well-defined, such as the number of jellybeans in a jar (Pinker, 1981).

2. Cognitive Style -- The process behavior that individuals exhibit in the formulation or acquisition, analysis, and interpretation of information or data of presumed value for decision making (Sage, 1981; Huber, 1983). It categorizes individual habits and strategies at a fairly broad level and essentially views problem-solving behavior as a personality variable (Keen & Scott Morton, 1978).

3. Color -- Color is the sensation of the variation in the wavelengths of the light reflected by a surface (see Bertin, 1981, p. 187; Lauer, 1986).

4.
Conceptual Question -- A conceptual question is simply a piece of information that the reader wishes to extract from a graph (Pinker, 1981, p. 19).

5. Dataset Category -- The number of datasets depicted as separate categories using a particular coding scheme. Each category will thus correspond to a cluster of data which forms a unit or entity. These entities are normally spelled out by a legend in time-series graphics.

6. Dimension -- A dimension is defined by bipolar endpoints, discriminable into articulated parts (Streufert & Streufert, 1978, p. 31).

7. Graph Format -- The different forms of representation, such as bars, symbols, lines, wedges, pies, etc., used for presenting quantitative information in time-series graphics.

8. Learning -- The cognitive skill of extracting patterns and relationships presented in time-series graphics and applying the extracted information to answering forced-choice questions (see DeSanctis & Jarvenpaa, 1985).

9. Ordinal and Disordinal Two-Factor Interactions -- Two-factor interactions are classified as ordinal or disordinal depending on changes in effects across levels of one factor with respect to the other factor. If the effects found across levels of one factor (say, factor A) are consistently higher (or lower) for each and every level of the other factor (say, factor B), then the interaction is said to be an "ordinal" one. Otherwise, it is disordinal. In other words, the graphing of an ordinal two-factor A x B interaction will not have any crossing of effect lines (see the Graph Format by Dataset interaction of E1). Conversely, that of a disordinal A x B interaction will have effect lines crossing each other (see the Graph Format by Question Type interaction of E1).

10. Orientation -- The orientation of a mark is its angle with reference to some set direction, such as the horizontal or vertical axis on a graph (see Bertin, 1981, p. 187; Lauer, 1986).

11.
Power -- The probability of rejecting the null hypothesis given that a particular hypothesis alternative to the null is true. Equivalently, the power of a test is one minus its probability of leading to a Type II error for that alternative hypothesis (Glass et al., 1972, p. 284).

12. Primitive Symbol -- This refers to the elements used in graphing quantitative data (see Cleveland, 1985). These elements are perceived as the 'unit' symbol in a graphics representation.

13. Principle of Invited Inference -- According to Kosslyn et al. (1983), this refers to the use of proper scaling and other standards in designing graphics so as not to invite misleading interpretations on the part of graph readers. Huff (1954) is a classic on how to lie with statistical graphs.

14. Principle of Contextual Compatibility -- According to Kosslyn et al., this principle refers to the fact that since most graphic displays are embedded in a context, it is important that "the context and semantic interpretation of the display ... be compatible or comprehension of the display will be impaired" (Kosslyn et al., 1983, p. 50). The principle parallels that proposed by Grice (1975) for language comprehension in the context of an oral presentation.

15. Quantity Scaling -- This refers to the mathematical representation of quantities on the ordinate scale of time-series graphics. Such a scale is useful for interpolating exact values.

16. Question Type -- The various sorts of tasks that are performed with the various kinds of time-series graphics used in the series of experiments conducted in this research. These tasks take the form of forced binary-choice questions (see Appendices B, C, and D).

17. Ratio-Value -- Unlike the absolute-value, this refers to those quantities that may be represented continuously but whose units are arbitrary in that they may be changed to other units without any loss of information.
For example, dollars could be changed to a different unit like cents with no loss of information; the inches-feet-yards scale is another example (Gibson et al., 1955).

18. Scale-Value -- This refers to the quantitative value of a datapoint represented on the ordinate scale. Reading the scale values of datapoints involves the interpolation of mathematical units on the ordinate.

19. Schema -- A schema is a knowledge structure comprising a cluster of knowledge representing a particular generic object, percept, procedure, event or sequence of events, social situation, etc. Such a cluster provides a skeleton structure for a concept that can be activated or filled out with the detailed properties of the particular instance being represented (see Thorndyke, 1984).

20. Shape -- This visual variable refers generally to a mark of constant size but which can vary in form; that is, the outline of the mark can vary (see Bertin, 1981, p. 187; Lauer, 1986).

21. Size -- This visual variable refers generally to perceived unit area that can vary from one size to another (see Bertin, 1981, p. 187; Lauer, 1986).

22. Texture -- A texture may be described as the number of marks or shapes per some standard unit of area depicted as a regular pattern (see Bertin, 1981, p. 187; Lauer, 1986).

23. Theory -- The term theory may be defined as an unambiguous statement of (1) the entities in a system, or (2) the lawlike relations among them (Pinker & Kosslyn, 1983, p. 44).

24. Time Period Variation -- The number of time periods that are depicted along the abscissa or x-axis of a time-series graphic.

25. Value -- This visual variable refers to the ratio between the total amounts of black and white (see Bertin, 1981, p. 187; Lauer, 1986).

XIII.
APPENDIX B: GRAPHICS & QUESTIONS FOR TRIALS IN EXPERIMENT 1

Data Sources for Generating Graphics

Company A Revenue Report
Periods: month 10 7 1
A: 89 77 48 53 61 23 20

Revenue Report for Various Companies
Periods: month 10 7 3
A: 65 73 74 57 22 27 29
B: 86 82 93 76 65 60 50
C: 76 78 81 64 40 30 35

Company A Revenue Report
Periods: label 10 14 1
A: 98 91 83 77 74 68 56 53 40 23 26 27 32 32

Revenue Report for Various Companies
Periods: label 10 14 3
A: 57 64 67 73 74 42 50 57 63 66 41 23 18 9
B: 99 97 93 89 87 81 68 78 85 90 67 58 49 30
C: 75 82 90 85 84 65 56 63 68 75 56 43 36 24

[Figures pp. 268-298: alternating "Company A Revenue Report" and "Revenue Report for Various Companies" graphs (Revenue in 1000's Dollars); the graphics are not reproducible in this text version.]
[Figures pp. 299-301: "Company A Revenue Report" and "Revenue Report for Various Companies" graphs (Revenue in 1000's Dollars); the graphics are not reproducible in this text version.]

XIV. APPENDIX C: GRAPHICS & QUESTIONS FOR TRIALS IN EXPERIMENT 2

Data Sources for Generating Graphics

Company B Revenue Report
Periods: month 10 7 1
B: 89 77 48 69 61 23 20

Revenue Report for Various Companies
Periods: month 10 7 3
A: 65 73 74 57 22 27 29
B: 86 82 93 76 65 60 64
C: 76 78 81 64 40 30 35

Company B Revenue Report
Periods: label 10 14 1
B: 98 91 87 78 74 68 56 53 40 23 26 27 32 32

Revenue Report for Various Companies
Periods: label 10 14 3
A: 57 64 67 73 74 42 50 57 61 66 41 23 18 9
B: 99 97 93 89 91 81 68 78 85 90 67 58 49 30
C: 75 82 90 85 84 65 56 63 68 75 56 43 36 24

[Figures pp. 305 ff.: alternating "Company B Revenue Report" and "Revenue Report for Various Companies" graphs (Revenue in 1000's Dollars); the graphics are not reproducible in this text version.]
[Sample question screens recovered from pp. 334-335; the accompanying graphs are not reproducible in this text version:]

B's LARGEST REVENUE CHANGE OCCURS BETWEEN
(1) PERIOD PAIRS ...
(2) ANOTHER PAIR OF CONSECUTIVE PERIODS
PRESS <RETURN> IF READY TO SEE GRAPH

B's LARGEST REVENUE CHANGE OCCURS BETWEEN
(1) PERIOD PAIR 10 & 11
(2) ANOTHER PAIR OF CONSECUTIVE PERIODS
PRESS <RETURN> IF READY TO SEE GRAPH

[Figures pp. 336-340: "Company B Revenue Report" and "Revenue Report for Various Companies" graphs (Revenue in 1000's Dollars); the graphics are not reproducible in this text version.]
— i - « APPENDIX GGRAPHICS & QUESTIONS FOR TRIALS IN EXPERIMENT 2 / 340 Revenue Report for Various Companies (Revenue in 1000's Dollars) tit '_' K O L . J Ui KJC". L U ~ 1 o FJLS •x 2T. l U n—e i — i i a: J F.K. LU UtC Q «-5 O CE I™J • Cr: tjj 2 1 LL. 2 3 J/1 B— j j j L""l I_J LU 23 LD _ J 7* I_I O 2Z S 1 ! LU B " 1 S J 1 Ci S—5 LU B X LU C J 23 I J_I — . «—5 LU • Vt --- ---XV. APPENDIX D:CRAPHICS & QUESTIONS FOR TRIALS IN EXPERIMENT 3 Data Sources for Generating Graphics Revenue Report for Various Companies Periods month 10 7 2 96 85 73 68 65 55 48 A 89 79 48 53 61 23 20 B Revenue Report for Various Companies Periods month 10 7 3 65 72 74 55 22 30 39 A 86 82 93 76 65 62 50 B 10 20 30 40 18 25 15 C Revenue Report for Various Companies Periods label 10 14 2 64 55 62 68 70 62 50 42 30 16 13 10 7 4 A 89 91 83 78 76 64 56 53 40 22 26 27 32 32 B Revenue Report for Various Companies Periods label 10 14 3 50 55 67 73 74 42 50 56 68 76 41 23 18 9 A 99 97 93 89 87 81 78 80 90 95 70 62 50 40 B 62 72 75 80 85 75 65 70 85 90 61 55 43 30 C 341 APPENDIX D:GRAPHICS & QUESTIONS FOR TRIALS IN EXPERIMENT 3 / 342 Revenue Report for Various Companies (Revenue in 1000's Dollars) L i J ix'., cc ui LLS Ld _r_ 7.C i— i—i zx a cc J~ • cc«_» zc r.» cc: »— ~ LJ cc rc ' . • " i i i - *~ • r r n zc a — •_.« cc i APPENDIX D:GRAPHICS & QUESTIONS FOR TRIALS IN EXPERIMENT 3 / 3 4 3 Revenue Report for Various Companies (Revenue in 1000's Dollars) L: Ii • • S.C» E—1 «/« F . i J ! fjiJI Cri E — t—a a o—a U'Z JjJ •r cc .r: zsz • !_! I I -!•*-« I - -Q D I_I >-<K 1 LU n I— s*. a 5 oc v4 " APPENDIX DrCRAPHICS & QUESTIONS FOR TRIALS IN EXPERIMENT 3 / 344 Revenue Report for Various Companies (Revenue in 1000's Dollars) 1!" h p! I II! i S-'j I-r [I ».»s U J : j r " cc VJ. iii Z C I—5 r x : 1 TZ a i--1 LU fit. CC X7 a cc LU r r - « «— 27, CC v i a . 
CC XT 7C C3 I !•--|nrH LU • CC t -4 U J APPENDIX DrGRAPHICS & QUESTIONS FOR TRIALS IN EXPERIMENT 3 / 345 Revenue Report for Various Companies (Revenue in 1000's Dollars) J Z I 3: Z z L - -1 " <'Z : 1 L _ . _ •}•••••;-• -r J „.Y c „•.. _ _ -J Z z ill u> U i EJT." Ill Ul ZI Ui zz? UJ. f i r -t i t : £...„ E—S :rx O a t--« Ul a. ui 31 Z-t~~ ~zz •11 Ui CL. KSZ xz zr. a I_I i i - • i i -xt " a o I . J -v-JjLJ « DC u. ! — a = AJ. U i APPENDIX DrCRAPHICS & QUESTIONS FOR TRIALS IN EXPERIMENT 3 / 346 Revenue Report for Various Companies (Revenue in 1000's Dollars) iZi Ul !—t tV*; UJ O-Lii ZZB t—t n U J •••„-"' O c V e U i '71 ZC «..,« £— U t U J tj[j ixl »'SZ ~J U J »- UJ DC w O ^ a: •_.« -~r~ "m nV" <n •2- !Lt! " '.'"l LL. »— • r E a = 2 D Z = l_i «X ^ I " |.-. .-. w I THI fl"'J ui . | ' — * " — - 0 . APPENDIX D:GRAPHICS & QUESTIONS FOR TRIALS IN EXPERIMENT 3 / 347 Revenue Report for Various Companies (Revenue in 1000's Dollars) $51 ni! I III. r: T Z o t - - « Cs:. Hi Hi u Is I i I i—t t — zzi. '_.» b j t.ri f.-:n re: ciii p I FV?| iiTi I U J L U III Cd P lb? i E.iJ CE _ E «„.! |— U J u_ CE .-J UJ CC u_ • cc'_' cc L'"! _ CC x r r c ra I_I i i - -rc h E 5 C"4 " APPENDIX D:CRAPH1CS & QUESTIONS FOR TRIALS IN EXPERIMENT 3 / 348 Revenue Report for Various Companies (Revenue in 1000's Dollars) E 3 t-4 LU EJ_ LU ™ i tr-t g—. «._! LU *.'"« -1— .JU— a i t s — LU LU LLI LU m i i l/» 3" I'M < H 1^ L ' l LU m L C ' LU LL. •r xr re • I_I i I--= K & 5 LU « =^  s— a 1 i T " LU LC , ~ * . U i '.'"I APPENDIX D:GRAPHICS & QUESTIONS FOR TRIALS IN EXPERIMENT 3 / 349 Revenue Report for Various Companies (Revenue in 1000's Dollars) 3 i : r: z i i r CC U J Cu UJ 73 EZ3 UJ 72 t — UJ! Cft U J cc rc i_i i — U J IJ_ cc _j cc UJ rc r« cc U'l _ cc r: rc a I_I i i - -i CC u_ XT a I.I rc ! — ca CC APPENDIX D:GRAPHICS & QUESTIONS FOR TRIALS IN EXPERIMENT 3 / 350 Revenue Report for Various Companies (Revenue in 1000's Dollars) 1 • • CE lit i— LU —r-. LU US is'.. a ! „ . 
: Ut Ci i [•••s 6— LU c n re t — i n o c ».«"i CL. OC x r CC • !__! I I---I—< oc * - . Q LE. , R L U S I — L d = iT* LU a: . 1,1 CM « APPENDIX D:GRAPHICS & QUESTIONS FOR TRIALS IN EXPERIMENT 3 / 351 Revenue Report for Various Companies (Revenue in 1000's Dollars) r z z r " LU L'J Ld LU t'r". -.1 CC o J—0 t — t . J Li.1 tCr__T E— SJJ LT" — i — a cc LU zc »— cc >/i a . cc xr rc a _ i i i --• |-«-« I - ' XT w ! _ u ! « L U LU ^ rc >*• I a g I_I cc " C"JJ Ui APPENDIX D:GRAPHICS & QUESTIONS FOR TRIALS IN EXPERIMENT 3 / 352 Revenue Report for Various Companies (Revenue in 1000's Dollars) z " I * L U Mi' z i : Z Z L U :~s — i -• • i . IUJ L U h'.sr' ...,J a:: r..J c— LU £--"J I t-fl L : i— '.••I L U tJTS ™ f " " • . cc LU ZC Z«" cc "/I LL. cc xz zc a I_I i i - - -= 1 CC « rfz -XZ -> a. S LU « ZC u. zz. ^  CC " C"4 y APPENDIX D:GRAPHICS & QUESTIONS FOR TRIALS IN EXPERIMENT 3 / 353 Revenue Report for Various Companies (Revenue in 1000's Dollars) L U Hi L U I'J ..«Ji a: o 8--; E—. !....c L U P X . fJ.1 LU i — i — i — r i — r X 3 - AS _ t _ " ~ LU « - T I E u . «/t u - «— I s i - « _ APPENDIX D:CRAPHICS & QUESTIONS FOR TRIALS IN EXPERIMENT 3 / 354 Revenue Report for Various Companies (Revenue in 1000's Dollars) Vt-t— tir.1 us Ex! I!x5 z?z £ E-a ZX a B—C Ex! tu., • LsJ «— i : on «/i a. CC 32 rc in I_I i i •-• i T T i n i s IX. UJ XZ w 3- fc I— S i CE " CM ^  " CL. APPENDIX D:GRAPHICS & QUESTIONS FOR TRIALS IN EXPERIMENT 3 / 355 Revenue Report for Various Companies (Revenue in 1000's Dollars) l 1 F pi* rirl 1 ?t.<;ii ii lis! W m 1 1 all sa r o K " " (-•» Uil E X : cc LUI «.»It U J S.1.1 Z " H UJ! «_ r c £— E—fl ~£ a E~0 L I T . UJ ill-CC UJ ZC Z"* cc '.""l l _ _ cc xz rc rrj I_I i !•«-« I---IJL. Lil X„ w n _ •_j UJ UJ <* I — a I _c = cc ^ C--I -< "—- a. APPENDIX D:CRAPHICS & QUESTIONS FOR TRIALS IN EXPERIMENT 3 I 356 Revenue Report for Various Companies (Revenue in 1000's Dollars) mr I 1 I IT'!1! ks'l r-. t— ».•"« L U L'l-L U E/6 111 1 L U L U B—B IL.J a :--« ir! LU L U . 
• r LU ZC >" «r L'"l LL. •r x: x a : i L L . x: a »_> &: LU re i cn •"TP* (S Uf> I \-r< «N APPENDIX D:CRAPHICS & QUESTIONS FOR TRIALS IN EXPERIMENT 3 / 357 Revenue Report for Various Companies (Revenue in 1000's Dollars) APPENDIX D:GRAPH1CS & QUESTIONS FOR TRIALS IN EXPERIMENTS / 358 Revenue Report for Various Companies (Revenue in 1000's Dollars) APPENDIX DrGRAPHICS & QUESTIONS FOR TRIALS IN EXPERIMENT 3 / 359 Revenue Report for Various Companies (Revenue in 1000's Dollars) APPENDIX D:CRAPHICS & QUESTIONS FOR TRIALS IN EXPERIMENT 3 7 360. Revenue Report for Various Companies (Revenue in 1000's Dollars) I i-t Ul. FX, 1 L U rr?. L J mi r.:a a c LiJi Id L»J E!U 13 1 I H>ril i ! Si! 1 "*r~ cc zc < / B 1x1 tn •x: _ J LU ri-ce xz * hi E3 CI 8 B -o cn I/B IX. cc rr rc m I_I i i-—• I"-" 1x1 « rc 1— • = CC «'-J UJ i -APPENDIX D:GRAPH1CS & QUESTIONS FOR TRIALS IN EXPERIMENT 3 / 361 Revenue Report for Various Companies (Revenue in 1000's Dollars) CB C3 B—I LU tJL. LU . u t—a t-.. r j LiJ •/! — I — m l_i T-1 LU LU r c T — LU flu LU i n or rc I_I B — '.'I LU i n ix: CC • cc LU cc :« CC «/i LX. CC ZC • u I I--. I CSL LU CC 3C • = » T * uj i - l <-4 ui APPENDIX DrGRAPHICS & QUESTIONS FOR TRIALS IN EXPERIMENT 3 / 362 Revenue Report for Various Companies (Revenue in 1000's Dollars) 6 " r A 5JU I— LU . J _ . txl LU O I!— L.J LU C€. 5~! r"ji • r.i {_  iii in a _ j LiJ rc «_r az CC XT rc rn I_I i i - • 'ZC * rr ui XT L> rn u ° Z-* CJ U l LU •* rc i -I — rn = _ _ _ LC.' 
^ ~ _3 OC LU APPENDIX D:GRAPHICS & QUESTIONS FOR TRIALS IN EXPERIMENT 3 / 363 " v ••• Revenue Report for Various Companies (Revenue in 1000's Dollars) APPENDIX D:GRAPHICS & QUESTIONS FOR TRIALS IN EXPERIMENT 3 / 364 Revenue Report for Various Companies (Revenue in 1000's Dollars) m fen t • • -71— U i i— SJJ — j U.J; t > H 1 '.•I n r • or nr 0T o.i_ or n r c a 1 U L , L U XT ^ «_J -LsJ * rc »— i - - <-j L U APPENDIX D:CRAPHICS & QUESTIONS FOR TRIALS IN EXPERIMENT 3 / 365 Revenue Report for Various Companies (Revenue in 1000's Dollars) APPENDIX D:CRAPHICS & QUESTIONS FOR TRIALS IN EXPERIMENT 3 / 366 Revenue Report for Various Companies (Revenue in 1000's Dollars) APPENDIX D:CRAPHICS & QUESTIONS FOR TRIALS IN EXPERIMENT 3 / 367 Revenue Report for Various Companies (Revenue in 1000's Dollars) APPENDIX DrCRAPHICS & QUESTIONS FOR TRIALS IN EXPERIMENT 3 / 368 Revenue Report for Various Companies (Revenue in 1000's Dollars) APPENDIX D:CRAPHICS & QUESTIONS FOR TRIALS IN EXPERIMENT 3 / 369 Revenue Report for Various Companies (Revenue in 1000's Dollars) APPENDIX D:CRAPHICS & QUESTIONS FOR TRIALS IN EXPERIMENT 3 / 370 Revenue Report for Various Companies (Revenue in 1000's Dollars) APPENDIX D:CRAPHICS & QUESTIONS FOR TRIALS IN EXPERIMENT 3 / 371 Revenue Report for Various Companies (Revenue in 1000's Dollars) APPENDIX D:GRAPHICS & QUESTIONS FOR TRIALS IN EXPERIMENT 3 / 372 Revenue Report for Various Companies (Revenue in 1000's Dollars) APPENDIX D:GRAPHICS & QUESTIONS FOR TRIALS IN EXPERIMENT Revenue Report for Various Companies (Revenue in 1000's Dollars) 3 A-373 \ \ \ iiZJi o t~'i Ul Ul £..:., l.J 111 '•.•"»' o a r.c !!••••- • S.1.S rc ! „ 1 j.™ Wi Ui lZ? FJcC >x: „ j rc r»« ZZ • r «/i CL. >T. 
XVI. APPENDIX E: SUBJECT RECRUITMENT FORM

COMPUTER GRAPHICS STUDY

We are conducting a study to determine which graphical presentations are most appropriate for performing various tasks. Your participation in this exercise will provide you with the opportunity to:

1. Interact with a micro-computer graphics program on an IBM-compatible XT system, and
2. Receive a $10 participation bonus, in addition to cash prizes awarded based on your performance level, in appreciation of your volunteering time and effort.

Participation in this study is strictly VOLUNTARY. You may withdraw from the exercise at any time at your own discretion. It is expected that the total time you will spend is approximately 1.5 hours, allocated as follows:

a. Completing the Embedded Figures Test (20 minutes);
b. Performing a practice exercise to familiarize yourself with the use of the computer system and the experimental procedures (20 minutes);
c. Performing a series of self-paced question-and-answer exercises using computer-generated graphics (approx. 25 minutes);
d. Performing a second series of exercises (approx. 25 minutes).

All participants who complete the experiment will be awarded a cash prize of $10 plus an amount of up to $25 awarded based on your accuracy and time performance relative to others. There will be a total of approximately 30 other participants.

XVII.
APPENDIX F: SUBJECT CONSENT FORM

COMPUTER GRAPHICS STUDY: CONSENT FORM

I, the undersigned, hereby agree to participate in all of the related experimental procedures designed for the abovementioned study (see the attached form -- Recruitment Form). It is also my understanding that I may, if I wish, withdraw from the experiment at any time at my own discretion, without jeopardy to class standing.

SIGNED:
Student ID:

XVIII. APPENDIX G: OUTLINE OF EXPERIMENTAL PROCEDURES

EXPERIMENTAL PROCEDURES

- Greet subject;
- Tell subject the different steps that will take place;
- Ask subject for his/her signature on the consent form;
- Administer the GEFT test;
- Instruct subject about the Practice Session and encourage him/her to ask questions during this session;
- Begin the Practice Session;
- Ask questions to measure subject's understanding of the graphics presentations and the various tasks to be performed;
- Track answers for each task performed during this session until an error is detected;
- Stop subject and explain why the mistake was made before letting him/her continue further;
- If more than 3 wrong answers surface during the practice session, break and advise the subject to rerun the session;
- Start the Actual Session and remind subject to try hard to be as ACCURATE and QUICK in responding as possible. Remind the subject of the cash bonus he/she could win for good performance;
- Give participant a break if s/he needs it;
- Start the Second Exercise and remind subject again of the cash bonus he/she could win for good performance;
- Request participant to fill out the questionnaire. Be sure to explain what a Line Graph, Bar Chart, and Scatter Plot are;
- Pay participant and ask him/her to sign a receipt;
- Save participant's answers onto the floppy disk and immediately print a hard copy;
- Grade his/her GEFT score and place the results, with the other forms filled out by him/her, into a folder laid away in a safe place. Each folder should have 1.
The consent form; 2. Participant's GEFT score; 3. Subject's completed questionnaire; 4. Receipt for the amount compensated; 5. A hard copy of subject's results as stored in the computer.

XIX. APPENDIX H: INSTRUCTIONS FOR SUBJECTS

SCREEN 1:

Please read the following instructions carefully: You will first go through a "Practice Session" to become familiar with the procedures of the "Actual" experimental run. Since no data are collected during this first session, do not hesitate to ask questions of the lab assistant. Notice that all of the necessary instructions are normally written at the bottom of the screen.

REWARDS!! As a participant, you will receive a special gift for your time and effort. Be aware too of your contribution to the advancement of our knowledge on graphics comprehension! Thank you for coming and we hope you will enjoy using the graphic terminal.

Please continue with the rest of the instructions by pressing "RETURN".

SCREEN 2:

This exercise is designed intentionally as self-paced. You are free to spend as much time as you like on the question that first appears with no accompanying graph on the screen. This is to familiarize yourself with the information that will be required from the accompanying graph, which appears after you press the "RETURN" key. Notice that your response is timed from the moment the graph and instructions appear together. Record your "RESPONSE" by pressing the "RETURN" key. This task is repeated for every trial in all experiment sessions. Each time you will be prompted by the onset of a "BELL". Please respond as quickly as possible.

Your assistant will be with you throughout this practice to ensure that you encounter no difficulties.

SCREEN 3:

Finally, notice that during the course of each trial, after you press return indicating that you are ready to see the graph, there may be a slight delay before the graph comes on the screen.
Answer the question by pressing the appropriate key, followed immediately by hitting the "RETURN" key. Respond as quickly as possible; however, be sure that your answer is correct. You may begin the "PRACTICE" session by entering your name and pressing the "RETURN" key.

XX. APPENDIX I: PILOT TESTING REPORT

Summary of Pilot Study

The purpose of this pilot study was to test the strength of the experimental treatments and the experimental procedures for my dissertation research.

Method

A computer program was written to present each subject with various graphical representations and questions on a microcomputer. The program was designed so that each trial consisted of a displayed question followed by the appropriate graph when requested. When a subject indicated his/her answer by entering a choice key, s/he automatically moved to the next trial. Subjects received instructions while interacting with the program and could clarify any difficulties with the experimenter, who was always available during the course of the experiment.

Subjects were initially given a series of twelve practice trials and advised accordingly by the experimenter, particularly when they showed any non-conforming behavior with regard to experimental procedures. These practice trials, aimed at familiarizing subjects with the computer system and the experimental procedures, were typical of the question-and-answer sessions used in the actual experimentation. It took most subjects about half an hour to complete the whole experiment.

Variables

There are four independent and two dependent variables in this experiment. The two dependent variables are the accuracy of the answers and the time it takes to answer the questions asked. The four independent variables are factors that have to do with the different types of graphical designs and the questions. Three types of graphical representations used are bars, lines and symbols.
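The per-trial flow described in the Method section (question shown first, graph shown on request, response timed from graph onset to answer key) can be sketched in modern terms as follows. This is a hedged, illustrative re-implementation, not the thesis's Turbo Pascal program; `run_trial`, `get_key`, and the sample question text are invented for the sketch.

```python
# Hypothetical sketch of one self-paced pilot trial on a console.
import time

def run_trial(question, graph, get_key):
    """Show the question, then the graph on request, and time the answer.

    `get_key` is any callable returning the subject's next keypress;
    timing runs from graph onset to the answer key, as in the pilot.
    """
    print(question)          # question appears first, untimed
    get_key()                # subject presses RETURN when ready
    print(graph)             # graph (a text placeholder here) appears
    start = time.perf_counter()
    answer = get_key()       # '1' or '2' answer key
    latency = round(time.perf_counter() - start, 2)  # hundredths of a second
    return answer, latency

# Scripted "subject" for demonstration: a ready-RETURN, then answer '2'.
keys = iter(['', '2'])
answer, latency = run_trial("A's REVENUES IN PERIOD 4 ARE ____ THAN $80,000?",
                            "<graph would be drawn here>", lambda: next(keys))
```

In the real experiment the answer key also advanced the program to the next trial; here the function simply returns the answer and its latency for recording.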
These graphs are designed as time-series with 7 or 14 periods depicted along the abscissa and 1 or 3 companies as the units to be examined. Three types of questions used in the pilot testing are:

1. Exact Questions - those asking for the comparison of the value of one point to an exact value along the ordinate;
2. Comparison Questions - those asking for the comparison of the values of two adjacent points along the abscissa for a particular company;
3. Trend Questions - those asking for the trend of a range of points along the abscissa.

Examples of each of these question types are provided in Appendix I of this report.

In this study, the accuracy score is binary, with 1 assigned to correctly answered questions and 0 to incorrectly answered questions. Time is measured to a hundredth of a second from the moment the graph appears to the moment the subject presses the answer key.

Experimental Design

A full factorial within-subject experimental design is used (three question types by three graphical forms by two abscissa periods by two dataset groupings). Every subject receives all thirty-six treatment combinations. Sixteen subjects, one undergraduate and fifteen graduate business students, participated in the pilot study. Three of the subjects repeated the experiment, yielding a total of nineteen complete datasets. The statistical analysis was based on these nineteen complete datasets, since none of the subjects showed any sign of, or ever complained of, fatigue or loss of interest during either normal or repeated testings.

Findings

The analysis of the findings is divided into two parts: one dealing with the statistical analysis of speed of responses across all nineteen complete datasets; the other, a report of the frequency of errors committed across the different types of treatment combinations.
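The full factorial structure described in the Experimental Design section can be sketched as follows. This is an illustrative modern reconstruction (the thesis used a Turbo Pascal program); the factor names and the use of a seed to stand in for a subject identifier are assumptions of the sketch.

```python
# Sketch of the 3 x 3 x 2 x 2 within-subject design: 36 treatment
# combinations, shuffled into a fresh random presentation order per subject.
import itertools
import random

FORMS = ['bars', 'lines', 'symbols']
QUESTIONS = ['exact', 'comparison', 'trend']
PERIODS = [7, 14]             # points along the abscissa
GROUPINGS = [1, 3]            # companies depicted per graph

def treatment_order(subject_seed):
    """All 36 factorial cells, in a subject-specific random order."""
    cells = list(itertools.product(FORMS, QUESTIONS, PERIODS, GROUPINGS))
    random.Random(subject_seed).shuffle(cells)
    return cells

order = treatment_order(subject_seed=7)
```

Randomizing the presentation order per subject is also how the report below says order effects were controlled in the repeated-measures design.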
Response Time Analysis

A five-way ANOVA, based on the appropriate error term for each F test, with subject treated as a random factor, reveals the following significant main and interaction effects:

ANOVA Results (p-values)

Effect                                     p-value
Grouping (1 or 3 company units)            .0001
Question (Exact, Comparison or Trend)      .0001
Period (7 or 14 periods)                   .0310
Form X Question                            .0002
Grouping X Form                            .0011
Grouping X Form X Question                 .0282

These results provide strong evidence that the independent variables manipulated are significant factors affecting the use and understanding of graphical representations. A table of mean values for the more significant effects is provided in Appendix II of this report. A more careful examination of the strong Form X Question interaction effect (see Appendix II) showed that line representations were best suited to Trend Questions, whereas bars were most appropriate when answering Exact Questions. In terms of the other interaction effects, symbol representations appeared to be helpful to subjects only when representing multiple companies, not a single company, whereas subjects faced the most difficulty when using multiple, although not single, bar representations.

As a matter of fact, Cohen's (1977) approach for evaluating the statistical power of the F-test for main effects reveals that even with a proposed value of .15 for the effect size index f, which is very conservative for most of the main effects studied here (see operational definitions of f in Cohen, 1977, p. 348), a power level of about .89 is attained for 19 observations per cell. Due to repeated measures and the economy of the within-subject design, a power level of .95 can be obtained with just twenty-five subjects. Consequently, twenty-five subjects and different random treatment orders are suggested for each experiment in the dissertation series.
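The role of Cohen's effect size index f in F-test power can be illustrated with a small Monte Carlo sketch. Note the hedges: this simulates a one-way, between-subjects design with k = 3 levels, which is a deliberate simplification of the thesis's repeated-measures analysis, so its power figures will not reproduce the .89 and .95 values quoted above; all function names and defaults are invented for the sketch.

```python
# Monte Carlo estimate of one-way ANOVA F-test power for a given
# Cohen's f (ratio of the sd of the cell means to the within-cell sd).
import random
import statistics

def anova_F(groups):
    """One-way ANOVA F statistic for a list of equal-sized groups."""
    k, n = len(groups), len(groups[0])
    grand = statistics.mean(x for g in groups for x in g)
    ss_between = n * sum((statistics.mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2
                    for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (k * n - k))

def power(f, n, sims=1000, seed=0):
    """Estimated power at alpha = .05 with n observations per cell."""
    rng = random.Random(seed)
    d = f * (3 / 2) ** 0.5            # means (-d, 0, d) have sd equal to f
    def sample(means):
        return [[rng.gauss(m, 1.0) for _ in range(n)] for m in means]
    null_F = sorted(anova_F(sample([0.0, 0.0, 0.0])) for _ in range(sims))
    crit = null_F[int(0.95 * sims)]   # empirical 5% critical value
    hits = sum(anova_F(sample([-d, 0.0, d])) > crit for _ in range(sims))
    return hits / sims
```

For small f, a between-subjects test like this one has far less power than the within-subject design used in the thesis, which removes between-subject variance from the error term; that economy is what allows high power with only 19 to 25 subjects.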
Accuracy Score Analysis

There are two kinds of errors made by subjects: first, the 'logical' error, in which a subject wanting to press the correct answer key instead presses the wrong one; and second, the 'pure' error, in which a subject presses the wrong answer key believing it to be correct. The first kind of error could be reduced by means of longer practice; the second kind, by means of feedback or by giving the subject another chance to attempt a right answer to the same question. While the pilot test experiment was not initially designed to distinguish and trace these various kinds of errors for analysis, it has since been modified to do so for the dissertation experiments.

However, one reassuring fact is that subjects in general made very few errors, with the worst score still achieving a high 75% accuracy, and the rest above 83%. The average overall accuracy attained is above 92%! A frequency count analysis of the number and types of errors made revealed errors occurring at most only once or twice for most treatment combinations, except for the following, which recorded 3 or more instances of errors committed by various subjects:

Accuracy Results

Treatment Combination                                    No. of Errors   No. of Subjects
No. 12 (14-period multi-lines on Exact Questions)             13               13
No. 11 (7-period multi-lines on Exact Questions)               7                7
No. 32 (14-period multi-symbols on Exact Questions)            4                4
No. 35 (7-period multi-lines on Comparison Questions)          3                3

From this and the earlier results, we see interesting problems with the perception of certain graphical representations for certain tasks. For example, multiple line representations, which yielded the most trouble in subjects' ability to respond quickly to Exact Questions, also caused the most errors in the pressing of the answer key! This is true for all multiple line graphics, whether designed with 7 or with 14 abscissa periods.
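The binary accuracy scoring and per-treatment error tally described above can be sketched as follows. The trial records here are invented for illustration; only the scoring rule (1 for correct, 0 for incorrect) and the frequency-count idea come from the report.

```python
# Sketch of binary accuracy scoring and a per-treatment error count.
from collections import Counter

trials = [
    # (treatment_no, subject, correct_answer, given_answer) -- made-up data
    (12, 1, '2', '1'),
    (12, 2, '2', '2'),
    (11, 1, '1', '2'),
    (12, 3, '1', '2'),
]

scores = [(t, 1 if given == correct else 0)
          for t, _, correct, given in trials]
errors = Counter(t for t, score in scores if score == 0)
# errors[12] == 2 and errors[11] == 1 for the made-up records above
```

A table like the Accuracy Results above is then just this counter restricted to treatments with three or more errors.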
One way to reduce this error is to relax the perceptual accuracy demanded by these exact questions, such as making the exact value comparison easier to extract; otherwise, an alternative is to have the subject repeat incorrectly answered questions at the end of all 36 trials. It would then only be a matter of time before he/she learns that the answer given earlier was wrong.

Conclusion

The objective of the pilot testing was to investigate whether changes should be made to the experimental programs and procedures. Interviews with various subjects suggested the possibility of having them replicate the experiment without loss of interest. In the post-test interview, subjects were specifically asked about the extent to which they had difficulty coping with the various questions asked, whether parts of the instructions were hard to understand, and whether they were getting bored or tired during the course of the experiment. Most subjects were happy with the instructions of the experiment and clearly indicated that they would be willing to go for another run should their results be lost or erased accidentally. A few subjects who complained about the selection of the answer keys and the need to remember what these keys represented found that, after some practice sessions, they became quite comfortable with the '1' or '2' key to be pressed as answer keys. More dissatisfaction was expressed when the keys were modified to a different set like '0' and '1', and so on.

One incident which clearly indicated the effectiveness of the experimental procedure occurred when an earlier version of the experimental program was mistakenly used on one subject, requiring him to go through a series of over 70 continuing trials! The experimenter finally stopped this subject and, much to his surprise, this same subject was still eager to
volunteer to run the newer version of the experimental program without one complaint of fatigue!

The experimental procedures were also designed to overcome the two major weaknesses of a completely repeated design, namely, order effects and carryover effects. The order effects are controlled by randomly varying the order of the treatments across subjects, whereas the carryover effects can be adequately controlled via randomization and replications of the actual experimental session. Put together, the pilot test results successfully indicate that:

1. The independent variables have been operationalized properly and are strong enough to cause significant effects;
2. Much of the task demand was simple enough for subjects to respond quickly and accurately;
3. Most, if not all, of the graphics presentations were adequate for answering the questions asked and not confusing to the subjects;
4. The instructions and the other procedures associated with the experiment are sound.

Appendix I: Examples of Various Question Types

Exact Question:
A's REVENUES IN PERIOD 4 ARE ____ THAN $80,000? (1) LESS (2) MORE

Comparison Question:
A's REVENUES IN PERIOD 8 ARE ____ THAN THAT IN PERIOD 9? (1) LOWER (2) HIGHER

Trend Question:
A's REVENUES FROM PERIOD 5 TO 7 ARE GENERALLY ____? (1) DECREASING (2) INCREASING

Appendix II: Table of Means for Response Time (Form X Question X Grouping)

Mean response time (in seconds):

                    Symbols            Bars              Lines
                 Q1    Q2    Q3     Q1    Q2    Q3     Q1    Q2    Q3
Overall         4.2   3.3   3.2    3.7   3.8   3.3    5.1   3.1   2.8
1 Company       3.9   3.0   2.8    2.8   3.0   2.4    4.0   2.9   2.3
3 Companies     4.5   3.6   3.7    4.6   4.6   4.3    6.2   3.2   3.3

Note: Q1 - Exact Questions; Q2 - Comparison Questions; Q3 - Trend Questions.

XXI. APPENDIX J: QUESTIONNAIRE FOR SUBJECTS

Graphics Questionnaire

DIRECTIONS

Please react to the following statements about the information system you have been using. There are no right or wrong answers, as this is not a test.
We are interested only in your opinions about how well the graphics presentations used in the experiment supported your comprehension process. On the scale below, please circle the answer which best corresponds to your opinion. For instance, if the statement was:

This room is very cold today.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

Then, circle:
1. If you thought it was very cold;
2. If you thought it was cold;
3. If you thought it was cool;
4. If you were indifferent;
5. If you thought it was warm;
6. If you thought it was hot;
7. If you thought it was very hot.

ACCURACY

1. The contents of the LINE GRAPHS were very accurate.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

2. The contents of the BAR CHARTS were very accurate.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

3. The contents of the SCATTER PLOTS were very accurate.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

UNDERSTANDING

4. The LINE GRAPHS were very easy to understand.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

5. The BAR CHARTS were very easy to understand.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

6. The SCATTER PLOTS were very easy to understand.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

RELEVANCE

16. The LINE GRAPHS contained exactly the right type of information.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

17. The BAR CHARTS contained exactly the right type of information.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

18. The SCATTER PLOTS contained exactly the right type of information.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

FORMAT

10. The LINE GRAPHS were very well formatted.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

11. The BAR CHARTS were very well formatted.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

12. The SCATTER PLOTS were very well formatted.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

INFORMATIVENESS

13. The LINE GRAPHS contained too much information.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

14. The BAR CHARTS contained too much information.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

15. The SCATTER PLOTS contained too much information.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

USEFULNESS

7. The LINE GRAPHS were very useful for answering the questions.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

8. The BAR CHARTS were very useful for answering the questions.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

9. The SCATTER PLOTS were very useful for answering the questions.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

CLARITY

19. The LINE GRAPHS clearly indicated when Revenues were high or low.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

20. The BAR CHARTS clearly indicated when Revenues were high or low.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

21. The SCATTER PLOTS clearly indicated when Revenues were high or low.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

SATISFACTION

22. My overall satisfaction with the LINE GRAPHS is best described as:
Very satisfied 1 2 3 4 5 6 7 Very dissatisfied.

23. My overall satisfaction with the BAR CHARTS is best described as:
Very satisfied 1 2 3 4 5 6 7 Very dissatisfied.

24. My overall satisfaction with the SCATTER PLOTS is best described as:
Very satisfied 1 2 3 4 5 6 7 Very dissatisfied.

RESPONSIVENESS

25. I found that the graphics reports appeared very quickly the moment I asked for them.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.
CONSISTENCY

26. I believe my approach to performing the various tasks remained fairly consistent throughout the experiment.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

OTHERS

27. I found the questions to be very easy to understand.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

28. I found the questions to be meaningless without first looking at the accompanying graphics reports.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

29. I think I can tell the kind of graphics that would be appearing just by reading the question asked.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

30. If one hundred other participants were faced with this same exercise (that is, the question set and information presentations being exactly like yours), how many do you think would perform better (in terms of both speed and accuracy) than you would?
Enter a number between 0 and 100: ____

31. What, in your opinion, are the strengths/weaknesses of the reports used to answer the questions?
a. LINE GRAPHS;
b. BAR CHARTS;
c. SCATTER PLOTS;

32. Suggest improvements with regard to:
a. LINE GRAPHS;
b. BAR CHARTS;
c. SCATTER PLOTS;

33. Please candidly discuss your personal effort in this experiment.
a. Did you try hard, or did you give up at any time?
b. How important were the "cash prizes" in determining:
1) Your decision to participate in the experiment.
Not important 1 2 3 4 5 6 7 Very important.
2) Your actual effort during the experiment.
Not important 1 2 3 4 5 6 7 Very important.

Name: ____  Comm. Grad. ____  Comm. Undergrad. ____  Others: ____

Thank you for participating, and please DO NOT DISCUSS the experiment with other participants, as you may unduly influence their performance and learning process.

XXIII.
APPENDIX L: MAIN TURBO PASCAL PROGRAM

program Joseph;

{$I typedef.sys}
{$I Types.INC}
{$I graphix.sys}
{$I kernel.sys}
{$I windows.sys}
{$I hatch.hgh}
{$I bars1.hgh}
{$I dots.hgh}
{$I lines.hgh}
{$I axismodule.hgh}
{$I time.hgh}
{$I Question.hgh}
{$I Draws.hgh}
{$I GetData1.hgh}
{$I MisC1.inc}
{$I Actual.hgh}

procedure FlushBuffer;
var
  bufferhead : integer absolute $0000:$041A;
  buffertail : integer absolute $0000:$041C;
begin
  buffertail := bufferhead;
end;

procedure LoadPic(Gtype: char; qcount: integer);
var
  n: WrkString;
  ext: char;
begin
  { single-digit test restored where the scanned source dropped the operator }
  if qcount < 10 then
    n := chr(qcount + ord('0'))
  else
    str(qcount:2, n);
  case Gtype of
    'B': ext := 'a';
    'D': ext := 'b';
    'L': ext := 'c';
  end;
  filename := 'pic' + n + ext;
  LoadWindowStack(filename);
  readln;
end;

begin
  ClrScr;
  repeat
    Writeln('Enter Experiment Number please');
    readln(FExt);
  until (FExt in ['1','2','3']);
  Writeln('Generate Practice/Real Graph / Run Experiment');
  readln(rmode);
  if rmode in ['s','S'] then
    begin
      Introduction;
      PrepareOutfile(Answerfile);
      writeln('Do you want to skip the practice session ? (Enter Y/N)');
      readln(ch);
      if (ch in ['n','N']) then
        begin
          Assign(Questionfile, 'qpract.' + FExt);
          reset(questionfile);
          InitGraphic;
          for loopcount := 1 to 2 do
            for qcount := 1 to 6 do
              begin
                ClearScreen;
                GetQuestion(Questionfile, Gtype, qcount);
                ResetWindowStack;
                LoadPic(Gtype, qcount);
                RestoreWindow(1, 0, 0);
                Sound(440); Delay(500); NoSound;
                GetAnswer(Answer);
              end;
          LeaveGraphic;
          EndPractice;
          repeat
            Writeln('Enter your password please');
            readln(KBD, password);
            FlushBuffer;
          until (password = 'fs');
          ActualSession;
          EndFSession;
          Close(AnswerFile);
          close(QuestionFile);
        end
      else
        begin
          repeat
            Writeln('Enter your password please');
            readln(KBD, password);
            FlushBuffer;
          until (password = 'ss');
          InitGraphic;
          LeaveGraphic;
          ActualSession;
          EndSession;
          Close(AnswerFile);
        end;
    end
  else if rmode in ['p','P'] then
    begin
      Writeln('enter your password please');
      read(KBD, password);
      if password = 'ken' then
        begin
          PrepareDataFile(infile1);
          GetNoQuestion(infile1, noquestion);
          Writeln('Enter Graph Type');
          readln(y);
          for x := 1 to noquestion do
            begin
              InitGraphic;
              GetGraphData(infile1, DataRec);
              with DataRec do
                begin
                  XYaxis(xlength, ydelt, nograph);
                  case y of
                    1: DrawBars(DataRec);
                    2: DrawDots(DataRec);
                    3: DrawLines(DataRec);
                  end;
                  readln;
                  if x < 10 then
                    n := chr(x + ord('0'))
                  else
                    str(x:2, n);
                  filename := 'pic' + n + chr(y + ord('a') - 1);
                  storewindow(1);
                  SaveWindowStack(filename);
                  LeaveGraphic;
                end;
            end;
        end;
    end
  else if rmode in ['r','R'] then
    begin
      Writeln('Enter the correct password');
      read(KBD, password);
      if password = 'joe' then
        begin
          Assign(infile1, 'dreal.' + FExt);
          Reset(infile1);
          GetNoQuestion(infile1, noquestion);
          Writeln('Enter GraphType');
          readln(y);
          j := (y - 1) * 4;
          for x := 1 to noquestion do
            begin
              GetGraphData(infile1, DataRec);
              InitGraphic;
              with DataRec do
                begin
                  XYaxis(xlength, ydelt, nograph);
                  case y of
                    1: DrawBars(DataRec);
                    2: DrawDots(DataRec);
                    3: DrawLines(DataRec);
                  end;
                  readln;
                  k := j + x;
                  if k < 10 then
                    n := chr(k + ord('0'))
                  else
                    str(k:2, n);
                  filename := 'pictr' + n;
                  storewindow(1);
                  SaveWindowStack(filename);
                end;
              LeaveGraphic;
            end;
        end;
    end
  else
    writeln('invalid input, press any key to continue');
  repeat until keypressed;
end.

XXIV. APPENDIX M: SAMPLE BMDP STATISTICAL PROGRAM

/problem    Title is 'Statistics for Expt. 3 Accuracy/Time'.
/Input      Variables are 8.
            Format is '(2F3,F4,F5,F6,F7,F8,F9.2)'.
            Unit = 4.
/Variable   Names are Trial, Subject, IS, QT, TP, DP, ASCORE, TSCORE.
/Group      Cutpoint(1) is 1.
            Names(1) are Learn, Stable.
            Cutpoints(2) are 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
              15, 16, 17, 18, 19, 20, 21, 22, 23.
            Cutpoints(3) are 1, 2.
            Names(3) are Bars, Symbols, Lines.
            Cutpoints(4) are 1, 2.
            Names(4) are Single, Pair, Range.
            Cutpoints(5) is 1.
            Names(5) are seven, fourteen.
            Cutpoints(6) is 1.
            Names(6) are one, three.
/Histogram  Grouping is IS.
            Variable = ascore, tscore.
/Histogram  Grouping is QT.
            Variable = ascore, tscore.
/Histogram  Grouping is TP.
            Variable = ascore, tscore.
/Histogram  Grouping is DP.
            Variable = ascore, tscore.
/Histogram  Grouping is IS, QT.
            Variable = ascore, tscore.
/Histogram  Grouping is IS, TP.
            Variable = ascore, tscore.
/Histogram  Grouping is IS, DP.
            Variable = ascore, tscore.
/Histogram  Grouping is QT, TP.
            Variable = ascore, tscore.
/Histogram  Grouping is QT, DP.
            Variable = ascore, tscore.
/Histogram  Grouping is TP, DP.
            Variable = ascore, tscore.
/print      TTEST.
/end
