Graphics : theories & experiments Tan, Joseph K. H. 1988

GRAPHICS: THEORIES & EXPERIMENTS

by

JOSEPH K. H. TAN

M.S., The University of Iowa, 1982

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES, Faculty of Commerce & Business Administration

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
28 December 1988

© Joseph K. H. Tan, 1988

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of
The University of British Columbia
Vancouver, Canada

DE-6 (2/88)

ABSTRACT

GRAPHICS: THEORIES & EXPERIMENTS

BY

Joseph K. H. Tan, December, 1988

Supervisor: Dr. Izak Benbasat
Division: Management Information Systems

The primary justification of this research lies in the current thinking among graphics theorists and Management Information Systems researchers that different forms of information representation facilitate different types of tasks, and that it is the task characteristics which essentially influence performance with a given information presentation.

Three experiments were designed to investigate hypotheses drawn from the literature, testing the relative strengths and weaknesses of various graphical representations for answering a series of questions. Three graph formats were studied: bars, symbols, and lines. Time is the primary dependent variable of interest in this research. Accuracy is a secondary criterion.
The tasks investigated involved the extraction of relationships among elementary classes of information depicted on various attribute components of time series data: (1) the Dependent Variable (DV) component (namely, information on scale-values, level relationships, and trends); (2) the Primary Independent Variable (PIV) component (namely, information on abscissa time period); and (3) the Secondary Independent Variable (SIV) component (namely, information on dataset classification).

Experiment 1 tasks involved the extraction of the DV scale-value (Q1), DV level relationship (Q2), and DV trend (Q3) based on specific time period information on the PIV component. Results indicated that lines took longest for Q1 when compared to bars and symbols. Conversely, experiment 2 tasks involved the extraction of time period information based on a specific DV scale-value (Q1), the DV level relationship between two points (Q2), or the DV trend among several points (Q3). No statistically significant time differences were found among the various graph formats. However, lines were less accurate to use than bars for answering Q1.

Experiment 3 tasks involved the extraction of dataset information from the SIV component based solely on a specific DV scale-value (Q1), the DV level relationship between two points (Q2), or the DV trend among several points (Q3). Results revealed that the time required for answering either Q2 or Q3 was longest with bars.

Together, these results strongly indicated that the degree of support provided by a particular graph format for a particular task is heavily dependent upon the matching of task characteristics with graph format characteristics. Having information related to either the answer or the question anchored on the x-axis and/or y-axis was found to influence task performance with the different graph formats investigated. Also, the information complexity of graphics was found to be a function of time periods and/or datasets.
There was only partial evidence to suggest the influence of individual characteristics on performance.

TABLE OF CONTENTS

ABSTRACT  ii
List of Tables  viii
List of Figures  x
Acknowledgements  xiv
I. INTRODUCTION  1
  A. Objective of the Research  1
  B. Scope of the Research  3
  C. Importance of the Research  3
  D. Overview of the Dissertation  7
II. LITERATURE REVIEW  8
  A. Classifying the Empirical Literature  8
  B. Existing Theoretical Development  10
    1. Theories  11
    2. Extensibility of Graphics Theories  11
  C. Chernoff's List of Attributes  12
    1. Chernoff's Theory  14
    2. Implications of Chernoff's Theory  14
  D. Bertin's Image Theory & Taxonomies  15
    1. Definitions  15
    2. Bertin's Theory  17
    3. Implications of Bertin's Theory  19
  E. Graphics Research at Indiana University  19
    1. The Question Construct  20
    2. The Information Set Complexity Construct  23
    3. Implications of Recent MIS Graphics Research Findings  23
  F. Cleveland's Theory of Graphical Perception  24
    1. The Cleveland-McGill Theory  24
    2. Application of the Cleveland-McGill Theory to Graphing Data  26
    3. Implications of Cleveland's Theory  27
  G. The Kosslyn-Pinker Theory of Graph Comprehension  27
    1. The Kosslyn et al. Analytical Scheme  30
    2. The Kosslyn-Pinker Process Model of Graph Comprehension  34
      a. Structures & Processes  34
      b. Pinker's Graph Difficulty Principle  39
      c. Pinker's Treatment of Information Extraction  41
    3. Graph Schema Models  44
      a. A Bar Chart Schema  44
      b. A Symbol Chart Schema  45
      c. A Line Graph Schema  49
    4. Implications of the Kosslyn-Pinker Theory  50
III. THEORETICAL PROPOSITIONS  52
  A. Critical Factors  52
    1. Graph Format  52
    2. Information Complexity  53
    3. The Task Variable  58
    4. Learning  64
  B. Tasks Investigated in this Research  64
    1. Experiment 1 Tasks  65
    2. Experiment 2 Tasks  66
    3. Experiment 3 Tasks  69
  C. Theory & Propositions  72
    1. The Theory Investigated  76
      a. Proposition 1  78
      b. Proposition 2  79
      c. Proposition 3  81
  D. The Anchoring Concept  82
    1. Task Characteristics  82
    2. Graph Format Characteristics  85
    3. Matching Formats to Tasks  86
      a. Proposition 4  86
      b. Proposition 5  87
      c. Proposition 6  87
      d. Proposition 7  88
IV. EXPERIMENTAL METHODOLOGY  90
  A. Experimental Variables  90
    1. The Dependent Variables  91
      a. Time  91
      b. Accuracy  91
    2. The Independent Variables  92
      a. Graph Format  92
      b. Question Type  92
      c. Information Complexity  93
    3. The Session Variable  94
    4. The Covariate  94
  B. Experimental Hypotheses  95
  C. Experimental Design  97
  D. Experimental Procedures  100
  E. Experimental Stimuli  102
V. DATA ANALYSIS: THE REPEATED MEASURES DESIGN  106
  A. The Repeated Measures Design  106
  B. Statistical Analysis Procedures  110
  C. The Experimental Raw Data  111
  D. Examination of the Data Structure  114
    1. The Normality Assumption  114
    2. Homogeneity of Variance/Covariance  116
    3. The Symmetry Condition  117
    4. The Univariate-Multivariate ANOVA/ANCOVA Issue  118
    5. Multiple Comparison Techniques  119
  E. Summary  122
VI. RESULTS: EXPERIMENT 1  123
  A. Time Performance for Combined Sessions  124
    1. The Session Effect  124
    2. The GEFT Measure  125
    3. Additional Outliers  125
    4. The Power Analysis  129
  B. Time Performance for Separate Sessions  130
    1. Significant Effects on Time for Session 1  133
    2. Significant Effects on Time for Session 2  137
      a. Main Factor Effects on Time for Session 2  138
      b. Two-way Interactions on Time for Session 2  139
      c. Three-way Interactions on Time for Session 2  149
  C. Accuracy Performance for Combined Sessions  152
    1. Main Effects on Accuracy for Transformed Data  155
    2. Two-way Interactions on Accuracy for Transformed Data  156
  D. Summary of Experiment E1 Results  158
VII. RESULTS: EXPERIMENT 2  160
  A. Time Performance for Combined Sessions  161
    1. The Session Effect  161
    2. The GEFT Measure  162
    3. Additional Outliers  162
    4. The Power Analysis  162
  B. Time Performance for Separate Sessions  165
    1. Significant Effects on Time for Session 1  165
    2. Significant Effects on Time for Session 2  172
      a. Main Factor Effects on Time for Session 2  172
      b. Two-way Interactions on Time for Session 2  173
      c. Three-way Interactions on Time for Session 2  179
  C. Accuracy Performance for Combined Sessions  183
    1. Main Effects on Accuracy for Transformed Data  183
    2. Two-way Interactions on Accuracy for Transformed Data  185
  D. Summary of Experiment 2 Results  187
VIII. RESULTS: EXPERIMENT 3  189
  A. Time Performance for Combined Sessions  190
    1. The Session Effect  190
    2. The GEFT Measure  190
    3. Additional Outliers  191
    4. The Power Analysis  191
  B. Time Performance for Separate Sessions  194
    1. Significant Effects on Time for Session 1  194
    2. Significant Effects on Time for Session 2  198
      a. Main Factor Effects on Time for Session 2  201
      b. Two-way Interactions on Time for Session 2  201
  C. Accuracy Performance for Combined Sessions  214
    1. Main Effects on Accuracy for Transformed Data  216
    2. Two-way Interactions on Accuracy for Transformed Data  216
  D. Summary of Experiment 3 Findings  221
IX. INTEGRATION OF RESULTS  223
  A. Overview of Key Findings  223
    1. Effects for Time Performance in Session 1  224
    2. Effects for Time in Session 2  227
      a. Main Factor Effect  227
      b. 2-way Interactions  229
  B. Integration of Findings with the Current Literature  234
    1. Learning  235
    2. The Individual Difference Characteristics  236
    3. Task Characteristics  236
    4. Graph Format  238
    5. Information Complexity  238
    6. Perceptual-Cognitive Mechanisms in Graphics Processing  239
  C. Summary  241
X. CONCLUSIONS  242
  A. Summary of Key Findings and Major Contributions  242
    a. Contributions  242
    b. Findings  245
  B. Review of Limitations  246
    a. Limitations  247
    b. Implications of Specific Limitations  248
  C. Suggestions for Future Studies  249
XI. BIBLIOGRAPHY  251
XII. APPENDIX A: GLOSSARY OF TERMS  263
XIII. APPENDIX B: GRAPHICS & QUESTIONS FOR TRIALS IN EXPERIMENT 1  267
XIV. APPENDIX C: GRAPHICS & QUESTIONS FOR TRIALS IN EXPERIMENT 2  304
XV. APPENDIX D: GRAPHICS & QUESTIONS FOR TRIALS IN EXPERIMENT 3  341
XVI. APPENDIX E: SUBJECT RECRUITMENT FORM  378
XVII. APPENDIX F: SUBJECT CONSENT FORM  380
XVIII. APPENDIX G: OUTLINE OF EXPERIMENTAL PROCEDURES  381
XIX. APPENDIX H: INSTRUCTIONS FOR SUBJECTS  383
XX. APPENDIX I: PILOT TESTING REPORT  385
XXI. APPENDIX J: QUESTIONNAIRE FOR SUBJECTS  394
XXII. APPENDIX K: CERTIFICATE OF APPROVAL: ETHICAL REVIEW COMMITTEE  404
XXIII. APPENDIX L: MAIN TURBO PASCAL PROGRAM  405
XXIV. APPENDIX M: SAMPLE BMDP STATISTICAL PROGRAM  412

LIST OF TABLES

Table  Page(s)
3.1: A Classification Scheme for Information Complexity Factors  56
3.2: A General Classification of Graphics Research Tasks  62
3.3: Classes of Elementary Comprehension Tasks  63
3.4: A Comparison of Task Activities for Experiments 1, 2, and 3  67
3.5: Status of Graphical Component Information for Task Activities in Experiments 1, 2, and 3  68
3.6: Information Complexity Manipulated in Experiments E1 and E2  70
3.7: Information Complexity Manipulated in Experiment E3  71
3.8: The Anchoring Concept  84
4.1: A Multi-factor Repeated Measures Experimental Design  98
5.1: Summary of Experimental Raw Datasets  113
6.1: Initial ANCOVA Results for the Full Dataset (Experiment 1)  126
6.2: Time-Accuracy Correlations for Identifying Additional Outliers (Experiment 1, Session 2)  128
6.3: Comparison of ANOVA Results Among Sessions (Experiment 1, Additional Outliers Excluded)  131
6.4: Tables of Means for All Treatment Combinations (Experiment 1, Outliers Excluded)  132
6.5: Summary of Bonferroni Results for Graph Format x Question Type Interaction (Experiment 1, Session 1)  136
6.6: Summary of Bonferroni Results for Graph Format x Question Type Interaction (Experiment 1, Session 2)  141
6.7: Summary of Bonferroni Results for Question Type x Time Period Interaction (Experiment 1, Session 2)  144
6.8: Summary of Bonferroni Results for Graph Format x Dataset Interaction (Experiment 1, Session 2)
6.9: Data Table of Question Type x Time Period x Dataset Interaction (Experiment 1, Session 2)
6.10: Summary of Bonferroni Tests for the Question Type x Time Period x Dataset Interaction (Experiment 1, Session 2)
6.11: Comparison of ANCOVA Results for Sessions 1, 2, and the Transformed Dataset (Experiment 1, Outliers Excluded)
6.12: Mean Values of the Question Type x Time Period Interaction for Transformed Data (Experiment 1, Outliers Excluded)
7.1: Initial ANCOVA Results for Full Dataset (Experiment 2)
7.2: Time-Accuracy Correlations for Identifying Additional Outliers (Experiment 2, Session 2)
7.3: Comparison of ANOVA Results Among Sessions (Experiment 2, Additional Outliers Excluded)
7.4: Tables of Means for All Treatment Combinations (Experiment 2, Additional Outliers Excluded)
7.5: Summary of Bonferroni Results for Graph Format x Question Type Interaction (Experiment 2, Session 1)
7.6: Summary of Bonferroni Results for Graph Format x Dataset Interaction (Experiment 2, Session 2)
7.7: Summary of Bonferroni Results for Time Period x Dataset Interaction (Experiment 2, Session 2)
7.8: Mean Values of Graph Format x Question Type x Time Period Interaction (Experiment 2, Session 2)
7.9: Mean Values of Question Type x Time Period x Dataset Interaction (Experiment 2, Session 2)
7.10: Summary of Bonferroni Results for Question Type x Time Period x Dataset Interaction (Experiment 2, Session 2)
7.11: Comparison of ANCOVA Results for Sessions 1, 2, and Transformed Dataset (Experiment 2, Outliers Excluded)
7.12: Mean Value Tables for Significant Two-factor Interactions for the Transformed Dataset (Experiment 2, Outliers Excluded)
8.1: Initial ANCOVA Results for the Full Dataset (Experiment 3)
8.2: Time-Accuracy Correlations for Identifying Additional Outliers (Experiment 3, Session 2)
8.3: Comparison of ANOVA Results Among Sessions (Experiment 3, Additional Outliers Excluded)
8.4: Tables of Means for All Treatment Combinations (Experiment 3, Outliers Excluded)
8.5: Summary of Bonferroni Results for Graph Format x Question Type Interaction (Experiment 3, Session 1)
8.6: Summary of Bonferroni Results for Graph Format x Question Type Interaction (Experiment 3, Session 2)
8.7: Summary of Bonferroni Results for Question Type x Time Period Interaction (Experiment 3, Session 2)
8.8: Summary of Bonferroni Results for Graph Format x Dataset Interaction (Experiment 3, Session 2)
8.9: Summary of Bonferroni Results for Time Period x Dataset Interaction (Experiment 3, Session 2)
8.10: Comparison of ANCOVA Results for Sessions 1, 2, and the Transformed Dataset (Experiment 3, Outliers Excluded)
8.11: Bonferroni Test Results and Mean Value Table for Question Type x Time Period Interaction of Transformed Data (Experiment 3)
8.12: Mean Value Table for Question Type x Dataset Interaction of Transformed Data (Experiment 3, Outliers Excluded)
8.13: Bonferroni Test Results and Mean Value Table for Time Period x Dataset Interaction of Transformed Data (Experiment 3)
9.1: Overview of Key Findings for Experiments E1, E2, and E3 (Session 1 Results)
9.2: Overview of Main Factor Effect on Graph Format Characteristics for Experiments E1, E2, and E3 (Session 2 Results)
9.3: Overview of Two-Factor Interaction Effects on Graph Format Characteristics for Experiments E1, E2, and E3 (Session 2 Results)

LIST OF FIGURES

Figure  Page(s)
2.1: Three Visual Information Processing Stages  29
2.2: Kosslyn et al.'s Basic Level Constituents  31
2.3: Pinker's Graphic Notation  37
2.4: Pinker's Process Model of Graph Comprehension  40
2.5: Pinker's Information Extraction from a Bar Chart  43
2.6: Pinker's Proposed Bar Chart Schema  46-47
3.1: An Illustration of Major Graphical Components  57
3.2: Pinker's Illustrations of Graph Designs for Trend Reading  77
4.1: An Experimental Procedure Flowchart  103
5.1: A Repeated Measures Design  109
5.2: A Guide for Selection of Multiple-Comparison  120
6.1: Plot and Mean Values of Graph Format x Question Type Interaction (Experiment 1, Session 1)  135
6.2: Plot and Mean Values of Graph Format x Question Type Interaction (Experiment 1, Session 2)  140
6.3: Plot and Mean Values of Question Type x Time Period Interaction (Experiment 1, Session 2)  143
6.4: Plot and Mean Values of Graph Format x Dataset Interaction (Experiment 1, Session 2)  147
7.1: Plot and Mean Values of Graph Format x Question Type Interaction (Experiment 2, Session 1)  170
7.2: Plot and Mean Values of Graph Format x Dataset Interaction (Experiment 2, Session 2)  175
7.3: Plot and Mean Values of Time Period x Dataset Interaction (Experiment 2, Session 2)  177
8.1: Plot and Mean Values of Graph Format x Question Type Interaction (Experiment 3, Session 1)
8.2: Plot and Mean Values of Graph Format x Question Type Interaction (Experiment 3, Session 2)
8.3: Plot and Mean Values of Question Type x Time Period Interaction (Experiment 3, Session 2)
8.4: Plot and Mean Values of Graph Format x Dataset Interaction (Experiment 3, Session 2)
8.5: Plot and Mean Values of Time Period x Dataset Interaction (Experiment 3, Session 2)

ACKNOWLEDGEMENTS

I dedicate this work to my dear wife, Ms. Leonie Tan.
Certainly, without her patient understanding, love, continuing inspiration, and assistance, this project would not have been completed. The responsibility for all omissions and errors must, of course, remain with me.

In terms of my dissertation committee, I must surely give credit and make acknowledgement to Dr. Izak Benbasat, chairman of the MIS division as well as of this research committee, whose valuable assistance in all aspects of this effort can never be sufficiently emphasized. I must also give credit to the other members of my committee, namely Dr. Al Dexter and Dr. Larry Ward, whose continuing support, guidance, and encouragement (however great or small) at the various stages of this research are most dearly appreciated. I would also like to express my sincere appreciation to the members of my Examination committee, all MIS faculty members, and all experimental subjects, because without their participation this dissertation would not have been a success.

I would also like to give special thanks to those faculty members of the Psychology, Statistics, and Health Care and Epidemiology Departments and staff members of the Computing Center who have assisted in the design, statistical analyses, and programming of this project. In particular, I wish to express my gratitude to my colleagues in the Department of Health Care and Epidemiology: Dr. Godwin Eni, for his genuine concern, encouragement, and support; and Dr. John Milsum, for his many excellent suggestions, comments, and insights in proof-reading the script. Finally, I must give credit to my parents and other members of my immediate family for their continuing patience in following me to the end of this journey.

I. INTRODUCTION

Interest in the use and study of graphics can be found across many disciplines such as Statistics, Education, Cartography, Psychology, Human Factors Engineering, and Management Information Systems (MIS).
Yet, graphics research over the last several years has been criticized as consisting of atheoretical studies that are plagued by serious methodological problems and controversial findings (see Ives, 1982; Kosslyn, Pinker, Simcox, & Parkin, 1983; Jarvenpaa, Dickson, & DeSanctis, 1985). Even so, isolated contributions in the area have come from a wide spectrum of theoretical perspectives that are sometimes very difficult to reconcile, including: a semiological approach (e.g. Bertin, 1973, 1983; Wainer, Lono, & Groves, 1982); the application of psychophysical laws (e.g. Weber's and Stevens's laws: Baird & Noma, 1978; Cleveland & McGill, 1984); and an integration of general perceptual and cognitive theory (e.g. Kosslyn et al., 1983; Pinker, 1981, 1983). While several of these recent works, and others (e.g. Benbasat, Dexter, & Todd, 1986; Jarvenpaa & Dickson, 1988), are beginning to offer promising guidelines for graphics designers based on theory and research rather than on debatable rules of thumb, there remains a host of identifiable gaps among the theories that must be filled if knowledge about graphics is to advance (see Vessey, 1987).

A. OBJECTIVE OF THE RESEARCH

The purpose of this investigation is to address the following question:

What are the relative strengths and weaknesses of various graphical representations for different types of managerial data extraction tasks?

Bertin (1983, p. 100) suggests that the basic problem in graphics is to choose the most appropriate graphic design for representing a given set of information. In parallel, the current thinking expressed among MIS graphics researchers (e.g. DeSanctis, 1984; Dickson, DeSanctis, & McBride, 1986; Benbasat et al., 1986; Jarvenpaa & Dickson, 1988) and other graphics theorists (e.g. Pinker, 1981; Kosslyn et al., 1983; Cleveland, 1985) is that different graphical representations facilitate the
Accordingly, a major objective of this research is to test hypotheses based on current beliefs about information representations. next few chapters.  These hypotheses are developed in the  The method of investigation is a program of laboratory experiments testing  possible relationships between various graphic designs and performance on various  elementary  perceptual-cognitive tasks.  In this research, relative strengths and weaknesses of various graphical designs will be evaluated principally on the basis of Time as the major dependent variable for each experiment.  This measure is  t o be defined in terms of latency of responses. Accuracy as measured by the percentages of correct responsest will be a secondary criterion.  Thus, for a generally acceptable level of accuracy in task  performance, the objective is to minimize time required to extract the appropriate data from a graphical presentation.  Different individuals may exhibit different degrees of time-accuracy tradeoff.  Researchers should control for this effect, even more so, when relatively complex tasks are t o be investigated  so as to maintain  experimental results.  a high  internal validity and an unambiguous  interpretation of  This could be done, for example, with the use of control groups and their  performance compared to experimental groups.  In this research, time-accuracy tradeoff effect is  controlled, first, by the inclusion of both time and accuracy measures, and, second, by the elimination from the final analyses of the data of those subjects w h o show a significant time-accuracy tradeoff.  Independent variables of interest included in this research are: 1.  Graph Format  2.  Information Complexity, which includes, a.  Variations in Time Period  b.  Variations in Dataset Category  t The more general term, Accuracy is used instead of the Engineering term, Precision so as to be consistent with the MIS literature on graphics evaluation (e.g. 
Wainer et al., 1982; Davis, 1985; Yoo, 1985; Lauer, 1986). Hence, Perf (Accuracy) = No. of Correct Responses / Total No. of Responses x 100%.

3. Question Type (i.e. task)

A Session variable is used to control for effects due to learning. This implies that not only will subjects assigned to each experiment undergo an initial practice session, but also that they will replicate all thirty-six different treatment combinations, or trials, in the experimental session. Finally, the GEFT (Group Embedded Figures Test) developed by Witkin, Oltman, & Raskin (1971) is to be used as a covariate measure to control for effects due to individual differences among subjects.

Important terms used in the Thesis are defined in the Glossary (Appendix A).

B. SCOPE OF THE RESEARCH

The scope of the research program is limited to determining the effects of a selected set of information representation characteristics in a controlled setting. The individual is the unit of analysis; group effects are not addressed.

The research concerns only the most usual purpose for which business time-series graphics are employed: the extraction of elementary quantitative information. Its focus is on graph comprehension tasks rather than on composite problem-solving or complex decision-making tasks (see Davis, Groomer, Jenkins, Lauer, & Kwan, 1985; Davis, 1985; Vessey, 1987 for critiques) or on the recall of quantitative graphical information (see Macdonald-Ross, 1977b).

C. IMPORTANCE OF THE RESEARCH

Although MIS graphics research focuses primarily on the efficiency and effectiveness of computer graphics as decision aids (e.g. DeSanctis, 1984; DeSanctis & Jarvenpaa, 1985; Davis & Olson, 1985), it is important to note that the use of graphics to aid decision-making may essentially be a composite process of many disaggregated processes†. It is essential, then, if we are to understand the use of

† Both Bertin's and Pinker's general
propositions on graphics processing strongly support such a viewpoint.

graphics as decision-making aids, that we be able to decompose complex tasks (see Benbasat & Dexter, 1985, 1986; Benbasat et al., 1986) to a level at which we can understand the underlying mechanisms (see Blalock, 1969). Perhaps this explains why progress in the understanding of graphics as decision aids has been slow, even though many one-shot studies on the use of graphics and/or color at complex levels of decision-making and problem solving have been conducted.

Indeed, there has been a growing concern over the concentration of prior MIS graphics research on the mainly macro level of decision-making tasks (see Benbasat et al., 1986), when there appears to be an equal, if not more important, need to investigate tasks at the more micro level of graph comprehension. It is time that MIS researchers also become acquainted with the use of graphics at the level of visual and logical extraction of quantitative and/or qualitative information† in answering specific questions.

Research on the use of various graphical designs for performing fundamental tasks can benefit both the practicing and academic communities:
1. Despite conflicting and weak empirical evidence on the use of graphics as a decision support tool (Dickson et al., 1986), the use of business graphics continues to proliferate (Lehman, Vogel, & Dickson, 1984).
2. Graphics research has failed to build on previous work. Perhaps this is partly responsible for the limited progress made in the field. On the one hand, graphics researchers interested in human perceptual-cognitive processes have ignored the findings presented in the MIS field, probably because the field is still in a developing stage; on the other hand, graphics researchers in MIS are generally unaware of existing theories of graphics information processing advanced in related disciplines (e.g.
Chernoff, 1978; Bertin, 1981; Pinker, 1981; Kosslyn et al., 1983; Cleveland & McGill, 1984).†† MIS, as an interdisciplinary field, is especially suited to bridging existing gaps among graphics-related disciplines, thereby cutting down on duplication.

† Kosslyn et al. (1983, p. 272) contend that charts usually convey information about qualitative relationships (e.g. "is a member of" or "occur after") whereas graphs always convey information about quantitative information (e.g. "x has more than y").

3. Although this research program is primarily driven by the need to test existing claims by graphics theorists so that the study of graphics can advance, a large part of its motivation comes also from the need to fill in obvious gaps among current theories. Neither the theoretical nor the empirical literature, for example, provides clear statements on the effects of information complexity factors and their interactions with other variables (e.g. graph format) when different types of questions are asked on conventional time series graphics. In addressing these issues, the research program has also attempted to identify some of the knowledge gaps existing among the theories as well as contribute towards filling those gaps.
4. Major methodological problems with earlier MIS graphics research include the lack of adequate control of the task variable, a key factor that is identified in current theories of graph perception and comprehension, and the lack of a priori predictions about performance with various graphical designs based on theories. This lack of theoretical perspectives among earlier graphics researchers in MIS is most evident in their failure to distinguish among the use of various graph formats and the lack of strong rationalization as to why one graph format would be better suited for a particular task than another. This program of research aimed at overcoming these drawbacks.
5.
Despite much graphics research, little is understood of how people read and understand graphs at the level of extracting information to answer specific questions. Although some attention has been paid to the importance of the question variable (e.g. Wainer et al., 1982; Powers, Lashley, Sanchez, & Shneiderman, 1984; Davis et al., 1985), there still is little success in empirically validating those critical factors that influence performance with an information presentation. The problem appears also to lie in the lack of cumulative effort and evidence towards resolving important issues such as: a rigorous specification of critical questions, or tasks, that should be tested; an operational definition of factors contributing to the information complexity of a graph; and a set of meaningful propositions about characteristics of various graphical designs and their effects on performance. A review of existing graphics theories suggests that this problem can be overcome by classifying tasks, identifying a number of possible factors affecting the information complexity of graphics, and formulating a number of interesting a priori hypotheses that can be tested empirically. Results following such a research program would thus contribute to a much needed understanding of the extent to which these theoretical propositions may be used to explain graphical perception and comprehension.

† There is hope that this situation will improve. For instance, a stream of recent MIS graphics research with the focus of formalizing a theory of information presentation has been based on Bertin's theory (see Yoo, 1985; Davis et al., 1985; Lauer, Davis, Groomer, Jenkins, & Kwan, 1985).

6.
While a large proportion of prior MIS graphics research has been devoted to the study of individual differences in the context of information presentation design, the failure to focus on the role of task is believed to be the underlying cause of conflicting results among various graphics experiments (see Ives, 1982; DeSanctis, 1984; Benbasat et al., 1986). A further motivation for such a research program, therefore, is to deal precisely with specific and well-defined tasks so that the underlying processes may be understood (Blalock, 1969).
7. Finally, research findings about the relative strengths and weaknesses of various forms of graphical designs for performing different elementary tasks may often be integrated for drawing higher inferences. For example, if we know that isolated symbols will strongly facilitate the extraction of exact point values and that unbroken lines will strongly facilitate trend perception, then we may infer that perhaps the most appropriate representation for performing a composite task that requires both extracting point values and reading a trend is that of a connected symbol graph.†

† A connected symbol graph merely connects isolated symbols with an unbroken line. See Cleveland (1985, p. 180-183) for illustrations of all the different representations discussed here.

D. OVERVIEW OF THE DISSERTATION

Chapter 2 will provide a classification scheme for the empirical literature, as well as a comprehensive review of the theoretical literature on graphics representations. Then, based on the literature review, a set of factors believed to influence task performance with the use of a graphical presentation will be identified, and a set of propositions that may be translated into testable hypotheses will be advanced in chapter 3. Following this, chapter 4 will discuss the experimental methodology and the statistical design used in this research program.
An examination of the overall structure of the experimental data that will be gathered, and of how well the data conform to the various assumptions underlying the analysis of variance-covariance for a totally within-subject design, will be discussed in chapter 5. Chapters 6, 7, and 8 cover in detail the findings for experiments 1, 2, and 3 respectively. Chapter 9 will then attempt to integrate the findings as well as provide generalizations of the results. Finally, chapter 10 will conclude with a summary of key findings, major contributions and limitations of the research, and suggestions for future MIS graphics researchers.

A comprehensive bibliography of quoted references is included as a separate section, followed by a series of appendices. Included in the appendices are a Glossary of Terms defining the specialized terms used in the context of the thesis, the actual questions with their corresponding graphics displayed in the experimental sessions of each study conducted in the research program, the respective instructions, forms, and questionnaires administered to subjects, the pilot testing report, and samples of the programs designed for running the experiments and the statistical analyses.

II. LITERATURE REVIEW

A shift from experiments with complex problem solving tasks to a directed and programmed research approach, with each experiment dealing with an in-depth exploration of some level of decision-related task abstraction,† is currently recommended in the MIS literature evaluating computer graphics and color (e.g., Benbasat & Dexter, 1985, 1986; Benbasat et al., 1986; Dickson et al., 1986; Jarvenpaa, 1986; Jarvenpaa & Dickson, 1988). Incidentally, this latter research strategy is not new among color and graphics researchers in areas of human factors and human information processing (e.g.,
Croxton, 1927; Croxton & Stryker, 1927; Carter, 1947, 1948a, 1948b; Schutz, 1961a, 1961b; Wainer & Reiser, 1979; Wainer & Thissen, 1981; Wainer et al., 1982; Cleveland et al., 1983; Cleveland & McGill, 1984).

† I.e., designing the complex decision task as a more refined chain of sequentially executable elementary task activities.

Indeed, such a methodological shift (see Remus, 1984; Jarvenpaa, Dickson, & DeSanctis, 1985; and Dickson et al., 1986) is long overdue, since the complex problem solving approach advocated by prior MIS graphics researchers has often yielded equivocal and piecemeal results without resolving the degree of influence of several key factors, including, among others: the Question asked (see Wainer et al., 1982; Pinker, 1983); the Information Set presented (see Bertin, 1983; Lauer et al., 1985); Color (see Christ, 1975; Tullis, 1981; Cleveland & McGill, 1983; Benbasat & Dexter, 1985, 1986); Learning (see DeSanctis & Jarvenpaa, 1985); and Cognitive Style.‡

‡ The concept of Cognitive Style is explained briefly in the Glossary.

A. CLASSIFYING THE EMPIRICAL LITERATURE

Two approaches in MIS graphics experimentation provide a clear basis for classifying the empirical literature (see Davis, 1985; Yoo, 1985; Lauer, 1986; Vessey, 1987):

1. Those using complex problem solving tasks (e.g., Lucas & Nielsen, 1980; Lucas, 1981; Ghani, 1981). In Davis' words, the approach adopted by this first set of studies faces such problems as

...a failure to specify and control the variables which affect performance with an information presentation...(and)...the use of confounding experimental tasks...(which resulted in)...a series of studies which are fatally flawed and whose findings are in conflict with each other (Davis, 1985, p. 41).

2. Those focusing primarily on the process of information extraction from various displays using various question types (e.g.,
Price, Martuza, & Crouse, 1974; Lusk & Kersnick, 1979; Wainer et al., 1982). Davis observes that the results of this second set of studies have been restricted because of their failure to provide a sound taxonomy of question type.

Indeed, a review of the empirical literature on graphics research shows that it is more a heuristic source of suggestions than a genuine foundation for a body of research. Evidence for this contention can be found in the extensive review of the psychological literature on graphics research by Kosslyn et al. (1983), as well as in past reviews (e.g., Macdonald-Ross, 1977a; Ives, 1982; DeSanctis, 1984; Jarvenpaa et al., 1985). Kosslyn et al. could not find any systematic approach to examining the various aspects of the graph comprehension process. Instead they found many methodological problems accompanying earlier studies, including such flaws as confounding of perception with memory, tallying of errors rather than psychophysically scaling perceived values of a graphed variable, failure to counterbalance the order of presentation of conditions, usage of a single dataset presented as graphics stimuli, provision of ambiguous instructions to subjects, and neglecting to inform subjects of what should be attended to in the graph.

There has been a definite lack of either theoretical integration or an accumulated body of empirical evidence to determine the circumstances under which different forms of presentation may be more appropriate for different types of tasks. Experiments conducted by Wainer et al. (1982) and by those from Indiana University (e.g., Davis et al., 1985; Davis, 1985; Yoo, 1985; Lauer et al., 1985; Lauer, 1986) have been based narrowly on Bertin's theory alone. Their works are not without peculiar weaknesses of their own, as will be pointed out in a later discussion.
On the other hand, Pinker's (1983) empirical tests of his graph difficulty principle, as well as the works of Simcox (1981, 1983a,b, 1984), have been limited by their use of unconventional graphics. Moreover, their experiments employ graphic stimuli that may be considered overly simple in an administrative context. It is debatable whether results based on the novel graphics stimuli that they used could actually be generalized to real-world business applications (see also Davis, 1985; Lauer, 1986). The focus of Cleveland's experimental tasks is simply on the accuracy of graphical perception and not on the effective use of graphs to support comprehension or decision making. Moreover, aside from the work done at Indiana University, no one has actually attempted to measure effects due to the complexity of graphical information presentations. Consequently, instead of trying to draw valid conclusions from still fairly shaky and disjointed bodies of empirical literature, it is argued that MIS graphics researchers need a rigorous and common theoretical perspective on information systems and task characteristics to guide their research. Future research should therefore contribute to the building of a cumulative graphics discipline with this common goal.

Accordingly, the first major concern is to conduct a comprehensive survey of the literature in the narrower area of theories of graphics rather than of the whole empirical literature.

B. EXISTING THEORETICAL DEVELOPMENT

The term theory has different meanings for different writers.† For some, it implies a detailed, systematic and comprehensive approach to a particular area (see Dubin, 1978); for others, it may just be a set of plausible statements that describe a phenomenon (see Cleveland & McGill, 1984).

The purpose of this review, therefore, is to introduce to the reader the literature on existing and current theories† of information representation (e.g., Chernoff, 1978; Bertin, 1967, 1973, 1981, 1983;
Pinker, 1981, 1983; Kosslyn et al., 1983; Cleveland & McGill, 1984; Cleveland, 1984, 1985; Kosslyn, 1985).

† A definition of this term is provided in the Glossary.

The intent is: (a) to provide a rigorous and scientific foundation for this dissertation; (b) to extend current theoretical works (e.g., Pinker, 1981; Kosslyn et al., 1983) so as to provide specific models for proposed graph schema structures and accompanying processes; and (c) to draw a priori predictions from these theories with respect to performance on various elementary tasks for different designs of time series graphics. Research based on a systematic approach of theoretical development and empirical testing is more likely to yield better direction to graphics designers on the application of the various design elements as effective decision making aids than are "one-shot" studies.

1. Theories

As Kosslyn (1982) points out, different kinds of theories are meant to illuminate different facets of the same phenomena. For instance, Kosslyn (1982) distinguishes: a theory of computation, which specifies what is computed without regard for how it is computed; a theory of the functional architecture, which specifies the structures and processes that are available for performing the computation; and a theory of the algorithm, which specifies how computations are carried out within the confines of the functional architecture.

Again, for every theory of computation, numerous theories of the functional architecture could be used to explain actual processing. Further, given a theory of functional architecture, several different algorithms are possible within its confines, such as varying the order in which specific operations are performed (see Kosslyn, 1982).

2. Extensibility of Graphics Theories

Some newer theories of graphical information representation (e.g.,
Kosslyn et al., 1983; Pinker, 1981; Kosslyn, 1985) have claimed that their treatments of the perceptual and cognitive processes underlying graph reading may be naturally extended to accommodate other, less constrained forms of information representation. In this regard, Kosslyn et al. argue that since graphs are among the most general forms of information representation and yet are also the most constrained,† it is simply a matter of "...relaxing various strictures for making a good chart or graph when considering making a good map, diagram, or table." (Kosslyn et al., 1983, p. 15).

† I.e., they function to communicate information that is well-structured. Accordingly, Kosslyn et al. (1983) observe that "graphs are the most constrained form, with two scales always being required and values or sets of values being associated via a paired with relation that is always symmetrical." (p. 14).

Even so, these recent applied cognitive science theories of graphics have yet to be mentioned in the mainstream MIS literature. As they form a foundation on which much graphics research should have been rooted, a more detailed treatment of these theories will be given in contrast to the others; this, despite the fact that the other theories will also be important in contributing to the ideas developed in this dissertation.

C. CHERNOFF'S LIST OF ATTRIBUTES

Chernoff (1978) provides one of the earliest steps towards the development of a graphics discipline. Building on Schmid's (1954; Schmid & Schmid, 1979) three key attributes for classifying charts and graphs, Chernoff proposes the following list of 17 attributes:‡

1. Illustrate or communicate
2. Analyze or comprehend
3. Compute
4. Impact
5. Mnemonic Character
6. Attraction
7. Accuracy (Precision)
8. Accuracy (Lack of Distortion)

‡ Chernoff believes that more attributes could be added to this initial list.
He introduces Schmid's three attributes as the first three in his list.

9. Compactness
10. Comprehensiveness
11. Self Explanatory
12. Time
13. Dimensionality
14. Theoretical versus Data
15. Contrast or Sensitivity
16. Ease of Application
17. Audience

In his seminal paper, Chernoff (1978) offers brief but descriptive explanations of each of the above constructs, supporting his definitions by illustrating concisely how his list of attributes may be applied to a graphical method of his own creation: Chernoff faces. While Chernoff's list, with its common characterization of all information representations, provides a rich source of variables, his characterization is nonetheless intuitive and subjective. In fact, there appears to be an interesting parallel between Chernoff faces and his attribute list: both require much subjectivity and intuition to use, and both are seriously weakened by the presence of built-in dependencies among their features. Yet Chernoff's list forms the skeleton of a general multi-attribute scheme, useful perhaps for evaluating (via self-reporting) the various strengths and weaknesses of different graphical representation methods. Chernoff's list, however, lacks the robustness of an empirical or theoretical construction. It is, at best, good intuition, and appears to be driven fundamentally by three major categories of tasks in which graphical representations may be useful:

1. Perceptual Tasks
2. Comprehension Tasks
3. Memory Tasks†

† These three major categories of tasks follow essentially from the three major purposes of all graphs/charts: communication, analysis, and storage (see e.g. Kosslyn, 1985; Bertin, 1983; cf.
Schmid, 1954). The object of communication is to impart to the audience

Finally, Chernoff's list may be regarded as forming elements of his theory, since it provides insight as to how graphical representation methods, as well as their applications, may be clustered.

1. Chernoff's Theory

Chernoff expresses his theory of using appropriate graphical methods for different applications in the following statement:

The key to the successful use of graphics should involve a matching of method and application in terms of the extents of the attributes required by the application and how well the method supplies these attributes (Chernoff, 1978, p. 6).

In short, Chernoff believes that it is possible to define both the graphical methods and the applications requiring the use of these methods in terms of his attribute list. According to his theory, the optimal result comes from using the information presentation whose characteristics, or attributes, best match the nature of the application task being supported.

2. Implications of Chernoff's Theory

The key point in Chernoff's theory relevant to this research is the focus on the application task as a critical factor when dealing with effective graphical methods or tools. He argues that not only is it important that one can tell which attributes are effectively supported by the graphical method in question; but, more importantly, one has to know the nature of the application (i.e., task) and the kinds of supporting attributes it suggests.
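Chernoff's matching idea can be caricatured in a few lines of code. The sketch below is purely illustrative: the attribute profiles, their numeric values, and the scoring function are assumptions made here for exposition, not anything proposed by Chernoff (1978).

```python
# Hypothetical caricature of Chernoff's attribute matching: each graphical
# method is described by how strongly (0-1) it supplies a few attributes,
# and an application is described by how strongly it requires them.
# All numbers below are invented for illustration.
methods = {
    "line graph": {"accuracy": 0.6, "impact": 0.8, "compute": 0.4},
    "table":      {"accuracy": 0.9, "impact": 0.3, "compute": 0.8},
}
application_needs = {"accuracy": 0.9, "impact": 0.2, "compute": 0.7}

def match_score(method_attrs, needs):
    """Credit a method only up to the level each attribute is required."""
    return sum(min(method_attrs[a], w) for a, w in needs.items())

# Pick the method whose attribute profile best matches the application.
best = max(methods, key=lambda m: match_score(methods[m], application_needs))
print(best)  # the accuracy-driven application is matched to the table
```

The point of the caricature is Chernoff's claim itself: once both methods and applications are expressed over a common attribute set, "matching" becomes a comparison of profiles rather than an act of pure intuition.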
Unfortunately, in spite of the importance of the notion of attribute matching between the application and the graphical representation method, Chernoff's work (1978) fails to suggest any empirically testable or operationally rigorous definitions of either the application task set or the graphical methods available with respect to a common attribute set. Nevertheless, his theory appeals to researchers particularly because it identifies the application task as a key determinant when using a specific graphical method, in addition to presenting a way of thinking about characterizing common properties of information representations as well as of task applications.

† (cont'd) information which has typically been analyzed and understood, whereas that of analysis is to find a representation permitting the understanding of what conclusions may be drawn and what relations and regularities exist in the information presented. The object of storage is to assist the reader in remembering the information presented.

It is interesting, however, to note that Chernoff's view is in agreement with the current literature on MIS graphics research (e.g., Jarvenpaa et al., 1985; Benbasat & Dexter, 1985, 1986; Dickson et al., 1986). Many MIS researchers increasingly assert that a plausible explanation of the contradictory findings among prior graphics research is the lack of control of the task variable. For example, Jarvenpaa (1986) has indicated the need for a common taxonomy of basic decision tasks before there can be any meaningful comparison of research findings. Yet MIS researchers have been slow in proposing a well-defined set of decision making tasks, and there have not even been any acceptable operational definitions of the term task at the level of managerial decision making or problem solving, although some researchers appear to be working on the issue (see Dickson et al., 1986).

D.
BERTIN'S IMAGE THEORY & TAXONOMIES

Bertin (1967, 1973, 1981, 1983) advances the first rigorous foundation for the study of graphics and graphics information processing. Yet it is the English translation of his Semiology of Graphics that popularizes his theoretical treatment of graphics information processing. In reviewing the book, Kosslyn (1985) observes that Bertin "leaves virtually no graphical stone unturned" and that the only general complaint he has about the book is that it is difficult to read.

1. Definitions

Bertin uses a semiological† approach to define graphics in relation to mathematics, music, and other sign systems. For instance, he observes that both graphics and mathematics are monosemic systems, since a graphic can be comprehended only when the unique meaning of each sign has been specified, just as an equation can be comprehended only when the unique meaning of each term has been specified. But, as Bertin points out, all of the sign-systems intended for the ear are auditory, linear and temporal, whereas those intended for the eye are visual, spatial and atemporal. This leads Bertin to conclude that the true purpose of graphics within the framework of logical reasoning is to be found in the monosemic domain of spatial perception. Moreover, Bertin distinguishes the information transmitted in a graphic system (i.e., the content) from the properties of the graphic system (i.e., the container).

† I.e., a theory of signs and symbols.

Information, according to Bertin, is constituted essentially by one or several pertinent correspondences between a finite set of variational concepts, which he calls the components, and an invariant. For example, consider the following two statements:

On July 8, 1964, stock X on the Paris exchange is quoted at 128 francs. On July 9, it is quoted at 135 francs.
Each statement involves a pertinent correspondence between two variational concepts (the number of francs and the date) and an invariant (stock X) which constitutes the common ground relating the francs to the dates.

The complexity of the graphical information designed is considered by Bertin to be a function of the number of identifiable elements in each component. Bertin uses the term length to characterize the number of identifiable elements in a given component or variable, and offers a taxonomy of levels of organization of the components, or visual variables, based on the notion of nominal, ordinal, and interval-ratio scale values (Baird & Noma, 1978). Information analysis, in Bertin's view, comprises three stages: (1) determining the number of components, (2) identifying the number of elements or categories in a given component, and (3) defining the level of organization among the components.

As for the properties of the graphic system, Bertin identifies eight ways in which a visible mark expressing a pertinent correspondence can vary: it can vary in relation to the two planar dimensions, and in relation to the other six retinal variables of size, value, texture, color, orientation, and shape.† Again, Bertin offers an encyclopedic and useful taxonomy and treatment of the variations of a mark with respect to the retinal variables. A mark, Bertin argues, can be represented as a point, a line or an area within the plane.

2. Bertin's Theory

The underlying concept of Bertin's theory is his difficulty metric, which is given by the number of perceptual glances necessary to extract the relevant information from an information representation.
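Bertin's three-stage information analysis can be made concrete using the stock X example above. The sketch below is an illustration only: the data structure and variable names are assumptions of this illustration, not Bertin's own notation, and the levels of organization are assigned by the analyst rather than computed.

```python
# Illustrative encoding of the stock X example: one invariant ("stock X")
# and two components (the date and the price in francs).
dataset = {
    "invariant": "stock X",
    "components": {
        "date":   ["1964-07-08", "1964-07-09"],  # ordered in time
        "francs": [128, 135],                    # quantitative values
    },
}

# Stage 1: determine the number of components.
n_components = len(dataset["components"])

# Stage 2: identify the "length" of each component, i.e. the number of
# identifiable elements it contains.
lengths = {name: len(values) for name, values in dataset["components"].items()}

# Stage 3: define the level of organization of each component
# (nominal, ordinal, or interval-ratio); this is an analyst's judgment.
levels = {"date": "ordinal", "francs": "interval-ratio"}

print(n_components, lengths)
```

On this reading, Bertin's complexity notion is simply a function of the lengths computed in stage 2, while the levels assigned in stage 3 constrain which retinal variables can legitimately encode each component.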
Applying Zipf's (1935) notion of mental cost to visual perception, Bertin formulates an image theory which basically deals with rules of the graphic system that will aid the designer in choosing the variables required for constructing the most efficient representation. Bertin (1983, p. 9) defines efficiency in terms of a question and its answer: if, in order to obtain a correct and complete answer to a given question, other things being equal, one construction requires a shorter period of perception than another construction, we can say that it is more efficient for this question.

Basically, Bertin's image theory argues that the reader extracts information from an information representation by visually isolating one or more images or Gestalts (see also Kosslyn et al., 1983) to answer a specific question. Reading involves, first, an external and an internal identification of the relevant components and their respective variables, and, second, the perception of the information itself. In turn, the perception of information (i.e., a series of pertinent correspondences) depends chiefly on the type of question and the level of reading required by the question. In this respect, Bertin categorizes questions according to types as well as levels of reading,‡ and distinguishes among three levels of reading, ranging from the elementary level (i.e., at the level of individual values), through the intermediate level (i.e., at the level of homogeneous categories which are less numerous than the original categories), to the overall level (i.e., at the level of global pattern).

† See the Glossary for definitions of these terms.
‡ Refer to Lauer (1986, p. 43) for a brief discussion of this issue.

Operationalization of Bertin's taxonomy of question types yields the following categories:

1. Exact Questions
2. Trend Questions
3.
Comparison Questions†

† Wainer et al., 1982.

Moreover, Bertin argues that the process of answering a question requires, first, the input identification of the values that have been provided in the question asked, second, the perception of the appropriate correspondences between the components, and, finally, the output identification of the required answer. The ease with which a viewer can perform all of these processes determines the speed with which they can answer the question. The most efficient display is that which, given any type or level of question, allows the reader to extract the answer in one glance. Put together, efficient displays are those which require the least number of "perceptual glances" to comprehend well enough to answer a given question.

Kosslyn (1985) recommends Bertin's Semiology of Graphics to every serious student and researcher interested in graphics. Yet, in their review of graphical displays, Wainer & Thissen (1981) conclude that Bertin's propositions form a "rudimentary" untested theory in an area devoid of organized research and theory. Furthermore, there are ambiguities in some aspects of the Bertin theory. For example, Pinker (1981) argues that Bertin's difficulty metric, as given by the number of perceptual glances, is ambiguous, since Bertin does not define what constitutes a perceptual unit identifiable or recognizable in one perceptual glance. Similarly, Bertin's treatment of image construction, i.e., the rules of construction for selecting the variables with which to build the most efficient representation, is difficult to use. Regarding this aspect of Bertin's theory, Kosslyn (1985) comments:

The rules were somewhat disappointing...; they are really general goals and are not algorithmic.

3.
Implications of Bertin's Theory

Fundamentally, Bertin's theory on the use of an information representation to aid decision making operates at the level of specifying the question which the decision maker wishes to answer with the information representation. This puts the decision making task into a level of abstraction that is rigorous and empirically testable, and it avoids important methodological issues faced by prior MIS researchers. Indeed, a number of recent MIS empirical works have been based on Bertin's theory.

E. GRAPHICS RESEARCH AT INDIANA UNIVERSITY

In an unpublished paper, Davis et al. (1985) argue that Bertin's taxonomy of question types, especially at the intermediate level, is ambiguous because it lacks an operational definition of what constitutes a homogeneous category as given in Bertin (1983) (see also Lauer, 1986). They contend that this same ambiguity appears in Wainer et al.'s (1982) operationalization of Bertin's question construct, and that the results of the Wainer et al. study with regard to the effects of different questions are:

...ambiguous in that performance as measured by the number of correctly answered questions was greatest for exact (their example of an Elementary Question), then comparison (their example of a Comprehensive Question), and then trend (their example of an Intermediate Question) questions; but, when performance was measured by the time required to answer questions, the ordering was exact, trend, and comparison. (Davis et al., 1985, p. 13).

They attempt, therefore, to formulate what they claim are more rigorous and operationally testable definitions of the question construct and the information-set-complexity construct.

Formalizing Bertin's (1973, 1983) general propositions on the process of extracting the relevant question-answer, Davis et al.
(1985) advance the following statement of relationship connecting four major constructs:

P = f(I, Q, F)

where:

P = performance, measured by the time required to isolate the image needed to answer a question
I = the information set presented
Q = the question asked
F = the form of presentation

1. The Question Construct

To avoid the problem found in Bertin's taxonomy of questions, Davis (1985) proposes that the question construct be operationalized by the number and frequency of the set of ordered step(s) taken to perform the necessary question-answer. These steps, found by analyzing subjects' protocols (see Davis et al., 1985) and arranged in increasing order of complexity, are:

1. Identifications
2. Scans
3. Comparisons
4. Estimations
5. Computations†

† Davis (1985) gives brief descriptions and definitions for each of these steps.

Accordingly, Davis (1985) and Yoo (1985) ran experiments using questions spanning a complexity continuum defined by the number and frequency of the ordered step(s) required to extract the correct answer from an information presentation. Their most basic hypothesis is that

...the form of presentation and the complexity of a question are independent; that is, there is no interaction between the steps performed to answer a question and the form in which the information is presented.

The results of their experiments, however, fail to support this reasoning. It is debatable whether Davis et al. should have treated question complexity separately from information presentation format. For example, Pinker's (1981, 1983) arguments strongly support the view that the same question will not be easier or more difficult across the board: it depends very much on the sorts of information to be extracted as well as the sorts of representation format used. In other words, each graph format may correspond to different steps for different questions. Hence, tables normally do not support, in
Davis' terminology, visual cues the way graphs support them. The lack of robustness in the Davis metric (see Davis, 1985; Yoo, 1985) warrants either its refinement or, more appropriately, a revision of the basic approach. Davis notes that although the content validity of their metric was demonstrated earlier (Davis et al., 1985), the taxonomy of question difficulty generated from their metric is nonetheless lacking in predictive validity (see Davis, 1985).

Indeed, a more critical view of the Davis et al. (1985) and Davis (1985) metric unveils a number of possible problems. First, as already argued, a question classified as relatively more difficult via Davis' metric may be relatively easier with graphs than with tables, and vice versa. That is because encoding of numerical magnitudes may well be more effortful than encoding of, say, a bar's level (see Pinker, 1981, 1983; Kosslyn et al., 1983). In fact, the relative effectiveness of graphs over tables lies precisely in their capability to capitalize on the power of the human visual system (Kosslyn et al., 1983; see also Pinker, 1983). Even so, besides the need to encode numerical magnitudes when using tables, it appears that all other computations must necessarily involve calculations (and possibly, some
(1985) algorithmic processes for answering various questions are the only possible ones (see Kosslyn, 1982); probably, the use of protocol analysis methodology to validate step processes hypothesized in visual information processing will not prove to be as psychologically sound  as desired since low-level visual and/or perceptual processes are often  found to be unconscious and therefore generally inaccesible to verbalizationt except, of course, the outputs from such processes.  The methodology is often, therefore, more appropriately applied to  those processes that are at a higher consciousness level such as performing calculation, decision making and/or problem solving. Empirical evidence to support this view may also be found in Pisoni & Tash (1974) and Fodor (1983).  Finally, the decomposition of tasks into steps defined by the Davis  metric may be different with different individuals.  In summary, Davis' (1985) generation of a taxonomy of question difficulty that is independent of variations in information presentation format, basically ignores the geometry into which various sorts of extractible information will be translated due to representational variations and individual differences in perception.  To complicate the task decomposition process, psychologists in the area of human  perception have suggested that some kinds of perceptual processes, including visual and auditory, are instantaneous and automatic whereas others may need the focus of selective attention and the use of conscious effort (see Fodor, 1983; Rock, 1984).  Put simply, the weakness of the Davis metric for a taxonomy of question type based on the complexity of computational steps lies in its failure to accommodate confounding effects due to pertinent principles underlying the perceptual organization of various geometrical format.  t See Morton's Adams (1979).  (1967)  argument  using the psychological  party  trick  and also  Nickerson  &  LITERATURE REVIEW / 23 2. 
The Information Set Complexity Construct

Lauer et al. (1985) define the information set complexity† construct as influenced by the following four factors for time series:

1. Length of the Ordinal Variable
2. Length of the Nominal Variable
3. Percent of possible Rank Changes
4. Percent of possible Slope Changes

While the first two factors are based somewhat on Bertin's observation of the complexity of a figure, there is little theoretical support or any strong basis for the variations of the last two factors. Moreover, results from their studies (e.g., Yoo, 1985; Lauer et al., 1985; Lauer, 1986) reveal that although certain complexity factors (e.g., the length of the nominal variable) are found to be significantly correlated with performance, there is very little evidence to conclude that the regularity component of the information set complexity construct, which is operationalized by the last two factors listed above, is a determinant of performance (see Lauer, 1986, p. 139-140).

3. Implications of Recent MIS Graphics Research Findings

Taken together, the results of recent MIS graphics experimentation based on Bertin's theory warrant further, adequate control of those critical factors that are likely to affect subjects' performance with the use of an information presentation. For example, the Lauer (1986) and Lauer et al. (1985) studies indicate that although certain factors of complexity do affect performance with an information presentation significantly, there is uncertainty about regularity as one of the influencing factors. Similarly, Davis' (1985) findings indicate that question difficulty is not independent of visual representations and that different questions have different effects on performance depending on the format of information representation. Finally, the Wainer et al. (1982) study indicates that while†

† This term is used to mean the complexity of the information set (see Lauer, 1986)
with  respect  to the descriptive  content  of  LITERATURE REVIEW / 24 the question asked affects performance, there appears to be evidence of a possible time-accuracy performance tradeoff when various visual presentations are tested for the extraction of various quantitative information.  Thus it is important that there be adequate control in the experimental  design for such a tradeoff effect in future research and in the present study.  Overall, the need may be to identify a set of critical  tasks in the form of specific questions t o be asked  of an information presentation, and the systematic examination of these questions.  In this case, an  alternative classification of question type than that offered by Bertin or even Davis and a system for identifying  influencing factors of information  complexity  become  the more  critical as well as  challenging issues facing current and future MIS graphics researchers.  F. CLEVELAND'S  THEORY  OF GRAPHICAL  PERCEPTION  In recent years, Cleveland & his associates (e.g. Cleveland, 1984, 1985; Cleveland & McCill, 1984, 1985) have proposed a paradigm  of graphical  perception  1.  A specification and an ordering of elementary  2.  A statement on the role of distance  3.  A statement on the role of detection  which involves three basic premises:  graphical-perception  tasks  in graphical perception in graphical perception  1. The Cleveland-McGill Theory The focus of the Cleveland-McGill (1984) theory is on the accuracy of quantitative judgments with regard to a proposed set of elementary  perceptual  tasks.  Based on a combination of psychophysical  theory (e.g. Weber's and Steven's Laws: Baird & Noma, 1978) and other experimental evidence, Cleveland & McGill hypothesize an ordering, according to human quantitative judgments, of ten elementary perceptual tasks ranked from the most to the least accurate:  1.  Position along a common scale  2.  Positions along nonaligned scales  LITERATURE REVIEW I 25 3.  
Length, direction, angle

4. Area

5. Volume, curvature

6. Shading, color saturation

Apparently, the lack of sufficient information to separate the ties between some of the rank orderings forces Cleveland & McGill to place more than one task in three of the ranks above. A lengthy discussion of the theory and how it applies to the extraction of quantitative information on a variety of common graph forms may be found in Cleveland & McGill (1984).

Cleveland (1985) classifies all mental-visual tasks involved in graphical quantitative information extraction into two major categories:

1. Those requiring the judgments of geometrical aspects of graphical elements such as position and size: the graphical-perception tasks, which are the kinds of tasks dealt with by his paradigm.

2. Those involving the scanning of points to read off their values using the scale lines and tick mark labels, and those that require conscious rapid mental calculation and quantitative reasoning: the graphical-cognition tasks.

To test the predictive validity of their theory, Cleveland & McGill (1984, 1985) ran several related experiments in which participants performed different kinds of judgments using dots, angles, lines, bars, pies, and other forms of representations. Evidence drawn from subjects' judgments in what they labeled the position-length experiment (Cleveland & McGill, 1984) supports the claim that position judgments are more accurate than length judgments by factors varying from 1.4 to 2.5; similarly, results drawn from subjects' judgments in what they labeled the position-angle experiment confirm that position judgments are 1.96 times more accurate than angle judgments. These results and others lead Cleveland & McGill to conclude boldly that their theory has "correctly predicted the outcome".
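The hypothesized ordering lends itself to a simple mechanical encoding. The following Python sketch is this writer's illustration only (the dictionary layout and function are not from Cleveland & McGill): it records the six accuracy ranks, with ties sharing a rank, and predicts which of two elementary tasks should yield the more accurate judgments.

```python
# Illustrative encoding (not from Cleveland & McGill) of their hypothesized
# ordering of elementary perceptual tasks; a lower rank means judgments
# are predicted to be more accurate.  Tied tasks share a rank.
CM_RANK = {
    "position_common_scale": 1,
    "position_nonaligned_scales": 2,
    "length": 3, "direction": 3, "angle": 3,
    "area": 4,
    "volume": 5, "curvature": 5,
    "shading": 6, "color_saturation": 6,
}

def predicted_more_accurate(task_a, task_b):
    """Return the task predicted to support more accurate judgments,
    or None if the two tasks are tied in the ordering."""
    ra, rb = CM_RANK[task_a], CM_RANK[task_b]
    if ra == rb:
        return None
    return task_a if ra < rb else task_b

# Bar charts rely mainly on position/length judgments; pie charts on
# angle/area judgments, which sit lower in the ordering.
print(predicted_more_accurate("position_common_scale", "angle"))
# -> position_common_scale
```

On this reading, Cleveland's preference for dot charts over pie or divided bar charts is simply the claim that the decoding task they elicit sits at rank 1 rather than ranks 3-4.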
While Cleveland & McGill's experiments validated elements of their theory, the findings suggested the need for a revision of their proposed hierarchy of elementary tasks. For example, in the position-length experiment, it was found that as the distance between two values to be judged increased along an axis perpendicular to the common scale, subjects became less accurate in their judgments. In this regard, Cleveland & McGill suggested that the position task be expanded into a whole range of tasks.t In effect, they observed that a revision of their theory appears warranted.

2. Application of the Cleveland-McGill Theory to Graphing Data

In applying their theory, Cleveland & McGill argue for the following basic supporting principle of data display:$

Encode data on a graph so that the visual decoding involves tasks as high as possible in the ordering* (Cleveland, 1985, p. 255).

Simply stated, graphs should employ elementary tasks as high in the ordering as possible, since Cleveland & McGill claimed that by presenting elementary perceptual tasks as high as possible in the hierarchy, the graph will elicit judgments that are as accurate as possible and, therefore, will maximize the viewer's ability to detect patterns and organize the quantitative information.

Consequently, the Cleveland-McGill theory and their experimental results led Cleveland (1984) to propose the replacement of simple bar charts with dot charts. Similarly, divided bar charts could be replaced with clustered dot or symbol charts. Cleveland (1984) argued that such a replacement basically permits all values to be compared by making judgments of position along a common scale, in contrast to making judgments of length or area. As supporting evidence, he reinterpreted one of the

t I.e. as a range of tasks for judging differences of two values along a continuum of axes perpendicular to the common scale.
Cleveland-McGill experimental results to show that errors of length-area judgments are 40-250% greater than those of position judgments along a common scale.

$ It is not, however, their claim that this principle offers a complete prescription for graphics designers (Cleveland, 1985).
* This refers to the ordering of the ten elementary perceptual tasks discussed earlier.

3. Implications of Cleveland's Theory

It may be argued that Cleveland's theory deals more with the accuracy of graphical perception as applied to various elements of graphics construction than with the processes of graphical decision making using various forms of graphical representation. Yet, this line of research is still relevant and important to the study of the relative efficacy of various graph formats as representational aids, since the process of data extraction from an information presentation is often regarded as essentially one of perception (see Davis et al., 1985; Bertin, 1983; Pinker, 1981). Therefore, MIS graphics designers can also benefit from this line of contributions.

However, further empirical evidence is still needed to strongly support claims for the replacement of bar charts by dot charts, or of divided bar charts by dot charts with grouping. More importantly, there is the need to include empirical testing of comprehension tasks if the Cleveland paradigm is to be of greater value to MIS graphics designers. For example, the question of whether the Cleveland theory holds with an emphasis placed on time performance will be of interest. Indeed, how quickly a person is able to read and understand a standard and unambiguously drawn graph indicates the effectiveness of that graph format for the task in question.

G. THE KOSSLYN-PINKER THEORY OF GRAPH COMPREHENSION

Kosslyn et al.
(1983; Pinker, 1981; Kosslyn, 1985) advance both a general computational theory of human visual information processing and a functional architectural model, which they apply to explain the underlying perceptual and cognitive mechanisms involved in the reading and understanding of graphs. They also propose a diagnostic scheme for evaluating information representations that is based not only on human processing of visual input but also on the so-called theory of symbols.

Based on the contemporary canonical theory of human visual information processing (e.g. Marr, 1982), Kosslyn et al. view the visual processing of information representations to be at three levels, as depicted in figure 2.1:

1. Perceptual Image: Sensory Information Store

2. Short-Term Memory

3. Long-Term Memory

In the first phase, an information representation is processed syntactically, or pre-semantically, leading to the formation of what is known as a visual sketch. However, processing at this level is limited by several factors, including adequate discriminability, variations of visual properties, processing priorities and perceptual distortions. The output from the first phase of processing is then organized into perceptual units which are operated upon in the second phase of processing. Thus, three lines that meet to form an enclosed figure are seen as a triangle and not simply as three lines, since the formation of these perceptual units respects the Gestalt Laws of Organization, such as proximity, good continuation, similarity and common fate, and other principles of structural dimensions (e.g. Garner, 1970; Garner & Felfoldy, 1970). Visual information held as perceptual units in short-term or working memory (see Anderson & Bower, 1973; Lindsay & Norman, 1977) can be re-organized and interpreted in various ways.
In any case, memory capacity limitations often impose difficulty in reading visual displays, either because too much material is presented or else too much material is placed in a key (i.e. a legend), which requires the reader to engage in an arduous memorization task.

Kosslyn argues that reader recognition of an information representation is only possible when the appropriate stored information in long-term memory is contacted. This information constitutes a person's knowledge about how charts and graphs serve to convey information, and is a critical step in the graph comprehension process: if a person has never seen a particular display type before, it constitutes a problem to be solved, rather than a display to be read. Importantly, displays should have description compatibility. In other words, each part of the display, such as labels and the legend symbols, should not be ambiguous or subject to more than one semantic interpretation; displays should not lead a person to access inappropriate information and thus cause him to draw incorrect inferences.

Figure 2.1: Three Visual Information Processing Stages
[Diagram of the sensory information store, short-term memory and long-term memory stages, each annotated with its limiting factors (discriminability, distortion, organization, priorities, reorganization, capacity, knowledge, description compatibility). Source: Kosslyn et al., 1983, pp. 321-322, reproduced with permission.]

1. The Kosslyn et al. Analytical Scheme

Accordingly, Kosslyn et al. (1983) generate an analytical scheme, on the basis of the theory of human visual information processing discussed above and the theory of symbols, to provide a detailed diagnostic description of any given chart and graph at three levels:

1. Syntax: where syntactic properties of form classest corresponding to major basic level constituents of charts and graphs (figure 2.2) can be described

2.
Semantics: where the literal reading of each of the components of a chart or graph, and the literal meaning that arises from the relations among these components, can be described

3. Pragmatics: where ways in which meaningful symbols convey information above and beyond the direct semantic interpretation of the symbols can be described.

Essentially, the Kosslyn et al. descriptive scheme has been developed as a diagnostic questionnaire useful for revealing how any given chart or graph may violate pertinent principles at the syntactic, semantic and pragmatic levels. Kosslyn et al. claim that if a graph or chart is unambiguously designed, each question in the questionnaire should be easily answered, and that if we have trouble arriving at a straightforward answer to any of these questions, this alerts us that one or more of our operating principles has been violated. The objective, therefore, is to discover which operating principles may be violated in a graph or chart, and to specify what changes are to be made in the design of the information representation to make its basic level constituents, as well as the relations among these constituents, as unambiguous as possible.

t These refer to the framework, the background, the specifier and the labels, which correspond to the four basic level constituents identified by Kosslyn et al. (1983) and illustrated in figure 2.2.

Figure 2.2: Kosslyn et al.'s Basic Level Constituents
[Diagram of a graph annotated with its four basic level constituents. Source: Kosslyn et al., 1983, p. 323, reproduced with permission.]

A detailed explanation of all the pertinent principles posited by Kosslyn et al. would demand substantial space (Kosslyn et al., 1983, pp. 1-170); these principles will therefore only be summarized briefly. Further, owing to the large number of terminologies used to describe these principles, the Glossary discusses or defines only key terms.
At the syntactic level, Kosslyn et al. classify their operating principles as (a) Principles pertinent to seeing the lines, which include those of adequate discriminability and perceptual distortion; (b) Principles pertinent to organizing marks into units, which include those of the Gestalt laws and Garner's (1970) structural dimensional principles; and (c) Principles of processing priorities and limitations.

At the semantic level, Kosslyn et al. identify two major groups of operating principles: (a) Principles of surface compatibility, which include those of representativeness and congruence, and (b) Principles of concept and graph schema availability. In addition, Kosslyn et al. argue that as a graphic display may be treated as a complex symbol, two formal principles underlying the theory of symbols become applicable to an unambiguous interpretation of the display:

1. The external mapping principle, which means that,

Every mark should map into one and only one semantic category, and every piece of information necessary to read the intended information should be indicated unambiguously.

2. The internal mapping principle, which specifies that,

The correspondence between portions of a display should be unambiguous.

At the pragmatic level, Kosslyn et al. draw their operating principles from corresponding principles underlying language (e.g. Grice, 1975), which include:

1. Principles of invited inference

2. Principles of contextual compatibilityt

The development of a shorter and more directed form of diagnostic instrument replacing the original lengthy scheme, and the results of its application to a substantial and representative sample of charts and graphs, are presented in Kosslyn et al. (1983, pp. 171-214).
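The external mapping principle is mechanical enough to be checked automatically. The following Python sketch is this writer's illustration only (the function and the legend data are hypothetical, not part of the Kosslyn et al. questionnaire): given a proposed mapping from marks to semantic categories, it flags any mark that maps to zero or to more than one category.

```python
# Hypothetical check of the external mapping principle: every mark
# should map into one and only one semantic category.  A mark mapped
# to no category, or to several, is flagged as a violation.

def external_mapping_violations(mark_to_categories):
    """Return (sorted) the marks whose category mapping is ambiguous."""
    return sorted(
        mark for mark, cats in mark_to_categories.items()
        if len(set(cats)) != 1
    )

# A legend in which the dashed line stands for two different series
# violates the principle; the other symbols do not.
legend = {
    "solid_line": ["revenue"],
    "dashed_line": ["forecast", "prior_year"],  # ambiguous mapping
    "circle_marker": ["actual"],
}
print(external_mapping_violations(legend))  # -> ['dashed_line']
```

The internal mapping principle would require a second, pairwise check over portions of the display, which is harder to mechanize and is left out of this sketch.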
Based on independent evaluations of ten different graphs by means of the short questionnaire, and the aggregation of possible outcomes from two analysts (one being well experienced with the scheme and the other being naive at the outset), Kosslyn et al. claim to achieve an agreement rate between the analysts of 96.58%. This high rate, they claim, confirms the reliability of their questionnaire.

The pattern of results reported in Kosslyn et al. suggests that the interpretation of a graph is often contaminated by either the addition of too much information or the absence of relevant information. Evidence shows that the two most frequently occurring faults among graphs are (1) violations of the principle of external mapping and (2) violations of principles pertinent to the organization of marks. In addition, the greatest proportion of faults found pertains to the specifier alone and its interaction with the framework (see figure 2.2).

One interesting though cautionary note revealed by their study is that graphs found in the business area resulted in a much higher number of faults than graphs found in other content areas (e.g. math, physical science, life science, social sciences and "General Interest", which is a catch-all category containing such items as magazines, newspapers and "How to" books). This strongly suggests that graphics designers in the business area are seriously in need of proper guidance.

t See the Glossary for definitions of these terms.

2. The Kosslyn-Pinker Process Model of Graph Comprehension

Drawing on Bertin's (1967, 1973) observation that any depicted object on charts, graphs and maps may be described simultaneously by its values along a number of visual dimensions, Pinker characterizes a graph as trying to:

...communicate to the reader a set of pairings of values on two or more mathematical scales, using depicted objects whose visual dimensions (i.e. length, position, lightness, shape, etc.)
correspond to the respective mathematical scales and whose values on each dimension (i.e. an object's particular length, position, and so on) are proportional to the values on the corresponding scales.

Based on this, Pinker argues that the depicted objects in the graph are mentally represented in two ways:

1. As a visual description, to encode the marks depicted on the page in terms of their physical dimensions

2. As a graph schema, to spell out how the physical dimensions will be mapped onto appropriate mathematical scales

As Pinker suggested, these structures are used by a graph reader to extract different sorts of information, such as the actual value on some scale paired with a given value on another scale, the difference between the scale values of two data entities, or the rate of change of values on one scale within a range of values on another, and so on.

a. Structures & Processes

Pinker (1981) and Kosslyn et al. (1983) note that many visual languages (e.g. images, pictures, graphics) proposed in the psychological and artificial intelligence literature (e.g. Palmer, 1975; Marr & Nishihara, 1978; Hinton, 1979; Winston, 1975; Miller & Johnson-Laird, 1976) describe a scene using propositions. In these works, perceived visual entities or objects are assumed to be represented internally as variables, with predicates being used to specify attributes of and relations among the entities. Such predicate specifications may be a one-place specification of a simple property of an object, like Circle(x) (i.e. "x is a circle"); a two-place predicate specifying the relation between two objects, such as Above(x,y) (i.e. "x is above y"); or a three- and higher-place predicate indicating the relation among groups of objects, such as Between(x,y,z) (i.e. "x is between y and z").

Kosslyn et al.
(1983) list four broad principles grounded in basic psychological research to specify the form of the visual description most likely to be constructed from the input of an information representation. Briefly, these principles are:

1. The Indispensability of Space, which states that our perceptual systems pick out a "unit" or an "object" in a visual scene as any set of light patches that share the same spatial position, and not other attributes such as intensity, texture, or wavelength (Kubovy, 1981).

2. The Gestalt Laws of Perceptual Organization, which govern how variables representing visual entities will be related to one another in visual descriptionst (e.g. Wertheimer, 1938).

3. Magnitude Representation, which basically states how quantities may be represented in memory, for example by one of a set of discrete symbols, or by a continuous interval-ratio scale. This leads Kosslyn and Pinker to distinguish between two forms of magnitude: the interval-value or ratio-value, where the quantity is represented continuously but the units are arbitrary, and the absolute-value, where the units are discrete and well-defined.$

4. The Distributed Coordinate Systems, which states that memory representations of shape are specified with respect to object-centered cylindrical coordinate systems that are also distributed (e.g. Marr & Nishihara, 1978).

t I.e. how the atomic perceptual units will be integrated into a coherent percept (Pinker, 1981, p. 10; see also Kosslyn et al., 1983).
$ See the Glossary for an example.

In practice, however, Pinker and Kosslyn et al. argue that two factors limit the size of a visual description:

1. Processing Capacity, which limits the number of nodes (i.e. to roughly four perceptual units) that are available at any one time in the short-term visual description (see Kosslyn et al., 1983; Ericsson, Chase, & Faloon, 1980).

2.
Encoding Likelihood, which specifies that different predicates may have different probabilities of being encoded. It proposes that some visual predicates are automatically encoded while others are not, and that the probability of a given true predicate entering into a visual description is a function of such automatic processes multiplied by a constant that is relative to the availability of processing capacity. It also assumes that the level of node activationt decreases steadily as soon as it is activated, but that the reader can repeatedly re-encode the description by reattending to the graph.

Incidentally, a graphic notation has been devised by Pinker (1981) whereby variables are denoted as nodes, one-place predicates are printed next to the appropriate nodes, and two-place predicates are printed alongside an arrow linking the two nodes representing the predicate's two arguments. Hence any particular scene can be graphically represented using Pinker's graphic convention, as illustrated in Pinker (1981, p. 49; see figure 2.3).

Additionally, Pinker and Kosslyn et al. argue that the structure known as a graph schema and its accompanying processes that work over it specify (a) the type of graph that is currently being viewed, (b) how the information currently found in the visual description is to be translated into the conceptual message, and (c) how the request found in a conceptual question is to be translated into a process that accesses the relevant parts of the visual description for the required but unknown piece of information.

t This refers to the instantiation of particular variables in a visual description.

Figure 2.3: Pinker's Graphic Notation
[Node-and-arrow diagram illustrating the notation with a triangle and a circle. Source: Pinker, 1981, p. 49, reproduced with permission.]

According to schema theories (e.g.
Minsky, 1975; Winston, 1975; Norman & Rumelhart, 1975; Bregman, 1977; Schank & Abelson, 1977), a schema is a representation in memory embodying knowledge in some domain, consisting of a description capable of specifying both the information that must be true of some represented object of a given class and the sorts of information that will vary from one exemplar of the class to another. In general, it is believed that a graph schema results from the basic human ability to associate a scale of values in one domain with a scale of values in another domain by means of relating the positive ends of the two respective scales. It is suggested that people also create schemas for specific types of graphs using a general graph schema embodying knowledge of what graphs are for and how they are generally interpreted.t

Pinker (1981) and Kosslyn et al. (1983) define four major procedures or processes that access the structures representing graphic information:

1. A Match process that recognizes an individual graph as belonging to a specific type

2. A Message Assembly process that creates a conceptual message out of the instantiated graph schema

3. An Interrogation process that retrieves/encodes new information on the basis of conceptual questions

4. A set of Inferential processes that apply mathematical and logical inference rules to the entries of the conceptual message

Essentially, the Kosslyn-Pinker model of graph comprehension posits that a visual array of incoming information is first translated directly into a visual description via some bottom-up encoding mechanisms. The visual description in turn primes the appropriate graph schema in memory via a
The visual encoding mechanisms then detect the presence of any predicates that are automatically encoded in the visual processes and perform assembly of conceptual  t For example, Kosslyn's analysis of graphs reveals that they consist material, a framework, and a set of labels (see Kosslyn et al., 1983).  messages.  generally  of  In  pictorial  LITERATURE REVIEW / 39 short, the availability of these predicates causes certain conceptual message equations to be flagged when the necessary information to be extracted is also available. unavailable, the interrogation  processes  answering the particular conceptual  Otherwise, if the information is  via t o p - d o w n encoding mechanisms may be used to aid in  questioni  posed.  Finally, the reader has the additional option of  performing mathematical and logical operations on the entries in the conceptual message via the  inferential  process,  if necessary, though it would consume more time and effort.  Figure 2.4 illustrates  the Kosslyn-Pinker model of the graph comprehension process.  b.  Pinker's  Graph  Difficulty  Principle  The most critical aspect of the Kosslyn-Pinker model of graph comprehension is Pinker's (1981) Graph  Difficulty  Principle,  which simply states that  A particular type of information will be harder to extract from a given graph to the extent that inferential processes and t o p - d o w n encoding processes, as opposed to conceptual message lookup, must be used.  Pinker (1981,1983) contends that his Graph Difficulty Principle has frequently been noted in the graph comprehension literature (e.g. Macdonald-Ross, 1977a) in such statements as,  ...different types of graphs are not easier or more difficult across-the-board, but are easier or more difficult depending on the particular class of information that is to be extracted.  t  See the Glossary for the definition  of this term.  
Figure 2.4: Pinker's Process Model of Graph Comprehension
[Flow diagram: a visual array is encoded bottom-up into a visual description, which is matched against a graph schema; the instantiated schema yields a conceptual message via message assembly, with top-down encoding, interrogation and inferential processes operating as needed. Source: Pinker, 1981, p. 63, reproduced with permission.]

Based on the fact that there is a variety of available descriptions in our language system for the shapes of lines as well as pairs of lines (e.g. straight, curved, parallel, x-shaped, etc.), and that the availability of these predicates affords the possibility of a rich set of message flags for trends in a line graph schema, Pinker (1981, 1983) argues that line graphs are especially suited to the extraction of trend information. Hence, they are the preferred method of displaying multidimensional scientific data, where cause-and-effect relations, quantitative trends, and interactions among variables are at stake. On the other hand, Pinker (1981) notes that it is sometimes preferable to use a bar chart rather than a line graph to determine the difference between two levels of one variable corresponding to a pair of values on another, since the desired values are specified individually in bar but not in line graphs. Yet, Pinker (1983) admits that one of the major problems with this observation is that

...there is no independent evidence for the putative perceptual effects alluded to (e.g., effortless perception of line shape and single bar length versus effortful perception of a set of relative bar lengths and the height of a segment of a curve).

In a similar vein, there has been no independent source of empirical evidence to suggest that the use of symbol charts to support scale value extraction is superior to the use of other graphs.
In other words, there is a definite need for gathering direct and independent empirical evidence to resolve the very basic argument about these alluded-to perceptual effects.

c. Pinker's Treatment of Information Extraction

Pinker's treatment of the information extracted from graphs follows closely that of Bertin (1967, 1973). According to him, the information extracted can be expressed in a representation comprising a list of numbered entries, each of which specifies a pairt of variables, the type or extent of each independent variable (i.e. single value, pair, range), and the value (or difference or trend) of the corresponding dependent variable.

For instance, in the case of figure 2.5 taken from Pinker, the following conceptual messages may have been assembled:

1. The price of graphium is very high in March.

2. The price is higher in March than in the preceding months.

3. The price steadily declined from March to June.

4. The price is $20/ounce in January.

5. The price in June is x (where x is a quantity about half of that for January, about a fifth of that for May, etc.).

As a result, Pinker tabulates the set of paired observations assumed to be extracted from the graph as follows:

1: Va absolute-value = March, Vb level = high
2: Va pair = March & February, Vb difference = higher
3: Va range = March - June, Vb trend = decreasing
4: Va absolute-value = January, Vb absolute-value = $20/oz.
5: Va absolute-value = June, Vb ratio-value = x.

Thus, it has been surmised that conceptual messages may be generally expressed as an input-output pair of,

i: Va ratio-value/absolute-value/pair/range = A, Vb ratio-value/absolute-value/pair/range = B

where the subscripts "i" and "a or b" represent an arbitrary number of entries and variables respectively.$

t For more complex graphs, an n-tuple of variables must be specified instead (see Pinker, 1981, p. 17).
$ See Pinker (1981, p. 18) for more detail.
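Pinker's entry notation can be read as a small data structure. The sketch below is this writer's Python illustration of that notation (the class names and fields are hypothetical, not Pinker's): each entry pairs a typed slot on the independent variable (Va) with a typed slot on the dependent variable (Vb), and a slot whose value is missing plays the role of the ? marker that turns an entry into a conceptual question.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative rendering (not Pinker's implementation) of his
# conceptual-message notation: an entry pairs a slot on the independent
# variable (Va) with a slot on the dependent variable (Vb).  'kind' is
# the type/extent of the value (absolute-value, pair, range, ratio-value,
# ...); value=None stands for the desired but unknown piece.

@dataclass
class Slot:
    kind: str                # e.g. "absolute-value", "pair", "range"
    value: Optional[object]  # None plays the role of Pinker's "?"

@dataclass
class Entry:
    va: Slot
    vb: Slot

    def is_question(self):
        """An entry with an unfilled slot is a conceptual question."""
        return self.va.value is None or self.vb.value is None

# Entry 4 of Pinker's graphium example: the price in January is $20/oz.
entry4 = Entry(Slot("absolute-value", "January"),
               Slot("absolute-value", "$20/oz."))

# The corresponding conceptual question: "What is the price in January?"
question = Entry(Slot("absolute-value", "January"),
                 Slot("absolute-value", None))

print(entry4.is_question(), question.is_question())  # -> False True
```

Under this reading, the interrogation process is simply the procedure that locates an entry matching a question's filled slot and supplies the missing value.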
Figure 2.5: Pinker's Information Extraction from a Bar Chart
[Bar chart of the monthly price of graphium (price per ounce) for January through June. Source: Pinker, 1981, p. 56, reproduced with permission.]

Consequently, the notation for conceptual questions may be expressed simply by replacing the A or B in the generalized entry presented earlier with the ? symbol, indicating that it is the desired but unknown piece of information.

Finally, a good grasp and illustration of the Kosslyn-Pinker model of graph comprehension requires a careful examination of their depicted bar chart schema, which is discussed next.

3. Graph Schema Models

a. A Bar Chart Schema

Figure 2.6 presents a substantial chunk of a schema for interpreting bar charts (Pinker, 1981; Kosslyn et al., 1983). It shows how the scene of a bar chart is mentally divided into its "L-shaped" frameworkt and its pictorial material representation: the bars. From here, the framework is subdivided into the abscissa and the ordinate, which are further subdivided into the actual line and the text printed alongside it. There are also explicit listings for the "pips" cross-hatching the ordinate and the numbers associated with them. All of the relevant information associated with each bar (e.g. height, position, etc.) is specified with respect to the coordinate systems centered on the respective axes of the framework. Pinker uses an asterisk to show that a node, together with its connection to other nodes, can be duplicated any number of times in the visual description.
Conceptual information that is available for "reading off" the instantiated graph schema, such as the psychophysical ratio-value of the independent variable (IV) to be equated with the horizontal position of the bar with respect to the abscissa and that of the dependent variable (DV) to be equated with the height of the bar with respect to the ordinate, is specified by means of equation flags appended to respective nodes or arrows.

More interestingly, Pinker notes that in the bar chart schema, deriving the DV absolute-values‡ may require the mediation of one's psychophysical ratio-values, but not so in the case of the IV absolute-values, due to the fact that bars stand physically on the abscissa framework. Similarly, the extremeness levels of the DV (including the maximum or minimum levels), the DV "staircase-trend" for a simple entity plot, and the level differences of an adjacent bar pair are presumed to be automatically available. Even then, Pinker hypothesizes that higher-level inferential processes may have to be used when converting between ratio-values and absolute-values of unknown entries§ in the conceptual message, which means more time and effort.

Taken together, this bar chart schema claims that readers (at least those who are experienced) are able to translate directly a higher-order perceptual pattern (e.g., a group of bars comprising a staircase) into a quantitative trend, to translate efficiently differences between a pair of adjacent bars into an entry expressing a difference in the symbolized values, and to translate a salient perceptual entity into an entry expressing the extremeness of its corresponding variable value, without the mediation of one's psychophysical ratio-scale values.

b. A Symbol Chart Schema

The Kosslyn-Pinker graph schema presented here is extensible to modeling comparable graph schemas: a pie chart schema, a dot or symbol chart schema, a line graph schema, and so on. In what follows, this paper will try to offer a brief but concise extension of the Kosslyn-Pinker model to the symbol¶ and the line representations, since Pinker's version of the bar chart schema is sufficient for the purpose of this research.

† These are the two planar dimensions described in Bertin (1983): the "x-" and "y-" axes.
‡ Pinker uses the term absolute-value to denote a scale whose units are discrete and well-defined, such as the number of bars in a bar chart; however, he uses the term ratio-value to mean those quantities whose units could be changed to other units without any loss of information (e.g., yards-feet-inches). See also the Glossary.
§ E.g., the graphium price in June in figure 2.5.
¶ For a dot chart, each dot may be treated as a specific kind of symbol.

Figure 2.6: Pinker's Proposed Bar Chart Schema
[Two-page schema diagram omitted. Source: Pinker, 1981, pp. 61-62, reproduced with permission.]

A logical follow-up of the symbol graph schema model is to divide it into its "L-shaped" framework and its pictorial material: the dots or symbols. Again, the framework should be subdivided into the actual line and the text printed alongside it.

Such a schema should be very similar to the bar chart schema. Thus, its visual description should contain explicit listings for the "pips" cross-hatching the ordinate and the abscissa, as well as the letters or numbers associated with them. All of the relevant information (e.g., vertical and horizontal positions) of each symbol should be specified via the coordinate systems centered on the respective axes of the framework. Again, an asterisk may be used to represent that a node, together with its connection to all other nodes, can be duplicated any number of times in the visual description. Conceptual information that is available for "reading off" the instantiated schema, such as one's psychophysical ratio-values of the independent variable to be equated with the horizontal position of the symbol with respect to the abscissa and that of the dependent variable to be equated with the vertical position with respect to the ordinate, should be specified by means of equation flags appended to respective nodes or arrows.

The key difference between the bar chart schema and that of the symbol chart lies in the anchoring of bars to the abscissa framework and the stronger linkages among the symbols. Bars are more likely to be perceived as rectangular objects that are strongly anchored to the abscissa, whereas symbols have only weak anchoring to the abscissa frame. In addition, symbols with the same shape show a greater cohesiveness than bars, especially if they represent multiple datasets. Finally, Garner's (1970) structural dimensional principles would postulate that the vertical as well as horizontal positions of bars and symbols alike are integral, but the simplicity of the symbol chart should facilitate the reader's ability to translate interpolated scale values of a symbol's exact location more effectively than those of either a bar or a point on a line, as observed in Cleveland & McGill (1984). With respect to a pair of symbols, rather than translating a level difference into a judgement of difference in anchored length, as in the case of a pair of bars, level difference information would translate into a judgement of positional difference.
Again, the latter task is performed more accurately in human visual processes (Cleveland, 1985).

As for trend perception, a series of bars translates into either an ascending or descending staircase, or else merges into a large rectangular shape.† In contrast, a series of symbols may either produce a series of meaningful patterns if they are placed close together, or else result in an array of scattered patterns, especially if they are widely spread out.

c. A Line Graph Schema

In the case of a line graph schema, its visual description should again be naturally decomposed into its "L-shaped" framework and its pictorial material: the lines. Like other graph schemas, the framework will be subdivided into the abscissa and the ordinate, which are further subdivided into the actual line and the text printed alongside it.

Evidently, the line graph schema does not have any straightforward way of deriving absolute-values for either the DV or the IV, the lines being totally disjointed from the "L-shaped" (i.e., axes) framework. Instead, one seems to assess the locations of segmented "points" along both the horizontal and vertical spatial dimensions in terms of the reader's psychophysical scales, and then pips are searched along both the abscissa and the ordinate for those closest to specifying the point's location. Thus, the point's positional scale-values are spelled out by the corresponding numbers or letters that must be disembedded and matched onto appropriate pips via an "interpolation" process (i.e., one of the inferential processes). Consequently, it requires effort to extract single datapoints on a line.

The Gestalt Laws of organization of marks would also argue that each line has strong point cohesiveness (i.e., it is fully connected), and thus patterns or trends of successive pairs or ranges of points are seen together.

† See figure 2.6.
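The interpolation process just described — locating a point between labelled pips and estimating its scale value — can be caricatured in a few lines of code. This is only a hypothetical sketch of the computation involved, not a model taken from Pinker or Kosslyn, and the pip positions are invented:

```python
def interpolate_scale_value(y_pos, pips):
    """Estimate a point's DV value from its vertical position, given the
    ordinate pips as (position, labelled_value) pairs -- a crude stand-in
    for the perceptual interpolation process described above."""
    pips = sorted(pips)
    # Find the two pips bracketing the point and interpolate linearly.
    for (p0, v0), (p1, v1) in zip(pips, pips[1:]):
        if p0 <= y_pos <= p1:
            return v0 + (v1 - v0) * (y_pos - p0) / (p1 - p0)
    raise ValueError("position outside the labelled scale")

# Pips every 50 units of position, labelled 0, 20, 40, 60 (invented numbers).
pips = [(0, 0), (50, 20), (100, 40), (150, 60)]
print(interpolate_scale_value(75, pips))  # 30.0
```

The point of the sketch is simply that reading an unanchored point off a line requires extra inferential work (search plus interpolation) that bars, standing directly on the abscissa, partly avoid.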
A longer† range of abscissa values would also enhance trend and pattern perception, in direct contrast to bars. This is especially true for schemes of multiple-dataset plots, since an additional dataset represented as bars will create multiple additional bar entries, but it will create only one additional line entry when represented as lines. Hence, conceptual messages on patterns and trends for lines will naturally be assembled relatively faster than is afforded by a bar or symbol schema.

In summary, the good aspect of the line graph schema is its rich set of message flags for trends: richer, in fact, than any of the other types of graph or chart schemas discussed. For example, Pinker argues that in the case of the standard Cartesian line graph representing a dependent variable on the ordinate and two independent variables on the other components:

    The absence of an effect of the independent variables on the dependent variable translates into flat, overlapping lines; an effect of one of the independent variables translates into two lines with a slope, and an effect of the other independent variable translates into non-overlap of the two lines; additivity of the effects of the two independent variables translates into parallel lines and non-additivity into nonparallel lines, and so on (Pinker, 1983, pp. 6-7).

4. Implications of the Kosslyn-Pinker Theory

The Kosslyn-Pinker process theory of information representation uncovers a number of stages that a graph reader undergoes in the comprehension process. Essentially, time performance for a particular task is longer with top-down as opposed to bottom-up processing. Experience or familiarity with extracting various data from certain graph formats will improve processing time. For some tasks, the use of a certain format may also induce faster processing strategies than it does for another task.

† By that I mean when there is a larger number of "time periods" to be depicted along the abscissa of a time-series graph.
The important implication of the theory is, then, the appropriate mapping of formats to tasks, in order that the most efficient processing strategies may be executed.

The findings in the literature appear generally to support such a view. For example, with time-series data, subjects were found to use line graphs better than horizontal or vertical bar charts for reading trends (Schutz, 1961). However, tables generally led to better performance than graphs for point-value reading tasks (Washburne, 1927; Carter, 1947; Lusk & Kersnick, 1979; Benbasat et al., 1986). When comparing points and patterns, the evidence in the literature is that graphs lead to better performance than tables (Washburne, 1927; Carter, 1948; Feliciano, Powers, & Bryand, 1963).

A more critical implication of the theory that is of relevance to this research, however, is its focus on principles (e.g., the Gestalt Laws, Pinker's Graph Difficulty Principle) governing why some forms of graphical representation should be more supportive of performance with some tasks but not with others. For example, Pinker's Graph Difficulty Principle allows many of the phenomena about the ease or difficulty of reading graphs to be explained, and ideas discussed in earlier theories to be integrated for empirical validation. Several a priori hypotheses are thus drawn from these theories and evaluated in this research program so as to add to the accumulated evidence (e.g., Pinker, 1983; Cleveland & McGill, 1984) supporting (or refuting) the general predictive validity of the theories.

III.
THEORETICAL PROPOSITIONS

The literature review of the preceding chapter indicates that it is (a) the task, in terms of a conceptual question asked, (b) the format of graphical representation, and (c) the graphical information complexity of the display that affect performance (e.g., time) with an information presentation. A fourth factor, the experience of the graph reader, could also influence performance. This factor will not be manipulated; rather, replications of the experimental sessions will be used to minimize learning and experience effects.

A. CRITICAL FACTORS

Accordingly, this chapter begins with a discussion of those factors believed to critically affect the use of an information presentation:
1. Graph Format
2. Information Complexity
3. The Task Variable

1. Graph Format

Different writers use different terms, such as modes, forms, formats, and/or representations, to convey more-or-less the same meaning (Lusk & Kersnick, 1979; Lucas & Nielsen, 1980; Kosslyn, 1982; Larkin & Simon, 1987). For example, the term visual factors (Washburne, 1927, p. 468) was used to refer to those factors that have to do with similarities and/or dissimilarities of the geometrical patterns used in graphic numerical representations.

Since the perception of various graph formats is constrained by various operating principles governing human visual information processing (Kosslyn et al., 1983), the key issue regarding the use of one graphic format instead of another lies precisely in the resulting pattern(s) that the human visual system can automatically extract (Pinker, 1981, 1983; Kosslyn et al., 1983). For example, Kosslyn (1985) argues that three lines that meet to form an enclosed figure are perceived as a triangle, rather than simply three lines, apparently because the human visual system is governed by perceptual operating principles known as the Gestalt Laws of organization discussed in chapter 2.
Other examples where Gestalt principles apply include the automatic registration of entire lines in a line graph, the reading of spatially isolated symbols in a symbol graph, and the perception of discrete rectangular bars anchored along the abscissa in a bar chart. Accordingly, the relative effectiveness of a graph in conveying its informational content depends greatly on the choice of its representational format: circles, triangles, shaded figures, unfilled or filled dots, symbols, wedges, bars, pictures or lines. It is this choice that complicates important design issues of color- and graphics-enhanced information support systems available for end-users and/or decision makers.

This research is limited to dealing with the three graph formats that are the most widely used in time-series representations: bars, symbols, and lines.

2. Information Complexity

Bertin (1983, p. 6) treats the complexity of graphics as a function of the number of identifiable elements in each variable component or dimension.† He uses the term component to refer to the two planar dimensions and six retinal variables, including value, shape, size, color, texture and orientation,‡ which, he claims, are the only possible variations available in graphics designs. As noted in the preceding chapter, a major facet of information complexity in time-series graphics corresponds to variations in the "length" of the various graphical components (e.g., the "quantity scale" represented by the ordinate scale pips, the "time period variation" represented by the abscissa time axis pips, and the "dataset category" represented by the classification of data groupings).§

† Please consult the Glossary (appendix A) for the definition of a Dimension.
‡ These terms are defined in the Glossary (appendix A).
§ Refer to the Glossary for detailed definitions of these variational concepts.
Owing to the lack of either a comprehensive theory or sufficient empirical evidence to suggest how these various factors of graphical information complexity could affect performance with a graphical presentation and how they would interact with each other, a logical approach to exploring their influence would be to combine what might constitute discriminable values or levels of the various complexity factor treatments, as proposed in table 3.1.† One advantage of this classification scheme (table 3.1), compared to that offered by Lauer et al. (1985), when applied to time-series graphics is its parsimony. The scheme also focuses on the more significant notion of length‡ and ignores the less important construct of the regularity§ factor.

More critically, by investigating each of the component lengths at both the high and low level combinations relative to each other, independent contrasts and comparisons of the different factorial effects may be achieved, rather than the possible confounding of effects due to the various factor combinations as proposed in the Lauer et al. scheme.

In fact, a one-to-one correspondence may be drawn between the factors of graphical information complexity presented in table 3.1 and the major graphical components characterizing a time-series graph as illustrated in figure 3.1:

† See also figure 3.1 for an illustration of the various components affecting the complexity factors listed in table 3.1.
‡ The term length as defined by Bertin refers to the number of identifiable elements in a given component or variable. For example, the PIV component length of a time-series graph equals the number of time periods depicted along its abscissa, while its SIV component length is the number of data groupings represented in a key (i.e., legend). For time-series graphics, the ordinate length could be described as an inverse function of the number of interpolating pips (i.e., the quantity representation) or the significant digits used in the DV scale component. See also Lauer et al. (1985), Lauer (1986) and Yoo (1985) for further discussion of the issue of "length".
§ Lauer (1986) and Lauer et al. (1985) operationally defined regularity as the degree of fit as well as the percent rank changes in slope. In contrast to the more-or-less demonstrated effects due to the lengths of time-series graphic components, there is little evidence to indicate the significance of a regularity construct (Yoo, 1985; Lauer, 1986). Even so, the definition of this construct is somewhat more speculative than that of Schutz's degree of line-crossing (or confusability) factor, which was also found to contribute to the increasing complexity of graphics (see Lauer et al., 1985; Schutz, 1961b). One way to control the Schutz effect would be to ensure that the data sets used in constructing the graphics stimuli do not result in any form of line-crossing.

1. The Dependent Variable (DV) component, characterizing the quantity scaling represented along the ordinate scale (i.e., y-axis)
2. The Primary Independent Variable (PIV) component, characterizing the time period variation represented along the abscissa time axis (i.e., x-axis)
3. The Secondary Independent Variable (SIV) component, characterizing the dataset category represented as a coding scheme in a key (i.e., legend)

The importance of manipulating complexity factors in the study of information presentations may be attributed to:
1. The difference between extracting the same information from a complex as opposed to a simple time-series representation. Complexity is important because of its relationship to the amount of information a reader can assimilate and understand in a presentation. Bertin comments that the results of the Wainer et al. study (see Wainer et al., 1982) may vary as data amount and complexity increase.
Pinker (1981, 1983) observes that the effects of extracting such information as relative levels or trends from various forms of graphs may differ when complex rather than simple graphs are used.
2. The increasing size of the organizational information resource (see Lauer, 1986). There is a need for guidelines for IS designers on how increasingly complex information may be effectively presented. Knowledge of the relative effectiveness of various complex graphics for various kinds of tasks can contribute to a better understanding of graphical design principles. Undoubtedly, complex time-series graphics are often found among real-world applications.
3. Finally, results of task performance comparing simple versus complex graphics will yield better generalizations than findings based on only simple or only complex graphs. Further, graphics used in a MIS context have been far more complex than what has been tested by researchers in areas outside of the MIS field. Consequently, to produce useful results for all practical purposes, graphs ranging from the fairly simple to the complex are used as experimental stimuli in this research.
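Assuming the factor names used in table 3.1, its eight treatment combinations (a) through (h) are simply the Cartesian product of three two-level factors, as this short sketch shows (the lettering order in the table need not match the enumeration order here):

```python
from itertools import product

# Three complexity factors, each at two levels (Low/High), yielding the
# eight treatment combinations (a)-(h) of table 3.1.
factors = ["quantity_scaling", "time_period_variation", "dataset_category"]
levels = ["Low", "High"]

cells = [dict(zip(factors, combo)) for combo in product(levels, repeat=3)]
print(len(cells))  # 8 treatment combinations
```

Manipulating only two of the three factors, as this research does, would reduce the design to the four cells of a 2x2 product.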
Table 3.1: A Classification Scheme for Information Complexity Factors

  Information Complexity   Quantity Scaling   Time Period Variation   Dataset Category
  a                        Low                Low                     Low
  b                        High               Low                     Low
  c                        Low                High                    Low
  d                        High               High                    Low
  e                        Low                Low                     High
  f                        High               Low                     High
  g                        Low                High                    High
  h                        High               High                    High

Figure 3.1: Gross Proceeds of Various Corporate Security Issues from the Year 1910 to the Year 1982
[Time-series graph omitted. Its legend (public bonds, stocks, private bonds) illustrates the Secondary Independent Variable component (dataset category); its ordinate scale illustrates the Dependent Variable component (quantity scaling); its abscissa time axis illustrates the Primary Independent Variable component (time period variation).]

Owing to the limited number of factors that can be effectively studied at once, the two factors of "time period variation" and "dataset category" of time-series graphics, as defined in the Glossary, are the only manipulations of graphical information complexity performed in this research.

3. The Task Variable

Essentially consistent with Bertin's taxonomy of question-types, Pinker (1981) and Kosslyn et al. (1983) provide a taxonomy of the basic classes of information, for both the independent and the dependent variables, that are extractible from any information representation. Their independent variable (IV) classification includes: (a) a single datapoint; (b) a pair of adjacent datapoints; and (c) a range of successive datapoints. Their dependent variable (DV) information categories include: (a) a level; (b) a ratio-value; (c) an absolute-value; (d) a difference; and (e) a trend.
The difference between the concept of a ratio-value and that of an absolute-value is explained in the Glossary.†

The table given below compares and contrasts the various classifications proposed by Bertin (1981) and those of Pinker (1981) regarding the classes of primitive symbols‡ and/or the relationships among these primitives that are extractible from a presentation.

  Bertin's Levels   Bertin's Classes          Pinker's IV                        Pinker's DV
  Elementary        A Primitive Symbol        A Single Datapoint                 A Scale Value; A Level
  Intermediate      A Homogeneous Cluster     A Pair of Adjacent Datapoints      Scale Values; A Difference; A Trend
  Overall           A Comprehensive Cluster   A Range of Successive Datapoints   Scale Values; Relative Levels; A Trend

† These terms are also discussed in a preceding footnote under the section on Pinker's proposed bar chart schema.
‡ Refer to the Glossary for a definition of this term. Note also that since the interest of this research is only in the y-axis scale values of the various time-series datapoints, the term scale-value is preferred to the more specialized terms absolute-value and/or ratio-value as defined in the Glossary.

Indeed, a good grasp of the kinds of critical and fundamental questions that may be asked about extracting various sorts of quantitative information from a graphical presentation can be arrived at by carefully examining the above table. For a single datapoint on a time-series, the apparently fundamental information to extract is its DV quantity or scale value. Examples of conceptual questions on single datapoints for time-series graphics, with the usual quantity-time correspondence represented along the y- and x-axes as depicted in figure 3.1, are:

1. What are public bonds' proceeds in 1934? in 1928? in 1952?
2. When did proceeds from stocks first reach the $400 million mark? the $250 million mark?
3. Which investment type, in any year, comes closest to the $500 million mark?
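Mapped onto data, such exact questions reduce to simple lookups over a (time period, value) series. The sketch below uses invented numbers, not the actual values of figure 3.1:

```python
# Hypothetical time-series data (year -> proceeds in $ millions) standing in
# for one dataset of figure 3.1; the values are invented for illustration.
stocks = {1928: 180, 1934: 90, 1952: 260, 1958: 410, 1964: 330}

# Exact question: the scale value of a single datapoint.
value_1934 = stocks[1934]

# PIV-oriented question: when did proceeds first reach the $400M mark?
first_400 = min((y for y, v in stocks.items() if v >= 400), default=None)

assert value_1934 == 90
assert first_400 == 1958
```

The first lookup corresponds to reading the DV component given a time period; the second works in the opposite direction, scanning the series to recover a time period from the PIV component.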
To find the quantities or scale values of two or more datapoints would just be a natural extension of this fundamental task.

For a pair of adjacent datapoints, on the other hand, information about its DV level difference appears basic and important. Examples of conceptual questions on adjacent datapoint pairs for time-series graphics (see figure 3.1) with the usual quantity-time correspondence are:

1. Is the level of proceeds from stocks in 1958 higher than that in 1964?
2. Did the greatest change in levels of proceeds from stocks occur between the years 1958 and 1964?
3. Which investment has the largest change in levels of proceeds between two consecutive time periods?

The concept of a level difference is meaningless for an isolated datapoint. The level differences for a range of successive datapoints are, evidently, another natural extension of the basic task discussed here.

Finally, for a range of successive datapoints, information regarding their DV trend appears to be most critical. Examples of conceptual questions on successive ranges of datapoints for time-series graphics with the usual quantity-time correspondence (see figure 3.1) are:

1. What is the general trend of proceeds from private bonds during 1940 to 1982?
2. What is the maximum number of time periods over which proceeds from stocks continued to rise?
3. Which investment has the longest falling trend in proceeds?

First, it is not possible to specify a trend for a single datapoint. Moreover, the concept of a trend for a pair of adjacent datapoints is simply embedded in the more basic concept of the level difference between them. What appear to have been left out on purpose, however, are cases of pairs or ranges of non-adjacent datapoints. This is because when two or more datapoints are non-adjacent, there must be some other datapoints in between.
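Trend questions of this kind reduce to run-finding over the level differences of consecutive datapoints. For instance, the "maximum number of time periods over which proceeds continued to rise" can be sketched as follows (data invented for illustration):

```python
def longest_rising_run(values):
    """Length (in steps) of the longest stretch of consecutive rises."""
    best = run = 0
    for prev, cur in zip(values, values[1:]):
        run = run + 1 if cur > prev else 0
        best = max(best, run)
    return best

# Invented series of proceeds for illustration.
proceeds = [100, 120, 150, 140, 160, 180, 210, 90]
print(longest_rising_run(proceeds))  # 3 consecutive rising steps (140->160->180->210)
```

The pairwise comparison inside the loop is exactly the level-difference judgement for adjacent datapoints; the running count aggregates those judgements into a trend over a range.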
Hence, the different kinds of questions that can be asked about them can be regarded as further extensions and/or generalizations of what has already been discussed. Accordingly, these cases are best treated as composite tasks to be studied as extensions of the elementary questions described so far.

As a further complication, since each datapoint in a time-series with just one dataset depicts only a time-quantity (x, y) attribute correspondence, when there are two or more datasets the need arises for an additional attribute characterization: that of a dataset classification. In such an event, the resulting attribute characterizations then correspond to the DV quantity scale component, the PIV time period component, and the SIV dataset categorization component.†

† Figure 3.1 illustrates the meanings of these attribute characterizations.

Collectively, then, a task may be classified by:
1. Attribute component, depending on the attribute component from which the answer is to be disembedded: the DV, the PIV, and/or the SIV component.‡
2. Question type, such as questions on scale values, level differences, trends, cluster relationships, and so on.§

Table 3.2 shows a general classification of graphics research tasks. Limiting question types to the three elementary classes discussed so far, six different elementary task combinations are thus identified for a time-series with only one dataset, and nine elementary task combinations for a time-series with two or more datasets.¶

Table 3.3 shows the classes of elementary questions that are studied. These include:
1. Exact Questions (i.e., scale values of single datapoints)
2. Relationship Questions (i.e., level differences of pairs of consecutive datapoints)
3. Trend Questions (i.e., trends of ranges of successive datapoints)

This research program investigates, in a systematic order, the nine different fundamental task combinations presented in table 3.3.

‡ Note that questions pertaining to extracting embedded answers from the SIV component (tables 3.2 and 3.3) are applicable only when more than one dataset (figure 3.1) is involved.
§ See table 3.2.
¶ See table 3.3.

Table 3.2: A General Classification of Graphics Research Tasks

  Graph Attribute Components    General Information-Extraction Tasks
                                (Scale Values | Relative Levels | Trends | ...etc.)
  DV Component
  Primary IV Component
  Secondary IV Component

Table 3.3: Classes of Elementary Comprehension Tasks

  Graph Attribute Components    Classes of Elementary Questions Studied
                                (Exact Questions | Relationship Questions | Trend Questions)
  DV Component
  Primary IV Component
  Secondary IV Component

4. Learning

It is believed that experienced graph readers know, from long-term memory, the correspondences between quantitative trends and visual patterns for various types of graph. For example, crossing lines in a line graph would convey to an expert graph reader the existence of an interaction effect, whereas parallel lines in the same graph would translate into the absence of such an effect. Similarly, a descending staircase in a bar graph would indicate a falling trend to the efficient bar chart reader, whereas an ascending staircase would indicate a growing trend.

As it is very difficult to monitor people's experience and training in the use of various types of graphs, some of the methods available to researchers for investigating as well as reducing the possibility of a strong confounding learning effect would, for example, be: to restrict the study to the use of conventional graphics (i.e.
graphs whose constructions have been based on established rules or standards); to choose appropriate candidates (e.g., subjects who have been exposed to the graphs under investigation); and to apply a time-repeated-measures design (i.e., experimental replications). Replications of experimental sessions should not only provide an indication of the significance of learning, but should also help to partial out spurious factorial effects attributable to a lack of practice or unfamiliarity.

In this research, each participant replicates two experimental sessions, each of which consists of a total of thirty-six treatment combinations or trials.

B. TASKS INVESTIGATED IN THIS RESEARCH

A characterization of the tasks examined in this research is based on the graph attribute component (table 3.3) from which answers to the questions presented in each respective experiment are to be extracted. Hence, one may classify experimental task characteristics according to the variable from which pertinent information is chiefly to be extracted, namely:
1. The dependent variable (DV) component (in experiment 1)
2. The primary independent variable (PIV) component (in experiment 2)
3. The secondary independent variable (SIV) component (in experiment 3)†

Tables 3.4 and 3.5 compare and contrast the task activities for each of the three experiments conducted in this research program. It should be noted that the processing of task activities for E1, E2, and E3 is expected to be in order of increasing complexity, because explicit time-period and dataset information anchoring is afforded in the tasks for E1 and E2 but not for E3.
Tables 3.4 and 3.5 suggest that the amount of search needed for performing E3 tasks, owing to the unknown dataset information, should increase the time required as well as reduce accuracy relative to performing E1 and E2 tasks; that is, in E3 subjects will have to search all datasets in order to answer the question, whereas in E1 and E2 subjects need only examine information related to a single dataset. Similarly, E2 tasks are expected to take longer to perform than E1 tasks: while time-period information is explicitly given for E1 tasks, the appropriate time-period information must be extracted for E2 tasks. This requires the search of multiple points of the given dataset in E2.

1. Experiment 1 Tasks

As presented in table 3.4, the task activities for experiment 1 comprise:
1. Q1 -- Finding the DV scale-value of a single datapoint with an explicitly defined time period (i.e., PIV information) on a particular dataset (i.e., SIV information) and comparing it to a given DV scale-value
2. Q2 -- Finding the DV level difference pattern of a pair of adjacent datapoints with explicitly defined time periods (i.e., PIV information) on a particular dataset (i.e., SIV information)
3. Q3 -- Finding the DV trend of a range of successive datapoints with explicitly defined time periods (i.e., PIV information) on a particular dataset (i.e., SIV information)

† See figure 3.1 and tables 3.4 & 3.5.

Questions for this experiment have answers that are to be extracted specifically from the DV component (i.e., scale-value, level difference, and trend information) based on given PIV (i.e., time period) and SIV (i.e.
dataset) information.† As highlighted in a later section of this chapter, the key characteristic of the tasks in this experiment is a strong anchoring of time-period information on the abscissa component; that is, all tasks in this experiment begin with explicit time-period information and work towards uncovering the respective DV component information. Treatment combinations for the factors of graphical information complexity are presented in table 3.6. Appendix B provides the 36 different treatment combinations (trials) to be undertaken in this experiment; actual questions and accompanying graphics are shown.

2. Experiment 2 Tasks

As presented in table 3.4, task activities for experiment 2 comprise:
1. Q1 -- Examining the DV scale-values of single datapoints across all time periods of a particular dataset for a given DV scale-value
2. Q2 -- Examining the DV level difference patterns of pairs of adjacent datapoints across all time periods of a particular dataset for a given DV level difference pattern (increasing or decreasing)
3. Q3 -- Examining the DV trends of ranges of successive datapoints across all time periods of a particular dataset for a given DV trend

Table 3.5 indicates that questions for experiment E2 have answers that are to be extracted from the PIV component (i.e. time periods) based on given SIV information (i.e. dataset) and characteristics of DV component information. Similar to the tasks in experiment E1, the key characteristic of the tasks in E2 is a strong anchoring of time-period information on the abscissa component.

† See table 3.5.
Table 3.4: A Comparison of Task Activities for Experiments E1, E2 and E3

Classes of Elementary Questions Studied

Experiment 1
  Exact questions (Q1): Finding the DV scale-value of a single point with explicitly defined IV time period and dataset.
  Relationship questions (Q2): Finding the DV levels of two adjacent points with explicitly defined IV time periods and dataset.
  Trend questions (Q3): Finding the DV trend of a range of points with explicitly defined IV time periods and dataset.

Experiment 2
  Exact questions (Q1): Examining the DV scale-values of isolated points across time periods of a particular dataset to find a specific IV value.
  Relationship questions (Q2): Examining the DV level difference pattern of adjacent points across time periods of a particular dataset to find a pair of IV values.
  Trend questions (Q3): Examining the DV trends of successive points across time periods of a particular dataset to find a range of IV values.

Experiment 3
  Exact questions (Q1): Examining the DV scale-values of isolated points across time periods of all datasets to find a dataset value.
  Relationship questions (Q2): Examining the DV level difference pattern of adjacent points across time periods of all datasets to find a dataset value.
  Trend questions (Q3): Examining the DV trends of successive points across time periods of all datasets to find a dataset value.
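The given/find structure of the three experiments described above can be summarized compactly. The sketch below is an editorial illustration only (the component abbreviations follow the text: PIV = time period, SIV = dataset, DV = scale value); it is not part of the experimental materials:

```python
# A minimal encoding of the task structure described above; E1 gives the
# time period and dataset and asks for DV information, E2 gives DV and
# dataset information and asks for time periods, E3 gives DV information
# and asks for the dataset.
TASKS = {
    "E1": {"given": {"PIV", "SIV"}, "find": "DV"},
    "E2": {"given": {"DV", "SIV"}, "find": "PIV"},
    "E3": {"given": {"DV"}, "find": "SIV"},
}

def search_scope(experiment: str) -> str:
    """E3 lacks dataset anchoring, so all datasets must be scanned."""
    return "one dataset" if "SIV" in TASKS[experiment]["given"] else "all datasets"

assert search_scope("E1") == "one dataset"
assert search_scope("E3") == "all datasets"
```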
Table 3.5: Status of Information on Various Attribute Components of Time-Series Graphics

Experiment E1
  Q1: Primary IV -- Given: specific x-axis value. Secondary IV -- Given: specific dataset value. DV -- Find: compare the derived DV scale value to a given specific scale value.
  Q2: Primary IV -- Given: consecutive pair of specific x-axis values. Secondary IV -- Given: specific dataset value. DV -- Find: relationship of two scale values.
  Q3: Primary IV -- Given: successive range of specific x-axis values. Secondary IV -- Given: specific dataset value. DV -- Find: trend of a range of scale values.

Experiment E2
  Q1: Primary IV -- Find: specific x-axis value. Secondary IV -- Given: specific dataset value. DV -- Given: specific DV scale value.
  Q2: Primary IV -- Find: consecutive pair of specific x-axis values. Secondary IV -- Given: specific dataset value. DV -- Given: relationship of two scale values.
  Q3: Primary IV -- Find: successive range of specific x-axis values. Secondary IV -- Given: specific dataset value. DV -- Given: trend of a range of scale values.

Experiment E3
  Q1: Primary IV -- not applicable. Secondary IV -- Find: specific dataset value. DV -- Given: specific DV scale value.
  Q2: Primary IV -- not applicable. Secondary IV -- Find: specific dataset value. DV -- Given: relationship of two scale values.
  Q3: Primary IV -- not applicable. Secondary IV -- Find: specific dataset value. DV -- Given: trend of a range of scale values.

The only difference, however, is that E2 tasks all begin with known characterizations of DV component information and work toward uncovering time-period information on the abscissa. In short, time-period information is not explicit in E2 as it is in E1. Treatment combinations for the factors of graphical information complexity are similar to those of experiment E1 (table 3.6).

Appendix C lists the 36 different treatment combinations (trials) to be undertaken in this experiment, including both questions and accompanying graphics for each trial.

3. Experiment 3 Tasks

As presented in table 3.4, task activities for experiment 3 comprise:
1.
Q1 -- Examining the DV scale-values of single datapoints across all time periods and all datasets depicted for a given DV scale-value
2. Q2 -- Examining the DV level difference patterns of pairs of adjacent datapoints across all time periods and all datasets depicted for a given DV level difference pattern
3. Q3 -- Examining the DV trends of ranges of successive datapoints across all time periods and all datasets depicted for a given DV trend

As observed in table 3.5, questions for the third experiment have answers that are to be extracted from the SIV component (i.e. dataset information) based solely on given characteristics of DV information. That is, subjects need to search along both the PIV attribute component (i.e. all time periods) and the SIV attribute component (i.e. all datasets) in order to perform tasks in this experiment. In a later section, it will be shown that the key characteristic of the tasks in this experiment is the absence of a strong anchoring of time-period information on the abscissa component. Treatment combinations for the factors of graphical information complexity are provided in table 3.7. Appendix D presents the 36 different treatment combinations (trials) to be undertaken in this experiment, with actual questions and accompanying graphics for each trial.

Table 3.6: Information Complexity Manipulated in Experiments E1 and E2

  Information     Time Period    Dataset     Total data
  Complexity      Variation      Category    pts plotted
  a               7              1           7
  b               14             1           14
  c               7              3           21
  d               14             3           42

Table 3.7: Information Complexity Manipulated in Experiment E3

  Information     Time Period    Dataset     Total data
  Complexity      Variation      Category    pts plotted
  a               7              2           14
  b               7              3           21
  c               14             2           28
  d               14             3           42

Note that the variables manipulated in E3 differ from those of experiments E1 and E2.
First, only multiple representations of time-series graphics are used in E3, since the emphasis is on asking questions dealing with the SIV attribute component. Second, this change in the factor levels of graphical information complexity should result in a greater demand on the time and effort of the participants. Indeed, time and accuracy results of pilot tests indicate that subjects found the task activities of E3 more demanding than those of E1 and E2. Further supporting evidence lies in the larger mean reaction times and higher percentages of incorrect responses found for task performance during the actual sessions of E3 as compared to the other two experiments. In this research, the rationale for using three experiments instead of a single experiment lies precisely in this difference in task complexity across the experiments. Within each experiment, the tasks (i.e. Q1, Q2, and Q3) are generally designed to be equally complex, and task complexity is not a variable of interest.

C. THEORY & PROPOSITIONS

As noted in chapter 1, further progress in the area of MIS graphics research is likely to come about with: (1) the introduction of a scheme by which complex tasks can be decomposed to a level where their underlying mechanisms may be characterized (Vessey, 1987); (2) the specification and validation of a sound taxonomy of question types (i.e. tasks) by which results from graphics studies can be compared and integrated (Davis, 1985; Jarvenpaa & Dickson, 1988); and (3) the generation of a priori hypotheses regarding why certain results may or may not be expected (Benbasat et al., 1986). The final sections of this chapter address these problems based on the concept of matching characteristics between tasks and graphical representations.
It is this knowledge which will contribute to our understanding of the relative strengths and weaknesses of various graph formats for performing different tasks (Jarvenpaa, 1986; Vessey, 1987; Jarvenpaa & Dickson, 1988).

The concept of matching appropriate formats to appropriate tasks is one that has continued to intrigue graphics theorists as well as researchers (see chapters 1 and 2). A review of the current MIS empirical literature shows a strong and growing interest among researchers in accumulating empirical evidence on the different circumstances in which various types of information representations may prove to be better or worse, based on criteria such as decision time, decision quality, and interpretation accuracy. Since reviews of the empirical evidence comparing tabular and other types of graphical representations have recently appeared in the mainstream of MIS publications (e.g. Jarvenpaa & Dickson, 1988), this discussion will focus on views expressed among the theorists.

First, Cleveland (1984) and Cleveland & McGill (1984) observe that the use of bar graphs should generally be avoided, and thus recommend that they be replaced by dot charts. Among the reasons cited for this claim are:
1. The elementary perceptual task(s) people are likely to perform on a dot chart would be that of judging position along a common scale, but both area and length judgments, which are found to be less accurate than positional judgments, are likely to play important roles when bar charts are used (Cleveland & McGill, 1984, p. 532-533);
2. Some data values will be given more visual emphasis than other data values in bar representations, but not in dot representations (Cleveland, 1984, p. 277).
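The Cleveland-McGill task ordering on which reason 1 rests can be stated as a ranked list. The coding below is a simplified editorial sketch of their published ordering (several tasks grouped at a single rank), not a reproduction from the thesis:

```python
# Cleveland & McGill's (1984) ordering of elementary perceptual tasks,
# from most (1) to least (6) accurately judged; ties share a rank.
ACCURACY_RANK = {
    "position_common_scale": 1,      # what dot charts exploit
    "position_nonaligned_scales": 2,
    "length": 3, "direction": 3, "angle": 3,
    "area": 4,                       # bar charts also invoke length/area
    "volume": 5, "curvature": 5,
    "shading": 6, "color_saturation": 6,
}

def more_accurate(task_a: str, task_b: str) -> bool:
    """True if task_a is ranked as more accurately judged than task_b."""
    return ACCURACY_RANK[task_a] < ACCURACY_RANK[task_b]

# The argument above: positional judgments (dot charts) beat the
# length and area judgments that bar charts tend to invoke.
assert more_accurate("position_common_scale", "length")
assert more_accurate("position_common_scale", "area")
```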
Second, the Kosslyn-Pinker model of graph comprehension appears to argue that a bar chart schema allows more immediate extraction of values of the PIV attribute component for single bars than the disembedding of abscissa values for segmented points on a line. The process theory (e.g. Pinker, 1981; Kosslyn et al., 1983) actually implies that it is difficult to identify the relative levels of respective points on a line graph because that information is still part of a unitary Gestalt, the line. In contrast, identifying trends on a line is automatic. Trend perception for bars may become effortful, however, owing to the occasional need to perform a serial identification and combination of relevant isolated bars.

For time series, Bertin's rules of graphic construction specifically recommend the use of: symbol graphs for overall vision (correlation); line graphs for general trend perception; and symbol-connected graphs or bar graphs for precision reading, depending on the way the conceptual question is phrased (see Bertin, 1983, p. 215; 1981, p. 114-115). In addition, Bertin (1981, p. 107) claims that the privileged domain of scatterplots is the discovery of clusters or groupings of objects, and/or the relationship between two characteristics. However, as the scope of this research is limited to studying time-series graphics, identifying DV cluster relationships falls outside the domain of tasks that are of interest.
According to current theories, then, it is easier to extract scale-values from a dot or bar chart than from a line graph, and to extract trends from a line graph than from a dot or bar chart. But it appears uncertain whether symbol plots, dot charts, or bar graphs would best facilitate the extraction of level differences. In other words, effective communication can be achieved with the help of graphics only when the information intended to be read is represented in the most appropriate format. The table below puts together the various facets of theoretical support for the different sorts of information that should be optimally extractable from the various choices of graph format.

  Graph Format         Question Type          Theories                  Supporting Authors
  Symbols/Dots; Bars   DV Scale Values        Task Ordering;            Cleveland;
                                              Process theories;         Kosslyn et al., Pinker;
                                              Rules of Construction     Bertin
  Bars; Dots/Symbols   DV Level Differences   Process theories;         Kosslyn et al., Pinker;
                                              Task Ordering             Cleveland
  Lines                DV Trends              Process theories          Kosslyn et al., Pinker
  Scatterplots         DV Clusters            Rules of Construction     Bertin

The above table reveals that while some theories have a narrower focus, others, like the Kosslyn-Pinker process theories, appear to provide a more comprehensive explanation of the use of graphics over a wider range of tasks. As a matter of fact, one may also list the corresponding empirical evidence in an additional column indicating whether the theoretical predictions were empirically supported or refuted. Because much of the empirical literature has focussed on tabular versus graphical representations, whereas this research emphasizes solely the use of different graphical methods for data extraction, it is argued that there is still a need to accumulate evidence on the superiority of one graphic form over another before the suggested column is added.

Finally, the theories are, unfortunately, much less explicit regarding complexity issues. Certainly, the introduction of graphical information complexity factors would further complicate the above table. For instance, Pinker (1981, p. 36) argues that the advantage of line graphs over bar graphs for perceiving DV trends would be expected to be even more pronounced in situations where grouped bar charts are used instead of multi-line graphs.
This suggests a further reclassification of question types into those performed with single representations versus those performed with multiple representations. Cleveland (1985) claims that the further apart a pair of entities are in one planar dimension, the less accurate will be the perception of their relationship along the corresponding perpendicular dimension. Here, the distance between points would become a crucial variable when reclassifying question type. Among the theorists, Bertin's view appears most complicated: for example, Bertin (1981, p. 115-121) argues that the rules of construction differ for "reorderable" objects† with one, two, or three characteristics.‡ Even so, he extends his rules to apply to such diagrammatic constructions as networks, maps, and images. The reader is referred to his work for details (e.g. see Bertin, 1981, p. 100-175).

† I.e. objects, quantities, or datapoints belonging to a specified order which can be changed.
‡ The number of characteristics bears a one-to-one correspondence with the number of dataset groupings.

1. The Theory Investigated

The principal thesis which effectively summarizes the theory under investigation is Pinker's (1981) Graph Difficulty Principle. Briefly stated, this principle argues that

A particular type of information will be harder to extract from a given graph to the extent that inferential processes and top-down encoding processes, as opposed to conceptual message lookup, must be used.

In a separate paper, Pinker (1983, p.
3) summarizes the implication of his theory and principle as,

...the ease of reading a certain type of information from a certain graph format will depend on the extent to which that graph format translates the quantitative trend into a single visual pattern which the visual system can automatically extract, and on the extent to which the reader knows that the correspondence between the visual pattern and that trend† holds in that format.

In this context, Pinker's theory claims that no one form of information representation is dominant across the board: instead, one representation may be better suited to yielding the answer to one kind of question but poorly suited to yielding the answer to another question, depending on which visual pattern conveys the answer and on the ability of the graph reader to encode that pattern. To illustrate the implications of this theory, Pinker (1981) argues that when trends no longer correspond to single attributes of a distinct perceptual entity but must be inferred from the successive intervals (see Pinker, 1981, p. 38), the extraction process naturally becomes more time-consuming and effortful. The representation given in figure 3.2b (see Pinker, 1981) presents a case in point. If the figure were re-constructed as a line graph using variable 3 (i.e. A vs B) as the abscissa, and variable 1 as the parameter, Pinker asserts that this new graph would not portray the linear and accelerating trends as transparently as the present line graph, as these trends must be inferred from successive intervals.

† I.e. the sorts of information to be extracted.

Figures 3.2(a,b): Pinker's Illustrations of Graph Designs for Trend Reading
Similarly, the corresponding trends of the bar-chart counterpart given in figure 3.2a are not as transparent as in the recommended line graph, for precisely the same reason: the trends may have to be deduced from successive pairwise comparisons of bars.

Obviously, not only is it important to realize how Pinker's general theory stated above pulls together the earlier discussion of graphics theories, but it is also critical to articulate the kinds of plausible propositions and hypotheses that may be drawn from this and the other theories. Thus, the rest of this section focusses on advancing a set of plausible and testable propositions based on the theories, as well as providing the underlying reasoning for these hypothesized effects. The rationale for this approach is to avoid the major lack of a priori reasoning, found in past graphics-related research, concerning why a certain graph format would be better suited to performing a given task.

Note also that, in general, propositions regarding the interaction of graph format with factors of graphical information complexity are speculative rather than theory-based, owing to the current lack of theoretical development in those directions.

a. Proposition 1

Proposition 1.1: The extraction of the correspondence between a specific DV scale value and value(s) of IV(s) for a single datapoint† is better suited to symbol or bar graphs than to line graphs.

† E.g. Q1 of E1 as discussed in tables 3.4 and 3.5.

The current theoretical argument for this proposition is that in a line graph schema, the scale values of both the DV and the IV of a datapoint are effortful to extract because the datapoint forms an integral part of a line. In contrast, a symbol, when uniquely determined, is an isolated and well-defined 'perceptual unit' and not, on the basis of Gestalt laws, an integral part of a larger perceptual unit.
Indeed, the symbol may well vary in size, which would then determine the nature of the 'perceptual unit' seen. Furthermore, the extraction of DV scale values for bars should be faster and more accurate than for lines, simply because the top rectangular base of a bar is discrete and flat for interpolating DV values on the ordinate scale, whereas each datapoint on a line is fully embedded and difficult to isolate or detect for reading its value on the DV scale.

Proposition 1.2: Effects of graph format are expected to interact with factors of graphical information complexity when answering questions on the extraction of DV scale values of a single datapoint.

Moreover, it is hypothesized that with increasing information complexity the expected effects would become more pronounced, for the simple reason that when more time-period subdivisions are plotted on the abscissa of time-series graphics, datapoints on a line inevitably become closer together and thus even harder to isolate. Similarly, with an increasing number of datasets, which translates into multiple lines, the disembedding task correspondingly requires even more effort.† Yet, owing to the incompleteness of evidence in the literature, further research is desirable to clarify effects due to complexity factors. To date, there have been few studies on the effects of graphical information complexity on task performance (e.g. Lauer, 1986; Yoo, 1985), and their results have not, moreover, been satisfactory.

b. Proposition 2

Proposition 2.1: The extraction of the correspondence between a DV level difference and values of IVs for an adjacent pair of datapoints‡ is better suited to symbol or bar charts than to line graphs.

† This justifies that the variables manipulated here are, in fact, dimensions of the graphical information complexity construct.
‡ E.g. Q2 of E1 as discussed in tables 3.4 and 3.5.
The theoretical rationale for this proposition is that for a line, decoding the relative level of an adjacent pair of datapoints would still involve breaking up part of the overall line trend -- a 'Gestalt' on its own -- although it is reasonable to expect that performance of this task with lines would be faster and more accurate than that of extracting DV scale-values on the ordinate scale.

In discussing his bar chart schema, Pinker (1981) hypothesizes that the DV level of each bar on a bar chart is easily decoded because not only is each bar complete by itself, with an enclosed area and length,† but the primary independent variable value (i.e. time period) of each bar is also instantly identified. Hence, "bar graphs are better than line graphs ... for illustrating differences between dependent variable values corresponding to specific independent variable values since the desired values are specified individually in the bar but not the line graph ..." (Pinker, 1981, p. 39).

For an adjacent pair of symbols, decoding their relative levels translates into judging their positional differences along a common scale,‡ an elementary task which is ordered as more accurate to perform than judging lengths or directions in the Cleveland-McGill (1984) task hierarchy.

Clearly, the views of the different theories are not always in agreement. Although the theories appear to indicate that both symbol and bar charts are expected to be superior to line graphs for extracting DV level differences, whether symbols or bars are the more suitable format for such a task is still a debatable issue. Indeed, the fact that symbols appear to combine a characteristic of bars, in terms of discreteness, with one of lines, in terms of connectedness, places them as the prime representation for the rapid and accurate extraction of DV level difference information.
However, the findings from this research will contribute to clarifying this reasoning.

† I.e. it forms a unitary Gestalt.
‡ In this study, this common scale refers to the ordinate scaling; that is, the DV quantity attribute scaling represented along the y-axis of time-series graphics.

Proposition 2.2: The effects of graph format are expected to interact with factors of information complexity in answering questions about level differences of an adjacent pair of datapoints.

With more complex graphical representations, such as an increasing number of time periods plotted along the abscissa, the advantage of symbols and bars over lines for extracting level difference information may be reduced, because many more level differences may have to be examined in the case of a complex graph; hence, the effectiveness of lines for the perception and comparison of level relationships, in contrast to other representations in a complex graphical context, may have to be recognized.

Similarly, if complexity is due to plotting a large number of datasets, there exists the possibility of a strongly adverse effect for bars. This is because, as the number of categories in the dataset classification increases, it becomes harder to perceive level differences in bars belonging to a specified category than in lines or symbols, since bars belonging to the same category in the classification would be represented in a grouped bar chart as isolated bars. This is not true for symbols or lines. Again, these speculations need the support of empirical evidence.

c. Proposition 3

Proposition 3.1: The extraction of the DV trend for a range of datapoints† is better suited to line graphs than to symbol or bar graphs.

† E.g. Q3 of E1 as discussed in tables 3.4 and 3.5.
‡ The correct word, as I see it, should be "encodable" and not "avoidable".

The theoretical reasoning is that DV trend extraction is best suited to line graphs rather than any other representation because

In line graphs, trends translate into the shape of a line or of a configuration formed by a set of lines, which is an easily avoidable [sic]‡ property. However, in bar graphs, especially those that
The theoretical reasoning is that DV trend extraction is best suited to line graphs rather than any other representations because  In line graphs, trends translate into the shape of a line or of a configuration formed by a set of lines, which is an easily avoidable  [SIC]* property.  However, in bar graphs, especially those that  t E.g. Q3 of E1 as discussed in tables 3.4 and 3.5. * The correct w o r d as I see it should be "encodable"  and not  "avoidable"  THEORETICAL PROPOSITIONS / 82 encode more than t w o variables, trends translate into a particular pattern of lengths of different bars, which, not forming a unitary Cestalt, must be examined and compared one or t w o at a time (Pinker, 1983).  Simply stated, this means that either a bar-to-bar or symbol-to-symbol comparison may have to be performed in graph schemas other than that of a line graph. Moreover, human information processing is limited by the number of bars or symbols that can be encoded simultaneously (Ericsson et al., 1980). However, a range of several points on a line may easily be encoded as having a single attribute.  Proposition 3.2: The effects of graph format are expected to interact with factors of graphical information complexity when answering questions about trends of a range of datapoints.  Naturally, as more and more datasets are to be plotted, the advantage of lines for trend perception over bars and symbols should increase, as the power of trend perception  in lines is further  "capitalized". Conversely, increasing the number of symbol arrays or bar series to be processed could easily confuse or overwhelm any graph reader owing t o their characteristics of being discrete.  D.  THE ANCHORING  CONCEPT  The concept of information  anchoring,  now described, is used to provide the theoretical basis for  characterizing experimental tasks as well as those of graph formats specifically investigated in this research.  1. 
Task Characteristics

Table 3.8 illustrates how the experimental tasks examined in this research can be characterized on the basis of the anchoring concept. First, task activities characterized as Group I tasks are those with a strong or high anchoring of information on both the ordinate (y-axis) and the abscissa (x-axis). This kind of task is usually limited to drawing (x,y) relationships, although it does not matter whether the corresponding relationships to be studied begin from the x-axis and work towards the y-axis or vice versa; namely, when a question starts with a time period (a value on the x-axis), as in Q1 of E1 (see table 3.5), or when the answer is to be a time period (a value on the x-axis) based on a given y-value, as in Q1 of E2 (see table 3.5).

In contrast, task activities characterized as Group IV are those with a weak or low anchoring of information on both axes. Since the concept of strong information anchoring on a graphical component is related directly to the disembedding of information on the respective dimensional axis, exact questions regarding specific (x,y) correspondences are, by definition, excluded where tasks are classified as having only weak or low anchoring on both the abscissa (x-axis) and the ordinate (y-axis). Examples of such tasks are Q2 and Q3 of E3 (see table 3.5).

Between Group I and Group IV tasks, which represent the ends of a continuum, lie tasks whose activities are characterized as having a strong anchoring of information on one component (e.g. the x-axis) but a weak anchoring of information on the other (e.g. the y-axis). Note that all tasks in this research have a strong anchoring on the dataset component. Specifically, Group II tasks are those having a high anchoring of information on the abscissa but a low anchoring of information on the ordinate, and Group III tasks the reverse.
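The 2 x 2 grouping just described amounts to a simple lookup on the two anchoring levels. A minimal editorial sketch (the labels are assumed, not the thesis's notation):

```python
# Groups I-IV keyed by (x-axis anchoring, y-axis anchoring),
# mirroring the grouping described in the text.
GROUPS = {
    ("high", "high"): "I",    # e.g. Q1 of E1, Q1 of E2
    ("high", "low"):  "II",   # e.g. Q2, Q3 of E1 and E2
    ("low",  "high"): "III",  # e.g. Q1 of E3
    ("low",  "low"):  "IV",   # e.g. Q2, Q3 of E3
}

def classify(x_anchor: str, y_anchor: str) -> str:
    """Return the task group for a given pair of anchoring levels."""
    return GROUPS[(x_anchor, y_anchor)]

assert classify("high", "high") == "I"
assert classify("low", "low") == "IV"
```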
Table 3.8 classifies all the experimental tasks investigated according to the anchoring concept (i.e. Groups I, II, III, and IV). Details of these tasks have been provided earlier (see tables 3.4 and 3.5). Note that the matrix for classifying the investigated task activities can be extended from the 2 x 2 information anchoring characteristics on the planar dimensions (i.e. x-axis and y-axis) to a 2 x 3 anchoring of information on the x-, y-, and z-components of three-dimensional graphics. In the next section, the characteristics of bars, symbols, and lines are discussed using the same concept of information anchoring on the various graphical components representing time-series data.

Table 3.8: The Anchoring Concept -- A Classification of the Tasks Investigated in this Research onto the Anchoring Framework

                                Ordinate (y-axis) Anchoring
                                High (+)              Low (-)
  Abscissa        High (+)      Group I tasks:        Group II tasks:
  (x-axis)                      Q1 (E1), Q1 (E2)      Q2, Q3 (E1); Q2, Q3 (E2)
  Anchoring       Low (-)       Group III tasks:      Group IV tasks:
                                Q1 (E3)               Q2, Q3 (E3)

2. Graph Format Characteristics

Figure 3.1 illustrates the three major components for information anchoring of a time-series graphical representation:
1. Dependent variable (DV) component (i.e. y-axis anchoring)
2. Primary independent variable (PIV) component (i.e. x-axis anchoring)
3. Secondary independent variable (SIV) component (i.e. dataset anchoring)

Accordingly, it is possible to characterize bars, symbols, and lines on the basis of whether they show high, moderate, or low anchoring of information on these respective graphical components.

For bars, since each bar stands on a pip that is anchored strongly to the abscissa, bars are said to have high x-axis anchoring.
On the other hand, the top of each bar provides a flat platform (the length of which varies from one design to another) which usually helps in interpolating DV scale values on the ordinate reliably. It can thus be argued that bars have moderate y-axis anchoring. Finally, as bars belonging to the same dataset are discrete and isolated from one another, so that a series of appropriate bars must be processed to derive information on the dataset, bars are seen to exhibit low dataset anchoring.

For symbols, the x-axis and y-axis anchoring appear to be about equal because, unlike bars, symbols do not connect directly to either of the two planar dimensions. Hence, symbols are considered to have moderate x-axis and moderate y-axis anchoring relative to bars. Yet different symbols belonging to the same dataset, although still discrete and isolated from one another, are more easily grouped than separate rectangular bars, because of their relative sizes and of the similarity which they bear to lines in the case of multiple-dataset representations. Therefore, symbols may also be said to have moderate dataset anchoring.

For lines, as all points on a line are simply embedded, and isolating a point from any part of a line literally breaks up a unitary Gestalt, lines are argued to have both low x-axis and low y-axis anchoring. However, relative to either bars or symbols, lines have the best and highest dataset anchoring.

3. Matching Formats to Tasks

In this section, several additional propositions are postulated that relate essentially to the above assumptions regarding the various characteristics of the tasks and graph formats studied in this research. In other words, they may be regarded as a priori predictions on the matching characteristics of the various graph formats (i.e. bars, symbols, and lines) with the various task categories (i.e.
Groups I, II, III, and IV tasks) specific to the context of this research.

a.  Proposition 4

Proposition 4: Performance of task activities with characteristics of strong information anchoring on both the abscissa and ordinate framework (i.e. Group I tasks) is best achieved with the use of bars or symbols but not lines.

Group I tasks (x-axis anchoring = high, y-axis anchoring = high), as defined, will be limited to locating (x,y) points for either question or answer (e.g. Q1 of E1 and E2, tables 3.5 and 3.8). For these tasks, bars are expected to help with the exact location of points, either expressed in the question and/or required in the answer, because among all three formats investigated they provide the best x-axis anchoring (e.g. compare figures 3.2a,b). Locating points with symbols will be easier than with lines, because effort is required to separate points embedded on a line (i.e. cutting a point on a line). Since lines have the lowest x-axis and y-axis anchoring, cutting points on lines will be one of the most difficult tasks. Indeed, the theory argues that predicates of patterns and trends, but not exact locations of points, are stored or assembled in memory when lines are read. Overall, lines, being continuous and disjointed from both axis components, would be the most undesirable format to use for performing Group I tasks. At this point, note that because of the type of scaling depicted by the y-axis (ratio scale) as opposed to the x-axis (ordinal scale) for time-series graphics, interpolation on the y-axis appears to be harder to perform than interpolation on the x-axis. This observation will also be useful in distinguishing between the difficulty of using lines to answer Q1 in E1 as opposed to that of Q1 in E2.

b.
Proposition 5

Proposition 5: Performance of task activities with characteristics of a strong information anchoring on the abscissa framework but a weak information anchoring on the ordinate framework (i.e. Group II tasks) is best achieved with the use of bars when the task at hand requires only the extraction of single datapoints, and with the use of symbols when the task at hand requires the simultaneous extraction of multiple datapoints.

Group II tasks (x-axis anchoring = high, y-axis anchoring = low) are limited to those with specific time period information in either answer or question (table 3.5) but are not involved with y-axis values (e.g. Q2 and Q3 in E1 and E2, table 3.8). The strength of bars for these tasks lies in their having a high x-axis anchoring. However, since they are the most discrete and individually processed among the graph formats, their use will only be appropriate when singular datapoints are to be extracted one at a time. Conversely, symbols, which are more nearly continuous, will have an edge over bars when multiple datapoints are to be extracted simultaneously.

c.  Proposition 6

Proposition 6: Performance of task activities with characteristics of strong information anchoring on the ordinate framework but weak information anchoring on the abscissa framework (i.e. Group III tasks) is best achieved with the use of symbols.

Group III tasks (x-axis anchoring = low, y-axis anchoring = high) are limited to those where time-period information is of no concern in either question or answer, but the explicit focus is on a specific y-axis value (e.g. Q1 in E3, tables 3.5 and 3.8). For such tasks, performance is expected to be best achieved with the use of a representation like horizontal bars.
Since only vertical bars are used in this research, symbols provide the second best alternative, because they have been found to yield a more accurate anchoring of information on the ordinate (y-axis) than either bars or lines (Cleveland, 1984; Cleveland & McGill, 1984).

Note that as all tasks in this research have a strong dataset anchoring characteristic, the use of symbols can be expected to have a slight advantage over that of bars, because bars have a low dataset anchoring whereas symbols have a moderate dataset anchoring. Hence, the use of symbols for Q1 of E3 should prove superior to either bars or lines.

d.  Proposition 7

Proposition 7: Performance of task activities with characteristics of weak anchoring of information on both the ordinate and abscissa framework (i.e. Group IV tasks) is best achieved with the use of lines and worst with the use of bars.

Group IV tasks (x-axis anchoring = low, y-axis anchoring = low) are restricted to those focussing on the dataset information, such as trends or general patterns (e.g. Q2 and Q3 in E3, tables 3.4, 3.5 and 3.8). The characteristics of lines match these kinds of tasks because, first, points on lines have no anchoring characteristics to the x- and y-axis components, and second, lines have the best dataset anchoring. Bars, being the most strongly anchored to the abscissa framework as well as having the worst dataset anchoring among the graph formats studied, will be expected to result in the worst match to these tasks.

In summary, the anchoring framework provides a strong basis for classifying the decomposable tasks investigated in this research, besides facilitating the generation of relevant propositions based on matching the characteristics of tasks with those of graphical information representations.

IV.
EXPERIMENTAL METHODOLOGY

In chapter 3, several major factors believed to influence the use of a graphical presentation were identified from the literature and several propositions were drawn from the theories. In this chapter, the experimental methodology used for studying those factors and for testing those propositions will be discussed.

First, it should be noted that the adoption of a cumulative experimental approach in this research program has many advantages. They include the generalizability of pervasive effects and the opportunity to manipulate important variables differently in different experiments. For instance, manipulation of many levels of the task variable is more easily accommodated with a series of related experiments than with a single complex experiment. In addition, findings drawn from a program of experiments allow progressive examination of a particular hypothesis. Finally, results based on a program of research can usually be generalized beyond the confines of individual experiments, whereas results from a single experiment would be considerably more limited.†

The series of experiments comprising this research will: involve related sets of experimental variables; test similar experimental hypotheses, with the emphasis placed on the graph format by question-type interaction effect; follow identical experimental plans and procedures; conform to a general statistical model; and use closely resembling graphics stimuli.

A. EXPERIMENTAL VARIABLES

As indicated previously, elapsed time was the principal dependent variable in this research. Accuracy, a secondary criterion, was included to ensure adequate control over possible time-accuracy tradeoff effects that might surface and present difficulties in the interpretability of results (see chapter 1).

† Other advantages of using a cumulative experimental approach may be found in Dickson et al. (1986) and Jarvenpaa & Dickson (1988).
See also Weick (1965) on advantages associated with using an experimental methodology.

EXPERIMENTAL METHODOLOGY / 91

The independent variables in each experiment were:

1.  Graph Format

2.  Information Complexity

    a.  Variations in Time Period

    b.  Variations in Dataset Category

3.  Question Type

A "session" variable was included to control for learning. An individual difference variable based on the field-dependence-independence construct (Witkin et al., 1971) was included as a covariate in the experimental design.

1. The Dependent Variables

a.  Time

Time, the principal criterion, was measured as the duration between (1) the time at which the graphical presentation and the question to be answered appeared on the CRT screen and (2) the time at which the subject pressed the 'answer' key (i.e. "1" or "2") to record his response (see figure 4.1). It was captured unobtrusively by the computer.

b.  Accuracy

Accuracy was a secondary criterion. A score of "1" was assigned to each correct answer picked from the binary-choice questions and a "0" to others. This scoring scheme was chosen as being easy to code, although it resulted in a distribution with undesirable departures from normality. Binary-choice questions were used instead of multiple-choice questions because timing was critical, and participants tended to spend more time searching the keyboard when multiple-choice rather than binary-choice questions were used.

On the issue of normality, as subjects performed each experimental session twice, an alternative scoring scheme was to combine their scores over the two sessions.†

2. The Independent Variables

a.  Graph Format

Graph Format was a prime factor of interest studied in each of the experiments. The different types of graph format investigated were:

1.  Symbols

2.  Bars

3.  Lines

b.
Question Type

Question type, which was based on the classes of quantitative information that could be extracted from a presentation, was another key factor of interest in this research. Accordingly, questions were designed on the basis of the fundamental classes of information to be extracted:

1.  Exact Questions (Q1) -- These were questions testing the reading and understanding of the relationship between a given DV scale value and the exact scale value of a single datapoint

2.  Relationship Questions (Q2) -- These were questions testing the reading and understanding of the relationship between the DV level differences of a pair of adjacent datapoints

3.  Trend Questions (Q3) -- These were questions testing the reading and understanding of the DV trend among a range of successive datapoints

As discussed in chapter 3, the key characteristic of tasks in experiments E1 and E2 was that of a strong abscissa anchoring, although E1 tasks begin with an x-axis value and work towards the DV attribute component, while E2 tasks begin with information characterizing the DV component and end with values on the x-axis. The key characteristic of E3 tasks was the absence of the strong abscissa anchoring noted for E1 and E2 tasks. In other words, time period information is neither provided nor asked for in experiment E3 tasks (chapter 3).

† This issue is discussed further in Chapter 5.

c.  Information Complexity

Factors of graphics information complexity formed the secondary factors of interest in this research program. Owing to the limit on the number of sub-factors that could be effectively studied at once, the construct of information complexity for time-series graphics was operationalized as variations due to two attribute components:

1.  Variations in Time Period -- This factor was manipulated at two levels:

    a.  7 time periods

    b.
14 time periods

    Presumably, an increase in the number of time periods represented along the abscissa would correspondingly increase the amount of irrelevant information that must be processed in order to extract the relevant answers embedded in the experimental graphics displays.

2.  Variations in Dataset Category -- This factor was manipulated differently for different experiments.† For experiments E1 and E2, the treatment levels for this factor were:

    a.  One dataset

    b.  Three datasets

    For experiment E3, the treatment levels for this factor‡ were:

    a.  Two datasets

    b.  Three datasets

† Refer to tables 3.6 and 3.7, which show the different factorial treatment levels for all three experiments.
‡ The limited size of the Packard-Bell micro-computer monitor used in this research (which is similar to the size of a typical micro-computer monitor) allows only a maximum of three 14-period datasets to be plotted simultaneously in any one display.

It is expected that an increase in the number of datasets plotted would result in a corresponding increase in the amount of information to be processed, and thus in impairment of task performance. In particular, only multiple datasets were used for the task activities examined in experiment E3 because these tasks were concerned only with information extraction from the SIV attribute component (i.e. dataset information).

3. The Session Variable

Each subject went through a practice period followed by two experimental sessions. Each experimental session consisted of 36 treatment combinations.† The purpose of the replication was to control for possible effects due to learning.

4. The Covariate

Since tasks of identifying trends, finding specific point values, and comparing level differences involve perceptual disembedding, it is possible that performance may be influenced by individual characteristics. Witkin et al. (1971, p.
4) described such individual characteristics in terms of a person's "perceptual style", to which they applied the construct of "field-dependence-independence":

In a field-dependent mode of perceiving, perception is strongly dominated by the overall organization of the surrounding field, and parts of the field are experienced as "fused". In a field-independent mode of perceiving, parts of the field are experienced as discrete from organized ground.

Indeed, individual difference or user characteristics have long been considered an important variable to be controlled in MIS laboratory experiments, particularly those investigating the characteristics of the human-computer interface (see Mason & Mitroff, 1973; Dickson et al., 1977; Benbasat et al., 1986). Consequently, the individual difference construct of field-dependence and field-independence, as operationalized by the GEFT† score (Witkin et al., 1971), was introduced as a covariate in the statistical model used to analyze the data collected for the series of studies conducted for this research program.

† See appendices B, C, and D for the 36 trials tested in experiments E1, E2, and E3 respectively.

A variety of equivocal results has emerged in the literature dealing with effects of individual differences on system utilization and performance (see Dickson, 1971; Benbasat & Taylor, 1978; Zmud, 1979; Mock & Vasarhelyi, 1983; Huber, 1983). The purpose of this research was to limit the study of individual differences to that of a possible covarying factor affecting time and/or accuracy performance. Interactions of the individual difference factor with other independent variables investigated (e.g. graph format) would be outside the scope of this research.

B. EXPERIMENTAL HYPOTHESES

The experimental hypotheses were based on the theoretical propositions advanced in chapter 3. In general, null hypotheses (i.e.
hypotheses of no difference between means among the subpopulations) as defined among the factors of interest were tested at the α = 0.05 level of significance. Attention was focussed on a priori effects that were expected to be significant (e.g. the Graph Format by Question Type interaction). A more stringent criterion of α = 0.01 was also applied for distinguishing among levels of significance.

On the basis of current theories and the earlier discussion, the following general effects‡ were expected to be significant:

1.  Session -- The presence of learning is expected (see DeSanctis & Jarvenpaa, 1985). Hence, performance in the second session should improve compared to performance in the first session.

2.  GEFT Scores -- Subjects who scored high in tasks requiring the isolation and/or differentiation of relationships from a context (i.e. field-independents) were expected to outperform their counterparts (i.e. field-dependents). In other words, the field-dependence-independence construct as measured by the Group Embedded Figures Test (GEFT: see Witkin et al., 1971) scores was expected to influence task performance significantly.

    While there has been some controversy as to the inclusion of cognitive style variables in MIS graphics research (Huber, 1983; Robey, 1983), some recent evidence provided support for the superiority of field-independent subjects over field-dependent subjects in the performance of disembedding tasks regardless of the format of information presentation (e.g. see Lusk, 1979; Benbasat & Dexter, 1979, 1982, 1985).

3.  Graph Format -- No particular form of Graph Format was expected to be significantly different from the others. A significant Graph Format main effect was not expected in any experiment.

† I.e. Group Embedded Figures Test.
‡ Interaction effects of no relevance to the theories will be excluded.

4.
Question Type -- Task activities within each experiment were not expected to vary significantly, although it was expected that task activities across the three experiments might differ significantly; that is, E3 tasks would be more complex than E2 tasks, and E2 tasks more complex than E1 tasks (chapter 3).

5.  Time Period -- An increasing number of time periods represented along the abscissa (x-axis) of time-series graphics was expected to impair task performance.

6.  Dataset -- An increase in the number of datasets depicted was expected to affect task performance adversely.

7.  Graph Format x Question Type Interaction -- A highly significant Graph Format x Question Type interaction was expected for all experiments (Pinker, 1983). Different graph formats were expected to facilitate different types of tasks. In particular, support for the a priori propositions advanced in chapter 3 was expected. The key hypothesis was the matching of Graph Formats to Tasks.†

The above hypotheses included only those effects of key interest. Other interactions such as Graph Format by Dataset, Graph Format by Time Period, Question Type by Dataset, and Question Type by Time Period were of secondary interest. Limitations of current knowledge about graphics did not make it possible to state explicitly which of these effects were expected to be significant at this point.

C. EXPERIMENTAL DESIGN

A full within-subject repeated measures factorial design was planned for each of the experiments.‡ Subjects were provided with all treatment combinations, which were completely randomized for each subject during each experimental session.

Table 4.1 shows the 36 different treatment combinations undertaken by each participant in each experimental session. As shown, these combinations consisted of totally crossed treatments of 3 levels of graph format, 3 levels of question type, 2 levels of time period, and 2 levels of dataset category.

† I.e.
Groups I, II, III, and IV tasks.
‡ See Table 4.1.

Table 4.1: A Multi-factor Repeated Measures Experimental Design

Session One

                          Bars            Symbols         Lines
Information
Complexity            Q1  Q2  Q3      Q1  Q2  Q3      Q1  Q2  Q3
    a
    b
    c
    d

Session Two (Repeated Treatments)

[Same layout as Session One, with the same information complexity levels.]

Information complexity levels:

1.  For experiments 1 and 2, they comprised:

    a.  A single dataset with 7 time periods
    b.  A single dataset with 14 time periods
    c.  Three datasets with 7 time periods
    d.  Three datasets with 14 time periods

2.  For experiment 3, they comprised:

    a.  Two datasets with 7 time periods
    b.  Two datasets with 14 time periods
    c.  Three datasets with 7 time periods
    d.  Three datasets with 14 time periods

The actual sets of treatment combinations for experiments E1, E2, and E3 are provided in appendices B, C, and D respectively. These treatment combinations were administered as 36 completely randomized experimental trials for each subject during each experimental session. Each subject performed two sessions of 36 trials. Taking the replication over the two experimental sessions as a separate factor in itself led to a total within-subject design of a 2^3 x 3^2 repeated measures factorial model with one covariate for each of the experiments in the series.

Adopting the kind of standardized notation advocated by Winer (1962, 1971, p.
540), the following statistical model was adopted:

Y_ijklmn = μ + σ_i + α_j + β_k + γ_l + φ_m
         + (σα)_ij + (σβ)_ik + (σγ)_il + (σφ)_im + (αβ)_jk + (αγ)_jl
         + (αφ)_jm + (βγ)_kl + (βφ)_km + (γφ)_lm
         + [all higher-order interactions among σ, α, β, γ, and φ, up to the five-way term]
         + π_n + C(x_n - x̄) + ε_ijklmn

Where:

Y = Dependent Variable Measures (Time, Accuracy)
μ = Grand Mean
σ_i = Session, i = 2 levels
α_j = Graph Format, j = 3 levels
β_k = Question Type, k = 3 levels
γ_l = Time Period Variation, l = 2 levels
φ_m = Dataset Category, m = 2 levels
π_n = nth subject, a random factor, n = s subjects
C = Regression Coefficient for the individual difference (i.d.) factor, or Covariate
x_n = nth individual's GEFT Score
x̄ = Mean GEFT Score
ε = Within-Subject Error Term

D. EXPERIMENTAL PROCEDURES

The experimental task procedures were similar across all experiments. The basic experimental task concerned the answering of binary-choice questions presented on a display screen (see appendices B, C, and D). Reasons for using binary-choice questions and the mix of subjects recruited for each experiment were discussed elsewhere.

Subjects, tested individually, were given a set of instructions, as shown in appendix H. They were asked specifically to respond as quickly and accurately as they possibly could for each trial in each of the experiments assigned to them, no matter whether the session was actual or practice.† Moreover, subjects were encouraged to ask any questions they might have during the practice period, to reduce possible delays or interruptions during the course of the actual experimental run.
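The fully crossed treatment structure underlying this design, randomized independently for each subject and session, can be sketched as follows. This is a hypothetical reconstruction for illustration only; the factor labels and function names are not taken from the original experiment software:

```python
import itertools
import random

# Hypothetical sketch of the within-subject design: 3 graph formats x
# 3 question types x 2 time-period levels x 2 dataset levels = 36
# treatment combinations, presented in a fresh random order per session.
FORMATS = ["bars", "symbols", "lines"]
QUESTIONS = ["Q1", "Q2", "Q3"]
TIME_PERIODS = [7, 14]
DATASETS = ["few", "many"]  # e.g. one vs. three datasets in E1 and E2

def session_trials(rng: random.Random) -> list:
    """Return one session's 36 treatment combinations, completely randomized."""
    trials = list(itertools.product(FORMATS, QUESTIONS, TIME_PERIODS, DATASETS))
    rng.shuffle(trials)
    return trials

subject_rng = random.Random(1)              # per-subject seed (illustrative)
session_one = session_trials(subject_rng)
session_two = session_trials(subject_rng)   # same 36 treatments, new order
```

Each session thus delivers every cell of the factorial exactly once, which is what allows the replication (Session) to enter the model as a factor of its own.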
Subjects were also informed verbally before beginning the actual experimental session that it was possible to have their responses discounted in the final analysis of results should their overall time or accuracy performance fall below some predetermined cutoff points. While subjects were paid $10 bonuses as incentives for participation, further incentives were provided by awarding additional cash prizes of $25, $20, $15, $10 and $5 to the top five performers† in each of the experiments.

† Note that to adequately control the time-accuracy tradeoff, subjects should be aware that they must attend to both time and accuracy to the best of their ability. The more complex a task, the easier it would be for them to trade off time for accuracy or vice versa.

A secondary set of data, similar in structure to those used in the actual experimental trials, was designed for the practice period. Feedback on both accuracy and total time taken was provided by the experimenter during the practice period to encourage subjects to perform better. Subjects who answered 3 out of the 12 questions incorrectly during the practice period were requested to redo the 12 practice trials.

For the experimental session, the following procedures, as flowcharted in figure 4.1, were adopted:

1.  Subjects were asked to read the question on a CRT display monitor, without time constraint.

2.  They were subsequently asked to hit the 'return' key to receive a graph display, with the same question which they were to answer remaining at the bottom part of the display screen.

3.  As soon as they felt certain of their intended answer to a particular question, they were to hit an 'answer' key (i.e. either "1" or "2"), whereupon the time was automatically recorded by the computer.

4.  The recording of an answer quickly cleared the current display and question. The next question was shown automatically as subjects entered the next trial (i.e. back to step 1).
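The four-step trial procedure above can be sketched in code. The sketch is a modern illustration only, not the original Packard-Bell program; the three callables stand in for display and keyboard routines whose names are hypothetical:

```python
import time

def run_trial(show_question, show_graph, wait_for_key):
    """One experimental trial: question, graph on 'return', timed binary answer.

    show_question, show_graph, and wait_for_key are hypothetical stand-ins
    for the display and keyboard routines of the experiment software.
    """
    show_question()                      # step 1: question shown, no time limit
    while wait_for_key() != "\r":        # step 2: 'return' requests the graph
        pass
    show_graph()
    start = time.monotonic()             # timer starts as the graph appears
    answer = wait_for_key()
    while answer not in ("1", "2"):      # step 3: only the two answer keys count
        answer = wait_for_key()
    elapsed = time.monotonic() - start   # step 4: time recorded unobtrusively
    return answer, elapsed               # display cleared; next trial begins
```

Measuring from graph onset to answer keypress matches the definition of the time criterion given in section A above.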
The step-by-step procedures followed in the actual experimentation are provided in appendix G. Altogether, about 36 responses were collected from each subject during the experimental session and about 12 responses during the initial practice period. Furthermore, each subject was asked to perform two sessions of the same experiment. The advantage of capturing this additional information was to guard against spurious effects due to learning alone rather than to the administration of different treatment combinations.

† I.e. based on equally weighted time and accuracy performance.

These procedures were pilot tested. No complaints of fatigue were reported. Indeed, a few subjects who performed more than 100 continuous question-and-answer trials due to an earlier programming 'bug' were surprisingly oblivious to the fact that they had undergone so many trials. This was because the tasks tested were neither highly demanding nor trivial.

E. EXPERIMENTAL STIMULI

In each experiment, the stimuli were sets of graphs constructed from variations of trend patterns that were representative of a wide range of time-series. These trends consisted, in general, of simple non-crossing variations of upward and downward slopes.

Four sets of data were utilized for each study. These data sources were not of immediate interest but were constructed according to specific rules, some of which would be described later. For instance, changes in slopes were averaged out to take care of possible effects due to still unknown factors of complexity (see Wainer et al., 1982; Lauer et al., 1985). Basically, these data sources corresponded to the different treatment combinations of the information complexity factors manipulated.

1.  For experiments E1 and E2, the data comprised

    a.  A single dataset with 7 time periods

    b.  A single dataset with 14 time periods

    c.  Three datasets with 7 time periods

    d.
Three datasets with 14 time periods

2.  For experiment E3, the data comprised

    a.  Two datasets with 7 time periods

    b.  Two datasets with 14 time periods

    c.  Three datasets with 7 time periods

    d.  Three datasets with 14 time periods

Figure 4.1: An Experimental Procedure Flowchart

1.  Subject sees a question on the lower part of the screen.
2.  Subject hits CR (return) for the graph to be shown on the upper part of the screen; this starts the timer.
3.  Subject considers the answer to the question.
4.  Subject hits the answer key; the timer is stopped and the computer records the answer.
5.  Both the upper and lower halves of the CRT are cleared; a new question appears, and the subject proceeds to the next trial.

The next step was to represent these data sources in 3 types of graph forms, which resulted in a total of 12 graphs. Finally, the 36 trials were made up of these 12 graphs repeated thrice -- once for each of the three question types. Applications of these data sources had, in fact, been counterbalanced across the 36 trials according to the different treatment combinations.

Rules of construction for these data sources were:

1.  The questions must be unambiguous -- This feature was tested by the following procedure during pilot testing: whenever subjects responded wrongly to a particular question in one of the trials, the same treatment combination would be administered to them after the completion of all 36 normal trials. In this way, the data source(s) were modified for those questions which all of the pilot subjects answered wrongly during their first attempts, as well as those particular questions which were subsequently repeated several times to a pilot subject, indicating that those questions were not easy to understand.

2.  The different data sources should not produce overlapping slopes.
As noted, the elimination of cross-overs for the different datasets would help to control effects that might be attributed to Schutz's (1961b) confusability factors.

3.  The variations in slope of each data source were to be relatively constant. In effect, this controlled the regularity of slope changes (Lauer, 1986) so that no one data source would be at a different level of difficulty.

Appendices B, C, and D illustrate the different types of time-series displays used as experimental stimuli.

Finally, a stand-alone micro-computing environment was chosen in preference to the mainframe environment. This was to avoid the problem of shared CPU resources, besides ensuring that the timing data would not be affected by the load on the system. Moreover, to cope with possible confounding of color, only monochrome graphics displays were used. Every effort was made to ensure that these displays conformed to pertinent principles of graphics design, such as those laid out in Kosslyn et al. (1983), Ives (1982), Tufte (1983) and Bertin (1983).

V. DATA ANALYSIS: THE REPEATED MEASURES DESIGN

The research model to be used was discussed in chapter 4. This chapter begins with an examination of the structure of the repeated measures design. More importantly, since there are advantages and disadvantages associated with different types of repeated measures designs (Elashoff, 1986), it is essential to discuss clearly the type of repeated measures design used and why it is used. Indeed, the advantages of using such a design have been strongly emphasized in the literature (e.g. Kerlinger, 1973; Shneiderman, 1980), but the kinds of potential problems that may affect the validity of results obtained with this kind of design are very often neglected (Elashoff, 1985). In any case, many of the problems that are discussed may, in one way or another, be overcome in a controlled experimental setting.
Also included in this discussion are the various steps and methods used in the analysis of the experimental datasets, as well as an evaluation of how well the data structure conforms to the various assumptions underlying the analysis of variance-covariance for a within-subject design. Issues regarding the normality assumption, homogeneity of variance/covariance, the symmetry condition, the choice of univariate versus multivariate statistical procedures, and the use of different multiple-comparison techniques are therefore discussed in this chapter. It is important to note from the outset that the focus of statistical analysis is on the interaction effects that are of interest, particularly the graph format by question type interaction. Moreover, contrasts among means for this interaction are limited to those planned contrasts that are of critical concern.

A. THE REPEATED MEASURES DESIGN

Owing to the lack of a standardized terminology in the literature on experimental designs, a working definition of Repeated Measures Design is provided first.

Elashoff (1986) defines a repeated measures design as one that involves G groups of experimental units or subjects and in which responses are measured at k time points or under k different conditions, where k is greater than 1.†
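The appeal of measuring each subject under all k conditions can be illustrated numerically. The toy data below are invented for illustration only; they show how within-subject differences cancel stable individual baselines, which is the sense in which each subject acts as his own control:

```python
# Toy illustration (invented numbers): three subjects with very different
# baseline speeds, each measured under two conditions. Raw scores vary
# widely between subjects, yet the within-subject condition effect is
# identical for all of them.
times = {
    "s1": {"bars": 5.0,  "lines": 6.0},   # fast subject
    "s2": {"bars": 8.0,  "lines": 9.0},   # average subject
    "s3": {"bars": 12.0, "lines": 13.0},  # slow subject
}

# Within-subject differencing removes the between-subject baseline entirely:
diffs = [v["lines"] - v["bars"] for v in times.values()]
mean_effect = sum(diffs) / len(diffs)  # 1.0, with no between-subject noise
```

This is why, as the next section notes, the design yields powerful F-tests even with restricted sample sizes: between-subject variability never enters the error term for within-subject effects.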
Not only does this design greatly economize on the number of subjects that must be recruited for each of the experiments, but its use also eliminates experimental error due to variability between subjects. In other words, each subject serves fully as his own control (see Keppel, 1980).

More critically, such a design has the advantage of yielding powerful F-tests even for restricted sample sizes (Cohen, 1977). As a further advantage of a repeated measures design, the need to worry about departures from the assumed normal distribution is also reduced (Norton, 1952). In fact, this design is especially suited to obtaining precise measurements when many factors have to be investigated at the exploratory stage and when only a limited sample of the subject population is available. With limited resources, this is exactly the case with this research. Indeed, the number of subjects that would have to be recruited should a full "between-subject" factorial model be used would be excessive. For instance, with only 20 observations in each of the 36 cells for each of the experiments to reach at least an acceptable power level for the F-tests of the various effects, the full factorial model would require a total subject sample of 20 (observations) x 36 (cells) x 3 (experiments). Besides, selecting this alternative design would also introduce a large source of experimental error due to variability between subjects.

† See figure 5.1. If k equals 1 and the response y is measured only once for each subject, then the situation is that of an ordinary analysis of variance.

Obviously, the repeated measures design is not without potential problems, including fatigue, learning, carryover, and/or order effects. Yet these special problems are not uncontrollable within an experimental setting. Indeed, pilot tests revealed little sign of fatigue among the participants.
The tasks are reasonably interesting, and none of the pilot or actual experimental subjects complained of discomfort about the number of experimental trials they were asked to perform. Instead, subjects frequently showed an improvement in performance when session 2 results were contrasted with those of session 1, indicating that fatigue was not pervasive. Apparently, subjects experienced learning going from the first to the second session. Performance during the second session of each experiment was not only faster but usually also more accurate. The introduction of a Session variable to control for learning thus fulfilled its purpose. It is believed that a third session might begin to induce an undesirable fatigue effect on the part of the participants, with only marginal improvement expected.

Finally, the possibility of a carryover and/or an order effect is minimized by having the treatment combinations counterbalanced in each of these experiments (see Shneiderman, 1980; Simcox, 1981). In fact, the various factor levels studied are not only balanced across the different treatments administered, yielding equal cell responses in every case, but the treatments are also applied in an individually randomized order within each experimental session. In other words, the experimental design itself ensured that the carryover problem would be well under control (Davis, 1985; Yoo, 1985; Lauer, 1986).

Having discussed the advantages of using a repeated measures design in experiments where a large number of factors may be explored in a controlled setting, the problems such a design has, and how these problems may be overcome, the next section discusses the steps used in the process of data analysis.
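For illustration, the balancing and per-subject randomization just described can be sketched in a few lines of Python. The factor levels are those of the experimental design (graph format, question type, time period, and dataset category, yielding 36 treatment combinations); the function names and seed are illustrative only, not taken from the experimental software.

```python
import itertools
import random

# Sketch of the individually randomized presentation order described above.
# 3 graph formats x 3 question types x 2 time periods x 2 dataset
# categories = 36 treatment combinations per session.
GRAPH_FORMATS = ["bars", "symbols", "lines"]
QUESTION_TYPES = ["Q1", "Q2", "Q3"]
TIME_PERIODS = [7, 14]
DATASET_CATEGORIES = [1, 3]

TREATMENTS = list(itertools.product(GRAPH_FORMATS, QUESTION_TYPES,
                                    TIME_PERIODS, DATASET_CATEGORIES))

def session_order(rng):
    """Every subject receives all 36 treatment combinations once per
    session, in an independently shuffled order."""
    order = TREATMENTS[:]
    rng.shuffle(order)
    return order

rng = random.Random(1988)       # fixed seed, for reproducibility only
session_1 = session_order(rng)
session_2 = session_order(rng)  # the full design is replicated in session 2
```

Because every ordering is a permutation of the same 36 cells, each subject contributes exactly one response per cell per session, which is what yields the equal cell counts noted above.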
Figure 5.1: A Repeated Measures Design

[Schematic: G groups of experimental units (subjects), n subjects per group; for each subject, responses y are recorded at measurement occasions 1, 2, ..., k.]

Source: Elashoff, 1986, p. 6, reproduced with permission.

B. STATISTICAL ANALYSIS PROCEDURES

The following major steps constituted the process of statistical analysis for the experimental datasets:

1.  The experimental data were first screened for missing values, errors, obvious patterns of correlations among the variables, and unexpected values or outliers.

2.  Descriptive statistics, histograms, and scatter plots showing the relationships among the variables of interest were examined to provide the initial basis for any inferences. The inclusion of the Session variable in the initial analysis provided the basis for detecting possible confounding due to learning. Emphasis was placed on the analysis and interpretation of Session 2 data since these data were more representative of stabilized responses.

3.  The GEFT measure was initially treated as a covariate in the ANCOVA model to assess its effect on the dependent measures. In addition, correlations between the GEFT scores and the mean scores of each subject's performance combined over the 72 total treatment combinations were calculated to assess how the GEFT measure might relate to performance. In this way, a decision could be made on whether to exclude the GEFT measure from the final statistical model.

4.  More importantly, the possibility of a time-accuracy tradeoff effect was assessed by correlating the means of the time and accuracy measures combined across the two sets of 36 treatment combinations for each subject.
Since each experiment was assigned a total of 24 subjects, only two sets of 24 time and accuracy mean scores were correlated, so the final correlational value obtained could possibly be a reflection of chance. Therefore, to ensure that the procedure yielded a meaningful value, its validity was further ascertained by correlating randomly selected time and accuracy measures among the 24 subjects, and correlations between time and accuracy performance scores for each individual were also used as the basis for detecting the presence of any high time-accuracy tradeoff. Outliers detected, as well as those exceeding the cutoff criteria selected for time and accuracy, were not included in further statistical analysis.

5.  Interactions between graph format, question type, and other variables of interest were tested using the initial multivariate MANOVA procedure as well as the more commonly used univariate ANCOVA approach. The level of significance for testing null hypotheses on time and accuracy was set at the nominal 5% level. A more stringent criterion of α = 0.01 was also used for distinguishing among the significance of effects. Interpretation of experimental results based on the validity of the F-tests conducted took into account how the various assumptions were met, such as the normality condition, the homogeneity of variance-covariance, and the independence of the residuals or, in the case of the repeated measures design, the sphericity or symmetry condition.† Power analysis of the various F-tests was also performed by means of an approach advocated by Cohen (Cohen, 1977; see also Baroudi & Orlikowski, 1987). This augmented confidence in accepting hypotheses of no difference between means (Glass & Stanley, 1970, p. 283).

6.
Following the ANOVA F-tests, contrasts between means of subpopulations that are of interest are analyzed using the Dunn-Bonferroni method (Dunn, 1961) of multiple comparisons.*

7.  Finally, the alternative scheme of combining subjects' performance on accuracy for sessions 1 and 2, with twice as much weight given to results obtained in Session 2, was adopted so as to achieve a more normal distribution of the data.** Again, a criterion of α = 0.05 was chosen for testing the null hypotheses among means of the subpopulations of interest.

† This ANOVA nomenclature will be discussed further in separate sections.
* The reason for adopting this method is discussed in the final section of this chapter. Briefly, for planned contrasts between two or more means following an ANOVA procedure, this specialized method offers greater flexibility and is more powerful than most others (Dunn, 1961, pp. 54-58).
** Refer to the discussion on the accuracy measures in chapter 4.

C. THE EXPERIMENTAL RAW DATA

The experimental raw datasets were derived from the random assignment of seventy-two subjects, divided into three pools of twenty-four, one pool for each of the experiments. The subject population consisted of approximately equal numbers of mostly second-year commerce undergraduate and first-year MBA students. All of them were enrolled in introductory MIS courses. The overall average age of the subjects was approximately 25 years. The only criterion for selection was that the subject volunteered for the study. Appendix E shows the subject recruiting form and appendix F the consent form, which subjects had to sign before participating in the study. Corresponding to the series of three experiments, the summary statistics for the raw datasets gathered in this research are presented in table 5.1.

Application of the BMDP1D data screening procedure revealed only a small pocket of initial outliers.
These represent atypical observations with respect to the two performance measures of time and accuracy. Thus, subjects whose performance on accuracy fell below the 80% cutoff criterion were removed from further analysis. Analysis of the time data suggested, however, the use of different cutoff criteria for different experiments. It was also determined that these criteria should result in the removal of the least possible number of outliers. Hence, a 5-second cutoff criterion was used in experiment E1, a 10-second criterion in experiment E2, and a 15-second criterion in experiment E3 to eliminate relatively slow responders.† There were no missing data for the three experimental datasets collected. In table 5.1, outliers are highlighted.

In general, subjects' performance scores were within reasonable ranges (table 5.1). Since no major interruptions occurred, such as errors in program execution or a system crash during the actual administration of the experiments, all the original data, with the exception of the few identified outliers discarded, were included in the statistical analyses.

Table 5.1 shows the mean latency responses, the percentages of correct responses, and the respective standard errors for all three groups of 24 participants.* These are average scores combined over the two experimental sessions.** They include responses comprising the initial 36 attempts in each session of the experiments. No subsequent error correction attempts during the experimental sessions were included in the tabulated means.

† Note that the outliers identified also exhibited a relatively higher standard error of the mean.
* Totally different groups of subjects were used in different experiments.
** I.e., the 72 treatment combinations.
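The screening rules just described amount to a simple filter over per-subject summaries. The sketch below applies them directly; the cutoff values are those stated in the text (80% accuracy; 5, 10, and 15 seconds for experiments E1 through E3), while the subject records themselves are invented for illustration.

```python
# Outlier screening sketch: drop subjects below the 80% accuracy criterion,
# and flag slow responders using the per-experiment latency cutoffs.
TIME_CUTOFF = {"E1": 5.0, "E2": 10.0, "E3": 15.0}   # seconds
ACCURACY_CUTOFF = 0.80                               # proportion correct

subjects = [
    {"id": "S1", "experiment": "E1", "mean_time": 2.5,  "accuracy": 0.99},
    {"id": "S2", "experiment": "E1", "mean_time": 3.2,  "accuracy": 0.69},  # below 80%
    {"id": "S3", "experiment": "E2", "mean_time": 11.4, "accuracy": 0.93},  # too slow
    {"id": "S4", "experiment": "E3", "mean_time": 13.4, "accuracy": 0.95},
]

def retained(subject):
    """Keep a subject only if both screening criteria are met."""
    return (subject["accuracy"] >= ACCURACY_CUTOFF and
            subject["mean_time"] <= TIME_CUTOFF[subject["experiment"]])

kept = [s["id"] for s in subjects if retained(s)]
# kept == ["S1", "S4"]
```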
Table 5.1: Summary of Experimental Raw Datasets

[Table of per-subject performance for the three experiments: mean accuracy scores and mean time scores in seconds, each with the standard error of the mean, for the 24 participants (SA through SX) in each group. Outliers with high latency time response are marked **; outliers below 80% accuracy are marked *. The scanned table is not legible enough to reproduce the individual values.]
D. EXAMINATION OF THE DATA STRUCTURE

Assumptions required for the validity of F-tests in the fixed effects analysis of variance model are that the observations are mutually independent and normally distributed, and that the probability distribution within each level of the factors has the same variance. In the case of a repeated measures design, measures made on the same subject may be correlated, and the independence assumption therefore becomes critical. The F-tests conducted are, however, normally computed in a way that allows some relaxation of the various assumptions.

First, the normality assumption may usually be overlooked in an ANOVA or ANCOVA procedure because of the general robustness† of such a procedure (see Glass et al., 1972, p. 246).
Second, considerations for the homogeneity of variance assumption include the use of conservative F-tests with adjusted degrees of freedom as well as the use of equal cell observations. Finally, the independence of error terms assumption is replaced by symmetry assumptions,* one for each error sum of squares for which there is more than one degree of freedom for a within factor.

1. The Normality Assumption

BMDP5D, designed and programmed by Chasen (BMDP Software Manual, 1985), provides an examination of the raw data structure to assess the normality assumption by means of histograms, normal probability plots, half-normal plots, and detrended normal probability plots for all subpopulations and combined groups. It was found that the distribution of the accuracy measures was non-normal. This did not come as a surprise, since the original data were essentially coded as binary. It indicated, however, that a recoding or a transformation of the data appeared warranted before they could be submitted to further statistical analyses such as the ANOVA or ANCOVA procedures.

† Indeed, "robustness" studies (e.g., Rider, 1929; Pearson, 1929, 1931; Cochran, 1947; Hack, 1958) have confirmed that the violation of the normality assumption should not be of any great concern. Refer to Glass et al. (1972) for a comprehensive review of this issue. Read also the section on the normality assumption in this dissertation.
* This is equivalent to the condition that orthogonal polynomials for any within factor are independent and have equal variances (1985 BMDP Software Manual, p. 379; see also Anderson, 1958, p. 259; Winer, 1971, pp. 594-599). According to the BMDP Software Manual, the symmetry assumption is not required for F-tests in an orthogonal polynomial breakdown or those that include only two levels of a repeated measure.
The latency time response data, on the other hand, approximated normal distributions more closely, with occasional expected departures from the normality condition. One reason for such departures might be that there were generally one or two subjects in each cell who showed a greater latency of time response for the various treatment combinations. It was therefore not unreasonable to find that the overall distributions tended, most often, to be moderately skewed to the right.

Yet dramatic evidence of the relative unimportance of the normality assumption for the ANOVA and ANCOVA procedures, as reviewed by Glass et al. (1972), may be summarized as:

Normality has negligible consequences on type-I and type-II error probabilities unless populations are highly skewed, n's are small, and directional ("one-tailed") tests are employed. (Glass & Hopkins, 1984, p. 351)

As none of these warning conditions applied to the time observations gathered in this research, and as there were enough cell observations for the various subpopulations in the repeated measures design used to ensure that the normal distribution was a good approximation of the unknown distribution from which the observations were drawn, the validity of the ANOVA tests conducted for these experiments should not be threatened (Lindman, 1974; Glass et al., 1972).†

As for the relatively skewed distributions of the accuracy scores, one method of normalizing the score distribution is to recode the binary "0" and "1" scores to a greater range of scores. This could easily be achieved through the combined scores for both sessions 1 and 2, as will be discussed in the analysis of accuracy data for the individual experiments.

† The rationale for this is, again, the robustness of the F-tests to violation of this assumption. Refer to previous footnotes.

2.
Homogeneity of Variance/Covariance

The computation of F-statistics assumes that the data are sampled from normal populations with equal variances. When the sample sizes of the cells and the population variances are unequal, the distribution of F can be strongly affected and the validity of the F-tests becomes questionable. While many researchers are acquainted with Bartlett's test of the equality of variances for groups with nonzero variances (Dixon & Massey, 1969), it should also be noted that this test is sensitive to the assumption of normality and may improperly reject the null hypothesis too often when the distribution of the data is not normal. Hence, both Bartlett's test and Levene's test, which is less sensitive to the normality assumption, were computed for the various subpopulations using BMDP7D and BMDP9D.

Again, results of the tests were mixed with regard to the homogeneity of variances for the various subpopulations with respect to both time and accuracy performance scores. Since this assumption was not found to be fully upheld for each of the experiments, all reported analyses were conducted with conservative F-tests using modified degrees of freedom,† which protect ANOVA tests for repeated measures against violation of this assumption. Furthermore, it should be noted that subjects, randomly assigned to experiments in this research, came from a homogeneous population;* that errors due to between-subject differences were eliminated in the experimental design; and that error sources were derived from cells with equal numbers of observations.

In fact, Glass et al. (1972) have also claimed that, within the degree of variance heterogeneity one is apt to encounter in practice, violation of the assumption discussed here has negligible consequences on probability statements (type-I error) or power when n's are equal (see Glass & Hopkins, 1984, p. 353).
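For illustration, the Levene statistic computed by BMDP7D can be approximated in a few lines: each observation is replaced by its absolute deviation from its group mean, and an ordinary one-way ANOVA F is computed on those deviations. This is a sketch of the standard procedure, not the BMDP implementation; the sample data are invented.

```python
from statistics import mean

def levene_W(groups):
    """Levene's W statistic for equality of variances across groups:
    a one-way ANOVA F computed on absolute deviations from group means."""
    z = [[abs(x - mean(g)) for x in g] for g in groups]
    k = len(z)                          # number of groups
    n = sum(len(g) for g in z)          # total observations
    grand = mean([x for g in z for x in g])
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in z)
    ss_within = sum((x - mean(g)) ** 2 for g in z for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

a = [2.1, 2.5, 2.3, 2.2]   # low-variance group (hypothetical latencies)
b = [1.0, 4.0, 2.0, 5.0]   # high-variance group
W = levene_W([a, b])
# A large W relative to the F(k-1, n-k) distribution casts doubt on the
# equal-variance assumption for these groups.
```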
Moreover, since the only variability in the total repeated measures design used in this case is within subjects, there is less concern about violation of this assumption.

† E.g., the Greenhouse-Geisser approach or the Huynh-Feldt approach.
* ANOVA results with a grouping factor of undergraduate versus graduate students showed no significant effects for the grouping factor or its interaction with the other factors studied.

3. The Symmetry Condition

It has been argued that the interpretation of ANOVA/ANCOVA results for a repeated measures design should rest on corresponding test results of the symmetry condition† for each error term used in each specific F-test (Davidson, 1980). It has also been pointed out that while the symmetry test of BMDP2V is similar to the notion of Winer's compound symmetry (Winer, 1971, pp. 594-599), it is less restrictive than Winer's compound symmetry in that it is both a sufficient and a necessary condition. Tests of the symmetry condition for the specific error sum of squares relevant to a particular F-test therefore provide clear indications with respect to the independence and equal variances of the orthogonal polynomial decomposition for the within factors in a repeated measures design.*

An examination of the various sphericity test results for all of the error terms used in each of the experiments revealed that in only a very small number of cases was the symmetry assumption violated. According to the 1985 BMDP Statistical Software Manual (p. 379),

When there is reason to doubt the symmetry assumption, either because of the sphericity test or because of compelling theoretical considerations, such as the fact that the within factor is time, for which there is a suspected carryover effect from one level of the within factor to the next, tests can be made by reducing the degrees of freedom contributed by the within factors.

† This condition has been discussed previously. However, refer to Elashoff (1986, pp. 15-16) for an explanation of the general structure of the covariance matrix satisfying such a condition.
* The analysis of the within factors in a repeated measures design involves an orthogonal polynomial decomposition for the within factors (see appendix A.18 in the 1985 BMDP Software Manual).
These adjustments are due to Greenhouse and Geisser (1959) and Huynh-Feldt. See Frane (1980) for more discussion.

In other words, the conservative F-test results will still provide the required statistics for drawing inferences in cases where the symmetry assumptions are violated. Nevertheless, a simple and conservative approach is to report all main factor effects and their interactions for significance based on either the Huynh-Feldt or Greenhouse-Geisser adjustments to degrees of freedom. This research uses the Greenhouse-Geisser probability values, which are calculated to yield very conservative results, for consideration of significant effects whenever possible.

4. The Univariate-Multivariate ANOVA/ANCOVA Issue

Owing to the lack of knowledge about violations of the symmetry conditions, the BMDP2V ANOVA procedure and the BMDP4V MANOVA procedure were initially used to analyze the original datasets. However, when it was found that both analyses showed precisely the same set of significant main and interaction effects for α chosen at the 0.05 level as a general cutoff point, performing both procedures on the datasets was considered redundant.

Moreover, a study of the BMDP2V outputs revealed that many of the sphericity tests for the various error terms were not rejected, and it was therefore argued that the univariate ANOVA results could be used to report the findings.
The reason for this is that when the sphericity assumption is met, as it usually is, the univariate approach is considered more powerful than the multivariate approach. This is especially so even in cases where the sample sizes are limited (Davidson, 1980).

In theory, the univariate approach for a repeated measures design requires a more restrictive assumption regarding the variances and covariances of the repeated measurements than the multivariate approach. Violation of this assumption may thus result in a less powerful test when there is an effect, or may result in a test that is too liberal† (refer to the 1985 BMDP Software Manual, p. 395). Yet the advantage of using the univariate approach for a repeated measures design is its ease of accommodating covariates as well as testing for carryover (residual) effects, period effects, and order effects.

† I.e., one which rejects the null hypothesis of no difference too frequently.

In fact, these effects do not have a clear definition in the multivariate approach. Even so, while the multivariate approach is more flexible in its assumption regarding the covariance matrix for the repeated measurements, it does require that the repeated measurements have a multivariate normal distribution. Such an assumption could conceivably be very difficult to test, and yet violation of this assumption, such as the presence of multivariate outliers, could have serious consequences (Davidson, 1980). Hence, it appears that the simpler univariate approach is a better choice than the multivariate approach for the repeated measures design in most cases.

5. Multiple Comparison Techniques

Figure 5.2, taken from Glass & Hopkins (1984) with permission, shows a flow chart guide for the selection of multiple-comparison (MC) techniques.
Since Glass & Hopkins (1984) provide a comprehensive review of the various techniques and when they are used, the discussion in this section focuses on why the Dunn (1961) method was chosen for data analysis in this research.

Trend analysis was not used because none of the independent variables included involved an underlying continuum. There was also no requirement that all contrasts to be performed be orthogonal. This indicated why the ANOVA procedure was performed prior to performing the mean contrasts that were of interest.

Figure 5.2: A Guide for Selection of Multiple-Comparison Techniques

[Flow chart guide for selecting among multiple-comparison techniques.]

Source: Glass & Hopkins, 1984, p. 393. Reprinted by permission of Prentice Hall, Inc., Englewood Cliffs, New Jersey.

The Dunn-Bonferroni (1961) multiple-comparison technique uses the Bonferroni inequality for determining the critical t-ratios. The Dunn method is perhaps best distinguished from the Scheffe method in that the Dunn method relies on predetermined or planned contrasts, whereas the Scheffe method is a very flexible post hoc data snooping method (Dunn, 1961; Glass & Hopkins, 1984, pp. 381-383). The advantage of using the Scheffe method, then, is that it can be used for making any simple or complex contrasts even after inspecting the means. However, Dunn argues that it is,

... possible in using the t-intervals to select as the intervals to be estimated a very large set of linear combinations which includes all those which might conceivably be of interest. Then, on looking at the data, one may decide on actually computing intervals for only some of this set. (Dunn, 1961, p.
56)

Dunn (1961) provides tables showing that, for a fairly large number of means, the t-intervals (Dunn method) are shorter than the corresponding intervals using the F-distribution (Scheffe method) for any reasonable number of linear combinations (see tables 3 and 4 in Dunn, 1961, pp. 56-57). Accordingly, Miller (1966, p. 54) argued that for a prespecified subset of the possible contrasts, the Dunn method is normally more powerful than the Scheffe method.

While the Scheffe method (also known as F-projections or the S-method) is the most widely published statistical technique for multiple comparison (Hopkins & Anderson, 1973), Glass & Hopkins claimed that "the flexibility of the Scheffe method such that post hoc data snooping for any number of contrasts is allowed, causes it to be a very conservative and inefficient procedure in the usual research circumstance in which there is interest in only a limited subset of possible contrasts, such as all pairwise contrasts" (Glass & Hopkins, 1984, p. 383).

In this research, the Dunn-Bonferroni (1961) method of multiple mean comparison is used only on those pairwise contrasts among subpopulation means that are of key interest and planned a priori.

E. SUMMARY

In summary, there is little cause for concern about the various assumptions underlying the F-statistics to be computed for the experimental data captured in this research. The initial screening of the experimental data also revealed only a very small number of outliers. Moreover, the use of a repeated measures design helps to ensure that sufficient power can be achieved even with small sample sizes. Finally, similar statistical procedures, with emphasis placed on the univariate ANOVA approach and the Dunn-Bonferroni (1961) multiple-comparison method, are adopted for analyzing and interpreting all experimental data captured in this research.

VI.
RESULTS: EXPERIMENT 1

This chapter presents the results of experiment 1 (E1). The subject composition for this experiment consisted of nineteen males and five females. Of these twenty-four candidates, thirteen were second-year commerce undergraduates and eleven were first-year MBA students. The ages of these subjects ranged from 20 to 32, with an average close to 24.83 years. At the time of participation, all subjects were enrolled in MIS courses at the introductory level.

In addition to the discussion of chapters 4 and 5, a number of key issues concerning data analysis still need to be addressed. The first of these considerations is whether separate analyses should be performed on time for the session 1 and session 2 datasets. An apparent justification for doing so would be, for instance, the presence of strong learning, as indicated by a significant difference between time performance in sessions 1 and 2.

The second consideration is whether to exclude the GEFT variable from the final statistical model used for analyzing the various datasets. For example, the lack of a significant GEFT effect when included as a covariate, or the lack of a strong correlation between GEFT scores and performance scores, would justify such an action.

The third consideration is whether there are more outliers than those already identified. For example, including a subject having a high time-accuracy tradeoff might well contribute to less interpretable results. This is because, in the face of a high time-accuracy tradeoff, factors or combinations of factors found to contribute to high time outcomes might also be interpreted as those contributing to lower accuracy, thus confounding the findings. Yet such confounding can only be detected when time and accuracy performance are tracked simultaneously.

The final consideration is the issue of statistical power.
This is especially important when several additional outliers are found and a decision has to be made on whether to discard all of them, or to retain those that appear to present only marginal problems. The purpose of maintaining high power values for the various F-tests is, of course, to increase the level of confidence in a decision not to reject the null hypotheses.

These issues are discussed in sequence in the following few sections, prior to a discussion of the detailed results on time for task performance during sessions 1 and 2. There follows a presentation of accuracy performance statistics for the two sessions combined. The chapter concludes with a summary of key findings for E1.

A. TIME PERFORMANCE FOR COMBINED SESSIONS

An initial ANCOVA model was run on the full dataset, which included both the session 1 and session 2 data. The model was a repeated measures design with the following major classification factors:

1. S: Session (Session 1 or Session 2)
2. G: Graph Format (Bars, Symbols, or Lines)
3. Q: Question Type (Q1, Q2, or Q3)
4. T: Time Period (7 or 14 Periods)
5. D: Dataset Category (1 or 3 Datasets)

The GEFT scores were treated as a covariate in the model.

1. The Session Effect

Analysis of time performance on the full dataset with just one outlier excluded (see table 5.1) for this first experiment revealed a highly significant (F = 60.99, p < .01) Session effect, as shown in table 6.1. Mean time response dropped from 2.8 seconds in session 1 to 2.2 seconds in session 2. Subjects appeared to experience learning, and the implication of this effect is therefore the need for a separate analysis of the time data captured during sessions 1 and 2.
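The within-subject logic of this kind of repeated measures F-test can be illustrated with a minimal sketch. The code below is a generic one-way within-subject ANOVA in plain Python, not the actual five-factor ANCOVA model or the BMDP2V procedure; the three subjects and their per-session mean times are hypothetical values chosen only to mimic the session 1 versus session 2 drop reported above.

```python
# Minimal one-way repeated measures ANOVA: the subject-by-condition
# residual serves as the error term, as in a within-subject design.

def repeated_measures_f(data):
    """data: one row per subject, one column per condition.
    Returns (F, df_condition, df_error) for the condition effect."""
    n_subj = len(data)
    n_cond = len(data[0])
    grand = sum(sum(row) for row in data) / (n_subj * n_cond)
    cond_means = [sum(row[j] for row in data) / n_subj for j in range(n_cond)]
    subj_means = [sum(row) / n_cond for row in data]
    ss_cond = n_subj * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = n_cond * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_cond - ss_subj   # condition x subject residual
    df_cond = n_cond - 1
    df_error = (n_subj - 1) * (n_cond - 1)
    return (ss_cond / df_cond) / (ss_error / df_error), df_cond, df_error

# Hypothetical mean times (seconds) for three subjects in two sessions:
times = [[2.9, 2.2], [3.0, 2.4], [2.5, 2.0]]
F, df1, df2 = repeated_measures_f(times)
```

Because every subject is measured in both sessions, between-subject variability is removed from the error term, which is why the design retains power even at low sample sizes.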
2. The GEFT Measure

Results of the analysis (table 6.1) did not reveal a significant GEFT effect (F = 1.78; p > .05), although the evidence needed to be further substantiated before a final decision could be made on whether to exclude the GEFT variable from the analysis model. Accordingly, when the GEFT scores of all subjects were correlated with their respective mean time performance scores combined over the 72 treatment combinations, BMDP6D revealed only a low and nonsignificant correlation (R = -.29; P(R) = .17; mean time = 2.67 seconds; mean GEFT = 14.3). Moreover, correlations of corresponding scores between and within sessions did not produce significant GEFT-time relationships. An R-square value of essentially zero (R-square = 0.0003) was also obtained when the GEFT measure was placed alone in a regression model with the time response specified as the only dependent measure.

In summary, the evidence collectively supported the relative unimportance of the GEFT measure in explaining subjects' time performance. Hence, a decision was made to discard the GEFT scores from the final statistical model used to analyze effects on latency of reaction.

3. Additional Outliers

Since the presence of outliers exhibiting a high time-accuracy tradeoff could contribute only to less interpretable results, the degree of time-accuracy tradeoff among individuals was assessed by correlating their time performance scores with their accuracy performance scores over all experimental trials. Those showing a high time-accuracy tradeoff were to be highlighted and considered for removal.

Table 6.1: Initial ANCOVA Results for the Full Dataset (Experiment 1)
Dependent Variable: Time Performance

Source of Variance      F       Conventional    Greenhouse-Geisser
                                p-value         Prob.
GEFT                     1.78   0.1971
S: Session              60.99   0.0000**
G: Graph Format          6.42   0.0037**        0.0044**
Q: Question Type        17.42   0.0000**        0.0000**
T: Time Period          62.14   0.0000**
D: Dataset              73.18   0.0000**
G*Q                     21.29   0.0**           0.0**
G*T                      0.38   0.6371          0.6370
Q*T                     17.01   0.0000**        0.0000**
G*D                      7.65   0.0015**        0.0017**
Q*D                      0.31   0.7370          0.7165
T*D                      1.69   0.2078
G*Q*T                    1.95   0.1104          0.1302
G*Q*D                    3.52   0.0104*         0.0195*
G*T*D                    1.75   0.1863          0.1867
Q*T*D                   11.44   0.0001**        0.0002**
G*Q*T*D                  3.23   0.0163*         0.0220*

* Significant at p = 0.05 level
** Significant at p = 0.01 level

Yet the possibility that a particular subject could have a high time-accuracy tradeoff during the combined sessions but not in the separate sessions made it more difficult to decide how these additional outliers were to be identified. It was decided that session 2 data should be used for detecting these outliers, since significant effects found during session 2 were of greater interest than effects found significant during session 1.† This was due to the possibility that some of the effects found significant during session 1 might be temporal, or even spurious, whereas the more permanent effects would be expected to show again during later sessions regardless of the amount of training subjects had undergone. The additional outliers found for E1, based on individual time-accuracy correlations and their respective significance during session 2, are marked in table 6.2.

Whether all of these additional outliers should be eliminated depends again on how much power would be lost through their removal. Consequently, it appeared most appropriate, at this point, to discuss briefly the concept of the statistical power of the F-tests associated with the ANOVA/ANCOVA procedures (see Cohen, 1977; Baroudi & Orlikowski, 1987).

† The concern with those effects found significant during the combined sessions was already ruled out on the basis of the significant (p < .01) Session effect discussed earlier.
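The per-subject screening described above amounts to computing, for each subject, a Pearson correlation between time and accuracy scores across trials; for a subject with constant (e.g. perfect) accuracy the correlation is not computable, as the "perfect accuracy" entries in table 6.2 reflect. A minimal sketch, using two hypothetical subjects rather than the experimental data:

```python
# Per-subject time-accuracy correlation with a zero-variance guard.
import math

def pearson_r(x, y):
    """Pearson correlation, or None when either variable has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None          # e.g. perfect accuracy on every trial
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

# Subject A shows a tradeoff (slower trials are more accurate);
# subject B is perfectly accurate on every trial.
time_a = [1.8, 2.1, 2.5, 2.9, 3.4]
acc_a  = [0.6, 0.8, 0.8, 1.0, 1.0]
time_b = [2.0, 2.2, 2.4, 2.6, 2.8]
acc_b  = [1.0, 1.0, 1.0, 1.0, 1.0]

r_a = pearson_r(time_a, acc_a)   # large |r|: candidate outlier
r_b = pearson_r(time_b, acc_b)   # None: not computable
```

A subject whose correlation is large in magnitude and statistically significant would then be flagged as a candidate for removal, as in table 6.2.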
Deciding on a desirable power level for the various F-tests can also help to determine the benefits of retaining some of those additional "outliers" that might present only marginal problems.

Table 6.2: Time-Accuracy Correlations for Identifying Additional Outliers (Experiment 1, Session 2)
Dependent Variable: Time Performance Scores

E1 Subject     Correlation R    Probability P(R)    Sample Size
(Session 2)
01             .1404            .4032               36
02             (perfect accuracy -- not computable)
03             -.2892           .0732               36
04             .4278            .0063**             36
05             -.3221           .0482*              36
06             -.3171           .0521               36
07             (perfect accuracy -- not computable)
08             -.4065           .0107*              36
09             -.1871           .2626               36
10-24          (correlations either not computable owing to perfect accuracy or nonsignificant)

* Significant at p = 0.05 level
** Significant at p = 0.01 level

4. The Power Analysis

Cohen (1977) defines the power† of a statistical test as the probability of yielding statistically significant results.
Baroudi & Orlikowski (1987) reviewed 57 management information systems articles published in such leading journals as Communications of the ACM, Decision Sciences, Management Science, and MIS Quarterly, and found that, on average, the statistical power of inference testing in these articles fell substantially below the acceptable norm of .80 (Cohen, 1965, 1977; Welkowitz et al., 1982). According to Baroudi & Orlikowski, not only does power analysis provide a measure of confidence when interpreting F-test results, but it also ensures that a decision to support the null hypothesis is not a misrepresentation.

An approach for calculating power is given in Cohen (1977), and this was used in the analysis of pilot results for this experiment.‡ Based on this approach, with α = .05 and the effect size f estimated rather conservatively at .25 (see Cohen, 1977, p. 277-281), power analysis was performed for all of the F-tests conducted. Even taking into consideration all the additional outliers identified for subsequent removal (see table 6.2), most if not all of the F-tests conducted showed power values clearly above the conventional benchmark of .80.§

Consequently, this eases the need to report only results for those tests where the decision was to reject the hypotheses of no differences between group means for the various subpopulations. The argument for such an approach is simply that when the statistical tests conducted have power values close to the conventionally acceptable level, a "no effect" conclusion may be stated with confidence.

† See the Glossary for an alternative definition of this term.
‡ Appendix I contains a summary report of the pilot test results.
§ Power values were predetermined for this research as stated in the pilot test report (see appendix I).
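Beyond Cohen's tables, the power of an F-test at a given effect size can also be approximated by simulation. The sketch below estimates the power of a generic one-way ANOVA at Cohen's f = .25 and α = .05; the three-group design and the per-group sizes are illustrative assumptions for the sketch, not the repeated measures design of this experiment.

```python
# Monte Carlo estimate of one-way ANOVA power at Cohen's f = .25.
import random

def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of observations."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def simulated_power(f_effect, n_per_group=40, alpha=0.05, sims=2000):
    rng = random.Random(42)
    # Three group means spread so their SD equals f_effect (sigma = 1),
    # which is exactly Cohen's f for a three-group design.
    d = f_effect * (3 / 2) ** 0.5
    alt_means = [-d, 0.0, d]
    def sample(means):
        return [[rng.gauss(m, 1.0) for _ in range(n_per_group)] for m in means]
    null_f = sorted(anova_f(sample([0.0, 0.0, 0.0])) for _ in range(sims))
    crit = null_f[int((1 - alpha) * sims)]   # empirical 5% critical value
    hits = sum(anova_f(sample(alt_means)) > crit for _ in range(sims))
    return hits / sims

power_n40 = simulated_power(0.25)                   # 40 subjects per group
power_n10 = simulated_power(0.25, n_per_group=10)   # smaller sample, lower power
```

As expected, the estimated power rises with sample size, which is the same logic that motivates checking whether removing additional outliers would pull the F-tests below the .80 benchmark.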
B. TIME PERFORMANCE FOR SEPARATE SESSIONS

Evidence of learning, as well as the independence of the GEFT measure from the dependent variable of time performance, justified the use of a simpler model to analyze each session separately: one for session 1 and the other for session 2. As observed earlier, the three additional outliers uncovered for session 2 were discarded on the basis that this would have only a negligible effect on the power values of the various F-tests conducted. Moreover, to ensure that findings for the various datasets (i.e. the full dataset and the separate datasets for sessions 1 and 2) are comparable, these additional outliers were also purged from the full dataset, as well as from the session 1 dataset, before further statistical analyses were performed.

Since BMDP2V computed its own error term for each respective F-test in the specified repeated measures model, corresponding analyses of the datasets were conducted using the SAS ANOVA procedure to cross-check results. For the SAS analyses, a mixed effect model design† was used: a five-way analysis of variance in which the four fixed factors of Graph Format, Question Type, Time Period, and Dataset were fully crossed with a fifth, random Subject factor. The standard error terms for such a model with repeated measures were applied accordingly in the various specific F-tests. Results of the BMDP and SAS packages were in agreement, indicating that the correct error terms had been computed by the BMDP software.

Table 6.3 compares those main factors and their interactions that were significant for the combined as well as the separate sessions (i.e. session 1 and session 2).

† This design combines the fixed effect model with the random effect model (see the 1985 SAS User's Guide: Statistics, Version 5 Edition).

Table 6.3: Comparison of ANOVA Results Among Sessions (Experiment 1, Additional Outliers Excluded)
Dependent Variable: Time Performance

Source of Variance    p-value               p-value       p-value
                      (Combined Sessions)   (Session 1)   (Session 2)
S: Session            0.0000**
G: Graph Format       0.0148*               0.0145*       0.0888
Q: Question Type      0.0000**              0.0003**      0.0020**
T: Time Period        0.0000**              0.0000**      0.0002**
D: Dataset            0.0000**              0.0000**      0.0000**
G*Q                   0.0**                 0.0**         0.0012**
G*T                   0.3509                0.8039        0.1463
G*D                   0.0027**              0.0240*       0.0256*
Q*D                   0.8802                0.8262        0.4567
Q*T                   0.0000**              0.0000**      0.0243*
T*D                   0.2218                0.6321        0.2100
G*T*D                 0.1152                0.4175        0.3684
G*Q*T                 0.2905                0.1653        0.3131
G*Q*D                 0.0307*               0.0333*       0.7287
Q*T*D                 0.0007**              0.0111*       0.0005**
G*Q*T*D               0.0241*               0.0373*       0.1590

* Significant at p = 0.05 level
** Significant at p = 0.01 level

Table 6.4: Tables of Means for All Treatment Combinations (Experiment 1, Outliers Excluded)
Dependent Variable: Time Performance

[Mean time (in seconds) for sessions 1 and 2 under each Graph Format (Bars, Symbols, Lines) by Question Type (Q1, Q2, Q3) combination, at each of four levels of graphical information complexity: (a) 1 dataset with 7 time periods; (b) 1 dataset with 14 time periods; (c) 3 datasets with 7 time periods; (d) 3 datasets with 14 time periods.]

A comparison of table 6.1 with table 6.3 shows that generally similar factors or combinations of factors were significant for the full dataset (i.e. including the additional outliers) and for the reduced dataset (i.e. excluding all outliers). Mean time performance scores across all 36 initial treatment combinations for sessions 1 and 2 of the reduced dataset are presented in table 6.4. Note the apparent improvement over each treatment combination, each of which represents a separate experimental trial, when mean values are compared between sessions.

1. Significant Effects on Time for Session 1

Given the observation made earlier that session 2 findings are to be treated as more critical than session 1 results for this research, only significant effects of key interest are discussed in this section. Note, however, that findings for session 1 could add to our knowledge of graphics users in situations where readers retrieve computer graphics displays only infrequently: for example, executives who use graphical support systems occasionally, or less experienced first-time graph users who might still require a substantial amount of training in reading and understanding graphical charts.

Table 6.3 shows the following factors or combinations of factors to be significant for time during session 1:

1. Graph Format effect
2. Question Type effect
3. Time Period effect
4. Dataset effect
5. Graph Format by Question Type interaction
6. Graph Format by Dataset interaction
7. Question Type by Time Period interaction
8. Graph Format by Question Type by Dataset interaction
9. Question Type by Time Period by Dataset interaction
10. Graph Format by Question Type by Time Period by Dataset interaction

Among these, only the Graph Format by Question Type interaction is of key interest, since the major purpose of this research is to understand how different graph formats affect performance on various tasks. Figure 6.1 depicts the plot for this two-way interaction during session 1 of E1 and shows the corresponding mean values. Table 6.5 gives the results of the multiple mean comparisons produced by the BMDP7D† software on those sets of differences among means that are of major interest; that is, contrasts within each question type or within each graph type, but not those across different question and graph types (table 6.5). Hence, out of a total of 36 possible contrasts, only 9 contrasts among means were planned and tested by the Dunn-Bonferroni method.

Study of figure 6.1 and table 6.5 reveals that the key significant differences were between one particular treatment combination (the Line-Q1 combination) and the other treatment combinations compared. Subjects using line graphs took significantly longer to extract scale-values (Q1) than to extract level differences (Q2) or trends (Q3). Moreover, when lines were used for performing Q1, subjects were significantly slower than when either symbols or bars were used.

† The BMDP7D procedure performs multiple means comparisons of subpopulation observations based on the Dunn-Bonferroni technique.

Figure 6.1: Plot and Mean Values of Graph Format x Question Type Interaction (Experiment 1, Session 1)
Dependent Variable: Time Performance

Question Type    Bars         Symbols      Lines
Q1               2.64 (*1)    2.80 (*4)    3.98 (*7)
Q2               2.56 (*2)    2.82 (*5)    2.60 (*8)
Q3               2.87 (*3)    2.59 (*6)    2.48 (*9)

* Numbering in cells corresponds to treatment combinations for the Dunn-Bonferroni method of multiple means comparison.
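The Dunn-Bonferroni logic of testing only a planned subset can be sketched as follows: with m = 9 planned contrasts, each contrast is evaluated at α/m, which holds the familywise error rate at α. This is a generic sketch, not the BMDP7D procedure, and the contrast labels and p-values below are hypothetical.

```python
# Dunn-Bonferroni over a prespecified set of planned contrasts:
# split alpha evenly across the m planned comparisons.

def dunn_bonferroni(p_values, alpha=0.05):
    """p_values: list of (label, p) pairs for the planned contrasts.
    Returns {label: significant?} using the per-contrast level alpha/m."""
    m = len(p_values)
    return {label: p < alpha / m for label, p in p_values}

# Nine planned contrasts (within each question type or graph format only):
planned = [("Q1: bars vs lines", 0.0004), ("Q1: symbols vs lines", 0.0021),
           ("Q1: bars vs symbols", 0.41), ("Q2: bars vs lines", 0.62),
           ("Q2: symbols vs lines", 0.55), ("Q2: bars vs symbols", 0.30),
           ("Q3: bars vs lines", 0.07), ("Q3: symbols vs lines", 0.09),
           ("Q3: bars vs symbols", 0.12)]
decisions = dunn_bonferroni(planned)   # per-contrast alpha = 0.05 / 9
```

Restricting the family to 9 planned contrasts instead of all 36 keeps the per-contrast threshold at 0.05/9 rather than 0.05/36, which is precisely why the planned Dunn method is more powerful here than post hoc Scheffe testing.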
RESULTS: EXPERIMENT 1 / 1 3 6 Table 6.5: Summary of Dunn-Bonferroni Results for Graph Format x Question Type Interaction (Experiment 1, Session 1) Dependent Variable: Time Performance  1.  2.  Differences found among Means at the a = 0.01 level of significance: a.  CV7)  b.  (*4,*7)  c.  (*8,*7)  d.  (*9,*7)  No other differences among the 9 mean comparisons that are of interest were significant either at the a = 0.01 or 0.05 level of significance ( The reader should refer to the discussion given in the main body of text on which 9 out of 36 mean comparisons were planned and of interest).  3.  Note that the above numbering of treatment combinations correspond t o those marked in the table of means given in figure 6.1.  RESULTS: EXPERIMENT  1/137  In general, these results are in line with major aspects of the Kosslyn-Pinker theory (Kosslyn et al., 1983).  Symbols as well as bars (see table 6.5) proved to be better suited than lines for  performing Q1 primarily because they have characteristics of being isolated and discrete, and they are probably processed individually (i.e. one bar after another).  In contrast, lines are not  well suited to this particular task as the task involves the breaking up of a 'Cestalt' -- the line. This result is also in agreement with the anchoring concept of matching appropriate formats to appropriate tasks discussed in chapter 3.  Since lines have the worst x-axis as well as y-axis  anchoring characteristics compared to either bars or symbols, they proved to be the least appropriate format for tasks with a strong anchoring on both of the t w o major dimensional axes (i.e. characteristics of Q1 of this experiment).  2. Significant Effects on Time for Session 2 At this point, it should be noted that emphasis in this dissertation is placed on Session 2 results for several reasons. First, session 2 findings would be more generalizable to situations where graphics are in constant use or among frequent graphics users.  
Second, by session 2, all participants had been sufficiently exposed to the various graphics stimuli; effects found during session 2 should therefore be much less contaminated by lack of experience or training with the different types of graph format tested. More importantly, the findings will contribute to an accumulation of general knowledge about the types of task activities best supported by various types of graph format and design. In short, they add to our knowledge of the reading and understanding of graphics at a more "practical" level than the results found in session 1.

Interestingly, the significant main and interaction effects found in session 2 were all factors, or combinations of factors, that had also turned out to be significant in session 1 of this experiment (E1). More importantly, the number of effects found to be significant in session 2 was considerably reduced compared to session 1. This also confirms our understanding that the learning curve is a marginally diminishing one.

For session 2, the following significant main and interaction effects were found:

1. Question Type effect
2. Time Period effect
3. Dataset effect
4. Graph Format by Question Type interaction
5. Question Type by Time Period interaction
6. Graph Format by Dataset interaction
7. Question Type by Time Period by Dataset interaction

a. Main Factor Effects on Time for Session 2

1. Question Type -- This factor turned out to be highly significant (p < 0.01). Subjects took approximately 1.98 seconds for questions on trend extraction (Q3), 2.05 seconds for questions on level difference extraction (Q2), and 2.35 seconds for questions on scale-value extraction (Q1). The Dunn-Bonferroni tests on session 2 indicated significant differences between performing Q1 and Q2 and between performing Q1 and Q3.
In short, there are differences in extracting different types of data from standard time-series graphics. In particular, Q1 appeared to be a more difficult task than Q2 and Q3 in this experiment. Perhaps the differences would have been nonsignificant had tabular displays been included, since results from prior research have shown that the best form of information presentation for reading scale-values (Q1) is a table (see Jarvenpaa & Dickson, 1988).

2. Time Period -- The effect of this highly significant factor (p < .01) was as expected. As the number of time periods depicted along the abscissa of the time-series graphics increased, time performance deteriorated correspondingly. Average time for using 7-period graphics was close to 2 seconds, compared to 2.3 seconds for 14-period graphics.

3. Dataset -- As expected, latency of response increased as the number of datasets depicted on a single plot increased. For graphs with 3 datasets, time performance averaged 2.4 seconds, but for graphs with only a single dataset, average time performance dropped to 1.9 seconds.

b. Two-way Interactions on Time for Session 2

1. Graph Format x Question Type -- This interaction is of central focus to the study. Just as in session 1, it was found to be highly significant (p < .01) during session 2. Figure 6.2 shows the plot and mean values for this interaction, and table 6.6 shows the corresponding Dunn-Bonferroni results.

According to the Dunn-Bonferroni tests for session 2, subjects who used line graphs took significantly longer (p < .01) to extract scale-values of single datapoints (Q1) than to read trends (Q3). Lines also took significantly longer (p < .05) for extracting scale-values (Q1) than for reading level differences (Q2). In addition, symbols were significantly (p < .05) faster to use than lines for reading scale-values (Q1).
No other significant differences of interest were found at the nominal α = 0.05 level (table 6.6).

Since this same interaction was also found to be significant during session 1, and the details of the Dunn-Bonferroni results were broadly similar to those found in session 1, trained and untrained subjects alike found lines difficult to use for extracting scale-values. This is consistent with the view that subjects tend to read each 'line' on a line graph as a Gestalt, so that isolating single points on a line to read their scale-values becomes a time consuming and effortful process.

Figure 6.2: Plot and Mean Values of Graph Format x Question Type Interaction (Experiment 1, Session 2)
Dependent Variable: Time Performance

Question Type    Bars         Symbols      Lines
Q1               2.26 (*1)    2.05 (*4)    2.72 (*7)
Q2               2.08 (*2)    1.97 (*5)    2.08 (*8)
Q3               2.17 (*3)    2.04 (*6)    1.73 (*9)

* Numbering in cells corresponds to treatment combinations for the Dunn-Bonferroni method of multiple means comparison.

Table 6.6: Summary of Dunn-Bonferroni Results for Graph Format x Question Type Interaction (Experiment 1, Session 2)
Dependent Variable: Time Performance

a. Significant differences among means at the α = 0.01 level:
   1) (*9,*7)

b. Significant differences among means at the α = 0.05 level:
   1) (*4,*7)
   2) (*8,*7)

c. No other contrasts of interest were significant. The reader should also refer to table 6.5, which presents results for session 1. Note that the above numbering of treatment combinations corresponds to that marked in the table of means given in figure 6.2.
The rationale underlying these results has already been explained in the earlier section on the same significant Graph Format by Question Type interaction for session 1. That lines also took significantly longer for performing Q1 than Q2, while no significant difference was found in time performance between Q2 and Q3, simply highlights the similarity of Q2 and Q3 in this experiment: it takes just two points (whether adjacent or not) to produce a slope or trend. This is also consistent with the grouping of Q2 and Q3, on the basis of the anchoring concept, as belonging to the same task category (i.e. Group II tasks).

2. Question Type x Time Period -- Unlike the disordinal† Graph Format x Question Type interaction, this two-factor interaction appeared to be strictly ordinal.‡ The plot and mean value table for this significant interaction during session 2 are shown in figure 6.3, and the Dunn-Bonferroni results are summarized in table 6.7. Results of the Dunn-Bonferroni tests for session 2 indicated that with more time periods (14 periods), performance on Q1 alone was adversely affected (p < .01). No statistically significant adverse effects of increasing time periods were found on either level difference questions (Q2) or trend questions (Q3). In addition, for 14-period graphics, Q1 took significantly (p < .01) longer to perform than either Q2 or Q3.

Consistent with the previous finding of a highly significant direct effect of the Time Period factor, this result supports the notion that the complexity of graphics increases with more time periods depicted along the abscissa of a time-series plot. Together with the finding on the Question Type effect, however, it also indicates that more time periods had greater adverse effects on certain task activities (e.g. Q1) in this experiment (E1) than on others (e.g. Q2 and Q3).

† See figures 6.1 and 6.2.
‡ This term is defined in the Glossary.

Figure 6.3: Plot and Mean Values of Question Type x Time Period Interaction (Experiment 1, Session 2)

Time Period      Q1           Q2           Q3
7 Periods        2.07 (*1)    1.94 (*3)    1.91 (*5)
14 Periods       2.62 (*2)    2.16 (*4)    2.05 (*6)

* Numbering in cells corresponds to treatment combinations for the Dunn-Bonferroni method of multiple means comparison.

Table 6.7: Summary of Dunn-Bonferroni Results for Question Type x Time Period Interaction (Experiment 1, Session 2)
Dependent Variable: Time Performance

a. Significant differences among means at the α = 0.01 level:
   1) (*1,*2)
   2) (*2,*4); (*2,*6)

b. No other contrasts of key interest were significant. Note that the above numbering of treatment combinations corresponds to that marked in the table of means given in figure 6.3.

3. Graph Format x Dataset -- Like the Question Type x Time Period interaction, this was another ordinal§ interaction, as plotted in figure 6.4, which also shows the mean value table for this interaction. Table 6.8 summarizes the Dunn-Bonferroni tests for this interaction.

Analysis of this interaction for session 2 revealed that different types of graph format incurred different degrees of additional time and effort for multiple dataset versus single dataset representations. The Dunn-Bonferroni tests indicated that for bars, a highly significant increase in latency was found, whereas the effect for lines was of lesser significance (table 6.8). No significant difference was found between singular and multiple symbol plots.
On the other hand, multiple (3-dataset) bars yielded significantly greater latency responses when compared to multiple symbol representations.

One line of reasoning that might explain these observations is that with multiple datasets, multiple bars are used that are also isolated from one another.¶ In other words, bars belonging to a particular dataset category are represented as isolated bars.# The same is not true of multiple symbol charts or multiple line graphs. In line graphs, each dataset is depicted as one line, which corresponds to a Gestalt by itself, since the various points belonging to the same dataset are fully connected on each line. In symbol charts, although the various symbols are not fully connected, symbols belonging to the same dataset are distinct from those belonging to other datasets. Moreover, these symbols normally form a chain pattern which the human eye can automatically link.

In contrast, differently coded bars, which are similar in shape to each other, are often used to depict different datasets. This explains why, with bars, increasing the number of datasets depicted required considerably more time and effort for reading and understanding than with the other formats.

An important implication of these results is that the use of bars should be limited to representing single datasets. When multiple datasets must be represented, it is advisable to use either symbols or lines. The choice between multiple lines and multiple symbols appears to depend on the anchoring characteristics of the tasks at hand. In this experiment, since there was a strong abscissa anchoring for all tasks, multiple symbols proved faster to use than multiple lines. Hence, the difference in time between using multiple symbols and singular symbols was not significant, whereas that between using multiple lines and a singular line was (table 6.8).

§ Essentially, this means that relatively higher effects are found for the same level of one factor across all levels of the other factor in a two-way interaction. In other words, the lines representing the interaction do not cross each other.
¶ Illustrations of how multiple bar charts were coded in this research are given in appendices B, C, and D.
# For the purpose of future reference, the term "categorical isolation" of multiple bars will be used to describe this situation.

Figure 6.4: Plot and Mean Values of Graph Format x Dataset Interaction (Experiment 1, Session 2)
Dependent Variable: Time Performance

Dataset Category   Bars         Symbols      Lines
1 Dataset          1.77 (*1)    1.86 (*3)    1.93 (*5)
3 Datasets         2.58 (*2)    2.19 (*4)    2.43 (*6)

* Numbering in cells corresponds to treatment combinations for the Dunn-Bonferroni method of multiple means comparison.

Table 6.8: Summary of Dunn-Bonferroni Results for Graph Format x Dataset Interaction (Experiment 1, Session 2)
Dependent Variable: Time Performance

a. Significant differences among means beyond the α = 0.01 level:
   1) (*1,*2)

b. Significant differences among means at the α = 0.01 level:
   1) (*2,*4)
   2) (*5,*6)

c. No other contrasts of interest were significant. Note that the above numbering of treatment combinations corresponds to that marked in the table of means given in figure 6.4.

c. Three-way Interactions on Time for Session 2

The ANOVA results for session 2 produced one significant (p < .01) three-way interaction (see table 6.3): the Question Type x Time Period x Dataset effect. The data table for this interaction is given in table 6.9, and the results of the Dunn-Bonferroni tests on those contrasts of interest in session 2† are displayed in table 6.10.
An examination of the Dunn-Bonferroni test results revealed a significant and consistent increase in latency time for performing the same tasks (i.e. Q1, Q2, and Q3) at higher levels of the time period and dataset category variables. In other words, performing Q1, Q2, and Q3 on plots with only 1 dataset and 7 periods was found to take significantly less time than performing these tasks on plots with 3 datasets and 14 periods, as shown in table 6.10. Even so, performance of Q2 was adversely affected by increasing datasets on 7-period plots, whereas performance of Q3 was adversely affected by increasing datasets on 14-period plots. In addition, it was found that only with 14-period single dataset plots did task activities associated with Q1 take significantly (p < .01) longer than task activities associated with Q3. No other meaningful comparisons were significant.

In summary, these results were consistent with the earlier findings that an increasing number of time periods and datasets adversely affected time performance with graphics displays, although the extent of these adverse effects depended largely on the type of task activities and on whether time periods or datasets were increased.

† The cell means are marked in table 6.9 with a number to correspond to the Dunn-Bonferroni results on contrasts reported in table 6.10.

Table 6.9: Data Table of Question Type x Time Period x Dataset Interaction (Experiment 1, Session 2)
Dependent Variable: Time Performance

                        7 Time Periods                14 Time Periods
Question Type     One Dataset   Three Datasets   One Dataset   Three Datasets
Q1 - Session 2    1.88s (*1)    2.27s (*2)       2.39s (*3)    2.84s (*4)
Q2 - Session 2    1.55s (*5)    2.32s (*6)       1.91s (*7)    2.40s (*8)
Q3 - Session 2    1.79s (*9)    2.03s (*10)      1.58s (*11)   2.51s (*12)
Numbering in cells corresponds to treatment combinations for the Dunn-Bonferroni method of multiple means comparison.

Table 6.10: Summary of Dunn-Bonferroni Tests for the Question Type x Time Period x Dataset Interaction (Experiment 1, Session 2)
Dependent Variable: Time Performance

1. Significant Differences among Means at the a = 0.01 level:
   a. (*1,*4)
   b. (*3,*11)
   c. (*5,*8)
   d. (*11,*12)
2. Significant Differences among Means at the a = 0.05 level:
   a. (*5,*6)
   b. (*9,*12)
3. No other contrasts of interest were significant. Note that the above numbering of treatment combinations corresponds to those marked in the table of means given in table 6.9.

Finally, results of this interaction indicated that the finding of Q1 taking longer to perform than Q3 in this experiment was due solely to the case of single dataset plots with 14 periods. In other words, more time periods on single dataset plots facilitated trend perception. But with graphics that were highly complex, difficulties were found with the performance of all tasks.

C. ACCURACY PERFORMANCE FOR COMBINED SESSIONS

In this experiment, since subjects whose accuracy fell below the 80% mark and whose time-accuracy scores had been significantly correlated were subsequently removed from all of the experimental datasets, it was expected that few, if any, effects due to accuracy performance would be of significant interest. Moreover, as the emphasis of theory testing in this research was placed primarily on time (i.e. the Kosslyn-Pinker theory) as discussed in the other chapters, it seemed reasonable not to be overly concerned about accuracy scores.

Indeed, an analysis of the accuracy scores for the original dataset collected in session 2 alone revealed only one significant three-factor interaction (table 6.11).
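The time-accuracy screening mentioned at the opening of this section rests on per-subject Pearson correlations between latency and accuracy scores. A stdlib-only sketch of that computation (the subject data below are invented for illustration):

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return None  # zero variance (e.g. perfect accuracy): not computable
    return cov / (sx * sy)

times = [1.9, 2.4, 2.1, 3.0, 2.6]   # latency per item (invented)
accuracy = [1, 0, 1, 0, 1]          # 1 = correct initial attempt
r = pearson_r(times, accuracy)
```

A subject would then be flagged as trading accuracy for speed when |r| reaches significance for the number of items answered.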
The information available from the analysis of the session 2 dataset alone would, therefore, be insufficient to provide any useful insights. In fact, when multiple comparisons of means were performed for all main and two-factor combinations for the session 2 dataset alone, no significant effects at the nominal level were uncovered. This confirmed the relatively low informational nature of the accuracy scores as compared to time scores for session 2 of this experiment. The alternative of performing the data analysis for accuracy on the combined dataset, with more emphasis placed on session 2 scores, was thus adopted.

Furthermore, in order to minimize departure from the normality assumption arising from the binary nature of the accuracy performance scores captured originally, these scores were transformed by the coding scheme described next. Each correct initial attempt in session 2 was assigned a score (i.e. "2") which doubled the score (i.e. "1") assigned to each correct initial attempt in session 1. Errors committed during initial attempts, regardless of the session in which they were committed, were assigned "0's". In this way, the final recoded data would range over scores of 0, 1, 2, and 3 instead of the original "0's" and "1's". This new coding scheme yielded a much more nearly normal distribution of the original data.

It seems important also to consider whether the statistical model to be used should include the GEFT measure; in other words, the question is whether to use an ANCOVA or ANOVA model. Accordingly, the GEFT measure was correlated with the transformed accuracy scores of individual subjects, with all of the identified outliers excluded. The result was a highly significant correlation (R = .71, P(R) = 97E-6, Mean Accuracy = 0.94, Mean GEFT = 14.05). This indicated that accuracy performance was largely explained by individual differences.
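The recoding scheme described above can be made concrete with a small sketch (the function name is ours): a correct initial attempt contributes 1 in session 1 and 2 in session 2, errors contribute 0, so the combined item score ranges over 0 through 3.

```python
def recode_accuracy(correct_s1, correct_s2):
    """Combine one item's accuracy across sessions, weighting session 2
    double: correct initial attempts score 1 (session 1) or 2 (session 2),
    and errors score 0, so the combined score is one of 0, 1, 2, 3."""
    return (1 if correct_s1 else 0) + (2 if correct_s2 else 0)

# All four possible outcomes for a single item:
scores = [recode_accuracy(s1, s2)
          for s1 in (False, True) for s2 in (False, True)]
print(sorted(scores))  # → [0, 1, 2, 3]
```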
Subjects who were considered more field-independent performed more accurately than those who were rated as more field-dependent.

To verify this high degree of correlation between the GEFT measure and accuracy performance among participants, an ANCOVA procedure with the GEFT measure included as the covariate was used to analyze the datasets for sessions 1 and 2 as well as the set of transformed scores. Results of the ANCOVA procedure indicated a highly significant GEFT effect at the a = 0.01 criterion for the session 2 dataset as well as for the combined (transformed) dataset, as shown in table 6.11.

Table 6.11: Comparison of ANCOVA Results for Sessions 1, 2, and the Transformed Dataset (Experiment 1, Outliers Excluded)
Dependent Variable: Accuracy Performance

Sources of         p-values              p-values      p-values
Variance           (Combined Sessions)   (Session 1)   (Session 2)
GEFT               0.0004**              0.3326        0.0008**
G: Graph Format    0.0025**              0.0059**      0.1240
Q: Question Type   0.0986                0.3028        0.1548
T: Time Period     0.4458                0.2731        0.8037
D: Dataset         0.0481*               0.0004**      0.8409
G*Q                0.2176                0.1597        0.1765
G*T                0.2905                0.0400*       0.2882
G*D                0.8459                0.5918        0.9492
Q*D                0.1592                0.3603        0.3746
Q*T                0.0159*               0.0244*       0.0830
T*D                0.1099                0.1419        0.4120
G*T*D              0.0131*               0.2114        0.0190*
G*Q*T              0.1817                0.2772        0.5488
G*Q*D              0.7850                0.8070        0.7735
Q*T*D              0.1040                0.1806        0.2474
G*Q*T*D            0.2454                0.9832        0.2566

*  Significant at p = 0.05 level
** Significant at p = 0.01 level

Apart from the GEFT effect, the following significant effects were found for accuracy:
1. Graph Format effect
2. Dataset effect
3. Question Type by Time Period effect
4.
Graph Format by Time Period by Dataset effect

On the basis of the highly significant GEFT variable (F = 18.69; P(F) = 0.0004) found for this experiment (E1) in explaining accuracy performance, as well as the relatively smaller number of significant effects found for accuracy as compared to time, the discussion of accuracy scores will thus focus only on those significant effects that are of interest. Higher-order interactions, which are normally very difficult to interpret properly, are not discussed.

1. Main Effects on Accuracy for Transformed Data

1. Graph Format -- Analysis of the data showed that higher percentages of correct responses were obtained when subjects were given bar charts (95%) or symbol graphs (96%) than when they were given line graphs (92%). This finding indicated that line graphs were harder to use than the others. This should not be surprising given that time results for this experiment indicated Q1 to be more difficult to answer than the other tasks. The two results appeared complementary rather than conflicting.

2. Dataset -- It was found that increasing datasets adversely affected accuracy. Hence, the percentage of correct responses was 95% when only single dataset plots were used but dropped to 93% when multiple dataset (3 datasets) plots were used. Again, this finding is consistent with results on time performance, where it was found that more datasets led to more time and effort. As usual, the complexity of graphics is greater with more datasets.

2. Two-way Interactions on Accuracy for Transformed Data

1.
Question Type x Time Period -- Analysis of this significant interaction revealed that although increasing time periods adversely affected accuracy performance for task activities associated with Q1, accuracy for Q2 or Q3 actually improved slightly.† The underlying cause of this phenomenon lies perhaps in the fact that with more time periods (14 periods) depicted on a graph, datapoints belonging to the same category are brought closer together. This actually facilitated, rather than obstructed, a better (i.e. more accurate) perception of trends (Q3) or level differences (Q2). Table 6.12 shows the mean values for this interaction. Applications of the Dunn-Bonferroni tests on the transformed data, however, failed to reveal any differences of significance at the a = 0.05 criterion. The differences among the treatment combinations with respect to accuracy performance were thus weaker than expected.

The overall results were nonetheless consistent with the findings on time performance reported earlier. Together, the significant effects indicated that more time periods depicted on the abscissa of the time series graphs used in this experiment would lead to an increase in latency time and lower accuracy performance only for the extraction of DV scale-values (Q1), but would have little or no adverse effect on the identification of relative levels (Q2), or on that of a trend (Q3). Conversely, more time periods appeared to yield faster as well as more accurate reading of trends, although such improvements in time and accuracy were not statistically significant.*

† This improvement was, however, not statistically significant (table 6.12).
* Refer to table 6.9 and compare treatment combination (*9) with (*11); refer also to table 6.12 and compare treatment combination (*5) with (*6).

Table 6.12: Mean Values of the Question Type x Time Period Interaction for Transformed Data (Experiment 1, Outliers Excluded)
Dependent Variable: Accuracy Performance

              Q1     Q2     Q3
7 Periods     97%    92%    95%
14 Periods    91%    97%    94%

The application of the Dunn-Bonferroni multiple-comparison method failed to identify any statistically significant key contrasts.

D. SUMMARY OF EXPERIMENT E1 RESULTS

One unexpected finding of this experiment was that trend reading tasks (Q3) were performed faster than scale-value reading tasks (Q1). Although this unexpected result appears to contradict the hypothesis that no one form of information to be extracted is uniformly easy or difficult, it should be remembered that only time series graphics were compared and contrasted in this experiment. Indeed, had tables been included as an alternative form of representation among the stimuli used, the expected result might have been realized. In general, this finding sheds some light on the effectiveness of graphics for reading trends compared to exact values (i.e. when DV scale-values are to be extracted and compared, as in Q1).

In spite of this, results of this experiment provided clear evidence that no one graph format is superior to the others for all situations. In fact, the key finding of this study is that the ease or difficulty of using each particular graph format is strongly dependent on the characteristics of the task activities to be performed. Hence, it was found that significantly less time was required to extract single scale-values (Q1) from bars or symbols than from lines. In a similar vein, lines were found to be faster for reading trends (Q3) or for reading level differences (Q2) than for reading scale-values (Q1).
No significant differences were found among the various graph formats for extracting level difference information (Q2), although there was some evidence of similarity between Q2 and Q3 in this experiment.

In addition, results of this experiment confirmed that the complexity of graphics increases with the number of time periods depicted, as well as with more dataset categories. As for the interaction of complexity factors with tasks, it was found that more time periods exerted a greater adverse effect on the performance of certain task activities (i.e. Q1), but had virtually no effect on other activities (e.g. Q2 and Q3). These claims were supported by evidence drawn from results for time as well as for accuracy.

In general, multiple bar representations were found to yield significantly greater latency time than multiple symbol representations. More importantly, the effect of increasing datasets was found to affect bars more adversely than the other formats.

Finally, the highly significant GEFT measure as a covariate in the ANCOVA model for analyzing accuracy performance in this experiment, and the significantly positive correlation found between the GEFT measure and the accuracy performance measure, pointed to the role of individual characteristics as a key determinant of accuracy in this experiment. That is, field-independents were found to perform more accurately than their counterparts (field-dependents).

In summary, there appears to be converging evidence supporting the hypotheses drawn in chapters 3 and 4. For example, lines were found to yield the worst accuracy performance in this experiment because most errors were committed on questions which required subjects to read scale-values of single points on lines (see also Appendix I). Lines have characteristics of low axes anchoring, and thus Q1 was particularly difficult to answer using lines compared to the other graph formats.

VII.
RESULTS: EXPERIMENT 2

This chapter covers results for experiment E2. The subject population comprised thirteen second-year commerce undergraduates and eleven first-year MBA students. Seventeen of these candidates were males and seven were females. Their average age was 25.21 years.

As a starting point for the discussion, it may be appropriate to review briefly the major characteristics of the task activities investigated in E1 and E2.† In E1, all questions or tasks begin with given PIV attribute information (i.e. time period information) anchored on the abscissa framework and work towards uncovering DV attribute information (i.e. scale-value, level difference, and trend information).‡ In this experiment, Q1 alone has a strong anchoring of information on both the x-axis and y-axis, but Q2 and Q3 have strong anchoring of information only on the x-axis.

Although the questions or tasks examined in E2 also have characteristics of strong anchoring of information on the abscissa framework, in contrast to E1 tasks, they all begin with some known characteristics of DV attribute information and work towards uncovering PIV attribute information (i.e. time period information). Yet, it is interesting to note that while the tasks examined in E1 and E2 are different in terms of the information specified in the question and the information requested in the answer, the classification of E1 and E2 tasks based on the anchoring concept is, nonetheless, identical.*

* The layout of this chapter is similar to chapter 6, as both are concerned with presenting experimental results for the individual studies. Accordingly, key issues on statistical analysis are discussed at the beginning of this chapter. Again, emphasis is placed on session 2 results for time if a significant Session effect is found. Results on accuracy for this experiment are also based on the combined session dataset, based on the reasoning discussed in earlier chapters.
† A detailed discussion on the differences in the tasks examined among the experiments was presented in chapter 3. Thus, only a brief note on the chief characteristics of the task activities performed in E1 and E2 is given here.
‡ See tables 3.4, 3.5, and 3.8.

RESULTS: EXPERIMENT 2 / 161

A. TIME PERFORMANCE FOR COMBINED SESSIONS

First, average time performance for E2 was about 6.14 seconds. This was a substantial increase from the average time performance found for E1. However, consistent with expectation and the reasoning provided in chapters 3 and 4, this result confirmed that the task activities tested in E2 were apparently more complex than those evaluated in E1.

Results of the initial ANCOVA procedure on the full dataset, with the exception of the single outlier identified in table 5.1, are shown in table 7.1.

1. The Session Effect

A highly significant (F = 63.98, p < .01) Session effect was found for E2. Time performance during session 1 averaged 7 seconds but dropped to only about 5 seconds during session 2. As usual, this was interpreted as indicative of an ongoing adjustment or learning process between sessions. Therefore, separate analyses of the session 1 and session 2 datasets were conducted.

Also, significant interactions between the Session variable and other factors were found in the analysis (see table 7.1). These other factors included the graph format, the question type, and the dataset variables. Analysis of these significant interactions revealed, however, that they were strictly ordinal; in other words, regardless of the fact that these interactions were significant, the strong Session effect, which was indicative of learning, could still be generalized across all levels of the classification factors included in the initial statistical model.

2.
The GEFT Measure

The relationship between the GEFT measure and time performance scores for E2 was not significant: that is, results of the analysis (table 7.1) indicated a clear independence of the GEFT measure from the time performance measure (F = 3.45, p > .05).

Moreover, when the GEFT measure was correlated with the corresponding time scores for all subjects combined over the full 72 treatment combinations, the result was also not statistically significant (R = -.1390, P(R) = .5023, Mean Time = 6.14 s, Mean GEFT = 15.63). In addition, correlations of corresponding scores between and within sessions for these relationships were not significant. Therefore, the GEFT scores were excluded from the final statistical model used for analyzing the E2 datasets.

3. Additional Outliers

The same ground rules used for detecting additional outliers in E1 were used for E2. That is, these outliers were identified based on the extent of individual time-accuracy correlations for scores captured during session 2 only. Table 7.2 presents results of the analysis, with separate indications for correlations that were found to be highly as well as marginally significant.

4. The Power Analysis

Table 7.2 indicates that two of the eight additional outliers identified for this experiment exhibited only a marginally significant time-accuracy tradeoff effect at the a = 0.05 criterion. To assess the impact of their presence on the findings, statistical analyses of the data including as well as excluding these marginally significant outliers (i.e. subjects 05 and 06; see table 7.2) were run separately. The results revealed negligible effects on the number or type of factors and their combinations that were found to significantly affect time performance. In other words, the same significant effects or interactions were found with or without their being included as part of the data.

Table 7.1: Initial ANCOVA Results for Full Dataset (Experiment 2)
Dependent Variable: Time Performance

Sources of          F         Conventional   Greenhouse-
Variance                      p-values       Geisser Prob.
GEFT                3.45      0.0772
S: Session          63.98     0.0000**
G: Graph Format     5.53      0.0069**       0.0124*
Q: Question Type    4.03      0.0236*        0.0241*
T: Time Period      83.57     0.0000**
D: Dataset          100.10    0.0000**
S*G                 12.87     0.0000**       0.0000**
S*Q                 14.95     0.0000**       0.0000**
S*D                 11.24     0.0029**
G*Q                 8.40      0.0000**       0.0001**
G*T                 3.62      0.0350*        0.0379*
G*D                 22.65     0.0000**       0.0**
Q*T                 1.05      0.3535         0.3467
Q*D                 6.47      0.0034**       0.0092**
T*D                 33.81     0.0000**
G*Q*T               3.12      0.0183*        0.0285*
G*Q*D               0.33      0.5105         0.4865
G*T*D               4.26      0.0203*        0.0263*
Q*T*D               22.64     0.0000**       0.0**
G*Q*T*D             2.99      0.0231*        0.0451*

*  Significant at p = 0.05 level
** Significant at p = 0.01 level

Table 7.2: Time-Accuracy Correlations for Identifying Additional Outliers (Experiment 2, Session 2)
Dependent Variable: Time Performance Scores

E2 Subjects    Correlation    Probability    Sample
(Session 2)    R              P(R)           Size
01             (PERFECT ACCURACY -- NOT COMPUTABLE)
02             .5653          15E-5**        36
03             .5863          71E-6**        36
04             .1534          .3604          36
05             .3210          .0490*         36
06             .3200          .0497*         36
07             .1107          .5109          36
08             .4700          .0025**        36
09             .5492          26E-5**        36
10             .3499          .0376*         36
11             .0132          .9376          36
12             .6591          29E-7**        36
13             .1998          .2309          36
14             (PERFECT ACCURACY -- NOT COMPUTABLE)
15             .0746          .6583          36
16             -.0138         .9349          36
17             -.1247         .4582          36
18             -.0610         .7173          36
19             .0472          .7799          36
20             -.0119         .9438          36
21             (PERFECT ACCURACY -- NOT COMPUTABLE)
22             .0078          .9634          36
23             (PERFECT ACCURACY -- NOT COMPUTABLE)
24             (PERFECT ACCURACY -- NOT COMPUTABLE)

*  Significant at p = 0.05 level
** Significant at p = 0.01 level

Yet, the difference between six and eight additional outliers appeared to have a much greater impact on the power values of the various F-tests. Consequently, the choice was to include, in the data analysis for time performance, the two subjects whose time-accuracy correlations were only at the threshold of significance. Moreover, a check on the time-accuracy tradeoff for all subjects subsequently included in the analysis confirmed the overall absence of any significant time-accuracy correlations.

Thus, the retention of the two marginally significant outliers helped to maintain power values for most of the F-tests conducted above the conventionally acceptable .80 benchmark. This justified limiting the rest of the discussion on time performance to null hypotheses that were rejected.

B. TIME PERFORMANCE FOR SEPARATE SESSIONS

As indicated in the previous chapter, the rationale for excluding the same set of outliers from all datasets (i.e. the combined dataset and the session 1 and 2 datasets) before further analysis was to ensure comparability of the resulting analyses. Table 7.3 presents all of the main and interaction effects that were found to be significant for the various datasets. A table of mean values for time performance scores combined across all the initial 36 treatment combinations for sessions 1 and 2 is presented in table 7.4. Note that each cell in the table shows a marked improvement in performance in session 2 as compared to session 1.

1.
Significant Effects on Time for Session 1

As observed from table 7.3, the following main and interaction effects were found to be significant for time during session 1:
1. Graph Format effect
2. Question Type effect
3. Time Period effect
4. Dataset effect
5. Graph Format by Question Type interaction
6. Graph Format by Time Period interaction
7. Graph Format by Dataset interaction
8. Time Period by Dataset interaction
9. Question Type by Time Period by Dataset interaction
10. Graph Format by Question Type by Time Period by Dataset interaction

Table 7.3: Comparison of ANOVA Results Among Sessions (Experiment 2, Additional Outliers Excluded)
Dependent Variable: Time Performance

Sources of         p-values              p-values      p-values
Variance           (Combined Sessions)   (Session 1)   (Session 2)
G: Graph Format    0.0037**              0.0000**      0.6385
Q: Question Type   0.1442                0.0313*       0.3743
T: Time Period     0.0000**              0.0001**      0.0000**
D: Dataset         0.0000**              0.0000**      0.0000**
G*Q                0.0086**              0.0090**      0.1637
Q*T                0.7388                0.5670        0.9794
G*T                0.0875                0.0475*       0.6253
G*D                0.0000**              0.0000**      0.0067**
Q*D                0.0622                0.0539        0.4101
T*D                0.0000**              0.0001**      0.0003**
G*Q*T              0.1442                0.5527        0.0312*
G*T*D              0.0287*               0.1294        0.0940
G*Q*D              0.3333                0.3380        0.6923
Q*T*D              0.0000**              0.0003**      0.0038**
G*Q*T*D            0.0053**              0.0163*       0.1001

*  Significant at p = 0.05 level
** Significant at p = 0.01 level

Table 7.4: Table of Means for All Treatment Combinations (Experiment 2, Additional Outliers Excluded)
Dependent Variable: Time Performance
[Session 1 and session 2 mean times for each of the 36 treatment combinations: graph format (bars, symbols, lines) x question type (Q1, Q2, Q3) x information complexity (a: 1 dataset with 7 time periods; b: 1 dataset with 14 time periods; c: 3 datasets with 7 time periods; d: 3 datasets with 14 time periods).]

Among these, only the significant Graph Format by Question Type effect for session 1 will be discussed, as it is of major interest in this research and the findings will be applicable to occasional or first-time graph users for the kinds of task activities investigated in this experiment. Figure 7.1 shows the plot for this two-way interaction and the corresponding mean values for the respective treatment cells. Table 7.5 summarizes results of the Dunn-Bonferroni tests on key contrasts for the interaction.

An examination of figure 7.1 and table 7.5 indicated that the significance of this two-factor interaction appeared to rest on just one treatment combination: the most time-consuming combination, Bar Graph and Q2. Accordingly, it was found that, with bars, performing Q2 took significantly longer than Q1 (figure 7.1). Similarly, Q3 took longer than Q1, but that difference was not statistically significant among the contrasts of interest examined by the Dunn-Bonferroni tests at the a = .05 level (see table 7.5).
That subjects took a longer time to perform Q2 than Q1 with bars in this experiment was due primarily to the different characteristics of the respective tasks. Since Q1 was concerned with extracting a single time period whose scale-value was closest to another given value, whereas Q2 was concerned with extracting two consecutive time periods showing the largest level difference, this result was expected. Q2 involved numerous pairwise comparisons among bars, whereas Q1 involved the isolation of a single bar. The latter task used the characteristics of bars better than the former: that is, bars were more easily processed individually than pairwise because of their discrete and isolated character.

Further, the anchoring concept discussed in chapter 3 makes it seem logical that bars, rather than lines or symbols, should require a longer time for performing Q2 as compared to Q1. This is because the extraction of level relationships between consecutive pairs of time periods of the same dataset (Q2) required not only a strong x-axis anchoring characteristic on the part of the graph format being used but a substantial dataset anchoring characteristic as well. Because bars rated worst relative to either symbols or lines on their dataset anchoring characteristics, it appeared reasonable that a longer time should be taken to perform Q2 than Q1 for bars but not for lines or symbols.

Although it appeared logical to find bars requiring a longer time for performing Q2 (or even Q3) over Q1, one wonders why lines did not show a strong (i.e. statistically significant) difference in their expected disadvantage relative to the other formats when they were used to answer Q1 in this experiment.† A reasonable explanation of this phenomenon is that Q1 in E1 was more difficult for lines than Q1 in E2 because the abscissa scale, where the value was to be extracted for Q1 in E2, was undoubtedly more discrete, and thus easier to extract from, than the ordinate scale, where the value was to be extracted for Q1 in E1. Thus, the type of scale becomes a facilitating factor for disembedding points on lines. Future research may include the control of the scaling factor (e.g. DeSanctis & Jarvenpaa, 1985) when evaluating graphical presentations. Note that most other key contrasts were in the expected direction, although they were not statistically significant at the nominal level (cf. figure 7.1 with table 7.5).

† Note that E1 results clearly showed lines to be the worst format for performing Q1, compared to the other graph formats.

Figure 7.1: Plot and Mean Values of Graph Format x Question Type Interaction (Experiment 2, Session 1)
Dependent Variable: Time Performance

        Bars         Symbols      Lines
Q1      5.80 (*1)    6.09 (*4)    5.94 (*7)
Q2      8.16 (*2)    6.16 (*5)    6.52 (*8)
Q3      7.84 (*3)    6.82 (*6)    5.91 (*9)

Numbering in cells corresponds to treatment combinations for the Dunn-Bonferroni method of multiple means comparison.

Table 7.5: Summary of Dunn-Bonferroni Results for Graph Format x Question Type Interaction (Experiment 2, Session 1)
Dependent Variable: Time Performance

1. Significant Differences among Means at the a = 0.05 level: (*1,*2)
2. No other key contrasts were found to be significant. Note: the numbering of treatment combinations above refers to the marked cells in the table of means given in figure 7.1.

2.
Significant Effects on Time for Session 2 For session 2, the following significant main and interaction effects were found (table 7.3): 1.  Time Period effect  2.  Dataset effect  3.  Graph Format by Dataset interaction  4.  Time Period by Dataset interaction  5.  Graph Format by Question Type by Time Period interaction  6.  Question Type by Time Period by Dataset interaction  Overall, there were less significant effects found in session 2 as compared to session 1.  Moreover,  those effects that were found to be significant in session 2 were generally also highly significant (p < .01) in session 1.t This observation indicated that significant effects found in session 2 represented the more permanent effects that would be expected regardless of the amount of training or learning previously undertaken by participants.  a.  1.  Main Factor  Effects on Time for Session  2  Time Period -- Effect of this highly significant (p < .01) factor indicated that the complexity of graphics increased with more time periods being depicted.  Average time taken for using a  7-period plot was 4 seconds in contrast to about 5 seconds for using a 14-period plot. 2.  Dataset -- This was another main factor effect found to be highly significant (p < .01). Graphics with multiple datasets took longer (about 5.4 seconds) to process than those with only single datasets (about 3.5 seconds).  Although findings on time period and dataset variables might be regarded as of lesser interest compared to those on graph format and question type, these factors were nonetheless included because whether or not, and if so how, they would interact with the more crucial variables (i.e. graph t  The exception  here was the Graph Format by Question Type by Time Period interaction.  RESULTS: EXPERIMENT 2 / 173 format and question type variables) would provide additional insights on the use and design of time-series graphics.  fa. Two-way  1.  
Interactions on Time for Session 2

1.  Graph Format x Dataset -- This 2-way interaction was highly significant (p < .01) as well as strictly ordinal. The plot and mean value table for the interaction are provided in figure 7.2. Table 7.6 shows the Dunn-Bonferroni results on those contrasts that were of key interest.

    Analysis of this interaction revealed that the use of multiple dataset representations, regardless of the graph format used, resulted in additional time and effort as compared to the use of only single dataset representations (see figure 7.2). Results also showed that different graph formats incurred different degrees of additional time and effort when multiple versus single dataset representations were compared (see table 7.6). For instance, with an increasing number of dataset categories, bars yielded a more significant increase in latency time than the corresponding move from a single dataset symbol or line graph to a multiple dataset symbol or line graph.

    The underlying reason for this important interaction would be what has been termed the "categorical isolation" effect of multiple bars, described in chapter 6 (refer to the same interaction effect that was found significant in session 2 of E1). Briefly stated, the separation of bars belonging to the same dataset category in multiple bar graphs tended to make them difficult to read. The absence of such an effect on symbols as well as lines made them less vulnerable, and thus the resulting adverse effect on time was of lesser significance with these representations compared to bars.

2.  Time Period x Dataset -- This was another highly significant (p < .01) interaction found during session 2 and plotted in figure 7.3. A table of mean values for the interaction is provided along with the figure. The Dunn-Bonferroni results for key contrasts are summarized in table 7.7.
It was found that more time was generally required as more time periods and more datasets were depicted (table 7.7). This was consistent with the findings on the main effects, namely the time period and dataset factors. Note that an interesting parallel could be drawn between these complexity factors and the factors associated with additional rows and columns in tabular representations.

Figure 7.2: Plot and Mean Values of Graph Format x Dataset Interaction (Experiment 2, Session 2)
Dependent Variable: Time Performance

Graph Format     Bars          Symbols       Lines

1 Dataset        3.29 (*1)     3.78 (*3)     3.52 (*5)
3 Datasets       5.69 (*2)     4.90 (*4)     5.49 (*6)

* Numbering in cells corresponds to treatment combinations for the Dunn-Bonferroni method of multiple means comparison.

Table 7.6: Summary of Dunn-Bonferroni Results for Graph Format x Dataset Interaction (Experiment 2, Session 2)
Dependent Variable: Time Performance

1.  Significant Differences among Means at the α = 0.05 level:
    a.  (*3,*4)
    b.  (*5,*6)

2.  Significant Differences among Means at the α = 0.01 level:
    a.  (*1,*2)

3.  No other key contrasts were found to be significant. Note that the above numbering of treatment combinations corresponds to those marked in the table of means given in figure 7.2.
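The Dunn-Bonferroni procedure summarized in tables such as 7.6 controls the familywise Type I error rate by dividing α across the planned contrasts, each contrast then being tested with an ordinary paired-samples statistic. A minimal sketch of that logic follows (illustrative Python, not the BMDP routines the thesis actually used; the nine-contrast count and the sample data are assumptions for the example):

```python
import math

def bonferroni_alpha(alpha_family, n_contrasts):
    """Per-contrast significance level under the Dunn-Bonferroni rule."""
    return alpha_family / n_contrasts

def paired_t(xs, ys):
    """Paired-samples t statistic (and df) for one planned contrast,
    e.g. the same subjects' times under two treatment combinations."""
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n), n - 1

# With a familywise alpha of .05 spread over, say, nine key contrasts,
# each individual contrast is tested at .05 / 9, i.e. roughly .0056.
print(bonferroni_alpha(0.05, 9))
```

Each pair such as (*1,*2) in the tables corresponds to one such contrast tested at the adjusted level.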
Figure 7.3: Plot and Mean Values of Time Period x Dataset Interaction (Experiment 2, Session 2)
Dependent Variable: Time Performance

                 Dataset Category
Time Period      1 Dataset     3 Datasets

7 Periods        3.49 (*1)     4.42 (*2)
14 Periods       3.57 (*3)     6.30 (*4)

* Numbering in cells corresponds to treatment combinations for the Dunn-Bonferroni method of multiple means comparison.

Table 7.7: Summary of Dunn-Bonferroni Results for Time Period x Dataset Interaction (Experiment 2, Session 2)
Dependent Variable: Time Performance

1.  Significant Differences among Means at the α = 0.05 level:
    a.  (*1,*2)

2.  Significant Differences among Means at the α = 0.01 level:
    a.  (*2,*4)
    b.  (*3,*4)

3.  No other key contrasts were found to be significant. Note that the above numbering of treatment combinations corresponds to those marked in the table of means given in figure 7.3.

c. Three-way Interactions on Time for Session 2

Although results of session 2 indicated two three-factor interactions to be statistically significant, multiple means comparison tests performed on key contrasts underlying the Graph Format by Question Type by Time Period interaction did not yield any significant differences. A table of cell means for this interaction is nonetheless provided for the reader in table 7.8.

The other three-factor interaction that was found to be significant (p < .01) based on the ANOVA result was the Question Type by Time Period by Dataset interaction. Table 7.9 provides the cell means and table 7.10 shows the Dunn-Bonferroni test results for this interaction.

Examination of the Dunn-Bonferroni results revealed that for all task activities performed (Q1, Q2, and Q3), significant differences were found with increasing datasets (3 datasets) on 14-period plots but not on 7-period plots.
Results also showed a similarity between Q2 and Q3 in the pattern of significant differences found among the various types of graphical designs investigated (see table 7.10); that is, the most complex plots (14 periods, 3 datasets) were significantly harder to use than any of the simpler plots. In fact, with highly complex graphs (3 datasets, 14 periods), performance of all tasks was difficult relative to simple graphics (1 dataset, 7 periods).

Finally, there was some evidence to indicate that increasing time periods (14 periods) on a single dataset plot actually facilitated trend and pattern perception, although these differences were not statistically significant.† A possible explanation could be that perception of continuity among points belonging to the same dataset is also highly dependent on how closely these points are placed to one another. Of course, this is in turn dependent on the number of time periods to be shown on a single plot.

† Compare mean values of treatment combinations (*5) with (*7) and (*9) with (*11) in table 7.9.

Table 7.8: Mean Values of Graph Format x Question Type x Time Period Interaction (Experiment 2, Session 2)
Dependent Variable: Time Performance

Question Type      Bars          Symbols       Lines

Q1
  7 Periods        3.87 (*1)     3.92 (*7)     4.05 (*13)
  14 Periods       4.54 (*2)     4.61 (*8)     5.74 (*14)

Q2
  7 Periods        3.32 (*3)     3.98 (*9)     4.18 (*15)
  14 Periods       5.33 (*4)     4.57 (*10)    4.44 (*16)

Q3
  7 Periods        4.42 (*5)     3.89 (*11)    3.98 (*17)
  14 Periods       5.45 (*6)     5.05 (*12)    4.66 (*18)

Table 7.9: Mean Values of Question Type x Time Period x Dataset Interaction (Experiment 2, Session 2)
Dependent Variable: Time Performance

                   Treatments of Information Complexity Factors

                   7 Time Periods                 14 Time Periods
Question Type      One Dataset   Three Datasets   One Dataset   Three Datasets

Q1 (Session 2)     3.23 (*1)     4.67 (*2)        4.16 (*3)     5.77 (*4)
Q2 (Session 2)     3.39 (*5)     4.27 (*6)        2.99 (*7)     6.53 (*8)
Q3 (Session 2)     3.85 (*9)     4.34 (*10)       3.56 (*11)    6.55 (*12)

* Numbering in cells corresponds to treatment combinations for the Dunn-Bonferroni method of multiple means comparison. All values are in seconds.

Table 7.10: Summary of Dunn-Bonferroni Results for Question Type x Time Period x Dataset Interaction (Experiment 2, Session 2)
Dependent Variable: Time Performance

1.  Significant Differences among Means at the α = 0.05 level:
    a.  (*3,*4)

2.  Significant Differences among Means at the α = 0.01 level:
    a.  (*1,*4)
    b.  (*5,*8)
    c.  (*6,*8)
    d.  (*7,*8)
    e.  (*9,*12)
    f.  (*10,*12)
    g.  (*11,*12)

3.  No other key contrasts were found to be significant. Note: numbering of treatment combinations above corresponds to marked cells in the table of means given in table 7.9.

C. ACCURACY PERFORMANCE FOR COMBINED SESSIONS

In this experiment, correlations of the GEFT scores with the corresponding accuracy scores for the combined as well as the separate sessions (i.e. sessions 1 and 2) did not reveal any significant relationships (table 7.11). Consistent with these results were the similarly insignificant GEFT effects found when the GEFT measure was included as a covariate in the statistical models used to analyze the accuracy scores for the separate session datasets (i.e. sessions 1 and 2) as well as for the transformed (combined) dataset.

Table 7.11 shows the resulting significant effects for accuracy performance with respect to the session 1 and 2 datasets as well as the transformed dataset, based on rules of data transformation similar to those adopted in chapter 6.
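The correlational checks used in these analyses (GEFT against accuracy here, and time against accuracy in the outlier screening) are ordinary Pearson correlations tested against the null hypothesis of zero correlation via a t statistic on n - 2 degrees of freedom. A self-contained sketch of that test (illustrative Python, not the BMDP6D routine actually used; the sample figures and the critical value mentioned in the comment are assumptions for the example):

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def r_to_t(r, n):
    """t statistic for testing H0: rho = 0, with df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# For example, a correlation of -.31 across 24 subjects gives |t| of about
# 1.53 on 22 df, short of the roughly 2.07 two-tailed critical value at
# alpha = .05 (critical value stated here as an assumption, not from the thesis).
print(round(r_to_t(-0.31, 24), 2))
```

A non-significant t leads to the "no significant relationship" conclusions reported for the GEFT measure.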
In this chapter, data analysis on accuracy scores for session 2 of E2 did not yield any interesting significant effects. The following main factor and two-factor interactions were, however, found to be significant based on the analysis of accuracy scores coded as the combined (transformed) dataset:

1.  Time Period effect
2.  Graph Format by Question Type interaction
3.  Question Type by Dataset interaction

1. Main Effects on Accuracy for Transformed Data

As expected, accuracy was lower with more complex graphics. Results showed that the overall mean accuracy score for the 14-period graphics was 91% whereas it was 96% for the 7-period graphics. This finding is generally consistent with the results on time performance, in that an increasing number of time periods had always led to longer processing of graphics stimuli.

Table 7.11: Comparison of ANCOVA Results for Sessions 1, 2, and the Transformed Dataset (Experiment 2, Outliers Excluded)
Dependent Variable: Accuracy Performance

Sources of         p-values               p-values      p-values
Variance           (Combined Sessions)    Session 1     Session 2

GEFT               0.9716                 0.2313        0.6526
G: Graph Format    0.3287                 0.4648        0.4441
Q: Question Type   0.0642                 0.0403*       0.0863
T: Time Period     0.0269*                0.0145*       0.1857
D: Dataset         0.3952                 0.2826        0.3050
G*Q                0.0266*                0.0015**      0.4453
G*T                0.9136                 0.9209        0.8512
G*D                0.1470                 0.0034**      0.4131
Q*D                0.0308*                0.0059**      0.1814
Q*T                0.8728                 0.9113        0.8541
T*D                0.5397                 0.6145        0.8264
G*T*D              0.8433                 0.2731        0.8040
G*Q*T              0.1636                 0.0132*       0.6434
G*Q*D              0.0738                 0.5668        0.0508
Q*T*D              0.0831                 0.0012**      0.6063
G*Q*T*D            0.0958                 0.3988        0.1900

*  Significant at p = 0.05 level
** Significant at p = 0.01 level

2. Two-way Interactions on Accuracy for Transformed Data

1.  Graph Format x Question Type -- Table 7.12(a) provides the cell means for this interaction.
No table is provided for the Dunn-Bonferroni results of this 3 x 3 interaction because no key contrasts among means were significant at the generally acceptable criterion. The largest mean difference found based on cross-tabulation of means for this interaction was the lower percentage of correct responses (88%) found with line graphs for performing Q1 as compared to bar charts (97%). Symbols were the best format for answering Q2 and Q3 based on a visual comparison of cell means (table 7.12a).

    These observations are in agreement with the anchoring concept (chapter 3). For example, bars are expected to be more accurate than lines for comparing scale-values (Q1) because of their characteristic of strong x-axis anchoring. Lines have low anchoring to the axes, so they would be unsuitable for reading the specific location of points (i.e. x-value, y-value).

2.  Question Type x Dataset -- Analysis of the accuracy data for this particular interaction revealed that none of the key contrasts among means were statistically significant. Table 7.12b, however, shows the cell means for this interaction.

Tables 7.12(a,b): Mean Value Tables for Significant Two-factor Interactions for the Transformed Dataset (Experiment 2, Outliers Excluded)
Dependent Variable: Accuracy Performance

(a) Graph Format x Question Type

Question Type    Bars    Symbols    Lines

Q1               97%     91%        88%
Q2               93%     99%        96%
Q3               91%     96%        92%

(b) Question Type x Dataset

Question Type    1 Dataset    3 Datasets

Q1               90%          94%
Q2               99%          93%
Q3               96%          90%

D. SUMMARY OF EXPERIMENT 2 RESULTS

Results of E2 generally supported the experimental hypotheses presented in chapters 3 and 4. First, that graph format was not found to be a significant main effect confirmed the expectation that no one form of graph format would be superior overall.
Second, results for the interaction of graph format and dataset were consistent with the finding that, among the graph formats evaluated, bars had the worst "dataset" anchoring. Consequently, a highly significant difference was found for time between using multiple versus single bar representations, whereas a less significant difference was found between using multiple versus single line graphs. The mean contrast between single versus multiple symbol graphs was not statistically significant.

In addition, results showed that the information complexity of graphics increased with an increasing number of time periods and datasets. The three-factor Question Type by Time Period by Dataset interaction also provided some support for the notion that an increasing number of time periods in the case of single dataset representations could actually facilitate the performance of Q2 and Q3, owing to the greater proximity among points belonging to the same dataset when more time periods are depicted.†

Finally, the key finding regarding the task activities studied in this experiment is that when anchoring information on the axes framework formed the main aspect of the task activities to be performed (e.g. Q1),‡ bars usually facilitated but lines inhibited task performance. Conversely, when this information anchoring became slack, as in Q2 and Q3, bars inhibited whereas lines facilitated task performance. In session 1, bars were found faster to use for answering Q1 than Q2. For the combined sessions, lines yielded lower accuracy when used to answer Q1 as compared with the other graph formats.

† Compare also tables 6.9 and 7.9 for treatment combinations (*5) with (*7) and (*9) with (*11).

‡ Q1 involved not only anchoring information on the abscissa or x-axis (i.e. time period information) but also anchoring information on the ordinate or y-axis (i.e. scale-value information), as discussed in chapter 3.
In summary, there appears to be converging evidence to support the hypotheses drawn earlier (chapters 3 and 4). Yet the interpretation of results requires that particular attention be paid to graph format characteristics and the characteristics of the task at hand. Accuracy scores were, however, not found to be highly correlated with the GEFT scores in this experiment.

VIII. RESULTS: EXPERIMENT 3

This chapter discusses the findings for experiment 3. The subject population for this experiment came from the same pool as those of E1 and E2; that is, they were students enrolled in MIS courses at the introductory level. Among them, twelve were second-year undergraduate and twelve were first-year MBA students. Nineteen of the twenty-four subjects were males and five were females. Their average age was 24.67 years.

As discussed in chapter 3,† the chief characteristic of the task activities tested in E1 and E2 is that of having strong information anchoring on the PIV attribute component. In E1, this anchoring information is specified in the question whereas in E2, it is required in the answer. The key difference between E1 and E2 tasks, therefore, lies in the class of information to be retrieved. In E1, it comes chiefly from the DV component (i.e. scale-value, level difference, and trend information), whereas in E2, it comes chiefly from the PIV attribute component (i.e. time period information).

In contrast, the information to be retrieved for the task activities evaluated in E3 comes chiefly from the SIV attribute component (i.e. dataset information). Specific time period information anchored on the abscissa component is neither provided, as in E1 tasks, nor requested, as in E2 tasks. In other words, the key difference between the tasks examined in the other two experiments (i.e. E1 and E2) and those tested in E3 lies in their respective nature of anchoring information on the abscissa component.
Apart from this key difference, E3 also differs from E1 and E2 in that only multiple dataset representations of time-series graphics are used, rather than both singular and multiple dataset representations. The reason lies solely in the nature of the tasks to be examined in E3.

† See tables 3.4, 3.5 and 3.8 in particular.

RESULTS: EXPERIMENT 3 / 190

A. TIME PERFORMANCE FOR COMBINED SESSIONS

The initial ANCOVA model run on the full dataset for E3 used the same repeated measures design and classification factors that were specified for E1 and E2, with a minor exception:†

1.  S: Session (Session 1, or Session 2)
2.  G: Graph Format (Bars, or Symbols, or Lines)
3.  Q: Question Type (Q1, or Q2, or Q3)
4.  T: Time Period Variation (7, or 14 Periods)
5.  D: Dataset Category (2, or 3 Datasets)

As usual, the GEFT scores were treated as a covariate.

† I.e. the levels of the dataset category factor.

1. The Session Effect

Analysis of the full dataset for E3 with the exclusion of the initially identified outliers (see table 5.1) yielded a highly significant (F = 19.63, p < .01) Session effect on time (table 8.1). Therefore, the experimental datasets for time captured in sessions 1 and 2 were analyzed separately.

This experiment resulted in the longest average time response among the three experiments. Approximately 9.9 seconds was taken for a typical trial in session 1 and about 7.6 seconds in session 2. Thus, the results confirmed the expectation that the tasks designed for E3 were more complex than those for E1 and E2 (see chapters 3 and 4).

2. The GEFT Measure

The initial analysis of E3 data (table 8.1) revealed a non-significant GEFT effect (F = 0.16; p > .05) for time. This was further substantiated by the equally non-significant correlations of GEFT scores and average time performance scores for individual subjects between and within sessions. For example, the GEFT-time relationships based on the BMDP6D analysis revealed only a low and non-significant
correlation for the full dataset (R = -.31; P(R) = .13; Mean Time = 8.58 s; GEFT = 17). Hence, it was clear that the GEFT measure was relatively unimportant in explaining (or predicting) time performance. The GEFT scores were thus excluded from further statistical analyses of time data.

3. Additional Outliers

As in previous chapters, additional outliers that exhibited a high time-accuracy tradeoff effect were identified based on individual time-accuracy correlations for session 2 only. Individuals indicating a significant or highly significant time-accuracy tradeoff are marked in table 8.2 with an asterisk (*) or a double asterisk (**) respectively.

Table 8.2 shows four additional outliers whose time-accuracy correlations might be considered highly undesirable. As such, all data scores belonging to these outliers were discarded from the dataset used for subsequent analysis and interpretation of findings for this experiment. Of course, subjects whose time-accuracy correlations could not be computed (table 8.2) because they had achieved perfect accuracy (e.g. subjects 14 and 22) were included in the dataset used for statistical analyses.

4. The Power Analysis

The removal of all additional outliers together with those already identified (table 5.2) was found to have only a negligible effect on the power values of the various F-tests conducted. In this research, high power values were achieved chiefly because of the use of the repeated measures design (see chapter 5).

Table 8.1: Initial ANCOVA Results for the Full Dataset (Experiment 3)
Dependent Variable: Time Performance

Sources of          F        Conventional    Greenhouse-Geisser
Variance                     p-values        Prob.

GEFT                0.16     0.6924
S: Session          19.63    0.0004**
G: Graph Format     49.43    0.0000**        0.0**
Q: Question Type    15.61    0.0000**        0.0001**
T: Time Period      31.09    0.0000**
D: Dataset          31.08    0.0000**
G*Q                 34.34    0.0000**
G*T                 3.20     0.0539          0.0587
Q*T                 5.90     0.0066**        0.0086**
G*D                 16.71    0.0000**        0.0001**
Q*D                 0.28     0.7597          0.7447
T*D                 7.66     0.0137*
G*Q*T               1.07     0.3789          0.3637
G*Q*D               3.10     0.0215*         0.0458*
G*T*D               0.95     0.3990          0.3947
Q*T*D               13.53    0.0001**        0.0002**
G*Q*T*D             3.39     0.0142*         0.0285*

*  Significant at p = 0.05 level
** Significant at p = 0.01 level

Table 8.2: Time-Accuracy Correlations for Identifying Additional Outliers (Experiment 3, Session 2)
Dependent Variable: Time Performance Scores

E3 Subjects    Correlation    Probability    Sample
(Session 2)    R              P(R)           Size

01              .3105          .0575         36
02             -.0127          .9401         36
03              .0776          .6455         36
05             -.1546          .3566         36
06             -.6105          .28E-6**      36
07              .1477          .3787         36
09             -.1263          .4506         36
10              .0555          .7426         36
11             -.5072          .0013**       36
12             -.0376          .8237         36
13              .0837          .6196         36
14             (PERFECT ACCURACY -- NOT COMPUTABLE)
15             -.4969          .94E-5**      36
16              .0715          .6719         36
17              .1716          .3051         36
18             -.0792          .6337         36
19              .1539          .3536         36
20             -.3907          .0146*        36
21             -.1995          .2316         36
22             (PERFECT ACCURACY -- NOT COMPUTABLE)
23             -.0086          .9594         36
24             -.0192          .9098         36

*  Significant at p = 0.05 level
** Significant at p = 0.01 level

B. TIME PERFORMANCE FOR SEPARATE SESSIONS

Table 8.3 compares the effects of main factors and their interactions as analyzed for the combined as well as the separate sessions (i.e. session 1 and session 2). Table 8.4 shows the cell means for the various treatment combinations tested in sessions 1 and 2. As expected, learning (i.e. improvement) was evidenced for each treatment combination (i.e. the same experimental trial) across sessions (i.e. comparing mean values of session 1 to the corresponding mean cell values of session 2).

1.
Significant Effects on Time for Session 1

For session 1, the following significant effects were found on time (table 8.3):

1.  Graph Format effect
2.  Question Type effect
3.  Time Period effect
4.  Dataset effect
5.  Graph Format by Question Type interaction
6.  Graph Format by Time Period interaction
7.  Graph Format by Dataset interaction
8.  Question Type by Time Period interaction
9.  Question Type by Time Period by Dataset interaction

Among these, only the significant Graph Format by Question Type interaction will be discussed, as it is of key interest for this session (session 1). Figure 8.1 depicts the plot with the corresponding mean value table for this 2-way interaction. Table 8.5 shows the results of the multiple mean comparison tests produced by BMDP7D on the key contrasts of interest.

Table 8.3: Comparison of ANOVA Results Among Sessions (Experiment 3, Additional Outliers Excluded)
Dependent Variable: Time Performance

Sources of         p-values               p-values      p-values
Variance           (Combined Sessions)    Session 1     Session 2

G: Graph Format    0.0**                  0.0**         0.0**
Q: Question Type   0.0000**               0.0000**      0.0001**
T: Time Period     0.0000**               0.0000**      0.0001**
D: Dataset         0.0000**               0.0000**      0.0001**
G*Q                0.0**                  0.0**         0.0000**
G*T                0.0376*                0.0550*       0.3180
Q*T                0.0060**               0.0201*       0.0194*
G*D                0.0000**               0.0001**      0.0002**
Q*D                0.7470                 0.5844        0.1933
T*D                0.0074**               0.1297        0.0044**
G*T*D              0.3323                 0.5191        0.2977
G*Q*T              0.3032                 0.4172        0.1423
G*Q*D              0.0206*                0.0384        0.2254
Q*T*D              0.0000**               0.0000**      0.0571
G*Q*T*D            0.0228*                0.0568        0.1604
*  Significant at p = 0.05 level
** Significant at p = 0.01 level

Table 8.4: Tables of Means for All Treatment Combinations (Experiment 3, Outliers Excluded)
Dependent Variable: Time Performance

[Cell means for sessions 1 and 2 by graph format (bars, symbols, lines) and question type (Q1, Q2, Q3) at each level of graphical information complexity.]

Treatment Combinations of Information Complexity

a:  2 Datasets with 7 Time Periods
b:  2 Datasets with 14 Time Periods
c:  3 Datasets with 7 Time Periods
d:  3 Datasets with 14 Time Periods

A study of figure 8.1 and table 8.5 reveals that most of the significant differences found among key contrasts were between two particular treatment combinations (the Bar-Q2 and Bar-Q3 combinations) and the other treatment combinations. Subjects using multiple bar graphs took significantly longer to extract the dataset to which a pair of consecutive datapoints with the largest level difference (Q2), or a range of successive datapoints with the longest uni-directional trend (Q3), belonged, when compared to using the other forms of multiple representations (i.e. multiple symbol and line graphs). No difference was found between using symbols and lines for performing either Q2 or Q3. With respect to Q1 (i.e.
identifying the dataset containing the single point that comes closest to a given scale-value), only lines were found to take significantly longer compared to symbols, consistent with the propositions advanced in chapter 3. In addition, bars as well as symbols were found to be faster to use for Q1 than for either Q2 or Q3. No significant differences were found between using bars and lines, or between using bars and symbols, for performing Q1.

Altogether, these results showed bars to take longest for performing Q2 and Q3 in this experiment (E3) as compared to the other formats. It is believed that two main reasons contributed to this. First, there was no reference in either question or answer to anchoring 'time period' information on the abscissa for the tasks tested. This lack of a strong x-axis anchoring appeared to make bars less suitable than the other graph formats (i.e. lines and symbols) for the task activities tested in E3, particularly Q2 and Q3, because for Q1 at least some anchoring information was provided on the ordinate (y-axis) (i.e. scale-values). Second, the use of only multiple dataset representations in this experiment also added to the disadvantage of using bars over the other forms of representations.

On the one hand, lines were generally faster to use for Q2 and Q3 than bars in this experiment because of their characteristic of point continuity, a characteristic which facilitated the extraction of difference patterns, trends, and other forms of Gestalt information (e.g. 'dataset' information). On the other hand, symbols were faster to use for Q2 and Q3 in this experiment than bars simply because not only are symbols easier to string together than bars, owing to their relatively greater connectedness (see chapter 3), but also, in multiple dataset representations, they look more like multiple line graphs than multiple bar charts.
As for Q1, symbols turned out to be a better format to use than lines (or bars)† simply because, in addition to having a reasonable dataset anchoring characteristic (see chapter 3), symbols also possessed moderately strong information anchoring characteristics to the axes framework. In other words, their special advantage over lines (as well as bars) lies in their effectively combining characteristics of both discreteness (as in bars) and apparent continuity (as in lines).

In summary, the findings were generally supportive of the notion that for occasional or first-time graph users, it is the task characteristics as well as the characteristics of the graphical representation being used which meaningfully determine time performance for the tasks investigated.

2. Significant Effects on Time for Session 2

For session 2, the following significant main effects and two-factor interactions were found:

1.  Graph Format effect
2.  Question Type effect
3.  Time Period effect
4.  Dataset effect
5.  Graph Format by Question Type interaction
6.  Question Type by Time Period interaction
7.  Graph Format by Dataset interaction
8.  Time Period by Dataset interaction

No higher-order interactions were found to be significant for time in this experiment.

† The mean difference showing symbols to be better than bars is not, however, statistically significant.

Figure 8.1: Plot and Mean Values of Graph Format x Question Type Interaction (Experiment 3, Session 1)
Dependent Variable: Time Performance

Question Type    Bars           Symbols       Lines

Q1                7.89 (*1)      5.26 (*4)    10.14 (*7)
Q2               16.30 (*2)      9.73 (*5)     7.65 (*8)
Q3               14.37 (*3)      9.94 (*6)     7.99 (*9)

* Numbering in cells corresponds to treatment combinations for the Dunn-Bonferroni method of multiple means comparison.

Table 8.5: Summary of Dunn-Bonferroni Results for Graph Format x Question Type Interaction (Experiment 3, Session 1)
Dependent Variable: Time Performance

1.  Significant Differences among Means at the α = 0.01 level:
    a.  (*1,*2); (*1,*3)
    b.  (*2,*5); (*2,*8)
    c.  (*3,*6); (*3,*9)
    d.  (*4,*5); (*4,*6); (*4,*7)

2.  No other key contrasts among means were found to be significant. Note that the above numbering of treatment combinations corresponds to those marked in the table of means given in figure 8.1.

a. Main Factor Effects on Time for Session 2

1.  Graph Format -- This factor was found to be highly significant (p < .01) for session 2. Bars generally took longer to process than lines or symbols. Average time taken for bars was approximately 9.5 seconds, compared to only 7 seconds for symbols and slightly over 6 seconds for lines. The difference in time between using symbols and lines was not statistically significant based on the Dunn-Bonferroni tests.

2.  Question Type -- This factor was highly significant (p < .01) for session 2. Subjects took longer to perform Q2 and Q3 as compared to Q1. Average time for Q2 and Q3 was about equal (8 seconds) whereas that for Q1 was about 6 seconds.

3.
Time Period -- The effect of this highly significant (p < .01) factor was consistent with expectation. As the number of time periods depicted along the abscissa of time-series graphics increased, the time for extracting data increased correspondingly. Average time for using 7-period graphics was close to 6.6 seconds, compared to 8.6 seconds for 14-period graphics.

4.  Dataset -- As expected, an increase was found in the latency of reaction times as the number of datasets depicted on a single plot increased. For graphs with 3 datasets, time performance averaged 8.8 seconds, but for graphs with only 2 datasets, 6.4 seconds.

b. Two-way Interactions on Time for Session 2

1.  Graph Format x Question Type -- This two-factor interaction was of central focus in the study. The analysis revealed almost identical results for this interaction in session 2 as those found in session 1. Figure 8.2 shows the plot and mean values for this interaction. The Dunn-Bonferroni results for the interaction are shown in table 8.6. Only key contrasts among means were evaluated.

    A study of figure 8.2 and table 8.6 revealed that, consistent with session 1 results, subjects using multiple bar graphs took significantly longer to perform Q2 and Q3 of this experiment compared to the other forms of multiple representation. For Q1, no differences between the
both Q1 of E1 and E2 have strong anchoring of information on both axes, but Q1 of this experiment only exhibited strong anchoring of information on the ordinate),† the use of more discrete representations such as bars and symbols would produce better time performance scores than the use of purely continuous representations such as lines.

In summary, the key finding for this 2-way interaction for this second session, as well as for the first session (which was discussed earlier), is that the degree of support provided by a particular graph format for a particular task is very much dependent on the matching characteristics of the graph format and the task at hand.  In short, bars and symbols were more suited to tasks characterized by strong anchoring of information on one or both of the major axes (e.g. Q1) than to those in which such an anchoring of information was unavailable (e.g. Q2 and Q3).  The characterization of tasks as well as graph formats based on the anchoring concept in chapter 3 was well supported.

† See tables 3.5 and 3.8 regarding classification of tasks based on the anchoring concept.

Figure 8.2: Plot and Mean values of Graph Format x Question Type Interaction (Experiment 3, Session 2)
Dependent Variable: Time Performance

  Question Type    Bars          Symbols      Lines

  Q1                6.51 (*1)     4.86 (*4)    6.42 (*7)
  Q2               11.13 (*2)     8.26 (*5)    5.80 (*8)
  Q3               10.71 (*3)     7.73 (*6)    6.80 (*9)

  * Numbering in cells corresponds to treatment combinations for the
    Dunn-Bonferroni method of multiple means comparison.

Table 8.6: Summary of Dunn-Bonferroni Results for Graph Format x Question Type Interaction (Experiment 3, Session 2)
Dependent Variable: Time Performance
a.  Significant Differences among Means at the α = 0.01 level:
    1)  (*1,*2); (*1,*3)
    2)  (*2,*8)
    3)  (*3,*9)
    4)  (*4,*5)

b.  Significant Differences among Means at the α = 0.05 level:
    1)  (*2,*5)
    2)  (*3,*6)
    3)  (*4,*6)

c.  No other contrasts of interest were found to be significant.  Note that the above numbering of treatment combinations corresponds to those marked in the table of means given in figure 8.2.

2.  Question Type x Time Period -- Unlike the disordinal† Graph Format x Question Type interaction, this two-factor interaction appeared to be strictly ordinal.‡  The plot and respective cell means for this interaction are shown in figure 8.3.  Results of the Dunn-Bonferroni tests on key contrasts are summarized in table 8.7.  Results of the Dunn-Bonferroni tests for session 2 indicated that: (a) the effect of increasing time periods (14 periods) on Q3 performance was highly significant (p < .01); (b) the effect of increasing time periods (14 periods) on Q2 performance was significant only at the α = 0.05 level; and (c) no significant adverse effect was found with increasing time periods on Q1.  In addition, only for 14-period graphics were Q2 as well as Q3 found to take significantly longer to perform than Q1.  No differences were found among tasks in the case of 7-period graphics.

Together with the previous findings on the significance of the time period effect, this result supported the notion that the effect of information complexity of graphics is stronger with more time periods.  In spite of this, the results also indicated that increasing the number of time periods had varying adverse effects on different task activities: that is, more adverse effects were found with increasing time periods on Q2 and Q3 performance than on Q1 performance.
Probably, this was due to the presence of a strong anchoring of information on the ordinate scale in the case of Q1, which provided a mechanism to quickly filter out the irrelevant information that came with more time periods.  In the absence of such an anchoring, as in the cases of Q2 and Q3, there was no such mechanism available to facilitate information processing.

† See figures 8.1 and 8.2.
‡ This term is explained in the Glossary.

Figure 8.3: Plot and Mean values of Question Type x Time Period Interaction (Experiment 3, Session 2)
Dependent Variable: Time Performance

  Question Type    7 Periods     14 Periods

  Q1                5.69 (*1)     6.17 (*2)
  Q2                7.31 (*3)     9.49 (*4)
  Q3                6.79 (*5)    10.03 (*6)

  * Numbering in cells corresponds to treatment combinations for the
    Dunn-Bonferroni method of multiple means comparison.

Table 8.7: Summary of Dunn-Bonferroni Results for Question Type x Time Period Interaction (Experiment 3, Session 2)
Dependent Variable: Time Performance

a.  Significant Differences among Means below the α = 0.01 level:
    1)  (*2,*4); (*2,*6)
    2)  (*5,*6)

b.  Significant Differences among Means at the α = 0.05 level:
    1)  (*3,*4)

c.  No other key contrasts among means were found to be significant.  Note that the above numbering of treatment combinations corresponds to those marked in the table of means given in figure 8.3.

3.  Graph Format x Dataset -- Like the Question Type x Time Period interaction, this was another ordinal interaction, as plotted in figure 8.4.  Figure 8.4 also shows the cell means for this two-factor interaction.  Table 8.8 shows a summary of the Dunn-Bonferroni tests performed on key contrasts.
Essentially, the ordinal interaction for this effect and the highly significant main effect for the dataset category indicated that even a minor increase in the number of datasets depicted (i.e. from two to three datasets) adversely affected task performance.

Specifically, the Dunn-Bonferroni tests for this session (table 8.8) indicated that increasing datasets significantly impaired task performance when either bars or symbols were used, but not when lines were used.  Moreover, subjects took more time with bars as compared to lines for 3 datasets, and with bars as compared to symbols for 2 datasets.

These findings were consistent with the characterization of bars based on the anchoring concept (chapter 3).  The underlying rationale is that an increasing number of datasets contributed a large number of bars that had to be processed together, whereas for lines, an increase in the level of the dataset variable from two (i.e. low) to three (i.e. high) only yielded an additional line.  In other words, dataset anchoring characteristics are low for bars but high for lines.  Consequently, this special strength of lines over bars was demonstrated more dramatically in this experiment than in the others, as only multiple dataset representations were evaluated here.

Figure 8.4: Plot and Mean values of Graph Format x Dataset Interaction (Experiment 3, Session 2)
Dependent Variable: Time Performance

  Graph Format     Bars          Symbols      Lines

  2 Datasets        7.80 (*1)     5.32 (*3)    6.01 (*5)
  3 Datasets       11.10 (*2)     8.58 (*4)    6.67 (*6)

  * Numbering in cells corresponds to treatment combinations for the
    Dunn-Bonferroni method of multiple means comparison.
Table 8.8: Summary of Dunn-Bonferroni Results for Graph Format x Dataset Interaction (Experiment 3, Session 2)
Dependent Variable: Time Performance

a.  Significant Differences among Means below the α = 0.01 level:
    1)  (*1,*2)
    2)  (*2,*6)
    3)  (*3,*4)

b.  Significant Differences among Means at the α = 0.01 level:
    1)  (*1,*3)

c.  No other key contrasts among means were found to be significant.  Note that the above numbering of treatment combinations corresponds to those marked in the table of means given in figure 8.4.

Moreover, the relatively greater adverse effect of an increasing number of datasets found on symbols as compared to lines (see figure 8.4 and table 8.8) provided additional evidence of the importance of matching characteristics of task to graph format.  Indeed, the relative strength of lines over bars and symbols in this experiment was evidenced by the stronger anchoring of dataset information found on lines as compared to the other formats.  It was also evidenced by the absence of time period anchoring for the task activities to be performed in this experiment.

In summary, the findings imply that some reservations should be placed on the unrestricted use of multiple bar representations in business reporting.  Note also that characteristics of the task at hand also determine how adversely an increasing number of datasets can affect the graph format chosen.

4.  Time Period x Dataset -- This significant two-factor interaction is plotted in figure 8.5 and was found to be strictly ordinal.  Table 8.9 provides the Dunn-Bonferroni test results on key contrasts for the interaction.
Analysis of this interaction revealed that increasing the number of datasets and time periods led to increasing time required for task performance.  In line with a priori hypotheses given in chapters 3 and 4, the evidence on hand, together with the highly significant main effects of the time period and dataset variables, clearly indicated that the information complexity of graphics increased with increasing levels of both the time period and dataset variables.

Figure 8.5: Plot and Mean values of Time Period x Dataset Interaction (Experiment 3, Session 2)
Dependent Variable: Time Performance

  Time Period      2 Datasets    3 Datasets

  7 Periods         4.89 (*1)     8.30 (*2)
  14 Periods        7.86 (*3)     9.26 (*4)

  * Numbering in cells corresponds to treatment combinations for the
    Dunn-Bonferroni method of multiple means comparison.

Table 8.9: Summary of Dunn-Bonferroni Results for Time Period x Dataset Interaction (Experiment 3, Session 2)
Dependent Variable: Time Performance

1.  Significant Differences among Means below the α = 0.05 level:
    a.  (*2,*1)
    b.  (*3,*1)

2.  No other key contrasts among means were found to be significant.  Note that the above numbering of treatment combinations corresponds to those marked in the table of means given in figure 8.5.

C. ACCURACY PERFORMANCE FOR COMBINED SESSIONS

The resulting correlations of the GEFT measure and corresponding accuracy scores for the combined (transformed) as well as the separate sessions (sessions 1 and 2) did not reveal any significant relationships.  This was further substantiated by the non-significant GEFT effects when the GEFT variable was included as a covariate in the statistical models for analyzing both the session and the combined datasets† (see table 8.10).
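The Dunn-Bonferroni procedure used for all of the contrast tables above controls the familywise error rate by testing each of k planned contrasts at the adjusted level α/k.  A minimal sketch of that adjustment follows; the function name and the p-values used below are hypothetical, chosen only to illustrate the arithmetic, and are not values from these experiments:

```python
def dunn_bonferroni(p_values, family_alpha=0.05):
    """Flag which planned contrasts remain significant when the
    familywise error rate is held at family_alpha: each of the k
    contrasts is tested at the adjusted level family_alpha / k."""
    k = len(p_values)
    per_contrast_alpha = family_alpha / k
    return {name: p <= per_contrast_alpha for name, p in p_values.items()}

# With 3 planned contrasts and a 0.05 familywise rate, each contrast
# is tested at 0.05 / 3, i.e. roughly 0.0167 (contrast names and
# p-values below are illustrative only).
flags = dunn_bonferroni({"(*1,*2)": 0.001, "(*2,*5)": 0.009, "(*4,*7)": 0.030})
```

Under this rule the first two hypothetical contrasts would be flagged significant and the third would not, even though 0.030 falls below the unadjusted 0.05 criterion.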
For the transformed dataset, the significant effects found with respect to accuracy for this experiment (E3) included:

1.  Question Type effect
2.  Time Period effect
3.  Question Type by Time Period effect
4.  Question Type by Dataset effect
5.  Time Period by Dataset effect

As usual, this discussion focusses on those key contrasts that are of interest to this research, particularly those effects found to be significant at the nominal level when the Dunn-Bonferroni method of multiple comparison was used.

† The method of transformation used for combining the datasets captured for accuracy performance has been described in earlier chapters.

Table 8.10: Comparison of ANCOVA Results for Sessions 1, 2, and the Transformed Dataset (Experiment 3, Outliers Excluded)
Dependent Variable: Accuracy Performance

  Sources of          p-values              p-values       p-values
  Variance            (Combined Sessions)   (Session 1)    (Session 2)

  GEFT                0.8514                0.3064         0.8250
  G: Graph Format     0.0627                0.0615         0.2787
  Q: Question Type    0.0034**              0.0060**       0.0570
  T: Time Period      0.0051**              0.0185*        0.0483*
  D: Dataset          0.3737                0.1142         0.2980
  G*Q                 0.1594                0.0717         0.3393
  G*T                 0.5709                0.3456         0.8652
  G*D                 0.5165                0.4804         0.3039
  Q*D                 0.0084**              0.1947         0.0208*
  Q*T                 0.0000**              0.0000**       0.0000**
  T*D                 0.0034**              0.2387         0.0037**
  G*T*D               0.1083                0.8717         0.0456*
  G*Q*T               0.3808                0.5479         0.2941
  G*Q*D               0.2287                0.1716         0.2349
  Q*T*D               0.5956                0.3200         0.9163
  G*Q*T*D             0.1753                0.7063         0.1692

  *  Significant at p = 0.05 level
  ** Significant at p = 0.01 level

1. Main Effects on Accuracy for Transformed Data

1.  Question Type -- Analysis of the data showed that significantly higher percentages of correct responses were obtained for performing Q2 (95%) as compared to performing Q3 (87%).
No other key contrasts among means were found to be significant at the α = 0.05 criterion as revealed by the Dunn-Bonferroni tests.  Note that the percentage of correct responses found for Q1 performance was 91%.

2.  Time Period -- More time periods (14 periods) was found to affect accuracy adversely.  Hence, the percentage of correct responses was 94% for 7 periods but dropped to 88% for 14 periods.  This finding is consistent with the results on time performance in that more time periods led to longer processing time.

2. Two-way Interactions on Accuracy for Transformed Data

1.  Question Type x Time Period -- Table 8.11 shows the mean percentages of correct responses for the various treatment combinations of this two-factor interaction and the Dunn-Bonferroni results among those key contrasts of interest that were found to be significant below or at the α = 0.05 criterion level.

Table 8.11 reveals that a significantly adverse effect on accuracy was found, for task activities performed under Q3, with an increasing number of time periods (14 periods), but not for those performed under Q1 or Q2.†  The other significant key contrasts were the higher accuracy found with Q1 compared to Q3 and with Q2 compared to Q3 in the case of 14 periods.  No significant differences were found among tasks in the case of 7-period graphics.

† More time periods also affected Q2 performance adversely, but this effect was not statistically significant as revealed by the Dunn-Bonferroni tests.

Table 8.11: Dunn-Bonferroni Test Results and Mean Value Table for Question Type x Time Period Interaction of Transformed Data (Experiment 3, Outliers Excluded)
Dependent Variable: Accuracy Performance

  Question Type    7 Periods     14 Periods

  Q1               86% (*1)      96% (*2)
  Q2               97% (*3)      92% (*4)
  Q3               98% (*5)      77% (*6)

a.  Significant Differences among Means below the α = 0.01 level:
    1)  (*2,*6)
    2)  (*5,*6)

b.  Significant Differences among Means at the α = 0.01 level:
    1)  (*4,*6)

c.  No other contrasts of interest among means were found to be significant.  Note that the above numbering of treatment combinations corresponds to those marked in the table of means above.

These results were consistent with the findings on time performance for this experiment: that is, the adverse effect of more time periods on Q3 was greater than that on Q2.  Q1 in this experiment was not adversely affected by an increasing number of time periods, whether for time or for accuracy.  Moreover, differences among the various tasks were only significant for 14 periods, with Q3 standing out as the most difficult task to perform.

2.  Question Type x Dataset -- Table 8.12 shows the mean percentages of correct responses for the various treatment combinations of this two-factor interaction.  No summary of the Dunn-Bonferroni results is given, as no major contrasts among means were found to be significant below or at the α = 0.05 criterion level as revealed by the BMDP7D software.

3.  Time Period x Dataset -- Table 8.13 shows the cell means on percentages of correct responses for this two-factor interaction and the results of the Dunn-Bonferroni tests.  The only significant contrast of interest was the lower accuracy found with 14 periods as compared to 7 periods in the case of graphics with only 2 datasets.  This was consistent with the idea that the complexity of graphics increased with increasing time periods.  Perhaps the lack of significant differences between using graphics with 2 datasets versus those with 3 datasets for accuracy indicated that subjects had undergone sufficient training on multi-dataset displays to use them accurately.
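The marginal percentages quoted above can be cross-checked against the cell means in table 8.11.  A small helper for unweighted row and column marginals is sketched below; the function and data encoding are mine, and because the thesis's reported marginals come from the full trial-level data, equal-weight cell averages only approximate reported values such as the 95% for Q2:

```python
def marginal_means(cells):
    """Unweighted row and column marginal means from a dict mapping
    (row_level, col_level) -> cell mean."""
    rows, cols = {}, {}
    for (r, c), v in cells.items():
        rows.setdefault(r, []).append(v)
        cols.setdefault(c, []).append(v)
    mean = lambda xs: sum(xs) / len(xs)
    return ({r: mean(v) for r, v in rows.items()},
            {c: mean(v) for c, v in cols.items()})

# Cell means (percent correct) from table 8.11.
table_8_11 = {("Q1", "7p"): 86, ("Q1", "14p"): 96,
              ("Q2", "7p"): 97, ("Q2", "14p"): 92,
              ("Q3", "7p"): 98, ("Q3", "14p"): 77}
by_question, by_period = marginal_means(table_8_11)
# by_question["Q1"] is 91.0, matching the reported 91% for Q1, and the
# period marginals fall near the reported 94% (7 periods) and 88% (14).
```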
Table 8.12: Mean Value Table for Question Type x Dataset Interaction of Transformed Data (Experiment 3, Outliers Excluded)
Dependent Variable: Accuracy Performance

  Question Type    Q1       Q2       Q3

  2 Datasets       96%      93%      85%
  3 Datasets       87%      96%      89%

Table 8.13: Dunn-Bonferroni Test Results and Mean Value Table for Time Period x Dataset Interaction of Transformed Data (Experiment 3, Outliers Excluded)
Dependent Variable: Accuracy Performance

  Time Period      2 Datasets    3 Datasets

  7 Periods        97% (*1)      91% (*2)
  14 Periods       85% (*3)      91% (*4)

  * Numbering in cells corresponds to treatment combinations for the
    Dunn-Bonferroni method of multiple means comparison.

a.  Significant Differences among Means below the α = 0.01 level:
    1)  (*1,*3)

b.  No other key contrasts among means were found to be significant.  Note that the above numbering of treatment combinations corresponds to those marked in the table of means given above.

D. SUMMARY OF EXPERIMENT 3 FINDINGS

Strong evidence showing the need to match characteristics of graph formats to those of the tasks to be supported underlies the results of the Graph Format by Question Type interaction for sessions 1 and 2 of this experiment (E3).  Indeed, the highly significant interaction effect of graph format by question type for time, as depicted in figure 8.2, indicated that both symbols and bars took longer for Q2 and Q3, as these tasks had low anchoring of information on the (x,y) axes framework.  Hence, significantly longer time was taken to use bars than either symbols or lines for performing Q2 or Q3.  For Q1, as there was a strong anchoring of information on the y-axis, no significant differences were found among the various formats, although symbols appeared to have a slight advantage for Q1 over bars and lines.
All of these results were in strong support of a priori hypotheses stated on the basis of the anchoring concept discussed in chapters 3 and 4.

Even with an increase of the dataset variable from 2 to 3 datasets, time performance with the use of bars was significantly and adversely affected (figure 8.4).  No significant adverse effect was found with singular versus multiple lines.  The adverse effect of increasing datasets on symbols was of a lesser degree than that found on bars.

In addition, strong evidence was produced to support the characterization of graph format and task based on the anchoring concept presented in chapter 3: bars may be characterized as having moderately strong x- and y-axis anchoring but weak dataset anchoring; lines may be characterized as having strong dataset anchoring but weak anchoring on the two major axes; and symbols may be characterized as having moderate anchoring on all graphical components (i.e. x-axis, y-axis, and dataset).  Accordingly, results of this experiment showed that the use of bars should be strongly questioned in multiple dataset representations as well as for task activities where there is an absence of strong anchoring of information on the axes.  Indeed, multiple bar representations should generally be avoided.  But the decision to use multiple lines versus multiple symbols would largely depend on the nature of the task activities to be performed: that is, in the presence of a strong anchoring of information on the axes (x-axis, y-axis, or both), multiple symbols should be used, whereas the absence of such anchoring information should argue for the use of multiple lines (e.g. E3).

Finally, different characteristics of the tasks investigated in this experiment were also found to affect different factors of information complexity.
For example, while performance of Q1 was not generally found to be adversely influenced by either an increasing number of time periods or datasets in this experiment, performance of Q2 and Q3 was found to be gravely influenced by more time periods, but not by more datasets.  In fact, there was some indication that subjects learned to use 3-dataset representations as accurately as 2-dataset representations.

Inevitably, understanding how the characteristics of various graph formats can best be matched to the characteristics of various tasks will help to unveil the relative merits and demerits of using various forms of graphs for performing various tasks.  Task differences are also useful in providing insights as to why certain results are found in certain experiments but not others.  The next chapter discusses these important issues in greater depth and also attempts to integrate the results of all three experiments with the current literature.

IX. INTEGRATION OF RESULTS

Results of experiments E1, E2, and E3 were presented respectively in chapters 6, 7, and 8.  This chapter attempts to combine the time performance results by, first, drawing general conclusions on the various graph formats studied based on key findings observed in the individual experiments, and second, integrating these and other findings with the current graphics literature.†

A. OVERVIEW OF KEY FINDINGS

The Graph Format by Question Type interaction effect was of central focus in this research, as it provided information regarding the relative strengths and weaknesses of various graph formats for performing different task activities under different experimental conditions.  Hence, the significance of this particular interaction effect on time was discussed for experimental sessions 1 and 2 in previous chapters.
As noted, the difference between interpreting results for sessions 1 and 2 lies in the implication that session 1 findings are more applicable to occasional or first-time graph users, whereas session 2 findings apply to experienced or frequent graph users.  Accordingly, this interaction effect will form the basis for discussing the relative strengths and weaknesses of the various graph formats in terms of session 1 results across the experiments.  As for session 2 results, the different graph formats will be evaluated based on several criteria, including: Graph Format as a main factor effect; the Graph Format by Question Type interaction effect; and the Graph Format by Dataset interaction effect.‡

† Since a mega-analysis of the total raw dataset (i.e. E1, E2, and E3 data) suggests significant differences between experiments on the "Experiment" factor and its interaction with other major factors, detailed statistical comparisons among effects found between experiments are not drawn.  Instead, comparisons are drawn at a more general level and the discussion focuses on how results fit into the existing literature.
‡ Examination of detailed experimental results for E1, E2, and E3 revealed no other graph format related interaction effect to be statistically significant for session 2 on time, with the exception of a significant 3-way Graph Format by Question Type by Time Period interaction found in that session for E2 based on standard ANOVA procedures.  This exception was, however, ignored because further statistical analysis using the Dunn-Bonferroni method of multiple mean comparisons on planned contrasts of key interest among cell means (table 7.8) for this interaction did not yield any significant differences among the contrasts considered.

INTEGRATION OF RESULTS / 224

1. Effects for Time Performance in Session 1

Table 9.1 provides an overview of the relative strengths and weaknesses found for the various formats based on time results for session 1 across the experiments.
First, the results of experiment E1 provided strong evidence that occasional or first-time graph users found bars as well as symbols to be relatively faster to use than lines for performing the task activities associated with Q1 (i.e. tasks which required both high x-axis and y-axis anchoring: Group I tasks).†  In addition, lines were faster to use for task activities associated with either Q2 or Q3 (i.e. tasks which required both high x-axis and dataset anchoring: Group II tasks) than with Q1.

Conversely, E3 results indicated that bars took longer to use than either symbols or lines for task activities associated with either Q2 or Q3 (i.e. tasks which required high dataset but low x-axis anchoring: Group IV tasks).‡  In addition, symbols were faster to use than lines for task activities associated with Q1 (i.e. tasks which required both high y-axis and dataset anchoring: Group III tasks).  Moreover, for both bars and symbols, response time for Q1 was shorter than for either Q2 or Q3.

Finally, E2 results revealed that bars took significantly longer for performing Q2 (i.e. tasks which required both high x-axis and dataset anchoring: Group II tasks) compared to Q1.*

Overall, these results indicated that the relative strengths and weaknesses of the various formats depended critically on the type and nature of the information anchoring characteristics of the tasks to be performed.  For example, one of the major strengths†† of bars lies in their strong x-axis anchoring characteristic.  But this characteristic is good only for performing those tasks with a similar characteristic.  Hence, bars are generally good for extracting single data elements (e.g. Q1 in E1 and E2).

† See tables 3.8 and 3.9 as well as figure 6.1 and table 6.5.
‡ See tables 3.8 and 3.9 as well as figure 8.1 and table 8.5.
* See figure 7.1 and table 7.5.
†† Actually, this is a relative notion depending on the type of task: that is, what may be considered a "strength" (e.g. strong x-axis anchoring) relative to one task (e.g. a Group I task) may conversely be considered a "weakness" relative to a different task (e.g. a Group IV task).

In contrast, the key characteristic of lines that differs from the other formats is their strong dataset anchoring.  They also have weak anchoring on the axes (see chapter 3).  Consequently, E1 results provided strong evidence of the difficulty of disembedding points on lines.  Hence, lines should never be recommended for identifying or locating point values (cf. chapters 3 and 6).  However, results indicated that lines are generally best for detecting trends and patterns.

As for symbols, they appear to provide an alternative to either bars or lines, especially in situations where the task has only a partial anchoring on some major axis but requires a moderately high anchoring on the dataset component (e.g. Q1 in E3).  In these task settings, the disadvantage of bars due to their low dataset anchoring, as well as the disadvantage of lines due to their low axes anchoring, leaves symbols as the best choice.

In summary, the relative strengths and weaknesses of the various graph formats for various tasks applicable to first-time graph users are well characterized by the concept of information anchoring on the major components of time-series graphics (chapter 3).  Further progress in knowledge about the merits and demerits of various graph formats will likely come from an active theory-based research program evaluating a variety of tasks and graph formats.  The present discussion provides a stepping stone to accumulating knowledge regarding the use of bars, symbols, and lines among novice graph readers.  The next section discusses findings that are more applicable to expert graph readers.
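As a compact paraphrase of these session 1 findings, the anchoring-based task groups can be folded into a toy lookup rule.  The sketch below is my illustrative condensation of the table 9.1 conclusions, not a procedure proposed in the thesis; the function name and boolean encoding of anchoring requirements are assumptions:

```python
def recommend_format(x_anchor, y_anchor, dataset_anchor):
    """Map a task's information-anchoring requirements (booleans) to
    the time-series graph format the session-1 results favoured."""
    if x_anchor and y_anchor:
        return "bars or symbols"   # Group I tasks, e.g. Q1 in E1/E2
    if x_anchor and dataset_anchor:
        return "lines"             # Group II tasks, e.g. Q2/Q3 in E1
    if y_anchor and dataset_anchor:
        return "symbols"           # Group III tasks, e.g. Q1 in E3
    if dataset_anchor:
        return "lines"             # Group IV tasks, e.g. Q2/Q3 in E3
    return "no clear recommendation"
```

The rule is deliberately coarse: it encodes only the directional findings for novice users, not the magnitudes of the time differences.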
Table 9.1: Overview of Key Findings for Experiments E1, E2, and E3 (Session 1 Results)
Dependent Variable: Time Performance

  Session 1 Results: Time Performance

                  Bars                      Symbols                   Lines

  Experiment E1   Q1: Bars better           Q1: Symbols are           Q1: Lines worst
                  than Lines                better than Lines         among all formats

                                                                      Lines: Q2 & Q3 easier
                                                                      to answer than Q1

  Experiment E2   Bars: Q1 faster to        No significant            No significant
                  answer than Q2           contrasts                 contrasts

  Experiment E3   Bars: Q1 faster to        Symbols: Q2, Q3 took      Q1: Lines are worse
                  answer than Q2 & Q3       longer to answer than Q1  than Symbols

                  Q2, Q3: Bars are worst    Q1: Symbols are           Q2, Q3: Lines are
                  of the graph formats      better than Lines         better than Bars

                                            Q2, Q3: Symbols are
                                            better than Bars

2. Effects for Time in Session 2

The objective of this section is not to give another review of the results discussed in the preceding chapters (i.e. chapters 6, 7, and 8), but to focus on Graph Format related findings which may be used to address the research problem (chapter 1).  As noted earlier, only main factor and two-factor effects with respect to graph format and its interaction with other factors are discussed, as no higher-order interactions related to graph format were found to yield significant contrasts of interest for this session (i.e. session 2).

a. Main Factor Effect

1.  Bars -- Table 9.2 provides an overview of findings on the Graph Format effect found across the experiments when applied separately to bars, symbols, and lines.  It is apparent that for E1 and E2 tasks, bars were no better or worse than the other formats, but they were inferior for the task activities investigated in E3.  Accordingly, it was observed that E1 and E2 tasks differed from E3 tasks in that strong anchoring of abscissa information was available for tasks associated with experiments E1 and E2 but not with experiment E3 (chapter 3).  Moreover, tasks tested in E3
Accordingly, it was observed that E1 and E2 tasks differed from E3 tasks in that strong anchoring of abscissa information was available for tasks associated with experiments E1 and E2 but not with experiment E3 (chapter 3).  Moreover, tasks tested in E3  also employed only multiple dataset representations (cf. appendices B, C, and D).  It appears,  thus, that bars are expected to become difficult to use when they are not anchored strongly to the abscissa as in E3 tasks.  Moreover, they become confusing to read when they are used in  multiple dataset representations. 2.  Symbols - Table 9.2 shows that symbols were more quickly processed than bars for tasks evaluated in experiment E3 but not in E1 and E2. Because strong anchoring were provided on the abscissa for all E1 and E2 tasks, bars were not inferior to symbols when time performance scores were averaged across all tasks examined in these experiments.  For experiment E3 tasks,  however, both the absence of such a strong abscissa anchoring and the increase in the number of datasets, placed bars at a disadvantage when compared to symbols. 3.  Lines -  The same general comments regarding symbols compared to bars apply to lines  compared to bars (table 9.2).  INTEGRATION  OF RESULTS / 228  Table 9.2: Overview of Main Factor Effect on Graph Format Characteristics for Experiments E l , E2, and E3 (Session 2 Results) Dependent Variable: Time Performance  Session Graph Formats  2  : Main  Graph  Ba r s  Format  Effect  Symbols  on  Time  Lines  Experiments  -Exper iment E1  No  significant contrasts  No  significant contrasts  Expe r imen t E2  No  significant contrasts  No  significant contrasts  Exper iment E3  Bars are the worst format  Symbols better than Bars  No  No  significant contrasts  significant contrasts  Lines than  better Bars  INTEGRATION fa. 2-way 1.  
OF RESULTS / 229  Interactions  BARS -- Evidence provided by the significant Graph Format by Question Type interaction on the relative strengths and weaknesses of various graph formats for performing various task activities in session 2 of E1 and E3 was generally consistent with that provided by session 1 of these experiments.t The Graph Format by Question Type effect was not significant for session 2 of E2. Table 9.3 provides an overview of findings from this two-factor interaction as applied to the different graph formats evaluated.  Analysis of E1 data indicated that bars were faster than lines for isolating single points (i.e.  Q1)  although this particular contrast did not reach statistical significance at a = 0.05 level for session 2 (see figure 6.2 and table 6.6).  This might probably be attributed to learning,  in contrast,  analysis of E3 results revealed strongly that bars took longest for answering Q2 and Q3 as compared to either lines or symbols.  However, the same expected disadvantage of bars over  other formats for answering Q2 and Q3 in E2 was not supported by statistically significant contrasts.  This was probably due to the presence of strong information anchoring on the  abscissa provided by E2 task setting.  The principal finding, therefore, is that bars should be used for performing tasks with a strong anchoring of time period information on the abscissa (i.e. Group I and II tasks -- see tables 3.8 and 3.9).  This is because every bar on a bar graph has an anchoring base on the x-axis  component.  Consequently, the absence of a strong anchoring of information on the axes  framework caused bars to become an inappropriate format (i.e Group IV tasks - see tables 3.8 and 3.9).  In addition, results of the significant Graph Format by Dataset interaction across the  experiments strongly indicated that increasing the number of datasets depicted had a greater adverse effect on bars than on either symbols or lines (e.g. cf. 
detailed findings on the significant Graph Format by Dataset interactions found for all experiments).† In fact, when only multiple dataset representations were used,‡ bars resulted in higher time performance when compared to multiple lines (or multiple symbols). Because multiple bars, unlike multiple lines or symbols, have elements belonging to the same dataset depicted as isolated bars (i.e. one bar separated from the other), more effort is therefore required to string together individual bars belonging to the same dataset than to link together symbols or points on lines belonging to the same dataset.

† Refer to earlier discussion on session 1 results as applied to bars.
‡ E.g., graph stimuli presented in E3.

It is interesting to note that although the literature argues that bars have discrete and isolated characteristics which make them suitable for extracting scale-values of single points and unsuitable for extracting trends of many points (see Pinker, 1981, 1983), there has been no claim for the advantage of bars due to their strong anchoring to the abscissa. Moreover, warnings against the use of multiple bars are also only beginning to appear in the graphics empirical literature. For example, Cleveland (1984) argued that it is harder to compare fractional graph area data on a divided bar chart than on a grouped dot chart, because all comparisons on a grouped dot chart can be done by position judgement rather than the less accurate length-area judgement required on the divided bar chart (Cleveland & McGill, 1984).

2. SYMBOLS -- The significant Graph Format by Question Type interaction found in E1 indicated that symbols were faster to use than lines for answering Q1 (i.e. isolating single elements from a dataset). No significant difference was found between symbols and the other formats for E2 tasks. However, E3 results showed that both Q2 and Q3 took longer to process with symbols than Q1.

Results of the significant Graph Format by Dataset interaction found in E1 revealed that multiple symbols were faster to read than multiple bars. However, multiple symbol graphs required more time to process than single symbol graphs in the case of E2.

Table 9.3: Overview of Two-Factor Interaction Effects on Graph Format Characteristics for Experiments E1, E2, and E3 (Session 2 Results)
Dependent Variable: Time Performance

Experiment E1
  Bars:    1 dataset faster than 3 datasets; 3 datasets: Bars worse than Symbols
  Symbols: Q1: Symbols are better than Lines
  Lines:   Q1: Lines worse than Symbols; 1 dataset faster than 3 datasets

Experiment E2
  Bars:    1 dataset faster than 3 datasets
  Symbols: 1 dataset faster than 3 datasets
  Lines:   1 dataset faster than 3 datasets

Experiment E3
  Bars:    Q1 easier than Q2 & Q3; Q2, Q3: Bars are worst among all graph formats; 2 datasets faster than 3 datasets; 2 datasets: Bars worse than Symbols; 3 datasets: Bars worse than Lines
  Symbols: Q1 is easier than Q2 & Q3; Q2, Q3: Symbols are better than Bars; 2 datasets faster than 3 datasets; 2 datasets: Symbols better than Bars
  Lines:   Q2, Q3: Lines are better than Bars; 3 datasets: Lines better than Bars

As well, graphs with three datasets led to more time than those with only two datasets in experiment E3. In E3, results also revealed that among the two-dataset plots, symbol charts were faster to use than bar charts.
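The contrasts summarized above amount to comparisons of mean response times across graph-format-by-dataset cells. The sketch below is only a toy illustration of that kind of comparison; all timing numbers are invented and merely mimic the direction of the reported E3 result (symbols faster than bars on two-dataset plots).

```python
# Toy illustration of a cell-mean contrast of the kind reported in table 9.3.
# All response times (seconds) are hypothetical, not data from the experiments.
from statistics import mean

# Hypothetical per-trial times keyed by (graph_format, number_of_datasets).
times = {
    ("bars", 2):    [9.1, 8.7, 9.4, 8.9],
    ("symbols", 2): [7.2, 7.6, 7.0, 7.4],
    ("bars", 3):    [11.8, 12.1, 11.5, 12.0],
    ("symbols", 3): [8.9, 9.2, 8.8, 9.0],
}

def cell_means(data):
    """Average response time for every format-by-dataset cell."""
    return {cell: mean(vals) for cell, vals in data.items()}

def faster_format(data, n_datasets):
    """Return the format with the lower mean time at a given dataset level."""
    cells = {fmt: mean(vals) for (fmt, n), vals in data.items() if n == n_datasets}
    return min(cells, key=cells.get)

print(faster_format(times, 2))  # with these toy numbers: symbols
```

A full analysis would of course use the ANOVA/ANCOVA procedures described in chapters 6 through 8 rather than raw cell means; the sketch only shows the direction of the comparison.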
In summary, these results indicated that the advantages of symbols relate to their combining of characteristics belonging to both bars and lines. On the one hand, symbols, like bars, have the advantage over lines of being more strongly anchored to the axes components of time-series graphics. That was why lines took longer than symbols for answering Q1 in E1 (i.e. Group I tasks). On the other hand, symbols, like lines, have the advantage over bars of being more strongly anchored to the dataset component of time-series graphics representations. Accordingly, when multiple dataset representations are to be used, symbols are recommended over bars. However, the issue of multiple symbols versus multiple lines is one which has to be resolved based on the characteristics of the tasks to be performed.

Actually, symbols are best in situations where the use of either bars or lines becomes inadequate because the nature of the task requires both high axes and high dataset anchoring characteristics (i.e. Group III tasks in table 3.8). Thus, symbols yielded quicker responses when Q1 in E3 was to be answered as compared to either Q2 or Q3 of that experiment.

The implication of these findings is that symbols always offer an alternative to bars or lines, and a graphics designer should decide on the best representation only after careful consideration. In fact, Cleveland (1984) advised the replacement of bars with the use of dots (or symbols). Particularly in tasks where both the 'whole' and the 'parts' are to be extracted, such as the need to compare, first, exact values, and, second, the need to identify to which dataset these values belong, the use of symbols would be justifiable. In other words, symbols should be considered for tasks requiring only partial anchoring on the axes (i.e. Group II and III tasks in table 3.8).

3. LINES -- Results of E1 strongly indicated that disembedding a point from a line (i.e.
Q1) was an effortful and time-consuming task (chapter 6). For example, Q1 took significantly longer to answer than Q2 or Q3 when lines were used. Similarly, line graphs required significantly longer processing time than symbol charts for answering Q1 in E1. In contrast, E3 results provided evidence to show that lines were faster to use than bars for answering both Q2 and Q3.

Put together, these results suggest that the general "weakness" of lines for performing Q1 of E1 was due to their having the characteristics of being continuous as well as being completely disjointed from the axes framework (i.e. the abscissa and ordinate components). However, this so-called "weakness" associated with lines for isolating single points should be seen as only one side of the coin. Indeed, the same characteristics of continuity (i.e. a strong dataset anchoring characteristic) and separation from the axes (i.e. both a weak x-axis and y-axis anchoring characteristic) apparently became a distinguishable "strength" of lines relative to bars for performing Q2 and Q3 of E3, as observed in the above discussion.

The Graph Format by Dataset interaction results indicated that, apart from the influence of task characteristics, lines generally have an advantage over bars because they are read as a 'whole' (i.e. they have a strong dataset anchoring characteristic), and so several datasets may be represented as several lines on a single plot. Results, therefore, revealed that although multiple lines were found to adversely affect task performance in the E1 and E2 task settings, no adverse effect was found with an increasing number of datasets on lines in the E3 setting (table 9.3); that is, increasing the number of datasets had negligible adverse effect on lines, but not on bars or symbols. Lines were found to be superior to bars for extracting both level difference information (Q2) and trend information (Q3) in E3.
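The anchoring-based guidelines discussed above can be condensed into a simple decision rule. The function below is only an illustrative summary of those findings; the anchoring labels and the group annotations (after table 3.8) are informal paraphrases, not part of the original taxonomy.

```python
# Schematic decision rule distilled from the anchoring discussion; an
# illustrative summary of the findings, not a validated selection procedure.

def recommend_format(axes_anchoring: str, dataset_anchoring: str) -> str:
    """Suggest a time-series graph format from a task's anchoring profile.

    axes_anchoring:    "strong", "partial", or "weak" anchoring on the axes.
    dataset_anchoring: "high" or "low" need to trace whole datasets.
    """
    if axes_anchoring == "strong" and dataset_anchoring == "low":
        return "bars"     # Group I/II tasks: strong abscissa anchoring
    if axes_anchoring == "partial":
        return "symbols"  # Group II/III tasks: mixed requirements
    if axes_anchoring == "weak" and dataset_anchoring == "high":
        return "lines"    # Group IV tasks: level-difference and trend reading
    return "symbols"      # the middle-ground alternative noted in the text

print(recommend_format("weak", "high"))  # -> lines
```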
Of course, task characteristics also play a part in determining the graph format appropriate for representing multiple datasets. Based on the accumulated evidence, the choice between using multiple line versus multiple symbol representations appears to depend on whether the main characteristic of the task at hand involves a higher information anchoring on the axes framework (e.g. E1 and E2 task settings) or on the dataset component (e.g. E3 task setting).

These findings are consistent with the graphics theory literature, which argues that, on the basis of the Gestalt principle, datapoints on a line appear to be seen together and are difficult to isolate. For E3 tasks, increasing the number of datasets on a line graph appears to have little or no adverse effect on task performance as compared to doing so on a bar chart or a symbol plot. Essentially, this is because each additional dataset depicted on a line graph results only in one additional line to be plotted, whereas on a bar or symbol chart it results in many separated bars or isolated symbols. Moreover, since each line is more or less seen as a whole, processing several datasets on a line graph involves less effort than on a multiple bar or symbol chart.

The implication of these findings for graphics designers is to use multiple line graphs as opposed to multiple bar charts for tasks where no strong anchoring of information is provided on the axes framework (i.e. Group IV tasks in table 3.8). Note that, by definition, the absence of strong anchoring of information on both axes inevitably ruled out the possibility of exact questions or tasks dealing only with a single point.

B. INTEGRATION OF FINDINGS WITH THE CURRENT LITERATURE

To generalize these findings, we now attempt to integrate the experimental results with the current graphics literature. Only results found to be both common across all experiments and of particular interest will be discussed.
This discussion emphasizes the time performance results found in session 2 of the experiments.†

† Accuracy results, as well as session 1 results on time, have been of marginal interest throughout this research.

1. Learning

First, a highly significant (p < 0.01) Session effect, which was interpreted as indicative of 'Learning',† was found to permeate all the experimental trials. Consistent with session 1 results, results of session 2 show the same relative effect of a selected treatment combination yielding a higher or lower time when compared to another selected treatment combination. Over sessions, there was evidence of an across-the-board improvement in time performance. In short, as found in recent graphics experimentation (e.g. DeSanctis & Jarvenpaa, 1985), subjects of the present experiments adjusted rapidly to the different graph formats over the repeated experimental sessions. Learning with respect to accuracy performance was also found when mean scores of the two separate experimental sessions were compared.*

Accordingly, it is argued that more attention should be paid to the importance of the 'learning' effect in MIS graphics research (see Lusk & Kersnick, 1979; DeSanctis & Jarvenpaa, 1985) and that future researchers should attempt to manipulate and/or adequately control effects due to the knowledge and experience of subjects in terms of their familiarity with the graphics stimuli (see Pinker, 1981, 1983).

As already noted, the use of a repeated time measures design is appropriate for examining learning effects over time. Moreover, it is important to choose the right subjects, such as frequent users of the graphics representations being investigated, as opposed to first-time or occasional users. In this research, all subjects were asked to undergo a 'practice' session as well as a preliminary 'actual' session to ensure adequate exposure to the experimental graphics stimuli on their part.
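The session-over-session learning check described above reduces to comparing matched per-subject scores across sessions. The sketch below uses invented numbers; the arcsine-square-root transform is shown only as one standard variance-stabilizing option for proportions built from 0/1 accuracy data, since the exact transform used in the analyses is not restated here.

```python
# Minimal sketch of two analysis details discussed above: (1) comparing matched
# session means to gauge learning, and (2) a variance-stabilizing transform for
# binary-accuracy proportions. All data below are hypothetical.
import math

# Hypothetical mean response times (seconds) per subject, sessions 1 and 2.
session1 = [14.2, 13.8, 15.1, 12.9, 14.6]
session2 = [11.0, 10.9, 12.2, 10.1, 11.5]

# (1) Per-subject improvement; a positive mean difference suggests learning.
diffs = [s1 - s2 for s1, s2 in zip(session1, session2)]
mean_improvement = sum(diffs) / len(diffs)

# (2) Arcsine-square-root transform, a classic choice for proportion data
# derived from 0/1 scores (one option; the thesis used transformed data but
# the specific transform is an assumption here).
def arcsine_sqrt(p: float) -> float:
    return math.asin(math.sqrt(p))

accuracy_proportions = [0.75, 0.90, 1.00, 0.80]
transformed = [arcsine_sqrt(p) for p in accuracy_proportions]

print(round(mean_improvement, 2))  # -> 2.98 with these toy numbers
```

In the actual studies, such matched differences would feed a repeated-measures ANOVA rather than a simple mean; the sketch only makes the bookkeeping concrete.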
Finally, the choice of graphics material presented should abide by the standards established for it, so as to avoid possible confounding of factors due to 'poor' or 'illusive' design (see Wainer, 1984). Graphics designed in this research were pilot tested to avoid possible violations of the pertinent principles laid out in Kosslyn et al. (1983) and other sources (see Kosslyn, 1985).

† Refer to the Glossary for a definition of this term.
* Analysis of accuracy data, however, used the transformed data in order to avoid the highly skewed distribution of the originally captured "0's" and "1's" data (see Glass et al., 1972, on violation of the normality assumption for binary data used in the ANOVA/ANCOVA procedures). One of the key reasons why the combined dataset was used, besides the normality assumption, was the small number of significant effects found with respect to accuracy when analyses were limited to session 2 datasets alone for the various experiments (chapters 6, 7, and 8).

2. The Individual Difference Characteristics

Among the experiments, only E1 results showed a highly positive and significant GEFT-accuracy correlation. Field independents were found to perform more accurately than their counterparts (i.e. field dependents). Lusk (1979), for example, found that high analytics (as classified using Witkin's EFT) outperformed low analytics in a disembedding task regardless of the format of information presentation.

On the other hand, all three experiments failed to provide any strong evidence of a statistically significant GEFT-time correlation. These results are interpreted to mean that although field independents were found to outperform field dependents in some situations, there is a need for caution about any generalized claim of such superiority.
In fact, previous MIS studies have produced only mixed results concerning the effects of individual differences on task performance (see Chervany & Dickson, 1974; Benbasat & Schroeder, 1977; Huber, 1983). In short, the accumulated evidence has not provided any strong support for concerns about the cognitive style variable in MIS graphics research (see Benbasat et al., 1986).

3. Task Characteristics

It has been argued that the equivocal results from various graphics experiments often cited in the literature (e.g. Ives, 1982; DeSanctis, 1984) could be explained by differences in task characteristics (Benbasat et al., 1986). That task characteristics were a prime factor influencing performance in this research was evidenced by the directions of the Graph Format by Question Type interaction effect found across the three experiments and discussed earlier.† In fact, task differences within as well as among the experiments have been used throughout to explain findings for the various significant effects found across experiments 1, 2, and 3. Consequently, future graphics researchers should pay more attention to providing adequate control over the experimental task variable.

† Other aspects of the time-accuracy tradeoff are also discussed elsewhere (e.g. Vessey, 1987).

In line with results of recent studies (e.g. Dickson et al., 1986), the usefulness of a particular graph format seems to be largely a function of the characteristics of the task at hand. Hence, when there is a strong anchoring of information on the axes framework (i.e. Group I tasks -- Q1 of E1 and E2),† bars as well as symbols were found to be faster and more accurate to use for such tasks than lines (chapters 6 and 7). In the absence of such an anchoring (i.e. Group IV tasks -- Q2 and Q3 of E3),* lines appeared to have a relative edge over bars or symbols (chapter 8).
Symbols were found to be most appropriate in situations where there was only partial anchoring of information on the axes as well as strong anchoring of information on the dataset component (e.g. Group III tasks -- Q1 of E3).

Most of these results were, as a matter of fact, generally supportive of the principles underlying the Kosslyn-Pinker theory, such as the Gestalt Laws and Pinker's Graph Difficulty Principle (discussed in chapters 2, 3, and 4). In addition, the concept of information anchoring also provided a strong basis for matching task characteristics* to graph format characteristics (table 3.8).

In summary, future graphics researchers should not only understand the characteristics of the experimental task investigated, such as the composition of its activities, but should also pay particular attention to developing an extended framework or taxonomy of tasks such as that developed in this dissertation (chapter 3).

† See tables 3.4, 3.8, and 3.9.
* See tables 3.4, 3.8, and 3.9; refer to chapter 3 for the detailed discussion.

4. Graph Format

It is believed that the current controversy over the use of different modes of information presentation can be resolved by a more careful analysis of the match (or mismatch) between the task employed and the presentation format (Benbasat et al., 1986; Vessey, 1987). It is precisely this concern which motivated the present effort.

Three types of graph format, comprising bars, symbols, and lines, as well as three types of elementary cognitive-perceptual tasks, namely, the scale-value comparison and extraction task (Q1), the level difference comparison and extraction task (Q2), and the trend comparison and extraction task (Q3), were investigated in this research. Except for the results on time performance for E3, the main Format effect was generally found to be insignificant.
For E1 and E3 results on time (for session 2), the interaction of the Graph Format and Question Type variables, as well as that of the Graph Format and Dataset variables, was found to be statistically significant, as postulated. For E2, while the Graph Format by Dataset interaction was also significant, the Graph Format by Question Type interaction was not significant for session 2 results. Collectively, these findings provided general support for Pinker's Graph Difficulty Principle (Pinker, 1981, 1983; Kosslyn et al., 1983), which claims that while no one type of information format may be deemed superior to another overall, different types of information presentation format are expected to facilitate the extraction of different classes of information.

5. Information Complexity

The construct of graphical information complexity was found to be multi-faceted, consistent with the current literature (e.g. Davis et al., 1985; Lauer et al., 1985). For example, it was observed that the information complexity of time-series graphics depends on several factors, including: the number of time periods plotted along the abscissa; the number of datasets plotted in a single graph; the frequency of pip markings for interpolating quantities along the ordinate scale; and the total number of plotted points in a single display (see chapter 3).

For all experiments, there was strong and direct evidence to show that the information complexity of graphics was greater at higher levels of the dataset category, as well as of time period variation. One implication for designers is to restrict the number of time periods or datasets to be depicted on a single plot. This is consistent with the notion that a major limitation on graphics processing is the number of nodes that can be encoded simultaneously in a particular graph schema, as well as the finite size of a visual description that can be stored in short-term memory (Kosslyn et al., 1983).
Hence, a graphics designer should always be prepared to simplify a complex and confusing graph into a number of elementary plots that would be easier to read and understand.

Another concept associated with information complexity is that of task complexity, a construct which also appears to be multi-factorial. For example, the difficulty of task performance in this research was found to be dependent on several factors, including the type of question asked, the number of graphical components to be searched in order to answer the question,† and the type and nature of the processing mechanisms involved. In general, it has been shown that higher levels of information complexity in a graphics presentation lead to greater task complexity.

† This differentiated among the experimental tasks for E1, E2, and E3. See also tables 3.8 and 3.9.

6. Perceptual-Cognitive Mechanisms in Graphics Processing

Little attention has been paid to the perceptual and cognitive behavior of the graph reader in the MIS graphics literature. According to Cleveland (1985), the graph reader executes a variety of mental-visual tasks when extracting data from a display. Some of these tasks can be performed effortlessly and almost instantaneously, e.g. tasks executed by pre-attentive vision (Julesz, 1981), while others require more conscious thought (e.g. tasks involving mental calculation and quantitative reasoning). The tasks examined in this research involved both perception and cognition. It should be noted that in graphics information processing, performance of cognitive tasks is often enhanced by the performance of associated perceptual tasks. For example, reading the DV scale-value of a point is facilitated by the perceptual task of locating the position of the point relative to the ordinate scale.
It is believed, therefore, that subjects in this research found different types of graphics representation to be helpful in extracting different classes of information, because some kinds of information can be extracted visually from a bar chart, whereas the same information has to be mentally decoded from a line graph, and vice versa. For instance, as bars have relatively strong axes anchoring compared to lines (see chapter 3), performance of tasks requiring the identification of axes information (e.g. Q1 in E1) with bars would only involve the appropriate conceptual messages being "flagged" (retrieved) via a bottom-up encoding mechanism, as opposed to a top-down interrogative process should the same task be performed with lines (see figure 2.4; Kosslyn et al., 1983; Pinker, 1983).

In fact, scale-value and time period information on lines cannot be extracted automatically but requires mental effort to read the interpolated value on the respective axes. Consequently, a longer time is taken to answer Q1 in E1 with lines than with other representations. Note that symbols also have better anchoring on the axes compared to lines; thus, conceptual messages on time period and/or scale-value information associated with depicted symbols are more easily assembled via the bottom-up mechanisms than with lines. The key implication of the theory is, thus, the appropriate matching of formats to tasks in order to solve a problem with the least amount of top-down encoding processing.

Experience or familiarity with extracting various data from various graph formats could also enhance processing time, because repeated activation of a graph schema enhances the priming of the appropriate nodes attached to its visual description (Pinker, 1981, 1983). Task performance in session 2, therefore, was both faster and more accurate than in session 1.
Moreover, this explains why some of the significant differences found in session 1 among the various graph formats for performing different tasks were not significant for session 2. In fact, processing strategies were enhanced and stabilized over sessions; for example, several subjects appeared to become comfortable with rapid visual scanning of the displays during session 2, instead of pointing their fingers at various places on the graphical displays while performing tasks, as during session 1.

Finally, future graphics researchers may also want to consider monitoring eye movements, as so much visual processing is involved in extracting data from graphical representations.

C. SUMMARY

In summary, both the theoretical discussion and the empirical evidence gathered in this research provided complementary as well as converging evidence to indicate that different types of graph format can facilitate different types of task. More importantly, it is the matching of characteristics between the task at hand and the format of the presentation which determines the relative strengths and/or weaknesses of the various representations.

The next chapter concludes the dissertation with a brief review of key findings, contributions, and limitations of the research. In addition, directions for future graphics researchers will also be suggested.

X. CONCLUSIONS

This chapter concludes the dissertation by providing (1) a discussion of the major contributions of the research and a summary of key findings; (2) a review of the limitations associated with conducting experimental graphics research and the implications of these limitations as applied to the present studies; and (3) a general overview of how the present research may be extended, as well as suggestions for future studies based on questions that remain to be answered or issues which arise as a result of this work.

A.
SUMMARY OF KEY FINDINGS AND MAJOR CONTRIBUTIONS

Much has been said about the lack of a cumulative and theory-based research discipline in the field of Management Information Systems (Keen, 1980). Indeed, one of the most often cited reasons for the failure of prior MIS graphics research has been this lack of a cumulative and theoretical perspective (see Davis et al., 1985; Benbasat et al., 1986; Vessey, 1987). Together with these limitations, a number of other methodological problems found among prior MIS graphics research have resulted in a set of guidelines and a strong recommendation for the use of programs of laboratory experiments (Jarvenpaa et al., 1985; Dickson et al., 1986; Jarvenpaa & Dickson, 1988). It is further contended that such an approach, aimed at studying a variety of tasks at various levels of complexity and examining both outcomes and processes, would contribute to developing a cumulative and coherent body of knowledge in the area of graphics (Benbasat et al., 1986; Jarvenpaa & Dickson, 1988).

a. Contributions

The primary purpose of this research was to provide solution(s) to the basic problem of choosing the most appropriate graphical representation(s)† for displaying a given set of data (Bertin, 1983), based on theory and empirical evidence rather than on opinion and intuition. Since the current thinking among MIS graphics researchers is that different modes of information presentation should facilitate different types of tasks (e.g. DeSanctis, 1984; Jarvenpaa et al., 1985; Dickson et al., 1986; Benbasat et al., 1986), the central focus of this research program has been on examining the relative strengths and weaknesses of various graph formats on the performance of various tasks.

† Such alternative representations include tabular, bar, line, symbol, pie, and even pictorial representations.
A series of three experiments was conducted to test hypotheses drawn from the theories reviewed in chapter 2, so as to provide an independent source of empirical validation of the current views of graphics scientists, as well as to uncover new ground about the strengths and weaknesses of various graph formats. In addition, elementary perceptual-cognitive tasks were examined, as opposed to complex decision making tasks, because it is believed that they provide the means for laying the foundation of a graphics discipline. The most important contributions of this research, therefore, are those of accumulating empirical evidence and providing a systematic approach for the accrual of knowledge that could be of long-term benefit to the research community in the area of graphics and graphics information processing.

Accordingly, the primary contribution of the empirical component of this research is that of adding to the knowledge that needs to be compiled for completing the matrix of task environments by presentation formats at the micro level of graph comprehension tasks. An attempt is also made to integrate findings with respect to these more micro-level tasks with the more macro-level tasks performed in organizational decision making. This is best done by developing empirical guidelines that have been validated by this research as a set of,

.. initial guidelines .. to provide the basis to intelligently move forward .. (in the area of graphics: that is,) .. researchers should be able to replicate and accept certain graphical practice as fact (the base), and add to our knowledge of good graphic practice predicated upon a concrete set of priorities (Jarvenpaa & Dickson, 1988, p.765).

In short, implications arising from the results found in this research can be used to provide new rules for graphics practitioners as well as to form the basis for future research. For example, the finding that
For example, the finding that  CONCLUSIONS / 244 line graphs have poor axes anchoring could be translated into a rule, which states that if line graphs are to be used for reading point values, some form of redundant coding should be added. These coding could be a grid (which would provide excellent anchoring on the axes components), or distinguishing symbols and numerals on the location of critical points on the line to be read.  Apart from these contributions, this dissertation has also contributed by providing a framework for classifying tasks (as applied to tasks experimented in this research), as well as in drawing several hypotheses both independently or interdependently from the literature on theories of graphics that were scattered across many disciplines (chapters 2 and 3). Equally important, this dissertation has also contributed towards: a sound methodology for investigating a large number of critical variables that are believed to influence task performance in the use of a computerized graphics interface; a systematic approach for examining hypothesized relationships; and an efficient design for collecting and analyzing a large empirical database (chapters 4 and 5).  It should be noted, however, that any claim to a complete test of a theory such as the Kosslyn-Pinker model  of  graph  experimentation.  comprehension  in  reality,  encompass  many  phases  of  systematic  Experiments conducted in this research focussed merely on the predictive  aspect of the various theories. Difficulty  would,  Specifically, the primary claim tested was that of Pinker's  validity Graph  Principle.  Finally, empirical evidence provided in chapters 6, 7, and 8 for experiments E1, E2, and E3 respectively was integrated and summarized in chapter 9 to provide guidelines to graphics designers as well as future researchers. The next section will review briefly key findings for this research.  CONCLUSIONS / 245 b.  
Findings

The following key findings are based on the analysis and interpretation of the empirical data accumulated from experiments E1, E2, and E3. A brief note on the implications of each of these results is also provided.

1. Learning -- Learning was found to significantly affect task performance with the use of all the graphical information systems. Experimental replications were used to stabilize effects due to learning.

2. Individual Difference Characteristics -- There is only partial and weak evidence on the effect of individual difference characteristics on task performance with graphical systems.

3. Time-Accuracy Tradeoff -- Different individuals were found to exhibit different degrees of time-accuracy tradeoff. Further, each experimental setting should be controlled separately for a possible time-accuracy tradeoff effect in order to maintain high internal validity and an unambiguous interpretation of experimental results. This could be done, for example, with the inclusion of both time and accuracy measures and appropriate statistical techniques to test the relationship between the time and accuracy scores captured.

4. Task Characteristics -- Overall, the results of the three experiments indicated strongly that the degree of support provided by a particular graph format for a particular task is very much dependent on the matching of characteristics between the graph format and those of the task at hand. Among other things, it is argued that the most critical need for the progress of a graphics discipline is to channel efforts into the development of a comprehensive framework for classifying tasks (see chapter 3) and the accumulation of empirical evidence closely associated with such a task taxonomy (see Jarvenpaa & Dickson, 1988).

5.
Graph Format -- The properties of the various graph formats investigated may be summarized by the following observations: (1) bars are characterized by strong x-axis anchoring, moderate y-axis anchoring and low dataset anchoring; (2) symbols are characterized by moderate x-axis, y-axis,  CONCLUSIONS / 246 and dataset anchoring; and (3) lines are characterized by low x-axis and y-axis anchoring but high dataset anchoring,  ln addition, multiple lines and symbols were found t o be easier to read and  understand than multiple bars.  Consequently, the accumulated evidence showed that bars should generally be restricted t o tasks where a strong anchoring of information exists on the abscissa and that multiple bars should generally be avioded.  In contrast, lines should be used for tasks where little or no  anchoring of information is provided on the axes frame and where a high anchoring on the dataset component is required.  Finally, symbols appeared to be most suitable when there is  only partial anchoring on the axes or where a combination of the characteristics of bars and lines is desired. Thus, this implies that symbols should always be considered as a possible alternative to both bars or lines. 6.  Information  Complexity  - It was found that the construct of graphical information complexity  appeared t o be multi-factorial. At the very least, this research provided converging evidence t o show that information complexity as evidenced by longer elapsed time of a graphics use increased with more time periods and/or datasets plotted on a single display.  As well, factors of information complexity were found t o interact significantly with the graph format variable.  For instance, multiple bar graphs were found t o have the greatest adverse  effects on task performance as compared t o either multiple symbol or multiple line graphs (see chapter 9).  B. 
REVIEW OF LIMITATIONS

Generality or external validity is the degree to which the experimental findings may be extrapolated to other populations, settings, and times (see Cook & Campbell, 1979). It may be argued, therefore, that the use of student subjects, the use of elementary task settings at the individual level of graph comprehension, the use of monochrome time series graphics stimuli, and, more generally, the use of the laboratory experimental method all represent limitations of this research program.

a. Limitations

First, it should be noted that the use of students as surrogates for managers in the series of experiments was more or less justified on the ground that similar results of task performance were to be expected, because the tasks were elementary graph comprehension tasks rather than complex managerial decision making tasks. Second, the use of elementary information extraction tasks also made the results more generalizable to tasks performed in real-world organizations, as many uses of graphics in the real world must necessarily involve elements of these tasks or their combinations. Finally, the reason for using only monochrome graphics, as well as graphs with no crossing of lines, was to control possible confounding effects due to color and other complexity factors so as to maintain a high internal validity of the experimental results. These variables (i.e. color and line-crossing) could always be introduced and manipulated in future studies, if desired.

While some kinds of weaknesses such as an "unnatural setting" are unavoidable in experimental research, it should be pointed out that, in weighing the significance of external validity (i.e. the generality of results), the nature of the study should also be carefully considered. The purpose of this research program was to test theory as summarized in Pinker's (1981) principle of graph difficulty, using a variety of graph formats and different experimental task settings. In this context, a time dependent measure was, therefore, emphasized as opposed to accuracy, although the inclusion of an accuracy dependent measure was important to control possible significant time-accuracy tradeoffs. Moreover, for theory testing, external validity may be considered of relatively less importance (Cook & Campbell, 1979) compared to the issue of internal validity.† Consequently, the laboratory experiment method was the most appropriate research strategy for performing this series of studies (see Jenkins, 1985; Benbasat, 1988) because it afforded high internal validity.

† This refers to the validity with which the relationship existing or non-existing between the independent and dependent variables investigated may be inferred (Benbasat, 1988).

b. Implications of Specific Limitations

The major limitations specific to the research and their implications for the applicability of the findings are:
1. Graphics -- This research evaluates various graph formats. It is not intended to compare tables versus graphs, although the tabular format could well be tested as an extension of this research. It is also limited to monochrome time series graphics displays generated on a micro-computer screen. The applicability of findings from this investigation is therefore more or less restricted to choosing among alternative graphics representations for depicting time series data on a micro-computer.
2. Tasks -- The research concerns only the most usual purpose for which business time series graphics are used: that of extracting elementary quantitative information. The focus is on graph comprehension rather than on complex decision making. The individual is the unit of analysis; group effects are not addressed.
Findings based on this research, therefore, may not be directly applicable to complex graphical analysis tasks involving group decision making in organizations, although the findings may certainly be used as a foundation for generating a priori hypotheses for relatively complex tasks.
3. Information Complexity -- Due to the limited number of variables that can be investigated simultaneously in any one experiment, factors of information complexity have been limited to variations in the depicted time periods and datasets. Other potential complexity factors, such as the Schutz "degree of line-crossing" effect, have been controlled. The findings are thus limited in their applicability to complex forms of time series graphs which have multiple line crossings and irregularities (see Schutz, 1961a,b; Lauer, 1986; Yoo, 1985).

Finally, evaluations of the various graph formats for performing various tasks are based only on time and accuracy. Some subjective data were collected from closed and open-ended questions administered to participants† after their completion of the computerized session, but these data have not been statistically analyzed as they lie outside the intended scope of the research. No findings concerning the subjective opinions of participants are thus reported in this write-up.

† I.e., Appendix J.

C. SUGGESTIONS FOR FUTURE STUDIES

The experimental design, task, subject, and treatment conditions used in the present studies allow for easy replications, refinements, and extensions. In fact, E2 and E3 were simply replications or extensions of E1.

At the conceptual level, researchers who are interested in theories may work towards a more refined version of existing theories to accommodate current and previous findings. Studies may also be done on building a taxonomy of tasks or on how complex tasks may be decomposed.
The contributions of time-and-motion studies on task analysis in the field of Industrial Engineering may be of relevance for such a line of research. There is also the need to identify and evaluate other factors of information complexity controlled and/or neglected in the present studies (see Schutz, 1961a,b; Lauer et al., 1985).

At the empirical level, studies are still needed to test various other aspects of existing graphics theories, such as the way data are organized in memory and the processes or computations that could be carried out in reading various graphics. Another area of concern for future researchers is the learning effect of various forms of information presentation (see DeSanctis & Jarvenpaa, 1985). This relates also to the need for graphics researchers to examine issues regarding the prior knowledge, experience, and familiarity of subjects with the graphics presentations to be investigated (see Pinker, 1981; Simcox, 1981; 1983b).

As a direct extension of the current research program, practicing managers and professionals, expert graph readers, or others could be studied as replacements for student subjects. As well, new and interesting tasks such as data or relationship clustering may be studied. Moreover, composite tasks which combine elements of the fundamental tasks investigated could be introduced. There is also the possibility of replicating this series of experiments with colored graphics so as to study the influence of color on graph comprehension. In addition, as newer forms of graph format are developed, they can be tested and compared to the older forms.

To conclude, there are numerous possible extensions of the current research program. The overall objective of such research programs should be the testing of existing theories as well as the development of a cumulative and coherent body of knowledge in the information system areas of color and graphics.
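For illustration only, and going beyond anything implemented in this research, the format-selection guidelines could eventually be encoded as screening rules in a charting front end. In the following sketch, the function name, the anchoring categories, and the rules themselves are hypothetical simplifications of the empirical guidelines, not part of the experimental software used in E1, E2, or E3:

```python
# Illustrative encoding of the format-selection guidelines as simple rules.
# All names and categories here are hypothetical simplifications of the
# findings (bars: strong x-axis anchoring; lines: high dataset anchoring;
# symbols: the moderate, all-round alternative).

def recommend_format(x_axis_anchoring, dataset_anchoring, n_datasets):
    """Suggest a time-series graph format for an information-extraction task.

    x_axis_anchoring  -- 'high', 'partial', or 'low' need to anchor on the abscissa
    dataset_anchoring -- 'high', 'partial', or 'low' need to anchor on the dataset
    n_datasets        -- number of datasets plotted on a single display
    """
    # Multiple bars showed the greatest adverse effect on performance,
    # so bars are only suggested for single-dataset displays.
    if x_axis_anchoring == 'high' and n_datasets == 1:
        return 'bars'
    # Lines suit tasks with little axis anchoring but high dataset anchoring.
    if x_axis_anchoring == 'low' and dataset_anchoring == 'high':
        return 'lines'
    # Symbols combine characteristics of bars and lines and remain the
    # default alternative when anchoring needs are only partial or mixed.
    return 'symbols'

print(recommend_format('high', 'low', 1))   # bars
print(recommend_format('low', 'high', 3))   # lines
print(recommend_format('high', 'low', 3))   # symbols (multiple bars avoided)
```

A production rule base of this kind would, of course, need many more conditions (y-axis anchoring, question type, redundant coding such as grids) before it could claim to operationalize the guidelines of chapter 9.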
Ultimately, the validity of any graphics theory must rest on a continuous cycle of refinements and substantiations based on empirical testings.

XI. BIBLIOGRAPHY

Anderson, J. R., & Bower, G. H. Human Associative Memory. Washington, DC: Hemisphere Press, 1973.
Anderson, T. W. An Introduction to Multivariate Statistical Analysis. New York, NY: Wiley, 1958.
Baird, J. C., & Noma, E. Fundamentals of Scaling and Psychophysics. Wiley-Interscience Publication, 1978.
Baroudi, J. J., & Orlikowski, W. J. The Problem of Statistical Power: A Meta-Analysis of MIS Research, 1987.
Benbasat, I. Laboratory Experiments in Information Systems Studies with a Focus on Individuals: A Critical Appraisal. WP-88-MIS-023, 1988.
Benbasat, I., & Dexter, A. S. Value and Events Approaches to Accounting: An Experimental Evaluation. Accounting Review, 54(4), 1979, 735-749.
Benbasat, I., & Dexter, A. S. Individual Differences in the Use of Decision Support Aids. Journal of Accounting Research, 20(1), 1982, 1-11.
Benbasat, I., & Dexter, A. S. An Experimental Evaluation of Graphical and Color-enhanced Information Presentation. Management Science, 31(11), 1985, 1348-1364.
Benbasat, I., & Dexter, A. S. An Investigation of Color and Graphical Information Presentation Under Varying Time Constraints. MIS Quarterly, 10(1), 1986, 59-83.
Benbasat, I., Dexter, A. S., & Todd, P. An Experimental Program Investigating Color-Enhanced and Graphical Information Presentation: An Integration of the Findings. Communications of the ACM, 29(11), 1986, 1094-1105.
Benbasat, I., & Schroeder, R. An Experimental Investigation of Some MIS Design Variables. MIS Quarterly, 1(1), 1977.
Benbasat, I., & Taylor, R. N. The Impact of Cognitive Styles on Information System Design. MIS Quarterly, 2(2), 1978, 43-54.
Bergman, A. S. Perception and Behavior as Compositions of Ideals. Cognitive Psychology, 9, 1977, 250-292.
Bertin, J. Semiologie Graphique: Les Diagrammes - Les Reseaux - Les Cartes. 1967; The Hague: Mouton-Gautier, 1973.
Bertin, J. Graphics and Graphic Information Processing. Berlin: Walter de Gruyter & Co., 1981.
Bertin, J. The Semiology of Graphics. Madison: University of Wisconsin Press, 1983.
Blalock, H. M. Theory Construction: From Verbal to Mathematical Formulations. Englewood Cliffs, NJ: Prentice Hall, 1969.
Carter, L. F. An Experiment on the Design of Tables and Graphs Used for Presenting Numerical Data. Journal of Applied Psychology, 31, 1947, 640-650.
Carter, L. F. Relative Effectiveness of Presenting Numerical Data by the Use of Tables and Graphs. Washington, DC: U.S. Department of Commerce, 1948a.
Carter, L. F. Study of the Best Design of Tables and Graphs used for Presenting Numerical Data. Washington, DC: U.S. Department of Commerce, 1948b.
Chasen, S. BMDP:P5D - Histograms and Univariate Plots. In Dixon et al. (eds.), BMDP Statistical Software Manual, 1985 Reprinting. Berkeley, CA: University of California Press, 1985.
Chernoff, H. Graphical Representations as a Discipline. In Peter C. C. Wang (ed.), Graphical Representation of Multivariate Data. NY: Academic Press, 1978.
Chervany, N. L., & Dickson, G. W. An Experimental Evaluation of Information Overload in a Production Environment. Management Science, 20, 1974, 1335-1344.
Chervany, N. L., Dickson, G. W., & Kozar, K. A. An Experimental Gaming Framework for Investigating the Influence of Management Information Systems on Decision Effectiveness. Management Information Systems Research Centre, Working Paper 7H2, University of Minnesota, 1971.
Christ, R. E. Review and Analysis of Color Coding Research for Visual Display. Human Factors, 17(6), 1975, 542-570.
Cleveland, W. S. Graphical Methods for Data Presentation: Full Scale Breaks, Dot Charts, and Multibased Logging. The American Statistician, 38(4), 1984, 270-280.
Cleveland, W. S. The Elements of Graphing Data. Monterey, CA: Wadsworth, 1985.
Cleveland, W. S., & McGill, R. A Color-Caused Optical Illusion on a Statistical Graph. The American Statistician, 1983.
Cleveland, W. S., & McGill, R. Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods. Journal of the American Statistical Association, 79(387), 1984, 531-554.
Cleveland, W. S., & McGill, R. Graphical Perception and Graphical Methods for Analyzing Scientific Data. Science, 229, 1985, 828-833.
Cleveland, W. S., Harris, C. S., & McGill, R. Experiments on Quantitative Judgments of Graphs and Maps. The Bell System Technical Journal, 62(6), July-Aug. 1983, 1659-1674.
Cochran, W. G. Some consequences when the assumptions for the analysis of variance are not satisfied. Biometrics, 3, 1947, 22-38.
Cohen, J. Statistical Power Analysis for the Behavioral Sciences. New York, NY: Academic Press, 1965; 1977.
Cook, T. D., & Campbell, D. T. Quasi-Experimentation: Design and Analysis Issues for Field Settings. Boston: Houghton Mifflin, 1979.
Croxton, F. E. Further Studies in the Graphic Use of Circles and Bars. Journal of the American Statistical Association, 22, 1927, 36-39.
Croxton, F. E., & Stein, H. Graphical Comparison by Bars, Squares, Circles, and Cubes. Journal of the American Statistical Association, 27, 1932, 54-60.
Croxton, F. E., & Stryker, R. E. Bar Charts versus Circle Diagrams. Journal of the American Statistical Association, 22, 1927, 473-482.
Culbertson, H. M., & Powers, R. D. A Study of Graphic Comprehension Difficulties. AV Communication Review, 7, 1959, 97-100.
Davis, G. B., & Olson, M. H. Management Information Systems: Conceptual Foundations, Structure, and Development. New York, NY: McGraw-Hill Book Company, 1985.
Davis, L. R. The Effects of Question Complexity and Form of Presentation on the Extraction of Question-Answers from an Information Presentation. Ph.D. Dissertation, Indiana University, 1985.
Davis, L. R., Groomer, S. M., Jenkins, A. M., Lauer, T. W., & Kwan, Y. Content Validation of a Metric of Question Complexity. Discussion Paper, Indiana University, 1985.
Davidson, M. L. The Multivariate Approach to Repeated Measures. Paper presented at the 1980 meetings of the American Statistical Association, Houston, Texas, August 11-14, 1980.
DeSanctis, G. Computer Graphics as Decision Aids: Directions for Research. Decision Sciences, 15(4), 1984, 463-487.
DeSanctis, G., & Jarvenpaa, S. L. An Investigation of the 'Tables versus Graphs' Controversy in a Learning Environment. In the Proceedings of the Sixth International Conference on Information Systems, December 1985.
Dickson, G. W. Management Information System Definitions, Problems and Research. Society for Management Information Systems Newsletter, vol. 1, 1971, 6-12.
Dickson, G. W., DeSanctis, G., & McBride, D. J. Understanding the Effectiveness of Computer Graphics for Decision Support: A Cumulative Experimental Approach. Communications of the ACM, 29(1), 1986.
Dickson, G. W., Senn, J. A., & Chervany, N. L. Research in Management Information Systems: The Minnesota Experiments. Management Science, 23(9), 1977, 913-923.
Dixon, W. J., et al. BMDP Statistical Software Manual, 1985 Reprinting. Berkeley, CA: University of California Press, 1985.
Dixon, W. J., & Massey, F. J. Introduction to Statistical Analysis (3rd ed.). New York, NY: McGraw-Hill, 1969.
Dubin, R. Theory Building. New York, NY: The Free Press, 1978.
Dunn, O. J. Multiple Comparisons Among Means. Journal of the American Statistical Association, 56, 1961, 52-64.
Eells, W. C. The Relative Merits of Circles and Bars for Representing Component Parts.
Journal of the American Statistical Association, 21, 1926, 119-132.
Ehrenberg, A. S. C. Graphs or Tables. The Statistician, 27, 1978.
Ehrenberg, A. S. C. What We Can and Can't Get from Graphs, and Why. London Business School, Discussion Paper, 1985.
Elashoff, J. D. Analyzing repeated measures designs requires more than tests on means. Proc. Am. Stat. Assoc., 1985.
Elashoff, J. D. Analysis of Repeated Measures Designs. BMDP Technical Report #83, 1986.
Ericsson, K. A., Chase, W. G., & Faloon, S. Acquisition of a Memory Skill. Science, 208, 1980, 1181-1182.
Feliciano, G. D., Powers, R. D., & Bryand, E. K. The Presentation of Statistical Information. AV Communication Review, 11(3), 1963, 32-39.
Frane, J. W. The Univariate Approach to Repeated Measures: Foundation, Advantages, & Caveats. BMDP Technical Report #69, 1980.
Fodor, J. A. The Modularity of Mind. Cambridge, MA: MIT Press, 1983.
Garner, W. R. The Stimulus in Information Processing. American Psychologist, 25, 1970, 350-358.
Garner, W. R., & Felfoldy, G. L. Integrality and Separability of Stimulus Dimensions in Information Processing. Cognitive Psychology, 1, 1970, 225-241.
Geisser, S., & Greenhouse, S. W. On Methods in the analysis of Profile Data. Psychometrika, 24, 1959, 95-112.
Ghani, J. A. The Effects of Information Representation and Modification on Decision Performance. Ph.D. Dissertation, University of Pennsylvania, 1981.
Gibson, E. J., Bergman, R., & Purdy, J. The effect of prior training with a scale of distance on absolute and relative judgments of distance over ground. Journal of Experimental Psychology, 50, 1955, 97-104.
Glass, G. V., & Hopkins, K. D. Statistical Methods in Education and Psychology (2nd ed.). Englewood Cliffs, NJ: Prentice-Hall, 1984.
Glass, G. V., Peckham, P. D., & Sanders, J. R.
Consequences of failure to meet assumptions underlying the fixed effects analyses of variance and covariance. Review of Educational Research, 42(3), 1972, 237-287.
Glass, G. V., & Stanley, J. C. Statistical Methods in Education and Psychology. Englewood Cliffs, NJ: Prentice-Hall, 1970.
Grice, H. P. Logic and Conversation. In P. Cole & J. L. Morgan (eds.), Syntax and Semantics 3: Speech Acts. New York, NY: Academic Press, 1975.
Hack, H. R. B. An empirical investigation into the distribution of the F-ratio in samples from two nonnormal populations. Biometrika, 45, 1958, 260-265.
Hinton, G. E. Some Demonstrations of the Effects of Structural Descriptions in Mental Imagery. Cognitive Science, 3, 1979.
Hopkins, K. D., & Anderson, B. L. Multiple Comparisons Guide. Journal of Special Education, 7, 1973, 319-28.
Huber, G. P. Cognitive Style as the Basis for MIS and DSS Designs. Management Science, 29(5), 1983, 567-57.
Ives, B. Graphical User Interfaces for Business Information Systems. Management Information Systems Quarterly, Special Issue, 1982, 15-42.
Jarvenpaa, S. L. An Investigation of the Effects of Choice Tasks and Graphics on Information Processing Strategies and Decision Making Performance. Seminar Paper, University of Minnesota, 1986.
Jarvenpaa, S. L., & Dickson, G. W. Managing the Use of Computer Graphics in Organizations. MISRC-WP-85-11.
Jarvenpaa, S. L., & Dickson, G. W. Graphics and Managerial Decision Making: Research Based Guidelines. Communications of the ACM, 31(6), 1988, 764-774.
Jarvenpaa, S. L., Dickson, G. W., & DeSanctis, G. Methodological Issues in Experimental IS Research: Experiences and Recommendations. MIS Quarterly, 9, 1985, 141-156.
Jenkins, A. M. Research Methodologies and MIS Research. Indiana University, Discussion Paper no. 277, 1985.
Julesz, B. Textons, the elements of texture perception, and their interactions. Nature, 290, 1981, 91-97.
Keen, P. G.
W. The Implications of Cognitive Style for Individual Decision-Making. Ph.D. Dissertation, Harvard University, 1973.
Keen, P. G. W. MIS Research: Reference Disciplines and a Cumulative Tradition. Paper presented at the 1st International MIS conf., 1980.
Keen, P. G. W., & Scott-Morton, M. S. Decision Support Systems: An Organizational Perspective. Reading, MA: Addison-Wesley, 1978.
Keppel, G. Introduction to Design & Analysis. San Francisco, CA: W. H. Freeman & Co., 1980.
Kerlinger, F. N. Foundations of Behavioral Research. New York, NY: Holt, Rinehart, & Winston, 1973.
Kosslyn, S. M. Mental Representation. In J. R. Anderson & S. M. Kosslyn (eds.), Tutorials in Learning and Memory: Essays in Honor of Gordon H. Bower. San Francisco, CA: Freeman, 1982.
Kosslyn, S. M. Graphics and Human Information Processing: A Review of Five Books. Journal of the American Statistical Association, 80(391), 1985, 499-512.
Kosslyn, S. M., Pinker, S., Simcox, W. A., & Parkin, L. P. Understanding Charts and Graphs: A Project in Applied Cognitive Science. NIE 400-79-0066, National Inst. of Education (ED), Washington, D.C., 1983.
Kubovy, M. Concurrent Pitch Segregation and the Theory of Indispensable Attributes. In M. Kubovy & J. Pomerantz (eds.), Perceptual Organization. Hillsdale, NJ: Lawrence Erlbaum Press, 1982.
Larkin, J. H., & Simon, H. A. Why a Diagram is (Sometimes) Worth Ten Thousand Words. Cognitive Science, 11, 1987, 65-99.
Lauer, T. W. The Effects of Variations in Form of Presentation and Information Complexity on Performance in an Information Extraction Task. Ph.D. Dissertation, Indiana University, 1986.
Lauer, T. W., Davis, L. R., Groomer, S. M., Jenkins, A. M., & Kwan, Y. Establishment of the Content Validity of a Metric of Information Set Complexity. Discussion Paper, Indiana University, 1985.
Lehman, J., Vogel, D., & Dickson, G. Nine Trends in Business Graphics Use.
Datamation, 30(19), 1984, 119-122.
Lindman, H. R. Analysis of Variance in Complex Experimental Design. San Francisco, CA: W. H. Freeman & Co., 1974.
Lindsay, P. H., & Norman, D. A. Human Information Processing. New York, NY: Academic Press, 1977.
Lucas, H. C. An Experimental Investigation of the Use of Computer-based Graphics in Decision Making. Management Science, 27(7), 1981, 757-768.
Lucas, H. C., & Nielsen, N. R. The Impact of the Mode of Information Presentation on Learning and Performance. Management Science, 26(10), 1980, 982-993.
Lusk, E. J. A Test of Differential Performance Peaking for a Disembedding Task. Journal of Accounting Research, 17(1), 1979, 286-294.
Lusk, E. J., & Kersnick, M. The Effect of Cognitive Style and Report Format on Task Performance: The MIS Design Consequences. Management Science, 25(8), 1979, 787-798.
Macdonald-Ross, Michael. How Numbers Are Shown: A Review of Research on the Presentation of Quantitative Data in Texts. Audiovisual Communications Review, 25(4), 1977a, 359-409.
Macdonald-Ross, Michael. Research in Graphic Communication. Review of Research in Education, 5, 1977b, 49-85.
Mason, R. O., & Mitroff, I. I. A Program for Research on Management Information Systems. Management Science, 19(5), 1973, 475-487.
Marr, D. Vision. San Francisco, CA: W. H. Freeman, 1982.
Marr, D., & Nishihara, H. K. Representation and Recognition of the Spatial Organization of Three Dimensional Shapes. Proceedings of the Royal Society, 200, 1978, 269-294.
Miller, G. A. The Magical Number 7±2: Some Limits on Our Capacity for Processing Information. Psychological Review, 63(2), 1956, 81-97.
Miller, G. A., & Johnson-Laird, P. Language and Perception. Cambridge, MA: Harvard University Press, 1976.
Miller, R. G. Simultaneous Statistical Inference. New York, NY: McGraw-Hill, 1966.
Minsky, M. A Framework for Representing Knowledge. In P.
H. Winston (ed.), The Psychology of Computer Vision. New York, NY: McGraw Hill, 1975.
Mock, T. J. Concepts of Information Value and Accounting. Accounting Review, Oct. 1971, 765-78.
Mock, T. J., & Vasarhelyi, M. A. Context, Findings, and Methods in Cognitive Style Research: A Comparative Study. Research Working Paper 531A, Columbia University Graduate School of Business, Sept. 1983.
Moriarity, S. Communicating Financial Information through Multidimensional Graphics. Journal of Accounting Research, 17, 1979, 205-223.
Morton, J. A Singular Lack of Incidental Learning. Nature, 215, 1967, 203-204.
Nickerson, R., & Adams, M. Long-Term Memory for a Common Object. Cognitive Psychology, 11, 1979, 287-307.
Norman, D. A., & Rumelhart, D. E. (eds.) Exploration in Cognition. San Francisco, CA: W. H. Freeman & Company, 1975.
Norton, D. W. An Empirical Investigation of Some Effects of Nonnormality and Heterogeneity on the F-distribution. Ph.D. Dissertation, State University of Iowa, 1952.
Palmer, S. E. Visual Perception and World Knowledge: Notes on a Model of Sensory-Cognitive Interaction. In D. A. Norman & D. E. Rumelhart (eds.), Explorations in Cognition. San Francisco: Freeman, 1975.
Pearson, E. S. The distribution of frequency constants in small samples from non-normal symmetrical and skew populations. Biometrika, 21, 1929, 259-286.
Pearson, E. S. The analysis of variance in cases of non-normal variation. Biometrika, 23, 1931, 114-133.
Peterson, L. V., & Schramm, W. How Accurately are Different Kinds of Graphs Read? AV Communication Review, 2, 1954, 178-189.
Pinker, S. A Theory of Graph Comprehension. Occasional Paper #15, Cambridge, MA: MIT Center for Cognitive Sciences, 1981.
Pinker, S. Pattern Perception and the Comprehension of Graphs. NIE 400-79-0066, National Inst. of Education (ED), Washington, D.C., 1983.
Pinker, S., & Kosslyn, S. M. Theories of Mental Imagery. In A. A. Sheikh (ed.), Imagery - Current
Imagery  -  Current  BIBLIOGRAPHY / 260 Theory,  Research,  Pisoni, D., & Tash ].  Perception  and Application.  New York, NY: Wiley, 1983.  Reaction Times to Comparisons within and across Phonetic  and Psychophysics,  Categories,  75(2), 1974, 285-290.  Powers, M., Lashley, C , Sanchez, P., & Shneiderman, B. An Experimental Comparison on Tabular and Graphic Data Presentation. International  journal  of Man-Machine  Studies,  20, 1984, 545-566.  Price, J. R., Martuza, V. R., & Crouse, ]. H. Construct Validity of Test Items Measuring Acquisition of Information from Line Graphs, journal  of Educational  Psychology,  66(1), 152-156.  Remus, W. An Empirical Investigation of the Impact of Graphical and Tabular Data Presentations on Decision Making, Management  Science,  30(5), 1984, 533-542.  Rider, P. R. O n the distribution of the ratio of mean to standard deviation in small samples from non-normal populations. Biometrika,  27, 1929, 124-143.  Robey, D. Cognitive Style and DSS Design: A Commentary on Huber's Paper.  Management  Science,  29(5), 1983, 580-582. Rock, I. Perception.  New York, NY: W. H. Freeman, 1984.  Sage, A. P. Behavioral and Organizational Considerations in the Design of Information Systems and Processes for Planning and Decision Support, IEEE Trans.  Systems,  Man Cybernet,  3MC-HO),  1981, 640-678. Schank, R., & Abelson, R. Scripts,  Plans, Coals,  and Understanding.  Hillsdale: Lawrence Erlbaum,  1977. Schmid, C. F. Handbook  of Graphic  Presentation.  New York, NY: Ronald Press, 1954.  Schmid, C. F., & Schmid, S. E. Handbook of Graphic Presentation. New York, NY:]ohn Wiley, 1979. Schutz, H.  G. An Evaluation of Formats for Graphic Trend Displays, Human  Factors,  3(3), 1961a,  99-107. Schutz, H. G. An Evaluation of Methods for Presentation of Graphic Multiple Trends, Human  Factors,  3(2), 1961b, 108-119. Shneiderman,  B.  Software  Winthrop Pub., 1980.  Psychology:  Human  Factors  in  Computer  and  Information  Systems.  
Simcox, W. A. Cognitive Considerations in Display Design. NIE 400-79-0066, National Inst. of Education (ED), Washington, D.C., 1981.
Simcox, W. A. Configural Properties in Graphic Displays and Their Effects on Processing. NIE 400-79-0066, National Inst. of Education (ED), Washington, D.C., 1983a.
Simcox, W. A. Memorial Consequences of Display Coding. NIE 400-79-0066, National Inst. of Education (ED), Washington, D.C., 1983b.
Simcox, W. A. A Method for Pragmatic Communication in Graphic Displays. Human Factors, 26(4), 1984, 483-487.
Streufert, S., & Streufert, S. C. Behavior in the Complex Environment. Washington, DC: V. H. Winston & Sons, 1978.
Takeuchi, H., & Schmidt, A. H. New Promise of Computer Graphics. Harvard Business Review, 58(1), Jan-Feb 1980, 122-131.
Thorndyke, P. W. Applications of Schema Theory in Cognitive Research. In J. R. Anderson & S. M. Kosslyn (eds.), Tutorials in Learning and Memory. W. H. Freeman & Co., 1984.
Tufte, E. R. The Visual Display of Quantitative Information. Cheshire, CT: Graphics Press, 1983.
Tullis, T. S. An Evaluation of Alphanumeric, Graphic, and Color Information Displays. Human Factors, 23(5), 1981, 541-550.
Vasarhelyi, M. A. Information Processing in a Simulated Stock Market Environment. In Proceedings of the Second International Conference on Information Systems, 1981.
Vernon, M. D. The Use and Value of Graphical Material in Presenting Quantitative Data. Occupational Psychology, 26, 1952, 22-34.
Vessey, I. The Tables vs Graphs Controversy: An Information Processing Analysis. Working Paper, Graduate School of Business, University of Pittsburgh, 1987.
Vicino, F. L., & Ringel, S. Decision-making with Updated Graphic vs. Alphanumeric Information. Washington, D.C.: Army Personnel Research Office, Technical Research Note 178, Nov. 1966.
Wainer, H. How to Display Data Badly. The American Statistician, 38(2), 1984, 137-147.
Wainer, H., & Reiser, M.
Assessing the Efficacy of Visual Displays. Proceedings of the American Statistical Association, Social Statistics Section, 1, 1979, 89-92.
Wainer, H., Lono, M., & Groves, C. On the Display of Data: Some Empirical Findings. Washington, DC: The Bureau of Social Science Research, 1982.
Wainer, H., & Thissen, D. Graphical Data Analysis. Annual Review of Psychology, 32, 1981, 191-241.
Washburne, J. N. An Experimental Study of Various Graphic, Tabular and Textural Methods of Presenting Quantitative Material. Journal of Educational Psychology, 18(6), 1927, 361-376, 465-476.
Weick, K. E. Laboratory Experimentation with Organizations. In J. G. March (ed.), Handbook of Organizations. Chicago, IL: Rand McNally, 1965.
Wertheimer, M. Laws of Organization in Perceptual Forms. In W. D. Ellis (ed.), A Source Book of Gestalt Psychology. London: Routledge and Kegan Paul Ltd., 1938.
Wilcox, W. Numbers and the News: Graph, Table or Text? Journalism Quarterly, 41(1), 1964, 38-44.
Winer, B. J. Statistical Principles in Experimental Design. New York, NY: McGraw-Hill, 1962; 1971.
Winston, P. H. Learning Structural Descriptions from Examples. Artificial Intelligence Report MAC TR-76, MIT, 1975.
Witkin, H. A., Oltman, P. K., & Raskin, E. Manual: Embedded Figures Test, Children's Embedded Figures Test, Group Embedded Figures Test. Palo Alto, CA: Consulting Psychologists Press Inc., 1971.
Yoo, K. H. The Effects of Question Difficulty and Information Complexity on the Extraction of Data from an Information Presentation. Ph.D. Dissertation, Indiana University, 1985.
Zipf, G. K. The Psycho-biology of Language. Boston, MA: Houghton-Mifflin, 1935.
Zmud, R. W. Individual Differences and MIS Success: A Review of the Empirical Literature. Management Science, 25(10), 1979, 966-979.
Zmud, R. W., Blocher, E., & Moffie, R. P.
The Impact of Color Graphic Report Formats on Decision Performance and Learning. Proceedings of the Fourth International Conference on Information Systems, 1983, 179-193.

XII. APPENDIX A: GLOSSARY OF TERMS

This Glossary defines certain of the specialized terms that are used in this dissertation. When pertinent definitions were readily available from published sources, they have either been quoted verbatim or adapted. In all instances, however, the definitions reflect the meaning of the term as used in the context of this dissertation.

1. Absolute-Value -- A scale whose units are discrete and well-defined, such as the number of jellybeans in a jar (Pinker, 1981).

2. Cognitive Style -- The process behavior that individuals exhibit in the formulation or acquisition, analysis, and interpretation of information or data of presumed value for decision making (Sage, 1981; Huber, 1983). It categorizes individual habits and strategies at a fairly broad level and essentially views problem-solving behavior as a personality variable (Keen & Scott Morton, 1978).

3. Color -- Color is the sensation of the variation in the wavelengths of the light reflected by a surface (see Bertin, 1981, p. 187; Lauer, 1986).

4. Conceptual Question -- A conceptual question is simply a piece of information that the reader wishes to extract from a graph (Pinker, 1981, p. 19).

5. Dataset Category -- The number of datasets depicted as separate categories using a particular coding scheme. Each category will thus correspond to a cluster of data which forms a unit or entity. These entities are normally spelled out by a legend in time-series graphics.

6. Dimension -- A dimension is defined by bipolar endpoints, discriminable into articulated parts (Streufert & Streufert, 1978, p. 31).

7. Graph Format -- The different forms of representation such as bars, symbols, lines, wedges, pies, etc. used for presenting quantitative information in time-series graphics.

8.
Learning -- The cognitive skill of extracting patterns and relationships presented in time-series graphics and applying the extracted information to answering forced-choice questions (see DeSanctis & Jarvenpaa, 1985).

9. Ordinal and Disordinal Two-Factor Interactions -- Two-factor interactions are classified as ordinal or disordinal depending on changes in effects across levels of one factor with respect to the other factor. If the effects found across levels of one factor (say, factor A) are consistently higher (or lower) for each and every level of the other factor (say, factor B), then the interaction is said to be an "ordinal" one; otherwise, it is disordinal. In other words, the graph of an ordinal two-factor A x B interaction will not have any crossing of effect lines (see the Graph Format by Dataset interaction of E1). Conversely, that of a disordinal A x B interaction will have effect lines crossing each other (see the Graph Format by Question Type interaction of E1).

10. Orientation -- The orientation of a mark is its angle with reference to some set direction such as the horizontal or vertical axis on a graph (see Bertin, 1981, p. 187; Lauer, 1986).

11. Power -- The probability of rejecting the null hypothesis given that a particular hypothesis alternative to the null is true. Equivalently, the power of a test is one minus its probability of leading to a Type II error for a particular alternative hypothesis (Glass et al., 1972, p. 284).

12. Primitive Symbol -- This refers to the elements used in graphing quantitative data (see Cleveland, 1985). These elements are perceived as the 'unit' symbol in a graphics representation.

13. Principle of Invited Inference -- According to Kosslyn et al. (1983), this refers to the use of proper scaling and other standards in designing graphics so as not to invite misleading interpretations on the part of the graph readers.
Huff (1954) is a classic on how to lie with statistical graphs.

14. Principle of Contextual Compatibility -- According to Kosslyn et al., this principle refers to the fact that since most graphic displays are embedded in a context, it is important that "the context and semantic interpretation of the display ... be compatible or comprehension of the display will be impaired" (Kosslyn et al., 1983, p. 50). The principle parallels that proposed by Grice (1975) for language comprehension in the context of an oral presentation.

15. Quantity Scaling -- This refers to the mathematical representation of quantities on the ordinate scale of time-series graphics. Such a scale is useful for interpolating exact values.

16. Question Type -- The various sorts of tasks that are performed with the kinds of time-series graphics used in the series of experiments conducted in this research. These tasks are in the form of forced binary-choice questions (see Appendices B, C, and D).

17. Ratio-Value -- Unlike the absolute-value, this refers to those quantities that may be represented continuously but whose units are arbitrary in that they may be changed to other units without any loss of information. For example, dollars could be changed to a different unit like cents with no loss of information; similarly for the inches-feet-yards scale (Gibson et al., 1955).

18. Scale-Value -- This refers to the quantitative values of a datapoint represented on the ordinate scale. Reading scale values of datapoints involves the interpolation of mathematical units on the ordinate.

19. Schema -- A schema is a knowledge structure comprising a cluster of knowledge representing a particular generic object, percept, procedure, event or sequence of events, social situation, etc.
Such a cluster provides a skeleton structure for a concept that can be activated or filled out with the detailed properties of the particular instance being represented (see Thorndyke, 1984).

20. Shape -- This visual variable refers generally to a mark of constant size but which can vary in form; that is, the outline of the mark can vary (see Bertin, 1981, p. 187; Lauer, 1986).

21. Size -- This visual variable refers generally to perceived unit area that can vary from one size to another (see Bertin, 1981, p. 187; Lauer, 1986).

22. Texture -- A texture may be described as the number of marks or shapes for some standard unit of area depicted as a regular pattern (see Bertin, 1981, p. 187; Lauer, 1986).

23. Theory -- The term theory may be defined as an unambiguous statement of (1) the entities in a system, or (2) the lawlike relations among them (Pinker & Kosslyn, 1983, p. 44).

24. Time Period Variation -- The number of time periods that are depicted along the abscissa or x-axis of a time-series graphic.

25. Value -- This visual variable refers to the ratio between the total amounts of black and white (see Bertin, 1981, p. 187; Lauer, 1986).

XIII. APPENDIX B: GRAPHICS & QUESTIONS FOR TRIALS IN EXPERIMENT 1

Data Sources for Generating Graphics

Company A Revenue Report
Periods month 10 7 1
89 77 48 53 61 23 20 A

Revenue Report for Various Companies
Periods month 10 7 3
65 73 74 57 22 27 29 A
86 82 93 76 65 60 50 B
76 78 81 64 40 30 35 C

Company A Revenue Report
Periods label 10 14 1
98 91 83 77 74 68 56 53 40 23 26 27 32 32 A

Revenue Report for Various Companies
Periods label 10 14 3
57 64 67 73 74 42 50 57 63 66 41 23 18 9 A
99 97 93 89 87 81 68 78 85 90 67 58 49 30 B
75 82 90 85 84 65 56 63 68 75 56 43 36 24 C

[Graphics for the Experiment 1 trials, pp. 268-301: alternating "Company A Revenue Report" and "Revenue Report for Various Companies" time-series displays (revenue in 1000's dollars), rendered in the bar, symbol, and line formats under study. The figures are reproduced as screen images in the original and are omitted here.]
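For reference, the four Experiment 1 data sources above can be transcribed into machine-readable form. The sketch below is a modern Python transcription, not part of the original graphics-generation program; reading the header fields (e.g., "Periods month 10 7 1") as period-label style, number of periods, and number of datasets is an assumption inferred from the row lengths.

```python
# Experiment 1 data sources (Appendix B), transcribed from the listings above.
# Header interpretation (assumed): "Periods month 10 7 1" = monthly period
# labels, 7 periods, 1 dataset.
sources = [
    {"title": "Company A Revenue Report", "periods": 7,
     "data": {"A": [89, 77, 48, 53, 61, 23, 20]}},
    {"title": "Revenue Report for Various Companies", "periods": 7,
     "data": {"A": [65, 73, 74, 57, 22, 27, 29],
              "B": [86, 82, 93, 76, 65, 60, 50],
              "C": [76, 78, 81, 64, 40, 30, 35]}},
    {"title": "Company A Revenue Report", "periods": 14,
     "data": {"A": [98, 91, 83, 77, 74, 68, 56, 53, 40, 23, 26, 27, 32, 32]}},
    {"title": "Revenue Report for Various Companies", "periods": 14,
     "data": {"A": [57, 64, 67, 73, 74, 42, 50, 57, 63, 66, 41, 23, 18, 9],
              "B": [99, 97, 93, 89, 87, 81, 68, 78, 85, 90, 67, 58, 49, 30],
              "C": [75, 82, 90, 85, 84, 65, 56, 63, 68, 75, 56, 43, 36, 24]}},
]

# Sanity check: every company series matches its report's declared
# number of periods.
for report in sources:
    for company, series in report["data"].items():
        assert len(series) == report["periods"], (report["title"], company)
```

Under this (assumed) reading, each report supplies one revenue series per dataset category, which the experimental software would then draw in the assigned graph format.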
XIV. APPENDIX C: GRAPHICS & QUESTIONS FOR TRIALS IN EXPERIMENT 2

Data Sources for Generating Graphics

Company B Revenue Report
Periods month 10 7 1
89 77 48 69 61 23 20 B

Revenue Report for Various Companies
Periods month 10 7 3
65 73 74 57 22 27 29 A
86 82 93 76 65 60 64 B
76 78 81 64 40 30 35 C

Company B Revenue Report
Periods label 10 14 1
98 91 87 78 74 68 56 53 40 23 26 27 32 32 B

Revenue Report for Various Companies
Periods label 10 14 3
57 64 67 73 74 42 50 57 61 66 41 23 18 9 A
99 97 93 89 91 81 68 78 85 90 67 58 49 30 B
75 82 90 85 84 65 56 63 68 75 56 43 36 24 C

[Graphics for the Experiment 2 trials, pp. 305-340: alternating "Company B Revenue Report" and "Revenue Report for Various Companies" time-series displays (revenue in 1000's dollars), rendered in the bar, symbol, and line formats under study. The figures are reproduced as screen images in the original and are omitted here.]
[Between graphs, subjects answered forced binary-choice questions on screen. A representative Experiment 2 question, as displayed:]

B's LARGEST REVENUE CHANGE OCCURS BETWEEN
(1) PERIOD PAIR 10 & 11
(2) ANOTHER PAIR OF CONSECUTIVE PERIODS

PRESS <RETURN> IF READY TO SEE GRAPH
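For this sample question, the correct choice can be checked directly against the Appendix C data sources. The sketch below is a minimal modern Python illustration, not part of the original experimental software (which ran on an IBM-compatible XT); it uses company B's series from the 14-period, three-company data source.

```python
# Company B series from the Experiment 2 "Revenue Report for Various
# Companies" data source (14 labelled periods; revenue in 1000's dollars).
revenue_b = [99, 97, 93, 89, 91, 81, 68, 78, 85, 90, 67, 58, 49, 30]

# Absolute revenue change between each pair of consecutive periods.
changes = [abs(later - earlier)
           for earlier, later in zip(revenue_b, revenue_b[1:])]

# 1-indexed period pair with the largest change.
largest = max(range(len(changes)), key=changes.__getitem__)
pair = (largest + 1, largest + 2)

print(pair, changes[largest])  # → (10, 11) 23
```

The largest consecutive change (23, from 90 down to 67) falls between periods 10 and 11, so choice (1) is the correct answer for this graph.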
XV.
APPENDIX D: GRAPHICS & QUESTIONS FOR TRIALS IN EXPERIMENT 3

Data Sources for Generating Graphics

Revenue Report for Various Companies
Periods month 10 7 2
96 85 73 68 65 55 48 A
89 79 48 53 61 23 20 B

Revenue Report for Various Companies
Periods month 10 7 3
65 72 74 55 22 30 39 A
86 82 93 76 65 62 50 B
10 20 30 40 18 25 15 C

Revenue Report for Various Companies
Periods label 10 14 2
64 55 62 68 70 62 50 42 30 16 13 10 7 4 A
89 91 83 78 76 64 56 53 40 22 26 27 32 32 B

Revenue Report for Various Companies
Periods label 10 14 3
50 55 67 73 74 42 50 56 68 76 41 23 18 9 A
99 97 93 89 87 81 78 80 90 95 70 62 50 40 B
62 72 75 80 85 75 65 70 85 90 61 55 43 30 C

[Graphics for the Experiment 3 trials, pp. 342-377: "Revenue Report for Various Companies" time-series displays (revenue in 1000's dollars), rendered in the bar, symbol, and line formats under study. The figures are reproduced as screen images in the original and are omitted here.]
XVI. APPENDIX E: SUBJECT RECRUITMENT FORM

COMPUTER GRAPHICS STUDY

We are conducting a study to determine what the most appropriate graphical presentations for performing various tasks are.

Your participation in this exercise will provide you with the opportunity to:

1. Interact with a micro-computer graphics program on an IBM-compatible XT system, and

2. Receive a $10 participation bonus, in addition to cash prizes to be awarded based on your performance level, in appreciation of your volunteering time and effort.

Participation in this study is strictly VOLUNTARY. You may withdraw from the exercise at any time at your own discretion. It is expected that the total time you will spend is approximately 1.5 hours, allocated as follows:

a. Completing the Embedded Figures Test (20 minutes);

b. Performing a practice exercise to familiarize yourself with the use of the computer system and the experimental procedures (20 minutes);

c. Performing a series of self-paced question-and-answer exercises using computer-generated graphics (approx. 25 minutes);

d. Performing a second series of exercises (approx. 25 minutes).

All participants who complete the experiment will be awarded cash prizes of $10, plus an amount up to $25 to be awarded based on your accuracy and time performance relative to others. There will be a total of approximately 30 other participants.

XVII.
APPENDIX F: SUBJECT CONSENT FORM

COMPUTER GRAPHICS STUDY: CONSENT FORM

I, the undersigned, hereby agree to participate in all of the related experimental procedures designed for the abovementioned study (see the attached form -- Recruitment Form).

It is also my understanding that I may, if I wish, withdraw from the experiment at any time at my own discretion, without jeopardy to class standing.

SIGNED:

Student ID:

XVIII. APPENDIX G: OUTLINE OF EXPERIMENTAL PROCEDURES

EXPERIMENTAL PROCEDURES

- Greet subject;

- Tell subject the different steps that will take place;

- Ask subject for his/her signature on the consent form;

- Administer the GEFT test;

- Instruct subject about the Practice Session and encourage him/her to ask questions during this session;

- Begin the Practice Session;

- Ask questions to measure subject's understanding of the graphics presentations and the various tasks to be performed;

- Track answers for each task performed during this session until an error is detected;

- Stop subject and explain why the mistake was made before letting him continue further;

- If more than 3 wrong answers surface during the practice session, break and advise the subject to rerun the session;

- Start the Actual Session and remind subject to try hard to be as ACCURATE and QUICK in responding as possible. Remind the subject of the cash bonus he could win for good performance;

- Give participant a break if s/he needs it;

- Start the Second Exercise and remind subject again of the cash bonus he could win for good performance;

- Request participant to fill out the questionnaire. Be sure to explain what Line Graphs, Bar Charts, and Scatter Plots are.
- Pay participant and ask him/her to sign a receipt;

- Save participant's answers onto the floppy disk and immediately print a hard copy;

- Grade his GEFT score and place the results, together with the other forms he filled out, into a folder laid away in a safe place. Each folder should have:

1. The consent form;

2. Participant's GEFT score;

3. Subject's completed questionnaire;

4. Receipt for the amount compensated;

5. A hard copy of subject's results as stored in the computer.

XIX. APPENDIX H: INSTRUCTIONS FOR SUBJECTS

SCREEN 1:

Please read the following instructions carefully:

You will first go through a "Practice Session" to become familiar with the procedures of the "Actual" experimental run. Since no data are collected during this first session, do not hesitate to ask questions of the lab assistant. Notice that all of the necessary instructions are normally written down at the bottom of the screen.

REWARDS!! As a participant, you will receive a special gift for your time and effort. Be aware too of your contribution to the advancement of our knowledge on graphics comprehension! Thank you for coming, and we hope you will enjoy using the graphic terminal.

Please continue with the rest of the instructions by pressing "RETURN".

SCREEN 2:

This exercise is intentionally designed to be self-paced. You are free to spend as much time as you like on the question that first appears on the screen with no accompanying graph. This is to familiarize yourself with the information that will be required from the accompanying graph, which appears after you press the "RETURN" key.

Notice that your response is timed from the moment the graph and instructions appear together. Record your "RESPONSE" by pressing the "RETURN" key. This task is repeated for every trial in all experiment sessions. Each time you will be prompted by the onset of a "BELL". Please respond as quickly as possible.

Your assistant will be with you throughout this practice to ensure that you encounter no difficulties.

SCREEN 3:

Finally, notice that during the course of each trial, after you press "RETURN" to indicate that you are ready to see the graph, there may be a slight delay before the graph comes on the screen. Answer the question by pressing the appropriate key, followed immediately by hitting the "RETURN" key. Respond as quickly as possible; however, be sure that your answer is correct.

You may begin with the "PRACTICE" session by entering your name and pressing the "RETURN" key.

XX. APPENDIX I: PILOT TESTING REPORT

Summary of Pilot Study

The purpose of this pilot study was to test the strength of the experimental treatments and the experimental procedures for my dissertation research.

Method

A computer program was written to present each subject with various graphical representations and questions on a microcomputer. The program was designed so that each trial consisted of a displayed question followed by the appropriate graph when requested. When a subject indicated his/her answer by entering a choice key, s/he automatically moved to the next trial.

Subjects received instructions while interacting with the program and could clarify any difficulties with the experimenter, who was always available during the course of the experiment. Subjects were initially given a series of twelve practice trials and advised accordingly by the experimenter, particularly when they showed any non-conforming behavior with regard to the experimental procedures. These practice trials, aimed at familiarizing subjects with the computer system and the experimental procedures, were typical of the question-and-answer sessions used in the actual experimentation. It took most subjects about half an hour to complete the whole experiment.
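The trial cycle just described (question shown first, graph called up on request, response timed from graph onset) can be sketched in modern terms. The original was a custom program written for an IBM-compatible XT, so every name here is illustrative rather than the author's code:

```python
import time

def run_trial(question, show_graph, get_key):
    """One self-paced trial, modeled on the pilot program.

    The question is shown alone and is untimed; the subject presses a
    key to call up the graph; the response is then timed from graph
    onset until the answer key is pressed.  `show_graph` and `get_key`
    are hypothetical stand-ins for the original program's display and
    keyboard routines.
    """
    print(question)                     # question shown without the graph
    get_key()                           # subject presses RETURN when ready
    show_graph()                        # graph appears (with a "bell" prompt)
    start = time.perf_counter()
    answer = get_key()                  # subject presses '1' or '2'
    elapsed = time.perf_counter() - start
    return answer, round(elapsed, 2)    # time kept to a hundredth of a second
```

With the display and keyboard routines stubbed out, the function returns the answer key pressed and the elapsed response time, matching the two dependent variables recorded in the study.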
Variables

There are four independent and two dependent variables in this experiment. The two dependent variables are the accuracy of the answers and the time it takes to answer the questions asked. The four independent variables are factors that have to do with the different types of graphical designs and the questions.

Three types of graphical representations are used: bars, lines, and symbols. These graphs are designed as time-series, with 7 or 14 periods depicted along the abscissa and 1 or 3 companies as units to be examined.

Three types of questions are used in the pilot testing:

1. Exact Questions - those asking for a comparison of the value of one point to an exact value along the ordinate;

2. Comparison Questions - those asking for a comparison of the values of two adjacent points along the abscissa for a particular company;

3. Trend Questions - those asking for the trend of a range of points along the abscissa.

Examples of each of these question types are provided in Appendix I of this report.

In this study, the accuracy score is binary, with 1 assigned to correctly answered questions and 0 to incorrectly answered questions. Time is measured to a hundredth of a second from the moment the graph appears to the moment the subject presses the answer key.

Experimental Design

A full factorial within-subject experimental design is used (three question types by three graphical forms by two abscissa periods by two dataset groupings). Every subject receives all thirty-six treatment combinations.

Sixteen subjects, one undergraduate and fifteen graduate business students, participated in the pilot study. Three of the subjects ran the experiment twice, yielding a total of nineteen complete datasets.
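The full factorial design above can be enumerated directly to confirm the count of thirty-six treatment combinations per subject. The factor labels in this sketch are illustrative:

```python
from itertools import product

forms = ("bars", "lines", "symbols")          # graphical form
questions = ("exact", "comparison", "trend")  # question type
periods = (7, 14)                             # abscissa periods
groupings = (1, 3)                            # companies plotted

# Full factorial: every subject receives every combination once.
treatments = list(product(forms, questions, periods, groupings))
print(len(treatments))  # 36 = 3 x 3 x 2 x 2
```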
The statistical analysis was based on these nineteen complete datasets, since none of the subjects showed any sign of, or ever complained of, fatigue or loss of interest during normal as well as repeated testings.

Findings

The analysis of the findings is divided into two parts - one dealing with the statistical analysis of speed of responses across all nineteen complete datasets; the other, a report of the frequency of errors committed across the different types of treatment combinations.

Response Time Analysis

A five-way ANOVA, based on the appropriate error term for the F test used, with subject treated as a random factor, reveals the following significant main and interaction effects:

ANOVA Results (p-values for Time)

Effect                                    p-value
Grouping (1 or 3 company units)            .0001
Question (Exact, Comparison, or Trend)     .0001
Period (7 or 14 periods)                   .0310
Form X Question                            .0002
Grouping X Form                            .0011
Grouping X Form X Question                 .0282

These results provide strong evidence that the independent variables manipulated are significant factors affecting the use and understanding of graphical representations. A table of mean values for the more significant effects is provided in Appendix II of this report.

A more careful examination of the strong Form X Question interaction effect (see Appendix II) showed that line representations were best suited to Trend Questions, whereas bars were most appropriate when answering Exact Questions. In terms of the other interaction effects, symbol representations appeared to be helpful to subjects only in the case of representing multiple companies but not a single company, whereas subjects faced the most difficulty when using multiple, although not single, bar representations.
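The five-way mixed-model ANOVA was run with the statistical software of the day. For a modern reader, a comparable repeated-measures analysis, shown here for just two of the within-subject factors and on synthetic data with invented effect sizes, might look like this with statsmodels:

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Synthetic stand-in for the pilot data: one mean response time per
# subject per (form, question) cell.  Factor names and effect sizes
# are invented for illustration only.
rng = np.random.default_rng(0)
rows = [
    {"subject": s, "form": f, "question": q,
     # inject a question-type effect so the factor matters
     "rt": 3.0 + 0.5 * ("exact", "comparison", "trend").index(q)
           + rng.normal(0, 0.1)}
    for s in range(8)
    for f in ("bars", "lines", "symbols")
    for q in ("exact", "comparison", "trend")
]
df = pd.DataFrame(rows)

# Repeated-measures ANOVA with subject as the random (blocking) factor.
res = AnovaRM(df, depvar="rt", subject="subject",
              within=["form", "question"]).fit()
print(res.anova_table)  # F values and p-values per effect
```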
As a matter of fact, Cohen's (1977) approach for evaluating the statistical power of the F-test for main effects reveals that even with a proposed value of .15 for the effect size index f, which is very conservative for most of the main effects studied here (see the operational definitions of f in Cohen, 1977, p. 348), a power level of about .89 is attained for 19 observations per cell. Due to repeated measures and the economy of the within-subject design, a power level of .95 can be obtained with just twenty-five subjects. Consequently, twenty-five subjects and different random treatment orders are suggested for each experiment in the dissertation series.

Accuracy Score Analysis

There are two kinds of errors made by subjects: first, the 'logical' error, in which the subject, wanting to press the correct answer key, instead presses the wrong one; and second, the 'pure' error, in which the subject presses the wrong answer key believing it to be correct. The first kind of error could be reduced by means of longer practice; the second kind, by means of feedback or by giving the subject another chance to attempt a right answer to the same question. While the pilot test experiment was not initially designed to distinguish and trace these various kinds of errors for analysis, it has since been modified to do so for the dissertation experiments. However, one assuring fact is that subjects in general made very few errors, with the worst score achieving a high 75% accuracy and the rest above 83%. The average overall accuracy attained is above 92%!

A frequency count analysis of the number and types of errors made revealed errors occurring at most only once or twice for most treatment combinations, except for the following, which recorded 3 or more instances of errors committed by various subjects:

Accuracy Results (No. of Errors)

Treatment Combination                                    No. of Errors   No. of Subjects
No. 12 (14-period Multi-lines on Exact Questions)             13               13
No. 11 (7-period Multi-lines on Exact Questions)               7                7
No. 32 (14-period Multi-symbols on Exact Questions)            4                4
No. 35 (7-period Multi-lines on Comparison Questions)          3                3

From this and the earlier results, we see interesting problems with the perception of certain graphical representations for certain tasks. For example, the multiple line representations that gave subjects the most trouble in responding quickly to Exact Questions also caused the greatest number of errors in the pressing of the answer key! This is true for all multiple line graphics, whether designed with 7 or with 14 abscissa periods. One way to reduce this error is to relax the perceptual accuracy required by these exact questions, such as by making the exact value comparison easier to extract; an alternative is to have the subject repeat the incorrectly answered questions at the end of all 36 trials. It will only be a matter of time for him/her to learn that the answer given earlier was wrong.

Conclusion

The objective of the pilot testing was to investigate whether changes should be made to the experimental programs and procedures. Interviews with various subjects suggested the possibility of having them replicate the experiment without loss of interest. In the post-test interview, subjects were specifically asked about the extent to which they had difficulty coping with the various questions asked, and whether parts of the instructions were hard to understand or whether they were getting bored or tired during the course of the experiment. Most subjects were happy with the instructions of the experiment and clearly indicated that they would be willing to go for another run should their results be lost or erased accidentally.
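The power figures cited in the analysis above come from Cohen's tables; the underlying calculation uses the noncentral F distribution and can be sketched as follows. This sketch treats the design as a simple one-way, between-subjects ANOVA, so it illustrates the machinery only - the thesis's repeated-measures figures (.89 and .95) are not expected to be reproduced exactly:

```python
from scipy.stats import f as f_dist, ncf

def anova_power(f_effect, n_per_cell, k_groups, alpha=0.05):
    """Power of the one-way ANOVA F-test for Cohen's effect size f."""
    n_total = n_per_cell * k_groups
    df1, df2 = k_groups - 1, n_total - k_groups
    noncentrality = (f_effect ** 2) * n_total
    f_crit = f_dist.ppf(1 - alpha, df1, df2)   # critical F at level alpha
    return 1.0 - ncf.cdf(f_crit, df1, df2, noncentrality)

# Power rises with the number of observations per cell.
low, high = anova_power(0.15, 19, 3), anova_power(0.15, 25, 3)
print(low, high)
```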
A few subjects who complained about the selection of the answer keys, and the need to remember what these keys represented, found that after some practice sessions they became quite comfortable with the '1' or '2' key to be pressed as answer keys. More dissatisfaction was expressed when the keys were modified to a different set like '0' and '1', and so on.

One incident which clearly indicated the effectiveness of the experimental procedure occurred when an earlier version of the experimental program was mistakenly used on one subject, which ended up requiring him to go through a series of over 70 continuing trials! The experimenter finally stopped this subject and, much to his surprise, this same subject was still eager to volunteer to run the newer version of the experimental program without one complaint of fatigue!

The experimental procedures were also designed to overcome the two major weaknesses of a completely repeated design; namely, the order effects as well as the carryover effects. The order effects are controlled by randomly varying the order of the treatments across subjects, whereas the carryover effects can be adequately controlled via randomization and replications of the actual experimental session.

Put together, the pilot test results successfully indicate that:

1. The independent variables have been operationalized properly and are strong enough to cause significant effects;

2. Much of the task demand was simple enough for subjects to respond quickly and accurately;

3. Most, if not all, of the graphics presentations were adequate for answering the questions asked and not confusing to the subjects;

4. The instructions and the other procedures associated with the experiment are sound.

Appendix I: Examples of the Various Question Types

Exact Question

A's REVENUES IN PERIOD 4 ARE ____ THAN $80,000 ?
(1) LESS (2) MORE

Comparison Question

A's REVENUES IN PERIOD 8 ARE ____ THAN THAT IN PERIOD 9 ?

(1) LOWER (2) HIGHER

Trend Question

A's REVENUES FROM PERIOD 5 TO 7 ARE GENERALLY ____ ?

(1) DECREASING (2) INCREASING

Appendix II: Table of Means for Response Time (Form X Question X Grouping)

[Mean response times in seconds for each Form (Symbols, Bars, Lines) by Question Type (Q1, Q2, Q3) by Grouping (1 Company, 3 Companies) cell; the individual cell values are too garbled in this copy to be reliably recovered.]

Note: Q1 -- Exact Questions; Q2 -- Comparison Questions; Q3 -- Trend Questions.

XXI. APPENDIX J: QUESTIONNAIRE FOR SUBJECTS

Graphics Questionnaire

DIRECTIONS

Please react to the following statements about the information system you have been using. There are no right or wrong answers, as this is not a test. We are interested only in your opinions about how well the graphics presentations used in the experiment support your comprehension process.

On the scale below, please circle the answer which best corresponds to your opinion. For instance, if the statement was: "This room is very cold today." (Strongly agree 1 2 3 4 5 6 7 Strongly disagree), then circle:

1. If you thought it was very cold;
2. If you thought it was cold;
3. If you thought it was cool;
4. If you thought it was indifferent;
5. If you thought it was warm;
6. If you thought it was hot;
7. If you thought it was very hot.

ACCURACY

LINE GRAPHS
1. The contents of the LINE GRAPHS were very accurate.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

BAR CHARTS
2. The contents of the BAR CHARTS were very accurate.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

SCATTER PLOTS
3. The contents of the SCATTER PLOTS were very accurate.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.
UNDERSTANDING

LINE GRAPHS
4. The LINE GRAPHS were very easy to understand.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

BAR CHARTS
5. The BAR CHARTS were very easy to understand.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

SCATTER PLOTS
6. The SCATTER PLOTS were very easy to understand.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

RELEVANCE

LINE GRAPHS
16. The LINE GRAPHS contained exactly the right type of information.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

BAR CHARTS
17. The BAR CHARTS contained exactly the right type of information.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

SCATTER PLOTS
18. The SCATTER PLOTS contained exactly the right type of information.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

FORMAT

LINE GRAPHS
10. The LINE GRAPHS were very well formatted.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

BAR CHARTS
11. The BAR CHARTS were very well formatted.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

SCATTER PLOTS
12. The SCATTER PLOTS were very well formatted.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

INFORMATIVENESS

LINE GRAPHS
13. The LINE GRAPHS contained too much information.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

BAR CHARTS
14. The BAR CHARTS contained too much information.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

SCATTER PLOTS
15. The SCATTER PLOTS contained too much information.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

USEFULNESS

LINE GRAPHS
7. The LINE GRAPHS were very useful for answering the questions.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

BAR CHARTS
8. The BAR CHARTS were very useful for answering the questions.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

SCATTER PLOTS
9. The SCATTER PLOTS were very useful for answering the questions.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

CLARITY

LINE GRAPHS
19. The LINE GRAPHS clearly indicated when Revenues were high or low.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

BAR CHARTS
20. The BAR CHARTS clearly indicated when Revenues were high or low.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

SCATTER PLOTS
21. The SCATTER PLOTS clearly indicated when Revenues were high or low.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

SATISFACTION

LINE GRAPHS
22. My overall satisfaction with the LINE GRAPHS is best described as:
Very satisfied 1 2 3 4 5 6 7 Very dissatisfied.

BAR CHARTS
23. My overall satisfaction with the BAR CHARTS is best described as:
Very satisfied 1 2 3 4 5 6 7 Very dissatisfied.

SCATTER PLOTS
24. My overall satisfaction with the SCATTER PLOTS is best described as:
Very satisfied 1 2 3 4 5 6 7 Very dissatisfied.

RESPONSIVENESS

25. I found that the graphics reports appeared very quickly the moment I asked for them.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

CONSISTENCY

26. I believe my approach to performing the various tasks remained fairly consistent throughout the experiment.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

OTHERS

27. I found the questions to be very easy to understand.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.

28. I found the questions to be meaningless without first looking at the accompanying graphics reports.
Strongly agree 1 2 3 4 5 6 7 Strongly disagree.