Visual Exploratory Analysis of Large Data Sets
Evaluation and Application

by

Heidi Lap Mun Lam

B.A.Sc., Simon Fraser University, 2001
M.A.Sc., Simon Fraser University, 2004

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Doctor of Philosophy

in

The Faculty of Graduate Studies (Computer Science)

The University Of British Columbia (Vancouver)

May, 2008

© Heidi Lap Mun Lam 2008

Abstract

Large data sets are difficult to analyze. Visualization has been proposed to assist exploratory data analysis (EDA) as our visual systems can process signals in parallel to quickly detect patterns. Nonetheless, designing an effective visual analytic tool remains a challenge. This challenge is partly due to our incomplete understanding of how common visualization techniques are used by human operators during analyses, either in laboratory settings or in the workplace.

This thesis aims to further understand how visualizations can be used to support EDA. More specifically, we studied techniques that display multiple levels of visual information resolutions (VIRs) for analyses using a range of methods. The first study is a summary synthesis conducted to obtain a snapshot of knowledge in multiple-VIR use and to identify research questions for the thesis: (1) low-VIR use and creation; (2) spatial arrangements of VIRs. The next two studies are laboratory studies to investigate the visual memory cost of image transformations frequently used to create low-VIR displays and overview use with single-level data displayed in multiple-VIR interfaces.

For a more well-rounded evaluation, we needed to study these techniques in ecologically-valid settings. We therefore selected the application domain of web session log analysis and applied our knowledge from our first three evaluations to build a tool called Session Viewer.
Taking the multiple coordinated view and overview + detail approaches, Session Viewer displays multiple levels of web session log data and multiple views of session populations to facilitate data analysis from the high-level statistical to the low-level detailed session analysis approaches.

Our fourth and last study for this thesis is a field evaluation conducted at Google Inc. with seven session analysts using Session Viewer to analyze their own data with their own tasks. Study observations suggested that displaying web session logs at multiple levels using the overview + detail technique helped bridge between high-level statistical and low-level detailed session analyses, and the simultaneous display of multiple session populations at all data levels using multiple views allowed quick comparisons between session populations. We also identified design and deployment considerations to meet the needs of diverse data sources and analysis styles.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements
1 Introduction
  1.1 Terminology
    1.1.1 Visual information resolution (VIR)
    1.1.2 Evaluation strategies in information visualization
  1.2 Thesis Contributions: Evaluation
    1.2.1 Summary synthesis: multiple visual information resolution interface designs
    1.2.2 Laboratory experiment: visual memory costs of transformations
    1.2.3 Experimental-simulation study: overview use
    1.2.4 Field evaluation: Session Viewer at Google Inc.
  1.3 Thesis Contributions: Application
2 Background: Exploratory Data Analysis and Visualization
  2.1 Exploratory Data Analysis
  2.2 Roles of Visualization in EDA
  2.3 Challenges and Requirements for EDA Visualization Systems
    2.3.1 Provide context for data interpretation
    2.3.2 Provide re-representation and multiple representations of data, and allow fluid traversals
    2.3.3 Provide linking between data views
  2.4 Web Session Log Analysis
    2.4.1 Session logs
    2.4.2 Log analysis: existing practices and problems
3 Related Work
  3.1 Empirical Studies
    3.1.1 Laboratory experiments to study human perceptual and cognitive capabilities
    3.1.2 Experimental-simulation studies to evaluate multiple-VIR techniques
    3.1.3 Field experiments to understand visualization system use in the field
    3.1.4 Systematic reviews to summarize existing visualization study results
  3.2 Visual Analytics
    3.2.1 Existing visualizations for data exploration
    3.2.2 Existing interactions for data exploration
    3.2.3 Visualizations for web session logs
    3.2.4 Visualizations for computer-based logs
    3.2.5 Non-visual log analysis tools
    3.2.6 Summary of Visual Analytics Related Work
4 Summary Synthesis: A Study-based Guide to Multiple Visual Information Resolution Interface Designs
  4.1 Methodology
  4.2 Summary of Studies
  4.3 Decision 1: Single or Multiple-VIR Interface?
    4.3.1 Consideration 1: multiple-VIR interface interaction costs should be considered
    4.3.2 Consideration 2: single-level task-relevant data may not be suited for multiple-VIR displays
    4.3.3 Summary of considerations in choosing between a single or a multiple-VIR interface
  4.4 Decision 2: How to Create the Low VIRs?
    4.4.1 Consideration 1: having too many visual resolutions may hinder performance
    4.4.2 Consideration 2: having too much information on the low-VIR display may hinder performance
    4.4.3 Consideration 3: displaying information is not sufficient; information has to be perceivable
    4.4.4 Consideration 4: a priori automatic filtering may be a double-edged sword
    4.4.5 Consideration 5: the roles of the low-VIR displays may be more limited than proposed in literature
    4.4.6 Summary of considerations in low-VIR creations
  4.5 Decision 3: Simultaneous or Temporal Displays of the Multiple VIRs
    4.5.1 Consideration 1: tasks with single-level answers may not benefit from simultaneous VIR displays
    4.5.2 Consideration 2: tasks with single-level information scent may not benefit from the simultaneous display of different visual resolutions
    4.5.3 Considerations in choosing between temporal switching or simultaneous display of the VIRs
  4.6 Decision 4: How to Spatially Arrange the Visual Information Resolutions, Embedded or Separate?
    4.6.1 The issue of distortion
    4.6.2 Considerations in spatially arranging the various VIRs
  4.7 Summary: Design Recommendations
    4.7.1 Provide the same number of VIRs as the levels of organization in the data
    4.7.2 Provide relevant, sufficient, and necessary information in the low-VIR displays to support context use
    4.7.3 Simultaneously display VIRs for multi-level answers or multi-level clues
    4.7.4 Open question: how should multiple VIRs be displayed simultaneously?
  4.8 Summary: Methodology Recommendations
    4.8.1 Use comparable interfaces
    4.8.2 Capture usage patterns
    4.8.3 Isolate interface factors
    4.8.4 Report study details
  4.9 Limitations of Study
  4.10 Summary of Results and Implications for Design
5 Laboratory Experiment: Visual Memory Costs of Image Transformations
  5.1 Experiments
    5.1.1 Transformations
    5.1.2 Stimuli
    5.1.3 Participants
    5.1.4 Protocol
  5.2 Data Analysis and Result Summaries
  5.3 Detailed Results and Statistics
    5.3.1 Scaling transformation
    5.3.2 Rotation transformation
    5.3.3 Rectangular fisheye transformation
    5.3.4 Polar fisheye transformation
  5.4 Discussion
    5.4.1 Effects of image transformations
    5.4.2 Effects of grids
    5.4.3 Revisiting design guidelines
  5.5 Limitations of Study
    5.5.1 Definition of no-cost zones
    5.5.2 Transformation type
  5.6 Summary of Results and Implications for Design
6 Experimental-Simulation Study: Overview Use in Multiple Visual Information Resolution Interfaces
  6.1 User Study Design
    6.1.1 Study tasks
    6.1.2 Study data
    6.1.3 Interfaces
    6.1.4 Participants
    6.1.5 Material
    6.1.6 Study design and protocol
    6.1.7 Study design choices
  6.2 Study Results
    6.2.1 Performance time and error results
    6.2.2 Observations
    6.2.3 Subjective preference and questionnaire results
  6.3 Discussion
    6.3.1 H1: True. The low-VIR view alone is sufficient if the target is simple and spans a limited visual angle
    6.3.2 H2: False. Embedding high-VIR plots in low-VIR strips did not enhance complex-target matching
    6.3.3 H3: False. Providing side-by-side visual comparison with selective detailed plots did not enhance simple but similar target matching
    6.3.4 Interaction complexity and spatial arrangements
  6.4 Limitations of Study
  6.5 Summary of Results and Implications for Design
7 Session Viewer Design
  7.1 Target Users: Web Session Log Analysts
  7.2 Design Process
  7.3 From Design Goals to Tool Features
    7.3.1 User-defined data objects
    7.3.2 Main visualization panels
    7.3.3 From design goals to tool features
  7.4 Session Viewer: Visualization and Interactions
    7.4.1 Aggregate Pane
    7.4.2 Multiple Pane
    7.4.3 Detail Pane
    7.4.4 Interactions and view coordinations
    7.4.5 Other tool features
    7.4.6 Implementation details
  7.5 Use-Case Scenario: Exploring the Relationships between Task Type and Search Behaviour
  7.6 Design Evolution
    7.6.1 SV1: basic design
    7.6.2 SV2: supporting multiple populations
    7.6.3 Scrollable Multiple Pane and session partitioning to provide sessions-level overview
    7.6.4 Visualizations to provide aggregate-level overviews
    7.6.5 Multiple coordinated view to show multi-level data
    7.6.6 Vertically-stacking multiple panel views for population comparisons
8 Field Evaluation: Session Viewer at Work
  8.1 Study
    8.1.1 Participants
    8.1.2 Procedure
    8.1.3 Setting and apparatus
    8.1.4 Data collection and analysis
  8.2 Design Theme 1: Working with Real-World Data
    8.2.1 Finding 1: data validation was integral in analysis
    8.2.2 Finding 2: data-field needs were diverse
    8.2.3 Finding 3: data signals were difficult to find
    8.2.4 Design Theme 1 summary: tool needs to be flexible for real-world data
  8.3 Design Theme 2: Tool reception
    8.3.1 Finding 1: gap in existing analysis-tool coverage
    8.3.2 Finding 2: data transfer is less crucial than assumed
    8.3.3 Finding 3: tool power brought complexity
    8.3.4 Design Theme 2 summary: unique tool role determines reception
. . . . . . . . . . . . . . . . . . . . 223 8.4 Limitations of Study . . . . . . . . . . . . . . . . . . . . . . . . . 223 8.5 Summary of Results and Implications for Design . . . . . . . . . 224 9 Open Questions, Conclusions, and Future Work . . . . . . . . 226 9.1 Open Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . 226 9.1.1 Creation of overviews . . . . . . . . . . . . . . . . . . . . 226 9.1.2 The roles of context . . . . . . . . . . . . . . . . . . . . . 231 9.1.3 Spatial arrangements of the VIRs . . . . . . . . . . . . . 233 9.1.4 Evaluating information visualization . . . . . . . . . . . . 234 9.2 Thesis Conclusions and Future Work . . . . . . . . . . . . . . . 237 Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242 Appendices A Previously Published or In-Preparation Papers . . . . . . . . . 258 ix Table of Contents B Summary Synthesis Reviewed Studies . . . . . . . . . . . . . . . 260 B.1 Study Summaries . . . . . . . . . . . . . . . . . . . . . . . . . . 260 B.2 Study Interfaces, Tasks, Data, and Results . . . . . . . . . . . . 262 C Visual-Memory Experiment Materials . . . . . . . . . . . . . . 271 C.1 Informed Consent . . . . . . . . . . . . . . . . . . . . . . . . . . 271 C.2 Participant Instructions . . . . . . . . . . . . . . . . . . . . . . . 272 C.3 Sample Experimental Stimuli . . . . . . . . . . . . . . . . . . . . 272 C.4 Experimental Design . . . . . . . . . . . . . . . . . . . . . . . . 274 C.5 Experimental Result Analysis . . . . . . . . . . . . . . . . . . . 274 C.5.1 Single-factor ANOVA results . . . . . . . . . . . . . . . . 274 C.5.2 Post-hoc analysis results . . . . . . . . . . . . . . . . . . 276 D Overview-Use Study Materials . . . . . . . . . . . . . . . . . . . 277 D.1 Informed Consent . . . . . . . . . . . . . . . . . . . . . . . . . . 277 D.2 Participant Instructions . . . . . . . . . . . . . . . . . . . . . . . 279 D.2.1 Verbal briefing instructions . . . . . . . . . . . . . . . . . 
    D.2.2 Instructions on the study software interface
  D.3 Participant Questionnaires
E Session Viewer Field Evaluation Materials
  E.1 Pre-Session Interview Script
  E.2 Post-Session Interview Script
F UBC Research Ethics Board Certificates

List of Tables

4.1 Summary synthesis: study summary
4.2 Summary synthesis: papers reporting interactions
4.3 Summary synthesis: papers with single-level data
4.4 Summary synthesis: papers with compound multiple-VIR interfaces
4.5 Summary synthesis: papers that included at least two multiple-VIR interfaces
4.6 Summary synthesis: papers that included both a hiVIR and a multiple-VIR interface
4.7 Summary synthesis: papers that looked at text data
4.8 Summary synthesis: papers that implemented a priori automatic filtering
4.9 Summary synthesis: papers that included a temporal and at least one simultaneous-display interface
4.10 Summary synthesis: papers with both simultaneous interfaces
4.11 Summary synthesis: papers that included a embedded interface
5.1 Visual-memory experiment: summary of no-cost zones
5.2 Visual-memory experiment: summary of performance cost
6.1 Overview-use study: pilot task instructions
6.2 Overview-use study: study instructions
6.3 Overview-use study: summary of study task and data characteristics
6.4 Overview-use study: questionnaire
6.5 Overview-use study: study coded behaviour for interface modes
6.6 Overview-use study: study coded behaviour for answer confirmation
6.7 Overview-use study: study coded behaviour for visual search modes
8.1 Session Viewer field evaluation: session counts
C.1 Visual-memory experiment: ANOVA tables for the scaling experiments
C.2 Visual-memory experiment: ANOVA tables for the rectangular fisheye experiments
C.3 Visual-memory experiment: ANOVA tables for the rotation experiments
C.4 Visual-memory experiment: ANOVA tables for the polar fisheye experiments
C.5 Visual-memory experiment: Pairwise comparison results

List of Figures

1.1 Multiple-VIR interfaces
1.2 McGrath’s research strategy circumplex
2.1 Sample web session log
4.1 Summary synthesis: decision tree
5.1 Visual-memory experiment: scaling transformation sample stimuli
5.2 Visual-memory experiment: rotation transformation sample stimuli
5.3 Visual-memory experiment: rectangular fisheye transformation sample stimuli
5.4 Visual-memory experiment: polar fisheye transformations sample stimuli
5.5 Visual-memory experiment results: scaling time and accuracy
5.6 Visual-memory experiment results: rotation time and accuracy
5.7 Visual-memory experiment results: rectangular fisheye time and accuracy
5.8 Visual-memory experiment results: polar fisheye time and accuracy
5.9 Visual-memory experiment results: extended polar fisheye time and accuracy
6.1 Overview-use study: study interfaces and Max task data
6.2 Overview-use study: study interfaces and Most task data
6.3 Overview-use study: study interfaces and Shape task data
6.4 Overview-use study: study interfaces and Compare task data
6.5 Overview-use study: sample targets for the Shape task
6.6 Overview-use study: study interface
6.7 Overview-use study: protocol
6.8 Overview-use study: study time results
6.9 Overview-use study: study error results
6.10 Overview-use study: study subjective rating results
6.11 Overview-use study: study subjective questionnaire results
7.1 Session Viewer: schematic diagram
7.2 Session Viewer: main screen
7.3 Session Viewer: histograms
7.4 Session Viewer: link to webpages
7.5 Session Viewer: Histogram Panel
7.6 Session Viewer: Transitions Panel
7.7 Session Viewer: session markers
7.8 Session Viewer: Sessions Panel
7.9 Session Viewer: Expanded sessions.
7.10 Session Viewer: Session Attributes Panel
7.11 Session Viewer: interaction coordination scheme
7.12 Session Viewer: Pattern Matcher
7.13 Session Viewer: main panels showing study logs
7.14 Session Viewer: task time histogram
7.15 Session Viewer: Sessions Panels for two task types
7.16 Session Viewer: confirming a hypothesis formed by exploration
7.17 Session Viewer: version 1 main screen
7.18 Session Viewer: version 2 main screen
7.19 SV2: the State Counts Panel
7.20 SV2: the Distribution/Filter Panel
8.1 Session Viewer field evaluation: setup
8.2 Session Viewer field evaluation: duplicate sessions
8.3 Session Viewer field evaluation: event state highlights
8.4 Session Viewer field evaluation: expanded 2-dimensional session view
D.1 Overview-use study material: Embedded interface
D.2 Overview-use study material: Separate interface

Acknowledgements

In writing this thesis, I struggled between the use of “We” and “I” to describe a body of work that constitutes the thesis. To me, frequent use of the pronoun “I” seems unnecessarily egotistic since I cannot claim the work to be mine alone. Theoretically, I might have substituted instances of “I” by “We” “to secure an impersonal style and tone, or to avoid the obtrusive repetition of ‘I’” (Oxford English Dictionary 2007), or to hide behind those “We”s to absolve myself from taking responsibility for the work.
However, the true meaning behind those “We”s is to emphasize, acknowledge, and express my gratitude to my collaborators for their guidance and inspirations in these past four years: my thesis supervisor, Tamara Munzner; my thesis committee members, Diane Tang and Joanna McGrenere; and my collaborators, most of whom were also my internship mentors, Patrick Baudisch, Ronald Rensink, Robert Kincaid, and Daniel Russell.

Naturally, I would also like to thank my parents, who, for many years, have tolerated my many idiosyncrasies and assumed them to be necessary evils of technology-related scholarship. I understandably have not been diligent in providing them with counter-examples.

Chapter 1

Introduction

A study in 2002 found that human beings collectively produced more than five exabytes, or 5 × 10^18 bytes, worth of recorded information per year in the form of print, film, magnetic and optical storage media (Lyman and Varian 2003). The same study estimated that new stored information had grown by about 30% per year between 1999 and 2002 (Lyman and Varian 2003).

Having such massive amounts of data is a double-edged sword. Ideally, availability of data makes it possible to make decisions based on data rather than intuition. For example, it has been argued that data analysts can obtain better insights into human behaviours from available databases to support decision making in both corporations and governments, and in a wide range of areas such as marketing, economics, and social policies (Levitt and Dubner 2005; Thomas and Cook 2005; Ayres 2007). In practice, however, human beings may be overwhelmed by the massiveness of data sets. Even as early as 1960, the flood of available data to individuals made information overload a subject of interest in clinical psychiatry (Miller 1960).
By the mid 1990s, the term “information revolution” had become part of our vocabulary, and commentators on the social impacts of technology described the challenges an individual had to face in our information age (e.g., Shenk 1997). If the burden on individual citizens to make informed decisions given the amount of data is heavy, the challenges on data analysts demand more than human diligence, because analysts have to routinely handle and analyze massive amounts of raw and potentially conflicting data gathered from various sources in multiple formats such as text, numbers, sounds, and images. In 2008, a decade later, data overload is still considered to be a generic and difficult problem for the data analyst (Woods et al. 2002; Thomas and Cook 2005).

Given our visual ability to process a large amount of information in parallel, visualization has often been considered a vital component of the solution (Tukey 1977; Thomas and Cook 2005). According to Tukey (1977), “the greatest value of a picture is when it forces us to notice what we never expected to see” (p. vi).

Indeed, visualization has been applied to support a wide range of analyses in different domains. Examples include text documents with the ThemeView 3D visual landscape in In-SPIRE (Hetzler and Turner 2004) and ThemeRiver (Havre et al. 2002), computer source code with Seesoft (Eick et al. 1992), e-mails with Themail (Viégas et al. 2005), calendar data with DateLens (Bederson et al. 2004), tree analysis with Treemaps (Johnson and Shneiderman 1991) and TreeJuxtaposer (Munzner et al. 2003), and tabular database analysis with Table Lens (Rao and Card 1994) and Polaris (Stolte and Hanrahan 2000). Visualization systems have also been built to assist data exploration and analysis. One example is the GeoTime software by Oculus (Hetzler and Turner 2004).
In terms of visualization techniques, one main class of techniques displays data in multiple visual information resolutions (VIRs). Examples of multiple-VIR techniques include zooming, focus + context, and overview + detail. In the list of visualization systems given above, panning and zooming techniques were applied to the ThemeView 3D visual landscape for users to access thousands of documents clustered by themes (Hetzler and Turner 2004). Overview + detail techniques were used in Themail (Viégas et al. 2005) and Seesoft (Eick et al. 1992). Seesoft displays up to 50,000 lines of source code with each line mapped to a thin row in the overview. Users can access the actual source code in a separate window. Themail displays gigabytes of e-mails by extracting keywords and displaying them in columns. Users can select words to view corresponding e-mail messages in detail. Focus + context applications in our example list include Table Lens to display up to 68,400 table cells on a 19-inch screen (Rao and Card 1994), DateLens to display up to six months of calendar data on small screens (Bederson et al. 2004), and TreeJuxtaposer to display up to 500,000 tree nodes (Munzner et al. 2003).

Despite continual efforts, displaying large data sets to support exploratory analysis remains difficult. Part of the challenge is scalability (Thomas and Cook 2005, pp. 24–28). The sheer size of modern data sets requires novel visualization techniques that can display more data points and dimensions than available pixels on standard output devices. Displaying large data sets also requires a non-trivial amount of engineering effort to ensure system interactivity for data exploration. In addition to technical challenges, visualization system designers also need to consider perceptual and cognitive limitations of human operators.

Due to the complex interplay between the human operator and the visualization system, empirical evaluation of system use is paramount in providing effective visualization support for data analysis. However, our understanding of frequently-used visualization techniques such as animation (Tversky et al. 2002) and focus + context (Furnas 2006) remains incomplete. Another important aspect of evaluation that tends to be overlooked in visualization research concerns system deployment (Thomas and Cook 2005, p. 149). Deployment-related considerations involve handling real-world data and diverse analytical practices, and integrating the visualization system with established workflows and tools. Perhaps because of the lack of follow-through of research prototypes into the workplace, technology transfer and commercialization of prototypes remain rare despite the proliferation of visualization research.

In short, building an effective visualization system to support visual exploratory analysis of large data sets requires understanding of both information-visualization specific issues and non-visualization specific deployment issues.

This thesis therefore investigated both information-visualization specific and general deployment considerations in system building, and crystallized findings as design guidelines and considerations. To scope the project, we focused on multiple-VIR visualization techniques. As no single research strategy can adequately answer a research question, we investigated different aspects of multiple-VIR techniques with four studies conducted using a wide variety of methods, including a qualitative summary synthesis, a laboratory experiment, an experimental-simulation study, and a field evaluation.

The thesis started by taking a high-level view to obtain a more comprehensive snapshot of knowledge about multiple-VIR interface use using a qualitative summary synthesis approach, reported in Chapter 4 as the first of four evaluations.
We examined 19 existing experimental-simulation studies to extract high-level guidelines for multiple-VIR interface designs, looking at issues such as the amount of information displayed on overviews, the number of VIRs required for effective analysis, and methods to present the various VIRs in the interface. We also identified two research areas for further examination in this thesis: (1) overview creation in multiple-VIR interfaces, and (2) spatial arrangements of the different VIRs. We investigated these two questions in our quantitative laboratory studies.

The second study of the thesis is a laboratory experiment. Reported in Chapter 5, our laboratory experiment systematically measured visual memory costs incurred by two-dimensional geometric transformations that are frequently used in creating data overviews and integrating VIRs in multiple-VIR interfaces.

The third study of the thesis is an experimental-simulation study detailed in Chapter 6. We examined the use of overviews in multiple-VIR interfaces for single-level data.

While laboratory studies can be effective in studying visualization techniques, such approaches are necessarily limited by the need to study isolated factors such as task and visualization components using predetermined dependent variables (Plaisant 2004). To truly understand how a visualization technique is used and to understand deployment issues, we need to consider the technique as part of a visualization system in ecologically-valid settings. The remaining efforts in this thesis were therefore devoted to addressing these questions using a specific application domain.
We chose web session log analysis as our application domain since it is representative and relevant: challenges faced by web session analysts are also found in other analytical scenarios involving large data sets with complex compositions, and the need to analyze web log data to understand search behaviour increases with the growing importance of the Internet as an information source. Existing tools for web session log analysis are largely non-visual and do not adequately support data exploration. This application domain is therefore an opportunity for us to apply the knowledge and experience gained in the first three studies, such as guidelines for overview construction and VIR presentation, to better support analysis of large data sets by mitigating problems in existing analysis practices. We built Session Viewer and deployed it at Google Inc. The design and implementation of Session Viewer are detailed in Chapter 7. Chapter 8 reports the fourth, and last, study in this thesis, which is a field evaluation conducted to evaluate design choices made in the process and to further understand issues encountered by visualization systems in the workplace.

Chapter 9 discusses four questions addressed in our studies that remain research challenges. The chapter looks at the two research questions in the thesis: design choices in overview creation and design choices in spatial arrangements of VIRs in interfaces. We also discuss a question related to overview creation: potential roles of context in data analysis. The last open question addressed in Chapter 9 concerns approaches to evaluating visualization techniques and systems to sum up our experiences in the four evaluations conducted in this thesis. The chapter concludes the thesis by elaborating on contributions of the thesis and suggests future directions.

The remainder of this introductory chapter is structured as follows.
After introducing two sets of terminology used throughout the thesis in Section 1.1, Sections 1.2 and 1.3 list the contributions of the thesis. Appendix A contains a list of published, submitted, and in-preparation papers related to the thesis.

1.1 Terminology

Two sets of terminology are used throughout the thesis. The first set concerns the interface techniques under study (Section 1.1.1), and the second set concerns the strategies employed to study those techniques (Section 1.1.2).

1.1.1 Visual information resolution (VIR)

In this thesis, we devised the term visual information resolution (VIR) as a measure of visual information perceivability. Visual information is defined as datum values made accessible to the visualization system’s users by showing them visually. By our definition, displays with low VIR have comparatively lower visual information perceivability than displays with high VIR. Perceivability can be further characterized based on type, visual quantity, and visual quality of the displayed data.

In terms of information type, the VIR of interface views can differ if they display data from different levels of the hierarchy in the data organization: lower VIR shows data at higher levels of the hierarchy. For example, in Treemaps (Bederson et al. 2002), users can focus on different layers of the hierarchical tree at different VIRs in the display. In terms of quantity, interface views can differ in the amount of information displayed. One example is semantic zooming, where users are provided with different amounts of detail in a view by zooming in and out. Both our metrics of information type and visual quantity are akin to Simon and Larkin’s (1987) definition of informational equivalence of representations, where “two representations are informationally equivalent if all of the information in the one is also inferable from the other, and vice versa” (p. 67).
In terms of quality, visual objects can display the same number of data points with different visual encodings that result in different perceivability. One common example is the display of textual data. With the same font type, data displayed using small unreadable font sizes is considered to be of lower VIR than data displayed in larger readable font sizes. As for visual objects, the criteria of perceivability are less well defined. One example is the visual encodings used in our overview-use study in Chapter 6, where we encoded the same line graph data using two different types of encodings. The high-VIR encoding displays the y-dimension of line graphs using both space and colour, while the low-VIR encoding only uses colour, thus making the fine details of the displayed line graphs less perceivable. This characterization is akin to Simon and Larkin’s (1987) definition of computational equivalence of representations, where “two representations are computationally equivalent if they are informationally equivalent and, in addition, any inference that can be drawn easily and quickly from the information given explicitly in the one can also be drawn easily and quickly from the information given explicitly in the other, and vice versa” (p. 67).

Taxonomies of multiple-VIR techniques exist. One example is Plaisant et al.’s (1995) taxonomy for image browsers. A detailed mapping from existing taxonomies to our terminology is beyond the scope of our discussion here. In general, our terminology differs by focusing on the visual encodings instead of on their expected functions: for example, focus (as in focus + context) or detail (as in overview + detail) can be thought of as high VIR, while context or overview is of comparatively low VIR.

Multiple-VIR interfaces can be further classified as temporal or simultaneous based on the way they display the multiple VIRs, as shown in Figure 1.1.
Temporal interfaces, an example being the pan-and-zoom user interfaces, allow users to drill up and down the zoom hierarchy and display the different VIRs one at a time. In contrast, simultaneous interfaces show all the VIRs on the same display. We refer to interfaces that integrate and spatially embed the different VIRs as embedded displays, as in focus + context visualizations. When the different VIRs are displayed as separate views, we refer to these interfaces as separate, as in overview + detail displays. Since the different VIRs can occupy the entire display window, or be integrated as part of a single window, we explicitly differentiate the two by using the term view to denote separate windows or panes, and the term region to denote an area within a view. Additional examples of multiple-VIR interfaces can be found at http://www.cs.ubc.ca/∼hllam/res ss interfaces.htm.

Figure 1.1: Our classification of interfaces. Interfaces can be classified based on the number of visual information resolutions (VIR). (a) Single-VIR interfaces display one VIR, as in panning interfaces. (b)–(d) Multiple-VIR interfaces show multiple VIRs. In this illustration, each interface contains two VIRs: high (denoted as large circles) and low (denoted as small circles). (b) In the Temporal approach, users can pan around in the low-VIR view and zoom into a subarea as a high-VIR view, as in pan and zoom interfaces. (c) In the Separate approach, the low and the high VIRs can be placed in separate panels, as in overview + detail interfaces. (d) In the Embedded approach, the low and the high VIRs are embedded in a single unified display, as in focus + context interfaces.

1.1.2 Evaluation strategies in information visualization

A wide range of experimental designs have been applied to study information visualization techniques and systems. This thesis employed four different types of strategies to examine various aspects of multiple-VIR interface use. This section introduces terminology used in reference to visualization evaluation strategies, explains experimental design, tasks, and measurements, and describes how the four evaluations in this thesis relate to this terminology and these strategies.

There are a number of ways to classify experimental approaches. Some possibilities include the type of analysis performed on collected data (quantitative versus qualitative), the time span (short term versus longitudinal), the study environment (laboratory versus field), and the study designs (controlled experiments versus observational studies). Unfortunately, no one axis is adequate in classifying existing approaches. To facilitate discussion in this thesis, we adopt McGrath’s (1994) taxonomy of research strategies, originally developed for social and behavioural sciences, as it covers a wide range of strategies and focuses on research goals.

McGrath (1994) classified research strategies with five axes into eight strategies, each represented as a slice in the strategy circumplex shown in Figure 1.2. The first three axes are based on three desirable features or criteria researchers wish to maximize in an experiment:

A. Generalizability of evidence across different study populations;
B. Precision of measurements of behaviours being studied; and
C. Realism of the situation or context within which the measurements are being made in relation to where study results will be applied.

Figure 1.2: McGrath’s research strategy circumplex. ©1994 by Morgan Kaufmann. Reprinted by permission.

The fourth axis is concrete-abstract, or “the degree to which the setting used in the strategy is universal or abstract vs. particular or concrete” (McGrath 1994, p. 156). The last axis is obtrusive-unobtrusive, or “the degree to which the strategy involves procedures that are obtrusive, vs.
procedures that are unobtrusive, with respect to the ongoing human systems that are to be the object of the study” (McGrath 1994, p. 156). With these five axes, McGrath (1994) derived eight research strategies grouped into four quadrants: field, experimental, respondent, and theoretical strategies.

In this section, we discuss strategies employed in this thesis in the order of their presentation: the formal theory strategy of the theoretical strategies; the experimental strategies (laboratory experiment, experimental simulation); and the field strategies (field experiment, field study). We therefore do not discuss the respondent strategies here, even though they have been used in surveys and questionnaires in visualization evaluations.

Quadrant IV: Theoretical Strategies

Of the two theoretical strategies included in McGrath’s (1994) research strategy circumplex, the formal theory strategy includes a class of evaluation methods called systematic review. Though infrequently conducted, systematic reviews can provide snapshots of existing knowledge based on existing study results, where “the researcher focuses on formulating general relations among a number of variables of interest” that “hold over some relatively broad range of populations” (McGrath 1994, p. 158). The following description of the various types of systematic review is based on Chapter 13 of Shadish et al. (2002).

Narrative review is a qualitative approach that describes existing literature using narrative descriptions without performing quantitative synthesis of study results. Most published papers have related work sections that can be considered narrative reviews. Due to their descriptive and qualitative nature, narrative reviews can include study results gathered using dramatically different methods and therefore can potentially present a more realistic view of existing knowledge.
Reflections made by the reviewers, especially when they are experienced in the area under review, can be very illuminating and thought provoking. Excellent examples are Card et al.’s (1999) essays on various information visualization topics such as interaction and focus + context, Tversky et al.’s (2002) review on the effectiveness of animation, and Furnas’s (2006) follow-up work on focus + context visualization techniques after his original paper on the topic twenty years prior (Furnas 1986).

Despite its strengths, narrative review can lead to incorrect conclusions, as the readers must rely on the reviewer to weigh the significance of each reviewed publication. In addition, the selection of reviewed publications can be biased as well. Quantitative approaches were therefore developed to mitigate these biases, which form the basis of meta-analysis.

Roughly, there are two main approaches to a meta-analysis: (1) vote counting and (2) study-effect analysis. Instead of describing study results, the vote-counting approach categorizes results as significantly positive, significantly negative, or nonsignificant in reference to the research question. The category with the most entries is considered to best represent existing knowledge about the research question under analysis. Although simple, this approach confounds treatment effects with sample-size effects, as nonsignificant results may be due to lack of experimental power rather than lack of effect. In other words, the vote-counting approach to meta-analysis can miscount studies with small effects, which can lead to incorrect conclusions such as missing side-effects of medications.

Meta-analysis that computes effect size is perhaps the most popular approach to meta-analysis in information visualization evaluation. Indeed, as discussed in Section 3.1.4, to the best of our knowledge, the only meta-analysis in information visualization computed effect size (Chen and Yu 2000).
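To make the contrast between vote counting and effect-size analysis concrete, the following minimal Python sketch applies both to four hypothetical two-group studies. The study values, the `Study` type, the function names, and the normal-approximation cut-off of 1.96 are all illustrative assumptions; none of them comes from this thesis or from Chen and Yu (2000).

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Study:
    """Summary of a two-group study (hypothetical values, illustration only)."""
    mean_diff: float   # treatment mean minus control mean
    pooled_sd: float   # pooled standard deviation of the two groups
    n_per_group: int   # number of participants in each group


def cohens_d(study):
    """Standardized mean difference (Cohen's d)."""
    return study.mean_diff / study.pooled_sd


def vote(study, critical=1.96):
    """Classify a study by the sign and significance of its result.

    Assumption: the two-sample t statistic, d * sqrt(n / 2), is compared with
    the normal-approximation cut-off 1.96 rather than an exact t quantile.
    """
    t = cohens_d(study) * math.sqrt(study.n_per_group / 2)
    if abs(t) < critical:
        return "nonsignificant"
    return "positive" if t > 0 else "negative"


def weighted_mean_effect(studies):
    """Sample-size-weighted mean of the per-study effect sizes."""
    total = sum(s.n_per_group for s in studies)
    return sum(cohens_d(s) * s.n_per_group for s in studies) / total


# Three small studies and one large study, all with the same true-looking effect.
studies = [
    Study(0.5, 1.0, 10),
    Study(0.5, 1.0, 10),
    Study(0.5, 1.0, 10),
    Study(0.5, 1.0, 100),
]
votes = Counter(vote(s) for s in studies)
print("vote counts:", dict(votes))
print("weighted mean effect size:", weighted_mean_effect(studies))
```

In this made-up example, every study has the same standardized effect, yet the three small studies fall below the significance cut-off and "nonsignificant" wins the vote; the weighted effect size is unaffected by the differences in sample size.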
An effect-size measure is used as a common metric for the diverse study outcomes so that the results can be meaningfully compared in a meta-analysis. However, using a standard metric usually leads to a drastic reduction in the number of studies that can be brought under meta-analysis. In the case of Chen and Yu (2000), only 6 out of the 35 studies considered for the review met the researchers’ criteria for inclusion in the meta-analysis.

Chapter 4 is a systematic review that took a mixture of the two approaches to obtain a snapshot of existing knowledge on multiple-VIR interface study results. Since our approach is not a conventional one, we termed it summary synthesis to avoid confusion. Due to the small number of experimental-simulation studies concerning multiple-VIR interfaces, we combined the inclusiveness and flexibility of narrative review with some of the rigor of meta-analysis by listing all applicable study results for each research question under consideration, instead of only reporting results that supported our conclusions, as in most narrative reviews. Section 4.1 further details the methodology.

Quadrant II: Experimental strategies

The two strategies included in the experimental strategy quadrant are laboratory experiment and experimental simulation.

Laboratory experiment is a strategy where the researcher “deliberately concocts a situation or behaviour setting or context, defines the rules for its operation, and then induces some individuals or groups to enter the concocted system and engage in the behaviours called for by its rules and circumstances” (McGrath 1994, p. 157). Typically, designs in laboratory experiments in information visualization are modeled after experiments in the sciences, especially in experimental psychology, where “purposeful changes are made to the input variables of a process or system so that we may observe and identify the reasons for changes that may be observed in the output response” (Montgomery 2001, p. 1).
Examples include perceptual and cognitive studies to understand human limitations when interacting with visual displays. Section 3.1.1 discusses literature in this area.

In the context of information visualization, “input variables of a process or system” relate to properties of the visual display such as item density and variations in colour encoding or texture of display items. Stimuli used in laboratory experiments are mostly static images. Input variables are generally referred to as independent variables or experimental factors. The design is usually such that effects of these factors on dependent measures, such as reaction time and task accuracy, can be analyzed using established statistical methods such as t-tests and analysis of variance (ANOVA) F-tests.

To isolate study factors, tasks in laboratory experiments are generally simple abstracted tasks such as visual search tasks and visual memory tasks. The visual search paradigm is an experimental technique developed by experimental psychologists to study a number of visual processes, such as preattentive and attentive processes in vision (Wolfe 2000, p. 335). In this paradigm, participants are shown visual displays containing varying numbers of objects and are asked to determine whether a pre-specified target is included in the display. For example, a person might be asked to look for a red T in a display containing different numbers of blue T’s and red O’s, and, on trials where the target is present, a red T as well (e.g., in Treisman and Gelade 1980).

Another popular task used in laboratory experiments is the study of explicit memory using a three-phased task: a studying or encoding phase where the participant is exposed to the stimuli; a retention phase during which the stimuli are held in memory; and a testing phase where the participant is asked to recognize or recall information presented in the study phase (Wixted 1998, p. 265).
Chapter 5 reports a study in this thesis that adopted the visual memory task to measure visual memory costs incurred by image transformations in interfaces.

Experimental simulation is a strategy where the researcher “attempts to achieve much of the precision and control of the laboratory experiment but to gain some of the realism (or apparent realism) of field studies” by “concocting a situation or behaviour setting or context” and “making it as much like some class of actual behaviour setting as possible” (McGrath 1994, p. 157). Over 50% of evaluations published in the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI) proceedings included formal evaluations employing this strategy (Barkhuus and Rode 2007). Like laboratory experiments, experimental-simulation studies mostly use factorial designs and measure task completion time and task accuracy, which are analyzed using established statistical methods. The major differences reside in bringing realism to the study setting, which includes testing interactive interfaces displaying realistic data instead of static abstract stimuli, studying scenario-based tasks instead of simple abstracted tasks, and soliciting subjective participant feedback in addition to objective time and accuracy measurements. Tasks are chosen based on the intended application of the visualization technique or system under investigation, and usually comprise basic task operations such as locate, identify, compare, associate, distinguish, rank, cluster, correlate, and categorize (Amar et al. 2005; Roth and Mattis 1990; Tory and Möller 2004; Wehrend and Lewis 1990).
To obtain feedback from participants regarding study interfaces, experimental-simulation studies typically gather subjective feedback in the form of questionnaires, which can be reported quantitatively, such as in the NASA-TLX scale that measures mental workload (Zhang 2005), and qualitatively, such as open-ended questions that solicit perceived positive and negative aspects of the interfaces.

The third study in this thesis took the experimental simulation strategy to study overview use in multiple-VIR interfaces displaying single-level data. Section 3.1.2 surveys experimental-simulation studies that focused on multiple-VIR interface techniques and systems. Chapter 4 systematically reviews existing experimental-simulation studies in this area to derive design guidelines.

Quadrant I: Field strategies

Two strategies are included in the Quadrant I field strategies (Figure 1.2): field experiment and field study (McGrath 1994).

Both field studies and field experiments are strategies where the researcher studies a natural behavioural system. In field studies, the researcher “sets out to make direct observations of ‘natural’, ongoing systems, while intruding on and disturbing those systems as little as possible” (McGrath 1994, p. 157). This strategy is frequently employed during visualization design to gather design requirements by understanding user characteristics such as workflow and expertise, and existing practices and problems of the target application. In this thesis, the task of interest is exploratory data analysis, discussed in Section 2.1, and the application domain is web session log analysis, discussed in Section 2.4.2. Section 7.1 reports interview findings of existing analysis practices and problems identified before system design. Section 8.1.1 reports similar findings from interviews conducted as part of our field work detailed in Chapter 8.

Field experiments, in contrast, give up some of the unobtrusiveness by modifying one aspect of the system, as the goals of these studies are frequently to assess the causal effects of the difference in that manipulated feature on other behaviours of the system. In the context of visualization evaluation, the modification often comes in the form of a visualization system, usually designed to address existing analysis concerns. The study therefore aims to evaluate the system by observing changes in task behaviours. To preserve the naturalness and unobtrusiveness of the method, researchers prefer having participants use their own data to perform their own tasks instead of prescribed tasks whenever possible. Data collected are qualitative and observational. Section 3.1.3 surveys field experiments performed to evaluate information systems.

Chapter 8 reports the field work conducted at Google Inc. to study the use of Session Viewer, a visual analytic tool built as part of this thesis to support web session log analysis. Our work is akin to field experiments as we introduced the tool at a workplace. However, we focused more on tool use than on behavioural changes brought about by the tool, so use of the term “experiment” could lead to misunderstanding. We labeled our work field evaluation instead.

1.2 Thesis Contributions: Evaluation

The evaluation aspect of this thesis involves four studies to investigate challenges in building visualization systems to support data analysis, each with a different research strategy based on McGrath’s (1994) research strategy circumplex, as shown in Figure 1.2. We started with a summary synthesis to systematically review existing multiple-VIR study results. From our findings, we identified two research areas for further investigation using a laboratory experiment, an experimental-simulation study, and a field evaluation. Contributions from each study are described as follows:

1. Given the lack of understanding of multiple-VIR interface use and effectiveness in both the information visualization and human-computer interaction communities, we analyzed 19 existing multiple-VIR interface studies to extract high-level interface design guidelines. Chapter 4 details our findings.
2. We systematically examined the effects of two-dimensional geometric transformations and background grids on visual memory and defined a no-cost zone for each transformation type within which we did not detect performance degradations. We verified and refined two established design guidelines in this context: we refined guidelines on preserving orthogonal ordering, and verified the effectiveness of background grids (Misue et al. 1995). Chapter 5 details the study.
3. In interfaces that provide multiple VIRs, low-VIR overviews typically sacrifice visual details for display capacity, with the assumption that users can select regions of interest to examine at higher VIRs. We examined and refuted this assumption for single-level data and proposed interaction costs as a factor. Chapter 6 details the study.
4. There have been very few long-term and detailed evaluations of information visualization systems in the workplace using real-world data. We evaluated Session Viewer with seven web session log analysts at Google Inc. in a field evaluation and identified two design themes summarizing issues and implications for visualization system effectiveness in the workplace. Chapter 8 details the study.

The remainder of this section elaborates on each conducted evaluation in terms of study motivations and goals, study approach, and major findings.

1.2.1 Summary synthesis: multiple visual information resolution interface designs

Motivations and goals

Despite numerous evaluation efforts and the long history of applying multiple-VIR techniques to interface design, the use and effectiveness of these techniques remain unclear (Furnas 2006).
The difficulty in studying these interfaces reflects their complexity; a large number of factors are at play that significantly affect their use. These factors include the match between task information requirements and the type and amount of information displayed, the supported interactions, the use of image transformations in the implementations, and user characteristics in terms of spatial ability, interface use, and task domain knowledge.

Chapter 4 details the summary synthesis in this thesis that aimed to provide a clearer snapshot of our existing knowledge based on empirical evaluations of multiple-VIR interface techniques and systems, with the goal of extracting guidelines for design and identifying research questions for the thesis.

Approach and major findings

Our summary synthesis analyzed 19 existing multiple-VIR interface studies and cast findings into a four-point decision tree to design multiple-VIR displays: (1) When are multiple VIRs useful? (2) How to create the low-VIR display? (3) Should the multiple VIRs be displayed simultaneously? (4) Should the multiple VIRs be embedded or separated?

We summarized our findings as design recommendations. We concluded that the number of VIRs should match the number of organization levels in the data, and the information displayed in the low VIRs should be relevant, sufficient, and necessary for the supported task. Simultaneous display of the different VIRs was found to be suited to tasks where answers, or information scent leading to the answers (Pirolli et al. 2003), spanned multiple levels of the VIRs. Otherwise, temporal switching of the VIRs should be more appropriate due to simpler and more familiar interactions. The issue of spatial arrangements of the various VIRs remains an open question in research. The questions of overview creation and spatial arrangements of the various VIRs were further examined in the next three studies.
1.2.2 Laboratory experiment: visual memory costs of transformations

Motivations and goals

Geometric transformations are widely used in interface design, particularly in multiple-VIR visualization systems to create the low-VIR overview. In this study, we investigated the visual memory costs of four frequently used geometric transformations: scaling, rotation, rectangular fisheye, and polar fisheye.

Rotation has been used in embedded interfaces such as the Hyperbolic tree (Lamping et al. 1995) and to create interactive radial graph layouts (Yee et al. 2002). Likewise, scaling is extremely popular; it is frequently used to create the low-VIR display in multiple-VIR interfaces, for example, as the lowest zoom level in Summary Thumbnails that provide semantic zooming for webpages displayed on mobile devices (Lam and Baudisch 2005), and as the low-VIR overview in separate interfaces for documents (e.g., Hornbæk and Frokjær 2001, Hornbæk et al. 2003) and maps (e.g., Hornbæk et al. 2002).

Unfortunately, scaling only works to a certain extent: when the size of an image is reduced too far, its details become indiscernible. One possible remedy is to selectively scale visual objects such that readability is preserved for the part of the image relevant to the user, while the rest remains available in a reduced form to serve as context. The class of embedded techniques, a popular multiple-VIR technique, does so by providing both an unscaled focus (a high-VIR region) and a scaled-down context (a low-VIR region) in a single integrated image (Leung and Apperley 1994; Skopik and Gutwin 2005). Focus + context can be realized using a nonlinear transformation called a fisheye transformation, which has two main variants: rectangular and polar (Leung and Apperley 1994; Skopik and Brown 1992).
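The difference between the two fisheye variants can be illustrated with a minimal Python sketch. It assumes a focus at the origin, coordinates normalized to [-1, 1] (or to the unit disc for the polar variant), and the widely used magnification function g(x) = (d + 1)x / (dx + 1) with distortion factor d. The function names and these simplifications are illustrative assumptions, not the transformations implemented in the study reported in Chapter 5.

```python
import math


def distort(x, d):
    """Magnify a normalized distance x in [0, 1] from the focus.

    d >= 0 is the distortion factor; d = 0 leaves x unchanged, and the
    endpoints 0 and 1 are always fixed.
    """
    return (d + 1) * x / (d * x + 1)


def rectangular_fisheye(x, y, d):
    """Distort x and y independently (focus at the origin, x and y in [-1, 1]).

    Points sharing the same y keep a common y afterwards, so horizontal and
    vertical lines stay straight.
    """
    return (math.copysign(distort(abs(x), d), x),
            math.copysign(distort(abs(y), d), y))


def polar_fisheye(x, y, d):
    """Distort the radial distance only (focus at the origin, radius <= 1).

    The direction from the focus is preserved, so straight lines that do not
    pass through the focus become curved.
    """
    r = math.hypot(x, y)
    if r == 0.0:
        return (x, y)
    scale = distort(r, d) / r
    return (x * scale, y * scale)


print(rectangular_fisheye(0.5, -0.5, 3))
print(polar_fisheye(0.3, 0.4, 3))
```

Under these assumptions, points near the focus are pushed outward (magnified) while points near the boundary are compressed, which is the focus + context behaviour described above.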
There exists a large body of work using the fisheye transformation, such as the Fisheye menu to support item selection from a long list (Bederson 2000), Fishnet to display lengthy web documents (Baudisch et al. 2004), DateLens to display calendar data on small-screens (Bederson et al. 2004), and a two-dimensional graph display for large information spaces (Bartram et al. 1995). While scaling, rotation, rectangular fisheye, and polar fisheye transforma- tions can provide benefits in overview creation and VIR presentation, there is a danger that the transformed image may be too distorted to remain recognizable. This issue is a serious usability concern, since users need to be able to retain, or at least compensate for, their orientation in the visualization after the trans- formation. They also need to be able to associate displayed components before and after the transformation to equate the two views as the same, or at least holding the same information. Unfortunately, effects of these transformations on visual memory are largely unknown. Our goal was therefore to systematically measure visual memory costs of these four two-dimensional geometric transformations to guide interface design. Also, we aimed to refine existing design guidelines, such as to mitigate incurred perceptual costs by preserving orthogonal ordering and by applying background grids. Approach and major findings Since we were mostly interested in visual memory cost, which is a perceptual cost, we modeled our study design after those in experimental psychology and conducted a laboratory experiment. Instead of using a fully interactive system with scenario-based tasks, we showed static abstract images and studied a three- phased visual memory task (encode, retain, test). We also only measured task completion time and accuracy without soliciting subjective feedback or recording 16 Chapter 1. Introduction observations. 
For each transformation type, we defined a no-cost zone boundary after which we observed degradations in task time and accuracy. We refined the orthogonal-ordering guideline proposed by Misue et al. (1995): we suggested that instead of preserving left-right and up-down ordering, providing an up-down indicator would suffice. We verified the use of background grids in mitigating visual memory costs in these transformations, and provided further insights as to how they compensated for different transformations, such as by providing distance cues to compensate for distance distortions in rectangular fisheye transformations.

1.2.3 Experimental-simulation study: overview use

Motivations and goals

Creation of the low-VIR view is one of the first steps in the multiple-VIR design process. A low-VIR view corresponds to the overview in separate techniques, the context in embedded techniques, and the lowest zoom level display in zooming techniques. While it is obvious how to display data at the highest VIR in the detail, focus, or high-zoom displays, deciding how and what to display in low VIRs can be difficult. Ideally, the low VIRs should map to all the data in the data population so that users can select an area of interest for detail explorations at higher VIRs.

When the data is structured at multiple levels that are relevant to the task, that structure can be used to create the lower-VIR views, as low-level data can be aggregated and collectively represented by higher-level structures. For example, designers can represent individual species (e.g., Panthera tigris, Panthera leo and Panthera onca) by Genus (e.g., Panthera). Using multiple VIRs for data organized at multiple levels of detail was found to be effective in our summary synthesis, detailed in Chapter 4.

However, when the data has only a single level of inherent structure or has no known structure, designers have little guidance on low-VIR creation.
The lack of known data structures may necessitate the display of every datum in the data population, and designers may need to sacrifice the amount of visual detail displayed for each datum to increase the low-VIR’s display capacity. This approach is viable if the designer can assume users can recover lost visual details in higher-VIR displays. In other words, designers need to ensure sufficient and perceivable visual details to enable users to select areas of interest in the low-VIR display for further examination.

Displaying visual details does not always guarantee that users can access the displayed visual information. For example, information encoded by text with a font that is unreadable is not accessible to users. In other words, the usefulness of displayed text can be characterized by font readability. For graphical displays, the corresponding visual requirements are more difficult to define despite the rich history of perception research. One such requirement is visual salience. In general, a visual object is salient when it attracts the user’s attention more than its neighbours, and is therefore easily detected (Landragin et al. 2001). One way to achieve extreme visual salience is by visual pop-out, where visual objects with features that can be preattentively processed are spotted quickly and reliably on the display independent of the number of distractors and observer intent (Treisman 1985). However, this extreme approach can be inappropriate when it is unclear a priori which of several aspects of the data should be emphasized. Instead, a more appropriate strategy would be to encode visual objects with sufficient salience to enable overview use without having one aspect overpower the others. The low-VIR view would contain a variety of items of similar salience, where the visual target would not draw more attention than the non-targets but could be detected and accessed.
Based on pilot study results and Tullis’s (1985) work on display characteristics and visual search time (discussed in Section 3.1.1 in the Related Work Chapter), we selected two perceptual parameters of visual salience out of a collection of six: target visual complexity and visual span. We established the boundaries of these requirements by showing that our participants universally chose to use the low-VIR displays only when the visual targets were structurally simple and spanned a small visual angle. We then focused on situations where these visual requirements were not completely met. The goal of this study was therefore to investigate whether distributing high-VIR details amongst multiple VIRs could relax perceptual requirements established for single low-VIR views, and whether the spatial arrangements of high and low VIRs affect overview effectiveness.

Approach and major findings

While laboratory experiments are appropriate for measuring perceptual costs, they are unsuitable vehicles for understanding interface use, since they tend to ignore interactivity and participant usage behaviours. The third study in this thesis took the experimental-simulation approach to understand overview use in multiple-VIR interfaces. We therefore studied fully-interactive interfaces with scenario-based tasks, and recorded detailed observations in addition to task completion time and accuracy to gain insights into interface use.

We found that, surprisingly, neither our separate nor our embedded multiple-VIR interfaces provided performance benefits when compared to the optimal single-VIR interfaces. However, we did observe benefits in providing side-by-side comparisons for target matching in the separate interface. We conjectured that the high cognitive load of multiple-VIR interface interactions, whether real or perceived, is a more considerable barrier to their effective use than was previously considered.
Overview design in multiple-VIR interfaces is a complex issue and remains an open research question, which is discussed further in the last chapter of this thesis, in Section 9.1.1.

1.2.4 Field evaluation: Session Viewer at Google Inc.

Motivations and goals

Interface use is a complex phenomenon that cannot be adequately studied in laboratories (Plaisant 2004; Shneiderman and Plaisant 2006). Finding effective methods to evaluate visualizations is an open research area, and is discussed in Section 9.1.4. Traditionally, the information visualization community has focused largely on experimental-simulation studies to compare visualizations. Generally, these studies lack realism and have had limited success in discovering unexpected factors that affect interface use, in ensuring participant engagement during the study, and in studying domain expertise. Laboratory studies also cannot provide insights into prototype deployments in the workplace to discover issues that may be unrelated to information visualization techniques, but nonetheless may determine the outcome of technology transfers.

We therefore focused on the web session log analysis application domain and built a visual analytic tool called Session Viewer to examine some of these issues. Our goal in the fourth study in this thesis was therefore two-fold: to examine design choices made in building Session Viewer, and to study system requirements in the workplace.

Approach and major findings

We conducted a field evaluation at Google Inc. with seven log-analyst participants working on their own data and their own tasks, reported in Chapter 8. Taking a qualitative approach, we collected 20 hours of tool-use observations and grouped our findings into two design themes: (1) design implications in dealing with real-world noisy data, and (2) factors that lead to tool reception in the workplace.
In terms of visual-design findings, we found that noisy data requires substantial validation, and tools should convey the gist of the data. Tools should allow fluid data-view projection to support frequent analysis direction changes.

We examined design choices made during the creation of Session Viewer. We found that our analysts could effectively identify interesting sessions for further examination using Session Viewer’s scrollable overview, which displays small multiples of sessions that are interactively reorderable, even though the overview did not simultaneously show all data. We also found that the separate visualization technique, when coupled with statistical data attributes, was surprisingly effective in supporting data cleaning and data selection. In terms of spatial layout, we found a tradeoff in optimizing screen space for single-population analysis and multiple-population comparisons.

We also identified three main considerations for system deployment. We found that the level of data configurability should be based more on target users’ technical skills than on existing data schema. We believe the strongest determinant of our tool’s reception was its unique contribution to the analysis process by bridging between existing analysis practices. However, the complexity of a powerful tool may deter its use. In our case, integration with current tool sets was found to be less crucial than we previously assumed.

1.3 Thesis Contributions: Application

The application aspect of this thesis involves our detailed design study of an information visualization system prototype called Session Viewer to support web session log analysis. Session Viewer is our proposed solution to address existing data exploration concerns of web session log analysts based on their current analysis practices, characteristics of web session log data, and the task of exploratory data analysis.
The software is unique in its ability to handle multi-level data and support cross-level analysis. Chapter 7 covers the design of Session Viewer in detail.

Session Viewer plays a major role in this thesis as it provides a test bed to examine our design ideas based on knowledge gained in the first three evaluations in the thesis. We identified two research themes from the multiple-VIR design summary synthesis in Chapter 4: overview creation and the choice between the embedded and separate approaches to spatially arrange VIRs. Specific aspects of these two themes were examined further in two subsequent evaluations: the visual-memory experiment in Chapter 5 studied perceptual costs of image transformation, and the overview-use study in Chapter 6 looked at visual details required for effective overview use. In Session Viewer, we provided a concrete design example in the application domain of web session log analysis to further explore design choices associated with these two research themes. Section 7.6 describes our design evolution of Session Viewer and lists our design choices, while the last evaluation of this thesis, the field evaluation, examined the impact of our design choices.

In addition to our design study of Session Viewer, our application contributions are therefore:

1. We proposed a solution to create overviews of large data sets. Instead of data pre-selection, we allowed scrolling in our overview that displays session objects as small multiples, with each session object comprised of events. We augmented our overview with visually and interactively linked session attributes for each session object. With session reordering based on attributes, we found in our field evaluation that analysts could effectively isolate interesting sessions in a population for further analysis.

2. We provided positive evidence for using the separate technique to display multiple levels of data.
We found in our field evaluation that the separate technique provided close mapping between VIRs and analysts’ concept of session logs, and the technique was found to be effective in supporting data validation and session selection.

Chapter 2 Background: Exploratory Data Analysis and Visualization

Given this thesis’s focus on exploratory data analysis, this chapter provides background information on exploratory data analysis in general, and on the thesis domain of web session log analysis in particular.

Data analysis has long resided in the realm of statistics, with established methods that summarize data populations and model patterns in the data. An essential step in data analysis is data exploration, where the analyst tries to understand the data to generate hypotheses. This step is arguably difficult, especially in situations where the data size under analysis is large, and when the data contains multiple subpopulations that are unknown prior to the analysis. Visualization systems, by taking advantage of the human visual system’s ability to process large numbers of visual signals in parallel, have been suggested as a viable solution to aid data exploration.

This chapter surveys these areas in more depth. It begins with a survey of the exploratory data analysis (EDA) task, based mostly on the statistics and analytical methodology literature (Section 2.1), followed by a list of proposed roles played by visualization in data exploration (Section 2.2). These proposed roles are solidified into design challenges and requirements for visualization systems that support EDA (Section 2.3).

The last section in this chapter focuses on the application domain of this thesis, web session log analysis. Section 2.4 explains the rationale of our domain choice and provides background information on the subject.

2.1 Exploratory Data Analysis

Data analysis is a complicated process, which is part of a larger context of inquiry.
Tukey described the process of data analysis as a continuum from exploratory to confirmatory data analysis (Tukey 1986) that generally starts with data exploration. The term “exploratory data analysis” (EDA) was first coined by Tukey in 1977 in his seminal work of the same title (Tukey 1977). The goal of EDA is to discover patterns in data. The emphasis is to study data to obtain an understanding.

This thesis focuses on EDA instead of the entire analysis process, since the goal of visualization is to support EDA (Section 7.3) and since there exist numerous statistical packages that address confirmatory analysis. Instead of discussing specific EDA statistical methods such as data transformation and residual analysis (Tukey 1986; Leinhardt and Wasserman 1979), we focus on EDA philosophies to understand how visualization can play a role in supporting the task of EDA.

According to Hartwig and Dearing (1979), the essence of EDA is skepticism and openness: skeptical of potentially inappropriate use and fallacies of data representations and analytical methods, and open to unanticipated patterns in the data. EDA therefore focuses on tentative model building and hypothesis generation in an iterative process of model specification, residual analysis, and model re-specification (Behrens 1997). In other words, analysts should form their hypotheses while studying the data, not before.

Ho (1994) further characterized the logic of EDA based on the work of Peirce in 1878, and suggested applying the process of abduction to EDA. Abduction is the process where analysts “look for a pattern in a phenomenon and suggest hypothesis” (Ho 1994, p. 15). Its purpose is to generate guesses of a kind that deduction can explicate and that induction can evaluate.
Ho argued that in exploratory data analysis, “although there may be more than one convincing patterns, we ‘abduct’ only those which are more plausible” and that “exploratory data analysis is [therefore] not trying out everything” (Ho 1994, p. 16), since in general, it would be impossible to falsify every possibility. On the other hand, exploratory data analysis is not to make hasty decisions, as “researchers must be well-equipped with proper categories in order to sort out the invariant features and patterns of phenomena” (Ho 1994, p. 18).

Even though EDA is more a philosophy of approach than a prescribed method, several researchers have provided concrete steps to achieve these goals. The analysis starts with analysts studying the data. Hartwig and Dearing (1979) advocated a bottom-up approach that starts by understanding the data distribution of each data dimension value, building the understanding to correlate between dimension pairs, examining the network of relationships between the variables, and building models about the data. Sanderson and Fisher (1994) adapted the philosophy of EDA to sequential data exploration, or exploratory sequential data analysis (ESDA), for data with integral temporal components. Sanderson and Fisher (1994) described the process of ESDA as “Eight Cs”, where the first three Cs (Chunks, Comments and Codes) are initial steps to understand the data, and the next four steps (Connections, Comparisons, Constraints and Conversions) are devoted to data exploration. The last step, Computations, is where analysts reach the conclusion of the analysis.

The outcome of EDA or abduction is therefore a set of plausible models that can be further assessed.
Ho (1994) argued that the next stage in the analysis is to refine the hypothesis by drawing logical consequences using deduction, or “a process through which we start with general claims or general assertions and ask what follows from these premises” (Reisberg 2001, p. 411). However, since deduction relies on the truthfulness of the premises, empirical justification of the hypotheses with data is required. That is the next stage of analysis, or induction, “a process in which one begins with specific facts or observations and then draw some general conclusion from them” (Reisberg 2001, p. 378). Ho’s induction process is akin to the confirmatory data analysis (CDA) modes of Tukey (1986). In the first stage, or the rough CDA mode, analysts use probabilistic approaches such as confidence intervals or significance tests to initially assess the plausible hypotheses. In the next CDA mode, specific hypotheses are tested using a strict probabilistic framework following a decision theoretic approach. This process is cyclical, leading to a good description of the data by successive approximations.

A related area is the work on analytical reasoning and sense making. A comprehensive review is beyond the scope of this thesis but can be found in Chapter 2 of Thomas and Cook (2005). However, I will briefly mention Pirolli and Card’s (2005) Sensemaking process, which is partly based on their earlier work on information foraging theory (Pirolli and Card 1999). Information foraging theory is widely known in the fields of human-computer interaction and information visualization, and has been applied to web-searching support tool designs (e.g., Olson and Chi 2003) and to model web usage behaviours (e.g., Card et al. 2001; Chi et al. 2001).

Briefly, Pirolli and Card’s (2005) Sensemaking process has two main loops, the foraging loop based on Pirolli and Card (1999) and the sensemaking loop based on Russell et al. (1993).
The bottom-up process in the foraging loop is akin to EDA as prescribed by EDA advocates such as Hartwig and Dearing (1979). According to Pirolli and Card (2005), analysts start with the step of “search and filter” to collect relevant external data sources into a temporary storage space or “shoebox”, after which the documents in the shoebox are read to extract evidence to draw inferences and to trigger new hypotheses and searches.

In summary, data analysis operates in a cyclic fashion in terms of processes (abduction, deduction, induction) or modes (EDA, rough CDA, CDA). The analyst, when trained in different modes of analysis, moves fluidly between exploratory and confirmatory processes in a single analysis (Behrens 1997). Patterns and unexpected outcomes are regarded as starting points for hypothesis generation and future testings rather than as statistical conclusions. In addition, the analyst familiar with EDA will explore data patterns associated with the hypothesized main effect to make sure the CDA was not misled by unrecognized patterns that can lead to conclusions inconsistent with the data.

Given our visual capability to process large amounts of information in parallel, visualization is an attractive tool to support EDA.

2.2 Roles of Visualization in EDA

Advocates of EDA often recommend using graphical displays to represent data. According to Tukey, “the greatest value of a picture is when it forces us to notice what we never expected to see” (Tukey 1977, p. vi). Thomas and Cook summarized the six basic ways where visualization can amplify human cognitive capabilities (2005, p. 46):

1. Increasing cognitive resources, such as by using a visual resource to expand human working memory;
2. Reducing search, such as by representing a large amount of data in a small space;
3. Enhancing the recognition of patterns, such as when information is organized in space by its time relationship;
4. Supporting the easy perceptual inference of relationships that are otherwise more difficult to induce;
5. Perceptual monitoring of a large number of potential events;
6. Providing a manipulable medium that, unlike static diagrams, enables the exploration of a space of parameter values.

In addition to the visual display, modern visualization systems offer interactivities that further assist EDA, as interactivities allow analysts to quickly and fluidly “hold and assess” working hypotheses (Behrens 1997), either by viewing the data in different perspectives, or by emphasizing different parameters of a problem (MacEachren and Kraak 1997).

2.3 Challenges and Requirements for EDA Visualization Systems

Good (1983) posed two questions for EDA tool designers: How should we present a collection of k-tuples to match the cognitive powers of the analyst so that he can (i) see patterns in the data, and (ii) formulate sensible hypotheses about the data? Existing EDA literature provides further and specific design requirements, such as to provide context for data interpretation, to provide re-representation and multiple representations of data, and to provide links between different data views. These requirements were part of the considerations in creating Session Viewer, the visual analytic tool in this thesis built to support web session log analysis. Design considerations of Session Viewer are listed in Section 7.3.

2.3.1 Provide context for data interpretation

Interpretation of data usually requires comparison, either to existing standards, or to related values. According to Woods et al. (2002), “Presenting data in the context shifts part of the burden to the external display rather than requiring the observer to carry out all of this cognitive work in the head.” (p. 32). Outliers can only be detected when the analyst understands how the datum departs from, or conforms to, the typical expected case (Woods et al.
2002).

In addition to interpreting individual data, analysts often need to discover relationships in the context of the field of practice. Such a frame of reference is a fundamental prerequisite for depicting relations rather than simply making data available. Instead of organizing displays around pieces of data, displays should organize data in units meaningful to the application.

2.3.2 Provide re-representation and multiple representations of data, and allow fluid traversals

EDA advocates have stressed that conventional views of data are based more on habit than on enhancing analysis, and that analysis tools should support re-representation of data (Behrens 1997).

In addition, data analysis often requires checking multiple hypotheses, as “science is the holding of multiple working hypotheses” (Chamberlain 1965). Almost always there are multiple frames of reference that apply. Each frame of reference is like one perspective from which an analyst views or extracts meaning from data (Woods et al. 2002), and patterns and anomalies in data may only be obvious in certain views. Multiple representations of data are therefore needed to view data in different ways using multiple scales and perspectives, both spatially and conceptually (MacEachren and Kraak 1997; Sanderson and Fisher 1994).

Another important aspect of EDA tools is to allow analysts to shift perspectives fluently so as to support the fast data-view projection changes in the process (Behrens 1997; Woods et al. 2002), as the definition of relevance in data can be highly context sensitive (Woods et al. 2002).

2.3.3 Provide linking between data views

An individual datum is meaningless unless the data analyst can interpret its value in the context of other data. In some cases, the analyst may need to interpret data values in the context of the theory that unifies the data (Behrens 1997).
Indeed, part of the analysis is to discover the multiple potentially relevant frames of reference and to find ways to integrate and couple these multiple frames (Woods et al. 2002). EDA tools should therefore link multiple data views to allow propagation of changes in one plot to all relevant plots (Behrens 1997), and to identify relationships among variables (MacEachren and Kraak 1997).

2.4 Web Session Log Analysis

Given the general background on exploratory data analysis and possible roles played by visualization, this section focuses on the chosen application domain of this thesis: web session log analysis.

To better study exploratory data analysis, web session log analysis was chosen as the thesis’s design example due to problem relevance and commonality with other analysis situations. In the year 2002, the size of the World Wide Web was about 533 × 10^3 terabytes (Lyman and Varian 2003). The same study reported that in the year 2002, each individual in the United States spent about 100 hours per year online (Lyman and Varian 2003). With the increasing importance, complexity and volume of the web, providing better web information-seeking support is essential and requires an understanding of web search usage behaviours. Researchers have used methods ranging from field experiments and studies to web session log analyses to achieve this goal.

Typically, field-study researchers observe participants perform their information-seeking activities on the web in participants’ own environments, followed by interviews to further understand participants’ tasks, motivations, thinking processes, and expectations. Published examples include Jones et al.’s (2001) study to identify methods people use to manage web information for re-use, Sellen et al.’s (2002) study to observe web activities conducted by knowledge workers, and Teevan et al.’s (2004) study on orienteering behaviours in directed search.
While such observational studies can reveal rich and detailed information and allow for deep understandings of naturalistic search behaviour in the context of users’ goals, the approach is too labour intensive for large population analyses.

Session log analysis is a more scalable alternative. Session logs are computer logs that capture user actions in units of sessions. Session logs can be obtained by server- or client-side logging. Server-side logs include transactions of web search engines, Intranets and web sites, while client-side logs are usually recorded by plug-in tools installed in users’ web browsers. Even though session logs cannot capture user goals and intent, they do capture realistic search behaviours as users perform real information searches in their own environments uninterrupted by the data collection mechanism. However, session logs are difficult to analyze due to the large data size and complex compositions.

Before discussing existing difficulties in web session log analysis in Section 2.4.2, we first explain the structure and composition of web session logs.

Figure 2.1: A sample web session log from the user study described in Russell and Grimes (2007). Each row is an event, with the sequence number, participant ID, question number, navigation type, event time, URL, and webpage title.

2.4.1 Session logs

One method to obtain web session logs is by client-side logging. This method is commonly used in short-term user studies to understand web-usage behaviours. Figure 2.1 shows a sample log from one such study (Russell and Grimes 2007).

In session logs, the basic unit of analysis is a session, a time-stamped sequence of events. An event corresponds to a user action, such as submitting a query to the search engine, clicking on the next page link, or clicking on a web result.
Each event has attributes, such as a time-stamp, URL, action type (e.g., web search, webpage click), search domain (e.g., Image, Product), and the submitted query.

Since a session is simply an ordered list of events, aggregates of event attributes become session attributes. Examples include the total number of events in a session and the total dwell time of a session. In addition, session logs may contain participant feedback at the session level that also constitutes session attributes. Examples include task satisfaction and self-reported task outcomes. A session population is a group of sessions with shared characteristics such as usage patterns.

In short, a session log has structure at three levels: session population, session, and event. In general, a session is a multi-dimensional data object. Most dimensions are single values (e.g., event count per session), but one dimension is a time-ordered sequence of event objects. Each event is itself a multidimensional datum. In other words, the session data object is a multi-level data object.

2.4.2 Log analysis: existing practices and problems

One analysis option is a detailed study of individual sessions. One example is Kellar et al.’s (2007) study to better understand four types of information-seeking tasks. These task categories were created by studying task descriptions in session logs in detail. In their case, they looked at 40 task descriptions sampled from a larger set of logs generated by six participants over four days.

This detailed study approach can lead to interesting insights, but is very labor intensive. In Kellar et al.’s (2007) case, the task categories were created by ten focus-group participants over an hour, and further refined by the researchers themselves after the focus-group session.
The large amount of time required for detailed analysis thus limits sample size, and sampling from a larger data set may be potentially biased, which may render general conclusions drawn inaccurate or even misleading.

A more scalable and commonly used alternative is to compute overall population statistics at multiple levels, such as unique term frequency at the query level and event type frequency at the session level (Jansen 2006), or more complex web usage mining methods to model and predict user behaviours (e.g., Pierrakos et al. 2003). While these statistical approaches are scalable and effective, they tend to be hypothesis-driven and confirmatory rather than data-driven and exploratory, and may not uncover unexpected trends or may obscure subpopulation differences in the data. In addition, without exploring the data in detail, hypothesis formation can be difficult.

The detailed and the statistical approaches to analyzing web session logs are therefore complementary, as one approach can potentially mitigate the shortcomings of the other. For example, the difficulty in selecting representative sessions for detailed analysis may be mitigated if the selection could be guided by statistical session attributes such as session duration distributions. For statistical analysts, being able to view representative sessions of various statistical populations in detail may facilitate hypothesis generation and uncover unexpected trends. The key challenge with session log analysis is therefore to bridge between detailed and aggregate analysis to better support data exploration.

The lack of cross-level analysis is not unique to web session log analysis. Tukey and others advocated exploratory data analysis using graphical plots to ensure adequate data exploration and understanding before applying statistical methods, and data analysis was considered as a continuum from exploratory to confirmatory analysis (Tukey 1986).
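To make the three-level log structure of Section 2.4.1 and the idea of attribute-guided session selection concrete, the following is a minimal sketch rather than Session Viewer's actual data model. The class names, field names, the use of seconds for time-stamps, and the definition of duration as the time between the first and last event are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    """Lowest level: one user action (field names are hypothetical)."""
    timestamp: float       # assumed to be in seconds
    url: str
    action_type: str       # e.g. "web search" or "webpage click"
    query: str = ""


@dataclass
class Session:
    """Middle level: a time-ordered sequence of events plus derived attributes."""
    session_id: str
    events: List[Event] = field(default_factory=list)

    def __post_init__(self):
        # Keep events in time order, since a session is an ordered list.
        self.events.sort(key=lambda e: e.timestamp)

    @property
    def event_count(self) -> int:
        # Session attribute aggregated from the event level.
        return len(self.events)

    @property
    def duration(self) -> float:
        # Assumed definition: time between the first and last event.
        if not self.events:
            return 0.0
        return self.events[-1].timestamp - self.events[0].timestamp


def order_by(population: List[Session], attribute: str,
             descending: bool = True) -> List[Session]:
    """Top level: reorder a session population by one session attribute."""
    return sorted(population, key=lambda s: getattr(s, attribute),
                  reverse=descending)
```

Under these assumptions, calling `order_by(population, "duration")` places the longest sessions first, which is one simple way a statistical attribute could guide the choice of sessions to inspect in detail.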
Visual exploratory analysis (VEA) is an attractive approach given our visual capabilities to spot trends, patterns, and anomalies. In practice, effective VEA requires a sophisticated visualization tool. Building such a tool is the subject of Chapter 7 of the thesis.

Chapter 3 Related Work

Given that this thesis studies how interfaces that display multiple levels of data detail can support exploratory data analysis using four diverse research strategies, this chapter surveys related empirical studies employing each of the four strategies in Section 3.1: laboratory experiments (Section 3.1.1), experimental simulation (Section 3.1.2), field experiments (Section 3.1.3), and systematic reviews (Section 3.1.4).

In terms of applications, we survey the general area of visual analytics where interactive visual interfaces are used to support analytical reasoning in Section 3.2. More specifically, in Section 3.2.1 we cover visualization systems designed for visual exploratory analysis, as well as three categories of visualization techniques to display multiple data forms, multiple data levels, and multiple data dimensions. Since our chosen application domain for our field evaluation in Chapter 8 is web session log analysis, this chapter also reviews visualizations specifically designed for web session logs (Section 3.2.3), the more general computer logs (Section 3.2.4), and non-visualization approaches to web session log analyses (Section 3.2.5).

3.1 Empirical Studies

As the field of information visualization matures, researchers have begun to acknowledge the need to evaluate existing visualization techniques and systems (Chen and Czerwinski 2000).
Indeed, a 2007 study found that over 90% of the papers accepted for the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI) in 2006 included formal evaluations, whereas only about half of the papers did in 1983 (Barkhuus and Rode 2007).

The main themes in empirical studies include laboratory experiments to study human perceptual and cognitive capabilities in interacting with visualizations, experimental-simulation studies to evaluate information visualization techniques, field experiments to study information visualization system use in practical contexts, and systematic reviews of study results to obtain overviews of existing knowledge.

This section delineates previous efforts in these four areas of investigation, and how this thesis contributes to and extends existing work. Due to the large body of work that exists in perceptual and visualization-technique evaluations, this section focuses on work related to multiple visual information resolution (VIR) visualization techniques when discussing these two themes, as the study of multiple-VIR interfaces is the main focus of this thesis.

3.1.1 Laboratory experiments to study human perceptual and cognitive capabilities

Perception and cognition research in information visualization and related fields such as human-computer interaction and experimental psychology aims at establishing models and limitations of human perceptual and cognitive capabilities to better visualization designs. Ware (2004) focuses on subjects in vision research that are pertinent to visualization design and extracts design principles that can be applied to better information display. The major areas of vision research study how humans perceive various kinds of visual stimuli such as colour, shape, motion, and depth, along with research on object recognition and visual attention. Knowledge gained from these kinds of studies has been applied to visualization design.
Examples include using study results of colour perception to derive design guidelines to encode nominal or continuous data in interface design (e.g., Ware 2004, p. 123–138); using depth perception to create three-dimensional visual representations on two-dimensional output devices (e.g., Hubona et al. 1999); using Gestalt laws of perceptual organization to facilitate visual search (e.g., Hornof 2001); and using pre-attentive processing for rapid visual numerical estimations (e.g., Healey et al. 1996).

In addition to establishing human limitations in interacting with visualization displays, researchers have also proposed models to describe human behaviours. Examples of models in interface design and evaluation include Card et al.’s (1983) GOMS (Goals, Operators, Methods, and Selection rules) model for observations and evaluations of interactions, Plumlee and Ware’s (2006) general model for navigation-intensive information seeking, and Pirolli and Card’s (1999) model on information foraging. Pirolli and Card’s (1999) model was discussed in Section 2.1 in the context of examining the task of exploratory data analysis to derive design requirements for Session Viewer, a visualization tool to support web session log analysis.

On visual memory and geometric transformations

Of particular interest to evaluating multiple visual information resolution (VIR) interfaces are studies on visual memory and on perceptual costs in geometric transformations, a technique frequently employed in multiple-VIR interface design, and studies on visual salience, a subject closely related to low-VIR overview designs.

Several studies examined the roles of visual memory in interface design. An example is Robertson et al.’s (1989) Data Mountain.
Data Mountain makes use of our spatial memory and allows users to place thumbnails of documents at arbitrary positions on an inclined plane in a three-dimensional desktop virtual environment using a simple two-dimensional interaction technique. In a subsequent study, the researchers demonstrated that users could retrieve documents faster and with better accuracy than with the benchmark Microsoft Internet Explorer Favorites (Robertson et al. 1989), and the performance did not measurably degrade after six months (Czerwinski et al. 1999). While these studies demonstrated the use of spatial memory in interface design, they did not quantify perceptual limits applicable in designs.

Skopik and Gutwin (2005) looked at the effects of rectangular fisheye transformation on visual memory and found that distortions increased the time required to remember and find target nodes, but without affecting task accuracy. The researchers proposed and demonstrated the effectiveness of adding visual markers, called “visit wear”, of the places previously visited by users to offset distortion effects and improve navigation.

Several studies have looked at effects of geometric transformations. Shepard, Cooper, and Metzler examined the effects of mental rotation in a series of experiments where participants were asked to determine whether two geometric objects were identical but viewed from different perspectives (Cooper and Shepard 1973; Shepard and Metzler 1971). Their experimental results show a clear linear relationship between the angle of rotation and response times, suggesting that participants mentally rotate one of the stimuli to match the other. While these results suggest a mechanism for mental rotation, the experiments did not aim to study visual memory costs.

Two other studies measured perceptual costs of geometric transformations in visual search tasks using abstract images based on time and accuracy measurements.
Rensink (2004) found no measurable cost for translational shifts up to at least 2 degrees of visual angle, or 2 cm at a viewing distance of 55 cm. Performance was not measurably affected for rotations up to 17 degrees, but degraded sharply beyond that. Scaling was found to be invariant at a reduction factor of two, but created a measurable cost at a factor of four. In another series of experiments involving visual search on displays with nonlinear polar fisheye transformations, Lau et al. (2004) found that the studied transformations had significant time costs, with performance slowed by a factor of almost three under large distortions. Interestingly, the study did not find any benefits in adding background grids to their images. In fact, the researchers found that grids caused participants’ performance to slow down, suggesting that the grids only added to the perceptual noise.

However, these two experiments focused mostly on visual search. Even though visual search is a common component of many of the visual operations in information visualization (Ware 2004), other important factors are still at play. One of these factors is visual memory.

Our laboratory experiment in this thesis, reported in Chapter 5, contributes to these previous efforts by systematically quantifying visual memory costs in four types of geometric transformations: scaling, rotation, rectangular fisheye, and polar fisheye. The study also looked at if and how background grids can mitigate incurred visual memory costs. This study was conducted in a larger context of understanding low-VIR display creation and VIR integration in multiple-VIR interface design.

On visual salience

Visual salience is a broad topic and a vast amount of human vision research has been done to measure and to understand the phenomenon. In general, a visual object is salient when it attracts the user’s attention more than its neighbours, and is therefore easily detected (Landragin et al. 2001).
Since the study of visual salience has a long history in vision research, a comprehensive review of the literature is beyond the scope of this thesis; this section only includes work that is directly applied in interface design and evaluation.

To automatically evaluate interfaces, researchers have built predictive models to measure visual salience. Using plain character displays, Tullis (1985) identified six display characteristics that correlated with visual search times: (1) the overall density of characters on the display; (2) the local density of other characters near each character; (3) the number of distinct groups of characters; (4) the average visual angle subtended by those groups; (5) the number of distinct labels or data items; and (6) the average uncertainty of the positions of the items on the display. Recent efforts in measuring visual salience were based on image processing and statistics models; for example, Rosenholtz et al.’s (2005) Feature Congestion model measured display element saliency using color and luminance contrast to quantify display clutter.

In terms of mechanisms, one well-studied area is preattentive vision, or visual pop-out, where researchers have discovered a limited set of visual properties such as colour and motion that are detected very rapidly and accurately by the low-level visual system (e.g., Palmer 1999, p. 554–560; Treisman 1985). The information visualization community has incorporated much of this knowledge into its design guidelines for visual encoding. One example is using visual features that can be preattentively and individually processed to encode multi-dimensional data (Ware 2004, p. 151–156), such as texture and colours (Healey and Enns 1999) or motion (Huber and Healey 2005). Another application is to highlight visual objects on displays by encoding them with pre-attentively distinct visual symbols.
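The simplest of Tullis’s display characteristics listed above can be illustrated directly. The sketch below is a minimal illustration, not Tullis’s (1985) published procedure: it assumes the display is given as a list of text rows, takes overall density to be the fraction of character cells that are occupied, and uses whitespace-separated tokens as a rough proxy for the number of labels or data items. The function names are hypothetical.

```python
def overall_density(rows, width=None):
    """Fraction of character cells that hold a non-space character.

    Assumption: the display is a grid of len(rows) x width cells, where
    width defaults to the length of the longest row.
    """
    if width is None:
        width = max((len(r) for r in rows), default=0)
    total_cells = width * len(rows)
    if total_cells == 0:
        return 0.0
    filled = sum(1 for r in rows for ch in r if not ch.isspace())
    return filled / total_cells


def item_count(rows):
    """Rough proxy for the number of labels or data items:
    the count of whitespace-separated tokens on the display."""
    return sum(len(r.split()) for r in rows)


# Hypothetical character display
screen = ["NAME QTY", "bolt 12", "nut 7"]
print(overall_density(screen), item_count(screen))
```

The remaining characteristics (local density, grouping, and positional uncertainty) depend on definitions given in the original paper and are not reproduced here.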
Our efforts in applying measurements of visual salience in interface design reside in the area of low-VIR overview design. Typically, low-VIR overviews in multiple-VIR interfaces need to accommodate a large volume of data to allow users to select individual data items for further examination at higher detail. Such practice is in accordance with Shneiderman’s (1996) information-seeking mantra of “overview first, zoom and filter, then details on demand” (p. 337). Displaying a large number of visual objects on the low-VIR overviews can lead to visual clutter when the number of objects presented on the overview exceeds the perceptual capability of its users, thus rendering the overview ineffective. One proposed solution is attention filtering using colour and intensity coding to help users segregate their visual fields so that they can focus on regions that are pertinent for the task at hand. This approach has been verified in Yeh and Wickens’s (2001) study on designs of electronic map displays.

The third study in this thesis, detailed in Chapter 6, examined a different aspect of the overview problem. Instead of taking the decluttering approach, we investigated how visual objects could remain visually available to users without being overtly salient, as in the phenomenon of visual pop-out. We adapted two of Tullis’s (1985) display characteristics, the number of distinct groups of characters as visual complexity and the average visual angle subtended by those groups as visual span, to graphical displays and investigated the limits to which visual objects remained available for users to select for further detailed examinations. This study was conducted in a larger context of studying low-VIR overview creation and VIR arrangements in multiple-VIR interfaces.
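Visual span, like several of the perceptual limits quoted in this section, is expressed as a visual angle. As a minimal sketch (the function names are hypothetical, and the object is assumed to be viewed head-on and centred on the line of sight), the conversion between on-screen extent, viewing distance, and visual angle is:

```python
import math


def visual_angle_deg(size, distance):
    """Visual angle in degrees subtended by an object of extent `size`
    seen from `distance` (both in the same length unit)."""
    return math.degrees(2 * math.atan(size / (2 * distance)))


def extent_for_angle(angle_deg, distance):
    """Inverse conversion: extent that subtends `angle_deg` at `distance`."""
    return 2 * distance * math.tan(math.radians(angle_deg) / 2)


# 2 cm viewed from 55 cm, the figures quoted for translational shifts
print(round(visual_angle_deg(2.0, 55.0), 2))
print(round(extent_for_angle(2.0, 55.0), 2))
```

At a viewing distance of 55 cm, an extent of 2 cm subtends roughly 2 degrees, which is consistent with the figures reported above for Rensink (2004).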
3.1.2 Experimental-simulation studies to evaluate multiple-VIR techniques

While the vision science literature offers valuable advice to designers in their choice of visual encoding, the studies have generally focused on single static images. To better evaluate information visualization techniques, researchers have to consider interactivity. There exists a large number of experimental-simulation studies to evaluate visualization techniques (see Chen and Yu 2000 for a meta-analysis). This section focuses on studies that aim at evaluating multiple visual information resolution (VIR) techniques such as zooming, focus + context, and overview + detail. Since Chapter 4 details a summary synthesis of 19 existing studies in this area to better guide multiple-VIR interface design, this section only briefly lists related work in the general area of multiple-VIR interface studies. Instead, the focus here is evaluations of overview use, which is the subject of the third study in this thesis detailed in Chapter 6.

Although study results of multiple-VIR interfaces are sometimes characterized as mixed (e.g., in Nekrasovski et al. 2006), the situation becomes clearer when we categorize the studies. In cases where the task required multiple levels of the displayed data, study results generally show that multiple-VIR interfaces outperformed their high-VIR counterparts. Examples include Schaffer et al.’s (1996) network repair task where the answers involved links at all levels of a network, and Hornbæk et al.’s (2003) essay-writing task where participants were required to summarize the main points of an electronic document. In both cases, interfaces that showed multiple levels of details simultaneously were found to better support the study tasks than interfaces that showed single data levels.
In cases where the data set structure had only a single intrinsic level, multiple-VIR interfaces were found to be beneficial only when the low-VIR display provided perceivable details required by the task. For text, perceivability is simply readability, and unreadable text on low-VIR overviews does not enhance task performance. The situation is well illustrated by Baudisch et al.’s (2004) study on information searches on web documents. Their multiple-VIR interfaces displayed web documents with guaranteed legible keywords, but surrounding text was too small to read. Their single-VIR interface was a high-VIR scrollable web browser. The study demonstrated performance benefits for their multiple-VIR interfaces over the high-VIR browser, but only for some tasks. When the task only required reading the legible keywords, as in their Outdated task, their multiple-VIR interfaces outperformed their high-VIR browser. The superior performance supported by their multiple-VIR interfaces was probably due to participants’ ability to answer the task questions based on information displayed on the low-VIR displays alone. Since the low-VIR displays were considerably smaller than the high-VIR display, the low-VIR displays effectively concentrated task-relevant information. In contrast, when the task required reading text around these keywords, as in the Analysis task, having a low-VIR display did not result in performance benefits. One possible explanation is that participants did not find the low-VIR display useful and focused instead on the high-VIR displays.

Similarly, in North and Shneiderman’s (2000) study, their multiple-VIR interface had a low-VIR overview that displayed the names of geographic states in the United States. These names acted as hyperlinks for relevant passages in the high-VIR view that provided detailed census information for these states. Again, their single-VIR interface was a high-VIR scrollable browser.
The study found that when the answers were available on the low-VIR overview, participants did not need the high-VIR view as their performance was not affected by the lack of interactive coordination between the low- and the high-VIR displays. In cases where the tasks required information that was only available on the high-VIR view, interactive coordination was crucial as participants used the low-VIR hyperlinks as shortcuts to reach relevant passages in the high-VIR view.

Another example is Hornbæk and Hertzum’s (2007) study, which investigated visualizations for large numbers of menu items. Their Multifocus interface provided larger numbers of readable menu items in the low-VIR regions based on a priori significance, while the other interface implemented Bederson’s (2000) Fisheye Menu and displayed unreadable items in the low-VIR regions at the extreme ends of the menus. Even though the study failed to find differences between these two interfaces in terms of participant performance, satisfaction ratings or subjective preference, eye-tracking results suggested that participants used the low-VIR regions more frequently in the Multifocus interface trials. The researchers thus questioned whether using screen space to display unreadable text is beneficial.

For non-textual graphic displays such as geographic maps, one study has demonstrated the costs of ineffective low-VIR overviews. Hornbæk et al.’s (2002) study found that having a low-VIR overview resulted in slower performance times and worse recall accuracy for their Washington map trials, and their Montana map trials had generally poor performance results.
Their results suggest that the failure of the overviews was partly due to insufficient details provided to support their study tasks: the Montana map itself was single-level and did not offer enough meaningful map contents at low VIRs to guide region selections, and the Washington map display did not show enough details at the overview level to support their tasks.

Given the delicate balance between conciseness and perceivability of visual objects on low-VIR views, the third study in this thesis, detailed in Chapter 6, was conducted to study perceptual requirements for low-VIR graphical visual targets to be reliably accessible to users of multiple-VIR interfaces.

3.1.3 Field experiments to understand visualization system use in the field

While experimental strategies can be effective in evaluating specific visualization techniques, such approaches fall short in evaluating visualization systems, as the number of factors that may influence system use and reception is large and, in many cases, unpredictable (Shneiderman and Plaisant 2006). Also, field experiments can study interface-use questions in ecologically-valid settings. This section therefore focuses on evaluations of whole visualization systems using field experiments.

To the best of our knowledge, only four sets of field experiments were conducted to look at visualization system use in exploratory data analysis. With expert meteorological forecasters analyzing data provided by the researchers, Trafton et al. (2000) found that users tended to be goal-directed when dealing with large amounts of data and mainly extracted qualitative information from visualizations. González and Kobsa (2003b) reported two studies on InfoZoom with five office workers and found that even though InfoZoom provided benefits in creative discovery, the stand-alone tool was not integrated into participants’ daily analysis routine in the long run (González and Kobsa 2003a). Saraiya et al.
(2004) compared the number of insights generated using five visualization tools for microarray data: Clusterview, TimeSearcher, Hierarchical Clustering Explorer, Spotfire, and GeneSpring. Their study found that these tools did not adequately link the data to biological meaning, that different visualization tools resulted in different kinds of insights, and that ineffective interaction mechanisms severely reduced tool usability. This initial study was followed by a more in-depth longitudinal study (Saraiya et al. 2006) with participants using their own data to address the motivation issue. The longitudinal study reported how data analysts used a combination of tools in their analysis processes and further examined the issue of interaction in these tools. Seo and Shneiderman (2006) evaluated the Hierarchical Clustering Explorer with three case studies and an e-mail user survey.

Two studies looked at other uses of visualization systems. Bellamy et al. (2007) reported the design and pilot deployment of a visualization to monitor and manage compliance processes, and concluded that diagnostic visualizations should provide an integrated view of all required information at sufficient detail. Biehl et al. (2007) evaluated FASTDash, a visualization to improve team activity awareness. Their field experiment found that FASTDash improved team awareness, reduced reliance on shared artifacts, and increased project-related communication.

The fourth study in this thesis, detailed in Chapter 8, contributes to previous efforts in studying visualization systems in ecologically-valid settings. The chapter details a field evaluation of Session Viewer, a visualization tool to support web session log analysis.
Our efforts were directed at examining particular design choices made during the creation of Session Viewer such as overview use and spatial arrangements of various VIRs, and at studying design issues unique to visualization use and deployment in the workplace in general.

3.1.4 Systematic reviews to summarize existing visualization study results

No single user study can provide a complete picture of multiple-VIR interface use due to the large number of factors involved. Similar to the idea that groups composed of independent thinkers tend to be more accurate in their conclusions (Surowiecki 2004), systematic reviews can potentially produce more accurate views of existing research questions than any individual study.

To the best of our knowledge, two systematic reviews have been conducted to study visualization systems. The first is Chen and Yu’s (2000) meta-analysis. Based on 35 experimental studies published between 1991 and 2000, the researchers isolated six studies that satisfied their selection criteria. They found two broad types of causal relationships: (1) effects of visual-spatial interfaces on information retrieval, and (2) effects of cognitive ability of users on information retrieval. The researchers concluded that for users with the same level of cognitive abilities, simpler visual-spatial interfaces tended to result in better task performance.

The second systematic review is Hundhausen et al.’s (2002) meta-study of the effectiveness of algorithm visualizations (AV). Due to the diversity of the 24 experimental studies under analysis, the researchers decided against a general statistical meta-analysis. Instead, the analysis employed a vote-counting approach within groups defined by the dependent and independent variables in the studies. The meta-study found that how students used the AV technology was more important than what the visualization showed.
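Vote counting itself is straightforward to sketch. The following is an illustration only, not the procedure used in the meta-study: the group labels, outcome categories, and study records are invented for the example.

```python
from collections import Counter, defaultdict


def vote_count(studies):
    """Tally study outcomes within each group.

    `studies` is an iterable of (group, outcome) pairs, where outcome is
    a label such as 'positive', 'negative', or 'nonsignificant'.
    """
    tallies = defaultdict(Counter)
    for group, outcome in studies:
        tallies[group][outcome] += 1
    return {group: dict(counts) for group, counts in tallies.items()}


def majority(tally):
    """Outcome with the most votes, or None if first place is tied."""
    ranked = Counter(tally).most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Hypothetical study records, for illustration only
studies = [
    ("group A", "positive"),
    ("group A", "positive"),
    ("group A", "nonsignificant"),
    ("group B", "nonsignificant"),
    ("group B", "positive"),
]
tallies = vote_count(studies)
for group, tally in tallies.items():
    print(group, tally, majority(tally))
```

Unlike a statistical meta-analysis, this tally ignores effect sizes and sample sizes, which is why it suits a set of studies too diverse to pool.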
Two other reviews can also be included in this category, even though they did not directly examine visualization systems. Tversky et al. (2002) provide a narrative review to identify scenarios in which using animation in education is effective. The paper also explains the reasons behind ineffective use of animation based on cognitive principles of congruence and apprehension. Hornbæk (2006) reviewed 587 papers and included 180 to review usability measures employed in human-computer interaction research.

The summary synthesis in this thesis, detailed in Chapter 4, adds to their efforts and focuses on extracting design guidelines for multiple-VIR interfaces based on 19 existing experimental-simulation studies.

3.2 Visual Analytics

The term visual analytics was coined by Thomas and Cook (2005) to represent “the science of analytical reasoning facilitated by interactive visual interfaces” (p. 4). The field of visual analytics researches techniques in analytical reasoning, visual representations and interaction, and data representations and transformation to facilitate exploration and understanding of large data sets, as well as to produce, present and disseminate analysis results.

Given the vast scope of the topic, this section selectively reviews systems and techniques developed for visual data analysis and exploration, many of which influenced and inspired the design of Session Viewer, a visualization tool to support web session log analysis presented in Chapter 7. As web session log analysis is the application domain of the thesis, this review focuses on visualizations that support exploratory analysis of web session logs in particular, and computer-based log analysis in general. To complete the discussion on session log analysis, Section 3.2.5 briefly explores non-visual approaches to web session log analysis.
3.2.1 Existing visualizations for data exploration

Many visualization systems have been developed to support general data exploration (Keim 2002), including commercial ventures such as Spotfire (spotfire.com), Tableau (tableausoftware.com), and Inxight (inxight.com). While these systems support exploration, their visualizations are typically standard graphical displays of single-level data, where each data attribute is a single value. These systems are not tailored for showing multi-level data such as web session logs, where at least one of the data attributes is also a multi-dimensional data object.

A wide range of visualization and interaction techniques have been developed to facilitate visual data exploration and analysis. This section focuses on visual techniques developed for data display, which can be classified as (1) multiple data-form displays showing the same data in multiple representations, (2) multiple data-level displays showing data at multiple levels of organization (or multiple visual information resolutions), and (3) multiple data-dimension displays showing multiple attributes for each datum.

1. Multiple data-form display

Analysis often requires viewing the same data in different forms, for example, in linear and logarithmic scales. Roberts (2000) advocated a technique called Multiform to provide different representations, or forms, of the same data to allow analysts to view the data in a more multi-faceted manner. The rationale behind Multiform is that different visualization techniques highlight different aspects of the data, and displaying multiple representations of the same data may increase the likelihood of knowledge discovery, as discussed in Section 2.3.2.

A related idea is to provide different views of the same data in the same form. One example is the reorderable matrix, an idea first introduced by Bertin (1981) and implemented in Table Lens (Rao and Card 1994).
The rationale is that reordering visual data displays based on data attributes can reveal outliers, correlated features and trends in sample populations.

For data populations, the technique of small multiples provides a means to compare between populations. The idea was first proposed by Bertin as collections (Bertin 1981), and further advocated by Tufte (1983). Different forms of small multiples have been arranged in rows and columns to create univariate and bivariate matrices (e.g., MacEachren et al. 2003).

Techniques developed for multiple data-form display can be used to create low-VIR displays in multiple-VIR interfaces. For example, we adapted the techniques of small multiples and reorderable matrix to create a sessions-level low-VIR overview in Session Viewer (Chapter 7).

2. Multiple data-level display

In cases where the total number of data points far exceeds the display capability, techniques such as zooming, focus + context, and overview + detail have been used as solutions. The central idea behind these techniques is to display data at multiple visual information resolutions such that users can follow the sequence of “overview first, zoom and filter, then details on demand” as suggested by Shneiderman (1996, p. 337). While the detailed data level may simply be the highest data resolution available, creating the overview requires data filtering, clustering, or visual compression of overview visual objects.

Taking the approach of data filtering, Furnas (1986) proposed the degree-of-interest function based on a priori significance of the data objects and their distance relative to the object under inspection, or the object in focus. The amount of display emphasis for data objects is proportional to their degree-of-interest values. Data objects far away from the focus objects are de-emphasized on the display, for example by being displayed with fewer pixels, or not displayed at all.
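One common way to write the degree-of-interest function, used here as an assumption rather than a restatement of Furnas’s exact formulation, is DOI(x | focus) = API(x) - D(x, focus), where API is the a priori importance and D is the distance from the focus. The sketch below applies it to a hypothetical document outline, taking API to be the negated depth in the tree and D to be the tree path length. Objects whose degree of interest falls below a threshold are dropped from the display.

```python
def path_to_root(node, parent):
    """Nodes from `node` up to the root, inclusive."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def tree_distance(a, b, parent):
    """Number of edges on the path between a and b in the tree."""
    steps_from_a = {n: i for i, n in enumerate(path_to_root(a, parent))}
    for steps_from_b, n in enumerate(path_to_root(b, parent)):
        if n in steps_from_a:
            return steps_from_a[n] + steps_from_b
    raise ValueError("nodes are not in the same tree")


def degree_of_interest(node, focus, parent):
    """A priori importance minus distance from the focus.
    Assumption: a priori importance is the negated depth of the node."""
    api = -(len(path_to_root(node, parent)) - 1)
    return api - tree_distance(node, focus, parent)


def visible_nodes(focus, parent, threshold):
    """Nodes whose degree of interest is at or above the threshold."""
    return [n for n in parent
            if degree_of_interest(n, focus, parent) >= threshold]


# Hypothetical document outline: child -> parent
parent = {
    "doc": None,
    "ch1": "doc", "ch2": "doc",
    "s1.1": "ch1", "s1.2": "ch1",
    "s2.1": "ch2",
}
print(visible_nodes("s1.1", parent, threshold=-3))
```

With the focus on section s1.1 and this threshold, only the focus and its ancestors remain, which is the familiar fisheye outline behaviour. Lowering the threshold progressively reveals siblings and more distant sections.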
Jakobsen and Hornbæk (2006) further extended the distance parameter in the degree-of-interest function by separating it into syntactic and semantic distances. If the data is inherently hierarchical, the a priori function can reflect the data structure and the overview can display the highest level of the hierarchy allowed by available space. For example, Hornbæk and Frokjær (2001) developed an overview for electronic documents by displaying only the section and subsection headers in their low-VIR overview window.

Another approach to data filtering involves user interaction. Ahlberg and Shneiderman (1994) proposed tight coupling of dynamic query filters to selectively reduce the number of data points on a scatter plot such that the analyst can focus on a subset of the larger data.

In terms of data clustering, van Wijk and van Selow (1999) illustrated the use of pre-processing data to create a more visually manageable overview using a year-long time-series data set with 52,560 data points. Based on the results of a cluster analysis, their interface only displayed averaged values of the clusters instead of all the data points.

In terms of visual compression of overview objects, Kincaid and Lam (2006) developed a more spatially compact line-graph encoding to display a large collection of line graphs. Instead of encoding both the x- and the y-dimension of line graphs by spatial positions, their visual encoding uses colour to encode the y-dimension, thus reducing the amount of space required to display each line graph. Their visual encoding also enables stacking of line graphs to avoid the inevitable visual cluttering in the overlay alternative.

Considerations in creating overviews constitute one of the design discussions in this thesis.
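The space saving from encoding the y-dimension by colour can be sketched as follows. This is an illustration under simplifying assumptions, not Kincaid and Lam’s implementation: values are quantized into a small number of bins on a shared scale, and a character ramp stands in for a colour ramp, so that each line graph occupies a single row and graphs can be stacked.

```python
RAMP = " .:*#"  # stand-in for a light-to-dark colour ramp (assumption)


def colour_encode(series, lo, hi, levels=len(RAMP)):
    """Quantize each y-value into one of `levels` bins on the scale [lo, hi]."""
    if hi == lo:
        return [0] * len(series)
    bins = []
    for y in series:
        frac = (y - lo) / (hi - lo)
        # Clamp so that y == hi falls into the top bin
        bins.append(min(levels - 1, max(0, int(frac * levels))))
    return bins


def render(graphs):
    """One row of ramp characters per line graph, on a scale shared by all."""
    lo = min(min(g) for g in graphs)
    hi = max(max(g) for g in graphs)
    return ["".join(RAMP[b] for b in colour_encode(g, lo, hi)) for g in graphs]


# Hypothetical line graphs sampled at the same x positions
graphs = [[0, 1, 2, 3, 4], [4, 3, 2, 1, 0], [2, 2, 2, 2, 2]]
for row in render(graphs):
    print(repr(row))
```

Each graph now needs one row of cells regardless of its value range, at the cost of reading values from colour rather than from position.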
In addition to creating the overview, displaying multiple levels of data involves considerations such as the number of display resolutions required and spatial arrangements of the different visual resolutions. These issues are explored in Chapter 4, our summary synthesis of existing study results, and in Chapters 7 and 8, where we detail Session Viewer’s design and deployment.

3. Multiple data-dimension display

Most two-dimensional data plots and graphs can only display two variables at a time. Viewing all data dimensions using these displays thus requires multiple plots. Some visualizations work around this limitation by allowing users to dynamically select the data attributes displayed, for example, Ward’s (1994) XmdvTool and Stolte and Hanrahan’s (2000) Polaris.

Other visualizations strive to simultaneously display as many data dimensions as possible. Parallel Coordinates (Inselberg 1985) takes such an approach and displays data points in n-dimensional space as polylines with vertices on the parallel axes. Dense-pixel displays (Keim and Kriegel 1994) map each data dimension value to a colored pixel and group the pixels belonging to each dimension into adjacent areas. Display arrangements of data provide detailed information on local correlations, dependencies and areas of interest of the data set. Since most dense-pixel displays use one pixel for each data dimension value, this approach maximizes display capacity of output devices. The third common approach to display multi-dimensional data is glyphs (Littlefield 1983). A glyph is a graphical object that combines multiple visual features in a single object, with each feature encoding a data dimension. Many examples of glyphs have been developed, such as the whisker plot where each data value is represented by a line segment radiating out from a central point (Ware 2004, p.
184), and the Chernoff Faces, a more whimsical example where different data dimensions are mapped to the sizes and shapes of different facial features (Chernoff 1973). Another approach to display multi-dimensional data is to reduce the high- dimensional data space to a displayable two, or three, dimensional space by multidimensional scaling. One example is Wise et al.’s (1995) ThemeScape to display text-based documents. Even though the distance relationships between the data points are preserved in multidimensional reduction, how each dimen- sion in the high-dimensional data space relates to the displayed low-dimensional space is not clear, thus making it difficult to understand individual data dimen- sions. Multidimensional scaling is therefore used to show data clusters, rather than to read off individual data dimension values. 3.2.2 Existing interactions for data exploration Our discussion of visual analytic related work continues with interaction tech- niques that were developed for (1) data querying and filtering, and (2) data-view coordination. These works influenced the design of Session Viewer. 1. Data filtering and querying Analysts often need to isolate interesting subpopulations in large data sets to focus analysis. Dynamic query filtering allows users to progressively refine filter criteria aided by visual feedback of the results (Ahlberg and Shneiderman 1994). For data querying, pattern matching allows analysts to highlight sequences, as implemented in TimeSearcher for time-series data (Hochheiser and Shneiderman 2003). Session Viewer employed both dynamic querying and pattern matching to filter and highlight sessions (Chapter 7). 2. Data-view coordination View coordination is a well-known problem found in situations where informa- tion is displayed over multiple views and users need to relate visual objects between views. 
Smooth animation has been proposed as a solution to connect the different temporal views in zooming interfaces to preserve object constancy (Robertson et al. 1989). Interactive brushing and linking have been proposed to visually relate simultaneously displayed regions or views in interfaces. Brushing is a rapid interaction technique that enables a user to “highlight, select, or delete a subset of elements being graphically displayed by pointing at the elements with a mouse or other suitable input device” (Ward 1994, p. 330). Linking is when “brushing elements in one view affects the same data in all other views” (Ward 1994, p. 330). North and Shneiderman (1997) constructed a taxonomy of multiple window coordination based on navigation and selection.

Coordination has been shown to facilitate the use of overview as a navigational aid when the different resolutions are needed in the task, since it provides “the ability to directly select a target in the overview to immediately locate its details”, and the overview thus acts as “an improved scroll bar that facilitates exploration” (North and Shneiderman 2000, p. 736). Coordination between multiple views also allows selection of the same data object in multiple views, which helps users relate the different views and allows concurrent considerations of all the views in their data exploration. Session Viewer took the multiple view approach to display various levels of session details coordinated by linking (Chapter 7).

3.2.3 Visualizations for web session logs

Visualizations have been used to display aggregate data derived from session logs for presentation or analysis. For example, Pass et al. (2006) surveyed traditional graphical plots to describe and evaluate search services. Despite having analysis goals similar to those of our tool, these static plots are more suited for presentation than exploration and discovery.
While interactive systems designed for session log analysis exist, they generally focus on website design evaluations based on traffic and user paths, rather than on search usage behaviours. Examples of website traffic visualizations include disk-tree and time-tube visualizations (Chi 2002) and a 3D structure (Wong and Marden 2001). User paths are often displayed as node-link graphs, as in VISVIP (Cugini and Scholtz 1999), WebViz (Pitkow and Bharat 1994), and WebQuilt (Waterson et al. 2002). Lee et al. (2001) took a different approach, displaying web traffic statistics with starfields and user paths with parallel coordinates. Hochheiser and Shneiderman (2001) used a multiple-coordinated visualization to show web visitation data. Other visualizations, such as 3DWebPath (Frecon and Smith 1998) and History tree (Kreuseler et al. 2004), displayed personal web-navigation histories and were designed to help users navigate rather than to analyze their usage behaviours.

Generally in these analyses, analysts tend to look for different paths through a fixed set of web pages. For example, usability engineers use VISVIP to study how experimental participants use a single website to accomplish study tasks (Cugini and Scholtz 1999). In contrast, the goals of analyzing web session logs to study search behaviours are different. For example, Kellar et al. (2007) aimed to study four types of information-seeking activities. In that case, their participants could potentially go through an infinite number of pages across a large number of site domains. Visually depicting all user paths as trees or graphs is therefore challenging, and consequently, existing systems do not adequately address the needs of web search analysis.
The one exception is Card et al.’s (2001) Web Behavior Graphs, which show search structures of individual users as modified state diagrams to help researchers locate problem spaces within the web site under analysis, and identify usage behaviour patterns. Despite the richness of the information and insights obtained from the analysis, Card et al.’s (2001) approach is difficult to scale.

3.2.4 Visualizations for computer-based logs

An area highly related to web session log visualization is usability log visualization. Gray et al. (1996) used the coloured Bar Visualization to show usability sessions, with each bar consisting of a stack of colour-coded boxes encoding user activity types. However, the system does not allow comparison between populations or displays at multiple levels of detail.

Many visualizations associated with usability log analysis use two- and three-dimensional graphs. Examples include population counts and summary statistics of events (Kay and Thomas 1995) and mouse activities (Guzdial et al. 1994), sequence patterns displayed as state transition diagrams to support Markov-based analysis (Guzdial et al. 1993), and a spreadsheet-like display for sequence alignment (Sanderson and Fisher 1994). These graphs are better suited for presenting analysis results than for data exploration, especially since many of them have to be generated manually.

There are also systems designed for computer log analysis, such as MieLog, which displays textual log entries as colour-coded bars based on their type (Takada and Koike 2002), AuthorLines, which shows e-mail participation and initiation counts based on author (Viégas and Smith 2004), and SnortView, which shows network-based intrusion detection system logs as two-dimensional time diagrams (Koike and Ohno 2004).
However, since the analysis goals are typically detecting anomalies rather than identifying and characterizing populations, these systems are not well-suited for exploratory session log analysis.

3.2.5 Non-visual log analysis tools

Commercial statistics packages are frequently used for web log analysis, for example, Microsoft Excel (microsoft.com) and SPSS (spss.com). Analysts also build custom programs, from simple scripts to calculate summary statistics to elaborate algorithms to find usage patterns and population clusters (e.g., Pierrakos et al. 2003). For usability log analysis, systems such as MacSHAPA offer integrated support with a largely text-based interface (Sanderson and Fisher 1994).

3.2.6 Summary of Visual Analytics Related Work

Even though we surveyed a large body of visualization systems that may be used for web session log analysis, we decided to build our own visual analytic tool for three reasons. First, we found that generic visual data exploration systems were not tailored to showing our multi-level web session log data. Also, most visualizations built for web-log analysis were designed to display user paths of a limited number of webpages, and were thus inadequate in supporting analysis of web search behaviours in general that may involve a potentially unlimited number of webpages. Second, even though non-visual approaches to web session log analysis exist and are popular amongst analysts, they tend to focus on confirmatory analysis rather than exploratory data analysis, which is the focus of the thesis. Third, by building a system, we could apply our knowledge and experience gained in our first three evaluations in the design (Chapter 7) and examine our choices in an evaluation of the resultant system (Chapter 8).
Chapter 4

Summary Synthesis: A Study-based Guide to Multiple Visual Information Resolution Interface Designs

In this review, we analyzed 19 existing multiple-VIR interface studies to get a clearer snapshot of the current understanding of multiple-VIR interface use, and how to apply this knowledge in their design. To unify our discussion, we grouped the interfaces into single or multiple-VIR interfaces. For single-VIR interfaces, we looked at the hiVIR interface that shows data in detail and at the highest available VIR, for example, the “detail” in overview + detail interfaces. We considered three multiple-VIR interface types in this review: temporal, or temporal switching of the different VIRs as in zooming interfaces; separate, or displaying the different VIRs simultaneously but in separate windows as in overview + detail interfaces; and embedded, or showing the different VIRs in a unified view as in focus + context interfaces. Section 1.1.1 includes a more detailed explanation of our terminology. Since most of the existing multiple-VIR interface studies did not explicitly consider user characteristics such as visual-spatial ability, we did not address this important issue in our discussion.

To better guide design processes, this chapter is structured as a decision tree to create a multiple-VIR visualization, as shown in Figure 4.1.

Figure 4.1: Decision tree to create a multiple visual information resolution display. There are four major steps in the decision process, each covered in a section in the chapter: (Decision 1/Section 4.3) Decide if multi-VIR is appropriate for the application; (Decision 2/Section 4.4) Decide on the number of resolutions, amount of data and visual information to be displayed on the low VIRs; (Decision 3/Section 4.5) Decide on the methods to display the multiple VIRs; (Decision 4/Section 4.6) Decide on the spatial layout of the multiple VIRs. Considerations at each decision point are listed with their respective section numbers.

Our decision tree has four major steps:

DECISION 1 (Section 4.3): Single- or multiple-VIR interface

The first step in the process is to decide if a multiple-VIR interface is suitable for the task and data at hand. The choice is not obvious as multiple-VIR interfaces typically have more complex and involved interactions than their single-VIR counterparts. Section 4.3.1 discusses interaction costs reported in the reviewed studies. Section 4.3.2 discusses considerations in using multiple VIRs to display single-level data.

DECISION 2 (Section 4.4): Create the low VIRs

If the designer decides to use a multiple-VIR interface for the data, the next step in the design process is to create the low VIRs, which is a challenge with large amounts of data (Keim et al. 2006). In addition to the technical challenges in providing adequate interaction speed and in fitting the data onto the display device, the designer also needs to consider the appropriate levels of visual resolution provided by the interface. Study results indicate that providing too many levels of resolution may be distracting to users, as discussed in Section 4.4.1. Similarly, showing too much data in the low VIRs can also be distracting, as discussed in Section 4.4.2. In many cases, the data may have to be abstracted and visually abbreviated to increase the display capability of the low VIRs.
Ellis and Dix (2007) provide a taxonomy of clutter reduction techniques that include sampling, filtering, and clustering. Section 4.4.3 discusses cases where designers had gone too far in their abstractions and study participants could no longer use the visual information in the low VIRs. Instead of abstraction, the designer could choose to selectively display or emphasize a subset of the data in the low VIRs, for example, based on the generalized fisheye degree-of-interest function (Furnas 1986). However, study results suggest that a priori automatic filtering may be a double-edged sword, as discussed in Section 4.4.4. Given all these considerations, we complete the discussion by re-examining the roles of low VIRs in Section 4.4.5 to help ground low-VIR design.

DECISION 3 (Section 4.5): Simultaneous or temporal display of the VIRs

Once the VIRs are created, the designer then needs to display them, either simultaneously as in the embedded or the separate interfaces, or one VIR at a time as in the temporal interfaces. Generally, temporal displays require view integration over time and can therefore burden short-term memory (Furnas 2006). On the other hand, simultaneous-VIR interfaces have more complex interactions such as view coordination in separate displays and the issue of image distortion frequently found in embedded displays. Our reviewed studies found that for tasks that did not involve multi-level answers, or tasks that did not provide multi-level clues to single-level answers, displaying data with simultaneous multiple-VIR interfaces was not beneficial. Sections 4.5.1 and 4.5.2 consider the case when the study tasks did not require simultaneous display of VIRs, as in single-level answers with single-level clues.

DECISION 4 (Section 4.6): Embedded or separate display of the VIRs

If the choice is simultaneous display, the designer then has to consider the spatial layout of the different VIRs.
The choices are to display the VIRs in the same view, as in the embedded interfaces, or to show them in separate views, as in the separate interfaces. Both spatial layouts involve tradeoffs: the embedded displays frequently involve distortion, as discussed in Section 4.6.1, and the separate displays involve view coordination.

For each of these decision points, we summarized current beliefs and assumptions about multiple-VIR interface use, along with relevant study results. We also flagged situations where study results did not clearly support our previous beliefs based on existing literature.

4.1 Methodology

Ideally, we would like to perform a meta-analysis to translate results from different studies to a common metric and statistically explore relationships between study characteristics and study results, as a meta-analysis is more objective, thorough and systematic than qualitative approaches. However, because the reviewed studies differ in their implementations of the various multiple-VIR techniques, their study tasks and their data, and in some cases, their experimental design and measurements, a meta-analysis may only be able to include a very small subset of existing studies. Indeed, only 6 of the 35 studies considered by Chen and Yu (2000) met their criteria for their meta-analysis, and the researchers had a long list of recommendations to visualization evaluators to standardize their study designs. Some of their recommendations, echoed by others (e.g., Plaisant 2004; Ellis and Dix 2006), are still active areas of research. One example is to create standardized task taxonomies for interface evaluations (e.g., Winckler et al. 2004; Valiati et al. 2006).

Perhaps a compromise worth making is to take a more qualitative, albeit less rigorous, approach to extract high-level themes from existing study results. That is the approach we took to extract design guidelines for multiple-VIR interfaces in this study.
Instead of comparing between studies, we focused on pairwise-interface comparisons within each study to abstract generalizable usage patterns based on task, data characteristics, and interface differences.

We collected an initial set of candidate papers by performing keyword searches on popular search engines (Google and Google Scholar) and large academic databases (ACM and IEEE digital libraries), along with our own collection of study publications accumulated over the years. From this initial set, we located further study publications based on citations of the initial set. During the course of our synthesis, we continuously added new publications.

Since our goal was to understand multiple-VIR interface use, we differed from most systematic reviews as we did not have specific questions in mind when we began our review. Instead, we took a bottom-up and qualitative approach to find emergent themes from coded individual study findings. We started our process by first coding the papers based on the interfaces studied, as shown in Table 4.1, and the major study findings, as shown in Appendix B. We focused on objective measures of task time and accuracy since these measures were reported in all user studies we sampled. We then gathered study findings for each interface pair (e.g., hiVIR and temporal) to identify possible underlying reasons that may explain study results. We considered the interfaces (e.g., visual elements, interactions), the data displayed (e.g., level of organization details, levels of data), the tasks (e.g., task nature), and explanations provided by paper authors based on their observations and understanding of their studies. We therefore labeled these interface-pair findings, along with possible explanations, as considerations in design. We organized these considerations into a four-point decision tree, which became the framework of our review (Figure 4.1).
For some comparisons, we could not abstract general results from the studies, and we explained our reasons for excluding these interface pairs when appropriate. Since many studies looked at more than two study interfaces, their study results were mentioned in more than one section of the chapter.

Our approach therefore may suffer from reviewer bias in our study inclusion and in the emphasis we put on various study results. To ensure objectivity, or at least to convey to our readers the basis of our claims, we listed the studies we considered in each of the design considerations. Given that we collected only 19 papers, we believe explaining each set of study results qualitatively instead of attempting statistical meta-analysis would provide a more encompassing snapshot of our collective knowledge on multiple-VIR use. While we did count the number of studies that produced statistically significant results that support the claim for each design consideration, we did not take the vote-counting approach in systematic reviews (see the terminology section in 1.1.2), as we did not base our findings on these numbers. Instead, we considered each study publication individually to identify evidence that might either support or refute our findings, taking into consideration possible explanations for study results regardless of whether they achieved statistical significance. In fact, we put more emphasis on the researchers’ insights reported in their publications than on statistical results. Section 4.9 discusses other limitations of our review.

4.2 Summary of Studies

Table 4.1 lists the 19 studies reviewed, along with our encoding of the test interfaces as hiVIR, temporal, separate and embedded. Screen captures of study interfaces are available at http://www.cs.ubc.ca/∼hllam/res ss interfaces.htm.
In order to provide a reasonably concise review, we excluded studies where study results did not differentiate between study interfaces in terms of performance measures or usage patterns (e.g., Buring et al. 2006). We considered all the interfaces in the reviewed studies except for the Saraiya et al. (2005) study, since their two “Multiple View” interfaces displayed the same data in separate views at the same VIR, but used a different graphical format. Since our review focused on multiple-VIR interfaces, we considered the issue of multiple presentation forms to be beyond the scope of our review.

Since this study aimed to provide an evidence-based guide to designers in using multiple-VIR interfaces, and not a review paper on existing multiple-VIR study results, we only provided enough study details to illustrate our points so as to maintain readability. For reference, Appendix B.1 provides brief summaries of each study, and Appendix B.2 lists the interfaces, tasks, data, and significant results for each of the reviewed papers. For each design consideration, we listed studies included for the analysis. Each paper is designated with an identification letter which is used in subsequent tables. Please note that Hornbæk et al.’s online document study was reported in two papers: Hornbæk and Frokjær (2001) and Hornbæk et al. (2003). For completeness, we also included our overview-use study in this review. Chapter 6 details the study, and references to the study in this review are denoted as (Lam et al. 2007) for consistency.

ID | Authors | Paper Title | Sgl: H / Multiple: T E S
A | Baudisch et al. (2002) | Keeping things in context: a comparative evaluation of focus plus context screens, overviews, and zooming | x x x
B | Baudisch et al. (2004) | Fishnet, a fisheye web browser with search term popouts: a comparative evaluation with overview and linear view | x x x
C | Bederson et al. (2004) | DateLens: a fisheye calendar interface for PDAs | x x
D | Gutwin and Skopik (2003) | Fisheye views are good for large steering tasks | x x
E | Hornbæk and Frokjær (2001) | Reading of electronic documents: the usability of linear, fisheye and overview + detail interfaces | x x x
E | Hornbæk et al. (2003) | Reading patterns and usability in visualization of electronic documents | x x x
F | Hornbæk et al. (2002) | Navigation patterns and usability of zoomable user interfaces with and without an overview | x x
G | Hornbæk and Hertzum (2007) | Untangling the usability of Fisheye menus | x x x
H | Jakobsen and Hornbæk (2006) | Evaluating a fisheye view of source code | x x
I | Lam and Baudisch (2005) | Summary Thumbnails: readable overviews for small screen web browsers | x x
J | Lam et al. (2007) | Overview use in multiple visual information resolution interfaces | x x x
K | Nekrasovski et al. (2006) | An evaluation of pan and zoom and rubber sheet navigation | x x x
L | North and Shneiderman (2000) | Snap-together visualization: can users construct and operate coordinated visualizations? | x x
M | Pirolli et al. (2003) | The effects of information scent on visual search in the hyperbolic tree browser | x x
N | Plaisant et al. (2002) | SpaceTree: supporting exploration in large node link tree, design evolution and empirical evaluation | x x
O | Plumlee and Ware (2006) | Zooming, multiple windows, and visual working memory | x x
P | Saraiya et al. (2005) | Visualization of graphs with associated timeseries data | x x
Q | Schafer and Bowman (2003) | A comparison of traditional and fisheye radar view techniques for spatial collaboration | x x
R | Schaffer et al. (1996) | Navigating hierarchically clustered networks through fisheye and full-zoom methods | x x
S | Shi et al. (2005) | An evaluation of content browsing techniques for hierarchical space-filling visualizations | x x

Table 4.1: Multiple-VIR studies reviewed. An “x” in a cell denotes that the study included an interface of the corresponding type: Sgl = Single; H = HiVIR; T = Temporal; E = Embedded; S = Separate. Note that Lam et al. (2007) is the third study in this thesis, reported in Chapter 6.

4.3 Decision 1: Single or Multiple-VIR Interface?

The first step in our design decision tree is to decide if a multiple-VIR interface is appropriate for the task and data at hand. To isolate situations where the additional low VIRs were found to be useful, we looked at studies that compared the single-VIR hiVIR interfaces to the three multiple-VIR interfaces: temporal, embedded, and separate.

It is generally believed that interfaces should provide more than one VIR (e.g., Card et al. 1999, p. 307). However, for users, having the extra VIRs means more complex and difficult VIR coordination and integration, which may be time consuming and require added mental and motor efforts. The topic of interaction costs in multiple-VIR interfaces is further discussed in Section 4.3.1.

Interaction costs may be justified if lower VIRs provided in addition to the basic hiVIR interfaces are useful to users. In general, usefulness of the additional lower VIRs hinges upon the levels of data structure required by the task. In other words, single-level data may not be suited for multiple-VIR display, as discussed in Section 4.3.2.

4.3.1 Consideration 1: multiple-VIR interface interaction costs should be considered

Interaction complexity can be difficult to measure and isolate. Commonly used objective measurements such as performance time and accuracy are aggregate measures and cannot be used to identify specific interaction costs incurred in interface use.
In ten of our reviewed papers, researchers recorded usage patterns, participant strategies, and interface choice that revealed interaction costs (Table 4.2).

Source | Papers
Usage patterns (eye-tracking records) | G. Hornbæk and Hertzum (2007)
 | M. Pirolli et al. (2003)
Usage patterns (navigation-action logs) | E. Hornbæk et al. (2003)
 | F. Hornbæk et al. (2002)
 | H. Jakobsen and Hornbæk (2006)
Participant strategies | A. Baudisch et al. (2002)
 | J. Lam et al. (2007)
Interface choice | E. Hornbæk and Frokjær (2001); Hornbæk et al. (2003)
 | F. Hornbæk et al. (2002)
 | J. Lam et al. (2007)

Table 4.2: Ten papers that reported interface interactions. Five reported usage patterns obtained either from eye-tracking records or navigation-action logs; two reported participant strategies; and two reported interface choice.

1. Interaction costs from usage patterns

As shown in Table 4.2, 5 of the 19 studies reported usage patterns constructed based on eye-tracking records or navigation action logs. Two of the studies reported usability problems with their multiple-VIR interfaces (Hornbæk et al. 2002; Hornbæk and Hertzum 2007).

Hornbæk et al.’s (2002) study on map navigation reported that participants who actively used the low-VIR view switched between the low- and the high-VIRs more frequently, which resulted in longer task completion time. The researchers reported that using the additional low-VIR view may require mental effort and time moving the mouse, thus adding complexity in the interaction (p. 382). Indeed, navigation patterns showed that only 55% of the 320 tasks were solved with active use of the low-VIR view in their multiple-VIR interfaces (p. 380).

Hornbæk and Hertzum’s (2007) study on fisheye menus reported large navigation costs in their separate and embedded interfaces, all of which implemented a focus-locking interaction mechanism (Bederson 2000).
Even though these interfaces succeeded in facilitating quick, coarse navigation to the target, participants had difficulty getting to the final target since the menu items moved with the mouse. Based on eye-tracking data, the researchers reported that participants made longer fixations and longer scan paths with their separate and embedded interfaces than with their temporal interface, suggesting increased mental activity and visual search.

2. Interaction costs from participant strategies

As shown in Table 4.2, 2 of the 19 studies reported participant strategies in interface use.

In Baudisch et al.’s (2002) study on map path-finding and verification, some participants avoided continuously zooming in and out using the temporal interface by memorizing all the locations required in the task and answering the questions in a planned order. As a result, they could stay at a specific magnification without zooming back to the low-VIR view, thus effectively using the temporal interface as a hiVIR interface.

In Lam et al. (2007), participants developed a strategy to use the seemingly suboptimal hiVIR interface in a visual comparison task. The data consisted of a collection of line graphs that were identical except shifted by various amounts in the x-dimension. The task involved matching a line graph with the same amount of horizontal shift. Some participants took advantage of the spatial arrangement of the separate interface by selecting candidate line graphs from the low-VIR view and displaying them in high VIR for side-by-side comparison. The majority of participants, however, developed a strategy to enable the use of the high-VIR view alone. Taking advantage of the mouse wheel and the tool-tips which displayed horizontal and vertical values of the line graph point under the cursor, participants scrolled vertically up and down with the cursor fixed horizontally at the point where the target peaked.
As a result, they eliminated the need to visually compare line graphs. Instead, they tried to find another peak at the same x point numerically by reading off the tool-tips and avoided the need to interact with multiple VIRs.

3. Interaction costs from participants’ interface choices

Another indicator of interaction costs is participants’ active choice to use only one VIR in a multiple-VIR interface to avoid coordinating between the multiple VIRs. As shown in Table 4.2, participants could explicitly convert a multiple-VIR into a single-VIR interface in 2 of the 19 studies, and in Hornbæk et al.’s (2002) study on map navigation, the researcher recorded active pane use.

In Hornbæk’s study on reading electronic documents, participants could expand all the document sections at once by selecting the pop-up menu item “expand all” in the embedded interface (Hornbæk and Frokjær 2001; Hornbæk et al. 2003). Six out of 20 participants chose to do this in one or more of the tasks. On average, they expanded 90% of the sections, thus effectively using the embedded interface as a hiVIR interface.

In Hornbæk et al.’s (2002) study on map navigation, 45% of participants did not actively use the low-VIR view in the separate interface, even though 80% of participants reported preference for having the extra low-VIR view.

In Lam et al.’s (2007) study on visual search and comparison of line graphs, participants could expand all initially compressed graphs in their embedded or their separate interface by a key press, thus effectively turning the multiple-VIR interface into a high-VIR interface. Their participants actively switched to the hiVIR interface in 58% of the trials.

We suspect this desire to use only a single VIR when given a multiple-VIR interface is more prevalent than reported.
In many cases, participants were not provided with a simple mechanism to convert from the multiple-VIR interface to its single-VIR counterparts, while in other cases, sole use of one window in the separate interface could not be discerned without detailed interaction recordings such as eye-tracking records. Using multiple-VIR interfaces as single-VIR interfaces may explain some studies’ inability to distinguish between hiVIR interfaces and their multiple-VIR counterparts, for example in Lam et al. (2007), our overview-use study detailed in Chapter 6.

4.3.2 Consideration 2: single-level task-relevant data may not be suited for multiple-VIR displays

Multiple-VIR effect    Paper with single-VIR data
No benefits            B. Baudisch et al. (2004)
                       J. Lam et al. (2007)
Adverse effects        F. Hornbæk et al. (2002)
Mixed effects          E. Hornbæk et al. (2003)
Excluded               I. Lam and Baudisch (2005)

Table 4.3: Five papers that had single-level data and included a single-VIR interface for analysis comparison. In these cases, most multiple-VIR interfaces supported the same or worse performances than their single high-VIR counterparts.

The number of VIRs provided by the interface should reflect the levels of organization in the data as required by the task. Otherwise, users may need to pay the cost of coordinating between different VIRs without the benefits of rich information at every VIR. Among the seven studies reviewed that included a single-VIR interface, five of them used at least one set of single-level data (Table 4.3). Two studies failed to show performance benefits of multiple-VIR interfaces for single-level data in cases where the tasks required detailed information not provided by the low-VIR display alone. Hornbæk et al.’s (2002) study showed adverse effects in using multiple-VIR interfaces for single-level data.
Hornbæk et al.’s (2003) study on online documents showed mixed results, as task nature affected the levels of data required, and consequently, interface use. We excluded Lam and Baudisch’s (2005) study in this discussion as their hiVIR interface had almost nine times as many pixels as their multiple-VIR interfaces, making direct comparisons difficult.

Baudisch et al.’s (2004) study on information searches showed a lack of benefit in using a multiple-VIR interface for single-level data when the task could not be performed based on information shown on the low VIR alone. Their study interfaces displayed web documents with guaranteed legible keywords which constituted their low-VIR displays. When the task only required reading the keywords, as in their Outdated task, their multiple-VIR interfaces outperformed their high-VIR browser, probably because the low-VIR displays concentrated task-relevant information in smaller display spaces. In contrast, when the task required reading surrounding text which may be too small to be legible, as in the Analysis task, having the extra low-VIR display did not result in performance benefits for the single-level document data.

The situation is similar in Lam et al.’s (2007) study on visual-target search on a line-graph collection. Their multiple-VIR interfaces only showed performance benefits over their hiVIR interface when the visual targets could be directly identified on the low-VIR display, for example, in their Max task. Otherwise, having the extra low VIR did not seem to enhance participant performance since their data was essentially single-leveled.

Hornbæk et al.’s (2002) study on map navigation illustrates the adverse effects of displaying single-level data using a multiple-VIR interface.
Despite having a similar number of objects, area occupied by the geographical state object, and information density on the maps, there were surprisingly large differences in usability and navigation patterns between the two study-map trials. The Washington-map trials had better performance time, accuracy and subjective satisfaction than the Montana-map trials. The researchers explained these differences by differences in map content and the number of organization levels: the Washington map had three levels of county, city, and landmark, while the Montana map was single-leveled with weak navigation cues at low zoom levels. As a result, unlike the multiple-level Washington map, the single-level Montana map data was not suitable for the multiple-VIR temporal interface and produced poorer performance results.

Hornbæk et al.’s online document study showed mixed results, which illustrated how task nature could affect the levels of data required, and how that difference could affect interface effectiveness (Hornbæk and Frokjær 2001; Hornbæk et al. 2003). In their question-answering task, participants were slower without being more accurate in their answers if they were given an additional low-VIR view. Based on reading patterns, Hornbæk and Frokjær suggested that the slower reading times were due to the attention-grabbing low-VIR view in the separate interface, which led participants to further explore the documents perhaps unnecessarily. In contrast, in the essay-writing task where participants were required to summarize the documents, having the extra low-VIR overview displaying data structure as section and subsection headers resulted in better quality essays without any time penalty when compared to the hiVIR interface.
In other words, when the task required single-level answers, as in the question-answering task, having an extra low-VIR display had a time cost; when the task required multiple-level answers, as in the essay-writing task, the low-VIR display produced higher quality results.

4.3.3 Summary of considerations in choosing between a single or a multiple-VIR interface

In general, the amount of interaction effort required to coordinate the multiple VIRs is non-trivial and should be considered. We found that when adding VIRs to the high-VIR display did not add task-relevant information, as in the case of using multiple VIRs to display single-level data, costs incurred in VIR coordination were typically not justified.

4.4 Decision 2: How to Create the Low VIRs?

Once the designer has decided to take the multiple-VIR approach, the next step in the process is to create the low-VIR display. Creating the low-VIR display in a multiple-VIR interface is a non-trivial task, especially when the amount of data involved is large. Study results suggest a delicate balance between displaying enough visual information for the low-VIR display to be useful and showing irrelevant resolution or information that becomes distracting.

In Section 4.4.1, we discuss the adverse effect of displaying more levels of VIRs than supported by the data and required by the task. Section 4.4.2 discusses the related topic of displaying too much information on the low-VIR display. Given the space constraints, designers usually need to find less space-intensive visual encodings for the data or reduce the amount of data displayed on the low-VIR display. Section 4.4.3 discusses cases where the researchers had gone too far in their visual-encoding abstraction as their study participants could no longer use the visual information on the low VIRs.
Section 4.4.4 looks at the trade-offs in using a priori automatic filtering to selectively show data on low-VIR displays. Given all these considerations, we round off the discussion in Section 4.4.5 by re-examining the roles of low VIRs to help ground low-VIR designs. Study results suggest a more limited set of low-VIR roles than proposed in literature. While we found that study results supported the use of low VIR as navigational shortcuts to move within the data and to provide overall data structure, we failed to find support for the common beliefs that low VIR aids orientation or provides meaning for comparative interpretation of an individual data value.

4.4.1 Consideration 1: having too many visual resolutions may hinder performance

In general, the number of visual resolutions supported by the interface should reflect the levels of organization in the data. Otherwise, users may need to pay the cost of coordinating between the different VIRs without the benefit of rich information at each level. In cases where the extra VIRs were not useful for the task at hand, the irrelevant information could be distracting. These extra VIRs may at best be ignored, and at worst, may harm task performance.

Of the 19 studies reviewed, four looked at compound multiple-VIR interfaces, in which an additional low-VIR view was added to an already multiple-VIR interface (Table 4.4).

Effect            Paper with compound             Low-VIR view added to
                  multiple-VIR interface
No benefits       A. Baudisch et al. (2002)       temporal zoom plus pan (z+p) display to create
                                                  their overview plus detail (o+d) interface
                  K. Nekrasovski et al. (2006)    temporal Pan&Zoom and their embedded Rubber
                                                  Sheet Navigation interfaces
Adverse effects   F. Hornbæk et al. (2002)        temporal zoomable interface
Excluded          G. Hornbæk and Hertzum (2007)   embedded fisheye menu

Table 4.4: Four papers that had at least one compound multiple-VIR interface, created by adding an additional low-VIR view to a multiple-VIR interface.

Since Hornbæk and Hertzum’s (2007) study did not include an interface that was only embedded without the low-VIR overview, we could not discern the effects of having an additional low-VIR view and thus excluded it from this discussion. For the other three studies, perhaps because the multiple-VIR interfaces already displayed all the meaningful and task-relevant visual information levels supported by the data, having the additional low-VIR view did not enhance, and in one case degraded, participant performance.

Two studies showed a lack of benefit in providing additional low-VIR views (Table 4.4). Participants in Baudisch et al.’s (2002) study obtained similar performances using the overview plus detail (o+d) interface and their zoom plus pan (z+p) interface. The researchers reported that participants kept the temporal view zoomed to 100% magnification for tracing, thus effectively reducing the temporal component of the interface to a single-VIR display, and used the compound multiple-VIR interface as a separate interface (low-VIR + temporal used as high-VIR).

In Nekrasovski et al.’s (2006) study on large trees and visual comparison tasks, the overall tree view in the low-VIR overview provided task-relevant location cues. However, the information was neither unique nor necessary, as the high-VIR view also provided a similar visual cue. As a result, the study failed to show performance benefits in having an extra low-VIR view in their interfaces even though participants reported reduced physical demand.

Hornbæk et al.’s (2002) study on map navigation suggested performance hindrance when an interface provided irrelevant levels of resolutions.
One of their study interfaces was a temporal interface with an added low-VIR overview. They reported that participants who actively used the low-VIR overview had longer performance times, possibly because of the mental and motor efforts required in integrating the low- and high-VIR windows. Such costs were not compensated by richer information displays as the temporal interface already contained all the task-relevant visual resolutions and may have reduced, or even eliminated, the need for a separate overview (p. 381).

In some cases, study results indirectly suggested adverse effects on performance when the interfaces provided irrelevant VIRs. For example, in Plumlee and Ware’s (2006) study that required matching three-dimensional object clusters, their temporal interface had many magnification levels that neither helped participants to locate candidate objects, nor were detailed enough for visual matching. Given that participants needed to memorize cluster objects between temporal view switching with the temporal interface, the extra zooming levels may have rendered the tasks harder. This extra cognitive load may explain the relatively small number of items participants could handle before the competing separate interface supported better performance, when compared to results obtained in Saraiya et al. (2005).

Similarly, in Baudisch et al.’s (2002) study on static visual path-finding tasks and dynamic obstacle-avoidance tasks, their temporal interface and their separate interface seemed to have included more VIRs than their embedded interface. While the special setup in their embedded interface undoubtedly contributed to the superior participant performances, we did wonder if the extra VIRs may have distracted participants in the other two interface trials.

4.4.2 Consideration 2: having too much information on the low-VIR display may hinder performance

While it may be tempting to provide more rather than less information on the low-VIR display, study results suggest that the extra information may harm task performance. None of the 19 reviewed studies included low-VIR item density as a factor. However, we obtained indirect evidence by comparing between multiple-VIR interfaces that display different amounts of visual information in their low-VIR displays, and by comparing between low- and high-VIR displays for visual search tasks that only required the low-VIR displays.

As shown in Table 4.5, 15 of the studies included at least two multiple-VIR interfaces. Of the 15 studies, 11 showed similar amounts of information on the low-VIR displays and could not be used to understand the effects of task-irrelevant information. We also excluded Hornbæk’s electronic document study since their low-VIR displays showed different kinds, rather than different amounts, of information (Hornbæk and Frokjær 2001; Hornbæk et al. 2003). We excluded Plaisant et al.’s (2002) study since the number of items initially shown in their embedded SpaceTree interface was unclear from the paper.

Amount of low-VIR info   Papers
Similar (excluded)       A. Baudisch et al. (2002)
                         B. Baudisch et al. (2004)
                         C. Bederson et al. (2004)
                         D. Gutwin and Skopik (2003)
                         F. Hornbæk et al. (2002)
                         J. Lam et al. (2007)
                         K. Nekrasovski et al. (2006)
                         O. Plumlee and Ware (2006)
                         Q. Schafer and Bowman (2003)
                         R. Schaffer et al. (1996)
                         S. Shi et al. (2005)
Different                G. Hornbæk and Hertzum (2007)
                         M. Pirolli et al. (2003)
Excluded                 E. Hornbæk and Frokjær (2001); Hornbæk et al. (2003)
                         N. Plaisant et al. (2002)

Table 4.5: Fifteen papers that included at least two multiple-VIR interfaces. We compared the amount of information displayed on the different low-VIRs to understand the effects of task-irrelevant information.

Our discussion here therefore focuses on the two studies that displayed similar kinds of information, but at different amounts, on their low-VIR displays:

1. Pirolli et al.’s (2003) study compared the separate file browser with the embedded hyperbolic tree browser. While the paper did not explicitly compare display capacities of the two low-VIRs, we estimated display volume based on paper figures. The low-VIR view of the separate file browser displayed about 30 items. In contrast, the capacity of the low-VIR region of the embedded hyperbolic tree browser was at least two orders of magnitude larger.

2. Hornbæk and Hertzum’s (2007) study compared the temporal cascading menu to two embedded menu designs based on the Fisheye menu (Bederson 2000). While the lowest VIR of their temporal cascading menu only showed a list of letters of the alphabet, their embedded fisheye menus showed all menu items in font sizes based on relative distances from the focus.

In both of these cases, the researchers advised against putting too much visual information on the display. Pirolli et al. (2003) argued against the assumption that “‘squeezing’ more information into the display ‘squeezes’ more information into the mind” (p. 51) since visual attention and visual search interact in complex ways. In fact, their study showed detrimental effects of display crowding. Pirolli et al. (2003) quantified information relevance as information scent. For their tree data set, they developed an Accuracy of Scent score, which was related to “(a) the ability of users to discriminate the information scent associated with different subtrees to explore and (b) the correctness of those choices with respect to the task.” (p. 31).
Their study found that their embedded hyperbolic tree browser interface led to slower performance times when compared to their temporal file browser under low information scent, possibly because their embedded interface displayed irrelevant information that was distracting.

Hornbæk and Hertzum (2007) came to a similar conclusion in their study on displaying menus with large numbers of items: “designers of fisheye and focus + context interfaces should consider giving up the widespread idea that the context region must show the entire information space” (p. 28). We excluded their temporal cascading menu results in this discussion since their separate and their embedded interfaces had severe usability problems, and were therefore not comparable to the temporal results. We therefore focused on the two embedded interfaces and compared between them instead. Their Multifocus menu displayed larger numbers of readable menu items than the Fisheye menu, but had lower coverage of the data set. Eye-tracking results indicated that participants made more use of context and transition regions in the Multifocus menu than with the Fisheye menu. The researchers thus suggested dispensing with the unreadable, and therefore inaccessible, transition regions in the Fisheye menu (p. 26).

Task answer location        Papers
Low VIR                     B. Baudisch et al. (2004)
                            I. Lam and Baudisch (2005)
                            J. Lam et al. (2007)
                            L. North and Shneiderman (2000)
                            P. Saraiya et al. (2005)
Both high VIR and low VIR   E. Hornbæk and Frokjær (2001); Hornbæk et al. (2003)
                            H. Jakobsen and Hornbæk (2006)

Table 4.6: Seven papers that included a hiVIR and a multiple-VIR interface, classified by the locations from which participants could find answers to the tasks.

This situation is analogous to tasks where answers are apparent from the low-VIR display, and extra information in the hiVIR interface is therefore irrelevant.
As shown in Table 4.6, seven of the reviewed studies included a hiVIR and a multiple-VIR interface. Five of them included tasks that could be answered using the low-VIR displays alone. We therefore attempted to understand effects of displaying unnecessary information by comparing participant performances between their multiple-VIR interfaces, where participants were likely to have consulted mainly the low-VIR displays, and their hiVIR interfaces, where participants needed to sift through irrelevant information to locate task answers. However, except in the case of Lam et al. (2007) and North and Shneiderman (2000) where a loVIR interface was also studied, our findings are speculative as we could not be certain that participants focused on the low-VIR displays in the multiple-VIR interfaces.

In Baudisch et al.’s (2004) study on information searches on web documents, their Outdated task required participants to check if the web documents contained all four semantically highlighted keywords. In other words, the detailed readable content of the web documents displayed in their hiVIR interface was irrelevant to the Outdated task. Since their separate and their embedded interfaces concentrated these task-relevant semantic highlights in their low-VIR displays, the two multiple-VIR interfaces outperformed their hiVIR interface for this task.

In Lam and Baudisch’s (2005) study on information search on webpages, their PDA-sized temporal interfaces, when rendered on desktop, supported performance equal to that of their desktop counterpart, even though the hiVIR interface had nine times more display space showing completely readable information. The researchers suggested that the extra information on the desktop display may have distracted participants and caused unnecessary searching and reading, which may have resulted in the lack of performance benefits of having a larger display.
In Lam et al.’s (2007) study on visual target search in a large line-graph collection, one of the tasks involved finding the highest point in the data. The loVIR interface alone was adequate for the task, and not surprisingly, interfaces that included a low-VIR display were found to support better performance than their hiVIR interface. Observation data suggested that about half of the participants did not use the high-VIR display in the multiple-VIR interfaces for this task.

In North and Shneiderman’s (2000) study on visual information search, interfaces that were equipped with a low-VIR view (i.e., their loVIR and separate interfaces) were found to be superior to the hiVIR interface for tasks that could be answered based on information on these low-VIR views alone.

Similarly, Saraiya et al. (2005) found that their low-VIR, or single attribute, display was most helpful to analyze graphs at a particular time point, as “multiple attributes can get cluttered due to the amount of information being visualized simultaneously” (p. 231).

In short, instead of using physical item density as a measurement of space-use efficiency, a perhaps more useful consideration is the density of useful information on the display, which is arguably task or even subtask specific.

4.4.3 Consideration 3: displaying information is not sufficient; information has to be perceivable

The mere presence of information on the screen is not sufficient; the information needs to be perceivable to be usable. Text on the low-VIR display may need to be readable to be useful. As shown in Table 4.7, seven of the 19 studies reviewed looked at text data. Four studies included unreadable text in their interfaces, while two had only readable text.
We excluded Bederson et al.’s (2004) study as both of their interfaces, the embedded DateLens and the temporal Pocket PC Calendar, used symbols to replace text in case of inadequate display area.

Text readability     Papers
Some unreadable      B. Baudisch et al. (2004)
                     G. Hornbæk and Hertzum (2007)
                     H. Jakobsen and Hornbæk (2006)
                     I. Lam and Baudisch (2005)
Only readable text   E. Hornbæk and Frokjær (2001); Hornbæk et al. (2003)
                     L. North and Shneiderman (2000)
Excluded             C. Bederson et al. (2004)

Table 4.7: Seven papers that looked at text data, classified by the readability of the included text.

Study results showed that unreadable text displayed on low VIRs was an ineffective shortcut to high-VIR details, as single hiVIR displays resulted in similar participant performance despite displaying the information in a larger screen area and thus, having a larger search space.

In Baudisch et al.’s (2004) study on information searches on web documents, both of their multiple-VIR interfaces showed unreadable text except for a few keywords. When the task required reading neighborhood texts to these readable keywords, as in their Analysis task, the multiple-VIR interfaces failed to demonstrate performance benefits over the traditional hiVIR browser.

In Hornbæk and Hertzum’s (2007) study on displaying large numbers of menu items, their embedded Fisheye menu displayed unreadable items at the extreme ends in the low-VIR regions. Eye-tracking results indicated that participants made very little use of low-VIR regions, thus suggesting their ineffectiveness (p. 26).

Jakobsen and Hornbæk’s (2006) study looked at displaying program code using an embedded fisheye interface which displayed unreadable text in the low-VIR regions.
The embedded interface showed time cost over the hiVIR interface in a task that involved counting conditional and loop statements, as participants spent more time in the embedded interface to find closing braces of loop control structures that were unreadable in the low-VIR regions. The researchers thus suggested that interfaces should display readable text to allow direct use of the low-VIR view information (p. 385).

Lam and Baudisch’s (2005) study reported similar findings. Their temporal Thumbnail interface had unreadable low-VIR text, but their temporal Summary Thumbnail contained only readable low-VIR text. They found that participants using the Thumbnail interface had 2.5 times more zooming events, and when zoomed in, horizontally scrolled almost 4 times more, suggesting the ineffectiveness of the unreadable low-VIR text.

For graphical visual signals, two studies reported effects of showing insufficient details on the low-VIR display (Hornbæk et al. 2002; Lam et al. 2007). In Hornbæk et al.’s (2002) study on map navigation, the geographic map information provided by the low-VIR overviews may not have been sufficiently detailed for the study tasks, for example, to find a neighboring location given a starting point, to compare the location or size of two map objects, or to find the two largest map objects given a geographic boundary. For the Washington-map trials, having an extra low-VIR overview had time and recall accuracy costs, suggesting the burden of “switching between the detail and the overview window required mental effort and time moving the mouse” (p. 382). Indeed, “tasks solved with active use of the overview were solved 20% slower than tasks where the overview window was not actively used” (p. 380), possibly due to the insufficient information on the low-VIR overview that led to the large number of transitions between the overview and the detail window.
Although 80% of participants indicated a subjective preference for having the extra view, only 55% actively used the low-VIR view.

Lam et al. (2007) characterized the perceptual requirements for their low-VIR display in terms of visual complexity and visual span. The study looked at displaying a large collection of line graphs for visual search and visual comparison tasks, and found that in order for the low-VIR view to be usable, the signal had to be visually simple and limited to a small horizontal area. For example, in the task that required finding the highest peak in the data collection, the visual signals on the low-VIR displays were simple narrow peaks and could easily be found. In contrast, three-peak signals in their Shape task were complex and were less discernible in the low-VIR views. As a result, participants resorted to the high-VIR views for these three-peak signals.

In short, designers need to provide enough details for visual objects on low-VIR displays to be usable. For text, the display objects should be readable if the tasks require understanding text content. For graphical objects, the criteria are less clearly defined.

4.4.4 Consideration 4: a priori automatic filtering may be a double-edged sword

Papers                                                  Filtering effect(s)
B. Baudisch et al. (2004)                               pos
H. Jakobsen and Hornbæk (2006)                          pos and neg
E. Hornbæk and Frokjær (2001); Hornbæk et al. (2003)    neg

Table 4.8: Four papers implemented a priori automatic filtering. pos = positive effects observed; neg = negative effects observed.

Designers often can only display a subset of the data on the low-VIR displays. One selection approach is based on Furnas’s (1986) degree-of-interest function using a priori knowledge of data relevance with respect to the focus datum. Jakobsen and Hornbæk (2006) further differentiated the distance term in the function into semantic and syntactic distances to implement an embedded interface for source code.
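To make the selection approach concrete, the following is a minimal sketch of degree-of-interest filtering. It assumes the commonly cited form of Furnas's function, in which the interest of a datum is its a priori importance minus its distance from the focus. Splitting the distance into weighted syntactic and semantic terms, the weights, the example items, their scores, and the threshold are all hypothetical choices made for illustration; they are not the formulation or data of any reviewed study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str
    api: float        # a priori importance, API(x); the scale is hypothetical
    syntactic: float  # syntactic distance to the focus (e.g. lines or nesting levels)
    semantic: float   # semantic distance to the focus (e.g. 0 if referenced by it)


def degree_of_interest(item: Item, w_syn: float = 1.0, w_sem: float = 1.0) -> float:
    """Furnas-style DOI: a priori importance minus distance from the focus.

    The two weighted distance terms are an illustrative assumption,
    not the formula used in any of the reviewed studies.
    """
    return item.api - (w_syn * item.syntactic + w_sem * item.semantic)


def visible_items(items: list[Item], threshold: float) -> list[str]:
    """Names of items whose DOI reaches the threshold, in input order."""
    return [it.name for it in items if degree_of_interest(it) >= threshold]


# Hypothetical source-code items around a focus line.
items = [
    Item("focus line", api=0.0, syntactic=0.0, semantic=0.0),
    Item("enclosing loop header", api=2.0, syntactic=1.0, semantic=2.0),
    Item("declaration of a called function", api=3.0, syntactic=5.0, semantic=0.0),
    Item("unrelated helper", api=1.0, syntactic=6.0, semantic=4.0),
]

# Only items at or above the threshold would be kept readable in the low-VIR region.
shown = visible_items(items, threshold=-2.0)
print(shown)
```

In this sketch the function declaration remains visible despite its large syntactic distance because its semantic distance is zero, which is the kind of selection that can save manual searching but may also be harder for users to predict.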
As seen in Table 4.8, of the three studies that implemented a priori automatic filtering, two suggested that automatic filtering could enhance task performance as the low-VIR displays concentrated useful information and reduced distractors. However, in two studies, some participants were confused by the selective filtering and became disoriented.

Instead of seeing filtering as a workaround to the display-size challenge and as a liability, there is evidence to suggest that filtering in itself can enhance task performance. When filtering selects task-relevant information for the low-VIR display, such intelligence avoids tedious manual searching and navigation in the high-VIR view, and possibly also avoids distractions by irrelevant information.

In Baudisch et al.’s (2004) study on information searches on webpages, their multiple-VIR interfaces semantically highlighted and preserved readability of keywords relevant to the tasks. These keywords were concentrated in smaller display spaces by reducing font sizes of surrounding texts. Such interfaces resulted in better participant performances as long as they still provided task-required layout information. For example, participants were faster when using either of their multiple-VIR interfaces for the Outdated task, and when using their web-column preserving embedded interface for the Product-choice task.

In Jakobsen and Hornbæk’s (2006) study on displaying program source code, automatic and semantically selected readable context in their embedded interface avoided the need to manually search for function declarations in the entire source code. This advantage manifested in faster performance times in tasks where participants were required to search for information contained in the function declarations throughout the entire source code.
However, automatic filtering may be a double-edged sword, as filtering may result in disorientation and distrust of the automatic selection algorithm. In Hornbæk’s study on reading electronic documents (Hornbæk and Frokjær 2001; Hornbæk et al. 2003), their embedded interface preserved readability only for the most important part of the document, with content importance determined by the interface a priori. Participants expressed distrust, both in their satisfaction feedback where they rated the embedded interface as confusing, and in their comments indicating that they “did not like to depend on an algorithm to determine which parts of the documents should be readable” (p. 142).

This problem may be worse with semantic filtering, where object visibility depends on the semantic relatedness of the object to the focus datum, rather than the geometric distance between screen displays. Selection of displayable context based on syntactic distance between the data point and the focus is arguably easier to predict than semantic selection. Consequently, it may be easier for users to understand and trust filtering algorithms based on syntactic distance only. Also, since context information is updated when the focal point changes, it may be more confusing to navigate with semantic-context updates, as pointer navigation is conceptually geometric rather than semantic. In Jakobsen and Hornbæk’s (2006) study on program source code visualization, low-VIR regions replaced scrolling in the hiVIR interface and only displayed semantically-relevant source code based on focus. Participants were confused about the semantic algorithm that caused program lines to be shown and highlighted in the context area (p. 385).

Another problem of automatic filtering is that the selection may affect the amount of time users spent on different parts of the data. In Hornbæk’s study on reading electronic documents (Hornbæk and Frokjær 2001; Hornbæk et al.
2003), the researchers found that participants spent approximately 30% less time on the initially collapsed sections displayed in the embedded interface than when the same sections were displayed in full in the other interfaces.

In short, while a priori filtering may concentrate task-relevant information on low-VIR displays, selective filtering may incur user distrust and confusion, and may even affect how users explore the displayed data.

4.4.5 Consideration 5: the roles of the low-VIR displays may be more limited than proposed in literature

While the high-VIR display enables users to perform detail work, low-VIR roles are harder to verify. We therefore looked at four proposed uses of the low-VIR display based on published literature, and found that study results supported only the claims proposed for separate interfaces: that the low-VIR view provides navigation shortcuts and overall data structure. We were unable to find strong support for the low-VIR region in embedded interfaces aiding orientation or providing meaning for data comparison.

Supported: low-VIR view provides navigation shortcuts

Information shown in the low-VIR region or view can facilitate navigation by providing long-distance links, thus “decreasing the traversal diameter of the structure” in navigation (Furnas 2006). Coordination between the low- and the high-VIR views enables users to directly select targets on low-VIR displays for detail exploration. For example, North and Shneiderman (2000) found that a low-VIR view of a list of geographic states acted as hyperlinks to the high-VIR detail census data.

Another way that low VIR assists navigation is by providing a map of available paths (Card et al. 1999). An example is the low-VIR overview in the separate interface in Hornbæk et al.’s (2003) online document study, which showed section and subsection headers.
For graphical displays, Baudisch et al.’s (2002) study found that participants used the low-VIR overview to navigate to targets and performed the detail work in the hiVIR display.

Low VIR can also be useful for refinding. In Hornbæk et al.’s study on electronic-document reading, reading pattern analysis showed that participants “used the overview pane to directly jump back to previously visited targets” and that “the overview pane supports [sic] helps reader memorize important document positions” (p. 145). This resulted in participant preference and satisfaction, even though the apparent navigation advantage failed to materialize as time performance benefits (Hornbæk and Frokjær 2001; Hornbæk et al. 2003).

Supported: low-VIR view provides overall data structure

Low VIR can provide a data structure that may not be apparent in higher VIRs. For example, Hornbæk et al.’s study on reading electronic documents found that document section and subsection headers shown on the low-VIR view of their separate interface “may indirectly have helped subjects to organize and recall text” (p. 144), and led to higher-quality essays without any time penalty.

Open: low-VIR region aids orientation

When the information space contains little or no information on which we can base our navigational decisions, the problem of “desert fog” occurs (Jul and Furnas 1998). Global context in embedded displays has been proposed to help users orient (Nigay and Vernier 1998), perhaps by providing visual support for working memory as the display gives evidence of where to go next (Card et al. 1999). While we did not find evidence with which to study this role of the embedded low-VIR region, results from Hornbæk et al.’s (2002) study on map navigation may shed some light on the topic.

Results from Hornbæk et al.’s (2002) study suggested that visual cues in data aided navigation.
In their study, the Washington map contained rich visual cues for navigation. Participants were faster in navigation tasks performed using the temporal interface with the Washington map without the low-VIR view, suggesting that the map contained visual objects that aided navigation. In contrast, participants using the Montana map made fewer scale changes when the low-VIR display was present, suggesting that the map itself did not contain enough visual objects for effective navigation, and that participants needed the guidance of the low-VIR overview. If visual objects displayed in the low-VIR regions of embedded interfaces act similarly to the navigational cues in the Washington map, it is likely that low-VIR regions can aid orientation.

Open: low-VIR region provides data meaning

It is believed that a data value is only meaningful when interpreted in relation to surrounding entities, and that “the surrounding entities at different scales of aggregation exert a semantic influence on any given item of interest” (Furnas 2006). Again, we did not find results from embedded interfaces with which to study this role of the low-VIR region. However, Saraiya et al.’s (2005) study on displaying time-series data as nodes in a graph may provide some understanding.

Saraiya et al.’s (2005) study included a hiVIR interface that showed all 10 time points simultaneously and a temporal interface that showed one data point at a time. Even though participants made more errors overall when using the hiVIR interface, thus suggesting that having surrounding entities may be detrimental rather than helpful, a closer look at individual tasks showed mixed results. We focused on tasks that involved all time points, as they were more likely to involve comparative interpretations. The study reported that the temporal interface supported faster task times in finding the topology trend of a larger graph and in searching for outlier time points.
These two results suggested that, despite the need to identify trends or detect outliers, the context provided in the hiVIR interface was detrimental rather than beneficial, possibly due to visual clutter. On the other hand, participants achieved better performance results with the hiVIR interface for the two tasks that involved finding outlier nodes and groups, and did not exhibit any performance differences for tasks that involved finding time trends. Given the mixed results from Saraiya et al.’s (2005) study, we were unable to offer any insights into the role of low-VIR regions in providing data meaning for comparison.

4.4.6 Summary of considerations in low-VIR creations

Creating low-VIR displays is the second step in our decision tree (Figure 4.1). The first consideration is to determine the number of VIRs needed. Study results suggested that the number of VIRs in an interface should match the number of levels in the displayed data, as extra VIRs may hinder performance. Similarly, the low-VIR overview should only display task-relevant information, as extra information may be distracting. Information displayed should be perceivable in order to be useful. For text, readability is an important consideration; for graphical objects, the definition is less clear. Oftentimes, there are more items in the data than can be accommodated on the output device. Even though a priori selection of display data is an attractive solution, study results have found that doing so could lead to user confusion and distrust.

4.5 Decision 3: Simultaneous or Temporal Displays of the Multiple VIRs

The third decision in the process of creating a multiple-VIR interface is on VIR arrangements. For the designer, it is a choice between showing the VIRs simultaneously or one at a time, as in zooming techniques.

A well-known problem with zooming is that when the user zooms in on a focus, all contextual information is lost.
Loss of context can be a considerable usability obstacle, as users need to integrate all information over time, an activity that requires memory to keep track of the temporal sequence and their orientation within that sequence (Herman et al. 2000; Furnas 2006). To alleviate these problems, a set of techniques collectively called focus + context was developed. Indeed, Card et al. (1999) stated the first premise of focus + context visualization as being that “the user needs both overview (context) and detail information (focus) simultaneously” (p. 307). Another problem of zooming is that it “‘uses up’ the temporal dimension”, making it “poor for giving a focus + context rendering of a dynamic, animated world” (Furnas 2006).

Although this reasoning appears to be logical, empirical study results did not consistently support using simultaneous VIR displays: study results suggested that the temporal interface was surprisingly good for most tasks. We identified two situations where the simultaneous-VIR display provided performance benefits: when the answer to the problem involved information from all the available VIRs (Section 4.5.1), and when the different VIRs provided clues for the task (Section 4.5.2). Otherwise, temporal switching seemed adequate.

4.5.1 Consideration 1: tasks with single-level answers may not benefit from simultaneous VIR displays

In general, we found that simultaneous-VIR display was best suited for tasks that required multi-level answers. We focused on 10 of the 19 studies, as they included a temporal and at least one simultaneous-display interface for comparison (Table 4.9). We excluded Hornbæk et al.’s (2002) study from this discussion since their separate interface, the zoomable interface with an overview, was effectively used as just a temporal interface most of the time.
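The two situations identified above can be condensed into a simple rule of thumb. The sketch below is only a summary of the pattern observed in the reviewed studies, not a validated design procedure; the function name and its two parameters (the number of data levels spanned by the task answer and by the task clues) are hypothetical.

```python
def suggest_vir_arrangement(answer_levels: int, clue_levels: int) -> str:
    """Suggest how to arrange multiple VIRs for a task.

    answer_levels: number of data levels the task answer draws on (assumed input).
    clue_levels:   number of data levels that provide clues to the answer
                   (assumed input).
    """
    if answer_levels < 1 or clue_levels < 1:
        raise ValueError("a task must involve at least one data level")
    # Multi-level answers, or multi-level clues to a single-level answer,
    # are the two cases where simultaneous display showed benefits.
    if answer_levels > 1 or clue_levels > 1:
        return "simultaneous"
    # Otherwise temporal switching seemed adequate.
    return "temporal"
```

For instance, a task whose answer lies at a single level but whose clues span two levels would be mapped to "simultaneous", while a task with a single-level answer and single-level clues would be mapped to "temporal".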
Three of these 10 studies had at least one task that required multi-level answers, and all showed performance benefits in using their simultaneous-display interfaces for those tasks compared to their temporal interfaces.

In Bederson et al.’s (2004) study, the embedded DateLens interface was found to be more effective than the temporal Pocket PC interface in tasks that involved counting events within a 3-month time period in the calendar, for example, in counting scheduled events or appointment conflicts.

In Plaisant et al.’s (2002) tree browsing study, the SpaceTree embedded interface trials were faster than the temporal Explorer interface on average and more accurate in a task that required listing all the ancestors of a node.

Papers MA SA MC SC SB
A. Baudisch et al. (2002) x
C. Bederson et al. (2004) x x x
G. Hornbæk and Hertzum (2007) x x x
K. Nekrasovski et al. (2006) x x
M. Pirolli et al. (2003) x
N. Plaisant et al. (2002) x x
O. Plumlee and Ware (2006) x
R. Schaffer et al. (1996) x x x
S. Shi et al. (2005) x
F. Hornbæk et al. (2002)(ex)

Table 4.9: Ten papers that included a temporal and at least one simultaneous-display interface. MA = Multiple-level answers; SA = Single-level answers; MC = Multiple-level clues; SC = Single-level clues; SB = Single-VIR interface better supported tasks; ex = excluded from review.

In Schaffer et al.’s (1996) re-routing task, participants were required to find an alternative route to connect two points in the network that were disconnected, and the route spanned all levels in the hierarchical network. The embedded interface supported faster task completion times and required only half the number of zooming actions when compared to the temporal interface.
The advantage of the embedded interface could be its display of the ancestral nodes along with the children nodes at the lowest level of the hierarchy, since all of these were needed to find an alternative route.

On the other hand, the temporal interface seemed to offer better support for tasks with single-level answers, unless the clues required to reach the answers were also multi-level, as discussed in the next section.

4.5.2 Consideration 2: tasks with single-level information scent may not benefit from the simultaneous display of different visual resolutions

For tasks with single-level answers, simultaneous-VIR display was still helpful if the clues to the tasks spanned multiple data levels. As shown in Table 4.9, five of the nine included studies had tasks with multi-level clues to single-level answers, and all except Hornbæk and Hertzum’s (2007) study demonstrated benefits in using simultaneous-VIR displays.

In Baudisch et al.’s (2002) study, the multiple-VIR interfaces supported equal or better performance than the temporal interface in the route-finding and connection-verification tasks. Even though the answer could be obtained in the high-VIR view alone, both tasks required global relative locations in the low VIR and detail information in the high VIR.

Pirolli et al. (2003) looked at a similar phenomenon called information scent. Their study suggested that the embedded hyperbolic tree interface may support faster task times than the temporal file explorer interface in high-scent tasks. In the embedded hyperbolic interface, participants could see more of the hierarchical structure in a single view and traversed tree levels faster. Under high-scent conditions where ancestor nodes provided clues to task answers, this feature could be advantageous.
In contrast, under low information scent conditions, participants examined more tree nodes when using the embedded than the temporal interface, which resulted in slower task times.

Plaisant et al. (2002) reported that the embedded SpaceTree supported equal or better task times in the first-time tree node finding tasks than the temporal Explorer interface. Even though the researchers did not provide enough task instructions for us to judge whether the task provided multiple-level clues, they did mention providing hints to participants that seemed to span multiple levels: “To avoid measuring users’ knowledge about the nodes they were asked to find (e.g kangaroos) we provided hints to users (e.g. kangaroos are mammals and marsupials) without giving them the entire path to follow (e.g. we didn’t give out the well known step such as animals).” (p. 62).

In Plumlee and Ware’s (2006) study, the task required matching complex clusters of three-dimensional objects, and clues to the answers were present both in the low-VIR view, showing the location of the candidate targets, and in the high-VIR view, showing the details required in visual matching. Their separate interface was found to better support the task when the total number of objects per cluster was above five items, in which case participants could no longer hold all the clues in their short-term memory when using the temporal interface.

One possible exception to this hypothesis is Hornbæk and Hertzum’s (2007) study. The researchers looked at the usability of fisheye menus showing 100 and 292 items. The study found that known-item search tasks were solved faster and more accurately with the temporal cascading-menu interface. However, due to the various implementation-dependent usability issues with the simultaneous-VIR interfaces, we could not discern relative interface effectiveness based on VIR arrangement alone, and we therefore excluded it from our analysis.

Taking Considerations 1 and 2 together, we concluded that tasks with single-level answers and single-level clues would not benefit from simultaneous display of the different visual resolutions. Indeed, that seemed to be the case based on study results, even for the tasks that required object comparisons. As long as participants could keep task-required information in their short-term memory, the temporal interface seemed adequate and, at times, even resulted in better participant performance and feedback.

As shown in Table 4.9, five of the nine included studies had at least one task that required single-level answers and provided single-level clues. The results of all except Shi et al.’s (2005) study supported this general conclusion. We excluded Hornbæk and Hertzum’s (2007) visual searches in menus from this discussion due to the nontrivial usability issues with their simultaneous-VIR interfaces.

Bederson et al.’s (2004) study showed that the temporal Pocket PC interface was more appropriate for simple calendar tasks that involved checking start dates of pre-scheduled activities and for tasks that spanned short time periods.

In Nekrasovski et al.’s (2006) study, where the task was to compare topological distances between colored nodes in a large tree, the results showed that the temporal interface outperformed the embedded interface, even though the task required comparison between objects. Indeed, the temporal interface was rated by participants as being less mentally demanding and easier to navigate.
In Schaffer et al.’s (1996) study, even though the embedded interface supported faster task times than the temporal interface in rerouting within a hierarchical network, participants did not seem to need simultaneous-VIR display to locate broken links at the lowest network level, as indicated by the lack of performance differences between the temporal and the embedded interface trials for this link-location task.

The exception is Shi et al.’s (2005) study, where the researchers found that their embedded interface supported faster task times than the temporal interface. In Shi et al.’s (2005) case, there may be a speed-accuracy tradeoff: the researchers observed that in some cases, their participants ignored potential targets that occupied a small amount of space and missed the small targets in less than 3.75% of the trials. Even though the researchers did not report task error rates, they reported that this phenomenon may have a more severe and adverse impact on their embedded than on their temporal interface trials. Also, some participants gave up when using the embedded interface, whereas in the temporal trials they only timed out.

In short, simultaneous-VIR display is appropriate for multi-level answers or single-level answers found by multi-level clues. Otherwise, the temporal interface seemed adequate.

4.5.3 Considerations in choosing between temporal switching or simultaneous display of the VIRs

In general, simultaneous VIR display, as in embedded or separate interfaces, requires more complex interactions, while temporal interfaces can be taxing on the user’s memory. Study results suggested that temporal switching was more suitable for tasks that did not involve multi-level answers, or did not provide multi-level clues to single-level answers.

4.6 Decision 4: How to Spatially Arrange the Visual Information Resolutions, Embedded or Separate?
The last step in our decision tree is to decide between the two spatial arrangements of simultaneous-VIR display: the interface can embed the different VIRs within the same window or show them as separate views.

Proponents of the embedded approach argued that the different VIRs should be integrated into a single dynamic display, much as in human vision (Card et al. 1999; Furnas 2006). View integration is believed to facilitate visual search, as it provides an overview of the whole display which “gives cues (including overall structure) that improve the probability of searching the right part of the space” (Pirolli et al. 2003, p. 21), and integrated views of data are argued to “support and improve perception and evaluation of complex situations by not forcing the analyst to perceptually and cognitively integrate multiple separate elements” (Thomas and Cook 2005, p. 83). Also, it is believed that when information is broken into two displays (e.g., legends for a graph, or overview + detail), visual search and working memory consequences degrade performance, as users need to look back and forth between the two displays (Card et al. 1999; Pirolli et al. 2003). On the other hand, spatial embedding frequently involves distortion, an issue discussed in Section 4.6.1.

The choice between these two spatial arrangements is unclear based on empirical study results. Oftentimes, perceived functions of the two interfaces biased study data and task selections. For example, studies tended to use trees or graphs for node finding to study embedded interfaces (e.g., Plaisant et al. 2002; Pirolli et al. 2003; and Shi et al. 2005) and spatial navigation for separate displays (e.g., North and Shneiderman 2000 and Plumlee and Ware 2006). As a result, the issue of spatial arrangement was frequently confounded in our reviewed studies.

Embedded vs. separate    Papers
Unable to compare        B. Baudisch et al. (2004)
                         E. Hornbæk and Frokjær (2001); Hornbæk et al. (2003)
                         K. Nekrasovski et al. (2006)
No difference            G. Hornbæk and Hertzum (2007)
                         J. Lam et al. (2007)
                         Q. Schafer and Bowman (2003)
Embedded better          A. Baudisch et al. (2002)
                         D. Gutwin and Fedak (2004)(ex)

Table 4.10: Eight papers that included both embedded and separate interfaces, classified by participant performances. ex = excluded from analysis.

As shown in Table 4.10, 8 of the 19 studies included both embedded and separate interfaces. We found it difficult to directly compare the two simultaneous displays in three of the studies. Of the remaining five studies, three did not find significant performance differences. Only the studies by Baudisch et al. (2002) and Gutwin and Fedak (2004) demonstrated superior performance support from their embedded interfaces. In the case of Baudisch et al. (2002), the performance differences were possibly due to the unique implementation of their interface, while Gutwin and Fedak’s (2004) results were possibly due to the comparatively complex interactions required in their separate interfaces.

Mixed results in our reviewed studies, as shown in Table 4.10, may reflect the different tradeoffs in these interfaces. Also, in some cases, the benefit of providing multiple VIRs may be so large that the spatial arrangement may not matter (Tory et al. 2006, p. 12).

Of the three studies that we judged to be incomparable with the other five, two were excluded due to intentional implementation differences based on the commonly perceived uses of the two spatial arrangements: the low-VIR view in the separate interface to display a data overview, and the low-VIR regions in the embedded interface to show background and supporting information. The first is Baudisch et al.’s (2004) study on web document search. Their embedded interface was designed to favour row discrimination and their separate interface favoured column discrimination, thus adding another factor that influenced study results.

Hornbæk et al.’s study showed different kinds of information in their two multiple-VIR interfaces (Hornbæk and Frokjær 2001; Hornbæk et al. 2003). The low-VIR view of their separate interface provided document section and subsection headers, and was optimal for showing overall structure in text documents and for encouraging detail explorations. In contrast, their embedded interface showed a priori determined text significant to the focal area, which promoted rapid document reading at the cost of accuracy.

The last study in the incomparable group did not intend to study spatial arrangement despite including both separate and embedded interfaces. In Nekrasovski et al.’s (2006) study on large tree displays, the goal of the separate interface was to investigate the use of an extra low-VIR view. Consequently, neither of their separate interfaces (temporal with overview and embedded with overview) could be directly compared with their embedded interface to discern effects of spatial arrangements.

In the five cases where direct comparison was possible, three studies did not find performance differences between the two simultaneous interfaces. The two exceptions were the studies by Baudisch et al. (2002) and Gutwin and Fedak (2004).

Even though Gutwin and Fedak’s (2004) study on steering tasks showed significant results, we believe their results may be confounded by the relatively complex interactions required in their separate interfaces. The study included three embedded fisheye displays and two separate displays.
In a series of two-dimensional steering tasks where participants were required to move a pointer along a defined path, the study found that the embedded interfaces supported better time and accuracy performance than the separate interfaces at all display magnifications. The researchers thus concluded that “the fact that fisheyes show[ed] the entire steering task in one window clearly benefited performance” (p. 207).

However, we believe a number of factors were involved in addition to the different VIR spatial arrangements. The first factor was differing effective steering path widths and lengths between interfaces. Of the five study interfaces, only one of the separate interfaces, the Panning-view, had an increased travel length at higher magnifications. All other interfaces had constant control/display ratios over all magnifications. As for the Radar-view separate interface, participants interacted with the low-VIR miniature view instead of the magnified high-VIR view, thus the actual steering path width was effectively constant over all magnifications.

We also found that interaction complexity differed greatly among the five interfaces. The Panning-view separate interface had more complex panning interactions than the other interfaces, especially at higher levels of magnification of the steering path. The Panning-view separate interface required two mouse actions, mouse drag for panning and mouse move for steering, while the Radar-view separate interface required only mouse drag on the miniature low-VIR view. In contrast, the embedded interfaces required only a single mouse action to shift the focal point and magnify the underlying path.
This type of interaction, however, has the disadvantage of the magnification-motion effect, where objects in the magnifier appear to move in the opposite direction to the motion of the lens, making it easier to overshoot the motion and slip off the side of the lens. We considered this motion effect as a third factor in the study. Given the complex interplay of at least three factors that seemed to be implementation specific, we were unable to extract general conclusions on VIR spatial arrangement from Gutwin and Fedak’s (2004) study.

Baudisch et al.’s (2002) study looked at three tasks that required information from all VIRs: a static route-finding task, a static connection-verification task, and a dynamic obstacle-avoidance task. Study results indicated that the embedded interface better supported all of the tasks and was preferred by participants. Their unique embedded interface implementation avoided many of the usability pitfalls in embedding high-VIR regions into low-VIR displays, which may explain its superior participant performance. First, the location of the high-VIR region was fixed, thus potentially avoiding the disorientation of a mobile focus with respect to the context area and the associated complex interactions. Second, distortion was not used in the system; instead, the researchers used different hardware display resolutions for the two different VIRs. In contrast, their separate interface seemed more interactively complicated than the usual implementation, requiring panning in both the low- and high-VIR views and zooming in the high-VIR view. Nonetheless, we believe their study demonstrated an effective use of their embedded interface over their separate interface.

We conclude that there is not sufficient evidence to derive design guidelines for choosing between the two simultaneous displays, as it is difficult to draw conclusions based only on Baudisch et al.’s (2002) study.

4.6.1 The issue of distortion

One of the potential costs of embedding multiple VIRs within the same window is distortion. Based on Furnas’s (1986) fisheye views and on studies of attention, Card et al. (1999) justified distortion since “the user’s interest in detail seems to fall away from the object of attention in a systematic way and that display space might be proportioned to user attention”. Also, Card et al. (1999) reasoned that “it may be possible to create better cost structures of induced detail in combination with the information in focus, dynamically varying the detail in parts of the display as the user’s attention changes [...] Focus and context visualization techniques are ‘attention-warped’ displays, meaning that they attempt to use more of the display resource to correspond to interest of the user’s attention” (p. 307).

Even though distortion is believed to be justified, it is still useful to examine the costs. The first problem is that distortion may go unnoticed by users and the display may be misinterpreted (Zanella et al. 2002), especially when the layout is not familiar to the user or is sparse (Carpendale et al. 1997). Even when users recognize the distortion, distance and angle estimations may be more difficult and inaccurate when the space is distorted (Carpendale et al. 1997), except perhaps in constrained cases such as bifocal or modified fisheye distortions (Mountjoy 2001). Also, users may have difficulties understanding the distorted image well enough to associate the components before and after the transformation (Carpendale et al. 1997), or in identifying link orientation in the hyperbolic browser (Lamping et al. 1995).

To our knowledge, only three published studies measured effects of distortion directly and systematically. Lau et al.
(2004) found that a nonlinear polar fisheye transformation had a significant time cost in visual search, with performance slowed by a factor of almost three under large distortions. In terms of visual memory costs, our laboratory experiment, reported in Chapter 5, found that image recognition took longer and was less accurate at high fisheye transformation levels. Skopik and Gutwin (2005) reported a time penalty without compromising accuracy on refinding nodes in a highly-linked graph when the graph was transformed by a polar fisheye transformation.

It was difficult to tease out the effects of distortion based on the 19 papers we reviewed here, since none of the studies specifically looked at distortion as a factor. We could therefore only rely on observations reported in the papers to obtain insights. As shown in Table 4.11, 14 studies included an embedded interface, and 12 implemented distortion. The two exceptions were Baudisch et al. (2002) and Lam et al. (2007). Baudisch et al. (2002) took a hardware approach and implemented their embedded interface with two different pixel resolutions, and Lam et al. (2007) used two distinct visual encodings to represent the same data in two VIRs.

Papers    Distortion (None, Text, Grid)    Effects (pos, neg)
A. Baudisch et al. (2002) x
B. Baudisch et al. (2004) x x
C. Bederson et al. (2004) x x
D. Gutwin and Skopik (2003)(ex) x
E. Hornbæk and Frokjær (2001) x x
G. Hornbæk and Hertzum (2007)(ex) x x
H. Jakobsen and Hornbæk (2006) x x
J. Lam et al. (2007) x
K. Nekrasovski et al. (2006) x
M. Pirolli et al. (2003) x
N. Plaisant et al. (2002) x
Q. Schafer and Bowman (2003) x
R. Schaffer et al. (1996) x x
S. Shi et al. (2005) x x

Table 4.11: Fourteen papers that included at least one embedded interface. pos = Performance benefits demonstrated; neg = Problems reported; ex = excluded from review.
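To make concrete why distance estimation can become harder under distortion, as noted earlier in this section, the sketch below applies a one-dimensional graphical fisheye function of the form commonly attributed to Sarkar and Brown, g(x) = (d + 1)x / (dx + 1), where x is the normalized distance from the focus and d is a distortion factor. This is an assumed, illustrative transformation: the studies discussed here used a variety of transformations, and none is claimed to match this one exactly. The sample positions and the value of d are hypothetical.

```python
def fisheye(x: float, d: float) -> float:
    """Map a normalized distance x in [0, 1] from the focus to its displayed
    distance, using g(x) = (d + 1) * x / (d * x + 1).

    d = 0 leaves the layout undistorted; larger d magnifies the focus more.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must be normalized to [0, 1]")
    return (d + 1) * x / (d * x + 1)


# Four points equally spaced in the data (hypothetical positions).
data_positions = [0.25, 0.5, 0.75, 1.0]
screen_positions = [fisheye(x, d=3.0) for x in data_positions]

# Gaps between neighbouring points: equal in the data, unequal on screen.
screen_gaps = [b - a for a, b in zip([0.0] + screen_positions, screen_positions)]
```

Under these assumptions, the gap nearest the focus is displayed several times wider than the gap farthest from it, even though all four gaps are equal in the data, so a user reading distances directly off the screen would misjudge them.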
Interestingly, not all 12 studies reported usability or performance problems with visual distortion. In fact, seven studies reported performance benefits in using their distortable interfaces. We excluded Gutwin and Skopik (2003) in this analysis as we could not tease out the effects of distortion based on study results due to the large number of factors involved in the study, as discussed earlier in this section. The remaining six studies that demonstrated positive effects of distortion involved either text or grid-based distortions, suggesting that constrained and predictable distortions were well tolerated. Five studies reported problems attributed to distortion, and all involved com- paratively more drastic and elastic distortion techniques than text or grid-based distortions. We also excluded Hornbæk and Hertzum (2007) in our analysis since, even though the researchers reported usability problems with their vari- ous embedded and separate interfaces, it is unclear how distortion contributed to these problems. We therefore focused our discussion on the remaining four studies to further understand distortion costs. 85 Chapter 4. A Study-based Guide to Multiple-VIR Interface Designs Nekrasovski et al.’s (2006) embedded interface implemented Rubber-Sheet Navigation that allowed users to stretch or squish rectilinear focus areas as though the data set was laid out on a rubber sheet with its borders nailed down (Sarkar et al. 2003). The researchers attributed the relatively poor performance of their embedded interface to the disorienting effects of distortion (p. 18). Plaisant et al.’s (2002) study found that their participants took longer to refind previously-visited nodes in a tree using the embedded hyperbolic and SpaceTree interfaces than with the traditional temporal Microsoft Explorer file browser. 
Of the two distortion interfaces, participants demonstrated better performance with SpaceTree than with the hyperbolic tree browser, which involved more drastic distortions. This result was predicted by the researchers as in SpaceTree, “the layout remains more consistent, [thus] allowing users to remember where the nodes they had already clicked on were going to appear, while in the hyperbolic browser, a node could appear anywhere, depending on the location of the focus point” (p. 62).

Pirolli et al.’s (2003) study also compared a temporal file browser with the embedded hyperbolic tree browser. The researchers found that the hyperbolic tree browser supported better performance only for tasks with high information scent. Even though the researchers did not explicitly report problems related to distortion, they suggested providing landmarks to aid navigation in the embedded hyperbolic tree browser, thus indicating potential interaction costs in hyperbolic distortions.

Schafer and Bowman’s (2003) embedded interface implemented the radar fisheye view on maps. Their study reported both positive and negative effects of distortion. On the positive side, if noticed, the distortion enhanced awareness of the viewport in a collaborative traffic and sign positioning task using a map. However, users may not have noticed the distortion, since the task was collaborative and the distortion may not have been caused by their own direct action.

In short, while we believe interfaces that implement distortions were generally more difficult to use, constrained and predictable distortions were found to be better tolerated and may tip the tradeoff between showing more information simultaneously on the display and the risk of causing disorientation and confusion.

4.6.2 Considerations in spatially arranging the various VIRs

There are tradeoffs in using either of the two simultaneous displays, embedded and separate.
Embedded interfaces tend to implement distortion, which may be difficult for participants to understand and may involve difficult interactions. For separate interfaces, view coordination has been found to be difficult. Study results regarding this question were mixed.

4.7 Summary: Design Recommendations

We summarize our findings as three recommendations to designers in creating multiple-VIR interfaces.

4.7.1 Provide the same number of VIRs as the levels of organization in the data

Furnas argued for the need to provide more than two VIRs in his 2006 paper:

    By presenting only two levels, focus and context, these differ from the richer range of trading off one against the other represented in the canonical FE-DOI. This difference must ultimately prove problematic for truly large worlds where there is important structure at many scales. There the user will need more than one layer of context.

In the same paper, he also argued that the levels of resolution can be determined based on the scale bandwidth of the presentation technology and scale range of the information world (Furnas 2006, p. 1003). Looking at the question from a different angle, study results suggested that the effectiveness of providing multiple VIRs, especially simultaneous display of different VIRs, was contingent upon the number of organization levels in the data and the information needs of the task. In fact, we found that having extra VIRs may actually impede task performance, especially in temporal interfaces where users coordinate between the different VIRs using short-term memory. We therefore believe that interfaces should provide one VIR per data level.

4.7.2 Provide relevant, sufficient, and necessary information in the low-VIR displays to support context use

While the high VIRs should support detail work demanded by the tasks at hand, study results suggested that low-VIR views in separate interfaces were used in two ways: in navigation, where they provided short-cuts to jump to different parts of the data; and in mental data organization, if they displayed overall data structure. To be effective, designers need to include only sufficient, relevant, and necessary information in the low-VIR views. This finding is in accordance with Norman’s (1993) Appropriateness Principle, where he stated that the visual representation should provide neither more nor less information than is needed for the task at hand, since extra information displayed may be distracting and render the task more difficult. In the case of multiple-VIR interfaces, displaying an inappropriate amount of information may tip the balance, as the value of the display may not be sufficient to overcome the costs of having the extra visual resolutions.

The amount of detail for each visual object displayed on low-VIR views is likely to be more than previously assumed in our community, judging from the number of ineffective low-VIR views created for the reviewed studies. For text documents, readability may be a requirement, as suggested in Jakobsen and Hornbæk (2006): the design should “saturate the context area with readable information” in building interfaces to display program source code (p. 386), and in Hornbæk and Hertzum (2007): “making the context region of the [fisheye menu] interfaces more informative by including more readable or otherwise useful information” (p. 28).
For graphical displays, studies on visual search (e.g., Tullis 1985) and Lam et al.’s (2007) study provided guidelines (for example, visual signals should be simple and of narrow visual spans to be accessible), but the criteria still remain unclear.

4.7.3 Simultaneously display VIRs for multi-level answers or multi-level clues

Selecting the correct visualization technique to display data is important due to the inherent tradeoffs in the temporal, separate, and embedded techniques. While most temporal implementations offer familiar panning and zooming interactions, these interfaces require users to keep information in their short-term memories. Simultaneous-VIR displays, on the other hand, often require more complex and unfamiliar interactions such as view coordination. Based on study results, we concluded that if the task or subtask needs information from multiple levels, either as part of the answer to the task or as clues leading to the answer, the interface should show multiple levels simultaneously. Otherwise, the temporal technique should be more suitable due to its simpler interface and more familiar interactions.

4.7.4 Open question: how should multiple VIRs be displayed simultaneously?

Unfortunately, we are unable to suggest guidelines in displaying multiple VIRs simultaneously, either as embedded or as separate displays, due to the difficulties in obtaining direct interface comparisons based on our set of reviewed studies.

4.8 Summary: Methodology Recommendations

Despite being able to use most of the study results in our analysis, we encountered difficulties in interpreting selected study results and had to exclude them from our analysis. Part of our difficulty may be due to differences between our goals and those of the reviewed studies: we aimed to tease out factors that affect visualization use instead of overall interface effectiveness.
While evaluating visualization using experimental-simulation studies has been argued to be difficult due to the lack of standardized tasks, effective measurements to capture interface use, and ecological validity (Plaisant 2004), we believe the method can be improved even without data and task repositories, novel measurements, or abandoning the experimental strategy for field strategy. To identify areas that could be improved upon, we looked at the four main scenarios that led to result exclusion in our summary synthesis:

1. Study interfaces were not comparable at the individual factor level, such as visual elements, information content and amount displayed, level of organization displayed, and interaction complexity;

2. Measurements were not sensitive enough to capture usage patterns, which were needed to understand factors at play in visualization use;

3. Studies investigated multiple interface-use factors, making it difficult to isolate effects of each;

4. Studies did not report sufficient details for our analysis, since we wished to extract effects of selected design factors in interface use instead of overall system or technique effectiveness.

We therefore argue that by using comparable study interfaces, capturing usage patterns in addition to overall performance measures, isolating interface-use factors, and reporting more study details, we can increase consistency among experimental-simulation studies and increase their utility, since study results will then be more amenable to meta-analysis.

4.8.1 Use comparable interfaces

In order to understand factors influencing interface use, studies should identify possible factors at play, and if possible, vary one experimental factor at a time, as in factorial designs.
For visual design, some factors include the interfaces’ basic visual elements such as the number of views and the use of image distortion, the amount and type of information displayed, and the number of levels in the displayed data. For interaction, study designers should consider the required number of input devices, the types of action required, and the number of displays on which the action is applied.

Basic visual elements

While it is understandable that interfaces in empirical studies may be dramatically different in appearance, they should be comparable in their basic visual elements whenever possible to allow for direct comparison. For example, in Baudisch et al.’s (2004) study on visual searches on webpages, they included two interfaces that showed web documents at two levels of detail simultaneously. The separate interface had a scrollable detail page and a low-VIR view that showed the entire webpage by compressing all elements equally. The embedded interface was a non-scrollable browser that showed the entire webpage by differentially compressing pertinent versus peripheral content in order to keep the pertinent text readable. On the surface, the two interfaces were ideal candidates for studying the effects of spatial arrangement of the low- and the high-VIR components: in the separate interface, the two components were arranged as separate views; in the embedded interface, they were embedded into a single view. However, there was another factor at play that affected performance results.

Since their interfaces displayed readable words pertinent to their study tasks as highlighted popouts, the spatial association between the original web documents and these popout words became important.
Unfortunately, association by row was found to be more difficult in their embedded interface than by column, as their focus + context implementation selectively distorted the vertical dimension. On the other hand, their separate implementation proportionally reduced both vertical (row) and horizontal (column) dimensions. Their study results reflected the interfaces’ ability to associate popouts with document rows and columns: their embedded interface better supported a task that did not require row-specific information (the Product Choice task), but not row-dependent tasks (e.g., the Co-occurrence task). The separate interface results showed opposite trends.

Since Baudisch et al.’s (2004) study aimed to evaluate overall effectiveness of their novel embedded interface relative to two existing techniques, both the visual components’ spatial arrangement and the row-column association with the highlighted popouts were part of their interface design and should be evaluated together. However, when we tried to tease out the effect of spatial arrangement to extract general design guidelines, we could not isolate the effect and therefore could not include their study results in our analysis.

We encountered similar problems in analyzing Bederson et al.’s (2004) study on PDA-size calendar use. Their study looked at two interfaces: the Pocket PC calendar that provided a single level of detail per view (day, week, month, or year), and the DateLens interface that used a Table Lens-like distortion technique to show multiple levels of details simultaneously. Again, the study seemed to compare the effects of providing separate views one at a time, or embedding them in a single view. Their study looked at a variety of calendar tasks that involved searching for appointments, navigation and counting scheduled events, and scheduling given constraints.
While their study did not find an overall time effect, the researchers found a task effect and thus divided the tasks into simple and complex tasks based on task-completion time. The study concluded that the DateLens trials were faster in complex tasks, while the Pocket PC trials were faster in simple tasks.

On closer inspection, we realized that while their DateLens interface provided a day, week, month, and year view, it also provided a three-month and a six-month view, with the three-month view being the default in the study. On the other hand, the Pocket PC interface did not seem to have a corresponding three-month overview. If that were the case, since three of their six complex tasks (tasks 5, 10, 11) required scheduling and counting events within three-month periods, we could not determine if the benefits of the DateLens interface in these tasks came from providing a three-month overview, or from providing multiple levels of details in the same view. Again, our need to understand performance contribution from individual factors forced us to exclude these results from our analysis.

Information content

We encountered difficulties in direct comparison of interfaces with different information displayed, as the interfaces were often used for different purposes. One example is Hornbæk et al.’s study on online document reading (Hornbæk and Frokjær 2001; Hornbæk et al. 2003). Their study looked at two interfaces that provided multiple levels of data simultaneously.

The low-VIR view in their separate interface showed document header and subheaders and acted as a table of contents. Their embedded interface showed context based on a degree-of-interest algorithm, thus the content was dynamic based on the focal point of the document. Not surprisingly, participants used the two interfaces differently.
Reading patterns indicated that when using the embedded interface, participants spent more time in the initial orientation mode, but less time in the linear read-through mode, suggesting that the embedded interface shortened navigation time by supporting an overview-oriented reading style. In contrast, reading patterns in the separate interface were found to be less predictable and “shaped by situation-dependent inspiration and associations”, and “the overview pane grabs subjects’ attention, and thereby leads them to explorations that strictly speaking are unnecessary” (p. 144), probably because the display was similar to a table of contents.

Study results reflected the different information content displayed in these low-VIR displays. Compared to the embedded interface, participants who used the separate interface produced better results in the essay tasks at the expense of time, and the study failed to find differences between the two interfaces for the question-answering tasks. While the different information content was arguably part of the interface design, we could not incorporate results from this study in our analysis as we could not separate out visual spatial effects from those of displaying different kinds of information in the low-VIR displays.

Levels of display

The total amount and levels of information provided by the interface are also important, as extra information or levels may be detrimental to performance. One example is Plumlee and Ware’s (2006) study on visual memory in zooming and multiple windows. Their temporal interface had a continuous-zoom mechanism that showed intermediate levels of detail that did not seem to be present in the separate interface, which seemed to have only two levels based on the researchers’ descriptions. Their study task required participants to match complex clusters of three-dimensional objects.
To do so, participants needed to first locate clusters at the low-zoom level and match cluster components at the high-zoom level. Intermediate-zoom levels did not seem to carry task-relevant information as the clusters themselves were not visible given the textured backgrounds. Plumlee and Ware (2006) stated that participants needed 1.5 seconds to go through a magnification change of at least 30 times between the lowest and highest zoom levels. During this time, participants needed to keep track of the components in various objects in their short-term memory. We wondered if having the extra levels of detail in their temporal interface unnecessarily degraded participants’ visual memories and made the interface less usable. This extra cognitive load may explain the relatively small number of items participants could handle before the competing separate interface became more appropriate for the task, in contrast to the results of Saraiya et al.’s (2005) study on graph visualization. Saraiya et al.’s (2005) Single-Attribute temporal interface supported better performance than their Multiple-Attribute hiVIR interface even when the task involved a 50-node graph, each node with 10 time points. Due to the differing levels of data displayed in the two study interfaces, we excluded Plumlee and Ware’s (2006) study from our analysis to understand the conditions in which simultaneous display of multiple data levels is beneficial.

Similarly, Baudisch et al. (2002) studied static visual path-finding tasks and a dynamic obstacle-avoidance task using three interfaces, each providing multiple levels of details. Their zoom and pan temporal interface and their overview plus detail separate interface seemed to support more levels of detail than their focus plus context embedded interface, which had two levels only.
Their embedded trials were faster than the temporal and the separate trials for the static visual path-finding tasks, and were more accurate in the dynamic obstacle-avoidance task. While the special hardware setup in their embedded interface undoubtedly contributed to the superior participant performance, we wondered if the extra resolutions may have distracted participants in the other two interface trials. We nonetheless included this study in our analysis of simultaneous displays of multiple levels of detail, as we believe the difference in the number of display levels was small.

Levels in data

Since researchers have argued that the interface should only display a different data resolution if it is meaningful to the task at hand (e.g., Furnas 2006), the number of displayed data levels is an important consideration. For example, in Hornbæk et al.’s (2002) study on map navigation, there were surprisingly large differences in usability and navigation patterns between the two study maps, despite the maps being similar in terms of the number of objects, area occupied by the geographical state object, and information density. The maps differed in the number of levels of organization: the Washington map had three levels of county, city, and landmark, while the Montana map was single-leveled. Perhaps for this reason, the study failed to find differences in participant performance when using the two study interfaces with the Montana map, but with the Washington map, their participants were faster in a navigation task and more accurate in the memory tasks using just the temporal interface without an overview. We took advantage of this unintended data-level difference to examine how interfaces with multiple levels of display data support single-leveled data. These fortuitous opportunities for re-analysis were, however, rare.
Interaction complexity

In some cases, interaction style may be a factor in the study, and in others, unintended differences in interaction complexities among the interfaces studied may not be avoidable. Nonetheless, interaction complexity differences make comparison difficult, as seen in Hornbæk and Hertzum’s (2007) study on fisheye menus.

Hornbæk and Hertzum’s (2007) intention was to study the visual design and use of fisheye menus. They had four interfaces: a traditional cascading menu (temporal), the Fisheye menu as described by Bederson (2000) (embedded), the Overview menu (separate), and the Multifocus menu (embedded). The separate and the embedded Multifocus interfaces were both based on Bederson’s (2000) Fisheye menu, and all three simultaneous-VIR interfaces implemented the focus-lock interaction to aid menu-item selection. The separate interface did not implement distortion, and showed a portion of the menu items based on mouse position along the menu, showing the field-of-view in the low-VIR view. The embedded Multifocus interface showed important menu items in readable fonts, and did not have an index of letters as in the embedded Fisheye or the separate interfaces.

Surprisingly, their temporal interface outperformed all other simultaneous-VIR interfaces. The researchers suggested that one possible reason was the relatively simple navigation in the temporal interface: their participants encountered obvious and severe difficulties in using the focus-lock mode in the other interfaces. While the researchers successfully identified a usability problem in Bederson’s (2000) Fisheye menu, we could not conclude whether the visual designs of the other three simultaneous-VIR interfaces were truly inferior to the temporal interface that showed one VIR at a time.
Ensure comparable interfaces with follow-up studies

Our recommendation to use comparable interfaces can be difficult to implement. One challenge is to identify study elements prior to the study to ensure comparability. For example, in Hornbæk et al.’s study on map navigation, the researchers did try to use comparable maps (Hornbæk and Frokjær 2001; Hornbæk, Frokjær, and Plaisant 2003). Differences between the two study maps were only apparent after the study.

Another difficulty in adhering to these suggestions may be due to a conflict of evaluation goals: the goals of the original designs were to compare between systems at the overall-performance level, while our goal was to extract the effects of interface-use factors in systems. It is therefore difficult to modify original study designs without changing these goals, since the systems themselves are complex and are frequently incomparable at the interface-factor level.

In both cases, we believe follow-up studies are needed. Follow-up studies, either performed by the original researchers or by third parties, can take advantage of the knowledge gained in original studies or system-level studies, for example by correcting mistakes made in original studies such as using different levels in data or different levels in interfaces. System-level evaluations can be used as a vehicle to identify factors, perhaps by detailed observations of how participants interact with the systems. These factors can then be studied in more detail and in isolation in subsequent studies. For example, Baudisch et al.’s (2004) Fishnet interface study identified at least two factors, the visual components’ spatial arrangement and the row-column association with the highlighted popouts, which can be studied in isolation with appropriate study designs.
4.8.2 Capture usage patterns

In most reviewed studies, the main measurements were performance time, accuracy, and subjective preferences. While these measurements provided valuable information on overall interface effectiveness, efficiency and user acceptance, they may not be sensitive enough to illuminate factors involved in interface use and to tease out design tradeoffs, especially when the study failed to find overall performance differences between the interfaces.

While most reviewed studies reported experimenter observations on participant strategy and comments to interpret performance results, only 5 of the 19 studies reported usage patterns, constructed either based on eye-tracking records or navigation action logs (Table 4.2). We found these five studies to be most useful in our analysis.

For example, Hornbæk et al.’s (2003) study on online document reading used progression maps to investigate reading patterns. Progression maps showed visible parts of the document during the reading process. The researchers interpreted study results using reading patterns derived from these progression maps and provided a richer understanding of how the study interfaces were used. For example, their reading pattern explains the longer performance time in the question-answering task trials using the separate interface: “further explorations were often initiated by clicking on the overview pane”, and “further exploration [of the displayed documents] happen[ed] because of the visual appearance of the overview and because of the navigation possibility afforded by the ability to click the overview pane”. They therefore concluded that “the overview pane grabs subjects’ attention, and thereby leads them to explorations that strictly speaking are unnecessary” (p. 144).

In another of their studies, Hornbæk and Hertzum (2007) looked at fisheye menu use.
Despite not finding performance differences between their simultaneous-VIR interfaces, eye-tracking results showed interesting insights into how the interfaces were used: their participants used the low-VIR regions more frequently in the embedded Multifocus interface trials, possibly due to the readable information included. The researchers were therefore able to conclude based on usage patterns that designs should make “the context region of the [fisheye menu] interfaces more informative by including more readable or otherwise useful information” (p. 28).

Capture usage patterns with observations

We found that usage patterns provided rich insights into interface use regardless of statistical results. We therefore recommend recording and reporting detailed but non-intrusively collected usage patterns in studies, from simple observations to detailed interactivity and eye-gaze logs.

4.8.3 Isolate interface factors

Information visualization systems are complex interfaces that typically involve visual encoding and interaction, and for some implementations, view coordination and image transformation. While simply identifying such factors is probably sufficient to evaluate system effectiveness, studying overall effects may obscure contributions from each factor, a difficulty we encountered during our analysis to draw design guidelines based on these factors.

That was the case when we looked at Gutwin and Skopik’s (2003) study on two-dimensional steering, where at least three factors were at play. Their study looked at five separate and two embedded interfaces. In addition to the different spatial arrangements of the different levels of details in their interfaces, there were also different effective steering path widths and lengths and different interaction styles. Section 4.5 discusses potential factors in this study in detail.
Another type of difficulty we encountered in our analysis was to tease out usability factors involved in the embedded techniques. While showing all data as a single view in context may provide benefits, these techniques often require more complex interactions and image distortion, which has been shown to incur costs in orientation (Carpendale et al. 1997) and visual memory (Lam et al. 2006).

Ideally, we would like to be able to study each of these factors in isolation. However, we were only marginally successful in teasing out the effects of distortion, as our study set had embedded interfaces that implemented different types and degrees of distortion. For example, Baudisch et al.’s (2002) study implemented their embedded interface with a hardware approach, using different pixel density in their displays to recreate the two regions, thus avoiding the need for distortion in their interface. Their study found performance benefits in all their tasks using their embedded display. In contrast, studies that implemented drastic and elastic distortion techniques reported null or mixed results, along with observed usability problems, for example the Rubber Sheet Navigation (Sarkar et al. 2003) in Nekrasovski et al.’s (2006) study, the Hyperbolic Tree browser in Plaisant et al.’s (2002) and Pirolli et al.’s (2003) studies, and the fisheye projections in Schafer and Bowman (2003). Despite this insight, our distortion classification is still rough, both in terms of classifying distortion types and performance effects. Section 4.6.1 further discusses distortion in embedded interfaces.

Isolate interface factors with follow-up studies

As in the case of using comparable interfaces, it may not always be possible to identify all relevant interface factors during experimental design. In addition, conducting fully-crossed experiments with a large number of factors may be too expensive.
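As a rough illustration of why fully-crossed designs become expensive, the sketch below enumerates every combination of a few interface factors and contrasts it with a follow-up design that varies a single factor. The factor names, their levels, and the baseline condition are hypothetical, chosen only to echo the kinds of factors named in Section 4.8.1; they are not taken from any reviewed study.

```python
from itertools import product

# Hypothetical factors and levels, for illustration only.
FACTORS = {
    "arrangement": ["temporal", "separate", "embedded"],
    "distortion": ["none", "grid", "elastic"],
    "data_levels": [1, 3],
    "interaction": ["pan_zoom", "view_coordination"],
}

# Hypothetical baseline condition held constant in a follow-up study.
BASELINE = {
    "arrangement": "separate",
    "distortion": "none",
    "data_levels": 3,
    "interaction": "pan_zoom",
}


def fully_crossed(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


def one_factor_follow_up(factors, vary, baseline):
    """Vary a single factor while holding all others at the baseline."""
    return [{**baseline, vary: level} for level in factors[vary]]


all_conditions = fully_crossed(FACTORS)
follow_up = one_factor_follow_up(FACTORS, "distortion", BASELINE)

if __name__ == "__main__":
    print(len(all_conditions), "conditions when fully crossed")
    print(len(follow_up), "conditions when only distortion varies")
```

With these assumed levels the fully-crossed design has 3 x 3 x 2 x 2 = 36 conditions, whereas the single-factor follow-up has 3; the count multiplies with every factor added.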
We therefore also recommend conducting follow-up studies to focus on a selected subset of the identified factors.

4.8.4 Report study details

One of the frustrations we had while analyzing our study set stemmed from the lack of details in study reporting. Indeed, Chen and Yu (2000) encountered similar problems in their meta-analysis. Since their meta-analysis synthesized significance levels and effect sizes, they had to exclude many more studies than in our qualitative analysis. Based on their experience, Chen and Yu recommended four standardizations in empirical studies: testing information, task taxonomy (for visual information retrieval, data exploration, and data analysis tasks), cognitive ability tests, and levels of details in reporting statistical results. They also asked for better clarity in visual-spatial properties descriptions and more focus on task-feature binding in studies. The researchers concluded that “it is crucial to conduct empirical studies concerning information visualization systematically within a comparable reference framework” (p. 864).

In addition to supporting Chen and Yu’s (2000) recommendations, we have two further recommendations. We advocate reporting full task instructions. We also advocate documenting interface interactions with video, or even making the interface prototype software and trial experiments available for download. Allowing others to see or experience the exact instructions and interface behaviours seen by study participants would help reproducibility and clarify study procedures for later meta-analysis.

Although nine of the 19 reviewed papers provided detailed descriptions of the study tasks, only five provided actual task instructions.
Since interface use can be severely affected by task nature, such as the levels of detail required in task answers, it was difficult to analyze study results when the publications did not provide the written task instructions given to participants before the trials and any verbal hints given during the trials. For example, in our analysis, we needed to ascertain the factors that led to successful use of simultaneous-VIR displays. One possibility was when task instructions provided clues that spanned multiple data levels. Since we attempted to reinterpret the results based on different criteria, we encountered difficulties when the study did not provide enough task instructions for us to be certain if the task instructions provided multiple-level clues, for example in Plaisant et al.’s (2002) SpaceTree study where we had to guess based on study observations.

Even providing detailed task instructions may still be inadequate in some cases. For example, in Pirolli et al.’s (2003) preliminary task analysis study, their tasks were measured for information scent. Even though the researchers did provide a list of tasks, they did not cross-match the list with information scent scores, making it difficult for us to later associate task nature, information scent score, and study results. We therefore assumed the instructions of high information scent tasks provided useful clues at multiple levels of the tree.

For studies where interaction plays a pivotal role in study results, text descriptions of the interaction, no matter how detailed and carefully constructed, seem inadequate. One example is Hornbæk and Hertzum’s (2007) study on the use of fisheye menus, where the focus-lock interaction was found to be one of the major usability problems. Despite the researchers’ well-constructed descriptions, we did not fully understand the interaction until we tried the online fisheye menu prototype kindly provided by Bederson¹.
Footnote 1: http://www.cs.umd.edu/hcil/fisheyemenu/

Report study details with online resources

We understand that the strict page limits for research papers in many venues have required authors to make draconian choices in the amount of detail reported. Even without the page limits, such choices should be guided by the study goals and paper emphasis to ensure readability, as it is impossible to predict how study results may be used in future analysis. We therefore recommend that researchers provide study details as electronic supplementary materials in publication venues that support archival availability of such materials, or as information posted on laboratory websites.

4.9 Limitations of Study

While we attempted to provide a comprehensive systematic review of the use and design of multiple-VIR interfaces, our ability to include all relevant studies in our review was necessarily limited by our method, our own knowledge, and time. Section 4.1 discusses limitations in our methodology.

Also, our synthesis was based entirely on the publications. In many cases, the goals of these reports were to directly compare interfaces as a whole, especially when one or more of the interfaces were novel. Given our goal to understand interface use, we often had to read the publications from a different perspective, and consequently, we may have misread or incorrectly inferred information from these publications.

4.10 Summary of Results and Implications for Design

We analyzed 19 existing multiple-VIR interface studies to extract design guidelines, and cast our findings into a four-point decision tree: (1) When are multiple VIRs useful? (2) How to create the low-VIR display? (3) Should the VIRs be displayed simultaneously? (4) Should the VIRs be embedded, or separated? We recommended that VIR and data levels should match, and low VIRs should only display task-relevant information.
Simultaneous display of the different VIRs, rather than temporal switching between them, is suitable for tasks with multi-level answers, or tasks that provide multiple-level clues.

We identified two areas for further investigation in this thesis: the question of low-VIR display creation, and the issue of simultaneous-VIR display. We investigated these questions with the next three studies: two laboratory studies and a field evaluation. For completeness, we have included study results from the two laboratory studies in this review. The next two chapters report study details: Chapter 5 reports our laboratory experiment on visual memory costs in geometric transformation, discussed in this review in the context of distortion in Section 4.6.1; Chapter 6 reports our experimental-simulation study on overview use with single-level data, denoted in this review as Lam et al. (2007) and discussed in various sections as one of the studies analyzed. Our field evaluation continued our evaluation into overview use and simultaneous-VIR arrangement, and is reported in Chapter 8 after the discussion on the design of the study tool in Chapter 7.

Chapter 5
Laboratory Experiment: Visual Memory Costs of Image Transformations

The second study in this thesis further investigated issues of overview creation and spatial arrangements of visual resolutions discussed in the summary synthesis (Chapter 4).

Geometric transformations such as scaling, rotation, rectangular fisheye, and polar fisheye transformations are widely used in creating the low-VIR displays in multiple-VIR interfaces. Scaling, for example, has been used to create thumbnails for documents, and rotation has been used in graph navigation (e.g., Yee et al. 2002). Both fisheye transformations are often implemented in embedded interfaces; examples include rectangular fisheye transformations to realize text or grid-based distortions such as DateLens (Bederson et al.
2004) and polar fisheye transformations used in focus + context map applications such as those used in Schafer and Bowman’s (2003) study. Section 4.6.1 discusses the various types of distortion implemented in embedded interfaces in more detail.

Despite their widespread use, there is a danger that the transformed images may be too distorted to remain recognizable. Unfortunately, the effects of these transformations on performance are largely unknown, as seen in our summary synthesis. Several design guidelines have been suggested to transform images with minimal disruption. These guidelines include:

• Maintain orthogonal ordering (left-right, up-down ordering), proximity (distance relationships between objects) and topology (inside-outside relationships) of the original image (Misue et al. 1995);

• Use visual cues to support the user’s comprehension of geometric distortion (Carpendale et al. 1997). Background grids have been suggested as the most effective of these (Zanella et al. 2002), as used in EPT (Carpendale et al. 1997).

• Use animation to retain the relationships among components displayed during transformation, and to avoid reassimilating the new display (Robertson et al. 1989). Many visualizations involving geometric transformation follow this principle, with earlier adopters being Pad++ (Bederson and Hollan 1994) and Table Lens (Rao and Card 1994).

While these guidelines may provide designers with some hints for handling geometric transformations, they are based mostly on casual experience, and are not detailed or quantitative enough for actual implementation. Clearly, different types of geometric transformations and different degrees of transformation incur different amounts of perceptual cost. Knowing these costs would help designers gauge cost-benefit tradeoffs in their applications.
Quantifying the effectiveness of the various techniques suggested by these guidelines to mitigate transformation costs would also be helpful. For example, since smooth animation may impose a heavy computational load, it would be useful to determine the largest transformation “jump” we can perceptually tolerate. Also, the presence of grids may create visual noise instead of being beneficial.

This work extends earlier studies on geometric transformations and visual search (Rensink 2004; Lau et al. 2004). Its goal was to better understand and quantify the effects of two-dimensional geometric transformations on visual memory to guide interface and visualization design. In this study, we presented the first measurements of the effects of four types of geometric transformation on visual memory: scaling, rotation, rectangular fisheye, and polar fisheye transformations. These transformations were applied to automatically generated abstract images consisting of dots and connecting lines. Based on these results, we defined a no-cost zone boundary for each transformation type, after which task time and accuracy degraded. We also refined two of the design guidelines mentioned above: Misue et al.’s (1995) orthogonal ordering requirement and the use of background grids to mitigate costs incurred by transformation (Zanella et al. 2002).

5.1 Experiments

We conducted ten experiments to investigate the effects of geometric transformations on visual memory. Two additional experiments were conducted as follow-up experiments. All experiments used a within-subject design. In each experiment, we considered only a single factor, the transformation type, looking at five levels of transformation degree. Each transformation level was blocked, with the order of level presentation partially counterbalanced across participants using the ordering listed in Appendix C.4.
Each level was tested using two phases, each with eight trials. In the learning phase, participants were presented with eight stimuli in sequence. In the recognition phase, they were shown another set of eight stimuli, 50% of which had been shown in the learning phase. For each stimulus, participants were asked to determine whether it had been shown in the learning phase. Baseline performance was measured in terms of response time and accuracy obtained using untransformed test stimuli. This baseline was then compared with results of the transformed trials.

5.1.1 Transformations

We investigated four types of transformations to abstract images consisting of dots connected by lines: scaling, rotation, rectangular fisheye and polar fisheye. We also examined the effects of grid presence and grid type. Ten initial experiments were carried out:

• Scaling (1, 0.5, 0.33, 0.25, 0.2x reduction factor)
    Exp 1. no grid
    Exp 2. rectangular grid
• Rotation (0, 30, 45, 60, 90 degrees clockwise rotation)
    Exp 3. no grid
    Exp 4. rectangular grid
• Rectangular fisheye (0, 0.5, 1, 2, 3 transformation factor)
    Exp 5. no grid
    Exp 6. rectangular grid
    Exp 7. polar grid
• Polar fisheye (0, 0.5, 1, 2, 3 transformation factor)
    Exp 8. no grid
    Exp 9. rectangular grid
    Exp 10. polar grid

Figure 5.1: Sample stimuli for expt 2, the scaling transformation with rectangular background grids, darkened for printing purposes. Expt 1 used similar stimuli without background grids.

The choice of transformation ranges was based on two considerations. For scaling, there was a limit to which we could reduce stimuli size without severely compromising perceivable detail. Otherwise, we used pilot results to determine the start of performance degradation induced by the transformations.
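To make the scaling and rotation levels listed in Section 5.1.1 concrete, the following is a minimal sketch of how they could be applied to dot coordinates. It is an illustration only and not the original study software: the function names, the choice of the image centre as the fixed point, and the use of screen coordinates with the y axis pointing down are all assumptions.

```python
import math

# Levels taken from the experiment list in Section 5.1.1.
SCALING_FACTORS = (1, 0.5, 0.33, 0.25, 0.2)
ROTATION_DEGREES = (0, 30, 45, 60, 90)

# Assumed fixed point: centre of a 300x300 pixel stimulus.
CENTRE = (150.0, 150.0)


def scale_point(point, factor, centre=CENTRE):
    """Shrink a point towards the centre by the given reduction factor."""
    cx, cy = centre
    return (cx + (point[0] - cx) * factor,
            cy + (point[1] - cy) * factor)


def rotate_point(point, degrees, centre=CENTRE):
    """Rotate a point about the centre.

    With the y axis pointing down (screen coordinates, an assumption),
    a positive angle appears as a clockwise rotation on screen.
    """
    cx, cy = centre
    theta = math.radians(degrees)
    dx, dy = point[0] - cx, point[1] - cy
    return (cx + dx * math.cos(theta) - dy * math.sin(theta),
            cy + dx * math.sin(theta) + dy * math.cos(theta))
```

For example, under these assumptions a dot 100 pixels to the right of the centre moves to 100 pixels below the centre after a 90-degree rotation, and to 50 pixels right of the centre at a 0.5x reduction factor.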
Based on our results, we extended two of the experiments: (1) experiment 4-ext: rotation with a rectangular grid to study a wider range of rotations (0, 90, 120, 150, and 180 degrees), and (2) experiment 10-ext: polar fisheye with a polar grid to study the effects of transforming the sizes of the dots, and drawing the connecting lines in various coordinate systems.

We did not include the translation transformation as it had previously been found to be robust in visual search tasks to at least 2 degrees of visual angle (Rensink 2004).

5.1.2 Stimuli

All experimental stimuli were randomly generated abstract images consisting of dots connected by lines. We chose to use abstract rather than photorealistic images in part to avoid semantic effects, such as the verbal effect found by Goldstein and Chance (1971), where recognition accuracy was considerably lower for objects difficult to name. Moreover, in the domain of information visualization, data is typically represented in abstract form. Our stimuli were similar to two-dimensional network graphs, but we believe these results generalize to many different encodings of information.

Figure 5.2: Sample stimuli for expt 4 and 4-ext, the rotation transformation with rectangular background grids, darkened for printing purposes. Expt 3 used similar stimuli for up to 90 degrees rotation without background grids.

Figure 5.3: Sample stimuli for the rectangular fisheye experiment with a rectangular grid (expt 6), along with the maximally distorted image for the polar-grid variety (expt 7). Expt 5 used similar stimuli without background grids. Grids have been darkened for printing purposes.

All original stimuli had a resolution of 300x300 pixels to ensure that all levels of transformations would fit onto the display screen.
In the grid experiments, we filled the entire screen with the corresponding grid. We used a different set of images for each experiment, but the same experimental set for all participants.

All images were generated in the same manner for consistency. Each consisted of 15 dots connected by lines. The number of dots was determined in pilot studies to optimize image memorability. The locations of the dots were randomly generated. The algorithm only guaranteed non-collision but not constant density of the dots.

Pilot studies showed that the task was too difficult if we only provided the dots. Lines were therefore added to link the dots to enhance stimuli memorability, similar to lines drawn between stars in astronomical constellations. The algorithm that added the lines did not guarantee that all the dots were joined as a single unit, but it did ensure that each dot was connected to at least one other dot, namely, its nearest neighbour. The algorithm minimized line crossing, but did not control the number of topological features, for example loops. When grids were added to the images, the thickness of the connecting lines was increased to two pixels to better distinguish the dot-line foreground from the grid background.

For the fisheye transformation experiments, we used a transformation function taken from Leung and Apperley (1994):

\[ T(x) = \frac{(d + 1)\,x}{d\,x + 1} \tag{5.1} \]

where T(x) is the transformed value given input x, and d is the transformation factor. A larger d value leads to a higher degree of distortion.

Figures 5.1 to 5.4 show a series of stimuli showing all the transformation types and levels. Additional stimuli used in the experiment are included in Appendix C.3.

Figure 5.4: Polar fisheye transformations sample stimuli. First two rows show sample stimuli for the polar fisheye transformation with rectangular grid in expt 9. Expt 8 used similar stimuli without background grids. The third row shows one example stimulus used in expt 10, transformed at the maximum transformation factor with a polar grid. Stimuli in the last row were used for expt 10-ext. Grids have been darkened for printing purposes.

5.1.3 Participants

A different group of 20 participants was tested in each of the 12 experiments. All were university students with normal or corrected-to-normal vision. Their ages ranged from 18 to 34 years.

5.1.4 Protocol

For each of the 12 experiments, all 20 participants completed trials on all five levels of the test transformation, and the order of appearance of the levels was partially counterbalanced among the participants. The actual presentation orders used are listed in Appendix C.4.

Each experiment had a separate pool of stimuli. Stimuli were randomly selected from a pool of 50, and each appeared only once in the experiment for each participant to avoid learning effects, but the same pool was used for all participants in each experiment. Prior to the actual experiment, participants were shown samples of original and transformed images to help them understand the transformation.

Each transformation-level session consisted of two phases: learning and recognition. In the learning phase, participants were asked to study eight untransformed images; each was displayed for 12 seconds and followed by a 2.5-second blank screen before the next image appeared. Participants were told they would need to recognize those images later on in the experiment, and that some of these images might be transformed in a manner similar to sample images shown during the training session. In the recognition phase, eight transformed images were shown to participants in sequence. Half of these had been shown in the learning phase in their original form. The participants’ task was therefore to indicate whether they had seen the images in the learning phase.
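The two-phase session described above can be sketched as follows. This is a hedged illustration rather than the software used in the study: the names `make_session` and `accuracy` are hypothetical, stimuli are represented by opaque identifiers, and the sketch covers a single transformation-level session only, since how stimuli were allocated across the five levels is not detailed here.

```python
import random


def make_session(pool, n_learn=8, n_test=8, rng=None):
    """Pick stimuli for one learning phase and one recognition phase.

    Half of the recognition stimuli are drawn from the learning set
    (these would be shown transformed); the other half are unseen.
    Returns (learning, recognition), where recognition is a shuffled
    list of (stimulus_id, was_shown_in_learning) pairs.
    """
    rng = rng or random.Random()
    n_old = n_test // 2
    n_new = n_test - n_old
    chosen = rng.sample(pool, n_learn + n_new)   # no stimulus repeats
    learning = chosen[:n_learn]
    new_items = chosen[n_learn:]
    old_items = rng.sample(learning, n_old)
    recognition = [(s, True) for s in old_items]
    recognition += [(s, False) for s in new_items]
    rng.shuffle(recognition)
    return learning, recognition


def accuracy(recognition, responses):
    """Percentage of yes/no responses that match the ground truth."""
    correct = sum(resp == shown
                  for (_, shown), resp in zip(recognition, responses))
    return 100.0 * correct / len(recognition)
```

With this structure, a participant who answers "seen" on every recognition trial scores 50%, which is the blind-guessing level.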
Task instructions presented to participants are included in Appendix C.2.

Prior to the experiment, participants were trained on the task using untransformed images in both the learning and the recognition phase. They were required to obtain at least 80% accuracy before starting the actual study. Each experiment typically took 30 minutes. Participants were compensated for their time with five dollars. Based on our pilot experience, in order to do well on the tasks, participants needed to pay close attention to the test images during the learning phase. As an added incentive, we informed participants that high-accuracy scores would result in additional five-dollar bonuses.

5.2 Data Analysis and Result Summaries

                                     No-cost zone
Experiment                           Time       Accuracy   Combined
1. Scaling: no-grid                  ≥ 0.2x     ≥ 0.2x     ≥ 0.2x
2. Scaling: rect-grid                ≥ 0.2x     ≥ 0.2x     ≥ 0.2x
3. Rotation: no-grid                 45°        45°?       45°
4-ext. Rotation: rect-grid           60°        60°        60°
5. Rect Fisheye: no-grid             d = 1      d = 1      d = 1
6. Rect Fisheye: rect-grid           d = 2      d = 2      d = 2
7. Rect Fisheye: polar-grid          d = 2?     d = 2      d = 2
8. Polar Fisheye: no-grid            d = 1?     d = 1      d = 1
9. Polar Fisheye: rect-grid          d = 2      d = 2      d = 2
10. Polar Fisheye: polar-grid        d = 2?     d = 2?     d = 2

Table 5.1: Summary of experimental results: no-cost zones. A no-cost zone is the largest degree of transformation that can be compensated for without incurring a cost in performance. The combined result is the minimum of the time and accuracy results. Note that results from expt 10-ext are not included since they are inconclusive.

                                                Performance Cost
Experiment                           Tx Level   Time (s)     Accuracy (%)
1. Scaling: no-grid                  none       none         none
2. Scaling: rect-grid                none       none         none
3. Rotation: no-grid                 60°        5.4 (3.4)    69 (88)
4-ext. Rotation: rect-grid           90°        5.9 (4.1)    75 (88)
5. Rect Fisheye: no-grid             d = 2      5.2 (4.6)    50 (88)
6. Rect Fisheye: rect-grid           d = 3      3.9 (2.8)    75 (88)
7. Rect Fisheye: polar-grid          d = 3      5.5 (3.5)    75 (94)
8. Polar Fisheye: no-grid            d = 2      4.7 (3.7)    75 (94)
9. Polar Fisheye: rect-grid          d = 3      5.6 (3.5)    75 (88)
10. Polar Fisheye: polar-grid        d = 3      5.6 (3.8)    75 (88)

Table 5.2: Summary of experimental results: performance cost at the transformation levels just outside the no-cost zones, as shown in the Tx Level column. Baseline values are in parentheses for comparison. Italicized results are cases where the boundaries were estimated based on observed trends instead of statistical analyses. Note that results from expt 10-ext are not included since they are inconclusive.

We recorded two performance measures: response time and accuracy. Response time was defined as the period from when the image was shown during the recognition phase to the time when a response was made. Accuracy was the percentage of answers that correctly identified whether the images had been shown in the learning phase. Blind guessing would lead to 50% accuracy, since half of the images shown in the recognition phase were present in the learning phase.

For the analysis of response times, we used a repeated-measures single-factor Analysis of Variance (ANOVA) with transformation type as the factor for each experiment. We used the Greenhouse-Geisser adjustment and marked the results as adjusted if the sphericity assumptions were violated. Post-hoc analyses were performed for statistically significant results with Bonferroni correction and marked as corrected. For the accuracy results, we used the Friedman test for the initial analyses, and the Mann-Whitney test for post-hoc analyses. Only significant results are reported for the post-hoc analyses.

For each experiment, we mapped out a no-cost zone beyond which the performance began to degrade, as indicated by measurably higher response times and lower accuracy rates when compared to performance on untransformed images based on statistical analyses.
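The no-cost zone definition can be read as a simple rule over the ordered transformation levels. The sketch below is one possible reading under stated assumptions, not the analysis procedure itself: the per-level `degraded` flags are assumed to come from the post-hoc comparisons against baseline, and the function names are hypothetical.

```python
def no_cost_boundary(levels, degraded):
    """Return the last level reached before performance first degrades.

    levels   -- transformation levels ordered from baseline to the most
                transformed (for scaling this means 1, 0.5, ..., 0.2).
    degraded -- parallel booleans; True where performance at that level
                was found to be worse than baseline (assumed input).
    """
    boundary = levels[0]
    for level, is_degraded in zip(levels, degraded):
        if is_degraded:
            break
        boundary = level
    return boundary


def combined_boundary(levels, time_boundary, accuracy_boundary):
    """Combine time and accuracy boundaries by taking the less
    transformed of the two, as in the Combined column of Table 5.1."""
    return min(time_boundary, accuracy_boundary, key=levels.index)
```

Ordering by position in `levels` rather than by numeric value keeps the rule valid for scaling, where a smaller factor means a larger transformation.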
Limitations of our no-cost zone definition are further discussed in Section 5.5.

Due to the large number of experiments, we summarized our results in Table 5.1. For cases where boundaries were not established by statistical analyses, we provided estimates based on result trends and marked them with a ‘?’. Table 5.2 lists the results immediately outside of the identified no-cost zones. Corresponding baseline values were provided in parentheses for comparison.

As the tables indicate, visual memory was robust against many forms of transformations to a large extent. Scaling did not impact performance down to a reduction factor of at least 0.2x. Rotation did not seem to affect performance up to 45 degrees and both fisheye transformations had little effect on time or accuracy up to d = 1. The presence of grids generally extended these boundaries.

5.3 Detailed Results and Statistics

We now provide the detailed experimental results and data analyses for each of the four transformation types. For readability, details of the ANOVA and the post-hoc analysis results are listed in Appendices C.5.1 and C.5.2 respectively.

5.3.1 Scaling transformation

Figure 5.5 shows the results. Results showed no significant differences between the five levels, with or without adding grids to the images: time/no-grid: F(2.3, 43.2) = 0.67, p = .54, adjusted; accuracy/no-grid: χ²(4, N=20) = 2.01; time/rect-grid: F(4, 76) = .60, p = .67; accuracy/rect-grid: χ²(4, N=20) = 3.15, p = .53. Scaling over the ranges studied was not found to impact performance, and further reduction of the stimuli would render them too small to discern details.

5.3.2 Rotation transformation

Figure 5.6 shows the results. For the no-grid experiment, we found a marginal main effect in response time (F(1.9, 35.8) = 2.92, p = .070).
Post-hoc analysis indicated that performance degradation was measurable beginning at 60 degrees, at which participants took 5.4 s compared to the 3.4 s baseline. We also found a marginal main effect in accuracy (χ²(4, N=20) = 8.75, p = .070) but could not identify a clear no-cost boundary.

For the rectangular-grid experiment, we failed to find a main effect in either time (F(2.6, 49.7) = 1.33; p = .27, adjusted) or accuracy (χ²(4, N=20) = 7.16, p = .13), and were thus unable to locate no-cost zone boundaries based on these results.

Since we found relatively little performance degradation in the rectangular-grid results, we extended the range of rotation to cover 0, 90, 120, 150, and 180 degrees in expt 4-ext. In order to keep our five-level design, we did not revisit 30, 45, and 60 degree rotations in expt 4-ext, but we did include the 90-degree rotation condition as a reference point to compare with expt 4. The results are shown in Figure 5.6 as “Rectangular Grid Ext”.

In expt 4-ext, we obtained similar results for the 0 and 90-degree conditions as in expt 4; the 90-degree result was 8% higher numerically, but not significantly different. Unlike in expt 4, we found a main effect in response time (F(4, 76) = 5.05, p = .001) in expt 4-ext. Post-hoc analysis indicated both the 90-degree and the 180-degree rotation trials were significantly slower at 5.9 s compared to the 4.1 s baseline. We also found a main effect in accuracy (χ²(4, N=20) = 14.95, p = .005). Post-hoc analysis indicated the transformed trials were 14% less accurate than baseline.

These results therefore suggested a no-cost boundary of 60 degrees. To determine the improvement provided by the rectangular grid, we compared the accuracy between the non-grid and grid trials from 30 to 90 degrees. Accuracy for the grid results was higher than for their non-grid counterparts by 10% (two-tailed Mann-Whitney test, p = .03). This increase in accuracy was not accompanied by an increase in time, thus ruling out any time-accuracy tradeoff.

Figure 5.5: Results for the scaling experiments with N = 20. Response time data points are averages with 95% confidence interval bars. Accuracy results are medians with quartiles.

5.3.3 Rectangular fisheye transformation

Figure 5.7 shows the results. For the no-grid experiment, we found a marginal main effect in response time (F(1.9, 36.2) = 2.83, p = .074, adjusted). It took 0.6 s longer for d = 2 and d = 3 trials than the 4.6 s baseline. We also found a main effect in accuracy (χ²(4, N=20) = 43.80, p < .001) and the d = 2 and d = 3 trials were 33% less accurate than the rest of the trials. Using the one-sample z-test, we found that the accuracy for the d = 2 and d = 3 trials was at chance (Z(N=40) = 1.44; p = .15). These results indicated a clear no-cost zone boundary at d = 1.

For the rectangular-grid experiment, we found a marginal main effect in time (F(2.78, 52.9) = 2.63; p = .063, adjusted). Post-hoc analysis indicated that d = 3 trials were slower at 3.9 s when compared to the 2.8 s baseline, indicating a no-cost time boundary at d = 2. There was a strong effect in accuracy (χ²(4, N=20) = 18.34, p = .001), with baseline and d = 1 trials being 15% more accurate than d = 3 trials, indicating a no-cost accuracy boundary at d = 2.

For the polar-grid experiment, the main effect in time was also marginal (F(4, 68) = 3.32; p = .051, adjusted), with a marginal time degradation at d = 3 (p = .077, corrected). While the task accuracy main effect remained, it was much smaller (χ²(4, N=19) = 10.4, p = .034), with a no-cost accuracy boundary at d = 2.

5.3.4 Polar fisheye transformation

Figure 5.8 shows the results. We failed to find a main effect in time for the no-grid experiment (F(1.82, 34.5) = 2.3; p = .12, adjusted).
There was, however, a main effect in accuracy (χ²(4, N=20) = 17.16, p = .002), with d = 2 and d = 3 trials being 20% less accurate than baseline, thus indicating a no-cost accuracy boundary at d = 1. A one-sample z-test analysis indicated that performance at d = 2 and d = 3 had not degraded to chance (Z(N=40) = 8.23; p < .001).

For the polar-grid experiment, we found a main effect in time (F(4, 76) = 6.08, p < .001). Post-hoc analysis indicated d = 3 trials were 1.7 s slower than baseline and d = 1 trials, which took 4 s on average. This indicated a time no-cost zone boundary at d = 2. We failed to find a main effect in accuracy (χ²(4, N=20) = 6.92, p = .14).

Figure 5.6: Results for the rotation experiments with N = 20. Response time data points are averages with 95% confidence interval bars. Accuracy results are medians with quartiles.

Figure 5.7: Results for the rectangular fisheye experiments with N = 20. Response time data points are averages with 95% confidence interval bars. Accuracy results are medians with quartiles.

For the rectangular-grid experiment, we found a main effect in time (F(4, 76) = 4.32, p = .003). Post-hoc analysis indicated d = 3 trials were slower by 1.8 s than the 3.8 s baseline and d = 1 trials, thus indicating a no-cost time boundary at d = 2. We also found an accuracy main effect (χ²(4, N=20) = 11.27, p = .024). Post-hoc analysis indicated d = 3 trials were 12% less accurate than baseline, thus indicating a no-cost accuracy boundary at d = 2.

Despite extending the no-cost boundaries from d = 1 to 2, the presence of either polar or rectangular grids on polar fisheye transformed images did not substantially improve accuracy.
This pattern was in stark contrast to that found in the rectangular fisheye experiments and suggests that there is something unusual about the polar fisheye transformation.

One possibility involves the shape of the lines connecting the dots. In experiment 10, the connecting lines were straight. If straight lines were less natural in the polar transformed images than in their rectangular counterparts, then this unnaturalness may have contributed to the lack of benefit of grids in the polar trials.

To test our hypothesis, we extended the polar fisheye experiment to look at line shape in experiment 10-ext, where the straight lines in the original images were drawn based on either a polar coordinate system (polar-line), a rectangular coordinate system (rect-line), or a mirror image of the ones drawn in the polar coordinate system (antipolar-line). The last case was included to tease out any potentially adverse effects induced by an unnatural transformation on the lines.

Theoretically, transformation can be applied globally to the surrounding space, or locally to the objects in the space. In experiment 10, we assumed that space was transformed without affecting the sizes or shapes of the dots and the lines, as if they were pinned on the surface instead of completely adhered to the surface of transformation. The only exception was in scaling, where we had to transform the dot size to avoid collision. To determine if this might account for the polar fisheye results, we also included a case where we transformed the size of the dots and kept the lines in the rectangular coordinate system (scaled-dot).

We failed to find a main effect in time (F(2.4, 45.5) = 2.09, p = .13), but did find a main effect in accuracy (χ²(4, N=20) = 15.7, p = .003). Post-hoc analysis indicated that our participants made significantly more errors in the polar-line trials than at baseline, and the accuracy was at chance (Z(N=20) = 1.45; p = .15).

Examples of these transformations are shown in the last row of Figure 5.4, and Figure 5.9 shows the results. In essence, the pattern found for the polar fisheye results does not appear to be due to the scaling of the dots, nor the shape of the lines connecting them. Instead, it appears that the polar fisheye transformation may simply be better suited to visual memory.

5.4 Discussion

Our results were used to map out no-cost zones in all the transformation types studied. We first compare our results to Lau et al.’s (2004) investigations, which were complementary to ours and studied visual search instead of visual memory. We then examine our results in the context of two design guidelines for using image transformations in interfaces: the use of background grids to mitigate perceptual costs incurred by image transformations (Zanella et al. 2002), and preserving horizontal/vertical ordering, proximity, and topology to minimize transformation-incurred disruptions (Misue et al. 1995).

5.4.1 Effects of image transformations

We compare our study results with those of Lau et al.’s (2004) investigations of perceptual costs in geometric transformations, measured in visual search tasks to locate the figure “T” amongst a population of “L” figures. Both our results and those of Lau et al. (2004) suggested that invariance was possible for all geometric transformations up to a point. Interestingly, this invariance appeared to be more extensive in recognition than in search tasks. For example, search task performance degraded after a 50% reduction, while memory task performance remained unaffected even at 20% of the original size. Participants could also tolerate a larger degree of rotation (memory: 45°; search: 17°), and a larger amount of polar fisheye transformation (memory: d = 1; search: d = 0.5) (footnote 2).
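To give a feel for the transformation factors d compared here, the transformation function of Equation 5.1, T(x) = (d + 1)x / (dx + 1), can be sketched in code. The function `fisheye` follows the equation directly; how it is mapped onto a 300x300 image is not spelled out in this chapter, so the rest is an assumption made for illustration: the focus is placed at the image centre, the rectangular variant applies T to each axis independently, the polar variant applies T to the radial distance, and all function names are hypothetical.

```python
import math


def fisheye(x, d):
    """Equation 5.1: T(x) = (d + 1) x / (d x + 1), for x in [0, 1]."""
    return (d + 1) * x / (d * x + 1)


def rect_fisheye(point, d, size=300):
    """Apply T separately to the x and y offsets from the centre."""
    half = size / 2
    out = []
    for coord in point:
        offset = (coord - half) / half            # normalised to [-1, 1]
        moved = math.copysign(fisheye(abs(offset), d), offset)
        out.append(half + moved * half)
    return tuple(out)


def polar_fisheye(point, d, size=300):
    """Apply T to the distance from the centre, keeping the angle."""
    half = size / 2
    r_max = half * math.sqrt(2)                   # centre-to-corner distance
    dx, dy = point[0] - half, point[1] - half
    r = math.hypot(dx, dy)
    if r == 0:
        return (half, half)                       # the focus does not move
    k = fisheye(r / r_max, d) * r_max / r         # radial stretch factor
    return (half + dx * k, half + dy * k)
```

At d = 0 the function reduces to T(x) = x, which corresponds to the untransformed baseline, and T(1) = 1 for every d, so points on the boundary stay in place while points near the focus are pushed outwards.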
While we applied the transformations to dot locations in most of our experiments, we found interesting results when we applied the polar fisheye transformation to dot sizes, and drew the connecting lines based on different coordinate systems. Contrary to our intuition, trials using images with lines drawn based on the polar coordinate system were the least accurate and equivalent to blind guessing, while corresponding trials with supposedly unnatural mirror images of these lines exhibited better performance. These results suggest that distinctive local structure, rather than global consistency, was a more important factor in memorability. At large distortions, the lines in the polar-line images formed similarly rounded shapes, while corresponding antipolar-line images produced figures with enough acute angles to remain distinguishable, despite their blatant incongruity with the underlying transformation and with the coordinate system.

Footnote 2: The Lau et al. (2004) experiments used a different polar fisheye transformation function with a transformation factor c. A c value of 1.2 can be roughly translated to our d = 0.5.

Figure 5.8: Results for the polar fisheye experiments with N = 20. Time data points are averages with 95% confidence interval bars. Accuracy results are medians with quartiles.

Figure 5.9: Results for the extended polar fisheye experiment with N = 20. Time data points are averages with 95% confidence interval bars. Accuracy results are medians with quartiles. orig = original image; scaled-dot = dot sizes transformed; antipolar-line = lines drawn as the mirror image in the polar coordinate system; polar-line = lines drawn in the polar coordinate system; rect-line = lines drawn in the rectangular coordinate system.

5.4.2 Effects of grids

In the design guidelines listed at the beginning of this chapter, Zanella et al.
(2002) suggested using background grids to mitigate perceptual costs incurred by image transformations. We found that for visual memory, adding grids to the images appeared to help in two ways:

1. No-cost zone extension. The presence of either a rectangular or a polar grid generally pushed the no-cost zone boundaries to higher levels. For example, the combined no-cost zone boundary for the fisheye transformations was increased from d = 1 to d = 2, and the rotation boundary was pushed from 45 to 60 degrees.

2. Accuracy improvement. Grids were found to improve accuracy. For rotation, participants were 10% more accurate in grid trials without spending extra time in the task, thus ruling out potential time-accuracy tradeoffs. In the case of the rectangular fisheye transformation, we found that participants’ accuracy improved from chance to baseline at d = 2, and to 75% at d = 3, again without time compensation. Interestingly, we failed to observe substantial improvement by adding grids to polar fisheye transformed images. Here, the grids appeared to simply elevate response times slightly, echoing the results for visual search (Lau et al. 2004).

To understand the apparent lack of performance improvement in polar trials, and to obtain further insights into the different transformation types and their interactions with grids, we revisited the design guidelines described at the beginning of the chapter.

5.4.3 Revisiting design guidelines

The design guidelines discussed at the beginning of the chapter were based mostly on design experience and were mostly abstract. In this section, we explain our results based on Misue et al.’s (1995) guidelines to provide concrete examples, and suggest a refinement on preserving orthogonal ordering based on our results. Misue et al.
(1995) suggested that horizontal/vertical ordering, proximity, and topology should be maintained to minimize disruptions incurred by image transformations. Scaling preserves all three; the limit of this transformation seems to be how far one can reduce the image before the details can no longer be perceived. This finding is consistent with the common interface design practice of using scaled-down versions of images to represent full-resolution file contents, especially when the file content is visually salient, as in the cases of most image files and graphically intense web pages. Indeed, various forms of thumbnails have been suggested for small-screen devices to avoid the laborious reauthoring of desktop-sized web pages for small screens (Woodruff et al. 2001; Wobbrock et al. 2002).

The rotation transformation violates horizontal/vertical ordering but maintains proximity and topology. Interestingly, rectangular grids fail to improve performance starting at a 90-degree rotation. Since our images did not have a clear up-down axis, this limit may be due to our inability to recognize the main vertical axis and the up direction in the image. Having a rectangular grid may help re-orientation, but only if the information provided by the grid is unambiguous. For example, the grid looked the same for 0, 90 or 180-degree rotations, and similarly for 30 or 120-degree and 60 or 150-degree rotations. Taken together, our results suggest a refinement to Misue et al.’s (1995) guideline on maintaining orthogonal ordering: transformations should preserve an orthogonal relationship between principal axes with a clear up and down.

For both fisheye transformations, proximity is violated while horizontal/vertical ordering and topology are preserved. In that case, the perceptual challenge is to discern the relative distance between objects in the image.
The polar fisheye transformation seemed to be much better tolerated than its rectangular counterpart, as accuracy was maintained at 75% even outside the no-cost zone in the polar case while corresponding rectangular trials showed chance performance. This result was not expected, as the polar transformation’s rounded appearance does not look natural on a rectangular screen (Leung and Apperley 1994); among other things, it bends horizontal and vertical lines. Nonetheless, the polar fisheye transformation is generally preferred over its rectangular counterpart in map applications, since the distortion may be perceived as consistent with the effect of distorting a planar map onto a hemisphere, and the transformation preserves the angle of the original image (Skopik and Brown 1992; Churcher et al. 1997). The polar fisheye transformation may also be more familiar than the rectangular one, as the effect resembles that produced by the ultra-wide angle fisheye lens used in photography.

The number of transformation parameters and their degree of integration may further explain the smaller degree of degradation observed in our polar fisheye trials. In the rectangular case, the width and height are transformed separately. Rectangles that are the same distance from the focus point may not have the same size and shape. Objects may thus be distorted with different aspect ratios based on their horizontal and vertical distances, which may impose a higher mental load (Bartram et al. 1995). In contrast, the polar fisheye transformation only distorts radial distances, and may not incur the same problem as the rectangular case.

This issue may also explain the different effects we observed in our fisheye transformation trials. In the rectangular fisheye trials, adding a polar or rectangular grid improved accuracy from chance to 75% without time compensation.
In contrast, neither a rectangular nor a polar grid improved performance in the corresponding polar fisheye trials. One possibility is that the grid, rectangular or polar, provided a powerful visual cue encoding standard distances in transformed images that helped to offset the difficulty in distance estimation when the image was distorted, as in the rectangular fisheye case. Since distance transformation is integrated in polar fisheye transformations, distance estimation may not be as difficult as in the rectangular case, thus nullifying potential benefits brought about by adding a grid.

Visual cues may also be used to aid recognition of objects. Researchers have investigated how the boundary of a scene affects target location after learning (Hartly et al. 2004), and how view-point changes affect scene recognition (Christou et al. 2003).

Smooth animation is another technique believed to alleviate the disruptive effects of image transformations (Robertson et al. 1989; Bederson and Boltman 1999). Similar to previous work on visual search (Rensink 2004; Lau et al. 2004), our current results suggest that the visual system could compensate for relatively large jumps in transformations. Both visual search and visual memory have thus been ruled out as valid reasons for requiring smooth animation. Nevertheless, the need for such animation may arise from some other considerations, and so further investigations are needed before advocating relaxing that design guideline.

5.5 Limitations of Study

Our study is limited in two aspects: the definition of no-cost zones and the choice of geometric transformation types.

5.5.1 Definition of no-cost zones

The main motivation behind defining a no-cost zone for each transformation type is to connect to and enable comparison with Lau et al.’s (2004) study results. Lau et al.’s (2004) study also investigated perceptual costs of geometric transformations but in visual search tasks, and is therefore complementary to our study.
However, the definition of the no-cost zone is limited. For this study, we defined the boundary of a no-cost zone as the first level of transformation at which we could measure performance degradation. Our identified boundaries were therefore limited by at least two factors: our ability to find statistically significant performance differences between different levels in our experiment, and the number of levels we tested in our experiments. It can be argued that our inability to detect performance differences between experimental levels could be due to a lack of experimental power instead of a true absence of performance degradation, even though our marginally significant cases had medium to large effect sizes (Appendix C.5.1).

5.5.2 Transformation type

In this work we adopted the view that geometric transformations simply affected object locations within a space. An equally valid view is to consider transformation on the space itself and the objects embedded within it. That view corresponds to transforming dot sizes and line shapes in addition to dot locations, so visual cues providing more information about how the space has transformed could improve performance. We briefly studied this issue in our extended study on polar fisheye transformation in experiment 10-ext, where we looked at effects of transforming dot sizes and their connecting lines drawn in various coordinate systems. Our results suggest that memorability may depend more upon local image structure than on global consistency with the underlying transformation and coordinate system. Further investigations are needed to establish this conclusion more firmly.

Our experiments looked at how a single and uniform transformation affects visual memory. In real-life situations, images may transform by parts and independently. It would be interesting to compare our results with those obtained using multiple transformations on a single image.
We suspect the perceptual limits for multiple transformations will be much smaller than those established in our current set of experiments.

We decided on a small number of dots in the stimuli to create an acceptable level of task difficulty, but scalability is of interest. It would be interesting to see if the total number of dots in the stimuli would impact visual memory in similar ways if the stimuli contain local features that are individually salient and memorable. Also, in most information visualization interfaces, the whole purpose of fisheye transformation is to create space for new information to be displayed, instead of creating a large empty space as in our fisheye stimuli (Figures 5.3 and 5.4). We suspect having new information added to the stimuli would further reduce no-cost zone boundaries defined in this study.

5.6 Summary of Results and Implications for Design

We examined the effects of four different types of transformations on visual memory: scaling, rotation, rectangular fisheye, and polar fisheye. We found no-cost zones in all of the transformation types that exceed those found in Lau et al.’s (2004) work on visual search. We also found substantial benefits in applying grids to images for all of our transformation types except for polar fisheye. Our work therefore quantified the limits of our visual memory in coping with geometric transformations, and validated the use of grids as a visual cue to aid recognition of images.

The main contributions of our study are to provide empirical evidence to verify, exemplify, and to refine design guidelines that were based mostly on design experience and are abstract.
Two main design implications drawn from study results are a refinement on Misue et al.’s (1995) guideline on orthogonal ordering, where we suggested that providing a clear up-down direction indicator may be sufficient, and the effectiveness of background grids to mitigate visual memory costs in geometric transformations.

Even though this study systematically quantifies visual memory costs of two-dimensional geometric transformations, it is difficult to apply the no-cost zone boundary values directly to interface design. First, in addition to our study limitation in the correct identification of no-cost zone boundaries as discussed in Section 5.5.1, these boundaries were determined based on collected task completion time and accuracy only. It is therefore unclear if other costs such as cognitive load may play important roles in task performance. Second, the abstract task of image recognition is difficult to extend to real-life tasks, as we are not sure how visual recognition affects visualization use in real-world systems such as multiple-VIR interfaces, even though common sense informs us that it has to be an important factor. For example, it is unclear if recognition of a transformed image guarantees usability of the transformed image. In other words, it may not be the case that image recognition implies that the user can identify individual nodes, or the relationships between nodes, in a transformed image of a network. For these reasons, direct application of no-cost zone boundary values in design is difficult.

To study interfaces under more realistic and applicable situations, we modeled our next study using the experimental-simulation strategy frequently used in human-computer interaction and information visualization studies. Our study first identified perceptual requirements for effective overview use in pilot investigations, and then examined these effects in detail in the actual study.
Chapter 6

Experimental-Simulation Study: Overview Use in Multiple Visual Information Resolution Interfaces

The third study in this thesis looked at overview use in multiple visual information resolution (VIR) interfaces using fully interactive interfaces, scenario-based tasks, and detailed recorded observations. The goal of the study is to understand overview use in multiple-VIR displays. More specifically, we studied perceptual requirements of overview graphical objects that permitted users to select areas of interest for further examination in high-VIR displays, and examined how different spatial arrangements of the VIRs can support overview use.

We studied four interfaces: low VIR, high VIR, and two multiple-VIR interfaces where high and low VIRs were available in separate regions or embedded together. Our study data were unordered collections of line graphs synthetically created for specific visual characteristics at low and high VIRs. At low VIRs, we used colour encoding of the y-dimension to create a heatmap-like strip. At high VIRs, we used height coding in conjunction with colour for y-values to show a more traditional plot. To better study interface preferences, our participants could use any combination of VIRs in the multiple-VIR interfaces.

We found that in cases where our two perceptual requirements, visual simplicity and narrow visual span, were not met, participants using our multiple-VIR interfaces did not obtain better time and accuracy performance over the high-VIR interface, even when the multiple-VIR interfaces offered obvious benefits such as visually associating detailed plots with strips in a complex-target matching task, or side-by-side display in a visual comparison task. In fact, we were intrigued to find that at least 20% of participants chose to forego these benefits and devoted the entire interface to the high-VIR display.
We conjecture that our results reflect the high interaction costs of multiple-VIR interfaces and the surprisingly stringent target visual requirements to enable effective overview use in multiple-VIR interfaces.

6.1 User Study Design

We took a different approach in this study since we had considerable difficulty in extending the visual-memory experiment results to design. While perceptual studies collect important information about our visual system, building results obtained from abstract tasks with static images into design guidelines for visualization is a long and challenging process, especially when visualization use is complex and dynamic, and our understanding of human vision, memory and cognition is still incomplete. Also, the complexity of both our visual system and visualization use makes it difficult to isolate and identify factors to build models of interface use, and perceptual studies are not optimal for discovering new factors.

In designing our experimental-simulation study, we took what we believed to be the strengths of our visual-memory experiment and perceptual studies in general: rigorous experimental design with established protocols and tasks. We therefore took care to develop study tasks based on published task taxonomies (Section 6.1.1), used synthetic data to control visual features (Section 6.1.2), and used comparable visual elements to encode the data (Section 6.1.3). To better observe true interface use, we modified standard study design by allowing our participants to decide on interface use: our interfaces provided a simple mechanism to switch between interface modes, and our participants could use either single-VIR mode in all of the multiple-VIR trials. This study design choice resulted in interesting insights into interface use.

We studied four interfaces: two single-VIR (LoVIR, HiVIR) as comparison baselines, and two multiple-VIR (Embedded, Separate). We had four visual search and compare tasks, and collected three types of data: performance measurements as time and error rates; detailed observations of participant behaviours and strategies; and participant feedback from subjective questionnaires.

6.1.1 Study tasks

We developed four study tasks based on operations in published taxonomies for task diversity and generalizability, such as locate, identify, compare, associate, distinguish, rank, cluster, correlate, and categorize (Amar et al. 2005; Roth and Mattis 1990; Tory and Möller 2004; Wehrend and Lewis 1990). We used a scenario of monitoring and managing electric power in a control room to develop concrete examples of these abstract operations.

We first piloted with the 12 tasks listed in Table 6.1. Based on pilot results and Tullis’s (1985) work on display characteristics described in the Related Work chapter (Section 3.1.1), we identified two target characteristics that affected high- and low-VIR view use: visual complexity and visual span. Complexity referred to the number of peaks in the target, where simple targets had a single peak and complex ones had multiple peaks. Span referred to the visual extent of the target: targets were considered local when they spanned less than 2 degrees of visual angle, or 2 cm of horizontal display width at a viewing distance of 55 cm. Otherwise, they were considered dispersed. Other display characteristics considered, but not further studied, include overall visual organization in the display, studied in questions 8 and 10, and visual uniqueness of line graphs, studied in questions 1, 3, 5, 6, 7, and 11 in Table 6.1.

We selected four of the original twelve pilot tasks to address different aspects of these perceptual criteria, which were questions 2, 4, 9, and 12 in Table 6.1. In addition, visual instructions were also provided for participants to control for individual differences in visual analytical skills.
Table 6.2 presents the task code names and the domain instructions. Appendix D.2.2 contains all task instructions displayed on the study software. Table 6.3 summarizes task characteristics based on the two study perceptual requirements.

6.1.2 Study data

We found in our pilot studies that data characteristics greatly influenced participant strategies. We therefore used synthetic data to ensure contrasting data characteristics, and developed tight criteria to create the study data collections.

Each data collection contains several data groups: original feature to match and search target, distractor, and background. To avoid target pop-out by colour or position, and to control task difficulty and visual diversity, we created two distractor and five background populations, each containing 19 or 20 time series with characteristic patterns of peaks. Examples of these populations are given later on in the section so as to discuss them in the context of study tasks.

No. | Operation Class | Task Instruction
1 | distinguish | Does location 107’s power consumption profile differ from the rest of the locations monitored?
2 | find extremum (vertical) | Which location has the highest power surge for the time period shown on the screen?
3 | correlate | A fault occurred at 6:00, and resulted in a temporary power surge. Which location is affected the earliest?
4 | find extremum (horizontal) | Which location has the most number of power surges?
5 | compute derived value (standard deviation) | Which location has the most stable power consumption profile?
6 | compute derived value (average) | Which location has the highest power consumption overall?
7 | find anomalies | Identify the unique power consumption profile in this collection.
8 | characterize distribution | The power stations are sorted by longitude of their location vertically, with top of the screen being the top of the country. Is the highest power consumers in the top, middle, or lower 3rd of the country?
9 | correlate | A fault happened at location <x> at 6:00, causing a similar power surge in another location afterwards. Which one?
10 | correlate + categorize | The following recording consists of recording of an entire year. Given power consumption increases with decreasing temperature, and Winter is the coldest season, which season do you think is at the beginning of the recording (Spring, Summer, Fall, Winter)?
11 | filter + categorize | The recording is taken from a power control room in the UK during a football match on TV. It is known that UK citizens tend have a habit of making tea during breaks, thus causing power surges. Which location did not receive the broadcast?
12 | compare | Find the power profile that is the same as that of location <x>.

Table 6.1: Instructions for the twelve pilot study tasks, along with their operations.

Domain Instruction | Visual Instruction
Max: Which location has the highest power surge for the time period shown on the screen? | Look for the brightest spot. You can mouse over and read the power off the tool-tip. Also notice the maximum power scale is shown above.
Most: Which location has the most number of power surges? | None needed.
Shape: A fault happened at location <x> at 6:00, causing a similar power surge in another location afterwards. Which one? | Look for a power surge of a similar shape as the one at location <x> at 6:00.
Compare: Find the power profile that is the same as that of location <x>. | All the profiles are exactly the same, except time-shifted by different amounts. The power surges of location <x> are in the middle of each column.

Table 6.2: Instructions for the four study tasks.
Task | Complexity | Span | Comparison
Max | simple | local | no
Most | complex | dispersed | no
Shape | complex | local | yes
Compare | simple | local | yes

Table 6.3: Summary of study task and data characteristics.

Each peak was created using a Gaussian function with a specified mean that translates to peak location, and variability that translates to peak width. The peak was scaled to the required height. In addition, we added random noise of up to 2 pixels in absolute value to better mimic real-life data (Kincaid and Lam 2006). Figure 6.1 shows the targets and distractors for the Max task, Figure 6.2 for the Most task, Figure 6.3 for the Shape task, and Figure 6.4 for the Compare task.

The parameters used were determined based on pilot results. For the Max task, the target peak was 10% higher, or 6% brighter on screen, than the distractor peaks and was at least 20% higher than the background peaks, as shown in Figure 6.1. For the Most task, the target line graph had six peaks of varying widths and heights at random x-positions, while the distractor line graphs had four peaks and the background graphs had three peaks or fewer, as shown in Figure 6.2. For the Shape task, the target and distractors were peak clusters of three narrow peaks with similar widths and different heights of low, medium and high. As shown in Figures 6.3 and 6.5, four configurations were created: (a) high-low-high, (b) low-high-low, (c) low-medium-high, and (d) high-medium-low. For the Compare task, all line graphs had the same three-peak configuration, but horizontally shifted by ±10, ±20, or ±30 pixels from the target, as shown in Figure 6.4. As a result, participants could use any of the peaks in the three-peak line graphs for comparison.

For each task, we generated a collection of 140 line graphs, each with 800 data points, for a total of 112,000 data points.
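The peak construction described above can be sketched as follows. This is a minimal illustration rather than the generator used in the study: summing overlapping peaks, drawing the noise from a uniform distribution, and all function names, parameter names, and example values are assumptions made for the sketch.

```python
import math
import random

def gaussian_peak(n_points, mean, sigma, height):
    """One peak: a Gaussian centred at `mean` (peak location) with standard
    deviation `sigma` (peak width), scaled so its maximum equals `height`."""
    return [height * math.exp(-0.5 * ((x - mean) / sigma) ** 2)
            for x in range(n_points)]

def make_line_graph(peaks, n_points=800, noise=2.0, rng=None):
    """Sum the given (mean, sigma, height) peaks, then add noise of at most
    `noise` in absolute value. Uniform noise is an assumption of this sketch."""
    rng = rng or random.Random()
    series = [0.0] * n_points
    for mean, sigma, height in peaks:
        for i, value in enumerate(gaussian_peak(n_points, mean, sigma, height)):
            series[i] += value
    return [value + rng.uniform(-noise, noise) for value in series]

def make_shifted_variants(base_peaks, shifts=(-30, -20, -10, 10, 20, 30), **kwargs):
    """Compare-task style variants: the same peak configuration, shifted
    horizontally by each amount in `shifts`."""
    return {shift: make_line_graph([(m + shift, s, h) for m, s, h in base_peaks],
                                   **kwargs)
            for shift in shifts}
```

For example, `make_line_graph([(200, 8, 20), (400, 8, 35), (600, 8, 20)])` would yield one 800-point series with a low-high-low configuration; the widths and heights shown are placeholders, not the study parameters.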
The numbers of line graphs and data points were determined by the vertical and horizontal resolution of the display area, so that the entire collection could be visible without scrolling in LoVIR.

6.1.3 Interfaces

We used two visual elements to show xy-data, inspired by the Line Graph Explorer system (Kincaid and Lam 2006) that uses analogous but visibly different visual encodings for low- and high-VIR views. Both elements encoded the x-dimension in the same way, but their encodings of the y-data value differed:

1. Strip encoded the y-data with colour as a low-VIR strip of 6 pixels in height.

2. Plot doubly encoded the y-data with both colour and vertical spatial position as a high-VIR plot of 45 pixels in height.

Colour encoding was achieved by mapping y-value to saturation and brightness in the HSB space. To maximize line-graph detail perceivability, we mapped the normalized y-value y to saturation s and brightness level b using a sigmoidal function:

\[
s = \frac{2}{1 + e^{-4(1 - y)}} - 1; \qquad b = \frac{2}{1 + e^{-4y}} - 1 \tag{6.1}
\]

Using these two visual elements, we built the four interfaces shown in Figure 6.1: (a) LoVIR, (b) HiVIR, (c) Embedded and (d) Separate. The display area for all the interfaces was 872 × 880 pixels. LoVIR showed the data collection using only the strips, while the HiVIR interface displayed only the plots. Both Embedded and Separate provided strips and plots, showing only strips initially. In Embedded, left clicking on a strip added or removed a corresponding plot directly below, with the pair bounded by a one-pixel perimeter box to visually reinforce the association. In Separate, left clicking on a strip added or removed the corresponding plot in the bottom panel, and marked or unmarked both the strip and the plot with separate perimeter boxes.
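As a brief aside on the colour encoding, Equation 6.1 can be sketched in code as below. The mapping from y to saturation and brightness follows the equation; the hue value and the function names are assumptions made for this illustration, since the hue is not specified here.

```python
import colorsys
import math

def saturation_brightness(y):
    """Equation 6.1: map a normalized y-value in [0, 1] to (s, b)."""
    s = 2.0 / (1.0 + math.exp(-4.0 * (1.0 - y))) - 1.0
    b = 2.0 / (1.0 + math.exp(-4.0 * y)) - 1.0
    return s, b

def strip_colour(y, hue=0.0):
    """Convert the HSB triple to RGB components in [0, 1].
    `hue` is a placeholder; the hue used in the study is not stated."""
    s, b = saturation_brightness(y)
    return colorsys.hsv_to_rgb(hue, s, b)
```

With the equation as written, low y-values come out dark and saturated while high y-values come out bright and unsaturated, which is consistent with the instruction to look for the brightest spot in the Max task. Both outputs top out at about 0.96 rather than 1.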
The lower-plot window automatically resized with newly added plots and took up screen area from the upper-strip window for up to half the screen height, after which the sizes of both windows remained constant, and the lower-plot window accommodated newly added plots with vertical scrolling. Users could deactivate the automatic panel resizing by manually dragging the panel divider. Dragging the panel divider all the way to the top or the bottom of the screen allowed users to manually transform Separate to either HiVIR or LoVIR.

All interfaces had a panel on the far left to display the strip/plot numbers as text strings for plots or as graphical bars for strips, as shown in Figures 6.6, D.1, and D.2. Positions and states of the number displays were linked with those of the corresponding strips/plots.

Common interactions

For consistency, we standardized a number of interactions, adding only slight interface-specific adaptations.

• Scrolling. A scrollbar supported vertical scrolling when display height exceeded panel height. LoVIR never required scrolling while HiVIR always did. Embedded and Separate became scrollable once a plot was added. Both the top and the bottom panels were separately scrollable in Separate. None of the interfaces required horizontal scrolling.

• Mouse-click marking. A left click toggle-marked a strip/plot. In LoVIR, Embedded, and Separate, the mark was a one-pixel box surrounding the entire strip in the low-VIR view. In HiVIR, we marked by colouring the plot background, because perimeter marking was not salient in the visually noisy plots. An example of highlighted plot background is shown in Figure 6.6 for plot 110.

• Key-press global action. For the single-VIR interfaces, participants could mark all strips/plots with the O key, and unmark them with the ESC key. For the multiple-VIR interfaces, pressing the O key added all plots to the high-VIR view in Separate, or opened all plots within Embedded. Pressing the ESC key restored the initial low-VIR view.

• Mouseover highlighting. For all the interfaces, a red one-pixel box appeared around the strip/plot perimeter on mouseover to provide visual feedback of the strip/plot in focus. In Separate, the strip-plot pair was highlighted for visual linking. Figures 6.1 and 6.6 show mouseover highlighting.

• Mouseover tool-tips. For all the interfaces, mouseover triggered a tool-tip to immediately appear, displaying the x- and the y-value of the data point under the cursor and the strip/plot number of that row.

Figure 6.1: Main study panel showing Max task data: (a) LoVIR, (b) HiVIR, (c) Embedded, and (d) Separate. The targets are circled in cyan, and one of the distractors is circled in yellow.

Figure 6.2: Main study panel showing Most task data: (a) LoVIR, (b) HiVIR, (c) Embedded, and (d) Separate. The targets are annotated with arrows in cyan. The rest of the line graphs are distractors.

Figure 6.3: Main study panel showing Shape task data: (a) LoVIR, (b) HiVIR, (c) Embedded, and (d) Separate. The targets are circled in cyan, and one of the distractors is circled in yellow.

6.1.4 Participants

Twenty-four participants, 15 of them female, were recruited using an online reservation system. The average age of participants was 26 years, with a range from 19 to 40 years. Most were university students, with less than half from the Department of Computer Science.

Figure 6.4: Main study panel showing Compare task data: (a) LoVIR, (b) HiVIR, (c) Embedded, and (d) Separate. The targets are annotated with arrows in cyan. The rest of the line graphs are distractors.
6.1.5 Material

The study was conducted on a desktop machine with a 3.2 GHz Intel P4 CPU, 1.5 GB of RAM, and Java 1.5.0_06, using a 19-inch LCD display with 1280 × 1024 pixels.

Figure 6.5: Sample targets for the Shape task. Four three-peak targets were created for the study: (a) high-low-high, (b) low-high-low, (c) low-medium-high, and (d) high-medium-low.

Figure 6.6: The HiVIR study interface showing Most task data. The full display window had a narrow region on the far left with strip/plot numbers, and then a main panel in the middle whose contents depended on the interface. The far right panel contained study instructions: on top, information on visual encoding and available interface interactions; beneath that, task instructions, as provided in Table 6.2; on the bottom, the Show Data and Answer Ready buttons.

6.1.6 Study design and protocol

The study was a within-subject, two-factor design with interface and task being the two factors, each with four levels. All four interfaces were tested against the four tasks. Each task had four isomorphic data sets, one for each trial. The order of presentation of the interfaces was counter-balanced between participants. Task ordering was randomized, and data ordering was fixed to avoid repeats in interface/data pairing between participants.

Figure 6.7: Experimental protocol for this study.

As depicted in Figure 6.7, the experiment consisted of four interface sessions, with one training and one actual task for each of the four interface/task combinations. Training for all four tasks preceded actual tasks for each interface session. The experimenter began by explaining the compact visual encoding used in the low-VIR views. Participants were then told about the structure of the study.
They were encouraged to try out interface features and to explore new strategies for the different interfaces during training, as strategies developed for one interface might not be appropriate for another. Appendix D.2.1 contains the script for the verbal instructions. Since the correct answers were obvious when found, participants were not explicitly told to optimize for speed or accuracy.

The entire display window is shown in Figure 6.6, which shows the HiVIR interface for the Most task. Figures D.1 and D.2 in Appendix D.2 show the Embedded and the Separate interfaces respectively. For each task, participants first read the instructions in the right-hand panel of the study interface. When ready, they pressed the Show Data button to display the data using the session interface. Once an answer was found, participants pressed the Answer Ready button to enter the answer in a dialogue box. Appendix D.2.2 contains all the instructions displayed in the experimental software interface.

For each interface/task combination, we allotted at least 10 minutes for participants to complete each training task. At the end of the 10 minutes, they had the option to end the training and be told the answer, or to continue the training. On average, participants took (3 ± 2) minutes to finish the training tasks, with similar average times across the four tasks. In terms of interfaces, the Separate training trials took four minutes on average, which was one to two minutes longer than the rest.

Actual tasks had five-minute time limits, after which participants had to proceed to the next task without being informed of the correct answer. Breaks were allowed in between tasks, and there was a mandatory five-minute break after two interface sessions.

For each task, the experimenter observed participant mouse actions, verbal comments, and non-verbal signals including large-scale eye movement and signs of frustration.
These observations were recorded as textual narrations in real time. For example: “Look for target in low-res. Press O to switch to high-res. Scan and scroll from top. Found answer, visual check without using tool-tip”. We used these observations to help us interpret our performance time and accuracy results. We also developed a coding scheme for three kinds of usage behaviours:

• Interface mode used to locate final answer. The three categories were LoVIR mode, HiVIR mode, and both. The observation, only recorded for the two multiple-VIR interfaces, was later corroborated by the electronically recorded log of user actions.

• Answer confirmation method. The two categories were visual comparison and tool-tip/numeric confirmation, differentiated based on back-and-forth tool-tip activations of the target and the candidate line graphs.

• Visual search mode. This observation was only collected for the LoVIR interface. The two categories were serial search, where the participant systematically inspected one strip at a time and in sequence, and visual spotting, where they surveyed the entire display simultaneously. Due to the narrow strips in LoVIR, serial search required the visual guide provided by the mouseover framing box, as shown in Figure 6.1(a). For visual spotting, participants simply gazed at the display without any mouse interactions.

After the four interface sessions, participants filled out two questionnaires. The first questionnaire solicited subjective ratings of the four interfaces over the four tasks, while the second solicited ratings of the four interfaces’ ease of use on a five-point rating scale (Table 6.4). The actual questionnaires used in the study are included in Appendix D.3. The entire study took about two hours, and participants were compensated with CDN $20.

Code            Question
find data       It is easy to find the data using the display.
compare data    It is easy to compare between data using the display.
navigate        It is easy to navigate within the display.
disorient       It is easy to get disoriented using the display.
remember        It is easy to remember individual power profiles using the display.
fun             It is fun and enjoyable using the display.
effort          It requires a lot of effort to use the display.
frustrating     It is frustrating to use the display.
confidence      I have confidence in my answer when using the display.

Table 6.4: Five-point rating questions to solicit ease of use ratings for the four interfaces.

Study hypotheses

We developed three study hypotheses based on pilot observations and our beliefs about multiple-VIR interface use. H1 aimed to establish boundaries of our two selected perceptual requirements:

H1 The targets should be simple and span a limited region for a single low-VIR display to be usable.

We believed that the LoVIR interface would be the most efficient for the Max task, where the visual target satisfied both criteria; insufficient but usable for the Shape task, where the target was complex; and unusable for the Most task, where both criteria were violated.

In cases where the visual requirements were not completely satisfied, we hypothesized that selective display of high-VIR plots would mitigate the adverse effects of the lost perceivability, especially when the interface obviously supported the task. More specifically, our hypotheses were:

H2 When the targets were visually complex and could not be easily detected in the strips, embedded display of high-VIR plots alongside the low-VIR strips would prime the search by promoting the learning of the unfamiliar and abstract strip.

In other words, the Embedded interface would better support the Shape task than the HiVIR or the Separate interfaces, as the Embedded interface put the corresponding plots right underneath the strips.
H3 When the targets were visually simple but similar to the distractors, precise identification of these targets using the low-VIR view would be difficult. However, users should still be able to select rough matches from the low-VIR view. The interface that displayed these potential matches in high VIR and allowed side-by-side comparisons would better support the task.

In other words, the Separate interface would better support the Compare task than the HiVIR or the Embedded interfaces.

6.1.7 Study design choices

As discussed at the beginning of the chapter, this study aimed to understand if users could still select regions of interest on low-VIR overviews that contained visual signals that did not fully comply with our two identified perceptual requirements: simple target with narrow visual span. Our goal of filling specific gaps in our understanding of multiple-VIR interface use led to seven main design choices.

1. Synthetic data. To create multiple isomorphic data sets with tight control over the visual characteristics of target, distractor and background graphs, we chose to generate synthetic data with real-world data characteristics.

2. Unordered data. While we used the visual encoding of the Line Graph Explorer system to build our interfaces (Kincaid and Lam 2006), we specifically avoided providing its sorting or clustering capabilities for two reasons. First, we wanted to focus on visual search and comparison based solely on visual qualities of individual targets, instead of the larger context. Pilot results showed that when the line graph collections as a whole showed larger trends, for instance clusters, the display was treated as a whole and participants did not selectively view individual line graphs in detail. Second, the power of reordering and clustering is already well understood (Kincaid and Lam 2006; Rao and Card 1994).

3. Task domain and visual instructions. To control for individual differences in visual analytical skills between participants, we provided specific domain task instructions on control room monitoring and the visual operation on the encoded data. Our scenario provided a concrete unifying story, but did not require any specific expertise on the part of participants.

4. On-the-fly interface switching. To observe our participants’ interface choices as another indicator of interface effectiveness, we allowed our participants to switch to either VIR of the multiple-VIR interfaces at any point, even though we provided an automatic mechanism to allocate screen space between the two VIRs.

5. Only two discrete VIRs. Some previous multiple-VIR interface studies have found that distortion-based interaction across a continuous range of VIRs can decrease performance and satisfaction (e.g., Nekrasovski et al. 2006). In this study, we chose to focus on the issue of spatial arrangement: separating low-VIR regions from high-VIR regions versus embedding them within high-VIR regions. We thus used only two discrete VIRs, as in systems like TableLens (Rao and Card 1994), to avoid conflating the question of spatial arrangement with that of distortion. Distortion was studied in Chapter 5, where we measured visual memory costs of geometric transformations.

6. Same platform and screen area across interfaces. A common platform ensured consistent visual encoding, common interaction, and identical display areas.

7. The full data set was simultaneously visible in the low-VIR interface so that it could be used as an overview. Our data set size was therefore limited to the display capability of the low-VIR view, which was 140 line graphs.

As a result of the last three design choices, vertical scrolling was needed when users chose to display plots.
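As an illustration of the counterbalanced interface ordering described in Section 6.1.6, the following Python sketch enumerates presentation orders for the four interfaces. The text does not state which counterbalancing scheme was used. Full counterbalancing is an assumption made here, chosen only because four interfaces yield 4! = 24 possible orders, which equals the number of participants. The participant-to-order assignment is likewise hypothetical.

```python
from collections import Counter
from itertools import permutations

# Interface names come from the study; the scheme below is an assumption.
INTERFACES = ("LoVIR", "HiVIR", "Embedded", "Separate")
N_PARTICIPANTS = 24


def full_counterbalance(conditions):
    """Return every possible presentation order of the conditions."""
    return list(permutations(conditions))


orders = full_counterbalance(INTERFACES)

# Hypothetical assignment: participant i receives the i-th order.
assignment = {p: orders[p % len(orders)] for p in range(N_PARTICIPANTS)}

# Count how often each interface appears in each session position.
position_counts = Counter(
    (position, interface)
    for order in assignment.values()
    for position, interface in enumerate(order)
)
```

Under this assumed scheme every participant receives a distinct order, and each interface appears equally often in each of the four session positions, so position effects such as practice or fatigue are spread evenly across interfaces.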
6.2 Study Results

In this section, we present performance results for the actual tasks as time and error counts, coded observations, and subjective questionnaire results. We used the original interface grouping for all the results even when participants switched to single-mode use in the multiple-VIR interface trials. In a separate analysis, we did not find significant differences between the single-mode use and the multiple-mode use populations in the multiple-VIR interface trials. Discussions of hypotheses are delayed to Section 6.3.

6.2.1 Performance time and error results

Performance time was defined as the period from when the participant pressed the Show Data button to when they pressed the Answer Ready button. We analyzed the time results using a repeated-measures two-factor Analysis of Variance (ANOVA) with interface and task as the two factors. When the sphericity assumption was violated, we used the Greenhouse-Geisser adjustment and marked the results as adjusted. Post-hoc analyses were performed with Bonferroni correction, and we report significant post-hoc results only.

Figure 6.8 shows the time results. Main effects of interface (F(3, 69) = 5.97, p = .001) and task (F(3, 69) = 34.45, p < .0001), as well as an interaction between the two (F(9, 207) = 11.20, p < .0001, adjusted), were found. For interface, post-hoc analysis indicated that LoVIR trials were slower than Embedded or Separate trials. For task, all pairs differed except Most and Shape. For the interface-task interaction, HiVIR/Max trials were almost 3.5 times slower than the other interfaces for the Max task. LoVIR/Most was almost 2 times slower than HiVIR/Most, Embedded/Most, and Separate/Most. LoVIR/Shape was 1.7 times slower than HiVIR/Shape, Embedded/Shape, and Separate/Shape.

Error measures were binary for each task: 1 when the participant provided an incorrect answer and 0 otherwise.
We first analyzed the data using the Friedman test, and used the Mann-Whitney test with appropriate corrections for post-hoc analysis. We report significant results only. Figure 6.9 shows error results for each interface/task condition. Results showed that LoVIR/Most trials had 7 errors compared to the perfect scores of HiVIR/Most and Embedded/Most, and LoVIR/Shape trials had 6 errors compared to the perfect scores of Embedded/Shape and Separate/Shape. Along with the time results in Figure 6.8, we concluded that none of the interface/task results exhibited a time-accuracy tradeoff: tasks that took longer also had more errors.

6.2.2 Observations

We quantified our observations by classifying each trial into one of the encoded categories. For multiple-VIR interfaces, we based our counts on the interface mode used when participants found the answers, and the count results are shown in Table 6.5. For all the interfaces, the methods for answer confirmation are summarized in Table 6.6. For the LoVIR interface, the visual search modes used to locate the visual targets are shown in Table 6.7.
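The degrees of freedom reported for the time analysis in Section 6.2.1 can be checked against the study design of 24 participants and two within-subject factors with four levels each. The Python sketch below applies the standard degrees-of-freedom formulas for a two-factor repeated-measures ANOVA; the function name and dictionary keys are illustrative and do not come from the study software.

```python
def within_subject_anova_df(n_participants, levels_a, levels_b):
    """Unadjusted (effect, error) degrees of freedom for a two-factor
    repeated-measures ANOVA in which both factors are within-subject."""
    subjects = n_participants - 1
    a = levels_a - 1
    b = levels_b - 1
    return {
        "interface": (a, a * subjects),
        "task": (b, b * subjects),
        "interaction": (a * b, a * b * subjects),
    }


# 24 participants, 4 interfaces, 4 tasks, as described in Sections 6.1.4 and 6.1.6.
df = within_subject_anova_df(n_participants=24, levels_a=4, levels_b=4)
```

The computed values, (3, 69) for each main effect and (9, 207) for the interaction, match the reported F statistics. A Greenhouse-Geisser adjustment scales both degrees of freedom by an epsilon of at most 1, so the interaction result marked as adjusted appears to be reported with its unadjusted degrees of freedom.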