Why Visualization? Task Abstraction for Analysis and Design

by

Matthew Michael Brehmer

B.Cmp. Cognitive Science, Queen's University, 2009
M.Sc. Computer Science (Human-Computer Interaction), The University of British Columbia, 2011

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in The Faculty of Graduate and Postdoctoral Studies (Computer Science)

The University of British Columbia (Vancouver)

April 2016

© Matthew Michael Brehmer, 2016 (CC BY-ND)

Abstract

Why do people visualize data?

People visualize data either to consume or produce information relevant to a domain-specific problem or interest. Visualization design and evaluation involves a mapping between domain problems or interests and appropriate visual encoding and interaction design choices. This mapping translates a domain-specific situation into abstract visualization tasks, which allows for succinct descriptions of tasks and task sequences in terms of why data is visualized, what dependencies a task might have in terms of input and output, and how the task is supported in terms of visual encoding and interaction design choices. Describing tasks in this way facilitates the comparison and cross-pollination of visualization design choices across application domains; the mapping also applies in reverse, whenever visualization researchers aim to contextualize novel visualization techniques.

In this dissertation, we present multiple instances of visualization task abstraction, each integrating our proposed typology of abstract visualization tasks. We apply this typology as an analysis tool in an interview study of individuals who visualize dimensionally reduced data in different application domains, in a post-deployment field study evaluation of a visual analysis tool in the domain of investigative journalism, and in a visualization design study in the domain of energy management.

In the interview study, we draw upon and demonstrate the descriptive power of our typology to classify five task sequences relating to visualizing dimensionally reduced data. This classification is intended to inform the design of new tools and techniques for visualizing this form of data.

In the field study, we draw upon and demonstrate the descriptive and evaluative power of our typology to evaluate Overview, a visualization tool for investigating large text document collections. After analyzing its adoption by investigative journalists, we characterize two abstract tasks relating to document mining and present seven lessons relating to the design of visualization tools for document data.

In the design study, we demonstrate the descriptive, evaluative, and generative power of our typology and identify matches and mismatches between visualization design choices and three abstract tasks relating to time series data.

Finally, we reflect upon the impact of our task typology.

Preface

Parts of this dissertation have been previously published with various co-authors:

A version of Chapter 2 has been published as A Multi-Level Typology of Abstract Visualization Tasks by Matthew Brehmer and Tamara Munzner; in IEEE Transactions on Visualization and Computer Graphics (Proceedings of InfoVis 2013), 19(12), p. 2376-2385 [33] (http://dx.doi.org/10.1109/TVCG.2013.124). I conducted the literature review. Tamara and I both contributed to the meta-analysis of the literature and writing.
A modified version of the task typology proposed in this chapter appears in Visualization Analysis and Design by Tamara Munzner (A K Peters Visualization Series, CRC Press, 2014) [219].

A version of Chapter 3 has been published as Visualizing Dimensionally Reduced Data: Interviews with Analysts and a Characterization of Task Sequences by Matthew Brehmer, Michael Sedlmair, Stephen Ingram, and Tamara Munzner; in Proceedings of the ACM Workshop on Beyond Time and Errors: Novel Evaluation Methods For Information Visualization (BELIV 2014), p. 1-8 [36] (http://dx.doi.org/10.1145/2669557.2669559). This publication was preceded by a technical report entitled Dimensionality Reduction in the Wild: Gaps and Guidance by Michael Sedlmair, Matthew Brehmer, Stephen Ingram and Tamara Munzner (UBC CS TR-2012-03) [283] (http://cs.ubc.ca/cgi-bin/tr/2012/TR-2012-03), and by an unpublished manuscript entitled Dimensionality Reduction in the Wild by Michael Sedlmair, Matthew Brehmer, Stephen Ingram and Tamara Munzner (2013; included in the appendices as Section B.6). Michael conducted the majority of the interviews with analysts between 2010 and 2012. All authors contributed to the initial analysis of the collected data. For the BELIV paper [36], I re-analyzed this data using the task typology described in Chapter 2. I performed the majority of the writing for the BELIV 2014 submission (which omitted much of the material from the earlier technical report and manuscript); Tamara and Michael contributed to the editing process.

A version of Chapter 4 has been published as Overview: The Design, Adoption, and Analysis of a Visual Document Mining Tool For Investigative Journalists by Matthew Brehmer, Stephen Ingram, Jonathan Stray, and Tamara Munzner; in IEEE Transactions on Visualization and Computer Graphics (Proceedings of InfoVis 2014), 20(12), p. 2271-2280 [35] (http://dx.doi.org/10.1109/TVCG.2014.2346431). Overview was developed by Jonathan Stray with contributions from Jonas Karlsson, Adam Hooper, and Stephen Ingram. Stephen's algorithmic contributions are documented in greater detail in an earlier technical report [151] and in his PhD dissertation [146]. I conducted a post-deployment evaluation of Overview and its use by investigative journalists. Jonathan and I interviewed the tulsa, ryan, and dallas journalists; I interviewed the guns journalist, while Jonathan interviewed the newyork journalist. Jonathan conducted the think-aloud evaluation with journalists. I performed the analysis of the interview data (including transcripts and screen captures), as well as the Overview log data. I performed the majority of the writing for the InfoVis 2014 submission, while Tamara and Jonathan contributed to the editing process.

A version of Chapter 5 has been published as Matches, Mismatches, and Methods: Multiple-View Workflows for Energy Portfolio Analysis by Matthew Brehmer, Jocelyn Ng, Kevin Tate, and Tamara Munzner; in IEEE Transactions on Visualization and Computer Graphics (Proceedings of InfoVis 2015), 22(1), p. 449-458 [37] (http://dx.doi.org/10.1109/TVCG.2015.2466971). I conducted the work domain analysis, sandbox prototyping, and the analysis of feedback on prototype designs from energy analysts. Jocelyn and I both contributed to the workflow design. Kevin initiated the project and provided feedback on my process during my internship at EnerNOC (then Pulse Energy); he also provided introductions to energy analysts.
EnerNOC's Energy Manager development team, led by Cailie Crane and Reetu Mutti, implemented some of our prototype designs into a new commercial version of Energy Manager. I performed the majority of the writing for the InfoVis 2015 submission, while Tamara and Jocelyn contributed to the editing process.

All images in Chapter 2, Chapter 4, and Chapter 5 are reprinted with the permission of the IEEE. Figure 4.3 is a detail from Figure C.1, an image produced by Jonathan Stray [305]. All images in Chapter 3 are reprinted with the permission of the ACM, with the exception of Figure 3.3, which appears in Tenenbaum et al. [313] and is reprinted with the permission of the AAAS. This dissertation includes several illustrations that were originally created by Eamonn Maguire for Visualization Analysis and Design by Tamara Munzner [219], including Figure 1.1, Figure 1.8, Figure 6.1, Figure 6.2, Figure 6.3, and Figure 6.4; these illustrations are available for use under the Creative Commons Attribution 4.0 International license (CC BY 4.0).

The studies described in this dissertation were conducted with the approval of the UBC Behavioural Research Ethics Board (BREB): certificate number H10-03336.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Glossary
Acknowledgments

1 Introduction
1.1 Research Trajectory
1.2 Motivation
1.3 Thesis Contributions
1.3.1 A Typology of Abstract Visualization Tasks
1.3.2 Use of the Typology in an Interview Study
1.3.3 Use of the Typology in a Field Study
1.3.4 Use of the Typology in a Design Study
1.3.5 Summary of Contributions
1.4 Extension and Impact of the Typology
1.5 A Note on Chronology

2 A Typology of Abstract Visualization Tasks
2.1 Motivation
2.2 Background Context
2.3 A Typology of Tasks
2.3.1 Why is Data Being Visualized?
2.3.2 How Does the Visualization Technique or Tool Support the Task?
2.3.3 What are the Inputs and Outputs of the Task?
2.3.4 Concise Task Descriptions
2.4 Example: A Sequence of Interdependent Tasks
2.5 Connections to Previous Work
2.5.1 Existing Classifications
2.5.2 Theoretical Foundations
2.6 Discussion
2.6.1 Using the Typology to Describe
2.6.2 Using the Typology to Generate
2.6.3 Using the Typology to Evaluate
2.7 Summary

3 Interview Study: Visualizing Dimensionally Reduced Data: Interviews with Analysts and a Classification of Task Sequences
3.1 Motivation
3.2 Related Work
3.3 Research Process
3.4 Task Sequences
3.4.1 Dimension-Oriented Task Sequences
3.4.2 Cluster-Oriented Task Sequences
3.5 A Task Typology Revisited
3.6 Discussion
3.6.1 Implications for Evaluation
3.6.2 Limitations
3.7 Summary

4 Field Study: Overview: The Design, Adoption, and Analysis of a Visual Document Mining Tool For Investigative Journalists
4.1 Motivation
4.2 Related Work
4.3 Initial Use Case
4.4 Design of Overview
4.5 Observations of Real World Usage
4.5.1 Case Studies
4.5.2 Think-Aloud Evaluation
4.6 Analysis
4.6.1 Task Abstractions Reconsidered
4.6.2 Design Rationale
4.7 Discussion
4.8 Summary
4.9 Addendum

5 Design Study: Matches, Mismatches, and Methods: Multiple-View Workflows for Energy Portfolio Analysis
5.1 Motivation
5.2 Methodology
5.3 Abstractions
5.3.1 Data Abstraction
5.3.2 Task Abstraction
5.4 The Previous Version of Energy Manager
5.5 Related Work
5.6 Prototyping Environments
5.7 Visual Encoding Matches and Mismatches
5.7.1 Faceted Views for Overview and Drill Down
5.7.2 Rank-Based Overviews
5.7.3 Matrix-Based Overviews
5.7.4 Map-Based Overviews
5.7.5 Stack-Based Roll Up Encodings
5.8 Workflow Design with Multiple Views
5.9 Results
5.10 Discussion
5.10.1 Guidelines: Familiarity and Trust
5.10.2 Methodological Reflection
5.11 Summary
5.12 Addendum

6 Reflection and Conclusion
6.1 Reflecting on the Task Typology
6.1.1 An Extended Task Typology
6.1.2 Comparisons to Roth (2013), Schulz et al. (2013)
6.1.3 Impact of the Task Typology
6.2 Reflecting on the Interview Study
6.3 Reflecting on the Field Study
6.4 Reflecting on the Design Study
6.5 Concluding Thoughts

Bibliography

A Appendix: Task Typology (A Chronology)
A.1 Preliminary Influences
A.2 Dedicated Literature Review
A.3 Meta-Analysis of Existing Classifications
A.4 Our Initial Classifications of Tasks
A.5 Mid-Level Visualization Tasks
A.6 Our Proposed Taxonomy of Tasks
A.7 Revisions: From Taxonomy to Typology
A.8 Presenting our Typology
A.9 Subsequent Evolution of our Typology

B Appendix: Interview Study
B.1 Complete List of Interviews
B.2 Interview Foci and Questions
B.3 Data Collection, Analysis, and Abstraction
B.4 Data Analysis Artefact Examples
B.5 Previous Interpretations of Findings
B.6 Dimensionality Reduction in the Wild
B.6.1 Introduction
B.6.2 Related Work
B.6.3 Methodology
B.6.4 Taxonomy
B.6.5 Challenges
B.6.6 Benchmarks
B.6.7 Discussion
B.6.8 Conclusions

C Appendix: Field Study
C.1 Initial Use Case
C.2 Overview v1
C.3 Field Study Proposal
C.3.1 Research Questions
C.3.2 Research Context
C.3.3 Methodology
C.3.4 Outcomes and Follow-on Work
C.4 Interview Protocol
C.5 Preliminary Field Study Results
C.5.1 Introduction
C.5.2 The Overview Project
C.5.3 Methodology
C.5.4 Findings
C.5.5 Discussion
C.5.6 Conclusion and Future Work
C.6 Overview v3

D Appendix: Design Study
D.1 Design Study Proposal
D.1.1 Domain Background
D.1.2 Information Visualization Background
D.1.3 Objectives
D.1.4 Methodology
D.1.5 Desired Outcome
D.1.6 Milestones
D.2 Example Research Artefacts

E Appendix: Consent Forms

Index

List of Tables

Table 2.1 Nodes in the why part of our typology and their relation to the vocabulary used in previous work.
Table 2.2 Nodes in the how part of our typology and their relation to the vocabulary used in previous work.
Table 3.1 A summary of task sequences performed by the ten analysts that we interviewed and found in papers discussing dimensionality reduction (DR) and visualization.
Table 4.1 A summary of the six case studies.
Table 5.1 Data abstraction summary.
Table 5.2 A summary of the matches and mismatches between abstract tasks and visual encoding design choices.
Table A.1 Additional dimensions of previous classifications.
Table B.1 The complete set of twenty-four interviews with nineteen analysts.
Table B.2 Establishing conceptual relationships across summaries.
Table B.3 Establishing conceptual relationships across summaries (continued).
Table B.4 Establishing conceptual relationships across summaries (continued).
Table B.5 Establishing conceptual relationships across summaries (continued).
Table B.6 Usage examples described using our preliminary classification of people who use DR, DR techniques, and tasks.
Table B.7 Usage examples described using our classification of tasks relating to DR.
Table D.1 Characterizing energy analysts' activities as abstract tasks.

List of Figures

Figure 1.1 Munzner's nested model of visualization design.
Figure 1.2 Our multi-level typology of abstract visualization tasks.
Figure 1.3 Concise task descriptions using elements of the typology.
Figure 1.4 An example sequence of tasks using the structure and vocabulary of our typology.
Figure 1.5 Task sequences that involve visualizing dimensionally reduced data.
Figure 1.6 Overview, a multiple-view application intended for use during an investigation of a large collection of text documents.
Figure 1.7 A sandbox environment for creating visualization data sketches pertaining to energy portfolio analysis.
Figure 1.8 A modified version of our typology appearing in Munzner [219].
Figure 1.9 Timelines of the projects described in this dissertation.
Figure 2.1 Our multi-level typology of abstract visualization tasks: why, how, and what.
Figure 2.2 Task descriptions for Example #1 (choropleth map) and Example #2 (large trees).
Figure 2.3 An example sequence of tasks described using the structure and vocabulary of our typology.
Figure 3.1 A task sequence involving dimensionally reduced data.
Figure 3.2 Five task sequences that involve visualizing dimensionally reduced data.
Figure 3.3 A visual encoding of dimensionally reduced data, in which three synthesized dimensions have been identified.
Figure 3.4 Scatterplots of dimensionally reduced data illustrating tasks related to item clusters.
Figure 3.5 Six tasks related to dimensionally reduced data, characterized using our abstract task typology.
Figure 3.6 A refinement to the why part of our abstract task typology.
Figure 4.1 Overview is a multiple-view application intended for the systematic search, summarization, annotation, and reading of a large collection of text documents.
Figure 4.2 A timeline of Overview's development, deployment, and adoption phases.
Figure 4.3 Detail from "A full-text visualization of the Iraq War Logs".
Figure 4.4 Overview v2, a desktop application released in Winter 2012.
Figure 4.5 Overview v4, a web-based application released in Summer 2013.
Figure 4.6 The human-centred design process development cycle.
Figure 4.7 The two tasks characterized in Section 4.6.1.
Figure 4.8 The task of annotating documents and clusters.
Figure 5.1 The previous version of Energy Manager, our collaborators' energy analysis tool.
Figure 5.2 The previous version of Energy Manager, our collaborators' energy analysis tool (continued).
Figure 5.3 A sandbox design environment for visualizing energy data from a portfolio of buildings.
Figure 5.4 Faceted boxplots that encode aggregate area-normalized energy demand distributions.
Figure 5.5 A bar + bump plot of energy intensity.
Figure 5.6 A time series calendar matrix of energy intensity savings.
Figure 5.7 An interactive auxiliary boxplot prototype.
Figure 5.8 A stacked area graph of energy demand.
Figure 5.9 The redesigned Energy Manager that incorporates many aspects of our prototype designs.
Figure 5.10 The three tasks identified in Section 5.3.2.
Figure 6.1 The why part of our typology, slightly extended and recast as a set of actions.
Figure 6.2 A set of targets, to be used in conjunction with a specification of actions.
Figure 6.3 Munzner's classification of what [219].
Figure 6.4 Munzner's extension [219] to the how part of our typology.
Figure 6.5 Our typology is used to describe Shneiderman's visual information seeking mantra: overview first, zoom and filter, details-on-demand.
Figure A.1 Early brainstorming on the topic of task classification.
Figure A.2 Dimensions of high-level, mid-level, and low-level tasks.
Figure A.3 Mid-level abstract tasks along the axes of domain specificity and interface specificity.
Figure A.4 Our first classification.
Figure A.5 Our second classification.
Figure A.6 Whiteboard and post-it diagramming prior to our third classification.
Figure A.7 Our third classification.
Figure A.8 Whiteboard and post-it diagramming prior to our fourth classification.
Figure A.9 Our fourth classification.
Figure A.10 Cross-cutting dimensions of previous classifications or frameworks.
Figure A.11 Our fifth classification.
Figure A.12 Our sixth classification.
Figure A.13 Our seventh and proposed classification.
Figure A.14 Our multi-level typology of abstract visualization tasks as of October 2013.
Figure A.15 Previous classifications sorted from low to high level of abstraction.
Figure B.1 An example of an interview transcript.
Figure B.2 An example of raw interview notes with post-hoc annotations.
Figure B.3 An example document sent to us by an analyst.
Figure B.4 An early version of an analyst summary.
Figure B.5 A later version of an analyst summary.
Figure B.6 Our preliminary classification of people who use DR, DR techniques, and tasks.
Figure B.7 Our classification of dimensionality reduction algorithms.
Figure B.8 Our data collection, analysis, and abstraction methodology.
Figure B.9 Our classification of tasks relating to DR.
Figure B.10 Example scatterplots of dimensionally reduced data illustrating potential tasks related to checking cluster separability.
Figure C.1 "A full-text visualization of the Iraq War Logs".
Figure C.2 Overview v1, a prototype desktop application completed in Fall 2011.
Figure C.3 Overview v2, displaying tulsa's email corpus.
Figure C.4 An excerpt of Overview's log file, listing timestamped interaction events.
Figure C.5 The median time spent viewing a single document.
Figure C.6 Overview v3, a web-based application released in Fall 2012.
Figure D.1 Eleven slide decks created between Nov 2013 and February 2014.
Figure D.2 Partial summary of findings from initial interviews with energy analysts.
Figure D.3 Partial summary of findings from initial interviews with energy analysts (continued).
Figure D.4 Partial characterization of data abstractions relevant to energy analysts' activities.
Figure D.5 Partial characterization of data abstractions relevant to energy analysts' activities (continued).
Figure D.6 Verifying the task and data abstractions with power user energy analysts.
Figure D.7 Verifying the task and data abstractions with power user energy analysts (continued).
Figure D.8 Initial data sketches produced within the sandbox environment.
Figure D.9 Initial data sketches produced within the sandbox environment (continued).
Figure D.10 Following up with the power user energy analysts with designs from our sandbox design.
Figure D.11 Following up with the power user energy analysts with designs from our sandbox design (continued).
Figure D.12 Another iteration of data sketches produced using the sandbox environment.
Figure D.13 Another iteration of data sketches produced using the sandbox environment (continued).
Figure D.14 Early view coordination design depicting a matrix with auxiliary boxplots.
Figure D.15 Proposed workflow design involving multiple views based on consolidated feedback from energy analysts.
Figure D.16 Storyboards using sandbox screenshots based on power user workflows.
Figure D.17 Storyboards using sandbox screenshots based on power user workflows (continued).
Figure D.18 Color stock charts with juxtaposed line charts as alternative to matrix with juxtaposed boxplots.
Figure D.19 Values from the brushed time period are highlighted on the juxtaposed boxplots.
Figure D.20 Boxplot for the brushed time period (red) is shown alongside the boxplot for the entire time series.
Figure D.21 An example of how feedback was documented.
Glossary

API: application programming interface
BELIV: Beyond Time and Errors: Novel Evaluation Methods for Visualization, a bi-annual ACM workshop focusing on the challenges of evaluation in visualization
CCA: canonical correlation analysis
CSV: comma-separated values
DR: dimensionality reduction
FOIA: Freedom of Information Act, allows individuals such as journalists to request documents and data from public institutions
HCI: human-computer interaction
HDD: heating degree day, one of several approaches to normalizing energy performance using weather data; a full discussion of them is beyond the scope of this thesis
JSON: JavaScript object notation
kW: kilowatts
kWh: kilowatt-hours
MDS: multi-dimensional scaling
NLP: natural language processing
OCR: optical character recognition
PCA: principal component analysis
PDF: portable document format
SFS: sequential forward selection
SPLOM: scatterplot matrix
t-SNE: t-distributed stochastic neighbor embedding
TF-IDF: term frequency-inverse document frequency
UI: user interface
WIMP: windows-icons-menus-pointer

Acknowledgments

Acknowledgements by chapter:

Chapter 2: First, I thank my co-author Tamara Munzner, who first conceived this project during her PhD and during early discussions with François Guimbretière. I acknowledge Ron Rensink for the original idea of classifying lookup, browse, locate, and explore according to target identity and location. I also thank Jessica Dawson, Joel Ferstay, Stephen Ingram, Joanna McGrenere, Miriah Meyer, Michael Sedlmair, and Colin Ware for their feedback on the paper. We received financial support for this project from the Natural Sciences and Engineering Research Council of Canada (NSERC).

Chapter 3: I thank my co-authors: Michael Sedlmair, Stephen Ingram, and Tamara Munzner; Michael was the first author of the technical report [283] that was the starting point for our 2014 Beyond Time and Errors: Novel Evaluation Methods for Visualization (BELIV) paper [36] and this chapter. I also thank the data analysts who participated in the original interview study for their time and energy: Kerem Altun, Ryan Brinkman, Jennifer Büttgen, Anamaria Crişan, Klaus Dress, Des Higgins, Carrie Holt, Heidi Lam, Kevin Leyton-Brown, Cindy Marven, Greg Mori, Sareh Nabi-Abdolyousefi, Cydney Nielsen, Ahmed Saad, Jonathan Stray, Sid Thakur, John Westbrook, James Wright, and Hong Yi. Finally, I thank Steven Bergner, Jessica Dawson, Joel Ferstay, Miriah Meyer, Torsten Möller, Tom Torsney-Weir, Melanie Tory, and Hamidreza Younesy for assisting with interviews and/or feedback on paper drafts. We received financial support for this project from NSERC.

Chapter 4: I thank my co-authors: Stephen Ingram, Jonathan Stray, and Tamara Munzner. I thank Jonas Karlsson and Adam Hooper, who contributed to the development of Overview v3-v4. I also thank the case study journalists: Jack Gillum, Ian James, Michael Keller, Adam Playford, Jonathan Stray, Jarrel Wade, and the dallas journalist who requested to remain anonymous. Finally, I thank Jessica Dawson, Joel Ferstay, Heidi Lam, Joanna McGrenere, Ron Rensink, and Michael Sedlmair for their comments on the project and paper. We received financial support for this project from the Knight News Challenge and NSERC.

Chapter 5: I thank my co-authors: Jocelyn Ng, Kevin Tate, and Tamara Munzner.
I also thank our collaborators at EnerNOC (formerly Pulse Energy): Bruce Cullen, Ben Gready, David Helliwell, Bruce Herzer, Steve Jones, Jamie King, Sarah Laird, Fritz Lapastora, Ari Lesniak, Jordana Mah, Harish Raisinghani, Maria Serbenescu, Paul Teehan, and especially James Christopherson. I also especially thank Cailie Crane, Reetu Mutti, and the Energy Manager development team. I also thank the energy analysts that we interviewed: Marc Etienne Brunet, Andy Constant, Bill Edbrooke, Chris Goodchild, Marc Tabet, Sean Terry, Natalie Vadeboncoeur, Lillian Zaremba, and especially Jerome Conraud and Kevin Ng. Finally, I thank Michelle Borkin, Anamaria Crişan, Jessica Dawson, Johanna Fulda, Enamul Hoque, Sung-Hee Kim, Narges Mahyar, and Joanna McGrenere for their feedback on the paper. We received financial support for this project from NSERC and Mitacs.

For each of the four research chapters, I thank the anonymous reviewers who reviewed the associated research paper, and in particular I thank the reviewer of our typology paper [33] who introduced me to Cognitive Work Analysis by Vicente [337], which proved to be an excellent resource.

Personal acknowledgements:

First and foremost, I thank Tamara Munzner, my primary thesis advisor, who first reached out to me during the 2009 graduate recruitment season, taught me the principles of visualization analysis and design in her graduate course later that year, and took me on as a PhD student in 2011. I thank her for all of our discussions, her attention to detail, and the career mentoring that she has provided over the last several years.

I thank Joanna McGrenere, my co-advisor. Joanna encouraged me to pursue a PhD, and has since provided thoughtful mentoring and perspectives on HCI research.

I thank Ron Rensink, the third member of my supervisory committee. I am particularly thankful for Ron's graduate seminar in visual display design, which provided dual perspectives from perceptual psychology and design, as well as opportunities to practice succinct writing and peer review.

I thank my external examiner Jason Dykes for his very thorough report and the thought-provoking questions that he raised.

I thank my university examiners Giuseppe Carenini and Alfred Hermida, as well as my thesis defense chair Luanne Freund.

I thank Giuseppe Carenini, chair of my thesis proposal defense meeting in 2014, and Ron Garcia, chair of my research proficiency evaluation meeting in 2012.

I thank the members of Tamara Munzner's InfoVis group between 2011 and 2016: Michelle Borkin, Anamaria Crişan, Jessica Dawson, Kimberly Dextras-Romagnino, Wenqiang (Dylan) Dong, Joel Ferstay, Johanna Fulda, Stephen Ingram, Zipeng Liu, and Michael Sedlmair. I also thank regular group meeting attendees Giuseppe Carenini, Enamul Hoque, Sung-Hee Kim, and Narges Mahyar, as well as group alumni Heidi Lam, Miriah Meyer, and Melanie Tory who joined in person or remotely for occasional meetings.

I thank the members of Joanna McGrenere's LUNCH (Lab for Universal usability, persoNalization, Cscw, and Hci) research group between 2011 and 2016: Kamyar Ardekany, Jessica Dawson, Shathel Haddad, Mona Haraty, Sung-Hee Kim, Juliette Link, Matei Negulescu, Antoine Ponsard, Diane Tam, Charlotte Tang, and Kailun Zhang.

I thank the members of the UBC MUX (Multimodal User eXperience) lab between 2011 and 2016 for the feedback and support they provided during research update presentations and practice talks.
In addition to all of those already mentioned above from the InfoVis and LUNCH groups, I thank faculty members Kellogg Booth and Karon MacLean, as well as all the post-docs, visiting researchers, and students who have contributed to the culture of the MUX lab.

I thank the InfoVis subcommittee of the 2014 IEEE Doctoral Colloquium for their feedback on my thesis research program: Christopher Collins, Petra Isenberg, and Chris Weaver. I also thank my fellow Doctoral Colloquium participants: Sriram Karthik Badam, Richard Brath, Samuel Gratzl, Julia Jürgens, and Jorge Poco.

I also thank those that I collaborated and co-authored with on other projects between 2011 and 2016: Benjamin Bach, Johanna Fulda, Nathalie Henry Riche, Claudia Jacova, Joanna McGrenere, Charlotte Tang, Melanie Tory, Sheelagh Carpendale, Donghao Ren, and especially Bongshin Lee.

I thank Eamonn Maguire for the excellent illustrations that he created for Tamara's book [219] and for making it possible for me to use them in this dissertation.

I thank Sandra Mathison, instructor of EPSE 595 (a qualitative research methods course offered in Winter 2012), who introduced me to a number of epistemological and theoretical perspectives, methodologies, and methods.

I thank Laura Selander, group assistant for the UBC Imager lab, and Joyce Poon, UBC Department of Computer Science graduate program assistant, for their logistics support.

I thank Joanna McGrenere and Claudia Jacova, my M.Sc advisors, as well as T. C. Nicholas Graham, my undergraduate honours thesis advisor; my decision to pursue a PhD was in part motivated by my early research experiences with them and by their encouragement.

I thank my family for all of their love and support over the years: my parents Leslie and Michael Brehmer, my brother Nicholas Brehmer, my grandparents, and my uncle Thomas Brehmer, who introduced me to the work of Edward Tufte at an impressionable age.

Lastly, I thank my partner, Anamaria Crişan: for enduring, for understanding, and for joining me on this path.

Chapter 1: Introduction

Why do people visualize data?

Ultimately, visualizing data allows people to consume and produce information in order to solve domain-specific problems or to communicate an understanding about phenomena relevant to a particular domain. Visualization is often associated with data analysis and communication processes; however, it is important to stress that not all data analysis and communication tasks are addressed via visualization. This dissertation describes an approach that researchers and practitioners can use to systematically classify and abstract visualization tasks, whether they occur in a data analysis or communication context, and whether they involve the consumption or the production of information.
This abstraction of tasks is necessary and important because it facilitates visualization analysis and design: it can be used to communicate and transfer lessons learned from studying visualization tasks in specific application domains or with specific datatypes, it can be used to understand the implications of findings from controlled experiments, and it can be used to contextualize novel visualization techniques.

In this dissertation, we present a typology of abstract visualization tasks and document its application in three studies: in an interview study of individuals who visualize dimensionally reduced data in ten different application domains, in a post-deployment field study evaluation of a visual analysis tool in the domain of investigative journalism, and in a visualization design study in the domain of energy management. We also survey how our approach to task analysis has been adopted and extended by others in the visualization community. (With the exception of personal anecdotes in Section 1.1 and in the conclusion chapter, I will use the pronoun "we" throughout this dissertation to reflect the collaborative nature of the work reported therein.)

1.1 Research Trajectory

Before discussing the contents and contributions of this dissertation in more detail, I will tell the story of how I came to study why people visualize data.

When I entered my PhD program in late 2011, I posed the following question: How do we evaluate visualization techniques and tools in an application domain context, particularly if these techniques and tools are used for data analysis? At this time, I read an early manuscript of a survey of visualization evaluation by Lam et al. [183], in which the authors read and coded over eight hundred recent visualization research papers that report an evaluation component. While many of these papers discuss human perceptual performance or visualization usability, relatively few of them document an attempt to evaluate the use of visualization tools or techniques in settings other than a controlled experiment, and fewer still comment on adoption: whether a deployed visualization tool was incorporated into the recurring data analysis workflows of individuals working in a specific application domain. The findings of this survey prompted me to ask: Why is the study of real-world usage and adoption of visualization techniques or tools reported so infrequently? And if this research is difficult to conduct, what makes it so difficult?

Initially, I focused my study on the corpus of research papers emanating from the ACM Beyond Time and Errors: Novel Evaluation Methods for Visualization (BELIV) workshop series, where a number of methodologies and methods for evaluating visualization tools or techniques "in the wild" have been proposed. At the 2012 BELIV workshop, there was substantial discussion pertaining to a need for a better shared understanding of the
In a design study, it is critical that the researchers have correctlyabstracted the domain problems or use cases and mapped these to appro-priate visual encoding and interaction design choices, perhaps incorporatingdesign choices originally applied to other domains. However, this abstrac-tion and mapping is seldom straightforward: Sedlmair et al. [284] describehow initial designs often fail to address the tasks that people are expectedto perform, how inappropriate evaluation methods are chosen, or how pre-maturely deployed visualization tools fail to be adopted by the people forwhich they were designed.By early 2013, my thinking had coalesced into a thesis statement ex-pressed as follows: visualization design and evaluation is dicult becausemapping a person’s tasks to visual encoding and interaction design choicesrequires multiple levels of abstraction. Researchers and practitioners wouldbenefit from a domain-agnostic, consistent, and validated approach for as-sisting in this mapping.What is a “task”? A task is an ill-defined concept, and as the reader willdiscover in Chapter 2, there is currently little agreement in the visualiza-tion literature as to the appropriate granularity for describing a task. Forinstance, finding an extreme value [8] is less abstract than exploring [366]or integrating insights [301], while comparing sequence variants in a humangenome is quite domain-specific. This confusion is the result of a conflationof two axes on which we might characterize a task: level of abstraction andapplicability, in which the latter refers to the specificity of a task with respectto a particular application domain or datatype. Relating these task descrip-tions is dicult, though not impossible; however, visualization practitioners3hardly have a shared lexicon when describing these relations between levelsof abstraction and application areas.Researchers and practitioners should strive to go beyond merely describ-ing the use of visualization tools or techniques in a specific application do-main; rather they should abstract these domain-specific tasks in order torealize an appropriate visualization design space. This abstraction also letspractitioners contribute back to the visualization research community, trans-ferring their findings beyond a single domain.My dissertation examines visualization task abstraction from multipleperspectives. Chapter 2 documents the synthesis of related work classify-ing tasks, interactions, and visualization design choices. The result of thissynthesis was a new approach to task analysis, a typology for classifying vi-sualization tasks at multiple levels of abstraction. However, proposing thistypology was not enough. We had to validate this typology as a pragmatictool [16]; to do so, we used the typology to describe existing interactions be-tween people and visualization techniques or tools, to generate new designs,and to evaluate these designs. The three forms of validation are intertwined,as the ability to generate or evaluate implies the ability to describe; in thisdissertation, we address all three types of validation.The remainder of this dissertation serves to validate this typology inapplied settings spanning multiple domains. In Chapter 3, we used our ty-pology to analyze findings from an interview study spanning several di↵erentapplication domains, focusing on individuals who visualize dimensionally re-duced data. 
In Chapter 4, we used our typology in a field study to evaluate Overview, a visual analysis tool that was adopted by investigative journalists. Finally, in Chapter 5, we used our typology to design and evaluate visualization designs in the domain of energy management.

1.2 Motivation

Task analysis is essential for visualization analysis, evaluation, and design. While there are many approaches to visualization task analysis, they vary in terms of level of abstraction, applicability across domains and datatypes, as well as in terms of the vocabulary that they use. The visualization community requires a synthesis of existing task analysis approaches and theoretical foundations, one that spans multiple levels of abstraction, spans across domains and datatypes, and introduces a common and consistent task lexicon. (We elaborate on this motivation in Chapter 2.)

This dissertation demonstrates that such a synthesis is possible. Furthermore, we demonstrate our approach to task analysis in visualization analysis, evaluation, and design projects. We also report on the adoption of our approach by others in the community.

1.3 Thesis Contributions

There are four research chapters contained in this dissertation, each offering contributions to the visualization research community. A succinct summary of all contributions appears at the end of this section.

1.3.1 A Typology of Abstract Visualization Tasks

The visualization design and evaluation process is characterized by multiple levels [217, 219], as captured by Munzner's nested model, shown in Figure 1.1. These levels include the domain problem, data and task abstractions, visual encoding and interaction design choices, and ultimately the algorithms that drive them. In this dissertation, our focus is on task abstractions, how they map upward to domain problems, and how they map downward to visualization design choices.

Figure 1.1: Munzner's nested model of visualization design [217, 219], whose nested levels are the domain situation, the data/task abstraction, the visual encoding/interaction idiom, and the algorithm; the arrows indicate the cascading effects of design decisions made at higher levels. Illustration: © E. Maguire (2014).

As indicated in Section 1.1, my personal motivation for developing an approach for classifying and abstracting tasks was pragmatic: we had amassed observational data of the use of visualization tools and techniques in the interview study and field study projects, described in Chapter 3 and Chapter 4 respectively, where we struggled to describe and compare tasks performed by different people, tasks performed with different visualization techniques or tools, as well as tasks associated with different application domains. We required a systematic approach for analyzing tasks abstractly, allowing us to describe and evaluate visualization design choices that address these tasks.

Methodology: In Chapter 2, we describe our comprehensive review of previous work that classified tasks, interactions, activities, and visualization design choices. This review included over two dozen previous classification systems and theoretical frameworks from the literatures of visualization, human-computer interaction (HCI), information retrieval, communications, and cartography. We examined the vocabulary and definitions used in this body of previous work, and after multiple rounds of coding, we had grouped similar terms, determined representative terms for each group, and arranged these representative terms into multiple levels of abstraction. (Appendix A documents the evolution of our typology.) We reasoned
We reasonedabout how tasks could be described using this arrangement of terms, eitherin isolation, or as a sequence of interdependent tasks.Contributions: The result of our synthesis was a typology of abstract vi-sualization tasks, illustrated in Figure 1.2. This typology allows for succinctdescriptions of tasks, in which a task description is comprised of why datais visualized (at multiple levels of abstraction), what dependencies a taskmight have in terms of input and output, and how the task is or can besupported in terms of visual encoding and interaction design choices; given3Appendix A documents the evolution of our typology.6why?present discovergenerate / verify enjoylookuplocatebrowseexploreproduceidentify compare summarizetarget known target unknownlocation unknownlocation knownqueryconsumesearchhow?annotateimportderiverecordselectnavigatearrangechangefilteraggregateencodemanipulate introducewhat?[ input ] [ output ](if applicable)a bcFigure 1.2: Our multi-level typology of abstract visualization tasks,which classifies (a) why data is visualized, (b) how the taskis supported in terms of visual encoding and interaction designchoices, and (c) what dependencies a task might have. Note thatthe colors used for why and how correspond to the abstractionand technique design levels of Munzner’s nested model [217],shown in Figure 1.1.this structure, it is possible to describe sequences of interdependent tasks,as illustrated in Figure 1.3 and in the example of Figure 1.4. Our typologyhas since proven to be useful in our subsequent interview study (Chapter 3),field study (Chapter 4), and design study (Chapter 5) projects, as well asin recent work by others; we reflect upon the adaptation and use of ourtypology by others in Chapter 6.1.3.2 Use of the Typology in an Interview StudyIn Chapter 3, we used our typology to analyze data analysis and the useof visualization techniques and tools “in the wild” by way of an interviewstudy. In particular, we used the typology to examine the data analysistasks of individuals working in several di↵erent domains, and specificallytasks related to the analysis of high-dimensional data; we sought to betterunderstand this data, the dimensionality reduction (DR) transformationsapplied to it, as well as why and how visualization techniques and tools areused throughout analysts’ domain-specific workflows.7how?what?why?how?what?why?how?what?why?dependencyFigure 1.3: Concise task descriptions are constructed using elementsfrom each part of the typology. In specifying the input andoutput of tasks, we can describe sequences of interdependenttasks.2D dataclusters and pointsscatterplotencodenavigateselect++discoverexploreidentifyproducederivehigh-dim. data 2D dataproduceannotateclusters and pointscolours for pointshow?why?what?legend“verifying a hypothesis regarding the existence of clusters of unlabelled items in a scatterplot of dimensionally-reduced data, then labelling the points.”Figure 1.4: An example sequence of tasks, described as a sentence(top left), as well as using the structure and vocabulary of ourtypology (top); the bottom depicts a series of transformationscorresponding to the inputs and outputs of each task. 
This par-ticular series of abstract tasks is relevant to both the interviewstudy (Chapter 3) and field study (Chapter 4) projects, as theyboth involve high-dimensional data, dimensionality reduction,and the visualization of dimensionally reduced data.8Methodology: The focus of this research was to classify the tasks asso-ciated with the visualization of dimensionally reduced data, such as in theexample of Figure 1.4. Our data collection and analysis methodology in-cluded twenty-four interviews with researchers and a literature survey span-ning several application domains, including HCI, chemistry, bioinformatics,computer science, and policy analysis. Our approach was similar to aninterview study by Kandel et al. [163], one classifying data analysis andvisualization among enterprise data analysts; we view our work as beingcomplementary to their findings, given that both projects addressed dataanalysis and visualization “in the wild” for a broad group of domains. Wecollected a large amount of data: diagrams, screen shots, interview notes,recordings, and transcripts, as well as interviewees’ research papers, theirdata, and other research artifacts.Using a qualitative coding approach, we developed a classification oftask sequences relating to visualizing dimensionally reduced data, whichare illustrated in Figure 1.5, where each sequence is comprised of tasks,and each task can be defined using the vocabulary of our task typology.We distinguished between tasks relating to learning about the syntheticdimensions resulting from DR and those relating to learning about clustersof items in the dimensionally reduced data.Contributions: With the advent of the task typology proposed in Chap-ter 2, we had a theoretical lens and vocabulary with which to approach theconsiderable amount of data that we collected from our interview study.Using the vocabulary of our typology, we were able to classify why thesetechniques are applied in sequential workflows, as well as what the inputsand outputs of these tasks are.We contribute a datatype-specific classification of tasks grounded inobservations of real-world analyst behaviour. We encourage the furtherclassification of tasks specific to datatype, as these are complementary toour datatype-agnostic typology that we introduce in Chapter 2; examplesin the literature include the often-cited task by datatype taxonomy byShneiderman [291], classifications of graph-specific tasks [186, 272], tabu-9DR name synth. dimensionsstartDR name synth. dimensionsmap synth. to originalstartDR verify clustersstartDR verify clustersstart name clustersDR verify clustersstart name clustersmatch clusters and classesFigure 1.5: A classification of five task sequences that involve visual-izing dimensionally reduced data, based upon findings from aninterview study, documented in Chapter 3.lar data [133], and time-oriented data [185]. When the paper about ourinterview study was first published, there was no prior classification of tasksrelating to visualizing dimensionally reduced data4. 
The findings of our in-terview study and our classification of tasks is further contextualized withreferences to specific visual encoding design choices; as a result, our clas-sification of tasks can serve to validate and inform visualization techniqueresearch, a challenge that we identified in previous work [150].1.3.3 Use of the Typology in a Field StudyIn 2010, our research group began collaborating with a professional journal-ist who was developing a visualization tool intended for the exploration oflarge text document collections. Since this time, the tool has been deployedas Overview5 (shown in Figure 1.6), a web-based application for investigativejournalists who report on large document collections attained from Freedomof Information Act (FOIA) requests or from whistleblower organizations,collections ranging in size from hundreds to tens of thousands of documents.Between 2012 and 2014, we conducted a post-deployment field study evalu-4A 2015 task taxonomy by Etemadpour et al. [94] is discussed in Section 6.2.5https://www.overviewdocs.com/10Figure 1.6: Overview is a multiple-view application intended for thesystematic search, annotation, and reading of a large collectionof text documents, which visualizes hierarchical clusters of doc-uments as a tree (left).ation of Overview, in which we analyzed its adoption and self-initiated useby investigative journalists. Chapter 4 documents this field study.Methodology: We conducted case studies of six journalists who usedOverview to conduct investigations involving large document collections; infive of these cases, the investigation resulted in a published story, and one ofthese stories [236] was a finalist for the 2014 Pulitzer Prize in journalism6.A critical di↵erence between our approach and other post-deployment fieldstudies that focus on the usage of visualization tools or techniques [195, 277,292] is that our case study participants were not solicited by the researchers:they freely chose to use Overview and they did not inform preceding phasesof design. We also engaged a di↵erent set of people at each stage of de-sign, rather than the same set of people. This di↵erence reflects Overview’scontext of use: repeat usage cannot be predicted and Overview is only ap-6http://www.pulitzer.org/2014_public_service_finalist11propriate for some investigations; we have yet to encounter a journalist whospecializes in investigations pertaining to large document collections.We interviewed these six journalists about the form and provenanceof their documents, the objectives of their investigation, and their use ofOverview; we also collected their logged interaction data and their anno-tated document collections. We used our task typology, which we introducein Chapter 2, to better understand why Overview was adopted by thesejournalists to perform their investigations.Results: The analysis of journalists’ use of Overview revealed that our ini-tial understanding of their task was insucient: the task of “exploring” adocument collection, a term that appears often in previous work on visu-alizing document data, is both too vague and too narrow to capture howjournalists actually used Overview. 
Instead, we identified two di↵erent tasksusing the vocabulary and structure of our typology: one of generating hy-potheses and summarizing the contents of a document collection, and an-other of locating and identifying specific evidence in order to verify or refuteprior hypotheses.Contributions: Given our more precise understanding of journalists’ tasks,we were able to rigorously analyze the rationale for Overview’s visual encod-ing and interaction design choices. This analysis is transferable beyond thedomain of journalism and speaks to the design of visualization techniquesand tools addressing document data and to some extent any data that canbe hierarchically structured. Finally, we reflect upon Overview’s design andevaluation process, comparing our approach to previous human-centred vi-sualization design processes [153, 195]; we also discuss the value, logistics,and limitations of studying the adoption of visualization techniques or tools.1.3.4 Use of the Typology in a Design StudyIn 2013, we initiated a visualization design study project that providedan opportunity to validate the generative potential of our typology. Thisproject was a collaboration with a company that develops energy usage re-porting software for multi-building organizations such as universities, school12boards, or hotel chains. Many of these client organizations have designatedenergy analysts who oversee large portfolios of buildings; these analysts areresponsible for identifying cost saving opportunities, diagnosing erratic en-ergy usage behaviour, and attempting to understand the role of fluctuatingexternal factors such as weather, occupancy, operating hours, and equipmentusage within buildings. Tools and techniques for addressing these tasks withrespect to single buildings already exist, however they do not scale to port-folios of dozens or hundreds of buildings. We conjectured that an interactiveapplication integrating visualization while considering these issues of scalecould address the tasks of these analysts. Chapter 5 documents this designstudy.Methodology: We began by analyzing the energy domain and interview-ing energy analysts from commercial client organizations who had previ-ously used our industry partner’s software, asking them about their rolesand responsibilities, their technical background, their portfolio of buildings,and the limitations of current tools. We also presented our interview find-ings and sought additional feedback from members of our industry partner’sclient services team, who have expertise with the current software and actas liaisons to client organizations.Once again, we used our task typology, to identify and abstract thetasks of these analysts. We narrowed our scope to tasks that recurred oftenamong the analysts and those that were consistent with the mandate of ourindustry partner’s product development team to support analysis of energyconsumption in building portfolios.Over the course of four months, we designed and implemented over adozen interactive visualization data sketches [195] to address the tasks ofthese analysts, following a process of rapid iteration in which functionalsketches featuring the analysts’ data were used to further refine our un-derstanding of their tasks and context of use. These data sketches wereproduced within the interactive interactive sandbox environment shown inFigure 1.7. 
Our task abstractions informed the process of mapping thesetasks to a set of appropriate visual encoding and interaction design choices.13The design choices that we considered included those for performing multi-ple comparisons between aggregate and individual items over time [2], foridentifying cyclic and acyclic events using meaningful temporal granulari-ties [335], and for identifying di↵erences in multiple lists of ranked itemswhile simultaneously identifying the cause of rank changes [119].In early 2014, we conducted chau↵eured demos [195] of these interactivedata sketches with four groups of analysts; the energy usage data used inthese demos was collected from analysts’ own building portfolios. By inte-grating the feedback we received on our sketches and our understanding ofenergy analysts’ tasks, we then envisioned ways to juxtapose and sequencediscrete views of the data in order to support workflows, and we continuedto elicit feedback from analysts and our collaborators’ client services team.Results: Our collaborators have since adopted a number of our designsinto a new version of their commercial energy analysis software tool. Theyassigned over ten full-time software developers to the project since mid-2014and the tool has since been released to some client organizations in a smallpilot deployment; the tool will soon7 be deployed to thousands of otherclients.Contributions: As a result of abstracting the data and tasks relating tothe energy management domain, visualization practitioners working in otherdomains might benefit from our classification of matches and mismatchesbetween abstract tasks and visualization design choices, particularly for do-mains that involve comparing many concurrent time series.We also confronted issues of domain convention in this project; in theenergy sector, some visual encodings carry very specific meanings. We con-sidered how to introduce unfamiliar visual encodings and how to get peopleworking in this domain to trust them.Finally, we contribute some methodological guidance for visualizationdesign studies, including our approach to work domain analysis, a systematictask analysis and abstraction, our sandbox prototyping and workflow design,as well as how to e↵ectively present visualization design documentation.7Relative to November 2015.14Figure 1.7: A sandbox environment for creating visualization datasketches pertaining to the analysis of energy usage in largebuilding portfolios. In this visualization data sketch, a calendar-based time series matrix is juxtaposed with summary boxplots,where each row is a group of buildings. Both the matrix and theboxplots encode the di↵erence between average energy demandin 2012 and 2013.1.3.5 Summary of ContributionsThe contributions of this dissertation can be summarized as follows:• A typology of abstract visualization tasks, which allows for succinctdescriptions of tasks and task sequences in terms of why data is vi-sualized, what dependencies a task might have in terms of input andoutput, and how the task is supported in terms of visual encoding andinteraction design choices (Chapter 2).• A synthesis of the literature relating to visualization tasks (Chapter 2).• A datatype-specific classification of five task sequences relating to vi-15sualizing dimensionally reduced data, one based on findings from ourinterview study with data analysts spanning several application do-mains. 
This classification draws upon and demonstrates the descrip-tive power of our typology of tasks and is intended to inform the designof new tools and techniques for visualizing dimensionally reduced data(Chapter 3).• A field study evaluation of Overview, a visualization tool for investigat-ing large text document collections. We draw upon and demonstratethe descriptive and evaluative power of our typology of tasks and char-acterized two abstract tasks relating to document mining (Chapter 4).• Seven lessons relating to the design of interactive visualization toolsfor hierarchical data and document data in particular. These lessonsare based on an analysis of successive deployed versions of Overviewand its adoption by self-initiated journalists (Chapter 4).• A methodological reflection on the study of visualization adoption(Chapter 4).• A demonstration of the descriptive, evaluative, and generative powerof our typology of tasks in a visualization design study within theenergy domain (Chapter 5).• An identification of matches and mismatches between visualizationdesign choices and three abstract tasks for concurrent time series data(Chapter 5).• Two lessons pertaining to familiarity with visual encodings, two lessonspertaining to the trust of data aggregation design choices, and threelessons pertaining to visualization design methodology (Chapter 5).1.4 Extension and Impact of the TypologyIn Chapter 6, we comment on how our task typology was subsequentlyextended by Munzner in her 2014 book Visualization Analysis and De-sign [219]; this modified typology is shown in Figure 1.8. Munzner moved16introduce nodes from the how part of the typology to become forms of pro-duce in the why part of the typology; she also added targets to the why partof the typology, referring to the original why part of our typology as actions;finally, she reorganized the how part of the typology and elaborated on formsof encode. We explicitly make reference to and use Munzner’s modificationsto the typology in our interview study (Chapter 3) and in our design study(Chapter 5). Chapter 6 also contains commentary on the origin, the benefits,and the potential drawbacks of Munzner’s modifications.We also present a survey of how our task typology and our approachto systematically analyzing and abstracting tasks has been used and/or ex-tended by others in the visualization community, including how our typol-ogy may integrate with novel theoretical frameworks. This survey includesthe use of our typology to analyze domain-specific usage of visualizationtechniques or tools, from bioinformatics to malware analysis, as well asdatatype-specific visualization usage, from geospatial data to multiplex net-works. This survey also includes the use of our typology to specify andcontextualize tasks in experimental studies, as well as the use of our typol-ogy to motivate the design of novel visualization techniques and tools.1.5 A Note on ChronologyThe duration of the projects described in this dissertation extended long pe-riods of time. As a result, periods of focused research on these projects wereinterleaved or overlapping. Figure 1.9 illustrates the chronological historyof these projects, indicating the core focus periods of projects, importantmilestones, as well as periods of part-time focus. 
Figure 1.9 also indicates other milestones in my PhD, research projects not included in this dissertation [34, 106], and internships.

Figure 1.8: A modified version of our typology appearing in Munzner [219] (c.f. Figure 1.2): (a) why (actions): forms of produce were moved from how; (b) why (targets): a new classification of targets; (c) how (design choices): a reorganization of how. Illustrations: © E. Maguire (2014).

Figure 1.9: Timelines of the projects described in this dissertation, indicating periods of core focus, periods of part-time focus, and milestones such as paper submissions, publications, and internships. Note that the projects described in this dissertation overlapped in time. While the order of the chapters in this dissertation reflects the order in which the projects were completed, it does not reflect the order in which they were initiated.

Chapter 2

A Typology of Abstract Visualization Tasks

"By thinking about visualization as a process instead of an outcome, we arm ourselves with an incredibly powerful thinking tool." — Jer Thorp in "Visualization as process, not output" [316] (Harvard Business Review, April 3, 2013)1

The considerable previous work characterizing visualization processes has focused on low-level tasks or interactions and high-level tasks, leaving a gap between them that is not addressed.2 This gap leads to a lack of distinction between the ends and means of a task, limiting the potential for rigorous analysis.
We contribute a multi-level typology of visualizationtasks to address this gap, distinguishing why and how a visualization task isperformed, as well as what the task inputs and outputs are. Our typologyallows complex tasks to be expressed as sequences of interdependent tasks,resulting in concise and flexible descriptions for tasks of varying complexity1This chapter is a slightly modified version of our paper A Multi-Level Typology ofAbstract Visualization Tasks by Matthew Brehmer and Tamara Munzner; in IEEE Trans-actions on Visualization and Computer Graphics (Proceedings of InfoVis 2013), 19(12),p. 2376–2385 [33]. http://dx.doi.org/10.1109/TVCG.2013.124.2Referring to the examples cited in Section 1.1, finding an extreme value [8] is anexample of a low level of abstraction while exploring [366] or integrating insights [301] areexamples of a higher level of abstraction.20and scope. It provides abstract rather than domain-specific descriptions oftasks, so that useful comparisons can be made between visualization tech-niques or tools targeted at di↵erent application domains. This descriptivepower supports a level of analysis required for the generation of new designs,by guiding the translation of domain-specific problems into abstract tasks,and for the qualitative evaluation of visualization tools or techniques. Wedemonstrate the benefits of our approach in a detailed example, comparingtask descriptions from our typology to those derived from related work. Wealso discuss the similarities and di↵erences between our typology and overtwo dozen existing classifications and theoretical frameworks from severalresearch communities, including visualization, HCI, information retrieval,communications, and cartography.2.1 MotivationConsider a person who encounters a choropleth map while reading a blogpost in the aftermath of an American presidential election. This particularmap is static and visually encodes two attributes, candidate and marginof victory, encoded for each state using a bivariate colour mapping. Thisperson decides to compare the election results of Texas to those of California,motivated not by an explicit need to generate or verify some hypothesis, norby a need to present information to an audience, but rather by a casualinterest in American politics and its two most populous states. How mightwe describe this person’s task in an abstract rather than domain-specificway?According to Munzner’s nested model for visualization design and val-idation [217], abstract tasks are domain- and interface-agnostic operationsthat people perform. Disappointingly, there is little agreement as to theappropriate granularity of an abstract task among the many existing clas-sifications in the visualization, HCI, cartography, and information retrievalliterature [7, 8, 12, 42, 48, 51, 58, 60, 75, 117, 130, 166, 175, 186, 193, 216,239, 242, 252, 260, 262, 263, 291, 298, 301, 329, 330, 343, 349, 366, 370]. Oneof the more frequently cited of these [8] would classify the above example as21being a series of value retrieval tasks. This low-level characterization doesnot describe the person’s context or motivation; nor does take into accountprior experience and background knowledge. For instance, a description ofthis task might di↵er if the person was unfamiliar with American geogra-phy: the person must locate and identify these states before comparing theirvalues. 
Conversely, high-level descriptions of exploratory data analysis andpresentation emanating from the sensemaking literature [7, 48, 175, 242]cannot aptly describe this person’s task.The gap between low-level and high-level classification leaves us unableto abstractly describe tasks in a useful way, even for the simple static choro-pleth map in the above example. This gap widens when interactive vi-sualization techniques are considered, and the complexity of its usage iscompounded over time. We must move beyond describing a single task inisolation, to a description that designates when one task ends and anotherbegins. To close this gap, visualization tasks must be describable in anabstract way across multiple levels.The primary contribution of this chapter is a multi-level typology of ab-stract visualization tasks that unites the previously disconnected scopes oflow-level and high-level classifications by proposing multiple levels of link-age between them. Our typology provides a powerful and flexible way todescribe complex tasks as a sequence of interdependent simpler ones. Whilethis typology is very much informed by previous work, it is also the resultof new thinking and has many points of divergence with existing models.Central to the organization of our typology are three questions that serveto disambiguate the means and ends of a task: why data is being visualized,how the visualization technique or tool supports the task, and what are thetask’s inputs and outputs. We have found that no prior characterization oftasks satisfactorily answers all of these questions simultaneously at multiplelevels of abstraction. Typically, low-level classifications provide a sense ofhow a task is performed, but not why; high-level classifications are the con-verse. One major advantage of our typology over prior work is in providinglinkage between these two questions. Another advantage is the ability tolink sequences of tasks, made possible by the consideration of what tasks22operate on.Our typology provides a consistent lexicon for description that supportsmaking precise comparisons of tasks between di↵erent visualization toolsand across application domains. Succinct and abstract descriptions of tasksare crucial for analysis of people using visualization tools and techniques.This analysis is an essential precursor to the e↵ective design and evaluationof visualization tools, particularly in the context of problem-driven designstudies [284]. In these studies, visualization practitioners work with peoplefrom specific application domains to determine why and what, subsequentlydrawing from their specialized knowledge of visual encoding and interac-tion design choices as well as known human capabilities with respect toperception [62, 254] and interaction to envision how that task is to be sup-ported. A need for task analysis also arises in visualization evaluation [183],particularly in observational studies of people using visualization tools andtechniques. Our typology provides a code set for qualitatively describingthe behaviour of participants in such studies.2.2 Background ContextAs we expect some readers to be unfamiliar with the context that motivatedthis work, we begin with a brief discussion of our current inability to suc-cinctly describe and analyze visualization tasks. The primary limiting factorin using existing classifications as tools for analysis is that we cannot easilydistinguish between the ends and means of tasks. 
Making this distinctionis a central problem for practitioners during the abstraction phase of designstudies [284] and during the analysis phase of qualitative studies of peopleusing visualization tools or techniques [183].For instance, a number of existing classifications mention the word de-rive [8, 60, 130, 186, 239, 301]. Is derive a task, or the means by whichanother task is performed? A person may derive data items as an end initself, for example to reduce the number of dimensions in a dataset, or asa means towards another end, such as to verify a hypothesis regarding theexistence of clusters in a derived low-dimensional space. The ends-means23ambiguity exists for many terms found in existing classifications: considerfilter [8, 48, 117, 130, 166, 175, 186, 216, 239, 242, 260, 262, 291, 366],navigate [130, 298, 343], or record [130, 216, 301]. The first step towards dis-tinguishing ends from means involves asking why someone would visualizedata separately from how the visualization tool or technique supports thetask, a question that is central to the organization of our typology.The separation of why and how does not in itself resolve all confusion.Consider sort, another term appearing in existing classifications [8, 117,130, 186, 239]. Sorting has an input and an output; in some cases, it isitems of data within a single view [251]; in others, views themselves may besorted [17]. In both cases, the sorted output can serve as input to subsequenttasks. The next step in distinguishing ends from means is thus characterizingwhat the task’s inputs and outputs are, allowing us to describe sequences ofinterdependent tasks.To illustrate how the ends-means ambiguity arises during the course ofanalysis, we will now attempt to use representative existing classification todescribe two example tasks:Example #1: recall the example stated above in Section 2.1, that of acasual encounter with an electoral map in which a person compares tworegions; election results for each state are encoded as a choropleth mapbased on two attributes, candidate and margin of victory. Furthermore, weknow that this person is familiar with American geography and its regions;this prior knowledge dictates the type of search.Using the typology of Andrienko and Andrienko [12], we might describethis example as an elementary direct comparison task. While richer than aseries of retrieve value tasks [8], this description tells us little about why andhow this comparison was performed. Low-level descriptions derived from anumber of other classifications are similarly impoverished [51, 117, 263, 330,349, 366, 370].We might enrich our description of this task using a recent taxonomy ofcartographic interaction primitives by Roth [260, 262], a much more com-prehensive approach that distinguishes between goals, objectives, operators,24and operands. Using his taxonomy, this task would be described as follows:• goals: procure• objectives: compare• operators: retrieve and calculate• operands: attribute–in–space (search target); general (search level)While the dimensions of this description are similar to the questionsof why, how, and what, the description is incomplete, particularly in itsclassification of goals and objectives. Roth’s taxonomy provides us only witha partial sense of how the comparison is performed: retrieve does not tellsus about whether the person knows the spatial location of the regions to becompared a priori. 
The goal, procure, does not provide us with any higher-level context or motivation for why the person is procuring; specifically, theperson’s casual interest in these two regions is lost. Finally, Roth’s taxonomyimposes a spatial constraint on operands, leaving us unable to fully articulatewhat is being compared.Example #2: in evaluation studies [183], it is sometimes necessary toperform a comparative analysis of a task being performed using di↵erentvisualization tools or techniques. Consider a person using a tree visualizationtool whose interest relates to two nodes in a large tree, and her intent isto present the path between these nodes to her colleagues. SpaceTree [122]and TreeJuxtaposer [220] are two tree visualization tools that allow people tolocate paths between nodes by means of di↵erent focus + context techniques.Both tools allow for path selection, in which the encoding of selected pathsdi↵ers from that of non-selected paths. The tools di↵er in how the elementsthat have been visualized are manipulated: TreeJuxtaposer allows a personto arrange areas of the tree to ensure visibility for areas of interest, whileSpaceTree couples the act of selection by aggregating and filtering unselecteditems.As in the previous example, task descriptions from existing classifica-tions seldom answer all three questions: why, how, and what. Using the25taxonomy of interactive dynamics for visual analysis by Heer and Shneider-man [130], we might describe this task as being an instance of data and viewspecification (visualize and filter) as well as view manipulation (navigate andselect). This description tells us how, but it doesn’t specify why the data isbeing visualized.We might complement Heer and Shneiderman’s description with onebased on a taxonomy of graph visualization tasks by Lee et al. [186], inwhich this task would be classified as a topology task, namely one of deter-mining connectivity and subsequently finding the shortest path. As the scopeof Lee et al.’s taxonomy is specialized, we are provided with a clear indica-tion of what the person’s interest is, this being a path. Unfortunately, thisdescription provides only a partial account of why data is being visualized;we are not provided with a high-level motivation beyond determining andfinding.Both descriptions do not relate the person’s actions to the high-level goalof presenting information to others. Second, and more importantly, thesedescriptions fail to distinguish how this task is performed using SpaceTreefrom how it is performed using TreeJuxtaposer.Summary: these examples demonstrate our inability to comprehensivelyanalyze tasks using existing classifications of behaviour of people who usevisualization tools or techniques. Note that we are not directly criticizingthese classifications; we acknowledge that their scope is often deliberatelyconstrained, with some focusing on low-level tasks, interactions, or opera-tions [8, 12, 42, 51, 58, 60, 75, 117, 166, 186, 263, 291, 329, 330, 343, 349, 366,370], while others focus on high-level tasks or goals [7, 48, 175, 193, 242], oron the behaviour of people who work in specific domains or contexts [186,260, 262, 299]. We lack guidance on how to integrate these disjoint bodiesof work, to compose task descriptions that draw from all of them. 
This integration is the aim of our typology, which will allow practitioners to describe tasks that address critical questions posed during visualization design and evaluation, namely why, how, and what.

Figure 2.1: Our multi-level typology of abstract visualization tasks. The typology spans why, how, and what; task descriptions are formed by nodes from each part: (a) why data is visualized, from high-level (consume vs. produce) to mid-level (search) to low-level (query); (b) how a visualization tool or technique supports the task in terms of visual encoding and interaction design choices; (c) what the task inputs and outputs are.

It could be argued that a classification of tasks should focus solely on the goal of the person who uses the visualization tool or technique, or why data is visualized; people are often not immediately concerned with how a task is performed, as long as their task can be accomplished. We argue that by classifying tasks according to how they are performed, in addition to why they are performed and what they pertain to, we can improve communication between visualization practitioners working in different domains, facilitating tool-independent comparisons, the analysis of diverging usage strategies for executing tasks [337, 371], and improved reasoning about design alternatives.

3 We denote this work as a typology, rather than a taxonomy, as the former is appropriate for classifying abstract concepts, while the latter is appropriate for classifying empirically observable events [13]. For instance, one could construct a taxonomy of the observable ways in which a person could interact with a particular visualization tool, or a taxonomy of existing visual encoding design choices for tree-based data [279].

2.3 A Typology of Tasks

Our multi-level typology3 of abstract visualization tasks, represented in Figure 2.1, is encapsulated by three questions: why the data is being visualized,
We then discuss thewhat part of our typology, which considers the input and output of tasks.Our typology supports the description of a complex domain-specific visual-ization workflow as a sequence of interdependent tasks, where the output ofa prior task may serve as the input to a subsequent task, as we demonstratein the example featured in Section 2.4.For clarity, we first present our typology in its entirety with minimaldiscussion of the previous work that informed its organization, and thenfocus on these connections in Section 2.5 and in Table 2.1 and Table 2.2.2.3.1 Why is Data Being Visualized?The why part of our typology, shown in Figure 2.1a, allows us to describewhy the data is being visualized, and includes multiple levels of abstraction,a narrowing of scope from high-level (consume vs. produce) to mid-level(search) to low-level (query).28Consume: People visualize data in order to consume information in manydomain contexts. In most cases, this consumption is driven either by a needto present information or to discover and analyze new information [334].However, there are many other contexts in which the information beingvisualized is simply enjoyed [78, 247, 299], where people indulge their casualinterests in a topic.Present refers to the visualization of data for the succinct communi-cation of information, for telling a story with data, guiding an audiencethrough a series of cognitive operations. Presentation using a visualizationtechnique or tool may take place within the context of decision making,planning, forecasting, and instructional processes [102, 199, 260, 262]. Pre-sentation brings to mind collaborative and pedagogical contexts, and theway in which a presentation is given may vary according to the size of theaudience, whether the presentation is live or pre-recorded, and whether theaudience is co-located with the presenter [177].Discover is about the generation and verification of hypotheses andis associated with modes of scientific inquiry [239]. Scientific investigationmay be motivated by existing theories, models, and hypotheses, or by theserendipitous observation of unexpected phenomena [9].Enjoy refers to casual encounters with visualized data [247, 299]. In thesecontexts, a person is not driven by a need to verify or generate a hypothe-sis; novelty stimulates curiosity and thereby exploration [77, 299, 303, 318].This motivation is notably absent from existing classifications, as shownin Table 2.1 and Table 2.2. Casual encounters with visualized data canbe fleeting, such as in the earlier example of encountering a static choro-pleth electoral map while reading a blog post. Conversely, these encountersmight be immersive and time-consuming experiences, such as in museumsettings [247].Produce: we use produce in reference to tasks in which the intent is togenerate new information. This information includes but is not limited to:transformed or derived data, annotations, recorded interactions, or screen-shots of static visualizations. Examples of produce in previous work include29the production of graphical annotations and explanatory notes to describefeatures of line graphs of time series data [355], or the production of graph-ical histories in Tableau intended to document the analytical provenance ofa person using this tool [131]. 
Additional examples of produce involvingderived data and annotations are featured in the example of Section 2.4.It is important to note that the products of a produce task may beused in some subsequent task that may or may not involve a visualizationtool or technique. For example, some visualization tools for analyzing high-dimensional data allow people to produce new categorical attributes forlabelling clustered data points in a dimensionally reduced coordinate space;these attributes might be used later for constructing a predictive model.Search: Regardless of whether the intent is to present, discover, ormerely enjoy, a person will search for aspects of interest in the visual-ized data. While terms relating to search and exploration are often con-flated [199, 318], we have imposed a characterization of search that dependson what is being sought. We classify them according to whether the identityor location of the search target is known a priori. Whether the identity ofthe search target is known recalls the concept of references and character-istics introduced by Andrienko and Andrienko [12]: searching for knownreference targets entails lookup or locate, while searching for targetsmatching particular characteristics entails browse or explore. Considerour earlier example of a person who is familiar with American geographyand is searching for California on an choropleth map; we would describethis as an instance of lookup. However, a person who is unfamiliar withAmerican geography must locate California.In contrast, the identity of a search target might be unknown a priori;a person may be searching for characteristics rather than references [12];these characteristics might include particular values, extremum, anomalies,trends, or ranges [8]. For instance, if a person using a tree-based visualencoding is searching within a particular subtree for leaf nodes havingfew siblings, we would describe this as an instance of browse because thelocation is known a priori. Finally, explore entails searching for character-30istics without regard to their location; many visualization tools provide anoverview of the data, which is often the starting point for exploration. Ex-amples include searching for outliers in a scatterplot, for anomalous spikesor periodic patterns in a line graph of time series data, or for unanticipatedspatially-dependent patterns in a choropleth map.Query: Once a target or set of targets has been found, a person willidentify, compare, or summarize these targets. If a search returns knownor reference targets [12], either by lookup or locate, identify returnstheir characteristics. For example, someone who uses a choropleth map rep-resenting election results can identify the winning candidate and marginof victory for the state of California. Conversely, if a search returns targetsmatching particular characteristics, either by browse or explore, identifyreturns references. For instance, our election map enthusiast can identifythe state having the highest margin of victory.The progression from identify to compare to summarize correspondsto an increase in the amount of search targets under consideration [12, 42,329], in that identify refers to a single target, compare refers to multiplesubsets of targets, and summarize refers to a whole set of targets. 
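The four search nodes follow directly from these two distinctions. As a minimal illustration (this sketch is ours and is not part of the typology's notation), the classification can be written as a two-predicate mapping:

# Illustrative sketch (ours): the four search nodes are determined by whether
# the identity and the location of the search target are known a priori.
def classify_search(target_known: bool, location_known: bool) -> str:
    if target_known:
        return "lookup" if location_known else "locate"
    return "browse" if location_known else "explore"

# A person familiar with American geography searching for California:
assert classify_search(target_known=True, location_known=True) == "lookup"
# Searching a known subtree for leaf nodes having few siblings:
assert classify_search(target_known=False, location_known=True) == "browse"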
As withexplore, summarize is also often associated with overviews of the data [186].Continuing with the choropleth map example, the person identifies theelection results for one state, compares the election results of one state toanother, or summarizes the election results across all states, determininghow many favoured one candidate or the other, or the overall distributionof margin of victory values.2.3.2 How Does the Visualization Technique or ToolSupport the Task?We now turn our consideration to the how part of our typology, whichcontains idioms, defined as families of related visual encoding and interactiondesign choices. This part of our typology, shown in Figure 2.1b, is likelyto be most familiar to readers, as it contains a number of idioms associatedwith interaction design choices that are well-represented by several existing31classifications [117, 216, 260, 262, 366]. We distinguish between three classesof idioms: those for encoding data, those for manipulating previouslyencoded elements, and those for introducing new elements.Encode: The majority of visualization tasks rely on how data is initiallyencoded as a visual representation4. A full enumeration of visual encodingdesign choices for various datatypes beyond the scope of this chapter andappears in Munzner’s book [219].Manipulate: The following idioms a↵ect previously encoded elements,modifying them to some extent. These idioms represent families of inter-related design choices incorporating both interaction and visual encoding.We consider visual encoding and interaction design choices in a unified waybecause many idioms incorporate aspects of both [211, 217], such as focus+ context techniques [122, 220].Select refers to the demarcation of one or more encoded elements, dif-ferentiating selected from unselected elements [252]. Examples range fromdirectly clicking or lassoing elements in a scatterplot to brushing designchoices used to highlight elements in visualization tools incorporating mul-tiple linked views [348].Navigate refers to instances where the person using a visualization toolor technique alters their viewpoint, such as zooming, panning, and rotat-ing. Other navigation instances include the triggering of details-on-demandviews, combining navigate and select [291].Arrange refers to the process of organizing encoded elements spatially.This includes arranging representations of data [193, 216, 353], such as re-ordering the axes in a parallel coordinates plot or the rows and columns ofa scatterplot matrix (SPLOM). Other forms of arrangement allow peopleto coordinate the spatial layout of views [130, 348].Change pertains to alterations in visual encoding. Simple examples in-clude altering the size and transparency of points in a scatterplot or edgesin a node-link graph, altering a colour-scale or texture mapping, or trans-4Some tasks do not depend on how the data is visually encoded, or take place before thedata is encoded; consider, for instance, produce tasks that involve deriving new data orrecording states of a visual analysis process or presentation for downstream consumption.32forming the scales of axes. Other alterations have more pronounced e↵ects,changing the visual encoding, such as transitioning between grouped andstacked bar charts, or between linear and radial layouts for line graphs oftime series data. 
Pronounced changes in visual encoding such as these areoften facilitated by smoothly animated transitions, which reduce their dis-ruptive e↵ects [129].Filter refers to adjustments to the exclusion and inclusion criteria forencoded elements. Some forms of filtering allow for elements to be temporar-ily hidden from view and later restored, while other forms are synonymouswith outright deletion. As an example of temporary filtering, consider aperson examining an age histogram based on population census data. First,she decides to exclude males, then further adjusts her filter criteria to focussolely on unemployed females. Finally, she revises the gender criteria tofocus on unemployed males.A common example of permanent filtering, or deletion, is that ofmanually selecting and removing outliers resulting from errors in dataentry. Alternatively, consider a scatterplot in which some data points arelabelled with manually generated categorical tags. Deleting a tag wouldremove this categorical label from all data points having that tag.Aggregate concerns changes in the granularity of encoded elements; wealso consider its converse, segregate, as being associated with this familyof design choices. For example, a person may adjust the granularity of acontinuous time scale in a line graph, aggregating daily values into monthlyvalues, or segregating annual values into quarterly values. Alternatively, aperson may aggregate a clique within a node-link graph into a representativeglyph, or segregate clique glyphs into their component nodes.Introduce: While manipulate idioms alter previously encoded elements,introduce idioms add new elements.Annotate refers to the addition of graphical or textual annotations asso-ciated with one or more encoded elements. When an annotation is associatedwith data elements, an annotation could be thought of as a new attributefor these elements. The earlier example of manually tagging points in a33scatterplot with categorical labels is one such instance of annotating data.Import pertains to the addition of new data elements. In some envi-ronments, these new data elements might be loaded from external sources,while others might be manually generated.Derive refers to the computation of new data elements given exist-ing data elements. Aggregating data often implies deriving data, howeverthis may not always be true: we further specify that derived data mustbe persistent, while aggregated data need not be. For instance, a personmight derive new attributes for tabular data using a multi-dimensionalscaling (MDS) algorithm.Finally, record refers to the saving or capturing of elements as persistentartefacts. As a consequence, record is often associated with produce. Theseartefacts include screen shots, annotations, lists of bookmarked elements orlocations, parameter settings, or interaction logs [293]. An interesting ex-ample of record is that of assembling a graphical history [131], in whichthe output of each task includes a static snapshot of the state of the vi-sualization tool, and as these snapshots accumulate they are encoded as abranching tree. Recording and retaining artefacts such as these are oftendesirable for maintaining a sense of analytical provenance, allowing peoplewho use the tool to revisit earlier states or parameter settings.2.3.3 What are the Inputs and Outputs of the Task?Previous work has reached no agreement on the question of what is visu-alized. 
Many classifications do not address it at all; others discuss what implicitly, as indicated by the parenthetical terms in Table 2.1 and Table 2.2. Of those that classify what, some focus on the level of the entire dataset, such as tables composed of values and attributes or networks composed of nodes and links [291]. Others allow more precise specification of data-attribute semantics, such as categorical, ordinal, and quantitative [48]. A few classifications include not only data but also views as first-class citizens [58, 60, 130, 343]. Specific examples of what as classified in previous work include:

• Values, extremum, ranges, distributions, anomalies, clusters, correlations [8].
• Graph-specific objects [186]: nodes, links, paths, graphs, connected components, clusters, groups.
• Time-oriented primitives [2]: points, intervals, spans, temporal patterns, rates of change, sequences, synchronization.
• Interaction operands [343]: pixels, data [values, structures], attributes, geometric [objects, surfaces], visualization structures.

In this typology, we have chosen a flexible and agnostic representation of what that accommodates all of these modes of thinking: in short, we have a "bring your own what" mentality. The only absolute requirement is to explicitly distinguish a task's input and output constraints when describing sequences of interdependent tasks [329]. An extensive discussion of what that dovetails well with this typology appears in Munzner's book [219]5.

5 Munzner [219] provides a structured classification of data as well as a classification of targets; both can be used in the analysis of inputs and outputs. The classification of targets is represented in Figure 6.2.

2.3.4 Concise Task Descriptions

Our multi-level typology can be used to concisely describe visualization tasks. Each task is defined by why data is being visualized, how the visualization tool or technique supports the task, and by what are the inputs and outputs of the task. Single tasks may involve multiple nodes from each part of the typology, as shown in Figure 2.2.

We have chosen to present these descriptions using a simple and flexible visual notation, rather than with a formal grammar [12, 185, 280, 329, 353]; in doing so, creating and iterating on task descriptions can be easily integrated into existing collaborative design and ideation activities, making use of materials such as coloured sticky notes and whiteboards. A crucial aspect of these descriptions is that sequences of interdependent tasks can be chained together, such that the output from earlier tasks forms the input to later tasks, as discussed in the following example and as represented in Figure 2.3.

Figure 2.2: Task descriptions for Example #1 (left): casually encountering a choropleth electoral map and comparing election results for two regions; and Example #2 (right): presenting a path between two nodes in a large tree using SpaceTree [122] and TreeJuxtaposer [220].

2.4 Example: A Sequence of Interdependent Tasks

Visualization tasks are seldom executed in isolation, and the output of one task may serve as input to a subsequent task. To illustrate this type of dependency, we present an example in which our typology is used to describe a sequence of interdependent tasks.

Consider the case of labelling clusters of related items in a dataset with many dimensions, where a label is a new categorical attribute value for the item. Labels are assigned to clusters by means of annotation. However, one must first explore the visualized dataset and identify clusters of interest. Here a person uses a visualization technique in which items in the dataset are encoded as points in an interactive scatterplot. Identifying clusters is facilitated by navigating and selecting items in this scatterplot; upon selection, additional attribute values for the item are shown in details-on-demand secondary displays or in tooltips. This task too has a dependency on the result of an earlier task. Before the data is encoded in the scatterplot, a set of two-dimensional distances between data points must be produced: they are derived from the original set of dimensional attributes using DR.
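To make the sequence concrete, the following minimal Python sketch (ours, not part of the dissertation; it assumes scikit-learn's MDS implementation and synthetic data) walks through the same three tasks. The interactive navigate and select steps of the second task cannot be reproduced in a static script and are only indicated in comments.

# A minimal sketch (ours) of the derive / encode / annotate sequence described
# above, assuming scikit-learn and matplotlib are available.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
high_dim = rng.normal(size=(200, 30))          # input: high-dimensional data

# Task 1 -- why: produce; how: derive; output: 2D data
low_dim = MDS(n_components=2, random_state=0).fit_transform(high_dim)

# Task 2 -- why: discover, explore, identify; how: encode (a scatterplot);
# navigate and select would be interactive in a real tool
plt.scatter(low_dim[:, 0], low_dim[:, 1])
plt.show()

# Task 3 -- why: produce; how: annotate; output: a new categorical attribute
# (cluster labels assigned by the analyst after inspecting the scatterplot)
labels = np.full(len(low_dim), "unlabelled", dtype=object)
labels[low_dim[:, 0] > 0] = "cluster A"        # a stand-in for manual selection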
Using our typology, we can express dependencies in which the output of one task serves as the input of another, such as the relationship between how data is derived and the choice of visual encoding. Such dependencies are represented in Figure 2.3.

Figure 2.3: An example sequence of tasks ("verifying a hypothesis regarding the existence of clusters of unlabelled items in a scatterplot of dimensionally-reduced data, then labelling the points"), described as a sentence (top left), as well as using the structure and vocabulary of our typology (top); the bottom depicts a series of transformations corresponding to the inputs and outputs of each task.

As in the examples of Section 2.2, we can compare our description to those generated by other classifications. Consider the classification of basic visualization tasks by Chuah and Roth [60], which distinguishes between three categories of operations. Using this classification, this sequence of tasks could be described as having data operations (derived attributes), graphical operations (encode data), and set operations (create set, express membership). This description does specify how and what, but it does not express the interdependencies within a sequence of tasks, nor does it tell us why the data is being visualized. Neither can we easily distinguish when sets (or clusters) are created, as this might occur before the data is encoded, or it might occur via interactive selection of items in the scatterplot.

While the description based on the Chuah and Roth [60] classification is atemporal, the operator interaction framework by Chi and Riedl [58] defines stage- and transformation-based operators occurring along the visualization pipeline. Their framework does not contain a comprehensive list of operators, so we draw from the example operators cited in their paper to describe this sequence of tasks as follows:

1. visualization transformation operators: dimension reduction (DR)
2. visual mapping transformation operators: scatterplot
3. view stage operators: zoom, focus, details-on-demand, pick

This description does capture the interdependencies for this sequence of tasks, though it mischaracterizes the processes of dimension reduction as transformations to visualized elements, rather than transformations on data, a distinction central to our definition of derive.
While this descriptioncaptures up until the second task in our description, it does not capture thefinal task of producing cluster labels by means of annotation.The description based on our typology retains the separability of thesetasks, ensuring the distinction between interim inputs and outputs. An-other problem with descriptions generated by existing classifications wasthat of coverage; the how part of our typology includes both derive andannotate, while descriptions generated by other classifications could notaccount for the latter [8, 58], or both [239, 291, 330, 349, 366, 370]. Finally,38our description also accounts for both why data is derived and why clus-ters are annotated with tags, whereas descriptions generated using existingclassifications mention how a task is performed in relation only to when it isperformed [58] or to what it is performed on [60]. We maintain that a taskdescription requires why, how, and what; the question of when for a sequenceof interdependent tasks is best served by denoting task input and output.2.5 Connections to Previous WorkOur typology was informed in part by related work, including existing clas-sifications and established theoretical models, and in part by new think-ing with many points of divergence from previous work. We surveyedwork relating to tasks spanning the research literature in visualization, vi-sual analytics, HCI, cartography, and information retrieval. We focus ontwo subsets that informed the configuration of our typology: thirty worksthat explicitly contribute a taxonomy, typology, characterization, frame-work, or model of tasks, goals, objectives, intentions, activities, or inter-actions [7, 8, 12, 42, 48, 51, 58, 60, 75, 117, 130, 166, 175, 186, 193, 216,239, 242, 252, 260, 262, 263, 291, 298, 301, 329, 330, 343, 349, 366, 370],along with twenty other references that make compelling or noteworthy as-sertions about the behaviour of people who use visualization tools or tech-niques [2, 9, 77, 78, 102, 163, 199, 218, 244, 247, 261, 293, 299, 303, 318,321, 327, 334, 345, 353]6. The similarities between the individual nodes ofour typology and those of existing classifications and other related work arepresented in detail in Table 2.1 and Table 2.27.Table 2.1 and Table 2.2 serve three purposes: they document our choicesof terms for the purpose of reproducibility, they illustrate the influence ofprevious work on our thinking, and they indicate overrepresented and un-derrepresented areas in the literature, such as consume and enjoy in thewhy part of our typology. 
Note that non-leaf nodes in the how part of our6Appendix A includes additional meta-analysis of this literature and documents theevolution of our typology.7Yalc¸ın [364] has visualized the vocabulary from previous work represented in thesetables here: http://keshif.me/demo/vis_tasks.html.39typology are poorly represented, serving to indicate the gap between lowand high levels of abstraction.2.5.1 Existing ClassificationsThe scope of existing classifications can be categorized in three di↵erentways: level of abstraction, temporality, and applicability.Level of abstraction: Much of previous work can be divided into thosehaving a low or high level of abstraction, with very little falling in between.Relying solely on either type of classification leads to the aforementionedends-means confusion, thereby limiting the potential for rigorous analysis.Low-level use of a visualization technique or tool is well represented in re-lated work [8, 12, 42, 51, 58, 60, 75, 117, 166, 186, 263, 291, 329, 330, 343,349, 366, 370]. Elements common to many of these classifications includeselect, filter, and navigate. Following Lee et al. [187] and Roth [262], wenote that low-level classifications of tasks are often conflated with those ofinteraction design choices. At the high level, abstract tasks can be found inthe context of theoretical models, but without explicit connections to low-level visualization tasks [7, 48, 175, 193, 242]. Examples of these includeconfirm hypotheses, present, and explore. Low-level classifications often pro-vide a sense of how a task is performed, but not why; high-level models arethe converse. Our focus on multi-level descriptions of visualization tasks isintended to close this gap and resolve the ends-means confusion.Temporality: Most classifications are atemporal in that they do not haveany way to express sequences or dependencies between di↵erent stages. Afew classifications are explicitly temporal in that they divide the behaviorof people who use visualization tools or techniques into larger stages thatoccur in specific sequences or cycles. Examples include pipeline models forvisualization construction [58] or data analysis [163], or cyclic models such asknowledge crystallization [48] or information foraging and sensemaking [242].However, empirical observations of the use of visualization tools and tech-niques have indicated a mismatch between the specific cyclical or sequentialpatterns proposed by these models and the actual behaviour of people [152].40why?consume –! present present, [293, 334], author, compose [48]*, build (case), tell (story) [242]*, depict [239]*,express (ideas), describe [301]*, guide, share [130]* inform, elaborate [370]*, report [163],! discover(generate,verifyhypotheses)discover [199], explore [370]*, [334], verify [51]*, [199], synthesize [216]*, [199], investi-gate, integration (of insight) [301]*, [199], frame operations: construct, elaborate, question,reframe [175]*, assimilate, assess, understand [239]*, infer [330]*, analyze [216, 239]*,[199], support, reevaluate (hypotheses) [242]*, monitoring [345], confirm (hypotheses), ex-pose (uncertainty), formulate (cause and e↵ect), concretize (relationships), learn (domainparameters), multivariate explanation [7]*, evaluate, learn, investigate [199], open-ended ex-ploration, diagnosis [244], abduction, deduction, induction [239], generate, confirm (hypothe-ses) [9, 102], integrate, interpret [102], exploratory and confirmatory data analysis [327]! 
enjoy (using visualized data in casual contexts) [247, 299], strolling [78]produce export [260, 262]*, store [216]*, save [193, 260, 262]*, extract [48, 291]*, generating (im-ages) [244], (a classification) [48, 301]*, (a categorization) [216, 349, 370]*, (a record ofone’s history / process) [130, 291, 301]*search search [48, 242, 370]*, acquire [216]*, visual queries [345]! lookup lookup [51]* [199], identify: lookup (value) [349]*, (value) lookup [263]*, retrieve (value) [8,186, 239, 260, 262]*[321], procure [260, 262]*! browse browse [48, 216, 239, 298]*, [78, 318], search [260, 262]*, finding (gestalt) [42]*, browsingtasks: follow (path) [186]*! locate locate [193, 216, 330, 349, 370]*, [102], search [51]*[78], search (for known item) [199],seek [298]*, pathfinding [345]! explore explore [193, 239, 366]*, [345, 353], forage [48, 193, 242]*, finding (gestalt) [42]*,(overview) tasks [186]*, find (clusters, correlations, extremum, anomalies) [8, 186, 239]*,determine (correlations) [263]*, determine (clusters) [349]*query query [252]*, posing queries [42]*, elementary and synoptic tasks [12]*, levels of ques-tions [329]*, question answering [199]! identify identify [186, 216, 239, 260, 262, 330, 349, 370]*, [260, 262], reading (the data) [102],read (fact, pattern) [48]*, lookup [12]*, examine [301]*, determine (range) [8, 186, 239]*,determine / characterize (distribution) [8, 186, 239, 349]*, recognize [175]*! compare compare [12, 175, 216, 239, 260, 262, 301, 330, 370]*, [199], compare (within a relationvs. across / between relations) [263, 349]*, relation seeking [12]*, read comparison [48]*,making comparisons [42]*, [345], discriminate [216]*, associate [260, 262]*! summarize summarize [370]*, summarize (set), enumerate (set objects) [60]*, overview [48, 75, 291]*,(overview) tasks [186]*, scan [186, 216]*, connectional tasks [12]*, count [186, 291]*, visual-ization [78], review [293]Table 2.1: Nodes in the why part of our typology of abstract visualiza-tion tasks and their relation to the vocabulary used in previouswork. Underlining is used where a term used in our typologyappears in related work. Terms in parentheses are encompassedby the what part of our typology. Previous work that explicitlycontributes a classification system is denoted by *; other sourcesincidentally make compelling or noteworthy assertions about theuse of visualization tools or techniques.41how?encode encode [60, 239, 366, 370]*, create mapping [60]*, visualize [130, 330]*, generate [301]*,transform (visual mapping) [58]*manipulate manipulate [353], (object) manipulation [216]*, modify [252]*, (data) manipulationloop [345]! select select [130, 216, 239, 252, 343, 366]*, brush [117, 166, 239]*, [58, 345, 353], distin-guish [349, 370]*, emphasize [370]*, di↵erentiate [239]*, highlight [75, 130, 252]*, [345],identify: portray, individualize, profile [370]*, indicate [216, 252]*, mark [216, 366]*, refer-ence [216]*, outline (clusters) [370]*, promote [48]*, track [366]*, pick [216]*[58], express(set membership) [60]* connect [239, 366]*! navigate navigate [130, 298, 343]*, [199, 218, 244, 345, 353], focus [42, 75]*, [58], details-on-demand [48, 291]*, [58], flip through [58], zoom [42, 48, 75, 117, 166, 216, 239, 260,262, 291, 366]*, [58, 218, 353], pan [42, 117, 216, 239, 260, 262, 366]*, [353], elabo-rate [239, 366]*, abstract [239, 366]*, change (range) [117]*, drill down [75]*, maneuver/ navigate [301]*, rotate [58, 353] revisit [117, 186]*! 
arrange arrange [42, 260, 262]*, sort [8, 117, 130, 186, 239]*, [218], rank [260, 262, 349, 370]*, co-ordinate [130]*, delineate, sequence [260, 262]*, index [263]*, move [216, 252]*, edit [216]*,organize [130]*, [293], orient, permute, position, translate [58], reorder [48, 353], config-ure [330]*, reconfigure [239, 366]*, restructure [193]*! change change (parameters) [75]*, [58], change (metaphor) [117]*, change (representation) [75]*,change (vis. encoding) [218], transform [252]*, [199, 353], transform (mapping), shift,scale, set (graphical value) [60]*, rotate, scale [58], configure [330]*, animate [58, 353], dis-tort [166, 343]* [58], orient / transform [301]*, (object) manipulation: transform, stretch,shape [216]*, re-express, re-symbolize, re-project [260, 262]*, edit [216, 260, 262]*, acti-vate [252]*! filter filter [8, 48, 117, 130, 166, 175, 186, 216, 239, 242, 260, 262, 291, 366]*, [218, 321, 353],subsetting, (value) filtering, (view) filtering [58], exclude [199, 321], screen: filter, sup-press, conceal [216]*, maneuver: (data) management / culling [301]*, configure [330]*,delete (objects, sets, graphical objects) [60]*, delete [48, 117, 252]*, overlay [260, 262]*,restore [117, 216]*! aggregate aggregate [216]*, [58, 218], cluster [48]*, [58], associate [216, 349, 370]*, simplify [58],link [42, 75, 166, 216]*, [293, 353], merge [117]*, generalize / merge [370]*, assem-ble [216]*, create (set) [60]*, split [117]*, disassemble [216]*, disassociate [216]*, reveal:itemize, separate [370]*, segregate: ungroup, unlink [216]*, withdraw, overlay [216]*introduce introduce [216]*! annotate annotate [117, 130, 260, 262]*, add placemark [366], create (anchors) [193]*, create /copy (graphical objects) [60]*, create / modify (note) [117]*, externalize (analysis arte-facts) [293], give a meaningful name to (groups / clusters) [186]*! import import [260, 262]*, add (objects) [60]*, create [48, 216]*, generate [252]*, (data) en-try [216]*, load [193]! derive derive [130]*, derived (attributes) [60]*, derive (new conditions) [301]*, compute (derivedvalue) [8, 186, 239]*, copy [252]*, compute [370]*, calculate [216, 260, 262, 301]*, config-ure, determine [330]*, average [48]*, computation operators [51]*, transform (data) [58]*,estimate, generate (statistics) [301]*, extrapolate [216], *[102], interpolate [216], *[102]! record record [130, 216, 301]*, bookmark [117]*, history [291]*, redo, undo [117, 366]*Table 2.2: Nodes in the how part of our typology and their relation tothe vocabulary used in previous work. Typographic conventionsfollow those used in Table 2.1.42Vicente argues that sequence-based approaches to task analysis are overlyrigid and thus inappropriate for describing such open-ended tasks [337], andthat constraint-based approaches to task analysis allow for more flexibilityin terms of how a task is performed. Descriptions based on our typologydo not force any strict global temporal orderings, as imposed by sequence-or cycle-based models; instead, they accommodate local interdependencieswithin sequences of tasks by way of constraints on task input and output.Applicability: Many classifications represented in our survey are appli-cable across domains and datatypes, though specifically-targeted classifica-tions and models do exist. Examples include Lee et al.’s task taxonomyfor graph visualization [186] and Lammarsch et al.’s task framework fortime-oriented data [185]. 
Our typology encompasses and complements thesespecific classifications, and we encourage further development of more likethese.We are also aware of five domain- and datatype-agnostic classificationsthat span low-level and high-level tasks. These classifications had the highestcontributions to the organization of our own typology:Springmeyer et al. [301] (1992): This classification of scientific dataanalysis covers both how and why, but these aspects are not clearly distin-guished within its hierarchical structure, that which begins with a high-leveldistinction between investigation and integration of insight.Mullins and Treu [216] (1993): This extensive taxonomy contains over150 items: an exhaustive list of high-level mediation and coordination tasks,which overlaps with our classification of why and how, as well as many low-level object-oriented interactions relating to physical interface input andoutput. We do not attempt to specify tasks at this lowest level, though wehave adopted a consideration of input and output in the what part of ourtypology.Pike et al. [239] (2009): Their characterization of analytic discoursedraws from earlier work, distinguishing high-level modes of inquiry [7] asgoals, from low-level tasks [8] and interactions [366]. These are in turndistinguished from the separable intents of representation and interaction43design choices. Bringing these formerly disjoint classifications together islaudable, though the integration of this information for the purpose of ana-lyzing tasks was not the focus of Pike et al.’s article. The aim of our typologyof abstract tasks is to make this integration explicit, relating these intentsand design choices (how) to modes of inquiry, goals, and tasks (why).Heer and Shneiderman [130] (2012): Their taxonomy of interactiondynamics provides a top-level distinction between data-, view-, and process-centric tasks. The focus of their taxonomy is on interactive elements andoperations; ten of the twelve task types they characterize are encompassedby the how part of our typology. The two remaining process and provenancetasks, share and guide, are captured by the definition of present in the whypart of our typology.Roth [262] (2012): Roth’s taxonomy, based on Norman’s Stages of Ac-tion model [226], classifies cartographic interaction primitives as objectives,operators, and operands. Norman’s model describes a series of translationsbetween a person’s goal, an immediate intention (or objective), and a seriesof actions (operators) performed on an environment (of operands). Roth’sclassification is closely aligned with our notions of why, how, and what, andthus has a high-level structure similar to that of our own typology. How-ever, Roth’s taxonomy imposes a spatial constraint on where operands arelocated in space, as discussed in Section 2.2. In contrast, we restrict ourclassification of what to that of input and output; the location of operandsis represented by the search node in the why part of our typology.What these five classifications have in common is that they are atem-poral, and most span our characterization of why and how. Our typologyintegrates and extends this work, adding a specification of what, the inputand output of tasks8. As a result, our typology can be used to describesequences of tasks, in that the output of one task may serve as the inputof another.8As indicated in Section 2.3.3, we chose a flexible and agnostic representation of whatthe inputs and outputs can be. 
Munzner [219] has since provided a more structuredclassification of targets that can be used in the analysis of the outputs of tasks; for moredetail about this extension to the typology, see Section 6.1.442.5.2 Theoretical FoundationsOur typology was also informed by four theoretical frameworks:Distributed cognition: The distributed cognition literature o↵ers us auseful distinction between pragmatic and epistemic actions [173, 194]9. Prag-matic actions are explicitly and consciously goal-directed, while epistemicactions serve to coordinate actors’ internal mental models with externalrepresentations of information [194], where an external representation couldbe an image or interface associated with a visualization tool or technique.Given this distinction, epistemic actions are often performed in support ofpragmatic actions. This distinction is lost in low-level classifications; in iso-lation from higher-level goals we are unable to discern between pragmaticand epistemic actions. Our typology accommodates this distinction. Exter-nal representations are the graphical and interface elements displayed to orcreated by a person. Pragmatic actions correspond to the why part of our ty-pology, while epistemic actions are captured by the how part of our typology.The set of manipulate idioms are particularly well-suited for the purpose ofdescribing epistemic actions and their role in coordinating between internaland external representations.Stages of Action: Norman’s Stages of Action model [226] and its influenceon Roth’s objective-operand-operator meta-analysis [261] of previous classi-fications helped shape the why-what-how organization of our typology. Inthe process of evaluating visualization tools, we can discuss Norman’s gulf ofexecution with respect to the how part of the typology, in which we describethe means by which a person can execute the task with a visualization tool.Also central to Norman’s model is the gulf of evaluation, useful for reasoningabout whether the output of a task matches a person’s expectation. How-ever, this gulf is more applicable when reasoning about specific interactiondesign choices, which are not directly addressed by our typology. More re-cently, Lam [180] extended the model with a gulf of goal formation, relevant9Distributed cognition theory is fundamental to the study of collaboration, howeverour typology does not at present explicitly address collaborative visualization tasks; herewe focus solely on other aspects of distributed cognition, namely the distinction betweenpragmatic and epistemic actions.45whenever a person articulates their own questions pertaining to visualizeddata, thereby specifying the ends of a task. This gulf corresponds to the whypart of our typology, which allows us to abstractly describe these questions.Sensemaking: The why part of our typology overlaps with and bridgesto high-level processes of decision making and prediction described in the-ories of information foraging and sensemaking, both temporal stage-basedmodels [48, 242] and atemporal data-frame models [175]. In particular,sensemaking models connect at the levels of discover, denoting hypothe-sis generation and formation, present, and the types of search: lookup,locate, browse, and explore.Play theory: Casual interactions with visualized data pose another setof problems for many existing classifications, in that task specifications forthese contexts are not easy to motivate by a need to present, discover, orproduce [299]. 
We included enjoy in the why part of the typology to en-compass casual consumption of information, curiosity-driven tasks withoutexpectations or predicted outcomes [77, 318]. The choropleth map exampleused in Section 2.1 is an instance of this type of task. As visualized databecomes increasingly pervasive in casual contexts, we may turn to theoriesof casual information seeking and newsreading behaviour, such as Stephen-son’s Play theory [303] to motivate visualization tasks in these contexts.This theory accounts for media consumption activities that bring no ma-terial gain, serving no “work” functions, but instead induce moments ofabsorption and self-enchantment. Casual media consumption relies uponserendipitous apperception, a readiness to interact with information relat-ing to existing interests. Studies of newsreading behaviour indicated thatpeople read most avidly what they already know about [303], a seemingly ir-rational activity that cannot be described as an explicit need to discover newinformation. We posit that this behaviour is also true of some consumptionof visualized data, particularly in non-work contexts [247, 299].462.6 DiscussionOur motivation to develop a multi-level classification of abstract tasks grewin part from our own needs. We have specifically noted that our ability torigorously analyze tasks has been constrained in the context of the designand evaluation of visualization tools or techniques in general [211, 219] andof design studies in particular [284]. We o↵er this new typology as a nextstep in the ongoing discussion in the literature, rather than as a final answer.Our e↵orts also serve the broader purpose of strengthening the science ofanalytical reasoning [315] by further uniting the frameworks and method-ologies of the cognitive sciences with those in the field of visualization [246].This work also calls for a wider range of evaluation methods centred aroundtask analysis, with a feedback loop in which tasks observed in field settingscan inform subsequent design and evaluation. Our multi-level task typologywill serve to expedite this translation and analysis.We now discuss the capabilities and potential usage of our typology interms of its descriptive, evaluative, and generative power [16, 18].2.6.1 Using the Typology to DescribeThe typology’s descriptive power is in its provision of a consistent lexicon fortasks in terms of why, how, and what in a way supports precise comparisonsacross di↵erent visualization tools and application domains. This lexiconcan be used to describe and compare tasks as they occur in situ, of partic-ular use to those analyzing current work practices and use of visualizationtools “in the wild”. This form of inquiry is often performed within a singledomain, such as enterprise data analysis [163] or intelligence analysis [164],wherein tasks are described in a domain-specific way. An interface- anddomain-independent vocabulary for multi-level tasks allows practitioners toperform comparative analyses of tasks involving di↵erent visualization toolsoccurring in di↵erent disciplines10.Only the descriptive aspect of the typology has been directly validated10The interview study described in Chapter 3 is an example of using the typology toclassify and compare tasks spanning multiple domains.47in this chapter; we used our typology to describe several empirical cases in-cluding single tasks and a sequence of interdependent tasks. 
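One lightweight way to exploit this lexicon when comparing settings in the wild is to tally which of the typology's terms were assigned to observations gathered in each domain. The sketch below is hypothetical: the domains, observations, and code assignments are invented solely to illustrate such a comparison.

    from collections import Counter

    # Hypothetical coded observations: (domain, why-level codes assigned to one observed task).
    coded_observations = [
        ("journalism", ["discover", "locate", "identify"]),
        ("journalism", ["produce", "annotate"]),
        ("energy", ["discover", "browse", "compare"]),
        ("energy", ["present", "summarize"]),
        ("bioinformatics", ["discover", "browse", "identify"]),
    ]

    # Tally code frequencies per domain to support an interface- and
    # domain-independent comparison of observed tasks.
    tallies = {}
    for domain, codes in coded_observations:
        tallies.setdefault(domain, Counter()).update(codes)

    for domain, counts in sorted(tallies.items()):
        print(domain, dict(counts))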
We also demon-strated its ability to facilitate the comparison of tasks as they are performedusing di↵erent visualization tools. Future work includes a further examina-tion of its descriptive power by analyzing whether it covers the full set ofabstract tasks described in previously published design studies. In addition,we acknowledge that the typology does not at present explicitly address col-laborative use of visualization tools, although we did consider some of theissues involved during its development [153]. Future work will verify if thetypology can suciently describe collaborative tasks, or if extensions areneeded11.2.6.2 Using the Typology to GenerateThe typology’s generative power stems from its ability to prescribe and in-form design12. In particular, the typology is well-suited to support taskanalysis occurring throughout the formative discover and design stages ofSedlmair et al.’s nine-stage design study framework [284]. In the discoverstage, the practitioner must transform a domain problem into an abstracttask description; the typology provides an explicit set of choices for whydata is being visualized, possibly making this dicult aspect of design stud-ies more tractable. In the design stage, the practitioner then chooses howthe task will be supported, calling upon the existing repertoire of encod-ing and interaction design choices or inventing new ones. During both thediscover and design stages, the practitioner must consider what comprisesthe inputs and outputs of these tasks, remaining aware that these tasksmay have interdependencies. Once a set of candidate design choices havebeen identified, the designer must consider additional constraints beyondinterdependencies, including human capabilities with respect to perceptionand interaction, domain conventions, and display medium. Regarding visualperception in particular, seminal research by Cleveland and McGill [62] iden-11This future work may involve revisiting the distributed cognition literature and itsdiscussion of collaboration, as indicated in footnote 9.12We use the typology to inform design in Chapter 5.48tified the perceptual constraints and limitations with respect to elementaryperceptual tasks along di↵erent visual channels (e.g., comparison of posi-tion, length, area, shading, angle, direction, etc.); di↵erent visual encodingchoices will involve di↵erent combinations of visual channels, so knowledgeof these constraints and limitations allows us to rank these choices in termsof expected e↵ectiveness [197]. Taking all of these constraints and limi-tations into consideration, the designer can then make informed decisionsabout candidate design choices intended to support the task or sequence oftasks.2.6.3 Using the Typology to EvaluateThe typology is intended to facilitate the evaluation of the experience ofusing a visualization tool or technique, which includes field studies such asin Chapter 4. We can validate the typology’s evaluative power by using itas a set of codes for labelling human behaviour, a common practice in open-ended observational studies of people using visualization tools or techniques;these include longitudinal insight-based studies [229] and multidimensionalin-depth long-term case studies (milcs) [292]. A milc study of SocialAc-tion [237], a social network visualization tool, incorporated a categorizationof interaction design choices by Yi et al. 
[366] into the analysis of how peo-ple performed the tasks; we intend that our task typology be used in asimilar manner, in which the scope of analysis is expanded to include whyand what. Mixed-method qualitative evaluation studies allow practitionersto determine how a task is performed along with its inputs and outputsvia interaction logs and observational analysis; we can also determine whydata is being visualized via interviews, think-aloud protocols, and artefactanalysis.Task descriptions generated by our typology can also be used to bet-ter understand individuals’ analytical strategies and the context-dependentvariability with regards to how a task is performed [337, 371]. Understand-ing individual problem solving strategies in terms of mental model forma-tion and coordination is also an ongoing goal of distributed cognition re-49search [137, 193]. Our typology and its accommodation of pragmatic andepistemic actions may serve to further this research in the study of peopleusing visualization tools and techniques.Finally, while our typology may not provide the low level of specifica-tion required for defining the procedures of empirical experiments aimed atevaluating the performance of human subjects with respect to specific in-teraction and visual encoding design choices [183], it may be use to connectand contextualize these low-level experimental tasks to high-level tasks anddomain-specific activities.2.7 SummaryThe primary contribution of this chapter is a multi-level task typology thatrelates both why and how a task is performed to what the task pertainsto in terms of inputs and outputs. The typology allows for the precisedescription of complex tasks as sequences of simpler tasks, with their inter-dependencies made explicit. One major advance of the new typology is thatit bridges the gap between the low-level and high-level tasks of previous workby providing linkages between them, distinguishing the ends and means ofa task. Our typology integrates new thinking with existing classificationsof tasks, and with previously established theoretical frameworks spanningmultiple literatures. The multi-level task typology presented here is anotherstep towards a systematic theoretical framework for visualization, helping usto describe existing visualization experiences, evaluate them, and generatenew ones.50Chapter 3Interview Study:Visualizing Dimensionally Reduced Data: Interviews withAnalysts and a Classification of Task Sequences1We characterize five task sequences related to visualizing dimensionallyreduced data, drawing from data collected from interviews with ten dataanalysts from di↵erent application domains, and from our understanding ofthe technique literature. Our classification of visualization task sequences fordimensionally reduced data fills a gap created by the abundance of proposedtechniques and tools that combine high-dimensional data analysis, dimen-sionality reduction (DR), and visualization, and is intended to be used in thedesign and evaluation of future techniques and tools. We discuss implica-tions for the evaluation of existing work practices, for the design of controlledexperiments, and for the analysis of post-deployment field observations.3.1 MotivationDR is the process of reducing a dataset with many dimensions to a lower-dimensional representation that retains most of its important structure. 
It has been an active research area throughout several decades and across many domains, from its origins in psychology [319, 368] through statistics [41] to machine learning [155, 313, 331] and visualization [149, 150, 159, 365].

1 This chapter is a slightly modified version of our paper Visualizing Dimensionally Reduced Data: Interviews with Analysts and a Characterization of Task Sequences by Matthew Brehmer, Michael Sedlmair, Stephen Ingram, and Tamara Munzner; in Proceedings of the ACM Workshop on Beyond Time and Errors: Novel Evaluation Methods For Information Visualization (BELIV 2014), p. 1-8 [36]. http://dx.doi.org/10.1145/2669557.2669559.

Figure 3.1: A task sequence involving dimensionally reduced data. (a) Data is reduced to two dimensions; (b) encoded in a scatterplot to verify visible clusters; and (c) colour-coded according to preexisting class labels to match clusters and classes.

While many techniques and tools combining DR with visualization have been proposed, there is still no perfect automated solution that will generate the most effective visual encoding for every situation. Analysts are faced with complex choices between alternative DR techniques and between different visualization techniques for analyzing the resulting data. These choices are strongly dictated by the analysts' data and tasks [312]. The statistics and machine learning communities have provided extensive classifications of DR techniques based on data and technique characteristics [73, 101, 124, 155, 332, 360]. In contrast, there is very little that is explicitly stated about the characteristics of tasks that analysts engage in when visually analyzing dimensionally reduced data. To guide designers, analysts, and those who conduct evaluations of techniques and tools, a better understanding of these tasks is essential.

The contribution of this chapter is a classification of five task sequences related to the visualization of dimensionally reduced data: naming synthesized dimensions, mapping a synthesized dimension to original dimensions, verifying clusters, naming clusters, and matching clusters and classes. In the last of these sequences, illustrated in Figure 3.1, an analyst uses DR and scatterplots to verify clusters, and then match them with existing classes. Our classification is based on an in-depth analysis of ten interviews with analysts who use DR for visualizing their data, as well as on a literature review of papers that apply DR for the purpose of data visualization. Our analysis framework is our typology of abstract tasks proposed in Chapter 2. Our typology allows practitioners to characterize task sequences based on observed work practices, occurring in requirements gathering activities and in field evaluations of deployed tools.

3.2 Related Work

Classifying tasks: The systematic analysis of worker activities and tasks is a critical process in the design and evaluation of technology, and task analysis frameworks appear in many different fields, including human factors and ergonomics [337], HCI [216], and visualization, including the typology of tasks proposed in Chapter 2.

While many classifications of visualization tasks are agnostic to datatype, some address specific types of data [291], such as network data [186], time-oriented data [185], and tabular data [133]. As we discussed in Section 2.5.1 and in Meyer et al. [211], datatype-specific task classifications consider a specific set of data abstractions, facilitating a mapping to appropriate visual encoding and interaction design choices.
A datatype-specific task classifica-tion of tasks is also critical for evaluation, such as when specifying tasks tobe performed by participants in controlled experiments. In this chapter, wepropose a datatype-specific classification of task sequences for dimensionallyreduced data.Classifications of tasks are often based on their authors’ own experiencein conjunction with a thorough consideration of the literature [7, 291], whileothers are based on observations of human behaviour in controlled labora-tory settings [8]. In contrast, our classification of task sequences is primarilybased on an interview study with analysts working with their own data [203],allowing us to ground our findings in real data analysis practices.Mapping tasks to design choices for high-dimensional data analy-sis: There are many approaches that combine analysis of high-dimensional53data, DR, and visualization, including some developed by our researchgroup [149, 150, 357]. While there are existing classifications of high-dimensional data analysis techniques [24] and of dimensionally reduceddata [285], the mapping between data, tasks, and appropriate design choicesremains unclear [312]. This problem is particularly apparent when design-ing to accommodate workflows, or instantiations of task sequences withinsoftware tools for high-dimensional data analysis [150, 159].One task for dimensionally reduced data is that of matching clusters andcategorical classes given with the data, discussed below in Section 3.4.2.Based on findings from an empirical data study, we previously identifiede↵ective visual encoding design choices that support this task [286], and wecalled for similar work to be done for other tasks relating to dimensionallyreduced data. Our classification of task sequences moves us closer to thisgoal.Expert judgments and dimensionally reduced data: We are aware ofone other study involving expert analysts’ interpretations of visualized di-mensionally reduced data, though they do not share our explicit examinationof analysts’ domain problems and tasks: Lewis et al. [188] asked expert andnovice analysts in a controlled lab setting to subjectively rate the value oftwo-dimensional scatterplots of seven dimensionally reduced datasets, gener-ated using nine di↵erent DR techniques. Their findings showed that expertswere more consistent than novices in their positive and negative ratings.Judging the value or quality of a visual encoding of dimensionally reduceddata should occur regardless of task, and analysts can additionally leverageautomated quality metrics based on human perception [5, 24]. In our study,the domain experts we interviewed varied in terms of their perceived under-standing of DR; furthermore, we sought to characterize experts’ tasks andactivities in naturalistic settings, rather than in a controlled lab study.3.3 Research ProcessOur methodological choice was motivated by a vibrant thread of work in thevisualization community using qualitative methods in general [49, 153, 322],54and interview studies in particular [163, 164]2.Data collection: Between 2010 and 2012, we interviewed nineteen data an-alysts working in academic and industry settings, representing over a dozendomains, spanning the natural sciences, computer science, policy analysis,and investigative journalism. These analysts were recruited from our ex-tended personal and professional networks via snowball sampling, and theywere known to work with high-dimensional data. 
These interviews were semi-structured, lasting in duration from one to four hours3; some of these interviews were more akin to contextual inquiries [139], occurring at the analyst's workplace, while others were performed in our department or via teleconference. We discussed the analysts' domain context, their data analysis goals, and their data; we also asked more specific questions about how they transformed their data and their use of DR and visualization techniques4. We also collected artifacts from these analysts, including their published papers and theses, their unpublished manuscripts, screenshots of their visualized data, and in some cases, even their data.

Data analysis: We alternated between data collection and analysis, progressing from initial to focused coding of the data [55]5. In this chapter we concentrate our attention on the ten analysts who (a) specifically used dimensional synthesis DR algorithms in analyzing their high-dimensional data, and who (b) also visualized their dimensionally reduced data.

To analyze the data that we collected from these ten interviews, we used our typology of abstract visualization tasks, proposed in Chapter 2. Our typology distinguishes why data is being visualized at multiple levels of abstraction, what inputs and outputs a task may have, as well as how a task is supported by visual encoding and interaction design choices. This lens allowed us to focus on a subset of our findings from the standpoint of visualization design and evaluation6, culminating in the task sequences presented in Section 3.4. In Section 3.5, we revisit the typology and illustrate how it can describe our five task sequences.

2 We elaborate on the evolution of our methodology and the foci of our analysis in Section B.3.
3 In Section B.1, we indicate that we conducted twenty-four interviews in total, as five analysts were interviewed twice; see Table B.1.
4 The interview foci and questions can be found in Section B.2.
5 Example artefacts from this data analysis process can be found in Section B.4.

Figure 3.2: Five task sequences that involve visualizing dimensionally reduced data. Individual tasks are described using our typology in Figure 3.5. Each sequence in the diagram begins at start and proceeds through DR: DR → name synth. dimensions; DR → name synth. dimensions → map synth. to original; DR → verify clusters; DR → verify clusters → name clusters; DR → verify clusters → name clusters → match clusters and classes.

Finally, we enriched our analysis with further examples from the literature. We specifically sought papers that report on applications where DR and visualization were performed in conjunction for analysis, and we consider these applications with respect to the task sequences we characterized.

3.4 Task Sequences

We have identified five task sequences related to dimensionally reduced data. In this section, we describe each task sequence and illustrate the sequence in Figure 3.2. Each is named after the terminal task appearing in the sequence. We also comment on how these task sequences arose in our interviews, and which visualization techniques were used to address these sequences.
These task sequences are not exclusive: some analysts performed multiple task sequences in the course of their work. This descriptive survey of analysts' data, task sequences, and visualization is summarized in Table 3.1. The dataset sizes being investigated by these analysts ranged from dozens to over a million dimensions, and from hundreds to hundreds of thousands of items.

Dimensionality reduction: All the task sequences we characterized begin with DR. In our context, we define DR as a means of dimensional synthesis: a set of m synthesized dimensions is derived from n original dimensions, where m < n. Dimensional synthesis techniques are commonly differentiated between linear and non-linear [155]. Linear techniques such as principal component analysis (PCA) [161] or classical MDS [319, 368] produce synthetic dimensions from linear projections of the original data. However, many datasets have an intrinsic structure that can only be revealed using non-linear techniques, such as Isomap [313], t-distributed stochastic neighbor embedding (t-SNE) [331], or Glimmer MDS [149]. Further distinction between linear and non-linear dimensional synthesis is outside of the scope of this chapter, though we note that some techniques are more appropriate for verifying the existence of local cluster structure while others are more appropriate for identifying global intrinsic dimensions (or manifolds) [188]. In Table 3.1, we note who used linear and non-linear DR.

It is not our intent to catalog and differentiate the large body of DR techniques; we will concentrate our analysis on their output, asking why do analysts visualize these synthesized dimensions.

3.4.1 Dimension-Oriented Task Sequences

We describe two task sequences that specifically relate to synthesized dimensions as generated by dimensional synthesis DR techniques: naming synthesized dimensions and mapping synthesized to original dimensions. The verbs name and map were deliberately chosen and are defined using the vocabulary of our typology in the following two subsections.

Case: data (#dims x items)
A1: usage logs from online music service (48 x 310)
A2: aggregated search engine metrics (12–31 x 1,463)
A3: recreational boating survey data (39 x 543)
A4: protein region data (160 x 10–100K)
A5: polymer molecule feature vectors (1K x 10K)
A6: bibliometric co-occurrence matrix (20K x 20K)
A7: human motions from multiple sensors (1,170 x 9,120)
A8: genomic, clinical data from patients (1.4M x 600)
A9: distance matrix of genome sequences (100K x 100K)
A10: distance matrix of text documents (10K x 10K)
Ref. [41]: distance matrix of Morse codes (36 x 36)
Ref. [201]: BRDF reflectance model (4.36M x 104)
Ref. [256]: quadruped skeleton models (348–406 x 9)
Ref. [313]: 64 x 64 px images (4,096 x 698–1K)

Table 3.1: Top: A summary of task sequences performed by the ten analysts that we interviewed, along with the visualization technique(s) used to perform these task sequences. Bottom: examples of task sequences in papers discussing DR and visualization.

Name synthesized dimensions: Given a set of synthesized dimensions, an analyst may want to discover what these dimensions mean, to generate hypotheses about the semantics of these synthesized dimensions.
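As a concrete point of reference for the sequences that follow, the sketch below performs dimensional synthesis on a small stand-in dataset and encodes the result as a two-dimensional scatterplot of the kind shown in Figure 3.1. It assumes the scikit-learn and matplotlib libraries and uses PCA purely as an example; it is not a reconstruction of any interviewee's data or workflow.

    import matplotlib.pyplot as plt
    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA

    # A small stand-in dataset: n = 4 original dimensions, 150 items, known classes.
    data = load_iris()
    X, classes = data.data, data.target

    # Dimensional synthesis: derive m = 2 synthesized dimensions from the n originals.
    X_synth = PCA(n_components=2).fit_transform(X)

    # Encode the items in a two-dimensional scatterplot; colour-coding by the
    # preexisting class labels corresponds to the final step in Figure 3.1.
    plt.scatter(X_synth[:, 0], X_synth[:, 1], c=classes)
    plt.xlabel("synthesized dimension 1")
    plt.ylabel("synthesized dimension 2")
    plt.show()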
An analyst will browse the set of synthesized dimensions, and for each dimension of interest, she will browse items and their corresponding values; as a result, she may be able to identify the name of a synthesized dimension.

This task sequence was attempted by two of the analysts we interviewed (A1 and A2 in Table 3.1). Both worked in the field of HCI and attempted to identify the intrinsic dimensions related to usage data collected about online search behaviour and music listening behaviour, respectively.

A common approach, employed by both analysts, is to inspect data points plotted according to two synthesized dimensions in a two-dimensional scatterplot, in which the analyst may be able to discern an interesting semantic relationship along the axes. In some cases, these scatterplots are augmented with text labels containing categorical information, such as item name, annotated adjacent to a subset of the plotted points [41, 201, 313] or available through interaction. Tenenbaum et al.'s paper describing the Isomap algorithm [313] contains a particularly compelling example (reproduced in Figure 3.3), in which each data point in a scatterplot corresponds to an image of a face; a random sample of these images are displayed directly in the scatterplot as thumbnails adjacent to their corresponding points. Given this display, it is possible to discern names for the three synthesized dimensions resulting from dimensional synthesis.

Figure 3.3: A visual encoding of dimensionally reduced data, in which three synthesized dimensions have been identified: up-down pose along the y-axis, left-right pose along the x-axis, and lighting direction indicated below each image. Figure from Tenenbaum et al. [313] (© 2000 AAAS).

Map synthesized to original dimensions: Regardless of whether an analyst is interested in naming synthesized dimensions, another possible task sequence involves mapping synthesized dimensions back to original dimensions. In the context of PCA [161], this mapping is often referred to as the loading of the synthesized dimensions by the original dimensions. Given a synthesized dimension, an analyst may want to discover this mapping. More specifically, the analyst may either verify a hypothesis that this mapping exists, or generate a new hypothesis about it. The analyst will browse items and their values along this synthesized dimension and compare these values to those along the set of original dimensions, looking for similarities and correlations. This mapping could allow analysts to identify groups of correlated original dimensions.

Four of the analysts we interviewed attempted to perform this sequence of tasks; two of these analysts had previously attempted to name some of their synthesized dimensions. A1 mapped her synthesized dimensions to a set of original dimensions in aggregated usage logs from an online music streaming service, while A2 attempted the same task sequence with aggregate search engine metrics but was unable to confidently map any of her synthesized dimensions to her original dimensions. Both used two-dimensional scatterplots to carry out this task sequence.
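The comparison of values along a synthesized dimension with values along the original dimensions can be sketched in the same stand-in setting, again assuming scikit-learn; the correlations printed below are one simple stand-in for the loading relationship, and strongly correlated original dimensions become candidates for such a grouping.

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA

    data = load_iris()
    X = data.data                      # n = 4 original dimensions

    # Derive m = 2 synthesized dimensions.
    X_synth = PCA(n_components=2).fit_transform(X)

    # Compare values along the first synthesized dimension to values along each
    # original dimension, looking for similarities and correlations.
    for name, column in zip(data.feature_names, X.T):
        r = np.corrcoef(X_synth[:, 0], column)[0, 1]
        print(f"synthesized dim 1 vs {name}: r = {r:+.2f}")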
The other two analysts were explicitly interested in grouping original dimensions based on this mapping: a policy analyst (A3) investigating survey data pertaining to recreational boating practices used two-dimensional scatterplots to compare synthesized dimensions and original dimensions, while a bioinformatician (A4) investigating protein regions used a SPLOM, heat maps, and density plots.

3.4.2 Cluster-Oriented Task Sequences

There exists another set of task sequences where the semantics of the synthesized dimensions are not a central interest; instead, analysts are interested in clusters of items that might be revealed in the dimensionally reduced data. We characterize three task sequences: verify clusters, name clusters, and match clusters and classes. As with the dimension-oriented task sequences, the verbs verify, name, and match were deliberately chosen and are defined using the vocabulary of our typology in the following three subsections.

Verify clusters: Analysts might seek to verify the hypothesis that clusters of items will be revealed in the dimensionally reduced data, or to verify hypotheses about specific conjectured clusters. In order to discover clusters, analysts must locate and identify item clusters in the low-dimensional representation of the data; in the example of Figure 3.4b, we can identify three clusters.

All ten of the analysts we spoke to were interested in verifying that clusters exist in their data. This task sequence is also captured by a discussion by Buja and Swayne [41] about visualizing data following multidimensional scaling. The analysts we interviewed used a variety of visualization techniques when performing this task sequence, including two-dimensional monochrome scatterplots, such as those depicted in Figure 3.4a-b, as well as three-dimensional scatterplots, SPLOMs, dendrograms, heat maps, and density plots.

Figure 3.4: Example scatterplots of dimensionally reduced data illustrating tasks related to item clusters: verifying the existence of clusters, naming clusters, and matching clusters and classes. (a) No discernible clusters. (b) Three discernible clusters. (c) A match between clusters and class labels. (d) A partial match between clusters and class labels. (e) No discernible class separation.

Name clusters: Once the existence of clusters has been verified, such as in the example of Figure 3.4b, the next task is often one of generating hypotheses regarding the meaning of these clusters in the form of a name. In this discover task, an analyst will browse items within a cluster and attempt to summarize the cluster with a meaningful name. In some cases, this name is made explicit, as the analyst will annotate the cluster, thereby using the visual encoding to produce new information about their data. Eight of the analysts who had previously verified clusters also attempted to name clusters in the course of their work, using the same visualization techniques. For instance, A6, who examined bibliometric data from a corpus of life sciences research literature, attempted to identify and name clusters of related research concepts, such as "cancer" or "RNA".

Match clusters and classes: The final task sequence we characterize is matching clusters with classes. The input to this match task is not only a set of item clusters, identified in the earlier verify clusters task, but also a set of categorical class labels. These classes might come directly with the data, be assigned using a clustering algorithm run by the analyst, or be the result of manual labeling. The analyst must verify a hypothesis that a cluster of items matches the class for those items. To discover a match, the analyst performs a lookup for the class and cluster membership of an item in order to compare them, resulting in a match (as in Figure 3.4c), otherwise referred to as a true positive, or a mismatch (as in Figure 3.4d-e), which could either be a true negative or a false negative. This task was examined in our recent paper [286], which offered guidance for choosing appropriate visualization techniques for dimensionally reduced data.

Naming the clusters is not a pre-requisite for this match task, though we did encounter four analysts who reported performing both tasks in succession (A1, A2, A9, A10); two other analysts performed this task without previously naming the clusters they identified (A7, A8). Typically, this task was performed using two-dimensional scatterplots, wherein the points were coloured using the class labels; SPLOMs, interactive and non-interactive three-dimensional scatterplots, and node-link graphs were also used. Note that the visual separability of colour-coded clusters differs perceptually from the separability of monochrome clusters, as described in our recent taxonomy of cluster separation factors [285]. These perceptual differences should be taken into account particularly when determining which experimental stimuli to use in controlled experiments.

A possible outcome of this task sequence is a partial match between classes and clusters: there may be more clusters than classes, or vice versa. In cases where there are more clusters than class labels, illustrated in Figure 3.4d, this outcome suggests that the class labels may not capture a finer-grained cluster structure in the data, as was the case for the investigative journalist that we interviewed (A10). In cases where there are more classes than clusters, illustrated in Figure 3.4e, this result may either be a true negative, in which perfect class separation is not possible, or a false negative [286]. If this mismatch is suspected to be a false negative, Sedlmair et al. recommend selecting other dimensions to visualize, using other design choices such as a SPLOM, or revisiting the choice of DR technique.

3.5 A Task Typology Revisited

The analysts that we interviewed hailed from very different domains, each using a different terminology to describe their work processes. For instance, we needed a way to compare how diagnosing cancer patients based on their genomic data (A8) was like classifying types of human motion through the use of sensors attached to the body (A7). We required an abstract vocabulary for describing and comparing the work processes of these analysts.

For this reason, we used our typology of abstract visualization tasks, introduced in Chapter 2, which provided a domain-agnostic vocabulary and framework for describing visualization tasks in terms of why, what, and how. By describing a task in this manner, we can link outputs and inputs to describe sequences of interdependent tasks, which Norman would refer to as activities [227].
We use this typology here to describe task sequences relating to visualizing dimensionally reduced data across multiple domains.

Our analysis concentrated on the why and what aspects of the tasks pertaining to dimensionally reduced data, as summarized in Figure 3.5. We chose not to be prescriptive about how these task sequences should best be supported by visualization techniques; instead, we described the variety of techniques used by the analysts that we interviewed for each task sequence, as summarized in Table 3.1.

The analysts we interviewed were all interested in discovery, which involves the generation and verification of hypotheses. Figure 3.5b-f show which tasks relate to hypothesis generation and which relate to hypothesis verification. The graphical depiction also shows which task can be associated with pure consumption of information and which task can additionally lead to the production of new information. When consuming information, an analyst will search for targets within a visual encoding. Whether the location and identity of these targets is known a priori will determine the type of search. In tasks related to visualizing dimensionally reduced data, we found that search strategies used by analysts were either browse, locate, or lookup, as indicated in Figure 3.5b-f. Once targets are found, an analyst will execute some form of query: they might identify a single target, such as an item cluster, compare multiple targets, such as values along a synthesized dimension to values along an original dimension, or summarize all the targets, such as when naming a cluster.

Figure 3.5: Six tasks related to dimensionally reduced data, characterized using our abstract task typology introduced in Chapter 2, which describes why the data is being visualized at multiple levels of abstraction (yellow) and what inputs and outputs a task has (grey). These tasks are combined to form the task sequences described in Section 3.4. The six panels are: dimensionality reduction (dimensional synthesis): produce, derive; input n original dimensions, output m synthesized dims. (m < n). Name synthesized dimensions: discover (generate hypotheses), browse, identify, annotate; input synthesized dimensions, output identified dimensions. Map synthesized dimension to original dimensions: discover (generate, verify hypotheses), browse, compare; input synthesized dim. + original dims., output mapping between synthesized and original. Verify clusters: discover (verify hypotheses), locate, identify; input items + original dimensions, output item clusters. Name clusters: discover (generate hypotheses), browse, summarize, annotate; input items in cluster, output cluster names. Match clusters and classes: discover (verify hypotheses), lookup, compare; input clusters + classes, output (mis)matches between clusters and classes.

Dependencies: The task sequences described in Section 3.4 contain dependencies. For example, in order to match clusters and classes, an analyst must first verify that clusters exist. Each of the sequences also depends on the output of DR techniques, the derived synthetic dimensions. The application of DR to a set of original dimensions is itself a task, as shown in Figure 3.5a.
However, unlike the other tasks described in this chapter, it is about neither hypothesis generation nor verification, but rather about producing new information intended to support subsequent tasks.

Figure 3.6: The why part of our abstract task typology from Chapter 2, with the refinement (emphasized in red) that the actions of annotate, record, and derive are forms of produce [219]. The diagram's nodes are: consume, comprising discover (generate / verify), present, and enjoy; search, comprising lookup, locate, browse, and explore, distinguished by whether the target is known or unknown and whether its location is known or unknown; query, comprising identify (one), compare (some), and summarize (all); and produce, comprising annotate, derive, and record.

While the distinctions between these tasks and task sequences may seem obvious in hindsight, we initially struggled to find a vocabulary and framework that would allow us to distinguish between these task sequences and their interdependencies. Our task typology, introduced in Chapter 2, allows us to describe these task sequences explicitly, whereas they were implicit in previous work combining DR and visualization.

Extended typology: Figure 3.6 reproduces the why part of an extended task typology [219]7. The changes relevant to our analysis in this chapter pertain to three actions: an analyst may annotate information, derive new information from existing, or record their use of a visualization tool so as to provide analytical provenance or to facilitate subsequent presentations of the visualized data. The terms annotate, derive, and record were previously attributed to families of interaction design choices in the how part of our typology; the extended typology classifies them as ends rather than means and thus situates them as forms of produce. Both versions of the typology distinguish whether a person will visualize data either to consume or produce information. The remaining aspects of the typology describing lower levels of abstraction are unchanged.

7 We comment further on the extensions to our typology in Section 6.1.1.

3.6 Discussion

We discuss the utility of our classification of task sequences with regard to several visualization evaluation scenarios, the limitations of our current findings, and our planned future work.

3.6.1 Implications for Evaluation

Task analysis and evaluation are closely linked. An understanding of visualization tasks informs how an evaluation is conducted, from the justification of experimental procedures to the collection and analysis of field observations.

Our current work adds to previous task classifications proposed in the visualization evaluation literature [133, 186, 330]. As evaluation takes on many forms, we frame our discussion around four of Lam et al.'s scenarios for empirical studies [183].

Understanding work practices: Work practice evaluation or work domain analysis can provide a richer understanding of the perspective of people who might benefit from visualizing their data, reflecting real work practices and activities. While we have outlined their immense importance several times [34, 211, 217], only a few dedicated examples exist in the visualization literature [163, 164, 322]. More commonly, however, such work practice evaluations occur in design studies, an increasingly popular form of problem-driven visualization research. In particular, a design study's early discover stage [284] involves the analysis of work practices within a very specific usage context in a particular domain.
These concrete work practices are then translated into abstractvisualization tasks and design requirements.Our current work goes beyond task classification in design studies byconducting interviews with analysts across di↵erent application domains.We then cast our findings as task sequences or activities [227] that abstractaway domain-specific language. In doing so, we intend to support researcherswhen conducting and analyzing future work practice evaluations, specificallywhen DR techniques are to be used. We encourage practitioners to adoptour classification of task sequences into a lexicon for coding observationsof work practices and for translating domain-specific descriptions of thesepractices. We believe that using our task sequences will make the analysisprocess more ecient and, furthermore, will allow for transferability betweendesign studies from di↵erent application domains [284].Evaluating human performance: Our classification of task sequencescan inform the design of experimental procedures and participant instruc-tions in controlled laboratory studies, where the aim might be to quanti-tatively assess human performance on a newly proposed visualization tech-nique. Many previous classifications of tasks have informed experimentaldesign, such as the adoption of a task classification by Zhou and Feiner[370] in a laboratory evaluation of an information retrieval tool [214]. Weexpect that our classification of task sequences will play a similar role in theevaluation of techniques or tools that visualize dimensionally reduced data.For instance, an experiment might compare multiple visualization techniquesfor verifying clusters and subsequently matching clusters and classes, whereperformance might be measured in terms of speed and accuracy.Munzner [217] refers to such studies as a form of downstream validation,in which a design has been implemented for its investigation in a study. Incontrast, upstream validation in this case refers to the justification of visualencoding and interaction design choices before its implementation. We deemour task sequences to be similarly helpful for such upstream evaluations.Researchers presenting new visual encoding or interaction design choices68can refer to our task sequences to concisely state assumptions about whichabstract tasks are supported, rather than leaving this description implicitin a way that places a burden on a potential adopter of the design choice.Evaluating the experience of using a visualization tool or tech-nique: In either lab or field settings, a researcher can evaluate the experi-ence of using a tool or technique by dictating the tasks without specifyinghow to execute them, asking study participants to verbalize their actionswhile they attempt to execute a sequence of tasks. Such a think-aloud pro-tocol might allow the researcher to understand if features of the tool arelearnable, useful, or in need of further usability improvements. Question-naires and interview questions relating to the experience of using a visual-ization tool or technique could also be framed around our classification oftask sequences.We note that expertise has many facets; the distinction between novicesand experts is a particularly nuanced question for studies considering DR.Several of the high-dimensional data analysts that we interviewed might bedescribed as middle-ground users [150]: they had significant domain exper-tise but only a partial understanding of the available DR tools and of themathematics underlying these techniques. 
This characteristic is important to keep in mind when recruiting participants for evaluations of performance or experience, as some evidence exists that participants with an understanding of DR will interpret visual encodings of dimensionally reduced data differently than those who do not have this understanding [188].

Evaluating visual data analysis and reasoning: While a researcher must dictate the tasks in a controlled laboratory experiment, another scenario is the observation of tasks in an open-ended qualitative evaluation of a visualization tool or technique. Here, the researcher must recognize when these task sequences appear in naturalistic settings, in order to better understand how visual data analysis and reasoning are supported following the introduction of a new visualization tool. This form of evaluation is typical in design studies [284, 292], particularly after a tool is deployed.

As with evaluations of work practices, our classification of task sequences could become part of a lexicon for coding observed behaviour after a tool is deployed. In cases where direct observation of tool use is not possible, our classification of task sequences might be used to analyze interaction log files, or used as a basis for diary or interview questions, suggesting a consistent vocabulary for coding participant responses. Precedents for the use of task classification in evaluation of deployed tools include the adoption of a classification by Yi et al. [366] in a longitudinal field study of a social network analysis tool [237], or how we used our task typology introduced in Chapter 2 to evaluate why and how journalists used Overview, a tool for analyzing large document collections [35] (this use of the task typology is documented in Chapter 4).

Finally, if we consider the task sequences name synthesized dimensions and name clusters in particular, one conceivable evaluation of visual data analysis and reasoning would involve collecting participant annotations and explanations of synthesized dimensions or clusters in visual encodings of dimensionally reduced data. Such a study might adopt a protocol similar to one used by Willett et al. [355] to elicit participant annotations and explanations of visualized time series data in an application deployed online. This evaluation could help to identify the features of a visualization tool that facilitate or inhibit visual data analysis and reasoning.

3.6.2 Limitations

Our interview findings are certainly not exhaustive, and despite conducting interviews with nineteen analysts, only ten of these analysts contributed to our classification of task sequences. This selection was based on our goal of studying task sequences relating to visualizing data reduced with dimensional synthesis techniques. There are many other interesting areas of high-dimensional data analysis that we did not address. Specifically, we found that many of our excluded interviewees used dimensional filtering techniques, in which a subset of the original dimensions are retained [159, 365]. Alternatively, other analysts applied DR to their data without visually analyzing it. In these cases, DR was used to reduce the data for algorithmic input, such as for classification and other machine learning applications.

We consider our findings to be existence proofs of the task sequences as performed by analysts as part of their ongoing work.
We do not make claims about the prevalence of these task sequences in high-dimensional data analysis, nor do we make claims about completeness: our classification of task sequences might be incomplete due to sampling or observer bias.

3.7 Summary

In this chapter, we presented a classification of five task sequences related to visualizing dimensionally reduced data:

• Name synthesized dimensions: discover meaning of these dimensions, generate hypotheses about their semantics, browse these dimensions and their corresponding values, and ideally identify their names.

• Map synthesized to original: discover this mapping, verify a hypothesis that this mapping exists, or generate a new hypothesis about this mapping; for a synthesized dimension, browse items and their values and compare these values to those from the original dimensions and ideally identify groups of correlated original dimensions.

• Verify clusters: verify a hypothesis that clusters of items exist, or verify hypotheses about specific conjectured clusters, locate clusters.

• Name clusters: generate hypotheses regarding the meaning of these clusters, browse items within a cluster, summarize the cluster with a meaningful name; in some cases, annotate the cluster (produce new information about the data).

• Match clusters and classes: verify a hypothesis that a cluster of items matches the class for those items; to discover a match, lookup the class and cluster membership of an item in order to compare them.

Our abstract classification of these task sequences fills a gap between the large body of technique-driven literature and analysts' domain problems in this area. We encourage other researchers to consider these task abstractions in the evaluation of existing work practices, in the discover phase of future design studies involving high-dimensional data and DR, in the design of controlled experiments, and in field evaluations of deployed visualization tools.

Chapter 4

Field Study: Overview: The Design, Adoption, and Analysis of a Visual Document Mining Tool For Investigative Journalists

"The Street finds its own uses for things - uses the manufacturers never imagined." — William Gibson in "Rocket Radio" (Rolling Stone, June 15, 1989)

This chapter is a slightly modified version of our paper Overview: The Design, Adoption, and Analysis of a Visual Document Mining Tool For Investigative Journalists by Matthew Brehmer, Stephen Ingram, Jonathan Stray, and Tamara Munzner; in IEEE Transactions on Visualization and Computer Graphics (Proceedings of InfoVis 2014), 20(12), p. 2271–2280 [35]. http://dx.doi.org/10.1109/TVCG.2014.2346431. Section 4.9 is a new addendum section that is unique to this dissertation. High-resolution versions of the figures in this chapter are available here: http://cs.ubc.ca/labs/imager/tr/2014/Overview/.

For an investigative journalist, a large collection of documents obtained from a Freedom of Information Act (FOIA) request or a leak is both a blessing and a curse: such material may contain multiple newsworthy stories, but it can be difficult and time consuming to find relevant documents. Standard text search is useful, but even if the search target is known it may not be possible to formulate an effective keyword search term. In addition, summarization is an important non-search action. We present Overview (italicized throughout this chapter to distinguish it from "overview", an overloaded term in the visualization literature), an application for the systematic analysis of large document
collections based on document clustering, visualization, and tagging. This work contributes to the small set of studies which evaluate a visualization tool "in the wild", and we report on six case studies where Overview was voluntarily used by self-initiated journalists to produce published stories. We find that the frequently-used language of "exploring" a document collection is both too vague and too narrow to capture how journalists actually used our application. Our iterative process, including multiple rounds of deployment and observations of real world usage, led to a much more specific classification of tasks. We analyze and justify the visual encoding and interaction design choices used in Overview's design with respect to our final task abstractions, and propose transferable lessons for visualization design methodology.

4.1 Motivation

FOIA requests, leaks, government transparency initiatives, or other disclosures can result in thousands or millions of pages of potentially newsworthy material. Investigative journalists must find the stories lurking in these massive document collections, but it is frequently impossible to read every document. Standard text search can be used to locate documents containing particular terms, but not all information retrieval problems can be expressed as word search queries, especially if the relevant information is unexpected or novel. Journalists may also be interested in patterns of text across many documents, which can reveal significant trends, categories, or themes. We conjectured that this document mining problem could be solved by a visualization tool built around clustering and tagging documents. The path from this hypothesis to a tool that working journalists would voluntarily use was a long one; we needed to refine both our understanding of the problem and the ways in which journalists might want to solve it.

This chapter reports on the design, adoption, and analysis of Overview (https://www.overviewdocs.com/), an application developed by co-author Jonathan Stray in collaboration with our research group over several years. Overview, shown in Figure 4.1, visualizes a document collection as a tree where nodes represent clusters of similar documents; a person can navigate this tree, identify clusters, read individual documents, and annotate documents with meaningful tags.

Figure 4.1: Overview is a multiple-view application intended for the systematic search, summarization, annotation, and reading of a large collection of text documents, hierarchically clustered based on content similarity and visualized as a tree (left). Pictured: a collection of White House email messages concerning drilling in the Gulf of Mexico prior to the 2010 Deepwater Horizon oil spill.

A timeline illustrating Overview's development, deployment, and adoption phases is shown in Figure 4.2. Beginning with an initial use case, we developed a research prototype (v1), a publicly available cross-platform desktop application (v2), and finally a web-based application (v3-v4). Ultimately, we succeeded in building a useful tool for journalists: we report on multiple case studies where Overview was adopted for real investigations.
Analysis of these cases revealed that journalists often used the application in ways we did not anticipate, and we found that the often-used concept of "exploring" a document collection fails to capture the tasks that journalists actually perform.

Figure 4.2: A timeline of Overview's development, deployment, and adoption phases: deployments are represented as yellow squares; deployment-phase case studies are represented as purple circles, while adoption-phase case studies are represented as turquoise circles. The dotted red lines indicate which version of Overview was used in each case study. (Timeline entries: Dec. 2010: WARLOGS motivating use case; Jun. 2011: Overview Project receives Knight Foundation funding; Nov. 2011: v1 (prototype) deployed, CARACAS pilot case study (v1); Feb. 2012: v2 (desktop) deployed, CS1: IRAQ-SEC deployment case study (v2); Jun. 2012: CS2: TULSA adoption case study (v2); Sept. 2012: v3 (web-alpha) deployed; Oct. 2012: CS3: RYAN adoption case study (v2); Dec. 2012: CS4: GUNS adoption case study (v3); Jun. 2013: v4 (web-beta) deployed; Aug. 2013: CS5: DALLAS adoption case study (v4); Dec. 2013: CS6: NEWYORK adoption case study (v4); 2014: v3 think-aloud evaluation.)

We frame this work as a visualization field study, one that took place during and after a process of iterative design addressing a particular domain problem, involving collaborators and people from that domain. The contributions of this chapter include our classification of data and task abstractions, a description of its usage in real investigations spanning four deployments and six case studies, and a detailed analysis of the mapping from these abstractions to visual encoding and interaction design choices. This analysis led to important design revisions, based on a better understanding of why and how journalists use Overview. From this experience we propose transferable lessons for visualization design methodology.

4.2 Related Work

There have been a number of approaches and tools to support the analysis of document collections, spanning a range of data transformations and visual encodings. We also review how these tools were evaluated.

Topic model visualization: One common approach to visualizing a document collection uses probabilistic topic models inferred from the collection. These define topics as distributions of words and assign a distribution of topics per document. Both distributions are visualized directly in recent work by Chaney and Blei [52], while other tools or techniques focus on the number of documents in each topic [70, 80, 191], or use the topic assignments
to compute the similarity between documents [57, 86]. Overview does not use distribution-based topic models but directly creates a hard hierarchical clustering, which is visually encoded as a tree.

Documents as points: Many visualization tools, including the first two versions of Overview, encode individual documents as points in a scatterplot. InfoSky [118] places points according to a pre-existing hierarchical arrangement of documents; in contrast, Overview is intended for document collections that do not have a pre-existing hierarchical structure. Other approaches begin with an unstructured document collection and place points based on document similarity metrics and DR techniques, such as Leaksplorer [38], PEx [232], and EV [57]. Overview v1-v2 included a similar scatterplot which placed points by DR through MDS. Finally, ForceSPIRE [91] and TopicViz [86] incorporate a scatterplot where points corresponding to documents can be interactively placed according to one's own semantics or mental model, adaptively adjusting the underlying similarity metric used between document pairs. In Section 4.6.2, we discuss in greater detail why a scatterplot was omitted from later versions of Overview, and how tagging documents and clusters is an effective alternative to interactive placement.

Documents as landscapes or clouds: Document collections have also been encoded as landscapes, three-dimensional visual encodings of two-dimensional scatterplots where height represents density, as in In-Spire [135] and recent work by Österling et al. [231]. However, empirical studies have shown that spatial landscapes are not well suited for encoding inherently non-spatial data, and exhibit poor visual memory performance in comparison to two-dimensional scatterplots [324].

It is also possible to visualize a document collection by encoding clusters of documents as interactive tag clouds, as in Newdle [192]. Once again, previous research has documented the perceptual drawbacks of tag clouds [128]. By encoding a document collection as a tree, Overview circumvents these issues.

Documents as networks of entities: Jigsaw's approach [115, 165] to document collection analysis differs from Overview in that it emphasizes the
extraction of entities from documents, linking names, places, events, and dates, visualizing these relationships. The emphasis on entities is reflective of the domains in which Jigsaw is used, which include intelligence analysis, law enforcement, and academic research [165]. Journalists frequently start with barely-legible scanned documents which must first be converted to text through optical character recognition (OCR), greatly reducing the accuracy of standard entity extraction techniques. As a flexible multiple-view application, Jigsaw also has a significant learning curve, and people have reported investing many months into learning how to use it [165]. The journalists we spoke to are accustomed to short deadlines and may only intermittently be working on a story involving a large document collection, so simplicity is a crucial requirement.

Documents as trees and rivers: Like Overview, HierarchicalTopics [80] features a tree of document clusters, initially arranged by similar keywords. It allows people to re-arrange the tree according to their own semantics, similar to how a person who uses ForceSPIRE can rearrange documents in a scatterplot [91]. HierarchicalTopics [80] additionally allows people to track topic prevalence over time with a stacked area graph visual encoding in the style of a ThemeRiver [127]. However, this approach requires temporal metadata that would be difficult to extract from the diverse document sources supported by Overview.

Evaluating visual document mining tools: Several of the aforementioned tools have been evaluated via controlled experiments and case studies. Controlled experiments, such as those used to evaluate Newdle [192] or HierarchicalTopics [80], often involve non-specialists conducting domain-agnostic tasks specified by the researchers, who conjecture that they match with real world usage. Moreover, the documents used in these controlled experiments were collections of online news articles which are not appropriate test data for Overview, as professionally produced news articles are clean and homogeneous, unlike the diverse and messy documents obtained by our case study journalists, which often contain little or no metadata; news articles are the output of the journalistic document mining process, not the input.

Most similar to our approach is a series of case studies of academic researchers, intelligence analysts, and law enforcement personnel who had adopted Jigsaw [165]. These case studies resulted in a better understanding of Jigsaw's utility in relation to the tasks of people working in a specific application domain; like us, they identified similar barriers to adoption and their results suggested new directions for design [115, 116].

4.3 Initial Use Case

The Overview project began in December 2010, when journalist and co-author Jonathan Stray visualized a subset (11,616 of 391,832) of the WikiLeaks Iraq War Logs [305]. Journalists had previously examined these documents by using text search to retrieve specific records and by visualizing the structured data fields such as time and location, but they had not attempted an analysis of the unstructured text of the reports. In this initial use case, which we will refer to as warlogs, documents were visualized as points placed according to a measure of similarity between documents and coloured according to pre-existing categorical labels, such as "friendly action" and "criminal incident." As shown in Figure 4.3, this design revealed meaningful cluster structure that cross-cuts the colourings, showing that the pre-existing coarse categorization does not capture the whole story (the entire image is shown in Section C.1).

The warlogs scatterplot had serious limitations: it was not possible to interactively and systematically examine the contents of clusters of documents. However, it demonstrated that visual cluster analysis could illuminate previously unknown and meaningful structure in a real world document collection, a conjecture that Stray had synthesized from his previous experience reporting on this collection of documents. On the basis of this promising result, Stray collaborated with us to design an interactive visualization tool for document mining.

Figure 4.3: Detail from "A full-text visualization of the Iraq War Logs" (warlogs) [305], in which distinct clusters of documents are visible; these documents pertain to "criminal incidents" during the Iraqi civil war involving abductions and blindfolding.

4.4 Design of Overview

We now describe our initial task abstraction, Overview's underlying data abstractions, and the elements of its interface.

Initial task abstraction: During the development of Overview v1-v2, our task abstraction was based on the warlogs use case: journalists would be motivated by the hypothesis that their document collection contained a semantically interesting cluster structure, and would require a means for exploring that structure, drilling down into these clusters to examine the contained documents. During this exploration, they would need a way to keep track of what they had discovered, allowing them to revisit previously examined clusters and documents.

Data abstractions: Although Overview's design has evolved over the course of four deployed versions, it continues to reflect several underlying data abstractions.
Overview does not incorporate any novel text analysis techniques; following a practice common in that domain, we convert each document to a vector of words weighted by the term frequency-inverse document frequency (TF-IDF) formula, and compute similarity between documents using the cosine distance metric [275]. We generate our document clusters by hierarchically clustering these distances and encoding the result as a tree [146, 151]. Clusters are labeled with keywords extracted via TF-IDF scores.

Figure 4.4: Overview v2, a desktop application released in Winter 2012. Shown here are 6,849 of the U.S. State Department diplomatic cables released by WikiLeaks, those pertaining to Venezuela. The "Oil industry" tag is selected; clusters containing documents having this tag are emphasized in pink in the Topic Tree and are shown in the Cluster List as a set of keywords. Individual documents having the "Oil industry" tag are emphasized in the scatterplot and shown in the Document List as a set of keywords. The fifth document is selected; its contents are displayed in the Document Viewer and it is marked as a larger black dot in the scatterplot.

Multiple meaningful clusterings may exist for any collection of documents [121]; our particular distance metric and hierarchical clustering algorithm is but one possible choice. Human-generated clusterings that leverage domain knowledge can complement automatic clusterings [80, 91]. For these reasons, Overview allows for an arbitrary number of human-generated tags on each document, which can be assigned to individual documents or at the cluster level. Tags allow people to keep track of what they have found and where they have looked so far.
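To make this data abstraction concrete, the short Python sketch below assembles a comparable pipeline from off-the-shelf libraries: TF-IDF weighting, pairwise cosine distances, agglomerative clustering encoded as a tree, and TF-IDF keyword labels for clusters. It is a minimal illustration of the general approach under our own assumptions, not Overview's actual implementation; the choice of scikit-learn and SciPy, the sample documents, and all identifiers are invented for this example.

# Minimal sketch of a TF-IDF / cosine-distance / hierarchical-clustering
# pipeline; illustrative only, not Overview's actual implementation.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

documents = [
    "email about offshore drilling permits in the gulf",
    "memo on drilling safety inspections and permits",
    "press release about an unrelated education initiative",
]

# 1. Weight each document as a TF-IDF vector (documents x terms).
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(documents).toarray()

# 2. Pairwise cosine distances between document vectors (condensed form).
distances = pdist(tfidf, metric="cosine")

# 3. Agglomerative clustering, encoded as a tree (linkage matrix / dendrogram).
tree = linkage(distances, method="average")

# 4. Cut the tree into clusters and label each with its top TF-IDF terms.
labels = fcluster(tree, t=2, criterion="maxclust")
terms = np.array(vectorizer.get_feature_names_out())
for cluster_id in np.unique(labels):
    scores = tfidf[labels == cluster_id].sum(axis=0)
    keywords = terms[np.argsort(scores)[::-1][:3]]
    print(f"cluster {cluster_id}: {', '.join(keywords)}")

In this sketch the tree is cut into a fixed number of flat clusters for brevity; Overview instead presents the clustering hierarchy itself, which is what its Topic Tree visually encodes.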
Figure 4.5: Overview v4, a web-based application released in Summer 2013. Shown here are 625 White House email messages concerning drilling in the Gulf of Mexico prior to the 2010 Deepwater Horizon oil spill. The "Obama letter" tag is selected; clusters containing documents having this tag are highlighted in green in the Topic Tree. One of these clusters is selected and its keywords are displayed in a tooltip; the 66 documents in this cluster are listed in the Document List. Selecting a document from this list reveals the Document Viewer (cf. Figure 4.1).

Interface: With each deployment came changes to the interface, though we will focus on the differences between Overview v2 and v4, shown in Figure 4.4 and 4.5, respectively (a video demonstration of Overview v4 is available here: http://vimeo.com/71483614). The visualization designs of v1 and v2 are quite similar to each other, as are those of v3 and v4 (screenshots of v1 and v3 can be found in Appendix C as Figure C.2 and Figure C.6, respectively).

Common to all deployed versions of Overview is the Topic Tree visual encoding, representing a hierarchical clustering of similar documents, the Document List, showing currently selected documents, the Document Viewer, and the ability to create and assign custom categorical tags to clusters or individual documents; tags are encoded as coloured labels on documents and clusters. Selections of documents are propagated and highlighted across views.

The Topic Tree underwent some of the most significant changes. It was redesigned to emphasize nodes, and to encode the number of documents in each node, instead of focusing on the edges between identically-sized nodes. In v1-v2, the Topic Tree could be pruned based on a threshold cluster size, controlled using a set of coloured radio buttons below; in v3, we replaced threshold pruning with an open/close interface that allows a person to show or hide the children of any node. Pan and zoom controls were also added, including an auto-zoom feature that automatically zooms and pans to a selected node.

Another prominent change was the removal of the interactive scatterplot, in which individual documents were encoded by points and their placement corresponded to a two-dimensional projection of the original high-dimensional TF-IDF vector space, generated via MDS; pairs of documents appearing closer together were deemed to be more similar than pairs of documents that were farther apart. The scatterplot had panning and zooming controls, and document-points could be selected via clicking or lassoing.

We also removed the Cluster List and consolidated the Document Viewer with the Document List (cf. Figure 4.1). The Document List now displays the document title, extracted keywords, and coloured labels indicating which tags have been applied to each document. We added full-text keyword search in v4; documents matching a search term are highlighted with colour labels in the Topic Tree, and these results can be saved as a persistent tag. Finally, we added a "Show Untagged" button in v4, which highlights documents and clusters where no tags have been applied, a crucial feature for the (initially unexpected) task of exhaustively reviewing a document collection.

This section described the design without providing any rationale for its evolution. Our decisions were based on observations of real world usage; we provide concrete examples of why and how Overview was used by journalists in Section 4.5. Then, in Section 4.6, we present our final task abstraction, the outcome of analyzing these observations, and justify our design choices with respect to these revisited tasks.

4.5 Observations of Real World Usage

We conducted six case studies where we analyzed the use of Overview by investigative journalists. We distinguish between a case study and a usage scenario [284], in which the former involves a person from the target application domain who uses a tool to examine their own data, having goals related to their ongoing work; in contrast, the latter reports usage of a tool by its designers with curated data and conjectured tasks.

Pilot case study: The first person who used Overview was the Associated Press Caracas bureau chief, whom we asked in November 2011 to use the v1 prototype to examine 6,849 of the 251,287 U.S. State Department diplomatic cables released by WikiLeaks, those pertaining to Venezuela; this document collection is featured in Figure 4.4. Although he found the tool to be interesting, his analysis did not lead to a published story.
This informal pilot case study revealed basic usability problems and the experience prompted us to formalize the case study process and determine foci of interest, such as utility, usability, learnability, and journalists' tasks in context.

Metrics: In addition to the qualitative analysis of journalists' tasks, we also focus on the metric of adoption defined as self-initiated use: did a journalist freely choose to use the tool for their own investigation, rather than trying out the tool in response to direct solicitation by the researchers? According to this distinction, adoption occurred in five of the six case studies we report, as indicated by the turquoise circles in Figure 4.2; the journalist in the remaining case study (iraq-sec) was co-author Stray. We were also interested in the outcome of a journalist's investigation: did they complete their investigation to satisfaction as a result of using Overview, either by choosing to publish a story or by deciding that their findings did not merit a story? Or did they abandon Overview because the tool did not help further their investigation?

Recruitment: Since the v2 deployment, Stray has promoted Overview within the data journalism community. Several hundred journalists have created accounts on the public server, and they have collectively uploaded more than nine million documents; Overview is used by approximately two hundred unique people each month (as of March 2014). As of April 7, 2016, we are aware of twenty published stories where Overview played a part in the investigative process (links to these stories can be found here: https://github.com/overview/overview-server/wiki/News-stories; a blog post that summarizes several of these stories is available here: https://blog.overviewdocs.com/completed-stories/), five of which are discussed as case studies below. The self-initiated journalists featured in case studies 2–6 were recruited to participate in our study after they contacted Stray with technical questions, which often pertained to workflow difficulties such as wrangling their document collection into a format that Overview could ingest.

Methods: Our case study findings are the result of triangulating between multiple data collection and analysis methods (Section C.3 provides additional detail regarding our data collection and analysis methodology). Our primary data collection method was that of a semi-structured interview (our interview protocol is provided in Section C.4). We conducted interviews via Skype or Google+ Hangout, as our journalists were geographically remote; both services include a screen sharing feature, allowing journalists to demonstrate aspects of their investigative process. We recorded these interviews and demonstrations using a screen capture application and later transcribed them. The deadline-driven nature of journalism precluded multiple interviews during an ongoing investigation, so we chose to interview each journalist after their investigation was complete, despite the known limitations of retrospective introspection [93]. Journalists were encouraged but not expected to keep a diary relating to their ongoing use of Overview. Five of our case study journalists wrote or contributed to retrospective blog posts about their process [168, 307–309, 338], and one of them (tulsa) also sent us his personal notes.

We also collected usage logs for each journalist, consisting of timestamped interactions with Overview, which included selecting, viewing, and annotating documents and clusters with tags. Log file analysis allowed us to partially reconstruct a journalist's analysis process, complementing information divulged to us in their retrospective interview. Finally, each
journalist provided us with their tagged document collection, which helped to establish a shared context.

4.5.1 Case Studies

The six case studies we present, summarized in Table 4.1, took place between February 2012 and December 2013, as indicated in Figure 4.2.

CS1: IRAQ-SEC [306]: Our first case study took place in February 2012, when journalist and co-author Stray used Overview v2 to analyze recently declassified documents from the Iraq war concerning the behavior of private security contractors. In particular, he wanted to categorize and count types of documented incidents involving these contractors; aside from the high-profile incidents that made headlines, he wanted to determine the prevalence of other incidents that these contractors were involved in during the Iraq war.

The document collection was the result of a FOIA request to the U.S. State Department, comprised of 666 incident reports over 4,500 pages, which were scanned using OCR. After the documents were loaded in Overview, Stray examined document clusters over the course of five days: he navigated the Topic Tree, selected clusters and their documents, aggregated clusters using the tree pruning controls, and annotated approximately 48% of the documents with 28 unique tags. After a lengthy "orientation" phase to determine incident categories of interest, he sampled the documents using the "Select Random" button (above the Cluster List in Figure 4.4), which would select a document from the Document List to be shown in the Document Viewer. With this approach, he read and tagged 50 of the 666 reports, which allowed him to develop hypotheses regarding the prevalence of certain incident types. Afterward, he followed up with U.S. State Department representatives, who provided additional context and a timeline for these incidents. His published story [306] combines his categorical summarization with the context of the war.

CS2: TULSA [339]: The first case of self-initiated adoption by a journalist took place in June 2012 (additional analysis of this case study is provided in Section C.3), revealing a different motivation for using Overview. In this case, the journalist wanted to locate and identify evidence, documents that would support or refute a pre-existing hypothesis: he was following-up on an anonymous tip regarding municipal government mismanagement and potential conflicts of interest between city hall, municipal police, and police equipment vendors. He filed a FOIA request with the City Hall of Tulsa, Oklahoma for email messages between these organizations, and then used Overview v2 to examine 5,996 of these email messages.

His search for corroborating evidence spanned multiple sessions over 18 days, beginning with an exhaustive and systematic left-to-right navigation of the Topic Tree, aggregating clusters using the tree pruning controls, and selecting clusters to view their contained documents. He viewed roughly 70% of the documents in the Document Viewer at least once, annotating 92% of them with 22 unique tags.
We observed that he undertook multiple iterations of tagging: he began by tagging entire clusters using terms appearing in cluster keywords, but later tagged individual documents throughout the tree with tags such as "important", "weird", and "follow-up." As a result of this thorough tagging, the journalist was able to lookup and browse previously identified clusters or documents of interest, focus on documents annotated by multiple tags, or locate documents that remained untagged; the latter was accomplished by selecting uncoloured points in the scatterplot. These tags also provided a starting point for the further annotation of 129 "important" documents with notes relating to his hypothesis; these notes eventually became integral parts of his published story [339].

CS3: RYAN [110]: In October 2012, Overview v2 was used yet again to locate evidence in support of a hypothesis, though there are several differences as compared to the tulsa case study. In this case, a journalist wanted to follow-up on an earlier story and on accusations made by Vice President Joe Biden that vice-presidential nominee Paul Ryan's campaign statements were hypocritical. In order to support or refute this hypothesis, the journalist sought to compare Ryan's campaign statements regarding wasteful government programs to his correspondence with various federal agencies concerning these same programs. After filing over 200 FOIA requests to these agencies, the journalist received 8,680 pages of correspondence. These physical documents arrived in several batches, and were scanned using OCR.

The journalist wanted to find genuine correspondence signed by Ryan; however, prevalent OCR errors prevented him from locating these documents using keyword search. Overview was able to cluster documents effectively on the remaining intact text, and most of the documents in this collection were quickly found to be irrelevant to his hypothesis. Over the course of half a day, he navigated the Topic Tree to locate and identify a small subset of clusters containing one hundred and seventy-six pages of genuine correspondence containing Ryan's signature; the remainder could be safely ignored, comprised of attachments and other irrelevant correspondence. Unlike the tulsa journalist, the ryan journalist annotated a mere 8% of the document collection with 12 unique tags. As with tulsa, the ryan journalist used tags as a starting point for the further annotation of his source documents with notes; his published story [110] compares these findings to Ryan's campaign statements.

CS4: GUNS [167]: The first documented adoption of Overview's web application deployment (v3) took place in December 2012. Shortly after the Newtown school shooting, the journalist asked Daily Beast readers to self-identify as gun owners or non-owners, to report where they lived, and to post their opinion on the debate over gun ownership on a discussion board. He collected 1,278 comments: 757 from gun owners and 521 from non-owners. He aimed to determine what the debate on gun ownership is about: do gun owners and non-owners raise the same issues? He was also curious about geographical differences.

He uploaded the responses from gun owners and non-owners into two separate instances of Overview. Like the iraq-sec case study, the guns journalist was interested in summarizing a document collection, though the form of this summarization was different.
In iraq-sec, the journalist wanted to categorize and count types of documented incidents; in contrast, the guns journalist sought to identify documents that were representative of their clusters, the sensational and polarizing speaking points from both sides of the debate over gun ownership; he was less interested in a fine-grained classification or quantification. For both sets of documents, he navigated and selected clusters and their contained documents, compared related clusters between the gun owner and non-owner instances, and later browsed previously identified clusters to identify representative quotes from people on both sides. Ultimately, he read nearly all the discussion board comments over the course of a day. Unlike the previous case studies, he did not use Overview's tagging functionality, instead opting to copy quotes into an Excel spreadsheet, where he integrated geographical metadata and iteratively arranged quotes to construct a narrative for his story [167].

CS5: DALLAS: In August 2013, a journalist used Overview v4 in a similar fashion to that of the tulsa journalist, though the outcome of their investigations differed. In the dallas case study, the journalist had recently reported on a collection of 4,653 email messages resulting from a FOIA request regarding the state government's response to an emergency incident. The journalist believed that some remaining evidence was left to be located, beyond what had already been reported in the earlier story. Despite having already read all the documents in the collection (unassisted by Overview), the journalist used Overview to verify that nothing was overlooked and sought to gather material for a follow-up story. She subsequently used Overview to examine four additional collections of messages, analyzed individually, ranging in size between 1,858 and 3,564 email messages.

The keyword search feature introduced in Overview v4 was found to be particularly useful: the journalist alternated between identifying clusters by navigating, aggregating, and selecting nodes in the Topic Tree, and locating documents via keyword search, then identifying related documents. As her analysis progressed, we observed that the journalist relied more upon keyword search to highlight clusters of interest within the Topic Tree. She applied tags to each of the five document collections: the number of tags ranged between three and seven, and between 7% and 52% of documents were annotated with at least one tag; in total, 14 out of 31 tags were created from keyword search results.

In this case, Overview was used to make the decision not to publish: after 12 hours of Overview usage spanning several weeks, the journalist was sufficiently confident that nothing significant had been overlooked in the previous investigation, ultimately deciding not to write a follow-up story. This journalist estimated that it would have taken "more than a week" to reach this conclusion without Overview, and is "definitely planning on using it again for large document sets".

CS6: NY [236]: The final case study we report took place in December 2013, in which a journalist used Overview v4 to confirm that a document collection did not contain evidence that would refute his hypothesis. In the ny case study, the journalist had gathered material to investigate the state of New York's process for handling and responding to police misconduct cases, including 1,680 proposed and passed bills retrieved from the State Senate Open Legislation application programming interface (API).
He hypothesized that the state legislature had failed to pass any bills addressing this misconduct by increasing oversight.

A considerable amount of data wrangling was required before this journalist could use Overview. The State Senate API provided the bills in JavaScript object notation (JSON) format; to address this, the journalist wrote a script to import these documents into a database, which was in turn used to export a comma-separated values (CSV) file that Overview could ingest.

Following data ingestion, the journalist used Overview for about four hours over the course of three days to read all the document titles and keywords in a systematic fashion: starting with the smaller nodes, he would select a node in the Topic Tree and scan the document titles and keywords appearing in the Document List; the titles tended to be verbose and descriptive, and any that were deemed interesting were read in the Document Viewer or tagged as "review". He eventually examined the largest node, which contained 732 documents with similar titles and keywords, their contents mostly comprised of boilerplate text; the journalist tagged the entire node as "no unless", meaning that any document contained by the node was not significant unless there was another tag on it. He later returned to documents tagged with "review", replacing this tag with one of five descriptive tags. Though the tag highlighting used in Overview's Topic Tree allowed the journalist to quickly locate tagged documents, he suggested that the tree could alternatively hide all documents not marked with a particular tag, such as his "not of interest" tag.

His approach was similar to tulsa and dallas, in that they all sought to locate and identify clusters containing potential evidence. However, the tulsa and dallas journalists could have stopped their search once this evidence was found, as it is unlikely that any additional evidence would invalidate their previous findings. In contrast, the ny journalist sought to prove the non-existence of evidence, which required review of every document, as any evidence that went overlooked would have invalidated a claim of non-existence.

As a result of his analysis, the journalist was confident that no bills had been passed to address police misconduct, though several relevant bills had been proposed multiple times; conveniently, multiple versions of proposed bills were clustered together in Overview's Topic Tree. While this finding is reported in only a single paragraph of his published story [236], it played a key role in his argument that the state of New York is facing a police oversight problem; this story received considerable acclaim from the journalism community and was a finalist for the 2014 Pulitzer Prize (http://www.pulitzer.org/finalists/5325).

4.5.2 Think-Aloud Evaluation

To complement our case study observations, we also solicited feedback from other journalists. After the deployment of the web-based Overview v3, which included usage tracking, we observed that Overview and its individual features were not being used to the extent that we had hoped. We suspected usability problems so we embarked on a discount usability testing program inspired by the work of Nielsen [224]: five naïve journalists were independently presented with an example document collection, such as the collection

Table 4.1: Case Study: 1: iraq-sec [306], 2: tulsa [339], 3: ryan [110], 4: