Visual Text Analytics for Online Conversations

by

Md Enamul Hoque Prince

B.Sc., Chittagong University of Engineering & Technology, 2007
M.Sc., Memorial University of Newfoundland, 2012

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Doctor of Philosophy

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Computer Science)

The University of British Columbia
(Vancouver)

May 2017

© Md Enamul Hoque Prince, 2017

Abstract

With the proliferation of Web-based social media, asynchronous conversations have become very common for supporting online communication and collaboration. Yet the increasing volume and complexity of conversational data often make it very difficult to get insights about the discussions. This dissertation posits that by integrating natural language processing and information visualization techniques in a synergistic way, we can better support the user's task of exploring and analyzing conversations. Unlike most previous systems, which do not consider the specific characteristics of online conversations, we applied design study methodologies from the visualization literature to uncover the data and task abstractions that guided the development of a novel set of visual text analytics systems.

The first of these systems is ConVis, which supports users in exploring an asynchronous conversation, such as a blog. ConVis offers a visual overview of a conversation by presenting its topics, authors, and thread structure, along with interaction techniques such as brushing and linked highlighting. Broadening from a single conversation to a collection of conversations, MultiConVis combines a novel hierarchical topic modeling technique with multi-scale exploration techniques. A series of user studies revealed significant improvements in user performance and subjective measures when these two systems were compared to traditional blog interfaces.

Based on the lessons learned from these studies, this dissertation introduces an interactive topic modeling framework specifically for asynchronous conversations. The resulting systems empower the user to revise the underlying topic models through an intuitive set of interactive features when the current models are noisy and/or insufficient to support her information seeking tasks. Two summative studies suggested that these systems outperformed their counterparts that do not support interactive topic modeling along several subjective and objective measures.

Finally, to demonstrate the generality and applicability of our approach, we tailored our previous systems to support information seeking in community question answering forums. The prototype was evaluated through a large-scale Web-based study, which suggests that our approach can be adapted to a specific conversational genre among a diverse range of users.

The dissertation concludes with a critical reflection on our approach and considerations for future research.

Lay Summary

Since the rise of social media, an ever-increasing amount of conversations are generated. Often many people contribute to a discussion, which can become very long with hundreds of comments, making it difficult for users to get insights about the discussion. This dissertation integrates language processing and visualization techniques to support the user's task of exploring and analyzing conversations. Language processing mines topics and opinions from the conversations, while visualization techniques provide visual overviews of the mined data and support user exploration and analysis.
User studies revealed significant improvements when our systems were compared to traditional blog interfaces. This dissertation also introduces a new human-in-the-loop algorithm that helps the user revise the results of topic modeling. Two user studies show that these systems outperform their non-interactive counterparts. Finally, we tailored our previous systems to support information seeking in community question answering forums. The prototype was successfully evaluated through a large-scale user study.

Preface

Parts of this thesis are based on prior peer-reviewed publications by me (under the name Enamul Hoque) with various coauthors.

Chapter 2 is based on the article "ConVis: A visual text analytic system for exploring blog conversations," by Enamul Hoque and Giuseppe Carenini, in Computer Graphics Forum (Proceedings of EuroVis), 33(3):221-230, 2014 [59]. My main contributions include: 1) user requirements analysis; 2) iterative design and development of prototypes; 3) designing and conducting the user study; 4) preparing the manuscript. Giuseppe Carenini advised me throughout the project and contributed to the editing process.

Portions of Chapter 2 also appeared (along with portions of Chapter 1) in summarized form in "Interactive exploration of asynchronous conversations: Applying a user-centered approach to design a visual text analytic system," by Enamul Hoque, Giuseppe Carenini, and Shafiq Joty, in Proceedings of the Workshop on Interactive Language Learning, Visualization, and Interfaces (ILLVI), 2014 [62].

A version of Chapter 3 has been published as "MultiConVis: A visual text analytics system for exploring a collection of online conversations," by Enamul Hoque and Giuseppe Carenini, in Proceedings of the ACM International Conference on Intelligent User Interfaces (IUI), pp. 96-107, 2016 [60]. My main contributions include: 1) user requirements analysis; 2) iterative design and development of prototypes; 3) designing and conducting the user study; 4) preparing the manuscript. My co-author Giuseppe Carenini played a supervisory role and contributed to the editing process.

Portions of Chapter 4 were published in "ConVisIT: Interactive topic modeling for exploring asynchronous online conversations," in Proceedings of the ACM International Conference on Intelligent User Interfaces (IUI), pp. 169-180, 2015. An extended version of this paper has also appeared as a journal paper: "Interactive topic modeling for exploring asynchronous online conversations: Design and evaluation of ConVisIT," by Enamul Hoque and Giuseppe Carenini, ACM Transactions on Interactive Intelligent Systems (TiiS), 6(1):7:1-7:24, Feb. 2016 [61]. My main contributions include: 1) proposing and implementing the interactive topic modeling approach; 2) iterative design and development of prototypes; 3) designing and conducting the user study; 4) preparing the manuscript. Giuseppe Carenini played a supervisory role during the design and evaluation of the system. He also contributed to the editing process.

Portions of Chapter 5 were published in "CQAVis: Visual text analytics for community question answering," by Enamul Hoque, Shafiq Joty, Lluís Màrquez, and Giuseppe Carenini, in Proceedings of the ACM International Conference on Intelligent User Interfaces (IUI), 2017 [63]. Most of this research was done during my internship at the Qatar Computing Research Institute between January 2016 and May 2016.
My main contributions include: 1) user requirements analysis; 2) iterative design and development of prototypes; 3) designing and conducting the user study. My co-authors provided feedback during the design and evaluation of the system. Shafiq Joty and Lluís Màrquez helped me in establishing the collaboration with Qatar Living administrators, advertising the user study, and deploying the tool online. I performed the majority of the writing for the paper, while all of my collaborators contributed to the editing process.

All the user studies described in Chapters 2, 3, and 4 were conducted with the approval of the UBC Behavioural Research Ethics Board (BREB): certificate H13-02132.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgments
1 Introduction
  1.1 The Problem
  1.2 Approach
  1.3 Scope
  1.4 Thesis Contributions
    1.4.1 Exploring Conversations with Static Model
    1.4.2 Exploring Conversations with Human-in-the-loop Model
    1.4.3 Applying and Tailoring the Solutions to Specific Domains
2 Supporting Users in Exploring a Single Conversation
  2.1 Introduction
  2.2 From User Requirements to Design Principles
    2.2.1 Why and How People Read Blogs?
    2.2.2 Data and Tasks Abstraction
    2.2.3 Design Principles
  2.3 Related Work
    2.3.1 Metadata-based Visualization
    2.3.2 Content-based Visualization
    2.3.3 Faceted Exploration
  2.4 Mining and Summarizing Conversations
    2.4.1 Topic Modeling
    2.4.2 Sentiment Analysis
    2.4.3 Corpora and Preprocessing
  2.5 ConVis Design and Implementation
    2.5.1 Visual Encoding
    2.5.2 User Interactions
    2.5.3 Implementation
  2.6 Informal Evaluation
    2.6.1 Procedure and Participants
    2.6.2 Results and Analysis
    2.6.3 Revisiting Task Abstraction
  2.7 Discussion
  2.8 Summary
3 Supporting Users in Exploring a Set of Conversations
  3.1 Introduction
  3.2 Related Work
    3.2.1 Metadata Visualization
    3.2.2 Topic Modeling and Visualization
    3.2.3 Opinion Visualization
  3.3 User Requirements Analysis
  3.4 System Overview
  3.5 Text Analysis
    3.5.1 Topic Hierarchy Generation
    3.5.2 Topic Modeling Over Each Conversation
    3.5.3 Sentiment
  3.6 MultiConVis
    3.6.1 Visual Encoding
    3.6.2 Multi-level Exploration
  3.7 Implementation
  3.8 Evaluation
    3.8.1 Case Studies
    3.8.2 User Study
  3.9 Discussion
    3.9.1 Summary of Findings
    3.9.2 Evaluation Methodology
  3.10 Summary
4 Interactive Topic Modeling for Exploring Online Conversations
  4.1 Introduction
  4.2 Related Work
    4.2.1 Human-in-the-loop Topic Model
    4.2.2 Interactive Topic Hierarchy Revision
  4.3 Interactive Topic Modeling System
    4.3.1 Interactive Topic Revisions of Topic Models for a Single Conversation
    4.3.2 Interactive Topic Revisions of Topic Models for a Set of Conversations
  4.4 Interactive Visualization for Topic Revision
    4.4.1 ConVisIT: Exploring a Single Conversation Using Interactive Topic Modeling
    4.4.2 MultiConVisIT: Exploring a Collection of Conversations Using Interactive Topic Modeling
  4.5 Implementation
  4.6 Evaluation
    4.6.1 Study I
    4.6.2 Study II
  4.7 Discussion
    4.7.1 Summary of Findings
    4.7.2 Limitations
  4.8 Summary
5 Tailoring Our Visual Text Analytics Solutions to a Community Question Answering Forum
  5.1 Introduction
  5.2 The Design Process
  5.3 User Requirements Analysis
    5.3.1 Domain Characterization
    5.3.2 Data and Task Abstractions
  5.4 System Overview
    5.4.1 Offline Processing
    5.4.2 Online Processing
  5.5 Text Analytics
  5.6 CQAVis Design
  5.7 Implementation
  5.8 Web-based User Study
    5.8.1 Methodology
    5.8.2 Study Setup and Procedure
    5.8.3 Pilot Study
    5.8.4 Participants
    5.8.5 Analysis of Results
  5.9 Discussion
    5.9.1 Summary of Findings
    5.9.2 Lessons Learned
  5.10 Summary
6 Reflection and Conclusion
  6.1 Reflection on the Design Approach
  6.2 Impact of Our Visual Text Analytics Systems
  6.3 Summary of Contributions
  6.4 Limitations and Future Work
  6.5 Final Remarks
Bibliography
A Supplementary Materials for Chapter 3
  A.1 Script for User Study
  A.2 Questionnaires
B Supplemental Materials for Chapter 4
  B.1 User Study 1
    B.1.1 Script for User Study
    B.1.2 Questionnaires
    B.1.3 Evaluating User-Generated Summary by Human Raters
  B.2 User Study 2
    B.2.1 Script for User Study
    B.2.2 Questionnaires
    B.2.3 Evaluating User-Generated Summary by Human Raters
C Supplemental Materials for Chapter 5
  C.1 User Study
D Participant Consent Forms

List of Tables

Table 1.1: Characterizing online conversations from different dimensions.
Table 1.2: User categorization for asynchronous conversation.
Table 1.3: A summary of resources for our systems.
Table 1.4: Summary of user studies conducted in the dissertation.
Table 2.1: A set of tasks that a user may likely have to perform while exploring a blog conversation.
Table 2.2: Set of conversational data to be visualized and their abstract types.
Table 3.1: A summary of how facet elements are abstracted for a collection of conversations vs. one conversation.
Table 4.1: Different possible topic revision operations.
Table 4.2: A set of operations for revising the topic hierarchy.
Table 4.3: Statistical analysis (Mann-Whitney's U test) on usefulness, ease-of-use, enjoyable, and findInsightfulComments measures.
Table 4.4: Statistical analysis (Mann-Whitney's U test) on summary ratings.
Table 4.5: Statistical analysis (Mann-Whitney's U test) on summary ratings.
Table 5.1: Overview of user study sessions and queries.
Table 5.2: Breakdown of query types along with examples.

List of Figures

Figure 1.1: A set of conversations returned for the query 'iPhone bending', presented as a paginated list by Macrumors.
Figure 1.2: An example of an excerpt from a single conversation.
Figure 1.3: The research presented in this thesis falls into the intersection between information visualization, natural language processing, and human-in-the-loop computation.
Figure 1.4: The design space explored in this research.
Figure 1.5: The ConVis interface.
Figure 1.6: The MultiConVis interface.
Figure 1.7: Our interactive topic modeling framework.
Figure 1.8: CQAVis is a visual interface to support information seeking tasks in a community question answering forum.
Figure 2.1: a) Reply-to relationships between the initial post and the comments; b) corresponding FQG.
Figure 2.2: A snapshot of ConVis for exploring a blog conversation.
Figure 2.3: Hovering the mouse over a topic element.
Figure 2.4: Clicking on a topic results in drawing a thick vertical outline next to each of the related comments.
Figure 2.5: An example showing: (a) the user clicked on a comment in the Thread Overview; (b) as a result, the system automatically scrolled to the actual comment in the Conversation View.
Figure 2.6: Comparison of usage patterns between two participants using the two different strategies on the conversation titled "Music Streaming to Overtake Downloads".
Figure 3.1: The MultiConVis interface, showing a subset of blog conversations returned by the query 'iPhone bending' from Macrumors.
Figure 3.2: Overview of the MultiConVis system.
Figure 3.3: Hierarchical topic model generation.
Figure 3.4: Reply-to relationships between the initial post A and the comments C1, C2, ..., C6 of a conversation.
Figure 3.5: The main visual encodings in MultiConVis.
Figure 3.6: A snapshot of MultiConVis for the 'iPhone bending' dataset.
Figure 3.7: A conversation from the 'iPhone bending' dataset, showing a stacked area chart representing how the sentiment distribution evolves over time.
Figure 3.8: As the user selects a particular conversation, the Conversation List is replaced by the ConVis interface.
Figure 3.9: The baseline interface.
Figure 3.10: Average rating of interfaces by the participants on six different measures. Longer bars indicate better rating.
Figure 3.11: Responses to statements regarding specific features of the MultiConVis interface.
Figure 4.1: Interactive topic modeling framework for exploring asynchronous conversations.
Figure 4.2: Three different user operations for topic revision.
Figure 4.3: Illustrative examples of how the topic hierarchy changes as a result of applying different operations.
Figure 4.4: An example of a split operation.
Figure 4.5: An example demonstrating the merge by join operation.
Figure 4.6: An example demonstrating the merge by absorption operation.
Figure 4.7: An example of showing fewer, more generic topics of 'iPhone 6 bend'.
Figure 4.8: An example of adding a topic as a child.
Figure 4.9: Responses to statements regarding prior experience.
Figure 4.10: Average rating of the three interfaces by the participants.
Figure 4.11: Responses to statements regarding specific features of the three interfaces under investigation.
Figure 4.12: Some interaction log statistics for interactions that are common between ConVis and ConVisIT.
Figure 4.13: Average ratings for user-generated summaries based upon two human raters.
Figure 4.14: An example of summaries of different topics created by a user during the study.
Figure 4.15: Average rating of the two interfaces by the participants.
Figure 4.16: Responses to statements regarding specific features of the MultiConVisIT interface.
Figure 4.17: Average ratings for user-generated summaries based upon two human raters.
Figure 5.1: An example of a new question, followed by a set of related thread questions and their comments.
Figure 5.2: Overview of our interactive system for supporting community question answering.
Figure 5.3: A screenshot of the interface showing the top answer and related questions for a user's question.
Figure 5.4: An example of a thread overview that splits a large number of comments into multiple rows.
Figure 5.5: When the user clicks on a rectangle in the thread overview representing a comment, the interface scrolls to that comment in the conversation view.
Figure 5.6: Average rating of interfaces by the participants on four different measures. Longer bars indicate better rating.
Figure 5.7: Interface features used by the participants.
Figure 6.1: Design stages of ConVis and ConVisIT.
Figure 6.2: A screenshot of the modified ConVis interface used in the SENSEI project.
Figure 6.3: VisOHC visually represents the comments of a conversation using a sequence of rectangles.
Figure C.1: The introduction page.
Figure C.2: The post-study questionnaire regarding the user's subjective experience.
Figure C.3: The post-study questionnaire regarding the user's background and prior Web experience.

Acknowledgments

First of all, I would like to express my deepest gratitude to my advisor, Giuseppe Carenini, for his enormous support throughout my Ph.D. study. As a great advisor, Giuseppe demonstrated to me how research should be done, provided insightful advice, and, most importantly, constantly challenged me to do better.

I would like to sincerely thank my other supervisory committee members, Tamara Munzner and Raymond Ng, for their valuable support of my research and their helpful comments and suggestions.

Many thanks go to my external examiner, Sheelagh Carpendale, for her constructive and insightful feedback on my thesis.

I thank my university examiners, Ronald Garcia and Victoria L. Lemieux, as well as my thesis defense chair, Cay Holbrook.

I thank all the current and past members of our Natural Language Processing and Intelligent User Interfaces research groups at UBC for the feedback and support they provided. I am especially indebted to Shafiq Joty for his continuous support throughout my graduate student life. I also thank the attendees of Tamara Munzner's InfoVis reading group, who provided valuable feedback and support on my research.

I thank my collaborators for providing valuable help during a design study which I conducted at the Qatar Computing Research Institute (QCRI): Lluís Màrquez, Alberto Barrón-Cedeño, Giovanni Da San Martino, Alessandro Moschitti, Preslav Nakov, and Salvatore Romeo. I also thank Qatar Living, and especially Ash Bashir, for their help in deploying our research prototype on their website for running a Web-based user study.

I would like to thank my M.Sc. supervisors, Orland Hoeber and Minglun Gong, who inspired me to pursue my doctoral degree.

Last but not least, I would like to thank my family for all their unconditional love and encouragement: my parents A K Fazlul Haque and Nargis Ashara Khanam, my brother Nazmul Huq, and my sister Fauzia Jahan. Many thanks to my beloved wife, Farzana Afroze, whose support during the final stages of this Ph.D. is so appreciated.

Chapter 1

Introduction

Since the internet revolution and the subsequent rise of social media, an ever-increasing amount of human conversations are generated in many different modalities [17]. While email remains a fundamental way of communicating for most people, other conversational modalities such as blogs and microblogs have quickly become widely popular. These conversations are primarily asynchronous in nature: participants communicate with each other at different times.

People engage in asynchronous conversations to exchange ideas, ask questions, and comment on daily life events. Often many people contribute to a discussion, which can quickly become very long with hundreds of comments. The net result of this phenomenon is that an enormous and growing volume of conversational data is generated every day. Recent statistics from Alexa's Internet traffic rating service reveal that the top three blogging and microblogging sites, Wordpress, Twitter, and Tumblr, are among the top 50 most visited sites in the world [4]. In the Wordpress blogging platform alone, users produce about 80.7 million new posts and 44.5 million new comments each month, and over 409 million people view more than 24.2 billion pages in the same period [7].
Conversations in social network platforms also continue to rise at an accelerating pace. A recent study from the Pew Research Center shows that 79% of American internet users use Facebook, with roughly three-quarters (76%) of these Facebook users reporting that they visit the site daily [6].

These collections of online conversations can provide valuable insights in many domains, including but not limited to marketing intelligence, business analytics, customer relationship management, journalism, and healthcare analytics. For instance, business analysts may want to analyze text conversations in social media to uncover consumer sentiment and insights about their products or company, and to draw conclusions about commercial strategies [93]. In online news media, such as the New York Times, editors want to know which of their contributions generate the most comments from their readers, in order to make strategic decisions about how to balance the content of their online news. They may also want to assess the quality of comments, both to remove the low-quality ones and to identify high-quality contributions that set community standards [32]. As another example, administrators of online health communities are interested in continuously monitoring the forum in order to foster lively discussions, while at the same time they need to prevent the propagation of misinformation and abusive comments [76]. Finally, a casual reader may want to skim through a blog to find out the community response to a particular topic and to decide whether and how she should contribute to the discussion.

While the abundance of conversational data opens up a great opportunity for important discoveries in a variety of domains, exploring and analyzing such large amounts of data has become a challenging problem in both personal and professional contexts. This problem is commonly known as information overload: users feel overwhelmed by the vast amount of potentially relevant information [14]. To address this problem, this dissertation takes a visual text analytics approach, combining natural language processing methods for understanding and summarizing discussions with information visualization techniques that present an overview of the conversational data to users.

In the remainder of this introduction, we first discuss some key challenges arising from the volume and complexity of conversational data, and the shortcomings of existing approaches in dealing with such challenges (Section 1.1). Then, we outline the research methodology used to tackle these challenges (Section 1.2), followed by the scope of this work (Section 1.3). Finally, we summarize our major contributions along with an outline of the dissertation (Section 1.4).

1.1 The Problem

An asynchronous conversation such as a blog may start with a news article or an editorial opinion, and later may generate a long and complex thread as comments are added by the participants [17]. When a reader wants to explore such a large conversation, traditional social media sites provide very limited support. They simply present the original post and subsequent replies as a paginated indented list; thus the reader needs to go through a long list of comments sequentially until her information needs are fulfilled.
Going through such an overwhelming amount of textual data in this way often leads to information overload: the user finds it very difficult to get insights about the ongoing or past discussions [69]. The problem becomes even more serious when the user is interested in analyzing multiple conversations that discuss similar issues.

To illustrate the problem, let us consider the issue of 'iPhone bending' that went viral on social media when the iPhone 6 was launched in September 2014. Soon after the product was released, some people claimed that the new phone could easily bend in the pocket while sitting on it. This incident triggered a huge amount of discussion on Macrumors [1], a blog site that regularly publishes Apple-related news and allows participants to make comments. Within a few days, more than a dozen conversations with thousands of comments were generated on Macrumors, covering various related issues, such as what users reported about the bending issue, what Apple said to defend its new product, and what the reactions were from Apple's rivals. In this situation, we can imagine at least three different users who would like to explore this set of conversations. First, a potential customer who intends to buy an iPhone may want to explore these conversations to verify whether the bending issue is really serious. Second, a journalist may want to publish a story about what people are saying about the bending issue. Finally, an Apple marketing analyst may want to get a pulse from the online community to make an informed decision about how to react to the rumors and possibly redesign the product.

Figure 1.1: A set of conversations returned for the query 'iPhone bending', presented as a paginated list by Macrumors (accessed in March 2015).

In all three cases, given the large number of conversations and comments, it would be extremely difficult and time-consuming for a user to explore and analyze all this information with current blog interfaces. This is primarily due to the fact that a typical blog site presents both the list of conversations and the comments as paginated lists and only provides sequential access to them (see Figure 1.1 and Figure 1.2). It neither provides any high-level overview of the conversations nor offers sufficient navigational cues. As a result, users often become overwhelmed by the large amount of conversational data and leave the discussions without fulfilling their information needs [69].

Figure 1.2: An example of an excerpt from a single conversation, which consists of an initial post followed by a set of comments (accessed in March 2015).

While both the Natural Language Processing (NLP) and the Information Visualization (InfoVis) communities have individually attempted to address this and similar problems, little effort has been devoted to integrating NLP and InfoVis techniques in a synergistic way. In general, previous work at the intersection of InfoVis and NLP has developed relatively simple and generic approaches to visual text analytics.

A common visualization technique to help users analyze individual documents is to visually encode the frequency of keywords using font size, for example by applying a tag cloud metaphor [113]. Another way to visually represent a document is to split it into blocks of text and then use color to indicate some key information within each block [54, 73].
For instance, TileBars [54] presents the documents retrieved from search queries as bars, whose widths are relative to the lengths of the documents and whose heights are relative to the number of query terms; the content of each document is divided into blocks, and the color within each block represents the frequency of query terms. Similarly, Oelke and Keim extracted features such as vocabulary richness or sentence length for each text block and represented them using color at different levels of granularity, ranging from chapters to sentences to words [73]. However, since these visualizations do not reveal semantic relationships among terms, another body of work attempted to capture such relationships using tree representations, such as Word Tree [126], Double Tree [29], and DocuBurst [25]; additionally, these systems mapped term frequency to font size.

In the research presented in this thesis, we have used some of the common metaphors from the above works, such as colored bars to encode some features of the text (e.g., sentiment) and mapping the frequency of discussion topics to font size. However, the above works were devised for generic documents; in contrast, we have considered additional data specific to conversations, including the reply-relationships between comments and the relationships between comments and authors, which introduces challenges from the visualization perspective that were not addressed in those works.

Previous research that specifically focused on visualizing asynchronous conversations also has important limitations. Most of these works did not derive their visual encodings and interaction techniques from task and data abstractions based on a detailed analysis of specific user needs and requirements in the target domains. Instead, they either visualize only metadata, such as the reply-relationships between comments, which do not reveal any content information (e.g., [101, 123, 125]), or visualize the results of simple, often inaccurate text analysis techniques that are not adequate to support the user (e.g., [109, 124]). Furthermore, these text analysis methods are not designed to exploit the specific characteristics of asynchronous conversations, such as reply-relationships and the use of quotation, despite recent evidence suggesting that NLP methods such as topic modeling [70] are more accurate when these specific characteristics are taken into account.

In short, most previous works did not integrate text analysis and information visualization based on the specific characteristics of online conversations and of their users. This dissertation aims to address this shortcoming of existing approaches.

1.2 Approach

The primary goal of this thesis is to develop a comprehensive understanding of how a combination of text analysis and interactive visualization can support users in exploring online conversations. The hypothesis is that by tightly integrating NLP and InfoVis techniques, we can better support the user's task of exploring and analyzing conversations. But how can NLP and InfoVis techniques be effectively integrated? More specifically, I pose the following research questions:

1. What tasks do users want to perform, and what metadata and text analysis results are actually useful to support these tasks?

2. How can useful metadata and content be extracted from the conversation?

3. How should the extracted metadata and content be visualized for the user?

4. How can we support the user when (she realizes that) the current text analysis results are not helping her anymore?
5. When we compare our proposed approach for exploring and analyzing conversations with traditional interfaces, is there any difference in user performance and subjective measures?

6. What specific aspects of the proposed approach are more or less beneficial for the potential users?

My research falls at the intersection of three main research areas that are associated with these research questions: information visualization (Q1, Q3, Q5, Q6), natural language processing (Q2, Q4), and human-in-the-loop computation (Q4). The overlap between these three areas defines the scope of my doctoral research, i.e., designing and testing visual analytics systems for asynchronous conversations (see Figure 1.3).

Figure 1.3: The research presented in this thesis falls into the intersection between information visualization, natural language processing, and human-in-the-loop computation.

The distinct role played by each area in my research is as follows:

- Why InfoVis? To address Q1 and Q3, I focus on applying human-centered design methodologies from the InfoVis literature [91, 111]. Starting from an analysis of user behaviours and needs in the target conversational domain, such methods help uncover useful task and data abstractions. On the one hand, task and data abstractions can characterize the type of information that needs to be extracted from the conversation (Q1); on the other hand, they can inform the design of the visual encodings and interaction techniques (Q3). More tellingly, as both the NLP and the InfoVis components of the resulting system are designed by referring to a common set of task and data abstractions, they are more likely to be consistent and synergistic. Finally, in order to answer Q5 and Q6, I focus on applying different techniques for user evaluation established in the InfoVis literature, such as informal evaluations, controlled studies, case studies, and online user studies [78].
In such situations, I aim to support the userin providing feedback to the underlying NLP system, so that the results can bettermatch her information needs.In essence, my approach to designing visual text analytics systems consists ofapply design study methodology in InfoVis to uncover data and task abstractions;apply NLP methods for extracting the identified data to support the correspond-ing tasks; and incorporate human feedback in the text analysis process when theextracted data is noisy or may not match the user’s mental model and current tasks.1.3 ScopeIn the initial, exploratory phase of my research, I focused on understanding andcharacterizing the broad range of domains, users, and data for asynchronous con-versations with the aim of better defining the scope for the thesis. Here, I providean overview of the different types of conversations and users, followed by the de-sign scope of this thesis.Types of online conversations: Online conversations can be characterizedfrom at least three major perspectives, as shown in Table 1.1. First of all, thephenomenal adoption of novel Web-based social media has led to the rise of asyn-chronous conversations in many different modalities, ranging from blogs, to mi-9Dimension ExamplesNature of initial post Article, question, opinion, pro-posal, reviewGenre: The subject of discussionsPolitics, business, technology,education, art, lifestyle, enter-tainment, health, sportsConversational modality: It refers to “ameans or mode of communication, where aparticular modality may be associated withboth distinct communication technologies aswell as distinct social conventions and lan-guage characteristics” [17].Blogs, conversations in socialnetworks, microblogsTable 1.1: Characterizing online conversations from different dimensions.croblogs, to discussions in social networks. Social news blog sites1, such as Red-dit, Slashdot, and Digg contain user-generated stories that are ranked based onpopularity 2. Users can comment on these posts and these comments may also beranked. Online news sites such as New York Times 3 allow readers to contribute bycommenting on articles on a broad range of topics. Moreover, for many users, mi-croblogs such as Twitter and Tumblr and social networking sites such as Facebookand Google Plus have become part of their online life.Second, an online conversation can be characterized based on the content of itsinitial post, which can be an article, question, opinion, or review. Some websitesmay focus on a specific type of initial post. For instance, Quora, a communityquestion answering forum, allows people to start a conversation by asking a ques-tion.Third, online conversations can be categorized based on their genre. Somewebsites may focus on a broad range of subjects, while others may focus on aparticular genre. For example, Huffington Post and Daily Kos blogs are dedicatedto the discussion of politics4, while Slashdot and MacRumors focus on technology.1Also known as social news aggregators2reddit.com, slashdot.org, digg.com3www.nytimes.com4 www.huffingtonpost.com, www.dailykos.com10In this research, we developed a set of visual text analytics systems focusingon supporting a common set of tasks involved in exploring and analyzing conver-sations. However, for the purpose of design and evaluation of the approach, wemainly focused on blog conversations. 
According to the basic definition, a blog refers to a "frequently updated website consisting of dated entries arranged in reverse chronological order so the most recent post appears first" [96]. Over the years, blogs have evolved in terms of style and content, enabling the production of diverse content [15, 79]. Based on this broad definition and the diverse nature of blogs, in this dissertation we use the term to cover a variety of conversations, ranging from personal blogs, to corporate blogs, to discussions on news articles, to online forums.

Blogs are appealing over other conversational modalities as an initial design target of this thesis for various reasons. First, blogging is a common way for people to freely publish their thoughts about almost any content published on the Web [17]; therefore, blogs are not limited to any specific type of initial contribution or any specific subject listed in Table 1.1. Second, blogs are mainly focused on high-quality content generation and information sharing, as opposed to the purely social interactions that are more prevalent in social networking sites. Third, blogs are often archived and actively read over several years [42]. Finally, unlike microblogs [108], they do not have fixed-length comments; furthermore, they have a finer conversational structure, as participants often reply to a post and/or quote a fragment of other comments [70], making it more challenging for users to explore and analyze such conversations.

Nevertheless, in the later part of this dissertation, we show how our solutions can be tailored and adapted to specific domain problems. Here, by domain problems we refer to problems faced by a user or a group of users in a specific conversational modality, possibly with a focus on a particular genre. For instance, in Chapter 5 we present a design study where our visual text analytics systems were simplified and tailored to support information seeking tasks in a community question answering forum (i.e., a blog where the initial post is a question) for a user population possibly having low visualization expertise. Furthermore, in Chapter 6, we report how several other researchers have applied or partially adopted our data abstractions and visual encodings to address specific domain problems, such as the problems faced by administrators of online health forums and instructors of educational forums.

Users: As shown in Table 1.2, users in a conversational domain can be categorized into two groups based on their activities: (a) participants, who have already contributed to the conversations, and (b) non-participants, who have not contributed to the conversations yet. The tasks may vary across these user groups as well, something that needs to be taken into account in the design process.

Table 1.2: User categorization for asynchronous conversation.

For example, imagine a participant who has expressed her opinion about a major political issue. After some time, she may become interested in knowing what comments were made supporting or opposing her opinion, and whether those comments require a reply right away. In contrast, a non-participant who is interested in joining the ongoing conversation on that particular political issue may want to decide whether and how she should contribute by quickly skimming through a long thread of blog comments. Another group of users includes analysts who do not wish to join the conversation, but may want to analyze and gain insights from conversations.
For instance, a journalist may want to summarize the major arguments that were used to support or oppose the political issue. Another example is an analyst who wants to discover important insights from conversations and present them to a policy maker to support her decision-making process.

In this dissertation, we have mainly focused on supporting the non-participant's activity on archived conversations, as opposed to ongoing ones. However, as we will discuss in Chapter 6, in future work our text analysis methods and visualization techniques could be extended to support other types of users and ongoing conversations.

1.4 Thesis Contributions

The fundamental contributions of this research arise from devising approaches for tightly integrating natural language processing and information visualization techniques for the interactive exploration of conversational data. In order to evaluate the effectiveness of these approaches, we explore a two-dimensional design space, as shown in Figure 1.4. The dimensions are the scale of conversations (a single conversation vs. a set of conversations) and the topic modeling choice (static model vs. human-in-the-loop model). Here, a single conversation consists of an initial post followed by a set of comments, where the comments and the initial post are connected by reply-relationships, as exemplified in Figure 1.2. A collection of conversations consists of two or more conversations that share some common themes; for instance, such conversations may be retrieved from a blog or forum site given a search query, as shown in Figure 1.1.

Figure 1.4: The design space explored in this research.

Along the two dimensions of our design space, four different visual text analytics systems have been developed to explore and validate our fundamental approaches. After designing these systems, in Chapter 5 we show how our visual text analytics solutions can be applied and tailored to a specific domain problem. The resulting system, CQAVis, was designed to support information seeking tasks in a community question answering forum for a user population possibly having low visualization expertise. A summary of the available resources for our systems is provided in Table 1.3.

Table 1.3: A summary of the available resources for our systems.
  ConVis: video demo (short: https://goo.gl/gBQE3e; long: www.cs.ubc.ca/group/iui/convis.mp4); live demo: www.cs.ubc.ca/~enamul/convis/
  MultiConVis: video demo (short: https://goo.gl/ZmVYks; long: www.cs.ubc.ca/group/iui/multiconvis.mp4)
  ConVisIT: video demo (short: https://goo.gl/QALDvw; long: www.cs.ubc.ca/group/iui/convisit.mp4)
  MultiConVisIT: video demo (short: https://goo.gl/edT69x; long: www.cs.ubc.ca/group/iui/multiconvisit.mp4)
  CQAVis: video demo (short: https://goo.gl/IM3Gez; long: www.cs.ubc.ca/group/iui/cqavis.mp4); live demo: http://iyas.qcri.org

At different stages of designing the systems, we conducted user studies to validate our approach. Table 1.4 provides a complementary summary of these user studies, organized by thesis chapter. As it shows, a variety of user studies were conducted, ranging from informal evaluations, to more formal summative studies in lab settings, to a web-based study in the wild.

Table 1.4: Summary of user studies conducted in the dissertation.

We now provide an overview of the visual text analytics systems we developed, followed by a summary of the contributions that emerged from designing and evaluating these systems.

1.4.1 Exploring Conversations with Static Model

In our initial work, we proposed a visual text analytics system that supports users in exploring a single asynchronous conversation (Chapter 2). Following the design study methodology in InfoVis, we started with a user requirements analysis for the domain of blog conversations to derive a set of design principles. Based on these principles, we designed an overview+detail interface, named ConVis, that provides a visual overview of a conversation by presenting the topics, authors, and thread structure of the conversation (see Figure 1.5).

Figure 1.5: The ConVis interface (see Figure 2.2 for further description).
The underlying topic modeling approach was specifically designed for asynchronous conversations, taking into account their unique features, namely reply relationships and the use of quotation. Using this approach, we group the sentences of a conversation into a number of topical clusters with a graph-based clustering technique, and label each cluster by generating semantically meaningful descriptors.
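To make this concrete, here is a minimal sketch, in Python, of what such a graph-based clustering step could look like. This is not the thesis implementation (Chapter 2 describes the actual method): the bag-of-words cosine similarity, the reply_bonus and threshold parameters, and the use of modularity-based community detection are all simplifying assumptions of this illustration; only the general idea, clustering sentence nodes in a graph that encodes both lexical similarity and conversational links, comes from the text above.

    # Minimal sketch of graph-based sentence clustering for one conversation.
    # Illustrative assumptions throughout; requires networkx.
    import math
    from collections import Counter
    import networkx as nx
    from networkx.algorithms.community import greedy_modularity_communities

    def cosine(a, b):
        # Cosine similarity between two bag-of-words Counters.
        num = sum(a[w] * b[w] for w in set(a) & set(b))
        den = math.sqrt(sum(v * v for v in a.values())) * \
              math.sqrt(sum(v * v for v in b.values()))
        return num / den if den else 0.0

    def cluster_sentences(sentences, links, reply_bonus=0.5, threshold=0.1):
        # sentences: list of token lists, one per sentence of the conversation.
        # links: pairs (i, j) of sentences connected by a reply or a quotation.
        bags = [Counter(tokens) for tokens in sentences]
        linked = {frozenset(p) for p in links}
        g = nx.Graph()
        g.add_nodes_from(range(len(sentences)))
        for i in range(len(sentences)):
            for j in range(i + 1, len(sentences)):
                w = cosine(bags[i], bags[j])
                if frozenset((i, j)) in linked:
                    w += reply_bonus  # conversational structure strengthens the edge
                if w > threshold:
                    g.add_edge(i, j, weight=w)
        # Each community of densely connected sentences becomes one topical cluster.
        return [sorted(c) for c in greedy_modularity_communities(g, weight="weight")]

The role of the reply_bonus term is to let the conversational structure pull sentences connected by a reply or quotation link into the same topical cluster, since such sentences are more likely to discuss the same topic; labeling each cluster would then amount to selecting or generating descriptive keyphrases from the sentences it contains.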
In particular, all our participants, both in thecase studies and in the user study, appear to benefit from the topic hierarchy andthe high-level overview of the conversations. The user study also shows that theMultiConVis interface is significantly more useful than the traditional interface,enabling the user to find insightful comments from thousands of comments, evenwhen they were scattered around multiple conversations, often buried down nearthe end of the threads. More importantly, MultiConVis was preferred by the major-ity of the participants over the traditional interface, suggesting the potential valueof our approach for combining NLP and InfoVis.Contributions:1) We performed a user requirements analysis based on extensive literaturereview in the domain of blogs to inform our interface design for both a singleconversation as well as a set of conversations. The analysis reveals the data andtask abstractions for the problem domain and a set of design principles to supportthe user requirements.2) We adopted a topic modeling method for effectively extracting topics froma single conversation. We also extended this method for creating a topic hierar-chy for a whole collection of conversations, by aggregating the topics extractedfrom each conversation in the collection. The novelty of our approach is that in17both extracting the topics and organizing them into a hierarchy, our methods takesadvantage of conversational features to enhance the quality of the topic model.3) We presented the design and implementation of two novel visual interfaces:ConVis and MultiConVis. Unlike previous approaches which either visualize somemetadata or only one type of content information from the conversations (e.g., thetopics covered but not opinions), our interfaces visualize both topic and opinionmining results along with a set of metadata, such as authors and position of thecomments. We also proposed a way to seamlessly integrate the two interfacesto allow users to switch from exploring a collection of conversations to a singleconversation.4) We designed and conducted a series of user studies, namely an informalevaluation, a formal lab-based study, and three case studies, which revealed thedifferences in user performance and subjective opinions when our systems werecompared to traditional blog interfaces for exploring conversations. These studiesalso provide further directions for our subsequent research, including the need fora human-in-the-loop model.1.4.2 Exploring Conversations with Human-in-the-loop ModelA preliminary evaluation of ConVis suggested that while the participants were gen-erally positive about the interface, the results of the topic model were sometimesnoisy and/or did not match their current information needs. This was particularlyevident from the interviews, where users expressed a pressing need for enhancingtheir ability to revise the topic model according to their own information needs.This was also revealed by spontaneous users’ comments, while they were perform-ing the experimental tasks.Motivated by this experience, we proposed a novel interactive topic modelingapproach in Chapter 4 that revises the topics on the fly on the basis of users’ feed-back. We then designed a visual interface, named ConVisIT, by extending ConVis,where the user can explore long conversations, as well as revise the topic modelwhen the current results are not adequate to fulfill her information needs (see Fig-ure 1.7). 
Figure 1.7: Our interactive topic modeling framework.

By analyzing the tasks of exploring online conversations, we devised a set of topic revision operations that are critical to the user. For instance, the user could perform a merge operation on two topics if these topics are talking about similar issues. In other cases, if a topic is too generic, the user could split it into smaller sub-topics. For example, splitting 'ObamaCare' would create three subtopics, namely 'health insurance', 'drugs', and 'healthcare cost'. By dynamically revising the topic model, the user could build a topic model that better matches her mental model and current information needs.

A similar human-in-the-loop model was investigated for exploring a set of conversations in Chapter 4. The motivation is that while a topic hierarchy is useful to organize the discussion themes within a set of conversations into a coherent structure, such a hierarchy may be too noisy and/or may not match the user's current tasks or her mental model. To support the user in this situation, we devised an approach for revising the topic hierarchy based on users' information needs. According to this approach, the user can provide feedback to the system through the visual interface MultiConVisIT, which incorporates a set of features for revising the topic hierarchy. The system then updates the topic hierarchy, which is visualized in the interface for further exploration.

We ran two summative user studies in lab-based settings to compare ConVisIT and MultiConVisIT with interfaces that do not support human-in-the-loop topic modeling. In essence, both studies suggest that most users benefit from getting more control over the topic modeling process while exploring conversations. The first study, described in Section 4.6.1, reveals that ConVisIT outperformed both a traditional interface and ConVis along several subjective metrics. Similar results were found in the second study, reported in Section 4.6.2, where MultiConVisIT was found to be more useful and was also preferred over its counterpart that does not provide interactive topic revision operations.

Contributions:

1) We proposed a novel interactive topic modeling approach specifically devised for asynchronous conversations. Existing systems for interactive topic modeling (e.g., [19, 66, 81]) were mainly devised for generic documents, without considering the unique features of conversations.

2) We designed a set of interactive features that allow the user to revise the current topic model. In response, the interface updates and re-organizes the modified topics by means of intuitive animations, so that the user can better fulfill her information needs.

3) We conducted two lab-based summative studies, which revealed the potential utility of our human-in-the-loop topic modeling approaches.

1.4.3 Applying and Tailoring the Solutions to Specific Domains

After designing the visual text analytics systems, we analyzed how our solutions for generic blog conversations can be applied to specific domain problems in Chapter 5 and Chapter 6. To this end, we conducted a design study in the domain of community question answering (CQA) forums, in which the initial post is a question. Here, our generic visual text analytics solutions were applied and tailored to support information seeking tasks for a user population possibly having low visualization expertise.
Figure 1.8 shows a screenshot of our interface, presenting the results for a user-provided query.

Figure 1.8: CQAVis is a visual interface to support information seeking tasks in a community question answering forum (see Figure 5.3 for further description).

Our system was evaluated by deploying it in an online study, in which it was tested with hundreds of real users. This large-scale Web study underlines the potential for tightly integrating NLP and InfoVis in practice, offering users a new way of seeking information in CQA forums. It also reveals important lessons for designing and studying such systems for real users with varying levels of expertise, which can arguably be generalized to the design and evaluation of visual analytics systems for other conversational domains.

In addition to our own work, we also conducted a survey focusing on how other researchers have recently applied or partially adopted the data abstractions and visual encodings of MultiConVis and ConVis in a variety of domains, such as online health forums and educational forums. In Section 6.2, we analyze these research works to understand the potential applicability of our systems to different domain problems.

Contributions:

1) We characterized CQA forums by identifying user tasks and some key design needs.

2) We designed and implemented a visualization tool that demonstrates how our generic solutions for integrating NLP and InfoVis techniques, presented in Chapters 2 and 3, can be applied and tailored to the information seeking tasks in CQA.

3) We evaluated the new CQA forum tool in the wild, in an ecologically valid setting, by deploying the system among real forum readers.

4) We identified and summarized generalizable lessons that can be useful for designing visual interfaces for online conversations in other domains, as well as for designing for user populations possibly having low visualization literacy.

Chapter 2

Supporting Users in Exploring a Single Conversation

In this chapter, we present a visual text analytic system that tightly integrates interactive visualization with novel text mining and summarization techniques to fulfill the information needs of users in exploring a single conversation. At first, we perform a user requirement analysis for the domain of blog conversations to derive a set of design principles. Following these principles, we present an interface that visualizes a combination of various metadata and textual analysis results, supporting the user in interactively exploring blog conversations. Finally, we conducted an informal user evaluation, which provides anecdotal evidence about the effectiveness of our system and directions for further design.¹ A further evaluation of our system, which was conducted in the form of a summative user study in a controlled setting, is described in Chapter 4.

¹ This chapter is a slightly modified version of our paper ConVis: A visual text analytic system for exploring blog conversations, by Enamul Hoque and Giuseppe Carenini; in Journal of Computer Graphics Forum (Proceedings of EuroVis), 33(3):221–230, 2014 [59].

2.1 Introduction

A single asynchronous conversation such as a blog conversation consists of an initial post, such as an article or a question, followed by a set of subsequent replies. Often many people contribute to the discussion, which can quickly become very long with hundreds of comments. Traditional social media sites present the original posts and subsequent replies as a paginated indented list (see Figure 1.2).
Thus the reader needs to go through a long list of comments sequentially, until her information needs are fulfilled. Going through such an overwhelming amount of textual data in this way often leads to information overload, i.e., the user finds it very difficult to get insights about the ongoing (or past) discussion. The end result is that readers start to skip comments, generate simpler responses, and leave the conversation without satisfying their intent [69].

To illustrate the problem, consider a scenario where Sarah is interested in technology-related blogs. She opens a blog discussion about a news article on the hacking of US army servers. She is curious to know the different opinions about the US cyber security lapses. She finds that the top few posts blame the 'shoddy work' done by the contractor companies, while others believe that the incident was merely 'a honeypot for hackers'. Sarah wants to know more about what other people are saying about the hacking issue, but soon realizes that the topic of discussion has shifted to 'US involvement in the Vietnam war', which she is not interested in. So Sarah keeps on skimming comments and notices that some others are discussing the technical details of hacking. At this point, Sarah is quite exhausted; she does not know whether the long list of remaining comments discusses the reasons for the cyber security lapses, but she decides to end the reading without fulfilling her information needs.

To support readers in dealing with similar situations, we have developed ConVis: a visual exploratory text analytic system for blogs that tightly integrates interactive visualization with text mining techniques that are especially devised to deal with conversational data. Motivated by the nested design model [91], we started by characterizing the domain. While asynchronous conversations comprise emails, blogs, microblogs (e.g., Twitter), and messaging, in this chapter we focus on the domain of blogs. In fact, blog conversations often have a finer conversational structure, as participants often reply to a post and/or quote a fragment of other comments [70], making it a more challenging problem for users to explore and analyze such conversations. Once we have characterized our domain, we derive a set of design principles, which then guide the visual encoding and interaction techniques of ConVis. The primary contributions of this work are as follows:

1) We performed a user requirements analysis based on an extensive literature review in the domain of blogs, as described in Section 2.2. The analysis includes data and task abstractions for the problem domain and a set of design principles to support the user requirements.

2) To the best of our knowledge, ConVis is the first visual text analytic system for blog conversations that visualizes both topic and opinion mining results along with a set of metadata such as authors and the position of the comments, which were identified as primary means for browsing and navigation in the user requirements analysis. Existing systems either visualize some metadata or only one type of content information from the conversations (e.g., the topics covered but not opinions), thus limiting the ability of the user to explore and analyze the conversation.

3) We present the design, implementation, and evaluation of ConVis. ConVis visually represents an overview of a blog and then allows the user to explore the conversation based on multiple facets (e.g., topics and authors).
This is a major shift from traditional blog reading interfaces, which provide a long list of paginated comments, thus only supporting linear navigation.

2.2 From User Requirements to Design Principles

Blog reading has been extensively studied in the fields of computer mediated communication (CMC) [35, 72, 135], social media [48, 57], human computer interaction (HCI) [13, 30, 90], and information retrieval [69, 74, 83, 87, 119]. This literature provides a detailed analysis of the motivations and goals for reading blogs, along with the unique behaviours of blog reading. Based on this analysis, we characterize the data and tasks in the domain of blogs and then identify the user requirements (UR), which are finally translated into a set of design principles.

2.2.1 Why and How Do People Read Blogs?

Over the years, several studies have been conducted to identify the motivations and goals for reading blog conversations [9, 30, 72, 74, 87]. Kaye performed a web survey among active bloggers to find the reasons why they access blogs [72]. These reasons were grouped into 10 general motivational blocks, including information seeking, fact checking, guidance/opinion seeking, and political surveillance. In particular, the users reported that they often read blogs to seek information about their areas of interest, such as education, technology, and politics [30, 72]. Blogs also help users to quickly verify and compare accounts of news and information and check the accuracy of traditional media (fact checking) [72, 74]. Frequently, users read blogs to seek a wide variety of opinions and to help them make up their minds about important issues [30, 72, 74, 87]. Mishne noted that the information in blogs is often subjective or opinionated [87]. In fact, it has been found that readers consider blogs with a mixture of positive and negative posts more credible [9]. Overall, this suggests that the interface should facilitate a visual overview of the diverse range of opinions covering positive and negative sentiments about important topics, allowing the user to understand various viewpoints (UR-1).

The people-centric nature of the domain of blogs was reported in various studies [30, 87]. Dave et al. reported that blog readers are looking to find ideas or information, take the pulse of a community, and meet people [30]. In other words, blogging can promote a sense of belonging in the blogosphere among others who try to publicly express their opinions and to affiliate with like-minded individuals ("find people who think like I do") [72, 74]. This indicates the importance of identifying the key participants and their opinions (UR-2).

In reality, users do not always look for important information or opinions; they may read blogs simply for enjoyment or personal fulfillment [13, 72]. An ethnographic study reveals that "the participants visit blogs for information, inspiration, entertainment, and to a certain extent because it is just what they have always done" [13]. Kaye suggests that blogs bring more novelty and thus users find blogs to be more fun and interesting than formal media content [72]. This aspiration for novelty and fun should be encouraged by the interface by promoting exploration and serendipitous discoveries (UR-3).

Previous studies suggest that many blog readers are inherently variety-seekers [90, 119], i.e., they are often looking for a variety of opinions and discussion themes. Singh et al. found that individuals tend to switch from one set of topics to another [119].
Even when a reader reads only content on the same topic, she essentially reads distinct posts, leading to some variety within a topic. Thus, being able to browse the conversations based on different possible topics and sub-topics can effectively support this variety seeking behaviour (UR-4). Many users also exhibit a skimming tendency [95, 135], i.e., they seek to quickly scan through a set of posts to understand what the authors are saying. This behaviour might be explained by the exploratory nature of blog reading. It has been found that readers pass through an exploratory (intermediate) state when moving from one focused state to another [119]. The reading in this exploratory state provides clues about what the reader may expect to find if she focused on the comments she is currently skimming. In other words, the reader needs to quickly skim through (i.e., explore) a few posts about a topic before delving deeper into its details (i.e., entering into a focused state). Therefore, the interface should facilitate open-ended exploration within the conversation space, by providing navigational cues that help the user to seek interesting comments and to quickly decide whether they are worthwhile to read (UR-5).

2.2.2 Data and Task Abstraction

From the analysis of the primary goals of blog reading, we compile a list of tasks and the associated data variables that one would wish to visualize for these tasks. In addition, we analyze the Blog track in the Text REtrieval Conference (TREC), which defines a set of tasks on opinion finding (e.g., What do people think about X?) and blog distillation (e.g., Find me a blog with a principal interest in X) [83]. Based on these analyses, we create a set of tasks (phrased as questions) that the blog reader might ask, along with the possible associated variables, as listed in Table 2.1. Most of these questions involve topics and the sentiment expressed in the conversation, which are relevant to some of the key goals of the users, including information seeking, fact checking, and guidance seeking. Q1 and Q2 are related to finding topics, while Q3 through Q6 can involve both topic and sentiment information. Q7 through Q9 may additionally require knowing people-centric information and relating such information to other data such as topics and sentiment (extending UR-2). The last question (Q10) reflects the motivations for personal fulfillment/enjoyment. Finally, to reflect the exploratory behaviour associated with most of the tasks listed here, both the thread (to support the exploratory state) and the comments (to support the focused state) are included as data variables.

Table 2.1: A set of tasks (phrased as questions) that a user may likely have to perform/answer while exploring a blog conversation to satisfy her information needs.

No  Question                                                     Data variables involved
1   What is this conversation about?                             Topic, Thread
2   Which topics are generating more discussions?                Topic
3   What do people say about topic X?                            Topic, Opinion, Thread, Comment
4   How controversial was the conversation? Were there           Topic, Author, Opinion, Thread, Comment
    substantial differences in opinion?
5   How do other people's viewpoints differ from my current      Topic, Opinion, Thread, Comment
    viewpoint on topic X?
6   Why are people supporting/opposing an opinion?               Topic, Opinion, Comment
7   Who was the most dominant participant in the conversation?   Author, Thread, Comment
8   Who are the sources of the most negative/positive comments   Topic, Author, Opinion, Thread, Comment
    on a topic?
9   Who has similar opinions to mine?                            Author, Opinion, Comment
10  What are some interesting/funny comments to read?            Topic, Opinion, Thread, Comment
Upon identifying the data involved in the list of tasks, we abstract them in terms of scale and type. Table 2.2 lists a comprehensive set of conversational data to be visualized and their abstract types. We also compute average and maximum counts for different types of data to better understand what scale the visualization needs to deal with. These values are computed based on a set of 20 Slashdot blogs which come with human-generated topic annotations [70]. Here, the topics and the sentiment are added since they can be useful for performing almost all of the tasks in Table 2.1. The position of the comment in the discussion space and the comment length are added since they have been found to be useful cues for navigation [13, 95] (UR-6).

Table 2.2: Set of conversational data to be visualized and their abstract types. The avg. and max. counts for different types of data are provided based on the Slashdot dataset.

Attributes from data      Abstract type   Avg. count                Max. count
Thread structure          Tree            Depth: 4.3; nodes: 60.3   Depth: 5; nodes: 101
Topic                     Categorical     10.77                     23
Author                    Categorical     57.71                     92

Derived attributes        Abstract type   Range
Topic length              Quantitative    [0.0, 1.0] (normalized)
Comment length            Quantitative    [0.0, 1.0] (normalized)
Position of the comment   Ordinal         [1, 101]
Sentiment                 Ordinal         [-2, -1, 0, +1, +2]

Another study suggests that the exact timestamp of a comment is much less important to users than its chronological position with respect to the other comments [13] (UR-7). Therefore, we wanted to encode the position of the comments (ordinal) as opposed to their timestamps (quantitative).

2.2.3 Design Principles

Based on the user and task analysis, we have identified the following key design principles (DP) that form the basis of our visualization system. Each design principle is derived from one or more of the user requirements, as follows:

1. Show a comprehensive set of relevant data: The visual interface should display a comprehensive set of user/system generated metadata, namely comment length, position of the comment, and moderation score (UR-6, UR-7), as well as the results of text analysis (UR-1), as listed in Table 2.2.

2. Provide faceted exploration: Considering the exploratory nature of blog reading, the interface should provide various facets (e.g., topics and authors) as a means for navigation and browsing. Once these primary facets are effectively presented, users will arguably take a more active role in exploring conversations in a non-linear fashion, by quickly navigating through comments of a particular facet (addressing UR-3, UR-4, UR-5).

3. See relationships between multiple facets: Many of the common tasks for browsing conversations require the user to perceive the relations between multiple facets and comments. For example, to perform the task in Q8, the user needs to know how the author, opinion, and topic facets are related to each other. Thus, we aim to effectively reveal the relations between multiple facets to the user, to better support the critical tasks identified in Table 2.1 (UR-2).
4. Provide overviews at multiple granularity levels: We aim to integrate the high-level summarized view of the conversation (e.g., topics), the visual overview of the thread (showing sentiment information for all the comments), and the actual comments (detailed content) in a seamless way, so that the user can easily switch between the different levels of overview and the actual conversation (UR-1, UR-5).

5. Lightweight interactions: To enhance learnability, the interface should facilitate the open-ended exploration of conversations through a set of low-cost interactions [77] that can be easily triggered and reversed without requiring much cognitive overload (UR-5). Low-cost interactions, along with interface metaphors that are easily understood, can make the exploration process more enjoyable (UR-3).

2.3 Related Work

Previous work on visualizing asynchronous conversations can be classified into two categories, metadata-based and content-based visualization, depending on whether the focus of the research was more on visualizing the system and user generated metadata (e.g., thread structure) vs. the results of some text analysis (e.g., finding topical clusters).

2.3.1 Metadata-based Visualization

Earlier works for visualizing asynchronous conversations primarily focused on revealing the structural and temporal patterns of a conversation [34, 101, 125]. Typically, the goal was to effectively represent the thread structure of a conversation using tree visualization techniques, such as a thumbnail metaphor (a sequence of rectangles) [125] and a radial tree layout [101]. Various interaction techniques, such as highlighting user-specified search terms [125] and zooming into an area of the thread overview [101], were proposed to deal with space constraints for larger threads. Other works visualize various system and user generated metadata, such as timestamps [34], comment length, and moderation scores [95]. Metadata-based visualization has also been applied to blog archives [68], as opposed to a single blog conversation, showing the history of social interactions to help users identify potentially useful blog entries.

Even though metadata-based visualizations help to understand the social interaction patterns or the quality of the comments in a conversation, they may be inadequate to support users in most of the tasks shown in Table 2.1. For example, if the user is reading a political blog to know "what do people think about Obama's recent healthcare policy?", knowing how nested the thread structure is or how many replies are made to a particular post would be insufficient. Also, the type of metadata can vary among different forums or blog sites, hence it is hard to generalize the utility of some metadata in supporting the browsing and exploration processes. Therefore, in this work, we are interested in complementing useful metadata by analyzing the textual content and conveying the results to the user. The aim is to provide insights that are based on a more comprehensive view of the conversation.

2.3.2 Content-based Visualization

Some early works aimed to identify and visualize the primary themes or topical clusters within conversations [30, 109]. In contrast, [131] focused more on the organization of the discussion by creating a tree layout, where the parent comment is placed on top as a text block, while the space below the parent node is divided between supporting and opposing statements.
In general, the main limitation of these approaches is that they rely on simple, generic text analysis methods, which do not consider the structure of the conversation. More recently, the TIARA system proposes an enhanced Latent Dirichlet Allocation (LDA)-based topic modeling technique, which automatically derives a set of topics to summarize a collection of documents and their content evolution over time [127]. Each layer in the graphical representation represents a topic, where the keywords of each topic are distributed along time. From the height of each topic and its content distributed over time, the user can see the topic evolution. In contrast to visualizing topics, Opinion Space visualizes the differences in opinions in an online conversation [44] by projecting users on a two-dimensional map based on Principal Component Analysis (PCA), where participants with similar opinions are positioned near each other. The expectation is that by exploring the map, users can better understand a broad range of viewpoints.

While there has been a clear trend of moving beyond using only metadata to an increasing use of text analysis within the interactive visualization process, current systems generally suffer from two fundamental limitations. First, they use generic text analysis techniques. Second, current systems only convey one type of mined information (e.g., either topics or opinions), thus limiting the user's ability to perform most of the tasks in Table 2.1. In this work, we aim to address both limitations.

2.3.3 Faceted Exploration

Faceted browsing has been widely used in general text and multimedia search [55]. According to this approach, various metadata and content information can be used as facets for exploring and filtering content. Various techniques have been developed to interactively explore faceted datasets [16, 38, 80, 132]. SolarMap arranges entities of the topic facet as cluster nodes and interactively highlights the relations between a cluster region and the other facets located in the surrounding circular ring [16]. FacetLens introduces linear facets (e.g., year) and integrates richer faceted navigation techniques to expose trends and relationships between attribute values within a facet [80]. PivotSlice allows the user to construct a series of dynamic queries using facet values to divide the entire dataset into different subsets in a tabular layout, while directed edges are drawn between related items upon selection [132].

In general, the above methods require the user to apply some interactive techniques (e.g., dynamic queries [132], context switching [16]) in order to explore the relationships between facets. In contrast, our work is more similar to [38], where all relationships between facets are permanently displayed and are directly accessible to the user.

2.4 Mining and Summarizing Conversations

We now discuss the two computational approaches that were applied for mining and summarizing conversations: topic modeling and sentiment analysis.

2.4.1 Topic Modeling

In topic modeling, the sentences of a blog conversation are first grouped into a set of topical clusters/segments (segmentation). Then, representative keyphrases are assigned to each of these segments (labeling). We adopt a novel topic modeling approach that captures finer-level conversation structure in the form of a graph called the Fragment Quotation Graph (FQG) [70]. All the distinct fragments (both new and quoted) within a conversation are extracted as the nodes of the FQG.
Then the edges are created to represent the replying relationships between fragments. If a comment does not contain any quotation, then its fragments are linked to the fragments of the comment to which it replies, capturing the original 'reply-to' relation. Here, we briefly describe how topic segmentation and labeling can take advantage of the FQG; interested readers are directed to [70] for a more detailed description.

Figure 2.1: a) Reply-to relationships between the initial post A and the comments C1, C2, ..., C6 (left). Here, '>' represents the quotation mark and each lowercase letter corresponds to a text fragment that may comprise one or more sentences. b) The corresponding FQG (right), where each node represents a text fragment and the edges represent replying relationships between fragments.

Topic Segmentation

First, a Lexical Cohesion-based Segmenter (LCSeg) [49] is applied to find the segmentation boundaries within each path (from the roots to the leaves) of an FQG (see Figure 3.4). Then an undirected weighted graph G(V,E) is constructed, where each node in V represents a sentence within the conversation, and each edge w(x,y) in E represents the number of segments on different paths in which the two sentences appear together. If x and y do not appear together in any segment, their cosine similarity (always between 0 and 1) is used as the edge weight. By construction, any subgraph of G whose nodes are strongly connected represents a set of sentences that should belong to the same topical segment.

To identify subgraphs whose nodes are strongly connected, a k-way min-cut graph partitioning algorithm is applied on the graph G(V,E) with the normalized cut (Ncut) criterion. Since Ncut is an NP-complete problem, an approximate solution is found following an efficient method proposed by Shi and Malik [118]. At the end of this process, each sentence of the conversation is assigned to one of the topical segments.

Topic Labeling

Topic labeling takes the segmented conversation as input and generates keyphrases to describe each topic in the conversation. The conversation is first tokenized and a syntactic filter is applied to select only nouns and adjectives from the text. Then a novel graph-based ranking model is applied that exploits two conversational features: information from the leading sentences of a topical segment and the FQG. For this purpose, a heterogeneous network is constructed that consists of three subgraphs: the FQG; the word co-occurrence graph (GW), which captures the co-occurrence of each word in the topic cluster with respect to the words in the leading sentence of that cluster; and a bipartite graph that ties these two graphs together. A co-ranking method [134] is then applied to this heterogeneous network to generate the ranked list of words for each topic. The top-M selected keywords from the ranked list are then marked in the text, and sequences of adjacent keywords are collapsed into keyphrases. Finally, to achieve broad coverage of the topic, the Maximum Marginal Relevance (MMR) criterion is used to select the labels that are most relevant, but not redundant.
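To make the segmentation step more concrete, the sketch below shows one way the sentence graph could be assembled and partitioned. It assumes LCSeg has already produced the segments along every root-to-leaf path of the FQG; the function names, the TF-IDF sentence representation, and the use of scikit-learn's spectral clustering as the Ncut approximation are our illustrative choices, not the exact implementation of [70].

# A minimal sketch of the graph-based segmentation step (illustrative,
# not the exact implementation of [70]). `path_segments` holds one set
# of sentence indices per LCSeg segment, collected over every
# root-to-leaf path of the FQG.
import itertools
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.cluster import SpectralClustering

def build_sentence_graph(sentences, path_segments):
    n = len(sentences)
    W = np.zeros((n, n))
    # Edge weight = number of path segments in which two sentences co-occur.
    for seg in path_segments:
        for x, y in itertools.combinations(seg, 2):
            W[x, y] += 1
            W[y, x] += 1
    # For pairs that never share a segment, fall back to the cosine
    # similarity of their TF-IDF vectors (always between 0 and 1).
    sim = cosine_similarity(TfidfVectorizer().fit_transform(sentences))
    W[W == 0] = sim[W == 0]
    np.fill_diagonal(W, 0)
    return W

def segment_topics(sentences, path_segments, k):
    # Spectral clustering approximates the k-way normalized cut [118]:
    # Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V).
    W = build_sentence_graph(sentences, path_segments)
    return SpectralClustering(n_clusters=k,
                              affinity='precomputed').fit_predict(W)

The returned labels assign every sentence of the conversation to one of the k topical segments, which are then passed on to the labeling stage.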
2.4.2 Sentiment Analysis

For sentiment analysis, we applied the Semantic Orientation CALculator (SO-CAL) [121], which is a lexicon-based approach for determining whether a text expresses a positive vs. negative opinion. SO-CAL computes polarity as numeric values. Its performance is consistently good across various domains and on completely unseen data, making it a suitable tool for our purpose. At first, we apply SO-CAL to generate the polarity for each sentence of the conversation. We define 5 different polarity intervals, and for each comment in the conversation we count how many sentences fall in each of these polarity intervals. Then, we normalize the value in each polarity interval by the total number of sentences in the comment to compute the polarity distribution for that comment.
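As a minimal sketch, the polarity distribution of a comment could be computed as follows. The interval boundaries are illustrative assumptions, since their exact values are not fixed here; SO-CAL itself supplies one numeric polarity score per sentence.

# A minimal sketch of computing a comment's polarity distribution from
# per-sentence SO-CAL scores. The interval boundaries are assumed for
# illustration.
import numpy as np

EDGES = [-1.5, -0.5, 0.5, 1.5]  # assumed boundaries between the 5 bins (-2..+2)

def polarity_distribution(sentence_scores):
    counts = np.zeros(5)
    for score in sentence_scores:
        counts[np.digitize(score, EDGES)] += 1  # bin 0 (= -2) .. bin 4 (= +2)
    # Normalize by the number of sentences in the comment.
    return counts / max(len(sentence_scores), 1)

# Example: four sentences scored [-2.1, -0.2, 0.1, 1.8] yield the
# distribution [0.25, 0.0, 0.5, 0.0, 0.25].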
2.4.3 Corpora and Preprocessing

While designing and implementing ConVis, we have been mainly working with two quite different blog sources: Slashdot [2] (a technology-related blog site) and Daily Kos [3] (a political analysis blog site). The Slashdot corpus, which was collected from [70], consists of 20 conversations annotated with topics by three human annotators. The other corpus was created by crawling blog conversations from the Daily Kos site.

After obtaining the conversations, we converted them into a common format (representing various metadata and the actual conversation) that our text mining methods can process. Then, we applied topic modeling and sentiment analysis to each conversation. For the Slashdot corpus, we automatically generate a topic model comprising x topics, where x represents the average number of topics produced by the annotators for that conversation. Since the Daily Kos corpus was not annotated by any human rater, we simply used the average number of topics (i.e., 11) among all the blog conversations annotated in [70]. Finally, the results of topic modeling and sentiment analysis, along with different metadata, are mapped to the abstract data types shown in Table 2.2.

2.5 ConVis Design and Implementation

2.5.1 Visual Encoding

ConVis is designed to support multi-faceted exploration of blog conversations.² The visual encoding was guided by the design principles presented in Section 2.2, and the information to be presented is generated by the text mining techniques described in Section 2.4. A high-level design decision for the interface was to follow an overview+detail approach to deal with the space constraints. The rationale is that several studies have found the overview+detail approach to be more effective for text comprehension tasks than other approaches such as zooming and focus+context [24]. Overview+detail also allows us to provide information at multiple granularities (DP-4) by displaying a high-level overview of what was discussed by whom (i.e., topics and authors), a visual summary of the whole conversation (in the Thread Overview), and the most detailed view representing the actual conversation (see Figure 2.2). The interactions between these views are performed in a coordinated way. Below, we describe the design of each component along with careful justification of crucial design decisions.

² A video demonstration of ConVis is available here: https://goo.gl/gBQE3e.

Figure 2.2: A snapshot of ConVis for exploring a blog conversation: the Thread Overview visually represents the whole conversation, encoding the thread structure and how the sentiment is expressed for each comment (middle); the Facet Overview presents topics and authors circularly around the Thread Overview; and the Conversation View presents the actual conversation in a scrollable list (right). Here, topics and authors are connected to their related comments via curved links.

The Thread Overview hierarchically represents a visual summary of the whole conversation, and allows the user to navigate through the comments (see Figure 2.2, middle). It displays each comment as a horizontal stacked bar. Each stacked bar encodes three different metadata (comment length, position in the thread, and depth of the comment within the thread) and the text analysis results (i.e., sentiment) for a comment, which were identified as potentially useful navigational cues (DP-1). The stacked bars are vertically ordered according to their positions in the thread starting from the top, with indentation indicating thread depth, allowing the user to see the whole thread structure at a glance. The height of each bar encodes the normalized comment length, while the width of all the bars remains equal. Thus one can easily notice the differences in length among comments. The current implementation can reasonably show up to 200 comments when the visualization is used on a 1920×1080 screen. This scale was sufficient for all the conversations we have examined (see Table 2.2) and is plausibly adequate for the vast majority of blog conversations.

The distribution of sentiment orientation of a comment is encoded using color within each stacked bar, where the width of each cell of a stacked bar indicates the number of sentences that belong to a particular sentiment orientation. A set of five diverging colors was chosen from ColorBrewer [5] to visualize this distribution in a perceptually meaningful order, ranging from purple (highly negative, −2) to orange (highly positive, +2). Thus, the distribution of colors in the Thread Overview can help the user to perceive whether the conversation is mainly neutral/positive/negative, or very controversial. For example, if the Thread Overview is mostly in strong purple color, then the conversation has many negative comments.
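As a minimal sketch of this encoding, the stacked bar of a single comment could be derived as follows, reusing the five-bin polarity distribution from Section 2.4.2. The hex values are one purple-to-orange diverging ColorBrewer scheme and the bar width is a constant we assume for illustration; the chapter does not fix either.

# A minimal sketch of the Thread Overview encoding for one comment.
# The hex values below are one 5-class purple-to-orange diverging
# ColorBrewer scheme, and `bar_width` is an assumed constant; neither
# is necessarily what ConVis actually uses.
PURPLE_TO_ORANGE = ['#5e3c99', '#b2abd2', '#f7f7f7', '#fdb863', '#e66101']

def stacked_bar(polarity_dist, comment_len, max_len, bar_width=120):
    """polarity_dist: fraction of the comment's sentences per polarity
    bin (-2..+2). Returns the bar height (normalized comment length)
    and the (cell_width, color) pairs that make up the stacked bar."""
    height = comment_len / max_len
    cells = [(bar_width * fraction, color)
             for fraction, color in zip(polarity_dist, PURPLE_TO_ORANGE)
             if fraction > 0]
    return height, cells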
Facet Overview: To support multifaceted exploration of the conversation (DP-2), the primary facets, namely topics and authors, are presented in a circular layout around the Thread Overview (see Figure 2.2). Topics and authors are presented on the left and right sides of the Thread Overview respectively, creating a symmetric view. Both topics and authors are positioned according to their chronological order in the conversation starting from the top, allowing the user to understand how the conversation evolves as the discussion progresses. Two distinctive qualitative colors are used to encode the facet links and the facet elements. The font size of a topic encodes how much it has been discussed compared to the other topics within the whole conversation. Likewise, the font size of an author encodes how many times a participant has posted in a conversation. Thus, the font size of both facets helps the user to quickly identify the most discussed themes and the most dominant participants within a conversation.

To convey how facets and comments of the conversation are inter-related (DP-3), the facet elements are connected to their corresponding comments in the Thread Overview via subtle curved links indicating topic-comment-author relationships (the relation between topics and comments can be many-to-many). While a common way to relate various elements in multiple views is synchronized visual highlighting, we chose visual links because it has been found that users can locate visually linked elements in complex visualizations faster and with greater satisfaction than with plain highlighting [120]. By default, these visual links are drawn in a de-saturated tone of the corresponding facet's color.

The design decision of arranging facet elements in a circular layout is motivated by two primary reasons. First, more elements can be accommodated in this way than in a linear fashion. In fact, the current implementation can reasonably show up to 100 topics/authors when the visualization is used on a 1920×1080 screen. Second, a circular layout helps to encode the curved links between facets and comments without much visual clutter.

The Conversation View displays the actual text of the comments as a scrollable list (see Figure 2.2, right). Like in the Thread Overview, comments are indented according to their depth in the thread hierarchy, thus revealing the reply-to relationships. At the left side of each comment, the following metadata are presented: title, author name, photo, and a stacked bar representing the sentiment distribution (mirrored from the Thread Overview). Overall, the Conversation View provides a familiar web discussion interface to the user, thus potentially enhancing learnability for those who are accustomed to current blog interfaces (DP-5).

Figure 2.3: Hovering the mouse over a topic element ('major army security') highlights the connecting visual links, brushes the related authors, and gives visual prominence to the related comments in the Thread Overview.

2.5.2 User Interactions

ConVis provides a set of lightweight interactions [77]. These interactions are designed so that they can be easily triggered without causing drastic modifications to the visual encoding, thus allowing the user to comprehend their effect without much cognitive overload (DP-5).

Both overviews and the Conversation View interact in a coordinated way. Hovering the mouse over a facet element causes a rectangular border to be drawn around that element and subsequently highlights the connecting curved links by changing their color to a darker tone. This also brushes the elements in the other facet, and gives visual prominence to the related comments in the Thread Overview by de-saturating the rest of the stacked bars (see Figure 2.3). As such, the user can perceive relations between multiple facets (DP-3). If the user becomes further interested in a facet element (e.g., a specific topic), she can subsequently select that item by clicking on it, resulting in a thick vertical outline being drawn next to the corresponding stacked bars in the Thread Overview (see Figure 2.4). As a result, the comments of a particular topic/author remain persistently selected. The color of the vertical outlines is the same as that of the facet, thus distinguishing between the selections of different types of facets. This encoding is also mirrored in the Conversation View (see Figure 2.4, right). Moreover, the user can select multiple facet items so that the comments shared among them become more apparent.

Highlighting and selection are also possible for each individual comment, both from the Thread Overview and the Conversation View. Hovering the mouse over the stacked bar representing a comment causes it to be highlighted by drawing horizontal outlines on the top and bottom of the bar.
It also causes the related topic(s) and author to be brushed, along with the visual links connecting the comment to be highlighted. This highlighting is also mirrored in the Conversation View. Conversely, hovering the mouse over a comment in the Conversation View highlights the corresponding stacked bar in the Thread Overview. The user can subsequently select a comment either in the Thread Overview (see Figure 3.5) or in the Conversation View, so that this highlighting remains persistent unless the user toggles the state by clicking on it again.

A selection of a comment in the Thread Overview or of a facet in the Facet Overview causes scrolling to the relevant comment in the Conversation View via a smooth animation. In this way, the user can easily locate the comments that belong to a particular topic and/or author. Moreover, the keyphrases of the relevant topic and sentiments are highlighted in the Conversation View upon selection, providing more details on demand about what makes a particular comment positive/negative or how it is related to a particular topic. The user can also scroll through the comments with traditional interactions using the mouse wheel, or the standard arrow and page keys. Finally, any branch of the conversation can be expanded/collapsed by clicking the up/down arrow at the left side of parent posts.

Figure 2.4: Clicking on a topic results in a thick vertical outline being drawn next to each of the related comments.

2.5.3 Implementation

A server-side component (in PHP) retrieves conversations annotated with topics and sentiment information. The visualization component, on the other hand, is implemented in JavaScript (using the D3 and jQuery libraries), which is sufficiently fast to respond in real time to user actions.³

³ A live demo of ConVis is available here: www.cs.ubc.ca/enamul/convis

2.6 Informal Evaluation

During the design and implementation of ConVis, we conducted formative evaluations to identify potential usability issues and to iteratively refine the prototype. Once the prototype was completed, we ran an informal evaluation [78] with a different set of target users to evaluate the higher levels of the nested model [91]. In this evaluation, we aimed to: 1) understand to what extent the overall visualization and its specific components are perceived to be useful by the potential users; 2) identify differences among users in how they performed the tasks; and 3) solicit ideas for improvements and enhancements.

Figure 2.5: An example showing: (a) the user clicked on a comment (the one with horizontal outlines) in the Thread Overview; (b) as a result, the system automatically scrolled to the actual comment in the Conversation View.

2.6.1 Procedure and Participants

A pre-study questionnaire was administered to capture demographic information and participants' prior experience with blog reading. Then the ConVis interface was demonstrated to the participants. After that, they were allowed to choose three conversations of their interest from a set of six blogs from Slashdot, all of them having similar length (avg. number of comments: 91.33). Instead of asking some abstract questions (such as the ones in Table 2.1), we provided an open-ended task to reflect the exploratory nature of blog reading. We asked each participant to explore the conversations according to her own interests and write down a summary of the key insights (if any) gained while exploring each conversation.
During the study, we primarily focused on gathering qualitative data such as observations, user-generated summaries, and semi-structured interviews.

We conducted the study with five participants (age range 18 to 24, 2 female), who are frequent blog readers (four of them reported reading blogs at least several times a day and one several times a week). The three most common reasons for them to read blogs are information seeking, guidance/opinion seeking, and enjoyment. They are primarily interested in blogs about technology, politics, and education.

Figure 2.6: Comparison of usage patterns between two participants using the two different strategies on the conversation titled "Music Streaming to Overtake Downloads".

2.6.2 Results and Analysis

Browsing strategies: From the interaction log data and semi-structured interviews, we identified two main strategies for reading comments: exploring by topic facets, and skimming through detailed comments. Figure 2.6 shows the sequence of interface actions made by participant P2, who followed the former strategy, and P5, who followed the latter, on the same conversation. Overall, of the five participants, two followed the exploration-by-topic strategy, while the other three followed the skimming-comments one. The two participants who followed the former strategy reported that they would begin by quickly scanning the topics and selecting either the most discussed topic first or the ones that were interesting to them, and then reading the comments linked to that topic. We also observed that, to find the comments of interest in the selected topic, they often relied on the sentiment and comment length encoded in the Thread Overview. After going through the comments on a specific topic, they either went on to read the next topic that appeared in the conversation, or went back to scan the topic list to find the next topic of interest. This navigational behaviour can be observed from the sequence of actions made by participant P2 (see Figure 2.6, left). The other three participants followed the traditional way of blog reading, primarily skimming through the comments in the Conversation View. This is illustrated in Figure 2.6 (right), where P5 mainly hovered over different comments. However, at the same time these participants acknowledged that they tended to coordinate with the topics and the Thread Overview, where the related items were highlighted, so that they could get a sense of what part of the conversation they were reading and when the topic was about to change. Supporting evidence came from the interaction log data: those who followed the first strategy on average clicked on different topics 13 times and hovered 68 times for each conversation. On the contrary, those who followed the second strategy hovered only 11 times on average per conversation and never clicked on a topic.

Interface features: In general, all participants, independently of their preferred browsing strategy, agreed that showing the set of topics and then visually linking them to the comments in the Thread Overview helped them to quickly understand what a conversation is about and to focus on its most interesting parts. P2 said: "I just try to find topics that are interesting to me which is really useful. I could look into a comment of that topic and then look at other comments replying to that comment, so this navigation feature was really good." Another useful feature according to the participants is the Thread Overview displaying the comments and sentiment.
P1 said: "In the visualization, it is very clear to see what kind of article I am going to dealing with... the last conversation has lot of purple, indicating its something going to have many negative comments"; however, P3 reported that the sentiment classification was incorrect in some cases, making it less reliable. Encoding the comment length was found to be useful by P4: "The height of the bar was really useful, cz the thicker comments were generally more interesting and insightful than the shorter ones."

Users were also interested in seeing how much an author contributes to a specific topic. According to one participant, "My primary interest with the author would be to see how much they have participated back into the topic and that happens in various occasions, so I found the linking between topics and authors quite useful". P2 also found some utility in the author facet: "If I find someone's comment interesting, then I wanna know what other comments she made, and how people reacted to that." In such a scenario, linking the comments to the corresponding author is valuable. But participants also emphasized that if they had been part of the community, the author facet would have been much more useful: "If I would know some people, I would be really interested in what they are saying. But since these are random people, I don't know if I would incline to care" (P1). The participants also acknowledged that if they had been participants in the conversation, they would have been interested to know who was replying to their posts.

Preference: When the participants were asked to compare their experience using ConVis with their regular blog reading interface, the answers were generally in favour of ConVis, due to its ability to show a visual overview of the whole conversation and to allow the user to explore through facets. Moreover, the visualization tool was found to be easy to learn by the participants. According to P1: "Seeing the sort of pagination in current interfaces, you don't get the overall. I have to read through all of them." On the contrary, "Using ConVis I would read more important parts of the conversation as opposed to just people talking. I can navigate through the comments without actually reading them, which is really helpful." P5, who followed the strategy of skimming through the conversation, mentioned: "I am so much used to scroll up and down in the list of comments, but using this additional visual overview, I had a sense of where I am reading right now and what topic I am currently reading". P2 said that ConVis provides a quicker way to explore comments: "It allows me to navigate through the most insightful stuff out of five minutes which could take say 15 minutes otherwise. Actually I found many comments to be interesting towards the end of conversations, which I probably wouldn't notice if I would use my blog interface."

2.6.3 Revisiting Task Abstraction

Analyzing the user-generated summaries from the evaluation helps us to reflect on the task abstraction in Section 2.2. After mapping each sentence of the summaries to one or more possible tasks in Table 2.1, we find that some of the tasks were performed more frequently than others. All of the participants answered Q1 and Q2 in their summaries, suggesting that understanding the topics is a fundamental task. A substantial portion of each summary answers questions Q3 through Q6, which are related to the opinion variable.
We also realize that the exploratory behaviour can be largely influenced by participants' own viewpoints (Q5) and what they perceive as interesting/funny (Q10). However, the summaries reveal very little participant interest in questions specific to authors (Q7 through Q9), suggesting that being a part of the community might be highly relevant for these tasks, as mentioned by a participant. Thus, it is important to consider the characteristics of the target blog community in the design process.

2.7 Discussion

Based on the results and analysis of the informal evaluation, we discuss more generally various visualization design issues and directions for future improvements.

Improve faceted exploration: An important aspect of our visualization was to explicitly depict the relations between multiple facets of the conversation and the related comments. However, depending on the tasks, additional facets can become more useful to the participants (e.g., moderation scores, named entities), while an existing facet may become less useful (e.g., authors). In the future, we plan to devise an interactive visualization technique that allows the user to dynamically change the facets of interest and reveal relations between them.

Enhance scalability: On scalability, while ConVis can deal with conversations with hundreds of comments, additional techniques are needed for longer conversations. In some cases, when the discussion topic is very popular, the conversation can become very large, with thousands of comments. To deal with such situations, we suggest integrating additional computational methods, such as detecting high-quality comments [45], to guide the filtering and aggregation of comments, as well as applying focus+context techniques to the Thread Overview.

Need for a human-in-the-loop model: In general, the utility of visual text analytic systems can be substantially improved if more accurate natural language processing techniques are adopted. Even though the text analytic methods used in this chapter achieve significantly higher accuracy than traditional methods [70], the informal evaluation reveals that in a few cases the extracted topics and opinions are still incorrect. In particular, during the interviews users expressed a pressing need for enhancing their ability to revise the topic model according to their own information needs. In such cases, a promising approach could be to incorporate users' feedback in the text mining loop, so that the underlying models can be iteratively refined. Motivated by this experience, we have designed ConVisIT (an extended version of ConVis) by incorporating an interactive topic modeling approach. This extended interface supports the user in revising the topic model while she is exploring the conversation. We discuss this interactive topic modeling approach in detail in Chapter 4.

Further user evaluation: While the informal evaluation provided some preliminary feedback from users about ConVis, further evaluations were necessary to compare this interface with regular blog reading interfaces. For this purpose, we later conducted a summative evaluation [78] using a lab-based study to understand the effectiveness of ConVis compared to traditional blog reading interfaces, as well as to an interface that supports interactive topic modeling (i.e., ConVisIT). We discuss the results of this study in Chapter 4.

2.8 Summary

We have presented ConVis, a visual text analytic system designed to support the exploration and analysis of blog conversations.
Our approach combines novel mining methods that take advantage of conversational features with interactive visualization that supports multifaceted exploration. The participants' feedback from our informal evaluation suggests that ConVis can help the user to simultaneously explore the topics and opinions expressed in the conversation, supporting the user in finding comments of interest, even if they are buried near the end of the thread. Interestingly, ConVis is also beneficial to users who follow the traditional strategy of scrolling through the Conversation View, because the other views provide situational awareness (e.g., what topic is expected next).

Exploring a large set of conversations is arguably an even more challenging task than exploring only one conversation, because the volume and complexity of the textual data may drastically increase, and the information overload problem could be even more prevalent and serious among users [69]. Therefore, in our subsequent work, we have extended our approach to handle a large collection of asynchronous conversations, where the user is able to explore topics that are discussed over many different threads. In the next chapter, we discuss this approach for exploring a set of conversations in detail.

Chapter 3

Supporting Users in Exploring a Set of Conversations

In Chapter 2, we presented ConVis, a visual text analytics system for exploring a single conversation. We now describe how we extended the ConVis system into MultiConVis, to support users in exploring and analyzing a collection of conversations. The resulting system supports user exploration, starting from a possibly large set of conversations, then narrowing it down to a subset of conversations, and eventually drilling down to the comments of one conversation. Similarly to what we did for ConVis, the development of MultiConVis is based on the integration of NLP techniques for topic modeling and sentiment analysis with information visualizations, considering the unique characteristics of online conversations. Later in this chapter, we present a set of case studies with domain experts and a formal user study with regular blog readers, which illustrate the potential benefits of our approach when compared to a traditional blog reading interface.¹

¹ This chapter is a modified version of our paper MultiConVis: A visual text analytics system for exploring a collection of online conversations, by Enamul Hoque and Giuseppe Carenini; in Proceedings of the ACM International Conference on Intelligent User Interfaces (IUI), pp. 96–107, 2016 [60].

3.1 Introduction

With the proliferation of web-based social media, there has been an exponential growth of asynchronous online conversations discussing a large variety of popular issues like 'ObamaCare', 'US immigration reform', and 'Apple iWatch release'. Given a query, traditional blog sites only present the set of relevant blogs as a paginated list ordered by recency, without providing any high-level summary of the conversations. This navigational support is often inadequate to explore a set of blogs that may be of great interest to readers [68].

To understand the problem, let us recall the 'iPhone bending' query example introduced in Chapter 1. After the iPhone 6 was launched, some people claimed that the new phone could easily bend in the pocket. This incident triggered a lot of discussions in Macrumors [1], a blog site for Apple-related news.
Within just a few days, more than a dozen conversations with thousands of comments were generated in Macrumors, covering various related topics. In this context, we could imagine three different users who would like to explore this set of conversations. First, a potential customer who intends to buy an iPhone may want to explore these conversations to verify whether the bending issue is really serious or not. Second, a journalist may want to publish a story about what people are saying about this issue by analyzing this set of conversations. Finally, an Apple marketing analyst may want to know how the online community is responding to this issue, to make an informed decision about how to react to the rumors and possibly redesign the products. In all these cases, given the large number of conversations/comments, it would be difficult and time-consuming for a user to explore and analyze all this information with traditional blog interfaces, which only provide sequential access to conversations/comments.

In this work, we tightly couple NLP techniques for topic modeling and sentiment analysis with interactive visualizations to support the exploration and analysis of a large set of conversations, taking into account the specific characteristics of blog conversations. As we have pointed out in Chapter 1, blog conversations exhibit several unique characteristics: unlike microblogs or messaging [108], they do not have fixed-length comments; furthermore, they have a finer conversational structure, as participants often reply to a post and/or quote a fragment of other comments [70]. In this chapter, we consider these unique characteristics in devising our novel NLP and InfoVis techniques.

We built the MultiConVis system on top of ConVis (described in Chapter 2). As we move from a single conversation to a collection of conversations, critical challenges emerge from the fact that users need to deal with a much larger amount of data, at different levels of granularity. For instance, the number of topics increases drastically for a set of conversations; therefore, understanding and exploring these topics can be much more time-consuming and cumbersome. Since some of these topics are similar in their semantic meaning, grouping them into a hierarchical topic organization may support the understanding and navigation of topics more effectively.

To address this challenge, we devise a hierarchical topic modeling technique that organizes the topics within a set of conversations into multiple levels, based on their semantic similarity. The resulting topic hierarchy is intended to better support the user's understanding and navigation of the topics. We then design a visual interface that presents the hierarchical topic structure along with other conversational data, as shown in Figure 3.1.

Figure 3.1: The MultiConVis interface, showing a subset of blog conversations returned by the query 'iPhone bending' from Macrumors in November 2014. Here, the user filtered out some conversations from the list using the Timeline located at the top, and then hovered on a conversation item (highlighted row on the right). As a consequence, the related topics from the Topic Hierarchy were highlighted (left).
The main contributions of this work are:

1) A hierarchical topic modeling method over a collection of conversations. While Chapter 2 describes how to effectively extract topics from a single conversation, here we propose a method which creates a topic hierarchy for a whole collection of conversations by aggregating the topics extracted from each conversation in the collection.

2) The design and implementation of the MultiConVis interface, which supports exploration of a collection of blog conversations based on the topic hierarchy and sentiment. In essence, MultiConVis can be seen as an interface built on top of ConVis that allows the user to seamlessly switch from exploring a collection of conversations to a single conversation. In particular, MultiConVis initially visualizes all the conversations in the whole collection, next supports the user in filtering out conversations that are irrelevant to her information needs, and then allows the user to drill down to a specific conversation, which is visualized with the ConVis interface.

3) The evaluation of MultiConVis through a set of case studies and a user study investigating how the system influences user performance and subjective opinions when compared to a traditional blog reading interface similar to existing interfaces like Slashdot [2] and Macrumors [1].

3.2 Related Work

In Chapter 2, we already provided an overview of related work that primarily focused on visualizing a single online conversation. Here, we discuss research prototypes that aim to support the exploration of a large collection of conversations. These prototypes can be categorized based on the information they extract and visualize: (a) metadata of the conversations, such as timestamps, tags, and authors; (b) the results of text analysis, such as topic models and opinions.

3.2.1 Metadata Visualization

Some earlier works have focused on how to support the exploration of a blog archive using only metadata, for example, by visualizing tags and comments arranged along a time axis [68], or by providing faceted visualization widgets for visual query formulation according to time, place, and tags [36]. While these works may assist users in finding the blogs they are looking for, they are not designed to support users in understanding the actual content (i.e., the text) of these conversations. However, many tasks for blog readers, which we identified in Chapter 2, require the user to get overviews of the actual content of a collection of conversations, such as "Find out what are people feeling about X over time." Therefore, our goal is to visualize a combination of various metadata and textual analysis results that were identified as important in our user requirements analysis.

3.2.2 Topic Modeling and Visualization

In contrast to simply showing the metadata of the conversations, recently there have been some attempts to visualize the topics discussed within a collection of conversations [37, 124, 127]. A common approach is to use probabilistic topic models such as Latent Dirichlet Allocation (LDA), where topics are defined as distributions of words and documents are represented as mixtures of topics. Many of these works also consider the temporal aspects of topics by showing the evolution of topics over time. For example, Themail visualizes how topics in a collection of email conversations develop over time by arranging keywords, selected based on term frequency-inverse document frequency (TF-IDF), along a horizontal time axis [124].
TIARA [127] represents the temporal evolution of topics in an email collection by applying the ThemeRiver visualization [53], where each layer in the stacked graph represents a topic and the keywords of each topic are distributed over time. From the height of each topic and its content distributed over time, the user can see the topic evolution.

More recent works have tried to move beyond visualizing topics as a flat list by organizing them into a hierarchy [28, 40, 82]. For example, HierarchicalTopics organizes a large number of topics into a tree structure by considering the distance between the probability distributions of topics [40], and then utilizes a hierarchical ThemeRiver view to explore temporal trends of topics. Using the same algorithm, TopicPanorama builds topic hierarchies from multiple corpora (i.e., news, blogs, and microblogs), followed by matching these hierarchies using a graph-matching technique, so that the common and distinctive topics of different corpora can be visualized [82]. It combines a radially stacked tree visualization with a density-based graph visualization to facilitate the examination of the matched topic graph from multiple perspectives. Compared to these approaches, which generate static topic hierarchies, RoseRiver focused on exploring the evolutionary patterns of hierarchical topics generated at different timeframes by conveying topic merging and splitting relationships over time using Sankey diagrams [28].

Organizing topics into a hierarchy can be very useful for our work as well, because the number of distinct topics in a collection of conversations may be quite high compared to a single conversation. However, existing hierarchical topic modeling approaches are not designed specifically for conversational data. In contrast, MultiConVis creates a topic hierarchy for a collection of conversations by aggregating the topics of each conversation, and such topics are generated by taking specific characteristics of asynchronous conversations, such as reply relationships, into account [59].

3.2.3 Opinion Visualization

There is a growing interest in visualizing the opinions expressed in conversations, mostly focusing on microblogs [33, 85, 129]. Diakopoulos et al. presented Vox Civitas [33], which displayed sentiment and tweet volume over time for events discussed in microblogs to support the tasks of journalistic inquiry. TwitInfo [85] was also designed for visualizing microblogs, with a focus on providing more accurate aggregation of sentiment information over a collection of tweets. Unlike these works, OpinionFlow focused more on visualizing the spreading of opinions about a particular topic (e.g., 'US government shutdown') among participants, with a combination of a density map and a Sankey diagram [129]. Often the opinion information is summarized together with other important aspects of information spreading, such as temporal information and the connections among conversation threads and authors [133].

A critical issue when abstracting data for sentiment analysis is how to aggregate sentiment information across sentences, comments, and conversations. While all the works described above dealt with Twitter data, in which tweets are only organized as a list, here we focus on a set of much more structured blog conversations, where each conversation consists of a set of comments organized in multiple threads with reply relationships.
We exploit this additional structure when we visually represent sentiment at multiple, different levels.

3.3 User Requirements Analysis

In Chapter 2, we analyzed why and how people read blogs and used this analysis to derive the data and task abstractions. Here, we identify useful data abstractions for a set of conversations and compare them with the data abstractions for a single conversation.

In essence, the primary goals of reading blogs include information seeking, fact checking, and opinion seeking [30, 72], which require the reader to understand what topics are discussed in the conversations and what opinions are expressed on those topics. Furthermore, users often exhibit a variety-seeking behaviour, i.e., they tend to switch frequently from a topic to its sub-topics or to a completely different topic [119].

Blog readers also care about temporal aspects of the conversations [31, 57], for instance, the start and end time of a conversation, the chronological position of a comment with respect to the other comments within a conversation [13], and the volume of comments over time when exploring multiple conversations. Information about the authors of the comments is also considered to be valuable [57], especially for blogs in which the same users participate frequently.

Table 3.1 summarizes our design choices for what information our interface should display, in light of the current literature on blog readers. The rows in the table correspond to data facets and the columns to whether the facet is for multiple conversations or a single one.

Table 3.1: A summary of how facet elements are abstracted for a collection of conversations vs. one conversation.

Since the number of topics for a collection of conversations is potentially much larger than for a single conversation, all the topics within a collection are organized into a hierarchy, while the topics of each single conversation are organized as a flat list and are explicitly connected to the comments of that conversation. To support the goals related to the time facet, the volume of comments over time is computed for each conversation in the collection, whereas within each conversation the chronological position of the comments is used. For the sentiment facet, the distribution of sentiments across five polarity intervals, ranging from -2 to +2, is computed by counting how many sentences fall in each of these intervals. Here, for a collection of conversations, we compute the sentiment distribution for each conversation, whereas for one conversation we compute this distribution at a finer level, i.e., for each comment. Finally, for the authors facet, while for a set of conversations only counts of authors are computed, without providing the detailed list of authors, for one conversation the list of authors of that conversation is shown.
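To make these abstraction choices concrete, the sketch below shows the kind of per-level record they imply. This is a minimal illustration in Python; the class and field names are our own, not the actual MultiConVis data model.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ConversationSummary:
    """Facet aggregates kept per conversation in the collection view
    (illustrative names; see Table 3.1 for the actual abstraction)."""
    title: str
    topic_count: int                   # topics facet: count only
    author_count: int                  # authors facet: count only
    comments_over_time: List[int]      # time facet: comment volume per time bin
    sentiment_distribution: List[int]  # sentence counts for the 5 polarity intervals (-2..+2)

@dataclass
class CommentSummary:
    """Finer-grained facet elements kept within a single conversation."""
    author: str                        # authors facet: explicit author list
    chronological_position: int        # time facet: position among the comments
    sentiment_distribution: List[int]  # per-comment sentence counts (-2..+2)
```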
Current literature on blog reading not only inspired our data choices, but also guided the development of the MultiConVis interactive visualization techniques. Considering the exploratory nature of blog reading, MultiConVis supports the user in browsing the set of conversations and comments by means of all the key facets, namely topics, sentiment, and authors. Furthermore, the interface facilitates exploration through the facets at different levels of granularity: from all conversations, to a subset of conversations, to one conversation. For consistency, elements of the same facet across different levels of granularity have similar visual mappings in terms of color, shape, and other visual encoding channels. Finally, to facilitate the exploration and filtering of conversations, important attributes of each conversation, namely the number of topics/authors, the number of comments, and the overall sentiment distribution, are encoded as information scent [128].

3.4 System Overview

The MultiConVis system consists of four major components, as shown in Figure 3.2. Given a specified query (e.g., 'iPhone bending'), the data acquisition module invokes a blog site such as Macrumors to crawl the set of conversations obtained from the first page of the search results returned by that site. Next, the preprocessing module performs data cleaning to retain only the conversational data in the crawled pages, followed by extracting the conversational structure, i.e., reply relationships and quotations. We also use a state-of-the-art tagger [8] to tokenize the text and annotate the tokens with their part-of-speech tags. After that, the analysis module performs topic modeling and sentiment analysis over the whole set of conversations. It then aggregates both metadata and the results of text analysis at different granularity levels, as described in the user requirements analysis. Finally, the visualization module displays the results obtained from the analysis module and supports the user in interactively exploring the conversations.

Figure 3.2: Overview of the MultiConVis system.

3.5 Text Analysis

3.5.1 Topic Hierarchy Generation

Our topic modeling approach takes a collection of n blog conversations C = {c_1, c_2, ..., c_n} that satisfies a user query and generates a topic hierarchy following a bottom-up approach. In the resulting hierarchy, each node represents the cluster of sentences in the conversations that discuss the topic described by the label of the node. One could think of a top-down approach as being more suitable for generating the topic hierarchy, as it considers the whole set of conversations while generating the initial set of clusters (the roots of the hierarchy); however, we choose a bottom-up approach because in this way we are able to take into account the conversational structure extracted from each conversation. In other words, we first generate a set of topic clusters for each conversation by taking advantage of its conversational structure, and then we organize these topic clusters from all the conversations into a hierarchy. More specifically, our topic hierarchy generation involves two primary steps, as shown in Figure 3.3: 1) generate a set of topics T_i for each conversation c_i ∈ C; 2) aggregate all the T_i into a hierarchical topic structure for the whole collection.

Figure 3.3: Hierarchical topic model generation.

3.5.2 Topic Modeling Over Each Conversation

In order to generate a topic model over each conversation, we adopt the method described in Chapter 2. We briefly summarize it here, because our topic modeling method for a collection of conversations exploits similar data structures and techniques. Topic modeling of a single conversation starts by grouping the sentences of the conversation into a number of topic clusters (segmentation). Then, representative keyphrases are assigned to each of these clusters (labeling).

In essence, topic segmentation applies a Lexical Cohesion-based Segmenter (LCSeg) [49] to each thread in the conversation, as shown in Figure 3.4, where each thread represents a path from the initial message to a leaf message.
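To make the notion of a thread concrete, the sketch below enumerates root-to-leaf paths from a reply structure like the one in Figure 3.4a. This is a minimal illustration: the text only confirms the threads (A, C1, C2) and (A, C1, C5), so the remaining reply links among C3, C4, and C6 are assumed here.

```python
def extract_threads(replies, root):
    """Enumerate root-to-leaf paths in a reply graph; each path is one
    thread to which LCSeg is applied. `replies` maps a comment id to
    the ids of its direct replies."""
    children = replies.get(root, [])
    if not children:                      # leaf comment: the thread ends here
        return [[root]]
    threads = []
    for child in children:
        for path in extract_threads(replies, child):
            threads.append([root] + path)
    return threads

# Assumed reply structure for Figure 3.4a (only C1 -> C2, C5 is confirmed
# by the text): A -> C1, C3; C1 -> C2, C5; C3 -> C4, C6.
replies = {'A': ['C1', 'C3'], 'C1': ['C2', 'C5'], 'C3': ['C4', 'C6']}
print(extract_threads(replies, 'A'))
# [['A','C1','C2'], ['A','C1','C5'], ['A','C3','C4'], ['A','C3','C6']]
```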
Notice that after running the LCSeg algorithm, two sentences (e.g., s_1 and s_4) may appear together in the same segment in one thread (A, C1, C2), while falling into different segments in another thread (A, C1, C5). To consolidate all the (possibly conflicting) segmentation decisions made on each thread, we apply an efficient min-cut graph partitioning algorithm [118]. The optimal number of topics for each conversation is automatically determined by maximizing a clustering objective function proposed in [97].

Figure 3.4: a) Reply-to relationships between the initial post A and the comments C1, C2, ..., C6 of a conversation (left). Each post may comprise one or more sentences, as denoted by s_1, s_2, s_3, ..., s_10. b) The corresponding list of threads along with the segmentation results after running the LCSeg algorithm on each of these threads. Here, the segmentation boundary is denoted by '|' (right).

Topic labeling takes the segmented conversation as input and generates a set of keyphrases to describe each topic cluster in the conversation. This is done by adapting the co-ranking method proposed in [134], in which a list of the top keyphrases is extracted from a graph of words that captures the co-occurrence of each word in the topic cluster with respect to the words in the leading sentence of that cluster, as well as the position of each word with respect to the thread structure of the conversation.

Creating the Topic Hierarchy Over the Collection

This is the key computational contribution of this chapter. Once the sets of topics T_i for each conversation c_i are generated, we organize all of them into a single topic hierarchy to create a structured overview of the whole collection of conversations. To achieve this, we have devised a graph-based method similar to the one that we apply to single conversations. The main difference here is that the nodes of the graph we create are no longer sentences, but topics.

In particular, we create a weighted undirected graph G(V_C, E_C), where the nodes V_C represent the union of all the topics T_i from the set of conversations C = {c_1, c_2, ..., c_n}, and the edge weight w(x, y) in E_C between any two given topic nodes x and y is generated by computing the average similarity between all pairs of sentences in which one sentence belongs to topic x and the other one belongs to topic y. More formally, consider S_x a set of l sentences and S_y a set of m sentences for topics x and y, respectively. Then we compute the edge weight w(x, y) as follows:

w(x,y) = \frac{1}{l \times m} \sum_{s_i \in S_x,\, s_j \in S_y} \mathrm{sim}(s_i, s_j) \qquad (3.1)

Here, sim(s_i, s_j) is the measure of similarity between a pair of sentences s_i and s_j. This measure is based on the cosine similarity between s_i and s_j if topic x and topic y belong to two different conversations c_x and c_y. The same cosine similarity measure is also used when s_i and s_j are from the same conversation but never appear in the same segment in the segmentation results of the LCSeg algorithm. However, if s_i and s_j are both from the same conversation and they appear together in the same segment at least once, then the similarity is determined by k, where k is the number of times (k ≥ 1) that s_i and s_j appeared in the same segment. This is based on the intuition that two topics that are from the same conversation and have stronger cohesion in the threads of that conversation should be more likely to be clustered together than those that do not. More formally:

\mathrm{sim}(s_i, s_j) =
\begin{cases}
\mathrm{CosineSim}(s_i, s_j) & \text{if } c_x \neq c_y \\
k & \text{if } c_x = c_y \text{ and } k \geq 1 \\
\mathrm{CosineSim}(s_i, s_j) & \text{if } c_x = c_y \text{ and } k = 0
\end{cases} \qquad (3.2)

\mathrm{CosineSim}(s_i, s_j) = \frac{\sum_{w \in s_i, s_j} \mathrm{tf}_{w,s_i} \cdot \mathrm{tf}_{w,s_j}}{\sqrt{\sum_{p \in s_i} \mathrm{tf}^2_{p,s_i}} \cdot \sqrt{\sum_{q \in s_j} \mathrm{tf}^2_{q,s_j}}} \qquad (3.3)

0 \leq \mathrm{CosineSim}(s_i, s_j) \leq 1 \qquad (3.4)

Here, tf_{a,b} denotes the term frequency of term a in sentence b.²

² For the sake of simplicity, we measured the cosine similarity between two sentences based on word frequency. Nevertheless, one could replace this simple representation with more recent neural embeddings for sentences, like [75], to obtain better performance.
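A minimal sketch of Equations 3.1-3.3 follows. It assumes whitespace tokenization for the term frequencies (footnote 2 notes that the actual measure is likewise based on word frequency), and it takes the k counts from the LCSeg segmentation results as an externally supplied lookup.

```python
import math
from collections import Counter

def cosine_sim(s_i, s_j):
    """Eq. (3.3): cosine similarity over raw term frequencies, with
    whitespace tokenization as a stand-in for the real tokenizer."""
    tf_i, tf_j = Counter(s_i.lower().split()), Counter(s_j.lower().split())
    dot = sum(tf_i[w] * tf_j[w] for w in tf_i.keys() & tf_j.keys())
    norm = (math.sqrt(sum(v * v for v in tf_i.values())) *
            math.sqrt(sum(v * v for v in tf_j.values())))
    return dot / norm if norm else 0.0

def sim(s_i, s_j, same_conversation, k):
    """Eq. (3.2): k is the number of thread segments in which the two
    sentences co-occurred (only meaningful within one conversation)."""
    if same_conversation and k >= 1:
        return k
    return cosine_sim(s_i, s_j)

def edge_weight(sentences_x, sentences_y, same_conversation, k_of):
    """Eq. (3.1): average pairwise similarity between the sentences of
    topics x and y; k_of(si, sj) supplies k for a sentence pair."""
    total = sum(sim(si, sj, same_conversation, k_of(si, sj))
                for si in sentences_x for sj in sentences_y)
    return total / (len(sentences_x) * len(sentences_y))

# Toy usage: two topics from different conversations, so k is always 0.
w = edge_weight(["the phone bends easily"],
                ["my phone bends in the pocket"],
                same_conversation=False, k_of=lambda a, b: 0)
```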
Once we have built the graph G(V_C, E_C), we apply the same graph partitioning algorithm used in topic segmentation for a single conversation, i.e., the approximate solution to n-Cut [118], on G(V_C, E_C). As a result, topic nodes that are most similar, i.e., strongly connected in G(V_C, E_C), will form n different clusters. Each of these clusters can be interpreted as a parent topic (in the topic hierarchy) of all the topic nodes that form that cluster. Here, the number of clusters n is automatically determined by maximizing a clustering objective function proposed in [97].

For the final step of topic labeling, we assign a set of keyphrases to each parent topic by taking all the sentences from all the child topic nodes under it, and by then extracting and ranking keyphrases from all those sentences. This process is similar to the topic labeling method described for a single conversation, except that, given the absence of a thread structure between multiple conversations, we modify the ranking process by creating a graph that only captures word co-occurrence relationships.

3.5.3 Sentiment

For sentiment analysis, we apply the Semantic Orientation CALculator (SO-CAL) [121], which has been shown to work well on user-generated content. SO-CAL computes sentiment polarity as numeric values. First, we generate the polarity for each sentence of the conversation using SO-CAL. We defined five different polarity intervals (-2 to +2) and aggregate the results at various levels. For instance, at the level of a single conversation, for each comment we count how many sentences fall in each of these polarity intervals to compute the polarity distribution for that comment. Similarly, when dealing with a set of conversations, for each conversation we count how many sentences fall in each of these five polarity intervals to compute the polarity distribution for that conversation.
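As an illustration of this aggregation, the sketch below bins per-sentence SO-CAL scores into the five intervals. The exact interval boundaries are not specified in the text, so clamping scores to [-2, 2] and rounding to the nearest integer bin is an assumption made here.

```python
def polarity_distribution(sentence_scores):
    """Count how many sentences fall into each of the five polarity
    intervals. Binning by clamping to [-2, 2] and rounding is an
    assumption; the thesis does not give the exact boundaries."""
    counts = {interval: 0 for interval in (-2, -1, 0, 1, 2)}
    for score in sentence_scores:
        counts[max(-2, min(2, round(score)))] += 1
    return counts

# Per-comment distribution within one conversation...
comment_dist = polarity_distribution([-1.3, 0.2, 1.8])
# -> {-2: 0, -1: 1, 0: 1, 1: 0, 2: 1}
# ...and the conversation-level distribution in the collection view is
# the same count applied to all sentences of the conversation.
```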
3.6 MultiConVis

In order to explore various design choices, we carried out an iterative design process, starting from early mockups and prototypes and moving to a fully functional system. Throughout this process, we performed formative evaluations [78] to identify potential usability issues and to iteratively refine the prototype. We now present the final design of the MultiConVis interface,³ along with justifications for the key design decisions based on our user requirements analysis and the InfoVis literature.

³ A video demonstration of MultiConVis is available here: https://goo.gl/ZmVYks.

3.6.1 Visual Encoding

Facets: As mentioned earlier, a key design goal of MultiConVis is to facilitate the exploration of a set of conversations at multiple levels of granularity, while maintaining consistent visual mappings across different levels. We maintained consistency in the visual encodings across different levels as follows: 1) Sentiment distributions are represented in the same way (as a stacked bar) for a conversation, for a topic, as well as for a comment (see Figure 3.5a). A set of five diverging colors was used in a perceptually meaningful order, from purple (highly negative) to orange (highly positive), to visualize the distribution of sentiment orientations at all three levels of granularity.⁴ 2) For all the attributes related to the topics/authors facet, the same color coding was used across different levels (see Figure 3.5b).

Figure 3.5: The main visual encodings in MultiConVis: a) sentiment distribution is shown as a stacked bar; b) the visual encoding of topics changes according to different user interactions; c) the visual encoding of a set of aggregated metadata and text analysis results for a conversation.

⁴ The orange and purple colors were selected instead of the standard green and red to avoid color blindness effects.
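For illustration, such a five-point diverging scale could be encoded as follows; the dissertation does not list the exact colors used, so the hex values below are assumed from the 5-class ColorBrewer PuOr palette.

```python
# Diverging colors for the five polarity intervals, purple (highly
# negative) to orange (highly positive). The hex values are an
# assumption taken from the 5-class ColorBrewer PuOr scheme, not
# from the thesis itself.
POLARITY_COLORS = {
    -2: "#5e3c99",  # highly negative
    -1: "#b2abd2",
     0: "#f7f7f7",  # neutral
     1: "#fdb863",
     2: "#e66101",  # highly positive
}
```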
All conversations: Initially, when the user starts exploring the whole collection of conversations, MultiConVis displays three components, as shown in Figure 3.6: 1) a Topic Hierarchy; 2) an overview of the set of conversations (Conversation List); and 3) a Timeline View showing the volume of comments of the whole collection over time. These three components are interactively coordinated, so that any operation in one view is reflected in the other views.

The Conversation List shows the current set of conversations, where each item in the list represents a set of aggregated metadata and the results of text analysis for the corresponding conversation (see Figure 3.5c). In particular, we encode the following attributes of each conversation: 1) the overall sentiment distribution using a stacked bar, 2) the number of comments, which is encoded as the height of this stacked bar, 3) the counts of topics and authors as horizontal bars, and 4) a sparkline that represents the volume of comments over time in a more space-efficient way [50]. In addition, the title and a text snippet of the conversation are shown to the right of its visual summary. Overall, these attributes summarize the set of conversations, facilitating the discovery of interesting subsets of conversations that are of interest to the user.

Figure 3.6: A snapshot of MultiConVis for the 'iPhone bending' dataset: the Topic Hierarchy represents the set of topics and their sub-topics as an indented tree (left); the Conversation List shows a set of aggregated metadata and text analysis results for each conversation (row); the Timeline at the top shows the volume of comments over time for all conversations.

The Topic Hierarchy visually conveys all the topics in the whole collection of conversations using an indented tree representation. Here, topics are sorted chronologically within each level of the hierarchy. Each topic node is represented by the top keyphrase label returned by the topic modeling method; however, when the user hovers on a topic, additional keyphrases are also shown to provide more context about that topic. The font size of a topic node represents how much it has been discussed compared to other topics. We present the Topic Hierarchy as an indented tree, where the parent-child relationship is represented by relative vertical position along with horizontal position. We made this choice because an indented tree representation is much more compact than explicitly showing hierarchical links between topic nodes.

3.6.2 Multi-level Exploration

From the whole collection to subsets of conversations: While the user initially gets an overview of all the conversations in the collection, her subsequent goal is to find the subset of conversations that are more interesting or relevant, given her current information needs. We support this goal by providing a set of interactive features: linked highlighting, selection, filtering, and reordering. The Timeline View, shown in Figure 3.6, allows the user to quickly filter out conversations that do not fall within the time range in which the discussions were most active or relevant. In addition to filtering, the user can reorder the set of conversations based on the following attributes: number of topics/authors/comments, sentiment distribution, and date of the first post of a conversation.

To promote exploration based on the topic facet, we provide coordinated highlighting and selection of conversations by topic. For example, hovering on a topic highlights all the conversations where this topic was discussed, and conversely, hovering on a conversation temporarily highlights topics in the Topic Hierarchy. Moreover, when the user selects a topic by clicking on it, a vertical outline is drawn alongside the related conversations, allowing the user to see the conversations in which this topic was discussed even when she is exploring different conversations/topics. Throughout the filtering and selection processes, the representation of various attributes of both topics and conversations serves as information scent, thus enhancing the ability of the user to navigate and filter data more effectively [128].

Often, as the user finds a subset of conversations that are relevant to her information needs, she may become interested in more detailed information about them, for instance, the temporal evolution of sentiment for each conversation. We provide such a feature based on user interactions: as the user clicks on the 'Show timeline' button, the sentiment distribution of comments over time is represented as a stacked area chart within each conversation item in the list (see Figure 3.7). This helps the user understand temporal patterns of sentiment in different conversations, supporting her in fulfilling information needs related to the time facet.

Figure 3.7: A conversation from the 'iPhone bending' dataset, showing a stacked area chart representing how the sentiment distribution evolves over time.

Drill down to one conversation: As the user continues her exploration, she may become particularly interested in a specific conversation. In this case, she can drill down into that conversation with the ConVis interface, which was designed to explore a single conversation (described in Chapter 2) [59]. Here, an important design question arises: once the exploration has reached a single conversation, should we show ConVis along with both the Conversation List and the Topic Hierarchy, so that the user can simultaneously glance at all of them? Notice that showing all the levels would be extremely challenging because of horizontal space limitations. However, we found this not even to be necessary. Our initial formative evaluations and case studies indicate that users do not need to jump back and forth to the Conversation List while exploring a single conversation.
On the contrary, users tend to spend most of the time reading specific comments of the conversation they have decided to focus on before going back to the Conversation List. In light of this, when the user drills down into one conversation, the Conversation List is replaced with the ConVis interface, as shown in Figure 3.8.

Now, we briefly describe how the visualization components of the ConVis interface interact with the other views of MultiConVis (a more detailed description of ConVis is provided in Chapter 2). Recall that ConVis consists of an overview (Thread Overview) of the conversation along with two primary facets, topics and authors, which are presented circularly around this overview. Once ConVis is displayed within MultiConVis, the Topic Hierarchy over the whole collection is still shown, to provide helpful context to the user in understanding the relationship between the topics of the selected conversation and the topics of the other conversations. As shown in Figure 3.8, the topics of the selected conversation displayed with ConVis are explicitly linked to the ones in the Topic Hierarchy.

Figure 3.8: As the user selects a particular conversation, the Conversation List is replaced by the ConVis interface, where the Thread Overview visually represents the whole conversation, encoding the thread structure and how sentiment is expressed for each comment (middle); the Facet Overview presents topics and authors circularly around the Thread Overview; and the Detail View presents the actual conversation in a scrollable list (right). Here, topics are connected to their related comments as well as to their parents in the Topic Hierarchy via curved links.

The user can explore a conversation using the interactive features of ConVis, such as hovering on and selecting a topic of interest. While exploring a topic, she might become interested in knowing whether similar topics are discussed in other conversations. At any point, the user can look at the Topic Hierarchy to see which topics are similar to her current topic of interest but not discussed in this conversation. For instance, when the user is exploring the topic 'Thin metal' in the current conversation, she may select a related topic labeled 'Structural issue' in the Topic Hierarchy, which results in abandoning the ConVis interface and switching back to the Conversation List, where the conversations related to 'Structural issue' would be highlighted. Finally, at any time the user can return to the Conversation List by clicking on the 'Back' button.

3.7 Implementation

The data acquisition, preprocessing, and analysis components were developed in Python, with a server-side component (in PHP) that feeds the data to the visualization pipeline. The visual interface was implemented using a combination of HTML and JavaScript (using the D3, jQuery, crossfilter, and dc.js libraries).

3.8 Evaluation

We evaluated the MultiConVis interface in two different ways: 1) case studies with different domain experts, and 2) a formal user study with regular blog readers. While the case studies provided qualitative evidence for the utility of the MultiConVis system, the user study allowed us to compare the system with a traditional interface.
Note that ConVis, the interface for single conversations embedded in MultiConVis, had already been evaluated (as described in Section 2.6 and Section 4.6.1), which showed that ConVis outperformed traditional interfaces along several subjective metrics (e.g., usefulness, enjoyability).

3.8.1 Case Studies

We conducted case studies with three users whose professions are quite diverse, but who come from populations that could all arguably benefit from MultiConVis:

U1: a regular blog reader who visits the Macrumors blog site several times a week. He was interested in exploring the conversations returned by our 'iPhone bending' query. His primary goal was to verify whether the problem of 'iPhone bending' reported by some customers was really serious or not.

U2: a graduate student in the school of Journalism, who contributes to local newspapers about recent political issues. He had a strong interest in our dataset about the recent 'ObamaCare health reform'. His primary goal was to understand and summarize the key opinions expressed by the participants about the ObamaCare health reform.

U3: a business analyst in a social media company, where she often needs to analyze a large number of conversations to understand how customers react to newly released products. Her goal in the study was to explore conversations about the 'iWatch release' to identify comments that express negative opinions about the product, a task that matches what she performs on a regular basis for her company.

For the purpose of the case studies, we collected three different datasets from two different blog sources, Macrumors [1] (a technology news blog site dedicated to the discussion of recent news and opinions relating to Apple Inc.) and Daily Kos [3] (a political analysis blog site), between September and December 2014. To create each dataset, we provided a query to the blog site to retrieve the set of conversations that appear on the first page of the search results.

For each case study, we analyzed the results by triangulating between multiple data collection methods, including observations, notes taken by participants during the analysis session, and semi-structured interviews. In addition, we logged interface actions to better understand the usage patterns.

We now report the primary results of the case studies. The key findings were that: (a) all three users relied on the topic hierarchy to accomplish their task, (b) each user used the hierarchy differently, and (c) all users found the topic hierarchy extremely useful. For instance, while the blog reader started his exploration by quickly scanning through the topics in the hierarchy and then going back and forth between topics and conversations, the journalist explored the topics in the hierarchy more systematically, exploring all the comments about one topic before moving to a new one. Still differently, the business analyst started by skimming through the titles of the conversations. But as she was skimming through the conversations, she also kept an eye on the topics that were highlighted for each conversation in the topic hierarchy. In this way, she identified controversial topics that were intensely debated in recent conversations.

Overall, the semi-structured interviews revealed that users were very satisfied with the interface. In particular, U1 said, "The comments about that chemical acid bath was buried down in the middle of one conversation, which I don't think I would have noticed with a regular interface.
Using MultiConVis, I was able to pick this topic from the hierarchy and then jumped into the related comments without having to read the entire conversations...." U2 found the topic hierarchy to be very helpful in supporting a systematic exploration of the conversations by organizing the key opinions into meaningful topical groups. More interestingly, he realized the potential utility of the MultiConVis system for other exploratory tasks that he would like to perform: "This tool could be not only useful when I want to write a story, but also to prepare for interviewing a policy maker, or a politician by quickly understanding what topics are triggering the most interesting or controversial discussions in the public spheres." Finally, U3 anticipated that this tool could be very useful for understanding what features of their products worked (or didn't work) and then revising the products accordingly: "The MultiConVis interface would definitely help me to understand the requirements and needs of my customers more effectively. Our current way is just to skim through the comments, often missing the important feedback from customers ...but this interface can help me identify what are the biggest concerns from the customers and get clues about the ways to satisfy their needs."

3.8.2 User Study

We ran a formal user study to evaluate the effectiveness and usability of the MultiConVis interface compared to an interface representative of traditional interfaces for blog reading. The aim of the user study was to answer the following two questions: (1) When we compare MultiConVis with the traditional interface for exploring a set of conversations, is there any difference in user performance and subjective reactions? (2) What specific features of the MultiConVis interface (e.g., Topic Hierarchy, Timeline) are perceived as more/less beneficial by the potential users?

Methodology

Since the first research question requires a comparison between two different user interfaces, we conducted a summative evaluation through controlled experiments [78]. The study was designed with two interfaces as conditions: a) the traditional interface for blog reading, and b) MultiConVis. Here, the traditional interface shows a set of blog conversations as a linear list, where each item represents a set of metadata of the conversation, such as title, number of comments, and posting date (see Figure 3.9). The user can click on any conversation in the list, which results in showing all the comments of that conversation using an indented list representation. In addition, we provided a set of interactions that are common in most blog reading interfaces, i.e., searching for terms and sorting conversations by attributes (e.g., number of comments). A within-subject design was used, with interface as the within-subject factor, allowing us to directly compare the measures of each participant with respect to both interfaces. Finally, all study aspects, including instructions and setup, went through several iterations of evaluation and pilot testing.

Figure 3.9: The baseline interface, which initially presents the collection of conversations as a linear list, showing a set of metadata for each conversation in the list.

Procedure and task

At first, a pre-study questionnaire was administered to capture demographic information and prior experience with exploring blog conversations.
Then, the participant went through the following steps for each of the two interfaces: 1) in a scripted warm-up session, the interface was introduced to the participant using a sample dataset; 2) the participant was then asked to perform a task based on a set of conversations. For each interface, a different set of conversations was provided.

Task: Considering the open-ended nature of blog reading, no specific set of questions was given. Instead, the participant was asked to explore a set of conversations about the given query and then write a single summary of what she thought were the major discussion points and most insightful comments within the conversations. The study lasted approximately 60 minutes and each participant was paid $15 to participate.

We selected two different datasets crawled from the Macrumors site for testing ('iPhone bend' and 'iPad release'). The number of conversations in the datasets was kept the same (16 conversations in each dataset) to avoid potential variations due to the amount of conversational data. Also, to counterbalance any potential learning effects due to the order of exposure to specific interfaces and datasets, the order was varied using a 2 x 2 Latin square. During the study, we collected both quantitative data, such as task completion time, and qualitative data, such as observations and questionnaires. Finally, a post-study questionnaire, followed by a semi-structured interview, was administered regarding the user's experience with the two interfaces.⁵

⁵ The study materials for the user study can be found in Appendix A.

Participants

We conducted the study with 16 users (aged 18-37, 6 females) who had considerable experience with reading blogs. The participants held a variety of occupations, including journalists, engineers, system analysts, and students at both graduate and undergraduate levels. They were recruited through emails and social networks (Facebook and Reddit posts).

Results Analysis

After completing the task with each interface, participants rated six different measures in the form of in-study questionnaires. Since these measures were rated using a standard 5-point Likert scale, standard parametric analysis was not suitable due to the lack of normality [71]. Instead, we performed nonparametric analyses, i.e., Mann-Whitney U tests, on the responses for each of these measures.
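For reference, this kind of comparison is straightforward to reproduce with standard tooling; the sketch below uses SciPy's mannwhitneyu on invented ratings (not the study data). Note that SciPy reports the U statistic, whereas the results below are quoted as Z approximations.

```python
from scipy.stats import mannwhitneyu

# Illustrative 5-point Likert ratings for one measure (16 participants
# per condition); these numbers are invented, not the study data.
multiconvis = [5, 4, 5, 4, 5, 4, 5, 3, 4, 5, 4, 5, 5, 4, 4, 5]
traditional = [3, 3, 4, 2, 3, 3, 4, 3, 2, 3, 4, 3, 3, 2, 3, 3]

u_stat, p_value = mannwhitneyu(multiconvis, traditional,
                               alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.4f}")
```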
The results of these questionnaires are presented in Figure 3.10. The pairwise comparisons using Mann-Whitney U tests indicate that MultiConVis was superior on five of the six measures: usefulness (Z = −1.823; p < .05); enjoyable to use (Z = −3.697; p < .01); finding insightful comments (Z = −3.95; p < .01); finding major points (Z = −2.909; p < .01); and enabling a more informative summary (Z = −3.915; p < .01). For the remaining measure, ease of use, MultiConVis was still rated higher than the traditional interface; however, the result was not significant. This is interesting, because MultiConVis appears to be as easy to use as the other interface in spite of its more complex interface features.

Figure 3.10: Average rating of the interfaces by the participants on six different measures. Longer bars indicate better ratings.

Interface features: Each participant was also asked a set of questions regarding the usefulness of specific features of the MultiConVis interface. From Figure 3.11, we can readily see that the majority of the responses were dominated by positive ratings. Among the interface features, the Topic Hierarchy received the most positive ratings (strongly agree: 9, agree: 6), followed by the visual summary of each conversation and interactive filtering by timeline.

Figure 3.11: Responses to statements regarding specific features of the MultiConVis interface.

Time: The average time required to complete the tasks was not significantly affected by the interfaces, with MultiConVis and the traditional interface requiring 1065 ± 249 and 1029 ± 204 seconds, respectively.

Overall Preference: In the post-study questionnaire, participants were asked which system they preferred for exploring a collection of conversations. 75% of the participants indicated a preference for MultiConVis, whereas 25% preferred the traditional interface. Many of the participants who chose MultiConVis indicated that the utility of the Topic Hierarchy was the primary reason for their preference: "By having a topic hierarchy of the relevant topics, as well as highlighting which conversation refers to which topic, it was very easy to filter out the blogs that were not relevant." (P8). They also found the visual summary provided for each conversation to be very useful: "The summary offered by this visualization is quite impressive and throws a lot of instant information." (P2). Additionally, the sentiment distribution over time "...made it very easy to see how opinions changed over time. While investigating bend gate it was clear how the community opinion changed after the event had played out in the media" (P4).

Those who preferred the traditional interface indicated that they liked its familiarity: "I preferred the older style of interface mainly because it's what I'm more familiar with..." (P1). They also pointed out that sometimes the topic hierarchy was inaccurate; for instance, topic labels did not always make sense to them: "..maybe with better tagging I'd find it (MultiConVis) more useful..." (P1), and "the keywords weren't necessarily the most useful ones or the relevant ones" (P5). We have taken these comments into account to improve our approach in Chapter 4, by introducing a human-in-the-loop topic model.

3.9 Discussion

We now discuss the summary of findings from the user study, as well as the limitations of this type of user study.

3.9.1 Summary of Findings

Our case studies demonstrate that the system can be useful in a variety of contexts of use, while the formal user study provides evidence that the MultiConVis interface supports the user's tasks more effectively than traditional interfaces. In particular, all our participants, both in the case studies and in the user study, appeared to greatly benefit from the topic hierarchy and the high-level overview of the conversations. The user study also shows that the MultiConVis interface is significantly more useful than the traditional interface, enabling the user to find insightful comments among thousands of comments, even when they were scattered across multiple conversations, often buried near the end of the threads. More importantly, MultiConVis was preferred by the majority of the participants over the traditional interface, suggesting the potential value of our approach for combining NLP and InfoVis.

3.9.2 Evaluation Methodology

In this work, we conducted a lab-based user study to understand the potential effectiveness of the MultiConVis interface. Even though a controlled study is suitable for comparing different interfaces, it may not accurately capture real-world scenarios [78].
Although we carefully recruited participants who were frequent blog readers, various settings were still controlled to make a fair comparison among the interfaces (e.g., participants were not allowed to choose a conversation according to their own interest).

In order to enhance the ecological validity of our evaluations [18], a possible approach would be to perform Web-based studies to observe how the system is used by real users to satisfy their own information needs. In Chapter 5, we will describe how we ran a user study in a Web-based environment, where participants worked in their own settings performing their own tasks. This study was conducted among hundreds of users who performed information seeking tasks by exploring a set of conversations in a community question answering forum. Such a study also gives us the advantage of collecting interaction logs from a large number of users, providing deeper insights that are arguably more generalizable than those from a lab study.

3.10 Summary

MultiConVis is an interactive visual text analytics system for exploring a collection of blog conversations. Unlike traditional systems, MultiConVis takes the unique characteristics of online conversations into account to tightly integrate NLP and InfoVis techniques. The resulting visual interface aggregates data across different levels, supporting a faceted exploration starting from a whole set of conversations, to a subset of conversations, to one conversation.

While the topic hierarchy was found to be very useful, in a few cases the extracted topics were either noisy or did not match the user's current information needs. To deal with this problem, we have devised an interactive topic hierarchy revision approach, where the user can provide feedback to the system so that the revised topic hierarchy better matches her tasks and mental model. In the next chapter, we will discuss this interactive topic revision approach in detail.

Chapter 4
Interactive Topic Modeling for Exploring Online Conversations

In Chapters 2 and 3, we presented two visual text analytics systems for exploring online conversations. In both systems, we applied topic modeling techniques to summarize the primary themes discussed in a conversation (or a set of conversations). However, from the evaluations with real users, we found that the results of the topic model were sometimes noisy or, even if accurate, did not match their current information needs.

To address this problem, in this chapter we propose novel topic modeling methods for asynchronous conversations that revise the model on the fly on the basis of users' feedback. We then integrate these methods within our visual interfaces (i.e., ConVis and MultiConVis) to create two new interfaces, ConVisIT and MultiConVisIT, where IT stands for Interactive Topic modeling. The goal of incorporating the user's feedback within the visual interface is to support the user in exploring conversations, as well as in revising the topic model when the current results are not adequate to fulfill the user's information needs. Finally, we discuss two lab-based studies with real users that compared ConVisIT and MultiConVisIT with interfaces that do not support human-in-the-loop topic modeling.¹

¹ Portions of this work were published in ConVisIT: Interactive Topic Modeling for Exploring Asynchronous Online Conversations; Proceedings of the ACM International Conference on Intelligent User Interfaces (IUI), pp. 169-180, 2016 [60]. An extended version of this paper has also appeared as a journal paper: Interactive Topic Modeling for Exploring Asynchronous Online Conversations: Design and Evaluation of ConVisIT, by Enamul Hoque and Giuseppe Carenini; ACM Transactions on Interactive Intelligent Systems (TiiS), 6(1):7:1-7:24, Feb. 2016 [61].
4.1 Introduction

While topic models can provide an attractive solution to understanding large conversations, they may not always be useful to the end users [17, 59, 66]. This could be due to different reasons. For instance, the current information seeking tasks may require a topic model at a different level of granularity; e.g., if the user needs more specific information about 'ObamaCare', she might be interested in exploring its potential sub-topics, such as 'health insurance', 'healthcare cost', and 'drugs'. Also, the interpretation of topics may vary among users according to their expertise and mental model. In a topic annotation study of blog conversations, human annotators sometimes disagreed on the number of topics and on the assignment of sentences to topic clusters [70]. For instance, for one of the conversations from their corpora, one annotator produced 22 topics, while another annotator reported only 8 topics. Furthermore, the results of automatic topic modeling can be simply incorrect, in the sense that the generated topics would not make sense to any user [22, 70]. For example, two semantically different topics, 'Obama health policy' and 'job recession', might be wrongly grouped together under the misleading topic 'Obama recession'.

Similarly, when we organize topics into a hierarchy, the resulting organization may not always be useful to the users. For example, when a new product is launched, a business analyst may want to organize topics based on people's opinions about 'sales' and 'customer service', whereas a potential buyer may want to categorize topics based on the 'new features' of the product. In other cases, the topic organization might not be accurate, i.e., two semantically different topics might be wrongly placed under the same parent topic. For example, a parent topic named 'iPhone bending vulnerability' may have two children, 'iPhone 6' and 'longer battery life', which might be completely unrelated in the context of the discussion.

In this chapter, we present an interactive topic modeling framework that can support the user in exploring conversations by relying on topics that make sense to her, that are semantically coherent, and that match her expertise, mental model, and current task. In our framework, the user can revise the topic model while she is exploring the conversation.

Figure 4.1: Interactive topic modeling framework for exploring asynchronous conversations.

As illustrated in Figure 4.1, given the asynchronous conversation(s), the system generates an initial topic model (a linear list of topics, or a set of topics organized into a hierarchy), which is presented in the visual interface along with other conversational data. The interface then supports the user in exploring the conversation. However, whenever the user realizes that the current topic model is not helping her, she can provide topic revision feedback to the system through interactions. Subsequently, the system updates the topic model accordingly and the new results are shown in the interface.
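A minimal sketch of this loop is shown below; the class and method names are purely illustrative (the actual system couples the topic modeling pipeline of Chapters 2 and 3 with the ConVisIT/MultiConVisIT interfaces rather than exposing an API like this).

```python
class InteractiveTopicModeling:
    """Illustrative sketch of the feedback loop in Figure 4.1;
    all names and stubs here are assumptions, not the real API."""

    def __init__(self, conversations):
        self.conversations = conversations
        # Placeholder: the real system runs the segmentation and
        # labeling pipeline of Chapters 2 and 3 here, producing either
        # a flat topic list or a topic hierarchy.
        self.model = self.build_initial_model(conversations)

    def build_initial_model(self, conversations):
        return {"topics": []}  # stub for the initial topic model

    def on_revision_feedback(self, operation, targets):
        """Called by the visual interface when the user issues a topic
        revision (e.g., operation='split' or 'merge' on some topics);
        the updated model is then redrawn in the interface."""
        self.model = self.revise(self.model, operation, targets)
        return self.model

    def revise(self, model, operation, targets):
        return model  # stub for the revision methods of Section 4.3
```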
The primary contributions of our work are three-fold:

1) A novel interactive topic modeling approach specifically devised for asynchronous conversations. Existing systems (e.g., [19, 66, 81]) were mainly devised for generic documents, without considering the unique features of conversations. In contrast, we analyze the information seeking tasks in our target domain to select a minimum set of topic revision operations that are critical to the user. Then, we devise computational methods for each of these operations to be performed by the system.

2) We designed a set of interactive features that allow the user to revise the current topic model. In response, the interface updates and re-organizes the modified topics by means of intuitive animations, so that the user can better fulfill her information needs.

3) We conducted two lab-based summative studies to assess how user performance and experience change when a human-in-the-loop topic modeling approach is introduced in our visual interfaces for exploring both a single conversation and a collection of conversations.

The remainder of this chapter is organized as follows. First, we provide an overview of related research on interactive topic modeling. Next, we present our approach for interactive topic modeling and the specific interactive visualization features by which the user can revise the topic model. This is followed by a description of the user study, along with a detailed analysis of the results. Finally, we discuss the overall findings and outline directions for future work.

4.2 Related Work

Several approaches and tools have been proposed in the literature for incorporating human feedback within the topic model generation process, which we discuss below.

4.2.1 Human-in-the-loop Topic Model

Since system-generated topic models can be noisy and/or may not match users' current information needs, some recent works have investigated how user supervision can be introduced to improve the results. The main focus has been on answering the following two research questions: 1) How to revise the topic model given the user feedback? 2) How to best support the user in expressing such feedback within a visual interface?

To answer the first question: in the dominant LDA topic modeling framework, the original unsupervised LDA method was modified to allow for the introduction of human supervision [11, 66, 105]. For instance, Andrzejewski et al. incorporate the user's domain knowledge in LDA by adding constraints in the form of must-links (enforcing that sets of words must appear together in the same topic) and cannot-links (enforcing that sets of words must be in different topics) using a Dirichlet forest prior [11]. However, this method requires rerunning Gibbs sampling from scratch after a set of constraints is added, leading to high latency. Since such latency is undesirable for real-time interactions, Hu et al. propose a more efficient inference mechanism that aims to minimize the user's waiting time [66]. More recently, another variant of LDA was proposed that also incorporated must-link and cannot-link constraints [130]; however, these constraints are applied at the document level instead of at the word level.
The purpose of applying such constraints is to improve topic model stability, by minimizing the changes to the topic assignments of old documents when the model is updated to take new documents into account.

Unfortunately, all these approaches were designed for non-conversational documents. In contrast, an asynchronous conversation has some unique features: for instance, participants often reply to a comment or quote portions of earlier comments, creating a conversational structure. It has been shown that by utilizing these unique features, the accuracy of the topic model on conversations can be improved over traditional document-centric topic models [70]. Therefore, we devise a new interactive topic modeling framework that is designed to take advantage of conversational features.

Previous work has addressed the question of how a visual interface can support the user in revising text analytic models while exploring a set of documents [19, 23, 81]. In their seminal work, Pirolli et al. presented the Scatter/Gather system, where the user could select a document cluster and then ask the system to re-cluster it to analyze its sub-clusters [102]. More recently, Chuang et al. extended Termite [20], which visualizes the term-topic distributions produced by LDA, to allow the user to revise the model by clicking on words to promote or demote their inclusion/prominence in a topic [23]. Similarly, Lee et al. visualize topic modeling results from LDA, and allow the user to interactively manipulate the topical keyword weights and to merge/split topic clusters [81]. Even more recently, user feedback was incorporated through a scatter plot visualization that steers a semi-supervised non-negative matrix factorization (NMF) method [100] for topic modeling [19]. The authors show that the NMF-based approach has faster empirical convergence and offers more consistency in the results than traditional LDA-based approaches. They visually present each topic cluster and then allow the user to directly manipulate the documents and keywords within each cluster to specify topic revisions. A fundamental limitation of most of these works is that the visual interfaces for interactive topic modeling were not evaluated with real users. Therefore, a set of critical research questions remained unanswered. For instance, would users really be interested in performing all the operations provided by such complex interactive visualizations? Which operations are actually useful to users for performing exploratory tasks in a specific domain? To answer these questions, we applied a systematic design approach, where we first identified the set of topic revision operations that are most useful according to our task analysis, and then performed a user study to measure the utility of these operations.

4.2.2 Interactive Topic Hierarchy Revision

There have been some earlier works on designing systems for revising hierarchical structures such as taxonomies and ontologies [10, 99]. For instance, ReTAX is a taxonomy revision system that takes as input a pre-established taxonomy and some new items, and then uses a set of consistency rules to detect inconsistencies in the hierarchy and generate refinements to the hierarchy that resolve them.
More recently, Nikitina et al. present a technique for ontology revision, where the system presents a set of propositions (for example, a is a subclass of b) to the user, and then, based on the user's feedback, revises the underlying ontology accordingly [99].

Unlike revising taxonomies or ontologies, the revision of a topic hierarchy has rarely been studied. One notable exception is the work from Dou et al. [40], which allows the user to modify a hierarchical topic structure from a visual interface. However, such modification does not invoke the underlying topic modeling system; instead, the revised topic structure simply becomes visible in the interface. Moreover, even though they considered the split operation critical for improving the current topic model, this operation was not supported in their work.

4.3 Interactive Topic Modeling System

As illustrated in Figure 4.1, our interactive topic modeling system performs two primary functions: 1) generating the initial topic model, and 2) revising the topic model based on user feedback. We have already discussed how to generate the initial topic model in Chapters 2 and 3. We now discuss in detail how the system revises the topic model generated from a single conversation, as well as the topic model generated from a collection of conversations.

4.3.1 Interactive Topic Revisions of Topic Models for a Single Conversation

Although the initial topic model generated by our approach has been found to be more accurate than models generated by traditional methods for non-conversational text [70], the extracted topics may still not always match the user's information needs. Depending on the user's mental model and current tasks, the topic modeling results may not be adequate. For instance, more specific topics may be more useful in some cases, and more generic ones in other cases. Therefore, we incorporate a set of topic revision operations by which users can iteratively modify the initial topic model to better fulfill their information needs.

Since it may take some effort from the users to express different topic revision operations, it is important to identify the minimal set of operations that would be both intuitive and sufficient to support the user's tasks [23]. For this purpose, we first considered eleven different possible topic revision operations, listed in Table 4.1, based on a review of existing work on interactive topic modeling [11, 19, 66, 81]. Next, we prioritized the operations based on the following criteria, ordered by their importance:

1) Task relevancy: To what extent is this operation relevant to the tasks involved in exploring a conversation, as identified in [59]?
2) Topic model relevancy: Is this operation applicable to our topic model approach?
3) Redundancy: Is this operation already covered by other operations, which are stronger on the previous two criteria?

| No | Operation | Why? | Task relevancy | Topic model relevancy | Redundancy | Reference |
|----|-----------|------|----------------|-----------------------|------------|-----------|
| 1 | Split a topic | This topic is too generic | high | yes | no | [19, 81] |
| 2 | Merge by joining | These topics are talking about similar things | high | yes | no | [19, 81] |
| 3 | Merge by absorption | A sub-topic is more related to a different topic than its current parent topic | high | yes | no | [19] |
| 4 | Split by keyword | This keyword should be separated into a new topic | medium | yes | yes | [19] |
| 5 | Change the overall granularity level of topics | Too few topics / too many specific topics are generated | medium | yes | yes | [81] |
| 6 | Remove the topic from the display | This topic does not make any sense (i.e., off-topic) | low | yes | yes | [81] |
| 7 | Assign a label for this topic | The current label of this topic does not represent the actual topic | low | yes | yes | [43] |
| 8 | Increase the weight of this keyphrase | This keyphrase should be included in the topic label list | low | yes | yes | [43] |
| 9 | Apply must-link constraint | Those words must be in the same topic | low | no | no | [11, 66] |
| 10 | Apply cannot-link constraint | Those words must not be in the same topic | low | no | no | [11, 66] |
| 11 | Change keyword weights | This keyword is more related to the topic | low | no | yes | [19, 81] |

Table 4.1: Different possible topic revision operations.

The three operations at the bottom of Table 4.1 (9-11) are eliminated based on both task and topic model relevancy criteria. Not only are these operations designed to fix the term-topic distribution, which is not applicable to our topic modeling approach; more importantly, they are arguably not very useful for supporting the high-level exploratory reading tasks identified in Chapter 2, and therefore users may not be motivated to perform such operations. In the end, we selected the top three operations in Table 4.1, namely 'split a topic', 'merge topics by join', and 'merge topics by absorption', because we identified them as the most relevant to our exploratory reading tasks, in which the user may benefit from dynamically changing the granularity level of different topics. Also, by selecting them, some other candidate operations with lower task relevancy become redundant and are therefore eliminated. These are 'change the overall granularity level of topics' (covered by topic splitting and merging) and 'split by keyword' (covered by topic splitting).

Figure 4.2 illustrates the three selected topic revision operations. In the remainder of the section, we describe how each of these operations supports the user's tasks, and how the underlying topic model is revised according to these operations.

Split a topic

Topic splitting allows the user to explore more specific sub-topics of a given topic, thus changing the topic granularity to a finer level. Consider an example where initially the system creates a topic named 'military security'. As the user starts exploring this topic, she finds it to be too generic with respect to her information needs, and therefore she wants to split it into more specific sub-topics.
Upon user’s request, the underlying topic model creates asub-graph GA(VA,EA) ⊂ G(V,E) from the original graph G(V,E) generated in theinitial topic segmentation (see Section 2.4.1 for details), where VA represents thevertices (sentences) of topic A, and each edge w(x,y) in EA represents the weightededges of topic A.Next, the system splits the chosen topical cluster A into further n sub-clustersA1,A2, ...,An, by applying the same graph partitioning algorithm used in the initial85Figure 4.2: Three different user operations for topic revisiontopic segmentation phase, i.e., approximate solution to n-Cut [118] on GA(VA,EA).Here, n is the optimal number of sub-topics, which is automatically determined byfinding the value of n for which an objective function Q is maximized according tothe formula proposed by Newman and Girvan [97],Qn(A) =n∑c=1∑x∈Vc,y∈Vc w(x,y)∑x∈VA,y∈VA w(x,y)−(∑x∈Vc,y∈VA w(x,y)∑x∈VA,y∈VA w(x,y))2. (4.1)Qn(A) measures the quality of a clustering of nodes in the graph GA(VA,EA)into n groups, where ∑x∈Vc,y∈Vc w(x,y) measures the within-cluster sum of weights,∑x∈VA,y∈VA w(x,y) measures the sum of all edge weights in the graph, and∑x∈Vc,y∈VA w(x,y) measures the sum of weights over all edges attached to nodes incluster c. In essence, according to Equation 4.1, the nodes in high-quality clustersshould have much stronger connections among themselves than with other nodesin the graph.86We apply Equation 4.1 for increasing value of n= 2,3,4,5 and select the valueof n, for which Qn(A) is maximum. The highest possible value of n is capped to5 because of time constraint imposed by the interactive nature of the operation.Notice, however, that this limitation is not too penalizing. Our analysis of theSlashdot corpus shows that in 86% cases of splitting a topic, the best value ofQn(A) is with n≤ 5 and in the cases for which this is not the case the improvementfor n > 5 is minimal.Once the parent topic is segmented into n different sub-clusters, representativekeyphrases are generated for each sub-topic. This is done by running our topic la-beling method, as described in Section 2.4.1, only on the sub-conversation coveredby A.Merge by joiningThis operation allows the user to aggregate multiple similar topics into a single one.Opposite to topic splitting, the result is a topic with coarser granularity. Consideran example, where the initial topic model produces two different topics namely‘secure code’ and ‘simple sql server injection’. The user may find that both topicsare too specific, therefore joining them into a more generic topic may help her tobetter perform subsequent tasks.Method: Assume that the user decides to merge by joining two topics A andB (see Figure 4.2). To perform this operation, the topic modeling system createsanother topic C and assigns its vertices as VC =VA⋃VB and edges as EC = EA⋃EB.After that, a label for C is generated. This is done by running our topic labelingmethod, as described in Section 2.4.1, only on the sub-conversation covered by C.Merge by absorptionIf a sub-topic is more related to a different topic than its current parent topic, mergeby absorption allows the user to separate this sub-topic from its current parentand merge it with the one to which it is more related. 
Merge by absorption

If a sub-topic is more related to a different topic than to its current parent topic, merge by absorption allows the user to separate this sub-topic from its current parent and merge it with the topic to which it is more related. Unlike the previous merge operation (which joins two independent topics), this operation allows a sub-topic that is already placed under a topic to be absorbed by a different parent topic. Consider an example where the sentences related to two different topics, namely 'Obama health policy' and 'job recession', are wrongly grouped together under the topic 'Obama recession'. The user may realize that the sub-topic 'job recession' should be separated from its parent topic and merged with the 'unemployment' topic to which it is more related.

Method: Upon receiving merge by absorption feedback from the user on A_k and B, the topic modeling system removes the sub-topic A_k from its current parent A and merges it with the topic B (see Figure 4.2). The system then creates a new parent topic C and assigns vertices such that

V_C = V_{A_k} \cup V_B, \quad V_A = V_A \setminus V_{A_k}    (4.2)

and edges such that

E_C = E_{A_k} \cup E_B, \quad E_A = E_A \setminus E_{A_k}    (4.3)

Next, the topic labeling method takes the portion of the conversation that consists of the sentences in V_C, thus generating a label for C that potentially represents descriptive keyphrases from both topics A_k and B.

4.3.2 Interactive Topic Revisions of Topic Models for a Set of Conversations

In Chapter 3, we have shown that to support the exploration of multiple conversations it can be extremely useful to organize topics into a hierarchical structure. However, similarly to what we found for the simpler topic models for single conversations, the initial model generated by our system may not work well for the current user, because the resulting hierarchy is noisy and/or does not conform with the user's mental model and current tasks.

To address these problems, we support the user in modifying the topic hierarchy through a set of topic revision operations, similarly to what was done for revising topic models for a single conversation. For this purpose, we analyzed the feedback from the user study (described in the previous chapter), as well as from formative studies, to devise a set of potentially useful operations for revising the topic hierarchy. The list of operations is shown in Table 4.2.

The first two operations in Table 4.2 are intended to change the number of sub-topics of a parent topic depending on the user's information needs. The next two operations help the user to move a child topic from its current parent topic and place it under a more appropriate parent topic. The last two operations help the user to rename a topic node or simply remove it. While exploring a set of conversations, the user may apply a combination of different operations to organize topics into a hierarchy that is more accurate and matches her mental model and current tasks.

| No | Operation | Why? |
|----|-----------|------|
| 1 | Show me fewer, more generic children topics | The current node has too many children topics |
| 2 | Show me more specific children topics | The current node has too few children topics |
| 3 | Add node as child | A child topic is wrongly placed under a different parent topic |
| 4 | Merge node as siblings | These topics cover similar content |
| 5 | Remove a node | This topic does not make any sense (i.e., garbage-topic) |
| 6 | Rename a node | The current label of this topic does not describe the topic properly |

Table 4.2: A set of operations for revising the topic hierarchy.

Figure 4.3 provides a summary of what happens after applying each of the operations for revising the topic hierarchy.
While some of these operations have similarities with the operations for revising topic models for a single conversation, we have designed a set of operations that are potentially more useful in the context of revising a topic hierarchy. In the remainder of the section, we provide a more detailed description of how each of these operations supports the user's tasks, and how the system revises the topic hierarchy according to each of them.

Figure 4.3: Illustrative examples of how the topic hierarchy changes as a result of applying different operations.

Show fewer, more generic sub-topics

In the initial topic hierarchy, some of the parent topics may have a large number of very specific children topics. If such a refined decomposition does not match the user's mental model and/or information seeking task, it would be useful to allow her to change the topic hierarchy by showing only a few, but more generic, children topics of that parent topic. In this way, the user can change the granularity level of a topic to be coarser. Note that this operation is similar to splitting a topic as described in Section 4.3.1. However, the underlying computational method is slightly different in that, in this case, the system clusters the set of children topics of a node into a smaller set, as opposed to clustering the sentences of a topic into multiple sub-topics.

Method: Assume that a topic A has the sub-topics a_1, ..., a_n. When the user requests fewer sub-topics of A, the system clusters the set of sub-topics a_1, ..., a_n into a smaller set of sub-topics b_1, ..., b_m, where m < n (see Figure 4.3). This is done by applying the same graph partitioning algorithm used in the generation of the topic hierarchy, i.e., an approximate solution to n-Cut (see Section 3.5.2) [118]. More specifically, an undirected weighted graph G(V, E) is constructed, where each node in V = {a_1, ..., a_n} represents a sub-topic of A, and each edge weight w(x, y) in E represents the similarity between two sub-topics. This similarity score is computed according to Equation 3.1. The number of clusters (i.e., m) in this clustering algorithm is automatically determined by maximizing the clustering objective function in Equation 4.1 [97]. Once the sub-clusters b_1, ..., b_m are created, representative keyphrases are generated for each of them. This is done by running our topic labeling method only on the set of sentences covered by each sub-topic in b_1, ..., b_m.

Show more specific sub-topics

This operation serves the opposite purpose of the previous operation, i.e., it changes the granularity level of a parent topic to a finer level by deriving and showing more specific children topics.

Method: Assume that the user requested more specific sub-topics of A. In response, the system removes the set of sub-topics in the immediate level (i.e., b_1, ..., b_m) from the topic hierarchy and then links the sub-topics of b_1, ..., b_m, i.e., a_1, ..., a_n, as the sub-topics of A (see Figure 4.3).
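To illustrate the tree manipulation behind 'show me more specific children topics', here is a minimal Python sketch. The `TopicNode` container and the example labels are assumptions for illustration only; the inverse operation ('show fewer') would additionally invoke the n-Cut/modularity clustering described above, which is omitted here.

```python
from dataclasses import dataclass, field

@dataclass
class TopicNode:
    label: str
    children: list = field(default_factory=list)

def show_more_specific(parent):
    # Remove the immediate level b1..bm and promote their children
    # a1..an to be direct sub-topics of the parent (cf. Figure 4.3).
    parent.children = [grandchild
                       for child in parent.children
                       for grandchild in child.children]

# Tiny demo on a hand-built hierarchy (labels are illustrative).
root = TopicNode("iPhone 6 bend", [
    TopicNode("structural issue",
              [TopicNode("thin metal"), TopicNode("aluminum body")]),
    TopicNode("media reaction",
              [TopicNode("bendgate jokes")]),
])
show_more_specific(root)
print([c.label for c in root.children])
# -> ['thin metal', 'aluminum body', 'bendgate jokes']
```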
Add as Child

Sometimes a child topic node might be wrongly placed under a different topic instead of under a more appropriate parent topic. In such cases, it may be useful to allow the user to move the child topic from its current parent and place it under a more appropriate one. For instance, a topic named 'bending vulnerability' was wrongly placed under the parent topic 'iPhone5 models', and the user realized that it would be more appropriate to add 'bending vulnerability' as a child under the parent topic 'iPhone 6 bend'.

Method: When the user applies this operation on a sub-topic A_k, the system removes A_k from its current parent A and then assigns B as its new parent, as suggested by the user (see Figure 4.3).

Merge as Siblings

This operation serves a similar purpose to the previous one; however, instead of adding a topic as a child of a node, it is added as a sibling of that node. The two siblings are then placed under a new parent node.

Method: The underlying computational method is similar to the 'merge by join' operation for revising the topic model of a single conversation. In essence, when the user applies this operation, the system removes the sub-topic A_k from its current parent A and merges it with the topic B (see Figure 4.3). The system then creates a new parent topic C and assigns vertices such that

V_C = V_{A_k} \cup V_B, \quad V_A = V_A \setminus V_{A_k}    (4.4)

and edges such that

E_C = E_{A_k} \cup E_B, \quad E_A = E_A \setminus E_{A_k}    (4.5)

After that, the topic labeling method takes the portion of the conversation that consists of the sentences in V_C, thus generating a label for C that potentially represents descriptive keyphrases from both topics A_k and B.

Removing topics of a conversation

Sometimes a topic is not relevant or interesting according to the user's current information needs. In such cases, it may be useful to allow the user to remove such nodes and adjust the topic hierarchy accordingly.

Method: When the user applies this operation, the system removes the sub-topic A_k, along with all the children nodes of A_k, from its current parent A and shows the updated hierarchy to the user.

Renaming topics

If a topic label does not accurately reflect its textual content, the user can rename the topic by giving it a more appropriate label.

Method: The system updates the topic hierarchy by changing the label of the topic node and showing the updated label to the user.

4.4 Interactive Visualization for Topic Revision

We have extended the ConVis and MultiConVis interfaces to incorporate the topic revision operations described above. While doing so, we did not discard any existing features of these interfaces; rather, we have complemented them with additional interactive features for revising the topics. We now discuss these extended interfaces (i.e., ConVisIT and MultiConVisIT) along with their interactive features.

4.4.1 ConVisIT: Exploring a Single Conversation Using Interactive Topic Modeling

As the user explores a conversation, she may realize that the initial topic model is not helping her anymore, and may want to revise it. To support the user in such a situation, ConVisIT provides a set of interactive topic revision operations within the interface through some intuitive direct manipulation methods [2]. As the user performs these operations, the system updates the topic model and changes the visual encoding of the topic list from the initial flat list of topics into a multi-rooted tree organization. Such updates to the topic organization become visible to the user through perceptually meaningful animations, following the design guidelines of effective animation presented in [58].
In particular, we have devised staged animation for each operation, i.e., we break up the corresponding transition into a set of simple sub-transitions, allowing multiple successive changes to be easily observed.

[2] A video demonstration of ConVisIT is available here: https://goo.gl/QALDvw

For instance, when the user splits a topic by double clicking on it, the following sub-transitions occur (see Figure 4.4). First, the clicked topic A moves to the left along with its parent node(s) (if any), while existing nodes at the deepest level are pushed towards their new positions (up/down) around the circular layout to create angular spaces for the new sub-topics. Second, the new sub-topics A_1, A_2, ..., A_n appear and move from their parent's position (A) to their new positions. Third, labels appear for these sub-topics (see Figure 4.4b). Double clicking on A again causes it to collapse by following the exact reverse order of animation, i.e., the labels of the children move from their current positions to their parent and fade away, and then the parent moves to its previous position while other nodes move closer to the parent node to fill the gaps left by the removed children nodes.

Figure 4.4: An example showing: (a) The user hovers over the topic 'military security' and decides to perform the split operation. (b) As a result, the topic moves to its left while the rest of the topics are pushed along the perimeter of the circular layout to create space for the new children.

Merging of two topics can be performed by dragging a topic A over another topic B, which causes the system to update the topic model. As a result, a new parent topic C appears to the left and curved links are drawn from C to A and B to indicate the parent-child relationship (see Figure 4.5). The user can subsequently double click on C to collapse it, which hides its sub-topics. Finally, if a child topic A_k is discovered to be wrongly placed under a topic A instead of under a more appropriate topic B, the user can drag A_k over B. As a result, the link of A_k with its parent A is removed and a new parent node C appears that connects both A_k and B (see Figure 4.6).

Figure 4.5: An example showing: (a) The user decides to merge two topics by joining (indicated by orange color). (b) As a result, ConVisIT updates the topic organization where these two topic nodes are merged under the parent topic 'subject sql injection attack'.

Figure 4.6: An example showing: (a) The user realizes that the topic 'web sites' does not fit under the topic 'army server' and should be merged with the topic 'prototype'. (b) ConVisIT updates the topic organization where the previous link from 'web sites' to 'army server' is removed, and then 'web sites' is absorbed into a more generic parent topic 'web programming' along with 'prototype'.

As the user continues to perform interactive topic revisions, the topic organization can potentially grow quickly to multiple levels of hierarchy due to iterative splitting and merging. The current implementation can reasonably show a topic organization with a tree depth of up to four levels, when the visualization is used on a 1920 x 1080 screen.
This seems adequate for conversations with no more than a few hundred comments, because the number of sub-topics grows exponentially with the depth of the topic hierarchy, and topics at the bottom of a hierarchy of depth four become so specific (i.e., cover so few sentences) that further splitting would be inappropriate. For instance, if we assume that the average branching factor in a single-rooted topic hierarchy is 3 and the conversation contains 300 sentences, each leaf of a topic hierarchy of depth 4 will contain on average 300/3^4 ≈ 3.7 sentences.

4.4.2 MultiConVisIT: Exploring a Collection of Conversations Using Interactive Topic Modeling

As the user explores the set of conversations, she may realize that the initial topic hierarchy does not match her mental model or information needs and may want to revise it. To support such situations, we developed a set of interactive techniques corresponding to the topic revision operations listed in Table 4.2. We also performed an informal user study to understand potential usability problems and to refine the topic revision operations to match the user's information needs. As the user performs these operations, the system revises the topic hierarchy and updates it in the interface. Similar to ConVisIT, MultiConVisIT makes the revised topic organization visible to the user through staged animation [58] [3].

[3] A video demonstration of MultiConVisIT is available here: https://goo.gl/edT69x

For instance, when the user asks the system to show fewer, more generic sub-topics of a topic a, the following sub-transitions occur, as shown in Figure 4.7. First, the existing sub-topics of a (i.e., of 'iPhone 6 bend') are moved vertically to provide space for their new parent topics. Second, the new parent topics appear. Third, the new parent topic nodes are collapsed so that the previous sub-topics become hidden. Double clicking on 'iPhone 6 bend' again results in showing more specific children nodes by removing the children in its immediate level.

If the user thinks that a topic is wrongly placed under a parent topic, she can change the topic assignment by dragging the topic over a different topic to which it is more related. The user can do this in two different ways: merge a topic as a sibling of another topic, or place it as a child of another topic (see Figure 4.8). As the user drags a topic over another topic node, a dialog box appears that allows the user to decide whether she wants to merge the dragged topic as a sibling or as a child.

The user can also remove or rename topics. If the user feels that a topic is not relevant, or does not make any sense, she can drag that topic to the recycle bin. If a topic label does not represent its corresponding textual comments, the user can rename it. Finally, at any time the user can undo the latest topic revision operation that she made, by clicking on the 'Undo' button.

Figure 4.7: An example showing: (a) The user asks the system to show fewer, more generic topics of 'iPhone 6 bend'. (b) As a result, the sub-topics are moved vertically to create space for their new parents (indicated by orange color) and horizontally to move to the next level of the tree. (c) The previous sub-topics of 'iPhone 6 bend' are hidden as their new parent nodes are collapsed.

Figure 4.8: An example showing: (a) The user decides to add 'Thin metal' as a child of the topic 'Structural issue' (indicated by orange color). (b) As a result, 'Thin metal' is now placed under the topic 'Structural issue'.

4.5 Implementation

A server side component (in PHP) communicates with the topic modeling system (in Python) to produce the updated results.
The visualization component, on the other hand, is implemented in JavaScript using the D3 and jQuery libraries, and is sufficiently fast to respond in real time to user actions. The system runs on a laptop computer with a 2.4 GHz processor and 16 GB RAM.

In ConVisIT, the average processing time for a topic splitting operation is 6.92 sec. and for a topic merging operation is 2.74 sec. (over the initial set of topics in our corpora). Similar processing times were observed in MultiConVisIT (7.34 sec. for showing fewer, more generic topics and 2.96 sec. for adding a topic as a sibling). To reduce the response time, topic split results were cached by the system for all the topics in the initial topic model, as well as for the sub-topics as soon as they were created upon topic revision operations. Similarly, the results for showing fewer, more generic topics were cached for the initial topic hierarchy.
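The caching strategy just described can be sketched with simple memoization. In the sketch below, `split_topic` is a hypothetical stand-in for the real partition-and-label routine (which takes several seconds per topic, per the timings above), and the topic names are illustrative.

```python
from functools import lru_cache
import time

def split_topic(topic_id):
    # Stand-in for the expensive graph partitioning + labeling step.
    time.sleep(0.01)
    return (f"{topic_id} / sub-topic 1", f"{topic_id} / sub-topic 2")

@lru_cache(maxsize=None)
def cached_split(topic_id):
    return split_topic(topic_id)

def warm_cache(initial_topics):
    # Pre-compute splits for every initial topic and, as soon as new
    # sub-topics exist, for those too, so interactive requests are instant.
    for topic in initial_topics:
        for sub in cached_split(topic):
            cached_split(sub)

warm_cache(["military security", "army server"])
print(cached_split("military security"))   # served from the cache
```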
4.6 Evaluation

We now report two summative user studies that we conducted in lab-based settings to compare ConVisIT and MultiConVisIT with interfaces that do not support human-in-the-loop topic modeling.

4.6.1 Study I

The goal of this study is to understand how the introduction of visual interfaces for exploring a single conversation may influence user performance and subjective measures compared to more traditional interfaces. In this chapter, we have presented ConVisIT, which is highly interactive, providing the capability to revise topic models. Its precursor, ConVis (described in Chapter 2), is also an interactive visualization for exploring conversations; however, it does not support any topic revision operations. Finally, as a traditional interface for exploring conversations, we have re-implemented the interface of the popular Slashdot blog. The user study aims to answer the following questions:

(1) When we compare ConVisIT, ConVis, and the Slashdot interface, is there any difference in user performance and subjective measures?
• Does one interface help to find more insightful comments in a conversation?
• Is one interface perceived as more useful and easy to use?
• Is reading behavior influenced by the interfaces? If the answer is 'Yes', then how?
(2) What specific visualization features/components of the three different interfaces are perceived as more/less beneficial by the potential users (e.g., interactive topic revision, Thread Overview, and relations between facets)?

Methodology

We performed a laboratory-based summative study to compare among interfaces [78]. The study was designed with three interfaces as conditions: Slashdot, ConVis, and ConVisIT. The Slashdot interface follows a typical blog reading interface design and serves as a suitable baseline for our experiment. It provides an indented list-based representation of the whole conversation as well as common functionalities of blog interfaces, such as scrolling up and down, collapsing a sub-thread, and searching for terms. The primary reason for including ConVis as an interface condition was to verify whether any potential improvements in performance and user behaviour over a typical blog reading interface are due to the visualization features common between ConVis and ConVisIT, or due to the interactive topic revision feature (which is only present in ConVisIT). For fair comparison, interface parameters such as screen size and font size were kept the same across all the interfaces. Moreover, a within-subject design was used for this experiment, with interface as the within-subject factor, allowing us to directly compare the performance and subjective measures of each participant with respect to all three interfaces. Finally, all study aspects, including instructions and setup, went through several iterations of evaluation and pilot testing with two users, who did not participate in the actual study.

Participants

20 subjects (aged 19-43, 8 females) with considerable prior experience of reading blogs participated in the study. Figure 4.9 shows responses to statements regarding the prior experience of the participants. Notice that 75% of the participants reported that they read blogs at least several times a week. Moreover, 70% of the participants post comments on other people's blogs at least several times a month. The subjects held a variety of occupations, including engineers, software developers, and university students, mostly with strong science backgrounds. They were recruited through emails and Reddit posts.

Figure 4.9: Responses to statements regarding the prior experience.

Procedure and task

At the beginning, a pre-study questionnaire was administered to capture demographic information and prior experience with blog reading. Then, the user went through the following steps for each of the three interfaces: 1) In a scripted warm-up session, the interface was introduced to the participant. A sample conversation was shown using the given interface and the experimenter explained the interface actions by following the written script. 2) The participant was then asked to perform a task on a given conversation (a different conversation was provided for each interface). Rather than asking specific questions, we provided an open-ended task to reflect the exploratory nature of blog reading. We asked the participant to explore the conversation according to her own interests using the given interface and write down a summary of the key points found while exploring the conversation. The study lasted approximately 90 minutes and each participant was paid $20 to participate.

We carefully selected three different conversations from the Slashdot blog corpora with similar numbers of comments (89, 101, and 89) to avoid potential variations due to the conversation length or complexity of the thread. Also, to counterbalance any potential learning effects due to the order of exposure to specific interfaces and conversations, the order was varied using a 3 x 3 Latin square.

During the study, we collected both quantitative data, such as task completion time, and qualitative data, such as observations and questionnaires. After completing the task with each interface, participants rated the following aspects on a 5-point Likert scale in an in-study questionnaire: 1) usefulness: 'I found this interface to be useful for browsing conversations'; 2) easeofUse: 'I found this interface to be easy to use'; 3) enjoyable: 'I found this interface enjoyable to use'; and 4) findInsightfulComments: 'This interface enabled me to find more insightful comments'. At the end of the study, post-study questionnaires followed by a semi-structured interview were administered regarding the interfaces overall as well as their individual features [4].
Finally, we logged interface actions to better compare the usage patterns of the three different interfaces.

[4] The study materials for the user study can be found in Appendix B.

Results analysis

In-study questionnaires: The results of the in-study questionnaires are presented in Figure 4.10, showing the average rating expressed by the participants on four different measures. Since the data was collected using a standard 5-point Likert scale, standard parametric analysis is not suitable due to the lack of normality [71]. Instead, we performed nonparametric analysis, i.e., Mann-Whitney's U tests on the responses for each of these measures. Finally, all reported pairwise comparisons are corrected with the Bonferroni adjustment.

The analysis reveals that the interfaces significantly affected findInsightfulComments, with pairwise comparisons showing that ConVisIT was perceived to help participants find more insightful comments than the ConVis and Slashdot interfaces (see Figure 4.10). This is an important result because it supports our intuition that by allowing the user to dynamically modify the topic organization (in ConVisIT), we enable her to find more insightful comments. There were also significant effects of interface on usefulness, as shown in Table 4.3, with pairwise tests showing that ConVisIT and ConVis were perceived to be significantly more useful than the Slashdot interface. Moreover, ConVisIT was rated slightly more useful than ConVis, although the difference was not significant. Similar results were obtained on the enjoyable measure, where ConVis and ConVisIT were rated significantly higher than Slashdot (see Figure 4.10). Finally, the easeofUse measure was not significantly affected by the interfaces, indicating that none of the interfaces was superior on this measure. However, this is a favorable outcome for ConVisIT in that, even though its interactive features are more complex than those of ConVis, the participants did not report ConVisIT as being significantly more difficult to use. Similarly, it is also a favorable outcome for both ConVisIT and ConVis, since, in spite of their complexity, they were found to be as easy to use as the simpler traditional blog interface.

| Measures | Slashdot vs ConVis | Slashdot vs ConVisIT | ConVis vs ConVisIT |
|----------|--------------------|----------------------|--------------------|
| usefulness | U = 82.5; p < 0.001 | U = 70.0; p < 0.001 | U = 182.5; p = 0.575 |
| easeofUse | U = 178.0; p = 0.518 | U = 177.0; p = 0.511 | U = 195.0; p = 0.885 |
| findInsightfulComments | U = 62.0; p < 0.001 | U = 27.0; p < 0.001 | U = 131.0; p < 0.05 |
| enjoyable | U = 67.5; p < 0.001 | U = 119.0; p < 0.05 | U = 159.0; p = 0.24 |

Table 4.3: Statistical analysis (Mann-Whitney's U test) on usefulness, easeofUse, enjoyable and findInsightfulComments measures (2-tailed p values).

Figure 4.10: Average rating of the three interfaces by the participants for the following measures: usefulness, easeofUse, enjoyable and findInsightfulComments. Longer bars indicate better rating.

Interface features: The in-study questionnaire also included a number of questions regarding the usefulness of specific features of the three interfaces. To complement this data, we also analyzed the interaction log data of ConVis, ConVisIT, and Slashdot. The quantitative results of the subjective ratings are provided in Figure 4.11. We can readily see that the majority of the responses regarding features of the Slashdot interface range from strongly disagree to neutral.
In contrast, responses regarding ConVis and ConVisIT features are dominated by strongly positive to neutral ratings.

Figure 4.11: Responses to statements regarding specific features of the three interfaces under investigation.

Regarding topic revision operations, Split was found to be more useful (35% strongly agree and 40% agree) than Merge (20% strongly agree and 25% agree). This is also evident from the usage of these operations, as the split operation was used more frequently (5.3 times on average) than merge (1.6 times on average). Moreover, 16 out of 20 users performed a split operation prior to performing any merge operation. A possible explanation is that participants generally found the initial topic model results to be too coarse-grained with respect to their information needs, expertise, and mental model, and therefore they tended to apply the split operation both earlier on and more frequently than the merge operation so that they could read at a finer topic granularity.

An interesting observation from the log data is that even though some features were common to both ConVis and ConVisIT, they were used more frequently with ConVisIT. For example, participants hovered and clicked on topics and comments more times on average using ConVisIT than using ConVis, as shown in Figure 4.12. A possible explanation is that, due to the presence of interactive topic revision features, the participants could create topics that were more useful to them and therefore relied on topics more frequently in their exploration.

Figure 4.12: Some interaction log statistics for interactions that are common between ConVis and ConVisIT (based on avg. values among 20 participants).

Time: Interestingly, the average time required to complete the tasks was not significantly affected by the interfaces, with Slashdot, ConVis, and ConVisIT requiring on average ± SD 1056 ± 479, 1240 ± 486, and 1159 ± 604 secs, respectively. This result is rather promising, because it indicates that participants were not slowed down by their unfamiliarity with the topic revision operations or by the overhead involved in performing those operations.

User-generated summaries: Recall that during the study, each participant was asked to write down a summary of the key points she had found after exploring a conversation using the given interface. We analyzed these summaries to verify whether the three different interfaces for exploring conversations had an effect on the user's ability to write high-quality summaries [5].

Evaluation Protocol: For the purpose of evaluating the summaries, we recruited two human raters with research experience in natural language processing, but who were not involved in this research in any way. For each of the three conversations, a set of summaries was presented to the raters. The raters were also told that the original blog conversation from the Slashdot corpora was not available anymore, and so they would have to rate each of these summaries on a 5-point Likert scale according to their overall satisfaction with its content.
Note that the summary raters did not know which interface was used to produce each summary [6].

While rating a summary, the rater was asked to consider the following three criteria: 1) How informative this summary is (the more informative the better); 2) How insightful this summary is (the more insightful the better); 3) Whether there is any redundant information within the summary (the less redundant the better). The rater was also told that the focus of this evaluation was the content of the summary, not whether it was grammatically correct and/or fluent; therefore the raters were told to ignore these linguistic aspects while rating. Finally, the rater was allowed to revise the ratings of the summaries she had already assessed, as she moved down the list and saw more and more summaries.

[5] Although there were 20 participants, the user-generated summaries were missing for 2 participants; therefore we had 54 (18 x 3) summaries with an equal number of task-system pairs.
[6] The instructions for rating blog summaries are provided in Section B.1.3.

We converted the Likert responses from a scale of 'Extremely Poor' to 'Excellent' to a scale of 1 to 5, with 1 corresponding to 'Extremely Poor', and 5 to 'Excellent'. The weighted Kappa coefficient was then computed with linear weights to determine the level of agreement between the two raters. The set of weights chosen was [0, 0.25, 0.50, 0.75, 1.0]. The resultant weighted Kappa coefficient was 0.449, which represents good agreement [46].

Results: Figure 4.13 shows the results of the evaluation of the user-generated summaries created with the support of the three different interfaces. These results suggest that the interface used by the participants for reading a conversation influences their ability to write good summaries. In particular, for two conversations ('Hacking' and 'Video games'), the summaries created with the support of the ConVisIT interface received considerably higher ratings from both human raters compared to the two interfaces that do not support a human-in-the-loop topic model. Pairwise tests show that such differences are significant (Table 4.4). For the other conversation ('Streaming music'), there was not any significant difference between the three interfaces. This finding may be due to differences in conversation length (the 'Streaming music' conversation was longer) and/or to differences in the amount of noise/error in the initial topic models for the different conversations.

Figure 4.13: Average ratings for user generated summaries based upon two human raters.

| Conversation | Rater | Slashdot vs ConVis | Slashdot vs ConVisIT | ConVis vs ConVisIT |
|--------------|-------|--------------------|----------------------|--------------------|
| Hacking | R1 | U = 15.5; p = 0.665 | U = 5.5; p < 0.05 | U = 6.5; p < 0.05 |
| Hacking | R2 | U = 11.0; p = 0.207 | U = 6.5; p < 0.05 | U = 12.0; p = 0.301 |
| Streaming music | R1 | U = 13.5; p = 0.450 | U = 13.5; p = 0.423 | U = 16.5; p = 0.799 |
| Streaming music | R2 | U = 12.0; p = 0.283 | U = 10.0; p = 0.092 | U = 17.0; p = 0.849 |
| Video games | R1 | U = 13.5; p = 0.452 | U = 6.0; p < 0.05 | U = 6.5; p < 0.05 |
| Video games | R2 | U = 13.0; p = 0.388 | U = 6.5; p < 0.05 | U = 7.0; p < 0.05 |

Table 4.4: Statistical analysis (Mann-Whitney's U test) on summary ratings (2-tailed p values).

Overall preference: At the end of the study, participants were asked to indicate their overall preference for a blog reading interface and then justify their choice. 60% of the participants indicated a preference for ConVisIT, 25% for ConVis, and 15% for Slashdot.
Most of the participants who chose ConVisIT felt that the topic revision operations were very helpful in finding relevant comments: "ConVisIT is the most convenient interface because of its splitting and merging features. Using this interface to understand the conversation, I really did not have to go through all the comments" (P19). It was also evident that when the granularity level of the topics did not match the user's information needs, ConVisIT was especially helpful for navigation: "Sometimes the first-level keywords are way too generic, so it's better to navigate via second-level categories" (P11). However, participants did become frustrated in a few cases, when ConVisIT could not accurately split the topics into a meaningful list of sub-topics, as mentioned by P13: "...I enjoyed the ability to split apart topics, though I think it would benefit from better categorization of topics as I felt like some were misclassified".

Those participants who chose the ConVis interface over its counterparts emphasized the utility of its visual components, i.e., the visual representation of the thread and the highlighting of the relations between topics and comments, which "...makes it easier to find out which comments are more interesting", and "...allowed me to see more of what was going on, how comments were inter-related, as well as kept me interested and focused on the thread as a whole." (P4). The primary reason for preferring ConVis over ConVisIT was that sometimes the revised topic organization became too cluttered or made the navigation too complex: "...drilling down to a sub-topic made the graph look too cluttered up. Sometimes, it was harder to figure out if two topics were at the same level or not based on the layout." (P7), and "It felt like a good mix, others were too complex (ConVisIT) or too simple (Slashdot)." (P2).

Three participants who preferred the Slashdot interface felt that it was easier to use, although one said "...it is not giving me the structural information that I am interested about" (P16). Another reason was familiarity with this interface: "Scrolling through the conversation was good enough for me to find important topics in it, maybe because I am used to reading things this way." (P15).

4.6.2 Study II

While in Study I we compared interfaces for exploring a single conversation, in Study II we focus on investigating the effectiveness of interactive topic modeling for a set of conversations, namely the utility of the topic hierarchy revision operations. The study was designed in a similar way to Study I (Section 4.6.1), the primary difference being that, unlike Study I in which ConVisIT was compared with both ConVis and a traditional interface, here we compare MultiConVisIT with MultiConVis only. Recall, however, that MultiConVis was already compared to a traditional interface in a separate study described in Chapter 3, which showed that MultiConVis outperformed its counterpart along several subjective measures (e.g., usefulness, enjoyable) and was preferred by the majority of participants.
We decided to run two separate studies mainly because tasks for exploring and analyzing a set of conversations usually require significantly more time than exploring a single conversation; therefore, running a within-subject study with three interfaces would have been less feasible.

In this study, we aim to answer the following questions:
(1) When we compare MultiConVisIT with MultiConVis for exploring a set of conversations, is there any difference in user performance and subjective measures?
(2) What specific topic revision features of the MultiConVisIT interface are perceived as more/less beneficial by the potential users?

Methodology

Similarly to what was done in Study I (see Section 4.6.1), we designed a summative evaluation through controlled experiments with two interfaces as conditions: MultiConVis and MultiConVisIT. A within-subject design was used with interface as the within-subject factor, allowing us to directly compare the measures of each participant with respect to both interfaces. We again refined all study aspects, including instructions and setup, through several iterations of pilot studies with three users, who did not participate in the actual study.

Participants

We conducted the study with 16 users (aged 18-28, 9 females) with considerable experience of reading blogs. The participants held a variety of occupations, including journalists, engineers, and students at both graduate and undergraduate levels. They were recruited through emails and social networks (Facebook and Reddit posts).

Procedure and Tasks

At first, a pre-study questionnaire was administered to capture demographic information and prior experience with exploring blog conversations. Then, the participant went through the following steps for each of the two interfaces: 1) In a scripted warm-up session, the interface was introduced to the participant using a sample dataset. 2) The participant was then asked to perform a task based on a set of conversations. For each interface, a different set of conversations was provided.

Task: Similarly to what was done in Study I (Section 4.6.1), we provided an open-ended task of exploring the set of conversations and writing a summary of the major discussion points. However, since MultiConVis and MultiConVisIT show a set of conversations along with a large number of topics organized into a hierarchy, we asked the participant to summarize the major points under the most appropriate corresponding topic in the hierarchy, rather than creating a plain summary. In this way, we were able to test whether the interface has an effect on the user's ability to find the most insightful and informative comments and then summarize them in a coherent way under different topics.

The participant was given a task scenario in which she would work as a business analyst for Apple and needed to analyze the set of conversations from a given dataset, so that later on she could discuss her insights with her colleagues. For example, when the dataset on iPhone bending was provided, the participant was given the following task:

The issue of iPhone bending went viral on social media after the iPhone 6 was launched in September 2014. Soon after the product was released, some people claimed that this new phone can easily bend in the pocket while sitting on it. This incident triggered a huge amount of discussion in Macrumors, a blog site that regularly publishes Apple related news and allows participants to make comments. You are working for Apple as a business analyst.
Your task is to find the major discussion points about the iPhone bending issue and summarize each of them under the most appropriate corresponding topic. The final outcome will be a summary of the conversations organized according to a topic hierarchy that you will have to show and discuss with your colleagues. So you want to make sure that the topic hierarchy and the summary of major discussion points are as informative and as clear as possible.

To facilitate the above task, the interface allowed the user to click on a 'summary' button adjacent to each topic node, so that the user could enter the summary of a topic within a text box. At any time, the user could click on the 'Show summary view' button to review what summary had been added so far for different topic nodes. Figure 4.14 shows such a summary view, where the user has entered summaries under the corresponding topics.

Figure 4.14: An example of summaries of different topics created by a user during the study.

We selected two different datasets crawled from the Macrumors site for testing ('iPhone bend' and 'iPad release') [7]. The number of conversations in the datasets was kept the same (16 conversations in each dataset) to avoid potential variations due to the amount of conversational data. Also, to counterbalance any potential learning effects due to the order of exposure to specific interfaces and datasets, the order was varied using a 2 x 2 Latin square. During the study, we collected both quantitative data, such as task completion time, and qualitative data, such as observations and questionnaires. Finally, a post-study questionnaire was administered regarding the user's experience with the two interfaces [8]. The study lasted approximately 90 minutes and each participant was paid $20 to participate.

[7] These are the same datasets that were used for the user study described in Chapter 3.
[8] The study materials for the user study can be found in Appendix B.

Figure 4.15: Average rating of the two interfaces by the participants for the following measures: usefulness, easeofUse, enjoyable, findInsightfulComments and writeInformativeSummary. Longer bars indicate better rating.

Analysis of results

After completing the task with each interface, participants rated six different measures in an in-study questionnaire. The results of this questionnaire are presented in Figure 4.15. The pairwise comparisons using Mann-Whitney's U tests indicate that MultiConVisIT is superior on two of the six measures: usefulness (U = 75; p < 0.05) and the ability to write a more informative summary, i.e., writeInformativeSummary (U = 76.5; p < 0.05). For the other measures, MultiConVisIT is still rated higher than its counterpart, except for easeofUse; however, the differences on these measures are not significant. Overall, this is a promising result, as it suggests that in general participants found the MultiConVisIT interface to be more useful and felt that it helped them write a more informative summary about the set of conversations.
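For reference, the pairwise comparison used in both studies can be reproduced with standard tooling; the following is a minimal sketch with made-up Likert responses standing in for the study's actual data (the Bonferroni line mirrors the correction applied in Study I).

```python
from scipy.stats import mannwhitneyu

# Hypothetical per-participant Likert ratings (1-5) for one measure;
# illustrative values only, not the study's actual responses.
ratings_multiconvis   = [3, 4, 3, 2, 4, 3, 3, 2, 4, 3, 3, 4, 2, 3, 3, 4]
ratings_multiconvisit = [4, 5, 4, 4, 5, 3, 4, 4, 5, 4, 4, 5, 3, 4, 4, 5]

u, p = mannwhitneyu(ratings_multiconvis, ratings_multiconvisit,
                    alternative="two-sided")
# Study I additionally corrected pairwise comparisons with Bonferroni:
p_bonferroni = min(1.0, p * 4)   # e.g., four questionnaire measures
print(f"U = {u:.1f}, two-tailed p = {p:.4f} (corrected: {p_bonferroni:.4f})")
```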
Interface features: The in-study questionnaire also included a number of questions regarding the usefulness of specific features of the two interfaces. To complement this data, we also analyzed the interaction log data collected during the user experiments. The quantitative results of the subjective responses are provided in Figure 4.16 [9].

[9] Among the 16 participants, one participant's questionnaire data was missing.

Figure 4.16: Responses to statements regarding specific features of the MultiConVisIT interface.

We can readily see that the majority of the responses regarding topic revision operations range from strongly agree to neutral. These results suggest that most of the users found the ability to organize topics through the different topic revision operations to be useful. This is also evident from the usage of these operations, as users applied them quite frequently (the 'Show me fewer, more generic children topics' operation was used 3.9 times on average; 'Show me more specific children topics' was used 2.1 times on average; and 'Remove a node' was used 3.4 times on average).

Time: The average time required to complete the tasks was not significantly affected by the interfaces, with MultiConVis and MultiConVisIT requiring on average ± SD 1490 ± 369 and 1610 ± 321 secs, respectively.

Overall Preference: In the questionnaire, participants were also asked if they would prefer MultiConVisIT over its counterpart. 68.75% of participants indicated a preference for MultiConVisIT, while 31.25% indicated a preference for MultiConVis.

Many of the participants who chose MultiConVisIT indicated that the ability to organize the topic hierarchy according to their own mental model and current tasks was the primary reason for their preference: "Sometimes the topics were not organized in the way I expected. By organizing the topics into categories according to my own way was very useful for browsing. it makes the navigation easier..." (P7). Another participant mentioned that "The additional features are somehow better when it comes to getting the main topics. I found the collapsing of topics together very interesting because if one wants to look at a very specific part of the discussion, that is enabled" (P4).

Those who preferred the MultiConVis interface indicated that they found it easier to learn compared to MultiConVisIT, and a few of them thought that the existing topic hierarchy was already sufficient for them: "...it took me quite a while to get used to the added features (of MultiConVisIT)" (P11).

User generated summaries: Similarly to the analysis in Study I (Section 4.6.1), we compared the summaries written by the users to verify whether the interfaces for exploring conversations had an effect on the user's ability to write high-quality summaries. For this purpose, we employed two human raters who were not involved in this research in any way. For each of the two sets of conversations, a set of summaries was presented to the raters. While rating a summary, the rater was asked to consider the three criteria stated in Section 4.6.1 (informativeness, insightfulness, and redundancy of the summary content). Again, the rater was told to focus on the content of the summaries rather than their grammatical correctness while rating [10].

We converted the Likert responses from a scale of 'Extremely Poor' to 'Excellent' to a scale of 1 to 5, with 1 corresponding to 'Extremely Poor', and 5 to 'Excellent'. The weighted Kappa coefficient was then computed with linear weights to determine the level of agreement between the two raters. The set of weights chosen was [0, 0.25, 0.50, 0.75, 1.0]. The resultant weighted Kappa coefficient was 0.471, which represents good agreement [46].
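The agreement computation described above corresponds to Cohen's kappa with linear weights: on a 5-point scale, linear weighting yields exactly the weight set [0, 0.25, 0.50, 0.75, 1.0]. A minimal sketch with scikit-learn follows; the two rating lists are illustrative, not the raters' actual scores, and the intermediate label names between the two stated endpoints are assumed.

```python
from sklearn.metrics import cohen_kappa_score

# Illustrative 1-5 ratings from two raters over the same summaries
# (1 = 'Extremely Poor', 5 = 'Excellent'); not the study's actual data.
rater1 = [4, 3, 5, 2, 4, 3, 4, 5, 3, 2]
rater2 = [4, 4, 4, 2, 3, 3, 5, 5, 3, 3]

kappa = cohen_kappa_score(rater1, rater2, weights="linear")
print(f"linearly weighted kappa = {kappa:.3f}")
```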
Results: Figure 4.17 shows the results of the evaluation of the user-generated summaries created with the support of MultiConVis and MultiConVisIT.

Figure 4.17: Average ratings for user generated summaries based upon two human raters.

One can readily notice that for both sets of conversations ('iPhone bend' and 'iPad release'), the summaries created with the support of the MultiConVisIT interface received considerably higher ratings from both human raters compared to its counterpart, which does not support a human-in-the-loop topic model. Pairwise tests show that in three out of four comparisons between the two interfaces, the differences are significant (see Table 4.5).

Dataset          Rater   MultiConVis vs. MultiConVisIT
iPhone bending   R1      U = 21.0; p = 0.23
                 R2      U = 11.0; p < 0.05
iPad release     R1      U = 14.0; p < 0.05
                 R2      U = 12.0; p < 0.05

Table 4.5: Statistical analysis (Mann-Whitney's U test) on summary ratings (2-tailed p values).

4.7 Discussion

We now discuss the summary of findings from the two user studies, as well as the limitations of our interactive topic modeling approach.

4.7.1 Summary of Findings

Based on our analysis of the study results, we now revisit our research questions. The first user study reveals that overall ConVisIT was the most preferred interface, and it was rated higher than its counterparts on the findInsightfulComments measure. Similarly, the summaries created by the participants after reading the conversation using ConVisIT tended to receive similar or higher ratings compared to the other two interfaces. In contrast, Slashdot was the least preferred interface, and it received significantly lower ratings on three different measures. As for ConVis, it seems to provide a middle ground between the other two interfaces, and its topic organization, although static, was found to be visually less cluttered than that of ConVisIT. In general, this shows that while an interactive topic model can be beneficial to the user, such features may introduce visual clutter and interaction costs, at least for some users. Finally, there were no significant differences among the interfaces in terms of easeOfUse and time to task completion, in spite of the higher complexity of ConVis and ConVisIT.

In the first study, we also analyzed which specific visualization features/components of the interfaces were perceived as more or less beneficial by the potential users (e.g., interactive topic revision, Thread Overview, relations between facets). We found that, in general, the visualization features of ConVis and ConVisIT received higher ratings than those of Slashdot. Interestingly, we found that subjective reactions to different features of the interfaces, such as split, merge, and clicking on a topic, directly correlate with their frequency of use. More importantly, we found that not all interactive topic revision operations were equally received. For example, the split operation was used more frequently than its counterparts. Although we have proposed some possible explanations, this issue needs to be further investigated.

From the second study, we again found the interactive topic modeling approach to be effective for exploring and analyzing a set of conversations. More specifically, MultiConVisIT was found to be more useful, although a few users reported that they needed more time to learn the additional interactive topic revision features. The interface was also preferred over its counterpart that does not provide interactive topic revision operations. Moreover, it enhanced the participants' perceived ability to write a more informative summary over MultiConVis.
Such a finding is also supported by the objective evaluation of the user-generated summaries, as summaries produced by users with MultiConVisIT were rated higher by external raters (based on how informative, insightful, and non-redundant the summary is). Finally, the majority of the participants rated most of the topic hierarchy revision features positively.

4.7.2 Limitations

In the second study, to avoid increasing the complexity of the design, we only tested the interactive topic features for revising the topic hierarchy in MultiConVisIT. However, using MultiConVisIT, the user can drill down to a single conversation with the ConVis view; therefore, we could also provide the topic revising features for a linear topic model, as described in Section 4.3.1. In future work, we may investigate how these two topic revision approaches could be combined within the MultiConVisIT interface, and whether (and how) that may lead to potential differences in user performance and subjective measures.

Another interesting point is that in both studies, the system collected the topic revision feedback from each individual user, and this feedback was not shared with other potential users. Arguably, it could be useful for a user to share her refined topic models so that other potential users exploring the same dataset might benefit. Therefore, a promising direction would be to incorporate topic revision feedback from multiple users with the aim of building more accurate, shareable topic models.

4.8 Summary

In this chapter, we presented and evaluated a novel human-in-the-loop topic modeling approach to support the exploration of online conversations. We devised a set of topic revision operations specifically for asynchronous online conversations and incorporated them into our visual text analytics systems (i.e., ConVis and MultiConVis). By utilizing our interactive topic revision approach, users can explore and revise the topic model to better fulfill their information needs.

The user studies reported in this chapter reveal that both ConVisIT and MultiConVisIT were preferred by the majority of the participants over their counterparts that do not support interactive topic revision. Moreover, our analysis shows that summaries written by participants during the exploration of conversations received higher (or at least equal) ratings from human raters when ConVisIT and MultiConVisIT were the interfaces used to explore online conversations. In essence, the results from the studies indicate that users benefit from getting more control over the topic modeling process while exploring conversations.

Chapter 5

Tailoring Our Visual Text Analytics Solutions to a Community Question Answering Forum

So far, we have presented a set of systems covering two different dimensions of our visual text analytics design space, namely single vs. a set of conversations and static vs. human-in-the-loop models. While developing these systems, we did not restrict our solutions to any specific domain problem faced by users. Furthermore, our evaluations were limited to either case studies or lab studies.

In this chapter, we are interested in understanding how our generic visual text analytics solutions can be applied and tailored to a specific domain problem.
To answer this question, we present a design study in a community question answering (CQA) forum, where our visual text analytics solutions were simplified and tailored to support information seeking tasks for a user population possibly having low visualization expertise.

A crucial aspect of this work is that, unlike the evaluations we have presented in previous chapters, we evaluated the new system in a more ecologically valid way, by deploying it in a real-world environment in which it was tested with hundreds of real users. Through this large-scale online study, we gained deeper insights about the potential utility of the system, as well as learned generalizable lessons for designing visual text analytics systems for CQA forums and similar domains of conversations. (This chapter is a modified version of our paper "CQAVis: Visual text analytics for community question answering," by Enamul Hoque, Shafiq Joty, Lluís Màrquez and Giuseppe Carenini; in Proceedings of the ACM International Conference on Intelligent User Interfaces (IUI), pp. 161-172, 2017 [63].)

5.1 Introduction

Community question answering forums, such as StackExchange, Yahoo! Answers, and Quora (stackexchange.com, answers.yahoo.com, quora.com), are becoming more and more popular these days. They represent effective means for communities of users around particular topics to share information and to collectively satisfy their information needs. CQA forums typically organize their content in the form of multiple topic-oriented question–comment threads, where a question posed by a user may be answered by a possibly long list of comments from other users.

Many such online forums are not moderated, which often results in very noisy and redundant content. Users tend to deviate from the original question and engage in discussions on completely irrelevant or only loosely related topics. At the same time, similar questions may be posted repeatedly with minor variations. This near-duplicity is difficult to track for users, who are usually offered only simple search capabilities by the forum interface. When relevant answers to user questions are scattered around multiple related conversations and buried among a large number of comments, the user is facing a challenging information processing task, which, without proper support, leads to information overload.

For example, consider John, who is an expatriate, just arrived in Qatar and is seeking recommendations for a good bank. When he searches for 'Which is the best bank in Qatar?' in the Qatar Living forum (http://www.qatarliving.com/forum), a very popular site in Qatar, it returns about a dozen previously asked questions, such as 'What is the best bank to open an account?' or 'What is the best bank in Qatar for small business?' (see Figure 5.1). Each of these questions is followed by a set of comments, resulting in hundreds of comments in total.

Figure 5.1: An example of a new question q asked by a user which is shown at the top, followed by a set of related thread questions q1, ..., qn and their comments.
Given the large number of comments from multiplerelated threads, it would be very difficult and time-consuming for John to identifyand make sense of useful comments using a traditional interface.In this chapter, we present CQAVis, an intelligent visual interface specificallytailored to help users find comments that provide good answers to a new ques-tion (i.e., never asked in exactly this form before) in community-created forums.CQAVis was designed by simplifying and tailoring our generic solutions for blog120conversations, to take into account specific features of CQA data and tasks. Theresulting interface allows the user to start with a new question, then to explore therelated threads to find the ones that seem to be most relevant to her informationneeds, and eventually to navigate through the comments of a thread in search forrelevant answers to her question. The underlying text analytic module dynamicallyranks potential answers to a new question by combining two relevant measures:(i) how good or useful the comment is with respect to the thread question, such asq1, q2 in Figure 5.1, and (ii) how similar the thread question is with respect to thenew question (q).Our system was deployed in the Qatar Living forum site to evaluate our inter-face among hundreds of real users. Qatar Living forum was suitable for our study,because it represents the type of forums where the information overload problem,as described above, could be more prevalent due to unmoderated noisy content.Moreover, a large number of its users have limited expertise in using visual inter-faces, which poses critical challenges to designing interfaces.The primary contributions of our work include: 1) characterization of the CQAforums by identifying user tasks and some key design needs; 2) design of CQAVisthat demonstrates how our generic approach for integrating NLP and InfoVis tech-niques presented in Chapter 2 and 3 can be applied and tailored to meet thesespecific user needs; 3) the evaluation of the tool in the wild in an ecologically validtesting by deploying the system among real forum readers, which in turn reveal thatthe overall approach for combining NLP and InfoVis techniques presented in thisdissertation can be effective for a diverse range of user population; and 4) general-izable lessons learned from the study that can be useful to design visual interfacesfor online conversations in other domains such as news comments and health fo-rums, as well as to design for user populations possibly having low visualizationliteracy.5.2 The Design ProcessOur design study process followed the nine-stage framework proposed by Sedlmairet al. [111]. In particular, we focused on four core phases of the design framework,i.e., discover, design, implement, and deploy:1211) Discover: In this stage, we analyzed the needs, problems, and requirementsin the domain of CQA forums through literature review and conducting in-depthinterviews with Qatar Living forum users and administrators.2) Design: After reaching a shared understanding of the CQA domain, we ex-plored the design space, by analyzing the CQA data and tasks and how our currentinterfaces can be re-designed to support those tasks. 
We applied an iterative design approach, starting with paper prototyping, followed by prototyping on a limited annotated dataset, which led us to the final prototype on the whole forum dataset.

3) Implement: We developed both client- and server-side components in collaboration with the Qatar Living administrators.

4) Deploy: Following several pilot studies and corresponding refinements of the prototype, we deployed the tool as a beta version on the Qatar Living website and gathered feedback about its use in the wild.

5.3 User Requirements Analysis

In order to understand the requirements of users, we analyzed existing literature characterizing the CQA domain and conducted in-depth interviews with the Qatar Living admin and target users.

5.3.1 Domain Characterization

To characterize the domain of question answering (QA) forums, we analyzed existing literature in the areas of human-computer interaction and computer supported collaborative work, focusing on what types of questions are asked [51, 64, 88], who answers and why [41, 88], and what the predictors of answer quality are [51].

Subjective nature of questions: Researchers have found that there are more subjective and opinion-based questions than factual questions [56, 64, 88]. Morris et al. surveyed QA users and found that only 17% of the questions they asked were seeking factual information, while the most common categories of questions were requests for recommendations (29%) and opinions (22%) [88]. Similar results were found for a social question answering system, with 64.7% of the queries found to be subjective [64]. Due to the nature of these questions, e.g., 'best Italian restaurant in Doha', often any particular subjective answer, for example 'I like Di Capri Ristorante', may not satisfy the information needs; therefore, the user interface should effectively support browsing various answers from multiple related threads.

Variability in answer quality: Previous work also analyzed the characteristics of good answers. Harper et al. conducted a controlled field study to analyze different predictors of answer quality across several QA sites [51]. They found that while QA sites like Yahoo! Answers provide lots of high-quality answers, users should also expect substantial variability in the quality of individual answers. To address this issue, it may be useful to apply an automatic approach to identifying high quality answers and help users navigate through these answers.

Slower response: Some researchers have explored the factors affecting answer quality and response time on QA sites. Raban and Harper identified both intrinsic factors, such as perceived ownership of information and gratitude, and extrinsic factors, such as reputation systems, that motivate CQA users to answer questions [104]. However, even when motivated people are available to answer, their response times tend to be long [65, 88]. For example, the average time to receive a response to a question posted to Microsoft's Live QA site was 2 hours and 52 minutes [65].

Interviews: In addition to analyzing existing literature, we also conducted two semi-structured interviews and a number of follow-up interviews with our collaborator at Qatar Living. We also interviewed five users in our early design process, who regularly visit the Qatar Living forum.
The goal was to understand the more specific needs and requirements for the type of forum that Qatar Living represents, to directly inform our design process.

Many naive users: Qatar Living is one of the most popular sites in Qatar, with over 550,000 visitors and over 19 million page views a month from Qatar. The Qatar Living forum is very popular in Qatar, especially among expatriates. It is actively visited by hundreds of users every day, who mainly try to fulfill their information needs in their topics of interest. However, a large portion of the forum users are naive and not proficient with sophisticated user interfaces. Therefore, an important design goal was to make the interface simple and intuitive. In addition to naive and non-expert users, there is a small dedicated group of forum moderators having higher-level expertise about the topics. These users actively browse the new questions posted in the forum and try to answer them depending on their expertise. While we have mainly focused on supporting the former group of users, we argue that moderators can also benefit from our text analytics and visual interface for their tasks.

Searching for previous questions rather than asking new ones: Our collaborator and Qatar Living users pointed out that readers usually try to get their questions answered as quickly as possible. So, they often prefer to use the search feature within the forum to find questions similar to their current question, rather than posting their questions and waiting for answers.

However, they have difficulty in exploring the threads of comments associated with similar questions, due to the large volume of comments they need to read, which is time-consuming and cumbersome using the existing search interface. This suggests a pressing need for improving the search interface for the forum to enhance the user's ability to find good answers.

Difficulty in finding good answers: Like many other CQA sites, Qatar Living forum contents are often noisy and redundant. Users tend to use very informal language, often writing very long stories with only small pieces of relevant text. Due to noisy and redundant content, the question threads can become long with only a few relevant answers. As a result, searching for relevant answers often leads to the information overload problem. To make matters worse, although there is an upvoting/downvoting feature, most users either do not know how to use this feature or do not bother to do so. This was also confirmed by the users who were interviewed during the early design. Based on this observation, our collaborators agreed that an automatic comment classifier that is reasonably accurate can be effective in identifying good answers. More importantly, the interface should help the user find the good answers, which may be scattered among the large number of comments from multiple different threads.

In summary, while designing the system for supporting information seeking tasks in forums like Qatar Living (which are typically unmoderated and contain near-duplicate questions and a lot of noisy comments), we should consider the following user requirements: 1) The interface should effectively support the user in identifying multiple good answers from related question threads. 2) To address the variability in answer quality, a classifier should be introduced to identify useful comments. 3) The interface should introduce interactive visualization components to enhance the user's ability to find good answers from a large volume of comments.
4) To support users having lower visualization expertise, the interface should be simple and intuitive.

5.3.2 Data and Task Abstractions

Tasks: In our conversations with the Qatar Living admin, we learned several use cases and tasks of the forum users. We analyzed these tasks according to a visualization task typology [92] in order to inform our design. At the high level, users are primarily interested in seeking information with the goal of discovering new information or knowledge. At this level, the user may ask questions like "Which is the best bank in Qatar?" or "Where can I find a good Chinese restaurant in Qatar?". Once the user is presented with some questions related to her new question, the next-level task is to search for the most related questions of interest by browsing the list of questions presented to her. When she finds the most related questions from the list, she then focuses on identifying, comparing, and summarizing the most useful answers to her original question.

Data: Based on our user requirements and task analysis, we derive how the data should be abstracted for visualization. As illustrated in Figure 5.1, an example dataset consists of a question asked by a user, with the set of related questions found by the system. We encode the relatedness of a question to the new question by a rank value (ordinal). Each related question is also followed by a set of comments that tried to answer that question. For each of these comments, we derive the goodness score provided by a classifier with respect to its related question and represent it as a normalized quantitative value in [0.0, 1.0], by passing the score through a sigmoid function. We also assign each comment to one of six equally sized bins depending on its classification score, to help the user understand how relevant a particular comment is. Based on this binning, we also compute the distribution of comments for each related question thread by counting how many comments fall into a particular bin. We compute this distribution because it can effectively convey to the user how many comments are useful among all the comments of a question.
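The following is a minimal sketch of this data abstraction in Python, assuming raw classifier scores as input; the function names are ours, for illustration only.

```python
import math

def goodness(raw_score):
    """Normalize a raw classifier score to [0.0, 1.0] via the sigmoid function."""
    return 1.0 / (1.0 + math.exp(-raw_score))

def bin_index(normalized_score, n_bins=6):
    """Assign a normalized score to one of n_bins equally sized bins (0 = least useful)."""
    return min(int(normalized_score * n_bins), n_bins - 1)

def thread_distribution(raw_scores, n_bins=6):
    """Count how many comments of a question thread fall into each usefulness bin."""
    counts = [0] * n_bins
    for score in raw_scores:
        counts[bin_index(goodness(score), n_bins)] += 1
    return counts

# Example: a thread with five comments and their raw classifier scores.
print(thread_distribution([-2.3, -0.4, 0.1, 1.7, 3.0]))  # -> [1, 0, 1, 1, 0, 2]
```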
Figure 5.2: Overview of our interactive system for supporting community question answering.

5.4 System Overview

Figure 5.2 presents an overview of our system, which is organized in two parts. In the offline step (Figure 5.2a), we pre-process the datasets and train a comment classifier. In the online regime (Figure 5.2b), the user enters a question as input, and the system performs three steps on the fly: retrieving the top n related question threads, ranking all the answers, and visualizing the results. We briefly discuss these steps below.

5.4.1 Offline Processing

To build the system, we used a dump of the Qatar Living forum from March 2016, and we performed several pre-processing steps, including the conversion of the XML dump to the JSON format that our interface can process. This dump contains 202,304 conversations and 2,043,022 comments. On average, each conversation consists of 10.21 comments.

We also used the CQA datasets from SemEval-2016 Task 3 (subtask A) [94], where the comments in the threads are manually annotated with good vs. bad labels, indicating how well the comments answer the question in the thread. Using this dataset, we extracted a collection of features and trained a Support Vector Machine (SVM)-based comment classifier that scores each comment in a thread regarding its goodness.

5.4.2 Online Processing

When a user types a new question q, the system performs the following three steps on the fly: (i) Retrieve related questions, where Google local search is invoked to retrieve the top-n question threads in the Qatar Living forum that are most similar to q, denoted q1, ..., qn; (ii) Rank the answers, where all the comments from these top-n question threads are ranked based on their relevance with respect to q; and (iii) Visualize the results, where the presentation module takes the related question threads together with the ranked lists of comments and the overall best selected answer, and presents them to the user.

5.5 Text Analytics

The answer ranker module computes the relevance score of a comment c in a question thread qi with respect to the new question q by combining two scores: (i) σ(q, qi), the similarity of qi to q; and (ii) γ(c, qi), the goodness score for c with respect to qi. Formally, the relevance score ρ(c, q, qi) is computed by:

ρ(c, q, qi) = σ(q, qi) × γ(c, qi)    (5.1)

We use the inverse rank in the list returned by the Google search engine as σ(q, qi), and γ(c, qi) is computed by a comment classifier, indicating how well comment c answers qi. The resulting score is used to rank all the comments from the retrieved question threads to obtain the best overall answer to the input question q. Intuitively, if a comment is a good comment with respect to the thread question, and the thread question is related to the new question, then the comment is likely to be a relevant answer to the new question. (As discussed in the SemEval-2016 Task 3 description paper [94], this is a very simple way to obtain good results for the general task of ranking answers for new questions.)
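A minimal sketch of this ranking step, assuming the search engine returns threads in best-match-first order and that each comment already carries a goodness score from the classifier (all names and values are illustrative).

```python
def rank_answers(retrieved_threads):
    """
    Rank all comments from the retrieved threads by Equation 5.1.
    `retrieved_threads` is an ordered list (best match first); each thread is a
    list of (comment_id, gamma) pairs, where gamma is the goodness score in [0, 1].
    """
    ranked = []
    for rank, thread in enumerate(retrieved_threads, start=1):
        sigma = 1.0 / rank  # inverse rank of the thread in the search results
        for comment_id, gamma in thread:
            ranked.append((sigma * gamma, comment_id))  # rho = sigma * gamma
    ranked.sort(reverse=True)
    return ranked

# Example: two retrieved threads; the best-scoring comment overall comes first.
threads = [[("t1-c1", 0.9), ("t1-c2", 0.2)], [("t2-c1", 0.95)]]
print(rank_answers(threads))  # [(0.9, 't1-c1'), (0.475, 't2-c1'), (0.2, 't1-c2')]
```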
The core NLP component of this architecture is the comment classifier, which is briefly described below.

Comment Classifier: Given a question q and a list of comments c1, ..., cm associated with it, the task of the classifier is to assign a relevance score to each of the comments according to their goodness at answering the question. This very problem was set at SemEval-2016 Task 3 [94], subtask A. We trained an SVM classifier on that dataset to distinguish between good and bad comments. (The conversations in the SemEval dataset were written in the same language, English, as the material on the Qatar Living forum site.) The dataset is split into training, development, and test sets, with 2,669, 500, and 700 questions, and 17,900, 2,440, and 7,000 answers, respectively. The kernel function in our SVM is a linear combination of four functions: two linear kernels over numeric features and embeddings, and two tree kernels over shallow syntactic trees.

Numeric Features: These features are inspired by [12]. They include three types of information: (i) a variety of textual similarity measures computed between the question and the comment; (ii) several Boolean features capturing the presence of URLs, emails, positive/negative words, acknowledgments, forum categories, long words, etc.; and (iii) a set of global features modeling dialogue and user interactions in the thread.

Embedding Features: Higher-level abstract features learned automatically by deep neural networks have proved to be quite beneficial for learning semantic similarity between two texts [39, 103, 115, 117]. We learn embeddings for questions and answers by training a convolutional neural network (CNN) on the comment classification task, following the approach of [115]. Specifically, the input to the CNN is formed by two matrices containing word embeddings for the question and for the answer, respectively. The CNN performs a convolution and a max-pooling operation on the word embeddings and on the convoluted feature maps, respectively, to produce the question embedding qE and the answer embedding cE. These embeddings are then combined to produce a similarity value using a similarity matrix. The similarity and the embeddings, along with other additional similarity features, are then passed through a hidden layer and next to the output layer for classification. qE and cE are learned by backpropagating the (cross-entropy) errors from the output layer. The qE and cE vectors are finally concatenated and used as features in our SVM model.

Tree Kernels: Tree kernels provide effective ways to learn by comparing the syntactic structures of two texts in the SVM framework, which has been shown to give state-of-the-art results in CQA [98]. First, we produce shallow syntactic trees for the question and for the comment using the Stanford parser. Following [114], we link the two trees by connecting nodes such as NP, PP, and VP when there is at least one lexical overlap between the corresponding phrases of the trees, and we mark those links using a specific tag. The kernel function K is defined as: K((q1, q2), (c1, c2)) = TK(q1, c1) + TK(q2, c2), where TK(q, c) is a tree kernel function operating over a pair of question (q) and comment (c) trees. (We use the Partial Tree Kernel and the Syntactic Tree Kernel [26, 89] to instantiate TK.)

Classification Performance: Our comment classifier was evaluated on the SemEval-2016 test set with the official scorer, obtaining the following results: MAP=77.66, AvgRec=88.05, MRR=84.93, F1=66.16, Acc=75.54. Compared to the participant systems at SemEval-2016, our system scores in second position regarding the official MAP evaluation metric (−1.5 points below the best). In contrast, our system achieves better F1 (+1.8) and better Accuracy (+0.4) than the top system. For a full description of the results from SemEval-2016, see [94].
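To make the kernel combination concrete, here is a structural sketch using scikit-learn's precomputed-kernel SVM. The tree kernel is deliberately reduced to a trivial stand-in over bags of node labels (a real implementation would use the Partial and Syntactic Tree Kernels cited above, with separate kernels for the question-side and comment-side trees), and all data are toy values.

```python
import numpy as np
from sklearn.svm import SVC

def linear_kernel(A, B):
    return A @ B.T

def tree_kernel(trees_a, trees_b):
    # Stand-in only: a real tree kernel counts shared syntactic tree fragments;
    # here each "tree" is reduced to a bag of node labels for illustration.
    return np.array([[float(len(a & b)) for b in trees_b] for a in trees_a])

def combined_kernel(feats_a, embs_a, trees_a, feats_b, embs_b, trees_b):
    """Sum of kernels over numeric features, embeddings, and syntactic trees."""
    return (linear_kernel(feats_a, feats_b)
            + linear_kernel(embs_a, embs_b)
            + tree_kernel(trees_a, trees_b))

# Toy data: 8 question-comment pairs.
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 5))    # numeric similarity/heuristic features
embs = rng.normal(size=(8, 10))    # CNN-derived question/answer embeddings
trees = [{"NP", "VP"}, {"NP"}, {"VP", "PP"}, {"NP", "PP"},
         {"S", "NP"}, {"VP"}, {"NP", "VP", "PP"}, {"S"}]
labels = np.array([1, 0, 1, 0, 1, 0, 1, 0])  # good (1) vs. bad (0) comments

gram = combined_kernel(feats, embs, trees, feats, embs, trees)
clf = SVC(kernel="precomputed").fit(gram, labels)
print(clf.predict(gram))
```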
5.6 CQAVis Design

In order to explore a large number of design choices, we carried out an iterative design process, starting from early mockups and prototypes using paper and PowerPoint. We then developed a mid-level prototype working on a small annotated CQA corpus [94], where the comments are annotated with good vs. bad labels by human experts. Finally, we developed a fully functional system and deployed it within a real CQA site. Throughout the design process, we performed formative evaluations [78] to identify potential usability issues and to iteratively refine the prototype. We now present the final design of the CQAVis interface (a live demo of CQAVis is available at iyas.qcri.org, and a video demonstration is available at https://goo.gl/IM3Gez), along with justifications for the key design decisions based on our user requirements analysis and the InfoVis literature.

The design of our visual interface was influenced by our previously developed interfaces for exploring a set of conversations (i.e., ConVis and MultiConVis); however, in this new design we took into account specific features of CQA data and tasks. A high-level design decision for the interface was to follow an overview+detail approach, where the overview represents the question list view showing the most relevant questions to the user's question, and the detail view (i.e., the conversation view) shows the question followed by the answers for a particular question thread (see Figure 5.3). We made this choice because it allows users to browse comments concerning a specific question while still having the context of the other related questions, and also because this approach has been found to be more effective for text comprehension tasks than other approaches such as zooming and focus+context [24].

Figure 5.3: A screenshot of the interface showing the top answer and related questions for a user's question. As the user selects a related question marked by the blue rectangular boundary, the interface shows the corresponding thread in the conversation view.

Questions list view: After the system finds the questions related to the user's question, it presents an overview of the ranked list of relevant questions in a scrollable list view (see Figure 5.3, left). Each item within the questions list view represents a question thread, showing a set of metadata, i.e., the original question, the posting date, and the total number of comments, as well as a stacked bar with the distribution of useful comments. Since we are representing an ordered sequence of values, we used a set of six sequential colors varying monotonically on the green color channel, ranging from dark green (highly useful) to white (not useful). In this way, the user can quickly get a sense of which threads seem to be more relevant and which threads may contain the most useful answers.

Notice that encoding the distribution of useful comments using colors within a stacked bar is analogous to how the sentiment distribution was represented within a conversation (in MultiConVis) and within a comment (in ConVis). However, here we used sequential rather than diverging colors, as the normalized usefulness score ranges from 0 (not useful) to 1 (useful).

The questions are ordered by their relevance rank by default, but the user can change this order by selecting criteria from the 'Order by' popup menu. For instance, she can order the question threads based on the number of useful answers within each of these threads.

Another important feature of the interface is that at any time the user can filter out comments with a low usefulness score by using the slider of the widget (containing a sequence of colored rectangles) at the top, as shown in Figure 5.3. In this way, the user can quickly narrow down the set of less useful comments of different question threads and focus on the ones that are potentially good answers to her question.

Note that at the top of the question list view, the interface also shows the comment that has received the best score with respect to the new question ("Top suggested answer"). This feature was designed to support the user in finding a very good answer to her question immediately, without having to open any question thread and then navigate to answers within that thread.
This was motivated by the user requirements analysis, from which we learned that users would like to find some very good answers quickly; therefore, showing the top ranked answer right away could be very useful.

Conversation view: When the user selects a particular question thread from the list, the system presents the corresponding thread in the conversation view, as shown in Figure 5.3. Again, we followed an overview+detail approach, where at the top we show a visual overview of the entire thread along with the question, followed by a detail view containing the list of comments. Here, the thread overview visually encodes the comments using a sequence of rectangles from left to right, where each rectangle represents a comment. The color within each rectangle encodes the classification score of the comment represented by that rectangle. If the horizontal space is not sufficient for showing all the comments, the rectangles are laid out in multiple rows, as shown in Figure 5.4. In this way, the thread overview visualization can scale to hundreds of comments, which is sufficient for a typical CQA forum conversation.

Figure 5.4: An example of a thread overview that splits a large number of comments into multiple rows to deal with horizontal space constraints.

From the thread overview, the user can quickly notice which comments are more useful and then immediately navigate to a particular comment by clicking on the rectangle representing that comment (see Figure 5.5). Note that the two views are coordinated, i.e., hovering on a rectangle in the thread overview highlights the corresponding comment in the detailed view (by scrolling if needed) and vice versa, thus providing the user a sense of where s/he is in the current thread and what to expect next. Finally, the user can reorder the comments of a thread based on their classification score to quickly go through the most useful answers.

Figure 5.5: When the user clicks on a rectangle in the thread overview representing a comment, the interface scrolls to that comment (marked by black color) in the conversation view.

Throughout the design of CQAVis, an important goal was to make the interface simple and intuitive for naive users, who constitute a large portion of the users of Qatar Living and similar forums. To achieve this goal, we focused on using visualization metaphors that are common and easily understood (e.g., bar graph based visualizations and sequences of rectangles) and a small set of simple, low-cost interactions [77] that can be easily triggered and reversed without imposing much cognitive load.

As one can easily notice, the design of the conversation view in CQAVis was strongly influenced by ConVis. For example, both interfaces visually encode the thread overview using a sequence of colored bars to represent comments. Moreover, in both interfaces, the thread overview and detailed view are coordinated, so that any interaction in one view is reflected in the other view. Yet there are a few notable differences between the two interfaces. First, recall that in ConVis, the topics and authors of the conversation were connected to the comments in the thread overview via explicit links. In contrast, in CQAVis, we created a compact representation of the thread overview with a sequence of rectangles positioned horizontally and removed the representation of topics and authors along the thread view.
We did this because our data and task abstractions obviated the necessity of presenting the topics and authors, and it also helped us simplify the interface considerably. Second, unlike in the ConVis design, we did not encode the comment length using the height/width of the rectangle representing a comment. The primary reason for removing this feature is that in a pilot study we found that users either did not understand what this encoding meant or, even if they understood it, did not find it to be useful (see later in Section 5.8.3).

5.7 Implementation

The system is implemented as a Java Web application and runs on an Apache Tomcat server. The back-end of the system is developed in Java. The presentation module, on the other hand, is implemented in JavaScript using the D3 and jQuery libraries. It should be noted that while implementing the presentation module, we were able to reuse parts of the implementations from our previously developed interfaces, making it faster to build the fully functional prototype.

Furthermore, our system was designed to be sufficiently fast to respond in real time to the user's actions. A key factor for this efficiency is the fact that we pre-computed and stored the goodness scores for all the comments in all the question threads from the static snapshot of the Qatar Living database. In this way, at running time there is no need to classify the comments of the already stored question-comment threads.

5.8 Web-based User Study

To better understand the potential utility of our approach in real-world scenarios, we undertook a large-scale, Web-based study. The primary aim of our study was to empirically examine how real users would use CQAVis and what their impressions would be of such a visual search interface. The main research questions were: 1) What are the possible benefits and limitations of the CQAVis interface in supporting the task of information seeking? 2) When we compare CQAVis with a typical interface for forum search, as instantiated by the Qatar Living forum, is there any difference in subjective measures?

5.8.1 Methodology

While a lab-based user study would allow us to have more control over the users and tasks, realism would be largely lost [18]. Therefore, we decided to run the study in a Web-based environment to enhance its ecological validity, since participants can then work in their own settings performing their own tasks [78]. It also gave us the advantage of collecting interaction logs from a large number of users to get deeper insights that are arguably more generalizable than those of a lab study.

5.8.2 Study Setup and Procedure

In order to run the user study, we discussed with our collaborators at Qatar Living, who agreed to incorporate our Web-based tool as a beta version of the forum site. Our system was deployed on a server, and a Web link to the system was made available on the forum search page for the real users of the Qatar Living forum. To avoid compatibility issues, we tested our interface on the Web browser versions of Mozilla Firefox, Apple Safari, and Google Chrome to ensure that we could support a wide range of participants.

Participants were guided through three main steps of the study: 1) Introduction: On the home page, some background information and example queries were provided to get started, along with an invitation to use the interface, as shown in Figure C.1. The page also contained a short video (78 seconds long) demonstrating the main features of the interface.
2) Interaction: The main part of the study was the interaction with CQAVis. Here, users were not asked to complete any specific task; instead, they could perform their own set of information seeking tasks. 3) Feedback: Participants were free to fill out a post-study questionnaire at any time during their interaction by clicking on the 'Give feedback on the new tool' button at the top. The form also allowed them to provide free-form comments and suggestions, as shown in Figure C.2. Finally, the questionnaire sought voluntary information about the age, gender, and Web experience of the participants (see Figure C.3); the study materials can be found in Appendix C. Throughout the sessions, we logged interface actions along with their timestamps in a completely non-intrusive way to better understand the usage patterns of the CQAVis tool.

5.8.3 Pilot Study

Before making the beta version publicly available and running the online study, all study aspects, including instructions and setup, went through several iterations of a pilot study. We ran this pilot study in a lab-based setting with six participants, where we collected data in the form of questionnaires, interviews, and observations.

The pilot study helped us refine both the study procedure and the prototype. For example, the pilot study suggested that background questions should be asked at the end of the study instead of at the beginning, because participants wanted to immediately explore the system without being required to fill out the questionnaire first. We also modified the types of questions being asked (e.g., we provided fewer open-ended questions). The pilot study also led us to simplify the interface by avoiding visually encoding less useful data, such as the comment length, which was originally encoded using the width of the rectangle encoding the comment itself.

5.8.4 Participants

Our online study attracted 768 participants over a period of 18 weeks. The users were recruited through the beta version link of Qatar Living, as well as through publicizing on the online social networks Facebook and Twitter, and via mailing lists.

Those participants who chose to provide their background information (5.3% of total participants) held a variety of occupations, including students; expatriates working as engineers, architects, and consultants; and researchers and professors in universities. The majority of participants were young (85% of them were below 45). Among those who indicated their gender information, 65% were male participants. In general, most of the participants were quite familiar with using the Web, with 72% of them indicating that they visit the Web quite frequently (several times a day). However, when it came to the use of online forums, the responses were mixed, ranging from rarely to very frequently, with 37% of users mentioning that they occasionally visit forums to get their questions answered (i.e., several times per month).

5.8.5 Analysis of Results

We now present both our quantitative and qualitative analysis, as well as the results based on the data collected from the user logs and questionnaires.

Sessions and queries: During the study, we captured quantitative data regarding 1,122 queries from 768 users. A summary of the queries and sessions is provided in Table 5.1. From the table, we can see that, based on the medians, a typical participant spent 142 seconds with the system and issued just 1 query per visit.
The average session lengths are considerably larger, as some participants engaged with the system for much longer time periods.

                              Min    Median   Mean    St. Dev   Max
Length of session (seconds)   1.79   142      416     666.45    3,327
Queries per session           0      1        1.47    1.54      16
Query length (characters)     1      20       22.81   14.4      200

Table 5.1: Overview of user study sessions and queries.

Question Type       Percent   Example
Recommendation      21.82     Where can I find Italian restaurants in Doha?
Opinion             18.51     Is QnB a good bank?
Factual knowledge   31.21     When does Ramadan start in 2016?
Rhetorical          0.55      How it is to live in Qatar?
Invitation          0.83      need tennis partner
Others              27.07     razor racing car

Table 5.2: Breakdown of query types along with examples.

We categorized the questions asked by the participants by following [88] to understand the nature of the information needs that were prevalent among our target users. The distribution of question types is shown in Table 5.2. Here, both opinion and recommendation questions are subjective in nature; opinion questions ask for a rating of a specific item, whereas recommendation questions make open-ended requests for suggestions. In contrast, factual questions expect objective answers. Rhetorical questions are intended to promote discussions as opposed to eliciting specific answers. An invitation asks for attendance at an event. Finally, the other category consists of queries that do not fall into any of the previous categories.

The distribution of questions was similar to what has been found in the existing literature [88], with subjective questions (i.e., opinion and recommendation) being strongly prevalent among participants (41%). This justifies the rationale for tailoring our interface to deal with subjective questions, which may require the user to read many useful comments to get the answers from various perspectives.

Subjective ratings: After interacting with the interface, the user could choose to provide feedback by clicking on the Feedback button; 56 users chose to provide feedback on the tool. In the feedback form, participants rated four different measures on a standard 5-point Likert scale: 1) 'I found this tool to be useful'; 2) 'I found this tool easy to use'; 3) 'I found this interface enjoyable to use'; 4) 'This tool enabled me to find answers relevant to my questions'.

The results of these questionnaires are presented in Figure 5.6. From the figure, we can readily see that the majority of the responses were dominated by positive ratings. In particular, most users agreed that the tool is useful and that it enabled them to find answers relevant to their questions.

Figure 5.6: Average rating of interfaces by the participants on four different measures. Longer bars indicate better rating.

Preference: In the questionnaire, participants were also asked if they would prefer this tool over their regular forum search tool. 68.75% of participants indicated a preference for CQAVis, with only a small fraction of them (6.25%) choosing the regular one. 25% indicated that they were indifferent between the two interfaces.

Interaction patterns: In addition to the questionnaires, we analyzed the log data to get insights into the interaction patterns of users. Figure 5.7 shows the percentage of users who used each interactive feature of the interface at least once. As expected, almost all the participants typed at least one query during the interaction. Similarly, most of them hovered and clicked on conversations in the question list view.
When interacting with the conversation view, over 54% of the users hovered on the thread view, and 39.4% clicked on rectangles in the thread view representing comments. This result is rather encouraging, because despite being completely new visualization features, they were used by a large portion of users. Finally, sorting and filtering comments were used by smaller numbers of users (12% and 9%, respectively). A possible explanation is that many participants did not notice these features while interacting with the interface. Another reason could be that users were able to fulfill their information needs with other interactive features.

Figure 5.7: Interface features used by the participants.

Qualitative data analysis: I analyzed the free-form text provided by 39 participants to gain insight into the users' experience with the interface. In order to make sense of these comments and suggestions, I carried out a bottom-up coding approach: first, read all the free-form texts to gain an overview of the participants' feedback; second, find common themes and associate codes accordingly; and finally, categorize the themes into the main types of feedback. Once I had identified the common themes, I read all the free-form texts again in a second pass to analyze how appropriate their associated codes were and to count the frequency of occurrences. In this second pass, no new themes emerged. All the themes resulting from my analysis are described below.

General feedback (21 participants): From this analysis, it was found that the feedback towards CQAVis was positive (67%), but there was also some negative (24%) and neutral (9%) feedback. More specifically, those who were positive towards the CQAVis interface expressed that the interface was simple and easy to use, which was an important design goal. According to participant P20, "The design of this tool is very simple and easy to use. I am impressed with the tool's accessibility and how intuitive it was...". Also, some participants' perceived speed of task completion was enhanced by the interface, as pointed out by P29: "Quick and reliable".

A number of participants thought that the system was able to satisfy their information needs effectively. For instance, P13 mentioned that "It gave me the answer I was looking for in a straightforward way, which is what you want from a search tool. No need to scroll through lots and lots of Google pages...". Similarly, P1 liked the idea of finding high quality comments from similar questions: "I like that you can get similar questions and their corresponding high quality answers immediately, without having to read all the comments". Some people also compared their positive search experience with the traditional Qatar Living search tool: "Qatar Living is difficult to search but with this tool it gets much easier" [P22].

Those who were critical of the interface mentioned that the text analytics techniques need to be more accurate: "Need more accuracy for the result" [P6]. Some people also questioned the reliability of the comments and suggested a way to filter out spam comments: "It could be made better by filtering out spam comments. Some of the information has no actual basis..." [P21]. Also, one participant suggested that for time-sensitive questions, the system should consider the timestamp of answers for ranking: "I asked: when does Ramadan start? But the top answer was actually posted few years ago" [P18].

Reactions to interface features (16 participants): There were also recurring comments on particular features of the interface.
For instance, some participants liked the idea of having the question view and the conversation view side by side: "It is nice to have the questions and answers load quickly side to side without having to open many tabs in the browser" [P22].

Several people were impressed by the visual thread overview and the color coding used to represent the usefulness of a comment: "I liked the color coding idea of the comments in the tool. It is very useful" [P24]. However, learning this feature took some time for one participant: "At the beginning it was not clear what the colored squares are..." [P8]. One possible explanation is that very few participants (2%) actually watched the video tutorial provided on the introduction page.

Suggestions for improvement (4 participants): A few participants felt that the user interface needs some improvements in general. There were also a few specific suggestions about the components; for instance, the size of the slider at the top needs to be increased so that it can be easily noticed [P10], and the interface should show the textual label 'not useful' explicitly for the comments that fall into the least useful bin [P34].

5.9 Discussion

We now discuss the implications of our results and the generalizable lessons we have learned from the design study.

5.9.1 Summary of Findings

Based on our analysis of the results, we now revisit our research questions mentioned at the beginning of the previous section. The first question was what the possible benefits and limitations of CQAVis are in supporting information seeking tasks. From the feedback data, the majority of participants who filled out the questionnaires found the interface to be useful and felt that it enabled them to find relevant answers to their questions. Also, the qualitative feedback from participants suggests that their overall impression was quite positive.

With regard to the second research question, when the participants were asked to indicate a preference, the majority of them chose CQAVis over the traditional forum search tool. However, recall that the questionnaire data was filled out by only a fraction of the participants, which may have introduced a positive bias. While this prevents us from making strong claims from the questionnaire data alone, we complement the analysis with qualitative observations based on the free-form comments as well as the interaction log data to get a deeper understanding of both the positive and negative aspects of the interface features and their usage. In particular, we have emphasized the critiques of the interface and the suggestions for improvements that we got from these users, because this neutral or negative feedback is much more likely to generalize to the whole target population (as it comes from a subset of the population with a likely positive bias).

We should also consider that the interaction log data was analyzed over all the participants, and thus arguably reflects overall usage patterns. In particular, the log data reveals that not all the interface features were equally used. While some of the new interface features, such as the thread overview, were used by a fair number of participants, some participants still did not use them. A possible explanation is that some participants might prefer the traditional way of scrolling through the comments of the thread, while still maintaining situational awareness by looking at the thread view.
In the future, capturing users' eye gaze data could shed more light on this aspect.

5.9.2 Lessons Learned

We now reflect on our design and evaluation of the CQAVis interface to summarize the lessons learned, which are arguably generalizable to other conversational domains.

Design

Most target users in our domain did not have much familiarity with complex interactive visualization. To support such users, we focused on the following design principles, which can be applicable to other domains where users have a similar level of expertise.

Less is more: In our early prototypes, we considered some advanced features, such as visually encoding additional data (e.g., comment length) and more complex interactions (e.g., navigating through the related-question graph), with the aim of better supporting users. However, the feedback from users throughout the pilot studies led us to simplify the interface iteratively, eliminating such interactive visualization features. Based on our experience, we suggest that when designing for similar populations in the domain of conversations, the designer should simplify interface features iteratively, retaining features that are not only useful but also simple and intuitive.

Enhance learnability: We found that in our study users did not tend to spend time reading the instructions or watching the video tutorial to learn the new interface. Therefore, the interface should enhance learnability by providing self-explanatory components, for example by adding more textual labels and tooltips.

Introduce familiar visualizations: During the prototyping stage, we realized that novice users in the Qatar Living forums often find it difficult to understand complex visualizations. Therefore, the interface should use visualization components that are easily understood by most people, such as bar graph based visualizations.

Evaluation

While we argue that the Web-based online study enhanced the ecological validity of the evaluation by involving real forum readers performing their real tasks, it also posed several challenges. For instance, it was difficult to collect a sufficient amount of quantitative and qualitative feedback from a large number of participants. While it is common to collect users' background and demographic information in the form of a pre-study questionnaire, in a pilot study we found that participants were reluctant to fill out such a questionnaire. Therefore, the questionnaire was included in the feedback form that the user could fill out after they had interacted with the interface.

Even then, the challenge was how to get feedback from the large number of participants who had interacted with the interface. While a button for providing feedback was available at the top of the interface, some participants did not even notice it. To further enhance the likelihood of obtaining some feedback, we introduced a pop-up screen reminding the user to submit feedback, which would appear when they moved their cursor to the top of the screen. To provide further incentives, the message mentioned that the participant would be entered into a lottery to win 50 QAR gift cards. While all of the above techniques helped us receive more feedback, we call for more research on how to elicit rich feedback from a large number of participants in a Web-based study.
5.10 Summary

In this chapter, we presented an interactive system for exploring CQA forums as an example of how our visual text analytic solutions can be simplified and tailored to specific domain problems. The resulting system, CQAVis, supports users in finding good answers to a newly posed question by combining a novel set of NLP and InfoVis techniques, informed by an understanding of the user requirements in the domain of CQA. The underlying NLP methods automatically retrieve and rank a set of comments with respect to the new question: (i) by selecting a set of question threads that are relevant to the user question, (ii) by assigning a goodness score to the comments within these threads, and (iii) by measuring the similarity between the new question and the thread questions. The visual interface, which was simplified and tailored from the MultiConVis interface, helps users in rapidly navigating through the useful comments, even if they are scattered across multiple different threads.
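To make the shape of this three-stage pipeline concrete, the following minimal sketch mimics it in Python. It is only an illustration, not the system's implementation: TF-IDF cosine similarity stands in for the learned retrieval and question-similarity components, the `goodness` heuristic is a placeholder for the trained comment classifier described earlier in this chapter, and combining the two scores by multiplication is our own simplifying assumption.

```python
# A minimal sketch of the three-stage retrieve-and-rank pipeline.
# All models here are stand-ins: TF-IDF cosine similarity replaces the
# learned retrieval and question-similarity components, and `goodness`
# is a placeholder for the trained comment classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_comments(new_question, threads, top_k_threads=10):
    """threads: list of dicts {'question': str, 'comments': [str, ...]}"""
    vec = TfidfVectorizer(stop_words='english')
    matrix = vec.fit_transform([t['question'] for t in threads] + [new_question])
    # (i) select threads whose questions are most similar to the new question
    sims = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    selected = sims.argsort()[::-1][:top_k_threads]
    ranked = []
    for idx in selected:
        for comment in threads[idx]['comments']:
            # (ii) goodness score of the comment (placeholder heuristic)
            goodness = min(len(comment.split()) / 100.0, 1.0)
            # (iii) weight goodness by the question-question similarity
            # (multiplicative combination is an assumption of this sketch)
            ranked.append((goodness * sims[idx], idx, comment))
    ranked.sort(reverse=True)
    return ranked
```

In the deployed system each stage is a learned model; the sketch only shows how the three signals compose into a single ranked list of comments drawn from multiple threads.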
Our large-scale Web study underlines the potential for tightly integrating NLP and InfoVis, offering users a new way of seeking information in CQA forums. An important finding from the study is that although a large portion of the user population did not have visualization expertise, the primary interactive visualization features were still widely used by participants. This suggests that through careful consideration of the target user characteristics and by iteratively simplifying the visual encodings of the interface, it is possible to tailor a visual text analytic system to a target population with possibly low visualization literacy - not just for those who have strong visualization expertise. It also reveals important lessons for designing and studying such systems for a user population with varying levels of expertise, lessons that can arguably be generalized to other conversational domains.

Chapter 6

Reflection and Conclusion

In this dissertation, we explored how to identify and leverage critical synergies at the intersection between natural language processing and information visualization to support users in exploring large amounts of online conversations more effectively. Our work was motivated by the challenges arising from the volume and complexity of conversational data and the shortcomings of existing approaches in dealing with such challenges.

To address the information overload problem, we explored a design space covering two dimensions: the scale of the conversational data (i.e., a single conversation vs. a set of conversations) and the underlying text analytic model (i.e., static vs. human-in-the-loop). We developed and evaluated a set of systems, each addressing a different aspect of this design space, which are presented in Chapters 2, 3, and 4. Subsequently, we conducted a design study to demonstrate that our solutions can be successfully tailored to develop a new system for addressing specific domain problems, such as those faced by users in a community question answering forum, as described in Chapter 5.

In this final chapter, we revisit our approach for designing visual text analytics systems (Section 6.1), reflect upon the research impact of these systems (Section 6.2), summarize our contributions (Section 6.3), and indicate open research questions and directions for future work (Section 6.4). We conclude the dissertation with some closing remarks about visual text analytics for online conversations.

6.1 Reflection on the Design Approach

After presenting the design studies that focused on how to tightly integrate NLP and InfoVis for exploring online conversations, we take a step back to reflect on the wider context of designing visual text analytics. In particular, we critically analyze the role of the designer in the process of integrating NLP and InfoVis techniques, to derive lessons that are broadly applicable to designing visual text analytics systems.

Within the visualization community, there has been significant advancement in the field of design study methodologies [86, 91, 111], which provide guidelines on how to perform design activities and how to validate different stages of a design. However, when developing a visual text analytics system, in addition to designing the visualizations, it is also necessary to devise a set of text analysis methods and to validate the interpretability, accuracy, and usefulness of the output generated by these methods. Arguably, devising suitable text analysis methods is just as critical as visualization design in determining the effectiveness of a visual text analytics system. Therefore, within a user-centered design approach, a designer must consider what the most suitable text analysis methods are and how to iteratively modify these methods when their output is not sufficiently interpretable, accurate, and useful.

Unfortunately, when designing visual text analytics systems, many researchers treat text analysis models as black boxes without considering whether they are the most suitable models for the target domain problem. For example, many text visualizations select the terms to be displayed based on their frequency [37] or their TF-IDF scores [124], even though more sophisticated techniques are available [52] that could select better descriptive keyphrases. By not considering the most suitable text analysis methods, the system often fails to effectively support the real-world tasks.
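For readers unfamiliar with this baseline, the sketch below shows the kind of frequency-based term selection the critique refers to: terms are scored by TF-IDF and the top-scoring ones are displayed, with no deeper linguistic analysis. The scoring formula (raw term frequency times log inverse document frequency) is one common variant among several; the function name and example data are ours, purely for illustration.

```python
# Baseline TF-IDF term selection, the kind of black-box choice the
# surrounding text critiques. Scores each term by tf * log(N / df).
import math
from collections import Counter

def top_terms(docs, k=5):
    """docs: list of token lists; returns the top-k (term, score) per doc."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    top = []
    for doc in docs:
        tf = Counter(doc)
        scores = {t: tf[t] * math.log(n / df[t]) for t in tf}
        top.append(sorted(scores.items(), key=lambda kv: -kv[1])[:k])
    return top

# Example: terms unique to one document outrank terms shared by all
# documents, which score zero (log(N/N) = 0).
docs = [["topic", "model", "blog"], ["blog", "comment", "comment"]]
print(top_terms(docs))
```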
Contrary to this trend, Chuang et al. opened the black box and focused on how to devise an interpretable and trustworthy text analysis model by aligning the model with the tasks and user expertise in a particular problem domain [21]. From their experience of designing a dissertation browser, they distilled the following process-oriented guidelines: align the model with the tasks, user expertise, and visual encoding; verify the modeling decisions to assess how well they fit an analyst's goals; iteratively modify a model when its output is incorrect or incomplete; and progressively disclose data at multiple levels of abstraction, so that analysts can switch between those levels to interpret and verify the model's output. Within their guidelines, they discussed possible approaches for improving a candidate model, such as modifying model parameters, modifying the model structure, adding more training data, and leveraging interactive machine learning techniques. Unfortunately, no adequate guidelines were provided on when and how to choose a particular model modification approach. For instance, even though they acknowledged that modifying the model by introducing interactive machine learning techniques is a challenging problem, no further guidelines were provided on when and how the designer should devise such techniques so that the resulting system improves significantly.

Echoing the call for aligning the model with the tasks and visual encoding [21], we focused on the problem domain characterization and task abstractions first, and then devised suitable text analysis models. Based on our data and task abstractions, we also made modeling choices, i.e., choosing suitable topic modeling and sentiment analysis methods.

In the next step, we verified the performance of the topic model with end users. For instance, after designing ConVis, we attempted to verify the performance of the topic models by running an informal evaluation with a small number of real users, as discussed in Chapter 2. Through this evaluation, we identified that the current model sometimes does not match the user's mental model and current tasks. Therefore, we pondered how the model could be modified so that it can support users in performing their tasks more effectively. Since our analysis revealed that the perceived usefulness of the topic model depends on the user's mental model and current tasks, allowing the user to revise the model was deemed more promising than alternative approaches, such as modifying the model parameters or the model structure.

Once we had decided to introduce interactive topic modeling (as described in Chapter 4), we faced another important challenge: how to devise a minimal set of interactive topic revision operations that real users would find useful. To approach this issue, we first identified a set of candidate operations from the existing literature on topic revision, and then we prioritized the operations based on three criteria: 1) task relevancy, 2) topic model relevancy, and 3) redundancy. These criteria helped us tie the model modification process to the task abstractions and the current topic model. As a result, we were able to design a new interactive topic modeling method that better matches the goals of users engaged in exploring and analyzing conversations. An overview of our design process is illustrated in Figure 6.1. A similar design process was successfully applied when we introduced the human-in-the-loop topic modeling approach in MultiConVisIT.

Figure 6.1: Design stages of ConVis and ConVisIT.
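To give a flavor of what such topic revision operations look like in practice, here is a hypothetical sketch of two of them (merging two topics and splitting off a subtopic) over a simple mapping from topic labels to sets of comment ids. This is not the dissertation's implementation, which operates on the topic models of Chapter 4; it only illustrates the kind of user feedback these operations express.

```python
# Hypothetical sketch of two topic revision operations over a simple
# mapping from topic labels to sets of comment ids. Illustrative only.
def merge_topics(topics, label_a, label_b, new_label):
    """User feedback: topics a and b are really the same topic."""
    topics[new_label] = topics.pop(label_a) | topics.pop(label_b)
    return topics

def split_topic(topics, label, keyword, comments, new_label):
    """User feedback: comments mentioning `keyword` form their own topic."""
    moved = {cid for cid in topics[label] if keyword in comments[cid].lower()}
    topics[label] -= moved
    topics[new_label] = moved
    return topics
```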
Based on our experience and the current literature, we summarize the following guidelines for a designer of a visual text analytics system. We suggest that, rather than treating text analysis models as black boxes, the designer should consider how to tailor and adapt these models based on a detailed analysis of the specific user needs and requirements in the target domains. Furthermore, the designer should iteratively analyze the performance of the text analysis methods to determine whether to introduce a human-in-the-loop in the computation process or to focus on improving the model without human supervision. Finally, if the designer decides to introduce human-in-the-loop computation, the type of interactive feedback operations for modifying the model should be derived from the data and task abstractions in the target domain. In essence, we call for applying the user-centered design approach to inform and iteratively refine both the text analysis methods and the interactive visualization design.

6.2 Impact of Our Visual Text Analytics Systems

Since our visual text analytics systems have been made publicly available, they have been tailored and adopted in a variety of domains, both in our work as well as in other research projects. In Chapter 5, we have already reported a design study for a community question answering forum, where our visual text analytics solutions were simplified and tailored to support information seeking tasks.

In addition to our work, several other researchers have applied or partially adopted the data abstractions and visual encodings of MultiConVis and ConVis in a variety of domains, ranging from news comments [32, 107] to online health forums [76, 84] to educational forums [47]. We now analyze these recent works and discuss their similarities and differences with our systems.

News comments: SENSEI (www.sensei-conversation.eu) is a research project that was funded by the European Union and was conducted in collaboration with four leading universities and two industry partners in Europe. The main goal of this project was to develop summarization and analytics technology to help users make sense of human conversation streams from diverse media channels, ranging from comments generated for news articles to customer-support conversations in call centers.

After the research work on developing ConVis was published and the tool was made publicly available, the SENSEI project researchers expressed their interest in adopting our system. Their primary objective was to evaluate their text summarization and analytics technology by visualizing the results with ConVis, with the final goal of detecting end-user improvements in task performance and productivity.

In their version of the interface (a video demo is available at www.youtube.com/watch?v=XIMP0cuiZIQ), they kept the main features of ConVis, namely the topics, authors, and thread overview, and then added some new features to show text analytics results specific to their application, as shown in Figure 6.2 [107]. In particular, within the thread overview, for each comment they encoded how much the comment agrees or disagrees with the original article, instead of showing the sentiment distribution of that comment. Another additional interactive feature is that clicking on an author element shows the predicted mood of that author (using five different moods, i.e., amused, satisfied, sad, indignant, and disappointed). Furthermore, they added a summary view that shows a textual summary of the conversation in addition to the detailed comments. Finally, they introduced some new interactive features, such as zooming and filtering, to deal with very long conversations containing several hundred comments.

Figure 6.2: A screenshot of the modified ConVis interface used in the SENSEI project. The interface shows the results of some additional text analysis methods, namely the degree of agreement/disagreement between a comment and the original article (within the thread overview), the predicted mood of the corresponding author (A), and the textual summary of the conversation (B) [107].

Online health forums: Kwon et al. developed VisOHC [76], a visual analytics system designed for administrators of online health communities (OHCs). In their paper, they cite our work and discuss the similarities as well as the differences between VisOHC and ConVis. For instance, similar to the thread overview in ConVis, they represented the comments of a conversation using a sequence of rectangles and used the color encoding within those rectangles to represent sentiment (see Figure 6.3). However, they encoded additional data in order to support the specific domain goals and tasks of OHC administrators.
For instance, they used a scatter plot to encode the similarities between discussion threads and a histogram view to encode various statistical measures regarding the selected threads, as shown in Figure 6.3.

Figure 6.3: VisOHC visually represents the comments of a conversation using a sequence of rectangles (F), where the color within each rectangle represents the sentiment expressed in a comment. Additionally, it shows a scatter plot (B) and a histogram view (C). (The figure is adapted from [76].)

Mamykina et al. analyzed how users in online health communities collectively make sense of the vast amount of information and opinions within an online diabetes forum called TuDiabetes [84]. Their study found that members of TuDiabetes often value a multiplicity of opinions rather than consensus. From their study, they concluded that a visual text analytics tool like ConVis could be very effective in facilitating the collective sensemaking of such a diversity of opinions. They also mentioned that, in addition to topic modeling and sentiment analysis, some other text analysis methods relevant to the health forum under study, such as the detection of agreement and topic shift in conversation, should be devised and incorporated into tools like ConVis.

Educational forums: More recently, Fu et al. presented iForum, an interactive visual analytics system for helping instructors understand the temporal patterns of student activities and discussion topics in a MOOC forum [47]. They mentioned that while the design of iForum has been inspired by tools such as ConVis, they have tailored their interface to the domain-specific problems of MOOC forums. For instance, like ConVis, their system also provides an overview of topics and discussion threads; however, they focused more on the temporal trends of an entire forum, as opposed to an individual conversation or a set of conversations related to a specific query.

Beyond online conversations: Recently, Shen et al. presented NameClarifier [116], a visual analytics system that supports the user in interactively disambiguating author names in bibliographic citation records. In this system, they partially adopted our visual encoding technique for arranging the facets (namely, topics and authors) and exposing the relations between them. More specifically, they encoded the relations between a list of ambiguous authors and a list of confirmed authors via subtle curves, similar to how ConVis visualizes the relations between topics and authors by linking them to the corresponding comments in the thread overview.

6.3 Summary of Contributions

The contributions of this dissertation can be summarized as follows:

• A user requirements analysis, based on an extensive literature review in the domain of blogs, to inform our interface design for both a single conversation and a set of conversations (Chapters 2 and 3).

• Adoption of a topic modeling method for mining topics from a single conversation by taking advantage of the conversational features (Chapter 2). We also extended this method for creating a topic hierarchy for a collection of conversations, by organizing the topics extracted from each conversation in the collection (Chapter 3).
• The design of ConVis and MultiConVis, which visualize both topic and opinion mining results along with a set of metadata, such as the authors and positions of the comments. We also proposed a way to seamlessly integrate the two interfaces to allow users to switch from exploring a collection of conversations to a single conversation (Chapters 2 and 3).

• Results from a series of user studies, namely an informal evaluation, a formal lab-based study, and three case studies, which revealed the differences in user performance and subjective opinions when our systems were compared to traditional blog interfaces for exploring conversations (Chapters 2 and 3).

• A novel interactive topic modeling approach specifically devised for asynchronous conversations. We developed: i) a system that revises the topic model generated from a single conversation, as well as the topic model generated from a collection of conversations (Chapter 4), and ii) interactive features to help the user in performing a set of topic revision operations (Chapter 4).

• Results from two lab-based user studies, which revealed the potential utility of our human-in-the-loop topic modeling approaches (Chapter 4).

• A demonstration of how our generic solutions for integrating NLP and InfoVis techniques, presented in Chapters 2 and 3, can be simplified and tailored to the information seeking tasks of community question answering forum users (Chapter 5).

• An evaluation of our new community question answering forum tool in the wild, in an ecologically valid setting, by deploying the system among real forum readers (Chapter 5).

• Generalizable lessons that can inform the design of visual interfaces for online conversations in other domains, as well as the design for user populations with possibly low visualization expertise (Chapter 5).

6.4 Limitations and Future Work

While this thesis has made some significant progress in supporting the tasks of exploring online conversations, it also raises further challenges, open questions, and ideas for future work. Here we discuss the key challenges and opportunities for future research.

How can we scale up our systems for big data? As social media conversational data grows in size and complexity at an unprecedented rate, new challenges have emerged from both computational and visualization perspectives. In particular, we need to address the following aspects of big data when designing visual text analytics for online conversations:

Volume: Most of the existing visualizations are inadequate for handling very large amounts of raw conversational data. For example, ConVis scales to conversations with hundreds of comments; however, it is unable to deal with a very long conversation consisting of more than a thousand comments. To tackle this scalability issue, we will investigate computational methods for filtering and aggregating comments, as well as interactive visualization techniques, such as zooming, to progressively disclose the data from a high-level overview to low-level details.
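As one hypothetical illustration of such filtering, the sketch below prunes a thread to the few replies with the largest subtrees under each comment, collapsing the rest into an aggregate placeholder. Subtree size is just one possible importance signal, and nothing here reflects an implemented part of our systems; it is only meant to make the idea of comment aggregation concrete.

```python
# Hypothetical thread-pruning heuristic: within each comment, keep only
# the k replies with the largest subtrees and aggregate the rest.
from dataclasses import dataclass, field

@dataclass
class Comment:
    cid: int
    text: str
    replies: list = field(default_factory=list)

def subtree_size(c):
    """Number of comments in the subtree rooted at c (including c)."""
    return 1 + sum(subtree_size(r) for r in c.replies)

def prune(c, k=3):
    """Recursively keep the k largest reply subtrees; summarize the rest."""
    kept = sorted(c.replies, key=subtree_size, reverse=True)[:k]
    hidden = len(c.replies) - len(kept)
    c.replies = [prune(r, k) for r in kept]
    if hidden:
        c.replies.append(Comment(-1, f"[{hidden} more replies hidden]"))
    return c
```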
Velocity: The systems that we have developed do not process streaming conversations. Yet in many real-world scenarios, conversational data is constantly produced at a high rate, which poses enormous challenges for mining and visualization methods. For instance, immediately after a product is released, a business analyst may want to analyze text streams in social media to identify problems or issues, such as whether customers are complaining about a feature of the product. In these cases, timely analysis of the streaming text can be critical for the company's reputation. For this purpose, we aim to investigate how to efficiently mine streaming conversations and how to visualize the extracted information to the user in real time.

How can we support the user in tailoring our systems to a specific conversational genre, a specific domain, or specific tasks? In Section 6.2, we already discussed how our current visual text analytics systems have been applied and tailored to various domains. However, in these systems, the user does not have flexibility in terms of the choice of datasets and the available interaction techniques. Therefore, it may take a significant amount of programming effort to re-design the interface for a specific conversational domain. For example, when we tailored our system to a community question answering forum with a specific user population in mind, we had to spend a considerable amount of time modifying the existing code in order to re-design the interface for the new conversational genre.

In this context, can we enable a large number of users - not just those who have strong programming skills - to author visual interfaces for exploring conversations in a new domain? To answer this question, we need to research how to construct an interactive environment that supports custom visualization design for different domains without requiring the user to write any code. Such an interactive environment would allow the user to have more control over the data to be represented and the interaction techniques to be supported. To this end, we will investigate current research on general-purpose visual authoring tools such as Lyra [110] and IVisDesigner [106], which provide custom visualization authoring environments, to understand how we can build a similar tool specifically for conversational data.

How can the system adapt to a diverse range of users? A critical challenge of introducing a new visualization is that the effectiveness of visualization techniques can be impacted by different user characteristics, such as visualization expertise, cognitive abilities, and personality traits [27]. Unfortunately, most previous work has focused on finding individual differences for simple visualizations only, such as bar and radar graphs [122]. It is still unknown how individual differences might impact a more complex visualization like ConVis, which not only requires coordination between text and visualization but also supports more complex interaction techniques. In this regard, we will examine what aspects of a visual text analytics system are impacted by user characteristics and how to dynamically adapt the visualization to such characteristics.

How can we leverage text analysis and visualization techniques to develop advanced storytelling tools for online conversations? Data storytelling has become increasingly popular among InfoVis practitioners such as journalists, who may want to create a visualization from social media conversations and integrate it into their narratives to convey critical insights. Unfortunately, even sophisticated visualization tools like Tableau (www.tableau.com) offer only limited support for authoring data stories, requiring users to manually create textual annotations and organize the sequence of visualizations. More importantly, they do not provide methods for processing the unstructured or semi-structured data generated in online conversations. In this context, we aim to investigate how to leverage NLP and InfoVis techniques for online conversations to create effective semi-automatic authoring tools for data storytelling.
More specifically, we need to devise methods for generating and organizing summary content from online conversations and for choosing the sequence in which such content is delivered to users. To this end, we will investigate current research on narrative visualization [67, 112].

6.5 Final Remarks

The overarching goal of this dissertation was to combine text analysis and interactive visualization techniques to support users in exploring online conversations. To that aim, we posed a set of research questions in Chapter 1 that guided the development of our visual text analytics systems. These research questions were answered by synthesizing design study methodologies in information visualization, text analysis methods specifically designed to deal with conversational data, and human-in-the-loop computation to deal with noisy text analysis results.

We applied these considerations to the design, implementation, and evaluation of a variety of text analytics systems. Our first system, ConVis, addresses the challenges of exploring and analyzing an asynchronous conversation by offering a visual overview of the topics, authors, and thread structure of a conversation (Chapter 2). Next, MultiConVis moves beyond visualizing a single conversation to a collection of conversations related to a given query (Chapter 3). It combines a novel hierarchical topic modeling technique with interactive visualization in order to support users in understanding the discussions, and allows them to seamlessly switch from exploring a collection of conversations to a single conversation. We conducted a series of user studies through informal evaluation, case studies, and lab-based studies, which revealed significant improvements in user performance and subjective measures when our systems were compared to traditional blog interfaces. The outcomes from these studies also motivated us to introduce an interactive topic modeling approach. The resulting systems, ConVisIT and MultiConVisIT, empower the user in revising the underlying topic models through an intuitive set of interactive features when the current models are noisy and/or insufficient to support their information seeking tasks (Chapter 4). Finally, the online deployment of CQAVis, a visual interface for supporting information seeking in community question answering forums, demonstrates that our systems can be effectively tailored to a specific domain problem - a critical finding that indicates the generality and applicability of our approach (Chapter 5).

Despite the tremendous advances in NLP and InfoVis, little effort has been devoted to combining sophisticated text analysis and interactive visualization techniques in a synergistic way to address information overload problems. This dissertation demonstrates that by tightly integrating advanced text analysis and InfoVis techniques, guided by a human-centered design approach, we can effectively support users in dealing with these problems in a variety of contexts.

We believe that exploring online conversations is just one example of a task that can be supported more effectively by combining techniques from text analysis and information visualization, guided by a user-centered design approach.
Beyond online conversations, there are many other types of text collections, such as scientific documents, news articles, and literature, where creating a strong synergy between these two research areas is critical in addressing the information overload problem. Therefore, we envision that a similar approach for combining NLP and InfoVis could also help users in exploring these text collections more efficiently and effectively.

Bibliography

[1] Macrumors, 2016 (accessed December 28, 2016). http://macrumors.com/. → pages 3, 50, 52, 69
[2] Slashdot, 2016 (accessed January 28, 2016). http://slashdot.com/. → pages 35, 52
[3] Daily Kos, 2017 (accessed February 01, 2017). http://dailykos.com/. → pages 35, 69
[4] Alexa's Internet traffic rating service, 2017 (accessed February 25, 2017). http://www.alexa.com/topsites. → pages 1
[5] ColorBrewer, 2017 (accessed February 25, 2017). http://colorbrewer2.org/. → pages 37
[6] Pew Research, 2017 (accessed February 25, 2017). http://www.pewinternet.org/2016/11/11/social-media-update-2016/. → pages 1
[7] Wordpress user activities, 2017 (accessed February 25, 2017). https://wordpress.com/activity/. → pages 1
[8] Illinois Part of Speech Tagger, 2017 (accessed March 09, 2017). http://bit.ly/1xtjFHe. → pages 57
[9] R. Aggarwal, R. Gopal, R. Sankaranarayanan, and P. V. Singh. Blog, blogger, and the firm: Can negative employee posts lead to positive outcomes? Information Systems Research, 23(2):306–322, 2012. → pages 25, 26
[10] E. Alberdi and D. H. Sleeman. Retax: A step in the automation of taxonomic revision. Artificial Intelligence, 91(2):257–279, Apr. 1997. → pages 82
[11] D. Andrzejewski, X. Zhu, and M. Craven. Incorporating domain knowledge into topic modeling via Dirichlet forest priors. In Proceedings of the International Conference on Machine Learning, pages 25–32, 2009. → pages 81, 83, 84
[12] A. Barrón-Cedeño, S. Filice, G. Da San Martino, S. Joty, L. Màrquez, P. Nakov, and A. Moschitti. Thread-level information for comment classification in community question answering. In Proceedings of the Annual Meeting of the Association for Computational Linguistics and the International Joint Conference of the Asian Federation of Natural Language Processing (ACL-IJCNLP), pages 687–693, 2015. → pages 128
[13] E. Baumer, M. Sueyoshi, and B. Tomlinson. Exploring the role of the reader in the activity of blogging. In Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI), pages 1111–1120, 2008. → pages 25, 26, 29, 55
[14] D. Bawden and L. Robinson. The dark side of information: overload, anxiety and other paradoxes and pathologies. Journal of Information Science, 35(2):180–191, 2009. → pages 2
[15] D. Boyd. A blogger's blog: Exploring the definition of a medium. Reconstruction, 6(4), 2006. → pages 11
[16] N. Cao, D. Gotz, J. Sun, Y.-R. Lin, and H. Qu. SolarMap: Multifaceted visual analytics for topic exploration. In Proceedings of the IEEE Conference on Data Mining (ICDM), pages 101–110, 2011. → pages 32, 33
[17] G. Carenini, G. Murray, and R. Ng. Methods for Mining and Summarizing Text Conversations. Morgan Claypool, 2011. → pages 1, 3, 10, 11, 78
[18] S. Carpendale. Evaluating information visualizations. In Information Visualization: Human-Centered Issues and Perspectives, pages 19–45. Springer, 2008. → pages 75, 135
[19] J. Choo, C. Lee, C. K. Reddy, and H. Park. UTOPIAN: User-driven topic modeling based on interactive nonnegative matrix factorization. IEEE Transactions on Visualization and Computer Graphics (Proceedings of VAST), 19(12):1992–2001, 2013. → pages 20, 80, 81, 82, 83, 84
[20] J. Chuang, C. D. Manning, and J. Heer. Termite: Visualization techniques for assessing textual topic models. In Proceedings of the International Working Conference on Advanced Visual Interfaces (AVI), pages 74–77, 2012. → pages 81
[21] J. Chuang, D. Ramage, C. Manning, and J. Heer. Interpretation and trust: Designing model-driven visualizations for text analysis. In Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI), pages 443–452. ACM, 2012. → pages 146, 147
[22] J. Chuang, S. Gupta, C. Manning, and J. Heer. Topic model diagnostics: Assessing domain relevance via topical alignment. In Proceedings of the Conference on Machine Learning, pages 612–620, 2013. → pages 78
[23] J. Chuang, Y. Hu, A. Jin, J. D. Wilkerson, D. A. McFarland, C. D. Manning, and J. Heer. Document exploration with topic modeling: Designing interactive visualizations to support effective analysis workflows. In NIPS Workshop on Topic Models: Computation, Application and Evaluation, pages 1–4, 2013. → pages 81, 83
[24] A. Cockburn, A. Karlson, and B. B. Bederson. A review of overview+detail, zooming, and focus+context interfaces. ACM Computing Surveys (CSUR), 41(1):2, 2008. → pages 36, 130
[25] C. Collins, S. Carpendale, and G. Penn. DocuBurst: Visualizing document content using language structure. In Computer Graphics Forum, volume 28, pages 1039–1046. Wiley Online Library, 2009. → pages 6
[26] M. Collins and N. Duffy. Convolution kernels for natural language. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems (NIPS), pages 625–632. MIT Press, 2002. → pages 129
[27] C. Conati, G. Carenini, E. Hoque, B. Steichen, and D. Toker. Evaluating the impact of user characteristics and different layouts on an interactive visualization for decision making. Computer Graphics Forum (Proceedings of EuroVis), 33(3):371–380, 2014. → pages 155
[28] W. Cui, S. Liu, Z. Wu, and H. Wei. How hierarchical topics evolve in large text corpora. IEEE Transactions on Visualization and Computer Graphics (Proceedings of InfoVis), 20(12):2281–2290, 2014. → pages 53, 54
[29] C. Culy and V. Lyding. Double Tree: An advanced KWIC visualization for expert users. In 14th International Conference Information Visualisation, pages 98–103, 2010. → pages 6
[30] K. Dave, M. Wattenberg, and M. Muller. Flash forums and forumReader: navigating a new kind of large-scale online discussion. In Proceedings of the ACM Conference on Computer-Supported Cooperative Work (CSCW), pages 232–241, 2004. → pages 25, 26, 31, 55
[31] M. De Choudhury and H. Sundaram. Why do we converse on social media?: an analysis of intrinsic and extrinsic network factors. In Proceedings of the ACM SIGMM International Workshop on Social Media, pages 53–58. ACM, 2011. → pages 55
[32] D. Park, S. Sachar, N. Diakopoulos, and N. Elmqvist. Supporting comment moderators in identifying high quality online news comments. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI), pages 1114–1125, 2016. → pages 2, 149
[33] N. Diakopoulos, M. Naaman, and F. Kivran-Swaine. Diamonds in the rough: Social media visual analytics for journalistic inquiry. In IEEE Symposium on Visual Analytics Science and Technology (VAST), pages 115–122, 2010. → pages 54
[34] J. Donath. A semantic approach to visualizing online conversations. Communications of the ACM, 45(4):45–49, 2002. → pages 31
[35] J. Donath, K. Karahalios, and F. Viégas. Visualizing conversation. Journal of Computer-Mediated Communication, 4(4):1–9, 1999. → pages 25
[36] M. Dörk, S. Carpendale, C. Collins, and C. Williamson. VisGets: Coordinated visualizations for web-based information exploration and discovery. IEEE Transactions on Visualization and Computer Graphics (Proceedings of InfoVis), 14(6):1205–1212, 2008. → pages 52
[37] M. Dörk, D. Gruen, C. Williamson, and S. Carpendale. A visual backchannel for large-scale events. IEEE Transactions on Visualization and Computer Graphics (Proceedings of InfoVis), 16(6):1129–1138, 2010. → pages 53, 146
[38] M. Dörk, N. H. Riche, G. Ramos, and S. Dumais. PivotPaths: Strolling through faceted information spaces. IEEE Transactions on Visualization and Computer Graphics (Proceedings of InfoVis), 18(12):2709–2718, Dec. 2012. → pages 32, 33
[39] C. dos Santos, L. Barbosa, D. Bogdanova, and B. Zadrozny. Learning hybrid representations to retrieve semantically equivalent questions. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL) and the International Joint Conference on Natural Language Processing (ACL-IJCNLP), pages 694–699, 2015. → pages 128
[40] W. Dou, L. Yu, X. Wang, Z. Ma, and W. Ribarsky. HierarchicalTopics: Visually exploring large text collections using topic hierarchies. IEEE Transactions on Visualization and Computer Graphics (Proceedings of VAST), 19(12):2002–2011, 2013. → pages 53, 82
[41] N. B. Ellison, R. Gray, J. Vitak, C. Lampe, and A. T. Fiore. Calling all Facebook friends: Exploring requests for help on Facebook. In Proceedings of the International Conference on Web and Social Media (ICWSM), pages 155–164. → pages 122
[42] J. L. Elsas. Leveraging collection structure in information retrieval with applications to search in conversational social media. PhD thesis, Carnegie Mellon University. → pages 11
[43] A. Endert, P. Fiaux, and C. North. Semantic interaction for visual text analytics. In Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI), pages 473–482. ACM, 2012. → pages 84
[44] S. Faridani, E. Bitton, K. Ryokai, and K. Goldberg. Opinion Space: a scalable tool for browsing online comments. In Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI), pages 1175–1184, 2010. → pages 32
[45] N. FitzGerald, G. Carenini, G. Murray, and S. Joty. Exploiting conversational features to detect high-quality blog comments. In Proceedings of the Canadian Conference on Artificial Intelligence, pages 739–744, 2007. → pages 46
[46] J. L. Fleiss, B. Levin, and M. C. Paik. Statistical methods for rates and proportions. John Wiley & Sons, 2013. → pages 106, 114
[47] S. Fu, J. Zhao, W. Cui, and H. Qu. Visual analysis of MOOC forums with iForum. IEEE Transactions on Visualization and Computer Graphics (Proceedings of VAST), 23(1):201–210, 2017. → pages 149, 152
[48] T. Furukawa, Y. Matsuo, I. Ohmukai, K. Uchiyama, and M. Ishizuka. Social networks and reading behavior in the blogosphere. In Proceedings of the International AAAI Conference on Weblogs and Social Media (ICWSM), 2007. → pages 25
[49] M. Galley, K. McKeown, E. Fosler-Lussier, and H. Jing. Discourse segmentation of multi-party conversation. In Proceedings of the Annual Meeting on Association for Computational Linguistics (ACL), pages 562–569, 2003. → pages 33, 59
[50] P. Goffin, W. Willett, J.-D. Fekete, and P. Isenberg. Exploring the placement and design of word-scale visualizations. IEEE Transactions on Visualization and Computer Graphics (Proceedings of InfoVis), 20(12):2291–2300, Dec 2014. → pages 64
[51] F. M. Harper, D. Raban, S. Rafaeli, and J. A. Konstan. Predictors of answer quality in online Q&A sites. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI), pages 865–874, 2008. → pages 122, 123
[52] K. S. Hasan and V. Ng. Automatic keyphrase extraction: A survey of the state of the art. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), pages 1262–1273, 2014. → pages 146
[53] S. Havre, E. Hetzler, P. Whitney, and L. Nowell. ThemeRiver: visualizing thematic changes in large document collections. Transactions on Visualization and Computer Graphics, 8(1):9–20, 2002. → pages 53
[54] M. A. Hearst. TileBars: visualization of term distribution information in full text information access. In Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI), pages 59–66, 1995. → pages 6
[55] M. A. Hearst. Design recommendations for hierarchical faceted search interfaces. In Proceedings of the SIGIR Workshop on Faceted Search, pages 26–30, 2006. → pages 32
[56] M. A. Hearst. 'Natural' search user interfaces. Communications of the ACM, 54(11):60–67, Nov. 2011. → pages 122
[57] M. A. Hearst, M. Hurst, and S. T. Dumais. What should blog search look like? In Proceedings of the ACM Workshop on Search in Social Media, pages 95–98, 2008. → pages 25, 55
[58] J. Heer and G. G. Robertson. Animated transitions in statistical data graphics. Transactions on Visualization and Computer Graphics, 13(6):1240–1247, 2007. → pages 93, 96
[59] E. Hoque and G. Carenini. ConVis: A visual text analytic system for exploring blog conversations. Computer Graphics Forum (Proceedings of EuroVis), 33(3):221–230, 2014. → pages v, 23, 54, 66, 78, 83
[60] E. Hoque and G. Carenini. MultiConVis: A visual text analytics system for exploring a collection of online conversations. In Proceedings of the ACM Conference on Intelligent User Interfaces (IUI), pages 96–107, 2016. → pages v, vi, 49, 77
[61] E. Hoque and G. Carenini. Interactive topic modeling for exploring asynchronous online conversations: Design and evaluation of ConVisIT. ACM Transactions on Interactive Intelligent Systems (TiiS), 6(1):7:1–7:24, Feb. 2016. → pages vi, 78
[62] E. Hoque, G. Carenini, and S. R. Joty. Interactive exploration of asynchronous conversations: Applying a user-centered approach to design a visual text analytic system. In Proceedings of the Workshop on Interactive Language Learning, Visualization, and Interfaces (ILLVI 2014), in conjunction with ACL 2014, pages 45–52, 2014. → pages v
[63] E. Hoque, S. Joty, L. Màrquez, and G. Carenini. CQAVis: Visual text analytics for community question answering. In Proceedings of the ACM Conference on Intelligent User Interfaces (IUI), pages 161–172, 2017. → pages vi, 119
[64] D. Horowitz and S. D. Kamvar. The anatomy of a large-scale social search engine. In Proceedings of the International Conference on World Wide Web (WWW), pages 431–440. ACM, 2010. → pages 122
[65] G. Hsieh and S. Counts. mimir: A market-based real-time question and answer service. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI), pages 769–778, 2009. → pages 123
[66] Y. Hu, J. Boyd-Graber, B. Satinoff, and A. Smith. Interactive topic modeling. Machine Learning, 95(3):423–469, 2014. → pages 20, 78, 80, 81, 83, 84
[67] J. Hullman and N. Diakopoulos. Visualization rhetoric: Framing effects in narrative visualization. IEEE Transactions on Visualization and Computer Graphics (Proceedings of InfoVis), 17(12):2231–2240, 2011. → pages 156
[68] Indratmo, J. Vassileva, and C. Gutwin. Exploring blog archives with interactive visualization. In Proceedings of the Working Conference on Advanced Visual Interfaces (AVI), pages 39–46, 2008. → pages 31, 50, 52
[69] Q. Jones, G. Ravid, and S. Rafaeli. Information overload and the message dynamics of online interaction spaces: A theoretical model and empirical exploration. Information Systems Research, 15(2):194–210, 2004. → pages 3, 4, 24, 25, 48
[70] S. Joty, G. Carenini, and R. T. Ng. Topic segmentation and labeling in asynchronous conversations. Journal of Artificial Intelligence Research, 47:521–573, 2013. → pages 7, 9, 11, 24, 28, 33, 35, 36, 46, 50, 78, 81, 83
[71] M. C. Kaptein, C. Nass, and P. Markopoulos. Powerful and consistent analysis of Likert-type rating scales. In Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI), pages 2391–2394, 2010. → pages 72, 101
[72] B. K. Kaye. Web side story: An exploratory study of why weblog users say they use weblogs. AEJMC Annual Conference, 2005. → pages 25, 26, 55
[73] D. A. Keim and D. Oelke. Literature fingerprinting: A new method for visual literary analysis. In IEEE Symposium on Visual Analytics Science and Technology (VAST), pages 115–122, 2007. → pages 6
[74] D. Kim and T. J. Johnson. Political blog readers: Predictors of motivations for accessing political blogs. Telematics and Informatics, 29(1):99–109, Feb. 2012. → pages 25, 26
[75] R. Kiros, Y. Zhu, R. R. Salakhutdinov, R. Zemel, R. Urtasun, A. Torralba, and S. Fidler. Skip-thought vectors. In Advances in Neural Information Processing Systems, pages 3294–3302, 2015. → pages 61
[76] B. Kwon, S.-H. Kim, S. Lee, J. Choo, J. Huh, and J. S. Yi. VisOHC: Designing visual analytics for online health communities. IEEE Transactions on Visualization and Computer Graphics, 2015. → pages 2, 149, 150, 151
[77] H. Lam. A framework of interaction costs in information visualization. Transactions on Visualization and Computer Graphics, 14(6):1149–1156, 2008. → pages 30, 39, 132
[78] H. Lam, E. Bertini, P. Isenberg, C. Plaisant, and S. Carpendale. Empirical studies in information visualization: Seven scenarios. IEEE Transactions on Visualization and Computer Graphics, 18(9):1520–1536, 2012. → pages 8, 41, 47, 62, 70, 75, 99, 129, 135
[79] S. Laqua and M. A. Sasse. Exploring blog spaces: a study of blog reading experiences using dynamic contextual displays. In Proceedings of the British HCI Group Annual Conference on People and Computers: Celebrating People and Technology, pages 252–261, 2009. → pages 11
[80] B. Lee, G. Smith, G. G. Robertson, M. Czerwinski, and D. S. Tan. FacetLens: Exposing trends and relationships to support sensemaking within faceted datasets. In Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI), pages 1293–1302, 2009. → pages 32
[81] H. Lee, J. Kihm, J. Choo, J. Stasko, and H. Park. iVisClustering: An interactive visual document clustering via topic modeling. In Computer Graphics Forum (Proceedings of EuroVis), volume 31, pages 1155–1164, 2012. → pages 20, 80, 81, 82, 83, 84
[82] S. Liu, X. Wang, J. Chen, J. Zhu, and B. Guo. TopicPanorama: a full picture of relevant topics. In Proceedings of the IEEE Conference on Visual Analytics Science and Technology (VAST), pages 183–192, 2014. → pages 53
[83] C. Macdonald, R. L. Santos, I. Ounis, and I. Soboroff. Blog track research at TREC. SIGIR Forum, 44(1):58–75, 2010. → pages 25, 27
[84] L. Mamykina, D. Nakikj, and N. Elhadad. Collective sensemaking in online health forums. In Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI), pages 3217–3226, 2015. → pages 149, 151
[85] A. Marcus, M. S. Bernstein, O. Badar, D. R. Karger, S. Madden, and R. C. Miller. TwitInfo: aggregating and visualizing microblogs for event exploration. In Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI), pages 227–236, 2011. → pages 54
[86] S. McKenna, D. Mazur, J. Agutter, and M. Meyer. Design activity framework for visualization design. IEEE Transactions on Visualization and Computer Graphics (Proceedings of InfoVis), 20(12):2191–2200, 2014. → pages 146
[87] G. Mishne. Information access challenges in the blogspace. In Workshop on Intelligent Information Access (IIIA), 2006. → pages 25, 26
[88] M. R. Morris, J. Teevan, and K. Panovich. What do people ask their social networks, and why?: A survey study of status message Q&A behavior. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI), pages 1739–1748, 2010. → pages 122, 123, 137, 138
[89] A. Moschitti. Efficient Convolution Kernels for Dependency and Constituent Syntactic Trees. In J. Fürnkranz, T. Scheffer, and M. Spiliopoulou, editors, Machine Learning: ECML 2006, volume 4212 of Lecture Notes in Computer Science. 2006. → pages 129
[90] S. A. Munson and P. Resnick. Presenting diverse political opinions: how and how much. In Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI), pages 1457–1466, 2010. → pages 25, 26
[91] T. Munzner. A nested model for visualization design and validation. Transactions on Visualization and Computer Graphics (Proceedings of InfoVis), 15(6):921–928, 2009. → pages 8, 24, 41, 146
[92] T. Munzner. Visualization Analysis and Design. CRC Press, 2014. → pages 125
[93] G. Murray, E. Hoque, and G. Carenini. Opinion summarization and visualization. In F. A. Pozzi, E. Fersini, E. Messina, and B. Liu, editors, Sentiment Analysis in Social Networks, pages 171–187. Morgan Kaufmann, 2017. → pages 2
[94] P. Nakov, L. Màrquez, A. Moschitti, W. Magdy, H. Mubarak, A. A. Freihat, J. Glass, and B. Randeree. SemEval-2016 task 3: Community question answering. In Proceedings of the International Workshop on Semantic Evaluation (SemEval), 2016. → pages 126, 127, 128, 129
[95] S. Narayan and C. Cheshire. Not too long to read: The tldr interface for exploring and navigating large-scale discussion spaces. In Hawaii Conference on System Sciences (HICSS), pages 1–10, 2010. → pages 27, 29, 31
[96] B. A. Nardi, D. J. Schiano, M. Gumbrecht, and L. Swartz. Why we blog. Communications of the ACM, 47(12):41–46, 2004. → pages 11
[97] M. E. Newman and M. Girvan. Finding and evaluating community structure in networks. Physical Review E, 69(2):026113, 2004. → pages 59, 61, 86, 91
[98] M. Nicosia, S. Filice, A. Barrón-Cedeño, I. Saleh, H. Mubarak, W. Gao, P. Nakov, G. Da San Martino, A. Moschitti, K. Darwish, L. Màrquez, S. Joty, and W. Magdy. QCRI: Answer selection for community question answering - experiments for Arabic and English. In Proceedings of the International Workshop on Semantic Evaluation (SemEval), 2015. → pages 128
[99] N. Nikitina, S. Rudolph, and B. Glimm. Interactive ontology revision. Web Semantics: Science, Services and Agents on the World Wide Web, 12:118–130, 2012. → pages 82
[100] P. Paatero and U. Tapper. Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values. Environmetrics, 5(2):111–126, 1994. → pages 82
[101] V. Pascual-Cid and A. Kaltenbrunner. Exploring asynchronous online discussions through hierarchical visualisation. In IEEE Conference on Information Visualization, pages 191–196, 2009. → pages 6, 31
[102] P. Pirolli, P. Schank, M. Hearst, and C. Diehl. Scatter/gather browsing communicates the topic structure of a very large text collection. In Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI), pages 213–220. ACM, 1996. → pages 81
[103] X. Qiu and X. Huang. Convolutional neural tensor network architecture for community-based question answering. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 1305–1311, 2015. → pages 128
[104] D. Raban and F. Harper. Motivations for answering questions online. New Media and Innovative Technologies, 73, 2008. → pages 123
[105] D. Ramage, D. Hall, R. Nallapati, and C. D. Manning. Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 248–256, 2009. → pages 81
[106] D. Ren, T. Höllerer, and X. Yuan. iVisDesigner: Expressive interactive design of information visualizations. IEEE Transactions on Visualization and Computer Graphics (Proceedings of InfoVis), 20(12):2092–2101, 2014. → pages 155
[107] G. Riccardi, C. Balamurali, A R, B. Fabio, Favre, F. Carmelo, A. Funk, R. Gaizauskas, and V. Lanzolla. Report on the summarization views of the SENSEI prototype. Technical report, 2015. → pages 149, 150
[108] A. Ritter, C. Cherry, and B. Dolan. Unsupervised modeling of Twitter conversations. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 172–180. Association for Computational Linguistics, 2010. → pages 11, 50
[109] W. Sack. Conversation Map: an interface for very-large-scale conversations. Journal of Management Information Systems, 17(3):73–92, 2000. → pages 7, 31
[110] A. Satyanarayan and J. Heer. Lyra: An interactive visualization design environment. Computer Graphics Forum (Proceedings of EuroVis), 33(3):351–360, 2014. → pages 155
[111] M. Sedlmair, M. Meyer, and T. Munzner. Design study methodology: reflections from the trenches and the stacks. IEEE Transactions on Visualization and Computer Graphics, 18(12):2431–2440, 2012. → pages 8, 121, 146
[112] E. Segel and J. Heer. Narrative visualization: Telling stories with data. IEEE Transactions on Visualization and Computer Graphics, 16(6):1139–1148, 2010. → pages 156
[113] C. Seifert, B. Kump, W. Kienreich, G. Granitzer, and M. Granitzer. On the beauty and usability of tag clouds. In IEEE Conference on Information Visualisation, pages 17–25, 2008. → pages 6
[114] A. Severyn and A. Moschitti. Structural relationships for large-scale learning of answer re-ranking. In Proceedings of the International ACM Conference on Research and Development in Information Retrieval (SIGIR), pages 741–750. ACM, 2012. → pages 129
[115] A. Severyn and A. Moschitti. Learning to rank short text pairs with convolutional deep neural networks. In Proceedings of the Conference on Research and Development in Information Retrieval (SIGIR), pages 373–382, 2015. → pages 128
[116] Q. Shen, T. Wu, H. Yang, Y. Wu, H. Qu, and W. Cui. NameClarifier: A visual analytics system for author name disambiguation. IEEE Transactions on Visualization and Computer Graphics (Proceedings of VAST), 23(1):141–150, Jan 2017. → pages 152
[117] Y. Shen, W. Rong, Z. Sun, Y. Ouyang, and Z. Xiong. Question/answer matching for CQA system via combining lexical and sequential information. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 275–281, 2015. → pages 128
[118] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000. → pages 34, 59, 61, 86, 90
[119] P. V. Singh, N. Sahoo, and T. Mukhopadhyay. Seeking variety: A dynamic model of employee blog reading behavior. In Workshop on Information Systems and Economics, 2010. → pages 25, 26, 27, 55
[120] M. Steinberger, M. Waldner, M. Streit, A. Lex, and D. Schmalstieg. Context-preserving visual links. IEEE Transactions on Visualization and Computer Graphics (Proceedings of InfoVis), 17(12):2249–2258, 2011. → pages 38
[121] M. Taboada, J. Brooke, M. Tofiloski, K. Voll, and M. Stede. Lexicon-based methods for sentiment analysis. Computational Linguistics, 37(2):267–307, 2011. → pages 35, 62
[122] D. Toker, C. Conati, G. Carenini, and M. Haraty. Towards adaptive information visualization: on the influence of user characteristics. In International Conference on User Modeling, Adaptation, and Personalization, pages 274–285. Springer, 2012. → pages 155
[123] G. D. Venolia and C. Neustaedter. Understanding sequence and reply relationships within email conversations: a mixed-model visualization. In Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI), pages 361–368, 2003. → pages 6
[124] F. B. Viégas, S. Golder, and J. Donath. Visualizing email content: portraying relationships from conversational histories. In Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI), pages 979–988, 2006. → pages 7, 53, 146
[125] M. Wattenberg and D. Millen. Conversation thumbnails for large-scale discussions. In Extended Abstract Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI), pages 742–743, 2003. → pages 6, 31
[126] M. Wattenberg and F. B. Viégas. The Word Tree, an interactive visual concordance. IEEE Transactions on Visualization and Computer Graphics, 14(6), 2008. → pages 6
[127] F. Wei, S. Liu, Y. Song, S. Pan, M. X. Zhou, W. Qian, L. Shi, L. Tan, and Q. Zhang. TIARA: a visual exploratory text analytic system. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining, pages 153–162, 2010. → pages 32, 53
[128] W. Willett, J. Heer, and M. Agrawala. Scented Widgets: Improving navigation cues with embedded visualizations. IEEE Transactions on Visualization and Computer Graphics (Proceedings of InfoVis), 13(6):1129–1136, 2007. → pages 57, 65
[129] Y. Wu, S. Liu, K. Yan, M. Liu, and F. Wu. OpinionFlow: Visual analysis of opinion diffusion on social media. IEEE Transactions on Visualization and Computer Graphics (Proceedings of VAST), 20(12):1763–1772, Dec 2014. → pages 54
[130] Y. Yang, S. Pan, Y. Song, J. Lu, and M. Topkara. User-directed non-disruptive topic model update for effective exploration of dynamic content. In Proceedings of the ACM Conference on Intelligent User Interfaces (IUI), pages 158–168, 2015. → pages 81
[131] K.-P. Yee and M. Hearst. Content-centered discussion mapping. Online Deliberation 2005/DIAC-2005, 2005. → pages 31
[132] J. Zhao, C. Collins, F. Chevalier, and R. Balakrishnan. Interactive exploration of implicit and explicit relations in faceted datasets. Transactions on Visualization and Computer Graphics, 19(12):2080–2089, 2013. → pages 32, 33
[133] J. Zhao, N. Cao, Z. Wen, Y. Song, Y. Lin, and C. Collins. #FluxFlow: Visual analysis of anomalous information spreading on social media. IEEE Transactions on Visualization and Computer Graphics (Proceedings of VAST), 20(12):1773–1782, Dec 2014. → pages 54
[134] D. Zhou, S. A. Orshanskiy, H. Zha, and C. L. Giles. Co-ranking authors and documents in a heterogeneous network. In Seventh IEEE International Conference on Data Mining, pages 739–744, 2007. → pages 35, 59
[135] A. R. Zinman. Me, myself, and my hyperego: understanding people through the aggregation of their digital footprints. PhD thesis, MIT, 2011. → pages 25, 27

Appendix A

Supplementary Materials for Chapter 3

This appendix contains supplemental materials for Chapter 3, namely the script used by the experimenter to run the study and the questionnaires used during the study.

A.1 Script for User Study

STEP 1: PARTICIPANT GREETING

Tell Participant: "Thank you for participating in our study. The whole process today will last approximately 90 minutes. First you will answer a short pre-study questionnaire. Then, we will move to the main portion of the study, which will involve you reading a few blog conversations and writing short summaries. At the end of the study, you will be given a short post-study questionnaire."

Action: Have participant fill out and sign the consent form AND the Record of Participation.

STEP 2: PRE-STUDY QUESTIONNAIRES

Tell participant: "Now we will have you answer a series of questions."

Action: Open up the user form. Provide the user_id. The user will fill out the pre-study questionnaire, then select the interface.

Tell participant: "Please fill out the following questionnaires."

STEP 3: USER TRAINING

Tell Participant: "OK, now we are going to do the main part of this study."

Action: Open up browser and set to Full Screen (F11).

Action: Training tutorial. Open the interface with a sample dataset and demonstrate the key features.

For the Interface MultiConVis:

The visual interface consists of three major components: 1) a Topic Hierarchy, which visualizes all the topics in the whole collection of conversations using an indented tree representation; 2) the Conversation List, which shows the current set of conversations as a list; and 3) a Timeline View, which presents the volume of comments of the whole collection over time.

For each conversation: 1) the interface shows the sentiment distribution as a stacked bar, 2) the height of this stacked bar indicates the number of comments of this conversation, 3) the counts of topics and authors are represented as horizontal bars, and 4) finally, a sparkline represents the volume of comments over time.

As you select a particular conversation, the Conversation List is replaced by the ConVis interface, where the Thread Overview visually represents the whole conversation, encoding the thread structure and how the sentiment is expressed for each comment (middle); the Facet Overview presents topics and authors circularly around the Thread Overview; and the Detail View presents the actual conversation in a scrollable list (right). Here, topics are connected to their related comments as well as to their parents in the Topic Hierarchy via curved links.
Demonstrate interactions in List mode:
- Highlighting by topics
- Expanding/collapsing topics
- Sorting conversations
- Clicking the timeline button to show sentiment over time
- Filtering by time

Demonstrate interactions in Conversation mode:
- Hovering the mouse over a facet element:
  - related comments and facets are highlighted
  - tooltips become visible
- Clicking on a facet element:
  - a thick border is drawn along that element
  - the interface scrolls down to the related comments in the Detail View
  - topic words are highlighted
- Hovering over a comment:
  - the related topic and author are highlighted
- Clicking a comment:
  - sentiment words are highlighted

For the Macrumors interface, demonstrate the interactions:
- Sorting the list of conversations
- Searching by keyword
- Switching between list mode and conversation mode

STEP 4: SELECT TASK
Please read the following task.

Dataset: iPhone bending
The issue of 'iPhone bending' went viral on social media after the iPhone 6 was launched in September 2014. This incident triggered a huge amount of discussion on Macrumors, a blog site that regularly publishes Apple-related news and allows participants to make comments. Now, you are going to explore a set of conversations where people are discussing this issue. You can take notes during the exploration using the opened text editor. After exploring and reading through the set of conversations, your task is to write a summary of what you think are the major discussion points and the most insightful comments within the set of conversations. You have 20 minutes to complete the task.

Dataset: iPad release
The iPad Air 2 was launched in October 2014. This event triggered a huge amount of discussion on Macrumors, a blog site that regularly publishes Apple-related news and allows participants to make comments. Now, you are going to explore a set of conversations where people are discussing this issue. You can take notes during the exploration using the opened text editor. After exploring and reading through the set of conversations, your task is to write a summary of what you think are the major discussion points and the most insightful comments within the set of conversations. You have 20 minutes to complete the task.

STEP 5: IN-STUDY QUESTIONNAIRE
After each task, the participant fills out a set of in-study questionnaires.
Repeat steps 4-5 two times (perform two tasks with two different datasets).

STEP 6: POST-STUDY QUESTIONNAIRE
At the end of all the tasks, the participant fills out a post-study questionnaire.

STEP 7: DEBRIEFING
Tell participant: "Thank you very much again for your participation. Would you have any other comments or questions?"
Action: Get the payment form signed.

A.2 Questionnaires

PRE-STUDY QUESTIONNAIRES

ID: ____________
Gender: ____________
Age: ____
Occupation: ____________
Field of study (if student): ____________

1. How often do you read blogs?
Never / Rarely (several times a year) / Occasionally (several times a month) / Frequently (several times a week) / Very frequently (several times a day)

2. Please rate how strongly you agree or disagree with each of the following statements with respect to reading blogs. (Each statement is rated on a five-point scale: strongly disagree / disagree / neutral / agree / strongly agree.)
- I read blogs for information seeking
- I read blogs for guidance/opinion seeking
- I read blogs for fact checking
- I read blogs for a sense of belongingness with the blog community
- I read blogs for fun and enjoyment
- I read blogs for political surveillance
- I read blogs for anti-traditional media sentiment
- I read blogs for blog presentation/characteristics

3. What are the types of blogs you generally read?
Political / Sports / Business / Technology / Health / Personal / Others (specify): ____________

4. How often do you comment on other people's blogs?
Never / Rarely (several times a year) / Occasionally (several times a month) / Frequently (several times a week) / Very frequently (several times a day)

5. How often do you write your own blog (any type)?
Never / Rarely (several times a year) / Occasionally (several times a month) / Frequently (several times a week) / Very frequently (several times a day)

6. On average, how many blog conversations do you read in the same session?
1-2 / 3-5 / 6-10 / 10-20 / >20

IN-STUDY QUESTIONNAIRES

Please rate how strongly you agree or disagree with each of the following statements with respect to the interface (A) you have used for exploring blog conversations. (Each statement is rated on a five-point scale: strongly disagree / disagree / neutral / agree / strongly agree.)

- I found this interface to be useful for browsing conversations.
- I found this interface easy to use.
- I found this interface enjoyable to use.
- This interface enabled me to find the major points that were discussed in the set of conversations.
- This interface enabled me to find more insightful comments in the set of conversations.
- This interface enabled me to write a more informative summary about the major points that were discussed in the set of conversations.

IN-STUDY QUESTIONNAIRES

Please rate how strongly you agree or disagree with each of the following statements with respect to the interface (B) you have just used for exploring blog conversations. (Each statement is rated on a five-point scale: strongly disagree / disagree / neutral / agree / strongly agree.)

- I found this interface to be useful for browsing conversations.
- I found this interface easy to use.
- I found this interface enjoyable to use.
- This interface enabled me to find the major points that were discussed in the set of conversations.
- This interface enabled me to find more insightful comments in the set of conversations.
- This interface enabled me to write a better summary about the major points that were discussed in the set of conversations.
- I found the topic hierarchy to be useful.
- I found the visual summary of each conversation to be useful.
- I found the visual representation of sentiment distribution over time to be useful.
- I found the interactive feature for filtering conversations by timeline to be useful.
- The switching between the Conversation List and the Conversation View was easy to understand.

Appendix B

Supplemental Materials for Chapter 4

This appendix contains supplemental materials for Chapter 4, namely the script used by the experimenter to run the study, the questionnaires that were used during the study, and the instructions for the human raters who rated the user-generated summaries.

B.1 User Study 1

This section contains the documents for Study 1 as described in Section 4.6.1, where we compare ConVisIT, ConVis, and a traditional interface.

B.1.1 Script for User Study

STEP 1: PARTICIPANT GREETING
Tell participant: "Thank you for participating in our study. The whole process today will last approximately 90 minutes. First, you will answer a short pre-study questionnaire. Then we will move to the main portion of the study, which will involve you reading a few blog conversations and writing short summaries. At the end of the study, you will be given a short post-study questionnaire."
Action: Have the participant fill out and sign the consent form and the Record of Participation.

STEP 2: PRE-STUDY QUESTIONNAIRES
Tell participant: "Now we will have you answer a series of questions."
Action: Open the user form and provide the user_id. The participant fills out the pre-study questionnaire; then select the interface.
Tell participant: Please fill out the questionnaires.

STEP 3: USER TRAINING
Tell participant: "OK, now we are going to do the main part of this study."
Action: Open a browser and set it to full screen (F11).
Action: Training tutorial. Open the interface with a sample dataset and demonstrate the key features.

For the ConVis interface:
"The visualization that you see here is called ConVis. It is a visualization that can be used to explore and analyze a blog conversation. The Thread Overview visually represents the whole conversation, encoding the thread structure and how sentiments are expressed for each message (middle). An overview of topics and authors is presented circularly around the Thread Overview. The actual conversation is presented in a scrollable list (right). Here, topics and authors are connected to their related comments via curved links."

Thread Overview: It displays each message of the discussion as a horizontal stacked bar. Each stacked bar encodes three different pieces of metadata (comment length, position in the thread, and depth of the message within the thread) as well as the sentiment. The stacked bars are vertically ordered according to their positions in the thread, starting from the top, with indentation indicating thread depth. The height of each stacked bar encodes the comment length.

Facet Overview: Both topics and authors are positioned according to their chronological order in the conversation, starting from the top. The font size of a topic encodes how much it has been discussed compared to the other topics within the whole conversation.
Interactions:
- Hovering the mouse over a facet element:
  - related comments and facets are highlighted
  - tooltips become visible
- Clicking on a facet element:
  - a thick border is drawn along that element
  - the interface scrolls down to the related comments in the Detail View
  - topic words are highlighted
- Hovering over a comment:
  - the related topic and author are highlighted
- Clicking a comment:
  - sentiment words are highlighted

For the ConVisIT interface, explain and demonstrate the interactive topic revision operations:
- Split a topic: If a topic is too generic, you can split it into further topics by double-clicking on it. You can collapse it back by double-clicking on it again.
- Merge two topics: You can drag one topic over the other to merge them together.

For the Slashdot interface, explain and demonstrate the basic features of scrolling through comments and expanding/collapsing a parent comment to show/hide its children.

STEP 4: SELECT TASK
Action: Please select a conversation from the list (but not one that was used before).
Please read the following task. You are going to explore the selected conversation according to your own interest. You can take notes during the task either in the opened text editor or on paper. At the end of reading the conversation, you will write a summary of the key points you find within the conversation. You have 15 minutes to work on the task.

STEP 5: IN-STUDY QUESTIONNAIRES
Action: Please fill out the questionnaire based on your experience with the interface you just used.
Repeat steps 4-5 three times (perform three tasks).

STEP 6: POST-STUDY QUESTIONNAIRES
At the end of all the tasks, the participant fills out a post-study questionnaire.

STEP 7: DEBRIEFING
Tell participant: "Thank you very much again for your participation. Would you have any other comments or questions?"
Action: Get the payment form signed.

B.1.2 Questionnaires

PRE-STUDY QUESTIONNAIRES

ID: ____________
Gender: ____________
Age: ____
Occupation: ____________
Field of study (if student): ____________

1. How often do you read blogs?
Never / Rarely (several times a year) / Occasionally (several times a month) / Frequently (several times a week) / Very frequently (several times a day)

2. What are your major motivations for reading blogs (circle multiple options if relevant)? (Each motivation is rated on a five-point scale: strongly disagree / disagree / neutral / agree / strongly agree.)
- Information seeking
- Guidance/opinion seeking
- Fact checking
- Sense of belongingness with the blog users/community
- Fun and enjoyment
- Political surveillance
- Anti-traditional media sentiment
- Blog presentation/characteristics

3. What are the types of blogs you generally read?
Political / Sports / Business / Technology / Health / Personal / Others (specify): ____________

4. How often do you comment on other people's blogs?
Never / Rarely (several times a year) / Occasionally (several times a month) / Frequently (several times a week) / Very frequently (several times a day)

5. How often do you write a blog (any type)?
Never / Rarely (several times a year) / Occasionally (several times a month) / Frequently (several times a week) / Very frequently (several times a day)

6. How often do you use blogs to make a decision/choice?
Never / Rarely (several times a year) / Occasionally (several times a month) / Frequently (several times a week) / Very frequently (several times a day)

IN-STUDY QUESTIONNAIRES

Please rate how strongly you agree or disagree with each of the following statements with respect to the interface (Slashdot) you have used for exploring blog conversations. (Each statement is rated on a five-point scale: strongly disagree / disagree / neutral / agree / strongly agree.)

- I found this interface to be useful for browsing conversations.
- I found this interface easy to use.
- I found this interface enjoyable to use.
- This interface enabled me to find more insightful comments.
- I found the indented-list representation of the conversation to be useful.
- Scrolling through the long conversation is useful for finding more insightful comments.
- Showing the detailed comments only (without any overview) is useful for browsing conversations.

IN-STUDY QUESTIONNAIRES

Please rate how strongly you agree or disagree with each of the following statements with respect to the interface (ConVis) you have used for exploring blog conversations. (Each statement is rated on a five-point scale: strongly disagree / disagree / neutral / agree / strongly agree.)

- I found this interface to be useful for browsing conversations.
- I found this interface easy to use.
- I found this interface enjoyable to use.
- This interface enabled me to find more insightful comments.
- I found the visual representation of the discussion topics/authors to be useful.
- I found the visual representation of the Thread Overview to be useful.
- I found the highlighting of the relations between topics and authors to be useful.
- I found the selection of comments based on topic/author to be useful for navigating the long conversation.

IN-STUDY QUESTIONNAIRES

Please rate how strongly you agree or disagree with each of the following statements with respect to the interface (ConVisIT) you have used for exploring blog conversations. (Each statement is rated on a five-point scale: strongly disagree / disagree / neutral / agree / strongly agree.)

- I found this interface to be useful for browsing conversations.
- I found this interface easy to use.
- I found this interface enjoyable to use.
- This interface enabled me to find more insightful comments.
- I found the feature of splitting a topic further into subtopics to be useful.
- I found the feature of merging topics together to be useful.

B.1.3 Evaluating User-Generated Summaries by Human Raters

B.2 User Study 2

This section contains the documents for Study 2 as described in Section 4.6.2, where we compare MultiConVisIT, MultiConVis, and a traditional interface.

B.2.1 Script for User Study

STEP 1: PARTICIPANT GREETING
Tell participant: "Thank you for participating in our study. The whole process today will last approximately 90 minutes. First, you will answer a short pre-study questionnaire. Then we will move to the main portion of the study, which will involve you reading a few blog conversations and writing short summaries. At the end of the study, you will be given a short post-study questionnaire."
Action: Have the participant fill out and sign the consent form and the Record of Participation.

STEP 2: PRE-STUDY QUESTIONNAIRES
Tell participant: "Now we will have you answer a series of questions."
Action: Open the user form and provide the user_id. The participant fills out the pre-study questionnaire; then select the interface.
Tell participant: Please fill out the following questionnaires.

STEP 3: USER TRAINING
Tell participant: "OK, now we are going to do the main part of this study."
Action: Open a browser and set it to full screen (F11).
Action: Training tutorial. Open the interface with a sample dataset and demonstrate the key features.

Features in MultiConVis:
The visual interface consists of three major components: 1) the Topic Hierarchy, which visualizes all the topics in the whole collection of conversations using an indented tree representation; 2) the Conversation List, which shows the current set of conversations as a list; and 3) the Timeline View, which presents the volume of comments of the whole collection over time. For each conversation: 1) the interface shows the sentiment distribution as a stacked bar; 2) the height of this stacked bar indicates the number of comments in the conversation; 3) the counts of topics and authors are represented as horizontal bars; and 4) a sparkline represents the volume of comments over time. When you select a particular conversation, the Conversation List is replaced by the ConVis interface, where the Thread Overview visually represents the whole conversation, encoding the thread structure and how sentiment is expressed for each comment (middle); the Facet Overview presents topics and authors circularly around the Thread Overview; and the Detail View presents the actual conversation in a scrollable list (right). Here, topics are connected to their related comments, as well as to their parents in the Topic Hierarchy, via curved links.
Demonstrate interactions in List mode:
- Highlighting by topics
- Expanding/collapsing topics
- Sorting conversations
- Clicking the timeline button to show sentiment over time
- Filtering by time

Demonstrate interactions in Conversation mode:
- Hovering the mouse over a facet element:
  - related comments and facets are highlighted
  - tooltips become visible
- Clicking on a facet element:
  - a thick border is drawn along that element
  - the interface scrolls down to the related comments in the Detail View
  - topic words are highlighted
- Hovering over a comment:
  - the related topic and author are highlighted
- Clicking a comment:
  - sentiment words are highlighted
- Adding a summary to a topic:
  - You can summarize the key points that were discussed about a topic by clicking on the summary icon of that topic node.

Additional features in MultiConVisIT (explain the topic hierarchy revision features):
You can revise the topic hierarchy presented here according to your own needs. For example, if you think that the current topic is too broad, you can ask the system to show more (or fewer) child topics under it, either by double-clicking on it or by selecting the option from its right-click menu. You can also remove this additional level of children by double-clicking on the parent topic. You can change the topic assignment by dragging one topic over another topic. There are two ways to do this: 1) you can merge a topic as a sibling of another topic, or 2) you can place it as a child of another topic. If you feel that a topic is less relevant or does not make any sense, you can drag that topic to the recycle bin. You can also rename a topic if its current name does not represent its corresponding textual comments. This is critical for creating a more informative summary, because the name of the topic needs to match the major discussion points of the summaries. Finally, at any time you can undo the last topic revision operation you have made by clicking on the undo button.

STEP 4: SELECT TASK
Please read the following task.

Dataset: iPhone bending
The issue of 'iPhone bending' went viral on social media after the iPhone 6 was launched in September 2014. Soon after the product was released, some people claimed that the new phone could easily bend in the pocket while sitting on it. This incident triggered a huge amount of discussion on Macrumors, a blog site that regularly publishes Apple-related news and allows participants to make comments. You are working for Apple as a business analyst. Your task is to find the major discussion points about the iPhone bending issue and summarize each of them under the most appropriate corresponding topic. The final outcome will be a summary of conversations organized according to a topic hierarchy that you will have to show and discuss with your colleagues, so you want to make sure that the topic hierarchy and the summary of major discussion points are as informative and as clear as possible. You have 30 minutes to work on the task.

Dataset: iPad release
The iPad Air 2 was launched in October 2014. This event triggered a huge amount of discussion on Macrumors, a blog site that regularly publishes Apple-related news and allows participants to make comments. You are working for Apple as a business analyst. Your task is to find the major discussion points about the iPad release and summarize each of them under the most appropriate corresponding topic.
The final outcome will be a summary of conversations organized according to a topic hierarchy that you will have to show and discuss with your colleagues, so you want to make sure that the topic hierarchy and the summary of major discussion points are as informative and as clear as possible. You have 30 minutes to work on the task.

STEP 5: IN-STUDY QUESTIONNAIRE
After each task, the participant fills out a set of in-study questionnaires.
Repeat steps 4-5 two times (perform two tasks with two different datasets).

STEP 6: POST-STUDY QUESTIONNAIRE
At the end of all the tasks, the participant fills out the post-study questionnaires.

STEP 7: DEBRIEFING
Tell participant: "Thank you very much again for your participation. Would you have any other comments or questions?"
Action: Get the payment form signed.

B.2.2 Questionnaires

PRE-STUDY QUESTIONNAIRES

ID: ____________
Gender: ____________
Age: ____
Occupation: ____________
Field of study (if student): ____________

1. How often do you read blogs?
Never / Rarely (several times a year) / Occasionally (several times a month) / Frequently (several times a week) / Very frequently (several times a day)

2. Please rate how strongly you agree or disagree with each of the following statements with respect to reading blogs. (Each statement is rated on a five-point scale: strongly disagree / disagree / neutral / agree / strongly agree.)
- I read blogs for information seeking
- I read blogs for guidance/opinion seeking
- I read blogs for fact checking
- I read blogs for a sense of belongingness with the blog community
- I read blogs for fun and enjoyment
- I read blogs for political surveillance
- I read blogs for anti-traditional media sentiment
- I read blogs for blog presentation/characteristics

3. What are the types of blogs you generally read?
Political / Sports / Business / Technology / Health / Personal / Others (specify): ____________

4. How often do you comment on other people's blogs?
Never / Rarely (several times a year) / Occasionally (several times a month) / Frequently (several times a week) / Very frequently (several times a day)

5. How often do you write your own blog (any type)?
Never / Rarely (several times a year) / Occasionally (several times a month) / Frequently (several times a week) / Very frequently (several times a day)

6. On average, how many blog conversations do you read in the same session?
1-2 / 3-5 / 6-10 / 10-20 / >20

IN-STUDY QUESTIONNAIRES

Please rate how strongly you agree or disagree with each of the following statements with respect to the interface (A) you have just used for exploring blog conversations. (Each statement is rated on a five-point scale: strongly disagree / disagree / neutral / agree / strongly agree.)

- I found this interface to be useful for browsing conversations.
- I found this interface easy to use.
- I found this interface enjoyable to use.
- This interface enabled me to find the major points that were discussed in the set of conversations.
- This interface enabled me to find more insightful comments in the set of conversations.
- This interface enabled me to write a better summary about the major points that were discussed in the set of conversations.
- I found the topic hierarchy to be useful.
- I found the visual summary of each conversation to be useful.
- I found the visual representation of sentiment distribution over time to be useful.
- I found the interactive feature for filtering conversations by timeline to be useful.
- The switching between the Conversation List and the Conversation View was easy to understand.

IN-STUDY QUESTIONNAIRES

Please rate how strongly you agree or disagree with each of the following statements with respect to the interface (B) you have just used for exploring blog conversations. (Each statement is rated on a five-point scale: strongly disagree / disagree / neutral / agree / strongly agree.)

- I found this interface to be useful for browsing conversations.
- I found this interface easy to use.
- I found this interface enjoyable to use.
- This interface enabled me to find the major points that were discussed in the set of conversations.
- This interface enabled me to find more insightful comments in the set of conversations.
- This interface enabled me to write a better summary about the major points that were discussed in the set of conversations.
- I found the feature of showing fewer subtopics under a parent topic to be useful.
- I found the feature of showing more subtopics under a parent topic to be useful.
- I found the feature of merging topics together as siblings to be useful.
- I found the feature of adding a topic as a child of another topic node to be useful.
- I found the feature of removing irrelevant topics to the recycle bin to be useful.

B.2.3 Evaluating User-Generated Summaries by Human Raters

Appendix C

Supplemental Materials for Chapter 5

This appendix contains supplemental materials for Chapter 5, namely the questionnaires used during the study.

C.1 User Study

Figure C.1: The introduction page.
Figure C.2: The post-study questionnaire regarding the user's subjective experience.
Figure C.3: The post-study questionnaire regarding the user's background and prior Web experience.

Appendix D

Participant Consent Forms

The following consent form was used for the user study in Chapter 4.
A consent form with identical wording was used in Chapter 3, with the exception of the amount of payment ($15 instead of $20) and the duration of the study (75 minutes instead of 90 minutes).

THE UNIVERSITY OF BRITISH COLUMBIA
Department of Computer Science
2366 Main Mall
Vancouver, B.C., V6T 1Z4

Date:

Research Participant Consent Form

Principal Investigators
Dr. Giuseppe Carenini, Associate Professor, Department of Computer Science, University of British Columbia, (xxx) xxx-xxxx

Research Assistants
Enamul Hoque Prince, Doctoral Student, Department of Computer Science, University of British Columbia, (xxx) xxx-xxxx

Project Purpose and Procedures
The purpose of this study is to investigate the potential of different visualization methods to better support users in finding information from conversations. You will fill out a pre-study questionnaire about your demographics and expertise. Then you will perform a few tasks using the given visual interfaces. Each task will consist of presenting conversational data along with a textual question on the displayed data. You will read the conversations and answer the questions using the keyboard/mouse. Finally, you will be interviewed regarding your experience using the studied visual interfaces. This study will take a maximum of 90 minutes.

Restrictions on Participation
All participants should have self-reported normal visual acuity (at least 20/50 acuity with correction) and self-reported normal (unassisted) hearing.

Confidentiality
Your identity will be kept strictly confidential. All of your data/results will be kept completely anonymous. None of the forms will contain any information that would permit anyone to link the results with you. The test forms will be coded to protect your anonymity and will be stored in a secured laboratory room and/or on a password-protected server.

Remuneration/Compensation
This experiment will take a maximum of 90 minutes to complete, and you will receive twenty dollars for your participation.

Contact Information About the Project
If you have any questions or require further information about the study, you may contact Giuseppe Carenini at (604) 822-5109 or by email at carenini@cs.ubc.ca.

Contact for Information About the Rights of Research Subjects
If you have any concerns about your treatment or rights as a research subject, you may contact the Research Subject Information Line in the UBC Office of Research Services at 604-822-8598.

Consent
We intend for your participation in this study to be pleasant and stress-free. Your participation is entirely voluntary, and you may refuse to participate or withdraw from the study at any time. Your signature below indicates that you have received a copy of this consent form for your own records. Your signature indicates that you consent to participate in this study. You do not waive any legal rights by signing this consent form.

I, ________________________________, agree to participate in the study as outlined above. My participation in this study is voluntary and I understand that I may withdraw at any time.

_______________________________________________________________________
Participant's Signature                Email Address                Date

____________________________________________________
Investigator's Signature                Date
