UBC Theses and Dissertations

Automatic conceptual window grouping with frequent pattern matching Scholtz, Anna 2019

Automatic Conceptual Window Grouping with Frequent Pattern Matching

by

Anna Scholtz

B.Sc., Technische Universität Chemnitz, 2016

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Master of Science

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Computer Science)

The University of British Columbia
(Vancouver)

October 2019

© Anna Scholtz, 2019

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the thesis entitled:

Automatic Conceptual Window Grouping with Frequent Pattern Matching

submitted by Anna Scholtz in partial fulfillment of the requirements for the degree of Master of Science in Computer Science.

Examining Committee:
Reid Holmes, Computer Science
Supervisor
Thomas Fritz, Computer Science
Supervisory Committee Member

Abstract

While working, software developers constantly switch between different projects and tasks and use many different applications, web resources and files. These diverse resources are scattered across many windows and lead to cluttered workspaces that can distract developers in their workflows.

Having mechanisms to determine which resources belong together for working on a project would allow us to develop tools that support developers in organizing their work, decluttering their workspace and switching between projects. Existing approaches in this area often either require users to manually define which resources belong together, or do not examine how users would group the resources themselves and how to best support them.

In this thesis we present an approach that automatically detects groups of applications and resources that developers use and that are relevant to the tasks and projects they are working on. These groups are referred to as Conceptual Groups. The approach applies frequent pattern analysis on recorded interaction data and clusters the detected patterns to retrieve conceptual groups.
To measure the accuracy of our approach, we conducted a study with 11 participants and compared it to existing approaches, which it outperformed by up to 50%.

Lay Summary

Software developers use many applications and files while working on different tasks and projects on their computers. Over time, they open more and more windows, tabs and applications, which makes it harder to find the right window to switch to and potentially distracts workers. In this thesis, I describe an approach that automatically determines which applications and documents developers use that are relevant to the tasks and projects they work on. This approach can be used for various productivity tools for decluttering workspaces or navigating between different projects. To determine how accurately this approach can determine which applications and documents are related to a certain project, I evaluated it in a study with 11 participants and compared it to existing approaches.

Preface

All of the work presented henceforth was conducted in the Software Practices Laboratory at the University of British Columbia. All projects and associated methods were approved by the University of British Columbia's Research Ethics Board [certificate #H18-02647].

I was the lead investigator, responsible for the concept formation, data collection, analysis and manuscript composition. Reid Holmes and Thomas Fritz were involved throughout the concept formation and manuscript composition.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgments
1 Introduction
2 Related Work
  2.1 Concepts
  2.2 Context Detection
    2.2.1 Manual Grouping
    2.2.2 Automatic Grouping
  2.3 Frequent Pattern Detection
  2.4 Navigation Support
3 Pilot Study
  3.1 Study Procedure
  3.2 Study Support
  3.3 Participants
  3.4 Results
    3.4.1 How Many Applications and Resources Are Usually Open?
    3.4.2 How Much Time Is Spent in Active Windows?
    3.4.3 Are There Frequently Recurring Interaction Patterns?
  3.5 Discussion
4 Ground Truth Study
  4.1 Study Procedure
    4.1.1 Study Tasks
  4.2 Study Support
  4.3 Participants
  4.4 Results
    4.4.1 Group Size and Evolution
    4.4.2 Group Composition
    4.4.3 Frequent Patterns
5 Approach for Detecting Conceptual Groups
  5.1 Event Data Logging
  5.2 Pre-Processing Events
  5.3 Frequent Pattern Analysis
  5.4 Clustering
  5.5 Application and Resource Assignment
6 Evaluation
  6.1 Accuracy
  6.2 Comparison to Related Work
7 Threats
  7.1 External Validity
  7.2 Internal Validity
  7.3 Construct Validity
8 Discussion
  8.1 Supporting Professionals with Conceptual Groups
    8.1.1 Tool Prototype
  8.2 Prototype Feedback
    8.2.1 Accuracy Improvements
  8.3 Future Work
    8.3.1 Field Study
    8.3.2 Applications
9 Conclusion
Bibliography
A Pilot Study Details
  A.1 Recruiting Participants
  A.2 Setup and Introduction Session
  A.3 Final Survey
B Ground Truth Study Details
  B.1 Recruiting Participants
  B.2 Setup and Introduction Session
  B.3 Study Execution
    B.3.1 Task 1 - Let’s Go Travel
    B.3.2 Task 2 - Step by Step Blockchain
    B.3.3 Task 3 - Raytracer Documentation
  B.4 Final Interview

List of Tables

Table 3.1 Pilot study: Total open applications, tabs, windows.
Table A.1 Pilot study participant metadata.

List of Figures

Figure 3.1 Pilot study: Pop-up for indicating recent workflow.
Figure 3.2 Pilot study: Usage duration of specific tabs and windows aggregated across all participants.
Figure 3.3 Screenshot of tool for visualising detected patterns from pilot data. The timeline on the top indicates the end and start of frequent patterns that have been detected. Frequent patterns consisting of the same applications and resources appear in the same row. Below is a log of every recorded event which was used for manual inspection. If events are part of a pattern then they get highlighted in a color that corresponds to a frequent pattern and is indicated in the legend on the right side.
Figure 4.1 Pop-up for self-reporting conceptual groups.
Figure 4.2 Frequent pattern occurrences (P3, P6, P8). Each row indicates the start and end of a specific pattern. Patterns consist of a specific set of applications and resources. Different patterns developed for different participants. Vertical lines mark project switches that participants worked on: T1 - blockchain, T2 - travel planning, T3 - raytracer.
Figure 5.1 Overview of the steps involved to detect conceptual groups.
Figure 5.2 Constraints for creating frequent patterns. Different patterns are indicated on a timeline based on their occurrence and denoted by different characters (A, B, X, ...). Actual: patterns detected in the event data. (1) only different patterns, (2) the closest occurrences of patterns and (3) patterns with less than c = 3 events in between can be merged.
Figure 6.1 Accuracy of our approach for each response of each participant. “Correct”: data assigned to same groups as in ground truth, “Wrong”: incorrectly assigned data, “Missing”: data indicated in the ground truth but missing in the generated groups, “Likely Correct”: data that has been part of the group earlier or later but not at the time of the response. Vertical black lines mark the average correctness. Numbers on the left refer to the participant. Numbers on the right indicate the difference in number of generated groups vs. number of groups indicated by participants.
Figure 6.2 Boxplots of accuracies of related approaches for each response.
Figure 8.1 Prototype of our tool for displaying and interacting with conceptual groups.
Figure A.1 Pilot study consent form.
Figure A.2 Pilot study final survey.
Figure B.1 Ground truth study consent form.
Figure B.2 Travel planning task description.
Figure B.3 Spreadsheet template for the travel planning task.
Figure B.4 Blockchain implementation task description.
Figure B.5 Raytracer documentation task description.
Acknowledgments

I would like to thank my supervisor Reid Holmes and Thomas Fritz for their support and guidance throughout my research and writing process.

I would like to thank my family for supporting me from afar, and especially my brother Michael who always had an open ear for my complaints and helped me brainstorm various ideas.

I would also like to thank the members of the Software Practices Lab for providing me advice on my research and piloting my studies as well as the participants of all my studies for dedicating their time.

Finally, I must thank my friends I have made in Canada and across the globe for keeping me sane: most notably Puneet Mehrotra, Siddhesh Khandelwal, Hayley Guillou, Giovanni Viviani, Nico Ritschel, Neil Newman and many more.

Chapter 1
Introduction

Working on multiple projects and tasks at the same time and constantly switching between them is quite common for software development professionals in today’s dynamic work environment [4, 11]. Previous work [14, 15] and results of a pilot study in which knowledge workers were monitored for up to 10 work days show that while working on their tasks, individuals interact with up to 60 different computer applications, access different web pages to seek information, and use various documents to store data. Over time, users open dozens of applications and resources, which often leads to cluttered workspaces. This makes it more tedious to find relevant resources and to resume previous work, and increases the potential for distraction [18].

To help with the organization of resources and workspaces, various approaches have been devised for grouping documents that are related to specific tasks.
However, those either require users to manually define their groups [27], do not account for resources that are part of multiple groups [1], or do not evaluate how accurately these groups match how users would want to group these resources themselves [22, 24].

The objective of this thesis work is to automatically detect groups of applications and resources from interaction data to support developers in switching between different projects and tasks and organizing their workspaces. We refer to these groups as Conceptual Groups. Conceptual groups consist of resources, such as documents or web pages, and applications as well as previous actions that were performed in the same context. Conceptual groups could be related to a certain project, which in turn can consist of different subtasks. For example, the conceptual group related to implementing a game for iOS might consist of Xcode, all created program files, a Terminal used for version control, a web browser and browsed websites and an iOS simulator. A conceptual group related to writing a scientific paper might be composed of a text editor, all LaTeX files the paper consists of, a Terminal to compile the paper, and a PDF viewer. It is possible that some applications or resources are part of multiple different conceptual groups. For example, web browsers and web engines are often used in many different contexts to search for information. Conceptual groups potentially change over time as users start new subtasks which require different applications or resources.

In our work, we are interested in the characteristics of these conceptual groups and whether they can be detected automatically.
For the automatic detection, we assume that developers use and switch between the same group of applications and resources repeatedly for a certain task or project and that we can use frequent pattern mining to detect the groups to support them in their work.

For our work, we address the following research questions:

RQ1 What are the characteristics and similarities of the way that different developers group their applications and artifacts?
RQ2 How accurately can conceptual groups be detected using frequent pattern analysis?
RQ3 How accurate is our model relative to approaches from related work in group detection?

To address these research questions, this thesis makes the following main contributions. We present a pilot study that investigates how users switch between applications and documents and whether certain action sequences are frequently repeated. Based on our initial insights, we developed an approach to automatically detect conceptual groups from recorded computer interactions. We also conducted a study to collect ground truth on how users group their applications and artifacts for the projects they work on; this ground truth is used to measure the accuracy of our proposed method and to compare it to related approaches. We showcase a prototype tool that implements our approach and allows users to access their conceptual groups and to customize their computer workspaces. We also discuss a wide range of applications that conceptual groups could be used for, such as tools for organizing and decluttering the workspace. Initial results suggest that frequent patterns can be used to extract conceptual groups from interaction data, outperforming existing approaches by up to 50% in accuracy.

The rest of this thesis is organized as follows: we discuss related work in Chapter 2 and describe the results of our pilot study, which we conducted as a first exploration step, in Chapter 3.
Next, we explain how we collected the ground truth data used to evaluate our approach in Chapter 4 and describe our approach in Chapter 5, followed by an evaluation of its accuracy in Chapter 6. We comment on limitations in Chapter 7. We finish the thesis with a discussion in Chapter 8, in which we reflect on the concept of conceptual groups and describe our prototype, other application scenarios of conceptual groups and future work, and with a conclusion in Chapter 9.

Chapter 2
Related Work

Related work can be categorised into concepts for grouping applications and resources that were derived by analysing user interaction data, and approaches to detect these groups.

2.1 Concepts

Previous work examined how knowledge workers group and structure their work by observing them or capturing their computer interaction during their work. Generally, these studies found that knowledge workers have groups of applications and resources that they use for completing their tasks. For example, Bannon et al. analysed computer interaction data and inferred that users work in workspaces that they defined as “[...] tools and data relevant to the users’ goals [...]” and “[...] highly dynamic internal structures which can be modified as users reformulate their goals” [4]. While workspaces just list “[...] software tools that the computer system can provide for accomplishing these goals [...]”, our concept of conceptual groups keeps a history of tools that users actually used previously while working to achieve their goals and are likely to use again.

Similarly, Morteo et al. performed an observation study. Based on their observations, they coined the term working spheres [19], which are “[...] higher levels of unit of work or activities that people divide their work into [...]”. Unlike conceptual groups, however, working spheres do not include the tools and applications necessary for completing this work.

2.2 Context Detection

Detecting the current context of what users are working on can help with finding related files [5], reducing distractions [18] and coping with frequent switches between tasks [4, 11].

2.2.1 Manual Grouping

Most related approaches that support grouping of applications and resources require users either to manually group windows [27] or to define working spheres [19] that consist of emails, contacts or files relevant to specific activities with a common goal. Users have to constantly review and adapt their groups when opening new windows or switching tasks, which can become cumbersome, interruptive and time-consuming over time. A better alternative is to detect these groups automatically.

2.2.2 Automatic Grouping

Some approaches that automatically retrieve the user’s context rely on Bayesian networks but often require users to initialize the network by manually assigning randomly chosen files to the right context [8], or only update the groups infrequently [24]. Fully automatic approaches often only detect task switches [9, 17, 20, 26], but do not consider that there might be a hierarchy of tasks consisting of various subtasks and do not take into account that users switch back and forth between the same tasks.

Other fully automatic approaches count the number of switches between windows [1] or determine the semantic similarity of window titles and temporal proximity [22] to group windows. For determining the temporal closeness between windows, the number of switches between windows is counted and they are grouped together if they exceed a certain threshold.
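The switch-counting idea described above can be illustrated with a minimal sketch. It is not the implementation of any of the cited approaches: the function name, the window names and the threshold value are hypothetical, and merging windows into connected components is only one plausible way to turn pairwise counts into groups.

```python
from collections import Counter


def group_by_switch_count(focus_sequence, threshold):
    """Group windows whose mutual switch count exceeds `threshold`.

    `focus_sequence` is a hypothetical list of window names in the
    order in which they received focus.
    """
    # Count undirected switches between consecutive, distinct windows.
    switches = Counter()
    for a, b in zip(focus_sequence, focus_sequence[1:]):
        if a != b:
            switches[frozenset((a, b))] += 1

    # Union-find: every window starts in its own group.
    parent = {w: w for w in focus_sequence}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    # Merge the groups of every pair that switched often enough.
    for pair, count in switches.items():
        if count > threshold:
            a, b = tuple(pair)
            parent[find(a)] = find(b)

    groups = {}
    for w in parent:
        groups.setdefault(find(w), set()).add(w)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical focus history: editor/terminal and browser/mail alternate.
history = ["editor", "terminal", "editor", "terminal", "editor",
           "browser", "mail", "browser", "mail", "browser", "editor"]
print(group_by_switch_count(history, threshold=2))
# [['browser', 'mail'], ['editor', 'terminal']]
```

In this sketch every window ends up in exactly one group, so a browser that is used for two projects is either assigned to one of them or causes both groups to merge.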
However, these approaches either do not take into account that the same resources and applications can be used across different groups and that switch patterns occur, or require the number of groups to detect to be defined in advance.

Our approach detects groups fully automatically, does not require defining the number of groups to detect, and requires no training.

2.3 Frequent Pattern Detection

More recent approaches detect frequently occurring patterns in workflows or routines to identify tasks [7, 25] using different techniques [12, 21] to extract recurring patterns from event data. Although these approaches return a set of detected frequent patterns and do not group or summarize them further, this temporal information could be used to detect conceptual groups, since using the same applications and resources over and over again suggests that they belong together and could be seen as forming groups.

2.4 Navigation Support

Different window managers [6, 28] and navigation tools [23] have been introduced that visualize which windows belong to the same task or recommend windows that are related to the currently active window when switching between them. These tools determine relatedness based on previous direct switches between windows. While these tools show that having applications users can actually use is beneficial, none of the tools consider groups of applications and resources or use automatic grouping.

Chapter 3
Pilot Study

As a first exploratory step, we conducted a pilot study to gain insights into how users switch between applications and windows while working on different tasks.
Our research questions were:

• How many applications and resources do users actively use during their work, and for how long?
• How often do users switch between windows and applications?
• Are there frequently recurring interaction patterns?

To address these research questions we performed an exploratory field study with 5 participants over a period of 7 to 10 workdays, analyzing their user interactions and examining the use of frequent pattern matching on user interactions.

3.1 Study Procedure

In a short introduction session we explained the study to participants, asked them for consent, and installed a tool on their work machines for monitoring all interactions. Once the tool was installed, we asked participants to continue their work as usual for up to 10 working days. During this time, our tool prompted users approximately every 20 minutes, asking them to report their current workflows in a pop-up. We asked about workflows because we thought of them as frequently recurring sequences of steps or patterns that users perform during their work and wanted to investigate whether we can detect them.

The maximum duration of the study was 10 working days; however, the study automatically ended as soon as the participant provided 100 pop-up responses.
In a short follow-up survey, we asked about demographics and more generally about the workflows and workflow switches participants performed.

To validate frequently detected workflows, we conducted a follow-up survey with our participants, in which we presented frequently recurring interaction patterns that we extracted from recorded interaction data.

More information about the study, consent form and collected data can be found in Appendix A.

3.2 Study Support

For this in situ study, participants had to install a tool on their work machine, which was required to run macOS Sierra or higher. The tool collects the following data:

• Name, title and window position of active and background applications and idle times. Recorded every time users made changes to them.
• Aggregated cursor movements, clicks and number of keyboard presses. Individual keystrokes were not recorded, only the number of keys pressed during a time window of 10 seconds. This prevents recording of passwords.
• Keystrokes for switching applications (cmd + tab) and switching windows (cmd + `).
• Self-reported workflow samples.

The pop-up shown in Figure 3.1 allowed participants to create their recent workflows by selecting used applications and windows. It also showed a timeline of previously used applications to make it easier for participants to remember their earlier activities.

All recorded data was stored locally on the participant's device. Since one potential risk was that private data might get recorded, we mitigated this risk by showing participants the storage location of the data and providing instructions on how they can delete or censor data entries they do not want to share. Since some participants also used their work machine for private or non-work-related activities, the monitoring tool provided an option to pause recording interactions. Participants were also able to quit the tool at any time.

3.3 Participants

For this study we recruited participants through personal contact.
Only participants that use work machines running macOS were eligible since the monitoring tool was developed for this platform.

Figure 3.1: Pilot study: Pop-up for indicating recent workflow.

Overall, we recruited 5 participants of which one was female and four were male (age range/mean/±: 24-29/25.8/1.72). All participants were graduate students in computer science who had used their computer setup for between two and seven years (range/mean/±: 2-7/4.3/1.89). Participants indicated that they mostly work on their research, course projects or course assignments.

For each participant we collected between 3553 and 11091 events and recorded between 27.7 and 52.88 hours of interactions.

3.4 Results

We analysed the collected data to gain a deeper understanding of how participants interact with their computer and to find out if there are frequently recurring interaction patterns. In the following, we present the results grouped by the three questions we investigated.

3.4.1 How Many Applications and Resources Are Usually Open?

Table 3.1 summarises how many applications as well as windows and tabs were open during the period of the study for each participant. The median number of applications open at the same time was between 7 and 14, and the median number of open windows and tabs was between 12 and 38. The recorded data also shows that participants used a variety of different applications and a large number of different resources during the study, with 60% (±14.5) of the resources being web pages.

                                    P1      P2      P3      P4      P5
Total Recorded Time [hours]      47.91   48.29   52.88    27.7   35.55
Total Recorded Events             5076    5564   11091    3553    5278
Total Applications                  36      37      24      60      32
Total Resources                    731     900    2225     624     908
Number of applications open at the same time
  Max.                              15      14      16      20      12
  Median                             9       9      12      14       7
Number of windows and tabs open at the same time
  Max.                             121      46      66      73      34
  Median                            22      22      38      33      12

Table 3.1: Pilot study: Total open applications, tabs, windows.

3.4.2 How Much Time Is Spent in Active Windows?

Figure 3.2 shows the distribution of time spent in active windows and tabs. Participants most often spent less than 10 seconds in active windows and tabs. Within that time frame about 14% of the usage durations were less than one second long. These quick switches might have occurred when participants tried to find the right window to switch to and accidentally switched to a wrong window for a short period of time. These results indicate that users spend only short times in active windows and tend to perform frequent switches.

During periods of activity, switches occur frequently. On average, participants switch 1.67 times (± 2.45) per minute. The maximum number of switches within that timeframe was 35. All participants have a similar average switch frequency of between 1.29 and 2.37 switches per minute, aligning with related findings [14, 15].

3.4.3 Are There Frequently Recurring Interaction Patterns?

By applying pattern detection techniques based on the apriori principle [3] (see more details in Section 5.3), we were able to extract frequently recurring patterns.

Figure 3.2: Pilot study: Usage duration of specific tabs and windows aggregated across all participants.
[Histograms of count against duration in seconds; the data recovered from the figure is listed below.]

Resource usage duration (count per 10-second bin)
Duration [sec]      P1      P2      P3      P4      P5     All
0-10              1993    2295    6146    1414    2495   14343
10-20             1166    1217    2259     785    1237    6664
20-30              452     493     693     272     374    2284
30-40              226     290     363     137     184    1200
40-50              130     174     242      82     113     741
50-60               77     118     135      62      76     468
60-70               68      92     109      40      62     371
70-80               42      85      68      37      46     278
80-90               31      45      55      27      30     188
90-100              34      56      47      29      27     193
100-110             25      40      60      25      22     172
110-120             39      36      57      34      26     192
>120               264     322     284     220     209    1299

Resource usage duration within the first 10 seconds (count per 1-second bin)
Duration [sec]      P1      P2      P3      P4      P5     All
0-1                136     168    1440      78     161    1983
1-2                278     380    1296     180     403    2537
2-3                184     263     587     168     363    1565
3-4                157     194     438     150     270    1209
4-5                184     172     328     137     232    1053
5-6                173     171     303     105     226     978
6-7                156     158     267     113     177     871
7-8                143     155     264     104     178     844
8-9                140     212     340     115     183     990
9-10               442     422     883     264     302    2313

The core idea of this principle is that a sequence can only be frequent if its subsequences are also frequent. Detected frequent sequences are concatenated with each other and extended to longer frequent sequences, which are retained as long as they occur often enough. We determined the number of patterns that evolve during one hour of high activity. As parameters for the apriori algorithm we required an action to occur at least 5 times and allowed a maximum switch distance of 2 between actions that are still considered part of the same pattern. These parameters ensure that only patterns with a relatively high number of occurrences are considered frequent, while still allowing some variance within patterns.

During their most active hours, between 87 and 417 different patterns emerged, with a median of 12 different patterns per hour. The median number of steps of these patterns was 4.0. We only considered patterns that occurred at least 5 times; on average, patterns had 8.9 occurrences in one hour.

We visualised the recorded interaction data and noticed recurring usage patterns. Figure 3.3 shows a screenshot of the tool used to visualise all occurrences of detected patterns.

The responses of participants in our follow-up survey about the detected patterns confirmed that most of these patterns were also perceived by participants as occurring frequently.

These findings also align with related work in terms of identifying temporal patterns in interaction data [7, 16, 25].

Figure 3.3: Screenshot of tool for visualising detected patterns from pilot data. The timeline on the top indicates the end and start of frequent patterns that have been detected.
Frequent patterns consisting of the same applications and resources appear in the same row. Below is a log of every recorded event which was used for manual inspection. If events are part of a pattern, they are highlighted in a color that corresponds to a frequent pattern and is indicated in the legend on the right side.

3.5 Discussion

Our pilot study shows that users interact with a large number of different applications and resources, with many of them running at the same time. Also, windows are only active for a relatively short amount of time, and switches between different windows occur repeatedly and in recurring patterns, meaning that users tend to switch between the same sets of applications and resources over and over again.

These initial findings suggest that it could be beneficial to support users in these numerous switches and that these frequently occurring patterns could be leveraged to identify commonly appearing workflows or conceptual groups of a user.

Motivated by these findings, we conducted a study to gather ground truth data of how users define their conceptual groups while working on different projects. We describe our study method and results in the next chapter.

Chapter 4

Ground Truth Study

To get a better understanding of how users perceive their own conceptual groups, we performed a lab study. We designed this study to gather ground truth data that we could then use to develop and evaluate our approach. Additionally, we conducted interviews after the study, asking participants about their perception of their own conceptual groups.

4.1 Study Procedure

We conducted a controlled lab study with 11 participants. In the study, participants were asked to work on three tasks and switch between them while we recorded their computer interactions and asked them to self-report on their conceptual groups.

We provided participants with a MacBook Air, an additional screen with a resolution of 1920×1200, and an external mouse for the study.
In an introduction session, we explained the study and asked participants for consent. During the study, participants were asked to work on three different tasks and were free to use any programming language, application or web site. If an application was not already installed, they were free to install it.

A pre-installed background tool collected interaction data and prompted the participants approximately every 12 minutes (±2 minutes), asking them to switch to the next project in a round-robin fashion. We chose a median of 12 minutes for these switches because previous work has shown that switches between working spheres occur about every 12 minutes [13].

Every 25 minutes a pop-up was presented asking participants to review and self-report their conceptual groups, which we described as applications/resources that are used together because of the tasks or projects that are worked on. A time span of 25 minutes gave participants enough time to work on several projects for long enough that conceptual groups potentially changed, without too many interruptions. The total duration of the study was 80 minutes, during which three responses were collected. We made it clear to participants that they were not expected to finish all the tasks in the allocated time.

We concluded our study with a short interview to inquire how participants usually group their applications and resources, how they understand conceptual groups, and what tool support they would like to have to make their workflows and switches more efficient. We also demonstrated our prototype during the interview and asked for feedback.

4.1.1 Study Tasks

The three different tasks participants worked on during the study were:

1. coding a simplified blockchain implementation in any language by following a step-by-step tutorial,
2. documenting the software architecture of a provided raytracer implementation by creating a UML class diagram, and
3. planning a trip to a destination of the participant’s choosing.

Each task assignment came with detailed descriptions and the order was varied between participants (see Appendix B). We chose to have three independent tasks which consisted of several different sub tasks and covered software development problems as well as general planning tasks. We consider projects as larger units of work consisting of multiple steps, while tasks are considered as a single unit of work or a single step in a project. For example, the trip planning project consisted of multiple tasks, such as filling a spreadsheet or writing a packing list. Having three projects allowed us to present participants with a variety of different types of tasks but still gave them enough time to work on the individual projects to develop recurring interaction patterns. We chose these assignments for their similarity to projects that are encountered in a real-world setting. We ran a lab study instead of an in situ study because this allows for better comparability between participants and poses a lower risk of private information being recorded.

4.2 Study Support

To collect participants’ interaction data and their self-reports, we used the same monitoring tool as in our pilot study. We extended it to also keep track of URLs of web pages, file paths of used documents and task switch notifications.

For the self-reports, we developed a pop-up, as shown in Figure 4.1, that was presented to participants and listed all applications and documents that were used and could be assigned to different conceptual groups. The pop-up allowed participants to add and delete groups and to assign applications and resources to groups by selecting the corresponding checkbox.

4.3 Participants

We recruited 11 participants (age range/mean/±: 24-32/28.5/2.66, gender: 2 female, 9 male) through personal contacts for our lab study. All participants were graduate students in computer science and had experience in programming.
Due to an initial error in the data collection tool, not all data of P1 was properly recorded. In the following, we do not consider the data collected from this participant in our analysis. The error that caused the tool to crash was fixed for all participants that followed.

4.4 Results

First, we used the collected data and insights from interviews to investigate the characteristics and similarities of the way that different workers group their applications and artifacts (RQ1). Between 223 and 443 events (Mean/±: 319.3/67.96) were recorded for each participant.

4.4.1 Group Size and Evolution

During the study, most participants created one conceptual group per task. P2 was the only participant to provide four conceptual groups in the first response, with three groups for the three different projects and one related to setting up the workspace and installing some missing applications. This group was merged with the conceptual group related to project 1 in the two remaining responses, showing that conceptual groups can change quickly and get merged into other existing groups.

Groups consisted of 2 to 36 application and resource pairs, with an average of 10.5 (±6.4). The sizes of groups increased over time. In the first responses, groups had an average size of 8.7 (±4.9), in the second 10.5 (±6.2) and in the last one 12.2 (±7.6). Most of the resources and applications participants used got assigned to a conceptual group at some point.
About 12.6% (±4.7) of the resources used were completely ignored and could be considered as less relevant; for 0.02% (±0.01), participants explicitly selected that they are not relevant.

Since switching was randomized with a median time of 12 minutes, four participants (P4, P5, P6, P9) had only worked on two projects when their first review pop-up appeared and thus only provided two groups.

Figure 4.1: Pop-up for self-reporting conceptual groups.

While our results show that most resources and applications are considered to be relevant at some point, we also discovered that conceptual groups are subject to frequent changes. About 19% (±8.8) of the resources were assigned to groups only once and did not get reassigned to a group when another review pop-up was shown. 28.4% (±8.9) appeared in conceptual groups in at least two reviews and 34.9% (±7.6) in all three reviews. The resources that appear only once in conceptual groups tend to be used less often, with a median of two usages, while resources that are assigned to groups in all responses, like Google Search in a web browser application or viewing a specific PDF in a PDF viewer, have a median of five usages. This indicates that there are resources that are used more continuously and across various groups.

4.4.2 Group Composition

Most resources in conceptual groups were web pages, with a mean of 46.7% (±0.5) per group. On average, 34.5% (±47.5) of a conceptual group were documents, such as code or text files, 9.36% (±29.1) were command line actions, and 8.22% (±27.5) were file manager locations. This shows that web resources play a significant role when working on different tasks.

When asked how they would describe conceptual groups, all participants had generally the same understanding of the idea, namely that conceptual groups are related to individual tasks or projects, describing them as “[. . .] a set of windows or views, even tabs that fit together for a specific task or topic [. . .]” (P8).
While most groups are project or task-related, several participants (P1, P2, P3, P9) also noted that some of their daily conceptual groups outside of this study are related to communication, like mail applications or messengers, which “[. . .] can live across tasks [. . .]” (P3) or are “global” (P3) and do not belong to a specific task.

4.4.3 Frequent Patterns

We visualised frequent pattern occurrences and noticed that different sets of patterns are prominent while participants were working on individual tasks. Figure 4.2 shows how patterns developed over time for participants P3, P6 and P8. Frequent patterns are different for each participant. An example of a pattern is a sequence of steps from File Authentication.swift in Xcode to File User.swift in Xcode to Web page google.com in Firefox.

Vertical lines indicate when participants received the notification to switch tasks, and horizontal bars indicate the start and end of pattern occurrences. The tasks which participants were working on at any point in time are indicated at the top of the diagram, with T1 being the blockchain implementation, T2 the travel planning and T3 the raytracer documentation. Different patterns that were active at a certain point in time are aligned along the y-axis. All occurrences of a specific pattern consisting of the same sequence of used applications and resources appear in the same row. For example, a pattern consisting of a sequence Google Search in Firefox to Terminal to StackOverflow in Firefox has multiple occurrences over the period of the study. All of the occurrences appear in the same row, marking the time when the sequence started and finished. Patterns often overlap with other patterns that consist of very similar applications and resources. For example, the pattern consisting of a sequence Google Search in Firefox to Terminal to StackOverflow in Firefox often overlaps with a pattern Google Search in Firefox to Terminal to Authentication.swift in Xcode.

Figure 4.2: Frequent pattern occurrences for (a) P3, (b) P6 and (c) P8. Each row indicates the start and end of a specific pattern. Patterns consist of a specific set of applications and resources. Different patterns developed for different participants. Vertical lines mark project switches; the projects that participants worked on are T1 - blockchain, T2 - travel planning, T3 - raytracer.

While working on a certain task, the same patterns appear. For instance, while one specific participant was documenting the ray tracer, patterns that contain usages of Xcode and a tool for drawing diagrams usually occurred. Every time the participant was working on this project, this pattern occurred. These sets of patterns could be thought of as a footprint that is characteristic for the project and could be leveraged to identify conceptual groups. The visualisation also makes it apparent that P3 switched to project 3 while working on project 1 without receiving the notification to switch. The active patterns change although no notification has been received (see dotted vertical line). There is also a significant difference in the number of patterns created for each project. For all participants, the largest number of frequent patterns emerged when working on the travel planning, since participants frequently switch between different websites for this task.

These results show that conceptual groups change over time, consist largely of web pages, are usually related to a specific project and have a characteristic footprint consisting of frequently occurring patterns. This suggests that an approach to detect conceptual groups has to support frequent changes in conceptual groups and allow groups to consist of a wide variety of different applications and resources. In the following, we devise an approach that uses frequent pattern detection to detect conceptual groups.
Our approach also tries to address that conceptual groups change over time and that some resources and applications are more or less relevant than others.

Chapter 5

Approach for Detecting Conceptual Groups

Our approach automatically detects conceptual groups from collected interaction event data for a single user. Conceptual groups are groups of applications and resources that are repeatedly used and relevant while working on a project. The main idea is that users have a mental concept for grouping the applications and resources that are used for certain tasks. They use the same applications and resources and perform the same sequences of actions repeatedly within a conceptual group for a task, such as switching between two applications, which allows us to apply pattern mining on recorded interaction data to extract conceptual groups. Therefore, our approach searches for frequently recurring patterns which emerge when users switch between the same applications and resources. We visually analyzed patterns generated from the data collected in the ground truth study and noticed that the same patterns often appear temporally close to each other. This suggests that different conceptual groups could be distinguished by their characteristic pattern footprints. These footprints consist of sets of frequent patterns that appear together and overlap with each other in their occurrences.

We devised an approach to detect conceptual groups based on interaction event data. Figure 5.1 shows an overview of the steps involved, starting with event pre-processing, detecting patterns, clustering of patterns based on their overlapping occurrences, and assignment of applications and resources to detected groups.

5.1 Event Data Logging

Our approach takes as input computer interactions that are recorded as a sequence of non-overlapping events. Events can be recorded by a background monitoring tool, such as the one used in our previous studies. It is also possible to use a custom monitoring tool, e.g.
to support other platforms, as long as, for each event, the start and end timestamp, current application, resource used within that application, resource URL and the current window title are recorded. If the user does not perform any action, including keyboard inputs or scrolling, for more than 30 seconds, an idle event is recorded.

5.2 Pre-Processing Events

The recorded events are pre-processed so that only events that are recent enough, i.e. with a start timestamp no older than γ, are retained. As a starting point for our approach, we chose γ to be 2 weeks under the assumption that events from more than 2 weeks ago are no longer relevant for current conceptual groups and can be ignored. Future research should look into providing users the option to choose γ or into identifying which γ works best.

The approach only considers resources that users spend a reasonable amount of time on. For our approach, we chose the time threshold to be 500 ms and removed all shorter events, since we assume that events shorter than 500 milliseconds are accidental switches and can be ignored.

5.3 Frequent Pattern Analysis

Once we have pre-processed the event data, we apply the apriori principle [3] to detect recurring patterns in the recorded activity event data. The main advantages of using the apriori principle are that it works efficiently with large amounts of data, is fully unsupervised, and finds all existing frequent patterns. The central rule of the apriori principle is that an itemset can only be frequent if all of its subsets are also frequent. This allows the creation of new frequent itemset candidates by combining existing frequent itemsets.

In our case, if a short pattern, consisting of one or more events, is considered frequent, meaning that it occurs more than φ times in the recorded events, then it gets extended by another frequent pattern.
For the newly created pattern, the number of occurrences is determined, and the pattern is only considered frequent if it exceeds the threshold φ, which needs to be configured. This procedure is repeated until all frequent patterns have been found and no new patterns are created.

After examining the data, we added four additional rules to our approach for the creation of frequent patterns to ensure the correctness of created patterns. The rules are also visualised in Figure 5.2; patterns are denoted by A and B in the following:

1. Only two frequent patterns that are different can be merged into a new pattern. It is not allowed to merge the same patterns A and A to get the new pattern AA, since that would mean that the user did not actually switch to a different resource.
2. When merging two frequent patterns A and B, the last event that is part of A must have occurred before the first event in B. This ensures that the generated patterns still consider the order of events. Also, an occurrence of A must be merged with the occurrence of B that appears closest to it.
3. For patterns A and B to be merged into one pattern, there must have been fewer than c events between the last event of A and the first event of B. c can be configured. When generating these frequent patterns, we want to be resistant to noise. Users might not always show the exact same behaviour when switching between applications; they might use a different switch order or skip certain applications from time to time.
4. Patterns with fewer than k events (for example k = 1) are removed since they do not provide any meaningful information.

5.4 Clustering

In a next step, our approach clusters frequent patterns that frequently occur together.
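The merge rules of Section 5.3 can be illustrated with a small sketch. It is not the implementation used in this thesis: representing an occurrence as a pair of event indices, the names `merge_occurrences` and `try_merge`, and the strict comparison against φ are assumptions made for illustration, and rule 4 (minimum pattern length) is left out.

```python
from typing import List, Optional, Tuple

# Assumption: an occurrence is (index of first event, index of last event)
# in the pre-processed event sequence.
Occurrence = Tuple[int, int]
Pattern = Tuple[str, ...]


def merge_occurrences(a_occ: List[Occurrence],
                      b_occ: List[Occurrence],
                      max_gap: int) -> List[Occurrence]:
    """Pair each occurrence of A with the closest following occurrence of B."""
    merged = []
    b_sorted = sorted(b_occ)
    for a_start, a_end in sorted(a_occ):
        # Rule 2: B must start after A has ended; take the closest such B.
        closest = next(((s, e) for s, e in b_sorted if s > a_end), None)
        if closest is None:
            continue
        b_start, b_end = closest
        # Rule 3: fewer than max_gap (c) events between end of A and start of B.
        if b_start - a_end - 1 < max_gap:
            merged.append((a_start, b_end))
    return merged


def try_merge(a: Pattern, a_occ: List[Occurrence],
              b: Pattern, b_occ: List[Occurrence],
              max_gap: int, phi: float
              ) -> Optional[Tuple[Pattern, List[Occurrence]]]:
    """Return the merged pattern and its occurrences, or None if not frequent."""
    # Rule 1: a pattern is never merged with itself.
    if a == b:
        return None
    occurrences = merge_occurrences(a_occ, b_occ, max_gap)
    # Assumption: "frequent" means strictly more than phi occurrences.
    if len(occurrences) > phi:
        return a + b, occurrences
    return None
```

For instance, with hypothetical occurrences of a pattern A at event indices 0, 5 and 10 and of a pattern B at indices 1, 8 and 20, and c = 3, the first two occurrences of A are merged with the B occurrences at 1 and 8, while the third is dropped because nine events lie between it and the next B.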
Clustering is necessary since the footprint to identify different tasks is a combination of multiple patterns, so it is necessary to determine which patterns belong together and represent a characteristic footprint for a specific task.

For the clustering, we first calculate the overlap ratio pairwise between all patterns, and for each pair the maximum overlap ratio is retained. A ratio of 0.0 implies that the two patterns do not overlap, while 1.0 implies that at least one of the two patterns completely overlaps with the other. These ratios are used as input for the clustering using DBSCAN [10], which is a density-based clustering algorithm and returns an arbitrary number of clusters that we consider as the conceptual groups. DBSCAN requires two parameters, ε and minPts, that have to be defined: ε is used as a similarity threshold that determines when patterns are considered closely related, and minPts is the minimum number of related patterns needed to form a conceptual group. The major benefit of using DBSCAN for clustering is that it does not require our approach to define the number of expected clusters, is robust to outliers, and finds arbitrarily shaped clusters.

5.5 Application and Resource Assignment

Finally, we assign previously used applications and resources to the generated conceptual groups. Events might not be part of a frequent pattern, but they might still be used in the context of a specific task and therefore should be part of the corresponding conceptual group. This is, for example, the case for events that are not frequent enough but are always used in very close time proximity to frequent patterns.

For this, we iterate through all pre-processed events, check if they occur within close time proximity (e.g. ± 5s) of a frequent pattern occurrence, and if so add them to the corresponding conceptual group. For all events that do not appear in close time proximity to a frequent pattern, the resource titles are compared with resources that have been assigned to conceptual groups.
Only resource titles of the same applications are compared, and they are considered part of the same group if their similarity score calculated using TF-IDF exceeds threshold s.

Figure 5.1: Overview of the steps involved to detect conceptual groups. [Diagram labels: Recorded events; Process events (filtering); Similarity based on occurrences in detected temporal patterns; Clustering using DBSCAN; Assign resources and applications to groups; Show or use groupings in tool.]

Figure 5.2: Constraints for creating frequent patterns. Different patterns are indicated on a timeline based on their occurrence and denoted by different characters (A, B, X, ...). Actual: patterns detected in the event data. (1) only different patterns, (2) the closest occurrences of patterns and (3) patterns with fewer than c = 3 events in between can be merged.

Chapter 6

Evaluation

To evaluate the accuracy of our approach to detect conceptual groups, we used the data collected in our lab study. We examine the similarity between the conceptual groups self-reported by our participants and the groups our approach generated automatically. In particular, we examined how many applications and resources were assigned correctly and incorrectly to automatically generated groups, how many are missing, and for how many it is unclear whether they are correct or wrong based on the collected data.

For our evaluation, we experimented with multiple sets of parameters for α, φ, ε, c, minPts, k and s. Specifically, we adjusted parameters to have our approach generate a reasonable number of groups (2 to 4 groups per response).
The best result was achieved by choosing the following parameters:

• α = 1 second (minimum event length to not get filtered out when pre-processing events)
• φ = max{log(0.05 × total events), 2} (minimum number of occurrences). We chose this value to scale the number of patterns that should get detected with the number of previous actions. If there have only been a few events, then patterns repeat less frequently. To detect patterns when users just started working, φ has to be relatively small. If more resources are used over time, φ should increase to prevent the detection of too many, potentially insignificant, patterns. However, to detect new patterns that emerge at a later point in time, φ should grow ever more slowly instead of increasing steadily. This behaviour is represented by the logarithm function.
• c = 3 (maximum number of switches in between two frequent patterns for them to still be allowed to get combined). This number needs to be relatively small to prevent combinatorial explosion, but large enough to ensure noise resistance when combining patterns.
• k = 1 (minimum pattern length)
• ε = 1 − 0.15 (DBSCAN). Patterns must have an overlap of at least 15% to be considered part of the same group.
• minPts = 1 (DBSCAN). Minimum number of related patterns to form a group.
• Events within 5 seconds before or after a frequent pattern are added to the corresponding group. (Resource assignment)
• s = 0.91 (TF-IDF similarity threshold). This threshold is set quite high in order to ensure that only resources with similar titles are clustered together.

6.1 Accuracy

In order to calculate the accuracy of our approach, a label is assigned to each resource and application used which represents the generated cluster they are part of.
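As a brief aside on the parameter list, the occurrence threshold φ can be written out as a small function. This is a minimal sketch rather than the code used in the evaluation; in particular, the base of the logarithm is not stated above, so the natural logarithm is an assumption here, and the function name `min_occurrences` is hypothetical.

```python
import math


def min_occurrences(total_events: int) -> float:
    """phi = max{log(0.05 * total events), 2}.

    Assumption: log is the natural logarithm; the base is not given in the text.
    """
    scaled = 0.05 * total_events
    if scaled <= 0:
        # The logarithm is undefined here, so fall back to the lower bound.
        return 2.0
    return max(math.log(scaled), 2.0)
```

Under this assumption the lower bound of 2 applies for short recordings (for 100 events, log(5) is below 2), and the logarithm only takes over once 0.05 × total events exceeds e² ≈ 7.4, i.e. from roughly 148 events onwards.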
These labels enable us to compare how similar the label assignments from the ground truth data and the generated conceptual groups are. Since our approach generates an arbitrary number of clusters, it first needs to be determined which of the generated clusters represent the clusters indicated in the ground truth. For this matching, for each generated group, the ground truth group that has the most applications and resources in common is selected as its counterpart. A group can only be selected once, so if our approach generated more groups than have been indicated in the ground truth, some groups might not have a counterpart.

We analyse the accuracy of our approach as follows: “Correct” matches refer to the percentage of applications and resources that were assigned to the same conceptual groups as indicated in the participant responses. For this, we calculate how many of the applications and resources are labeled according to the provided ground truth. “Wrong” matches are data points that were assigned to the incorrect group and do not match the labeling of the collected ground truth, while “Missing” are resources that are indicated in the ground truth but are not present in the generated groups. Occasionally, participants added resources to groups only in one response but did not reassign them to the group in the following responses, or only assigned them at a later point in time. This could either mean that these resources are no longer or not yet relevant, or that participants forgot to select them. Nevertheless, if a resource has been part of a group previously or at a later point in time, then that might be an indicator that it is relevant to that conceptual group. These cases are indicated as “Likely Correct”. Overall, 41.8% (±14.5) of applications and resources are assigned correctly (“Correct”), 25.6% (±15.3) are assigned to “Wrong” groups, 13.5% (±13.1) are “Missing”, and 17.4% (±10.7) are “Likely Correct”. The overall accuracy improved over time with more collected data.
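The matching and labelling procedure described above can be sketched as follows. This is an illustrative reading, not the evaluation code of this thesis: the order in which generated groups pick their counterpart, the treatment of resources that appear only in generated groups as “Wrong”, the normalisation over all resources, and the names `match_groups` and `score` are assumptions. The sketch also assumes that every resource belongs to at most one group, and it omits the “Likely Correct” category, which needs the responses from other points in time.

```python
from typing import Dict, Set


def match_groups(generated: Dict[str, Set[str]],
                 truth: Dict[str, Set[str]]) -> Dict[str, str]:
    """Map each generated group to the unused ground truth group sharing most members."""
    mapping: Dict[str, str] = {}
    taken: Set[str] = set()
    for gen_id, members in generated.items():
        candidates = [(len(members & truth[t]), t) for t in truth if t not in taken]
        if not candidates:
            break  # more generated groups than ground truth groups
        overlap, best = max(candidates)
        if overlap > 0:
            mapping[gen_id] = best
            taken.add(best)
    return mapping


def score(generated: Dict[str, Set[str]],
          truth: Dict[str, Set[str]]) -> Dict[str, float]:
    """Return the fractions of Correct, Wrong and Missing resources."""
    mapping = match_groups(generated, truth)
    truth_label = {r: t for t, members in truth.items() for r in members}
    gen_label = {r: mapping.get(g) for g, members in generated.items() for r in members}
    counts = {"correct": 0, "wrong": 0, "missing": 0}
    for resource, label in truth_label.items():
        if resource not in gen_label:
            counts["missing"] += 1
        elif gen_label[resource] == label:
            counts["correct"] += 1
        else:
            counts["wrong"] += 1
    # Assumption: resources grouped by the approach but not by the participant are Wrong.
    counts["wrong"] += sum(1 for r in gen_label if r not in truth_label)
    total = sum(counts.values())
    return {k: (v / total if total else 0.0) for k, v in counts.items()}
```

With a hypothetical ground truth of {xcode, terminal, stackoverflow} and {maps, sheets}, and generated groups {xcode, terminal, maps} and {sheets, mail}, three of six resources count as Correct, two as Wrong (maps and mail) and one as Missing (stackoverflow).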
After the first response, 33.9% (±15.1) of resources were correctly assigned, 42.3% (±13.3) were correct after the second and 48.1% (±13.1) were correct after the third response. Over time more patterns emerged, which makes different footprints more discernible and leads to better accuracy.

Results for the measured accuracies for all participants are shown in Figure 6.1. The difference between the number of generated groups and the number of groups indicated in the ground truth is also provided for each participant. For example, if our approach generated 2 groups instead of the 3 groups that a participant indicated, then the difference is indicated as -1.

Of the wrongly assigned data, on average 14.9% (±11.3) are resources that participants did not assign to any groups. Participants might have considered these as less relevant or might have forgotten to select them. It would be interesting to analyse in a future study whether having these applications and resources in groups is unfavourable for users or whether these should actually be part of the groups. The remaining incorrectly assigned data points occurred due to our approach correlating them to the wrong patterns.

If we consider that the applications and resources classified as “Likely Correct” are potentially assigned correctly, then we would have an overall accuracy of 45.3% (±18.6) and an accuracy of 51.3% (±14) after the third response.

Notably, for some participants (P7, P9) the percentage of “Missing” fluctuates significantly. The main reason for this is that φ changes over time. It can happen that patterns disappear over time, resulting in some resources being added to or removed from groups.

There are several reasons for the accuracy of the generated groups not being higher. Firstly, the number of groups that are generated sometimes does not match the number of groups indicated in the ground truth. Figure 6.1 shows the distribution of the difference in the number of groups between the generated groups and the ground truth on the right side of the charts.
Generally, the number of generated groups tended to be below the number of actual groups. The numbers increased over time, getting closer to the actual number of indicated groups. The difference in numbers of groups is also the main factor for incorrectly assigned resources and applications. 87.6% (±29.9) of the wrongly assigned applications and resources belonged to a group that was missing or should have been part of another group. This was especially problematic in the beginning, when only a small set of events had been recorded and it was hard to discern different contexts. This happens because some applications, such as web browsers and search engines, are used across all groups, which means the characteristic footprints for the groups might be quite similar. It is possible that over a longer period of time our approach will more accurately determine the number of groups the user perceives, since patterns will manifest and re-occur more often over time. To further verify this, it would be necessary to conduct a longer study.

Secondly, resources and applications that have been used less frequently are more likely to be treated as noise, which occasionally results in data points not getting assigned to any group while users still consider them as relevant. Applications and resources that were considered “Missing” had only been used a median of 4 times before each response, while the ones that had been assigned to a group had been used a median of 18 times.

6.2 Comparison to Related Work

In a further step, we compared our approach to the automatic task-cluster generation approach based on document switching and revisitation [2] and to clustering based on semantic similarity of window titles, similar to the one used in SWISH [22] (RQ3). We chose to compare against these two approaches since they are most similar and also automatically detect groups. We modified the approach used in SWISH and used DBSCAN instead of KMeans for clustering to allow a variable number of groups.
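To make the roles of ε and minPts concrete, the following is a minimal, self-contained sketch of DBSCAN over a precomputed distance matrix. It is not the implementation used in our evaluation; the function name `dbscan` and the example distances are hypothetical. The same routine applies to window-title distances and, with distance = 1 − overlap ratio, to the pattern clustering of Section 5.4.

```python
from typing import List, Optional

NOISE = -1


def dbscan(dist: List[List[float]], eps: float, min_pts: int) -> List[int]:
    """Cluster points given pairwise distances; returns one label per point.

    A point is a core point if at least min_pts points (itself included)
    lie within distance eps. Points reachable from no core point get NOISE.
    """
    n = len(dist)
    labels: List[Optional[int]] = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbours = [j for j in range(n) if dist[i][j] <= eps]
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later become a border point of a cluster
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = [k for k in range(n) if dist[j][k] <= eps]
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is a core point: keep expanding
    return [label if label is not None else NOISE for label in labels]
```

With minPts = 1 every point is a core point, so isolated items form their own single-member group instead of being discarded as noise; larger values of minPts turn such items into noise. In practice a library implementation, such as scikit-learn's `DBSCAN` with `metric="precomputed"`, would be the more usual choice.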
We examined multiple sets of parameters for ε and minPts; the best results were achieved with ε = 0.85 and minPts = 4. Using the data collected in the ground truth study, the accuracies of the approaches after each response are shown in Figure 6.2.

Our approach has an overall accuracy of 41.8% (±14.5), higher than the other two approaches, which have overall accuracies of 24.4% (±9.4) and 25.4% (±13.5) respectively. Relative to our approach, the accuracies of the other two approaches are therefore 41.6% and 39.2% lower, respectively. Additionally, for each of the responses, our approach has a higher accuracy, and its accuracy increases over time, while the accuracy of the other approaches remained relatively constant and did not improve over time.

Overall, the results suggest that frequent patterns can serve as indicators for conceptual groups, resulting in a higher accuracy than previous related approaches. After the third response we measured an average accuracy of 48.1% (±13.1), with accuracies of up to 72.3% for certain participants. However, using frequent patterns it is not always possible to detect new conceptual groups right away as they emerge; instead, a certain number of actions need to be performed so that the characteristic footprints can be detected.

[Figure 6.1 panels: Response 1, Response 2, Response 3. Vertical axis: Participant (2 to 11). Horizontal axis: Percentage of Applications and Resources (0 to 1). Legend: Correct, Likely Correct, Wrong, Missing. Right-hand annotation: Group Difference.]

Figure 6.1: Accuracy of our approach for each response of each participant. “Correct”: data assigned to the same groups as in the ground truth; “Wrong”: incorrectly assigned data; “Missing”: data indicated in the ground truth but missing in the generated groups; “Likely Correct”: data that has been part of the group earlier or later but not at the time of the response. Vertical black lines mark the average correctness. Numbers on the left refer to the participant. Numbers on the right indicate the difference between the number of generated groups and the number of groups indicated by participants.

[Figure 6.2 series: Our approach; Abela et al.; Semantic similarity based on SWISH.]

Figure 6.2: Boxplots of accuracies of related approaches for each response.

Chapter 7
Threats

As with any study there are some internal and external threats to its validity.

7.1 External Validity

Since our pilot study was executed as an in-situ study, some variables are out of our control. For example, some participants stopped the tool for several hours during the study. Additionally, prompting the popup for reporting on their workflows at least twice an hour might have added distractions and impacted the way participants would normally work.

The tasks participants were working on in our ground truth study might not be generalizable to tasks that are performed in actual work environments. To mitigate this risk, we tried to choose a variety of different tasks which are mostly related to common activities of software developers. In a real work environment, workers are repeatedly externally interrupted or interrupt themselves or get distracted.
These different types of interruptions were not simulated in our lab study but might have an impact on the conceptual groups.

Although our approach is applicable beyond software development tasks, all of our participants worked in a computer science research context and develop software on a daily basis. Workers in other fields might have different working styles, which might make our conceptual group approach more or less applicable. Also, all participants were graduate students, so results might be different for software developers working in industry.

All of our participants were very experienced computer users who spend the majority of their work day working with computers. Novice computer users might have a different perception of their conceptual groups.

7.2 Internal Validity

We asked participants for their workflows during our pilot study. However, because participants struggled with identifying their workflows, we re-designed our approach to detecting conceptual groups. Since we did not further use the self-reports participants provided in the pilot study, the impact on our developed approach is minimal.

During the ground truth study, we asked participants to indicate their own conceptual groups. We therefore provided participants with an explanation of conceptual groups in the beginning. However, this concept is quite general, and some users indicated that they were unsure about their exact groups. Therefore, some of the collected results might contain inaccuracies due to uncertainty, or the results might be slightly biased by our explanation in the beginning.

Additionally, the parameters we used for our approach might need to change to yield results with high accuracy if tools record data and create conceptual groups over a longer period of time.
The parameter configuration used for our evaluation potentially overfits the collected data, for which it might have been optimized, and should be investigated further in future work. We also tried different parameter values for the approaches from previous work that we compared our approach against and chose the ones that resulted in the highest accuracy. In the future, more data should be collected and separate data sets for training and testing should be used to get more accurate values for these parameters.

The study was executed on a provided machine running macOS. Some of the participants (P4, P7) were not familiar with the operating system and its usage, which might have impacted their usual workflow. We tried to mitigate this risk by providing a short tutorial about the usage. Additionally, we allowed participants to use any application or programming language they wanted so that they could work in an environment they are familiar with.

7.3 Construct Validity

It is hard to measure how well conceptual groups can describe how users think of the way applications and resources should be organized for the projects they are working on. It might sometimes be hard for participants to decide whether a specific resource is actually relevant to a specific project or not. Therefore, there is a threat to the construct validity. We tried to mitigate this threat by providing a definition of conceptual groups and by validating the detected patterns with participants afterwards.

Chapter 8
Discussion

We have shown that our approach can detect conceptual groups with an accuracy higher than previous work. In this chapter we discuss how the algorithm for detecting conceptual groups might be used in approaches to support users in their work.
We thereby focus on a prototype that we developed, design recommendations that we identified, and further future work in the area.

8.1 Supporting Professionals with Conceptual Groups

Conceptual groups can be used as a basis for tools that support professional workflows. In the following, we present and discuss the prototype that we built to support professionals as well as feedback we received from users who were presented with screenshots of the prototype.

8.1.1 Tool Prototype

To help users in switching between different applications and projects, we developed a prototype tool that uses the conceptual group approach to detect the groups. It then uses these groups to show users their conceptual groups and make switching within and between them easier. Figure 8.1 depicts a screenshot of our prototype.

Users could use the tool as an addition to the application switcher (cmd + tab), for example if they wish to switch between tasks instead of just applications. For this purpose, the tool also allows users to open all the applications that are part of a specific group. It can also be used to declutter the current workspace by allowing users to close all applications and documents of a specific conceptual group.

The tool is implemented in Swift and runs on macOS. Users can open the tool using the key combination cmd + fn + tab.

Figure 8.1: Prototype of our tool for displaying and interacting with conceptual groups.

Conceptual groups are shown as boxes containing applications and related resources. Users can switch quickly between applications and resources within these groups, as well as between groups, using key combinations. Groups are updated periodically in the background, e.g., every two minutes, and applications are ordered based on their most recent usage.
Since the generated conceptual groups might not entirely match the user's perception, it is possible for users to delete or add groups, resources, and applications, as well as to drag and drop applications and resources between groups.

8.2 Prototype Feedback

We showcased our tool to participants using screenshots and asked for feedback. Overall, participants perceived the prototype predominantly positively, e.g., “This would actually be really helpful!” (P4). At the same time, participants also mentioned several ideas for extending the current work, specifically with respect to the interactiveness of the approach, the abstraction and visualisation of it, and the use of it to enhance other navigation tools.

Interacting with Conceptual Groups. Several participants commented on the need for interacting with the conceptual groups and expressed interest in the ability to name groups (P3) and for the tool “to break [resources] down more fine-grained” (P2). In our demo, most of the conceptual groups had applications with only very few different resources, and some of the visited tabs were filtered out as they were considered not relevant. Some participants were concerned about whether the conceptual groups could be detected accurately and appreciated that it is also possible to manually adapt the generated groups (P1, P2).

Participants also expressed interest in having the ability to save conceptual groups and to later restore them, as well as to close all the applications and documents that are part of a conceptual group that has been completed or suspended (P2, P3, P4, P7, P8, P9, P10).
This would not only allow users to get back to working on a project quickly but also to share their conceptual groups with others.

Conceptual Groups Summaries and Visualisations. Different ways of representing the conceptual groups were suggested, such as displaying the current state of the actual windows instead of just showing application icons and the resource titles, or using an alignment similar to Exposé on macOS (P1). Another concern that was mentioned is that conceptual groups might grow very large over time; hence it is important to retain only the applications and resources that are actually relevant to that group. Ideally, tools should summarise resource usages and/or provide search functionality.

Conceptual groups could also indicate “[...] daily goals [...]” (P4) and help users to maintain focus on their current tasks (P3). It might also be useful to show activity statistics or even add some gamification to help users stay focused.

Using Conceptual Groups to Enhance Current Navigation Tools. Navigational tools, such as the application switcher, could implement the approach and only show applications or windows of the current conceptual group, or learn commonly repeating patterns to order the windows accordingly so that “[...] alt + tab would switch to the right window even if it wasn’t the most recent window [...]” (P3). It might also be useful to “[...] have alt + tab between conceptual groups.” (P6). In the case of Exposé on macOS, “[...] it could blur some of the applications that are not related.” (P5) or group windows by their conceptual groups (P2). Additionally, window tiling managers could be combined with conceptual groups and “[...] learn the layout and associate that with the context or the conceptual group.” (P3).

8.2.1 Accuracy Improvements

Our approach currently considers frequent patterns for determining conceptual groups.
Since it often takes some time for frequent patterns to evolve, our approach is not always able to detect or update conceptual groups in real time. Considering additional properties might increase the accuracy. For instance, instead of just analysing the window title, semantically analysing the whole content of a window to group windows together might yield more accurate results, as window titles often consist of only very few words.

Further improvements could be achieved by using eye tracking to determine which windows are actually looked at and used by users. Currently, our data collection approach assumes that windows that are in focus are actively used. However, users often open multiple windows on the same or several screens and arrange them next to each other. While working, they might look at some window, e.g., to read through documentation or source code, without putting that window into focus. So although that window is actively used, our approach would not recognize the usage.

Certain actions such as copy, cut and paste could be strong indicators that the user is still working within the same conceptual group. Text is likely to be copied and pasted mostly between windows that belong to the same conceptual group. Taking these input actions into account might further increase accuracy.

8.3 Future Work

This section discusses future work related to improving the accuracy of our approach, different applications and possible field studies to gather more general observations about conceptual groups.

8.3.1 Field Study

We intend to further test our approach in a field study to confirm that it also works outside the lab environment. Instead of being given tasks, participants in this study will resume working on their own tasks. The tool used in our lab study would still run in the background, collect data, and ask for conceptual groups from time to time.
The study will run over several days, providing insight into how accurately our approach works in a more real-world setting, how conceptual groups develop and change over time, and how many conceptual groups workers typically have. Additionally, it would be interesting to compare different workers in their perceptions of conceptual groups. We will try to run this study mainly with software developers working in industry, as opposed to only graduate students, since the working styles in academia and industry are potentially quite different.

8.3.2 Applications

Section 8.1 provides a wide range of different applications of our approach and ideas for tools that could integrate the approach. In the future, we are planning to develop prototypes for the ideas and perform user studies for some of these tools to determine how helpful and usable they are.

Chapter 9
Conclusion

Professionals use many applications and resources during their work day for the projects, tasks and activities they work on, and have to switch between them frequently. This thesis presents an approach to automatically generate conceptual groups from recorded desktop interaction data. To obtain these groups, our approach analyses recurring patterns that emerge when switching between different applications and windows.

We conducted a lab study with 11 participants to collect ground truth data for conceptual groups and to evaluate our approach. Results show that frequent patterns can be used to discern and detect conceptual groups with an accuracy of 48%; however, to achieve an even higher accuracy it might be necessary to consider additional factors. The accuracy of our approach also exceeds the accuracy of previous similar approaches by up to 50%.

Based on the approach, we developed a prototype that detects conceptual groups and supports professionals in their navigation between these groups.
In an interview with participants about conceptual groups and our prototype, we gained more insight into how participants perceive their conceptual groups in their daily work life and received many suggestions for potential applications. We believe that these future applications of conceptual groups could help users in organizing their workspace and could make workflows and task switches more efficient.

Bibliography

[1] C. Abela, C. Staff, and S. Handschuh. Online activity graph for document importance and association. In Proceedings of the 7th International Conference on Semantic Systems, pages 191–194. ACM, 2011.
[2] C. Abela, C. Staff, and S. Handschuh. Automatic task-cluster generation based on document switching and revisitation. In UMAP Workshops, 2015.
[3] R. Agrawal, T. Imieliński, and A. Swami. Mining association rules between sets of items in large databases. In ACM SIGMOD Record, volume 22, pages 207–216. ACM, 1993.
[4] L. Bannon, A. Cypher, S. Greenspan, and M. L. Monty. Evaluation and analysis of users’ activity organization. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 54–57. ACM, 1983.
[5] D. Barreau and B. A. Nardi. Finding and reminding: file organization from the desktop. ACM SIGCHI Bulletin, 27(3):39–43, 1995.
[6] M. S. Bernstein, J. Shrager, and T. Winograd. Taskposé: exploring fluid boundaries in an associative window visualization. In Proceedings of the 21st annual ACM symposium on User interface software and technology, pages 231–234. ACM, 2008.
[7] O. Brdiczka, N. M. Su, and J. B. Begole. Temporal task footprinting: identifying routine tasks by their temporal patterns. In Proceedings of the 15th international conference on Intelligent user interfaces, pages 281–284. ACM, 2010.
[8] S. Costache, J. Gaugaz, E. Ioannou, and C. Niederée. Detecting contexts on the desktop using Bayesian networks. 2010.
[9] A. N. Dragunov, T. G.
Dietterich, K. Johnsrude, M. McLaughlin, L. Li, and J. L. Herlocker. Tasktracer: a desktop environment to support multi-tasking knowledge workers. In Proceedings of the 10th international conference on Intelligent user interfaces, pages 75–82. ACM, 2005.
[10] M. Ester, H.-P. Kriegel, J. Sander, X. Xu, et al. A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, volume 96, pages 226–231, 1996.
[11] V. M. González and G. Mark. Constant, constant, multi-tasking craziness: managing multiple working spheres. In Proceedings of the SIGCHI conference on Human factors in computing systems, pages 113–120. ACM, 2004.
[12] M. S. Magnusson. Discovering hidden time patterns in behavior: T-patterns and their detection. Behavior Research Methods, Instruments, & Computers, 32(1):93–110, 2000.
[13] G. Mark, V. M. Gonzalez, and J. Harris. No task left behind?: examining the nature of fragmented work. In Proceedings of the SIGCHI conference on Human factors in computing systems, pages 321–330. ACM, 2005.
[14] G. Mark, S. T. Iqbal, M. Czerwinski, and P. Johns. Bored mondays and focused afternoons: the rhythm of attention and online activity in the workplace. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 3025–3034. ACM, 2014.
[15] A. N. Meyer, T. Fritz, G. C. Murphy, and T. Zimmermann. Software developers’ perceptions of productivity. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, pages 19–29. ACM, 2014.
[16] A. N. Meyer, L. E. Barton, G. C. Murphy, T. Zimmermann, and T. Fritz. The work life of developers: Activities, switches and perceived productivity. IEEE Transactions on Software Engineering, 43(12):1178–1193, 2017.
[17] H. T. Mirza, L. Chen, G. Chen, I. Hussain, and X. He. Switch detector: an activity spotting system for desktop.
In Proceedings of the 20th ACM international conference on Information and knowledge management, pages 2285–2288. ACM, 2011.
[18] Y. Miyata and D. A. Norman. Psychological issues in support of multiple activities. User centered system design: New perspectives on human-computer interaction, pages 265–284, 1986.
[19] R. Morteo, V. M. González, J. Favela, and G. Mark. Sphere juggler: fast context retrieval in support of working spheres. In Proceedings of the Fifth Mexican International Conference in Computer Science, 2004. ENC 2004., pages 361–367. IEEE, 2004.
[20] R. Nair, S. Voida, and E. D. Mynatt. Frequency-based detection of task switches. In Proceedings of the 19th British HCI Group Annual Conference, volume 2, pages 94–99, 2005.
[21] S. Nijssen and J. N. Kok. The gaston tool for frequent subgraph mining. Electronic Notes in Theoretical Computer Science, 127(1):77–87, 2005.
[22] N. Oliver, G. Smith, C. Thakkar, and A. C. Surendran. Swish: semantic analysis of window titles and switching history. In Proceedings of the 11th international conference on Intelligent user interfaces, pages 194–201. ACM, 2006.
[23] N. Oliver, M. Czerwinski, G. Smith, and K. Roomp. Relalttab: assisting users in switching windows. In Proceedings of the 13th international conference on Intelligent user interfaces, pages 385–388. ACM, 2008.
[24] T. Rattenbury and J. Canny. Caad: an automatic task support system. In Proceedings of the SIGCHI conference on Human factors in computing systems, pages 687–696. ACM, 2007.
[25] J. Shen, E. Fitzhenry, and T. G. Dietterich. Discovering frequent work procedures from resource connections. In Proceedings of the 14th international conference on Intelligent user interfaces, pages 277–286. ACM, 2009.
[26] J. Shen, J. Irvine, X. Bao, M. Goodman, S. Kolibaba, A. Tran, F. Carl, B. Kirschner, S. Stumpf, and T. G. Dietterich.
Detecting and correcting user activity switches: algorithms and interfaces. In Proceedings of the 14th international conference on Intelligent user interfaces, pages 117–126. ACM, 2009.
[27] G. Smith, P. Baudisch, G. Robertson, M. Czerwinski, B. Meyers, D. Robbins, and D. Andrews. Groupbar: The taskbar evolved. In Proceedings of OZCHI, volume 3, page 10, 2003.
[28] C. Tashman and W. K. Edwards. Windowscape: Lessons learned from a task-centric window manager. ACM Transactions on Computer-Human Interaction (TOCHI), 19(1):8, 2012.

Appendix A
Pilot Study Details

A.1 Recruiting Participants

After obtaining approval for conducting the study from the UBC Ethics Board, we first tested our study on two graduate students for 3 days. After this test period we made minor adjustments to the study pop-up and added a timeline of recently used applications and resources. To recruit participants for our pilot we used mailing lists and personal contacts at UBC. We sent the following invitation via email:

Subject: Invitation to participate in study on detecting user workflows

Hi,

Have you ever been curious about how often you perform the same steps over and over again while working on your computer? Wouldn’t it be nice to have tool support to make these workflows more efficient?

Under the supervision of Professor Reid Holmes, I, Anna Scholtz, am conducting a study to gain insight into workflows software developers perform while working on their computer. I would be delighted to invite you to participate in this study.

This will be an in situ study, conducted over a period of two weeks. After a short instruction session, we will ask you to install a tool that will monitor all computer interactions in the background. After that you can resume working while running the tool in the background. Approximately every 20 minutes, a popup will appear asking you about your two most-recent workflows. A brief interview session, in which we will
A brief interview session, in which we will43ask you questions related to your indicated workflows, will conclude the study.To be eligible, you must be working with macOS and use your machine for more thanone hour per day.If you are interested in participating, or if you have any questions, please reply to thisemail.44Participant Age GenderWhich bestdescribes yourprimary work area?Which of the followingbest describes your role?Which ofthe followingstatements describesyour currentposition best?For howlong haveyou beenusing yourcurrentcomputer/desktop setup?P1 25 Female Research Individual Contributor Junior 3.5P2 29 Male Research Individual Contributor Other 3P3 24 Male Research Individual Contributor Junior 7P4 25 Male Research Individual Contributor Junior 2P5 26 Male Research Individual Contributor Junior 6Table A.1: Pilot study participant metadata.45A.2 Setup and Introduction SessionIf interested participants were eligible, then we scheduled an introduction session in which weexplained the purpose of the study and installed the monitoring tool. We followed the followingscript for this session:Welcome to the study. The study is about detecting user workflows. We consider aworkflow as a sequence of steps and actions performed as part of a bigger task toachieve a certain goal. 
Workflows might be repeated multiple times for a given task. An example could be: (1) interacting with IDE editor, (2) web browser StackOverflow, (3) interacting with IDE editor, (4) running test case, (5) committing changes in Terminal/shell to repository.

For the study, we have to install a program that collects the following data:

• active applications, background applications: name, title, window position
• aggregated input (no clear text input): number of scroll events, cursor distance, keyboard presses within 10 seconds
  – we are not recording each input event to prevent sniffing of passwords or personal information.
• idle times
• data entered into popup

All recorded information is stored in a database that you can navigate to using our tool. You can click on the program icon in the status bar, then click on "Preferences" and then click on "Open data folder", which will open the directory containing the database. The database file can be edited and viewed using common database browser applications like "DB Browser for SQLite". You can remove or modify data entries that you do not want to share at any time.

During the study you resume your usual work. If you don’t want to track your actions for some time, either quit the application or disable trackers by checking the option in the menubar of our monitoring application. Remember to start the application again and enable study mode when you want to resume recording.

Approximately every 20 minutes a popup will appear.

[Show screenshot of popup]

You are asked to indicate one to three of your most-recent workflows. For each workflow, indicate the steps by adding more boxes and specifying the applications and documents you used during that step. You can also choose to not answer the popup by leaving all fields empty and clicking "Submit".

The study will end either after 10 working days or once you have submitted 100 popup responses. For each response we will compensate you with 50 cents. In total, you can get up to $50 of compensation.
After the study you will be asked to complete a final survey.

Do you have any questions?

If you have any concerns or questions during the study, then please let me know.

[Setup notifications]
[Show SQL tool for modifying data entries]

After obtaining the participant's consent using the consent form shown in Figure A.2 and answering remaining questions, we started the study.

The University of British Columbia
Department of Computer Science
201-2366 Main Mall
Vancouver BC Canada V6T 1Z4

Consent Form

User Workflow Detection

Principal Investigator
Reid Holmes, Professor, Department of Computer Science, UBC (rtholmes@cs.ubc.ca)

Co-Investigators
Anna Scholtz, Graduate Student, Department of Computer Science, UBC (ascholtz@cs.ubc.ca)
Nicholas Bradley, Graduate Student, Department of Computer Science, UBC (ncbrad@cs.ubc.ca)
Thomas Fritz, Professor, Department of Informatics, University of Zurich, Affiliate Professor, Department of Computer Science, UBC (fritz@cs.ubc.ca, +41 44635 6732)

Purpose
The overall purpose of this research is to support software development professionals by providing valuable insights into their work patterns and developing tools to support recurring workflows. To accomplish this objective, we are investigating methods that automatically detect recurring workflows in user activity data which is collected while software development professionals are working at their computer.

Study Procedure
This study is an in situ study, performed over a period of two weeks.
The study is composed of three parts:
(a) a 20-minute long introduction session in which we will explain the study to you, ask for your consent and install a tool which will monitor all computer interactions in the background,
(b) a period of up to 10 workdays in which you are asked to continue working as usual and will be prompted by a popup every 20 minutes to provide self-reports about your current workflows while the monitoring tool is running in the background, and
(c) a 15-minute follow-up survey, in which we will ask you about demographics and more generally about the workflows and workflow switches you performed.

Each self-report will take approximately 2 minutes to answer. During an average working day, we expect that answering all popups will take less than 30 minutes in total.

Note: we consider a workflow as a sequence of steps and actions performed as part of a bigger task to achieve a certain goal. Workflows might be repeated multiple times for a given task.

If you agree to participate, we will install the monitoring tool on your computer. Note that only software development professionals working with macOS and using their machine for more than four hours per day are eligible to participate. The installed tool will collect the following data:
- Name, title and window position of active and background applications
- Aggregated input data, such as number of scroll events, cursor movements, number of keyboard presses and number of clicks. We are not recording individual keystrokes but only the number of keys pressed every 10 seconds. This prevents recording of passwords or other private information.
- Idle times
- Self-reported workflow samples

The data will be stored locally on your device. At the end of the study our tool will provide you with instructions on how this data should be sent to us. During the introduction session we will show you how recorded data that you do not want to share can be removed or censored.

Known Risks
One risk is that potentially private or confidential data will be recorded. We are mitigating this risk by showing the location of where collected data is stored. You are allowed to remove or censor data entries that you do not want to share at any time. Additionally, the monitoring program does not capture each keystroke, which prevents recording of passwords or other private information.

Also, you can terminate the study at any point in time without providing any reason. Otherwise, the risks involved in this study are minimal and are those commonly associated with the use of computers, such as potential eye or wrist strain.

Potential Benefits
We do not expect there to be any potential benefits from participating in this study. However, you may find it interesting to reflect on your activity patterns while working on your computer. Direct benefits can arise if our findings lead to tools that can automatically identify current workflows and assist in providing information, such as related applications, documents or websites relevant for the current workflow.

Compensation
We use a micro-payment approach and will compensate you with 50 cents for every self-report you submit, with a maximum compensation of CDN $50 per person (100 submitted self-reports). The study will end either after you have submitted 100 self-reports or after 10 workdays. Note that you can withdraw from the study at any time without a reason and will receive a prorated reimbursement.

Data, Storage & Confidentiality
All data (survey responses and monitoring data) will be saved on password-protected and encrypted devices of the researchers directly involved or in locked university filing cabinets in rooms of the University of British Columbia. All data will be anonymized before and will not contain any identifying information.

You will be identified by numbers or pseudonyms in any internal or academic research publication or presentation. If we choose to use some of your comments, they will be attributed to a participant number or a pseudonym. At no point in time will your employer have access to the identifying information. The data will be stored for five years, after which it will be permanently deleted.

Use of the Data
Data collected will be used for analysis and may also be used for class project presentations and other research presentations. This project forms the basis of a thesis research project and may be submitted as a research publication in the future.

Contact for information about the study
If you have any questions about or desire further information with respect to the study, you may contact Anna Scholtz (ascholtz@cs.ubc.ca), Dr. Reid Holmes (rtholmes@cs.ubc.ca), or Dr. Thomas Fritz (fritz@cs.ubc.ca).

Who can you contact if you have complaints or concerns about the study?
If you have any concerns or complaints about your rights as a research participant and/or your experiences while participating in this study, contact the Research Participant Complaint Line in the UBC Office of Research Ethics at 604-822-8598 or if long distance e-mail RSIL@ors.ubc.ca or call toll free 1-877-822-8598.

Consent
Your participation in this study is entirely voluntary. You are free to withdraw your participation at any point during the study, without needing to provide any reason. You can disable the monitoring software yourself at any time during the study. Any information you contribute up to your withdrawal will be retained and used in this study, unless you request otherwise.

With your signature on this form, you confirm the following statements:
• I am at least 18 years old.
• I had enough time to make the decision to participate and I agree to the participation.
Name of Participant

________________________________________________________________________
Signature of Participant
Location and Date

Figure A.1: Pilot study consent form.

A.3 Final Survey

After finishing the study, participants were asked to complete a survey as shown in Figure A.2. For Q8 we determined frequently occurring patterns using frequent pattern analysis and updated the survey for each participant based on their frequent workflows.

Q0. Thank you again for participating in our study! Your data will help us a lot in our research on automating workflows and we hope you enjoyed the participation. Finally, we would like to ask you a few more questions and would appreciate your feedback.

Q1. How old are you?
24

Q2. What is your gender?
Male

Q3. Which best describes your primary work area?
Options: Research, Development, Project Management, Testing, Other Engineer, Other Non-Engineer

Q4. Which of the following best describes your role?
Options: Individual Contributor, Lead, Manager, Other

Q5. Which of the following statements describes your current position best?
Options: Senior, Junior, Other

Q6. For how long have you been using your current computer/desktop setup?
7 years

Q7. What is your definition of workflow?
Sequence of actions I do while completing a task

Q8. For the following workflows, could you please rate how often they occurred during your work?
Scale: Never, Rarely, Occasionally, Frequently, Very Frequently, Always
• iTerm2 → Preview → Google Chrome
• Discord → iTerm2 → Google Chrome
• Mail → Preview → Google Chrome
• Xcode → Safari → Terminal
• Google Chrome ("Messenger") → Google Chrome ("YouTube") → Google Chrome ("Messenger")
• iTerm2 ("1. fg") → Google Chrome ("Messenger")

Q9. Can you think of any way/approach that could support you performing some/multiple/all workflows and if so how? For example, would a tool that automatically groups windows/applications by workflow in the task bar or on your screen be of use to you?
While it would be nice to have some low level tasks automatized, most of the action in my workflows requires to look and read the screen, therefore there isn't much that could be doable to automaize that.

Q10. Are there any parts or whole workflows that you would like to be automated (regardless of how realistic it may be) and if so, which ones and why?
Not really. Most of my workflow involves one or two tools, and I use shortcuts to quickly jump through them

Figure A.2: Pilot study final survey.

Appendix B
Ground Truth Study Details

B.1 Recruiting Participants

After obtaining approval for conducting the study from the UBC Ethics Board, we first tested our study on two graduate students for 3 days. After this test period we made minor adjustments to the study pop-up and to how often task switches and conceptual group reviews should occur. To recruit participants for our study we used mailing lists and personal contacts at UBC. We sent the following invitation via email:

Invitation to participate in the study "Detecting Conceptual Groups"

Hi,

Have you ever been curious about how often you perform the same steps over and over again while working at your computer? Or what applications and documents you usually use for your different high-level tasks? Wouldn't it be nice to have tool support to make switching between tasks more efficient?

Under the supervision of Professor Thomas Fritz, I, Anna Scholtz, am conducting a study to gain insight into the workflows software developers perform and their conceptual groups while working on their computer. I would be delighted to invite you to participate in this study.

This will be a lab study, conducted over a period of approximately 2 hours.
After a short instruction session, we will ask you to install a tool that will monitor all computer interactions in the background. You can either use your own computer if it runs macOS or we can provide you with a MacBook Air. You are asked to work on different tasks while running the tool in the background. These tasks range from implementing a simple software system to planning a vacation trip. Approximately every 20 minutes, a popup will appear asking you to review your conceptual groups that have been detected. A brief survey, in which we will ask you about your demographics and questions related to your indicated workflows, will conclude the study.

For your participation in this study, we will compensate you with a $20 Amazon gift card.

If you are interested in participating, please reply to this email. We will then follow up with obtaining your consent and scheduling a study session.

If you have any questions, feel free to contact me at any time.

B.2 Setup and Introduction Session

If interested participants were eligible, we scheduled an introduction session in which we explained the purpose of the study, installed the monitoring tool and asked for their consent using the consent form shown in Figure B.1. We followed this script for the session:

Welcome to the study. The study is about detecting conceptual groups and grouping applications and used resources that belong to one conceptual group. We consider conceptual groups to be groups of applications and resources, such as files or websites. So conceptual groups contain all applications and resources necessary to achieve a certain goal or to complete tasks that have a shared goal. These goals could for example be developing a software system or planning a vacation and could be composed of other sub-tasks.

For the study we provide you with a MacBook Air which runs a program in the background that collects the following data:
• Active applications: name, title, window position
• Used resources: file paths, URLs
• Key combinations: cmd + tab, switching window
• Idle times
• Generated groups
• Data entered into popups

During the study, you will work on 3 different tasks which will be described in detail in the provided files:
1. Task 1 is about writing a simplified blockchain implementation in a language of your choosing. We provide a step-by-step tutorial in Python which you can follow.
2. Task 2 is about creating a UML class diagram for a raytracer written in Python. The source code can be found in the provided files.
3. Task 3 is about planning a trip to a destination of your choosing. You will fill out a provided spreadsheet template asking about sights, food and accommodation for your trip. Additionally, please create a packing list of things you would take on the trip.

You are not expected to finish the tasks in the allocated time.

A popup will ask you to switch to the next task from time to time. If you have reached task 3, then continue with task 1 again.

Approximately every 20 minutes a popup will appear asking you to review your conceptual groups. You can open the conceptual group view by clicking on the notification. The overview allows you to create new groups, delete groups, add and delete applications and resources from groups, and lets you drag and drop applications between groups.

[Show screenshot of tool]

The study will end after 90 minutes and will be compensated with a $20 Amazon gift card.

All recorded information is stored in a database that you can navigate to using our tool. You can click on the program icon in the status bar, then click on "Preferences" and then click on "Open data folder", which will open the directory containing the database. The database file can be edited and viewed using common database browser applications like "DB Browser for SQLite". You can remove or modify data entries that you do not want to share at any time.

Do you have any questions?

If you have any concerns or questions during the study, then please let me know.

Consent Form
Detection of Conceptual Groups

The University of British Columbia
Department of Computer Science
201-2366 Main Mall
Vancouver BC Canada V6T 1Z4

Principal Investigator
Reid Holmes, Professor, Department of Computer Science, UBC (rtholmes@cs.ubc.ca)

Co-Investigators
Anna Scholtz, Graduate Student, Department of Computer Science, UBC (ascholtz@cs.ubc.ca)
Nicholas Bradley, Graduate Student, Department of Computer Science, UBC (ncbrad@cs.ubc.ca)
Thomas Fritz, Professor, Department of Informatics, University of Zurich, Affiliate Professor, Department of Computer Science, UBC (fritz@cs.ubc.ca, +41 44635 6732)

Purpose
The overall purpose of this research is to support software development professionals by providing valuable insights into their work patterns and developing tools to support recurring workflows. To accomplish this objective, we are investigating methods that automatically detect recurring workflows and conceptual groups in user activity data which is collected while software developers are working at their computer.

Study Procedure
This study is a lab study, performed over a period of approximately 2 hours. The study is composed of three parts:
(a) a 5-minute long introduction session in which we will explain the study to you, ask for your consent and install a tool which will show groupings of applications and resources that are conceptually related,
(b) a period of up to 90 minutes in which you are asked to work on three different provided tasks. The tasks are composed of a software implementation, a software planning and a trip planning task. A popup is prompted approximately every 20 minutes asking you to provide self-reports about your conceptual groups. You will also be asked to change to different tasks via a popup.
(c) a 15-minute follow-up survey, in which we will ask you about demographics and more generally about your conceptual groups.

Each self-report will take less than 2 minutes to answer.

If you agree to participate, we will either install the tool on your computer or provide you with a MacBook Air that has the tool already installed. Note that you can only use your own device if it is running macOS. The installed tool will collect the following data:
- Name, title and window position of active and background applications
- Name, path and titles of used files and visited URLs
- Groups the installed tool generates
- Tool settings
- Idle times and task switches
- Self-reports

At the end of the session we will show you how recorded data that you do not want to share can be removed or censored.

Known Risks
One risk is that potentially private or confidential data will be recorded. We are mitigating this risk by showing the location of where collected data is stored. You are allowed to remove or censor data entries that you do not want to share at any time. Additionally, the monitoring program does not capture any keystrokes, which prevents recording of passwords or other private information.

Also, you can terminate the study at any point in time without providing any reason.

Potential Benefits
We do not expect there to be any potential benefits from participating in this study. However, you may find it interesting to reflect on your activity patterns while working on your computer. The tool might have some direct benefit since it assists participants in providing information, such as related applications, documents or websites relevant for the current workflow.

Compensation
We will compensate you with up to CDN $20 in the form of a gift certificate for your participation. Note that you can withdraw from the study at any time without a reason and will receive a prorated reimbursement.

Data, Storage & Confidentiality
All data (survey responses and monitoring data) will be saved on password-protected and encrypted devices of the researchers directly involved or in locked university filing cabinets in rooms of the University of British Columbia. All data will be anonymised before and will not contain any identifying information.

You will be identified by numbers or pseudonyms in any internal or academic research publication or presentation. If we choose to use some of your comments, they will be attributed to a participant number or a pseudonym. At no point in time will your employer have access to the identifying information. The data will be stored for five years, after which it will be permanently deleted.

Use of the Data
Data collected will be used for analysis and may also be used for class project presentations and other research presentations. This project forms the basis of a thesis research project and may be submitted as a research publication in the future.

Contact for information about the study
If you have any questions about or desire further information with respect to the study, you may contact Anna Scholtz (ascholtz@cs.ubc.ca), Dr. Reid Holmes (rtholmes@cs.ubc.ca), or Dr. Thomas Fritz (fritz@cs.ubc.ca).

Who can you contact if you have complaints or concerns about the study?
If you have any concerns or complaints about your rights as a research participant and/or your experiences while participating in this study, contact the Research Participant Complaint Line in the UBC Office of Research Ethics at 604-822-8598 or if long distance e-mail RSIL@ors.ubc.ca or call toll free 1-877-822-8598.

Consent
Your participation in this study is entirely voluntary. You are free to withdraw your participation at any point during the study, without needing to provide any reason. You can disable the monitoring software yourself at any time during the study. Any information you contribute up to your withdrawal will be retained and used in this study, unless you request otherwise.

With your signature on this form, you confirm the following statements:
• I am at least 18 years old.
• I had enough time to make the decision to participate and I agree to the participation.

Name of Participant

________________________________________________________________________
Signature of Participant
Location and Date

Figure B.1: Ground truth study consent form.

B.3 Study Execution

We provided participants with 3 folders that contained descriptions of the tasks they were asked to work on and additional files. The ordering of the tasks was random for each participant.

B.3.1 Task 1 - Let's Go Travel

The description for the travel planning task is depicted in Figure B.2.

Let's Go Travel

This task is about planning a trip to a destination of your choosing. Please do a little research about activities, food, flights as well as accommodations for your trip and fill out the travel-planner spreadsheet. Try to stay within a budget of $4000.

Please create a packing list for the things you want to take on the trip.

Figure B.2: Travel planning task description.

We provided a spreadsheet template, shown in Figure B.3, in which participants could enter information.

Figure B.3: Spreadsheet template for the travel planning task.

B.3.2 Task 2 - Step by Step Blockchain

The description and step-by-step tutorial for the task asking for a simplified blockchain implementation is shown in Figure B.4.

Step by Step Blockchain

This task is about writing a simplified blockchain implementation. The following steps will guide you through the process. You can choose any programming language for your own implementation. The implementation will be slightly different from the reference implementation and the tutorial. After each step you will be asked to use the version control system git to record your changes.

A blockchain is an immutable, sequential chain of records called blocks. They can contain transactions, files or any kind of data. Our implementation will be based on transactions. The blocks are chained together using hashes.
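How chaining with hashes works can be illustrated with a small sketch. It is not part of the task description or the reference implementation; the helper names `block_hash` and `links_are_valid` and the block fields are assumptions chosen for illustration. Each block stores the hash of its predecessor, so changing an earlier block breaks the link recorded in the next one.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """Return the SHA-256 hex digest of a block's JSON form."""
    # sort_keys makes the serialization, and therefore the hash, deterministic
    encoded = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def links_are_valid(chain: list) -> bool:
    """Check that every block stores the hash of its predecessor."""
    return all(
        current["previous_hash"] == block_hash(previous)
        for previous, current in zip(chain, chain[1:])
    )


# Hypothetical two-block chain; the field names are illustrative only
first = {"index": 1, "data": "first record", "previous_hash": None}
second = {"index": 2, "data": "second record", "previous_hash": block_hash(first)}
chain = [first, second]

print(links_are_valid(chain))  # True

# Altering an earlier block invalidates the link stored in its successor
first["data"] = "tampered record"
print(links_are_valid(chain))  # False
```

The check only compares stored hashes with recomputed ones; it says nothing about proofs or consensus, which the later steps cover.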
Blockchains can have a wide variety of applications, such as cryptocurrencies, smart contracts or other financial services.

In the following we'll first create a basic implementation of a blockchain and then add a small API that can be used to interact with the implemented blockchain.

Step 0 - Setup

Please create a new git repository in the task folder /task-1. You can use any git client of your choosing. The following instructions will be based on using git in the command line:

Open the command line, switch to the current working directory and run git init.

Create a new project folder inside /task-1 for your blockchain implementation and set up your project in your preferred programming environment.

Once everything is set up, create your initial commit:

You can first review changes: git status

It often makes sense to exclude certain files (e.g. binaries) that should not be tracked. For this, create a new file .gitignore and add the paths of the files to be excluded.

To add and commit changed files, run git add and git commit and provide a descriptive commit message.

Step 1 - Blocks

Blockchains consist of blocks that look like this:

    block = {
        'index': 1,  # Index of this block
        'timestamp': 1506057125.900785,  # Creation time
        # It might make sense to create a separate data structure for transactions
        'transactions': [  # List of transactions
            {
                'sender': "8527147fe1f5426f9dd545de4b27ee00",
                'recipient': "a77f5cdfa2934df3954a5c7c7da5df1f65",
                'amount': 5,
            }
        ],
        'proof': 324984774000,  # Proof (more on that later)
        'previous_hash': "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"  # Hash of the previous block
    }

Let's first create a data structure to represent blocks.
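One possible shape for such a data structure in Python is sketched below. It is an illustration only and not the reference implementation: the class names `Transaction` and `Block` and the placeholder field values are assumptions. Frozen dataclasses reject any assignment after construction, which gives blocks that cannot be changed once created.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Tuple


@dataclass(frozen=True)
class Transaction:
    sender: str
    recipient: str
    amount: int


@dataclass(frozen=True)
class Block:
    index: int
    timestamp: float
    # A tuple instead of a list, so transactions cannot be appended later
    transactions: Tuple[Transaction, ...]
    proof: int
    previous_hash: str


block = Block(
    index=1,
    timestamp=1506057125.900785,
    transactions=(Transaction("sender-address", "recipient-address", 5),),
    proof=324984774000,
    previous_hash="hash-of-previous-block",
)

try:
    block.proof = 0  # assignment to a field of a frozen dataclass is rejected
except FrozenInstanceError:
    print("blocks cannot be modified after creation")
```

A plain dictionary, as used in the tutorial code, works equally well for the task; the dataclass variant merely enforces immutability instead of relying on convention.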
Note that the blocks themselves will be immutable, which means that they cannot be changed; otherwise the previous_hash of subsequent blocks would be incorrect.

For this you might also need to implement a data structure representing transactions.

Commit your changes.

Step 2 - Blockchain

Next, create a blockchain data structure that will manage all the blocks. An exemplary blueprint for this class could look like:

    class Blockchain(object):
        def __init__(self):
            self.chain = []
            self.current_transactions = []

        def new_block(self):
            # Creates a new Block and adds it to the chain
            pass

        def new_transaction(self):
            # Adds a new transaction to the transactions list
            pass

        @staticmethod
        def hash(block):
            # Hashes a Block
            pass

        @property
        def last_block(self):
            # Returns the last Block in the chain
            pass

Commit your changes.

Step 3 - Blockchain Initialization and Creating Blocks

When the blockchain is first instantiated, a genesis block - a block with no predecessors - needs to be created.
This block needs to have a proof, which is the result of mining and will be described a little later.

Before creating the genesis block, the blockchain could use a general method for creating a new block and a method to calculate the SHA-256 hash of a block:

    import hashlib
    import json
    from time import time


    class Blockchain(object):
        def new_block(self, proof, previous_hash=None):
            """
            Create a new Block in the Blockchain
            :param proof: <int> The proof given by the Proof of Work algorithm
            :param previous_hash: (Optional) <str> Hash of previous Block
            :return: <dict> New Block
            """

            # Create a new instance of block data structure
            block = {
                'index': len(self.chain) + 1,
                'timestamp': time(),
                'transactions': self.current_transactions,
                'proof': proof,
                'previous_hash': previous_hash or self.hash(self.chain[-1]),
            }

            # Reset the current list of transactions
            self.current_transactions = []

            self.chain.append(block)
            return block

        @staticmethod
        def hash(block):
            """
            Creates a SHA-256 hash of a Block
            :param block: <dict> Block
            :return: <str>
            """

            # We must make sure that the Dictionary is Ordered, or we'll have inconsistent hashes
            block_string = json.dumps(block, sort_keys=True).encode()
            return hashlib.sha256(block_string).hexdigest()

        # [...]

The genesis block can now be created as follows when instantiating a new Blockchain:

    class Blockchain(object):
        def __init__(self):
            self.current_transactions = []
            self.chain = []

            # Create the genesis block
            self.new_block(previous_hash=1, proof=100)

        # [...]

Commit your changes.

Step 4 - Transactions

Next, blockchains need a way to add transactions to blocks:

    class Blockchain(object):
        # [...]

        def new_transaction(self, sender, recipient, amount):
            """
            Creates a new transaction to go into the next mined Block
            :param sender: <str> Address of the Sender
            :param recipient: <str> Address of the Recipient
            :param amount: <int> Amount
            :return: <int> The index of the Block that will hold this transaction
            """

            self.current_transactions.append({
                'sender': sender,
                'recipient': recipient,
                'amount': amount,
            })

            return self.last_block['index'] + 1

        # Helper to return the last mined block of the chain
        @property
        def last_block(self):
            return self.chain[-1]

Note that the function new_transaction returns the index of the next block to be mined that will contain the added transaction.

Commit your changes.

Next we will dive deeper into how blocks are created, forged and mined.

Step 5 - Proof of Work

The proof of work is how new blocks are mined. The proof of work is a number that should be computationally difficult to find but easy to verify. The rule we will implement in the following is: Find a number p that, when hashed with the previous block's solution, produces a hash with 4 leading 0s.

    class Blockchain(object):
        # [...]

        def proof_of_work(self, last_proof):
            """
            Simple Proof of Work Algorithm:
            - Find a number p' such that hash(pp') contains leading 4 zeroes, where p is the previous p'
            - p is the previous proof, and p' is the new proof
            :param last_proof: <int>
            :return: <int>
            """

            # At this point we brute-force until we get a valid solution
            p = 0
            while self.valid_proof(last_proof, p) is False:
                p += 1

            return p

        @staticmethod
        def valid_proof(last_proof, proof):
            """
            Validates the Proof: Does hash(last_proof, proof) contain 4 leading zeroes?
            :param last_proof: <int> Previous Proof
            :param proof: <int> Current Proof
            :return: <bool> True if correct, False if not.
            """

            guess = f'{last_proof}{proof}'.encode()
            guess_hash = hashlib.sha256(guess).hexdigest()
            return guess_hash[:4] == "0000"

Miners get a small reward (e.g. 1 coin) that will be added as a new transaction into the chain. More about this will be explained later.

Commit your changes.

Step 6 - Blockchain API

Next we'll create a small API to use the previously created blockchain over the web using HTTP requests.

You can use any HTTP framework you'd like for this part; the following example code uses Flask.

    from uuid import uuid4

    from flask import Flask, jsonify


    class Blockchain(object):
        # [...]


    # Instantiate our Node
    app = Flask(__name__)

    # Generate a globally unique address for this node
    node_identifier = str(uuid4()).replace('-', '')

    # Instantiate the Blockchain
    blockchain = Blockchain()


    # The API endpoints
    @app.route('/mine', methods=['GET'])
    def mine():
        return "We'll mine a new Block"


    @app.route('/transactions/new', methods=['POST'])
    def new_transaction():
        return "We'll add a new transaction"


    @app.route('/chain', methods=['GET'])
    def full_chain():
        response = {
            'chain': blockchain.chain,
            'length': len(blockchain.chain),
        }
        return jsonify(response), 200


    if __name__ == '__main__':
        app.run(host='0.0.0.0', port=5000)

This creates the following endpoints:
• GET /mine: mines a new block
• POST /transactions/new: creates a new transaction
• GET /chain: returns the blockchain as JSON

Here, we run the server on port 5000.
Feel free to use any server and port configuration you prefer.

Commit your changes.

Step 7 - Transaction Endpoint

We already have the logic implemented for creating a new transaction. So we just need to connect it to our new /transactions/new endpoint:

    from flask import request

    # [...]

    @app.route('/transactions/new', methods=['POST'])
    def new_transaction():
        values = request.get_json()

        # Check that the required fields are in the POST'ed data
        required = ['sender', 'recipient', 'amount']
        if not all(k in values for k in required):
            return 'Missing values', 400

        # Create a new Transaction
        index = blockchain.new_transaction(values['sender'], values['recipient'], values['amount'])

        response = {'message': f'Transaction will be added to Block {index}'}
        return jsonify(response), 201

    # [...]

Commit your changes.

Step 8 - Mining Endpoint

The mining endpoint does the following 3 things:
1. Calculate the proof of work
2. Add 1 coin as reward for the miner as a transaction to the chain
3. Forge the new block by adding it to the chain

    # [...]

    @app.route('/mine', methods=['GET'])
    def mine():
        # We run the proof of work algorithm to get the next proof...
        last_block = blockchain.last_block
        last_proof = last_block['proof']
        proof = blockchain.proof_of_work(last_proof)

        # We must receive a reward for finding the proof.
        # The sender is "0" to signify that this node has mined a new coin.
        blockchain.new_transaction(
            sender="0",
            recipient=node_identifier,
            amount=1,
        )

        # Forge the new Block by adding it to the chain
        previous_hash = blockchain.hash(last_block)
        block = blockchain.new_block(proof, previous_hash)

        response = {
            'message': "New Block Forged",
            'index': block['index'],
            'transactions': block['transactions'],
            'proof': block['proof'],
            'previous_hash': block['previous_hash'],
        }
        return jsonify(response), 200

    # [...]

Commit your changes.

Step 9 - Interacting with the Blockchain

Now we can manually test and interact with the created blockchain. First start the server that is executing the blockchain API:

    $ python blockchain.py
    * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)

Next we can run some GET and POST requests. You can run these requests either with curl on the command line or with a GUI tool such as Postman.

We can execute the following requests:
• GET http://localhost:5000/mine
• POST http://localhost:5000/transactions/new with a body containing our transaction structure:

    $ curl -X POST -H "Content-Type: application/json" -d '{
        "sender": "d4ee26eee15148ee92c6cd394edd974e",
        "recipient": "someone-other-address",
        "amount": 5
    }' "http://localhost:5000/transactions/new"

• mine a few blocks to get a more impressive chain
• GET http://localhost:5000/chain

Step 10 - File-sharing Blockchain

Any data can be used in a blockchain and transformed into transaction metadata. In the case of file sharing, blockchains would remove the need for a central store of the files. In this step you are asked to extend your blockchain implementation so that transactions can be used for file sharing and can contain files.

For now all the participants in the chain would receive a copy of that metadata and therefore the files.
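As an illustration of what a file-carrying transaction could look like, the following is a minimal sketch under stated assumptions; it is not part of the task description or the reference implementation. The function names `new_file_transaction` and `extract_file` and the dictionary layout are hypothetical. The file is embedded as Base64 text so that the transaction remains JSON-serializable, and a SHA-256 digest lets receivers verify the content.

```python
import base64
import hashlib


def new_file_transaction(sender: str, recipient: str, filename: str, content: bytes) -> dict:
    """Build a transaction dict that embeds a file (hypothetical layout)."""
    return {
        "sender": sender,
        "recipient": recipient,
        "filename": filename,
        # Base64 keeps the transaction JSON-serializable, so blocks can still be hashed
        "content": base64.b64encode(content).decode("ascii"),
        # The digest lets any receiver check that the file arrived unmodified
        "sha256": hashlib.sha256(content).hexdigest(),
    }


def extract_file(transaction: dict) -> bytes:
    """Decode the embedded file and verify it against the stored digest."""
    content = base64.b64decode(transaction["content"])
    if hashlib.sha256(content).hexdigest() != transaction["sha256"]:
        raise ValueError("file content does not match its digest")
    return content


tx = new_file_transaction("alice", "bob", "notes.txt", b"hello blockchain")
print(extract_file(tx))  # b'hello blockchain'
```

Embedding whole files makes every block larger; storing only the digest in the transaction and distributing the file separately is a common alternative, at the cost of needing a second channel.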
You	can	look	into	approaches	to	makethis	more	secure	and	ensure	that	only	certain	participants	can	accessthe	files.This	step	is	not	part	of	the	reference	implementation.Please	commit	your	changes.Next	StepsWe	now	have	a	basic	blockchain	implementation.	Right	now	ourimplementation	is	not	decentralized.	If	you	still	have	time,	you	canimplement	the	Consensus	Algorithm	by	following	the	guide	starting	atStep	4	here.This	tutorial	is	based	on	https://hackernoon.com/learn-blockchains-by-building-one-117428612f46Figure B.4: Blockchain implementation task description.79B.3.3 Task 3 - Raytracer DocumentationFigure B.3.3 shows the task description for the raytracer documentation task. We also downloadedthe code files for the raytracer from https://github.com/martinchristen/pyRT and stored them in thedirectory we provided to participants.80Raytracer	DocumentationThis	task	is	about	creating	a	UML	class	diagram	for	a	raytracer	writtenin	Python.	Raytracers	are	used	in	computer	graphics	to	render	a	sceneand	generate	images.	Raytracers	generate	the	images	by	tracing	thepath	of	light	as	pixels	on	the	image	plane.	This	technique	creates	veryreal	looking	images	and	is	used	in	the	production	of	animation	movies.For	this	task	you	do	not	have	to	understand	the	math	behind	raytracers(but	you	are	welcome	to	try	following	it	while	creating	the	UML	classdiagram).You	can	find	the	source	code	for	the	raytracer	in	 /pyRT/pyrt .	You	canalso	look	at	some	rendered	examples	in	 /pyRT/examples .UML	class	diagrams	are	often	used	in	software	documentations.	Youcan	use	any	tool	you’d	like	for	creating	the	diagram.	Some	tools	thatwork	well	are:yEdthe	free	online	editor	draw.ioany	graphics	editor	such	as	Inkscape	or	Open	Office	DrawYou	can	find	the	UML	class	diagram	speficiation	online.	
A	short	cheatsheet	can	be	found	here:	https://docs.microsoft.com/en-gb/visualstudio/modeling/uml-class-diagrams-reference?view=vs-2015Please	create	a	simplified	UML	class	diagram	which	contains	all81relevant	classes	and	relevant	methods	of	these	classes.	Please	exportthe	class	diagram	to	PDF,	PNG	or	JPEG	and	save	it	in	the	 /task-2directory.The	source	code	is	from	the	repositoryhttps://github.com/martinchristen/pyRT.Figure B.5: Raytracer documentation task description.82B.4 Final InterviewDuring the final interview, we asked participants the following questions:1. How do you usually switch between applications and different windows?2. Do you currently group applications and resources in some way?3. What is your definition of conceptual group?4. Do you experience any problems when grouping them?5. Imagine the groups could be detected with perfect accuracy, do you think theywould help you in your workflow? How and when would they be particularlyhelpful?6. Would you prefer a certain way of displaying the groups?7. After showing prototype: Would you prefer a different way of displaying thegroups? Would you use a similar or improved version of a tool for this? Helpfulwhen switching between tasks or applications of one task?All interviews were recorded and later transcribed.83
