UBC Theses and Dissertations

Segmentifier: interactively refining clickstream data into actionable segments. Dextras-Romagnino, Kimberly, 2018.


Segmentifier: Interactively Refining Clickstream Data into Actionable Segments
by
Kimberly Dextras-Romagnino
B. Sc., Concordia University, 2015
A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
Master of Science
in
THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Computer Science)
The University of British Columbia
(Vancouver)
April 2018
© Kimberly Dextras-Romagnino, 2018

Abstract

Clickstream data has the potential to provide actionable insights into e-commerce consumer behavior, but previous techniques fall short of handling the scale and complexity of real-world datasets. We present Segmentifier, a novel visual analytics interface that supports an iterative process of refining collections of action sequences into meaningful segments that are suitable for downstream analysis with techniques that require relatively small and clean input. We also present task and data abstractions for the application domain of clickstream data analysis, leading to a framework that abstracts the segment analysis process in terms of six functions: view, refine, record, export, abandon, and conclude. The Segmentifier interface is simple to use for analysts operating under tight time constraints. It supports filtering and partitioning through visual queries for both quantitative attributes and custom sequences of events, which are aggregated according to a three-level hierarchy. It features a rich set of views that show the underlying raw sequence details and the derived data of segment attributes, and a detailed glyph-based visual history of the automatically recorded analysis process showing the provenance of each segment in terms of an analysis path of attribute constraints.
We demonstrate the effectiveness of our approach through a usage scenario with real-world data and a case study documenting the insights gained by an e-commerce analyst.

Lay Summary

Companies engaged in e-commerce are recording every action consumers take on websites, resulting in rich clickstream datasets that have the potential to provide insight into consumer behavior. However, analyzing this type of data is difficult and remains an open research problem due to the size and noisiness of real-world clickstream datasets. To address this problem, we designed and implemented Segmentifier, a novel interface that embodies an initial analysis step: it gives analysts an overview of the data and allows them to iteratively refine and separate the data into segments that either provide insights or allow more effective use of other techniques. We also introduce a framework that encapsulates the design of our system and explains this segment analysis process in terms of six functions: view, refine, record, export, abandon, and conclude. We evaluate our system through a usage scenario with real-world data and a case study with an e-commerce analyst.

Preface

This thesis is based on material contained in the following paper:

• Kimberly Dextras-Romagnino and Tamara Munzner. Segmentifier: Interactively Refining Clickstream Data into Actionable Segments. Submitted for publication.

I was the lead researcher responsible for gathering requirements from our domain experts, designing, implementing, and evaluating the Segmentifier interface, and writing the paper that this thesis is based on. Dr. Tamara Munzner contributed substantially to the design process of the interface and helped to frame, write, and edit parts of the paper.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Figures
Acknowledgments
1 Introduction
2 Clickstream Data and Tasks
2.1 E-commerce Clickstream Analysis Goals
2.2 Clickstream Task Abstraction
2.3 Clickstream Segment Analysis Framework
2.4 Clickstream Data Abstraction
3 Related Work
4 Segmentifier Interface
4.1 Segment Inspector View
4.1.1 Ranges Attributes
4.1.2 Actions
4.1.3 Sequence Details
4.2 Operation Manager View
4.2.1 Operation Builder
4.2.2 Operation Glyph Design
4.2.3 Operation Inspector
4.3 Analysis Paths View
5 Implementation and Pre-processing
6 Results
6.1 Usage Scenario
6.2 Case Study
6.2.1 Analysis #1
6.2.2 Analysis #2
6.3 Domain Expert Feedback
7 Discussion and Future Work
8 Conclusion
Bibliography
A Supporting Materials

List of Figures

Figure 1.1 The Segmentifier interface.
Figure 2.1 Clickstream Segment Analysis Framework.
Figure 2.2 Action Transition Network showing all actions at each level of hierarchy: (a) Detailed, (b) Mid-level, (c) Roll-up.
Figure 4.1 Action Transition Network showing all actions at each level of hierarchy: (a) Detailed, (b) Mid-level, (c) Roll-up.
Figure 4.2 Sequence Details view.
Figure 4.3 Ranges Operation Builders with corresponding operation glyphs.
Figure 4.4 Actions Operation Builders with corresponding operation glyphs.
Figure 6.1 Usage scenario workflow.
Figure 6.2 Analysis Paths View representing four analyses done in the case study with the domain expert.
Figure A.1 Case Study Analysis #1 (CS-A1): The initial state of the interface.
Figure A.2 CS-A1: Analyst partitions segment containing the full purchasing funnel to investigate the number of checkout pages viewed.
Figure A.3 CS-A1: Analyst investigates the number of sequences that contain more checkout pages than expected.
Figure A.4 CS-A1: Analyst investigates the number of sequences that include a purchase action but not all five actions of the purchasing funnel.
Figure A.5 CS-A1: Analyst partitions segments to determine the percentage of the purchasing funnel completed in one session.
Figure A.6 CS-A1: Analyst investigates a hypothesis that behavior changes depending on the time of day and creates two segments for sequences that start between 7-9 am and those that start between 7-9 pm.
Figure A.7 CS-A1: Analyst determines there is no significant difference between sequences that begin in the morning versus at night based on the percentage of sequences that contain the full purchasing funnel.
Figure A.8 CS-A1: Analyst determines there is no significant difference between sequences that begin in the morning versus at night based on the length of the sequences.
Figure A.9 CS-A1: Analyst investigates sequences that contain an ADDTOCART action and does not discover anything worth exploring.
Figure A.10 CS-A1: Analyst investigates sequences that contain a REMOVEFROMCART action and notices that 18% of the time this action results in the end of a session.
Figure A.11 CS-A1: Analyst investigates the purchasing funnel.
Figure A.12 CS-A1: Analyst investigates sequences that got to the checkout part of the purchasing funnel but did not purchase.
Figure A.13 Case Study Analysis #2 (CS-A2): The initial state of the interface.
Figure A.14 CS-A2: The analyst switches to the Detailed Action Level to get further details about pages viewed.
Figure A.15 CS-A2: The analyst discovers that 84% of the time an APPSTART action is generated before entering the cart.
Figure A.16 CS-A2: The analyst investigates the impact on purchasing of a new awards account whose pages are stored in the ELITEREWARDS pageview action.
Figure A.17 CS-A2: The analyst investigates at which point of the purchasing funnel the users accessed the awards account.
Figure A.18 CS-A2: The analyst creates a new segment with sequences containing the ELITEREWARDS action.
Figure A.19 CS-A2: The analyst investigates further by filtering sequences that contain a PURCHASE action and those that do not.
Figure A.20 CS-A2: The analyst switches to the Mid-level of the action hierarchy.
Figure A.21 CS-A2: The analyst investigates further to determine at what point of the purchasing funnel users access their account.

Acknowledgments

First and foremost, I would like to thank my supervisor Tamara Munzner, without whom I would have been lost. Her dedication to her students and her work, and her overall awesome personality, are what made these past few years both incredibly rewarding and enjoyable.

I would like to thank everyone else who has contributed to the work in this thesis. Thanks to Mobify for collaborating with me and partially funding this work. More specifically, thanks to Peter McLachlan, Boris Lau, Amanda Naso, Ben Cole, and Tim Schultz, who were always available to chat and help me in any way they could. Thanks to my second reader Rachel Pottinger for taking the time to read my thesis and provide very helpful comments.

I would like to thank all my friends who are the reason I successfully made it through this program. Most importantly, thanks to my housemates India Stephenson, Hudson Mclellan, and Taz for being my second family. I could not have asked for better roommates and friends! Thanks to the members of the Infovis group, Michael Oppermann, Zipeng Liu, Anamaria Crisan, and Madison Elliott, for their continuous feedback and for keeping the lab an exciting place to be.
Thanks to Dilan Ustek, Meghana Venkatswamy, Vaden Masrani, Louie Dinh, Jacob Chen, Robbie Rolin, Yasha Pushak, Halldor Thorhallsson, Giovanni Viviani, Neil Newman, Antoine Ponsard, Clement Fung, and all my other graduate school friends for making the last few years full of great times and adventures. Thanks to all my oldest friends across Canada, whose friendships I will treasure forever.

Finally, I would like to thank my parents, my brother, and the rest of my family for all their support throughout my Master's. I would not have made it without you!

Chapter 1
Introduction

Companies engaged in e-commerce are recording every action consumers take on websites, resulting in rich clickstream datasets with the potential to provide insight into consumer behavior that could be used to improve user experience and increase corporate revenue. Extracting meaningful signals from these huge and noisy datasets is a difficult analysis problem. Analysis of this data includes answering a variety of questions, ranging from discovering the page users most commonly exit on to classifying different types of buying behaviors. The people tasked with understanding clickstream data, referred to as clickstream analysts, generally sit on the data analysis team of an e-commerce company. They range from computer scientists to people with no technical background. Moreover, they often have many demands on their time and minimal training in data analysis.

In practice, clickstream analysts frequently turn to third-party platforms such as Google Analytics [2] or Adobe Analytics [1], which focus on making aggregated high-level metrics easy to see quickly but do not support fully drilling down into the details of consumer behavior.
While these platforms do provide an overall picture of website performance, many potentially actionable insights remain undiscovered.

A substantial amount of research has been devoted to developing techniques for analyzing clickstream event data, typically supporting very specific tasks such as extracting common behavior through pattern mining or identifying similarly behaving groups of users through clustering. While these techniques often yield excellent results with clean and relatively small datasets, they fail when applied to the noisy and large datasets that occur in real-world practice. We propose an initial visual data analysis stage to convert raw datasets into data segments that are iteratively refined until they are sufficiently clean and small to directly show answers, or to match the assumptions of previously proposed techniques that would be used for further downstream analysis. Our flexible human-in-the-loop visual analytics approach leverages the domain expertise of clickstream analysts and is suitable for the broad set of questions that they must answer in order to obtain actionable results. It bridges a gap between the characteristics of real-world clickstream data and the assumptions of previous technique-oriented work.

One contribution of our work is a thorough characterization of task and data abstractions for clickstream data analysis, culminating in a Clickstream Segment Analysis Framework that abstracts the iterative and exploratory analysis process in terms of six core functions: an analyst can view, refine, record, export, abandon, and conclude data segments. Our main contribution is Segmentifier, a novel visual analytics interface that supports clickstream analysts in the iterative process of refining data segments and viewing their characteristics before downstream fine-grained analysis is conducted. The simple-to-use interface includes a rich set of views that show the derived attributes characterizing the segments in addition to the raw sequence details.
It supports filtering and partitioning through visual queries for both quantitative attributes and custom sequences of events, which are aggregated according to a three-level hierarchy that we derived from our requirements analysis. It features a detailed glyph-based visual history of the automatically recorded refinement process, showing the provenance of each segment in terms of its analysis path. We show evidence for the utility of Segmentifier through a detailed usage scenario and a case study showcasing the actionable insights gained quickly by a data analyst looking at clickstream data generated by his e-commerce company.

Figure 1.1: The Segmentifier interface. The Operation Manager View (A) is responsible for the inspection and creation of operations used to refine segments. The Analysis Paths View (B) is a graphical history that records all segments (gray rectangles) and operations (rectangular glyphs) created during analysis; the selected segment is highlighted in blue with a black outline. The Segment Inspector View (C) shows the attributes and raw sequences associated with the selected segment.

Chapter 2
Clickstream Data and Tasks

Our requirements analysis process involved exploratory and follow-up interviews with 12 employees of an e-commerce middleware company over a period of 7 months. The company builds mobile websites for its clients, which are large e-commerce companies, and also performs internal clickstream data analysis of these websites to provide its clients with valuable insights. The interviewees included people who already conducted clickstream data analysis internally, people who might conduct such analysis if provided with appropriate tools, and people who met regularly with, and understood the work context of, their external clients' clickstream data analysts. We first describe e-commerce clickstream analysis in domain terms, emphasizing what questions could have actionable answers in this application domain.
We then present abstract versions of their tasks and a framework for clickstream segment analysis that we developed to reason about design considerations. Finally, we present our data abstraction, which features multiple levels of derived data.

2.1 E-commerce Clickstream Analysis Goals

We focus specifically on e-commerce clickstream data recorded from websites whose main functionality is to sell products to consumers. These websites try to maximize revenue by finding ways to increase traffic (number of users on a site), reduce abandonment (number of users leaving the site), increase consumer engagement (time users spend on the site and the chance that a user returns to the site), and increase conversion rate (the proportion of users who purchase).

Clickstream data analysis is intended to provide insight into consumer behavior by allowing analysts to achieve the following goals: 1) identify common successful consumer trends and optimize for them; 2) identify problems or painful paths and spend resources to fix or improve them; 3) identify groups of common behaviors in order to personalize the consumer experience; 4) identify site metrics or benchmarks to keep track of the state of the website. We define analysis results to be actionable if they are aligned with these goals.

Examples of commonly asked questions derived from these high-level goals are: What pages do users most commonly exit on? How many users purchase? How many bounce (exit after viewing one page)? How many users make it through the purchasing funnel? Where do they drop out? Can you classify different types of buying behaviors?

Our interviews with experts revealed that industry analysts often have only a few hours to conduct analysis to discover insights in the data, and then an even shorter time frame of several minutes to concisely report their findings to clients through very short progress reports or presentations. Moreover, many people in this job role have minimal or no skills in programming and statistics.
The design requirement for Segmentifier is to cater to these limits on skills and available time, allowing analysts to quickly find actionable answers that are simple to interpret and convey to their clients.

2.2 Clickstream Task Abstraction

In clickstream analysis, the information available about consumer behavior consists of logged sequences of actions that a user performs on a website. We translate the goals of Section 2.1 into two kinds of abstract tasks, relating these sequences to behavior via subtasks with analysis questions:

Identify Tasks (I): The goal is to discover new interesting behavior.
• I1 Identify new interesting behaviors: Is there a way to distinguish a set of sequences representing behaviors that are expected (example: users add to cart before purchasing), unexpected (example: no purchases for a month), favorable (example: lead to a purchase), or unfavorable (example: lead to abandonment)?
• I2 Identify the cause or effect of a behavior: What behaviors trigger or follow behavior X?
• I3 Identify more fine-grained behaviors from an initial behavior: Can sequences that follow behavior X be described by a more specific behavior Y?

Verify Tasks (V): The goal is to check whether conjectured interesting behaviors actually occur.
• V1 Verify existence of a behavior: Do any sequences follow behavior X?
• V2 Verify amount of support for a behavior: How many sequences follow behavior X?
• V3 Verify if a behavior causes another behavior: Do sequences that follow behavior X also follow behavior Y?

2.3 Clickstream Segment Analysis Framework

Our task abstractions motivate the idea that clickstream datasets can provide answers for many different analysis questions if they can be partitioned into appropriate segments that encapsulate interesting behaviors. This idea follows the same premise as previous philosophies of data wrangling: real-world datasets are extremely noisy and need to be cleaned and refined before analysis can be conducted [17].
Clickstream datasets, which track every user action, can contain millions of sequences in real-world logs, a scale much larger than many other types of event sequence data [27, 36]. Moreover, they are inherently extremely noisy, with high variability between sequences, meaning that very few are identical. Sarikaya et al. point out the need for simplifying clickstream data for effective downstream analysis, noting the analysis difficulties that arise from the clutter of repeated events, ambiguous events, useless events, and irrelevant sequences that do not pertain to a current analysis task [36]. They call for an exploratory tool that supports an iterative process between pre-processing data and viewing the results, with a graphical history to record steps. Segmentifier is our answer to this call; in this section we introduce a framework for clickstream segment analysis, shown in Figure 2.1, to concisely and clearly explain our design rationale.

Figure 2.1: Clickstream Segment Analysis Framework iteratively generates segments that analysts can view, refine, export, abandon, or use to conclude answers for a flowing series of analysis questions.

In this abstract framework, many segments are generated and analyzed through an iterative exploration process that traverses six functions that pertain to a current segment: view, refine, record, export, abandon, and conclude. The exploration begins with an initial segment that contains all sequences in the dataset, and typically involves flowing between many tasks opportunistically based on what the visual data analysis reveals. Exploring one question often spurs follow-up or entirely new questions; in some cases these are based on the satisfactory answer that is found, or they may arise after giving up on finding an answer for a previous question. Each refinement step is recorded.
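The iterative process just described can be sketched as a small tree of segments. This is a minimal Python illustration under our own assumptions, not Segmentifier's implementation. The `Segment` class, its methods, `contains_in_order`, and the action names PAGEVIEW and SEARCH are all hypothetical; ADDTOCART, REMOVEFROMCART, and PURCHASE are action names that appear in the case-study figure captions. The root holds all sequences. Each filter or partition creates a child that records the operation that produced it, so the path back to the root serves as provenance. The relative counts (percent of parent, percent of root) mirror the ones the thesis derives from its analysis paths tree. The ordered-subsequence predicate shows one way a Verify task such as V2 reduces to the size of a refined segment.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence, Tuple

Seq = List[str]  # one sequence: a time-ordered list of action names


@dataclass(eq=False)  # identity equality avoids recursing through parent/children
class Segment:
    """Hypothetical analysis-paths-tree node: a set of sequences plus its provenance."""

    sequences: List[Seq]
    operation: str = "all sequences"  # recorded description of the producing refinement
    parent: Optional[Segment] = field(default=None, repr=False)
    children: List[Segment] = field(default_factory=list, repr=False)

    @property
    def size(self) -> int:
        return len(self.sequences)

    def root(self) -> Segment:
        node = self
        while node.parent is not None:
            node = node.parent
        return node

    def pct_of_parent(self) -> float:
        base = self.parent.size if self.parent is not None else self.size
        return 100.0 * self.size / base if base else 0.0

    def pct_of_root(self) -> float:
        base = self.root().size
        return 100.0 * self.size / base if base else 0.0

    def refine_filter(self, label: str, keep: Callable[[Seq], bool]) -> Segment:
        """Refine by filtering; the step is recorded automatically as a child node."""
        child = Segment([s for s in self.sequences if keep(s)], label, self)
        self.children.append(child)
        return child

    def refine_partition(self, label: str, pred: Callable[[Seq], bool]) -> Tuple[Segment, Segment]:
        """Refine by partitioning into matching and non-matching children."""
        yes = self.refine_filter(label, pred)
        no = self.refine_filter(f"NOT ({label})", lambda s: not pred(s))
        return yes, no

    def path(self) -> List[str]:
        """Provenance: the recorded operations from the root down to this segment."""
        ops: List[str] = []
        node: Optional[Segment] = self
        while node is not None:
            ops.append(node.operation)
            node = node.parent
        return ops[::-1]


def contains_in_order(pattern: Sequence[str]) -> Callable[[Seq], bool]:
    """Predicate: `pattern` occurs as an ordered, not necessarily contiguous, subsequence."""
    def check(seq: Seq) -> bool:
        it = iter(seq)  # shared iterator: each match resumes after the previous one
        return all(any(action == p for action in it) for p in pattern)
    return check


# Toy data (PAGEVIEW and SEARCH are hypothetical action names).
root = Segment([
    ["PAGEVIEW", "ADDTOCART", "PAGEVIEW", "PURCHASE"],
    ["PAGEVIEW", "PAGEVIEW"],
    ["PAGEVIEW", "ADDTOCART", "REMOVEFROMCART"],
    ["SEARCH", "PAGEVIEW", "ADDTOCART", "PURCHASE"],
])
carted = root.refine_filter("contains ADDTOCART", lambda s: "ADDTOCART" in s)
bought, not_bought = carted.refine_partition(
    "ADDTOCART then PURCHASE", contains_in_order(["ADDTOCART", "PURCHASE"]))
print(bought.size, round(bought.pct_of_parent(), 1), bought.pct_of_root())  # 2 66.7 50.0
print(bought.path())
```

In this sketch the recorded `operation` strings play the role of the operation glyphs, and abandoning a segment would simply mean not refining it further.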
Analysis may stop when the flow of questions runs dry, or when the analyst simply runs out of time.

View: Viewing a segment gives the analyst information about the low-level details of the raw sequences that it contains, and also about the high-level attributes that characterize the segment. The result of viewing might be to notice that the segment provides answers for the current task or is a good fit and should be exported, to spark ideas on how to further refine a segment, to generate new analysis questions and thus change to a new task, or to determine whether a segment should be abandoned.

Refine: Refining allows analysts to create more appropriate segments for a task by filtering, partitioning, or transforming the sequences of any segment. Every refinement step leads to a new segment that can be viewed, analyzed, and further refined. Decisions on how to refine are determined by the current analysis question or inspired by the discovery of interesting patterns when viewing the segment.

Record: We embrace the suggestion from previous work that graphical histories that help analysts remember analysis paths are essential for any exploratory analysis [15]. Recording all refinement steps automatically helps analysts keep track of the many questions and hypotheses that are generated during an exploratory analysis session, providing enough provenance information that answers can be revisited later and understood in context.

Export: A segment can be exported for further downstream analysis with techniques such as clustering and pattern mining, which require relatively clean input data to work effectively. For example, pattern mining helps discover common sequences of events but requires as input a small number of sequences with low variability to achieve useful results.
By viewing and refining segments, analysts can create segments that meet these requirements, maximizing the effectiveness of these downstream analysis techniques after the export.

Conclude: Successfully concluding an analysis path by finding an actionable answer for the current analysis question may occur as an immediate outcome of viewing a segment.

Abandon: A segment is abandoned after viewing if the analyst sees no obvious pathway for further refinement and decides that it contains no actionable results and is not suitable for export.

Each of the tasks in Section 2.2 can be answered by iteratively applying these functions until the appropriate segment is found. For Verify tasks, analysts can keep refining and viewing until they create a segment that follows the known task-related behavior. The number of sequences within the segment then indicates the support for that behavior. If the sequences within a segment match the input data requirements for some specific technique, the segment can be exported for further analysis. For Identify tasks, analysts discover behaviors by developing hypotheses through viewing the data of segments, and they test those hypotheses by refining and creating different segments.

This framework explicitly addresses the cold-start problem that bedevils many previous systems, in which an analyst must run a query before seeing any results. It is difficult to know what to query when confronted with a completely blank view; ensuring a meaningful view at all times is the cornerstone of our approach to guiding fruitful exploration.

2.4 Clickstream Data Abstraction

The difficult visual analytics design problem is how to connect the internal and implicit mental model that clickstream data analysts have about interesting consumer behaviors to viewable characteristics of sequences and segments.
We compute a series of derived data attributes intended to reveal patterns, distributions, and outliers that connect with the analysts' informal sense of which behaviors are interesting. Specifically, we try to generate evocative attributes such that a series of constraints on these attributes could serve to define a segment that contains expected, unexpected, favorable, or unfavorable behavior.

An event sequence is any multivariate sequence of timed events; clickstream data is a special case of this data type in which the individual events are categorical actions performed by a user interacting with a website. The information associated with each action, referred to as action attributes, consists of a single quantitative timestamp, a categorical action type, a categorical sessionid identifier as described below, and a categorical clientid identifier indicating the specific user. A sequence is a time-ordered list of actions that are associated with a particular id over some specific time interval. The information associated with each sequence, referred to as sequence attributes, is derived from the action attributes within it, yielding the quantitative time range attributes of sequence start, sequence end, and sequence duration. The sequence action counts are also quantitative attributes, storing the total number of actions as well as per-type counts for each of the action types.

A session is a default time frame often used in industry when logging sequences of events: actions performed by a single user are grouped together until a period of inactivity of a specified length occurs. Our industry partner sets this cutoff to 30 minutes. This one-size-fits-all sessionid cutoff leads to analysis challenges, because there may be multiple interesting subsequences within each of these per-session sequences if activities that are conceptually different for the site user are grouped together.
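The 30-minute session rule and the derived sequence attributes can be made concrete with a short sketch. It is a minimal Python illustration, not the thesis's pre-processing code. It assumes a flat list of events carrying the clientid, timestamp, and action attributes described above. The `Event` class and the function names are hypothetical, and treating an idle gap of exactly 30 minutes or more as the start of a new session is our assumption, since the boundary behavior is not specified. The sketch also concatenates a client's sessions into one per-user sequence, as the next paragraph describes.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

SESSION_GAP = timedelta(minutes=30)  # the cutoff reported for the industry partner


@dataclass
class Event:
    """One logged action (hypothetical record type)."""
    clientid: str
    timestamp: datetime
    action: str


def sessionize(events: List[Event], gap: timedelta = SESSION_GAP) -> Dict[Tuple[str, int], List[Event]]:
    """Split each client's time-ordered events into sessions keyed by (clientid, session number).

    Assumption: an idle gap of `gap` or longer starts a new session.
    """
    sessions: Dict[Tuple[str, int], List[Event]] = {}
    last_seen: Dict[str, datetime] = {}
    session_no: Dict[str, int] = {}
    for ev in sorted(events, key=lambda e: (e.clientid, e.timestamp)):
        prev = last_seen.get(ev.clientid)
        if prev is None:
            session_no[ev.clientid] = 0
        elif ev.timestamp - prev >= gap:
            session_no[ev.clientid] += 1
        last_seen[ev.clientid] = ev.timestamp
        sessions.setdefault((ev.clientid, session_no[ev.clientid]), []).append(ev)
    return sessions


def sequence_attributes(seq: List[Event]) -> dict:
    """Derived sequence attributes: time range plus total and per-type action counts."""
    start, end = seq[0].timestamp, seq[-1].timestamp
    return {
        "start": start,
        "end": end,
        "duration": end - start,
        "total_actions": len(seq),
        "action_counts": Counter(e.action for e in seq),
    }


def per_user_sequences(sessions: Dict[Tuple[str, int], List[Event]]) -> Dict[str, List[Event]]:
    """Concatenate each client's sessions in order, to follow one clientid over a longer period."""
    users: Dict[str, List[Event]] = {}
    for (clientid, _), evs in sorted(sessions.items(), key=lambda kv: kv[0]):
        users.setdefault(clientid, []).extend(evs)
    return users


t0 = datetime(2018, 1, 1, 9, 0)
events = [
    Event("u1", t0, "PAGEVIEW"),
    Event("u1", t0 + timedelta(minutes=5), "ADDTOCART"),
    Event("u1", t0 + timedelta(minutes=50), "PAGEVIEW"),  # 45-minute idle gap: new session
    Event("u1", t0 + timedelta(minutes=52), "PURCHASE"),
    Event("u2", t0, "PAGEVIEW"),
]
sessions = sessionize(events)
print(sorted(sessions))  # [('u1', 0), ('u1', 1), ('u2', 0)]
attrs = sequence_attributes(sessions[("u1", 1)])
print(attrs["duration"], attrs["action_counts"]["PURCHASE"])  # 0:02:00 1
```

Here the add-to-cart and the purchase by `u1` land in different sessions. That is the kind of split that motivates also deriving per-user sequences.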
Conversely, grouping sequences by session id will often produce multiple sequences for each site user in the clickstream datasets typically studied by the analysts, because these datasets span time scales ranging from days to months. Although the original clickstream data is partitioned by these arbitrary session boundaries, we can also derive per-user sequences in which all of a user's sessions are concatenated, in order to follow the behavior of a single clientid across longer periods of time.

The number of different raw action types recorded depends on the granularity of the tracking and ranges from tens to hundreds in the clickstream datasets studied by the analysts we interviewed. The core actions tracked include the 4 important e-commerce actions of adding to cart, removing from cart, searching, and purchasing; the 3 site functionality actions of errors, loading mode, and offline mode; and the frequent action of viewing a page. Websites can have hundreds of unique pages, but very few analysis tasks require distinguishing between every single page on a site. Sites are often built using a single template for an entire group of similar pages, so that the builders are only concerned with tens of templates rather than hundreds of individual pages. However, our inspection of template structures across a variety of sites showed that they are not necessarily consistent, which hampers their utility for analysis. Based on the recurring common elements in the site-specific templates, we derive a three-level categorical attribute, the action hierarchy, that classifies actions into groups providing appropriate levels of detail for the clickstream analysis tasks that we target. The actions in each level of the hierarchy are shown in Figure 2.2 through the Action Transition Network chart discussed in Section 4.1. All three levels include the full set of 4 e-commerce actions and 3 site actions.
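A hierarchy of this kind can be applied as a per-action lookup, rewriting each raw sequence into a derived sequence at the chosen level. The Python sketch below is illustrative only. The page-group names and their mid-level categories are hypothetical stand-ins, except ELITEREWARDS, which appears in the case-study figure captions. The sketch also uses far fewer groups than the 17 detailed page groupings and 6 mid-level categories of the real hierarchy. Non-page actions such as PURCHASE pass through unchanged at every level. Counting distinct sequences per level shows how aggregation lowers variability.

```python
from typing import Dict, List, Tuple

LEVELS = ("detailed", "mid", "rollup")

# Hypothetical detailed page groups mapped to hypothetical mid-level categories.
DETAILED_TO_MID: Dict[str, str] = {
    "HOME": "BROWSE",
    "PRODUCTLIST": "BROWSE",
    "PRODUCTDETAIL": "PRODUCT",
    "CARTPAGE": "CART",
    "CHECKOUTSHIPPING": "CHECKOUT",
    "CHECKOUTPAYMENT": "CHECKOUT",
    "ELITEREWARDS": "ACCOUNT",
}
ROLLUP_PAGE = "PAGEVIEW"  # at the roll-up level every page view becomes one aggregate action


def at_level(action: str, level: str) -> str:
    """Map a detailed-level action to the requested level of the hierarchy."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    if action not in DETAILED_TO_MID:  # e-commerce and site actions exist at every level
        return action
    if level == "detailed":
        return action
    return DETAILED_TO_MID[action] if level == "mid" else ROLLUP_PAGE


def transform(seq: List[str], level: str) -> Tuple[str, ...]:
    """Derived sequence of aggregate actions at one level."""
    return tuple(at_level(a, level) for a in seq)


raw = [
    ["HOME", "PRODUCTDETAIL", "ADDTOCART", "CHECKOUTSHIPPING", "PURCHASE"],
    ["PRODUCTLIST", "PRODUCTDETAIL", "ADDTOCART", "CHECKOUTPAYMENT", "PURCHASE"],
    ["HOME", "ELITEREWARDS", "ADDTOCART", "CHECKOUTSHIPPING", "PURCHASE"],
]
for level in LEVELS:
    # Fewer distinct derived sequences at coarser levels means lower variability.
    print(level, len({transform(s, level) for s in raw}))
# detailed 3, mid 2, rollup 1
```

The three toy sequences are all distinct at the detailed level. They collapse to two at the mid level and to a single shape at the roll-up level.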
The detailed actions level, shown in Figure 2.2(a), also has a set of 17 page groupings that roughly correspond to the consistent templates across sites. The mid-level actions, Figure 2.2(b), aggregate the detailed page groups into 6 higher-level categories, and the coarsest roll-up actions level, Figure 2.2(c), groups all page views together into a single aggregate action. This action hierarchy is used to transform sequences of raw actions with high variability into derived sequences of aggregate actions with much lower variability, allowing analysts to consider more coherent large-scale structure. The exact details of these page groups and names may be somewhat specific to this particular e-commerce company. The generalizable idea behind this approach is the value of a derived multi-scale hierarchy of aggregate action types for analyzing consumer behavior, which has also been adopted by some of the previous work [13, 20, 24, 27, 33, 37, 40, 45].

Figure 2.2: Action Transition Network showing all actions at each level of hierarchy: (a) Detailed, (b) Mid-level, (c) Roll-up.

We define a segment as any set of sequences. We derive many additional quantitative segment attributes, and a derived network of action transitions, from the constituent sequence attributes and the union of all the actions contained within them. The simple quantitative attribute of segment size is the count of sequences that the segment contains. More complex computations use two intermediate tables. One table has one row per sequence and is used to compute distributions across all countable ranges, showing the number of sequences that fall into bins between the minimum and maximum values. The underlying sequence time ranges are used to create four temporal distributions: duration, start hour, day of the week, and date. The distributions of per-sequence absolute action counts are computed for all actions together, plus each action type individually.
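As a rough illustration of the one-row-per-sequence table and its binned distributions, the following Python sketch builds a row from a sequence's timestamps and actions. It then counts values into bins between the minimum and maximum. The column names and function names are hypothetical. Equal-width binning is also an assumption, since the text states only that bins span the minimum to maximum values.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, List


def sequence_row(timestamps: List[datetime], actions: List[str]) -> Dict[str, object]:
    """One row of a per-sequence table (column names are hypothetical)."""
    start, end = min(timestamps), max(timestamps)
    row: Dict[str, object] = {
        "duration_min": (end - start).total_seconds() / 60,
        "start_hour": start.hour,
        "weekday": start.weekday(),  # 0 = Monday
        "date": start.date(),
        "n_actions": len(actions),
    }
    # Absolute count of each action type within this sequence.
    row.update({f"count_{a}": c for a, c in Counter(actions).items()})
    return row


def equal_width_histogram(values: List[float], n_bins: int = 5) -> List[int]:
    """Counts in n_bins equal-width bins spanning [min, max]; the max goes in the last bin."""
    lo, hi = min(values), max(values)
    if hi == lo:  # degenerate range: everything lands in the first bin
        return [len(values)] + [0] * (n_bins - 1)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts


t0 = datetime(2018, 3, 5, 19, 0)  # a Monday, 7 pm
rows = [
    sequence_row([t0, t0 + timedelta(minutes=m)], ["PAGEVIEW", "PAGEVIEW"])
    for m in (1, 2, 4, 8, 10)
]
print(equal_width_histogram([r["duration_min"] for r in rows], n_bins=3))  # [2, 1, 2]
```

The same histogram function applies unchanged to the start-hour, weekday, and per-action count columns.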
Finally, the per-sequence table contains a column for each categorical type in the action hierarchy that records how many times it appears within the sequence, in order to compute the number of sequences that contain each action, both as an absolute number and as a percentage relative to the segment size.

The other table consists of action transition trigrams: the combination of an action, the action before it, and the action after it. We count the number of times that each trigram occurs within the union of all actions within a segment's constituent sequences. The results are the action transition networks of all transitions between actions that occur in the segment. There is one network for each level of the action hierarchy, and the number of nodes at each level is constant (24 at the detailed level, 13 at the mid-level, and 7 at the roll-up level). The data-driven aspect that changes for each segment is the set of edges: the transition counts serve as quantitative edge weights, which can be zero to indicate that there is no transition between two actions in any of the sequences. Transitions are unidirectional; links in both directions are computed.

The refine and record functions of the clickstream segment analysis framework that we propose in Section 2.3 and instantiate in Segmentifier lead to an analysis paths tree where each node is a segment. The initial segment at the root of the tree contains all sequences, and the downstream segments become smaller as they are refined. Each segment node has associated with it the recorded information of exactly which operator was used to refine it, as discussed in Section 4.2; this derived data constitutes a full record of the analysis process.
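The "contains" counts and the trigram table described above can be sketched as follows. The sketch assumes one-character action codes and pads each sequence with hypothetical START and EXIT markers so that the first and last actions also have neighbors. From each trigram (before, action, after) it accumulates the incoming edge before→action and the outgoing edge action→after separately, one way of computing links in both directions. Names and markers are illustrative, not Segmentifier's code.

```python
from collections import Counter

START, EXIT = "<", ">"  # hypothetical boundary markers, not real action codes


def contains_counts(sequences, action_codes):
    """For each action: (number of sequences containing it, percentage of segment size)."""
    size = len(sequences)
    result = {}
    for code in action_codes:
        k = sum(1 for seq in sequences if code in seq)
        result[code] = (k, 100.0 * k / size)
    return result


def transition_trigrams(sequences):
    """Count (before, action, after) trigrams over all actions in the segment."""
    trigrams = Counter()
    for seq in sequences:
        padded = START + seq + EXIT
        for i in range(1, len(padded) - 1):
            trigrams[(padded[i - 1], padded[i], padded[i + 1])] += 1
    return trigrams


def incoming_edges(trigrams):
    """Edge weights before -> action, as seen from each action."""
    edges = Counter()
    for (before, action, _after), n in trigrams.items():
        edges[(before, action)] += n
    return edges


def outgoing_edges(trigrams):
    """Edge weights action -> after, as seen from each action."""
    edges = Counter()
    for (_before, action, after), n in trigrams.items():
        edges[(action, after)] += n
    return edges


if __name__ == "__main__":
    segment = ["hcA", "hc"]
    tri = transition_trigrams(segment)
    print(contains_counts(segment, "hA"))   # {'h': (2, 100.0), 'A': (1, 50.0)}
    print(outgoing_edges(tri)[("h", "c")])  # 2
    print(incoming_edges(tri)[("A", "h")])  # 0 (no such transition)
```

A `Counter` returns 0 for pairs it has never seen, which matches the idea that every node pair has an edge weight that may be zero.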
We use the analysis paths tree to derive the quantitative attributes of relative counts for each segment: the percentages within a segment compared to its direct predecessor in the tree, and compared to the root segment of all sequences.

Chapter 3
Related Work

We frame the related work with respect to the clickstream segment analysis framework presented in Section 2.3, grouping previous systems into categories according to the framework's functions. No previous system encompasses all aspects of our framework (view, refine, export, abandon, conclude, and record); Segmentifier is unique in providing an initial analysis process that can handle the size and noisiness of real-world clickstream data.

Post-Export
A significant amount of research in this field proposes specific techniques for relatively clean and small sets of event sequences; we consider these to be useful for downstream analysis only after the export stage of our framework. Major categories of such techniques include clustering [8, 14, 18, 28, 40, 43, 48], pattern mining [10, 23, 26, 27, 33], and cohort comparison [29, 49]. In all of these cases, the techniques do not handle the scale and complexity of real-world clickstream data. They are validated with input data such as manually refined datasets [43, 49] or very small datasets [27, 33], or they explicitly indicate a requirement for sampling [26]. Segmentifier is designed to maximize the effectiveness of these techniques by allowing analysts to first refine the data to meet their requirements.

View Sequences
A great deal of the previous work focuses on different ways to view sequences but does not provide mechanisms to view segment-level attributes. Most earlier work simply displayed individual sequences, either along a horizontal timeline [3, 19, 21, 35] or using footstep graphs [16].
Later work introduced ways to show many sequences at the same time, either by grouping them [7] or by using different visual encoding idioms such as networks [6, 34, 42], flow diagrams [31, 32, 44], disk trees [11], or icicle plots [22, 46]. Wang et al. [41] extended some of this work by introducing interaction techniques such as aligning sequences. However, without any ability to view segment attributes or to refine the data, discovering insights is extremely difficult.

Refine
Visual query systems [12, 13, 20, 47] emphasize the need to refine segments by allowing analysts to filter data. Although they record refinement history using something analogous to our analysis paths tree, they provide either a limited or nonexistent view of the segment attributes, and they do not provide a view of the sequences at all. Analysts thus have trouble knowing how to refine and often have to take a guess-and-check approach. Moreover, these systems focus on filtering, whereas Segmentifier also provides ways to refine through partitioning.

View and Refine
Tools that emphasize the need to both iteratively view and refine data come closer to the functionality of Segmentifier. In addition to providing a view of both the segment attributes and the sequences, they incorporate the additional functionality of filtering by ordinal or categorical segment attributes [25, 38] or filtering by custom patterns of events [37, 39]. Wongsuphasawat et al. [45] incorporate both of these filtering options, but in two different tools focusing on the very specific tasks of comparing datasets and funnel analysis.

The tools closest to Segmentifier incorporate versions of both filtering options and focus on exploring and simplifying event sequences. Session Viewer [24] has the ability to filter by segment attributes but only highlights sequences that follow custom event patterns. EventFlow [30] incorporates both filtering options but provides very minimal segment attributes.
It does feature aggregation, but at the sequence level rather than the event level as supported by our action hierarchy. Neither has the ability to record the analytic process.

EventPad [9] is the most similar work to our own. It fully incorporates both filtering options, but does suffer from the cold-start problem of starting with a blank slate. It allows analysts to manually choose to record segments they create, but does not automatically record all refinement steps. Its focus on regular expressions would be a poor match for clickstream analysts who do not have the time or training to grapple with these complex queries; Segmentifier is designed around a more restricted query scope suitable for non-programmers. These three tools have main views that focus on viewing raw sequences and provide only limited support for viewing sequence attributes. In contrast, the design of Segmentifier promotes segment attributes to first-class citizens, a crucial factor in accommodating the scale of clickstream data.

Chapter 4
Segmentifier Interface

Segmentifier embodies the clickstream data analysis pipeline by supporting an iterative process of refining segments and viewing the results while recording all analysis paths. Its design capitalizes on the domain knowledge of e-commerce experts to refine raw sequences into clean, meaningful segments tied to consumer behavior that can lead to actionable insights, either directly or after being exported for different types of downstream analysis. The interface consists of three major views, as shown in Figure 1.1: the Segment Inspector view, the Operation Manager view, and the Analysis Paths view.

4.1 Segment Inspector View

The Segment Inspector view tackles the View part of the framework by showing all attributes of the selected segment, allowing analysts to identify new interesting behaviors that they can explore further, or to verify whether the segment matches some behavior of interest.
The view is divided into three sections: Ranges, Actions, and Sequence Details.

4.1.1 Ranges Attributes

The top Ranges section, shown in Figure 1.1C, consists of linked histograms showing the sequence count distributions for all of the ordered segment attributes. By having access to these distributions, instead of just the sequences as is the case in most of the previous work, analysts can immediately spot patterns and unusual trends of different attributes in the data. The top row contains time-related attributes: duration (with the option of switching between minutes, hours, or days), start hour, day of week, and start date. The bottom row shows the distribution of counts for individual actions: it always shows the total number of actions in a sequence, and on demand shows additional charts with counts for each specific type of action. Brushing any of the charts automatically cross-filters all other range charts to show correlations between attributes. The scale of each horizontal axis is data driven to ensure that the major trend is visible despite long-tail distributions, with an aggregate leftover bar on the far right, labelled with >, representing all values beyond the point where bars become shorter than one pixel.

4.1.2 Actions

The middle Actions section, shown in Figure 1.1C, focuses on the action hierarchy discussed in Section 2.4 through three different charts: the Contains Chart, the Action Transition Network, and the Selected Action Adjacency View.
The analyst chooses which level of the action hierarchy is currently active (Detailed, Mid-level, or Roll-up), which then updates the action types shown in each of these charts accordingly.

The Contains Chart shows the distribution of action types across the selected segment's sequences as a bar chart with labels for both absolute counts and relative percentages, with actions color coded to match the other two charts.

The middle chart shows the derived Action Transition Network between the aggregated action types at the chosen level of the action hierarchy. The nodes are always at the same fixed positions at each hierarchy level; the transition count data derived from the selected segment drives which edges are drawn and at what thickness, for either the outgoing or incoming direction. The min visible links slider at the top filters out edges whose percentages fall below the specified threshold, allowing the analyst to trade off between detail and clutter. Figure 4.1 shows the visual encoding of the chart at each level of the hierarchy. The dimensions of the chart adapt according to the hierarchy level.

The spatial layout and color of the nodes emphasize the action hierarchy structure by providing consistency between views, in order to help analysts spot patterns. Our goal was to test the utility of the novel derived action hierarchy data abstraction with a visual encoding that was well tuned to reveal its structure. Since the action hierarchy nodes do not change dynamically, we created the layouts and the color scheme manually with a few rounds of iterative refinement.

The layout provides spatial stability between the hierarchy levels, preserving the relative positions of the e-commerce actions at the bottom and the site functionality actions at the top. In the pageview region in the middle, detailed-level actions are positioned close to others within the same mid-level group, and the relative positions of the mid-level groups are preserved.
The design targets high information density for maximum label legibility, with minimal edge crossings. Similarly, the color palette is designed to preserve visual consistency between levels, with identical colors for the actions that appear across all levels and similar color families for all detailed-level actions within the same mid-level action group. These colors are used consistently across many elements representing actions in all three main views. The goal is maximum distinguishability between the full set of 24 colors that are visible simultaneously at the detailed level, and some distinguishability between the similar colors within a group, where the largest mid-level group, BROWSE GROUP, has size 6. This requirement is particularly difficult to meet due to the small size of the marks: the circular nodes in these charts, the operation glyphs in the other views, and the small boxes in the Sequence Details section.

The third chart is the Selected Action Adjacency View, which supports further analysis of a single node selected from the full network in a butterfly view: the chosen action is in the middle, with all of the incoming nodes aligned on the left and the outgoing ones aligned on the right. The aligned nodes are flanked by stacked bar charts showing the proportions of their occurrence, using consistent coloring. The exact values appear in a tooltip on hover.

Some previous work includes standalone charts similar to the Selected Action Adjacency View. We observed that our industry partners were manually creating simple versions of the Action Transition Network, which gives an overview of all possible action transitions. We provide automatic support for this reasoning process in a novel view that does not appear in any previous work.
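The butterfly layout of the Selected Action Adjacency View and the min visible links slider can both be expressed as simple operations on directed transition counts. In this sketch, edges are a mapping from (source, target) pairs to counts. The threshold is assumed to be a percentage of all transitions in the segment, which the text does not pin down. The function names (`adjacency`, `visible_links`) are hypothetical.

```python
def _proportions(counts):
    """Turn {action: count} into {action: share}, largest share first."""
    total = sum(counts.values())
    if total == 0:
        return {}
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return {action: n / total for action, n in ordered}


def adjacency(edges, selected):
    """Left and right sides of a butterfly view centered on `selected`.

    edges: {(source, target): count}. Returns (incoming shares, outgoing shares).
    """
    incoming = {src: n for (src, dst), n in edges.items() if dst == selected and n > 0}
    outgoing = {dst: n for (src, dst), n in edges.items() if src == selected and n > 0}
    return _proportions(incoming), _proportions(outgoing)


def visible_links(edges, min_percent):
    """Keep edges whose share of all transitions is at least min_percent (assumed basis)."""
    total = sum(edges.values())
    if total == 0:
        return {}
    return {e: n for e, n in edges.items() if 100.0 * n / total >= min_percent}


if __name__ == "__main__":
    edges = {("h", "c"): 6, ("c", "A"): 3, ("h", "A"): 1}
    print(adjacency(edges, "A"))       # ({'c': 0.75, 'h': 0.25}, {})
    print(visible_links(edges, 20.0))  # {('h', 'c'): 6, ('c', 'A'): 3}
```

The shares returned by `adjacency` are what a stacked bar beside each side of the butterfly would show; filtering only hides edges and never moves nodes, consistent with the fixed node positions described above.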
The Action Transition Network solves the cold-start problem for the Selected Action Adjacency View, allowing analysts to decide which actions are worth exploring further.

Figure 4.1: Action Transition Network showing all actions at each level of the hierarchy: (a) Detailed, (b) Mid-level, (c) Roll-up. Layout and color scheme chosen to enforce hierarchical structure. The Mid-level and Roll-up charts are not to scale; they have been enlarged to show details.

Figure 4.2: Sequence Details. All sequences in the selected segment shown as a series of glyphs. The blue bar and number indicate the frequency of occurrence of each sequence. (a) Sequences in selected segment. (b) Sequences after being collapsed.

4.1.3 Sequence Details

The bottom Sequence Details section is shown in Figure 4.2. Each row shows a sequence as a series of boxes, one for each action, encoded according to the action types at the chosen hierarchy level. Their color is consistent with the nodes in the Actions section, and they are labelled by a character that is unique for each action. This character set was manually chosen to be evocative and memorable for the static set of actions, following the same rationale as above for the layout and colors. Identical sequences are grouped together with only one row for each unique sequence, with a bar chart and numerical label on the left showing the counts. Sequences are sorted by frequency. The grouping and sorting yield a view that is much more scalable than showing the raw sequences alone, while preserving the low-level details required to spot patterns of user behavior. The Collapse Consecutive checkbox aggregates consecutive identical actions into a single action box, leading to a different grouping of sequences and visual encoding. Figure 4.2(b) shows the collapsed version of Figure 4.2(a).

4.2 Operation Manager View

The Operation Manager view, shown in Figure 1.1A, tackles the Refine part of the framework.
Analysts iteratively apply refinement operations to either filter a segment into a smaller one via attribute constraints, or partition it into multiple others by dividing according to attribute values. This view contains two sections, the top Operation Inspector and the bottom Operation Builder.

4.2.1 Operation Builder

The Operation Builder at the bottom has two tabs, one for ranges and one for actions, following the Segment Inspector structure.

Ranges Operation Builder
The Ranges Operation Builder is automatically updated when any chart is selected in the Segment Inspector Ranges section, with a bar showing the full range of values for the relevant attribute. This range can be filtered in the Range Filter Builder or partitioned in the Range Partition Builder using the widgets in the respective tab. Figure 4.3(a) and (b) show examples of an analyst using the Range Filter Builder to filter a subset of the sequences based on the values of different attributes. The Figure 4.3(a) example shows filtering dates down to the month of October using sliders, and in Figure 4.3(b) the analyst uses the NOT option to select all values that are outside regular working hours. Figure 4.3(c) and (d) show examples of an analyst using the Range Partition Builder to partition a segment into smaller segments based on the values of different attributes. Figure 4.3(c) shows the result of selecting the purchase count chart and adding three partition bars at values 1, 2, and 8, while Figure 4.3(d) adds only a single partition bar to compare sequences that lasted less versus more than an hour.

Actions Operation Builder
The Actions Operation Builder, shown in Figure 4.4, supports investigation of how many sequences contain certain patterns of actions. This view is essentially a glyph-based regular expression query, with a limited set of choices carefully chosen to be comprehensible to analysts with limited technical backgrounds.
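Because the Actions Operation Builder is essentially a glyph-based regular expression query, one way to picture its semantics is a small compiler from a list of actions and link types to a Python regular expression. This is a hedged sketch, not Segmentifier's implementation. It assumes one-character action codes and reads a non-consecutive link as "any number of actions, possibly none, in between". It models the black-dot repetition from the usage scenario as a `repeat` flag and treats Apply as Funnel as one filter per growing prefix of the pattern. All names (`Step`, `compile_pattern`, `filter_segment`, `apply_as_funnel`) are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Step:
    action: str           # one-character action code (hypothetical encoding)
    repeat: bool = False  # match one or more consecutive occurrences


def compile_pattern(steps, consecutive):
    """steps: list of Step; consecutive: one bool per link between adjacent steps."""
    if len(consecutive) != len(steps) - 1:
        raise ValueError("need exactly one link type between each pair of steps")
    parts = []
    for i, step in enumerate(steps):
        if i > 0 and not consecutive[i - 1]:
            parts.append(".*")  # non-consecutive: anything may occur in between
        parts.append(re.escape(step.action) + ("+" if step.repeat else ""))
    return re.compile("".join(parts))


def filter_segment(sequences, steps, consecutive, negate=False):
    """Keep sequences containing the pattern (or, with NOT, those that do not)."""
    rx = compile_pattern(steps, consecutive)
    return [s for s in sequences if (rx.search(s) is not None) != negate]


def apply_as_funnel(sequences, steps):
    """One resulting segment per funnel step, each refining the previous one."""
    segments, current = [], sequences
    for k in range(1, len(steps) + 1):
        current = filter_segment(current, steps[:k], [False] * (k - 1))
        segments.append(current)
    return segments


if __name__ == "__main__":
    # Hypothetical codes: h=home, p=product, A=add to cart, K=checkout, $=purchase
    segment = ["hpAcK$", "hpA", "hc", "pK$"]
    funnel = [Step("p"), Step("A"), Step("K"), Step("$")]
    print([len(s) for s in apply_as_funnel(segment, funnel)])  # [3, 2, 1, 1]
    print(filter_segment(segment, [Step("p"), Step("A")], [True], negate=True))  # ['hc', 'pK$']
```

Reading "non-consecutive" as "possibly adjacent" lets a funnel count a purchase that immediately follows checkout; a stricter reading would require at least one action in between (`.+` instead of `.*`).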
The user selects actions from the Action List, specifies whether the links between them are consecutive (shown as solid lines) or non-consecutive (shown with dashed lines), chooses whether or not to invert the whole pattern with the NOT button, and then applies the pattern normally or as a purchasing funnel. Selecting a sequence in the Sequence Details view automatically instantiates its pattern in the builder for potential editing, as does clicking on an already created actions operation node.

Figure 4.3: Ranges Operation Builders with corresponding operation glyphs. (a) Filter operation using date chart. (b) NOT filter operation using hour chart. (c) Partition operation with 3 partitions using purchase count chart. (d) Partition operation with 1 partition using duration chart.

Figure 4.4: Actions Operation Builders with corresponding operation glyphs. (a) A non-consecutive action pattern filter at the Detailed action hierarchy level. (b) A consecutive action pattern NOT filter at the Roll-up level.

4.2.2 Operation Glyph Design

Operation glyphs are used to represent each operation in the Operation Inspector and Analysis Paths views. The rectangular glyphs are designed to concisely show the type of operation, the attribute used, and an indication of the values selected.

Figure 4.3 shows four ranges operation glyphs with their corresponding builders. The icon on the left represents the source chart: time-related attributes have a clock symbol and a unique letter indicating the specific chart, and attributes related to action counts have a pound sign within a circle colored by the specific action. For ranges filter operations, shown in Figure 4.3(a) and 4.3(b), the blue shaded portion of the rectangle represents the user-selected filtered range compared to the full range of values for the specified attribute.
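The filter and partition operations behind these ranges glyphs, and the way a glyph positions a value within the attribute's full range, can be sketched as follows. The inclusive filter bounds, the rule that a value equal to a partition bar starts the next segment, and the function names (`range_filter`, `range_partition`, `glyph_fraction`) are assumptions made for illustration.

```python
import bisect


def range_filter(rows, attr, lo, hi, negate=False):
    """Keep rows whose attribute lies in [lo, hi]; with NOT, keep the rows outside it."""
    return [r for r in rows if (lo <= r[attr] <= hi) != negate]


def range_partition(rows, attr, bars):
    """Split rows into len(bars) + 1 segments at the given bar values.

    A value equal to a bar is placed in the segment that starts at that bar.
    """
    bars = sorted(bars)
    segments = [[] for _ in range(len(bars) + 1)]
    for r in rows:
        segments[bisect.bisect_right(bars, r[attr])].append(r)
    return segments


def glyph_fraction(value, full_min, full_max):
    """Horizontal position of a value within a glyph, from 0.0 (left) to 1.0 (right)."""
    span = full_max - full_min
    return 0.0 if span == 0 else (value - full_min) / span


if __name__ == "__main__":
    rows = [{"hour": h} for h in (3, 10, 17, 20)]
    print(range_filter(rows, "hour", 9, 17, negate=True))  # [{'hour': 3}, {'hour': 20}]

    # Like Figure 4.3(c): partition bars at purchase counts 1, 2 and 8 give four segments.
    purchases = [{"purchases": n} for n in (0, 1, 2, 5, 8, 9)]
    print([len(s) for s in range_partition(purchases, "purchases", [1, 2, 8])])  # [1, 1, 2, 2]
    print(glyph_fraction(6, 0, 24))  # 0.25
```

The boundary rule matters for values that sit exactly on a bar; a tool could equally assign them to the lower segment, so the rule used here is a design choice rather than documented behavior.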
For ranges partition operations, shown in Figure 4.3(c) and 4.3(d), a line is drawn for every partition the analyst added. The position of the line represents the value of the partition compared to the full value range of the attribute.

Figure 4.4 shows two actions operation glyphs with their corresponding builders. Colored circles represent the actions, touching to show consecutive links and separated to show non-consecutive ones. Figure 4.4(b) shows a NOT filter, represented with a dark gray background.

4.2.3 Operation Inspector

The top Operation Inspector section, shown in Figure 1.1A, provides detailed information about the segment and operation nodes currently selected in the Analysis Paths view. Each row represents one of the series of operations that was applied to create the segment. The operation details, on the left, include the corresponding operation glyph and text describing the constraints it specifies. The results, on the right, provide a visual and textual representation of the segment size, both in absolute terms and compared to the previous and initial segments.

4.3 Analysis Paths View

The Analysis Paths view, visible in Figure 1.1B, tackles the Record part of the framework. It keeps track of all conducted analysis paths through a graphical history tree of nodes for all created operations, represented by their glyphs, and segments, represented by gray rectangles sized by their absolute sequence counts. Selecting a segment turns it blue, triggers a complete update of the Segment Inspector view to reflect that segment's data, and updates the Operation Inspector to show the attribute constraints used to create the segment. The analyst can then refine it to create a new segment in the tree by building a new operation.

Chapter 5
Implementation and Pre-processing

Segmentifier was built in JavaScript with D3.js [5], using crossfilter.js [4] to manage the flow of data between views.
We cache crossfilter results to speed up operations that proceed down an analysis path from the root to a leaf segment, so the interactive response time varies depending on the recent history of user selections, in addition to the overall number of sessions and their length. Sequences grouped by clientid are substantially longer than those grouped by sessionid. On a 1.7 GHz Intel Core i7 computer with 8GB of memory, loading time is roughly linear in the number of sequences (15 seconds for 200K and 1 minute for 1M sequences). The interactive response time for updating after segment selection ranges from 2-10 seconds for smaller per-session datasets of 200K sequences, but can extend in some cases to 30-300 seconds for larger per-client datasets of 200K sequences. The response times for the 1M sequence per-session dataset range from 12-25 seconds.

Pre-processing of raw clickstream datasets into Segmentifier format begins with an SQL query to extract a table of one sequence per row with start time, end time, clientid, and sessionid. Each action is mapped to a unique character corresponding to the lowest level of the action hierarchy. This processing stage was conducted on the e-commerce company's computing infrastructure and took under ten minutes in all cases. Subsequent processing occurs in Python, with a script that creates two new aggregated strings for the other two levels of the action hierarchy and computes all of the derived time range attributes, which again takes under ten minutes in all cases. For each new website analyzed, we expect domain experts to map site-specific page templates to the 17 action types at the detailed level of the action hierarchy, as discussed in Section 2.4 and shown in Figure 2.2(a). We settled on this manual grouping by domain experts following Liu et al.
[27], who concluded that this approach was more effective than the multiple automatic techniques they considered.

Chapter 6
Results

We show the effectiveness of Segmentifier in two ways, both using real-world datasets. First, we walk through a detailed usage scenario to illustrate the utility of the tool, based on our own analysis of one of these datasets. Second, we summarize the insights found through a case study with an e-commerce industry clickstream analyst, showcasing the effectiveness of Segmentifier for a domain expert target user.

6.1 Usage Scenario

The first dataset comes from a live e-commerce site (CUST1). The data collected over two months constitutes 4 million per-session sequences and 10.5 million actions. We randomly sampled across the entire time period down to a manageable size of 1 million sequences containing 2.6 million actions. The website was built with 62 unique site-specific templates that we mapped into our action hierarchy. This dataset is also used in Figures 4.2, 4.3, 4.4, and 6.1.

Through our own analysis using Segmentifier, we constructed a usage scenario that shows the utility of the tool and illustrates the type of insights that can be found. Figure 6.1 showcases this usage scenario step by step as a series of small detail views extracted from full screenshots. Each subfigure has a header indicating the analysis flow according to the Clickstream Segment Analysis Framework; a checkmark by View denotes finding an actionable result.

A. Identify and filter out unexpected behavior (I1)
The initial segment with 1 million sequences is loaded in as the root of the tree in the Analysis Paths view, and highlighted in blue to show that it is the selected segment (Figure 6.1A Segment). The analyst begins by looking at the Actions section and notices from the Contains Chart that only 41% of sequences contain a PAGEVIEW action (Figure 6.1A View). Noting this situation as unexpected behavior, they explore further by looking at the sequences in Sequence Details (Figure 4.2(a)).
They select the Compress Consecutive option to simplify the sequences (Figure 4.2(b)) and notice that over 58% of sequences only contain APPSTART actions (an action signifying when a site loads). The analyst determines that there must be an APPSTART tracking issue.

They filter out the unexpected behavior by selecting the sequence in Sequence Details, which automatically creates the action pattern in the Actions Operation Builder (Figure 6.1A Refine): a consecutive path from START to APPSTART to EXIT, with a black dot on the APPSTART node so that any number of consecutive APPSTART actions is matched. The analyst selects the NOT option and clicks Apply. The operation is recorded in the Analysis Paths view (Figure 6.1A Record) as a path from the initial segment to an operation node representing the action pattern, followed by the resulting segment node, whose size and label show that 415,980 sequences are not affected by this behavior.

B. Identify unfavorable behavior (I1)
The analyst selects the resulting segment (Figure 6.1B Segment) and all attributes in the Segment Inspector are updated. After deselecting the Compress Consecutive option in the Sequence Details view, the analyst notices that approximately 50% of sequences contain one APPSTART action and one PAGEVIEW action, a pattern referred to as a bounce (Figure 6.1B View). The analyst decides to further explore this unfavorable behavior by once again selecting a sequence to access the pattern in the Actions Operation Builder (Figure 6.1B Refine) and applying the filter operation. The operation is recorded in the Analysis Paths view (Figure 6.1B Record).

C. Verify support for unfavorable behavior (V2)
They select the resulting segment (Figure 6.1C Segment). To get more details, the analyst switches to the Detailed level of the hierarchy (Figure 6.1C Refine), which updates the charts in the Segment Inspector view. The Contains Chart (Figure 6.1C View) shows the analyst the distribution of pages users bounced from.
This information is an indicator of potential problem pages which should be explored. Content with the information gathered from this segment, the analyst concludes this analysis path and continues to explore other segments.

D. Verify existence of expected behavior (V1)
The analyst wants to analyze the purchasing funnel, an expected behavior, and determine how many users drop out at each step. They select the previous segment (Figure 6.1D Segment) and create a pattern in the Actions Operation Builder (Figure 6.1D Refine) consisting of non-consecutive links between the five actions in the purchasing funnel. By selecting Apply as Funnel, an operation node and a resulting segment node are created for each step of the pattern in the Analysis Paths view (Figure 6.1D Record).

E.1 Verify support of expected behavior (V2)
They select the last resulting segment, representing all sequences that contain the full purchasing funnel. The Operation Inspector shows a detailed breakdown of the series of operations applied to create the segment, allowing them to determine the dropout percentage at each step of the purchasing funnel. They notice that 67% make it to the checkout step but never purchase (Figure 6.1E.1 View), and hypothesize that there may be an error in the checkout process preventing consumers from purchasing.

E.2 Identify more fine-grained behaviors (I3)
In the Ranges Attributes section, the analyst creates the PV PDP count chart (Figure 6.1E.2 Segment), describing how many product pages are in each sequence. When this chart is selected, the Ranges Operation Builder range bar updates to show the total range, 1-44 product pages. To further explore the distribution, the analyst applies a partition operation with two partition bars at values 2 and 10 (Figure 6.1E.2 Refine). A partition node is created in the Analysis Paths view with three resulting segment nodes (Figure 6.1E.2 Record).

F. Verify support of more fine-grained behaviors (I2)
After the analyst selects the partition operation node (Figure 6.1F Operation), the Operation Inspector updates to show details of the resulting three segments (Figure 6.1F View): 23% of purchasers view one product page, 67% view between 2-10 pages, and 10% view between 10-44. The analyst decides to use this information to send targeted messages to consumers based on the type of purchasing behavior followed.

G. Verify favorable behavior (V1)
They re-select the segment representing sequences that followed a full purchasing funnel (Figure 6.1G Segment) and notice in the Sequence Details view that the sequences follow a favorable behavior for exporting to the pattern mining technique (Figure 6.1G View). The number of sequences is small and there is low variability between them, satisfying the data requirements for this technique. They download the segment (Figure 6.1G Export) and end their analysis. Figure 1.1B shows the final analysis path tree.

Figure 6.1: Usage scenario workflow broken down into the analysis of 7 segments/operations. Details of each are in Section 6.1. View boxes with checkmarks mean that actionable answers were found at this step.

6.2 Case Study

A clickstream analyst at the e-commerce company was preparing to give a presentation to their commercial customer one month after a website launch. Their goal was to use Segmentifier to discover actionable insights and suggestions for improvement that they could relay back to their customer. We conducted two separate chauffeured analyses with this domain expert, who was also one of the interviewees during the requirements analysis phase.

6.2.1 Analysis #1

This analysis used data from a different customer website (CUST2), with 25 site-specific templates mapped into our action hierarchy. The full dataset collected in one month was 20M sequences containing 190M actions.
We randomly sampled 200K sequences, each representing one user session, containing 1.9 million actions. The resulting Analysis Paths tree, shown in Figure 1.1, reflects the four analysis tasks completed during the two-hour session. We summarize the insights found for each analysis here; the supporting materials contain further screenshots showing these findings.

A. Analyze purchasing behavior
The analyst explored purchasing behavior by filtering down to the sequences that contain a PURCHASE action. Two separate analysis paths emerged from this segment: 1) they explored the checkout flow and discovered that 12% of sessions contain more checkout pages than necessary to complete a purchase; 2) they explored the percentage of the purchasing funnel completed in one session and discovered that 30% of users actually exit the site and return later to complete their purchase.

B. Comparing morning vs night behavior
The analyst investigated a hypothesis that the behavior of users would change based on the time of day, so they specifically chose to investigate the difference in behavior between sessions that began at 7-9 am versus 7-9 pm. They compared the number of actions in each session and the percentage of sessions which completed the full purchasing funnel, and found no significant differences.

C. Analyze add and remove from cart behavior
The analyst then explored adding to cart and removing from cart behavior and discovered that 30% of users who removed from cart exited the session and most likely did not come back.

D. Analyze purchasing funnel
Using Apply as Funnel, the analyst was able to easily determine the percentage of sessions that drop out at each step of the purchasing funnel.
Using the information gathered in Analysis A, they were able to determine that about 20% of people who get to checkout will not end up purchasing.

Figure 6.2: Analysis Paths View representing four analyses done in the case study with the domain expert.

6.2.2 Analysis #2

For this analysis we used the same base dataset (CUST2), but inspected the derived sequences of all actions performed by a single client over the entire one-month time frame, instead of using sessions with a 30-minute cutoff. The sampled dataset consisted of 200,000 sequences, each representing one client, containing 4.3 million actions in total. Figure 1.1 shows an overview of the analysis.

With access to a new type of sequence, the domain expert decided to revisit some of the questions they explored in the previous analysis. They were interested in analyzing the reasons for dropping out of the purchasing funnel at the checkout stage and the effect of removing from cart. They discovered that 25% of clients who removed from cart at the checkout stage exited and never returned to the site to purchase. They also noticed that an APPSTART action was triggered every time before the PV CART action, which was unusual behavior. Follow-up investigation determined a problem with the cart pages of the website.

The domain expert was also interested in analyzing the effect of a particular awards account, myBeautyPage, recently added to the website. Therefore, during the pre-processing step, the domain expert ensured that the myBeautyPage template name was the only template mapped to the PV ELITEREWARDS action in the action hierarchy. The domain expert was able to easily discover that 1% of all clients signed up for the awards account, and that 27% of those made a purchase. They partitioned these clients to understand at what point of the purchasing funnel they signed up for the awards account, and discovered that it occurred mostly before adding to cart.
Finally, they discovered that clients who accessed the awards account generally had longer sequences, signaling that they were more committed customers.

6.3 Domain Expert Feedback

The clickstream analyst was able to find many valuable insights to take back to their customer, and saved several screenshots for inclusion in their presentation. They thought the interface was easy to use and understand, specifically appreciating the Sequence Details view for generating new questions and the Analysis Paths view for re-examining previously created segments and helping them remember their analysis process. They also noted the usefulness of switching between different levels of the action hierarchy for different tasks, and how the color scheme reinforced the hierarchy.

Their wishlist of additional capabilities included saving workflows to reuse on different data, seeing a fourth level of actions with more detail showing all site-specific page templates, extending the pattern builder to filter by more sophisticated patterns, and the ability to easily compare two segments.

Chapter 7
Discussion and Future Work

A key aspect of the Segmentifier design is that each possible refinement operation corresponds to one attribute constraint. Thus, a path through the recorded analysis tree from root to leaf provides a crisp way to understand the provenance of a segment as a sequential series of attribute constraints. We saw the need for this kind of understandability after initial experiments applying existing techniques such as clustering and pattern matching to the full dataset, which yielded incomprehensible results because of the high variability between sequences.

We carefully considered the tradeoff between power and simplicity for this application context.
Many previous systems provide full support for regular expressions [9, 24], which are notoriously difficult for non-programmers to handle even when presented through a visual language, so we deliberately kept the scope of the Actions Operation Builder limited. Despite the domain expert's request for more functionality, we still suspect that adding too much additional complexity on this front would make the system unusable for the untrained and extremely time-crunched analysts whom we are targeting.

Segmentifier explicitly supports refinement through both filtering and partitioning. Although in theory partitioning could be achieved by the workaround of filtering multiple times, the utility of partitioning is that the analysis tree explicitly shows the split into multiple segments, in a way that better matches a clickstream analyst's mental model and encourages subsequent analysis of the different cases. It is possible to compare segment sizes quickly using the Analysis Paths view, but support for detailed direct comparison between segments is left for future work. An additional form of refinement is to transform segments by altering their sequences. In Segmentifier, transform operations occur implicitly through two choices: switching between levels of the action hierarchy to change the action aggregation level, and selecting the Compress Consecutive option in the Sequence Details view to collapse consecutive identical actions into one. Supporting further transformations explicitly would also be interesting future work.

Our primary focus was the agile and iterative development of Segmentifier's design, with only modest engineering effort to improve loading and processing times to achieve a base level of usability. A few months of data collection can yield many millions of sequences, but our implementation strains at one million and has better responsiveness at 200K sequences.
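The text does not say how the 200K-sequence samples were drawn. One standard way to draw a uniform random sample of k sequences in a single pass, without first loading millions of sequences into memory, is reservoir sampling (Algorithm R). The sketch below assumes sequences arrive as any Python iterable. It illustrates the technique and is not Segmentifier's own loader.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int,
                     seed: Optional[int] = None) -> List[T]:
    """Uniform sample of k items from a stream of unknown length (Algorithm R)."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            reservoir.append(item)   # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)    # uniform over 0..i inclusive
            if j < k:
                reservoir[j] = item  # item i survives with probability k/(i+1)
    return reservoir


# Toy usage: a generator stands in for a lazily read sequence file.
sample = reservoir_sample((f"seq-{n}" for n in range(10_000)), 200, seed=42)
print(len(sample))
```

Every item ends up in the sample with the same probability k/n, so the sample spans the whole collection period rather than a short window. This matches the stated goal of capturing variability over the time ranges of interest. Fixing `seed` makes a sample reproducible across sessions.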
We randomly sample from the full datasets in hopes of capturing sequence variability over the time ranges of interest to the domain experts, rather than simply targeting shorter time periods. We conjecture that our fundamental design would scale to datasets larger than our current engineering accommodates, and we have some evidence for this optimism. The case studies with the domain expert were run using a sample of 200,000 sequences from the CUST2 dataset in order to ensure that Segmentifier interaction would be completely fluid, with no hindrances in the analysis loop. We later replicated the first analysis ourselves with a larger sample of 1 million session sequences from CUST2, up to the limit of what Segmentifier can load. Although processing was frequently slower, the same patterns were visible. Improving the capacity and speed of our approach using computational infrastructure such as Hadoop and MapReduce [37] would be very interesting future work, to find the boundary between engineering and design issues.

Chapter 8
Conclusion

Segmentifier bridges the gap from the large and noisy clickstreams in real-world e-commerce settings to downstream analysis techniques that require relatively clean data, by allowing analysts to view, refine, record, export, abandon, and draw conclusions from iteratively developed segments. Understandable results can be found quickly by analysts with limited time and skills. Our task and data abstraction connects clickstream analysts' mental model of interesting behavior to viewable characteristics of sequences and segments via a series of evocative derived data attributes. The usage scenario and case study show the utility of Segmentifier as a step forward in handling the complexities of realistic e-commerce clickstream datasets.

Bibliography

[1] Adobe Analytics. https://www.adobe.com/data-analytics-cloud.
[2] Google Analytics. https://analytics.google.com/.
[3] W. Aigner, S. Miksch, B. Thurnher, and S. Biffl. PlanningLines: novel glyphs for representing temporal uncertainties and their evaluation. In Proc. Intl. Conf. Information Visualisation (IV), pages 457–463, 2005. doi:10.1109/IV.2005.97.
[4] M. Bostock. crossfilter.js. https://github.com/square/crossfilter, 2013.
[5] M. Bostock, V. Ogievetsky, and J. Heer. D3 data-driven documents. IEEE Trans. Visualization and Computer Graphics, 17(12):2301–2309, 2011. ISSN 1077-2626. doi:10.1109/TVCG.2011.185.
[6] J. Brainerd and B. Becker. Case study: e-commerce clickstream visualization. In Proc. IEEE Symp. Information Visualization (InfoVis), pages 153–156, 2001. doi:10.1109/INFVIS.2001.963293.
[7] M. Burch, F. Beck, and S. Diehl. Timeline trees: Visualizing sequences of transactions in information hierarchies. In Proc. Working Conf. on Advanced Visual Interfaces (AVI), pages 75–82. ACM, 2008. doi:10.1145/1385569.1385584.
[8] I. Cadez, D. Heckerman, C. Meek, P. Smyth, and S. White. Visualization of navigation patterns on a web site using model-based clustering. In Proc. Intl. Conf. Knowledge Discovery and Data Mining (KDD), pages 280–284. ACM, 2000. doi:10.1145/347090.347151.
[9] B. C. M. Cappers and J. J. van Wijk. Exploring multivariate event sequences using rules, aggregations, and selections. IEEE Trans. Visualization and Computer Graphics, 24(1):532–541, 2018. ISSN 1077-2626. doi:10.1109/TVCG.2017.2745278.
[10] Y. Chen, P. Xu, and L. Ren. Sequence synopsis: Optimize visual summary of temporal event data. IEEE Trans. Visualization and Computer Graphics, 24(1):45–55, 2018. doi:10.1109/TVCG.2017.2745083.
[11] E. H. Chi. Improving web usability through visualization. IEEE Internet Computing, 6(2):64–71, 2002. ISSN 1089-7801. doi:10.1109/4236.991445.
[12] J. A. Fails, A. Karlson, L. Shahamat, and B. Shneiderman. A visual interface for multivariate temporal data: Finding patterns of events across multiple histories. In Proc. IEEE Symp. Visual Analytics Science and Technology (VAST), pages 167–174, 2006. doi:10.1109/VAST.2006.261421.
[13] D. Gotz and H. Stavropoulos. DecisionFlow: Visual analytics for high-dimensional temporal event sequence data. IEEE Trans. Visualization and Computer Graphics, 20(12):1783–1792, 2014. doi:10.1109/TVCG.2014.2346682.
[14] S. Guo, K. Xu, R. Zhao, D. Gotz, H. Zha, and N. Cao. EventThread: Visual summarization and stage analysis of event sequence data. IEEE Trans. Visualization and Computer Graphics, 24(1):56–65, 2018. ISSN 1077-2626. doi:10.1109/TVCG.2017.2745320.
[15] J. Heer, J. Mackinlay, C. Stolte, and M. Agrawala. Graphical histories for visualization: Supporting analysis, communication, and evaluation. IEEE Trans. Visualization and Computer Graphics, 14(6):1189–1196, 2008. ISSN 1077-2626. doi:10.1109/TVCG.2008.137.
[16] I.-H. Ting, C. Kimble, and D. Kudenko. Visualizing and classifying the pattern of users browsing behavior for website design recommendation. In Proc. Intl. ECML/PKDD Workshop on Knowledge Discovery in Data Stream, pages 101–102, 2004.
[17] S. Kandel, A. Paepcke, J. Hellerstein, and J. Heer. Wrangler: Interactive visual specification of data transformation scripts. In Proc. ACM Conf. on Human Factors in Computing Systems (CHI), pages 3363–3372. ACM, 2011. doi:10.1145/1978942.1979444.
[18] S. Kannappady, S. P. Mudur, and N. Shiri. Visualization of web usage patterns. In Proc. Intl. Database Engineering and Applications Symposium (IDEAS), pages 220–227, 2006. doi:10.1109/IDEAS.2006.52.
[19] G. M. Karam. Visualization using timelines. In Proc. ACM SIGSOFT Intl. Symp. Software Testing and Analysis (ISSTA), pages 125–137. ACM, 1994. ISBN 0-89791-683-2. doi:10.1145/186258.187157.
[20] J. Krause, A. Perer, and H. Stavropoulos. Supporting iterative cohort construction with visual temporal queries. IEEE Trans. Visualization and Computer Graphics, 22(1):91–100, 2016. ISSN 1077-2626. doi:10.1109/TVCG.2015.2467622.
[21] M. Krstajic, E. Bertini, and D. Keim. CloudLines: Compact display of event episodes in multiple time-series. IEEE Trans. Visualization and Computer Graphics, 17(12):2432–2439, 2011. ISSN 1077-2626. doi:10.1109/TVCG.2011.179.
[22] J. B. Kruskal and J. M. Landwehr. Icicle plots: Better displays for hierarchical clustering. The American Statistician, 37(2):162–168, 1983. doi:10.2307/2685881.
[23] B. C. Kwon, J. Verma, and A. Perer. Peekquence: Visual analytics for event sequence data. In Proc. ACM SIGKDD Workshop on Interactive Data Exploration and Analytics (IDEA), 2016.
[24] H. Lam, D. Russell, D. Tang, and T. Munzner. Session Viewer: Visual exploratory analysis of web session logs. In Proc. IEEE Symp. Visual Analytics Science and Technology (VAST), pages 147–154, 2007. doi:10.1109/VAST.2007.4389008.
[25] J. Lee, M. Podlaseck, E. Schonberg, and R. Hoch. Visualization and analysis of clickstream data of online stores for understanding web merchandising. 5:59–84, 2001.
[26] Z. Liu, B. Kerr, M. Dontcheva, J. Grover, M. Hoffman, and A. Wilson. CoreFlow: Extracting and visualizing branching patterns from event sequences. Computer Graphics Forum, 36(3):527–538, 2017. ISSN 0167-7055. doi:10.1111/cgf.13208.
[27] Z. Liu, Y. Wang, M. Dontcheva, M. Hoffman, S. Walker, and A. Wilson. Patterns and sequences: Interactive exploration of clickstreams to understand common visitor paths. IEEE Trans. Visualization and Computer Graphics, 23(1):321–330, 2017. ISSN 1077-2626. doi:10.1109/TVCG.2016.2598797.
[28] A. Makanju, S. Brooks, A. N. Zincir-Heywood, and E. E. Milios. LogView: Visualizing event log clusters. In Proc. Conf. on Privacy, Security and Trust, pages 99–108, 2008. doi:10.1109/PST.2008.17.
[29] S. Malik, B. Shneiderman, F. Du, C. Plaisant, and M. Bjarnadottir. High-volume hypothesis testing: Systematic exploration of event sequence comparisons. ACM Trans. Interact. Intell. Syst., 6(1):9:1–9:23, 2016. ISSN 2160-6455. doi:10.1145/2890478.
[30] M. Monroe, R. Lan, H. Lee, C. Plaisant, and B. Shneiderman. Temporal event sequence simplification. IEEE Trans. Visualization and Computer Graphics, 19(12):2227–2236, 2013. ISSN 1077-2626. doi:10.1109/TVCG.2013.200.
[31] P. Mui. Introducing flow visualization: visualizing visitor flow. https://analytics.googleblog.com/2011/10/introducing-flow-visualization.html, 2011.
[32] A. Perer and D. Gotz. Data-driven exploration of care plans for patients. In Extended Abstracts of ACM Conf. Human Factors in Computing Systems (CHI), pages 439–444. ACM, 2013. doi:10.1145/2468356.2468434.
[33] A. Perer and F. Wang. Frequence: Interactive mining and visualization of temporal frequent event sequences. In Proc. Intl. Conf. on Intelligent User Interfaces (IUI), pages 153–162, 2014. ISBN 978-1-4503-2184-6. doi:10.1145/2557500.2557508.
[34] J. Pitkow and K. A. Bharat. Webviz: A tool for world-wide web access log analysis. In Proc. Intl. Conf. World Wide Web (WWW), pages 271–277, 1994.
[35] C. Plaisant, R. Mushlin, A. Snyder, J. Li, D. Heller, and B. Shneiderman. Lifelines: using visualization to enhance navigation and analysis of patient records. In Proc. Symp. American Medical Informatics Association (AMIA), pages 76–80, 1998.
[36] A. Sarikaya, E. Zgraggen, R. Deline, S. Drucker, and D. Fisher. Sequence pre-processing: Focusing analysis of log event data. In IEEE VIS The Event Event: Temporal & Sequential Event Analysis Workshop, 2016. https://pdfs.semanticscholar.org/1d65/8b25094f6215c681208945f7103f3ea2d030.pdf.
[37] Z. Shen, J. Wei, N. Sundaresan, and K. L. Ma. Visual analysis of massive web session data. In Proc. IEEE Symp. Large Data Analysis and Visualization (LDAV), pages 65–72, 2012. doi:10.1109/LDAV.2012.6378977.
[38] C. Shi, S. Fu, Q. Chen, and H. Qu. VisMOOC: Visualizing video clickstream data from massive open online courses. In Proc. IEEE Symp. Pacific Visualization (PacificVis), pages 159–166, 2015. doi:10.1109/PACIFICVIS.2015.7156373.
[39] K. Vrotsou, J. Johansson, and M. Cooper. ActiviTree: Interactive visual exploration of sequences in event-based data using graph similarity. IEEE Trans. Visualization and Computer Graphics, 15(6):945–952, 2009. doi:10.1109/TVCG.2009.117.
[40] G. Wang, X. Zhang, S. Tang, H. Zheng, and B. Y. Zhao. Unsupervised clickstream clustering for user behavior analysis. In Proc. ACM Conf. on Human Factors in Computing Systems (CHI), pages 225–236, 2016. ISBN 978-1-4503-3362-7. doi:10.1145/2858036.2858107.
[41] T. D. Wang, C. Plaisant, A. J. Quinn, R. Stanchak, S. Murphy, and B. Shneiderman. Aligning temporal data by sentinel events: Discovering patterns in electronic health records. In Proc. ACM Conf. Human Factors in Computing Systems (CHI), pages 457–466. ACM, 2008. doi:10.1145/1357054.1357129.
[42] S. J. Waterson, J. I. Hong, T. Sohn, J. A. Landay, J. Heer, and T. Matthews. What did they do? Understanding clickstreams with the WebQuilt visualization system. In Proc. Working Conf. on Advanced Visual Interfaces (AVI), pages 94–102. ACM, 2002. doi:10.1145/1556262.1556276.
[43] J. Wei, Z. Shen, N. Sundaresan, and K. L. Ma. Visual cluster exploration of web clickstream data. In Proc. IEEE Conf. Visual Analytics Science and Technology (VAST), pages 3–12, 2012. doi:10.1109/VAST.2012.6400494.
[44] K. Wongsuphasawat and D. Gotz. Exploring flow, factors, and outcomes of temporal event sequences with the outflow visualization. IEEE Trans. Visualization and Computer Graphics, 18(12):2659–2668, 2012. ISSN 1077-2626. doi:10.1109/TVCG.2012.225.
[45] K. Wongsuphasawat and J. Lin. Using visualizations to monitor changes and harvest insights from a global-scale logging infrastructure at Twitter. In Proc. IEEE Conf. Visual Analytics Science and Technology (VAST), pages 113–122, 2014. doi:10.1109/VAST.2014.7042487.
[46] K. Wongsuphasawat, J. A. Guerra Gómez, C. Plaisant, T. D. Wang, M. Taieb-Maimon, and B. Shneiderman. LifeFlow: Visualizing an overview of event sequences. In Proc. ACM Conf. Human Factors in Computing Systems (CHI), pages 1747–1756. ACM, 2011. doi:10.1145/1978942.1979196.
[47] E. Zgraggen, S. M. Drucker, D. Fisher, and R. DeLine. (s|qu)eries: Visual regular expressions for querying and exploring event sequences. In Proc. ACM Conf. on Human Factors in Computing Systems (CHI), pages 2683–2692, 2015. ISBN 978-1-4503-3145-6. doi:10.1145/2702123.2702262.
[48] X. Zhang, H.-F. Brown, and A. Shankar. Data-driven personas: Constructing archetypal users with clickstreams and user telemetry. In Proc. ACM Conf. on Human Factors in Computing Systems (CHI), pages 5350–5359. ACM, 2016. doi:10.1145/2858036.2858523.
[49] J. Zhao, Z. Liu, M. Dontcheva, A. Hertzmann, and A. Wilson. MatrixWave: Visual comparison of event sequence data. In Proc. ACM Conf. Human Factors in Computing Systems (CHI), pages 259–268, 2015. ISBN 978-1-4503-3145-6. doi:10.1145/2702123.2702419.

Appendix A
Supporting Materials

The following supporting material provides additional screenshots of Segmentifier to further clarify the visual encodings and linked-view interaction, and to further document how the clickstream analysis framework is instantiated through the interface.
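The two sequence types used in the case study, per-session sequences with a 30-minute cutoff and per-client sequences spanning the whole collection window, can be derived from the same client event log. The following is a minimal sketch. It assumes that events are (timestamp, action) pairs. It also assumes the 30-minute cutoff applies to the gap between consecutive actions, which is a common sessionization rule that the thesis does not confirm.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Event = Tuple[datetime, str]  # (timestamp, action label) for one client


def split_sessions(events: List[Event],
                   gap: timedelta = timedelta(minutes=30)) -> List[List[str]]:
    """Split one client's events into sessions wherever consecutive actions
    are more than `gap` apart (assumed cutoff rule)."""
    sessions: List[List[str]] = []
    last_ts: Optional[datetime] = None
    for ts, action in sorted(events):
        if last_ts is None or ts - last_ts > gap:
            sessions.append([])  # start a new session
        sessions[-1].append(action)
        last_ts = ts
    return sessions


def per_client(events: List[Event]) -> List[str]:
    """Whole-window sequence: every action by the client, in time order."""
    return [action for _, action in sorted(events)]


# Toy log for one hypothetical client.
t0 = datetime(2000, 1, 1, 8, 0)
log = [
    (t0, "APPSTART"),
    (t0 + timedelta(minutes=5), "PV_PRODUCT"),
    (t0 + timedelta(minutes=50), "ADDTOCART"),
    (t0 + timedelta(days=3), "PURCHASE"),
]
print(split_sessions(log))
print(per_client(log))
```

The same client contributes three short sessions but a single per-client sequence spanning three days. This illustrates why the Duration histogram of Analysis #2 extends to days rather than minutes (Figure A.13).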
We show full screenshots of every step of Analysis #1 and Analysis #2 of the case study with our domain expert. Figures A.1 to A.12 describe Analysis #1, where 200,000 per-session sequences are loaded into the tool, each representing one user session. Figures A.13 to A.21 describe Analysis #2, where 200,000 per-client sequences are loaded, each representing all actions performed by that user.

Figure A.1: Case Study Analysis #1 (CS-A1): The initial state of the interface when 200,000 sequences, each representing a user session, are loaded in. The initial segment including the 200,000 sequences is shown as a blue rectangle in the Analysis Paths view in the middle. The Segment Inspector view on the right shows information about the underlying data of the segment, including ordinal and categorical segment attributes and the actual sequences. The analyst has thus far chosen to show only two per-action histograms, for APPSTART and APPDISPLAYERROR.

Figure A.2: The analyst first narrows down to purchasers by filtering for sequences that contain a PURCHASE action, and then partitions those into sequences that follow all five actions required to purchase an item and those that do not. They add the CHECKOUT chart to the Ranges section on the right, notice that 3 pages seem to be required to properly check out from this page, and click on that chart to start creating a partition operation in the Ranges Operation Builder on the left. They build a partition operation to determine which sequences had fewer than, equal to, or more than that expected number of checkout pages, and apply it to the segment containing all five purchasing actions. The Operation Inspector view on the left shows that 15% of sessions contain fewer, 48% equal to, and 37% more.

Figure A.3: The analyst decides to further analyze those who viewed more checkout pages than expected by determining those that viewed four and then five in a row.
They used the Actions Operation Builder to specify that many consecutive actions, as we can tell from the glyphs in the Operation Inspector in the upper left, which show the path to the selected segment at the bottom of this analysis path. It contains only 289 sequences, and the raw sequences in the Sequence Details view on the bottom right are quite long for this uncommon behavior.

Figure A.4: The analyst decides to go back up the tree and select the segment on the right side of the Analysis Paths view to check the number of sessions that include a purchase action but not all five actions of the purchasing funnel. The Operation Inspector shows them that 28% of users do not complete a purchase in one session but return in a later session to do so.

Figure A.5: The analyst continues their inquiry by investigating how many of the five actions in the purchasing funnel are completed in one session, through multiple rounds of splits using the Actions Operation Builder to indicate which actions are not contained in the segment. They notice that 16% of users that make it to checkout must have returned after a previous session, since they PURCHASE without having conducted the other required actions in this session.

Figure A.6: The analyst decides to investigate a hypothesis that behavior changes depending on the time of day, so they return to the root segment of the Analysis Paths tree to start new analysis paths. They click on the Start Hour chart in the Ranges section of the Segment Inspector view in the upper right and select a two-hour range from 7 to 9 am, which automatically populates the Filter tab of the Ranges Operation Builder accordingly, and they hit the Apply button to create that new segment. They then do the same for the 7–9 pm range, and they see there are more such sequences: around 31K at night vs.
around 20K in the morning.

Figure A.7: They first investigate the percentage of sequences that contain the full purchasing funnel, with the hypothesis that users spend more time browsing and purchasing at night than in the morning, when users are usually commuting to work. However, using the Operation Inspector, they discover that the results are similar: although the absolute counts differ, the relative proportion of around 2% of sessions is the same for both time ranges.

Figure A.8: They then investigate the number of actions in the sequences, with the similar hypothesis that users may have longer sessions at night after dinner than in the morning commute time slot. By building partition operations, applying them to both segments, and using the Operation Inspector, they discover a similarly disconfirmatory result: the sizes of the partitions are also similar between both times.

Figure A.9: The analyst decides to investigate sequences that contain an ADDTOCART action. However, after inspecting the data through the Segment Inspector, they do not discover anything worth exploring further, and they abandon this analysis path.

Figure A.10: The analyst also decides to investigate sequences that contain a REMOVEFROMCART action. Using the Selected Action Adjacency view on the right, they notice that 18% of the time this action results in the end of a session.

Figure A.11: The analyst decides to analyze the purchasing funnel by building the pattern of five actions in the Actions Operation Builder on the left and clicking Apply as Funnel. The result is shown in the Analysis Paths view. After the analyst selects the final resulting segment, the details of the dropout at each step are shown in the Operation Inspector on the top left.

Figure A.12: The analyst decides to investigate the sequences that got to the checkout part of the purchasing funnel but did not purchase. The Operation Inspector shows that this occurs 37% of the time.
Using the previously found insight in Figure A.5 that 16% of users who check out are returning from a previous session, the analyst hypothesizes that about 21% of users who get to checkout never come back to purchase.

Figure A.13: Case Study Analysis #2: The initial state of the interface when 200,000 per-client sequences are first loaded, each representing all actions performed by a user over the entire dataset time window. The Duration histogram axis now extends to 30 days with a sharp dropoff, in contrast to the Analysis #1 segments, where this histogram was usually capped at between 15 and 20 minutes with a much more uniform distribution. The Action Level at the top right is set to the default of Roll-up.

Figure A.14: The analyst switches to the Detailed Action Level to get further details about pages viewed. They investigate users who make it to checkout but do not purchase. The Selected Action Adjacency view on the right shows that 25% of those users who REMOVEFROMCART leave the site and never return.

Figure A.15: The analyst scrolls down the Segment Inspector view on the right and notices in the Sequence Details view at the bottom that an APPSTART action, represented by the gray glyph, often occurs before the CART action, represented by the pink glyph. They confirm their observation using the Selected Action Adjacency chart, which shows that 84% of the time an APPSTART action is generated before entering the cart. Follow-up investigation revealed a problem with the cart pages of the website.

Figure A.16: The analyst wants to investigate the impact of a new awards account whose pages are stored in the ELITEREWARDS pageview action.
They first look at the percentage of purchasers who signed up and discover, by looking at the Operation Inspector, that it is 6% of users.

Figure A.17: The analyst investigates at which point of the purchasing funnel users accessed the awards account by building and filtering by patterns of actions in the Actions Operation Builder, with the awards account action inserted at three different parts of the funnel. Based on the sizes of the three resulting segments, they discover that it is accessed most frequently after ADDTOCART.

Figure A.18: The analyst creates a new segment with sequences containing the ELITEREWARDS action and uses the Operation Inspector view to discover that 1% of users have signed up for the awards account.

Figure A.19: They investigate further by filtering sequences that contain a PURCHASE action and those that do not, discovering that 27% of users who access the rewards account end up making a purchase.

Figure A.20: The analyst switches to the Mid-level of the action hierarchy using the Action Level radio buttons in the top right corner to simplify the sequences. They notice in the Operation Inspector that 83% of users access any ACCOUNT page before purchasing.

Figure A.21: They investigate further to determine at what point of the purchasing funnel users access their account. They establish that it occurs most frequently before ADDTOCART.
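Several of the figures above (A.10, A.14, A.15) rely on the Selected Action Adjacency view for statements such as "this action ends the session 18% of the time" or "APPSTART precedes the cart 84% of the time". The following is a minimal sketch of the underlying counts. It assumes per-occurrence counting, where every occurrence of the selected action contributes one predecessor and one successor. It also uses sentinel labels for sequence boundaries. The thesis does not state whether the view counts occurrences or whole sequences, so both choices are assumptions.

```python
from collections import Counter
from typing import List, Sequence, Tuple

START, END = "<sequence start>", "<sequence end>"  # assumed sentinel labels


def adjacency(sequences: List[Sequence[str]],
              selected: str) -> Tuple[Counter, Counter]:
    """Count what immediately precedes and follows each occurrence of `selected`."""
    before: Counter = Counter()
    after: Counter = Counter()
    for seq in sequences:
        for i, action in enumerate(seq):
            if action != selected:
                continue
            before[seq[i - 1] if i > 0 else START] += 1
            after[seq[i + 1] if i + 1 < len(seq) else END] += 1
    return before, after


def share(counts: Counter, label: str) -> float:
    """Percentage of occurrences whose neighbor is `label`."""
    total = sum(counts.values())
    return 100.0 * counts[label] / total if total else 0.0


# Toy sessions, for illustration only.
sessions = [
    ["HOME", "ADDTOCART", "REMOVEFROMCART"],
    ["ADDTOCART", "REMOVEFROMCART", "PV_PRODUCT"],
    ["APPSTART", "PV_CART", "REMOVEFROMCART", "CHECKOUT"],
    ["REMOVEFROMCART"],
]
before, after = adjacency(sessions, "REMOVEFROMCART")
print(f"{share(after, END):.0f}% of REMOVEFROMCART actions end the sequence")
```

Applied to per-session sequences, the end sentinel means the session ended. Applied to per-client sequences, as in Figure A.14, it means the client never performed another action in the collected window, which is how "leave the site and never return" can be read off the same computation.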
