Understanding Motifs of Program Behaviour and Change

by

Saba Alimadadi Jani

B.Sc., University of Tehran, Iran, 2009
M.Sc., Simon Fraser University, Canada, 2013

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Doctor of Philosophy

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Electrical and Computer Engineering)

The University of British Columbia (Vancouver)

October 2017

© Saba Alimadadi Jani, 2017

Abstract

Program comprehension is crucial in software engineering; it is a necessary step for performing many tasks. However, the implicit and intricate relations between program entities hinder comprehension of program behaviour and change. It is a particularly difficult endeavour to understand dynamic and modern programming languages such as JavaScript, which has grown to be among the most popular languages. Comprehending such applications is challenging due to the temporal and implicit relations of asynchronous, DOM-related and event-driven entities spread over the client and server sides.

The goal of the work presented in this dissertation is to facilitate program comprehension through the following techniques. First, we propose a generic technique for capturing low-level event-based interactions in a web application and mapping those to a higher-level behavioural model. This model is then transformed into an interactive visualization, representing episodes of execution through different semantic levels of granularity. Then, we present a DOM-sensitive hybrid change impact analysis technique for JavaScript through a combination of static and dynamic analysis. Our approach incorporates a novel ranking algorithm for indicating the importance of each entity in the impact set. Next, we introduce a method for capturing a behavioural model of full-stack JavaScript applications' execution. The model is temporal and context-sensitive to accommodate asynchronous events, as well as the scheduling and execution of lifelines of callbacks. We present a visualization of the model to facilitate program comprehension for developers. Finally, we propose an approach for facilitating comprehension by creating an abstract model of software behaviour. The model encompasses hierarchies of recurring and application-specific motifs. The motifs are abstract patterns extracted from traces through our novel technique, inspired by bioinformatics algorithms. The motifs provide an overview of the behaviour at a high level, while encapsulating semantically related sequences in execution. We design a visualization that allows developers to observe and interact with inferred motifs.

We implement our techniques in open-source tools and evaluate them through a set of controlled experiments. The results show that our techniques significantly improve developers' performance in comprehending the behaviour and impact of change in software systems.

Lay Summary

Program comprehension is crucial in software engineering; it is a necessary step for performing many tasks. Assisting comprehension through analysis of code and program execution traces has been a popular research area. However, the implicit and intricate relations between program entities hinder comprehension of program behaviour and change. It is particularly challenging to understand modern programming languages such as JavaScript, which has grown to be among the most popular languages for both client and server development.

The goal of the work presented in this dissertation is to facilitate program comprehension through semi-automated techniques, using both static and dynamic analysis.
Our techniques create behavioural models of program execution through our proposed algorithms, and visualize them for developers in order to improve their performance. We evaluate our techniques through a set of controlled experiments. The results show that our methods significantly improve developers' performance in terms of task completion duration and accuracy.

Preface

Each research chapter of this dissertation relates to one or more papers, which have been published or are currently under review. I have collaborated with my supervisors, Dr. Ali Mesbah and Dr. Karthik Pattabiraman, in conducting the research projects in all chapters. I was the main contributor for all chapters, including the idea, design, development, and evaluation of the work. I had the collaboration of Sheldon Sequeira, a former Master's student in our lab, for Chapter 2.

The publications for each chapter are as follows.

• Chapter 2:
  – Understanding JavaScript Event-based Interactions [4]: S. Alimadadi, S. Sequeira, A. Mesbah and K. Pattabiraman, ACM/IEEE International Conference on Software Engineering (ICSE), 2014, 11 pages. (Acceptance Rate: 20%) ACM SIGSOFT Distinguished Paper Award.
  – Understanding JavaScript Event-based Interactions with CLEMATIS [8]: S. Alimadadi, S. Sequeira, A. Mesbah and K. Pattabiraman, ACM Transactions on Software Engineering and Methodology (TOSEM), 2016, 39 pages.
• Chapter 3:
  – Hybrid DOM-Sensitive Change Impact Analysis for JavaScript [6]: S. Alimadadi, A. Mesbah and K. Pattabiraman, European Conference on Object-Oriented Programming (ECOOP), 2015, 25 pages. (Acceptance Rate: 22.8%)
• Chapter 4:
  – Understanding Asynchronous Interactions in JavaScript Applications [7]: S. Alimadadi, A. Mesbah and K. Pattabiraman, ACM/IEEE International Conference on Software Engineering (ICSE), 2016, 12 pages. (Acceptance Rate: 19%)
• Chapter 5:
  – Inferring Hierarchical Motifs from Execution Traces: S. Alimadadi, A. Mesbah and K. Pattabiraman, 12 pages (submitted).

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgments
Dedication
1 Introduction
  1.1 Challenges and Motivation
    1.1.1 Understanding Dynamic JavaScript Behaviour
    1.1.2 Understanding Impact of Change in JavaScript
    1.1.3 Understanding Hierarchical Motifs of Behaviour in Execution
  1.2 Origin of Chapters and Acknowledgements
2 Understanding JavaScript Event-Based Interactions
  2.1 Challenges and Motivation
    2.1.1 Challenge 1: Event Propagation
    2.1.2 Challenge 2: Asynchronous Events
    2.1.3 Challenge 3: Implications of Events
    2.1.4 Challenge 4: Linking Test Failures to Faults
  2.2 Approach
    2.2.1 JavaScript Transformation and Tracing
    2.2.2 Capturing a Behavioural Model
    2.2.3 Understanding Test Assertion Failures
    2.2.4 Visualizing the Captured Model
    2.2.5 Tool Implementation: Clematis
  2.3 Controlled Experiments
    2.3.1 Experimental Design
    2.3.2 Experimental Procedure
  2.4 Experiment 1: Lab Environment
    2.4.1 Approach
    2.4.2 Results
  2.5 Experiment 2: Industrial
    2.5.1 Approach
    2.5.2 Results
    2.5.3 Qualitative Analysis of Participant Feedback
  2.6 Experiment 3: Test Failure Comprehension
    2.6.1 Approach
    2.6.2 Results
  2.7 Performance Overhead
  2.8 Discussion
    2.8.1 Task Completion Duration
    2.8.2 Task Completion Accuracy
    2.8.3 Consistent Performance
    2.8.4 Threats to Validity
    2.8.5 Limitations
  2.9 Concluding Remarks
3 Hybrid DOM-Sensitive Change Impact Analysis for JavaScript
  3.1 Impact Transfer in JavaScript
    3.1.1 Impact through the DOM
    3.1.2 Impact through Event Propagation
    3.1.3 Impact through Asynchronous Callbacks
    3.1.4 JavaScript Dynamism
    3.1.5 Impact Paths
  3.2 Exploratory Study: DOM-related and Event-based Impacts
  3.3 Hybrid Analysis
    3.3.1 Static Control-Flow and Partial Data-Flow Analysis
    3.3.2 Analyzing the Dynamic Features
    3.3.3 Hybrid Model for Impact Analysis
  3.4 Impact Metrics and Impact Set Ranking
  3.5 Tool Implementation: TOCHAL
  3.6 Evaluation
    3.6.1 Study 1: Comparing Static, Dynamic, and Hybrid Analyses
    3.6.2 Study 2: Industrial Controlled Experiment
  3.7 Concluding Remarks
4 Understanding Asynchronous Interactions in Full-Stack JavaScript
  4.1 Challenges and Motivation
    4.1.1 Challenge 1: Server-Side Callbacks
    4.1.2 Challenge 2: Asynchronous Client Side
    4.1.3 Challenge 3: Network Communication
  4.2 Approach
    4.2.1 Temporal and Context-Sensitive Model
    4.2.2 Client-Side Analysis
    4.2.3 Server-Side Analysis
    4.2.4 Connecting Client and Server
    4.2.5 Visualizing the Model
    4.2.6 Implementation
  4.3 Evaluation
    4.3.1 Experimental Setup
    4.3.2 Results
    4.3.3 Discussion
  4.4 Conclusion
5 Inferring Hierarchical Motifs from Execution Traces
  5.1 Introduction
  5.2 Challenges and Motivation
  5.3 Overview of the Methodology
  5.4 Algorithm for Inferring Motifs
    5.4.1 Inspiration from Analyzing Biological Sequences
    5.4.2 Forming a Knowledge Base
    5.4.3 Finding Exact Matches
    5.4.4 Allowing Abstraction in Motifs
    5.4.5 Inferring Hierarchies of Motifs
  5.5 Creating and Visualizing the Model
    5.5.1 Creating the Motif Model
    5.5.2 Visualizing the Model
  5.6 Implementation: SABALAN
  5.7 Evaluation
    5.7.1 Motif Characteristics
    5.7.2 Controlled Experiment
    5.7.3 Discussion
    5.7.4 Threats to Validity
  5.8 Concluding Remarks
6 Related Work
7 Concluding Remarks
  7.1 Contributions
  7.2 Research Questions Revisited
  7.3 Reflections and Future Directions
Bibliography

List of Tables

Table 2.1 Adopted and adapted comprehension activities.
Table 2.2 Comprehension tasks used in study 1.
Table 2.3 Comprehension tasks used in study 2.
Table 2.4 Injected faults for the controlled experiment.
Table 3.1 (A) Results of analyzing JavaScript's DOM-related and dynamic features. (B) Factors in determining impact metrics.
Table 3.2 Impact transfer through different entities.
Table 3.3 Impact metrics.
Table 3.4 Results of comparison between static, dynamic and TOCHAL (RQ1): (A) comparison of impact sets; (B) comparison of functions in system dependency graphs.
Table 3.5 Impact analysis tasks used in the controlled experiment.
Table 4.1 Types of vertices in the model graph.
Table 4.2 Interaction edges.
Table 4.3 Creation and extension of the behavioural graph based on the operations.
Table 4.4 Comprehension tasks of the experiment.
Table 5.1 Characteristics of traces and inferred motifs.
Table 5.2 Comprehension tasks used in the study.

List of Figures

Figure 2.1 Initial DOM state of the running example.
Figure 2.2 JavaScript code of the running example.
Figure 2.3 Test assertion understanding example: (a) JavaScript code, (b) portion of DOM-based UI, (c) partial DOM, (d) DOM-based (Selenium) test case, (e) test case assertion failure. The dotted lines show the links between the different entities that must be inferred.
Figure 2.4 A processing overview of our approach.
Figure 2.5 Instrumented JavaScript function declaration.
Figure 2.6 Instrumented JavaScript return statement.
Figure 2.7 Instrumented JavaScript function calls.
Figure 2.8 Comparison of instrumentation techniques for JavaScript function calls.
Figure 2.9 Relating assertions to DOM accesses for the test case of Figure 2.3d.
Figure 2.10 Example JavaScript code after our selective instrumentation is applied. Slicing criteria: <10, size>
Figure 2.11 Overview of all captured stories.
Figure 2.12 Top: menu of CLEMATIS. Bottom: overview of a captured story.
Figure 2.13 Three semantic zoom levels in CLEMATIS. Top: overview. Middle: zoomed one level into an episode, while preserving the context of the story. Bottom: drilled down into the selected episode.
Figure 2.14 Visualization for a test case. (a) Overview of the passing test case. (b) Three semantic zoom levels for the failing test case; top: overview; middle: second zoom level showing assertion details, while preserving the context; bottom: summary of the failing assertion and the backwards slice.
Figure 2.15 t-test analysis with unequal variances of task completion duration by tool type. Lower values are better. [Study 1]
Figure 2.16 Box plots of task completion duration data per task for each tool. Lower values are better. [Study 1]
Figure 2.17 t-test analysis with unequal variances of task completion accuracy by tool type. Higher values are better. [Study 1]
Figure 2.18 Box plots of task completion accuracy data per task for each tool. Higher values are better. [Study 1]
Figure 2.19 Notched box plots of task completion duration data per task and in total for the control (gold) and experimental (green) groups (lower values are desired). [Study 2]
Figure 2.20 Notched box plots of task completion accuracy data per task and in total for the control (gold) and experimental (green) groups (higher values are desired). [Study 2]
Figure 2.21 Box plots of task completion accuracy data per task and in total for the control (blue) and experimental (cream) groups (higher values are desired). [Study 3]
Figure 2.22 Box plots of task completion duration data per task and in total for the control (blue) and experimental (cream) groups (lower values are desired). [Study 3]
Figure 3.1 Motivating example: JavaScript code.
Figure 3.2 Motivating example: HTML/DOM.
Figure 3.3 Impact transfer through DOM elements.
Figure 3.4 Impact transfer through event propagation.
Figure 3.5 Impact transfer through asynchronous callbacks.
Figure 3.6 A static call graph, displaying the dependencies extracted from the running example (Figures 3.1 and 3.2).
Figure 3.7 A sample hybrid graph, including the dynamic and DOM-sensitive dependencies extracted from the running example (Figures 3.1 and 3.2).
Figure 3.8 Task completion accuracy per task and in total for the control (ctrl) and experimental (exp) groups (RQ4.2.1).
Figure 3.9 Task completion duration data per task and in total for the control (ctrl) and experimental (exp) groups (RQ4.2.2).
Figure 4.1 Receiving HTTP requests at an end point.
Figure 4.2 Callback hell.
Figure 4.3 Asynchronous client-side JavaScript.
Figure 4.4 A sample temporal, context-sensitive and asynchronous model of events, lifelines, and interactions.
Figure 4.5 A snapshot of the visualization.
Figure 4.6 Accuracy results. Gold plots display the experimental (SAHAND) group, and green plots display the control group. Higher values are better.
Figure 5.1 I: A sample registration form. II: (A) Sample execution trace, and (B) hierarchy of inferred motifs. III: Dynamic call graph of the example.
Figure 5.2 Initial DOM state of the running example.
Figure 5.3 [Partial] JavaScript code of the running example.
Figure 5.4 A DB of traces (A) and a sample query trace (B) of an application, on the left and right side, respectively. Exact matches of length 2 and 3 between the query trace and different DB traces are marked.
Figure 5.5 (A) The process of taking two expanded trace subsets, (B, C) forming a scoring matrix based on similarities between sub-traces, and (D, E) finding a match in a manner that maximizes the similarities between sub-traces. The final motif can be seen in (F).
Figure 5.6 Sample model of the running example. The root node (1) is the highest-level inferred motif. Node 2 is a sub-motif of node (1), marked by the hierarchical edge between the two. Node 3 is abstract, allowing variations of its child node to occur in the motif. The leaves (nodes 4-8) are concrete function executions in the trace.
Figure 5.7 A [modified] screenshot of the visualization. (A): Query trace. (B): Inferred motifs depicted on the table. (C): Motif hierarchies. (D): All motifs. (E): Code panel displaying selected function/motif code.
Figure 5.8 Notched box plots of accuracy results. Green plots display the experimental (SABALAN) group, and gold plots display the control group. Higher values are better.

Acknowledgments

Foremost, I wish to express my utmost gratitude to my senior supervisor, Dr. Ali Mesbah. I am indebted to him for his expertise and insight, and for continuously supporting me in my research while steering me in the right direction when needed. I owe my deepest gratitude to my co-supervisor, Dr. Karthik Pattabiraman, for his inspiration and motivation, and for perfectly balancing the roles of a critical reviewer and a kind advisor at the same time. I am thankful to my supervisors for allowing me to discover my path, while always being there for guidance.

I am most grateful to Dr. Ivan Beschastnikh and Dr. Alexandra Fedorova for their insightful comments and hard questions. My sincere gratitude goes to my thesis committee, Dr. Gail C. Murphy, Dr. Sidney Fels, and Dr. William G.J. Halfond. I would also like to acknowledge all my dear friends and colleagues in the Software Analysis and Testing (SALT) Lab at UBC.

I wish to thank my beloved brothers, Siavash and Soroush. I am sincerely grateful to you for your constant love and support. I cannot find words to express my gratitude to my parents, Parviz and Parvin. I am eternally grateful to you for your unconditional and endless love, guidance and support.

Dedication

To my parents.

Chapter 1
Introduction

Program comprehension is known to be an essential step in software engineering. Developers spend a considerable amount of time understanding code. About 50% of maintenance effort is spent on comprehension alone [31]. To understand code, developers typically start by searching for clues in the code and the environment. Then they navigate the incoming and outgoing dependencies to relate pieces of foraged information. Throughout the process, they collect information they find relevant for understanding the code on an "as-needed" basis [71]. However, developers often fail in searching and relating information, and lose track of relevant information when using such ad-hoc strategies [117]. Further, they form a mental model of the entities, relations, and the intent of the code, which they use throughout development to help them make decisions. Unfortunately, these models are almost always inaccurate [92]. Thus, code understanding is challenging, and there is a need for systematic and automated techniques that facilitate this process [75]. Further, the presence of programming language characteristics such as dynamism, asynchrony, and non-determinism in the execution makes the analysis more problematic and burdensome, and renders conventional techniques ineffective. It is a particularly challenging endeavour to understand the complex behaviour of modern programming languages, such as JavaScript.

JavaScript is widely used today to create interactive modern web applications that replace many traditional desktop applications. It has been selected as the most popular programming language for five consecutive years [126] and it is the most used language on GitHub [74]. However, understanding the behaviour of web applications is troublesome for developers [101, 143].

1.1 Challenges and Motivation

The goal of this thesis is to investigate support for developers to improve their performance in program comprehension tasks.
However, the implicit and intricate relations between program entities hinder comprehension of program behaviour. Comprehending client- and server-side JavaScript applications is particularly challenging due to the temporal and implicit relations of asynchronous, dynamic, and event-driven entities split between the client and server. These relations also affect the way a change propagates in a JavaScript application and impacts the system as a whole. Moreover, the size and complexity of execution traces further complicate comprehension of real-world applications.

1.1.1 Understanding Dynamic JavaScript Behaviour

Despite their popularity, understanding the behaviour of modern web applications is still a rigorous task for developers during development and maintenance tasks. The challenges mainly stem from the dynamic, event-driven, and asynchronous nature of the JavaScript language.

Understanding Event-Based Interactions

To understand JavaScript, developers must understand its weakly-typed, event-driven, and highly-dynamic nature, its interactions with the Document Object Model (DOM), and its communications with the server. Unfortunately, despite its importance and challenges, there is currently not much research dedicated to supporting program comprehension for web applications [34]. Popular tools, such as Firebug and Chrome DevTools, are limited in their capabilities to support web developers effectively.

Research Question 1. How can we enhance developers' performance in understanding the event-based interactions in client-side JavaScript?

To address RQ1, we propose a generic technique for capturing low-level event-based interactions in a web application and mapping those to a higher-level behavioural model. This model is then transformed into an interactive visualization, representing episodes of triggered causal and temporal events, related JavaScript code executions, and their impact on the dynamic state of the DOM. Our approach allows developers to easily understand the complex dynamic behaviour of their application at three different semantic levels of granularity. Furthermore, it helps developers bridge the gap between test cases and program code by localizing the fault related to a test assertion. This method is discussed in Chapter 2.

Understanding Test Failures and Their Root Causes

To test their web applications, developers often write test cases that check the application's behaviour from an end-user's perspective using popular frameworks such as Selenium [121]. Such test cases are agnostic of the JavaScript code and operate by simulating a series of user actions followed by assertions on the application's runtime DOM. As such, they can detect deviations in the expected behaviour as observed in the DOM.

However, when a web application test assertion fails, determining the faulty program code responsible for the failure can be challenging. Fault localization is one of the most difficult phases of debugging [132], and has been an active topic of research [1, 28, 67, 144]. Although testing of modern web applications has received increasing attention in the recent past [15, 88, 129], there has been limited work on what happens after a test reveals an error.

Research Question 2. How can we enhance developers' performance in understanding the root causes of test assertion failures in client-side JavaScript?

We extend our approach for understanding program behaviour in Chapter 2 to further aid developers in understanding the root causes of test failures.
Our test-case comprehension strategy automatically connects test assertion failures to the faulty JavaScript code, considering the involved DOM elements.

Asynchronous Interactions in Full-Stack JavaScript

JavaScript has been the lingua franca of client-side web development for some years. But platforms such as Node.js [97] have made it possible to use JavaScript for writing code that runs outside of the browser. As such, "full-stack" applications written entirely in JavaScript, from the client side to the server side, have also seen an exponential growth recently. Node.js provides a light-weight, non-blocking, fast, and scalable platform for writing network-based applications. It is also more convenient for web developers to use the same language for both front- and back-end development. Despite all the advantages, this approach imposes many challenges on the developers' comprehension of the dynamic execution of a web application. Understanding such applications is challenging for developers, due to the temporal and implicit relations of asynchronous and event-driven entities split across the client and server side. JavaScript applications take extensive advantage of asynchronous callbacks [49] to simulate concurrency, which complicates the flow of execution. The impact of asynchrony intensifies due to the communication between the client and server. These applications interact heavily with the DOM, events, timers, and XHR objects, across both client and server, all of which negatively affect developers' understanding. The uncertainty involved in the asynchronous communication makes the execution more intricate and thus more difficult to understand for developers.

Despite the popularity of full-stack JavaScript development and the severity of these challenges, there is currently no technique available that provides a holistic overview of the execution of JavaScript code in full-stack web applications. The existing techniques do not support full-stack JavaScript comprehension [4, 10, 61, 84, 101, 143].

Research Question 3. How can we improve developers' understanding of the temporal and asynchronous behaviour of full-stack JavaScript?

In Chapter 4, we propose a technique for capturing a behavioural model of full-stack JavaScript applications' execution. The model is temporal and context-sensitive to accommodate asynchronous events, as well as the scheduling and execution of lifelines of callbacks. We present a visualization of the model to facilitate program understanding for developers. Our goal is to help developers gain a holistic view of the dynamic behaviour of full-stack JavaScript applications.

1.1.2 Understanding Impact of Change in JavaScript

To remain useful, a software program must continually change to adapt to the changing environment [77]. Code change impact analysis (CIA) [13] aims at identifying parts of the program that are potentially affected by a change in the code. Impact analysis has been a popular research area [12, 20, 105, 111, 114]. Most of the research, however, is focused on traditional programming languages and ignores JavaScript and thereby modern web applications.

JavaScript requires a novel approach for effective change impact analysis, because of its unique set of features that make it challenging to comprehend [4] and analyze by traditional code analysis techniques [47]. Each of the parties involved in the execution of a JavaScript application can introduce new and implicit relations in the system that can transfer the impact of a change seamlessly.
For instance, we have observed that the impact of a code change can be propagated through the DOM, even when there may be no visible connections between JavaScript functions and variables in the JavaScript code [6]. Similarly, events and server interactions can transfer the impact of a change, in addition to the JavaScript code itself.

Traditional impact analysis has been performed either by static analysis or dynamic analysis. However, the aforementioned challenges make it difficult for JavaScript static analysis techniques [47, 66, 125] to carry out impact analysis effectively. None of these techniques can provide support for the DOM-based, dynamic, and asynchronous JavaScript features. Further, current dynamic and hybrid analysis techniques [136, 137] ignore the aforementioned challenges, i.e., they overlook the important role of the DOM in their analysis and they do not support the hidden relations that are created through event-handler registration, event propagation, and asynchronous client/server communication.

Research Question 4. How does providing a model of the dependencies in the application improve developers' performance in understanding the change impact in JavaScript applications?

We first perform an empirical study of change propagation, the results of which show that the DOM-related and dynamic features of JavaScript need to be taken into consideration in the analysis, since they affect change impact propagation. We propose a DOM-sensitive hybrid change impact analysis technique for JavaScript through a combination of static and dynamic analysis. The proposed approach incorporates a novel ranking algorithm for indicating the importance of each entity in the impact set. Our approach is explained in Chapter 3.

1.1.3 Understanding Hierarchical Motifs of Behaviour in Execution

Assisting comprehension through dynamic analysis of execution traces is a popular research area. Traces are rich sources of information regarding the behaviour of a program, leading to precise analysis techniques. However, these techniques typically do not scale well, due to the size and complexity of execution traces.

Common techniques such as trimming, summarizing, and visualizing traces [75, 92] do not solve the problem for large, complex, real-world applications, and understanding the higher-level key points of the semantics of program behaviour remains difficult [33, 141, 142]. Prior work's interpretation of execution patterns, where it exists, differs from ours: these papers have predominantly focused on generic and pre-defined design patterns, low-level architectural relations between program artifacts, or visualizations of all details of execution [7, 21, 60, 73].

Research Question 5. How does providing high-level and semantic motifs of an application's behaviour improve the comprehensibility of the application?

To address this research question, we propose a generic technique in Chapter 5 for facilitating comprehension by creating an abstract model of software behaviour. The model encompasses hierarchies of recurring and application-specific motifs. The motifs are abstract patterns extracted from traces through our novel technique, inspired by bioinformatics algorithms. The motifs provide an overview of the behaviour at a high level, while encapsulating semantically related sequences in execution. We design a visualization that allows developers to observe and interact with inferred motifs.

1.2 Origin of Chapters and Acknowledgements

Chapters 2-4 are based on published peer-reviewed papers listed below.
Chapter 5 is based on a paper that is currently under submission at a software engineering conference. The author of this thesis is the main contributor of all chapters. Chapter 2 is co-authored by Sheldon Sequeira, a former Master's student in the SALT lab. All chapters are co-authored by the author's supervisors: Dr. Ali Mesbah and Dr. Karthik Pattabiraman.

• Chapter 2. Addressing RQ1, this chapter was originally published in the ACM/IEEE International Conference on Software Engineering (ICSE), in 2014. This paper received an ACM SIGSOFT Distinguished Paper Award. The extended version of the paper, addressing RQ2, was later published in ACM Transactions on Software Engineering and Methodology (TOSEM), in 2016.
  – Understanding JavaScript Event-based Interactions [4]: S. Alimadadi, S. Sequeira, A. Mesbah and K. Pattabiraman, ACM/IEEE International Conference on Software Engineering (ICSE), 2014, 11 pages. (Acceptance Rate: 20%)
  – Understanding JavaScript Event-based Interactions with CLEMATIS [8]: S. Alimadadi, S. Sequeira, A. Mesbah and K. Pattabiraman, ACM Transactions on Software Engineering and Methodology (TOSEM), 2016, 39 pages.
• Chapter 3. This chapter, targeting RQ4, was published in the European Conference on Object-Oriented Programming (ECOOP) in 2015.
  – Hybrid DOM-Sensitive Change Impact Analysis for JavaScript [6]: S. Alimadadi, A. Mesbah and K. Pattabiraman, European Conference on Object-Oriented Programming (ECOOP), 2015, 25 pages. (Acceptance Rate: 22.8%)
• Chapter 4. This chapter addresses RQ3 and was published at the ACM/IEEE International Conference on Software Engineering (ICSE) in 2016.
  – Understanding Asynchronous Interactions in JavaScript Applications [7]: S. Alimadadi, A. Mesbah and K. Pattabiraman, ACM/IEEE International Conference on Software Engineering (ICSE), 2016, 12 pages. (Acceptance Rate: 19%)
• Chapter 5. This chapter, addressing RQ5, is currently under submission at a software engineering conference. The original idea of this paper was accepted to the Doctoral Symposium track at the ACM SIGSOFT International Symposium on the Foundations of Software Engineering (FSE) in 2016.
  – Inferring Hierarchical Motifs from Execution Traces: S. Alimadadi, A. Mesbah and K. Pattabiraman, 12 pages, under review.
  – Understanding Behavioural Patterns in JavaScript [3]: S. Alimadadi, ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE), 2016, 3 pages.

Chapter 2
Understanding JavaScript Event-Based Interactions

Web applications have become one of the fastest growing types of software systems today. Despite their popularity, understanding the behaviour of modern web applications is still challenging for developers during development and maintenance tasks. The challenges mainly stem from the dynamic, event-driven, and asynchronous nature of the JavaScript language. We propose a generic technique for capturing low-level event-based interactions in a web application and mapping those to a higher-level behavioural model. This model is then transformed into an interactive visualization, representing episodes of triggered causal and temporal events, related JavaScript code executions, and their impact on the dynamic DOM state. Our approach, implemented in a tool called CLEMATIS, allows developers to easily understand the complex dynamic behaviour of their application at three different semantic levels of granularity. Furthermore, CLEMATIS helps developers bridge the gap between test cases and program code by localizing the fault related to a test assertion.
The results of our industrial controlled experiment show that CLEMATISis capable of improving the comprehension task accuracy by 157%, while reducingthe task completion time by 47%. A follow up experiment reveals that CLEMATISimproves the fault localization accuracy of developers by a factor of two.Understanding the implicit and intricate relations between program entities inJavaScript applications is challenging for many reasons. First, the weakly-typed9and highly-dynamic nature of JavaScript makes it a particularly difficult languageto analyze. Second, JavaScript code is extensively used to seamlessly mutate theDocument Object Model (DOM) at runtime. This dynamic interplay between twoseparate entities, namely JavaScript and the DOM, can become quite complex tofollow [98]. Third, JavaScript is an event-driven language allowing developers toregister various event listeners on DOM nodes. While most events are triggeredby user actions, timing events and asynchronous callbacks can be fired with nodirect input from the user. To make things even more complex, a single event canpropagate on the DOM tree and trigger multiple listeners according to the eventcapturing and bubbling properties of the event model [133].In this chapter, we present a generic, non-intrusive technique, called CLEMATIS,for supporting web application comprehension. Through a combination of auto-mated JavaScript code instrumentation and transformation, we capture a detailedtrace of a web application’s behaviour during a particular user session. Our tech-nique transforms the trace into an abstract behavioural model, preserving temporaland causal relations within and between involved components. The model is thenpresented to the developers as an interactive visualization that depicts the creationand flow of triggered events, the corresponding executed JavaScript functions, andthe mutated DOM nodes.We then apply our approach to further aid developers in understanding rootcauses of test failures. Fault localization is one of the most difficult phases ofdebugging [132], and is an active topic of research [1, 28, 67, 144]. Althoughtesting of modern web applications has received attention [15, 88, 129], there hasbeen limited work on what happens after a test reveals an error.To the best of our knowledge, we are the first to provide a generic techniquefor capturing low-level event-based interactions in a JavaScript application, andmapping and visualizing those interactions as higher-level behavioural models. Weextend the approach, as we propose a novel test failures comprehension unit andevaluate its effectiveness through a user experiment. Overall, our work makes thefollowing key contributions:• We propose a generic technique for capturing and presenting the complexdynamic behaviour of web applications. 
In particular, our technique:
  – Captures the consequences of JavaScript and DOM events in terms of the executed JavaScript code, including the functions that are called indirectly through event propagation on the DOM tree.
  – Extracts the source-and-target relations for asynchronous events, i.e., timing events and XMLHttpRequest requests/callbacks.
  – Identifies and tracks mutations to the DOM caused by each event.
• We build a novel model for capturing the event-driven interactions, as well as an interactive, visual interface supporting the comprehension of the program through three different semantic levels of zooming granularity.
• We implement our technique in a generic open-source tool called CLEMATIS, which (1) does not modify the web browser, (2) is independent of the server technology, and (3) requires no extra effort from the developer to use.
• We extend CLEMATIS to automatically connect test assertion failures to the faulty JavaScript code, considering the involved DOM elements.
• We empirically evaluate CLEMATIS through three controlled experiments comprising 48 users in total. The first two studies investigate the code comprehension capabilities of CLEMATIS. One of these studies is carried out in a lab environment, while the other is carried out in an industrial setting. The results of the industrial experiment show that CLEMATIS can reduce task completion time by 47%, while improving accuracy by 157%. We evaluate the test failure comprehension unit of CLEMATIS through a third user experiment. The results show that CLEMATIS improves the fault localization accuracy of developers by a factor of two.

2.1 Challenges and Motivation

Modern web applications are largely event-driven. Their client-side execution is normally initiated in response to a user-action triggered event, a timing event, or the receipt of an asynchronous callback message from the server. As a result, web developers encounter many program comprehension challenges in their daily development and maintenance activities. We use an example, presented in Figures 2.1-2.2, to illustrate these challenges.

Furthermore, developers often write test cases that assert the behaviour of a web application from an end-user's perspective. However, when such test cases fail, it is difficult to relate the assertion failure to the faulty line of code. The challenges mainly stem from the existing disconnect between front-end test cases that assert the DOM and the application's underlying JavaScript code. We use another example, illustrated in Figure 2.3, to demonstrate these testing challenges.

Note that these are simple examples; the challenges are much more potent in large and complex web applications.

2.1.1 Challenge 1: Event Propagation

The DOM event model [133] makes it possible for a single event, fired on a particular DOM node, to propagate through the DOM tree hierarchy and indirectly trigger a series of other event handlers attached to other nodes. There are typically two types of event propagation in web applications: (1) with bubbling enabled, an event first triggers the handler of the deepest child element on which the event was fired, and then it bubbles up the DOM tree and triggers the parents' handlers; (2) when capturing is enabled, the event is first captured by the parent element and then passed to the event handlers of the children, with the deepest child element being the last. Hence, a series of lower-level event handlers, executed during the capturing and bubbling phases, may be triggered by a single user action, as the sketch below illustrates. The existence or the ordering of these handlers is often inferred manually by the developer, which becomes more challenging as the size of the code/DOM tree increases.
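To make the ordering concrete, here is a minimal sketch of our own (not part of the running example); it assumes a DIV with id "outer" that contains a BUTTON with id "inner":

// The third argument of addEventListener enables the capturing phase when true.
var outer = document.getElementById('outer');
var inner = document.getElementById('inner');

outer.addEventListener('click', function() { console.log('1: outer, capturing'); }, true);
inner.addEventListener('click', function() { console.log('2: inner, at target'); });
outer.addEventListener('click', function() { console.log('3: outer, bubbling'); }, false);

// A single click on the inner button logs 1, 2, 3: the parent's capturing
// handler runs before the target's handler, and its bubbling handler runs last.

A single user action thus executes three handlers on two different elements, and nothing at the click site reveals this ordering.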
Example. Consider the sample code shown in Figures 2.1-2.2. Figure 2.1 represents the initial DOM structure of the application. It mainly consists of a fieldset containing a set of elements for the users to enter their email address to be registered for a service. The JavaScript code in Figure 2.2 partly handles this submission.

1 <BODY>
2   <FIELDSET class="registration">
3     Email: <INPUT type="text" id="email"/>
4     <BUTTON id="submitBtn">Submit</BUTTON>
5     <DIV id="regMsg"></DIV>
6   </FIELDSET>
7 </BODY>

Figure 2.1: Initial DOM state of the running example.

1  $(document).ready(function() {
2    $('#submitBtn').click(submissionHandler);
3    $('fieldset.registration').click(function() {
4      setTimeout(clearMsg, 3000);
5    }); });
6  ...
7  function submissionHandler(e) {
8    $('#regMsg').html("Submitted!");
9    var email = $('#email').val();
10   if (isEmailValid(email)) {
11     informServer(email);
12     $('#submitBtn').attr("disabled", true);
13   }
14 }
15 ...
16 function informServer(email) {
17   $.get('/register/', { 'email': email }, function(data) {
18     $('#regMsg').append(data);
19   });
20   return;
21 }
22 ...
23 function clearMsg() { $('#regMsg').fadeOut(2000); }

Figure 2.2: JavaScript code of the running example.

When the user clicks the submit button, a message appears indicating that the submission was successful. This message is displayed from within the event handler submissionHandler() (line 7), which is attached to the button on line 2 of Figure 2.2. However, after a few seconds, the developer observes that the message unexpectedly starts to fade out. In the case of this simple example, she can read the whole code and find out that the click on the submit button has bubbled up to its parent element, namely fieldset. Closer inspection reveals that fieldset's anonymous handler function is responsible for changing the value of the same DOM element through a setTimeout function (lines 3-5 in Figure 2.2). In a more complex application, the developer may be unaware of the existence of the parent element, its registered handlers, or complex event propagation mechanisms such as bubbling and capturing.

2.1.2 Challenge 2: Asynchronous Events

Web browsers provide a single thread for web application execution. To circumvent this limitation and build rich, responsive web applications, developers take advantage of the asynchronous capabilities offered by modern browsers, such as timeouts and XMLHttpRequest (XHR) calls. Asynchronous programming, however, introduces an extra layer of complexity in the control flow of the application and adversely influences program comprehension.

Timeouts. Events can be registered in JavaScript to fire after a certain amount of time or at certain intervals. These timeouts often have asynchronous callbacks that are executed when triggered. In general, there is no easy way to link the callback of a timeout to its source, which is important for understanding the program's flow of execution.

XHR Callbacks. XHR objects are used to exchange data asynchronously with the server, without requiring a page reload. Each XHR goes through three main phases: open, send, and response. These three phases can be scattered throughout the code. Further, there is no guarantee on the timing and the order of XHR responses from the server. As in the case of timeouts, mapping the functionality triggered by a server response back to its source request is a challenging comprehension task for developers.
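Both behaviours can be seen in the following minimal sketch of our own (not part of the running example; the /register/ endpoint and the surrounding page are assumed):

function handler() {
  setTimeout(function() {            // timeout source
    console.log('timeout callback'); // runs at least 3000 ms later
  }, 3000);

  var xhr = new XMLHttpRequest();
  xhr.open('GET', '/register/');     // phase 1: open
  xhr.onload = function() {          // phase 3: response, whenever it arrives
    console.log('XHR callback');
  };
  xhr.send();                        // phase 2: send
  console.log('handler done');
}
handler();
// 'handler done' is logged first; the two callbacks fire later, in an order
// that depends on the server latency and the timeout delay.

The registering function has long since returned by the time either callback runs, so the callbacks appear in the execution with no syntactic link back to their source.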
As in the case of timeouts, mapping the functionality triggered by aserver response back to its source request is a challenging comprehension task fordevelopers.Example. Following the running example, the developer may wish to further investi-gate the unexpected behaviour: the message has faded out without a direct actionfrom the developer. The questions that a developer might ask at this point include:“What exactly happened here?” and “What was the source of this behaviour?”.By reviewing the code, she can find out that the source of this behaviour was theexpiration of a timeout that was set in line 4 of Figure 2.2 by the anonymous handlerdefined in lines 3–5. However the callback function, defined on line 23 of Figure2.2, executes asynchronously and with a delay, long after the execution of the anony-mous handler function has terminated. While in this case, the timing behaviour canbe traced by reading the code, this approach is not practical for large applications.A similar problem exists for asynchronous XHR calls. For instance, the anonymouscallback function of the request sent in the informServer function (line 17,Figure 2.2) updates the DOM (line 18).142.1.3 Challenge 3: Implications of EventsAnother challenge in understanding the flow of web applications lies in understand-ing the consequences of (in)directly triggered events. Handlers for a (propagated)DOM event, and callback functions of timeouts and XHR requests, are all Java-Script functions. Any of these functions may change the observable state of theapplication by modifying the DOM. Currently, developers need to read the code andmake the connections mentally to see how an event affects the DOM state, whichis quite challenging. In addition, there is no easy way of pinpointing the dynamicchanges made to the DOM state as a result of event-based interactions. Inferringthe implications of events is, therefore, a significant challenge for developers.Example. After the submitBtn button is clicked in the running example, aconfirmation message will appear on-screen and disappear shortly thereafter (lines8&23, Figure 2.2). Additionally, the attributes of the button are altered to disable it(line 12). It can be difficult to follow such DOM-altering features in an application’scode.2.1.4 Challenge 4: Linking Test Failures to FaultsTo test their web applications, developers often write test cases that check theapplication’s behaviour from an end-user’s perspective using popular frameworkssuch as Selenium1. Such test cases are agnostic of the JavaScript code and operateby simulating a series of user actions followed by assertions on the application’sruntime DOM. As such, they can detect deviations in the expected behaviour asobserved on the DOM.However, when a web application test assertion fails, determining the faultyprogram code responsible for the failure can be a challenging endeavour. The mainchallenge here is the implicit link between three different entities, namely, the testassertion, the DOM elements on which the assertion fails (checked elements), andthe faulty JavaScript code responsible for modifying those DOM elements. Tounderstand the root cause of the assertion failure, the developer needs to manuallyinfer a mental model of these hidden links, which can be tedious. 
2.1.4 Challenge 4: Linking Test Failures to Faults

To test their web applications, developers often write test cases that check the application's behaviour from an end-user's perspective, using popular frameworks such as Selenium (http://seleniumhq.org). Such test cases are agnostic of the JavaScript code and operate by simulating a series of user actions followed by assertions on the application's runtime DOM. As such, they can detect deviations in the expected behaviour as observed on the DOM.

However, when a web application test assertion fails, determining the faulty program code responsible for the failure can be a challenging endeavour. The main challenge here is the implicit link between three different entities, namely, the test assertion, the DOM elements on which the assertion fails (checked elements), and the faulty JavaScript code responsible for modifying those DOM elements. To understand the root cause of the assertion failure, the developer needs to manually infer a mental model of these hidden links, which can be tedious. Further, unlike in traditional (e.g., Java) applications, there is no useful stack trace produced when a web test case fails, as the failure is on the DOM and not in the application's JavaScript code. This further hinders debugging, as the fault usually lies within the application's code, and not in its representative DOM state. To the best of our knowledge, there is currently no tool support available to help developers in this test failure understanding and fault localization process.

Example. The example in Figure 2.3 uses a small code snippet based on the open-source WSO2 eStore application (https://github.com/wso2/product-es) to demonstrate the challenges involved and our solution. The store allows clients to customize and deploy their own digital storefront. A partial DOM representation of the page is shown in Figure 2.3c. Figure 2.3d shows a Selenium test case, written by the developers of the application for verifying the application's functionality with regards to "sorting" and "viewing" the existing assets. The JavaScript code responsible for implementing the functionality is shown in Figure 2.3a.

(a) JavaScript code:

1  var currentPage = 1;
2  var sortType = 'default';
3  var gridSize = 8;
4  var infiniteScroll = false;
5
6  var renderAssets = function(url, size) {
7    var data = assetsFromServer(url);
8
9    var temp = '<div class="asset-row">';
10   for (i = 0; i < size; i++) {
11     temp += '  <div class="asset-icon">';
12     ... // Reading from variable 'data'
13     temp += '  </div>';
14   }
15   temp += '</div>';
16
17   return $('#assets-container').append(temp);
18 };
19
20 $(document).on('click', '#sort-assets', function() {
21   $('#sort-assets').removeClass('selected-type')
22   $(this).addClass('selected-type');
23   currentPage = 1;
24   sortType = $(this).attr('type');
25   gridSize = 12;
26   renderAssets(url + sortType + currentPage, gridSize)
27   infiniteScroll = true;
28 });
29
30 var scroll = function() {
31   if (infiniteScroll) {
32     currentPage++;
33     renderAssets(url + sortType + currentPage, gridSize/2)
34   }
35 };
36 $(window).bind('scroll', scroll);

(b) Portion of the DOM-based UI: a screenshot of asset categories (All Categories, Bar Chart, Bubble Chart, Date Time, Directions by Google).

(c) Partial DOM:

1  <div class="row-sort-assets">
2    <div class="sort-assets"></div>
3    ...
4  </div>
5  <div id="assets-container" data-pages="">
6    <div class="asset-row">
7      <div class="asset-icon"></div>
8      ...
9    </div>
10 </div>

(d) DOM-based (Selenium) test case:

1  public void testSortByDefaults() {
2    driver.get("http://localhost:9763/store/assets/gadget");
3    driver.findElement(By.css("i.icon-star")).click();
4    int s1 = driver.findElements(By.css(".asset-icon")).size();
5    assertEquals(12, s1);
6
7    scrollWindowDown();
8    int s2 = driver.findElements(By.css(".asset-icon")).size();
9    assertEquals(4, s2 - s1);
10 }

(e) Test case assertion failure:

assertEquals(4, s2 - s1), AssertionFailure: expected <4> but was: <6>

Figure 2.3: Test assertion understanding example (a) JavaScript code, (b) portion of DOM-based UI, (c) partial DOM, (d) DOM-based (Selenium) test case, (e) test case assertion failure. The dotted lines (links 1-4) show the links between the different entities that must be inferred.

After setting up the environment, the test case performs a click to sort the assets. Then, an assertion is made to check whether the expected assets are present on the DOM of the page (line 5 of Figure 2.3d).
The second portion of the test case involves scrolling down the webpage and asserting the existence of four additional assets on the DOM (lines 7-9).

While the mapping between the test case and the related JavaScript code may seem trivial to find for this small example, challenges arise as the JavaScript code-base and the DOM increase in size. As a result, it can be difficult to understand the reason for a test case failure, or even which features are being tested by a given test case.

When a test case fails, one first needs to identify the dependencies of the test case. Based on the failure message in our example (Figure 2.3e), it is almost impossible to determine the cause of failure. Closer examination reveals the dependencies of the assertions on variables s1 and s2, which in turn depend on DOM elements with class asset-icon (link 1 in Figure 2.3).

Next, the developer/tester is forced to find the points of contact between the DOM elements and the JavaScript code. Finding the JavaScript code responsible for modifying this subset of DOM elements is not easy. In the context of our example, a developer would eventually conclude that line 17 of Figure 2.3a is actually responsible for appending elements to the DOM. Discovering such implicit links (links 2 and 3 in Figure 2.3) needs tedious examination in smaller programs, and may not be feasible in larger applications.

In JavaScript, events can trigger code execution and must be taken into account for finding the source of the fault. The renderAssets() function in our example (Figure 2.3) can be called from within two event handlers (lines 26 and 33, respectively, shown as link 4). While in our example it may be straightforward to link the call to scrollWindowDown() (line 7 of Figure 2.3d) to the execution of the event handler scroll (lines 30-35 of Figure 2.3a) due to the similarity in naming convention, such a linear mapping is neither possible in all cases nor easily inferable.

Finally, to fully understand an assertion and its possible cause of failure, the data and control dependencies of the DOM-altering statements must be determined and examined by the developer in order to identify all possible points of failure. In the case of eStore, the modification of the DOM within renderAssets() depends on the arguments passed into the function (lines 7 & 10). Dotted line 4 shows possible invocations of renderAssets(), revealing dependencies on global variables such as gridSize. Tracing the dependencies reveals that an update to gridSize on line 25 of Figure 2.3a is the root cause of the unusual behaviour.

2.2 Approach

In this section, we describe our approach for addressing the challenges mentioned in the previous section. An overview of the overall process is depicted in Figure 2.4, which consists of the following main steps:

[Figure 2.4: A processing overview of our approach. The original figure shows a pipeline running between the browser and a proxy server: (1) JavaScript transformation, (2) trace collection of DOM events, timeouts, XHR callbacks, function calls, and DOM mutations as JSON, (3) transformation to a Java data structure, (4) episode and story creation, (5) linking DOM mutations, (6) behavioural model generation, and (7) visualization.]

• First, our technique captures a fine-grained trace of all semantically related event-based interactions within a web application's execution, in a particular user session. The collection of this detailed trace is enabled through a series of automated JavaScript transformations (Section 2.2.1).
• Next, a behavioural model is extracted from the information contained within the trace. The model structures the captured trace and identifies the implicit causal and temporal relationships between various event-based interactions (Section 2.2.2).
• Then, the model is extended through a combination of selective code instrumentation and dynamic backward slicing to bridge the gap between test cases and program code (Section 2.2.3).
• Finally, based on the inferred behavioural model, our approach generates an interactive (web-based) user interface, visualizing and connecting all the pieces together. This interactive visualization assists developers during their web application comprehension and maintenance tasks (Section 2.2.4).

We describe each step further below.
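As a concrete, purely illustrative example of the collected trace: for a click on the running example's submit button, the sequence of JSON trace units might begin as follows. The field names follow the instrumentation figures later in this section; the DOM_EVENT unit and the timestamps are our own illustration:

{"messageType": "DOM_EVENT", "eventType": "click", "target": "submitBtn", "time": 1}
{"messageType": "FUNCTION_ENTER", "fnName": "submissionHandler", "args": ..., "time": 2}
{"messageType": "FUNCTION_CALL", "targetFunction": "isEmailValid", "time": 3}
{"messageType": "FUNCTION_CALL", "targetFunction": "informServer", "time": 4}
{"messageType": "FUNCTION_EXIT", "fnName": "submissionHandler", "time": 5}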
Capturing Timeouts and XHRs. For tracing timeouts, we replace the browser's setTimeout() method and the callback function of each timeout with wrapper functions, which allow us to track the instantiation and resolution of each timeout. A timeout callback usually happens later and triggers new behaviour, and thus we consider it as a component separate from the setTimeout() call itself. We link the two together through a timeout_id and represent them as a causal connection later. In our model, we distinguish between three different components for the open, send, and response phases of each XHR object. We intercept each component by replacing the XMLHttpRequest object of the browser. The new object captures the information about each component while preserving its functionality.
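The following sketch illustrates the idea for setTimeout(). The timeout_id-based linking mirrors the causal connection described above, while the counter and the trace fields are our own illustrative choices; send() is again the transport stub from before.

// Stand-in transport, as before (cf. Figure 2.5).
function send(traceUnit) { console.log(traceUnit); }

var originalSetTimeout = window.setTimeout;
var nextTimeoutId = 0; // illustrative counter linking a setTimeout() to its callback

window.setTimeout = function (callback, delay) {
  var timeoutId = ++nextTimeoutId;
  // Component 1: the instantiation of the timeout.
  send(JSON.stringify({ messageType: 'TIMEOUT_SET', timeoutId: timeoutId, delay: delay }));
  return originalSetTimeout.call(window, function () {
    // Component 2: the (later) resolution, linked back via timeoutId.
    send(JSON.stringify({ messageType: 'TIMEOUT_CALLBACK', timeoutId: timeoutId }));
    return callback();
  }, delay);
};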
Recording Function Traces. To track the flow of execution within a JavaScript-based application, we instrument three code constructs, namely function declarations, return statements, and function calls. Each of these code constructs is instrumented differently, as explained below.

Function Declarations: Tracing code is automatically added to each function declaration, allowing us to track the flow of control between developer-defined functions by logging the subroutine's name, arguments, and line number. In the case of anonymous functions, the line number and source file of the subroutine are used as supplementary information to identify the executed code.

As this communication is done each time a function is executed, argument values are recorded dynamically at the cost of a small overhead. Figure 2.5 contains the simple clearMsg() JavaScript function from the running example shown in Figure 2.2 (line 22), which has been instrumented to record both the beginning and end of its execution (lines 2 and 4).

1 function clearMsg() {
2   send(JSON.stringify({messageType: "FUNCTION_ENTER", fnName: "clearMsg", args: null, ...}));
3   $('#submissionMsg').fadeOut(2000);
4   send(JSON.stringify({messageType: "FUNCTION_EXIT", fnName: "clearMsg", ...}));
5 }

Figure 2.5: Instrumented JavaScript function declaration.

Return Statements: Apart from reaching the end of a subroutine, control can be returned back to a calling function through a return statement. There are two reasons for instrumenting return statements: (1) to accurately track nested function calls, and (2) to provide users with the line numbers of the executed return statements. Without recording the execution of return statements, it would be difficult to accurately track nested function calls. Furthermore, by recording return values and the line number of each return statement, CLEMATIS is able to provide users with contextual information that can be useful during the debugging process.

Figure 2.6 illustrates the instrumentation for the return statement of informServer(), a function originally shown in the running example (Figure 2.2, lines 16-21). The wrapper function RSW receives the return value of the function and the line number of the return statement, and is responsible for recording this information before execution of the application's JavaScript is resumed.

1 function informServer(email) {
2   $.get('/register/', { email }, function(data) {
3     $('#regMsg').append(data);
4   });
5   return RSW(null, 5);
6 }

Figure 2.6: Instrumented JavaScript return statement.

Function Calls: In order to report the source of a function invocation, our approach also instruments function calls. When instrumenting function calls, it is important to preserve both the order and context of each dynamic call. To accurately capture the function call hierarchy, we modify function calls with an inline wrapper function. This allows us to elegantly deal with two challenging scenarios. First, when multiple function calls are executed from within a single line of JavaScript code, it allows us to infer the order of these calls without the need for complex static analysis. Second, inline instrumentation enables us to capture nested function calls.
Figure 2.7 depicts the instrumentation of function calls for two methods from Figure 2.1, submissionHandler() and clearMsg(). Once instrumented using our technique, the function calls to isEmailValid() and informServer() are wrapped by the function FCW (lines 4 and 5). The interposing function FCW() executes immediately before each of the original function calls and interlaces our function logging with the application's original behaviour. Class methods html(), value(), attr(), and fadeOut() are also instrumented in a similar way (lines 2, 3, 6, and 10, respectively).

1  function submissionHandler(e) {
2    $('#regMsg')[FCW("html")]("Submitted!");
3    var email = $('#email')[FCW("value")]();
4    if (FCW(isEmailValid)(email)) {
5      FCW(informServer)(email);
6      $('#submitBtn')[FCW("attr")]("disabled", true);
7    }
8  }
9  function clearMsg() {
10   $('#regMsg')[FCW("fadeOut")](2000);
11 }
12 function FCW(fnName) { // Function Call Wrapper
13   send(JSON.stringify({messageType: "FUNCTION_CALL", ..., targetFunction: fnName}));
14   return fnName;
15 }

Figure 2.7: Instrumented JavaScript function calls.

For comparison, an alternative instrumentation technique is shown on lines 8-10 of Figure 2.8. While such a technique might be sufficient for measuring function coverage, it does not capture the order of execution accurately for nested function calls or when multiple function calls are made from a single line. Doing so would require more complex static analysis.

1  Before Instrumentation:
2  getRegistrationDate(getStudentNumber(document.getElementById('username').value));

4  After Clematis Instrumentation:
5  FCW(getRegistrationDate)(FCW(getStudentNumber)(document[FCW("getElementById")]('username').value));

7  Alternative Instrumentation:
8  FCW(getRegistrationDate);
9  FCW(getStudentNumber);
10 FCW(getElementById);
11 getRegistrationDate(getStudentNumber(document.getElementById('username').value));

Figure 2.8: Comparison of instrumentation techniques for JavaScript function calls.

DOM Mutations. Information about DOM mutations can help developers relate the observable changes of an application to the corresponding events and JavaScript code. To capture this important information, we introduce an observer module into the system. This information is interleaved with the logged information about events and functions, enabling us to link DOM changes with the JavaScript code that is responsible for these mutations.
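Our implementation builds on the Mutation Summary library (see Section 2.2.5); the sketch below conveys the same idea using the standard MutationObserver API, with an illustrative trace-unit format of our own choosing.

// Stand-in transport, as before (cf. Figure 2.5).
function send(traceUnit) { console.log(traceUnit); }

// Rough sketch: emit a trace unit for every observed DOM mutation so that
// mutations can be interleaved, by timestamp, with event and function units.
var observer = new MutationObserver(function (mutations) {
  mutations.forEach(function (m) {
    send(JSON.stringify({
      messageType: 'DOM_MUTATION',
      mutationType: m.type, // 'childList', 'attributes', or 'characterData'
      target: m.target.nodeName,
      addedNodes: m.addedNodes.length,
      removedNodes: m.removedNodes.length,
      timestamp: Date.now()
    }));
  });
});
observer.observe(document, {
  childList: true, attributes: true, characterData: true, subtree: true
});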
2.2.2 Capturing a Behavioural Model

We use a graph-based model to capture and represent a web application's event-based interactions. The graph is multi-edge and directed. It contains an ordered set of nodes, called episodes, linked through edges that preserve the chronological order of event executions.3 In addition, causal edges between the nodes represent asynchronous events. We describe the components of the graph below.

3 Because JavaScript is single-threaded on all browsers, the events are totally ordered in time.

Episode Nodes. An episode is a semantically meaningful part of the application behaviour, initiated by a synchronous or an asynchronous event. An event may lead to the execution of JavaScript code, and may change the DOM state of the application. An episode node contains information about the static and dynamic characteristics of the application, and consists of three main parts:

1. Source: This is the event that started the episode, and its contextual information. This source event is either a DOM event, a timeout callback, or a response to an XHR request, and often causes a part of the JavaScript code to be executed.

2. Trace: This includes all the functions that are executed either directly or indirectly after the source event occurs. A direct execution corresponds to functions that are called from within an event handler on a DOM element. An indirect execution corresponds to functions that get called due to the bubbling and capturing propagation of DOM events. The trace also includes all (a)synchronous events that were created within the episode. All the invoked functions and initiated events are captured in the trace part, and their original order of execution and dependency relations are preserved.

3. Result: This is a section in each episode summarizing the changes to the DOM state of the application. These changes are caused by the execution of the episode trace and are usually observable by the end-user.

Edges. In our model, edges represent a progression of time and are used to connect episode nodes. Two types of edges are present in the model:

1. Temporal: The temporal edges connect one episode node to another, indicating that an episode succeeded the previous one in time.

2. Causal: These edges are used to connect different components of an asynchronous event, e.g., timeouts and XHRs. A causal edge from episode s to d indicates episode s was caused by episode d in the past.

Story. The term story refers to an arrangement of episode nodes encapsulating a sequence of interactions with a web application. Different stories can be captured according to different features, goals, or use-cases that need investigation.
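Purely as an illustration of this structure (the field names below are ours, not CLEMATIS's internal API), an episode and a story can be pictured as simple JavaScript records:

// Illustrative shapes only; the names are ours, not the tool's internal API.
function createEpisode(sourceTraceUnit) {
  return {
    source: sourceTraceUnit, // a DOM event, timeout callback, or XHR response
    trace: [],               // invoked functions and initiated events, in order
    results: []              // DOM mutations observable by the end-user
  };
}

var story = {
  episodes: [],      // ordered chronologically, left to right in the visualization
  temporalEdges: [], // pairs [e_prev, e_next]: e_next succeeded e_prev in time
  causalEdges: []    // pairs [s, d]: episode s was caused by episode d in the past
};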
Algorithm 1 takes the trace collected from a web application as input and outputs a story with episodes and the edges between them. First, the trace units are extracted and sorted based on the timestamp of their occurrence (line 3). Next, the algorithm iteratively forms new episodes and assigns trace units to the source, trace, and result fields of individual episodes. If it encounters a trace unit that could be an episode source (i.e., an event handler, a timeout, or an XHR callback), a new episode is created (lines 5-6) and added to the list of nodes in the story graph (line 8). The encountered trace unit is added to the episode as its source (line 7). Line 9 shows the different types of trace units that can be added to the trace field of the episode. This trace is later processed to form the complete function call hierarchy as well as each function's relation with the events inside that episode. Next, DOM mutation units that were interleaved with other trace units are organized and linked to their respective episode (lines 11-12). An episode terminates semantically when the execution of the JavaScript code related to that episode is finished. The algorithm also waits for a time interval τ to ensure that the execution of immediate asynchronous callbacks is completed (line 13). When all of the trace units associated with the source, trace, and result of the episode are assigned and the episode termination criteria are met, a temporal edge is added to connect the recently created episode node to the previous one (line 14). The same process is repeated for all episodes by proceeding to the next episode captured in the trace (line 15). After all episodes have been formed, the linkages between distant asynchronous callbacks (those that did not complete immediately) are extracted and added to the graph as causal edges (lines 16-18). Finally, the story is created based on the whole graph and returned (lines 19-20).

Algorithm 1 Story Creation
input: trace
output: story
Procedure CREATEMODEL() begin
1   G<V,E>, story <- null
2   e_curr, e_prev <- null
3   Σtu <- EXTRACTANDSORTTRACEUNITS(trace)
4   foreach tu in Σtu do
5     if e_prev = null || (e_prev.ended() && tu.type = episodeSource) then
6       e_curr <- CREATEEPISODE()
7       e_curr.source <- SETEPISODESOURCE(tu)
8       V <- V ∪ e_curr
9     else if (tu.type = FunctionTrace || EventHandler) || (tu.type = XHRCallback || TimeoutCallback && not episodeEndCriteria) then
10      e_curr.trace <- e_curr.trace ∪ tu
11    else if tu.type = DOMMutation then
12      e_curr.results <- e_curr.results ∪ tu
13    if episodeEndCriteriaSatisfied then
14      E <- E ∪ CREATETEMPORALLINK(e_prev, e_curr)
15      e_prev <- e_curr
16  timeoutMap<TimeoutSet, TimeoutCallback> <- MAPTIMEOUTTRACEUNITS(Σtu)
17  XHRMap<XHROpen, XHRSend, XHRCallback> <- MAPXHRTRACEUNITS(Σtu)
18  E <- E ∪ EXTRACTCAUSALLINKS(timeoutMap, XHRMap)
19  story <- BUILDSTORY(G<V,E>)
20  return story

2.2.3 Understanding Test Assertion Failures

In this section, we extend CLEMATIS to further assist developers in the comprehension process. We add a test case comprehension strategy to CLEMATIS to help developers understand the root cause of a test failure. Our technique automatically links a test assertion failure to the checked DOM elements, and subsequently to the related statements in the JavaScript code. The following subsections describe our strategies for fulfilling the aforementioned requirements of JavaScript test failure comprehension.

Relating Test Assertions to DOM Elements. The DOM acts as the interface between a front-end test case and the JavaScript code. Therefore, the first step to understanding the cause of a test case failure is to determine the DOM dependencies of each test assertion. While this seems simple in theory, in practice, assertions and element accesses are often intertwined within a single test case, convoluting the mapping between the two.

Going back to the test case of our example in Figure 2.3d, the first assertion on line 5 is dependent on the DOM elements returned by the access on the previous line. The last assertion on line 9 is more complex, as it compares two snapshots of the DOM and therefore has dependencies on two DOM accesses (lines 4 and 8). Figure 2.9 summarizes the test case's execution and captures the temporal and causal relations between each assertion and DOM access.

Figure 2.9: Relating assertions to DOM accesses for the test case of Figure 2.3d. (The diagram shows three DOM accesses: DOM Access 1 on icon-star, and DOM Accesses 2 and 3 on asset-icons. Assertion 1 depends on DOM Access 2, while Assertion 2 depends on DOM Accesses 2 and 3 combined.)

To accurately determine the DOM dependencies of each assertion (link 1 in Figure 2.3), we apply dynamic backward slicing to each test case assertion. In addition, we track the runtime properties of those DOM elements accessed by the test case. This runtime information is later used in our analysis of the DOM dependencies of each assertion.
Contextualizing Test Case Assertions. In the second step, our approach aims to (1) help developers understand the context of their assertions by monitoring test-related JavaScript execution, asynchronous events, and DOM mutations; and (2) determine the initial link between JavaScript code and the checked DOM elements (link 2 in Figure 2.3).

In order to monitor JavaScript events, we leverage the tracing technique outlined in Section 2.2.1, which tracks the occurrence of JavaScript events, function invocations, and DOM mutations. We utilize the tracked mutations in order to focus on the segments of JavaScript execution most relevant to the assertions in a test case. As we are only interested in the subset of the DOM relevant to each test case, our approach focuses on the JavaScript code that interacts with this subset.

The previous step yields the set of DOM elements relevant to each assertion. We cross-reference these sets with the timestamped DOM mutations in the execution trace extracted from CLEMATIS to determine the JavaScript functions and events (DOM, timing, or XHR) relevant to each assertion.

Once the relevant events and JavaScript functions have been identified for each assertion, we introduce wrapper functions for the native JavaScript functions used by developers to retrieve DOM elements. Specifically, we redefine methods such as getElementById and getElementsByClassName to track DOM accesses within the web application itself, so that we know exactly where in our collected execution trace the mutation originated. The objects returned by these methods are later used by the application to update the DOM. Therefore, we compute the forward slice of these objects to determine the exact JavaScript lines responsible for updating the DOM. Henceforth, we refer to the references returned by these native methods as JavaScript DOM accesses.

We compare the recorded JavaScript DOM accesses with the DOM dependencies of each test case assertion to find the equivalent JavaScript DOM accesses within the application's code. Moreover, the ancestors of those elements accessed by each assertion are also compared with the recorded JavaScript DOM accesses. This is important because in many cases a direct link might not exist between them. For instance, in the case of our example (Figure 2.3d), a group of assets is compiled and appended to the DOM after a scroll event. We compare the properties of those DOM elements accessed by the final assertion (assets on lines 4 and 8 of Figure 2.3d), as well as the properties of those elements' ancestors, with the recorded JavaScript DOM accesses, and conclude that the assets were added to the DOM via the parent element assets-container on line 17 of Figure 2.3a (link 2).
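In the spirit of the FCW wrapper of Figure 2.7, such a redefinition might look like the sketch below. The trace fields are illustrative, and send() again stands for the trace transport of Figure 2.5.

// Stand-in transport (cf. Figure 2.5).
function send(traceUnit) { console.log(traceUnit); }

// Keep a reference to the native method before redefining it.
var originalGetElementById = document.getElementById;

document.getElementById = function (id) {
  // Record the DOM access so it can be matched against assertion dependencies.
  send(JSON.stringify({ messageType: 'DOM_ACCESS', method: 'getElementById', selector: id }));
  var element = originalGetElementById.call(document, id);
  return element; // the returned reference seeds a forward slice
};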
Slicing the JavaScript Code. At this point, our approach yields the set of JavaScript statements responsible for updating the DOM dependencies of our test case. However, this set in isolation seldom contains the cause of a test failure. We compute a backward slice for these DOM-mutating statements to find the entire set of statements that lead to the DOM mutation.

In our approach, we have opted for dynamic slicing, which enables us to produce thinner slices that are representative of each test execution, thus reducing noise during the debugging process. The slices incorporate data and control dependencies derived from the application. Moreover, by using dynamic analysis we are able to present the user with valuable runtime information that would not be available through static analysis of JavaScript code.

Selective Instrumentation. An ideal test case would minimize setup by exercising only the JavaScript code relevant to its assertions. However, developers are often unaware of the complete inner workings of the application under test. As a result, it is possible for a test case to execute JavaScript code that is unrelated to any of its contained assertions. In such a case, instrumenting an entire web application's JavaScript code base would yield a large trace with unnecessary information. This can incur high performance overheads, which may change the web application's behaviour. Therefore, instead of instrumenting the entirety of the code for dynamic slicing, our approach intercepts and statically analyzes all JavaScript code sent from the server to the client to determine which statements may influence the asserted DOM elements. Then, only this subset of the application's code is instrumented. This approach has two advantages. First, it minimizes the impact our code instrumentation has on the application's performance. Second, selective instrumentation yields a more relevant and concise execution trace, which in turn lowers the processing time required to compute a backward slice.

Our approach first converts the code into an abstract syntax tree (AST). This tree is traversed in search of a node matching the initial slicing criterion. Once found, the function containing the initial definition of the variable in question is also found, henceforth referred to as the parent closure. Based on this information, the algorithm searches this parent closure for all references to the variable of interest. This is done in order to find all locations in the JavaScript code where the variable may be updated, or where a new alias may be created for the variable. Moreover, for each update pertaining to the variable of interest, we also track the data dependencies of the operation. Repeating these steps for each of the detected dependencies allows us to iteratively determine the subset of code statements to instrument for a given initial slicing criterion.

Once all possible data and control dependencies have been determined through static analysis, each variable and its parent closure are forwarded to our code transformation module, which instruments the application code in order to collect a concise trace. The instrumented code keeps track of all updates and accesses to all relevant data and control dependencies, hereby referred to as write and read operations, respectively. This trace is later used to extract a dynamic backward slice.

Figure 2.10 shows an example of our code instrumentation technique's output when applied to the JavaScript code in Figure 2.3a with slicing criterion <10, size>. By acting as a control dependency for variable temp, size determines the number of displayed assets in the example. For each relevant write operation, our instrumentation code logs information such as the name of the variable being written to, the line number of the executed statement, and the type of value being assigned to the variable. Moreover, the data dependencies of such a write operation are also logged. Likewise, for each read operation we record the name of the variable being read, the type of value read, and the line number of the statement. Information about variable type is important when performing alias analysis during the computation of a slice.

1   var currentPage = 1;
2   var sortType = 'default';
3   var gridSize = _write("gridSize", 8, 3);
4   var infiniteScroll = false;
5
6   var renderAssets = function(url, size) {
7       var data = assetsFromServer(url);
8
9       var temp = '<div class="asset-row">';
10      for (i = 0; i < _read("size", size, 10); i++) {
11          temp += '  <div class="asset-icon">';
12          ... // Reading from variable 'data'
13          temp += '  </div>';
14      }
15      temp += '</div>';
16
17      return $('#assets-container').append(temp);
18  };
19
20  $(document).on('click', '#sort-assets', function(){
21      $('#sort-assets').removeClass('selected-type')
22      $(this).addClass('selected-type');
23      currentPage = 1;
24      sortType = $(this).attr('type');
25      gridSize = _write("gridSize", 12, 25);
26      renderAssets(url + sortType + currentPage, _readAsArg("gridSize", gridSize, 26));
27      infiniteScroll = true;
28  });
29
30  var scroll = function() {
31      if(infiniteScroll) {
32          currentPage++;
33          renderAssets(url + sortType + currentPage, _readAsArg("gridSize", gridSize, 33)/2);
34      }
35  };
36  $(window).bind('scroll', scroll);

Figure 2.10: Example JavaScript code after our selective instrumentation is applied. Slicing criterion: <10, size>.
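The bodies of _write, _read, and _readAsArg are not shown in Figure 2.10. The sketch below shows one way such helpers could be written, under the assumption that they log through the same send() transport; the exact fields recorded by CLEMATIS may differ.

// Stand-in transport (cf. Figure 2.5).
function send(traceUnit) { console.log(traceUnit); }

// Log a write operation and pass the assigned value through unchanged.
function _write(name, value, line) {
  send(JSON.stringify({
    messageType: 'WRITE', variable: name,
    valueType: typeof value, line: line
  }));
  return value;
}

// Log a read operation and pass the read value through unchanged.
function _read(name, value, line) {
  send(JSON.stringify({
    messageType: 'READ', variable: name,
    valueType: typeof value, line: line
  }));
  return value;
}

// In this sketch, reads that flow into a call argument are logged the same way.
var _readAsArg = _read;

Because each helper returns its value argument, wrapping an expression in _read() or _write() leaves the program's behaviour unchanged while emitting a trace unit as a side effect.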
Computing a Backwards Slice. Once a trace is collected from the selectively instrumented application by running the test case, we run our dynamic slicing algorithm. We use dynamic slicing as it is much more accurate than static slicing at capturing the exact set of dependencies exercised by the test case.

The task of slicing is complicated by the presence of aliases in JavaScript. When computing the slice of a variable that has been assigned a non-primitive value, we need to consider possible aliases that may refer to the same object in memory. This also occurs in other languages such as C and Java; however, specific to JavaScript is the use of the dot notation, which can be used to seamlessly modify objects at runtime. The prevalent use of aliases and the dot notation in web applications often complicates the issue of code comprehension. Static analysis techniques often ignore this issue [47].

To remedy this issue, we incorporate dynamic analysis in our slicing method. If a reference to an object of interest is saved to a second object's property, possibly through the use of the dot notation, the object of interest may also be altered via aliases of the second object. For example, after executing the statement a.b.c = objOfInterest;, updates to objOfInterest may be possible through a, a.b, or a.b.c. To deal with this and other similar scenarios, our slicing algorithm searches through the collected trace and adds the forward slice of each detected alias to the current slice for our variable of interest (e.g., objOfInterest).
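As a deliberately simplified sketch of this step (real traces contain more unit types; we assume here that writes carry their logged data dependencies and that alias creations are recorded as their own units):

// Sketch: walk the trace backwards, collecting the lines that influence the
// slicing criterion, and growing the set of relevant names through data
// dependencies and recorded aliases.
function backwardSlice(trace, criterionVariable) {
  var relevant = new Set([criterionVariable]); // names currently of interest
  var sliceLines = new Set();                  // line numbers in the slice
  for (var i = trace.length - 1; i >= 0; i--) {
    var unit = trace[i];
    if (unit.messageType === 'WRITE' && relevant.has(unit.variable)) {
      sliceLines.add(unit.line);
      // The write's logged data dependencies become relevant as well.
      (unit.dependencies || []).forEach(function (d) { relevant.add(d); });
    } else if (unit.messageType === 'ALIAS' && relevant.has(unit.source)) {
      relevant.add(unit.alias); // e.g., a.b.c created as an alias of objOfInterest
    }
  }
  return sliceLines;
}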
The line numbers of the identified relevant statements in the computed slice are collected and used during the visualization step, described in the next section.

2.2.4 Visualizing the Captured Model

In the final step, our technique produces an interactive visualization of the generated model, which can be used by developers to understand the behaviour of the application. The main challenge in the visualization is to provide a way to display the model without overwhelming the developer with details. To this end, our visualization follows a focus+context [29] technique that provides details based on a user's demand. The idea is to start with an overview of the captured story, let the users determine which episode they are interested in, and provide an easy means to drill down to the episode of interest. With the integration of focus within the context, developers can semantically zoom into each episode to gain more details regarding that episode, while preserving the contextual information about the story.

Multiple Sessions, Multiple Stories. The user can capture multiple sessions, which leads to the creation of multiple stories. After each story is recorded, it is added to the list of captured stories. The name of each story is the date and time at which it was captured. Figure 2.11 shows a screenshot of sample captured stories in the visualization of CLEMATIS. Once the users select their desired story, the browser opens a new page dedicated to that story. The initial view of a story contains a menu bar that helps the user navigate the visualization (Figure 2.12, top). It also displays an overview of all captured episodes inside the story (Figure 2.12, bottom).

Figure 2.11: Overview of all captured stories.

Story Map, Queries, and Bookmarking. A menu bar is designed for the visualization that contains two main parts: the story map and the query mechanism (Figure 2.12, top). The story map represents a general overview of the whole story as a roadmap. Panning and (semantic) zooming are available for all episodes and may cause users to lose the general overview of the story. Hence, based on the user's interaction with the story (e.g., episode selection), the episodes of interest are highlighted on the roadmap to guide the user. The query section enables users to search and filter the information visualized on the screen. Users can filter the episodes displayed on the screen by episode type (i.e., Event, Timeout, or XHR). They can also search the textual content of the events as well as the actual code. Moreover, they have the option to bookmark one or more episodes while interacting with the target web application. These episodes are marked with a star in the visualization to help users narrow the scope and spot related episodes (e.g., episode #6 in Figure 2.12 is bookmarked). The episodes' timing information is also shown.

Figure 2.12: Top: menu of CLEMATIS. Bottom: overview of a captured story.

Semantic Zoom Levels. The visualization provides three semantic zoom levels.

Zoom Level 0. The first level displays all of the episodes in an abstracted manner, showing only the type and the timestamp of each episode (Figure 2.12, bottom). The type of each episode is conveyed by the text of the episode as well as its background colour. The horizontal axis is dedicated to time, and episodes are sorted from left to right according to the time of their occurrence (temporal relations). The causal edges between different sections of each timeout or XHR object are shown by additional edges under the episodes.

Zoom Level 1. When an episode is selected, the view transitions into the second zoom level, which presents an outline of the selected episode, providing more information about the source event as well as a high-level trace (Figure 2.13, middle). The trace at this level contains only the names of the (1) invoked functions, (2) triggered events, and (3) DOM mutations, caused directly or indirectly by the source event. At this level, the user can view multiple episodes for a side-by-side comparison.

Figure 2.13: Three semantic zoom levels in CLEMATIS. Top: overview. Middle: zoomed one level into an episode, while preserving the context of the story. Bottom: drilled down into the selected episode.

Zoom Level 2. The final zoom level exhibits all the information embedded in each episode (Figure 2.13, bottom).
Clicking on the "Event" tab displays the type of the event that started the episode (DOM, timeout, or XHR event). The contextual information of the event is displayed based on its type. Choosing the "DOM mutations" tab lists all the changes that were made to the DOM after the execution of this episode. For each DOM element that was added, removed, or modified, an item is added to the list of mutations that identifies the modified element, the type of the change, and additional information about the change. The third and final tab depicts a detailed trace of the episode. The trace at this level includes a customized sequence diagram of the dynamic flow of all the invoked JavaScript functions and events within that episode. When the user clicks on any of the functions or events in the diagram, the JavaScript code of the executed function is displayed and highlighted (Figure 2.13, bottom).

Inferred Mappings between Test Failures and Code. The test case comprehension unit extends the interactive visualization to depict the inferred mappings for the test failure. The visualization helps the user understand (1) the client-side JavaScript code related to the assertion failure, (2) the test case's relations to DOM changes and JavaScript execution, and/or (3) any deviations in the expected behaviour with respect to a previous version where the test passed. Figure 2.14 depicts an example of the high-level view provided by our visualization for a test case.

In the high-level view, the progress of an executed test case over time is depicted on the horizontal axis, where the earliest assertions are shown on the left-hand side of the view and the most recent JavaScript events and assertions are shown closer to the right-hand side. The top of Figure 2.14b shows the high-level visualization produced by running the same test case from Figure 2.14a on a faulty version of the application. Passing assertions for a test case are represented as grey nodes, and failures are shown in red. In the case of an assertion, causal links relate the assertion to prior events that may have influenced its outcome. These are events that altered portions of the DOM relevant to the assertion. DOM events, timing events, and network-related JavaScript events are visualized alongside the assertions as green, purple, and blue nodes, respectively.

Clicking on a failed assertion node reveals additional details about it (Figure 2.14b). Details include the related (1) DOM dependencies, (2) failure messages, and (3) related JavaScript functions. The final zoom level of an assertion node displays all the information captured for the assertion, including the captured slice and the line numbers of the failing test case assertions.
Figure 2.14: Visualization for a test case. (a) Overview of the passing test case. (b) Three semantic zoom levels for the failing test case; Top: overview. Middle: second zoom level showing assertion details, while preserving the context. Bottom: summary of the failing assertion and the backward slice. (In the depicted example, the failing assertion reports org.junit.ComparisonFailure: expected:<[4]> but was: <[6]>, with DOM dependency class "asset-icon" on a div element, and related functions renderAssets(), scroll(), and an anonymous function.)

When displaying the code slice for an assertion, each line of JavaScript code that may have influenced the assertion's outcome is highlighted in the context of the source code (Figure 2.14b, lower right). The user can further explore the captured slice by stepping through its recorded execution using a provided control panel, shown in green in Figure 2.14b. By doing so, the user is able to take a post-mortem approach to fault localization, whereby the faulty behaviour is studied deterministically offline after execution has completed. Further, the user can also examine the captured runtime values of relevant JavaScript variables.

RESTful API. We deployed a RESTful API that provides access to details about captured stories and allows the approach to remain portable and scalable. This architectural decision enables all users, independent of their environments, to take advantage of the behavioural model. By invoking authorized calls to the API, one can represent the model as a custom visualization, or use it as a service in the logic of a separate application.
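For example, a separate application could fetch a captured story as JSON and process it in its own logic. The endpoint path and response fields below are illustrative assumptions of ours; the actual routes of the deployed API are not documented here.

// Illustrative only: the endpoint and response shape are assumptions.
fetch('http://localhost:8080/clematis/api/stories/latest')
  .then(function (response) { return response.json(); })
  .then(function (story) {
    story.episodes.forEach(function (episode, i) {
      console.log('Episode #' + (i + 1) + ':', episode.source.type,
                  episode.trace.length + ' trace units,',
                  episode.results.length + ' DOM mutations');
    });
  });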
2.2.5 Tool Implementation: Clematis

We implemented our approach in a tool called CLEMATIS, which is freely available.4 We use a proxy server to automatically intercept and inspect HTTP responses destined for the client's browser. When a response contains JavaScript code, it is transformed by CLEMATIS. We also use the proxy to inject a JavaScript-based toolbar into the web application, which allows the user to start and stop capturing their interactions with the application. We use a proxy since it leads to a non-intrusive instrumentation of the code. A browser plugin would be a suitable alternative; however, unlike browser plugins, a proxy-based approach does not require installing a plugin, is not dependent on the type of the browser, and does not need to be maintained and updated with browser updates. The collected trace data is periodically transmitted from the browser to the proxy server in JSON format. To observe low-level DOM mutations, we build on and extend the JavaScript Mutation Summary library.5 The model is automatically visualized as a web-based interactive interface. Our current implementation does not capture the execution of JavaScript code that is evaluated using eval. CLEMATIS provides access to details of captured stories through a RESTful API.

4 http://salt.ece.ubc.ca/software/clematis/
5 http://code.google.com/p/mutation-summary/

2.3 Controlled Experiments

To assess the efficacy of our program comprehension approach, we conducted two controlled experiments, following the guidelines of Wohlin et al. [139]: one in a research lab setting and the other in an industrial environment. In addition, to assess the test failure comprehension extension of CLEMATIS, we conducted a third controlled experiment.

Common design elements of all experiments are described in this section. Sections 2.4-2.6 are dedicated to describing the specific characteristics and results of each experiment separately.

Our evaluation aims at addressing the following research questions. The first four research questions are designed to evaluate the main code comprehension unit of CLEMATIS. These questions are investigated in the first two experiments (Sections 2.4-2.5). RQ1.5, however, assesses the extended test failure comprehension unit of CLEMATIS (Section 2.6). In order to keep the duration of the experiment sessions reasonable, we decided to evaluate the test comprehension unit separately.

RQ1.1 Does CLEMATIS decrease the task completion duration for common tasks in web application comprehension?
RQ1.2 Does CLEMATIS increase the task completion accuracy for common tasks in web application comprehension?
RQ1.3 For what types of tasks is CLEMATIS most effective?
RQ1.4 What is the performance overhead of using CLEMATIS? Is the overall performance acceptable?
RQ1.5 Is the test failure comprehension unit helpful in localizing (and repairing) JavaScript faults detected by test cases?

2.3.1 Experimental Design

The experiments had a "between-subjects" design; i.e., the subjects were divided into two groups: an experimental group using CLEMATIS and a control group using other tools. The assignment of participants to groups was done manually, based on the level of their expertise in web development. We used a 5-point Likert scale in a pre-questionnaire to collect this information, and distributed the levels of expertise in a balanced manner between the two groups. None of the participants had any previous experience with CLEMATIS, and all of them volunteered for the study.

Task Design. The subjects were required to perform a set of tasks during the experiment, representing tasks normally used in software comprehension and maintenance efforts. We adapted the activities proposed by Pacione et al. [102], which cover categories of common tasks in program comprehension, to web applications by replacing two items. The revised activities are shown in Table 2.1. We designed a set of tasks for each experiment to cover these activities. Tables 2.2 and 2.3 show the tasks for studies 1 and 2, respectively. Because study 2 was conducted in an industrial setting, participants had limited time.

Table 2.1: Adopted and adapted comprehension activities.
Activity  Description
A1        Investigating the functionality of (a part of) the system
A2        Adding to / changing the system's functionality
A3        Investigating the internal structure of an artifact
A4        Investigating the dependencies between two artifacts
A5        Investigating the run-time interaction in the system
A6        Investigating how much an artifact is used
A7        Investigating the asynchronous aspects of JavaScript
A8        Investigating the hidden control flow of event handling
Therefore, we designed fewer tasks for this study compared to study 1. Table 2.4 depicts the tasks used in study 3, which targets the fault localization capabilities of CLEMATIS.

Independent Variable (IV). This is the tool used for performing the tasks, and it has two levels: CLEMATIS represents one level, and the other tools used in the experiment represent the other level (e.g., Chrome developer tools, Firefox developer tools, Firebug).

Dependent Variables (DV). These are (1) task completion duration, which is a continuous variable, and (2) accuracy of task completion, which is a discrete variable.

Data Analysis. For analyzing the results of each study, we use two types of statistical tests to compare the dependent variables across the control and experimental groups. Independent-samples t-tests with unequal variances are used for duration and accuracy in study 1, and for duration in study 2. However, the accuracy data in study 2 was not normally distributed, and hence we use a Mann-Whitney U test for the analysis of accuracy in that study. We use the statistical analysis package R6 for the analysis.

6 http://www.r-project.org

2.3.2 Experimental Procedure

All experiments consisted of four main phases. First, the subjects were asked to fill out a pre-questionnaire regarding their expertise in the fields related to this study.

In the next phase, the participants in the experimental group were given a tutorial on CLEMATIS. They were then given a few minutes to familiarize themselves with the tool after the tutorial.

In the third phase, each subject performed a set of tasks, as outlined in Tables 2.2 and 2.3. Each task was given to a participant on a separate sheet of paper, which included instructions for the task and had room for the participant's answer. Once completed, the form was to be returned immediately, and the subject was given the next task sheet. This allowed us to measure each task's completion time accurately, to answer RQ1.1 and RQ1.3. To address RQ1.2 and RQ1.3, the accuracy of each task was later evaluated and marked from 0 to 100 according to a rubric that we had created prior to conducting the experiment. The design of the tasks allowed the accuracy of the results to be quantified numerically. The tasks and sample rubrics are available in our technical report (Alimadadi et al. [5]).

In the final phase, participants filled out a post-questionnaire form providing feedback on their experience with the tool used (e.g., limitations, strengths, usability).

2.4 Experiment 1: Lab Environment

The first controlled experiment was conducted in a lab setting with students and postdocs at the University of British Columbia (UBC).

2.4.1 Approach

Experimental Design. For this experiment, both groups used Mozilla Firefox 19.0. The control group used Firebug 1.11.2.
We chose Firebug for the control group since it is the de facto tool used for understanding, editing, and debugging modern web applications.7 Firebug has also been used in other similar studies (Zaidman et al. [143]).

7 Firebug has over 3 million active daily users: https://addons.mozilla.org/en-US/firefox/addon/firebug/statistics/usage/

Experimental Subjects. We recruited 16 participants for the study, 3 females and 13 males. The participants were drawn from different educational levels at UBC: 2 undergraduate students, 5 Master's students, 8 Ph.D. students, and 1 postdoctoral fellow. The participants represented different areas of software and web engineering and had skills in web development ranging from beginner to professional. The tasks used in this study are enumerated in Table 2.2.

Table 2.2: Comprehension tasks used in study 1.
Task  Description                                                                Activity
T1    Locating the implementation of a feature modifying the DOM                A1, A4
T2    Finding the functions called after a DOM event (nested calls)             A1, A4, A5
T3.a  Locating the place to add a new functionality to a function               A2, A3
T3.b  Finding the caller of a function                                          A4, A5
T4.a  Finding the functions called after a DOM event (nested calls + bubbling)  A1, A4, A5
T4.b  Locating the implementation of a UI behavior                              A1, A3, A4
T5.a  Finding the functions called after a DOM event (bubbling + capturing)     A1, A5, A8
T5.b  Finding the changes to DOM resulting from a user action                   A4, A5
T6.a  Finding the total number of sent XHRs                                     A6, A7
T6.b  Finding if there exists an un-responded XHR                               A4, A5, A7

Experimental Object. We decided to use a web-based survey application that was developed in our lab. The application had modest size and complexity, so that it could be managed within the time frame anticipated for the experiment. Yet it covered the common comprehension activities described in Table 2.1.

Experimental Procedure. We followed the general procedure described in Section 2.3.2. After filling out the pre-questionnaire form, the participants in the control group were given a tutorial on Firebug and had time to familiarize themselves with it, though most of them were already familiar with Firebug.

2.4.2 Results

Figure 2.15: t-test analysis with unequal variances of task completion duration by tool type. Lower values are better. [Study 1]
Figure 2.16: Box plots of task completion duration data per task for each tool. Lower values are better. [Study 1]
Figure 2.17: t-test analysis with unequal variances of task completion accuracy by tool type. Higher values are better. [Study 1]
Figure 2.18: Box plots of task completion accuracy data per task for each tool. Higher values are better. [Study 1]

Duration. To address RQ1.1, we measured the amount of time (minutes:seconds) spent on each task by the participants, and compared the task durations between CLEMATIS and Firebug using a t-test. According to the results of the test, there was a statistically significant difference (p-value = 0.002) in the durations between CLEMATIS (M=23:22, SD=4:24) and Firebug (M=36:35, SD=8:35).
Figure 2.15 shows the results of this comparison.

To investigate whether certain categories of tasks (Table 2.2) benefit more from using CLEMATIS (RQ1.3), we tested each task separately. The results showed improvements in time for all tasks. The improvements were statistically significant for tasks 2 and 5, with 60% and 46% average time reductions with CLEMATIS, respectively. The mean times of all tasks for CLEMATIS and Firebug are presented in Figure 2.16. The results show that, on average, participants using CLEMATIS required 36% less time than the control group using Firebug for performing the same tasks.

Accuracy. The accuracy of the answers was calculated in percentages. We compared the accuracy of participants' answers using a t-test. The results were again in favour of CLEMATIS and were statistically significant (p = 0.02): CLEMATIS (M=83%, SD=18%) versus Firebug (M=63%, SD=16%). This comparison of accuracy between the tools is depicted in Figure 2.17. As in the duration case, individual t-tests were then performed for comparing accuracy per task (related to RQ1.3). CLEMATIS showed an increased average accuracy for all tasks. Further, the difference was statistically significant in favour of CLEMATIS for task 5, and subtasks 4.a and 5.a. The results show that participants using CLEMATIS achieved 22% higher accuracy than participants in the control group. We plot the average accuracies of all tasks for CLEMATIS and Firebug in Figure 2.18. We discuss the implications of these results in the discussion section.

2.5 Experiment 2: Industrial

To investigate CLEMATIS's effectiveness in more realistic settings, we conducted a second controlled experiment at a large software company in Vancouver, where we recruited professional developers as participants and used an open-source web application as the experimental object.

2.5.1 Approach

Experimental Design. Similar to the first experiment, the participants were divided into experimental and control groups. The experimental group used CLEMATIS throughout the experiment. Unlike the previous experiment, members of the control group were free to use the tool of their choice for performing the tasks. The intention was for the participants to use whichever tool they were most comfortable with. Five participants used Google Chrome's developer tools, two used Firefox's developer tools, and three used Firebug.

Experimental Subjects. We recruited 20 developers from a large software company in Vancouver, 4 females and 16 males. They were 23 to 42 years old and had medium to high expertise in web development.
Task Design. For this experiment, we used fewer but more complex tasks compared to the first experiment. We designed four tasks (Table 2.3) spanning the following categories: following the control flow, understanding event propagation, detecting DOM mutations, locating feature implementation, and determining delayed code execution using timeouts.

Table 2.3: Comprehension tasks used in study 2.
Task  Description                                                      Activity
T7    Extracting the control flow of an event with delayed effects     A1, A4, A5, A7
T8    Finding the mutations in DOM after an event                      A1, A5
T9    Locating the implementation of a malfunctioning feature          A1, A2, A3
T10   Extracting the control flow of an event with event propagation   A1, A5, A8

Experimental Object. Phormer8 is an online photo gallery written in PHP, JavaScript, CSS, and XHTML. It provides features such as uploading, commenting on, rating, and displaying slideshows of users' photos. It contains typical mechanisms such as dynamic DOM mutation, asynchronous calls (XHRs and timeouts), and event propagation. Phormer has over 6,000 lines of JavaScript, PHP, and CSS code in total (1,500 lines of JavaScript). It was rated 5.0 stars on SourceForge and had over 38,000 downloads at the time of conducting the experiment.

8 http://p.horm.org/er/

Experimental Procedure. We followed the same procedure described in Section 2.3.2, with one difference: the participants in the control group were not given any tutorial regarding the tool they used throughout the experiment, as they were all proficient users of the tool of their choice.

2.5.2 Results

Box plots of task completion duration and accuracy, per task and in total, for the control (Ctrl) and experimental (Exp) groups, are depicted in Figures 2.19 and 2.20, respectively.

Figure 2.19: Notched box plots of task completion duration data per task and in total for the control (gold) and experimental (green) groups (lower values are desired). [Study 2]
Figure 2.20: Notched box plots of task completion accuracy data per task and in total for the control (gold) and experimental (green) groups (higher values are desired). [Study 2]

Duration. Similar to the previous experiment, we ran a set of t-tests for the total task duration as well as for the time spent on individual tasks. The results of the tests showed a statistically significant difference (p-value = 0.0009) between the experimental group using CLEMATIS (M=15:37, SD=1:43) and the control group (M=29:12, SD=5:59) in terms of total task completion duration. The results showed improvements in duration when using CLEMATIS for all four tasks. We found significant differences in favour of CLEMATIS for tasks T7, T8, and T9. The results show that developers using CLEMATIS took 47% less time on all tasks compared to developers in the control group.

Accuracy. We used Mann-Whitney U tests for comparing the results of task accuracy between the control and the experimental groups, since the data was not normally distributed. For the overall accuracy of the answers, the tests revealed a statistically significant difference with high confidence (p-value = 0.0005) between CLEMATIS (M=90%, SD=25%) and the other tools (M=35%, SD=20%). We then performed the comparison for the individual tasks. Again, for all tasks, the experimental group using CLEMATIS performed better on average. We observed statistically significant improvements in the accuracy of developers using CLEMATIS for tasks T7, T8, and T10. The results show that developers using CLEMATIS performed more accurately across all tasks by 157% on average, compared to developers in the control group.

2.5.3 Qualitative Analysis of Participant Feedback

The industrial participants in our second experiment shared their feedback regarding the tool they used in the experiment session (CLEMATIS for the experimental group and other tools for the control group). They also discussed their opinions about the features an ideal web application comprehension tool should have. We systematically analyzed the qualitative data [36] to find the main features of a web application comprehension tool according to professional web developers.
To thebest of our knowledge, at the time conducting this study, there were neither anytools available specifically designed for web application comprehension, nor anystudies on their desirable characteristics.Data CollectionThe participant selection was based on introductions by the team leads in thecompany. Our research group had started a research collaboration with the companyand they were willing to spread the word about the experiment and help recruitvolunteer participants. The examiner was present at the company starting two weeksprior to the experiment and helped the procession of recruiting and if possible,giving an introduction to the potential participants.Our overall policy for recruiting participants was random sampling. However,throughout the course of the experiment, we tried to partially apply theoreticalsampling by asking participants to recommend other candidates fit for attending theexperiment. In general, this did have a noticeable impact on our sampling processsince our desirable sample set had to be diverse. A wider range of experience andproficiency was suitable for our purpose, as we wanted to support various groupsof web developers by CLEMATIS. Moreover, preserving the overall randomnessof sampling was necessary for ensuring the validity of our qualitative analysis.Hence, we examined the background and the experience of our potential candidates45and tried to include a more diverse group of participants that still met our originalrequirements.In the final phase of the experiment, where we gathered the qualitative data,the participants filled a post-questionnaire form with open-ended questions. Theforms allowed them to focus and provide answers without having the sense of beingwatched. Next, they were interviewed verbally based on both their answers to thequestionnaire, and the comments of previous participants. During the interviews,the examiner took notes of the participants’ answers as well as their expressions andbody language, which could convey more insight into participants’ intents.Extracting the ConceptsAfter each group of consecutive sessions was completed, we started coding thegathered data based on open coding principles. We read and analyzed comments andinterview manuscripts of each participant, and coded every comment based on theparticipant’s intention. At this stage, no part of the data was excluded. The codingonly helped us extract the existing concepts within the data. Hence, by performingcoding parallel to conducting the experiments, we were able to better direct ourfollowing interview sessions. This process enabled us to observe the emergingcategories as we proceeded with the experiment. We used this information to guidethe interviews towards discovering the new data. Moreover, we simultaneouslycompared the coded scripts of different participants. This allowed us to investigatethe consistencies or differences between the derived concepts.As we progressed further in conducting the experiment sessions, the core cat-egories of concepts began to emerge from the coded data. We used memos toanalyze these categories early in the process, while we were still able to improvethe interviews.Categories started to form during the process of coding the data. 
We started to recognize the core categories based on the density of the data in each category. We then continued with selective coding of the remaining forms and manuscripts. We intentionally permitted the evolution of multiple core categories (as opposed to one), in order to allow different aspects of an ideal comprehension tool to be recognized. Multiple categories were integrated to create each core category. The concepts that contributed to building each core category were referred to by a noticeable number of participants. Various subcategories were brought together to form the different aspects of a desirable web application comprehension tool according to the developers who would be interested in using such a tool. Closer to the end of the experiments, only the categories most relevant to the core categories were retained during selective coding. The maturity of the core categories (described below) was indicated when the newly gathered data did not contribute much to the existing categories.

Guidelines for Web Application Comprehension Tools

The following are the characteristics of a desirable web application comprehension tool, derived from the participants' responses to our post-questionnaire forms and interviews.

- Integration with debugging. One of the most prevalent concepts discussed by the participants was debugging. All of our participants were using a browser-specific debugger in their everyday tasks. Although these debugging capabilities are not best tuned for web application comprehension, they still play a potent role in the web development process. Almost all developers in the control group used one or more features of a debugger. Many developers in the experimental group requested adding features such as setting breakpoints and step-by-step execution to CLEMATIS. Some of our participants suggested the integration of CLEMATIS with commonly used platforms that support debugging.

- DOM inspection. The majority of the participants used the DOM inspection feature of browser development tools extensively. However, the participants in the control group were frustrated by the lack of a feature that would allow them to easily detect all of the changes to the DOM after a certain event. This option was provided for CLEMATIS users, most of whom chose it as one of their favourite features. The majority of the participants in the experimental group mentioned that CLEMATIS's DOM mutation view is particularly useful, and requested a better visualization.

- JavaScript and DOM interaction. Many participants in the control group complained about the lack of better means of relating the JavaScript code to DOM elements and events. Without CLEMATIS, there is currently no trivial way of relating DOM events to the respective executed JavaScript code. Moreover, there is no connection between a DOM feature and the JavaScript code responsible for that feature. This can make the common task of feature location laborious.

- Call hierarchy. One of the most popular topics among CLEMATIS users was the trace the tool keeps in each episode. The majority of the participants in the experimental group were pleased by the ease of understanding the customized sequence diagrams. They quickly adopted this feature, and many of the CLEMATIS users were also impressed by the inclusion of asynchronous callbacks and propagated event handlers.
On the other hand, most of the participants in the control group expressed dissatisfaction with the lack of features such as call stacks in existing tools.

• Interactivity and realtimeness. Many CLEMATIS users mentioned more interaction and better responsiveness of the tool as a key factor in adopting it for their everyday tasks. Intrigued by the ability to capture a story of interactions, they requested real-time creation of stories while interacting with the application, and better analysis performance. The industrial tools used by the control group provided much better performance, but lacked many of the desired features (other core categories).

• Sophisticated visualization. Many participants indicated that visualization techniques and usability factors can greatly impact their usage of a tool. Most CLEMATIS users preferred the focus+context technique adopted by CLEMATIS. However, being an academic prototype, CLEMATIS has much room for improvement in terms of interface design and usability. In general, any tool that supports all technical core categories can still be unsuccessful should it fail in delivering the necessary information to users through a visualization.

Table 2.4: Injected faults for the controlled experiment.

Fault  Fault Description                                             Detecting Test Case  Related Task
F1     Altered unary operation related to navigating slideshow      SlideShowTest        T11
F2     Modified string related to photo-rating feature              MainViewTest         T12
F3     Changed number in branch condition for photo-rating feature  MainViewTest         T12
F4     Transformed string/URL related to photo-rating feature       MainViewTest         T12

There were a few features that the participants found useful, but that were not included in the core categories. Among them was semantic zooming, or presenting the overview first and providing more details on demand. Another popular feature was the extraction of DOM mutations per event. The participants also requested a number of features to be included in future versions of the tool. These features included filtering and query options for DOM mutations, and the ability to attach notes to bookmarked episodes. Overall, according to two of our industrial participants, CLEMATIS is "Helpful and easy to use" and "Very useful. A lot of potential for this tool!".

2.6 Experiment 3: Test Failure Comprehension

We conducted a third controlled experiment to assess the effectiveness of our test failure comprehension extension of CLEMATIS.

2.6.1 Approach

Experimental Design. Once again, we divided the participants into experimental (CLEMATIS) and control groups.

Experimental Subjects. 12 participants were recruited for the study at the University of British Columbia (UBC), three females and nine males. The participants were drawn from different education levels at UBC. They all had prior experience in web development and testing, ranging from beginner to professional. Furthermore, six of the participants had worked in industry previously, either full-time or through internships.

Task Design. For this experiment, we used fewer but more complex tasks compared to the first experiment. To answer RQ1.5, participants were given two main tasks, each involving the debugging of a test failure in the Phormer application (Table 2.4). For each task, participants were given a brief description of the failure and a test case capable of detecting the failure. We used a test suite written by a UBC student for the Phormer application.
The test suite was written as part of a separate and independent course project, six months before the inception of the project presented in this paper.

For the first task of this experiment (T11), participants were asked to locate an injected fault in Phormer given a failing test case. Participants were asked not to modify the application's JavaScript code during T11.

The second task of this experiment (T12) involved identifying and fixing a regression fault (unrelated to the first one). For this task, participants were asked to locate and repair the fault(s) causing the test failure. As the second failure was caused by three separate faults, participants were allowed to modify the application source code in order to iteratively uncover each fault by rerunning the test case. In addition to the failing test case, participants in both groups were given two versions of Phormer, the faulty version and the original fault-free one. The intention here was to simulate a regression testing environment.

The injected faults are based on common mistakes JavaScript developers make in practice, as identified by Mirshokraie et al. [90].

Experimental Object. Similar to the previous experiment, we used Phormer as the experimental object.

Experimental Procedure. The procedure was similar to what we described in Section 2.3.2. A maximum of 1.5 hours was allocated for the study: 10 minutes were designated for an introduction, 15 minutes were allotted for users to familiarize themselves with the tool being used, 20 minutes were allocated for task T11, another 30 minutes were set aside for task T12, and 15 minutes were used for completing the questionnaire at the end of the study.

2.6.2 Results

Figure 2.21 and Figure 2.22 depict box plots of task completion accuracy and duration, per task and in total, for both the experimental group (Exp) and the control group (Ctrl).

[Box plot figure omitted]
Figure 2.21: Box plots of task completion accuracy data per task and in total for the control (blue) and experimental (cream) groups (higher values are desired). [Study 3]

[Box plot figure omitted]
Figure 2.22: Box plots of task completion duration data per task and in total for the control (blue) and experimental (cream) groups (lower values are desired). [Study 3]

Accuracy. The accuracy of participant answers was calculated to answer RQ1.5. Overall, the group using CLEMATIS (M = 95.83, SD = 10.21) performed much more accurately than the control group (M = 47.92, SD = 45.01). The results show a statistically significant improvement for the experimental group (p-value = 0.032). Comparing the results for the two tasks separately, the experimental group performed better on both tasks on average. The results show that participants using CLEMATIS performed more accurately across both tasks by a factor of two, on average, compared to those participants in the control group.

Duration. To further answer RQ1.5, we measured the amount of time (minutes:seconds) spent by participants on each task and in total. According to the results of the tests, there was a statistically significant difference in the duration of T11 between CLEMATIS (M = 5:42, SD = 2:10) and the control group (M = 12:03, SD = 4:29); p-value = 0.016. Comparison of the duration data gathered for T12 yielded no significant difference between CLEMATIS (M = 23:23, SD = 6:31) and the control group (M = 19:46, SD = 8:05); p-value > 0.05.
Those participants in the control group who answered task 2 correctly required a mean duration of 25:21 to complete the task, which is longer than the mean duration of the experimental group. The results revealed no significant difference between the CLEMATIS group (M = 29:05, SD = 7:42) and the control group (M = 31:49, SD = 10:37) with regard to the total time spent (p-value > 0.05). The results show that developers using CLEMATIS took 54% less time to localize a detected fault. The results are inconclusive regarding fault repair time.

2.7 Performance Overhead

With respect to RQ1.4, there are three sources of potential performance overhead: (1) instrumentation overhead, (2) execution overhead, and (3) dynamic analysis overhead. The first pertains to the overhead incurred due to the instrumentation code added by CLEMATIS, while the second pertains to the overhead of processing the trace and constructing the model. The third type of overhead is caused by dynamic slicing, and can only occur when the test failure comprehension unit is activated. We do not measure the overhead of visualization, as this is dependent on the user task performed.

We measure the first two types of overhead when the test comprehension unit is deactivated. Then we activate the test unit and measure the additional overhead. Phormer, the experimental object in Study 2, is used to collect performance measurements over 10 one-minute trials of user interaction with the application. We also activate the test comprehension unit, and execute each of the two test cases from Experiment 3 with both selective instrumentation enabled and disabled. The two tests were run 10 times each. The results are as follows:

Instrumentation overhead. Code comprehension. Average delays of 15.04 and 1.80 seconds were experienced for the pre- and post-processing phases of CLEMATIS, respectively, and an additional delay of 219.30 ms was noticed for each page. On average, each captured episode occupies 11.88 KB within our trace.

Test comprehension. Average delays of 1.29 and 1.83 seconds were introduced by the selective and non-selective instrumentation algorithms, respectively, on top of the 407 ms required to create a new browser instance. Moreover, the average trace produced by executing the selectively instrumented application was 37 KB in size. Executing a completely instrumented application resulted in an average trace size of 125 KB. Thus, the selective instrumentation approach is able to reduce trace size by 70% on average, while also reducing instrumentation time by 41%.

Execution overhead. Code comprehension. For processing one minute of activity with Phormer, CLEMATIS experienced an increase of 250.8 ms, 6.1 ms and 11.6 ms for DOM events, timeouts and XHRs, respectively. Based on our experiments, there was no noticeable delay for end-users when interacting with a given web application through CLEMATIS.

Test comprehension. The actual execution of each test case required an additional 246 ms for the selectively instrumented application. Instrumenting the entire application without static analysis resulted in each test case taking 465 ms longer to execute. Based on these measurements, our selective instrumentation approach lowers the execution overhead associated with CLEMATIS by 47%.

Dynamic analysis overhead. It took CLEMATIS 585 ms on average to compute each JavaScript slice when utilizing selective instrumentation. Non-selective instrumentation lengthened the required dynamic analysis time to 750 ms.
By analyzing a more concise execution trace, CLEMATIS was able to lower the slice computation time by 22%. Thus, we see that CLEMATIS incurs low performance overhead in all three components, mainly due to its selective instrumentation capabilities.

2.8 Discussion

2.8.1 Task Completion Duration

Task completion duration is a measure of task performance. Therefore, CLEMATIS improves web developers' performance by significantly decreasing the overall time required to perform a set of code comprehension tasks (RQ1.1).

Dynamic Control Flow. Capturing and bubbling mechanisms are pervasive in JavaScript-based web applications and can severely impede a developer in understanding the dynamic behaviour of an application. These mechanisms also complicate the control flow of an application, as described in Section 2.1. Our results show that CLEMATIS significantly reduces the time required for completing tasks that involve a combination of nested function calls, event propagation, and delayed function calls due to timeouts within a web application (T2, T5.a, and T7). Hence, CLEMATIS makes it more intuitive to comprehend and navigate the dynamic flow of the application (RQ1.3).

One case that needs further investigation is T10. This task mainly involves following the control flow when most of the executed functions are invoked through event propagation. The results of this task indicate that although using CLEMATIS led to an average 32% reduction in task completion duration, the difference was not statistically significant. However, closer inspection of the results reveals that the answers given using CLEMATIS for T10 are 68% more accurate on average. This large difference shows that many of the developers in the control group were unaware of occurrences of event propagation in the application, and terminated the task early. Hence, they scored significantly lower than the experimental group in task accuracy and still spent more time to find the (inaccurate) answers.

Feature Location. Locating features, finding the appropriate place to add a new functionality, and altering existing behaviour are part of comprehension, maintenance and debugging activities in all software tools, not only in web applications. The results of Study 1 suggested that CLEMATIS did reduce the average time spent on the tasks involving these activities (T1, T3, T4.b), but these reductions were not statistically significant. These tasks mostly dealt with static characteristics of the code and did not involve any of the features specific to JavaScript-based web applications. Study 2, however, involved more complicated tasks in more realistic settings. T9 represented the feature location activity in this study, and the results showed that using CLEMATIS improved the average time spent on this task by 68%. Thus, we see that CLEMATIS speeds up the process of locating a feature or a malfunctioning part of the web application (RQ1.3).

State of the DOM. The final category of comprehension activities investigated in this work is the implications of events on the state of the DOM. Results of Study 1 displayed a significant difference, in favour of CLEMATIS, in the duration of the task involving finding DOM mutations (T5). The results of Study 2 further confirmed the findings of Study 1 by reducing the duration by almost half (T8).
Thus, CLEMATIS aids understanding the behaviour of web applications by extracting the mutated elements of the DOM, visualizing contextual information about the mutations, and linking the mutations back to the corresponding JavaScript code (RQ1.3).

Test Failure Comprehension. The average recorded task duration for T11 was significantly lower for the experimental group. The participants in the control group often used breakpoints to step through the application's execution while running the provided test case. When unsure of the application's execution, these developers would restart the application and re-execute the test case, extending their task duration. Instead of following a similar approach, those developers using CLEMATIS were able to rewind and replay the application's execution multiple times offline, after only executing the test case once. The trace collected by CLEMATIS during this initial test case execution was used to deterministically replay the execution while avoiding the overhead associated with re-running the test case.

While task duration was significantly improved by CLEMATIS for T11, the average measured task duration was in fact longer for CLEMATIS in T12. However, the participants using CLEMATIS performed much more accurately on T12, suggesting that the task is complex and the main advantage of using CLEMATIS is in accurate completion of the task. Studying the accuracy results for T12 reveals that many of the participants in the control group failed at correcting the faults, and instead simply addressed the failure directly. This may explain why no improvement in task duration was observed for T12, as hiding the failure often requires less effort than repairing the actual fault.

2.8.2 Task Completion Accuracy

Task completion accuracy is another metric for measuring developers' performance. According to the results of both experiments, CLEMATIS increases the accuracy of developers' actions significantly (RQ1.2). The effect is most visible when the task involves event propagation (RQ1.3). The outcome of Study 1 shows that CLEMATIS addresses Challenge 1 (described in Section 2.1) in terms of both time and accuracy (T5.a). Study 2 further indicates that CLEMATIS helps developers to be more accurate when faced with tasks involving event propagation and control flow detection in JavaScript applications (67% and 68% improvement for T7 and T10, respectively).

For the remaining tasks of Study 1, the accuracy was somewhat, though not significantly, improved. We believe this is because of the simplistic design of the experimental object used in Study 1, as well as the relative simplicity of the tasks. This led us towards the design of Study 2, with professional developers as participants and a third-party web application as the experimental object in the evaluation of CLEMATIS. According to the results of Study 2, CLEMATIS significantly improves the accuracy of completion of tasks (T8) that require finding the implications of executed code in terms of DOM state changes (RQ1.3). This is related to Challenge 3 as described in Section 2.1.

For the feature location task (T9), the accuracy results were on average slightly better with CLEMATIS. However, the experimental group spent 68% less time on the task compared to the control group. This is surprising, as this task is common across all applications and programming languages, and we anticipated that the results for the control group would be comparable with those of the experimental group.

Test Failure Comprehension.
The results from both experimental tasks suggest that CLEMATIS is capable of significantly improving the fault localization and repair capabilities of developers (RQ1.5). Many participants in the control group failed to correctly localize the fault, illustrating the difficulty of tracing dependencies in a dynamic language such as JavaScript. Although users in the control group had access to breakpoints, many of them had difficulty stepping through the application's execution at runtime due to the existence of asynchronous events such as timeouts, which caused non-deterministic behaviour in the application when triggered in the presence of breakpoints.

Many of the participants in the control group fixed the failure instead of the actual fault; they altered the application's JavaScript code such that the provided test case would pass, yet the faults still remained unfixed. The JavaScript code related to task 2 contained multiple statements that accessed the DOM dependency of the failing test case assertion. Participants who simply corrected the failure had trouble identifying which of these statements was related to the fault, and as a result would alter the wrong portion of the code. On the other hand, those participants using CLEMATIS were able to reason about these DOM-altering statements using the provided links and slices.

2.8.3 Consistent Performance

Looking at Figures 2.19 and 2.20, it can be observed that using CLEMATIS not only improves both duration and accuracy of individual and total tasks, but also helps developers to perform in a much more consistent manner. The high variance in the results of the control group shows that individual differences of developers (or tools, in Study 2) influence their performance. However, the low variance in all the tasks for the experimental group shows that CLEMATIS helped all developers in the study to perform consistently better by making it easier to understand the internal flow and dependency of event-based interactions.

2.8.4 Threats to Validity

Internal Threats. The first threat is that different levels of expertise in each subject group could affect the results. We mitigated this threat by manually assigning the subjects to experimental and control groups such that the level of expertise was balanced between the two groups. The second threat is that the tasks in the experiment were biased towards CLEMATIS. We eliminated this threat by adopting the tasks from a well-known framework of common code comprehension tasks by Pacione et al. [102]. A third threat arises from the investigators' bias towards CLEMATIS when rating the accuracy of subjects' answers. We addressed this concern by developing an answer key for all the tasks before conducting the experiments. A similar concern arises regarding the task completion duration measurements. We mitigated this threat by presenting each task to subjects on a separate sheet of paper and asking them to return it upon completion. The duration of each task was calculated from the point a subject received the task until they returned the paper to the investigators, thus eliminating our bias in measuring the time (and the subjects' bias in reporting the time). Finally, we avoided an inconsequential benchmark by choosing a tool for the control group in Study 1 that was stable and widely deployed, namely Firebug. In Study 2, the developers in the control group were given the freedom to choose any tool they preferred (and had experience with).

External Threats.
An external threat to validity is that the tasks used in the experiment may not be representative of general code comprehension activities. As mentioned above, we used Pacione's framework, and thus these tasks are generalizable. A similar threat arises with the representativeness of the participants. To address this threat, we used both professional web developers and students/post-docs with previous web development experience.

Reproducibility. As for replicating our experiments, CLEMATIS, the experimental object Phormer, and the details of our experimental design (e.g., tasks and questionnaires) are all available, making our results reproducible.

2.8.5 Limitations

The contributions of this work were essential basic steps towards an interactive approach for understanding event-based interactions in client-side JavaScript. However, our approach has several limitations and leaves much room for future improvement.

JavaScript is a highly dynamic language, and many cases that occur in JavaScript applications are not currently supported by CLEMATIS. As an example, CLEMATIS does not instrument JavaScript code that is maintained in strings and is executed using eval(). Also, should an exception occur and change the normal means of function execution, the resulting model may be affected. However, these are among the features of JavaScript that can be handled in the near future using the current design.

There is also room left for research in determining the episode-ending criteria. For terminating an episode, the current approach ensures that the call stack is empty and there are no immediate asynchronous timing events in the event loop. If these conditions hold and there is inactivity in JavaScript execution for a certain amount of time, the algorithm terminates the episode. We determined the minimum required inactivity time by choosing the best results from a set of empirical examinations. Further investigation of this temporal threshold, as well as other criteria that can define the boundaries of episodes, may lead to interesting findings.
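As a rough illustration, the termination check described above amounts to a predicate of the following shape. This is a hypothetical sketch under assumed runtime bookkeeping fields (callStackDepth, pendingTimingEvents, lastJsActivityTimestamp), not CLEMATIS's actual implementation:

    // Sketch of the episode-termination heuristic: an episode ends once the
    // call stack is empty, no asynchronous timing events are pending, and
    // JavaScript has been inactive for at least the empirical threshold.
    function shouldTerminateEpisode(runtime, inactivityThresholdMs) {
      var quiescent = runtime.callStackDepth === 0 &&
                      runtime.pendingTimingEvents === 0;
      if (!quiescent) {
        return false;
      }
      var inactiveForMs = Date.now() - runtime.lastJsActivityTimestamp;
      return inactiveForMs >= inactivityThresholdMs; // threshold chosen empirically
    }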
Finally, the resulting model can still be overwhelming for users. Large-scale enterprise applications often have customized event frameworks and communicate with their servers constantly. CLEMATIS's semantic zooming can help mitigate this issue, but only to a limit. Abstraction and categorization techniques for CLEMATIS's visualization could be applied to further assist the comprehension process.

2.9 Concluding Remarks

Modern web applications are highly dynamic and interactive, and offer a rich experience for end-users. This interactivity is made possible by the intricate interactions between user events, JavaScript code, and the DOM. However, web developers face numerous challenges when trying to understand these interactions. In this paper, we proposed a portable and fully-automated technique for relating low-level interactions in JavaScript-based web applications to high-level behaviour. We proposed a behavioural model to capture these event interactions, and their temporal and causal relations. We also proposed a strategy for helping developers understand the root causes of failing test cases. We presented a novel interactive visualization mechanism, based on focus+context techniques, for presenting these complex event interactions in a more comprehensible format to web developers. Our approach is implemented in a code comprehension tool, called CLEMATIS. The evaluation of CLEMATIS points to the efficacy of the approach in reducing the overall time and increasing the accuracy of developer actions, compared to state-of-the-art web development tools. The greatest improvement was seen for tasks involving control flow detection, and especially event propagation, showing the power of our approach.

Chapter 3

Hybrid DOM-Sensitive Change Impact Analysis for JavaScript

In Chapter 2, we discussed understanding the interactions of episodes of JavaScript execution. However, software constantly changes to adapt to its changing environment, and understanding and analyzing the impact of change has been a popular research trend. Performing change impact analysis on JavaScript applications is challenging due to features such as the seamless interactions with the DOM, event-driven and dynamic function calls, and asynchronous client/server communication.

The first feature is the interplay between the JavaScript code and the Document Object Model (DOM) at runtime. The DOM is a standard object model representing HTML at runtime. DOM APIs are used in JavaScript for dynamically accessing, traversing, and updating the content, the structure, and the style of HTML pages. We have observed that the impact of a code change can be propagated through the DOM, even when there may be no visible connections between JavaScript functions and variables in the JavaScript code. The second feature pertains to the highly dynamic [116] and event-driven [133] nature of JavaScript code. For instance, a single fired event can dynamically propagate on the DOM tree [133] and trigger multiple listeners indirectly. These implicit relations between triggered functions are not directly visible in the JavaScript code. Finally, XMLHttpRequest (XHR) objects used for asynchronous communication in JavaScript can transfer the impact of a change between two parts of the program that are not explicitly connected through the code. For instance, a server response can dynamically generate and execute JavaScript code on the client-side through callbacks.

In this chapter, we propose a hybrid analysis method for change impact analysis of JavaScript-based web applications that combines the advantages of both static and dynamic analysis techniques to obtain a more complete impact set. Our analysis is DOM-sensitive and aware of the event-driven, dynamic and asynchronous entities, and their relations, in JavaScript. It creates a novel graph-based representation capturing these relations, which is used for detecting the impact set of a given code change. The main contributions of our work are as follows.

• A formalization of factors and challenges involved in change impact analysis for JavaScript.

• An exploratory study to investigate the existence and role of impact paths that require the analysis of DOM-related and event-based features in JavaScript. The results show that these features exist in real-world applications and cannot be ignored by JavaScript change impact analysis.

• A DOM-sensitive, event-aware hybrid change impact analysis technique for JavaScript. The approach creates a novel hybrid model for identifying the impact set of a change in a JavaScript application.

• A set of metrics for ranking the inferred impact set to facilitate the finding and understanding of the desired change impact by developers.

• An implementation of our approach in a tool called TOCHAL (TOol for CHange impact AnaLysis).
TOCHAL is open source and available for download [130].

• An empirical evaluation of TOCHAL through a comparison with traditional pure static and dynamic analysis approaches, as well as a controlled experiment to assess the usefulness of TOCHAL in an industrial setting.

Our results show that event-driven and dynamic interactions between JavaScript code and the DOM are prominent in real applications, can affect change propagation, and thus should be part of a JavaScript impact analysis technique. We also find that a hybrid of both static and dynamic analysis techniques is necessary for a more complete analysis. And finally, TOCHAL can improve developers' performance in terms of both impact analysis task completion time (by 78%) and accuracy (by 223%).

1  function checkPrice() {
2      var itemName = extractName($('#item231'));
3      var cadPrice = $('#price_ca').innerText;
4      $.ajax({
5          url : "prices/latest.php",
6          type : "POST",
7          data : itemName,
8          success : eval(getAction() + "Item")
9      });
10     confirmPrice();
11 }
12 function updateItem(xhr) {
13     var updatedInfo = getUpdatedPrice(xhr.responseText);
14     suggestItem.apply(this, updatedInfo);
15 }
16 function suggestItem() {
17     if (arguments.length > 2) {
18         displaySuggestion(arguments[1]);
19     }
20 }
21 function calculateTax() {
22     $(".price").each(function(index) {
23         $(this).text(addTaxToPrice($(this).text()));
24     });
25 }
26 $("#price-ca").bind("click", checkPrice);
27 $("prices").bind("click", calculateTax);

Figure 3.1: Motivating example: JavaScript code

3.1 Impact Transfer in JavaScript

Many unique features of JavaScript applications require special attention during impact analysis. These features include (but are not limited to) DOM interactions, dynamic event-driven execution of functions, and asynchronous communication with the server. The impact can be transferred through these entities, without direct visible relations in the JavaScript code. Throughout the rest of the paper, we use the term indirect impact to refer to change impact transferred through such features. Impact transferred directly through JavaScript code, e.g., through function calls, is referred to as direct impact.

Relevant Entities. Let f be a JavaScript function, d a DOM element, and x an XHR object.

1  <img id='item231' src='img/items/231.png' itemName='dress'/>
2  <fieldset name='prices'>
3      <div class='price' id='price-ca'>120</div>
4      <div class='price' id='price-us'>110</div>
5  </fieldset>

[The original figure also shows the corresponding DOM tree.]
Figure 3.2: Motivating example: HTML/DOM

If F, D, and X are sets representing each of those entities, respectively, then the set of all relevant entities is defined as E = F ∪ D ∪ X.

Change impact can propagate between these JavaScript entities through a series of read and write operations.

Read/Write Operations. Let ε1 and ε2 be arbitrary entities in E. Suppose ε1 writes to ε2 at time τ. Then the relation is represented as ε1 Wτ ε2. If ε1 is read by ε2 at time τ, then the relation is represented as ε1 Rτ ε2.

The semantics of the relations between entities are drawn from actual JavaScript execution mechanisms (e.g., function f reads from a DOM element d). However, for each W/R relation between two entities, there is a conceptual R/W relation between the same two entities in the opposite direction (e.g., d writes to f). Definition 3.1 formalizes the notion of impact transmission between two entities.
Impact. Let ε1 and ε2 ∈ E. If the value and/or the behaviour of ε2 depends on the value and/or the behaviour of ε1, then ε1 is said to have an impact on ε2, represented as ε1 → ε2.

The impact can also be indirectly transferred from entity ε1 to ε2, if ε1 writes to ε3, which is later read by ε2. We call this relation a WR pair.

We use a simple motivating example, presented in Figures 3.1–3.2, to illustrate the challenges and explain each definition. We use different portions of this example in the following subsections. Note that this is a simple example; these challenges are much more potent in large and complex web applications.

3.1.1 Impact through the DOM

In modern web applications, the DOM structure [42] evolves dynamically through the execution of JavaScript code, to update the content, structure, and style of the application in a responsive manner. A JavaScript function can write to a DOM element, which in turn can be read by another function and thus impact its behaviour. Such DOM elements can transfer the impact of a change indirectly. Impact transferred through the DOM introduces an important challenge in identifying change impact in web applications. Hence, in this work, we propose DOM-related dependencies as additional means of impact transfer.

1  function checkPrice() {
2      ...
3      var cadPrice = $('#price_ca').innerText;
4      ...
5  }
6  function calculateTax() {
7      $('.price').each(function(index) {
8          $(this).text(addTaxToPrice($(this).text()));
9      });
10 }
11 $('#price_ca').bind('click', checkPrice);

[The original figure also shows the corresponding DOM tree.]
Figure 3.3: Impact transfer through DOM elements.

Example. Figure 3.3 displays a portion of the motivating example that contains a hidden DOM-based dependency. Function calculateTax() retrieves all DOM elements having the class attribute price (line 7). The function then recalculates the price of each element to include the tax and rewrites the value of the element with the new price (line 8). Later, when the function checkPrice() (line 1) is invoked through a user event (registered in line 11), it retrieves the value of a DOM element with id "price-ca" (line 3) and uses this value to perform other operations. So far, there is no direct relation between functions calculateTax() and checkPrice() that shows any dependency between the two code segments. However, looking at the DOM structure in Figure 3.3, we can see that the element with ID "price-ca" is also an instance of the price class. This means that the value used by checkPrice() may be affected by calculateTax(). This is a simple example of a dynamic DOM dependency, which needs to be taken into account in impact analysis of JavaScript applications.

3.1.2 Impact through Event Propagation

In web applications, a single event can propagate on the DOM tree and invoke multiple handlers of the same event type attached to any of the ancestors of the target element [133]. The direction of the event propagation depends on whether the capturing or bubbling mode is enabled. When capturing is enabled, the event is first captured by the parent element and then passed to the event handlers of children, with the deepest child element being the last.
With bubbling enabled, an event first triggers the handler of the target element on which the event was fired, and then it bubbles up and triggers the parents' handlers. The second type of impact dependencies we introduce pertains to the hidden relations between the handlers invoked via propagation of the original event on the target DOM tree. Such invoked handlers can be involved in change impact propagation, i.e., they can affect the control flow of the application and thus need to be considered in impact analysis for JavaScript.

1  function checkPrice() {
2      ...
3  }
4  function calculateTax() {
5      ...
6  }
7  $('#price_ca').bind('click', checkPrice);
8  $('prices').bind('click', calculateTax);

[The original figure also shows the corresponding DOM tree.]
Figure 3.4: Impact transfer through event propagation.

Example. In the segment of the motivating example shown in Figure 3.4, checkPrice() is attached to the element with id price-ca as an event handler (line 7). Therefore, if a user clicks on that element, function checkPrice() gets invoked. However, price-ca is contained within a fieldset element with the name prices, which is similarly bound to an event handler for the click event (Figure 3.4, line 8). Due to event bubbling, any click on price-ca will bubble up to prices and trigger its event handler as well. Hence, calculateTax() is invoked through propagation of an event originally targeting price-ca. As a result, the execution of calculateTax() also depends on the price-ca element, in addition to the prices element.
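For readers less familiar with this mechanism, the following standalone sketch reproduces the behaviour described above using the standard addEventListener API. It is our own illustration, independent of the motivating example's jQuery code; the element names merely mirror Figure 3.4:

    // Minimal illustration of event bubbling: clicking the inner element
    // triggers its own handler first, then the handler of its ancestor.
    var prices = document.createElement('fieldset'); // plays the role of 'prices'
    var priceCa = document.createElement('div');     // plays the role of 'price-ca'
    prices.appendChild(priceCa);
    document.body.appendChild(prices);

    priceCa.addEventListener('click', function checkPrice() {
      console.log('checkPrice: handler of the target element');
    });
    prices.addEventListener('click', function calculateTax() {
      // Reached via bubbling: the click on priceCa propagates up the tree.
      console.log('calculateTax: handler of an ancestor element');
    });

    priceCa.click(); // logs checkPrice, then calculateTax

Note that both handlers run even though only one of them is registered on the clicked element, which is exactly the hidden dependency the analysis must account for.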
3.1.3 Impact through Asynchronous Callbacks

XMLHttpRequest (XHR) objects help developers enrich user experiences with web applications through asynchronous communication with the server. While increasing the interactivity and responsiveness of applications, XHR usage adds additional complexity to impact analysis. Each XHR consists of three main phases: open, send, and response. A callback function is invoked when the XHR response is received from the server, without user involvement. A change in opening the request, sending it, or the sent message can impact the response of the server, as well as the behaviour of the application after receiving the response through a callback function. Different components of XHR objects can make the detection of control and data flow relations more troublesome, particularly when these components are not necessarily collocated in the same function or module. This motivates the third type of impact dependencies we introduce in this work.

1  function checkPrice() {
2      ...
3      var itemName = extractName($('#item231'));
4      $.ajax({
5          url : 'prices/latest.php',
6          type : 'POST',
7          data : itemName,
8          success : eval(getAction() + 'Item')
9      });
10     ...
11 }
12 function updateItem(xhr) {
13     var updatedInfo = getUpdatedPrice(xhr.responseText);
14     suggestItem.apply(this, updatedInfo);
15 }
16 function suggestItem() {
17     if (arguments.length > 2) {
18         displaySuggestion(arguments[1]);
19     }
20 }

Figure 3.5: Impact transfer through asynchronous callbacks.

Example. checkPrice() sends an asynchronous request to the server (lines 4–9 in Figure 3.5). However, the assigned callback function cannot be recognized statically, as the code dynamically uses the action chosen by the user to invoke the appropriate function (line 8, eval). Let's assume that in this example the selected action is to "update" the price of an item, and hence the updateItem() function is assigned as the callback function of the XHR object. As can be seen, there are no direct function calls or shared variables between checkPrice() and updateItem() that would enable traditional change impact analysis techniques to derive a dependency relation between the two functions. However, the XHR message, along with the data that was sent with it, can affect the response that comes back from the server and can thus impact the behaviour of updateItem().

3.1.4 JavaScript Dynamism

Many traditional impact analysis techniques use static aspects of the code to determine the impact set of a change. Dynamic features of the JavaScript language pose a challenge to static analysis techniques. For instance, almost everything in JavaScript, from fields and methods of objects to their parents, can be created or modified at runtime. Also, JavaScript's dynamic policies for invoking functions can add more complexities. One such policy is function variadicity, which is common in web applications [116]; i.e., in JavaScript, functions can be invoked with more or fewer arguments compared to the parameters specified in a function's static declaration. In addition to the DOM, event, and XHR challenges, the dynamic features of the JavaScript language need to be addressed in an effective change impact analysis technique.

Example. In line 14 of Figure 3.5, updateItem() invokes suggestItem() through the apply() function, which makes it impossible to infer the number of passed arguments statically. Function suggestItem(), the callee, takes no arguments according to its declaration (line 16). Yet, the function is invoked with an arbitrary non-zero number of arguments, which can change the execution of the application (Figure 3.5, line 17). Knowledge of the passed arguments at runtime is crucial for performing precise data-flow analysis, required for impact analysis.

3.1.5 Impact Paths

The concept of WR pairs can be generalized to any mechanism that can transfer impact in JavaScript, such as XHR objects, function arguments, and function return values. Consecutive WR pairs involving all JavaScript entities can form general impact paths, as described in Definition 3.1.5.

Impact Path. An impact path of an entity (P(ε)) is a directed acyclic path starting from entity ε. The nodes on the path are entities in the system, and the edges are the directed impact relations that connect those entities.

For instance, updateItem() → suggestItem(), checkPrice() → #price-ca (the DOM element with id=price-ca), and checkPrice() → #price-ca → calculateTax() → addTaxToPrice() are examples of impact paths that exist in the running example (Figure 3.1).

The length of an impact path is defined as the number of entities in the path. The minimum length for propagation of the impact of a change through DOM elements is 3 (f Wτ1 d Rτ2 g | τ1 < τ2, d ∈ D, f, g ∈ F).

3.2 Exploratory Study: DOM-related and Event-based Impacts

We conducted an exploratory study to investigate the role of JavaScript's DOM-related, event-based and dynamic features in code change propagation. Our goal was to understand whether DOM elements, event handlers, and propagated events contribute to forming new impact paths in JavaScript code.

Subject Applications. We selected ten web applications that make extensive use of JavaScript on the client-side for this study. We selected these applications from (1) participants of recent JavaScript programming contests, and (2) trending and popular JavaScript projects on GitHub (https://github.com/trending/). They are listed in column 1 of Table 3.1.

Design. We capture JavaScript–DOM interactions as well as any occurrences of event propagation on the DOM tree.
For each DOM access that occurs during the execution, we collect the accessed entity, the JavaScript function that accesses the DOM, and the access type. Access types are directed operations performed on DOM elements by JavaScript functions, where the direction of the access is determined by the direction of the flow of data. For instance, assume function foo creates DOM element e at time τ, which means foo Wτ e. Then the type of access is element-creation and the direction of access is from foo to e.

The collected data is analyzed to extract the impact set for each of the JavaScript functions involved in the execution. The DOM elements that are located on at least one impact path of a function are the ones that can contribute to the impact propagation process, and are called WR elements for simplicity. The considered impact paths are required to have a length of at least three (e.g., foo Wτ1 e Rτ2 bar, for functions foo and bar and DOM element e). Moreover, redundant impact pairs (reads and writes between the same entities) do not contribute to the impact paths and are therefore eliminated from the analysis.
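To make this criterion concrete, the following sketch (our own simplification, not the study's actual tooling) derives the set of WR elements from a log of timestamped DOM accesses; the record shape is an assumption:

    // A DOM element is a "WR element" if some function writes to it and a
    // *different* function reads it afterwards, i.e., it lies on an impact
    // path of length >= 3 (foo W e R bar). Same-function pairs are redundant.
    function wrElements(accessLog) {
      // accessLog: [{ time, func, element, kind: 'read' | 'write' }, ...]
      var wr = new Set();
      accessLog.forEach(function (write) {
        if (write.kind !== 'write') return;
        accessLog.forEach(function (read) {
          if (read.kind === 'read' &&
              read.element === write.element &&
              read.time > write.time &&
              read.func !== write.func) {
            wr.add(write.element);
          }
        });
      });
      return wr;
    }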
Results. Each application is manually exercised in different scenarios multiple times and the results are integrated. The results are shown in section (A) of Table 3.1. The first column of section (A) displays the total number of DOM elements accessed by JavaScript code during execution. The second column of the section shows the ratio of WR DOM elements to the total number of involved DOM elements from the first column. The number of DOM event handlers that were triggered during the execution is shown in column three. Column four represents the percentage of handlers that were invoked through event propagation (capturing or bubbling) relative to the total number of triggered handlers. The average length of impact paths in each application is depicted in column one of section (B).

Table 3.1: (A) Results of analyzing JavaScript's DOM-related and dynamic features. (B) Factors in determining impact metrics.

Application   LOC  | #DOM elem.  %WR DOM elem.  #handlers  %prop. handlers | Avg. path length  fan-in elem.  fan-out elem.  fan-in func.  fan-out func.
same-game     229  | 62          98             20         45              | 6.3               1.9           2.9            23.1          15.2
ghostBusters  343  | 44          61             39         0               | 4.3               3.3           0.4            2.6           20.0
simple-cart   9238 | 41          51             14         0               | 3.9               2.1           1.7            2.7           3.3
mojule        522  | 47          17             18         33              | 7.0               1.5           2.1            4.6           3.3
jq-notebook   839  | 1           100            21         38              | 4.0               16.0          11.0           0.9           1.3
doctored.js   3534 | 2           50             47         15              | 5.3               4.0           7.0            1.0           0.8
jointlondon   2498 | 34          9              16         0               | 3.7               0.8           2.2            1.6           0.6
space-mahjon  983  | 61          10             53         4               | 4.0               1.8           3.0            1.5           0.9
listo         354  | 5           20             10         0               | 4.0               0.1           1.5            21.4          1.9
peggame       1274 | 17          6              23         0               | 3.0               0.8           1.3            4.3           2.1
Average       1981 | 31          42             26         14              | 4.6               3.2           3.3            6.3           4.9

The results show that, on average, 42% of the DOM elements that were accessed during the execution of these applications were part of an impact path between two functions. Moreover, about 14% of the executed event handlers were invoked through event propagation mechanisms. The results thus reveal the importance of DOM elements in transferring the impact. Also, the role of propagated event handlers is significant in determining the dynamic behaviour of a JavaScript application. Hence, a change impact analysis technique for JavaScript applications should consider the DOM-related and event-based features as media for propagating the impact.

We further analyze the structure of the created dependency graphs to gain more insight into the nature of DOM- and event-based relations within JavaScript applications. Among all structural and semantic aspects of the graphs, the average fan-in and fan-out scores of the functions and DOM elements in the graphs are reported in section (B) of Table 3.1. These factors are selected due to their correlations with the ratio of WR elements in subject applications. We use this information later in the paper, when we propose a set of metrics for ranking the impact set (Section 3.4).

3.3 Hybrid Analysis

We propose a hybrid technique, called TOCHAL, which augments static analysis with dynamic analysis to enable a DOM-sensitive and event-aware change impact analysis method for JavaScript applications.

3.3.1 Static Control-Flow and Partial Data-Flow Analysis

Our approach first identifies JavaScript entities that can be analyzed statically. Among the entities described in Definition 3.1, JavaScript functions (F) are the only entities that fit this criterion. The DOM is created and mutated during execution, which limits static reasoning about its structure and about possible event propagations that would affect change impact. Regarding XHR objects, it is not easy to infer statically what messages are received from the server. Moreover, there is no static information available regarding the order and timing of asynchronous callbacks.

Our static analysis module incorporates direct relations between functions into a static call graph (SCG) by analyzing the JavaScript code. In JavaScript, functions are first-class citizens and receive the same treatment as objects; we augment the same static call graph with global variables, which we treat similarly to functions. To increase the precision of the static analysis, which in turn improves the quality of the impact set, we perform a pruning algorithm on the extracted dependencies. The pruning is conducted based on a partial data-flow analysis of the call graph. Function invocations are not considered as impact relations unless the two functions have a data dependency through passed arguments or return values, as described below. This does not concern data dependencies through global variables shared between two functions, for which separate dependency relations are formed.

Function Dependencies. Let f, g ∈ F. Then impact relations between f and g are defined as:

• f → g if f invokes g and the signature of g indicates that it takes parameters.

• g → f if f invokes g and the definition of g includes a return value.
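As an illustration, this pruning rule can be phrased over a list of statically extracted call sites roughly as follows; this is a sketch under assumed record shapes, not TOCHAL's actual implementation:

    // Keep a call edge only when data can flow across it: the callee either
    // takes parameters (impact flows f -> g) or returns a value (g -> f).
    function pruneCallEdges(callSites) {
      // callSites: [{ caller, callee, calleeHasParams, calleeReturnsValue }, ...]
      var edges = [];
      callSites.forEach(function (site) {
        if (site.calleeHasParams) {
          edges.push({ from: site.caller, to: site.callee }); // f -> g
        }
        if (site.calleeReturnsValue) {
          edges.push({ from: site.callee, to: site.caller }); // g -> f
        }
      });
      return edges;
    }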
3.3.2 Analyzing the Dynamic Features

To include the dynamic features of the JavaScript language in our impact analysis, our dynamic analysis module intercepts, transforms, and instruments the JavaScript code on-the-fly to collect execution traces. To collect a trace of function executions, the beginning and the end of each JavaScript function are instrumented. Function declarations are modified to collect traces of function invocations and passed arguments. We also trace function terminations and return statements (if they exist in the function). These traces are then used to create a dynamic call graph (DCG) that captures dependencies between function executions at runtime.

The DCG is an under-approximation of the call graph, while the SCG is an over-approximation of the call graph. The DCG contains fewer false positives compared to the SCG, and we augment it to capture the DOM-related, event-driven, and asynchronous features of JavaScript, as explained below.

DOM-Sensitive Impact Analysis

A JavaScript function can impact a DOM element and vice versa, which is defined as follows.

Direct Impact between JavaScript and DOM. Consider a DOM element d ∈ D, and a function f ∈ F. Then f can directly impact d and vice versa:

\[
\begin{cases}
f \rightarrow_{\tau} d & \text{if } f\, W_{\tau}\, d \\
d \rightarrow_{\tau} f & \text{if } f\, R_{\tau}\, d
\end{cases}
\]

Furthermore, the impact can travel from function f to function g through a DOM element d, under certain conditions, as defined in the next definition.

Indirect JavaScript Impact through DOM. Consider two functions f, g ∈ F, and a DOM element d ∈ D. f can indirectly impact g through d if and only if the following conditions hold:

\[
f \rightarrow_{d} g \iff f\, W_{\tau_1}\, d \;\wedge\; g\, R_{\tau_2}\, d \;\wedge\; \tau_2 > \tau_1
\]

In other words, function f can potentially impact function g through DOM element d if f writes to d and g reads from the same element. Such a write-read (WR) pair indicates the existence of a potential impact between the two functions, if the read instruction happens after the write. Such WR pairs can occur subsequently, involving more elements and functions in the application. The reading function can itself write to a DOM element and augment the propagation path. The same change can then potentially impact all elements and functions that are on such a path.

To analyze how the DOM transfers the change impact, all read and write accesses to the DOM need to be monitored dynamically. Each access is made from a JavaScript function to a DOM node, element, or attribute, through standard DOM API calls (e.g., getElementById, querySelector). We modify the prototypes of the Node, Document and Element classes to be able to dynamically intercept DOM accesses, while preserving the original behaviour of these classes. This allows us to monitor changes to the structure of the DOM tree, as well as the existence, content, and attributes of DOM elements.

It is worth mentioning that the caller functions are extracted from the dynamic context of the intercepted DOM API calls. As a result, if a function does not exist or is not detected statically (e.g., it is created through an eval statement), it will still be captured in the dynamic phase if it interacts with other entities and is part of the execution.

For each function and DOM element involved in an access, we augment the dynamic call graph by adding two nodes (if they do not already exist), and connecting them through an edge representing the type of access that was made from the function to the element.
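The flavour of this interception can be illustrated for a single DOM API as follows. This is a minimal sketch: the trace-collection helper is our own assumption, and a real implementation must cover the full DOM API surface and recover the calling function from the dynamic context:

    // Wrap Document.prototype.getElementById so every call is recorded as a
    // 'read' access before delegating to the original implementation.
    var trace = [];
    function recordAccess(entry) {
      entry.time = Date.now();
      trace.push(entry); // the caller is recoverable from the dynamic context
    }

    var originalGetElementById = Document.prototype.getElementById;
    Document.prototype.getElementById = function (id) {
      var element = originalGetElementById.call(this, id);
      recordAccess({ kind: 'read', selector: id, element: element });
      return element; // original behaviour is preserved
    };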
Event-Based Impact Propagation

TOCHAL also captures all event handlers called directly and indirectly through event propagation on the DOM tree. The following definition summarizes the potential impact transmission between a DOM element and all event handlers that are called through event propagation.

Indirect Impact via Propagation. Let d be a DOM element that has an event handler for event e. Consider prop_e[d] to be the set of all JavaScript functions that are submitted as handlers for event e to d or any of its ancestors in the DOM tree, and thus can be triggered by event propagation. Then d can impact all these handlers indirectly: d → prop_e[d].

Moreover, the dynamic analysis module yields information on function arguments and return values for all directly and indirectly invoked functions. These variables may differ from what has been declared in static function signatures, due to function variadicity in JavaScript.

XHR Relations

There are three main phases in the lifecycle of each XHR object: open, send, and response. These three phases can be scattered throughout the code. In addition, callbacks from the server-side could invoke other functions on the client-side, and hence it is not trivial to find the XHR components statically. Our technique instruments and intercepts each component of an XHR object by wrapping around the XHR object of the browser. The gathered information is then used to augment the dynamic call graph.

Table 3.2: Impact transfer through different entities. Assume f, g ∈ F (functions), d ∈ D (DOM) and x ∈ X (XHR).

Relation  Description
f Wτ g    1. f calls g and passes arguments.
          2. g calls f and f returns a value.
f Wτ d    1. f creates element d, adds it to the DOM tree, deletes it, or detaches or relocates it from the DOM.
          2. f modifies the content or the attributes of d.
d Rτ f    1. f uses information regarding the content, attributes, or location of d in the DOM.
          2. f is bound to d through an event handling mechanism.
          3. f is set as an event handler of one of the ancestors of d, and can be triggered via event propagation.
f Wτ x    1. f opens x as a new XHR object.
          2. f sends a previously-created XHR object x.
x Rτ f    1. f is set as the callback function of x.
          2. f sends a previously-created XHR object x.

Similar to the previous features, new nodes representing XHR objects are added to the graph, the involved function nodes are only added if they were not previously included, and the function nodes are linked to the XHR node based on the type of access they make to the XHR object. The access types are defined by the type of the interaction between a function f and an XHR object x, and determine the direction of the impact relation. For instance, if f creates and opens x, then f → x, while if f is registered as the callback function of x, then x → f.

Similar to the static call graph, we enhance the dynamic call graph using inter-procedural data-flow analysis. Arguments and return values of functions are used to trim the call graph where there is no data flow between two functions (as in the Function Dependencies definition of Section 3.3.1). However, instead of using the static function code, the dynamic arguments and return values are used to support function variadicity at run time.
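A minimal sketch of the XHR wrapping described above might look as follows. This is our own simplification: a real implementation must cover more of the XMLHttpRequest API (e.g., onreadystatechange, abort) and link each entry to the calling functions:

    // Wrap the browser's XMLHttpRequest so that the open and send phases and
    // the response callback of every request are observed in one trace entry.
    var OriginalXHR = window.XMLHttpRequest;
    window.XMLHttpRequest = function () {
      var xhr = new OriginalXHR();
      var entry = { opened: null, sent: null, respondedAt: null };

      var originalOpen = xhr.open;
      xhr.open = function (method, url) {
        entry.opened = { method: method, url: url };   // f W x (open)
        return originalOpen.apply(xhr, arguments);
      };

      var originalSend = xhr.send;
      xhr.send = function (body) {
        entry.sent = { body: body };                   // f W x (send)
        return originalSend.apply(xhr, arguments);
      };

      xhr.addEventListener('load', function () {
        entry.respondedAt = Date.now();                // x R f (callback phase)
      });
      return xhr;
    };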
3.3.3 Hybrid Model for Impact Analysis

At this stage, TOCHAL creates a system dependency graph by integrating the obtained static and dynamic call graphs. This graph is used for performing impact analysis on JavaScript applications. The dynamic part of the model contributes to the precision of the analysis, while its static features make it more complete. We take a best-effort approach towards soundness, following the soundiness manifesto [80]. In order to satisfy the practicality of our approach in terms of precision and scalability, complete soundness is not a concern of our approach.

[Graph diagram omitted]
Figure 3.6: A static call graph, displaying the dependencies extracted from the running example (Figures 3.1 and 3.2).

[Graph diagram omitted]
Figure 3.7: A sample hybrid graph, including the dynamic and DOM-sensitive dependencies extracted from the running example (Figures 3.1 and 3.2).

The hybrid model is represented as a directed graph. The vertices are system entities and the edges are the potential impact relations, as summarized in Table 3.2.

Vertices. The vertices in the graph are all entities that are present statically or are created during the execution of an application. The vertices can take one of the following types:

• JavaScript functions. JavaScript functions extracted by our static analyzer (Section 3.3.1) are added as vertices. Functions found dynamically are added as well, due to their involvement in the impact propagation, even if they are not connected directly (Section 3.3.2).

• DOM elements. The importance of DOM elements in transferring the impact dynamically was discussed in Section 3.3.2. Accessed DOM elements (and their contextual information) are captured as vertices in the model.

• XHR objects. XHR vertices incorporate information regarding the creation and sending of messages, as well as callback functions and data transmitted dynamically between the server and the browser.

Edges. The edges in the graph are labeled and directed.

• Direction. The direction of each edge depicts the flow of the data between two vertices. The edges are categorized into read and write accesses. An edge is directed from the vertex that writes (offers) the data to the vertex that reads it.

• Labels. The edge labels indicate the type of dependency relations that connect the vertices. Different labels are used to connect different vertices, since the valid operations vary for each category of vertices.

Example. Figure 3.6 shows a dependency graph for the running example (Figures 3.1 and 3.2) obtained through the static analysis module alone. In contrast, Figure 3.7 depicts a simplified hybrid graph utilizing both the static and dynamic analysis modules of TOCHAL. Using the hybrid graph, one can find the potential impact set of a change in entity ε by tracing the graph forward, starting from ε; this is hard to do with the static graph.

Consider a case where a developer plans to make a change to the calculateTax() function (line 21 of Figure 3.1), and would like to find the potential impact before making the actual change. A DOM-agnostic static change impact analysis method (see Figure 3.6) would report that only addTaxToPrice() would be affected. The source code shows that, next to the addTaxToPrice() function, DOM elements having the class attribute price (lines 22–23 of Figure 3.1) can also be affected. However, our hybrid DOM-sensitive analysis reveals that there exist more impact paths. The DOM element with id=price-ca is also a member of the class price (box 4 of Figure 3.7). This element can thus be impacted by calculateTax(), and in turn can propagate the impact to checkPrice() indirectly (lines 3 & 26 of Figure 3.1, box 1 of Figure 3.7). Furthermore, through the evaluation of the response of an XHR object, checkPrice() can then transfer the impact to updateItem() (box 6), which can propagate the impact to more functions (boxes 5, 8 & 9). To summarize, as our hybrid model shows, changing the calculateTax() function can affect six more elements in addition to the two elements that can be detected by statically analyzing the code. Thus, the proper impact set consists of the functions addTaxToPrice(), checkPrice(), updateItem(), getUpdatedPrice(), suggestItem(), and displaySuggestion(), the DOM element with id=price-ca, and the anonymous XHR object.
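Tracing the graph forward in this way is essentially a reachability computation. The following sketch operates over an adjacency-list representation of the hybrid graph; the representation is our own, for illustration only:

    // Compute the potential impact set of a change: every vertex reachable
    // from the changed entity by following the directed impact edges.
    function impactSet(graph, changedEntity) {
      // graph: Map from vertex id to an array of successor vertex ids
      var impacted = new Set();
      var worklist = [changedEntity];
      while (worklist.length > 0) {
        var current = worklist.pop();
        (graph.get(current) || []).forEach(function (successor) {
          if (!impacted.has(successor)) {
            impacted.add(successor);
            worklist.push(successor);
          }
        });
      }
      return impacted;
    }
    // For the running example, impactSet(hybridGraph, 'calculateTax') would
    // contain addTaxToPrice, '#price-ca', checkPrice, the XHR object, and so on.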
3.4 Impact Metrics and Impact Set Ranking

An impact set inclusive of the contributions of both static and dynamic analyses can become large and overwhelm the user. Considering that not all entities in the impact set are equally important, providing a ranking mechanism is essential for helping developers identify relevant impacted entities more efficiently.

We propose a set of impact metrics to estimate the importance of each entity in the produced impact set. The impact metrics, outlined in Table 3.3, are variables derived from the semantic and structural characteristics of the hybrid graph. These metrics can affect the probability of impact propagation through the DOM-related and dynamic mechanisms of JavaScript. Based on the impact metrics, we propose an impact ranking mechanism, as outlined in Definition 3.4. The impact rank score of each entity ε, referred to as IR(ε), is an estimation of the importance of ε in propagating the change, relative to other elements in the impact set.

Table 3.3: Impact metrics.
Entity | Metric | Description | CC with %WR
d ∈ D (DOM element) | fin(d) | Number of functions f such that f Wτ d | 0.66
f ∈ F (function) | fin(f) | Number of elements d such that d Rτ f | 0.74
f ∈ F (function) | fout(f) | Number of elements d such that f Wτ d | 0.62
ε ∈ D or F | L̂[P] | Average length of impact paths in the application | 0.59
ε ∈ D or F | Dm(ε) | Minimum distance of ε from the change set | –

Impact rank. Let ε be an entity in the impact set. Then the impact rank of ε is defined as:

IR(ε) = (IRpr(ε) × L̂[P(ε)] × Fanw(ε)) / Dm(ε)

The value of the impact rank of an element in the impact set depends on four variables. (1) IRpr(ε): the accumulation of the impact ranks of the immediate precedents of entity ε in the hybrid graph that are on an impact path from the change set to ε. If the impact is transferred to ε from entities with higher ranks, then the importance of ε in transferring the impact potentially increases. (2) Dm(ε): the shortest distance of ε from the change set in the hybrid graph. The closer ε is to the change set, the higher the probability of it being impacted by the change in practice. Hence, a longer distance of ε from the change set lessens its impact rank. (3) L̂[P(ε)]: the average length of an impact path starting from entity ε. If ε can cascade the impact to more elements and deeper levels in the propagation graph, then ε is potentially an important entity in the impact set. (4) Fanw(ε): the weighted fan-in/fan-out score of functions and DOM elements in the hybrid graph, which is indicative of the number of entities that can impact and be impacted by ε directly; hence, we consider it a determining factor in the ranking. Fanw(ε) is calculated as follows:

Fanw(ε) = w1 × fin(ε), if ε ∈ D
Fanw(ε) = w1 × fin(ε) + w2 × fout(ε), if ε ∈ F

L̂[P(ε)] and Fanw(ε) are derived from the findings of our exploratory study (Section 3.2). During the course of the study, we measured the semantic and topological properties of the hybrid graphs of ten web applications. Among the analyzed variables, the length of impact paths, the fan-in of DOM element nodes, and the fan-in and fan-out of function nodes (section (B) of Table 3.1) have a correlation with the number or ratio of DOM elements that can transfer the impact. These variables can therefore affect the probability of impact propagation through an entity, and they thus play a role in these two determining factors influencing the overall impact rank.
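To make the definition concrete, the following is a minimal sketch of the ranking computation. The field names, the weights w1 and w2, the seed value for entities reached directly from the change set, and the assumption that impact paths are acyclic are all simplifications of this sketch; TOCHAL derives the actual quantities from the hybrid graph and the exploratory study.

function fanWeight(entity, w1, w2) {
  return entity.kind === 'dom'
    ? w1 * entity.fanIn                             // Fanw for d in D
    : w1 * entity.fanIn + w2 * entity.fanOut;       // Fanw for f in F
}

function impactRank(entity, w1, w2, memo) {
  memo = memo || new Map();
  if (memo.has(entity)) return memo.get(entity);
  // IRpr: accumulated ranks of immediate precedents on impact paths;
  // entities adjacent to the change set use a seed value of 1 (an assumption)
  var irPr = entity.impactPredecessors.length === 0 ? 1 :
    entity.impactPredecessors.reduce(function (sum, p) {
      return sum + impactRank(p, w1, w2, memo);
    }, 0);
  var rank = (irPr * entity.avgImpactPathLength * fanWeight(entity, w1, w2))
             / entity.minDistanceFromChangeSet;     // Dm
  memo.set(entity, rank);
  return rank;
}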
3.5 Tool Implementation: TOCHAL

We implemented TOCHAL using JavaScript libraries such as Esprima (http://esprima.org), Estraverse (https://github.com/Constellation/estraverse/), and Escodegen (https://github.com/Constellation/escodegen/) for parsing and transforming the JavaScript code. We use these libraries to instrument functions in a manner that permits us to collect data about function invocations, function entries, and function exits. The collected data also include the dynamic values of functions' arguments and return values. External files are attached to the beginning of each document to allow the instrumentation and interception of DOM events, XHR objects, and timeouts. We use the Mutation Summary library [93] to detect JavaScript code appended to the DOM on the fly. Therefore, we can extend function instrumentation to JavaScript code that is created dynamically during application execution. We use WALA [128] to extract the static call graphs of applications.
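The core of this instrumentation can be sketched with the three libraries named above, run under Node.js. The __traceEnter collector is a hypothetical placeholder; the actual tool also instruments individual return statements and records arguments and return values.

var esprima = require('esprima');
var estraverse = require('estraverse');
var escodegen = require('escodegen');

function instrument(source) {
  var ast = esprima.parse(source);
  estraverse.traverse(ast, {
    enter: function (node) {
      if (node.type === 'FunctionDeclaration' || node.type === 'FunctionExpression') {
        var name = (node.id && node.id.name) || '<anonymous>';
        // inject a trace call as the first statement of the function body
        var entry = esprima.parse('__traceEnter(' + JSON.stringify(name) + ');').body[0];
        node.body.body.unshift(entry);
      }
    }
  });
  return escodegen.generate(ast);
}

For instance, instrument('function f() { return 1; }') yields code whose body first calls __traceEnter('f') and then executes the original statements.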
TOCHAL provides an interface for developers to utilize our hybrid change impact analysis. The main goal is to facilitate the comprehension of change impact during development and debugging activities. Hence, the analysis needs to be available to developers while they make changes to their application. We have integrated our impact analysis method within Google Chrome's DevTools [27], a popular web development environment. This decision entails a number of benefits: (1) the approach is complementary to existing web development platforms and environments; it does not change the functionality they provide, but augments their capabilities; (2) the developer can perform the impact analysis in the same context as the code, and can preserve her mental model of the code; and (3) the developer is not required to learn a new tool, or to divide her attention between two different tools.

The interface allows the user to select JavaScript functions (including XHR callbacks) or DOM elements as the change set, and then perform the impact analysis. Chrome's DevTools includes a set of panels, each providing a window to a subset of the functionalities that the platform provides. Two panels are of particular interest to us: "elements" and "sources". The elements panel visualizes and provides inspection mechanisms on the DOM. The sources panel displays all of the JavaScript code that contributes to the application. We add a sidebar to each of these panels, allowing the user to invoke the CIA unit on a selected entity at any stage of development. TOCHAL is publicly available for download [130].

3.6 Evaluation

We empirically evaluate the effectiveness and usefulness of TOCHAL through the following research questions:

RQ4.1 How does our hybrid method compare to static/dynamic analysis methods?

RQ4.2 Does TOCHAL help developers in performing change impact analysis in practice?

We address these questions through two studies, described in the following two subsections.

3.6.1 Study 1: Comparing Static, Dynamic, and Hybrid Analyses

To address RQ4.1, we conduct a study to evaluate the impact sets extracted using TOCHAL in comparison with those detected by static and dynamic analysis techniques separately. We compare TOCHAL with a state-of-the-art static analysis technique. We also examine the differences between the outcomes of TOCHAL and those of the dynamic analysis unit of TOCHAL. Here, the term dynamic analysis encompasses the DOM-sensitive, dynamic, event-driven, and asynchronous analysis performed by TOCHAL. Our first hypothesis is that TOCHAL outperforms static impact analysis due to its support for dynamic analysis. We also hypothesize that while dynamic analysis is a significant part of TOCHAL, it is outperformed by TOCHAL's hybrid analysis. We compare TOCHAL with static and dynamic approaches because, to the best of our knowledge, TOCHAL is the first change impact analysis technique for JavaScript, and there is no similar tool available for JavaScript.

Design and Procedure. The only entities that can be analyzed by both static and dynamic analysis methods are JavaScript functions. Hence, to be fair to both static and dynamic analyses, we configure TOCHAL to deliver only functions in the impact sets. TOCHAL's hybrid model and analysis, however, do not differ from what is described in Section 3.3, and use DOM-based, dynamic, and asynchronous entities and relations to extract the functions in the impact sets.

We use the same set of subject applications as in our exploratory study of Section 3.2 (Table 3.4, column 1). For each application, we randomly sample three functions and extract their impact sets using each of the methods. We compare the impact sets to assess the completeness of the outcomes of each approach. The sample functions are selected from a pool of functions that are recognized by all three analysis methods. In other words, static and dynamic analysis alone are unable to detect some functions that are detected by TOCHAL and involved in its hybrid analysis. If static/dynamic analysis is performed on a function it does not recognize, the impact set will be empty. Since we aim at comparing impact sets at this stage, such functions cannot provide useful information regarding the analysis, and we do not include them in the comparison. However, this indicates the need for an investigation of the functions involved in each type of analysis (in the dependency graphs), as described in the next paragraph.

We measure the number of functions that are included in the dependency graphs of each analysis, contributing to the detection of the impact sets. The average number of functions in each type of analysis denotes the extent of the analysis performed for extracting the impact sets. Moreover, the fewer functions an analysis method recognizes, the more functions there are for which the method is unable to perform CIA.

Pure Static Analysis. We use the static analysis part of our approach, which is built using WALA [128]. WALA is a leading static analysis tool for JavaScript, used by many other JavaScript analysis techniques [125, 136, 137]. It should be noted that WALA by itself does not support change impact analysis. For the purpose of this evaluation, we extended it to perform static impact analysis so that we could conduct the comparisons.

Pure Dynamic Analysis. We disable the static module of TOCHAL and only utilize the dynamic analysis module. The applications are exercised through their test suites, when available, and manually within multiple sessions, and the results are integrated.
Each manual session follows a set of pre-defined scenarios that covers all the main use-cases of the application that are accessible to an end-user.

Table 3.4: Results of the comparison between static, dynamic, and TOCHAL analyses (RQ4.1). (A) Comparison of impact sets. (B) Comparison of functions in the system dependency graphs.

(A) Impact sets (avg/min/max; percentages are relative to TOCHAL):
Application | TOCHAL | Static | Static/TOCHAL % | Dynamic | Dynamic/TOCHAL %
same-game | 4.33/2/7 | 0.67/0/1 | 15 | 3.67/2/6 | 85
ghostBusters | 1.33/0/2 | 0.33/0/1 | 25 | 1.33/0/2 | 75
simple-cart | 1.67/0/3 | 0.67/0/1 | 40 | 1.67/0/3 | 100
mojule | 1.00/0/2 | 0.33/0/1 | 33 | 0.67/0/2 | 67
jq-notebook | 2.67/0/7 | 0.67/0/2 | 25 | 2.33/0/6 | 87
doctored.js | 1.67/0/4 | 0.33/0/1 | 20 | 1.33/0/3 | 80
jointlondon | 2.33/0/5 | 0.67/0/1 | 29 | 1.67/0/4 | 72
space-mahjon | 2.67/1/5 | 1.00/0/2 | 37 | 2.00/0/5 | 63
listo | 1.33/1/2 | 0.00/0/0 | 0 | 1.33/1/2 | 100
peggame | 2.67/1/6 | 1.00/1/1 | 37 | 2.00/0/5 | 75
Average | 2.07/0.5/4.3 | 0.57/0.1/1.1 | 26 | 1.8/0.3/3.9 | 80

(B) Functions in the dependency graphs (percentages are relative to the functions in TOCHAL's hybrid analysis):
Application | TOCHAL (avg) | Static % | Dynamic % | Pure static % | Pure dynamic %
same-game | 16 | 56 | 93 | 6 | 44
ghostBusters | 10 | 80 | 100 | 0 | 20
simple-cart | 44 | 74 | 91 | 9 | 26
mojule | 21 | 24 | 90 | 10 | 71
jq-notebook | 19 | 47 | 100 | 0 | 53
doctored.js | 38 | 67 | 50 | 56 | 33
jointlondon | 36 | 31 | 85 | 15 | 69
space-mahjon | 27 | 56 | 93 | 7 | 44
listo | 12 | 75 | 58 | 25 | 42
peggame | 24 | 83 | 75 | 25 | 17
Average | 25 | 59 | 84 | 15 | 42

Hybrid Analysis. We use the hybrid model of TOCHAL for performing the impact analysis and for obtaining the set of functions that are involved in the hybrid analysis.

Results and Discussion

To answer RQ4.1, we discuss the outcomes of the study, summarized in Table 3.4.

Completeness of Impact Sets. Section (A) of Table 3.4 depicts the results of the impact set detection using the static, dynamic, and hybrid analysis methods. The first column of this section displays the average, minimum, and maximum sizes of the impact sets of the selected JavaScript functions, as detected by TOCHAL. The second column displays the average, minimum, and maximum impact set sizes for static analysis. The third column shows the static impact set size as a percentage of TOCHAL's. Similarly, the fourth and fifth columns show the impact set sizes for dynamic analysis and their ratio to those of TOCHAL, respectively. We observe an increase in completeness for all applications in favour of TOCHAL. On average, the static and dynamic analysis methods detected 0.57 and 1.80 functions in the impact set of each sample function, respectively. The hybrid TOCHAL method, however, extracted an average of 2.07 functions potentially impacted by each of the sample functions.

Overall, the impact sets extracted by TOCHAL include 74% more functions on average compared to those detected by static impact analysis. This outcome confirms the findings of our earlier exploratory study, showing the prevalence and importance of DOM-related and dynamic characteristics of JavaScript in impact analysis. TOCHAL takes into account new types of entities in its dependency graph that are more aligned with the nature of JavaScript (DOM elements, XHR objects). It also recognizes new (and mostly hidden) types of relations between these entities, which lead to dependency graphs and impact sets that are both more complete and more precise. The static analysis still remains useful in TOCHAL, since dynamic analysis covers only 80% of the impact sets detected by the hybrid analysis and cannot replace it.

It is worth noting the small sizes of the static impact sets. Considering the conservativeness of static methods, static dependency graphs in general can include many relations between functions that are not feasible in practice.
Therefore, static impact sets produced by CIA methods for traditional languages are expected to be large and to contain infeasible relations. Our results show the opposite phenomenon for JavaScript applications. The small sizes of the static dependency graphs and the resulting impact sets attest to the difficulties and limitations of static analysis for JavaScript. The findings further confirm that the new forms of definition and usage of functions, objects, DOM elements, and asynchronous objects hinder the construction of useful dependency graphs. Static analysis faces barriers in the analysis of JavaScript applications that should be mitigated using an approach that supports these features.

Necessity of Hybrid Analysis. Separate data sets are collected from each type of analysis, to let us distinguish between the statically detected and the dynamically invoked functions. Through these data sets, we extract the functions that were invoked dynamically but were not detected statically, and the functions that were extracted before the execution but did not play a role at runtime. The results are summarized in section (B) of Table 3.4. The first column of this section contains the total number of functions that were included in our hybrid analysis. The second and third columns represent the percentages of these functions that were covered by the static and dynamic analysis units, respectively.

The static analysis method covers only about 56% of the functions that are covered by the hybrid analysis. As expected, this inadequacy is caused by the dynamism of JavaScript, even in simple function invocations. Moreover, the dynamic analyzer includes 86% of the functions detected by the hybrid analysis. This confirms our anticipation of the incompleteness of dynamic analysis, due to its reliance on the specific executed scenarios of the code. The increase in the number of JavaScript functions covered by our proposed hybrid analysis leads to a more complete system dependency graph, which is used to perform the impact analysis. Hence, the proposed hybrid approach can improve the accuracy of JavaScript impact analysis.

Column 4 in section (B) of Table 3.4 represents the percentage of JavaScript functions that were detected by the static analyzer, but were not invoked during any of the executions of the applications. Note that the existence of such functions, which would be missed by pure dynamic analysis, is sufficient evidence for the necessity of a hybrid analysis technique. Column 5 of the same section, on the other hand, displays the percentage of functions that were executed during the execution of each application, but were not detected by the static analyzer. The results confirm our earlier observations regarding the shortcomings of static JavaScript analysis, even in a well-established type of analysis based merely on function declarations and invocations.

3.6.2 Study 2: Industrial Controlled Experiment

We conducted a controlled experiment [139] in an industrial environment to address the following two questions, derived from RQ4.2 (the experimental material is available at: http://ece.ubc.ca/~saba/tochal/study-materials.zip):

RQ4.2.1 Does TOCHAL increase the CIA task completion accuracy?

RQ4.2.2 Does TOCHAL decrease the CIA task completion duration?

Experimental Subjects. We recruited 10 participants, 5 female and 5 male, with ages ranging from 20 to 34. At the time of the experiment, all participants were employed at a large software company in Vancouver. Their skills in web development ranged from medium to professional.
All participants volunteered to take part in our experiment and did not receive monetary compensation.

Experimental Design. The experiment had a "between-subject" design to avoid carryover effects. We formed two independent groups of participants. The experimental group used TOCHAL for performing the tasks; none of the participants were familiar with TOCHAL prior to attending the experimental sessions. Since TOCHAL is the first CIA tool for JavaScript applications, the control group performed the tasks using the development tool they used in their day-to-day web development activities (without TOCHAL). To avoid bias, it was important for the members of the two groups to have similar proficiency levels. To ensure this was the case, we manually assigned the participants to the groups based on a pre-questionnaire evaluating their experience and expertise.

Tasks. We designed a set of four tasks that involved finding change impacts during web application maintenance activities. The tasks, outlined in Table 3.5, require the participants to detect and comprehend the potential impact of a change in a JavaScript function or a DOM element. Moreover, two of the tasks require participants to use their understanding of the change impact to find a bug or an inconsistency that occurs after the change.

Table 3.5: Impact analysis tasks used in the controlled experiment.
T1 | Finding the potential impact of a DOM element (the button changing the size of the displayed slideshow images)
T2 | Finding the potential impact of a JavaScript function (the function toggling the play/pause state of the slideshow)
T3 | Finding a conflict after making a new change (a problem in submitting new comments after changing the table containing all comments of a picture). Ranking is disabled.
T4 | Finding a bug in JavaScript code (the entered email format is not properly checked)

All features of TOCHAL were available to the experimental group during the experimental session. We were particularly interested in assessing the usefulness of the ranking mechanism of TOCHAL, which was deployed based on our proposed impact metrics (Section 3.4). Hence, we designed a smaller study that enabled us to compare the effects of using the ranking. This comparison was only meaningful for the experimental group, who used TOCHAL and had access to its ranking feature. With this in mind, the two debugging tasks, T3 and T4 in Table 3.5, were designed to have similar levels of difficulty and to require a similar amount of effort and expertise. However, we disabled the ranking feature for T3, while leaving it enabled for T4. These two tasks were counterbalanced to avoid order effects.

Dependent and Independent Variables. We measured two continuous variables as our dependent variables. Task completion duration was measured in minutes and seconds. Task completion accuracy was measured in marks, based on a fixed and predefined grading rubric, and the marks were converted to percentages for consistency across all tasks.

The independent variable is a nominal variable with two levels: one level represents using TOCHAL, and the other represents using a different tool (e.g., Chrome DevTools, Firefox Developer Tools, or Firebug).

Experimental Object. We used Phormer [106], an online photo gallery written in PHP, JavaScript, CSS, and XHTML, which consisted of around 6,000 lines of code and had over 38,000 downloads at the time of the experiment.
Throughout the experiment, the participants had to understand and debug parts of the application related to displaying a slideshow of pictures, viewing the pictures, and authoring comments.

Experimental Procedure. The experiment consisted of three main phases.

Pre-Experiment. The participants were given a pre-questionnaire form before attending the experimental session. They were required to provide information regarding their proficiency and experience in web development and software engineering. This information was used to assign them to one of the two groups prior to the session. The pre-questionnaire also inquired about the tools the participants normally used for performing their everyday web development tasks; these answers contributed to determining the participants' proficiency levels for the group assignment, and indicated the tool the control group would use for the experiment. It is worth mentioning that all participants of the control group selected Google Chrome as their preferred web development tool. In the experimental session, the participants were introduced to TOCHAL for the first time and were trained to use it. They were then given a few minutes to familiarize themselves with the tool, and to ask us questions if needed.

Tasks. During the experimental session, the participants were asked to perform the set of tasks indicated in Table 3.5. To avoid experimental bias, each task was handed to the participant on a separate sheet of paper, at which point the investigator marked the start time of the task. The investigator stopped measuring the task duration when the participant returned the paper to the investigator along with her answer. The task accuracy was marked after the session, using a rubric created prior to conducting the experiment.

Post-Experiment. We asked the participants to fill out a post-questionnaire form after completing the tasks. The questionnaire contained questions regarding both the helpful features and the shortcomings of the tool they used in the experiment. Moreover, we enquired about the metrics our participants considered to affect the importance of an entity in the impact set.

Results and Discussion

Figures 3.8 and 3.9 depict the task completion duration and accuracy results for both the experimental and control groups.

Accuracy (RQ4.2.1). We ran the Shapiro-Wilk normality test on the accuracy data. The results showed that the data collected for tasks T1, T3, and T4 were not normally distributed; hence, Mann-Whitney U tests were used for analyzing the results of these tasks. The data gathered for task T2 and the total accuracy of the tasks were normally distributed and were analyzed using t-tests. The tests revealed a statistically significant difference for the experimental group using TOCHAL (Mean=84%, STDDev=14%) compared to the control group (Mean=26%, STDDev=11%); p-value=0.0001. Overall, participants using TOCHAL performed 223% more accurately than the control group, across all tasks in the experiment.

Further, the accuracy for all tasks was higher when the participants used TOCHAL. The improvement was statistically significant for tasks T1, T2, and T4, but not for T3. Recall from the tasks table (Table 3.5) that the ranking mechanism of TOCHAL was disabled for T3.
This outcome thus emphasizes the value of ranking the impact set in helping the user find the important impacts more efficiently. Without access to the impact ranks, participants had to expend more manual effort. Placing a greater analysis burden on the participants increases the variation in the answers, owing to individual differences in the participants' abilities. This high variation prevented the results for T3 from reaching statistical significance, in spite of the 60% higher average accuracy of TOCHAL users.

[Figure 3.8: Task completion accuracy per task and in total for the control (ctrl) and experimental (exp) groups (RQ4.2.1); box plots of accuracy (%) for T1–T4 and the totals.]

[Figure 3.9: Task completion duration per task and in total for the control (ctrl) and experimental (exp) groups (RQ4.2.2); box plots of duration (mm:ss).]

Duration (RQ4.2.2). We first used the Shapiro-Wilk test on the duration data and confirmed that the data was normally distributed. Therefore, we ran a set of t-tests on all individual tasks, as well as on the total time spent on all of the tasks throughout the experiment. The results show a statistically significant difference in the total duration for the experimental group using TOCHAL (Mean=19:54, STDDev=1:23), compared to the control group using other tools (Mean=35:26, STDDev=8:21); p-value=0.01. Overall, participants in the control group spent on average 78% more time on the same set of tasks than those using TOCHAL.

Further, the participants using TOCHAL showed improvements for all of the tasks compared to the control group, with the difference being statistically significant for T1. Considering the higher accuracy scores for the other tasks when using TOCHAL, we observe that many participants of the control group terminated the tasks with incomplete or incorrect answers, erroneously assuming that they had completed the task. This caused them to obtain lower accuracy scores, while still spending more time on average.

Ranking. The results of the experimental group were analyzed further to investigate the effects of using our proposed ranking system on the duration and accuracy of understanding and debugging an application after a change. T3 and T4 were both debugging tasks requiring similar levels of expertise. As mentioned earlier, TOCHAL's ranking feature was disabled for T3, while it was enabled for T4. A t-test on the results revealed a statistically significant difference in task completion duration between participants who used the ranking mechanism (Mean=1:50, STDDev=0:39) and those who did not (Mean=6:34, STDDev=1:16), with p-value<0.05. We used a Mann-Whitney U test to analyze the accuracy results for the ranking mechanism. Although using the ranks led to an average of 20% higher accuracy in the answers, the results were not statistically significant in this case. However, the participants completed T4 about 3.7 times faster than T3. This significant improvement highlights the importance of ranking the impact set, and is an indication of the usefulness of our impact ranking method (Section 3.4).

Participants' Feedback. We gathered qualitative data from both the experimental and control groups through a post-questionnaire form. The questionnaire asked participants about the usefulness of the tool they used in the study, its strengths, and its shortcomings.
Overall, all TOCHAL users mentioned that they found the tool useful. The participants were particularly pleased with the idea of finding the potential impact of JavaScript functions and DOM elements. Understanding the dynamic behaviour and the underlying dependencies was mentioned as most useful. The users found TOCHAL helpful in solving the problems faster, especially in the presence of its ranking mechanism. The participants in the experimental group were also interested in seeing more features in TOCHAL. The feature requests mostly concerned improving the user interface. Some participants were also interested in having direct debugging support in the tool.

Furthermore, we asked our participants about the metrics they thought could determine the importance of an entity in the impact set. We analyzed and categorized the participants' opinions on impact metrics. As we expected, a few of the final categories were considered in our ranking strategy. These metrics included (1) the distance of an entity from the change set and (2) the number of dependencies of an entity. However, there were many other categories that are not included in our method, such as: (1) the number of invocations of a function, (2) the visibility of the impact in the interface, (3) the importance of the feature to the customers, (4) the "breadth" of usage of the entity (whether it is used by multiple files or is isolated to one file), (5) the complexity of the function, and (6) the "history" of a function (its rate of having faults).

Threats to Validity. The first internal threat is the population selection threat, and specifically the equivalency of the two groups in terms of their expertise. We addressed this threat by first manually dividing the participants into different proficiency levels, extracted from the information gathered in the pre-questionnaire forms. We then distributed the members of each level into the control and experimental groups by random sampling. The second threat is the investigator's bias while marking the accuracy of the answers. We mitigated this threat by creating a marking rubric while designing the tasks, and using the same rubric for marking the results later. A similar threat can arise from bias in measuring the duration of the tasks. We resolved this issue through mutual supervision of the time measurement by both the investigator and the participant: we assigned a separate sheet of paper to each task, which was handed to the participant at the beginning of the task and returned to the investigator upon task completion. The fourth threat can be introduced by the choice of the tool the control group used. This threat was mitigated by allowing the control group to choose the tool of their preference. Finally, compensatory incentives were not a threat to validity, as all of our participants volunteered for the experiment with no monetary compensation.

An external threat concerns the representativeness of our sample of the population. We mitigated this threat by recruiting professional web developers from industry as our participants. Another concern regards the representativeness of the tasks used in the experiment.
We used general tasks, involving understanding the impact of a code change and detecting potential faults in the code, of the kind developers face in their daily professional activities.

3.7 Concluding Remarks

The dynamic, asynchronous, and event-based nature of JavaScript and its interactions with the DOM make modern web applications highly interactive and responsive to users. These same features also introduce new types of dependencies into the system, making the prediction and detection of change impact challenging for developers. In this chapter, we proposed an automated technique, called TOCHAL, for performing a hybrid DOM-sensitive change impact analysis for JavaScript. TOCHAL builds a novel hybrid system dependency graph by inferring and combining static and dynamic call graphs. Our technique ranks the detected impact set based on the relative importance of the entities in the hybrid graph. Our evaluation shows that the dynamic and DOM-based JavaScript features occur in real applications and can constitute significant means of impact propagation. Furthermore, we find that a hybrid approach leads to a more complete analysis compared with a pure static or dynamic analysis. Finally, our industrial controlled experiment shows that TOCHAL increases developers' performance by helping them perform maintenance tasks faster and more accurately.

Chapter 4

Understanding Asynchronous Interactions in Full-Stack JavaScript

JavaScript has become one of the most popular languages in practice. In Chapters 2–3, we discussed understanding the behaviour and impact of change in client-side JavaScript. However, developers now use JavaScript not only for client-side but also for server-side programming, leading to "full-stack" applications written entirely in JavaScript.

There are three groups of challenges involved in understanding the execution on the client side, the server side, and their interactions. First, JavaScript is a single-threaded language, and thus callbacks are often exercised to simulate concurrency. Nested and asynchronous callbacks are used regularly [49] to provide capabilities such as non-blocking I/O and concurrent request handling. This use of callbacks, however, can gravely complicate program comprehension and maintenance, a problem web developers refer to as "callback hell". Second, the Document Object Model (DOM) and custom events, timers, and XMLHttpRequest (XHR) objects interact with JavaScript code on the client and server to provide real-time interaction, all of which complicates understanding. Moreover, Node.js deploys the event-loop model for handling and scheduling asynchronous events and callbacks, the improper use of which can lead to unexpected application behaviour. Finally, client and server code communicate through XHR messages, and multiple messages (and their responses) can be in transit at a given time. As in any distributed system, there is no guarantee on the order or time of the arrival of requests at the server, or of responses at the client. The uncertainty involved in the asynchronous communication makes the execution more intricate, and thus more difficult for developers to understand.

Despite the popularity of JavaScript and the severity of these challenges, there is currently no technique available that provides a holistic overview of the execution of JavaScript code in full-stack web applications. The existing techniques do not support full-stack JavaScript comprehension [10, 61, 84, 101, 143].
In our earlier work, we proposed a technique called CLEMATIS [4] for understanding client-side JavaScript. CLEMATIS, however, is designed only for the client side, and is agnostic of the server, where much of the program logic is located in full-stack applications.

In this chapter, we present a technique called SAHAND, to help developers gain a holistic view of the dynamic behaviour of full-stack JavaScript applications. Our work makes the following contributions.

• We propose a novel temporal and behavioural model of full-stack JavaScript applications. The model is context-sensitive and creates lifelines of JavaScript execution on both the client and server sides. The model connects both sides through their asynchronous communications, to provide a holistic view of the application behaviour.

• We create a visual interface for displaying the model to developers, to help them understand the underlying mechanisms of execution. We treat the model as a multivariate time series, based on which we create a temporal visualization of the lifelines of JavaScript execution.

• We implement our approach in a tool called SAHAND [119]. The tool is browser-independent and non-intrusive. SAHAND can handle the simulated concurrency of JavaScript through the asynchronous execution of callbacks, XHR objects, timers, and events.

• We evaluate our approach through a controlled experiment conducted with 12 participants. Our results show that using SAHAND helps developers perform program comprehension tasks three times more accurately.

4.1 Challenges and Motivation

To comprehend the behaviour of a full-stack web application, one must understand the full lifecycle of a feature on both the client and server sides. We elaborate on some of the challenges involved using the examples illustrated in Figures 4.1–4.3. These are simple examples; the challenges are more potent in large and complex applications.

4.1.1 Challenge 1: Server-Side Callbacks

Receiving requests at the end points. Various types of HTTP requests are received at the end points on a server. Node.js applications have one or more handlers assigned to each incoming request. Each of the handlers can change the flow of execution, return a response to the client, or pass the execution to the next handler. The ability to register anonymous functions, or arrays of functions, can complicate the process of understanding and maintaining the handling and routing of requests.

Example. The example of Figure 4.1 depicts an end point for receiving a GET request (lines 12–18 use Express.js APIs [46]). Three items are registered as handlers of the /locate request. First, an anonymous function is registered (lines 12–15), which can return a response to the client conditionally (lines 13–14) and prevent the execution of the remaining handlers. The second assigned handler (line 16) is an array of the callback functions cb1() and cb2() (line 8). An additional function, cb0(), can be pushed to the array at runtime based on dynamic information (lines 9–11). cb0() can itself affect the control flow and send a response to the client in a specific scenario (lines 3–4). Finally, another anonymous function is added to the list of request handlers (line 16). Understanding how a request is received and routed in the server depends on understanding the complex control flow of all these handlers. This task becomes more challenging as the number of handlers increases in practice.

Callback hell. Functions are first-class citizens in JavaScript. They can be passed as arguments to other functions and be executed later.
Callback functions are widely used in JavaScript applications [49]. However, it is not trivial to understand JavaScript code that deploys callbacks. In many cases, callbacks are nested (up to eight levels deep [49]) or are assigned in loops, which negatively impacts the reader's ability to follow the data and control flow. This problem is known among developers as callback hell [24]. To aggravate the situation, Node.js deploys the event-loop model for scheduling and organizing callbacks. The event loop is not visible to developers, but it determines the asynchronous execution on the server side.

1 var cb0 = function (req, res, next) {
2   var region = locateClient(req.body.client)
3   if (region.ASIA) {
4     res.send(customizedRes(req.body.content))
5   }
6   next()
7 }
8 var cbacks = [cb1, cb2]
9 if (user.isLoggedIn) {
10   cbacks.push(cb0);
11 }
12 app.get('/locate', function(req, res, next) {
13   if (req.header('appStats'))
14     res.send(statCollectionResponse(req.body.stats))
15   next();
16 }, cbacks, function(req, res) {
17   // do stuff
18 })

Figure 4.1: Receiving HTTP requests at an end point

Example. The code in Figure 4.2 depicts a simple example of callback hell. Many callback functions are passed as arguments to other functions in a nested manner (lines 2–5). Callbacks can also be assigned in loops; in the case of our example (lines 3–5), the same anonymous function is assigned as a callback for all iterations of a loop.

1 app.post('/cparse', function(req, res) {
2   customParse(req.body, function(er, list) {
3     list.forEach(function (row, index) {
4       buildScript(row, req.body.format).extractArgs(row, function (instType) {
5         row.forEach(function (arg, i) {
6           resolveAliases(instType, arguments[0]);
7         }) }) }) })
8   // send response back
9 })

Figure 4.2: Callback hell

4.1.2 Challenge 2: Asynchronous Client Side

There are two kinds of asynchronous events typically used on the client side. First, asynchronous XHR messages are used to seamlessly communicate with the server without changing the state. Second, timing events are utilized for performing periodic tasks, or tasks that must take place after a temporal delay. To handle asynchronous events, developers typically use callbacks that are triggered when the event occurs. However, mapping the observed functionality of the event to its original source is a challenging task for developers, especially since the source and the callback are often semantically and temporally separate.

1 function updateUnits() {
2   for (var i = 0; i < unit.length; i++) {
3     (function(i) {
4       $.post(extractUrl(i), function(data) {
5         if (data.requiresAlert())
6           setTimeout(extractMessage(data), msgDelay);
7       });
8     })(i); } }
9 function periodicUpdate() {
10   $.get('/pupdate', function(data) {
11     // do stuff
12   });
13 }
14 setInterval(periodicUpdate, updateCycle);

Figure 4.3: Asynchronous client-side JavaScript

Example. The sample code in Figure 4.3 displays simplified client-side JavaScript code. The updateUnits() function (line 1) posts a set of XHR requests to the server in a loop (lines 2–7). Each of these messages has a callback function that is invoked upon receipt of the server's response. The callback of all sent requests is the same anonymous function (lines 4–7). Based on the content of the response data, a timeout may be set that will execute after a certain delay (line 6). In another part of the code, an interval is set that executes the periodicUpdate() function at periodic intervals.
periodicUpdate() in turn sends a GET request to the server and continues its execution upon the arrival of the response.

4.1.3 Challenge 3: Network Communication

The server and the client communicate through request/response messages. Hence, the role of the network layer needs to be taken into account to obtain a holistic overview of the execution. The requests do not necessarily arrive at the server in the same order in which they were sent from the client side. The processing times of different requests can vary on the server side as well. Moreover, after the responses are sent from the server, there is no guarantee on the time and order in which they will arrive at the client. Observing the behaviour of the application as a whole, on both the client and server sides, is non-trivial. However, this is necessary for developers to understand the full functionality of features throughout their lifespan.

4.2 Approach

In this section, we first present the building blocks of our model. We then discuss the different steps of our approach and how they contribute to the generation of the model.

4.2.1 Temporal and Context-Sensitive Model

Our approach creates a custom directed graph of the context-sensitive executions of events and functions during their lifespan. The model is designed to accommodate the temporal nature of function executions and the asynchronous scheduling mechanisms of full-stack JavaScript. The relations between functions and (a)synchronous events are also temporal, to reflect the precise dynamic and asynchronous behaviour of the application. We use the notations introduced here to show how our approach creates the model based on dynamic analysis.

Vertices. The vertices of the graph can be events or lifelines of function executions:

V ::= LL (lifeline of a function execution)
    | E ((a)synchronous client/server event)

Function executions are the focal points of the model. Each function can go through four phases in its lifecycle. Hence, a lifeline of the i-th execution of function f at time τ during execution (LL<f, i, τ>) manifests as one of the following phases:

LL<f, i, τ> ::= Sch(f) (scheduled: as a callback)
             | Act(f) (active: being executed)
             | Ina(f) (inactive: on the stack, but another function is active)
             | Ter(f) (terminated: execution has finished)

To understand the lifeline of each execution, the model must account for all of these phases. There can be at most one scheduling phase per function execution, depending on whether it was triggered asynchronously; that is, Sch(f) occurs 0 or 1 times at the beginning of a lifeline. Each execution has at least one active phase (Act(f)). If the function invokes another function, the callee becomes active, and the caller becomes inactive until the execution of the callee is finished. Hence, after an initial active phase, a lifeline can contain an arbitrary number of {Ina(f), Act(f)} pairs before its execution is finally terminated (Ter(f)). However, there are cases where the execution is left unterminated, for instance due to exceptions, or due to the execution ending before a scheduled callback occurs. In general, the lifeline of function f can be depicted as:

LL(f) = [Sch(f)] Act(f) (Ina(f) Act(f))* [Ter(f)]

The other type of node included in our model is the event. Events can be synchronous or asynchronous, and can be triggered in the client or the server code. Capturing the events and extracting their relations with the rest of the entities in the application is crucial for program understanding.
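A lifeline can be represented as a simple record that follows the grammar above. The sketch below uses hypothetical field and method names; the actual model stores richer contextual information per phase.

function Lifeline(fn, index, time, scheduled) {
  this.fn = fn;          // the function this execution belongs to
  this.index = index;    // the i-th execution of fn
  this.phases = [];      // ordered: [Sch] Act (Ina Act)* [Ter]
  this.enter(scheduled ? 'Sch' : 'Act', time);
}
Lifeline.prototype.enter = function (phase, time) {
  var current = this.phases[this.phases.length - 1];
  if (current && current.end === undefined) current.end = time; // close open phase
  this.phases.push({ phase: phase, begin: time, end: undefined });
};

// e.g., an asynchronously scheduled callback:
// var ll = new Lifeline(handler, 1, t0, true); // Sch starts at t0
// ll.enter('Act', t1);                         // the callback fires at t1
// ll.enter('Ter', t2);                         // the execution finishes at t2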
Table 4.1 summarizes the information captured by our approach for analyzing the various types of vertices, in addition to the time of event occurrence.

Table 4.1: Types of vertices in the model graph.
Event type | Node | Client/server | Information gathered
DOM event | Ve | client | user input information, DOM element, handler function
Custom event | Ve | client | custom event type, DOM element, handler function
Node.js event | Ve(s) | server | custom event type, registered function
Timeout set | Vll(t) | client & server | custom ID, delay, callback function, setter function
Timeout callback | Vll(t) | client & server | custom ID, callback function, setter function
XHR send | Vll(x) | client & server | custom ID, sent data, callback function, opening and sending functions
XHR callback | Vll(x) | client & server | custom ID, response data, callback function, opening and sending functions

Edges. The edges of the graph have three primary attributes, namely time, type, and direction.

Function lifelines are temporal entities over a contiguous time period. A lifeline can interact with other lifelines and events at multiple points in time during its lifespan. The edges must preserve the temporal aspects of the interactions, and reflect them in the model.

The type of each edge represents the type of interaction between the two involved graph nodes. Function lifelines can interact with each other and with events through various types of relations, which are summarized in Table 4.2.

Table 4.2: Interaction edges.
Edge | Relation | Src | Dst | Sync | Gathered information
Ec | calls | LL | LL | yes | args, context info
Et | terminates | LL | LL | no | return value
Ecs | schedules | LL or E | LL | no | callback type
Ess | schedules (s) | LL | LL | no | callback type
Ets | timeout set | LL | LL | no | delay
Exs | xhr send | LL | LL | both | data
Et | triggers | LL | E | yes | event type
Ee | emits | E | LL | yes | event type

The direction of an edge represents the direction of the control flow between the involved nodes, which depends on the type of the edge.

[Figure 4.4: A sample temporal, context-sensitive and asynchronous model of events, lifelines, and interactions. Client and server lifelines are shown with their Sch/Act/Ina phases, together with DOM event handling and propagation, programmatic events, function invocation and termination, callback scheduling, timeout setting and expiry, XHR send/receive/response, event-loop ticks, and the window of synchronization between the client and the server; the marked interactions (a)–(m) are referenced throughout this section.]

Table 4.3 summarizes the algorithm for creating the model graph based on a selective trace of execution. The rows of the table are the transactions in the trace, and the columns formulate the handling of nodes, edges, and the logic of the algorithm for each transaction. Figure 4.4 provides a schematic representation of the model. We refer to the algorithm table and the model figure throughout the rest of the section, as we discuss the formation of the model.
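Correspondingly, each interaction can be stored as a small edge record carrying the three attributes; this is a minimal sketch with hypothetical field names.

function Edge(type, src, dst, time, info) {
  this.type = type;  // e.g., 'calls', 'schedules', 'timeout set', 'xhr send'
  this.src = src;    // source node: control flows from src ...
  this.dst = dst;    // ... to dst (a lifeline or an event node)
  this.time = time;  // when the interaction occurred within the lifespans
  this.info = info;  // gathered data, e.g., args, return value, delay
}

// e.g., lifeline llCaller invoking llCallee at time t:
// edges.push(new Edge('calls', llCaller, llCallee, t, { args: args }));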
4.2.2 Client-Side Analysis

On the client side, each function is either invoked directly by another function, or is triggered by a DOM event, a scheduled callback (including timing events), or a response to a request sent earlier. Next, we discuss how we create the client-side model based on these entities and their relations.

Events and DOM Interactions. Our approach captures both DOM and custom client-side events. For each event, we gather information on the involved DOM element, the type of user action or programmatic event, the user input, and the invoked handler. Furthermore, our previous study [6] shows that around 14% of triggered handlers are not invoked directly by an event. These handlers are indirectly called through the event propagation mechanisms of JavaScript, where a single event can trigger multiple handlers of the ancestors of the target element [133]. Thus, we capture propagated handlers and their relations with the original events.

Table 4.3: Creation and extension of the behavioural graph based on the operations. (⊔ll: stack of function lifelines. ◯: Node.js event loop. Πfe: list of fired DOM events. Πue: list of unhandled DOM events. The time τ and the side (server/client) are included in all transactions.)

Row 1. Original DOM event <ev, el>. Nodes: e := new Ve(ev, el); ll := new Vll(ev→handler); ⊔ll ← ⊔ll ∪ ll; Πfe ← Πfe ∪ e. Edge: d := new Ee(src: e, dst: ll, action). Instructions: if JS is active, ll.init(Phase.Sch) and Πue ← Πue ∪ e; otherwise, ll.init(Phase.Act).

Row 2. Propagated DOM events (Σpe). Nodes: ep := Πfe→head; for each ei ∈ Σpe: lli := new Vll(ei→handler); ⊔ll ← ⊔ll ∪ lli. Edges: for each ei ∈ Σpe: d := new Ee(src: ep, dst: lli, ep→action). Instructions: for each ei ∈ Σpe: lli.init(Phase.Sch); Πfe ← Πfe ∪ ei.

Row 3. Timeout set. Nodes: to-id := new unique TO ID; llc := ⊔ll→head; ll := new Vll(to-id, delay). Edge: d := new Ets(src: llc, dst: ll, delay). Instructions: ll.init(Phase.Sch); if server side, ◯ ← ◯ ∪ <TO, ll>.

Row 4. Timeout callback. Node: ll := ⊔ll.get(TO→to-id). Instructions: ll.end(Phase.Sch); ll.start(Phase.Act); if server side, ◯.pop(ll→to-id).

Row 5. XHR send. Nodes: xhr-id := new unique XHR ID; llc := ⊔ll→head; ll := new Vll(xhr-id, url, method). Edge: d := new Exs(src: llc, dst: ll, data). Instructions: ll.init(Phase.Sch); if server side, ◯ ← ◯ ∪ <XHR, ll>.

Row 6. XHR callback. Node: ll := ⊔ll.get(XHR→xhr-id). Instructions: ll.end(Phase.Sch); ll.start(Phase.Act); if server side, ◯.pop(ll→xhr-id).

Row 7. Server events. Nodes: llc := ⊔ll→head; e := new Ve(ev); ll := new Vll(ev→handler); ⊔ll ← ⊔ll ∪ ll. Edges: d := new Ee(src: llc, dst: e, e→type); d := new Ee(src: e, dst: ll). Instructions: ll.init(Phase.Act); llc.init(Phase.Ina).

Row 8. Callback scheduling. Nodes: llc := ⊔ll→head; ll := new Vll(callback). Edge: d := new Ecs(src: llc, dst: ll). Instructions: ll.init(Phase.Sch); if server side, ◯ ← ◯ ∪ <CB, ll>.

Row 9. Callback invocation. Node: ll := ◯→head. Instructions: ll.end(Phase.Sch); ll.start(Phase.Act); if server side, ◯.pop(<CB, ll>).

Row 10. Function invocation. Nodes: llc := ⊔ll→head; ll := new Vll(function). Edge: d := new Ec(src: llc, dst: ll). Instructions: ll.start(Phase.Act); llc.start(Phase.Ina).

Row 11. Function termination. Nodes: llc := ⊔ll→pop; llp := ⊔ll→head. Edge: d := new Et(src: llc, dst: llp). Instructions: llc.start(Phase.Ter); llp.start(Phase.Act).

Upon invocation of the original handler, we create a node representing the event and add it to the model (Table 4.3, row 1 & Figure 4.4, a). The node contains information about the target DOM element and the input data (if applicable). If the call stack at the time of the event is empty and the event can be handled immediately, a new lifeline is created for the handler and initialized with an active phase. However, if the call stack is not empty and the browser thread is executing other JavaScript code, the lifeline starts with a scheduled phase, which terminates and enters an active phase as soon as the stack and the waiting event queue are empty and the handler can be invoked. For each propagated handler, a new lifeline is created (linked to the same event node as the original event) and initialized with a scheduled phase (Table 4.3, row 2 & Figure 4.4, b). This lifeline enters the active phase after the execution of the original (preceding) event handler and its synchronous callers has finished, but before any asynchronous event/callback scheduled in the preceding event handler.

A new edge is created from the event node to each of the newly created handler lifelines. The edge to the original handler's lifeline maintains the user action. The edges to the propagated lifelines (if any) indicate the occurrence of the propagation as well as the initial user action. We intercept event handling by instrumenting the registration of event listeners in the code. Our tracing technique then retrieves information regarding the element, the event, and the handler(s) once the event occurs.
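This interception of listener registration can be sketched as follows. trace() is a hypothetical placeholder, and a full implementation would, among other things, also keep a mapping of wrapped handlers so that removeEventListener() keeps working.

var originalAddEventListener = EventTarget.prototype.addEventListener;
function trace(record) { console.log(record); } // placeholder collector

EventTarget.prototype.addEventListener = function (type, handler, options) {
  var element = this;
  var wrapped = function (event) {
    // record the element, the event type, and the invoked handler
    trace({ kind: 'dom-event', type: type, element: element, handler: handler.name });
    return handler.apply(element, arguments);   // run the original handler
  };
  return originalAddEventListener.call(element, type, wrapped, options);
};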
Timeouts. There is often temporal and semantic separation between a setTimeout() call and its callback function. Even in the case of immediate timeouts, the callback is not executed until the JavaScript call stack is empty and there are no other previously triggered DOM and asynchronous events that are not yet handled. Hence, a setTimeout's delay is merely the minimum required time until the timeout expires. We intercept all timeouts by replacing the browser's setTimeout(), similar to our previous work [4].

Each timeout must be set within the current active phase of a lifeline. Upon the setting of a timeout, we create a new lifeline, representing the callback function's execution, that is initialized with a scheduled phase. An edge is created from the current active lifeline to the newly created scheduled lifeline (Table 4.3, row 3 & Figure 4.4, c). The new edge includes the details of the timeout (the delay and the passed arguments). The lifeline proceeds to an active phase when the timeout expires and the callback is executed (Table 4.3, row 4 & Figure 4.4, d).
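The setTimeout() replacement can be sketched in the same style; again, trace() and the ID counter are hypothetical placeholders for the actual collector, and extra arguments to setTimeout() are omitted for brevity.

var originalSetTimeout = window.setTimeout;
var nextTimeoutId = 0;
function trace(record) { console.log(record); } // placeholder collector

window.setTimeout = function (callback, delay) {
  var id = nextTimeoutId++;
  trace({ type: 'timeout-set', id: id, delay: delay }); // lifeline enters Sch
  return originalSetTimeout.call(window, function () {
    trace({ type: 'timeout-callback', id: id });        // Sch ends, Act begins
    return callback.apply(window, arguments);
  }, delay);
};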
XHRs. The server is treated as a black box at this stage. Our technique captures the information regarding the sending of the request (e.g., method, data), the means of receiving the response (e.g., response data, callback), and how it is handled on the client side (sync or async).

When the active lifeline sends a request, we create a new node, initialized with a scheduled phase (Table 4.3, row 5 & Figure 4.4, e). A new edge connects the current active lifeline to the new scheduled one. The new edge encapsulates information regarding the request (the type of the request, sync/async, the URL, and any sent data). When the response is received, the captured information is completed with the response data (Table 4.3, row 6 & Figure 4.4, f).

Function executions. Our analysis of function executions is similar to creating a dynamic call graph that is temporal and context-sensitive. Our method accumulates a trace of function executions initiated by regular function calls, as well as the function executions caused by any of the mechanisms discussed above.

The lifeline node representing the lifecycle of a function execution preserves the temporal states of the function, and the respective edges represent their relations with the rest of the application. Lifelines and their edges map to particular executions of functions and maintain the information regarding the context of that execution (e.g., caller information, dynamic arguments, return values).

There are three possible cases of function invocation, each of which is handled differently. First, when a function is invoked without passing any callbacks, a new lifeline node is created (Table 4.3, row 10 & Figure 4.4, g). The new lifeline is initialized with an active phase, and the execution continues from there. Meanwhile, an inactive phase is added to the caller's lifeline, which finishes and re-enters the active phase when the callee returns. Second, when a function is invoked with a callback, but the callback is not immediately (synchronously) executed, a new lifeline is added for the callee. The lifeline is initialized with a scheduled phase and is not yet marked as active (Table 4.3, row 8 & Figure 4.4, h). Finally, when a function is invoked with a callback function and the passed callback is executed, our method retrieves the existing lifeline where the callback was scheduled and transitions it to an active phase (Table 4.3, row 9 & Figure 4.4, i). Synchronous callback invocations are treated as regular function calls.

Every time a new lifeline is created, it is added to a stack of lifelines (⊔ll). When the execution of a function lifeline terminates after an active phase, the lifeline enters the terminated phase and is popped (Table 4.3, row 11 & Figure 4.4, j).

Our technique instruments all JavaScript functions in order to gather a detailed execution trace dynamically. JavaScript functions can have different return statements in different intra-procedural execution paths. Hence, our method instruments all existing return statements individually. Should a path terminate without a return statement, we inject a different logging function for marking the termination of the function. Function invocations are wrapped within our trace functions. All arguments are examined, and if they are functions, additional instrumentation is added to distinguish potential callback scheduling. The analysis recursively checks the subprogram, and if the potential callback is eventually invoked, the actual callback invocation is annotated through additional tracing code. Further, to distinguish between multiple invocations of the same function, we maintain its contextual information in the caller function and update it per execution of the callee. We pass the updated state to the callee through our instrumentation, where it is used to customize the collected trace for that specific execution.

4.2.3 Server-Side Analysis

Our approach tracks incoming requests from their arrival at the end points of the server. The end-point layer typically contains minimal logic, but can highly affect the flow of execution (e.g., routing to different handlers, sending the response back). The essence of this part of the analysis is similar to the client side. However, the focus at this stage is on challenges specific to server-side JavaScript development, such as callback hell and server-side events. Before discussing our analysis of the server-side behaviour of a JavaScript application, we need to describe the role of the event loop on the server.

Event loop. The event loop consists of a queue of asynchronous events waiting to be executed at each tick of the loop, when the stack (of synchronous functions) becomes empty. The stack, the event loop, and the mechanisms of pushing/popping events in/from the loop determine the order and time of asynchronous event execution. Hence, we need to consider them in our analysis.
Callbacks. We capture all callback invocations (synchronous or asynchronous), their relations with the events in the loop that triggered them, and the consequences of their executions. When a callback is scheduled, a new lifeline node is created in the server side of the model for the callback function, which starts with a scheduled phase. The respective asynchronous event is added to the list of events in the loop. Later, when the event is popped and the callback is invoked, the lifeline is retrieved, the scheduled phase is terminated, and the active phase starts. This part of the analysis is similar to that of the client side, although we consider the event loop and the respective scheduling methods (Table 4.3, rows 8–9 & Figure 4.4, k).

Events. There is no DOM on the server side, and hence there are no user events. However, developers can take advantage of Node.js events to trigger custom events and invoke their handlers using EventEmitters. A major difference between EventEmitters and client-side events is that the former are synchronous in nature and thus do not pass through the event loop. Although these events can be emitted in asynchronous functions, the invocation of their handlers is different from that of asynchronous handlers and thus has to be analyzed differently. In our model, for each emitted event a new event node is created. An edge connects the current active lifeline to the event. The current lifeline enters an inactive phase. A new lifeline in the active phase is created, which is connected to the new event node through an edge. When the execution of the handler finishes, the inactive phase of the original lifeline finishes and it becomes active again (Table 4.3, row 7 & Figure 4.4, l).
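The synchronous nature of EventEmitters, which motivates this separate treatment, can be seen in a few lines of standard Node.js code (an illustrative snippet, not taken from our instrumentation):

  const EventEmitter = require('events');
  const emitter = new EventEmitter();

  emitter.on('greet', (name) => console.log('handler:', name));

  console.log('before emit');
  emitter.emit('greet', 'world'); // handlers run synchronously, on the current stack
  console.log('after emit');

  // Output: 'before emit', 'handler: world', 'after emit' -- the handler does not
  // wait for a future tick of the event loop, unlike a timeout or I/O callback.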
4.2.4 Connecting Client and Server

In a typical web application, execution starts on the client side with an event, which can trigger an asynchronous request to the server. This entails code execution on the server and sending the response back to the client, which completes the lifecycle of the interaction when the execution terminates on the client side. However, JavaScript execution can continue on the client side even while the asynchronous request is being handled on the server. The synchronization of the client- and server-side executions of a full-stack feature occurs in our model when the two ends communicate through XHR objects (Table 4.3, rows 5–6 & Figure 4.4, m).

We create temporal models for both the client and server sides. Due to the network layer in the middle, each side initially treats the other side as a black box. The connections between the two sides are made by marking and tracking the XHR objects. Because the client and the server may have different clocks, we cannot use the timestamps produced by their respective clocks for synchronization. Hence, we track all communications between the client and the server. This way, our approach can find windows of synchronization between the two sides, which start with a request arriving at the server and end when the response is sent back to the client. While this approach only provides a relative sense of time globally, in practice this is sufficient for the purposes of our approach, since it is accurate for each specific full-stack interaction.

4.2.5 Visualizing the Model

In the last step of the approach, we create a visual interface based on our inferred temporal model. The visualization shows the temporal characteristics of the lifelines, events, and their relations, to facilitate understanding of execution patterns. There are three major criteria that need to be considered in creating a visualization for temporal data [2].

1) Time. There are two types of temporal primitives. Time points are specific points on the time axis. Time intervals constitute ranges on the time axis. Our visualization uses time points to represent events and event loop ticks, and time intervals to depict function lifelines and the phases of their lifespans. The time axis can follow one of the common structures of time: linear, cyclic, or branching. The structure of our time axis is a mixture of subsets of both the linear and branching structures. As a linear structure, it follows the natural perception of time, where time passes from past to future, and the temporal primitives are ordered (as opposed to a cyclic perception of time). Moreover, similar to the branching structure, multiple edges can exit a single temporal primitive node. But unlike branching, the outgoing edges actually occur at different timestamps and do not represent alternatives.

[Figure 4.5: A snapshot of the visualization. The main panel shows client- and server-side function lifelines, with different colours for different lifeline phases and a scheduling phase preceding asynchronous callbacks; smaller panels show the client-side event queue (coloured by event type: DOM, timeout, XHR) and the simulated Node.js event loop, where green and red borders represent push and pop operations, and markers indicate when an XHR request is sent and its response is received.]

2) Data. Data is the second criterion of time-series visualization and can be examined from different aspects. The frame of reference for our data is abstract, since it does not encompass a spatial layout. The data is multivariate, since each node contains a set of information (variables) accumulated for the event or lifeline it represents.

3) Representation. The final criterion is the representation of the time-relevant data. This can be of two kinds: static or animated. We deploy a static approach, meaning that our visualization makes all the information available on screen on demand, and hence the viewers can concentrate on the data itself and make comparisons between different parts of the model. We collect multiple variables for each node. Presenting them all to the viewers can be overwhelming and obstruct the overview of the whole model. We utilize basic interaction to allow users to view information on demand by clicking on any of the events.

Lifeline visualization has been used extensively for displaying histories in domains such as medical records [108]. We incorporate a custom lifeline visualization in the interface of our behavioural model.
Visualization example. Figure 4.5 displays a sample snapshot of the interface. The main frame of the visualization depicts our lifelines. Each lifeline represents a particular context-sensitive execution of a function (a). Different phases of a lifeline are depicted as rectangles with different colours on the lifeline (b). If a lifeline represents an asynchronous callback, it starts with a scheduling phase (c). Lines between caller/scheduler lifelines and their respective callee/scheduled lifelines display the edges between the function executions (d).

Once an XHR is sent to the server, an edge connects the scheduled callback to the handler on the server. However, due to potential network delays, the handler execution may start later than when the request is sent (g). The request is then dispatched and handled, until the response is sent back to the client (h). In addition to the main panel, there are two smaller panels that represent the client-side events and the server event loop. The first row on the client panel (i) represents the DOM events, and the timeout and XHR callbacks, that occur on the client side. The colour and label of each cell on this row depict the type of each event. The server's event loop is depicted at the bottom of the server panel (j). Every time a user-defined callback is scheduled, a timeout is set, or an XHR is sent, an event is pushed to the event loop (marked with a green border). When it is a callback's turn to be executed, the corresponding event is popped from the loop (marked with a red border), while the remaining events (if any) can still be observed in the loop. Finally, the horizontal axis below both panels represents the time.

4.2.6 Implementation

We implemented our approach in a tool called SAHAND. We instrument JavaScript code on the server side at startup, using a proxy server built with Node.js and Express.js, and the client-side code on the fly. We create an AST of the code using Esprima [44], instrument the AST using Estraverse [45], and serialize the AST back into JavaScript code with Escodegen [43]. The visualization is built on top of the timeline view of the Google Charts tool [53]. SAHAND is publicly available [119].
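To illustrate this AST-based pipeline, the following sketch shows one way a tracing call could be injected at the top of every function declaration using the same three libraries. The __trace helper is hypothetical, and the transformation is deliberately simplified compared to SAHAND's actual instrumentation:

  const esprima = require('esprima');
  const estraverse = require('estraverse');
  const escodegen = require('escodegen');

  function instrument(source) {
    const ast = esprima.parseScript(source);
    estraverse.traverse(ast, {
      enter(node) {
        if (node.type === 'FunctionDeclaration') {
          // Prepend __trace('<name>') as the first statement of the function body.
          node.body.body.unshift({
            type: 'ExpressionStatement',
            expression: {
              type: 'CallExpression',
              callee: { type: 'Identifier', name: '__trace' },
              arguments: [{ type: 'Literal', value: node.id.name }]
            }
          });
        }
      }
    });
    return escodegen.generate(ast);
  }

  console.log(instrument('function foo() { return 1; }'));
  // Prints the instrumented source, with __trace('foo'); injected
  // as the first statement of foo.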
4.3 Evaluation

We conducted a comparative controlled experiment [139] to investigate the effects of using SAHAND on the performance of developers when understanding full-stack web applications. Our experimental dataset is available online [119].

4.3.1 Experimental Setup

The participants in our study were asked to perform three comprehension tasks on a full-stack JavaScript application.

Experimental subjects. We recruited 12 participants for the experiment, 11 males and one female, aged between 23 and 33. All of the participants are graduate students at UBC who regularly program with JavaScript. None of the participants had used SAHAND prior to the experiments.

Experimental object. We use Math-Race [82] as our experimental object. It is an open-source, online game that allows multiple players to compete over solving simple mathematical problems. During timed cycles of the game, the players can answer questions, keep the history of their scores, and enter the game's hall of fame if they achieve high scores. We chose this application because it is a full-stack JavaScript application built on Node.js. It is also relatively small (about 200–300 lines of JavaScript on each of the client and server sides), and hence it was feasible for our participants to understand its main features during the limited time of the experiment (about 75 minutes). Although it is a small application, it employs many advanced features, such as asynchronous events and callbacks. Our participants had never seen Math-Race before the experiment.

Experimental design. The experiment had a between-subject design. We divided the participants into two groups. The experimental group used SAHAND for performing a set of comprehension tasks. The participants in the control group were allowed to use any existing web development tool. They all selected Google Chrome's Developer Tools [27], one of the most popular client-side development tools, as they all self-reported as experts in it. We also provided the control group with JetBrains' WebStorm [135], a popular JavaScript IDE, for working with the server-side code of the experimental object. In contrast, the experimental group was only allowed to view the code in addition to SAHAND's visualization, and was not permitted to use an IDE or debugger. We limited their access to other tools because we wanted to gain better control of SAHAND's impact on understanding.

Table 4.4: Comprehension tasks of the experiment

  Task | Description
  T1   | Understanding the full-cycle implementation of submitting a correct answer on the client side.
  T2.a | Understanding the time-triggered feature of terminating game rounds, managed by the server.
  T2.b | Detecting a potential event-race condition during client-server communications.
  T3.a | Understanding the purpose of a new feature involving nested callbacks.
  T3.b | Understanding the asynchronous execution of a function involved in nested callbacks.

Task design. We designed a set of tasks that represent common comprehension activities performed in normal development, as proposed by Pacione et al. [102]. Each of our tasks covers multiple activities, and also involves elements specific to JavaScript comprehension. The tasks are summarized in Table 4.4.

Variables. We wanted to measure the performance of developers in performing program comprehension tasks. The dependent variables (DVs) should quantify developers' performance. Our design involves two interval dependent variables: task completion duration and accuracy. We also considered two nominal independent variables (IVs). The first IV is the tool (or set of tools) used for the experiment, and has two levels. One level is SAHAND, and the other is the set of Chrome's DevTools and WebStorm. The second IV is the expertise level of the participants. We wanted to investigate the effects of the participants' expertise on how they comprehend web applications. We classified participants into two groups, namely experts and novices, based on their responses to a pre-questionnaire form (described below).

Experimental procedure. The procedure consists of four parts. In the first part, the participants completed a pre-questionnaire form where they ranked their expertise in web application development using a 5-point Likert scale, along with their prior experience with software development in general. We used a combination of their self-reported expertise and experience to assign an expertise score to each participant. The expertise score was used to assign the participant to either the experimental or the control group. We manually balanced the distribution of expertise in both groups. We also used the expertise score to assess whether the expertise of participants affects their program comprehension performance. In the second part of the experiment, we presented a short tutorial on SAHAND to the experimental group.
However, we did not present any tutorial to the control group, as they identified themselves as experts in Chrome Developer Tools. Both groups were given a few minutes to familiarize themselves with the settings of the experiment, the object application, and the tools. In the third part, the participants performed the tasks (Table 4.4). We presented each task to the participants on a separate sheet of paper, and measured the time from when they started the task until they returned the answer. This setup ensured that the time-tracking process was not biased towards either the examiner or the participant. We measured the accuracy of the answers later, based on a grading rubric that we had finalized prior to conducting the study. The accuracy of the tasks was quantified with a grade between 0 and 100 per task. The tasks and their rubrics, along with the rest of the documentation of the study, are available online [119]. In the fourth part, when the participants finished all of the tasks, they were given a post-questionnaire form. The form asked about their experience with the tool used in the experiment, and its pros and cons. We also solicited participants' opinions on the features they thought would be useful for a web application comprehension tool.

4.3.2 Results

We were interested in observing the effects of tool and expertise on task completion duration and accuracy. The two variables are conceptually dependent, although we did not observe a correlation between them in our experiments. Imagine a case where a participant finishes the tasks early thinking she has found the correct answer, but the answer is incorrect or incomplete. In this case, the fast completion of a task is not an improvement, since the purpose of the question is not fulfilled and the participant has not performed better. Because of this relationship, we performed a multivariate analysis, where we examined the pair of duration and accuracy as the dependent variable.

We performed a set of multivariate analysis of variance (MANOVA) tests to investigate the effects of tool and expertise on the combination of duration and accuracy. Using MANOVA entails two advantages for our analysis. First, it can reveal differences that are not discovered by ANOVA. Second, it can prevent type I errors that may occur when multiple independent ANOVA tests are conducted. We performed the MANOVA tests on the total duration and accuracy (combining all tasks). Next, we ran a MANOVA test on each individual task. If the results of a MANOVA test were significant, we examined the univariate tests (ANOVA) to see whether the significance in performance improvement was due to the duration, the accuracy, or both.

Examining the total results, we found a significant main effect of tool (p < .0001) on the combination of accuracy and duration, but no significant main effect of expertise (p > .05). For the individual tasks too, we found a significant main effect of tool (T1: p < .001, T2: p < .001, T3: p < .05), but not of expertise. We then examined the univariate tests (ANOVA) for each significant result, to find which dependent variable(s) contributed to the significance. From the results, we found that there is a statistically significant difference (p < .00001) in accuracy between the group using SAHAND (M=89%, SD=10%) and the control group (M=30%, SD=11%).
However, we did not find a statistically significant difference in duration (p > .05) between the group using SAHAND (M=32:06 minutes, SD=5:43) and the control group (M=33:49, SD=6:37).

The above results suggest that task completion accuracy was the determining factor in the significance of the results of the multivariate tests. The accuracy results are shown in Figure 4.6. We find that SAHAND helped developers perform comprehension tasks three times more accurately, in about the same amount of time used by the control group.

4.3.3 Discussion

"Fast Is Fine, but Accuracy Is Everything". Using SAHAND significantly improved the accuracy of each individual task in the experiment. The large difference between the means of the two groups, and the high confidence of the test results, emphasize the impact of the challenges of understanding full-stack JavaScript, even for an application as simple as our experimental object. Tasks T1 and T2 sought developers' understanding of two of the basic features of the application, whose implementation was divided between both ends of the application. The tasks also involved understanding features such as event propagation on the client side and asynchronous time management on the server side. Task T3 required understanding the execution of a nested callback code segment, which can create implicit and intricate connections in the application.

[Figure 4.6: Accuracy results (%) per task (T1–T3) and in total, for the experimental and control groups. Gold plots display the experimental (SAHAND) group, and green plots display the control group. Higher values are better.]

Manual analysis of the answers of the control group showed that they all had an incomplete, and sometimes incorrect, vision of the full-stack execution of the features. Their mental model of the application's behaviour missed both entities and connections, on both the client and the server and in their interactions. They obtained significantly lower accuracy scores, while spending about the same time as the experimental group. On the other hand, SAHAND users were able to see all the involved entities and their relations. The model allowed them to extract the information usually hidden in the application, and finish the tasks much more accurately.

"It Will Get Better ... in Time". The results did not show a statistically significant difference in task completion duration between the experimental and control groups. Manual investigation of the control group's answers showed that almost all of them had incomplete (though not necessarily wrong) answers for most of the questions. Therefore, it is possible that these participants spent the whole time on a small portion of the answer compared to the experimental group. This means that overall, as the multivariate tests found, SAHAND users performed better than the control group, as they used approximately the same amount of time to provide significantly more accurate answers.

Further, none of the participants in the experimental group had seen SAHAND before the experiment. We observed that SAHAND users looked more often at the source code and spent more time analyzing and interpreting the interface at the beginning of the session. However, near the end of task T1, they would shift almost all of their attention to the model while solving the problems.
We believe this is due to two main reasons: (1) the users required a short learning phase before performing a real task (although they had a tutorial at the beginning), and (2) only after multiple comparisons between the interface and the actual code were the users able to trust SAHAND as a faithful representation of the behaviour. We believe that developers will get faster using SAHAND once these barriers are overcome. Examining the average time spent on each task, we observed that SAHAND users finished T1 only 8% faster than the control group at the beginning of the session. However, by the end of the session, SAHAND users finished T3 32% faster on average. This result strengthens our intuition that by adopting the tool for a longer period of time, users will become much faster in performing the tasks.

User Feedback. According to the post-questionnaire forms, all SAHAND users found the tool useful. They particularly liked the overview it provided of the whole interaction, and found the unified client/server view most useful. The participants also found it easy to infer function relations from the model, and liked the abstraction and filtering of details in the visualization. However, some of them mentioned that the context-sensitive depiction of functions can become overwhelming in large interaction sessions. They requested interface features such as direct links to the code, showing connections to the DOM, and integration with a debugger. These are interesting directions for future work.

Threats to Validity. The first internal threat is the examiner's bias in measuring the time. We addressed this threat by enforcing mutual supervision of timekeeping by the examiner and the participant. The start and end times of each task were marked by the exchange of the sheets of paper containing the question and the answer of that task between the examiner and the participant. The same threat arises from the examiner's bias while marking the accuracy of the tasks. We mitigated this risk by devising the rubric of each task before conducting the experiments. The rubrics were later used to mark the accuracy of the answers. Another threat is the impact of the expertise level of the participants on their performance in the experiment. We eliminated this threat by determining the expertise level of participants through a pre-questionnaire form before conducting the experiments. We used this information to rank participants into multiple bins based on their expertise levels, and then used random sampling to assign the members of each bin to either the control or the experimental group. The tools used by the control group can introduce another threat. We avoided this threat by letting the participants choose the browser development kit for client-side analysis (all chose Chrome). For the server side, we provided them with WebStorm, a popular enterprise IDE for web development. We mitigated bias in the experiment tasks by designing the tasks based on a framework of common comprehension tasks [102]. Using this framework, we also eliminated a potential external threat arising from the representativeness of the tasks. The second external threat is the representativeness of the participants. We addressed this threat by recruiting graduate students who regularly performed (and researched) JavaScript development. Many of the participants had professional development experience during or prior to the time of this work.
However, our participants were not full-time professional developers, and this could still threaten the validity of our experiment. Finally, to ensure the reproducibility of the experiment, we used an open-source experimental object, and made our tool, the tasks, the questionnaires, and the rubrics public [119].

4.4 Conclusion

Full-stack JavaScript development is becoming increasingly important, yet there is relatively little support for programmers in this space. This chapter introduced SAHAND, a novel technique for aiding developers' comprehension of full-stack JavaScript applications by creating a behavioural model of the application. The model is temporal and context-sensitive, and is extracted from a selectively recorded trace of the application. We proposed a temporal visualization interface for the model to facilitate developers' understanding of the behavioural model. The implementation of the approach is available as an open-source Node.js application [119]. We investigated the effectiveness of SAHAND by conducting a user experiment. We found that SAHAND improves developers' performance in completing program comprehension tasks by increasing their accuracy threefold, without a significant change in task completion duration.

Chapter 5

Inferring Hierarchical Motifs from Execution Traces

5.1 Introduction

Program comprehension is an essential first step for many software engineering tasks. Developers spend a considerable amount of time understanding code: about 50% of maintenance effort is spent on comprehension alone [31]. Unfortunately, code understanding is challenging. To understand code, developers typically start by searching for clues in the code and the environment. Then they go back and forth over the incoming and outgoing dependencies to relate pieces of foraged information. Throughout the process, they collect information they find relevant for understanding the code on an "as-needed" basis [71]. However, developers often fail in searching for and relating information, and lose track of relevant information when using such ad-hoc strategies [117]. Further, developers form mental models of code that are often inaccurate [92]. Thus, there is a need for systematic and automated techniques for program comprehension [76].

Dynamic analysis, which collects and utilizes data (traces) from program execution [34], is a popular technique for program comprehension. However, due to the amount of information obtained during the execution, the traces tend to become complex and overwhelming, and thus difficult to understand [33, 141, 142]. Existing techniques target this problem, e.g., by summarizing traces [55], structuring and visualizing collected data [7, 48], or inferring system specifications [100]. However, the first technique loses some of the data that may still be valuable, and the rest become overwhelming for developers and are not flexible to small variations in the data. The problem can also be approached by finding patterns in the execution. However, prior work in the area has predominantly focused on generic and predefined design patterns, low-level architectural relations between program artifacts, or visualizations of all details of execution [7, 21, 60, 73]. While useful, these approaches do not capture the behavioural patterns that are not defined or even known prior to analysis, but form and recur (with small variations) throughout the execution of a program. Even in more traditional programming languages, patterns in execution do not repeat in exactly the same manner or the same sequence.
Further, the presence of programming language features such as dynamism, asynchrony, and non-determinism in the execution makes the analysis more problematic and burdensome, and renders conventional techniques ineffective. Hence, program comprehension through dynamic analysis still remains challenging [34].

In this chapter, we propose a novel technique for program comprehension by inferring a model of execution motifs. Motifs are abstract and flexible recurring patterns in program execution that serve a specific purpose in the functionality of the program. The term is inspired by sequence motifs, which are recurring patterns in DNA sequences that have a biological function [40]. Our approach discovers motifs from traces containing function executions and events. Our proposed algorithm compares a trace obtained from an interaction session against a database of previously collected traces. It iteratively examines segments of traces to detect sequences of function executions that may recur in execution. It is tolerant of small variations between different manifestations of each motif, allowing abstraction in the inferred motifs. The algorithm discovers hierarchies between motifs as they emerge from the details of execution. The hierarchical structure of inferred motifs reveals how higher-level key points of execution are formed. It allows users to have an overview of the trace, while still having access to all execution details as well as all intermediate levels.

The main contributions of our work are as follows:

• We propose an automated approach for inferring a model of program behaviour, which encompasses hierarchies of abstract recurring motifs extracted from execution traces. Our approach is inspired by techniques from bioinformatics, where similar challenges arise in investigating similarities in large sequences of DNA. The motifs facilitate program comprehension by highlighting the main characteristics of behaviour, and abstracting the details and variations of execution.

• We design and build a visualization technique for presenting the motifs to developers, to provide assistance with program comprehension. Our method is complementary to existing tools and techniques, and is designed to be utilized alongside existing programming environments.

• We implement our approach in a tool called SABALAN that supports JavaScript-based web applications. Our tool is non-intrusive in generating traces, and infers models of recurring motifs from execution traces in an automated manner.

• We evaluate our approach through a controlled experiment conducted with 14 participants, on a set of real-world program comprehension tasks. The results show that using SABALAN helps developers perform program comprehension tasks 54% more accurately than other tools.

5.2 Challenges and Motivation

To assist the process of searching, relating, and collecting information, many techniques collect execution traces, analyze them, and/or visualize the results for the developers. Despite providing the grounds for precise analyses, dynamic traces become very large and cause information overload. Further, they become very complex due to dynamism, asynchrony, and non-determinism in program execution. These challenges render large traces ineffective in assisting program understanding. In this section, we use a simple example to illustrate these challenges (Figures 5.1–5.3). We selected JavaScript for the examples since it is the lingua franca of web development: it has been voted the most popular language [126] and is the most used language on GitHub [74].
JavaScript applications are highly dynamic, asynchronous, and event-driven, and heavily interact with the Document Object Model (DOM) and the server code. These features can help demonstrate trace complexity within small code segments. While our approach is general, we use JavaScript in this chapter to demonstrate it.

[Figure 5.1: I: A sample registration form. II: A) Sample execution trace, and B) hierarchy of inferred motifs. III: Dynamic call graph of the example.]

Overload by Information in Large Traces. The amount of information a trace carries matters, due to the cognitive overload that understanding the trace imposes on developers [35]; e.g., a study found that one GB of trace data was generated for every two seconds of executed C/C++ code [113]. For modern applications, which are often distributed among many nodes with many components involved, the traces become incomprehensible for developers very quickly. Some techniques try to address the problem by reducing the trace during/after its collection [19, 55, 113], by focusing on more important entities or filtering the details of the executions. These techniques have been able to make traces more useful by decreasing the information contained in the traces [61]. However, even with a technique that creates a smaller trace, the trace is still not necessarily understandable for developers, as some of the data might be lost or missed by developers.

Complex and Hidden Dependencies. Revealing abstract and higher-level patterns that highlight the key points of a program's behaviour can facilitate comprehension. The focus of the developer can be guided through a hierarchy of recurring patterns of execution, while all collected information is still preserved for further inquiry. However, extracting such patterns (motifs) is challenging due to dynamism, asynchrony, and non-determinism in program execution.

1 <form>
2   Email: <input type="email" id="email">
3   Address: <input type="text" class="addr">
4   Occupation: <div class="dropdown" id="occupation">
5     <button class="dropbtn">
6       Choose one</button>
7     <div class="dropdown-content">
8       <a href="#">Academic</a>
9       <a href="#">Industry</a>
10     </div>
11   </div>
12   <input type="submit" value="submit">Submit</input>
13 </form>

Figure 5.2: Initial DOM state of the running example.

1 $("#email").addEventListener("change", validateEmail, false);
2 $(".addr").click(checkAddress);
3 $(".dropdown-content").addEventListener("change", occupation, false);
4 function validateEmail () {
5   // do stuff
6 }
7 function checkAddress() {
8   // do more stuff
9 }

Figure 5.3: [Partial] JavaScript code of the running example.

First, there are many complex and hidden dependencies between entities in the system that can affect the execution. Understanding the impact of a user action or of asynchronous communication with the server are examples of relations that are difficult, if not impossible, to capture from merely analyzing the code or the call graph. They act as media for connecting segments of execution together that otherwise would not be related in the code itself.
Further, for a part of the behaviour to be distinguished as a motif, it should recur during the execution. Different executions of what is conceptually the same motif may vary in their details and thus may not converge to reveal the same motif. The alterations are intensified when programs have user interfaces, are distributed, or involve general dynamism and asynchrony. However, the resulting variations should not prevent the analysis from recognizing the high-level blueprint of the behavioural motif they all manifest. An analysis that is overly dependent on all execution details cannot permit higher-level motifs to reveal themselves.

Example. Consider the example shown in Figure 5.1, showing part of a form required for registering a user. Specific events on the input fields of the form have handlers that validate the input before the form can be submitted. validateEmail() and checkAddress() (lines 1–2 of Figure 5.3) are handlers for the email and address fields of part (A) of the form (lines 2–3 of Figure 5.2). The two functions are always executed together in a successful registration scenario, and are a consistent part of the motif representing that scenario, due to their placement in the DOM. However, this relation cannot be inferred from the code (Figure 5.3) or the call graph (Figure 5.1.III).

Moreover, a successful submission requires filling in all the fields properly and submitting. However, the form change in section (B) of Figure 5.1 is based on the occupation input (lines 4–11 of Figure 5.2). If the user chooses Student, a drop-down menu appears and the appearance, content, and functionality of the form change slightly based on the user's input. However, the conceptual purpose of submitting the form, and hence the motif, remain the same. Should an analysis be too dependent on exact execution details, these two executions would be considered different. A more representative analysis should recognize that, regardless of the occupation of the user, the essence of the motif is the same, and it should support both options. There is no prior knowledge, and there are no templates, of the application-specific motifs. Hence, a useful comprehension method should allow a degree of flexibility in inferring motifs, to allow abstract motifs to form independently of unimportant contextual details.

5.3 Overview of the Methodology

For execution traces such as the one depicted in Figure 5.1.II.A, our goal is to infer a hierarchy of recurring motifs by utilizing the knowledge of previous executions of the application. The model of extracted motifs assists comprehension of the program behaviour by facilitating the cycle of searching, relating, and collecting information. With our proposed approach, developers are able to gain an overall understanding of the highlights of execution, manifested as motifs, at a glance. Further, they have the means to understand the details of such motifs, their hierarchies, and their relations upon inquiry (Figure 5.1.II.B). Our approach takes advantage of the precision of dynamic analysis, but prevents developers from being trapped and overwhelmed by the execution details.

Our proposed approach first instruments and intercepts the application on the fly, to obtain traces. Given a knowledge base of previously collected traces and a query trace, our algorithm then extracts motifs of different lengths within the traces, and infers hierarchies and relations of motifs. Our algorithm is inspired by bioinformatics algorithms for aligning biological sequences.
Our approach creates a behavioural model from the motifs and their relations, which we visualize for developers in the final step.

Execution Traces. To obtain the traces required for the algorithm, we instrument the applications and collect dynamic execution information automatically. Our instrumentation allows our technique to intercept all function executions and collect their context-sensitive information. It also intercepts all events that can occur during the execution. This added knowledge helps the algorithm infer conclusions about motifs and their causal and temporal relations with (a)synchronous events. Next, our approach eliminates low-level details included in the raw trace, such as auxiliary events, low-level and library method calls, and framework-specific details. The pruned trace is then used as the input to the algorithm. Note that our method is non-intrusive, and preserves the original behaviour of the application under investigation.

5.4 Algorithm for Inferring Motifs

In this section, we propose our algorithm for detecting motifs in order to create a higher-level model of behaviour. We define motifs as abstract and hierarchical sequences of function executions that recur throughout the lifetime of an application. While each motif eventually supports concrete sequences of function executions, it is by nature a composite element, and can represent more complicated structures. We define a motif as an ordered set of two or more members (m0 to mi), which include [sub-]motifs, abstract entities, and context-sensitive function executions. The confidence of a motif in each of its members is representative of the manner in which that member is observed within different executions of the motif, and is shown as c0 to ci for the motif's members, respectively:

    M = {⟨m0, c0⟩, ..., ⟨mi, ci⟩ | i ≥ 1}
    mi ::= function execution | sub-motif | abstract entity

Our approach draws the attention of developers to the main observed motifs, presumed to represent highlights of behaviour, preventing their view from being obstructed by low-level details. The underlying model still preserves details that can be demanded by users as necessary.

5.4.1 Inspiration from Analyzing Biological Sequences

In designing the algorithm, we were inspired by bioinformatics, where there is a constant need to explore, compare, and analyze large data sequences. Most relevant to our approach are sequence alignment algorithms, which find similarities in sequences of DNA, RNA, and protein by arranging and comparing them [52]. We begin our core algorithm by using a heuristic for finding exact matches between trace sequences. For the comparison of the trace sequences, we adapt the idea of BLAST (Basic Local Alignment Search Tool) [9], a local sequence alignment algorithm, which we modify to fit the domain of execution traces. We then expand the matches by maximizing similarities in their neighbouring entities in the traces. This phase is accomplished using a dynamic programming algorithm, in order to allow more flexible and more abstract motifs. Throughout this process, our method infers existing motifs and reveals their relations and hierarchies. In this section, we use Algorithms 2–3 and Figures 5.4–5.5 to explain our algorithm.

Our algorithm takes the application (app) and its knowledge base (ΣΣdb) as input and returns the extracted motifs as output. Note that we have eliminated and/or merged many details for the sake of brevity; the details can be found in the repository of our open-source tool [118].
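For readers who prefer code to notation, the motif definition given at the start of this section might be represented along the following lines. This object shape is hypothetical, chosen for illustration, and is not SABALAN's actual internal representation:

  // A motif is an ordered list of members, each paired with a confidence value.
  // A member is a concrete function execution, a sub-motif, or an abstract
  // entity that stands for several observed variants.
  const submitMotif = {
    type: 'motif',
    members: [
      { entity: { type: 'function', name: 'validateEmail', arity: 0 }, confidence: 1.0 },
      { entity: { type: 'function', name: 'checkAddress', arity: 0 }, confidence: 1.0 },
      { entity: { type: 'abstract',     // variants observed across traces
                  variants: [{ type: 'function', name: 'studentForm', arity: 0 }, null] },
        confidence: 0.5 },              // weak: observed in only some executions
      { entity: { type: 'function', name: 'submit', arity: 1 }, confidence: 1.0 }
    ]
  };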
[Figure 5.4: A DB of traces (A) and a sample query trace (B) of an application, on the left and right side, respectively. Exact matches of length 2 and 3 between the query trace and different DB traces are marked.]

5.4.2 Forming a Knowledge Base

Our algorithm requires a knowledge base of multiple previous executions, i.e., a set of traces, named database (DB) traces (ΣΣdb), which can initially be collected by executing the test suite, crawling, or exploratory testing and exercising the application multiple times. During each interaction session, our approach collects a trace, called the query trace (Σq), which is analyzed and compared against all DB traces to find its motifs. Each query trace is itself added to the DB traces after the algorithm finishes. This part is depicted in lines 21–23 of Algorithm 2. Parts A and B of Figure 5.4 display sample DB traces and a query trace of the running example (Figure 5.3).

5.4.3 Finding Exact Matches

Next, the algorithm finds all the exact matches of length k between the query trace and the DB traces. We start with matches of length 2 (function pairs). We then increment the length of exact matches iteratively and repeat the search at each iteration, until we have found all exact matches. Two sets of exact matches of length 2 and 3 are shown in Figure 5.4, between two DB traces (part A) and the query trace (part B). Lines 25–26 of Algorithm 2 iterate over the query trace to find matches of length k, and increment k at each iteration. Lines 27–28 show that subsequences of length k are extracted from the query trace at each iteration, and are compared against all k-length subsequences of all DB traces to find matches.

Algorithm 2: Finding exact matches and expanding them
  input:  app, ΣΣdb
  output: motifs
  procedure EXTRACTPATTERNS()
  21  modifiedApp ← INSTRUMENT(app)
  22  rawTrace ← INTERCEPT(modifiedApp)
  23  Σq ← PRUNE(rawTrace)
  24  k ← 2
  25  for i ← k; i ≤ Σq.length; i++ do
  26    for j ← 0; j ≤ Σq.length − k; j++ do
  27      subQ ← EXTRACTSUBTRACE(k, Σq, i, j)
  28      matches ← EXACTDBMATCHES(ΣΣdb, i, subQ)
  29      for m ← 0; m < matches.length; m++ do
  30        for n ← 0; n < matches[m].length; n++ do
  31          dir ← INITIALEXPANSIONDIRECTION()
  32          ΣI ← [qIstart, qIend, dbIstart, dbIend]
  33          if EXPANDABLE(Σq, ΣΣdb[m], ΣI) then
  34            subDb ← matches[m][n]
  35            expandedQ ← Σq.EXPAND(subQ, ΣIq, dir)
  36            expandedDb ← ΣΣdb[m].EXPAND(subDb, ΣIdb, dir)
  37            dir ← DIREXPANSION(toggle: true, ΣΣdb[m], Σq, ΣI)
  38            k.INCREMENT()
  39            motif ← COMPARE(expandedQ, expandedDb, k)
  40            matches.PUSH(motif, k)
  41            motifs.ADD(motif)
  42            ΣI ← ADJUSTEXPANSIONINDICES(dir)
  43          end if
  44        end for
  45      end for
  46    end for
  47  end for
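The seed-finding step (lines 25–28) amounts to a straightforward scan for shared k-length windows. The following naive sketch, in which traces are simplified to arrays of function names, illustrates the idea; SABALAN's real implementation also carries context information and uses the heuristics described above:

  // Find all k-length subsequences shared by the query trace and any DB trace.
  function findSeeds(query, dbTraces, k) {
    const seeds = [];
    for (let i = 0; i + k <= query.length; i++) {
      const window = query.slice(i, i + k);
      dbTraces.forEach((db, t) => {
        for (let j = 0; j + k <= db.length; j++) {
          if (window.every((fn, x) => fn === db[j + x])) {
            seeds.push({ dbTrace: t, queryIndex: i, dbIndex: j, length: k });
          }
        }
      });
    }
    return seeds;
  }

  // e.g., findSeeds(['validateEmail', 'checkAddress', 'occupation', 'submit'],
  //                 [['validateEmail', 'checkAddress', 'occupation',
  //                   'studentForm', 'submit']], 3)
  // yields one seed at queryIndex 0, dbIndex 0.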
5.4.4 Allowing Abstraction in Motifs

In the next step, we expand each match to progress towards finding flexible and abstract motifs that are tolerant of small alterations. This technique decreases dependency on specific execution details, provides a higher-level overview of the semantics of the application, and permits flexible motifs of variable length that may include gaps.

At this stage, our algorithm iteratively performs the following steps. First, it selects two matches from the existing matches, which were determined as the result of the previous step (Figure 5.4). Then, it iteratively expands the query and DB matches in both directions, while gradually incrementing the length of the motif under investigation (lines 29–45 of Algorithm 2). Figure 5.4 also shows an expanded match of two [partially] different sequences, used for measuring their similarities (Figure 5.4.A). This phase continues until the accumulated penalty of gaps interrupts the expansion of the motif.

Next, the algorithm finds a [sub-]sequence of those matches that has the maximum similarity. At this step, we adapt a dynamic programming algorithm called Smith-Waterman, designed for finding patterns in two molecular sequences [123]. This algorithm quantifies the sequence alignment process by assigning scores to matches and mismatches, and penalties to gaps. Aligned sequences are then found by searching for the highest scores in scoring matrices. To adapt this algorithm to our domain and compare the similarities of two traces, we propose a similarity matrix that determines the similarity of two members of traces. The similarity of the traces is determined by a combination of the similarities of all their members, based on a dynamic programming heuristic.

Similarity Matrix. We propose a similarity measure for quantifying the similarities between function executions in traces. Our comparison is based on two metrics: function names and parameters. We devise three scores for comparison: strong match, weak match, and mismatch (leading to a carryover penalty). If two functions match in terms of names and parameter numbers, they are a strong match. The match is weak if only the names are equal (and not the parameter numbers). The reason for considering the parameter count as a separate metric is the nature of JavaScript, where a function call does not need to be faithful to the function signature in terms of arguments (known as function variadicity). Two function executions do not match if both metrics are different. The strong and weak matches are assigned positive scores, while a mismatch is assigned a negative score, as mismatches can represent gaps in the motif, which can accumulate and disrupt a motif. The base scores for matches and penalties are determined using empirical data.

[Figure 5.5: (A) The process of taking two expanded trace subsets, (B, C) forming a scoring matrix based on similarities between sub-traces, and (D, E) finding a match in a manner that maximizes the similarities between sub-traces. The final motif can be seen in (F).]

Moreover, our matrix needs to support the comparison of motifs, to accommodate another extension of the original algorithm, which permits hierarchies between motifs. The function execution members of a motif are compared as explained above. Should a motif contain an abstract node, all valid executions of the abstract node are compared with the other sequence. Throughout the process of comparing motifs, our algorithm infers hierarchies and abstractions of motifs, should they exist, as explained below.
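The following sketch illustrates the adapted scoring scheme and the matrix-filling step over traces of function executions. The concrete score values (+2, +1, −1) and the gap penalty are placeholders; as noted above, SABALAN derives its base scores empirically:

  // Illustrative score values; the actual base scores are determined empirically.
  const STRONG = 2, WEAK = 1, MISMATCH = -1, GAP = -1;

  // Similarity of two function executions: name plus parameter count.
  function similarity(a, b) {
    if (a.name === b.name && a.arity === b.arity) return STRONG;
    if (a.name === b.name) return WEAK;
    return MISMATCH;
  }

  // Smith-Waterman-style local alignment matrix over two trace segments.
  function scoreMatrix(s1, s2) {
    const H = Array.from({ length: s1.length + 1 },
                         () => new Array(s2.length + 1).fill(0));
    let best = { score: 0, i: 0, j: 0 };
    for (let i = 1; i <= s1.length; i++) {
      for (let j = 1; j <= s2.length; j++) {
        H[i][j] = Math.max(
          0,                                                   // local alignment floor
          H[i - 1][j - 1] + similarity(s1[i - 1], s2[j - 1]),  // align two members
          H[i - 1][j] + GAP,                                   // gap in s2
          H[i][j - 1] + GAP                                    // gap in s1
        );
        if (H[i][j] > best.score) best = { score: H[i][j], i, j };
      }
    }
    return { H, best }; // backtracking from `best` recovers the aligned motif
  }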
Then, we perform our adaptation of the Smith-Waterman algorithm on the two expanded sequences, as shown in line 48 of Algorithm 3, which creates a scoring mechanism for comparing the two sequences based on a one-by-one comparison of all their entities, using our similarity matrix. The result is a scoring matrix of the overall scores of comparing the two sequences (Mk+1,k+1). This process is shown for the two sequences of the running example from the previous step in Figure 5.5.A–C. To find the common motif in these sequences, we find the subsequences that hold the highest collective similarity as a group. We start by finding the highest score in the matrix (line 49 of Algorithm 3, Figure 5.5.D), and then trace the matrix back, determining the aligned motif at this stage (lines 50–58 & Figure 5.5.E). For navigating the motif back through the matrix, our dynamic programming algorithm chooses the maximum neighbouring score at each step (line 51). Based on the selected neighbour, the algorithm determines whether that motif member comes from one or both of the sequences, and whether an abstract entity should be injected to show different alternatives of the motif. The motif's confidence in that member is then updated based on how it is selected (lines 53–55 of Algorithm 3). The inferred motif of the running example (Figure 5.5.F) has five members, one of which is abstract. The abstract member was introduced to enable the motif to support both sequences shown in Figure 5.5.A, which serve the same purpose (registration) but are executed in slightly different manners. The abstract member demonstrates that observing the studentForm function in the motif is optional: the motif can either be observed with this function and five total members, or with only four members, which do not include said function.

5.4.5 Inferring Hierarchies of Motifs

In the next step, we devise another extension to the original algorithm, which enables us to infer and reveal hierarchical relations between motifs. By definition, our motifs are composite entities, which can contain other motifs as their members. During the analysis, our algorithm may encounter cases where (1) the match that is being compared is a motif itself, and (2) the expansion leads to the discovery of a new motif. In such cases, a hierarchical relation is added between the two motifs, meaning that not only was the [sub-]motif observed independently within the execution, it also contributes to the formation of the new and larger motif. Our analysis follows a bottom-up approach, starting with function executions as the building blocks of the trace. It iteratively works its way up to higher-level and more abstract motifs that allow flexibility in execution. At each iteration of the algorithm, new motifs can be revealed, which may have hierarchical relations with existing motifs.
Algorithm 3: Inferring a motif
  input:  S1, S2, k
  output: motif
  procedure EXPANDPATTERNS()
  48  Mk+1,k+1 ← SMITHWATERMAN(S1, S2, k)
  49  ⟨imax, jmax⟩ ← MAXSCORELOCATION(Mk+1,k+1)
  50  while imax > 0 AND jmax > 0 do
  51    dir ← MAXNEIGHBOUR(imax, jmax)
  52    if dir == DIAG then
  53      motif.INSERT(ABSTRACT(S1[imax], S2[jmax]))
  54    else
  55      motif.INSERT(FUNCTION(S1[imax] == S2[jmax]))
  56    end if
  57    ⟨imax, jmax⟩ ← BACKTRACK(Mk+1,k+1, S1, S2, dir)
  58  end while
  59  if S2.type == MOTIF then
  60    BUILDHIERARCHY(S2, motif)
  61  end if
  62  return motif

As such motifs emerge, our algorithm captures the process of their formation and their hierarchies in a model, as explained in the next section.

In the running example, we first find an exact match with k = 3, which is a motif itself, but with no abstraction (Figure 5.4). Later, during expansion, we find that this motif is a member of a larger motif (Figure 5.5.F) with two other members (an abstract member and a function execution). The algorithm creates a hierarchy (line 59 of Algorithm 3), which then manifests in the model (Figure 5.6) as an edge from the new abstract motif (node 1) to the sub-motif (node 2).

5.5 Creating and Visualizing the Model

In this section, we explain our methodology for inferring the hierarchical model of behavioural motifs and visualizing it.

[Figure 5.6: Sample model of the running example. The root node (1) is the highest-level inferred motif. Node 2 is a sub-motif of node 1, marked by the hierarchical edge between the two. Node 3 is abstract, allowing variations of its child node to occur in the motif. The leaves (nodes 4–8) are concrete function executions in the trace. The legend distinguishes motif, function, abstract, and dependent vertices, and hierarchy and abstraction edges with strong/weak confidence and exclusive/attribute markers.]

5.5.1 Creating the Motif Model

As mentioned above, during the process of extracting motifs, our approach infers the hierarchies and other potential relations between them. Such structural relations are preserved in a model, represented as a directed acyclic graph (DAG), which evolves as the algorithm proceeds, as explained below.

Vertices. Vertices of the graph can be functions, motifs, abstract entities, or dependent vertices. Function vertices are atomic nodes representing specific and context-sensitive function executions. Motif vertices are semantically composite classes; each motif conceptually contains an ordered set of its members. Abstract vertices are the model's means of supporting flexibility in motifs: should there be alterations between different observations of a motif, an abstract node is used to accommodate all valid cases. Dependent vertices hold additional attributes of other types of vertices, and exist only to provide more information about other vertices; e.g., an event contributing to a function execution is shown as a dependent vertex.

Edges. The vertices of the graph are connected through directed and ordered edges. The edges are responsible for connecting motifs to their members. The direction of an edge is from a motif node to its members, which are ordered based on the time they were observed in the traces. The edges also represent the confidence of the algorithm in the respective member (strong or weak), based on the manner of observation of the member. Edges may contain other special attributes, depending on their type and the purpose they serve in the model. For instance, the "edge exclusion" property is used to show that only one of the variations of an abstract node is valid at a given time.
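As a rough illustration, the model of the running example could be encoded as plain vertex and edge records. This encoding is hypothetical; it mirrors Figure 5.6 rather than SABALAN's internal classes:

  // Vertices: node 1 is the root 'Registration' motif; node 2 is a sub-motif;
  // node 3 is abstract; nodes 4-8 are concrete function executions.
  const vertices = [
    { id: 1, kind: 'motif',    label: 'Registration' },
    { id: 2, kind: 'motif',    label: 'sub-motif' },
    { id: 3, kind: 'abstract' },
    { id: 4, kind: 'function', name: 'validateEmail' },
    { id: 5, kind: 'function', name: 'checkAddress' },
    { id: 6, kind: 'function', name: 'occupation' },
    { id: 7, kind: 'function', name: 'studentForm' },
    { id: 8, kind: 'function', name: 'submit' }
  ];

  // Directed, ordered edges from each motif to its members, carrying the
  // algorithm's confidence and any special attributes.
  const edges = [
    { from: 1, to: 2, order: 0, confidence: 'strong', type: 'hierarchy' },
    { from: 1, to: 3, order: 1, confidence: 'weak' },
    { from: 1, to: 8, order: 2, confidence: 'strong' },
    { from: 2, to: 4, order: 0, confidence: 'strong' },
    { from: 2, to: 5, order: 1, confidence: 'strong' },
    { from: 2, to: 6, order: 2, confidence: 'strong' },
    { from: 3, to: 7, order: 0, confidence: 'weak', exclusive: true } // optional variant
  ];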
Figure 5.6 represents the model of the running example of Figure 5.1. The root of the DAG (node 1) is a motif representing registration. It consists of three members: a sub-motif (2), an abstract node (3), and a function execution (8). The first member is a motif itself, which contains a sequence of three functions from the trace, marked 4–6. The strong edge to the sub-motif (and from there to its children) shows the high confidence of the algorithm in the sub-motif. Node 3 is an abstract node, acting as a placeholder for the valid versions of the node, manifested in its children. This node exists due to the variation between the two observations of the motif (Figure 5.5.A). In the case of our example, the exclusive type of the child's (7) edge demonstrates that the occurrence of this node is optional in the motif (studentForm is either observed or not). Further, the weak edge type displays the algorithm's low confidence in this node. Node 8, the final member of node 1, is an execution of the function submit. All leaves of the DAG are concrete function executions from the trace. Nodes at higher levels of the graph involve abstractions and hierarchies, representing the incremental process of the emergence of motifs from the details of the trace.

Motif Relations. The hierarchies that form between motifs are one type of relation that is preserved in the model. The algorithm also discovers other types of relations between motifs, such as temporal or causal relations. The integration of all these relations depicts how the semantics of the program are shaped from the bottom (small, specific motifs) to the top (larger and more abstract motifs representing key points of behaviour). The motifs may be semantically related in manners that are not obvious from the code. For instance, motif m1 may cause motif m2, or they may be ordered (but not dependent) due to the design and architecture of the system. Querying the model allows us to reveal patterns of motifs themselves, and even to discover patterns that are not known by the developer, are created unintentionally, or are imposed on the system by other factors such as third-party frameworks.

[Figure 5.7: A [modified] screenshot of the visualization. (A): Query trace. (B): Inferred motifs depicted on the table. (C): Motif hierarchies. (D): All motifs. (E): Code panel displaying the selected function/motif code.]

5.5.2 Visualizing the Model

Finally, we visualize the motifs to further assist program comprehension by taking advantage of information visualization techniques. Our web-based visualization provides two main views, displaying (1) the motifs recorded in a specific query trace, and (2) all motifs discovered in the behaviour (DB traces).

Trace Motifs. To allow developers to focus only on the part of the behaviour that is of interest to them, this view displays the motifs that are found within the query trace, freshly recorded from an interaction session (Figure 5.7, A and B). Section (A) of the figure displays the pruned query trace, where time proceeds from top to bottom. Section (B) displays the motifs, distinguished by colour and index. The saturation of each cell of a motif displays the motif's confidence in that member. Each motif may recur multiple times in the same trace, or may contain hierarchies of motifs (Figure 5.7C).

All Motifs. The second view is meant to provide a global overview of the application behaviour by displaying all its motifs, extracted from all DB traces (Figure 5.7D).
5.7 Evaluation

We empirically evaluate our approach by investigating the characteristics of the extracted motifs, as well as the usefulness of our approach for developers and its overhead, through the following research questions.

RQ5.1. What are the characteristics of typical motifs inferred by SABALAN from execution traces?
RQ5.2. Does using SABALAN improve developers' performance on common comprehension tasks?

5.7.1 Motif Characteristics

To address RQ5.1, we performed our analysis on seven JavaScript applications, listed in Table 5.1.

Design. We selected seven open-source JavaScript applications from GitHub (Table 5.1). These applications cover various software domains and were selected based on their popularity and usage. Based on each application's specifications, we selected a method for collecting its traces (running the test suite or designing scenarios for exploratory testing). We provided the traces (DB and query) as input to our approach, analyzed the extracted motifs, and investigated their main characteristics. We measured the number of unique motifs inferred from all traces of each application, calculated their lengths, and analyzed their hierarchies. We recorded the size of the DB traces, in terms of both the number of different traces and the size of a typical trace.

Results and Discussion. The results of the analysis are depicted in Table 5.1. The second column displays the number of lines of code of each application, while the third column contains the number of motifs our approach found. The next column represents the number of DB traces collected for each application. Column five shows the average size of the traces of each application, collected by SABALAN in one-minute interaction sessions; note that our tool performs a level of filtering while logging execution details. Column six shows the average trace size collected by Google Chrome's JavaScript profiler and Timeline. The average trace size using SABALAN is 77 KB, whereas without SABALAN an average of 96 MB of data is produced for the same interaction session. These values emphasize the extent of the information contained in raw traces, even for modest-sized applications, which makes them challenging for developers to analyze.
However, using our approach, developers have an average of 8 recurring high-level motifs per interaction session, each with an average length of 4 (columns 7–9), to guide them through understanding the behaviour. The last column displays the number of unique hierarchical relations between the unique inferred motifs. The numbers confirm that motif hierarchies do form in practice. Further assessment of the structure of the model graphs shows the bottom-up formation of higher-level key points of behaviour from smaller motifs through such hierarchies.

There are a few cases in the results where the algorithm was not able to find many (meaningful) motifs, or any hierarchies. Upon further investigation, we found that these applications rely heavily on external and graphics libraries, which were disabled in our analysis. These features can be activated in the future if needed.

An important factor that can impact the efficacy of the algorithm is the characteristics of the DB traces. The number of initial traces in the knowledge base, their coverage of the application's functionality, and their similarities (or differences) can all affect the quantity and quality of the final motifs. We aimed to maximize the features covered by the DB traces, and stopped collecting new DB traces when we observed that adding a trace did not affect the inferred motifs (an average of 14.5 DB traces per application).

Table 5.1: Characteristics of traces and inferred motifs

Application    LOC    # of M.  # of DB  Trace size (KB)  Raw trace (MB)  Motif length (Avg/Min/Max)  # of unq. H.
Phormer        6000   13       20       84               86              4 / 2 / 11                  4
same-game      229    4        7        255              143             3 / 2 / 4                   0
simple-cart    9238   4        19       45               67              4 / 2 / 8                   3
browserQuest   36206  17       15       67               125             5 / 3 / 9                   2
adarkroom      15612  6        15       41               40              4 / 2 / 6                   2
doctored.js    3534   4        10       16               102             3 / 2 / 5                   1
hextrix        5154   7        16       30               110             4 / 2 / 6                   2
Average        10853  8        14.5     77               96              4 / 2 / 7                   2

5.7.2 Controlled Experiment

Next, we conducted a controlled experiment to assess the effectiveness of our technique for developers in practice and to address RQ5.2. We divided the participants into control and experimental groups. The experimental group used our approach, while the control group used the tools of their choice. The participants accomplished a set of comprehension tasks, and their performance was measured. The tasks were designed based on common software comprehension activities [102]. We defined the performance of a developer as the combination of the time and accuracy of completing the tasks. Our hypothesis was that using our approach would enhance developers' performance in understanding the overall behaviour, main usecases, and recurring motifs of a web application.

Experiment Planning

The goal of our experiment is to investigate the following research questions.
RQ5.2.1. Does using SABALAN decrease task completion duration for common comprehension tasks?
RQ5.2.2. Does using SABALAN increase task completion accuracy for common comprehension tasks?
RQ5.2.3. Is SABALAN better suited for certain types of comprehension tasks?

Variable Selection. Our design involved one independent variable (IV), the variable we controlled: the type of tool used in the experiment, a nominal variable with two levels. We refer to the first level as SABALAN, since those participants used our tool; the second level, which we refer to as OTHER, represents the usage of other tools. Our goal was to measure developers' performance in completing the tasks.
Since performance is not directly measurable, we quantified it using two variables, namely task completion duration and accuracy (both continuous), which were our dependent variables (DVs).

Selection of Object. We chose the Phormer photo gallery application as our experimental object [106]; it has about 6,000 lines of code and over 43,000 downloads. It is an open-source PHP-based application that allows users to store photos, categorize and rate them, and view them as a slideshow. Since we had allocated limited time to each session, we had to choose an application that is simple and yet exhibits realistic motifs in its behaviour; these criteria are met by Phormer.

Selection of Subjects. We recruited 14 participants for the experiment. They were all graduate students in computer science and engineering, and many of them had professional software development experience. The participants, who volunteered for the experiment, consisted of 2 female and 12 male participants, aged between 23 and 35. Knowledge of programming and familiarity with web development (and particularly JavaScript) were our only requirements for selecting participants. Overall, our participants had 1–10 years of web development experience and 1–18 years of software development experience.

Experimental Design. Our experiment had a "between-subject" design. To avoid the carryover effect, we divided our participants into two groups. The experimental group was given access to SABALAN for performing the tasks, while the control group used Google Chrome's Developer Tools for completing the session. All our participants were familiar with DevTools, according to their answers on the pre-questionnaire form, and chose to use it during the experiment. No member of the experimental group was familiar with SABALAN prior to the study session. To avoid bias in favour of either group in terms of proficiency, we collected historical data about our participants prior to scheduling the sessions. We assigned each participant a proficiency score based on a combination of metrics, including their experience with software development, their knowledge of JavaScript, and how they perceived their own expertise. We balanced the proficiency levels between the experimental and control groups.

Table 5.2: Comprehension tasks used in the study.

Task   Description                                          Activities
T1.A   Understanding all common usecases                    A1, A7, A9
T1.B   Determining the most used scenarios                  A6, A7
T2.A   Locating the implementation of a feature for reuse   A1, A3
T2.B   Estimating the quality of said implementation        A4, A5, A8
T3     Understanding the addition of a new feature          A1, A2, A3

Experiment Tasks. We designed five comprehension tasks, as outlined in Table 5.2. The design of the tasks was based on the common program comprehension activities proposed by Pacione et al. [102]. As the name suggests, these activities represent the fine-grained activities that developers need to perform to understand software, regardless of the language and platform used. Table 5.2 shows how each of our tasks covers one or more activities; all activities are covered in our design. Moreover, each task included a mini questionnaire, which asked participants how they perceived the difficulty of the task, the required time, and the expertise level required to accomplish it. We have made all the tasks and datasets publicly available [118].

Experimental Procedure

The procedure of the experimental sessions consisted of three main phases.

• Pre-study. We required all participants to fill in a pre-questionnaire form to gather some demographic data about them.
Further, we used the data regarding their experience, programming habits, and self-perceived expertise level to assign the participants expertise scores. These scores allowed us to fairly balance the expertise levels of the experimental and control groups.

• Training. At this step, the experimental group was given a tutorial on SABALAN, which they were encountering for the first time. Then both groups were given some time to familiarize themselves with the setting of the experiment. We started the tasks when the participants were ready.

• Tasks. During this phase, the participants completed the five comprehension tasks summarized in Table 5.2. Based on our design, we wanted to measure both the duration and the accuracy of completing the tasks. To measure time, we prepared each task on a separate sheet of paper. We started a timer when we handed a task sheet to a participant, and asked her to return it to us (with the answer) as soon as she had completed the task, at which point we stopped the timer. This allowed us to record the time spent on each task. We evaluated the accuracy of each task later, based on rubrics we had prepared prior to conducting the experiment. Moreover, we wanted to gather data on how the participants perceived the tasks. Thus, we provided them with a set of meta tasks that asked about the perceived difficulty, time consumption, and required expertise level of each task. Finally, the participants filled in a post-questionnaire form regarding their experience in the study.

[Figure 5.8]
Figure 5.8: Notched box plots of accuracy results. Green plots display the experimental (SABALAN) group, and gold plots display the control group. Higher values are better.

Results

We first ran the Shapiro-Wilk test on all collected data sets to determine whether they were normally distributed. The accuracy data were normally distributed, so we used two-sample t-tests; the duration data did not pass the normality test, and thus we used the Mann-Whitney U test.

For accuracy, the tests showed a significant difference, with high confidence, in favour of the experimental group using SABALAN (mean = 87.8%, SD = 11.6%) compared to the control group using Chrome DevTools (mean = 50.5%, SD = 11.6%); p-value = 6.2e−05. The accuracy results are shown in Figure 5.8. Overall, using SABALAN increased developers' accuracy in performing comprehension tasks by an average of 54% over other tools (RQ5.2.2). We further analyzed the impact of using SABALAN on the accuracy of individual tasks. The statistical tests showed a significant difference in favour of SABALAN for all tasks except T3: accuracy on tasks T1.A through T2.B was significantly higher using SABALAN, and the results for task T3, although not statistically significant, were on average 23% more accurate when participants used SABALAN.

For duration, the collected task completion data were comparable between the experimental and control groups. Running the tests did not reveal any statistically significant difference in task duration between the two groups (RQ5.2.1).
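For reference, the U statistic behind the Mann-Whitney test used for the duration data is simple to compute for two independent samples. The sketch below, with tie-averaged ranks, is illustrative only; it is not the statistics tooling used in the study, and it omits deriving a p-value from U.

  // Mann-Whitney U for two independent samples, e.g., the task durations
  // of the control group (a) and the experimental group (b).
  function mannWhitneyU(a, b) {
    // Pool both samples, remembering group membership, and sort by value.
    const pooled = [...a.map(v => ({ v, g: 'a' })),
                    ...b.map(v => ({ v, g: 'b' }))].sort((x, y) => x.v - y.v);
    // Assign ranks 1..N, averaging ranks across runs of tied values.
    const ranks = new Array(pooled.length);
    for (let i = 0; i < pooled.length; ) {
      let j = i;
      while (j < pooled.length - 1 && pooled[j + 1].v === pooled[i].v) j++;
      const avg = (i + j + 2) / 2;            // average of ranks i+1 .. j+1
      for (let k = i; k <= j; k++) ranks[k] = avg;
      i = j + 1;
    }
    // Rank sum of sample a, then U1 = R1 - n1(n1+1)/2 and U2 = n1*n2 - U1.
    const r1 = pooled.reduce((s, e, i) => s + (e.g === 'a' ? ranks[i] : 0), 0);
    const u1 = r1 - (a.length * (a.length + 1)) / 2;
    const u2 = a.length * b.length - u1;
    return Math.min(u1, u2);                  // the reported U statistic
  }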
Finally, we analyzed the data collected from the questionnaire forms participants filled in for each task. They rated the difficulty of the tasks between 2.15 and 2.54 on a 5-point Likert scale, which indicates a moderate level of difficulty for all tasks. We compared the difficulty of each task as perceived by the participants with the duration and accuracy results of the same task. Based on the Pearson correlation coefficient, we found no correlation between the perceived difficulty of a task and how participants actually performed it.

5.7.3 Discussion

The results of the experiment revealed that SABALAN improves developers' performance in comprehension by significantly increasing their accuracy, by 54% (RQ5.2.2). The results, however, did not show a significant difference in duration (RQ5.2.1).

Domain Knowledge and System Usecases. One of the first steps towards understanding a program is a general understanding of its domain and overall dynamic behaviour, achieved by identifying the components that provide a solution to the domain. Our results show that SABALAN significantly increases the accuracy of such tasks (T1). This task consisted of two main parts: understanding the overall behaviour and usecases of the experimental object (T1.A), and deciding on their importance (T1.B). Using SABALAN significantly improved the accuracy of these two tasks, by 49% and 78%, respectively. The results show that using SABALAN not only provides a more accurate overview of an application's behaviour compared to ad-hoc approaches, but also helps developers obtain a better understanding of the importance, usage, and interactions of the main system components (RQ5.2.3).

Feature Location. Feature location is one of the main tasks performed during program comprehension and has many applications, such as reuse and testing. Our results show that using SABALAN significantly improved the accuracy of feature location (RQ5.2.2, RQ5.2.3). The experimental group was able to find the components involved in the implementation of a feature and infer their relations 42% more accurately than the control group (T2.A). They were also 42% more accurate in estimating the quality of the implementation of the said feature (T2.B). Investigating the answers revealed that the control group missed many connections in the code that lead to the discovery of different parts of the implementation, and thus failed to create a complete and accurate model of the involved code. Due to their incomplete understanding of the feature, the control group was also unable to properly estimate the quality of the respective part of the application. The experimental group, however, could assess the quality based on the more accurate model of the behaviour, which extracted the feature as a behavioural motif (RQ5.2.3).

Software Change and Root-Cause Detection. The last task (T3) involved understanding the system in order to make a change, by finding the root cause of a particular observed behaviour. The experimental group was able to perform the task 23% more accurately with SABALAN, although the results were not statistically significant (RQ5.2.2). Using SABALAN, they were able to focus on a much smaller part of the code, the part relevant to the feature that needed to change.
However, because we do not have debugger support within SABALAN, using common debugging techniques such as setting breakpoints and watching variables required the participants to frequently switch between the visualization and the application. We believe we could have achieved statistical significance for this task if we extended our research prototype or integrated it with a debugger such as Google Chrome's DevTools.

Accuracy over Speed. The results did not show any significant difference in task completion duration in favour of SABALAN. We believe this is not a significant issue, for three reasons. First, the accuracy of performing a task is more important than its speed [7]. The significant improvement in task completion accuracy with SABALAN (54%), and the test's high confidence in the result, emphasize the challenges of comprehending traces, as well as the usefulness of SABALAN in improving developers' performance on these tasks. Investigating the answers further, we found that many participants in the control group finished the tasks early, assuming they had the right answers, while in fact they were not even aware that they were missing crucial parts of the answer, which resulted in lower accuracy than the SABALAN users. Second, we believe that our participants' unfamiliarity with SABALAN might have caused them to spend more time learning to use it. This theory is strengthened by the analysis of individual task results: the experimental group had the worst speed ratio compared to the control group on the first task (T1.A), after which they quickly improved and surpassed the control group on later tasks. Finally, dividing the locus of attention may have also played a role in the results. While the control group focused only on the browser, the SABALAN group had to switch back and forth between the application (browser) and the tool. We believe this problem can be solved by either extending the tool or integrating it into an existing programming environment.

Participants' Perception of Tasks vs. Performance Reality. There were no correlations between the difficulty of a task as perceived by participants and their measured performance scores. All participants deemed all tasks to be of moderate difficulty. However, the control group scored significantly lower accuracy marks on all tasks. Considering that participants rated the tasks after completing them, their interpretation of the task requirements did not match the reality of the task they had just performed. The results confirm the challenging nature of trace comprehension: due to the difficulty of finding, relating, and collecting elements of execution with an ad-hoc approach, developers miss crucial elements of the analysis without even knowing that there is more to the task.

Performance Overhead. We used the experimental object of our user study, Phormer, to measure the overhead of our approach in ten one-minute interaction sessions. We took three sources of potential performance overhead into account: the overhead of the instrumentation phase, the overhead imposed on the execution of the instrumented code and on data collection, and the overhead of trace analysis and motif extraction, which were measured as 1.2, 0.1, and 2.1 seconds on average, respectively. This is negligible for all practical purposes, and is barely noticeable during interaction with the application.
Thus, the performance overhead of SABALAN was entirely acceptable for this application.

5.7.4 Threats to Validity

The external threats in conducting an experiment such as ours typically arise from the representativeness of the tasks, participants, and object selected for the experiment. We mitigated the threat of task selection by designing our tasks to cover all of Pacione's common comprehension activities [102], showing that the design of our tasks is not biased and that they are representative of routine comprehension tasks. A valid concern regards the representativeness of the participants with respect to the developer population, since we recruited students. We tried to address this concern by recruiting only graduate students who had prior experience with JavaScript, many of whom had experience working in industry. To address the threat of representativeness of the experimental object, we chose an open-source JavaScript application, Phormer [106], with about 6,000 lines of code and over 43,000 downloads at the time of conducting the study. An internal threat that concerns our method is bias in assigning participants to the control and experimental groups, or the population-selection problem, which we addressed by balancing the expertise levels between the two groups. Other threats can arise from the possible bias of the examiner (us) in measuring both the duration and the accuracy of task completion. We mitigated the time-measurement threat by designing a measurement method that both the participant and the examiner agreed upon, namely physically exchanging the task sheet between the participant and the examiner. We mitigated the bias in measuring accuracy by creating a rubric prior to conducting the experiments and abiding by it when marking the tasks. The final threat we address is the tool used in the experiment. We chose Google Chrome's Developer Tools, which is very popular for client-side web development, and all our participants were previously familiar with it (based on the pre-questionnaire they filled out).

5.8 Concluding Remarks

In this chapter, we proposed a generic technique for inferring a hierarchical model of application-specific motifs from execution traces. Our motifs, inspired by bioinformatics algorithms, are recurring abstract patterns of execution that abstract out alterations and are closer to the higher-level features of a system. We designed a visualization for our technique that allows users to observe and query the motifs for program understanding. Our technique is implemented in a tool called SABALAN, which is publicly available. The results of our user experiment showed that the systematic analysis of SABALAN enabled participants to perform comprehension tasks 54% more accurately than state-of-the-art tools.

Chapter 6
Related Work

Previous research has approached understanding program behaviour and change from different perspectives.

Program Analysis. EventRacer is a tool for facilitating dynamic race detection in event-driven applications [112]. In line with its goal, EventRacer traces only the events, and not the other dynamic and asynchronous features of JavaScript. Moreover, unlike all our methods, it requires using an instrumented browser. Wei and Ryder [136] use both static and dynamic analysis to perform a points-to analysis of JavaScript. However, they do not take into account the DOM-based and asynchronous interactions of JavaScript.
Ghezzi et al. [51] extract behavioural models from a different perspective: they focus on users' navigation preferences in user-intensive software. Their approach, called BEAR, depends on server logs to capture user interactions. Unlike CLEMATIS and SAHAND, BEAR focuses only on direct user interactions, in order to fulfill its purpose of classifying the behaviour of users.

Originally proposed by Weiser [138], program slicing techniques can be classified into two categories, namely static and dynamic slicing [72]. WALA [124] performs JavaScript slicing by inferring a call graph through static analysis. Since JavaScript is such a dynamic language, WALA yields conservative results that may not reflect an application's actual execution. It also ignores JavaScript-DOM interactions completely. Although not for slicing purposes, others [94, 140] have utilized static analysis to reduce the execution overhead incurred by code instrumentation. CLEMATIS determines JavaScript slices through a selective code instrumentation algorithm.

There are numerous static analysis techniques proposed for JavaScript in different domains [47, 66, 69, 83, 95, 125]. We did not choose a purely static approach, since many event-driven, dynamic, and asynchronous features of JavaScript are not well supported statically. Dynamic and hybrid JavaScript analysis techniques have attempted to address the shortcomings of static analysis [4, 6, 96, 137]. However, existing techniques focus on the client side and do not consider the server.

Madsen et al. recently proposed a technique for building an event-based call graph for Node.js applications [84]. There are two differences between their work and ours. First, our method considers functions in the graph as temporal and context-sensitive nodes, which can interact with each other and with events throughout different phases of their lifecycle. Second, our technique accounts for various means of asynchronous scheduling. It integrates client information, client-server interactions, and asynchronous server execution, and creates a behavioural model. It is through this model that SAHAND can provide a holistic and temporal overview of full-stack execution.

Analysis of Asynchrony. Different approaches target asynchrony in different domains, such as comprehension, debugging, and testing. Frameworks such as Arrows [70] have been proposed to help developers understand and avoid asynchronous errors. Zheng et al. [145] used static analysis to find asynchronous bugs in web applications. WAVE [62] is a testing platform for finding concurrency errors on the client side. Libraries and features such as Async.js [25] and Promises [109] have been adopted to "tame" asynchronous JavaScript. Despite being very useful and promising, Async.js is not native to JavaScript, and both Async.js and Promises require current and future code to follow specific design and syntactic guidelines, which impedes their wide adoption.

Fault Localization and Debugging. Delta debugging [144] is a technique whereby the code change responsible for a failure is systematically deduced by narrowing the state differences between a passing and a failing run. Other fault localization techniques have been proposed that compare different characteristics of passing and failing runs of a program [1, 28, 54, 110].
CLEMATIS is different in that it focuses on a single web application test assertion at a time and does not require a passing test per se to operate.

There is limited research geared towards web application fault localization in the literature [14, 99]. Google has recently provided some support for debugging asynchronous JavaScript in Chrome DevTools [26]. Our work differs from previous techniques in that it aims at making the implicit links between test failures and faulty JavaScript code more explicit, to enhance debugging. In addition, calculating and displaying the JavaScript code slice for a test assertion poses new challenges not faced by previous techniques. These stem from the disconnect between a test assertion failure, the DOM, and the JavaScript code interacting with the DOM.

Analysis of Change. Static Analysis. Static change impact analysis (CIA) is performed by analyzing the source code without executing it. A common pattern in many traditional CIA techniques is the use of dependency-based impact analysis methods [18]. Static impact analysis techniques typically find syntactic dependencies by performing forward slicing on a dependency graph. This type of analysis, however, is based on assumptions made for all possible executions of the software, and hence incurs false positives, which hinders its adoption [58]. The dependency graph can become large and may contain invalid paths; hence, the resulting impact set can be large and difficult to comprehend. More recently, static analysis has been applied to JavaScript applications. Sridharan et al. [125] adapt traditional points-to analysis for JavaScript through correlation tracking of dynamic property accesses. Jensen et al. [66] statically model the role of the DOM and the browser in their analysis. However, they acknowledge gaps and shortcomings in their analysis, which can result in many false positives. Feldthaus et al. [47] present an approach for constructing approximate JavaScript call graphs. However, their analysis completely ignores dynamic property accesses and interactions with the DOM. Madsen et al. [83] combine pointer analysis with use analysis to investigate the effects of JavaScript libraries and frameworks on applications' data flow. These techniques neglect dynamic DOM interactions as well as the event-driven and asynchronous features of the JavaScript language. Therefore, their analysis can be incomplete for performing change impact analysis on JavaScript applications.

Dynamic Analysis. Existing dynamic methods produce a precise but incomplete analysis. Apiwattanapong et al. [12] propose a dynamic technique in which execute-after relations are used to reduce the overhead caused by the amount of collected dynamic information. Dynamic CIA tools have been applied to various fields of software engineering. For instance, Ramanathan et al. [111] avoid re-running unchanged test cases by comparing strings of different traces in their tool. Chianti [114] is a CIA tool for Java that reports the change impact in terms of the subset of the test suite affected by the change. These techniques provide more precision than static analysis, especially when integrated with other techniques such as information retrieval [41, 50]. However, they do not target JavaScript code and its unique analysis challenges, such as DOM interactions, dynamic function calls involving event propagation, and asynchronous callbacks.
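At its core, the dependency-based impact analysis discussed above is reachability over a dependency graph: the impact set of a changed entity is everything transitively reachable from it. The sketch below illustrates the idea on a hypothetical adjacency-list graph; it is a simplification for exposition, not TOCHAL's actual algorithm, which additionally accounts for DOM-based and asynchronous dependencies and ranks the resulting impact set.

  // Forward propagation over a dependency graph: an edge a -> b
  // means "b depends on a". The graph shape is a stand-in for a
  // real system dependency graph.
  function impactSet(graph, changed) {
    const impacted = new Set([changed]);
    const worklist = [changed];
    while (worklist.length > 0) {
      const entity = worklist.pop();
      for (const dependent of graph[entity] || []) {
        if (!impacted.has(dependent)) {
          impacted.add(dependent);   // transitively impacted
          worklist.push(dependent);
        }
      }
    }
    impacted.delete(changed);        // report only the affected entities
    return impacted;
  }

  // Example with hypothetical entities:
  const graph = { validate: ['submit'], submit: ['updateDOM'], updateDOM: [] };
  console.log(impactSet(graph, 'validate'));  // Set { 'submit', 'updateDOM' }

The false-positive problem noted above is visible even in this toy form: every statically possible edge contributes to the impact set, whether or not it can occur in a real execution.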
Wei and Ryder's recent JavaScript blended analysis approach [136] and state-sensitive points-to analysis [137] are perhaps the closest to our work. Their work integrates the information gathered during static and dynamic analyses to perform a points-to analysis of JavaScript applications. However, their methods do not focus on analyzing change impact, and hence do not incorporate the dependencies formed through DOM interactions and asynchronous JavaScript mechanisms. Moreover, unlike our analysis, they do not take into account the important role of events and event propagation [133] on the DOM tree, which connects JavaScript functions.

Feature Location, Capture and Replay, and Tracing. Many papers have focused on locating the implementation of UI- and interaction-based features [23, 81, 85, 86] in web applications. However, they only retrieve the client-side implementation of a feature, and they require constant manual effort for selecting the elements or features under investigation. FireDetective [143] is a Firefox add-on that captures client-server interactions to facilitate comprehension. Although its purpose is similar to SAHAND's, it only supports partial Java execution on the server side. Further, it does not provide a higher-level model or a temporal visualization of the trace. Li and Wohlstadter [78] present a tool called Script Insight to locate the implementation of a DOM element in JavaScript code. Similarly, Maras et al. [85, 86] propose a technique for deriving the implementation of a UI feature on the client side. While similar to our work at a high level, these approaches require the user to select a visible DOM element and its relevant behaviour in order to investigate its functionality. This manual effort can easily frustrate the user in large applications. Further, these techniques are not concerned with capturing event-based interactions. Finally, the model they derive and present to the user contains low-level information and noise, which can adversely influence program comprehension.

Extensive reliance on user interactions is an important characteristic of modern web applications, and capture and replay tools are used in the literature to address it [34]. Record and replay techniques aid the understanding and debugging of web applications [11, 22, 89, 91]. The goal of these techniques, however, is to provide a deterministic replay of UI events without capturing their consequences. Montoto et al. [91] propose a set of techniques for generating a navigation sequence for Ajax-based websites and executing the recorded trace. Mugshot [89] is a system that employs a server-side web proxy to capture events in interactive web applications. It injects code into a target web application in order to record sources of nondeterminism such as DOM events and interrupts. The recorded information is used by Mugshot to dispatch synthetic events to a web browser in order to replay the execution trace. WaRR [11] is another system for capturing and replaying events. Capturing is accomplished by altering a user's web browser in order to record keystrokes and mouse clicks; in the event of a failure, end users of a web application may send a record of their keystrokes to the developer for debugging purposes. Jalangi [122] is a record-replay tool that supports dynamic analysis through shadow execution on shadow values. Burg et al. [22] integrate their capture/replay tool with debugging tools.

The goal of most of these techniques is to find a deterministic way of replaying the same set of user events for debugging purposes.
Instead of simply replaying recorded events, our approach aims at detecting causal and temporal event-based interactions and linking them to their impact on JavaScript code execution and DOM mutations. Moreover, our approach does not require manual user effort, a modified server, or a special browser.

Tracing techniques such as FireCrystal [101] and DynaRIA [10] collect traces of JavaScript execution selectively. Unravel [61] is a more recent tool for supporting developer learning. Similar to our work, these tools provide a high-level abstraction and visualization of the trace. However, all these techniques focus only on client-side JavaScript. SAHAND, on the other hand, traces, models, and connects both client- and server-side traces, with a focus on asynchronous JavaScript execution.

Visualization. Many tools use visualization to improve the process of understanding the behaviour of software applications [10, 101, 143]. However, these methods are not concerned with creating a behavioural model and inferring high-level motifs of execution, while ours is.

Matthijssen et al. [87] conduct a user study investigating the strategies that web developers use for code comprehension. Extravis [35] is a visualization tool that represents the dynamic traces of Java applications to assist with program comprehension tasks. However, their approach does not concern itself with building a model of the web application, while ours does.

Zaidman et al. [143] propose a Firefox add-on called FireDetective, which captures and visualizes a trace of execution on both the client and the server side. Their goal is to make it easier for developers to understand the link between client and server components, whereas our approach aims to make it easier for developers to understand the client-side behaviour of the web application.

FireCrystal [101] is another Firefox extension that stores the trace of a web application in the browser. It then visualizes the events and the changes to the DOM in a timeline. Like our work, FireCrystal records the execution trace selectively. But unlike CLEMATIS, FireCrystal does not capture details about the execution of JavaScript code or asynchronous events. Another limitation of FireCrystal is that it does not link the triggering of events to the dynamic behaviour of the application, as CLEMATIS does. DynaRIA [10] focuses on investigating the structural and quality aspects of the code. While DynaRIA captures a trace of the web application, CLEMATIS facilitates the comprehension of dynamic behaviour using a high-level model and visualization based on a semantically partitioned trace.

Trace Visualization. Several papers assist program comprehension through dynamic analysis and visualization of traces. The proposed techniques allow users to explore large traces [115], or perform reduction, compaction, and pruning on traces [19, 55–57, 113]. A popular trend is the use of standard visual protocols, such as UML diagrams [21, 38, 127]. Other papers propose more customized visualization techniques through synchronized views [33], provide a landscape of the program focusing on communications [48], allow user interaction with the visualization [104], visualize similarities in traces [32], or present many other techniques for representing traces [17, 37, 47, 64, 65, 68, 79, 115, 120, 131]. Extravis [33] is the first such technique that was quantitatively evaluated through a controlled experiment [35].
Another group of methods capture and analyze low-level information in execution traces using techniques such as extracting behavioural units described in usecase scenarios [134], profiling [73], dividing the trace into segments [107], identifying feature-level phases by defining an optimization problem [16], or similar methods [30, 39, 103]. Heuzeroth et al. [59, 60] propose finding patterns in execution. Other papers aim at providing higher-level representations of traces [7, 142]. However, unlike our approach, these approaches do not address the problem of large traces, do not infer a higher-level model of program behaviour, and do not extract application-specific, hierarchical abstract motifs for facilitating comprehension.

Chapter 7
Concluding Remarks

Program comprehension is vital for performing many software engineering tasks, and consumes much of the effort in software engineering. Modern web applications are highly dynamic and interactive, and offer a rich experience for end-users. This interactivity is made possible by the intricate interactions between user events, JavaScript code, the DOM, and the server. However, web developers face numerous challenges when trying to understand these interactions.

7.1 Contributions

In this thesis, we introduced novel techniques for understanding the behaviour, the root causes of failures, and the impact of change in JavaScript applications, as well as for inferring higher-level motifs of behaviour from execution traces. The main contributions of the thesis are as follows.

• A portable and fully-automated technique, called CLEMATIS, for extracting episodes of interaction in JavaScript-based web applications (Chapter 2). We also proposed a strategy for helping developers understand the root causes of failing test cases. We presented a novel interactive visualization mechanism based on focus+context techniques for facilitating the understanding of these complex event interactions. The evaluation of CLEMATIS points to the efficacy of the approach in reducing the overall time and increasing the accuracy of developer actions, compared to state-of-the-art web development tools.

• An automated technique, called TOCHAL, for performing hybrid DOM-sensitive change impact analysis for JavaScript (Chapter 3). TOCHAL builds a novel hybrid system dependency graph by inferring and combining static and dynamic call graphs. Our technique ranks the detected impact set based on the relative importance of the entities in the hybrid graph. Our evaluation shows that dynamic and DOM-based JavaScript features occur in real applications and can lead to significant impact propagation. Furthermore, we find that a hybrid approach leads to a more complete analysis than a purely static or dynamic analysis. Finally, our industrial controlled experiment shows that TOCHAL increases developers' performance by helping them perform maintenance tasks faster and more accurately.

• A novel technique, called SAHAND, for aiding developers' comprehension of full-stack JavaScript applications by creating a behavioural model of the application. The model, described in Chapter 4, is temporal and context-sensitive, and is extracted from a selectively recorded trace of the application. We proposed a temporal visualization interface for the model to facilitate developers' understanding of it. We investigated the effectiveness of SAHAND by conducting a user experiment.
We found that SAHAND improves developers' performance in completing program comprehension tasks by increasing their accuracy by a factor of three, without a significant change in task completion duration.

• A generic technique for inferring a hierarchical model of application-specific motifs from execution traces (Chapter 5). Our motifs, inspired by bioinformatics algorithms, are recurring abstract patterns of execution that accommodate alterations and are closer to the higher-level features of a system. We designed a visualization for our technique that allows users to observe and query the motifs. Our technique is implemented in a tool called SABALAN, which is publicly available. The results of our user experiment showed that the systematic analysis of SABALAN enabled participants to perform comprehension tasks 54% more accurately.

7.2 Research Questions Revisited

We introduced a set of five research questions in Chapter 1, which were addressed by the work presented in the following chapters (Chapters 2–5).

Research Question 1
How can we enhance developers' performance in understanding the event-based interactions in client-side JavaScript?

This research question focuses on understanding the behaviour of modern web applications, which is a challenging endeavour for developers during development and maintenance tasks. The challenges mainly stem from the dynamic, event-driven, and asynchronous nature of the JavaScript language. To address RQ1, in Chapter 2 we proposed a generic technique for capturing low-level event-based interactions in a web application and mapping them to a higher-level behavioural model. This model is then transformed into an interactive visualization, representing episodes of triggered causal and temporal events, related JavaScript code executions, and their impact on the dynamic DOM state. Our approach, implemented in a tool called CLEMATIS, allows developers to easily understand the complex dynamic behaviour of their applications at three different semantic levels of granularity.

The results of our industrial controlled experiment showed that CLEMATIS is capable of improving comprehension task accuracy by 157% while reducing task completion time by 47%. Further, the results showed that CLEMATIS is most helpful for the complex and implicit relations in program execution that are troublesome, if not impossible, for developers to understand. For instance, in the experimental object, the effect of using CLEMATIS was most visible when the tasks involved timing events, event propagation, or server communication. We also observed that using CLEMATIS not only improves both the duration and the accuracy of individual and overall tasks, but also helps developers perform much more consistently. Unlike in the control group, the low variance across all tasks in the experimental group shows that CLEMATIS helped all developers in the study perform consistently better, by making it easier to understand the internal flow and dependencies of event-based interactions.

Research Question 2
How can we enhance developers' performance in understanding the root causes of test assertion failures in client-side JavaScript?

While RQ1 addresses understanding the behaviour of client-side JavaScript applications, RQ2 focuses on understanding the behaviour specifically when a failure occurs. The goal of this research question is to investigate the means by which we can help developers bridge the gap between test cases and program code by localizing the fault related to a test assertion.
Our approach for addressing RQ2 is built on top of CLEMATIS. We proposed an automated technique to help developers localize the fault related to a test failure. Through a combination of selective code instrumentation and dynamic backward slicing, our technique bridges the gap between test cases and program code. We extended the visualization of CLEMATIS to help developers understand the relation of the observed program behaviour to the test cases. A follow-up experiment revealed that the extended CLEMATIS improves developers' fault localization accuracy by a factor of two. This approach is also discussed in Chapter 2.

Research Question 3
How can we improve developers' understanding of the temporal and asynchronous behaviour of full-stack JavaScript?

RQ1 and RQ2 both target understanding different aspects of JavaScript applications, but only on the client side. While JavaScript is the lingua franca of client-side web development, it is also used for server-side programming, leading to "full-stack" applications written entirely in JavaScript. RQ3 targets the challenges that arise when understanding the distributed components of a full-stack application on the client and server sides, as well as their interactions. The temporal nature of program execution, particularly the asynchronous and implicit relations of JavaScript entities spread over the client and the server, makes comprehension a rigorous task.

In Chapter 4, we proposed a technique for capturing a behavioural model of full-stack JavaScript applications' execution. The model is temporal and
Moreover, through an industrial controlled experiment,we find that TOCHAL helps developers by improving their task completion durationby 78% and accuracy by 223%.156Research Question 5How does providing high-level and semantic motifs of a application’s behaviourimprove comprehensibility of the application?All previous research questions rely on all details of program execution, capturedthrough traces. However, due to the amount of information that can be gatheredthrough the execution, the captured traces tend to become very large, complex, anddifficult to understand. None of these approaches provide a higher-level abstractionof the program behaviour or infer recurring and behavioural patterns representingthe semantics of a particular application, which is the focus of this research question.Further, the scope of this research question is not limited to JavaScript and includesall program traces. In Chapter 5, we propose a generic technique for facilitatingcomprehension by creating an abstract model of software behaviour. The modelencompasses hierarchies of recurring and application-specific motifs. The motifsare abstract patterns extracted from traces through our novel technique, inspired bybioinformatics algorithms. The motifs provide an overview of the behaviour at ahigh-level, while encapsulating semantically related sequences in execution. Wedesign a visualization that allows developers to observe and interact with inferredmotifs. We implement our approach in an open-source tool, called SABALAN,and evaluate it through a user experiment. The results show that using SABALANimproves developers’ accuracy in performing comprehension tasks by 54%.7.3 Reflections and Future DirectionsIn this thesis, we took the first steps towards using static and dynamic analysisfor assisting program comprehension, particularly for JavaScript. Our approachesaimed at algorithmically structuring program entities and their execution tracesinto behavioural models and making them more comprehensible. We also usedinformation visualization techniques to facilitate developers’ understanding of theapplication. However, there still remains much left to be addressed.As a next future step to this thesis, researchers can further investigate themeans of helping comprehension by providing semantic models and patterns ofexecution of software systems. There is great need for techniques that bridge the gap157between low-level details of code and execution and the mental model of developers.Our preliminary studies showed that such techniques are effective in facilitatingcomprehension by significantly improving developers’ performance. However, thereis not much research conducted for providing higher-level and semantic overviewsinto the behaviour of software that better match the mental model of developers.Conducting long-term experiments with professional developers can greatlyguide the design of such techniques and tools. Investigating how developers un-derstand program behaviour can provide valuable insight for the current state ofresearch in program comprehension. Studying developers’ expectations of compre-hension tools can drive designing, building, and deploying such tools. 
An iterative process of studying, designing, evaluating, and improving the design based on feedback can lead to more useful comprehension techniques that are also more likely to be adopted in industry.

Further, designing and building more sophisticated visualizations using information visualization techniques can allow programmers to gain better insight into program behaviour. Enabling visual pattern recognition and clustering methods in an interactive visual interface can greatly benefit viewers. Interaction mechanisms such as querying, filtering, semantic zooming, bookmarking, and adding notes can be utilized to help developers locate and understand the information they require more easily, quickly, and accurately.

Moreover, better tool support for such techniques can greatly increase their impact on real-world software engineering. Integration with programming IDEs, debuggers, and other development environments can have a significant effect on the adoption of these techniques, and can improve developers' performance in their everyday tasks. Another direction these approaches can pursue is debugging, especially improving CLEMATIS's fault localization unit to further help developers detect and localize faulty behaviour in JavaScript applications.

Another possible future direction in program understanding is understanding other program artifacts, such as test cases. The test suite is a major part of an application that needs to be understood and maintained by developers. Despite this necessity, there has been far less research on aiding developers' understanding of test suites. Researchers can extend program comprehension to the test suites of applications and address the challenges specific to test comprehension and maintenance. Understanding test code is difficult, firstly due to the same challenges that make production-code comprehension rigorous. Moreover, compared to testing traditional languages, JavaScript-specific features such as DOM interactions and server communications further complicate the comprehension of JavaScript test suites. Developers and testers use assertions in test cases to investigate the correctness of the code under test. Assertions are vital to the functionality and quality of test suites: they can examine the validity of a condition and the conformance of the target code to a requirement. However, they do not contain information regarding the procedure of a test case, its semantics, or its purpose. In order to understand a test case, developers need to understand the process that led to the state of the application that is asserted in the test case. Furthermore, coverage metrics are used for distinguishing the parts of the code that are covered by a test suite. Code coverage can reveal parts of the application that are not tested. Although useful, coverage tools do not provide any information regarding how test cases interact with the code. Moreover, client-side JavaScript applications consist of more than just the code, and the challenges of understanding JavaScript applications propagate to understanding test cases as well. With existing techniques, it is difficult to understand how test cases interact with the dynamic and event-driven JavaScript code, the DOM, and the server. We hypothesize that an always-on visualization in a development environment would help developers improve their performance in understanding the testing process of a given application.
Deploying such approaches by embedding them into development environments would provide real-time visual cues representing the test cases, their dynamic procedure, and their interactions with the code, DOM elements, and the server.

It is worth mentioning that this thesis is only a step towards facilitating the comprehension of program behaviour and motifs. Our findings show the usefulness of our proposed techniques, as well as great promise for further research in the area. We have made all our tools and supporting documents open source and available for use in future research.

Bibliography

[1] H. Agrawal, J. Horgan, S. London, and W. Wong. Fault localization using execution slices and dataflow tests. In Proceedings of the International Symposium on Software Reliability Engineering (ISSRE), pages 143–151, Toulouse, France, 1995. IEEE. → pages 3, 10, 146

[2] W. Aigner, S. Miksch, W. Müller, H. Schumann, and C. Tominski. Visualizing time-oriented data - a systematic view. Computers & Graphics, 31(3):401–409, 2007. → pages 106

[3] S. Alimadadi. Understanding behavioural patterns in JavaScript. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE 2016, pages 1076–1078. ACM, 2016. → pages 8

[4] S. Alimadadi, S. Sequeira, A. Mesbah, and K. Pattabiraman. Understanding JavaScript event-based interactions. In Proceedings of the ACM/IEEE International Conference on Software Engineering (ICSE), pages 367–377. ACM, 2014. → pages v, 4, 5, 7, 93, 102, 146

[5] S. Alimadadi, S. Sequeira, A. Mesbah, and K. Pattabiraman. Understanding JavaScript event-based interactions. Technical Report UBC-SALT-2014-001, University of British Columbia, 2014. http://salt.ece.ubc.ca/publications/docs/UBC-SALT-2014-001.pdf. → pages 39

[6] S. Alimadadi, A. Mesbah, and K. Pattabiraman. Hybrid DOM-sensitive change impact analysis for JavaScript. In Proceedings of the European Conference on Object-Oriented Programming (ECOOP), pages 321–345. LIPIcs, 2015. → pages v, 5, 7, 100, 146

[7] S. Alimadadi, A. Mesbah, and K. Pattabiraman. Understanding asynchronous interactions in full-stack JavaScript. In Proceedings of the ACM/IEEE International Conference on Software Engineering (ICSE), pages 1169–1180. ACM, 2016. URL http://salt.ece.ubc.ca/publications/docs/icse16.pdf. → pages vi, 6, 8, 118, 142, 151

[8] S. Alimadadi, S. Sequeira, A. Mesbah, and K. Pattabiraman. Understanding JavaScript event-based interactions with Clematis. ACM Transactions on Software Engineering and Methodology (TOSEM), page 17 pages, 2016. URL http://salt.ece.ubc.ca/publications/docs/tosem16.pdf. → pages v, 7

[9] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. Basic local alignment search tool. Journal of Molecular Biology, 215(3):403–410, 1990. ISSN 0022-2836. → pages 124

[10] D. Amalfitano, A. Fasolino, A. Polcaro, and P. Tramontana. The DynaRIA tool for the comprehension of Ajax web applications by dynamic analysis. Innovations in Systems and Software Engineering, 10(1):41–57, 2014. → pages 4, 93, 149, 150

[11] S. Andrica and G. Candea. WaRR: A tool for high-fidelity web application record and replay. In Proceedings of the International Conference on Dependable Systems & Networks (DSN), pages 403–410. IEEE Computer Society, 2011. → pages 149

[12] T. Apiwattanapong, A. Orso, and M. J. Harrold. Efficient and precise dynamic impact analysis using execute-after sequences. In Proceedings of the International Conference on Software Engineering (ICSE), pages 432–441. ACM, 2005. → pages 5, 147

[13] R. S. Arnold. Software Change Impact Analysis. IEEE Computer Society, 1996. ISBN 0818673842. → pages 5