UBC Theses and Dissertations


Understanding web application test assertion failures Sequeira, Sheldon 2014


Full Text


Understanding Web Application Test Assertion Failures

by

Sheldon Sequeira

B.A.Sc., The University of British Columbia, 2012

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in The Faculty of Graduate and Postdoctoral Studies (Electrical & Computer Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

October 2014

© Sheldon Sequeira 2014

Abstract

Developers often write test cases that assert the behaviour of a web application from an end-user's perspective. However, when such test cases fail, it is difficult to relate the assertion failure to the faulty line of code. The challenges mainly stem from the existing disconnect between front-end test cases that assert the DOM and the application's underlying JavaScript code. We propose an automated technique to help developers localize the fault related to a test failure. Through a combination of selective code instrumentation and dynamic backward slicing, our technique bridges the gap between test cases and program code. Through an interactive visualization, our approach, implemented in a tool called Camellia, allows developers to easily understand the dynamic behaviour of their application and its relation to the test cases. The results of our controlled experiment show that Camellia improves the fault localization accuracy of developers by a factor of two. Moreover, the implemented approach incurs a low performance overhead.

Preface

Apart from Chapter 2, the work presented in this thesis is under review as a submission to the ACM/IEEE International Conference on Software Engineering (ICSE), 2015. I conceived the idea for the project and was responsible for its design and described implementation.

The work described in Chapter 2 appears in the proceedings of ICSE 2014, 367–377 [6]. The paper appeared as part of the research track and received the ACM SIGSOFT Distinguished Paper Award. I was involved in the conception and design of this previous project.
In terms of implementation, I was involved in many facets of the project, including the instrumentation technique and produced visualization. The section on "Capturing Timeouts and XHRs" was originally drafted by Alimadadi, S.

The controlled experiment described in Chapter 4 was conducted under the approval of the University of British Columbia (UBC) Behavioural Research Ethics Board (BREB) with the certificate number H13-00632. In addition to conducting the study, I also analyzed the collected data using the statistical analysis package R [13].

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
List of Algorithms
Acknowledgements
Dedication

1 Introduction
  1.1 Background
    1.1.1 Document Object Model (DOM)
    1.1.2 JavaScript
    1.1.3 Testing Web Applications
  1.2 Motivation
    1.2.1 Running Example
    1.2.2 Challenges
  1.3 Claims and Contributions

2 Approach: Modelling Web Application Behaviour
  2.1 JavaScript Transformation and Tracing
    2.1.1 Interposing on DOM Events
    2.1.2 Capturing Timeouts and XHRs
    2.1.3 Recording Function Traces
    2.1.4 Tracking DOM Changes
  2.2 Capturing a Behavioural Model
    2.2.1 Episode Nodes
    2.2.2 Edges
    2.2.3 Story
  2.3 Visualizing the Captured Model
    2.3.1 Semantic Zoom Levels
  2.4 Tool Implementation: Clematis
  2.5 Results and Effectiveness of Proposed Model

3 Approach: Test Case Understanding
  3.1 Relating Test Assertions to DOM Elements
  3.2 Contextualizing Test Case Assertions
  3.3 Slicing the JavaScript Code
    3.3.1 Selective Instrumentation
    3.3.2 Computing a Backwards Slice
  3.4 Visualizing the Inferred Mappings
    3.4.1 Semantic Zoom Levels
    3.4.2 Stepping Through the Slice
  3.5 Tool Implementation

4 Evaluation
  4.1 Experimental Design
    4.1.1 Experimental Object
    4.1.2 Participants
    4.1.3 Task Design
    4.1.4 Independent Variable (IV)
    4.1.5 Dependent Variables (DV)
    4.1.6 Data Analysis
    4.1.7 Procedure
  4.2 Results
    4.2.1 Accuracy
    4.2.2 Duration
    4.2.3 Qualitative Feedback
  4.3 Performance Overhead
    4.3.1 Instrumentation Overhead
    4.3.2 Execution Overhead
    4.3.3 Dynamic Analysis Overhead

5 Discussion
  5.1 Task Completion Accuracy
  5.2 Task Completion Duration
  5.3 Reducing Variance in User Performance
  5.4 Strengths
  5.5 Improvements
  5.6 Limitations
  5.7 Threats to Validity

6 Related Work
  6.1 Fault Localization
  6.2 Program Slicing
  6.3 Behavioural Debugging
  6.4 Visualization for Fault Localization

7 Conclusion

Bibliography

Appendices

A Experiment Tasks
  A.1 Task 1
  A.2 Task 2

B Information Consent Form

C Pre-Questionnaire
D Post-Questionnaire

List of Tables

4.1 Injected faults for the controlled experiment.

List of Figures

1.1 Example event registration technique for web applications.
1.2 Example of a Selenium test case.
1.3 Running example.
2.1 Instrumented JavaScript function declaration.
2.2 Instrumented JavaScript return statement.
2.3 Instrumented JavaScript function calls.
2.4 Three semantic zoom levels in Clematis.
3.1 Relating test case assertions to DOM accesses.
3.2 Instrumented JavaScript code for the running example.
3.3 Processing view of approach.
3.4 Visualization for a test case.
4.1 Box plots of (a) task completion accuracy, (b) task completion duration.

List of Algorithms

1 Selective Instrumentation
2 Backward Slicing

Acknowledgements

I offer my enduring gratitude to my supervisors and my fellow students in the SALT lab at UBC. In particular, I would like to thank Dr. Ali Mesbah and Dr. Karthik Pattabiraman for their unwavering guidance, support, and patience. I am indebted to Dr. Mesbah for inviting me into such a supportive community of researchers.

I would like to acknowledge Dr. Matei Ripeanu for acting as chair in my project defence, and would like to thank Dr. Mesbah and Dr. Pattabiraman for taking time out of their busy schedules to act as examiners during my defence.

A special thank you to the Natural Sciences and Engineering Research Council of Canada (NSERC) for supporting me financially in my studies.
I would also like to thank SAP Labs Vancouver for providing me with an environment to cultivate my technical skills during my postgraduate studies.

Last but not least, I would like to thank my parents for their support throughout my years of education. I attribute any of my success to their hard work and dedication. A special thanks to my father for driving me to and from school during obscene hours of the morning and night.

Dedication

I dedicate this thesis to my family, for all the sacrifices they have made for me. I also dedicate this work to my friends who have kept me laughing through the years.

Chapter 1

Introduction

Fault localization has been found to be one of the most difficult phases of debugging [30], and has been an active topic of research in the past [5, 9, 19, 35]. The fault localization process involves finding the exact line of code responsible for some abnormal program behaviour. Although testing of modern web applications has received increasing attention in the recent past [8, 23, 29], there has been limited work on what happens after a test reveals an error. After a test fails, the fault localization process must be undertaken by developers in order to identify the fault responsible for the failure.

To test their web applications, developers often write test cases that check the application's behaviour from an end-user's perspective using popular frameworks such as Selenium (http://seleniumhq.org). Such test cases are agnostic of the JavaScript code and operate by simulating a series of user actions followed by assertions on the application's runtime Document Object Model (DOM).
As such, they can detect deviations in the expected behaviour as observed on the DOM.

However, when a web application test assertion fails, determining the faulty program code responsible for the failure can be a challenging endeavour. The main challenge here is the implicit link between three different entities, namely, the test assertion, the DOM elements on which the assertion fails (checked elements), and the faulty JavaScript code responsible for modifying those DOM elements. To understand the root cause of the assertion failure, the developer needs to manually infer a mental model of these hidden links, which can be tedious. Further, unlike in traditional (e.g., Java) applications, there is no useful stack trace produced when a web test case fails, as the failure is on the DOM and not in the application's JavaScript code. This further hinders debugging, as the fault usually lies within the application's code, and not in its surfacing DOM. To the best of our knowledge, there is currently no tool support available to help developers in this test failure understanding and fault localization process.

In this thesis, we present an automated technique, called Camellia, to help developers understand the root cause of a test failure. Camellia helps programmers in three ways. First, it automatically links a test assertion failure to the checked DOM elements, and subsequently to the related statements in the JavaScript code. Second, it incorporates a novel dynamic slicing method for JavaScript that reduces the amount of noise encountered by developers when debugging. Our slicing method is based on a selective instrumentation algorithm to reduce the performance overhead and trace size. Third, once a test failure is connected to the relevant slice, our technique visualizes the output to abstract away details and relate the test failure to the corresponding episodes of code execution, DOM mutations, and the computed JavaScript slice.
Camellia aids developers in understanding existing test cases by capturing the DOM dependencies of a test case, as well as an application's DOM evolution and JavaScript code execution when running the test case.

1.1 Background

Modern web applications are largely event-driven. Their client-side execution is normally initiated in response to a user-action triggered event, a timing event, or the receipt of an asynchronous callback message from the server. A fault in their client-side execution may be detected by a DOM-based test case. Still, the implicit link between the test assertion, the DOM, and the application's underlying JavaScript code makes the debugging process for web applications challenging.

As an analogy, the client-side component of a web application follows a similar structure as the one outlined in the model-view-controller design pattern [22]. Here, the DOM captures the structure of the application and also serves as the interface between the user and the application, acting as both the model and view. JavaScript code is executed in response to actions on the DOM and is responsible for updating the DOM, acting as the application's controller.

1  var button = document.getElementById("myButton");
2
3  button.addEventListener('click', function() {
4      // Respond to click event, execute some JS
5      ...
6  });

Figure 1.1: Example event registration technique for web applications.

1.1.1 Document Object Model (DOM)

The DOM acts as an interface between JavaScript applications and the browser's front-end. This dynamic tree-like structure represents the UI at runtime and is updated by client-side JavaScript code to represent state changes. A website can use JavaScript to bind event listeners to HTML elements such as buttons. These registered listeners execute a specified JavaScript function when a particular event occurs on their parent HTML element.
For example, the addEventListener JavaScript method is available to most HTML elements and can be passed an event type, such as click, along with a callback function in order to register a listener. Once a listener is registered, the browser will execute the specified callback function if the user clicks on the listener's associated HTML element.

Figure 1.1 illustrates the use of the JavaScript addEventListener method to register a callback function with an existing button, which exists within the DOM. Once registered on Line 3, the callback function is executed once the button with id myButton is clicked by the application user.

1.1.2 JavaScript

Developers often include JavaScript code within their sites in order to improve the level of interactivity. JavaScript allows for the mutation of the DOM at runtime without the need for a page reload, and has become widely adopted in websites and online applications. Additionally, JavaScript can be used to communicate with a web server.

Web browsers provide a single thread for web application execution. To circumvent this limitation and build rich, responsive web applications, developers take advantage of the asynchronous capabilities offered by modern browsers, such as timeouts and XMLHttpRequest (XHR) calls.

Timeouts: Events can be registered to fire after a certain amount of time or at certain intervals in JavaScript. These timeouts often have asynchronous callbacks that are executed when triggered.

1   @Test
2   public void testSlideShow() throws Exception {
3       driver.get("http://localhost:8888/phormer331/?feat=slideshow");
4
5       assertEquals("1", driver.findElement(By.id("ss_n")).getText());
6
7       driver.findElement(By.linkText("Next")).click();
8
9       assertEquals("2", driver.findElement(By.id("ss_n")).getText());
10  }

Figure 1.2: Example of a Selenium test case used to test web applications.

XHR Callbacks: XHR objects are used to exchange data asynchronously with the server, without requiring a page reload.
Each XHR goes through three main phases: open, send, and response. These three phases can be scattered throughout the code. There is no guarantee on the timing and the order of XHR responses from the server.

1.1.3 Testing Web Applications

Currently, frameworks such as Selenium and CasperJS are used to automate the execution of test cases related to an application's client-side. These frameworks work by simulating a series of user actions specified by the tester on the application under test. Testers expect that by executing a series of event-based actions on the application, its DOM will be brought to a desirable state to verify the presence and properties of certain elements. This black-box approach to testing requires little knowledge of the application's client-side JavaScript or server-side code, and hence is easily applicable.

Testers may either record a series of actions, such as mouse clicks, during a user session and have these automatically translated into a script representation, or manually specify a series of actions in a script-like format. Then, the framework is able to replay these actions while executing certain assertions on the application's DOM.

Figure 1.2 shows an example of one such test case that navigates to a specific application page (Line 3), makes an initial assertion on a DOM element with id ss_n (Line 5), and then simulates a click action before executing a second assertion (Lines 7–9). The driver object within the written test case acts
(a) JavaScript code:

1   var currentPage = 1;
2   var sortType = 'default';
3   var gridSize = 8;
4   var infiniteScroll = false;
5
6   var renderAssets = function(url, size) {
7       var data = assetsFromServer(url);
8
9       var temp = '<div class="asset-row">';
10      for (i = 0; i < size; i++) {
11          temp += '  <div class="asset-icon">';
12          ...  // Reading from variable 'data'
13          temp += '  </div>';
14      }
15      temp += '</div>';
16
17      return $('#assets-container').append(temp);
18  };
19
20  $(document).on('click', '#sort-assets', function(){
21      $('#sort-assets').removeClass('selected-type')
22      $(this).addClass('selected-type');
23      currentPage = 1;
24      sortType = $(this).attr('type');
25      gridSize = 12;
26      renderAssets(url + sortType + currentPage, gridSize)
27      infiniteScroll = true;
28  });
29
30  var scroll = function() {
31      if(infiniteScroll) {
32          currentPage++;
33          renderAssets(url + sortType + currentPage, gridSize/2)
34      }
35  };
36  $(window).bind('scroll', scroll);

(b) Portion of DOM-based UI: [screenshot of the asset grid; visible labels include "All Categories", "Bar Chart", "Bubble Chart", "Date Time", "Directions by Google"]

(c) Partial DOM:

1   <div class="row-sort-assets">
2       <div class="sort-assets"></div>
3       ...
4   </div>
5   <div id="assets-container" data-pages="">
6       <div class="asset-row">
7           <div class="asset-icon"></div>
8           ...
9       </div>
10  </div>

(d) DOM-based (Selenium) test case:

1   public void testSortByDefaults() {
2       driver.get("http://localhost:9763/store/assets/gadget");
3       driver.findElement(By.css("i.icon-star")).click();
4       int s1 = driver.findElements(By.css(".asset-icon")).size();
5       assertEquals(12, s1);
6
7       scrollWindowDown();
8       int s2 = driver.findElements(By.css(".asset-icon")).size();
9       assertEquals(4, s2 - s1);
10  }

(e) Test case assertion failure:

assertEquals(4, s2 - s1), AssertionFailure: expected <4> but was: <6>

Figure 1.3: Running example (a) JavaScript code, (b) Portion of DOM-based UI, (c) Partial DOM, (d) DOM-based (Selenium) test case, (e) Test case
assertion failure. The dotted lines show the links between the different entities that must be inferred.

as the interface between the test case and the web application, which is accessed via a web browser.

1.2 Motivation

Our work is motivated by the fact that Selenium test cases do not adequately represent the execution of an application's underlying code. Mapping actions and assertions from a front-end test case to the related JavaScript code is difficult without some supplementary information. A way of automatically inferring and presenting the relationship between these assertions and the application's underlying code would reduce the effort required to localize a test assertion fault.

1.2.1 Running Example

We use a small code snippet based on the open-source WSO2 eStore application [4] as a running example to demonstrate the challenges involved and our solution. The store allows clients to customize and deploy their own digital storefront. A single row of four placeholder assets is shown in Figure 1.3b; three small buttons in the upper-right corner allow for these assets to be re-ordered according to some pre-existing categories. A partial DOM representation of the page is shown in Figure 1.3c. Figure 1.3d shows a Selenium test case, written by the developers of the application for verifying the application's functionality with regard to "sorting" and "viewing" the existing assets. The JavaScript code responsible for implementing the functionality is shown in Figure 1.3a. The eStore application makes prominent use of the jQuery (http://jquery.com) JavaScript library, which is used to retrieve DOM elements using the $ symbol. Invoking the $ function with an argument is equivalent to retrieving elements via the getElementById method, which is shown on Line 1 of Figure 1.1.

The test case first creates a browser instance and navigates to the asset page (Figure 1.3d, Line 2).
Then, the star icon shown in the upper-right corner of Figure 1.3b is clicked (Figure 1.3d, Line 3) to sort the assets by popularity. Then an assertion is made to check whether the expected assets are present on the DOM of the page (Line 5). The second portion of the test case involves scrolling down on the webpage and asserting the existence of four additional assets on the DOM (Lines 7–9).

In the underlying JavaScript code (Figure 1.3a), an event listener is registered (Lines 20–28) with each of the sorting icons in the upper-right portion of the UI. Thus, clicking on any of these icons flushes the asset grid and repopulates it according to the sort criteria (Line 26). The scroll function (Lines 30–35) is registered to the scroll event of the window object (Line 36) and is responsible for appending additional assets to the DOM as the user scrolls down the page.

1.2.2 Challenges

While the mapping between the test case and related JavaScript code may seem trivial to find for this small example, challenges arise as the JavaScript code base and the DOM increase in size. As a result, it can be difficult to understand the reason for a test case failure or even which features are being tested by a given test case.

C1: Identifying Dependencies of a Test Case. The test case in Figure 1.3d fails when executed and displays the message shown in Figure 1.3e. Based on the failure message alone, it is almost impossible to determine the cause of failure. Closer examination of the test case, however, reveals that the assertion fails on variables s1 and s2, which in turn depend on DOM elements with class asset-icon. This link between the assertion and the DOM is labelled in Figure 1.3 as (1).

C2: Finding the Point of Contact. However, it remains unclear which JavaScript code, if any, was responsible for modifying this subset of DOM elements, which eventually led to the assertion failure. The link to the actual faulty line of code is not easy to find.
In order to make this association, developers must tediously examine the application's underlying JavaScript code in search of any statements that mutate the DOM. In the context of our example, a developer would eventually conclude that Line 17 of Figure 1.3a is actually responsible for appending elements to the DOM. This implicit link between the checked DOM elements and JavaScript code is depicted on the figure as (2). Further, Lines 9–15 are responsible for generating the new DOM elements expected by our failing assertion (3).

C3: Mapping Test Case Execution to Events. At this point, it is still not clear which events would cause the responsible lines of JavaScript code to execute. Further examination of the code reveals that the renderAssets() function (Figure 1.3a, Line 6) can be called from within two event handlers in response to either a click or a scroll event (Lines 26 and 33, respectively, shown as (4)).

Now that the developer has gathered some contextual information related to the failure, the relation between the failed test case and the triggered event handlers must be derived. While in our running example it may be straightforward to link the call to scrollWindowDown() (Line 7 of Figure 1.3d) to the execution of event handler scroll (Lines 30–35 of Figure 1.3a) due to the similarity in naming convention, such a linear mapping is neither possible in all cases nor easily inferable.

C4: Tracking Dependencies. Finally, going back to the initial point of contact between the DOM and JavaScript code (Line 17 of Figure 1.3a), to fully understand an assertion and its possible cause of failure, the data and control dependencies for the DOM-altering statements must be determined and examined by the developer in order to identify all possible points of failure. Often, such dependencies may exist across the occurrence of multiple DOM or asynchronous events (e.g., timing, XMLHttpRequest).
In the case of eStore, the modification of the DOM within renderAssets() depends on the arguments passed into the function. Argument url is used to retrieve asset information from the server (Line 7), and argument size is used as a control dependency when generating assets (Line 10). Dotted line 4 shows possible invocations of renderAssets(), revealing dependencies on global variables (url, sortType, currentPage, gridSize). An erroneous update to any of these global variables, in response to a DOM event, could be the cause of our assertion failure. Tracing the data dependencies for the call to renderAssets() on Line 33 reveals that an update to global variable gridSize on Line 25 of Figure 1.3a is the root cause of our application's unusual behaviour, i.e., gridSize is changed to 12, while Line 33 assumes it is still 8, the initial value. Thus, a code change to either Line 25 or 33 would fix the fault and resolve the assertion failure reported by our test case.

1.3 Claims and Contributions

Our work makes the following key contributions:

• An automated technique, called Camellia, for connecting test assertion failures to faulty JavaScript code through checked DOM elements.
• A selective instrumentation algorithm that statically detects and instruments only related lines of code, to minimize the trace size and performance overhead.
• A generic JavaScript slicing technique and stand-alone open source tool called SliceJS.
• An interactive visualization to enhance the debugging experience.
• An evaluation of Camellia through a controlled experiment with 12 participants, assessing its efficacy and usefulness in understanding test failures and localizing faults.

Our measurements indicate that the selective instrumentation algorithm is effective in reducing the trace size (by 70%), the instrumentation time (by 41%), and the execution overhead (by 47%). The results of our controlled
experiment show that Camellia is capable of improving user accuracy by nearly a factor of two for tasks related to localizing and repairing JavaScript faults.

This thesis is structured as follows. The next two chapters contain detailed descriptions of our approach to designing and implementing our solution. An evaluation of our tool, Camellia, is contained within Chapter 4, and a discussion of its implications follows in Chapter 5. Chapter 6 presents related work. We conclude in Chapter 7.

Chapter 2

Approach: Modelling Web Application Behaviour

In the next two chapters, we describe our approach for addressing the challenges mentioned previously in Section 1.2. The overall process consists of two main phases, the first of which is detailed in this chapter. This initial phase is a general technique for capturing and modelling an application's behaviour, and consists of (1) capturing a fine-grained execution trace for the application under test, (2) extracting a behavioural model from the captured trace, and (3) producing a visualization based on the inferred model.

During the second phase of our approach, we expand on this general model to include test case information. As a result, we are able to relate each test failure to the relevant areas of an application's execution, which helps developers better understand test failures. This process is necessary as program comprehension is known to be an essential step during software maintenance and debugging activities [11].

Below, we describe each of the aforementioned three steps further. Our technical report [7] contains a more elaborate description of the technical details of this initial phase of the approach.

2.1 JavaScript Transformation and Tracing

Our approach aims to help developers understand the context of their assertions by monitoring test-related JavaScript execution, DOM events, asynchronous events, and DOM mutations (C3).
Tracking this information allows us to generate a trace comprising multiple trace units, with each unit representing the occurrence of an asynchronous event or the execution of a JavaScript function. Below, we discuss how we monitor all this activity with the purpose of relating it to test assertions.

2.1.1 Interposing on DOM Events

There are two ways event listeners can be bound to a DOM element in JavaScript. The first method is programmatically, using the DOM Level 1 e.click=handler or DOM Level 2 e.addEventListener methods [31] in JavaScript code. To record the occurrence of such events, our technique replaces the default registration of these JavaScript methods such that all event listeners are wrapped within a tracing function that logs the occurring event's time, type, and target.

The second and more traditional way to register an event listener is inline in the HTML code, e.g., <DIV onclick='handler();'>. The effect of this inline assignment is semantically the same as the first method. Our technique interposes on inline-registered listeners by removing them from their associated HTML elements, annotating the HTML elements, and re-registering them using the substituted addEventListener function. This way we can handle them similarly to the programmatically registered event handlers.

2.1.2 Capturing Timeouts and XHRs

For tracing timeouts, we replace the browser's setTimeout() method and the callback function of each timeout with wrapper functions, which allow us to track the instantiation and resolution of each timeout. A timeout callback usually happens later and triggers new behaviour, and thus we consider it as a separate component from the setTimeout() call. We link these together through a timeout id and represent them as a causal connection later.

In our model, we distinguish between three different components for the open, send, and response phases of each XHR object.
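To illustrate the timeout-wrapping idea described above, the following sketch replaces setTimeout so that each registration and its later resolution are logged as separate trace units linked by a shared id. This is a minimal, hypothetical example: the trace format and names (trace, nextTimeoutId) are our assumptions for illustration, not Camellia's actual implementation.

```javascript
// Minimal sketch of interposing on setTimeout (illustrative only).
const trace = [];            // collected trace units
let nextTimeoutId = 0;       // our own id linking set/fire pairs

const originalSetTimeout = globalThis.setTimeout;
globalThis.setTimeout = function (callback, delay, ...args) {
  const id = ++nextTimeoutId;
  trace.push({ unit: "TIMEOUT_SET", id, delay });   // instantiation
  return originalSetTimeout(function (...cbArgs) {
    trace.push({ unit: "TIMEOUT_CALLBACK", id });   // resolution
    return callback.apply(this, cbArgs);            // run original callback
  }, delay, ...args);
};
```

Linking the two trace units through the shared id is what later allows the model to draw a causal edge between a timeout's registration and its asynchronous callback, even though they execute far apart in time.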
We intercept each component by replacing the XMLHttpRequest object of the browser. The new object captures the information about each component while preserving its functionality.

2.1.3 Recording Function Traces

To track the flow of execution within a JavaScript-based application, we instrument three code constructs, namely function declarations, return statements, and function calls. Each of these code constructs is instrumented differently, as explained below.

29  ...
30  var scroll = function() {
31      send(JSON.stringify({messageType: "FUNCTION_ENTER", fnName: "scroll", args: null, ...}));
32      if (infiniteScroll) {
33          currentPage++;
34          renderAssets(url + sortType + currentPage, gridSize/2);
35      }
36      send(JSON.stringify({messageType: "FUNCTION_EXIT", fnName: "scroll", ...}));
37  };
39  ...

Figure 2.1: Instrumented JavaScript function declaration.

Function Declarations: Tracing code is automatically added to each function declaration, allowing us to track the flow of control between developer-defined functions by logging the subroutine's name, arguments, and line number. In the case of anonymous functions, the line number and source file of the subroutine are used as supplementary information to identify the executed code. As this communication is done each time a function is executed, argument values are recorded dynamically at the cost of a slight overhead. Figure 2.1 contains the simple scroll() JavaScript function from the running example shown in Figure 1.3a (Line 30), which has been instrumented to record both the beginning and end of its execution (Lines 31 and 36).

Return Statements: Apart from reaching the end of a subroutine, control can be returned back to a calling function through a return statement. There are two reasons for instrumenting return statements: (1) to accurately track nested function calls, and (2) to provide users with the line numbers of the executed return statements.
Without recording the execution of return statements, it would be difficult to accurately track nested function calls. Furthermore, by recording return values and the line number of each return statement, our approach is able to provide users with contextual information that can be useful during the debugging process.

Figure 2.2 illustrates the instrumentation for the return statement of renderAssets(), a function originally shown in the running example (Figure 1.3a, Lines 6-18). The wrapper function RSW receives the return value of the function and the line number of the return statement, and is responsible for recording this information before execution of the application's JavaScript is resumed.

6   var renderAssets = function(url, size) {
7       var data = assetsFromServer(url);
8
9       var temp = '<div class="asset-row">';
10      for (i = 0; i < size; i++) {
11          temp += '  <div class="asset-icon">';
12          ...  // Reading from variable 'data'
13          temp += '  </div>';
14      }
15      temp += '</div>';
16
17      return RSW($('#assets-container').append(temp), 17);
18  };

Figure 2.2: Instrumented JavaScript return statement.

Function Calls: In order to report the source of a function invocation, our approach also instruments function calls. When instrumenting function calls, it is important to preserve both the order and context of each dynamic call. To accurately capture the function call hierarchy, we modify function calls with an inline wrapper function. This allows us to elegantly deal with two challenging scenarios. First, when multiple function calls are executed from within a single line of JavaScript code, it allows us to infer the order of these calls without the need for complex static analysis. Second, inline instrumentation enables us to capture nested function calls. Figure 2.3 depicts the instrumentation of function calls within an event-handler method from Figure 1.3a.

20  $(document).on('click', '#sort-assets', function(){
21      $('#sort-assets')[[FCW("removeClass", 21)]]('selected-type');
22      $(this)[[FCW("addClass", 22)]]('selected-type');
23      currentPage = 1;
24      sortType = $(this)[[FCW("attr", 24)]]('type');
25      gridSize = 12;
26      FCW(renderAssets, 26)(url + sortType + (currentPage));
27      infiniteScroll = true;
28  });
29
    function FCW(fnName, lineNo) {  // Function Call Wrapper
        send(JSON.stringify({messageType: "FUNCTION_CALL", ..., targetFunction: fnName}));
        return fnName;
    }

Figure 2.3: Instrumented JavaScript function calls for the running example.

Once instrumented using our technique, the function call to renderAssets() is wrapped by function FCW (Line 26). The interposing function FCW() executes immediately before each of the original function calls and interlaces our function logging with the application's original behaviour. Class methods removeClass(), addClass(), and attr() are also instrumented in a similar way (Lines 21, 22, and 24, respectively).

2.1.4 Tracking DOM Changes

Further, we track the DOM's evolution, as information about DOM mutations can help developers relate the observable changes of an application to the corresponding events and JavaScript code. We leverage the onload event of the document object to track the initial DOM state unaffected by any JavaScript execution.
Building on this, we use an observer module to log any subsequent changes to the DOM. This information is interleaved with the logged information about events and functions, enabling us to link DOM changes with the JavaScript code that is responsible for these mutations.

2.2 Capturing a Behavioural Model

We use a graph-based model to capture and represent a web application's event-based interactions. The graph is multi-edge and directed. It contains an ordered set of nodes, called episodes, linked through edges that preserve the chronological order of event executions. (Because JavaScript is single-threaded on all browsers, the events are totally ordered in time.) We describe the components of the graph below.

2.2.1 Episode Nodes

An episode is a semantically meaningful part of the application behaviour, initiated by a synchronous or an asynchronous event. An event may lead to the execution of JavaScript code, and may change the DOM state of the application. An episode node contains information about the static and dynamic characteristics of the application.

2.2.2 Edges

In our model, edges represent a progression of time and are used to connect episode nodes. Two types of edges are present in the model:

• Temporal: The temporal edges connect one episode node to another, indicating that an episode succeeded the previous one in time.

• Causal: These edges are used to connect different components of an asynchronous event, e.g., timeouts and XHRs. A causal edge from episode s to d indicates that episode d was caused by episode s in the past.

2.2.3 Story

The term story refers to an arrangement of episode nodes encapsulating a sequence of interactions with a web application. Different stories can be captured according to different features, goals, or use-cases that need investigation.
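The structure of a story, with totally ordered episodes, implicit temporal edges, and explicit causal edges for asynchronous components, can be sketched as follows. This is an illustrative data structure of our own; Clematis's internal representation may differ.

```javascript
// Sketch of the behavioural model: a story holds episode nodes in
// chronological order. Temporal edges follow from episode order
// (JavaScript's single-threaded execution totally orders events);
// causal edges link, e.g., a setTimeout to its later callback.
class Story {
  constructor() {
    this.episodes = [];    // ordered episode nodes
    this.causalEdges = []; // { from, to } episode indices
  }

  // Append an episode node and return its index in the story.
  addEpisode(type, info) {
    this.episodes.push({ type, info });
    return this.episodes.length - 1;
  }

  // Temporal successor: the next episode in chronological order.
  temporalSuccessor(i) {
    return i + 1 < this.episodes.length ? i + 1 : null;
  }

  // Causal edge: episode `to` was caused by episode `from` in the past.
  addCausalEdge(from, to) {
    this.causalEdges.push({ from, to });
  }
}
```

For instance, a click episode followed by a timeout-set episode and its later timeout-callback episode would be three ordered nodes, with one causal edge from the timeout-set node to the callback node.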
An episode terminates semantically when the execution of the JavaScript code related to that episode is finished. DOM mutation units that were interleaved with other trace units are organized and linked to their respective episode for a given story.

2.3 Visualizing the Captured Model

In the final step of the first phase, our technique produces an interactive visualization of the generated model, which can be used by developers to understand the behaviour of the application. The main challenge in the visualization is to display the model without overwhelming the developer with details. To this end, our visualization follows a focus+context [10] technique that provides details on a user's demand. The idea is to start with an overview of the captured story, let users determine which episode they are interested in, and provide an easy means to drill down to the episode of interest. With integration of focus within the context, developers can semantically zoom into each episode to gain more details regarding that episode, while preserving the contextual information about the story.

2.3.1 Semantic Zoom Levels

The visualization provides three semantic zoom levels. The first level displays all of the episodes in an abstracted manner, showing only the type and the timestamp of each episode (Figure 2.4, top).

When an episode is selected, the view transitions into the second zoom level, which presents an outline of the selected episode, providing more information about the source event as well as a high-level JavaScript trace (Figure 2.4, middle). The trace at this level contains only the names of the (1) invoked functions, (2) triggered events, and (3) DOM mutations, caused directly or indirectly by the source event.
The user can view multiple episodes to have a side-by-side comparison.

The final zoom level exhibits all the information embedded in each episode, i.e., detailed information about the source event, the DOM mutations caused by the episode, and the low-level trace (Figure 2.4, bottom). Upon request, the JavaScript code of each executed function is displayed and highlighted.

Figure 2.4: Three semantic zoom levels in Clematis. Top, Level 0: overview. Middle, Level 1: zoomed one level into an episode, while preserving the context of the story. Bottom, Level 2: drilled down into the selected episode.

2.4 Tool Implementation: Clematis

The first phase of our approach is implemented in a tool called Clematis, which is freely available [2]. We use a proxy server to automatically intercept and inspect HTTP responses destined for the client's browser. When a response contains JavaScript code, it is transformed by Clematis. The collected trace data is periodically transmitted from the browser to the proxy server in JSON format. To observe low-level DOM mutations, we build on and extend the JavaScript Mutation Summary library [14]. The model is automatically visualized as a web-based interactive interface.

2.5 Results and Effectiveness of Proposed Model

A controlled experiment was conducted at a large software company in Vancouver to investigate Clematis's effectiveness in aiding developers in program comprehension tasks [6]. For the study, we recruited professional developers as participants and used an open-source web application as the experimental object. The evaluation of Clematis points to the efficacy of the approach in reducing the overall time and increasing the accuracy of developer actions, compared to state-of-the-art web development tools. Specifically, developers using Clematis took 47% less time on assigned tasks related to program comprehension, compared to developers using other web development tools. Moreover, the results showed that developers using Clematis performed more accurately across all tasks, by 61% on average.

Chapter 3
Approach: Test Case Understanding

The second phase of our approach builds on the model and visualization described in the previous chapter (Chapter 2). Given this method for modelling an application's behaviour, we now focus on relating the produced model to a test case's actual execution. Linking test cases to the available model assists developers in localizing faults from test failures.

The overall process has four main steps, described in the next few sections.

3.1 Relating Test Assertions to DOM Elements

The DOM acts as the interface between a front-end test case and an application's JavaScript code. Therefore, the first step to understanding the cause of a test case failure is to determine the DOM dependencies of each test assertion. While this seems simple in theory, in practice assertions and element accesses are often intertwined within a single test case, convoluting the mapping between the two.

Going back to the test case of our running example in Figure 1.3d, the first assertion on Line 5 is dependent on the DOM elements returned by the access on the previous line.
The last assertion on Line 9 is more complex, as it compares two snapshots of the DOM and therefore has dependencies on two DOM accesses (Lines 4 and 8). Figure 3.1 summarizes the test case's execution and captures the temporal and causal relations between each assertion and DOM access.

Figure 3.1: Relating assertions to DOM accesses for the test case of Figure 1.3d.

To accurately determine the DOM dependencies of each assertion, and to address C1 (1 in Figure 1.3), we apply dynamic backward slicing to each test case assertion. In addition, we track the runtime properties of those DOM elements accessed by the test case. This runtime information is later used in our analysis (Section 3.2) when analyzing the DOM dependencies of each assertion.

3.2 Contextualizing Test Case Assertions

In the second step, our approach aims to (1) help developers understand the context of their assertions by monitoring test-related JavaScript execution, asynchronous events, and DOM mutations (C3); and (2) determine the initial link between JavaScript code and the checked DOM elements (C2, 2 in Figure 1.3).

In order to monitor JavaScript events, we leverage the tracing technique outlined in Chapter 2.1, which tracks the occurrence of JavaScript events, function invocations, and DOM mutations. We utilize the tracked mutations in order to focus on the segments of JavaScript execution most relevant to the assertions in a test case. As we are only interested in the subset of the DOM relevant to each test case, our approach focuses on the JavaScript code that interacts with this subset.

Recall that the previous step described in Section 3.1 yields the set of DOM elements relevant to each assertion.
We cross-reference these sets with the timestamped DOM mutations in our execution trace to determine the JavaScript functions and events (DOM, timing, or XHR) relevant to each assertion.

Once the relevant events and JavaScript functions have been identified for each assertion, we introduce wrapper functions for the native JavaScript functions used by developers to retrieve DOM elements. Specifically, we redefine methods such as getElementById and getElementsByClassName to track DOM accesses within the web application itself, so that we know exactly where in our collected execution trace the mutation originated. The objects returned by these methods are used by the application later to update the DOM. Therefore, we compute the forward slice of these objects to determine the exact JavaScript lines responsible for updating the DOM. Henceforth, we refer to the references returned by these native methods as JavaScript DOM accesses.

We compare the recorded JavaScript DOM accesses with the DOM dependencies of each test case assertion to find the equivalent JavaScript DOM accesses within the application's code. Moreover, the ancestors of those elements accessed by each assertion are also compared with the recorded JavaScript DOM accesses. This is important because in many cases a direct link might not exist between them. For instance, in the case of our running example, a group of assets is compiled and appended to the DOM after a scroll event. We compare the properties of those DOM elements accessed by the final assertion (assets on Lines 4 and 8 of Figure 1.3d), as well as the properties of those elements' ancestors, with the recorded JavaScript DOM accesses, and conclude that the assets were added to the DOM via the parent element assets-container on Line 17 of Figure 1.3a (2).

3.3 Slicing the JavaScript Code

At this point, our approach yields the set of JavaScript statements responsible for updating the DOM dependencies of our test case.
However, this set in isolation seldom contains the cause of a test failure. We compute a backwards slice for these DOM-mutating statements to find the entire set of statements that perform the DOM mutation, thus addressing C4.

Slicing can be done statically or dynamically. Static slicing is often overly conservative when identifying dependencies for a statement or variable, especially for a dynamic language such as JavaScript. In our approach, we have opted for dynamic slicing, which enables us to produce thinner slices that are representative of each test execution, thus reducing noise during the debugging process. Moreover, by using dynamic analysis we are able to present the user with valuable runtime information that would not be available through static analysis of JavaScript code. For example, if we were to compute the slice for variable temp on Line 17 of the running example (Figure 1.3a), it would be useful to know the runtime values of the arguments passed into renderAssets, as these values affect variable temp.

To produce a trace for dynamic slicing, our approach selectively instruments the JavaScript code.

3.3.1 Selective Instrumentation

An ideal test case would minimize setup by exercising only the relevant JavaScript code related to its assertions. However, developers are often unaware of the complete inner workings of the application under test. As a result, it is possible for a test case to execute JavaScript code that is unrelated to any of its contained assertions. In such a case, instrumenting an entire web application's JavaScript code base would yield a large trace with unnecessary information. This can incur high performance overheads, which may change the web application's behaviour.
Therefore, instead of instrumenting the entirety of the code for dynamic slicing, our approach intercepts and statically analyzes all JavaScript code sent from the server to the client to determine which statements may influence the asserted DOM elements. Then, this subset of the application's code is instrumented. This approach has two advantages. First, it minimizes the impact our code instrumentation has on the application's performance. Second, selective instrumentation yields a more relevant and concise execution trace, which in turn lowers the processing time required to compute a backward slice.

Algorithm 1 summarizes our approach to selective code instrumentation. Our algorithm first converts the code into an abstract syntax tree (AST). This tree is traversed in search of a node matching the initial slicing criteria (Line 6, LocateVariableInSourceCode()). Once found, the function containing the initial definition of the variable in question is also found (Line 7), henceforth referred to as the parent closure. Based on this information, the algorithm searches this parent closure for all references to the variable of interest. This is done in order to find all locations in the JavaScript code where the variable may be updated, or where a new alias may be created for the variable. As JavaScript is not type-safe, using static analysis to determine a variable's possible type can be complex. Instead of such complex analysis, our approach assumes that any variable may have a non-primitive value at runtime, and instead uses collected runtime information to determine whether alias analysis is required for each variable. Runtime value types are important since only those variables with non-primitive types can be assigned aliases.

Algorithm 1: Selective Instrumentation
input : sourceCode, slicingCriteria <line, name>
output: instrumentedSourceCode
begin
 1   criteriaToDo <line, name> ← slicingCriteria
 2   criteriaCompleted <line, name> ← ∅
 3   varToInst <closure, name> ← ∅
 4   while ¬(criteriaToDo.empty()) do
 5       cNext ← criteriaToDo.getFirst()
 6       location ← LocateVariableInSourceCode(cNext)
 7       closure ← DetermineClosure(cNext.name, location)
 8       if ¬varToInst.contains(<closure, cNext.name>) then
 9           varToInst.add(<closure, cNext.name>)
10           dataDeps <line, name> ← GetDataDepends(closure, cNext.name)
11           foreach newData ∈ dataDeps do
12               if ¬criteriaToDo.contains(newData) && ¬criteriaCompleted.contains(newData) then
13                   criteriaToDo.add(newData)
14           ctrlDeps <line, name> ← GetControlDepends(closure, cNext.name)
15           foreach newCtrl ∈ ctrlDeps do
16               if ¬criteriaToDo.contains(newCtrl) && ¬criteriaCompleted.contains(newCtrl) then
17                   criteriaToDo.add(newCtrl)
18       criteriaCompleted.add(criteriaToDo.removeFirst())
19   return AugmentCode(sourceCode, varToInst)

Moreover, for each variable update pertaining to the variable of interest, we also track the data dependencies of such an operation. Repeating these steps for each of the detected dependencies (Lines 10-13) allows us to iteratively determine the subset of code statements to efficiently instrument for a given initial slicing criterion.

As an example, if variable temp on Line 17 of Figure 1.3a were provided as input to Algorithm 1, our algorithm would need to instrument Lines 11, 12, 13, and 15, as these lines directly alter the value of temp. However, the JavaScript lines involving variable data would also need to be instrumented, as it acts as a data dependency of temp on Line 12. Such data dependencies are returned by our selective algorithm (Algorithm 1, Line 10). A similar approach is taken for determining control dependencies (Line 14 of the algorithm).

Static analysis is also used to determine the context of function calls.
Again referring to Figure 1.3a, the update to data on Line 7 is dependent on argument url. Since variable data is later used to update temp on Line 12, we would need to consider all calls to renderAssets when computing the slice for criterion <17, temp>, since argument url influences the value of data (4 in Figure 1.3).

1   var currentPage = 1;
2   var sortType = 'default';
3   var gridSize = _write("gridSize", 8, 3);
4   var infiniteScroll = false;
5
6   var renderAssets = function(url, size) {
7       var data = assetsFromServer(url);
8
9       var temp = '<div class="asset-row">';
10      for (i = 0; i < _read("size", size, 10); i++) {
11          temp += '  <div class="asset-icon">';
12          ...  // Reading from variable 'data'
13          temp += '  </div>';
14      }
15      temp += '</div>';
16
17      return $('#assets-container').append(temp);
18  };
19
20  $(document).on('click', '#sort-assets', function(){
21      $('#sort-assets').removeClass('selected-type')
22      $(this).addClass('selected-type');
23      currentPage = 1;
24      sortType = $(this).attr('type');
25      gridSize = _write("gridSize", 12, 25);
26      renderAssets(url + sortType + currentPage, _readAsArg("gridSize", gridSize, 26));
27      infiniteScroll = true;
28  });
29
30  var scroll = function() {
31      if(infiniteScroll) {
32          currentPage++;
33          renderAssets(url + sortType + currentPage, _readAsArg("gridSize", gridSize, 33)/2);
34      }
35  };
36  $(window).bind('scroll', scroll);

Figure 3.2: JavaScript code of the running example after our selective instrumentation is applied. Slicing criteria: <10, size>.

Once all the possible data and control dependencies have been determined through static analysis, each variable and its parent closure are forwarded to our code transformation module, which instruments the application code (Algorithm 1, Line 19) in order to collect a concise trace.
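Based on the instrumented output in Figure 3.2, the injected logging helpers can be sketched as below. This is a hedged approximation: the real helpers also record each write's data dependencies, and the exact record fields are our own illustration; only the call signatures _write(name, value, line) and _read(name, value, line) are taken from the figure.

```javascript
// Trace of read/write operations collected for dynamic slicing.
const sliceTrace = [];

// Log a write: variable name, line number, and the value's type
// (needed later for alias analysis), then pass the value through so
// the original assignment is unchanged.
function _write(name, value, line) {
  sliceTrace.push({ op: 'write', name, line, type: typeof value });
  return value;
}

// Log a read analogously; returning the value preserves the
// semantics of the expression being instrumented.
function _read(name, value, line) {
  sliceTrace.push({ op: 'read', name, line, type: typeof value });
  return value;
}

// Assumption: reads in argument position are logged the same way.
function _readAsArg(name, value, line) {
  return _read(name, value, line);
}
```

Because both helpers return the original value, wrapping an expression such as `var gridSize = _write("gridSize", 8, 3);` leaves the program's behaviour intact while appending one trace entry per executed operation.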
The instrumented code keeps track of all updates and accesses to all relevant data and control dependencies, hereafter referred to as write and read operations, respectively. This trace is later used to extract a dynamic backwards slice.

Figure 3.2 shows an example of our code instrumentation technique's output when applied to the JavaScript code in Figure 1.3a with slicing criteria <10, size>. Recall that, by acting as a control dependency for variable temp, size determines the number of displayed assets in the motivating example. For each relevant write operation, our instrumentation code logs information such as the name of the variable being written to, the line number of the executed statement, and the type of value being assigned to the variable. Moreover, the data dependencies of such a write operation are also logged. Likewise, for each read operation we record the name of the variable being read, the type of value read, and the line number of the statement. Information about variable type is important when performing alias analysis during the computation of a slice.

3.3.2 Computing a Backwards Slice

Once a trace is collected from the selectively instrumented application by running the test case, we run our dynamic slicing algorithm. We use dynamic slicing as it is much more accurate than static slicing at capturing the exact set of dependencies exercised by the test case. Referring to Figure 1.3d, a static slice for criteria <10, size> would contain Lines 3, 10, 25, 26, and 33. However, if the test case shown in Figure 1.3d were only partially executed, say up to the first assertion on Line 5, we would not be interested in Line 33 when viewing the produced slice, as the scroll event and function were never executed. The trace produced by partially executing the aforementioned test case would allow us to omit Line 33 from our slice, since that line would be absent from the execution trace.
Thus, dynamic analysis is able to provide slices that are tailored to each test case execution.

The algorithm starts by extracting all instances of the initial slicing criteria from the trace (Line 5). For each of the read operations, the trace is traversed backwards (temporally) in search of the nearest related write operation (Algorithm 2, Lines 6-8). Once found, the write operation is added to the slice under construction. It may happen that the nearest write augments a previous write (e.g., i++); in this case, the backwards traversal of the trace is continued after adding the augmented write operation to the slice (Line 17). This process is repeated for all the data dependencies related to that write operation (Lines 10-12). A similar approach is taken for including control dependencies in the slice.

Algorithm 2: Backward Slicing
input : dynamicTrace, slicingCriteria <line, name>
output: sliceStatements
begin
 1   stmtsToReturn <line> ← ∅
 2   currentOp ← ∅
 3   while dynamicTrace.hasNext() do
 4       currentOp ← dynamicTrace.next()
 5       if currentOp == slicingCriteria then
 6           while currentOp.hasPrev() do
 7               backOp ← currentOp.prev()
 8               if backOp instanceof write && backOp.variable == slicingCriteria.name then
 9                   stmtsToReturn.add(backOp)
10                   varReads <line, name> ← GetReadsForWrite(backOp)
11                   foreach read ∈ varReads do
12                       stmtsToReturn.add(ComputeSlice(read))
13                   if ¬isPrimitive(backOp) then
14                       aliases <line, name> ← GetNewAliases(backOp, currentOp)
15                       foreach alias ∈ aliases do
16                           stmtsToReturn.add(WritesTo(alias, prevOp))
17                   if ¬isAugmentation(backOp) then
18                       break
19   return stmtsToReturn

The task of slicing is complicated by the presence of aliases in JavaScript. When computing the slice of a variable that has been assigned a non-primitive value, we need to consider possible aliases that may refer to the same object in memory.
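Stripped of the alias handling and augmented-write handling discussed here, the core backward traversal of Algorithm 2 can be sketched in JavaScript as follows; the trace-entry fields (kind, name, line, reads) are illustrative, not the tool's actual format.

```javascript
// Simplified sketch of Algorithm 2's core loop: starting from a read of
// the slicing criterion at `index`, walk the recorded trace backwards
// to the nearest write of that variable, add its line to the slice,
// then recurse into the reads (data dependencies) of that write.
function computeSlice(trace, index, name) {
  const slice = new Set();
  for (let i = index; i >= 0; i--) {
    const op = trace[i];
    if (op.kind === 'write' && op.name === name) {
      slice.add(op.line);
      // Recurse into each data dependency of the located write.
      for (const dep of op.reads || []) {
        for (const line of computeSlice(trace, i - 1, dep)) slice.add(line);
      }
      break; // nearest write found; augmentation (e.g. i++) not handled
    }
  }
  return slice;
}
```

For a trace where `size` is written on line 6 using a read of `gridSize` (itself written on line 3), slicing from a later read of `size` yields the line set {6, 3}, mirroring how the full algorithm accumulates transitive data dependencies.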
Aliasing also occurs in other languages such as C and Java; however, specific to JavaScript is the use of the dot notation, which can be used to seamlessly modify objects at runtime. The prevalent use of aliases and the dot notation in web applications often complicates the issue of code comprehension, and static analysis techniques often ignore this issue [12].

To remedy this, we incorporate dynamic analysis in our slicing method. If a reference to an object of interest is saved to a second object's property, possibly through the use of the dot notation, the object of interest may also be altered via aliases of the second object. For example, after executing the statement a.b.c = objOfInterest;, updates to objOfInterest may be possible through a, a.b, or a.b.c. To deal with this and other similar scenarios, our slicing algorithm searches through the collected trace and adds the forward slice (Lines 13-16) for each detected alias to the current slice for our variable of interest (e.g., objOfInterest).

The line numbers for each of the identified relevant statements in the computed slice are collected and used during the visualization step, as shown in the next section.

Figure 3.3: Processing view of approach.

3.4 Visualizing the Inferred Mappings

In the final step, Camellia produces an interactive visualization of the inferred mappings for the test failure. This visualization helps the user understand (1) the client-side JavaScript code related to the assertion failure, (2) the test case's relations to DOM changes and JavaScript execution, and/or (3) any deviations in the expected behaviour with respect to a previous version where the test passed.
The visualization is built on top of our Clematis framework [6], which has been designed to support developers in understanding complex event-based interactions in JavaScript applications (see Chapter 2). However, Clematis does not support test case understanding.

Figure 3.4a depicts an example of the high-level view provided by our visualization for a test case. In the high-level view, the progress of an executed test case over time is depicted on the horizontal axis: the earliest assertions are shown on the left-hand side of the view, and the most recent JavaScript events and assertions are shown closer to the right-hand side. The top of Figure 3.4b shows the high-level visualization produced by running the same test case from Figure 3.4a on a faulty version of the application. Passing assertions for a test case are represented as grey nodes, and failures are shown in red. DOM events, timing events, and network-related JavaScript events are visualized alongside the assertions as green, purple, and blue nodes, respectively.

3.4.1 Semantic Zoom Levels

Additional details for each assertion or event are provided to the user on demand, following the focus+context style. Each JavaScript event node in the high-level view can be expanded to view its related call graph and DOM mutations, as mentioned in Chapter 2.3. In the case of an assertion, causal links relate the assertion to prior events that may have influenced its outcome. These are events that altered portions of the DOM relevant to the assertion. Moreover, clicking on a failed assertion node reveals additional details about it (Figure 3.4b, transition 1). Details include the related (1) DOM dependencies, (2) failure messages, and (3) related JavaScript functions.
The final zoom level of an assertion node displays all the information captured for the assertion, including the captured slice (transition 2) and the line numbers of the failing test case assertions (transition 3).

3.4.2 Stepping Through the Slice

When displaying the code slice for an assertion, each line of JavaScript code that may have influenced the assertion's outcome is highlighted in the context of the application's source code (Figure 3.4b, lower-right). The user can further explore the captured slice by stepping through its recorded execution using a provided control panel, shown in green in Figure 3.4b. By doing so, the user is able to take a post-mortem approach to fault localization, whereby the application's faulty behaviour is studied deterministically offline after it has completed its execution. Here, the user can also examine the captured runtime values of relevant JavaScript variables. Subsets of the slice can also be viewed by further filtering which variables should be included in the displayed slice.

Figure 3.4b shows the generated visualization for the test case from the motivating example (Figure 1.3d). The JavaScript slice related to the second assertion on Line 9 of Figure 1.3d is shown, alongside the expected scroll DOM event. Using the control panel, the user can step through the produced slice backwards, rewinding the application's execution from Line 17 of Figure 1.3a.

3.5 Tool Implementation

The second phase of our technique is implemented in two open-source tools, namely SliceJS and Camellia.

SliceJS [3] is an independent JavaScript backwards slicing tool that implements our selective code instrumentation and backwards slicing algorithms. Camellia [1] builds on top of SliceJS and other libraries to provide a coherent framework for web application test failure understanding and fault localization. Camellia currently supports Selenium test cases.
In order to compute code slices for the assertions in each test case, we build upon Eclipse's Java Development Tools (JDT) and JavaSlicer [16]. The JavaScript Mutation Summary library [14] is leveraged to monitor the DOM's evolution during a test case. The visualization is built as an extension on top of Clematis [6].

Figure 3.4: Visualization for a test case. (a) Overview of the passing test case. (b) Three semantic zoom levels for the failing test case; Top: overview. Middle: second zoom level showing assertion details, while preserving the context. Bottom: summary of failing assertion and the backwards slice.

Chapter 4
Evaluation

To assess the effectiveness of Camellia in understanding test failures, we conducted a controlled experiment [33]. Our evaluation aims to address the following research questions:

RQ1 Is Camellia helpful in localizing (and repairing) JavaScript faults detected by test cases?

RQ2 What is the performance overhead of using Camellia? Is the overall performance acceptable?

4.1 Experimental Design

4.1.1 Experimental Object

We used Phormer,4 a photo gallery web application, in our controlled experiment.
Phormer is comprised of approximately 6K lines of JavaScript, PHP, and CSS code and had amassed around 40K downloads at the time of this study. In terms of features, Phormer allows users to upload and rate photos and can also display the photos as a slideshow.

4.1.2 Participants

Twelve participants were recruited for the study at the University of British Columbia (UBC). Of these, three were female and nine were male. Moreover, the participants were drawn from different education levels at UBC: two undergraduate students, six Master's students, three Ph.D. students, and one post-doctoral fellow. While the participants did represent different areas of software engineering, all had prior experience in web development and testing, ranging from beginner to professional. Furthermore, six of the participants had worked in industry previously, either full-time or through internships.

Subjects were placed into either (1) an experimental group using Camellia, or (2) a control group. Subjects in the control group were allowed to use any tool of their choice for completing the experimental tasks. Two participants chose to use Chrome Developer Tools, two used the default development tools provided by Firefox, and the remaining two in the control group opted to use Firebug. To conduct this "between-subject" experiment, subjects were assigned to either group manually, based on their reported level of expertise in web development and testing. A 5-point Likert scale was used in a pre-questionnaire to collect information about each participant's competency, in order to maintain an equal level of expertise and minimize bias between the two groups. The participants had no prior experience with Camellia.

4.1.3 Task Design

To answer RQ1, participants were given two main tasks, each involving the debugging of a test failure in the Phormer application (Table 4.1).

4 http://p.horm.org/er/
For each task, participants were given a brief description of the failure and a test case capable of detecting the failure.

For the first task, they were asked to locate an injected fault in Phormer given a failing test case. Participants were asked not to modify the application's JavaScript code during Task 1.

The second task involved identifying and fixing a regression fault (unrelated to the first one). For this task, participants were asked to locate and repair the fault(s) causing the test failure. As the second failure was caused by three separate faults, participants were allowed to modify the application source code in order to iteratively uncover each fault by rerunning the test case. In addition to the failing test case, participants in both groups were given two versions of Phormer: the faulty version and the original fault-free one. The intention here was to simulate a regression testing environment. The injected faults are based on common mistakes JavaScript developers make in practice, as identified by Mirshokraie et al. [24].

Table 4.1: Injected faults for the controlled experiment.

Fault  Fault Description                                             Detecting Test Case  Related Task
F1     Altered unary operation related to navigating slideshow       SlideShowTest        T1
F2     Modified string related to photo-rating feature               MainViewTest         T2
F3     Changed number in branch condition for photo-rating feature   MainViewTest         T2
F4     Transformed string/URL related to photo-rating feature        MainViewTest         T2

4.1.4 Independent Variable (IV)

The IV in the experiment is the tool used for performing the tasks. The two levels of the variable are (1) Camellia, and (2) the group of other tools used in the experiment.

4.1.5 Dependent Variables (DV)

The dependent variables for the experiment are (1) task completion accuracy (discrete), and (2) task completion duration (continuous).

4.1.6 Data Analysis

Two types of statistical tests were used to compare dependent variables across the two groups of participants.
An independent-samples t-test with unequal variances was used for comparing task duration, as the collected data was normally distributed, and a Mann-Whitney U test was used for comparing task accuracy, since that data was not normally distributed. The Shapiro-Wilk normality test was used to check the distribution of both data sets. We used the statistical analysis package R [13] for the analysis.

4.1.7 Procedure

Our lab was used as the setting for the study. At the beginning of each experiment session, the participating user was informed about the general purpose of the study. Overall, there were four main phases to the experiment. First, participants were asked to complete a pre-questionnaire regarding their experience and expertise in the web development and testing field. The participants in the experimental group were then given a tutorial on Camellia and were allowed a few minutes to familiarize themselves with the tool. The participants in the control group were not given any tutorial regarding their chosen tool, as they were already familiar with it.

Next, participants were asked to perform the two tasks. Textual descriptions of each task were given to participants on separate pieces of paper, with space left on the sheet for participant answers. Once the first task was complete, the sheet with the answer was returned to the investigator immediately. This step allowed us to accurately measure the task completion time. Accuracy for each task was marked from 0 to 100 according to a rubric that was drafted prior to conducting the first study. As a result, we were able to quantify each participant's accuracy per task.
Lastly, participants completed a post-questionnaire about their experiences with the tool.

A maximum of 1.5 hours was allocated for the study: 10 minutes were designated for an introduction, 15 minutes were allotted for users to familiarize themselves with the tool being used, 20 minutes were allocated for Task 1, another 30 minutes were set aside for Task 2, and 15 minutes were used for completing the questionnaire at the end of the study.

4.2 Results

Figure 4.1 depicts box plots of task completion accuracy and duration, per task and in total, for both the experimental group (Exp) and the control group (Ctrl).

4.2.1 Accuracy

The accuracy of participant answers was calculated to answer RQ1. Overall, the group using Camellia (M=95.83, SD=10.21) performed much more accurately than the control group (M=47.92, SD=45.01). The results show a statistically significant improvement for the experimental group (p-value=0.032, at the 5% significance level). Comparing the results for the two tasks separately, the experimental group performed better on both tasks on average.

Note that the tasks were not biased towards our tool, as there were participants in the control group who did complete the tasks correctly. Therefore, it was possible to complete the assigned tasks using only traditional debugging tools. In particular, three of the participants in the control group were able to correctly complete the first task, despite having only moderate to little domain experience. Further, two of these three control group participants completed the second task correctly. The results show that participants using Camellia performed more accurately across both tasks by a factor of two, on average, compared to those participants in the control group.

4.2.2 Duration

To further answer RQ1, we measured the amount of time (minutes:seconds) spent on Task 1 by participants.
The task duration for the first task was compared between the experimental group using Camellia (M=5:42, SD=2:10) and the control group (M=12:03, SD=4:29). According to the results of the test, there was a statistically significant difference (p-value=0.016) in durations between the experimental group and the control group. We also measured the amount of time spent by participants on Task 2, repairing faults. The results indicate no improvement in time for the experimental group (M=23:23, SD=6:31) compared to the control group (M=19:46, SD=8:05). However, as mentioned above, the participants using Camellia performed much more accurately on this second task, suggesting that the task is complex and the main advantage of using Camellia is in accurate completion of the task. Those participants in the control group who answered Task 2 correctly required a mean duration of 25:21 to complete the task, which is longer than the mean duration of the experimental group. The results show that developers using Camellia took 54% less time to localize a detected fault. The results are inconclusive regarding fault repair time.

4.2.3 Qualitative Feedback

Qualitative feedback was gathered from the participants during the post-questionnaire, which allowed us to gain further insight into Camellia's usefulness. Overall, the feedback regarding Camellia was very positive. The features found most useful by the experimental group were (1) the slicing functionality, (2) the ability to revisit runtime values for variables included in each slice, and (3) the visualization of JavaScript events, which helped users to better understand each test case initially. Moreover, the participants in the control group reported difficulty tracking application execution with existing tools, wishing they had some way of filtering irrelevant source code.
Although users in the control group did have the ability to step through JavaScript execution using their browser's development environment, they felt it was cumbersome to use test case breakpoints in conjunction with JavaScript breakpoints. In terms of requested features for Camellia, some participants would like to see a more automated way of comparing slices for two different application versions, instead of manually comparing the two slices themselves. Additionally, many wished for more detail when viewing each JavaScript slice. Specifically, users would like a better distinction between data and control dependencies. Some participants in the experimental group also wished for particular variables and operations to be highlighted in the slice visualization, instead of simply highlighting the entire executed line.

4.3 Performance Overhead

To collect performance measurements (RQ2), we executed each of the two test cases from our controlled experiment on the experimental object, Phormer. Camellia was tested with selective instrumentation both enabled and disabled. The two tests were run 10 times each, yielding the following results.

4.3.1 Instrumentation Overhead

Average delays of 1.29 and 1.83 seconds were introduced by the selective and non-selective instrumentation algorithms, respectively, on top of the 407 ms required to create a new browser instance. Moreover, the average trace produced by executing the selectively instrumented application was 37 KB in size. Executing a completely instrumented application resulted in an average trace size of 125 KB. Thus, the selective instrumentation approach is able to reduce trace size by 70% on average, while also reducing instrumentation time by 41%.

4.3.2 Execution Overhead

The actual execution of each test case required an additional 246 ms for the selectively instrumented application. Instrumenting the entire application without static analysis resulted in each test case taking 465 ms longer to execute. Based on these measurements, our selective instrumentation approach lowers the execution overhead associated with Camellia by 47%.

4.3.3 Dynamic Analysis Overhead

It took Camellia 585 ms on average to compute each JavaScript slice when utilizing selective instrumentation. Non-selective instrumentation lengthened the required dynamic analysis time to 750 ms. By analyzing a more concise execution trace, Camellia was able to lower the slice computation time by 22%.

Thus, we see that Camellia incurs low performance overhead in all three components, mainly due to its selective instrumentation capabilities.

Figure 4.1: Box plots of (a) task completion accuracy data per task and in total for the control and experimental groups (higher values are desired), (b) task completion duration data per task and in total for the control and experimental groups (lower values are desired).

Chapter 5
Discussion

5.1 Task Completion Accuracy

The results from both experimental tasks suggest that Camellia is capable of significantly improving the fault localization and repair capabilities of developers (RQ1).

The improvement in accuracy for Camellia was most evident when the fault influenced a complex data flow sequence within the application. The debugging in T2 involved tracing both the execution of multiple nested function calls and the initialization and callback of an asynchronous XMLHttpRequest object. Although this asynchronous object is a feature specific to JavaScript, understanding a failure involving multiple function calls is a task faced by developers of all applications, regardless of the involved programming language.
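To make this difficulty concrete, the following is a small, hypothetical sketch (not Phormer's actual code; all names are illustrative) of the kind of asynchronous flow exercised in T2: the value a DOM-based assertion eventually checks is produced inside an XMLHttpRequest-style callback, several calls removed from the event handler where a developer would naturally set a breakpoint.

```javascript
// Hypothetical T2-style data flow: a click handler starts an asynchronous
// request whose nested callback updates the state a later assertion checks.
function fetchRating(photoId, callback) {
  // Stands in for an asynchronous XMLHttpRequest; the callback runs later,
  // on a different call stack than the click handler that initiated it.
  setTimeout(() => callback({ photoId, stars: 4 }), 0);
}

function renderRating(state, rating) {
  state.display = rating.stars + ' stars'; // the value the assertion inspects
}

function onRateClick(state, photoId) {
  fetchRating(photoId, (rating) => renderRating(state, rating));
}

// A breakpoint in onRateClick never observes state.display being written:
// by the time renderRating runs, the handler's stack frame is long gone.
const state = {};
onRateClick(state, 42);
setTimeout(() => console.log(state.display), 10); // → "4 stars"
```

A backward slice from `state.display`, by contrast, connects `renderRating`, the callback, and `onRateClick` in one view, which is the link participants had to reconstruct manually with breakpoints.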
However, many participants in the control group failed to correctly localize the fault in T2, illustrating the difficulty of tracing dependencies in a dynamic language such as JavaScript.

The fault in T1 affected a global variable, which was updated frequently throughout the application's source code by multiple event handlers. From a developer's perspective, any of these handlers could have contained the fault, and we believe providing users with the runtime value of this global variable was instrumental in helping the experimental group identify the fault. Although users in the control group had access to breakpoints, many of them had difficulty stepping through the application's execution at runtime due to the existence of asynchronous events such as timeouts, which caused non-deterministic behaviour in the application when triggered in the presence of breakpoints.

Many of the participants in the control group fixed the failure instead of the actual fault; they altered the application's JavaScript code such that the provided test case would pass, yet the faults still remained unfixed. The JavaScript code related to Task 2 contained multiple statements that accessed the DOM dependency of the failing test case assertion. Participants who simply corrected the failure had trouble identifying which of these statements was related to the fault, and as a result would alter the wrong portion of the code. On the other hand, those participants using Camellia were able to reason about these DOM-altering statements using the provided links and slices.

5.2 Task Completion Duration

While task duration was significantly improved by Camellia for Task 1, the average measured task duration was in fact longer for Camellia in Task 2. However, studying the accuracy results for Task 2 reveals that many of the participants in the control group failed at correcting the faults, and instead simply addressed the failure directly.
This may explain the lack of observable improvement in task duration for Task 2, as hiding the failure often requires less effort than repairing the actual fault.

The average recorded task duration for Task 1 was significantly lower for the experimental group. The participants in the control group often used breakpoints to step through the application's execution while running the provided test case. When unsure of the application's execution, these developers would restart the application and re-execute the test case, extending their task duration. Instead of following a similar approach, those developers using Camellia were able to rewind and replay the application's execution multiple times offline, after executing the test case only once. The trace collected by Camellia during this initial test case execution was used to deterministically replay the execution while avoiding the overhead associated with re-running the test case.

5.3 Reducing Variance in User Performance

Further dividing the participants into smaller groups within both the initial experimental group and the control group, based on their reported competency, reveals a relationship between user expertise and task performance. Specifically, comparing the performance gap between the beginners in the two initial groups against the gap between the experts in the same groups shows that the greatest improvement in performance is achieved by beginner-level developers. While those beginners tasked with localizing the injected faults without Camellia had difficulty due to the challenges discussed in Section 1.2, beginners in the experimental group were automatically provided with useful information by Camellia. Such information sheds light on an application's execution and is otherwise only accessible through advanced debugging techniques with traditional tools.
Conversely, those experts using Camellia may have had more difficulty adjusting to the new development environment, since their debugging habits are more established than those of the beginners using Camellia.

5.4 Strengths

Camellia provides users with a starting point during the fault localization process. Specifically, the produced visualization allows users to quickly gain an overview of the failing test case and its relation to the application under test. This is especially useful when a user is unfamiliar with an existing code base (e.g., debugging code written by other developers). This benefit was evident in our experiment, where participants were unfamiliar with the experimental object, Phormer. Those developers using Camellia were far more likely to address the actual program faults rather than simply hiding the failure. Furthermore, the ability to capture an application's execution makes it easier to collaborate with other developers when debugging, since a failing test case story can be shared and deterministically replayed.

The selective instrumentation algorithm yields a concise and relevant execution trace, making our slicing approach scalable to large web applications. Additionally, by instrumenting JavaScript code on-the-fly, our approach remains browser independent and requires no extra effort from the developer to use.

5.5 Improvements

Despite being well received by the users in our experiment, we acknowledge some possible improvements for Camellia. Currently, Camellia provides users with a dynamic JavaScript slice related to the DOM dependencies for a given assertion. However, our instrumentation and slicing algorithms are agnostic to the context of each assertion. It is possible to produce a thinner slice for each assertion by leveraging the contextual clues provided by the assertions themselves. Referring to the test case shown in our running example in Figure 1.3d, the assertion on Line 5 is checking for the number of elements with class asset-icon.
When producing a slice for this assertion, our algorithm could be improved to only instrument those JavaScript lines related to the creation or removal of asset-icon elements. Similarly, if an assertion was checking the textual contents of an element (as shown on Lines 5 and 9 of Figure 1.2), we could reduce the amount of noise in the produced slice by only including JavaScript code related to textual updates to the DOM.

Many participants in our experimental group found the interface of Camellia to be fragmented. As a result, it was not always immediately clear which program slices were related to which assertions. A stronger mapping from each DOM-based assertion to its related JavaScript slice would help streamline the debugging process when using Camellia.

An automated way of comparing JavaScript slices for passing and failing assertions would further ease the fault localization process. During the experiment, users of Camellia would often use the program slice from the passing version of Phormer as a guideline when debugging the failing version of the application. An algorithm could be implemented for comparing execution paths between two application versions, allowing users to more easily identify any deviation in program behaviour between the two versions.

5.6 Limitations

In terms of limitations, our approach does not support the eval construct of JavaScript. Additionally, support for third-party JavaScript libraries is limited. Libraries such as jQuery are often used by developers to simplify the development of an application's client-side behaviour. The library itself provides many function abstractions that address cross-browser compatibility issues. Currently, our approach does not instrument or track the execution of third-party library code. As a result, library code is not included in our produced JavaScript slices. The assumption is made that a fault resides in the application's own source code.
However, we believe this limitation is surmountable with additional research and development time.

The scope of our fault localization approach is limited to the client-side component of a web application. If a fault exists within an application's JavaScript or HTML code, Camellia is able to provide users with the necessary information for them to eventually localize the issue. However, failures caused by faults related to server-side code are outside the scope of our project.

5.7 Threats to Validity

Our participants might not be representative of industrial developers. However, many of the participants in our experiment had previous software and web development experience in industry. A possible threat to the validity of our experiment arises when comparing the competency of each group of participants. We maintained an equal level of expertise between the two groups by manually placing participants into each group based on the results of our pre-questionnaire.

Another threat regarding representativeness arises for the injected faults and the test cases used to illustrate Camellia's usefulness. We mitigated this threat by using a real-world web application and test cases written by other developers, and by injecting faults that represent common web application development mistakes. The test cases and faults injected in our experiment touched different areas of the application's source code, disassociating the two tasks from each other.

Additional threats arise in regards to measuring task accuracy and duration. To avoid any bias related to the marking of participants' answers, a scoring rubric containing the correct answers for each task was compiled ahead of conducting the study. We were able to accurately measure task duration by presenting the participants with each task on a separate piece of paper. Participants were told to return each piece of paper with their answer once they had completed the assigned task, at which point the task timer was stopped.
Lastly, we avoided an inconsequential comparison in our experiment by allowing the participants in the control group to use whichever tool they felt most comfortable with.

Our tool Camellia [1] and the experimental object Phormer are both publicly available, making it possible for our controlled experiment to be replicated.

Chapter 6
Related Work

Related to our work are debugging and fault localization techniques.

6.1 Fault Localization

Delta debugging [35] is a technique whereby the code change responsible for a failure is systematically deduced by narrowing the state differences between a passing and a failing run. Changes to a program's code base are reverted from a failing version until a compilable version is found to pass. This process yields the code change to blame for the failure. While delta debugging can be used to simplify test failure localization, it makes the assumption that there is a corresponding passing test for the failing test. Our approach, in contrast, (1) does not require a passing test, and (2) presents the user with a slice of the code responsible for the assertion failure.

Other fault localization techniques have been proposed that compare different characteristics of passing and failing runs of a program [9, 15, 27]. In particular, Agrawal et al. [5] propose a tool, xSlice, which produces and compares execution slices for C programs based on generated test cases. Our approach tailors the comparison-based localization approach to the web application domain by relating JavaScript coverage, asynchronous events, and DOM-based assertions to each other. Moreover, calculating and displaying the JavaScript code slice for a DOM-based assertion poses new challenges not faced by previous techniques, due to the dynamic nature of JavaScript and its interactions with the DOM.

TestNForce [17], a plugin developed for Visual Studio, aids developers in test case maintenance.
Similar to our approach, TestNForce is concerned with relating test cases to the underlying code of the product under test. The plugin determines which tests should be updated or executed after production code has been altered. In contrast, the technique proposed in this paper is concerned with finding the line of code that could have caused an existing test to fail.

Ocariza et al. [26] proposed a technique for localizing faults in JavaScript code. In particular, their technique focuses on code-terminating JavaScript errors originating from DOM API calls. Our work, however, is concerned with identifying the JavaScript code responsible for an assertion failure of a DOM-based test case.

6.2 Program Slicing

Originally proposed by Weiser [32], program slicing techniques can be classified into two categories, namely static and dynamic slicing [21]. WALA [28] performs JavaScript slicing by inferring a call graph through static analysis. Since JavaScript is such a dynamic language, WALA yields conservative results that may not be reflective of an application's actual execution. It also ignores JavaScript-DOM interactions completely. Although not used for slicing purposes, others [25, 34] have utilized static analysis to reduce the execution overhead incurred by code instrumentation. Our approach determines slices via a combination of static and dynamic program analysis through a selective code instrumentation algorithm.

6.3 Behavioural Debugging

Jin et al. [18] present a technique for automatically identifying the behavioural differences between two versions of the same Java program. Their solution analyzes the code differences between two versions, generates test cases to exercise the changed portions in particular, and presents to the user the behavioural differences between the two programs. Similarly, our work highlights the behavioural differences between two versions of a program. However, our work differs in two ways.
First, it relies on existing test cases to exercise the application. Second, it focuses on differences in JavaScript code related to DOM elements rather than differences related to Java code changes.

Ko et al. [20] recognized that during debugging, developers have trouble translating their failure-related hypotheses into program queries. To address this challenge, a new tool called WhyLine was developed, which is capable of automatically generating high-level questions about program behaviour. Once a generated question is selected by the user, the source code responsible for the application behaviour in question is displayed. While both our approach and the one implemented in WhyLine are concerned with relating an application's execution to its visible front-end, our approach does not attempt to generate questions and instead aims to answer the questions implied by a failed assertion. Because of this difference, we are able to record and analyze application behaviour more selectively.

6.4 Visualization for Fault Localization

Jones et al. [19] propose a method to help developers map executed program statements to test case outcomes. The intuition here is that a program fault is more likely to reside in a statement that is highly executed by failing test cases, and their approach reflects this by colour coding specific lines in a visualization. In contrast, our approach is more concerned with comparing execution runs for each individual test case in isolation. Furthermore, our visualization displays code slices instead of code coverage and presents this information alongside JavaScript event-based data, thereby facilitating debugging of the code.

Chapter 7
Conclusion

This work is motivated by the existing disconnect between front-end test cases and a web application's underlying JavaScript code, which adversely influences failure comprehension and fault localization.
To bridge this gap, we proposed an automated technique to determine implicit links between a test assertion, the checked DOM elements, and the related JavaScript code and events. We presented a novel selective instrumentation and code slicing approach capable of abstracting away irrelevant data when localizing a fault. We implemented our approach in a tool called Camellia, which visualizes implicit connections between assertion failures and faults. We evaluated Camellia through a controlled experiment using an open-source web application. We found that Camellia can provide developers with useful information during the assertion failure understanding and fault localization steps, and is capable of improving accuracy in tasks related to fault localization and repair.

While our implementation is specific to web applications, we believe the overall approach can be applied to other domains. The challenges faced by web developers during the fault localization process are likely familiar to developers in other domains, where applications can also be composed of separate but closely coupled entities (e.g., the model-view-controller design pattern [22]). In such cases, it can be difficult to mentally model how each entity interacts with the others. We believe other developers can reap the benefits of our approach in such scenarios.

As future work, we plan to improve our slicing algorithm to better support popular third-party JavaScript libraries. We also plan to incorporate the feedback obtained from the participants of the experiment into Camellia.

Bibliography

[1] Camellia. http://salt.ece.ubc.ca/software/camellia/.

[2] Clematis. http://salt.ece.ubc.ca/software/clematis/.

[3] SliceJS. http://salt.ece.ubc.ca/software/slicejs/.

[4] WSO2 Enterprise Store. https://github.com/wso2/enterprise-store.

[5] H. Agrawal, J. R. Horgan, S. London, and W. E. Wong. Fault localization using execution slices and dataflow tests. In Proc. of Intl. Symp. on Software Reliability Engineering (ISSRE), pages 143–151.
IEEE, 1995.

[6] Saba Alimadadi, Sheldon Sequeira, Ali Mesbah, and Karthik Pattabiraman. Understanding JavaScript event-based interactions. In Proceedings of the International Conference on Software Engineering (ICSE), pages 367–377. ACM, 2014.

[7] Saba Alimadadi, Sheldon Sequeira, Ali Mesbah, and Karthik Pattabiraman. Understanding JavaScript event-based interactions. Technical Report UBC-SALT-2014-001, University of British Columbia, 2014. http://salt.ece.ubc.ca/publications/docs/UBC-SALT-2014-001.pdf.

[8] Shay Artzi, Julian Dolby, Simon Holm Jensen, Anders Møller, and Frank Tip. A framework for automated testing of JavaScript web applications. In Proceedings of the International Conference on Software Engineering (ICSE). ACM, 2011.

[9] H. Cleve and A. Zeller. Locating causes of program failures. In Proceedings of the International Conference on Software Engineering (ICSE), pages 342–351. ACM, 2005.

[10] Andy Cockburn, Amy Karlson, and Benjamin B. Bederson. A review of overview+detail, zooming, and focus+context interfaces. ACM Computing Surveys, 41(1):2:1–2:31, 2009.

[11] Thomas A. Corbi. Program understanding: Challenge for the 1990s. IBM Systems Journal, 28(2):294–306, 1989.

[12] Asger Feldthaus, Max Schäfer, Manu Sridharan, Julian Dolby, and Frank Tip. Efficient construction of approximate call graphs for JavaScript IDE services. In Proceedings of the International Conference on Software Engineering (ICSE), pages 752–761. IEEE Computer Society, 2013.

[13] Robert Gentleman and Ross Ihaka. The R project for statistical computing. http://www.r-project.org.

[14] Google. Mutation Summary Library. http://code.google.com/p/mutation-summary/.

[15] Alex Groce and Willem Visser. What went wrong: Explaining counterexamples. In Workshop on Model Checking of Software, pages 121–135, 2003.

[16] Clemens Hammacher. Design and implementation of an efficient dynamic slicer for Java. Bachelor’s Thesis, November 2008.

[17] Victor Hurdugaci and Andy Zaidman.
Aiding software developers to maintain developer tests. In Proceedings of the European Conference on Software Maintenance and Reengineering (CSMR), pages 11–20. IEEE Computer Society, 2012.

[18] Wei Jin, Alessandro Orso, and Tao Xie. Automated behavioral regression testing. In Proceedings of the International Conference on Software Testing, Verification and Validation (ICST), pages 137–146. IEEE Computer Society, 2010.

[19] J. A. Jones and M. J. Harrold. Empirical evaluation of the Tarantula automatic fault-localization technique. In Proceedings of the International Conference on Automated Software Engineering (ASE), pages 273–282. ACM, 2005.

[20] Andrew J. Ko and Brad A. Myers. Debugging reinvented: Asking and answering why and why not questions about program behavior. In Proceedings of the International Conference on Software Engineering (ICSE), pages 301–310. ACM, 2008.

[21] Bogdan Korel and Janusz W. Laski. Dynamic program slicing. Inf. Process. Lett., 29(3):155–163, 1988.

[22] Glenn E. Krasner and Stephen T. Pope. A cookbook for using the model-view-controller user interface paradigm in Smalltalk-80. J. Object Oriented Program., 1(3):26–49, August 1988.

[23] Ali Mesbah, Arie van Deursen, and Danny Roest. Invariant-based automatic testing of modern web applications. IEEE Transactions on Software Engineering (TSE), 38(1):35–53, 2012.

[24] Shabnam Mirshokraie, Ali Mesbah, and Karthik Pattabiraman. Efficient JavaScript mutation testing. In Proceedings of the International Conference on Software Testing, Verification and Validation (ICST), pages 74–83. IEEE Computer Society, 2013.

[25] George C. Necula, Jeremy Condit, Matthew Harren, Scott McPeak, and Westley Weimer. CCured: Type-safe retrofitting of legacy software. ACM Trans. Program. Lang. Syst., 27(3):477–526, May 2005.

[26] Frolin Ocariza, Jr., Karthik Pattabiraman, and Ali Mesbah. AutoFLox: An automatic fault localizer for client-side JavaScript.
In Proceedings of the International Conference on Software Testing, Verification and Validation (ICST), pages 31–40. IEEE Computer Society, 2012.

[27] Brock Pytlik, Manos Renieris, Shriram Krishnamurthi, and Steven P. Reiss. Automated fault localization using potential invariants. In International Workshop on Automated and Algorithmic Debugging, pages 273–276, 2003.

[28] Manu Sridharan, Stephen J. Fink, and Rastislav Bodik. Thin slicing. SIGPLAN Not., 42(6):112–122, June 2007.

[29] Suresh Thummalapenta, K. Vasanta Lakshmi, Saurabh Sinha, Nishant Sinha, and Satish Chandra. Guided test generation for web applications. In Proceedings of the International Conference on Software Engineering (ICSE), pages 162–171. IEEE Computer Society, 2013.

[30] Iris Vessey. Expertise in debugging computer programs: A process analysis. International Journal of Man-Machine Studies, 23(5):459–494, 1985.

[31] W3C. Document Object Model (DOM) level 2 events specification. http://www.w3.org/TR/DOM-Level-2-Events/, 13 November 2000.

[32] Mark Weiser. Program slicing. In Proceedings of the International Conference on Software Engineering (ICSE), pages 439–449. IEEE, 1981.

[33] Claes Wohlin, Per Runeson, Martin Höst, Magnus C. Ohlsson, Björn Regnell, and Anders Wesslén. Experimentation in software engineering: an introduction. Kluwer, 2000.

[34] Suan Hsi Yong and Susan Horwitz. Using static analysis to reduce dynamic analysis overhead. Formal Methods in System Design, 27(3):313–334, 2005.

[35] Andreas Zeller. Isolating cause-effect chains from computer programs. In Proceedings of the ACM SIGSOFT Symposium on Foundations of Software Engineering (FSE), pages 1–10. ACM, 2002.

Appendix A

Experiment Tasks

A.1 Task 1

The Slide Show feature of this photo gallery application is experiencing unusual behaviour. The SlideShowTest test case (SlideShowTest.java) checks for the correct behaviour of this feature by browsing through the photos present in an existing gallery.
If needed, you can observe the test case failure by running SlideShowTest.java. Identify the file name(s) and line number(s) of the JavaScript code responsible for this fault.

A.2 Task 2

The Main View of this photo gallery application is experiencing unusual behaviour. The MainViewTest test case (MainViewTest.java) browses through various pages in the gallery application and exercises the photo rating feature. If needed, you can observe the test case failure by running MainViewTest.java. Identify the file name(s) and line number(s) of the JavaScript code responsible for this fault.

Appendix B

Information Consent Form

Effects of Using Camellia on Fault Localization for Web Applications

Principal Investigator:
Dr. Ali Mesbah
Assistant Professor
Electrical and Computer Engineering Department, University of British Columbia

Co-Investigators:
Dr. Karthik Pattabiraman
Assistant Professor
Electrical and Computer Engineering Department, University of British Columbia

Sheldon Sequeira
Graduate Student
Electrical and Computer Engineering Department, University of British Columbia

Sponsor: N/A

Purpose: The purpose of this study is to investigate whether our tool, Camellia, eases the fault localization process for web application developers. Specifically, a focus is placed on localizing those bugs discovered on the client side by a test suite. In this study, participants will be asked to perform some tasks with Camellia, or using other tools such as Chrome DevTools. The aforementioned tasks are related to understanding Selenium test cases and debugging (JavaScript-based) web applications. The results from the study will later be compared in terms of task completion duration and accuracy.

Study Procedure: During this experiment, you will first be informed of the consent for the study and sign the consent form. Then you will fill out a pre-questionnaire survey, indicating your level of expertise in the fields related to the current experiment.
Afterwards, you will be asked to perform a list of tasks and write down the results. After finishing the study, you will fill out a post-questionnaire form about your opinion on the tool, and finally you will be briefly interviewed about your experience with the tools. Consent forms will be administered in person, and thus signatures will be collected in person. You may also ask questions about the consent form and the procedure of the experiment.

You may leave the experiment at any time that you wish. You can withdraw your volunteered participation without consequences.

Location and Devices: The location of the study for UBC student participants will be the SALT Lab, room 3070 in the Fred Kaiser building of the ECE department of UBC. One of the computers in the lab is dedicated to this study.

Potential Risks: The risks related to this project are minimal and are no more than what you would encounter in your regular day.

Potential Benefits: The results of this study will help assess and improve our under-development tool for modelling web applications and their test cases. Web developers can use the tool, Camellia, to help them localize faults detected by existing client-side test cases.

Results and Future Contact: You can obtain the results of this study from Sheldon Sequeira via email after the results have been processed, analyzed, and documented. After completion of this experiment, there will be no contact with you from the investigator team. However, feel free to contact a co-investigator should you have any comments, questions, or concerns.
Complaints: Complaints or concerns about your treatment or rights as a research participant should be directed to:

The Research Subject Information Line in the UBC Office of Research Services

Confidentiality: The data collected during this experiment is completely anonymous, and total confidentiality is promised. Data will be recorded in the form of text documents and will not include any personal information about the participant. The gathered data will be put in a password-protected folder on the investigator’s flash memory stick, which will be locked in a secure cabinet for 5 years after the date of the study.

Consent: Your participation in this study is entirely voluntary and you may refuse to participate or withdraw at any time during the experiment. You consent to participate in the study on Effects of Using Camellia on Fault Localization for Web Applications. You have understood the nature of this project and wish to participate. Your signature below indicates that you have received a copy of this consent form for your own records.

———————————————————————————————
Participant’s Name

———————————————————————————————
Participant’s Signature

———————————————————————————————
Date

Appendix C

Pre-Questionnaire

Name:
Degree Pursued/Job Title:
Sex:
Age:

1. From 1 to 5, what best describes your familiarity with web development?
(1: no/very little experience, 5: professional)
1 2 3 4 5

2. Please name the tools that you usually use for localizing faults in web application code. From 1 to 5, what best describes your familiarity with each tool?
(1: no/very little experience, 5: professional)
Tool Familiarity
1) ........................................... 1 2 3 4 5
2) ........................................... 1 2 3 4 5
3) ........................................... 1 2 3 4 5

3. From 1 to 5, what best describes your familiarity with the Selenium testing framework?
(1: no/very little experience, 5: professional)
1 2 3 4 5

4. What is the browser that you normally use for web development?

5.
How many years of experience have you had with web development?

6. How many years of experience have you had with software development?

Appendix D

Post-Questionnaire

ID Number:

1) Please name the tool(s) you used in the experiment.

2) Did you find the above tool(s) helpful for localizing faults in the web application? What capabilities of the tool(s) did you find most useful?

3) What were the limitations of the tool(s) used in the experiment? What improvements would you suggest?

