Effective Test Generation and Adequacy Assessment for JavaScript-based Web Applications

by Shabnam Mirshokraie

BSc. Computer Engineering, Ferdowsi University of Mashhad, Iran, 2006
MSc. Computing Science, Simon Fraser University, Canada, 2010

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor of Philosophy in THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES (Electrical and Computer Engineering)

The University of British Columbia (Vancouver)

October 2015

© Shabnam Mirshokraie, 2015

Abstract

Today's web applications rely heavily on JavaScript and client-side run-time manipulation of the DOM (Document Object Model) tree. One way to provide assurance about the correctness of such highly evolving and dynamic applications is through testing. However, JavaScript is loosely typed, dynamic, and notoriously challenging to analyze and test.

The work presented in this dissertation has focused on advancing the state of the art in testing JavaScript-based web applications by proposing a new set of techniques and tools. We proposed (1) a new automated technique for JavaScript regression testing, which is based on inferring invariant assertions, (2) the first JavaScript mutation testing tool, capable of guiding the mutation generation towards behaviour-affecting mutants in error-prone portions of the code, (3) an automatic technique to generate test cases for JavaScript functions and events, in which mutation analysis is used to generate test oracles capable of detecting regression faults at the JavaScript code and DOM levels, and (4) a technique that utilizes existing DOM-dependent assertions as well as useful execution information inferred from a DOM-based test suite to automatically generate assertions for unit-level testing of JavaScript functions.

To measure the effectiveness of the proposed approaches, we evaluated each method presented in this thesis by conducting various empirical studies and comparisons with existing testing techniques. The evaluation results point to the effectiveness of the proposed test generation and test assessment techniques in terms of accuracy and fault detection capability.

Preface

During my PhD studies, I have conducted the research described in this dissertation in collaboration with my supervisors, Ali Mesbah and Karthik Pattabiraman, at the University of British Columbia (UBC). The research projects included in this dissertation are either published or currently under review. I was the main contributor for the research projects presented in each chapter, including the initial idea, developing, and evaluating the system. I had the collaboration of Ali Mesbah and Karthik Pattabiraman in discussing the projects and ideas, as well as in making edits to the text.

The following list presents publications for each chapter.

• Chapter 2:
  – "JSART: JavaScript Assertion-based Regression Testing" [78], S. Mirshokraie and A. Mesbah, International Conference on Web Engineering (ICWE), 2012, 238-252.

• Chapter 3:
  – "Efficient JavaScript Mutation Testing" [79], S. Mirshokraie, A. Mesbah and K. Pattabiraman, International Conference on Software Testing, Verification, and Validation (ICST), 2013, 74-83 (Best Paper Runner-up award).
  – "Guided Mutation Testing for JavaScript Web Applications" [82], S. Mirshokraie, A. Mesbah and K. Pattabiraman, IEEE Transactions on Software Engineering (TSE), 2015, 429-444.
• Chapter 4:
  – "JSEFT: Automated JavaScript Unit Test Generation" [81], S. Mirshokraie, A. Mesbah and K. Pattabiraman, International Conference on Software Testing, Verification, and Validation (ICST), 2015, 1-10 (Nominated for the Best Paper award).
  – "PYTHIA: Generating Test Cases with Oracles for JavaScript Applications" [80], S. Mirshokraie, A. Mesbah and K. Pattabiraman, Automated Software Engineering (ASE), 2013, New Ideas Track, 610-615.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgments
1 Introduction
  1.1 Test Automation
    1.1.1 Web Testing
    1.1.2 Current Test and Oracle Generation Techniques
    1.1.3 Test Automation Challenges
  1.2 Adequacy Assessment
    1.2.1 Mutation Testing Challenges
  1.3 Research Questions
  1.4 Contributions
2 JSART: JavaScript Assertion-based Regression Testing
  2.1 Introduction
  2.2 Motivation and Challenges
  2.3 Our Approach
    2.3.1 JavaScript Tracing
    2.3.2 Invariant Generation
    2.3.3 Filtering Unstable Invariant Assertions
    2.3.4 Regression Testing through Assertions
  2.4 Tool Implementation
  2.5 Empirical Evaluation
    2.5.1 Experimental Objects
    2.5.2 Experimental Setup
    2.5.3 Results
  2.6 Discussion
  2.7 Related Work
  2.8 Conclusions
3 Guided Mutation Testing for JavaScript Web Applications
  3.1 Introduction
  3.2 Running Example and Motivation
  3.3 Overview of Approach
  3.4 Ranking Functions
    3.4.1 Ranking Functions for Variable Mutation
    3.4.2 Ranking Functions for Branch Mutation
  3.5 Ranking Variables
    3.5.1 Variables Involved in Dynamic Invariants
    3.5.2 Variables with High Usage Frequency
  3.6 Mutation Operators
    3.6.1 Generic Mutation Operators
    3.6.2 JavaScript-Specific Mutation Operators
  3.7 Tool Implementation: MUTANDIS
  3.8 Empirical Evaluation
    3.8.1 Experimental Objects
    3.8.2 Experimental Setup
    3.8.3 Results
  3.9 Discussion
    3.9.1 Stubborn Mutants
    3.9.2 Type and Location of Operators
    3.9.3 Characteristics of JavaScript Functions
    3.9.4 Threats to Validity
  3.10 Related Work
  3.11 Conclusions
4 JSEFT: Automated JavaScript Unit Test Generation
  4.1 Introduction
  4.2 Challenges and Motivation
  4.3 Approach
    4.3.1 Maximizing Function Coverage
    4.3.2 Generating Test Cases
    4.3.3 Generating Test Oracles
    4.3.4 Tool Implementation
  4.4 Empirical Evaluation
    4.4.1 Objects
    4.4.2 Setup
    4.4.3 Results
    4.4.4 Threats to Validity
  4.5 Related Work
  4.6 Conclusions
5 Atrina: Inferring Unit Oracles from GUI Test Cases
  5.1 Introduction
  5.2 Motivation
  5.3 Approach
    5.3.1 Extracting DOM-Related Characteristics
    5.3.2 Relating DOM Changes to the Code
    5.3.3 Generating Unit-Level Assertions
    5.3.4 Tool Implementation: Atrina
  5.4 Empirical Evaluation
    5.4.1 Objects
    5.4.2 Setup
    5.4.3 Results
    5.4.4 Discussion
    5.4.5 Threats to Validity
  5.5 Related Work
  5.6 Conclusions
6 Conclusions
  6.1 Revisiting Research Questions
    6.1.1 Research Question 1
    6.1.2 Research Question 2
  6.2 Concluding Remarks
Bibliography

List of Tables

Table 2.1 Characteristics of the experimental objects.
Table 2.2 Properties of the invariant assertions generated by JSART.
Table 2.3 Precision and Recall for JSART fault detection.
Table 2.4 Manual effort imposed by our approach for deriving stable invariant assertions.
Table 3.1 Computed FunctionRank and PageRank for the running example.
Table 3.2 Ranking functions for branch mutation (running example).
Table 3.3 Generic mutation operators for variables and function parameters.
Table 3.4 Generic mutation operators for branch statements.
Table 3.5 DOM, jQuery, and XmlHttpRequest (XHR) operators.
Table 3.6 Characteristics of the experimental objects.
Table 3.7 Bug severity description.
Table 3.8 Mutants generated by MUTANDIS.
Table 3.9 Mutation score computed for SimpleCart, JQUERY, and WymEditor.
Table 4.1 Characteristics of the experimental objects.
Table 4.2 Results showing the effects of our function coverage maximization, function state abstraction, and mutation-based oracle generation algorithms.
Table 4.3 Fault detection.
Table 5.1 Characteristics of the experimental objects.
Table 5.2 Accuracy achieved by Atrina.

List of Figures

Figure 1.1 SELENIUM test case.
Figure 2.1 Motivating JavaScript example.
Figure 2.2 Overview of the JavaScript tracing and invariant generation steps (web application version n).
Figure 2.3 Overview of the filtering step to remove unstable invariant assertions, for web application version n.
Figure 2.4 Overview of the JavaScript regression testing step through invariant assertions, for web application version n+1.
Figure 2.5 Invariant assertion code for JavaScript function parameters, local variables and DOM modifications.
Figure 2.6 Performance plot of JSART.
Figure 3.1 JavaScript code of the running example.
Figure 3.2 Overview of our mutation testing approach.
Figure 3.3 Call graph of the running example.
Figure 3.4 Equivalent Mutants (%) generated by MUTANDIS, random, PageRank, and random variable selection.
Figure 3.5 Bug Severity Rank (Avg) achieved by MUTANDIS, random, PageRank, and random variable selection.
Figure 4.1 JavaScript code of the running example.
Figure 4.2 Overview of our test generation approach.
Figure 4.3 Generated SELENIUM test case.
Figure 4.4 Generated QUNIT test case.
Figure 4.5 Statement coverage achieved.
Figure 5.1 Running example (a) JavaScript code, and (b) DOM-based test case. The line from (b) to (a) shows the point of contact between the DOM assertion and the code. The arrow lines in (a) show the backward as well as forward slices between JavaScript statements.
Figure 5.2 Overview of our assertion generation approach.
Figure 5.3 Finding (1) intra DOM assertion dependency within the test case (b), (2) inter DOM assertion dependency between (b) DOM-based assertion and (a) the JavaScript code, and (3) the initial point of contact between (b) DOM-based assertion and (a) the JavaScript code.
Figure 5.4 Intra (data and control) code dependency through backward slicing.
Figure 5.5 Generated QUNIT test case and assertions.
Figure 5.6 Intra code dependency through forward slicing.
Figure 5.7 Relating candidate DOM element to JavaScript code.
Figure 5.8 Fault detection rate using different types of generated assertions.
Figure 5.9 Fault finding capability.
Figure 5.10 Time overhead for each approach.
Acknowledgments

I would like to express my deepest gratitude to my senior supervisor, Ali Mesbah, for mentoring me in the past four years with his patience and knowledge. Without his support, completion of this thesis would not have been possible for me. I would also like to thank Karthik Pattabiraman for the enriching feedback he provided during the conversations we had. The knowledge that I have gained through the insightful discussions with Karthik is invaluable.

I am grateful to all the members of the Software Analysis and Testing Lab for providing me with a stimulating and fun environment. A special thanks to my amazing, wonderful friend Maryam for helping me get through the difficult times, and for all the emotional and caring support she provided. Last but certainly not least, I owe my deepest gratitude to my parents for their love and endless support. I deeply appreciate the never-ending patience I have received from them.

Chapter 1

Introduction

In this chapter, we provide an introduction to modern web application testing, followed by some of the current techniques used for automating the testing process.

1.1 Test Automation

Software testing is an integral part of software engineering. Testing helps to improve the quality and dependability of applications. The importance of web application testing arises from the ever increasing reliance on these systems in social and organizational settings. Companies use web applications to transmit, collect, and process customer data. Web applications are also used in webmail, content management systems (CMS), social media tools, and many other systems.
Such applications have become more complex over the last decade, using different technologies and programming languages, and are often implemented by different developers. As a result, writing high-quality test cases that can assure the correct behaviour of a web application has become more complicated, time consuming, and effort intensive for developers [20]. Automatic test case generation can significantly reduce the time and manual effort, while increasing the reliability of web applications.

Determining the desired behaviour of the application under test for a given input is called the oracle problem. Manual testing is expensive and time consuming, mainly because of the manual effort that must be spent in identifying the proper oracle. Therefore, automating oracle generation is an important part of the testing process. Our goal is to improve the dependability and reliability of modern web applications by automating the test and oracle generation process with the different techniques and tools proposed in this thesis.

1.1.1 Web Testing

One of the common engines of today's modern web applications is JavaScript. According to a recent survey conducted by Stack Overflow, JavaScript is the most popular programming language [17]. It is also the most-used programming language in GitHub repositories [16]. Developers employ JavaScript to add functionality, dynamically change the GUI state, and communicate with web servers. Given the increasing reliance of web applications on this language, it is important to check its correct behaviour.

Automatically generating effective test suites for JavaScript applications is particularly challenging compared with traditional languages. The event-driven and highly dynamic nature of JavaScript makes JavaScript applications error-prone [86] and difficult to test. Moreover, JavaScript is used to seamlessly change the Document Object Model (DOM) at run-time. The DOM is a language-independent convention for representing and interacting with objects in HTML documents. DOM nodes are organized in a tree structure, called the DOM tree. JavaScript can dynamically update the content, structure, and style of a document by accessing and manipulating the DOM tree. The dynamic interaction between JavaScript and the DOM, as two separate components, can become quite complex [86].

The huge event space of such applications hinders model inference techniques from covering the whole state space in a limited amount of time. Moreover, inferring test assertions with high fault finding capability requires a precise analysis of the interaction between the JavaScript code and the DOM. Providing an accurate mapping between the two components becomes difficult as the JavaScript code and the DOM increase in size. It is also challenging to identify a proper set of test oracles from the large number of oracles that can potentially be selected.

To test JavaScript-based web applications, developers often write test cases using frameworks such as SELENIUM [12] to examine the correct interaction of the user with the web application (GUI testing) and QUNIT [11] to test the proper functionality of individual units (unit testing).

@Test
public void testCase1(){
  WebElement divElem = driver.findElement(By.id("divElem"));
  divElem.click();
  int height = driver.findElement(By.id("endCell")).getSize().height;
  WebElement startCell = driver.findElement(By.id("startCell"));
  startCell.click();
  driver.findElement(By.id("startCell"));
  ...
}

Figure 1.1: SELENIUM test case.
Although such frameworks help to automate test execution, the test cases need to be written manually, which can be tedious and inefficient. Using SELENIUM to write DOM-based tests and assertions requires little knowledge about the internal operations performed in the client-side code; the tester needs only basic knowledge of common event sequences to cover the important DOM elements to assert on. This makes it easier for the tester to write DOM-based test suites. However, DOM-based assertions can potentially miss some portion of the code, while more fine-grained unit-level assertions might be capable of detecting such faults. Furthermore, since DOM-based tests are agnostic of the JavaScript code, finding the root cause of an error during DOM-based testing is more expensive than during unit testing. Figure 1.1 shows a sample DOM-based test case. The test case contains a sequence of clicks on different DOM elements, without any observable communication with the executed JavaScript code. On the other hand, writing unit test assertions for web applications that have rich interaction with the DOM through their JavaScript code is more tedious. To generate unit-level assertions, a technique needs to precisely interpret the full range of interaction between the code-level operations of a unit and the DOM-level operations of the system; otherwise it may not be able to assert the correctness of a particular behaviour when the unit is used as part of a system. These inherent characteristics of unit and DOM-based tests indicate that they are complementary and that there is a trade-off in individually using each to detect faults.
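To illustrate the contrast, the snippet below sketches what a unit-level test might look like in a QUNIT 1.x style; the function computeTotal and the expected value are hypothetical and are not taken from the application exercised in Figure 1.1.

test("computeTotal adds item prices", function() {
  // Invoke the JavaScript function under test directly,
  // without going through the DOM or an event sequence.
  var total = computeTotal([5, 10, 15]);
  equal(total, 30, "sum of item prices");
});

Unlike the SELENIUM test case of Figure 1.1, such a test pinpoints the function under test directly, but writing it by hand requires knowledge of the internal JavaScript code.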
1.1.2 Current Test and Oracle Generation Techniques

Different test and oracle generation techniques have been proposed to overcome the aforementioned problems.

Web Test Generation. Marchetto and Tonella [71] propose a search-based algorithm for generating event-based sequences to test Ajax applications. Mesbah et al. [76] use generic and application-specific invariants as a form of automated soft oracles for testing AJAX applications. Sen et al. [105] propose a record and replay framework called Jalangi. It incorporates selective record-replay as well as shadow values and shadow execution to enable writing of heavy-weight dynamic analyses. The framework is able to track generic faults such as null and undefined values as well as type inconsistencies in JavaScript. Jensen et al. [63] propose a technique to test the correctness of communication patterns between client and server in AJAX applications by incorporating server interface descriptions. They construct server interface descriptions through an inference technique that can learn communication patterns from sample data. Saxena et al. [102] combine random test generation with the use of symbolic execution for systematically exploring a JavaScript application's event space as well as its value space, for security testing. Artzi et al. propose Artemis [22], which supports automated testing of JavaScript applications. Artemis considers the event-driven execution model of a JavaScript application for feedback-directed testing.

Oracle Generation. There has been limited work on oracle generation for testing. Fraser et al. [48] propose µTEST, which employs a mutant-based oracle generation technique. It automatically generates unit tests for Java object-oriented classes by using a genetic algorithm to target mutations with high impact on the application's behaviour. They further identify [47] relevant pre-conditions on the test inputs and post-conditions on the outputs to ease human comprehension. Staats et al. [107] address the problem of selecting oracle data, which is formed as a subset of internal state variables as well as outputs for which the expected values are determined. Mutation testing is applied to produce oracles and rank the inferred oracles in terms of their fault finding capability. They merely focus on supporting the creation of test oracles by the programmer, rather than fully automating the process of test case generation. Loyola et al. [69] propose Dodona, which ranks program variables based on their dependencies during the program execution. Using this ranking, they suggest a set of variables to be monitored by the tester as assertion-based oracles. Dodona, too, is a system that supports, rather than automates, test oracle generation.

1.1.3 Test Automation Challenges

Although researchers have recently developed automated test generation techniques for JavaScript-based applications [22, 71, 72, 76, 102], current techniques suffer from the following main shortcomings:

1. They target the generation of event sequences, which operate at the event level or DOM level to cover the state space of the application. These techniques fail to capture faults that do not propagate to an observable DOM state. As such, they potentially miss this portion of code-level JavaScript faults. In order to capture such faults, effective test generation techniques need to target the code at the JavaScript unit level, in addition to the event level.

2. They either ignore the oracle problem altogether or simplify it through generic soft oracles, such as W3C HTML validation [22, 76], or JavaScript runtime exceptions [22]. These types of oracles are not able to detect application-specific errors. A generated test case without assertions is not useful, since code coverage alone is not the goal of software testing. For such generated test cases, the tester still needs to manually write many assertions, which is time and effort intensive. On the other hand, soft oracles target generic fault types and are limited in their fault finding capabilities. However, to be practically useful, unit testing requires strong oracles to determine whether the application under test executes correctly.

3. They merely focus on supporting test oracle generation by the programmer. Thus, these approaches do not fully automate the process of test case creation [69, 107].

To address the above-mentioned shortcomings, we propose a set of fully automated test case and assertion generation techniques for JavaScript applications. Our techniques can capture application-specific DOM-related as well as JavaScript code-related faults.

1.2 Adequacy Assessment

While automated testing can help the tester to assure the application's dependability and detect faults in the application code, it does not reveal the trustworthiness of the written tests. In order to understand how well the functionality and the data are being tested, we need to assess the quality of the tests. A large body of research has assessed the quality of test suites from two different perspectives: (1) code coverage, and (2) mutation analysis. While code coverage tells how much of the source code is exercised by the test suite, it does not provide sufficient insight into the actual quality of the tests. Mutation testing has been proposed as a fault-based testing technique to assess and improve the quality of a test suite.

The main idea of mutation testing is to create mutants (i.e., modified versions of the program) and check if the test suite is effective at detecting them. The technique first generates a set of mutants by applying a set of well-defined mutation operators on the original version of the system under test. These mutation operators typically represent subtle mistakes, such as typos, commonly made by programmers. A test suite's adequacy is then measured by its ability to detect (or "kill") the mutants, which is known as the adequacy score (or mutation score).
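As a small, self-contained illustration (the function area is made up for this purpose and is not drawn from the cited studies), consider one mutant produced by an arithmetic-operator replacement:

function area(w, h) {
  return w * h;
}

// Mutant: the '*' operator is replaced with '+'.
function areaMutant(w, h) {
  return w + h;
}

// A test asserting that area(2, 3) === 6 kills this mutant, because the
// mutant returns 5. The mutation score of a test suite is commonly computed
// as: killed mutants / (generated mutants - equivalent mutants).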
1.2.1 Mutation Testing Challenges

Despite being an effective test adequacy assessment method [21, 66], mutation testing suffers from two main issues. First, there is a high computational cost in executing the test suite against a potentially large set of generated mutants. Second, there is a significant amount of effort involved in distinguishing equivalent mutants, which are syntactically different but semantically identical to the original program [32]. Equivalent mutants have no observable effect on the application's behaviour and, as a result, cannot be killed by test cases. Empirical studies indicate that about 45% of all undetected mutants are equivalent [70, 103]. According to a recent study [70], it takes on average 15 minutes to manually assess one single first-order mutant. While detecting equivalent mutants consumes a considerable amount of time, there is still no fully automated technique that is capable of detecting all equivalent mutants [70].

Current Mutation Testing Approaches. A large body of research has been conducted to turn mutation testing into a practical approach. To reduce the computational cost of mutation testing, researchers have proposed three main approaches to generate a smaller subset of all possible mutants: (1) mutant clustering [64], an approach that chooses a subset of mutants using clustering algorithms; (2) selective mutation [24, 84, 116], which is based on a careful selection of more effective mutation operators to generate fewer mutants; and (3) higher order mutation (HOM) testing [65], which tries to find rare but valuable higher order mutants that denote subtle faults [66].

According to the taxonomy suggested by Madeyski et al. [70], there are three main categories of approaches that address the problem of equivalent mutants: (1) detecting equivalent mutants, (2) avoiding equivalent mutant generation, and (3) suggesting equivalent mutants. As far as equivalent mutant detection techniques are concerned, the most effective approach is proposed by Offutt and Pan [88, 89], which uses constraint solving and path analysis. The results of their evaluation showed that the approach is able to detect on average 45% of the equivalent mutants. However, these solutions involve a considerable amount of manual effort and are error-prone. Among equivalent mutant detection methods, program slicing has also been used [59]; the goal there is to guide a tester in detecting the locations that are affected by a mutant.

Among techniques for avoiding equivalent mutant generation, Domínguez-Jiménez et al. [40] propose an evolutionary mutation testing technique based on a genetic algorithm to cope with the high computational cost of mutation testing by reducing the number of mutants. Their system generates a strong subset of mutants, composed of surviving and difficult-to-kill mutants. Their technique, however, cannot distinguish equivalent mutants from surviving non-equivalent mutants. Bottaci [30] presents a mutation analysis technique based on the type information available at run-time to avoid generating incompetent mutants. This approach is applicable to dynamically typed programs such as JavaScript. However, the efficiency of the technique is unclear, as they do not provide any empirical evaluation of their approach.
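To make the notion of an equivalent mutant concrete, the textbook example below (not taken from the studies cited above; items and process are placeholders) mutates a loop condition without changing the program's behaviour:

// Original loop.
for (var i = 0; i < items.length; i++) {
  process(items[i]);
}

// Mutant: '<' replaced with '!='. Because i starts at 0 and only increases
// by 1, the loop exits at exactly the same point, so no test case can
// distinguish this mutant from the original program.
for (var i = 0; i != items.length; i++) {
  process(items[i]);
}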
Among the equivalent mutant suggestion techniques, Schuler et al. [104] suggest possible equivalent mutants by checking invariant violations. They generate multiple mutated versions and then classify the versions based on the number of violated invariants. The system recommends that testers focus on those mutations that violate the most invariants. In a follow-up paper [103], they extend their work to assess the role of code coverage changes in detecting equivalent mutants.

Deng et al. [39] implement a version of the statement deletion (SDL) mutation operator for Java within the muJava mutation system. The design of the SDL operator is based on a theory that performing mutation testing using only one mutation operator can result in generating effective tests. However, the authors cannot conclude that SDL-based mutation is as effective as selective mutation, which contains a sufficient set of mutation operators from all possible operators. Therefore, they only recommend that future mutation systems include SDL as a choice.

However, these solutions suffer from the following limitations:

1. They involve a considerable amount of manual effort, and thus are error-prone;

2. The mutants need to be executed against the test suite, which limits the efficiency of the technique as the number of mutants increases;

3. The system only recommends that testers focus on those mutations that are more likely to be non-equivalent. Such techniques are not fully automated and serve as a supporting system for the tester.

To tackle the above-mentioned issues, we propose a fully automated mutation generation technique that avoids generating equivalent mutants a priori by identifying behaviour-affecting portions of the code, thus achieving greater efficiency. Our approach (1) reduces the number of equivalent mutants and (2) guides testers towards designing test cases for portions of the code that are important from the point of view of the application's behaviour.

1.3 Research Questions

To improve the dependability of JavaScript web applications, we designed two high-level research questions:

RQ 1.3.A. How can we automatically generate effective test cases for JavaScript applications?

In response to the web testing challenges, we (1) designed an automated test case and oracle generator capable of detecting faults in JavaScript applications at both the unit and the DOM level, and (2) proposed an approach that exploits an existing DOM-based test suite to generate unit-level assertions.

RQ 1.3.B. How can we effectively assess the quality of existing JavaScript test cases?

To assess the quality of test cases, we proposed a new JavaScript mutation testing approach, which helps to guide the mutation generation process towards parts of the code that are error-prone or likely to influence the program's output.

1.4 Contributions

In response to the first and second research questions as outlined in Section 1.3, the following papers have been published:

• Chapter 2:
  – "JSART: JavaScript Assertion-based Regression Testing" [78], S. Mirshokraie and A. Mesbah, International Conference on Web Engineering (ICWE), 2012, 238-252.

• Chapter 3:
  – "Efficient JavaScript Mutation Testing" [79], S. Mirshokraie, A. Mesbah and K. Pattabiraman, International Conference on Software Testing, Verification, and Validation (ICST), 2013, 74-83 (Best Paper Runner-up award).
  – "Guided Mutation Testing for JavaScript Web Applications" [82], S. Mirshokraie, A. Mesbah and K. Pattabiraman, IEEE Transactions on Software Engineering (TSE), 2015, 429-444.
• Chapter 4:
  – "JSEFT: Automated JavaScript Unit Test Generation" [81], S. Mirshokraie, A. Mesbah and K. Pattabiraman, International Conference on Software Testing, Verification, and Validation (ICST), 2015, 1-10 (Nominated for the Best Paper award).
  – "PYTHIA: Generating Test Cases with Oracles for JavaScript Applications" [80], S. Mirshokraie, A. Mesbah and K. Pattabiraman, Automated Software Engineering (ASE), 2013, New Ideas Track, 610-615.
  – "Unit Test Generation for JavaScript", S. Mirshokraie, A. Mesbah and K. Pattabiraman, submitted to the Software Testing, Verification and Reliability (STVR) journal and currently under review.

• Chapter 5:
  – "Atrina: Inferring Unit Oracles from GUI Test Cases", submitted to the International Conference on Software Testing, Verification, and Validation (ICST'16) and currently under review.

I have also contributed to the following related publications:

• Automated Analysis of CSS Rules to Support Style Maintenance [74]: A. Mesbah and S. Mirshokraie, ICSE'12, 408-418;

• A Systematic Mapping Study of Web Application Testing [50]: V. Garousi, A. Mesbah, A. Betin Can and S. Mirshokraie, IST, vol. 55, no. 8, 1374-1396, 2013.

Chapter 2

JSART: JavaScript Assertion-based Regression Testing

Summary. One way to provide assurance about the correctness of highly evolving and dynamic applications is through regression testing. We propose an automated technique for JavaScript regression testing, which is based on on-the-fly JavaScript source code instrumentation and dynamic analysis to infer invariant assertions. These obtained assertions are injected back into the JavaScript code to uncover regression faults in subsequent revisions of the web application under test. Our approach is implemented in a tool called JSART. We present our case study conducted on nine open source web applications to evaluate the proposed approach. The results show that our approach is able to effectively generate stable assertions and detect JavaScript regression faults with a high degree of accuracy and minimal performance overhead. (This chapter appeared at the International Conference on Web Engineering (ICWE), 2012 [78].)

2.1 Introduction

Web applications usually evolve fast by going through rapid development cycles and are, therefore, susceptible to regressions, i.e., new faults in existing functionality after changes have been made to the system. One way of ensuring that such modifications (e.g., bug fixes, patches) have not introduced new faults in the modified system is through systematic regression testing. While regression testing of classical web applications has been difficult [109], dynamism and non-determinism pose an even greater challenge [99] for Web 2.0 applications.

1  function setDim(height, width) {
2    var h = 4*height, w = 2*width;
3    ...
4    return {h: h, w: w};
5  }
6
7  function play(){
8    $("#end").css("height", setDim($('body').width(), $('body').height()).h + 'px');
9    ...
10 }

Figure 2.1: Motivating JavaScript example.

In this work, we propose an automated technique for JavaScript regression testing, which is based on dynamic analysis to infer invariant assertions. These obtained assertions are injected back into the JavaScript code to uncover regression faults in subsequent revisions of the web application under test.
Our technique automatically (1) intercepts and instruments JavaScript on-the-fly to add tracing code, (2) navigates the web application to produce execution traces, (3) generates dynamic invariants from the trace data, and (4) transforms the invariants into stable assertions and injects them back into the web application for regression testing.

Our approach is orthogonal to the server-side technology, and it requires no manual modification of the source code. It is implemented in an open source tool called JSART (JavaScript Assertion-based Regression Testing). We have empirically evaluated the technique on nine open-source web applications. The results of our evaluation show that the approach generates stable invariant assertions, which are capable of spotting injected faults with a high rate of accuracy.

2.2 Motivation and Challenges

Figure 2.1 shows a simple JavaScript code snippet. Our motivating example consists of two functions, called setDim and play. The setDim function has two parameters, namely height and width, with a simple mathematical operation (Line 2). The function returns the local variables h and w (Line 4). setDim is called in the play function (Line 8) to set the height value of the CSS property of the DOM element with ID end. Any modification to the values of height or width would influence the returned values of setDim as well as the property of the DOM element. Typical programmatic errors include swapping the order of height and width when they are respectively assigned to the local variables h and w, or calling setDim with wrong arguments, i.e., changing the order of the function arguments.

Detecting such regression errors is a daunting task for web developers, especially in programming languages such as JavaScript, which are known to be challenging to test. One way to check for these regressions is to define invariant expressions of expected behaviour [77] over program variables and assert their correctness at runtime. This way, any modification to height, width, h, or w that violates the invariant expression will be detected. However, manually expressing such assertions for web applications with thousands of lines of JavaScript code and several DOM states is a challenging and time-consuming task. Our aim in this work is to provide a technique that automatically captures regression faults through generated JavaScript assertions.

2.3 Our Approach

Our regression testing approach is based on dynamic analysis of JavaScript code to infer invariants from a given web application. We use the thus obtained invariants as runtime assertions in the JavaScript code to automatically uncover regression errors that can be introduced after changes have been made to the web application in a subsequent revision. Our approach is largely based on two assumptions: (1) the current version of the web application, from which invariants are being generated, is bug-free, and (2) the inferred invariants capture program specifications that are unlikely to change frequently in the following revisions (we revisit these two assumptions in Section 5.4.4). Our regression testing technique is composed of the following four main steps: (1) JavaScript tracing, (2) invariant generation, (3) filtering unstable assertions, and (4) regression testing through assertions.
In the following subsections, we describe each step in detail.

[Figure 2.2: Overview of the JavaScript tracing and invariant generation steps (web application version n). The diagram shows the intercept/instrument, navigate, collect traces, and generate invariants phases between the server and the browser.]

2.3.1 JavaScript Tracing

In order to infer useful program invariants, we need to collect execution traces of the JavaScript code. The idea is to log as many program variable value changes at runtime as possible. Figure 2.2 depicts a block diagram of the tracing step. Our approach automatically generates trace data in three subsequent steps: (i) JavaScript interception and instrumentation, (ii) navigation, and (iii) trace collection. In the following, we explain each step in detail.

JavaScript Interception and Instrumentation. The approach we have chosen for logging variables is on-the-fly JavaScript source code transformation to add instrumentation code. We intercept all the JavaScript code of a given web application, both in JavaScript files and HTML pages, by setting up a proxy [26] between the server and the browser. We first parse the intercepted source code into an Abstract Syntax Tree (AST). We then traverse the AST in search of program variables as well as DOM modifications, as described below.

Tracing Program Variables. Our first interest is the range of values of JavaScript program variables. We probe function entry and function exit points by identifying function definitions in the AST and injecting statements at the start, at the end, and before every return statement. We instrument the code to monitor value changes of global variables, function arguments, and local variables. Per program point, we record the script name, function name, and line number, used for debugging purposes. Going back to our running example (Figure 2.1), our technique adds instrumentation code to trace width, height, h, and w. For each variable, we collect information on its name, runtime type, and actual values. The runtime type is stored because JavaScript is a loosely typed language, i.e., the types of variables cannot be determined syntactically, so we log the variable types at runtime.

Tracing DOM Modifications. In modern web applications, JavaScript code frequently interacts with the DOM to update the client-side user interface state. Our recent study [87] of four bug-tracking systems indicated that DOM-related errors form 80% of all reported JavaScript errors. Therefore, we include in our execution trace how DOM elements and their attributes are modified by JavaScript at runtime. For instance, by tracing how the CSS property of the 'end' DOM element in Figure 2.1 is changed during various execution runs, we can infer the range of values for the height attribute.

Based on our observations, JavaScript DOM modifications usually follow a certain pattern. Once the pattern is reverse engineered, we can add proper instrumentation code around the pattern to trace the changes. In the patterns that we observed, first a JavaScript API is used to find the desired DOM element. Next, a function is called on the returned object, responsible for the actual modification of the DOM tree. After recognizing a pattern in the parsed AST, we add instrumentation code that records the value of the DOM attribute before and after the actual modification. Hence, we are able to trace DOM modifications that happen programmatically through JavaScript.
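To give a flavour of this transformation, the sketch below shows roughly how the running example of Figure 2.1 might look after instrumentation; the trace and traceDom helpers and their arguments are illustrative placeholders, not JSART's actual API.

function setDim(height, width) {
  trace("example.js", "setDim", "ENTER", {height: height, width: width});
  var h = 4*height, w = 2*width;
  trace("example.js", "setDim", "EXIT", {h: h, w: w});
  return {h: h, w: w};
}

function play(){
  // Record the DOM attribute before and after the modification.
  var before = $("#end").css("height");
  $("#end").css("height", setDim($('body').width(), $('body').height()).h + 'px');
  traceDom("#end", "height", before, $("#end").css("height"));
}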
Navigation. Once the AST is instrumented, we serialize it back to the corresponding JavaScript source code file and pass it to the browser. Next, we navigate the application in the browser to produce an execution trace. The application can be navigated in different ways, including (1) manual clicking, (2) test case execution, or (3) using a web crawler. To automate the approach, our technique is based on automated dynamic crawling [75]. The execution needs to run as much of the JavaScript code as possible and execute it in various ways. This can be achieved by visiting as many DOM state changes as possible, as well as by providing different values for function arguments.

Trace Collection. As the web application is navigated, the instrumented JavaScript code produces trace data, which needs to be collected for further analysis. Keeping the trace data in the browser's memory during program execution can make the browser slow when a large amount of trace data is being produced. On the other hand, sending data items to the proxy as soon as each item is generated can put a heavy load on the proxy, due to the frequency of HTTP requests. In order to tackle these challenges, we buffer a certain amount of trace data in memory in an array, post the data as an HTTP request to a proxy server when the buffer's size reaches a predefined threshold, and immediately clear the buffer in the browser's memory afterwards. Since the data arrives at the server in a synchronized manner, we concatenate the tracing data into a single trace file on the server side, which is then seeded into the next step (see Figure 2.2).
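A minimal sketch of such a buffering scheme, under the assumption that every instrumented program point calls a record function, is shown below; the buffer threshold and the /trace endpoint are hypothetical, and the actual tool posts to the intercepting proxy described above.

var traceBuffer = [];
var BUFFER_THRESHOLD = 500;  // hypothetical buffer size

function record(entry) {
  traceBuffer.push(entry);
  if (traceBuffer.length >= BUFFER_THRESHOLD) {
    var payload = JSON.stringify(traceBuffer);
    traceBuffer = [];  // clear the buffer immediately
    var xhr = new XMLHttpRequest();
    // A synchronous request keeps the trace items ordered on the server side.
    xhr.open("POST", "/trace", false);
    xhr.send(payload);
  }
}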
2.3.2 Invariant Generation

The assertion generation phase involves analyzing the collected execution traces to extract invariants. A substantial amount of research has been carried out on detecting dynamic program invariants [29, 37, 43, 56]. Our approach is based on Daikon [43] to infer likely invariants. As indicated with the dotted line in Figure 2.2, we cycle through the navigation and invariant generation phases until the size of the generated invariant file remains unchanged, which is an indication that all possible invariants have been detected.

2.3.3 Filtering Unstable Invariant Assertions

The next step is to make sure that the generated invariants are truly invariants. An invariant assertion is called unstable when it is falsely reported as a violated assertion. Such assertions produce a number of false positive errors during the testing phase. To check the stability of the inferred invariants, we use them as assertions in the same version of the web application. From the user's perspective, no assertion violations should be reported, because the web application has not changed. Hence, any assertion violation reported as such is a false positive and should be eliminated. Our filtering process, shown in Figure 2.3, consists of the following four steps:

• converting the inferred invariants into checkable assertions;
• injecting the invariant assertions into the same version of the web application;
• navigating the web application;
• collecting assertion violations and removing them.

[Figure 2.3: Overview of the filtering step to remove unstable invariant assertions, for web application version n. The diagram shows the convert-to-assertions, inject, navigate, collect violations, and remove-violated-assertions phases between the server and the browser.]

From each of the inferred invariants, we generate an assertion in JavaScript format. We use on-the-fly transformation to inject the assertions directly into the source code of the same version of the web application. Since we have all the information about the program points and the location of the assertions, we can inject the assertions at the correct location in the JavaScript source code through the proxy, while the code is being sent to the client by the server. This way, the assertions run as part of the client-side code and gain access to the values of all program variables needed at runtime. Once the assertions are injected, we execute the web application in the browser and log the output. Next, we collect and remove any violated assertions. The output of this step is a set of stable invariant assertions, used for automated regression testing in the next step.

2.3.4 Regression Testing through Assertions

Once a set of stable invariant assertions is derived from version n of a web application, it can be used for automatic regression testing of a subsequent version (n+1) of the web application. The regression testing phase is depicted in Figure 2.4.

[Figure 2.4: Overview of the JavaScript regression testing step through invariant assertions, for web application version n+1. The diagram shows the inject-assertions, navigate, collect failures, and report-failures phases, producing a test report.]

We inject the inferred stable assertions into the JavaScript source code of the modified web application, in a similar fashion to the filtering step in Section 2.3.3. Once the assertions are injected, the new version of the web application is ready for regression testing. Any failed assertion during the testing phase generates an entry in the test report, which is presented to the tester at the end of the testing step. The generated test report provides precise information on the failed assertion: the file name, the line number, and the function name of the assertion.

1  function setDim(height, width) {
2    assert((width < height), 'example.js:setDim:ENTER:POINT1');
3    var h = 4*height, w = 2*width;
4    ...
5    assert((width < height), 'example.js:setDim:EXIT:POINT1');
6    assert((w < h), 'example.js:setDim:EXIT:POINT1');
7    return {h: h, w: w};
8  }
9
10 function play(){
11   $("#end").css("height", setDim($('body').width(), $('body').height()).h + 'px');
12   assert(isIn($('#end').css('height'), [100, 200, 300]), 'example.js:play:POINT3');
13   ...
14 }

Figure 2.5: Invariant assertion code for JavaScript function parameters, local variables and DOM modifications. The injected assertions are on lines 2, 5-6, and 12.

Figure 2.5 shows the automatically injected invariant assertions for our running example of Figure 2.1. Note that we do not show all the assertions, as they would clutter the figure. Each assert call has the invariant as its first parameter and the corresponding debugging information as its second parameter, which includes the script name, function name, and line number. In this example, the inferred invariants yield information about the inequality relation between the function arguments, width and height, as well as the local variables, w and h.
The assertions in lines 2 and 5-6 check the corresponding inequalities at the entry and exit points of the setDim function at runtime. The example also shows the assertion that checks the height attribute of the DOM element, after the JavaScript DOM modification in the play function. The assertion that comes after the DOM manipulation (Line 12) checks the height value by calling the auxiliary isIn function. isIn checks that the value of height is in the given range, i.e., either 100, 200, or 300. Any value outside the specified range would violate the assertion.

2.4 Tool Implementation

We have implemented our JavaScript regression testing approach in a tool called JSART. JSART is written in Java and is available for download (http://salt.ece.ubc.ca/content/jsart/).

JSART extends and builds on top of our InvarScope [52] tool. For JavaScript code interception, we use an enhanced version of WebScarab's proxy [26]. This enables us to automatically analyze and modify the content of HTTP responses before they reach the browser. To instrument the intercepted code, Mozilla Rhino (http://www.mozilla.org/rhino/) is used to parse JavaScript code to an AST, and back to the source code after instrumentation. The AST generated by Rhino's parser has traversal APIs, which we use to search for program points where instrumentation code needs to be added. For the invariant generation step, we have extended Daikon [43] with support for accepting input and generating output in JavaScript syntax. The input files are created from the trace data and fed through the enhanced version of Daikon to derive dynamic invariants. The navigation step is automated by making JSART operate as a plugin on top of our dynamic AJAX crawler, CRAWLJAX [75] (http://www.crawljax.com).

2.5 Empirical Evaluation

To quantitatively assess the accuracy and efficiency of our approach, we have conducted a case study following the guidelines of Runeson and Höst [101]. In our evaluation, we address the following research questions:

RQ1 How successful is JSART in generating stable invariant assertions?
RQ2 How effective is our overall regression testing approach in terms of correctly detecting faults?
RQ3 What is the performance overhead of JSART?

The experimental data produced by JSART is available for download at the URL given above.

2.5.1 Experimental Objects

Our study includes nine web-based systems in total. Six are game applications, namely SameGame, Tunnel, TicTacToe, Symbol, ResizeMe, and GhostBusters. Two of the web applications are Jason and Sofa, which are a personal and a company homepage, respectively. We further include TuduList, which is a web-based task management application. All these applications are open source and use JavaScript on the client side.

Table 2.1 presents each application's ID, name, and resource, as well as the characteristics of its custom JavaScript code, such as the JavaScript lines of code (LOC), number of functions, number of local and global variables, and the cyclomatic complexity (CC). We use the Eclipse IDE to count the JavaScript lines of code, number of functions, and number of local as well as global variables. JSmeter (http://jsmeter.info) is used to compute the cyclomatic complexity.
We compute the cyclomatic complexity across all JavaScript functions in the application.

Table 2.1: Characteristics of the experimental objects.

App ID  Name          JS LOC  # Functions  # Local Vars  # Global Vars  CC  Resource
1       SameGame      206     9            32            5              37  crawljax.com/same-game
2       Tunnel        334     32           18            13             39  arcade.christianmontoya.com/tunnel
3       TicTacToe     239     11           22            23             83  dynamicdrive.com/dynamicindex12/tictactoe.htm
4       Symbol        204     20           28            16             32  10k.aneventapart.com/2/Uploads/652
5       ResizeMe      45      5            4             7              2   10k.aneventapart.com/2/Uploads/594
6       GhostBusters  277     27           75            4              52  10k.aneventapart.com/2/Uploads/657
7       Jason         107     8            4             8              6   jasonjulien.com
8       Sofa          102     22           2             1              5   madebysofa.com/archive
9       TuduList      2767    229          199           31             28  tudu.ess.ch/tudu

2.5.2 Experimental Setup

To run the experiment, we provide the URL of each experimental object to JSART. In order to produce representative execution traces, we navigate each application several times with different crawling settings. Crawling settings differ in the number of visited states, depth of crawling, crawling time, and clickable element types. To obtain representative data traces, each of our experimental objects is navigated three times on average. Although JSART can easily instrument the source code of imported JavaScript libraries (e.g., jQuery, Prototype, etc.), in our experiments we are merely interested in the custom code written by developers, since we believe that is where most programming errors occur.

To evaluate our approach in terms of inferring stable invariant assertions (RQ1), we count the number of stable invariant assertions generated by JSART before and after performing the filtering step. As a last check, we execute the initial version of the application using the stable assertions to see whether our filtered invariant assertions report any false positives.

Once the stable invariant assertions are obtained for each web application, we perform regression testing on modified versions of each application (RQ2). To that end, in order to mimic regression faults, we produce twenty different versions of each web application by injecting twenty faults into the original version, one at a time. We categorize our faults according to the following fault model:

1. Modifying Conditional Statements: This category is concerned with swapping consecutive conditional statements, changing the upper/lower bounds of loop statements, as well as modifying the condition itself;

2. Modifying Global/Local Variables: In this category, global/local variables are changed by modifying their values at any point of the program, as well as removing or changing their names;

3. Changing Function Parameters/Arguments: This category is concerned with changing function parameters or function call arguments by swapping, removing, and renaming parameters/arguments. Changing the sequence of consecutive function calls is also included in this category;
DOM modifications: Another type of fault introduced in our fault model is modifying DOM properties, at both the JavaScript code level and the HTML code level.

For each fault injection step, we randomly pick a JavaScript function in the application code and seed a fault according to our fault model. We seed five faults from each category.

To evaluate the effectiveness of JSART (RQ2), we measure the precision and recall as follows:

Precision is the rate of injected faults found by the tool that are correct: TP / (TP + FP)

Recall is the rate of correct injected faults that the tool finds: TP / (TP + FN)

where TP (true positives), FP (false positives), and FN (false negatives) respectively represent the number of faults that are correctly detected, falsely reported, and missed.

To evaluate the performance of JSART (RQ3), we measure the extra time needed to execute the application while assertion checks are in place.

2.5.3 Results

In this section, we discuss the results of the case study with regard to our three research questions.

Generated Invariant Assertions.

Table 2.2 presents the data generated by our tool. For each web application, the table shows the total size of collected execution traces (MB), the total number of generated JavaScript assertions, the number of assertions at entry points of the functions, the number of assertions at exit points of the functions, and the number of DOM assertions. The unstable assertions before the filtering step as well as the stable assertions after the filtering step are also presented. As shown in the table, for applications 1, 5, 7, 8, and 9, all the generated invariant assertions are stable and the filtering step does not remove any assertions. For the remaining four applications (2, 3, 4, 6), less than 5% of the total invariant assertions are seen as unstable and removed in the filtering process. Thus, for all the experimental objects, the resulting stable assertions found by the tool are more than 95% of the total assertions. Moreover, we do not observe any unstable DOM assertions. In order to assure the stability of the resulting assertions, we examine the assertions obtained from the filtering step across multiple executions of the original application. The results show that all the resulting invariant assertions are truly stable, since we do not observe any false positives.

As far as RQ1 is concerned, our findings indicate that (1) our tool is capable of automatically generating a high rate of JavaScript invariant assertions, (2) the unstable assertions are less than 5% of the total generated assertions, (3) the filtering technique is able to remove the few unstable assertions, and (4) all the remaining invariant assertions that JSART outputs are stable, i.e., they do not produce any false positives on the same version of the web application.

Table 2.3: Precision and Recall for JSART fault detection.

App ID   # FN   # FP   # TP   Precision (%)   Recall (%)
1        2      0      18     100             90
2        4      0      16     100             80
3        1      0      19     100             95
4        2      0      18     100             90
5        0      0      20     100             100
6        1      0      19     100             95
7        0      0      20     100             100
8        0      0      20     100             100
9        1      0      19     100             95

Effectiveness.

Since applications 3, 4, and 9 do not contain many DOM assertions, we were not able to inject 5 faults from the DOM modification category for them. Therefore, we randomly chose the remaining faults from the other fault model categories.

In Table 2.3, we present the accuracy of JSART in terms of its fault finding capability. The table shows the number of false negatives, false positives, and true positives, as well as the percentages for precision and recall.
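To make the computation of these two metrics concrete, consider Tunnel (App ID 2) in Table 2.3, for which 16 of the 20 seeded faults are detected (TP = 16, FN = 4) and no faults are falsely reported (FP = 0):

Precision = TP / (TP + FP) = 16 / (16 + 0) = 100%
Recall = TP / (TP + FN) = 16 / (16 + 4) = 80%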
As far as RQ2 is concerned, our results show that JSART is very accurate in detecting faults. The precision is 100%, meaning that all the injected faults reported by the tool are correct. This also implies that our filtering mechanism successfully eliminates unstable assertions, as we do not observe any false positives. The recall oscillates between 80-100%, which is caused by a low rate of missed faults (discussed in Section 5.4.4 under Limitations). Therefore, as far as RQ2 is concerned, JSART is able to successfully spot the injected faults with a high accuracy rate.

Performance.

Figure 2.6 depicts the total running time needed for executing each web application with and without the assertion code. Checking a fairly large number of assertions at runtime can be time consuming. Thus, to capture the effect of the added assertions on the execution time, we use a diagram with two vertical scales. As shown in Figure 2.6, each experimental object is associated with two types of data: the left-hand Y-axis represents the running time (seconds), whereas the right-hand Y-axis shows the number of assertions. This way we can observe how the number of assertions relates to the running time. As expected, the figure shows that increasing the number of assertions increases the running time to some degree. While the time overhead of around 20 seconds is evident for experimental object 2 (i.e., Tunnel, with 2,147 assertions), it is negligible for the rest of the experimental objects. Considering that Tunnel has 260 statements in total, the number of assertions instrumented in the code is eight times the number of statements in the original version. Therefore, it is reasonable to observe a small amount of overhead. Though assertions introduce some overhead, it is worth mentioning that we did not experience a noticeable change (i.e., freezing or slowed-down execution) while running the application in the browser.

[Figure 2.6: Performance plot of JSART. For each experimental object, the plot shows the run time with and without assertion checking (left axis, seconds) and the number of assertions per application (right axis).]

Thus, as far as RQ3 is concerned, the amount of overhead introduced by our approach is 6 seconds on average for our experimental objects, which is negligible during testing. Furthermore, based on our observations, the assertions do not negatively affect the observable behaviour of the web applications in the browser.

2.6 Discussion

Unstable Assertions.

As mentioned in Section 2.3.3, we initially observe a small number of unstable invariant assertions, which are removed by our filtering mechanism. By analyzing our trace data, we observe that such unstable assertions arise mainly because of the multiple runtime types of JavaScript variables: in JavaScript it is possible to change the type of a variable at runtime. However, Daikon treats each variable as having a single type; it selects the first observed type and ignores the subsequent types in the trace data. This results in a small number of unstable invariant assertions for JavaScript, which we remove in our filtering step. A drawback of removing these assertions is that our tool might miss a fault during the regression testing phase. However, according to our observations, such unstable assertions form only around 5% of the total generated assertions.
Thus, we are still able to achieve high accuracy as presented in theprevious section.Limitations.Our approach is not able to detect syntax errors that are present in the JavaScriptcode. Furthermore, tracing DOM manipulations using APIs other than the standardDOM API or jQuery is currently not supported by JSART. Further, a regressionfault either directly violates an invariant assertion, or it can violate closely relatedassertions, which have been affected by the fault. However, if the tool is not ableto infer any invariants in the affected scope of the error, it fails to detect the fault.This results in observing a low rate of false negatives as illustrated in Section 5.4.Revisiting the Assumptions.As we mentioned in Section 5.3, we assume that the current version of the webapplication is bug-free. This is based on the fact that in regression testing a goldstandard is always needed as a trusted version for comparing the test results against26Table 2.4: Manual effort imposed by our approach for deriving stable invariant assertions.App ID Total Time (min) Manual Effort (min)1 13 42 11.5 33 15.5 54 11 35 6.5 2.56 9 4.57 7.5 3.58 6.5 29 18 13[28] to detect regression faults. However, if the original version of the applicationdoes contain an error, the generated assertions might reflect the error as well, andas such they are not able to detect the fault. Our second assumption states thatthe program specifications are unlikely to change frequently in revisions. Here weassume that software programs evolve gradually and regression faults are mostlydue to small changes. However, if major upgrades occur in subsequent revisionssuch that the core specification of the application is affected, the inferred invariantsfrom the original version may not be valid any longer and new invariant assertionsneed to be generated.Automation Level.While the testing phase of JSART is fully automated, the navigation part requiressome manual effort. Although the crawling is performed automatically, we do needto manually setup the tool with different crawling configurations per applicationexecution. Moreover, for each application run, we manually look at the size of theinvariant output to decide whether more execution traces (and thus more crawlingsessions) are needed. We present the manual effort involved with detecting stableinvariant assertions in Table 2.4. The table shows the total time, which is the dura-tion time of deriving stable assertions including both automatic and manual parts.The reported manual effort contains the amount of time required for setting up thetool as well as the manual tasks involved with the navigation part. The results showthe average manual effort is less than 5 minutes.272.7 Related WorkAutomated testing of modern web applications is becoming an active area of re-search [22, 72, 76, 93]. Most of the existing work on JavaScript analysis is, how-ever, focused on spotting errors and security vulnerabilities through static analysis[53, 54, 118]. We classify related work into two broad categories: web applicationregression testing and program invariants.Web Application Regression Testing.Regression testing of web applications has received relatively limited attentionfrom the research community [109, 114]. Alshahwan and Harman [19] discuss analgorithm for regression testing of web applications that is based on session data[41, 106] repair. Roest et al. [99] propose a technique to cope with the dynamismin Ajax web interfaces while conducting automated regression testing. 
None ofthese works, however, target regression testing of JavaScript in particular.Program Invariants.The concept of using invariants to assert program behaviour at runtime is as old asprogramming itself [34]. A more recent development is the automatic detection ofprogram invariants through dynamic analysis. Ernst et al. have developed Daikon[43], a tool capable of inferring likely invariants from program execution traces.Other related tools for detecting invariants include Agitator [29], DIDUCE [56],and DySy [37]. Recently, Ratcliff et al. [96] have proposed a technique to reusethe trace generation of Daikon and integrate it with genetic programming to pro-duce useful invariants. Conceptually related to our work, Rodrı´guez-Carbonell andKapur [98] use inferred invariant assertions for program verification.Mesbah et al. [76] proposed a framework called ATUSA for manually speci-fying generic and application-specific invariants on the DOM-tree and JavaScriptcode. These invariants were subsequently used as test oracles to detect erroneousbehaviours in modern web applications. Pattabiraman and Zorn proposed DoDOM[93], a tool for inferring invariants from the DOM tree of web applications for re-liability testing.To the best of our knowledge, this work is the first to propose an automated28regression testing approach for JavaScript, which is based on JavaScript invariantassertion generation and runtime checking.2.8 ConclusionsIn this work, we present an automated technique for JavaScript regression testingbased on generated invariant assertions. The contributions of this work can besummarized as follows:• A method for detecting JavaScript invariants across multiple application exe-cutions through on-the-fly JavaScript instrumentation and tracing of programvariables and DOM manipulations;• A technique for automatically converting the inferred invariants into stableassertions, and injecting them back into the web application for regressiontesting;• The implementation of our proposed technique in an open source tool calledJSART;• An empirical study on nine open source JavaScript applications. The resultsof our study show that our tool is able to effectively infer stable assertionsand detect regression faults with minimal performance overhead;29Chapter 3Guided Mutation Testing for JavaScript WebApplicationsSummary6Mutation testing is an effective test adequacy assessment technique. However,there is a high computational cost in executing the test suite against a potentiallylarge pool of generated mutants. Moreover, there is much effort involved in filteringout equivalent mutants. Prior work has mainly focused on detecting equivalent mu-tants after the mutation generation phase, which is computationally expensive andhas limited efficiency. We propose an algorithm to select variables and branchesfor mutation as well as a metric, called FunctionRank, to rank functions accordingto their relative importance from the application’s behaviour point of view. Wepresent a technique that leverages static and dynamic analysis to guide the muta-tion generation process towards parts of the code that are more likely to influencethe program’s output. Further, we focus on the JavaScript language, and propose aset of mutation operators that are specific to web applications. We implement ourapproach in a tool called MUTANDIS. 
The results of our empirical evaluation showthat (1) more than 93% of generated mutants are non-equivalent, and (2) morethan 75% of the surviving non-equivalent mutants are in the top 30% of the rankedfunctions.6This chapter appeared at the IEEE Transactions on Software Engineering (TSE), 2015 [82].303.1 IntroductionMutation testing is a fault-based testing technique to assess and improve the qualityof a test suite. The technique first generates a set of mutants, i.e., modified versionsof the program, by applying a set of well-defined mutation operators on the origi-nal version of the system under test. These mutation operators typically representsubtle mistakes, such as typos, commonly made by programmers. A test suite’sadequacy is then measured by its ability to detect (or ‘kill’) the mutants, which isknown as the adequacy score (or mutation score).Despite being an effective test adequacy assessment method [21, 66], mutationtesting suffers from two main issues. First, there is a high computational cost inexecuting the test suite against a potentially large set of generated mutants. Sec-ond, there is a significant amount of effort involved in distinguishing equivalentmutants, which are syntactically different but semantically identical to the originalprogram [32]. Equivalent mutants have no observable effect on the application’sbehaviour, and as a result, cannot be killed by test cases. Empirical studies indicatethat about 45% of all undetected mutants are equivalent [70, 103]. According to arecent study [70], it takes on average 15 minutes to manually assess one single first-order mutant. While detecting equivalent mutants consumes considerable amountof time, there is still no fully automated technique that is capable of detecting allthe equivalent mutants [70].There has been significant work on reducing the cost of detecting equivalentmutants. According to the taxonomy suggested by Madeyski et al. [70], threemain categories of approaches deal with the problem of equivalent mutants: (1)detecting equivalent mutants [88, 89], (2) avoiding equivalent mutant generation[18], and (3) suggesting equivalent mutants [103]. Our proposed technique falls inthe second category (these categories are further described in Section 5.5).In this work, we propose a generic mutation testing approach that guides themutation generation process towards effective mutations that (1) affect error-pronesections of the program, (2) impact the program’s behaviour and as such are poten-tially non-equivalent. In our work, effectiveness is defined in terms of the severityof the impact of a single generated mutation on the applications observable be-haviour. Our technique leverages static as well as dynamic program data to rank,31select, and mutate potentially behaviour-affecting portions of the program code.Our mutation testing approach is generic and can be applied to any program-ming language. However, in this work, we implement our technique for JavaScript,a loosely-typed dynamic language that is known to be error-prone [36, 85] and dif-ficult to test [22, 76]. In particular, we propose a set of JavaScript specific muta-tion operators, capturing common JavaScript programmer mistakes. 
JavaScript is widely used in modern web applications, which often consist of thousands of lines of JavaScript code, and it is critical to their functionality.

To the best of our knowledge, our work is the first to provide an automated mutation testing technique capable of guiding the mutation generation towards behaviour-affecting mutants in error-prone portions of the code. In addition, we present the first JavaScript mutation testing tool in this work.

The key contributions of this work are:

• A new metric, called FunctionRank, for ranking functions in terms of their relative importance based on the application's dynamic behaviour;
• A method combining dynamic and static analysis to mutate branches that are within highly ranked functions and exhibit high structural complexity;
• A process that favours behaviour-affecting variables for mutation, to reduce the likelihood of equivalent mutants;
• A set of JavaScript-specific mutation operators, based on common mistakes made by programmers;
• An implementation of our mutation testing approach in a tool, called MUTANDIS, which is freely available from https://github.com/saltlab/mutandis/;
• An empirical evaluation to assess the efficacy of the proposed technique using eight JavaScript applications.

Our results show that, on average, 93% of the mutants generated by MUTANDIS are non-equivalent. Further, the generated mutations have a high bug severity rank, and are capable of identifying shortcomings in existing JavaScript test suites. While the aim of this work is not particularly to generate hard-to-kill mutants, our experimental results indicate that the guided approach does not adversely influence the stubbornness of the generated mutants.

7 MUTANDIS is a Latin word meaning "things needing to be changed".

3.2 Running Example and Motivation

Equivalent mutants are syntactically different but semantically equivalent to the original application. Manually analyzing the program code to detect equivalent mutants is a daunting task, especially in programming languages such as JavaScript, which are known to be challenging to use, analyze, and test. This is because of (1) the dynamic, loosely typed, and asynchronous nature of JavaScript, and (2) its complex interaction with the Document Object Model (DOM) at runtime for user interface state updates.

Figure 5.1 presents a snippet of a JavaScript-based game that we will use as a running example throughout this thesis.

 1 function startPlay(){
 2   ...
 3   for(i=0; i<$("div").get().length; i++){
 4     setup($("div").get(i).prop('className'));
 5   }
 6   endGame();
 7 }

 9 function setup(cellClass){
10   var elems = document.getElementsByClassName(cellClass);
11   if(elems.length == 0)
12     endGame();
13   for(i=0; i<elems.length; i++){
14     dimension = getDim($(elems).get(i).width(), $(elems).get(i).height());
15     $(elems).get(i).css('height', dimension+'px');
16   }
17 }

19 function getDim(width, height){
20   var w = width*2, h = height*4;
21   var v = w/h;
22   if(v > 1)
23     return (v);
24   else
25     return (1/v);
26 }

28 function endGame(){
29   ...
30   $('#startCell').css('height', ($('body').width()+$('body').height())/2+'px');
31   ...
32 }

Figure 3.1: JavaScript code of the running example.

The application contains four main functions as follows:

1. startPlay function calls setup to set the dimension of all div DOM elements;
2. setup function is responsible for setting the height value of the css property of all the DOM elements with the given class name. The actual dimension computation is performed by calling the getDim function;
3.
getDim receives two parameters width and height based on which itreturns the calculated dimension;4. Finally, endGame sets the height value of the css property of a DOMelement with id startCell, to indicate a game termination.Even in this small example, one can observe that the number of possible mu-tants to generate is quite large, i.e., they span from a changed relational operatorin either of the branching statements or a mutated variable name, to completely re-moving a conditional statement or variable initialization. However, not all possiblemutants necessarily affect the behaviour of the application. For example, changingthe “==” sign in the if statement of line 11 to “<=”, will not affect the appli-cation. This is because the number of DOM elements can never become less thanzero, and hence the injected fault does not semantically change the application’sbehaviour. Therefore, it results in an equivalent mutant.In this thesis, we propose to guide the mutation generation towards behaviour-affecting, non-equivalent mutants as described in the next section.3.3 Overview of ApproachAn overview of our mutation testing technique is depicted in Figure 3.2. Our maingoal is to narrow the scope of the mutation process to parts of the code that affect34(1)Intercept & Instrument JS code(2)Run(3)Exec. Trace Server Browser(5)Variable Usage Freq.(6)Dynamic Invariants(7)Function Call Graph(8)Cyclomatic Complexity(10)Variable Selection(11) Variable Mutation(9)Function Rank (12)Branch Mutation(4)StaticJS Code (13) JS SpecificMutation Figure 3.2: Overview of our mutation testing approach.the application’s behaviour, and/or are more likely to be error-prone and difficultto test. We describe our approach below. The numbers below in parentheses corre-spond to those in the boxes of Figure 3.2.In the first part of our approach, we (1) intercept the JavaScript code of a givenweb application, by setting up a proxy between the server and the browser, andinstrument the code, (2) execute the instrumented program by either crawling theapplication automatically, or by running the existing test suite (or a combination ofthe two), and (3) gather detailed execution traces of the application under test.We then extract the following pieces of information from the execution traces,namely (5) variable usage frequency, (6) dynamic invariants, and (7) the functions’call frequency. In addition to dynamically inferred information from the executiontraces, we also construct the function call graph of the application by incorporatingboth static and dynamic information.Using the function call graph and the dynamic call frequencies, we (9) rankthe program’s functions in terms of their relative importance from the application’sbehaviour point of view. The higher a function’s ranking, the more likely it will beselected for mutation in our approach.Further, within the highly ranked functions, our technique (10) identifies vari-ables that have a significant impact on the function’s outcome based on the usagefrequency and dynamic invariants extracted from the execution traces, and (11)selectively mutates only those variables to reduce the likelihood of equivalent mu-35tants.In addition to variables, our technique mutates branch conditions, includingloops. Functions with high cyclomatic complexity are known to be more error-prone and challenging to test [25, 83], as the tester needs to detect and exercise allthe different paths of the function. 
We therefore (4) statically analyze the JavaScriptcode of the web application, and (8) measure its cyclomatic complexity. To performbranch mutation (12), we target the highly ranked functions (selected in 9) that alsoexhibit high cyclomatic complexity.In addition to the generic mutation operators, our technique considers (13)a number of JavaScript specific mutation operators, based on an investigation ofcommon errors made by web programmers. These specific operators are appliedwithout any ranking or selection process.Our overall guided mutation testing algorithm is presented in Algorithm 1. Inthe following three sections, we describe in detail our technique for ranking func-tions (Section 3.4), ranking and selecting variables (Section 3.5), and performingthe actual mutations, including the mutation operators (Section 3.6).3.4 Ranking FunctionsIn this section, we present our function ranking approach, which is used for selec-tive variable and branch mutation.3.4.1 Ranking Functions for Variable MutationIn order to rank and select functions for variable mutation generation, we proposea new metric called FunctionRank, which is based on PageRank [31], but takesdynamic function calls into account. As such, FunctionRank measures the relativeimportance of each function at runtime. To calculate this metric, we use a functioncall graph inferred from a combination of static and dynamic analysis (line 6 inAlgorithm 1). Our insight is that the more a function is used, the higher its impactwill be on the application’s behaviour. As such, we assign functions that are highlyranked, a higher selection probability for mutation.Function Call Graph. To create a function call graph, we use dynamic as wellas static analysis. We instrument the application to record the callee functions36per call, which are encountered during program execution. However, the obtaineddynamic call graph may be incomplete due to the presence of uncovered functionsduring the program execution. Therefore, to achieve a more complete function callgraph, we further infer the static call graph through static analysis. We detect thefollowing types of function calls in our static analysis of the application’s code:1. Regular function calls e.g., foo();2. Method calls e.g., obj.foo();3. Constructors e.g., new foo();4. Function handlers e.g., e.click(foo);5. Anonymous functions called by either a variable or an object property wherethe anonymous function is saved.We enhance our dynamically inferred call graph of the executed functions bymerging the graph with the statically obtained call graph containing uncoveredfunctions. Note that constructing function call graph for the JavaScript applica-tions using static analysis is often unsound due to highly dynamic nature of theJavaScript language. In JavaScript functions can be called through dynamic prop-erty access (e.g., array[func]). They can be stored in object properties withdifferent names, and properties can be dynamically added or removed. Moreover,JavaScript functions are first class meaning that they can be passed as parameters.While static program analysis cannot reason about such dynamic function callsin JavaScript applications, relying on pure dynamic analysis can also lead to anincomplete call graph because of the unexecuted functions that are part of the un-covered code at run-time. 
Therefore, in our approach we choose to first construct our function call graph based on dynamic information obtained during the execution, and then make use of static analysis for those functions that remain uncovered during the execution.

Dynamic Call Frequencies. While the caller-callee edges in the call graph are constructed through static analysis of the application's code, the call frequency for each function is inferred dynamically from the execution traces (line 3 in Algorithm 1). The call graph also contains a mock node, called main function, which represents the entire code block in the global scope, i.e., global variables and statements that are not part of any particular function. The main node does not correspond to any function in the program. In addition, function event handlers, which are executed as a result of triggering an event, are linked to the main node in our dynamic call graph.

Algorithm 1: Guided Mutation Algorithm
input : A web application App, the maximum number of variable mutations MaxVarMut and branch mutations MaxBrnMut
output: The mutated versions of the application, Mutants

 1  App ← INSTRUMENT(App)
    begin
 2    trace ← COLLECTTRACE(App)
 3    {callFrq(fi, fj), varUsgFrq(fi), invars(fi)} ← GETREQUIREDINFO(trace)
 4    l = m = 0
 5    while l < MaxVarMut do
 6      {FR(fi)}, i = 0..n ← FUNCTIONRANK(callGraph, callFrq(fi, fj))
 7      mutF ← SELECTFUNC({FR(fi)})
 8      α ← 1 / ReadVar(mutF)                           (see Equation 3.4)
 9      candidVars(mutF) ← invars(mutF) ∪ {vi | varUsgFrq(mutF, vi) > α}
10      pr(vi ∈ candidVars(mutF)) ← 1 / |candidVars(mutF)|
11      mutVar ← SELECTVAR(candidVars(mutF), pr(vi))
12      mutant(l) ← VARIABLEMUTATION(mutF, mutVar, varMutOps)
13      l++
      end
14    varMutants ← ∪ mutant(l), l = 1..MaxVarMut
15    while m < MaxBrnMut do
16      pr(fi) ← (fcc(fi) × FR(fi)) / ∑ j=1..n (fcc(fj) × FR(fj))
17      mutF ← SELECTFUNC({pr(fi)})
18      mutBrn ← SELECTRANDOMBRN(mutF)
19      mutant(m) ← BRANCHMUTATION(mutBrn, brnMutOps)
20      m++
      end
21    brnMutants ← ∪ mutant(m), m = 1..MaxBrnMut
22    Mutants ← varMutants ∪ brnMutants
23    return Mutants
    end

The FunctionRank Metric. The original PageRank algorithm [31] assumes that for a given vertex, the probability of following all outgoing edges is identical, and hence all edges have the same weight. For FunctionRank, we instead apply edge weights proportional to the dynamic call frequencies of the functions.

Let l(fj, fi) be the weight assigned to edge (fj, fi), in which function i is called by function j. We compute l by measuring the frequency of function j calling i during the execution. We assign a frequency of 1 to edges directed to unexecuted functions. The FunctionRank metric is calculated as:

FR(fi) = ∑ j∈M(fi) FR(fj) × l(fj, fi),    (3.1)

where FR(fi) is the FunctionRank value of function i, l(fj, fi) is the frequency of calls from function j to i, and M(fi) is the set of functions that call function i.

The initial PageRank metric requires the sum of weights on the outgoing edges to be 1. Therefore, to solve equation 3.1, we need to normalize the edge weights from each function in our formula such that for each i, ∑ j=1..n l(fi, fj) = 1. To preserve the impact of call frequencies on edges when compared globally in the graph, we normalize l(fi, fj) over the sum of weights on all edges. Since the outgoing edges from function fi should sum to 1, an extra node called fakeNode is added to the graph. Note that the extra fakeNode is different from the mock main node added earlier.
fakeNode contains an incoming edge from fi, where:

l(fi, fakeNode) = 1 − ∑ j=1..n l(fi, fj)    (3.2)

Functions with no calls are also linked to the fakeNode through an outgoing edge with weight 1.

A recursive function is represented by a self-loop on the recursive node in the function call graph. The original PageRank does not allow self-loop nodes (i.e., a web page with a link to itself): a self-loop on a node infinitely increases its rank without changing the relative rank of the other nodes, and such nodes are therefore disregarded in the original PageRank formula. However, recursive functions are inherently important, as they are error-prone and difficult to debug, and they can easily propagate a fault into higher-level functions. To incorporate recursive functions in our analysis, we break the self-loop of a recursive function Recfi by replacing the function with two nodes, fi and fci, in the function call graph. We further add an edge l(fi, fci), where l is the call frequency associated with the recursive call. All functions that are called by Recfi get an incoming edge from the added node fci. This way, all the functions called by Recfi are now linked to fci (and indirectly linked to fi). After the FunctionRank metric is computed over all functions, we assign the new FunctionRank value of the recursive node as follows: FR(Recfi) = FR(fi) + FR(fci), where FR(Recfi) is the new FunctionRank value assigned to the recursive function Recfi.

[Figure 3.3: Call graph of the running example. (a) Call graph with call numbers. (b) Call graph with call frequencies, including the extra fakeNode.]

We initially assign equal FunctionRank values to all nodes in equation 3.1. The calculation of FunctionRank is performed recursively, until the values converge. Thus, the FunctionRank of a function i depends on:

1. the number of functions that call i;
2. the FunctionRank values of the functions that call i (incoming edges);
3. the number of dynamic calls to i.

Hence, a function that is called by several functions with high FunctionRanks and high call frequencies receives a high FunctionRank itself.

At the end of the process, the extra node fakeNode is removed and the FunctionRank value of all other functions is multiplied by 1 / (1 − FR(fakeNode)), where FR(fakeNode) is the calculated FunctionRank of fakeNode.

Overall, our approach assigns each function a FunctionRank value between 0 and 1. These values are used to rank and select functions for variable mutation (lines 6-7 in Algorithm 1). The higher the FunctionRank value of a given function, the more likely it is to be selected for mutation.

Figure 3.3 depicts the function call graph obtained from our running example (Figure 5.1). The labels on the edges in Figure 3.3a show the number of calls to each function in the graph. Figure 3.3b shows the modified graph with the extra node fakeNode added to compute the normalized function call frequency values. In our example, assuming that the number of div elements is 20 (line 3 in Figure 5.1), setup will be called 20 times and endGame will be called once (lines 4 and 6). Now, assume that the number of DOM elements with the class name specified as the input to function setup varies each time setup is called (line 10) such that two elements have a length of zero and the total length of the rest is 20.
Then, function endGame is called twice (in line 12) when thelength of such elements is zero, and getDim is called 20 times in total (line 14).Therefore, the call frequencies of functions setup and endGame become 0.3 and0.03 respectively when they are called by startPlay in lines 4 and 6. Similarly,the call frequencies of getDim and endGame become 0.61 and 0.06, respectively,when called by setup.Note that if the weight on an outgoing edge of a given function is simplynormalized over the sum of the weights on all the outgoing edges of that func-tion, then the call frequencies for both setup and getDim become 0.91 whenthey are called by startPlay and setup, respectively. However, as shown inFigure 3.3a the number of times that function getDim is called is twice that ofsetup. To obtain a realistic normalization, we have introduced the f akeNode, asshown in Figure 3.3b.Table 3.1 shows the computed FunctionRank values, using equation 3.1, foreach function of the running example. The values are presented as percentages.getDim achieves the highest FunctionRank because of the relatively high valuesof both the incoming edge weight (where getDim is called by setup in line 14in Figure 5.1), and the FunctionRank of its caller node, setup. These rankingvalues are used as probability values for selecting a function for mutation.To illustrate the advantage of FunctionRank, we show the same calculationusing the traditional PageRank metric, i.e., without considering dynamic edgeweights. As shown in Table 3.1, endGame obtains the highest ranking usingPageRank. However, this function has not been used extensively during the ap-plication execution, and hence has only limited impact on the behaviour of the41Table 3.1: Computed FunctionRank and PageRank for the running example.Function Name FunctionRank (%) PageRank (%)getDim 34.5 27.0setup 25.0 23.0endGame 21.3 34.6startPlay 19.2 15.4application. In contrast, when FunctionRank is used, endGame falls to the thirdplace, and is hence less likely to be chosen for mutation.3.4.2 Ranking Functions for Branch MutationTo rank functions for branch mutation, in addition to the FunctionRank, we takethe cyclomatic complexity of the functions into account (lines 16–17 in Algo-rithm 1).The cyclomatic complexity measures the number of linearly independent pathsthrough a program’s source code [73]. By using this metric, we aim to concentratethe branch mutation testing effort on the functions that are error-prone and harderto test.We measure the cyclomatic complexity frequency of each function throughstatic analysis of the code. Let f cc( fi) be the cyclomatic complexity frequencymeasured for function fi, then f cc( fi) =cc( fi)∑nj=1 cc( f j), where cc( fi) is the cyclomaticcomplexity of function fi, given that the total number of functions in the applicationis equal to n.We compute the probability of choosing a function fi for branch mutation usingthe previously measured FunctionRank (FR( fi)) as well as the cyclomatic com-plexity frequency ( f cc( fi)). 
Let p(fi) be the probability of selecting a function fi for branch mutation, then:

p(fi) = (fcc(fi) × FR(fi)) / ∑ j=1..n (fcc(fj) × FR(fj)),    (3.3)

where fcc(fi) is the cyclomatic complexity frequency measured for function fi, and n is the total number of functions.

Table 3.2 shows the cyclomatic complexity, the frequency, and the function selection probability measured for each function in our example (Figure 5.1). The probabilities are obtained using equation 3.3. As shown in the table, getDim achieves the highest selection probability as both its FunctionRank and cyclomatic complexity are high.

Table 3.2: Ranking functions for branch mutation (running example).

Function Name   cc   fcc   Selection Probability (p)
getDim          4    0.4   0.51
setup           3    0.3   0.27
startPlay       2    0.2   0.14
endGame         1    0.1   0.08

3.5 Ranking Variables

Applying mutations on arbitrarily chosen variables may have no effect on the semantics of the program and hence lead to equivalent mutants. Thus, in addition to functions, we measure the importance of variables in terms of their impact on the behaviour of the function. We target local and global variables, as well as function parameters, for mutation.

In order to avoid generating equivalent mutants, within each selected function we need to mutate variables that are more likely to change the expected behaviour of the application (lines 7-12 in Algorithm 1). We divide such variables into two categories: (1) those that are part of the program's dynamic invariants (invars(mutF) in line 9); and (2) those with high usage frequency throughout the application's execution (varUsgFrq(mutF, vi) > α in line 9).

3.5.1 Variables Involved in Dynamic Invariants

A recent study [104] showed that if a mutation violates dynamic invariants, it is very likely to be non-equivalent. This suggests that mutating variables that are involved in forming invariants affects the expected behaviour of the application with a high probability. Inspired by this finding, we infer invariants from the execution traces, as depicted in Figure 3.2. We log variable value changes during run-time, and analyze the collected traces to infer stable dynamic invariants. The details of our JavaScript invariant generation technique can be found in [78]. From each obtained invariant, we identify all the variables that are involved in the invariant and mark them as potential variables for mutation.

In our running example (Figure 5.1), an inferred invariant in getDim yields information about the inequality relation between the function parameters width and height, e.g., (width > height). Based on this invariant, we choose width and height as potential variables for mutation.

3.5.2 Variables with High Usage Frequency

In addition to dynamic invariants, we consider the impact of variables on the expected behaviour based on their dynamic usage. We define the usage frequency of a variable as the number of times that the variable's value has been read during the execution in the scope of a given function. Let u(vi) be the usage frequency of variable vi, then:

u(vi) = r(vi) / ∑ j=1..n r(vj),

where r(vi) is the number of times that the value of variable vi is read, given that the total number of variables in the scope of the function is n.

We identify the usage of a variable by identifying and measuring the frequency of a given variable being read in the following scenarios: (1) variable initialization, (2) mathematical computation, (3) condition checking in conditional statements, (4) function arguments, and (5) returned value of the function.
We assign the same level of importance to all five scenarios.

From the degree of a variable's usage frequency in the scope of a given function, we infer to what extent the behaviour of the function relies on that variable. Leveraging the collected execution traces, we compute the usage frequencies in the scope of a function. We choose variables with usage frequencies above a threshold α as potential candidates for the mutation process. We automatically compute this threshold (line 8 in Algorithm 1) for each function as:

α = 1 / ReadVariables(f(i)),    (3.4)

where ReadVariables(f(i)) is the total number of variables whose values are read at some point during the execution within function f(i).

Going back to the getDim function in our running example of Figure 5.1, the values of the function parameters width and height, as well as the local variables w and h, are read just once in lines 19 and 20, when they are involved in a number of simple computations. The result of the calculation is assigned to the local variable v, which is then checked as a condition for the if-else statement. v is returned from the function in either line 22 or 24, depending on the outcome of the if statement. In this example, variable v has the highest usage frequency since it is used both as a condition in a conditional statement and as the returned value of the getDim function.

Overall, we gather a list of potential variables for mutation, which are obtained based on the inferred dynamic invariants and their usage frequency (line 9 in Algorithm 1). Therefore, in our running example, in addition to the function parameters width and height, which are part of the invariants inferred from getDim, the local variable v is also among the potential variables for the mutation process because of its high usage frequency. Note that the local variables w and h are not in the list of candidates for variable mutation, as they have a low usage frequency and are not directly part of any dynamic invariants.

3.6 Mutation Operators

We employ generic mutation operators as well as JavaScript-specific mutation operators for performing mutations.

3.6.1 Generic Mutation Operators

Our mutant generation technique is based on a single mutation at a time. Thus, we need to choose an appropriate candidate among all the potential candidates obtained from the previous ranking steps of our approach. Our overall guided mutation process includes:

• Selecting a function as described in Section 3.4.1 and mutating a variable randomly selected from the list of potential candidates obtained from the variable ranking phase (Section 3.5);
• Selecting a function as described in Section 3.4.2 and mutating a branch statement selected randomly (lines 16-19 in Algorithm 1), as illustrated in the example below.
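As a small illustration of the first of these two steps, suppose getDim is selected through FunctionRank and its parameter width is picked from the candidate variable list. Applying the arithmetic-operator replacement from the generic operator set described next to line 20 of the running example (Figure 3.1) could then produce a mutant such as the following (a sketch of one possible mutant, not output taken from MUTANDIS):

// Original statement (line 20 of Figure 3.1):
var w = width*2, h = height*4;
// Possible variable mutant: the arithmetic operator applied to width is replaced (* becomes /):
var w = width/2, h = height*4;

Because width participates in the inferred invariant (width > height) and flows into the returned value v, such a single-operator change propagates to the height value set in setup (line 15) and is therefore unlikely to yield an equivalent mutant.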
==).Replace logical operators (‖, &&).Swap consecutive nested for/while.Replace arithmetic operators (+, −, ++, −−, +=, −=,/, ∗).Replace x++/x-- with ++x/--x (and vice versa).Remove break/continue.Change literal values in the condition.ConditionalStatementReplace relational operators (<, >, <=, >=, ==, ! =,===, ! ==).Replace logical operators (‖, &&).Remove else if or else from the if statement.Change the condition value of switch-case statement.Remove break from switch-case.Replace 0/1 with false/true (and vice versa) in thecondition.ReturnStatementRemove return statement.Replace true with false (and vice versa) in return(true/false).Table 3.3 shows the generic mutation operators we use for mutating globalvariables, local variables as well as function parameters/arguments. Table 3.4presents the operators we use for changing for loops, while loops, if andswitch-case statements, as well as return statements.463.6.2 JavaScript-Specific Mutation OperatorsWe propose the following JavaScript-specific mutation operators, based on an in-vestigation of various online resources (see below) to understand common mistakesin JavaScript programs from the programmer’s point of view. In accordance to thedefinition of mutation operator concept, which is representing typical program-ming errors, the motivation behind the presented selection of operators is to mimictypical JavaScript related programming errors. To our knowledge, ours is the firstattempt to collect and analyze these resources to formulate JavaScript mutationoperators.Adding/Removing the var keyword. Using var inside a function declares thevariable in local scope, thus preventing overwriting of global variables ([36, 61,90]). A common mistake is to forget to add var, or to add a redundant var, bothof which we consider.Removing the global search flag from replace. A common mistake is assum-ing that the string replace method affects all possible matches. The replacemethod only changes the first occurrence. To replace all occurrences, the globalmodifier needs to be set ([15, 100, 111]).Removing the integer base argument from parseInt. One of the commonerrors with parsing integers in JavaScript is to assume that parseInt returns theinteger value to base 10, however the second argument, which is the base, variesfrom 2 to 36 ([33, 111]).Changing setTimeout function. The first parameter of the setTimeoutshould be a function. Consider f in setTimeout(f, 3000) to be the func-tion that should be executed after 3000 milliseconds. The addition of parentheses“()” to the right of the function name, i.e., setTimeout(f(), 3000) invokesthe function immediately, which is likely not the intention of the programmer. Fur-thermore, in the setTimeout calls, when the function is invoked without passingthe expected parameters, the parameter is set to undefined when the function isexecuted (same changes are applicable to setInterval) ([55, 60, 90]).Replacing undefined with null. A common mistake is to check whether anobject is null, when it is not defined. To be null, the object has to be definedfirst ([36, 100, 111]). 
Otherwise, an error will result.47Table 3.5: DOM, jQuery, and XmlHttpRequest (XHR) operators.Type Mutation OperatorDOMChange the order of arguments ininsertBefore/replaceChild methods.Change the name of the id/tag used in getElementByIdand getElementByTagName methods.Change the attribute name in setAttribute,getAttribute, and removeAttribute methods.Swap innerHTML and innerText properties.JQUERYSwap {#} and {.} sign used in selectors.Remove {$} sign that returns a JQUERY object.Change the name of the property/class/element in the follow-ing methods: addClass, removeClass, removeAttr,remove, detach, attr, prop, css.XHR Modify request type (Get/Post), URL, and the value of theboolean asynch argument in the request.open method.Change the integer number against which therequest.readyState/request.status is com-pared with; {0, 1, 2, 3, 4} for readyState and {200, 404}for status.Removing this keyword. JavaScript requires the programmer to explicitly statewhich object is being accessed, even if it is the current one. Forgetting to usethis, may cause binding complications ([36, 95, 111]), and result in errors.Replacing (function()!==false) by (function()). If the default valueshould be true, use of (function()) should be avoided. If a function in somecases does not return a value, while the programmer expects a boolean outcome,then the returned value is undefined. Since undefined is coerced to false,the condition statement will not be satisfied. A similar issue arises when replacing(function() === false) with (!function()) ([100]).In addition, we propose a list of DOM specific mutation operators within theJavaScript code. Table 3.5 shows a list of DOM operators that match DOM mod-ification patterns in either pure JavaScript language or the JQUERY library. Wefurther include two mutation operators that target the XmlHttpRequest typeand state as shown in Table 3.5.We believe these specific operators are important to be applied on their own.Hence, they are applied randomly without any ranking scheme, as they are basedon errors commonly made by programmers.483.7 Tool Implementation: MUTANDISWe have implemented our JavaScript mutation testing approach in a tool calledMUTANDIS. MUTANDIS is written in Java and is publicly available for down-load.8To infer JavaScript dynamic invariants, we use our recently developed tool,JSART [78]. For JavaScript code interception, we employ a proxy between theclient and the server. This enables us to automatically analyze the content of HTTPresponses before they reach the browser. To instrument or mutate the interceptedcode, Mozilla Rhino9 is used to parse JavaScript code to an AST, and back to thesource code after the instrumentation or mutation is performed. 
The executiontrace profiler is able to collect trace data from the instrumented application code byexercising the web application under test through one of the following methods: (1)exhaustive automatic crawling using CRAWLJAX [75], (2) the execution of existingtest cases, or (3) a combination of crawling and test suite execution.3.8 Empirical EvaluationTo quantitatively assess the efficacy of our mutation testing approach, we haveconducted a case study in which we address the following research questions:RQ1 How efficient is MUTANDIS in generating non-equivalent mutants?RQ2 How effective are FunctionRank and selective variable mutation in (i) gener-ating non-equivalent mutants, and (ii) injecting non-trivial behaviour-affectingfaults?RQ3 How useful is MUTANDIS in assessing the existing test cases of a givenapplication?The experimental data produced by MUTANDIS is available for download.88 https://github.com/saltlab/mutandis/9 http://www.mozilla.org/rhino/49Table 3.6: Characteristics of the experimental objects.AppIDNameJSLOC#FunctionsCCResource1 SameGame 206 9 37 http://crawljax.com/same-game2 Tunnel 334 32 39 http://arcade.christianmontoya.com/tunnel3 GhostBusters 277 27 52 http://10k.aneventapart.com/2/Uploads/6574 Symbol 204 20 32 http://10k.aneventapart.com/2/Uploads/6525 TuduList 2767 229 28 http://tudu.ess.ch/tudu6 SimpleCart (library) 1702 23 168 http://simplecartjs.org7 JQUERY (library) 8371 45 37 https://github.com/jquery/jquery8 WymEditor 3035 188 50 https://github.com/wymeditorTable 3.7: Bug severity description.Bug Severity Description RankCritical Crashes, data loss 5Major Major loss of functionality 4Normal Some loss of functionality, regular issues 3Minor Minor loss of functionality 2Trivial Cosmetic issue 13.8.1 Experimental ObjectsOur study includes eight JavaScript-based objects in total. Four are game appli-cations, namely, SameGame, Tunnel, GhostBusters, and Symbol. One is a web-based task management application called TuduList. Two, namely SimpleCart andJQUERY, are JavaScript libraries. The last application, WymEditor, is a web-basedHTML editor. All the experimental objects are open-source applications. One ofour main inclusion criteria was for the applications to extensively use JavaScripton the client-side. Although the game applications used in our study are small sizeweb applications, they all extensively and in many different ways use JavaScript.Table 4.1 presents each application’s ID, name, and resource, as well as thestatic characteristics of the JavaScript code, such as JavaScript lines of code (LOC)excluding libraries, number of functions, and the cyclomatic complexity (CC)across all JavaScript functions in each application.503.8.2 Experimental SetupTo run the analysis, we provide the URL of each experimental object to MUTAN-DIS. Note that because SimpleCart and JQUERY are both JavaScript libraries, theycannot be executed independently. However, since they come with test cases, weuse them to answer RQ3.We evaluate the efficiency of MUTANDIS in generating non-equivalent mutants(RQ1) for the first five applications in Table 4.1. We collect execution traces byinstrumenting the custom JavaScript code of each application and executing theinstrumented code through automated dynamic crawling. We navigate each appli-cation several times with different crawling settings. Crawling settings differ in thenumber of visited states, depth of crawling, and clickable element types. We in-ject a single fault at a time in each of these five applications using MUTANDIS. 
Thenumber of injected faults for each application is 40; in total, we inject 200 faults forthe five objects. We automatically generate these mutants from the following mu-tation categories: (1) variables, (2) branch statements, and (3) JavaScript-specificoperators. We then examine each application’s behaviour to determine whether thegenerated mutants are equivalent.The determination of whether the mutant is equivalent is semi-automated forobservable changes. An observable change is a change in the behaviour of theapplication which can be observed as the application is automatically executedin the browser. Note that in web applications DOM is an observable unit of theapplication, which is shown in the browser. We execute the same sequence ofevents in the mutated version as it is used in the original version of the application.The resulting observable DOM of the mutated version in the browser is visuallycompared against the original version. If we notice any observable change duringthe execution, the mutant is marked as non-equivalent. This way we can eliminatethe burden of manual analysis of the applications’ source code for every mutants.For non-observable changes, we manually inspect the application’s source code todetermine whether the mutant is equivalent.To make sure that changes in the applications’ behaviour, from which the non-equivalency is determined, are not cosmetic changes we use the bug severity ranksused by Bugzilla, a popular bug tracking system. The description and the rank51associated with each type of bug severity is shown in Table 3.7. We choose non-equivalent mutants from our previously generated mutants (for RQ1). We thenanalyze the output of the mutated version of the application and assign a bug scoreaccording to the ranks in Table 3.7.To address RQ2, we measure the effectiveness of MUTANDIS in compari-son with random-based mutation generation. Moreover, to understand the impactof applying FunctionRank and rank-based variable mutation in generating non-equivalent mutants as well as injecting behaviour-affecting faults we compare:1. The proposed FunctionRank metric with PageRank;2. Our selective variable mutation with random variable mutation;Similar to RQ1, we use the ranks provided by Bugzilla to measure the criticality ofthe injected faults on the non-equivalent mutants.Unfortunately, no test suites are available for the first five applications. Thus,to address RQ3, we run our tool on the SimpleCart, JQUERY, and WymEditor thatcome with Qunit10 test cases. We gather the required execution traces of the Sim-pleCart library by running its test cases, as this library has not been deployed on apublicly available application. However, to collect dynamic traces of the JQUERYlibrary, we use one of our experimental objects (SameGame), which uses JQUERYas one of its JavaScript libraries. Unlike the earlier case, we include the JQUERYlibrary in the instrumentation step. We then analyze how the application uses dif-ferent functionalities of the JQUERY library using our approach. The executiontraces of the WymEditor are collected by crawling the application. We generate120 mutants for each of the three experimental objects. Mutated statements, whichare not executed by the test suite are excluded. After injecting a fault using MU-TANDIS, we run the test cases on the mutated version of each application. We de-termine the usefulness of our approach based on (1) the number of non-equivalentgenerated mutants, and (2) the number of non-equivalent surviving mutants. 
A non-equivalent surviving mutant is one that is neither killed nor equivalent, and is an indication of the incompleteness of the test cases. The presence of such mutants can help testers to improve the quality of their test suite. For mature test suites, we expect the number of non-equivalent surviving mutants to be low [62]. We further compare MUTANDIS against random mutation testing to evaluate the effect of our approach on the stubbornness of the generated mutants. Stubborn mutants are non-equivalent mutants that remain undetected by a high quality test suite [115].

10 http://docs.jquery.com/QUnit

3.8.3 Results

Generated Non-Equivalent Mutants (RQ1)

Table 3.8 presents our results for the number of non-equivalent mutants and the severity of the injected faults using MUTANDIS. For each web application, the table shows the number of mutants, the number of equivalent mutants, the number of non-equivalent mutants, the percentage of equivalent mutants, and the average bug severity as well as the percentage of the severity in terms of the maximum severity level.

Table 3.8: Mutants generated by MUTANDIS.

Name          #Mutants   #Equiv Mutants   #Non-Equiv Mutants   Equiv Mutants (%)   Bug Severity Rank (avg)   Bug Severity (%)
SameGame      40         2                38                   5.0                 3.9                       78
Tunnel        40         4                36                   10.0                3.8                       76
GhostBusters  40         3                37                   7.5                 3.2                       64
Symbol        40         3                37                   7.5                 3.9                       78
TuduList      40         2                38                   5.0                 3.8                       76
Avg.          40         2.8              37.2                 7.0                 3.7                       74.4

As shown in the table, the number of equivalent mutants varies between 2-4, which corresponds to less than 10% of the total number of mutants.

On average, the percentage of equivalent mutants generated by MUTANDIS is 7%, which points to its efficiency in generating non-equivalent mutants.

We observe that more than 70% of the equivalent mutants generated by MUTANDIS originate from the branch mutation category. The reason is that in our current approach, branch expressions are essentially ranked according to the variables used in their expressions, without considering whether mutating the expression changes the actual boolean outcome of the whole expression (e.g., if(trueVar || var){...}, where the value of trueVar is always true, and thus mutating var to !var does not affect the boolean outcome of the expression). We further notice cases in our experimental objects where the programmer writes essentially unused hard-coded branch expressions. For instance, in Tunnel, we observed a couple of return true/false statements at the exit points of functions that have high FunctionRank and cyclomatic complexity values. However, the returned value is never used by the caller function, and hence mutating the returned boolean value as part of branch mutation generates an equivalent mutant. This is the main reason that we observe 10% equivalent mutants (the highest in Table 3.8) for the Tunnel application. Moreover, we notice that certain types of mutation operators affect the number of equivalent mutants. For example, for a number of mutations we observe that replacing a >= (<=) sign with > (<) keeps the program's behaviour unchanged, since either the lower/upper bound is never reached or the programmer specifies extra bounds checking before returning the final value.
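To make this last situation concrete, consider the following sketch (hypothetical code, not taken from any of the experimental objects): if the surrounding logic guarantees that idx never actually equals limit when the guard is evaluated, weakening the relational operator cannot change the observable behaviour, and the resulting mutant is equivalent:

// Original guard; assume idx == limit can never occur at runtime.
if (idx >= limit) {
  resetBoard();
}
// Mutant: >= replaced with >. Because the two guards differ only when
// idx == limit, which never happens, the mutant behaves identically to
// the original and cannot be killed by any test.
if (idx > limit) {
  resetBoard();
}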
We observed only a few faults with trivial severity (e.g., cosmetic changes). We also noticed a few critical faults (3.8% on average), which caused the web application to terminate prematurely or unexpectedly. It is worth noting that full crashes are not that common for web applications, since web browsers typically do not stop executing the entire web application when an error occurs. The other executable parts of the application continue to run in the browser in response to user events [85]. Therefore, it is very rare for web applications to have type 5 errors, and hence the maximum severity rank is often 4.

More than 70% of the injected faults causing normal to major loss of functionality are in the top 20% ranked functions, showing the importance of FunctionRank in the fault seeding process.

Moreover, we noticed that the careful choice of a variable for mutation is as important as the function selection. For example, in the SameGame application, the updateBoard function is responsible for redrawing the game board each time a cell is clicked. Although updateBoard is ranked as an important function according to its FunctionRank, there are two variables within this function that have a high usage frequency compared to the other variables. While mutating either of these variables causes major loss of functionality, selecting the remaining ones for mutation has either no effect or only a marginal effect on the application's behaviour. Furthermore, we observed that mutating variables that are part of the invariants, as well as variables with a high usage frequency, can severely affect the application's behaviour. This indicates that both invariants and usage frequency play a prominent role in generating faults that cause major loss of functionality, thereby justifying our choice of these two metrics for variable selection (Section 3.5).

Effectiveness of FunctionRank and Selective Variable Mutation (RQ2)

The results obtained from MUTANDIS, random mutation, PageRank, and random variable mutation in terms of the percentage of equivalent mutants and bug severity rank are shown in Figure 3.4 and Figure 3.5, respectively.

Figure 3.4: Equivalent mutants (%) generated by MUTANDIS, random mutation, PageRank, and random variable selection.

Figure 3.5: Bug severity rank (avg.) achieved by MUTANDIS, random mutation, PageRank, and random variable selection.

As shown in Figure 3.4, the percentage of equivalent mutants generated by MUTANDIS is always less than or equal to the ones generated by the other three approaches. Not surprisingly, random mutation obtains the largest percentage of equivalent mutants (ranging from 7.5% to 15%).
This indicates that our selective vari-able mutation plays a more prominent role in reducing the percentage of equivalentmutants generated by MUTANDIS.On average, MUTANDIS reduces the number of equivalent mutants by 39% incomparison with random mutation generation.On average, FunctionRank and selective variable mutation reduce the num-ber of equivalent mutants by 12% and 26%, respectively when compared withPageRank and random variable mutation.We observed that for three applications (ID=1, 2, 4) the main reason behindthe reduction in the number of equivalent mutants is the use of selective variablemutation, as by replacing selective variable mutation with random mutation, thepercentage of equivalent mutants significantly increases (ranges from 33–50% in-crement) For these applications, we observed that although high rank functions areselected for mutation, modifying a non-behavioural affecting part of the selected56function’s code (i.e., a useless branch or variable) results in generating an equiv-alent mutant. Therefore, the choice of the variable or branch to mutate is veryimportant.However, for application with ID 3 (GhostBusters), FunctionRank plays aprominent role in reducing the number of equivalent mutants. Figure 3.4 showsthat for this application the percentage of equivalent mutants becomes the same asMUTANDIS, when we use random variable mutation coupled with FunctionRank.We observed that in the aforementioned application, major variables in the programhave high usage frequency. Moreover, these variables are shared among detectedinvariants, thus making the selection of a specific variable for mutation less effec-tive compared to other applications. For the last application (ID 5), we observedthat FunctionRank and selective variable mutation are both effective in terms ofgenerating non-equivalent mutants.Figure 3.5 compares the severity of the injected faults according to the ranksprovided in Table 3.7. The results show that MUTANDIS achieves the highest rankamong the other approaches. Our mutation generation technique increases the criti-cality of the injected faults by 20% in comparison with random mutation approach.We observed that by replacing FunctionRank with PageRank, the severity ofthe behaviour-affecting faults drops by 13%, which indicates that FunctionRankoutperforms PageRank in terms of its impact on the behaviour of the applicationtowards more critical failures.We further noticed that using the proposed selective variable mutation increasesthe bug severity by 9% on average. While this indicates the importance of us-ing the proposed variable mutation technique, it reveals that our rank-based func-tion selection technique plays a more prominent role in increasing the severitydegree of the injected faults compared to our variable selection strategy. For ex-ample, in application with ID 2 (Tunnel), function updateTunnel contains themain logic of the application, and it is among the top-ranked functions. SinceupdateTunnel is significantly used throughout the application’ execution asits high rank indicates, modifications to the variables of the function affects theexpected behaviour of the application, and cause the application to show more se-vere bugs. 
Our function ranking technique is able to guide the mutation process towards selecting the updateTunnel function, and thus increasing the overall bug severity degree. On the other hand, more than 90% of the local and global variables used in function updateTunnel are involved with crucial reading and writing of properties. While mutating such important variables generates non-equivalent mutants, it does not significantly improve the criticality of the injected faults among the non-equivalent mutants compared to random selection of variables. This implies that our variable selection strategy plays a more prominent role in generating non-equivalent mutants than in increasing the severity degree of the mutation.

Assessing Existing Test Cases (RQ3)

The results obtained from analyzing the mutants generated by MUTANDIS on the test cases of SimpleCart (ID 6), the JQUERY library (ID 7), and WymEditor (ID 8) are presented in Table 3.9. The columns under "MUTANDIS" and "Random" present the results obtained by using our approach and random mutation generation, respectively. The table shows the number of test cases, the branch coverage achieved by the test suite, the number of mutants, the number of equivalent mutants, the number of mutants detected by the test suite (killed mutants), the percentages of non-equivalent and equivalent mutants, the percentage of non-equivalent surviving mutants, and the mutation score.

Table 3.9: Mutation score computed for SimpleCart, JQUERY, and WymEditor.

                                          MUTANDIS                                               Random
App  #JS Test  JS Branch  #Total   #Equiv  #Killed  Non-Equiv  Equiv  Non-Equiv  Mutation   #Equiv  #Killed  Non-Equiv  Equiv  Non-Equiv  Mutation
ID   Cases     Cov. (%)   Mutants                   (%)        (%)    Surv. (%)  Score (%)                   (%)        (%)    Surv. (%)  Score (%)
6    83        41         120      2       80       95         5      32         67         8       78       81         19     30         70
7    644       73         120      3       106      79         21     9          90         6       107      54         46     6          94
8    253       71         120      6       97       74         26     15         85         9       99       57         43     11         89

To compute the percentage of equivalent mutants in the presence of the test suite, we follow the guidance suggested by [103], where $Equiv(\%) = \frac{\#Equiv}{\#TotalMutants - \#Killed} \times 100$. Similarly, the percentage of non-equivalent mutants is $NonEquiv(\%) = \frac{\#NonEquiv}{\#TotalMutants - \#Killed} \times 100$. The percentage of non-equivalent surviving mutants is $\frac{\#NonEquivSurvivingMutants}{\#TotalNonEquivMutants} \times 100$. Mutation score is used to measure the effectiveness of a test suite in terms of its ability to detect faults [112]. The mutation score is computed according to the following formula: $\left(\frac{K}{M-E}\right) \times 100$, where K is the number of killed mutants, M is the number of mutants, and E is the number of equivalent mutants.

Quality of test suites. The test suites of both JQUERY and WymEditor are frequently updated in response to issues raised by users. Both JQUERY and WymEditor have around 71% branch coverage. This points to the overall high quality of the test cases, considering how difficult it is to write unit-level test cases for JavaScript code. Note that despite the low branch coverage of SimpleCart, we gather execution traces of this application based on the available test suite. Therefore, the process of mutation generation is performed according to the part of the application executed by the test suite. We also observed that for the three applications in Table 3.9, a substantial percentage of uncovered branches are related to checks for different browser settings (i.e., running the application under IE, Firefox, etc.).

Surviving mutants. As shown in the table, less than 30% of the mutants generated by MUTANDIS are equivalent.
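To make the metric definitions above concrete, the following sketch recomputes the SimpleCart (ID 6) row of Table 3.9 from its raw counts. It is only an illustration of the formulas; the variable and function names are ours and are not part of MUTANDIS or its test infrastructure.

// Raw counts for SimpleCart (ID 6), taken from Table 3.9.
var totalMutants = 120;  // M: mutants generated
var equivalent   = 2;    // E: mutants judged equivalent
var killed       = 80;   // K: mutants detected (killed) by the QUnit test suite

var surviving         = totalMutants - killed;       // 40 mutants survive the suite
var nonEquivTotal     = totalMutants - equivalent;   // 118 non-equivalent mutants overall
var nonEquivSurviving = surviving - equivalent;      // 38 surviving mutants are non-equivalent

// Percentages among the surviving mutants, as defined in the text.
var equivPct        = equivalent / surviving * 100;              // 5%
var nonEquivPct     = nonEquivSurviving / surviving * 100;       // 95%
var nonEquivSurvPct = nonEquivSurviving / nonEquivTotal * 100;   // ~32%

// Mutation score: K / (M - E) * 100.
var mutationScore = killed / (totalMutants - equivalent) * 100;  // ~67.8%, reported as 67 in Table 3.9 (rounding)

console.log(equivPct, nonEquivPct, nonEquivSurvPct, mutationScore);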
SimpleCart achieves a mutation score of 67, which means there is much room for test case improvement in this application. For SimpleCart, we noticed that the number of non-equivalent surviving mutants in the branch mutation category is more than twice the number in the variable mutation category. This shows that the test suite was not able to adequately examine several different branches in the SimpleCart library, possibly because it has a high cyclomatic complexity (Table 4.1). On the other hand, the QUnit test suite of the JQUERY library achieves a high mutation score of over 90%, which indicates the high quality of the designed test cases. However, even in this case, 9% of the non-equivalent mutants are not detected by this test suite.

We further observed that:

More than 75% of the surviving non-equivalent mutants are in the top 30% of the ranked functions.

This again points to the importance of FunctionRank in test case adequacy assessment.

As far as RQ3 is concerned:

MUTANDIS is able to guide testers towards designing test cases for important portions of the code from the application's behaviour point of view.

Stubbornness of the generated mutants. Comparing the percentage of equivalent mutants as well as surviving non-equivalent mutants generated by MUTANDIS to those generated by random mutation in Table 3.9 reveals that while our approach decreases the percentage of equivalent mutants (by 55% on average), it does not negatively affect the stubbornness of the mutants. To better show the effectiveness of MUTANDIS in decreasing the number of equivalent mutants, we compute the odds ratio, which is a useful measure of effect size for categorical data [70]; the odds of non-equivalent mutants generated by approach M is computed as $odds_{\text{NonEquiv in } M} = \frac{\#NonEquiv_M - \#killed_M}{\#Equiv_M}$.

Regarding our results, $oddsRatio_{\text{NonEquiv}} = \frac{odds_{\text{NonEquiv in Mutandis}}}{odds_{\text{NonEquiv in Random}}} = 2.6$, which is the odds of non-equivalent mutants generated by MUTANDIS divided by the odds of non-equivalent mutants using random mutation generation. This indicates that the odds of non-equivalent mutants generated by MUTANDIS are 2.6 times higher than with the random mutation strategy. We similarly measure $oddsRatio_{\text{killed}}$ for the number of killed mutants. The $oddsRatio_{\text{killed}}$ of 0.98 indicates that, compared with random mutation generation, our approach does not sacrifice the stubbornness of the mutants. We further discuss the stubbornness of the mutants in Section 5.4.4.

3.9 Discussion

3.9.1 Stubborn Mutants

The aim of our mutation testing approach is to guide testers towards potentially error-prone parts of the code while easing the burden of handling equivalent mutants by reducing the number of such mutations. However, reducing the number of equivalent mutants might imply a decrease in the number of generated stubborn (or hard-to-kill) mutants, which are particularly useful for test adequacy assessment. Our initial results indicate that while the proposed guided approach reduces the number of equivalent mutants, it does not negatively affect the number of stubborn mutants generated. This finding is in line with a recent empirical study [115],
However, we acknowledge that our finding is based onpreliminary results and more research in this direction is needed.In the following, we discuss different types of stubborn mutants we observedin our evaluation of MUTANDIS and how they can be utilized by a guided mutationgeneration technique to increase the number of hard-to-kill mutants. Based on ourobservations, stubbornness of mutants in JavaScript applications stems from (1) thetype and ultimate location of the mutation operator, and (2) specific characteristicsof JavaScript functions. We discuss each in the next two subsections, respectively.3.9.2 Type and Location of OperatorsWe notice that the type of the mutation operator as well as the ultimate locationof the mutation affect the stubbornness of generated mutant. As far as the variableand branch mutations are concerned, the following mutations can result in stubbornmutants based on our observations:• Variable mutations that happen in the body of conditional statements withmore than one nested statement, where the conditions are involved with bothvariable as well as DOM related expressions. To satisfy such conditions, notonly the variables should hold proper values, but also the proper structure aswell as the properties of the involved DOM elements are required to be inplace. This intertwined interaction limits the input space to only a few andchallenging ones that are capable of satisfying the condition.• Replacing the prefix unary operators with postfix unary operators, e.g., ++va-riable to variable++.• Replacing the logical operators in conditional statements when the statementcontains more than one different logical operator (e.g., if(A && B ||C){...} to if(A && B && C){...}).• Swapping true/false in conditional statements when the statement con-tains more than two conditions (e.g., if(A && B && C){...} to if(A&& !B && C){...}).61• Removing a parameter from a function call where the function contains morethan three parameters.As far as JavaScript specific mutation operators are concerned, we observedthat the following two mutations result in more stubborn mutants compared withthe rest:• Adding a redundant var keyword to a global defined variable.• Changing setTimeout calls such that the function is called without pass-ing all the required parameters.Our findings with respect to the type of mutation operator indicate that some classesof the operators tend to generate more stubborn mutants. While in our current ap-proach we equally treat all classes, giving more priority to the hard-to-kill mutationoperators would enhance the guided technique to potentially produce more stub-born mutants, which is part of our future work.3.9.3 Characteristics of JavaScript FunctionsA given JavaScript function can exhibit different behaviours at runtime. This ismainly due to two features of the JavaScript language.First feature is related to the use of this in a function. The this keywordrefers to the owner of the executed function in JavaScript. Depending on where thefunction is called from at runtime, the value of this can be different. It can referto (1) a DOM element for which the executed function is currently an event handlerof, (2) the global window object, or (3) the object of which the function is a prop-erty/method of. Let’s assume function func is defined as follows: var func =function () {console.log(this);};. If func is set as the event han-dler of a DOM element elem (e.g.; elem.addEventListener(‘‘click’’,func, false);), when elem is clicked, this will become the DOM elementelem. 
However, if function func is directly invoked (e.g.; func();), this be-comes the window object. Therefore, the value of this can dynamically changewithin the same function as the program executes. Considering the highly dynamicnature of JavaScript applications, it is challenging for the tester to identify all such62usage scenarios. Therefore, the mutation that occurs in these functions remainsundetected unless the tester (1) correctly identifies all possible scopes from whichthe function can be invoked, and (2) associates each invocation with proper testoracles that are relevant to the value of this.Second feature is function variadicity, meaning that a JavaScript function canbe invoked with an arbitrary number of arguments compared to the function’s staticsignature, which is common in web applications [97]. For example, if a function iscalled without passing all the expected parameters, the remaining parameters areset to undefined, and thus the function exhibits a different behaviour. Note thatin cases where the programmer uses the same implementation of a given functionfor the purpose of different functionalities, the function achieves a high rank valueaccording to our ranking mechanism since the function is executed several timesfrom different scopes of the application. Testing the expected behaviour of all thepossible functionalities is quite challenging, since invoking a particular functional-ity is often involved with triggering only a specific sequence of events capable oftaking the application to the proper state. While the code coverage of the function isthe same among different usage scenarios, the mutated statement remains unkilledunless a proper combination of test input and oracle is used. We believe that if ourguided approach takes into account such particular usages of a function, which arebarely exposed, it can reduce the number of equivalent mutants while increasingthe number of hard-to-kill mutants, which forms part of our future work.3.9.4 Threats to ValidityAn external threat to the validity of our results is the limited number of web appli-cations we use to evaluate the usefulness of our approach in assessing existing testcases (RQ4). Unfortunately, few JavaScript applications with up-to-date test suitesare publicly available. Another external threat to validity is that we do not per-form a quantitative comparison of our technique with other mutation techniques.However, to the best of our knowledge, there is no mutation testing tool availablefor JavaScript, which limits our ability to perform such comparisons. A relativelylow number of generated mutants in our experiments is also a threat to validity.However, detecting equivalent mutants is a labour intensive task. For example it63took us more than 4 hours to distinguish the equivalent mutants for JQUERY in ourstudy. In terms of internal threat to validity, we had to manually inspect the appli-cation’s code to detect equivalent mutants. This is a time intensive task, which maybe error-prone and biased towards our judgment. However, this threat is shared byother studies that attempt to detect equivalent mutants. As for the replicability ofour study, MUTANDIS and all the experimental objects used are publicly available,making our results reproducible.3.10 Related WorkA large body of research has been conducted to turn mutation testing into a prac-tical approach. 
To reduce the computational cost of mutation testing, researchershave proposed three main approaches to generate a smaller subset of all possiblemutants: (1) mutant clustering [64], which is an approach that chooses a subset ofmutants using clustering algorithms; (2) selective mutation [24, 84, 116], which isbased on a careful selection of more effective mutation operators to generate lessmutants; and (3) higher order mutation (HOM) testing [65], which tries to find rarebut valuable higher order mutants that denote subtle faults [66].Our guided mutation testing approach is a form of selective mutation. However,in addition to selecting a small set of effective mutation operators, our approachfocuses on deciding which portions of the original code to select such that (1) theseverity of injected faults impacting the application’s behaviour increases, (2) thelikelihood of generating equivalent mutants diminishes.The problem of detecting equivalent mutants has been tackled by many re-searchers (discussed below). The main goal of all equivalent mutant detection tech-niques is to help the tester identify the equivalent mutants after they are generated.We, on the other hand, aim at reducing the probability of generating equivalentmutants in the first place.According to the taxonomy suggested by Madeyski et al. [70], there are threemain categories of approaches that address the problem of equivalent mutants: (1)detecting equivalent mutants, (2) avoiding equivalent mutant generation, and (3)suggesting equivalent mutants. As far as equivalent mutant detection techniquesare concerned, the most effective approach is proposed by Offutt and Pan [88, 89],64which uses constraint solving and path analysis. The results of their evaluationshowed that the approach is able to detect on average the 45% of the equivalentmutants. However, these solutions are involved with considerable amount of man-ual effort and are error-prone. Among equivalent detection methods, program slic-ing has also been used in equivalent mutants detection [59]. The goal there is toguide a tester in detecting the locations that are affected by a mutant. Such equiv-alent mutant detection techniques are orthogonal to our approach. If a mutationhas been statically proven to be equivalent, we do not need to measure its impacton the application’s expected behaviour and we focus only on those that cannot behandled using static techniques. Moreover, static techniques are not able to fullyaddress unpredictable and highly dynamic aspects of programming languages suchas JavaScript.Among avoiding equivalent mutant generation techniques, Adamopoulos et al.[18] present a co-evolutionary approach by designing a fitness function to detectand avoid possible equivalent mutants. Domı´nguez-Jime´nez et al. [40] proposean evolutionary mutation testing technique based on a genetic algorithm to copewith the high computational cost of mutation testing by reducing the number ofmutants. Their system generates a strong subset of mutants, which is composedof surviving and difficult to kill mutants. Their technique, however, cannot distin-guish equivalent mutants from surviving non-equivalent mutants. Langdon et al.have applied multi-object genetic programming to generate higher order mutants[68]. An important limitation of these approaches is that the generated mutantneeds to be executed against the test suite to compute its fitness function. In con-trast, our approach avoids generating equivalent mutants in the first place, therebyachieving greater efficiency. 
Bottaci [30] presents a mutation analysis techniquebased on the available type information at run-time to avoid generating incompe-tent mutants. This approach is applicable for dynamically typed programs suchas JavaScript. However, the efficiency of the technique is unclear as they do notprovide any empirical evaluation of their approach. Gligoric et al. [51] conduct thefirst study on performing selective mutation to avoid generating equivalent mutantsin concurrent code. The results show that there are important differences betweenthe concurrent mutation and sequential mutation operators. The authors show thatsequential and concurrent mutation operators are independent, and thus they pro-65pose sets of operators that can be used for mutation testing of concurrent codes.While we also make use of a small set of mutation operators, we aim to supportsequential programs.Among the equivalent mutant suggestion techniques, Schuler et al. [104] sug-gest possible equivalent mutants by checking invariant violations. They generatemultiple mutated versions and then classify the versions based on the number ofviolated invariants. The system recommends testers to focus on those mutationsthat violate the most invariants. In a follow-up paper [103], they extend their workto assess the role of code coverage changes in detecting equivalent mutants. Kin-tis et al. [67] present a technique called I-EQM to dynamically isolate first orderequivalent mutants. They further extend the coverage impact approach [103] toclassify more killable mutants. In addition to coverage impact, the classificationscheme utilizes second order mutation to assess first order mutants as killable. Togenerate mutants, they utilize Javalanche [103]. Our work is again different fromthese approaches in the sense that instead of classifying mutants, we avoid gener-ating equivalent mutants a priori by identifying behaviour-affecting portions of thecode.Deng et al. [39] implement a version of statement deletion (SDL) mutationoperator for Java within the muJava mutation system. The design of SDL opera-tor is based on a theory that performing mutation testing using only one mutationoperator can result in generating effective tests. However, the authors cannot con-clude that SDL-based mutation is as effective as selective mutation, which containsa sufficient set of mutation operators from all possible operators. Therefore, theyonly recommend for future mutation systems to include SDL as a choice, whichwe have already taken into account in this work.Ayari et al. [23] and Fraser et al. [48] apply search based techniques to auto-matically generate test cases using mutation testing for Java applications. Harmanet al. [57] propose SHOM approach which combines dynamic symbolic execu-tion and Search based software testing to generate strongly adequate test data tokill first and higher order mutants for C programs. However, all these approachesmake use of mutation testing for the purpose of test case generation, and thus togenerate mutants they rely on the available mutation testing frameworks.Zhang et al. [117] present FaMT approach which incorporates a family of tech-66niques for prioritizing and reducing tests to reduce the time required for mutationtesting. 
FaMT is designed based on regression test prioritization and reduction.Our approach is orthogonal to this work as our goal is to optimize the mutant gen-eration to produce useful mutants, which can later be executed against the test suite.Our mutation generation approach can be combined with this technique to furtherspeed up mutation testing.Bhattacharya et al. [27] propose NodeRank to identify parts of code that areprone to bugs of high severity. Similar to our work, NodeRank uses the PageRankalgorithm to assign a value to each node in a graph, indicating the relative im-portance of that node in the whole program according to the program’s static callgraph. The authors empirically show that such important portions of the code re-quire more maintenance and testing effort as the program evolves. In our approachwe propose a new metric, FunctionRank, which takes advantage of dynamic infor-mation collected at execution time for measuring the importance of a function interms of the program’s behaviour. Weighting the ranking metric with call frequen-cies as we do makes it more practical in web application testing, as the likelihoodof exercising different parts of the application can be different. Further, to the bestour knowledge, we are the first to apply such a metric to mutation testing.3.11 ConclusionsIn this work, we proposed a guided mutation testing technique that leverages dy-namic and static characteristics of the system under test to selectively mutate por-tions of the code that exhibit a high probability of (1) being error-prone, or (2)affecting the observable behaviour of the system, and thus being non-equivalent.Thus, our technique is able to minimize the number of generated mutants whileincreasing their effect on the semantics of the system. We also proposed a setof JavaScript-specific mutation operators that mimic developer mistakes in prac-tice. We implemented our approach in an open source mutation testing tool forJavaScript, called MUTANDIS. The evaluation of MUTANDIS points to the efficacyof the approach in generating behaviour-affecting mutants.67Chapter 4JSEFT: Automated JavaScript Unit TestGenerationSummary11The event-driven and highly dynamic nature of JavaScript, as well as its runtimeinteraction with the Document Object Model (DOM) make it challenging to testJavaScript-based applications. Current web test automation techniques target thegeneration of event sequences, but they ignore testing the JavaScript code at theunit level. Further they either ignore the oracle problem completely or simplify itthrough generic soft oracles such as HTML validation and runtime exceptions. Wepresent a framework to automatically generate test cases for JavaScript applicationsat two complementary levels, namely events and individual JavaScript functions.Our approach employs a combination of function coverage maximization and func-tion state abstraction algorithms to efficiently generate test cases. In addition, thesetest cases are strengthened by automatically generated mutation-based oracles. Weempirically evaluate the implementation of our approach, called JSEFT, to assessits efficacy. 
The results, on 13 JavaScript-based applications, show that the gen-erated test cases achieve a coverage of 68% and that JSEFT can detect injectedJavaScript and DOM faults with a high accuracy (100% precision, 70% recall).We also find that JSEFT outperforms an existing JavaScript test automation frame-work both in terms of coverage and detected faults.11This chapter appeared at the IEEE International Conference on Software Testing, Verificationand Validation (ICST), 2015 [81].684.1 IntroductionTo test JavaScript applications, developers often write test cases using web testingframeworks such as SELENIUM (GUI tests) and QUNIT (JavaScript unit tests).Although such frameworks help to automate test execution, the test cases still needto be written manually, which is time-consuming.Researchers have recently developed automated test generation techniques forJavaScript-based applications [22, 71, 72, 76, 102]. However, current web testgeneration techniques suffer from two main shortcomings:1. Target the generation of event sequences, that can potentially miss the por-tion of code-level JavaScript faults.2. Either ignore the oracle problem altogether or simplify it through genericsoft oracles.To address these two shortcomings, we propose an automated test case generationtechnique for JavaScript applications.Our approach, called JSEFT (JavaScript Event and Function Testing) operatesthrough a three step process. First, it dynamically explores the event-space of theapplication using a function coverage maximization method, to infer a test model.Then, it generates test cases at two complementary levels, namely, DOM event andJavaScript functions. Our technique employs a novel function state abstractionalgorithm to minimize the number of function-level states needed for test gener-ation. 
Finally, it automatically generates test oracles through a mutation-based algorithm.

This work makes the following main contributions:

• An automatic technique to generate test cases for JavaScript functions and events;
• A combination of function coverage maximization and function state abstraction algorithms to efficiently generate unit test cases;
• A mutation-based algorithm to effectively generate test oracles, capable of detecting regression JavaScript and DOM-level faults;
• The implementation of our technique in a tool called JSEFT, which is publicly available [8];
• An empirical evaluation to assess the efficacy of JSEFT using 13 JavaScript applications.

The results of our evaluation show that on average (1) the generated test suite by JSEFT achieves a 68% JavaScript code coverage, (2) compared to Artemis, a feedback-directed JavaScript testing framework [22], JSEFT achieves 53% better coverage, and (3) the test oracles generated by JSEFT are able to detect injected faults with 100% precision and 70% recall.

4.2 Challenges and Motivation

In this section, we illustrate some of the challenges associated with test generation for JavaScript applications.

1  var currentDim=20;
2  function cellClicked() {
3    var divTag = '<div id="divElem" />';
4    if($(this).attr('id') == 'cell0'){
5      $('#cell0').after(divTag);
6      $('div #divElem').click(setup);
7    }
8    else if($(this).attr('id') == 'cell1'){
9      $('#cell1').after(divTag);
10     $('div #divElem').click(function(){setDim(20)});
11   }
12 }

14 function setup() {
15   setDim(10);
16   $('#startCell').click(start);
17 }

19 function setDim(dimension) {
20   var dim=($('#endCell').width() + $('#endCell').height())/dimension;
21   currentDim += dim;
22   $('#endCell').css('height', dim+'px');
23   return dim;
24 }

26 function start() {
27   if(currentDim > 40)
28     $(this).css('height', currentDim+'px');
29   else $(this).remove();
30 }

32 $(document).ready(function() {
33   ...
34   $('#cell0').click(cellClicked);
35   $('#cell1').click(cellClicked);
36 });

Figure 4.1: JavaScript code of the running example.

Figure 4.1 presents a snippet of a JavaScript game application that we use as a running example throughout the thesis. This simple example uses the popular jQuery library [6] and contains four main JavaScript functions:

1. cellClicked is bound to the event-handlers of DOM elements with IDs cell0 and cell1 (Lines 34–35). These two DOM elements become available when the DOM is fully loaded (Line 32). Depending on the element clicked, cellClicked inserts a div element with ID divElem (Line 3) after the clicked element and makes it clickable by attaching either setup or setDim as its event-handler function (Lines 5–6, 9–10).

2. setup calls setDim (Line 15) to change the value of the global variable currentDim. It further makes an element with ID startCell clickable by setting its event-handler to start (Line 16).

3. setDim receives an input variable. It performs some computations to set the height value of the css property of a DOM element with ID endCell and the value of currentDim (Lines 20–22). It also returns the computed dimension.

4. start is called at runtime when the element with ID startCell is clicked (Line 16), and either updates the height dimension of the element on which it was called, or removes the element (Lines 27–29).

There are four main challenges in testing JavaScript applications. The first challenge is that a fault may not immediately propagate into a DOM-level observable failure.
For example, if the ‘+’ sign in Line 21 is mistakenlyreplaced by ‘-’, the affected result does not immediately propagate to the observ-able DOM state after the function exits. While this mistakenly changes the valueof a global variable, currentDim, which is later used in start (Line 27), itneither affects the returned value of the setDim function nor the css value of71element endCell. Therefore, a GUI-level event-based testing approach may nothelp to detect the fault in this case.The second challenge is related to fault localization; even if the fault propagatesto a future DOM state and a DOM-level test case detects it, finding the actuallocation of the fault is challenging for the tester as the DOM-level test case isagnostic of the JavaScript code. However, a unit test case that targets individualfunctions, e.g., setDim in this running example, helps a tester to spot the fault,and thus easily resolve it.The third challenge pertains to the event-driven dynamic nature of JavaScript,and its extensive interaction with the DOM resulting in many state permutationsand execution paths. In the initial state of the example, clicking on cell0 orcell1 takes the browser to two different states as a result of the if-else state-ment in Lines 4 and 8 of the function cellClicked. Even in this simple ex-ample, expanding either of the resulting states has different consequences due todifferent functions that can be potentially triggered. Executing either setup orsetDim in Lines 6 and 10 results in different execution paths, DOM states, andcode coverage. It is this dynamic interaction of the JavaScript code with the DOM(and indirectly CSS) at runtime that makes it challenging to generate test cases forJavaScript applications.The fourth important challenge in unit testing JavaScript functions that haveDOM interactions, such as setDim, is that the DOM tree in the state expectedby the function, has to be present during unit test execution. Otherwise the testwill fail due to a null or undefined exception. This situation arises often inmodern web applications that have many DOM interactions.4.3 ApproachOur main goal in this work is to generate client-side test cases coupled with effec-tive test oracles, capable of detecting regression JavaScript and DOM-level faults.Further, we aim to achieve this goal as efficiently as possible. Hence, we maketwo design decisions. First, we assume that there is a finite amount of time avail-able to generate test cases. Consequently we guide the test generation to maximizecoverage under a given time constraint. The second decision is to minimize the72number of test cases and oracles generated to only include those that are essentialin detecting potential faults. Consequently, to examine the correctness of the testsuite generated, the tester would only need to examine a small set of assertions,which minimizes their effort.Our approach generates test cases and oracles at two complementary levels:DOM-level event-based tests consist of DOM-level event sequences and asser-tions to check the application’s behaviour from an end-user’s perspective.Function-level unit tests consist of unit tests with assertions that verify the func-tionality of JavaScript code at the function level.An overview of the technique is depicted in Figure 4.2. At a high level, ourapproach is composed of three main steps:1. 
In the first step (Section 4.3.1), we dynamically explore various states of agiven web application, in such a way as to maximize the number of functionsthat are covered throughout the program execution. The output of this initialstep is a state-flow graph (SFG) [76], capturing the explored dynamic DOMstates and event-based transitions between them.2. In the second step (Section 4.3.2), we use the inferred SFG to generate event-based test cases. We run the generated tests against an instrumented versionof the application. From the execution trace obtained, we extract DOM ele-ment states as well as JavaScript function states at the entry and exit points,from which we generate function-level unit tests. To reduce the number ofgenerated test cases to only those that are constructive, we devise a stateabstraction algorithm that minimizes the number of states by selecting rep-resentative function states.3. To create effective test oracles for the two test case levels, we automaticallygenerate mutated versions of the application (Section 4.5). Assuming thatthe original version of the application is fault-free, the test oracles are thengenerated at the DOM and JavaScript code levels by comparing the statestraced from the original and the mutated versions.7312Web AppExtract Function StateInstrument Crawl MaximizeCoverageSFGCollect Trace Extract Event Seq.Event-based TestsInstrument Collect TraceRun testsFunction-level Unit TestsExtract DOM State3MutateExtract Function StateInstrumentCollect TraceRun testsExtract DOM StateDiffDOM OraclesFunc. Oracles DiffAbstract Func. StatesFigure 4.2: Overview of our test generation approach.4.3.1 Maximizing Function CoverageIn this step, our goal is to maximize the number of functions that can be covered,while exercising the program’s event space. To that end, our approach combinesstatic and dynamic analysis to decide which state and event(s) should be selectedfor expansion to maximize the probability of covering uncovered JavaScript func-tions. While exploring the web application under test, our function coverage max-imization algorithm selects a next state for exploration, which has the maximumvalue of the sum of the following two metrics:1. Potential Uncovered Functions. This pertains to the total number of unexe-cuted functions that can potentially be visited through the execution of DOM events74in a given DOM state si. When a given function fi is set as the event-handler ofa DOM element d ∈ si, it makes the element a potential clickable element in si.This can be achieved through various patterns in web applications depending onwhich DOM event model level is adopted. To calculate this metric, our algorithmidentifies all JavaScript functions that are directly or indirectly attached to DOMelements as event handlers, in si through code instrumentation and execution tracemonitoring.2. Potential Clickable Elements. The second metric, used to select a state forexpansion, pertains to the number of DOM elements that can potentially becomeclickable elements. If the event-handlers bound to those clickables are triggered,new (uncovered) functions will be executed. To obtain this number, we staticallyanalyze the previously obtained potential uncovered functions within a given statein search of such elements.While exploring the application, the next state for expansion is selected byadding the two metrics and choosing the state with the highest sum. 
The procedurerepeats the aforementioned steps until the designated time limit, or state space sizeis reached.In the running example of Figure 4.1, in the initial state, clicking on elementswith IDs cell0 and cell1 results in two different states due to an if-elsestatement in Lines 4 and 8 of cellClicked. Let’s call the state in which aDIV element is located after the element with ID cell0 as s0, and the state inwhich a DIV element is placed after the element with ID cell1 as s1. If states0, with the clickable cell0, is chosen for expansion, function setup is called.As shown in Line 15, setup calls setDim, and thus, by expanding s0 both ofthe aforementioned functions get called by a single click. Moreover, a potentialclickable element is also created in Line 16, with start as the event-handler.Therefore, expanding s1 results only in the execution of setDim, while expandings0 results in the execution of functions setup, setDim, and a potential executionof start in future states. At the end of this step, we obtain a state-flow graph ofthe application that can be used in the next test generation step.754.3.2 Generating Test CasesIn the second step, our technique first extracts sequences of events from the inferredstate-flow graph. These sequences of events are used in our test case generationprocess. We generate test cases at two complementary levels, as described below.DOM-level event-based testing. To verify the behaviour of the application atthe user interface level, each event path, taken from the initial state (Index) toa leaf node in the state-flow graph, is used to generate DOM event-based testcases. Each extracted path is converted into a JUNIT SELENIUM-based test case,which executes the sequence of events, starting from the initial DOM state. Go-ing back to our running example, one possible event sequence to generate is:$(‘#cell0’).click→$(‘div #divElem’).click→$(‘#startCe-ll’).click.To collect the required trace data, we capture all DOM elements and their at-tributes after each event in the test path is fired. This trace is later used in our DOMoracle comparison, as explained in Section 4.5.JavaScript function-level unit testing. To generate unit tests that target JavaScriptfunctions directly (as opposed to event-triggered function executions), we log thestate of each function at their entry and exit point, during execution. To that end, weinstrument the code to trace various entities. At the entry point of a given JavaScriptfunction we collect (1) function parameters including passed variables, objects,functions, and DOM elements, (2) global variables used in the function, and (3)the current DOM structure just before the function is executed. At the exit point ofthe JavaScript function and before every return statement, we log the state of the(1) return value of the function, (2) global variables that have been accessed in thatfunction, and (3) DOM elements accessed (read/written) in the function. At eachof the above points, our instrumentation records the name, runtime type, and actualvalues. The dynamic type is stored because JavaScript is a dynamically typedlanguage, meaning that the variable types cannot be determined statically. Notethat complex JavaScript objects can contain circular or multiple references (e.g.,in JSON format). 
To handle such cases, we perform a de-serialization process inwhich we replace such references by an object in the form of $re f : Path, where76Path denotes a JSONPath string12 that indicates the target path of the reference.In addition to function entry and exit points, we log information required forcalling the function from the generated test cases. JavaScript functions that areaccessible in the public scope are mainly defined in:1. The global scope directly (e.g., function f(){...});2. Variable assignments in the global scope (e.g., var f = function()-{...});3. Constructor functions (e.g, function constructor() {this. me-mber= function(){...}});4. Prototypes (e.g., Constructor.prototype.f= function() {.-..});Functions in the first and second case are easy to call from test cases. For the thirdcase, the constructor function is called via the new operator to create an objecttype, which can be used to access the object’s properties (e.g., container=newConstructor(); container.member();). This allows us to access theinner function, which is a member of the constructor function in the above ex-ample. For the prototype case, the function can be invoked through container.-f() from a test case.Going back to our running example in Figure 4.1, at the entry point of setDim,we log the value and type of both the input parameter dimension and globalvariable currentDim, which is accessed in the function. Similarly, at the exitpoint, we log the values and types of the returned variable dim and currentDim.In addition to the values logged above, we need to capture the DOM statefor functions that interact with the DOM. This is to address the fourth challengeoutlined in Section 5.2. To mitigate this problem, we capture the state of the DOMjust before the function starts its execution, and include that as a test fixture [11] inthe generated unit test case.In the running example, at the entry point of setDim, we log the innerHTMLof the current DOM as the function contains several calls to the DOM, e.g., retriev-ing the element with ID endCell in Line 22. We further include in our execution12 http://goessner.net/articles/JsonPath/77trace the way DOM elements and their attributes are modified by the JavaScriptfunction at runtime. The information that we log for accessed DOM elements in-cludes the ID attribute, the XPath position of the element on the DOM tree, andall the modified attributes. Collecting this information is essential for oracle gen-eration in the next step. We use a set to keep the information about DOM modi-fications, so that we can record the latest changes to a DOM element without anyduplication within the function. For instance, we record ID as well as both widthand height properties of the endCell element.Once our instrumentation is carried out, we run the generated event sequencesobtained from the state-flow graph. This way, we produce an execution trace thatcontains:• Information required for preparing the environment for each function to beexecuted in a test case, including its input parameters, used global variables,and the DOM tree in a state that is expected by the function;• Necessary entities that need to be assessed after the function is executed,including the function’s output as well as the touched DOM elements andtheir attributes (The actual assessment process is explained in Section 4.5).Function State Abstraction. 
As mentioned in Section 5.2, the highly dynamicnature of JavaScript applications can result in a huge number of function states.Capturing all these different states can potentially hinder the technique’s scalabilityfor large applications. In addition, generating too many test cases can negativelyaffect test suite comprehension. We apply a function state abstraction method tominimize the number of function-level states needed for test generation.Our abstraction method is based on classification of function (entry/exit) statesaccording to their impact on the function’s behaviour, in terms of covered brancheswithin the function, the function’s return value type, and characteristics of the ac-cessed DOM elements.Branch coverage: Taking different branches in a given function can change itsbehaviour. Thus, function entry states that result in a different covered branchshould be taken into account while generating test cases. Going back to our78Algorithm 2: Function State Abstractioninput : The set of function states sti ∈ STf for a given function foutput: The obtained abstracted states set AbsStatesbegin1 for sti ∈ STf do2 L = 1; StSetL← /03 if BRNCOVLNS[sti] 6= BRNCOVLNS[StSet]Ll=1 then4 StSetL+1← sti5 L++end6 else7 StSetl ← sti ∪StSetlend8 K = L+1; StSetK ← /09 if DOMPROPS[sti] 6= DOMPROPS[StSet]Kk=L+1 || RetType[sti] 6= RETTYPE[StSet]Kk=L+1then10 StSetK+1← sti11 K++end12 else13 StSetk ← stk ∪StSetkendend14 while StSetK+L 6= /0 do15 SelectedSt← SELECTMAXST(sti|sti ∩StSetK+Lj=1 )16 AbsStates.ADD(SelectedSt)17 StSetK+L← StSetK+L−SelectedStend18 return AbsStatesendexample in Figure 4.1, executing either of the branches in lines 27 and 29clearly takes the application into a different DOM state. In this example, weneed to include the states of the start function that result in different cov-ered branches, e.g., two different function states where the value of the globalvariable currentDim at the entry point falls into different boundaries.Return value type: A variable’s type can change in JavaScript at runtime. Thiscan result in changes in the expected outcome of the function. Going back toour example, if dim is mistakenly assigned a string value before adding it tocurrentDim (Line 21) in function setDim, the returned value of the functionbecomes the string concatenation of the two values rather than the expectednumerical addition.79Accessed DOM properties: DOM elements and their properties accessed in afunction can be seen as entry states. Changes in such DOM entry states can af-fect the behaviour of the function. For example, in line 29 this keyword refersto the clicked DOM element of which function start is an event-handler. As-suming that currentDim≤ 40, depending on which DOM element is clicked,by removing the element in line 29 the resulting state of the function startdiffers. Therefore, we take into consideration the DOM elements accessed bythe function as well as the type of accessed DOM properties.Algorithm 2 shows our function state abstraction algorithm. The algorithm firstcollects covered branches of individual functions per entry state (BRNCOVLNS[sti]in Line 3). Each function’s states exhibiting same covered branches are categorizedunder the same set of states (Lines 4 and 7). StSetl corresponds to the set of functionstates, which are classified according to their covered branches, where l = 1, ...,Land L is the number of current classified sets in covered branch category. 
Similarly,function states with the same accessed DOM characteristics as well as return valuetype, are put into the same set of states (Lines 10 and 13). StSetk correspondsto the set of function states, which are classified according to their DOM/returnvalue type, where k = 1, ...,K and K is the number of current classified sets in thatcategory. After classifying each function’s states into several sets, we cover eachset by selecting one of its common states. The state selection step is a set coverproblem [35], i.e., given a universe U and a family S of subsets of U , a cover is asubfamily C ⊆ S of sets whose union is U . Sets to be covered in our algorithm areStSetK+L, where sti ∈ StSetK+L. We use a common greedy algorithm for obtaininga minimum number of states that can cover all the possible sets (Lines 15-17).Finally, the abstracted list of states is returned in Line 18.4.3.3 Generating Test OraclesIn the third step, our approach automatically generates test oracles for the twolevels of test cases generated in the previous step, as depicted in the third step ofFigure 4.2. Instead of randomly generating assertions, our oracle generation uses amutation-based process.Mutation testing is typically used to evaluate the quality of a test suite [38], or80to generate test cases that kill mutants [48]. In our approach, we adopt mutationtesting to (1) reduce the number of assertions automatically generated, (2) targetcritical and error-prone portions of the application. Hence, the tester would onlyneed to examine a small set of effective assertions to verify the correctness of thegenerated oracles. Algorithm 3 shows our algorithm for generating test oracles. Ata high level, the technique iteratively executes the following steps:1. A mutant is created by injecting a single fault into the original version ofthe web application (Line 9 and 19 in Algorithm 3 for DOM mutation andcode-level mutation, respectively),2. Related entry/exit program states at the DOM and JavaScript function levelsof the mutant and the original version are captured. OnEvDomSt in Line 4 isthe original DOM state on which the event Ev is triggered, A f terEvDomStin line 5 is the observed DOM state after the event is triggered, MutDomin line 9 is the mutated DOM, and ChangedSt in line 10 is the correspond-ing affected state for DOM mutations. FcExit in Line 22 is the exit stateof the function in the original application and MutFcExit in line 23 is thecorresponding exit state for that function after the application is mutated forfunction-level mutations.3. Relevant observed state differences at each level are detected and abstractedinto test oracles (DIFF in Line 11 and 24 for DOM and function-level oracles,respectively),4. The generated assertions (Lines 15 and 28) are injected into the correspond-ing test cases.DOM-level event-based test oracles. After an event is triggered in the generatedSELENIUM test case, the resulting DOM state needs to be compared against theexpected structure. One naive approach would be to compare the DOM tree inits entirety, after the event execution. Not only is this approach inefficient, it re-sults in brittle test-cases, i.e., the smallest update on the user interface can breakthe test case. 
We propose an alternative approach that utilizes DOM mutation test-ing to detect and selectively compare only those DOM elements and attributes that81Algorithm 3: Oracle Generationinput : A Web application (App), list of event sequences obtained from SFG (EvSeq), maximumnumber of mutations (n)output: Assertions for function-level (FcAsserts) and DOM event-level tests (DomAsserts)1 App← INSTRUMENT(App)begin2 while GenMuts < n do3 foreach EvSeq ∈ SFG do4 OnEvDomSt← Trace.GETONEVDOMST(Ev ∈ EvSeq)5 A f terEvDomSt← Trace.GETAFTEREVDOMST(Ev ∈ EvSeq)6 AccdDomProps← GETACCDDOMNDS(OnEvDomSt)7 EquivalentDomMut← true8 while EquivalentDomMut do9 MutDom←MUTATEDOM(AccdDomProps,OnEvDomSt)10 ChangedSt← EvSeq.EXECEVENT(MutDom)11 Di f fChangedSt,A f terEvDomSt ← DIFF(ChangedSt,A f terEvDomSt)12 if Di f fChangedSt,A f terEvDomSt 6= /0 then13 EquivalentDomMut← f alse14 DomAsserti = Di f fChangedSt,A f terEvDomSt15 DomAssertsEv,A f terEvDomSt =⋂DomAssertiendend16 AbsFcSts← Trace.GETABSFCSTS()17 EquivalentCodeMut← true18 while EquivalentCodeMut do19 MutApp←MUTATEJSCODE(App)20 MutFcSts← EvSeq.EXECEVENT(MutApp)21 foreach FcEntry ∈ AbsFcSts.GETFCENTRIES do22 FcExit← AbsFcSts.GETFCEXIT(FcEntry)23 MutFcExit←MutFcSts.GETMUTFCEXIT(FcEntry)24 Di f fFcExit,MutFcExit ← DIFF(FcExit,MutFcExit)25 if Di f fFcExit,MutFcExit 6= /0 then26 EquivalentCodeMut← f alse27 FcAsserti =⋂Di f fFcExit,MutFcExit28 FcAssertsFcEntry =⋃FcAssertiendendendendend29 return {FcAsserts,DOMAsserts}endare affected by an injected fault at the DOM-level of the application. Our DOMmutations target only the elements that have been accessed (read/written) duringexecution, and thus have a larger impact on the application’s behaviour. To se-lect proper DOM elements for mutation, we instrument JavaScript functions thatinteract with the DOM, i.e., code that either accesses or modifies DOM elements.We execute the instrumented application by running the generated SELENIUM82test cases and record each accessed DOM element, its attributes, the triggered eventon the DOM state, and the DOM state after the event is triggered (GETONEVDOM-ST in line 4, GETAFTEREVDOMST in line 5, and GETACCDDOMNDS in line 6to retrieve the original DOM state, DOM state after event Ev is triggered, and theaccessed DOM properties as event Ev is triggered, respectively, in Algorithm 3).To perform the actual mutation, as the application is re-executed using the samesequence of events, we mutate the recorded DOM elements, one at a time, be-fore the corresponding event is fired. MUTATEDOM in line 9 mutates the DOMelements, and EvSeq.EXECEVENT in line 10 executes the event sequence on themutated DOM. The mutation operators include (1) deleting a DOM element, and(2) changing the attribute, accessed during the original execution. As we mutatethe DOM, we collect the current state of DOM elements and attributes.Figure 4.3 shows part of a DOM-level test case generated for the running ex-ample. Going back to our running example, as a result of clicking on $(‘div#divElem’) in our previously obtained event sequence ($(‘#cell0’).cli-ck→$(‘div #divElem’).click→$(‘#startCell’)), the height a-nd width properties of DOM element with ID endCell, and the DOM elementwith ID startCell are accessed. One possible DOM mutation is altering thewidth value of the endCell element before click on $(‘div #divElem’)happens. We log the consequences of this modification after the click event on$(‘div #divElem’) as well as the remaining events. 
This mutation affectsthe height property of DOM element with ID endCell in the resulting DOMstate from clicking on $(‘div #divElem’). Line 6 in Figure 4.3 shows thecorresponding assertion. Furthermore, Assuming that the DOM mutation makescurrentDim≤ 40 in line 27, after click on element #startCell happens, theelement is removed and no longer exists in the resulting DOM state. The generatedassertion is shown in line 10 of Figure 4.3.Hence, we obtain two sets of execution traces that contain information aboutthe state of DOM elements for each fired event in the original and mutated applica-tion. By comparing these two traces (DIFF in line 11 in Algorithm 3), we identifyall changed DOM elements and generate assertions for these elements. Note thatany changes detected by the DIFF operator (line 12 in Algorithm 3) is an indicationthat the corresponding DOM mutation is not equivalent (line 13); if no change is831 @Test2 public void testCase1(){3 WebElement divElem=driver.findElements(By.id("divElem"));4 divElem.click();5 int endCellHeight=driver.findElements(By.id("endCell")).getSize().←↩height;6 assertEquals(endCellHeight, 30);7 WebElement startCell=driver.findElements(By.id("startCell"));8 startCell.click();9 boolean exists=driver.findElements(By.id("startCell")).size!=0;10 assertTrue(exists);11 int startCellHeight=driver.findElements(By.id("startCell")).getSize()←↩.height;12 assertEquals(startCellHeight, 50);13 }Figure 4.3: Generated SELENIUM test case.detected, another DOM mutation is generated.We automatically place the generated assertion immediately after the corre-sponding line of code that executed the event, in the generated event-based (SELEN-IUM) test case. DomAssertsEv,A f terEvDomSt in line 15 contains all DOM assertionsfor the state A f terEvDOMSt and the triggered event Ev.Function-level test oracles. To seed code level faults, we use our recently devel-oped JavaScript mutation testing tool, MUTANDIS [79]. Mutations generated byMUTANDIS are selected through a function rank metric, which ranks functions interms of their relative importance from the application’s behaviour point of view.The mutation operators are chosen from a list of common operators, such as chang-ing the value of a variable or modifying a conditional statement. Once a mutantis produced (MUTATEJSCODE in line 19), it is automatically instrumented. Wecollect a new execution trace from the mutated program by executing the samesequence of events that was used on the original version of the application. Thisway, the state of each JavaScript function is extracted at its entry and exit points.AbsFcSts.GETFCENTRIES in line 21 retrieves the function’s entries from the ab-stracted function’s states. GETFCEXIT in line 22, and GETMUTFCEXIT in line 23retrieve the corresponding function’s exit state in the original and mutated applica-tion. 
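The entry/exit state capture used above can be pictured with a wrapper of the following form; the trace layout and the instrument helper are illustrative assumptions of ours, not the actual JSEFT instrumentation.

// Record a function's state at its entry and exit points by wrapping it:
// arguments are logged on entry, the returned value on exit.
var trace = [];

function instrument(scope, name) {
  var original = scope[name];
  scope[name] = function () {
    var args = Array.prototype.slice.call(arguments);
    trace.push({ fn: name, point: "entry", args: JSON.parse(JSON.stringify(args)) });
    var result = original.apply(this, arguments);
    trace.push({ fn: name, point: "exit", returned: result });
    return result;
  };
}

// Example: instrument(window, "setDim"); setDim(10);
// The same trace, collected once on the original application and once on a
// mutant, is what the DIFF operator compares per function entry.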
This process is similar to the function state extraction algorithm explained inSection 4.3.2.After the execution traces are collected for all the generated mutants, we gen-erate function-level test oracles by comparing the execution trace of the original841 test("Testing setDim",4,function(){2 var fixture = $("#qunit-fixture");3 fixture.append("<button id=\"cell0\"> <div id=\"divElem\"/> </button>←↩<div id=\"endCell\" style=\"height:200px;width:100px;\"/>");4 var currentDim=20;5 var result= setDim(10);6 equal(result, 30);7 equal(currentDim, 50);8 ok($(#endCell).length > 0));9 equal($(#endCell).css('height'), 30); });Figure 4.4: Generated QUNIT test case.application with the traces we obtained from the modified versions (DIFF in line24 in Algorithm 3). If the DIFF operator detects no changes (line 25 of the algo-rithm), an equivalent mutant is detected, and thus another mutant will be generated.Our function-level oracle generation targets postcondition assertions. Suchpostcondition assertions can be used to examine the expected behaviour of a givenfunction after it is executed in a unit test case. Our technique generates postcon-dition assertions for all functions that exhibit a different exit-point state but thesame entry-point state, in the mutated execution traces. FcAsserti in line 27 con-tains all such post condition assertions. Due to the dynamic and asynchronousbehaviour of JavaScript applications, a function with the same entry state can ex-hibit different outputs when called multiple times. In this case, we need to com-bine assertions to make sure that the generated test cases do not mistakenly fail.FcAssertsFcEntry in line 28 contains the union of function assertions generated forthe same entry but different outputs during multiple executions. Let’s consider afunction f with an entry state entry in the original version of the application (A),with two different exit states exit1 and exit2. If in the mutated version of the appli-cation (Am), f exhibits an exit state exitm that is different from both exit1 and exit2,then we combine the resulting assertions as follows: assert1(exit1,expRes1)‖a-ssert2(exit2,expRes2), where the expected values expRes1 and expRes2 are ob-tained from the execution trace of A.Each assertion for a function contains (1) the function’s returned value, (2) theused global variables in that function, and/or (3) the accessed DOM element inthat function. Each assertion is coupled with the expected value obtained from theexecution trace of the original version.The generated assertions that target variables, compare the value as well as the85runtime type against the expected ones. An oracle that targets a DOM element,first checks the existence of that DOM element. If the element exists, it checksthe attributes of the element by comparing them against the observed values inthe original execution trace. Assuming that width and height are 100 and 200accordingly in Figure 4.1, and ‘+’ sign is mutated to ‘-’ in line 20 of the runningexample in Figure 4.1, the mutation affects the global variable currentDim,height property of element with ID endCell, and the returned value of thefunction setDim. Figure 4.4 shows a QUNIT test case for setDim functionaccording to this mutation with the generated assertions.4.3.4 Tool ImplementationWe have implemented our JavaScript test and oracle generation approach in an au-tomated tool called JSEFT. The tool is written in Java and is publicly available fordownload [8]. 
Our implementation requires no browser modifications, and is henceportable. For JavaScript code interception, we use a web proxy, which enables usto automatically instrument JavaScript code before it reaches the browser. Thecrawler for JSEFT extends and builds on top of the event-based crawler, CRAWL-JAX [75], with random input generation enabled for form inputs. As mentionedbefore, to mutate JavaScript code, we use our recently developed mutation test-ing tool, MUTANDIS [79]. The upper-bound for the number of mutations can bespecified by the user. However, the default is 50 for code-level and 20 for DOM-level mutations. We observed that these default numbers provide a balanced trade-off between oracle generation time, and the fault finding capability of the tool.DOM-level test cases are generated in a JUNIT format that uses SELENIUM (Web-Driver) APIs to fire events on the application’s DOM inside the browser. JavaScriptfunction-level tests are generated in the QUNIT unit testing framework [11], capa-ble of testing any generic JavaScript code.4.4 Empirical EvaluationTo quantitatively assess the efficacy of our test generation approach, we have con-ducted an empirical study, in which we address the following research questions:RQ1 How effective is JSEFT in generating test cases with high coverage?86Table 4.1: Characteristics of the experimental objects.ID Name LOC URL1 SameGame 206 crawljax.com/same-game/2 Tunnel 334arcade.christianmontoya.com/tunnel/3 GhostBusters 28210k.aneventapart.com/2/Uploads/657/4 Peg 509www.cccontheweb.org/peggame.htm5 BunnyHunt 580http://www.themaninblue.com/experiment/BunnyHunt/6 AjaxTabs 592https://github.com/amazingSurge/jquery-tabs/7 NarrowDesign 1,005 http://www.narrowdesign.com8 JointLondon 1,211 http://www.jointlondon.com9 FractalViewer 1,245 onecm.com/projects/canopy/10 SimpleCart 1,900https://github.com/wojodesign/simplecart-js/11 WymEditor 3,035 http://www.wymeditor.org12 TuduList 1,963 http://tudu.ess.ch/tudu13 TinyMCE 26,908 http://www.tinymce.comRQ2 How capable is JSEFT of generating test oracles that detect regression faults?RQ3 How does JSEFT compare to existing automated JavaScript testing frame-works?JSEFT and all our experimental data are available for download [8].4.4.1 ObjectsOur study includes thirteen JavaScript-based applications in total. Table 4.1 presentseach application’s ID, name, lines of custom JavaScript code (LOC, excludingJavaScript libraries) and resource. The first five are web-based games. AjaxTabs isa JQUERY plugin for creating tabs. NarrowDesign and JointLondon are websites.FractalViewer is a fractal tree zoom application. SimpleCart is a shopping cartlibrary, WymEditor is a web-based HTML editor, TuduList is a web-based taskmanagement application, and TinyMCE is a JavaScript based WYSIWYG editorcontrol. The applications range from 206 to 27K lines of JavaScript code.The experimental objects are open-source and cover different application types.All the applications are interactive in nature and extensively use JavaScript on theclient-side.874.4.2 SetupTo address our research questions, we provide the URL of each experimental objectto JSEFT. Test cases are then automatically generated by JSEFT. We give JSEFT10 minutes in total for each application. 5 minutes of the total time is designatedfor the dynamic exploration step.Test Case Generation (RQ1). To measure client-side code coverage, we use JS-Cover [7], an open-source tool for measuring JavaScript code coverage. 
We report the average results over five runs to account for the non-deterministic behaviour that stems from crawling the application. In addition, we assess each step in our approach separately as follows: (1) compare the statement coverage achieved by our function coverage maximization with a method that chooses the next state/event for expansion uniformly at random, (2) assess the efficacy of our function state abstraction method (Algorithm 2), and (3) evaluate the effectiveness of applying mutation techniques (Algorithm 3) to reduce the number of assertions generated.

Test Oracles (RQ2). To evaluate the fault finding capability of JSEFT (RQ2), we simulate web application faults by automatically seeding each application with 50 random faults. We automatically pick a random program point and seed a fault at that point according to our fault category. While mutations used for oracle generation have been selectively generated (as discussed in Section 4.5), mutations used for the purpose of evaluation are randomly generated from the entire application. Note that if the mutation used for the purpose of evaluation and the mutation used for generating oracles happen to be the same, we remove the mutant from the evaluation set. Next we run the whole generated test suite (including both function-level and event-based test cases) on the faulty version of the application. The fault is considered detected if an assertion generated by JSEFT fails and our manual examination confirms that the failed assertion is detecting the seeded fault. We measure the precision and recall as follows:

Precision is the rate of injected faults found by the tool that are actual faults: TP / (TP + FP)

Recall is the rate of actual injected faults that the tool finds: TP / (TP + FN)

where TP (true positives), FP (false positives), and FN (false negatives) respectively represent the number of faults that are correctly detected, falsely reported, and missed.

Table 4.2: Results showing the effects of our function coverage maximization, function state abstraction, and mutation-based oracle generation algorithms. Columns 2-3 fall under "St. Coverage", columns 4-6 under "State Abstraction", and columns 7-8 under "Oracles".

App ID | Func. cov. maximize (%) | Random exploration (%) | #Func. states w/o abstraction | #Func. states with abstraction | Func. state reduction (%) | #Assertions w/o mutation | #Assertions with mutation
1   |  99 | 80 |  447 |  33 | 93 |   5101 | 136
2   |  78 | 78 |  828 |  21 | 97 |  23212 |  81
3   |  90 | 66 |  422 |  14 | 96 |   3520 |  45
4   |  75 | 75 |   43 |  19 | 56 |   1232 | 109
5   |  49 | 45 |  534 |  23 | 95 |    150 |  79
6   |  78 | 75 |  797 |  30 | 96 |   1648 | 125
7   |  63 | 58 | 1653 |  54 | 97 | 198202 | 342
8   |  56 | 50 |   32 |  18 | 43 |     78 |  51
9   |  82 | 82 | 1509 |  49 | 97 |  65403 | 253
10  |  71 | 69 |   71 |  23 | 67 |   6584 |  96
11  |  56 | 54 | 1383 | 131 | 90 |   2530 | 318
12  |  41 | 38 | 1530 |  62 | 96 |   3521 | 184
13  |  51 | 47 | 1401 | 152 | 89 |   2481 | 335
AVG | 68.4 | 62.8 | - | - | 85.5 | - | -

Comparison (RQ3). To assess how JSEFT performs with respect to existing JavaScript test automation tools, we compare its coverage and fault finding capability to that of Artemis [22]. Similar to JSEFT, we give Artemis 10 minutes in total for each application; we observed no improvements in the results obtained from running Artemis for longer periods of time. We run Artemis from the command line by setting the iteration option to 100 and enabling the coverage priority strategy, as described in [22]. Similarly, JSCover is used to measure the coverage of Artemis (over 5 runs).
We use the output provided by Artemis to determine if the seeded mutations are detected by the tool, by following the same procedure as described above for JSEFT.

Figure 4.5: Statement coverage achieved. [Bar chart comparing JSEFT and Artemis; x-axis: the 13 experimental objects, y-axis: statement coverage (%).]

4.4.3 Results

Test Case Generation (RQ1). Figure 4.5 depicts the statement coverage achieved by JSEFT for each application. The results show that the test cases generated by JSEFT achieve a coverage of 68.4% on average, ranging from 41% (ID 12) up to 99% (ID 1). We investigated why JSEFT has low coverage for some of the applications. For instance, we observed that in JointLondon (ID 8), the application contains JavaScript functions that are browser/device specific, i.e., they are exclusively executed in Internet Explorer or on iDevices. As a result, we are unable to cover them using JSEFT. We also noticed that some applications require more time to achieve higher statement coverage (e.g., NarrowDesign, ID 7), or have a large DOM state space (e.g., BunnyHunt, ID 5), and hence JSEFT is only able to cover a portion of these applications in the limited time it had available.

The columns of Table 4.2 under "St. Coverage" present the JavaScript statement coverage achieved by our function coverage maximization algorithm versus a random strategy. The results show a 9% improvement on average, for our algorithm, across all the applications. We observed that our technique achieves the highest improvement when there are many dynamically generated clickable DOM elements in the application, for example, GhostBusters (ID 3).

The columns under "State Abstraction" in Table 4.2 present the number of function states before and after applying our function state abstraction algorithm. The results show that the abstraction strategy reduces function states by 85.5% on average. NarrowDesign (ID 7) and FractalViewer (ID 9) benefit the most, with a 97% state reduction rate. Note that despite this large reduction, our state abstraction does not adversely influence the coverage, as we include at least one function state from each of the covered branch sets, as described in Section 4.3.2.

The last two columns of Table 4.2, under "Oracles", present the number of assertions obtained by capturing the whole application's state without any mutations, and with our mutation-based oracle generation algorithm, respectively. The results show that the number of assertions is decreased by 86.5% on average due to our algorithm. We observe the most significant reduction of assertions for NarrowDesign (ID 7), from more than 198,000 to 342.

Fault finding capability (RQ2). Table 4.3 presents the results on the fault finding capabilities of JSEFT. The table shows the total number of injected faults, the number of false negatives, false positives, true positives, and the precision and recall of JSEFT.

Table 4.3: Fault detection. Columns 2-8 report JSEFT results; the last two columns report the precision and recall of Artemis.

App ID | #Injected faults | #FN | #FP | #TP | Precision (%) | Recall (%) | By func-level tests (%) | Precision (%) | Recall (%)
1   | 50 |  0 | 0 | 50 | 100 | 100 | 30 | 100 | 20
2   | 50 |  9 | 0 | 41 | 100 |  82 | 73 | 100 | 12
3   | 50 |  4 | 0 | 46 | 100 |  92 | 17 | 100 |  8
4   | 50 | 15 | 0 | 35 | 100 |  70 | 28 | 100 | 22
5   | 50 | 26 | 0 | 24 | 100 |  48 | 25 | 100 |  0
6   | 50 |  9 | 0 | 41 | 100 |  82 | 15 | 100 | 16
7   | 50 | 17 | 0 | 33 | 100 |  66 | 24 | 100 |  0
8   | 50 | 23 | 0 | 27 | 100 |  54 | 26 | 100 |  0
9   | 50 |  6 | 0 | 44 | 100 |  88 | 41 | 100 | 24
10  | 50 | 16 | 0 | 34 | 100 |  68 | 65 | 100 |  8
11  | 50 | 21 | 0 | 29 | 100 |  58 | 27 | 100 |  6
12  | 50 | 26 | 0 | 24 | 100 |  48 | 17 | 100 | 22
13  | 50 | 23 | 0 | 27 | 100 |  54 | 26 | 100 | 28
AVG | -  | 15 | 0 | 35 | 100 |  70 | 32 | 100 | 12.8

JSEFT achieves 100% precision, meaning that all the detected faults reported by JSEFT are real faults. In other words, there are no false positives.
This isbecause the assertions generated by JSEFT are all stable i.e., they do not changefrom one run to another. However, the recall of JSEFT is 70% on average, andranges from 48 to 100%. This is due to false negatives, i.e., missed faults byJSEFT, which occur when the injected fault falls is either in the uncovered regionof the application, or is not properly captured by the generated oracles.The table also shows that on average 32% percent of the injected faults (rangesfrom 15–73%) are detected by function-level test cases, but not by our DOM event-based test cases. This shows that a considerable number of faults do not propagateto observable DOM states, and thus cannot be captured by DOM-level event-basedtests. For example in the SimpleCart application (ID 10), if we mutate the mathe-matical operation that is responsible for computing the total amount of purchaseditems, the resulting error is not captured by event-based tests as the fault involvesinternal computations only. However, the fault is detected by a function-level testthat directly checks the returned value of the function. This points to the im-portance of incorporating function-level tests in addition to event-based tests forJavaScript web applications. We also observed that even when an event-based testcase detects a JavaScript fault, localizing the error to the corresponding JavaScriptcode can be quite challenging. However, function-level tests pinpoint the corre-sponding function when an assertion fails, making it easier to localize the fault.Comparison (RQ3). Figure 4.5 shows the code coverage achieved by both JSEFTand Artemis on the experimental objects running for the same amount of time,i.e., 10 minutes. The test cases generated by JSEFT achieve 68.4% coverage onaverage (ranging from 41–99%), while those generated by Artemis achieve only44.8% coverage on average (ranging from 0–92%). Overall, the test cases gen-erated by JSEFT achieve 53% more coverage than Artemis, which points to theeffectiveness of JSEFT in generating high coverage test cases. Further, as can beseen in the bar plot of Figure 4.5, for all the applications, the test cases generatedby JSEFT achieve higher coverage than those generated by Artemis. This increase92was more than 226% in the case of Bunnyhunt (ID 5). For two of the applications,NarrowDesign (ID 7) and JointLondon (ID 8), Artemis was not able to completethe testing task within the allocated time of ten minutes. Thus we let Artemis runfor an additional 10 minutes for these applications (i.e., 20 minutes in total). Eventhen, neither application completes under Artemis.Table 4.3 shows the precision and recall achieved by JSEFT and Artemis. Withrespect to fault finding capability, unlike Artemis that detects only generic faultssuch as runtime exceptions and W3C HTML validation errors, JSEFT is able toaccurately distinguish faults at the code-level and DOM-level through the test or-acles it generates. Both tools achieve 100% precision, however, JSEFT achievesfive-fold higher recall (70% on average) compared with Artemis (12.8% recall onaverage).4.4.4 Threats to ValidityAn external threat to the validity of our results is the limited number of web ap-plications that we use to evaluate our approach. We mitigated this threat by us-ing JavaScript applications that cover various application types. Another threatis that we validate the failed assertions through manual inspection, which can beerror-prone. 
To mitigate this threat, we carefully inspected the code in which theassertion failed to make sure that the injected fault was indeed responsible for theassertion failure. Regarding the reproducibility of our results, JSEFT and all theapplications used in this study are publicly available, thus making the study repli-cable.4.5 Related WorkWeb application testing. Marchetto and Tonella [71] propose a search-based al-gorithm for generating event-based sequences to test Ajax applications. Mesbahet al. [75] apply dynamic analysis to construct a model of the application’s statespace, from which event-based test cases are automatically generated. In subse-quent work [76], they propose generic and application-specific invariants as a formof automated soft oracles for testing AJAX applications. Our earlier work, JSART[78], automatically infers program invariants from JavaScript execution traces and93uses them as regression assertions in the code. Sen et al. [105] recently proposed arecord and replay framework called Jalangi. It incorporates selective record-replayas well as shadow values and shadow execution to enable writing of heavy-weightdynamic analyses. The framework is able to track generic faults such as nulland undefined values as well as type inconsistencies in JavaScript. Jensen et al.[63] propose a technique to test the correctness of communication patterns betweenclient and server in AJAX applications by incorporating server interface descrip-tions. They construct server interface descriptions through an inference techniquethat can learn communication patterns from sample data. Saxena et al. [102] com-bine random test generation with the use of symbolic execution for systematicallyexploring a JavaScript application’s event space as well as its value space, for secu-rity testing. Our work is different in two main aspects from these: (1) they all targetthe generation of event sequences at the DOM level, while we also generate unittests at the JavaScript code level, which enables us to cover more and find morefaults, and (2) they do not address the problem of test oracle generation and onlycheck against soft oracles (e.g., invalid HTML). In contrast, we generate strongoracles that capture application behaviours, and can detect a much wider range offaults.Perhaps the most closely related work to ours is Artemis [22], which supportsautomated testing of JavaScript applications. Artemis considers the event-drivenexecution model of a JavaScript application for feedback-directed testing. In thiswork, we quantitatively compare our approach with that of Artemis (Section 5.4).Oracle generation. There has been limited work on oracle generation for testing.Fraser et al. [48] propose µTEST, which employs a mutant-based oracle generationtechnique. It automatically generates unit tests for Java object-oriented classes byusing a genetic algorithm to target mutations with high impact on the application’sbehaviour. They further identify [47] relevant pre-conditions on the test inputsand post-conditions on the outputs to ease human comprehension. Differential testcase generation approaches [42, 108] are similar to mutation-based techniques inthat they aim to generate test cases that show the difference between two versionsof a program. However, mutation-based techniques such as ours, do not requiretwo different versions of the application. 
Rather, the generated differences are inthe form of controllable mutations that can be used to generate test cases capable94of detecting regression faults in future versions of the program. Staats et al. [107]address the problem of selecting oracle data, which is formed as a subset of internalstate variables as well as outputs for which the expected values are determined.They apply mutation testing to produce oracles and rank the inferred oracles interms of their fault finding capability. This work is different from ours in thatthey merely focus on supporting the creation of test oracles by the programmer,rather than fully automating the process of test case generation. Further, (1) theydo not target JavaScript; (2) in addition to the code-level mutation analysis, wepropose DOM-related mutations to capture error-prone [86] dynamic interactionsof JavaScript with the DOM.4.6 ConclusionsIn this work, we presented a technique to automatically generate test cases forJavaScript applications at two complementary levels: (1) individual JavaScriptfunctions, (2) event sequences. Our technique is based on algorithms to maximizefunction coverage and minimize function states needed for efficient test generation.We also proposed a method for effectively generating test oracles along with thetest cases, for detecting faults in JavaScript code as well as on the DOM tree. Weimplemented our approach in an open-source tool called JSEFT. We empiricallyevaluated JSEFT on 13 web applications. The results show that the generated testsby JSEFT achieve high coverage (68.4% on average), and that the injected faultscan be detected with a high accuracy rate (recall 70%, precision 100%).95Chapter 5Atrina: Inferring Unit Oracles from GUI TestCasesSummary13Testing JavaScript web applications is challenging due to its complex runtime in-teraction with the Document Object Model (DOM). Writing unit-level assertionsfor JavaScript applications is even more tedious as the tester needs to preciselyunderstand the interaction between the DOM and the JavaScript code, which isresponsible for updating the DOM. In this work, we propose to leverage existingDOM-dependent assertions in a human-written DOM-based test cases as well asuseful execution information inferred from the DOM-based test suite to automat-ically generate assertions used for unit-level testing of the JavaScript code of theapplication. Our approach is implemented in a tool called Atrina. We evaluate ourapproach to assess its effectiveness. The results indicate that Atrina maps DOM-based assertions to the corresponding JavaScript code with high accuracy (99%precision, 92% recall). In terms of fault finding capability, the assertions generatedby Atrina outperform human-written DOM-based assertions by 31% on average.It also surpasses the state-of-the-art mutation-based assertion generation techniqueby 26% on average in detecting faults.13This work has been submitted to the International Conference on Software Testing, Verification,and Validation (ICST’16) and is currently under review.965.1 IntroductionJavaScript has emerged as the lingua franca of modern, interactive web applica-tions. The interactivity is made possible by the close relation between the Doc-ument Object Model (DOM) and the underlying JavaScript code. However, test-ing modern web applications is challenging. To check the application’s behaviourfrom an end-user’s perspective, testers often use popular frameworks such as Se-lenium. 
The main advantage of using these frameworks to write DOM-based testsand assertions is that they require little knowledge about the internal operationsperformed by the code. Rather, the tester needs only basic knowledge of commonevent sequences to cover important DOM elements to assert.On the other hand, it is more tedious to write unit test assertions for web ap-plications that have rich interaction with the DOM through their JavaScript code.This is because the tester needs to precisely understand the full range of interac-tion between the code level operations of a unit and the DOM level operations of asystem, and thus may fail to assert the correctness of a particular behaviour whenthe unit is used as a part of a system. Our previous findings [81] indicate that whileDOM-based assertions tend to miss the related portion of code-level failure, morefine grained unit-level assertions can detect such faults. Furthermore, finding theroot cause of an error during DOM-based testing is much more expensive than dur-ing unit testing. This suggest that we need unit-level tests to complement existingDOM-based test for more effective fault detection and localization.Current test generation approaches either produce unit test oracles based onmutation testing techniques [48, 81], or rely on soft oracles [22]. Mutation-basedapproaches suffer from high computational cost, and the problem of equivalentmutants (which are syntactically different but semantically the same as the originalapplication). Soft oracles such as HTML validation and runtime exceptions arealso limited in that they fail to capture logical and computational errors. Recently,Milani Fard et al. [44] proposed using the DOM-based test suite of a web applica-tion to regenerate assertions for newly detected states through exploring alternativepaths of the application. However, the new assertions generated by this techniqueremain at the DOM-level without considering the relation between the JavaScriptcode and the DOM. In this work, we propose to exploit an existing DOM-based test97suite to generate unit-level assertions at the code-level for applications that interacthighly with the DOM through the underlying JavaScript code. We utilize existingDOM-dependent assertions as well as useful execution information inferred froma DOM-based test suite to automatically generate assertions used for testing indi-vidual JavaScript functions. To the best of our knowledge, this work is the first topropose an approach for generating unit-level assertions by using existing DOM-based test suites.The main contributions of our work include:• A slicing-based technique to generate unit-level assertions for testing JavaScr-ipt functions by utilizing existing DOM-based test assertions;• A technique for selectively choosing additional DOM elements to assert onthat are unchecked in the existing DOM-based test suite;• An implementation of our approach in a tool, called Atrina;• An empirical evaluation to assess the efficacy of the approach on seven open-source web applications; The results show that the assertions generated byAtrina surpass the fault finding capabilities of (1) the human-written DOM-based assertions by 31% on average, and (2) the state-of-the-art mutation-based assertion generation technique by 26% on average.5.2 MotivationUnlike DOM-based testing, asserting the behaviour of a JavaScript applicationthrough unit-level tests requires a tester to check the correctness of several in-termediate code-level variables and object properties. 
The code-level operationsare mainly responsible for updating the DOM during the application execution.Therefore, a tester needs to analyze the relationship between the JavaScript codeand the DOM’s evolution. We believe that DOM-based assertions can be utilized asa guideline to generate unit test assertions at JavaScript code level. In this section,we discuss why through an example.Figure 5.1 presents (a) snippet of a JavaScript-based shopping cart application,and (b) sample DOM-based SELENIUM test case, which we will use as a runningexample. The application’s code (a) consists of two functions:98	1		$document.ready(function()	{			2			...			3		$(	".merchandise").click(addToCart);	4		$(	"#shopCart").click(viewCart);	5		});	6	7		function	addToCart()	{		8			var	coupElem=	$("#couponButt");	9			selItem=	getItemInfo($(".merchandise"));	10			for(var	i=0;	i<availItems.length;	i++){	11				if(availItems[i].name	==	selItem.name)	12					availItems[i].count-=	selItem.quantity;			13			}	14			var	price=	selItem.price	*	selItem.quantity;	15			if(!coupon.expired){	16				coupElem.removeClass(customer.couponStatus);	17				customer.couponStatus=	coupon.Id	+	'-'	+	'used';	18				price	-=	coupElem.data('value');		19				coupElem.addClass(customer.couponStatus);	20				coupon.expired=true;		21			}			22			customer.payable	+=	price;	23		}		24	25		function	viewCart(){	26			...	27			if($("#couponButt").attr("class")	==	'default'	&&	customer.payable==0)	28				showMsg('Shopping	cart	is	empty');	29			else	30				$("div.shopContainer").append("<p>"	+	"Total	purchase	is:	$"	+								customer.payable	+"</p>");	31		}	1		@Test	2		public	void	testCase1(){	3			Select	quantity	=	new	Select(driver.findElement(By.					id("quantityDropDown")));	4			quantity.selectByIndex(1);	5			WebElement	item	=	driver.findElements(By.css("merchandise"));	6			item.click();	7			WebElement	cart	=	driver.findElements(By.id("shopCart"));	8			cart.click();			9			String	expectedMsg	=	"Total	purchase	is:	$70";		10		String	msg=driver.findElements(By.cssSelector					("div.shopContainer")).getText();	11		assertEquals(msg,	expectedMsg);	12	}(a)(b)Figure 5.1: Running example (a) JavaScript code, and (b) DOM-based test case. The line from (b) to (a)shows the point of contact between the DOM assertion and the code. The arrow lines in (a) show thebackward as well as forward slices between JavaScript statements.1. addToCart is bound to the event handler of DOM element with class at-tribute merchandise. When any of these element are clicked, addToCa-rt gets the information of the selected merchandise, and sets the quantity of99the current available items by updating the availItems object. If a validdiscount coupon exists, addToCart calculates the discount value, and dis-ables the selected coupon button with ID couponButt by removing thecorresponding class. Finally, addToCart updates the payable amount bysetting the payable property of the customer object.2. viewCart is invoked by clicking on a DOM element with ID shopCart.The function appends a message to a div element with class shopConta-iner including the final payable amount of the customer. If the couponbutton with ID couponButt is not selected and the payable amount isequal to zero, then the empty cart message is shown.Let’s assume that in line 14 of Figure 5.1(a) selItem.price, which repre-sents the original price of the merchandise, is 100, and selItem.quantityis 1. In line 18, the discount, which is calculated based on the data value ofthe couponButt element is 30. 
The DOM-based assertion in Figure 5.1(b)(line 11) checks the correctness of a text appended to a div element with classshopContainer containing the final amount payable by the customer, which isequal to 70 in this example. Analyzing the assertion in line 11 of Figure 5.1(b) indi-cates that the expected value of the assertion is directly influenced by the payableproperty of customer object as well as the object’s property coupon.expiredin function addToCart. We also infer that the selitem variable in line 9 of Fig-ure 5.1(a), which directly influences the value of customer.payable, is alsoused in updating the value of availItems.count in line 12.Further, by leveraging the execution information obtained from running theDOM-based test case, we can infer the DOM’s evolution, which can influencethe fault finding capability of the test suite. However, this is not checked by theDOM-based test suite. For instance, DOM element with ID couponButt is ac-cessed several times in function addToCart as well as viewCart as the testcase in Figure 5.1(b) runs, however it remains unchecked. Since the evolution ofthe couponButt DOM element pertains to the underlying JavaScript code, it isimportant to assert on code statements responsible for changing the aforementionedDOM elements.100Instrument JS codeDOM based Test Suite(1)Run Test Suite(2)DOM Assertions(4)Relate JS code to DOM Muts.(5)Backward Slice(6)Extract JS Vars/Objs(8)Forward Slice(7)Explicit Func. Assertions(9)Implicit Func. Assertions(3)Candidate DOM Properties(10)Candidate Func. AssertionsFigure 5.2: Overview of our assertion generation approach.5.3 ApproachFigure 5.2 shows an overview of our unit-level assertion generation technique. Ata high level, our approach generates code-level assertions based on human writtenDOM-based tests and assertions. Our code level assertions fall in the followingthree categories: (1) explicit assertions, which are directly inferred from analyzingthe manually written DOM-based assertions, (2) implicit assertions, which are in-directly affected by the human written DOM-based assertions, and (3) candidateassertions, which are not considered in the written DOM-based assertions, yet arepotentially useful for fault detection. We describe how our approach below findsthe three categories of assertions. The numbers below in parentheses correspondto those in the boxes of Figure 5.2.In the first part of our approach we (1) execute the instrumented application byrunning the existing DOM-based test suite, and gather a detailed execution traceof the application. We then extract (2) DOM-based assertions, and (3) candidateDOM element properties, which are useful DOM properties that can be used togenerate assertions. We also (4) identify the initial point of connection betweenthe application’s source code and checked DOM element.In the second part of our approach, we use the information gathered in the firstpart to obtain the assertions. 
In this part, we (5) calculate the backward slice of theDOM mutating statements to find the entire code blocks that update the checkedDOM element, (6) extract accessible entities from the obtained statements, and101Algorithm 4: Oracle Generationinput : Test suite T ; The set of test cases tci ∈ Toutput: The ordered set of oracles oraclesbegin1 trace← EXEC(T )2 domAccss← GETDOMACC(trace)3 f reqAccdDOM← /04 α = 1READPROPERTIES(T )5 for dom ∈ domAccss do6 if ACC(propdom)≥ α then7 f reqAccdDOM← dom∪ f reqAccdDOMendend8 for tci ∈ T do9 trace← EXEC(tci)10 domAccss← GETDOMACC(trace)11 for asstn ∈ assertionstci do12 asserDOMAcc← GETDOMACC(asstn)13 asserDOMMuts← GETDOMMUTS(asserDOMAcc)14 for domMut ∈ asserDOMMuts do15 bwSts← GETBWSLICE(domMut, trace)16 expAsstnRel← GETWRVARS(bwSts)17 f wSts← GETFWSLICE(bwSts, trace)18 impAsstnRel← GETWRVARS( f wSts)endend19 cndDOMMuts← GETDOMMUTS( f reqAccdDOM)20 for domMut ∈ cndDOMMuts do21 bwSts← GETBWSLICE(domMut, trace)22 cndAsstn← GETWRVARS(bwSts)end23 explicitAsstn[ f unc]nf=1← ACCESSIBLES([ f unc]nf=1, [expAsstnRel])24 implicitAsstn[ f unc]nf=1← ACCESSIBLES([ f unc]nf=1, [impAsstnRel])25 candidateAsstn[ f unc]nf=1← ACCESSIBLES([ f unc]nf=1, [cndAsstn])26 oracles[ f unc]nf=1←{explicitAsstn∪ implicitAsstn∪ candidiateAsstn}end27 return (oracles[ f unc]nf=1)end(7) form explicit assertions using the accessible entries. We further (8) perform aforward slice on the extracted entities to identify statements that are implicitly af-fected by such entities, and (9) form implicit assertions using the collected entities,and (10) generate candidate assertions by performing steps (4), (5), and (6) on theinferred candidate DOM element properties (3).Our overall unit-level assertion generation is presented in Algorithm 4. In thefollowing sections we describe our technique for extracting DOM related informa-1021@Test2publicvoidtestCase1(){...8cart.click();9StringexpectedMsg="Totalpurchaseis:$70";10Stringmsg=driver.findElements(By.cssSelector("div.shopContainer")).getText();11assertEquals(msg,expectedMsg);12}22customer.payable+=price;...30$("div.shopContainer").append("<p>"+"Totalpurchaseis:$"+customer.payable+"</p>");12(a)(b)3Figure 5.3: Finding (1) intra DOM assertion dependency within the test case (b), (2) inter DOM assertiondependency between (b) DOM-based assertion and (a) the JavaScript code, and (3) the initial point ofcontact between (b) DOM-based assertion and (a) the JavaScript code.tion from the execution (Section 5.3.1), relating DOM mutations to the JavaScriptcode (Section 5.3.2), and generating unit test assertions (Section 5.3.3).5.3.1 Extracting DOM-Related CharacteristicsThe DOM connects a test case to the web application’s code. Therefore, we firstneed to analyze the DOM-based test suite and extract the following pieces of infor-mation: (1) DOM-related operations of the existing test suite that may have tightconnection with the JavaScript code, and (2) frequently accessed DOM properties,which are potentially influential in improving the fault finding capability of the testsuite, but are left unchecked in the manually-written test suite.DOM-Related Operations. Any written test case needs to check the correctnessof the application’s behaviour. In a DOM-based test case, the expected behaviouris checked through DOM-based assertions. A DOM-based assertion is defined as< domProps,expVal >, where domProps consists of one or more DOM elementfeatures (e.g. attribute, and/or textual value), and expVal is the correct value ex-pected by the assertion. 
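For instance, the assertion in the Selenium test case of Figure 5.1(b) could be represented as such a < domProps, expVal > tuple along the following lines; the object layout shown here is our own illustration rather than Atrina's internal representation.

// A DOM-based assertion modelled as <domProps, expVal>: the DOM element
// features checked by the tester, plus the expected value they are compared to.
var domAssertion = {
  domProps: [{ selector: "div.shopContainer", property: "textContent" }],
  expVal: "Total purchase is: $70"
};

// Each such tuple names the DOM property whose evolution has to be traced
// back into the JavaScript code that last wrote it.
console.log(domAssertion.domProps[0].selector + " ==> " + domAssertion.expVal);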
In the rest of the sections, we call the DOM element featureas a DOM property. DOM-based assertions play a significant role in our approachas they guide us towards important portions of the underlying JavaScript code thatneed to be checked in unit-level assertions.103For each DOM-based assertion we find intra DOM assertion dependency withinthe test case.Definition 1 (Intra DOM Assertion Dependency) An intra DOM assertion de-pendency is defined as a three tuple of < assert,domElems,domProps >, whereassert is the intended DOM-based assertion, domElems is the accessed DOM el-ements in the test case pertaining to the assertion, and domProps is the accessedDOM properties within the assertion.GETDOMACC in line 10 of Algorithm 4 retrieves DOM dependencies of the as-sertion in the test case. Going back to our example in Figure 5.3(b), tracking theassertion in line 11 shows that it has a DOM dependency to a div element withclass shopContainer, which is accessed in line 10. The intra DOM assertiondependency of the example further shows that the text value of the DOM elementis compared with the expectedMsg in line 11.We further need to correlate the inferred intra DOM assertion dependency withthe application’s code. We call the correlation between the DOM-based assertionand the application’s code as inter DOM assertion dependency.Definition 2 (Inter DOM Assertion Dependency) An inter DOM assertion de-pendency is defined as < assert, initPoint >, where assert is the intended DOM-based assertion, and initPoint is the initial line of code in the application that isresponsible for mutating the property of a DOM element extracted from the intraDOM assertion dependency.In order to find the initial point of contact between the application’s code and amutated DOM property in the DOM-based test case, we track evolution of theaccessed DOM elements (GETDOMMUTS in line 13 of the algorithm) as well asinvoked event handlers as the test case runs. We consider DOM mutation as aDOM-tree structural change (e.g.; additions and removals of child nodes), as wellas DOM write operations such as changes to attributes and/or updates to childtext nodes. For instance, running the sample test case in Figure 5.1(b) results inmutating (1) the textual value of div element with class shopContainer, and(2) the class attribute of DOM element with ID couponButt.104In Section 5.3.2, we explain inferring the initial point of contact between thesource code and a mutated DOM element in a DOM-based test suite in details.Frequently Accessed DOM Properties. In addition to DOM-based assertions, wefurther consider DOM element properties that are frequently accessed within theapplication as the test case runs (lines 1 to 7 of Algorithm 4). ACC in line 6 of thealgorithm computes the access frequency of a DOM property, f reqAccdDOM inline 7 contains the inferred candidate DOM properties, and GETDOMMUTS in line19 records DOM mutations occur on candidate DOM properties.The intuition is that frequent use of a given DOM property can point to the ex-tent of application’s behaviour dependency on the DOM property. Thus, if changeshappen to a property through the JavaScript code, it is important to assert the cor-rectness of mutations on the property. We define the access frequency of a DOMelement property as the number of times that the element’s property has been readduring the execution of a test case. DOM properties include attributes as well astextual value of the elements. 
In order to record DOM property accesses within the application, we rewrite native function calls used by programmers to access DOM elements, such as getElementById, getElementsByClassName, and/or getElementsByTagName. The returned object from these functions is later used to access attributes or textual values of the element. Thus, we apply a forward slice on the returned object to find instances of element property access in the code. For example, in function addToCart of Figure 5.1(a), the DOM element with ID couponButt is assigned to the coupElem variable. The assigned variable is later used to access the class attribute as well as the value of the DOM element in lines 16, 18, and 19.

Let Acc(prop_el) be the access frequency computed for property prop of DOM element el; then:

Acc(prop_el) = Read(prop_el) / Σ_{e=1..n} Read(domElem_e),

where Read(domElem_e) is the number of times that DOM element domElem_e is read, given that the total number of DOM elements during the execution of a test case is n. Note that reading a DOM element refers to accessing the element to read the corresponding property. In Figure 5.1(a), the class attribute of the DOM element couponButt is read in lines 16 and 27, and thus the access frequency computed for the class attribute of the element is equal to 2/3.

7  function addToCart() {
8    var coupElem = $("#couponButt");
9    selItem = getItemInfo($(".merchandise"));
     ...
14   var price = selItem.price * selItem.quantity;
15   if (!coupon.expired) {
       ...
18     price -= coupElem.data('value');
19     coupElem.addClass(customer.couponStatus);
20     coupon.expired = true;
21   }
22   customer.payable += price;
23 }

Figure 5.4: Intra (data and control) code dependency through backward slicing.

We choose element properties with access frequencies above a threshold α as potential candidates, which are later used for the purpose of unit-level assertion generation. We automatically compute this threshold for each test case as:

α = 1 / ReadProperties(T),

where ReadProperties(T) is the total number of properties which have been read during the execution of test suite T.

Going back to our running example and the sample DOM-based test case in Figure 5.1, the class attribute of couponButt is selected as a potential candidate since its access frequency (2/3) is greater than the computed threshold, which is equal to 1/2 in this example.

5.3.2 Relating DOM Changes to the Code

To determine the initial point of contact between the DOM and the underlying application's code, we first cross-reference the DOM element as well as the property we are interested in with the set of DOM mutations obtained from the execution trace. The desired DOM element and its property are inferred from either the intra DOM assertion dependency or the candidate DOM properties, as described in Section 5.3.1. Recall that our execution trace contains information about triggered events, event handlers, and DOM mutations caused by the events. Therefore, we can identify relevant events and invoked functions corresponding to a given DOM mutation. For example, the collected execution trace in Figure 5.3 contains information about the mutations of a div element with class shopContainer, which pertains to the DOM-based assertion.

To figure out where the mutation originated in our execution trace, we keep a record of DOM accesses within the invoked functions. For each DOM access, we track the JavaScript lines of code that are responsible for updating the corresponding DOM element.
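A minimal sketch of this bookkeeping in the browser is shown below; wrapping getElementById and logging mutations through a MutationObserver are simplifications of ours (the Atrina implementation uses a proxy and the Mutation Summary library, as noted in Section 5.3.4), and read counts are kept per element rather than per property for brevity.

// Count element reads (to compute access frequencies) and log DOM mutations,
// so that a mutated property can later be related back to the accessing code.
var readCounts = {};   // element id -> number of read accesses
var mutationLog = [];  // observed structural/attribute/text changes

var nativeGetById = document.getElementById.bind(document);
document.getElementById = function (id) {
  readCounts[id] = (readCounts[id] || 0) + 1;        // record the read
  return nativeGetById(id);
};

new MutationObserver(function (records) {
  records.forEach(function (r) {
    mutationLog.push({ target: r.target.nodeName, type: r.type, time: Date.now() });
  });
}).observe(document, { subtree: true, childList: true, attributes: true, characterData: true });

// Access frequency of an element, following the definition above:
function accessFrequency(id) {
  var total = 0;
  for (var key in readCounts) { total += readCounts[key]; }
  return total === 0 ? 0 : readCounts[id] / total;
}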
Going back to our example in Figure 5.3, given that the textualproperty of the div element is extracted from the intra DOM assertion depen-dency, we identify line 30 in function viewCart as the initial point of contactresponsible for changing the text of DOM element.After inferring DOM mutant statements, we identify the control and data intracode dependency within the application’s code.Definition 3 (Intra Code Dependency) An intra code dependency is defined as< criterion,codeSts >, where criterion is a variable at the initial point of contact,and codeSts is the set of control and data dependent statements that are eitheraffected by the ctiterion or have some effect on the criterion.To find the intra code dependency, we perform backward as well as forwardslicing by using criterion as the slicing criterion. GETBWSLICE in lines 15 and 21of Algorithm 4 computes a backward slice with respect to assertion related DOMmutations, and candidate DOM property mutations respectively. We use dynamicslicing to capture run-time dependencies. Note that instrumenting the entire appli-cation’s code to perform dynamic slicing incurs high performance overheads. Toavoid high overheads, we first intercept the code sent from the server to the client,and then statically instrument only those statements that may affect a given DOMelement. To extract the subset of the code statements, we first find the JavaScriptclosure scope which contains the definition of the variable in the initial slicingcriteria. Then all references to the variable within the closure scope are found.Therefore, we can identify all locations in the code where the variable is updated,read, or a new alias is created. For each variable update/read related to the variableof the slicing criteria, we track the data dependencies for such an operation. Theaforementioned steps are performed iteratively for each dependencies to collect thesubset of code statements, which are instrumented for a given initial slicing crite-ria. The instrumented code keeps track of all updates and accesses to all relevantdata and control dependencies. Once the test case runs, we collect traces fromthe instrumented code. This trace is used to dynamically extract backward slicingas well as forward slicing statements. Note that in addition to backwards slicing107which is later used to generate explicit assertions, we also use forward slicing togenerate our implicit assertions (Section 5.3.3).The backward slicing technique starts by extracting instances of the initial slic-ing criteria from the trace. For each read operations, the trace is traversed back-wards to find the nearest related write operation. Once found, the write operationis added to the slice under construction. This process is repeated for all the data de-pendencies related to that write operation. A similar approach is taken for includingcontrol dependencies in the slice. Our slicing technique supports inter-proceduralslicing. For example, if a variable is assigned by the return value of a called func-tion, the slicer recursively tracks the function and performs a backward slice on thestatement returned by the called function.To address aliasing when computing the slice of a variable that has been setby a non-primitive value, we need to consider possible aliases that may refer tothe same object. Specifically in JavaScript dot notation and bracket notation arefrequently used to modify objects at run time. Since static analysis techniques forJavaScript often ignore this issue [45], we use dynamic slicing. 
If a reference toan object of interest is saved to a second object’s property, e.g. through the use ofthe dot notation, the object of interest may also be altered via aliases of the secondobject. For example, after executing statement a.b.c = objOfInterest;,updates to objOfInterest may be possible through a, a.b, or a.b.c. Todeal with such scenarios, our slicing technique searches through the collected traceand adds the forward slice for each detected alias to the current slice for our variableof interest (e.g. objOfInterest).Given customer.payable as the initial slicing criteria in our example, Fig-ure 5.4 shows the relevant backward slice statements (lines 22, 18, 15, 14, and 9),where customer.payable, variable price, as well as properties of the objectselItem are assigned, and the value of coupon.expired is checked in theconditional statement. By the end of backward slicing step, we have all the rele-vant statements corresponding to a given DOM element. These are later used toderive test assertions.1085.3.3 Generating Unit-Level AssertionsOur approach targets postcondition assertions which are used to examine the ex-pected behaviour of a given function after it is executed in a unit test case. Byanalyzing a given DOM-based test case, we generate unit-level assertions in thefollowing three categories: (1) explicit assertions, (2) implicit assertions, and (3)candidate assertions.Explicit AssertionsAfter collecting all the statements, that are relevant to a given DOM-based asser-tion, we extract accessible entities from these statements (ACCESSIBLES in line 23of the algorithm). Types of accessible entities include (1) the function’s returnedvalue, (2) the used global variables in that function, (3) the object’s property wherethe object is accessible in the outer scope of the function, and/or (4) the accessedDOM element in that function. Dynamic backward slice of a DOM-based asser-tion helps to (1) track all statements that contribute to the checked result and assuch identify those entities that might have influenced the checked property valueof the DOM element, and (2) eliminate unrelated entities that are not involved inthe computation that leads to the update performed on the checked DOM element.Since our dynamic slice is extracted from the program run, we can track all con-crete values associated with accessible entities. During the run of a test case, theremight be different instances where a given statement is executed. Different exe-cution instances can lead to different behaviour. Since we are using dynamic slic-ing, an instance that leads to the required behaviour, which is checked through theDOM-based assertion, is on the backward slice. Given that the manually-writtenexpected value, that is checked against the DOM’s property is valid, the concretevalues of related entities in the backward slice are potentially correct. Therefore,concrete value of an entity in the backward slice can be used as the expected valueof the entity in unit-level assertions to test the current version of the application(discussed in Section 5.4.4). 
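The step from a backward-slice trace to explicit assertions can be sketched as follows; the trace entry layout and the emitAssertion helper are hypothetical names we use for illustration, while the concrete values are the ones observed for the running example.

// Each (hypothetical) trace entry pairs an accessible entity with the concrete
// value it held during the passing DOM-based test run; that value becomes the
// expected value of the generated unit-level assertion.
var sliceTrace = [
  { entity: "customer.payable", value: 70 },
  { entity: "coupon.expired",   value: true },
  { entity: "selItem.price",    value: 100 },
  { entity: "selItem.quantity", value: 1 }
];

function emitAssertion(entry) {
  return entry.value === true
    ? "ok(" + entry.entity + ");"
    : "equal(" + entry.entity + ", " + JSON.stringify(entry.value) + ");";
}

console.log(sliceTrace.map(emitAssertion).join("\n"));
// equal(customer.payable, 70);
// ok(coupon.expired);
// equal(selItem.price, 100);
// equal(selItem.quantity, 1);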
explicitAsstn in line 23 of Algorithm 4 contains theinferred explicit assertions.In our running example (Figure 5.4), explicit assertions check the correctnessof customer.payable, coupon.expired, as well as price and quanti-1091test("addToCart",5,function(){...2var customer= {Id:"10", couponStatus:"default", payable:0};3var coupon= {Id:"1", expired:false}4var availItems= {{name:"jacket", price:100, count:2}}5varselItem= {name:"", price:0, quantity:0};6addToCart();7equal(customer.payable,70);8ok(coupon.expired);9deepEqual(selItem,{price:100,quantity:1});10equal(availItems[0].count,1);11equal(customer.couponStatus, '1-used');12});Figure 5.5: Generated QUNIT test case and assertions.ty properties which belong to selItem object. Assuming that the original priceof the item is 100, the number of selected item is 1, and the calculated discountaccording to the value attribute of a DOM element with ID couponButt is 30,then the expected values included in the assertions for each of the entities are 70,boolean value true, 100, and 1, respectively. Figure 5.5 shows a unit test case foraddToCart function with the generated assertions in QUNIT framework. Lines7 to 9 in the figure corresponds to the explicit assertions.Implicit AssertionsWe gather all the statements that explicitly affect the computations relevant to agiven DOM-based assertion. While assertions inferred from such statements areinherently important, we further need to consider entities that are implicitly in-fluenced by the checked DOM element in the manually-written test suite. For thispurpose we apply a dynamic forward slice on the statements collected from a back-ward slice of a DOM-based assertion. A forward slice with respect to a statementst, indicates how an operand at st is being subsequently. This can help the testerto ensure that st establishes the expected outcome of the computations assumed bylater statements.GETFWSLICE in line 17 of the algorithm computes forward slice on the vari-able operands of a statement in the backward slice. The process of forward slicingis similar to the backward slicing technique discussed earlier (Section 5.3.2). Theslicing criterion of the forward slice module is either a variable, object’s property,or an accessed DOM property extracted from the statements in a backward slice1107functionaddToCart(){8varcoupElem=$("#couponButt");9selItem=getItemInfo($(".merchandise"));10for(vari=0;i<availItems.length;i++){11if(availItems[i].name==selItem.name)12availItems[i].count-=selItem.quantity;13}...}Figure 5.6: Intra code dependency through forward slicing.7functionaddToCart(){15if(!coupon.expired){16coupElem.removeClass(customer.couponStatus);17customer.couponStatus=coupon.Id+'-'+'used';18price-=coupElem.data('value');19coupElem.addClass(customer.couponStatus);}}......Figure 5.7: Relating candidate DOM element to JavaScript code.segments of the code. The accessible entities (ACCESSIBLES in line 24), whichhave been set within the collected forward slice statements establish the implicitassertions. implicitAsstn in line 24 of Algorithm 4 contains the inferred implicitassertions. Figure 5.6 shows the intra code dependency obtained by performingforward slicing on the running example. As shown in the figure, the properties ofobject selItem are set in line 9, that is recorded during the backward slice pro-cess. Given line 9 as the forward slice criteria, we mark availItems.count(line 12) as an implicit assertion. 
Candidate Assertions

In addition to explicit and implicit assertions, we also verify the correctness of code-level entities pertaining to DOM updates that are important but not checked in the existing DOM-based test cases. We derive such unit-level assertions, which we call candidate assertions, from the candidate DOM element properties previously obtained from the test case execution (box 3 in Figure 5.2). As the test case runs, we monitor the DOM's evolution and match the list of mutated DOM elements and their properties with property updates of the candidate DOM elements. Once a match is found, we infer the backward slice statements pertaining to the mutation of the DOM element's property (GETBWSLICE in line 21 of the algorithm). In this case, the slicing criterion given as input to the backward slicing module is therefore an update to the property of the candidate DOM element. After gathering the related JavaScript statements within the application, we extract the accessible entities of these statements (ACCESSIBLES in line 25), which form our candidate assertions. candidateAsstn in line 25 contains our candidate assertions.

Recall from the running example that one such potential DOM property, which we record as part of Section 5.3.1, is the class attribute associated with the DOM element with ID couponButt. As shown in Figure 5.7, monitoring DOM changes reveals that line 19, where the class attribute of the element is set, is the initial point of contact between the DOM mutation and the JavaScript code. Given line 19 as the slicing criterion, customer.couponStatus (line 17) is marked as the candidate assertion. Line 11 in Figure 5.5 shows the candidate assertion generated for the addToCart function.

    7   function addToCart() {
          ...
    15    if (!coupon.expired) {
    16      coupElem.removeClass(customer.couponStatus);
    17      customer.couponStatus = coupon.Id + '-' + 'used';
    18      price -= coupElem.data('value');
    19      coupElem.addClass(customer.couponStatus);
          }
        }
    ...

Figure 5.7: Relating candidate DOM element to JavaScript code.

5.3.4 Tool Implementation: Atrina

We have implemented our JavaScript unit test assertion generation approach in an automated tool called Atrina. The tool is written in Java and is publicly available for download [2]. We use a proxy server to intercept HTTP responses that contain JavaScript code. The JavaScript Mutation Summary library [9] is used to track DOM changes during the execution of the test suite. Trace information is collected by the proxy once received from the browser. To instrument Selenium test cases, we convert them into an abstract syntax tree (AST) by employing the Eclipse Java development tools (JDT). Once the transformation is done, we run the Java code of the changed AST on the application under test.
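As a rough illustration of the DOM-change tracking step, the sketch below records attribute-level DOM mutations with the standard MutationObserver API. Atrina itself relies on the Mutation Summary library, so this is only an approximation of the idea, and the recordedMutations structure is a hypothetical format chosen for this example.

    // Illustrative only: watching attribute changes (e.g., the class attribute
    // of #couponButt) while a DOM-based test case runs.
    var recordedMutations = [];
    var observer = new MutationObserver(function (mutations) {
      mutations.forEach(function (m) {
        if (m.type === "attributes") {
          recordedMutations.push({
            element: m.target.id,
            attribute: m.attributeName,
            oldValue: m.oldValue,
            newValue: m.target.getAttribute(m.attributeName)
          });
        }
      });
    });
    observer.observe(document.body, {
      attributes: true,
      attributeOldValue: true,
      subtree: true
    });
    // After the test case finishes, recordedMutations can be matched against
    // the candidate DOM element properties collected earlier.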
5.4 Empirical Evaluation

To quantitatively assess the efficacy of our test generation approach, we have conducted a case study in which we address the following research questions:

RQ1 How accurate is Atrina in mapping DOM-based assertions to the corresponding JavaScript code?

RQ2 How effective is Atrina in generating unit test assertions that detect faults?

RQ3 Are the assertions generated by Atrina more effective than DOM-based assertions written manually by the tester in terms of fault finding capability?

RQ4 How does Atrina compare to existing mutation-based techniques for generating unit test assertions?

Atrina and the experimental data are available for download [2].

5.4.1 Objects

Our study includes seven open source JavaScript web applications that have SELENIUM test cases. Table 5.1 presents the experimental objects and their properties. Phormer [10] is a photo gallery web application. EnterpriseStore [5] is an asset management web application. WolfCMS [14] is a content management system. Claroline [4] is a collaborative online learning and course management system. AddressBook [1] is an address/phone book. StudyRoom [13] is a web-based outdoor study environment simulator, and Brotherhood [3] is an online social networking platform.

Table 5.1: Characteristics of the experimental objects.

ID   Name             LOC (JS)   # Test Cases   # Assertions
1    Phormer          1.5K       7              18
2    EnterpriseStore  57K        19             21
3    WolfCMS          1.3K       12             42
4    Claroline        36K        23             35
5    StudyRoom        10.6K      12             23
6    AddressBook      1.1K       13             14
7    Brotherhood      0.8K       10             10

5.4.2 Setup

To address our research questions, we provide the URL as well as the available manually written DOM-based test suite of each experimental object to Atrina. Unit-level test assertions are then automatically generated by the tool.

Accuracy (RQ1). To evaluate the accuracy of Atrina, we measure precision and recall. We manually compare the slices generated by Atrina with the JavaScript code that is relevant to each assertion. Precision and recall are defined as follows:

Precision is the fraction of lines in a slice produced by Atrina that are actually related to the human-written DOM-based assertion: TP / (TP + FP)

Recall is the fraction of the lines of code actually related to each assertion that is present in the slice produced by Atrina: TP / (TP + FN)

where TP (true positives), FP (false positives), and FN (false negatives) respectively represent the number of lines of code that are correctly reported, falsely reported, and missed as related to the DOM-based assertion.
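For instance, with the numbers later reported in Table 5.2 for EnterpriseStore (ID 2), a slice containing TP = 861, FP = 18, and FN = 162 lines yields a precision of 861 / (861 + 18) ≈ 0.98 and a recall of 861 / (861 + 162) ≈ 0.84, i.e., the 98% and 84% reported for that application.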
Effectiveness (RQ2). To assess the effectiveness of Atrina, we measure the fault finding capability of the assertions generated by the tool. Moreover, to understand the effect of each type of assertion produced by Atrina in detecting faults, we compare the fault detection rate of using (1) exclusively explicit assertions, (2) explicit assertions and implicit assertions, and (3) explicit assertions and candidate assertions. Since explicit assertions compose the core body of our assertions, we consider implicit and candidate assertions in conjunction with explicit ones rather than separately.

The experimental objects do not come with a rich version history that would allow us to apply Atrina to real regression changes. Therefore, we mimic regression faults by automatically injecting mutations into the application and evaluate the tool's ability to detect the seeded faults. Using our recently developed mutation testing tool, MUTANDIS [79], we automatically inject 50 random first-order mutations into the JavaScript code of the applications. The mutation operators are chosen from a list of common operators, such as changing the value of a variable, modifying a conditional statement, and altering unary operations, as well as common mistakes made by developers when developing a given web application [82], e.g., changing the ID/tag name passed into DOM access functions such as getElementById or getElementsByTagName, and modifying the attribute name/value in setAttribute. A fault is considered detected if an assertion generated by Atrina fails when run on the mutated code, and our manual examination confirms that the failed assertion is detecting the seeded fault.

Comparison with human-written DOM-based Assertions (RQ3). To assess the usefulness of Atrina, we compare the human-written DOM-based assertions with the unit-level test assertions generated by our approach in terms of fault finding capability. Similar to RQ2, we perform fault injection on both. The faults injected into our experimental objects in response to RQ3 are the same as the ones that we seed into the applications to answer RQ2.

Comparison with Mutation-based Assertion Generation (RQ4). To assess how Atrina performs with respect to the current state-of-the-art oracle generation technique, we compare our tool's fault finding capability with the mutation-based assertion generation approach [48, 81]. To generate mutation-based assertions for the JavaScript code, we use the human-written DOM-based test suite as a means to execute the application and infer the execution traces required for the purpose of mutation analysis. We perform the following steps to generate test assertions using mutation analysis (the trace comparison in step 4 is illustrated in the sketch below):

1. Remove assertions from the human-written DOM-based test suite.

2. Execute the test suite on the original version of the application to obtain execution traces.

3. Inject mutations for the purpose of oracle generation.

4. Execute the human-written test suite on the generated mutants, and produce test oracles by comparing the execution traces obtained from the mutants and the original version of the application.

We generate 50 mutants to produce test assertions for each application; we choose 50 to balance coverage of different faults and execution time. Note that the implementation and evaluation of the mutation analysis technique both use mutation operators from our prior work [82]. Therefore, our evaluation is biased in favour of the mutation-based assertion generation approach over Atrina.
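A minimal sketch of the trace comparison step in this baseline is given below. The trace format, a map from program points to observed values, and the function name are assumptions made for illustration rather than the actual implementation of [48, 81].

    // Illustrative sketch of mutation-based oracle generation: an assertion is
    // produced for every program point whose observed value differs between
    // the original run and a mutant run.
    function deriveOracles(originalTrace, mutantTrace) {
      var oracles = [];
      Object.keys(originalTrace).forEach(function (point) {
        var original = originalTrace[point];
        var mutated = mutantTrace[point];
        if (JSON.stringify(original) !== JSON.stringify(mutated)) {
          // The value observed in the original run becomes the expected value.
          oracles.push("equal(" + point + ", " + JSON.stringify(original) + ");");
        }
      });
      return oracles;
    }

    // For example, deriveOracles({"customer.payable": 70}, {"customer.payable": 100})
    // yields ["equal(customer.payable, 70);"].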
5.4.3 Results

Accuracy (RQ1). Table 5.2 shows the number of correctly reported (true positive), incorrectly reported (false positive), and missed (false negative) JavaScript lines of code, as well as the precision and recall achieved by Atrina with respect to the human-written DOM-based assertions. The table also shows the number of explicit, implicit, and candidate assertions generated by Atrina, as well as the total number of assertions. The recall achieved for Phormer (ID 1), WolfCMS (ID 3), AddressBook (ID 6), and Brotherhood (ID 7) is 100%. For EnterpriseStore (ID 2), Claroline (ID 4), and StudyRoom (ID 5) the recall achieved is 84%, 79%, and 82%, respectively. Except for the EnterpriseStore and Claroline applications, for which the precision rate is 98%, the computed precision rate for the rest of the applications is 100%.

Table 5.2: Accuracy achieved by Atrina. The TP, FP, FN, precision, and recall columns refer to the backward slices; the remaining columns count the generated assertions.

App ID   # TP (LOC)   # FP (LOC)   # FN (LOC)   Precision (%)   Recall (%)   # Explicit   # Implicit   # Candidate   # Total
1        174          0            0            100             100          41           9            13            63
2        861          18           162          98              84           51           19           26            96
3        193          0            0            100             100          83           23           16            122
4        1446         29           385          98              79           72           29           31            132
5        1017         0            224          100             82           78           18           11            107
6        533          0            0            100             100          24           3            14            41
7        430          0            0            100             100          14           5            16            35
AVG      -            -            -            99.4            92.1         -            -            -             -

We noticed that the lower recall rate obtained by Atrina is mainly due to the use of third-party libraries. Currently, we only analyze the application source code and do not consider libraries in our slicing technique. The underlying assumption is that faults mainly originate from the application's code. The small drop observed in precision is due to functions that are called but not instrumented, owing to limitations in our current implementation. If the definition of a called function is not instrumented, we assume that the function call is related to our slice, while it may not be so. We also observed that in rare cases a variable is seemingly assigned from the return value of a function, although the return statement is not found in the body of the called function. Our current implementation includes such variable assignments in the pertaining slices. Note that both recall and precision can be improved to 100% with a more robust implementation of our technique.

We found that on average 6% of the human-written DOM-based assertions in our experimental objects are not connected to the JavaScript code, in the following scenarios: (1) HTML is used to transfer the data required by the client from the server (e.g., the required information is stored as meta-data in attributes within the HTML), (2) the web server is utilized to perform computations, (3) instead of dynamically generating the DOM structure through the JavaScript code, HTML fragments are retrieved from the server and injected into the page, and (4) CSS and HTML are used to perform the required changes to the user interface (e.g., the CSS transition property with hover is used to bypass JavaScript).

Effectiveness (RQ2). Figure 5.8 depicts the fault detection rate (in percent) achieved by (1) Atrina, (2) explicit assertions alone, (3) explicit assertions in conjunction with implicit assertions, and (4) explicit assertions in conjunction with candidate assertions. The numbers on each bar represent the number of faults detected by the corresponding assertion types. As shown in Figure 5.8, Atrina detects on average 63% of the total faults (ranging from 42% to 84%). The percentage of faults detected by explicit assertions alone is less than that detected through the combination of explicit with either implicit or candidate assertions. This indicates that implicit as well as candidate assertions are essential in improving the fault finding capability of Atrina.

Figure 5.8: Fault detection rate using different types of generated assertions (bar chart of fault detection rate in percent per experimental object, for Atrina, explicit-only assertions, explicit+implicit assertions, and explicit+candidate assertions).
By eliminating implicit and candidate assertions, the fault detection rate drops by 23% on average, with a maximum drop of 31% for the EnterpriseStore application (ID 2).

Figure 5.8 shows that the improvement contributed by implicit assertions is 6% on average, while the improvement due to candidate assertions is 19% on average. This indicates that candidate assertions play a more prominent role than implicit assertions in increasing the number of faults detected by Atrina. Not surprisingly, explicit assertions contribute the most among the three assertion types generated by Atrina. Explicit assertions detect 76% of the total detected faults on average (ranging from 69% to 94%). These assertions are derived directly from the DOM-based oracles written by the developer of the application, who has a deep knowledge of the application's behaviour. Therefore, it is not surprising that explicit assertions derived directly from such oracles have the highest impact on the fault finding capability of our tool.

Comparison with human-written DOM-based Assertions (RQ3). Figure 5.9 compares the fault detection rate achieved by the code-level assertions generated by Atrina with that of the human-written DOM-based assertions. The numbers shown on each bar represent the actual number of faults detected by the corresponding assertion generation technique. As shown in the figure, the percentage of faults found by Atrina is higher than that of the manually written DOM-based assertions for all applications. Overall, Atrina outperforms manual assertions in terms of fault finding capability by 31% on average (ranging from 6% to 45%). We observed that on average 52% of the candidate DOM properties that we select to construct our candidate assertions were ignored in the human-written DOM assertions, although their values are updated through the JavaScript code. We further noticed that for each manual DOM assertion that failed as a result of an injected fault, at least one explicit assertion fails in Atrina (three failed explicit assertions on average). We observed that most often the DOM assertions written by the tester are too generic in nature. Therefore, even when a DOM assertion detects a JavaScript fault, pinpointing the root cause of the error can be quite challenging. However, code-level assertions make it easier for the tester to localize the fault, as their locations directly correlate with the code.

Figure 5.9: Fault finding capability (fault detection rate in percent per experimental object and on average, comparing Atrina, the mutation-based approach, and human-written assertions).

We observed in several cases that the value of a DOM element property that is checked in the human-written test suite is later used in JavaScript code that involves internal computations only. If the seeded fault falls in the corresponding computational statements, the resulting error is not captured through the manually written DOM assertions. In such cases, implicit assertions are capable of detecting the error, which points to the importance of incorporating these types of assertions in our approach. We also noticed that around 66% of the faults found by implicit assertions are detected neither by explicit/candidate assertions nor by the human-written ones. This is because they require executing a more complex sequence of events to propagate to the observable DOM (e.g., when an object's property is assigned in a function and later used in updating a value of a DOM element after a specific event is triggered).

Comparison with Mutation-based Assertion Generation (RQ4).
Figure 5.9 presents the results of comparing the fault finding capability of Atrina with the mutation-based assertion generation technique. As shown in the figure, Atrina produces unit assertions that are more effective than those produced by the mutation-based technique in terms of fault finding capability. The assertions generated by Atrina surpass those generated by the mutation-based approach by 26% on average (ranging from 10% to 40%), although both the implementation and evaluation of the mutation-based technique use a common set of mutation operators (and thus our evaluation is biased towards mutation-based techniques). This points to the importance of incorporating the information that exists in human-written DOM-based test cases.

While the results demonstrate that Atrina is more effective than the mutation-based approach in terms of fault detection, we further investigate the efficiency of our approach in terms of time overhead. We compute the overhead of Atrina as the sum of the time required for (1) instrumenting the application, and (2) analyzing the collected trace to compute JavaScript slices. To calculate the time overhead of the mutation-based approach, we consider the total time required for running the test suite multiple times (once per mutation), generating mutants, as well as the time needed to compare the original and the mutated version of the application to generate assertions. Figure 5.10 shows the time overhead computed for each approach. Our results show that the time overhead for Atrina is 47 seconds on average, while the overhead computed for the mutation-based technique is 98 seconds on average. As shown in the figure, for the EnterpriseStore application (ID 2), which is the largest application we considered (57K LOC), time efficiency is increased by 58% using Atrina. This indicates that our approach significantly outperforms mutation-based assertion generation as far as time efficiency is concerned.

Figure 5.10: Time overhead for each approach (time in seconds per experimental object, for Atrina and the mutation-based approach).

5.4.4 Discussion

Fault Masking. As mentioned in Section 5.3.3, the concrete value of an entity in the computed backward slice can potentially be used as the expected value of that entity in explicit assertions to test the current version of the application. The actual values of the related entities in the backward slice are correct unless there exists a masked fault that is concealed in the chain of computations and thus does not propagate to the checked state of the DOM element. However, we conjecture that fault masking rarely happens in JavaScript web applications, as it is more prevalent in programs with many small expressions whose results are stored in several intermediate values. We also observed no fault masking occurrence during the evaluation of Atrina on the seven JavaScript applications used in this study.

Limitations. The effectiveness of the assertions generated by Atrina in terms of fault finding capability depends on the quality of the human-written DOM-based test cases. If the DOM assertions contained in the DOM-based test suite check irrelevant information, the explicit assertions obtained by our tool will point to entities that may not be important from the tester's point of view. This can also negatively affect the fault finding capability of implicit assertions, as they are indirectly inferred from the DOM-based assertions.
Moreover, if the human-written test suite does not exercise the application's states with effective DOM elements, our tool is not able to infer effective candidate assertions.

5.4.5 Threats to Validity

An external threat to the validity of our evaluation is the limited number of JavaScript applications used to measure the effectiveness of our approach. We mitigated this threat by using web applications from various domains and of varying code size and functionality. Another threat concerns validating failed assertions through manual inspection, which can be error-prone. To mitigate this threat, we carefully examined the code in which the assertion failed to make sure that the injected fault was indeed responsible for the assertion failure. Moreover, manual computation of the JavaScript slices to measure precision and recall is a time intensive task done by the authors, and thus could be error-prone. However, we made every effort to mitigate this threat by precisely examining the application's code.

The regression faults we inject to evaluate the effectiveness of Atrina may not be realistic. We mitigate this threat by injecting mutations that represent common JavaScript application faults, as well as by using real-world web applications and SELENIUM test cases written by developers.

5.5 Related Work

While automated test generation has been addressed extensively in the literature, there has been limited work on supporting the construction of test oracles. Recently, Harman et al. [58] have conducted a comprehensive survey of current techniques used to address the oracle problem. Mesbah et al. [76] automatically produce generic invariants in the form of soft oracles to test AJAX applications. JSART [78] automatically infers JavaScript invariants from the execution traces for the purpose of regression testing. Jalangi [105] is a framework to support writing heavy-weight dynamic analyses; it detects generic JavaScript faults such as null and undefined values, and type inconsistencies. Jensen et al. [63] incorporate server interface descriptions to test the correctness of communication patterns between client and server by learning the communication patterns from sample data in AJAX applications. Xie et al. explore test oracle generation for GUI systems [113]. Eclat [91] and DiffGen [108] are used for automatically generating invariant-based oracles. Our work is different from these approaches in that we use the DOM-related information available in a human-written test suite to infer unit-level assertions at the JavaScript code level. Moreover, we generate assertions that capture the application's behaviour, rather than generic and soft oracles.

Fraser et al. [48] propose a mutation-based oracle generation system called µTEST. µTEST automatically generates unit tests for Java object-oriented classes by employing a genetic algorithm that targets mutations with a high impact on the application's behaviour. They further enhance the system [47] to improve human comprehension by identifying relevant pre-conditions on the test inputs and post-conditions on the outputs. The authors assume that the tester will manually correct the generated oracles. However, the results on the effectiveness of such approaches, which rely on the "generate-and-fix" assumption to construct test oracles, are not conclusive [49].

Staats et al. [107] propose an oracle data selection technique, based on mutation testing, to produce oracles and rank the inferred oracles in terms of their fault finding capability.
This work suffers from the scalability issues of mutant-generation-based techniques as well as the problem of estimating the proper number of mutants required for generating an effective oracle data set. Similar to mutation-based techniques, differential test case generation approaches [42, 108] also target generating test cases that show the difference between two versions of a program. Pastore et al. [92] exploit a crowdsourcing approach to check assertions. In this approach the developer produces tests and provides sufficient API documentation for the crowd such that crowd workers can determine the correctness of assertions. However, recruiting a qualified crowd to generate test oracles can be quite challenging.

In the context of leveraging existing test cases to generate more complex tests, Pezzè et al. [94] propose a technique to construct integration tests that focus on class interactions by utilizing the unit test cases. The integration tests are formed by combining initialization and execution sequences of simple unit tests to form new ones. However, the proposed technique does not deal with assertions. eToc [110] and EvoSuite [46] use search-based techniques to evolve the initial population of test cases. Their main goal is to increase the code coverage achieved by the test suite. In this work, however, our aim is to increase the fault finding capability by focusing on test assertions rather than on increasing code coverage.

Milani Fard et al. [44] propose Testilizer, which utilizes the DOM-based test suite of a web application to explore alternative paths and consequently regenerate assertions for newly detected states. Our work differs from this approach in that we exploit DOM-related information in a human-written test suite to capture the behaviour of the application at the JavaScript unit level. Furthermore, they do not generate code-level assertions, which we do.

5.6 Conclusions

In this work, we presented an automated technique to generate JavaScript unit test assertions; given a web application and a UI-level test suite, we generate assertions that can capture regression faults in the JavaScript code. We implemented our approach in an open-source tool called Atrina. We empirically evaluated Atrina on seven web applications. The results show that our approach (1) is accurate in mapping the existing UI-level assertions to the JavaScript code, (2) is effective in detecting regression faults (63% on average), (3) outperforms human-written DOM-based assertions in terms of fault finding capability (by 31% on average), and (4) generates unit assertions that are more effective (by 26% on average) than those produced by a mutation-based technique.

The results indicate that existing higher-level test assertions can be leveraged to generate unit-level assertions. In our current approach we rely on the parts of the code that are covered by the human-written test cases. Our future work will include using learning-based techniques to generate unit-level assertions for parts of the code that are not examined through existing human-written tests.

Chapter 6

Conclusions

JavaScript is increasingly being used to create modern interactive web applications that offload a considerable amount of their execution to the client side. JavaScript is a notoriously challenging language for web developers to use, maintain, analyze, and test.
The work presented in this dissertation aims at improving the state-of-the-art in testing JavaScript web applications.6.1 Revisiting Research QuestionsIn the beginning of this thesis, we designed two research questions. We believethat the contributions show that we have addressed the research questions.6.1.1 Research Question 1How can we generate effective test cases for JavaScript web applications?To this end, we proposed a set of techniques to automatically test the correctbehaviour of the application. Mutation analysis has subsequently been used toassess the effectiveness of the generated tests.Chapter 2. We proposed JSART, which targets web application testing from theinvariant assertions points of view. Program invariants formulate the main char-acteristics of the application under test that remain unchanged as the applicationevolves. Therefore, they can be used towards regression testing. Our techniquefirst infers such invariants from the application, and then convert them to test asser-tions. Our empirical study shows that the proposed approach is able to infer stableassertions and accurately detect regression faults. Our technique is involved with125minimal performance overhead, and thus it may also scale well for industrial appli-cations in practice. Moreover, manual detection of invariants is a time consumingtask. Therefore, the automation brought by our tool can reduce the manual effortinvolved with inferring invariants.However, our invariant generation technique is based on the assumption thatthe program specifications are not changed frequently in subsequent revisions. Ifmajor changes affect the core properties of the application, the inferred invariantsfrom the original version may not be valid any longer. Furthermore, it may notbe possible to convert every program’s characteristic into a useful invariant. Forinstance, game applications are usually involved with huge amount of state changesas the application executes. In such cases, the application may contain only a fewinvariants.Chapter 4. In order to generate a more generic type of oracles that can be usedduring the testing cycle of various applications, including unit and GUI testing, weproposed JSEFT in Chapter 4. JSEFT generates test cases combined with post-condition assertions at the two complementary levels of unit and event-based tests.We use mutation testing to produce our assertions. To evaluate the effectivenessof JSEFT we consider a state-of-the-art JavaScript test generation framework as abasis to compare our technique. The results of the empirical evaluation indicatethat the approach generates test cases with high fault finding capability.Using our tool makes it easier for the tester to find the root cause of the errorthrough the fine grained unit assertions. Moreover, our tool provides a better testsuite comprehension as we reduce the number of test cases. Note that the generatedtests can become hard to understand because of the huge number of test cases andassertions.Though the evaluation results points to the effectiveness of JSEFT in detect-ing faults, we found out that the generated event-based tests are brittle. Therefore,if trivial changes occur on the GUI of the application, our event-based assertionsmay not be valid anymore. A more robust mutation-based assertion generationis required to address this problem. 
Further analysis of the results obtained fromJSEFT revealed that (1) although, the generated assertions by JSEFT are effectivein detecting the injected faults, the use of mutation testing for the purpose of asser-tion generation can negatively impact the performance, and (2) event-based tests126can potentially miss the code-related errors if the fault does not propagate to theobservable GUI state. We observed that the rate of missed faults by DOM-basedtest cases is higher for the applications that have tight interaction with the GUIthrough the underlying executable code. These two observations form the basisof Chapter 5, where we make use of the GUI-dependent assertions as a guide togenerate code related unit-level assertions.Chapter 5. Fruitful observations from analyzing the results of JSEFT led us to pro-pose Atrina, which utilizes the existing GUI-based (i.e., DOM) assertions as wellas useful execution information inferred from a GUI test suite to automatically gen-erate assertions used for testing individual functions. Our results confirm that thegenerated unit-level assertions surpass the fault finding capability of DOM-basedassertions. We also found out that Atrina outperforms mutation-based assertiongeneration technique in terms of the time efficiency. Though the current results arepromising, we acknowledge that more studies are required to draw more generalconclusions.Further Observations. During the evaluation of different test generation tech-niques proposed in this thesis, we realized that using function closures is quitepopular in writing JavaScript applications. Function closures in JavaScript lan-guage provide a way to make variables and functions private, thus keeping themout of the global scope. While function closures can be called during the testingprocess at the highest program scope they belong to, it is not possible to call themdirectly in test cases, which makes it challenging to assess their outcomes. Onepossible future direction is to measure the extent of such hard-to-test code writ-ten by developers by conducting a thorough empirical study. The results of thestudy can be used towards generating effective test cases by identifying hard-to-test scopes, and if possible expose them to the testing unit through automated coderefactoring. JavaScript developers can also make use of the results of empiricalstudy as a coding recommendation to make their future applications more testableand consequently more maintainable.6.1.2 Research Question 2How can we effectively assess the quality of the existing JavaScript test cases?127We used mutation analysis as the test assessment technique. In the following,we explain our observations as well as potential future work to further improve andexpand the use of mutation analysis.Chapter 3. To assess the quality of test cases, we proposed MUTANDIS, a genericmutation testing approach, that guides the mutation generation towards behaviour-affecting mutants in error-prone portions of the code. The empirical evaluationindicates that MUTANDIS can (1) significantly reduce the number of equivalentmutants, and (2) guide testers towards designing test cases for important portionsof the code from the application’s behaviour point of view.One of the main challenges in adopting mutation testing in industrial environ-ments, is the time and manual effort involved with detecting equivalent mutants.According to a recent study [70], it takes 15 minutes on average to manually detectan equivalent mutant. 
Our current evaluation results show that our approach con-siderably reduces the number of such useless mutants. This indicates that MUTAN-DIS can potentially reduce the effort required for eliminating such useless mutants,though this needs to be investigated by a thorough user study in future. Testers canalso use MUTANDIS to assess and compare the adequacy of their web applicationtesting techniques.Stubborn Mutants. Reducing the number of equivalent mutants can potentiallyeliminate stubborn (hard-to-kill) mutants, which are particularly important for as-sessing the fault finding capability of test cases. The current evaluation resultsshow that MUTANDIS does not negatively influence the stubbornness of the mu-tants. However, our approach is not particularly designed to generate such muta-tions. We found out that the stubbornness of the mutants generated by MUTANDISstems from the inherent characteristics of the JavaScript functions. For example,one such characteristic is function variadicity, meaning that a function can be calledwith an arbitrary number of arguments. Therefore, one interesting future work di-rection is to enhance the mutation generation technique by taking into accountsuch specific function features. This way we can reduce the number of equivalentmutants while increasing the number of stubborn mutants.DOM-level Mutation. We proposed DOM-level mutation testing in Chapter 4to generate our DOM-based assertions. However, we currently considered only a128few type of GUI mutations. Moreover, DOM elements are randomly selected forthe purpose of mutation. To make DOM-level mutation more useful, we need toprecisely define the type and location of mutation operators. In DOM mutationthe output is the resulting state of an executed event. Therefore, the scope of themutation operators differs from the traditional code-based mutant generators. Asa future work to enhance our current mutation technique, we need to (1) define anew set of DOM-based mutation operators, and (2) design a new equivalent mutantdetection method, which is capable of identifying mutants that are equivalent attheir DOM-level properties.6.2 Concluding RemarksThe work presented in this thesis has focused on providing JavaScript applicationswith a rich set of new test automation techniques. Given the growing popularityof JavaScript and the challenges of testing this dynamic language, we see manyopportunities for using the proposed techniques in practice. Further, developerscan use our approaches not only to test the applications, but also to assess theadequacy of their web application tests.Although, the approaches proposed in this thesis have been tailored for JavaSc-ript-based applications, a number of contributions in test generation as well as testassessment techniques are applicable to other programming languages as well. Asfar as our test generation technique is concerned:• DOM-level mutation, which is proposed to generate DOM-based assertions,can be extended to support GUI-level mutation. 
The GUI mutation-basedoracle generation can be utilized to create test oracles for any type of appli-cation that has rich user interface interactions.• Our function coverage maximization, which is used to increase the numberof functions executed during the model extraction phase, is generic enoughto be applied in other event driven applications as well.• Our slicing-based technique, which is proposed to generate unit-level asser-tions by utilizing existing DOM-based test assertions, can be extended to129model GUI-level test cases (e.g.; written in Java). The extracted model isthen converted to the corresponding unit-level assertions.Our test assessment approach is designed for the purpose of mutation analysis,however the proposed FunctionRank metric, which measures the relative impor-tance of functions, can be used in program comprehension analysis as well. It isworth mentioning that although we proposed a set of specific JavaScript mutationoperators, the overall mutation generation methodology can be applied to otherprogramming languages (e.g.; Java) as well.130Bibliography[1] AddressBook. http://sourceforge.net/projects/php-addressbook/.Accessed: 2015-09-30. → pages 113[2] Atrina. https://github.com/saltlab/Atrina. Accessed: 2015-05-30. → pages112, 113[3] Brotherhood Social Network. https://github.com/HSJared/Social-Network/.Accessed: 2015-09-30. → pages 113[4] Claroline. http://www.claroline.net/. Accessed: 2015-03-30. → pages 113[5] WSO2 EnterpriseStore. https://github.com/wso2/enterprise-store.Accessed: 2015-03-30. → pages 113[6] jQuery API. http://api.jquery.com. Accessed: 2014-06-30. → pages 71[7] JSCover. http://tntim96.github.io/JSCover/. Accessed: 2014-01-30. →pages 88[8] JSeft. http://salt.ece.ubc.ca/software/jseft/. Accessed: 2014-06-30. →pages 69, 86, 87[9] Google. Mutation Summary Library.http://code.google.com/p/mutation-summary/. Accessed: 2014-10-30. →pages 112[10] Phormer Photogallery. http://sourceforge.net/projects/rephormer/.Accessed: 2015-03-30. → pages 113[11] Qunit. http://qunitjs.com. Accessed: 2014-06-30. → pages 2, 77, 86[12] Selenium. http://docs.seleniumhq.org/. Accessed: 2015-05-30. → pages 2[13] Environment Simulated Study Room.https://github.com/NhatHo/Environment-Simulated-Study-Room/.Accessed: 2015-09-30. → pages 113131[14] WolfCMS. https://github.com/wolfcms/wolfcms. Accessed: 2015-03-30.→ pages 113[15] String replace JavaScript bad design. http://www.thespanner.co.uk/2010/09/27/string-replace-javascript-bad-design/,2010. Accessed: 2012-08-30. → pages 47[16] GitHut. A small place to discover languages in GitHub. http://githut.info,2015. Accessed: 2015-04-30. → pages 2[17] StackOverflow.2015 developer survey.http://stackoverflow.com/research/developer-survey-2015#tech, 2015.Accessed: 2015-04-30. → pages 2[18] K. Adamopoulos, M. Harman, and R. Hierons. How to overcome theequivalent mutant problem and achieve tailored selective mutation usingco-evolution. In Proc. Genetic and Evolutionary Computation Conference(GECCO), pages 1338–1349. ACM, 2004. → pages 31, 65[19] N. Alshahwan and M. Harman. Automated session data repair for webapplication regression testing. In Proceedings of the Int. Conf. on SoftwareTesting, Verification, and Validation (ICST’08), pages 298–307. IEEEComputer Society, 2008. ISBN 978-0-7695-3127-4.doi:http://doi.ieeecomputersociety.org/10.1109/ICST.2008.56. → pages 28[20] S. Anand, E. K. Burke, T. Y. Chen, J. Clark, M. B. Cohen, W. Grieskamp,M. Harman, M. J. Harrold, and P. Mcminn. 
An orchestrated survey ofmethodologies for automated software test case generation. Journal ofSystems and Software, 86(8):1978–2001, August 2013. → pages 1[21] J. Andrews, L. Briand, and Y. Labiche. Is mutation an appropriate tool fortesting experiments? In Proc. Intl. Conference on Software Engineering(ICSE), pages 402–411. ACM, 2005. → pages 6, 31[22] S. Artzi, J. Dolby, S. Jensen, A. Møller, and F. Tip. A framework forautomated testing of JavaScript web applications. In Proc. 33rdInternational Conference on Software Engineering (ICSE’11), pages571–580, 2011. → pages 4, 5, 28, 32, 69, 70, 89, 94, 97[23] K. Ayari, S. Bouktif, and G. Antoniol. Automatic mutation test input datageneration via ant colony. In Proc. Annual Conference on Genetic andEvolutionary Computation (GECCO), pages 1074–1081. ACM, 2007. →pages 66132[24] E. F. Barbosa, J. C. Maldonado, and A. M. R. Vincenzi. Toward thedetermination of sufficient mutant operators for C. Software Testing,Verification and Reliability (STVR), 11(2):113–136, 2001. → pages 7, 64[25] V. Basili, L. Briand, and W. Melo. A validation of object orient designmetrics as quality indicators. IEEE Transaction on Software Engineering(TSE), 22(10):751–761, 1996. → pages 36[26] C.-P. Bezemer, A. Mesbah, and A. van Deursen. Automated securitytesting of web widget interactions. In Proc. Euro. Softw. Eng. Conf. andSymp. on the Foundations of Softw. Eng. (ESEC-FSE), pages 81–91. ACM,2009. → pages 14, 19[27] P. Bhattacharya, M. Iliofotou, I. Neamtiu, and M. Faloutsos. Graph-basedanalysis and prediction for software evolution. In Proc. InternationalConference on Software Engineering (ICSE), pages 419–429. ACM, 2012.→ pages 67[28] R. Binder. Testing Object-Oriented Systems: Models, Patterns, and Tools.Addison-Wesley, 2000. → pages 27[29] M. Boshernitsan, R. Doong, and A. Savoia. From Daikon to Agitator:lessons and challenges in building a commercial tool for developer testing.In Proc. International Symposium on Software Testing and Analysis(ISSTA’06), pages 169–180. ACM, 2006.doi:http://doi.acm.org/10.1145/1146238.1146258. → pages 16, 28[30] L. Bottaci. Type sensitive application of mutation operators fordynamically typed programs. In Proc. 5th International Workshop onMutation Analysis, pages 126–131, 2010. → pages 7, 65[31] S. Brin and L. Page. The anatomy of a large-scale hypertextual web searchengine. Computer Networks and ISDN Systems, 30(1-7):107–117, 1998.→ pages 36, 38[32] T. Budd and D. Angluin. Two notions of correctness and their relation totesting. Acta Informatica, 18(1):31–45, 1982. → pages 6, 31[33] A. Burgess. The 11 JavaScript mistakes you are making. http://net.tutsplus.com/tutorials/javascript-ajax/the-10-javascript-mistakes-youre-making/,2011. → pages 47133[34] L. A. Clarke and D. S. Rosenblum. A historical perspective on runtimeassertion checking in software development. ACM SIGSOFT SoftwareEngineering Notes, 31(3):25–37, 2006. → pages 28[35] T. H. Cormen, C. Stein, R. L. Rivest, and C. E. Leiserson. Introduction toAlgorithms. McGraw-Hill Higher Education, 2nd edition, 2001. → pages80[36] D. Crockford. JavaScript: The Good Parts. O’Reilly Media, Inc., 2008. →pages 32, 47, 48[37] C. Csallner, N. Tillmann, and Y. Smaragdakis. DySy: Dynamic symbolicexecution for invariant inference. In Proceedings of the 30th InternationalConference on Software Engineering (ICSE’08), pages 281–290. ACM,2008. → pages 16, 28[38] R. DeMillo, R. Lipton, and F. Sayward. Hints on test data selection: Helpfor the practicing programmer. Computer, 11(4):34–41, 1978. 
→ pages 80[39] L. Deng, J. Offutt, and N. Li. Empirical evaluation of the statementdeletion mutation operator. In Proc. International Conference on SoftwareTesting Verification and Validation (ICST), pages 84–93. IEEE ComputerSociety, 2013. → pages 8, 66[40] J. Domı´nguez-Jime´nez, A. Estero-Botaro, A. Garca-Domnguez, andI. Medina-Bulo. Evolutionary mutation testing. Information and SoftwareTechnology, 53(10):1108–1123, 2011. → pages 7, 65[41] S. Elbaum, G. Rothermel, S. Karre, and M. Fisher. Leveraging user-sessiondata to support web application testing. IEEE Trans. Softw. Eng., 31:187–202, 2005. ISSN 0098-5589. → pages 28[42] S. Elbaum, H. N. Chin, M. B. Dwyer, and M. Jorde. Carving and replayingdifferential unit test cases from system test cases. IEEE Transactions onSoftware Engineering (TSE), 35(1):29–45, 2009. → pages 94, 123[43] M. Ernst, J. Perkins, P. Guo, S. McCamant, C. Pacheco, M. Tschantz, andC. Xiao. The Daikon system for dynamic detection of likely invariants.Science of Computer Programming, 69(1-3):35–45, 2007. → pages 16, 19,28[44] A. M. Fard, M. Mirzaaghaei, and A. Mesbah. Leveraging existing tests inautomated test generation for web applications. In Proc. International134Conference on Automated Software Engineering (ASE’14), pages 67–78,2014. → pages 97, 123[45] A. Feldthaus, M. Schafer, M. Sridharan, J. Dolby, and F. Tip. Efficientconstruction of approximate call graphs for javascript ide services. In Proc.International Conference on Software Engineering (ICSE), pages 752–761.ACM, 2013. → pages 108[46] G. Fraser and A. Arcuri. The seed is strong: Seeding strategies insearch-based software testing. In Proc. International Conference onSoftware Testing Verification and Validation (ICST’12), pages 121–130.IEEE Computer Society, 2012. → pages 123[47] G. Fraser and A. Zeller. Generating parameterized unit tests. In Proc.International Symposium on Software Testing and Analysis (ISSTA’11),pages 364–374, 2011. → pages 4, 94, 122[48] G. Fraser and A. Zeller. Mutation-driven generation of unit tests andoracles. IEEE Transactions on Software Engineering (TSE), 38(2):278–292, 2012. → pages 4, 66, 81, 94, 97, 115, 122[49] G. Fraser, M. Staats, P. McMinn, A. Arcuri, and F. Padberg. Doesautomated white-box test generation really help software testers? In Proc.International Symposium on Software Testing and Analysis (ISSTA’13),pages 188–198. ACM, 2013.doi:http://doi.acm.org/10.1145/1146238.1146258. → pages 123[50] V. Garousi, A. Mesbah, A. B. Can, and S. Mirshokraie. A systematicmapping study of web application testing. Information and SoftwareTechnology, 55(8):1374–1396, 2013. → pages 10[51] M. Gligoric, L. Z. C. Pereira, and G. Pokam. Selective mutation testing forconcurrent code. In Proc. Intl. Symp. Softw. Testing and Analysis (ISSTA),pages 224–234. ACM, 2013. → pages 65[52] F. Groeneveld, A. Mesbah, and A. van Deursen. Automatic invariantdetection in dynamic web applications. Technical ReportTUD-SERG-2010-037, TUDelft, 2010. → pages 19[53] S. Guarnieri and B. Livshits. Gatekeeper: mostly static enforcement ofsecurity and reliability policies for JavaScript code. In Conference onUSENIX security symposium, SSYM’09, pages 151–168, 2009. → pages28135[54] A. Guha, S. Krishnamurthi, and T. Jim. Using static analysis for Ajaxintrusion detection. In Intl. conference on World Wide Web (WWW), pages561–570, 2009. ISBN 978-1-60558-487-4. → pages 28[55] P. Gurbani and S. Cinos. Top 13 JavaScript mistakes.http://blog.tuenti.com/dev/top-13-javascript-mistakes/, 2010. → pages 47[56] S. 
Hangal and M. S. Lam. Tracking down software bugs using automaticanomaly detection. In Proceedings of the 24th International Conference onSoftware Engineering (ICSE’02), pages 291–301. ACM Press, 2002. →pages 16, 28[57] M. Harman, Y. Jia, and W. Langdon. Strong higher order mutation-basedtest data generation. In Proc. ACM SIGSOFT International Symposium onFoundations of software engineering (FSE), pages 212–222. ACM, 2011.→ pages 66[58] M. Harman, P. McMinn, M. Shahbaz, and S. Yoo. A comprehensive surveyof trends in oracles for software testing. Technical Report CS-13-01,Department of Computer Science, University of Sheffield, 2013. → pages122[59] R. Hierons, M. Harman, and S. Danicic. Using program slicing to assist inthe detection of equivalent mutants. Software Testing, Verification, andReliability, 9(4):233–262, 1999. → pages 7, 65[60] T. Ho. Premature invocation.http://tobyho.com/2011/10/26/js-premature-invocation/, 2011. → pages 47[61] J. Hsu. JavaScript anti-patterns.http://jaysoo.ca/2010/05/06/javascript-anti-patterns/, 2010. → pages 47[62] K. Jalbert and J. Bradbury. Predicting mutation score using source codeand test suite metrics. In Proc. International Workshop on Realizing AISynergies in Software Engineering (RAISE), pages 42–46, 2012. → pages52[63] C. Jensen, A. Møller, and Z. Su. Server interface descriptions forautomated testing of JavaScript web applications. In Proc. EuropeanSoftware Engineering Conference and ACM SIGSOFT InternationalSymposium on Foundations of Software Engineering (ESEC/FSE’013).ACM, 2013. → pages 4, 94, 122136[64] C. Ji, Z. Chen, B. Xu, and Z. Zhao. A novel method of mutation clusteringbased on domain analysis. In Proc. 21st International conference onSoftware Engineering and Knowledge Engineering (SEKE), pages422–425, 2009. → pages 6, 64[65] Y. Jia and M. Harman. Constructing subtle faults using higher ordermutation testing. In Proceedings of the 8th International WorkingConference on Source Code Analysis and Manipulation (SCAM’08), pages249–258. ACM, 2008. → pages 7, 64[66] Y. Jia and M. Harman. An analysis and survey of the development ofmutation testing. IEEE Transactions on Software Engineering (TSE), 37(5):649–678, 2010. → pages 6, 7, 31, 64[67] M. Kintis, M. Papadakis, and N. Malevris. Isolating first order equivalentmutants via second order mutation. In Proc. International Conference onSoftware Testing Verification and Validation (ICST), pages 701–710. IEEEComputer Society, 2012. → pages 66[68] W. Langdon, M. Harman, and Y. Jia. Efficient multi-objective higher ordermutation testing with genetic programming. Journal of Systems andSoftware, 83(12):2416–2430, 2010. → pages 65[69] P. Loyola, M. Staats, I. Ko, and G. Rothermel. Dodona: automated oracledata set selection. In Proc. ACM SIGSOFT International Symposium onSoftware Testing and Analysis (ISSTA’14), pages 193–203, 2014. → pages4, 5[70] L. Madeyski, W. Orzeszyna, R. Torkar, and M. Jozala. Overcoming theequivalent mutant problem: A systematic literature review and acomparative experiment of second order mutation. IEEE Transactions onSoftware Engineering (TSE), 40(1):23–42, 2013. → pages 6, 7, 31, 60, 64,128[71] A. Marchetto and P. Tonella. Using search-based algorithms for Ajax eventsequence generation during testing. Empirical Software Engineering, 16(1):103–140, 2011. → pages 3, 5, 69, 93[72] A. Marchetto, P. Tonella, and F. Ricca. State-based testing of Ajax webapplications. In Proc. 1st Int. Conference on Sw. Testing Verification andValidation (ICST’08), pages 121–130. 
IEEE Computer Society, 2008. →pages 5, 28, 69137[73] T. McCabe. A complexity measure. IEEE Transaction on SoftwareEngineering (TSE), SE-2(4):308–320, 1976. → pages 42[74] A. Mesbah and S. Mirshokraie. Automated analysis of CSS rules tosupport style maintenance. In Proc. International Conference on SoftwareEngineering (ICSE’12), pages 408–418, 2012. → pages 10[75] A. Mesbah, A. van Deursen, and S. Lenselink. Crawling Ajax-based webapplications through dynamic analysis of user interface state changes.ACM Transactions on the Web (TWEB), 6(1):3:1–3:30, 2012. → pages 15,19, 49, 86, 93[76] A. Mesbah, A. van Deursen, and D. Roest. Invariant-based automatictesting of modern web applications. IEEE Transactions on SoftwareEngineering (TSE), 38(1):35–53, 2012. → pages 4, 5, 28, 32, 69, 73, 93,122[77] B. Meyer. Applying “design by contract”. Computer, 25(10):40–51, 1992.ISSN 0018-9162. doi:http://dx.doi.org/10.1109/2.161279. → pages 13[78] S. Mirshokraie and A. Mesbah. JSART: JavaScript assertion-basedregression testing. In Proc. International Conference on Web Engineering(ICWE’12), pages 238–252. Springer, 2012. → pages iii, 9, 11, 44, 49, 93,122[79] S. Mirshokraie, A. Mesbah, and K. Pattabiraman. Efficient JavaScriptmutation testing. In Proc. 6th International Conference on SoftwareTesting Verification and Validation (ICST’13), pages 74–83. IEEEComputer Society, 2013. → pages iii, 9, 84, 86, 114[80] S. Mirshokraie, A. Mesbah, and K. Pattabiraman. Pythia: Generating testcases with oracles for JavaScript applications. In Proc. InternationalConference on Automated Software Engineering (ASE), New Ideas Track,pages 610–615. IEEE Computer Society, 2013. → pages iii, 9[81] S. Mirshokraie, A. Mesbah, and K. Pattabiraman. JSEFT: AutomatedJavaScript unit test generation. In Proc. 8th International Conference onSoftware Testing Verification and Validation (ICST’15), pages 1–10. IEEEComputer Society, 2015. → pages iii, 9, 68, 97, 115[82] S. Mirshokraie, A. Mesbah, and K. Pattabiraman. Guided mutation testingfor javascript web applications. IEEE Transactions on SoftwareEngineering (TSE), 41(5):429–444, 2015. → pages iii, 9, 30, 114, 115138[83] N. Nagappan, T. Ball, and A. Zeller. Mining metrics to predict componentfailures. In Proc. International Conference on Software Engineering(ICSE), pages 452–461. IEEE Computer Society, 2006. → pages 36[84] A. S. Namin, J. H. Andrews, and D. J. Murdoch. Sufficient mutationoperators for measuring test effectiveness. In Proc. InternationalConference on Software Engineering (ICSE), pages 351–360. ACM, 2008.→ pages 7, 64[85] F. Ocariza, K. Pattabiraman, and B. Zorn. JavaScript errors in the wild: Anempirical study. In Proc. of the International Symposium on SoftwareReliability Engineering (ISSRE), pages 100–109. IEEE Computer Society,2011. → pages 32, 54[86] F. Ocariza, K. Bajaj, K. Pattabiraman, and A. Mesbah. An empirical studyof client-side JavaScript bugs. In Proc. ACM/IEEE InternationalSymposium on Empirical Software Engineering and Measurement(ESEM’13), pages 55–64. IEEE Computer Society, 2013. → pages 2, 95[87] F. J. Ocariza, K. Pattabiraman, and A. Mesbah. AutoFLox: An automaticfault localizer for client-side JavaScript. In Proceedings of the 5th IEEEInternational Conference on Software Testing, Verification and Validation(ICST’12), pages 31–40. IEEE Computer Society, 2012. → pages 15[88] A. Offutt and J. Pan. Detecting equivalent mutants and the feasible pathproblem. In International Conference on Computer Assurance(COMPASS), pages 224–236, 1996. 
→ pages 7, 31, 64
[89] A. Offutt and J. Pan. Automatically detecting equivalent mutants and infeasible paths. Software Testing, Verification, and Reliability, 7(3):165–192, 1997. → pages 7, 31, 64
[90] A. Osmani. Learning JavaScript Design Patterns. O'Reilly Media, 2012. → pages 47
[91] C. Pacheco and M. Ernst. Eclat: Automatic generation and classification of test inputs. In Proc. European Conference on Object-Oriented Programming (ECOOP'05), pages 504–527, 2005. → pages 122
[92] F. Pastore, L. Mariani, and G. Fraser. CrowdOracles: Can the crowd solve the oracle problem? In Proc. International Conference on Software Testing, Verification and Validation (ICST'13), pages 342–351. IEEE Computer Society, 2013. → pages 123
[93] K. Pattabiraman and B. Zorn. DoDOM: Leveraging DOM invariants for Web 2.0 application robustness testing. In Proc. Int. Conf. Sw. Reliability Engineering (ISSRE'10), pages 191–200. IEEE Computer Society, 2010. → pages 28
[94] M. Pezzè, K. Rubinov, and J. Wuttke. Generating effective integration test cases from unit ones. In Proc. International Conference on Software Testing, Verification and Validation (ICST'13), pages 11–20. IEEE Computer Society, 2013. → pages 123
[95] C. Porteneuve. Getting out of binding situations in JavaScript. http://www.alistapart.com/articles/getoutbindingsituations/, 2008. → pages 48
[96] S. Ratcliff, D. White, and J. Clark. Searching for invariants using genetic programming and mutation testing. In Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation (GECCO). ACM, 2011. → pages 28
[97] G. Richards, S. Lebresne, B. Burg, and J. Vitek. An analysis of the dynamic behavior of JavaScript programs. In Proc. of the Conference on Programming Language Design and Implementation (PLDI'10), pages 1–12. ACM, 2010. doi:10.1145/1806596.1806598. → pages 63
[98] E. Rodríguez-Carbonell and D. Kapur. Program verification using automatic generation of invariants. In Proc. 1st International Colloquium on Theoretical Aspects of Computing (ICTAC'04), volume 3407 of Lecture Notes in Computer Science, pages 325–340. Springer-Verlag, 2005. → pages 28
[99] D. Roest, A. Mesbah, and A. van Deursen. Regression testing Ajax applications: Coping with dynamism. In Proc. 3rd Int. Conf. on Sw. Testing, Verification and Validation (ICST'10), pages 128–136. IEEE Computer Society, 2010. → pages 12, 28
[100] B. L. Roy. Three common mistakes in JavaScript/EcmaScript. http://weblogs.asp.net/bleroy/archive/2005/02/15/Three-common-mistakes-in-JavaScript-2F00-EcmaScript.aspx, 2005. → pages 47, 48
[101] P. Runeson and M. Höst. Guidelines for conducting and reporting case study research in software engineering. Empirical Software Engineering, 14(2):131–164, 2009. → pages 20
[102] P. Saxena, D. Akhawe, S. Hanna, F. Mao, S. McCamant, and D. Song. A symbolic execution framework for JavaScript. In Proc. Symp. on Security and Privacy (SP'10), pages 513–528. IEEE Computer Society, 2010. doi:10.1109/SP.2010.38. → pages 4, 5, 69, 94
[103] D. Schuler and A. Zeller. Covering and uncovering equivalent mutants. Software Testing, Verification, and Reliability, 23(5):353–374, 2012. → pages 6, 7, 31, 58, 66
[104] D. Schuler, V. Dallmeier, and A. Zeller. Efficient mutation testing by checking invariant violations. In Proc. Intl. Symp. Softw. Testing and Analysis (ISSTA), pages 69–79. ACM, 2009. → pages 7, 43, 66
[105] K. Sen, S. Kalasapur, T. Brutch, and S. Gibbs. Jalangi: A selective record-replay and dynamic analysis framework for JavaScript. In Proc. European Software Engineering Conference and ACM SIGSOFT International Symposium on Foundations of Software Engineering (ESEC/FSE'13). ACM, 2013. → pages 4, 94, 122
[106] S. Sprenkle, E. Gibson, S. Sampath, and L. Pollock. Automated replay and failure detection for web applications. In ASE'05: Proc. 20th IEEE/ACM Int. Conf. on Automated Sw. Eng., pages 253–262. ACM, 2005. doi:10.1145/1101908.1101947. → pages 28
[107] M. Staats, G. Gay, and M. Heimdahl. Automated oracle creation support, or: How I learned to stop worrying about fault propagation and love mutation testing. In Proc. International Conference on Software Engineering (ICSE'11), pages 870–880, 2011. → pages 4, 5, 95, 123
[108] K. Taneja and T. Xie. DiffGen: Automated regression unit-test generation. In Proc. International Conference on Automated Software Engineering (ASE'08), pages 407–410. IEEE Computer Society, 2008. → pages 94, 122, 123
[109] A. Tarhini, Z. Ismail, and N. Mansour. Regression testing web applications. In Int. Conf. on Advanced Comp. Theory and Eng., pages 902–906. IEEE Computer Society, 2008. doi:10.1109/ICACTE.2008.82. → pages 12, 28
[110] P. Tonella. Evolutionary testing of classes. In Proc. ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA'04), pages 119–128, 2004. → pages 123
[111] E. Weyl. 16 common JavaScript gotchas. http://www.standardista.com/javascript/15-common-javascript-gotchas/, 2010. → pages 47, 48
[112] M. Woodward. Mutation testing - its origin and evolution. Information and Software Technology, 35(3):163–169, 1993. → pages 59
[113] Q. Xie and A. M. Memon. Designing and comparing automated test oracles for GUI-based software applications. ACM Transactions on Software Engineering and Methodology (TOSEM), 16(1), 2007. → pages 122
[114] L. Xu, B. Xu, Z. Chen, J. Jiang, and H. Chen. Regression testing for web applications based on slicing. In Proc. of Int. Conf. on Computer Software and Applications (COMPSAC), pages 652–656. IEEE Computer Society, 2003. → pages 28
[115] X. Yao, M. Harman, and Y. Jia. A study of equivalent and stubborn mutation operators using human analysis of equivalence. In Proc. International Conference on Software Engineering (ICSE), pages 919–930. ACM, 2014. → pages 53, 60
[116] L. Zhang, S. Hou, J. Hu, T. Xie, and H. Mei. Is operator-based mutant selection superior to random mutant selection? In Proc. International Conference on Software Engineering (ICSE), pages 435–444. ACM, 2010. → pages 7, 64
[117] L. Zhang, D. Marinov, and S. Khurshid. Faster mutation testing inspired by test prioritization and reduction. In Proc. Intl. Symp. Softw. Testing and Analysis (ISSTA), pages 235–245. ACM, 2013. → pages 66
[118] Y. Zheng, T. Bao, and X. Zhang. Statically locating web application bugs caused by asynchronous calls. In Proceedings of the Intl. Conference on the World-Wide Web (WWW), pages 805–814. ACM, 2011. → pages 28
