Open Collections

UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Directed test generation and analysis for web applications Milani Fard, Amin 2017

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Notice for Google Chrome users:
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.

Item Metadata


24-ubc_2017_may_milanifard_amin.pdf [ 2.23MB ]
JSON: 24-1.0340953.json
JSON-LD: 24-1.0340953-ld.json
RDF/XML (Pretty): 24-1.0340953-rdf.xml
RDF/JSON: 24-1.0340953-rdf.json
Turtle: 24-1.0340953-turtle.txt
N-Triples: 24-1.0340953-rdf-ntriples.txt
Original Record: 24-1.0340953-source.json
Full Text

Full Text

Directed Test Generation and Analysis forWeb ApplicationsbyAmin Milani FardBSc. Computer Engineering, Ferdowsi University of Mashhad, Iran, 2008MSc. Computing Science, Simon Fraser University, Canada, 2010A THESIS SUBMITTED IN PARTIAL FULFILLMENTOF THE REQUIREMENTS FOR THE DEGREE OFDoctor of PhilosophyinTHE FACULTY OF GRADUATE AND POSTDOCTORALSTUDIES(Electrical and Computer Engineering)The University of British Columbia(Vancouver)January 2017© Amin Milani Fard, 2017AbstractThe advent of web technologies has led to the proliferation of modern web appli-cations with enhanced user interaction and client-side execution. JavaScript (themost widely used programming language) is extensively used to build responsivemodern web applications. The event-driven and dynamic nature of JavaScript, andits interaction with the Document Object Model (DOM), make it challenging tounderstand and test effectively. The ultimate goal of this thesis is to improve thequality of web applications through automated testing and maintenance.The work presented in this dissertation has focused on advancing the state-of-the-art in testing and maintaining web applications by proposing a new set oftechniques and tools. We proposed (1) a feedback-directed exploration techniqueand a tool to cover a subset of the state-space of a given web application; the ex-ploration is guided towards achieving higher functionality, navigational, and pagestructural coverage while reducing the test model size, (2) a technique and a toolto generate UI tests using existing tests; it mines the existing test suite to infera model of the covered DOM states and event-based transitions including inputvalues and assertions; it then expands the inferred model by exploring alternativepaths and generates assertions for the new states; finally it generates a new testsuite from the extended model, (3) the first empirical study on JavaScript tests tocharacterize their prevalence and quality metrics, and to find out root causes forthe uncovered (missed) parts of the code under test, (4) a DOM-based JavaScripttest fixture generation technique and a tool, which is based on dynamic symbolicexecution; it guides the executing through different branches of a function by pro-ducing expected DOM instances, (5) a technique and a tool to detect JavaScriptcode smells using static and dynamic analysis.We evaluated the presented techniques by conducting various empirical studiesand comparisons. The evaluation results point to the effectiveness of the proposedtechniques in terms of fault detection capability and code coverage for test genera-tion, and in terms of accuracy for code smell detection.iiPrefaceResearch projects included in this dissertation have been either published or cur-rently under review. I have conducted the research described in this work in col-laboration with my supervisor, Ali Mesbah. I was the main contributor for theresearch projects, including the idea, tool development, and evaluations. I alsohad the collaboration of Mehdi Mirzaaghaei, and Eric Wohlstadter in two of thepresented work.The following list presents publications for each chapter.• Chapter 2:– Feedback-Directed Exploration of Web Applications to Derive TestModels [123]: A. Milani Fard, A. Mesbah, IEEE International Sympo-sium on Software Reliability Engineering (ISSRE’13), 278–287, 2013,©IEEE, Reprinted by permission;• Chapter 3:– Leveraging Existing Tests in Automated Test Generation for Web Ap-plications [126]: A. Milani Fard, M. Mirzaaghaei, A. Mesbah, IEEE/ACM International Conference on Automated Software Engineering(ASE’14), 67–78, 2014, ©ACM, Inc., Reprinted by permission;– Leveraging Existing Tests in Automated Web Test Generation: A. Mi-lani Fard, A. Mesbah, Submitted to a software engineering journal;• Chapter 4:– JavaScript: The (Un)covered Parts [125]: A. Milani Fard, A. Mesbah,Accepted at IEEE International Conference on Software Testing, Veri-fication and Validation (ICST’17), 11 pages, 2017;• Chapter 5:– Generating Fixtures for JavaScript Unit Testing [127]: A. Milani Fard,A. Mesbah, and E. Wohlstadter, IEEE/ACM International Conferenceon Automated Software Engineering (ASE’15), 190–200, 2015, ©IEEE,Reprinted by permission;iii• Chapter 6:– JSNose: Detecting JavaScript Code Smells [124]: A. Milani Fard, A.Mesbah, IEEE International Conference on Source Code Analysis andManipulation (SCAM’13), 116–125, 2013, ©IEEE, Reprinted by per-mission;ivTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ixList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiGlossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xivAcknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1 UI Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21.1.1 Test Model Generation . . . . . . . . . . . . . . . . . . . 21.1.2 UI Test Generation . . . . . . . . . . . . . . . . . . . . . 31.2 Unit Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41.2.1 JavaScript Test Quality Assessment . . . . . . . . . . . . 41.2.2 JavaScript Unit Test Generation . . . . . . . . . . . . . . 61.3 Code Maintenance . . . . . . . . . . . . . . . . . . . . . . . . . 71.3.1 JavaScript Code Smell Detection . . . . . . . . . . . . . . 71.4 Research Questions . . . . . . . . . . . . . . . . . . . . . . . . . 81.5 Publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 Feedback-Directed Exploration of Web Applications to Derive TestModels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122.2 Background and Motivation . . . . . . . . . . . . . . . . . . . . 132.3 Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162.3.1 Deriving Test Models . . . . . . . . . . . . . . . . . . . . 162.3.2 Feedback-directed Exploration Algorithm . . . . . . . . . 18v2.3.3 State Expansion Strategy . . . . . . . . . . . . . . . . . . 202.3.4 Event Execution Strategy . . . . . . . . . . . . . . . . . . 242.3.5 Implementation . . . . . . . . . . . . . . . . . . . . . . . 262.4 Empirical Evaluation . . . . . . . . . . . . . . . . . . . . . . . . 272.4.1 Experimental Objects . . . . . . . . . . . . . . . . . . . . 272.4.2 Experimental Setup . . . . . . . . . . . . . . . . . . . . . 282.4.3 Results and Findings . . . . . . . . . . . . . . . . . . . . 302.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322.6 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353 Leveraging Existing Tests in User Interface Test Generation for WebApplications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383.2 Background and Motivation . . . . . . . . . . . . . . . . . . . . 403.3 Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 433.3.1 Mining Human-Written Test Cases . . . . . . . . . . . . 433.3.2 Exploring Alternative Paths . . . . . . . . . . . . . . . . 473.3.3 Regenerating Assertions . . . . . . . . . . . . . . . . . . 483.3.4 Test Suite Generation . . . . . . . . . . . . . . . . . . . . 573.4 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . 573.5 Empirical Evaluation . . . . . . . . . . . . . . . . . . . . . . . . 583.5.1 Experimental Objects . . . . . . . . . . . . . . . . . . . . 583.5.2 Experimental Setup . . . . . . . . . . . . . . . . . . . . . 593.5.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 643.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 683.6.1 Applications . . . . . . . . . . . . . . . . . . . . . . . . 683.6.2 Generating Negative Assertions . . . . . . . . . . . . . . 683.6.3 Test Case Dependencies . . . . . . . . . . . . . . . . . . 693.6.4 Effectiveness . . . . . . . . . . . . . . . . . . . . . . . . 693.6.5 Efficiency . . . . . . . . . . . . . . . . . . . . . . . . . . 693.6.6 Threats to Validity . . . . . . . . . . . . . . . . . . . . . 703.7 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 703.8 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 734 JavaScript: The (Un)covered Parts . . . . . . . . . . . . . . . . . . . 744.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 754.2 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . 764.2.1 Subject Systems . . . . . . . . . . . . . . . . . . . . . . 764.2.2 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 794.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 854.3.1 Prevalence of Tests . . . . . . . . . . . . . . . . . . . . . 85vi4.3.2 Quality of Tests . . . . . . . . . . . . . . . . . . . . . . . 884.3.3 (Un)covered Code . . . . . . . . . . . . . . . . . . . . . 924.3.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . 954.4 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 984.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 995 Generating Fixtures for JavaScript Unit Testing . . . . . . . . . . . 1005.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1005.2 Background and Motivation . . . . . . . . . . . . . . . . . . . . 1035.2.1 DOM Fixtures for JavaScript Unit Testing . . . . . . . . . 1035.2.2 Challenges . . . . . . . . . . . . . . . . . . . . . . . . . 1065.2.3 Dynamic Symbolic Execution . . . . . . . . . . . . . . . 1065.3 Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1075.3.1 Collecting DOM-based Traces . . . . . . . . . . . . . . . 1085.3.2 Deducing DOM Constraints . . . . . . . . . . . . . . . . 1095.3.3 Translating Constraints to XPath . . . . . . . . . . . . . . 1125.3.4 Constructing DOM Fixtures . . . . . . . . . . . . . . . . 1155.3.5 Implementation Details . . . . . . . . . . . . . . . . . . . 1175.4 Empirical Evaluation . . . . . . . . . . . . . . . . . . . . . . . . 1215.4.1 Experimental Objects . . . . . . . . . . . . . . . . . . . . 1215.4.2 Experimental Setup . . . . . . . . . . . . . . . . . . . . . 1225.4.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 1245.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1265.6 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1275.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1286 Detecting JavaScript Code Smells . . . . . . . . . . . . . . . . . . . 1296.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1296.2 Motivation and Challenges . . . . . . . . . . . . . . . . . . . . . 1316.3 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1326.4 JavaScript Code Smells . . . . . . . . . . . . . . . . . . . . . . . 1346.4.1 Closure Smells . . . . . . . . . . . . . . . . . . . . . . . 1346.4.2 Coupling between JavaScript, HTML, and CSS . . . . . . 1376.4.3 Excessive Global Variables . . . . . . . . . . . . . . . . . 1396.4.4 Long Message Chain . . . . . . . . . . . . . . . . . . . . 1396.4.5 Nested Callback . . . . . . . . . . . . . . . . . . . . . . 1406.4.6 Refused Bequest . . . . . . . . . . . . . . . . . . . . . . 1416.5 Smell Detection Mechanism . . . . . . . . . . . . . . . . . . . . 1416.5.1 Metrics and Criteria Used for Smell Detection . . . . . . . 1446.5.2 Combining Static and Dynamic Analysis . . . . . . . . . 1456.5.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . 1486.6 Empirical Evaluation . . . . . . . . . . . . . . . . . . . . . . . . 148vii6.6.1 Experimental Objects . . . . . . . . . . . . . . . . . . . . 1496.6.2 Experimental Setup . . . . . . . . . . . . . . . . . . . . . 1496.6.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 1526.6.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . 1536.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1547 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1557.1 Revisiting Research Questions and Future Directions . . . . . . . 1557.2 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . 159Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161viiiList of TablesTable 2.1 Experimental objects (statistics excluding blank/comment lines,and JavaScript libraries). . . . . . . . . . . . . . . . . . . . . . 27Table 2.2 State-space exploration methods evaluated. . . . . . . . . . . . 28Table 2.3 Results of different exploration methods. . . . . . . . . . . . . 30Table 3.1 Summary of the assertion reuse/regeneration conditions for anelement e j on a DOM state s j, given a checked element ei onstate si. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51Table 3.2 DOM element features used to train a classifier. . . . . . . . . 54Table 3.3 Experimental objects. . . . . . . . . . . . . . . . . . . . . . . 59Table 3.4 Test suite generation methods evaluated. . . . . . . . . . . . . 60Table 3.5 DOM mutation operators. . . . . . . . . . . . . . . . . . . . . 62Table 3.6 Statistics of the original test suite information usage, averageover experimental objects. . . . . . . . . . . . . . . . . . . . . 65Table 4.1 Our JavaScript subject systems (60K files, 3.7 M productionSLOC, 1.7 M test SLOC, and 100K test cases). . . . . . . . . . 78Table 4.2 Test quality metrics average values. . . . . . . . . . . . . . . . 89Table 4.3 Statistics for analyzing uncovered code. The ”–” sign indicatesno instance of a particular code. . . . . . . . . . . . . . . . . . 93Table 5.1 Examples of DOM constraints, translated XPath expressions,and solved XHTML instances for the running example. . . . . 114Table 5.2 Constraints table for the running example. The “Next to negate”field refers to the last non-negated constraint. . . . . . . . . . . 116Table 5.3 DRT data structure for the running example. . . . . . . . . . . 119Table 5.4 Characteristics of experimental objects excluding blank/com-ment lines and external JavaScript libraries. . . . . . . . . . . . 122Table 5.5 Evaluated function-level test suites. . . . . . . . . . . . . . . . 123Table 5.6 Coverage increase (in percentage point) of test suites on rowsover test suites on columns. Statement and branch coverage areseparated by a slash, respectively. . . . . . . . . . . . . . . . . 125ixTable 6.1 Metric-based criteria for JavaScript code smell detection. . . . 143Table 6.2 Experimental JavaScript-based objects. . . . . . . . . . . . . . 149Table 6.3 Precision-recall analysis (based on the first 9 applications), anddetected code smell statistics (for all 11 applications). . . . . . 151Table 6.4 Spearman correlation coefficients between number of code smellsand code quality metrics. . . . . . . . . . . . . . . . . . . . . 153xList of FiguresFigure 2.1 State-flow graph of the running example. The shaded nodesrepresent partially expanded states, edges with dashed lines arecandidate events, and nodes shown in dashed circle are statesthat might be discovered after event execution. . . . . . . . . 15Figure 2.2 Processing view of FEEDEX, our feedback-directed explorationapproach. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16Figure 2.3 Simple JavaScript code snippet. . . . . . . . . . . . . . . . . 21Figure 2.4 A simple DOM instance. . . . . . . . . . . . . . . . . . . . . 21Figure 2.5 DOM tree comparison. . . . . . . . . . . . . . . . . . . . . . 23Figure 3.1 A snapshot of the running example (an organizer application)and its partial DOM structure. . . . . . . . . . . . . . . . . . 41Figure 3.2 A human-written DOM-based (SELENIUM) test case for theOrganizer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42Figure 3.3 Partial view of the running example application’s state-flowgraph. Paths in thick solid lines correspond to the coveredfunctionalities in existing tests. Alternative paths, shown inthin lines, can be explore using a crawler. Alternative pathsthrough newly detected states (i.e., s10 and s11) are highlightedas dashed lines. . . . . . . . . . . . . . . . . . . . . . . . . 42Figure 3.4 Processing view of our approach. . . . . . . . . . . . . . . . 44Figure 3.5 The subsumption lattice showing the is-subsumed-by relationfor generated assertions. . . . . . . . . . . . . . . . . . . . . 56Figure 3.6 Box plots of number of states, transitions, and test cases fordifferent test suites. Mean values are shown with (*). . . . . . 63Figure 3.7 Average number of assertions per state, before and after filter-ing unstable assertions. . . . . . . . . . . . . . . . . . . . . . 65Figure 3.8 Box plots of mutation score using different test suite genera-tion methods. Mean values are shown with (*). . . . . . . . . 66Figure 3.9 Box plots of JavaScript code coverage achieved using differenttest suites. Mean values are shown with (*). . . . . . . . . . . 67xiFigure 4.1 Distribution of studied subject systems. . . . . . . . . . . . . 77Figure 4.2 A hard to test JavaScript code snippet. . . . . . . . . . . . . . 82Figure 4.3 Distribution of JavaScript tests. . . . . . . . . . . . . . . . . 86Figure 4.4 Percentage of subjects with test per each quartile with respectto popularity (number of stars and watchers) and maturity (num-ber of commits and contributors). . . . . . . . . . . . . . . . 87Figure 4.5 Boxplots of the code coverage of the executed JavaScript tests.Mean values are shown with (*). . . . . . . . . . . . . . . . . 88Figure 4.6 Average number of assertions per test. . . . . . . . . . . . . . 90Figure 4.7 Test to total code ratio. . . . . . . . . . . . . . . . . . . . . . 91Figure 4.8 Test to total commits ratio. . . . . . . . . . . . . . . . . . . . 92Figure 5.1 A JavaScript function to compute the items total price. . . . . 103Figure 5.2 A DOM subtree for covering a path (lines 6-8, 10-14, and 18-20) of sumTotalPrice in Figure 5.1. . . . . . . . . . . . . 105Figure 5.3 A QUint test case for the sumTotalPrice function. TheDOM subtree of Figure 5.2 is provided as a fixture before call-ing the function. This test with this fixture covers the pathgoing through lines 6-8, 10-14, and 18 in sumTotalPrice. 105Figure 5.4 Processing view of our approach. . . . . . . . . . . . . . . . 108Figure 5.5 Restricted XPath grammar for modeling DOM constraints. . . 113Figure 5.6 Comparison of statement and branch coverage, for DOM-dependentfunctions, using different test suite generation methods. . . . . 124Figure 6.1 Processing view of JSNOSE, our JavaScript code smell detector. 142xiiList of Algorithms1 Feedback-directed Exploration . . . . . . . . . . . . . . . . . . . . 182 State-Flow Graph Inference . . . . . . . . . . . . . . . . . . . . . 463 Assertion Regeneration . . . . . . . . . . . . . . . . . . . . . . . . 504 Test Fixture Generation . . . . . . . . . . . . . . . . . . . . . . . 1105 JavaScript Code Smell Detection . . . . . . . . . . . . . . . . . . 147xiiiGlossaryList of acronyms and abbreviations.AJAX Asynchronous JavaScript and XMLAPI Application Program InterfaceAST Abstract Syntax TreeBFS Breadth-First SearchCFG Control-Flow GraphCSS Cascading Style SheetsDFS Depth-First SearchDOM Document Object ModelDTD Document Type DefinitionHTML HyperText Markup LanguageJS JavaScriptSFG State-Flow GraphSLOC Source Lines of CodeUI User InterfaceURL Uniform Resource LocatorXHR XMLHttpRequestXML Extensible Markup LanguagexivAcknowledgmentsI would like to express my deepest gratitude to my supervisor, Ali Mesbah, forhis exemplary supervision and guidance in the past four and a half years. He in-troduced me to the field of web software testing and analysis, and had a profoundinfluence on my research. I have always benefited from his insightful commentsand discussions. This thesis would not have been completed without his support.I collaborated with Eric Wohlstadter and Mehdi Mirzaaghaei in two of my re-search projects, whom I would like to thank. I am also thankful to Karthik Pattabi-raman and Ivan Beschastnikh for providing valuable comments and suggestions onmy PhD proposal. Moreover, I would like to thank Philippe Kruchten, MieszkoLis, Hasan Cavusoglu, and Guy-Vincent Jourdan for reading my dissertation andproviding valuable comments.Thanks also go to all faculty members, staff, and friends in the Electrical andComputer Engineering at UBC for providing such a nice academic environment. Inparticular, I have enjoyed the time with all the members in the Software Analysisand Testing Lab. I also gratefully acknowledge the financial support by the Nat-ural Sciences and Engineering Research Council of Canada (NSERC) through itsCanada Graduate Scholarship and Strategic Project Grants, as well as the Four YearDoctoral Fellowship of UBC, that helped me to focus full time on my research.Finally, I would like to thank my family and friends for their love and supportover the years. I am specially indebted to my wife, Hoda, for her patience, love,understanding, and encouragement since we started our life journey ten years ago.I wish to dedicate this thesis to her. Fatima, my beloved daughter, thank you alsofor coming into our life in the middle of my PhD career and bringing joy to ourfamily. And last but not the least, I would like to thank the Almighty for giving methe strength and patience to work through all these years.xvChapter 1IntroductionThe advent of web technologies has led to the proliferation of modern web ap-plications, such as Gmail, Google Drive, Facebook, and YouTube, with enhanceduser interaction and client-side execution. Due to the considerable impact of webapplications on economic, technical, and social activities, it is important to ensuretheir quality. Testing techniques aim at increasing the quality of functional prop-erties of a software to ensure they meet the requirements. On the other hand, thegoal of maintenance techniques is to increase the quality of non-functional (struc-tural) properties. In this work, we aim at improving the quality of web applicationsthrough both testing and maintenance.Web applications are often written in multiple languages such as JavaScript,HTML, and CSS. JavaScript is known to be the most widely used programminglanguage1, which is extensively used to build responsive modern web applications.The event-driven dynamic nature of JavaScript and its interaction with the Docu-ment Object Model (DOM2) [180], make it challenging for developers to under-stand [62], test [66, 130], and debug [141] effectively. Testing frameworks, suchas SELENIUM [42] for DOM-based user interface (UI) testing and QUNIT [40]for JavaScript unit testing, help to automate test execution. However, test casesare written manually, which is a tedious process with an incomplete result. Main-taining web applications is also difficult due to intricate dependencies among thecomponents written in different programming languages.1According to a recent survey of more than 56K developers conducted by Stack Overflow [167],and also exploration of the programming languages used across GitHub repositories [91].2The DOM is a tree-like structure that provides APIs for accessing, traversing, and mutating thecontent and structure of HTML elements at runtime.1The ultimate goal of this thesis is improving the quality of web applicationsthrough automated testing and maintenance. Such automated techniques reducethe time and effort of manual testing and maintenance. Towards this objective, weconsider three perspectives that are complementary to each other: (1) UI testing,(2) unit testing, and (3) code maintenance. The results of each perspective willbenefit the goal of this thesis in a different way.1.1 UI TestingTo avoid dealing separately with the complex interactions between multiple weblanguages, many developers treat the web application as a black-box and test itvia its manifested DOM and perform UI testing. Such DOM-based UI tests bringthe application to a particular DOM state through a sequence of actions, such asfilling a form and clicking on an element, and subsequently verify the existence orproperties of particular elements in that DOM state.1.1.1 Test Model GenerationModel-based testing uses models of program behaviour to generate test cases [175].These test models are either specified manually or derived automatically. Dynamicanalysis and exploration (also know as crawling) plays a significant role in auto-mated test model derivation for many automated testing techniques [68, 71, 74,113, 119, 121, 128, 129, 171]. Unlike traditional static hyper-linked web pages,crawling web applications is challenging due to event-driven user interface changescaused by client-side code execution.In the recent past, web crawlers have been proposed to explore event-drivenDOM mutations in web applications. For instance, CRAWLJAX [120] uses dynamicanalysis to exercise and crawl client-side states of web applications. It incremen-tally reverse engineers a model, called State-Flow Graph (SFG), where each vertexrepresents a runtime DOM state and each edge represents an event-driven transi-tion between two states. Such an inferred SFG can be utilized in DOM-based UItest case generation [121], by adopting different graph coverage methods (e.g., allstates/edges/paths/transitions coverage).Challenge. Most industrial web applications have a huge state-space and exhaus-2tive crawling – e.g., breadth-first search (BFS), depth-first search (DFS), or ran-dom search – lead to state explosion[177]. In addition, given a limited amount oftime, exhaustive crawlers can become mired in specific parts of the application,yielding poor functionality coverage. Since exploring the whole state-space can beinfeasible (state explosion) and undesirable (time constrains), it is challenging toautomatically derive an incomplete test model but with an adequate functionalitycoverage in a timely manner.Related Work. Efficient strategies for web application crawling [70, 73, 79, 134]try to discover as many states as possible in the shortest amount of time. However,discovering more state does not necessarily result in higher functionality coverage.Thummalapenta et al. [169] proposed a guided test generation technique for webapplications. Although their approach directs test generation towards higher busi-ness logic coverage, the application is crawled in an exhaustive manner that canlead to poor coverage.1.1.2 UI Test GenerationManually writing DOM-based UI test cases is inefficient with limited coverage,therefore crawling-based techniques [74, 121, 159, 169] have been proposed toautomate exploration and test generation.Challenges. Such test generation methods are limited in three areas:1. Input values: Valid input values is required for proper state-space coverage.Automatic input generation is challenging since many applications requirea specific type, value, or combination of inputs to expose the hidden statesbehind input fields and forms.2. Paths to explore: Since covering the whole state-space of many industrialweb applications is infeasible in practice, crawlers are limited to a crawldepth, exploration time or number of states. Not knowing which paths areimportant to explore results in obtaining a partial coverage of a specific re-gion of the application.3. Assertions: Any generated test case needs to assert the application behaviour.However, generating proper assertions, known as the oracle problem [184],3without human knowledge is challenging. Thus, many web testing tech-niques rely on generic invariants [121] or standard validators [66].Related work. Researchers have proposed techniques to use existing artifacts fortesting. Elbaum et al. [81] and Sprenkle et al. [165] leverage user-sessions datafor web application test generation. Schur et al. [159] infer behaviour modelsfrom enterprise web applications via crawling and generate test cases simulatingpossible user inputs. Similarly, Xu et al. [188] mine executable specifications ofweb applications from SELENIUM test cases to create an abstraction of the system.Yuan and Memon [192] propose an approach to iteratively rerun automaticallygenerated test cases for generating alternating test cases. Artzi et al. [66] presentArtemis, which uses the gathered information while random testing to generatednew test inputs. These approaches, however, do not use information in the existingtests and do not address test oracle generation. Yoo and Harman [190] proposea search-based approach to reuse and regenerate existing test data for primitivedata types. Pezze et al. [150] present a technique to generate integration test casesfrom existing unit test cases. Test suite augmentation techniques, such as [189],are used in regression testing to generate new test cases for the changed parts ofthe application. Wang et at.[181] propose aggregating tests generated by differentapproaches using a unified test case language. However, these techniques also donot consider the oracle generation problem.1.2 Unit TestingWriting DOM-based UI tests does not generally require an understanding of theclient side code under test as it is a black-box testing process, thus easier to write fortesters. However, DOM-based tests may miss code-level bugs that do not propagateto the DOM [130]; therefore JavaScript unit testing is important for proper testingof web applications.1.2.1 JavaScript Test Quality AssessmentTo assist developers with writing tests, there exist number of JavaScript unit test-ing frameworks, such as Mocha [36], Jasmine[31], QUnit [40], and Nodeunit [37],each having its own advantages and disadvantages [52]. However, even by us-4ing a testing framework, some JavaScript features, such as DOM interactions,event-dependent callbacks, asynchronous callbacks, and closures (hidden scopes),are considered to be harder to test [28, 46, 48, 53, 127, 130, 173]. Thus the re-search community have proposed some automated test generation techniques forJavaScript programs to partially assist with hard to test code [66, 95, 127, 130, 131],though are not considerably used by testers and developers yet. Currently there isno (large scale) study on JavaScript (unit) tests in the wild to evaluate difficultiesin writing such tests. Moreover, studying the quality of JavaScript tests can revealsome shortcomings and difficulties of manual testing, which provides insights onhow to improve existing JavaScript test generation tools and techniques.Challenges. Test quality assessment is, however, not a trivial task and can besubjective. For instance a good test quality metric that considers test effective-ness, is fault/bug detection capability. Ideally, this requires manual investigationof JavaScript bug reports, which confines the study to JavaScript projects that havebug reports associated to some test cases. As an alternative, mutation score, i.e.,the percentage of killed mutants over total non-equivalent mutants, is often usedas an estimate of defect detection capability of a test suite. However, running testsuites on many generated mutated versions of JavaScript programs can be very timeconsuming to apply for a large scale study. Consequently, some other test qualitymetrics, such as code coverage, and average number of assertions per test, can beused. While code coverage does not directly imply a test suite effectiveness [97],it is a widely accepted test quality indicator. Thus finding the root cause of why aparticular statement is not covered by a test suite, can help in writing higher qualitytests.Related work. Ocariza et al. [142] performed study to characterize root causesof client-side JavaScript bugs. Researchers also studied test cases and mining testsuites in the past. Inozemtseva et al. [97] found that code coverage does not directlyimply the test suite effectiveness. Zhang et al. [199] analyzed test assertions andshowed that existence of assertions is strongly correlated with test suite effective-ness. These work, however, did not study JavaScript tests. Mirshokraie et al. [129]presented a JavaScript mutation testing approach and as part of their evaluation,assessed mutation score for test suites of two JavaScript libraries.51.2.2 JavaScript Unit Test GenerationUnit testing is a software testing method to test individual units of a source code.In generic unit testing, a test fixture is a fixed state of the software under test that isset up to provide all the required resources in order to properly run a test. In client-side JavaScript unit testing, test fixtures can be in the form of a partial HTML forthe expected DOM structure before calling the JavaScript function under tests. Ifsuch a DOM-based fixture is not provided in the exact structure as expected bythe function, a DOM API method (e.g., getElementById()) returns null,thus the execution of the function fails due to the null exception and the test caseterminates prematurely. Therefore proper DOM fixtures are required for effectiveJavaScript unit testing. Since manual construction of test fixtures is tedious andcostly, automated fixture generation is of great value.Challenges. Automated generation of proper test fixtures is challenging as it re-quires inferring and solving constraints in the code. This is even more complicatedwhen DOM structure is involved. There are two main challenges in generatingproper DOM-based fixtures:1. DOM-related variables: JavaScript is a weakly-typed and highly-dynamiclanguage, which makes static code analysis quite challenging. Moreover,its interactions with the DOM can become difficult to follow [62, 141]. Afixture generator needs to determine DOM-dependent variables that refer tovalues/properties of DOM elements.2. Hierarchical DOM relations: Unlike most test fixtures that deal only withprimitive data types, DOM-based test fixtures require a tree structure. In fact,DOM fixtures not only contain proper DOM elements with attributes andtheir values, but also hierarchical parent-child relations that can be difficultto reconstruct.Related work. Most JavaScript test generation techniques [66, 95, 158, 161] do notconsider DOM fixture generation. Li et al. [109] proposed SymJS, which appliessymbolic execution with a limited support for the DOM. They consider substitutingDOM element variables with integer or string values and using a solver, rather thanactually generating the hierarchical DOM structure. Mirshokraie et al. [130] pro-posed JSeft that captures the full DOM tree during the execution of an application6before executing the function under test, and uses that DOM as a test fixture. Thisapproach, assumes that the captured DOM contains the DOM structure as expectedby the function, which is not always the case. As such, the code coverage achievedwith such a DOM can be quite limited. Moreover, the DOM captured this way, canbe too large and difficult to read as a fixture in a test case.1.3 Code MaintenanceWeb applications must continually evolve and be maintained to remain functional.Maintenance may involve refactoring to improve the internal logical structure ofthe code without changing its external behaviour.1.3.1 JavaScript Code Smell DetectionJavaScript code maintenance is particularly challenging because (1) there is typ-ically no compiler in the development cycle that would help developers to spoterroneous or unoptimized code; (2) JavaScript has a dynamic, weakly-typed, asyn-chronous nature, and supports intricate features such as prototypes, first-class func-tions, and closures; and (3), it interacts with the DOM through a complex event-based mechanism [180]. As a result JavaScript applications tend to contain manycode smells, i.e., patterns in the code that indicate potential comprehension andmaintenance issues [85]. Code smells, once detected, need to be refactored to im-prove the design and quality of the code.Challenges. Manual JavaScript code smell detection is time consuming and error-prone, thus automated tools can be very helpful. Detecting code smells is de-pendent on identifying objects, classes, and functions in the code, which is notstraightforward in JavaScript due to its very flexible model of objects and func-tions. JavaScript code refactoring is also challenging due to dynamism, wide use ofglobal variables and first-class functions, which can cause unexpected side-effectswhen making code changes.Related work. WebScent [137] is a tool to detect client-side smells (mixing ofHTML, CSS, and JavaScript, duplicate code in JavaScript, and HTML syntaxerrors) in embedded code within server-side code. However it requires manualnavigation of the application, and also does not support inferring dynamic cre-7ation/change of objects, properties, and functions at runtime. WARI [17] finds un-used and duplicated JavaScript functions. JSLint [9] and PMD [12] are static codeanalysis tools that validate JavaScript code against a set of good coding practices.The Google Closure Compiler [27] rewrites JavaScript code to make it faster byremoving comments and unreachable code. Tool support for refactoring JavaScriptare in their early stages. Feldthaus et al. [83, 84] proposed static pointer anal-ysis techniques to perform renaming of variables or object properties, and someJavaScript-specific refactorings, such as encapsulation of properties and extractionof modules, targeting programming idioms [77].1.4 Research QuestionsThe overarching goal of this dissertation is to develop automated testing and main-tenance techniques that improve the quality of web applications. To achieve thisgoal and in accordance to the aforementioned three perspectives, we designed fiveresearch questions listed below.RQ1.4.1 How can we effectively derive test models for web applications?RQ1.4.1 is in line with UI testing. In response to RQ1.4.1, we propose afeedback-directed exploration technique [123] and tool, called FEEDEX [25], thatselectively covers a subset of the state-space of a given web application. The ex-ploration is guided towards achieving higher functionality, navigational, and pagestructural coverage of a given web application while reducing the size of our testmodel i.e., a State-Flow Graph (SFG) [120, 121]. Our results show that FEEDEXis capable of generating test models that are enhanced in all aspects compared tothe traditional exploration methods.RQ1.4.2. Can we utilize the knowledge in existing UI tests to generate newtests?RQ1.4.2 is also with respect to UI testing. To address RQ1.4.2, we proposeda technique [126] and a tool, called TESTILIZER [47], to generate DOM-based UItest cases using existing tests. TESTILIZER mines the existing test suite to infer amodel (in the form of a SFG) of the covered DOM states and event-based transi-tions including input values and assertions. It then expands the inferred model byexploring alternative paths and generates assertions for the new states. Finally it8generates a new test suite from the extended model. Our results supports that lever-aging input values and assertions from human-written test suites can be helpful ingenerating more effective UI tests.RQ1.4.3. What is the quality of JavaScript tests in practice and which part ofthe code is hard to cover and test?RQ1.4.3 is in line with unit testing. To address RQ1.4.3, we present the firstempirical study of JavaScript tests to characterize their prevalence, quality metrics,and shortcomings [125]. We perform our study across a large corpus of JavaScriptprojects and found that about 22% of the studied subjects do not have test code.About 40% of projects with JavaScript at client-side do not have a test, while this isonly about 3% for the purely server-side JavaScript projects. Also tests for server-side code have high quality, while tests for client-side code have moderate to lowquality. On average JavaScript tests lack proper coverage for event-dependent call-backs (36%), asynchronous callbacks (53%), and DOM-related code (63%).RQ1.4.4. How can we automate fixture generation for JavaScript unit testing?RQ1.4.4 is also with respect to unit testing. In response to RQ1.4.4, we pro-posed a DOM-based test fixture generation technique [127] and a tool, called CON-FIX [24], which is based on concolic execution. Conceptually our approach infersa control flow graph (CFG) by guiding the executing through different branchesof a function. Our empirical results show that the generated fixtures substantiallyimprove code coverage compared to test suites without these fixtures, with a neg-ligible overhead.RQ1.4.5. Which JavaScript code smells are prevalent in practice and whatmaintenance issues they cause?RQ1.4.5 is regarding code maintenance. In response to RQ1.4.5, we proposeda tool and technique, called JSNOSE [34], to detect JavaScript code smells [124]that could benefit from refactoring. It uses static and dynamic analysis to inferprogram entities model containing objects, functions, variables, and code blocks.We collected 13 JavaScript code smells by studying various online developmentresources and books that discuss bad JavaScript coding patterns. Our results showthat lazy object, long method/function, closure smells, coupling JS/HTML/CSS,and excessive global variables, are the most prevalent JavaScript smells.91.5 PublicationsIn response to the research questions in Section 1.4, the following peer-reviewedpapers have been published:• Feedback-Directed Exploration of Web Applications to Derive Test Mod-els [123]: A. Milani Fard, A. Mesbah, IEEE International Symposium onSoftware Reliability Engineering (ISSRE’13), 278–287, 2013;• Leveraging Existing Tests in Automated Test Generation for Web Applica-tions [126]: A. Milani Fard, M. Mirzaaghaei, A. Mesbah, IEEE/ACM Inter-national Conference on Automated Software Engineering (ASE’14), 67–78,2014;• JavaScript: The (Un)covered Parts [125]: A. Milani Fard, A. Mesbah, IEEEInternational Conference on Software Testing, Verification and Validation(ICST’17), 11 pages, 2017;• Generating Fixtures for JavaScript Unit Testing [127]: A. Milani Fard, A.Mesbah, and E. Wohlstadter, IEEE/ACM International Conference on Auto-mated Software Engineering (ASE’15), 190–200, 2015;• JSNose: Detecting JavaScript Code Smells [124]: A. Milani Fard, A. Mes-bah, IEEE International Conference on Source Code Analysis and Manipu-lation (SCAM’13), 116–125, 2013.I have also contributed to the following testing related publication:• An Empirical Study of Bugs in Test Code [176]: A. Vahabzadeh, A. MilaniFard, and A. Mesbah, International Conference on Software Maintenanceand Evolution (ICSME’15), 101–110, 2015.10Chapter 2Feedback-Directed Exploration of Web Applica-tions to Derive Test ModelsSummary3Dynamic exploration techniques play a significant role in automated web appli-cation testing and analysis. However, a general web application crawler that ex-haustively explores the states can become mired in limited specific regions of theweb application, yielding poor functionality coverage. In this chapter, we proposea feedback-directed web application exploration technique to derive test models.While exploring, our approach dynamically measures and applies a combinationof code coverage impact, navigational diversity, and structural diversity, to decidea-priori (1) which state should be expanded, and (2) which event should be exer-cised next to maximize the overall coverage, while minimizing the size of the testmodel. Our approach is implemented in a tool called FEEDEX. We have empir-ically evaluated the efficacy of FEEDEX using six web applications. The resultsshow that our technique is successful in yielding higher coverage while reducingthe size of the test model, compared to classical exhaustive techniques such asdepth-first, breadth-first, and random exploration.3An initial version of this chapter has been published in the IEEE International Symposium onSoftware Reliability Engineering (ISSRE), 2013 [123].112.1 IntroductionModern web applications make extensive use of JavaScript to dynamically mutatethe DOM in order to provide responsive interactions within the browser. Due tothis highly dynamic nature of such applications, dynamic analysis and exploration(also know as crawling) play a significant role [88] in many automated web appli-cation testing techniques [64, 68, 74, 114, 119, 121, 128, 129, 171]. Such testingtechniques depend on data and models generated dynamically through web appli-cation crawling.Most web application exploration techniques used for testing apply an exhaus-tive search in order to achieve a “complete” coverage of the application state-spaceand functionality. An assumption often made is that the state-space of the applica-tion is completely coverable in a reasonable amount of time. In reality, however,most industrial web applications have a huge dynamic state-space and exhaustivecrawling – e.g., through breadth-first search (BFS), depth-first search (DFS), orrandom search – can cause the state explosion problem [177]. In addition, a genericcrawler that exhaustively explores the states can become mired in irrelevant regionsof the web application [144], producing large test models that yield poor function-ality coverage.Because exploring the whole state-space can be infeasible (state explosion)and undesirable (time constrains), the challenge we are targeting in this chapter isto automatically derive an incomplete test model but with adequate functionalitycoverage, in a timely manner.To that end, we propose a novel feedback-directed exploration technique, calledFEEDEX, which is focused on efficiently covering a web application’s functional-ity to generate test models. We propose four metrics to capture different aspects ofa test model, namely code coverage impact, navigational diversity, page structuraldiversity, and test model size. Using a combination of these four metrics, our ap-proach dynamically monitors the exploration and its history. It uses the feedbackobtained to decide a-priori (1) which states should be expanded, and (2) whichevents should be exercised next, so that a subset of the total state-space is effec-tively captured for adequate test case generation.The main contributions of our work include:12• We present a feedback-directed exploration technique that selectively coversa subset of the state-space of a given web application to generate an adequatetest model;• We propose the notions of coverage impact and diversity – i.e., navigationaland page structural diversity – to capture different aspects of derived testmodels;• We describe an event execution method, which prioritizes events based ontheir historical productivity in producing relevant states;• We implement our approach in a tool called FEEDEX, which is freely avail-able;• We empirically evaluate the efficacy of our approach on six web applications.The results show that our method yields much better code coverage (10% atthe minimum), and diversity (23% at the minimum), compared to traditionalexhaustive methods. In addition, our approach is able to reduce the size ofthe derived test model and test suite by at least 38% and 42%, respectively.2.2 Background and MotivationWeb Applications. The advent of recent Web and browser technologies has ledto the proliferation of modern – also known as web 2.0 – web applications, withenhanced user interaction and more efficient client-side execution. Building onweb technologies such as AJAX, web applications execute a significant amount ofJavaScript code in the browser. A large portion of this code is dedicated to interactwith the Document Object Model (DOM) to update the user interface at runtime.In turn, these incremental updates lead to dynamically generated execution stateswithin the browser.Automated Test Model Generation. Model-based testing uses models of programbehaviour to generate test cases. These test models are either specified manuallyor derived automatically.Tool support for exploring (or “crawling”) dynamic states of web applicationsenables automated test model derivation. Unlike traditional static hyper-linked13web pages, automatically exploring web applications is challenging because manyof the user interface state changes are (1) event-driven, (2) caused by the executionof client-side application code, and (3) represented by dynamically mutating theinternal DOM tree in the browser. In the recent past, web crawlers have beenproposed that can explore event-driven DOM mutations in web applications. Forinstance, CRAWLJAX [120] uses dynamic analysis to exercise and crawl client-side states of web applications. It incrementally reverse engineers a model, calledState-Flow Graph (SFG), which captures dynamic DOM states and event-driventransitions connecting them. This SFG model is formally defined as follows:Definition 1 A state-flow graph SFG for a web application A, is a labeled directedgraph, denoted by a 4 tuple < r,V ,E,L > where:1. r is the root node (called Index) representing the initial state after A hasbeen fully loaded into the browser.2. V is a set of vertices representing the states. Each v ∈ V represents aruntime DOM state in A.3. E is a set of (directed) edges between vertices. Each (v1,v2) ∈ E repre-sents a clickable c connecting two states if and only if state v2 is reached byexecuting c in state v1.4. L is a labelling function that assigns a label, from a set of event types andDOM element properties, to each edge.5. SFG can have multi-edges and can be cyclic. 2To infer this SFG, events (e.g., clicks) are automatically generated on candidateclickables, i.e., user interface elements of the web application that can potentiallychange the state of the application. An example is a DIV element, dynamicallybound to an event-listener, which when clicked calls a JavaScript function, whichin turn mutates the DOM inside the browser (see Figures 2.3 and 2.4). Only whenthe event generated results in a state change, the candidate clickable is seen as areal clickable and the new state and the clickable are added to the SFG.14Indexe0 s1e1s5e4s7e6s6e0 c1e1c2e6s8e6c3e7s4s3 e3e5e0 s2e2e3e2 e2e3e5e4Figure 2.1: State-flow graph of the running example. The shaded nodes represent partially expandedstates, edges with dashed lines are candidate events, and nodes shown in dashed circle are states thatmight be discovered after event execution.An example of such a SFG is shown in Figure 2.1. Such an automatically in-ferred test model can be utilized in test case generation [121], by adopting differentgraph coverage methods (e.g., all states/edges/paths/transitions coverage).Motivation. Given that (1) most industrial web applications have a huge dynamicstate-space, which can cause the state explosion problem, (2) dynamic explorationis time-consuming, and (3) in any realistic software development environment, theamount of time dedicated to testing is limited, opting for the exploration of a partialsubset of the total state-space of a given web application seems like a pragmaticfeasible solution.Given a specific amount of time, there are, however, different ways of explor-ing this partial subset. For example, assume that the SFG in Figure 2.1, includingthe dashed nodes and edges, represents the complete state model of a web applica-tion. Also assume that our allowed crawling time is limited to 4 event executionsand our crawler applies a depth-first search. If the explored sequence of exercisedclickables is e1, e2, e2, e2, we retrieve states s1, s2, s3, and s4, all of which belongto same crawling path. It could also be the case that the crawler has a predeter-mined order for event execution. Consider a crawler in which e3 always executesbefore e2. The sequence of the resulting clicks would then become e1, e2, e3, e2which results in discovering only 3 states, namely s1, s2, and s3. Another exam-ple, depicted in Figures 2.3–2.4, includes exploring states behind next and previouslinks. This is a typical scenario in which exhaustive crawlers can become mired in15Intercept & Instrument JS codeGenerate eventExecution TraceServer BrowserMeasure Code Coverage ImpactGuidedCrawlerState-flow GraphAnalyzeDOMCalculate   state scoresCalculateevent productivityupdateFigure 2.2: Processing view of FEEDEX, our feedback-directed exploration approach.specific parts of the web application, yielding poor functionality coverage.Because a complete exploration of the whole application can be infeasible(state explosion), undesirable (time constraint), and inefficient (large test modelderived that yields low coverage), the challenge we are targeting in this chapter isto derive an incomplete test model that can adequately provide functionality cov-erage.2.3 ApproachThe goal of our work is to automatically derive a test model that captures differentaspects of the given web application’s client-side functionality. In the next sub-sections, we present these desired aspects of a test model, followed by a salientdescription of our feedback-directed exploration technique, called FEEDEX.2.3.1 Deriving Test ModelsA web crawler is generally evaluated on its ability to retrieve relevant and desirablecontent [61, 144, 166]. In this work, we propose metrics that target relevanceand desirability in the context of web application testing. We believe that a test16model derived automatically from a web application should possess the followingproperties to effectively cover different aspects of the application under test:• Functionality Coverage. A test suite generated from a derived test modelcan only cover the maximum functionality contained in the test model, andnot more. For instance consider Figure 2.1. If the inferred test model doesnot capture events e6 and e7 the generated test suite will not be able to coverthe underlying functionality behind those two events. Therefore, it is im-portant to derive a test model that possesses adequate coverage of the webapplication when the end goal is test suite generation.• Navigational Coverage. The navigational structure of a web applicationallows its users to navigate it in various directions. For instance s3 and s4are both on the same navigational path, whereas s3 and s8 are on differentbranches (Figure 2.1). To cover the navigational structure adequately, webelieve that a test model should cover different navigational branches of theweb application.• Page Structural Coverage. The structure of a webpage in terms of its in-ternal DOM elements, attributes, and values, provides the main interactioninterface with end users. Each page structure provides a different degree ofcontent and functionality. To capture this structural functionality adequately,we believe a test model should cover heterogeneous DOM structures of theweb application.• Size. The size of the derived test model has a direct impact on the number ofgenerated test cases and event executions within each test case. Reducing thesize of the test model can decrease both the cost of maintaining the generatedtest suite and the number of test cases that must be rerun after changes aremade to the software [94]. Thus, while deriving test models, the size shouldbe optimized as long as the reduction does not adversely influence the cover-age. We consider the number of events (edges) in the SFG as an indicator ofthe model size since test case generation is done by traversing the sequenceof events.17Algorithm 1: Feedback-directed Explorationinput : A web application A, the maximum exploration time t, the maximumnumber of states to explore n, exploration strategy STRoutput: The inferred state-flow graph SFG1 SFG← ADDINITIALSTATE(A)Procedure EXPLORE() begin2 while CONSTRAINTSATISFIED(t,n) do3 PES← GETPARTIALLYEXPANDEDSTATES(SFG)4 for si ∈ PES do5 for s j ∈ PES & s j 6= si do6 Score(si,s j)← GETSCORE(si,s j,STR)end7 MinScore(si)← GETMINSCORE(si)end8 s← GETNEXTTOEXPLORESTATE(PES,MinScore)9 C← PRIORITIZEEVENTS(s)10 for c ∈C do11 browser.GOTO(SFG.GETPATH(s))12 dom← browser.GETDOM()13 robot.FIREEVENT(c)14 new dom← browser.GETDOM()15 if dom.HASCHANGED(new dom) then16 SFG.UPDATE(c,new dom)17 EXPLORE()endendend18 return SFGend2.3.2 Feedback-directed Exploration AlgorithmIn this chapter, we refer to the process of executing candidate clickables of a state asstate expansion. Figure 2.2 depicts an overview of our approach. At a high level,it dynamically analyzes the exploration history at runtime, including previouslycovered states and events, to anticipate and decide a-priori (1) which state shouldbe expanded next, and (2) from the state that is selected for expansion, which event(i.e., clickable element) should be exercised.Given a web application, a maximum exploration time t, and a maximum num-ber of states to explore n, the intent is to maximize the test model coverage while18reducing the model size within t. The rationale behind our technique is to rewardstates and events that have a substantial impact on different aspects of a test model(and penalize those that do not). Our exploration strategy, shown in Algorithm 1,applies a greedy approach to select and expand a partially expanded state.Definition 2 A partially expanded state is a state, during exploration, which stillcontains one or more unexercised candidate clickables. 2In Figure 2.1, both s6 and s8 are partially expanded states. To expand the nextpartially expanded state, while exploring, we calculate a state score (explained inSection 2.3.3) for each state that needs to be expanded, at runtime; this score pri-oritizes partially expanded states selectively so that a test model with enhancedaspects can be inferred for testing. Our key insight is that by dynamically mea-suring and expanding states with the highest scores, we can construct a state-flowgraph of an application, yielding higher overall coverage. In addition to the statescore, our approach prioritizes events based on their productivity ratio (describedin Section 2.3.4).Algorithm 1 repeats the following main steps until the given time limit t, orstate-space limit n is reached:• For each partially expanded state, compute the state score with respect toother unexpanded states (Lines 4-6). The score of a state is the minimumpair-wise state score of that state (Line 7);• Choose the state with the highest fitness (score) value as the next state toexpand (Line 8);• On the chosen state for expansion, prioritize the events based on their eventscore (Line 9);• Take the browser to the chosen state and execute the highest ranked eventaccording to the prioritized order (Lines 10-3);• If the DOM state is changed after the event execution, update the SFG ac-cordingly (Lines 14-16) and call the Explore procedure (line 17).192.3.3 State Expansion StrategyIn this section, we describe our state expansion strategy, which is used to prioritizepartially expanded states based on their overall contributions towards the differentdesired properties of the test model.Code Coverage Impact. One primary objective for generating a test model is toachieve an adequate code coverage of the core functionality of the system. Notethat we only consider the client-side code coverage and the server-side code cover-age is not the goal of this work. The following example shows how state expansioncan affect the code coverage.Example 1 Figure 2.3 depicts a simple JavaScript code. Figure 2.4 shows thecorresponding DOM state. Assume that the state-flow graph as shown in the Figure2.1 was generated by exploring this simple example. If the crawler finishes afterclicking on the next and the previous clickables multiple times, only the functionshow and the first two lines of the script code will be covered, i.e., 9 lines in total.Assuming that the total number of JavaScript lines of code is 27 (not all the code isshown in the figure), coverage percentage would be 927 = 33.33%. This indicatesthat no matter what test case generation method we apply on the inferred SFG, wecan not achieve a higher code coverage than 33.33%. However, if the crawler wasable to click on the Update clickable before termination, the function load wouldbe executed as well, yielding a coverage of 1527 = 55.55%. 2The code coverage impact for a state s, denoted by CI(s), is defined as theamount of increase in the code coverage after s is created. Considering the exam-ple again, when the Index page is loaded, the first two JavaScript lines are executedand thus initial coverage is 227 = 7.4%, and the CI(Index)=0.074. After executingonclick="show()", coverage would reach 33.33% with a 25.93% increaseand thus CI(s1) = 0.259. For each newly discovered state, we calculate the CIin this manner. If a resulting state of an event is an already discovered state inthe SFG, the CI value will be updated if the new value is larger. The CI scoreis taken into consideration when making decisions about expanding partially ex-panded states.Path Diversity. Given a state-flow graph, states located on different branches are201 myImg = new Array("1.jpg","2.jpg","3.jpg","4.jpg");2 curIndex = 1;3 ...4 function show (offset) {5 curIndex = curIndex + offset;6 if (curIndex > 4) {7 curIndex = 1; }8 if (curIndex == 0) {9 curIndex = 4 ; }10 document.imgSrc.src = myImg[curIndex - 1];11 }12 ...13 function load () {14 var xhr = new XMLHttpRequest();15'GET', '/update/', true);16 xhr.onreadystatechange = function(e) {17 document.getElementById("container").innerHTML = this.responseText18 };19 xhr.send();20 }Figure 2.3: Simple JavaScript code snippet.<body><img name="imgSrc" id="imgSrc" src="1.jpg"><span onclick="show(-1)">previous</span><span onclick="show(1)">next</span>...<a href="#" onclick="load(); return false;">Update!</a><div id="container"></div></body>Figure 2.4: A simple DOM instance.more likely to cover different navigational functionality (as in Section 2.3.1) thanthose on a same path. Thus, guiding the exploration towards diversified paths canyield a better navigational functionality coverage.Definition 3 A simple event path Psi of state si is a path on SFG from the Indexnode to si, without repeated nodes. 2Note that in our definition we only consider “simple paths” without repeatednodes to avoid the ambiguity of having cycles and loops, such as those shown inFigure 2.1, where for instance, there exists an infinite number of paths from Index21to s2. We formulate the path similarity of two states si and s j as:PathSim(si,s j) =2×|Psi ∩Ps j ||Psi |+ |Ps j |(2.1)where |Psi | is the length of Psi , and |Psi ∩Ps j | denotes the number of shared eventsbetween Psi and Ps j . The path similarity notion captures the percentage of eventsshared by two simple paths. The fewer events two states share in their paths, themore diverse their navigational functionality. Thus, let MaxPathSim(si,s j) be themaximum path similarity of si and s j considering all possible simple paths, from In-dex to si and s j, respectively. The path diversity of si and s j, denoted by PD(si,s j),is then calculated as:PD(si,s j) = 1−MaxPathSim(si,s j) (2.2)Example 2 Consider the running example of Figure 2.1. We calculate pair-wisepath diversity scores for states s4, s6, and s8. PathSim(s4,s6) = 2×|Ps4∩Ps6||Ps4|+|Ps6| canbe computed in two ways as there exist two simple event paths from Index tos6: For Ps6 through s1, PathSim(s4,s6) = 2×14+2 =26 =13 , and for Ps6 through s5,PathSim(s4,s6) = 2×04+2 = 0. Thus MaxPathSim(s4,s6) =13 and PD(s4,s6) = 1−13 =23 . Similarly, PD(s4,s8)=1-0=1, and PD(s6,s8)=1-0=1. This indicates thats4 and s6 are not that diverse with respect to each other, but they are very diversewith respect to s8. 2DOM Diversity. In many web applications, the DOM is mutated dynamicallythrough JavaScript to reflect a state change, without requiring a URL change. Manyof such dynamic DOM mutations are, however, incremental in nature and corre-spond to small delta changes, which might not be interesting from a testing per-spective, especially given the space and time constraints. Therefore, guiding theexploration towards diversified DOM states can result in a better page structuralcoverage.In most current AJAX crawlers, string representations of the DOM states areused for state comparison. The strings are compared by either calculating the editdistance [120], a strip and compare method [120], or by computing a hash of the22<html><head> <body><title> <h1> <div><img>(a) DOM tree t1<html><head> <body><title> <table> <div><tr><td> <td><img>(b) DOM tree t2<html><head> <body><title> <table> <div><tr> <tr><td> <td> <td> <td><img>(c) DOM tree t3Figure 2.5: DOM tree comparison.content [80]. These approaches ignore the tree structure of DOM trees. To ac-count for the actual structural differences, we adopt the tree edit distance betweentwo ordered labeled trees, which was proposed [168] and implemented [149] asthe minimum cost of a sequence of edit operations that transforms one tree intoanother. The operations include deleting a node and connecting its children to theparent, inserting a node between a node and the children of that node, and rela-belling a node.We define state DOM diversity as the normalized DOM tree edit distance. Let tiand t j be the corresponding DOM trees of two states si and s j. The DOM diversityof si and s j, denoted by DD(si,s j), is defined as:DD(si,s j) =T ED(ti, t j)max(|ti|, |t j|) (2.3)where T ED(ti, t j) is the tree edit distance between ti and t j, and max(|ti|, |t j|) is themaximum number of nodes in ti and t j.Example 3 Figure 2.5 depicts three DOM trees with |t1|=7, |t2|=10, and |t3|=13.t2 can be produced from t1 by (1) relabelling <h1> in t1 to <table>, and (2)inserting three nodes under <table>. Thus T ED(t1, t2) = 4 and their DOM di-versity equals 410 =0.4. Similarly T ED(t1, t3) = 7 and thus their DOM diversityequals 713 =0.53. This shows t3 is more DOM diverse than t2 with respect to t1.T ED(t2, t3) = 3 and their DOM diversity equals 313 =0.23 223Overall State Score. The state score is a combination of code coverage impact,path diversity, and DOM diversity. Our state expansion fitness function is a linearcombination of the three metrics as follows:Score(si,s j) = wCI ·CI(si,s j)+wPD ·PD(si,s j)+wDD ·DD(si,s j) (2.4)where, wCI , wPD, and wDD are user-defined weights (between 0 and 1) for codecoverage impact, path diversity, and DOM diversity, respectively.2.3.4 Event Execution StrategyThe goal of our event execution strategy is to reduce the size of events sequences(edges) in the SFG, while preserving the coverage. Reducing the size of events isimportant since it reduces the size of generated test cases, which in turn minimizesthe time overhead of test rerun [94].Intuitively, we try to minimize the execution of events that are not likely toproduce new states. We categorize web application user interface events into fourgroups based on their impact on the application state transitional behaviour: (1)An event that does not change the DOM state is called a self-loop, e.g., eventsthat replace the DOM tree with an exact replica, e.g., refresh with no changes, orclear data in a form; (2) A state-independent event is an event that always causesthe same resulting state, e.g., events that always result in the Index page; (3) Astate-dependent event is an event that after its first execution, always causes thesame state, when triggered from the same state. (4) A Nondeterministic event is anevent that may results in a new state, regardless of where it is triggered from. Suchevents can result in different states when triggered from the same state. In Figure2.1, for instance, e0 is a self-loop event, e5 is a state-independent event, and e4 isa state-dependent event.A crawler that distinguishes between these different events can avoid self-loops, minimize state-independent and nondeterministic event executions, and em-phasize state-dependent events to explore uncovered states. To that end, we defineevent productivity (EP) as follows.Let RSi(e) denote the resulting state of the i-th execution of the event e, andn be the total number of executions of e (including the last execution). The event24productivity ratio of e, denoted by EP(e), is defined as:EP(e) =1 ; if n = 0∑ni=1 MinDD(RSi(e))n ; otherwise(2.5)where MinDD(RSi(e)) = mins∈SFG{DD(RSi(e),s)}, i.e., the minimum diversity ofRSi(e) and all existing states in the SFG. Note that 0 ≤ EP(e) ≤ 1 and its valuecan change after each execution of e, while exploring.The above definition captures three properties. Firstly, it gives the highest ratioto the unexecuted events (in case n = 0) since the resulting state is more likelyto be a new state compared to already executed events. Naturally, this also helpsin covering more of the JavaScript code, since the event-listeners typically triggerthe execution of one or more JavaScript function(s). Secondly, it penalizes eventsthat result in an already discovered state, such as self-loops and state-independentevents, with MinDD(RSi(e))=0. Thirdly, the productivity ratio is proportional tothe structural diversity of the resulting state with respect to previously discoveredstates. This gives a higher productivity ratio to events that have resulted in morediverse structures, guiding the exploration towards more DOM diverse states.Remark 1. We do not consider path-diversity (PD) in the calculation of EP.This is because when the execution of an event results in a new state, the resultingstate shares much of its navigational path with the source state that leads to PDclose to 0, which discourages new state discovery. On the other hand, if the re-sulting state is an already discovered state in the SFG, its shortest event path maynot share much with other paths and therefore might get a high PD. This is alsocontrary to penalizing events causing already discovered states.The next example shows how this definition is applied on self-loops, state-independent events, and forward-backward events.Example 4 Consider Figure 2.1 again. For simplicity assume DOM diversity is1 for new states and 0 for existing states. An example of a self-loop event is e0.Assume that the first observation of e0 is at state Index. Since we have neverexecuted e0 before, EP(e0) = 1. After the first execution, EP(e0) = 01 = 0 becauseMinDD(Index) = 0 as the diversity of the source state (Index) and the resulting25state (Index) is 0. By the second execution, say at s1, EP(e0) = 0+01+1 = 0. Nowconsider e5 as a state-independent event. For the first time, say at s1, EP(e5) = 1.After its first execution, since s6 is a new state (we assume diversity is 1), theproductivity ratio will be EP(e5) = 11 = 1. However, the second execution resultsin a duplicate (s6 again) and thus EP(e5) = 1+01+1 = 0.5. Events e2 and e3 areexamples of previous-next events (see Figure 2.4). After the first execution of e3 atstate s2, EP(e3) = 01 = 0, because s1 was already in the SFG before executing e3and thus MinDD(s1)=0. 2Remark 2. Prioritizing events only makes sense when we allow the crawlerto exercise the same clickable multiple times. If the objective of the test modelgeneration is to merely achieve high code coverage, then clicking on the sameclickable again is unlikely to increase the code coverage; however, if the event isstate-dependent or nondeterministic, multiple execution of the same clickable canhave an impact on DOM and path diversity.2.3.5 ImplementationOur proposed approach is implemented in a tool called FEEDEX, which is publiclyavailable [25].To explore web applications, we build on top of CRAWLJAX [120, 121]. Thedefault engine supports both depth-first and breadth-first (multi-threaded, multi-browser) crawling strategies. FEEDEX replaces the default crawling strategy ofCRAWLJAX (as described in [120]) with our feedback-directed exploration algo-rithm. The tool can be configured to constrain the state space, by setting parame-ters such as the maximum number of states to crawl, maximum crawling time, andsearch depth level.As shown in Figure 2.2, FEEDEX intercepts and instruments the JavaScriptcode to collect execution traces while crawling. To parse and instrument the JavaScriptcode, we use Mozilla Rhino [41]. After each event execution, FEEDEX analyzesthe collected execution trace and measures the code coverage impact, which in turnis used in our overall exploration decision making.State path diversity is calculated according to Equation 2.2 by finding simplepaths (Definition 3) from Index state to each state in the SFG. In order to compute26Table 2.1: Experimental objects (statistics excluding blank/comment lines, and JavaScript libraries).ID Name JS LOC Description1 ChessGame [20] 198 A JavaScript-based simple game2 TacirFormBuilder [45] 209 A JavaScript-based simple HTML form builder3 TuduList [51] 782 An AJAX-based todo lists manager in J2EE and MySQL4 FractalViewer [26] 1245 A JavaScript-based fractal zoomer5 PhotoGallery [39] 1535 An AJAX-based photo gallery in PHP without MySQL6 TinyMCE [50] 26908 A JavaScript-based WYSIWYG editoreach simple event path, we apply a DFS traversal from the Index state to a statenode in the SFG and disregard already visited ones to avoid cycles or loops. Forthe computation of the tree edit distance for DOM state diversity as in Equation2.3, we take advantage of the Robust Tree Edit Distance (RTED) algorithm [149],which has optimal O(n3) worst case complexity and is robust, where n is the max-imum number of nodes of the two given trees. Considering that the number ofnodes in a typical DOM tree is relatively small, the overhead of the DOM diversitycomputation is negligible.In order to calculate the productivity ratio for an event, we store, for each event,a set of tuples comprising of source and target states, corresponding to all the pre-vious executions of that event.2.4 Empirical EvaluationTo assess the efficacy of the proposed feedback-directed exploration, we have con-ducted a controlled experiment. Our main research question is whether our pro-posed exploration technique is able to derive a better test model compared to tra-ditional exhaustive methods.Our experimental data along with the implementation of FEEDEX are availablefor download [25].2.4.1 Experimental ObjectsBecause our approach is targeted towards web applications, our selection criteriaincluded applications that (1) use extensive client-side JavaScript, (2) are based ondynamic DOM manipulation and AJAX interactions, and (3) fall under different27Table 2.2: State-space exploration methods evaluated.Method Exploration CriteriaDFS Expand the last partially expanded stateBFS Expand the first partially expanded stateRND Randomly expand a partially expanded stateFEEDEX Expand the partially expanded state with the highest score(wCI = 1,wPD = 0.5,wDD = 0.3), and prioritize eventsdomains. Based on these criteria, we selected six open source applications fromdifferent domains which are shown in Table 2.1. We use CLOC [22] to count theJavaScript lines of code (JS LOC). The reported LOC in Table 2.1 is excludingblank lines, comment lines, and JavaScript libraries such as jQuery, DWR, Scrip-taculous, Prototype, Bootstrap, and google-analytics. Note that we also excludethese libraries in the instrumentation step.2.4.2 Experimental SetupAll our experiments are performed on a core-2 Duo 2.40GHz CPU with 3.46 GBmemory, running Windows 7 and Firefox web browser.Independent VariablesWe compare our feedback-directed exploration approach with different traditionalexploration strategies.Exploration Constraints. We confine the designated exploration time for derivinga test model to 300 seconds (5 minutes) for all the experimental runs.4 We set nolimits on the crawling depth nor the maximum number of states to be discovered.We configure the tool to allow multiple executions of the same clickable element,as the same clickable can cause different resulting states.State-space Exploration. Table 2.2 presents the different state expansion strate-gies we evaluate in the first part of our experiment. The first three (DFS, BFS,RND) are exhaustive crawling methods. DFS expands states in a depth-first fash-ion, i.e., the last discovered state would be expanded next. BFS applies breath-first4Dedicating 5–10 minutes to test generation is acceptable in most testing environments [96].28exploration by expanding the first discovered state next. RND performs randomexploration in which a partially expanded state is chosen uniformly at random tobe expanded next. Note that for these traditional exhaustive exploration meth-ods we consider the original event execution strategy, i.e., a user-defined orderfor clicking on elements. For this experiment, the order is defined as: A, DIV,SPAN, IMG, BUTTON, INPUT, and TD.The last method is an instantiation of ourfeedback-directed exploration score (See Equation 2.4) where wCI = 1, wPD = 0.5,and wDD = 0.3. We empirically evaluated different weightings and found this set-ting among other settings can generally produce good results. FEEDEX prioritizesclickable elements based on the event productivity ratio EP (Equation 2.5) andexecutes them in that order (see Section 2.3.4).Dependent VariablesWe analyze the impact of different state-space exploration strategies on the codecoverage, the overall average path diversity and DOM diversity, as well as the sizeof the derived test model.Code Coverage Score. We measure the final statement code coverage achievedby each method, after 5 minutes of exploration. In order to measure the JavaScriptcode coverage, we instrument the JavaScript code as explained in Section 2.3.5.Diversity Scores. In order to measure the path diversity of a SFG, we measureaverage pair-wise navigational diversity of leaf nodes (states without any outgoingedges) since the position of the leaf nodes in the graph is an indication of thediversity of its event paths (i.e., paths from the Index node to the leaves). TheAvgPD is defined as:AvgPD(SFG) =∑∀si,s j∈L(V ) PD(si,s j)2 ·m · (m−1) (2.6)where L(V ) denotes the set of leaf nodes in the SFG and m = |L(V )|, i.e., thenumber of leaf nodes. This value is in the range of 0 to 1.To assess the page structural diversity of the derived test models from eachmethod, we compute the overall average pair-wise structural diversity (AvgDD) in29Table 2.3: Results of different exploration methods.Exploration Statement Navigational Page Test TestMethod Coverage Path Structural Model SuiteDiversity Diversity Size SizeDFS 37.55% 0.010 0.035 578 247BFS 43.82% 0.410 0.065 475 165RND 40.44% 0.369 0.066 450 241FEEDEX 48.13% 0.443 0.081 280 95Improvement (min–max%) 10–28% 7–4000% 23–130% 38–86% 42–61%the derived SFG as:AvgDD(SFG) =∑∀si,s j∈V DD(si,s j)2 ·n · (n−1) (2.7)where n = |V |, i.e., the number of states in the SFG and AvgDD is in the range of0 to 1.Test Model and Test Suite Size. As discussed in Section 2.3.1, the derived testmodel (SFG) can be used to generate test cases through different graph traversaland coverage methods. The event size of the derived test model has a direct impacton the number and size of test cases that can be generated, regardless of the testgeneration method used.We consider (1) the number of edges (events) in the SFG, as the size of testmodel, and (2) the number of “distinct” simple event paths in the SFG, as the sizeof test suite (e.g. the number of all possible Selenium [42] test cases generatedfrom the test model). Two simple event paths (Definition 3) are distinct if theyvisit different sequence of states. Note that simple event paths can not have cyclesor loops. As described in Section 2.3.5, simple paths are generated using DFStraversal from the Index state to every other state in the SFG.2.4.3 Results and FindingsTable 2.3 shows the results of different exploration methods. We report the averagevalues obtained from the six experimental objects. For the random state expansionmethod (RND), we report the average values over five runs. The table shows state-ment code coverage, navigational path diversity (AvgPD), page structural diversity30(AvgDD), number of edges (events) as well as the number of distinct paths in thederived test model.Results present that for FEEDEX, there is on average between (min–max%)10–28% improvement on the final statement coverage (after 5 minutes), 7–4000%on path diversity, 23–130% on page structural diversity, 38–86% reduction in thenumber of edges, and 42–61% reduction in the distinct paths in the test model. Itis evident from the results that FEEDEX is capable of generating test models thatare enhanced in all aspects compared to the traditional exploration methods. Testmodels created using FEEDEX have smaller size (thus need less time to executetest cases) and higher code coverage and state diversity. The simultaneous im-provement in all the evaluation criteria points to the the effectiveness of our stateexpansion and event execution strategy.Given the limited amount of exploration time, we believe the achieved im-provements for code coverage is substantial. Our approach also significantly in-creases the average DOM diversity compared to the RND method. Note that themain reason for having small AvgDD values for all the methods, is the normaliza-tion by the large number of possible pair-wise links in the computation of AvgDD(Equations 2.7). There is a low improvement of 7% achieved by FEEDEX over BFSwith respect to the navigational path diversity (AvgPD). This is expected due to thefact that BFS, by its nature, generates models with many branches that do not sharetoo many event paths, and are more sparse, compared to the other methods. Theamount of shared paths also contribute to reducing the number of distinct paths,thus reducing the size of test suite. Therefore, although there is some improve-ment, FEEDEX can not improve AvgPD significantly compared to BFS. Amongtraditional methods, DFS is the least effective. Specifically, AvgPD achieved byDFS is much lower than others due to the fact that it keeps expanding states in thesame branch in most cases, and FEEDEX can improve the navigational diversitysignificantly (4000%).While evaluating different weights in FEEDEX scoring function, we found thatassigning 1 to wDD and 0 to the rest, is effective in producing more structuraldiverse models, if an application has many different DOM states, such as Pho-toGallery, and not that effective for applications with minimum DOM changes,such as ChessGame (a chess board that remains relatively the same). Similarly,31assigning 1 to wPD and 0 to the rest, is more effective in increasing AvgPD inapplications with many navigational branches (features) such as TinyMCE. In gen-eral, feedback-directed exploration technique, with the settings we used, is superiorover the exhaustive methods with respect to all aspects of generated test models.Considering the significant improvements achieved by using FEEDEX, the compu-tational overhead of our method is negligible.It worth to mention that if the exploration time is unlimited, the final generatedtest model for all the exploration methods will converge to a same model. This is,however, infeasible for many industrial web applications with huge state space andshort release time. Thus we believe the given 5 minute exploration limit can showhow FEEDEX can outperform exhaustive search methods in different aspects.2.5 DiscussionLimitations. A limitation of FEEDEX is that the maximum effectiveness dependson the weights used in the scoring function (Equation 2.4) and on the applicationstate-space and functionality, which is not known in advance. For example in ourother experiments we observed that setting wDD,wPD = 0 and wCI = 1, can gener-ate test models with higher code coverage but less diversity and larger test modelsize. In general, the setting we used in this work can generate a test model whichenhances all aspects of a test model compared to traditional methods. We do notclaim that Equation 2.4 is the best way to combine CI, PD, and DD, or the weightsthat we empirically found, always outperform previous work. Instead, we rely onthe intuition of the feedback-directed heuristic which we believe effectively worksmost of the time.Applications. The main application of our technique is in automated testing ofweb applications. The automatically derived test model, for instance, can be usedto generate different types of test cases used for invariant-based assertions aftereach event, regression testing, cross-browser testing [68, 74, 114, 119, 121, 128].Our exploration technique towards higher client-side code coverage can also helpwith a more accurate detection of unused/dead JavaScript code [124]. Moreover,FEEDEX is generic in terms of its scoring function, thus by changing the fitnessfunction (line 6 of Algorithm 1), it can generate a test model based on other user-32defined criteria.Threats to Validity. A threat to the external validity of our experiment is with re-gard to the generalization of the results to other web applications. We acknowledgethat more web applications should be evaluated to support the conclusions. To mit-igate this threat we selected our experimental objects from four different domains:gallery, task management, game, and editor. The selected web applications exhibitvariations in functionality and structure and we believe they are representative ofmodern web applications.One threat to the internal validity of our experiment is related to the evaluationmetrics including AvgDD and AvgPD proposed by the authors of this work basedon which the effectiveness of FEEDEX is evaluated. However, we believe thesemetrics capture different properties of a test model as described in Section 2.3.With respect to reliability of our results, FEEDEX and all the web-based sys-tems are publicly available, making the experiment reproducible.2.6 Related WorkCrawling Web Applications. Web crawling techniques for traditional hypertext-based websites have been extensively studied in the past two decades. More relatedto our work is the idea behind “scoped crawling” [144] to constrain the explorationto webpages that fall within a particular scope; this way, obtaining relevant contentbecomes more efficient than through a comprehensive crawl. Well-known exam-ples of scoped crawling include hidden-web crawling [110] and focused crawling[117]. Scope can also be defined in terms of geography, language, genre, format,and other content-based properties of webpages. All these techniques focus on thetextual contents of webpages. Our approach, on the other hand, is geared towardsfunctionality coverage of a web application under test.Crawling modern Web 2.0 applications engenders more challenges due to thedynamic client-side processing (through JavaScript and AJAX calls), and is rel-atively a new research area [70, 80, 120, 134]. Many web testing techniques[64, 68, 74, 114, 119, 121, 128, 129, 169, 171] depend on data gathered and modelsgenerated through crawling. However, such techniques rely on exhaustive searchmethods. To the best of our knowledge, this work is the first to propose a feedback-33directed exploration of web applications that enhances different aspects of a testmodel.Some metrics have been proposed to measure crawling effectiveness and di-versity. Marchetto et al. [64] propose crawlability metrics that capture the degreeto which a crawler is able to explore the web application. These metrics combinedynamic measurements such as code coverage, with static indicators, such as linesof code and cyclomatic complexity. Their crawlibility metrics are geared towardsthe server-side of web applications. Regarding our diversity metrics, although thenotion of diversity has been used for classifying search query results [61, 156, 157],we propose new metrics for capturing state and navigational diversity as two as-pects of a web application test model.More related to our work are guided test generation [169] and efficient AJAXcrawling techniques [70, 73, 79, 134]. Thummalapenta et al. [169] recently pro-posed a guided test generation technique for web applications, called WATEG.Their work crawls the application in an exhaustive manner, but directs test gen-eration towards higher coverage of specific business rules, i.e., business logic ofan application’s GUI. Our work differs from WATEG in two aspects. Firstly, weguide the exploration and not the test generation, and secondly, our objective isto increase code coverage and state diversity, and at the same time decrease testmodel size. Efficient strategies for AJAX crawling [70, 73, 79, 134] try to discoveras many states as possible in the shortest amount of time. The goal of our workis, however, not to crawl the complete state-space as soon as possible (which is in-feasible for many industrial web applications), but to drive a partial model, whichadequately covers different desired properties of the application.Static analysis. Researchers have used static analysis to detect bugs and securityvulnerabilities in web apps [93, 200]. Jensen et al. [101] model the DOM as a set ofabstract JavaScript objects. However, they acknowledge that there are substantialgaps in their static analysis, which can result in false-positives. Because JavaScriptis a highly dynamic language, such static techniques typically restrict themselves toa subset of the language. In particular, they ignore dynamic JavaScript interactionswith the DOM, which is error-prone and challenging to detect and resolve [143].Dynamic Analysis. Dynamic analysis and automated testing of JavaScript-based34web applications has gained more attention in the recent past. Marchetto et al.[114] proposed a state-based testing technique for AJAX applications in which se-mantically interacting event sequences are generated for testing. Mesbah et al.proposed [121] an automated technique for generating test cases with invariantsfrom models inferred through dynamic crawling. JSArt [128] and DoDOM [148]dynamically derive invariants for the JavaScript code and the DOM respectively.Such inferred invariants are used for automated regression and robustness testing.Artemis [66] is a JavaScript testing framework that uses feedback-directed ran-dom testing to generate test inputs. Compared to FEEDEX, Artemis randomly gen-erates test inputs, executes the application with those inputs and uses the gatheredinformation to generate new input. FEEDEX on the other hand, explores the appli-cation by dynamically running it and building a state-flow graph that can be usedfor test generation. The strength of FEEDEX is that it guides the application at run-time towards a better code and more diverse navigational and structural coverage.Kudzu [158] combines symbolic execution of JavaScript with random test gen-eration to explore sequence of events and produce input values depending on theexecution of the control flow paths. Their approach uses a complex string con-straint solver to increase the code coverage by producing acceptable input stringvalues. In our work we did not intend to increase the code coverage by consideringthe problem of input generation for the crawler and only focused on improving thecrawling strategy. Compared to Kudzu, FEEDEX is much simpler, more automated,and does not require such heavy computation.2.7 ConclusionsAn enabling technique commonly used for automating web application testing andanalysis is crawling. In this work we proposed four aspects of a test model that canbe derived by crawling, including functionality coverage, navigational coverage,page structural coverage, and size of the test model. We have presented a genericfeedback-directed exploration approach, called FEEDEX, to guide the explorationtowards client-side states and events that produce a test model with enhanced as-pects. Our experiment on six Web 2.0 applications shows that FEEDEX yieldshigher code coverage, diversity, and smaller test models, compared to traditional35exhaustive methods (DFS, BFS, and random).In this work, we only measure the client-side coverage. One possible futurework can be including the server-side code coverage in the feedback loop to ourguided exploration engine. This might help deriving some new states that are de-pendent on particular server-side code execution. Our state expansion method iscurrently based on a memory-less greedy algorithm, which might benefit from alearning mechanism to improve its effectiveness.36Chapter 3Leveraging Existing Tests in User Interface TestGeneration for Web ApplicationsSummary5Current test generation techniques start with the assumption that there are no ex-isting test cases for the system under test. However, many software systems nowa-days are already accompanied with an existing test suite. We believe that exist-ing human-written test cases are a valuable resource that can be leveraged in testgeneration. We demonstrate our insight by focusing on testing modern web appli-cations. To test web applications, developers currently write test cases in popularframeworks such as SELENIUM. On the other hand, most web test generation tech-niques rely on a crawler to explore the dynamic states of the application. Theformer requires much manual effort, but benefits from the domain knowledge ofthe developer writing the test cases. The latter is automated and systematic, butlacks the domain knowledge required to be as effective. We believe combining thetwo can be advantageous. In this work, we propose a technique to (1) mine thehuman knowledge present in the form of input values, event sequences, and asser-tions, in the human-written test suites, (2) combine the inferred knowledge with thepower of automated crawling and machine learning, and (3) extend the test suite foruncovered/unchecked portions of the web application under test. Our approach is5This chapter is the extended version of the paper appeared at the IEEE/ACM International Con-ference on Automated Software Engineering (ASE), 2014 [126] that is in preparation for submissionto a software engineering journal.37implemented in a tool called TESTILIZER. An evaluation of our approach indicatesthat, on average TESTILIZER can generate test suites with improvements by 105%in mutation score (fault detection rate) and by 22% in code coverage, compared tothe original human-written test suite.3.1 IntroductionTesting modern web applications is challenging since multiple languages, such asHTML, JavaScript, CSS, and server-side code, interact with each other to createthe application. The final result of all these interactions at runtime is manifestedthrough the DOM and presented to the end-user in the browser. To avoid deal-ing with all these complex interactions separately, many developers treat the webapplication as a black-box and test it via its manifested DOM, using testing frame-works such as SELENIUM [42]. These DOM-based test cases are written manually,which is a tedious process with an incomplete result.On the other hand, many automated testing techniques [74, 121, 159, 169] arebased on crawling to explore the state space of the application. Although crawling-based techniques automate the testing to a great extent, they are limited in threeareas:• Input values: Having valid input values is crucial for proper coverage of thestate space of the application. Generating these input values automatically ischallenging since many web applications require a specific type, value, andcombination of inputs to expose the hidden states behind input fields andforms.• Paths to explore: Industrial web applications have a huge state space. Cov-ering the whole space is infeasible in practice. To avoid unbounded explo-ration, which could result in state explosion, users define constraints on thedepth of the path, exploration time or number of states. Not knowing whichpaths are important to explore results in obtaining a partial coverage of aspecific region of the application.• Assertions: Any generated test case needs to assert the application be-haviour. However, generating proper assertions automatically without hu-38man knowledge is known to be challenging. As a result, many web testingtechniques rely on generic invariants [121] or standard validators [66] toavoid this problem.These two approaches work at the two extreme ends of the spectrum, namely,fully manual or fully automatic. We believe combining the two can be advanta-geous. In particular, humans may have the domain knowledge to see which in-teractions are more likely or important to cover than others; they may be able touse domain knowledge to enter valid data into forms; and, they might know whatelements on the page need to be asserted and how. This knowledge is typicallymanifested in manually-written test cases.In this work, we propose to (1) mine the human knowledge existing in manually-written test cases, (2) combine that inferred knowledge with the power of auto-mated crawling, and (3) extend the test suite for uncovered/unchecked portions ofthe web application under test. We present our technique and tool called TESTILIZER,which given a set of SELENIUM test cases TC and the URL of the application, au-tomatically infers a model from TC, feeds that model to a crawler to expand by ex-ploring uncovered paths and states, generates assertions for newly detected statesbased on the patterns learned from TC, and finally generates new test cases.To the best of our knowledge, this work is the first to propose an approach forextending a web application test suite by leveraging existing test cases. The maincontributions of our work include:• A novel technique to address limitations of automated test generation tech-niques by leveraging human knowledge from existing test cases.• An algorithm for mining existing test cases to infer a model that includes (1)input data, (2) event sequences, (3) and assertions, and feeding and expand-ing that model through automated crawling.• An algorithm for reusing human-written assertions in existing test cases byexact/partial assertion matching as well as through a learning-based mecha-nism for finding similar assertions.• An open source implementation of our technique, called TESTILIZER [47].39• An empirical evaluation of the efficacy of the generated test cases on 8 webapplications. On average, TESTILIZER can generate test suites that improvethe mutation score (fault detection rate) by 105% and the code coverage by22%, compared to the original test suite.• An empirical evaluation of the impact of the original tests effectiveness onthe generated tests. Our results suggest that, there is a very strong correlationbetween the effectiveness of the original tests and the effectiveness of thegenerated tests.3.2 Background and MotivationIn practice, web applications are largely tested through their DOM using frame-works such as SELENIUM. The DOM is a dynamic tree-like structure representinguser interface elements in the web application, which can be dynamically updatedthrough client-side JavaScript interactions or server-side state changes propagatedto the client-side. DOM-based testing aims at bringing the application to a par-ticular DOM state through a sequence of actions, such as filling a form and click-ing on an element, and subsequently verifying the existence or properties (e.g.,text, visibility, structure) of particular DOM elements in that state. Figure 3.1 de-picts a snapshot of a web application and Figure 3.2 shows a simple DOM-based(SELENIUM) test case for that application.For this work, a DOM state is formally defined as:Definition 4 (DOM State) A DOM State DS is a rooted, directed, labeled tree. Itis denoted by a 5-tuple, <D,Q,o,Ω ,δ >, where D is the set of vertices, Q is the setof directed edges, o ∈D is the root vertex, Ω is a finite set of labels and δ : D→Ωis a labelling function that assigns a label from Ω to each vertex in D. 2The DOM state is essentially an abstracted version of the DOM tree of a webapplication, displayed on the web browser at runtime. This abstraction is con-ducted through the labelling function δ , the implementation of which is discussedin Section 3.3.1 and Section 3.4.40Figure 3.1: A snapshot of the running example (an organizer application) and its partial DOM structure.Motivation. Overall, our work is motivated by the fact that a human-written testsuite is a valuable source of domain knowledge, which can be exploited for tack-ling some of the challenges in automated web application test generation. Anothermotivation behind our work is that manually written test cases typically correspondto the most common happy-paths of the application that are covered. Automatedanalysis can subsequently expand these to cover unexplored bad-weather applica-tion behaviour.Running example. Figure 3.1 depicts a snapshot of the Organizer [38], a webapplication for managing notes, contacts, tasks, and appointments, which we useas a running example to show how input data, event paths, and assertions can beleveraged from the existing test cases to generate effective test cases.Suppose we have a small test suite that verifies the application’s functionalityfor “adding a new note” and “adding a new contact”. Due to space constraints,we only show the testAddNote test case in Figure 3.2. The test case containsvaluable information regarding how to log onto the Organizer (Lines 4–5), whatdata to insert (Lines 9–10), where to click (Lines 6, 8, 11, 13), and what to assert(Lines 7, 12).We believe this information can be extracted and leveraged in automated testgeneration. For example, the paths (i.e., sequence of actions) corresponding tothese covered functionalities can be used to create an abstract model of the appli-cation, shown in thick solid lines in Figure 3.3. By feeding this model that containsthe event sequences and input data leveraged from the test case to a crawler, we can411 @Test2 public void testAddNote(){3 get("http://localhost:8080/theorganizer/");4 findElement("logon_username")).sendKeys("user");5 findElement("logon_password")).sendKeys("pswd");6 findElement(By.cssSelector("input type="image"")).click();7 assertEquals("Welcome to The Organizer!", closeAlertAndGetItsText());8 findElement("newNote")).click();9 findElement("noteCreateShow_subject")).sendKeys("Running ←↩Example");10 findElement("noteCreateShow_text")).sendKeys("Create a simple ←↩running example");11 findElement(By.cssSelector("input type="image"")).click();12 assertEquals("Note has been created.", driver.findElement("←↩mainContent")).getText());13 findElement("logoff")).click();14 }Figure 3.2: A human-written DOM-based (SELENIUM) test case for the Organizer.s1s2notess5logoffs6contactss4dayAtAGlanceoklogofflogoffcontactss8dayAtAGlancenoteslogoffoks11dayAtAGlancedeleteupdatelogoffcontactss10dayAtAGlancedeleteupdatenoteslogoffcontactsIndexlogOns9createAccount dayAtAGlanceedits3newNotecontactsdayAtAGlancesavenotes logoffcontactsdayAtAGlanceeditnoteslogoffs7newContactdayAtAGlancesavenoteslogoffcontactscreateAccountFigure 3.3: Partial view of the running example application’s state-flow graph. Paths in thick solid linescorrespond to the covered functionalities in existing tests. Alternative paths, shown in thin lines, canbe explore using a crawler. Alternative paths through newly detected states (i.e., s10 and s11) arehighlighted as dashed lines.explore alternative paths for testing, shown as thin lines in Figure 3.3; alternativepaths for deleting/updating a note/contact that result in newly detected states (i.e.,s10 and s11) are highlighted as dashed lines.Further, the assertions in the test case can be used as guidelines for generatingnew assertions on the newly detected states along the alternative paths. These orig-inal assertions can be seen as parallel lines inside the nodes on the graph of Figure3.3. For instance, line 12 of Figure 3.2 verifies the existence of the text “Note hasbeen created” for an element (span) with id="mainContent", which can be42assigned to the DOM state s4 in Figure 3.3.By exploring alternative paths around existing paths and learning assertionsfrom existing assertions, new test cases can be generated. For example the eventscorresponding to states 〈Index,s1,s2,s10,s4,s5〉 can be turned into a new testmethod testUpdateNote(), which on state s4, verifies the existence of a<span> element with id="mainContent". Further, patterns found in exist-ing assertions can guide us to generate similar assertions for newly detected states(e.g., s9, s10, s11) that have no assertions.3.3 ApproachFigure 3.4 depicts an overview of our approach. At a high level, given the URL ofa web application and its human-written test suite, our approach mines the existingtest suite to infer a model of the covered DOM states and event-based transitionsincluding input values and assertions (blocks 1, 2, and 3). Using the inferred modelas input, it explores alternative paths leading to new DOM states, thus expandingthe model further (blocks 3 and 4). Next it regenerates assertions for the new states,based on the patterns found in the assertions of the existing test suite (block 5), andfinally generates a new test suite from the extended model, which is a superset ofthe original human-written test suite (block 6). We discuss each of these steps inmore details in the following subsections.Scope. The technique presented in this work is applicable only in the contextof user interface regression testing. We assume the current version of the givenapplication is correct and thus augmented test suites are not useful to test the sameversion of the software under test.3.3.1 Mining Human-Written Test CasesTo infer an initial model, in the first step, we (1) instrument and execute the human-written test suite T to mine an intermediate dataset of test operations. Using thisdataset, we (2) run the test operations to infer a state-flow graph (3) by analyzingDOM changes in the browser after the execution of each test operation.Instrumenting and executing the test suite. We instrument the test suite (block1 Figure 3.4) to collect information about DOM interactions such as elements ac-43DOM-based FixtureJavaScript Code(1)Instrument Code(2)Execute Function(4)Deduce DOM-dependant PCsExecution Trace(3)Collect Execution Trace(9)Generate Test caseGenerated Unit Tests(7)Generate Fixture(8)Apply Fixture to Cover a New PathInstrumented Code(6)Solve XPath expressions (5)Translate PCs to XPath expressionsSolved XML treeFigure 3.4: Processing view of our approach.cessed in actions (e.g., clicks) and assertions as well as the structure of the DOMstates covered.Definition 5 (Manual-test Path) A manual-test path is the sequence of event-basedactions performed while executing a human-written test case t ∈ T . 2Definition 6 (Manual-test State) A manual-test state is a DOM state located ona manual-test path. 2The instrumentation hooks into any code that interacts with the DOM in anypart of the test case, such as test setup, helper methods, and assertions. Note thatthis instrumentation does not affect the functionality of the test cases (more detailsin Section 3.4). By executing the instrumented test suite, we store all observedmanual-test paths as an intermediate dataset of test operations:Definition 7 (Test Operation) A test operation is a triple <action, target, input>,where action specifies an event-based action (e.g., a click), or an assertion (e.g.,verifying a text), target pertains to the DOM element to perform the action on, andinput specifies input values (e.g., data for filling a form). 244The sequence of these test operations forms a dataset that is used to infer theinitial model. For a test operation with an assertion as its action, we refer to thetarget DOM element as a checked element, defined as follows:Definition 8 (Checked Element) A checked element ce ∈ vi is an element in theDOM tree in state vi, which its existence, its attributes, or its value are checked inan assertion of a test case t ∈ T . 2For example in line 12 of the test case in Figure 3.2, the text value of theelement with ID "mainContent" is asserted and thus that element is a checkedelement. Part of the DOM structure at this state is shown in Figure 3.1, whichimplies that <span id="mainContent"> is that checked element.For each checked element we record the element location strategy used (e.g.,XPath, ID, tagname, linktext, or cssselector) as well as the access values and inner-HTML text. This information is later used in the assertion generation process (inSection 3.3.3).Constructing the initial model. We model a web application as a State-FlowGraph (SFG) [120, 121] that captures the dynamic DOM states as nodes and theevent-driven transitions between as edges.Definition 9 (State-flow Graph) A state-flow graph SFG for a web applicationW is a labeled, directed graph, denoted by a 4 tuple < r,V ,E ,L > where:1. r is the root node (called Index) representing the initial DOM state after Whas been fully loaded into the browser.2. V is a set of vertices representing the states. Each v ∈ V represents anabstract DOM state DS of W , with a labelling function Φ : V → A thatassigns a label fromA to each vertex in V , whereA is a finite set of DOM-based assertions in a test suite.3. E is a set of (directed) edges between vertices. Each (v1,v2) ∈ E repre-sents a clickable c connecting two states if and only if state v2 is reached byexecuting c in state v1.4. L is a labelling function that assigns a label, from a set of event types andDOM element properties, to each edge.5. SFG can have multi-edges and be cyclic. 245Algorithm 2: State-Flow Graph Inferenceinput : A Web application url URL, a DOM-based test suite T S, crawling constraints CCoutput: A state-flow graph SFGProcedure INFERSFG(URL,TS,CC)begin1 T Sinst ← INSTRUMENT(T S)2 EXECUTE(T Sinst)3 TOP← READTESTOPERATIONDATASET()4 SFGinit ←∅5 browser.GOTO(URL)6 dom← browser.GETDOM()7 SFGinit .ADDINITIALSTATE(dom)8 for top ∈ TOP do9 C← GETCLICKABLES(top)10 for c ∈C do11 assertion← GETASSERTION(top)12 dom← browser.GETDOM()13 robot.FIREEVENT(c)14 new dom← browser.GETDOM()15 if dom.HASCHANGED(new dom) then16 SFGinit .UPDATE(c,new dom,assertion)endend17 browser.GOTO(URL)end18 SFGext ← SFGinit19 EXPLOREALTERNTIVEPATHS(SFGext ,CC)20 return SFGextendProcedure EXPLOREALTERNTIVEPATHS(SFG,CC) begin21 while CONSTRAINTSATISFIED(CC) do22 s← GETNEXTTOEXPLORESTATE(SFG)23 C← GETCANDIDATECLICKABLES(s)24 for c ∈C do25 browser.GOTO(SFG.GETPATH(s))26 dom← browser.GETDOM()27 robot.FIREEVENT(c)28 new dom← browser.GETDOM()29 if dom.HASCHANGED(new dom) then30 SFG.UPDATE(c,new dom)31 EXPLOREALTERNTIVEPATHS(SFG,CC)endendendendAn example of such a partial SFG is shown in Figure 3.3. The abstract DOMstate is an abstracted version of the DOM tree of a web application, displayed on theweb browser at runtime. This abstraction can be conducted by using a DOM stringedit distance, or by disregarding specific aspects of a DOM tree (such as irrelevant46attributes, time stamps, or styling issues) [121]. The state abstraction plays animportant role in reducing the size of SFG since many subtle DOM differences donot represent a proper state change, e.g., when a row is added to a table.Algorithm 2 shows how the initial SFG is inferred from the manual-test paths.First the initial index state is added as a node to an empty SFG (Algorithm 2, lines5–7). Next, for each test operation in the mined dataset (TOP), it finds DOM el-ements using the locator information and applies the corresponding actions. If anaction is a DOM-based assertion, the assertion is added to the set of assertions ofthe corresponding DOM state node (Algorithm 2, lines 8–17). The state compari-son to determine a new state (line 15) is carried out via a state abstraction function(more explanation in Section 3.4).3.3.2 Exploring Alternative PathsAt this stage, we have a state-flow graph that represents the covered states and pathsfrom the human-written test suite. In order to further explore the web applicationto find alternative paths and new states, we seed the graph to an automated crawler(block 4 Figure 3.4).The exploration strategy can be conducted in various ways: (1) remaining closeto the manual-test paths, (2) diverging [123] from the manual-test paths, or (3)randomly exploring. However, in this work, we have opted for the first option,remaining close to the manual-test paths. The reason is to maximize the potentialfor reuse of and learning from existing assertions. Our insight is that if we divergetoo much from the manual-test paths and states, the human-written assertions willalso be too disparate and thus less useful. We evaluate this exploration approach inSection 3.5.To find alternative paths, events are generated on DOM elements and if such anevent mutates the DOM state, the new state and the corresponding event transitionare added to the SFG. Note that the state comparison to determine a new state (line29) is done via the same state abstraction function used before (line 15). The pro-cedure ExploreAlternativePaths (Algorithm 2, lines 21–31) repeatedlycrawls the application until a crawling constraint (e.g., maximum time, or numberof states) is reached. The exploration is guided by the manual-test states while ex-47ploring alternative paths (Line 22); GetNexToExploreState decides whichstate should be expanded next. It gives the highest priority to the manual-test statesand when all manual-test states are fully expanded, the next immediate states foundare explored further. More specifically, it randomly selects a manual-test state thathas some unexercised candidate clickables and navigates the application throughthat state. The GetCandidateClickable method (Line 23) returns a set ofcandidate clickables that can be applied on the selected state. This process is re-peated until all manual-test states are fully expanded.For example, consider the manual-test sates shown in grey circles in Figure3.3. The alternative paths exploration method starts by randomly selecting a statesuch as s2, navigating the application to reach to that state from the Index state,and firing an event on s2 which can result in an alternative state s10.3.3.3 Regenerating AssertionsThe next step is to generate assertions for the new DOM states in the extendedSFG (block 5 Figure 3.4). In this work, we propose to leverage existing assertionsto regenerate new ones. By analyzing human-written assertions we can leverageinformation about (1) portions of the page that are more important for testing; forexample, a banner section or decoration parts of a page might not be as importantas an inner content that changes according to a main functionality, (2) patterns inthe page that are part of a template. Therefore, extracting important DOM patternsfrom existing assertions may help us in generating new but similar assertions.We formally define a DOM-based assertion as a function A : (s,c)→ {True,False}, where s is a DOM state, and c is a DOM condition to be checked. Itreturns True if s matches/satisfies the condition c, denoted by s |= c, and Falseotherwise. We say that an assertion A subsumes (implies) assertion B, denoted byA =⇒ B, if A→ True, then B→ True. This means that B can be obtained fromA by weakening A’s condition. In this case, A is more specific/constrained thanB. For instance, an assertion verifying the existence of a checked element <spanid="mainContent"> can be implied by an assertion which verifies both theexistence of that element and its attributes/textual values.Algorithm 3 shows our assertion regeneration procedure. We consider each48manual-test state si (Definition 6) in the SFG and try to reuse existing associatedassertions in si or generate new ones based on them for another state s j. We extendthe set of DOM-based assertions in three forms: (1) reusing the same assertionsfrom manual-test states for states without such assertions, (2) regenerating asser-tions with the exact assertion pattern structure as the original assertions but adaptedfor another state, and (3) learning assertions structures of the original assertions,and generate similar ones for another state.Assertion ReuseAs an example for the assertion reuse, consider Figure 3.3 and the manual-testpath with the sequence of states 〈Index,s1,s2,s3,s4,s5〉 for adding a note. Asser-tions in Figure 3.2 line 7 and 12 are associated to states s1 and s4, respectively.Suppose that we explore an alternative path for deleting a note with the sequence〈Index,s1,s2,s10,s4,s5〉, which was not originally considered by the developer.Since the two test paths share a common path from Index to s1, the assertion on s1can be reused for the new test case (note deletion) as well. This is a simple form ofassertion reuse on new test paths.Assertion RegenerationWe regenerate two types of precondition assertions namely exact element-basedassertions, and exact region-based assertions. By “exact” we mean repetition ofthe same structure of an original assertion on a checked element.The rationale behind our technique is to use the location and properties ofchecked elements and their close-by neighbourhood in the DOM tree to regen-erate assertions, which focus on the exact repeated structures and patterns in otherDOM states. This approach is based on our intuition that checking the close-byneighbour of checked elements is similarly important.Exact element assertion generation. We define assertions of the formA (s j,c(e j))with a condition c(e j) for element e j on state s j. Given an existing checked element(Definition 8) ei on a DOM state si, we consider 2 conditions as follows:1. ElementFullMatched: If a DOM state s j contains an element with exact tag,attributes, and text value as ei, then reuse assertion on ei for checking e j on49Algorithm 3: Assertion Regenerationinput : An extended state-flow graph SFG = < r,V,E,L >Procedure REGENERATEASSERTIONS(SFG)begin/*Learn from DOM elements in the manual-test states*/1 dataset←MAKEDATASET(SFG.GETMANUALTESTSTATES())2 TRAIN(dataset)3 for si ∈V do4 for ce ∈ si.GETCHECKEDELEMENTS() do5 assert← ce.GETASSERTION()6 cer← ce.GETCHECKEDELEMENTREGION()7 si.ADDREGFULLASSERTION(cer)8 for s j ∈V & s j 6= si do9 dom← s j.GETDOM()/*Generate exact element assertion for s j*/10 if ELEMENTFULLMATCHED(ce,dom) then11 s j .REUSEASSERTION(ce,assert)end12 else if ELEMENTTAGATTMATCHED(ce,dom) then13 s j .ADDELEMTAGATTASSERTION(ce)end/*Generate exact region assertion for s j*/14 if REGIONFULLMATCHED(cer,dom) then15 s j .ADDREGFULLASSERTION(cer)end16 else if REGIONTAGATTMATCHED(cer,dom) then17 s j .ADDREGTAGATTASSERTION(cer)end18 else if REGIONTAGMATCHED(cer,dom) then19 s j .ADDREGTAGASSERTION(cer)endendend/* Generate similar region assertions for si */20 for be ∈ si.GETBLOCKELEMENTS() do21 if PREDICT(be) == 1 then22 si.ADDREGTAGATTASSERTION(be.GETREGION())endendendends j.2. ElementTagAttMatched: If a DOM state s j contains an element e j with exacttag and attributes, but different text value as ei, then generate assertion on e jfor checking its tag and attributes.Table 3.1 summarizes these conditions. An example of a generated assertion is50Table 3.1: Summary of the assertion reuse/regeneration conditions for an element e j on a DOM state s j ,given a checked element ei on state si.Condition DescriptionElementFullMatched Tag(ei)=Tag(e j) ∧ Att(ei)=Att(e j) ∧T xt(ei)=T xt(e j)ElementTagAttMatched Tag(ei)=Tag(e j)∧Att(ei)=Att(e j)RegionFullMatched Tag(R(ei,si))=Tag(R(e j,s j)) ∧Att(R(ei,si))=Att(R(e j,s j)) ∧T xt(R(ei,si))=T xt(R(e j,s j))RegionTagAttMatched Tag(R(ei,si))=Tag(R(e j,s j)) ∧Att(R(ei,si))=Att(R(e j,s j))RegionTagMatched Tag(R(ei,si))=Tag(R(e j,s j))assertTrue(isElementPresent("mainContent"))) whichchecks the existence of a checked element with ID "mainContent". Such anassertion can be evaluated in any state in the SFG that contains that DOM element(and thus meets the precondition). Note that we could also propose assertions incase of mere tag matches, however, such assertions are not generally considereduseful as they are too generic.Exact region assertion generation. We define the term checked element region torefer to a close-by area around a checked element:Definition 10 (Checked Element Region) For a checked element e on state s, achecked element regionR(e,s), is a functionR : (e,s)→{e,P(e),C h(e)}, whereP(e) and C h(e) are the parent node, and children nodes of e respectively. 2For example, for the element e = <span id="mainContent"> (Fig-ure 3.1), which is in fact a checked element in line 12 of Figure 3.2 (at states4 in Figure 3.3), we have R(e,s4) = {e,P(e),C h(e)}, where P(e)=<tdclass="cssMain" valign="top">, and C h(e)={<img src="img/head notes.gif">, <p>, <input id="ok" src="img/ok0.gif">}.We define assertions of the form A (s j,c(R(e j,s j))) with a conditionc(R(e j,s j)) for the region R of an element e j on state s j. Given an existingchecked element ei on a DOM state si, we consider 3 conditions as follows:1. RegionFullMatched: If a DOM state s j contains an element e j with exact tag,attributes, and text values of R(e j,s j) as R(ei,si), then generate assertiononR(e j,s j) for checking its tag, attributes, and text values.512. RegionTagAttMatched: If a DOM state s j contains an element e j with exacttag, and attributes values ofR(e j,s j) asR(ei,si), then generate assertion onR(e j,s j) for checking its tag and attributes values.3. RegionTagMatched: If a DOM state s j contains an element e j with exacttag value of R(e j,s j) as R(ei,si), then generate assertion on R(e j,s j) forchecking its tag value.Note that the assertion conditions are relaxed one after another. In other words,on a DOM state s, if s |= RegionFullMatched, then s |= RegionTagAttMatched;and if s |= RegionTagAttMatched, then we have s |= RegionTagMatched. Conse-quently it suffices to use the most constrained assertion. We use this property forreducing the number of generated assertions in Section 22.Table 3.1 summarizes these conditions. Assertions that we generate for achecked element region, are targeted around a checked element. For instance, tocheck if a DOM state contains a checked element region with its tag, attributes, andtext values, an assertion will be generated in the form of assertTrue(isElementRegionFullPresent(parentElement,element,childrenElements)), where parentElement, element, and childrenElementsare objects reflecting information about that region on the DOM.For each checked element ce on si, we also generate a RegionFull type ofassertion for checking its region, i.e., verifying RegionFullMatched condition onsi (Algorithm 3 lines 6–7). Lines 10–13 perform exact element assertion gener-ation. The original assertion can be reused in case of ElementFullMatched (line11). Lines 14–19 apply exact region assertion generation based on the observedmatching. Notice the hierarchical selection which guarantees generation of morespecific assertions.Learning Assertions for Similar RegionsThe described exact element/region assertion regeneration techniques only con-sider the exact repetition of a checked element/region. However, there might bemany other DOM elements that are similar to the checked elements but not ex-actly the same. For instance, consider Figure 3.2 line 12 in which a <spanid="mainContent"> element was checked in an assertion. If in another state,52a <div id="centreDiv"> element exists, which is similar to the <span>element in certain aspects such as content and position on the page, we could gen-erate a DOM-based assertion for the <div> element in the form of assertTrue(isElementPresent("centreDiv")));.We view the problem of generating similar assertions as a classification prob-lem which decides whether a block level DOM element is important to be checkedby an assertion or not. To this end, we apply machine learning to train a classi-fier based on the features of the checked elements in existing assertions. Morespecifically, given a training dataset D of n DOM elements in the form D ={(xi,yi) | xi ∈ Rp, yi ∈ {−1,1}}ni=1, where each xi is a p-dimensional real vectorrepresenting the features of a DOM element ei, and yi indicates whether ei is achecked element (+1) or not (−1), the classification function F : xj→ yi maps afeature vector x j to its class label y j. To do so, we use Support Vector Machine(SVM) [179] to find the max-margin hyperplane that divides the elements withyi = 1 from those with yi =−1. We chose SVM because it outperforms other clas-sification algorithms that we tried. In the rest of this subsection, we describe ourused features, how to label the feature vectors, and how to generate similar regionDOM-based assertions.DOM element features. We present a set of features for a DOM element to beused in our classification task. A feature extraction function ψ : e→ x maps anelement e to its feature set x. Many of these features are based on and adaptedfrom the work in [164], which performs page segmentation ranking for adapta-tion purpose. The work presented a number of spatial and content features thatcapture the importance of a webpage segment based on a comprehensive userstudy. Although they targeted a different problem than ours, we gained insightfrom their empirical work and use that to reason about the importance of a pagesegment for testing purposes. Our proposed DOM features are presented in Ta-ble 3.2. We normalize feature values between [0–1] as explained in Table 3.2, tobe used in the learning phase. For example, consider the element e = <span>in Figure 3.1, then ψ(e) = 〈0.5, 0.7, 0.6, 0.5, 1, 0.2, 0, 0.3〉 corresponding tofeatures BlockCenterX, BlockCenterY, BlockWidth, BlockHeight,TextImportance, InnerHtmlLength, LinkNum, and ChildrenNum, re-spectively.53Table 3.2: DOM element features used to train a classifier.Feature Name Definition RationaleElementCenterX,ElementCen-terYThe (x,y) coordinates of the centre of a DOM ele-ment. BlockCenterX and BlockCenterY are normal-ized by dividing by PageWidth and PageHeight (i.e.,the width and height of the whole page) respectively.Web designers typically put the most important information (main con-tent) in the centre of the page, the navigation bar on the header or on theleft side, and the copyright on the footer [164]. Thus, if the (x,y) coordi-nate of the centre of a DOM block is close to the (x,y) coordinate of theweb page centre, that block is more likely to be part of the main content.ElementWidth,ElementHeightThese are the width and height of the DOM element,which are also normalized by dividing by PageWidthand PageHeight, respectively.The width and height of an element can be an indication for an importantsegment. Intuitively, large blocks typically contain much irrelevant noisycontent [164].TextImportance This binary value feature indicates whether the blockelement contains any visually important text.Text in bold/italic style, or header elements (such as h1, h2,..., h5) tohighlight and emphasize textual content usually imply importance in thatregion.InnerHtmlLength The innerHtmlLength is the length of all HTML codestring (without whitespace) in the element block. Wenormalize this value by dividing it by InnerHtml-Length of the whole page.The normalized feature value can indicate the block content size. Intu-itively, blocks with many sub-blocks and elements are considered to beless important than those with fewer but more specific content [164].LinkNum The LinkNum is the number of anchor (hyperlink) el-ements inside the DOM element and is normalized bythe link number of the whole page.If a DOM region contains clickables, it is likely part of a navigationalstructure (menu) and not part of the main content [164].ChildrenNum The ChildrenNum is the number of child nodes undera DOM node. We normalize this value by dividing itby a constant number (10 in our implementation) andsetting the normalized value to 1 if it exceeds 1.We have observed in many DOM-based test cases that checked elementsdo not have a large number of children nodes. Therefore, this feature canbe used to discourage elements with many children to be selected for aregion assertion, to enhance test readability.54Labelling the feature vectors. For the training phase, we need a dataset of featurevectors for DOM elements annotated with +1 (important to be checked in assertion)and -1 (not important for testing) labels. After generating a feature vector for each“checked DOM element”, we label it by +1. For some elements with label -1,we consider those with “most frequent features” over all the manual-test states.Unlike previous work that focuses on DOM invariants [148], our insight is thatDOM subtrees that are invariant across manual-test states, are less important to bechecked in assertions. In fact, most modern web applications execute a significantamount of client-side code in the browser to mutate the DOM at runtime; henceDOM elements that remain unchanged across application execution are more likelyto be related to fixed (server-side) HTML templates. Consequently, such elementsare less likely to contain functionality errors. Thus, for our feature vectors weconsider all block elements (such as div, span, table) on the manual-test statesand rank them in a decreasing order based on their occurrences. In order to havea balanced dataset of items belonging to {-1,+1}, we select the k-top ranked (i.e.,k most frequent) elements with label -1, were k equals the number of label +1samples.Predicting new DOM elements. Once the SVM is trained on the dataset, it isused to predict whether a given DOM element should be checked in an assertion(algorithm 3, Lines 20–23). If the conditionF (ψ : e→ x)=1 holds, we generate aRegionTagAtt type assertion (i.e., checking tag and attributes of a region). Wedo not consider a RegionFull (i.e., checking tag, attributes, and text of a region)assertion type in this case because we are dealing with a similar detected region,not an exact one. Also, we do not generate a RegionTag assertion type becausea higher priority should be given to the similar region-based assertions.Assertion MinimizationThe proposed assertion regeneration technique can generate many DOM-based as-sertions per state, which in turn can make the generated test method hard to com-prehend and maintain. Therefore, we (1) avoid generating redundant assertions,and (2) prioritize assertions based on their constraints and effectiveness.Avoiding redundant assertions. A new reused/generated assertion for a state55RegionTagRegionTagAttributeRegionFullElementTagAttributeElementFullFigure 3.5: The subsumption lattice showing the is-subsumed-by relation for generated assertions.(Algorithm 3, lines 5, 11, 13, 15, 17, 19, and 22), might already be sub-sumed by, or may subsume other assertions, in that state. For example an ex-act element assertion which verifies the existence of a checked element <spanid="mainContent"> can be subsumed by an exact region assertion which hasthe same span element in either its checked element, parent, or its children nodes.Assertions that are subsumed by other assertions are redundant and safely elimi-nated to reduce the overhead in testing time and increase the readability and main-tainability of test cases. For a given state s with an existing assertion B, a newassertion A generated for s is treated as follows:Discard A ; if B =⇒ AReplace B with A ; if A =⇒ B andB 6∈ original assertionsAdd A to s ; otherwiseFigure 3.5 depicts the subsumption lattice in which arrows indicate is-subsumed-by relation. For instance, assertions of type RegionFull (checking thetag, attributes, and text of a region) subsume all other weaker assertions such asRegionTagAttribute (checking the tag and attributes of a region), or ElementFull(checking the tag, attributes, and text of an element).Prioritizing assertions. We prioritize the generated assertions such that given amaximum number of assertions to produce per state, the more effective ones areranked higher and chosen. We prioritize assertions in each state in the following or-56der; the highest priority is given to the original human-written assertions. Next arethe reused, the RegionFull, the RegionTagAtt, the ElementTagAtt, andthe RegionAtt assertions. This ordering gives higher priorities to more speci-fic/constrained assertions first.3.3.4 Test Suite GenerationIn the final step, we generate a test suite from the extended state-flow graph. Eachpath from the Index node to a sink node (i.e., node without outgoing edges) inthe SFG is transformed into a unit test. Loops are included once. Each test casecaptures the sequence of events as well as any assertions for the target states. Tomake the test case more readable for the developers, information (such as tag nameand attributes) about related DOM elements is generated as code comments.Filtering unstable assertions. After generating the extended test suite, we makesure that the reused/regenerated assertions are stable, i.e., do not falsely fail, whenrunning the test suite on an unmodified version of the web application. Some ofthese assertions are not only DOM related but also depend on the specific paththrough which the DOM state is reached. Our technique automatically identifiesand filters these false positive cases from the generated test suite. This is donethrough executing the generated test suite and eliminating failing assertions formthe test cases iteratively, until all tests pass successfully.3.4 ImplementationThe approach is implemented in a tool, called TESTILIZER, which is publicly avail-able [47]. The state exploration component is built on top of CRAWLJAX [120].TESTILIZER requires as input the source code of the human-written test suite andthe URL of the web application. Our tool does not modify the web application thusthe source code of web application is not required. TESTILIZER currently supportsSELENIUM tests, however, our approach can be easily applied to other DOM-basedtests as well.To instrument the test cases, we use JavaParser [32] to get an abstract syntaxtree. We instrument all DOM related method calls and calls with arguments thathave DOM element locaters. The identified DOM related method call expressions,57such as findElement in SELENIUM tests, will be then wrapped by our ownmethod call that enables us to collect information such as page source, test casename, elements under test, element locator, and assertions, from test executionat runtime. We also log the DOM state after every event in the tests, capable ofchanging the DOM. Note that this instrumentation does not affect the functionalityof the test cases.For the state abstraction function (as defined in Definition 4), we generate anabstract DOM state by ignoring recurring structures (patterns such as table rowsand list items), textual content (such as ignoring the text node “Note has beencreated” in the partial DOM shown in Figure 3.1), and contents in the <script>tags. For the classification step, we use LIBSVM [72], which is a popular libraryfor support vector machines.3.5 Empirical EvaluationTo assess the efficacy of our proposed technique, we have conducted a controlledexperiment to address the following research questions:RQ1. How much of the information (input data, event sequences, and assertions)in the original human-written test suite is leveraged by TESTILIZER?RQ2. How successful is TESTILIZER in regenerating effective assertions?RQ3. How does the effectiveness of the original test suite affect the effectivenessof the generated tests?RQ4. Does TESTILIZER improve code coverage?Our experimental data along with the implementation of TESTILIZER are avail-able for download [47].3.5.1 Experimental ObjectsWe selected eight open-source web applications that contain existing SELENIUMtest suites and fall under different application domains. Moreover, since our ef-fectiveness assessment (RQ2) is based on the client-side mutation analysis (as ex-58Table 3.3: Experimental objects.ID Name SLOC # Test # Assertions # MutantsMethods1 Claroline PHP (295K) 23 35 38(v 1.11.7) JS (36K)2 PhotoGallery PHP (5.6K) 7 18 26(v 3.31) JS (1.5K)3 WolfCMS PHP (35K) 12 42 28(v 0.7.8) JS (1.3K)4 EnterpriseStore Java (3K) 19 17 52(v 1.0.0) JS (57K)5 CookeryBook PHP (25K) 6 25 22JS (0.4K)6 AddressBook PHP (30.2K) 13 13 30(v 8.2.5) JS (1.1K)7 StudyRoom PHP (1.6K) 12 23 38JS (10.6K)8 Brotherhood PHP (212K) 10 10 41(v 0.4.5) JS (0.8K)plained in Section 3.5.2), we chose applications that make extensive use of client-side JavaScript.The experimental objects and their properties are shown in Table 3.3. Claroline[21] is a collaborative e-learning environment, which allows instructors to createand administer courses. Phormer [39] is a photo gallery equipped with upload,comment, rate, and slideshow functionalities. WolfCMS [18] is a content manage-ment system. EnterpriseStore [6] is an enterprise asset management web applica-tion. CookeryBook [5] is an adaptable cookery book that customizes the presenta-tion of recipes. AddressBook [1] is an address/phone book. SimulatedStudyRoom[14] is a a web interface for the projector system of a study room, which simulatesan outdoor study environment. Brotherhood [2] is an online social networking plat-form for sharing interests, activities, and updates, sending messages, and uploadingphotos.3.5.2 Experimental SetupOur experiments are performed on Mac OS X, running on a 2.3GHz Intel Core i7CPU with 8 GB memory, and FireFox 38.0.59Table 3.4: Test suite generation methods evaluated.Test Suite Exploration Strategy Action Sequence AssertionGeneration Generation GenerationMethod Method MethodORIG Manual Manual ManualEXND+AR(TESTILIZER)Explore around manual-test states inthe initial SFG (EXND)Traversing paths in the extendedSFG generated from the originaltestsAssertionregen-eration(AR)EXND*+AR(TESTILIZER*)Explore around newly discoveredstates in the extended SFG (EXND*)Traversing paths in the extendedSFG generated from the originaltestsAssertionregen-eration(AR)EXND+RND Explore around states in the initialSFG (EXND)Traversing paths in the extendedSFG generated from the originaltestsRandom(RND)RAND+RND Random crawling (RAND) Traversing paths in the SFG gen-erated by random crawlingRandom(RND)Independent VariablesWe compare the original human-written test suites with the test suites generated byTESTILIZER.Test suite generation method. We evaluate different test suite generation meth-ods for each application as presented in Table 3.4. We compare EXND+AR(TESTILIZER) with 4 baselines, (1) ORIG: original human-written test suite, (2)EXND*+AR (TESTILIZER*): differs from TESTILIZER in the alternative path ex-ploration approach, i.e., EXND* explores around the newly discovered states ratherthan exploring around states in the initial SFG. In other word, paths in the extendedSFG are more diverse from manual-test paths. This is to evaluate our proposedexploration approach (Section 3.3.2). (3) EXND+RND: test suite generated bytraversing the extended SFG, equipped with random assertion generation, and (4)RAND+RND: random exploration and random assertion generation.In random assertion generation, for each state we generate element/region as-sertions by randomly selecting from a pool of DOM-based assertions. These ran-dom assertions are based on the existence of an element/region in a DOM state.Such assertions are expected to pass as long as the application is not modified.However, due to our state abstraction mechanism, this can result in unstable asser-tions, which are also automatically eliminated following the approach explained inSection further evaluate various instantiations of our assertion generation inEXND+AR:• (a) original assertions,• (b) reused assertions (Section 22),• (c) exact generated (Section 22),• (d) similar region generated (Section 22),• (e) a combination of all these types.Exploration constraints. We confine the max exploration time to five minutes inour experiments to evaluate generated models of different sizes. Suppose in theEXND/EXND* approaches, TESTILIZER/TESTILIZER* spend time t to generatethe initial SFG for an application. To make a fair comparison, we add this timet to the 5 minutes for the RAND exploration approach. We set no limits on thecrawling depth nor the maximum number of states to be discovered while lookingfor alternative paths. Note that for EXND/EXND* and RAND crawling, after aclickable element on a state was exercised, the crawler resets to the index page andcontinues crawling from another chosen state.Maximum number of generated assertions. We set the maximum number ofgenerated assertions for each state to five. To have a fair comparison, also for theEXND+RND and RAND+RND methods, we perform the same assertion prioriti-zation used in TESTILIZER and select the top ranked.Learning parameters. We set the SVM’s kernel function to the Gaussian RBF,and use 5-fold cross-validation for tuning the model and feature selection.Dependent VariablesOriginal coverage. To assess how much of the information including input data,event sequences, and assertions of the original test suite is leveraged (RQ1), wemeasure the state and transition coverage of the initial SFG (i.e., SFG mined fromthe original test cases). We also measure how much of the unique assertions andunique input data in the original test cases has been utilized.61Table 3.5: DOM mutation operators.ID Mutation Operator Description1 Changing the id/tag name in getElementById and getElementByTagName methods.2 Changing the attribute name/value in setAttribute, getAttribute and removeAttributemethods.3 Removing $ sign that returns a jQuery object.4 Changing the innerHTML assignment of an element to an empty string.5 Swapping innerHTML and innerText properties.6 Changing the name of the property/class/element in the addClass, removeClass, removeAttr,remove, attr, and css methods in jQuery.7 Changing request type (Get/Post), URL, and the value of the boolean asynch argument in method.Mutation score. In the context of regression testing, a realistic evaluation of theeffectiveness of generated tests requires different versions of the same subject ap-plications. However, we did not have such multiple versions that work fine withthe existing tests. Thus, to answer RQ2 (assertions effectiveness), we evaluatethe DOM-based fault detection capability of TESTILIZER through automated first-order mutation analysis. The test suites are evaluated based on the number ofdetected (or killed) mutants, which is known as the mutation score. Mutation scoreis used to measure the effectiveness of a test suite in terms of its ability to detectfaults [186]. The mutation score is computed as the percentage of killed mutantsover all (non-equivalent) mutants.We apply the DOM, jQuery, and XHR mutation operators at the JavaScriptcode level, shown in Table 3.5, which are based on a study of common mistakesmade by web developers [129]. For each experimental object we generate as muchpossible mutated (faulty) versions, each containing one injected artificial defectinto the JavaScript code. To do so, we carefully searched within JavaScript filesof each subject system for target mutation locations based on the defined mutationoperators. The last column of Table 3.3 presents the number of injected mutantsfor each application with an average of 34 generated mutants.Correlation. For RQ3 (the impact of original tests effectiveness), we use the non-parametric Spearman’s correlation to quantitively study the relationship betweenthe effectiveness of the original and the generated test suites. The Spearman’s cor-relation coefficient measures the monotonic relationship between two continuousrandom variables. The Spearman correlation coefficient does not require the data62ORIG EXND EXND* RAND10203040506070# States(a) Number of states.ORIG EXND EXND* RAND020406080# Transitions(b) Number of transitions.ORIG EXND EXND* RAND0102030405060# Test cases(c) Number of test cases.Figure 3.6: Box plots of number of states, transitions, and test cases for different test suites. Mean valuesare shown with (*).to be normally distributed [103].In order to measure the influence of the effectiveness of the original tests onthe effectiveness of the generated test, we compute the correlation between themutation score of the original tests and the generated tests. For this purpose, wegenerate different versions of a test suite by removing some test cases, which canaffect the mutation score of the original test suite as well as the generated testsuite. Since applying mutation analysis on many different variants of test suitesfor all of the experimental objects is very costly, we only considered the Phormerphoto gallery application in this study and executed test suites 780 times. Theoriginal written test suite contains 7 test cases. We generate different combinationson test suites with 1 test case (7 variants) and 6 test cases (7 variants) out of 7 tests,resulting in 14 test suites. For each of the 14 test suites plus the original one (15in total) we use TESTILIZER to generate an extended test suite, resulting in 30 testsuites (15 original and 15 generated). Finally we calculate the mutation score foreach of the 30 test suites on 26 mutants, i.e., 780 test suite executions, and measurethe correlation between the mutation score of the 15 original and the 15 generatedtests.Code coverage. Code coverage has been commonly used as an indicator of thequality of a test suite by identifying under-tested parts, while it does not directlyimply the effectiveness of a test suite [97]. Although TESTILIZER does not targetcode coverage maximization, to address RQ4, we compare the JavaScript codecoverage of the different test suites using JSCover [33].633.5.3 ResultsBox plots in Figure 3.6 depict characteristics of the evaluated test suites with re-spect to the exploration strategies. For the random exploration (RAND), we re-port the average values over three runs. As expected, the number of states, tran-sitions, and generated test cases are higher in TESTILIZER/TESTILIZER* (shownas EXND/EXND*). RAND on average generates fewer states and transitions, butmore test cases compared to the original test suite. This is mainly due to the factthat in the SFG generated by RAND, there are more paths from the Index to thesink nodes than in the SFG mined from the original test suite. Results indicate thatthe number of states, transitions, and test cases are not that different in EXND orEXND* exploration methods.Figure 3.7 presents the average number of assertions per state before and afterfiltering the unstable ones (as explained in Section 3.3.4). The difference betweenthe number of actual generated assertions and the stable ones reveals that our gen-erated assertions (combined, similar/exact generated) are more stable than the ran-dom approach. The reduction percentage is 17%, 48%, 30%, 9%, 13%, 15%, 38%,and 35% for the original, reused, exact generated, similar generated, combined(TESTILIZER), TESTILIZER*, EXND+RND and RAND+RND, respectively.A major source of this instability is the selection of dynamic DOM elements inthe generated assertions. For instance, RND (random assertion generation) selectsmany DOM elements with dynamic time-based attributes. Also the more restrictedan assertion is, the less likely it is to remain stable in different paths. This isthe case for some of the (1) reused assertions that replicate the original assertionsand (2) exact generated ones specially FullRegionMatchs type. On the otherhand, learned assertions are less strict (e.g., AttTagRegionMatchs) and arethus more stable.Finding 1: Test suites generated by TESTILIZER compared to randomly gener-ated tests are more stable, i.e., they have fewer false failures.Overall, the test suite generated by TESTILIZER, on average, consists of 9%original assertions, 14% reused assertions, 33% exact generated assertions, and44% of similar learned assertions.Original SFG coverage (RQ1). Table 3.6 shows the usage of original test suite640	  1	  2	  3	  4	  5	  EXND	  +	  Original	  EXND	  +	  Reused	  EXND	  +	  Exact	  Generated	  EXND	  +	  Similar	  Generated	  EXND	  +	  Combined	  (Tes>lizer)	  EXND*	  +	  Combined	  (Tes>lizer*)	  EXND	  +	  RND	  RAND	  +	  RND	  Avg	  #	  Asse>ons	  per	  State	  Before	  Filtering	   A4er	  Filtering	  Figure 3.7: Average number of assertions per state, before and after filtering unstable assertions.Table 3.6: Statistics of the original test suite information usage, average over experimental objects.Test Suite Generation Original Original Original OriginalState Transition Input AssertionCoverage Coverage Data Usage UsageEXND/EXND*+AR 98% 96% 100% 100%EXND+RND 98% 96% 100% 0%RAND+RND 70% 63% 0% 0%information (RQ1) by different test suites. As expected TESTILIZER, which lever-ages the event sequences and inputs of the original test suite, has on average almostfull state (98%) and transition (96%) coverage of the initial model. This is the samefor EXND and EXND* exploration strategies. The few cases missed are due to thetraversal algorithm we used, which has limitations on dealing with cycles in thegraph that do not end with a sink node and thus are not generated. Note that wecan select the missing cases from the original manual-written test suite and addthem to the generated test suite.On average, the RAND exploration approach covered 70% of the states and of63% the transitions, without using input data except for the login data, which wasprovided to the crawler. By analyzing the generated test suites, we found that themissing of original states and transitions are mainly due to the lack of proper inputdata.Finding 2: Randomly generated tests are unable to produce many of the statesand transitions of the model extracted from the original written tests due to thelack of proper inputs.Test suite effectiveness (RQ2). Figure 3.8 depicts a comparison of mutation scores65020406080Mutation Score (%)ORIGEXND +OriginalEXND +ReusedEXND +ExactGeneratedEXND +SimilarGeneratedEXND +Combined(Testilizer)EXND* +Combined(Testilizer*)EXND +RNDRAND +RNDFigure 3.8: Box plots of mutation score using different test suite generation methods. Mean values areshown with (*).for the different methods. It is evident that exact and similar generated assertionsare more effective than original and reused ones. The effectiveness of each asser-tion generation technique solely is not more than the random approach. However,the results show that their combination (TESTILIZER) outperforms fault detectioncapability of the original test suite by 105% (21 percentage point increase) and therandom methods by 22% (8 percentage point increase). This supports our insightthat leveraging input values and assertions from human-written test suites can behelpful in generating more effective test cases.Finding 3: On Average, tests generated by TESTILIZER outperform the faultdetection capability of the original test suite by 105% (21 percentage point in-crease) and the random methods by 22% (8 percentage point increase).Test suites generated using TESTILIZER* are almost as effective as those gen-erated by TESTILIZER. Applying EXND and EXND* can result in extended SFGswith different states and transitions. Based on our observations of failed test cases,i.e. detected mutants, we can give two reasons for this similar effectiveness: (1)the injected mutants can be reflected in multiple DOM states and thus assertionsmay fail on states that can be different in the two extended SFGs. (2) most of thedetected mutants were reflected on common initial SFG states.Impact of the original tests effectiveness (RQ3). The Spearman correlation anal-ysis reveals that there exists a strong positive correlation (r= 0.91, p=0) betweenthe effectiveness of the original tests and the effectiveness of the generated tests.This is expected as more manual-test paths/states and assertions helps TESTILIZER66ORIG EXND EXND* RAND102030405060Code Coverage (%)Figure 3.9: Box plots of JavaScript code coverage achieved using different test suites. Mean values areshown with (*).to generate more tests with diverse assertions.Finding 4: Our results suggest that, there is a strong correlation between theeffectiveness of the original tests and the effectiveness of the generated tests.Thus the quality of tests generated by TESTILIZER is directly influenced by theoriginal test suite.Code coverage (RQ4). Although code coverage improvement is not the main goalof TESTILIZER in this work, the generated test suite has a slightly higher code cov-erage. As shown in Figure 3.9, on average there is a 22% improvement (8 percent-age point increase) over the original test suite and 19% improvement (7 percentagepoint increase) over the RAND test suite. Note that the original test suites werealready equipped with proper input data, but not many execution paths (thus theslight increase). On the other hand, the random exploration considered more pathsin a blind search, but without proper input data (except for login data). Code cov-erage is slightly higher for EXND compared to EXND*. Our observations revealthat giving higher priority for expanding manual-test states, i.e., EXND, resulted inexecuting lines of code and consequently covering new functionalities that couldbe missed by applying EXND*. The difference is, however, not significant andcan not imply that EXND can always achieve higher code coverage compared toEXND*.Finding 5: Test suites generated by TESTILIZER have slightly higher code cov-erage compared to the original ones and the ones generated by random explo-ration.673.6 Discussion3.6.1 ApplicationsTESTILIZER can be used in automated testing of web applications by generatingDOM-based tests given an existing test suite. The presented technique is applicableonly in the context of regression testing, which assumes the current version of thegiven application is correct. This assumption is the basis for assertion generationand the filtration process for unstable assertions.Although we proposed this work in the context of testing rich web interfaces, asimilar approach can be adapted to GUI applications in general such as desktop andmobile apps. For instance, Memon [116] proposed dynamic analysis techniques toreverse engineer an event-flow graph (EFG) of desktop GUI applications. Similarto the SFG used in our work, EFG can be used for test case generation.The results of this work support the usefulness of leveraging existing UI teststo generate new ones. However, TESTILIZER is limited to applications that alreadyhave human-written tests, which may not be so prevalent in practice. An interestingresearch question is whether human-written tests of similar applications can beleveraged to generate effective tests for applications without existing tests.3.6.2 Generating Negative AssertionsIn our experiments we observed that the majority of DOM-based assertions verifyeither the existence of an element, its properties, or its value on the DOM tree. Inthis work for fewer instances of negative assertions, which assert the absence ofan element, we only considered reusing such assertions on a same state. We be-lieve that its is possible to generate negative assertions using the existing manually-written tests for similar DOM states. One possible approach to calculate the simi-larity – between 0 and 1 – of two abstract DOM state in terms of similarity of theircorresponding string representation. Given a source state containing a negative as-sertion, and a target state, if the similarity between source and target is above 0.5,we can add the negative assertion from source to the set of assertions for the targetstate. Generating such negative assertions might in some cases produce unstableassertions that fail on the unmodified version of the application. Such unstable68assertions can be removed through the filtering process described in Section Test Case DependenciesAn assumption made in TESTILIZER is that the original test suite does not haveany test case dependencies. Generally, test cases should be executable withoutany special order or dependency on previous tests. However, it has been shownthat dependent tests exist in practice, and are sometimes difficult to identify [198].While conducting our evaluation, we also came across multiple test suites thatviolated this principle. For such cases, although TESTILIZER can generate testcases, failures can occur due to these dependencies.3.6.4 EffectivenessThe effectiveness of the generated test suite depends on multiple factors. First, thesize and the quality of the original test suite is very important; if the original testsuite does not contain paths with effective assertions, it is not possible to generatean effective extended test suite. Second, the learning-based approach can be tunedin various ways (e.g., selecting other features, changing the SVM parameters, andchoosing sample dataset size) to obtain better results. Also other learning tech-niques can be used and we used SVM due to the ease of use. Third, the size ofthe DOM subtree (region) to be checked can be increased to detect changes moreeffectively, however, it might come at the cost of making the test suite more brittle.Forth, the crawling time directly affects the size of the extended SFG and thus sizeof generated test suites, which may result in producing more effective tests.3.6.5 EfficiencyThe larger a test suite, the more time it takes to test an application. Since in manytesting environments time is limited, not all possible paths of events should begenerated in the extended test suite. The challenge is finding a balance betweeneffectiveness and efficiency of the test cases. The current graph traversal method inTESTILIZER may produce test cases that share common paths, which do not con-tribute much to fault detection or code coverage. An optimization could be realizedby guiding the test generation algorithm towards sates that have more constrained69DOM-based assertions.3.6.6 Threats to ValidityA threat to the external validity of our experiment is with regard to the gener-alization of the results to other web applications. We acknowledge that more webapplications should be evaluated to support the conclusions. To mitigate this threat,however, we selected our experimental objects from different domains with vari-ations in functionality and structure, which we believe they are representative ofreal-world web applications. Although SELENIUM is widely used in industry fortesting commercial web applications, unfortunately, very few open source web ap-plications are publicly available that have (working) SELENIUM test suites. More-over, since our effectiveness evaluation is based on the client-side mutation anal-ysis, as shown in Table 3.5, the chosen applications should have a considerableamount of JavaScript code. However, many available web applications with SE-LENIUM tests are mainly server-side containing embedded JavaScript code. There-fore, we were able to include a limited number of applications with not large testsuites in our study.One threat to the internal validity is related to the mutation score analysis forthe generated tests, for which the mutants were manually produced for the studiedsubjects. To mitigate the bias towards producing good results for TESTILIZER,we carefully searched within all JavaScript files of each subject system for targetmutation locations based on the defined mutation operators on DOM, jQuery, andXHR, and produced a mutant based on each observed instance similarly for allcases.With respect to reproducibility of our results, TESTILIZER, the test suites,and the experimental objects are publicly available, making the experiment repro-ducible.3.7 Related WorkElbaum et al. [81] leverage user-sessions for web application test generation.Based on this work, Sprenkle et al. [165] propose a tool to generate additionaltest cases based on the captured user-session data. McAllister et al. [115] leverage70user interactions for web testing. Their method relies on prerecorded traces of userinteractions and requires instrumenting one specific web application framework.These techniques do not consider leveraging knowledge from existing test cases.Yuan and Memon [192] propose an approach to iteratively rerun automat-ically generated test cases for generating alternating test cases. This is inlinewith feedback-directed testing [146], which leverages dynamic data producedby executing the program using previously generated test cases. For instance,Artemis [66] is a feedback-directed tool for automated testing of JavaScript ap-plications that uses generic oracles such as HTML validation. Our previous work,FeedEx [123], applies a feedback-directed exploration technique to guide the ex-ploration at runtime towards more coverage and higher navigational and structuraldiversity. These approaches also do not use information in existing test cases, andthey do not address the problem of test oracle generation.A generic approach used often as a test oracle is checking for thrown exceptionsand application crashes [191]. This is, however, not very helpful for web applica-tions as they do not crash easily and the browser continues the execution even afterexceptions. Current web testing techniques simplify the test oracle problem in thegenerated test cases by using soft oracles, such as generic user-defined oracles, andHTML validation [66, 121]. Mirshokraie and Mesbah [128], and Pattabiramanand Zorn [148] dynamically derive invariants for the JavaScript code and the DOMrespectively. Such inferred invariants are used for automated regression and ro-bustness testing. Mirshokraie et al. [130] perform mutation analysis to generatetest cases with oracles using the dynamic event-driven model of JavaScript. Theygenerate oracle for their test cases using mutation testing. However, they do notreuse the existing test inputs or oracles.Xie and Notkin [187] infer a model of the application under test by executingthe existing test cases. Dallmeier et al. [78] mine a specification of desktop sys-tems by executing the test cases. Schur et al. [159] infer behaviour models fromenterprise web applications via crawling. Their tool generates test cases simulatingpossible user inputs. Similarly, Xu et al. [188] mine executable specifications ofweb applications from SELENIUM test cases to create an abstraction of the system.Yoo and Harman [190] propose a search-based approach to reuse and regenerateexisting test data for primitive data types. They show that the knowledge of existing71test data can help to improve the quality of new generated test data. Alshahwan andHarman [63] generate new sequences of HTTP requests through a def-use analysisof server-side code. Pezze et al. [150] present a technique to generate integrationtest cases from existing unit test cases. Mirzaaghaei et al. [132] use test adaptationpatterns in existing test cases to support test suite evolution.Leveraging unit tests has been studied in the context of mining API usage[201], and test recommendation [67, 82, 90, 99, 107] where testers can get inspiredby recommended tests written for other similar applications. For instance Jan-jic and Atkinson [99] introduced recommendation techniques for JUnit tests usingpreviously written test cases and the knowledge extracted from them. Landha¨ußerand Tichy [107] explored the possibilities of test reuse for recommendation basedon clones. Such techniques can be utilized in JavaScript unit test generation tools,such as JSeft [130] or ConFix [127], to generate test for a similar code.Adamsen et al. [60] propose a technique for testing Android mobile apps bysystematically exposing existing tests to adverse conditions. Zhang et al. [197]reuse existing test cases for security testing by analyzing the security vulnerabili-ties in the execution traces. Tonella et al. [172] reuse existing DOM-based tests togenerate visual web tests. This work is also related to test suite augmentation tech-niques [155, 189] used in regression testing. In test suite augmentation the goal isto generate new test cases for the changed parts of the application. More related toour work is [181], which aggregates tests generated by different approaches usinga unified test case language. They propose a test advice framework that extractsinformation in the existing tests to help improve other tests or test generation tech-niques.Our work is different from these approaches in that we (1) reuse knowledgein existing human-written test cases in the context of web application testing, (2)reuse input values and event sequences in test cases to explore alternative paths andnews states of web application, and (3) reuse oracles of the test cases for regener-ating assertions to improve the fault finding capability of the test suite.723.8 ConclusionsThis work is motivated by the fact that a human-written test suite is a valuablesource of domain knowledge, which can be used to tackle some of the challengesin automated web application test generation. Given a web application and itsDOM-based (such as SELENIUM) test suite, our tool, called TESTILIZER, utilizesthe given test suite to generate effective test cases by exploring alternative paths ofthe application, and regenerating assertions for new detected states. Our empiricalresults on 8 real-world applications show that TESTILIZER easily outperforms arandom test generation technique, provides substantial improvements in the faultdetection rate compared with the original test suite, while slightly increasing codecoverage too.The results support the usefulness of leveraging existing DOM-based tests togenerate new ones. However, TESTILIZER is limited to applications that alreadyhave human-written tests, which may not be so prevalent in practice. On the otherhand, many web applications are similar to each other in terms of design and codebase, such as being built on top of the same content management system. Wepropose an open research problem whether human-written tests can be leveragedto generate effective tests for applications without existing tests. This is, how-ever, challenging particularly for assertion generation based on learned patterns.DOM-based assertions on abstract DOM states of an application may require somechanges to be applied on similar abstract DOM state of another application.73Chapter 4JavaScript: The (Un)covered PartsSummary6Testing JavaScript code is important. JavaScript has grown to be among the mostpopular programming languages and it is extensively used to create web appli-cations both on the client and server. We present the first empirical study ofJavaScript tests to characterize their prevalence, quality metrics (e.g. code cov-erage), and shortcomings. We perform our study across a representative corpus of373 JavaScript projects, with over 5.4 million lines of JavaScript code. Our resultsshow that 22% of the studied subjects do not have test code. About 40% of projectswith JavaScript at client-side do not have a test, while this is only about 3% for thepurely server-side JavaScript projects. Also tests for server-side code have highquality (in terms of code coverage, test code ratio, test commit ratio, and averagenumber of assertions per test), while tests for client-side code have moderate to lowquality. In general, tests written in Mocha, Tape, Tap, and Nodeunit frameworkshave high quality and those written without using any framework have low quality.We scrutinize the (un)covered parts of the code under test to find out root causesfor the uncovered code. Our results show that JavaScript tests lack proper coveragefor event-dependent callbacks (36%), asynchronous callbacks (53%), and DOM-related code (63%). We believe that it is worthwhile for the developer and researchcommunity to focus on testing techniques and tools to achieve better coverage fordifficult to cover JavaScript code.6An initial version of this chapter has been accepted to be published in the IEEE InternationalConference on Software Testing, Verification and Validation (ICST), 2017 [125].744.1 IntroductionJavaScript is currently the most widely used programming language according toa recent survey of more than 56K developers conducted by Stack Overflow [167],and also exploration of the programming languages used across GitHub reposito-ries [91]. JavaScript is extensively used to build responsive modern web applica-tions, and is also used to create desktop and mobile applications, as well as server-side network programs. Consequently, testing JavaScript applications and modulesis important. However, JavaScript is quite challenging to test and analyze due tosome of its specific features. For instance, the complex and dynamic interactionsbetween JavaScript and the DOM, makes it hard for developers to test effectively[66, 127, 130].To assist developers with writing tests, there exist number of JavaScript unittesting frameworks, such as Mocha [36], Jasmine[31], QUnit [40], and Nodeunit[37], each having its own advantages [52]. Also the research community have pro-posed some automated testing tools and test generation techniques for JavaScriptprograms [66, 95, 127, 130, 131], though they are not considerably used by testersand developers yet.Some JavaScript features, such as DOM interactions, event-dependent call-backs, asynchronous callbacks, and closures (hidden scopes), are considered to beharder to test [28, 46, 48, 53, 127, 173]. However, there is no evidence that to whatextent this is true in real-world practice.In this work, we study JavaScript (unit) tests in the wild from different angles.The results of this study reveal some of the shortcomings and difficulties of manualtesting, which provide insights on how to improve existing JavaScript test genera-tion tools and techniques. We perform our study across a representative corpus of373 popular JavaScript projects, with over 5.4 million lines of JavaScript code. Tothe best of our knowledge, this work is the first study on JavaScript tests. The maincontributions of our work include:• A large-scale study to investigate the prevalence of JavaScript tests in thewild;• A tool, called TESTSCANNER, which statically extracts different metrics in75our study and is publicly available [59];• An evaluation of the quality of JavaScript tests in terms of code coverage,average number of assertions per test, test code ratio, and test commit ratio;• An analysis of the uncovered parts of the code under test to understand whichparts are difficult to cover and why.4.2 MethodologyThe goal of this work is to study and characterize JavaScript tests in practice.We conduct quantitative and qualitative analyses to address the following researchquestions:RQ1: How prevalent are JavaScript tests?RQ2: What is the quality of JavaScript tests?RQ3: Which part of the code is mainly uncovered by tests and why?4.2.1 Subject SystemsWe study 373 popular open source JavaScript projects. 138 of these subject sys-tems are the ones used in a study for JavaScript callbacks [87] including 86 of themost depended-on modules in the NPM repository [55] and 52 JavaScript reposito-ries from GitHub Showcases7 [56]. Moreover, we added 234 JavaScript reposito-ries from Github with over 4000 stars. The complete list of these subjects and ouranalysis results, are available for download [59]. We believe that this corpus of 373projects is representative of real-world JavaScript projects as they differ in domain(category), size (lines of code), maturity (number of commits and contributors),and popularity (number of stars and watchers).We categorize our subjects into 19 categories using topics from JSterJavaScript Libraries Catalog [54] and GitHub Showcases [56] for the same or simi-lar projects. Table 4.1 presents these categories with average values for the number7GitHub Showcases include popular and trending open source repositories organized around dif-ferent topics.760%	  10%	  20%	  30%	  40%	  50%	  60%	  70%	  80%	  90%	  100%	  C1	  C2	  C3	  C4	  C5	  C6	  C7	  C8	  C9	  C10	  C11	  C12	  C13	  C14	  C15	  C16	  C17	  C18	  C19	  Total	  Client	  side	   Server	  side	   Client	  and	  server	  side	  Figure 4.1: Distribution of studied subject systems.of JavaScript files (production code), source lines of code (SLOC) for productionand test code, number of test cases, and number of stars in Github repository foreach category. We used SLOC [58] to count lines of source code excluding li-braries. Overall, we study over 5.4 million (3.7 M production and 1.7 M test)source lines of JavaScript code.Figure 4.1 depicts the distribution of our subject systems with respect to theclient or server side code. Those systems that contain server-side components arewritten in Node.js8, a popular server-side JavaScript framework. We apply thesame categorization approach as explained in [87]. Some projects such as MVCframeworks, e.g. Angular, are purely client-side, while most NPM modules arepurely server-side. We assume that client-side code is stored in directories such aswww, public, static, or client. We also use code annotations such as /* jshintbrowser:true, jquery:true */ to identify client-side code.The 373 studied projects include 128 client-side, 130 server-side, and 115client&server-side code. While distributions in total have almost the same size,they differ per project category. For instance subject systems in categories C1 (UIcomponents), C2 (visualization), C8 (touch and drag&drop), C19 (MVC frame-works), and C18 (multimedia) are mainly client-side and those in categories C4(software dev tools), C6 (parsers and compilers), C12 (I/O), C13 (package andbuild managers), C14 (storage), C16 (browser utils), and C17 (CLI and shell) aremainly server-side.8 https://nodejs.org77Table 4.1: Our JavaScript subject systems (60K files, 3.7 M production SLOC, 1.7 M test SLOC, and 100K test cases).ID Category # Subject Ave # Ave Prod Ave Test Ave # Ave # Ave #systems JS files SLOC SLOC tests assertions starsC1 UI Components, Widgets, and Frameworks 52 41 4.7K 2.8K 235 641 9.8KC2 Visualization, Graphics, and Animation Libraries 48 53 10.2K 3.8K 425 926 7.5KC3 Web Applications and Games 33 61 10.6K 1.4K 61 119 4KC4 Software Development Tools 29 67 12.7K 7.8K 227 578 6.9KC5 Web and Mobile App Design and Frameworks 25 91 22.3K 6.9K 277 850 14.4KC6 Parsers, Code Editors, and Compilers 22 167 27K 9.5K 701 1142 5.5KC7 Editors, String Processors, and Templating Engines 19 26 4.3K 1.9K 102 221 6.5KC8 Touch, Drag&Drop, Sliders, and Galleries 19 10 1.9K 408 52 72 7.9KC9 Other Tools and Libraries 17 93 9.1K 7.6K 180 453 8.5KC10 Network, Communication, and Async Utilities 16 19 4.1K 7.6K 279 354 7.6KC11 Game Engines and Frameworks 13 86 17K 1.2K 115 293 3.5KC12 I/O, Stream, and Keyboard Utilities 13 8 0.6K 1K 40 61 1.5KC13 Package Managers, Build Utilities, and Loaders 11 47 3.4K 5.4K 200 300 8.5KC14 Storage Tools and Libraries 10 19 4K 7K 222 317 5.5KC15 Testing Frameworks and Libraries 10 28 2.8K 3.6K 271 632 5.7KC16 Browser and DOM Utilities 9 45 5.6K 7.1K 76 179 5.2KC17 Command-line Interface and Shell Tools 9 9 2.8K 1K 26 244 2.6KC18 Multimedia Utilities 9 11 1.6K 760 17 97 6.2KC19 MVC Frameworks 9 174 40.1K 15.2K 657 1401 14.2KClient-side 128 39 8.2K 3.2K 343 798 7.9KServer-side 130 63 9.4K 7.2K 231 505 6.7KClient and server-side 115 73 12.7K 4.7K 221 402 7.4KTotal 373 57 10.1K 4.5K 263 644 7.3K784.2.2 AnalysisTo address our research questions, we statically and dynamically analyze test suitesof our subject programs. To extract some of the metrics in our study, we developa static analyzer tool, called TESTSCANNER [59], which parses production andtest code into an abstract syntax tree using Mozilla Rhino [41]. In the rest of thissection we explain details of our analysis for each research question.Prevalence of tests (RQ1)To answer RQ1, we look for presence of JavaScript tests written in any framework(e.g. Mocha, Jasmine, or QUnit). Tests are usually located at folders namely tests,specs9, or similar names.We further investigate the prevalence of JavaScript tests with respect to sub-ject categories, client/server-side code, popularity (number of stars and watchers),maturity (number of commits and contributors), project size (production SLOC),and testing frameworks. To distinguish testing frameworks, we analyze packagemanagement files (such as package.json), task runner and build files (such asgrunt.js and gulpfile.js), and test files themselves.Quality of tests (RQ2)To address RQ2, for each subject with test we compute four quality metrics asfollowing:Code coverage. Coverage is generally known as an indicator of test quality. Wecompute statement, branch, and function coverage for JavaScript code using JS-Cover [33] (for tests that run in the browser), and Istanbul [30]. To calculate cover-age of the minified JavaScript code, we beautify them prior to executing tests. Wealso exclude dependencies, such as files under the node modules directory, andlibraries (unless the subject system is itself a library).Average number of assertions per test. Code coverage does not directly implya test suite effectiveness [97], while assertions have been shown to be stronglycorrelated with it [199]. Thus, TESTSCANNER also computes average number of9For instance Jasmine and Mocha tests are written as specs and are usually located at folders withsimilar names.79assertions per test case as a test suite quality metric. Our analysis tools detectsusage of well-known assertion libraries such as assert.js, should.js, expect.js, andchai.Test code ratio. This metric is defined as the ratio of test SLOC to production andtest SLOC. A program with a high test code ratio may have a higher quality testsuite.Test commit ratio. This metric is the ratio of test commits to total commits. Highertest commit ratio may indicate more mature and higher quality tests. We assumethat every commit that touches at least one file in a folder named test, tests, spec,or specs is a test commit. In rare cases that tests are stored elsewhere, such as theroot folder, we manually extract number of test commits by looking at its githubrepository page and counting commits on test files.We investigate these quality metrics with respect to subject categories,client/server-side code, and testing frameworks.(Un)covered code (RQ3)Code coverage is a widely accepted test quality indicator, thus finding the rootcause of why a particular statement is not covered by a test suite, can help in writ-ing higher quality tests. Some generic possible cases for an uncovered (missed)statement s, are as following:1. s belongs to an uncovered function f , where(a) f has no calling site in both the production and the test code. In thiscase, f could be (1) a callback function sent to a callback-acceptingfunction (e.g., setTimeout()) that was never invoked, or (2) an un-used utility function that was meant to be used in previous or futurereleases. Such unused code can be considered as code smells [124].Consequently we cannot pinpoint such an uncovered function to a par-ticular reason.(b) the calling site for f in the production code was never executed by atest. Possible root causes can be that (1) f is used as a callback (e.g.event-dependent or asynchronous) that was never invoked, (2) the call80to f statement was never reached because of an earlier return state-ment or an exception, or the function call falls in a never met conditionbranch.(c) f is an anonymous function. Possible reasons that f was not coveredcan be that (1) f is used as a callback that was never invoked (e.g. anevent-dependent callback while the required event was not triggered,or an asynchronous callback while did not wait for the response), (2) fis a self-invoking function that was not executed to be invoked, or (3)f is set to a variable and that variable was never used or its usage wasnot executed.2. s belongs to a covered function f , where(a) the execution of f was terminated, by a return statement or an ex-ception, prior to reaching s.(b) s falls in a never met condition in f (e.g. browser or DOM dependentstatements).3. The test case responsible for covering s was not executed due to a test exe-cution failure.4. s is a dead (unreachable) code.Uncovered statement in uncovered function ratio. If an uncovered statements belongs to an uncovered function f , making f called could possibly cover s aswell. This is important specially if f needs to be called in a particular way, such asthrough triggering an event.In this regard, our tool uses coverage report information (in json or lcov for-mat) to calculate the ratio of the uncovered statements that fall within uncoveredfunctions over the total number of uncovered statements. If this value is large itindicates that the majority of uncovered statements belong to uncovered functions,and thus code coverage could be increased to a high extent if the enclosing functionis called by a test case.811 function setFontSize(size) {2 return function() {3 // this is an anonymous closure4 = size + 'px';5 };6 }7 var small = setFontSize(12);8 var large = setFontSize(16);9 ...10 function showMsg() {11 // this is an async callback12 alert("Some message goes here!");13 }14 ...15 $("#smallBtn").on("click", small);16 $("#largeBtn").on("click", large);17 $("#showBtn").on("click", function() {18 // this is an event-dependent anonymous callback19 setTimeout(showMsg, 2000);20 $("#photo").fadeIn("slow", function() {21 // this is an anonymous callback22 alert("Photo animation complete!");23 });24 });25 ...26 checkList = $("#checkList");27 checkList.children("input").each(function () {28 // this is an DOM-related code29 if (':checked')) {30 ...31 }else{32 ...33 }34 });Figure 4.2: A hard to test JavaScript code snippet.Hard-to-test JavaScript code. Some JavaScript features, such as DOM inter-actions, event-dependent callbacks, asynchronous callbacks, and closures (hiddenscopes), are considered to be harder to test [28, 46, 48, 53, 127, 173]. In this sec-tion we explain four main hard-to-test code with an example code snippet depictedin Figure 4.2. Also we fine-grain statement and function coverage metrics to in-vestigate these hard-to-test code separately in detail. To measure these coveragemetrics, TESTSCANNER maps a given coverage report to the locations of hard-to-test code.DOM related code coverage. In order to unit test a JavaScript code with DOMread/write operations, a DOM instance has to be provided as a test fixture in the ex-82act structure expected by the code under test. Otherwise, the test case can terminateprematurely due to a null exception. Writing such DOM based fixtures can be chal-lenging due to the dynamic nature of JavaScript and the hierarchical structure ofthe DOM [127]. For example, to cover the if branch at line 29 in Figure 4.2, oneneeds to provide a DOM instance such as <div id="checkList"><inputtype="checkbox" checked></input></div>. To cover the elsebranch, a DOM instance such as <div id="checkList"><inputtype="checkbox"></input></div> is required. If such fixtures arenot provided, $("checkList") returns null as the expected element is notavailable, and thus checkList.children causes a null exception and the testcase terminates.DOM related code coverage is defined as the fraction of number of coveredover total number of DOM related statements. A DOM related statement is a state-ment that can affect or be affected by DOM interactions such as a DOM API us-age. To detect DOM related statements TESTSCANNER extracts all DOM API us-ages in the code (e.g. getElementById, createElement, appendChild,addEventListener, $, and innerHTML) and their forward slices. Forwardslicing is applied on the variables that were assigned with a DOM element/attribute.For example the forward slice of checkList at line 26 in Figure 4.2 are lines 27–34. A DOM API could be located in a (1) return statement of a function f , (2)conditional statement, (3) function call (as an argument), (4) an assignment state-ment, or (5) other parts within a scope. In case (1), all statements that has a callto f are considered DOM related. In case (2), the whole conditional statements(condition and the body of condition) are considered DOM related. In case (3) thestatements in the called function, which use that DOM input will be consideredDOM related. In other cases, the statement with DOM API is DOM related.Event-dependent callback coverage. The execution of some JavaScript code mayrequire triggering an event such as clicking on a particular DOM element. For in-stance it is very common in JavaScript client-side code to have an (anonymous)function bound to an element’s event, e.g. a click, which has to be simulated. Theanonymous function in lines 17–24 is an event-dependent callback function. Suchcallback functions would only be passed and invoked if the corresponding event istriggered. In order to trigger an event, testers can use methods such as jQuery’s83.trigger(event, data, ...) or .emit(event, data, ...) ofNode.js EventEmitter. Note that if an event needs to be triggered on a DOM el-ement, a proper fixture is required otherwise the callback function cannot be exe-cuted.Event-dependent callback coverage is defined as the fraction of number ofcovered over total number of event-dependent callback functions. In order todetect event-dependent callbacks, our tool checks if a callback function is anevent method such as bind, click, focus, hover, keypress, emit,addEventListener, onclick, onmouseover, and onload.Asynchronous callback coverage. Callbacks are functions passed as an ar-gument to another function to be invoked either immediately (synchronous) orat some point in the future (asynchronous) after the enclosing function returns.Callbacks are particularly useful to perform non-blocking operations. FunctionshowMsg in lines 10–13 is an asynchronous callback function as it was passed tothe setTimeout() asynchronous API call. Testing asynchronous callbacks re-quires waiting until the callback is called, otherwise the test would probably finishunsuccessfully before the callback is invoked. For instance QUnit’s asyncTestallows tests to wait for asynchronous callbacks to be called.Asynchronous callback coverage is defined as the fraction of number of cov-ered over total number of asynchronous callback functions. Similar to a study ofcallbacks in JavaScript [87], if a callback argument is passed into a known deferringAPI call we count it as as an asynchronous callback. TESTSCANNER detects someasynchronous APIs including network calls (e.g.,DOM events (e.g., onclick), timers (setImmediate, setTimeout,setInterval, and process.nextTick), and I/O (e.g. APIs of fs, http,and net).Closure function coverage. Closures are nested functions that make it possible tocreate hidden scope to privatize variables and functions from the global scope inJavaScript. A closure function, i.e., the inner function, has access to all parametersand variables – except for this and argument variables – of the outer function,even after the outer function has returned [77]. The anonymous function in lines2–5 is an instance of a closure.84Such hidden functions cannot be called directly in a test case and thus testingthem is challenging. In fact writing a unit test for a closure function without codemodification is impossible. Simple solutions such as making them public or puttingthe test code inside the closure are not good software engineering practices. Oneapproach to test such private functions is adding code inside the closure to storereferences to its local variables inside objects and return it to the outer scope [48].Closure function coverage is defined as the fraction of number of covered over totalnumber of closure functions.Average number of function calls per test. Some code functionalities depend onthe execution of a sequence of function calls. For instance in a shopping applica-tion, one needs to add items to the cart prior to check out. We perform a correlationanalysis between average number of unique function calls per test and code cover-age. We also investigate whether JavaScript unit tests are mostly written at singlefunction level or they execute sequence of function calls.4.3 Results4.3.1 Prevalence of TestsThe stacked bar charts in Figure 4.3a depicts the percentage of JavaScript tests, persystem category (Table 4.1), per client/server side, and in aggregate. The heightof each bar indicates the percentage of subjects in that category. In total, amongthe 373 studied subjects, 83 of them (i.e., 22%) do not have JavaScript tests. Themajority (78%) of subjects have at least one test case.Finding 1: 22% of the subject systems that we studied do not have any JavaScripttest, and 78% have at least one test case.As shown in figure 4.3b, amongst subjects with test, the majority of tests arewritten in Mocha (38%), Jasmine (19%), and QUnit (18%). 6% does not followany particular framework and have their own tests. Minor used frameworks are Tap(5%), Tape (4%), Nodeunit (3%), Vows (3%) and others (4%) including Jest, Evi-dence.js, Doh, CasperJS, Ava, UTest, TAD, and Lab. We also observe that 3 repos-itories have tests written in two testing frameworks: 2 projects (server and client-server) with Nodeunit+Mocha test, and one (client-server) with Jasmine+QUnit850%	  10%	  20%	  30%	  40%	  50%	  60%	  70%	  80%	  90%	  100%	  C1	  C2	  C3	  C4	  C5	  C6	  C7	  C8	  C9	  C10	  C11	  C12	  C13	  C14	  C15	  C16	  C17	  C18	  C19	  Client	  Server	  Client-­‐Server	  Total	  Mocha	   No	  tests	   Jasmine	   QUnits	   Other	  frameworks	   Its	  own	  tests	  (a) Distribution within all subjects.Mocha	  38%	  Jasmine	  19%	  QUnits	  18%	  Own	  tests	  6%	  Tap	  5%	  Tape	  4%	  Others	  4%	  Nodeunit	  3%	  Vows	  3%	  (b) Testing frameworks distribution.Figure 4.3: Distribution of JavaScript tests.test.Finding 2: The most prevalent used test frameworks for JavaScript unit testingare Mocha (38%), Jasmine (19%), and QUnit (18%).We also investigate the prevalence of UI tests and observe that only 12 projects(i.e., 3%) among all 373 ones have UI tests for which 9 are written using Webdrive-rio and Selenium webdriver, and 3 uses CasperJS. 7 of these projects are client andserver side, 3 are client-side, and 2 are server-side. One of these subjects does nothave any JavaScript test.Finding 3: Only 3% of the studied repositories have functional UI tests.Almost all (95%) of purely server-side JavaScript projects have tests, while thisis 61% for client-side and 76% for client&server-side ones. Note that the numberof subjects in each category are not very different (i.e., 128 client-side, 130 server-860%	  20%	  40%	  60%	  80%	  100%	  1-­‐4K	   4K-­‐5.6K	   5.6K-­‐8.9K	   8.9K-­‐92K	  (a) Number of stars0%	  20%	  40%	  60%	  80%	  100%	  1-­‐151	   151-­‐262	   262-­‐444	   444-­‐6K	  (b) Number of watchers0%	  20%	  40%	  60%	  80%	  100%	  1-­‐251	   254-­‐701	   710-­‐1.8K	   1.8K-­‐27.6K	  (c) Number of commits0%	  20%	  40%	  60%	  80%	  100%	  1-­‐19	   19-­‐46	   47-­‐102	   102-­‐1.4K	  (d) Number of contributorsFigure 4.4: Percentage of subjects with test per each quartile with respect to popularity (number of starsand watchers) and maturity (number of commits and contributors).side, and 115 client and server-side code). Interestingly the distribution of testframeworks looks very similar for client-side and client-server side projects.As shown in Figure 4.3a, all subjects systems in categories C6 (parsers andcompilers), C12 (I/O), C13 (package and build managers), C14 (storage), C19(MVC frameworks), and C17(CLI and shell), have JavaScript unit tests. Projectsin all of these categories, except for C19, are mainly server-side as depicted in Fig-ure 4.1. In contrast, many of subjects in categories C1 (UI components), C3 (webapps), C8 (touch and drag&drop), and C18 (multimedia) do not have tests, whichare mainly client-side. Thus we can deduce that JavaScript tests are written morefor server-side code than client-side, or client and server-side code.Finding 4: While almost all subjects (95%) in the server-side category havetests, about 40% of subjects in client-side and client-server side categories donot have tests.We believe the more prevalence of tests for server-side code can be attributed to(1) the difficulties in testing client-side code, such as writing proper DOM fixturesor triggering events on DOM elements, and (2) using time-saving test scripts formost Node.js based projects, such as npm test that is included by default wheninitializing a new package.json file. This pattern is advocated in the Node.jscommunity [57] and thus many server-side JavaScript code, such as NPM modules,87Client Server Client-Server Total020406080100Coverage (%)(a) Statement coverageClient Server Client-Server Total020406080100Coverage (%)(b) Branch coverageClient Server Client-Server Total020406080100Coverage (%)(c) Function coverageFigure 4.5: Boxplots of the code coverage of the executed JavaScript tests. Mean values are shown with(*).have test code.We also consider how popularity (number of stars and watchers) and maturity(number of commits and contributors) of subject systems are related to the preva-lence of unit tests. Figure 4.4a shows the percentage of subjects with tests in eachquartile. As popularity and maturity increase, the percentage of subjects with testincreases as well.4.3.2 Quality of TestsCode coverage. Calculating the code coverage requires executing tests on a prop-erly deployed project. In our study, however, we faced number of projects withfailure in build/deployment or running tests. We tried to resolve such problemsby quick changes in build/task configuration files or by retrieving a later version(i.e., some days after fetching the previous release). In most cases build failure wasdue to errors in dependent packages or their absence. We could finally calculatecoverage for 231 out of 290 (about 80%) subjects with tests. We could not prop-erly deploy or run tests for 44 subject systems (41 with test run failure, freeze, orbreak, and 3 build and deployment error), and could not get coverage report for 15projects with complex test configurations.Boxplots in Figure 4.5 show that in total tests have a median of 83% statementcoverage, 84% function coverage, and 69% branch coverage. Tests for server-sidecode have higher coverage in all aspects compared to those for client-side code. Wenarrow down our coverage analysis into different subject categories. As depictedin Table 4.2, subjects in categories C6 (parsers and compilers), C10 (Network and88Table 4.2: Test quality metrics average values.Statement Branch Function Ave # Test Testcoverage coverage coverage assertions code commitper test ratio ratioSubjectcategoryC1 77% 57% 76% 2.83 0.41 0.16C2 67% 52% 65% 2.72 0.28 0.14C3 60% 38% 58% 3.75 0.88 0.14C4 79% 68% 78% 2.50 0.58 0.24C5 75% 63% 75% 2.53 0.52 0.21C6 87% 79% 88% 2.53 0.47 0.24C7 80% 67% 72% 2.51 0.46 0.22C8 64% 47% 60% 2.04 0.35 0.12C9 73% 58% 69% 2.67 0.49 0.23C10 91% 79% 90% 2.73 0.72 0.24C11 64% 45% 57% 3.41 0.18 0.11C12 90% 77% 89% 2.36 0.59 0.20C13 86% 67% 84% 2.27 0.59 0.18C14 88% 77% 87% 2.74 0.62 0.26C15 81% 69% 79% 5.79 0.59 0.25C16 78% 67% 79% 1.67 0.49 0.29C17 67% 54% 63% 8.32 0.47 0.21C18 60% 31% 62% 4.42 0.31 0.16C19 81% 67% 80% 3.58 0.53 0.21Testingframework Mocha 82% 70% 79% 2.39 0.49 0.20Jasmine 74% 60% 75% 1.93 0.41 0.21QUnit 71% 54% 71% 3.93 0.41 0.16Own test 61% 41% 58% 5.99 0.30 0.16Tap 89% 80% 89% 1.56 0.58 0.21Tape 93% 81% 94% 2.93 0.70 0.18Others 80% 65% 77% 5.60 0.46 0.24Nodeunit 74% 63% 72% 6.20 0.57 0.24Vows 74% 66% 72% 1.92 0.55 0.27Client 70% 53% 70% 2.71 0.36 0.16Server 85% 74% 83% 3.16 0.58 0.23C&S 72% 56% 70% 2.9 0.4 0.18Total 78% 64% 76% 2.96 0.46 0.2Async), C12 (I/O), C13 (package and build managers), C14 (storage), C15 (testingframeworks), and C19 (MVC frameworks) on average have higher code coverage.Projects in these categories are mainly server-side. In contrast, subjects in cate-gories C2 (visualization), C3 (web apps), C8 (touch and drag&drop), C11 (gameengines), C17 (CLI and shell), and C18 (multimedia), have lower code coverage.Note that subjects under these categories are mainly client-side.Finding 5: The studied JavaScript tests have a median of 83% statement cov-erage, 84% function coverage, and 69% branch coverage. Tests for server-sidecode have higher coverage in all aspects compared to those for client-side code.89Client Server Client-Server Total0123456Ave # assertions per testFigure 4.6: Average number of assertions per test.Table 4.2 also depicts the achieved coverage per testing framework. Tests writ-ten in Tape, Tap, and Mocha have generally higher code coverage. The majorityof server-side JavaScript projects are tested using these frameworks. On the otherhand, tests written in QUnit, which is used more often for the client-side than theserver-side, has generally lower code coverage. Developers that used their ownstyle of testing without using popular frameworks write tests with the poorest cov-erage.Finding 6: Tests written in Tape, Tap, and Mocha frameworks, generally havehigher coverage compared to those written in QUnit, Nodeunit, and those with-out using any test framework.Average number of assertions per test. Figure 4.6 depicts boxplots of averagenumber of assertions per test case. While median values are very similar (about2.2) for all cases, server-side code has a slightly higher mean value (3.16) com-pared to client-side (2.71). As shown in Table 4.2, subjects in categories C3 (webapps), C11 (game engines), C15 (testing frameworks), C17 (CLI and shell), C18(multimedia), and C19 (MVC frameworks) on average have higher average num-ber of assertions per test compared to others. Interestingly among these categoriesonly for C15 and C19 code coverage is also high while it is low for the rest.Finding 7: The studied test suites have a median of 2.19 and a mean of 2.96for the average number of assertions per test. These values do not differ muchamong server-side and client-side code.90Client Server Client-Server Total0. code ratioFigure 4.7: Test to total code ratio.Also results shown in Table 4.2 indicate that tests written in QUnit, Tape, Node-unit, other frameworks (e.g. Jest, CasperJS, and UTest), and those without using aframework, have on average more assertions per test. The majority of server-sideJavaScript projects are tested using these frameworks. Again we observe that onlyfor tests written in Tape framework code coverage is also high while it is low forthe rest.Test code ratio. Figure 4.7 shows test to total (production and test) code ratiocomparison. The median and mean of this ratio is about 0.6 for server-side projectsand about 0.35 for client-side ones. As shown in Table 4.2, on average subjects withhigher test code ratio belongs to categories C3, C4, C5, C10, C12, C13, C14, C15,and C19 while those in C2, C8, C11, and C18 have lower test code ratio. Also testswritten in Tap, Tape, Nodeunit, and Vows have higher test code ratio while testswritten without using any framework have lower test code ratio.We further study the relationship between test code ratio and total code cover-age (average of statement, branch, and function coverage) through the Spearman’scorrelation analysis10. The result shows that there exists a moderate to strong cor-relation (ρ = 0.68, p = 0) between test code ratio and code coverage.Finding 8: Tests for server-side code have higher test code ratio (median andmean of about 0.6) compared to client-side code (median and mean of about0.35). Also there exists a moderate to strong correlation (ρ = 0.68, p = 0)between test code ratio and code coverage.10The non-parametric Spearman’s correlation coefficient measures the monotonic relationship be-tween two continuous random variables and does not require the data to be normally distributed.91Client Server Client-Server Total0. commit ratioFigure 4.8: Test to total commits ratio.Test commit ratio. Figure 4.8 depicts test to total (production and test) commitratio comparison. The median and mean of this ratio is about 0.25 for server-sideprojects and about 0.15 for client-side ones. As shown in Table 4.2, on averagesubjects with higher test commit ratio belongs to categories C4, C6, C9, C10, C14,C15, and C16 while those in C1, C2, C3, C8, C11, and C18 have lower test com-mit ratio. Also tests written in Nodeunit, Vows, and other frameworks (e.g. Jest,CasperJS, and UTest) have higher test commit ratio while tests written in QUnit orwithout using any framework have lower test commit ratio.Similar to the correlation analysis for test code ratio, we study the relationshipbetween test commit ratio and total code coverage. The result indicates that thereexists a moderate to low correlation (ρ = 0.49, p = 0) between test commit ratioand code coverage.Finding 9: While test commit ratio is relatively high for server-side projects(median and mean of about 0.25), it is moderate in total and relatively low forclient-side projects (median and mean of about 0.15). Also there exists a mod-erate to low correlation (ρ = 0.49, p = 0) between test commit ratio and codecoverage.4.3.3 (Un)covered CodeAs explained earlier in Section 4.2.2, one possible root cause for uncovered codeis that the responsible test code was not executed. In our evaluation, however, weobserved that for almost all the studied subjects, test code had very high coveragemeaning that almost all statements in test code were executed properly. Thus the92Table 4.3: Statistics for analyzing uncovered code. The ”–” sign indicates no instance of a particular code.Function coverage StatementcoverageAll Async Event Closure All DOM Ave # USUFcallback dependent related func calls ratiocallback per testSubjectcategoryC1 76% 65% 33% 79% 77% 73% 2.91 0.59C2 65% 43% 17% 62% 67% 61% 2.82 0.73C3 58% 21% 10% 38% 60% 27% 3.94 0.82C4 79% 49% 48% 70% 81% 75% 2.89 0.53C5 75% 52% 33% 65% 75% 62% 3.05 0.72C6 88% 60% 32% 87% 87% 57% 3.30 0.33C7 72% 34% 28% 81% 80% 52% 3.11 0.52C8 60% 40% 39% 80% 64% 78% 2.18 0.77C9 69% 14% 8% 80% 73% 23% 2.89 0.59C10 90% 65% 60% 95% 91% 81% 4.98 0.5C11 57% 33% 7% 68% 64% 51% 2.79 0.85C12 89% 85% 68% 98% 90% 85% 3.56 0.32C13 84% 71% 74% 60% 86% 85% 2.86 0.49C14 87% 66% 36% 89% 88% – 2.98 0.62C15 79% 70% 39% 62% 81% 58% 2.16 0.59C16 79% 40% 5% 43% 78% 48% 2.69 0.39C17 63% 7% 5% 56% 67% – 2.42 0.65C18 62% – 0% 89% 60% 40% 2.19 0.86C19 81% 61% 47% 76% 82% 62% 2.92 0.53Testingframework Mocha 79% 50% 34% 71% 82% 58% 3.62 0.56Jasmine 75% 65% 34% 69% 74% 62% 2.28 0.71QUnit 71% 53% 28% 76% 71% 68% 3.35 0.66Own test 58% 45% 26% 66% 61% 51% 1.78 0.63Tap 89% 68% 87% 94% 89% – 2.52 0.24Tape 94% 79% 65% 92% 93% 88% 3.19 0.22Others 77% 33% 30% 66% 80% 79% 2.14 0.48Nodeunit 72% 53% 63% 74% 74% 52% 4.08 0.62Vows 72% 60% 38% 79% 74% 0% 1.60 0.6Client 70% 46% 25% 69% 70% 66% 2.96 0.68Server 83% 64% 48% 82% 85% 67% 3.19 0.45C&S 70% 48% 29% 69% 72% 57% 2.93 0.69Total 76% 53% 36% 74% 78% 63% 3.05 0.57test code coverage does not contribute in the low coverage of production code.Uncovered statement in uncovered function (USUF) ratio. If an uncovered codec belongs to an uncovered function f , making f called could possibly cover c aswell. As described in Section 4.2.2, we calculate the ratio of uncovered statementsthat fall within uncovered functions over the total number of uncovered statements.Table 4.3 shows average values for this ratio (USUF). The mean value of USUFratio is 0.57 in total, 0.45 for server-side projects, and about 0.7 for client-side ones.This indicate that the majority of uncovered statements in client-side code belong93to uncovered functions, and thus code coverage could be increased to a high extentif the enclosing function could be called during test execution.Finding 10: A large portion of uncovered statements fall in uncovered functionsfor client-side code (about 70%) compared to server-side code (45%).Hard-to-test-function coverage. We measure coverage for hard-to-test functionsas defined in Section 4.2.2. While the average function coverage in total is 76%, theaverage event-dependent callback coverage is 36% and the average asynchronouscallback coverage is 53%. The average value of closure function coverage in totalis 74% and for server-side subjects is 82% while it is 69% for client-side ones.Finding 11: On average, JavaScript tests have low coverage for event-dependentcallbacks (36%) and asynchronous callbacks (53%). Average values for client-side code are even worse (25% and 46% respectively). The average, closurefunction coverage is 74%.We measure the impact of tests with event triggering methods on event-dependent callback coverage, and writing async tests on asynchronous callbackcoverage through correlation analysis. The results show that there exists a weakcorrelation (ρ = 0.22) between number of event triggers and event-dependent call-back coverage, and a very weak correlation (ρ = 0.1) between number of asyn-chronous tests and asynchronous callback coverage.Finding 12: There is no strong correlation between number of event triggersand event-dependent callback coverage. Also number of asynchronous tests andasynchronous callback coverage are not strongly correlated.This was contrary to our expectation for higher correlations, however, we ob-served that in some cases asynchronous tests and tests that trigger events werewritten to merely target specific parts and functionalities of the production codewithout covering most asynchronous or event-dependent callbacks.DOM related code coverage. On average, JavaScript tests have a moderatelylow coverage of 63% for DOM-related code. We also study the relationship ofexistence of DOM fixtures and DOM related code coverage through correlationanalysis. The result shows that there exists a correlation of ρ = 0.4, p = 0 betweenhaving DOM fixtures in tests and DOM related code coverage. Similar to the cases94for event-dependent and async callbacks, we also observed that DOM fixtures weremainly written for executing a subset of DOM related code.Finding 13: On average, JavaScript tests lack proper coverage for DOM-relatedcode (63%). Also there exists a moderately low correlation (ρ = 0.4) betweenhaving DOM fixtures in tests and DOM related code coverage.Average number of function calls per test. As explained in Section 4.2.2, weinvestigate number of unique function calls per test. The average number of func-tion calls per test has a mean value of about 3 in total and also across server-sideand client-side code. We further perform a correlation analysis between the aver-age number of function calls per test and total code coverage. The result showsthat there exists a weak correlation (ρ = 0.13, p = 0) between average number offunction calls per test and code coverage.Finding 14: On average, there are about 3 function calls to production code pertest case. The average number of function calls per test is not strongly correlatedwith code coverage.4.3.4 DiscussionImplications. Our findings regarding RQ1 indicate that the majority (78%) ofstudied JavaScript projects and in particular popular and trending ones have at leastone test case. This indicates that JavaScript testing is getting attention, however, itseems that developers have less tendency to write tests for client-side code as theydo for the server-side code. Possible reasons could be difficulties in writing properDOM fixtures or triggering events on DOM elements. We also think that the highpercentage of test for server-side JavaScript can be ascribed to the testing patternthat is advocated in the Node.js community [57]. To assist developers with testingtheir JavaScript code, we believe that it is worthwhile for the research communityto invest on developing test generation techniques in particular for the client-sidecode, such as [127, 130, 131].For RQ2, the results indicate that in general, tests written for mainly client-sidesubjects in categories C2 (visualization), C8 (touch and drag&drop), C11 (gameengines), and C18 (multimedia) have lower quality. Compared to the client-side95projects, tests written for the server-side have higher quality in terms of code cov-erage, test code ratio, and test commit ratio. The branch coverage in particular forclient-side code is low, which can be ascribed to the challenges in writing tests forDOM related branches. We investigate reasons behind the code coverage differ-ence in Section 4.3.3. The higher values for test code ratio and test commit ratiocan also be due to the fact that writing tests for server-side code is easier comparedto client-side.Developers and testers could possibly increase code coverage of their tests byusing existing JavaScript test generator tools, such as Kudzu [158], ARTEMIS [66],JALANGI [161], SymJS [109], JSEFT [130], and CONFIX [127]. Tests written inMocha, Tap, Tape, and Nodeunit generally have higher test quality compared toother frameworks and tests that do not use any testing framework. In fact devel-opers that do not write their test by leveraging an existing testing framework writelow quality tests almost in all aspects. Thus we recommend JavaScript developerscommunity to use a well-maintained and mature testing framework to write theirtests.As far as RQ3 is concerned, our study shows that JavaScript tests lack propercoverage for event-dependent callbacks, asynchronous callbacks, and DOM-related code. Since these parts of code are hard to test they can be error proneand thus requires effective targeted tests. For instance a recent empirical study[141] reveals that the majority of reported JavaScript bugs and the highest impactfaults are DOM-related.It is expected that using event triggering methods in tests, increase coverage forevent-dependent callbacks, asynchronous callbacks, and DOM-related statements.However, our results do not show a strong correlation to support this. Our man-ual analysis revealed that tests with event triggering methods, async behaviours,and DOM fixtures are mainly written to cover only particular instances of event-dependent callbacks, asynchronous callbacks, or DOM-related code. This againcan imply difficulties in writing tests with high coverage for such hard-to-test code.We believe that there is a research potential in this regard for proposing testgeneration techniques tailored to such uncovered parts. While most current testgeneration tools for JavaScript produce tests at single function level, in practicedevelopers often write tests that invoke about 3 functions per test on average. It96might also worth for researchers to develop test generation tools that produce testswith a sequence of function calls per test case.Finally, we observed that UI tests are much less prevalent in the studiedJavaScript projects. Our investigation of the coverage report did not show a sig-nificant coverage increase on the uncovered event-dependent callbacks or DOM-related code between UI and unit tests. Since UI tests do not need DOM fixturegeneration, they should be able to trigger more of the UI events, compared to codelevel unit tests. It would be interesting to further investigate this in JavaScript ap-plications with large UI tests.Test effectiveness. Another test quality metric that is interesting to investigate istest effectiveness. An ideal effective test suite should fail if there is a defect inthe code. Mutation score, i.e., the percentage of killed mutants over total non-equivalent mutants, is often used as an estimate of defect detection capability of atest suite. In fact it has been shown that there exists a significant correlation be-tween mutant detection and real fault detection [102]. In this work, however, wedid not consider mutation score as a quality metric as it was too costly to gener-ate mutants for each subject and execute the tests on each of them. We believethat it is worthwhile to study the effectiveness of JavaScript tests using mutationtesting techniques, such as Mutandis [129], which guides mutation generation to-wards parts of the code that are likely to affect the program output. This can helpto find out which aspects of code are more error-prone and not well-tested. Apartfrom test quality evaluation based on mutation score, studying JavaScript bug re-ports [142] and investigating bug locations, can give us new insights for developingmore effective test generation tools.Threats to validity. With respect to reproducibility of the results, our tool andlist of the studied subjects are publicly available [59]. Regarding the generaliz-ability of the results to other JavaScript projects, we believe that the studied setof subjects is representative of real-world JavaScript projects as they differ in do-main (category), size (SLOC), maturity (number of commits and contributors), andpopularity (number of stars and watchers). With regards to the subject categoriza-tion, we used some existing categories proposed by JSter Catalog [54] and GitHubShowcases [56].97There might be case that TESTSCANNER cannot detect a desired pattern inthe code as it performs complex static code analysis for detecting DOM-relatedstatements, event-dependent callbacks, and asynchronous APIs. To mitigate thisthreat, we made a second pass of manual investigation through such code patternsusing grep with regular expressions in command line and manually validated ran-dom cases. Such a textual search within JavaScript files through grep was espe-cially done for a number of projects with parsing errors in their code for whichTESTSCANNER cannot generate a report or the report would be incomplete. Sinceour tool statically analyzes test code to compute the number of function calls pertest, it may not capture the correct number of calls that happen during execution.While dynamic analysis could help with this regard, it can not be used for theunexecuted code and thus is not helpful to analyze uncovered code.4.4 Related WorkThere are number of previous empirical studies on JavaScript. Ratanaworabhanet al. [152] and Richards et al. [153] studied JavaScript’s dynamic behavior andRichards et al. [154] analyzed security issues in JavaScript projects. Ocariza etal. [142] performed study to characterize root causes of client-side JavaScriptbugs. Gallaba et al. [87] studied the use of callback in client and server-sideJavaScript code. Security vulnerabilities in JavaScript have also been studied onremote JavaScript inclusions [140], [193], cross-site scripting (XSS) [183], andprivacy violating information flows [98]. Milani Fard et al. [124] studied codesmells in JavaScript code. Nguyen et al. [139] performed usage patterns mining inJavaScript web applications.Researchers also studied test cases and mining test suites in the past. Inozemt-seva et al. [97] found that code coverage does not directly imply the test suiteeffectiveness. Zhang et al. [199] analyzed test assertions and showed that exis-tence of assertions is strongly correlated with test suite effectiveness. Vahabzadehet al. [176] studied bugs in test code. Milani Fard et al. proposed Testilizer [126]that mines information from existing test cases to generate new tests. Zaidman etal. [194] investigated co-evolution of production and test code.These work, however, did not study JavaScript tests. Related to our work,98Mirshokraie et al. [129] presented a JavaScript mutation testing approach and aspart of their evaluation, assessed mutation score for test suites of two JavaScriptlibraries. To the best of our knowledge, our work is the first (large scale) study onJavaScript tests and in particular their quality and shortcomings.4.5 ConclusionsJavaScript is heavily used to build responsive client-side web applications as wellas server-side projects. While some JavaScript features are known to be hard to test,no empirical study was done earlier towards measuring the quality and coverage ofJavaScript tests. This work presents the first empirical study of JavaScript tests tocharacterize their prevalence, quality metrics, and shortcomings.We found that a considerable amount of JavaScript projects do not have anytest and this is in particular for projects with JavaScript at client-side. On the otherhand almost all purely server-side JavaScript projects have tests and the quality ofthose tests are higher compared to tests for client-side. On average JavaScript testslack proper coverage for event-dependent callbacks, asynchronous callbacks, andDOM-related code. The result of this study can be used to improve JavaScript testgeneration tools in producing more effective test cases that target hard-to-test code.It would be interesting to evaluate effectiveness of JavaScript test by measuringtheir mutation score, which reveals the quality of written assertions. Another possi-ble direction could be designing automated JavaScript code refactoring techniquestowards making the code more testable and maintainable.99Chapter 5Generating Fixtures for JavaScript Unit TestingSummary11In today’s web applications, JavaScript code interacts with the Document ObjectModel (DOM) at runtime. This runtime interaction between JavaScript and theDOM is error-prone and challenging to test. In order to unit test a JavaScript func-tion that has read/write DOM operations, a DOM instance has to be provided asa test fixture. This DOM fixture needs to be in the exact structure expected bythe function under test. Otherwise, the test case can terminate prematurely dueto a null exception. Generating these fixtures is challenging due to the dynamicnature of JavaScript and the hierarchical structure of the DOM. We present an au-tomated technique, based on dynamic symbolic execution, which generates testfixtures for unit testing JavaScript functions. Our approach is implemented in atool called CONFIX. Our empirical evaluation shows that CONFIX can effectivelygenerate tests that cover DOM-dependent paths. We also find that CONFIX yieldsconsiderably higher coverage compared to an existing JavaScript input generationtechnique.5.1 IntroductionTo create responsive web applications, developers write JavaScript code that dy-namically interacts with the DOM. As such, changes made through JavaScript code11An initial version of this chapter has been published in the IEEE/ACM International Conferenceon Automated Software Engineering (ASE), 2015 [127].100via these DOM API calls become directly visible in the browser.This complex interplay between two separate languages, namely JavaScriptand the HTML, makes it hard to analyze statically [101, 111], and particularlychallenging for developers to understand [62] and test [66, 130] effectively. As re-vealed in a recent empirical study [141], the majority (65%) of reported JavaScriptbugs are DOM-related, meaning the fault pertains to a DOM API call in JavaScriptcode. Moreover, 80% of the highest impact JavaScript faults are DOM-related.In order to unit test a JavaScript function that has DOM read/write op-erations, a DOM instance needs to be provided in the exact structure as ex-pected by the function. Otherwise, a DOM API method (e.g., var n =getElementById("news")) returns null because the expected DOM nodeis not available; any operations on the variable pointing to this non-existent DOMnode (e.g., n.firstChild) causes a null exception and the test case calling thefunction terminates prematurely. To mitigate this problem, testers have to writetest fixtures for the expected DOM structure before calling the JavaScript functionin their unit tests. The manual construction of proper DOM fixtures is, however,tedious and costly.Despite the problematic nature of JavaScript-DOM interactions, most currentautomated testing techniques ignore the DOM and focus on generating events andfunction arguments [66, 161]. JSeft [130] applies an approach in which the appli-cation is executed and the DOM tree is captured just before executing the functionunder test. This DOM tree is used as a test fixture in generated test cases. Thisheuristic-based approach, however, assumes that the DOM captured at runtimecontains all the DOM elements, values, and relations as expected by the function,which is not always the case. Thus, the code coverage achieved with such a DOMcan be quite limited. Moreover, the DOM captured this way, can be too large anddifficult to read as a fixture in a test case. SymJS [109] applies symbolic executionto increase JavaScript code coverage, with limited support for the DOM, i.e., itconsiders DOM element variables as integer or string values and ignores the DOMstructure. However, there exist complex DOM structures and element relations ex-pected by the JavaScript code in practice, which this simplification cannot handle.In this work, we provide a technique for automatically generating DOM-basedfixtures and function arguments. Our technique, called CONFIX, is focused on cov-101ering DOM-dependent paths inside JavaScript functions. It operates through aniterative process that (1) dynamically analyses JavaScript code to deduce DOM-dependent statements and conditions, (2) collects path constraints in the form ofsymbolic DOM constraints, (3) translates the symbolic path constraints into XPathexpressions, (4) feeds the generated XPath expressions into an existing structuralconstraint solver [89] to produce a satisfiable XML structure, (5) generates a testcase with the solved structure as a test fixture or function argument, runs the gen-erated test case, and continues recursively until all DOM-dependent paths are cov-ered.To the best of our knowledge, our work is the first to consider the DOM as a testinput entity, and to automatically generate test fixtures to cover DOM-dependentJavaScript functions.Our work makes the following main contributions:• A novel dynamic symbolic execution engine to generate DOM-based testfixtures and inputs for unit testing JavaScript functions;• A technique for deducing DOM structural constraints and translating thoseto XPath expressions, which can be fed into existing structural constraintsolvers;• An implementation of our approach in a tool, called CONFIX, which is pub-licly available [24];• An empirical evaluation to assess the coverage of CONFIX on real-worldJavaScript applications. We also compare CONFIX’s coverage with that ofother JavaScript test generation techniques.The results of our empirical evaluation show that CONFIX yields considerablyhigher coverage — up to 40 and 31 percentage point increase on the branch, andthe statement coverage, respectively — compared to tests generated without DOMfixtures/inputs.1021 function dg(x){2 return document.getElementById(x);3 }5 function sumTotalPrice(){6 sum = 0;7 itemList = dg("items");8 if (itemList.children.length == 0)9 dg("message").innerHTML = "List is empty!";10 else {11 for (i = 0; i < itemList.children.length; i++){12 p = parseInt(itemList.children[i].value);13 if (p > 0)14 sum += p;15 else16 dg("message").innerHTML += "Wrong value for the price of item "←↩+ i;17 }18 dg("total").innerHTML = sum;19 }20 return sum;21 }Figure 5.1: A JavaScript function to compute the items total price.5.2 Background and MotivationThe majority of reported JavaScript bugs are caused by faulty interactions with theDOM [141]. Therefore, it is important to properly test DOM-dependent JavaScriptcode. This work is motivated by the fact that the execution of some paths in aJavaScript function, i.e., unique sequences of branches from the function entry tothe exit, depends on the existence of specific DOM structures. Such DOM struc-tures have to be provided as test fixtures to effectively test such DOM-dependentpaths.5.2.1 DOM Fixtures for JavaScript Unit TestingA test fixture is a fixed state of the system under test used for executing tests [122].It pertains to the code that initializes the system, brings it into the right state, pre-pares input data, or creates mock objects, to ensure tests can run and produce re-peatable results. In JavaScript unit testing, a fixture can be a partial HTML that theJavaScript function under test expects and can operate on (read or write to), or afragment of JSON/XML to mock the server responses in case of XMLHttpRequest103(XHR) calls in the code.Running Example.. Figure 5.1 depicts a simple JavaScript code for calculatingthe total price of online items. We use this as a running example to illustrate theneed for providing proper DOM structures as test input for unit testing JavaScriptcode.Lets assume that a tester writes a unit test for the sumTotalPrice functionwithout any test fixture. In this case, when the function is executed, it throwsa null exception at line 8 (Figure 5.1) when the variable itemList is ac-cessed for its children property. The reason is that the DOM API methodgetElementById (line 2) returns null since there is no DOM tree availablewhen running the unit test case. Consequently, dg (line 2) returns null, andhence the exception at line 8. Thus, in order for the function to be called from atest case, a DOM tree needs to be present. Otherwise, the function will terminateprematurely. What is interesting is that the mere presence of the DOM does notsuffice in this case. The DOM tree needs to be in a particular structure and containattributes and values as expected by the different statements and conditions of theJavaScript function.For instance, in the case of sumTotalPrice, in order to test the calculation,a DOM tree needs to be present that meets the following constraints:1. A DOM element with id "items" must exist (line 7)2. That element needs to have one or more child nodes (lines 8, 10)3. The child nodes must be of a DOM element type that can hold values (line 12),e.g., <input value="..."/>4. The values of the child nodes need to be positive (line 13) integers (lines 12)5. A DOM element with id "total" must exist (line 18).A DOM subtree that satisfies these constraints is depicted in Figure 5.2, whichcan be used as a test fixture for unit testing the JavaScript function. There arecurrently many JavaScript unit testing frameworks available, such as QUnit [40],JSUnit [35], and Jasmine [31]. We use the popular QUnit framework to illustratethe running example in this work. QUnit provides a $"qunit-fixture" vari-able to which DOM test fixtures can be appended in a test case. A QUnit unit test104<div id=‘qunit-fixture’><form id=‘items’><input id=‘item1’ value=50> <input id=‘item2’ value=120><div id=‘total’>Figure 5.2: A DOM subtree for covering a path (lines 6-8, 10-14, and 18-20) of sumTotalPrice inFigure 5.1.1 test("A test for sumTotalPrice", function() {2 $("#qunit-fixture").append('<form id="items"><input type="text" id="←↩item1" value=50><input type="text" id="item2" value=120></form><←↩div id="total"/>');3 sum = sumTotalPrice();4 equal(sum, 170, "Function sums correctly.");5 });Figure 5.3: A QUint test case for the sumTotalPrice function. The DOM subtree of Figure 5.2 isprovided as a fixture before calling the function. This test with this fixture covers the path goingthrough lines 6-8, 10-14, and 18 in shown in Figure 5.3. The execution of the test case with this particular fixtureresults in covering a path (lines 6-8, 10-14, and 18). If the fixture lacks any of theprovided DOM elements that is required in the execution path, the test case failsand terminates before reaching the assertion.Other DOM fixtures are required to achieve branch coverage. For example,to cover the true branch of the if condition in line 8, the DOM must satisfy thefollowing constraints:1. A DOM element with id "items" must exist (line 7)2. That element must have no child nodes (line 8)3. A DOM element with id "message" must exist (line 9).Yet another DOM fixture is needed for covering the else branch in line 16:1. A DOM element with id "items" must exist (line 7)2. That element needs to have one or more child nodes (lines 8, 10)3. The child nodes must be of a DOM element type that can hold values (line 12)e.g., input1054. The child nodes values need to be integers (lines 12)5. The value of a child node needs to be zero or negative (line 15)6. A DOM element with id "message" must exist (line 16).7. A DOM element with id "total" must exist (line 18).5.2.2 ChallengesAs illustrated in this simple example, different DOM fixtures are required for maxi-mizing JavaScript code coverage. Writing these DOM fixtures manually is a daunt-ing task for testers. Generating these fixtures is not an easy task either. There aretwo main challenges in generating proper DOM-based fixtures that we address inour proposed approach.Challenge 1: DOM-related variables.. JavaScript is a weakly-typed andhighly-dynamic language, which makes static code analysis quite challenging[153, 161, 182]. Moreover, its interactions with the DOM can become difficultto follow [62, 141]. For instance, in the condition of line 13 in Figure 5.1, thevalue of the variable p is checked. A fixture generator needs to determine that pis DOM-dependent and it refers to the value of a property of a DOM element, i.e.,itemList.children[i].value.Challenge 2: Hierarchical DOM relations.. Unlike most test fixtures that dealonly with primitive data types, DOM-based test fixtures require a tree structure. Infact, DOM fixtures not only contain proper DOM elements with attributes and theirvalues, but also hierarchical parent-child relations that can be difficult to recon-struct. For example the DOM fixture in Figure 5.2 encompasses the parent-childrelation between <form> and <input> elements, which is required to evaluateitemList.children.length and itemList.children[i].value inthe code (lines 8, 11, and 12 in Figure 5.1).5.2.3 Dynamic Symbolic ExecutionOur insight in this work is that the problem of generating expected DOM fixturescan be formulated as a constraint solving problem, to achieve branch coverage.Symbolic execution [106] is a static analysis technique that uses symbolic val-ues as input values instead of concrete data, to determine what values cause each106branch of a program to execute. For each decision point in the program, it in-fers a set of symbolic constraints. Satisfiability of the conjunction of these sym-bolic constraints is then checked through a constraint solver. Concolic execution[92, 160], also known as dynamic symbolic execution [170], performs symbolic ex-ecution while systematically executing all feasible program paths of a program onsome concrete inputs. It starts by executing a program with random inputs, gatherssymbolic constraints at conditional statements during execution, and then uses aconstraint solver to generate a new input. Each new input forces the execution ofthe program through a new uncovered path; thus repeating this process results inexploring all feasible execution paths of the program.5.3 ApproachWe propose a DOM-based test fixture generation technique, called CONFIX. Toaddress the highly dynamic nature of JavaScript, is based on a dynamic symbolicexecution approach.Scope. Since primitive data constraints can be solved using existing input genera-tors for JavaScript [109, 158, 161], in this work, we focus on collecting and solvingDOM constraints that enable achieving coverage for DOM dependent statementsand conditions in JavaScript code. Thus, is designed to generate DOM-based testfixtures and function arguments for JavaScript functions that are DOM-dependent.Definition 11 (DOM-Dependent Function) A DOM dependent function is aJavaScript function, which directly or indirectly accesses DOM elements, at-tributes, or attribute values at runtime using DOM APIs. 2An instance of a direct access to a DOM element isdocument.getElementById("items") in the function dg (Line 2,Figure 5.1). An indirect access is a call to another JavaScript function thataccesses the DOM. For instance, the statement at line 7 of Figure 5.1 is an indirectDOM access through function dg.Overview. Figure 5.4 depicts an overview of CONFIX. At a high level, instrumentsthe JavaScript code (block 1), and executes the function under test to collect atrace (blocks 2 and 3). Using the execution trace, it deduces DOM-dependent path107DOM-based FixtureJavaScript Code(1)Instrument Code(2)Execute Function(4)Deduce DOM-dependant PCsExecution Trace(3)Collect Execution Trace(9)Generate Test caseGenerated Unit Tests(7)Generate Fixture(8)Apply Fixture to Cover a New PathInstrumented Code(6)Solve XPath expressions (5)Translate PCs to XPath expressionsSolved XML treeFigure 5.4: Processing view of our approach.constraints (block 4), translates those constraints into XPath expressions (block 5),which are fed into an XML constraint solver (block 6). The solved XML tree isthen used to generate a DOM-based fixture (block 7), which subsequently helpsin covering unexplored paths (block 8). Finally, it generates a test suite by addinggenerated test fixtures into a JavaScript unit testing template such as QUnit (block9). In the following subsections we discuss each of these steps in more details.Algorithm. Algorithm 4 demonstrates our DOM fixture generation technique. Theinput to our algorithm is the JavaScript code, the function under test f , and op-tionally its function arguments (provided by a tester or a tool [158, 161]). Thealgorithm concolically generates DOM fixtures required for exploring all DOM-dependent paths.5.3.1 Collecting DOM-based TracesWe instrument the JavaScript code under test (algorithm 4 line 3) with wrapperfunctions to collect information pertaining to DOM interactions, which includestatements or conditional constructs that depend on the existence of a DOM struc-ture, element, attribute, or value. The instrumentation is non-intrusive meaningthat it does not affect the normal functionality of the original program.108The instrumentation augments function calls, conditionals (in if and loopconstructs), infix expressions, variable initializations and assignments, and returnstatements with inline wrapper functions to store an execution trace (see Sec-tion 5.3.5 for more details). Such a trace includes the actual statement, type ofstatement (e.g. infix, condition, or function call), list of variables and their valuesin the statement, the enclosing function name, and the actual concrete values ofthe statement at runtime. CONFIX currently supports DOM element retrieval pat-terns based on tag names, IDs, and class names, such as getElementById,getElementsByTagName, children, innerHTML, parentNode, and$() for jQuery-based code.After instrumenting the source code, the modified JavaScript file is used in arunner HTML page that is loaded inside a browser (line 7, algorithm 4) and then aJavaScript driver (e.g., WebDriver [42]) executes the JavaScript function under test(line 8). The initial DOM fixture is an empty HTML runner having a div elementwith id qunit-fixture. This execution results in an execution trace, which iscollected from the browser for further analysis (line 9).5.3.2 Deducing DOM ConstraintsAn important phase of dynamic symbolic execution is gathering path constraints(PCs). In our work, path constraints are conjunctions of constraints on symbolicDOM elements.Definition 12 (Symbolic DOM Element) A symbolic DOM element d is a datastructure representing a DOM element in terms of its symbolic properties and val-ues. d is denoted by a 4 tuple <P,C ,A ,T > where:1. P is d’s parent symbolic DOM element.2. C is a set of child symbolic DOM elements of d.3. A is a set of 〈att, val〉 pairs; each pair stores an attribute att of dwith a symbolic value val.4. T is the element type of d. 2109Algorithm 4: Test Fixture Generationinput : JavaScript code JS, the function under test f , function arguments for foutput: A set of DOM fixtures f ixtureSet for f1 negatedConstraints←∅2 DOMRefTrackList←∅Procedure GENERATEFIXTURE(JS, f )begin3 JSinst ← INSTRUMENT(JS)4 f ixtureSet←∅5 f ixture←∅6 repeat7 browser.LOAD(JSinst , f ixture)8 browser.EXECUTE( f )9 t← browser.GETEXECUTIONTRACE()10 f ixture← SOLVECONSTRAINTS(t)11 f ixtureSet← f ixtureSet ∪ f ixtureuntil f ixture 6=∅;12 return fixtureSetendProcedure SOLVECONSTRAINTS(t)begin13 DOMRefTrackList← GETDOMREFERENCETRACKS(t)14 pc← GETPATHCONSTRAINT(t, DOMRefTrackList)15 f ixture← UNSAT16 while f ixture = UNSAT do17 f ixture←∅18 c← GETLASTNONNEGCONST(pc,negatedConstraint)19 if c 6= null then20 negatedConstraint← negatedConstraint ∪ c21 pc← NEGATECONSTRAINT(pc,c)22 xp← GENERATEXPATH(pc,DOMRe f TrackList)/* SOLVEXPATHCONSTRAINT returns UNSAT if xp is notsolvable. */23 f ixture← SOLVEXPATHCONSTRAINT(xp)endend24 return fixtureendNote that keeping the parent-children relation for DOM elements is sufficientto recursively generate the DOM tree.DOM constraints in the code can be conditional or non-conditional. A non-conditional DOM constraint is a constraint on the DOM tree required by a DOMaccessing JavaScript statement. A conditional DOM constraint is a constraint onthe DOM tree used in a conditional construct.Example 5 Consider the JavaScript code in Figure 5.1. In line 2,110document.getElementById(x) is a non-conditional DOM constraint,i.e., an element with a particular ID is targeted. On the other hand,itemList.children.length == 0 (line 8) is a conditional DOM con-straint, i.e., the number of child nodes of a DOM element is checked to be zero.Non-conditional DOM constraints evaluate to null or a not-null object, while,conditional DOM constraints evaluate to true or false. For example, ifwe execute sumTotalPrice() with a DOM subtree void of an element withid="items", the value of itemList at line 7 will be null and the executionwill terminate at line 8 when evaluating itemList.children.length ==0. On the other hand, if that element exists and has a child, the condition at line 8evaluates to false.In this work, the input to be generated is a DOM subtree that is accessed viaDOM APIs in the JavaScript code. As we explained in Section 5.2.2, due to thedynamic nature of JavaScript, its interaction with the DOM can be difficult to fol-low (challenge 1). Tracking DOM interactions in the code is needed for extractingDOM constraints. Aliases in the code add an extra layer of complexity. We needto find variables in the code that refer to DOM elements or element attributes. Toaddress this challenge, we apply an approach similar to dynamic backward slic-ing, except that instead of slices of the code, we are interested in relevant DOM-referring variables. To that end, we use dynamic analysis to extract DOM referringvariables from the execution trace, by first searching for DOM API calls, their argu-ments, and their actual values at runtime. The process of collecting DOM referringvariables (Algorithm 4 line 13) is outlined further in subsection 5.3.5.Once DOM referring variables are extracted, constraints on their corre-sponding DOM elements are collected and used to generate constraints onsymbolic DOM elements (see Definition 12). DOM constraints can be ei-ther attribute-wise or structure-wise. Attribute-wise constraints are satis-fied when special values are provided for element attributes. For example,parseInt(itemList.children[i].value) > 0 (line 13 of Figure 5.1)requires the value of the i-th child node to be an integer greater than zero. Thevalue is an attribute of the child node in this example. Structure-wise constraintsare applied to the element type and its parent-children relations. For example, in111itemList.children.length == 0 (line 8 of Figure 5.1 to cover the elsebranch (lines 10–19) an element with id "items" is needed with at least one childnode.The conjunction of these symbolic DOM constraints in an iterationforms a path constraint. For instance, the structure-wise constraint inparseInt(itemList.children[i].value) > 0 (line 13 of Figure 5.1)requires the child nodes to be of an element type that can hold values, e.g., inputtype, and the attribute-wise constraint requires the value to be a positive integer.Our technique reasons about a collected path constraint and constructs sym-bolic DOM elements needed. For each symbolic DOM element, (1) infers thetype of the parent node, (2) determines the type and number of child nodes, and(3) generates, through a constraint solver, satisfied values that are used to assignattributes and values (i.e., 〈att, val〉 pairs). The default element type for asymbolic DOM element is div — the div is a placeholder element that defines adivision or a section in an HTML document. It satisfies most of the element typeconstraints and can be parent/child of many elements — unless specific attributes/-values are accessed from the element, which would imply that a different elementtype is needed. For instance, if the value is read/set for an element, the typeof that element needs to change to, for instance input, because per definition,the div type does not carry a value attribute (more detail in subsection 5.3.5).These path constraints with satisfied symbolic DOM elements are used to generatea corresponding XPath expression, as presented in the next subsection.5.3.3 Translating Constraints to XPathThe problem of DOM fixture generation can be formulated as a decision prob-lem for the emptiness test [69] of an XPath expression, in the presence of XHTMLmeta-models, such as Document Type Definitions (DTD) or XML Schemas. Thesemeta-models define the hierarchical tree-like structure of XHTML documents withthe type, order, and multiplicity of valid elements and attributes. XPath [75]is a query language for selecting nodes from an XML document. An exampleis the expression /child::store/child::item/child::price whichnavigates the root through the top-level “store” node to its “item” child nodes and112on to its “price” child nodes. The result of the evaluation of the entire expression isthe set of all the “price” nodes that can be reached in this manner. XPath is a veryexpressive language. We propose a restricted XPath expression grammar, shownin Figure 5.5, which we use to model our DOM constraints in.〈XPath〉 ::= 〈Path〉 | /〈Path〉〈Path〉 ::= 〈Path〉/〈Path〉 | 〈Path〉[〈Qualifier〉] | child::〈Name〉 | 〈Name〉〈Qualifier〉 ::= 〈Qualifier〉 and 〈Qualifier〉 | 〈Path〉 | @〈Name〉〈Name〉 ::= 〈HTMLTag〉 | 〈Attribute〉=〈Value〉〈HTMLTag〉 ::= a | b | button | div | form | frame | h1 - h6 | iframe | img | input | i | li | link | menu |option | ol | p | select | span | td | tr | ul〈Attribute〉 ::= id | type | name | class | value | src | innerHTML | title | selected | checked | href |size | width | heightFigure 5.5: Restricted XPath grammar for modeling DOM constraints.We transform the deduced path constraints, with the symbolic DOM elements,into their equivalent XPath expressions conforming to this specified grammar.These XPath expressions systematically capture the hierarchical relations betweenelements. Types of common constraints translated to expressions include specify-ing the existence of a DOM element/attribute/value, properties of style attributes,type and number of child nodes or descendants of an element, or binary propertiessuch as selected/not selected.Example 6 Table 5.1 shows examples of collected DOM con-straints that are translated to XPath expressions, for the run-ning example. For example, in the first row, the DOM constraintdocument.getElementById("items") 6= null is translated to theXPath expression div[@id="qunit-fixture"][div[@id="items"]],which expresses the desire for the existence of a div element with id “items”in the fixture. The last row shows a more complex example, including six DOMconstraints in a path constraint, which are translated into a corresponding XPathexpression.113Table 5.1: Examples of DOM constraints, translated XPath expressions, and solved XHTML instances for the running example.DOM constraints Corresponding XPath expressions Solved XHTMLdocument.getElementById(“items”) 6= null div[@id=“qunit-fixture”][div[@id=“items”]] <div id=“items”/>document.getElementById(“items”) 6= null ∧ div[@id=“qunit-fixture”][div[@id=“items”] and <div id=“items”/>itemList.children.length == 0 ∧ child::div[@id=“message”]] <div id=“message”/ >document.getElementById(“message”) 6= nulldocument.getElementById(“items”) 6= null ∧ div[@id=“qunit-fixture”][div[@id=“items” and <div id=“items”>itemList.children.length 6= 0 ∧ child::div[@id=“Confix1”]] and <div id=“Confix1”/>0 < itemList.children.length ∧ child::div[@id=“message”]] </div>document.getElementById(“message”) 6= null <div id=“message”/>document.getElementById(“items”) 6= null ∧ div[@id=“qunit-fixture”][div[@id=“items” and <div id=“items”>itemList.children.length 6= 0 ∧ child::input[@id=“Confix1” and @value=”1”]] and <input id=“Confix1” value=“1”/>0 < itemList.children.length ∧ child::div[@id=“message”] and </div>parseInt(itemList.children[0].value) > 0 ∧ child::div[@id=“total”]] <div id=“message”/>document.getElementById(“message”) 6= null ∧ <div id=“total”/>document.getElementById(“total”) 6= null1145.3.4 Constructing DOM FixturesNext, the XPath expressions are fed into a structural XML solver [89]. The con-straint solver parses the XPath expressions and compiles them into a logical rep-resentation, which is tested for satisfiability. If satisfiable, the solver generates asolution in the XML language. Since an XHTML meta-model (i.e., DTD) is fedinto the solver along with the XPath expressions, the actual XML output is aninstance of valid XHTML. The last column of Table 5.1 shows solved XHTML in-stances that satisfy the XPath expressions, for the running example. These solvedXHTML instances are subsequently used to construct test fixtures.Each newly generated fixture forces the execution of the JavaScript functionunder test along a new uncovered path. This is achieved by systematically negatingthe last non-negated conjunct in the path constraint and solving it to obtain a newfixture, in a depth first manner. If a path constraint is unsatisfiable, the techniquechooses a different path constraint and this process repeats until all constraints arenegated.In the main loop of Algorithm 4 (lines 6–11), fixtures are iteratively gen-erated and added to the fixtureSet. In the SolveConstraints proce-dure, a fixture is initialized to UNSAT; the loop (lines 16–23) continues untilthe fixture is set either to a solved DOM subtree12 (line 23), or to ∅ (line 17)if there exist no non-negated constraints in the PCs. When a ∅ fixture returnsfrom SolveConstraints, the loop in the main procedure terminates, and thefixtureSet is returned.12Note that SolveXpathConstraint returns UNSAT when it fails to solve the given pathconstraint.115Table 5.2: Constraints table for the running example. The “Next to negate” field refers to the last non-negated constraint.IterationCurrent fixture Current DOM constraints NegatedNexttonegateFixture for the next iteration Paths covered1 ∅ document.getElementById(“items”) = null - 3 <div id=“items”/> Lines 1–82 <div id=“items”/> document.getElementById(“items”) 6= null ∧ 3 - <div id=“items”/> Lines 1–9itemList.children.length = 0 ∧ - - <div id=“message”/ >document.getElementById(“message”) = null - 33 <div id=“items”/> document.getElementById(“items”) 6= null ∧ 3 - <div id=“items”> Lines 1–9 and 20<div id=“message”/ > itemList.children.length = 0 ∧ - 3 <div id=“Confix1”/>document.getElementById(“message”) 6= null 3 - </div><div id=“message”/>4 <div id=“items”> document.getElementById(“items”) 6= null ∧ 3 - <div id=“items”> Lines 1–8, 10–13,<div id=“Confix1”/> itemList.children.length 6= 0 ∧ 3 - <div id=“Confix1”/> and 15–18</div> 0 < itemList.children.length ∧ - - </div><div id=“message”/> parseInt(itemList.children[0].value) ≯ 0 ∧ - - <div id=“message”/>document.getElementById(“message”) 6= null ∧ 3 - <div id=“total”/>document.getElementById(“total”) = null - 35 <div id=“items”> document.getElementById(“items”) 6= null ∧ 3 - <div id=“items”> Lines 1–8, 10–13,<div id=“Confix1”/> itemList.children.length 6= 0 ∧ 3 - <input id=“Confix1” value=“1”/> and 15–20</div> 0 < itemList.children.length ∧ - - </div><div id=“message”/> parseInt(itemList.children[0].value) ≯ 0 ∧ - 3 <div id=“message”/><div id=“total”/> document.getElementById(“message”) 6= null ∧ 3 - <div id=“total”/>document.getElementById(“total”) 6= null 3 -6 <div id=“items”> document.getElementById(“items”) 6= null ∧ 3 - UNSAT⇒ Negate last non-negated Lines 1–8, 10–14,<input id=“Confix1” value=“1”/> itemList.children.length 6= 0 ∧ 3 - constraint and 18–20</div> 0 < itemList.children.length ∧ - 3<div id=“message”/> parseInt(itemList.children[0].value) > 0 ∧ 3 -<div id=“total”/> document.getElementById(“message”) 6= null ∧ 3 -document.getElementById(“total”) 6= null 3 -<div id=“items”> document.getElementById(“items”) 6= null ∧ 3 - No non-negated constraint exists⇒<input id=“Confix1” value=“1”/> itemList.children.length 6= 0 ∧ 3 - Fixture = ∅</div> 0 < itemList.children.length ∧ 3 -<div id=“message”/> parseInt(itemList.children[0].value) > 0 ∧ 3 -<div id=“total”/> document.getElementById(“message”) 6= null ∧ 3 -document.getElementById(“total”) 6= null 3 -116Example 7 Table 5.2 shows the extracted path constraints and their values, as wellas the current and next iteration fixtures at each iteration of the concolic executionfor the running example. Since there is no fixture available in the first iteration,the constraint obtained is document.getElementById("items")=null.This means the execution terminates at line 8 of the JavaScript code (Figure 5.1)due to a null exception. Our algorithm negates the last non-negated constraint (document.getElementById("items") 6=null), and generates the corre-sponding fixture (<div id="items">) to satisfy this negated constraint. Thisprocess continues until the solver fails at producing a satisfiable fixture in the sixthiteration. UNSAT is returned because the constraints itemList.children.length 6=0 and the newly negated one (0≮itemList.children.length) require that the number of child nodes be negative, which is not feasible inthe DOM structure. It then tries to generate a fixture by negating the last non-negated constraint without applying any fixtures, i.e., the path constraint extractedin the sixth iteration. However, in this case there are no non-negated constraintsleft since the last one (0<itemList.children.length) had already beennegated and the result was not satisfiable. Consequently the algorithm terminateswith an empty fixture in the last iteration. The table also shows which paths of therunning example are covered in terms of lines of JavaScript code.5.3.5 Implementation DetailsCONFIX currently generates QUnit test cases with fixtures, however, it can beeasily adapted to generate test suites in other JavaScript testing frameworks. Toparse the JavaScript code into an abstract syntax tree for instrumentation, we buildup on Mozilla Rhino [41]. To collect execution traces while executing a functionin the browser, we use WebDriver [42].XML Solver.. To solve structure-wise DOM constraints using XPath expressions,we use an existing XML reasoning solver [89]. A limitation with this solver isthat it cannot generate XML structures with valued attributes (i.e., attributes aresupported but not their values). To mitigate this, we developed a transformationtechnique that takes our generic XPath syntax (Figure 5.5) and produces an XPathformat understandable by the solver. More specifically, we merge attributes and117their values together and feed them to the solver as single attributes to be gen-erated. Once the satisfied XML is generated, we parse and decompose it to theproper attribute-value format as expected in a valid XHTML instance. Anotherlimitation is that it merges instances of the same tag elements at the same tree levelwhen declared as children of a common parent. We resolved this by appending anauto-increment number to the end of each tag and remove it back once the XMLproduced.Handling asynchronous calls. Another challenge we encountered pertains to han-dling asynchronous HTTP requests that send/retrieve data (e.g., in JSON/XML for-mat) to/from a server performed by using the XMLHttpRequest (XHR) object.This feature makes unit-level testing more challenging since the server-side gener-ated data should also be considered in a test fixture as an XHR response if the func-tion under test (in)directly uses the XHR and expects a response from the server.Existing techniques [100] address this issue by mocking the server responses, butthey require multiple concrete executions of the application to learn the response.This is, however, not feasible in our case because we generate JavaScript unit testsin isolation from other dependencies such as the server-side code. As a solution,our instrumentation replaces the XHR object of the browser with a new object andredefines the XHR open() method in a way that it always uses the GET requestmethod to synchronously retrieve data from a local URL referring to our mockedserver. This helps us to avoid null exceptions and continue the execution of thefunction under test. However, if the execution depends on the actual value of theretrieved data (and not merely their existence), our current approach can not handleit. In such cases, a string solver [105] may be helpful.Tracking DOM-referring variables. To detect DOM-referring variables — usedto generate constraints on symbolic DOM elements (Definition 12) — we auto-matically search for DOM API calls, their arguments, and their actual values atruntime, in the execution trace. Algorithm 4 keeps track of DOM references (line13) by storing information units, called DOM Reference Track (DRT), in a datastructure.Definition 13 (DOM Reference Track (DRT)) A DOM reference track is a datastructure capturing how a DOM tree is accessed in the JavaScript code. It is de-118Table 5.3: DRT data structure for the running example.IterationDOMVariableParentVariableElementAttributeVariablesExists?Type1 itemList document div 〈id:items, -〉 72 itemList document div 〈id:items,-〉 3- itemList div 〈id:Confix1,-〉 7- document div 〈id:message,-〉 73 itemList document div 〈id:items,-〉 3- itemList div 〈id:Confix1,-〉 7- document div 〈id:message,-〉 3... ... ... ... ... ...6 itemList document div 〈id:items,-〉 3- itemList input 〈id:Confix1,-〉, 〈value:1, p〉 3- document div 〈id:message,-〉 3- document div 〈id:total,-〉 3noted by a 4 tuple <D ,P,A ,T > where:1. D (DOMVariable) is a JavaScript variable v that is set to refer to a DOMelement d.2. P (ParentVariable) is a JavaScript variable (or the document object) thatrefers to the parent node of d.3. A (AttributeVariables) is a set of 〈att:val, var〉 pairs; each pair storesthe variable var in the code that refers to an attribute att of d with a valueval.4. T (ElementType) is the node type of d. 2When JavaScript variables are evaluated in a condition, the DRT entries in thisdata structure are examined to determine whether they refer to the DOM. If theactual value of a JavaScript variable at runtime contains information regarding aDOM element object, and it does not exist in our DRT data structure, we add it asa new DOM referring variable. Table 5.3 presents an example of the DRT for therunning example.119We implemented a constraint solver that reasons about some common symbolicDOM constraints such as string/integer attribute values, and number of childrennodes. Specifically the solver infers conditions on DOM referring variables by ex-amining DRT entries. If the constraint is on an attribute of a DOM element, thenthe AttributeVariables property of the corresponding DRT will be updatedwith a satisfying value. In case the constraint is a structural constraint, such asnumber of child nodes, a satisfying number of DRT entities would be added to thetable. Table 5.3 depicts the process of constructing the DRT during different itera-tions of the concolic execution. The Exists field indicates whether the elementexists in the DOM fixture.Example 8 Consider the running example of Figure 5.1. WhensumTotalPrice() is called in the first iteration, dg("items") re-turns null as no DOM element with ID items exists. Table 5.3 would thenbe populated by adding the first row: DOMVariable is itemList, theParentVariable points to document, the default element type is set to div,and the attribute id is set to items; and this particular element does not existyet. The execution terminates with a null exception at line 8. In the next iteration,CONFIX updates the DOM fixture with a div element with id items. Thereforedg("items") returns a DOM element and line 8 evaluates the number ofchild nodes under itemList. This would then update the table with a newentry having ParentVariable point to itemList and attribute id set toan automatically incremented id "Confix1" (in the second row). This processcontinues as shown in Table 5.3.Generating DOM-based function arguments. Current tools for JavaScript inputgeneration (e.g., [158, 161]) only consider primitive data types for function argu-ments and thus cannot handle functions that take DOM elements as input. Considerthe following simple function:1 function foo(elem) {2 var price = elem.firstChild.value;3 if (price > 200) {4 ...5 }}120The elem function parameter is expected to be a DOM element, whose firstchild node’s value is read in line 2 and used in line 3. The problem of generatingDOM function arguments is not fundamentally different from generating DOMfixtures. Thus, we propose a solution for this issue in CONFIX. The challenge here,however, is that JavaScript is dynamically typed and since elem in this exampledoes not reveal its type statically nor when foo is executed in isolation in a unittest (because elem does not exist to log its type dynamically), it is not possible todetermine that elem is a DOM element. To address this challenge, CONFIX firstcomputes a forward slice of the function parameters. If there is a DOM API call inthe forward slice, CONFIX deduces constraints and solves them similarly to howDOM fixtures are constructed. The generated fixture is then parsed into a DOMelement object and the function is called with that object as input in the test case.In the example above, there is a DOM API call present, namely firstChild inthe forward slice of elem. Therefore, CONFIX would know that elem is a DOMelement and would generate it accordingly. Then foo is called in the test case withthat object as input.5.4 Empirical EvaluationTo assess the efficacy of our proposed technique, we have conducted a controlledexperiment to address the following research questions:RQ1 (Coverage) Can fixtures generated by CONFIX effectively increase codecoverage of DOM-dependent functions?RQ2 (Performance) What is the performance of running ? Is it acceptable?The experimental objects and our results, along with the implementation of areavailable for download [24].5.4.1 Experimental ObjectsTo evaluate CONFIX, we selected four open source web applications that havemany DOM-dependent JavaScript functions. Table 5.4 shows these applications,which fall under different application domains and have different sizes. ToDoList121Table 5.4: Characteristics of experimental objects excluding blank/comment lines and external JavaScriptlibraries.Name JSLOC#Branches#Functions%DOM-DependentFunctions#DOMConstraints(DC)%Non-ConditionalDC%ConditionalDCToDoList 82 10 7 100 19 84 16HotelReserve 106 88 9 56 13 69 31Sudoku 399 344 18 78 66 67 33Phormer 1553 464 109 71 194 70 30Total 2140 906 143 72 292 70 30[43] is a simple JavaScript-based todo list manager. HotelReserve [29] is a reser-vation management application. Sudoku [44] is a web-based implementation ofthe Sudoku game. And Phormer [39] is an Ajax photo gallery. As presented inTable 5.4, about 70% of the functions in these applications are DOM-dependent.The table also shows the lines of JavaScript code, and number of branches andfunctions in each application.5.4.2 Experimental SetupOur experiments are performed on Mac OS X, running on a 2.3GHz Intel Core i7with 8 GB memory, and FireFox 37.Independent variablesTo the best of our knowledge, there exists no DOM test fixture generation tool tocompare against; the closest to is JSeft [130], which generates tests that use theentire DOM at runtime as test fixtures. However, JSeft does not generate DOMfixtures, and it requires a deployed web application before it can be used.Therefore, we construct baseline test suites to compare against. We comparedifferent types of test suites to evaluate the effectiveness of test fixtures generatedby CONFIX. Table 5.5 depicts different JavaScript unit test suites. We classifytest suites based on the type of test input they have support for, namely, (1) DOM122Table 5.5: Evaluated function-level test suites.Test Suite Function Arguments DOM DOM # TestFixture Input CasesNoInput No input 7 7 98Jalangi Generated by JALANGI 7 7 98+4Manual Manual inputs 7 7 98+55ConFix + NoInput No input 3 7 98+125ConFix + Jalangi Generated by JALANGI 3 7 98+129ConFix + Manual Manual inputs 3 3 98+236fixtures, and/or (2) DOM function arguments.Test suites without DOM fixtures. NoInput is a naive test suite that calls eachfunction without setting any fixture or input for it. Jalangi produces (non-DOM)function arguments using the concolic execution engine of JALANGI [161]. Man-ual is a test suite that uses manually provided (non-DOM) inputs.Test suites with DOM fixtures. To assess the effect of DOM fixtures and inputsgenerated by CONFIX, we consider different combinations: ConFix + NoInputhas DOM fixtures generated by but no function arguments, ConFix + Jalangi hasDOM fixtures generated by but uses the inputs generated by JALANGI for non-DOM function arguments, and ConFix + Manual uses DOM fixtures and DOMfunction arguments generated by and manual inputs for non-DOM function argu-ments. Table 5.5 shows all these combinations along with the number of test casesin each category.Note that since our approach is geared toward generating DOM-based test fix-tures/inputs, we only consider test generation for DOM-dependent functions andthus for all categories we consider the same set of 98 DOM-dependent functionsunder test, but with different inputs/fixtures. The 55 manual non-DOM functionarguments were written by the authors through source code inspection.Dependent variablesOur dependent variables are code coverage and generation time.Code coverage. Code coverage is commonly used as a test suite adequacy crite-rion. To address RQ1, we compare the JavaScript code coverage of the different1230%	  10%	  20%	  30%	  40%	  50%	  60%	  70%	  80%	  NoInput	   ConFix	  +	  NoInput	  Jalangi	   ConFix	  +	  Jalangi	  Manual	   ConFix	  +	  Manual	  Code	  Coverage	  Statement	  Coverage	   Branch	  Coverage	  Figure 5.6: Comparison of statement and branch coverage, for DOM-dependent functions, using differenttest suite generation methods.test suites, using JSCover [33]. Since our target functions in this work are DOM-dependent functions (Definition 11), code coverage is calculated by consideringonly DOM-dependent functions.Fixture generation time. To answer RQ2 (performance), we measure the time(in seconds) required for to generate test fixtures for a test suite, divide it by thenumber of generated tests, and report this as the average fixture generation time.5.4.3 ResultsCoverage (RQ1). Figure 5.6 illustrates the comparison of code coverage achievedby each test suite category. We report the total statement and branch coverage ofthe JavaScript code obtained from the experimental objects.Table 5.4 shows that in total about 14% of the code (i.e., 292 out of 2140 LOC)contains DOM constraints. However, as shown in Figure 5.6, this relatively smallportion of the code has a remarkable impact on the code coverage when comparingtest suites with and without DOM test fixtures. This is due to the fact that if a DOMconstraint is not satisfied, the function terminates as a result of a null exception inmost cases. Such constraints may exist at statements near the entrance of functions(as shown in Figure 5.1) and thus, proper DOM test fixtures are essential to achieveproper coverage. Table 5.6 shows the coverage increase for the test suites.Our results, depicted in Figure 5.6 and Table 5.6, show that Manual and Jalangicannot achieve a much higher code coverage than NoInput. This again relates tothe fact that if expected DOM elements are not available, then the execution ofDOM-dependent functions terminates and consequently code coverage cannot be124Table 5.6: Coverage increase (in percentage point) of test suites on rows over test suites on columns.Statement and branch coverage are separated by a slash, respectively.NoInput Jalangi Manual ConFix ConFix+ +NoInput JalangiJalangi 3 / 2 — — — —Manual 7 / 9 4 / 7 — — —ConFix + NoInp 23 / 23 20 / 21 16 / 14 — —ConFix + Jalangi 26 / 25 23 / 23 19 / 16 3 / 2 —ConFix + Manual 34 / 42 31 / 40 27 / 33 11 / 19 8 / 17increased much, no matter the quality of the function arguments provided.The considerable coverage increase for ConFix + NoInput vs. Jalangi andManual indicates that test suites generated by CONFIX, even without providing anyarguments, is superior over other test suites with respect to the achieved code cov-erage. The coverage increases even more when function arguments are provided;for example, for ConFix + Manual compared to Jalangi, there is a 40 percentagepoint increase (300% improvement) in the branch coverage, and a 31 percentagepoint increase (67% improvement) in the statement coverage.The coverage increase for ConFix + Manual vs. ConFix + NoInput and ConFix+ Jalangi is more substantial in comparison with the coverage increase for Manualvs. NoInput and Jalangi. This is mainly due to (1) the DOM fixtures generated,which are required to execute paths that depend on manually given arguments; and(2) the DOM arguments generated, which enable executing paths that depend onDOM elements provided as function arguments.Although DOM fixtures generated by can substantially improve the cover-age compared to the current state-of-the-art techniques, we discuss why does notachieve full coverage in Section 5.5.Performance (RQ2). The execution time of is mainly affected by its concolic ex-ecution, which requires iterative code execution in the browser and collecting andsolving constraints. Our results show that, on average, CONFIX requires 0.7 secondper test case and 1.6 second per function to generate DOM fixtures. This amountof time is, however, negligible considering the significant code coverage increase.Since the number of DOM constraints in typical DOM-dependent JavaScript func-125tions is not large (2.9 on average in our study), concolic execution can be performedin a reasonable time.5.5 DiscussionApplications. Given the fact that JavaScript extensively interacts with the DOMon the client-side, and these interactions are highly error-prone [141], we see manyapplications for our technique. CONFIX can be used to automatically generateJavaScript unit tests with DOM fixtures that could otherwise be quite time consum-ing to write manually. It can also be used in combination with other existing testgeneration techniques [130, 161] to improve the code coverage. In case a DOMconstraint depends on a function argument, we can perform concolic executioni.e., beginning with an arbitrary value for the argument, capturing the DOM con-straint during execution, and treating the DOM referring variable and the argumentas symbolic variables. In addition to DOM fixtures, can also generate DOM-basedfunction arguments, i.e., DOM elements as inputs, as explained in subsection 5.3.5.Currently there is no tool that supports DOM input generation for JavaScript func-tions.Limitations. We investigated why does not achieve full coverage. The main rea-sons that we found reveal some of the current limitations of our implementation,which include: (1) we implemented a simple integer/string constraint solver togenerate XPath expressions with proper structure and attribute values, which can-not handle complex constraints currently; (2) we do not support statements thatrequire event-triggering; (3) the XML solver used in our work cannot efficientlysolve long lists of constraints, (4) some paths are browser-dependent, which is outof the scope of CONFIX; (5) the execution of some paths are dependent on globalvariables that are set via other function calls during the execution, which is alsoout of the scope of ; and (6) does not analyze dynamically generated code usingeval() that interacts with the DOM.Threats to validity. A threat to the external validity of our experiment is withregard to the generalization of the results to other JavaScript applications. To mit-igate this threat, we selected applications from different domains (task manage-ment, form validation, game, gallery) that exhibit variations in functionality, and126we believe they are representative of JavaScript applications that use DOM APIsfor manipulating the page; although we do need more and large applications tomake our results more generalizable. With respect to the reproducibility of ourresults, CONFIX, the test suites, and the experimental objects are all available [24],making the experiment repeatable.5.6 Related WorkMost current web testing techniques focus on generating sequences of events atthe DOM level, while we consider unit test generation at the JavaScript code level.Event-based test generation techniques [121, 123, 126] can miss JavaScript faultsthat do not propagate to the DOM [130].Unit testing. Alshraideh [65] generates unit tests for JavaScript programs throughmutation analysis by applying basic mutation operators. Heidegger et al. [95] pro-pose a test case generator for JavaScript that uses contracts (i.e., type signatureannotations that the tester has to include in the program manually) to generate in-puts. ARTEMIS [66] is a framework for automated testing of JavaScript, whichapplies feedback-directed random test generation. None of these techniques con-sider DOM fixtures for JavaScript unit testing.More related to our work, JSEFT [130] applies a heuristic-based approach bycapturing the full DOM tree during the execution of an application just beforeexecuting the function under test, and uses that DOM as a test fixture in a generatedtest case. This approach, however, cannot cover all DOM dependent branches.Symbolic and concolic execution. Nguyen et al. [138] present a technique that ap-plies symbolic execution for reasoning about the potential execution of client-sidecode embedded in server-side code. KUDZU [158] performs symbolic reasoning toanalyze JavaScript security vulnerabilities, such as code injections in web applica-tions. JALANGI [161] is a dynamic analysis framework for JavaScript that appliesconcolic execution to generate function arguments; however, it does not supportDOM-based arguments nor DOM fixtures, as CONFIX does. SYMJS [109] con-tains a symbolic execution engine for JavaScript, as well as an automatic eventexplorer. It extends HTMLUnit’s DOM and browser API model to support sym-bolic execution by introducing symbolic values for specific elements, such as text127inputs and radio boxes. However, it considers substituting DOM element variableswith integer or string values and using a traditional solver, rather than actually gen-erating the hierarchical DOM structure. CONFIX on the other hand has support forthe full DOM tree-structure including its elements and their hierarchical relations,attributes, and attribute values.To the best of our knowledge, CONFIX is the first to address the problem ofDOM test fixture construction for JavaScript unit testing. Unlike most other tech-niques, we consider JavaScript code in isolation from server-side code and withoutthe need to execute the application as a whole.5.7 ConclusionsProper test fixtures are required to cover DOM-dependent statements and condi-tions in unit testing JavaScript code. However, generating such fixtures is not aneasy task. In this chapter, we proposed a concolic technique and tool, called CON-FIX, to automatically generate a set of unit tests with DOM fixtures and DOMfunction arguments. Our empirical results show that the generated fixtures sub-stantially improve code coverage compared to test suites without these fixtures.128Chapter 6Detecting JavaScript Code SmellsSummary13JavaScript is a powerful and flexible prototype-based scripting language that is in-creasingly used by developers to create interactive web applications. The languageis interpreted, dynamic, weakly-typed, and has first-class functions. In addition, itinteracts with other web languages such as CSS and HTML at runtime. All thesecharacteristics make JavaScript code particularly error-prone and challenging towrite and maintain. Code smells are patterns in the source code that can adverselyinfluence program comprehension and maintainability of the program in the longterm. We propose a set of 13 JavaScript code smells, collected from various devel-oper resources. We present a JavaScript code smell detection technique called JS-NOSE. Our metric-based approach combines static and dynamic analysis to detectsmells in client-side code. This automated technique can help developers to spotcode that could benefit from refactoring. We evaluate the smell finding capabilitiesof our technique through an empirical study. By analyzing 11 web applications,we investigate which smells detected by JSNOSE are more prevalent.6.1 IntroductionJavaScript is a flexible popular scripting language that is used to offload core func-tionality to the client-side web browser and mutate the DOM tree at runtime to13An initial version of this chapter has been published in the IEEE International Conference onSource Code Analysis and Manipulation (SCAM), 2013 [124].129facilitate smooth state transitions. Because of its flexibility JavaScript is a par-ticularly challenging language to write code in and maintain. The challenges aremanifold: First, it is an interpreted language, meaning that there is typically nocompiler in the development cycle that would help developers to spot erroneous orunoptimized code. Second, it has a dynamic, weakly-typed, asynchronous nature.Third, it supports intricate features such as prototypes [151], first-class functions,and closures [77]. And finally, it interacts with the DOM through a complex event-based mechanism [180].All these characteristics make it difficult for web developers who lack in-depthknowledge of JavaScript, to write maintainable code. As a result, web applica-tions written in JavaScript tend to contain many code smells [85]. Code smells arepatterns in the source code that indicate potential comprehension and maintenanceissues in the program. Code smells, once detected, need to be refactored to improvethe design and quality of the code.Detecting code smells manually is time consuming and error-prone. Auto-mated smell detection tools can lower long-term development costs and increasethe chances for success [178] by helping to make the code more maintainable.Current work on web application code smell detection is scarce [137] and tools[9, 17, 27, 137] available to web developers to maintain their code are mainly staticanalyzers and thus limited in their capabilities.In this work, we propose a list of code smells for JavaScript applications. Intotal, we consider 13 code smells: 7 are existing well-known smells adapted toJavaScript, and 6 are specific JavaScript code smell types, collected from variousJavaScript development resources. We present an automated technique, called JS-NOSE, which performs a metric-based static with dynamic analysis to detect thesesmells in JavaScript code.Our work makes the following main contributions:• We propose a list of JavaScript code smells, collected from various web de-velopment resources;• We present an automated metric-based approach to detect JavaScript codesmells;130• We implement our approach in a tool called JSNose, which is freely avail-able;• We evaluate the effectiveness of our technique in detecting code smells inJavaScript applications;• We empirically investigate 11 web applications using JSNOSE to find outwhich smells are more prevalent.Our results indicate that amongst the smells detected by JSNOSE, lazy object,long method/function, closure smells, coupling between JavaScript, HTML, andCSS, and excessive global variables, are the most prevalent code smells. Further,our study indicates that there exists a strong and significant positive correlationbetween the types of smells and lines of code, number of functions, number ofJavaScript files, and cyclomatic complexity.6.2 Motivation and ChallengesAlthough JavaScript is increasingly used to develop modern web applications,there is still lack of tool support targeting code quality and maintenance, in par-ticular for automated code smell detection and refactoring.JavaScript is a dynamic weakly typed and prototype-based scripting language,with first-class functions. Prototype-based programming is a class-free style ofobject-oriented programming, in which objects can inherit properties from otherobjects directly. In JavaScript, prototypes can be redefined at runtime, and imme-diately affect all the referring objects.The detection process for many of the traditional code smells [85, 104] inobject-oriented languages is dependent on identifying objects, classes, and func-tions in the code. Unlike most object-oriented languages such as Java or C++,identification of such key language items is not straightforward in JavaScript code.Bellow we explain the major challenges in identifying objects and functions inJavaScript.JavaScript has a very flexible model of objects and functions. Object propertiesand their values can be created, changed, or deleted at runtime and accessed viafirst-class functions. For instance, in the following piece of code a call to the131function foo() will dynamically create a property prop for the object obj if propdoes not already exist.1 function foo(obj, prop, value){2 obj.prop = value;3 }Due to such dynamism, the set of all available properties of an object is noteasily retrievable through static analysis of the code alone. Empirical studies [153]reveal that most dynamic features in JavaScript are frequently used by developersand cannot be disregarded in code analysis processes.Furthermore, functions in JavaScript are first-class values. They can (1) beobjects themselves, (2) contain properties and nested function closures, (3) be as-signed dynamically to other objects, (4) be stored in variables, objects, and arrays,(5) be passed as arguments to other functions, and (6) be returned from functions.JavaScript also allows the creation (through eval()) and execution of new codeat runtime, which again makes static analysis techniques insufficient.Manual analysis and detection of code smells in JavaScript is time consuming,tedious, and error-prone in large code bases. Therefore, automated techniques areneeded to support web developers in maintaining their code. Given the challengingcharacteristics of JavaScript, our goal in this work is to propose a technique that canhandle the highly dynamic nature of the language to detect potential code smellseffectively.6.3 Related WorkFowler and Beck [85] proposed 22 code smells in object-oriented languages andassociated each of them with a possible refactoring. Although code smells inobject-oriented languages have been extensively studied in the past, current workon smell detection for JavaScript code is scarce [137]. In this work we studya list of code smells in JavaScript, and propose an automated detection tech-nique. The list of proposed JavaScript code smells in this chapter is based on astudy of various discussions in online development forums and JavaScript books[86, 136, 137, 145, 147].Many tools and techniques have been proposed to detect code smells automat-132ically in Java and C++ such as Checkstyle [3], Decor [133], and JDeodorant [174].A common heuristic-based approach to code smell detection is the use of codemetrics and user defined thresholds [76, 108, 133, 135, 162]. Similarly we adopta metric-based smell detection strategy. Our approach is different from such tech-niques in the sense that due to the dynamic nature of JavaScript, we propose a codesmell technique that combines static with dynamic analysis.Cilla [118] is a tool that similar to our work applies dynamic analysis but fordetecting unused CSS code in relation to dynamically mutated DOM elements. Acase study conducted with Cilla revealed that over 60% of CSS rules are unused inreal-world deployed web applications, and eliminating them could vastly improvethe size and maintainability of the code.A more closely related tool is WebScent [137], which detects client-side smellsthat exist in embedded code within scattered server-side code. Such smells can notbe easily detected until the client-side code is generated. After detecting smells inthe generated client-side code, WebScent locates the smells in the correspondinglocation in the server-side code. WebScent primarily identifies mixing of HTML,CSS, and JavaScript, duplicate code in JavaScript, and HTML syntax errors. Ourtool, JSNOSE, can similarly identify JavaScript code smells generated via server-side code. However, we propose and target a larger set of JavaScript code smellsand unlike the manual navigation of the application in WebScent, we apply au-tomated dynamic exploration using a crawler. Another advantage of JSNOSE isthat it can infer dynamic creation/change of objects, properties, and functions atruntime, which WebScent does not support.A number of industrial tools exist that aim at assisting web developers withmaintaining their code. For instance, WARI [17] examines dependencies betweenJavaScript functions, CSS styles, HTML tags and images. The goal is to stati-cally find unused images as well as unused and duplicated JavaScript functionsand CSS styles. Because of the dynamic nature of JavaScript, WARI cannot guar-antee the correctness of the results. JSLint [9] is a static code analysis tool writtenin JavaScript that validates JavaScript code against a set of good coding practices.The code inspection tends to focus on improving code quality from a technical per-spective. The Google Closure Compiler [27] is a JavaScript optimizer that rewritesJavaScript code to make it faster and more compact. It helps to reduce the size of133JavaScript code by removing comments and unreachable code.6.4 JavaScript Code SmellsIn this section, we propose a list of code smells for JavaScript-based applications.Of course a list of code smells can never be complete as the domain and projectsthat the code is used in may vary. Moreover, code smells are generally subjectiveand imprecise, i.e., they are based on opinions and experiences [178]. To mitigatethis subjective nature of code smells, we have collected these smells by studyingvarious online development resources [10, 86, 136, 145, 147, 163] and books [77,195, 196] that discuss bad JavaScript coding patterns.In total, we consider 13 code smells in JavaScript. Although JavaScript hasits own specific code smells, most of the generic code smells for object-orientedlanguages [85, 104] can be adapted to JavaScript as well. Since JavaScript is aclass-free language and objects are defined directly, we use the notion of “object”instead of “class” for these generic smells. The generic smells include the follow-ing 7: Empty catch blocks (poor understanding of logic in the try), Large object(too many responsibilities), Lazy object (does too little), Long functions (inade-quate decomposition), Long parameter list (need for an object), Switch statements(duplicated code and high complexity), and Unused/dead code (never executed orunreachable code).In addition to the generic smells, we propose 6 types of JavaScript code smellsin this section as follows.6.4.1 Closure SmellsIn JavaScript, it is possible to declare nested functions, called closures. Clo-sures make it possible to emulate object-oriented notions such as public,private, and privileged members. Inner functions have access to the pa-rameters and variables — except for this and argument variables — of thefunctions they are nested in, even after the outer function has returned [77]. Weconsider four smells related to the concept of function closures in JavaScript.Long scope chaining. Functions can be multiply-nested, thus closures can havemultiple scopes. This is called “scope chaining” [10], where inner functions have134access to the scope of the functions containing them. An example is the followingcode:1 function foo(x) {2 var tmp = 3;3 function bar(y) {4 ++tmp;5 function baz(z) {6 document.write(x + y + z + tmp);7 }8 baz(3);9 }10 bar(10);11 }12 foo(2); // writes 19 i.e., 2+10+3+4This nested function style of programming is useful to emulate privacy, how-ever, using too many levels of nested closures over-complicates the code, makingit hard to comprehend and maintain. Moreover, identifier resolution performanceis directly related to the number of objects to search in the scope chain [195]. Thefarther up in the scope chain an identifier exists, the longer the search goes on andthe longer time it takes to access that variable.Closures in loops. Inner functions have access to the actual variables of theirouter functions and not their copies. Therefore, creating functions within a loopcan cause confusion and be wasteful computationally [77]. Consider the followingexample:1 var addTheHandler = function (nodes) {2 for (i = 0; i < nodes.length; i++) {3 nodes[i].onclick = function (e) {4 document.write(i);5 };6 }7 };8 addTheHandler(document.getElementsByTagName("div"));Assume that there are three div DOM elements present. If the developer’sactual intention is to display the ordinal of the div nodes when a node is clicked,then the result will not be what she expected as the length of nodes, 3, will bereturned instead of the node’s ordinal position. In this example, the value of i inthe document.write function is assigned when the for loop is finished and135the inner anonymous function is created. Therefore, the variable i has the valueof nodes.length, which is 3 in this case. To avoid this confusion and potentialmistake, the developer can use a helper function outside of the loop that will delivera function binding to the current local value of i.Variable name conflict in closures. When two variables in the scopes of a closurehave the same name, there is a name conflict. In case of such conflicts, the innerscope takes precedence. Consider the following example [10]:1 function outside() {2 var a = 10;3 function inside(a) {4 return a;5 }6 return inside;7 }8 result = outside()(20); \\ result: 20In this example, there is a name conflict between the variable a in outside()and the function parameter a in inside() that takes precedence. We considerthis a code smell as it makes it difficult to comprehend the actual intended valueassignment. Due to scope chain precedence, the value of result is now 20.However, it is not evident from the code whether this is the intended result (orperhaps 10).Moreover, dynamic typing in JavaScript makes it possible to reuse the samevariable for different types at runtime. Similar to the variable name conflict issue,this style of programming reduces readability and in turn maintainability. Thus,declaring a new variable with a dedicated unique name is the recommended refac-toring. This issue is not restricted to closures, or to nested functions.Accessing the this reference in closures. Due to the design of JavaScript lan-guage, when an inner function in a closure is invoked, this becomes boundedto the global object and not to the this variable of the outer function [77]. Weconsider the usage of this in closures a code smell as it is a potential symptomfor mistakes. As a refactoring workaround, the developer can assign the value ofthe this variable of the outer function to a new variable that and then use thatin the inner function [77].1366.4.2 Coupling between JavaScript, HTML, and CSSIn web applications, HTML is meant for presenting content and structure, CSSfor styling, and JavaScript for functional behaviour. Keeping these three enti-ties separate is a well-known programming practice, known as separation of con-cerns. Unfortunately, web developers often mix JavaScript code with markup andstyling code [137], which adversely influences program comprehension, mainte-nance and debugging efforts in web applications. We categorize the tight couplingof JavaScript with HTML and CSS code into the following three code smells types:JavaScript in HTML. One common way to register an event listener in web ap-plications is via inline assignment in the HTML code. We consider this inlineassignment of event handlers a code smell as it tightly couples the HTML code tothe JavaScript code. An example of such a coupling is shown below:1 <button onclick="foo();" id="myBtn"/>This smell can be refactored by removing the onclick attribute from thebutton in HTML and using the addEventListener function of DOM Level 2[180] to assign the event handler through JavaScript:1 <button id="myBtn"/>3 function foo() {4 // code5 }6 var btn = document.getElementById("myBtn");7 btn.addEventListener("click", foo, false);This could be further refactored using the jQuery library as$("#myBtn").on("click", foo);.Note that JavaScript code within the <script> tag in HTML code can beseen as a code smell [137]. We do not consider this a code smell as it does not affectcomprehension nor maintainability, although separating the code to a JavaScriptfile is preferable.HTML in JavaScript. Extensive DOM API calls and embedded HTML stringsin JavaScript complicate debugging and software evolution. In addition, editingmarkup is believed to be less error prone than editing JavaScript code [196]. Thefollowing code is an example of embedded HTML in JavaScript [86]:1371 // add book to the list2 var book = doc.createElement("li");3 var title = doc.createElement("strong");4 titletext = doc.createTextNode(name);5 title.appendChild(titletext);6 var cover = doc.createElement("img");7 cover.src = url;8 book.appendChild(cover);9 book.appendChild(title);10 bookList.appendChild(book);To refactor this code smell, we can move the HTML code to a template(book tpl.html):1 <li><strong>TITLE</strong><img src="COVER"/></li>The JavaScript code would then be refactored as:1 var tpl = loadTemplate("book_tpl.html");2 var book = tpl.substitute({TITLE: name, COVER: url});3 bookList.appendChild(book);Another example this smell is using long strings of HTML in jQuery functioncalls [147]:1 $('\#news')2 .append('<div class="gall"><a href="javascript:void(0)">Linky</a></←↩div>')3 .append('<button onclick="app.doStuff()">Button</button>');CSS in JavaScript. Setting the presentation style of DOM elements by assign-ing their style properties in JavaScript is a code smell [86]. Keeping stylingcode inside JavaScript is asking for maintenance problems. Consider the followingexample:1 div.onclick = function(e) {2 var clicked = this;3 = "1px solid blue";4 }The best way to change the style of an element in JavaScript is by manipulatingCSS classes properly defined in CSS files [86, 196]. The above code smell can berefactored as follows:1381 \\ CSS file:2 .selected{border: 1px solid blue;}3 \\ JavaScript:4 div.onclick = function(e) {5 this.setAttribute("class","selected");6 }6.4.3 Excessive Global VariablesGlobal variables are accessible from anywhere in JavaScript code, even when de-fined in different files loaded on the same page. As such, naming conflicts betweenglobal variables in different JavaScript source files is common, which affects pro-gram dependability and correctness. The higher the number of global variables inthe code, the more dependent existing modules are likely to be; and dependencyincreases error-proneness, and maintainability efforts [145]. Therefore, we see theexcessive use of global variables as a code smell in JavaScript. One way to mitigatethis issue is to create a single global object for the whole application that containsall the global variables as its properties [77]. Grouping related global variables intoobjects is another remedy.6.4.4 Long Message ChainLong chaining of functions with the dot operator can result in complex controlflows that are hard to comprehend. This style of programming happens frequentlywhen using the jQuery library. One extreme example is shown bellow [147]:1 $('a').addClass('reg-link').find('span').addClass('inner').end().find('←↩div').mouseenter(mouseEnterHandler).mouseleave(mouseLeaveHandler).←↩end().explode();Long chains are unreadable specially when a large amount of DOM traversingis taking place [147]. Another instance of this code smell is too much cascading.Similar to object-oriented languages such as Java, in JavaScript many methodscalls can be cascaded on the same object sequentially within a single statement.This is possible when the methods return the this object. Cascading can help toproduce expressive interfaces that perform much work at once. However, the code139written this way tends to be harder to follow and maintain. The following exampleis borrowed from [77]:1 getElement('myBoxDiv').move(350, 150).width(100).height(100).color('red←↩').border('10px outset').padding('4px').appendText("Please stand ←↩by").on('mousedown', function (m) {2 this.startDrag(m, this.getNinth(m));}).on('mousemove', 'drag').on('←↩mouseup', 'stopDrag').tip("This box is resizeable");A possible refactoring to shorten the message chain is to break the chain intomore general methods/properties for that object which incorporate longer chains.6.4.5 Nested CallbackA callback is a function passed as an argument to another (parent) function. Call-backs are executed after the parent function has completed its execution. Callbackfunctions are typically used in asynchronous calls such as timeouts and XML-HttpRequests (XHRs). Using excessive callbacks, however, can result in hard toread and maintain code due to their nested anonymous (and usually asynchronous)nature. An example of a nested callback is given below [163]:1 setTimeout(function () {2 xhr("/greeting/", function (greeting) {3 xhr("/who/?greeting=" + greeting, function (who) {4 document.write(greeting + " " + who);5 });6 });7 }, 1000);A possible refactoring to nested callbacks is to split the functions and pass areference to another function [19]. The above code can be rewritten as below:1 setTimeout(foo,1000);2 function foo() {3 xhr("/greeting/", bar);4 }5 function bar(greeting) {6 xhr("/who/?greeting=" + greeting, baz);7 }8 function baz(who) {9 document.write(greeting + " " + who);10 }1406.4.6 Refused BequestJavaScript is a class-free prototypal inheritance language, i.e., an object can inheritproperties from another object, called a prototype object. A JavaScript object thatdoes not use/override many of the properties it inherits from its prototype object isan instance of a refused bequest [85] soft code smell. In the following example, thestudent object inherits from its prototype parent person. However, studentonly uses one of the five properties inherited from person, namely fname.1 var person={fname:"John", lname:"Smith", gender:"male", age:28, ←↩location:"Vancouver"};2 var student = Object.create(person);3 ...4 = "UBC";5 document.write(student.fname + " studies at " +;A simple refactoring, similar to the push down field/method proposed byFowler [85], could be to eliminate the inheritance altogether and add the requiredproperty (fname) of the prototype to the object that refused the bequest.6.5 Smell Detection MechanismIn this section, we present our JavaScript code smell detection mechanism, whichis capable of detecting the code smells discussed in the previous section.A common heuristic-based approach to detect code smells is the use of sourcecode metrics and thresholds [76, 108, 133, 135, 162]. In this work, we adopt asimilar metric-based approach to identify smelly sections of JavaScript code.In order to calculate the metrics, we first need to extract objects, functions, andtheir relationships from the source code. Due to the dynamic nature of JavaScript,static code analysis alone will not suffice, as discussed in Section 6.2. Therefore, inaddition to static code analysis, we also employ dynamic analysis to monitor andinfer information about objects and their relations at runtime.Figure 6.1 depicts an overview of our approach. At a high level, (1) the con-figuration, containing the defined metrics and thresholds, is fed into the code smelldetector. We automatically (2) intercept the JavaScript code of a given web ap-plication, by setting up a proxy between the server and the browser, (3) extractJavaScript code from all .js and HTML files, (4) parse the source code into an141DOM-based FixtureJavaScript Code(1)Instrument Code(2)Execute Function(4)Deduce DOM-dependant PCsExecution Trace(3)Collect Execution Trace(9)Generate Test caseGenerated Unit Tests(7)Generate Fixture(8)Apply Fixture to Cover a New PathInstrumented Code(6)Solve XPath expressions (5)Translate PCs to XPath expressionsSolved XML treeFigure 6.1: Processing view of JSNOSE, our JavaScript code smell detector.Abstract Syntax Tree (AST) and analyze it by traversing the tree. During the ASTtraversal, the analyzer visits all program entities, objects, properties, functions, andcode blocks, and stores their structure and relations. At the same time, we (2) in-strument the code to monitor statement coverage, which is used for unused/deadcode smell detection. Next, we (7) navigate the instrumented application in thebrowser to produce an execution trace, through an automated dynamic crawler,and (8) collect and use execution traces to calculate code coverage. We (5) ex-tract patterns from the AST such as names of objects and functions, and (6) inferJavaScript objects, their types, and properties dynamically by querying the browserat runtime. Finally, (9) based on all the static and dynamic data collected, we detectcode smells (10) using the metrics.142Table 6.1: Metric-based criteria for JavaScript code smell detection.Code smell Level Detection method Detection criteria MetricClosure smell Function Static & Dynamic LSC > 3 LSC: Length of scope chainCoupling JS/HTML/CSS File Static & Dynamic JSC > 1 JSC: JavaScript coupling instanceEmpty catch Code block Static LOC(catchBlock) = 0 LOC: Lines of codeExcessive global variables Code block Static & Dynamic GLB > 10 GLB: Number of global variablesLarge object Object Static & Dynamic [185]: LOC(ob j)> 750 or NOP > 20 NOP: Number of propertiesLazy object Object Static & Dynamic NOP < 3 NOP: Number of propertiesLong message chain Code block Static LMC > 3 LMC: Length of message chainLong method/function Function Static & Dynamic [108, 185]: MLOC > 50 MLOC: Method lines of codeLong parameter list Function Static & Dynamic [185]: PAR > 5 PAR: Number of parametersNested callback Function Static & Dynamic CBD > 3 CBD: Callback depthRefused bequest Object Static & Dynamic [108]: BUR < 13 and NOP > 2 BUR: Base-object usage ratioSwitch statement Code block Static NOC > 3 NOC: Number of casesUnused/dead code Code block Static & Dynamic EXEC = 0 EXEC: Execution countor RCH = 0 RCH: Reachability of code1436.5.1 Metrics and Criteria Used for Smell DetectionTable 6.1 presents the metrics and criteria we use in our approach to detect codesmells in JavaScript applications. Some of these metrics and their correspond-ing thresholds have been proposed and used for detecting code smells in object-oriented languages [76, 108, 133, 135, 162]. In addition, we propose new metricsand criteria to capture the characteristics of JavaScript code smells discussed inSection 6.4.Closure smell. We identify long scope chaining and accessing this in closures.If the length of scope chain (LSC) is greater than 3, or if this is used in an innerfunction closure, we report it as a closure smell instance.Coupling JS/HTML/CSS. We count the number of occurrences of JavaScriptwithin HTML tags, and CSS in JavaScript as described in Section 6.4.2. Our toolreports all such JavaScript coupling instances as code smell.Empty catch. Detecting empty catches is straightforward in that the number oflines of code (LOC) in the catch block should be zero.Excessive global variables. We extract global variables in JavaScript, which canbe defined in three ways: (1) using a var statement outside of any function, suchas var x = value;, (2) adding a property to the window global object, i.e.,the container of all global variables, such as = value;, and (3)using a variable without declaring it by var. If the number of global variables(GLB) exceeds 10, we consider it as a code smell.Large/Lazy object. An object that is doing too much or not doing enough workshould be refactored. Large objects may be restructured or broken into smallerobjects, and lazy objects maybe collapsed or combined into other classes. If anobject’s lines of code is greater than 750 or the number of its methods is greaterthan 20, it is identified as a large object [185]. We consider an object lazy, if thenumber of its properties (NOP) is less than 3.Long message chain. If the length of a message chain (LMC), i.e., the number ofitems being chained by dots as explained in Section 6.4.4, in a statement is greaterthan 3, we consider it a long message chain and report it as a smell.Long method/function. A method with more than 50 lines of code (MLOC) isidentified as a long method smell [108, 185].144Long parameter list. We consider a parameter list long when the number of pa-rameters (PAR) exceeds 5 [185].Nested callback. We identify nested functions that pass a function type as anargument. If the callback depth (CBD) exceeds 3, we report it as a smell.Refused bequest. If an object uses or specializes less than a third of its parentprototype, i.e., base-object usage ratio (BUR) is less than 13 , it is considered asrefused parent bequest [108]. Further, the number of methods and the cyclomaticcomplexity of the child object should be above average since simple and smallobjects may unintentionally refuse a bequest. In our work, we slightly change thiscriteria to the constraint of NOP>2, i.e., not to be a lazy small object.Switch statement. The problem with switch statements is duplicated code. Typi-cally, similar switch statements are scattered throughout a program. If one adds orremoves a clause in one switch, often has to find and repair the others too [85, 104].When the number of switch cases (NOC) is more than three, it is considered as acode smell. This can also be applied to if-then-else statements with morethan three branches.Unused/dead code. Unused/dead code has negative effects on maintainability asit makes the code unnecessarily more difficult to understand [112, 145]. Unlikelanguages such as Java, due to the dynamic nature of JavaScript it is quite chal-lenging to reason about dead JavaScript code statically. Hence, if the executioncount (EXEC) of an statement remains 0 after executing the web application, wereport it as a candidate unused/dead code. Reachability of code (RCH) is anothermetric we use to identify unreachable code.6.5.2 Combining Static and Dynamic AnalysisAlgorithm 5 presents our smell detection method. The algorithm is generic in thesense that the metric-based static and dynamic smell detection procedures can bedefined and used according to any smell detection criteria. Given a JavaScriptapplication A, a maximum crawling time t, and a set of code smell criteria τ , thealgorithm generates a set of code smells CS.The algorithm starts by looking for inline JavaScript code embedded in HTML(line 3). All JavaScript code is then extracted from JavaScript files and HTML145<script> tags (line 4). An AST of the extracted code is then generated us-ing a parser (line 5). This AST is traversed recursively (lines 6, 19-21) to detectcode smells using a static analyzer. Next the AST is instrumented (line 7) andtransformed back to the corresponding JavaScript source code and passed to thebrowser (lines 8). The crawler then navigates (line 9-16) the application, and po-tential code smells are explored dynamically (line 9 14). After the termination ofthe exploration process, unused code is identified based on the execution trace andadded to the list of code smells (line 17), and the resulting list of smells is returned(line 18).Next, we present the relevant static and dynamic smell detection processes indetail.Static Analysis. The static code analysis (Line 19) involves analyzing the AST bytraversing the tree. During this step, we extract CSS style usage, objects, proper-ties, inheritance relations, functions, and code blocks to calculate the smell metrics.If the calculated metrics violate the given criteria (τ), the smell is returned.There are different ways to create objects in JavaScript. In this work,we only consider two main standard forms of using object literals, namely,through (1) the new keyword, and (2) Object.create(). To detect theprototype of an object, we consider both the non-standard form of using theproto property assignment, and the more general constructor functionsthrough Object.create().In order to detect unreachable code, we search the AST nodes for return,break, continue, and throw statements. Whatever a statement is found rightafter these statements that is on the same node level in the AST, we mark it aspotential unreachable code.Dynamic Analysis. Dynamic analysis (Line 14) is performed for two reasons:1. To calculate many of the metrics in Table 6.1, we need to monitor the cre-ation/update of functions, objects, and their properties at runtime. To thatend, a combination of static and dynamic analysis should be applied. Thedynamic analysis is performed by executing a piece of JavaScript code inthe browser, which enables retrieving a list of all global variables, objects,and functions (own properties of the window object) and dynamically de-146Algorithm 5: JavaScript Code Smell Detectioninput : A JavaScript application A, the maximum exploration time t, the set of smellmetric criteria τoutput: The list of JavaScript code smells CS1 CS←∅Procedure EXPLORE() begin2 while TIMELEFT(t) do3 CS←CS∪DETECTINLINEJSINHTML(τ)4 code← EXTRACTJAVASCRIPT(A)5 AST ← PARSTOAST(code)6 VISITNODE(AST.root)7 ASTinst← INSTRUMENT(AST)8 INJECTJAVASCRIPTCODE(A,ASTinst)9 C← EXTRACTCLICKABLES(A)10 for c ∈C do11 dom← browser.GETDOM()12 robot.FIREEVENT(c)13 new dom← browser.GETDOM()14 CS←CS∪DETECTDYNAMICALLY(τ)15 if dom.HASCHANGED(new dom) then16 EXPLORE(A)endendend17 CS←CS∪DETECTUNUSEDCODE()18 return CSendProcedure VISITNODE(ASTNode) begin19 CS←CS∪DETECTSTATICALLY(node,τ)20 for node ∈ AST Node.getChildren() do21 VISITNODE(node)endendtecting prototypes of objects (using getPrototypeOf() on each object).However, local objects in functions are not accessible via JavaScript codeexecution in the global scope. Therefore, we use static analysis and extractthe required information from the parsed AST. The objects, functions, andproperties information gathered this way is then fed to the smell detectorprocess.2. To detect unused/dead code we need to collect execution traces for measur-ing code coverage. Therefore, we instrument the code and record which parts147of it are invoked by exploring the application through automated crawling.However, this dynamic analysis can give false positives for non-executed,but reachable code. This is a limitation of any dynamic analysis approachsince there is no guarantee of completeness (such as code coverage).Note that our approach merely reports candidate code smells and the decisionwill always be upon developers whether or not to refactor the code smells.6.5.3 ImplementationWe have implemented our approach in a tool called JSNOSE, which is publiclyavailable [34]. JSNOSE operates automatically, does not modify the web browser,is independent of the server technology, and requires no extra effort from the user.We use the WebScarab proxy to intercept the JavaScript/HTML code. To parsethe JavaScript code to an AST and instrument the code, we use Mozilla Rhino[41]. To automatically explore and dynamically crawl the web application, weuse CRAWLJAX [120]. The output of JSNOSE is a text file that lists all detectedJavaScript code smells with their corresponding line numbers in a JavaScript fileor an HTML page.6.6 Empirical EvaluationWe have conducted an empirical study to evaluate the effectiveness and real-worldrelevance of JSNOSE. Our study is designed to address the following researchquestions:RQ1: How effective is JSNOSE in detecting JavaScript code smells?RQ2: Which code smells are more prevalent in web applications?RQ3: Is there a correlation between JavaScript code smells and source code met-rics?Our experimental data along with the implementation of JSNOSE are availablefor download [34].148Table 6.2: Experimental JavaScript-based objects.ID Name #JSfilesJSLOC#FunctionsAverageCCAverageMIDescription1 PeriodicTable [13] 1 71 9 12 116 A periodic table of the elements2 CollegeVis [4] 1 177 30 11 119 A visualization tool3 ChessGame [20] 2 198 15 102 105 A simple chess game4 Symbolistic [15] 1 203 20 28 109 A simple game5 Tunnel [16] 0 234 32 29 116 A simple game6 GhostBusters [7] 0 278 26 45 97 A simple game7 TuduList [51] 4 782 89 106 94 A task manager (J2EE and MySQL)8 FractalViewer [26] 8 1245 125 35 116 A fractal zoomer9 PhotoGallery [39] 5 1535 102 53 102 A photo gallery (PHP without MySQL)10 TinySiteCMS [49] 13 2496 462 54 115 A CMS (PHP without MySQL)11 TinyMCE [50] 174 26908 4455 67 101 A WYSIWYG editor6.6.1 Experimental ObjectsWe selected 11 web applications that make extensive use of client-side JavaScript,and fall under different application domains. The experimental objects along withtheir source code metrics are shown in Table 6.2. In the calculation of these sourcecode metrics, we included inline HTML JavaScript code, and excluded blank lines,comments, and common JavaScript libraries such as jQuery, DWR, Scriptaculous,Prototype, and google-analytics. Note that we also exclude these libraries in theinstrumentation step. We use CLOC [22] to count the JavaScript lines of code(JS LOC). Number of functions (including anonymous functions), ad cyclomaticcomplexity (CC) are all calculated using complexityReport.js [23]. The reportedCC is across all JavaScript functions in each application.6.6.2 Experimental SetupWe confine the dynamic crawling time for each application to 10 minutes, which isacceptable in a maintenance environment. Of course, the more time we designatefor exploring the application, the higher statement coverage we may get and thusmore accurate the detection of unused/dead code. For the crawling configuration,we set no limits on the crawling depth nor the maximum number of DOM statesto be discovered. The criteria for code smell metrics are configured according to149those presented in Table 6.1.To evaluate the effectiveness of JSNOSE (RQ1), we validate the produced re-sults by JSNOSE against manual code inspection. Similar to [133], we measureprecision and recall as follows:Precision is the rate of true smells identified among the detected smells: T PT P+FPRecall is the rate of true smells identified among the existing smells: T PT P+FNwhere TP (true positives), FP (false positives), and FN (false negatives) respec-tively represent the number of correctly detected smells, falsely detected smells,and missed smells. To count TP, FP, and FN in a timely fashion while preservingaccuracy, we only consider the first 9 applications since the last 2 applications haverelatively larger code bases. In our manual validation process, we also considerruntime created/modified objects and functions that are inferred during JSNOSEdynamic analysis. It is worth mentioning that this manual process is a labour in-tensive task, which took approximately 6.5 hours for the 9 applications.Note that the precision-recall values for detecting unused/dead code smell iscalculated considering only “unreachable” code, which is code after an uncondi-tional return statement. This is due to the fact that the accuracy of dead codedetection depends on the running time and dynamic exploration strategy.To measure the prevalence of JavaScript code smells (RQ2), we ran JSNOSEon all the 11 web applications and counted each smell instance.To evaluate the correlation between the number of smells and applicationsource code metrics (RQ3), we use R14 to calculate the non-parametric Spearmancorrelation coefficients as well as the p-values. The Spearman correlation coeffi-cient does not require the data to be normally distributed [103].14http://www.r-project.org150Table 6.3: Precision-recall analysis (based on the first 9 applications), and detected code smell statistics (for all 11 applications).S1.ClosuresmellsS2.CouplingJS/HTML/CSSS3.EmptycatchNumberofglobalvariablesS4.ExcessiveglobalvariablesS5.LargeobjectS6.LazyobjectS7.LongmessagechainS8.Longmethod/functionS9.LongparameterlistS10.NestedcallbackS11.RefusedbequestS12.SwitchstatementS13.UnreachablecodeS13.Unused/deadcodeNumberofsmellinstancesNumberoftypesofsmellsTPtotal 19 171 16 200 - 14 391 87 25 12 1 13 10 0 n/a 959 n/aFPtotal 0 0 0 0 - 4 73 0 0 0 0 6 0 0 n/a 83 n/aFNtotal 6 0 0 0 - 1 2 8 0 0 0 0 0 0 n/a 17 n/aPrecisiontotal 100% 100% 100% 100% - 78% 85% 100% 100% 100% 100% 68% 100% n/a n/a 92% n/aRecalltotal 76% 100% 100% 100% - 94% 99% 92% 100% 100% 100% 100% 100% n/a n/a 98% n/aPeriodicTable 1 2 0 6 - 4 10 0 0 0 0 0 0 0 28% 23 4CollegeVis 1 0 0 17 + 0 32 0 2 0 1 0 0 0 22% 53 5ChessGame 0 7 0 39 + 3 9 4 0 2 0 0 0 0 36% 64 6Symbolistic 0 0 0 4 - 0 17 0 1 0 0 1 0 0 20% 23 3Tunnel 9 0 0 15 + 0 28 0 2 0 0 0 0 0 44% 54 4GhostBusters 2 0 0 4 - 0 38 0 2 3 0 0 0 0 45% 49 4TuduList 6 47 0 45 + 6 138 78 12 2 0 2 7 0 65% 343 10FractalViewer 0 16 0 40 + 7 117 4 5 5 0 16 2 0 36% 212 9PhotoGallery 0 99 16 30 + 0 73 1 1 0 0 0 1 0 64% 221 7TinySiteCMS 2 7 0 82 + 3 13 4 3 58 0 0 0 0 22% 172 8TinyMCE 3 3 1 4 - 5 23 4 0 2 1 3 3 0 63% 52 10Average 2.2 16.5 1.5 26 + 2.5 45.2 8.6 2.6 6.5 0.2 2 1.2 0 40% 115 6.4#Smelly apps 7 7 2 n/a 7 6 11 6 8 6 2 4 4 0 n/a n/a n/a%Smelly apps 64% 64% 18% n/a 64% 55% 100% 55% 73% 55% 18% 36% 36% 0% n/a n/a n/a1516.6.3 ResultsEffectiveness (RQ1). We report the precision and recall in the first 5 rows of Table6.3. The reported TPtotal , FPtotal , and FNtotal , are the sum of TP, FP, and FN valuesfor the first 9 applications. Our results show that JSNOSE has an overall precisionof 93% and an average recall of 98% in detecting the JavaScript code smells, whichpoints to its effectiveness.We observed that most false positives detected are related to large/lazy objectsand refused bequest, which are primitive variables, object properties, and methodsin jQuery. This is due to the diverse coding styles and different techniques in objectmanipulations in JavaScript, such as creating and initializing arrays of objects.There were a few false negatives in closure smells and long message chain, whichare due to the permissive nature of jQuery syntax, complex chains of methods,array elements, as well as jQuery objects created via $() function.Code smell prevalence (RQ2). Table 6.3 shows the frequency of code smells ineach of the experimental objects. The results show that among the JavaScript codesmells detected by JSNOSE, lazy object, long method/function, closure smells,coupling JS/HTML/CSS, and excessive global variables, are the most prevalentsmells (appeared in 100%-64% of the experimental objects).Tunnel and TuduList use many instances of this in closures. Major cou-pling smells in TuduList and PhotoGallary are with the use of CSS in JavaScript.Refused bequest are most observed in FractalViewer in objects inheriting from ge-ometry objects. The high percentage of unused/dead code reported for TuduList,PhotoGallary, and TinyMCE is in fact not due to dead code per se, but is mainlyrelated to the existence of admin pages and parts of the code which require precisedata inputs that were not provided during the crawling process. On the other hand,TinyMCE has a huge number of possible actions and features that could not beexercised in the designated time of 10 minutes.Correlations (RQ3). Table 6.4 shows the Spearman correlation coefficients be-tween the source code metrics and the total number of smell instances/types. Theresults show that there exists a strong and significant positive correlation betweenthe types of smells and LOC, number of functions, number of JavaScript files, andcyclomatic complexity. A weak correlation is also observed between the number152Table 6.4: Spearman correlation coefficients between number of code smells and code quality metrics.Total number of Total number ofMetric smell instances types of smellsLines of code (r = 0.53, p = 0.05) (r = 0.70, p = 0.01)# Functions (r = 0.57, p = 0.03) (r = 0.76, p = 0.00)# JavaScript files (r = 0.53, p = 0.05) (r = 0.85, p = 0.00)Cyclomatic complexity (r = 0.63, p = 0.02) (r = 0.70, p = 0.01)of smell instances and the aforementioned source code metrics.6.6.4 DiscussionHere, we discuss some of the limitations and threats to validity of our results.Implementation Limitations. The current implementation of JSNOSE is not ableto detect all various ways of object creation in JavaScript. Also it does not dealwith various syntax styles of frameworks such as jQuery. For the dynamic analysispart, JSNOSE is dependent on the crawling strategy and execution time, which mayaffect the accuracy if certain JavaScript files are never loaded in the browser duringthe execution since the state space of web applications is typically huge. SinceJSNOSE is using Rhino to parse the JavaScript code and generate the AST, if thereexists a syntax error in a JavaScript file, the code in that file will not be parsed to anAST and thus any potential code smells within that file will be missed. Note thatthese are all implementation issues and not related to the design of our approach.Threats to Validity. A threat to the external validity of our evaluation is with re-gard to the generalization of the results to other web applications. We acknowledgethat more web applications should be evaluated to support the conclusions. To mit-igate this threat we selected our experimental objects from different applicationdomains, which exhibit variations in design, size, and functionality.One threat to the internal validity of our study is related to the metrics andcriteria we proposed in Table 6.1. However, we believe these metrics can effec-tively identify code smells described in Section 6.4. The designated 10 minutestime for crawling could also be increased to get more accurate results, however,we believe that in most maintenance environments this is acceptable consideringfrequent code releases. The validation and accuracy analysis performed by manual153inspection can be incomplete and inaccurate. We mitigated this threat by focusingon the applications with smaller sets of code smells so that manual comparisoncould be conducted accurately.With respect to reliability of our evaluation, JSNOSE and all the web-basedsystems are publicly available, making the results reproducible.6.7 ConclusionsIn this work, we discussed a set of 13 JavaScript code smells and presented ametric-based smell detection technique, which combines static and dynamic anal-ysis of the client-side code. Our approach, implemented in a tool called JSNOSE,can be used by web developers during development and maintenance cycles to spotpotential code smells in JavaScript-based applications. The detected smells can berefactored to improve their code quality.Our empirical evaluation shows that JSNOSE is effective in detectingJavaScript code smells; Our results indicate that lazy object, long method/func-tion, closure smells, coupling between JavaScript, HTML, and CSS, and exces-sive global variables are the most prevalent code smells. Further, our study in-dicates that there exists a strong and significant positive correlation between thetypes of smells and LOC, cyclomatic complexity, and the number of functions andJavaScript files.154Chapter 7ConclusionsWeb applications are often written in multiple languages such as JavaScript,HTML, and CSS. JavaScript is extensively used to build responsive modern webapplications. The event-driven and dynamic nature of JavaScript, and its interac-tion with the DOM, make it challenging to understand and test effectively. Thework presented in this dissertation has focused on proposing new techniques toimprove the quality of web and in particular JavaScript-based applications throughautomated testing and maintenance.7.1 Revisiting Research Questions and Future DirectionsIn the beginning of this thesis, we designed five research questions towards the goalof improving the quality of web applications. We believe that we have addressedthe research questions with our contributions.RQ1.4.1. How can we effectively derive test models for web applications?Chapter 2. We proposed four aspects of a test model that can be derived by crawl-ing, including functionality coverage, navigational coverage, page structural cov-erage, and size of the test model. We also presented a feedback-directed explorationapproach, called FEEDEX, to guide the exploration towards achieving higher func-tionality, navigational, and page structural coverage while reducing the test modelsize [123]. Results of our experiments show that FEEDEX is capable of generat-ing test models that are enhanced in all aspects compared to traditional exhaustivemethods (DFS, BFS, and random).Future directions. There are number of possible future work based on the pre-155sented technique in Chapter 2. In this work we only measured the client-sidecoverage. It would be interesting to include the server-side code coverage in thefeedback loop as well to guide the exploration. Also our state expansion method iscurrently based on a memory-less greedy algorithm, which could leverage machinelearning techniques to improve its effectiveness. We do not claim that the proposedstate expansion heuristic and the scoring function is the best solution. Other heuris-tics, such as selecting states with the least coverage improvement in state expansionprocess, and other combinations for scoring function can be evaluated.RQ1.4.2. Can we utilize the knowledge in existing UI tests to generate newtests?Chapter 3. We proposed a technique [126] and a tool, called TESTILIZER [47], togenerate DOM-based UI test cases using existing tests. This work is motivated bythe fact that a human-written test suite is a valuable source of domain knowledge,which can be used to tackle some of the challenges in automated web applicationtest generation. TESTILIZER mines the existing UI test suite (such as SELENIUM)to infer a model of the covered DOM states and event-based transitions includinginput values and assertions. It then expands the inferred model by exploring alter-native paths and generates assertions for the new states. Finally it generates a newtest suite from the extended model.Our results supports that leveraging input values and assertions from human-written test suites can be helpful in generating more effective test cases.TESTILIZER easily outperforms a random test generation technique, provides sub-stantial improvements in the fault detection rate compared with the original testsuite, while slightly increasing code coverage too.Future directions. TESTILIZER is limited to applications that already havehuman-written tests, which may not be so prevalent in practice. On the other hand,many web applications are similar to each other in terms of design and code base,such as being built on top of the same content management system. We propose anopen research problem whether human-written tests can be leveraged to generateeffective tests for applications without existing tests. This is, however, challengingparticularly for assertion generation based on learned patterns. DOM-based asser-tions on abstract DOM states of an application may require some changes to be156applied on similar abstract DOM state of another application. This also requiresdefining similarity measures between applications for testing purposes.Another possible extension of this work is the test generation approach giventhe extended SFG with newly generated assertions. The current graph traversalmethod in TESTILIZER may produce test cases that share common paths, whichdo not contribute much to fault detection or code coverage. An optimization couldbe realized by guiding the test generation algorithm towards sates that have moreconstrained DOM-based assertions.RQ1.4.3. What is the quality of JavaScript tests in practice and which part ofthe code are hard to cover and test?Chapter 4. While some JavaScript features are known to be hard to test, noempirical study was done earlier towards measuring the quality and coverage ofJavaScript tests. We present the first empirical study of JavaScript tests to charac-terize their prevalence, quality metrics (code coverage, test code ratio, test commitratio, and average number of assertions per test), and shortcomings [125].We found that a considerable percentage of JavaScript projects do not have anytest and this is in particular for projects with JavaScript at client-side. On the otherhand almost all purely server-side JavaScript projects have tests and the quality ofthose tests are higher compared to tests for client-side. On average JavaScript testslack proper coverage for event-dependent callbacks, asynchronous callbacks, andDOM-related code.Future directions. The result of this study can be used to improve JavaScript testgeneration tools in producing more effective test cases that target hard-to-test code.Also evaluating the effectiveness of JavaScript tests by measuring their mutationscore, may reveal shortcomings regarding the quality of written assertions. An-other possible direction could be designing automated JavaScript code refactoringtechniques towards making the code more testable and maintainable.RQ1.4.4. How can we automate fixture generation for JavaScript unit testing?Chapter 5. Proper test fixtures are required to cover DOM-dependent statementsand conditions in unit testing JavaScript code. However, generating such fixturesis not an easy task. We proposed a DOM-based test fixture generation technique[127] and a tool, called CONFIX [24], which is based on concolic (concrete and157symbolic) execution. Our approach guides program executing through differentbranches of a function. CONFIX automatically generate a set of JavaScript unittests with DOM fixtures and DOM function arguments. Our empirical results showthat the generated fixtures substantially improve code coverage compared to testsuites without these fixtures, and the overhead is negligible .Future directions. Our DOM-based fixture generation technique could be en-hanced by adding support for statements that require event-triggering, and han-dling more complex constraints by improving the integer/string constraint solverto generate XPath expressions with proper structure and attribute values. Also ex-isting JavaScript test input generator tools, such as JALANGI, could be combinedwith CONFIX in such a way that the existing tool uses CONFIX in the case ofDOM-related constraints.RQ1.4.5. Which JavaScript code smells are prevalent in practice and whatmaintenance issues they cause?Chapter 6. We proposed a tool and technique, called JSNOSE [34], to detectJavaScript code smells [124]. It uses static and dynamic analysis of the client-side code to detect objects, functions, variables, and code blocks. We collected13 JavaScript code smells by studying various online development resources andbooks that discuss bad JavaScript coding patterns and what maintenance issues theymake. Our empirical evaluation shows that (1) JSNOSE can accurately detect thesecode smells, and (2) lazy object, long method/function, closure smells, couplingbetween JavaScript, HTML, and CSS, and excessive global variables are the mostprevalent code smells.Future directions. JSNOSE can be used during development and maintenance cy-cles to spot potential JavaScript code smells, which can be refactored to improvethe code quality. However, manual code refactoring may take a significant amountof time and may also introduce new bugs if changes are not done properly. CurrentJavaScript refactoring techniques [8, 11, 83, 84] support basic operations such asrenaming of variables or object properties, or encapsulation of properties and ex-traction of modules. One possible future direction is designing an automated toolfor JavaScript code refactoring for the detected code smell instances. Examplesof the required code refactoring is explained in our work, however, a major chal-158lenge towards building an effective JavaScript code refactoring is to guarantee codechanges do not affect the application behaviour.7.2 Concluding RemarksThe work presented in this dissertation has focused on advancing the state-of-the-art in automated testing and maintenance of web applications. Although, the pro-posed approaches have been tailored to web and in particular JavaScript-based ap-plications, a number of contributions are applicable to other types of applications,programming languages, and software analysis domains as following:• Our feedback-directed exploration technique [123] decides about next stateto expand and next event to exercise based on scoring functions. Such scor-ing functions are generic and can be changed to adapt the model inferencefor other purposes such as program comprehension.• Our proposed UI testing techniques for test model generation [123], and us-ing existing tests to generate new ones [126], can be adapted to other types ofapplication that have rich user interface interactions such as mobile applica-tions and desktop GUI-based applications. The major difference is the DOMstructure and its elements compared to a specific designed user interface.• The idea of leveraging knowledge in existing tests for UI test generation[126] can also be applied for unit test generation. For instance some inputvalues and program paths can be reused.• Our technique to generate JavaScript unit tests with DOM-based fixture[127] can be adapted for building test fixtures in the form of partial UI that isrequired for proper unit testing of mobile and other GUI-based applications.• Our empirical study to analyze prevalence, quality, and shortcomings of ex-isting JavaScript tests, can be similarly conducted to study unit tests writtenin other languages. In particular, less studied popular languages such asRuby, Objective-C, Python, and Go can be interesting to analyse.159• Our JavaScript code smell detection technique [124] can be applied on otherimplementations of ECMAScript (e.g. JScript .NET, and ActionScript), su-persets of JavaScript (e.g. CoffeeScript and TypeScript), prototype-basedprogramming languages (e.g. Self, Lua, Common Lisp), and languages thatsupport first-class functions (e.g. ML, Haskell, Scala, Perl, Python, PHP).160Bibliography[1] AddressBook. 2016-01-30. → pages 59[2] Brotherhood social network. 2016-01-30. → pages 59[3] Checkstyle. Accessed: 2015-04-30. →pages 133[4] CollegeVis. Accessed:2013-09-30. → pages 149[5] CookeryBook. 2016-01-30. → pages 59[6] WSO2 EnterpriseStore. 2014-11-30. → pages 59[7] GhostBusters. Accessed:2013-09-30. → pages 149[8] Eclipse JavaScript development tools. 2015-04-30. → pages 158[9] Jslint: The JavaScript code quality tool. Accessed:2015-04-30. → pages 8, 130, 133[10] Mozilla developer network’s JavaScript reference. Accessed:2013-09-30. → pages 134, 136[11] JetBrains JavaScript editor. editor.jsp. Accessed:2015-04-30. → pages 158161[12] PMD code analyzer. Accessed: 2015-04-30.→ pages 8[13] PeriodicTable. Accessed: 2013-09-30.→ pages 149[14] Environment simulated study room. 2016-01-30. → pages 59[15] Symbolistic. Accessed:2013-09-30. → pages 149[16] Tunnel. Accessed:2013-09-30. → pages 149[17] WARI: Web application resource inspector. 2015-04-30. → pages 8, 130, 133[18] WolfCMS. Accessed: 2014-11-30.→ pages 59[19] Callback hell: A guide to writing elegant asynchronous JavaScriptprograms. Accessed: 2013-09-30. → pages 140[20] SimpleChessGame. Accessed: 2013-09-30. →pages 27, 149[21] Claroline. Accessed: 2014-11-30.→ pages 59[22] CLOC. Accessed: 2013-09-30. → pages 28,149[23] complexityReport.js. 2013-09-30. → pages 149[24] ConFix. Accessed: 2015-11-30. →pages 9, 102, 121, 127, 157[25] FeedEx. Accessed: 2013-09-30. →pages 8, 26, 27[26] FractalViewer. Accessed: 2013-09-30.→ pages 27, 149162[27] Google closure compiler. 2013-09-30. → pages 8, 130, 133[28] Examples of hard to test JavaScript. Accessed:2016-08-30. → pages 5, 75, 82[29] Hotel Reservation. 2015-11-30. → pages 122[30] Istanbul - a JS code coverage tool written in JS. Accessed: 2016-08-30. → pages 79[31] Jasmine. Accessed: 2016-08-30. →pages 4, 75, 104[32] JavaParser. Accessed: 2014-11-30.→ pages 57[33] Jscover. Accessed: 2014-05-30. →pages 63, 79, 124[34] JSNose. Accessed: 2013-09-30. →pages 9, 148, 158[35] JsUnit. Accessed: 2016-08-30. → pages 104[36] Mocha. Accessed: 2016-08-30. → pages 4, 75[37] Nodeunit. Accessed: 2016-08-30. →pages 4, 75[38] Organizer. Accessed:2014-11-30. → pages 41[39] Phormer Photogallery. 2013-09-30. → pages 27, 59, 122, 149[40] QUnit. Accessed: 2016-08-30. → pages 1, 4, 75, 104[41] Mozilla Rhino. Accessed: 2013-08-30.→ pages 26, 79, 117, 148[42] Selenium HQ. Accessed: 2016-08-30. → pages 1,30, 38, 109, 117163[43] SimpleToDo. Accessed:2015-11-30. → pages 122[44] Sudoku game. sudoku/game sudoku.html.Accessed: 2015-11-30. → pages 122[45] TacirFormBuilder. 2013-09-30. → pages 27[46] Writing testable JavaScript., .Accessed: 2016-08-30. → pages 5, 75, 82[47] Testilizer., . Accessed: 2014-11-30. →pages 8, 39, 57, 58, 156[48] How to unit test private functions in JavaScript.,. Accessed: 2016-08-30. → pages 5, 75, 82, 85[49] TinySiteCMS., . Accessed: 2013-09-30. → pages 149[50] TinyMCE., . Accessed: 2013-09-30. → pages 27, 149[51] TuduListManager. Accessed: 2013-09-30.→ pages 27, 149[52] Which JavaScript test library should you use? 2016-08-30. → pages 4, 75[53] Writing testable code in JavaScript: A brief overview. 2016-08-30. → pages 5, 75, 82[54] JSter JavaScript Libraries Catalog., 2014. Accessed:2016-08-30. → pages 76, 97[55] Most depended-upon NMP packages., 2014. Accessed: 2016-08-30.→ pages 76[56] Github Showcases., 2014. Accessed:2016-08-30. → pages 76, 97164[57] Testing and deploying with ordered npm run scripts.,2015. Accessed: 2016-08-30. → pages 87, 95[58] SLOC (source lines of code) counter., 2016.Accessed: 2016-08-30. → pages 77[59] TestScanner., 2016. Accessed:2016-08-30. → pages 76, 79, 97[60] C. Q. Adamsen, A. Møller, and G. Mezzetti. Systematic execution ofandroid test suites in adverse conditions. In Proceedings of theInternational Symposium on Software Testing and Analysis (ISSTA), pages83–93. ACM, 2015. → pages 72[61] R. Agrawal, S. Gollapudi, A. Halverson, and S. Ieong. Diversifying searchresults. In Proc. of the International Conference on Web Search and DataMining, pages 5–14. ACM, 2009. → pages 16, 34[62] S. Alimadadi, S. Sequeira, A. Mesbah, and K. Pattabiraman.Understanding JavaScript event-based interactions. In Proc. of theACM/IEEE International Conference on Software Engineering (ICSE),pages 367–377. ACM, 2014. → pages 1, 6, 101, 106[63] N. Alshahwan and M. Harman. State aware test case regeneration forimproving web application test suite coverage and fault detection. In Proc.of the International Symposium on Software Testing and Analysis (ISSTA),pages 45–55, 2012. → pages 72[64] N. Alshahwan, M. Harman, A. Marchetto, R. Tiella, and P. Tonella.Crawlability metrics for web applications. In Proc. InternationalConference on Software Testing, Verification and Validation (ICST), pages151–160, Washington, DC, USA, 2012. IEEE Computer Society, IEEEComputer Society. ISBN 978-0-7695-4670-4. doi:10.1109/ICST.2012.95.URL → pages 12, 33, 34[65] M. Alshraideh. A complete automation of unit testing for JavaScriptprograms. Journal of Computer Science, 4(12):1012, 2008. → pages 127[66] S. Artzi, J. Dolby, S. Jensen, A. Møller, and F. Tip. A framework forautomated testing of JavaScript web applications. In Proc. InternationalConference on Software Engineering (ICSE), pages 571–580. ACM, 2011.→ pages 1, 4, 5, 6, 35, 39, 71, 75, 96, 101, 127165[67] C. Atkinson, O. Hummel, and W. Janjic. Search-enhanced testing (niertrack). In Proceedings of the 33rd International Conference on SoftwareEngineering (ICSE-NIER track), pages 880–883. ACM, 2011. → pages 72[68] M. Benedikt, J. Freire, and P. Godefroid. VeriWeb: Automatically testingdynamic web sites. In Proc. of the International World Wide WebConference (WWW), pages 654–668, 2002. → pages 2, 12, 32, 33[69] M. Benedikt, W. Fan, and F. Geerts. XPath satisfiability in the presence ofDTDs. In Proc. of the Symposium on Principles of Database Systems,pages 25–36. ACM, 2005. doi:10.1145/1065167.1065172. → pages 112[70] K. Benjamin, G. von Bochmann, M. E. Dincturk, G.-V. Jourdan, and I.-V.Onut. A strategy for efficient crawling of rich internet applications. InProc. of the International Conference on Web Engineering (ICWE), pages74–89. Springer-Verlag, 2011. → pages 3, 33, 34[71] C.-P. Bezemer, A. Mesbah, and A. van Deursen. Automated securitytesting of web widget interactions. In Proc. of the joint meeting of theEuropean Software Engineering Conference and the ACM SIGSOFTsymposium on the Foundations of Software Engineering (ESEC-FSE),pages 81–91. ACM, 2009. → pages 2[72] C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vectormachines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27, 2011. Software available at∼cjlin/libsvm. → pages 58[73] S. Choudhary, M. E. Dincturk, S. M. Mirtaheri, G.-V. Jourdan, G. vonBochmann, and I. V. Onut. Building rich internet applications models:Example of a better strategy. In Proc. of the International Conference onWeb Engineering (ICWE). Springer, 2013. → pages 3, 34[74] S. R. Choudhary, M. Prasad, and A. Orso. Crosscheck: Combiningcrawling and differencing to better detect cross-browser incompatibilitiesin web applications. In Proc. International Conference on SoftwareTesting, Verification and Validation (ICST), pages 171–180. IEEEComputer Society, 2012. → pages 2, 3, 12, 32, 33, 38[75] J. Clark and S. DeRose. Xml path language (xpath)., November 1999. →pages 112166[76] Y. Crespo, C. Lo´pez, R. Marticorena, and E. Manso. Language independentmetrics support towards refactoring inference. In 9th ECOOP Workshop onQAOOSE, volume 5, pages 18–29, 2005. → pages 133, 141, 144[77] D. Crockford. JavaScript: the good parts. O’Reilly Media, Incorporated,2008. → pages 8, 84, 130, 134, 135, 136, 139, 140[78] V. Dallmeier, N. Knopp, C. Mallon, S. Hack, and A. Zeller. Generating testcases for specification mining. In Proc. of the International Symposium onSoftware Testing and Analysis (ISSTA), pages 85–96, 2010. → pages 71[79] M. E. Dincturk, S. Choudhary, G. von Bochmann, G.-V. Jourdan, and I. V.Onut. A statistical approach for efficient crawling of rich internetapplications. In Proc. of the International Conference on Web Engineering(ICWE), pages 362–369. Springer, 2012. → pages 3, 34[80] C. Duda, G. Frey, D. Kossmann, R. Matter, and C. Zhou. Ajax crawl:Making Amax applications searchable. In Proc. of the 2009 IEEEInternational Conference on Data Engineering, ICDE ’09, pages 78–89.IEEE Computer Society, 2009. → pages 23, 33[81] S. Elbaum, G. Rothermel, S. Karre, and M. Fisher. Leveraging user-sessiondata to support web application testing. IEEE Trans. Softw. Eng., 31(3):187–202, 2005. ISSN 0098-5589. → pages 4, 70[82] M. Erfani, I. Keivanloo, and J. Rilling. Opportunities for clone detection intest case recommendation. In IEEE Computer Software and ApplicationsConference Workshops (COMPSACW), pages 65–70. IEEE, 2013. →pages 72[83] A. Feldthaus and A. Møller. Semi-automatic rename refactoring forJavaScript. In Proc. Conference on Object-Oriented Programming,Systems, Languages, and Applications (OOPSLA), pages 323–338. ACM,2013. → pages 8, 158[84] A. Feldthaus, T. Millstein, A. Møller, M. Scha¨fer, and F. Tip.Tool-supported refactoring for JavaScript. In Proc. Conference onObject-Oriented Programming, Systems, Languages, and Applications(OOPSLA), 2011. → pages 8, 158[85] M. Fowler and K. Beck. Refactoring: improving the design of existingcode. Addison-Wesley Professional, 1999. → pages 7, 130, 131, 132, 134,141, 145167[86] F. Galassi. Refactoring to unobtrusive JavaScript. JavaScript Camp 2009.→ pages 132, 134, 137, 138[87] K. Gallaba, A. Mesbah, and I. Beschastnikh. Don’t call us, we’ll call you:Characterizing callbacks in JavaScript. In Proceedings of the ACM/IEEEInternational Symposium on Empirical Software Engineering andMeasurement (ESEM), pages 247–256. IEEE Computer Society, 2015. →pages 76, 77, 84, 98[88] V. Garousi, A. Mesbah, A. Betin Can, and S. Mirshokraie. A systematicmapping study of web application testing. Information and SoftwareTechnology, 2013. → pages 12[89] P. Geneve`s, N. Layaı¨da, and A. Schmitt. Efficient static analysis of XMLpaths and types. In Proc. of the 2007 ACM SIGPLAN Conference onProgramming Language Design and Implementation, PLDI ’07, pages342–351, New York, NY, USA, 2007. ACM. ISBN 978-1-59593-633-2.doi:10.1145/1250734.1250773. URL → pages 102, 115, 117[90] M. Ghafari, C. Ghezzi, A. Mocci, and G. Tamburrelli. Mining unit tests forcode recommendation. In International Conference on ProgramComprehension (ICPC), pages 142–145. ACM, 2014. → pages 72[91] GitHut. A small place to discover languages in GitHub.,2015. → pages 1, 75[92] P. Godefroid, N. Klarlund, and K. Sen. DART: Directed automated randomtesting. In Proc. of the 2005 ACM SIGPLAN Conference on ProgrammingLanguage Design and Implementation, PLDI ’05, pages 213–223, NewYork, NY, USA, 2005. ACM. ISBN 1-59593-056-6.doi:10.1145/1065010.1065036. → pages 107[93] A. Guha, S. Krishnamurthi, and T. Jim. Using static analysis for Ajaxintrusion detection. In Proc. of the International World Wide WebConference (WWW), pages 561–570. ACM, 2009. ISBN978-1-60558-487-4. → pages 34[94] M. Harrold, R. Gupta, and M. Soffa. A methodology for controlling thesize of a test suite. ACM Transactions on Software Engineering andMethodology (TOSEM), 2(3):270–285, 1993. → pages 17, 24168[95] P. Heidegger and P. Thiemann. Contract-driven testing of Javascript code.In Proceedings of the 48th International Conference on Objects, Models,Components, Patterns, TOOLS’10, pages 154–172, Berlin, Heidelberg,2010. Springer-Verlag. ISBN 3-642-13952-3, 978-3-642-13952-9. URL → pages 5, 6, 75, 127[96] J. Humble and D. Farley. Continuous Delivery: Reliable Software Releasesthrough Build, Test, and Deployment Automation. Addison-WesleyProfessional, first edition, 2010. ISBN 0321601912, 9780321601919. →pages 28[97] L. Inozemtseva and R. Holmes. Coverage is not strongly correlated withtest suite effectiveness. In Proc. of the International Conference onSoftware Engineering (ICSE), 2014. → pages 5, 63, 79, 98[98] D. Jang, R. Jhala, S. Lerner, and H. Shacham. An empirical study ofprivacy-violating information flows in javascript web applications. InProceedings of the 17th ACM conference on Computer andcommunications security, pages 270–283. ACM, 2010. → pages 98[99] W. Janjic and C. Atkinson. Utilizing software reuse experience forautomated test recommendation. In Proceedings of the 8th InternationalWorkshop on Automation of Software Test (AST), pages 100–106. IEEEPress, 2013. → pages 72[100] C. S. Jensen, A. Møller, and Z. Su. Server interface descriptions forautomated testing of JavaScript web applications. In Proc. of the 2013 9thJoint Meeting on Foundations of Software Engineering, ESEC/FSE 2013,pages 510–520. ACM, 2013. → pages 118[101] S. H. Jensen, M. Madsen, and A. Møller. Modeling the HTML DOM andbrowser API in static analysis of JavaScript web applications. InProceedings of the 19th ACM SIGSOFT Symposium and the 13th EuropeanConference on Foundations of Software Engineering, ESEC/FSE’11, pages59–69. ACM, 2011. → pages 34, 101[102] R. Just, D. Jalali, L. Inozemtseva, M. D. Ernst, R. Holmes, and G. Fraser.Are mutants a valid substitute for real faults in software testing? InProceedings of the ACM SIGSOFT International Symposium on theFoundations of Software Engineering (FSE), FSE 2014, pages 654–665,New York, NY, USA, 2014. ACM. ISBN 978-1-4503-3056-5.doi:10.1145/2635868.2635929. URL → pages 97169[103] S. H. Kan. Metrics and Models in Software Quality Engineering.Addison-Wesley, 2002. → pages 63, 150[104] J. Kerievsky. Refactoring to patterns. Pearson Deutschland GmbH, 2005.→ pages 131, 134, 145[105] A. Kiezun, V. Ganesh, P. J. Guo, P. Hooimeijer, and M. D. Ernst. HAMPI:A solver for string constraints. In Proc. of the International Symposium onSoftware Testing and Analysis (ISSTA), pages 105–116. ACM, 2009.doi:10.1145/1572272.1572286. → pages 118[106] J. C. King. Symbolic execution and program testing. Communications ofthe ACM, 19(7):385–394, 1976. → pages 106[107] M. Landha¨ußer and W. F. Tichy. Automated test-case generation bycloning. In Proceedings of the 7th International Workshop on Automationof Software Test (AST), pages 83–88. IEEE Press, 2012. → pages 72[108] M. Lanza and R. Marinescu. Object-oriented Metrics in Practice: UsingSoftware Metrics to Characterize, Evaluate, and Improve the Design ofObject-Oriented Systems. Springer, 2006. → pages 133, 141, 143, 144, 145[109] G. Li, E. Andreasen, and I. Ghosh. SymJS: Automatic symbolic testing ofJavaScript web applications. In Proc. of the ACM SIGSOFT InternationalSymposium on the Foundations of Software Engineering (FSE), page 11pages. ACM, 2014. → pages 6, 96, 101, 107, 127[110] J. Madhavan, D. Ko, L. Kot, V. Ganapathy, A. Rasmussen, and A. Halevy.Google’s deep web crawl. Proc. VLDB Endow., 1(2):1241–1252, 2008. →pages 33[111] M. Madsen, B. Livshits, and M. Fanning. Practical static analysis ofJavaScript applications in the presence of frameworks and libraries. InESEC/FSE, 2013. → pages 101[112] M. Ma¨ntyla¨, J. Vanhanen, and C. Lassenius. A taxonomy and an initialempirical study of bad smells in code. In Proc. International Conferenceon Software Maintenance (ICSM), pages 381–384. IEEE ComputerSociety, 2003. → pages 145[113] A. Marchetto and P. Tonella. Using search-based algorithms for Ajax eventsequence generation during testing. Empirical Software Engineering, 16(1):103–140, 2011. → pages 2170[114] A. Marchetto, P. Tonella, and F. Ricca. State-based testing of Ajax webapplications. In Software Testing, Verification, and Validation, 2008 1stInternational Conference on, pages 121–130. IEEE, IEEE ComputerSociety, 2008. ISBN 978-0-7695-3127-4. → pages 12, 32, 33, 35[115] S. McAllister, E. Kirda, and C. Kruegel. Leveraging user interactions forin-depth testing of web applications. In Recent Advances in IntrusionDetection, volume 5230 of LNCS, pages 191–210. Springer, 2008. URL 11. → pages 70[116] A. M. Memon. An event-flow model of gui-based applications for testing.Software testing, verification and reliability, 17(3):137–157, 2007. ISSN0960-0833. doi: → pages 68[117] F. Menczer, G. Pant, and P. Srinivasan. Topical web crawlers: Evaluatingadaptive algorithms. ACM Transactions on Internet Technology (TOIT), 4(4):378–419, 2004. → pages 33[118] A. Mesbah and S. Mirshokraie. Automated analysis of CSS rules tosupport style maintenance. In Proc. International Conference on SoftwareEngineering (ICSE), pages 408–418. IEEE Computer Society, 2012. →pages 133[119] A. Mesbah and M. R. Prasad. Automated cross-browser compatibilitytesting. In Proc. of the International Conference on Software Engineering(ICSE), pages 561–570. ACM, 2011. → pages 2, 12, 32, 33[120] A. Mesbah, A. van Deursen, and S. Lenselink. Crawling Ajax-based webapplications through dynamic analysis of user interface state changes.ACM Transactions on the Web (TWEB), 6(1):3:1–3:30, 2012. → pages 2, 8,14, 22, 26, 33, 45, 57, 148[121] A. Mesbah, A. van Deursen, and D. Roest. Invariant-based automatictesting of modern web applications. IEEE Transactions on SoftwareEngineering (TSE), 38(1):35–53, 2012. ISSN 0098-5589.doi:10.1109/TSE.2011.28. URL→ pages 2, 3, 4, 8, 12, 15, 26, 32, 33, 35, 38, 39, 45, 47, 71, 127[122] G. Meszaros. xUnit test patterns: Refactoring test code. PearsonEducation, 2007. → pages 103[123] A. Milani Fard and A. Mesbah. Feedback-directed exploration of webapplications to derive test models. In Proc. of the International Symposium171on Software Reliability Engineering (ISSRE), pages 278–287. IEEEComputer Society, 2013. → pages iii, 8, 10, 11, 47, 71, 127, 155, 159[124] A. Milani Fard and A. Mesbah. JSNose: Detecting JavaScript code smells.In Proc. of the International Conference on Source Code Analysis andManipulation (SCAM), pages 116–125. IEEE Computer Society, 2013. →pages iv, 9, 10, 32, 80, 98, 129, 158, 160[125] A. Milani Fard and A. Mesbah. JavaScript: The (un)covered parts. InProceedings of the International Conference on Software Testing,Verification, and Validation (ICST), page 11 pages. IEEE ComputerSociety, 2017. → pages iii, 9, 10, 74, 157[126] A. Milani Fard, M. Mirzaaghaei, and A. Mesbah. Leveraging existing testsin automated test generation for web applications. In Proc. of theIEEE/ACM International Conference on Automated Software Engineering(ASE), pages 67–78. ACM, 2014. → pages iii, 8, 10, 37, 98, 127, 156, 159[127] A. Milani Fard, A. Mesbah, and E. Wohlstadter. Generating fixtures forJavaScript unit testing. In Proceedings of the IEEE/ACM InternationalConference on Automated Software Engineering (ASE), pages 190–200.IEEE Computer Society, 2015. → pages iii, 5, 9, 10, 72, 75, 82, 83, 95, 96,100, 157, 159[128] S. Mirshokraie and A. Mesbah. JSART: JavaScript assertion-basedregression testing. In Proc. of the Internatinoal Conference on WebEngineering (ICWE), pages 238–252. Springer, 2012. → pages 2, 12, 32,33, 35, 71[129] S. Mirshokraie, A. Mesbah, and K. Pattabiraman. Efficient JavaScriptmutation testing. In Proc. of the International Conference on SoftwareTesting, Verification and Validation (ICST). IEEE Computer Society, 2013.→ pages 2, 5, 12, 33, 62, 97, 99[130] S. Mirshokraie, A. Mesbah, and K. Pattabiraman. JSeft: AutomatedJavaScript unit test generation. In Proc. of the International Conference onSoftware Testing, Verification and Validation (ICST), pages 1–10. IEEEComputer Society, 2015. → pages 1, 4, 5, 6, 71, 72, 75, 95, 96, 101, 122,126, 127[131] S. Mirshokraie, A. Mesbah, and K. Pattabiraman. Atrina: Inferring unitoracles from GUI test cases. In Proceedings of the International172Conference on Software Testing, Verification, and Validation (ICST), page11 pages. IEEE Computer Society, 2016. → pages 5, 75, 95[132] M. Mirzaaghaei, F. Pastore, and M. Pezze. Supporting test suite evolutionthrough test case adaptation. In Proc. of the International Conference onSoftware Testing, Verification and Validation (ICST), pages 231–240. IEEEComputer Society, 2012. → pages 72[133] N. Moha, Y.-G. Gue´he´neuc, L. Duchien, and A.-F. Le Meur. DECOR: Amethod for the specification and detection of code and design smells. IEEETransactions on Software Engineering, 36(1):20–36, 2010. → pages 133,141, 144, 150[134] A. Moosavi, S. Hooshmand, S. Baghbanzadeh, G.-V. Jourdan, G. V.Bochmann, and I. V. Onut. Indexing rich internet applications usingcomponents-based crawling. In Proc. of the International Conference onWeb Engineering (ICWE), pages 200–217. Springer-Verlag, 2014. → pages3, 33, 34[135] M. J. Munro. Product metrics for automatic identification of “bad smell”design problems in Java source-code. In Proc. International SymposiumSoftware Metrics, pages 15–15. IEEE, 2005. → pages 133, 141, 144[136] R. Murphey. JS minty fresh: Identifying and eliminating JavaScript codesmells. →pages 132, 134[137] H. V. Nguyen, H. A. Nguyen, T. T. Nguyen, A. T. Nguyen, and T. N.Nguyen. Detection of embedded code smells in dynamic web applications.In Proc. of the International Conference on Automated SoftwareEngineering (ASE), pages 282–285. ACM, 2012. → pages 7, 130, 132,133, 137[138] H. V. Nguyen, C. Ka¨stner, and T. N. Nguyen. Building call graphs forembedded client-side code in dynamic web applications. In Proc. of theInternational Symposium on Foundations of Software Engineering (FSE),pages 518–529. ACM, 2014. doi:10.1145/2635868.2635928. → pages 127[139] H. V. Nguyen, H. A. Nguyen, A. T. Nguyen, and T. N. Nguyen. Mininginterprocedural, data-oriented usage patterns in javascript web applications.In Proceedings of the 36th International Conference on SoftwareEngineering, pages 791–802. ACM, 2014. → pages 98173[140] N. Nikiforakis, L. Invernizzi, A. Kapravelos, S. Van Acker, W. Joosen,C. Kruegel, F. Piessens, and G. Vigna. You are what you include:large-scale evaluation of remote javascript inclusions. In Proceedings ofthe 2012 ACM conference on Computer and communications security,pages 736–747. ACM, 2012. → pages 98[141] F. Ocariza, K. Bajaj, K. Pattabiraman, and A. Mesbah. An empirical studyof client-side JavaScript bugs. In Proc. of the International Symposium onEmpirical Software Engineering and Measurement (ESEM), pages 55–64.IEEE Computer Society, 2013. → pages 1, 6, 96, 101, 103, 106, 126[142] F. Ocariza, K. Bajaj, K. Pattabiraman, and A. Mesbah. A study of causesand consequences of client-side JavaScript bugs. IEEE Transactions onSoftware Engineering (TSE), page 17 pages, 2017. → pages 5, 97, 98[143] F. J. Ocariza, K. Pattabiraman, and A. Mesbah. AutoFLox: An automaticfault localizer for client-side JavaScript. In Proc. of the InternationalConference on Software Testing, Verification and Validation (ICST’12),pages 31–40. IEEE Computer Society, 2012. → pages 34[144] C. Olston and M. Najork. Web crawling. Foundations and Trends inInformation Retrieval, 4(3):175–246, 2010. → pages 12, 16, 33[145] V. O¨zc¸elik. o2.js JavaScript conventions & best practices. → pages132, 134, 139, 145[146] C. Pacheco, S. K. Lahiri, M. D. Ernst, and T. Ball. Feedback-directedrandom test generation. In Proc. 29th Int. Conf. on Sw. Engineering(ICSE’07), pages 75–84. IEEE Computer Society, IEEE Computer Society,2007. doi: → pages 71[147] J. Padolsey. jQuery code smells. → pages 132,134, 138, 139[148] K. Pattabiraman and B. Zorn. DoDOM: Leveraging DOM Invariants forWeb 2.0 Application Robustness Testing. In Proc. of the InternationalSymposium on Sw. Reliability Eng. (ISSRE), pages 191–200. IEEEComputer Society, 2010. → pages 35, 55, 71[149] M. Pawlik and N. Augsten. RTED: a robust algorithm for the tree editdistance. Proc. VLDB Endow., 5(4):334–345, Dec. 2011. ISSN 2150-8097.URL → pages 23, 27174[150] M. Pezze, K. Rubinov, and J. Wuttke. Generating effective integration testcases from unit ones. In Proc. International Conference on SoftwareTesting, Verification and Validation (ICST), pages 11–20. IEEE, 2013. →pages 4, 72[151] S. Porto. A plain english guide to JavaScript prototypes. → pages 130[152] P. Ratanaworabhan, B. Livshits, and B. G. Zorn. JSMeter: Comparing thebehavior of JavaScript benchmarks with real web applications. InProceedings of the 2010 USENIX Conference on Web ApplicationDevelopment, WebApps’10, pages 3–3, Berkeley, CA, USA, 2010.USENIX Association. URL → pages 98[153] G. Richards, S. Lebresne, B. Burg, and J. Vitek. An analysis of thedynamic behavior of JavaScript programs. In Conference on ProgrammingLanguage Design and Implementation (PLDI), pages 1–12. ACM, 2010.ISBN 978-1-4503-0019-3.doi: → pages 98, 106, 132[154] G. Richards, C. Hammer, B. Burg, and J. Vitek. The eval that men do. InECOOP 2011–Object-Oriented Programming, pages 52–78. Springer,2011. → pages 98[155] K. Rubinov and J. Wuttke. Augmenting test suites automatically. In Proc.of the 2012 International Conference on Software Engineering, ICSE 2012,pages 1433–1434, Piscataway, NJ, USA, 2012. IEEE Press. ISBN978-1-4673-1067-3. URL → pages 72[156] T. Sakai. Evaluation with informational and navigational intents. In Proc.of the International Conference on World Wide Web (WWW), pages499–508. ACM, 2012. → pages 34[157] R. Santos, C. Macdonald, and I. Ounis. Selectively diversifying web searchresults. In Proc. of the International Conference on Information andknowledge management, pages 1179–1188. ACM, 2010. → pages 34[158] P. Saxena, D. Akhawe, S. Hanna, F. Mao, S. McCamant, and D. Song. Asymbolic execution framework for JavaScript. In Proc. Symp. on Securityand Privacy (SP’10), SP ’10, pages 513–528, Washington, DC, USA,1752010. IEEE Computer Society. ISBN 978-0-7695-4035-1.doi: URL → pages 6, 35, 96, 107, 108, 120,127[159] M. Schur, A. Roth, and A. Zeller. Mining behavior models from enterpriseweb applications. In Proc. of the 2013 9th Joint Meeting on Foundations ofSoftware Engineering, Proc. of the Foundations of Software Engineering(ESEC/FSE), pages 422–432. ACM, 2013. → pages 3, 4, 38, 71[160] K. Sen, D. Marinov, and G. Agha. CUTE: A concolic unit testing enginefor C. In Proc. of the ACM SIGSOFT International Symposium onFoundations of Software Engineering, ESEC/FSE, pages 263–272. ACM,2005. ISBN 1-59593-014-0. doi:10.1145/1081706.1081750. URL → pages 107[161] K. Sen, S. Kalasapur, T. Brutch, and S. Gibbs. Jalangi: A selectiverecord-replay and dynamic analysis framework for JavaScript. InProceedings of the 9th Joint Meeting on Foundations of SoftwareEngineering, ESEC/FSE, pages 488–498. ACM, 2013. ISBN978-1-4503-2237-9. doi:10.1145/2491411.2491447. URL → pages 6, 96, 101, 106,107, 108, 120, 123, 126, 127[162] F. Simon, F. Steinbruckner, and C. Lewerentz. Metrics based refactoring.In Proc European Conference on Software Maintenance and Reengineering(CSMR), pages 30–38. IEEE, 2001. → pages 133, 141, 144[163] K. Simpson. Native JavaScript: sync and async. → pages 134, 140[164] R. Song, H. Liu, J.-R. Wen, and W.-Y. Ma. Learning important models forweb page blocks based on layout and content analysis. ACM SIGKDDExplorations Newsletter, 6(2):14–23, 2004. → pages 53, 54[165] S. Sprenkle, E. Gibson, S. Sampath, and L. Pollock. Automated replay andfailure detection for web applications. In ASE’05: Proc. 20th IEEE/ACMInt. Conf. on Automated Sw. Eng., pages 253–262. ACM, 2005. ISBN1-59593-993-4. doi: URL → pages 4, 70[166] P. Srinivasan, F. Menczer, and G. Pant. A general evaluation framework fortopical crawlers. Information Retrieval, 8(3):417–447, 2005. → pages 16176[167] Stack Overflow. 2016 developer survey., 2016. → pages1, 75[168] K.-C. Tai. The tree-to-tree correction problem. Journal of the ACM(JACM), 26(3):422–433, 1979. ISSN 0004-5411.doi:10.1145/322139.322143. URL → pages 23[169] S. Thummalapenta, K. V. Lakshmi, S. Sinha, N. Sinha, and S. Chandra.Guided test generation for web applications. In Proc. of the InternationalConference on Software Engineering (ICSE), pages 162–171. IEEEComputer Society, 2013. → pages 3, 33, 34, 38[170] N. Tillmann and J. de Halleux. Pex - white box test generation for .NET. InProc. of Tests and Proofs (TAP’08), volume 4966 of LNCS, pages 134–153.Springer Verlag, April 2008. URL → pages107[171] P. Tonella and F. Ricca. Statistical testing of web applications. Journal ofSoftware Maintenance and Evolution: Research and Practice, 16(1-2):103–127, 2004. ISSN 1532-060X. doi:→ pages 2, 12, 33[172] P. Tonella, F. Ricca, A. Stocco, and M. Leotta. Automated generation ofvisual web tests from DOM-based web tests. In Proceedings of theInternational Symposium on Applied Computing (SAC), pages 775–782.ACM, 2015. → pages 72[173] M. E. Trostler. Testable JavaScript. O’Reilly Media, Incorporated, 2013.→ pages 5, 75, 82[174] N. Tsantalis, T. Chaikalis, and A. Chatzigeorgiou. JDeodorant:Identification and removal of type-checking bad smells. In Proc. EuropeanConference on Software Maintenance and Reengineering (CSMR), pages329–331, 2008. → pages 133[175] M. Utting, A. Pretschner, and B. Legeard. A taxonomy of model-basedtesting approaches. Software Testing, Verification and Reliability, 22(5):297–312, 2012. → pages 2177[176] A. Vahabzadeh, A. Milani Fard, and A. Mesbah. An empirical study ofbugs in test code. In Proceedings of the International Conference onSoftware Maintenance and Evolution (ICSME), pages 101–110. IEEEComputer Society, 2015. → pages 10, 98[177] A. Valmari. The state explosion problem. In LNCS: Lectures on Petri NetsI, Basic Models, Advances in Petri Nets, pages 429–528. Springer-Verlag,1998. ISBN 3-540-65306-6. → pages 3, 12[178] E. Van Emden and L. Moonen. Java quality assurance by detecting codesmells. In Proc. of the Working Conference on Reverse Engineering(WCRE), pages 97–106. IEEE Computer Society, 2002. → pages 130, 134[179] V. Vapnik. The nature of statistical learning theory. springer, 2000. →pages 53[180] W3C. Document Object Model (DOM) level 2 events specification., 13 November 2000. →pages 1, 7, 130, 137[181] Y. Wang, S. Person, S. Elbaum, and M. B. Dwyer. A framework to advisetests using tests. In Proc. of ICSE NIER. ACM, 2014. → pages 4, 72[182] S. Wei and B. G. Ryder. Practical blended taint analysis for JavaScript. InProc. of the International Symposium on Software Testing and Analysis(ISSTA), pages 336–346. ACM, 2013. → pages 106[183] J. Weinberger, P. Saxena, D. Akhawe, M. Finifter, R. Shin, and D. Song.An empirical analysis of XSS sanitization in web application frameworks.Electrical Engineering and Computer Sciences University of California atBerkeley, Technical Report, pages 1–17, 2011. → pages 98[184] E. J. Weyuker. On testing non-testable programs. The Computer Journal,25(4):465–470, 1982. doi:10.1093/comjnl/25.4.465. → pages 3[185] L. Williams, D. Ho, and S. Heckman. Software metrics in eclipse. → pages 143,144, 145[186] M. R. Woodward. Mutation testing?its origin and evolution. Informationand Software Technology, 35(3):163–169, 1993. → pages 62[187] T. Xie and D. Notkin. Mutually enhancing test generation and specificationinference. In Formal Approaches to Software Testing, pages 60–69.Springer, 2004. → pages 71178[188] D. Xu, W. Xu, B. K. Bavikati, and W. E. Wong. Mining executablespecifications of web applications from Selenium IDE tests. In SoftwareSecurity and Reliability (SERE), 2012 IEEE Sixth International Conferenceon, pages 263–272. IEEE, 2012. → pages 4, 71[189] Z. Xu, Y. Kim, M. Kim, G. Rothermel, and M. B. Cohen. Directed testsuite augmentation: techniques and tradeoffs. In Proc. of the InternationalSymposium on Foundations of Software Engineering (FSE), pages257–266. ACM, 2010. → pages 4, 72[190] S. Yoo and M. Harman. Test data regeneration: generating new test datafrom existing test data. Software Testing, Verification and Reliability, 22(3):171–201, 2012. → pages 4, 71[191] X. Yuan and A. M. Memon. Using gui run-time state as feedback togenerate test cases. In ICSE ’07: Proc. of the 29th international conferenceon Software Engineering, ICSE ’07, pages 396–405, Washington, DC,USA, 2007. IEEE Computer Society. ISBN 0-7695-2828-7.doi: URL → pages 71[192] X. Yuan and A. M. Memon. Iterative execution-feedback model-directedGUI testing. Information and Software Technology, 52(5):559–575, 2010.→ pages 4, 71[193] C. Yue and H. Wang. Characterizing insecure JavaScript practices on theweb. In Proc. of the International World Wide Web Conference (WWW),pages 961–970. ACM, 2009. → pages 98[194] A. Zaidman, B. van Rompaey, S. Demeyer, and A. van Deursen. Miningsoftware repositories to study co-evolution of production and test code. InProceedings of the International Conference on Software Testing,Verification and Validation (ICST), pages 220–229, 2008. → pages 98[195] N. C. Zakas. Writing efficient JavaScript. In S. Souders, editor, EvenFaster Web Sites. O’Reilly, 2009. → pages 134, 135[196] N. C. Zakas. Maintainable JavaScript - Writing Readable Code. O’Reilly,2012. → pages 134, 137, 138[197] D. Zhang, W. Wang, D. Liu, Y. Lei, and D. Kung. Reusing existing testcases for security testing. In Proceedings of the International Symposiumon Software Reliability Engineering (ISSRE), pages 323–324. IEEEComputer Society, 2008. → pages 72179[198] S. Zhang, D. Jalali, J. Wuttke, K. Mus¸lu, W. Lam, M. D. Ernst, andD. Notkin. Empirically revisiting the test independence assumption. InProceedings of the International Symposium on Software Testing andAnalysis (ISSTA, pages 385–396. ACM, 2014. → pages 69[199] Y. Zhang and A. Mesbah. Assertions are strongly correlated with test suiteeffectiveness. In Proceedings of the joint meeting of the European SoftwareEngineering Conference and the ACM SIGSOFT Symposium on theFoundations of Software Engineering (ESEC/FSE), pages 214–224. ACM,2015. → pages 5, 79, 98[200] Y. Zheng, T. Bao, and X. Zhang. Statically locating web application bugscaused by asynchronous calls. In Proc. of the International World-WideWeb Conference (WWW), pages 805–814. ACM, 2011. → pages 34[201] Z. Zhu, Y. Zou, B. Xie, Y. Jin, Z. Lin, and L. Zhang. Mining api usageexamples from test code. In Proceedings of the International Conferenceon Software Maintenance and Evolution (ICSME), pages 301–310. IEEEComputer Society, 2014. → pages 72180


Citation Scheme:


Citations by CSL (citeproc-js)

Usage Statistics



Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            async >
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:


Related Items