Open Collections

UBC Theses and Dissertations

Mining Stack Overflow for questions asked by web developers: an empirical study. Bajaj, Kartik, 2014


Full Text

Mining Stack Overflow for Questions Asked by Web Developers
An Empirical Study

by

Kartik Bajaj
B.Tech., VIT University, 2012

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in The Faculty of Graduate and Postdoctoral Studies (Electrical and Computer Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

December 2014

© Kartik Bajaj 2014

Abstract

Modern web applications consist of a significant amount of client-side code, written in JavaScript, HTML, and CSS. In this thesis, we present a study of common challenges and misconceptions among web developers, by mining related questions asked on Stack Overflow. We use unsupervised learning to categorize the mined questions and define a ranking algorithm to rank all the Stack Overflow questions based on their importance. We analyze the top 50 questions qualitatively. The results indicate that (1) the overall share of web development related discussions is increasing among developers, (2) browser related discussions are prevalent; however, this share is decreasing with time, (3) form validation and other DOM related discussions have been discussed consistently over time, (4) web related discussions are becoming more prevalent in mobile development, and (5) developers face implementation issues with new HTML5 features such as Canvas. We examine the implications of the results on the development, research, and standardization communities. Our results show that there is a consistent knowledge gap between the options available and the options known to developers. Given this knowledge gap, we need better tools customized to assist developers in building web applications.

Preface

This thesis is an extension of an empirical study of questions asked by web developers on Stack Overflow, conducted by myself in collaboration with Professor Karthik Pattabiraman and Professor Ali Mesbah. The results of this study were published as a conference paper in June 2014 in the 11th Working Conference on Mining Software Repositories (MSR) [6]. A part of this thesis (Section 3.4) was completed as a course project for the Topics in Machine Learning course in collaboration with Professor Mehdi Moradi. I was responsible for devising the experiments, creating test cases, running the experiments, evaluating and analyzing the results, and writing the manuscript. My collaborators were responsible for guiding me with the creation of the experimental methodology and the analysis of results, as well as editing and writing portions of the manuscript.

K. Bajaj, K. Pattabiraman and A. Mesbah, "Mining Questions Asked by Web Developers", in Proceedings of the Working Conference on Mining Software Repositories (MSR), 2014, 112-121, ACM.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgements
Dedication
1 Introduction
  1.1 Objectives
  1.2 Thesis Contribution
  1.3 Thesis Organization
2 Background on Web Applications and Related Work
  2.1 Web Applications
  2.2 Stack Overflow Dataset
  2.3 Related Work
    2.3.1 Stack Overflow Dataset
    2.3.2 Web Application Analysis
3 Experimental Methodology
  3.1 Research Questions
  3.2 Data Partitioning
  3.3 Data Filtering
  3.4 Supervised Vs. Unsupervised Learning
    3.4.1 Overview
    3.4.2 Training Data Selection
    3.4.3 Building the Classifier
    3.4.4 Classifier Comparison
    3.4.5 Summary
  3.5 Data Processing
  3.6 Summary
4 Results
  4.1 Discussion Categories
  4.2 Hot Topics
  4.3 Temporal Trends
  4.4 Mobile Development
  4.5 Technical Challenges
  4.6 Summary
5 Discussion
  5.1 Implications for Web Developers
  5.2 Implications for Research Community
  5.3 Implications for Web Standardization Community
  5.4 Threats to Validity
6 Conclusion and Future Work
  6.1 Future Work
Bibliography
Appendix A: Keywords for Each Category

List of Tables

3.1 No. of questions in each subset of data.
3.2 Training sets for the classifier.
3.3 Training datasets based on different features of data.
3.4 Factors used in our Accumulated Post Score formula.
4.1 Hot topics with the highest view counts.
A.1 Keywords for categories in JavaScript related questions.
A.2 Keywords for categories in HTML5 related questions.
A.3 Keywords for categories in CSS related questions.
A.4 Keywords for categories in mobile JavaScript related questions.
A.5 Keywords for categories in mobile HTML5 related questions.
A.6 Keywords for categories in mobile CSS related questions.

List of Figures

1.1 Text shadow rendering across different browsers.
3.1 Share of web related questions on Stack Overflow.
3.2 Number of users, questions, and accepted answers based on the average reputation of the user.
3.3 Number of questions per tag.
3.4 Accuracy when each sample belongs to a maximum of 1 class.
3.5 Accuracy when each sample belongs to a maximum of 2 classes.
3.6 Accuracy when each sample belongs to a maximum of 3 classes.
3.7 Our overall analysis workflow.
4.1 Categories of JavaScript-based discussions.
4.2 Categories of HTML5-based discussions.
4.3 Categories of CSS-based discussions.
4.4 Temporal trends in JavaScript-based discussions.
4.5 Temporal trends in HTML5-based discussions.
4.6 Temporal trends in CSS-based discussions.
4.7 Share of web based discussions in mobile related questions on Stack Overflow.
4.8 Categories of mobile JavaScript-based discussions.
4.9 Categories of mobile HTML5-based discussions.
4.10 Categories of mobile CSS-based discussions.

Acknowledgements

First of all, I would like to thank my advisors Karthik Pattabiraman and Ali Mesbah for their unwavering support. The months I spent as a Master's student were some of the most intellectually and professionally rewarding I have ever had, and much of it is attributable to Karthik and Ali for constantly motivating me to think more critically and get my ideas across more effectively. Their exemplary supervision allowed me to learn a lot and paved the way for me to begin my professional engineering career.

I would also like to thank my colleagues in CSRG for their help and critical feedback on my work. I particularly would like to thank my lab mates, who always managed to make me laugh and made this entire experience enjoyable. None of this would have been possible if it had not been for the unconditional love accorded to me by my family and friends. I would like to thank my dad Ghansham, my mom Neelam, my siblings Vikram and Tanu, and all my relatives for encouraging me to do my best. I would also like to thank MITACS for giving me the opportunity to work with highly reputed professionals and for the funding opportunities it provided.

Last but not least, I would like to thank God for giving me the chance to pursue this wonderful profession, and for guiding me throughout the process.

Dedication

To my friends and family

Chapter 1: Introduction

Modern interactive web applications require the integration of many languages on the client-side, such as JavaScript, CSS and HTML. Web developers (in this thesis, "web development" means client-side web development, unless stated otherwise) use HTML to define the initial Document Object Model (DOM) layout, CSS to provide styling to the layout, and JavaScript to interact with that layout. (The Document Object Model is a cross-platform and language-independent convention for representing and interacting with objects in HTML, XHTML and XML documents; the nodes of every document are organized in a tree structure, called the DOM tree.) JavaScript is often responsible for the core functionality of a web application, yet it is difficult to program in due to features such as loose typing, dynamic code generation using eval, and frequent interaction with the DOM. As a result, JavaScript code often experiences errors [26], which can affect the operation of the web application. Further, CSS code is often ad-hoc and difficult to maintain, which can lead to unnecessary code bloat [22].
Finally, with the advent of HTML5 [16], many new features have been added to HTML, making it potentially error prone and difficult to use. Therefore, to be able to help developers effectively, there is a compelling need to understand the programming challenges faced by web application developers.

Figure 1.1: Text shadow rendering across different browsers.

JavaScript is loosely typed and allows runtime creation and execution of code, which makes web applications prone to errors. Browsers tend to handle these errors; however, each browser has its own exception-handling mechanism. Further, each browser has its own interpreter and renderer for CSS stylesheets. Even though the majority of browsers render a similar layout, there are minor differences among them. Figure 1.1 provides an example of such cross-browser issues, where each browser renders the text-shadow differently. The article [2] provides a brief overview of major cross-browser issues that take up a significant portion of developer time.

Mobile development has also been gaining attention among web developers [17]. Developers not only aim to provide a unified experience across different browsers, but also a simplified and user-friendly experience on mobile devices running iOS and Android.

Our overall goal is to bridge the knowledge gap between the developer and research communities, and help in developing tools that will increase the overall quality of web application development. To pursue this goal, we conducted a mixed-method analysis of the data obtained in the Stack Overflow data dump. While there have been other studies that have focused on assessing and improving the quality of web applications after the applications have been released [14], [46], [26], [25], there has been no systematic attempt to understand the sources of confusion and misconceptions among developers while they are building web applications. Understanding these issues is a necessary first step towards improving web application quality. Other work has studied these issues based on console messages, static analysis and bug reports [14], [46], [26], [25]. However, discussions on QA websites such as Stack Overflow capture the developer's thought process. Further, they often contain a satisfactory answer that helps us in understanding the issue.

1.1 Objectives

Our main goal in this study is to understand the common challenges and/or misconceptions among web developers. To pursue this goal, we conduct a quantitative as well as qualitative study of more than 500,000 Stack Overflow questions related to web development. Stack Overflow (http://stackoverflow.com) is a question and answer (QA) site for programmers. Being one of the most active QA sites [38], Stack Overflow contains a huge amount of data, which can provide high-level insights into the issues faced by programmers in web development. We chose Stack Overflow for our study as the questions contain detailed information about the issues faced by developers, often followed by a discussion and an 'accepted' answer.
Our study pertains to QA items related to JavaScript, HTML5, and CSS on Stack Overflow.

We are not the first to use Stack Overflow for understanding issues faced by programmers. Prior work has used Topic Modelling on Stack Overflow questions to list the categories of discussions [7] [3] [29], or used Stack Overflow statistics to analyze user behaviour [33] [27] [4]. However, none of these papers examine fine-grained aspects of Stack Overflow data related to web application development. Doing so requires new kinds of heuristics to analyze the data, and to gather insights from it. To the best of our knowledge, we are the first to analyze Stack Overflow data with regard to modern web application development, and to extract actionable insights from the data for the developer and researcher communities.

1.2 Thesis Contribution

This thesis makes the following main contributions:

1. We define novel heuristics for analyzing Stack Overflow questions based on participating user reputation; this is needed since existing ways of ranking questions do not satisfy our criteria for extracting important information (Section 3.3);
2. We compare the quality and accuracy achieved by the supervised learning algorithm for automated tagging;
3. We categorize related discussions into multiple categories based on the dominant topics in the interactions among developers. We then highlight the important topics of discussion from these categories. In addition, we categorize discussions by mobile web developers with respect to HTML5, CSS and JavaScript based on the dominant topics of discussion;
4. We identify temporal trends in related discussions to understand current and future trends in the areas pertaining to client side web development; and
5. Finally, we devise a metric to rank Stack Overflow questions based on the contributions by registered users and qualitatively analyze the top 50 questions.

The main findings from our study are:

1. Cross-browser related discussions, while prevalent in the past, are becoming less important;
2. DOM APIs and event handling issues have been a significant source of confusion for web development;
3. The overall share of mobile development is increasing and HTML5 is gaining popularity in (mobile) web applications;
4. Web related topics are becoming more prevalent in mobile development, though the topics are broadly similar to those in other web applications; and
5. Even expert programmers are confused by some of the new features added to HTML5, CSS3 and JavaScript.

1.3 Thesis Organization

This chapter serves to establish the overarching goal and motivation of this thesis. Chapter 2 discusses the background information on web applications and establishes the motivation behind this work. Chapter 3 describes in detail the experimental methodology used to mine the questions and answers from the Stack Overflow website. Chapter 4 provides detailed information regarding the results found from this study. Chapter 5 discusses the implications these results can have with respect to web developers, tool developers and the research community, and lists the threats to validity for the experimental methodology. Chapter 6 concludes and presents future research directions.

Chapter 2: Background on Web Applications and Related Work

This chapter provides background information about modern web applications, followed by a brief description of Stack Overflow and its data dumps.
Finally, we provide an overview of the related work and where we stand with respect to it.

2.1 Web Applications

Modern web applications consist of both client and server side components. In this thesis we focus on the client-side of web applications, which consists of the following aspects:

JavaScript is a prototype-based scripting language with first-class functions. JavaScript is mainly used to (1) attach various events to the DOM tree, (2) dynamically change the state of the DOM tree by modifying the elements or their attributes by calling DOM API access methods, and (3) communicate asynchronously with the server. JavaScript is event-based, dynamically typed, and asynchronous in nature. While JavaScript is predominantly used in the client-side of web applications, it is becoming increasingly popular in server-side applications, game engines, and desktop applications.

HTML5. HTML is used to define the layout of the web page. Internally, the browser generates a Document Object Model (DOM), which is a hierarchical representation of the state of elements in the web page. Changes in the value of any of these elements are reflected on the rendered page. HTML5 is the latest version of HTML and it marks a significant improvement over the previous versions. The main goal of HTML5 is to increase the human readability of the code and include native support for multimedia features such as audio and video. HTML5 has added new HTML tags such as canvas, and has introduced new attributes for the existing tags to provide additional information in a systematic manner. New JavaScript APIs have also been introduced as part of the HTML5 specification.

CSS is a design language used to define the presentation of the web document. It can be used to modify style properties and change the presentation of a particular node or a group of nodes in the DOM tree.

Mobile application development has also been largely influenced by the advancements in HTML5 and CSS3 [37]. HTML5 is becoming a common platform for mobile application development, and companies are investing significant resources in supporting it [1].

2.2 Stack Overflow Dataset

Stack Overflow is a popular community-driven question and answer service. Users can ask questions, provide answers to the questions asked, mark questions as favourites, up vote / down vote an answer, tag questions, and carry out other community related tasks. It has been actively used by programmers to ask questions [38]; from January 2009 to December 2012, a total of 4,125,638 questions were asked by users on Stack Overflow, with a mean of 85K questions per month.

Stack Overflow provides data dumps of all user-generated data, including questions asked with the list of answers, the accepted answer per question, up/down votes, favourite counts, post score, comments, and anonymized user reputation. Stack Overflow allows users to tag discussions and has a reputation-based mechanism (http://meta.stackoverflow.com/help/whats-reputation) to rank users based on their active participation and contributions.

For this study, we downloaded a data dump containing data from June 2008 to March 2013. Note that Stack Overflow originated only in June 2008. Therefore, our dump includes all the questions and answers on Stack Overflow until March 2013.

The data dump consists of six files in XML format:

1. Posts.xml: Contains all questions and answers posted on Stack Overflow. Each question or answer is stored as a separate post with a unique id, and other attributes such as user id, time, and text associated with it.
2. Posthistory.xml: Contains all edits made to each post.
3. Users.xml: Contains an anonymized list of Stack Overflow users.
4. Comments.xml: Contains the list of comments made to each post.
5. Badges.xml: Contains a list of badges earned by the Stack Overflow users. (Besides gaining reputation with questions and answers, users receive badges for being especially helpful.)
6. Votes.xml: Contains the counts of all the up votes, down votes, favourites, etc., for each post.

In addition to the above, Stack Overflow makes the following meta-data available for analysis.

Tags. Stack Overflow allows users to tag each question, with up to a maximum of 5 tags. Users can select an existing tag provided in the autocomplete text box or create a new one. To create a new tag, users need to have a minimum level of reputation on Stack Overflow (http://stackoverflow.com/help/privileges/create-tags). This makes sure that new tags are only created by expert users, maintaining consistency among the tags found on Stack Overflow. Expert users can also change the question tags, if the questions are incorrectly tagged.

User Reputation. Stack Overflow provides a metric called Reputation (http://meta.stackoverflow.com/help/whats-reputation) to rank its users. Reputation is an approximate measurement of how much the community trusts a user; it is earned when peers appreciate what a user is contributing. Users do not need reputation for basic site functionalities such as asking questions and providing answers; however, users with a high reputation score gain more privileges. The primary way to gain reputation is by posting good questions and useful answers. Votes on these posts cause users to gain (or sometimes lose) reputation. The maximum number of reputation points that can be earned in a day is 200, ensuring that the reputation gained by a user comes from actively and consistently participating in site activities.

2.3 Related Work

2.3.1 Stack Overflow Dataset

Dataset Analysis. Stack Overflow has been extensively studied and analyzed in a wide variety of empirical studies. For example, researchers have used Stack Overflow to analyze prominent topics of discussion [3, 7, 47], extract documentation [29], assign tags to discussions [35], and analyze code [42]. Saxe et al. [36] compared code snippets found on the Stack Overflow website with malware code snippets. Their preliminary results indicate that there exist similarities in naming conventions, and that further analysis could be used to build malware detection patterns.

The Stack Overflow dataset has also been used to generate API documentation using crowdsourcing techniques. Campbell et al. [11] compared the PHP and Python code snippets found on the Stack Overflow website with the official documentation to discover topics that are inadequately covered by the API documentation. Parnin et al. [29] investigate the dynamics of a successful API community on Stack Overflow.

Metadata Analysis. Other work uses the metadata (such as user age, knowledge level, gender, and tags for each question) attached to Stack Overflow questions to understand user behaviour [4, 21, 27, 33, 44]. Prior work has also used Stack Overflow metadata to analyze mobile API usage [19] and security issues related to these APIs [41] by analyzing the tags attached to each question. However, none of these studies have analyzed web development data on Stack Overflow, particularly for the client-side.
As we have seen in this thesis, such discussions are increasing in volume and hence it is important to understand them. To the best of our knowledge, we are the first to mine and analyze web development related discussions on Stack Overflow.

2.3.2 Web Application Analysis

Empirical Studies. Several studies have empirically analyzed the reliability, security and performance of client-side web applications. For example, Ocariza et al. [26] used error messages logged to the console to analyze JavaScript errors in web applications. In a recent empirical study [25], we analyzed bug reports of twelve open source applications to understand the root causes of failures in them. Ratanaworabhan et al. [32] study the dynamic behaviour and performance of JavaScript-based web applications. Nikiforakis et al. [24] performed an empirical study to analyze the trust relationship between production websites and JavaScript library providers.

Bug Finding. Prior work has also focused on finding bugs in existing web applications. Bleaker et al. [8] developed an automated technique for finding bugs in web applications using dynamic test case generation. Zheng et al. [46] statically analyzed JavaScript code to locate bugs caused by asynchronous function calls. Petrov et al. [30] defined a happens-before relation to detect concurrency bugs in web applications.

Security Vulnerability Analysis. Other work has studied the prevalence of security related vulnerabilities in JavaScript code, such as Cross Site Scripting [45]. Cova et al. [12] analyzed JavaScript code to detect drive-by-download attacks and malicious JavaScript code within production web applications. Dang et al. [18] performed an empirical study to analyze privacy-violating information flows within existing web applications.

The main difference between these studies and ours is that we study the sources of difficulty, confusion, and misconception in programmers' minds during web application development activities. Because we analyze the natural language text of programmers' questions and accepted answers, we can get to the root of a confusion or difficulty, which is typically not apparent from the code or other artifacts produced during the development process.

Chapter 3: Experimental Methodology

In this chapter, we list the research questions in our experiments. We then provide an overview of the Stack Overflow dataset available for our analysis. Next, we describe the data filtering heuristics used in our experiments. Finally, we explain which class of learning algorithms is the best fit for our analysis and how we process the available dataset, with respect to each research question.

3.1 Research Questions

Our research questions are formulated as follows:

RQ0: What percentage of the questions on Stack Overflow are related to client side web development and how is that share changing with time?
RQ1: What are the categories of topics of discussion among web developers?
RQ2: What are the hot topics related to web development in terms of importance?
RQ3: Are there temporal trends present in discussions related to web development?
RQ4: How prevalent are web-related topics in discussions related to mobile web development?
RQ5: What are the main technical challenges faced by web developers?

3.2 Data Partitioning

Prior to answering the research questions, we need to understand whether Stack Overflow has sufficient related data to answer these questions. Being one of the most active QA sites [38], Stack Overflow attracts developers of all languages. To this end, we study the total number of Stack Overflow questions that are related to client-side web development. We extract questions containing the following three tags, namely, JavaScript, HTML5 and CSS, and store them separately in three datasets. Questions containing more than one of the above mentioned tags overlap among these datasets. The number of questions in the three datasets corresponding to each tag is shown in Table 3.1.

Table 3.1: No. of questions in each subset of data.
  Dataset | Tag        | No. of questions | % of questions
  DS1     | JavaScript | 342363           | 7.39%
  DS2     | HTML5      | 31777            | 0.65%
  DS3     | CSS        | 125906           | 2.71%
  Total   |            | 500046           |

Figure 3.1: Share of web related questions on Stack Overflow.

Figure 3.1 shows the growth in the percentage of questions that pertain to web development over time. The results show that Stack Overflow has contained a significant number of questions related to web development since its inception. We find that the overall share of web related discussions increased among developers from Jan'09 to Dec'12. This indicates that web development is gaining popularity among developers. Further, while JavaScript continues to be the dominant topic of discussion for client-side web development (at 8%), CSS and HTML5 are gaining popularity, although their share of questions is low, at 2% and 1% respectively. Therefore, these topics may gain a larger share of the questions in the future.

Finding #0: Approximately 10% of Stack Overflow questions belong to client side web development and this share is consistently increasing with time.
3.3 Data Filtering

As we saw in the previous section, there are thousands of questions related to client-side web development on Stack Overflow. In order to extract the most important questions and their answers, we devise two heuristics as follows:

- H1: Only accepted answers should be considered.
- H2: More weight should be given to questions with high view counts.

H1 is based on the observation that the majority of accepted answers are provided by users with high reputation, while only 8% of users on Stack Overflow have above-average reputation (135). We compared the average reputation of users asking questions (1826) and the average reputation of users providing accepted answers (29625). We found that the latter is 16 times higher than the former. From this, we conclude that in the majority of cases, questions are asked by novice users and are predominantly answered by expert users. Further, answers can be accepted only by the user asking the question, showing that accepted answers are satisfactory from the questioner's point of view. Therefore, we consider only accepted answers to uncover important topics of discussion.

Figure 3.2: Number of users, questions, and accepted answers based on the average reputation of the user.

Figure 3.2 represents the trend in the number of questions asked and answers provided by users, grouped by their reputation. The number of answers provided is directly proportional to user reputation, whereas the number of questions asked is inversely proportional to it. This further supports our decision to consider only accepted answers to uncover important topics of discussion.

H2 is based on the fact that view count is the only statistic that is updated when a guest user views the question. Many developers use Stack Overflow to read already resolved questions. Such guest users do not actively participate in QA activities, and hence cannot affect any other statistics. Therefore, we believe questions with higher view counts are likely to be of greater interest for developers, and should be given more weight in terms of importance.

3.4 Supervised Vs. Unsupervised Learning

A straw man approach to gain insights from the available dataset would be to count the tags attached to each question. However, there are three problems with using tags for grouping: 1) tags provide only abstract information about the topic of discussion, whereas we want specific information, 2) the user who created a question could be unsure about the appropriate topic of discussion, and thus might tag it incorrectly, 3) users tend to add as many tags as possible (up to 5) to make their question visible in more search queries (see http://meta.stackoverflow.com/questions/164348/), therefore increasing the likelihood of receiving an answer quickly. Therefore, we cannot use tags for the analysis and need to utilize machine learning techniques to gain insights from the dataset.

Before we dive further into the details of our methodology, we first need to understand which class of machine learning algorithms (supervised vs. unsupervised) is a good fit for our analysis. Supervised learning takes place when the training instances are labelled with the correct result, which gives feedback on how learning is progressing. This is akin to having a supervisor who can tell the agent whether it was correct. In unsupervised learning, the goal is harder because there are no pre-determined categorizations.

3.4.1 Overview

The overall goal of this phase is to see if we can learn from the tags attached to each question, and build an automated tagging system that can then be used to rectify and improve the tags attached to each question in the dataset. There is a vast body of prior work that focuses on building automated tagging systems [13, 15, 28, 39, 40]. The quality of such automated tagging can vary depending upon various aspects of the training data set, such as the number of classes (tags), the number of samples per class (questions per tag), the number of classes a sample belongs to (tags per question) and the quality of samples (length of each question). Therefore, in order to gain better insights into the quality of an automated tagging system, we select our training samples based on the criteria described in the following section.

For each training set, we then use a Naive Bayes classifier to build an automated tagging system (as it is one of the simplest and most efficient classifiers for text classification [43]), followed by 10-fold cross-validation (http://en.wikipedia.org/wiki/Cross-validation_(statistics)) to measure the accuracy of each system. Next, we analyze the results obtained from each classifier and compare the effects of the different features used to select the training data.

3.4.2 Training Data Selection

The dataset provided by Stack Overflow is huge. Directly processing all the questions and their tags would be time consuming. Therefore, we select a subset of the data to perform our analysis.
We use the DS1 dataset (Table 3.1) defined in Section 3.2, as it contains 300,000+ questions available for analysis. Since all the questions in this dataset had one common tag, javascript, attached to them, i.e., all questions belong to the same class, any classifier we build would assign new questions to this class. In order to remove this bias, we removed this tag from all the questions. We also removed the questions that had only one tag (javascript) attached to them, so that no question is left with zero tags after the removal of the javascript tag, i.e., each sample belongs to at least one class.

As the quality of an automated tagging system can vary depending upon the number of classes each sample belongs to, we created 3 initial sets of training data. Each training set consisted of tags as classes and question texts as training samples. For Training Set 1, we only considered one tag attached to each question. This way each question belongs to one and only one tag, i.e., each training sample belongs to a single class. For Training Set 2, we considered a maximum of 2 tags attached to each question; this way each training sample can belong to one or two classes. For Training Set 3, we considered three tags attached to each question. From all three sets, we removed the tags that had fewer than 500 questions attached to them. This step was to make sure that we have enough training samples for each class. The distribution of the number of questions per tag in each training set can be found in Figure 3.3. The area under each curve represents the total number of questions in each dataset.

Figure 3.3: Number of questions per tag.

Table 3.2 provides the total number of training samples and the number of classes in each dataset. To gain better insights depending upon the various factors, we created subsets of data that filter the training samples based on different requirements. One such requirement was the minimum number of characters in each question, i.e., the quality of training samples. So we created 3 subsets for each of these with different minimum question lengths, i.e., 1000, 1500, 2000 characters. Further, for each of these datasets we created 4 separate datasets, where each training set had a different requirement for the minimum number of questions (500, 1000, 1500, and 2000) per tag, i.e., the number of samples per class. So in total we had 3 x 4 x 3 = 36 different datasets for analysis.

Table 3.2: Training sets for the classifier.
  Training Set   | Number of tags | Total number of samples
  Training Set 1 | 41             | 161304
  Training Set 2 | 82             | 258696
  Training Set 3 | 114            | 306149

Table 3.3: Training datasets based on different features of data.
  No. of tags | Tag frequency | Question length | No. of shortlisted tags
  1 | 1000 | 500  | 15
  1 | 1000 | 1000 | 6
  1 | 1000 | 1500 | 3
  1 | 1000 | 2000 | 3
  1 | 1500 | 500  | 11
  1 | 1500 | 1000 | 3
  1 | 1500 | 1500 | 3
  1 | 1500 | 2000 | 3
  1 | 2000 | 500  | 7
  1 | 2000 | 1000 | 3
  1 | 2000 | 1500 | 3
  1 | 2000 | 2000 | 2
  2 | 1000 | 500  | 28
  2 | 1000 | 1000 | 14
  2 | 1000 | 1500 | 6
  2 | 1000 | 2000 | 4
  2 | 1500 | 500  | 20
  2 | 1500 | 1000 | 8
  2 | 1500 | 1500 | 5
  2 | 1500 | 2000 | 3
  2 | 2000 | 500  | 14
  2 | 2000 | 1000 | 6
  2 | 2000 | 1500 | 4
  2 | 2000 | 2000 | 3
  3 | 1000 | 500  | 36
  3 | 1000 | 1000 | 18
  3 | 1000 | 1500 | 8
  3 | 1000 | 2000 | 5
  3 | 1500 | 500  | 23
  3 | 1500 | 1000 | 9
  3 | 1500 | 1500 | 5
  3 | 1500 | 2000 | 4
  3 | 2000 | 500  | 17
  3 | 2000 | 1000 | 7
  3 | 2000 | 1500 | 5
  3 | 2000 | 2000 | 3

3.4.3 Building the Classifier

Table 3.3 provides an overview of each training dataset. As seen in the table, the number of shortlisted tags differs. The more tags there are, the better the quality of the automated classification. However, as the table shows, to include more tags we need to compromise on the quality of the training data by including classes with fewer samples, and samples of lower quality, which in turn affects the accuracy of the automated classifier. Therefore, we can say that there exists a trade-off between the accuracy and the quality of suggestions provided by the classifier.

3.4.4 Classifier Comparison

We used Naive Bayes classification to train our automated tagging system using the different training sets defined in Table 3.3. To measure the accuracy of each classifier we used 10-fold cross-validation. We then compared the results in terms of the accuracy of each classifier with respect to the quality of samples, the minimum number of samples per class, and the maximum number of classes per sample. (A minimal sketch of this training and evaluation setup is shown after the list of insights below.)

Figures 3.4 to 3.6 provide an overview of the accuracy of each classifier with respect to the parameters described above. As we can see from the results, the accuracy of each classifier varies depending upon the quality of the training samples. The better the quality of the training data, the higher the accuracy of the classifier.

Figure 3.4: Accuracy when each sample belongs to a maximum of 1 class.
Figure 3.5: Accuracy when each sample belongs to a maximum of 2 classes.
Figure 3.6: Accuracy when each sample belongs to a maximum of 3 classes.

Based on the accuracy of each classifier, the key insights are as follows:

- Increasing the minimum number of samples per class increases the accuracy of the automated classifier, whereas the quality of the designed classifier is decreased due to the smaller number of classes available for classification.
- Increasing the quality of samples also increases the accuracy of the automated classifier, whereas the quality of the designed classifier is reduced due to the smaller number of qualified samples, leading to a smaller number of qualified classes for classification.
- The accuracy of the classifier is reduced when a single sample belongs to more than one class.
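The training and evaluation loop described above can be illustrated with standard library components. The sketch below is not the thesis's actual code: it assumes the scikit-learn package, a plain bag-of-words representation, and the single-tag setting of Training Set 1 (one class per sample); all names are placeholders.

    # Illustrative sketch: a Naive Bayes tagger over question text, scored with
    # 10-fold cross-validation (assumes scikit-learn).
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import cross_val_score

    def tagging_accuracy(question_texts, question_tags):
        # question_texts: list of question bodies; question_tags: one tag per question.
        model = make_pipeline(
            CountVectorizer(stop_words="english"),  # bag-of-words features
            MultinomialNB(),                        # Naive Bayes text classifier
        )
        scores = cross_val_score(model, question_texts, question_tags, cv=10)
        return scores.mean()

The multi-label settings (Training Sets 2 and 3) would need a multi-label strategy on top of this pipeline; the sketch covers only the single-class case.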
3.4.5 Summary

As a result of our analysis, we conclude that the desired accuracy for an automated tagging system can be achieved. However, increasing the accuracy decreases the quality of the tagging system. In our following experiments, we need to analyze the complete dataset to understand the discussions among web developers. Therefore, building a classifier using supervised learning techniques would result in low accuracy, as the quality of the samples as well as the number of samples per class vary over a broad range, starting from a single sample per class. Hence, for our analysis we rely upon unsupervised learning methods, where we do not have any prior information about the classes and each sample is treated independently of the tags attached to it.

3.5 Data Processing

In order to understand common challenges and misconceptions among web developers, we study questions as well as their accepted answers, similar to prior work [7] [3], followed by a fine-grained analysis of only the accepted answer of each question. Our analyzed dataset is available for download (http://www.ece.ubc.ca/~kbajaj/so/data.zip).

After we filter the dataset, we process it using Natural Language Processing (NLP) methods to understand the main topics. We use Latent Dirichlet Allocation (LDA), a type of Topic Modelling, to answer our research questions. Topic Modelling is a type of statistical modelling that can be used to discover hidden topics in a collection of documents, based on the statistics of words in each document [9]. LDA is a generative form of Topic Modelling that allows a set of observations to be explained by unobserved groups that explain why some parts of the data are similar [10]. The output of LDA is a list of topics, the topic proportion of each document, and the topic share of each topic in the collection. The topic proportion of each document refers to what percentage of it belongs to each topic, while the topic share is a measure of how much a topic has been discussed as compared to other topics in the collection.

Figure 3.7: Our overall analysis workflow.

Figure 3.7 represents our overall methodology used for analyzing the questions. The rest of this section is organized according to the research questions discussed above. The steps in bold correspond to steps in Figure 3.7. Step 1 and Step 2 are common to all the analyses we do and are described in Section 3.3. The other steps are specific to the research questions.

RQ1: Categorization of topics of discussion. To answer RQ1, i.e., listing the categories of discussions, we used LDA to categorize the discussions on Stack Overflow. Categories discovered in this phase represent major topics of discussion related to web development. We first extracted the text of questions and accepted answers (Step 10). We then used the Porter Stemming Algorithm [31] to convert all words to their root words (such as "programmer" to "program") and removed stop words (Step 11). Finally, we passed the resulting text as input to the LDA process (Step 12) for discovering hidden topics. We used the list of generated topics to identify the categories of discussion, and the topic share to obtain the proportion of the discussions belonging to each category. The labels were assigned manually by the author based on the keywords suggested by the LDA algorithm. We have made the labels publicly available along with the dataset.
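The following sketch illustrates Steps 10-12 (stemming, stop-word removal, and LDA) under stated assumptions; it is not the thesis's implementation. It assumes the nltk and gensim packages, a simple regular-expression tokenizer, and a placeholder number of topics.

    # Illustrative sketch: preprocess question/answer text and run LDA
    # (assumes gensim, and nltk with its "stopwords" data installed).
    import re
    from nltk.stem import PorterStemmer
    from nltk.corpus import stopwords
    from gensim import corpora, models

    def lda_topics(documents, num_topics=10):
        stemmer = PorterStemmer()
        stop = set(stopwords.words("english"))
        tokenized = [[stemmer.stem(w)
                      for w in re.findall(r"[a-z]+", doc.lower())
                      if w not in stop]
                     for doc in documents]                 # one token list per document
        dictionary = corpora.Dictionary(tokenized)         # word <-> id mapping
        corpus = [dictionary.doc2bow(tokens) for tokens in tokenized]
        lda = models.LdaModel(corpus, id2word=dictionary, num_topics=num_topics)
        return lda.print_topics(num_topics=num_topics, num_words=10)

Each returned topic is a weighted list of stemmed keywords; output of this kind was labelled manually to produce the categories reported in Chapter 4 and the keyword tables in Appendix A.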
RQ2: Finding hot topics of discussion. To answer RQ2, we used LDA to analyze the top 2000 most viewed questions from each category identified in RQ1. The analysis in this phase is based on the two heuristics defined in Section 3.3. We then ranked questions based on view count (Step 3) and shortlisted the first 2000 questions (Step 4). Then we extracted the accepted answer text (Heuristic 1) for each question and processed the text using LDA to generate a list of hot topics of discussion (Steps 10-12).

RQ3: Analyzing temporal trends over time. To answer RQ3, we used LDA to analyze the Stack Overflow data on a half-yearly basis. We divided our dataset into subsets of 6 months of data each (Step 5), followed by LDA (Steps 10-12) to analyze the important topics of discussion in each time period. The choice of 6 months was based on the trade-off between the number of questions required for efficient topic modelling and the analysis granularity. Decreasing the time period further would decrease the input data, affecting the efficiency of topic modelling. Our data spanned from July 2008 to March 2013, so we considered 8 subsets each for DS1, DS2, and DS3, starting from Jan'09-Jun'09 until July'12-Dec'12. We decided to skip the first 6 months of the data as the number of questions on Stack Overflow was limited during that time period, since the site had just been launched.

RQ4: Prevalence of web in mobile development. To answer RQ4, we first analyzed the trend of JavaScript, CSS and HTML5 related discussions within the subset of questions related to mobile development. We then created subsets of these questions and performed LDA on those datasets. We wanted to study the categories of discussion related to mobile development and whether these categories are different from those in web development. To filter out questions related to mobile development, we relied on the mobile platform specific tags used by the users. The usage of tags for filtering the questions is justified as we are using the generic tags (as described in Section 3.5) that differentiate between different mobile development platforms. We used the mobile development related tags shortlisted in prior work [20] to create a subset of the data (Step 7) from the three datasets that we had. The tags are: android, bada, blackberry, iphone, ios, java-me, phonegap, symbian, tizen, webos, and windows-phone. Next, we performed LDA (Steps 10-12) to identify the main topics of discussion.

RQ5: Technical challenges faced by developers. To answer RQ5, we first select important questions and qualitatively analyze them in depth. To select the important questions, we devise a metric based on the statistics provided by Stack Overflow, and rank the questions (Step 8). The reason we need a new metric is that the metrics used by Stack Overflow do not necessarily indicate a question's importance. For example, Stack Overflow provides a post score, which is the sum of the up votes for a post minus the sum of the down votes. However, the votes accrued by a question do not differentiate the number of users involved in the discussion from those who are just interested in the solution. This is important as users who are involved in a discussion may have a very different perspective from users who simply view the solution and up/down vote the answer. Further, the reputation of the user who votes on a question is also important.

To estimate a question's importance taking the above factors into account, we propose a new metric, called Accumulated Post Score (AMS):

    AMS_i = 3U_i - 25D_i + 10C_i + A_i + F_i    (3.1)

where U, D, C, A, and F are as presented in Table 3.4. The weights assigned to these factors are based on the value of reputation required to perform each of these activities on Stack Overflow (http://stackoverflow.com/help/privileges?tab=all).
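Equation 3.1 is simple enough to state directly as code. The sketch below is only an illustration; the dictionary keys are placeholder names for the per-question statistics of Table 3.4, not fields of any particular API.

    # Accumulated Post Score (Equation 3.1) for a single question.
    def accumulated_post_score(q):
        return (3 * q["up_votes"]
                - 25 * q["down_votes"]
                + 10 * q["comment_count"]
                + q["answer_count"]
                + q["favourite_count"])

    # Example: 40 up votes, 2 down votes, 6 comments, 5 answers and 12 favourites
    # give 3*40 - 25*2 + 10*6 + 5 + 12 = 147.
    print(accumulated_post_score({"up_votes": 40, "down_votes": 2,
                                  "comment_count": 6, "answer_count": 5,
                                  "favourite_count": 12}))

The relative weights mirror the reputation thresholds listed in Table 3.4, as noted above.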
After computing the accumulated post score, we filter the top 50 (Step 9) questions with the highest score from each dataset, and analyze them manually (Step 16). We choose 50 to balance the depth of the qualitative analysis with the time taken for the analysis.

Table 3.4: Factors used in our Accumulated Post Score formula.
  Answer count (A_i): the number of answers provided to the question. A high number of answers implies more people are trying to figure out the correct solution to the problem. Required reputation: 0.
  Comment count (C_i): the number of comments on a particular question. A high number of comments implies more people are interested in the particular topic. Required reputation: 50.
  Favourite count (F_i): the number of users who marked the question as a favourite. A high favourite count implies more people are interested in the solution. Required reputation: 0.
  Up votes (U_i): the number of people who promoted the question. More people liking the question implies the topic of discussion is important to the community. Required reputation: 15.
  Down votes (D_i): the number of people who demoted the question. More people disliking the question implies the question is incorrect or does not provide any value to the community. Required reputation: 125.

3.6 Summary

In this chapter, we presented our technique to analyze the Stack Overflow discussions. We first formulated two heuristics to rank questions based on the number of views attained by each question. We then compared the quality and accuracy achieved using supervised learning algorithms. Based on our analysis, we chose unsupervised learning to answer our research questions. Next, we defined our methodology to answer each research question. Last, we defined the Accumulated Post Score metric to rank Stack Overflow questions based on the quality of discussion generated by each question. In the following chapter, we present the results of our analysis with respect to each research question.

Chapter 4: Results

In this chapter, we present the results of our study, according to the research questions formulated in Section 3.1. For each research question, we describe the outcome of our methodology as well as summarize the results in terms of findings. The sections in this chapter correspond to the research questions.

4.1 Discussion Categories

To answer RQ1, we used the LDA method from the previous chapter on the three obtained datasets. Figures 4.1-4.3 present the results of this process corresponding to JavaScript, HTML5 and CSS3, respectively. We provide some examples related to these categories in Section 4.5. The results in this phase provide us with an aggregate picture of the topics that have gained the most attention from web developers over the past four years.

Figure 4.1: Categories of JavaScript-based discussions.

Figure 4.1 shows the distribution of topics related to JavaScript. As can be seen in the figure, Cross Browser Compatibility related discussions have the maximum weight among all topics. This implies that developers have faced challenges in making their code work consistently on all browsers. Further, DOM related discussions have gained significant attention from developers.
This confirms the results of our previous study [25], where we analyzed bug reports from different web applications and JavaScript libraries, and found that DOM related errors were dominant. Other popular issues being discussed are event handling, form validation and the jQuery library. We then compared our results with the JavaScript reference provided by w3schools (http://www.w3schools.com/js/default.asp) to find what topics were missing from the discovered categories. These included features such as eval, cookies, and navigator. This shows that developers do not have many questions or concerns about these topics. This is somewhat counterintuitive, as the first two of these topics have dependability and security implications.

Figure 4.2: Categories of HTML5-based discussions.

Figure 4.2 shows the distribution of topics related to HTML5. Here, the Canvas API has been a major topic of discussion among HTML5 developers. Examples of questions regarding Canvas include (1) handling images in canvas, and (2) converting HTML to Canvas and vice versa. HTML5 browser support has also been a major issue discussed among developers, such as a feature being supported by one browser but not (yet) others. Usage of new attributes such as "data-", media elements such as audio and video, new form elements such as email input, and HTML5 based form validation are other topics of discussion.

When comparing the topics discussed with the w3schools reference for HTML5, we found that there was little to no discussion related to HTML features such as drag & drop and web workers (a web worker is a JavaScript module that runs in the background, independently of other scripts, without affecting the performance of the page).

Figure 4.3: Categories of CSS-based discussions.

Figure 4.3 shows the distribution of topics related to CSS. Among CSS topics, the layout of the DOM tree has gained the maximum attention from developers. Other common topics of discussion are (1) questions on placing an HTML element inside/outside another, (2) creating a web page that is displayed uniformly across browsers, (3) questions related to the CSS Box-Model, which describes the content of the space taken by an element, (4) modifying CSS using external widgets and JavaScript code, and (5) having custom fonts on a webpage. Again, based on the w3schools reference for CSS, we found that there was limited discussion on CSS features such as sprites, i.e., a collection of images in a single image.

A detailed list of keywords attached to each category for web related discussions in JavaScript, HTML5 and CSS can be found in Tables A.1-A.3 in the appendix.

Finding #1: Cross Browser related discussions have gained maximum attention from web developers, followed by DOM and Canvas related discussions.

4.2 Hot Topics

To answer RQ2, we identify topics that have been viewed by the largest number of developers, regardless of how much these topics are discussed among developers (this was considered in RQ1).
We call these hot topics: discussions that many developers view, probably to understand and resolve their issues, but for which there is no further discussion. Because not all users are logged into Stack Overflow when they view a discussion, it is difficult to tell which user viewed what content. Therefore, we consider only the aggregate view count rather than view counts for different users when classifying a topic as hot.

Table 4.1 shows the hot topics we obtained. We expected the results to be similar to the topics that are discussed most often in RQ1. However, not all the hot topics are similar, as can be seen from the table. For example, file handling in JavaScript and media in HTML5 were not among the discussion categories obtained in Section 4.1. We believe that this is because the solutions posted are satisfactory, thus obviating the need for further discussions.

Table 4.1: Hot topics with the highest view counts. Hot topics with little discussion are presented in boldface.
  Technology | Hot topics
  JavaScript | Document Structure, File Handling, Cross-Browser, jQuery, DOM
  HTML5      | Media, Browser Support, HTML5 Elements, Canvas API, Offline Web
  CSS        | CSS3, Fonts, JavaScript, Box-Model, Layout

Finding #2: View counts provide a hint towards recurrent issues faced by web developers, such as those pertaining to HTML5 Elements, DOM structure, offline web, and CSS3.

4.3 Temporal Trends

To address RQ3, we divided the four-year time period of the data into six-month intervals, and used topic modelling to analyze the dominant topics within each subset of data. Figures 4.4-4.6 present the temporal trends in the discussions for JavaScript, HTML5 and CSS, respectively.

Figure 4.4: Temporal trends in JavaScript-based discussions.

As can be seen in Figure 4.4, DOM related issues have been consistently discussed over the span of 4 years. However, cross-browser compatibility related discussions, while dominant initially, have seen a sharp decline recently. This means issues related to browser compatibility have been reducing in importance over time. Possible explanations could be improvements in the quality of JavaScript IDEs, better JavaScript libraries that handle cross-browser issues (such as jQuery), and/or more robust browsers that follow W3C specifications. On the other hand, CSS related discussions have gained in importance in recent years. Form validation issues have also been discussed consistently over the span of 4 years.

Figure 4.5: Temporal trends in HTML5-based discussions.

Figure 4.5 shows the temporal trends in HTML5 related discussions. Browser support has been discussed heavily among HTML5 developers. However, these discussions have dropped in importance recently, suggesting that browser support for HTML5 is maturing rapidly. The same is true for the Canvas API, which is declining in popularity. However, HTML5-specific APIs such as local storage have gained importance over time, meaning that more and more developers are utilizing the client-side storage capabilities provided by HTML5. Finally, mobile device specific issues, such as interfacing device APIs with web applications or mobile specific themes, have also become popular, suggesting that HTML5 is gaining popularity in mobile application development. The next section provides more details about web technologies in mobile development.

Figure 4.6: Temporal trends in CSS-based discussions.

Figure 4.6 shows the temporal trends in CSS discussions. Again here, we can clearly observe that browser compatibility discussions have dropped sharply in the recent past. Further, JavaScript related issues have been discussed consistently over the span of 4 years by CSS developers, while CSS3 related discussions have increased over time.
The next subsection provides more details about web technologies in mobile development.

Figure 4.6: Temporal trends in CSS-based discussions.

Figure 4.6 shows the temporal trends in CSS discussions. Again, we can clearly observe that browser compatibility discussions have dropped sharply in the recent past. Further, JavaScript related discussions have appeared consistently over the span of four years among CSS developers, while CSS3 related discussions have increased over time. Finally, discussions related to adjusting the style of a website according to the viewport (i.e., the viewport meta tag) have recently become important, again highlighting the importance of mobile web development.

Finding #3: Cross-browser compatibility issues have seen a sharp decline in the recent past. Further, CSS3 and HTML5 discussions are gaining popularity in web as well as mobile application development.

4.4 Mobile Development

Figure 4.7: Share of web based discussions in mobile related questions on Stack Overflow.

To answer RQ4, i.e., the prevalence of web technologies in mobile development, we first study what percentage of mobile related discussions overlap with HTML5, CSS, and JavaScript over different six month time periods. As can be seen from Figure 4.7, the share of web based discussions is increasing within mobile related questions, although the absolute percentages are low relative to the overall share of mobile-related discussions. Further, JavaScript related discussions have seen the sharpest rise in the area of mobile development, and have nearly doubled from 0.75% to 1.5% over three years. HTML5 related discussions have gone from 0 to 0.6% in this time frame, while CSS discussions have gone from a little over 0.25% to 0.5%.

We then study the dominant topics related to JavaScript, HTML5, and CSS for mobile. We expected the results to be different from those obtained earlier, and to involve mobile specific features. However, the results show that the issues are broadly similar to those in general web application development, although with some minor differences, such as geolocation and device resolution figuring prominently.

Figure 4.8: Categories of mobile JavaScript-based discussions.

Figure 4.8 shows JavaScript discussions related to mobile development. As can be seen in Figure 4.8, JavaScript discussions related to topics such as file handling and event handling are prevalent in mobile development, which was also the case for web applications. However, mobile specific discussions such as device API, device resolution, and geolocation are also prevalent, but were not found in web development discussions.

Figure 4.9: Categories of mobile HTML5-based discussions.

Figure 4.9 shows HTML5 discussions related to mobile development. As was the case for web applications, browser support related discussions are prevalent in mobile applications, as are discussions related to media and form elements. However, mobile specific issues such as touch events are also prevalent, which was not the case for web applications.

Figure 4.10: Categories of mobile CSS-based discussions.

Figure 4.10 shows CSS discussions related to mobile development. Mobile-specific issues such as device resolution, zooming, and touch sensitivity are prevalent among the discussions, next to generic topics such as cross-browser, box model, and layout.

A detailed list of keywords attached to each category for mobile related discussions in JavaScript, HTML5, and CSS can be found in Tables A.4-A.6 in the appendix.

Finding #4: Discussions related to mobile development are seeing an increasing share of web technologies such as HTML5, and follow a similar trend as in web applications.

4.5 Technical Challenges

To gain insights into the kind of technical difficulties faced by web programmers, we ranked the questions in the three datasets based on their relative importance using Equation 3.1.
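Equation 3.1, defined in Chapter 3, combines per-question statistics into a single importance score. Purely to illustrate the general shape of such a ranking, the sketch below computes a weighted score and sorts questions by it; the factors, weights, and sample records are assumptions made for this illustration only and are not the actual terms, weights, or data of Equation 3.1.

    // Illustrative only: a weighted importance score over per-question
    // statistics from the Stack Overflow data dump. The factors and weights
    // below are assumptions for this sketch; they are NOT the actual terms
    // or weights of Equation 3.1.
    function importanceScore(question) {
      return 0.4 * Math.log(1 + question.viewCount) +
             0.3 * question.score +
             0.2 * question.answerCount +
             0.1 * question.favoriteCount;
    }

    // Made-up example records, used only to show how the ranking is applied.
    var questions = [
      { title: 'Question A', viewCount: 120000, score: 310, answerCount: 12, favoriteCount: 85 },
      { title: 'Question B', viewCount: 4000, score: 12, answerCount: 3, favoriteCount: 2 }
    ];

    // Sort by descending importance and keep the top 50 questions.
    var top50 = questions
      .slice()
      .sort(function (a, b) { return importanceScore(b) - importanceScore(a); })
      .slice(0, 50);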
We then manually analyzed the top 50 questions from each of the three categories and, based on the topics discussed, extracted the dominant categories in the questions. In this section, we discuss a few examples from these top 50 questions that are representative of the types of technical challenge that web developers face in their daily development activities.

Issue 1: In HTML5, developers face challenges while working with the new HTML5 JavaScript objects such as localStorage. For example, the following question was asked by a user on Stack Overflow (http://stackoverflow.com/questions/2010892):

"I'd like to store a JavaScript object in HTML5 localStorage, but my object is apparently being converted to a string. I can store and retrieve primitive JavaScript types and arrays using localStorage, but objects don't seem to work. Should they?" (sic)

At first glance, it seems the question is related to the data types that localStorage can store. However, the accepted answer below provides a solution combining existing techniques used by JavaScript programmers to convert objects into strings, showing that this was the main point of confusion for the user.

    var testObject = { 'one': 1, 'two': 2, 'three': 3 };
    localStorage.setItem('testObject', JSON.stringify(testObject));
    var retrievedObject = localStorage.getItem('testObject');
    console.log('retrievedObject: ', JSON.parse(retrievedObject));

Issue 2: Issues related to the Canvas API are confusing for many developers. These issues vary from simple API calls to complex scripts. The following question was asked by a user with a reputation score of 13,151 on Stack Overflow (http://stackoverflow.com/questions/923885), which is significantly higher than the average user reputation (135) on Stack Overflow, pointing to the fact that the user is an expert developer:

"Is it possible to capture or print what's displayed in an HTML canvas as an image or PDF? I'd like to generate an image via canvas, and I'd like to be able to generate a PNG from that image." (sic)

The accepted answer (below) provided for this question is a simple call to one of the canvas API functions.

    var canvas = document.getElementById("mycanvas");
    var img = canvas.toDataURL("image/png");
    document.write('<img src="' + img + '"/>');

Questions such as this clearly indicate that there is a lack of proper and clear API documentation for HTML5. We manually analyzed the HTML5 documentation provided by the W3C and inferred that it lacks many details that developers would need on a daily basis.

Issue 3: HTML5 developers also face browser support issues when trying to make their sites compatible, as the example below shows (http://stackoverflow.com/questions/3726357):

"I have just installed IE9 beta and on a specific site I created (HTML5) IE9 jumps to compatibility mode unless I manually tell it not to. I have tried removing several parts of the website but no change. Including removing all CSS includes. On some other website of me it goes just fine." (sic)

A simple solution (marked as the accepted answer) is to tell the browser that the site is X-UA-Compatible (the X-UA-Compatible meta tag allows web authors to choose which version of Internet Explorer the page should be rendered as) by adding an additional meta tag.

    <meta http-equiv="X-UA-Compatible" content="IE=Edge"/>

The above question was asked by a user with a reputation score of 24,453, which implies the user is an expert. This points to the fact that many solutions to make HTML5 sites compatible are available but not known to developers.
Issue 4: In CSS tagged discussions, a developer asked the following question (http://stackoverflow.com/questions/2253110):

"I have noticed I am getting a "CSS Explosion". It is becoming difficult for me to decide how to best organize and abstract data within the CSS file." (sic)

The accepted answer provided for this question lists a set of rules that a developer should follow while creating stylesheets. We inferred from the discussion that the developer was aware of the rules, but did not follow them for fear of breaking the layout or incurring possible performance overheads. This CSS maintenance issue has been empirically highlighted before by researchers [22], calling for better tool support.

Finding #5: Even expert programmers get confused about features of JavaScript, HTML5, and CSS, suggesting that the available API resources for these features are far from ideal. Also, maintaining web code, such as CSS, is complex without proper tool support, and users often ignore recommendations and best practices.

4.6 Summary

In this chapter, we presented our results with respect to each research question asked in the previous chapter. The results of our analysis indicate that (1) the overall share of web development related discussions is increasing among developers, (2) browser related discussions are prevalent; however, this share is decreasing with time, (3) form validation and other DOM related discussions have been discussed consistently over time, (4) web related discussions are becoming more prevalent in mobile development, and (5) developers face implementation issues with new HTML5 features such as Canvas.

Chapter 5
Discussion

In this chapter, we discuss the implications of our findings with respect to web developers, researchers, and the web standardization community. We also consider the threats to the validity of our results.

5.1 Implications for Web Developers

Web developers can use the results to focus on and learn from common issues that are discussed among other developers. Educating developers about the common sources of misconception will avoid future errors and eventually save development time. Findings 1 and 3 suggest that while cross-browser issues were important in the past, and have been discussed extensively, they seem to be much less important today. Therefore, developers can shift their focus to other issues, as solutions related to browser issues are available online. The results of Section 4.3 suggest that we need better IDEs that can assist developers when coding against the DOM and Canvas APIs. Finding 5 suggests that we need better API resources for new features in HTML5 and JavaScript, and better code maintenance tool support for CSS.

5.2 Implications for the Research Community

The research community can use our results to focus on specific areas of web development. For example, there have been many papers on cross-browser compatibility testing [23, 34], yet it appears that this is no longer the dominant problem faced by web developers (Finding 3). Rather, the issues confronting web developers today seem to be around DOM and canvas interactions. Analyzing which features of HTML5 are gaining popularity and which features are inconvenient for developers to implement can improve the overall quality of web development. Finding 4 suggests that mobile development follows a similar trend as web applications.
Therefore, predicting which features of HTML5 and JavaScript will be popular in mobile applications can guide developers in building better mobile development tools.

The application of our mining methodology is not restricted to web related discussions. Researchers can use our methodology for analyzing any area of interest by selecting appropriate tags to create a subset of the data. The methodology for addressing RQ2 takes view counts into consideration. As we have seen in Finding 2, view counts provide a different perspective on the relative importance of discussion items. Further, our formula is based on statistics provided by Stack Overflow; however, it is not restricted to Stack Overflow questions. It can be used on any Q&A website, as long as the website provides similar statistics, with suitable modification of the weights. In this work, however, we restrict ourselves to the factors provided by Stack Overflow and use weights similar to those used by Stack Overflow.

5.3 Implications for the Web Standardization Community

The web standardization community can use the results (Finding 5) to extract the areas of web development that need improvement and prioritize them in terms of standardization. The results can be used to analyze which features are lacking in web applications, and which areas need better (API) documentation to enhance developer comprehension. The results can also be used to analyze how long it takes for a particular feature to become popular after being specified in the standard. For example, understanding which features are quickly adopted by developers can aid the development of new features and their standardization. The results can also be used to see which features have rarely been used and discussed among developers.

5.4 Threats to Validity

An external threat to the validity of our results is that we focus on a single website, Stack Overflow. However, Stack Overflow is currently one of the largest and most popular question and answer websites for software developers. At the same time, Stack Overflow is relatively new, having started only in 2008, and hence is not representative of all issues web developers have faced in their development endeavours.

Another external threat is the reproducibility of our results. Stack Overflow is a growing website and hence it might add more features that can affect the quality of our results as well as the proposed metric. To mitigate this issue, we have publicly released the subset of the Stack Overflow dataset that we analyzed, along with its results. The dataset is available for download at http://www.ece.ubc.ca/~kbajaj/so/data.zip.

Another external threat to validity is that there might be duplicate discussions pertaining to the same category of questions. This might affect the category share of topics discovered in our analysis. However, questions are actively marked as duplicates on Stack Overflow if they are likely to be repeated and the solution is already available on Stack Overflow. We removed those questions that were marked as duplicates.

Another external threat to validity is that we assume the questions were asked by web developers during the development phase of their application. However, it is possible that some of the questions might have been asked by quality analysts during the testing phase of the web application.

An internal threat to validity is that we focus only on discussions tagged JavaScript, CSS, and HTML5.
However, as we have seen in Table 3.1, this constitutes a significant number of questions, numbering in the tens of thousands per month. Therefore, we believe that these questions are representative of client-side web development.

Another internal threat to validity is that we assume that users with a high reputation on Stack Overflow are experts. However, it is possible that a user is an expert in one area and a novice in another area where he or she is asking questions. This might affect the results of our Finding 5. However, in our preliminary analysis of the Stack Overflow dataset we found that the average reputation of users asking a question was 16 times lower than that of the users providing the satisfying answer, which leads to the conclusion that in the majority of cases, questions are asked by users who have a low reputation score and are not experts.

Another internal threat to validity is that we use the "HTML5" tag, and not the "HTML" tag, to select the subset of questions for analysis. HTML5 is a recent advancement of HTML and is supported by the majority of the latest mobile devices. This might bias the results of our study towards mobile devices.

A construct threat to validity is that we designed a new metric to rank questions by their importance (Equation 3.1) for qualitatively analyzing questions posed by developers (RQ5), and the fact that we did this part of the analysis manually. However, our metric is based upon statistics collected by Stack Overflow, and uses some of the relative weightings that Stack Overflow itself uses for ranking questions.

Another construct threat to validity is that we base one of our heuristics (H1) on user reputation. A user with a high reputation score could contribute to a certain subset of posts that he or she knows a lot about, but ask questions about areas and languages in which he or she would be considered a novice. However, we observed that the majority of the questions were asked by users with a low reputation score, which means they are not experts in any area.

Chapter 6
Conclusion and Future Work

In this thesis, we performed an empirical analysis of web related discussions on Stack Overflow, a popular question and answer forum, to understand the common difficulties and misconceptions among developers. Our study involves analyzing the text of both questions and answers related to web development to extract the dominant topics of discussion using topic modelling.

Our results show that (1) cross-browser related discussions, while prevalent in the past, are becoming less important, (2) DOM APIs and event handling issues have been a significant source of confusion in web development, (3) HTML5 is gaining popularity in (mobile) web applications, (4) web related topics are becoming more prevalent in mobile development, though the topics are broadly similar to those in other web applications, and (5) even expert programmers are confused by some of the new features added to HTML5 and JavaScript. The results of our study can help the development and research communities focus on the misconceptions and sources of confusion among web developers. They can also help the web standardization community understand the adoption of various standards and the factors impeding their adoption, if any.

6.1 Future Work

While in this study we only analyzed the top 50 questions, we plan to extend our analysis to include the top 1000 questions to gain better insights into the obtained results.

We also plan to look more closely into issues related to mobile development.
To complement this study, we can use the bug reports filed by users of open source mobile applications. This comparison can provide us with detailed insights into which mobile development issues are solved after discussion on Q&A websites and which ones remain unsolved.

Finally, as mentioned in Chapter 1, our main goal in this study is to understand the common challenges and/or misconceptions among web developers. Hence, we plan to utilize the results of our study to build tools and techniques that can assist web developers in overcoming these challenges and accelerate the web development process. One such project is Dompletion [5], which is a DOM-aware JavaScript code completion system.

Bibliography

[1] HTML5 home | Intel developer zone. http://software.intel.com/en-us/html5/home. Accessed: 2014-02-03.
[2] JavaScript - what cross-browser issues have you faced? - Stack Overflow. http://stackoverflow.com/questions/565641/what-cross-browser-issues-have-you-faced. Accessed: 2014-11-02.
[3] Miltiadis Allamanis and Charles Sutton. Why, when, and what: Analyzing Stack Overflow questions by topic, type, and code. In Proceedings of the Tenth International Workshop on Mining Software Repositories, pages 53-56. IEEE Press, 2013.
[4] Ashton Anderson, Daniel Huttenlocher, Jon Kleinberg, and Jure Leskovec. Steering user behavior with badges. In Proceedings of the 22nd International Conference on World Wide Web, pages 95-106. International World Wide Web Conferences Steering Committee, 2013.
[5] Kartik Bajaj, Karthik Pattabiraman, and Ali Mesbah. Dompletion: DOM-aware JavaScript code completion. In Proceedings of the IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 43-54. ACM, 2014.
[6] Kartik Bajaj, Karthik Pattabiraman, and Ali Mesbah. Mining questions asked by web developers. In Proceedings of the Working Conference on Mining Software Repositories (MSR), pages 112-121. ACM, 2014.
[7] Anton Barua, Stephen W Thomas, and Ahmed E Hassan. What are developers talking about? An analysis of topics and trends in Stack Overflow. Empirical Software Engineering, pages 1-36, 2012.
[8] Prashant Belhekar, Alka Londhe, Bhavana Lucy, and Santosh Kumar. Finding bugs in web applications using dynamic test generation. In International Journal of Engineering Research and Technology. ESRSA Publications, 2013.
[9] David M Blei and J Lafferty. Topic models. Text Mining: Classification, Clustering, and Applications, 10:71, 2009.
[10] David M Blei, Andrew Y Ng, and Michael I Jordan. Latent Dirichlet allocation. The Journal of Machine Learning Research, 3:993-1022, 2003.
[11] Joshua Charles Campbell, Chenlei Zhang, Zhen Xu, Abram Hindle, and James Miller. Deficient documentation detection: A methodology to locate deficient project documentation using topic analysis. In Proceedings of the 10th International Working Conference on Mining Software Repositories, pages 57-60. IEEE, 2013.
[12] Marco Cova, Christopher Kruegel, and Giovanni Vigna. Detection and analysis of drive-by-download attacks and malicious JavaScript code. In Proceedings of the 19th International Conference on World Wide Web, pages 281-290. ACM, 2010.
[13] Stephen Dill, Nadav Eiron, David Gibson, Daniel Gruhl, R Guha, Anant Jhingran, Tapas Kanungo, Sridhar Rajagopalan, Andrew Tomkins, John A Tomlin, et al. SemTag and Seeker: Bootstrapping the semantic web via automated semantic annotation. In Proceedings of the 12th International Conference on World Wide Web, pages 178-186. ACM, 2003.
[14] Arjun Guha, Shriram Krishnamurthi, and Trevor Jim.
Using static analysis for Ajax intrusion detection. In Proceedings of the 18th International Conference on World Wide Web, pages 561-570. ACM, 2009.
[15] Hui Han, C Lee Giles, Eren Manavoglu, Hongyuan Zha, Zhenyue Zhang, and Edward A Fox. Automatic document metadata extraction using support vector machines. In Digital Libraries, 2003. Proceedings. 2003 Joint Conference on, pages 37-48. IEEE, 2003.
[16] Ian Hickson and David Hyatt. HTML5: A vocabulary and associated APIs for HTML and XHTML. W3C Working Draft edition, 2011.
[17] Adrian Holzer and Jan Ondrus. Trends in mobile application development. In Mobile Wireless Middleware, Operating Systems, and Applications - Workshops, volume 12, pages 55-64. Springer Berlin Heidelberg, 2009.
[18] Dongseok Jang, Ranjit Jhala, Sorin Lerner, and Hovav Shacham. An empirical study of privacy-violating information flows in JavaScript web applications. In Proceedings of the 17th ACM Conference on Computer and Communications Security, pages 270-283. ACM, 2010.
[19] David Kavaler, Daryl Posnett, Clint Gibler, Hao Chen, Premkumar Devanbu, and Vladimir Filkov. Using and asking: APIs used in the Android market and asked about in StackOverflow. In Social Informatics, pages 405-418. Springer, 2013.
[20] Mario Linares-Vásquez, Bogdan Dit, and Denys Poshyvanyk. An exploratory analysis of mobile development issues using Stack Overflow. In Proceedings of the Tenth International Workshop on Mining Software Repositories, pages 93-96. IEEE Press, 2013.
[21] Lena Mamykina, Bella Manoim, Manas Mittal, George Hripcsak, and Björn Hartmann. Design lessons from the fastest Q&A site in the west. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 2857-2866. ACM, 2011.
[22] Ali Mesbah and Shabnam Mirshokraie. Automated analysis of CSS rules to support style maintenance. In International Conference on Software Engineering (ICSE), pages 408-418. IEEE, 2012.
[23] Ali Mesbah and Mukul R. Prasad. Automated cross-browser compatibility testing. In Proceedings of the International Conference on Software Engineering (ICSE), pages 561-570. ACM, 2011.
[24] Nick Nikiforakis, Luca Invernizzi, Alexandros Kapravelos, Steven Van Acker, Wouter Joosen, Christopher Kruegel, Frank Piessens, and Giovanni Vigna. You are what you include: Large-scale evaluation of remote JavaScript inclusions. In Proceedings of the 2012 ACM Conference on Computer and Communications Security, pages 736-747. ACM, 2012.
[25] Frolin S Ocariza, Kartik Bajaj, Karthik Pattabiraman, and Ali Mesbah. An empirical study of client-side JavaScript bugs. In International Symposium on Empirical Software Engineering and Measurement (ESEM), pages 55-64. IEEE, 2013.
[26] FS Ocariza, Karthik Pattabiraman, and Benjamin Zorn. JavaScript errors in the wild: An empirical study. In Software Reliability Engineering (ISSRE), 2011 IEEE 22nd International Symposium on, pages 100-109. IEEE, 2011.
[27] Aditya Pal, Shuo Chang, and Joseph A Konstan. Evolution of experts in question answering communities. In ICWSM, 2012.
[28] Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. Thumbs up?: Sentiment classification using machine learning techniques. In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing - Volume 10, pages 79-86. ACM, 2002.
[29] Chris Parnin, Christoph Treude, Lars Grammel, and Margaret-Anne Storey. Crowd documentation: Exploring the coverage and the dynamics of API discussions on Stack Overflow. Georgia Institute of Technology, Tech.
Rep., 2012.
[30] Boris Petrov, Martin Vechev, Manu Sridharan, and Julian Dolby. Race detection for web applications. ACM SIGPLAN Notices, pages 251-262, 2012.
[31] Martin Porter. The Porter Stemming Algorithm. 2009.
[32] Paruj Ratanaworabhan, Benjamin Livshits, and Benjamin G Zorn. JSMeter: Comparing the behavior of JavaScript benchmarks with real web applications. In Proceedings of the 2010 USENIX Conference on Web Application Development, pages 3-3. USENIX Association, 2010.
[33] Fatemeh Riahi. Finding expert users in community question answering services using topic models. 2012.
[34] Shauvik Roy Choudhary, Mukul R. Prasad, and Alessandro Orso. X-PERT: Accurate identification of cross-browser issues in web applications. In Proceedings of the International Conference on Software Engineering (ICSE), pages 702-711. IEEE Press, 2013.
[35] Avigit K Saha, Ripon K Saha, and Kevin A Schneider. A discriminative model approach for suggesting tags automatically for Stack Overflow questions. In Proceedings of the Tenth International Workshop on Mining Software Repositories, pages 73-76. IEEE Press, 2013.
[36] Joshua Saxe, David Mentis, and Christopher Greamo. Mining web technical discussions to identify malware capabilities. In 33rd International Conference on Distributed Computing Systems Workshops (ICDCS 2013 Workshops), Philadelphia, PA, USA, 8-11 July, 2013, pages 1-5. IEEE, 2013.
[37] David Sin, Erin Lawson, and Krishnan Kannoorpatti. Mobile web apps - the non-programmer's alternative to native applications. In Human System Interactions (HSI), 2012 5th International Conference on, pages 8-15. IEEE, 2012.
[38] Vibha Singhal Sinha, Senthil Mani, and Monika Gupta. Exploring activeness of users in QA forums. In Proceedings of the Tenth International Workshop on Mining Software Repositories, pages 77-80. IEEE Press, 2013.
[39] Yang Song, Lu Zhang, and C Lee Giles. Automatic tag recommendation algorithms for social recommender systems. ACM Transactions on the Web (TWEB), 5(1):4, 2011.
[40] Yang Song, Ziming Zhuang, Huajing Li, Qiankun Zhao, Jia Li, Wang-Chien Lee, and C Lee Giles. Real-time automatic tag recommendation. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 515-522. ACM, 2008.
[41] Ryan Stevens, Jonathan Ganz, Vladimir Filkov, Premkumar Devanbu, and Hao Chen. Asking for (and about) permissions used by Android apps. In Proceedings of the Tenth International Workshop on Mining Software Repositories, pages 31-40. IEEE Press, 2013.
[42] Siddharth Subramanian and Reid Holmes. Making sense of online code snippets. In Proceedings of the Tenth International Workshop on Mining Software Repositories, pages 85-88. IEEE Press, 2013.
[43] S. L. Ting, W. H. Ip, and Albert HC Tsang. Is naïve Bayes a good classifier for document classification? In International Journal of Software Engineering and Its Applications, pages 37-46. ACM, 2011.
[44] Bogdan Vasilescu, Andrea Capiluppi, and Alexander Serebrenik. Gender, representation and online participation: A quantitative study of StackOverflow. In International Conference on Social Informatics. ASE, 2012.
[45] Joel Weinberger, Prateek Saxena, Devdatta Akhawe, Matthew Finifter, Richard Shin, and Dawn Song. An empirical analysis of XSS sanitization in web application frameworks. Technical report, UC Berkeley, 2011.
[46] Yunhui Zheng, Tao Bao, and Xiangyu Zhang. Statically locating web application bugs caused by asynchronous calls.
In Proceedings of the 20th International Conference on World Wide Web, pages 805-814. ACM, 2011.
[47] Zainab Zolaktaf, Fatemeh Riahi, Mahdi Shafiei, and Evangelos Milios. Modeling community question-answering archives. 2011.

Appendix A
Keywords for Each Category

The tables in the following pages list the keywords discovered in each category using Latent Dirichlet Allocation. Tables A.1-A.3 represent the categories for web related discussions. Tables A.4-A.6 represent the categories for mobile related discussions.

Table A.1: Keywords for categories in JavaScript related questions.
Cross-Browser: javascript ar thi browser support code js work ha make thei don web wai onli implement good veri version
Function Calls: function object var return thi call foo variabl obj properti method log scope prototype global console valu ar alert
jQuery: http net jsfiddle www html org thi jquery https en api plugin demo work jqueri google code javascript github
DOM: div class id function li click span ul href hide element item html var show attr find link data
String Manipulation: string var match return number str replace charact thi length split regex result test valu function text ar replac
Validation: input form type id function val submit button text valu var field document label return checkbox checked false check
Document Structure: script html javascript text js type document jquery page head body src load content tag http file function code
AJAX: data function json ajax url php callback var request post success return call result error response alert id type
Event Handling: event function click handler element document bind false thi target button trigger dialog alert return events fire jqueri onclick
CSS: px width css height style div left top color background position size posit border scroll font margin animate var
Window Object: window function var settimeout location open href document url page timer return thi call hash popup true setinterval timeout
Data Structures: model view function app ext data var backbone render templat thi dojo extend template bind return require ko creat
Table Manipulation: option td select tr id row var table tabl function options text data selected column cell class val br
Canvas API: date var math function time return canvas svg random draw canva month ctx chart year start max point floor
Image Manipulation: img imag src image images var png jpg function url video id load player width height gif alt http
Table A.2: Keywords for categories in HTML5 related questions.
Canvas API: thi canva ar imag wai make creat canvas draw ani time element onli object move work game set javascript
Browser Support: html thi web ar app browser support javascript user server mobil ani applic develop flash wai work android ha
Browser Specific: thi work http chrome html test firefox browser code problem page wa net ve support safari doe doesn tri
HTML5 Attributes: html element thi ar tag attribut content css browser page valid ani document don make wai ha onli thei
JavaScript: function var event document click window false getelementbyid return thi id alert button element code true call addeventlistener log
CSS: px div width height background left css style margin top border color position text padding font box display class
Forms: input type form id text label button submit field email post valu br html user class val valid select
Device Specific: page file load url cach html manifest user thi app cache android ajax content link php request browser server
HTML5 JS: file data server var upload function files send localstorage json string request websocket client php object xhr thi url
DOM Structure: script html text head body js type javascript title src css jquery http content meta href doctype id div
Media: video audio mp plai src player sourc play type html ogg file http controls id tag control sound flash
Canvas Image: imag image img var canvas canva src width height png function data images drawimage document context draw onload jpg
CSS3: svg transform webkit scale deg color animation rotate opacity var anim path fill http style camera gl width moz
Geolocation: map google http var maps org api function navigator position https googl locat geoloc coords error location geolocation en
HTML5 Elements: div data section page id header content role class footer ui article articl mobile theme tab icon button btn
Table: td option tr row table select id tabl db var php result echo data column function cell tx transaction
Fonts: font date time node var xsl format st size pt text pos scope frame url output family fonts bold
Table A.3: Keywords for categories in CSS related questions.
DOM Layout: width height div thi content left set posit element overflow top position column scroll px fix float make css
Browser Specific: css thi ar browser html page media make javascript web wai support onli site ani differ design thei screen
CSS Box-Model: px width left margin height top border padding background position color bottom style solid float auto div text absolute
Div Spacing: div class id content br style html float left container header clear wrapper body test footer thi main title
Widgets: http net jsfiddle ui data tab thi jqueri icon css slider demo widget jquery dialog button tooltip tabs work
Background-Image: img background imag image png images url src repeat jpg alt http height href gif bg width class logo
JavaScript: function var document click event javascript script return jquery jqueri hide window getelementbyid css id show code ready addclass
Display Property: span text block display align inline class line center style space vertical wrap word blah white float html div
List: li ul menu href item class nav list id link hov hover display html home navig dropdown color block
Forms: input button type label form option text id select class field submit textarea checkbox search radio email user html
Background: background color border box px shadow radius webkit gradient moz rgba linear top css bottom white red image solid
Document Structure: html css text type script http head href body link style www title content xhtml stylesheet js rel org
Fonts: font size em text color family weight bold serif sans arial url normal decoration helvetica px line style format
Table: td tr table tabl class row cell width nbsp border style align column col tbody collapse data id center
CSS3: webkit transform transition opacity moz deg anim animation ease rotate filter ms css opac scale transit svg alpha rotat
Table A.4: Keywords for categories in mobile JavaScript related questions.
File Handling: file phonegap html plugin cordova js http app android folder xml www creat write code https api project local
Event Handling: event function document touch false addeventlistener bind fire var click events touchstart element handler preventdefault true return listen tap
Resolution: window width height tab viewport screen scale orient var ui content size device meta zoom ti thi view add
Media: video navigator user ipad iphone useragent android audio match agent os test var version ua window http safari ios
CSS: css div px style li class id width href webkit text height color left button link font ul http
Device API: webview uiwebview nsstring url javascript request stringbyevaluatingjavascriptfromstring method return void view navigationtype objective shouldstartloadwithrequest nsurlrequest bool deleg nsurl webviewdidfinishload
Drag n Drop: scroll var window node element posit document range function overflow position move span select text div offset length top
Image Manipulation: var img image imag camera src push data string options png array length thi return prototype base images photo
Geolocation: map google maps http googl api location position locat www coords marker geolocation geoloc link id latitude html code
DOM: document var element getelementbyid innerhtml createelement function appendchild string iframe ifram id class tag body label foo hittestresult length
Mobile SQL: tx db function id executesql var table transaction insert results row result return item rows values data errorcb sql
Facebook API: login connection frame id fb url user connect facebook post password states response redirect els ip network server app

Table A.5: Keywords for categories in mobile HTML5 related questions.
Browser Support: browser android support ios thi safari work html iphon test doe ipad mobil devic don ar version webkit onli
File Handling: file data load cach local html server app store storag ajax manifest json browser databas download folder save request
Media: video audio plai html player mp tag play file src media control work element sourc attribut stream controls load
Event Handling: function var false addeventlistener window document true funct log error call check navigator console phonegap return settimeout alert connection
Touch Events: event function page click bind touch live handl preventdefault prevent button thi jqueri trigger document touchstart element dom handler
Resolution: width imag screen css scale viewport png icon meta height content size px media device pixel image set background
Device API: phonegap app camera phone cordova window open google maps navigator href github plugin map url sms https messag plugins
Form Elements: input type form php date text keyboard number content html email submit element id post test function picker val
Canvas: px canvas ctx var canva height ev webkit width top draw left context transform color style border center css
Table A.6: Keywords for categories in mobile CSS related questions.
Cross Browser: ar mobil browser support web css devic site thei ipad good android ios safari make user iphon design version
Box-Model: px color webkit background border top left margin width bottom height style thi padding css box body html button
Resolution: media width screen px device onli max pixel queri min css iphon ratio portrait orientation devic landscape orient webkit
Zooming: width viewport scale meta content device initial tag maximum user thi scalable head set minimum html zoom page apple
Layout: div height id width class style header data page position container wrapper content px btn section left overflow background
JavaScript: function document var window script javascript return body style navigator js jquery getelementbyid click code onclick attr els modernizr
Image: background imag image img png url images icon size app src repeat jpg sprite limit path center jpeg bg
Touch: event touch function scroll hover touchstart preventdefault click trigger tap document touchend addeventlistener events element user touches bind nav
Form: input span text type select class label range webkit form field remov node focus wrap user placeholder pointer subview
DOM: li class menu display ul tab item block pre div href index make drop list tag inline hover space
CSS3 Animations: transform webkit translate transition transit anim ease px opacity move act ms property css moz hardwar rotate animation translatex
Fonts: font size family fonts text url ttf webfont weight format bold style normal src rule fac svg siz pt
