UBC Theses and Dissertations
Mobile App Development: Challenges and Opportunities for Automated Support, by Erfani Joorabchi, Mona (2016)

Full Text

Mobile App Development: Challenges and Opportunities for Automated Support

by

Mona Erfani Joorabchi

B.Sc., Shahid Beheshti University, Iran, 2007
M.Sc., Simon Fraser University, Canada, 2010

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor of Philosophy in THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES (Electrical and Computer Engineering)

The University of British Columbia (Vancouver)

April 2016

© Mona Erfani Joorabchi, 2016

Abstract

Mobile app development is a relatively new phenomenon that is increasing rapidly due to the ubiquity and popularity of smartphones among end-users. As with any new domain, mobile app development has its own set of new challenges. The work presented in this dissertation has focused on improving the state-of-the-art by understanding the current practices and challenges in mobile app development as well as proposing a new set of techniques and tools based on the identified challenges.

To understand the current practices, real challenges and issues in mobile development, we first conducted an explorative field study, in which we interviewed 12 senior mobile developers from nine different companies, followed by a semi-structured survey, with 188 respondents from the mobile development community. Next, we mined and quantitatively and qualitatively analyzed 32K non-reproducible bug reports in one industrial and five open-source bug repositories. Then, we performed a large-scale comparative study of 80K iOS and Android app-pairs and 1.7M reviews by mining the Google Play and Apple app stores.

Based on the identified challenges, we first proposed a reverse engineering technique that automatically analyzes a given iOS mobile app and generates a state model of the app. Finally, we proposed an automated technique for detecting inconsistencies in the same mobile app implemented for iOS and Android platforms. To measure the effectiveness of the proposed techniques, we evaluated our methods using various industrial and open-source mobile apps. The evaluation results point to the effectiveness of the proposed model generation and mapping techniques in terms of accuracy and inconsistency detection capability.

Preface

All of the work presented henceforth was conducted by the author, Mona Erfani Joorabchi. The contributions and evaluations presented in this dissertation are summarized and published in four conference papers. Additionally, the author and an ECE master's student, Mohamed Ali, contributed equally to a conference submission of Chapter 4, which is currently under review.

The following list presents the publications for each chapter.

• Chapter 2:
  – "Real Challenges in Mobile App Development" [86]. M. Erfani Joorabchi, A. Mesbah and P. Kruchten. In Proceedings of the 7th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM 2013). ACM/IEEE. 15–24.
• Chapter 3:
  – "Works For Me! Characterizing Non-reproducible Bug Reports" [87]. M. Erfani Joorabchi, M. Mirzaaghaei and A. Mesbah. In Proceedings of the 11th ACM Working Conference on Mining Software Repositories (MSR 2014). ACM. 62–71.
• Chapter 4:
  – The author and an ECE master's student contributed equally to an ACM SIGSOFT conference submission of this work, which is currently under review.
• Chapter 5:
  – "Reverse Engineering iOS Mobile Applications" [85]. M. Erfani Joorabchi and A. Mesbah. In Proceedings of the 19th IEEE Working Conference on Reverse Engineering (WCRE 2012). IEEE Computer Society. 177–186.
• Chapter 6:
  – "Detecting Inconsistencies in Multi-Platform Mobile Apps" [88]. M. Erfani Joorabchi, M. Ali and A. Mesbah. In Proceedings of the 26th IEEE International Symposium on Software Reliability Engineering (ISSRE 2015). IEEE Computer Society. 450–460.
Regarding ethics approval, the following Human Ethics Certificates were obtained from the UBC Behavioural Research Ethics Board:

• Project Title "A Study of Cross-platform Development and Testing Practices of Mobile Applications" with Certificate Numbers H12-03058 and H15-02247.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgments
Dedication
1 Introduction
  1.1 Research Questions
  1.2 Contributions
2 Real Challenges in Mobile App Development
  2.1 Introduction
  2.2 Study Design
    2.2.1 Methodology
    2.2.2 Data Collection and Analysis
    2.2.3 Participant Demographics
  2.3 Findings
    2.3.1 General Challenges for Mobile Developers
    2.3.2 Developing for Multiple Platforms
    2.3.3 Current Testing Practices
    2.3.4 Analysis and Testing Challenges
  2.4 What Has (not) Changed since 2012? A Follow-up Study
    2.4.1 Survey Design
    2.4.2 Our Participants
    2.4.3 Analysis and Summary of Survey Findings
  2.5 Threats to Validity
  2.6 Discussion
    2.6.1 Mapping Study
    2.6.2 Same App across Multiple Platforms
    2.6.3 Testing Mobile-Specific Features
    2.6.4 Other Challenging Areas
  2.7 Related Work
  2.8 Conclusions
3 Works For Me! Characterizing Non-reproducible Bug Reports
  3.1 Introduction
  3.2 Non-Reproducible Bugs
  3.3 Methodology
    3.3.1 Bug Repository Selection
    3.3.2 Mining Non-Reproducible Bug Reports
    3.3.3 Quantitative Analysis
    3.3.4 Qualitative Analysis
  3.4 Results
    3.4.1 Frequency and Comparisons (RQ1)
    3.4.2 Cause Categories (RQ2)
    3.4.3 Common Transition Patterns (RQ3)
    3.4.4 Fixed Non-reproducible Bugs (RQ4)
  3.5 Discussion
    3.5.1 Quantitative Analysis of NR Bug Reports
    3.5.2 Fixing NR Bugs
    3.5.3 Interbug Dependencies
    3.5.4 Mislabelling
    3.5.5 Different Domains and Environments
    3.5.6 Communication Issues
    3.5.7 Threats to Validity
  3.6 Related Work
  3.7 Conclusions
4 Same App, Two App Stores: A Comparative Study
  4.1 Introduction
  4.2 Methodology
    4.2.1 Data Collection
    4.2.2 Matching Apps to Find App-Pairs
    4.2.3 App-store Attribute Analysis
    4.2.4 User Reviews
    4.2.5 Success Rates
    4.2.6 Datasets and Classifiers
  4.3 Findings
    4.3.1 Prevalence and Attributes (RQ1)
    4.3.2 Top Rated Apps (RQ2)
    4.3.3 Success Rate (RQ3)
    4.3.4 Major Complaints (RQ4)
  4.4 Discussion
    4.4.1 Implications
    4.4.2 Threats to Validity
  4.5 Related Work
  4.6 Conclusions
5 Reverse Engineering iOS Mobile Applications
  5.1 Introduction
  5.2 Related Work
  5.3 Background and Challenges
  5.4 Our Approach
    5.4.1 Hooking into the Application
    5.4.2 Analyzing UI Elements
    5.4.3 Exercising UI Elements
    5.4.4 Accessing the Next View Controller
    5.4.5 Comparing States
    5.4.6 State Graph Generation
  5.5 Tool Implementation: ICRAWLER
  5.6 Empirical Evaluation
    5.6.1 Experimental Objects
    5.6.2 Experimental Design
    5.6.3 Results
    5.6.4 Findings
  5.7 Discussion
  5.8 Conclusions
6 Detecting Inconsistencies in Multi-Platform Mobile Apps
  6.1 Introduction
  6.2 Pervasive Inconsistencies
  6.3 Approach
    6.3.1 Inferring Abstract Models
    6.3.2 Mapping Inferred Models
    6.3.3 Visualizing the Models
  6.4 Tool Implementation
  6.5 Evaluation
    6.5.1 Experimental Objects
    6.5.2 Experimental Procedure
    6.5.3 Results and Findings
  6.6 Discussion
    6.6.1 Comparison Criteria
    6.6.2 Limitations
    6.6.3 Applications
    6.6.4 Threats to Validity
  6.7 Related Work
  6.8 Conclusions
7 Conclusions and Future Work
  7.1 Revisiting Research Questions
  7.2 Future Work and Concluding Remarks
Bibliography

List of Tables

Table 2.1 Interview Participants.
Table 2.2 A mapping study.
Table 3.1 Studied bug repositories and their rate of NR bugs.
Table 3.2 Mapping of BUGZILLA and JIRA fields.
Table 3.3 NR Categories and Rules.
Table 3.4 Descriptive statistics between NR and Others, for each defined metric: Active Time (AT), # Unique Authors (UA), # Comments (C), # Watchers (W), from all repositories.
Table 3.5 Examples of STATUS (RESOLUTION) transitions of NR bug reports.
Table 4.1 Collected app-pair attributes.
Table 4.2 Real-world reviews and their classifications.
Table 4.3 Reviews and subcategories of problem discovery.
Table 4.4 Descriptive statistics for iOS and Android (AND), on Cluster Size (C), Ratings (R), Ratings for all apps (R*), Stars (S), Stars for all apps (S*), and Price (P).
Table 4.5 Statistics of 14 apps used to build the classifiers (C1 = Generic Classifier, C2 = Sentiment Classifier, NB = Naive Bayes Algorithm, SVM = Support Vector Machines Algorithm, Train = Training pool).
Table 4.6 Descriptive statistics for the iOS and Android (AND) reviews for the app-pairs: Problem Discovery (PD), Feature Request (FR), Non-informative (NI), Positive (P), Negative (N), Neutral (NL), and SR (Success Rate).
Table 4.7 Descriptive statistics for the problematic reviews of the app-pairs: Critical (CR), Post Update (PU), Price Complaints (PC), and App Feature (AF).
Table 5.1 Experimental objects.
Table 5.2 Characteristics of the experimental objects.
Table 5.3 Results.
Table 6.1 Six combinations for mapping.
Table 6.2 Characteristics of the experimental objects, together with total number of edges, unique states, elements and manual unique states counts (MC) across all the scenarios.
Table 6.3 Number of reported inconsistencies by CHECKCAMP, validated, average and percentage of their severity with examples in each app-pair.
Table 6.4 Bug severity description.

List of Figures

Figure 2.1 How many years of work experience do you have in software development.
Figure 2.2 How many years of work experience do you have in native mobile application development.
Figure 2.3 What platforms do you develop native mobile applications for (check all that apply).
Figure 2.4 How many native mobile applications have you developed so far.
Figure 2.5 What types of native apps have you built (check all that apply)?
Figure 2.6 Which of the following applies to you.
Figure 2.7 If you are working in a company, how big is the native mobile application developer team (including developers for different platforms)?
Figure 2.8 An overview of our four main categories with 31 subordinate concepts.
Figure 2.9 Do you see the existence of multiple platforms as a challenge for developing mobile applications and why?
Figure 2.10 Have you developed the same native mobile app across different platforms?
Figure 2.11 How are your native mobile apps tested?
Figure 2.12 Who is responsible for testing your native mobile apps?
Figure 2.13 How do you test your application's correctness across multiple platforms?
Figure 2.14 What levels of testing do you apply and how?
Figure 2.15 iOS levels of testing.
Figure 2.16 Android levels of testing.
Figure 2.17 Windows levels of testing.
Figure 2.18 Blackberry levels of testing.
Figure 3.1 Overview of our methodology.
Figure 3.2 Active Time.
Figure 3.3 No. of Authors.
Figure 3.4 No. of Comments.
Figure 3.5 No. of Watchers.
Figure 3.6 Overall Rate of NR Categories.
Figure 3.7 Rate of root cause categories in each bug repository.
Figure 3.8 Resolution-to-Resolution Transition Patterns of NR Bug Reports (only weights larger than 2% are shown on the graph).
Figure 4.1 Overview of our methodology.
Figure 4.2 Android Cluster for Swiped app.
Figure 4.3 a) Groupon and b) Scribblenauts apps. Android apps are shown on top and iOS apps at the bottom.
Figure 4.4 Matching App-pair Criteria.
Figure 4.5 Clusters.
Figure 4.6 Ratings.
Figure 4.7 Stars.
Figure 4.8 Prices.
Figure 4.9 The rates of classifiers' categories for our 2K app-pairs, where each dot represents an app-pair.
Figure 4.10 The success rates for our 2K app-pairs, where each dot represents an app-pair.
Figure 4.11 Success rates for 2K app-pairs. The green round shape refers to Android apps and the blue triangular shape refers to iOS apps.
Figure 4.12 The rates of complaints categories for our 2K app-pairs, where each dot represents an app-pair.
Figure 5.1 The Olympics2012 iPhone app going through a UI state transition, after a generated event.
Figure 5.2 The generated state graph of the Olympics2012 iPhone app.
Figure 5.3 Relation between ICRAWLER and a given iPhone app. The right side of the graph shows key components of an iPhone app taken from [125].
Figure 5.4 The new method in which we inject code to set the dismissed boolean and then call the original method.
Figure 5.5 Swapping the original built-in method with our new method in the +load function.
Figure 6.1 The overview of our technique for behaviour checking across mobile platforms.
Figure 6.2 An edge object of MTG iPhone app with its touched element and methods.
Figure 6.3 A snapshot of a state in MTG iPhone app with its captured UI element objects.
Figure 6.4 Visualization of mapping inferences for MTG iPhone (left) and Android (right) app-pairs. The result indicates 3 unmatched states shown with red border (including 2 functionality inconsistencies where iPhone has more states than Android and 1 platform specific inconsistency with MoreViewsController on iPhone). Other 5 matched states have data inconsistencies shown with yellow border.
Figure 6.5 Zooming into a selected State (or Edge) represents detected inconsistencies and UI-structure (or touched element and methods) information of iPhone (left) and Android (right) app-pairs.
Figure 6.6 Plot of precision and recall for the five mapping combinations of each app-pair.
Figure 6.7 F-measure obtained for the five mapping combinations on each app-pair.

Acknowledgments

My deepest gratitude is to my supervisor, Dr. Ali Mesbah. I have been amazingly fortunate to have a supervisor who gave me the freedom to explore on my own, and at the same time the guidance to recover when my steps faltered. You have set an example of excellence as a researcher, advisor, mentor, instructor, and role model.

I would like to thank my thesis committee members, who were more than generous with their expertise and precious time throughout this process; your discussion, ideas, and feedback have been absolutely invaluable. A special thanks to Dr. Philippe Kruchten for his co-authorship of our ESEM paper.

I also need to thank the many great friends from the SALT lab for bearing with me and providing a great research and social atmosphere at the school.

I would like to acknowledge and thank Quickmobile for allowing me to conduct my research and providing any assistance requested. Special thanks go to the members of the product development department for their continued support.

My deepest levels of gratitude also go to my amazing twin sister and best friend, Minoo, who made a significant mark in my life with her presence, support, and encouragement.

Last but not least, I would like to thank my amazing and supportive friend, Dr. Nima Kaviani. Nima walked alongside me from the very starting point of my Ph.D. career to the end. Thanks Nima for all the help, support, compassion, and for your trustful manner.

Dedication

To the greatest blessings of my life: "my mom and my dad, my sister and my brother..." Your support, encouragement, and constant love have sustained me throughout my life and successfully made me the person I am becoming. Thank you!

Chapter 1
Introduction

The ubiquity and popularity of smartphones among end-users has increasingly drawn software developers' attention over the last few years. Mobile apps fall broadly into three categories: native, web-based, and hybrid [157]. Native applications run on a device's operating system and are required to be adapted for different mobile devices and platforms, such as Apple's iOS, Google's Android, Windows Phone, and Blackberry. While this approach provides the best performance and access to all native platform features, the downside is that, in order to build a multi-platform application, the code has to be rewritten for each platform separately. Web-based apps require a web browser on a mobile device. Web technologies such as HTML, CSS, and JavaScript are used to build web-based apps, and multiple platforms can be targeted. However, web technologies are not allowed to access all device features, and performance can suffer. Hybrid apps are 'native-wrapped' web apps, primarily built using HTML5 and JavaScript; the native wrapper container provides access to platform features. Recent surveys [50, 51] reveal that developers are mainly interested in building native apps because they offer the best performance and allow for advanced UI interactions. Throughout this dissertation, we focus on native mobile apps. Henceforth, we use the term 'mobile app' or simply 'app' to denote 'native mobile application'.
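The 'native wrapper' idea behind hybrid apps can be made concrete with a minimal sketch. The example below assumes Android's WebView API and a hypothetical bundled HTML5 page (file:///android_asset/index.html); it is an illustrative sketch of the general pattern, not code from this dissertation.

```java
import android.app.Activity;
import android.os.Build;
import android.os.Bundle;
import android.webkit.JavascriptInterface;
import android.webkit.WebView;

// Minimal sketch of a hybrid app: an HTML5/JavaScript UI rendered in a WebView,
// with one native capability (the OS version) exposed to the web layer through a bridge.
public class HybridShellActivity extends Activity {

    // Methods annotated with @JavascriptInterface become callable from JavaScript as window.Device.*
    public static class DeviceBridge {
        @JavascriptInterface
        public String osVersion() {
            return Build.VERSION.RELEASE; // a platform value the web layer cannot read on its own
        }
    }

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        WebView webView = new WebView(this);
        webView.getSettings().setJavaScriptEnabled(true);
        webView.addJavascriptInterface(new DeviceBridge(), "Device");
        // The HTML/CSS/JavaScript app is bundled with the APK (hypothetical asset path).
        webView.loadUrl("file:///android_asset/index.html");
        setContentView(webView);
    }
}
```

Inside the bundled page, JavaScript can then call window.Device.osVersion(), which is the kind of platform access a plain mobile-web app does not get; the trade-off, as the chapter notes, is that the UI itself still behaves like a web page.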
While mobile app development for devices and platforms including Nokia, BlackBerry, Android, and Windows Phone goes back over 10 years, there has been exponential growth in mobile app development since the Apple app store launched in July 2008 [213]. Since then, other mobile platforms have opened online stores and marketplaces as their distribution channels for third-party apps; the Android Market opened a few months later, followed by BlackBerry App World, Nokia's Ovi Store, and the Windows Phone Marketplace. The easy distribution via the online app stores has significantly lowered the barrier to market entry [64] and, therefore, for the first time in software development history, small companies and single developers have access to distribution infrastructures that allow them to provide their mobile apps to millions of users at the tap of a finger [186]. On the other hand, users were granted control over the apps they download and install on their mobile devices, and they subsequently rate and review apps publicly on the online app stores. As a result, these app stores provide unique and critical communication channels between developers and users, where users can provide relevant information to guide developers in accomplishing their software development, maintenance, and evolution tasks.

Currently, iOS and Android mobile apps dominate the app market, each with over 1.5 million apps in their respective app stores, i.e., Apple's AppStore and the Android Market, and there are hundreds of thousands of apps on the Windows Marketplace and Blackberry AppWorld (http://www.statista.com/statistics/276623/number-of-apps-available-in-leading-app-stores/). Recent estimations indicate that by 2017 over 80 percent of all handset shipments will be smartphones, capable of running mobile apps (http://www.prweb.com/releases/2013/11/prweb11365872.htm).

As with any new domain, mobile app development has its own set of new challenges. Researchers have discussed some of the challenges involved in mobile app development [82, 138, 166, 169, 186]; however, most of the early related discussions are anecdotal in nature. Additionally, to obtain better insights into issues and concerns in software development in general, it is common practice for researchers to investigate other sources of software development data. Thus, several studies have made an effort to understand mobile development issues and concerns through (1) mining mobile bug repositories [63, 112, 155], as they have become an integral component of software development activities; (2) mining and analyzing app stores' content, such as user reviews [71, 124, 185], mobile app descriptions [105, 143, 200], and mobile app bytecode [54, 194, 195]; and (3) mining question and answer (QA) websites that are used by developers [62, 148, 149, 212]. While there are substantial qualitative field studies [106, 133, 203] on different areas of software engineering and traditional software development, limited field studies have been conducted to investigate the actual challenges and issues associated with mobile development.

Thus, we start by conducting the first qualitative field study to gain an understanding of the current practices and challenges in native mobile app development. To this end, we follow a Grounded Theory approach, a research methodology stemming from the social sciences [99], which is gaining increasing popularity in software engineering research [44, 133, 199, 204]. Therefore, instead of starting with predetermined hypotheses, we set our objective to discover the process and challenges of mobile app development across multiple platforms. We first conduct and analyze interviews with 12 senior mobile app developers, from nine different industrial companies, who are experts in platforms such as iOS, Android, Windows Mobile/Phone, and Blackberry.
Based on the outcome of these interviews, we design and distribute an online survey, which has been completed by 188 mobile app developers worldwide.

Our results reveal challenges of dealing with multiple mobile platforms during mobile development. Mobile devices and platforms are extensively moving toward fragmentation, i.e., 1) each mobile platform is different with regard to the programming languages, API/SDK, supported tools, user interface, user experience, and Human Computer Interaction (HCI) standards; 2) on each platform, various devices exist with different properties such as memory, CPU speed, operating system, graphical resolutions, and screen sizes. Developers currently treat the mobile app for each platform separately and manually check that the functionality is preserved and consistent across multiple platforms. Furthermore, mobile developers need better analysis tools in order to track metrics for their apps during the development phase. Additionally, testing is a significant challenge. Current testing frameworks do not provide the same level of support for different platforms, and current testing tools do not support important features for mobile testing such as mobility, location services, sensors, or different gestures and inputs.

In our first study, developers mentioned that one of their challenges is to deal with the crashes that are "very hard to catch and harder to reproduce" [86]. Thus, we perform the first empirical analysis of non-reproducible bug reports. While we start with mobile non-reproducible bugs, we notice that none of the related work investigates non-reproducible bug reports in isolation.
It can also be used for maintenance, analysis and testing purposes(i.e., smoke testing, test case generation).Finally, we use model-based techniques in our last study, in order to address amajor challenge, faced by industrial mobile developers. The challenge is to keepthe app consistent and ensure that the behaviour is the same across multiple plat-forms [86]. Dealing with multiple platforms is not specific to the mobile domain.The problem also exists for cross-browser compatibility testing [74, 161]. How-ever, in the mobile domain, each mobile platform is different with regard to theOS, programming languages, API/SDKs, and supported tools, making it muchmore challenging to detect inconsistencies automatically. In the mobile domain,Rosetta [100] infers likely mappings between the JavaME and Android graphics4APIs. While none of the related work addresses inconsistency detection acrossnative mobile apps, we propose the first automated technique which for the samemobile app implemented for iOS and Android platforms infers abstract modelsfrom the dynamically captured traces and formally compares the app-pair usingdifferent comparison criteria to expose any detected inconsistencies.1.1 Research QuestionsThe goal of this thesis is to understand the current practices and challenges in mo-bile app development as well as proposing a new set of techniques and tools basedon the identified challenges to help mobile app developers. In order to address thisgoal, we designed five research questions. The first three research questions aim toobtain better insights regarding issues and concerns in mobile development through(1) interviewing and surveying developers in the field, (2) mining bug repositories,and (3) analyzing app stores’ content. The last two research questions are follow-up studies, which address the identified challenges by proposing techniques andtools.RQ1. What are the main challenges developers face in practice when theybuild mobile apps?As a preliminary step in our research journey, we start with this basic and crit-ical question. Thus, we conducted a qualitative field study, following a GroundedTheory approach, in which we interviewed 12 senior mobile developers from ninedifferent companies, followed by a semi-structured survey, with 188 respondentsfrom the mobile development community.RQ2. What are the characteristics of non-reproducible bug reports and thechallenges developers deal with?In our first study, developers mentioned that one of their challenge is “deal-ing with the crashes that are very hard to catch and harder to reproduce” [86].While, ideally each bug report should help developers to find and fix a softwarefault, there is a subset of reported bugs that is not (easily) reproducible, on whichdevelopers spend considerable amounts of time and effort. Although we start withmobile non-reproducible bugs, we notice that none of the related work investigatesnon-reproducible bug reports in isolation. Thus, we expand the study to other soft-5ware environment and domains. We perform an empirical analysis of bug reports,in particular, characterizing rate, nature, and root causes of 32K non-reproduciblereports. We quantitatively compared them with other resolution types, using a setof metrics and qualitatively analyzed root causes of 1,643 non-reproducible bugreports to infer common categories of the reasons these reports cannot be repro-duced.RQ3. What are the app-store characteristics of the same mobile app, publishedin different marketplaces? 
How are the major concerns or complaints different oneach platform?Furthermore, our first study indicates that to attract as many users as possible,developers often implement and publish the same app for multiple mobile plat-forms [86]. While, ideally, a given app should provide the same functionality andhigh-level behaviour across platforms, this is not always the case in practice andthere might be known/unknown differences in functionality of the same app-pairsdue to many reasons (legal, marketing, platform, API access). For instance, a userof the Android STARBUCKS app complains: “I downloaded the app so I couldplace a mobile order only to find out it’s only available through the iPhone app.A paying customer is a customer regardless of what phone they have and limitingtheir access to the business is beyond me.” An iOS NFL app review reads: “on theGalaxy you can watch the game live..., on this (iPad) the app crashes sometimes,you can’t watch live games, and it is slow.” Thus, as part of our goal to gain mo-bile development insights, we conducted a large-scale comparative study on 80KiOS and Android mobile app-pairs, in order to analyze and compare their variousattributes, user reviews, and root causes of user complaints at multiple levels ofgranularity. Since we noticed that most of the related work focused on one appstore only and none has studied the same app, published on different app stores,we mine the two most popular app stores i.e., the Google Play and Apple appstores and employ a mixed-methods approach using both quantitative and qualita-tive analysis. We also looked for app-pairs in the top rated 100 free and 100 paidapps listed on Google Play and Apple app stores and identify some of the obstaclesthat prevent developers from publishing their apps in both stores. Additionally,we built three automated classifiers and classified 1.7M reviews to understand howuser complaints and concerns vary across platforms.6RQ4. How can we help developers to better understand their mobile apps?Additionally, the previous study [192] shows that many developers interactwith the graphical user interface (GUI) to comprehend the software by creating amental model of the application. Thus, in order to help developers gain a high-levelunderstanding of their mobile apps, we propose a reverse engineering techniquethat automatically performs dynamic analysis of a given iPhone app by executingthe program and extracting information about the runtime behaviour. Our approachexercises the application’s user interface to cover the interaction state space. Ourtool, called ICRAWLER (iPhone Crawler), is capable of automatically navigatingand generating a state model of a given iPhone app. This generated model canassist mobile developers to better comprehend and visualize their mobile apps. Itcan also be used for maintenance, analysis and testing purposes (i.e., smoke testing,test case generation).RQ5. How can we help developers to automatically detect inconsistencies intheir same mobile app across multiple platforms?Finally, we address a major challenge, we found in our first qualitative study[86]. The challenge, faced by industrial mobile developers, is to keep the app con-sistent and ensure that the behaviour is the same across multiple platforms. Thischallenge is due to the many differences across the platforms, from the devices’hardware to operating systems (e.g., iOS/Android), and programming languagesused for developing the apps (e.g., Objective-C/Java). 
We found that developerscurrently treat the mobile app for each platform separately and manually performscreen-by-screen comparisons, often detecting many cross-platform inconsisten-cies. This manual process is, tedious, time-consuming, and error-prone. Thus,we first identify the most pervasive cross-platform inconsistencies between iOSand Android mobile app-pairs, through industrial interviews as well as a documentshared with us by the interviewees, containing 100 real-world cross-platform mo-bile app inconsistencies. Then, we propose an automated technique and tool, calledCHECKCAMP (Checking Compatibility Across Mobile Platforms), which for thesame mobile app implemented for iOS and Android platforms instruments and gen-erates traces of the app on each platform for a set of user scenarios. Then, it infersabstract models from the captured traces that contain code-based and GUI-basedinformation for each pair, and formally compares the app-pair using different com-7parison criteria to expose any discrepancies. Finally, it produces a visualization ofthe models, depicting any detected inconsistencies.1.2 ContributionsWe have conducted a series of studies on different aspects of mobile app analy-sis. In response to our research questions as outlined in Section 1.1, the followingpapers have been published and one is currently under review:• Chapter 2:– “Real Challenges in Mobile App Development” [86]. M. Erfani Joorabchi,A. Mesbah and P. Kruchten. In Proceedings of the 7th ACM/IEEE In-ternational Symposium on Empirical Software Engineering and Mea-surement (ESEM 2013). ACM/IEEE. 15–24.• Chapter 3:– “Works For Me! Characterizing Non-reproducible Bug Reports” [87].M. Erfani Joorabchi, M. Mirzaaghaei and A. Mesbah. In Proceedingsof the 11th ACM Working Conference on Mining Software Reposito-ries (MSR 2014). ACM. 62–71.• Chapter 4: The author and an ECE master student collaborated equally toan ACM SIGSOFT conference submission of this work and it is currentlyunder review. We addressed four research questions in this work and mymain contributions are:– In RQ1, I was responsible for quantitative analysis of 80K app-pairs’attributes. I compared their attributes such as ratings, stars, categories,prices, versions and calculated their statistics to investigate their differ-ences.– In RQ1, I equally contributed to categorize the reasons of price fluctu-ation.– I was responsible for RQ2.8– In RQ3, I manually inspected and labeled 2.1K problematic reviewsfor training the generic classifier.– In RQ3, I manually inspected and labeled 2.1K problematic reviewsfor training the sentiment classifier.– In RQ3, I calculated the statistical results and figures for the genericand sentiment classes.– In RQ3, I equally contributed to defining success rate.– In RQ4, I was responsible for topic modelling analysis on 20 app-pairs.– In RQ4, I equally contributed to defining classes for the complaintsclassifier.– In RQ4, I manually inspected and labeled 500 problematic reviews forthe complaints classifier.– I equally contributed in writing the paper.• Chapter 5:– “Reverse Engineering iOS Mobile Applications” [85]. M. Erfani Joorabchiand A. Mesbah. In Proceedings of the 19th IEEE Working Conferenceon Reverse Engineering (WCRE 2012). IEEE Computer Society. 177–186.• Chapter 6:– “Detecting Inconsistencies in Multi-Platform Mobile Apps” [88]. M.Erfani Joorabchi, M. Ali and A. Mesbah. In Proceedings of the 26thIEEE International Symposium on Software Reliability Engineering(ISSRE 2015). IEEE Computer Society. 
450–460.9Chapter 2Real Challenges in Mobile App DevelopmentSummary3Mobile app development is a relatively new phenomenon that is increasing rapidlydue to the ubiquity and popularity of smartphones among end-users. The goal ofour study is to gain an understanding of the main challenges developers face inpractice when they build apps for different mobile devices. We conducted a qual-itative study, following a Grounded Theory approach, in which we interviewed12 senior mobile developers from nine different companies, followed by a semi-structured survey, with 188 respondents from the mobile development community.The outcome is an overview of the current challenges faced by mobile develop-ers in practice, such as developing apps across multiple platforms, lack of robustmonitoring, analysis, and testing tools, and emulators that are slow or miss manyfeatures of mobile devices. Our initial study was conducted in 2012; to examinewhether the results of our study still hold in 2015, we survey 15 senior develop-ers, including the senior interviewees in our earlier study, and report the findings.Based on our findings of the current practices and challenges, we highlight areasthat require more attention from the research and development community.3The main study in this chapter appeared at the 7th ACM/IEEE International Symposium onEmpirical Software Engineering and Measurement (ESEM 2013) [86].102.1 IntroductionThe ubiquity and popularity of smartphones among end-users has increasinglydrawn software developers’ attention over the past recent years. Currently, iOS andAndroid each have over 1.5 million mobile apps on Apple’s AppStore and AndroidMarket, and there are hundreds of thousands of apps on Windows Marketplace andBlackberry AppWorld.4 Recent estimations indicate that by 2017 over 80 percentof all handset shipments will be smartphones, capable of running mobile apps.5As with any new domain, mobile application development has its own set ofnew challenges, which researchers have recently started discussing [46, 138, 166].Kochhar et al. [138] discussed the test automation culture among app developersby surveying Android and Windows app developers. Miranda et al. [166] reportedon an exploratory study through semi-structured interviews. Most early relateddiscussions [82, 92, 169, 213], however, are anecdotal in nature. While there aresubstantial qualitative studies on different areas of software engineering, limitedstudies have been conducted to investigate the challenges that mobile app develop-ers face in practice.The goal of our study is to gain an understanding of the current practices andchallenges in native mobile app development. To this end, we conducted an explo-rative study by following a Grounded Theory approach, a research methodologystemming from the social sciences [99], which is gaining increasing popularityin software engineering research [44, 133, 199, 204]. Thus, instead of startingwith predetermined hypotheses, we set our objective to discover the process andchallenges of mobile app development across multiple platforms. We started byconducting interviews with 12 senior mobile app developers, from nine differentindustrial companies. The developers are experts in building mobile apps for plat-forms such as iOS, Android, Windows Mobile/Phone, and Blackberry. 
Based onthe outcome of these interviews, we designed and distributed an online survey,which was properly completed by 188 mobile app developers worldwide.Our results reveal challenges of dealing with multiple mobile platforms duringmobile development. While mobile devices and platforms are extensively moving4http://www.statista.com/statistics/276623/number-of-apps-available-in-leading-app-stores/5http://www.displaysearch.com/pdf/131121 smartphones to pass global mobile phoneshipments by 2017.pdf11toward fragmentation, the contemporary development process is missing the adap-tation to leverage knowledge from platform to platform. Developers currently treatthe mobile app for each platform separately and manually check that the function-ality is preserved across multiple platforms. Furthermore, mobile developers needbetter analysis tools in order to track metrics for their apps during the developmentphase. Additionally, testing is a significant challenge. Current testing frameworksdo not provide the same level of support for different platforms. Additionally,platform-supported tools are required, as the current third party testing solutionshave limited support for important features of mobile testing such as mobility, lo-cation services, sensors, or different gestures and inputs.2.2 Study DesignThe objective of our study is to gain an understanding of the challenges mobile appdevelopers face in practice.2.2.1 MethodologyConsidering the nature of our research goal, we decided to conduct a qualitativestudy by following a Grounded Theory approach [79, 99]. Grounded Theory isbest suited when the intent is to learn how people manage problematic situationsand how people understand and deal with what is happening to them [45]. It is alsouseful when the research area has not been covered in previous studies [130] andthe emphasis is on new theory generation [78], i.e., understanding a phenomenon.Grounded Theory has been gaining more traction in software engineering researchrecently [44, 70, 106, 132, 133, 139, 191, 203, 204].2.2.2 Data Collection and AnalysisOur approach for conducting a Grounded Theory research includes a combinationof interviews and a semi-structured survey. The interviews targeted experts in mo-bile app development and the survey was open to the general mobile developmentcommunity.Our interviews were conducted in an iterative style, and they are at the core ofthe data collection and analysis process. At the end of each interview, we asked12the interviewees for feedback on our set of questions; what is missing and whatis redundant. The analytical process involves collecting, coding and analyzingdata after each interview while developing theory simultaneously. From the in-terview transcripts, we analyze the data line-by-line, break down interviews intodistinct units of meaning (sentences or paragraphs), allocate codes to the text andlabel them to generate concepts to these units. Our codes, where appropriate, aretaken from the text itself. Otherwise, they are created by the authors to capture theemerging concepts. Furthermore, these concepts are then clustered into descriptivecategories. They are re-evaluated and subsumed into higher-order categories inorder to generate an emerging theory. Theoretical sampling evolves into an ever-changing process, as codes are analyzed and categories and concepts continue todevelop [77]. 
We perform constant comparison [99] between the analyzed data and the emergent theory until additional data being collected from the interviews adds no new knowledge about the categories. Thus, once the interviewees' answers begin to resemble the previous answers, a state of saturation [98] is reached, and that is when we stop the interviewing process.

Based on the theory emerging from the interview phase, we designed a semi-structured survey, as another source of data, to challenge this theory. Before publishing the survey and making it publicly available, we asked four external people (one senior Ph.D. student and three mobile app developers) to review the survey in order to make sure all the questions were appropriate and easily comprehensible. Most of our survey questions are closed-ended, but there are also a few optional open-ended questions for collecting participants' 'insights' and 'experiences'. The responses to these open-ended questions are fed into our coding and analysis step to refine the results, where applicable. This survey, as distributed to participants, is available online (http://www.ece.ubc.ca/~merfani/survey.pdf). Additionally, the first page of the survey includes the purpose and procedures of the study, potential risks and benefits, privacy and confidentiality, our contact information, and consent.

Table 2.1: Interview Participants.

ID  | Role               | Platform Experience      | Software Dev Exp (yr) | Mobile Dev Exp (yr) | Company (Mobile Dev Team Size) | Company's Platform Support
P1  | iOS Lead           | iOS, Android             | 6-10 | 6   | A (20)   | iOS, Android, Windows, Blackberry
P2  | Android Lead       | Android, iOS             | 6-10 | 6   | A (20)   | iOS, Android, Windows, Blackberry
P3  | Blackberry Lead    | Blackberry, iOS, Android | 6-10 | 6   | A (20)   | iOS, Android, Windows, Blackberry
P4  | iOS Lead           | iOS                      | 6-10 | 3-4 | B (2-5)  | iOS, Android
P5  | Android Lead       | Android                  | 6-10 | 3   | B (2-5)  | iOS, Android
P6  | iOS Dev            | iOS                      | 4-5  | 3-4 | C (20+)  | iOS, Android
P7  | Windows Mobile Dev | Windows, Android         | 10+  | 2   | D (1)    | Windows
P8  | Android Dev        | Android                  | 4-5  | 2-3 | E (2-5)  | iOS, Android
P9  | Android Lead       | Android, iOS, Windows    | 10+  | 5-6 | F (6-10) | iOS, Android, Windows
P10 | iOS Dev            | iOS, Android             | 10+  | 3   | G (1)    | iOS, Android
P11 | Android Lead       | Android, Blackberry      | 10+  | 6+  | H (1)    | Android, Blackberry
P12 | iOS Dev            | iOS, Windows             | 10+  | 2-3 | I (2-5)  | iOS, Windows

2.2.3 Participant Demographics

Interviews. We interviewed 12 experts from nine different companies in Canada. Each interview session took on average around 30 minutes. We recorded audio in the interview sessions and then transcribed them for later analysis. Table 2.1 presents each participant's role in their company, the mobile platforms they have expertise in, the number of years of work experience they have in software development and in mobile app development, the size of the mobile development team, and finally all the mobile platforms that each company supports. Regarding the participants' experience in developing mobile apps, five have around six years, four have 3-4 years, and three have 2-3 years of experience. Five participants are mainly iOS experts, five are Android experts, one is a Windows expert, and finally one is a Blackberry expert. In addition, the category distribution of their apps includes Tools/Utilities, Business, Social Networks, Maps/Navigation, Games, Education, Travel, Music, Videos, Sports, Entertainment, and (events and conferences, medical professionals and self-improvement) categories.

Survey. Our survey was fully completed by 188 respondents. We released the survey to a wide variety of mobile development groups.
We targeted the popular Mobile Development Meetup groups and LinkedIn groups related to native mobile development, and shared the survey through our Twitter accounts. We kept the survey live for two and a half months. No reward or incentive (such as donations) was offered to the participants. In our attempt to distribute our online survey, it was interesting to see people's reactions; they liked our post on LinkedIn groups and gave encouraging comments such as "I hope it will help to make mobile app developers' lives easier". The following shows our original invitation message:

"Are you a native mobile app developer? Please participate in our research study to help us understand the real challenges: Link to the survey. Thanks,"

Figure 2.1: How many years of work experience do you have in software development? (Less than 2 years: 13%; 2-5 years: 20.9%; 6-10 years: 14.7%; more than 10 years: 51.4%)

Figure 2.2: How many years of work experience do you have in native mobile application development? (Less than 1 year: 16.4%; 1-3 years: 57.6%; 4-6 years: 19.2%; more than 6 years: 6.8%)

The demographics of the participants in the survey are as follows. 92% were male and 5% female; they come from the USA (48%), India (11%), Canada (10%), Israel (5%), The Netherlands (3%), UK (3%), New Zealand (2%), Mexico (2%), and 15 other countries.

The histogram of their work experience in software development is shown in Figure 2.1, where 52% have more than 10 years, 15% between 6-10 years, 20% between 2-5 years, and 13% less than 2 years. Their experience in native mobile development, shown in Figure 2.2, ranges from more than 6 years (6%) and 4-6 years (19%) to 1-3 years (59%) and less than 1 year (16%).

The platforms they have expertise in are presented in Figure 2.3, which include 72% iOS, 65% Android, 26% Windows, 13% Blackberry, and 6% who chose others (e.g., Symbian, J2ME). In terms of the number of mobile apps they have developed, Figure 2.4 depicts that 64% have developed less than 10 apps, 22% have developed 10-20 apps, and the rest more than 20 apps. As shown in Figure 2.5, they built different types of apps such as Tools/Utilities, Business, Social Networks, Maps, Games, Education, and more.

Figure 2.6 indicates that 25% are freelance mobile developers, 33% work in a company, while 33% do both.
As shown in Figure 2.7, 42% work with 2-5 native mobile app developers (including developers for different platforms), 27% are the only mobile app developer, and 15% work with 6-10 other developers.

Figure 2.3: What platforms do you develop native mobile applications for (check all that apply)? (iOS (iPhone/iPad/iPod): 72.3%; Android: 65.5%; Blackberry: 13.6%; Windows Mobile/Phone/8: 25.4%; Other: 10.7%)

Figure 2.4: How many native mobile applications have you developed so far? (Less than 10: 63.8%; 10-20 apps: 22%; more than 20: 14.1%)

Figure 2.5: What types of native apps have you built (check all that apply)? (Games: 31.1%; Weather: 9.6%; Social Networking: 38.4%; Maps/Navigation/Search: 32.2%; Music: 15.3%; News: 21.5%; Entertainment: 33.3%; Banking/Finance: 16.9%; Video/Movies: 15.8%; Shopping/Retail: 18.6%; Sports: 11.9%; Communication: 10.7%; Food/Restaurant: 14.7%; Travel: 14.7%; Health: 18.1%; Education/Learning: 31.6%; Tools/Utilities: 51.4%; Business: 46.9%; Other: 5.1%)

Figure 2.6: Which of the following applies to you? (Freelance app developer: 26%; app developer in a company: 34.5%; both of the above: 32.2%; other: 7.3%)

Figure 2.7: If you are working in a company, how big is the native mobile application developer team (including developers for different platforms)? (Only me (one developer): 27.1%; 2-5 developers: 42.4%; 6-10 developers: 14.7%; other: 15.8%)

2.3 Findings

The findings from our study consist of 4 main categories and 31 subordinate concepts. Figure 2.8 presents an overview of our results. For each concept, appropriate codes and quotes are presented in this section.

Figure 2.8: An overview of our four main categories with 31 subordinate concepts.

In addition to the general challenges faced by mobile developers (Section 2.3.1), two major themes emerged from the study, namely (1) challenges of developing mobile apps across multiple platforms (Section 2.3.2), and (2) current practices (Section 2.3.3) and challenges (Section 2.3.4) of mobile app analysis and testing.

2.3.1 General Challenges for Mobile Developers

In this subsection, we present the most prominent general challenges faced by mobile app developers, emerging from our study results.

Moving toward Fragmentation rather than Unification

76% of our survey participants see the existence of multiple mobile platforms as a challenge for developing mobile apps, while 23% believe it is an opportunity for technology advances that drive innovation (see Figure 2.9).

Figure 2.9: Do you see the existence of multiple platforms as a challenge for developing mobile applications and why? (Yes: 75.7%; No: 23.2%; Other: 1.1%)

More than half of the participants mentioned that mobile platforms are moving toward fragmentation rather than unification:

• Fragmentation across platforms: Each mobile platform is different with regard to the user interface, user experience, Human Computer Interaction (HCI) standards, user expectations, user interaction metaphors, programming languages, API/SDK, and supported tools.

• Fragmentation within the same platform: On the same platform, various devices exist with different properties such as memory, CPU speed, and graphical resolutions. Fragmentation is also possible at the operating system level. A famous example is fragmentation on Android devices with different screen sizes and resolutions. Almost every Android developer in both our interviews and survey mentioned this as a huge challenge they have to deal with on a regular basis.

Furthermore, device fragmentation is not only a challenge for development but also for testing. All of our participants believe that platform versioning and upgrading is a major concern. For example, a respondent said: "at the OS level, some methods are deprecated or even removed." So developers need to test their apps against different OS versions and screen sizes to ensure that their app works. Subject P5 said they mostly maintain "a candidate list of different devices and sizes". P11 explained, "because we monitor our application from the feedback of the users, we tend to focus on testing the devices that are most popular." Thus, the current state of mobile platforms adds another dimension to the cost, with a wide variety of devices and OS versions to test against. P11 continued, "right now we support 5 or 6 different (app) versions only because there are different OS versions, and on each of those OS versions we also have 3-4 different screen sizes to make sure the application works across each of the Android versions." A respondent stated, "we did a code split around version 2.3 (Android). So we have two different versions of the applications: pre 2.3 version and post 2.3 version. And in terms of our policy, we made that decision since it is too difficult to port some features."
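The combinatorial cost the participants describe is easy to make concrete: every supported OS version multiplied by every screen configuration is another variant to verify. The sketch below simply enumerates such a test matrix; the particular versions and screen buckets listed are illustrative placeholders, not the participants' actual device lists.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: enumerating the device/OS test matrix that fragmentation forces on developers.
// The concrete OS versions and screen configurations below are placeholders for illustration.
public class DeviceTestMatrix {

    public record Config(String osVersion, String screen) {}

    public static List<Config> buildMatrix(List<String> osVersions, List<String> screens) {
        List<Config> matrix = new ArrayList<>();
        for (String os : osVersions) {
            for (String screen : screens) {
                matrix.add(new Config(os, screen));   // one app variant to verify per combination
            }
        }
        return matrix;
    }

    public static void main(String[] args) {
        List<Config> matrix = buildMatrix(
                List.of("2.3", "4.0", "4.4", "5.0", "6.0"),                    // OS versions still in use
                List.of("small/ldpi", "normal/hdpi", "large/xhdpi", "xlarge/xxhdpi"));
        System.out.println(matrix.size() + " configurations to test");          // 5 x 4 = 20 in this example
    }
}
```

Even this toy example yields 20 configurations; adding manufacturers, locales, and app versions multiplies the number further, which is why several interviewees fall back on a "candidate list" of the most popular devices.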
Monitoring, Analysis and Testing Support

"Historically, there has almost been no one doing very much in mobile app testing", stated P10, who explained that until fairly recently there had been very little testing and very few dedicated testing teams. However, that is changing now and companies have started to reach out for quality and testing. Automated testing support is currently very limited for native mobile apps. This is seen as one of the main challenges by many of the participants. Current tools and emulators do not support important features for mobile testing such as mobility, location services, sensors, or different gestures and inputs. Our results indicate a strong need of mobile app developers for better analysis and testing support. Many mentioned the need to monitor, measure, and visualize various metrics of their apps through better analysis tools.

Open/Closed Development Platforms

Android is open source whereas iOS and Windows are closed source. Some participants argued that Apple and Microsoft need to open up their platforms. P5 explained: "We have real challenges with iOS, not with Android. Because you don't have API to control, so you have to jump into loops and find a back door because the front door is locked. Whatever Apple allows is not enough sometimes." An example of such lack of control is given: "to find out whether we are connected to the Bluetooth." On the other hand, P9 explained that because Android is open source and each manufacturer modifies the source code to their own desires and releases it, sometimes they do not stick to the standards. A simple example is provided: "the standard Android uses commas to separate items in a list, but Samsung phones use a semicolon." A respondent stated, "Many Android devices have been badly customized by carriers and original equipment manufacturers."
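Deviations like the comma-versus-semicolon example above typically have to be absorbed defensively in application code. The snippet below is a purely hypothetical illustration (our own sketch, not from any participant's app) of tolerating either separator:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: accept comma- or semicolon-separated lists so the
// same parsing code behaves identically on stock and vendor-customized
// builds of the OS.
public final class ListParser {

    private ListParser() {}

    public static List<String> parseItems(String raw) {
        List<String> items = new ArrayList<String>();
        if (raw == null) {
            return items;
        }
        for (String token : raw.split("[,;]")) {
            String trimmed = token.trim();
            if (!trimmed.isEmpty()) {
                items.add(trimmed);
            }
        }
        return items;
    }
}
```

The drawback is that such workarounds are discovered by trial and error on specific devices rather than from documented behaviour.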
Data Intensive Apps

Dealing with data is tricky for apps that are data intensive. As a respondent explained: "So much data cannot be stored on the device, and using a network connection to sync up with another data source in the backend is challenging." Regarding offline caching in hybrid solutions, P1 said: "Our apps have a lot of data and offline caching doesn't seem to really work well."

Apps and Programming Languages

Two of our participants explained that there have been a number of comparisons (e.g., performance-wise) between programming languages used for native mobile development such as Java, C, and Objective-C. Java has the huge benefit of being a platform-independent and popular language with many resources and third-party libraries compared to Objective-C. However, P3 stated that Java is not as efficient on a mobile device and is slow. P1 elaborated that "while Apple had built Cocoa framework over many years, recently Objective-C is accepted by the iOS development community. Going with Java would negate a lot of their advantages that they have in Cocoa."

Keeping Up with Frequent Changes

One type of challenge mentioned by many developers is learning more languages and APIs for the various platforms and remaining up to date with highly frequent changes within each software development kit (SDK). "Most mobile developers will need to support more than one platform at some point", a respondent stated. "Each platform is totally different (marketplaces, languages, tools, design guidelines), so you need experts for every one of them. Basically, it is like trying to write simultaneously a book in Japanese and Russian; you need a native Japanese and a native Russian, or quality will be ugly", explained another respondent. As a result, learning another platform's language, tools, techniques, best practices, and HCI rules is challenging.

While many developers complained about learning more languages and APIs for the various platforms and the lack of an integrated development environment that supports different mobile platforms, P1 explained: "Right now we develop in two main platforms: iPhone and Android. That is not really that hard, the native SDKs are pretty mature and they are easy to learn. Additionally, it is not required to have hundreds of thousands of lines of code to do something. You have 50 thousand lines of code and you have a complex app."

2.3.2 Developing for Multiple Platforms

67% of our interview participants and 63% of our survey respondents have experienced developing the same app for more than one mobile platform.

Mobile-web vs. Hybrid vs. Native Mobile Apps

Subjects P1 and P8 support developing hybrid apps. The remaining 10 interviewees are in favour of building pure native apps and believe that the current hybrid model tends to look and behave much more like web pages than mobile applications. P11 argued that "the native approach offers the greatest features" and P4 stated, "user experience on native apps is far superior [compared] to a web app." In a number of cases, the participants had completely moved away from the hybrid to the native approach. A recurring example given is Facebook's switch from an HTML5-based mobile app to a native one.

On the other hand, P1 argued that "it really depends on the complexity and type of the application", for example, "information sharing apps can easily adopt the hybrid model to push news content and updates across multiple platforms."

In the survey, 82% responded having native development experience, 11% have tried hybrid solutions, and 7% have developed mobile web apps. Most respondents are in favour of the native approach.
Regarding mobile web, they stated: "Mobile web doesn't feel or look like any of the platforms." Or: "You will never be able to control some hardware through the web." Or: "Mobile web is a powerful tool appropriate for some uses, but is not extensive enough to replace native."

Regarding the hybrid approach, others said: "HTML5 has much potential and will likely address many of the current problems in the future as it saves development time and cost"; or: "since many big players are investing a lot in HTML5, it may take a big chunk of the front-end side when it becomes stable."

Regarding the native approach, they stated: "There will always be a demand for the specificity of a native app." Or: "In some fields, web or hybrid will prevail; but there are many cases where we need a native app." Most of the participants argued that when development cost is not an issue, companies tend to develop native apps. Of course it also depends on the application type; where better user experience or device-specific features are needed, native seems to be a clear choice.

Lastly, when we asked our participants whether native app development will be replaced by hybrid solutions or mobile web development due to its challenges, all the interviewees and 70% of survey participants disagreed, and 10% indicated that there will always be a combination of native and hybrid approaches: "They'll coexist as they do today in PC world."

Available Cross-Platform Solutions

Regarding common practices for building cross-platform apps, our participants explained that a variety of technologies exists. Hybrid approaches have the concept of recompilation to native with the power of cross-platform execution. PHONEGAP, APPCELERATOR TITANIUM, XAMARIN, CORONA7 and many other tools exist, which follow different approaches and some of which have their own SDK. As expected, there are different attitudes towards them since there are no silver bullets yet. For instance, P1 explained that "with PHONEGAP one basically writes HTML and JavaScript code and it is translated to the native libraries, but they have performance and user experience drawbacks." A respondent said, "I would like to see mobile web applications or hybrid frameworks (e.g. PHONEGAP/APPCELERATOR TITANIUM) reach a level of responsive user experience that truly mimics the experience users will find native to the platform."

7 http://coronalabs.com/

Figure 2.10: Have you developed the same native mobile app across different platforms?

Limiting Capabilities of a Platform's Devices

Not all devices and operating systems of a platform have the same capabilities. For instance, Android has different versions, and browsers in some of those versions have poor support for HTML5. Most of the participants in favour of the hybrid approach believe that once the adaptation is complete (e.g., with mature web browsers in the platforms), there would be more interest from the community in hybrid development.

Reusing Code vs. Writing from Scratch

67% of our interview participants have tried both methods of writing a native mobile app from scratch for a different platform and reusing some portions of the same code across platforms.
The majority stated that it is impossible or challenging to port functionality across platforms and that when code is reused on another platform, the quality of the results is not satisfactory.

Figure 2.10 shows that out of the 63% of survey respondents who have experienced developing mobile apps across different platforms, 34% have written the same app for each platform from scratch, and 20% have experienced porting some of the existing code. A respondent said, "every platform has different requirements for development and porting doesn't always produce quality"; or: "At this moment, I believe that it is best to create the apps from scratch targeting the individual OS." P11 argued that "we ported a very little amount of the code back and forth between Android and Blackberry, but we typically write the code from scratch. While they both use Java, they don't work the same way. Even when basic low levels of Java are the same, you have to rewrite the code."

In addition to the differences at the programming language level (e.g., Java versus Objective-C), P9 elaborated on why migrating code does not work: "A simple example is the way they [platforms] process push messages. In Android, a push message wakes up parts of the app and it requests for CPU time. In iOS, the server would pass the data to Apple push server. The server then sends it to the device and no CPU time to process the data is required." These differences across platforms force developers to rewrite the same app for different platforms, with little or no code reuse. This is seen as one of the main disadvantages of native app development.

Behavioural Consistency versus Specific HCI Guidelines

Ideally, a given mobile app should provide the same functionality and behaviour regardless of the target platform it is running on. However, due to the internal differences in various mobile devices and operating systems, "a generic design for all platforms does not exist"; for instance, P12 stated that "an Android design cannot work all the way for the iPhone." This is mainly because HCI guidelines are quite different across platforms, since no standards exist for the mobile world, as they do for the Web for instance. Thus, developers are constantly faced with two competing requirements:

• Familiarity for platform users: Each platform follows a set of specific HCI guidelines to provide a consistent look-and-feel across applications on the same device. This makes it easier for end users to navigate and interact with various applications.

• Behavioural consistency across platforms: On the other hand, developers would like their application to behave similarly across platforms, e.g., user interaction with a certain feature on Blackberry should be the same as on iPhone and Android.

Thus, creating a reusable basic design that will translate easily to all platforms while preserving behavioural consistency is challenging. As P9 stated: "The app should be re-designed per platform/OS to make sure it flows well"; a respondent put it: "We do screen by screen design review for each new platform"; or: "Different platforms have different strengths and possibilities. It is foolish to try to make the apps exactly the same between platforms"; and: "It requires multi-platform considerations at the designing stage and clever decisions should be made where platform-specific design is necessary." Other respondents explained, "You are writing a lot of native code, across all the different platforms. You have to write it 3-4 times. One of the problems is keeping them consistent.
There are UI and UX differences between different platforms. Android is different from iPhone app and you also want to take advantage of native, but you also want to make sure that the same app is consistent across all the different platforms."

Time, Effort, and Budget are Multiplied

Due to the lack of support for automated migration across platforms, developers have to redesign and reimplement most of the application. Although the application definition, the logic work, and backend connectivity would be similar regardless of platform, the production phase requires adaptations to create native applications for each platform. This is because the flow of application development is different; iOS uses Objective-C for development, Android uses Java, etc. Therefore, creating quality products across platforms is not only challenging but also time-consuming and costly, i.e., "developing mobile apps across platforms natively is like having a set of different developers per each platform", stated P11. As a result, "re-coding against wildly different API sets" increases the cost and time-to-market within the phases of design, development, testing, and maintenance, which is definitely a large issue for start-ups and smaller companies.

Figure 2.11: How are your native mobile apps tested?

2.3.3 Current Testing Practices

As outlined in Subsection 2.3.1, many developers see analysis and testing of mobile apps as an important activity to provide dependable solutions for end-users. Our study results shed light on the current practices of mobile application analysis and testing.

Manual Testing is Prevalent

As shown in Figure 2.11, 64% of our survey participants test their mobile apps manually, 31% apply a hybrid approach, i.e., a combination of manual and automated testing, and only 3% engage in fully automated testing. P3 explained: "Right now, manually is the best option. It's kind of like testing a new game, testing on consoles and devices. It is that kind of testing I believe just maybe smaller, but you have to worry about more platforms and versions." A respondent stated: "Organizations, large and small, believe only in manual testing on a small subset of devices"; and another one said: "It's a mess. Even large organizations are hard to convince to do automated testing", or "I have not used automation testing. I've heard good and bad about it, but most of my apps are small enough to be manually tested."

Figure 2.12: Who is responsible for testing your native mobile apps?

Developers are Testers

There are different combinations of testing processes and approaches currently taken by the industry. They can be categorized based on a company's size, clients, development culture, testing policy, application type, and the mobile platforms supported. These testing approaches are performed by various people such as developers, testing teams, beta testers, and clients, as well as third-party testing services. As indicated in Table 2.1, our interviewees' companies vary from small teams with 1–2 developers to larger mobile development companies or teams with over 20 developers. As expected, larger companies can afford dedicated testing teams or employ beta field testing, while in smaller companies testing is mainly done by developers or clients (end-users) in a more informal and ad-hoc way.
Additionally, large teams use more devices for manual testing and have tried to automate parts of the testing procedure, such as data creation and building and deploying the app onto devices.

Figure 2.12 depicts the results of our survey with regard to the roles responsible for testing. 80% of the respondents indicated that the developers are the testers, 53% have dedicated testing teams or testers, and 28% rely on beta testers.

The majority of the participants, with or without testing teams, stated that after developing a new feature, the developers do their own testing first and make sure it is functional and correct. This is mostly manual testing on simulators and, if available, on physical devices.

Figure 2.13: How do you test your application's correctness across multiple platforms?

Test the App for Each Platform Separately

Our interviews reveal that app developers treat each platform completely separately when it comes to testing. Currently, there is no coherent method for testing a given mobile app across different platforms; being able to handle the differences at the UI level is seen as a major challenge. Testers write "scripts that are specific for each platform", and they "are familiar with the functionality of the app, but are testing each platform separately and individually". We also noticed that there are usually separate teams in the same company, each dedicated to a specific platform with their own set of tools and techniques; P6, an iOS developer, said: "I am not sure about Android, as the teams in our company are so separate and I don't even know what is going on with the other side." Responses provided by the 63% of our survey participants who develop the same native mobile app for more than one platform confirmed the interview results, stating: "The test cases apply to each platform, but they must be implemented uniquely on each platform", or: "I have to do it twice or more depending on how many platforms I have to build it on", or: "Treat them as separate projects, as they essentially are, if native. Do testing independently!"

Figure 2.13 shows that, out of the 63% of our survey participants that develop the same native mobile app for more than one platform, 22% test their apps manually, and 41% indicated that they "test the app for each platform separately".

Figure 2.14: What levels of testing do you apply and how?

Levels of Testing

Levels of testing refer to the stages of testing such as unit, integration, system, regression, GUI, etc. Our study aimed to determine the existence, value and process of the testing stages in mobile app development. Figure 2.14 illustrates the different levels of testing applied to mobile apps. There is very little automation for the different levels of testing, e.g., around 3% for each of GUI, acceptance, and usability testing. P2 noted: "It is not really well structured or formal what we do. We do some pieces of all of them [ad-hoc] but the whole testing is a manual process."

GUI Testing

More than half of the participants admitted that GUI testing is challenging to automate. P2 said: "Automated UI testing is labor intensive, and can cause inertia when you want to modify the UI.
We have a manual tester, core unit testing, then employ beta field testing with good monitoring."

P7 stated: "Our company has Microsoft products. With Microsoft studio interface, you can emulate a lot of sensors for testing GUI whereas in Eclipse for Android, you need to click a lot of buttons. You can emulate the position in your phone, but Android doesn't do this."

P3 elaborated: "Blackberry is actually really hard to create test scripts for GUI testing. Because it is not like other platforms, which are touch-based and layout-based. With Blackberry, you have to know what field manager is and it is hard to actually get this information by clicking on buttons. You have to go through the whole array of elements."

Some tools were highlighted, such as ROBOTIUM8 and MONKEYRUNNER9 for Android. A few iOS developers said they have tried MONKEYTALK10 (formerly called FONEMONKEY) and KIF11 for GUI testing; P1 stated: "I find KIF to be a lot more mature than automation testing provided by Apple, esp. if you want to automate using a build server. Even with KIF you have to write a lot of Objective-C code to work properly. But it is still hard to be used for our custom and dynamic applications."

8 http://code.google.com/p/robotium/
9 http://developer.android.com/tools/help/monkeyrunner_concepts.html
10 http://www.gorillalogic.com/testing-tools/monkeytalk
11 https://github.com/square/KIF
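To give a sense of what a scripted GUI test looks like with one of these tools, the following Robotium-style sketch is illustrative only; the activity class, input indices, and strings are hypothetical placeholders rather than code from any participant's app:

```java
import android.test.ActivityInstrumentationTestCase2;
import com.robotium.solo.Solo;

// Hypothetical Robotium GUI test: LoginActivity, the field order, and the
// expected "Welcome" text are placeholders for a real app under test.
public class LoginUiTest extends ActivityInstrumentationTestCase2<LoginActivity> {

    private Solo solo;

    public LoginUiTest() {
        super(LoginActivity.class);
    }

    @Override
    protected void setUp() throws Exception {
        super.setUp();
        // Solo drives the app's UI through Android's instrumentation layer.
        solo = new Solo(getInstrumentation(), getActivity());
    }

    public void testLoginShowsWelcomeScreen() {
        solo.enterText(0, "alice");   // first EditText on the screen
        solo.enterText(1, "secret");  // second EditText on the screen
        solo.clickOnButton("Login");
        // Fails the test if the expected screen never appears.
        assertTrue(solo.waitForText("Welcome"));
    }

    @Override
    protected void tearDown() throws Exception {
        solo.finishOpenedActivities();
        super.tearDown();
    }
}
```

Even a short script like this is tied to one platform's UI structure and has to be rewritten with a different tool (e.g., KIF in Objective-C) for iOS, which is one reason participants found GUI automation hard to sustain.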
Unit Testing

Our study shows that the use of unit testing in the mobile development community is relatively low. Both interview and survey results (see Figure 2.14) reveal that unit testing for native mobile apps was not commonplace in 2012; however, that has been changing recently (see Section 2.4). Figure 2.14 shows the levels of testing for all participants. The comparison of automated testing across the main platforms is shown in Figure 2.15 – Figure 2.18.

Figure 2.15: iOS levels of testing.

Figure 2.16: Android levels of testing.

Figure 2.17: Windows levels of testing.

Figure 2.18: Blackberry levels of testing.

Some respondents argued that "the relatively small size of mobile apps makes unit testing overkill and deciding whether it's worth writing unit tests or save the time and test manually is always difficult"; or: "Complete unit testing to get full coverage is overkill. We only unit test critical code"; or: "Small projects with small budgets - the overhead of creating rigorous test plans and test cases would have a serious impact on the budget." On the other hand, others said that "the rapidly changing user expectations and technology mean unit testing is crucial." Our interviewees believe that having a test suite for the core generic features of the app is the best approach in the long term. P12 said: "Unit tests are still the best. They are easy to run, and provide immediate feedback when you break something."

Unit testing seems to be more popular among Android and Windows developers, using JUnit and NUnit, respectively.

Two iOS participants have tried writing unit tests for iPhone using XCODE INSTRUMENTS12 as well as the SENTESTINGKIT FRAMEWORK, a built-in Xcode tool. P1 explained: "iOS apps are not really built to be fully unit tested. You have to structure your code properly in order to actually write good unit tests. It is hard to test our apps because a lot of view manipulation logic and business logic are mixed in the controllers and it is hard to write unit tests for controllers. This is one of the MVC's [Model View Controller] shortcomings that could discourage developers from writing unit tests. Testing models are easier with unit-test and a better way to test UI is to write integration/acceptance tests using e.g., KIF or CALABASH.13" P12 argued: "iOS doesn't make it easy to have test automation" and a respondent said: "Apple's Developer testing tools don't play well."

12 https://developer.apple.com/library/mac/documentation/DeveloperTools/Conceptual/InstrumentsUserGuide/Introduction/Introduction.html
13 http://calaba.sh/
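The "core generic features" that our interviewees consider worth covering can usually be exercised with plain JUnit, off-device and without any emulator. The test below is a hypothetical illustration (FareCalculator is an invented class, not taken from any studied app):

```java
import static org.junit.Assert.assertEquals;

import org.junit.Test;

// Plain JUnit 4 sketch: platform-independent business logic can be unit
// tested on a development machine, which keeps the suite fast to run.
public class FareCalculatorTest {

    @Test
    public void discountIsAppliedAtOrAboveThreshold() {
        // Hypothetical constructor: 10% discount on fares of 50.00 or more.
        FareCalculator calculator = new FareCalculator(0.10, 50.00);
        assertEquals(45.00, calculator.totalFare(50.00), 0.001);
    }

    @Test
    public void noDiscountBelowThreshold() {
        FareCalculator calculator = new FareCalculator(0.10, 50.00);
        assertEquals(20.00, calculator.totalFare(20.00), 0.001);
    }
}
```

View controllers and UI code, as P1 notes above, are much harder to isolate this way, which is why interviewees combine such unit tests with integration or acceptance tests.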
Usability Testing

While there is no common agreement on usability standards for all platforms and "usability experts have not agreed on [mobile app] standards like [they have for] the web", our participants acknowledged that "usability is the most important factor for testing on these devices, because that is what people care about." P10 stated, "In the past we have controlled the dialog and the image and the distribution on the web but now the store providers, like Apple and Microsoft have. They control whether we can submit or not. We have to follow their rules, and people have a huge social platform to rank us on the store and it is much more front centre than it was before." End-users nowadays have the ability to collectively rank apps on the mobile app stores. If users like an app, they download and start using it. If not, they delete it and move on immediately. If they really like it, they rank it high; or if they really dislike it, they go on social media and complain. Thus, a low-quality release can have devastating consequences for mobile developers. As a result, there is a huge emphasis on usability testing. As P8 explained, "definitely one of our big challenges is usability testing, which is manual. I do heuristic evaluations personally and then evaluate with real users." P11 elaborated: "Our usability testing encompasses a large portion of not only features but also UI. Within the application, we got a community of testers that are willing to test our newest and greatest software parts and put some feedback on it." Or a respondent said: "You need to spend a good amount of time to fix the UI, usability and performance issues for each platform." Additionally, the next aspect is emotion, as explained by a participant: "the personal attachment of people to their devices cannot be simulated in a test environment. Users take these devices into bed, bathroom, doctor's office. So in very intimate parts of their lives, they are using these apps and they have an emotion attached to that."

Security Testing

As observed by another study for Android apps [63], security bug reports are of higher quality but get fixed more slowly than non-security bugs. While there have been a number of studies related to security, privacy leaks and malware behaviour of mobile apps [48, 67, 84, 189, 207], in practice, as shown in Figure 2.14, security testing has the least priority (i.e., N/A) in the mobile app development community. P10 stated: "I don't do security, but everything else I do. There is very little tool supports to help with this." However, P1 explained that it depends on the extent of testing; for example, if the app is for a major enterprise client for which security is a top priority, it needs more testing than normal apps. He continued: "We have got security audit from various security companies that actually have done security testing on our apps. But we don't do anything internally."

Performance Testing

The more critical the app is, the more performance testing is conducted. As shown in Figure 2.14, performance testing has been performed mostly manually, but some of our participants have used different types of tools. P5 stated that "Because of the nature of our apps which are on goggles, we have to be very critical about performance and battery consumption. So we do a lot of related testing and measure the currency of battery usage." P7 added: "A lot of apps, such as games, require performance, if you develop some middleware to drag down the user experiences, nobody will use it because in games user experience is the most important feature."

Beta Testers and Third Party Testing Services

Beta testing, such as TESTFLIGHT14, seems to be quite popular in mobile app development; although P5 emphasized that "the beta testers are in the order of dozens not thousands." TestFlight automates parts of the process, from deploying the app to collecting feedback. Further, there are many cases in which the clients are responsible for testing, i.e., recruiting beta testers or acceptance testing. P6 explained that they have internal and external client tracking systems: "Basically we have two bug tracking systems, internal and client tracking system (external). The clients create bugs in that tracking system and our testing team try to reproduce bugs to see if it is a valid and reproducible bug. If so they duplicate it in our internal tracking system. Then developers will look at it again."

Additionally, some developers rely on third-party testing services such as PERFECTOMOBILE15 and DEVICEANYWHERE.16 However, "it is usually too volatile and the tools in many cases support very simple apps.
Honestly not really worth the effort", said one of our interviewees. Other participants' attitudes toward testing services vary; P12 argued: "Services should be affordable, and not just report bugs but also provide some documents that indicate how people test the application, and give a high-level overview of all the paths and possibilities that are tested." Another respondent said: "Most online testing services charge a very hefty premium even for apps that are distributed for free"; and: "It is nice to test an app by a third party, someone who is not the developer. At the same time, just random testing doesn't do the trick. You need to have a more methodical approach, but the problem with methodical approaches is that they turn the price up." P11 said: "We don't want to lock in on one specific vendor and tend to use open-source tools, such as JUnit." Another problem mentioned is that "if we want to change something the way we want to, we don't have access to the source code. So we can't change the services of the framework."

14 https://developer.apple.com/testflight/update/
15 http://www.perfectomobile.com/
16 http://www.keynote.com/solutions/testing/mobile-testing

Handling User Workflow Interruption

Related to usability and multiple screens, some of our participants stated that many users go through a workflow using multiple devices. For instance, P10 explained that a user may search for a flight on her smartphone and find a good deal, but booking the flight requires much typing. So she goes on her laptop or tablet and books the flight from there, then she goes back to the smartphone to save the e-ticket. Thus, long workflows tend to make users swap between devices, and apps should be able to handle such interruptions.

2.3.4 Analysis and Testing Challenges

In this subsection, we present the challenges experienced by our interview participants and survey respondents in analyzing and testing native mobile apps.

Limited Unit Testing Support for Mobile Specific Features

Although JUnit is used by more than half of the Android participants, many also point out that "JUnit is designed for stationary applications and it has no interface with mobile specifics such as sensors (GPS, accelerometer, gyroscope), rotation, navigation". As a result, "there is no simple way to inject GPS positions, to rotate the device and verify it that way". P11 explained: "we are creating a 'map application', which requires users typically being outdoors, moving around and navigating, which is not supported by current testing tools." Writing mobile-specific test scenarios requires a lot of code and is time-consuming and challenging. A number of participants indicated that having "a JUnit type of framework with mobile specific APIs and assertions" would be very helpful.

Monitoring and Analysis

Both our interview and survey data indicate a strong need of mobile app developers for better analysis and monitoring support. Many mentioned the need to monitor, measure, and visualize various metrics of their apps such as memory management (to spot memory leaks), battery usage (to optimize battery life), CPU usage, pulling/pushing data, and network performance (over various networks, e.g., 2G, 3G, 4G and wireless connections) through better analysis tools. "A visualization tool such as those hospital monitoring devices with heart rate, blood pressure, etc., would help to gain a better understanding of an app's health and performance", explained P8.

Handling Crashes

One major problem mentioned in mobile app testing concerns crashes, which are often intermittent, non-deterministic, and irrecoverable. It is challenging for developers to capture enough information about these crashes to analyze and reproduce them [220] so that they can be fixed. Many developers in our study found it helpful to have a set of tools that would enable capturing state data as a crash occurs and creating a bug report automatically. P5 stated: "Dealing with the crashes that are very hard to catch and harder to reproduce is an issue. It would be good that when the crashes happen, system logs and crash logs can be immediately captured and sent to developers over the phone."
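One workaround commonly applied on Android, sketched below purely for illustration (not any participant's implementation), is to register a default uncaught-exception handler that records the stack trace together with basic device and OS details before the process dies; the saved report can then be uploaded on the next launch:

```java
import android.app.Application;
import android.os.Build;
import android.util.Log;

// Illustrative sketch: capture crash context at the moment of failure.
// Persisting the report to a file and uploading it later is omitted here.
public class CrashReportingApplication extends Application {

    private static final String TAG = "CrashReporter";

    @Override
    public void onCreate() {
        super.onCreate();
        final Thread.UncaughtExceptionHandler previousHandler =
                Thread.getDefaultUncaughtExceptionHandler();

        Thread.setDefaultUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
            @Override
            public void uncaughtException(Thread thread, Throwable throwable) {
                Log.e(TAG, "Crash on " + Build.MODEL + " / Android "
                        + Build.VERSION.RELEASE, throwable);
                // Let the platform's default handler terminate the process as usual.
                if (previousHandler != null) {
                    previousHandler.uncaughtException(thread, throwable);
                }
            }
        });
    }
}
```

Crash-reporting services that appear in our follow-up study (Section 2.4), such as CRASHLYTICS, build on this kind of hook and add the report delivery and aggregation that participants were missing.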
Emulators/Simulators

Emulators are known to mimic the software and hardware environments found on actual devices, whereas simulators only mimic the software environment. Many mobile developers believe that better support is needed to mimic real environments (e.g., network latency, sensors) for testing. Another issue mentioned is that rooted simulators and emulators are needed in order to access features outside of the application, such as settings, the play store, Bluetooth and GPS, which could be part of a test case. Also, the performance of emulators is a key factor mentioned by many of our participants. Compared to the iOS Simulator, "Android emulator is very slow. I use my device for testing instead", said P8.

Missing Platform-Supported Tools

Almost all of the participants mentioned that current tools are weak and unreliable, with limited support for important mobile testing features such as mobility, location services, sensors and different inputs. They have experienced many automation failures or many cases where testing tools actually slowed the development process down substantially.

Some of our participants stated that platform-supported tools are needed, e.g., "unit testing should be built-in". A respondent said: "the platforms have to support it (testing). 3rd party solutions will never be good enough.", and another one said they need "strong integrated development environment support". Some noted that the process will be similar to that for web applications: "it took years to create powerful tools for analyzing and testing web apps, and we are still not there completely."

Rapid Changes Over Time

Our interviews reveal that requirements for mobile app projects change rapidly and very often over time. This is the reason our participants argued that they have difficulties keeping the testing code up to date. A respondent said: "changing requirements means changing UI/logic, so GUI and integration tests must be constantly rewritten." P1 stated: "there are some testing tools out there, but we don't use any of them because we can't keep the tests updated for our dynamic apps." P10 stated that due to rapid changes, they have "time constraints for creating test scripts and performing proper testing".

Testing Device in the Wild with Many Possibilities to Check

An issue mentioned by some of the participants is the fact that the combination of parameters in the wild is challenging, but it is best to test the app where the users are actually using it.
P10 stated: "Weather condition has an effect on wireless activity and the visual representation of app." Our participants explained that there are so many different possibilities to test, and so many places that could potentially go wrong in mobile apps. Thus, "it is difficult to identify all the usage scenarios and possible use cases while there is a lot of hidden states; for example, enabling/disabling the location services, and weak and strong network for network connectivity". P12 finds: "The relation between apps should be well managed, you might be interrupting other apps, or they might be interrupting yours." P12 provides an example: "manage the states when an audio recording app goes into background." Furthermore, a participant argued that, given missing or misleading usage specifications, they should avoid under-coverage (missing test cases) and over-coverage (waste of time and human resources on testing situations that won't happen in the real world). Another related issue to take care of is upgrading, as P12 explained: "For example, when upgrading from iOS 5 to iOS 6, the permissions are different. So your app in the wild just stop working. Unfortunately, there is nothing that can help you figure those out."

App Stores' Requirements

Developers have to follow mobile app stores' (e.g., Apple's AppStore, Android Google Play, Windows Marketplace and Blackberry AppWorld) requirements to distribute their apps to end users. These requirements change often, and developers need a way to test their apps' conformance. "I would like to have something more robust for me to mimic what the publisher (store) will be doing so that I can catch the error earlier in the development process," said a respondent. Additionally, "pushing the app to individual devices is more complex than necessary", for instance on iPhone.

2.4 What Has (not) Changed since 2012? A Follow-up Study

Three years after our initial qualitative study, we felt it was necessary to find out what has changed and what has remained the same in the app development spectrum of challenges and practices.

2.4.1 Survey Design

This follow-up study was conducted during the months of February and March of 2015, by surveying mobile app experts. First, we created a document outlining the main findings from the initial study, with space for an optional comment on each finding. We also included an open-ended question asking whether the existing mobile app development challenges have changed or whether new challenges have emerged since 2012.17 The goal of the extended study was not to perform a whole new study. We aimed to target, in particular, our original interviewees (the earlier 12 experts) again for a follow-up and see what has changed. Thus, we sent an email providing a link to this survey directly to 25 experts in our network, including all the interviewees in our earlier study as well as other known experts in our network. Similar to the original study, we also shared this link with the popular Mobile Development Meetup and LinkedIn groups related to native mobile development. However, given the nature of this survey, all of our respondents came from the emails we directly sent.

17 http://goo.gl/forms/kp98G1ldpY

2.4.2 Our Participants

We received 15 responses, of which four were from our original pool of interviewees. The 15 respondents were native iOS or Android developers in Canada, and they are from 12 different companies.
Additionally, they have an average of five years of app development experience.

2.4.3 Analysis and Summary of Survey Findings

The survey results reveal interesting findings; they indicate that, overall, the list of challenges from our initial study is still valid and that there are some new or changed challenges, which we discuss next.

Moving toward Fragmentation rather than Unification

External fragmentation, i.e., fragmentation across platforms, seems to have decreased since 2012; as a respondent explained: "the fragmentation [across platforms] is definitely less visible than before. Before, there had to be implementations for Symbian, Blackberry, Android, iOS, and Windows Phone, but this has changed; Android and iOS are the front runners. So by choosing [these] two, a good percentage of users can already be covered."

The challenge with internal fragmentation, i.e., fragmentation within the same platform, used to mainly affect Android developers because of the many hardware variations in devices. However, Apple is also moving in that direction. As a result, there is currently also internal fragmentation with Apple, as it has released new devices with different screen sizes (such as the iPhone 5, 6 and 6+), resulting in more variations. A respondent stated: "They [Apple] have released updated APIs that make supporting the new screen sizes much easier, but it requires using the latest OS version (8.0), which is difficult if you have a legacy codebase."

Open/Closed Development Platforms

Our respondents indicated that mobile development tools have matured significantly and become more stable in the last few years. Apple and Google have released enhanced IDEs (e.g., XCode 6, Android Studio) that make development more efficient. Furthermore, app development has become more popular and it is easier to find skilled developers as the pool of developers has grown.

In terms of Android being open source, the fact that any manufacturer has the ability to customize their version of Android and modify the source is still a major issue on Android. Our respondents explained that manufacturers introduce variations on the underlying implementations of the OS that have the potential of breaking the contract of APIs. When this happens, apps that work on stock or near-stock versions of the OS perform as expected, but the custom versions have unexpected anomalies in the UI and also in hardware components, such as the camera. This results in software fragmentation that is most difficult to deal with for app developers.

On the other hand, as explained by a respondent, "as an Android developer, I have been blessed with two entities; the first being Google engineers and the second being the open-source community. Both these entities have contributed immensely to lower the barriers of mobile app development with tools, libraries and documentation." Another Android developer agreed that "More open-source libraries are now available that are easy to integrate into your mobile app providing robust functionality and interesting features. This helps prevent reinventing the wheel as there are more resources for developers to lean on." However, it was also mentioned that while developers would prefer open-source software, most of the apps out there are still closed-source.

Web vs. Hybrid vs. Native Mobile Apps

In terms of hybrid solutions and cross-platform tools that promise write once, run everywhere, the respondents stated that they are still not mature enough to be used in production code.
Currently, there seems to be more interest in adopting the hybrid approach. A respondent stated, "the trend I have seen recently is that people start with a hybrid implementation with minimal functionality, and as they start gaining traction, they change to a native implementation for performance and better user experience." Another respondent stated, "there is still quite a large technical debate, especially for start-ups, about whether to create native apps (specifically iOS and Android) versus just creating a responsive mobile web experience. This is mostly determined by [the availability of] funds and resources to create, test, monitor, handle customer support issues, and maintain additional software applications."

Although our respondents agreed that mobile browsers are becoming more mature in their support for mobile web apps, they also raised concerns; for instance, one respondent added: "user expectations about the platform have also risen and some UI effects (for example the extensive use of transparency in iOS 7+) are difficult to do non-natively. So non-native apps are trying to hit a moving target, and there is no reason to think that the target will stop moving."

Also, when it comes to more complex apps, hybrid apps cannot fully support everything that is needed; "we have stayed away from hybrid also because it limits the app performance and list of available APIs to use", a respondent stated. Another major problem in the hybrid approach, mentioned by our respondents, is debugging: "it is difficult to debug code when developing a hybrid model as the issue could be in the native or web layer. Also attaching a debugger to the web-layer is slow and makes it extremely difficult to use."

Monitoring and Analysis

All of the respondents still mentioned that monitoring and analysis challenges have further increased today. While mobile app crash reporting tools such as CRASHLYTICS18, Google Analytics19 for mobile apps, and NEWRELIC20 have evolved over time, and a few profiling/monitoring tools are embedded in the IDEs, our respondents believed that "analytics, logging, crash-reporting, etc. are general requirements for any app development and should be available as a framework uniform across all platforms and controllable via configurations. At this moment, each of these is being addressed via certain service providers in a variety of ways."

18 https://try.crashlytics.com/
19 http://www.google.com/analytics/mobile/
20 http://newrelic.com/

Levels of Testing

There is more awareness among developers of the benefits of testing, and more testing tools are emerging to help app developers. Most of our respondents agreed that automated testing tools have improved in the last three years. So has the prevalence of unit and GUI testing in practice. An Android developer stated, "for Unit testing and functional testing, Google and the open-source community have come together once again to provide a toolset that allows developers to really provide unit testing and proper UI testing. Tools like ROBOLECTRIC21 and ESPRESSO22 for Android have brought unit testing and functional testing to a much higher level. But, it is still not easy to write the unit tests, mainly because of all the deep native APIs that are difficult to mock."
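For reference, a UI test in the Espresso style the respondent refers to is only a few lines. The activity, view identifiers, and strings below are hypothetical placeholders, and the imports assume the Android testing support library of that period:

```java
import static android.support.test.espresso.Espresso.onView;
import static android.support.test.espresso.action.ViewActions.click;
import static android.support.test.espresso.action.ViewActions.typeText;
import static android.support.test.espresso.assertion.ViewAssertions.matches;
import static android.support.test.espresso.matcher.ViewMatchers.isDisplayed;
import static android.support.test.espresso.matcher.ViewMatchers.withId;
import static android.support.test.espresso.matcher.ViewMatchers.withText;

import android.support.test.rule.ActivityTestRule;
import android.support.test.runner.AndroidJUnit4;
import org.junit.Rule;
import org.junit.Test;
import org.junit.runner.RunWith;

// Hypothetical Espresso test: SearchActivity and the view ids are placeholders.
@RunWith(AndroidJUnit4.class)
public class SearchScreenTest {

    @Rule
    public ActivityTestRule<SearchActivity> activityRule =
            new ActivityTestRule<>(SearchActivity.class);

    @Test
    public void searchShowsResultsHeader() {
        // Espresso synchronizes with the UI thread, so no manual sleeps are needed.
        onView(withId(R.id.search_field)).perform(typeText("pizza"));
        onView(withId(R.id.search_button)).perform(click());
        onView(withText("Results")).check(matches(isDisplayed()));
    }
}
```

This built-in synchronization removes much of the flakiness that earlier GUI scripts suffered from, although, as the quote notes, deep native APIs still have to be mocked or exercised on a device.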
Another respondent said, "with platforms like GENYMOTION23 allowing to run the same application across multiple devices very quickly and efficiently and then integration test platforms like Jenkins, it has become easier to run the app across many different platforms."

21 http://robolectric.org/
22 https://code.google.com/p/android-test-kit/wiki/Espresso
23 https://www.genymotion.com/

Keeping Up with Frequent Changes

It is still challenging to keep up with new releases, especially with Apple breaking core functionality in new releases. A respondent stated, "it is getting more difficult to develop for iOS now and guidelines are scattered when it comes to App Store submissions. Also, Apple has made this worse by discontinuing tools, buying up others and breaking them, such as TESTFLIGHT."

In terms of keeping up to date with devices, an Android developer explained that "before Google and the open-source community (OSC) really picked up their pace in Android 4.0, backwards compatibility was extremely difficult. Many devices never got upgraded to newer versions. However, Google and the OSC found ways to decouple some of their core APIs into libraries that brought backwards compatibility all the way back to Android 2.0."

Data Intensive Apps

Regarding data-intensive apps, a respondent explained: "syncing with a back-end is challenging but there are third-party tools such as PARSE24 that solve this problem for some use-cases."

24 https://parse.com/products/core

Apart from the aforementioned updates, our respondents agreed that the rest of our initial findings remain the same: "The rest is pretty much the same as it used to be and I think your paper has captured it reasonably well", a respondent stated.

2.5 Threats to Validity

Similar to quantitative research, qualitative studies can suffer from threats to validity, which are challenging to assess, as outlined by Onwuegbuzie et al. [179].

For instance, in codification, researcher bias can be troublesome, skewing the results of the data analysis [132]. We tried to mitigate this threat through triangulation; the codification process was conducted by two researchers, one of whom had not participated in the interviews, to ensure minimal interference of personal opinions or individual preferences. Additionally, we conducted a survey to challenge the results emerging from the interviews.

Both the interview and survey questionnaires were designed by a group of three researchers, with feedback from four external people – one senior Ph.D. student and three industrial mobile app developers – in order to ensure that all the questions were appropriate and easily comprehensible.

Another concern was the degree of generalizability. We tried to draw representative mobile developer samples from nine different companies. Thus, the distribution of participants includes different companies, development team sizes, platforms, application domains, and programming languages – representing a wide range of potential participants. Of course, the participants in the survey also have a wide range of backgrounds and expertise. All this gives us some confidence that the results have a degree of generalizability.

One risk within Grounded Theory is that the resulting findings might not fit with the data or the participants [99]. To mitigate this risk, we challenged the findings from the interviews with an online survey, filled out by 188 practitioners worldwide.
The results of the survey confirmed that the main concepts and codes, generated by the Grounded Theory approach, are in line with what the majority of the mobile development community believes.

Lastly, in order to make sure that the right participants would take part in the survey, we shared the survey link with some of the popular Mobile Development Meetup and LinkedIn groups related to native mobile app development. Furthermore, we did not offer any financial incentives nor any special bonuses or prizes to increase the response rate.

2.6 Discussion

We discuss the challenges that are worth further investigation by the research and development community.

2.6.1 Mapping Study

We complement our quantitative and qualitative analysis with a mapping study in order to show how the research and industry community is investigating some of these challenges. Table 2.2 presents a mapping between the research community and our challenges, listed under Analysis and Testing Studies and Multiple Platforms Studies. Among the analysis and testing studies are model-based approaches, record-and-replay approaches, context-sensitive events, mobile security and privacy leaks, and performance profiling. Among the multiple platforms studies are device fragmentation, cross-compilation approaches, and mappings and consistency checking. Additionally, there exists a body of other challenges recognized by researchers, listed under Other Studies in Table 2.2. Among them are studies related to mobile energy efficiency, app bytecode, the impact of unstable or buggy APIs, the impact of mobile ads, management of informative user reviews, management of bug reports, and technology selection frameworks. We discuss most of them in this section and the related work section.

Table 2.2: A mapping study.

Analysis and Testing Studies:
• Model-based Approach [49, 73, 85, 119, 128, 153, 154, 217]
• Record-and-Replay Approach [102]
• Context-Sensitive Approach [43, 66, 120, 146, 216]
• Mobile Security and Privacy Leaks [47, 54, 67, 84, 196, 218, 221]
• Performance Profiling [142, 151, 177]

Multiple Platforms Studies:
• Android Fragmentation [112, 135]
• Cross-Compilation Approach* [101, 117, 171, 187]
• Mappings and Consistency Checking [88, 100]

Other Studies:
• Mobile Energy Efficiency [57, 60, 176, 188]
• Mobile App Bytecode [194, 205]
• Impact of Unstable/Buggy APIs* [58, 147, 158, 195]
• Impact of Mobile Ads [107, 167, 173, 174]
• Informative User Reviews Management* [71, 105, 110, 123, 152, 181, 184]
• Bug Reports Management* [63, 112, 155, 168]
• Technology Selection Frameworks [157]

* Discussed in Section 2.7.

2.6.2 Same App across Multiple Platforms

Development

A prominent challenge emerging from our study is the fact that developers have to build the same native app for multiple mobile platforms. Although developing for multiple platforms is a recurring problem that is not unique to the mobile world, the lack of proper development and analysis support in the mobile environment exacerbates the challenges. Opting for standardized cross-platform solutions, such as HTML5, seems to be the way to move forward. However, HTML5 needs to be pushed towards maturation and adoption by major mobile manufacturers, which in turn can mitigate many of the cross-platform development problems. Another possible direction to pursue is exploring ways to declaratively construct [114] native mobile applications, by abstracting the implementation details into a model, which could be used to generate platform-specific instances.
That being said, dealing with the criticality of user experience in such apps and the dramatic differences among the platforms, not just in terms of APIs but also in the types of interactions, are among the challenges.

Consistency Checking

Since each mobile platform requires its own unique environment in terms of programming languages, tools, and development teams, another related challenge is checking the correctness and consistency of the same app developed across different platforms. As revealed by our findings, developers currently conduct manual screen-by-screen comparisons of the apps across platforms to check for consistent behaviour. However, this manual process is tedious and error-prone. One way to tackle this problem is by constructing tools and techniques that can automatically infer interaction models from the app on different platforms (see Table 2.2). In Chapter 5, we reverse engineer a model of iOS applications [85]. Similarly, others [119, 217] are looking into Android apps. The models of the app, generated from different platforms, can be formally compared for equivalence on a pairwise basis [162] to expose any detected discrepancies. In Chapter 6, we propose an automated technique for detecting inconsistencies in the same native app implemented for the iOS and Android platforms [88]. Other studies could use image-processing techniques in the mapping phase. Additionally, they could focus on capturing information regarding the API calls made to utilize the device's native functionality, such as GPS, SMS, Address Book, E-mail, Calendar, Camera, and Gallery, as well as the device's network communication, i.e., the client-server communication of platform-specific versions of a mobile app (similar to the cross-platform feature matching of web applications [74]). Such automated techniques would drastically minimize the difficulty and effort of consistency checking, since many mobile developers manually "do screen-by-screen design review for each new platform".

Testing

Regarding the testing challenges, follow-up studies could focus on generating test cases for mobile apps. A centralized automatic testing system that generates a (different) test case for each target platform could be a huge benefit. While platform-specific features can be customized, core features could share the same tests. Thus, further research should focus on streamlining application development and testing efforts regardless of the mobile platform.

2.6.3 Testing Mobile-Specific Features

The existing testing frameworks have limitations for testing mobile-specific features and scenarios such as sensors (GPS, accelerometer, gyroscope), rotation, navigation, and mobility (changing network connectivity). As a consequence, developers either need to write much test fixture code to assert mobile-specific scenarios or opt for manual testing. Thus, creating "a JUnit type of framework with mobile-specific APIs and assertions" would be really beneficial. While there are open-source and commercial tools available in the market that help emulate contextual events, e.g., Genymotion25 or Lockito26, our interviewees mentioned that built-in and platform-supported tools are needed, as it is hard for third-party solutions to be good enough.

25 https://www.genymotion.com/
26 https://play.google.com/store/apps/details?id=fr.dvilleneuve.lockito

Additionally, on the academic side, as listed in Table 2.2, related studies [43, 66, 120, 146] have proposed testing frameworks that consider not only GUI events but also contextual events. Liang et al. [146] present Caiipa, a cloud service for testing apps over an expanded mobile context space in a scalable way. It incorporates key techniques to make app testing more tractable, including a context test space prioritizer to quickly discover failure scenarios for each app. Chandra et al. [66] have developed techniques for scalable automated mobile app testing within two prototype services – VanarSena [190] and Caiipa [146]. In their paper, they describe a vision for SMASH, a cloud-based mobile app testing service that combines both previous systems to tackle the complexities presently faced by testers of mobile apps.
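To illustrate the amount of fixture code that injecting even a single contextual event currently requires, the sketch below pushes a fake GPS fix through Android's test-provider API. It is an assumption-laden example (the helper class is ours; the app must be allowed to provide mock locations and needs the ACCESS_MOCK_LOCATION permission), not part of any framework discussed above:

```java
import android.content.Context;
import android.location.Criteria;
import android.location.Location;
import android.location.LocationManager;
import android.os.SystemClock;

// Illustrative test fixture: feed a mock GPS fix to the app under test.
public final class MockGpsFixture {

    private static final String PROVIDER = LocationManager.GPS_PROVIDER;

    private MockGpsFixture() {}

    public static void injectFix(Context context, double latitude, double longitude) {
        LocationManager manager =
                (LocationManager) context.getSystemService(Context.LOCATION_SERVICE);

        // Register a test provider that shadows the real GPS provider.
        manager.addTestProvider(PROVIDER, false, false, false, false,
                true, true, true, Criteria.POWER_LOW, Criteria.ACCURACY_FINE);
        manager.setTestProviderEnabled(PROVIDER, true);

        Location fix = new Location(PROVIDER);
        fix.setLatitude(latitude);
        fix.setLongitude(longitude);
        fix.setAccuracy(1.0f);
        fix.setTime(System.currentTimeMillis());
        fix.setElapsedRealtimeNanos(SystemClock.elapsedRealtimeNanos());
        manager.setTestProviderLocation(PROVIDER, fix);
    }
}
```

A "JUnit type of framework with mobile-specific APIs and assertions", as requested by our participants, would presumably hide this boilerplate behind higher-level assertions about routes, movement traces, or connectivity changes.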
It incorporates keytechniques to make app testing more tractable, including a context test space prior-itizer to quickly discover failure scenarios for each app. Chandra et al. [66] havedeveloped techniques for scalable automated mobile app testing within two proto-type services – VanarSena [190] and Caiipa [146]. In their paper, they describe a25https://www.genymotion.com/26https://play.google.com/store/apps/details?id=fr.dvilleneuve.lockito49vision for SMASH, a cloud-based mobile app testing service that combines bothprevious systems to tackle the complexities presently faced by testers of mobileapps.2.6.4 Other Challenging AreasThere are also serious needs for (1) rooted emulators that can mimic the hardwareand software environments realistically; (2) better analysis tools, in order to mea-sure and monitor different metrics of the app under development; (3) techniquesthat would help debugging apps by capturing better state data when unexpectedcrashes occur; (4) testing APIs from app stores, in order to catch the inconsisten-cies of code with a store’s guidelines and internal APIs. In particular for Apple appstore, it would be beneficial if a set of testing APIs (e.g., as services) could checkthe code against, before submitting to the stores.2.7 Related WorkResearchers have discussed some of the challenges involved in mobile app de-velopment [46, 82, 92, 138, 166, 169, 206, 213], however, most of these discus-sions are anecdotal in nature. Other recent studies have made an effort to obtainbetter insights regarding issues and concerns in mobile development through (1)mining question and answer (QA) websites [62, 148, 149, 212] that are used bydevelopers; (2) mining and analyzing app stores’ content, such as user-reviews[68, 71, 95, 113, 134–136, 140, 147, 152, 156, 181, 184], mobile app attributesand descriptions [105, 143, 197], mobile app bytecode [54, 194, 195, 218], and (3)mining mobile bug repositories [63, 112, 155, 168]. We categorize related workinto the aforementioned classes as well as cross-platform app development studiesand grounded theory studies in software engineering. We also provide a review ofthe current papers and their relationship with the challenges in mobile developmentthat are widely recognized by the researchers, such as proliferation of malware viafake markets or apps highjacking, crowdsourced requirements, the management ofnon-informative reviews/bug reports from the crowd, and impact of unstable/buggyAPIs.50Mobile App Development and Testing ChallengesRecently, there have been numerous studies [46, 82, 92, 138, 150, 166, 169, 213]related to the development and testing of mobile apps. Kochhar et al. [138] dis-cussed the test automation culture among app developers. They surveyed 83 An-droid and 127 Windows app developers and found that time constraints, compat-ibility issues, lack of exposure, and cumbersome tools are the main challenges.Miranda et al. [166] reported on an exploratory study through semi-structured in-terviews with nine mobile developers. They found that developers perceive theAndroid platform as more accessible and compatible with their existing knowl-edge, however, its fragmentation is the major problem. Additionally, some devel-opers choose iOS because sales are more profitable on that platform. Muccini etal. [169] briefly discussed challenges and research directions on testing mobileapps by analyzing the state of the art. 
Performance, security, reliability, and energyare strongly affected by the variability of the environment where the mobile devicemoves towards. Dehlinger et al. [82] briefly described four challenges they seefor mobile app software engineering and possible research directions. These chal-lenges are namely, creating user interfaces accessible to differently-abled users,developing for mobile application product-lines, supporting context-aware appli-cations, and specifying requirements uncertainty. Franke et al. [92] have shownthat life-cycles of mobile platforms (iOS, Android, Java ME) have issues with theofficial lifecycle models. They presented a way to reverse engineer any mobileapp lifecycle. They found for each platform either errors in the official models,inconsistencies in the documentation or a lack of information in both. Wasserman[213] briefly discussed a number of mobile-related research topics including devel-opment processes, tools, user interface design, application portability, quality, andsecurity.Mining QA WebsitesBeyer et al. [62] presented a manual categorization of 450 Android related postsof StackOverflow concerning their question and problem types using the input ofthree Android app developers. The study highlights that developers have problemswith the usage of API components, such as user interface and core elements. Errors51are mentioned in questions related to Network, Database, and Fragments. Linares-Vasquez et al. [148] used topic modelling techniques to extract hot topics fromStackOverflow mobile-development related questions. Their findings suggest thatmost of the questions include general topics such as IDE-related and compatibilityissues, while the specific topics, such as crash reports and database connection, arepresented in a reduced set of questions. In another study, Linares-Vasquez et al.[149] investigated how changes occurring to Android APIs trigger questions andactivity in StackOverow. They found that Android developers have more questionswhen the behaviour of APIs is modified e.g., deleting public methods from APIs isa trigger for questions. Wang et al. [212] analyzed API-related posts regarding iOSand Android development from StackOverflow to understand API usage challengesbased on forum-based input from a multitude of developers. Bajaj et al. [56] minedStackOverflow for questions and answers related to mobile web apps. They foundthat web-related discussions are becoming more prevalent in mobile development,and developers face implementation issues with new HTML5 features such as Can-vas.Mining App StoresKhalid [134] manually analyzed and tagged reviews of iOS apps to identify the dif-ferent issues that users of iOS apps complain about. They [136] studied 6,390 lowstar-rating user-reviews for 20 free iOS apps and uncovered 12 types of complaints.Their findings suggest that functional errors, feature requests and app crashes arethe most frequent complaints while privacy and ethical issues, and hidden app costsare the complaints with the most negative impact on app ratings. Gorla et al. [105]clustered Android apps by their description topics to identify potentially maliciousoutliers in terms of API usage. Their CHABADA prototype identified severalanomalies in a set of 22K Android apps. Avdiienko et al. [54] compared benignand malicious Android apps by mining their data flow from sensitive sources. Theyfound that the data for sensitive sources ends up in typical sinks that differ betweenbenign and malicious apps. Khalid et al. 
[135] helped game app developers dealwith Android fragmentation by picking the devices that have the most impact ontheir app ratings, and aiding developers in prioritizing their testing efforts. Mining52the user reviews of 99 free game apps, they found that although apps receive userreviews from 38-132 unique devices, 80% of the reviews originate from a smallsubset of devices. Pagano et al. [181] carried out an exploratory study on overone million reviews from iOS apps to determine their potential for requirementsengineering processes. They found that most of the feedback is provided shortlyafter new releases, and the content has an impact on download numbers. Theyalso found that reviews’ topics include user experience, bug reports, and featurerequests. Linares-Vasquez et al. [147] investigated how the fault and change-proneness of APIs used by free Android apps relates to their success estimated asthe average rating provided by the users. They [58] also surveyed 45 Android de-velopers to indicate that apps having high user ratings use APIs that are less fault-and change-prone than the APIs used by low rated apps. As also revealed by ourstudy, McDonnell et al. [158] found that rapid platform/library/API evolution isamong the challenges mobile developers and testers are faced with.Mining Bug RepositoriesHan et al. [112] analyzed fragmentation within Android by extracting topics frombug reports of HTC and Motorola, using topic modelling techniques. They foundthat hardware-based fragmentation affecting the bugs reported in the Android bugrepository as even for shared common topics there was a divergence in topic key-words between vendors. Martie et al. [155] presented an approach to examinethe topics of concern for the Android open-source projects using issue trackers.They used LDA to examine Android bug XML logs and analyzed topic trends anddistribution over time and releases.Cross-platform App DevelopmentThere have been a number of comparison studies [81, 121, 180, 182] of several“write once run anywhere” tools (e.g., PHONEGAP, APPCELERATOR TITANIUM,XAMARIN, etc.). Other studies [83, 101, 129, 157] have analyzed different web-based or hybrid mobile app development frameworks, while others [187] have dis-cussed cross-compilation approach. For instance, Palmieri et al. [182] report acomparison between four different cross-platform tools (RHODES, PHONEGAP,53DRAGONRAD and MOSYNC) to develop applications on different mobile OSs.Huy et al. [121] studied and analyzed four types of mobile applications, namely,native, mobile widgets, mobile web, and HTML5. Masi et al. [157] proposed aframework to support developers with their technology selection process for thedevelopment of a mobile application, which fits the given context and require-ments. Gokhale et al. [101] discussed an approach for developing and deliveringexisting web and desktop applications as mobile apps. Their proposal is a vari-ant of Hybrid development model that utilizes code translators to translate existingweb or desktop applications for the target mobile platforms. Puder et al. 
[187] de-scribed a cross-compilation approach, where Android apps are cross-compiled toC for iOS and to C# for Windows Phone 7, from byte code level to API mapping.Grounded Theory Studies in Software EngineeringMany researchers have used a grounded theory approach in qualitative softwareengineering studies [44, 45, 70, 77, 78, 89, 90, 106, 130, 132, 132, 133, 139, 175,191, 203, 204, 214] in order to understand software development practices andchallenges of industrial practitioners [44]. For instance, Khadka et al. [133] de-scribed an exploratory study where 26 industrial practitioners were interviewed onwhat makes a software system a legacy system, what the main drivers are that leadto the modernization of such systems, and what challenges are faced during themodernization process. The findings were validated through a survey with 198 re-spondents. Greiler et al. [106] conducted a grounded theory study to understandthe challenges involved in Eclipse plug-in testing. The outcome of their interviewswith 25 senior practitioners and a structured survey of 150 professionals providesan overview of the current testing practices, a set of barriers to adopting test prac-tices, and the compensation strategies adopted because of limited testing by theEclipse community. Based on their findings, they proposed a set of recommen-dations and areas for future research on plug-in based systems testing. Througha grounded theory approach, Sulayman et al. [204] performed interviews with 21participants representing 11 different companies, and analyze the data qualitatively.They propose an initial framework of key software process improvement successfactors for small and medium Web companies. Kasurinen et al. [131] discussed54the limitations, difficulties, and improvement needs in software test automation fordifferent types of organizations. They surveyed employees from 31 software devel-opment organizations and qualitatively analyzed 12 companies as individual cases.They found that 74% of surveyed organizations do not use test automation consis-tently. Karhu et al. [130] explored the factors that affect the use of software testingautomation through a case study within 5 different organizations. They collecteddata from interviews with managers, testers, and developers and used groundedtheory. They found that the generic and independent (of third-party systems) testedproducts emphasize on the wide use of testing automation. Coleman et al. [77, 78]adopt the grounded theory methodology to report on the results of their study ofhow software processes are applied in the Irish software industry. The outcome isa theory that explains when and why software process improvement is undertakenby software developers.Our study aims at understanding the actual challenges mobile developers faceby interviewing and surveying developers in the field. To the best of our knowl-edge, our work is the first to report a qualitative field study targeting mobile appdevelopment practices and challenges.2.8 ConclusionsOur study has given us a better, more objective understanding of the real challengesfaced by the mobile app developers today, beyond anecdotal stories.Our results reveal that having to deal with multiple mobile platforms is one ofthe most challenging aspects of mobile development. 
In particular, more recentlythe challenge with internal fragmentation within the same platform is significant.Since mobile devices and platforms are moving toward fragmentation, the devel-opment process cannot leverage information and knowledge from a platform toanother platform. When the ‘same’ app is developed for multiple platforms, de-velopers currently treat the mobile app for each platform separately and manuallycheck that the functionality is preserved across multiple platforms and devices.Also creating a reusable user-interface design for the app is a trade-off betweenconsistency and adhering to each platform’s standards. Our study also shows thatmobile developers need mainly platform-supported analysis tools to measure and55monitor their apps. Also, testing is a huge challenge currently. Most develop-ers test their mobile apps manually. There are more awareness recently for Unittesting within the mobile community, however current testing frameworks do notprovide the same level of support for different platforms. Additionally, most devel-opers feel that current testing tools are weak and have limited support for importantfeatures of mobile testing such as mobility (e.g., changing network connectivity),location services, sensors, or different gestures and inputs. Finally, emulators seemto lack several real features of mobile devices, which makes analysis and testing,even more, challenging.56Chapter 3Works For Me! Characterizing Non-reproducibleBug ReportsSummary27Bug repository systems have become an integral component of software develop-ment activities. Ideally, each bug report should help developers to find and fix asoftware fault. However, there is a subset of reported bugs that is not (easily) re-producible, on which developers spend considerable amounts of time and effort.We present an empirical analysis of non-reproducible bug reports to characterizetheir rate, nature, and root causes. We mine one industrial and five open-sourcebug repositories, resulting in 32K non-reproducible bug reports. We (1) compareproperties of non-reproducible reports with their counterparts such as active timeand number of authors, (2) investigate their life-cycle patterns, and (3) examine 120Fixed non-reproducible reports (i.e., non-reproducible reports that were marked asFixed later in their life-cycle). In addition, we qualitatively classify a set of ran-domly selected non-reproducible bug reports (1,643) into six common categories.Our results show that, on average, non-reproducible bug reports pertain to 17%of all bug reports, remain active three months longer than their counterparts, canbe mainly (45%) classified as “Interbug Dependencies”, and 66% of Fixed non-reproducible reports were indeed reproduced and fixed.27This chapter appeared at the 11th ACM Working Conference on Mining Software Repositories(MSR 2014) [87].573.1 IntroductionWhen a failure is detected in a software system, a bug report is typically filedthrough a bug tracking system. The developers then try to validate, locate, andrepair the reported bug as quickly as possible. In order to validate the existenceof the bug, the first step developers take is often using the information in the bugreport to reproduce the failure. However, reproducing reported bugs is not alwaysstraightforward. In fact, some reported bugs are difficult or impossible to repro-duce. 
When all attempts at reproducing a reported bug are futile, the bug is markedas non-reproducible (NR) [7, 24].Non-reproducible bugs are usually frustrating for developers to deal with [40].First, developers usually spend a considerable amount of time trying to reproducethem, without any success. Second, due to the very nature of these bug reports,there is typically no coherent set of policies to follow when developers encountersuch bug reports. Third, because they cannot be reproduced, developers are reluc-tant to take responsibility and close them.Mistakenly marking an important bug as non-reproducible and ignoring it, canhave serious consequences. An example is the recent security vulnerability foundin Facebook [75], which allowed anyone to post to other users’ walls. Beforeexposing the vulnerability, the person who had detected the vulnerability had filed abug report. However, the bug was ignored by Facebook engineers: “Unfortunatelyyour report [...] did not have enough technical information for us to take action onit. We cannot respond to reports which do not contain enough detail to allow us toreproduce an issue.”Researchers have analyzed bug repositories from various perspectives includ-ing bug report quality [61], prediction [108], reassignment [109], bug fixing andcode reviewing [53, 219], reopening [222], and misclassification [116]. None ofthese studies, however, has analyzed non-reproducible bugs in isolation. In fact,most studies have ignored non-reproducible bugs by focusing merely on the Fixedresolution.In this work, we provide an empirical study on non-reproducible bug reports,characterizing their prevalence, nature, and root causes. We mine six bug reposito-ries and employ a mixed-methods approach using both quantitative and qualitative58analysis. To the best of our knowledge, we are the first to study and characterizenon-reproducible bug reports.Overall, our work makes the following main contributions:• We mine the bug repositories of one proprietary and five open source appli-cations, comprising 188,319 bug reports in total; we extract 32,124 non-reproducible bugs and quantitatively compare them with other resolutiontypes, using a set of metrics;• We qualitatively analyze root causes of 1,643 non-reproducible bug reportsto infer common categories of the reasons these reports cannot be repro-duced. We systematically classify 1,643 non-reproducible bug reports intothe inferred categories;• We extract patterns of status and resolution changes pertaining to all themined non-reproducible bug reports. Further, we manually investigate 120of these non-reproducible reports that were marked as Fixed later in theirlife-cycle.Our results show that, on average:1. NR bug reports pertain to 17% of all bug reports;2. compared with bug reports with other resolutions, NR bug reports remainactive around three months longer, and are similar in terms of the extent towhich they are discussed and/or the number of involved parties;3. NR bug reports can be classified into 6 main cause categories, namely “In-terbug Dependencies”’ (45%), “Environmental Differences” (24%), “Insuf-ficient Information” (14%), “Conflicting Expectations” (12%), and “Non-deterministic Behaviour” (3%);4. 68% of all NR bug reports are resolved directly from the initial status (New/ Open). The remaining 32% exhibit many resolution transition scenarios.5. 
NR bug reports are seldom marked as Fixed (3%) later on; from those thatare finally fixed, 66% are actually reproduced and fixed through code patches(i.e., changes in the source code).593.2 Non-Reproducible BugsMost bug tracking systems are equipped with a default list of bug statuses andresolutions, which can be customized if needed. Generally, each bug report hasa status, which specifies its current position in the bug report life cycle [7]. Forinstance, reports start at New and progress to Resolved. From Resolved, they areeither Reopened or Closed, i.e., the issue is complete. At the Resolved status, thereare different resolutions that a bug report can obtain, such as Fixed, Duplicate,Won’t Fix, Invalid, or Non-Reproducible [7, 24].There are various definitions available for non-reproducible bugs online. Weadopt and slightly adapt the definition used in Bugzilla [24]:Definition 1 A Non-Reproducible (NR) bug is one that cannot be reproducedbased on the information provided in the bug report. All attempts at reproduc-ing the issue have been futile, and reading the system’s code provides no clues asto why the described behaviour would occur.Other resolution terminologies commonly used for non-reproducible bugs in-clude Cannot Reproduce [28], Works on My Machine [40] and Works For Me [41].Our interest in studying NR bugs was triggered by realizing that developersspend considerable amounts of time and effort on these reports. For instance, issue#106396 in the ECLIPSE project has 62 comments from 28 people, discussing howto reproduce the reported bug [23]. This motivated us to conduct a systematiccharacterization study of non-reproducible bug reports to better understand theirnature, frequency, and causes.3.3 MethodologyOur analysis is based on a mixed-methods research approach [80], where we collectand analyze both quantitative as well as qualitative data. All our empirical data isavailable for download [9]. We address the following research questions in ourstudy:RQ1. How prevalent are NR bug reports? Are NR bug reports treated differentlythan other bug reports?60Figure 3.1: Overview of our methodology.RQ2. Why can NR bug reports not be reproduced? What are the most commoncause categories?RQ3. Which resolution transition patterns are common in NR bug reports?RQ4. What portion of NR bug reports is fixed eventually? Were they mislabelledinitially? What cause categories do they belong to?Figure 3.1 depicts our overall approach. We use this figure to illustrate ourmethodology throughout this section.3.3.1 Bug Repository SelectionTo answer our research questions, we need bug tracking systems that provide ad-vanced search/filter mechanisms and access to historical bug report life-cycles.61Since BUGZILLA and JIRA both support these features (e.g., Changed to/fromoperators), we choose projects that use these two systems.Table 3.1 shows the bug repositories we have selected for this study. To ensurerepresentativeness, we select five popular, actively maintained software projectsfrom three separate domains, namely desktop (FIREFOX and ECLIPSE IDE), web(MEDIAWIKI and MOODLE), and mobile (FIREFOX ANDROID). In addition, weinclude one commercial closed source application (INDUSTRIAL). 
The proprietary bug tracking system is from a Vancouver-based mobile app development company. The bug reports are filed by their testing team and end-users, and are related to different mobile platforms such as Android, Blackberry, iOS, and Windows Phone, as well as their content management platform and backend software.

3.3.2 Mining Non-Reproducible Bug Reports

In this study, we include all bug reports that are resolved as non-reproducible at least once in their life-cycles. In our search queries, we include all resolution terminologies commonly used for non-reproducible bug reports, as outlined in Section 3.2. We extract these NR bug reports in three main steps (Box 1 in Figure 3.1):

Step 1. We start by filtering out all Invalid, Duplicate, and Rejected reports. Where applicable, we also exclude Enhancement, Feedback, and Unconfirmed reports. The set of bug reports retrieved afterward is the total set that we consider in this study ('#All Bugs' in Table 3.1).

Step 2. We use the filter/search features available in the bug repository systems and apply the Changed to/from operator on the resolution field to narrow down the list of bug reports further to the non-reproducible resolution ('#NR Bugs' in Table 3.1).

Step 3. We extract and save the data in XML format, containing detailed information for each retrieved bug report.

Table 3.1: Studied bug repositories and their rate of NR bugs.

ID      | Domain  | Repository   | Product/Component | #All Bugs* | #NR Bugs** | NR(%) | FixedNR(%)***
FF      | Desktop | Bugzilla [4] | Firefox           | 65,408     | 18,516     | 28%   | 1%
E       | Desktop | Bugzilla [2] | Eclipse/Platform  | 65,475     | 8,189      | 13%   | 4%
W       | Web     | Bugzilla [3] | MediaWiki         | 9,335      | 1,125      | 12%   | 9%
M       | Web     | Jira [10]    | Moodle            | 22,175     | 2,503      | 11%   | 5%
FFA     | Mobile  | Bugzilla [4] | FirefoxAndroid    | 7,902      | 1,148      | 15%   | 3%
PTY     | Mobile  | Jira         | Proprietary       | 18,024     | 643        | 4%    | 17%
Overall |         |              |                   | 188,319    | 32,124     | 17%   | 3%

*All Query: Resolution: All except (Duplicate, Invalid, Rejected) and Severity: All except (Enhancement, Feedback) and Status: All except Unconfirmed
**NR Query: All Query and Resolution: Changed to/from Non-Reproducible
***FixedNR Query: Resolution: Fixed and Severity: All except (Enhancement, Feedback) and Status: All except Unconfirmed and Resolution CHANGED FROM Non-Reproducible and Resolution CHANGED TO Fixed

This mining step was conducted during August 2013. We did not constrain the start date for any of the repositories. The detailed search queries used in our study are available online [9]. Overall, our queries extracted 32,124 NR bug reports from a total of 188,319 bug reports.

3.3.3 Quantitative Analysis

In order to perform our quantitative analysis, we measure the following metrics from each extracted bug report:

Active Time pertains to the period between a bug report's creation and the last update in the report.

Number of Unique Authors measures the number of people directly involved with the report, based on their user ID.

Number of Comments provides information about the extent to which a bug is discussed; this is an indication of how much attention a bug report attracts.

Number of CCs/Watchers measures the number of people that would receive update notifications for the report. It provides insights as to how many people are interested in a particular bug report.

Historical Status and Resolution Changes collects data on how the status and resolution of a bug report change throughout time.

To address RQ1, we measure the first four metrics for all the bug reports to compare the properties of NR bug reports (32,124) with the others (156,195).
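To make the computation of these metrics concrete, the Python sketch below shows one way the first four metrics could be derived from a BUGZILLA-style XML export. It is only an illustrative sketch, not the NR-Bug-Analyzer tool used in this study; the element names (bug_id, creation_ts, delta_ts, long_desc/who, cc) follow the common BUGZILLA export format, the input file name is hypothetical, and a JIRA export would require the field mapping discussed in the next subsection.

```python
# Illustrative sketch (not the NR-Bug-Analyzer tool itself): compute the four
# report-level metrics from a BUGZILLA-style XML export. Element names such as
# creation_ts, delta_ts, long_desc/who, and cc follow the common BUGZILLA
# export format; a JIRA export would need the field mapping of Table 3.2.
import xml.etree.ElementTree as ET
from datetime import datetime

def parse_day(timestamp):
    # Timestamps look like "2013-08-12 10:23:45 -0700"; the date part is
    # enough for an active-time measure expressed in days.
    return datetime.strptime(timestamp.strip()[:10], "%Y-%m-%d")

def bug_metrics(bug):
    created = parse_day(bug.findtext("creation_ts"))
    updated = parse_day(bug.findtext("delta_ts"))
    comments = bug.findall("long_desc")   # one element per description/comment
    authors = {c.findtext("who") for c in comments}
    authors.add(bug.findtext("reporter"))
    return {
        "id": bug.findtext("bug_id"),
        "active_time_days": (updated - created).days,
        "unique_authors": len(authors),
        "comments": len(comments),
        "cc_watchers": len(bug.findall("cc")),
    }

def metrics_for_export(xml_path):
    root = ET.parse(xml_path).getroot()
    return [bug_metrics(bug) for bug in root.iter("bug")]

if __name__ == "__main__":
    # Hypothetical input file produced by Step 3 of the mining process.
    for record in metrics_for_export("firefox_nr_bugs.xml")[:5]:
        print(record)
```

Aggregating such per-report records separately for the NR and non-NR sets yields the kind of distributions summarized later in this chapter.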
We built an analyzer tool, called NR-Bug-Analyzer [9], to calculate these metrics. It takes as input the extracted XML files and measures the first four metrics (Box 2 in Figure 3.1). Since each repository system has a different set of fields, we performed a mapping to link common fields in BUGZILLA and JIRA, as presented in Table 3.2.

Table 3.2: Mapping of BUGZILLA and JIRA fields.

# | BUGZILLA    | JIRA                      | Description
1 | bug_id      | key                       | The bug ID.
2 | comment_id  | id (in comment field)     | A unique ID for a comment.
3 | who         | author (in comment field) | Name and ID of the user who added a bug, a comment, or any other type of text.
4 | creation_ts | created                   | The date/time of bug creation.
5 | delta_ts    | resolved (updated)        | The timestamp of the last update. If the resolved field is not available, the updated field is used.
6 | bug_status  | status                    | The bug's latest status.
7 | resolution  | resolution                | The bug's latest resolution.
8 | cc          | watches                   | Receive notifications.

To address RQ3, the last metric (historical changes) is extracted for all NR bug reports and used to mine common transition patterns. The data retrieved from bug repositories does not contain any information on how the statuses and resolutions change over time for each bug report. Thus, our tool parses the HTML source of each NR bug report to extract historical data of status and resolution changes (Box 3 in Figure 3.1). BUGZILLA provides a History Table with historical changes to different fields of an issue, including the status and resolution fields, attachments, and comments. We extract the history of each bug report by concatenating the issue ID with the base URL of the History Table (for example, the base URL for the History Table in FIREFOX BUGZILLA is https://bugzilla.mozilla.org/show_activity.cgi?id=bug_id). JIRA provides a similar mechanism called Change History. Our bug report analyzer tool, along with all the collected (open source) empirical data, is available for download [9].

3.3.4 Qualitative Analysis

In order to address RQ2, we perform a qualitative analysis that requires manual inspection. To conduct this analysis in a timely manner, we constrain the number of NR bug reports to be analyzed through random sampling. The manual classification is conducted in two steps, namely, common category inference and classification.

Common Category Inference. In the first phase, we aim to infer a set of common categories for the causes of NR bugs, i.e., understanding why they are resolved as NR. We randomly selected 250 NR reports from the open source repositories and 250 NR reports from INDUSTRIAL.

In order to infer common cause categories, each bug report was thoroughly analyzed based on the bug's description, tester/developer discussions/comments, and historical data. We defined a set of classification definitions and rules and generated the initial set of categories and sub-categories (Box 4 in Figure 3.1). Then, the generated (sub)categories were cross-validated through discussions, merged, and refined (Box 5 in Figure 3.1). Based on an analysis of the reasons the bug reports could not be reproduced, in total, we extracted six high-level cause categories, each with a set of sub-categories, which were fed into our classification step. The categories and our classification rules are presented in Table 3.3. In the given examples in Table 3.3 and throughout the work, R refers to the reporter and D refers to anyone other than the reporter.

Classification. In the second phase, we randomly selected 200 NR bug reports from each of the open source repositories.
In addition, to have a comparable number of NR bug reports from the commercial application, we included all the 643 NR bug reports from INDUSTRIAL in this step. We then systematically classified these 1,643 NR bug reports, using the rules and (sub)categories inferred in the previous phase. Where needed, the sub-categories were refined in the process (Box 6 in Figure 3.1). Similar to the category inference step, each bug report was manually classified by analyzing its descriptions, discussions/comments, and historical activities. At the end of this step, each of the 1,643 NR bug reports was distributed into one of the 6 categories of Table 3.3.

Inspecting Fixed NR Bug Reports. To address RQ4, we performed a query on the set of NR bug reports to extract the subset that is finally changed to a Fixed resolution. We randomly selected 20 fixed NR bug reports from each of the 6 repositories and manually inspected them (120 in total) to understand why they were marked as Fixed (Box 7 in Figure 3.1), and whether the reports were initially mislabelled [116] or became reproducible/fixable, e.g., through additional information provided by the reporter. In addition, this provides more insight into the types of NR bug reports that are expected to be fixed, and the additional information that is commonly asked for, which helps reproduce NR bugs.

3.4 Results

In this section, we present the results of our study for each research question.

Table 3.3: NR Categories and Rules.

1) Interbug Dependencies: NR report cannot be reproduced because it has been implicitly fixed:
a) as a result or a side effect of other bug fixes
b) although it is not clear what patch fixed this bug
c) and the bug is a possible duplicate of or closely related to other fixed bugs.
Example #759127 in FIREFOX: R: "It is now working with Firefox 15.0.1. I believe it was fixed by the patches to #780543 and #788600 [...]."

2) Environmental Differences: NR report cannot be reproduced due to different environmental settings such as:
a) cached data (e.g., cookies), user settings/preferences, builds/profiles, old versions
b) third-party software, plugins, add-ons, local firewalls, extensions
c) databases, Virtual Machines (VM), Software Development Kits (SDK), IDE settings
d) hardware (mobile/computer) specifics such as memory, browser, Operating System (OS), compiler
e) network, server configuration, server being down/slow.
Example #261055 in FIREFOX: D: "This is probably an extension problem. Uninstall your extensions and see if you can still reproduce these problems." R: "that did it, I just uninstalled all themes and extensions, and afterwards reinstalled everything from the getextensions website. And now everything works again [...]."

3) Insufficient Information: NR report cannot be reproduced due to lack of enough details in the report; developers request more detailed information:
a) regarding test case(s)
b) pertaining to precise steps taken by the reporter leading to the bug
c) regarding different conditions that result in the reported bug.
Example in INDUSTRIAL: D: "Cannot reproduce this problem. [...] go to the main screen of the blackberry device, hold ALT and press L+O+G, it will show the logs.
That information can help us to some degree."

4) Conflicting Expectations: NR report cannot be reproduced when there exist conflicting expectations of the application's functionality between end-users/developers/testers:
a) misunderstanding of a particular functionality or system behaviour when it works as designed (i.e., lack of documentation)
b) misunderstanding of (non)supported features, out of scope, dropping support or obsolete functionality in newer versions
c) change in requirements
d) misunderstandings turning into QA conversations
Example #29825 in ECLIPSE: D: "PDE Schema works as designed [...] Since we cannot tell when you want to use tags and when you want to use reserved chars as-is, you need to escape them yourself EXCEPT, again, when between the '<pre>' and '</pre>' tags that we recognize as a special case [...]."

5) Non-deterministic Behaviour: NR report cannot be reproduced deterministically.
Example #MDLSITE-2255 in MOODLE: R: "This happened for me again, and then went away again (started working). It seems there is an intermittent problem."

6) Other: NR report cannot be reproduced due to various other reasons, such as mistakes of reporters:
Example #MDL-35391 in MOODLE: R: "I'm so sorry... This is not a bug. It occurred because I have been using Moodle 2.3 since beta and overwriting old source in the same directory. Could admin please delete this ticket? Sorry again." D: "Thanks for the explanation, closing."

[Box plots comparing NR and Other bug reports for each repository (FF, E, W, M, FFA, PTY):]
Figure 3.2: Active Time (days).
Figure 3.3: No. of Authors.
Figure 3.4: No. of Comments.
Figure 3.5: No. of Watchers.

3.4.1 Frequency and Comparisons (RQ1)

Table 3.1 presents the percentage of NR bug reports for each repository. The results of our study show that, on average, 17% of all bug reports are resolved as non-reproducible at least once in their life-cycles.

Figures 3.2–3.5 depict the results of comparing NR bug reports with other resolution types. For each bug repository, the NR bug reports are shown with a grey background.

Table 3.4: Descriptive statistics between NR and Others, for each defined metric: Active Time (AT), # Unique Authors (UA), # Comments (C), # Watchers (W), from all repositories. (Min was 0 in all cases.)

Metric | Type   | Mean | Median | SD   | Max  | p-value
AT     | NR     | 396  | 154    | 553  | 4534 | 0.00
       | Others | 313  | 40     | 531  | 4326 |
UA     | NR     | 3.16 | 3      | 2.22 | 85   | 0.00
       | Others | 3.06 | 2      | 2.61 | 103  |
C      | NR     | 5.14 | 3      | 7.9  | 459  | 0.03
       | Others | 5.93 | 3      | 12.5 | 1117 |
W      | NR     | 2.1  | 1      | 3    | 159  | 0.00
       | Others | 2.7  | 2      | 4.3  | 145  |

The results show that active time is significantly different, i.e., NR bug reports are, on average, active three months longer than non-NR bug reports. For the number of unique authors, comments, and CC/watchers, the results are statistically significant (p < 0.05), but the observed differences, having almost the same medians, are not indicative, meaning that NR bug reports receive as much attention from reporters and developers as any other resolution type.

3.4.2 Cause Categories (RQ2)

Table 3.3 shows the classification rules we used in our cause category investigation.
Figure 3.6 shows the six main categories that emerged in our analysis, with their overall rate. As shown, "Interbug Dependencies" is the most common category, accounting for 45% of the NR bugs, followed by "Environmental Differences" (24%), "Insufficient Information" (14%), "Conflicting Expectations" (12%), "Non-deterministic Behaviour" (3%), and "Other" (2%). Additionally, Figure 3.7 depicts the rate of the six cause categories per bug repository. We provide examples of each category below.

[Pie chart: Interbug Dependencies 45%, Environmental Differences 24%, Insufficient Information 14%, Conflicting Expectations 12%, Non-deterministic Behaviour 3%, Other 2%.]
Figure 3.6: Overall Rate of NR Categories.

Interbug Dependencies. Bug reports in this category are those that cannot be reproduced because they have been indirectly fixed with or without explicit software patches. This category implies that there are bug reports that perhaps are not identical but semantically closely related to each other. Overall, this is the most common cause category we observed in the study (45%). Examples include:

#767543 in FIREFOX: "D: Works for me for Beta 15, Aurora 16, and Nightly 17 with Swype Beta 1.0.3.5809 on Galaxy Nexus. I think my fix for bug #767597 fixed this bug."

#177769 in FIREFOX: "D: Will resolve this as NR since we don't know which checkin fixed this."

#259652 in ECLIPSE: "D: I remember fixing this but can't find the bug. Since it doesn't happen in HEAD, marking as NR."

#723250 in FIREFOX ANDROID: "D: This should be fixed now with my latest changes on inbound. Specifically, bug 728369."

Environmental Differences. Bug reports in this category cannot be reproduced due to environmental settings that are different for developers/testers/end-users. This category accounts for 24%. Examples include:

#353838 in ECLIPSE: "D: [...] your install got corrupted because of incompatible bundles. You could first try to disable or uninstall Papyrus and if that doesn't help try to remove the Object Teams bundles."

#DTP-01 in INDUSTRIAL: "D: This has something to do with the XCODE settings on the build machine. Try to build it on another computer and see if it works. I cannot reproduce this on my iPhone, iPad + simulators."

[Stacked bar chart (0%–100%) per repository: Firefox, Eclipse, MediaWiki, Moodle, FirefoxAndroid, Proprietary.]
Figure 3.7: Rate of root cause categories in each bug repository.

#456734 in FIREFOX: "R: I solved the problem by uninstalling firefox (without extensions) and installing version 3.0.1 again, and then updating it again to 3.0.2. It's a mystery for me but it helped so it's solved."

Insufficient Information. This is when developers need more specific and detailed information from the reporters. This category accounts for 14% of NR bug reports. Examples of this category include:

#125142 in ECLIPSE: "D: I haven't been able to reproduce this bug in the Java debugger [...]. Do you have a test case that displays the launch happening in the foreground? marking as NR. Please reopen with a reproducible test case if this is still occurring."

#3103 in MEDIAWIKI: "D: I'm going to resolve this bug (as NR) on the grounds that without further details of the circumstances in which it occurs, there's really not much we can do..."

#19880 in MEDIAWIKI: "D: I've tested ru.wikipedia.org in IE5.5 on Windows 2000, IE6 on Windows XP, IE7 on Windows Vista, IE8 on Windows Vista. I was unable to reproduce this problem.
Perhaps the reporter of this bug could be morespecific.”Also tickets are also resolved as NR when there is no response from reportersfor several months. For example:#10014 in MEDIAWIKI: “D: Closing ‘support bug’ due to lack of response; ifthe problem persists, please consider taking it up on the mediawiki-l mailing list.”71In the FIREFOX project, an automated message is set up in the bug trackingsystem, which states “This bug has had no comments for a long time. Statistically,we have found that bug reports that have not been confirmed by a second user afterthree months are highly unlikely to be the source of a fix to the code. [...] If this bugis not changed in any way in the next two weeks, it will be automatically resolved(NR).”Conflicting Expectations. This category represents bug reports in which there ex-ist conflicting expectations of the software between end-users/developers/testers.Such conflicts could be related to a particular system behaviour, functionality, fea-ture, software support, activity, input/output types and ranges, or specification doc-umentation. In these scenarios the user believes there is a bug in the system sincewhat they see is different from their mental model and/or expectations. As a result,the reported bugs are not really bugs and thus cannot be reproduced by developers.12% of NR bug reports fall into this category. Some examples are:#956483 in FIREFOX ANDROID: “D: [...] getDefaultUAString is notwhat you think. That controls the UA of the Java HTTP requests we make in Fennec.This is not used by the Gecko networking and rendering engine. You need to usethe normal Gecko preferences to change the UA. This might work: [...] R: Thanks.That works.”#12593 in MEDIAWIKI: “R: [...] I can live with this because it is consistentand predictable behaviour. Thinking about it, it is probably desirable that thesystem works this way for migration purposes; for example: when importing adump into a newer MEDIAWIKI version.” says the reporter.#19943 in MEDIAWIKI: “D: Seems ok to me. As long as extensions are passingtheir path as either a full URL (with protocol) or relative from the docroot theyshould be fine [...] Checked all extensions in MW SVN that call this, and they allseem to be ok, [...]. Works for me, no real issue with addExtensionStyle()here. R: Ah, I see. Needs documenting, then [...].”#17265 in MEDIAWIKI: “R: Preferably, the user and talk page of the otherusername should be deleted, because it’ll be impracticable to merge. I hope thiswill be implemented and will help a lot of people. D: Works for me. There’s an72extension [...] that does this. Also, there’s a maintenance script [...] that can beused for edit attribution, if someone wanted to manually merge two users.”Non-deterministic Behaviour. This category represents bugs that cannot be re-produced deterministically, meaning that the failure is intermittent and triggered atrandom; and thus difficult to analyze. 3% of NR bug reports are in this category.An example of a developer comment is given below:#DTP-02 in INDUSTRIAL: “D: This crash is very random, hard to reproduce.But my guess is it is network/analytics related. It may have to do with the userscrolling through a number of events in the Schedule section which the app cannotkeep up with and then eventually crashes.”Other. Any other reason not covered in the other 5 categories would fall under thiscategory (2%). 
One common instance in this category is bug reports that are mistakenly reported, such as opening an old ticket by mistake, or running the system with incorrect permissions.

#152 in MEDIAWIKI: "R: For the last 3 hours I made the assumption that we could only import articles from the template namespace ... Additionally I made an error in my testing page that I just figured out. Closing..."

#8966 in MEDIAWIKI: "R: Shame on me, The function is not broken, I [mis]understood the syntax."

3.4.3 Common Transition Patterns (RQ3)

68% of NR bug reports are resolved directly from the initial status (New/Open → Resolved(NR)). For the remaining 32%, there are various transition scenarios that NR bugs go through, changing their status and resolution. Table 3.5 presents some of the observed examples of the status transitions of NR bug reports. For instance, the bug report in row #6 changes resolutions 5 times: Fixed → Fixed → Invalid → NR → Fixed.

Table 3.5: Examples of STATUS (RESOLUTION) transitions of NR bug reports.

#  STATUS (RESOLUTION)
1  NEW→RESOLVED(NR)→REOPENED→RESOLVED(NR)→REOPENED→RESOLVED(NR)
2  NEW→RESOLVED(NR)→REOPENED→ASSIGNED→RESOLVED(FIXED)→REOPENED→RESOLVED(FIXED)
3  NEW→RESOLVED(FIXED)→REOPENED→RESOLVED(WONTFIX)→RESOLVED(NR)
4  NEW→RESOLVED(FIXED)→REOPENED→RESOLVED(NR)→REOPENED→NEW→RESOLVED(WONTFIX)
5  NEW→RESOLVED(FIXED)→REOPENED→RESOLVED(NR)→REOPENED→RESOLVED(FIXED)
6  UNCONFIRMED→NEW→RESOLVED(FIXED)→REOPENED→RESOLVED(FIXED)→REOPENED→RESOLVED(INVALID)→REOPENED→RESOLVED(NR)→REOPENED→RESOLVED(FIXED)
7  NEW→ASSIGNED→NEW→RESOLVED(NR)→REOPENED→ASSIGNED→RESOLVED(FIXED)→VERIFIED
8  NEW→ASSIGNED→RESOLVED(NR)→REOPENED→ASSIGNED→RESOLVED(LATER)→REOPENED→NEW→ASSIGNED→NEW→ASSIGNED→RESOLVED(FIXED)
9  UNCONFIRMED→RESOLVED(INCOMPLETE)→UNCONFIRMED→RESOLVED(INCOMPLETE)→RESOLVED(FIXED)→RESOLVED(NR)

[Transition graph with nodes No Resolution, INCOMPLETE, WONTFIX, FIXED, NR, INVALID, DUPLICATE, and CUSTOM RESOLUTIONS; edges are labelled with transition weights (e.g., 69.1, 5.1, 4.6, 4.4, 2).]
Figure 3.8: Resolution-to-Resolution Transition Patterns of NR Bug Reports (only weights larger than 2% are shown on the graph).

We examined resolution transitions of NR bug reports more closely, and plotted a resolution change pattern graph for the six bug repositories, which is depicted in Figure 3.8. In order to extract a common pattern for all the six repositories, we abstracted away custom (repository-specific) resolutions such as Later, Remind, Expired, Rejected, Unresolved, and NotEclipse. The custom resolutions are clustered as Custom Resolutions in Figure 3.8. The other resolutions shown in the graph were common in all the repositories.

We distinguish between two types of transitions in Figure 3.8: the black arrows indicate all the direct connections to the NR resolution, i.e., all the fan-ins and fan-outs; the grey arrows indicate the indirect connections between other resolutions and the NR resolution. To avoid cluttering the figure, we only show weights larger than 2% on the graph. As the figure illustrates, 69% of the transitions are resolved as NR from the beginning. 4.6% of the transitions are from Fixed to NR. For instance, #376902 in FIREFOX was first resolved as Fixed then changed to NR with a comment: "Fixed refers to problems fixed by actual code changes to FIREFOX. Here NR is the correct resolution."

Interestingly, 5.1% of the transitions are from NR to Fixed. We explore fixed NR bug reports further in the following subsection.

3.4.4 Fixed Non-reproducible Bugs (RQ4)

The last column in Table 3.1 shows that, on average, 3% of all NR bugs become Fixed. From these, around 66% actually become reproducible as valid bugs and are fixed with code patches.
They mainly fall into “Insufficient Information”, “En-vironmental Differences”, and “Conflicting Expectations” cause categories. Someexamples include:#209834 in ECLIPSE: “D: Now, when you described the problem more pre-cisely I realized it’s a valid bug. I checked it in both 3.3.1.1 (which you’re using)and N20071221-0010 (on which I’m on at this moment) and I can see the prob-lem by clicking the ‘Apply’ button several times - a resource matched to *.a rulechanges it’s state even though the rule is enabled all the time. I’ll put up a fix in aminute [...]” (“Insufficient Information” category)#533470 in FIREFOX: “R: [...] I think I got to the bottom of it. The confusionwas caused by kernel settings: I thought it was fixed, but actually it was just a ipv6module getting automatically loaded. The problem still exists when there is nokernel ipv6 support available. I’ve submitted a simple patch to pulseaudio whichwill hopefully be accepted and solve the problem.” (“Environmental Differences”category)#245584 in FIREFOX: “D: the problem was because NS NewURI was failing- perhaps it was failing because there was something about the URL from IE’sdata that our networking system couldn’t handle? Since this was particular to thatURL in that person’s set of typed URLs in IE, it didn’t show up for everyone [...]”(“Environmental Differences” category)Interestingly, there were no code patches assigned to the rest of Fixed NR bugreports (34%). These are mislabelled reports, as the Fixed resolution is used when76“a fix for a bug is checked into the tree and tested” [24]. From these, around 24%are in the “Interbug Dependencies” category. For example:#705166 in FIREFOX: “D1: [...], guess this bug is fixed in the latest nightly.Working fine for me too. D2: [...] WorksForMe is not a correct resolution for thisone. The bug was actually fixed by the patch in bug 704575.”3.5 DiscussionIn this section, we discuss our general findings related to non-reproducible bugreports and discuss some of the threats to validity of our results.3.5.1 Quantitative Analysis of NR Bug ReportsOur investigation in the quantitative attributes of NR and other types of bug reportsshows that NR bug reports are as costly and important as the rest since they receivethe same amount of attention as other bug report types, in terms of the numberof comments and developers involved. Developers are typically reluctant to closethese bug reports, and they try to involve more people and ask questions throughcomments. As a result, NR bug reports remain open substantially —around threemonths on average— longer than other types of bug reports. This clearly pointsto the uncertainly and low level of confidence developers have when dealing withNR bugs. Possible explanations for leaving NR bug report open longer could bethat (1) they do not want to be responsible in case the NR bug turns out to be a real(reproducible) bug that needs fixing, (2) they hope more concrete information willbe provided to help reproduce the bug, and/or (3) they wait for someone else to beassigned to the report who knows how to reproduce the bug.3.5.2 Fixing NR BugsAs our results from the six repositories have shown, on average 17% of all bugreports are resolved as NR. Among those, 3% are later marked as Fixed. A deeperinvestigation into the Fixed NR reports revealed that around 66% of them becomeindeed reproducible and fixed with code patches. The rest (34%) have no codepatches assigned to them, from which around 24% are in the “Interbug Dependen-77cies” category. 
This means overall only 1.98% of all NR bug reports are fixed withan explicit code patch. This indicates that the majority of NR bug reports remainunreproducible.3.5.3 Interbug DependenciesOn the other hand, 45% of all NR bugs were categorized as “Interbug Dependen-cies”, where they were non-reproducible because they were implicitly fixed in otherbug reports. Therefore, we expected the percentage of the explicit fixed NR bugs tobe higher than 3%. However, it turns out that developers use the NR resolution forreports that are resolved as a consequence of other bug fixes. This implies that al-most half of all NR bugs are actually (implicitly) fixed bugs. We believe coming upwith automatic solutions that would cluster these interbug dependent reports basedon inferred historical characteristics would help the developers in this regard.3.5.4 MislabellingOur findings indicate that many reports are misclassified. These misclassificationshappen not due to human errors but also because of the fact that the availableresolutions in the repositories do not cover all possible scenarios. For instance,many developers use the NR (or WorksForMe) resolution when they actually meanthe bug report is irrelevant, unimportant, or even fixed. This is different than theformal definition of NR bugs (see Section 3.2). We observed many inconsistenciesand ambiguities around the usage of the Fixed and NR resolutions, in particularin cases where a bug report needs to be marked as “fixed with no code patches”.Bugs 376902 and 705166 (subsections 3.4.3 and 3.4.4, respectively) are examplesof these cases.3.5.5 Different Domains and EnvironmentsThe active time of NR bug reports in the INDUSTRIAL repository is much lowerthan the open source repositories (see Figure 3.2). According to Table 3.1, NRbugs are more prevalent in the studied open-source projects, i.e., they pertain to11–28% in the open source repositories and 4% in the industrial case. In addition,as presented in the last column of Table 3.1, although the rate of NR bug reports78is lower in the INDUSTRIAL case, the rate of fixed NR bug reports is higher, com-pared to the open source repositories. Although these findings apply to our samplerepositories, possible reasons behind these differences could be that in commercialprojects, there is more at stake and, therefore, developers (1) spend more time andeffort in reproducing even hard to reproduce bugs, and (2) cannot afford to simplyignore NR bugs. It could also be that the company has a brute force policy in termsof closing bug reports as soon as possible. On the other side, developers in opensource projects have less time to spend and less urgency to fix/close a bug report.Additionally, in the mined repositories, the rate of NR bug reports in desk-top applications is more than web and mobile apps, i.e., they are in the range of13–28% for desktop, 11–12% for web, and 4–15% for mobile apps. Figure 3.2indicates that NR bug reports have a lower active time in the repositories of themobile apps, compared to the desktop and web applications. In addition, the dif-ference between the medians of NR and other bug reports is the highest in the webapplications, followed by the desktop, and mobile apps in our study.3.5.6 Communication IssuesThe two categories “Insufficient Information” (14%) and “Conflicting Expecta-tions” (12%) indicate that there is a source of uncertainty and lack of proper com-munication between the reporters and resolvers. Herzig et al. 
[116] observed thisuncertainty as a source of misclassification patterns in their recent bug report study.Equipping bug tracking systems with better collaboration tools would facilitate andenhance the communication needs between the two parties. For the category “En-vironmental Differences” (24%), techniques that make it easier to capture the stepsleading to the bug through, e.g., record/replay methods [115], monitoring the dy-namic execution of applications [59], or capturing user interactions [193] would behelpful to reproduce the bug report.3.5.7 Threats to ValidityOur manual classification of the bug reports could be a source of internal threats tovalidity. In order to mitigate errors and possibilities of bias, we performed our man-ual classification in two phases where (1) the inference of rules was initially done79by the first author; the rules were cross-validated and uncertainties were resolvedthrough extensive discussions and refinements between the first two authors; thegenerated categories were discussed and refined by all the three authors, (2) theactual distribution of bug reports into the 6 inferred categories was subsequentlyconducted by the first author following the classification rules inferred in the firststep.In addition, since this is the first study classifying NR bug reports, we had toinfer new classification rules and categories. Thus, one might argue that our NRrules and categories are subjective with blurry edges and boundaries. By followinga systematic approach and triangulation we tried to mitigate this threat. Anotherthreat in our study is the selection and use of these bug repositories as the mainsource of data. However, we tried to mitigate this threat by selecting various largerepositories and randomly selecting NR bug reports for analysis.In terms of external threats, we tried our best to choose bug repositories from arepresentative sample of popular and actively developed applications in three dif-ferent domains (desktop, web, and mobile). With respect to bug tracking systems,JIRA and BUGZILLA are well-known popular systems, although bug reports inprojects using other bug tracking systems could behave differently. Thus, regard-ing a degree of generalizability, replication of such studies within different domainsand environments (in particular for industrial cases) would help to generalize theresults and create a larger body of knowledge.All repositories except the INDUSTRIAL case are publicly available, makingthe quantitative findings of our study reproducible.3.6 Related WorkWe categorize related work into two classes: empirical bug report studies and fail-ure reproduction studies.Empirical Bug Report Studies. Empirical bug report studies have so far focusedon different perspectives including understanding the quality of bug reports [61,64, 118, 141], reassignments [109], bug report misclassifications [116], reopenings[202, 222], prediction and statistical models [96, 108, 145, 201], bug fixing andcode reviewing process [219], and coordination patterns and activities around the80bug fixing process [53].Herzig et al. [116] recently reported that every third ‘bug report’ is not reallya bug report. In a manual examination of more than 7,000 bug reports of fiveopen-source projects, they found 33.8% of all bug reports to be misclassified - thatis, rather than referring to a code fix, they resulted in a new feature, an update todocumentation, or an internal refactoring. 
This misclassification introduces errorsin bug prediction models: on average, 39% of files marked as defective actuallynever had a bug. They estimated the impact of this misclassification on earlierstudies and recommended manual data validation for future studies. The results ofour study also confirm this finding.Aranda et al. [53] report on a field study of coordination activities around bugfixing, through a combination of case study and a survey of software profession-als. They found that the histories of even simple bugs are strongly dependent onsocial, organizational, and technical knowledge, which cannot be solely extractedthrough automation of electronic repositories, and that such automation providesincomplete and often erroneous accounts of coordination.Zimmermann et al. [222] characterized how bug reports are reopened, by usingthe Microsoft Windows operating system project as a case study, using a mixed-methods approach. They categorized the reasons for reopening based on a surveyof 358 Microsoft employees and ran a quantitative study of Windows bug reports,focusing on factors related to bug report edits and relationships between people in-volved in handling the bug. They propose statistical models to describe the impactof various metrics on reopening bugs ranging from the reputation of the opener tohow the bug was found.Guo et al. [109] present a quantitative and qualitative analysis of the bug re-assignment process in the Microsoft Windows Vista project. They quantify socialinteractions in terms of both useful and harmful reassignments. They list five rea-sons for reassignments: finding the root cause, determining ownership, poor bugreport quality, hard to determine proper fix, and workload balancing. Based ontheir study, they propose recommendations for the design of more socially-awarebug tracking systems.To the best of our knowledge, our work is the first to report a characterizationstudy on non-reproducible bug reports.81Failure Reproduction Studies. Apart from the empirical bug studies, there havebeen a number of studies [59, 115, 193] analyzing and proposing solutions forfailure reproduction. Roehm et al. [193] present an approach to monitor interac-tions between users and their applications selectively at a high-level of abstraction,which enables developers to analyze user interaction traces. Herbold et al. [115]use a record/replay approach and monitor messages between GUI objects. Suchmessages are triggered by user interactions such as mouse clicks or key presses.We believe these techniques can help make NR bug reports easier to understandand reproduce. In this work, however, we perform a mining study of NR bug re-ports to understand their nature, leaving possible solutions for future work.3.7 ConclusionsWorking on non-reproducible bug reports is notoriously frustrating and time con-suming for developers. In this work, we presented the first empirical study onthe frequency, nature, and root cause categories of non-reproducible bug reports.We mined 6 bug tracking repositories from three different domains, and foundthat 17% of all bug reports are resolved as non-reproducible at least once in theirlife-cycles. Non-reproducible bug reports, on average, remain active around threemonths longer than other resolution types while they are treated similarly in termsof the extent to which they are discussed or the number of developers involved. 
In addition, our analysis of resolution transitions in non-reproducible bug reports revealed that such reports change their resolutions many times. Furthermore, around 2% of all NR bug reports are eventually fixed with code patches, while around half are implicitly 'fixed'.
Our manual examination revealed 6 common root cause categories. Our classification indicated that "Interbug Dependencies" forms the most common category (45%), followed by "Environmental Differences" (24%), "Insufficient Information" (14%), "Conflicting Expectations" (12%), and "Non-deterministic Behaviour" (3%). Our study shows that many NR bug reports are mislabelled, pointing to the need for bug repository systems and developers to resolve inconsistencies in the usage of the Fixed and NR resolutions.
Future work can focus on (1) bug reports in the "Interbug Dependencies" category, to design techniques that would facilitate identifying, linking, and clustering them upfront so that developers would not have to waste time on them, and (2) incorporating better collaboration tools into bug tracking systems to facilitate better communication between different stakeholders to address the problems with the other NR categories.

Chapter 4
Same App, Two App Stores: A Comparative Study

Summary
Each mobile platform has its own online store for distributing apps to users. To attract more users, implementing the same mobile app for different platforms has become a common industry practice. App stores provide a unique channel for users to share feedback on the acquired apps through ratings and textual reviews. To understand the characteristics of and differences in how users perceive the same app implemented for and distributed through different platforms, we perform a large-scale comparative study. We mine the characteristics of 80K app-pairs from a corpus of 2.4M apps collected from the Apple and Google Play app stores. We quantitatively compare their app-store attributes, such as ratings, versions, and prices, and ask the developers about the identified differences. Further, we employ supervised machine learning to build classifiers for sentiment and feedback analysis, and classify 1.7M textual user reviews obtained from 2K of the mined app-pairs. We analyze discrepancies and root causes of user complaints at multiple levels of granularity across the two platforms.
30 This chapter is submitted to an ACM SIGSOFT conference.

4.1 Introduction
Online app stores are the primary medium for the distribution of mobile apps. Through app stores, users can download and install apps on their mobile devices. App stores also provide an important channel for app developers to collect user feedback, such as the overall rating of their app, and issues or feature requests through user reviews.
To attract as many users as possible, developers often implement the same app for multiple mobile platforms [86]. While ideally, a given app should provide the same functionality and high-level behavior across platforms, this is not always the case in practice [88]. For instance, a user of the Android STARBUCKS app complains: "I downloaded the app so I could place a mobile order only to find out it's only available through the iPhone app." Or an iOS NFL app review reads: "on the Galaxy you can watch the game live..., on this (iPad) the app crashes sometimes, you can't watch live games, and it is slow."
Currently, iOS [21] and Android [19] dominate the app market, each with over 1.5 million apps in their respective app stores; therefore, in this work, we focus on these two platforms.
We present a large-scale study on mobile app-pairs — the same app implemented for the iOS and Android platforms — in order to analyze and compare their various app-store attributes, textual user reviews, and root causes of user complaints. We mine the two app stores and employ a mixed-methods approach using both quantitative and qualitative analysis. Our study can help developers to understand the dynamics of different markets, end-user perceptions, and reasons behind varying success rates for the same app across platforms.
Researchers have mined app stores for analyzing user-reviews [71, 124, 185], app descriptions [105, 143, 200], and app bytecode [54, 194, 195]. However, existing studies focus on one store at a time only. To the best of our knowledge, we are the first to study the same apps published on different app stores.
Overall, our work makes the following main contributions:
• We present the first dataset of 80,169 app-pairs, extracted by analyzing the properties of 2.4M apps from the Google Play and Apple app stores. Our app-pair dataset is publicly available [165].
• We compare the app-pairs' attributes such as ratings, categories, prices, versions, and updates, and explore causes for variation among them.
• We identify app-pairs on the top rated 100 free and 100 paid apps listed on the Google Play and Apple app stores and explore some of the obstacles that prevent developers from publishing their apps in both stores.
• We extract and classify user reviews to compare user sentiment and complaints across app-pairs.
• Finally, we measure an app's success using a proposed metric that combines reviews, ratings, and stars. Based on developers' feedback, we provide insight into the varying success rates of app-pairs.

4.2 Methodology
Our analysis is based on a mixed-methods research approach [80], where we collect and analyze both quantitative and qualitative data. We address the following research questions in our study:
RQ1. How prevalent are app-pairs and what are the app-store characteristics of app-pairs?
RQ2. What portion of the top rated apps are app-pairs? Why are some apps only available on one platform?
RQ3. Are the app-pairs equally successful on both platforms?
RQ4. What are some of the major concerns or complaints on each platform?
Figure 4.1 depicts our overall approach. We use this figure to illustrate our methodology throughout this section. We first describe how we collect our datasets along with their attributes and descriptive statistics, and then describe how we detect app-pairs. Finally, we explain the analysis steps performed on the app-pairs. Additionally, to better understand our findings, we ask app developers about some of the reasons behind the differences in app-pair attributes such as prices, update frequencies, success rates, and top-rated apps existing only on one platform.
[Figure 4.1: Overview of our methodology.]

4.2.1 Data Collection
The first step in our work is to collect Android and iOS apps along with their attributes (Box 1 in Figure 4.1). To this end, we use two open-source crawlers, namely the Google Play Store Crawler [103] and the Apple Store Crawler [52], to mine apps from the two app stores, respectively. We collect the app attributes that are available on both stores. Since each app store has a different set of attributes, we first map the attributes that exist in both app stores and ignore the rest. For instance, information about the number of downloads is only available for Android but not iOS, and thus, we ignore this attribute. This mining step was conducted in November 2014 and resulted in 1.4M Android apps and 1M iOS apps. Table 4.1 outlines the attributes we collect. We save this data in a MONGODB database [165], which takes up approximately 2.1GB of storage.

Table 4.1: Collected app-pair attributes
# | iTunes; Google | Description
1 | name; title | Name of the app.
2 | developerName; developer name | Name of the developer/company of the app.
3 | description; description | Indicates the description of the app.
4 | category; category | Indicates the category of the app; 23 Apple & 27* Google categories.
5 | isFree; free | True if the app is free.
6 | price; price | Price ($) of the app.
7 | ratingsAllVersions; ratingsAllVersions | Number of users rating the app.
8 | starsVersionAllVersions; star rating | Average of all stars (1 to 5) given to the app.
9 | version; version string | User-visible version string/number.
10 | updated; updated | Date the app was last updated.
*Google has its apps split into Games and Applications. We count Games as one category.
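To make the attribute mapping concrete, the following Python sketch shows one way the raw crawler output could be normalized to the shared attributes of Table 4.1 and stored in MongoDB. The raw field names returned by the crawlers and the database layout are illustrative assumptions, not the exact schema of our dataset.

from pymongo import MongoClient

# Attributes shared by both stores (see Table 4.1), mapped to a common key.
# The store-side key names are assumed; they may differ in the real crawler output.
SHARED_FIELDS = {
    "ios":     {"name": "name", "developerName": "developer", "category": "category",
                "price": "price", "ratingsAllVersions": "ratings",
                "starsVersionAllVersions": "stars", "version": "version", "updated": "updated"},
    "android": {"title": "name", "developer_name": "developer", "category": "category",
                "price": "price", "ratingsAllVersions": "ratings",
                "star_rating": "stars", "version": "version", "updated": "updated"},
}

def normalize(raw_app, platform):
    """Keep only the attributes available on both stores and rename them."""
    mapping = SHARED_FIELDS[platform]
    doc = {common: raw_app[store_key]
           for store_key, common in mapping.items() if store_key in raw_app}
    doc["platform"] = platform
    return doc

def store(raw_apps, platform, mongo_uri="mongodb://localhost:27017"):
    """Insert normalized app records into one MongoDB collection per platform."""
    client = MongoClient(mongo_uri)
    collection = client["appstores"][platform]
    for raw in raw_apps:
        collection.insert_one(normalize(raw, platform))

Keeping one collection per platform makes the later pair-matching step a simple join over the normalized attributes.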
4.2.2 Matching Apps to Find App-Pairs
After creating the Android and iOS datasets separately, we set out to find app-pairs by matching similar apps in the two datasets. The unique IDs for iOS and Android apps are different and thus cannot be used to match apps, i.e., Android apps have an application ID composed of characters while iOS apps have a unique 8-digit number. However, app names are generally consistent across the platforms since they are often built by the same company/developer. Thus, we use the app name and developer name to automatically search for app-pairs. This approach could result in multiple possible matches because (1) on one platform, developers may develop close variants of their apps with extra features that have similar names (see Figure 4.2); (2) the same app could have slightly different names across the platforms (see Figure 4.3–a); (3) the same app could have slightly different developer names across the platforms (see Figure 4.3–b).
[Figure 4.2: Android Cluster for Swiped app.]

Clustering
To find app-pairs more accurately, we first cluster the apps on each platform. This step (outlined in Box 2 of Figure 4.1) groups together apps on each platform that belong to the same category, have similar app names (i.e., having the exact root word, but allowing permutations) and the same developer name. Figure 4.2 is an example of a detected Android cluster. The apps in this cluster are all developed by iGold Technologies, belong to the Game category and have similar (but not exact) names.

Algorithm 1: Android & iOS Clustering Algorithm
input : Collection of Apps (APPS)
output: APPS with Cluster IDs
1  begin
2    foreach i = 0, i < COUNT(APPS), i++ do
3      appName <- APPS[i].name
4      devName <- APPS[i].devName
5      category <- APPS[i].category
6      clusterID <- appName + devName + category
7      if CHECK(APPS[i], clusterID) then
8        APPS[i].clusterID <- clusterID
9      end
10     foreach j = 0, j < COUNT(APPS), j++ do
11       if SIMILAR(APPS[j].name, appName) & APPS[j].devName == devName & APPS[j].category == category then
12         APPS[j].clusterID <- clusterID
13       end
14     end
15   end
16 end

To cluster the apps, we execute Algorithm 1 on the Android and iOS datasets separately. The algorithm takes as input a collection of apps and annotates the collection to group the apps together. We loop through the entire collection of apps (line 2). For each app, we extract the app name, developer name, and category (lines 3, 4 and 5). Next, if an app has not been annotated previously, we annotate it with a unique clusterID (line 7). Then we search for apps which have a similar name, the exact developer name, and belong to the same category (line 11). If a match is found, we annotate the found app with the same clusterID (line 12).
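A compact Python rendering of Algorithm 1 is sketched below. The root-word comparison standing in for SIMILAR() is only an illustrative assumption of how the "exact root word, permutations allowed" check could be realized, and apps are bucketed by developer and category first to avoid the quadratic scan of the pseudo-code; the resulting clusters are intended to be equivalent.

import re
from collections import defaultdict

def root_word(name):
    """Illustrative stand-in for SIMILAR(): lower-cased first alphanumeric token."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return tokens[0] if tokens else name.lower()

def cluster(apps):
    """Annotate each app dict (with 'name', 'devName', 'category') with a clusterID.

    Mirrors Algorithm 1: apps by the same developer, in the same category, and
    with similar names receive the same cluster annotation.
    """
    buckets = defaultdict(list)           # group by developer and category first
    for app in apps:
        buckets[(app["devName"], app["category"])].append(app)

    for (dev, category), members in buckets.items():
        for app in members:
            if "clusterID" in app:        # already annotated (the CHECK on line 7)
                continue
            cluster_id = app["name"] + dev + category
            app["clusterID"] = cluster_id
            for other in members:         # lines 10-13: similar name, same dev/category
                if "clusterID" not in other and root_word(other["name"]) == root_word(app["name"]):
                    other["clusterID"] = cluster_id
    return apps

Running this on the Android and iOS collections separately yields the per-platform cluster annotations used in the next step.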
Detecting App-Pairs
We consider an app-pair to consist of the iOS version and the Android version of the same app. In our attempt to find app-pairs (Box 3 in Figure 4.1), we noticed that Android and iOS apps have different naming conventions for app names and developer names. For instance, Figure 4.3–a depicts an app developed by 'Groupon, Inc.', with different naming conventions for app names: 'Groupon - Daily Deals, Coupons' on the Android platform whereas 'Groupon - Deals, Coupons & Shopping: Local Restaurants, Hotels, Beauty & Spa' on the iOS platform. Similarly, Figure 4.3–b shows the 'Scribblenauts Remix' app, which has the exact name on both platforms, but has differences in the developer's name.
[Figure 4.3: a) Groupon and b) Scribblenauts apps. Android apps are shown on top and iOS apps at the bottom.]
Figure 4.4 shows the app-pairs we find using matching criteria with different constraints. Criteria E looks for app-pairs having the exact app and developer name whereas Criteria S relaxes both the app and developer name, thus matching the apps in Figure 4.3 as app-pairs.
[Figure 4.4: Matching App-pair Criteria. Venn diagram of all possible app-pairs, S, and E, with: E = EXACT(AppName) & EXACT(DevName); S = SIMILAR(AppName) & SIMILAR(DevName).]
To find app-pairs, we match the Android clusters with their iOS counterparts. First, we narrow down the search for a matching cluster by only retrieving those with a similar developer name. This results in one or more possible matching clusters, and we identify the best match by comparing the items in each cluster. Thus, for each app in the Android cluster, we look for an exact match (criteria E) in the iOS cluster. If no match is found, we relax the criteria and look for matches having a similar app and developer name (criteria S). The set of all possible app-pairs is a superset of S, and S is a superset of E, as depicted in the Venn diagram of Figure 4.4.

Exact App-Pairs
We perform the rest of our study using criteria E, which provides a large-enough set of app-pairs for analysis. To confirm that criteria E correctly matches app-pairs, we manually compared the app names, descriptions, developers' names, app icons and screenshots of 100 randomly selected app-pairs, and the results indicated that there are no false positives.
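The two criteria can be expressed as a small matching routine, sketched below. The string-similarity threshold used for criteria S is an illustrative assumption, not the exact similarity measure of our pipeline.

import difflib

def similar(a, b, threshold=0.8):
    """Illustrative similarity test for criteria S (not the study's exact metric)."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def match_pair(android_app, ios_app):
    """Return 'E' for an exact match, 'S' for a relaxed match, or None."""
    if (android_app["name"] == ios_app["name"]
            and android_app["devName"] == ios_app["devName"]):
        return "E"
    if (similar(android_app["name"], ios_app["name"])
            and similar(android_app["devName"], ios_app["devName"])):
        return "S"
    return None

def find_app_pairs(android_cluster, ios_cluster):
    """Pair apps of two matched clusters, preferring criteria E over criteria S."""
    pairs = []
    for a in android_cluster:
        best = None
        for i in ios_cluster:
            criterion = match_pair(a, i)
            if criterion == "E":
                best = (a, i, "E")
                break                     # an exact match wins immediately
            if criterion == "S" and best is None:
                best = (a, i, "S")
        if best:
            pairs.append(best)
    return pairs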
4.2.3 App-store Attribute Analysis
To address RQ1, we recapture (Box 4 in Figure 4.1) the app-pairs' attributes (see Table 4.1) and update our dataset with the latest data. This step was conducted in June 2015. We compare the updated attributes between the iOS and Android app-pairs and present the results in Section 4.3.
To address RQ2, we use the iTunes Store RSS Feed Generator [127] to retrieve the top rated apps, which enables us to create custom RSS feeds by specifying feed length, genres, country, and types of the apps to be retrieved. These feeds reflect the latest data in the Apple app store. The Google Play store provides the list of top rated Android apps [209] as well. We collected the top 100 free and 100 paid iOS apps belonging to all genres, as well as the top 100 free and 100 paid Android apps belonging to all categories, in September 2015 (Box 5 in Figure 4.1). To check whether a top app exists on both platforms, we used our exact app-pair technique as described in the previous section. Since the lists were not long, we also manually validated the pairs using the app name, developer name, description and screenshots in the other market.

4.2.4 User Reviews
In addition to collecting app-store attributes for our app-pairs in RQ1, we analyze user reviews of app-pairs to see if there are any discrepancies in the way users experience the same app on two different platforms (RQ3 and RQ4).
To that end, we first select 2K app-pairs that have more than 500 ratings from our app-pair dataset. This allows us to target the most popular apps with enough user reviews to conduct a thorough analysis. To retrieve the user reviews, we use two open-source scrapers, namely the iTunes App Store Review Scraper [126] and the Google Play Store Review Scraper [104]. In total, we retrieve 1.7M user reviews from the 2K app-pairs.
The goal is to semi-automatically classify the user reviews of the app-pairs and compare them at the app and platform level. To achieve this, we use natural language processing and supervised machine learning to train two classifiers (Box 6 in Figure 4.1). Each classifier can automatically put a review into one of its three classes.
Generic Feedback Analysis. As shown in Table 4.2, our generic feedback classifier (C1) has three unique class labels {Problem Discovery, Feature Request, Non-informative}, where Problem Discovery implies that the user review pertains to a functional (bug), non-functional (e.g., performance), or unexpected issue with the app. Feature Request indicates that the review contains suggestions, improvements, or requests to add/modify/bring back/remove features. Finally, Non-informative means that the review is not constructive or useful feedback; such reviews typically contain user emotional expressions (e.g., 'I love this app'), descriptions (e.g., features, actions), or general comments. We have adopted these classes from recent studies [71, 185] and slightly adapted them to fit our analysis of user complaints and feedback on the two platforms.
Sentiment Analysis. Additionally, we are interested in comparing the sentiment (C2 in Table 4.2) classes of {Positive, Negative, Neutral} between the reviews of app-pairs. We use these rules to assign class labels to review instances. Table 4.2 provides real review examples of the classes in our classifiers.

Table 4.2: Real-world reviews and their classifications.
C1 – Generic Feedback Classifier
1. Problem Discovery: "Videos don't work. The sound is working but the video is just a black screen."
2. Feature Request: "I would give it a 5 if there were a way to exclude chain restaurants from dining options."
3. Non-informative: "A far cry from Photoshop on the desktop, obviously, but still a handy photo editor for mobile devices with great support."
C2 – Sentiment Classifier
1. Positive: "Amazing and works exactly how I want it to work. Nothing bad about this awesome and amazing app!"
2. Negative: "The worst, I downloaded it with quite a lot of excitement but ended up very disappointed"
3. Neutral: "its aight good most of the time but freezes sometimes"

Building Classifiers
Overall, we label 2.1K reviews for training each of the two classifiers (Box 7 in Figure 4.1). We randomly selected 1,050 Android user reviews and 1,050 iOS user reviews from 14 app-pairs. We experimented with the number of app-pairs to find the best F-measures. Additionally, these app-pairs were in the list of the most popular apps and categories in their app stores. This diversity of apps and categories allows us to build robust classifiers capable of accurately labeling reviews that come from different contexts, contain different vocabularies, and are written by different users. The manual labeling of reviews was first conducted by one author following the classification rules inferred in Table 4.2. Subsequently, any uncertainties were cross-validated and resolved through discussions and refinements between the authors.
To build our classifiers, we use the bag-of-words representation, which counts the number of occurrences of each word to turn the textual content into numerical feature vectors. Next, we preprocess the text, tokenize it and filter stop words. We use the feature vectors to train our classifier and apply a machine learning algorithm on the historical training data. In this work, we experimented with two well-known and representative supervised algorithms, Naive Bayes (NB) and Support Vector Machines (SVM). We use the Scikit Learn Tool [198] to build our classifiers. The training and testing data for our classifiers are 1,575 and 525 reviews, respectively. We repeated this trial 25 times to train both our generic and sentiment classifiers and compared the NB and SVM algorithms. We choose the generic (C1) and sentiment (C2) classifiers with the best F-measure. Finally, we use the generic and sentiment classifiers to classify ∼1.7M reviews of 2K app-pairs. The results are presented in Section 4.3.
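A minimal scikit-learn sketch of this training step is shown below; the function name and the 75/25 split mirror the setup described above, while the concrete vectorizer settings are assumptions about one reasonable configuration rather than our exact parameters.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

def train_review_classifier(reviews, labels, algorithm="svm"):
    """Train a bag-of-words review classifier and report its weighted F-measure.

    `reviews` is a list of raw review strings and `labels` the manually assigned
    classes (e.g. 'Problem Discovery', 'Feature Request', 'Non-informative').
    """
    classifier = LinearSVC() if algorithm == "svm" else MultinomialNB()
    pipeline = Pipeline([
        # Tokenize, lower-case, drop English stop words, and count word occurrences.
        ("bow", CountVectorizer(lowercase=True, stop_words="english")),
        ("clf", classifier),
    ])
    train_x, test_x, train_y, test_y = train_test_split(reviews, labels, test_size=0.25)
    pipeline.fit(train_x, train_y)
    predictions = pipeline.predict(test_x)
    return pipeline, f1_score(test_y, predictions, average="weighted")

With the 2,100 labeled reviews, calling this function once per classifier and per algorithm reproduces the kind of NB-versus-SVM comparison reported in Table 4.5.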
4.2.5 Success Rates
An important indication of the success of an app is the number of its downloads by users. However, as explained in Section 4.2.1, although download counts are available for Android apps, Apple does not publish this information for iOS apps. This means we need to find another metric to measure and compare the success rates of app-pairs. Since user feedback can also be an indication of the success or failure of apps, we use the classified reviews from the previous step, along with their ratings and number of stars, to compute a success metric (Box 8 in Figure 4.1), which is defined as follows:

SuccessRate = ((Rrev + (S̄ × AR)) / 7) × 100    (4.1)

where
1. Rrev = Prev − (100 − (Nrev + PDrev)/2)
2. S̄ = average value of the stars of an app
3. AR = 0.25 if AppRatings ≤ Q1; 0.50 if Q1 < AppRatings ≤ Q2; 0.75 if Q2 < AppRatings ≤ Q3; 1.00 if Q3 < AppRatings
and Prev, Nrev, and PDrev depict the rates of Positive, Negative, and Problem Discovery reviews for each app. Also, each app has an overall ratings count (AppRatings) on the app stores, which indicates the number of users who rate the app. We calculate Q1, Q2, and Q3 as the first, second (median) and third quartiles of app ratings for the 2K app-pairs for each platform. We choose to place more emphasis on the ratings and stars since that information is collected directly from the app stores and strongly reflects the users' view of the app. Furthermore, we multiply them together to fairly compare apps which have high stars and low ratings and apps which have average stars but high ratings. We use a range based on quartiles for AR, as opposed to the actual number of ratings, to normalize the data and fairly compare the apps. As for the reviews Rrev, we add Nrev and PDrev together and divide them by 2 and then subtract them from 100 to avoid negative numbers; we subtract the result from Prev to get an overall score. The star values S̄, ratings AR, reviews Rrev, and the overall SuccessRate are between [0–5], [0.25–1], [0–2], and [0–100%], respectively, for each app.
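The following sketch transcribes Equation 4.1 literally; the review rates are assumed to be expressed in percent, and the quartiles Q1–Q3 are the per-platform quartiles of app ratings described above.

def success_rate(p_rev, n_rev, pd_rev, stars, app_ratings, quartiles):
    """SuccessRate of Equation 4.1 for one app.

    p_rev, n_rev, pd_rev: Positive, Negative and Problem Discovery review rates (%).
    stars: average star value of the app (0-5).
    app_ratings: number of users who rated the app.
    quartiles: (Q1, Q2, Q3) of app ratings for the app's platform.
    """
    q1, q2, q3 = quartiles
    if app_ratings <= q1:
        ar = 0.25
    elif app_ratings <= q2:
        ar = 0.50
    elif app_ratings <= q3:
        ar = 0.75
    else:
        ar = 1.00
    r_rev = p_rev - (100 - (n_rev + pd_rev) / 2)      # review component, as in (4.1)
    return (r_rev + stars * ar) / 7 * 100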
Major Complaints Analysis
The goal in RQ4 is to understand the nature of user complaints and how they differ on the two platforms (Box 9 in Figure 4.1). To address this, we first collect the Problem Discovery reviews for 20 app-pairs having (1) the biggest differences in success rates between the platforms, and (2) over 100 problematic reviews. These 20 app-pairs are split into 10 in which Android is more successful than iOS and 10 in which iOS is more successful than Android. Then, we manually inspect and label 1K problematic reviews (Box 10 in Figure 4.1), by randomly selecting 25 Android user reviews and 25 iOS user reviews from each of the 20 app-pairs. We noticed that user complaints usually fall into the following five subcategories:
1. Critical: issues related to crashes and freezes;
2. Post Update: problems occurring after an update/upgrade;
3. Price Complaints: issues related to app prices;
4. App Features: issues related to the functionality of a feature, or its compatibility, usability, security, or performance;
5. Other: irrelevant comments.
Table 4.3 provides real example reviews of these categories. We use the labelled dataset to build a complaints classifier to automatically classify ∼350K problematic reviews of our 2K app-pairs.

Table 4.3: Reviews and subcategories of problem discovery.
C3 – Complaints Classifier (Classes and Examples)
1. Critical: "Crashing is terrible. This game crashes every two battles. Really annoying please fix."
2. Post Update: "after the recent update, the music bar has disappeared and I'm not able to listen to my fav. Music anymore, have to log on from safari to listen to it!."
3. Price Complaints: "Don't buy the pro I spent 4 dollars on the pro now it won't even let me assess the app without sending me back to the home screen! Four dollars is a lot when it comes to apps and now it's freakin gone!."
4. App Feature: "Video video. Is not working !!!!!!!! Please fix the video player. Thats not right."

4.2.6 Datasets and Classifiers
All our extracted data, the datasets for the identified app-pairs and the 2K app-pairs along with their extracted user reviews, as well as all our scripts and classifiers are available for download [165].

4.3 Findings
In this section, we present the results of our study for each research question.

4.3.1 Prevalence and Attributes (RQ1)
Cluster of apps
We found 1,048,575 (∼1M) Android clusters for 1,402,894 (∼1.4M) Android apps and 935,765 (∼0.9M) iOS clusters for 980,588 (∼1M) iOS apps in our dataset. The largest Android cluster contains 219 apps31 and the largest iOS cluster contains 65 apps.32 Additionally, 7,845 Android and 9,016 iOS clusters have more than one item. The first row of Table 4.4 shows descriptive statistics along with the p-value (Mann-Whitney) for cluster sizes, ignoring clusters of size 1. Figure 4.5 depicts the results of comparing the cluster sizes between the two platforms. We ignore outliers for legibility. The results are statistically significant (p < 0.05) and show that while Android clusters deviate more than iOS clusters, the median in iOS is higher than Android by one.
31 https://play.google.com/store/search?q=Kira-Kira&c=apps&hl=en
32 https://itunes.apple.com/us/developer/urban-fox-production-llc/id395696788
[Figure 4.5: Clusters. Box plots of cluster size (# apps in clusters) for AND and iOS.]
This could perhaps be explained by the following two observations: (1) not all iOS apps are universal apps (i.e., run on all iOS devices) and some apps have both iPhone-only and iPad-only versions instead of one universal app; (2) iOS has more free and pro versions of the same app than Android.

Prevalence of app-pairs
We found 80,169 (∼80K) exact app-pairs (Criteria E in Figure 4.4), which is 8% of the total iOS apps and 5.7% of the total Android apps in our datasets. When we relax both app and developer names, the number of app-pairs increases to 116,326 (∼117K) app-pairs, which is 13% of our iOS collection and 9.2% of our Android collection. While our dataset contains apps from 22 Apple and 25 Google categories, most of the pairs belong to popular categories, which exist on both platforms: {Games, Business, Lifestyle, Education, Travel, Entertainment, Music, Finance, Sports}.

Ratings & Stars
Interestingly, 68% of Android and only 18% of iOS apps have ratings (the number of users who have rated the app). The median is 0 for all iOS and 3 for all Android, as depicted in Table 4.4. However, when we only consider apps with at least one rating, the median increases to 21 for iOS and 11 for Android (see Figure 4.6).

Table 4.4: Descriptive statistics for iOS and Android (AND), on Cluster Size (C), Ratings (R), Ratings for all apps (R*), Stars (S), Stars for all apps (S*), and Price (P).
ID | Type | Min | Mean | Median | SD | Max | P
C | iOS | 2 | 3.30 | 3.00 | 2.11 | 65 | 0
C | AND | 2 | 3.00 | 2.00 | 3.69 | 219 |
R | iOS | 5 | 1,935.00 | 21.00 | 26,827.24 | 1,710,251 | 0
R | AND | 1 | 4,892.00 | 11.00 | 171,362.40 | 28,056,146 |
R* | iOS | 0 | 353.10 | 0.00 | 11,483.19 | 1,710,251 | 0
R* | AND | 0 | 3,302.00 | 3.00 | 140,807.60 | 28,056,146 |
S | iOS | 1 | 3.80 | 4.00 | 0.90 | 5 | 0
S | AND | 1 | 4.04 | 4.10 | 0.82 | 5 |
S* | iOS | 0 | 0.70 | 0.00 | 1.52 | 5 | 0
S* | AND | 0 | 2.73 | 3.70 | 2.01 | 5 |
P | iOS | 0 | 3.41 | 1.99 | 9.53 | 500 | 0
P | AND | 0 | 3.38 | 1.88 | 9.13 | 210 |
*Including apps that have no ratings/stars.

We ignore outliers for legibility. Furthermore, we compare the differences between ratings for each pair. In 63% of the pairs, Android apps have more users rating them (on average 4,821 more users) whereas in only 5% of the pairs, iOS apps have more users rating them (on average 1,966 more users). Additionally, the results of ratings in Table 4.4 are statistically significant (p < 0.05), indicating that while Android users tend to rate apps more than iOS users, for the rated app-pairs, iOS apps have higher ratings. The categories with the highest ratings were {Personalization, Communication, Games} on Android and {Games, Social Networking, Photo & Video} on iOS.
Similarly, 68% of Android and 18% of iOS apps have stars (i.e., the 1 to 5 score given to an app). When we consider the apps with stars, the median increases to 4 for iOS and 4.1 for Android (see Figure 4.7). Additionally, comparing the stars with the data we originally collected in November 2014 does not show considerable change on the two platforms. Comparing the differences between the stars for each pair, in 58% of the pairs, Android apps have more stars while in only 8% of the pairs, iOS apps have more stars. Additionally, while the results of stars are statistically significant (p < 0.05), the observed differences, having almost the same medians (see Table 4.4), are not indicative, meaning that although Android users tend to star apps more than iOS users, the starred app-pairs have similar stars. The categories with the highest number of stars were {Weather, Music & Audio, Comics} on Android and {Games, Weather, Photo & Video} on iOS.
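For reference, the per-attribute comparisons behind the p-values in Table 4.4 can be run with SciPy as sketched below; whether the reported test was one- or two-sided is not stated in the text, so the two-sided variant is assumed here.

from scipy.stats import mannwhitneyu

def compare_attribute(ios_values, android_values, alpha=0.05):
    """Mann-Whitney U test between the iOS and Android samples of one attribute
    (e.g. the ratings of rated apps), as used for the p-values in Table 4.4."""
    statistic, p_value = mannwhitneyu(ios_values, android_values,
                                      alternative="two-sided")
    return p_value, p_value < alpha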
[Figure 4.6: Ratings. Box plots of the number of people who rate, for AND and iOS.]
[Figure 4.7: Stars. Box plots of stars (1 to 5) for OldAND, OldiOS, AND, and iOS.]

Prices of app-pairs
The normal expectation is that the same app should have the same price on both platforms. However, comparing prices, 88% of app-pairs have different prices for their Android versus iOS versions. Comparing the rate of free and paid apps, 10% of the Android and 12% of iOS apps are paid. In 34% of the pairs, iOS apps have a higher price whereas in 54% of the pairs, Android apps have a higher price (see Figure 4.8). As Table 4.4 shows, the median is 1.99 for iOS and 2.11 for Android. The results are statistically significant (p < 0.05), and indicate that, for app-pairs, while more paid iOS apps exist, Android paid apps have slightly higher prices. The categories with the most expensive apps were {Medical, Books & Reference, Education} on Android and {Medical, Books, Navigation} on iOS.
[Figure 4.8: Prices. Scatter plot of AND price ($) versus iOS price ($) for app-pairs.]
To understand the reasons behind the price differences, we sent emails to all the developers of app-pairs with price differences of more than $10 (US) and asked them why their app-pairs were priced differently on the two platforms. Out of 52 emails, we received 25 responses and categorized the reasons:
Different monetization strategies per app store, i.e., paid apps vs. in-app purchases vs. freemium vs. subscription. For instance, in the Android version, the features exist as part of the app, while in the iOS version, the features are optional or provided as an in-app purchase. A developer responded, "The difference is that the Android version includes consolidation ($9.99), charting ($14.99), reports ($9.99) and rosters ($14.99), whereas these are 'in app purchase' options on Apple devices."
Different set of features on the two platforms. E.g., a developer responded, "the iOS version offers more features than the Android version."
Development/Maintenance cost of the app. Different parameters play a role in the development/maintenance cost of an app; the cost varies depending on the characteristics of the app. E.g., a developer responded, "Apple forces developers to constantly migrate the apps to their latest OS and tool versions, which causes enormous costs. ... the effort to maintain an App on iOS is much higher than on Android." While another developer stated, "It's easier to develop for (less device fragmentation), Android is relatively expensive and painful to create for and much harder to maintain and support."
Demographic biases. Developers price the app based on certain user demographics; e.g., "since iOS phones are $1,000+ in this market, these users are rich and willing to pay more readily for apps than Android users."
Exchange rate differences. E.g., "price in both are set to 99 EUR as we are mainly selling this in Europe. Play Store apparently still used USD converted by the exchange rate of the day the app was published."
We have to note that some of the developers we contacted were unaware of the price differences on the stores for their app-pairs.

Versions and last updated
While the app stores' guidelines suggest that developers follow typical software versioning conventions such as semantic versioning33 (major.minor.patch), they do not enforce any scheme. Therefore, mobile apps exhibit a wide variety of versioning formats containing letters and numbers, e.g., date-based schemes (year.month.patch). Our analysis indicates that only 25% of the app-pairs have identical versions. When we inspect the major digit only, 78% of the pairs have the same version. 13% of the Android apps have a higher version while 9% of the iOS apps have a higher version.
33 http://semver.org
Comparing the date the apps were last updated, 58% of the app-pairs have an iOS update date which is more recent than Android, while 42% have a more recent Android update date. Interestingly, 30% of the app-pairs have update dates which are more than 6 months apart. To understand why developers update their apps inconsistently across the platforms, we emailed all the developers of app-pairs which were recently updated (after January 2016) on either platform, and in which the other platform has not been updated in 80 days or more. Out of 65 emails, we received 15 responses and categorized the reasons below:
Ease of releasing updates. E.g., "we are experimenting with a new 3D printing feature, and wanted to try it on Android before we released it on iOS. As you know, developers can release updates quickly to fix any problems on Android, but on iOS, we have to wait a week or two while Apple reviews the game."
Preferring one platform over the other, for various reasons. E.g., "while there are many Android handsets and potentially many downloads, this doesn't translate well to dollars spent, relative to iOS."
Developer skills and expertise. The developers might be more skilled at building apps for one of the platforms than the other; e.g., "I learned iOS first and am developing for iOS full time, so everything is easier for me with iOS."
Update due to a platform-specific feature. E.g., "only updated the iOS version to switch over to AdMob as the advertising network for iOS. Apple announced that iAd is being discontinued."
We have to note that, similar to the reasons behind the price differences, some of the developers we contacted mentioned that the development/maintenance cost of the app could affect updates on either platform.
Finding 1: We found 80K exact app-pairs. While Android users tend to rate and star apps much more than iOS users, the Stars were equal and the Ratings were higher on the iOS platform. Also, it is common to have price and update date mismatches for app-pairs.

4.3.2 Top Rated Apps (RQ2)
Interestingly, our analysis of the top 100 free iOS and Android apps shows that 88% of the top iOS and 86% of the top Android apps have pairs. 37 app-pairs are in the top 100 list for both platforms.
On the other hand, for the top 100 paid iOS and Android apps, 66% of the top iOS and 79% of the top Android apps have pairs. 30 of the paid pairs are in the top 100 for both platforms.
Furthermore, we sent emails to all the developers of apps with no pairs and asked why their top rated Android or iOS app was only published for one platform. One respondent stated the following: "while the situation is unique with every company's strategy and different with every individual product, there's always a set of driving factors with expertise, success record, target market size and cost of development that influence the general strategy, starting from which market to enter first, [to] when and if to enter another market (if at all)." Out of 81 emails, we received 29 responses and categorized the reasons below:
Lack of resources. E.g., "building the same app across two platforms is actually twice the work given we can't share code ... so we'd rather make a really good app for one platform than make a mediocre one for two."
Platform restrictions. E.g., "I've only focused on the Android platform simply because Apple doesn't allow for much customization to their UI."
Revenue per platform. E.g., "In my experience, iOS users spend more money, which means a premium [paid app with a price higher than 2.99] is more likely to succeed. ... while the Android platform has the market size, it proves to be harder for small [companies] to make good money."
Fragmentation within a platform. E.g., "my app is very CPU intensive and thus, I must test it on every model. With a much-limited number of models for iOS, it's feasible. On Android, it's impossible to test on every model and quality would thus suffer."
Similar apps already exist on the other platform. E.g., "Apple devices already have a default podcast app."
Platform-specific apps. E.g., the iTunes U app is developed by Apple whereas the Google Play Games app is developed by Google Play.
A common response from developers was that the app for the other platform is under development. We also observed that more antivirus programs are available for Android. Thus, security could be more of an issue for the open-source Android than for the restricted iOS platform.
Finding 2: On average, 80% of the top rated apps are app-pairs. Main reasons for apps existing only on one platform include: lack of resources, platform restrictions and fragmentation, and revenue per platform.

4.3.3 Success Rate (RQ3)
Classification. To evaluate the accuracy of the classifiers, we measure the F-measure = (2 × Precision × Recall) / (Precision + Recall) for the Naive Bayes and SVM algorithms, listed in Table 4.5, where Precision = TP / (TP + FP) and Recall = TP / (TP + FN). We found that SVM achieves a higher F-measure. On average, F(SVM) = 0.84 for the generic classifier and F(SVM) = 0.74 for the sentiment classifier. The F-measures obtained by our classifiers are similar to related studies such as [185] (0.72) and [71] (0.79).
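The sketch below simply mirrors these definitions for a single class from its true-positive, false-positive and false-negative counts; it is meant only to make the evaluation concrete.

def precision_recall_f_measure(tp, fp, fn):
    """Precision, Recall and F-measure from TP, FP and FN counts,
    following the formulas used to evaluate the classifiers."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure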
We selected the classifiers with the best F-measures and used them to classify 1,702,100 (∼1.7M) reviews for 2,003 (∼2K) app-pairs.

Table 4.5: Statistics of the 14 Apps used to build the classifiers (C1 = Generic Classifier, C2 = Sentiment Classifier, NB = Naive Bayes Algorithm, SVM = Support Vector Machines Algorithm, Train = Training pool).
# | App | Google Category | Apple Category | Train | Test | F(C1-NB) | F(C2-NB) | F(C1-SVM) | F(C2-SVM)
1 | FruitNinja | Game (Arcade) | Game | 150 | 50 | 0.77 | 0.68 | 0.83 | 0.75
2 | UPSMobile | Business | Business | 150 | 50 | 0.80 | 0.69 | 0.82 | 0.76
3 | Starbucks | Lifestyle | Food & Drink | 150 | 50 | 0.75 | 0.63 | 0.84 | 0.77
4 | YellowPages | Travel & Local | Travel | 150 | 50 | 0.78 | 0.62 | 0.85 | 0.75
5 | Vine | Social | Photo & Video | 150 | 50 | 0.81 | 0.70 | 0.84 | 0.76
6 | Twitter | Social | Social Networking | 150 | 50 | 0.79 | 0.67 | 0.84 | 0.75
7 | AdobePhotoShop | Photography | Photo & Video | 150 | 50 | 0.82 | 0.72 | 0.85 | 0.75
Total / Average of 7 Apps | | | | 1,050 | 350 | 0.78 | 0.67 | 0.85 | 0.76
8 | YahooFinance | Finance | Finance | 75 | 25 | 0.75 | 0.66 | 0.84 | 0.73
... | ... | ... | ... | ... | ... | ... | ... | ... | ...
Total / Average of 14 Apps | | | | 1,575 | 525 | 0.77 | 0.65 | 0.84 | 0.74

[Figure 4.9: The rates of classifiers' categories for our 2K app-pairs, where each dot represents an app-pair. Scatter plots of Android vs. iOS rates for Problem Discovery (PD), Feature Request (FR), Non-informative, Positive (P), Negative (N), and Neutral.]
Figure 4.9 and Figure 4.10 plot the distribution of the rates for the main categories in the sentiment and generic classifiers for our app-pairs, as well as the success rate for our app-pairs. Each dot represents an app-pair. The statistical results are depicted in Table 4.6.
[Figure 4.10: The success rates for our 2K app-pairs, where each dot represents an app-pair. Scatter plot of Android vs. iOS success rates.]

Table 4.6: Descriptive statistics for the iOS and Android (AND) reviews for the app-pairs: Problem Discovery (PD), Feature Request (FR), Non-informative (NI), Positive (P), Negative (N), Neutral (NL), and SR (Success Rate).
ID | Type | Min | Mean | Median | SD | Max | P
PD | iOS | 0.00 | 20.47 | 15.62 | 16.65 | 100.0 | 0.00
PD | AND | 0.00 | 21.06 | 17.54 | 14.61 | 100.0 |
FR | iOS | 0.00 | 17.50 | 16.03 | 10.81 | 100.0 | 0.00
FR | AND | 0.00 | 13.71 | 12.50 | 8.88 | 100.0 |
NI | iOS | 0.00 | 62.04 | 64.95 | 20.77 | 100.0 | 0.00
NI | AND | 0.00 | 65.23 | 67.10 | 17.45 | 100.0 |
P | iOS | 0.00 | 55.62 | 59.26 | 20.41 | 100.0 | 0.00
P | AND | 0.00 | 49.74 | 51.36 | 17.64 | 100.0 |
N | iOS | 0.00 | 9.80 | 6.66 | 10.07 | 100.0 | 0.00
N | AND | 0.00 | 7.72 | 5.74 | 7.39 | 100.0 |
NL | iOS | 0.00 | 34.57 | 32.45 | 14.87 | 100.0 | 0.00
NL | AND | 0.00 | 42.54 | 41.73 | 13.97 | 100.0 |
SR | iOS | 8.93 | 55.31 | 54.68 | 19.98 | 98.1 | 0.27
SR | AND | 11.84 | 55.97 | 55.22 | 18.83 | 92.4 |

On average, Feature Request, Positive, and Negative reviews are more common among iOS apps whereas Problem Discovery, Non-informative and Neutral reviews are more common among Android apps. In addition, the average length of reviews on the iOS platform is 103 characters and 76 characters on the Android platform.
Success Rate. Figure 4.11 shows the success rates for our app-pairs. The app-pairs are arranged based on the difference of their success rates between the two platforms. The far ends on the figure (the regions marked with ellipses) indicate apps which are very successful on one platform but not on the other.
[Figure 4.11: Success rates for 2K app-pairs, with Success Rate (SR) plotted against app-pairs (hundreds). The green round shape refers to Android apps and the blue triangular shape refers to iOS apps.]
The results indicate that 17.4% (348 app-pairs) of the app-pairs have a difference of 25% or more in their success rate. The categories with the most successful apps were {Games, Entertainment, Finance} on Android and {Games, Entertainment, Education} on iOS.
Finding 3: The majority of app-pairs are similarly successful on the two app stores. However, 17% have a difference of more than 25% in their success rates.
Success Rate Differences. The method used to implement the app might affect its success. We randomly selected 30 app-pairs with close success rates (within a 5% range) and downloaded their Android source code; we found that 8 of them were implemented using a hybrid approach. The hybrid approach uses web technologies such as HTML, CSS, and Javascript to build mobile apps that can run on multiple platforms. We also analyzed 30 app-pairs that are more successful on iOS than Android (difference greater than 20%) and 30 app-pairs that are more successful on Android. We found that only 4 in each set used the hybrid approach. In total, we found 16 hybrid apps, which represents 17.7% of the 90 app-pairs we inspected. This result is in line with a recent study [210], which found that 15% of Android apps are developed using a hybrid approach. The results of this analysis indicate that some app-pairs are equally successful because they use a hybrid approach, meaning they have the same implementation on the two platforms.
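We identified hybrid apps by inspecting the downloaded source code; purely as an illustration, a heuristic along the following lines could flag typical Cordova/PhoneGap artifacts in a decoded APK directory. The marker files and directories are assumptions, not our exact procedure.

import os

# Directories and files that commonly indicate a hybrid (WebView-based) Android app,
# e.g. Cordova/PhoneGap projects. Illustrative only; the study inspected source manually.
HYBRID_DIRS = ("assets/www",)
HYBRID_FILES = ("cordova.js", "phonegap.js")

def looks_hybrid(decoded_apk_dir):
    """Heuristically decide whether a decoded APK directory is a hybrid app."""
    for root, _dirs, files in os.walk(decoded_apk_dir):
        rel = os.path.relpath(root, decoded_apk_dir).replace(os.sep, "/")
        if any(rel.startswith(d) for d in HYBRID_DIRS):
            return True
        if any(f in HYBRID_FILES for f in files):
            return True
    return False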
Furthermore, to verify our success rate results, we sent emails to all the developers of app-pairs which have a difference of more than 30% in their success rates. We asked if they had noticed the difference and about possible reasons that their two apps are not equally successful on both platforms. Although this is a sensitive topic, out of 200 emails, we received 20 responses; all the developers agreed with our findings and were aware of the differences. For example, one developer said: "our app was by far more successful on iOS than on Android (about a million downloads on iOS and 5k on Android)."
Overall, developers mentioned that (release/update) timing and first impressions can make a large difference in how users perceive the app on the app stores. The variation in success can also be attributed to developers providing more (timely) support on either side. Additionally, app store support and promotional opportunities were mentioned to help developers, e.g., "Apple ... promote your work if they find it of good quality, this happened to us 4–5 times and this makes a big difference indeed". Furthermore, some respondents find Google Play's quick review process helpful to release bug fixes and updates quickly.

4.3.4 Major Complaints (RQ4)
The goal in RQ4 is to understand the nature of user complaints and whether they differ on the two platforms.
Our complaint classifier has, on average, an F-measure of F(SVM) = 0.7. We used the classifier to classify 350,324 (∼350K) problematic reviews for our 2K app-pairs. The results, depicted in Figure 4.12 and Table 4.7, show that the complaints about the apps vary between the two platforms. On average, iOS apps have more critical and post update problems than their Android counterparts, which could be due to Apple regularly forcing developers to migrate their apps to their latest OS and SDK.
On the other hand, Android apps have more complaints related to the App Features subcategory (i.e., functionality of a feature, compatibility, usability, security, performance), which could be due to device fragmentation on Android.
llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllll lllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llll lllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllll llllllllll lllllllllllllllllllllllllllllll llllllllll lllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll0 20 40 60 80 100020406080100Android App Feature (AF)iOS App Feature (AF)Figure 4.12: The rates of complaints categories for our 2K app-pairs, where each dot represents an app-pair.wide array of Android devices running with different versions of Android, differ-ent screen sizes, and different CPUs can cause security, performance or usabilityproblems. This negative side-effect of fragmentation is also discussed in the liter-ature [86, 94, 111, 215].Examples of iOS post update problems include users unable to login, featuresno longer working, loss of information or data, and unresponsive or slow UI. Ex-amples of Android problems include dissatisfaction with a certain functionality,incompatibility with a certain device/OS, network and connection problems.109Table 4.7: Descriptive statistics for the problematic reviews of the app-pairs: Critical (CR), Post Update(PU), Price Complaints (PC), and App Feature (AF).ID Type Min Mean Median SD Max PAF iOS0.00 53.71 54.29 18.15 100.00.00AND 0.00 60.55 60.92 16.25 100.0CR iOS0.00 23.72 21.05 16.40 100.00.00AND 0.00 19.98 17.65 13.66 100.0PU iOS0.00 6.08 4.24 7.44 100.00.00AND 0.00 3.91 2.33 5.17 50.0PC iOS0.00 7.76 5.00 9.41 100.00.00AND 0.00 6.70 4.54 8.20 100.0Finding 4: On average, iOS apps have more critical and post update prob-lems while Android apps have more complaints related to app features and non-functional properties.4.4 DiscussionIn this section, we discuss several implications for app developers, app stores, andresearchers as well as some of the threats to validity of our results.4.4.1 ImplicationsFor App DevelopersOur findings and developer feedback indicate that developers wishing to test newideas or features find the Android platform more convenient since Google has lessformal guidelines and a fast review process compared to Apple’s strict guidelinesand lengthy review process. Also, paid apps seem to be more successful on the iOSplatform; thus, depending on the financial model adopted, developers might wantto prioritize building the app for iOS first assuming resources to build the apps forboth platforms is limited. 
Our automated classifiers can be useful for developers to make sense of the feedback in user reviews and to handle the large number of reviews that some apps receive. Based on our analysis of user reviews, iOS apps suffer from more critical and post update problems. We would suggest that developers focus their attention and resources on fixing such issues first, since many users seem to be easily annoyed by them. On the other hand, the Android platform suffers more from compatibility, usability and performance issues, so this is where Android developers should spend more of their time.

For App Stores

App stores’ support and promotional opportunities greatly benefit the developers. We found evidence that Apple rewards well-made apps with promotional opportunities, which could drive the success of the app, an approach that could be adopted by Google Play. Google Play’s quick review process is appealing to many developers and something that can be improved on the Apple app store.

For Researchers

Our results indicate that 80% of the top rated apps exist on both the Apple and Google Play app stores. While both platforms are popular and equally important, Android has gained the majority of the attention from the software engineering research community. Our suggestion is to look at both the Apple App Store and Google Play in future studies to have more representative coverage. Additionally, examining app-pairs closely can help developers bridge the gap in terms of success for their apps. As recently identified by Nagappan and Shihab [172], one of the obstacles with cross-platform analysis is the lack of a dataset for such apps. Our dataset of 80K app-pairs [165] can help to mitigate this issue and can be leveraged by other researchers for further cross-platform analysis.

4.4.2 Threats to Validity

Our manual labelling of the reviews to train the classifiers could be a source of internal threat to validity. In order to mitigate this threat, uncertainties were cross-validated and resolved through discussions and refinements between the authors.

As shown in Figure 4.4, the app-pairs detected in our study are a subset of all possible app-pairs. Our study only considers exact matches for app-pairs, which means there exist app-pairs that are not included in our analysis. For instance, an app named The Wonder Weeks (https://itunes.apple.com/app/the-wonder-weeks/id529815782?mt=8) on iOS has a pair on the Android platform with the name Baby Wonder Weeks Milestones (https://play.google.com/store/apps/details?id=org.twisevictory.apps&hl=en), but it is not included in our study. While our study has false negatives, our manual validation of 100 randomly selected app-pairs shows that there are no false positives.

In terms of representativeness, we chose app-pairs from a large representative sample of popular mobile apps and categories. With respect to generalizability, iTunes and Google Play are the most popular systems currently, although apps in other app stores could have other characteristics. Regarding replication, all our data is publicly available [165], making the findings of our study reproducible.

4.5 Related Work

Mobile app stores provide app developers with a new and critical channel to extract user feedback. As a result, many studies have been conducted recently through mining and analysis of app store content such as user reviews [68, 71, 95, 113, 124, 134–136, 140, 147, 156, 181, 183, 185, 208, 211], app descriptions [72, 105, 143, 200], and app bytecode [54, 194, 195, 218].

A number of studies [71, 93, 95, 110, 122] have focused on extracting valuable information for developers from user reviews in app stores.
Iacob et al. [122] found that 23% of reviews represent feature requests. They proposed a prototype for automatic retrieval of mobile app feature requests from online reviews. Chen et al. [71] found that 35% of app reviews contain information that can directly help developers improve their apps. They proposed AR-Miner, a technique to extract the most informative user reviews. First, they filter out non-informative reviews through text analysis and machine learning. Then, they use topic modelling to recognize topics in the reviews classified as informative. Panichella et al. [185] proposed an approach built on top of AR-Miner to automatically classify app reviews into different categories. Khalid et al. [134, 136] manually analyzed and tagged reviews of iOS apps to identify different issues that users of iOS apps complain about. They studied 6,390 low star-rating reviews for 20 free iOS apps and uncovered 12 types of complaints. They found that functional errors, feature requests and app crashes are the most frequent complaints, while privacy and ethical issues, and hidden app costs are the complaints with the most negative impact on app ratings. Chen et al. [72] developed mechanisms to verify the maturity ratings of apps based on app descriptions and user reviews and investigated the possible reasons behind incorrect ratings. They discovered that over 30% of Android apps have unreliable maturity ratings.

Our work, on the other hand, aims at characterizing the differences in mobile app-pairs across two different platforms. To the best of our knowledge, this is the first work to report a large-scale study targeting iOS and Android mobile app-pairs.

4.6 Conclusions

In this work, we present the first quantitative and qualitative study of mobile app-pairs. We mined 80K iOS and Android app-pairs and compared their app-store attributes. We built three automated classifiers and classified 1.7M reviews to understand how user complaints and concerns vary across platforms. Additionally, we contacted app developers to understand some of the major differences in app-pair attributes such as prices, update frequencies, success rates, and top rated apps existing only on one platform.

For future work, the testing and analysis of apps across multiple platforms could be explored. While our recent study [88] is a step toward a better understanding of this, with the increased fragmentation in devices and platforms, it still remains a challenge to test mobile apps across varying hardware and platforms [172]. Among other directions, the release dates of the app-pairs can be investigated to understand which platform developers target first when they release a new app. Additionally, app features, extracted from app descriptions, can be compared across different platforms. Finally, while we combined the stars, ratings and user reviews to measure app success, future studies could explore ways of measuring user retention, number of downloads, user loyalty, recency, or monetization.

Chapter 5

Reverse Engineering iOS Mobile Applications

Summary

This chapter appeared at the 19th IEEE Working Conference on Reverse Engineering (WCRE 2012) [85].

As a result of the ubiquity and popularity of smartphones, the number of third party mobile apps is explosively growing. With the increasing demands of users for new dependable applications, novel software engineering techniques and tools geared towards the mobile platform are required to support developers in their program comprehension and analysis tasks.
In this work, we propose a reverse engineering technique that automatically (1) hooks into, dynamically runs, and analyzes a given iOS mobile app, (2) exercises its user interface to cover the interaction state space and extracts information about the runtime behaviour, and (3) generates a state model of the given application, capturing the user interface states and transitions between them. Our technique is implemented in a tool called ICRAWLER. To evaluate our technique, we have conducted a case study using six open-source iPhone apps. The results indicate that ICRAWLER is capable of automatically detecting the unique states and generating a correct model of a given mobile app.

5.1 Introduction

According to recent estimations [178], by 2015 over 70 percent of all handset shipments will be smartphones, capable of running mobile apps. Currently, there are over 600,000 mobile apps on Apple’s AppStore [21] and more than 400,000 on Android Market [19].

Some of the challenges involved in mobile app development include handling different devices, multiple operating systems (Android, Apple iOS, Windows Mobile), and different programming languages (Java, Objective-C, Visual C++). Moreover, mobile apps are developed mostly in small-scale, fast-paced projects to meet the competitive market’s demand [137]. Given the plethora of different mobile apps to choose from, users show low tolerance for buggy, unstable applications, which puts an indirect pressure on developers to comprehend and analyze the quality of their applications before deployment.

With the ever increasing demands of smartphone users for new applications, novel software engineering techniques and tools geared towards the mobile platform are required [82, 169, 213] to support mobile developers in their program comprehension, analysis and testing tasks [91, 128].

According to a recent study [192], many developers interact with the graphical user interface (GUI) to comprehend the software by creating a mental model of the application. For traditional desktop applications, an average of 48% of the application’s code is devoted to the GUI [170]. Because of their highly interactive nature, we believe the amount of GUI-related code is typically higher in mobile apps.

To support mobile developers in their program comprehension and analysis tasks, we propose a technique to automatically reverse engineer a given mobile app and generate a comprehensible model of the user interface states and transitions between them. In this work, we focus on native mobile apps for the iOS platform. To the best of our knowledge, reverse engineering of iOS mobile apps has not been addressed in the literature yet.

Our work makes the following contributions:

• A technique that automatically performs dynamic analysis of a given iPhone app by executing the program and extracting information about the runtime behaviour. Our approach exercises the application’s user interface to cover the interaction state space;

• A heuristic-based algorithm for recognizing a new user interface state, composed of different UI elements and properties.

• A tool implementing our technique, called ICRAWLER (iPhone Crawler), capable of automatically navigating and generating a state model of a given iPhone app. This generated model can assist mobile developers to better comprehend and visualize their mobile app.
It can also be used for analysis and testing purposes (i.e., smoke testing, test case generation).

• An evaluation of the technique through a case study conducted on six different open-source iPhone apps. The results of our empirical evaluation show that ICRAWLER is able to identify the unique states of a given iPhone app and generate its state model accurately, within the supported transitional UI elements.

5.2 Related Work

We divide the related work into three categories: mobile app security testing, industrial testing tools currently available to mobile developers, and GUI reverse engineering and testing.

Mobile App Security Testing. Security testing of mobile apps has gained most of the attention from the research community when compared to other areas of research such as functional testing, maintenance, or program comprehension. Most security testing approaches are based on static analysis of mobile apps [67] to detect mobile malware. Egele et al. [84] propose PIOS to perform static taint analysis on iOS app binaries. To automatically identify possible privacy gaps, the mobile app under test is disassembled and a control flow graph is reconstructed from Objective-C binaries to find code paths from sensitive sources to sinks. Extending on PIOS, the same authors discuss the challenges involved in dynamic analysis of iOS apps and propose a prototype implementation of an Objective-C binary analyzer [205]. Interestingly, to exercise the GUIs, they use image processing techniques. This work is closest to ours. However, their approach randomly clicks on a screen area, reads the contents from the device’s frame buffer, and applies image processing techniques to compare screenshots and identify interactive elements. Since image comparison techniques are known to have a high rate of false positives, in our approach we “programmatically” detect state changes by using a heuristic-based approach.

Industrial Testing Tools. Most industrial tools and techniques currently available for analyzing mobile apps are manual or specific to the application in a way that they require knowledge of the source code and structure of the application. For instance, KIF (Keep It Functional) [34] is an open source iOS integration test framework, which uses the assigned accessibility labels of objects to interact with the UI elements. The test runner is composed of a list of scenarios and each scenario is composed of a list of steps. Other similar frameworks are FRANK [31] and INSTRUMENTS [125]. A visual technology, called SIKULI [39], uses fuzzy image matching algorithms on the screenshots to determine the positions of GUI elements, such as buttons, in order to find the best matching occurrence of an image of the GUI element in the screen image. SIKULI creates keyboard and mouse click events at that position to interact with the element. There are also record and playback tools for mobile apps such as MONKEYTALK [36]. However, using such tools requires application-specific knowledge and much manual effort.

GUI Reverse Engineering and Testing. Reverse engineering of desktop user interfaces was first proposed by Memon et al. in a technique called GUI Ripping [159]. Their technique starts at the main window of a given desktop application, automatically detects all GUI widgets and analyzes the application by executing those elements. Their tool, called GUITAR, generates an event-flow graph to capture a model of the application’s behaviour and generate test cases.

For web applications, Mesbah et al.
[163] propose a crawling-based technique to reverse engineer the navigational structure and paths of a web application under test. The approach, called CRAWLJAX, automatically builds a model of the application’s GUI by detecting the clickable elements, exercising them, and comparing the DOM states before and after the event executions. The technique is used for automated test case generation [164] and maintenance analysis [160] in web applications.

Amalfitano et al. [49] extend on this approach and propose a GUI crawling technique for Android apps. Their prototype tool, called A2T2, manages to extract models of a small subset of widgets of an Android app.

Gimblett et al. [97] present a generic description of UI model discovery, in which a model of an interactive software system is automatically discovered through simulating its user actions. Specifically, they describe a reusable and abstract API for user interface discovery.

Further, Chang et al. [69] build on SIKULI, the aforementioned tool, to automate GUI testing. They help GUI testers automate regression testing by programming test cases once and repeatedly applying those test cases to check the integrity of the GUI.

Hu et al. [119] propose a technique for detecting GUI bugs for Android applications using Monkey [35], an automatic event generation tool. Their technique automatically generates test cases, feeds the application with random events, instruments the VM, and produces log/trace files to detect errors by analyzing them post-run.

To the best of our knowledge, no work has been done so far to reverse engineer Objective-C iPhone apps automatically. Our approach and algorithms are different from the aforementioned related work in the way we track the navigation within the application, retrieve the UI views and elements, and recognize a new state, which are geared towards native iPhone user interfaces.

5.3 Background and Challenges

Here, we briefly describe the relevant iPhone programming concepts [125] required for understanding our approach in Section 5.4.

Objective-C is the primary programming language used to write native iOS apps. The language adds a thin layer of object-oriented and Smalltalk-style messaging to the C programming language. Apple provides a set of Objective-C APIs collectively called Cocoa. Cocoa Touch is a UI framework on top of Cocoa. One of the main frameworks of Cocoa Touch is UIKit, which provides APIs to develop iOS user interfaces.

The Model-View-Controller design pattern is used for building iOS apps. In this model, the controller is a set of view controllers as well as the UIApplication object, which receives events from the system and dispatches them to other parts of the system for handling. As soon as an app is launched, the UIApplication main function creates a singleton application delegate object that takes control. The application delegate object can be accessed by invoking the shared application class method from anywhere in code.

At a minimum, a window object and a view object are required for presenting the application’s content. The window provides the area for displaying the content and is loaded from the main nib file (a nib file is a special type of resource file in which the UI elements are stored). Standard UI elements, which are provided by the UIKit framework for presenting different types of content, such as labels, buttons, tables, and text fields, are inherited from the UIView class. Views draw content in a designated rectangular area and handle events.

Events are objects sent to an application to inform it of user actions.
Many classes in UIKit handle touch events in ways that are distinctive to objects of the class. The application sends these events to the view on which the touch occurred. That view analyzes the events and responds in an appropriate manner. For example, buttons and sliders are responsive to gestures such as a tap or a drag, while scroll views provide scrolling behaviour for tables or text views. When the system delivers a touch event, it sends an action message to a target object when that gesture occurs.

View controllers are used to change the UI state of an application. A view controller is responsible for handling the creation and destruction of its views, and the interactions between the views and other objects in the application. The UIKit framework includes classes for view controllers such as UITabBarController, UITableViewController and UINavigationController. Because iOS apps have a limited amount of space in which to display content, view controllers also provide the infrastructure needed to swap out the views from one view controller and replace them with the views of another view controller. The most common relationships between source and destination view controllers in an iPhone app are either by using a navigation controller, in which a child of a navigation controller pushes another child onto the navigation stack, or by presenting a view controller modally. The navigation controller is an instance of the UINavigationController class and is used in structured content applications to navigate between different levels of content in order to show a screen flow, whereas modal view controllers represent an interruption to the current workflow.

Challenges. Dynamic analysis of iOS apps has a number of challenges. Most iOS apps are heavily based on event-driven graphical user interfaces. Simply launching an application will not be sufficient to infer a proper understanding of the application’s runtime behaviour [205]. Unfortunately, most iOS apps currently do not come with high coverage test suites. Therefore, to execute a wide range of paths and reverse engineer a representative model, an approach targeting iOS apps needs to be able to automatically change the application’s state and analyze state changes.

Figure 5.1: The Olympics2012 iPhone app going through a UI state transition, after a generated event.

One challenge that follows is defining and detecting a new state of an application while executing and changing its UI. In other words, automatically determining whether a state change has occurred is not that straightforward.

Another challenge, associated with tracking view controllers, revolves around the fact that firing an event on the UI could result in several different scenarios as far as the UI is concerned, namely: (1) the current view controller could go to the next view controller (modally, by being pushed to the navigation stack, or by changing to the next tab in a tab bar view controller), or (2) UI element(s) in the current interface could be dynamically added/removed/changed, or (3) the current view controller goes back to the previous view controller (dismissed modally or popped from the navigation stack), or (4) nothing happens.
Analyzing each of these scenarios requires a different way of monitoring the UI changes and the navigation stack.

Figure 5.2: The generated state graph of the Olympics2012 iPhone app. (Edges are labelled with the triggering actions, e.g., TabBarItem clicked, Back, and goto transitions for each of the 38 sports, such as gotoArchery and gotoCycling.)

5.4 Our Approach

Our approach revolves around dynamically running a given iOS mobile app, navigating its user interface automatically, and reverse engineering a model of the application’s user interface states and transitions between them. Figure 5.1 shows snapshots of a UI state transition in an iPhone app (called Olympics2012, used in our case study in Section 5.6) after a generated event. Figure 5.2 shows the automatically generated state graph of the same application. The figure is minimized because of space restrictions, and it is depicted to give an impression of the graph inferred by our approach.

Figure 5.3 depicts the relation between our technique and a given mobile app. The following seven steps outline our technique’s operation.

Figure 5.3: Relation between ICRAWLER and a given iPhone app. The right side of the graph shows key components of an iPhone app taken from [125]. (The ICRAWLER side comprises the ICRAWLER controller, its shared instance, the state-flow graph, and KIF; the app side comprises the UIApplication object, app delegate, view controllers, UI window, and views and UI objects.)

Step 1 - Hooking into the Application: As soon as the application is started, our technique kicks in by setting up a shared instance object. As shown in Figure 5.3, we immediately hook into and monitor the application delegate object to identify the initial view controller and infer information about its UI components.

Step 2 - Analyzing UI Elements: After obtaining the initial view controller, we have access to all its UI elements. We keep this information in an array associated to the view controller. Meanwhile, our technique recognizes the different types of UI elements, such as labels, buttons, table cells, and tabs, and identifies which UI elements have an event listener assigned to them.

Step 3 - Exercising UI Elements: To exercise a UI element, we look for an unvisited UI element that has an event listener. As depicted in Figure 5.3, after gathering all the information about the event listeners, the UI object, and its action and target, we generate an event on that element and pass it to the UIApplication object, which is responsible for receiving the events and dispatching them to the code for further handling.

Step 4 - Accessing Next View Controller: By observing the changes on the current view controller, we obtain the next view controller and analyze the behaviour. The event could lead to four scenarios: no user interface change, the changes are within the current view controller, going to a new view controller, or going to the previous view controller.

Step 5 - Analyzing New UI Elements: After getting a new view controller, we collect all its UI elements.
If the action has resulted in staying in the current view controller, we record the changes on the UI elements.

Step 6 - Comparing UI States: Once we get the new view controller and its UI elements, we need to compare the new state with all the previously visited unique states. This way, we can determine if the action changes the current state or ends up on a state that has already been analyzed. If the state has not been visited before, it is added to the set of unique visited states.

Step 7 - Recursive Call: We recursively repeat from step 3 until no other executable UI elements are left within the view controller and we have traversed all the view controllers.

We further describe our approach in the following subsections.

5.4.1 Hooking into the Application

The process of accessing the initial view controller is different from the rest of the view controllers. Since our goal is to be as nonintrusive and orthogonal to the application’s source code as possible, we determine the initial view controller by performing a low-level Objective-C program analysis on the application delegate object. To that end, we employ a number of runtime functions to deduce the initial view controller. We use the Objective-C runtime reference library [38], which provides support for the dynamic properties of the Objective-C language and works with classes, objects, and properties directly. It is effective primarily for low-level debugging and meta-programming. In addition, the Key-Value Coding (KVC) protocol [37] is used to access UI objects at runtime. The KVC protocol assists in accessing the properties of an object indirectly by key/value, rather than through invocation of an accessor method or as instance variables [37].

Once the application delegate is accessed, we retrieve all the properties of this class and their names. After getting the property names in the application delegate, we call a KVC method to access an instance variable of the initial view controller using the property name string. This way, we are able to identify the type of the initial view controller (e.g., UITabBarController, UINavigationController, or just a custom UIViewController). This knowledge is required for setting up the initial state.

5.4.2 Analyzing UI Elements

In our approach, a UI state includes the current view controller and its properties, accompanied by its set of UI elements. Once we get the view controller, we read all the subviews, navigation items, as well as tool bar items of the view controller in order to record the corresponding UI elements in an array associated to the view controller. Having the required information for a state, we set a global variable to point to the current state throughout the program.

5.4.3 Exercising UI Elements

We fire an event (e.g., a tap) on each unvisited UI element that has an event listener assigned to it. Since events are handled in different ways for different UI classes, for each UI element type, such as tables, tabs, text views, and navigation bar items, we recognize its type and access the appropriate view. As shown in Figure 5.3, we use KIF’s methods to handle the event.

After an element is exercised, we use a delay to wait for the UI to update, before calling the main function recursively.
Based on our experience, a 1 second waiting time is enough after firing different event types such as tapping on a table cell or a button, scrolling a table up and down, and closing a view.

- (void)icDismissModalVC:(BOOL)animated {
    // Record that a dismiss has occurred by storing a flag in the defaults system.
    [[NSUserDefaults standardUserDefaults] setBool:YES forKey:@"IC_isDismissed"];
    // Call the original (now renamed) method.
    [self icDismissModalVC:animated];
}

Figure 5.4: The new method in which we inject code to set the dismissed boolean and then call the original method.

5.4.4 Accessing the Next View Controller

After exercising a UI element, we need to analyze the resulting UI state. An event could potentially move the UI forward, backward, or have no effect at all.

At a low level, going back to a previous view controller in iPhone apps happens either by popping the view controller from the navigation stack or by dismissing a modal view controller. We monitor the navigation stack after executing each event to track possible changes on the stack and thus become aware of the pop calls. However, being aware of the dismissal of a modal view needs to be addressed differently. Our approach combines reflection with code injection to track whether a dismiss method is called. To that end, we employ the Category and Extension [65] feature of Objective-C, which allows adding methods to an existing class without subclassing it or knowing the original classes. We also use a technique called Method Swizzling [76], which allows the method implementation of a class to be swapped with another method.

We define a category extension to the UIViewController class and add a new method in this category (see Figure 5.4). We then swap a built-in method of the view controller class, responsible for dismissing a view controller, with the new method (see Figure 5.5). The static +load method is also added to the category and called when the class is first loaded. We use the +load method to swap the implementation of the original method with our replaced method. The swap method call swaps the method implementations such that calls to the original method at run-time result in calls to our method defined in the category. As shown in Figure 5.4, we also call the original method, which is now renamed. Our method stores a boolean in the defaults system. The iOS defaults system is available throughout the application, and any data saved in the defaults system will persist through application sessions. Therefore, after a dismiss call occurs, we set the dismissed boolean to true. At runtime, each time an action is executed, we check the dismissed boolean in the NSUserDefaults object to see if a dismiss has occurred. We set this back to false if that is the case. This way we are able to track whether the event results in going back to a previous view controller and take the proper corresponding action.

+ (void)load {
    if (self == [UIViewController class]) {
        Method originalMethod = class_getInstanceMethod(self,
            @selector(dismissModalViewControllerAnimated:));
        Method replacedMethod = class_getInstanceMethod(self,
            @selector(icDismissModalVC:));
        // Exchange the two implementations (method swizzling).
        swap(originalMethod, replacedMethod);
    }
}

Figure 5.5: Swapping the original built-in method with our new method in the +load function.

A new view controller could be pushed to the navigation stack, presented modally, or be a new tab of a tab bar controller. If the action results in staying in the current view controller, different state changes could still occur, such as UI element(s) dynamically being changed/added/removed, or a pop-up message or an action sheet appearing.
If we do not notice any changes within the current state, we move further with finding the next clickable UI element. Otherwise, we need to conduct a state comparison to distinguish new states from already visited states. If the state is recognized as a new state, a screenshot of the interface is also recorded.

5.4.5 Comparing States

Another crucial step in our analysis is determining whether the state we encounter after an event is a new UI state. As opposed to other techniques that are based on image-based comparisons [69, 205], in order to distinguish a new state from the previously detected states, we take a programmatic, heuristic-based approach in which we compare the view controllers of the application and all their UI elements before and after the event is executed.

Deciding what constitutes a UI state change is not always that straightforward. For instance, consider when a user starts typing a string into a text field and that action changes the value of the text field’s property, or when the send button of an email application is enabled as soon as the user starts typing the body of the email. We need a way to figure out if these changes (changing the text of a text field/label or enabling a button) should be seen as a new state. To that end, we propose a similarity-based heuristic to emphasize or ignore changes on specific properties of view controllers, their accompanying UI elements, and the elements’ properties.

Our state recognition heuristic considers the following properties of view controllers: class, title, and the number of UI elements. In addition, for each UI element, it considers class, hidden, enable, target, and action. Although our algorithm can handle as many properties as required, we are interested in these attributes because we believe they are most likely to cause a visible UI state change. We consider a set of distinct weights for each of the aforementioned attributes of a view controller, denoted as WVC = {wvc1, wvc2, ...}, as well as another set of distinct weights for each of the aforementioned attributes of a UI element, WE = {we1, we2, ...}. The value of each weight is a number between 0 and 1. All weights have default values that can be overridden by the user’s input if required. These default values are obtained for each weight through an experimental trial and error method (discussed in Section 5.6). The similarity, σ, between two UI states is a percentage calculated as follows:

\sigma = \frac{\sum_{i=1}^{Size(W_{VC})} |W_{VC_i}| \cdot VC_i \;+\; \sum_{j=1}^{N_e} \sum_{k=1}^{Size(W_E)} |W_{E_k}| \cdot El_j}{Size(W_{VC}) + N_e \times Size(W_E)} \times 100

where VC returns 1 if the property of the two view controllers is equal and 0 otherwise, and Size(WVC) and Size(WE) return the total number of properties considered for a view controller and a UI element, respectively. The second part of the summation calculates the similarity of each of the elements’ properties: El returns 1 if the property of the two UI elements is equal, and Ne is the total number of UI elements. The total summation over the view controllers and elements is divided by the total number of properties.
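To make the heuristic concrete, the following Python sketch mirrors the weighted comparison above (the actual procedure is given as Algorithm 2 below). It is only an illustration: ICRAWLER itself is implemented in Objective-C, and the dictionary-based state representation and attribute names here are simplified stand-ins.

# Illustrative sketch of the weighted state-similarity heuristic; the default
# weights and the 70% threshold are the values reported in the evaluation section.
VC_WEIGHTS = {"class": 0.8, "title": 0.8, "element_count": 0.8}
EL_WEIGHTS = {"class": 0.7, "hidden": 0.7, "enable": 0.7, "target": 0.7, "action": 0.7}

def similarity(state_a, state_b):
    """Return the similarity percentage between two UI states."""
    score = 0.0
    # View-controller level attributes: add the weight when the attribute matches.
    for attr, weight in VC_WEIGHTS.items():
        if state_a["controller"][attr] == state_b["controller"][attr]:
            score += weight
    # Element level attributes, compared pairwise by position.
    for el_a, el_b in zip(state_a["elements"], state_b["elements"]):
        for attr, weight in EL_WEIGHTS.items():
            if el_a.get(attr) == el_b.get(attr):
                score += weight
    total_attrs = len(VC_WEIGHTS) + len(state_a["elements"]) * len(EL_WEIGHTS)
    return (score / total_attrs) * 100

def find_similar_state(visited_states, current_state, threshold=70):
    """Return a previously visited state considered equivalent, or None."""
    for s in visited_states:
        if similarity(s, current_state) >= threshold:
            return s
    return None

Because every weight is below 1, the maximum attainable similarity is below 100%, which is why a threshold around 70% is used with the default weights.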
Algorithm 2 shows our algorithm for checking the similarity of the current state (after an event) with all the visited states. It returns a similar state if one is found among the visited states. As input, the algorithm gets the two sets of distinct weights for view controller and UI element properties, a similarity threshold (τ), the set of unique states visited so far, and the current state.

Algorithm 2: State Change Recognition
input : Set of weights for view controller properties (Wvc)
input : Set of weights for UI element properties (We)
input : Similarity threshold (τ)
input : Set of the unique states visited (VS)
input : Current state (cs)
output: Similar state (s ∈ VS, otherwise nil)
1  begin
2    σ ← 0
3    foreach s ∈ VS do
4      σ ←
5        (Wvc(class) × (s.viewController.class ≡ cs.viewController.class) +
6         Wvc(title) × (s.viewController.title ≡ cs.viewController.title) +
7         Wvc(elements) × (s.uiElementsCount ≡ cs.uiElementsCount))
8      foreach e1 ∈ s.uiElementsArray do
9        e2 ← GETELEMENTATINDEX(cs, e1)
10       σ ← σ + (We(class) × (e1.class ≡ e2.class) + We(hidden) × (e1.hidden ≡ e2.hidden) + We(enable) × (e1.enable ≡ e2.enable) + We(target) × (e1.target ≡ e2.target) + We(action) × (e1.action ≡ e2.action))
11     attributes ← Size(Wvc) + (s.uiElementsCount × Size(We))
12     if ((σ/attributes) × 100) >= τ then
13       return s
14   return nil

For each visited state (line 3), we calculate the similarity of the two states by adding the similarity of the two view controllers’ classes, titles and the number of UI elements (line 7). Then, for each UI element in a visited state (line 8), the corresponding UI element in the current state (line 9) is retrieved and their similarity is calculated. Finally, we divide the similarity by the total number of attributes considered so far, calculate the percentage (line 12) and compare it to the threshold. The algorithm assumes the two interfaces to be equivalent if the calculation over the aforementioned weight-based attributes is greater than or equal to τ. In other words, we consider two UI states equal if they have the same view controller, title, and set of UI elements, with the same set of selected properties and the same event listeners.

5.4.6 State Graph Generation

To explore the state space, we use a depth-first search algorithm and incrementally create a multi-edge directed graph, called a state-flow graph [163], with the nodes representing UI states and edges representing user actions causing a state transition.
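As an illustration of this exploration strategy, the Python sketch below builds such a multi-edge directed graph with a recursive depth-first crawl. The app driver interface (current_state, clickable_elements, fire_event, go_back_to) and the similar predicate are hypothetical stand-ins for what ICRAWLER does natively through KIF and the state comparison of Algorithm 2.

# Illustrative sketch of the depth-first state-space exploration; `app` and
# `similar` are hypothetical stand-ins, not ICRAWLER's actual interfaces.
import networkx as nx

def crawl(app, similar, max_depth=10):
    """Explore the app depth-first and return a multi-edge directed state-flow graph."""
    graph = nx.MultiDiGraph()
    initial = app.current_state()
    graph.add_node(initial.id)
    visited = [initial]

    def explore(state, depth):
        if depth >= max_depth:
            return
        for element in app.clickable_elements(state):
            app.fire_event(element)                    # e.g., tap, then wait for the UI
            target = app.current_state()
            known = next((s for s in visited if similar(s, target)), None)
            if known is None:                          # unseen screen: record and recurse
                visited.append(target)
                graph.add_node(target.id)
                graph.add_edge(state.id, target.id, action=element.action)
                explore(target, depth + 1)
            elif known.id != state.id:                 # seen before: only record the edge
                graph.add_edge(state.id, known.id, action=element.action)
            app.go_back_to(state)                      # backtrack before the next element
    explore(initial, 0)
    return graph

When an event leads to an already-visited state, only an edge is added, which keeps the graph finite even though many actions (e.g., the 38 sport buttons in Olympics2012) lead to the same screen.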
5.5 Tool Implementation: ICRAWLER

We have implemented our approach in a tool called ICRAWLER. ICRAWLER is implemented in Objective-C using Xcode 3. We use a number of libraries as follows.

DCINTROSPECT [30] is a library for debugging iOS user interfaces. It listens for shortcut keys to toggle view outlines and print view properties, as well as the action messages and target objects, to the console. We have extended DCINTROSPECT in a way to extract a UI element’s action message, target object, its properties and values. We further use our extension to this library to output all the reverse engineered UI elements’ properties within one of our output files.

To generate an event or insert textual input, we use and extend the KIF framework [34]. At runtime, ICRAWLER extracts UI elements with event listeners assigned to them and collects information about the action message and target object of each UI element. By recognizing the type of a UI element, ICRAWLER gains access to its appropriate view. Then it uses KIF’s internal methods to generate an event on the view.

At the end of the reverse engineering process, the state graph is transformed into an XML file using XSWI [42], which is a standalone XML stream writer implemented in Objective-C.

The output of ICRAWLER consists of the following three items: (1) an XML file, representing a directed graph with actions as edges and states as nodes, (2) screenshots of the unique states, and (3) a log of all the reverse engineered UI elements (including their properties, values, actions and targets), generated events and states.

Table 5.1: Experimental objects.

ID  Exp. Object         Resource
1   Olympics2012        https://github.com/Frahaan/2012-Olympics-iOS--iPad-and-iPhone--source-code
2   Tabster             http://developer.apple.com/library/ios/#samplecode/Tabster/Introduction/Intro.html#
3   TheElements         http://developer.apple.com/library/ios/#samplecode/TheElements/Introduction/Intro.html#
4   Recipes & Printing  http://developer.apple.com/library/ios/#samplecode/Recipes + Printing/Introduction/Intro.html#
5   NavBar              http://developer.apple.com/library/ios/#samplecode/NavBar/Introduction/Intro.html#
6   U Decide            http://appsamuck.com/day12.html

5.6 Empirical Evaluation

To assess the effectiveness of our reverse engineering approach, we have conducted a case study using six open-source iPhone apps. We address the following research questions in our evaluation:

RQ1 Is ICRAWLER capable of identifying unique states of a given iPhone application correctly?

RQ2 How complete is the generated state model in terms of the number of edges and nodes?

RQ3 How much manual effort is required to set up and use ICRAWLER? What is the performance of ICRAWLER?

5.6.1 Experimental Objects

We include six open-source experimental objects from the official Apple sample code, GitHub, and other online resources. Table 5.1 shows each object’s ID, name, and resource. Table 5.2 presents the characteristics of these applications in terms of their size and complexity. We use XCODE STATISTICIAN (http://xcode-statistician.mac.informer.com/) for collecting metrics such as the number of header and main files, lines of code (LOC) and statements.

Table 5.2: Characteristics of the experimental objects.

ID  .m/.h Files  LOC (Objective-C)  Statements (;)  Widgets
1   22           2,645              1,559           398
2   21           1,727              286             14
3   28           2,870              690             21
4   23           2,127              508             7
5   20           1,487              248             10
6   13           442                162             15

The table also shows the number of UI widgets within each application. A UI widget is a UI element, such as a tab bar view with all of its tab icons, a table view with all of its cells, a label or a button. The number of UI widgets is collected through ICRAWLER’s output file, which logs all the UI elements and their properties.

5.6.2 Experimental Design

In order to address RQ1, we need to compare the unique states generated by ICRAWLER to the actual unique states for each application. As mentioned before, ICRAWLER identifies the unique states through Algorithm 2 and keeps the screenshots of the unique states in a local folder. To form a comparison baseline, we manually run and navigate each application, count the unique states, and compare that with the output of ICRAWLER.

To assess ICRAWLER’s generated state model (RQ2), we also need to form a baseline of the actual number of edges (i.e., user actions that change the states) and states (unique and repetitive) to compare with ICRAWLER’s state model. Therefore, we manually run and navigate each application and count the edges and the states.
Note that there are currently no other similar tools available to compare ICRAWLER’s results against.

In order to address RQ3, we measure the time required to set up ICRAWLER and employ it for each of the given iPhone apps. The following series of manual tasks is required before ICRAWLER can start the analysis:

• The ICRAWLER framework should be added to the project of the application under analysis.

• In order to enable ICRAWLER to access the delegating application object, ICRAWLER’s initialization line of code should be added to the built-in method application:didFinishLaunchingWithOptions:.

• Finally, a preprocessor flag (RUN_ICRAWLER) needs to be added to the created Xcode target.

Further, to investigate the performance of ICRAWLER for each application under test, we measure the time between calling ICRAWLER and when ICRAWLER finishes its job.

As we mentioned earlier, we obtain default values for the threshold and similarity weights by an experimental trial and error method for each of the applications. The best values that we have observed are: threshold (70%); weights: view controller’s class (0.8), title (0.8), and number of UI elements (0.8); UI element’s class (0.7), hidden (0.7), enable (0.7), target (0.7), and action (0.7). These are also the values used in our evaluation, for all the experimental objects.

5.6.3 Results

Setting up ICRAWLER and utilizing it for a given iPhone app takes 10 minutes on average. The results of our study are shown in Table 5.3. The table shows the number of Unique States, Total States, and Edges counted manually and by ICRAWLER. Further, the total number of Generated Events and the Total Time for ICRAWLER are presented. We should note that the total time depends on the application and includes the delay (1 sec) we use after each action. The number of generated events is different from the number of detected edges. The events include all the user actions, while the edges are only those actions that result in a state change (including backward edges). For instance, scrolling a table or a view up and down counts as an event while it is not an edge in the state model. Another example, related to our state comparison algorithm, is a label that changes after executing a button, which ICRAWLER does not consider as a new state.

Table 5.3: Results.

          Manual                    ICRAWLER
ID   Unique   Tot.     Edges   Unique   Total    Edges   Gen.     Total
     States   States           States   States           Events   Time (Sec)
1    6        81       43      6        81       43      85       88
2    11       16       17      9        12       11      18       18
3    6        16       15      6        16       15      27       29
4    6        13       10      3        5        4       8        10
5    8        14       13      3        5        4       7        9
6    2        13       2       2        13       2       12       13

The Olympics2012 (#1) application provides information about 38 sports in the Olympics 2012 as well as a timetable and a countdown (see Figure 5.1 and Figure 5.2). According to Table 5.3, ICRAWLER is capable of identifying the correct number of unique states, total states, and edges within this application. The events include tapping on a tab bar item, scrolling up/down a view, scrolling up/down a table, tapping on a backward/forward button and tapping on a button which flips the view. The number of user actions, i.e., generated events, is 85 while the number of edges is 43 (including a backward edge). This is because user actions such as scrolling do not change states and as a result they are not counted as edges. The number of unique states is 6 while the number of total states is 81.
This is because there are 38 buttons in this application which lead to the same UI state while presenting different data for the 38 types of sports.

Events within Tabster (#2) include tapping on a tab bar item, scrolling up/down a table, tapping on a table cell, tapping on a backward/forward button, tapping on a dismiss/present button and writing text. When exercising UI elements which require text input through the keyboard, we used a dummy string based on the keyboard type, e.g., numeric, alphanumeric, URL or email address input. As shown in Table 5.3, our approach is able to identify 11 edges and 9 unique states. However, Tabster has a tab bar view with a “more page” feature, and ICRAWLER supports an ordinary tab bar view (without the “more” feature) at this time. As a result, there is a difference between the number of unique states and edges in the baseline and ICRAWLER.

Actions within the TheElements (#3) application include tapping on a tab bar item, scrolling up/down a table, tapping on a table cell, tapping on a backward/forward button and tapping on a button which flips the view. ICRAWLER is successfully able to cover the states and edges of TheElements. Here, we disabled a button which closes the application and forwards the user to the AppStore.

The Recipes & Printing application (#4) browses recipes and has the ability to print the browsed recipes. Here, the difference between the manual and ICRAWLER results in Table 5.3 is due to ignoring the states and actions involved with printing.

For tables, one could think of different strategies to take: (1) generate an event on each and every single table cell, (2) randomly click on a number of table cells, or (3) generate an event on the first table cell. In our technique, once ICRAWLER encounters a table view, it scrolls down and up to ensure the scrolling action works properly and does not cause any unwanted crashes, e.g., by having a specific character in an image’s URL and trying to load the image in a table cell. ICRAWLER then generates an event on the first row and moves forward. This works well for table cells that result in the same next view. However, there are cases in which table cells lead to different views. NavBar (#5) is such a case. There are five different table cells within this application, which go to different UI states. Thus we witness a difference between the number of edges and states counted manually and by ICRAWLER. This is clear empirical evidence suggesting that we need to improve our table cell analysis strategy.

5.6.4 Findings

The results of our case study show that ICRAWLER is able to identify the unique states of a given iPhone app and generate its state model correctly, within the supported UI elements and event types. Generally, it takes around 10 minutes to set up and use ICRAWLER. The performance of ICRAWLER is acceptable. For the set of experimental objects, the minimum analysis time was 9 seconds (5 states, 4 edges, 7 events) and the maximum was 88 seconds (81 states, 43 edges, 85 events).

5.7 Discussion

Limitations. There are some limitations within our current implementation of the approach. Although it is minimal, users still need to complete a few tasks to set up ICRAWLER within their applications manually. There are also some UI elements, such as the tool bar, slider, page control, and search bar, which are not supported currently.
In addition, while ICRAWLER currently supports the most common gestures in iOS apps, such as tapping on a UI element, inserting text, and scrolling views, there is no support yet for other advanced gestures such as swiping pages and pinching (e.g., zooming in and out of images).

Threats to Validity. The fact that we form the comparison baselines manually could be a threat to internal validity. We did look for other tools to compare our results against, without success. Manually going through the different applications to create baselines is labour intensive and potentially subject to errors and author bias. We tried to mitigate this threat by asking two other students to create the comparison baselines.

Additionally, the independent variables of weights and threshold within our state recognition algorithm have a direct effect on our dependent variables, such as the number of unique states and edges. As a result, choosing other values for these independent variables, rather than our default values, could result in differences in the outcome. As mentioned in the evaluation section, we chose these optimal values through a series of trial and error experiments.

In our attempt to gather the experimental objects, we noticed that there is a small collection of open-source iPhone apps available online – note that we could not use applications available in the AppStore for our experiment since we needed access to their source code. Even though this made it difficult to select applications that reflect the whole spectrum of different UI elements in iPhone apps, we believe the selected objects are representative of the type of applications ICRAWLER can reverse engineer. However, we acknowledge the fact that, in order to draw more general conclusions, more mobile apps are required.

Applications. There are various applications for our technique. First of all, our technique enables automatic interaction with the mobile app. This alone can be seen as performing smoke testing (e.g., to detect crashes). In addition, the inferred state model can be used for automated test case generation. Further, using the model to provide a visualization of the state space helps developers obtain a better understanding of their mobile apps. The approach can be extended to perform cross-platform testing [161], i.e., checking whether an application is working correctly on different platforms such as iOS and Android, by comparing the generated models. Finally, other application areas could be performance and accessibility testing of iOS apps.

5.8 Conclusions

As smartphones become ubiquitous and the number of mobile apps increases, new software engineering techniques and tools geared towards the mobile platform are required to support developers in their program comprehension, analysis, and testing tasks.

In this work, we presented our reverse engineering technique to automatically navigate a given iPhone app and infer a model of its user interface states. We implemented our approach in ICRAWLER, which is capable of exercising and analyzing UI changes and generating a state model of the application. The results of our evaluation, on six open-source iPhone apps, point to the efficacy of the approach in automatically detecting unique UI states, with a minimum level of manual effort required from the user. We believe our approach and techniques have the potential to help mobile app developers increase the quality of iOS apps.

There are several opportunities in which our approach can be enhanced and extended for future research.
The immediate step would be to extend the current version of ICRAWLER to support the remaining set of UI elements within UIKit, such as the tool bar, slider, page control, and search bar. Other directions include using this technique for smoke testing of iPhone apps as well as for generating test cases from the inferred state model. Furthermore, ICRAWLER can be extended to support iPad apps as well as reverse engineering analysis at the binary level. This is beneficial as the AppStore distributes the binary code of applications, and it would be interesting to apply automated testing to any application regardless of having access to its source code.

Chapter 6

Detecting Inconsistencies in Multi-Platform Mobile Apps

Summary

This chapter appeared at the 26th IEEE International Symposium on Software Reliability Engineering (ISSRE 2015) [88].

Due to the increasing popularity and diversity of mobile devices, developers write the same mobile app for different platforms. Since each platform requires its own unique environment in terms of programming languages and tools, the teams building these multi-platform mobile apps are usually separate. This in turn can result in inconsistencies in the apps developed. In this work, we propose an automated technique for detecting inconsistencies in the same native app implemented for iOS and Android platforms. Our technique (1) automatically instruments and traces the app on each platform for given execution scenarios, (2) infers abstract models from each platform execution trace, (3) compares the models using a set of code-based and GUI-based criteria to expose any discrepancies, and finally (4) generates a visualization of the models, highlighting any detected inconsistencies. We have implemented our approach in a tool called CHECKCAMP. CHECKCAMP can help mobile developers in testing their apps across multiple platforms. An evaluation of our approach with a set of 14 industrial and open-source multi-platform native mobile app-pairs indicates that CHECKCAMP can correctly extract and abstract the models of mobile apps from multiple platforms, infer likely mappings between the generated models based on different comparison criteria, and detect inconsistencies at multiple levels of granularity.

6.1 Introduction

Recent industry surveys [50, 51] indicate that mobile developers are mainly interested in building native apps, because they offer the best performance and allow for advanced UI interactions. Native apps run directly on a device’s operating system, as opposed to web-based or hybrid apps, which run inside a browser.

Currently, iOS [21] and Android [19] native mobile apps dominate the app market, each with over a million apps in their respective app stores. (In this work, we focus on native apps; henceforth, we use the terms ‘mobile app’ or simply ‘app’ to denote ‘native mobile app’.) To attract more users, implementing the same mobile app across these platforms has become a common industry practice. Ideally, a given mobile app should provide the same functionality and high-level behaviour on different platforms. However, as found in our recent study [86], a major challenge faced by industrial mobile developers is to keep the app consistent across platforms. This challenge is due to the many differences across the platforms, from the devices’ hardware, to operating systems (e.g., iOS/Android), and programming languages used for developing the apps (e.g., Objective-C/Java). We also found that developers currently treat the mobile app for each platform separately and manually perform screen-by-screen comparisons, often detecting many cross-platform inconsistencies [86].
This man-ual process is, however, tedious, time-consuming, and error-prone.In this work, we propose an automated technique, called CHECKCAMP (Check-ing Compatibility Across Mobile Platforms), which for the same mobile app im-plemented for iOS and Android platforms (1) instruments and generates traces ofthe app on each platform for a set of user scenarios, (2) infers abstract models fromthe captured traces that contain code-based and GUI-based information for eachpair, (3) formally compares the app-pair using different comparison criteria to ex-pose any discrepancies, and (4) produces a visualization of the models, depictingany detected inconsistencies. Our work makes the following main contributions:• A technique to capture a set of run-time code-based and GUI related metricsused for generating abstract models from iOS and Android app-pairs;40In this work, we focus on native apps; henceforth, we use the terms ‘mobile app’ or simply ‘app’to denote ‘native mobile app’.138• Algorithms along with an effective combination of mobile specific criteria tocompute graph-based mappings of the generated abstract models targetingmobile app-pairs, used to detect cross-platform app inconsistencies;• A tool implementing our approach, called CHECKCAMP, which visualizesmodels of app-pairs, highlighting the detected inconsistencies. CHECK-CAMP is publicly available [27];• An empirical evaluation of CHECKCAMP through a set of seven industrialand seven open-source iOS and Android mobile app-pairs.Our results indicate that CHECKCAMP can correctly extract abstract modelsof the app-pairs to infer likely mappings between the generated abstract modelsbased on the selected criteria; CHECKCAMP also detects 32 valid inconsistenciesin the 14 app-pairs.6.2 Pervasive InconsistenciesA major challenge faced by industrial mobile developers is to keep the app consis-tent across platforms. This challenge and the need for tool support emerged fromthe results of our qualitative study [86], in which we interviewed 12 senior appdevelopers from nine different companies and conducted a semi-structured survey,with 188 respondents from the mobile development community.In this work, to identify the most pervasive cross-platform inconsistencies be-tween iOS and Android mobile app-pairs, we conducted an exploratory study byinterviewing three industrial mobile developers, who actively develop apps for bothplatforms. The following categories and examples are extracted from the inter-views as well as a document shared with us by the interviewees, containing 100real-world cross-platform mobile app inconsistencies. 
Ranked in the order of im-pact on app behaviour, the most pervasive inconsistency categories are as follows:Functionality: The highest level of inconsistencies is missing functionality; e.g.,“Notes cannot be deleted on Android whereas iOS has the option to deletenotes.” Or “After hitting send, you are prompted to confirm to upload – thisprompt is missing on iOS.”139Figure 6.1: The overview of our technique for behaviour checking across mobile platforms.Data: When the presentation of any type of data is different in terms of order,phrasing/wording, imaging, or text/time format; e.g., “Button on Androidsays ‘Find Events’ while it should say ‘Find’ similar to iOS.”Layout: When a user interface element is different in terms of its layout such assize, order, or position; e.g., “Android has the ‘Call’ button on the left and‘Website’ on the right - iPhone has them the other way around.”Style: The lowest level of inconsistency pertains to the user interface style; i.e.,colour, text style, or design differences, e.g., “iOS has Gallery with a bluebackground while Android has Gallery with a white background”.We propose an approach that is able to automatically detect such inconsisten-cies. Our main focus is on the first two since these can impact the behaviour of theapps.6.3 ApproachFigure 6.1 depicts an overview of our technique called CHECKCAMP. We describethe main steps of our approach in the following subsections.1406.3.1 Inferring Abstract ModelsWe build separate dynamic analyzers for iOS and Android, to instrument the app-pair. For each app-pair, we execute the same set of user scenarios to exercisesimilar actions that would achieve the same functionality (e.g., reserving a hotelor creating a Calendar event). As soon as the app is started, each analyzer startsby capturing a collection of traces about the runtime behaviour, UI structures, andmethod invocations. Since the apps are expected to provide the same functional-ity, our intuition is that their traces should be mappable at an abstract level. Thecollected traces from each app are used to construct a model:Definition 2 (Model). A Model µ for a mobile app M is a directed graph, denotedby a 4-tuple < α , η , V, E > where:1. α is the initial edge representing the action initiating the app (e.g., a tap onthe app icon).2. η is the node representing the initial state after M has been fully loaded.3. V is a set of vertices representing the states of M. Each υ ∈ V represents aunique screen of M annotated with a unique ID.4. E is a set of directed edges (i.e., transitions) between vertices. Each (υ1, υ2)∈ E represents a clickable c connecting two states if and only if state υ2 isreached by executing c in state υ1.5. µ can have multi-edges and be cyclic.Definition 3 (State). A state s ∈ V represents the user interface structure of asingle mobile app screen. This structure is denoted by a 6-tuple, < γ , θ , τ , λ ,Ω , δ >, where γ is a unique state ID, θ is a classname (e.g., name of a ViewController in iOS or an Activity in Android), τ is the title of the screen, λis a screenshot of the current screen, Ω is a set of user interface elements with theirproperties such as type, action, label/data, and δ is a set of auxiliary properties(e.g., tag, distance) used for mapping states.141Definition 4 (Edge). An edge e ∈ E is a transition between two states representinguser actions. 
It is denoted by a 6-tuple, < γ , θ , τ , λ , Ω , δ >, where γ is a uniqueedge ID, θ is a source state ID, τ is a target state ID, λ is a list of methods invokedwhen the action is triggered, Ω is a set of properties of a touched element41 (i.e.type, action, label/data) and δ is a set of auxiliary properties (e.g., tag, distance)used for mapping purposes.iOS App Model InferenceIn iOS, events can be of different types, such as touch, motion, or multimediaevents. We focus on touch events since the majority of actions are of this type. Atouch event object may contain one or more finger gestures on the screen. It alsoincludes methods for accessing the UI view in which the touch occurs. We trackthe properties of the UI element that the touch event is exercised on. To capture thisinformation, we employ the Category and Extension [65] feature of Objective-C,which allows adding methods to an existing class without subclassing it or knowingthe original classes. We also use a technique called Method Swizzling [76], whichallows the method implementation of a class to be swapped with another method.To that end, we define a category extension to the UIApplication class and anew method in this category. We then swap a built-in method, responsible for send-ing an event, with the new method. The swap method call modifies the methodimplementations such that calls to the original method at runtime result in calls toour method defined in the category. Additionally, we capture the invoked methodcalls after an event is fired. We use aspects to dynamically hook into methods andlog method invocations. Once an event is fired at runtime, all the invoked methodsand their classes are traced and stored in a global dataset.For each event fired, we add an edge to the model. Figure 6.2 shows an edgeobject of an iPhone app (called MTG, used in our evaluation in Section 6.5) in-cluding its captured touched element and invoked methods.To construct the model, we need to capture the resulting state after an event istriggered. In iPhone apps, a UI state includes the current visible view controller, itsproperties, accompanied by its set of UI elements. We use a delay to wait for the UI41 A touched element is the UI element which has been exercised when executing a scenario (e.g., a cell in atable, a button, a tab in a tab bar).142Figure 6.2: An edge object of MTG iPhone app with its touched element and methods.to update properly after an event, before triggering another event on a UI element.Based on our empirical analyses, a two second waiting time is enough for mostiOS apps. An event could potentially move the UI forward, backward, or have noeffect at all. If the action results in staying in the current view controller, differentstate mutations could still occur. For instance, UI element(s) could dynamicallybe changed/added/removed, or the main view of the view controller be swappedand replaced by another main view with a set of different UI element(s). At alow level, moving the UI forward or backward loads a view controller in iPhoneapps. 
Similar to capturing properties of each edge, our approach for capturing the UI structure of each state combines reflection with code injection to observe the methods that load view controllers.

Once we obtain a reference to the view controller, our approach takes a snapshot of the state and captures all the UI element objects in an array associated with the view controller, such as tables with cells, tab bars with tab items, tool bar items, and navigation items (left, right, or back buttons), and it loops through all the subviews (e.g., labels, buttons) of the view controller. For each of them, we create an element object with its ID, type, action (i.e., the event handler, representing the method that will handle the event), label, and details.

Figure 6.3 shows a snapshot of a state in the MTG iPhone app including its UI element objects. For instance, the top left button in Figure 6.3 has 'UIButton' as type, '1' as label, and 'button1Pressed' as action (the event handler). We set details for extra information such as the number of cells in a list. Using this information, we create a state node in the model.

    State(ID, ClassName, Title, #Elements)
    (S4, DecklistCounterController, -, 8)

    UIElements (Type, Label, Action, Details)
    (UIButton, 1, button1Pressed, -)
    (UIButton, 2, button2Pressed, -)
    (UIButton, 3, button3Pressed, -)
    (UIButton, 4, button4Pressed, -)
    (UILabel, Total: 0 (+0), -, -)
    (UIButton, Reset, resetPressed, -)
    (UIButton, Undo, undoPressed, -)
    (UITabBar, -, itemClicked, 5tabs)

Figure 6.3: A snapshot of a state in the MTG iPhone app with its captured UI element objects.

Android App Model Inference

At a high level, our Android dynamic analyzer intercepts method calls executed while interacting with an app and captures UI information (state) upon the return of these methods. Similar to iOS, Android has different types of events. In our approach, we focus on user-invoked events since they contribute to the greatest changes in the UI and allow the app to progress through different states. These types of events get executed when a user directly interacts with the UI of an app, for instance by clicking a button or swiping on the screen. When a user interacts with a UI element, the associated event listener method is invoked, and the element is passed as one of its arguments. To create a new edge in our model, we inspect these arguments and extract information about the UI element that was interacted with by the user. This inspection also allows us to separate user-invoked events from other types, by checking whether the method argument was a UI element such as a button or table that the user can interact with. We compare the argument against the android.widget package [1], which contains the visual UI elements to be used in apps.

In our Android analyzer, a UI state includes the current visible screen and its properties, accompanied by its set of UI elements. When an executed method returns, we use the activity that called the method to retrieve information about the state of the UI. To access the UI layout of the current view, we use a method provided by the Android library called getRootView [1]. This method returns a ViewGroup object, which is a tree-like structure of all the UI elements present in the current screen of the app. We traverse this tree recursively to retrieve all the UI elements. Additionally, we capture some unique properties of the UI elements such as labels for TextViews and Buttons, and the number of items for ListViews. These properties are used during the mapping phase to compare iOS and Android states at a lower level.
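For illustration, a simplified traversal along these lines is sketched below in Java. This is not CHECKCAMP's actual implementation: the class and method names are ours, and the root view is obtained here from the current Activity's decor view, whereas our analyzer performs the capture upon the return of the intercepted methods as described above.

    import android.app.Activity;
    import android.view.View;
    import android.view.ViewGroup;
    import android.widget.AdapterView;
    import android.widget.TextView;
    import java.util.ArrayList;
    import java.util.List;

    // Simplified sketch of the recursive UI-tree traversal (not CHECKCAMP's actual code).
    public final class StateCapture {

        public static List<String> captureState(Activity activity) {
            List<String> elements = new ArrayList<String>();
            // Root of the view hierarchy for the currently visible screen.
            View root = activity.getWindow().getDecorView().getRootView();
            traverse(root, elements);
            return elements;
        }

        private static void traverse(View view, List<String> out) {
            // Only record widgets from the android.widget package (buttons, lists, ...).
            Package pkg = view.getClass().getPackage();
            if (pkg != null && pkg.getName().startsWith("android.widget")) {
                StringBuilder sb = new StringBuilder(view.getClass().getSimpleName());
                if (view instanceof TextView) {          // TextView, Button, EditText, ...
                    sb.append(", label=").append(((TextView) view).getText());
                }
                if (view instanceof AdapterView) {       // ListView, Spinner, GridView, ...
                    sb.append(", items=").append(((AdapterView<?>) view).getCount());
                }
                out.add(sb.toString());
            }
            // Recurse into children if this view is a container.
            if (view instanceof ViewGroup) {
                ViewGroup group = (ViewGroup) view;
                for (int i = 0; i < group.getChildCount(); i++) {
                    traverse(group.getChildAt(i), out);
                }
            }
        }
    }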
6.3.2 Mapping Inferred Models

Next, we analyze each model-pair to infer likely mappings implied by the states and edges through a series of phases. Prior to the Mapping phase, two preprocessing steps are required, namely Pruning and Merging.

Pruning

The first step in our analysis is to prune the graph obtained for each platform, in order to merge duplicate states. This step is required as our dynamic analyzers capture any state we encounter after an event is fired, without checking if it is a unique state or a duplicate state. This check can be carried out either separately in each analyzer tool or once in the mapping phase. Having it in the mapping phase ensures that the pruning procedure is consistent across platforms. Identifying a new state of a mobile app while executing and changing its UI is challenging. In order to distinguish a new state from previously detected states, we compare the state nodes along with their properties, as shown in Algorithm 3.

As input, Algorithm 3 takes all States and Edges, obtained from the graph (G), and outputs a pruned graph (P). We loop through all the states captured (line 4) and compare each state with the rest of the state space (line 6) based on their classes and number of UI elements (line 8). Next, we proceed by checking their UI elements (line 10) for equivalency of types and actions (line 12). Thus, data changes do not reflect a unique state in our algorithm. In other words, two states are considered the same if they have the same class and set of UI elements along with their respective properties. Detected duplicate states are removed (line 18) and the source and target state IDs for the edges are adjusted accordingly (line 19).

Algorithm 3: Pruning a Given Model
    input : State Graph (G) of a Given Model (M)
    output: Pruned State Graph (P)
     1  begin
     2    S ← GETVERTICES(G)
     3    E ← GETEDGES(G)
     4    foreach i = 0, i < COUNT(S), i++ do
     5      s1 ← S[i]
     6      foreach j = i+1, j < COUNT(S), j++ do
     7        s2 ← S[j]
     8        if s1(class) ≡ s2(class) & s1(#elements) ≡ s2(#elements) then
     9          elFlag ← TRUE
    10          foreach e1 ∈ s1.Elements do
    11            e2 ← GETELEMENTATINDEX(s2, e1)
    12            if e1.type ≠ e2.type ∨ e1.action ≠ e2.action then
    13              elFlag ← FALSE
    14              break
    15            end
    16          end
    17          if elFlag then
    18            REMOVEDUPLICATESTATE(S, s2)
    19            UPDATEEDGES(E, s1, s2)
    20          end
    21        end
    22      end
    23    end
    24    return P(S, E)
    25  end
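The duplicate-state check at the core of Algorithm 3 (lines 8–16) can be illustrated with the following simplified Java sketch; the class and field names are ours, and the sketch is not the actual CHECKCAMP code.

    import java.util.List;

    // Illustrative duplicate-state check mirroring Algorithm 3 (simplified).
    final class PruningSketch {

        static class Element {
            final String type;    // e.g., "UIButton"
            final String action;  // event handler, e.g., "resetPressed"
            Element(String type, String action) { this.type = type; this.action = action; }
        }

        static class State {
            final String className;        // Activity or ViewController class name
            final List<Element> elements;  // captured UI elements of the screen
            State(String className, List<Element> elements) {
                this.className = className;
                this.elements = elements;
            }
        }

        // Two states are duplicates if they share the class name and match element-by-element
        // on type and action; labels/data changes deliberately do not create a new state.
        static boolean isDuplicate(State s1, State s2) {
            if (!s1.className.equals(s2.className)
                    || s1.elements.size() != s2.elements.size()) {
                return false;
            }
            for (int i = 0; i < s1.elements.size(); i++) {
                Element e1 = s1.elements.get(i);
                Element e2 = s2.elements.get(i);
                if (!e1.type.equals(e2.type) || !e1.action.equals(e2.action)) {
                    return false;
                }
            }
            return true;
        }
    }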
Merging

Platform-specific differences that manifest in our models are abstracted away in this phase. This step is required since such irrelevant differences can occur frequently across platforms. For instance, the iPhone app may offer More as an option in its tab controller, which is different from the Android app. If the iPhone app has more than five items, the tab bar controller automatically inserts a special view controller (called the More view controller) to handle the display of additional items. The More view controller lists the additional view controllers in a table, which appears automatically when it is needed and is separate from custom content. Thus, our approach merges the More state with the next state (view controller) to abstract away iPhone differences that are platform-specific and as such irrelevant for our analysis. Similarly, the Android app may offer an option Menu panel to provide a set of actions. The contents of the options menu appear at the bottom of the screen when the user presses the Menu button. When a state is captured on Android and then the option Menu is clicked, our approach merges the two states together to abstract away Android differences. Other differences, such as Android's hardware back button vs. iPhone's soft back button, are taken into account in our graph representations.

Mapping

The collected code-based (e.g., classname) and GUI-based (e.g., screen title) data for states and edges are used in this phase to map the two models, as shown in Algorithm 4.

Algorithm 4: Mapping two (iOS & Android) Models
    input : iPhone State Graph (IG)
    input : Android State Graph (AG)
    output: IG with Mapping Properties (MIG)
    output: AG with Mapping Properties (MAG)
     1  begin
     2    IS ← GETVERTICES(IG)
     3    AS ← GETVERTICES(AG)
     4    IE ← GETEDGES(IG)
     5    AE ← GETEDGES(AG)
     6    edgePairs[0] ← INSERTEDGEPAIR(IE[0], AE[0])
     7    foreach i = 0, i < COUNT(edgePairs), i++ do
     8      pair ← edgePairs[i]
     9      if NOTMAPPED(pair) then
    10        s1 ← GETSTATE(IS, pair[iphTrgtId])
    11        s2 ← GETSTATE(AS, pair[andTrgtId])
    12        iphEdges ← GETOUTGOINGEDGES(s1, IE)
    13        andEdges ← GETOUTGOINGEDGES(s2, AE)
    14        /* Find closest edge-pairs */
    15        nextPairs ← FINDEDGEPAIRS(iphEdges, andEdges)
    16        SETSTATEMAPPINGPROPERTIES(s1, s2)
    17        SETEDGEMAPPINGPROPERTIES(nextPairs)
    18      end
    19      foreach j = 0, j < COUNT(nextPairs), j++ do
    20        edgePairs[i+j+1] ← INSERTEDGEPAIR(nextPairs[j])
    21      end
    22    end
    23    return (MIG, MAG)
    24  end

As input, Algorithm 4 takes the iPhone (IG) and Android (AG) graphs, produced after the pruning and merging phases, and outputs those models with a set of computed auxiliary mapping properties for their states and edges (MIG and MAG). The algorithm operates on the basis of the following assumptions: (1) the model of an app starts with an initial edge that leads to an initial state, and (2) conceptually, both models start with the same initial states. An array, edgePairs, holds the initial iPhone and Android edges (line 6) and other edge-pairs are inserted through the main loop (line 20). To find the edge-pairs, we first obtain the initial iPhone and Android states (lines 10 and 11) based on the target state IDs in the initial edge-pair. We then obtain all the outgoing iPhone edges (iphEdges in line 12) and Android edges (andEdges in line 13) from the already mapped state-pair. To identify the closest iPhone and Android edge-pairs (line 15), we loop through the outgoing edges and calculate σEd, based on a set of comparison criteria as defined in Formula 6.1:

\sigma_{Ed} = \min_{\forall Ed_{iph} \in iphEdges,\ \forall Ed_{and} \in andEdges} \left( \frac{f(Ed_{iph}, Ed_{and})}{\sum_{i=1}^{N_{flags}} F_i} \right) \times 100    (6.1)

where

f(Ed_{iph}, Ed_{and}) = F_{action} \cdot LD(Iph_{action}, And_{action})
                      + F_{label} \cdot LD(Iph_{label}, And_{label})
                      + F_{type} \cdot Corresponds(Iph_{type}, And_{type})
                      + F_{class} \cdot LD(Iph_{class}, And_{class})
                      + F_{title} \cdot LD(Iph_{title}, And_{title})
                      + F_{elms} \cdot \sum_{i=1}^{N_{ElPairs}} Similarity(Iph_{elms}, And_{elms})
                      + F_{methods} \cdot LD(Iph_{methods}, And_{methods})

with the action, label, and type of the touched element, the classname, title, and attributes of UI elements in the target state, and the method calls invoked by the event.

The edge-pair with the lowest computed σEd value is selected as the closest Android-iPhone edge-pair and their mapping properties are appended to the model accordingly (line 17).
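To make Formula 6.1 concrete, the following simplified Java sketch computes the distance for a single candidate edge-pair using only a subset of the criteria (the touched element's action and label, plus the target state's classname). The helper names are ours; the binary flags and the remaining terms (Corresponds, Similarity, and the relative string distance LD) are defined in the remainder of this section, and the stripping of equivalent suffixes such as "ViewController"/"Activity" is omitted for brevity.

    // Illustrative computation of the edge distance in Formula 6.1 (simplified).
    final class EdgeDistanceSketch {

        // Relative Levenshtein distance (see Formula 6.2 below): the absolute edit distance
        // divided by the longer string's length; 0 means identical, 1 means maximally different.
        static double relativeLevenshtein(String a, String b) {
            int[][] d = new int[a.length() + 1][b.length() + 1];
            for (int i = 0; i <= a.length(); i++) d[i][0] = i;
            for (int j = 0; j <= b.length(); j++) d[0][j] = j;
            for (int i = 1; i <= a.length(); i++) {
                for (int j = 1; j <= b.length(); j++) {
                    int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                    d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                       d[i - 1][j - 1] + cost);
                }
            }
            int maxLen = Math.max(a.length(), b.length());
            return maxLen == 0 ? 0.0 : (double) d[a.length()][b.length()] / maxLen;
        }

        // sigmaEd for one candidate edge-pair, restricted here to three active criteria
        // (action, label, classname); lower values indicate a closer match.
        static double edgeDistance(String iphAction, String andAction,
                                   String iphLabel, String andLabel,
                                   String iphClass, String andClass) {
            double f = relativeLevenshtein(iphAction, andAction)   // F_action = 1
                     + relativeLevenshtein(iphLabel, andLabel)     // F_label  = 1
                     + relativeLevenshtein(iphClass, andClass);    // F_class  = 1
            int activeFlags = 3;                                   // sum of the F_i that are set
            return f / activeFlags * 100;
        }
    }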
To instantiate different combinations of this metric, we use a set of binary flags, denoted as F_action, F_label, F_type, F_class, F_title, F_elms, and F_methods. The value of each flag is 1 or 0 to activate or ignore a criterion. We propose six different instantiations, listed in Table 6.1, and compare them in our evaluation to assess their effectiveness (discussed in Section 6.5).

Table 6.1: Six combinations for mapping.

    ID     Combinations of Comparison Criteria
    Comb1  ClassName
    Comb2  TouchedElement (action, label, type)
    Comb3  TouchedElement + ClassName
    Comb4  TouchedElement + ClassName + Title
    Comb5  TouchedElement + ClassName + Title + UIElements
    Comb6  TouchedElement + ClassName + Title + UIElements + Methods

LD in Formula 6.1 is a relative Levenshtein Distance [144] between two strings, calculated as the absolute distance divided by the maximum length of the given strings (see Formula 6.2). Some string patterns that are known to be equivalent are chopped from the strings before calculating their distance. For instance, the words "Activity" in an Android classname and "ViewController"/"Controller" in an iPhone classname are omitted.

LD(str, str') = \frac{distance(str, str')}{maxLength(str, str')}    (6.2)

Corresponds in Formula 6.1 is used for comparing the element's type based on the corresponding Android-iPhone UI element equivalent mappings. Since iOS and Android have different UI elements, a mapping is needed to find equivalent widgets. We analyzed GUI elements that exist for both native Android [20] and iPhone [13] platforms and identified the differences and similarities on the two platforms. We used and extended upon existing mappings that are available online [11]. During the interview sessions (see Section 6.2), we cross-validated over 30 control, navigation, and UI element mappings (such as button, label, picker, and slider) that function equivalently on the two platforms, so that the generated models can be used in this phase. We have made these UI equivalent mappings publicly available [27]. Corresponds returns 1 if two elements are seen as equivalent and thus can be mapped, and 0 otherwise.

Further, Similarity in Formula 6.1 is a relative number ([0,1]) between two sets of elements in the two (target) states, calculated as follows:

Similarity(elAry, elAry') = \frac{elPairCount(elAry, elAry')}{maxCount(elAry, elAry')}    (6.3)

where the number of elements that can be mapped is divided by the maximum size of the given arrays. Similar to the touched element, the action, label, and type properties of UI elements are used to compute the mapping between them.

Finally, going back to our algorithm, mapped edge-pairs are inserted into the main array (line 20), and the next set of states and edges are considered for mapping recursively until no other outgoing edges are left.

Detecting Inconsistencies

Any unmatched state left without mapping properties from the previous phase is considered a functionality inconsistency. For a matched state-pair, since their incoming edges are mapped, we assume that these target states should be equivalent conceptually. Data inconsistencies pertain to text properties of the screen such as titles, labels, and buttons, and also the number of cells in a table and tabs. Image-related and style-related properties are out of scope. We calculate data inconsistencies, σState, in a pair of mapped states by computing LD between the two titles as well as the text properties of the element-pairs.

\sigma_{State} = \lceil LD(Iph_{title}, And_{title}) \rceil + \sum_{i=1}^{N_{ElPairs}} \lceil LD(Iph_{txt}, And_{txt}) \rceil    (6.4)

To compute the correspondence between the elements, we loop through the two arrays of elements. First, we compare the elements' types based on the corresponding Android-iPhone UI element equivalent mappings [14]. For any two elements with the same type and a textual label, we compute LD. We ignore image element types, e.g., a button with an image. Where we have multiple elements of the same type, the lowest computed LD is selected as the closest element-pair.
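Because the relative Levenshtein distance is zero only for identical strings, each ceiling term in Formula 6.4 contributes exactly 1 whenever the two texts differ. The following simplified Java sketch (the names are ours, not CHECKCAMP's) counts the data inconsistencies of a state-pair whose element texts have already been matched as described above:

    import java.util.List;

    // Illustrative count of data inconsistencies per Formula 6.4 (simplified). Element
    // pairs are assumed to have been matched already; the ceiling in Formula 6.4 then
    // reduces to "count 1 per differing text".
    final class DataInconsistencySketch {

        static class TextPair {
            final String iphText;
            final String andText;
            TextPair(String iphText, String andText) {
                this.iphText = iphText;
                this.andText = andText;
            }
        }

        static int sigmaState(String iphTitle, String andTitle, List<TextPair> matchedElementTexts) {
            int inconsistencies = differs(iphTitle, andTitle) ? 1 : 0;
            for (TextPair pair : matchedElementTexts) {
                if (differs(pair.iphText, pair.andText)) {
                    inconsistencies++;
                }
            }
            return inconsistencies;
        }

        // Treats null and empty text as equal; any other difference counts as an inconsistency.
        private static boolean differs(String a, String b) {
            return !(a == null ? "" : a).equals(b == null ? "" : b);
        }
    }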
The σState is added as a mapping distance to the models, with the same mapping tag for the two states (line 16). Additionally, the detected inconsistencies are added to the mapping result, which is later manifested through our visualization.

Eventually, at the end of this phase, each state is marked as either unmatched, matched with inconsistencies, or completely matched in the two models, ready to be visualized in the next phase. Thus, we automatically detect mismatched screens by using one platform's model as an oracle to check another platform's model and vice versa.

6.3.3 Visualizing the Models

After calculating the likely mappings and detecting potential inconsistencies, we visualize the iOS and Android models side-by-side, colour coding the mapping results. Red, yellow, and dark green border colours around states show unmatched, matched with inconsistencies, and completely matched states, respectively. Matched states and edges share the same mapping tag. Figure 6.4 depicts an example of the output of the visualization phase (it is minimized because of space restrictions).

Figure 6.4: Visualization of mapping inferences for the MTG iPhone (left) and Android (right) app-pair. The result indicates 3 unmatched states shown with a red border (including 2 functionality inconsistencies where iPhone has more states than Android, and 1 platform-specific inconsistency with the MoreViewsController on iPhone). The other 5 matched states have data inconsistencies, shown with a yellow border.

The models can be zoomed in and list detected inconsistencies as well as UI-structure information for a selected state(-pair), or touched element and methods information for a selected edge(-pair) (see Figure 6.5).

Figure 6.5: Zooming into a selected state (or edge) presents the detected inconsistencies and the UI-structure (or touched element and methods) information of the iPhone (left) and Android (right) app-pair.

6.4 Tool Implementation

Our approach is implemented in a tool called CHECKCAMP [27].

Its iPhone analyzer is implemented in Objective-C. We use and extend a number of external libraries. ASPECTS [22] uses Objective-C message forwarding and hooks into messages to enable functionality similar to Aspect Oriented Programming for Objective-C. DCINTROSPECT [30] is a library for debugging iOS user interfaces. We extend DCINTROSPECT to extract a UI element's action message, target object, and its properties and values.

The Android analyzer is implemented in Java (using Android 4.3). To intercept method calls, we rely mainly on ASPECTJ.
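As an illustration of this kind of interception, an annotation-style AspectJ aspect along the following lines could log every method executed in the app's own packages. The package name and the logging to standard output are placeholders and not CHECKCAMP's actual configuration, which stores the invoked method signatures in the trace of the current edge.

    import org.aspectj.lang.JoinPoint;
    import org.aspectj.lang.annotation.Aspect;
    import org.aspectj.lang.annotation.Before;
    import org.aspectj.lang.annotation.Pointcut;

    // Illustrative annotation-style AspectJ aspect ("com.example.app" is a placeholder package).
    @Aspect
    public class MethodTraceAspect {

        // Match executions of any method declared in the app's own packages; in practice
        // the aspect itself and library packages would be excluded as well.
        @Pointcut("execution(* com.example.app..*.*(..))")
        public void appMethods() {}

        // Record every intercepted method before it runs; an analyzer would append the
        // signature to the list of methods invoked for the current edge.
        @Before("appMethods()")
        public void logInvocation(JoinPoint jp) {
            System.out.println("invoked: " + jp.getSignature().toLongString());
        }
    }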
Our mapping and visualization engine is written in Objective-C and implements the state recognition and state/edge mapping steps of the technique. The output of the mapping engine is an interactive visualization of the iOS and Android models, which highlights the inconsistencies between the app-pairs. The visualization is implemented as a web application and uses the CYTOSCAPE.JS library [29], which is a graph theory library to create models.

6.5 Evaluation

To evaluate the efficacy of our approach we conducted an empirical evaluation, which addresses the following research questions:

RQ1. How accurate are the models inferred by CHECKCAMP?
RQ2. How accurate are the mapping methods? Which set of comparison criteria provides the best results?
RQ3. Is CHECKCAMP capable of detecting valid inconsistencies in cross-platform apps?

6.5.1 Experimental Objects

We include a set of seven large-scale industrial and seven open-source iPhone and Android app-pairs (14 app-pairs in total). The industrial app-pairs are collected from two local mobile companies in Vancouver. The open-source app-pairs are collected from GitHub. We require the open-source app-pairs to be under the same GitHub repository to ensure that their functionality is meant to be similar across iPhone and Android. Table 6.2 shows the app-pairs included in our evaluation. Each object's ID, name, resource, and their characteristics in terms of size and complexity are also presented. XCODE STATISTICIAN [15] and ECLIPSE METRICS [5] are used to measure lines of code (LOC) in the iOS and Android apps, respectively.

Table 6.2: Characteristics of the experimental objects, together with the total number of edges, unique states, elements, and manual unique state counts (MC) across all the scenarios. Each cell lists the Android (AND) value followed by the iPhone (IPH) value.

    ID  App [URL] (#Scenarios)          #LOC           #Edges   #Unique States  #Elements      #MC States
    1   MTG-Judge [8] (2)               3,139 / 1,822  23 / 38  11 / 14         118 / 125      11 / 14
    2   Roadkill-Reporter [18, 33] (1)  1,799 / 474    3 / 17   1 / 5           48 / 103       4 / 5
    3   NotifyYDP [16] (1)              1,673 / 1,960  5 / 18   2 / 5           101 / 96       2 / 5
    4   Family [6] (1)                  ∼12K / ∼14K    10 / 24  3 / 4           93 / 372       3 / 4
    5   Chirpradio [17, 32] (1)         1,705 / 881    3 / 4    1 / 1           9 / 24         1 / 1
    6   Whistle [14] (1)                702 / 111      3 / 4    1 / 1           6 / 4          1 / 1
    7   Redmine [12] (1)                1,602 / 48     6 / 8    5 / 4           68 / 26        5 / 4
    8   Industry App A (2)              8,376 / 4,015  37 / 46  13 / 14         1,041 / 1,286  13 / 13
    9   Industry App B (4)              ∼70K / ∼28K    49 / 53  22 / 22         715 / 796      22 / 22
    10  Industry App C (6)              ∼68K / ∼30K    76 / 87  37 / 36         1,142 / 1,028  37 / 36
    11  Industry App D (4)              ∼69K / ∼28K    66 / 71  29 / 31         940 / 1,803    29 / 29
    12  Industry App E (2)              ∼68K / ∼26K    23 / 28  11 / 12         353 / 265      11 / 12
    13  Industry App F (3)              ∼68K / ∼28K    53 / 57  28 / 28         635 / 2,182    28 / 28
    14  Industry App G (4)              ∼69K / ∼29K    53 / 56  27 / 27         813 / 1,128    27 / 27

6.5.2 Experimental Procedure

We used the iOS 7.1 simulator and a Samsung Galaxy S3 to run the iPhone and Android apps, respectively. To collect traces, two graduate students were recruited. First, they installed a fresh version of each pair of the apps, which were then instrumented by CHECKCAMP. Next, to collect consistent traces, we wrote a set of scenarios for our collected app-pairs and gave each student one scenario for each app to access all use-cases of the Android or iPhone versions of the apps according to the given scenarios. Note that the same user scenario is used for both the iOS and Android versions of an app. The scenarios used in our evaluation are available online [27].

Once traces were collected, CHECKCAMP was executed to obtain the models and mappings. To assess the accuracy of the models generated (RQ1), we compare the number of generated unique states to the actual number of unique states for each app-pair.
To form a comparison baseline, we manually examine and navigate the user scenarios for each app-pair and document the number of unique states.

To evaluate the accuracy of the mappings (RQ2), we measure precision, recall, and F-measure for each combination, listed in Table 6.1, and app-pair as follows:

Precision is the rate of mapped states reported by CHECKCAMP that are correct: TP / (TP + FP)

Recall is the rate of correct mapped states that CHECKCAMP finds: TP / (TP + FN)

F-measure is the harmonic mean of precision and recall: (2 × Precision × Recall) / (Precision + Recall)

where TP (true positives), FP (false positives), and FN (false negatives), respectively, represent the number of states that are correctly mapped (either fully matched or matched with inconsistencies), falsely mapped, and missed. To document TP, FP, and FN, associated with each app for our combinations of comparison criteria, we manually examine the apps and compare the formed baseline against the reported output.

To validate detected inconsistencies (RQ3), for the best combination calculated in RQ2, we manually examine the reported inconsistencies in each app-pair. The results from our analysis are presented in the next section. Note that, to the best of our knowledge, there are currently no similar tools to compare the results of CHECKCAMP against. That is why our baselines are created manually.

6.5.3 Results and Findings

RQ1: Inferred models. We ran multiple scenarios to cover all the screens/states in each app. For each scenario, the initial model is constructed over its traces and analyzed by CHECKCAMP. Table 6.2 presents the total number of Edges, Unique States, and UI Elements for all the scenarios running on each Android and iPhone app, produced by CHECKCAMP. The last column of the table also shows the number of Unique States counted manually. As far as RQ1 is concerned, our results show that CHECKCAMP is able to identify the unique states of a given iPhone and Android app-pair and generate their state models correctly for each scenario. However, there are a few cases in our industry iPhone apps (IDs 8 and 11) and one Android app (ID 2) where the number of manual unique states does not exactly match the number of unique states collected by the dynamic analyzer. This is mainly because our approach currently takes into account the type of the class (either Activity in Android or View Controller in iOS) in defining a state and thus separate states are captured for different View Controllers (discussed in Section 6.6 under Limitations).

RQ2: Different mapping combinations. The precision and recall rates, measured for the first five combinations listed in Table 6.1, for our 14 app-pairs, are presented in Figure 6.6. The F-measure is shown in Figure 6.7. We do not include Combination 6 in these figures since, apart from the touched element's event-handler (i.e., action), comparing the rest of the method calls did not improve the mapping (discussed in Section 6.6 under Comparison Criteria). As far as RQ2 is concerned, our results show that CHECKCAMP is highly accurate in mapping state-pairs. As expected, the results are higher in the open-source apps due to their relative simplicity compared to the industry apps. The comparisons in Figure 6.6 and Figure 6.7 reveal that Combination 5, followed by Combination 4, provides the best mapping results in recall, precision, and F-measure for the industry apps.
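For reference, the per-app-pair metrics defined above can be computed with a small helper such as the following illustrative sketch; the TP, FP, and FN counts are the manually documented values described earlier, and the example numbers in main are hypothetical.

    import java.util.Locale;

    // Small helper sketch for the per-app-pair metrics reported in this section.
    final class MappingMetrics {

        static double precision(int tp, int fp) {
            return tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
        }

        static double recall(int tp, int fn) {
            return tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
        }

        static double fMeasure(double precision, double recall) {
            return precision + recall == 0 ? 0.0 : 2 * precision * recall / (precision + recall);
        }

        public static void main(String[] args) {
            // Hypothetical example: 12 correctly mapped states, 1 falsely mapped, 2 missed.
            double p = precision(12, 1);
            double r = recall(12, 2);
            System.out.printf(Locale.US, "precision=%.2f recall=%.2f F=%.2f%n", p, r, fMeasure(p, r));
        }
    }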
Figure 6.6: Plot of precision and recall for the five mapping combinations of each app-pair (axes: recall vs. precision; legend: Comb1–Comb5).

While the results of the combinations have less variation in the open-source apps, Combination 2 shows the best results for them. For the best combinations:

• The recall is 1 for the open-source apps, and for the industry apps it oscillates between 0.68–1 (average 0.88), meaning that our approach can successfully map most of the state-pairs present in an app-pair.

• The precision is 1 for the open-source apps, and for the industry apps it oscillates between 0.88–1 (average 0.97), which is caused by a low rate of false positives (discussed in Section 6.6 under Limitations).

• The F-measure is 1 for the open-source apps, and varies between 0.75–1 (average 0.92) for the industry apps.

RQ3: Valid inconsistencies. As far as RQ3 is concerned, for the best combinations calculated in RQ2, Table 6.3 depicts the number of inconsistencies reported by CHECKCAMP along with some examples. We manually examined and validated the reported inconsistencies (and their categories) in each app-pair across the scenarios. We also computed the average rank and percentage of severity of the valid detected inconsistencies. The severity ranks used are presented in Table 6.4; they are adopted from Bugzilla [25] and slightly adapted to fit inconsistency issues in mobile apps. We computed the percentage of the valid inconsistencies' severity as the ratio of the average severity rank to the maximum severity rank (which is 5); for instance, an average rank of 3.5 corresponds to 3.5/5 = 70%.

Figure 6.7: F-measure obtained for the five mapping combinations on each app-pair (x-axis: experimental objects 1–14; y-axis: F-measure; legend: Comb1–Comb5).

We found a number of valid functionality inconsistencies in the open-source apps, and interestingly, two in the industrial apps (IDs 10 and 12). However, in some app-pairs, functions such as email clients or opening browsers behaved differently on the two platforms. For instance, in the case of the app-pair with ID 3, opening browsers and email clients takes the Android app user outside of the application while that is not the case in the iPhone app. As such, the two models have mismatched states in Table 6.2, as CHECKCAMP is not capturing states outside of the app.
Table 6.3: Number of inconsistencies reported by CHECKCAMP, the number validated, and the average and percentage of their severity, with examples for each app-pair.

    ID   #Reported (Categories)  #Validated (Categories)  Severity (Avg, %)  Examples of Reported Inconsistencies
    1    13 (2 func, 11 data)    13 (2 func, 11 data)     2.6, 52%           Android missing ‘Draft Time’/‘Update’ functionality (Figure 6.4); # table cells: iPhone (12628) vs. Android (6336)
    2    4 (3 func, 1 data)      1 (1 func)               5, 100%            Android missing ‘Help’ functionality
    3    2 (2 data)              2 (2 data)               2, 40%             Title: iPhone ‘Notify YDP’ vs. Android ‘’
    4    3 (1 func, 2 data)      1 (1 func)               5, 100%            Android missing ‘Change Password’ functionality
    5    1 (1 data)              1 (1 data)               2, 40%             Button: iPhone ‘’ vs. Android ‘Play’
    6    0                       0                        –                  –
    7    5 (1 func, 4 data)      5 (1 func, 4 data)       2.6, 52%           iPhone missing a functionality
    8    14 (14 data)            2 (2 data)               2, 40%             Button: iPhone ‘Reset’ vs. Android ‘RESET’
    9    2 (2 data)              0                        –                  –
    10   5 (1 func, 4 data)      3 (1 func, 2 data)       3, 60%             iPhone missing ‘Map’ functionality
    11   2 (2 data)              1 (1 data)               2, 40%             Title: iPhone ‘May 14’ vs. Android ‘Schedule’
    12   2 (1 func, 1 data)      2 (1 func, 1 data)       3.5, 70%           Android missing ‘Participants’ functionality
    13   1 (1 data)              1 (1 data)               2, 40%             Title: iPhone ‘Details’ vs. Android ‘Hotels’
    14   0                       0                        –                  –
    All  54 (9 func, 45 data)    32 (7 func, 25 data)     3, 60%             –

Table 6.4: Bug severity description.

    Severity  Description                                            Rank
    Critical  Functionality loss, no work-around                     5
    Major     Functionality loss, with possible work-around          4
    Normal    Makes a function difficult to use                      3
    Minor     Not affecting functionality, behaviour is not natural  2
    Trivial   Not affecting functionality, cosmetic issue            1

Among the data inconsistencies in Table 6.3 are inconsistencies in the number of cells and in the text of titles, labels, and buttons. Most of the false positives in the reported data inconsistencies (in particular in the app-pair with ID 8) are due to the UI structure of a state being implemented differently on the two platforms (discussed in Section 6.6 under Limitations). Thus, CHECKCAMP could not map the elements correctly and reported incorrect inconsistencies.

6.6 Discussion

In this section, we discuss our general findings, the limitations of CHECKCAMP, and some of the threats to the validity of our results.

6.6.1 Comparison Criteria

Among the code-based and GUI-based comparison criteria, our evaluation shows that the most effective in the mapping phase pertain to information about the text, action, and type of the UI elements that events are fired on, as well as the classname and title of the states. In addition, while we extract a set of method calls after an event fires, our investigation shows that only the action of the touched UI element is effective. We found that, even after omitting OS built-in methods, such as delegate methods provided by the native SDK, or library API calls, the method names are quite different on the two platforms and thus provided no extra value in the mapping phase.

6.6.2 Limitations

There are some limitations to our current implementation. First, deciding what constitutes a UI state is not always straightforward. For instance, consider two screens with a list of different items. In the Android version of an app, the same Activity is used to implement the two screens, while on the iPhone version separate View Controllers exist; currently, as shown in Algorithm 3, the type of the class (either Activity in Android or View Controller in iOS) is checked (line 8) for identifying a state, and thus separate states are (mistakenly) captured for the iPhone app.

Next, the low rate of false positives in RQ2 includes examples where, even considering all our selected properties together, CHECKCAMP still lacks enough information to conclude correct mappings. For instance, if an ImageButton which contains an image as a background is exercised, there would be no text/label to be compared. Another limitation is with respect to the string edit distance used in our algorithm; for instance, the two classnames DetailedTipsViewController and TipsDetailActivity are falsely reported as being different based on their distance. This means that, if the outgoing edges cannot be mapped correctly in Algorithm 4, CHECKCAMP halts and cannot go any further. Backtracking-based approaches can be considered to recover if it performs incorrect matches.

Another limitation is related to the high false-positive rate in the reported data inconsistencies in RQ3. In states with multiple elements of the same type, e.g., buttons with images or text properties, our programmatic approach in CHECKCAMP cannot map them correctly.
Another reason, occurred in some cases, is the UIstructure of a state-pair is implemented differently. For instance, in an Androidstate, buttons exist with text properties whereas in the corresponding iPhone state,those texts are implemented through labels along with buttons. However, this lim-itation could be addressed through image-processing techniques [69, 205] on theiPhone and Android screenshots collected by the dynamic analyzers. This couldenable the detection of other types of inconsistencies between app-pairs includingimage-related data, layout, or style.6.6.3 ApplicationsThere are various applications for our technique. First of all, our technique sup-ports mobile developers in comprehending, analyzing, and testing their native mo-bile apps that have implementations in both iOS and Android. Many developersinteract with GUI to comprehend the software by creating a mental model of the161application [192]. On average, 48% of a desktop applications’s code is devoted toGUI [170]. We believe the amount of GUI-related code is higher in mobile appsdue to their highly interactive nature. Thus, using the models to provide a visu-alization of the apps accompanied with the UI-structure and method calls in thevisualization output, would support mobile developers and testers in their programcomprehension and analysis tasks and to obtain a better understanding of their mo-bile apps. The models inferred by CHECKCAMP can also be used for generatingtest cases. In terms of scalability, the results in Table 6.2 show that our approachis scalable to large industrial mobile apps consisting of tens of thousands of LOCand many states.6.6.4 Threats to ValidityThe fact that we form the comparison baselines manually could be a threat to in-ternal validity. We did look for other similar tools to compare our results against,without success. Manually going through the different applications to create base-lines is labour intensive and potentially subject to errors and author’s bias. Wetried to mitigate this threat by asking the first two authors to create the compar-ison baselines together before conducting the experiment. Additionally, we hada small number of scenarios in particular for the open source apps. We tried tomitigate this threat by assuring that these scenarios covered the app screens/statesfully. A threat to the external validity of our experiment is with regard to the gen-eralization of the results to other mobile apps. To mitigate this threat, we selectedour experimental objects from industrial and open-source domains with variationsin functionality, structure and size. With respect to reproducibility of our results,CHECKCAMP, the open-source experimental objects, their scenarios and resultsare publicly available [27].6.7 Related WorkDealing with multiple platforms is not specific to the mobile domain. The problemalso exists for cross-browser compatibility testing. However, in the mobile domain,each mobile platform is different with regard to the OS, programming languages,API/SDKs, and supported tools, making it much more challenging to detect incon-162sistencies automatically.Mesbah and Prasad [161] propose a functional consistency check of web appli-cation behaviour across different browsers. 
Their approach automatically analyzesthe given web application, captures the behaviour as a finite-state machine andformally compares the generated models for equivalence to expose discrepancies.Their model generation [163] and mapping technique is based on DOM states ofa web application while CHECKCAMP deals with native iOS and Android statesand mappable code-based and GUI related metrics of the two mobile platforms.Choudhary et al. [74] propose a technique to analyze the client-server commu-nication and network traces of different versions of a web application to matchfeatures across platforms.In the mobile domain, Rosetta [100] infers likely mappings between the JavaMEand Android graphics APIs. They execute application pairs with similar inputs toexercise similar functionality and logged traces of API calls invoked by the appli-cations to generate a database of functionally equivalent trace pairs. Its output isa ranked list of target API methods that likely map to each source API method.Cloud Twin [117] natively executes the functionality of a mobile app written foranother platform. It emulates the behaviour of Android apps on a Windows Phonewhere it transmits the UI actions performed on the Windows Phone to the cloudserver, which then mimics the received actions on the Android emulator. To ourbest knowledge, none of the related work addresses inconsistency detection acrossiOS and Android mobile platforms.6.8 ConclusionsThis work is motivated by the fact that implementation of mobile apps for multi-ple platforms – iOS and Android – has become an increasingly common industrypractice. As a result, a challenge for mobile developers and testers is to keep theapp consistent, and ensure that the behaviour is the same across multiple platforms.In this work, we proposed CHECKCAMP, a technique to automatically detect andvisualize inconsistencies between iOS and Android versions of the same mobileapp. Our empirical evaluation on 14 app-pairs shows that the GUI model-based ap-proach can provide an effective solution; CHECKCAMP can correctly infer mod-163els, and map them with a high precision and recall rate. Further, CHECKCAMPwas able to detect 32 valid functional and data inconsistencies between app ver-sions.While we are encouraged by the evaluation results of CHECKCAMP, thereare several opportunities in which our approach can be enhanced and extended forfuture research. The immediate step would be to conduct an in-depth case study,carried out in an industrial setting with a number of developers using CHECK-CAMP. This would help validate the efficiency of the mapping and the visualiza-tions. Additionally, the execution of consistent scenarios can be enhanced by theuse of mobile apps that have test suites such as CALABASH [26] scripts. The tracesgenerated by test suites can be leveraged in the mapping engine to enhance theapproach.Systematically crawling to recover models is also an alternative to using scenar-ios. While there are limitations of automated model recovery, it could complementhuman-provided scenarios, to ensure better coverage. We have taken the first re-quired steps for automatically generating state models of iPhone apps [85] througha reverse engineering technique. 
There have been similar techniques for Androidapps [55, 73, 102, 217].Another direction is to improve the current dynamic analyzers to capture infor-mation regarding each device’s network communication (client-server communi-cation of platform-specific versions of a mobile app), as well as the API calls madeto utilize the device’s native functionality such as GPS, SMS, Calendar, Camera,and Gallery.164Chapter 7Conclusions and Future WorkMobile app development on platforms such as Android and iOS has gained tremen-dous popularity recently. This dissertation aims at advancing the state-of-the-art by1) obtaining insights regarding current practices, real challenges and concerns inmobile app development as well as 2) proposing a new set of techniques and toolsbased on the identified challenges. To this end, we designed five research questions.The first three research questions addressed the first part of our goal, in particular,each responded to a gap in the current state-of-the-art. The last two research ques-tions are follow-up studies, which address the identified challenges by proposingtechniques and tools. We believe that our primary contributions and publications,presented in this dissertation, have addressed our goal and research questions.7.1 Revisiting Research QuestionsRQ1. What are the main challenges developers face in practice when they buildmobile apps?Chapter 2. We presented the first qualitative field study [86] targeting mobile appdevelopment practices and challenges, which is considered “a very strong and in-fluential contribution to the whole SE community” by our reviewers. We started byconducting and analyzing interviews with 12 senior mobile app developers, fromnine different industrial companies. We followed a Grounded Theory approach toanalyze our interviews. Based on the outcome of these interviews, we designed anddistributed an online survey, targeted the popular Mobile Development Meetup andLinkedIn groups related to native mobile development. We kept the survey live for165two and a half months, which was fully completed by 188 mobile app developersworldwide.However, similar to quantitative research, qualitative studies could suffer fromthreats to validity, which is challenging to assess as outlined by Onwuegbuzie etal. [179]. For instance, in codification, the researcher bias can be troublesome,skewing results on data analysis [132]. We tried to mitigate this threat throughtriangulation; The codification process was conducted by two researchers, one ofwhom had not participated in the interviews, to ensure minimal interference of per-sonal opinions or individual preferences. Additionally, we conducted a survey tochallenge the results emerging from the interviews. Both the interview and surveyquestionnaire were designed by a group of three researchers, with feedback fromfour external people – one senior Ph.D. student and three industrial mobile app de-velopers – in order to ensure that all the questions were appropriate and easily com-prehensible. Another concern was a degree of generalizability. We tried to drawrepresentative mobile developer samples from nine different companies. Thus, thedistribution of participants includes different companies, development team sizes,platforms, application domains, and programming languages – representing a widerange of potential participants. Of course, the participants in the survey also havea wide range of background and expertise. All this gives us some confidence thatthe results have a degree of generalizability. 
One risk within Grounded Theoryis that the resulting findings might not fit with the data or the participants [99].To mitigate this risk, we challenged the findings from the interviews with an on-line survey, filled out by 188 practitioners worldwide. The results of the surveyconfirmed that the main concepts and codes, generated by the Grounded Theoryapproach, are in line with what the majority of the mobile development commu-nity believes. Lastly, in order to make sure that the right participants would takepart in the survey, we shared the survey link with some of the popular Mobile De-velopment Meetup and LinkedIn groups related to native mobile app development.Furthermore, we did not offer any financial incentives nor any special bonuses orprizes to increase response rate.RQ2. What are the characteristics of non-reproducible bug reports and thechallenges developers deal with?Chapter 3. To obtain better insights of issues and concerns in software develop-166ment, in general, it is a very common practice that researchers investigate othersources of software development data. In particular, bug repository systems havebecome an integral component of software development activities. Ideally, eachbug report should help developers find and fix a software fault. However, thereis a subset of reported bugs that is not (easily) reproducible, on which develop-ers spend considerable amounts of time and effort. While we started with mobilenon-reproducible (NR) bugs, we noticed that none of the related work investigatesnon-reproducible bug reports in isolation. Thus, we expanded the study to othersoftware environment and domains (desktop, web, and mobile). In this work [87],we presented the first empirical study on the frequency, nature, and root cause cat-egories of non-reproducible bug reports. We mined six bug tracking repositoriesfrom three different domains and found that 17% of all bug reports are resolvedas non-reproducible at least once in their life-cycles. Non-reproducible bug re-ports, on average, remain active around three months longer than other resolutiontypes while they are treated similarly in terms of the extent to which they are dis-cussed or the number of developers involved. Furthermore, we manually examinedand classified six common root cause categories. Our classification indicated that“Interbug Dependencies” forms the most common category (45%), followed by“Environmental Differences” (24%), “Insufficient Information” (14%), “Conflict-ing Expectations” (12%), and “Non-deterministic Behaviour” (3%).However, our manual classification of the bug reports could be a source ofinternal threats to validity. In order to mitigate errors and possibilities of bias,we performed our manual classification in two phases where (1) the inference ofrules was initially done by the first author; the rules were cross-validated and un-certainties were resolved through extensive discussions and refinements betweentwo researchers; the generated categories were discussed and refined by a groupof three researchers, (2) the actual distribution of bug reports into the six inferredcategories was subsequently conducted by myself following the classification rulesinferred in the first step. In addition, since this is the first study classifying NRbug reports, we had to infer new classification rules and categories. Thus, onemight argue that our NR rules and categories are subjective with blurry edges andboundaries. By following a systematic approach and triangulation we tried to mit-igate this threat. 
Another threat in our study is the selection and use of these bug167repositories as the main source of data. However, we tried to mitigate this threatby selecting various large repositories and randomly selecting NR bug reports foranalysis. In terms of external threats, we tried our best to choose bug reposito-ries from a representative sample of popular and actively developed applications inthree different domains (desktop, web, and mobile). With respect to bug trackingsystems, JIRA and BUGZILLA are well-known popular systems, although bug re-ports in projects using other bug tracking systems could behave differently. Thus,regarding a degree of generalizability, replication of such studies within differentdomains and environments (in particular for industrial cases) would help to gener-alize the results and create a larger body of knowledge.RQ3. What are the app-store characteristics of the same mobile app, publishedin different marketplaces? How are the major concerns or complaints different oneach platform?Chapter 4. Online app stores are the primary medium for the distribution of mo-bile apps. Through app stores, users can download and install apps on their mobiledevices and subsequently rate the apps. As such, app stores provide an importantchannel for app developers to collect user feedback such as the overall rating oftheir app, issues or bugs detected and new feature requests. To attract as manyusers as possible, developers often implement the same app for multiple mobileplatforms [86]. We presented the first large-scale study on mobile app-pairs, i.e.,the same app implemented for iOS and Android platforms, in order to analyze andcompare their various attributes, user reviews, and root causes of user complaintsat multiple levels of granularity. We mined the two most popular app stores andemploy a mixed-methods approach using both quantitative and qualitative analysis.Our results show that on average the stars and prices are similar on both platforms,with some fluctuations in price. Reasons for price fluctuations include differentmonetizing strategies, offering different features and different efforts and costs re-quired to maintain the app. The number of ratings is greatly in favour of Androidwhere in 63% of the app-pairs it has 4,821 more ratings than the iOS platform.Further, some top rated apps only exist on one platform, reasons for this includelack of resources, platform restrictions and revenue per platform. We combinedthe stars, ratings, and user reviews to measure apps’ success on the Android andiOS platforms for 2K app-pairs and found that 17.4% have a difference of 25% or168more in their success rate between the two platforms. Finally, we looked closely atuser complaints and concerns. We found that, on average, iOS apps have more crit-ical and post-update problems while Android apps have more complaints relatedto compatibility, usability, performance, security or functionality. It connects thedevelopers comments to our findings as they mentioned, “Apple forces the devel-opers to constantly migrate the apps to their latest OS and tool versions.” On theother hand, there is more device fragmentation on Android and the compatibility,usability, performance, and functionality are more related to dealing with a varietyof devices.However, our manual labelling of the reviews to train the classifiers could bea source of internal threat to validity. 
In order to mitigate this threat, uncertaintieswere cross-validated and resolved through discussions and refinements between theauthors. With respect to the detection of app-pairs, our technique cannot retrieveall the possible app-pairs since it only considers two apps a pair if their app nameand developer name start with the same root word. For instance, an app named TheWonder Weeks43 on iOS has a pair on the Android platform with the name BabyWonder Weeks Milestones.44 Such a pair would not be retrieved by our techniquesince it does not start with the same root word. Thus, as shown in Figure 4.4, theapp-pairs detected in our study are a subset of all possible app-pairs. In terms ofexternal threats, we tried our best to choose app-pairs from a representative sampleof popular mobile apps and categories. With respect to app store systems, iTunesand Google Play are the most popular systems currently, although apps in otherapp stores could have other characteristics. Regarding generalizability, replicationof such studies within different app stores would help to generalize the results andcreate a larger body of knowledge. Additionally, the classifiers alone could beuseful to group the reviews for developers.RQ4. How can we help developers to better understand their mobile apps?Chapter 5. Many developers interact with the graphical user interface (GUI) tocomprehend the software by creating a mental model of the application [192]. Fortraditional desktop applications, an average of 48% of the application’s code isdevoted to GUI [170]. Because of their highly interactive nature, we believe the43 https://goo.gl/ofLWim44 https://goo.gl/tMKWTx169amount of GUI-related code is typically higher in mobile apps. To support mobiledevelopers in their program comprehension and analysis tasks, we presented thefirst reverse engineering technique to automatically navigate a given iPhone appand infer a model of its user interface states [85]. We implemented our approach inICRAWLER, which is capable of exercising and analyzing UI changes and generatea state model of the application. The results of our evaluation, on six open sourceiPhone apps, point to the efficacy of the approach in automatically detecting uniqueUI states, with a minimum level of manual effort required from the user. We be-lieve our approach and techniques have the potential to help mobile app developersincrease the quality of iOS apps.However, there are some limitations within our current implementation of theapproach. Although it is minimal, the users still need to complete a few tasks toset up ICRAWLER within their applications manually. There are also some UI el-ements such as the tool bar, slider, page control, and search bar, which are notsupported currently. In addition, while ICRAWLER at the moment supports themost common gestures in iOS apps such as tapping on a UI element, inserting text,and scrolling views, still there is no support for other advanced gestures such asswiping pages and pinching (e.g., zooming in and out images). Furthermore, thefact that we form the comparison baselines manually could be a threat to inter-nal validity. We did look for other tools to compare our results against, withoutsuccess. Manually going through the different applications to create baselines islabour intensive and potentially subject to errors and author’s bias. 
However, there are some limitations in our current implementation of the approach. Although the effort is minimal, users still need to complete a few manual tasks to set up ICRAWLER within their applications. Some UI elements, such as the tool bar, slider, page control, and search bar, are not currently supported. In addition, while ICRAWLER currently supports the most common gestures in iOS apps, such as tapping on a UI element, inserting text, and scrolling views, there is no support yet for more advanced gestures such as swiping pages and pinching (e.g., zooming in and out of images). Furthermore, the fact that we formed the comparison baselines manually could be a threat to internal validity. We looked for other tools to compare our results against, without success. Manually going through the different applications to create baselines is labour intensive and potentially subject to errors and author bias. We tried to mitigate this threat by asking two other students to create the comparison baselines. Further, in our attempt to gather the experimental objects, we noticed that only a small collection of open-source iPhone apps is available online; note that we could not use applications available in the App Store for our experiment, since we needed access to their source code. Even though this made it difficult to select applications that reflect the whole spectrum of UI elements in iPhone apps, we believe the selected objects are representative of the type of applications ICRAWLER can reverse engineer. However, we acknowledge that more mobile apps are required in order to draw more general conclusions.

RQ5. How can we help developers to automatically detect inconsistencies in their same mobile app across multiple platforms?

Chapter 6. This work [88] is motivated by the fact that implementing mobile apps for multiple platforms (iOS and Android) has become an increasingly common industry practice. Therefore, as we identified in our initial field study [86], a major challenge for mobile developers and testers is to keep the app consistent and to ensure that its behaviour is the same across multiple platforms. We also found that developers currently treat the mobile app for each platform separately and manually perform screen-by-screen comparisons, often detecting many cross-platform inconsistencies [86]. This manual process is, however, tedious, time-consuming, and error-prone. Thus, we proposed [88] the first automated technique, called CHECKCAMP (Checking Compatibility Across Mobile Platforms), which automatically detects and visualizes inconsistencies between the iOS and Android versions of the same mobile app.

However, there are some limitations to our current implementation. First, deciding what constitutes a UI state is not always straightforward. For instance, consider two screens with a list of different items. In the Android version of an app, the same Activity is used to implement both screens, while in the iPhone version separate View Controllers exist; currently, as shown in Algorithm 3, the type of the class (either an Activity on Android or a View Controller on iOS) is checked (line 8) to identify a state, and thus separate states are (mistakenly) captured on iPhone. Next, the few false positives in RQ2 include examples where, even when considering all of our selected properties together, CHECKCAMP still lacks enough information to conclude correct mappings. For instance, if an ImageButton that contains an image as a background is exercised, there is no text or label to be compared. Another limitation concerns the string edit distance used in Algorithm 4 for mapping two class names based on their distance: if the outgoing edges cannot be mapped correctly, CHECKCAMP halts and cannot proceed any further. Backtracking-based approaches could be considered to recover from incorrect matches. Another limitation is related to the high false-positive rate in the reported data inconsistencies in RQ3. In states with multiple elements of the same type, e.g., buttons with image or text properties, our programmatic approach in CHECKCAMP cannot map them correctly. Another reason, which occurred in some cases, is that the UI structure of a state-pair is implemented differently. For instance, in an Android state, buttons exist with text properties, whereas in the corresponding iPhone state those texts are implemented through labels placed alongside the buttons. However, this limitation could be addressed through image-processing techniques [69, 205] applied to the iPhone and Android screenshots collected by the dynamic analyzers. This could also enable the detection of other types of inconsistencies between app-pairs, including image-related data, layout, or style.
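To make the distance-based class-name mapping discussed above more concrete, the sketch below matches an iOS view controller to its closest Android activity by Levenshtein distance [144] after stripping platform-specific suffixes. The class names, the suffix list, and the normalization rule are illustrative assumptions; Algorithm 4 in Chapter 6 also uses the surrounding model structure and may differ in its details.

```python
# Illustrative sketch of distance-based class-name mapping across platforms.
# The suffix-stripping rule and the class names below are hypothetical.
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance computed with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def strip_platform_suffix(name: str) -> str:
    # Drop platform-specific suffixes so "LoginViewController" and
    # "LoginActivity" are compared on their shared root.
    for suffix in ("ViewController", "Activity", "Fragment"):
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name

def best_match(ios_class, android_candidates):
    root = strip_platform_suffix(ios_class).lower()
    return min(android_candidates,
               key=lambda c: edit_distance(root, strip_platform_suffix(c).lower()))

# Hypothetical class names:
print(best_match("LoginViewController", ["SettingsActivity", "LoginActivity"]))
# -> LoginActivity
```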
Additionally, the fact that we formed the comparison baselines manually could be a threat to internal validity. We tried to mitigate this threat by asking two researchers to create the comparison baselines together before conducting the experiment. Additionally, we had a small number of scenarios, in particular for the open-source apps. We tried to mitigate this threat by ensuring that these scenarios fully covered the app screens/states. A threat to the external validity of our experiment concerns the generalization of the results to other mobile apps. To mitigate this threat, we selected our experimental objects from industrial and open-source domains with variations in functionality, structure, and size.

7.2 Future Work and Concluding Remarks

To summarize, we have identified the practices and challenges in software development for mobile devices that go beyond anecdotal evidence. We started with a qualitative field study and provided a list of important software engineering research issues related to mobile development. We also conducted two empirical studies on software development data for mobile apps. To address the identified challenges, we then conducted follow-up research and proposed automated model-based techniques for generating a model of a mobile app, through the ICRAWLER approach, as well as for detecting inconsistencies of a mobile app across multiple platforms, through the CHECKCAMP approach. Overall, our findings showed the effectiveness of the proposed model generation and mapping techniques in terms of accuracy and inconsistency detection capability.

Future work on non-reproducible bug reports can focus on (1) bug reports in the "Interbug Dependencies" category, to design techniques that facilitate identifying, linking, and clustering them upfront so that developers do not have to waste time on them, and (2) incorporating better collaboration tools into bug tracking systems to facilitate communication between the different stakeholders and address the problems behind the other NR categories. Future work regarding the app-pairs study can focus on the release dates of the app-pairs, to understand which platform developers target first when they release a new app. Further, app descriptions can be used to compare the app features provided on different platforms.

Additionally, there are several opportunities in which the ICRAWLER approach can be enhanced and extended in future research. The immediate step would be to extend the current version of ICRAWLER to support the remaining set of UI elements within UIKIT, such as the date picker, action sheet, alert view, tool bar, slider, page control, and search bar. Other directions we will pursue are using the ICRAWLER technique for smoke testing of iPhone apps, as well as generating test cases from the inferred state model. Furthermore, ICRAWLER can be expanded to support iPad apps. Additionally, ICRAWLER can be extended with reverse engineering analysis at the binary level. This would be beneficial because the Apple App Store distributes the binary code of applications, and it would make it possible to apply automated testing to any application regardless of access to its source code. It would also reduce or eliminate the user's manual effort to set up the analysis environment.
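As a concrete illustration of the test-generation direction mentioned above, the sketch below enumerates bounded, loop-free event paths from a recovered state model; each path could then be replayed as a simple smoke-test scenario. The dictionary-based model format and the example states are assumptions for illustration, not actual ICRAWLER output.

```python
# Minimal sketch: derive simple event sequences (candidate smoke tests) from an
# inferred UI state model. The model shape and the states below are hypothetical.
def test_paths(model, start, max_depth=3):
    """Enumerate acyclic event paths of bounded length starting from `start`."""
    paths = []

    def walk(state, events, visited):
        if events:
            paths.append(list(events))
        if len(events) == max_depth:
            return
        for event, target in model.get(state, []):
            if target not in visited:                  # avoid revisiting states
                walk(target, events + [event], visited | {target})

    walk(start, [], {start})
    return paths

# Hypothetical model: state -> list of (event, target state) transitions.
model = {
    "Home":     [("tap:Login", "Login"), ("tap:Settings", "Settings")],
    "Login":    [("type:credentials+tap:Submit", "Dashboard")],
    "Settings": [("tap:Back", "Home")],
}

for path in test_paths(model, "Home"):
    print(" -> ".join(path))
```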
Similarly, there are various opportunities in which the CHECKCAMP approach can be improved and extended in future research. The immediate step would be to conduct an in-depth case study, carried out in an industrial setting, with a number of developers using CHECKCAMP. Additionally, the execution of consistent scenarios can be enhanced by using mobile apps that have test suites, such as CALABASH [26] scripts. The traces generated by test suites can be leveraged in the mapping engine to enhance the approach. Systematically crawling the apps to recover models is also an alternative to using scenarios. While automated model recovery has limitations, it could complement human-provided scenarios to ensure better coverage. We have taken the first required steps towards automatically generating state models of iPhone apps through the ICRAWLER reverse engineering technique [85], and similar techniques exist for Android apps [55, 73, 102, 217]. Another direction is to improve the current dynamic analyzers to capture information regarding each device's network communication (the client-server communication of the platform-specific versions of a mobile app), as well as the API calls made to utilize the device's native functionality, such as GPS, SMS, Address Book, E-mail, Calendar, Camera, and Gallery. Finally, the CHECKCAMP approach can be extended to support third-party testing, where no source code is available, as well as other mobile platforms such as Windows Phone and Blackberry.

Nonetheless, our research only scratches the surface of the mobile development domain, with its fast-paced and highly frequent changes. We have open sourced all of our empirical data and tools, making our techniques and findings applicable in future research.

Bibliography

[1] The developer's guide - Android developers. https://developer.android.com/guide/index.html. Accessed: 2015-12-15. → pages 144
[2] Eclipse Bugzilla. https://bugs.eclipse.org/bugs/. Accessed: 2015-12-15. → pages 63
[3] MediaWiki Bugzilla. https://bugzilla.wikimedia.org/. Accessed: 2015-12-15. → pages 63
[4] Bugzilla@Mozilla. https://bugzilla.mozilla.org. Accessed: 2015-12-15. → pages 63
[5] Eclipse Metrics plugin. http://metrics2.sourceforge.net/. Accessed: 2015-12-15. → pages 153
[6] Family App for iPhone and Android. https://github.com/FamilyLab/Family. Accessed: 2015-12-15. → pages 154
[7] JIRA. https://confluence.atlassian.com/display/JIRA050/JIRA+Documentation. Accessed: 2015-12-15. → pages 58, 60
[8] MTGJudge App for iPhone and Android. https://github.com/numegil/MTG-Judge. Accessed: 2015-12-15. → pages 154
[9] Non-reproducible bug report analyser and empirical data. https://github.com/saltlab/NR-bug-analyzer. Accessed: 2015-12-15. → pages 60, 64, 65
[10] Moodle Tracker! https://tracker.moodle.org/issues/?jql=. Accessed: 2015-12-15. → pages 63
[11] PortKit: UX Metaphor Equivalents for iOS & Android. http://kintek.com.au/blog/portkit-ux-metaphor-equivalents-for-ios-and-android/. Accessed: 2015-12-15. → pages 149
[12] Redmine App for iPhone and Android. https://github.com/webguild/RedmineMobile. Accessed: 2015-12-15. → pages 154
[13] UIKit Framework Reference. https://developer.apple.com/library/ios/documentation/UIKit/Reference/UIKit Framework/. Accessed: 2015-12-15. → pages 149
[14] Whistle App for iPhone and Android. https://github.com/yasulab/whistle. Accessed: 2015-12-15. → pages 154
[15] Xcode Statistician. http://www.alexcurylo.com/blog/2010/11/01/xcode-statistician/. Accessed: 2015-12-15.
→ pages 153[16] YDP App for iPhone and Android. https://github.com/alakinfotech/YDP.Accessed: 2015-12-15. → pages 154[17] The Android version of Chirpradio.https://github.com/chirpradio/chirpradio-android. Accessed: 2015-12-15.→ pages 154[18] The Android version of Roadkill Reporter.https://github.com/calebgomer/Roadkill Reporter Android. Accessed:2015-12-15. → pages 154[19] Android Market Stats. http://www.appbrain.com/stats/, . Accessed:2015-12-15. → pages 85, 115, 138[20] android.widget Package. https://developer.android.com/reference/android/widget/package-summary.html,. Accessed: 2015-12-15. → pages 149[21] App Store Metrics. http://148apps.biz/app-store-metrics/. Accessed:2015-12-15. → pages 85, 114, 138[22] Aspects. https://github.com/steipete/Aspects. Accessed: 2015-12-15. →pages 152176[23] Bugzilla: Eclipse Bug #106396.https://bugs.eclipse.org/bugs/show bug.cgi?id=106396, . Accessed:2015-12-15. → pages 60[24] Bugzilla. http://www.bugzilla.org/docs/, . Accessed: 2015-12-15. → pages58, 60, 77[25] Bugzilla Severity Definitions.https://wiki.documentfoundation.org/QA/Bugzilla/Fields/Severity, .Accessed: 2015-12-15. → pages 158[26] Calabash. http://calaba.sh/. Accessed: 2015-12-15. → pages 164, 173[27] iOS and Android Dynamic Analyzers, Mapping and Visualization Enginetogether with Open-source Experimental Scenarios and Results.https://github.com/saltlab/camp. Accessed: 2015-12-15. → pages 139,150, 152, 155, 162[28] What is an issue?https://confluence.atlassian.com/display/JIRA/What+is+an+Issue.Accessed: 2015-12-15. → pages 60[29] Graph theory (a.k.a. network) library for analysis and visualisation.http://js.cytoscape.org/. Accessed: 2015-12-15. → pages 152[30] DCIntrospect. https://github.com/domesticcatsoftware/DCIntrospect.Accessed: 2015-12-15. → pages 129, 152[31] Frank: Automated Acceptance Tests for iPhone and iPad.http://www.testingwithfrank.com/. Accessed: 2015-12-15. → pages 117[32] The iOS version of Chirpradio.https://github.com/chirpradio/chirpradio-ios. Accessed: 2015-12-15. →pages 154[33] The iOS version of Roadkill Reporter.https://github.com/calebgomer/Roadkill Reporter iOS. Accessed:2015-12-15. → pages 154[34] KIF iOS Integration Testing Framework. https://github.com/square/KIF.Accessed: 2015-12-15. → pages 117, 129[35] UI/Application Exerciser Monkey.http://developer.android.com/tools/help/monkey.html, . Accessed:2015-12-15. → pages 118177[36] MonkeyTalk for iOS & Android.http://www.gorillalogic.com/testing-tools/monkeytalk, . Accessed:2015-12-15. → pages 117[37] NSKeyValueCoding Protocol Reference.https://developer.apple.com/library/ios/navigation/, . Accessed:2015-12-15. → pages 124[38] Objective-C Runtime Reference.https://developer.apple.com/library/ios/navigation/, . Accessed:2015-12-15. → pages 123[39] Project SIKULI. http://sikuli.org. Accessed: 2015-12-15. → pages 117[40] Works on my machine - How to fix non-reproducible bugs?http://stackoverflow.com/questions/1102716/works-on-my-machine-how-to-fix-non-reproducible-bugs. Accessed:2015-12-15. → pages 58, 60[41] Bug fields. https://bugzilla.mozilla.org/page.cgi?id=fields.html. Accessed:2015-12-15. → pages 60[42] XSWI: XML stream writer for iOS. http://skjolber.github.io/xswi/.Accessed: 2015-12-15. → pages 129[43] C. Q. Adamsen, G. Mezzetti, and A. Møller. Systematic execution ofandroid test suites in adverse conditions. In Proceedings of the 2015International Symposium on Software Testing and Analysis, ISSTA 2015,pages 83–93. ACM, 2015. → pages 47, 49[44] S. Adolph, W. Hall, and P. Kruchten. 
Using grounded theory to study theexperience of software development. Empirical Softw. Engg., 16(4):487–513, 2011. → pages 3, 11, 12, 54[45] S. Adolph, P. Kruchten, and W. Hall. Reconciling perspectives: Agrounded theory of how people manage the process of softwaredevelopment. J. Syst. Softw., 85(6):1269–1286, 2012. → pages 12, 54[46] S. Agarwal, R. Mahajan, A. Zheng, and V. Bahl. Diagnosing mobileapplications in the wild. In Proceedings of the 9th ACM SIGCOMMWorkshop on Hot Topics in Networks, Hotnets-IX, pages 22:1–22:6. ACM,2010. → pages 11, 50, 51178[47] Y. Agarwal and M. Hall. Protectmyprivacy: Detecting and mitigatingprivacy leaks on ios devices using crowdsourcing. In Proceeding of the11th Annual International Conference on Mobile Systems, Applications,and Services, MobiSys ’13, pages 97–110. ACM, 2013. → pages 47[48] M. Ahmad, N. Musa, R. Nadarajah, R. Hassan, and N. Othman.Comparison between android and iOS operating system in terms ofsecurity. In Information Technology in Asia (CITA), 2013 8th InternationalConference on, pages 1–4. IEEE, 2013. → pages 35[49] D. Amalfitano, A. R. Fasolino, and P. Tramontana. A GUI crawling-basedtechnique for Android mobile application testing. In Proceedings of theWorkshops at IEEE Fourth International Conference on Software Testing,Verification and Validation Workshops (ICSTW), pages 252–261. IEEEComputer Society, 2011. → pages 4, 47, 117[50] Appcelerator. Appcelerator / IDC Q3 2014 Mobile Trends Report.http://www.appcelerator.com/enterprise/resource-center/research/appcelerator-2014-q3-mobile-report/, . Accessed: 2015-12-15. → pages 1,138[51] Appcelerator. Appcelerator / IDC Q4 2013 Mobile Trends Report. http://www.appcelerator.com.s3.amazonaws.com/pdf/q4-2013-devsurvey.pdf, .Accessed: 2015-12-15. → pages 1, 138[52] Apple Store Crawler.https://github.com/MarcelloLins/Apple-Store-Crawler. Accessed:2015-12-15. → pages 87[53] J. Aranda and G. Venolia. The secret life of bugs: Going past the errors andomissions in software repositories. In Proceedings of the InternationalConference on Software Engineering (ICSE), pages 298–308. IEEEComputer Society, 2009. → pages 58, 81[54] V. Avdiienko, K. Kuznetsov, A. Gorla, A. Zeller, S. Arzt, S. Rasthofer, andE. Bodden. Mining apps for abnormal usage of sensitive data. InProceedings of the 37th International Conference on SoftwareEngineering, ICSE 2015. ACM, 2015. → pages 2, 47, 50, 52, 85, 112[55] T. Azim and I. Neamtiu. Targeted and depth-first exploration for systematictesting of Android apps. SIGPLAN Not., 48(10):641–660, Oct. 2013. →pages 164, 173179[56] K. Bajaj, K. Pattabiraman, and A. Mesbah. Mining questions asked by webdevelopers. In Proceedings of the Working Conference on Mining SoftwareRepositories (MSR), pages 112–121. ACM, 2014. → pages 52[57] A. Banerjee, L. K. Chong, S. Chattopadhyay, and A. Roychoudhury.Detecting energy bugs and hotspots in mobile apps. In Proceedings of the22Nd ACM SIGSOFT International Symposium on Foundations of SoftwareEngineering, FSE 2014, pages 588–598. ACM, 2014. → pages 47[58] G. Bavota, M. Linares-Vasquez, C. Bernal-Cardenas, M. D. Penta,R. Oliveto, and D. Poshyvanyk. The impact of api change- andfault-proneness on the user ratings of Android apps. IEEE Transactions onSoftware Engineering, 99(PrePrints):1, 2015. → pages 47, 53[59] J. Bell, N. Sarda, and G. Kaiser. Chronicler: Lightweight recording toreproduce field failures. In Proceedings of the 2013 InternationalConference on Software Engineering, ICSE ’13, pages 362–371. IEEEPress, 2013. → pages 79, 82[60] C. 
Bernal-Ca´rdenas. Improving energy consumption in android apps. InProceedings of the 2015 10th Joint Meeting on Foundations of SoftwareEngineering, ESEC/FSE 2015, pages 1048–1050. ACM, 2015. → pages 47[61] N. Bettenburg, S. Just, A. Schro¨ter, C. Weiss, R. Premraj, andT. Zimmermann. What makes a good bug report? In Proceedings of the16th ACM SIGSOFT International Symposium on Foundations of softwareengineering, pages 308–318. ACM, 2008. → pages 58, 80[62] S. Beyer and M. Pinzger. A manual categorization of Android appdevelopment issues on Stack Overflow. In Software Maintenance andEvolution (ICSME), 2014 IEEE International Conference on, pages531–535. IEEE, Sept 2014. → pages 2, 50, 51[63] P. Bhattacharya, L. Ulanova, I. Neamtiu, and S. Koduru. An empiricalanalysis of bug reports and bug fixing in open source Android apps. InSoftware Maintenance and Reengineering (CSMR), 2013 17th EuropeanConference on, pages 133–143. IEEE, March 2013. → pages 2, 35, 47, 50[64] P. Bhattacharya, L. Ulanova, I. Neamtiu, and S. C. Koduru. An empiricalanalysis of bug reports and bug fixing in open source Android apps. InProceedings of the European Conference on Software Maintenance andReengineering (CSMR), pages 133–143. IEEE Computer Society, 2013. →pages 2, 80180[65] Categories and Extensions.https://developer.apple.com/library/ios/navigation/. Accessed: 2015-12-15.→ pages 125, 142[66] R. Chandra, B. F. Karlsson, N. D. Lane, C.-J. M. Liang, S. Nath, J. Padhye,L. Ravindranath, and F. Zhao. How to smash the next billion mobile appbugs? GetMobile: Mobile Computing and Communications, 19(1), 2015.→ pages 47, 49[67] M. Chandramohan and H. B. K. Tan. Detection of mobile malware in thewild. Computer, 99, 2012. → pages 35, 47, 116[68] R. Chandy and H. Gu. Identifying spam in the iOS app store. InProceedings of the Joint WICOW/AIRWeb Workshop on Web Quality,WebQuality ’12, pages 56–59. ACM, 2012. → pages 50, 112[69] T.-H. Chang, T. Yeh, and R. C. Miller. GUI testing using computer vision.In Proceedings of the 28th international conference on Human factors incomputing systems, CHI ’10, pages 1535–1544. ACM, 2010. → pages 118,126, 161, 172[70] L. Chen, M. AliBabar, and B. Nuseibeh. Characterizing architecturallysignificant requirements. IEEE Softw., 30(2):38–45, 2013. → pages 12, 54[71] N. Chen, J. Lin, S. C. H. Hoi, X. Xiao, and B. Zhang. Ar-miner: Mininginformative reviews for developers from mobile app marketplace. InProceedings of the 36th International Conference on SoftwareEngineering, ICSE 2014, pages 767–778. ACM, 2014. → pages 2, 4, 47,50, 85, 92, 103, 112[72] Y. Chen, H. Xu, Y. Zhou, and S. Zhu. Is this app safe for children?: Acomparison study of maturity ratings on android and iOS applications. InProceedings of the International Conference on World Wide Web, WWW’13, pages 201–212. ACM, 2013. → pages 112, 113[73] W. Choi, G. Necula, and K. Sen. Guided GUI testing of Android apps withminimal restart and approximate learning. SIGPLAN Not., 48(10):623–640, 2013. → pages 47, 164, 173[74] S. R. Choudhary, M. Prasad, and A. Orso. Cross-platform feature matchingfor web applications. In Proceedings of the 2014 International Symposiumon Software Testing and Analysis, ISSTA 2014. ACM, 2014. → pages 4,48, 163181[75] CNET. Researcher posts Facebook bug report to Mark Zuckerberg’s wall.http://news.cnet.com/8301-1023 3-57599043-93/researcher-posts-facebook-bug-report-to-mark-zuckerbergs-wall/.Accessed: 2015-12-15. → pages 58[76] Cocoa developer community. 
Method Swizzling.https://developer.apple.com/library/ios/navigation/. Accessed: 2015-12-15.→ pages 125, 142[77] G. Coleman and R. O’Connor. Using grounded theory to understandsoftware process improvement: A study of Irish software productcompanies. Inf. Softw. Technol., 49(6):654–667, 2007. → pages 13, 54, 55[78] G. Coleman and R. O’Connor. Investigating software process in practice:A grounded theory perspective. J. Syst. Softw., 81(5):772–784, 2008. →pages 12, 54, 55[79] J. W. Creswell. Qualitative inquiry and research design : choosing amongfive approaches (2nd edition). Thousand Oaks, CA: SAGE, 2007. → pages12[80] J. W. Creswell. Research design: Qualitative, quantitative, and mixedmethods approaches. Sage Publications, Incorporated, 2013. → pages 60,86[81] I. Dalmasso, S. Datta, C. Bonnet, and N. Nikaein. Survey, comparison andevaluation of cross platform mobile application development tools. InWireless Communications and Mobile Computing Conference (IWCMC),2013 9th International, pages 323–328. IEEE, July 2013. → pages 53[82] J. Dehlinger and J. Dixon. Mobile application software engineering:Challenges and research directions. In Proceedings of the Workshop onMobile Software Engineering, pages 29–32. Springer, 2011. → pages 2,11, 50, 51, 115[83] S. Diewald, L. Roalter, A. Mo¨ller, and M. Kranz. Towards a holisticapproach for mobile application development in intelligent environments.In Proceedings of the 10th International Conference on Mobile andUbiquitous Multimedia, MUM ’11, pages 73–80. ACM, 2011. → pages 53[84] M. Egele, C. Kruegel, E. Kirda, and G. Vigna. PiOS: Detecting PrivacyLeaks in iOS Applications. In 18th Annual Network and DistributedSystem Security Symposium (NDSS). The Internet Society, 2011. → pages35, 47, 116182[85] M. Erfani Joorabchi and A. Mesbah. Reverse engineering iOS mobileapplications. In Proceedings of the Working Conference on ReverseEngineering (WCRE), pages 177–186. IEEE Computer Society, 2012. →pages iii, 9, 47, 48, 114, 164, 170, 173[86] M. Erfani Joorabchi, A. Mesbah, and P. Kruchten. Real challenges inmobile app development. In Proceedings of the ACM/IEEE InternationalSymposium on Empirical Software Engineering and Measurement,ESEM’13, pages 15–24. IEEE, 2013. → pages iii, 3, 4, 5, 6, 7, 8, 10, 85,109, 138, 139, 165, 168, 171[87] M. Erfani Joorabchi, M. Mirzaaghaei, and A. Mesbah. Works for me!characterizing non-reproducible bug reports. In The 11th WorkingConference on Mining Software Repositories, MSR’14, pages 62–71.ACM, 2014. → pages iii, 8, 57, 167[88] M. Erfani Joorabchi, M. Ali, and A. Mesbah. Detecting inconsistencies inmulti-platform mobile apps. In Proceedings of the 26th InternationalSymposium on Software Reliability Engineering, ISSRE, pages 450–460.IEEE Computer Society, 2015. → pages iv, 9, 47, 48, 85, 113, 137, 171[89] D. Falessi, M. A. Babar, G. Cantone, and P. Kruchten. Applying empiricalsoftware engineering to software architecture: challenges and lessonslearned. Empirical Softw. Engg., 15(3):250–276, 2010. → pages 54[90] A. C. C. Franca, D. E. S. Carneiro, and F. Q. B. da Silva. Towards anexplanatory theory of motivation in software engineering: A qualitativecase study of a small software company. 2012 26th Brazilian Symposiumon Software Engineering, 0:61–70, 2012. → pages 54[91] D. Franke and C. Weise. Providing a Software Quality Framework forTesting of Mobile Applications. In Proceedings of the InternationalConference on Software Testing, Verification and Validation (ICST), pages431–434. IEEE Computer Society, 2011. 
→ pages 115[92] D. Franke, C. Elsemann, S. Kowalewski, and C. Weise. Reverseengineering of mobile application lifecycles. In 18th Working Conferenceon Reverse Engineering (WCRE), 2011. → pages 11, 50, 51[93] B. Fu, J. Lin, L. Li, C. Faloutsos, J. Hong, and N. Sadeh. Why people hateyour app: Making sense of user feedback in a mobile app store. InProceedings of the 19th ACM SIGKDD International Conference on183Knowledge Discovery and Data Mining, KDD ’13, pages 1276–1284.ACM, 2013. → pages 112[94] R. Gallo, P. Hongo, R. Dahab, L. C. Navarro, H. Kawakami, K. Galva˜o,G. Junqueira, and L. Ribeiro. Security and system architecture:Comparison of Android customizations. In Proceedings of the 8th ACMConference on Security & Privacy in Wireless and Mobile Networks,WiSec ’15, pages 12:1–12:6. ACM, 2015. → pages 109[95] L. Galvis Carreno and K. Winbladh. Analysis of user comments: Anapproach for software requirements evolution. In Software Engineering(ICSE), 2013 35th International Conference on, pages 582–591. IEEEComputer Society, 2013. → pages 50, 112[96] E. Giger, M. D’Ambros, M. Pinzger, and H. C. Gall. Method-level bugprediction. In Proceedings of the International Symposium on EmpiricalSoftware Engineering and Measurement, ESEM, pages 171–180. ACM,2012. → pages 80[97] A. Gimblett and H. Thimbleby. User interface model discovery: towards ageneric approach. In Proceedings of the 2nd ACM SIGCHI symposium onEngineering interactive computing systems, EICS ’10, pages 145–154.ACM, 2010. → pages 4, 117[98] B. Glaser. Doing Grounded Theory: Issues and Discussions. SociologyPress, 1998. → pages 13[99] B. Glaser and A. Strauss. The discovery of Grounded Theory: Strategiesfor Qualitative Research. Aldine Transaction, 1967. → pages 3, 11, 12, 13,45, 166[100] A. Gokhale, V. Ganapathy, and Y. Padmanaban. Inferring likely mappingsbetween APIs. In Proceedings of the 2013 International Conference onSoftware Engineering, ICSE ’13, pages 82–91. IEEE Press, 2013. → pages4, 47, 163[101] P. Gokhale and S. Singh. Multi-platform strategies, approaches andchallenges for developing mobile applications. In Circuits, Systems,Communication and Information Technology Applications (CSCITA), 2014International Conference on, pages 289–293. IEEE, April 2014. → pages47, 53, 54184[102] L. Gomez, I. Neamtiu, T. Azim, and T. Millstein. Reran: Timing- andtouch-sensitive record and replay for Android. In Proceedings of the 2013International Conference on Software Engineering, ICSE ’13, pages72–81. IEEE Press, 2013. → pages 47, 164, 173[103] Google Play Store Crawler.https://github.com/MarcelloLins/GooglePlayAppsCrawler. Accessed:2015-12-15. → pages 87[104] Goole Play Store Review scraper.https://github.com/jkao/GooglePlayScraper. Accessed: 2015-12-15. →pages 92[105] A. Gorla, I. Tavecchia, F. Gross, and A. Zeller. Checking app behavioragainst app descriptions. In Proceedings of the 36th InternationalConference on Software Engineering, ICSE 2014, pages 1025–1035. ACM,2014. → pages 2, 47, 50, 52, 85, 112[106] M. Greiler, A. van Deursen, and M.-A. Storey. Test confessions: a study oftesting practices for plug-in systems. In Proceedings of the InternationalConference on Software Engineering (ICSE), pages 244–254. IEEEComputer Society, 2012. → pages 2, 12, 54[107] J. Gui, S. Mcilroy, M. Nagappan, and W. G. J. Halfond. Truth inadvertising: The hidden cost of mobile ads for software developers. InProceedings of the 37th International Conference on Software Engineering- Volume 1, ICSE ’15, pages 100–110. IEEE Press, 2015. → pages 47[108] P. 
J. Guo, T. Zimmermann, N. Nagappan, and B. Murphy. Characterizingand predicting which bugs get fixed: an empirical study of MicrosoftWindows. In Proceedings of the International Conference on SoftwareEngineering (ICSE), pages 495–504. ACM, 2010. → pages 58, 80[109] P. J. Guo, T. Zimmermann, N. Nagappan, and B. Murphy. ‘not my bug!’and other reasons for software bug report reassignments. In Proceedings ofthe Conference on Computer Supported Cooperative Work, CSCW, pages395–404. ACM, 2011. → pages 58, 80, 81[110] E. Guzman and W. Maalej. How do users like this feature? a fine grainedsentiment analysis of app reviews. In Requirements EngineeringConference (RE), 2014 IEEE 22nd International, pages 153–162, 2014. →pages 47, 112185[111] D. Han, C. Zhang, X. Fan, A. Hindle, K. Wong, and E. Stroulia.Understanding Android fragmentation with topic analysis ofvendor-specific bugs. In 19th Working Conference on Reverse Engineering(WCRE), 2012, pages 83–92. IEEE, 2012. → pages 109[112] D. Han, C. Zhang, X. Fan, A. Hindle, K. Wong, and E. Stroulia.Understanding Android fragmentation with topic analysis ofvendor-specific bugs. In 19th Working Conference on Reverse Engineering(WCRE), pages 83–92. IEEE, Oct 2012. → pages 2, 47, 50, 53[113] M. Harman, Y. Jia, and Y. Zhang. App store mining and analysis: MSR forapp stores. In 9th IEEE Working Conference on Mining SoftwareRepositories (MSR), pages 108–111. IEEE, June 2012. → pages 50, 112[114] Z. Hemel and E. Visser. Declaratively programming the mobile web withMobl. In Proceedings of Intl. Conf. on Object oriented programmingsystems languages and applications (OOPSLA), pages 695–712. ACM,2011. → pages 48[115] S. Herbold, J. Grabowski, S. Waack, and U. Bu¨nting. Improved bugreporting and reproduction through non-intrusive gui usage monitoring andautomated replaying. In Proceedings of the 2011 IEEE FourthInternational Conference on Software Testing, Verification and ValidationWorkshops, ICSTW ’11, pages 232–241. IEEE Computer Society, 2011. →pages 79, 82[116] K. Herzig, S. Just, and A. Zeller. It’s not a bug, it’s a feature: howmisclassification impacts bug prediction. In Proceedings of theInternational Conference on Software Engineering (ICSE), pages 392–401.IEEE Computer Society, 2013. → pages 58, 66, 79, 80, 81[117] E. Holder, E. Shah, M. Davoodi, and E. Tilevich. Cloud twin: Nativeexecution of Android applications on the Windows Phone. In AutomatedSoftware Engineering (ASE), 2013 IEEE/ACM 28th InternationalConference on, pages 598–603. IEEE, 2013. → pages 47, 163[118] P. Hooimeijer and W. Weimer. Modeling bug report quality. InProceedings of the International Conference on Automated SoftwareEngineering (ASE), pages 34–43. ACM, 2007. → pages 80[119] C. Hu and I. Neamtiu. Automating GUI testing for Android applications.In Proceedings of the 6th International Workshop on Automation ofSoftware Test, AST ’11, pages 77–83. ACM, 2011. → pages 4, 47, 48, 118186[120] G. Hu, X. Yuan, Y. Tang, and J. Yang. Efficiently, effectively detectingmobile app bugs with appdoctor. In Proceedings of the Ninth EuropeanConference on Computer Systems, EuroSys ’14, pages 18:1–18:15. ACM,2014. → pages 47, 49[121] N. P. Huy and D. vanThanh. Evaluation of mobile app paradigms. InProceedings of the International Conference on Advances in MobileComputing and Multimedia, MoMM, pages 25–30. ACM, 2012. → pages53, 54[122] C. Iacob and R. Harrison. Retrieving and analyzing mobile apps featurerequests from online reviews. 
In Proceedings of the 10th WorkingConference on Mining Software Repositories, MSR ’13, pages 41–44.IEEE Press, 2013. → pages 112[123] C. Iacob and R. Harrison. Retrieving and analyzing mobile apps featurerequests from online reviews. In Proceedings of the 10th WorkingConference on Mining Software Repositories, MSR ’13, pages 41–44.IEEE Press, 2013. → pages 47[124] C. Iacob, V. Veerappa, and R. Harrison. What are you complaining about?:A study of online reviews of mobile applications. In Proceedings of the27th International BCS Human Computer Interaction Conference,BCS-HCI ’13, pages 29:1–29:6. British Computer Society, 2013. → pages2, 4, 85, 112[125] iOS Developer Library. Apple’s developer guides.https://developer.apple.com/library/ios/navigation/. Accessed: 2015-12-15.→ pages xii, 117, 118, 122[126] iTunes App Store Review scraper.https://github.com/grych/AppStoreReviews. Accessed: 2015-12-15. →pages 92[127] iTunes RSS Feed Generator. https://rss.itunes.apple.com/ca/. Accessed:2015-12-15. → pages 91[128] M. Janicki, M. Katara, and T. Paakkonen. Obstacles and opportunities indeploying model-based gui testing of mobile software: a survey. SoftwareTesting, Verification and Reliability, 22(5):313–341, 2012. → pages 47,115187[129] Y.-W. Kao, C.-F. Lin, K.-A. Yang, and S.-M. Yuan. A cross-platformruntime environment for mobile widget-based application. InCyber-Enabled Distributed Computing and Knowledge Discovery(CyberC), 2011 International Conference on, pages 68 –71, 2011. → pages53[130] K. Karhu, T. Repo, O. Taipale, and K. Smolander. Empirical observationson software testing automation. In Proceedings of the InternationalConference on Software Testing Verification and Validation (ICST), pages201–209. IEEE Computer Society, 2009. → pages 12, 54, 55[131] J. Kasurinen, O. Taipale, and K. Smolander. Software test automation inpractice: empirical observations. Adv. Soft. Eng., 2010:4:1–4:13, 2010. →pages 54[132] V. Kettunen, J. Kasurinen, O. Taipale, and K. Smolander. A study onagility and testing processes in software organizations. In Proceedings ofthe International Symposium on Software Testing and Analysis (ISSTA),pages 231–240. ACM, 2010. → pages 12, 45, 54, 166[133] R. Khadka, B. V. Batlajery, A. M. Saeidi, S. Jansen, and J. Hage. How doprofessionals perceive legacy systems and software modernization? InProceedings of the 36th International Conference on Software Engineering,ICSE 2014, pages 36–47. ACM, 2014. → pages 2, 3, 11, 12, 54[134] H. Khalid. On identifying user complaints of iOS apps. In Proceedings ofthe 2013 International Conference on Software Engineering, ICSE ’13,pages 1474–1476. IEEE Press, 2013. → pages 50, 52, 112[135] H. Khalid, M. Nagappan, E. Shihab, and A. E. Hassan. Prioritizing thedevices to test your app on: a case study of Android game apps. InProceedings of the 22nd ACM SIGSOFT International Symposium onFoundations of Software Engineering, (FSE-22), Hong Kong, China,November 16 - 22, 2014, pages 610–620. ACM, 2014. → pages 47, 52[136] H. Khalid, E. Shihab, M. Nagappan, and A. Hassan. What do mobile appusers complain about? a study on free iOS apps. IEEE Software, 99, 2014.→ pages 50, 52, 112[137] H. Kim, B. Choi, and W. Wong. Performance testing of mobileapplications at the unit test level. In Proceedings of the 3rd InternationalConference on Secure Software Integration and Reliability Improvement,pages 171–180. IEEE Computer Society, 2009. → pages 115188[138] P. S. Kochhar, F. Thung, N. Nagappan, T. Zimmermann, and D. 
Lo.Understanding the test automation culture of app developers. InProceedings of the 8th IEEE International Conference on Software Testing,Verification, and Validation. IEEE Computer Society, 2015. → pages 2, 11,50, 51[139] T. A. Kroeger, N. J. Davidson, and S. C. Cook. Understanding thecharacteristics of quality for software engineering processes: A groundedtheory investigation. Inf. Softw. Technol., 56(2):252–271, Feb. 2014. →pages 12, 54[140] A. Kumar Maji, K. Hao, S. Sultana, and S. Bagchi. Characterizing failuresin mobile oses: A case study with Android and symbian. In SoftwareReliability Engineering (ISSRE), IEEE International Symposium on, pages249–258. IEEE, Nov 2010. → pages 50, 112[141] A. Kumar Maji, K. Hao, S. Sultana, and S. Bagchi. Characterizing failuresin mobile OSes: A case study with Android and Symbian. In Proceedingsof the International Symposium on Software Reliability Engineering(ISSRE), pages 249–258. IEEE Computer Society, 2010. → pages 80[142] Y. Kwon, S. Lee, H. Yi, D. Kwon, S. Yang, B.-G. Chun, L. Huang,P. Maniatis, M. Naik, and Y. Paek. Mantis: Automatic performanceprediction for smartphone applications. In Proceedings of the 2013USENIX Conference on Annual Technical Conference, USENIX ATC’13,pages 297–308. USENIX Association, 2013. → pages 47[143] D. Lavid Ben Lulu and T. Kuflik. Functionality-based clustering usingshort textual description: Helping users to find apps installed on theirmobile device. In Proceedings of the 2013 International Conference onIntelligent User Interfaces, IUI ’13, pages 297–306. ACM, 2013. → pages2, 50, 85, 112[144] V. L. Levenshtein. Binary codes capable of correcting deletions, insertions,and reversals. Cybernetics and Control Theory, 10:707–710, 1996. →pages 149[145] C. Lewis, Z. Lin, C. Sadowski, X. Zhu, R. Ou, and E. J. Whitehead Jr.Does bug prediction support human developers? findings from a Googlecase study. In Proceedings of the International Conference on SoftwareEngineering, ICSE, pages 372–381. IEEE Computer Society, 2013. →pages 80189[146] C.-J. M. Liang, N. D. Lane, N. Brouwers, L. Zhang, B. F. Karlsson, H. Liu,Y. Liu, J. Tang, X. Shan, R. Chandra, and F. Zhao. Caiipa: Automatedlarge-scale mobile app testing through contextual fuzzing. In Proceedingsof the 20th Annual International Conference on Mobile Computing andNetworking, MobiCom ’14, pages 519–530. ACM, 2014. → pages 47, 49[147] M. Linares-Va´squez, G. Bavota, C. Bernal-Ca´rdenas, M. Di Penta,R. Oliveto, and D. Poshyvanyk. Api change and fault proneness: A threatto the success of Android apps. In Proceedings of the InternationalSymposium on the Foundations of Software Engineering, ESEC/FSE 2013,pages 477–487. ACM, 2013. → pages 47, 50, 53, 112[148] M. Linares-Vasquez, B. Dit, and D. Poshyvanyk. An exploratory analysisof mobile development issues using stack overflow. In 10th IEEE WorkingConference on Mining Software Repositories (MSR), pages 93–96. IEEE,May 2013. → pages 2, 50, 52[149] M. Linares-Va´squez, G. Bavota, M. Di Penta, R. Oliveto, andD. Poshyvanyk. How do api changes trigger stack overflow discussions? astudy on the Android SDK. In Proceedings of the 22Nd InternationalConference on Program Comprehension, ICPC 2014, pages 83–94. ACM,2014. → pages 2, 50, 52[150] M. Linares-Va´squez, M. White, C. Bernal-Ca´rdenas, K. Moran, andD. Poshyvanyk. Mining android app usages for generating actionablegui-based execution scenarios. In Proceedings of the 12th WorkingConference on Mining Software Repositories, MSR ’15, pages 111–122.IEEE Press, 2015. → pages 51[151] M. 
Linares-Vsquez, C. Vendome, Q. Luo, and D. Poshyvanyk. Howdevelopers detect and fix performance bottlenecks in android apps. InProceedings of 31st IEEE International Conference on SoftwareMaintenance and Evolution, ICSME’15. IEEE, 2015. → pages 47[152] W. Maalej and H. Nabil. Bug report, feature request, or simply praise? onautomatically classifying app reviews. In IEEE 23rd InternationalRequirements Engineering Conference (RE), 2015, pages 116–125. IEEE,2015. → pages 47, 50[153] A. Machiry, R. Tahiliani, and M. Naik. Dynodroid: An input generationsystem for android apps. In Proceedings of the 2013 9th Joint Meeting onFoundations of Software Engineering, ESEC/FSE 2013, pages 224–234.ACM, 2013. → pages 47190[154] R. Mahmood, N. Mirzaei, and S. Malek. Evodroid: Segmentedevolutionary testing of android apps. In Proceedings of the 22Nd ACMSIGSOFT International Symposium on Foundations of SoftwareEngineering, FSE 2014, pages 599–609. ACM, 2014. → pages 47[155] L. Martie, V. Palepu, H. Sajnani, and C. Lopes. Trendy bugs: Topic trendsin the Android bug reports. In 9th IEEE Working Conference on MiningSoftware Repositories (MSR), pages 120–123, June 2012. → pages 2, 47,50, 53[156] W. Martin, M. Harman, Y. Jia, F. Sarro, and Y. Zhang. The app samplingproblem for app store mining. In 12th IEEE Working Conference onMining Software Repositories (MSR). IEEE, 2015. → pages 50, 112[157] E. Masi, G. Cantone, M. Mastrofini, G. Calavaro, and P. Subiaco. Mobileapps development: A framework for technology decision making. InProceedings of International Conference on Mobile Computing,Applications, and Services., MobiCASE’4, pages 64–79, 2012. → pages 1,47, 53, 54[158] T. McDonnell, B. Ray, and M. Kim. An empirical study of api stability andadoption in the Android ecosystem. In Proceedings of the 2013 IEEEInternational Conference on Software Maintenance, ICSM ’13, pages70–79. IEEE Computer Society, 2013. → pages 47, 53[159] A. M. Memon, I. Banerjee, and A. Nagarajan. GUI Ripping: ReverseEngineering of Graphical User Interfaces for Testing. In Proceedings ofThe 10th Working Conference on Reverse Engineering, pages 260–269.IEEE, 2003. → pages 4, 117[160] A. Mesbah and S. Mirshokraie. Automated analysis of CSS rules tosupport style maintenance. In Proceedings of the 34th ACM/IEEEInternational Conference on Software Engineering (ICSE’12), pages408–418. IEEE Computer Society, 2012. → pages 117[161] A. Mesbah and M. R. Prasad. Automated cross-browser compatibilitytesting. In Proceedings of the 33rd ACM/IEEE International Conference onSoftware Engineering (ICSE’11), pages 561–570. ACM, 2011. → pages 4,136, 163[162] A. Mesbah and M. R. Prasad. Automated cross-browser compatibilitytesting. In Proceedings of the International Conference on SoftwareEngineering (ICSE), pages 561–570. ACM, 2011. → pages 48191[163] A. Mesbah, A. van Deursen, and S. Lenselink. Crawling Ajax-based WebApplications through Dynamic Analysis of User Interface State Changes.In ACM Transactions on the Web (TWEB), volume 6, pages 3:1–3:30.ACM, 2012. → pages 4, 117, 129, 163[164] A. Mesbah, A. van Deursen, and D. Roest. Invariant-based automatictesting of modern web applications. IEEE Transactions on SoftwareEngineering (TSE), 38(1):35 –53, 2012. → pages 117[165] Mining iOS and Android mobile app-pairs: Toolset and dataset.https://github.com/saltlab/Minning-App-Stores. Accessed: 2015-12-15. →pages 85, 87, 96, 111, 112[166] M. Miranda, R. Ferreira, C. R. B. de Souza, F. Figueira Filho, andL. Singer. 
An exploratory study of the adoption of mobile developmentplatforms by software engineers. In Proceedings of the 1st InternationalConference on Mobile Software Engineering and Systems, MOBILESoft2014, pages 50–53. ACM, 2014. → pages 2, 11, 50, 51[167] I. Mojica Ruiz, M. Nagappan, B. Adams, T. Berger, S. Dienst, andA. Hassan. Impact of ad libraries on ratings of android mobile apps.Software, IEEE, 31(6):86–92, 2014. → pages 47[168] K. Moran, M. Linares-Va´squez, C. Bernal-Ca´rdenas, and D. Poshyvanyk.Auto-completing bug reports for android applications. In Proceedings ofthe 2015 10th Joint Meeting on Foundations of Software Engineering,ESEC/FSE 2015, pages 673–686. ACM, 2015. → pages 47, 50[169] H. Muccini, A. D. Francesco, and P. Esposito. Software Testing of MobileApplications: Challenges and Future Research Directions. In Proceedingsof the 7th International Workshop on Automation of Software Test (AST).IEEE Computer Society, 2012. → pages 2, 11, 50, 51, 115[170] B. A. Myers and M. B. Rosson. Survey On User Interface Programming.In Proceedings of the SIGCHI conference on Human factors in computingsystems, CHI’92, pages 195–202. ACM, 1992. → pages 115, 162, 169[171] S. N. Nader Boushehrinejadmoradi, Vinod Ganapathy and L. Iftode.Testing cross-platform mobile app development frameworks. InProceedings of the 30th IEEE/ACM International Conference onAutomated Software Engineering, ASE 2015. ACM, 2015. → pages 47192[172] M. Nagappan and E. Shihab. Future trends in software engineeringresearch for mobile apps. In Proceedings of the IEEE InternationalConference on Software Analysis, Evolution, and Reengineering, FoSE,2016. → pages 111, 113[173] S. Nath. Madscope: Characterizing mobile in-app targeted ads. InProceedings of the 13th Annual International Conference on MobileSystems, Applications, and Services, MobiSys ’15, pages 59–73. ACM,2015. → pages 47[174] S. Nath, F. X. Lin, L. Ravindranath, and J. Padhye. Smartads: Bringingcontextual ads to mobile apps. In Proceeding of the 11th AnnualInternational Conference on Mobile Systems, Applications, and Services,MobiSys ’13, pages 111–124. ACM, 2013. → pages 47[175] S. P. Ng, T. Murnane, K. Reed, D. Grant, and T. Y. Chen. A preliminarysurvey on software testing practices in Australia. In Proceedings of theAustralian Software Engineering Conference, pages 116–125. IEEEComputer Society, 2004. → pages 54[176] N. Nikzad, O. Chipara, and W. G. Griswold. Ape: An annotation languageand middleware for energy-efficient mobile application development. InProceedings of the 36th International Conference on SoftwareEngineering, ICSE 2014, pages 515–526. ACM, 2014. → pages 47[177] A. Nistor and L. Ravindranath. Suncat: Helping developers understand andpredict performance problems in smartphone applications. In Proceedingsof the 2014 International Symposium on Software Testing and Analysis,ISSTA 2014, pages 282–292. ACM, 2014. → pages 47[178] NPD DisplaySearch. Smartphones to pass 80% of global mobile phoneshipments by 2017. http://en.ofweek.com/news/Smartphones-to-pass-80-of-global-mobile-phone-shipments-by-2017-3354.Accessed: 2015-12-15. → pages 114[179] A. Onwuegbuzie and N. Leech. Validity and qualitative research: Anoxymoron? Quality and Quantity, 41:233–249, 2007. → pages 45, 166[180] T. Paananen. Smartphone Cross-Platform Frameworks. Bachelor’s Thesis.,2011. → pages 53[181] D. Pagano and W. Maalej. User feedback in the appstore: An empiricalstudy. In Requirements Engineering Conference (RE), 2013 21st IEEEInternational, pages 125–134. IEEE, July 2013. 
→ pages 47, 50, 53, 112193[182] M. Palmieri, I. Singh, and A. Cicchetti. Comparison of cross-platformmobile development tools. In Intelligence in Next Generation Networks(ICIN), 2012 16th International Conference on, pages 179 –186, 2012. →pages 53[183] F. Palomba, M. Linares-Vasquez, G. Bavota, R. Oliveto, M. D. Penta,D. Poshyvanyk, and A. D. Lucia. User reviews matter! trackingcrowdsourced reviews to support evolution of successful apps. In IEEEInternational Conference on Software Maintenance and Evolution(ICSME). IEEE, 2015. → pages 112[184] F. Palomba, M. Linares-Vsquez, G. Bavota, R. Oliveto, M. D. Penta,D. Poshyvanyk, and A. D. Lucia. User reviews matter! trackingcrowdsourced reviews to support evolution of successful apps. In Proc.ICSME, pages 291–300. IEEE, 2015. → pages 47, 50[185] S. Panichella, A. D. Sorbo, E. Guzman, C. A. Visaggio, G. Canfora, andH. C. Gall. How can i improve my app? classifying user reviews forsoftware maintenance and evolution. In IEEE International Conference onSoftware Maintenance and Evolution (ICSME). IEEE, 2015. → pages 2, 4,85, 92, 103, 112[186] G. P. Picco, C. Julien, A. L. Murphy, M. Musolesi, and G.-C. Roman.Software engineering for mobility: Reflecting on the past, peering into thefuture. In Proceedings of the on Future of Software Engineering, FOSE2014, pages 13–28. ACM, 2014. → pages 2[187] A. Puder and O. Antebi. Cross-compiling Android applications to iOS andWindows phone 7. Mob. Netw. Appl., 18(1):3–21, Feb. 2013. → pages 47,53, 54[188] K. Rasmussen, A. Wilson, and A. Hindle. Green mining: Energyconsumption of advertisement blocking methods. In Proceedings of the 3rdInternational Workshop on Green and Sustainable Software, GREENS2014, pages 38–45. ACM, 2014. → pages 47[189] V. Rastogi, Y. Chen, and W. Enck. Appsplayground: Automatic securityanalysis of smartphone applications. In Proceedings of the Third ACMConference on Data and Application Security and Privacy, CODASPY’13, pages 209–220. ACM, 2013. → pages 35[190] L. Ravindranath, S. Nath, J. Padhye, and H. Balakrishnan. Automatic andscalable fault detection for mobile applications. In Proceedings of the 12th194Annual International Conference on Mobile Systems, Applications, andServices, MobiSys ’14, pages 190–203. ACM, 2014. → pages 49[191] P. C. Rigby and M.-A. Storey. Understanding broadcast based peer reviewon open source software projects. In Proceedings of the 33rd InternationalConference on Software Engineering, ICSE ’11, pages 541–550. ACM,2011. → pages 12, 54[192] T. Roehm, R. Tiarks, R. Koschke, and W. Maalej. How do professionaldevelopers comprehend software? In Proceedings of the InternationalConference on Software Engineering (ICSE), pages 255–265. IEEEComputer Society, 2012. → pages 4, 7, 115, 162, 169[193] T. Roehm, N. Gurbanova, B. Bruegge, C. Joubert, and W. Maalej.Monitoring user interactions for supporting failure reproduction. InInternational Conference on Program Comprehension (ICPC), pages73–82. IEEE, 2013. → pages 79, 82[194] I. Ruiz, M. Nagappan, B. Adams, and A. Hassan. Understanding reuse inthe Android market. In Program Comprehension (ICPC), 2012 IEEE 20thInternational Conference on, pages 113–122. IEEE, June 2012. → pages 2,47, 50, 85, 112[195] I. M. Ruiz, M. Nagappan, B. Adams, T. Berger, S. Dienst, and A. Hassan.On the relationship between the number of ad libraries in an android appand its rating. IEEE Software, 99, 2014. → pages 2, 47, 50, 85, 112[196] A. Sadeghi, N. Esfahani, and S. Malek. 
Mining the categorized softwarerepositories to improve the analysis of security vulnerabilities. InProceedings of the 17th International Conference on FundamentalApproaches to Software Engineering - Volume 8411, pages 155–169.Springer-Verlag, 2014. → pages 47[197] F. Sarro, A. Al-Subaihin, M. Harman, Y. Jia, W. Martin, and Y. Zhang.Feature lifecycles as they spread, migrate, remain, and die in app stores. InProc. RE, pages 76–85. IEEE, 2015. → pages 50[198] Scikit Learn: Machine Learning in Python.http://scikit-learn.org/stable/index.html. Accessed: 2015-12-15. → pages93[199] C. Seaman. Qualitative methods in empirical studies of softwareengineering. Software Engineering, IEEE Transactions on, 25(4):557–572,Jul 1999. → pages 3, 11195[200] S. Seneviratne, A. Seneviratne, M. A. Kaafar, A. Mahanti, andP. Mohapatra. Early detection of spam mobile apps. In Proceedings of the24th International Conference on World Wide Web, WWW, pages 949–959.ACM, 2015. → pages 2, 85, 112[201] H. Seo and S. Kim. Predicting recurring crash stacks. In Proceedings of theInternational Conference on Automated Software Engineering (ASE),pages 180–189. ACM, 2012. → pages 80[202] E. Shihab, A. Ihara, Y. Kamei, W. Ibrahim, M. Ohira, B. Adams,A. Hassan, and K.-i. Matsumoto. Predicting re-opened bugs: A case studyon the eclipse project. In Proceedings of the Working Conference onReverse Engineering (WCRE), pages 249–258. IEEE Computer Society,2010. → pages 80[203] I. Steinmacher, T. Uchoa Conte, and M. Gerosa. Understanding andsupporting the choice of an appropriate task to start with in open sourcesoftware communities. In 48th Hawaii International Conference on SystemSciences (HICSS), pages 5299–5308. IEEE, 2015. → pages 2, 12, 54[204] M. Sulayman, C. Urquhart, E. Mendes, and S. Seidel. Software processimprovement success factors for small and medium web companies: Aqualitative study. Inf. Softw. Technol., 54(5):479–500, 2012. → pages 3,11, 12, 54[205] M. Szydlowski, M. Egele, C. Kruegel, and G. Vigna. Challenges forDynamic Analysis of iOS Applications. In Proceedings of the Workshop onOpen Research Problems in Network Security (iNetSec), pages 65–77,2011. → pages 47, 116, 120, 126, 161, 172[206] M. Szydlowski, M. Egele, C. Kruegel, and G. Vigna. Challenges fordynamic analysis of iOS applications. In Proceedings of the 2011 IFIP WG11.4 International Conference on Open Problems in Network Security,iNetSec’11, pages 65–77. Springer-Verlag, 2012. → pages 50[207] K. Thomas, A. K. Bandara, B. A. Price, and B. Nuseibeh. Distillingprivacy requirements for mobile applications. In Proceedings of the 36thInternational Conference on Software Engineering, ICSE 2014, pages871–882. ACM, 2014. → pages 35[208] Y. Tian, M. Nagappan, D. Lo, and A. E. Hassan. What are thecharacteristics of high-rated apps? a case study on free android196applications. In IEEE International Conference on Software Maintenanceand Evolution (ICSME). IEEE, 2015. → pages 112[209] Top Free in Android Apps.https://play.google.com/store/apps/collection/topselling free?hl=en.Accessed: 2015-12-15. → pages 91[210] N. Viennot, E. Garcia, and J. Nieh. A measurement study of google play.In The 2014 ACM International Conference on Measurement and Modelingof Computer Systems, SIGMETRICS ’14, pages 221–233. ACM, 2014. →pages 107[211] P. M. Vu, T. T. Nguyen, H. V. Pham, and T. T. Nguyen. Mining useropinions in mobile app reviews: A keyword-based approach. CoRR,abs/1505.04657, 2015. → pages 112[212] W. Wang and M. W. Godfrey. 
Detecting api usage obstacles: A study ofiOS and Android developer questions. In Proceedings of the10th WorkingConference on Mining Software Repositories, MSR ’13, pages 61–64.IEEE Press, 2013. → pages 2, 50, 52[213] A. I. Wasserman. Software engineering issues for mobile applicationdevelopment. In FSE/SDP workshop on Future of software engineeringresearch, FoSER’10, pages 397–400. ACM, 2010. → pages 1, 11, 50, 51,115[214] M. Waterman, J. Noble, and G. Allan. How much up-front? a groundedtheory of agile architecture. In IEEE/ACM 37th IEEE InternationalConference on Software Engineering (ICSE), 2015, volume 1, pages347–357. IEEE, 2015. → pages 54[215] L. Wu, M. Grace, Y. Zhou, C. Wu, and X. Jiang. The impact of vendorcustomizations on Android security. In Proceedings of the 2013 ACMSIGSAC Conference on Computer &#38; Communications Security, CCS’13, pages 623–634. ACM, 2013. → pages 109[216] S. Yang, D. Yan, H. Wu, Y. Wang, and A. Rountev. Static control-flowanalysis of user-driven callbacks in android applications. In Proceedings ofthe 37th International Conference on Software Engineering - Volume 1,ICSE ’15, pages 89–99. IEEE Press, 2015. → pages 47[217] W. Yang, M. R. Prasad, and T. Xie. A grey-box approach for automatedGUI-model generation of mobile applications. In Proceedings of the197International Conference on Fundamental Approaches to SoftwareEngineering (FASE), pages 250–265. Springer-Verlag, 2013. → pages 47,48, 164, 173[218] W. Yang, X. Xiao, B. Andow, S. Li, T. Xie, and W. Enck. Appcontext:Differentiating malicious and benign mobile app behaviors using context.In Proceedings of the 37th International Conference on SoftwareEngineering, ICSE 2015. ACM, 2015. → pages 47, 50, 112[219] Z. Yin, D. Yuan, Y. Zhou, S. Pasupathy, and L. Bairavasundaram. How dofixes become bugs? In Proceedings of ESEC/FSE, pages 26–36. ACM,2011. → pages 58, 80[220] P. Zhang and S. Elbaum. Amplifying tests to validate exception handlingcode. In Proceedings of the International Conference on SoftwareEngineering (ICSE), pages 595–605. IEEE Computer Society, 2012. →pages 38[221] C. Zheng, S. Zhu, S. Dai, G. Gu, X. Gong, X. Han, and W. Zou.Smartdroid: an automatic system for revealing ui-based trigger conditionsin android applications. In Proceedings of the second ACM workshop onSecurity and privacy in smartphones and mobile devices, SPSM ’12, pages93–104. ACM, 2012. → pages 47[222] T. Zimmermann, N. Nagappan, P. J. Guo, and B. Murphy. Characterizingand predicting which bugs get reopened. In Proceedings of theInternational Conference on Software Engineering (ICSE), pages1074–1083. IEEE Computer Society, 2012. → pages 58, 80, 81198
