Mining and Characterizing Cross-Platform AppsbyMohamed Hassanein Tawfik Ibrahim AliB.Sc., The University of British Columbia, 2014A THESIS SUBMITTED IN PARTIAL FULFILLMENTOF THE REQUIREMENTS FOR THE DEGREE OFMaster of Applied ScienceinTHE FACULTY OF GRADUATE AND POSTDOCTORALSTUDIES(Electrical and Computer Engineering)The University of British Columbia(Vancouver)December 2016© Mohamed Hassanein Tawfik Ibrahim Ali, 2016AbstractSmartphones and the applications (apps), which run on them, have grown tremen-dously over the past few years. To capitalize on this growth and attract more users,developing the same app for different platforms has become a common industrypractice. However, each mobile platform has its own development language, Ap-plication program interfaces (APIs), software development kits (SDKs) and onlinestores for distributing apps to users. To understand the characteristics of and differ-ences in how users perceive the same app implemented for and distributed throughdifferent platforms, we present a large-scale comparative study of cross-platformapps. We mine the characteristics of 80,000 app-pairs (160K apps in total) from acorpus of 2.4 million apps collected from the Apple and Google Play app stores.We quantitatively compare their app-store attributes, such as stars, versions, andprices. We measure the aggregated user-perceived ratings and find many differ-ences across the platforms. Further, we employ machine learning to classify 1.7million textual user reviews obtained from 2,000 of the mined app-pairs. We ana-lyze discrepancies and root causes of user complaints to understand cross-platformdevelopment challenges that impact cross-platform user-perceived ratings. We alsofollow up with the developers to understand the reasons behind identified differ-ences.Further, we take a closer look at a special category of cross-platform apps,which are built using Cross Platform Tools (CPTs). CPTs allow developers to usea common code-base to simultaneously create apps for multiple platforms. Appscreated using these CPTs are called hybrid apps. We mine 15,512 hybrid apps;measure their aggregated user-perceived ratings and compare them to native appsof the same category.iiPrefaceThis thesis presents two large-scale empirical studies on cross-platform apps. Thework presented was conducted by myself in collaboration with my supervisor, Pro-fessor Ali Mesbah. Chapter 2 of this thesis was done with equal collaboration fromMona Erfani Joorabchi. I was responsible for devising the approach and collectingthe data, running the experiments, analyzing the results and writing the manuscript.My collaborators guided me with the creation of the methodology, analysis ofresults, editing and writing portions of the manuscript. The results described inChapter 2 are submitted as a full paper to an ACM SIGSOFT conference and arecurrently under review. The work presented in Chapter 3 has been published asa workshop paper in November 2016 in the Proceedings of the 1st InternationalWorkshop on App Market Analytics (WAMA) [4].iiiTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiAcknowledgments . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . viii1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31.2 Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . . 42 Same App, Different App Stores:A Large-Scale Study of Cross-Platform Apps . . . . . . . . . . . . . 52.1 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62.1.1 Data Collection . . . . . . . . . . . . . . . . . . . . . . . 62.1.2 Matching Apps to Find App-Pairs . . . . . . . . . . . . . 72.1.3 App-store Attribute Analysis . . . . . . . . . . . . . . . . 102.1.4 User Reviews . . . . . . . . . . . . . . . . . . . . . . . . 102.1.5 User-Perceived Rating . . . . . . . . . . . . . . . . . . . 122.1.6 Cross-platform Complaint Analysis . . . . . . . . . . . . 142.1.7 Datasets and Classifiers . . . . . . . . . . . . . . . . . . . 152.2 Findings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152.2.1 Prevalence and Attributes (RQ1) . . . . . . . . . . . . . . 152.2.2 Top Rated Apps (RQ2) . . . . . . . . . . . . . . . . . . . 212.2.3 Aggregated User-Perceived Ratings (RQ3) . . . . . . . . 222.2.4 Complaints Across Platforms (RQ4) . . . . . . . . . . . . 242.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32iv3 Mining and Characterizing Hybrid Apps . . . . . . . . . . . . . . . 333.1 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333.1.1 Data Collection . . . . . . . . . . . . . . . . . . . . . . . 343.1.2 Finding Hybrid Apps . . . . . . . . . . . . . . . . . . . . 343.1.3 User-Perceived-Rating . . . . . . . . . . . . . . . . . . . 353.1.4 Dataset and Results . . . . . . . . . . . . . . . . . . . . . 373.2 Findings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373.2.1 Prevalence and Popularity of CPTs (RQ1) . . . . . . . . . 373.2.2 Effect of CPT on App’s AUR (RQ2) . . . . . . . . . . . . 383.2.3 Hybrid Vs Native (RQ3) . . . . . . . . . . . . . . . . . . 403.2.4 AUR Across Platforms (RQ4) . . . . . . . . . . . . . . . 413.3 Threats to Validity . . . . . . . . . . . . . . . . . . . . . . . . . . 413.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 424 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 444.1 Cross Platform . . . . . . . . . . . . . . . . . . . . . . . . . . . 444.2 Reviews . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 464.3 Security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 484.4 Feature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 514.5 Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 525 Conclusions and Future Work . . . . . . . . . . . . . . . . . . . . . 555.1 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57vList of TablesTable 2.1 Collected app-pair attributes . . . . . . . . . . . . . . . . . . . 7Table 2.2 Real-world reviews and their classifications. . . . . . . . . . . 12Table 2.3 Ranking apps using different metrics. . . . . . . . . . . . . . . 14Table 2.4 iOS & AND descriptive statistics: Cluster Size (C), Ratings (R),Stars (S), and Price (P). . . . . . . . . . . . . . . . . . . . . . 
17Table 2.5 Statistics of 14 Apps used to build the classifiers (C1 = GenericClassifier, C2 = Sentiment Classifier, NB = Naive Bayes Algo-rithm, SVM = Support Vector Machines Algorithm) . . . . . . 26Table 2.6 Descriptive statistics for iOS & AND reviews: Problem Discov-ery (PD), Feature Request (FR), Non-informative (NI), Positive(P), Negative (N), Neutral (NL), and AUR. . . . . . . . . . . . 28Table 2.7 Descriptive statistics for problematic reviews: App Feature (AF),Critical (CR), Post Update (PU), and Price Complaints (PC). . 28Table 3.1 Number of Hybrid apps using different CPTs . . . . . . . . . . 37Table 3.2 Descriptive statistics for the hybrid apps: Ratings (R), Stars (S),and Downloads (D). . . . . . . . . . . . . . . . . . . . . . . . 39Table 3.3 Descriptive statistics for the hybrid apps: AUR . . . . . . . . . 40viList of FiguresFigure 2.1 Overview of our methodology. . . . . . . . . . . . . . . . . . 6Figure 2.2 Android Cluster for Swiped app. . . . . . . . . . . . . . . . . 8Figure 2.3 a) Groupon and b) Scribblenauts apps. Android apps are shownon the top and iOS apps at the bottom. . . . . . . . . . . . . . 9Figure 2.4 Matching App-pair Criteria. . . . . . . . . . . . . . . . . . . 9Figure 2.5 Clusters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16Figure 2.6 Ratings. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18Figure 2.7 Stars. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18Figure 2.8 Prices. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19Figure 2.9 AUR scores calculated per platform. . . . . . . . . . . . . . . 23Figure 2.10 AUR scores calculated across the platforms. . . . . . . . . . . 23Figure 2.11 The rates of classifiers’ categories for our 2K app-pairs, whereeach dot represents an app-pair. . . . . . . . . . . . . . . . . 27Figure 2.12 The rates of complaints categories for our 2K app-pairs, whereeach dot represents an app-pair. . . . . . . . . . . . . . . . . 29Figure 3.1 Number of apps in each category created using the PhoneGapCPT. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38Figure 3.2 Number of apps in each category created using the TitaniumCPT. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39Figure 3.3 Number of apps in each category created using the Adobe AirCPT. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40Figure 3.4 AUR rates for the apps overall and across each CPT. . . . . . 41Figure 3.5 Average AUR for app categories for Native and Hyrbid Apps. 42Figure 3.6 AUR scores for 1400 hybrid app-pairs. Each pair of diamond(iOS)and square(Android) dots represents an app. The solid anddashed lines show the trend of AUR across the apps. . . . . . 43viiAcknowledgmentsI would like to thank my supervisor Dr. Ali Mesbah for giving me the chanceto pursue my masters and be part of his research group. His experience, profes-sionalism and valuable insights have been key to the success of this work. Hismentorship and guidance has been exemplary and I am confident that the lessons Ilearned from him will be valuable for my engineering career.I would like to thank my family especially my dad, Hassanein Ali, my momNabila Mohamed and my brothers, Karim, Diaa and Ahmed for their constant sup-port and love. Thank you for your encouragement and for inspiring me to becomea better person. Special thanks to my friends at the SALT lab for their valuablehelp, feedback and for creating a fun and productive environment. 
I would like toacknowledge and thank all my friends for their support and motivation.Last but not least, I would like to thank God for his countless blessings and forgiving me the chance to pursue my masters.viiiChapter 1IntroductionSmartphone use has grown tremendously in the last few years. This increasingpopularity can be attributed to the introduction of Apple’s App Store and Google’sPlay Store which were launched in 2008. These stores serve as the primary mediumfor the distribution of mobile applications (apps). Through app stores, users candownload and install apps on their mobile devices. App stores also provide animportant channel for app developers to collect user feedback, such as the overallrating of their app, issues or feature requests through user reviews.To attract as many users as possible, developers often implement the same appfor multiple mobile platforms [32]. While ideally, a given app should provide thesame functionality and high-level behavior across different platforms, this is notalways the case in practice [33]. For instance, a user of the Android Starbucksapp complains: “I downloaded the app so I could place a mobile order only tofind out it’s only available through the iPhone app.” Or an iOS NFL app reviewreads: “on the Galaxy you can watch the game live..., on this (iPad) the app crashessometimes, you can’t watch live games, and it is slow.”Recently, researchers have mined app stores by analyzing user-reviews [22, 59,88], app descriptions [46, 66, 94], and app bytecode [12, 90, 91]. However, existingstudies focus on one store at a time only. To the best of our knowledge, there is nocross-platform study that analyzes the same apps, published on different app stores.Currently, iOS [9] and Android [7] dominate the app market, each with over1.5 million apps in their respective app stores; hence, in this paper, we focus onthese two platforms. To understand how the same app is experienced by the userson different platforms, Chapter 2 of this thesis presents a large-scale study on mo-1bile app-pairs, i.e., the same app implemented for iOS and Android platforms. Weemploy a mixed-methods approach using both quantitative and qualitative analysis.We mine app-pairs and compare their various app-store attributes. We classify tex-tual user reviews to identify discrepancies of user complaints across the platforms.Our study helps to gain insight into the challenges faced by developers in cross-platform app development. It can help app developers to understand why the usersof their apps might perceive and experience the same app differently across plat-forms, and to mitigate the differences. Android has gained the majority of the at-tention from the software engineering research community so far. One of the majorobstacles with cross-platform analysis is the lack of a dataset for such apps [84].Our mined artifact of more than 80,000 app-pairs is the first publicly availabledataset that can be used by researchers to go beyond Android and study differentaspects of cross-platform apps.Currently there are three popular ways to build mobile apps. The first method,“Native”, allows developers to use the software development kits (SDKs) andframeworks for the targeted platform to build the app [55]. “Native”, allows devel-opers to use all of the capabilities of a device, provides the best performance and isdistributed through the platform’s dedicated app store. 
However, implementing anative app for multiple platforms requires familiarity with the languages, APIs andSDKs of each specific platform, which is resource intensive and time consuming.The second method, “Mobile Web App”, uses web technologies such as HTML,CSS and Javascript to build the application as one website which is optimized formobile devices. This approach allows the app to be used across multiple platformswhich reduces development cost and time. However, “Mobile Web App” cannotuse device specific hardware features such as the camera or accelerometer [55].The third method to build an app is “Hybrid”, which bridges the gap betweenthe first two methods. “Hybrid” uses a common code base to simultaneously de-liver native-like apps to multiple platforms. These apps can access the hardwarefeatures of the device and are also distributed using the platform’s app store. Hy-brid apps are created using Cross Platform Tools (CPTs). These CPTs provide twoapproaches to create apps; the first approach allows the developer to use web tech-nologies such as HTML, CSS and Javascript to create a code base which runs in aninternal browser (WebView) that is wrapped in a native app. Examples of CPTs us-2ing this approach include PhoneGap [3] and Trigger.io [102]. The other approachCPTs offer allows the developer to write their code in a language such as C# orJavascript which then gets compiled to native code for each platform. Examplesof CPTs that use this approach include Xamarin [110], Appcelerator Titanium [10]and Adobe Air [1]. Mobile Apps created using a CPT are referred to as hybridapps and we use that term to refer to these apps throughout this paper.A recent study conducted by Viennot et al. [103] found that out of 1.1 MillionAndroid apps, 129,800 (11.8 %) were hybrid apps.In Chapter 3 we present a large-scale study on hybrid apps in order to un-derstand their behavior and characteristics by analyzing their various app-store at-tributes. Additionally, we compare hybrid apps across the Android and iOS plat-forms. Finally we compare hybrid apps to native ones in terms of user perception.By presenting real market data, our study can help developers decide if using aCPT to create their apps is the best solution.1.1 ContributionsThis thesis makes the following main contributions:• The first dataset of 80,169 cross-platform app-pairs (iOS/Android), extractedby analyzing the properties of 2.4M apps from the Google Play and Appleapp stores. Our app-pair dataset is publicly available [80].• A metric for measuring an app’s aggregated user-perceived ratings, whichcombines ratings and stars.• A characterization and comparison of app-pair attributes such as stars, rat-ings, prices, versions, and updates across platforms.• Qualitative developer feedback, providing insights into the cause of varia-tions in development, prices, and user-perceived ratings across platforms.• Sentiment and complaints analysis of user reviews across app-pairs.• A technique to identify hybrid apps along with a large dataset of 15,512hybrid apps. Our dataset is publicly available [58].3• A characterization of hybrid app attributes such as ratings, stars and down-loads.• A comparison between hybrid and native apps in terms of user perception.• A comparison of hybrid apps across the Android and iOS platforms.1.2 Thesis OrganizationIn Chapter 2 of this thesis we describe the empirical study we conducted to mineand characterize cross-platform app-pairs, the results of our analysis and their im-plications on developers and researchers. 
In Chapter 3 we present our empiricalstudy on hybrid apps. This chapter outlines how we mine hybrid apps and presentsthe results of our analysis on the app attributes. Chapter 4 discusses the relatedwork, and Chapter 5 concludes and presents future research directions.4Chapter 2Same App, Different App Stores:A Large-Scale Study of Cross-Platform AppsSummary1To attract more users, implementing the same mobile app for different platformshas become a common industry practice. App stores provide a unique channel forusers to share feedback on the acquired apps through ratings and textual reviews.However, each mobile platform has its own online store for distributing apps tousers. To understand the characteristics of and differences in how users perceive thesame app implemented for and distributed through different platforms, we present alarge-scale comparative study of cross-platform apps. We mine the characteristicsof 80,000 app-pairs (160K apps in total) from a corpus of 2.4 million apps collectedfrom the Apple and Google Play app stores. We quantitatively compare their app-store attributes, such as stars, versions, and prices. We measure the aggregateduser-perceived ratings and find many differences across the platforms. Further,we employ machine learning to classify 1.7 million textual user reviews obtainedfrom 2,000 of the mined app-pairs. We analyze discrepancies and root causes ofuser complaints to understand cross-platform development challenges that impactcross-platform user-perceived ratings. We also follow up with the developers tounderstand the reasons behind identified differences.1This chapter is submitted to an ACM SIGSOFT conference.5Figure 2.1: Overview of our methodology.2.1 MethodologyOur analysis is based on a mixed-methods research approach [27], where we col-lect and analyze both quantitative and qualitative data. We address the followingresearch questions in our study:RQ1. How prevalent are app-pairs? Do app-pairs exhibit the same characteristicsacross app-stores?RQ2. Why do some developers make theirs apps only available on one platform?RQ3. Do users perceive app-pairs equally across platforms?RQ4. Are the major user concerns or complaints the same across platforms?Figure 2.1 depicts our overall approach. We use this figure to illustrate ourmethodology throughout this section.2.1.1 Data CollectionTo collect Android and iOS apps along with their attributes (Box 1 in Figure 2.1),we use two open-source crawlers, namely Google Play Store Crawler [44] and Ap-ple Store Crawler [11] and mine apps from the two app stores, respectively. We6Table 2.1: Collected app-pair attributes# Apple; Google Description1 name; title Name of the app.2 developerName;developer nameName of the developer/company of theapp.3 description; description Indicates the description of the app.4 category; category Indicates the category of the app; 23Apple & 27* Google categories.5 isFree; free True if the app is free.6 price; price Price ($) of the app.7 ratingsAllVersions;ratingsAllVersionsNumber of users rating the app.8 starsVersionAllVersions;star ratingAverage of all stars (1 to 5) given to theapp.9 version; version string User-visible version string/number.10 updated; updated Date the app was last updated.*Google has its apps split into Games and Applications. We count Games as one category.only collect app attributes that are available on both stores. For instance, infor-mation about the number of downloads is only available for Android but not iOS,and thus, we ignore this attribute. 
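As a rough sketch of how the mined records can then be queried for analysis, the snippet below uses pymongo against the MongoDB store described above; the database and collection names are hypothetical assumptions, and the field names simply reuse the attribute names from Table 2.1 rather than the published dataset's actual schema.

    # Hypothetical sketch: querying the mined apps from MongoDB with pymongo.
    # Database and collection names are illustrative assumptions; field names
    # follow the attribute names listed in Table 2.1.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    db = client["app_mining"]

    print("Android apps:", db["android_apps"].count_documents({}))
    print("iOS apps:", db["ios_apps"].count_documents({}))

    # Keep only the attributes available on both stores (cf. Table 2.1),
    # e.g. dropping the Android-only download count.
    shared = {"name": 1, "developerName": 1, "category": 1, "price": 1,
              "ratingsAllVersions": 1, "starsVersionAllVersions": 1,
              "version": 1, "updated": 1, "_id": 0}
    for app in db["ios_apps"].find({}, shared).limit(3):
        print(app)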
Table 2.1 outlines the list of attributes we col-lect. This mining step results in 1.4 million Android apps and 1 million iOS apps.Data collection was conducted between Sep–Nov 2015 and the data was stored ina MongoDB database, which takes up approximately 2.1GB of storage [80].2.1.2 Matching Apps to Find App-PairsAfter creating the Android and iOS datasets separately, we set out to find app-pairsby matching similar apps in the two datasets. The unique IDs for iOS and Androidapps are different and thus cannot be used to match apps, i.e., Android apps havean application ID composed of characters while iOS apps have a unique 8 digitnumber. However, app names are generally consistent across the platforms sincethey are often built by the same company/developer. Thus, we use app name anddeveloper name to automatically search for app-pairs. This approach could resultin multiple possible matches because (1) on one platform, developers may developclose variants of their apps with extra features that have similar names (See Fig-ure 2.2); (2) the same app could have slightly different names across the platforms(See Figure 2.3–a); (3) the same app could have slightly different developer namesacross the platforms (See Figure 2.3–b).7Clustering per platformTo find app-pairs more accurately, we first cluster the apps on each platform. Thisstep (outlined in Box 2 of Figure 2.1) groups together apps on each platform thatbelong to the same category, have similar app names (i.e., having the exact rootword, but allowing permutations) and the same developer name. Figure 2.2 is anexample of a detected Android cluster. The apps in this cluster are all developed byiGold Technologies, belong to the Game category and have similar (but not exact)names.Figure 2.2: Android Cluster for Swiped app.We execute a clustering algorithm on the Android and iOS datasets, separately.The algorithm takes as input a collection of apps and annotates the collection togroup the apps together. For each app, we extract the app name, developer name,and category. Next, if an app has not been annotated previously, we annotate it witha unique clusterID. Then we search for apps in the collection that have a similarname, exact developer name, and belong to the same category. If a match is found,we annotate the found app with the same clusterID.Detecting App-PairsWe consider an app-pair to consist of the iOS version and the Android version ofthe same app. In our attempt to find app-pairs (Box 3 in Figure 2.1), we noticed thatAndroid and iOS apps have different naming conventions for app names and de-veloper names. For instance, Figure 2.3–a depicts an app developed by ‘Groupon,Inc.’, with different naming conventions for app names; ‘Groupon - Daily Deals,Coupons’ on the Android platform whereas ‘Groupon - Deals, Coupons & Shop-ping: Local Restaurants, Hotels, Beauty & Spa’ on the iOS platform. Similarly,8Figure 2.3–b shows the ‘Scribblenauts Remix’ app, which has the exact name onboth platforms, but has differences in the developer’s name.(a) (b)Figure 2.3: a) Groupon and b) Scribblenauts apps. Android apps are shown on the top and iOS apps at thebottom.Figure 2.4 shows the app-pairs we find using matching criteria with differentconstraints. 
Criteria E looks for app-pairs having exact app and developer namewhereas Criteria S relaxes both the app and developer name, thus matching theapps in Figure 2.3 as app-pairs.ID App-pair CriteriaE EXACT(AppName) & EXACT(DevName)S SIMILAR(AppName) & SIMILAR(DevName)Figure 2.4: Matching App-pair Criteria.To find app-pairs, we match the Android clusters with their iOS counterparts.First, we narrow down the search for a matching cluster by only retrieving thosewith a similar developer name. This results in one or more possible matchingclusters and we identify the best match by comparing the items in each cluster.Thus, for each app in the Android cluster, we look for an exact match (criteria E)in the iOS cluster. If no match is found, we relax the criteria and look for matches9having a similar app and developer name (criteria S). The set of all possible app-pairs is a superset of S, and S is a superset of E, as depicted in the Venn diagram ofFigure 2.4.Exact App-PairsWe perform the rest of our study using criteria E, which provides a large-enoughset of exactly matched app-pairs needed for our analysis. To validate whether cri-teria E correctly matches app-pairs, the first two authors manually compared appnames, descriptions, developers’ names, app icons and screenshots of 100 ran-domly selected app-pairs and the results indicated that there are no false positives.This is, however, no surprise given the strict criteria defined in E.2.1.3 App-store Attribute AnalysisTo address RQ1, (Box 4 in Figure 2.1) we compare the captured attributes betweenthe iOS and Android app-pairs and present the results in Section 4.3.To address RQ2, we use the iTunes Store RSS Feed Generator [63] to retrievethe top rated apps, which enables us to create custom RSS feeds by specifying feedlength, genres, country, and types of the apps to be retrieved. These feeds reflectthe latest data in the Apple app store. The Google Play store provides the list oftop rated Android apps [61] as well. We collected the top 100 free and 100 paidiOS apps belonging to all genres, as well as top 100 free and 100 paid Androidapps belonging to all categories (Box 5 in Figure 2.1). To check whether a topapp exists on both platforms, we apply our exact app-pair technique as describedin the previous section. Since the lists were not long, we also manually validatedthe pairs using the app name, developer name, description and screenshots.2.1.4 User ReviewsIn addition to collecting app-store attributes for our app-pairs in RQ1, we analyzeuser reviews of app-pairs to see if there are any discrepancies in the way usersexperience the same app on two different platforms (RQ4).To that end, we first select 2,000 app-pairs that have more than 500 ratings,from our app-pair dataset. This allows us to target the most popular apps with10enough user reviews to conduct a thorough analysis. To retrieve the user reviews,we use two open-source scrapers, namely the iTunes App Store Review Scraper[62] and the Goole Play Store Review Scraper [45]. In total, we retrieve 1.7M userreviews for the 2K app-pairs.The goal is to semi-automatically classify the user reviews of the app-pairsand compare them at the app and platform level. To achieve this, we use naturallanguage processing and machine learning to train two classifiers (Box 6 in Fig-ure 2.1). Each classifier can automatically put a review into one of its three classes.Generic Feedback Analysis. 
As shown in Table 2.2, our generic feedback clas-sifier (C1) has three unique class labels {Problem Discovery, Feature Request,Non-informative}; where Problem Discovery implies that the user review pertainsto a functional (bug), or non-functional (e.g., performance), or an unexpected is-sue with the app. Feature Request indicates that the review contains suggestions,improvements, requests to add/modify/bring back/remove features. Finally, Non-informative means that the review is not a constructive or useful feedback; suchreviews typically contain user emotional expressions (e.g., ‘I love this app’, de-scriptions (e.g., features, actions) or general comments. We have adopted theseclasses from recent studies [22, 88] and slightly adapted them to fit our analysis ofuser complaints and feedback across the two platforms.Sentiment Analysis. Additionally, we are interested in comparing the sentiment(C2 in Table 2.2) classes of {Positive, Negative, Neutral} between the reviews ofapp-pairs. We use these rules to assign class labels to review instances. Table 2.2provides real review examples of the classes in our classifiers.Labelling ReviewsSince labelling is a tedious and time-consuming task, we constrain the number ofapp-pairs and reviews to manually label. We randomly selected 1,050 Android userreviews and 1,050 iOS user reviews from 14 app-pairs. These app-pairs were inthe list of the most popular apps and categories in their app stores. The manuallabeling of reviews was first conducted by one author following the classificationrules inferred in Table 2.2. Subsequently, any uncertainties were cross-validatedand resolved through discussions and refinements between the authors. Overall,11Table 2.2: Real-world reviews and their classifications.C1 – Generic Feedback Classifier1 Problem Discovery: “Videos don’t work. The sound is workingbut the video is just a black screen.”2 Feature Request: “I would give it a 5 if there were a way to excludechain restaurants from dining options.”3 Non-informative: “A far cry from Photoshop on the desktop, butstill a handy photo editor for mobile devices with...”C2 – Sentiment Classifier1 Positive: “Amazing and works exactly how I want it to work. Noth-ing bad about this awesome and amazing app!”2 Negative: “The worst, I downloaded it with quite a lot of excitementbut ended up very disappointed”3 Neutral: “No complaints because I’m not a complainer save youroption for something that matters”we label 2.1K reviews for training each of the two classifiers (Box 7 in Figure 2.1).Building ClassifiersTo build our classifiers, we use the bags of words representation, which countsthe number of occurrences of each word to turn the textual content into numeri-cal feature vectors. Next, we preprocess the text, tokenize it and filter stop words.We use the feature vectors to train our classifier and apply a machine learning al-gorithm on the historical training data. In this work, we experimented with twowell-known and representative semi-supervised algorithms, Naive Bayes (NB) andSupport Vector Machines (SVM). We use the Scikit Learn Tool [93] to build ourclassifiers. The training and testing data for our classifiers were randomly com-posed of 1,575 and 525 of the manually labelled reviews, respectively. We repeatedthis trial 25 times to train both our generic and sentiment classifiers and comparedthe NB and SVM algorithms. 
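To make this pipeline concrete, the sketch below shows one way such a classifier can be assembled with Scikit Learn; the toy reviews, labels, and class abbreviations are illustrative only and are not the 2.1K manually labelled reviews used in the study.

    # Illustrative sketch of the bag-of-words + SVM pipeline described above;
    # the toy reviews and labels below are examples, not the study's data.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.pipeline import Pipeline
    from sklearn.svm import LinearSVC

    reviews = [
        "Videos don't work, the screen stays black",        # Problem Discovery
        "The app crashes every time I open it",              # Problem Discovery
        "Please add a way to exclude chain restaurants",     # Feature Request
        "Would love an option to sort results by distance",  # Feature Request
        "I love this app",                                    # Non-informative
        "Handy little app, nothing else to say",              # Non-informative
    ]
    labels = ["PD", "PD", "FR", "FR", "NI", "NI"]

    # Tokenization, stop-word filtering, and word counting (the bag-of-words
    # feature vectors) are handled by the vectorizer; the SVM is trained on
    # the resulting vectors.
    classifier = Pipeline([
        ("bow", CountVectorizer(stop_words="english")),
        ("svm", LinearSVC()),
    ])
    classifier.fit(reviews, labels)

    # Assign one of the three classes to an unseen review.
    print(classifier.predict(["The video freezes and the sound cuts out"]))

The Naive Bayes variant can be compared by swapping LinearSVC for MultinomialNB (from sklearn.naive_bayes) in the same pipeline.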
We choose the generic (C1) and sentiment (C2) classifiers with the best F-measures. We use the trained classifiers to classify ∼1.7M reviews of the 2K app-pairs.

2.1.5 User-Perceived Rating

There are multiple ways to measure how end-users perceive an app. For example, the number of downloads can be an indication of the popularity of an app. However, as discussed by Tian et al. [100], many users download an app without ever using it. More importantly, as explained in Section 2.1.1, Apple does not publish the download count for iOS apps, which means we cannot use this metric in our cross-platform study.

Another method is to measure the sentiment of user reviews through NLP techniques. Such techniques, however, lack the required accuracy for measuring success [50, 99].

The star rating of an app, which is the average rating of an app (between 1–5), has been used in many studies to measure an app's success rate [14, 49, 53, 100]. However, relying only on the average star rating of an app might be misleading since it does not take into account the number of ratings the app receives. For instance, the Facebook app on the Google Play store currently has an average star rating of 4 with over 40 million ratings. On the other hand, OneRepMaxCalculator currently has an average star rating of 5, but only seven ratings. Despite having a lower star rating, logically the Facebook app is better perceived because it has many more ratings. To mitigate this issue, we combine the average star rating with the number of ratings to measure the Aggregated User-perceived Rating (AUR) (Box 8 in Figure 2.1) of an app as follows:

AUR(app_i) = (v_i × r_i) / (v_i + m) + (m × c) / (v_i + m)    (2.1)

where
1. v_i is the number of ratings for app_i,
2. r_i is the average stars for app_i,
3. m is the average number of ratings (for all apps in the dataset),
4. c is the average number of stars (for all apps in the dataset).

If an app does not have enough ratings (i.e., fewer than m), we cannot place much trust in the few ratings to accurately measure the aggregate rating, and thus the formula penalizes it by bringing in the average values of m ratings and c stars. We were inspired by the movie ranking algorithm [60] of the Internet Movie Database (IMDB), which uses user votes to generate the top 250 movies. The formula is believed to provide a true Bayesian estimate [35, 60].

Table 2.3: Ranking apps using different metrics.
App  Ratings (R)  Stars (S)  Rank S  Rank R  Rank AUR
A    1            5.0        1       6       4
B    4            4.8        2       5       3
C    1825         4.7        3       2       1
D    11           2.1        5       4       5
E    67           4.6        4       3       2
F    2796         1.8        6       1       6

AUR provides a number between [1–5], which we convert into a percentage to better represent the results. In practice, some apps have no ratings and no stars. In our work, we require that an app must have at least one rating to be included in the analysis.

To illustrate the need for combining ratings and stars, and to evaluate our proposed AUR metric, we randomly selected 100 apps from our dataset and ranked them based on different metrics. The average ratings (m) and stars (c) across the 100 apps were 142 and 4.1, respectively. Table 2.3 presents the rankings for six of the apps based on the stars, the ratings, and AUR. Using only the stars ranks app A first although it has only a single rating. Using only the ratings would rank app F first although it has only 1.8 stars. Our proposed metric, AUR, ranks C first, because it has many ratings (1825) and relatively high stars (4.7).
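As an illustration of how formula 2.1 produces this ranking, the following minimal sketch (not the authors' actual script) recomputes the AUR scores for the six example apps in Table 2.3, using the reported sample averages m = 142 and c = 4.1.

    # Sketch of the AUR metric (formula 2.1); m and c follow the 100-app
    # sample described above, not the full-dataset values.
    def aur(v_i, r_i, m=142, c=4.1):
        """Aggregated User-perceived Rating on the 1-5 scale."""
        return (v_i * r_i) / (v_i + m) + (m * c) / (v_i + m)

    # (Ratings, Stars) for the six example apps in Table 2.3.
    apps = {"A": (1, 5.0), "B": (4, 4.8), "C": (1825, 4.7),
            "D": (11, 2.1), "E": (67, 4.6), "F": (2796, 1.8)}

    # Sorting by AUR reproduces the "Rank AUR" column: C, E, B, A, D, F.
    for name, (v, r) in sorted(apps.items(), key=lambda kv: -aur(*kv[1])):
        print(name, round(aur(v, r), 2))

Apps with few ratings (A, B, D) are pulled toward the sample average of 4.1 stars, while apps with many ratings (C, F) keep scores close to their own stars; the 1–5 score is then rescaled to a percentage as described above (the exact rescaling is not spelled out in the formula and is therefore left out of the sketch).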
It ranks F last,which has many ratings but the lowest stars.2.1.6 Cross-platform Complaint AnalysisThe goal in RQ4 is to understand the nature of user complaints and how they differon the two platforms (Box 9 in Figure 2.1). To address this, we first collect theProblem Discovery reviews for 20 app-pairs having (1) the biggest differences inAUR rates between the platforms, and (2) over 100 problematic reviews. These20 app-pairs are split into 10 in which Android has a higher AUR than iOS and10 in which iOS has a higher AUR than Android. Then, we manually inspect andlabel 1K problematic reviews (Box 10 in Figure 2.1), by randomly selecting 25Android user reviews and 25 iOS user reviews from each of the 20 app-pairs. Wenoticed that user complaints usually fall into the following five subcategories: (1)Critical: issues related to crashes and freezes; (2) Post Update: problems occurringafter an update/upgrade; (3) Price Complaints: issues related to app prices; (4) App14Features: issues related to functionality of a feature, or its compatibility, usability,security, or performance; (5) Other: irrelevant comments.We use the labelled dataset to build a complaints classifier to automaticallyclassify ∼350K problematic reviews of our 2K app-pairs.2.1.7 Datasets and ClassifiersAll our extracted data, datasets for the identified app-pairs and the 2K app-pairsalong with their extracted user reviews, as well as all our scripts and classifiers arepublicly available [80].2.2 FindingsIn this section, we present the results of our study for each research question.2.2.1 Prevalence and Attributes (RQ1)We found 1,048,575 (∼1M) Android clusters for 1,402,894 (∼1.4M) Android appsand 935,765 (∼0.9M) iOS clusters for 980,588 (∼1M) iOS apps in our dataset. Thelargest Android cluster contains 219 apps2 and the largest iOS cluster contains 65apps.3 Additionally, 7,845 Android and 9,016 iOS clusters have more than oneitem. The first row of Table 2.4 shows descriptive statistics along with p-value(Mann-Whitney) for cluster sizes, ignoring clusters of size 1. Figure 2.5 depictsthe cluster sizes for the two platforms. We ignore outliers for legibility. The resultsare statistically significant (p < 0.05) and show that while Android clusters deviatemore than iOS clusters, the median in iOS is higher than Android by one. Thiscould be explained perhaps by the following two observations: (1) not all iOS appsare universal apps (i.e., run on all iOS devices) and some apps have both iPhone-only and iPad-only apps instead of one universal app; (2) iOS has more free andpro versions of the same app than Android.2 https://play.google.com/store/search?q=Kira-Kira&c=apps&hl=en3 https://itunes.apple.com/us/developer/urban-fox-production-llc/id39569678815AND iOS2.02.53.03.54.04.55.0Cluster Size (# Apps in Clusters)Figure 2.5: Clusters.Prevalence of app-pairsWe found 80,169 (∼80,000 ) exact app-pairs (Criteria E in Figure 2.4), which is8% of the total iOS apps, and 5.7% of the total Android apps in our datasets.When we relax both app and developer names, the number of app-pairs increasesto 116,326 (∼117K ) app-pairs, which is 13% of our iOS collection and 9.2% ofour Android collection. 
While our dataset contains apps from 22 Apple and 25 Google categories, most of the pairs belong to popular categories, which exist on both platforms: {Games, Business, Lifestyle, Education, Travel, Entertainment, Music, Finance, Sports}.

Finding 1: Our results indicate that a large portion of apps (87–95%) are developed for one particular platform only.

Ratings & Stars

Interestingly, 68% of Android and only 18% of iOS apps have ratings. The median is 0 for all iOS and 3 for all Android apps, as depicted in Table 2.4. However, when we only consider apps with at least one rating, the median increases to 21 for iOS and 11 for Android (see Figure 2.6). We ignore outliers for legibility. Furthermore, we compare the differences between ratings for each pair. In 63% of the pairs, Android apps have more users rating them (on average 4,821 more users) whereas in only 5% of the pairs, iOS apps have more users rating them (on average 1,966 more users). Additionally, the results of ratings in Table 2.4 are statistically significant (p < 0.05), indicating that Android users tend to rate apps more than iOS users. The categories with the highest ratings were {Personalization, Communication, Games} on Android and {Games, Social Networking, Photo & Video} on iOS.

Table 2.4: iOS & AND descriptive statistics: Cluster Size (C), Ratings (R), Stars (S), and Price (P).
ID  Type  Min   Mean        Median  SD          Max         P-Val
C   iOS   2     3.30        3.00    2.11        65          0
C   AND   2     3.00        2.00    3.69        219
R   iOS   5     1,935.00    21.00   26,827.24   1,710,251   0
R   AND   1     4,892.00    11.00   171,362.40  28,056,146
R*  iOS   0     353.10      0.00    11,483.19   1,710,251   0
R*  AND   0     3,302.00    3.00    140,807.60  28,056,146
S   iOS   1     3.80        4.00    0.90        5           0
S   AND   1     4.04        4.10    0.82        5
S*  iOS   0     0.70        0.00    1.52        5           0
S*  AND   0     2.73        3.70    2.01        5
P   iOS   0.99  3.58        1.99    9.73        500         0
P   AND   0.95  4.00        2.11    9.81        210
*Including apps that have no ratings/stars/prices (i.e., all apps).

Similarly, 68% of Android and 18% of iOS apps have stars. When we consider the apps with stars, the median increases to 4 for iOS and 4.1 for Android (see Figure 2.7). Comparing the differences between the stars for each pair, in 58% of the pairs, Android apps have more stars while in only 8% of the pairs, iOS apps have more stars. Additionally, while the results of stars are statistically significant (p < 0.05), the observed differences, having almost the same medians (see Table 2.4), are not indicative, meaning that although Android users tend to star apps more than iOS users, the starred app-pairs have similar stars. The categories with the highest number of stars were {Weather, Music & Audio, Comics} on Android and {Games, Weather, Photo & Video} on iOS.

Finding 2: Android users tend to rate apps more than iOS users.

[Figure 2.6: Ratings — boxplots of the number of people who rate, AND versus iOS.]
[Figure 2.7: Stars — boxplots of stars, AND versus iOS.]

Prices of app-pairs

Ideally, the same app should have the same price on different platforms. The general belief is that developers price their iOS apps higher than Android apps. We explored to what extent this is true.

Our results show that 88% of app-pairs have different prices for their Android versus iOS versions. Comparing the rate of free and paid apps, 10% of the Android and 12% of iOS apps are paid. In 34% of the pairs, iOS apps have a higher price whereas in 54% of the pairs, Android apps have a higher price. As Table 2.4 shows, the mean and median for paid apps are slightly higher for Android compared to iOS.
The categories with the most expensive apps were {Medical, Books& Reference, Education} on Android and {Medical, Books, Navigation} on iOS.Finding 3: Our results indicate that while more Android apps are free, the paidAndroid apps have slightly higher prices than their iOS counterparts.For some of the app-pairs, the price differences is huge, as depicted in Fig-ure 2.8.To understand the reasons behind these differences, we sent emails to all thedevelopers of app-pairs with price differences of more than $10 (US) and askedwhy their app-pairs were priced differently on the two platforms. Out of 52 emailssent, we received 25 responses and categorized the main reasons:Different monetization strategies per app store. For instance, “the difference isthat the Android version includes consolidation ($9.99), charting ($14.99),reports ($9.99) and rosters ($14.99), whereas these are ‘in app purchase’options on Apple devices.”18llllllll l ll lllllll ll ll l llll l llll ll llll lllll l lll ll l lllll llll l ll lllllll lll l llllllll l l lll l ll lll ll l l lll l ll lll ll ll l l ll ll lll lll llll l llll llll l ll ll llllll ll l l l l ll l l lll lllllll l l0 50 100 150 2000100200300400500AND Price ($)iOS Price ($)Figure 2.8: Prices.Different set of features on the two platforms: “the iOS version offers more fea-tures than the Android version.”Development/maintenance costs of the app: one respondent said “the effort tomaintain an App on iOS is much higher than on Android”, while anotherstated “Android is relatively expensive and painful to create for and muchharder to maintain and support.” It is interesting to see that developershave different, even conflicting, perspectives of the difficulties involved inthe development and maintenance of apps for each platform.Exchange rate differences e.g., “price in both are set to 99 EUR as we are mainlyselling this in Europe. Play Store apparently still used USD converted by theexchange rate of the day the app was published.”We have to note that some of the developers we contacted were unaware of theprice differences.Versions and last updatedWhile the app stores’ guidelines suggest that developers follow typical softwareversioning conventions such as semantic versioning4 — in the form of (major.minor.patch)4 http://semver.org19— they do not enforce any scheme. Therefore, mobile apps exhibit a wide vari-ety of versioning formats containing letters and numbers, e.g., date-based schemes(year.month.patch). Our data indicate that only 25% of the app-pairs have identicalversions. When we inspect the major digit only, 78% of the pairs have the sameversion. 13% of the Android apps have a higher version compared to 9% of theiOS apps that have a higher version.Comparing the date the apps were last updated, 58% of the app-pairs havean iOS update date which is more recent than Android; while 42% have a morerecent Android update date. Interestingly, 30% of the app-pairs have update dateswhich are more than 6 months apart. To understand why developers update theirapps inconsistently across the platforms, we emailed all the developers of app-pairswhich were recently updated (after January 2016) on either platform; and in whichthe other platform has not been updated in 80 days or more. Out of 65 emails, wereceived 15 responses and categorized the reasons below:Ease of releasing updates e.g., “we are experimenting with a new 3D printingfeature, and wanted to try it on Android before we released it on iOS. 
As youknow, developers can release updates quickly to fix any problems on Android,but on iOS, we have to wait a week or two while Apple reviews the game.”Preferring one platform over the other for various reasons, e.g., “while thereare many Android handsets and potentially many downloads, this doesn’ttranslate well to dollars spent, relative to iOS.”Developer skills and expertise The developers might be more skilled at buildingapps for one of the platforms than the other; e.g., “I learned iOS first and amdeveloping for iOS full time, so everything is easier for me with iOS.”Update due to a platform-specific feature e.g., “only updated the iOS version toswitch over to AdMob as the advertising network for iOS. Apple announcedthat iAd is being discontinued.”We have to note that, similar to the reasons behind the price differences, someof the developers we contacted, mentioned that the development/maintenance costof the app could affect updates on either platform.20Finding 4: Our results indicate that the majority of cross-platform apps are notconsistently released. Only one in every four app-pairs has identical versionsacross platforms and 30% of the app-pairs have update dates which are morethan 6 months apart.2.2.2 Top Rated Apps (RQ2)Interestingly, our analysis on the top 100 free iOS and Android apps shows that88% of the top iOS and 86% of the top Android apps have pairs. 37 app-pairs arein the top 100 list for both platforms. On the other hand, for the top 100 paid iOSand Android apps, 66% of the top iOS and 79% of the top Android apps have pairs.30 of the paid pairs are in the top 100 for both platforms.To understand why some developers of successful apps only develop for oneplatform, we sent emails to all the developers of apps with no pairs. Out of 81emails sent, we received 29 responses and categorized the main reasons below:Lack of resources: “building the same app across two platforms is actually twicethe work given we can’t share code ... so we’d rather make a really goodapp for one platform than make a mediocre one for two.”Platform restrictions: “I’ve only focused on the Android platform simply becauseApple doesn’t allow for much customization to their UI.”Revenue per platform: “In my experience, iOS users spend more money, whichmeans a premium [paid app with a price higher than 2.99] is more likely tosucceed. ... while the Android platform has the market size, it proves to beharder for small [companies] to make good money.”Fragmentation within a platform: “my app is very CPU intensive and thus, Imust test it on every model. With a much-limited number of models for iOS,it’s feasible. On Android, it’s impossible to test on every model and qualitywould thus suffer.”Similar apps already exist on the other platform: “Apple devices already havea default podcast app.”21A common response from developers was that the app for the other platform isunder development.Finding 5: More than 80% of the top-rated apps are cross-platform.2.2.3 Aggregated User-Perceived Ratings (RQ3)Figure 2.9 shows the AUR rates for our app-pairs, computed using formulae 2.1.Pairs of triangular and square points represent an app-pair. We only keep app-pairsthat contained at least 1 Android and 1 iOS rating; this reduced the number of app-pairs to 14,000. The average number of ratings (m) across the Android apps was18199 and the average number of stars (c) was 3.9. For iOS, m was 1979 ratingsand c was 3.8 stars. The app-pairs are sorted based on the difference in their AURrates on the two platforms. 
The far ends of the figure indicate apps that are ratedhigher on one of the two platforms.The results indicate that in 95% of the app-pairs, the Android version is per-ceived better by users. Figure 2.10 shows the AUR rates for the app-pairs; but nowwith m and c set as the averages across all the Android and iOS apps combined.The averages for the ratings and stars were 10,089 and 3.8 respectively. Usingthese values for m and c results in 59% of the Android apps being perceived bettercompared with their iOS counterparts.Finding 6: The Android versions of cross-platform apps receive higher user-perceived ratings compared to the iOS versions.The method used to implement an app-pair might affect how its perceived byend-users. To explore this, we randomly selected and downloaded 30 app-pairswith similar AUR scores (within 5% range). We found that eight of them were im-plemented using a hybrid approach. The hybrid approach uses web technologiessuch as HTML, CSS, and Javascript to build mobile apps that can run acrossplatforms. We also analyzed 30 app-pairs that had a higher AUR on iOS than An-droid and 30 app-pairs with higher AUR on Android (i.e., with differences greaterthan 20%). We found only four in each set used the hybrid approach. In total, wefound 16 hybrid apps, which represents 17.7% of 90 app-pairs we inspected. Thisresult is in line with previous studies [103], which found that 15% of Android apps2230	40	50	60	70	80	90	100	0	 2	 4	 6	 8	 10	 12	 14	AUR	App-pairs	Thousands	ios-AUR	 android-AUR	 Poly.	(android-AUR)	 Poly.	(ios-AUR)	Figure 2.9: AUR scores calculated per platform.30	40	50	60	70	80	90	100	0	 2	 4	 6	 8	 10	 12	 14	AUR	App-pairs	Thousands	ios-AUR	 android-AUR	 Poly.	(ios-AUR)	 Poly.	(android-AUR)	Figure 2.10: AUR scores calculated across the platforms.are developed using a hybrid approach. Our analysis indicates that hybrid appsare usually equally rated by users on the platforms, which is not surprising as theyhave the same implementation on the two platforms.Furthermore, to understand why an app-pair is perceived differently on eachplatform, we sent emails to all the developers of app-pairs which had a difference23of more than 30% in their AUR scores. We asked if they have noticed the differ-ence and possible reasons that their two apps are not equally rated across platforms.Out of 200 sent emails, we received 20 responses. All the respondents agreed withour findings and were aware of the differences; for example, one developer said:“our app was by far more successful on iOS than on Android (about a milliondownloads on iOS and 5k on Android).” The reasons given were as follows. Tim-ing (release/update) and first impressions were thought to make a big difference inhow users perceive an app. The variation in ratings across platforms can also beattributed to the degree at which developers provide support on either side. Addi-tionally, app store support and promotional opportunities were mentioned to helpdevelopers, e.g., “Apple ... promote your work if they find it of good quality, thishappened to us 4–5 times and this makes a big difference indeed”. Furthermore,some respondents find the Google Play’s quick review process helpful to releasebug fixes and updates quickly.2.2.4 Complaints Across Platforms (RQ4)Classification. To evaluate the accuracy of the classifiers, we measured the F-measure for the Naive Bayes and SVM algorithms, listed in Table 2.5, wherePrecision = TPTP+FP and Recall =TPTP+FN . We found that SVM achieves a higherF-measure. 
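For reference, these are the standard definitions (TP, FP, and FN denote true positives, false positives, and false negatives), and the F-measure is conventionally their harmonic mean:

Precision = TP / (TP + FP),  Recall = TP / (TP + FN),  F = 2 × (Precision × Recall) / (Precision + Recall)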
On average, F(SVM) = 0.84 for the generic classifier and F(SVM) = 0.74 for the sentiment classifier. The F-measures obtained by our classifiers are similar to previous studies such as Panichella et al. [88] (0.72) and Chen et al. [22] (0.79). We selected the classifiers with the best F-measures and used them to classify 1,702,100 (∼1.7M) reviews for 2,003 (∼2K) app-pairs.

Table 2.5: Statistics of 14 Apps used to build the classifiers (C1 = Generic Classifier, C2 = Sentiment Classifier, NB = Naive Bayes Algorithm, SVM = Support Vector Machines Algorithm)
#  App             Google Category  Apple Category     F(C1-NB)  F(C2-NB)  F(C1-SVM)  F(C2-SVM)
1  FruitNinja      Game (Arcade)    Game               0.77      0.68      0.83       0.75
2  UPSMobile       Business         Business           0.80      0.69      0.82       0.76
3  Starbucks       Lifestyle        Food & Drink       0.75      0.63      0.84       0.77
4  YellowPages     Travel & Local   Travel             0.78      0.62      0.85       0.75
5  Vine            Social           Photo & Video      0.81      0.70      0.84       0.76
6  Twitter         Social           Social Networking  0.79      0.67      0.84       0.75
7  AdobePhotoShop  Photography      Photo & Video      0.82      0.72      0.85       0.75
...
Total / Average of 14 Apps                             0.77      0.65      0.84       0.74

Sentiment and Generic Reviews

Figure 2.11 plots the distribution of the rates for the main categories in the sentiment and generic classifiers for our app-pairs. Each dot represents an app-pair. The descriptive statistics are shown in Table 2.6. On average, Feature Request, Positive, and Negative reviews are more common among the iOS versions, whereas Problem Discovery, Non-informative, and Neutral reviews are more common among the Android versions of app-pairs. Further, we found that the average length of reviews on the iOS platform is larger, namely 103 characters versus 76 characters on the Android platform.

The goal in RQ4 was to understand the nature of user complaints and whether they differ on the two platforms.
[Figure 2.11: Android (x-axis) versus iOS (y-axis) scatter plots of the per-app-pair rates for Problem Discovery (PD), Feature Request (FR), Non-informative (NI), Positive (P), Negative (N), and Neutral (NL) reviews.]
Figure 2.11: The rates of the classifiers' categories for our 2K app-pairs, where each dot represents an app-pair (one scatter plot per category: PD, FR, NI, P, N, and NL; Android rate versus iOS rate).

Table 2.6: Descriptive statistics for iOS & AND reviews: Problem Discovery (PD), Feature Request (FR), Non-informative (NI), Positive (P), Negative (N), Neutral (NL), and AUR.

ID   Type  Min    Mean   Median  SD     Max    P-Val
PD   iOS   0.00   20.47  15.62   16.65  100.0  0.00
PD   AND   0.00   21.06  17.54   14.61  100.0
FR   iOS   0.00   17.50  16.03   10.81  100.0  0.00
FR   AND   0.00   13.71  12.50   8.88   100.0
NI   iOS   0.00   62.04  64.95   20.77  100.0  0.00
NI   AND   0.00   65.23  67.10   17.45  100.0
P    iOS   0.00   55.62  59.26   20.41  100.0  0.00
P    AND   0.00   49.74  51.36   17.64  100.0
N    iOS   0.00   9.80   6.66    10.07  100.0  0.00
N    AND   0.00   7.72   5.74    7.39   100.0
NL   iOS   0.00   34.57  32.45   14.87  100.0  0.00
NL   AND   0.00   42.54  41.73   13.97  100.0
AUR  iOS   38.22  76.21  76.03   3.06   99.75  0.27
AUR  AND   52.53  80.88  80.90   2.07   97.48

Table 2.7: Descriptive statistics for problematic reviews: App Feature (AF), Critical (CR), Post Update (PU), and Price Complaints (PC).

ID  Type  Min   Mean   Median  SD     Max    P-Val
AF  iOS   0.00  53.71  54.29   18.15  100.0  0.00
AF  AND   0.00  60.55  60.92   16.25  100.0
CR  iOS   0.00  23.72  21.05   16.40  100.0  0.00
CR  AND   0.00  19.98  17.65   13.66  100.0
PU  iOS   0.00  6.08   4.24    7.44   100.0  0.00
PU  AND   0.00  3.91   2.33    5.17   50.0
PC  iOS   0.00  7.76   5.00    9.41   100.0  0.00
PC  AND   0.00  6.70   4.54    8.20   100.0

Complaints

Our complaint classifier has, on average, an F-measure of F(SVM) = 0.7. We used the classifier to classify 350,324 (∼350K) problematic reviews for our 2K app-pairs.

The results, depicted in Figure 2.12 and Table 2.7, show that the complaints about the apps vary between the two platforms. On average, iOS apps have more critical and post update problems than their Android counterparts, which could be due to Apple regularly forcing developers to migrate their apps to their latest OS and SDK. Examples of iOS post update complaints include users unable to login, features no longer working, loss of information or data, and unresponsive or slow UI.
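The per-app-pair rates summarized in Table 2.7 and plotted in Figure 2.12 are straightforward aggregations of the classifier output. The following is a minimal sketch of that aggregation step, assuming the classified problematic reviews are held in a pandas DataFrame; the column names and toy rows are illustrative placeholders, not the study's actual data layout.

    import pandas as pd

    # One row per classified problematic review: app-pair id, platform, complaint label.
    classified = pd.DataFrame({
        "app":      ["A", "A", "A", "B", "B", "B"],
        "platform": ["ios", "ios", "android", "ios", "android", "android"],
        "label":    ["Critical", "Post Update", "App Feature",
                     "Price Complaints", "App Feature", "Critical"],
    })

    # Percentage of each app's problematic reviews (per platform) falling into each
    # complaint category, i.e. one dot per app-pair in Figure 2.12.
    rates = (classified.groupby(["app", "platform"])["label"]
                       .value_counts(normalize=True)
                       .mul(100)
                       .rename("rate")
                       .reset_index())
    print(rates)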
Figure 2.12: The rates of the complaint categories for our 2K app-pairs, where each dot represents an app-pair (one scatter plot per category: CR, PU, PC, and AF; Android rate versus iOS rate).

On the other hand, Android apps have more complaints related to features, which could be due to device fragmentation on Android. The wide array of Android devices running different versions of Android, with different screen sizes and different CPUs, can cause non-functional complaints related to security, performance or usability problems. These negative side-effects of fragmentation are discussed in other studies [32, 37, 52, 109].
Examples of Android complaints include dissatisfaction with a certain functionality, incompatibility with a certain device/OS, and network and connectivity problems.

Finding 7: The results indicate that, on average, iOS apps receive more critical and post update complaints while Android apps receive more complaints related to app features and non-functional properties.

2.3 Discussion

In this section, we discuss several implications of our study and the threats to validity of our results.

Implications

Our study helps to gain insights into the challenges faced by developers, such as inconsistencies that might arise due to different strategies for maintaining, releasing, and pricing apps across platforms. It can help app developers to understand why users of their apps might experience, complain about, or rate the same app differently across platforms, and to mitigate the differences.

Our results indicate that a large portion of apps (87–95%) are developed for one platform only. While both platforms are popular and equally important, Android has gained the majority of the attention from the software engineering research community by far. Our results suggest that apps from both the Apple Store and Google Play need to be included in future studies to have a more representative coverage.

More than 80% of the top-rated apps exist on both the Apple and Google Play app stores. As recently identified by Nagappan and Shihab [84], one of the obstacles with cross-platform analysis is the lack of a dataset for such apps. Our work provides the first large dataset with more than 80,000 exact app-pairs of iOS and Android apps [80]. This large dataset, which is now publicly available, can be leveraged by other researchers for further cross-platform analysis of mobile apps.

Our results show that end-users can perceive and rate cross-platform apps differently on each platform. This is especially true for native apps that are built with different languages and technologies. Hybrid apps are less susceptible to such user-perceived variations across platforms. Our empirical study presented in Chapter 3 shows that hybrid apps have less variance in terms of user perception across the platforms and can outperform native apps in terms of aggregated ratings; out of the 25 possible app store categories, the hybrid apps had better ratings in 18. Developers willing to achieve more consistency for their apps across platforms can benefit from creating hybrid apps.

Review Classification

There are many techniques available to classify textual user reviews. The goal of this work was not to develop a new classification technique that outperforms other techniques, but simply to compare the nature of reviews for the same apps on the two platforms. To this end, we surveyed the literature and chose the technique best suited to our needs while achieving a decent F-score. The Support Vector Machine (SVM) algorithm, along with the NLP features of the scikit-learn toolkit, was the best choice for our study and resulted in F-scores that are comparable to other similar studies [22, 88].

Threats to Validity

Our manual labelling of the reviews to train the classifiers could be a source of internal threat to validity. In order to mitigate this threat, uncertainties were cross-validated and resolved through discussions and refinements between the authors.

As shown in Figure 2.4, the app-pairs detected in our study are a subset of all possible app-pairs. Our study only considers exact matches for app-pairs, which means there exist app-pairs that are not included in our analysis.
For instance, an app named The Wonder Weeks (https://itunes.apple.com/app/the-wonder-weeks/id529815782?mt=8) on iOS has a pair on the Android platform with the name Baby Wonder Weeks Milestones (https://play.google.com/store/apps/details?id=org.twisevictory.apps&hl=en), but it is not included in our study. While our study has false negatives, our manual validation of 100 randomly selected app-pairs shows that there are no false positives. In terms of representativeness, we chose app-pairs from a large representative sample of popular mobile apps and categories. With respect to generalizability, iTunes and Google Play are the most popular systems currently, although apps in other app stores could have other characteristics. Regarding replication, all our data is publicly available [80], making the findings of our study reproducible.

2.4 Conclusions

In this chapter, we present the first large-scale study of cross-platform mobile app-pairs. We mined 80K iOS and Android pairs and compared their app-store attributes. We built three automated classifiers and classified 1.7M reviews to understand how user complaints and concerns vary across platforms. Additionally, we contacted app developers to understand some of the major differences in app-pair attributes such as prices, update frequencies, AUR rates, and top rated apps existing only on one platform.

Chapter 3
Mining and Characterizing Hybrid Apps

Summary
(This chapter appeared at the 1st International Workshop on App Market Analytics, WAMA 2016 [4].)

Mobile apps have grown tremendously over the past few years. To capitalize on this growth and to attract more users, implementing the same mobile app for different platforms has become a common industry practice. Building the same app natively for each platform is resource intensive and time consuming since every platform has different environments, languages and APIs. Cross Platform Tools (CPTs) address this challenge by allowing developers to use a common code-base to simultaneously create apps for multiple platforms. Apps created using these CPTs are called hybrid apps. We mine 15,512 hybrid apps and present the first study of its kind on such apps. We identify which CPTs these apps use and how users perceive them. Further, we compare the user-perceived ratings of hybrid apps to native apps of the same category. Finally, we compare the user-perceived ratings of the same hybrid app on the Android and iOS platforms.

3.1 Methodology

We address the following research questions in our study:

RQ1. How prevalent are hybrid apps and which cross-platform tool is widely used?
RQ2. Does the choice of a cross-platform tool influence how it is perceived by users?
RQ3. How do hybrid apps compare to native apps of the same category in terms of user-perceived ratings?
RQ4. Does using a cross-platform tool ensure the app is perceived similarly on multiple platforms?

We first describe how we identify hybrid apps and then explain the analysis steps we performed on those apps.

3.1.1 Data Collection

The first step in our work is to mine apps from the app stores. To this end, we used the dataset of Android and iOS app-pairs introduced in Chapter 2. The dataset contains the attributes of 80,000 app-pairs; it only contains the attributes of the apps and does not include the source code. We used a dataset of app-pairs since one of the main usages of CPTs is to generate the app for more than one platform; by looking at app-pairs, the chances of finding hybrid apps are much higher.
Additionally, having the app-pairs allows us to answer RQ4, which compares how a hybrid app is perceived on the iOS and Android platforms.

3.1.2 Finding Hybrid Apps

In order to determine if an app is hybrid, a manual approach can be used. Such an approach would involve installing the app on a device, exercising its functionality, and trying to infer from the user experience whether the app is hybrid. The manual approach is time consuming and subjective to the user's opinion, which can lead to many false positives. Furthermore, previous work [103] has quantified the number of hybrid apps and discovered that, out of 1.1 million Android apps, 129,800 were hybrid. However, that dataset of hybrid apps is not publicly available and hence we had to build our own. For this work, we provide a fully automated technique to detect hybrid apps with 100% accuracy. Our technique supports the detection of apps made using three CPTs, namely PhoneGap [3], Appcelerator Titanium [10] and Adobe Air [1]. We target these CPTs since previous work [103] has shown that they are the most popular CPTs for developing hybrid apps.

To identify a hybrid app, we download its Android application package file (APK), which is the file format used by the Android operating system to distribute and install application software and middleware. Since the dataset used in Section 3.1.1 does not include the APKs, our technique first attempts to find each APK by using the app's id to search through a dataset of 1.1 million Android apps [85], and downloads the APK if it is available. Every Android APK includes a file called "classes.dex", which contains the classes of the Android app compiled in the dex file format. We use an open source tool called android-classyshark [17] to decompile "classes.dex" into a readable format and then inspect it to check whether the app uses a CPT. To determine if an app is hybrid, we check its class contents for the following references: PhoneGap - "org.apache.cordova", Appcelerator Titanium - "org.appcelerator.titanium", Adobe Air - "com.adobe.air". This technique of inspecting "classes.dex" and looking for references to CPT usage allows us to identify hybrid apps with 100% accuracy. We chose to download the Android APK instead of the iOS application package file because such information is not publicly available for iOS apps. To validate that our technique correctly identifies hybrid apps, we manually compared app icons and screenshots between the iOS and Android versions of the apps and looked for clues, such as having the exact same UI layout across platforms or the use of non-native UI elements, to conclude that an app is hybrid. We sampled 100 random apps and all of them were indeed hybrid, indicating that there are no false positives.

Algorithm 1 summarizes our approach and is applied to the dataset from Section 3.1.1 to create a dataset of hybrid app-pairs. We use this dataset to answer RQ1; the results are presented in Section 3.2.
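As a concrete illustration of this check, the minimal Python sketch below scans a list of class names (for example, as exported from android-classyshark's view of classes.dex) for the three package prefixes listed above. The helper function, the example class list and the way the class listing is obtained are assumptions made for illustration, not the exact tooling pipeline used in the study; Algorithm 1 gives the full procedure in pseudocode.

    # Package prefixes taken from the detection rules described above.
    CPT_SIGNATURES = {
        "PhoneGap": "org.apache.cordova",
        "Titanium": "org.appcelerator.titanium",
        "Adobe Air": "com.adobe.air",
    }

    def detect_cpt(class_names):
        """Return the CPT whose package prefix appears in the app's class list, or None."""
        for cpt, package in CPT_SIGNATURES.items():
            if any(name.startswith(package) for name in class_names):
                return cpt
        return None  # no known CPT reference; the app is treated as native

    # Hypothetical class names dumped from one app's classes.dex.
    example = ["org.apache.cordova.CordovaActivity", "com.example.app.MainActivity"]
    print(detect_cpt(example))  # -> "PhoneGap"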
Algorithm 1: Identifying hybrid apps
input : Collection of Apps
output: Collection of Hybrid Apps
begin
    phoneGapApps ← []
    titaniumApps ← []
    adobeAirApps ← []
    foreach i = 0, i < COUNT(APPS), i++ do
        app ← APPS[i]
        appId ← app.id
        apk ← lookForApk(appId)
        classes ← classyShark(apk)
        if classes.contains("org.apache.cordova") then
            phoneGapApps.append(app)
        end
        if classes.contains("org.appcelerator.titanium") then
            titaniumApps.append(app)
        end
        if classes.contains("com.adobe.air") then
            adobeAirApps.append(app)
        end
    end
end

3.1.3 User-Perceived Rating

To measure how end-users perceive hybrid apps, we use the same AUR metric introduced in Section 2.1.5.

CPT vs App AUR. Since every CPT uses a different set of programming languages and techniques to generate hybrid apps, the goal of RQ2 is to analyze the relationship between the CPT used to develop an app and how it is perceived by users. We use the metric discussed earlier to measure the AUR of apps generated by each CPT and compare the results in Section 3.2.

Hybrid vs Native Apps. While hybrid apps have been increasing in popularity, native apps still dominate the marketplace due to their competitive advantage in terms of performance and supported features. The goal of RQ3 is to compare native apps and hybrid apps in terms of AUR. Since there is no way of directly comparing the apps, we compare the categories of apps with one another.

AUR Across Platforms. One of the main reasons for using a CPT is the convenience of generating an app for multiple platforms using a single code base. Furthermore, this ensures that the user experience is uniform across platforms. The final RQ examines whether these identically created apps are also identical in terms of their AUR. The dataset we use from Section 3.1.1 contains information about Google's Android and Apple's iOS app-pairs (the same app implemented for different platforms). We again use the formula discussed earlier to measure the AUR of hybrid apps across these platforms and present the comparisons in Section 3.2. Since we are performing cross-platform analysis, we require that the app has at least 1 rating on both platforms.

Table 3.1: Number of hybrid apps using different CPTs

CPT        # of Apps  % of Hybrid Apps
PhoneGap   10,562     68.0%
Titanium   2,881      18.5%
Adobe Air  2,069      13.5%
Total      15,512     100.0%

3.1.4 Dataset and Results

Our extracted dataset and results for the identified hybrid apps, as well as all our scripts, are available for download at [58].

3.2 Findings

In this section, we present the results of our study for each research question.

3.2.1 Prevalence and Popularity of CPTs (RQ1)

Out of the 80,000 apps that were inspected, our technique (described in Section 3.1.2) was able to find a total of 15,512 hybrid apps. As shown in Table 3.1, 10,562 hybrid apps used the PhoneGap CPT, 2,881 used Appcelerator Titanium, and 2,069 used Adobe Air. Furthermore, Figures 3.1–3.3 show the distribution of hybrid apps across the various categories in the Google Play store for each of the CPTs. The most popular categories for PhoneGap are business, lifestyle, travel & local, sports and education. For Titanium, the most popular categories are travel & local, lifestyle, finance, business and education. Finally, for Adobe Air, the most popular categories are games, education, business, lifestyle, and entertainment.
Looking at the number of paid vs free apps, we found that all the hybrid apps in our dataset were free, regardless of which CPT they used.

Ratings, Stars & Downloads

We found that 79% of all the hybrid apps in our dataset, 76% of the PhoneGap apps, 81% of the Titanium apps and 90% of the AdobeAir apps have at least one rating. As depicted in Table 3.2, the median for the number of ratings is 7 overall, 6 for PhoneGap, 7 for Titanium and 19 for Adobe Air.

Figure 3.1: Number of apps in each category created using the PhoneGap CPT.

As for the stars, the median is 4.20 overall, 4.3 for PhoneGap, 4.1 for Titanium and 3.9 for Adobe Air. 99% of the apps have been downloaded at least once. The median was 100 downloads overall, 100 for PhoneGap and Titanium and 500 for AdobeAir.

Finding 8: The PhoneGap CPT dominates the hybrid app market with a 68% share and is mainly being used to develop business, lifestyle, travel & local apps. Despite having the smallest market share at 13.5%, apps created using the AdobeAir CPT are downloaded and reviewed much more by users. The AdobeAir CPT is mainly used to develop games, and we attribute its popularity to this reason.

3.2.2 Effect of CPT on App's AUR (RQ2)

Table 3.3 shows the AUR across all hybrid apps and for each of the CPTs. In our analysis, we only keep apps that contained at least 1 rating; this reduced the number of apps to 9948. Overall, the median for AUR was 84%, 86% for PhoneGap, 82% for Titanium and 78% for AdobeAir. As can be seen in Figure 3.4, the PhoneGap CPT generates apps with a better AUR score, followed by Titanium and AdobeAir. The results are statistically significant, with the p-value (Mann-Whitney) being 0.00 for all comparisons between CPTs.

Figure 3.2: Number of apps in each category created using the Titanium CPT.

Table 3.2: Descriptive statistics for the hybrid apps: Ratings (R), Stars (S), and Downloads (D).

ID  Type      Min  Mean      Med     SD         Max
R   All       1    609.40    7.00    10428.74   530200
R   PhoneGap  1    182.00    6.00    3465.35    228200
R   Titanium  1    146.30    7.00    1487.72    44400
R   AdobeAir  1    3088.00   19.00   25792.13   530200
S   All       1    4.07      4.20    0.85       5
S   PhoneGap  1    4.13      4.30    0.85       5
S   Titanium  1    4.00      4.10    0.89       5
S   AdobeAir  1    3.88      3.90    0.75       5
D   All       1    11290.00  100.00  177195.30  10M
D   PhoneGap  1    6295.00   100.00  143812.30  10M
D   Titanium  1    4835.00   100.00  46610.65   1M
D   AdobeAir  1    41610.00  500.00  339959.60  10M

Finding 9: The PhoneGap CPT results in apps which are better perceived by users. Furthermore, our results indicate that a high number of downloads and ratings does not necessarily mean that an app is well received by users. This is evident by the results of the AdobeAir CPT, which had the highest number of ratings but resulted in apps with the lowest AUR.
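The pairwise significance test reported above can be reproduced with SciPy. The following is a minimal sketch assuming the per-app AUR values of each CPT are available as lists; the numbers shown are made-up placeholders, not data from the study.

    from itertools import combinations
    from scipy.stats import mannwhitneyu

    # Placeholder AUR values per CPT; in the study these are the per-app AUR scores.
    aur = {
        "PhoneGap": [86, 90, 78, 95, 70, 88],
        "Titanium": [82, 75, 80, 91, 66, 79],
        "AdobeAir": [78, 74, 81, 69, 77, 72],
    }

    # One Mann-Whitney U test per pair of CPTs, as in Section 3.2.2.
    for a, b in combinations(aur, 2):
        statistic, p_value = mannwhitneyu(aur[a], aur[b], alternative="two-sided")
        print(a, "vs", b, "p =", round(p_value, 4))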
Figure 3.3: Number of apps in each category created using the Adobe Air CPT.

Table 3.3: Descriptive statistics for the hybrid apps: AUR

ID   Type      Min    Mean   Median  SD     Max
AUR  All       20.00  81.48  84.00   17.11  100.00
AUR  PhoneGap  20.00  82.75  86.00   17.17  100.00
AUR  Titanium  20.00  80.14  82.00   17.86  100.00
AUR  AdobeAir  20.00  77.67  78.00   15.03  100.00

3.2.3 Hybrid vs Native (RQ3)

Figure 3.5 plots the AUR scores of various app categories for native and hybrid apps. To fairly compare the categories, we used an equal number of apps, distributed equally across the categories. The total number of hybrid apps was 9948 and an equal number of native apps was used. When exploring this RQ, we expected the native apps to be better perceived than the hybrid ones due to their advantage in terms of performance and supported native features. To our surprise, the hybrid apps were very close to the native ones in terms of AUR scores and even scored slightly higher in some of the categories. Out of the 25 possible categories for apps, the hybrid apps scored higher in 17 of them.

Figure 3.4: AUR rates for the apps overall and across each CPT.

Finding 10: The hybrid apps analyzed had AUR scores which matched, and in some categories were higher than, those of the native apps. This indicates that it is possible to create a successful hybrid app that is well perceived by users and that can compete with native variants.

3.2.4 AUR Across Platforms (RQ4)

Figure 3.6 plots the AUR scores of 1400 hybrid app-pairs. The results show that using a CPT to create identical hybrid apps for the iOS and Android platforms does not necessarily mean they will be equally perceived by users on both platforms. The far ends of the plot show apps that are better perceived on one platform but not the other.

3.3 Threats to Validity

The hybrid apps detected in our study are a subset of all the possible hybrid apps. Our study uses an existing dataset of 80,000 app-pairs to identify 15,512 hybrid apps. In terms of representativeness, our identified hybrid app dataset contains apps from all the categories available on the Google Play store. With respect to generalizability, iTunes and Google Play are the most popular systems currently, although apps in other app stores could have other characteristics. Regarding replication, all our data is publicly available [58], making the findings of our study reproducible.

Figure 3.5: Average AUR for app categories for native and hybrid apps.

3.4 Conclusions

In this chapter, we present the first study of hybrid mobile apps.
We mined 15,512 hybrid apps, identified the CPTs they used, and analyzed their attributes such as the number of ratings, stars and downloads. We indirectly compared the success of the CPTs by measuring how the apps are perceived by users. We compared the AUR of hybrid apps to native apps of the same category. Additionally, we compared the AUR of the same hybrid app on the Android and iOS platforms.

Figure 3.6: AUR scores for 1400 hybrid app-pairs. Each pair of diamond (iOS) and square (Android) dots represents an app. The solid and dashed lines show the trend of AUR across the apps.

Chapter 4
Related Work

Many studies have been conducted recently to mine and analyze app store content. Most studies, however, have focused on one platform only. Our work, on the other hand, aims at characterizing the differences in mobile app-pairs across two different platforms. To the best of our knowledge, this is the first work to report a large-scale study targeting iOS and Android app-pairs. This chapter presents an informal survey of the recent works in the mobile apps domain.

4.1 Cross Platform

Malavolta et al. [75] conducted a study to investigate hybrid mobile apps in the Google Play Store. To find hybrid apps, the authors downloaded the top 500 apps from each app category and developed a tool which inspects app binaries to determine if an app is hybrid. Using this technique they were able to identify 445 hybrid apps and found Apache Cordova [2] and Appcelerator Titanium [10] to be the most common cross-platform frameworks. They found jQuery, jQuery Mobile and Json2 to be the most common third-party web libraries, and found that some hybrid apps use MVC frameworks such as AngularJS [43] and Backbone [64]. Finally, they found that hybrid and native apps are rated similarly by end users. This work is the closest to our empirical study on hybrid apps. We expand on their work in a few ways. First, our dataset is 33 times larger and includes 15,512 hybrid apps. Second, we propose a novel metric which combines the ratings and stars to measure the aggregated user-perceived ratings. Third, since our dataset contains the Android and iOS versions of the apps, we are able to conduct cross-platform analysis.

Malavolta et al. [74] extended their work by mining and classifying user reviews to understand users' perspective of hybrid apps. Their results indicate that hybrid and native apps are balanced in terms of performance, with the exception of apps which require a closer interaction with the Android platform, in which case native apps outperformed hybrid ones. They found that hybrid apps are more prone to bugs, and they attribute that to the lack of testing frameworks for cross-platform tools and hybrid apps.

Heitkötter et al. [56] conducted an experiment to evaluate various cross-platform development approaches. The authors compared Web apps, PhoneGap apps [3], Titanium apps [10] and native apps by examining a set of proposed criteria such as licenses, costs, look & feel, supported platforms and application performance. The results of their experiment indicate that the Titanium framework provides the best solution if a native-like UI is desired and only the iOS and Android platforms need to be supported. However, if the UI requirements are flexible and more platforms need to be supported, PhoneGap is the better option.
The authors found that cross-platform frameworks are mature enough to develop apps that can compete with native ones.

Along the same lines, Ciman et al. [25] compared four cross-platform frameworks, MoSync [82], Titanium, jQuery Mobile [98] and PhoneGap, with an emphasis on the development of apps with animations. They developed an animation-intensive game using each of these four frameworks and found the Titanium framework to yield the best performance due to its native support for animations and transitions.

Angulo et al. [8] compared the user experience (UX) of two versions of the same app, one developed natively and one developed using the Titanium cross-platform framework. The authors conducted a user study with 37 participants and found that users were able to complete tasks slightly faster on the native version of the app. Furthermore, they found that 71% of the participants in the iOS Titanium version of the app agreed that it behaves like a typical iOS app, while 91% agreed that the Android Titanium version behaves like a native app. The authors measured user satisfaction with the System Usability Scale (SUS) [19] and found that the Titanium versions of the app scored 82.79% while the native version scored 89.12%.

In 2016, Willocx et al. [108] conducted a study to compare the performance of native apps and apps created using cross-platform frameworks. To compare the performance, the authors used an open source app called PropertyCross [26], which has a native implementation along with various implementations using cross-platform frameworks. Using the Android, iOS and Windows Phone native implementations along with ten cross-platform frameworks' implementations, the authors measured response times, CPU, memory, disk and battery usage. Their results indicate that JavaScript-based frameworks are the most CPU intensive and have the slowest launch times. Additionally, they found that the performance of a cross-platform app is strongly correlated with the targeted platform.

4.2 Reviews

Cen et al. [20] utilized user reviews to assess the security risk of mobile apps. They proposed a two-step approach, in which they first mine an app's user reviews and label each review into categories related to security/privacy. They use the labeled reviews to compute a risk score for each app and then rank the apps based on their risk scores. An evaluation on two datasets has shown that their technique was able to outperform other metrics for ranking app security risk.

Gao et al. [39] proposed AR-Tracker, a framework to mine user reviews without manual human labeling which also traces the changes of user reviews over time. They compared their tool to AR-Miner [23] and were able to achieve similar results.

Along the same lines, Gu et al. [48] proposed SUR-Miner, a framework to summarize app user reviews. The framework classifies reviews into five categories and then uses two interactive visualizations to present the results. The framework achieved an f-measure of 81%, and a developer survey showed that 88% agree with the tool's findings.

In 2015, Gomez et al. [40] conducted an empirical study to mine buggy apps by examining the correlation between the permissions an app uses and user reviews. The authors used the mined data to build app checkers which can predict whether an app could be buggy.

A tool, DIVERSE, was developed by Guzman et al. [51] to accept developer queries and retrieve related user reviews.
The authors conducted a controlled experiment and found that DIVERSE can help developers save time when analyzing user reviews and planning their next releases.

Khalid et al. [65] argued for the usefulness of app reviews for crowdsourcing by analyzing a crowdsourcing reference model. Their findings indicate that app reviews can be used for crowdsourcing and help solve a few problems such as feature requests, recommendations for developers and users, and error reporting.

Similarly, Maalej et al. [72] proposed several techniques to classify app reviews into four types: bug reports, feature requests, user experiences, and ratings. Using multiple binary classifiers, as opposed to a single multiclass classifier, the authors were able to achieve a precision and recall for all four classes ranging from 71% to 97%. Furthermore, they found that review classification techniques can be enhanced by incorporating metadata such as tense, length of review and star rating.

Gao et al. [38] designed a framework, PAID, to prioritize app issues for developers and help them decide on problems such as which bugs to fix or what features to add. The framework operates by tracking and classifying user comments over the release versions of the app. The authors evaluated their technique on 18 apps with 117 app versions and the results show that PAID was able to predict issues that matched the official changelogs with high accuracy.

Similarly, in 2016, Villarroel et al. [105] proposed CLAP to automatically categorize and cluster related reviews into bug reports and new features. CLAP uses this information to make recommendations to developers on what to include or fix in their next app version. They conducted a user study and found that CLAP can accurately identify bugs and feature requests to help developers with release planning.

Liang et al. [67] examined the effect of user reviews on app sales. They used a multifacet sentiment analysis (MFSA) [68] approach to analyze user reviews and found that reviews on service quality have the strongest effect on app sales.

In 2015, Mcilroy et al. [78] conducted an empirical study on user reviews from 20 apps and found that 30% of the reviews express more than one concern about the app. Based on these findings, they proposed an automated approach to assign multiple labels to reviews with a precision of 66% and recall of 65%. They demonstrated the usefulness of their approach in app comparison and app store overview. Finally, they proposed a technique to detect anomalous apps and evaluated it on 12,000 apps.

Mcilroy et al. [77] expanded their work on app reviews and examined the value of responding to user reviews. They examined responses to reviews of 10,713 apps and found that most developers do not respond to reviews. However, they found that after a developer response, 38.7% of users increase their rating, with a median increase of 20%.

Palomba et al. [86] proposed CRISTAL to track informative user reviews to changes in the apps' source code. They conducted an evaluation on 100 open source apps and found that 49% of the requests in user reviews were implemented in new versions of the app. They found a positive relationship between implementing the user requests in reviews and the apps' overall success measured in terms of ratings, thus confirming Mcilroy's previous results.

Panichella et al. [87] combined machine learning techniques, namely natural language processing, text analysis and sentiment analysis, to classify reviews into categories relevant to software maintenance and evolution.
They evaluated their framework on 1,421 manually labeled reviews and were able to achieve a precision and recall of 75%. Their results indicate that the combined use of machine learning techniques achieves better results than using each technique separately.

Park et al. [89] conducted a study on mobile app retrieval. They proposed a topic model (AppLDA) that represents apps using the topics in user reviews and app descriptions. An evaluation on 1,385,607 reviews from 43,041 apps shows that AppLDA significantly outperforms traditional retrieval techniques.

4.3 Security

Avdiienko et al. [13] mined apps for abnormal usage of sensitive data. They proposed MUDFLOW, which extracts flow data information from apps in order to train a malware classifier. They used 2,950 apps to train the classifier and were able to identify 86.4% of the malicious apps and 90.1% of the apps which leak sensitive data.

Deng et al. [31] combined static and dynamic analysis to develop iRiS, a system to detect iOS apps which use private APIs and access sensitive user information, thus violating Apple's terms of service. Out of 2,019 apps, the authors detected 146 apps which use private APIs that access sensitive user information, such as the device's serial number.

In 2015, Huang et al. [57] developed SUPOR, a system which uses static analysis to detect UI elements that prompt the user for entry of sensitive data, such as user credentials, finance or medical data. They combined SUPOR with existing static taint analysis tools to detect apps which leak private information. They evaluated the system on 16,000 apps mined from the Google Play store, achieved an average precision and recall of 97.3%, and found that 355 apps leak private user information.

Chen et al. [21] developed MassVet, which can identify malicious apps in under 10 seconds and with a low false detection rate. Unlike other techniques, which rely on heavyweight static/dynamic analysis of the app, MassVet operates by comparing the submitted app to other similar apps. A large scale evaluation on 1,165,847 apps mined from the Google Play store identifies 127,429 apps as malicious.

In 2016, Dash et al. [28] proposed a purely dynamic analysis technique to classify Android malware into families of related malware. They used a hybrid approach which combines the traditional Support Vector Machines [97] classification method with Conformal Prediction [96]. Through an evaluation on 5,560 apps, they were able to achieve a high accuracy of 94%.

FUSION, a bug reporting framework based on static and dynamic analysis of the app, was developed by Moran et al. [81]. Taking the event driven nature of apps into account, FUSION generates reproduction steps for bugs. A user study involving 28 participants showed that FUSION allows developers to accurately reproduce bugs.

Ma et al. [71] developed a system to detect malicious apps by comparing their description and runtime behavior. Their work improves on the popular CHABADA [47] work by combining semi-supervised learning and active learning and by making use of both known benign and malicious apps to detect malicious apps. An evaluation on 22,555 apps showed an f-measure of 96.02%, which was 209.6% higher than CHABADA.

A large scale study to detect vulnerabilities in apps which contain web content was conducted by Mutchler et al. [83]. The authors leveraged a variety of techniques to detect vulnerabilities such as loading unsafe web content, exposing sensitive API calls and mishandling certificate errors.
Inspecting 998,286 Google Play apps which contain web content reveals that 28% contain at least one vulnerability.

In 2015, Schutte et al. [92] developed ConDroid, which performs concolic execution of Android apps to observe behavior such as network traffic or dynamic code loading. Using ConDroid on a dataset of 10,000 apps revealed that 172 suffered from a logic bomb vulnerability.

Along the same lines, Fratantonio et al. [36] implemented TriggerScope to detect logic bombs in Android apps. In addition to symbolic execution, TriggerScope uses a new static analysis technique called trigger analysis to detect hidden app behavior and report it to the user. TriggerScope was evaluated on 9,582 apps from the Google Play Store, achieved a 100% detection rate, and discovered 2 previously undisclosed vulnerabilities.

Vigneri et al. [104] built a system to characterize the network behavior of Android applications and identify network communication that could leak private user information, such as user tracking, spyware or excessive ad usage. Using a set of 5,000 apps from the Google Play store, the authors discovered that a large number of popular applications download excessive amounts of advertisements and that some apps establish network connections with malicious websites.

Yang et al. [111] developed AppContext, which detects malicious apps by using static analysis to examine the events and conditions that cause an app to exhibit security-sensitive behaviors. Running AppContext on a dataset of 835 apps from the Google Play Store correctly identifies 192 malicious apps with a precision of 87.7% and a recall of 95%.

Zhang et al. [112] proposed the DescribeMe system to improve the security awareness of app users. The system uses a combination of static analysis techniques to automatically identify security issues with an app and output a human readable description of the results. To prove the effectiveness of their system, the authors conducted a user study using Amazon's Mechanical Turk (MTurk) [6] on 100 apps and found that while their system reduced readability by 4%, it reduced the downloads of malicious apps by 39%.

A system for early detection of spam mobile apps was developed by Seneviratne et al. [95]. The authors manually labeled a dataset of removed apps and, using a set of heuristics that explain why an app was removed, estimated that 35% of the apps were removed because they were spam. Using this data, a classifier was trained to automatically detect spam apps. The classifier achieved an accuracy of 95% and estimated that 2.7% of the apps in the authors' dataset were spam.

4.4 Feature

A technique to help users find apps faster on their smartphone was developed by Lulu et al. [70]. Their technique uses app descriptions along with information from the web to represent apps based on their functionality. A user study with 40 participants revealed that the proposed technique allowed users to find apps faster and provided a more logical grouping of apps.

Berardi et al. [15] used machine learning to automatically classify apps based on their metadata. The authors used app names, descriptions, ratings and app sizes to classify apps into one of 50 categories. The classifier used the Support Vector Machine algorithm and was trained using a set of 5,993 manually labeled Android apps. The classifier achieved an f-measure of 89.5%.

A system, xRank, to help advertisers find better target users was developed by He et al. [54].
As opposed to traditional approaches like using AdWords or AdSense, xRank targets users based on the apps downloaded on their phones. xRank was trained using a set of 122,875 apps from the Huawei App Store and information about 20,169,033 users. xRank was shown to improve the accuracy of various marketing tasks compared to traditional marketing approaches.

In 2015, Tian et al. [101] conducted a study to understand the differences between high and low rated apps. The authors used a dataset of 1,492 Android apps and measured the correlation between an app's rating and its data such as size, code complexity, library dependence and UI complexity. Their findings show that the size of an app, the number of promotional images, and the target SDK version affect app ratings the most.

Wang et al. [107] developed a text mining approach to identify how sensitive data is used in Android apps. Their approach decompiles apps to Java source code, searches for uses of sensitive permissions, and then extracts multiple kinds of features from the code. A classifier is built using these features and evaluated in the context of the geolocation and user's contacts permissions. Using the classifier on a set of 622 apps reveals that it can accurately, with an average of 89.5%, infer how sensitive permissions are used.

4.5 Testing

In 2015, Boushehrinejadmoradi et al. [18] developed a technique based on differential testing to detect inconsistencies and test cross-platform app development frameworks. Through the use of random test generation tools, the authors executed the tests on the source and target platforms and examined the results to identify inconsistent behavior. They implemented their technique in a tool called X-Checker and applied it to the popular cross-platform framework Xamarin. Their tool identified 47 bugs/inconsistencies, 12 of which have been fixed after being reported by the authors.

Along the same lines, Erfani et al. [34] proposed an automated technique to detect inconsistencies in native apps across the iOS and Android platforms. The technique uses a graph based approach to compare the dynamically extracted models of the apps and detect inconsistent behavior such as missing functionality or different data presentation. The technique was implemented in a prototype tool called CHECKCAMP and evaluated on a set of 14 industrial and open source apps. CHECKCAMP identified 54 inconsistencies with an f-measure of 92% on average. A limitation of the tool, however, is the high false-positive rate in the reported data inconsistencies.

Meng et al. [79] developed an Android Testing Toolkit (ATT) to help the development of testing and analysis tools. The toolkit combines tools and APIs for device management, event generation, system profiling and program instrumentation. The authors demonstrated the use of their technique by reimplementing 3 different testing techniques using ATT.

Choudhary et al. [24] conducted a study to compare the various test input generation tools for Android. They used criteria such as code coverage, fault detection and ease of use to compare the tools. Their experiments show that Monkey [42] remains the best existing test input generation tool, providing the best coverage, highest fault detection rate and the best support for various versions of Android.

Mao et al. [76] proposed SAPIENZ, a multi-objective approach which combines random fuzzing, systematic and search-based exploration to test Android apps. SAPIENZ aims to improve code coverage, fault detection rate and execution time.
SAPIENZ was applied to the top 1,000 Google Play apps and revealed 558 new crashes, 14 of which have been reported to and fixed by the developers. Additionally, SAPIENZ significantly outperformed state of the art tools such as Monkey [42] and Dynodroid [73] in 70% of the experiments related to code coverage and fault detection.

Lu et al. [69] developed PRADA, an approach based on mining large-scale usage data to help developers prioritize the Android device models to test their apps on. PRADA tries to predict the expected usage of a new app based on the usage data from a set of existing similar apps. PRADA was evaluated using app browsing time on a set of 200 apps from the Wandoujia [106] app store, covering 3.86 million users and 14.71 thousand devices. The results indicate that PRADA is able to accurately prioritize test devices 75% of the time.

Gomez et al. [41] developed a crowdsourcing based approach called MoTiF to help developers reproduce app crashes experienced by end users. MoTiF monitors the execution traces from Android devices and identifies various crash patterns; it uses these crash patterns to generate test suites which the developers can use to reproduce the crashes quickly. MoTiF was evaluated on 5 Android apps, and successfully generated test suites which reproduced the bugs in 4 out of the 5 apps.

Deng et al. [30] adapted the popular structural testing method of mutation testing [29] to test Android apps. They defined mutation operators unique to Android apps, such as replacing event handlers or deleting buttons, and used the operators to develop a prototype to generate, inject and execute the mutated apps. The prototype was used to test a small Android app and successfully demonstrated the feasibility of applying mutation testing to Android apps.

A technique, AGRippin, that improves on Model Based Testing techniques for Android apps was proposed by Amalfitano et al. [5]. AGRippin uses a combination of genetic and hill climbing techniques to generate test suites. An evaluation on 5 open source Android apps showed that AGRippin produced test suites which were more efficient and effective than ones produced by Model Based techniques.

In 2015, Bielik et al. [16] developed a system to detect race conditions and concurrency bugs in Android applications. The system uses execution traces to build happens-before graphs which are used to detect race conditions. Applying the system on 8 open-source apps revealed 15 bugs, such as displaying old information and crashes after the user stops using the application. Some of these bugs were reported to and fixed by the developers. Additionally, the proposed system outperformed other race condition detection tools in terms of bug detection and usability.

Chapter 5
Conclusions and Future Work

In the first part of this thesis, we present an empirical study on cross-platform mobile apps, or app-pairs. We mine 80,169 iOS and Android app-pairs and compare their app-store attributes. We find that Android apps are more expensive and receive more ratings. Price fluctuations across the platforms are common, and reasons for that include different monetizing strategies, offering different features, and different costs and efforts to maintain the app. Further, we find that more than 80% of the top-rated apps are cross-platform, and reasons for apps only existing on one platform include lack of resources, platform restrictions and revenue per platform. Additionally, we propose a metric to measure an app's aggregated user-perceived ratings which combines ratings and stars.
We find that in 95% of the app-pairs, the Android version is perceived better by users. Reasons for an app being perceived better on one platform include the method used to implement the app, timing (release/update) and first impressions, and app store support and promotional opportunities. Finally, we build three automated classifiers and classify 1.7M reviews to understand how the concerns and complaints of users vary across platforms. We find that iOS apps suffer more from critical and post update problems while Android apps exhibit more problems related to app features.

In the second part of this thesis, we expand our work on cross-platform apps by examining a special category known as hybrid apps, which are built using Cross Platform Tools (CPTs). We mine 15,512 hybrid apps and analyze their attributes, such as the number of ratings, stars and number of downloads. We find that apps implemented using the PhoneGap CPT make up 68% of the hybrid apps and are best perceived by end-users. However, apps created using the Adobe Air CPT, despite having the smallest market share, have the highest number of ratings and downloads. Further, we compare hybrid and native apps and find that they are perceived similarly by end users. Finally, we look at how hybrid apps are perceived across the Android and iOS platforms and find that using a CPT does not ensure that the app will be equally perceived by users.

5.1 Future Work

As for future work, we plan to explore our data further to gain insights into the behavior of apps across different platforms. For instance, we plan to analyze the release dates of app-pairs to understand which platform developers target first when they release a new app. Additionally, the testing and analysis of apps across multiple platforms could be explored. While our recent study [33] is a step toward a better understanding of it, with the increased fragmentation in devices and platforms it still remains a challenge to test mobile apps across varying hardware and platforms [84]. Finally, while we combined the stars and ratings to measure how an app is perceived by users, in the future we will explore other methods, such as using the source code of an app coupled with its problematic user reviews to measure reliability and user perception.

As for hybrid apps, we plan to mine user reviews and automatically classify them to better understand users' perception of such apps. Other interesting avenues of research include verifying whether CPTs output apps with the correct and desired behavior. Finally, it would be valuable to survey the developers of hybrid apps and get their feedback and comments on the subject.

Bibliography

[1] Adobe Inc. https://get.adobe.com/air/. Accessed: 2016-06-12. → pages 3, 35

[2] Adobe Inc. Apache Cordova. https://cordova.apache.org/. Accessed: 2016-10-02. → pages 44

[3] Adobe Inc. PhoneGap. http://phonegap.com/. Accessed: 2016-10-02. → pages 3, 34, 45

[4] M. Ali and A. Mesbah. Mining and characterizing hybrid apps. In Proceedings of the 1st International Workshop on App Market Analytics (WAMA 2016), 7 pages, 2016. → pages iii, 33

[5] D. Amalfitano, N. Amatucci, A. R. Fasolino, and P. Tramontana. AGRippin: A novel search based testing technique for Android applications. In Proceedings of the 3rd International Workshop on Software Development Lifecycle for Mobile, DeMobile 2015, pages 5–12, New York, NY, USA, 2015. ACM. ISBN 978-1-4503-3815-8. doi:10.1145/2804345.2804348. URL http://doi.acm.org/10.1145/2804345.2804348. → pages 53

[6] Amazon.
Bibliography

[1] Adobe Inc. https://get.adobe.com/air/. Accessed: 2016-06-12. → pages 3, 35
[2] Adobe Inc. Apache Cordova. https://cordova.apache.org/. Accessed: 2016-10-02. → pages 44
[3] Adobe Inc. PhoneGap. http://phonegap.com/. Accessed: 2016-10-02. → pages 3, 34, 45
[4] M. Ali and A. Mesbah. Mining and characterizing hybrid apps. In Proceedings of the 1st International Workshop on App Market Analytics (WAMA 2016), 7 pages, 2016. → pages iii, 33
[5] D. Amalfitano, N. Amatucci, A. R. Fasolino, and P. Tramontana. AGRippin: A novel search based testing technique for Android applications. In Proceedings of the 3rd International Workshop on Software Development Lifecycle for Mobile, DeMobile 2015, pages 5–12. ACM, 2015. doi:10.1145/2804345.2804348. → pages 53
[6] Amazon. Amazon Mechanical Turk. https://www.mturk.com/mturk/welcome. Accessed: 2016-09-30. → pages 50
[7] Android Market Stats. http://www.appbrain.com/stats/. Accessed: 2015-11-10. → pages 1
[8] E. Angulo and X. Ferre. A case study on cross-platform development frameworks for mobile applications and UX. In Proceedings of the XV International Conference on Human Computer Interaction, Interacción '14, pages 27:1–27:8. ACM, 2014. doi:10.1145/2662253.2662280. → pages 45
[9] App Store Metrics. http://148apps.biz/app-store-metrics/. Accessed: 2015-11-10. → pages 1
[10] Appcelerator, Inc. Appcelerator Titanium. http://www.appcelerator.com/. Accessed: 2016-10-02. → pages 3, 34, 44, 45
[11] Apple Store Crawler. https://github.com/MarcelloLins/Apple-Store-Crawler. Accessed: 2015-11-12. → pages 6
[12] V. Avdiienko, K. Kuznetsov, A. Gorla, A. Zeller, S. Arzt, S. Rasthofer, and E. Bodden. Mining apps for abnormal usage of sensitive data. In Proceedings of the 37th International Conference on Software Engineering, ICSE 2015. ACM, 2015. → pages 1
[13] V. Avdiienko, K. Kuznetsov, A. Gorla, A. Zeller, S. Arzt, S. Rasthofer, and E. Bodden. Mining apps for abnormal usage of sensitive data. In Proceedings of the 37th International Conference on Software Engineering - Volume 1, ICSE '15, pages 426–436. IEEE Press, 2015. URL http://dl.acm.org/citation.cfm?id=2818754.2818808. → pages 48
[14] G. Bavota, M. Linares-Vásquez, C. Bernal-Cárdenas, M. D. Penta, R. Oliveto, and D. Poshyvanyk. The impact of API change- and fault-proneness on the user ratings of Android apps. IEEE Transactions on Software Engineering, 99(PrePrints):1, 2015. → pages 13
[15] G. Berardi, A. Esuli, T. Fagni, and F. Sebastiani. Multi-store metadata-based supervised mobile app classification. In Proceedings of the 30th Annual ACM Symposium on Applied Computing, SAC '15, pages 585–588. ACM, 2015. doi:10.1145/2695664.2695997. → pages 51
[16] P. Bielik, V. Raychev, and M. Vechev. Scalable race detection for Android applications. In Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2015, pages 332–348. ACM, 2015. doi:10.1145/2814270.2814303. → pages 54
[17] Boris Farber. ClassyShark. https://github.com/google/android-classyshark. Accessed: 2016-06-12. → pages 35
[18] N. Boushehrinejadmoradi, V. Ganapathy, S. Nagarakatte, and L. Iftode. Testing cross-platform mobile app development frameworks (T). In Automated Software Engineering (ASE), 2015 30th IEEE/ACM International Conference on, pages 441–451, Nov 2015. doi:10.1109/ASE.2015.21. → pages 52
[19] J. Brooke et al. SUS - a quick and dirty usability scale. Usability Evaluation in Industry, 189(194):4–7, 1996. → pages 45
[20] L. Cen, D. Kong, H. Jin, and L. Si. Mobile app security risk assessment: A crowdsourcing ranking approach from user comments. In Proceedings of the 2015 SIAM International Conference on Data Mining, pages 658–666. SIAM, 2015. → pages 46
[21] K. Chen, P. Wang, Y. Lee, X. Wang, N. Zhang, H. Huang, W. Zou, and P. Liu. Finding unknown malice in 10 seconds: Mass vetting for new threats at the Google-Play scale. In Proceedings of the 24th USENIX Conference on Security Symposium, SEC '15, pages 659–674. USENIX Association, 2015. URL http://dl.acm.org/citation.cfm?id=2831143.2831185. → pages 49
[22] N. Chen, J. Lin, S. C. H. Hoi, X. Xiao, and B. Zhang. AR-Miner: Mining informative reviews for developers from mobile app marketplace. In Proceedings of the 36th International Conference on Software Engineering, ICSE 2014, pages 767–778. ACM, 2014. → pages 1, 11, 24, 31
[23] N. Chen, J. Lin, S. C. H. Hoi, X. Xiao, and B. Zhang. AR-Miner: Mining informative reviews for developers from mobile app marketplace. In Proceedings of the 36th International Conference on Software Engineering, ICSE 2014, pages 767–778. ACM, 2014. doi:10.1145/2568225.2568263. → pages 46
[24] S. R. Choudhary, A. Gorla, and A. Orso. Automated test input generation for Android: Are we there yet? CoRR, abs/1503.07217, 2015. URL http://arxiv.org/abs/1503.07217. → pages 52
[25] M. Ciman, O. Gaggi, and N. Gonzo. Cross-platform mobile development: A study on apps with animations. In Proceedings of the 29th Annual ACM Symposium on Applied Computing, SAC '14, pages 757–759. ACM, 2014. doi:10.1145/2554850.2555104. → pages 45
[26] Colin Eberhardt and Chris Price. PropertyCross. http://propertycross.com/. Accessed: 2016-10-02. → pages 46
[27] J. W. Creswell. Research design: Qualitative, quantitative, and mixed methods approaches. Sage Publications, Incorporated, 2013. → pages 6
[28] S. K. Dash, G. Suarez-Tangil, S. Khan, K. Tam, M. Ahmadi, J. Kinder, and L. Cavallaro. DroidScribe: Classifying Android malware based on runtime behavior. In 2016 IEEE Security and Privacy Workshops (SPW), pages 252–261, May 2016. doi:10.1109/SPW.2016.25. → pages 49
[29] R. A. DeMillo, R. J. Lipton, and F. G. Sayward. Hints on test data selection: Help for the practicing programmer. Computer, 11(4):34–41, Apr. 1978. doi:10.1109/C-M.1978.218136. → pages 53
[30] L. Deng, N. Mirzaei, P. Ammann, and J. Offutt. Towards mutation analysis of Android apps. In Software Testing, Verification and Validation Workshops (ICSTW), 2015 IEEE Eighth International Conference on, pages 1–10, April 2015. doi:10.1109/ICSTW.2015.7107450. → pages 53
[31] Z. Deng, B. Saltaformaggio, X. Zhang, and D. Xu. iRiS: Vetting private API abuse in iOS applications. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, CCS '15, pages 44–56. ACM, 2015. doi:10.1145/2810103.2813675. → pages 48
[32] M. Erfani Joorabchi, A. Mesbah, and P. Kruchten. Real challenges in mobile app development. In Proceedings of the ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM '13, pages 15–24. ACM, 2013. → pages 1, 29
[33] M. Erfani Joorabchi, M. Ali, and A. Mesbah. Detecting inconsistencies in multi-platform mobile apps. In Proceedings of the International Symposium on Software Reliability Engineering, ISSRE '15. IEEE Computer Society, 2015. → pages 1, 56
[34] M. Erfani Joorabchi, M. Ali, and A. Mesbah. Detecting inconsistencies in multi-platform mobile apps. In Proceedings of the International Symposium on Software Reliability Engineering (ISSRE), 11 pages. IEEE Computer Society, 2015. URL http://salt.ece.ubc.ca/publications/docs/issre15.pdf. → pages 52
[35] M. A. Figueiredo. Lecture notes on Bayesian estimation and classification. Instituto de Telecomunicações - Instituto Superior Técnico, page 60, 2004. → pages 13
[36] Y. Fratantonio, A. Bianchi, W. Robertson, E. Kirda, C. Kruegel, and G. Vigna. TriggerScope: Towards detecting logic bombs in Android applications. In 2016 IEEE Symposium on Security and Privacy (SP), pages 377–396, May 2016. doi:10.1109/SP.2016.30. → pages 50
[37] R. Gallo, P. Hongo, R. Dahab, L. C. Navarro, H. Kawakami, K. Galvão, G. Junqueira, and L. Ribeiro. Security and system architecture: Comparison of Android customizations. In Proceedings of the 8th ACM Conference on Security & Privacy in Wireless and Mobile Networks, WiSec '15, pages 12:1–12:6. ACM, 2015. doi:10.1145/2766498.2766519. → pages 29
[38] C. Gao, B. Wang, P. He, J. Zhu, Y. Zhou, and M. R. Lyu. PAID: Prioritizing app issues for developers by tracking user reviews over versions. In Software Reliability Engineering (ISSRE), 2015 IEEE 26th International Symposium on, pages 35–45. IEEE, 2015. → pages 47
[39] C. Gao, H. Xu, J. Hu, and Y. Zhou. AR-Tracker: Track the dynamics of mobile apps via user review mining. In Service-Oriented System Engineering (SOSE), 2015 IEEE Symposium on, pages 284–290, March 2015. doi:10.1109/SOSE.2015.13. → pages 46
[40] M. Gómez, R. Rouvoy, M. Monperrus, and L. Seinturier. A recommender system of buggy app checkers for app store moderators. In Proceedings of the Second ACM International Conference on Mobile Software Engineering and Systems, MOBILESoft '15, pages 1–11. IEEE Press, 2015. URL http://dl.acm.org/citation.cfm?id=2825041.2825043. → pages 46
[41] M. Gómez, R. Rouvoy, B. Adams, and L. Seinturier. Reproducing context-sensitive crashes of mobile apps using crowdsourced monitoring. In Proceedings of the International Conference on Mobile Software Engineering and Systems, MOBILESoft '16, pages 88–99. ACM, 2016. doi:10.1145/2897073.2897088. → pages 53
[42] Google. Android Monkey. https://developer.android.com/studio/test/monkey.html. Accessed: 2016-09-30. → pages 53
[43] Google. AngularJS. https://angularjs.org/. Accessed: 2016-10-02. → pages 44
[44] Google Play Store Crawler. https://github.com/MarcelloLins/GooglePlayAppsCrawler. Accessed: 2015-11-12. → pages 6
[45] Google Play Store Review scraper. https://github.com/jkao/GooglePlayScraper. Accessed: 2015-11-21. → pages 11
[46] A. Gorla, I. Tavecchia, F. Gross, and A. Zeller. Checking app behavior against app descriptions. In Proceedings of the 36th International Conference on Software Engineering, ICSE 2014, pages 1025–1035. ACM, 2014. → pages 1
[47] A. Gorla, I. Tavecchia, F. Gross, and A. Zeller. Checking app behavior against app descriptions. In Proceedings of the 36th International Conference on Software Engineering, ICSE 2014, pages 1025–1035. ACM, 2014. doi:10.1145/2568225.2568276. → pages 49
[48] X. Gu and S. Kim. "What parts of your apps are loved by users?" (T). In Proceedings of the 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), ASE '15, pages 760–770. IEEE Computer Society, 2015. doi:10.1109/ASE.2015.57. → pages 46
[49] L. Guerrouj, S. Azad, and P. C. Rigby. The influence of app churn on app success and StackOverflow discussions. In 2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER), pages 321–330, March 2015. doi:10.1109/SANER.2015.7081842. → pages 13
[50] E. Guzman and W. Maalej. How do users like this feature? A fine grained sentiment analysis of app reviews. In Requirements Engineering Conference (RE), 2014 IEEE 22nd International, pages 153–162, 2014. → pages 13
[51] E. Guzman, O. Aly, and B. Bruegge. Retrieving diverse opinions from app reviews. In 2015 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), pages 1–10. IEEE, 2015. → pages 47
[52] D. Han, C. Zhang, X. Fan, A. Hindle, K. Wong, and E. Stroulia. Understanding Android fragmentation with topic analysis of vendor-specific bugs. In Reverse Engineering (WCRE), 2012 19th Working Conference on, pages 83–92, Oct 2012. doi:10.1109/WCRE.2012.18. → pages 29
[53] M. Harman, Y. Jia, and Y. Zhang. App store mining and analysis: MSR for app stores. In 9th IEEE Working Conference on Mining Software Repositories (MSR), pages 108–111. IEEE, June 2012. → pages 13
[54] X. He, W. Dai, G. Cao, R. Tang, M. Yuan, and Q. Yang. Mining target users for online marketing based on app store data. In Big Data (Big Data), 2015 IEEE International Conference on, pages 1043–1052, Oct 2015. doi:10.1109/BigData.2015.7363858. → pages 51
[55] H. Heitkötter, S. Hanschke, and T. A. Majchrzak. Evaluating cross-platform development approaches for mobile applications. In International Conference on Web Information Systems and Technologies, pages 120–138. Springer, 2012. → pages 2
[56] H. Heitkötter, S. Hanschke, and T. A. Majchrzak. Evaluating Cross-Platform Development Approaches for Mobile Applications, pages 120–138. Springer Berlin Heidelberg, 2013. doi:10.1007/978-3-642-36608-6_8. → pages 45
[57] J. Huang, Z. Li, X. Xiao, Z. Wu, K. Lu, X. Zhang, and G. Jiang. SUPOR: Precise and scalable sensitive user input detection for Android apps. In Proceedings of the 24th USENIX Conference on Security Symposium, SEC '15, pages 977–992. USENIX Association, 2015. URL http://dl.acm.org/citation.cfm?id=2831143.2831205. → pages 49
[58] Hybrid Apps Study: Toolset and dataset. https://github.com/saltlab/hybrid-apps-study. Accessed: 2016-06-12. → pages 3, 37, 42
[59] C. Iacob, V. Veerappa, and R. Harrison. What are you complaining about?: A study of online reviews of mobile applications. In Proceedings of the 27th International BCS Human Computer Interaction Conference, BCS-HCI '13, pages 29:1–29:6. British Computer Society, 2013. → pages 1
[60] IMDB. http://www.imdb.com/help/show_leaf?votestopfaq. Accessed: 2015-11-22. → pages 13
[61] T. F. in Android Apps. https://play.google.com/store/apps/collection/topselling_free?hl=en. Accessed: 2015-11-13. → pages 10
[62] iTunes App Store Review scraper. https://github.com/grych/AppStoreReviews. Accessed: 2015-11-21. → pages 11
[63] iTunes RSS Feed Generator. https://rss.itunes.apple.com/ca/. Accessed: 2015-11-13. → pages 10
[64] Jeremy Ashkenas. Backbone.js. http://backbonejs.org/. Accessed: 2016-10-02. → pages 44
[65] M. Khalid, U. Shehzaib, and M. Asif. A case of mobile app reviews as a crowdsource. International Journal of Information Engineering and Electronic Business, 7(5):39, 2015. → pages 47
[66] D. Lavid Ben Lulu and T. Kuflik. Functionality-based clustering using short textual description: Helping users to find apps installed on their mobile device. In Proceedings of the 2013 International Conference on Intelligent User Interfaces, IUI '13, pages 297–306. ACM, 2013. → pages 1
[67] T.-P. Liang, X. Li, C.-T. Yang, and M. Wang. What in consumer reviews affects the sales of mobile apps: A multifacet sentiment analysis approach. International Journal of Electronic Commerce, 20(2):236–260, 2015. → pages 47
[68] B. Liu. Sentiment analysis: A multifaceted problem. IEEE Intelligent Systems, 25(3):76–80, 2010. doi:10.1109/MIS.2010.75. → pages 47
[69] X. Lu, X. Liu, H. Li, T. Xie, Q. Mei, D. Hao, G. Huang, and F. Feng. PRADA: Prioritizing Android devices for apps by mining large-scale usage data. In Proceedings of the 38th International Conference on Software Engineering, ICSE '16, pages 3–13. ACM, 2016. doi:10.1145/2884781.2884828. → pages 53
[70] D. L. B. Lulu and T. Kuflik. Wise mobile icons organization: Apps taxonomy classification using functionality mining to ease apps finding. Mobile Information Systems, 2015. → pages 51
[71] S. Ma, S. Wang, D. Lo, R. H. Deng, and C. Sun. Active semi-supervised approach for checking app behavior against its description. In Computer Software and Applications Conference (COMPSAC), 2015 IEEE 39th Annual, volume 2, pages 179–184, July 2015. doi:10.1109/COMPSAC.2015.93. → pages 49
[72] W. Maalej and H. Nabil. Bug report, feature request, or simply praise? On automatically classifying app reviews. In 2015 IEEE 23rd International Requirements Engineering Conference (RE), pages 116–125. IEEE, 2015. → pages 47
[73] A. Machiry, R. Tahiliani, and M. Naik. Dynodroid: An input generation system for Android apps. In Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2013, pages 224–234. ACM, 2013. doi:10.1145/2491411.2491450. → pages 53
[74] I. Malavolta, S. Ruberto, T. Soru, and V. Terragni. End users' perception of hybrid mobile apps in the Google Play store. In 2015 IEEE International Conference on Mobile Services, pages 25–32, June 2015. doi:10.1109/MobServ.2015.14. → pages 45
[75] I. Malavolta, S. Ruberto, T. Soru, and V. Terragni. Hybrid mobile apps in the Google Play store: An exploratory investigation. In Mobile Software Engineering and Systems (MOBILESoft), 2015 2nd ACM International Conference on, pages 56–59, May 2015. doi:10.1109/MobileSoft.2015.15. → pages 44
[76] K. Mao, M. Harman, and Y. Jia. Sapienz: Multi-objective automated testing for Android applications. In Proceedings of the 25th International Symposium on Software Testing and Analysis, ISSTA 2016, pages 94–105. ACM, 2016. doi:10.1145/2931037.2931054. → pages 53
[77] S. McIlroy, W. Shang, N. Ali, and A. Hassan. Is it worth responding to reviews? A case study of the top free apps in the Google Play store. IEEE Software, PP(99):1–1, 2015. doi:10.1109/MS.2015.149. → pages 48
[78] S. McIlroy, N. Ali, H. Khalid, and A. E. Hassan. Analyzing and automatically labelling the types of user issues that are raised in mobile app reviews. Empirical Software Engineering, 21(3):1067–1106, June 2016. doi:10.1007/s10664-015-9375-7. → pages 47
[79] Z. Meng, Y. Jiang, and C. Xu. Facilitating reusable and scalable automated testing and analysis for Android apps. In Proceedings of the 7th Asia-Pacific Symposium on Internetware, Internetware '15, pages 166–175. ACM, 2015. doi:10.1145/2875913.2875937. → pages 52
[80] Mining iOS and Android mobile app-pairs: Toolset and dataset. https://github.com/saltlab/Minning-App-Stores. Accessed: 2015-11-21. → pages 3, 7, 15, 30, 31
[81] K. Moran, M. Linares-Vásquez, C. Bernal-Cárdenas, and D. Poshyvanyk. Auto-completing bug reports for Android applications. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2015, pages 673–686. ACM, 2015. doi:10.1145/2786805.2786857. → pages 49
[82] MoSync AB. MoSync. https://github.com/MoSync/MoSync. Accessed: 2016-10-02. → pages 45
[83] P. Mutchler, A. Doupé, J. Mitchell, C. Kruegel, and G. Vigna. A large-scale study of mobile web app security. In Proceedings of the Mobile Security Technologies Workshop (MoST), 2015. → pages 50
[84] M. Nagappan and E. Shihab. Future trends in software engineering research for mobile apps. In Proceedings of the IEEE International Conference on Software Analysis, Evolution, and Reengineering, FoSE, 2016. → pages 2, 30, 56
[85] Nicolas Viennot. Playdrone. https://github.com/nviennot/playdrone. Accessed: 2016-06-12. → pages 35
[86] F. Palomba, M. Linares-Vásquez, G. Bavota, R. Oliveto, M. Di Penta, D. Poshyvanyk, and A. De Lucia. User reviews matter! Tracking crowdsourced reviews to support evolution of successful apps. In Software Maintenance and Evolution (ICSME), 2015 IEEE International Conference on, pages 291–300. IEEE, 2015. → pages 48
[87] S. Panichella, A. Di Sorbo, E. Guzman, C. A. Visaggio, G. Canfora, and H. C. Gall. How can I improve my app? Classifying user reviews for software maintenance and evolution. In Software Maintenance and Evolution (ICSME), 2015 IEEE International Conference on, pages 281–290. IEEE, 2015. → pages 48
[88] S. Panichella, A. D. Sorbo, E. Guzman, C. A. Visaggio, G. Canfora, and H. C. Gall. How can I improve my app? Classifying user reviews for software maintenance and evolution. In IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE, 2015. → pages 1, 11, 24, 31
[89] D. H. Park, M. Liu, C. Zhai, and H. Wang. Leveraging user reviews to improve accuracy for mobile app retrieval. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '15, pages 533–542. ACM, 2015. doi:10.1145/2766462.2767759. → pages 48
[90] I. Ruiz, M. Nagappan, B. Adams, and A. Hassan. Understanding reuse in the Android market. In Program Comprehension (ICPC), 2012 IEEE 20th International Conference on, pages 113–122. IEEE, June 2012. → pages 1
[91] I. M. Ruiz, M. Nagappan, B. Adams, T. Berger, S. Dienst, and A. Hassan. On the relationship between the number of ad libraries in an Android app and its rating. IEEE Software, 99, 2014. → pages 1
[92] J. Schutte, R. Fedler, and D. Titze. ConDroid: Targeted dynamic analysis of Android applications. In 2015 IEEE 29th International Conference on Advanced Information Networking and Applications, pages 571–578, March 2015. doi:10.1109/AINA.2015.238. → pages 50
[93] Scikit Learn: Machine Learning in Python. http://scikit-learn.org/stable/index.html. Accessed: 2015-11-13. → pages 12
[94] S. Seneviratne, A. Seneviratne, M. A. Kaafar, A. Mahanti, and P. Mohapatra. Early detection of spam mobile apps. In Proceedings of the 24th International Conference on World Wide Web, WWW, pages 949–959. ACM, 2015. → pages 1
[95] S. Seneviratne, A. Seneviratne, M. A. Kaafar, A. Mahanti, and P. Mohapatra. Early detection of spam mobile apps. In Proceedings of the 24th International Conference on World Wide Web, WWW '15, pages 949–959. ACM, 2015. doi:10.1145/2736277.2741084. → pages 51
[96] G. Shafer and V. Vovk. A tutorial on conformal prediction. Journal of Machine Learning Research, 9(Mar):371–421, 2008. → pages 49
[97] I. Steinwart and A. Christmann. Support Vector Machines. Springer Publishing Company, Incorporated, 1st edition, 2008. → pages 49
[98] The jQuery Project. jQuery Mobile. https://jquerymobile.com/. Accessed: 2016-10-02. → pages 45
[99] M. Thelwall, K. Buckley, and G. Paltoglou. Sentiment strength detection for the social web. Journal of the American Society for Information Science and Technology, 63(1):163–173, Jan. 2012. doi:10.1002/asi.21662. → pages 13
[100] Y. Tian, M. Nagappan, D. Lo, and A. E. Hassan. What are the characteristics of high-rated apps? A case study on free Android applications. In IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE, 2015. → pages 12, 13
[101] Y. Tian, M. Nagappan, D. Lo, and A. E. Hassan. What are the characteristics of high-rated apps? A case study on free Android applications. In Proceedings of the 2015 IEEE International Conference on Software Maintenance and Evolution (ICSME), ICSME '15, pages 301–310. IEEE Computer Society, 2015. doi:10.1109/ICSM.2015.7332476. → pages 51
[102] Triggercorp Inc. https://trigger.io/. Accessed: 2016-06-12. → pages 3
[103] N. Viennot, E. Garcia, and J. Nieh. A measurement study of Google Play. In The 2014 ACM International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS '14, pages 221–233. ACM, 2014. doi:10.1145/2591971.2592003. → pages 3, 22, 34, 35
[104] L. Vigneri, J. Chandrashekar, I. Pefkianakis, and O. Heen. Taming the Android appstore: Lightweight characterization of Android applications. CoRR, abs/1504.06093, 2015. URL http://arxiv.org/abs/1504.06093. → pages 50
[105] L. Villarroel, G. Bavota, B. Russo, R. Oliveto, and M. Di Penta. Release planning of mobile apps based on user reviews. In Proceedings of the 38th International Conference on Software Engineering, ICSE '16, pages 14–24. ACM, 2016. doi:10.1145/2884781.2884818. → pages 47
[106] Wandoujia. Wandoujia. https://www.wandoujia.com/. Accessed: 2016-09-30. → pages 53
[107] H. Wang, J. Hong, and Y. Guo. Using text mining to infer the purpose of permission use in mobile apps. In Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing, UbiComp '15, pages 1107–1118. ACM, 2015. doi:10.1145/2750858.2805833. → pages 51
[108] M. Willocx, J. Vossaert, and V. Naessens. Comparing performance parameters of mobile app development strategies. In Proceedings of the International Conference on Mobile Software Engineering and Systems, MOBILESoft '16, pages 38–47. ACM, 2016. doi:10.1145/2897073.2897092. → pages 46
[109] L. Wu, M. Grace, Y. Zhou, C. Wu, and X. Jiang. The impact of vendor customizations on Android security. In Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, CCS '13, pages 623–634. ACM, 2013. doi:10.1145/2508859.2516728. → pages 29
[110] Xamarin Inc. https://www.xamarin.com/. Accessed: 2016-06-12. → pages 3
[111] W. Yang, X. Xiao, B. Andow, S. Li, T. Xie, and W. Enck. AppContext: Differentiating malicious and benign mobile app behaviors using context. In Proceedings of the 37th International Conference on Software Engineering - Volume 1, ICSE '15, pages 303–313. IEEE Press, 2015. URL http://dl.acm.org/citation.cfm?id=2818754.2818793. → pages 50
[112] M. Zhang, Y. Duan, Q. Feng, and H. Yin. Towards automatic generation of security-centric descriptions for Android apps. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, CCS '15, pages 518–529. ACM, 2015. doi:10.1145/2810103.2813669. → pages 50
