UBC Theses and Dissertations

What to Learn Next: Recommending Commands in a Feature-rich Environment — Zolaktaf, Sedigheh, 2015
What to Learn Next: Recommending Commands in a Feature-rich Environment

by

Sedigheh Zolaktaf

B.Sc., Sharif University of Technology, 2013

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Master of Science

in

The Faculty of Graduate and Postdoctoral Studies (Computer Science)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

August 2015

© Sedigheh Zolaktaf, 2015

Abstract

Despite an abundance of commands to make tasks easier to perform, the users of feature-rich applications, such as development environments and AutoCAD, use only a fraction of the commands available, due to a lack of awareness of the existence of many commands. Earlier work has shown that command recommendation can improve the usage of the range of commands available within such applications.

In this thesis, we address the command recommendation problem, in which, given the command usage history of a set of users, the objective is to predict a command that is likely useful for the user to learn. We investigate two approaches to address the problem.

The first approach is built upon the hypothesis that users of feature-rich applications who have similar features tend to use the same commands, and also, that a specific user tends to use commands with similar features. Building on this hypothesis, we describe a supervised learning framework that exploits features from a user-command network to predict new links among users and commands.

The second approach is built upon three hypotheses. First, we hypothesize that in feature-rich applications there exist co-occurrence patterns between commands. Second, we hypothesize that users of feature-rich applications have prevalent discovery patterns. Finally, we hypothesize that users need different recommendations based on the time elapsed between their last activity and the time of recommendation.
To generate recommendations, we obtain co-occurrence and discovery patterns from the command usage history of a large set of users of the same feature-rich application. Subsequently, for each user, we produce recommendations based on the user's command usage history, the co-occurrence and discovery patterns, and the time elapsed since the user's last command usage. We refer to the algorithm we developed according to this approach as CoDis.

Empirical experiments on data submitted by users of an integrated development environment (Eclipse) demonstrate that CoDis achieves significant performance improvements over link prediction, standard algorithms used for command recommendation, and matrix factorization techniques that are known to perform well in other domains. Compared to ADAGRAD, the best-performing baseline, it achieves an improvement of 10.22% in recall for a top-N recommendation task (N = 20).

Preface

This thesis is submitted in partial fulfilment of the requirements for a Master of Science degree in Computer Science. All the work presented in this dissertation is original and independent work of the author, performed under the supervision of Prof. Gail C. Murphy.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgments
Dedication

1 Introduction
  1.1 Motivation
  1.2 Problem Statement
  1.3 Link Prediction
  1.4 CoDis
  1.5 Thesis Contributions
  1.6 Summary of Results
  1.7 Notations and Conventions
  1.8 Thesis Outline
2 Related Work
  2.1 Increasing Users' Awareness of Commands
    2.1.1 Tutorials for Application Usage
    2.1.2 Recommendation Techniques for Application Usage
  2.2 Link Prediction
  2.3 Prediction of Next Item
3 A Link Prediction Approach
  3.1 Feature Set
  3.2 Classification Algorithms
4 CoDis
  4.1 Co-occurrence Score (CScore)
  4.2 Discovery Score (DScore)
  4.3 Combining Co-occurrence and Discovery Scores Based on Elapsed Time (CoDis)
  4.4 Combining Co-occurrence and Discovery Scores Irrespective of Elapsed Time (Co+Dis)
5 Experiments
  5.1 Data Description
  5.2 Training, Validation, and Testing Sets
  5.3 Algorithms & Implementation
    5.3.1 Link Prediction Algorithms
    5.3.2 CoDis & Variants
    5.3.3 Baselines
  5.4 Evaluation Metric
  5.5 Parameters
  5.6 Experimental Results
  5.7 Link Prediction Feature Comparison
  5.8 Summary of Results
6 Discussion
  6.1 Time Complexity
7 Conclusion and Future Work
Bibliography

List of Tables

Table 1.1 Notation
Table 5.1 Parameters used in algorithms.

List of Figures

Figure 1.1 Command usage distribution of Eclipse users obtained by UDC.
Figure 3.1 A representation of a weighted user-command bipartite graph, where edges represent users' interactions with commands.
Figure 5.1 Recall of link prediction classifiers in subset (a). Dataset consists of 4178 users and 480 commands.
Figure 5.2 Recall of link prediction classifiers in subset (b). Dataset consists of 4292 users and 638 commands.
Figure 5.3 Recall of link prediction classifiers in subset (c). Dataset consists of 4301 users and 676 commands.
Figure 5.4 Recall of link prediction classifiers in subset (d). Dataset consists of 4305 users and 700 commands. Item-based Discovery uses 3724 users and 614 commands.
Figure 5.5 Recall of CoDis, Co+Dis, CScore, and DScore algorithms against baselines in subset (a). Dataset consists of 4178 users and 480 commands.
Figure 5.6 Recall of CoDis, Co+Dis, CScore, and DScore algorithms against baselines in subset (b). Dataset consists of 4292 users and 638 commands.
Figure 5.7 Recall of CoDis, Co+Dis, CScore, and DScore algorithms against baselines in subset (c). Dataset consists of 4301 users and 676 commands.
Figure 5.8 Recall of CoDis, Co+Dis, CScore, and DScore algorithms against baselines in subset (d). Dataset consists of 4305 users and 700 commands. User-based Discovery and Item-based Discovery use a dataset consisting of 3724 users and 614 commands.
Figure 5.9 Jaccard similarity of users that the CScore and DScore algorithms were able to make correct recommendations for, in subset (d).
Figure 5.10 Average elapsed time of users that the CoDis, Co+Dis, CScore, and DScore algorithms were able to make correct recommendations for, in subset (d).
Figure 5.11 Information gain of features based on the Decision Tree classifier, in subset (d).
Figure 5.12 Information gain of features based on the Random Forest classifier, in subset (d).

Acknowledgments

I would like to thank my supervisor, Prof. Gail C. Murphy, for all her support, help, and encouragement. I am sincerely grateful for the kind guidance she provided, which allowed me to develop this thesis.

I would like to thank Prof. Laks V.S. Lakshmanan for being the second reader of my thesis and for providing helpful comments.

I would also like to thank my sister, Zainab, for being the third (informal) reader of my thesis and also for making life much more fun throughout my master's.

Lastly, I would like to thank my family, for their love, their belief in me, and their never-ending encouragement.

Dedication

To my family.

Chapter 1

Introduction

1.1 Motivation

Integrated development environments (IDEs) provide comprehensive tools and facilities to assist programmers with software development tasks. In particular, these tools support various phases of the software development cycle. For instance, source code editors integrated in these environments are designed to simplify and speed up the input of source code, while debugging tools make the identification and fixing of bugs easier. To use any of these tools, the user is required to execute commands, either through a command line, a graphical user interface, or a keyboard shortcut. Given the complexity of developing code, these environments quickly become feature-rich and complex, thereby making them less effective than aspired. For instance, consider the Eclipse integrated development environment.
Logs collected in 2009 from a large set of anonymous users of this IDE with the Eclipse Usage Data Collector (UDC) show that 1098 distinct commands are used across 897714 users, but the majority of the users utilize fewer than 10 commands each (Figure 1.1a). Even among the 4308 users who were continuously active during the entire year, the majority use fewer than 60 distinct commands individually, whereas altogether they use 700 commands (Figure 1.1b).

This lack of awareness of commands has also been reported for other highly-functional applications, such as AutoCAD [27, 37], a computer-aided drawing application. For development environments, which have an ability to be easily extended with new tools and hence new commands, the problem is exacerbated. The efficiency of a user in performing tasks with these applications can be increased by making the users aware of commands relevant to them.

One approach for increasing a user's awareness of features in a feature-rich application is by providing efficient search mechanisms. This approach requires thorough documentation of the application, which must adjust with changes to the application over time. Even more critically, if a user is unaware of the existence of a feature, it will be difficult, if not impossible, to search for commands supporting the feature.

An alternative approach is through personalized command recommendation based on command usage history, which we call the command recommendation problem.

Figure 1.1: Command usage distribution of Eclipse users obtained by UDC. (a) Command usage distribution of all users. (b) Command usage distribution of users who were active during an entire year.

1.2 Problem Statement

In the command recommendation problem, given the command usage history of a set of users, the objective is to predict a command that will likely be useful for the user to learn.
The command usage history of a user consists of a time-ordered sequence of commands issued by the user in the feature-rich application. We denote the set of users as U and the set of all commands issued by the users as I.

To obtain recommendation lists, for each user and unused command pair (u, i), where u ∈ U and i ∈ I, we compute a score that represents how useful command i would be for user u. Subsequently, we recommend the N unknown commands with the highest scores to the user.

In this thesis, we focus on command recommendations for IDEs. We investigate two approaches to address the command recommendation problem: (1) link prediction as a supervised learning task, and (2) an algorithm that we name CoDis.

1.3 Link Prediction

Link prediction [29] has been used for recommendation in different domains, such as co-authorship networks [2, 53] and e-commerce [28]. A user-item recommendation problem can be formalized as a link prediction problem by representing users and items as nodes in a bipartite graph and representing user-item interactions as links between user-item nodes. Link prediction deals with recommendation by trying to infer which new links between the user and item vertices are likely to occur in the graph, given a snapshot of the bipartite graph.

If existing links have similar features that distinguish them from non-existent links, link prediction can be solved using a supervised learning framework. In the command domain, we assume users who have similar features tend to use the same commands, and also, that a specific user tends to use commands with similar features. Based on this assumption, we convert the command recommendation problem to a link prediction problem. We model users and commands as a heterogeneous bipartite graph, where nodes represent users and commands, and links represent user-command interactions. We build upon the work of Huang and colleagues [21] to extract topological similarity metrics from a user-command bipartite graph.
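The scoring-and-ranking formulation of Section 1.2 (compute a usefulness score for each unused command, then recommend the top N) can be sketched in a few lines. The popularity-based score and the command names below are hypothetical placeholders for illustration, not the scoring functions developed in this thesis:

```python
def recommend_top_n(used, all_commands, score, n=20):
    """Rank commands the user has not yet used by score; return the top n.

    used: set of commands already in the user's history
    score: callable mapping a candidate command to a usefulness score
    """
    candidates = [c for c in all_commands if c not in used]
    candidates.sort(key=score, reverse=True)
    return candidates[:n]

# Toy usage with a hypothetical global-popularity score.
popularity = {"save": 5, "rename": 3, "extract.method": 2, "format": 1}
recs = recommend_top_n({"save"}, popularity, lambda c: popularity[c], n=2)
print(recs)  # ['rename', 'extract.method']
```

The two approaches investigated in this thesis differ only in how the score for a (user, command) pair is computed.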
We then apply supervised learning algorithms to infer, for each user, which commands they are likely to form a link with, and we generate recommendations based on the predicted links.

1.4 CoDis

Previous work has addressed the command recommendation problem through collaborative filtering techniques. Li, Matejka, and colleagues [27, 37] use user-based and item-based collaborative filtering to generate personalized command recommendations in AutoCAD. Murphy-Hill and colleagues [44] use collaborative filtering techniques based on sequential pattern mining [52] and patterns of past command discovery to recommend commands in Eclipse. In this work, we build upon similar assumptions as [44] and [52] to improve the task of command recommendation.

In most domains, such as text collections or electronic retail, there are sets of frequently co-occurring items [3, 19, 46, 52]. This phenomenon is even more noticeable in the domain of IDE commands, where certain sets of commands have to be used to complete specific tasks [46, 56]. Hence, based on the hypothesis that frequent command co-occurrence patterns exist in an IDE, we estimate co-occurrence scores for each user and command (the CScore algorithm). We apply a variant of bigram models [8, 39] to compute this score. These models make predictions using observed marginal and conditional item frequencies.

Murphy-Hill and colleagues [44] also showed that there exist prevalent command discovery patterns in the domain of IDE commands, and that these patterns can be incorporated into collaborative filtering algorithms to make useful recommendations. Based on the intuition that frequent command discovery patterns exist in an IDE, we estimate discovery scores for each command and user (the DScore algorithm). Similar to co-occurrence scores, we use bigram models to compute discovery scores.

Finally, users have different needs in different circumstances.
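Both scores rest on the same bigram-style machinery: estimate conditional probabilities P(c | k) from pooled frequency counts, then aggregate them over a user's known commands. The sketch below illustrates the idea for within-session co-occurrence; the session representation, the estimator, and the command names are simplifying assumptions for illustration, not the exact CScore or DScore computation defined later in the thesis:

```python
from collections import Counter
from itertools import permutations

def cooccurrence_scores(sessions, used):
    """Score each unused command c by summing estimated P(c | k) over the
    user's known commands k, with probabilities taken from pooled
    per-session co-occurrence counts (a bigram-style estimator)."""
    pair_count = Counter()    # ordered pairs (k, c) co-occurring in a session
    single_count = Counter()  # marginal frequency of each command
    for session in sessions:  # each session is a set of distinct commands
        single_count.update(session)
        pair_count.update(permutations(session, 2))
    return {c: sum(pair_count[(k, c)] / single_count[k]
                   for k in used if single_count[k])
            for c in set(single_count) - set(used)}

# Hypothetical pooled sessions from many users.
sessions = [{"copy", "paste"}, {"copy", "paste", "undo"}, {"undo", "redo"}]
print(cooccurrence_scores(sessions, used={"copy"}))
# e.g. "paste" scores 1.0, "undo" 0.5, "redo" 0.0
```

A discovery score would be computed analogously, but over ordered (known command, newly discovered command) pairs rather than within-session co-occurrences.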
We believe the elapsed time between the user's last activity and the time of recommendation is an indicator of the user's recommendation need. In particular, if a user's elapsed time is relatively small, they are more likely to be working on their most recent tasks. However, if the elapsed time is relatively large, the user is more likely to have begun a new task. Based on this intuition, our solution combines the co-occurrence and discovery scores of each user with regard to their elapsed time to make recommendations. We define a tuning parameter λ to control the influence of the co-occurrence and discovery scores. If the user's elapsed time is less than λ, then the user is more likely to be working on their most recent task, and recommendation should take into account their co-occurrence score more than their discovery score. However, if the elapsed time is more than λ, then recommendation should take into account their discovery score more than their co-occurrence score. We refer to the algorithm we developed according to this approach as CoDis.

1.5 Thesis Contributions

The contributions of this thesis include:

• We formulate the command recommendation problem as a supervised learning link prediction problem. We show how we can extract features from the command usage history of users, described in Chapter 3. Furthermore, we describe how we can extract positive and negative examples from the command usage history of users, and how to generate recommendation lists from predicted links, described in Chapter 5.

• We propose an algorithm, CoDis, to generate command recommendations with regard to the elapsed time between a user's last activity time and the time of recommendation.
The CoDis algorithm is described in Chapter 4.

• We conduct empirical experiments on data submitted by users of an integrated development environment (Eclipse) to compare the link prediction approach and the CoDis algorithm with standard algorithms used for command recommendation and matrix factorization techniques that are known to perform well in other domains.

1.6 Summary of Results

Empirical results on data collected by the Eclipse Usage Data Collector demonstrate the effectiveness of the CoDis algorithm compared to link prediction, standard algorithms used for command recommendation, and matrix factorization techniques that are known to perform well in other domains. From among the link prediction algorithms, Random Forest obtains the highest accuracy. However, the CoDis algorithm beats all link prediction and baseline algorithms based on the recall measure. Compared to the matrix factorization technique with the ADAGRAD solver [11], the best-performing baseline, CoDis and Random Forest obtain improvements of 10.22% and 4.74%, respectively, in terms of the recall performance measure for a top-N recommendation task, where N = 20.

1.7 Notations and Conventions

Table 1.1 shows the notation we use in this work.
We use lower-case letters for scalar variables (e.g., a), upper-case letters for constants (e.g., A), lower-case bold letters for vectors (e.g., a), upper-case bold letters for matrices (e.g., A), and a calligraphic typeface for sets (e.g., A).

Table 1.1: Notation

Variable | Symbol
Set of users | U
Set of commands | I
Set of neighbors of vertex v | G_v
Set of neighbors of vertex v's neighbors | Ĝ_v
Set of commands used (discovered) by user u | I_u
Set of sessions of user u | S_u
List of recommendations for user u | R_u
Entire command usage history of the users | Q
Command usage history of user u | Q_u
Train set of user u | Q_u^train
Test set of user u | Q_u^test
Train set of user u in link prediction, consisting of positive and negative examples | A_u^train
Test set of user u in link prediction, consisting of positive and negative examples | A_u^test
Vector of feature weights | w
Vector of features for instance ins | x_ins
Session weights for user u | w_u
Number of commands in a user's testing set using k-tail | k
Corresponding class of instance ins | y_ins
Elapsed time for user u | e_u
Edge weight between vertices u and c | f_u,c
Session chunk threshold | θ
Decay rate | λ
Number of recommendations | N
Inverse of regularization strength | C
Number of instances in link prediction training set | m
Command session matrix for user u | N_u
Command co-occurrence frequency matrix | CF
Command co-occurrence probability matrix | CP
Command discovery frequency matrix | DF
Command discovery probability matrix | DP

1.8 Thesis Outline

The remainder of the thesis is organized as follows: Chapter 2 presents a brief description of the most important research related to our work, Chapter 3 describes our investigation of link prediction, Chapter 4 describes our investigation of the CoDis algorithm, Chapter 5 reports on the experiments performed, Chapter 6 outlines drawbacks and advantages of our approaches, and finally Chapter 7 concludes with a summary and a description of possible future directions.

Chapter 2

Related Work

In this chapter, we summarize related work for increasing users' awareness of commands in
feature-rich applications. We also provide background on link prediction and discuss efforts relevant to the CoDis algorithm.

2.1 Increasing Users' Awareness of Commands

Prior work on supporting users' awareness of features provided by feature-rich applications can be classified into two dominant forms: tutorials and recommendation. We describe each of these bodies of work in the following subsections.

2.1.1 Tutorials for Application Usage

Users intentionally search through tutorials when they need assistance with a task. Prior work on the roles of web tutorials for feature-rich applications [26] has shown that users generally expect two types of help from tutorials: (1) some users expect help from tutorials in the midst of performing a task that they do not know how to complete, and (2) some users expect tutorials to pro-actively expand their skills. The latter users prefer to learn technology that could be used in later tasks, whereas the former prefer to learn it when it is required. However, if a user is not aware of an existing feature, they might never inquire about it. Moreover, some applications continuously upgrade and provide more features, which makes it impossible for most users to know of the existence of all the features provided. Thus, recommender systems are required to help increase users' awareness of features.
In this thesis, we extend the findings in [26] and conjecture that users are more willing to accept in-task recommendation help when they are in the midst of performing a task (have a small elapsed time), and are more willing to accept recommendations that expand their repertoire of skills when they are not in the midst of performing a task (have a large elapsed time). The CoDis recommendation algorithm we propose produces different recommendations for users based on their elapsed time.

2.1.2 Recommendation Techniques for Application Usage

Human-to-Human Recommendation

One of the most influential ways to learn how to perform a software-related task is to learn from a peer performing the task [43]. This phenomenon is called Over the Shoulder Learning (OTSL) [55]. However, it has been shown that this method occurs less frequently than other methods for discovering new tools in feature-rich environments [43]. This difference may be due to the fact that peers are not always available to help or may not know how to perform a task.

Murphy-Hill and colleagues [42] introduced Continuous Social Screencasting to overcome the availability requirements of OTSL. This approach continuously and automatically records video clips of software developers working on tasks. The developers can share the recorded videos with other developers to demonstrate how to use a tool. However, this method does not overcome all availability requirements, since software users will not have access to videos that are not shared with them. Moreover, Lubick and colleagues [35] conducted a field study which showed that the learning effects were not as effective as OTSL.

Another effective method for learning tools from peers is remotely, through online programming question-and-answer forums like Stack Overflow and question-and-answer forums embedded within interfaces, like IP-QAT [38]. Through these communities, people have access to the knowledge and expertise of their peers.
However, even with this method, questions are not guaranteed to be answered on time. If an answer to a question does not already exist, then the user is required to post the question and might receive a late response. Even more critically, if users are unaware of the existence of a tool, they are unlikely to intentionally search for supporting tools.

Web Documentation Recommender Systems

Khan and colleagues [22] explored web documentation as a data source to generate command recommendations for tasks at hand. They proposed a method that uses QF-Graphs [15], a bipartite graph which maps search queries (tasks) to commands (features) referenced in online web documentation. Based on a user's last x command usages, connected query nodes are activated based on their weights. This leads to an estimate of which queries are relevant to the user's current task. Based on the activated queries, the commands connected to those queries are then activated and used for command recommendation to the user. However, in [22], empirical experiments on data submitted by GIMP users show that the proposed method achieved lower predictive accuracy than previous work used in adaptive interfaces (e.g., frequency-based predictions [13, 16]) for users with stable usage patterns.

Inefficiency-based Recommender Systems

Inefficiency-based recommenders make recommendations when they detect inefficient behaviour of a user performing a task. The Lumiere [20] project, which ultimately became the Microsoft Office '97 Assistant (Clippy), was one of the first inefficiency-based recommender systems for software tools. The program watched a user's activity using a tool and inferred whether they needed help. Lumiere used Bayesian models to reason under uncertainty and to infer likely intentions of users. It made several assumptions about inefficient performance, such as time spent on portions of a document, to obtain the model.

Another inefficiency-based recommender is Spyglass [56].
Spyglass aims to increase users' awareness of navigation tools in the IBM Rational Team Concert environment, which is based on Eclipse. Based on the assumptions that recent events contribute more to a developer's current goal and that the selection of information elements may imply navigation, Spyglass tracks up to seven of the most recent navigation events in a window of recent activity. It uses the window of recent activity to infer when a developer's selections may indicate a navigation between two pieces of information. It computes the proximity between the first event in the window and events farther along. Then it compares the proximity number to the optimal number of information elements needed to reach each event using different tools. If a tool is found that may be a more efficient means to support navigation, it adds the tool to a set of potential recommendations. When the window of recent activity is full, it recommends all the unused tools from the set of potential recommendations to the user.

A limitation of the above recommender systems is the requirement of expert knowledge to precondition each command and activity for recommendation. Recommender systems which do not need pre-programming have also been explored. OWL [33] is a recommender system for Microsoft Word which intends to enhance the organization-wide learning of application software. OWL observes the command usage of a large number of users over a period of time. It makes recommendations to individuals to change their command usage behaviour by comparing a user's command usage to the average command usage across the organization. The system then makes a recommendation if a command is being under-utilized or over-utilized by an individual in comparison to the organization.

A common drawback of the aforementioned inefficiency-based recommenders is that they are not personalized for individuals.
Moreover, even though users' knowledge of tools changes over time, the same models are repeatedly used.

Neighborhood Collaborative Filtering Recommender Systems

In [27, 37, 44], collaborative filtering methods have been investigated in software environments to increase users' awareness of environment commands. These models address the task of IDE command recommendation by analysing implicit user-command interactions to detect common command usage patterns. Li, Matejka, and colleagues [27, 37] introduce CommunityCommands, a recommender system for AutoCAD. This system records users' command usage history and adapts the tf-idf [51] weighting scheme to commands. It then applies user-based and item-based collaborative filtering to make a recommendation list for each user.

Murphy-Hill and colleagues [44] build upon [27, 37] by modelling patterns of command discovery. They introduced the notion of discovery patterns in commands, which means that it is prevalent in the user community to discover some commands after others. They assumed that if a user uses command i and then discovers command j, the discovery pattern (i, j) occurs. However, they had several constraints on a command actually being discovered. First of all, they assumed all commands used in the first period of a user's activity (the base command usage window) are known to the user. Second, they ruled out commands which were used only once. By applying collaborative filtering techniques to users' discovery patterns, they showed that using discovery patterns for command recommendation can be more useful than only considering user-command interactions. They also presented a proof-of-concept study demonstrating the existence of prevalent discovery patterns in Eclipse. In this work, we base our second hypothesis on the discovery patterns of commands, following [44], and we incorporate it with our first hypothesis about co-occurrence patterns.
However, the definition of discovery patterns that we use differs from the definition presented by [44].

2.2 Link Prediction

The task of link prediction is to infer which interactions among members of a network are likely to occur in the near future, given a snapshot of the network [29]. The two most common types of networks are (1) unimodal networks, which can be represented as a homogeneous graph, where nodes represent users and links represent user ties such as co-authorship, and (2) affiliation networks, which can be represented as a heterogeneous bipartite network, where nodes represent users and items and links represent user-item ties such as purchases [60]. Two popular frameworks for solving link prediction problems are (1) the unsupervised learning framework and (2) the supervised learning framework; both take advantage of topological and non-topological features of users and items. Topological features are extracted from the structure of an input graph and include common metrics such as Adamic/Adar and Common Neighbors [29]. Non-topological features depend on metadata and attributes of the users and items. For example, in a co-authorship network [53], such attributes consist of keyword similarities, paper topic similarities, etc.

In unsupervised learning link prediction, connection scores, calculated from the features, are assigned to all pairs of nodes. Pairs are then ranked according to their scores and presented to users as recommendations. For example, Liben-Nowell and colleagues [29] apply unsupervised learning methods to a co-authorship network.
They explore common topological connection scores such as Adamic/Adar, Common Neighbors, Katz, and Shortest Path for ranking nodes and recommending potential collaborators to authors.

In supervised learning link prediction, node pairs are associated with a number of features and are typically classified into two classes using classification algorithms: existing links (positive class) and non-existing links (negative class). Given a new node pair and its corresponding features, the aim is to predict to which class it belongs. For example, Al Hasan and colleagues [2] apply a supervised learning framework to a unimodal co-authorship network. They extract features from metadata, such as keywords, in addition to topological features, and apply different classification algorithms, such as Decision Trees [40], to predict collaborations between authors.

In this work, we use an approach similar to [21], where existing features such as Common Neighbors are revised to be meaningful for a bipartite graph, to extract features from a user-command network. It has been shown that supervised learning approaches outperform unsupervised learning approaches [30], which leads us to use supervised learning for the task of command recommendation in this thesis.

2.3 Prediction of Next Item

Predicting the next item a user will use or visit has been studied in other domains, such as the news domain [17, 54]. Trevisiol and colleagues [54] showed that methods based on transition likelihoods can beat content-based methods, since users do not constantly read about the same topic. Garcin and colleagues [17] use the sequential nature of news reading to model it as a Markov process and generate recommendations based on transition probabilities.

In natural language processing tasks, n-gram models have been widely used to predict the next word. For example, Brown et al.
[8, 39] addressed the problem of predicting a word from the previous words in a text by using the frequency of co-occurrence of words in a document.

Our model uses bigram models in the command domain to obtain co-occurrence and discovery scores. However, our approach differs from previous work because we consider co-occurrence and discovery patterns simultaneously. Moreover, we take into account the elapsed time between a user's last activity and the time of recommendation as an indicator of their need, and use it as a weighting scheme for the co-occurrence and discovery scores.

Chapter 3

A Link Prediction Approach

In this chapter, we investigate the use of link prediction to address the command recommendation problem. We focus on a description of the approach and delay the detailed evaluation to Chapter 5. Our approach is built upon one hypothesis:

Hypothesis 1. In an integrated development environment, users with similar features tend to use the same commands, and also, a specific user tends to use commands with similar features.

Thereupon, user-command pairs in which the user has interacted with the command (positive instances) will have similar features, which distinguishes them from user-command pairs in which the user has not interacted with the command (negative instances). The positive instances belong to a positive class, whereas the negative instances belong to a negative class. Given a new user-command pair, we can predict which class the user-command pair belongs to based on its features.

From the command usage history of users, we can extract implicit topological features for users and commands based on user-command interactions. We map user-command interactions to a weighted bipartite user-command interaction graph, where the graph structure captures subtle information on relations between users and commands (Figure 3.1). We denote the bipartite user-command network as G = (U, I, F), where partitions U and I correspond to the user set U and command set I, respectively.
Also, F denotes the edges between the partitions, where the weight of edge f_{u,i} ∈ F is the number of times user u uses command i. For user u, let G^u be the set of all commands that they use. We define Ĝ^u = ∪_{i∈G^u} G^i as the set of users who use a common command with user u. Similarly, for a command i, let G^i be the set of all users who use it. We define Ĝ^i = ∪_{u∈G^i} G^u to be the set of commands that are used by a common user with command i.

We extract meaningful features based on the structure of the graph and pose the command recommendation problem as a link prediction problem: given a snapshot of a user-command network, we aim to infer which new interactions among users and commands are likely to occur in the near future. We study link prediction as a supervised learning task to predict whether missing links between users and commands will form (positive instance) in the future or not (negative instance).

Figure 3.1: A representation of a weighted user-command bipartite graph, where edges represent users' interactions with commands.

3.1 Feature Set

Choosing an appropriate feature set is a critical part of any supervised learning algorithm. For each user-command pair instance, we use a number of features that are extracted from the command usage history of users. We use an approach similar to [21] to extract meaningful measures from a bipartite graph. Each feature relates either to the user node, the command node, or both.
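To make the construction concrete, the weighted bipartite graph and the neighbor sets G^u, G^i, and Ĝ^u can be derived from a raw usage log with plain dictionaries. This is a minimal sketch, not the thesis implementation; the `usage_log` format (an iterable of (user, command) events) and the helper names are assumptions.

```python
from collections import defaultdict

def build_bipartite_graph(usage_log):
    """Build the weighted user-command bipartite graph G = (U, I, F).

    f[u][i] holds the edge weight f_{u,i}: how often user u used command i.
    """
    f = defaultdict(lambda: defaultdict(int))   # edge weights f_{u,i}
    g_user = defaultdict(set)                   # G^u: commands used by u
    g_cmd = defaultdict(set)                    # G^i: users of command i
    for user, command in usage_log:
        f[user][command] += 1
        g_user[user].add(command)
        g_cmd[command].add(user)
    return f, g_user, g_cmd

def hat_g_user(u, g_user, g_cmd):
    """Ĝ^u: users who share at least one command with user u."""
    if not g_user[u]:
        return set()
    return set().union(*(g_cmd[i] for i in g_user[u]))
```

From these three structures, each feature in Section 3.1 is a set size, a sum of weights, or a set intersection.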
In the following, we provide a description of all the topological features we used for link prediction in the user-command network.

• User Neighbors (UN): For user u, this feature measures the number of distinct commands that they use:

UN(u) = |G^u|    (3.1)

• Command Neighbors (CN): For command i, this feature measures the number of distinct users who use it:

CN(i) = |G^i|    (3.2)

• Preferential Attachment [21] (PA): For user u and command i, this feature relates the user's knowledge of commands to the command's popularity:

PA(u, i) = |G^u| · |G^i|    (3.3)

• User Neighbors Weighted (UNW): For user u, this feature measures the total number of times the user executes a command:

UNW(u) = Σ_{i∈G^u} f_{u,i}    (3.4)

• Command Neighbors Weighted (CNW): For command i, this feature measures the number of times it is used by the entire user community:

CNW(i) = Σ_{u∈G^i} f_{u,i}    (3.5)

• User Common Commands [21] (UCC): For user u and command i, this feature measures the number of commands user u uses in common with the users who use command i:

UCC(u, i) = |G^u ∩ Ĝ^i|    (3.6)

• Command Common Users [21] (CCU): For user u and command i, this feature measures the number of common users command i has with the commands used by user u:

CCU(u, i) = |Ĝ^u ∩ G^i|    (3.7)

• User Common Commands Jaccard's Coefficient [21] (UCCJ): For user u and command i, this feature measures the ratio of the number of commands user u uses in common with the users who use command i to the number of commands that are either used by user u or by the users who use command i:

UCCJ(u, i) = |G^u ∩ Ĝ^i| / |G^u ∪ Ĝ^i|    (3.8)

• Command Common Users Jaccard's Coefficient [21] (CCUJ): For user u and command i, this feature measures the ratio of the number of common users command i has with the commands used by user u to the number of users who either use command i or a command used by user u:

CCUJ(u, i) = |Ĝ^u ∩ G^i| / |Ĝ^u ∪ G^i|    (3.9)

• User Common Commands Adamic/Adar [21] (UCCAA): This feature refines User Common Commands and considers common commands with a lower degree more
important than common commands with a higher degree:

UCCAA(u, i) = Σ_{i′ ∈ G^u ∩ Ĝ^i} 1 / log|G^{i′}|    (3.10)

• Command Common Users Adamic/Adar [21] (CCUAA): This feature refines Command Common Users and considers common users with a smaller degree more important than common users with a higher degree:

CCUAA(u, i) = Σ_{u′ ∈ Ĝ^u ∩ G^i} 1 / log|G^{u′}|    (3.11)

• User Entropy [58] (UE): For user u, this feature measures the entropy of the user's interest in all commands. The broader a user's tastes and interests are, the more distinct commands they use. We assume that if a user uses a large number of commands, especially with equal probability, the uncertainty of the user tends to increase. Inversely, if a user uses only a few commands with unequal probability, the specificity of the user tends to increase. Using information theory, the command-based User Entropy of user u is defined as:

UE(u) = −Σ_{i∈G^u} p(i|u) log p(i|u)    (3.12)

where p(i|u) = f_{u,i} / Σ_{i∈G^u} f_{u,i}.

3.2 Classification Algorithms

There exists a proliferation of supervised learning classification algorithms. In this thesis, we use six classification algorithms to predict whether missing links in a user-command graph will form in the future or not. In the following, we give a brief description of the algorithms. We assume m is the number of training instances, w is a vector of feature weights, x_ins is a vector containing all features of instance ins, y_ins is the corresponding class of instance ins, and C is the inverse of the regularization strength:

• Decision Tree [40]: uses simple decision rules, inferred from data features, to recursively partition the data. It has a multi-stage approach, where at each stage of the procedure a function such as information gain or Gini impurity [49] is used to process the data.

• Random Forest [4]: is an ensemble learning method that uses a collection of decision trees for training.
For a given instance, each tree casts a vote on its class, and the random forest outputs the mode of the individual trees' votes.

• Logistic Regression [41]: is a direct probability model for classification. As an optimization problem, a binary-class L2-penalized logistic regression minimizes the following cost function:

min_{w,c} C Σ_{ins=1}^{m} log( exp(−y_ins (x_ins^T w + c)) + 1 ) + (1/2)||w||^2    (3.13)

• K-Nearest Neighbors [7, 40]: classifies instances based on the corresponding classes of the closest K training instances in the feature space. The algorithm uses a function such as the inverse of the Euclidean distance or the Minkowski metric [7] to weight the neighbors of each instance.

• Support Vector Machine (SVM) [41]: maps instances in space to classes by widening a margin as wide as possible between points of different classes. As an optimization problem, a linear binary-class SVM minimizes the following cost function:

argmin_w C Σ_{ins=1}^{m} max(0, 1 − y_ins w^T x_ins) + (1/2)||w||^2    (3.14)

In addition to linear classification, SVMs can perform non-linear classification using kernel tricks such as a radial basis function (RBF) or a polynomial kernel.

• Naïve Bayes [41]: is based on Bayes' theorem with the naïve assumption of independence between every pair of features. The classifier uses the following function to assign positive and negative classes to instances:

y_ins = argmax_{class∈{+,−}} p(y_class) Π_{j=1}^{d} p(x_{ins,j} | y_class)    (3.15)

where d is the number of features. A Naïve Bayes classifier can use different likelihood functions, such as the Gaussian distribution or the Bernoulli distribution.

We investigated Decision Trees because they are interpretable and easy to explain, whereas other algorithms, such as Logistic Regression, are less intuitive. However, decision trees can easily overfit the training data; therefore, we considered an ensemble method, Random Forest, to avoid overfitting. We considered Logistic Regression because, unlike Random Forests and Decision Trees, the model can easily be updated to take in new data.
Moreover, Logistic Regression inherently produces useful probabilities, which can be used for ranking recommendations. We investigated SVM because, with a proper kernel, it can work well even if the data is not linearly separable based on the proposed features. However, SVMs are very slow to tune compared to other classification algorithms. We considered Naïve Bayes because its time complexity is linear in the size of the data and therefore optimal. Also, Naïve Bayes decisions are interpretable.

Each classifier can predict a probability for a user-command pair forming a link (described in Section 5.3.1). For each user, we sort their incident edges based on these probabilities and make a top-N recommendation list of the corresponding unused commands.

Chapter 4

CoDis

In this chapter, we investigate our proposed algorithm, CoDis, to address the command recommendation problem. The detailed evaluation of the algorithm is presented in Chapter 5. CoDis is built upon three hypotheses:

Hypothesis 2. In an integrated development environment, frequent command co-occurrence patterns exist.

Hypothesis 3. In an integrated development environment, frequent command discovery patterns exist.

Hypothesis 4. If a user's elapsed time is relatively small, they are more likely to be working on their most recent work and would prefer a recommendation based on co-occurrence patterns. As their elapsed time increases, they are more likely to be working on a new task and would therefore prefer a recommendation based on discovery patterns.

To distinguish between co-occurrence and discovery patterns, we divide a user's command usage history into chunks called sessions. To acquire sessions, we define a tunable session chunk threshold θ, determined by cross-validation, and assume that if the user resumes work after a gap t ≥ θ, they have started a new session. Otherwise, we assume that the user is carrying on with their previous session.
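This session segmentation can be sketched in a few lines, assuming each user's history is available as time-sorted (timestamp, command) pairs; the function name and input format are illustrative, not from the thesis code.

```python
def split_sessions(events, theta):
    """Split one user's timestamped command history into sessions.

    `events` is a time-sorted list of (timestamp, command) pairs; a gap
    of at least `theta` time units starts a new session.
    """
    sessions, last_ts = [], None
    for ts, cmd in events:
        if last_ts is None or ts - last_ts >= theta:
            sessions.append([])   # gap of at least theta: new session
        sessions[-1].append(cmd)
        last_ts = ts
    return sessions
```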
We denote by S^u the set of sessions of user u.

In Sections 4.1 and 4.2, we describe how co-occurrence patterns and discovery patterns, respectively, can be used to compute scores for user-command pairs. In Section 4.3, we propose a framework to combine the scores based on users' elapsed time. In Section 4.4, we combine the scores irrespective of elapsed time.

4.1 Co-occurrence Score (CScore)

To perform a task in a software environment, the user needs to execute commands. However, commands cannot be executed arbitrarily to complete a task; for each task, a set of specific commands needs to be executed. Additionally, there might be multiple ways to perform a single task. For example, for the task of moving text from one place to another in the Eclipse IDE, one user might issue the commands cut and paste consecutively, whereas another user may instead execute the copy, paste, and delete commands. Thus, in a development environment, there are sets of frequently co-occurring commands:

Definition 1. [Command Co-occurrence Pattern] A command co-occurrence pattern (i, j) is defined as a user executing command i then j, or j then i, sequentially in a single session.

Hypothesis 2. In an integrated development environment, frequent command co-occurrence patterns exist.

We use a symmetric matrix CF ∈ R^{|I|×|I|} to record the co-occurrence frequencies of commands. Every time a pair of commands (i, j) is executed consecutively in a single session, the corresponding frequencies, i.e., cf_{i,j} and cf_{j,i}, each increase by one. For example, if a user executes the commands i, j, k, i, and j sequentially in a single session, cf_{i,j} and cf_{j,i} increase by two, while cf_{j,k}, cf_{k,j}, cf_{k,i}, and cf_{i,k} increase by one. We then use these co-occurrence frequencies to obtain the probability of a command j occurring with command i in a session. We record these probabilities in the command co-occurrence probability matrix CP ∈ R^{|I|×|I|}.
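The counting and normalization just described can be sketched with nested dictionaries standing in for the CF and CP matrices; this is a hedged illustration, and the worked example above (i, j, k, i, j) is reproduced in the comments.

```python
from collections import defaultdict

def cooccurrence_counts(sessions):
    """Count consecutive-command co-occurrences over sessions (matrix CF).

    `sessions` is an iterable of command lists, one per session; CF is
    symmetric, so both cf[a][b] and cf[b][a] are incremented.
    """
    cf = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for a, b in zip(session, session[1:]):   # consecutive pairs
            cf[a][b] += 1
            cf[b][a] += 1
    return cf

def cooccurrence_probs(cf):
    """Row-normalize CF into CP; the resulting matrix is asymmetric."""
    return {i: {j: n / sum(row.values()) for j, n in row.items()}
            for i, row in cf.items()}
```

For the session [i, j, k, i, j], the pairs (i, j), (j, k), (k, i), (i, j) are counted, so cf[i][j] = cf[j][i] = 2 and the remaining entries are 1, matching the example in the text.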
Here, cp_{i,j} represents the probability of command j occurring with command i:

cp_{i,j} = cf_{i,j} / Σ_{k=1}^{|I|} cf_{i,k}    (4.1)

Note that this matrix is asymmetric.

Moreover, for each user u, we use the user's command usage history to derive a command session matrix N^u ∈ R^{|S^u|×|I|}, where n^u_{s,j} is the normalized frequency count of command j in session s for user u. Hence, we have:

Σ_{j=1}^{|I|} n^u_{s,j} = 1    (4.2)

A user's recent command usage is indicative of the task they are working on. Since the same command can be used for many tasks, we cannot determine exactly what the user is working on. However, using their previous command usage history and the above co-occurrence probability matrix, we can estimate the commands they would most likely use to complete their task. Therefore, for any user u and unused command i, we estimate a co-occurrence score using the following equation:

CScore(u, i) = Σ_{s=1}^{|S^u|} w^u_s Σ_{j=1}^{|I|} n^u_{s,j} cp_{j,i}    (4.3)

where w^u ∈ R^{|S^u|} is a weight vector that defines the importance of user u's sessions. In our setting, we assume only the user's last session is important:

w^u_s = 1 if s = |S^u|, and 0 otherwise.    (4.4)

4.2 Discovery Score (DScore)

When users begin using a development environment, they start by using specific commands, and as they become more fluent they discover more and more commands [44]. For example, in an IDE such as Eclipse, most people expect that a save command exists even before they have worked with it. But as time passes, they start learning more commands. One might next learn a basic command such as rename and then a more sophisticated version control command such as compareWithRevision [44]. In this work, we define command discovery patterns in development environments as:

Definition 2. [Command Discovery Pattern] We assume users discover a command when they first execute the command.
A command discovery pattern (i, j) occurs when a user first discovers command i in a session and then discovers command j in a later session, with no other command being discovered in between.

Hypothesis 3. In an integrated development environment, frequent command discovery patterns exist.

We use an asymmetric matrix DF ∈ R^{|I|×|I|} to record the frequencies of discovery patterns in the user community, where df_{i,j} is the frequency of the discovery pattern (i, j). Note that (i, j) is a discovery pattern only when i and j are discovered consecutively and in two distinct sessions. The latter constraint distinguishes discovery patterns from co-occurrence patterns; two commands that are discovered in the same session are likely to be co-occurring. For example, consider a user who discovers commands i and j in their first session, reuses j in their second session, and reuses i and discovers k in their third session. Then only df_{j,k} increases by one, because j and k were discovered consecutively in two distinct sessions. df_{i,j} does not increase, because i and j were discovered in the same session. Also, df_{i,k} does not increase, because they were not discovered consecutively. We then use DF to derive the command discovery probability matrix DP ∈ R^{|I|×|I|}. Note that this matrix is also asymmetric. Here, dp_{i,j} represents the probability of command j being discovered after command i in the entire user community:

dp_{i,j} = df_{i,j} / Σ_{k=1}^{|I|} df_{i,k}    (4.5)

For user u and unused command i, based on the user's command discovery patterns and the command discovery pattern probabilities of the entire user community, we use the following equation to obtain the probability of the user discovering the command:

DScore(u, i) = ( Σ_{j∈I^u} dp_{j,i} ) / |I^u|    (4.6)

where I^u is the set of discovered commands of user u.
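Equation 4.6 is a simple average over the user's discovered commands; a minimal sketch, assuming the community probabilities dp_{j,i} are stored in a nested dict (the names `discovered` and `dp` are illustrative, not from the thesis code):

```python
def discovery_score(discovered, dp, candidate):
    """DScore of Equation 4.6 for one user and one unused command.

    `discovered` is the user's set of discovered commands (I^u); `dp[j][i]`
    holds the community probability of discovering i after j.
    """
    if not discovered:
        return 0.0   # no discoveries yet: no evidence either way
    return sum(dp.get(j, {}).get(candidate, 0.0) for j in discovered) / len(discovered)
```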
The more prevalent the discovery patterns (j, i) are in the command usage history of the entire user community, where j is any command in the set of discovered commands of user u, the higher DScore(u, i) will be. In other words, if user u uses commands that are frequently discovered before command i by the entire user community, then DScore(u, i) will be high compared to when the user uses commands that are not frequently discovered before command i.

4.3 Combining Co-occurrence and Discovery Scores Based on Elapsed Time (CoDis)

Users have different requirements in different circumstances. When a user is busy with a task, they prefer recommendations relevant to that specific task. However, if they are just about to begin a new task, they may prefer more general recommendations. Therefore, the user's current status should be considered when making recommendations. To achieve this, we first define the user's elapsed time:

Definition 3. [Elapsed Time] For each user u, we define e_u to be the elapsed time between the user's last activity and the time a recommendation is generated.

Hypothesis 4. If a user's elapsed time is relatively small, they are more likely to be working on their most recent work and would prefer a recommendation based on co-occurrence patterns. As their elapsed time increases, they are more likely to be working on a new task, and would therefore prefer a recommendation based on discovery patterns.

Using elapsed time as an indicator of the user's recommendation need, and relying on the aforementioned hypothesis, we compute the final score of a user-command pair (u, i) considering both the co-occurrence score CScore(u, i) and the discovery score DScore(u, i).
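Once both scores are available, the final ranking step can be sketched as follows. This is a hedged illustration: the `cscore`/`dscore` callables stand in for the quantities defined in Sections 4.1 and 4.2, and the decay lam / elapsed is one plausible form of the elapsed-time weighting developed in this section.

```python
def recommend_top_n(user_cmds, all_cmds, cscore, dscore, elapsed, lam, n):
    """Rank a user's unused commands by a combined CoDis-style score.

    The co-occurrence score is damped as the elapsed time since the
    user's last activity grows past the tuning parameter lam.
    """
    decay = lam / elapsed if elapsed > 0 else 1.0   # illustrative guard for elapsed == 0
    scores = {c: cscore(c) * decay + dscore(c)
              for c in all_cmds if c not in user_cmds}   # only unused commands
    return sorted(scores, key=scores.get, reverse=True)[:n]
```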
Specifically, we combine the scores using a linear combination that gradually decays the influence of the co-occurrence score as e_u increases:

CDScore(u, i) = CScore(u, i) · f(e_u) + DScore(u, i)    (4.7)

We define f(e_u) = λ / e_u, where λ is a tuning parameter that controls the influence of CScore and is determined by cross-validation. Based on this intuition, if the user's elapsed time is less than λ, CoDis generates recommendations that weight CScore more than DScore; if the user's elapsed time is more than λ, it generates recommendations that weight DScore more than CScore. As the user's elapsed time increases, the effect of CScore decreases.

For each user u and each unused command, we compute the above score and take the top-N unused commands as the recommendation list for the user. We refer to this algorithm as the CoDis model throughout this work.

4.4 Combining Co-occurrence and Discovery Scores Irrespective of Elapsed Time (Co+Dis)

In addition to CoDis, we derived another model, Co+Dis, where the score of a user-command pair is computed irrespective of the user's elapsed time:

C+DScore(u, i) = CScore(u, i) + DScore(u, i)    (4.8)

We derive this model to compare its accuracy with CoDis in Chapter 5 and to show that weighting the co-occurrence and discovery scores by elapsed time in CoDis is effective. We refer to this algorithm as Co+Dis throughout this work.

Chapter 5

Experiments

5.1 Data Description

To test the efficiency of our algorithm, we conducted offline experiments on data collected by the Eclipse Usage Data Collector (UDC). This dataset contains information voluntarily submitted by about 1 million Eclipse users in 2009. It has more than 1.3 billion events, including command invocations, which are tagged with user identifiers and timestamps. We filtered the dataset to consider only command invocations. This reduced the dataset to 897,714 users who executed 1,098 distinct commands a total of 315,048,551 times.
Also, similar to [44], we chose to experiment only on users who were active during the entire year, to increase the chances of obtaining actual discovery patterns. This led to a final dataset consisting of 4,308 users who used 700 distinct commands a total of 22,926,392 times.

We used this Eclipse dataset of 4,308 users and 700 commands to conduct empirical experiments. We used four different subsets of the data, consisting of 3 months, 6 months, and 9 months of users' recent command usage history, and their entire command usage history, to see how recommendation accuracy, i.e., recall, changes if we discard older data. We refer to these subsets as subset (a), subset (b), subset (c), and subset (d), respectively. The k-tail approach (described in Section 5.2) discarded users in (a), (b), (c), and (d) who had not discovered more than one command during the corresponding time. This reduced the dataset to 4,178 users and 480 commands in (a), 4,292 users and 638 commands in (b), 4,301 users and 676 commands in (c), and 4,305 users and 700 commands in (d).

5.2 Training, Validation, and Testing Sets

We used the k-tail approach as described in [27, 37, 44] to split each user's command usage history into a training set and a testing set. For each user, the testing sequence contains the last k commands that the user has discovered. The training set contains all the commands the user used before the first command in the testing set was used. For example, assume the user has a command usage history Q^u = {commit, rename, commit, openResource, synchronize, rename}. Note that the last command discovery is synchronize. Now, if k = 1, the testing set would be Q^u_test = {synchronize}, which is the last discovered command. The training set contains all the commands used before synchronize: Q^u_train = {commit, rename, commit, openResource}. The last command, rename, has been discarded because it was used after the last command discovery.
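The k-tail split can be sketched directly from this description; the function below reproduces the worked example for k = 1 (the function name and return convention are illustrative, not from the thesis code).

```python
def k_tail_split(history, k=1):
    """Split one user's chronological command list with the k-tail approach.

    The test set is the user's last k *discovered* (first-use) commands;
    the training set is everything used before the first of those
    discoveries. Returns None when there are too few discoveries to split.
    """
    first_uses, seen = [], set()
    for idx, cmd in enumerate(history):
        if cmd not in seen:          # a discovery: first execution of cmd
            seen.add(cmd)
            first_uses.append(idx)
    if len(first_uses) <= k:
        return None
    cut = first_uses[-k]             # position of the k-th last discovery
    return history[:cut], [history[i] for i in first_uses[-k:]]
```

On the example history {commit, rename, commit, openResource, synchronize, rename}, the trailing rename falls after the last discovery and is dropped from both sets, as described above.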
To enable comparisons to earlier work [27, 37, 44], we use k = 1. This value is reasonable, as our goal is to teach commands to each user one at a time.

For algorithms that need tuning to avoid overfitting the data, we used a validation set. For all algorithms except the classification algorithms, we obtained the validation set by applying the k-tail approach with k = 1 to each user's training set, further splitting it into a training set and a validation set.

The link prediction classifiers require negative examples in addition to positive examples. However, in the Eclipse dataset, user-command pairs are not explicitly classified. We assumed that if a user has used a command, then the command was useful to them. Therefore, we classified a user-command pair as a positive example if the user uses the command at least once, and as a negative example otherwise. We partitioned all positive examples into two non-overlapping sets, a training set and a testing set, using the k-tail approach described previously. To partition the negative examples, user-command pairs with a missing link were randomly split into two sets, with the constraint that each user's training set contained the same number of positive and negative examples. Hence, the training set was balanced, with the same number of positive and negative examples, whereas the testing set contained more negative examples than positive examples. For example, let I = {commit, rename, extractMethod, openResource, synchronize, sync, showHistory, format, copy, paste}, and Q^u = {commit, rename, commit, openResource, synchronize, rename}. As explained above, for user u, Q^u_train = {commit, rename, commit, openResource} and Q^u_test = {synchronize}. We extract three positive instances, (u, commit), (u, rename), and (u, openResource), for the training set and one positive instance, (u, synchronize), for the testing set.
The negative instances are extracted based on u's unused commands and consist of (u, extractMethod), (u, sync), (u, paste), (u, showHistory), (u, format), and (u, copy). We randomly split the negative examples between the training set and testing set such that the training set also contains three negative examples. A possible division could be (u, sync), (u, paste), and (u, showHistory) as negative instances for the training set, and (u, extractMethod), (u, copy), and (u, format) as negative examples for the testing set. We denote the training set and testing set obtained with this approach, consisting of both positive and negative examples, as A^u_train and A^u_test.

For the classification algorithms, we used grid search to obtain a validation set (described in Section 5.5).

5.3 Algorithms & Implementation

5.3.1 Link Prediction Algorithms

We implemented the link prediction approach in Python. We first extracted the features described in Chapter 3 from the command usage history of users and then standardized all the features using scikit-learn [45], a machine learning library for Python. If a feature has much larger variance than the others, it might dominate the objective function of the estimator. Hence, we applied standard scaling to the features so that they have zero mean and unit variance, and then applied the same transformation to the testing set.

To evaluate the performance of the features described in Chapter 3, we experimented with six classification algorithms: Decision Tree, Random Forest, Logistic Regression, K-Nearest Neighbors, SVM, and Naïve Bayes. The classification algorithms were also implemented with scikit-learn. Each classifier can predict a probability of a user-command pair forming a link (positive class) or not (negative class). We used the probability methods offered by scikit-learn to obtain class probabilities for each link. Naïve Bayes and Logistic Regression directly produce probabilities.
In K-Nearest Neighbors, the probability of a class for a user-command pair is the fraction of training examples of the same class among the link's neighbors. In Decision Trees, the probability of a class for a user-command pair is the fraction of training samples of the same class in a leaf. In Random Forests, the probability of a class is the mean of the predicted class probabilities of the trees in the forest. Also, SVM can predict probabilities using Platt scaling, which is Logistic Regression on the SVM's scores [31]. For each user, we sorted their incident edges based on the probabilities and made a top-N recommendation list of the corresponding unused commands.

5.3.2 CoDis & Variants

In addition to CoDis, we implemented three other variants of our approach to make recommendations. We individually used the Co-occurrence Score (CScore) and Discovery Score (DScore), introduced in Sections 4.1 and 4.2, to generate recommendations for users. Moreover, we derived another model, Co+Dis, described in Section 4.4, that combines CScore and DScore irrespective of elapsed time. We implemented all the aforementioned algorithms in Python.

5.3.3 Baselines

We compared all the link prediction algorithms, CoDis, Co+Dis, CScore, and DScore against eight baselines, including neighborhood collaborative filtering algorithms that have previously been used for the task of command recommendation [27, 37, 44] and latent factor models that we additionally explore:

• Non-Personalized Methods: include the (#1) Most Popular [44] and (#2) Most Widely Used [27, 33, 37, 44] algorithms, which have been used previously in the context of command recommendation. The Most Popular algorithm recommends the commands that are most used by the community.
The Most Widely Used algorithm is an extension of Most Popular that ignores repeated use of a command by a user and recommends the commands that are used by the largest number of users in the community.

• Neighbourhood Models: include the standard (#3) User-based [1] and (#4) Item-based [48] collaborative filtering algorithms, also used in the context of command recommendation in [27, 37]. In the former algorithm, similarities between users are calculated based on their command usage history, and recommendations to users are made based on the commands that similar users use. In the latter algorithm, similarities between commands are calculated based on the users who use them, and recommendations to users are made based on the commands that are similar to the ones they have already used.

• Collaborative Filtering with Discovery: includes the (#7) User-based Discovery and (#8) Item-based Discovery algorithms, which were proposed in [44] for the task of command recommendation. These methods model users' discovery patterns of commands and apply standard neighborhood models to the discovery patterns.

• Latent Factor Models: are another primary area of collaborative filtering, which explains ratings based on latent factors [25]. To the best of our knowledge, latent factor models have not previously been used for command recommendation. In this work, we used two matrix factorization techniques: (#5) Singular Value Decomposition (SVD) [47] and (#6) Adaptive Gradient (ADAGRAD) [11]. SVD is a technique for dimensionality reduction that decomposes a matrix into three lower-dimensional subcomponents and then reconstructs it for making recommendations. ADAGRAD is a subgradient method for solving the regression problem in matrix factorization; it dynamically adapts the algorithm to the geometry of the data to decompose a matrix into user and item latent factors.

We implemented the Most Popular, Most Widely Used, User-based collaborative filtering, User-based Discovery, and SVD algorithms in Python.
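For illustration, the two non-personalized baselines differ only in whether repeated uses by the same user are counted; a minimal sketch (the `usage` event-list format is an assumption, not the thesis implementation):

```python
from collections import Counter

def most_popular(usage, n):
    """Most Popular baseline: commands with the most total invocations.

    `usage` is a list of (user, command) events.
    """
    counts = Counter(cmd for _, cmd in usage)
    return [cmd for cmd, _ in counts.most_common(n)]

def most_widely_used(usage, n):
    """Most Widely Used baseline: commands used by the most distinct users."""
    counts = Counter(cmd for _, cmd in set(usage))   # dedupe (user, cmd) pairs
    return [cmd for cmd, _ in counts.most_common(n)]
```

The two rankings can disagree: a command hammered by one user tops Most Popular, while a command touched once by many users tops Most Widely Used.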
For Item-based, Item-based Discovery, and ADAGRAD, we used GraphLab [34], a high-performance machine learning toolkit.

In the baseline algorithms, similar to [37], we used the tf-idf weighting scheme [51] to represent the importance of commands for users. For a user-command pair, this weighting scheme has two components: the command frequency and the inverse user frequency. The command frequency represents the importance of a command i to user u, and is defined as the proportion of times command i is used by u, denoted cf(u, i). The inverse user frequency of command i, denoted iuf(i, U), describes the discriminative power of the command in the user community and helps to control for the fact that some commands are generally more common. Altogether, the cfiuf weight for a command is:

cfiuf(u, i, U) = cf(u, i) · iuf(i, U)
             = ( Σ_{s=1}^{|S^u|} n^u_{s,i} / Σ_{j=1}^{|I|} Σ_{s=1}^{|S^u|} n^u_{s,j} ) · log( |U| / |{u ∈ U : i ∈ I^u}| )    (5.1)

where the notation is explained in Table 1.1.

For the algorithms that we implemented, such as Most Popular, Most Widely Used, User-based collaborative filtering, and SVD, we stored the cfiuf values in a |U| × |I| matrix to apply the algorithms.

5.4 Evaluation Metric

A common approach to evaluating recommender systems that use implicit data is through accuracy metrics such as recall and precision [6]. These metrics have been widely used in other domains where explicit preference judgements do not exist [23, 54]. For each user, we make a recommendation list of size N to predict their next command discovery. We use recall to evaluate performance accuracy:

Recall = Σ_{u=1}^{|U|} |R^u ∩ Q^u_test| / |U|    (5.2)

where R^u is defined as the recommendation list for user u, Q^u_test is the testing set for user u, and |U| is the total number of users in the community.
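A hedged sketch of the cfiuf weight of Equation 5.1, computed from per-user raw counts rather than the session matrices of Chapter 4 (the `freq` and `users_of` structures and the function name are illustrative):

```python
import math

def cfiuf(u, i, freq, users_of, n_users):
    """cfiuf weight for user u and command i (cf. Equation 5.1).

    `freq[u][i]` holds u's usage count of command i, `users_of[i]` is the
    set of users who used i, and `n_users` is the community size |U|.
    """
    cf = freq[u][i] / sum(freq[u].values())      # command frequency cf(u, i)
    iuf = math.log(n_users / len(users_of[i]))   # inverse user frequency iuf(i, U)
    return cf * iuf
```

Note that a command used by every user gets iuf = log(1) = 0, so its cfiuf weight vanishes regardless of how often one user invokes it; this is the discriminative-power effect described above.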
For k = 1, Precision = Recall/N, and the precision diagram would maintain the same relative ordering as the recall diagram; hence, we do not show the precision diagrams separately.

5.5 Parameters

For algorithms that need tuning to avoid overfitting the data, we used a validation set. For the classification algorithms, we first used the k-tail approach described in Section 5.2 to obtain a training set and a testing set. We then used the grid search offered by scikit-learn to split the training set into a training set and a validation set; grid search uses the validation set to avoid overfitting the data to the testing set.

For the other algorithms, we first used the k-tail approach described in Section 5.2 to obtain a training set and a testing set. We then used the k-tail approach with k = 1 on the users' training set to further split it into a training set and a validation set. As explained in Section 5.1, we conducted experiments on different lengths of all users' history. However, to be consistent, we only tuned the parameters on subset (d), which contains the entire command usage history of users who were active during the entire year. We then reused the parameters obtained from subset (d) on subsets (a), (b), and (c) to show the stability of the CoDis algorithm. The tuning led to the parameters shown in Table 5.1. For algorithms that depend on randomization, we used a seed equal to 100. Note that we only report the parameters that we tuned, not those for which we used the default values given by the scikit-learn and GraphLab libraries.

Table 5.1: Parameters used in algorithms.

Decision Tree, Random Forest: χ = information gain, σ = best, υ = 300. Here χ is the criterion that measures the quality of a split, σ is the strategy used to choose the split at each node, and υ is the number of trees in the forest. All parameters were found using exhaustive search over specified parameter values in subset (d).

Logistic Regression: C = 200, Ł = L2. C is the inverse of the regularization strength, and Ł is the norm used for penalization. Both parameters were found using exhaustive search over specified parameter values in subset (d).

K-Nearest Neighbors: K = 100, d = Euclidean, ω = inverse of distance. K is the number of neighbors considered for each node, d is the distance metric, and ω is the weight function for weighting neighbors. All parameters were found using exhaustive search over specified parameter values in subset (d).

SVM: κ = rbf, γ = 1/12, ϖ = true. κ is the kernel type, γ is the kernel coefficient, and ϖ enables probability estimates in scikit-learn. All parameters were found using exhaustive search over specified parameter values in subset (d).

Naïve Bayes: Γ = Gaussian. Γ is the distribution type; the algorithm performed best with the Gaussian distribution compared to other distributions in all subsets.

CoDis, Co+Dis, CScore, DScore: θ = 100, λ = 50. θ is the session chunk threshold in all algorithms, and λ is the decay rate in the CoDis model. Both parameters were found using exhaustive search over specified parameter values in subset (d).

User-based, Item-based, User-based Discovery, Item-based Discovery: µ = 32, β = 40, sf = cosine similarity. µ is the number of similar users or commands from which to draw recommendations, β is the base command usage window, which specifies the time within which users are assumed to know all the commands used up to that point, and sf is the function used to calculate similarities between users or commands. All parameters were specified in [44]. The algorithms performed best with the cosine similarity function compared to other functions in all subsets.

SVD: ∆ = 40. ∆ is the dimensionality used to reconstruct the factorized matrix. It was found using exhaustive search over specified parameter values in subset (d).

ADAGRAD: δ = 32. δ is the number of latent factors. It was given by the default settings of GraphLab, which also performed better than other settings on average.

The CoDis algorithm has two parameters that require tuning: θ, the session chunk threshold, and λ, the decay rate in the CoDis model. The CoDis algorithm produced stable results over different values of θ and λ, and its performance changed only gradually. We obtained θ = 100s and λ = 50 by tuning the parameters on the entire command usage history of all users.

For ADAGRAD we used the GraphLab library with its default settings. Since it depends on randomized methods, we repeated each run 100 times and report the average. For the User-based, Item-based, User-based Discovery, and Item-based Discovery methods we used the tuned parameters recommended by [37] and [44].

5.6 Experimental Results

Figures 5.1, 5.2, 5.3, and 5.4 show the performance of the classification algorithms against each other on subsets (a), (b), (c), and (d), respectively. In all figures, the recommendation list size N increases from 1 to 20 and k is set to 1. Among the classification algorithms, Random Forest performs the best, followed by Logistic Regression and K-Nearest Neighbors. Naïve Bayes performs fourth best, because the features are in fact correlated while Naïve Bayes assumes they are not. However, the algorithm does not need tuning and is very fast compared to the other algorithms. Surprisingly, SVM with the rbf kernel has lower accuracy than the Naïve Bayes algorithm. We conjecture that the low accuracy is the result of the inadequate parameter space we provided for the algorithm, which was itself the result of its very slow running time.
Since SVM with the rbf kernel has a very slow running time, we suggest that the other algorithms be used instead for the command recommendation problem. Decision Tree performs the worst because it overfits the training data; Random Forest corrects Decision Tree's overfitting and performs the best. The accuracy of classifiers such as Random Forest and Logistic Regression on the subsets shows that the features described in Section 3.1 are capable of distinguishing between missing and non-missing links, providing evidence for Hypothesis 1.

Figures 5.5, 5.6, 5.7, and 5.8 show the performance of the CoDis, Co+Dis, CScore, and DScore algorithms, all baseline algorithms, and the highest-performing classification algorithm, i.e., Random Forest, on subsets (a), (b), (c), and (d), respectively. In all figures, the recommendation list size N increases from 1 to 20 and k is set to 1. In Figures 5.5, 5.6, 5.7, and 5.8, we can see that CoDis outperforms all other algorithms for N = 3 to N = 20 on all four subsets. Co+Dis performs second best. As the figures demonstrate, CoDis always has a higher recall than Co+Dis. Therefore, considering the elapsed time of users as an indicator of the type of recommendation they require is effective and leads to better accuracy.

For N = 20, Random Forest performs third best in all subsets except (a), which is probably due to the lack of positive examples in subset (a). In subset (d), among the baseline algorithms, User-based Discovery performs the best for N = 1 to N = 15; however, for N = 15 to N = 20, ADAGRAD performs the best. Most Popular performs the worst in (b), (c), and (d); in (a), however, it performs as well as ADAGRAD.

In subset (d), User-based Discovery performs the best for N = 1 to N = 2. For the User-based Discovery and Item-based Discovery algorithms, Murphy-Hill and colleagues [44] used a base command usage window of size 40 sessions to discard false-positive discoveries.
As explained in Section 2.1.2, they assumed that a command discovery occurs only after the base command usage window and only if the command is used more than once. These requirements reduced subset (d) to 3724 users and 614 commands for the User-based Discovery and Item-based Discovery algorithms (Figure 5.8). In the other subsets, the number of users is reduced to fewer than 2600, and the number of commands also drops. As a result, we chose not to report the recall of User-based Discovery and Item-based Discovery on subsets (a), (b), and (c), as we did not consider that a fair comparison to the other algorithms.

Based on the users' entire command usage history (subset (d)), for the DScore and CScore models, we computed for which common users each model made a correct recommendation. Figure 5.9 demonstrates the Jaccard similarity of these user sets for the top-N recommendation task, where N changes from 1 to 20. The two sets of users have less than 35% overlap for all N. This indicates that the two algorithms produce different recommendations and are able to make correct recommendations for different sets of users (Figure 5.9).

Figure 5.1: Recall of link prediction classifiers in subset (a). Dataset consists of 4178 users and 480 commands.

Moreover, we analysed the elapsed time of the users of subset (d) for whom there was a correct recommendation from the CoDis, Co+Dis, CScore, and DScore algorithms. Figure 5.10 demonstrates their average elapsed time for the top-N recommendation task, where N changes from 1 to 20. In Figure 5.10, we can see that the average elapsed time for the CScore model is very low compared to the DScore model. This means that the CScore model is able to produce correct recommendations for users with small elapsed times, whereas the DScore model is able to make correct recommendations for users with large elapsed times. The CoDis algorithm, which takes elapsed time into account directly in the model, has a higher average elapsed time than Co+Dis.
However, both CoDis and Co+Dis converge to the average elapsed time over all users, which is 43.40 hours.

Figure 5.2: Recall of link prediction classifiers in subset (b). Dataset consists of 4292 users and 638 commands.

Figure 5.3: Recall of link prediction classifiers in subset (c). Dataset consists of 4301 users and 676 commands.

Figure 5.4: Recall of link prediction classifiers in subset (d). Dataset consists of 4305 users and 700 commands. Item-based Discovery uses 3724 users and 614 commands.

Figure 5.5: Recall of CoDis, Co+Dis, CScore, and DScore algorithms against baselines in subset (a). Dataset consists of 4178 users and 480 commands.

Figure 5.6: Recall of CoDis, Co+Dis, CScore, and DScore algorithms against baselines in subset (b). Dataset consists of 4292 users and 638 commands.

Figure 5.7: Recall of CoDis, Co+Dis, CScore, and DScore algorithms against baselines in subset (c). Dataset consists of 4301 users and 676 commands.

Figure 5.8: Recall of CoDis, Co+Dis, CScore, and DScore algorithms against baselines in subset (d). Dataset consists of 4305 users and 700 commands. User-based Discovery and Item-based Discovery use a dataset consisting of 3724 users and 614 commands.

Figure 5.9: Jaccard similarity of the users for whom the CScore and DScore algorithms were able to make a correct recommendation, in subset (d).

Figure 5.10: Average elapsed time of the users for whom the CoDis, Co+Dis, CScore, and DScore algorithms were able to make a correct recommendation, in subset (d).

5.7 Link Prediction Feature Comparison

One of our objectives was to compare the 12 features described in Section 3.1 to judge their relative strength in a link prediction task. Figures 5.11 and 5.12 provide a comparison of the 12 features for subset (d). They show the features' ranks of importance by information gain, as obtained by the Decision Tree and Random Forest algorithms, respectively.
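The information-gain criterion behind these importance rankings can be sketched directly (a toy illustration with synthetic binary features, not the actual features of Section 3.1):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """Entropy reduction from splitting `labels` on a discrete `feature`."""
    n = len(labels)
    split = {}
    for f, y in zip(feature, labels):
        split.setdefault(f, []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in split.values())
    return entropy(labels) - remainder

# Toy link-prediction instances: two binary features vs. a link label.
labels    = [1, 1, 1, 0, 0, 0]
feature_a = [1, 1, 1, 0, 0, 0]   # perfectly predictive of the label
feature_b = [1, 0, 1, 1, 0, 1]   # carries no information about the label
print(information_gain(feature_a, labels))  # -> 1.0
print(information_gain(feature_b, labels))  # -> 0.0
```

A feature with higher information gain separates missing from non-missing links better, which is exactly what the rankings in Figures 5.11 and 5.12 measure.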
As illustrated in Figures 5.11 and 5.12, Command Common Users Jaccard's Coefficient has the highest rank in the Eclipse dataset. In Figure 5.11, all other features have low information gain, whereas in Figure 5.12, all features have relatively high information gain. Because Random Forest corrects Decision Tree's overfitting, its ranking of attributes is more likely to be correct. Therefore, the features used in the link prediction problem act as better discriminators between missing and non-missing links when they are combined than when they are used individually.

Figure 5.11: Information gain of features based on the Decision Tree classifier, in subset (d).

Figure 5.12: Information gain of features based on the Random Forest classifier, in subset (d).

5.8 Summary of Results

Experiments on the Eclipse dataset show that, among the link prediction algorithms, Random Forest performs the best. Compared to the matrix factorization technique with the ADAGRAD solver [11], which has the best accuracy among the baselines, Random Forest obtained an improvement of 4.74% in terms of the recall performance measure for a top-N recommendation task with N = 20. Random Forest suggests that the features provided all have relatively high information gain, but that Command Common Users Jaccard's Coefficient has the highest.

The CoDis algorithm achieved significant performance improvements, in terms of recall, over all link prediction and baseline algorithms. Compared to the matrix factorization technique with the ADAGRAD solver, CoDis obtained an improvement of 10.22% in terms of the recall performance measure for a top-N recommendation task with N = 20. Based on the CoDis algorithm, the results suggest that taking into account the elapsed time between a user's last activity and the time of recommendation can be useful for producing more accurate recommendations.

Chapter 6

Discussion

In this chapter, we discuss some disadvantages and advantages of our approaches.
A salient limitation of the link prediction approach is that not all features were explored. Other similarity features in the user-command network, such as hitting time [29], which is the expected number of steps required for a random walk starting at one node to reach another node, can be investigated to measure the similarity between a user node and a command node. Other user features, such as skewness [10], which measures whether a user has used many commands with a skewed distribution, can also be explored.

A limitation common to the link prediction approach and the CoDis approach is that we considered a user-command pair in which the user has used the command at least once to be a positive example, and a negative example otherwise. However, a user may use a command several times and later realise they do not like it, or a user might actually find a command useful but never use it due to unawareness of its existence. Hence, categorizing user-command pairs as positive or negative examples should be handled more carefully. For example, Murphy-Hill and colleagues [44] use concepts such as k-tail multi-use, which requires that a command be used multiple times by a user to be considered a positive example, to categorize user-command pairs. Similar approaches can be applied to both the link prediction algorithms and the CoDis algorithm to improve the quality of recommendations for users.

One drawback of the CoDis algorithm is the difficulty of identifying co-occurrence and discovery patterns in datasets. Since commands can be executed in any order, two co-occurring commands may be executed far apart in time, even in different sessions. Also, two commands that are discovered in the same session might actually form a discovery pattern. Distinguishing between these situations is even harder in datasets that do not provide explicit user sessions. For a co-occurrence pattern to be missed, the users would have to use the co-occurring commands far apart most of the time.
If a command has been used very rarely by the users, and also far apart, then it is likely to be missed as a co-occurring pattern. Similarly, if the two commands of a discovery pattern are discovered in the same session, or are not discovered consecutively, then the pair is likely to be missed as a discovery pattern.

Moreover, we assumed that when a user first uses a command, the user has discovered it. Since we might not have a user's entire command usage history, a command discovery might not actually be a discovery. In other words, when a user issues a command for the first time, it does not necessarily mean that the command was just learned: the user may have known the command despite not using it. Different techniques exist to rule out such false-positive discoveries. For example, Murphy-Hill and colleagues [44] consider a command usage to be a command discovery only after a base time, which represents a period during which they assume the developers know the command.

We further explored other definitions for discovery patterns, but they led to lower accuracy. For example, an alternative to Definition 2 that we explored is:

Definition 4. [Command Discovery Pattern Alternative] We assume users discover a command when they first execute it. A command discovery pattern (i, j) occurs when a user first discovers command i in a session and then discovers command j consecutively in a later session.

According to this definition, if a user discovers commands i and j in their first session, reuses j in their second session, and reuses i and discovers k in their third session, then both df_{i,k} and df_{j,k} increase by one, because i and j were discovered in the first session and k was consecutively discovered in the third session.

Another drawback of the CoDis algorithm is that Hypothesis 4 assumes there is a change in task when the elapsed time is large, which might not be true.
A user might require recommendations related to their current task even though they have a large elapsed time.

Overall, based on the CoDis algorithm, the results suggest that taking into account the elapsed time between a user's last activity and the time of recommendation can be useful for producing more accurate recommendations. However, accuracy is not the only important factor for evaluating recommender systems. Another key challenge is explainability [24]. For each recommendation to a user, we can show the co-occurrence and discovery patterns that had the highest impact. Specifically, in the context of recommending software tools, learning from peers is very effective but happens rarely due to practical limitations [43]. Hence, recommending prevalent patterns in the user community can improve users' awareness.

6.1 Time Complexity

In addition to accuracy and explainability, another good reason for deploying the CoDis algorithm is computational efficiency, which is a key aspect of recommender systems [24]. The link prediction and baseline algorithms may have different time complexities depending on their implementation. In the following, we approximate each algorithm's time complexity based on the solvers that scikit-learn and GraphLab use. For the algorithms that we implemented from scratch, we present a time analysis based on our implementation.

Let U be the set of users, I the set of commands the users have used, and Q the entire command usage history of the users. For the classification algorithms, let A_train be the instances in the training set, A_test the instances in the testing set, and d the number of attributes. Note that |A_train| + |A_test| = |U| * |I|, since each positive instance is weighted by the number of times the user uses the command.

All the algorithms have time complexity at least Ω(|Q|) to scan the input; however, we exclude it from the runtimes given below.
We also exclude the time needed to rank recommendations for users.

For link prediction, creating a sophisticated feature set might require a noticeable amount of time. Some features have trivial time complexity, whereas others have substantial time complexity. For example, for a single user node, extracting the User Neighbors feature is O(|I|), whereas extracting User Common Commands for a user-command pair is O(|U||I|). The classification algorithms can also have noticeable time complexity. Decision Tree has time complexity O(d|A_train| log |A_train|) [57]. Random Forest on a single core with υ trees has time complexity O(υ d |A_train| log |A_train|). For SVM, scikit-learn's implementation is based on libsvm and has training time complexity between O(d|A_train|^2) and O(d|A_train|^3) [5]. For Logistic Regression, we used the liblinear solver of scikit-learn, which uses a coordinate descent algorithm based on liblinear [12], where each iteration is O(d|A_train|) [59]. For K-Nearest Neighbors, training is not required, but classifying each instance requires O(d|A_train|) using brute-force search; scikit-learn provides several approximate algorithms that reduce the time complexity at the cost of accuracy. Training with Naïve Bayes has optimal time complexity O(d|A_train|) [36], since it is linear in the time it takes to scan all instances.

The Most Popular and Most Widely Used algorithms require O(|U||I|) time, since for each command we need to find all the users who use it. Item-based collaborative filtering has time complexity O(|U||I|^2) for training and O(|U||I|) for generating predictions for a single user [14, 32]. User-based collaborative filtering does not require training; however, to make predictions for a single user, it requires O(|U||I|) time [14].
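The O(|U||I|) per-user prediction cost of user-based filtering is visible in a direct implementation sketch (illustrative only; names and the toy matrix are assumptions): scoring one user touches every other user's full |I|-length row exactly once.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length usage vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return num / den if den else 0.0

def user_based_scores(R, u):
    """Score all commands for user u: O(|U| * |I|) work, since every
    other user's |I|-length row is read once and weighted by similarity."""
    scores = [0.0] * len(R[0])
    for v, row in enumerate(R):
        if v == u:
            continue
        sim = cosine(R[u], row)
        for i, r in enumerate(row):
            scores[i] += sim * r
    return scores

R = [[1.0, 1.0, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
# Command 2, unseen by user 0 but used by both similar users,
# receives the highest score.
print(user_based_scores(R, 0))
```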
In Item-based Discovery, items are command discovery patterns, and the number of command discovery patterns is O(|I|^2). Hence, Item-based Discovery has runtime O(|U||I|^4) for training and O(|U||I|^2) to make predictions for a single user. In User-based Discovery, items are also command discovery patterns; therefore, making predictions for a single user requires O(|U||I|^2) time. The fastest algorithms for SVD are proportional to O(|U|^2|I| + |I|^3) [18], which is equal to O(|U|^2|I|) for |I| ≤ |U|. Each iteration of ADAGRAD is O(QN), where N is the number of sub-functions used in the algorithm and Q is the cost of computing the value and gradient of a single sub-function [50].

In the CoDis algorithm, the DF and CF matrices can be generated while the users' command usage history is being scanned. The CP and DP matrices can then be obtained in O(|I|^2), since we need to compute the sum of each row once and then normalize each cell by the sum of its corresponding row. To compute CScore for a single user, all the commands in their last session and the commands co-occurring with them need to be investigated; hence, computing CScore is O(|I|^2) in the worst case. To compute DScore for a single user, all the commands that the user has discovered, and the commands discovered after them, need to be investigated; therefore, DScore also has time complexity O(|I|^2). For each user and command, obtaining CoDis from their CScore and DScore requires O(1) time. Hence, to make predictions for a single user, CoDis requires O(|I|^2) time; similarly, Co+Dis has time complexity O(|I|^2). Since the CP and DP matrices can be computed offline and in parallel, the scores for a user can be computed online when the user asks for a recommendation. Hence, personalized recommendations can be produced in a timely manner when a user asks for a recommendation.
Therefore, the CoDis algorithm can easily scale to millions of users and commands.

Chapter 7

Conclusion and Future Work

In this work, we addressed the command recommendation problem. We proposed a link prediction approach and a novel algorithm called CoDis that can be deployed to teach users in a developer community what command to learn next.

For the link prediction approach, we obtained a user-command bipartite network from the command usage history of users and extracted topological features from the network for user-command pairs. We used several classification algorithms to infer, for each user, which commands they are likely to form a link with, and generated recommendations based on the predicted links.

Our proposed algorithm, CoDis, generates personalized recommendations for a user by analysing the user's past command usage history, the co-occurrence and discovery patterns of commands within the entire community, and the elapsed time between the user's last activity and the time of recommendation. The algorithm first extracts co-occurrence and discovery patterns from the command usage history of the entire user community. Then, for each user, it computes a discovery score and a co-occurrence score based on their own command usage history. Finally, for each user, it weighs their co-occurrence score and discovery score according to their own elapsed time. CoDis considers a user's elapsed time as an indicator of their need; in particular, it assumes that if a user's elapsed time is relatively small, they are more likely to be working on their most recent tasks, and if the elapsed time is relatively large, the user is more likely to have begun a new task.
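One way to picture this elapsed-time-dependent weighting (an illustrative sketch, not the thesis implementation: the logistic form of the weight and all names are assumptions, with lam standing in for the decay parameter λ):

```python
import math

def codis_score(cscore, dscore, elapsed, lam):
    """Blend a co-occurrence score and a discovery score by elapsed time.

    Small elapsed time -> weight the co-occurrence (current-task) score;
    large elapsed time -> weight the discovery (new-task) score.
    The logistic weight below is one plausible form, not the thesis's.
    """
    w = 1.0 / (1.0 + math.exp(-(elapsed - lam)))  # w -> 0 when elapsed << lam
    return (1.0 - w) * cscore + w * dscore

# Just back from a task (elapsed ~ 0): co-occurrence dominates.
recent = codis_score(cscore=0.9, dscore=0.2, elapsed=1, lam=50)
# Long break (elapsed >> lam): discovery dominates.
stale = codis_score(cscore=0.9, dscore=0.2, elapsed=200, lam=50)
assert recent > stale
```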
Based on this intuition, if the user's elapsed time is relatively low, CoDis generates recommendations that weigh the co-occurrence score more than the discovery score, and if the user's elapsed time is relatively high, it generates recommendations that weigh the discovery score more than the co-occurrence score.

We evaluated our approaches on data submitted by anonymous users of the Eclipse IDE, in terms of recall. We compared the accuracy of the classifiers and the CoDis algorithm with standard algorithms used for command recommendation and with matrix factorization techniques that are known to perform well in other domains. The CoDis algorithm performed better than all link prediction and baseline algorithms, and the Random Forest classifier performed better than all other classification algorithms. CoDis and Random Forest achieved increases of 10.22% and 4.74% in recall, respectively, compared to the matrix factorization technique with the ADAGRAD solver, the best-performing baseline, for a top-N recommendation task with N = 20. Based on the CoDis algorithm, the results suggest that taking into account the elapsed time between a user's last activity and the time of recommendation can be useful for producing more accurate recommendations.

Building on our results, there are several promising areas for future work. A promising direction for the CoDis algorithm is to investigate other models for taking advantage of the discovery and co-occurrence scores. In CoDis, we used a linear combination of co-occurrence and discovery scores weighted by elapsed time to obtain final scores, with a tuning parameter λ specifying whether users prefer recommendations based on co-occurrence patterns or discovery patterns. Alternative models, such as exponential functions with a decay rate [9], can be explored for combining the co-occurrence and discovery scores.

Moreover, other patterns can be taken into account for estimating the co-occurrence and discovery scores.
Currently, for both types of patterns, we have considered 2-grams; patterns of larger sizes, e.g., 3-grams or 4-grams, should be investigated. Further, in the CoDis algorithm, to compute CScore we used a weighting function to weigh sessions' importance, in which the last session is weighted one and all previous sessions are weighted zero. Future work should investigate whether other weighting functions, which give more weight to a user's previous sessions, perform better.

Although our analysis and experiments focus on recommending IDE commands, we believe that our model can be extended to other applications where learning takes place, such as other feature-rich applications and application programming interfaces. Our model can be explored for deployment in these tools to teach users how to navigate through them. It is also worthwhile to conduct experimental studies in non-technical environments where learning takes place, for example, for teaching language-related skills to users.

Bibliography

[1] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005. → pages 26

[2] M. Al Hasan, V. Chaoji, S. Salem, and M. Zaki. Link prediction using supervised learning. In SDM06: Workshop on Link Analysis, Counter-terrorism and Security, 2006. → pages 3, 11

[3] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003. → pages 4

[4] L. Breiman. Random forests. Machine Learning, 45(1):5–32, 2001. → pages 15

[5] C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2(3):27, 2011. → pages 43

[6] P. Cremonesi, Y. Koren, and R. Turrin. Performance of recommender algorithms on top-n recommendation tasks. In Proceedings of the Fourth ACM Conference on Recommender Systems, pages 39–46. ACM, 2010. → pages 27

[7] P.
Cunningham and S. J. Delany. k-nearest neighbour classifiers. Multiple Classifier Systems, pages 1–17, 2007. → pages 16

[8] M. Damashek et al. Gauging similarity with n-grams: Language-independent categorization of text. Science, 267(5199):843–848, 1995. → pages 4, 11

[9] Y. Ding and X. Li. Time weight collaborative filtering. In Proceedings of the 14th ACM International Conference on Information and Knowledge Management, pages 485–492. ACM, 2005. → pages 46

[10] D. P. Doane and L. E. Seward. Measuring skewness: a forgotten statistic. Journal of Statistics Education, 19(2):1–18, 2011. → pages 41

[11] J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. The Journal of Machine Learning Research, 12:2121–2159, 2011. → pages 5, 27, 40

[12] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. The Journal of Machine Learning Research, 9:1871–1874, 2008. → pages 43

[13] L. Findlater and J. McGrenere. A comparison of static, adaptive, and adaptable menus. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 89–96. ACM, 2004. → pages 9

[14] V. Formoso, F. Cacheda, and V. Carneiro. Algorithms for efficient collaborative filtering. In Efficiency Issues in Information Retrieval Workshop, page 17, 2008. → pages 43

[15] A. Fourney, R. Mann, and M. Terry. Query-feature graphs: bridging user vocabulary and system functionality. In Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology, pages 207–216. ACM, 2011. → pages 8

[16] K. Z. Gajos, M. Czerwinski, D. S. Tan, and D. S. Weld. Exploring the design space for adaptive graphical user interfaces. In Proceedings of the Working Conference on Advanced Visual Interfaces, pages 201–208. ACM, 2006. → pages 9

[17] F. Garcin, K. Zhou, B. Faltings, and V. Schickel. Personalized news recommendation based on collaborative filtering.
In Proceedings of the 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology - Volume 01, pages 437–441. IEEE Computer Society, 2012. → pages 11

[18] G. H. Golub and C. F. Van Loan. Matrix Computations, volume 3. JHU Press, 2012. → pages 44

[19] J. Han, H. Cheng, D. Xin, and X. Yan. Frequent pattern mining: current status and future directions. Data Mining and Knowledge Discovery, 15(1):55–86, 2007. → pages 4

[20] E. Horvitz, J. Breese, D. Heckerman, D. Hovel, and K. Rommelse. The Lumiere project: Bayesian user modeling for inferring the goals and needs of software users. In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, pages 256–265. Morgan Kaufmann Publishers Inc., 1998. → pages 9

[21] Z. Huang, X. Li, and H. Chen. Link prediction approach to collaborative filtering. In Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries, pages 141–142. ACM, 2005. → pages 3, 11, 13, 14, 15

[22] M. A. A. Khan, V. Dziubak, and A. Bunt. Exploring personalized command recommendations based on information found in web documentation. In Proceedings of the 20th International Conference on Intelligent User Interfaces, pages 225–235. ACM, 2015. → pages 8

[23] I. Konstas, V. Stathopoulos, and J. M. Jose. On social networks and collaborative recommendation. In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 195–202. ACM, 2009. → pages 27

[24] Y. Koren. Tutorial on recent progress in collaborative filtering. RecSys, 8:333–334, 2008. → pages 42

[25] Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. Computer, 42(8):30–37, 2009. → pages 26

[26] B. Lafreniere, A. Bunt, M. Lount, and M. A. Terry. Understanding the roles and uses of web tutorials. In ICWSM, 2013. → pages 7

[27] W. Li, J. Matejka, T. Grossman, J. A. Konstan, and G. Fitzmaurice.
Design and evaluation ofa command recommendation system for software applications. ACM Transactions onComputer-Human Interaction (TOCHI), 18(2):6, 2011. → pages 1, 3, 10, 24, 26[28] X. Li and H. Chen. Recommendation as link prediction in bipartite graphs: A graphkernel-based machine learning approach. Decision Support Systems, 54(2):880–890, 2013.→ pages 3[29] D. Liben-Nowell and J. Kleinberg. The link-prediction problem for social networks. Journalof the American society for information science and technology, 58(7):1019–1031, 2007. →pages 3, 10, 11, 41[30] R. N. Lichtenwalter, J. T. Lussier, and N. V. Chawla. New perspectives and methods in linkprediction. In Proceedings of the 16th ACM SIGKDD international conference onKnowledge discovery and data mining, pages 243–252. ACM, 2010. → pages 11[31] H.-T. Lin, C.-J. Lin, and R. C. Weng. A note on platts probabilistic outputs for supportvector machines. Machine learning, 68(3):267–276, 2007. → pages 25[32] G. Linden, B. Smith, and J. York. Amazon. com recommendations: Item-to-itemcollaborative filtering. Internet Computing, IEEE, 7(1):76–80, 2003. → pages 43[33] F. Linton, D. Joy, H.-P. Schaefer, and A. Charron. Owl: A recommender system fororganization-wide learning. Educational Technology & Society, 3(1):62–76, 2000. → pages9, 26[34] Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and J. M. Hellerstein. Distributedgraphlab: a framework for machine learning and data mining in the cloud. Proceedings ofthe VLDB Endowment, 5(8):716–727, 2012. → pages 27[35] K. Lubick, T. Barik, and E. Murphy-Hill. Can social screencasting help developers learnnew tools? 2015. → pages 8[36] C. D. Manning, P. Raghavan, H. Schu¨tze, et al. Introduction to information retrieval,volume 1. Cambridge university press Cambridge, 2008. → pages 43[37] J. Matejka, W. Li, T. Grossman, and G. Fitzmaurice. Communitycommands: Commandrecommendations for software applications. 
In Proceedings of the 22Nd Annual ACMSymposium on User Interface Software and Technology, UIST ’09, pages 193–202, New49York, NY, USA, 2009. ACM. ISBN 978-1-60558-745-5. doi:10.1145/1622176.1622214.URL → pages 1, 3, 10, 24, 26, 27, 30[38] J. Matejka, T. Grossman, and G. Fitzmaurice. Ip-qat: in-product questions, answers, & tips.In Proceedings of the 24th annual ACM symposium on User interface software andtechnology, pages 175–184. ACM, 2011. → pages 8[39] R. L. Mercer et al. Class-based n-gram models of natural language, 1990. → pages 4, 11[40] D. Michie, D. J. Spiegelhalter, and C. C. Taylor. Machine learning, neural and statisticalclassification. 1994. → pages 11, 15, 16[41] K. P. Murphy. Machine learning: a probabilistic perspective. MIT press, 2012. → pages 16[42] E. Murphy-Hill. Continuous social screencasting to facilitate software tool discovery. InProceedings of the 34th International Conference on Software Engineering, pages1317–1320. IEEE Press, 2012. → pages 8[43] E. Murphy-Hill and G. C. Murphy. Peer interaction effectively, yet infrequently, enablesprogrammers to discover new tools. In Proceedings of the ACM 2011 conference onComputer supported cooperative work, pages 405–414. ACM, 2011. → pages 8, 42[44] E. Murphy-Hill, R. Jiresal, and G. C. Murphy. Improving software developers’ fluency byrecommending development environment commands. In Proceedings of the ACM SIGSOFT20th International Symposium on the Foundations of Software Engineering, page 42. ACM,2012. → pages 4, 10, 20, 23, 24, 26, 30, 31, 41, 42[45] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel,P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher,M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of MachineLearning Research, 12:2825–2830, 2011. → pages 25[46] M. A. Saied, O. Benomar, H. Abdeen, and H. Sahraoui. Mining multi-level api usagepatterns. 
In Software Analysis, Evolution and Reengineering (SANER), 2015 IEEE 22ndInternational Conference on, pages 23–32. IEEE, 2015. → pages 4[47] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Application of dimensionality reduction inrecommender system-a case study. Technical report, DTIC Document, 2000. → pages 26[48] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filteringrecommendation algorithms. In Proceedings of the 10th international conference on WorldWide Web, pages 285–295. ACM, 2001. → pages 26[49] T. Segaran. Programming collective intelligence: building smart web 2.0 applications. ”O’Reilly Media, Inc.”, 2007. → pages 15[50] J. Sohl-Dickstein, B. Poole, and S. Ganguli. Fast large-scale optimization by unifyingstochastic gradient and quasi-newton methods. arXiv preprint arXiv:1311.2115, 2013. →pages 4450[51] K. Sparck Jones. A statistical interpretation of term specificity and its application inretrieval. Journal of documentation, 28(1):11–21, 1972. → pages 10, 27[52] R. Srikant and R. Agrawal. Mining sequential patterns: Generalizations and performanceimprovements. Springer, 1996. → pages 4[53] Y. Sun, R. Barber, M. Gupta, C. C. Aggarwal, and J. Han. Co-author relationship predictionin heterogeneous bibliographic networks. In Advances in Social Networks Analysis andMining (ASONAM), 2011 International Conference on, pages 121–128. IEEE, 2011. →pages 3, 10[54] M. Trevisiol, L. M. Aiello, R. Schifanella, and A. Jaimes. Cold-start news recommendationwith domain-dependent browse graph. In Proceedings of the ACM Recommender Systemconference, RecSys, volume 14, 2014. → pages 11, 27[55] M. B. Twidale. Over the shoulder learning: supporting brief informal learning. ComputerSupported Cooperative Work, 14(6):505–547, 2005. → pages 8[56] P. Viriyakattiyaporn and G. C. Murphy. Improving program navigation with an active helpsystem. 
In Proceedings of the 2010 Conference of the Center for Advanced Studies onCollaborative Research, pages 27–41. IBM Corp., 2010. → pages 4, 9[57] I. H. Witten and E. Frank. Data Mining: Practical machine learning tools and techniques.Morgan Kaufmann, 2005. → pages 43[58] H. Yin, B. Cui, J. Li, J. Yao, and C. Chen. Challenging the long tail recommendation.Proceedings of the VLDB Endowment, 5(9):896–907, 2012. → pages 15[59] H.-F. Yu, F.-L. Huang, and C.-J. Lin. Dual coordinate descent methods for logistic regressionand maximum entropy models. Machine Learning, 85(1-2):41–75, 2011. → pages 43[60] E. Zheleva, L. Getoor, J. Golbeck, and U. Kuter. Using friendship ties and family circles forlink prediction. In Advances in Social Network Mining and Analysis, pages 97–113.Springer, 2010. → pages 1051

