UBC Theses and Dissertations

Computing degree-of-knowledge values for a developer's workspace Ou, Jingwen 2009

Computing Degree-of-Knowledge Values for a Developer's Workspace

by

Jingwen Ou

B.Sc., Guangdong University of Technology, 2007

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES (Computer Science)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

May 2009

© Jingwen Ou, 2009

Abstract

Previous research in computer science shows that software developers are typically deluged by an enormous volume of information daily. Improving developers' ability to filter this information may yield significant productivity improvements. To combat this overload, we introduce an indicator, called degree-of-knowledge (DOK), which is a real value indicating how much knowledge a developer has of a source code element. A developer's DOK values for a source code base can be computed automatically from authorship data mined from the source revision system and from interaction data collected as the developer works. This indicator may help reduce information overload by, for instance, filtering the source code to show only the elements for which a developer has high knowledge. We describe our implementation of an efficient framework for computing DOK values in a development environment.
Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
List of Programs
Acknowledgements
1 Introduction
2 Related Work
  2.1 Expertise Recommenders
  2.2 Degree-of-Interest Model
  2.3 Indicators of Knowledge of Code
  2.4 Awareness Approaches
3 Model
  3.1 The Components of Degree-of-Knowledge
    3.1.1 Degree-of-Interest
    3.1.2 Degree-of-Authorship
    3.1.3 Degree-of-Knowledge
  3.2 Events Used to Compute Degree-of-Knowledge
    3.2.1 Interaction Events
    3.2.2 Authorship Events
  3.3 Computing Degree-of-Knowledge
    3.3.1 Degree-of-Interest
    3.3.2 Degree-of-Authorship
    3.3.3 Degree-of-Knowledge
4 Implementation
  4.1 Architecture
  4.2 Core
  4.3 Storage of Knowledge
  4.4 Monitor
    4.4.1 Interaction Monitoring
    4.4.2 Authorship Monitoring
  4.5 Extensions
    4.5.1 Bridging to a Source Revision System
    4.5.2 Adding a Knowledge Indicator
  4.6 Performance
    4.6.1 One Working Day Scenario
    4.6.2 Seven Working Days Scenario
5 Discussion
  5.1 Scoped Structured View
  5.2 Awareness of Knowledge
  5.3 Knowledge Map
  5.4 Change Set Assessment
  5.5 Onboarding
  5.6 Future Work
    5.6.1 Adding More Indicators
    5.6.2 An Automated Learning Model
6 Summary
Bibliography

List of Tables

3.1 Indicator event schema
3.2 Interaction event types
3.3 Sample interaction history
3.4 Authorship event types
3.5 Sample authorship history
3.6 Interaction scaling factors
3.7 Authorship scaling factors
4.1 XML schema for the persistency of source code elements
4.2 Comparison of terms between Jazz Team Server and Concurrent Versions System
4.3 Number of elements with positive DOK values

List of Figures

4.1 Framework architecture showing OSGI plug-ins and their dependencies
4.2 Core plug-in dependency
4.3 Event-based model
4.4 Knowledge model class diagram
4.5 Event-based model class diagram
4.6 Interaction monitor plug-in dependency
4.7 Jazz Team Server monitor plug-in dependency
4.8 Concurrent Versions System monitor plug-in dependency
4.9 Repository bridge class diagram
4.10 Used Java heap in kilobytes for one simulated workday
4.11 CPU time in milliseconds for one simulated workday
4.12 Used Java heap in kilobytes for seven simulated workdays
4.13 CPU time in milliseconds for seven simulated workdays
5.1 Scoping task context with DOK model
5.2 Awareness of knowledge drop when synchronizing workspace
5.3 Part of a knowledge map

List of Programs

3.1 Incremental DOI Computation
3.2 Incremental DOA Computation

Acknowledgements

I thank my supervisor Gail C. Murphy for the guidance, support and encouragement that made this work possible. Thanks to her for supporting my work on this thesis. Thanks to her for all the insightful discussions that will continue to guide me throughout my career.

I thank Gregor Kiczales for being my second reader and giving me invaluable comments.

Thanks to all the great friends I've met in Vancouver for making it such an amazing experience.

Chapter 1 Introduction

Software developers use integrated development environments (IDEs) to ease their work on large systems. These environments provide almost immediate access to multiple sources of information about the software under development, including source code, bug reports, and Really Simple Syndication (RSS) feeds. These environments make it easy for a developer to navigate across related pieces of information.
For example, most environments make it easy to navigate across calling relationships between source code elements and between historical revisions of elements. Source code elements include types, methods, and fields in a source code base.

These environments have been engineered to provide access to millions of lines of source code. However, in providing access to a large amount of information, these environments also deluge developers with more information than they need to solve a particular task, which typically requires only a fraction of the information [9]. Further complicating this problem, the information accessed by developers is often being changed by other members of a team. A source code element that a developer studied in depth a month ago may be completely changed by the time the developer revisits the element. A recent study showed that, for a group of industrial developers over a three-month period, an element accessible in their development environment was being changed on average once every 54 seconds [7].

The only effort of which we are aware that directly addresses the information overload problem is the Eclipse Mylyn project (project website verified 25 April 2009). Mylyn is based on the idea that a developer's interaction with a system can be transformed into a degree-of-interest model [11], where each source code element in the system is weighted according to its relevance to the task at hand. For each task, a task context can be formed comprising the interactions a developer has had with the source while working on the task. From the information in a task context, the set of source code elements with a positive degree-of-interest can be constructed. In this way, the task context can be used to focus the User Interface (UI) of the development environment by highlighting the most important elements, filtering the unimportant ones, and allowing developers to perform operations, such as check-ins to a source code repository, on elements of interest.

The main attempts to address problems related to information change in a development environment have been approaches that make developers aware of changes as they occur. Palantir [1], for example, increases a developer's awareness by continuously sharing information regarding the activities of other developers, instead of informing developers of other efforts only when they themselves perform some configuration management operation (e.g., check in or check out). Specifically, Palantir informs a developer of which other developers change which other artifacts, calculates a simple measure of severity of those changes, and graphically displays the information in a configurable and generally non-obtrusive manner.

In this thesis, we introduce an approach that has the potential to address both the overload problem and the high rate of change problem. We introduce an indicator of which code in an environment a developer has knowledge of. With such an indicator, information overload may be reduced by showing only elements for which a developer has had high knowledge over the past week. This filtering might help reduce information overload by helping to seed appropriate elements into the start of a programming task, which can then be managed by Mylyn. With such an indicator, a developer could also potentially address issues related to the high rate of change of information in the environment. For instance, when receiving a group of source revisions (for example, a change set) from a team member that is intended to fix a bug or implement an enhancement, a developer must assess the impact of the change set: will the revisions integrate easily or will they cause the system to break?
An indicator of which elements related to the change set a developer has good knowledge of might help the developer focus attention on the change sets and code that need the most review.

A previous study of developers' knowledge of code [6] suggests that two important components that indicate a developer's knowledge are the interactions a developer has had with the code, as captured by a degree-of-interest model [11], and which code the developer has authored. The knowledge indicator we introduce in this thesis, a degree-of-knowledge (DOK) value, incorporates both of these components. A DOK value for a source code element is specific to a developer and can be automatically computed from data gathered about and during the development process (Section 3). Different developers may have different DOK values for the same source code elements. We compute the DOK value for a developer by combining authorship data from the source revision system and interaction data from monitoring the developer's activity in the development environment. In addition to presenting an abstract formulation of the DOK model (Section 3), we also describe an efficient implementation to compute DOK values as a developer works with a large amount of code in a development environment (Section 4).

The main focus of this thesis is the definition and implementation of an efficient framework for computing DOK values in a development environment. While experimentation to determine the usefulness of DOK values in addressing the problems outlined above is future work, we provide a glimpse of the kinds of information a developer might find useful, using the DOK values (Section 5).

We make the following contributions in this thesis.

• We provide a model for a degree-of-knowledge (DOK) value that incorporates both a developer's interaction with code elements and a developer's authorship of code elements.
• We describe an efficient and extensible implementation for the DOK model.
• We discuss the use of the DOK model in different scenarios, reporting on the benefits of the DOK model.

Chapter 2 Related Work

We build our DOK model by analyzing authorship information from a project's source revision system and interaction information from an integrated development environment (IDE). In this chapter, we present work related to building a knowledge model, including expertise recommender tools that mine authorship change information from source repositories (Section 2.1) and an approach that approximates knowledge per task based on interaction information (Section 2.2). We also describe a user study that suggests significant factors in building a DOK model (Section 2.3) and compare our approach to awareness tools (Section 2.4).

2.1 Expertise Recommenders

Earlier work has considered how to determine which developers are experts in, or have knowledge of, particular parts of a source code base by relying on authorship change information from a project's source repository (e.g., [14]). Most of these approaches to constructing a knowledge model are based on a heuristic called the "Line 10 Rule", which states that the developer who last committed a change to a file has expertise in that file [12]. One of the earliest systems, the Expertise Recommender, selects an expert by assuming that the developer who made the most recent change to a source file has the relevant expertise [12]. Another system, the Expertise Browser, uses experience atoms, basic units of experience created by mining the version control system, to rank developers according to revision times so that the developer who made the last change to the source file has the highest rank of knowledge on a particular part of the system [14]. The Emergent Expertise Locator goes one step further by considering the relationship between file modifications and who made those changes [13].
Girba and colleagues used finer-grained information by considering the number of lines of code that each developer has modified when estimating expertise [8].

All of these previous approaches consider a developer's expertise either as a binary function or as a monotonically increasing function. The Expertise Recommender considers the developer who made the last change to a file as the only one having expertise in that file. The Expertise Browser and Emergent Expertise Locator rank expertise by accumulating information about who made recent changes to a file, without accounting for the fact that accepting changes made to a file by other developers likely degrades the developer's expertise in that file.

Our approach also considers authorship information from a project's source repository. However, we refine the existing approaches by modeling the ebb and flow of multiple developers changing the same file: a developer's degree-of-knowledge in the file rises when the developer commits changes to the source repository and diminishes when other developers make subsequent changes to the same code. Our approach also considers the interactions the developer has with the source codebase in this ebb and flow: a developer's degree-of-knowledge in the file rises when the developer interacts with it and diminishes when she interacts with other files.

2.2 Degree-of-Interest Model

Another way to gain knowledge about source code is to interact with the code in a development environment. Kersten et al. used this type of approach, proposing a degree-of-interest (DOI) value to represent which program elements a developer has interacted with significantly [9]. The basic idea is that the more frequently and recently a developer has interacted with a particular program element, the higher the DOI value; as a developer moves to work on other program elements, the DOI value of the initial element decays.

The original applications of this concept computed DOI values across a developer's workspace and filtered the views in the development environment based on DOI values to reduce information overload [10]. Subsequent work focused the DOI computation on a per-task basis, since a developer often works on and switches between tasks, and computing DOI values across the whole workspace did not adequately scope the display of appropriate elements [11]. In this thesis, we refine the approach by considering how a developer interacts with the code in a development environment as they collaborate with teammates by producing changes to the code. We chose to return to the computation of DOI across all of a developer's workspace to capture a developer's knowledge of the source across tasks.

2.3 Indicators of Knowledge of Code

Fritz et al. conducted an experiment to investigate which factors could indicate the code of which a developer has knowledge [6]. The study found that DOI values computed from interaction information can indicate a developer's knowledge of code, and that other factors, such as authorship of code, also play a significant role in gauging that knowledge. Our DOK model uses the results of this previous study by considering two key factors that impact a developer's knowledge of code: interaction and authorship. The focus of this thesis is the implementation of the DOK model. An investigation of whether such a model can help with the information overload problem is reported elsewhere [7].

2.4 Awareness Approaches

Various awareness tools have been proposed for collaborative development environments to make developers aware of other team members' work [3].
For example, Flesce [2] features real-time awareness of team activities by providing a shared environment (e.g., a shared debugger); Palantir [1] enables developers to detect potential conflicts early, as they occur, by providing workspace awareness in the form of which developers are changing which artifacts and by how much. Elvin [5], an augmentation to CVS with real-time notification and chat facilities, sends real-time CVS log messages to developers about what code changes are happening and enables them to engage in a timely work discussion.

These existing approaches focus on the immediate presentation of changes to artifacts from various sources (e.g., a source control system). In contrast, our tool builds a model that aims to raise developers' awareness of important information in the workspace, weighted by the DOK values. For example, our tool might be used to filter out the source code elements that a developer has knowledge of and focus her attention on the elements that she potentially needs to learn (e.g., unfamiliar elements that are related to a bug).

Chapter 3 Model

Our degree-of-knowledge (DOK) model captures an individual developer's perspective on her source code. Each source code element loaded into a development environment used by a developer is assigned a degree-of-knowledge value. The intent of the model is that the higher the value, the more familiar the developer is with the source code element.

Our model captures a DOK value for a source code element based on one component that indicates a developer's short-term knowledge of a source code element, represented by a degree-of-interest (DOI) value, and a second component that indicates a developer's longer-term knowledge of the element, represented by a degree-of-authorship (DOA) value. To better describe our concepts, we use D1 to represent the developer for whom DOK values in source code elements are being computed.
3.1 The Components of Degree-of-Knowledge

A developer works on source code and builds up DOK values for that source code over long periods of time. As the developer works, the integrated development environment (IDE) used by the developer is started and stopped. We refer to each period during which the IDE is active as a session. For instance, suppose D1 works 8 hours one day and at the end of the workday exits the IDE, forming one session. She then starts a new session the next workday. To accurately represent DOK values, we must compute DOI and DOA values across these sessions. When describing how we compute DOK values, we also use the term workspace, which refers to the source code elements a developer has access to in the IDE.

3.1.1 Degree-of-Interest

The intent of DOI is to capture the interest level of D1 in a source code element on which D1 is working. The DOI concept was developed to represent a developer's interest in an element as part of working on a task [9]. In this work, we compute DOI over all of D1's interactions with the source code, similar to the original calculation of DOI [10]. We compute DOI in this way because the scope of a developer's knowledge is all source code elements in her workspace rather than elements specific to a task.

The DOI value for an element is based on the frequency and recency of interactions that D1 has with the source code element. The frequency is determined by the number of interactions with the element as a target. The recency is defined by a decay that is proportional to the number of interactions the developer has had with elements in the workspace since the first interaction with the element.

3.1.2 Degree-of-Authorship

The DOA component is intended to represent how well D1 knows a source code element based on whether or not she wrote code comprising that element.
In a team development situation, it is possible (and even likely) that over time, other developers also contribute to the same source code element as D1. Contributions by others to the element are assumed to decrease D1's familiarity with the element [6]. The DOA for D1 in a source code element can rise and fall as D1 and others modify the element. We consider the DOA of D1 for an element to be determined by three factors:

• first authorship (FA), representing whether D1 created the first version of the element;
• the number of deliveries (DL), representing subsequent changes by D1 to the element after initial authorship; and
• the number of acceptances (AC), representing changes to the element not contributed by D1.

3.1.3 Degree-of-Knowledge

The DOK represents the knowledge of D1 over a source codebase. It is a linear combination of the DOA factors and the DOI of a source code element e:

$$\mathrm{DOK}(e) = \alpha \times \mathrm{FA}(e) + \beta \times \mathrm{DL}(e) + \gamma \times \mathrm{AC}(e) + \delta \times \mathrm{DOI}(e) \tag{3.1}$$

To determine appropriate weightings ($\alpha$, $\beta$, $\gamma$, $\delta$), an experiment was conducted with six professional Java developers [7].

3.2 Events Used to Compute Degree-of-Knowledge

Computing DOK requires information about how a developer is working with source code elements in a development environment. At the heart of the model are the events that encapsulate the activities of a developer in an IDE. We use the term indicator event to represent events monitored in a development environment to help compute DOK. Two kinds of indicator events are used:

1. interaction events, representing how a developer selects, edits and otherwise interacts with source code elements in her workspace, and
2. authorship events, representing changes made to an element and the author of those changes.

Each indicator event captures five pieces of information (Table 3.1). The time field stamps an event with the moment that the event occurred.
The structure kind field provides an identifier that binds the element to a particular domain structure available within the IDE (e.g., "java"). The structure handle field provides an identifier that uniquely identifies the element for the given structure type within the IDE. The event type field describes the type of the event. The contribution value field indicates how much the event contributes to the computation of the DOK value.

Table 3.1: Indicator event schema

| Information | Description |
|---|---|
| Time | The time of the event occurrence. |
| Structure Kind | Identifier describing the kind of element operated upon. |
| Structure Handle | Identifier of the target element. |
| Event Type | The type of the event. |
| Contribution Value | The value contributing to the computation of the DOK value. |

3.2.1 Interaction Events

As a developer works, she interacts with the source code through the development environment. These interactions can be captured and are referred to as interaction events. Interaction events are one kind of indicator event.

Some interaction events are the result of a developer's interactions with source code elements. For instance, a developer D1 may select a particular Java method to view its source. D1 may then edit the element before saving the file containing the element. Each of these interactions corresponds to an event of a different kind carrying a different contribution value. Other interaction events are a summary of interaction events in previous sessions: for instance, the system remembers the elements that D1 interacted with in previous sessions and reproduces the elements' DOI values the next time she starts up the IDE. Table 3.2 summarizes the three supported interaction event types.
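The schema in Table 3.1 maps onto a small record type. The following Python sketch only illustrates the five fields and is not taken from the thesis implementation: the names `IndicatorEvent` and `EventType`, the use of an event number for the time field, and the handle string are all assumptions made for this example.

```python
from dataclasses import dataclass
from enum import Enum


class EventType(Enum):
    # Only the two developer-driven interaction types are listed here;
    # system-generated and authorship types are omitted for brevity.
    SELECTION = "SELECTION"
    EDIT = "EDIT"


@dataclass(frozen=True)
class IndicatorEvent:
    time: int               # event number standing in for a timestamp
    structure_kind: str     # domain structure of the element, e.g. "java"
    structure_handle: str   # unique identifier of the target element
    event_type: EventType   # what kind of event occurred
    contribution: float     # how much the event contributes to DOK


# A selection of a Java class, recorded as the second event of a session.
event = IndicatorEvent(
    time=2,
    structure_kind="java",
    structure_handle="ContextStructureBridge",  # hypothetical handle format
    event_type=EventType.SELECTION,
    contribution=1.0,
)
```

Making the record immutable (`frozen=True`) is a design choice of the sketch: an event describes something that has already happened, so nothing downstream should need to modify it.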
Table 3.2: Interaction event types

| Category | Event Type | Description |
|---|---|---|
| interaction | SELECTION | Selection of an element via mouse or keyboard in any editor or view. |
| interaction | EDIT | Edit of an element in any editor. |
| system generated | INITIAL_DOI | Reproduces an element's DOI value from previous sessions. |

Table 3.3 provides an example of the sequence of interaction events resulting from part of D1's work session. For simplicity, we use an event number to stand in for the time field of an interaction event. D1 first starts up the IDE (event 1 in Table 3.3), which causes the DOI values of all source code elements that D1 knows from previous sessions to be loaded. Then, she selects a class, ContextStructureBridge (event 2), and invokes a rename operation on one of its methods (event 3). D1 then edits another method in the same class (events 4 to 10) before she selects the first method again (event 11) to facilitate subsequent modifications (not shown).

Table 3.3: Sample interaction history

| Event | Kind | Target(s) |
|---|---|---|
| 1 | INITIAL_DOI | ContextStructureBridge.getHandleIdentifier() method |
| 2 | SELECTION | ContextStructureBridge class |
| 3 | EDIT | ContextStructureBridge.getHandleIdentifier() method |
| 4 ... 10 | EDIT | ContextStructureBridge.getContentType() method |
| 11 | SELECTION | ContextStructureBridge.getHandleIdentifier() method |

3.2.2 Authorship Events

As a developer works in a team, she interacts with other developers by contributing to the same source code base through the development environment. A developer can push or pull changes to and from the source revision system, typically through a shared team view in the IDE. We refer to the operations captured from this view as authorship events.

Some authorship events are the result of changes made to a program element and shared with the team; for instance, a developer D1 is the first author of a particular Java method and makes three changes to it before accepting a change from another developer D2. Each of these authorship changes corresponds to an event of a different kind with a different contribution value. Only changes that are shared with the team are considered authorship events; changes that a developer makes locally in their workspace are not considered. Other authorship events are generated by the system to summarize the authorship events in previous sessions. Four authorship event types are supported, as summarized in Table 3.4.

Table 3.4: Authorship event types

| Category | Event Type | Description |
|---|---|---|
| authorship change | FIRST_AUTHORSHIP | The creation of the first version of an element. |
| authorship change | DELIVERY | Subsequent changes after first authorship made to the element by a developer and shared with the team. |
| authorship change | ACCEPTANCE | Changes to the element made by other members of the team and shared with the developer for whom DOA is being calculated. |
| system generated | INITIAL_DOA | Accesses an element's DOA value from a previous session. |

Table 3.5 provides an example of part of a sequence of authorship events from one of D1's work sessions. As before, we use an event number to stand in for the time field of an authorship event. D1 first starts up the IDE, which causes the DOA values of all source code elements that D1 knows from previous sessions to be loaded (event 1 in Table 3.5). Then, she adds a method to a class and delivers the changes (event 2). D1 then makes and delivers three changes to the method (events 3 to 5) before she accepts a new change to the method (event 6) from her teammate D2.

Table 3.5: Sample authorship history

| Event | Kind | Target(s) |
|---|---|---|
| 1 | INITIAL_DOA | CortanaCore class |
| 2 | FIRST_AUTHORSHIP | CortanaCore.getStructureBridge() method |
| 3 ... 5 | DELIVERY | CortanaCore.getStructureBridge() method |
| 6 | ACCEPTANCE | CortanaCore.getStructureBridge() method |

3.3 Computing Degree-of-Knowledge

Each event type has a different scaling factor constant, resulting in different weightings for different kinds of interactions and authorship changes.
We conducted an experiment to empirically investigate these weightings [7]. In essence, the experiment involved gathering data about authorship from the revision history of a project and about interest by monitoring developers' interactions with the code as they work on the project. We then asked the developers to rate their knowledge of particular code elements. With the developer ratings, we used multiple linear regression to determine appropriate weightings for the various factors described in Section 3.1.3.

To support DOK computation over a long period of time, we use incremental algorithms. Starting with the DOK values provided from a previous session, the incremental algorithms continuously compute the knowledge that a developer has of the source code elements.

3.3.1 Degree-of-Interest

We compute DOI for an element using the basic approach used in the Eclipse Mylyn project, based on frequency and recency of a developer's interactions with the element. However, unlike Mylyn, there is a need to compute the values across sessions and for a larger number of source code elements at once, since we need to compute DOK over all elements in the workspace. The DOI value of a target element e is calculated by dividing it into sessions:

$$\mathrm{DOI}(e) = \sum_{i} \mathrm{DOI}_{session}(e)_i \tag{3.2}$$

where

$$\mathrm{DOI}_{session}(e) = scaling_{selection} \times selections_{session}(e) + scaling_{edit} \times edits_{session}(e) - scaling_{decay} \times event\_distance_{session}(e) \tag{3.3}$$

$\mathrm{DOI}_{session}(e)$ is the sessional DOI value of the target element e, $selections_{session}(e)$ is the number of selections of e in the session, $edits_{session}(e)$ is the number of edits of e in the session, and $event\_distance_{session}(e)$ is the position in the sessional event stream of the first interaction with e.

Program 3.1 describes how we incrementally compute a DOI value for an element, given an interaction event with the element as the target.
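Before turning to the incremental form, Equations 3.2 and 3.3 can be evaluated directly. The following Python sketch is a minimal illustration that assumes the per-session counts are already available. The function names and the tuple layout are hypothetical, the two sessions are made-up data, and the scaling factors are the ones listed in Table 3.6.

```python
# Scaling factors as listed in Table 3.6.
SCALING_SELECTION = 1.0
SCALING_EDIT = 0.7
SCALING_DECAY = 0.017


def doi_session(selections, edits, event_distance):
    """Equation 3.3: sessional DOI value of one element."""
    return (SCALING_SELECTION * selections
            + SCALING_EDIT * edits
            - SCALING_DECAY * event_distance)


def doi(sessions):
    """Equation 3.2: sum of the sessional DOI values.

    `sessions` is an iterable of (selections, edits, event_distance)
    tuples, one per session (hypothetical layout).
    """
    return sum(doi_session(s, e, d) for s, e, d in sessions)


# Two made-up sessions for a single element.
history = [(2, 1, 10), (1, 0, 40)]
total = doi(history)
```

Evaluating the closed form this way needs the full per-session counts, which is the cost that an incremental algorithm avoids.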
The intent of the PROCESS_EVENT function is to compute the new DOI value for an element when an interaction event occurs, and to ensure that the DOI value becomes positive again if it had decayed to a negative value by the time the new event occurs. The definition of PROCESS_EVENT uses a data structure, called knowledgeElement, to store the decay start event number (knowledgeElement.decayStart) and the interest value (knowledgeElement.interest) of the element (Section 4.2). In the PROCESS_EVENT function, we retrieve the previous interest value of the element (line 3), increment the interest value of the element based on the kind of the current event (line 4) and, if the decay has offset the interest, reset the decay to start at the current event (lines 7-8). Finally, the interest value of the element is updated (line 9).

The GET_KNOWLEDGE_ELEMENT function returns or creates the data structure (knowledgeElement) of the target element. The DECAY function computes the decay based on the distance between two interaction events, where the distance is measured in number of events rather than by the time stamps of the events. The DOI function computes the actual degree-of-interest value of an element. LAST_EVENT is a counter of the total number of interaction events. The SCALING function returns the constant associated with each event kind and with the KIND-DECAY constant. The scaling factors used are the same as in the Eclipse Mylyn project and are summarized in Table 3.6.

Program 3.1 Incremental DOI Computation

    PROCESS_EVENT(element, elementEvent)
     1  knowledgeElement = GET_KNOWLEDGE_ELEMENT(element)
     2  decayStart = knowledgeElement.decayStart
     3  interest = knowledgeElement.interest
     4  interest += SCALING(KIND(elementEvent))
     5  currDecay = DECAY(decayStart, elementEvent)
     6  if interest < currDecay then    // reset decay & interest
     7      knowledgeElement.decayStart = elementEvent
     8      interest = SCALING(KIND(elementEvent))
     9  knowledgeElement.interest = interest

    DECAY(fromEvent, toEvent)
    10  decayDistance = |toEvent - fromEvent|
    11  return decayDistance * SCALING(KIND-DECAY)

    DOI(element)
    12  knowledgeElement = GET_KNOWLEDGE_ELEMENT(element)
    13  decayStart = knowledgeElement.decayStart
    14  interest = knowledgeElement.interest
    15  totalDecay = DECAY(decayStart, LAST_EVENT)
    16  return interest - totalDecay

Table 3.6: Interaction scaling factors

| Event Type | Scaling Factor |
|---|---|
| SELECTION | 1 |
| EDIT | 0.7 |
| DECAY | 0.017 |

We use the events in Table 3.3 to illustrate the algorithm. To simplify the example, we assume the scaling factors for SELECTION and EDIT are both 1 and the scaling factor for DECAY is 0.5. We compute the DOI value for the ContextStructureBridge.getHandleIdentifier method. It has an initial DOI value of 2 from previous sessions (event 1 in Table 3.3). Then there is a selection of another element, so its DOI value decays by 0.5 and drops to 1.5 (event 2). It is then edited and has the DOI value of 2.5 (event 3). From event 4 to 10, it decays by 3 in total and its DOI value drops to -0.5. Then it is selected again (event 11) and its decay value and interest value are reset, giving the final DOI value of 1.

3.3.2 Degree-of-Authorship

We compute DOA based on the frequency of authorship change, which is determined by three event types: first authorship, the number of deliveries and the number of acceptances.
Each event type has a different scaling factor constant, resulting in different weightings for different kinds of authorship changes. As with DOI, we calculate the DOA value per session, and the DOA of a target element e for a developer D1 is defined as the sum over sessions i:

$$\mathrm{DOA}(e) = \sum_{i} \mathrm{DOA}_{session}(e)_i \qquad (3.4)$$

where

$$\mathrm{DOA}_{session}(e) = scaling_{first\_authorship} \times first\_authorship_{session}(e) + scaling_{delivery} \times deliveries_{session}(e) - scaling_{acceptance} \times acceptances_{session}(e) \qquad (3.5)$$

DOA_session(e) is the sessional DOA value of the element e, first_authorship indicates whether the developer D1 created the first version of e in the session, deliveries is the number of changes made to e and delivered by D1, and acceptances is the number of changes made to e not by D1 but accepted by D1. Because the acceptance term is subtracted in Equation 3.5, scaling_acceptance denotes the magnitude of the factor there; Table 3.7 lists the same factor with its negative sign included, which is why Program 3.2 simply adds SCALING for every event kind.

Program 3.2 describes how we incrementally compute a DOA value for an element, given an authorship event with the element as the target. We get the previous authorship value (line 2), increment it based on the kind of the current event (line 3), and finally update the authorship value of the element (line 4).

Program 3.2 Incremental DOA Computation

PROCESS_EVENT(element, elementEvent)
1   knowledgeElement = GET_KNOWLEDGE_ELEMENT(element)
2   authorship = knowledgeElement.authorship
3   authorship += SCALING(KIND(elementEvent))
4   knowledgeElement.authorship = authorship

DOA(element)
5   knowledgeElement = GET_KNOWLEDGE_ELEMENT(element)
6   return knowledgeElement.authorship

The SCALING function returns the constant associated with each event kind. These scaling factors come from an empirical experiment [7] and are summarized in Table 3.7.

Table 3.7: Authorship scaling factors

Event Type         Scaling Factor
FIRST_AUTHORSHIP   1.098
DELIVERY           0.164
ACCEPTANCE         -0.321

Using the events from Table 3.5, we compute the DOA value of the CortanaCore.getStructureBridge method for developer D1.
To simplify the example, we assume the scaling factors for FIRST_AUTHORSHIP, DELIVERY and ACCEPTANCE are 1, 0.8 and -0.5 respectively. The method does not have any initial DOA value from previous sessions, so its DOA value is 0 (event 1 in Table 3.5). Then D1 shares the method that she authored with her teammate and delivers it to the source revision system (event 2), hence it has a DOA value of 1. From event 3 to 5, D1 delivers another three revisions of the method, giving a DOA value of 3.4 (1 + 3 × 0.8). Then she accepts a change to the method (event 6) from her teammate D2, giving a final DOA value of 2.9 (3.4 - 0.5).

3.3.3 Degree-of-Knowledge

We combine the DOI and DOA of a source code element for a developer to provide an indicator of the developer's knowledge of that element. The degree-of-knowledge we compute linearly combines the factors contributing to the DOA and the DOI. We determined the scaling factors through an experiment described elsewhere [7]. The DOK value of a target element e is computed as:

$$\mathrm{DOK}(e) = 1.098 \times \mathrm{FA}(e) + 0.164 \times \mathrm{DL}(e) - 0.321 \times \ln(1 + \mathrm{AC}(e)) + 0.19 \times \ln(1 + \mathrm{DOI}(e)) \qquad (3.6)$$

FA(e) is the first authorship of the element e, DL(e) is the number of deliveries of e, AC(e) is the number of acceptances of e, and DOI(e) is the DOI value of e.

Chapter 4
Implementation

There are two key challenges to implementing the DOK model: determining the mapping from how a developer interacts with a particular IDE and a particular source revision system to the events described in the knowledge model, and being able to bridge to different monitors to track these interactions. We implemented the knowledge model for the Eclipse IDE and we call it Cortana². The Eclipse IDE provides a set of unified APIs for monitoring a developer's interaction. To monitor authorship changes in two source revision systems, we reuse two applications built on top of Eclipse: Rational Team Concert⁴ and the Eclipse CVS client⁵.
Rational Team Concert is a collaborative work environment built on top of Eclipse for developers, architects and project managers, with work item, source control, build management, and iteration planning support for the Jazz Team Server. The Eclipse CVS client is a component of Eclipse that can connect to the Concurrent Versions System. Jazz Team Server and the Eclipse CVS client provide APIs to access repository information at different levels.

In this chapter, we describe the framework architecture that implements the DOK model (Section 4.1); we then move on to the implementation of the knowledge model, including how we persist knowledge information between sessions (Section 4.2 and Section 4.3); we then describe the implementation of the monitors (Section 4.4) and how to add additional monitors and knowledge indicators to the model (Section 4.5).

² Cortana is a fictional artificially intelligent (AI) character in the Halo video game series who is capable of hacking into the human mind.
³ [...], verified 29 April 2009
⁴ http://jazz.net/, verified 26 April 2009
⁵ http://www.eclipse.org/eclipse/platform-cvs/, verified 26 April 2009

4.1 Architecture

The framework comprises three components: core, monitor and integration. The core component provides a model and operations that are not coupled to any particular platform and are suitable for use in server-side applications or for embedding in another framework. The core component comprises two sub-components, interaction and authorship, which include the model and operations specific to the calculation of DOI and DOA respectively.

Figure 4.1: Framework architecture showing OSGi plug-ins and their dependencies

The monitor component transforms a developer's interaction with, or authorship change of, the source code elements into indicator events that are processed by the core component.
It provides extension points to allow different monitor implementations to contribute indicator events to the calculation of DOK (Section 4.4). The integration component integrates different monitor implementations into the framework (Section 4.5.1). In the current version, the framework bridges to the Eclipse Mylyn project's monitor framework for monitoring user interactions and implements monitors for Jazz Team Server and the Concurrent Versions System to track authorship changes.

Figure 4.1 illustrates our component model, with each label corresponding to a component and each box corresponding to a subcomponent belonging to that component. The key property of this architecture is loose coupling of components. Each component is implemented as a plug-in within the Eclipse architecture, to allow such extensibility as the use of different monitor implementations.

4.2 Core

The core component computes DOI, DOA and DOK values for each source code element. It also persists the computed DOK values to support use of the model across a developer's sessions with the IDE.

Figure 4.2 shows the dependency structure of the core plug-in, with arrows corresponding to plug-in dependencies. The core plug-in depends on the event-based model implemented in the monitor plug-in. The plug-ins in grey represent indirect dependencies of the core plug-in.

Figure 4.2: Core plug-in dependency

The framework utilizes an event-based model and the Eclipse extension point mechanism to achieve loose coupling of components. As shown in Figure 4.3, the core plug-in registers itself with all monitors integrated into the monitor plug-in (registerListener()) and is notified whenever an indicator event is issued by the monitors (handleIndicatorEvent()).

Figure 4.3: Event-based model
The core plug-in then distributes the event to the appropriate event handler that contributes to the calculation of DOK (processIndicatorEvent()). Two event handlers are implemented in the current version: the interaction event handler and the authorship event handler. The core plug-in also provides an extension point for incorporating additional factors when computing DOK values (Section 4.5.2).

Figure 4.4 shows the class diagram of the knowledge model in the core plug-in. A knowledge context (IKnowledgeContext) is formed as a developer works. It consists of the set of source code elements of which the developer has knowledge (IKnowledgeElement). The DOK value of each element increases and decreases as the developer interacts with, or performs authorship change operations on, the source code base. The computation of the DOK value is encapsulated in a composite indicator (IDegreeOfKnowledge), which is a composition of a group of individual knowledge indicators (IDegreeOfIndicator). As discussed in Section 3.1.3, two indicators are implemented: an indicator for short-term knowledge (IDegreeOfInterest) and an indicator for longer-term knowledge (IDegreeOfAuthorship).

Figure 4.4: Knowledge model class diagram

IDegreeOfInterest provides operations in support of the DOI computation by storing the interest value and the decay start event number (Section 3.3.1). Similarly, IDegreeOfAuthorship supports the DOA computation by storing the authorship value (Section 3.3.2). As discussed in Section 4.5.2, the model can be extended to integrate other indicators verified in future work.
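To make the composite structure concrete, the following Python sketch models a knowledge element that holds an authorship indicator and an interest indicator and combines them with Equation 3.6. It is an illustration under stated assumptions, not the plug-in's code: the Python class names only loosely mirror the interfaces named in this section, and clamping a negative DOI to 0 before taking the logarithm is an assumption of the sketch that the text does not specify.

```python
import math
from dataclasses import dataclass, field


@dataclass
class DegreeOfAuthorship:
    first_authorship: int = 0   # FA: 1 if the developer created the element
    deliveries: int = 0         # DL: number of deliveries
    acceptances: int = 0        # AC: number of acceptances


@dataclass
class DegreeOfInterest:
    value: float = 0.0          # DOI, already net of decay


@dataclass
class KnowledgeElement:
    handle: str
    authorship: DegreeOfAuthorship = field(default_factory=DegreeOfAuthorship)
    interest: DegreeOfInterest = field(default_factory=DegreeOfInterest)

    def degree_of_knowledge(self):
        # Equation 3.6. Assumption: a negative DOI is clamped to 0 so that
        # ln(1 + DOI) stays defined.
        doi = max(self.interest.value, 0.0)
        a = self.authorship
        return (1.098 * a.first_authorship
                + 0.164 * a.deliveries
                - 0.321 * math.log(1 + a.acceptances)
                + 0.19 * math.log(1 + doi))


class KnowledgeContext:
    """Holds the elements a developer has knowledge of, keyed by handle."""

    def __init__(self):
        self._elements = {}

    def get_knowledge_element(self, handle):
        # Return the element for this handle, creating it on first use.
        return self._elements.setdefault(handle, KnowledgeElement(handle))


context = KnowledgeContext()
element = context.get_knowledge_element("CortanaCore.getStructureBridge")
element.authorship = DegreeOfAuthorship(first_authorship=1,
                                        deliveries=4,
                                        acceptances=1)
element.interest.value = 1.0
dok = element.degree_of_knowledge()
```

Keeping FA, DL, AC and DOI as separate fields, instead of one pre-summed number, matches the shape of Equation 3.6, where acceptances and DOI enter through a logarithm and so cannot be accumulated as a running total.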
4.3 Storage of Knowledge

The core component also provides facilities for persisting the source code elements for which a developer has DOK values above a certain threshold. Table 4.1 describes the XML schema used to store the DOK values for elements. FA, DL, AC and DOI are the aggregate values from previous sessions.

Table 4.1: XML schema for the persistency of source code elements

XML Attribute     Description
StructureHandle   Identifier of the target element.
StructureKind     Identifier describing the kind of element operated upon.
FA                The value of first authorship.
DL                The number of deliveries.
AC                The number of acceptances.
DOI               The DOI value.
Time              The time at which the element was last interacted with or last had an authorship change.

The application persists these values when the IDE is shut down and restores them when the IDE is started up. We store the components of the DOA computation (FA, DL and AC) but not the components of DOI (selections and edits) in order to conform to the DOK computation formula described in Section 3.3.3.

4.4 Monitor

The monitor component provides different monitor implementations to be integrated into the framework. Since the framework conforms to an event-based mechanism, all monitor implementations should issue a specific indicator event in order to contribute to the DOK computation life cycle.

Figure 4.5 shows the class diagram of some of the essential interfaces in the monitor component. All contributing monitors should implement the IIndicatorEventSource interface to join the framework monitoring life cycle; an implementation issues an IIndicatorEvent whenever an operation contributing to an interaction or authorship change is performed. In the current implementation, two indicator events are supported: interaction events (IInteractionEvent) and authorship events (IAuthorshipEvent).
An IIndicatorEvent encapsulates information about the source element on which an interaction or authorship change operation occurred (e.g., structure handle and structure kind) and the contribution value of the operation (e.g., an element is selected once, hence the event's contribution value is 1).

Figure 4.5: Event-based model class diagram (IIndicatorEventSource with addListener and removeListener; IIndicatorEvent with getStructureHandle, getStructureKind and getContributionValue; IInteractionEvent; IAuthorshipEvent; IIndicatorEventListener)

4.4.1 Interaction Monitoring

The Interaction Monitor subcomponent, an Eclipse plug-in, provides interaction monitoring. The monitor can be installed separately from the framework and collects information about a user's activity in Eclipse. The monitor plug-in adapts Eclipse Mylyn's monitor framework to track developers' interactions with the IDE. It accepts listeners to different Eclipse events, including preference changes, perspective changes, window events, selections, and commands invoked through menus or key bindings. The IInteractionEvent interface is used by the monitor to encapsulate information about a developer's interaction with source code elements. Scaling factors are used to determine the bias between different kinds of interactions (e.g., bias of edits vs. selections), as listed in Table 3.6.

Figure 4.6 shows the dependencies of this plug-in. It adapts to the framework's event-based model by reusing the interaction monitoring facilities provided by the Eclipse Mylyn project. The idea is that whenever an interaction is performed, the internal event generated by the Mylyn monitor is translated to the indicator event used in the framework and broadcast.

Figure 4.6: Interaction monitor plug-in dependency
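The translate-and-broadcast flow described in this section can be sketched as a small listener pattern. The Python below is an assumption-laden illustration: the method names loosely mirror the interfaces in the monitor component, the `kind` field and the `RELEVANT` table are hypothetical, and the sketch assumes the monitor emits a contribution value of 1 per occurrence and leaves scaling to the receiver. It does not use the Mylyn or Eclipse APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndicatorEvent:
    structure_handle: str       # identifier of the target element
    structure_kind: str         # kind of element, e.g. "java"
    kind: str                   # hypothetical field: "SELECTION" or "EDIT"
    contribution_value: float   # 1.0 for a single occurrence


class IndicatorEventSource:
    def __init__(self):
        self._listeners = []

    def add_listener(self, listener):
        if listener not in self._listeners:
            self._listeners.append(listener)

    def remove_listener(self, listener):
        if listener in self._listeners:
            self._listeners.remove(listener)

    def broadcast(self, event):
        # Iterate over a copy so listeners may unregister while notified.
        for listener in list(self._listeners):
            listener(event)


class InteractionMonitor(IndicatorEventSource):
    # Hypothetical names for the IDE-level events forwarded by this sketch.
    RELEVANT = {"selection": "SELECTION", "edit": "EDIT"}

    def on_ide_event(self, ide_kind, handle, structure_kind="java"):
        kind = self.RELEVANT.get(ide_kind)
        if kind is None:
            return  # in this sketch, other IDE events are not forwarded
        self.broadcast(IndicatorEvent(handle, structure_kind, kind, 1.0))


received = []
listener = received.append          # stands in for the core plug-in
monitor = InteractionMonitor()
monitor.add_listener(listener)
monitor.on_ide_event("selection", "CortanaCore.getStructureBridge")
monitor.on_ide_event("perspective_change", "workbench")
monitor.remove_listener(listener)
monitor.on_ide_event("edit", "CortanaCore.getStructureBridge")
```

After this run `received` holds only the selection event: the perspective change is dropped by the monitor, and the edit arrives after the listener has been removed. The source knows nothing about what its listeners compute, which is the loose coupling the architecture aims for.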
4.4.2 Authorship Monitoring

The Authorship Monitor subcomponent provides authorship change monitoring by bridging to different source revision systems. These monitors can be plugged into the framework to track the delivery and acceptance of any source code element in different source repositories. The basic idea is that a monitor listens to the connection between a developer's local workspace and its remote copy and generates an event whenever an operation is detected.

Since different source control systems support different types of commands using different terms, we find it necessary to define unified names for the set of generic operations that are supported by most repositories. For example, in our framework delivery means the commit of changes made to an element by a developer after first authorship. This command is referred to as a deliver in Jazz Team Server, while in CVS it is called a commit.

Two authorship monitors are currently implemented: a monitor for the Jazz Team Server (Jazz) and a monitor for the Concurrent Versions System (CVS). Each of them monitors the operations that a developer performs in a team shared view in Eclipse and issues the corresponding indicator events. The operations include delivery of changes to, and acceptance of changes from, the source revision systems.

The IAuthorshipEvent interface is used by the monitors to encapsulate the authorship information when a developer changes an element, including information on the type of change (e.g., adding a method) and the author who made the change. Scaling factors listed in Table 3.7 are used to determine the bias between different kinds of authorship changes (e.g., bias of first authorship vs. acceptance).

Before going into the details of the implementation, we first compare the two source revision systems. The subsection Jazz vs. CVS describes the conceptual differences between them; the subsections Jazz Team Server and Concurrent Versions System then describe the implementation of each monitor.
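One way to picture the unified operation names is as a lookup from each system's command vocabulary to the framework's two authorship event kinds. The sketch below uses the per-system command lists that this section gives for the Jazz and CVS monitors; the function name, the enum and the lower-case system keys are hypothetical and are not part of the Jazz or Eclipse CVS APIs.

```python
from enum import Enum


class AuthorshipEventKind(Enum):
    DELIVERY = "DELIVERY"
    ACCEPTANCE = "ACCEPTANCE"


D = AuthorshipEventKind.DELIVERY
A = AuthorshipEventKind.ACCEPTANCE

# Command names as listed for each monitor in this section.
COMMAND_TO_EVENT = {
    "jazz": {"commit": D, "deliver": D, "check-in": D,
             "accept": A, "load": A},
    "cvs": {"edit": D, "commit": D, "check-in": D,
            "update": A, "check out": A},
}


def to_authorship_event(system, command):
    """Return the unified event kind, or None for an unmapped command."""
    return COMMAND_TO_EVENT[system.lower()].get(command.lower())


jazz_deliver = to_authorship_event("Jazz", "deliver")
cvs_commit = to_authorship_event("CVS", "commit")
cvs_update = to_authorship_event("CVS", "update")
```

Here a Jazz deliver and a CVS commit both resolve to DELIVERY, while a CVS update resolves to ACCEPTANCE, so code downstream of the monitors never needs to know which repository produced an event.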
Jazz vs. CVS

The biggest difference between CVS and Jazz is that Jazz pushes the idea of a local workspace one step further: Jazz stores the local workspace on the server as a repository workspace, for which CVS has no equivalent. A repository workspace is a personal workspace containing the project files that are under source control. In Jazz, teams use a stream to store the master copy of the project files; each repository workspace holds a copy. Change sets checked in to a repository workspace can be delivered to a stream to make them available to other team members. There are no branches in Jazz, since each team member is already working on a branch (called a repository workspace) from the stream and contributes to the stream in units of change sets. A stream is a repository workspace owned by all team members, whereas a repository workspace is a workspace owned by a particular team member. A new repository workspace can be cloned from any of these, which is similar to creating a new branch from the head or from another branch in CVS terms.

Another difference is that CVS does not have the concept of change sets, since CVS stores revisions of files individually. In Jazz, a change set is a grouping of related changes to files in a workspace or stream. In addition to updates to the contents of files, a change set in Jazz also tracks file and folder creations, deletions, renames, and moves. Table 4.2 provides a comparison of terms between the two version control systems.

Table 4.2: Comparison of terms between Jazz Team Server and Concurrent Versions System

Concurrent Versions System     Jazz Team Server
Head                           Stream
Branch                         Repository workspace
Top-level folder in a module   Component
Commit                         Deliver
Update                         Accept
N/A                            Change set

In the framework, we use the following terms. A delivery means committing changes to the stream from any workspace. The workspace can be a local workspace or a repository workspace.
An acceptance means accepting changes from the stream into any workspace.

Jazz Team Server

To implement the Jazz Team Server monitor, we use the unified APIs provided by Rational Team Concert. Rational Team Concert is a client to the Jazz Team Server built on top of the Eclipse IDE, and it has a set of APIs to track delivery and acceptance of change sets inside the IDE. The APIs provide access to detailed information about a change set, including the author, the time stamp of the change, and the change type. Individual commands are mapped, per artifact, onto an authorship event as follows:

• commit, deliver and check-in: DELIVERY
• accept and load: ACCEPTANCE

Figure 4.7 shows the dependencies of the Jazz Team Server monitor.

Concurrent Versions System

CVS does not have the concept of a change set, hence no APIs are provided to access information about a change set, such as who made the change and when the change was accepted by a particular team member. In CVS, the change information retrieved from the repository does not include the acceptance date of the changes. We therefore assume that whenever a developer commits a change to the repository, other team members accept it immediately. The mapping of commands to events is, as a result, slightly different (once again, per artifact):

• edit, commit and check-in: DELIVERY
• update and check out: ACCEPTANCE

Figure 4.8 shows the dependencies of the CVS monitor. To implement the CVS monitor, we use the APIs that come with the Eclipse CVS client. The CVS monitor plug-in depends on the monitor framework, the Eclipse team plug-in, and the Eclipse CVS client plug-in.

Figure 4.8: Concurrent Versions System monitor plug-in dependency

4.5 Extensions

The framework is extensible through extension points and APIs.
We define two extension points: the monitor extension point and the knowledge model extension point. With the monitor extension point, additional monitor implementations can be integrated into the framework to contribute to the DOK calculation life cycle. With the knowledge model extension point, the framework can be extended to support new knowledge indicators, for example, an authorship indicator that considers information from bug reports.

Section 4.5.1 describes extending the monitor to capture authorship changes for different source revision systems. Section 4.5.2 discusses extending the knowledge model with additional indicators.

4.5.1 Bridging to a Source Revision System

The monitor component (ca.ubc.cs.spl.cortana.monitor) provides an extension point (ca.ubc.cs.spl.cortana.monitor.repository) to monitor additional source revision systems for the DOA calculation. Figure 4.9 shows the class diagram of the repository bridge. AbstractRepository models the basic behaviors of a repository. AbstractRepositoryBridge provides operations specific to a repository, such as issuing an indicator event (IIndicatorEvent) when the authorship of a source code element changes. To monitor additional repositories, one must provide specific implementations of AbstractRepository and AbstractRepositoryBridge. Currently two repositories are bridged: Jazz Team Server and Concurrent Versions System.

4.5.2 Adding a Knowledge Indicator

The core plug-in (ca.ubc.cs.spl.cortana.core) provides an extension point (ca.ubc.cs.spl.cortana.core.indicator) to add new knowledge indicators to the DOK calculation life cycle. As shown in Figure 4.2, one simply needs to provide implementations of IDegreeOfKnowledgeComputationStrategy and IDegreeOfIndicator, which are used by IDegreeOfKnowledge to compute DOK values. Currently two indicators are implemented: degree-of-interest (IDegreeOfInterest) and degree-of-authorship (IDegreeOfAuthorship).
The default computation strategy is implemented according to the formula described in Section 3.3.3. This flexibility enables easy integration of additional knowledge indicators, for example, adding another authorship indicator by crawling information from bug reports.

Figure 4.9: Repository bridge class diagram

4.6 Performance

Our event-based approach scales even if there are multiple monitors issuing thousands of events at the same time. We describe two performance test cases using the Eclipse performance test plug-in⁶ to demonstrate the scalability of our approach. The performance test plug-in provides infrastructure for instrumenting programs to collect performance data for Eclipse-based applications and to assert that performance does not drop below a baseline.

⁶ [...], verified 26 April 2009

The goal of the performance test cases is to simulate a typical workday scenario of a developer and then measure the performance of the framework's event-based model. We generate test data by assuming that the probability that a developer performs operations (e.g., a selection or a delivery) on a certain source code element conforms to a Normal Distribution, and that the probability of the type of operation performed conforms to a Uniform Distribution. The reason is that developers typically focus on a subset of source code elements in a workday, so the probability of operating on these elements is much higher than for the rest, while the type of the operations performed can be regarded as random. In both our scenarios, we assume that there are 100,000 elements loaded in a developer's workspace and that the developer frequently works on 1% of them (1000).
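The sampling assumption above can be sketched as a small generator. This is only an illustration of the idea, not the test harness used for the measurements: the function name, the list of event kinds, the choice of a contiguous band of indices as the frequently used elements, and the standard deviation are all assumptions of the sketch.

```python
import random

# Assumed set of event kinds, chosen uniformly at random.
EVENT_KINDS = ["SELECTION", "EDIT", "DELIVERY", "ACCEPTANCE"]


def generate_events(n_events, n_elements=100_000, hot_fraction=0.01, seed=0):
    """Draw (element index, event kind) pairs for a simulated workday."""
    rng = random.Random(seed)
    hot_count = n_elements * hot_fraction   # 1000 frequently used elements
    centre = n_elements / 2
    sigma = hot_count / 4                   # most draws land in the hot band
    events = []
    for _ in range(n_events):
        # Normal draw over element indices, centred on the hot band.
        index = int(rng.gauss(centre, sigma))
        index = min(max(index, 0), n_elements - 1)   # clamp rare outliers
        # Uniform draw over event kinds.
        events.append((index, rng.choice(EVENT_KINDS)))
    return events


events = generate_events(8000)
hot_hits = sum(1 for index, _ in events if 49_500 <= index < 50_500)
```

With the standard deviation set to a quarter of the band width, roughly 95% of the generated events fall on the 1000 elements in the middle of the index range, and the remainder spread thinly over nearby elements.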
We generate indicator events for these 1000 random elements with high probability and generate events for elements outside this set with low probability. The event type of each generated event is picked at random. All the test results are statistically significant with 6400 samples and were generated on a machine with a dual-core 2.5 GHz CPU and 3 GB of RAM.

4.6.1 One Working Day Scenario

From the data collected in our previous experiment [7], a professional developer typically has an average of 7861 (±3982) interaction events over seven working days and an average of 153,240 (±46,572) authorship events over three months. Thus the number of events in a regular working day is approximately 2825 (±1086): about 1123 interaction events (7861/7) plus about 1703 authorship events (153,240/90, taking three months as 90 days). We conducted five performance tests to see whether the performance of the event-based model drops significantly when handling 1500, 3000, 4000, 6000 and 8000 events at once for the calculation of DOK values. The numbers of events in three of the tests match the numbers of events that a developer may plausibly have in a typical workday (1500, 3000 and 4000), and we added two load tests (6000 and 8000) to investigate whether the event-based model scales.

Figure 4.10 shows the amount of Java heap used for the tests, which is the amount of memory (KB) used in the JVM, calculated by subtracting free memory from total memory. The framework needs only 279.92 KB of memory with 8000 events contributing to the calculation of the DOK values. The amount of memory used does not double even when the number of events doubles from 4000 to 8000. Figure 4.11 shows the CPU time used for the same tests. The framework needs only 17 ms to handle 8000 events in the load test. We conclude that there is no significant performance impact when computing DOK values in a single session.

4.6.2 Seven Working Days Scenario

We conducted another performance test to simulate a developer working for seven consecutive workdays.
For each day, we assume that the developer generates 8000 events (e.g., a SELECTION event) and that the application reproduces the elements with positive DOK values the next day. This simulation is close to a developer working in an IDE with one session per day. Table 4.3 shows the number of elements with positive DOK values at the end of each day.

Figure 4.10: Used Java heap in kilobytes for one simulated workday

Figure 4.11: CPU time in milliseconds for one simulated workday

Table 4.3: Number of elements with positive DOK values

Day 1   Day 2   Day 3   Day 4   Day 5   Day 6   Day 7
767     780     798     779     811     782     772

Internally, the framework reproduces elements by issuing indicator events with the INITIAL_XXX event type (details in Table 3.2 and Table 3.4). Our test case mocks up this mechanism by translating the elements that have positive DOK values at the end of a day into indicator events and feeding them, together with other randomly generated events, into the next day.

Figure 4.12 shows that the event-based model uses a steady amount of memory (286.5 KB on average) to handle the events of each day over seven working days. Figure 4.13 indicates that the event-based model needs an average of 19 ms to handle the same number of events each day. We conclude that our event-based model scales between sessions.
Figure 4.12: Used Java heap in kilobytes for seven simulated workdays

Figure 4.13: CPU time in milliseconds for seven simulated workdays

Chapter 5
Discussion

In this chapter, we discuss the potential usefulness of the DOK model. We report on five usage scenarios that consider the use of the knowledge indicator to help scope structured views (Section 5.1), to help raise awareness of the development process (Section 5.2), to help find experts (Section 5.3), to help with assessing information in change sets (Section 5.4), and to help with onboarding (Section 5.5). We also provide direction for future work (Section 5.6).

5.1 Scoped Structured View

The DOK model can help focus structured views by projecting the source code elements with high DOK values onto a particular UI facility. The basic idea is that if a source code element's DOK value exceeds a certain threshold, it is filtered out of the structured views. We believe this filtering can help relieve developers of information overload and focus their attention on the elements that potentially need further review.

We illustrate how to use our DOK model to further filter elements in a task context in Mylyn. As a developer works on a task, she gradually forms the task context, the set of all artifacts related to the task [9]. Mylyn uses this task context to focus the UI on interesting information and hide what is uninteresting, for example, by filtering elements in the Package Explorer view in Eclipse. With our knowledge model, we can further filter out the elements in a task context that the developer already knows and focus the view on elements that might need special attention, for example, significant elements related to the task of which the developer has no knowledge.
We believe this usage is helpful in scoping elements in a task context, especially if the task context was not authored by the developer but was shared by other developers in the team and retrieved from a bug repository.

Figure 5.1 illustrates this scenario, with the elements in a task context on the left and the elements remaining after filtering by the DOK model on the right. As shown, a developer uses a task context originally created by a teammate. The Mylyn focus UI is turned on (1 in Figure 5.1) and only the interesting elements are shown in the Package Explorer view (2 in Figure 5.1). At this point, the developer would like to further scope her view to elements that she potentially needs to learn, so she turns on the focus facility connected to the DOK model (3 in Figure 5.1) and focuses her view on elements of which she does not have high knowledge (4 in Figure 5.1).

Figure 5.1: Scoping task context with DOK model

5.2 Awareness of Knowledge

The DOK model can be used to raise awareness of potential increases and decreases in a developer's knowledge of a source codebase. The basic idea is that whenever the DOK values of a certain number of source code elements rise above, or drop below, a certain threshold, the developer is notified that her knowledge of these elements has potentially changed significantly and that she may need to review them to understand the cause.

For example, as shown in Figure 5.2, whenever a developer synchronizes her workspace with the source repository and plans to accept changes from her teammates, an assessment is conducted to indicate the effect of accepting these changes and a notification dialog is shown to make the developer aware of the effect. The notification provides information such as which source code elements will have their DOK values drop if the changes are accepted and the significance of the drop (e.g., 14 out of 22 elements in IKnowledgeContext will have their DOK values drop and the overall drop reaches 70%). The developer can then review these elements to understand the cause of the knowledge drop by opening the Knowledge Explorer view (e.g., the drop is caused by accepting a change from another developer in the team).
Figure 5.2: Awareness of knowledge drop when synchronizing workspace

5.3 Knowledge Map

By ranking individual developers’ knowledge of a codebase using DOK, we can apply our approach to identify which team member is the expert for each part of the codebase. The basic idea, as described in our user study [7], is that for each package and each developer in the team, we calculate an aggregate DOK value for the package by summing the developer’s DOK values of all source code elements belonging to the package. We can then produce a diagram, called a knowledge map, that shows for each package the developer with the highest aggregate DOK for that package. A part of the knowledge map for a project is presented in Figure 5.3. It shows which team member knows the most about each package by highlighting the packages in different colors. We reported a 25% improvement in correctly finding experts compared to other approaches [7].

Figure 5.3: Part of a knowledge map

5.4 Change Set Assessment

A developer who receives a change set from another team member typically assesses the change set before accepting it into her development environment. We believe that shipping, along with a change set, additional elements that represent the developer’s knowledge of that change can help developers better assess the incoming change set.
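One way to picture which additional elements could travel with a change set is the sketch below. It assumes hypothetical inputs that the thesis does not define in this form: a mapping from each changed element to its structurally related elements, the delivering developer's DOK values keyed by element name, and a cut-off for what counts as high knowledge. The function name elements_to_attach and the threshold are illustrative only.

```python
def elements_to_attach(changed, related, dok, threshold):
    """Pick structurally related elements worth shipping with a change set.

    changed   -- list of element names modified in the change set
    related   -- dict: element name -> list of structurally related elements
                 (e.g. superclasses); how these are found is out of scope here
    dok       -- dict: element name -> the delivering developer's DOK value
    threshold -- assumed cut-off above which a DOK value counts as "high"
    """
    attached = []
    seen = set(changed)  # never re-attach something already in the change set
    for element in changed:
        for neighbour in related.get(element, []):
            if neighbour in seen:
                continue
            if dok.get(neighbour, 0.0) >= threshold:
                attached.append(neighbour)
                seen.add(neighbour)
    return attached
```

Elements without a recorded DOK value are treated as unknown to the developer and are left out, which keeps the attachment list short.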
The basic idea is that whenever a developer delivers a change set, she attaches to it those structurally related elements of the changes (e.g., superclasses) that also have high DOK values; team members can then better assess the changes by looking at these attached elements. We found that this approach shows promise in helping developers understand a change set [7].

5.5 Onboarding

We believe our DOK model is also useful in helping developers who newly join a development project learn the basic structure of the codebase. The process of becoming proficient with a codebase is known as onboarding [4]. The idea is to filter the display of code elements in a project to focus on those with which the newcomer’s mentor is familiar, ranking the elements according to their DOK values: the higher the DOK value of an element in the mentor’s workspace, the higher the priority with which the newcomer should learn it. We reported that the elements with high DOK values in the mentor’s workspace can be used as a starting point for the newcomer to find useful elements that help in learning the structure of the codebase [7].

5.6 Future Work

We have defined a model that indicates the knowledge of a developer and implemented an effective framework for calculating the DOK values. Our implementation is one particular point in the design space of using the DOK model to scope down the information presented in an application. In this section, we discuss alternative approaches and future work that could improve the accuracy and applicability of the model.

5.6.1 Adding More Indicators

To gauge the knowledge of a developer, we take into account only two main factors (interactions and authorship) from [6]. To obtain a better indicator of program structure knowledge, other factors, such as code patterns, could be incorporated if future experiments show them to be useful.
As described in Section 4.5.2, the current implementation is flexible enough to support additional indicators integrated into the life cycle of the DOK computation. What remains is to conduct user studies to find out how significant other factors are in gauging developers’ knowledge and to provide corresponding implementations.

5.6.2 An Automated Learning Model

It would be interesting to combine the DOK idea with a machine learning approach. Currently, we predefine a fixed model and apply it to developers working in various environments. While the model approximates a developer’s knowledge reasonably well, there are cases where it does not hold because of different development processes or different projects. For example, we reported that the DOK values computed using the model tuned at one site are less statistically significant at a different site [6]. We believe that each developer should have his own DOK model with a unique formula. The basic idea is to use an online learning approach to keep tuning the DOK model as a developer works: starting from a generic model, a developer’s DOK model is constantly adjusted according to his working habits as more activity data arrives through the monitors. The model would eventually converge to a shape that predicts a particular developer’s knowledge for the particular project that he works on and the particular development process that he uses.

Chapter 6

Summary

The integrated development environments (IDEs) used by developers to work on large software systems make it easy to browse and query the structure of the information that developers need to access. However, given the complexity of today’s information systems, developers end up spending an inordinate amount of time filtering the information by considering both implicit and explicit cues.
In this thesis, we investigated an indicator of a developer’s knowledge of code that might be used as an explicit cue to help developers filter through the information comprising a system.

The thesis introduces a degree-of-knowledge (DOK) model that incorporates a long-term component, degree-of-authorship (DOA), and a short-term component, degree-of-interest (DOI), to help combat the information overload problem faced by software developers. The thesis also describes the implementation of the DOK model in detail, provides documentation on some of the APIs, and provides performance test data to show that the implementation scales. Finally, the thesis discusses how the DOK model might be investigated in future research to help alleviate the information overload problem.

Bibliography

[1] Anita Sarma, Zahra Noroozi, and Andre van der Hoek. Palantir: raising awareness among configuration management workspaces. Proceedings of the 25th International Conference on Software Engineering, pages 444-454, IEEE Computer Society Press, 2003.
[2] Grady Booch and Alan W. Brown. Collaborative development environments. In Computer, volume 26, pages 17-27. IEEE Computer Society Press, 1993.
[3] Grady Booch and Alan W. Brown. Collaborative Development Environments, volume 59 of Advances in Computers. Academic Press, 2003.
[4] Mauro Cherubini, Gina Venolia, Rob DeLine, and Andrew J. Ko. Let’s go to the whiteboard: how and why software developers use drawings. Proceedings of the SIGCHI conference on Human factors in computing systems, pages 557-566, ACM Press, 2007.
[5] Geraldine Fitzpatrick, Paul Marshall, and Anthony Phillips. CVS integration with notification and chat: lightweight software team collaboration. Proceedings of the 2006 20th anniversary conference on Computer Supported Cooperative Work, pages 49-58, ACM Press, 2006.
[6] Thomas Fritz, Gail C. Murphy, and Emily Hill. Does a programmer’s activity indicate knowledge of code? Proceedings of the 6th joint meeting of the European Software Engineering Conference and the ACM SIGSOFT symposium on the Foundations of Software Engineering, pages 341-350, ACM Press, 2007.
[7] Thomas Fritz, Jingwen Ou, and Gail C. Murphy. Degree-of-knowledge: Investigating an indicator for source code authority. TR-2009-13, University of British Columbia Computer Science Technical Report, 2009.
[8] Tudor Girba, Adrian Kuhn, Mauricio Seeberger, and Stephane Ducasse. How developers drive software evolution. Proceedings of the 8th International Workshop on Principles of Software Evolution, pages 113-122, IEEE Computer Society Press, 2005.
[9] Mik Kersten. Focusing knowledge work with task context. University of British Columbia, 2007.
[10] Mik Kersten and Gail C. Murphy. Mylar: a degree-of-interest model for IDEs. Proceedings of the 4th international conference on Aspect-Oriented Software Development, pages 159-168, ACM Press, 2005.
[11] Mik Kersten and Gail C. Murphy. Using task context to improve programmer productivity. Proceedings of the 14th ACM SIGSOFT international symposium on Foundations of Software Engineering, pages 1-11, ACM Press, 2006.
[12] David W. McDonald and Mark S. Ackerman. Expertise recommender: a flexible recommendation system and architecture. Proceedings of the 2000 ACM conference on Computer Supported Cooperative Work, pages 231-240, ACM Press, 2000.
[13] Shawn Minto and Gail C. Murphy. Recommending emergent teams. Proceedings of the 4th International Workshop on Mining Software Repositories, page 5, IEEE Computer Society Press, 2007.
[14] Audris Mockus and James D. Herbsleb. Expertise browser: a quantitative approach to identifying expertise. Proceedings of the 24th International Conference on Software Engineering, pages 503-512, ACM Press, 2002.

