CodeShovel: constructing robust source code historybyFelix GrundB.Sc., Bamberg University, 2012a thesis submitted in partial fulfillmentof the requirements for the degree ofMaster of Scienceinthe faculty of graduate and postdoctoral studies(Computer Science)The University of British Columbia(Vancouver)June 2019c© Felix Grund, 2019The following individuals certify that they have read, and recommend to the Fac-ulty of Graduate and Postdoctoral Studies for acceptance, the thesis entitled:CodeShovel: constructing robust source code historysubmitted by Felix Grund in partial fulfillment of the requirements for the degreeof Master of Science in Computer Science.Examining Committee:Reid Holmes, Computer ScienceSupervisorIvan Beschastnikh, Computer ScienceSupervisory Committee MemberiiAbstractSource code histories are valuable resources for developers, and development tools,to reason about the evolution of their software systems. Through a survey with 42professional software developers, we gained insight in how they use the history oftheir projects and what challenges they face while doing so. We discovered signif-icant mismatches between the output provided by developers’ existing approachesand what they need to successfully complete their tasks. To address these short-comings, we created CodeShovel, a tool for navigating method histories that is ableto quickly produce complete method histories in 90% of the cases. CodeShovelenables developers to navigate the entire history of source code methods quicklyand reliably, regardless of the transformations and refactorings the method has un-dergone over its lifetime, helping developers build a robust understanding of itsevolution. A field study with 16 industrial developers confirmed our empiricalfindings of CodeShovel’s correctness and efficiency and additionally showed thatour approach can be useful for a wide range of industrial development tasks.iiiLay SummaryAt the core of all software is its source code that is saved in (normally multiple) fileson the file system. To navigate changes to source code, most software developersuse version control, a range of tools that capture all changes made to the sourcecode as dedicated versions. This work aims to provide better views and meansof navigation for such version histories. In particular, these histories are normallyviewed and navigated on the file level, which makes it very hard to view histories ofonly certain components of the source code. We introduce a new tool that enablestracing source code history of logical components of the code, rather than the textsaved in a file.ivPrefaceAll of the work presented henceforth was conducted in the Software Practices Lab-oratory at the University of British Columbia, Point Grey campus. All projects andassociated methods were approved by the University of British Columbia’s Re-search Ethics Board [certificate #H18-00510].This material is the result of ongoing research at the Software Practices Lab-oratory. I was the lead investigator, responsible for all major areas of conceptformation, data collection and analysis, as well as manuscript composition. ReidHolmes was the supervisory author on this project and was involved throughoutthe project in concept formation and manuscript composition. The material has notbeen published prior to this thesis.vTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiLay Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivPreface . . . . . . . . . . . . . . . . 
Table of Contents
List of Tables
List of Figures
Acknowledgments
1 Introduction
2 Motivation and Background
  2.1 General Scenario
  2.2 File- and line-based source code history
  2.3 Specific Scenario
3 Related Work
  3.1 Analysis burden
  3.2 Granularity
  3.3 Transformations
  3.4 CodeShovel
4 Industrial Survey
  4.1 Survey Design
  4.2 Survey Participants
  4.3 Survey Results
    4.3.1 Developers' usage of source code history
    4.3.2 Source code granularity
    4.3.3 History and source code units
    4.3.4 History tools and program comprehension
5 Approach
  5.1 CodeShovel Inputs and Outputs
  5.2 Method Finding
  5.3 Similarity Algorithm Design
  5.4 Change Analysis
6 Implementation
  6.1 Languages and Platforms
  6.2 Similarity Algorithm Implementation
  6.3 History Traversal
  6.4 Language Adapter API
7 Empirical Evaluation
  7.1 Subjects
  7.2 Methodology
  7.3 Empirical Study Results
    7.3.1 Frequency of code changes
    7.3.2 CodeShovel correctness
    7.3.3 CodeShovel performance
  7.4 Change Tagging
  7.5 Comparison with git-log
8 Industrial Field Study
  8.1 Study Participants
  8.2 Study Design
  8.3 Study Results
    8.3.1 CodeShovel correctness
    8.3.2 CodeShovel performance
    8.3.3 Scenarios for method-level histories
9 Discussion
  9.1 Threats to Validity
    9.1.1 Internal Validity
    9.1.2 External Validity
  9.2 Future Work
    9.2.1 Interaction with CodeShovel
    9.2.2 Integration of CodeShovel in other software
    9.2.3 More code granularities and change details
    9.2.4 Source code statistics
    9.2.5 CodeShovel robustness
10 Conclusion
Bibliography
A Survey
  A.1 Survey Form
  A.2 Survey Results
B Field Study Transcripts

List of Tables

Table 2.1  Actual history of method CommonUtils::createPattern. Rows with ? were returned by git-log; other rows are false negative results that git-log failed to identify. One false positive result returned by git-log (2864c10) is not included in the list. CodeShovel correctly identified all results for this method with no false positives or false negatives.

Table 3.1  Selection of tools for examining source code histories. These vary in whether they are online (computing histories on demand or requiring pre-processing a whole project), the granularity that can be analyzed (code means a subset of class, method, or statement), and their tolerance to common source code transformations (M- refers to method, F- refers to files, ≈ refers to string matching, and M-Move denotes move method, pull-up method, and push-down method).

Table 4.1  Most commonly used history navigation tools.

Table 7.1  Java repositories used for our empirical analysis and their statistics. Repositories annotated with a ? were used in the training analysis (total of 65,171 methods) while the rest were used in the validation analysis (total of 110,954 methods).

Table 7.2  Proportion of methods tagged with each change type. These add up to > 100% because a change can be tagged with more than one kind of change. Changes in the bottom half of the table modify the method signature.

List of Figures

Figure 4.1  Structural granularities most desired by study participants when navigating source code histories.

Figure 4.2  How well participants' strategies cope with common code changes. Change A: rename method; Change B: change method signature; Change C: move method to another file; Change D: split method into multiple methods; Change E: combination of previous changes.

Figure 5.1  High-level approach: each query starts with a method name and SHA. CodeShovel iterates backwards through history until it finds the introducing commit for that method.

Figure 5.2  Hierarchy of change types in CodeShovel.
Figure 7.1  The proportion of methods having each number of changes as identified by CodeShovel. This shows that while 33% of methods are only changed once, 50% of methods are changed three times or more.

Figure 7.2  A diff showing a complex method transformation. The create method was renamed to bindMatchers, the parameters were changed, an exception signature was added, and the creation invocation was removed from the body.

Figure 7.3  The median time it took CodeShovel to process all methods in each repository (point) listed in Table 7.1. The overall median runtime is under 2 seconds; the outlier is the intellij-community repository which has many large and frequently changing source files.

Figure 7.4  Fraction of CodeShovel data returned by git-log as a fraction of total duration of results. For example, for a method that had 1,000 days of history with CodeShovel, git-log was only able to find 510 days' worth of history (on average).

Figure 8.1  CodeShovel runtimes on 42 methods from an industrial codebase. The slow performance of the outlying method was due to the containing file needing to be parsed a large number of times.

Acknowledgments

I thank everyone at the Software Practices Lab of the Computer Science Department of the University of British Columbia for a magnificent 2.5 years in academia. My master's program at UBC has been an extraordinary experience. Moreover, I thank everyone at Scandio GmbH in Munich, my first employer, for their great support before and throughout my master's program. And last but most importantly, I thank my family who have always had open ears when I was faced with complicated life situations.

Chapter 1
Introduction

Source code repositories contain a wealth of information that is valuable and relevant to support developers in their work. For instance, this information can answer questions commonly asked about source code (e.g., [15, 18, 21, 29]), help predict the location and likelihood of source code defects [22], or provide more context for code reviews [17]. We performed a survey with 30 industrial developers and 12 academic developers and found that they frequently have questions that they try to answer using source code histories, but that existing tools do not support them in accessing the historical data they need. Specifically, existing tools return results at the wrong granularity (they are file-based rather than method-based) and results are often incomplete (common file transformations inhibit retrieving complete historical data). Consequently, developers have to perform multiple queries and manually filter changes to identify the changes that are relevant to their task.

To address the shortcomings identified by our survey participants, and to help them find the information they need from source code histories, we built CodeShovel, a tool for surfacing complete histories of source code methods. CodeShovel works by parsing the files changed in a commit and using the resulting Abstract Syntax Tree (AST) to reason about the nature of the change. Currently, a robust language parser for Java is available, and we have started to work on JavaScript and Ruby parsers. The algorithm is easily extensible for other languages.
A similarity algorithm that considers five key features compares methods in the ASTs to determine whether a method of interest was modified, and if so how. Ultimately, CodeShovel is able to quickly and accurately return complete and precise method-level source code histories. In our previously described survey, we identified that a majority of industrial developers would find a tool working on the method- and class-levels most useful. We therefore implemented CodeShovel to work on methods with the perspective that an extension for classes would be feasible by aggregating method histories.

We evaluated CodeShovel's correctness and performance using 20 popular open source repositories with a total of 176,125 methods. We manually created oracle method histories for 10 methods in each of the 20 repositories (totaling 200 methods) and then proceeded in two phases: first, we trained our algorithm on the first 100 oracle method histories and created unit tests to prevent regression; second, we evaluated the correctness of CodeShovel on the next 100 oracle method histories after an implementation freeze. We found that CodeShovel is able to correctly determine the complete history of over 90% of methods. By running CodeShovel on all the 176,125 methods in the 20 repositories, we found that CodeShovel produces method histories with a median execution time of less than 2 seconds.

To ensure the correctness of CodeShovel translates to closed-source, industrial code bases, we additionally ran CodeShovel on 45 participant-selected methods in a field study with 16 industrial developers. The results were nearly identical to our empirical study in terms of accuracy (the developers assessed 91% of method histories to be correct and complete) and performance (median < 2 seconds execution time). Furthermore, our field study revealed that CodeShovel would be useful in the domains of provenance, traceability, onboarding, code understanding, and automation.

The primary contributions of this paper are:

• A survey with 42 professional developers to understand how they use source code history; this demonstrated a lack of tool support for the most frequently-performed historical understanding tasks.

• The implementation of CodeShovel, a novel approach for extracting method-level source code histories (CodeShovel is available at: https://github.com/ataraxie/codeshovel).

• A quantitative analysis of CodeShovel's accuracy and performance.

• A field study with 16 industrial developers verifying CodeShovel's correctness and performance, and an indication of use cases where CodeShovel would be most helpful.

The remainder of the paper is structured as follows: first, we describe a complete scenario in Chapter 2 where we illustrate the problems developers are commonly facing with line-based historical analysis tools. Chapter 3 then presents related work where we place CodeShovel in the context of similar tools for source code history. Chapter 4 describes the design and results from our survey with 42 professional developers. Here, we aim to answer the following research questions:
RQ1: Do developers use source code history when they are working with code? If so, what are they trying to learn when they examine source code history?

RQ2: In terms of their mental models and information needs, what level of temporal and structural granularity are most appropriate when using source code history?

RQ3: How do developers identify the history for specific code units and how well does existing tooling support these queries?

RQ4: How effectively can a language-aware code history viewer support program comprehension?

Our approach with CodeShovel is described in Chapter 5 and its implementation in Chapter 6. We then evaluate our tool empirically in Chapter 7 based on the following research questions:

RQ5: How frequently are methods changed?

RQ6: Are the results returned by CodeShovel correct?

RQ7: Is CodeShovel fast enough for being used as an online tool that does not require preprocessing?

Chapter 8 describes our field study in which we further analyze RQ6 and RQ7 and one more research question:

RQ8: In which scenarios are method-level histories useful to industrial developers and why?

Discussion with threats to validity and future work follow in Chapter 9 before we conclude in Chapter 10.

Chapter 2
Motivation and Background

This section motivates the importance of source code history tooling during development tasks. We first illustrate a general high-level scenario in Section 2.1 and provide background information about existing tooling in Section 2.2. We then expand our initial scenario to be more specific using an example method in an open-source repository, demonstrating how existing tools fail to return many important historical references in practice (Section 2.3).

2.1 General Scenario

Consider the following scenario: a developer is about to review a pull request that changed a method. She has not seen this particular segment of code in a while and lacks understanding of what the method is actually doing. Since in-place documentation and comments are insufficient for a clear understanding of the code [27], and the author of the pull request is unavailable, she decides to investigate the version control history associated with the method [17]. She believes how the method was previously evolved, and who authored those changes, will help contextualize the current pull request in terms of the problems prior developers encountered working on this code. (That using version history for such program understanding tasks is very common among software developers will become more visible in our survey in Chapter 4.)

She first uses her version control tool to view the history of the file containing the method that was changed. Unfortunately, the file history contains a large number of changes, among which only a few modified the method of interest to her pull request. These changes cannot be easily filtered for only the changes relevant to her pull request, so she decides to use her version control tool to select a line-range based history (a functionality built into most version control systems). Because the file has undergone multiple extensive refactorings over its lifespan, this history reports many changes that are not related to the method she is interested in. This history also terminates at a fairly recent commit when the method was moved from another file.
Consequently, even with a tool seemingly built for this task she was not able to obtain a complete view of the past changes that affected her methods of interest without extensive manual traversal of the version control history.

2.2 File- and line-based source code history

The scenario uses tools provided by modern version control systems to search and filter source code history. The most prominent version control system, Git, surfaces this functionality through the git-log command which includes several different options. Typically, a file path is provided for the history specific to a file. In addition, a line range can be provided if the investigator is only interested in a specific part of a file (git log -L BEGIN,END:PATH). While there are other advanced options such as the -S option which accepts a search string, most git-log invocations are at the file- (no parameter) and line-range (-L) granularity.

IDE-based version control tools that provide more abstract views are usually based on the same underlying data as git-log and its arguments (e.g., within the Eclipse or IntelliJ IDEs). While these tools suggest a language-aware aspect to their functionality, they remain essentially text-based. For example, the Show history for method feature in IntelliJ has no notion of the method as a code unit despite the term method in the command name within the IDE. Rather, IntelliJ extracts the line range of the method of interest and shows the history for this line range. This lack of language awareness causes these tools to produce a high proportion of false positives and miss many relevant changes (false negatives) for files that have been involved in non-trivial evolution. Additionally, these tools often report a unit as being new, where in fact it was just moved from a different file (e.g., during refactoring), even though in reality that unit may have a rich evolutionary history prior to the refactoring.

2.3 Specific Scenario

We illustrate the challenges involved with investigating source code history of a code unit using the Checkstyle project, a popular syntax validation tool for Java. (We forked this repository to keep it stable at a specific commit for illustration purposes in this paper: https://github.com/ataraxie/checkstyle/.)

Suppose the developer is evaluating a pull request that changes the CommonUtils::createPattern method (https://github.com/ataraxie/checkstyle/blob/746a9d69125211ff44af1cb37732e919368ba620/src/main/java/com/puppycrawl/tools/checkstyle/utils/CommonUtils.java), which can be found on lines 93–112 in CommonUtils.java. She now wants to learn more about this method's history so she can better understand the context of the current pull request. She uses her version control tool to show the history of CommonUtils.java, but unfortunately the file history shows 47 changes to this file in three years and, given that createPattern comprises only 3% of the file (20 LOC out of 664 LOC in its current revision), it is unlikely that createPattern is germane to most of these changes.

Retrieving history with git-log

She decides to examine the history for createPattern using its line range and issues the command git log -L 93,112:PATH (the full command is git log -L 93,112:src/main/java/com/puppycrawl/tools/checkstyle/utils/CommonUtils.java). This identifies two commits: ce21086, which changed the body of the method, and 2864c10, which is a false positive because the change just moved the method within the file without modifying it. Unfortunately, several other changes are missing as well; in total, of the 17 years of history for this method, git-log is only able to return the commits that happened in the most recent 18 months. Through an exhaustive manual analysis we identified several changes to this method throughout its lifetime; these are shown in Table 2.1.
Notably, the method was moved between files twice (8d6fa33 and 1c15b6a), and the file containing the method was renamed or moved between directories three times (f1efb27, ed595de, and cdf3e56). While these changes could have helped the developer build the understanding they wanted for their task, it is nearly impossible to find them in a timely fashion with the current tools.

Retrieving history with CodeShovel

CodeShovel is able to build a complete history for CommonUtils::createPattern on the fly with no previous analysis of the version control repository. Every row in Table 2.1 is returned by the tool, along with the annotations shown in the right-most column in the table. (CodeShovel actually reveals more contextual information than displayed here; this information is described in more detail in Chapter 5.) These annotations exist to help developers quickly identify the changes they are interested in (for instance, if they wanted to trace how the method moved between files they could just look at the first row when the method was introduced and any subsequent method move entries).

Table 2.1: Actual history of method CommonUtils::createPattern. Rows with ? were returned by git-log; other rows are false negative results that git-log failed to identify. One false positive result returned by git-log (2864c10) is not included in the list. CodeShovel correctly identified all results for this method with no false positives or false negatives.

SHA        Date         Nature of change
? ce21086  2017-02-24   Method body change
f65b17c    2015-11-22   Method body change
f2c6263    2015-10-29   Param change + body change
cdf3e56    2015-08-27   File move/rename
ed595de    2015-08-26   File move/rename
081c654    2015-08-01   Exception change
97f0829    2015-03-27   Method body change
ebd4afd    2015-03-24   Method body change
1c15b6a    2015-03-13   Method move to other file
b94bac0    2015-01-11   Param change + body change
f1efb27    2014-02-19   File move/rename
35d1673    2006-07-07   Method body change
e27489c    2005-05-11   Body + return type change + rename
b0db9be    2002-12-08   Method body change
419d924    2002-12-06   Exception change
7b849d5    2002-05-14   Method body change
8d6fa33    2002-01-14   Method move to other file
f0f7f3e    2001-06-28   Method body change
0fd6959    2001-06-22   Method introduced

Chapter 3
Related Work

Source code history has long been recognized in the software evolution community as a key contributor to program understanding and capturing rationale (e.g., [3, 4, 9, 14, 19, 25, 26]).

While all version control systems provide features for tracking the histories of individual lines, line ranges, and files that they are versioning, a number of tools have been built to better help engineers understand the histories of their systems. In this work, our goal is to develop an approach to help developers find comprehensive histories for their code as easily as possible. While there are many dimensions one could analyze, we have chosen to focus on three in particular in this work as we believe they are instrumental in supporting this goal.
An overview of these dimensions is given in Table 3.1.

3.1 Analysis burden

Many approaches are up-front analyses that require a complete project to be analyzed before any queries can be issued. These offline analyses can usually be queried efficiently once a history has been created, but the up-front cost can require hours of pre-processing before these queries can be handled. These approaches typically require histories to be recomputed after code changes are made as the tools are more geared towards mining-style analyses than answering developer queries. For example, Historage pre-processes a repository, placing each method in its own file and then using Git's history mechanism to track changes on each individual method's corresponding file [12]. The up-front pre-processing time allows for essentially instantaneous serving of histories for a given method. In contrast, git-log does not require any up-front analysis: when a query is made for a line or range within a file, git-log walks back through the file's history to find changes to that line or line range.

Table 3.1: Selection of tools for examining source code histories. These vary in whether they are online (computing histories on demand or requiring pre-processing a whole project), the granularity that can be analyzed (code means a subset of class, method, or statement), and their tolerance to common source code transformations (M- refers to method, F- refers to files, ≈ refers to string matching, and M-Move denotes move method, pull-up method, and push-down method).

                                    Code Transformations
Approach     Online   Granularity   Intra-File               Inter-File
APFEL        NO       Code          ✗                        ✗
Beagle       NO       Code          ≈                        ≈
Historage    NO       Code          M-Rename                 F-Rename
C-Rex        NO       Code          M-Rename                 ✗
pry-git      YES      Code          ≈                        ✗
method_log   YES      Code          ≈                        ✗
git-log -L   YES      Text          ≈                        ✗
IntelliJ     YES      Text          ≈                        F-Rename
CodeShovel   YES      Code          M-Rename, M-Signature    F-Rename, M-Move

3.2 Granularity

Different tools provide answers for historical queries about different granularities of code units. By default, version control systems operate on lines within files. By focusing on the text itself, these approaches are language-agnostic but are unable to answer interesting queries like "find all changes to this class", which is supported by Beagle [30]. Tools which support queries on code elements, rather than lines, also support various levels of queries, for instance to classes (e.g., Beagle [30]), methods (e.g., method_log, https://github.com/freerange/method_log, accessed Jan 29 2019) or program blocks (e.g., APFEL [33]). Granularities can also vary in terms of time: while most tools in Table 3.1 try to find complete histories, pry-git (https://github.com/pry/pry-git, accessed Jan 29 2019) and Beagle only analyze changes between two specific versions of a program or file. While code-specific support adds additional complexity to these tools, they allow developers to ask queries about how specific units of code have changed over time.

3.3 Transformations

Systems are constantly being evolved, which can cause considerable challenges for history tracking tools. These changes can range from simple single-line code edits to complex refactorings that can involve renaming methods and moving them to new files.
Refactorings have been described as the "bread and butter" of software restructuring [1] and the occurrence of refactoring commits is remarkably large. For example, it was shown that 80% of changes to APIs are refactorings [5] and that 19% of method introductions in the PostgreSQL source code were a result of refactorings [9].

Most approaches are able to use textual similarity to detect transformations within a file (intra-file changes), as long as enough textual similarity is maintained through the transformation. Some code-based analyses are able to further categorize the changes (e.g., Historage [12] and C-Rex [11] are able to identify method rename refactorings). Most tools are unable to track changes through inter-file transformations, except for Historage and IntelliJ, which are robust in the face of file rename events, but cannot track other inter-file transformations (like an extract-method refactoring). Unfortunately, such refactorings are also prevalent in practice (e.g., [4, 25]).

3.4 CodeShovel

The approach described in this paper is aimed at quickly providing comprehensive code histories for developers so they can understand how their system has evolved. To do this, we have developed an online approach (to remove the need for time-consuming pre-processing and to allow for the results to always be up-to-date) that has been tailored for identifying histories for methods (as most program changes happen within these structures) while striving to be as robust as possible to the kinds of transformations that are common in practice (refactorings often cause code histories to be incomplete).

Chapter 4
Industrial Survey

To understand industrial developers' perspectives on code histories, we surveyed them about how they use source code histories, the scenarios in which histories are useful, how easily they can be leveraged with existing tools, and whether histories could be augmented or recast to provide more value. We structured the survey around the following survey research questions:

RQ1: Do developers use source code history when they are working with code? If so, what are they trying to learn when they examine source code history?

RQ2: In terms of their mental models and information needs, what level of temporal and structural granularity are most appropriate when using source code history?

RQ3: How do developers identify the history for specific code units and how well does existing tooling support these queries?

RQ4: How effectively can a language-aware code history viewer support program comprehension?

4.1 Survey Design

Our survey was administered online and consisted of 18 questions arranged in four parts; these parts included Likert-scale questions, free-response questions, and two scenario-based code questions. We also asked participants about their professional background, current job position, development experience, and the version control tools they use. Each survey took approximately 20 minutes to complete.

Part 1: This section was designed to gain insight into RQ1 and RQ2. We asked for a recollection of the participant's last activity with source code history (RQ1). To investigate the structural granularity (first half of RQ2) we let participants rate how interested they are in history at different levels of granularity (class/module, file, method/function, block, and field/variable).
In terms of the temporal granularity (second half of RQ2) we asked for a description of how far in the past they normally examine history and how they decide what time span to look for.

Part 2: This section introduced a brief scenario similar to the one described in Section 2.1: a developer is faced with a pull request and wants to understand better what the code that was changed is actually doing. In this section we wanted to gain more information about whether developers use source code history for program understanding tasks and what information they are searching for (RQ1). We first asked participants how familiar this scenario appears to them and then for a description of how they would approach this problem. We then isolated the imaginary change to a change to a single method and asked participants how they would identify changes in history that changed this method as an example of a code unit. We asked how well the participant's described strategy would cope with complex structural changes to this method (e.g., renaming, moving to a different file). Concluding Part 2, we asked how they traced changes to methods with current tooling support and what makes this hard or easy. Responses would provide us a clear picture of whether developers struggle with the tools they use in practice (RQ3).

Part 3: This section examined another concrete scenario. We created a sample pull request (https://github.com/ataraxie/checkstyle/pull/1) in our forked repository of the Checkstyle project described in Chapter 2 that changed a specific method. We provided a link to the associated file history (https://github.com/ataraxie/checkstyle/commits/746a9d/src/main/java/com/puppycrawl/tools/checkstyle/utils/CommonUtils.java) and asked participants to describe how they would identify commits relevant to this method of interest. While we expected that the challenge of this task would become visible from these descriptions, we also asked directly how well existing tools support this task (RQ3). We also asked participants about the utility of a code unit history for this scenario to investigate RQ4.

Part 4: While the first three parts of the survey asked questions about code histories using the participant's own development tools, the fourth part introduced a tool mockup for supporting language-aware historical analysis. We mocked an output for the scenario in Part 3, showing the respective commits and diffs that changed the method alongside detailed descriptions of the changes in natural language. To inform RQ4 and the feasibility of our approach further, we now asked for a rating on (a) how helpful this output would be for a better understanding of the method, (b) how hard it would be to retrieve this type of information with their current tooling and (c) how valuable a tool would be that could generate this output. Finally, we gave participants the option to express what other information could have been valuable that was not in our mocked result view.

4.2 Survey Participants

We recruited 30 professional developers from industry (71%) and 12 from academia (29%) for a total of 42 participants. Participants were selected and contacted individually from the authors' professional networks; 87 individuals were solicited, giving a final response rate of 48%. The majority of job titles (64%) were software developer/engineer or similar; all academic participants were upper-level graduate students or faculty.
Across all participants, 90% had more than 4 years of programming experience (50% > 10 years) and 80% had used source code history for more than 4 years (21% > 10 years). For the professional developers, 63% had been employed in industry for more than 4 years (23% > 10 years).

Table 4.1 shows our participants' most frequently used source code history tools. Other tools mentioned were specific IDE add-ons, GitKraken, and GitLense. From a historical analysis standpoint, all of these tools are built upon the default data provided by the version control log utilities. None of the related tools mentioned in Chapter 3 were mentioned by our participants, which we take as an indication that the domain is currently research-driven.

Table 4.1: Most commonly used history navigation tools.

Tool                      Kind           % Respondents
Git (git-log)             Command line   95%
GitHub                    Web service    80%
BitBucket                 Web service    76%
IDE/Editor                Application    71%
SVN (svn log)             Command line   29%
TortoiseGit/TortoiseSVN   Application    17%
SourceTree                Application    17%
Mercurial (hg log)        Command line   14%
Other                     -              29%

4.3 Survey Results

We summarize the results of the survey organized by our survey research question.

4.3.1 Developers' usage of source code history

RQ1: Do developers use source code history when they are working with code? If so, what are they trying to learn when they examine source code history?

The great majority of our participants frequently use source code history: 76% of participants had used source code history within two days prior to performing the survey (90% < 1 week, 96% < one month). In terms of their most recent history-exploration activity, we first observe many common version control tasks, e.g., "check what I modified". We also see many activities associated with accountability; participants checked "who had been contributing", "who [they] could contact for dev support" and "who is associated with [a specific] change".

A large fraction of activities were also related to program understanding. Participants wanted to "understand how the solution to a certain problem was implemented" and "how and why [a property] was changed". Some participants were interested in the "evolving of software architecture" and in "what steps a certain component took to get to the shape/position it was in". The term understand appeared 8 times when they described their most recent activity.

Developers search and navigate extensively through source code history to "find reasons for [...] changes". They "browsed for a change made by a specific commit in the history", "[looked] for a specific change that might have introduced an issue" and "[looked] for an outdated implementation of a functionality". They tried to "[figure] out [a] history of changes" and "[compared] files over multiple commits". The terms looked, browsed, searched and navigated appeared 36 times in the descriptions of most recent activities.

When asked how familiar participants were with the scenario in Part 2 (evaluating a pull request), 88% replied with very familiar or familiar. When asked for a description of their strategy for addressing this scenario, most participants replied that their first step would be to approach the pull request author because "they have to be accountable". Other frequent strategies were to refer to issue trackers because they would tell "what were the design decisions that lead to the change". Others reported that examining version history was their first step: "my first step would be to look at the code history" or "fire up the history view [...] and hope for the best."
4.3.2 Source code granularity

RQ2: In terms of their mental models and information needs, what level of temporal and structural granularity are most appropriate when using source code history?

Figure 4.1: Structural granularities most desired by study participants when navigating source code histories.

Figure 4.1 shows how interested our participants were in gathering information on source code history at different levels of structural granularity. Participants were at least somewhat interested in all levels but were most interested in Method/Function and Class/Module, with both having roughly 85% in the categories 'Very interested' and 'Interested'. Overall, this shows an interest in examining changes at levels other than just the file or textual range (block) level as is supported by most tools.

To explore the temporal granularity participants considered in their exploration, we asked them how far in the past they usually examine and how they determine how far they want to go. The responses were mostly consistent: "this largely depends on the goal" and that the time range can vary because the fragment of interest is based on changes and commits rather than time. Both ends of the spectrum are well represented though; some participants "don't usually go [back] more than a few weeks" or "a couple of days" while for other tasks they "look at the commit history years ago". Interestingly, the cases where they examine further in the past seem to be related to program understanding tasks most often, for example when "looking for the reason for a code change", "if some functionality seems odd or obsolete while reviewing source code" or "to understand why and how specific parts became the way they are today".

4.3.3 History and source code units

RQ3: How do developers identify the history for specific code units and how well does existing tooling support these queries?

In the context of the pull request, we asked our participants how they would generally identify the history of a method and how challenging this task is. They highlighted file history as the most important strategy. Some participants described how they would limit the history to isolate changes to a method, for example by using a combination of git log and grep. Many also mentioned recursive or iterative calls to git blame and more high-level sources of information like commit messages and issue trackers.

Many participants immediately described this task as "not trivial", "not easy" and "not the funnest job". Rating the difficulty of the task in a later question, 48% chose hard or very hard while 31% chose not very hard, or not hard at all.

Figure 4.2: How well participants' strategies cope with common code changes. Change A: rename method; Change B: change method signature; Change C: move method to another file; Change D: split method into multiple methods; Change E: combination of previous changes.

Figure 4.2 shows participants' ratings of how well their strategies cope with more complex structural changes. Their strategies handled method renames and signature changes well, but failed to handle split into multiple methods and move to different file. This was strongly confirmed by the responses to our question about what would make this hard or easy, where the main notion was that it depends on the complexity of changes:
"It's either easy (the method hasn't undergone complex changes) or challenging (the method has undergone complex changes). When it is challenging, it is VERY difficult." Many participants mentioned specifically that "not knowing the semantics" and the fact that "[there] is no direct or explicit abstraction to method/class levels" makes these historical exploration tasks challenging.

When faced with our real-world example from the Checkstyle repository and its history, our participants assessed the task of identifying commits that changed the method as even more challenging than in the previous questions. While some said they could not think of a strategy for this scenario ("I have no idea how I would do this", "Not at all", "I sincerely have no idea"), others gave more advanced strategies ("write a script" or a "divide and conquer mechanism"). Most participants fall back to the history of the file containing the method and "dig through the history to find the method in each commit" and "open each one of them and manually inspect". Some participants immediately pointed out the limits of this strategy: "[if] the method comes from another class [...], that would be more difficult to trace" and that "it's annoying", "there's too much garbage noise" and that "revisions are too many to go through". Some participants wondered whether "there is any tool that allows [them] to track the function across commits".

When asked how well existing tools support identifying these changes, only one participant said very well, 13% said well and 26% said neutral. The majority of participants rated with either not very well or not well at all (59%). When asked how hard it would be to find the first commit that really introduced the method (rather than being just a refactoring commit), 76% replied with hard or very hard.

4.3.4 History tools and program comprehension

RQ4: How effectively can a language-aware code history viewer support program comprehension?

Some participants mentioned that the lack of semantic awareness of existing history tools makes it hard to trace the changes of methods ("not knowing the semantics"). Given our tool mockup for a fictitious language-aware history tool, participants rated this tool as very valuable or valuable (94%) and 79% consider retrieving this information manually hard or very hard. When asked how helpful they would consider this information for a better understanding of the method being faced with the pull request, 70% replied with very helpful or helpful.

The positive attitudes towards comprehensive method histories also arose in the final comments: "having the information [code histories] would be a big improvement in the daily business" and it would be great "to generate a method-, class-, file-history/protocol in a human-readable way" and to have "version control that was method-aware, or aware of classes/modules".
As indicated before in RQ2, our method example code unit was confirmed as "definitely the best use case" by some. Some participants commented that sometimes histories are not used "because the tools that [they] use don't support this feature".

Chapter 5
Approach

Figure 5.1 provides a high-level illustration of the CodeShovel approach. We partition our description as follows: Section 5.1 describes CodeShovel as a black box receiving one method as input and responding with the method history as output (the Start and Return circles in the figure). We then describe our approach for finding matching methods (the left box in Figure 5.1) in Section 5.2 and the underlying similarity algorithm in Section 5.3. Our approach for identifying changes (the right box in Figure 5.1) is described in Section 5.4.

Figure 5.1: High-level approach: each query starts with a method name and SHA. CodeShovel iterates backwards through history until it finds the introducing commit for that method.

5.1 CodeShovel Inputs and Outputs

CodeShovel can be seen as a black box taking a number of arguments describing a method in a repository as input and producing the method's history as output. From a developer's perspective, they simply need to be able to identify the method they want a history for. Programmatically, these inputs are:

• Repository: Path to a Git version control repository on the local file system.

• StartCommit: SHA of the commit to start from and move backwards through history.

• FilePath: Path of the file containing the method relative to the root folder of the repository.

• MethodName: Name of the method for which the history should be produced.

• StartLine: Line number for the start of the method. This is used to differentiate between multiple methods with the same name.

The output of a CodeShovel execution is a JSON file with a list of commits that changed the selected method. Each commit has a change type (see Section 5.4), and includes both basic information like the commit message, date and author, and more complex information like the time and number of commits between the current method change and the previous one. For more complex change types, type-specific information is provided (e.g., for a method move, the old file path and the new file path).
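To make the shape of this output more concrete, the sketch below shows one way such a history entry could be represented in Java. It is purely illustrative: the class and field names are invented here and do not reflect CodeShovel's actual JSON schema, only the kinds of information listed above.

import java.time.LocalDate;
import java.util.Map;

// Illustrative only: one entry of a method history, capturing the kinds of
// information Section 5.1 describes (change type, commit metadata, distance to
// the previous change, and optional type-specific details such as old/new file
// paths for moves). Names are hypothetical, not CodeShovel's real schema.
public class MethodHistoryEntry {
    public String commitSha;                  // commit that changed the method
    public String changeType;                 // e.g., a body change or a method move (Section 5.4)
    public String commitMessage;
    public String author;
    public LocalDate date;
    public long daysSincePreviousChange;      // time between this change and the previous one
    public int commitsSincePreviousChange;    // commits between this change and the previous one
    public Map<String, String> extraDetails;  // e.g., oldPath/newPath for a method or file move
}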
5.2 Method Finding

After the CodeShovel query has been launched, a method finding process begins. The file specified by the input FilePath is found in the given Repository with the specified StartCommit. The programming language of the target file is identified from the file extension in the FilePath. A language-specific AST parser is instantiated and invoked with the extracted file content, producing an AST for the file. The method node is then identified in the AST using the input MethodName and StartLine. Using the version control system's built-in historical records, we identify all of the changes to the FilePath and process each change in turn from most recent to the oldest to search for the method of interest.

Since source code transformations can cause methods to move through the system, four outcomes are considered in sequence:

1. Method unchanged: in this case, the change to the file did not modify the method; CodeShovel can discard this change (with respect to the selected method) and proceed to the next.

2. Similar method in same file: in this case, the method was modified, but could be found within the file; this requires change categorization.

3. Similar method in different file: when methods are removed from a file, we search all other files modified in the same commit to determine where the method came from. This requires move detection first and change categorization thereafter.

4. No similar method is found in the same file or a different file: this indicates that the method was introduced in this commit and the CodeShovel execution can return the accumulated list of changes for the method.

We have designed this process with a recursive pattern so that if a similar method is found in a different file (case 3), a new CodeShovel sub-execution is started for the matched method in the other file. The termination condition for this recursion is an actual method introduction being identified in any sub-execution. This process is robust to an arbitrary number of method move transformations.

5.3 Similarity Algorithm Design

At the core of the CodeShovel method finding procedure is a similarity algorithm for matching methods across file versions. In many source code history tools (e.g., git-log, Historage [12]), similarity is computed by matching a method's line range. CodeShovel instead performs the matching using method nodes obtained from ASTs. This matching procedure is based on techniques from clone detection [6, 13, 16, 20, 24, 28]. In search of an accurate and efficient way to match methods, we opted for an approach that combines string matching with the comparison of a set of metrics. For each method matching procedure, CodeShovel computes similarity of the following metrics:

• Body similarity: String similarity of the method body.

• Name similarity: String similarity of the method name.

• Parameter similarity: String similarity of parameter names and equality of parameter types (if language-supported).

• Scope similarity: Name equality of the scope of the method, e.g., the class or parent function (only in-file matching).

• Line similarity: Distance between the start line of the two methods (only used for in-file matching).

It has been shown that string similarity is an efficient strategy for clone detection but lacks accuracy in many cases [6, 13, 24]. One approach for mitigating the problem of accuracy without significantly sacrificing efficiency is to combine the similarities of a set of method features and then use these similarities as metrics. Rather than measuring string similarity, we measure code similarity with the above-listed metrics [16, 20, 28]. We found that CodeShovel could process method matchings accurately and efficiently and did not rely on more complex strategies like AST matching techniques as in [2, 7, 8, 10, 23] that have relatively high performance overheads. (We are currently using the Jaro-Winkler distance algorithm [32] for string similarity ratings. In a later stage of the project we have learned from different sources [8, 13] that n-gram string matching [31] is the better similarity algorithm for source code. We expect that changing to this technique will further improve CodeShovel's accuracy.) For more detail on how our similarity algorithm was implemented, please see Section 6.2.
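The sketch below illustrates the general idea of combining several feature similarities into one score, using the Jaro-Winkler measure mentioned above (here via Apache Commons Text) for the string comparisons. The weights, normalization, and helper types are invented for this example; CodeShovel's tuned thresholds and weights are described in Section 6.2 and differ from these values.

import org.apache.commons.text.similarity.JaroWinklerSimilarity;

// Illustrative sketch of feature-based method matching: each metric from
// Section 5.3 is scored between 0 and 1 and combined into a weighted overall
// similarity. The weights and the record type are invented for this example
// and are not CodeShovel's actual implementation.
public class MethodSimilaritySketch {

    private static final JaroWinklerSimilarity STRING_SIM = new JaroWinklerSimilarity();

    /** Simplified view of a method: only the features the metrics need. */
    record MethodFeatures(String name, String body, String parameters, String scope, int startLine) {}

    static double overallSimilarity(MethodFeatures target, MethodFeatures candidate, boolean crossFile) {
        double bodySim  = STRING_SIM.apply(target.body(), candidate.body());
        double nameSim  = STRING_SIM.apply(target.name(), candidate.name());
        double paramSim = STRING_SIM.apply(target.parameters(), candidate.parameters());
        double scopeSim = target.scope().equals(candidate.scope()) ? 1.0 : 0.0;
        // Line distance is only meaningful when matching within the same file.
        double lineSim = crossFile ? 0.0
                : 1.0 / (1.0 + Math.abs(target.startLine() - candidate.startLine()));

        if (crossFile) {
            // Hypothetical weights; scope and line position are ignored across files.
            return 0.55 * bodySim + 0.25 * nameSim + 0.20 * paramSim;
        }
        return 0.45 * bodySim + 0.20 * nameSim + 0.15 * paramSim + 0.10 * scopeSim + 0.10 * lineSim;
    }
}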
This can be either a method move, pull-up method,or push-down method refactoring or a file move (or rename) operation. Each com-mit in the CodeShovel output is associated with one specific change type. If thereare multiple change types for one commit, they are processed as a special changetype containing multiple single change types. Our categorization of changes is asimplified version of the change taxonomy described by Fluri et. al. [8].Figure 5.2 shows the hierarchical structure of our change types. At the top levelis the abstract type Change from which the following types inherit:• NoChange is a special type that is used as a flag for commits that did notchange a method being analyzed.• MultiChange is another special type that indicates that a commit containedmultiple changes and maintains a list of those changes.• CompareFunctionChange is another abstract type from which all actual changesbetween two methods inherit.• Introduced indicates that a method was introduced in this commit for the firsttime.1We are currently using the Jaro-Winkler distance algorithm [32] for string similarity ratings.In a later stage of the project we have learned from different sources [8, 13] that n-gram stringmatching [31] is the better similarity algorithm for source code. We expect that changing to thistechnique will further improve CodeShovel’s accuracy.27Figure 5.2: Hierarchy of change types in CodeShovel.Below the abstract type CompareFunctionChange are the following changetypes:• BodyChange indicates that the body of the method was changed. Note thatwe do not distinguish changes within the body of methods further in orderto keep our change hierarchy simple.• SignatureChange is an abstract type indicating that some parts of the methodsignature have changed.• CrossFileChange is an abstract type indicating a refactoring commit thateither moved the method between files or changed the file name or path thatcontains the method.The type SignatureChange is extended by:• ExceptionChange: exception clause has changed (exceptions were added,removed or edited).• ReturnTypeChange: the method’s return type has changed.28• ParameterChange: the method’s parameters have changed (either in type orname).• Rename: the method has been renamed.• ModifierChange: the method modifier has changed (e.g., from public to pri-vate).The type ParameterChange has another subtype named ParameterMetaChangethat indicates that some parameter meta-information were changed (e.g. finalkeyword added to parameter in Java).While the previously described changes are related to change categorization(upper right box in Figure 5.1), the type CrossFileChange represents changes thatspan multiple files and is therefore related to move detection (lower right box inFigure 5.1). Its subtypes are:• FileMove indicates a refactoring operation in either (1) the file name or (2)the file path.• MethodMove indicates a refactoring operation where a method was movedfrom one file to another.Note that the significance of the MethodMove change type and, in particular,its combination with the Rename change type has been previously identified [25].Consequently, we consider a proper identification of this change type to be animportant goal for building comprehensive method histories.29Chapter 6ImplementationIn this section, we describe how we implemented CodeShovel. We first outline thelanguages and platforms we used, before we illustrate how we implemented thesimilarity algorithm whose design we described in Section 5.3. 
We then providedetail on how CodeShovel traverses a method’s history and detail the programmingAPI for implementing CodeShovel language adapters.6.1 Languages and PlatformsCodeShovel is implemented in Java. While CodeShovel is language-aware, thecore approach is not language-dependent: given the required AST parser for alanguage (although written in Java), the implementation is extensible for otherlanguages: all core components with language-specific functionality were real-ized with abstract classes and interfaces and concrete language-specific imple-mentations. In particular, for adding other language adapters the only require-ment is an implementation of two CodeShovel interfaces Parser and Method (seeSection 6.4). To perform Git operations and to traverse commits in repositories,CodeShovel leverages the JGit1 library. JavaParser2 is the core of CodeShovel’sJava-specific language-adapter while Nashorn3 is used by the JavaScript-specificlanguage-adapter. We have also started to write a Ruby-specific adapter using1https://github.com/eclipse/jgit2https://github.com/javaparser/javaparser3https://www.oracle.com/technetwork/articles/java/jf14-nashorn-2126515.html30jRuby4 and have found adding support for languages to be as straight-forwardas we had hoped with this adapter-architecture.6.2 Similarity Algorithm ImplementationCodeShovel’s similarity algorithm takes a target method and a list of candidatemethods as input and produces the method from the candidate list with the highestoverall similarity as output if and only if the candidate’s overall similarity is abovea certain threshold. The algorithm is outlined in Listing 6.1.Listing 6.1: CodeShovel similarity algorithm1 IF there is a candidate with the exact same signature2 IF this is an in-file matching3 return candidate4 ELSE IF body similarity is above 0.85 return candidate67 FOREACH candidate c8 IF c has body similarity of 19 return c10 IF c scope simiarity 1 and body similarity > 0.911 IF this is a cross-file matching12 return c13 ELSE IF the line number distance < 1014 return c15 Compute overall similarity for candidate and save1617 IF there is a candidate with the same name18 IF this is a cross-file matching19 return c IF its overall similarity is > 0.820 ELSE21 return c IF its overall similarity is > 0.52223 WITH candidate c having the highest overall similarity24 IF c has body similarity > 0.8225 IF both bodies of c and the target method have < 4lines and < 60 characters4https://github.com/jruby/jruby3126 return c if it has overall similarity > 0.9527 ELSE28 return c if it has overall similarity > 0.822930 RETURN NULLThis algorithm has a few key properties:• The algorithm uses different metrics depending on whether this is an in-file matching or a cross-file matching. For example, if we find a candidatemethod with the exact same signature in the same file, we can simply returnit because it must be the same method. On the other hand, if we search for amatching method across other files, a full match of the signature may not beenough (there might be many methods with the same signature in differentfiles, e.g. if a class implements an interface). In this case, we add a thresholdto the body similarity.• The algorithm uses threshold numbers for different metrics that seem arbi-trary at first. We derived these numbers from a test suite with a manuallyprepared oracle for 100 methods from 10 different open source repositories(see Chapter 7). 
The numbers presented in the algorithm in Listing 6.1 are the result of finding the correct originating commit for all 100 methods. We show in Chapter 7 that these values continue to work for another set of 100 methods from 10 new open-source projects.

• While looping over all candidates, we compute an overall similarity for each candidate that has not already been identified as a match by the simpler metrics. This overall similarity is currently computed by weighting the different metrics based on the in-file/cross-file switch. These weights are the result of the training phase with the 100 sample methods (see Chapter 7).

• We treat 'short' methods (less than 4 lines and 60 characters) differently in that they need to have a higher similarity to be matched. Not doing so can lead to many incorrect matches for methods with small numbers of statements.

Our matching algorithm need not always return a match (i.e., see RETURN NULL in Listing 6.1). If no match is returned, the commit being considered is a preliminary Introduced (in the in-file case) or a final Introduced (in the cross-file case). The latter case is the termination condition for the CodeShovel execution.

6.3 History Traversal

A CodeShovel execution uses the inputs described in Section 5.1 and proceeds as follows: the file specified by the input FilePath is found in the given (local) Repository with the specified StartCommit. The programming language of the target file is identified from the file extension in the FilePath. A language-specific AST parser is instantiated and invoked with the extracted file content, producing an AST for the file content. The method node is then identified in the AST using the input MethodName and StartLine.5

5 Note that the StartLine is necessary for languages that allow method overloading. An alternative solution would have been to specify the target method by adding its parameters to the method name. We opted for the former due to its simplicity.

With the identified start method node at hand, the file-level history of the file containing the target method is now created using an abstraction of git-log, starting with the parent commit (i.e., the previous commit) of the input StartCommit. Proceeding from new to old (i.e., child to parent) for each commit in the file history, the file content at this point in time is parsed with the same language-specific AST parser. For each iteration, a reference to the previously matched method (i.e., in the child commit) is maintained (in the first iteration this is the matched method in the StartCommit) and the earlier version of this method is searched for in the parent AST using the in-file similarity detection procedure (see Section 6.2).

For each such mapping of the method between commits, there are two cases: (1) a method satisfying the similarity threshold is found or (2) no method satisfying the similarity threshold is found. In the case of (1), a within-file interpreter is used to extract the change between the two revisions of the method and the loop can continue. In the case of (2), the result for this commit is a preliminary Introduced change type, which indicates that no matching method could be found in this commit.

This is where a cross-file interpreter comes into play and searches through all methods that were removed in this commit.6 Each of the removed methods is compared with the target method using a cross-file similarity detection procedure (see Section 6.2).

6 Note that this strategy does not deal with scenarios where methods were removed in one commit and then added in a later, non-subsequent commit.
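To make this matching step concrete, the following is a minimal sketch of how removed candidate methods might be scored against the target method. It is not CodeShovel's actual code: the interface, the weights, and the threshold are illustrative assumptions only (Listing 6.1 shows the real decision logic).

import java.util.List;

// Illustrative sketch only; names, weights, and the threshold are assumptions.
interface MethodLike {
    double bodySimilarityTo(MethodLike other);   // e.g., string similarity of the bodies
    double nameSimilarityTo(MethodLike other);   // string similarity of the names
    double paramSimilarityTo(MethodLike other);  // parameter name/type similarity
}

class CrossFileMatchSketch {

    static final double CROSS_FILE_THRESHOLD = 0.8; // assumed value, for illustration

    // Score every method removed in this commit against the target method and
    // return the best candidate, or null if none clears the threshold
    // (null means the commit is treated as the method's introduction).
    static MethodLike findBestCandidate(MethodLike target, List<MethodLike> removedMethods) {
        MethodLike best = null;
        double bestScore = 0.0;
        for (MethodLike candidate : removedMethods) {
            // Weighted combination of the metrics from Section 5.3 (weights are made up here).
            double overall = 0.6 * target.bodySimilarityTo(candidate)
                           + 0.2 * target.nameSimilarityTo(candidate)
                           + 0.2 * target.paramSimilarityTo(candidate);
            if (overall > bestScore) {
                bestScore = overall;
                best = candidate;
            }
        }
        return bestScore > CROSS_FILE_THRESHOLD ? best : null;
    }
}

A real implementation would additionally apply the signature checks and the special handling of short methods shown in Listing 6.1.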
Again, there are now the two cases described previously: (1) a method satisfying the similarity threshold is found or (2) no method satisfying the similarity threshold is found. In the case of (1), the commit that was previously interpreted as a preliminary Introduced is changed to a new change type of either FileRename or MethodMove (or a MultiChange that combines one of the two with other simple changes; a method move in combination with a method rename, for example). After the change type has been corrected, a new CodeShovel execution is started for the identified method in the different file. In the case of (2), the preliminary Introduced change is changed to an actual Introduced: we are now certain that this was in fact the commit that introduced the target method. Note that a CodeShovel execution follows a recursive pattern with multiple nested sub-executions. The termination condition for this recursive pattern is an actual Introduced change being identified.

6.4 Language Adapter API

CodeShovel is extensible for arbitrary programming languages. To implement a language adapter, the developer must implement two Java interfaces (the language adapter itself must be implemented in Java):

• Parser: defines how methods can be found and used in a given source file of the language.
• Method: defines the characteristics of a method in the given language.

The following Listings 6.2 and 6.3 illustrate the Parser and Method interfaces. Note that the terms Method and Function are used interchangeably in these interfaces.

Listing 6.2: CodeShovel Parser interface

Method findFunctionByNameAndLine(String name, int line)
List<Method> findMethodsByLineRange(int beginLine, int endLine)
List<Method> getAllMethods()
Map<String, Method> getAllMethodsCount()
Method findFunctionByOtherFunction(Method otherMethod)
boolean functionNamesConsideredEqual(String aName, String bName)
Method getMostSimilarFunction(List<Method> candidates, Method compareMethod, boolean crossFile)
double getScopeSimilarity(Method function, Method compareFunction)
List<SignatureChange> getMajorChanges(Commit commit, Method compareMethod)
List<Change> getMinorChanges(Commit commit, Method compareFunction)
String getAcceptedFileExtension()

Listing 6.3: CodeShovel Method interface

String getBody();
String getName();
List<Parameter> getParameters();
ReturnStmt getReturnStmt();
Modifiers getModifiers();
Exceptions getExceptions();
int getNameLineNumber();
int getEndLineNumber();
String getCommitName();
String getCommitNameShort();
Commit getCommit();
String getId();
String getMethodPath();
String getSourceFileContent();
String getSourceFilePath();
String getSourceFragment();
String getParentName();

The language-specific Parser implementation is given the file content in its constructor because its main task is the creation of an AST from this content. The Method implementation is given the method node from the language-specific AST implementation as input. As an example, our interface Parser defines a method signature findFunctionByNameAndLine(name, line) which is implemented in our class JavaParser that knows how to find a Method instance within a Java file, given its name and start line. Supporting a new language with this interface is relatively straightforward.
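As a rough illustration of what a new adapter can look like, here is a self-contained sketch. The GoParserSketch class and the trimmed two-method interfaces are hypothetical and only hint at the shape of the real Parser and Method interfaces from Listings 6.2 and 6.3; they are not part of CodeShovel.

import java.util.ArrayList;
import java.util.List;

// Trimmed-down stand-ins for CodeShovel's Parser/Method interfaces, for illustration only.
interface SketchMethod {
    String getName();
    int getNameLineNumber();
}

interface SketchParser {
    SketchMethod findFunctionByNameAndLine(String name, int line);
    String getAcceptedFileExtension();
}

// A hypothetical adapter for another language; a real adapter would delegate to
// an AST parser for that language and wrap each function node as a SketchMethod.
class GoParserSketch implements SketchParser {
    private final List<SketchMethod> methods = new ArrayList<>();

    GoParserSketch(String fileContent) {
        // A real adapter would parse fileContent into an AST here and populate "methods".
    }

    @Override
    public SketchMethod findFunctionByNameAndLine(String name, int line) {
        for (SketchMethod m : methods) {
            if (m.getName().equals(name) && m.getNameLineNumber() == line) {
                return m;
            }
        }
        return null;
    }

    @Override
    public String getAcceptedFileExtension() {
        return ".go";
    }
}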
In addition to our interfaces, we provide two abstract classesAbstractParser and AbstractMethod that provide generic implementations ofmost methods in the interfaces that we expect to work for most languages. This re-duces the implementation of a language adapter for CodeShovel to a manageableset of methods. Both our Java- and (beta) JavaScript-language adapters are imple-mented in this fashion and have approx. 250 LoC (this is the only language-specificcode in the CodeShovel source).36Chapter 7Empirical EvaluationIn this section we describe how we evaluated the correctness and performance ofCodeShovel using 200 methods taken from 20 popular open source Java reposito-ries. Our research questions for this evaluation are as follows:RQ5 How frequently are methods changed?RQ6 Are the results returned by CodeShovel correct?RQ7 Is CodeShovel fast enough for being used as an online tool that does notrequire preprocessing?7.1 SubjectsWe chose 20 popular open source Java projects of varying size for our evaluation.We chose active repositories having at least 2,000 commits, 900 methods, and 250stars on GitHub. These projects span a range of domains and we consider thema representative set of mid- to large-scale Open Source Java projects. Table 7.1lists our subjects and their core statistics. (The star annotations in the table areconnected to the two analyses phases that will be described in Section 7.2.)For the empirical study we selected only Java projects since we consider ourCodeShovel Java adapter robust enough for this type of extensive evaluation. Weare keen to work on our other parsers and reproduce this evaluation once we at-tribute them a comparable level of robustness.37Table 7.1: Java repositories used for our empirical analysis and their statis-tics. Repositories annotated with a ? were used in the training analysis(total of 65171 methods) while the rest were used in the validation anal-ysis (total of 110954 methods).Repository # commits # methods # stars? checkstyle 8010 3084 3848commons-io 2123 996 488? commons-lang 5230 2197 1389elasticsearch 40353 18261 33640? flink 14416 17009 4166hadoop 19805 32888 7801? hibernate-orm 9100 23159 3318hibernate-search 6172 5069 283intellij-community 226106 5946 6335? javaparser 4781 3613 1883jetty 15991 11522 2139? jgit 6065 8277 604? junit4 2228 1107 6992? junit5 4695 2078 2323lucene-solr 30500 29888 1840mockito 4811 1366 7358? okhttp 3262 1433 28107pmd 13360 2567 1738spring-boot 17818 2451 27527? spring-framework 17041 3214 22769TOTAL 451,867 176,125 164,5487.2 MethodologyWe divided our evaluation into two analysis phases with 10 repositories each: atraining analysis and a validation analysis.In the training analysis, we selected 10 repositories at random, and from eachof these, we then randomly selected 10 method histories with at least three com-mits (100 methods in total). We then constructed an oracle for each of these 10038methods to find all of the changes made to these methods. To do this we looked atall CodeShovel results and any additional manual exploration that was needed untilwe were convinced we found the original commit that introduced the method to therepository. For all CodeShovel results we manually evaluated whether a returnedresult was a true positive or false positive; for false positives we analyzed the rootcause of the problem and fixed the tool. 
For CodeShovel true positives (including those that were deemed true positives after improving the tool), we created a unit test validating the input data for the given method and the expected result (the unit tests also prevented subsequent regression issues when changing our algorithms). We iteratively performed this process until CodeShovel found all commits in our manually-derived oracle for the 100 methods we were analyzing.

The validation phase was performed similarly to the training analysis, except that it was performed after the CodeShovel algorithm could no longer be updated. Again, we randomly selected 10 methods from each of the remaining repositories having at least three commits and manually constructed an oracle containing their complete history. This was done by multiple authors (and one non-author) using a combination of git-log and manual inspections; the oracles were independently validated by experienced developers for completeness and correctness. The CodeShovel results for these methods were compared to this oracle and from these we can assess the correctness of our tool (since we know both what all of the right results should be and can also check that the tool did not return any results it should not have).

7.3 Empirical Study Results

We summarize the results of the empirical study by research question.

7.3.1 Frequency of code changes

RQ5: How frequently are methods changed?

Figure 7.1 shows the number of modifications CodeShovel found across all 176,125 methods in 20 repositories. We can see that one-third of methods are introduced and never modified. Exploring these, we found many of these methods were not part of the project itself but were dependencies imported into the project as source and not subsequently modified. The next most common reason for these unchanged methods is that they are getters or setters which, due to their limited functionality, had little code that needed future modification. Finally, there are methods that are simply never modified, moved, or renamed. These methods are not interesting as they contain no history beyond their introducing commit and so we exclude them from our analysis. They also represent methods for which developers would have no trouble collecting historical information with existing tools (as there is none).

The remaining methods had multiple changes and are precisely the ones whose histories would be most useful to developers as they are more likely to investigate the histories of methods that are actually changing (e.g., high churn methods) than those that do not change. They are also the ones whose introducing commit is more challenging to find since there are more changes to step through. Thus, we subsequently focus our evaluation on methods that had at least three changes.

Figure 7.1: The proportion of methods having each number of changes as identified by CodeShovel. This shows that while 33% of methods are only changed once, 50% of methods are changed three times or more. (Data: 1 change: 33%; 2: 17%; 3: 13%; 4: 12%; 5: 7%; 6: 5%; 7: 3%; 8: 2%; 9: 2%; 10: 1%; more than 10: 5%.)

Answer RQ5: Based on 176,125 methods in our analysis, about one third of methods are introduced and never modified. Another third are changed one to three times after being introduced.
Most of the remaining third of methods are modified fewer than ten times, with only 5% being modified more than 10 times.

7.3.2 CodeShovel correctness

RQ6: Are the results returned by CodeShovel correct?

To examine the completeness and correctness of the histories found by CodeShovel, we compared the histories produced by CodeShovel with the histories in the oracle for each of the 100 methods in our validation phase. The history of 7 methods could not be determined by CodeShovel because these files were not parsable. We found that CodeShovel correctly identified the histories for 85 of the 93 remaining methods (91%).

CodeShovel was not able to find the correct introducing commit for 8 methods. In these cases, the diff contained several changes for the method, making it somewhat ambiguous as to whether the method should be considered a new method or an extensively transformed version of an existing method. For example, Figure 7.2 shows a diff from one of the 8 methods. From a developer's perspective, the method was modified to take an instance of Invocation instead of creating it in the method body, which can be seen by the changes to the parameters, the exception signature, and the removal of the single line in the body. The method was also renamed in the same commit. Collectively, these changes caused CodeShovel to report that the bindMatchers method was introduced in this commit instead of reporting that it was a transformed version of create. In our training phase, we established the threshold values for CodeShovel's similarity algorithm to penalize simultaneous changes. We found that allowing the similarity algorithm to consider too many simultaneous changes significantly increased the false-positive rate. Further tuning of the threshold values, along with additional metrics, would adjust how CodeShovel interprets these changes.

Figure 7.2: A diff showing a complex method transformation. The create method was renamed to bindMatchers, the parameters were changed, an exception signature was added, and the creation of the Invocation was removed from the body.

Answer RQ6: Based on the 100 methods analyzed in our validation phase, CodeShovel is able to correctly determine the complete history of 91% of methods.

7.3.3 CodeShovel performance

RQ7: Is CodeShovel fast enough for being used as an online tool that does not require preprocessing?

To evaluate the performance of CodeShovel's online similarity algorithm, we recorded the execution time for each of the methods across the 10 repositories used in the validation phase (a total of 110,954 methods). We collected the runtimes on a development computer (12-core processor running at 3.30GHz with 32GB memory). Figure 7.3 shows the median execution time for all methods in each repository (point) as well as the overall median execution time for all methods. Overall, the performance is sufficient for interactive use, with a median runtime under 2 seconds and with 90% of the methods returning in less than 10 seconds. The intellij-community repository is the outlier with a median execution time of about 7 seconds, which was due to a combination of large source files (which take longer to parse) and a high frequency of change within these files (which increases the number of times the parser is invoked).

Figure 7.3: The median time it took CodeShovel to process all methods in each repository (point) listed in Table 7.1.
The overall median runtime is under 2 seconds; the outlier is the intellij-community repository, which has many large and frequently changing source files.

Answer RQ7: Based on the 110,954 methods analyzed in the validation phase, CodeShovel produces correct and complete method histories with a median runtime under 2 seconds.

7.4 Change Tagging

We examined the proportion of the method changes that were tagged with each change type. Table 7.2 shows the proportion of methods that received each tag. The sum of these proportions is > 100% because methods can receive multiple tags (e.g., they can have their method body changed and get a new parameter in the same change).

Table 7.2: Proportion of methods tagged with each change type. These add up to > 100% because a change can be tagged with more than one kind of change. Changes in the bottom half of the table modify the method signature.

Change type / % of changes
Modify method body: 59.8
Initial method introduction: 20.1
Rename file or change file path: 14.3
Extract method refactoring: 1.8
Parameter change (type, name, or order): 11.4
Modifier change: 3.4
Method rename refactoring: 2.4
Return type modification: 0.9
Exception signature changes: 0.6

While body changes do comprise the majority of changes, we believe this also makes it easier for the developer to filter these out if they are more interested in refactoring and restructuring tasks. Similarly, if developers want to focus only on those changes, they can elide many results that would otherwise not be of interest to them.

7.5 Comparison with git-log

Due to the popularity of the git-log command and its integration in many advanced history viewers, we compared CodeShovel with the basic command-line version of git-log1 for the 20 repositories used in our evaluation.

1 I.e., git log -L METHOD_BEGIN_LINE,METHOD_END_LINE:FILEPATH.

Figure 7.4 shows the proportion of history missed by git-log, with a point for each analyzed project. For example, for a method that had 1,000 days of history with CodeShovel, git-log was only able to find 510 days' worth of history on average.

Figure 7.4: Fraction of the history found by CodeShovel that git-log was able to return, measured by total history duration. For example, for a method that had 1,000 days of history with CodeShovel, git-log was only able to find 510 days' worth of history (on average).

Analyzing the correctness of git-log, we identified three main categories of scenarios in which git-log failed to produce correct results:

1. A file is renamed or its path changes (due to moving of the file or directory/package renaming). While git-log has a --follow flag which enables the tracking of changes to a file through file renames, this option is not available in combination with the -L flag for line ranges. So if a developer is only interested in the changes to a specific segment of code, there is no option to see changes to this segment across file renames or path changes. This means that git-log simply discontinues a method's history at such changes.

2. A method is moved to another file. This is a change not detected by the git-log command in general.

3. A method is moved within the file. In this case the baseline will continue the history of the line range from which the method was moved, although this line range now holds a different fragment of source code.

It should be noted here that, unlike git-log, the git-blame command makes it possible to follow the moving of methods in-file and cross-file to some degree with the -M and -C flags, which work in combination with the -L flag.
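For reference, the baseline invocations discussed above look roughly as follows (the file path and line numbers are placeholders):

# Line-range history for a method (the baseline used in our comparison)
git log -L 120,160:src/main/java/Example.java
# --follow tracks whole-file renames, but cannot be combined with -L
git log --follow -- src/main/java/Example.java
# git-blame annotates lines and can detect moves within a file (-M) and across files (-C)
git blame -L 120,160 -M -C src/main/java/Example.java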
However, the pur-pose of git-blame is to annotate each line in the code segment with the commitand author that changed the line most recently. There is no history functionalityassociated with this command, although there are approaches available that createsuch a history using a recursive call chain of git-blame (e.g. SmartGit2).2https://www.syntevo.com/smartgit/, accessed 2019-06-0345Analyzing the correctness of CodeShovel, we found almost all of the incorrectresults being related to our method similarity algorithm (see 5.3). While in somecases there is room for discussion on whether one method A in commit X reallycontinues the history of method B in commit Y, other cases are obvious mistakesthat we consider false positives from our tool. While we managed to eliminate allfalse positives from the 100 method histories in our training phase by improvingthe tool, we still saw a number of erroneous changes in the histories analyzed inthe validation phase (see Section 7.3.2).46Chapter 8Industrial Field StudyOur empirical analysis in Chapter 7 showed that CodeShovel provides completeand correct results with acceptable performance for our set of sample methods fromopen-source projects. To ensure the correctness (RQ6) and performance (RQ7)of CodeShovel translates to closed-source, industrial code bases, we ran the toolon participant-selected methods from their own industrial projects and had themindependently verify that the generated histories were correct. As a follow-up tothe industrial survey (Section 4), we also elicited participant insight into how theywould apply CodeShovel in their industrial setting. For this purpose, we aim toadditionally answer one research question:RQ8 In which scenarios are method-level histories useful to industrial developersand why?8.1 Study ParticipantsWe conducted our field study with 16 developers at a medium-size (approximately60 employee) software company in Munich, Germany.1 Since the current versionof CodeShovel is most stable for Java methods, developers were required to havesome industry Java background, and to be able to provide a set of Java methodswith whose histories they were familiar. Participants had a median of 10 years of14 of the 16 particpants had also participated in the survey described in Section 4.47programming experience, 3.5 years working as professional software developers,and 8.5 years experience with version control. Each participant was given a 10Euro coffee gift cards as compensation for their time.8.2 Study DesignWe designed our field study as on-site interview sessions lasting approximately 45minutes per participant. Participants were asked to chose 2-4 Java methods fromtheir own repositories that they were familiar with and that had been revised multi-ple times. We queried CodeShovel using each of these methods on the participant’scomputer and recorded the runtime. We then had the industrial developers evaluatethe correctness of the results.After we sent each participant the CodeShovel executable and had the parciti-pants run it on each of their methods. For each method, the participant then sentthe JSON result file to the moderator. For each commit in the result file, the processwas then:1. The participant opened the commit in the history viewer of their choice ontheir machine.2. We summarized the information in the CodeShovel output (and especiallythe change types described in Section 5.4).3. 
The participant looked in their history viewer and verified the information.After this process, we asked participants three questions:Q1 Was the method history correct? In particular, was the method really intro-duced in the first commit?Q2 For what scenarios do you think CodeShovel would be useful?Q3 Is the information produced by CodeShovel helpful?Finally, we asked participants about their professional experiences with sourcecode histories before concluding the interview.488.3 Study ResultsWe summarize the results of the field study by the associated research questionsRQ6, RQ7, and RQ8.8.3.1 CodeShovel correctnessRQ6: Are the results returned by CodeShovel correct?We produced 45 method histories in our field study. CodeShovel found the cor-rect introducing commit for 41 methods (91%).2 For the four methods for whichCodeShovel was not able to find the introducing commit, two had commits con-taining multiple changes causing the overall similarity to be below the matchingthreshold, one contained a non-parsable file, and for one we could not reproducethe problem. Otherwise, participants confirmed that CodeShovel performed welland found the relevant commits without including any unrelated commits.Answer RQ6: Based on the 45 participant-selected methods analyzed in ourfield study, CodeShovel is able to correctly determine the complete history of91% of methods. This conforms to the results of our empirical results in Sec-tion 7.3.2.8.3.2 CodeShovel performanceRQ7: Is CodeShovel fast enough for being used as an online tool that does notrequire preprocessing?Figure 8.1 shows the runtimes of CodeShovel on 42 of the 45 methods (weforgot to record the runtimes for three methods). The outlying method that took8 seconds to execute was due to the method being changed 44 times requiringa large file be parsed as many times. Overall, the median runtime was less than 2seconds showing that CodeShovel’s online algorithm is fast enough on an industrialcodebase for interactive use.2The alert reader may notice that this number exactly coincides with the results of our empiricalevaluation (see Section 7.3.2).49Method Runtime (seconds)0 2 4 6 8● ●●●●●●●●● ●●●●●●●●●●●●●●● ●●●●●● ● ●●●●Figure 8.1: CodeShovel runtimes on 42 methods from an industrial codebase.The slow performance of the outlying method was due to the containingfile needing to be parsed a large number of times.Answer RQ7: Based on the 45 participant-selected methods analysed in ourfield study, CodeShovel produces correct and complete method histories with amedian under 2 seconds. This conforms to the results of our empirical results inSection 7.3.3.8.3.3 Scenarios for method-level historiesRQ8: In which scenarios are method-level histories useful to industrial devel-opers and why?We were also interested for which scenarios professional developers would useCodeShovel. To help facilitate the discussion, we asked the participants after theyhad seen the history generated by CodeShovel for their methods. Participants de-scribed several scenarios in which CodeShovel would be useful, and to understandthese scenarios more generally, two authors independently open coded the tran-scribed responses.The history generated by CodeShovel allows developers to determine a method’sprovenance because they “can see easily who introduced a method” (P3) and it canhelp answer “[how] this code came to be” (P8). It can aid in traceability, “espe-cially [...] 
through refactorings [since] other tools like IntelliJ and git-log don't help us here" (P9), and so developers can "focus on moves and other refactoring operations that would not be traceable with conventional Git history" (P5). Participants thought that the "histories are very helpful for onboarding [since] Git blame isn't useful because formatting commits destroy everything" (P14), or "if you're new to a codebase" (P10). They also thought it would be useful for "code understanding for code you're not used to" (P10) so that "one can learn more about the codebase in an easy way" (P7). And, because developers "already do what this tool is doing, we just do it manually" (P8), CodeShovel automates history-related tasks.

Participants also offered some suggestions as to how CodeShovel could be improved: it "definitely needs a UI" (P2) that should be integrated into the IDE so it doesn't "take me out of my workflow" (P16), and that tracking "method-level annotations would be valuable" (P1).

Answer (RQ8) Participants provided several scenarios in which they thought CodeShovel would be useful. From these, we identified provenance, traceability, onboarding, code understanding, and automation as the most important aspects. Participants also noted the need for a UI and support for method-level annotations before they would consider adopting CodeShovel.

Overall, participants rated the method histories (including change tags) as very helpful (7/16, 44%), somewhat helpful (6/16, 38%), or neither helpful nor unhelpful (3/16, 19%). No participants rated the information as unhelpful. Two of the participants who rated CodeShovel as neither helpful nor unhelpful were either not active developers or had not yet worked on a "big" project.

Chapter 9
Discussion

In this chapter we discuss our threats to validity and future work.

9.1 Threats to Validity

In the following, we discuss internal and external threats to validity.

9.1.1 Internal Validity

The primary threat to the internal validity of our empirical study is the construction of our oracle. While we had at least one external developer verify the method histories we manually generated, it is possible that some histories in the oracle were incorrect or incomplete. This is especially true for the introducing commit, which was found on a best-effort basis since it is not feasible to manually examine all commits in a repository.

Another threat is due to our sampling method: the methods selected to be used in our oracle were randomly chosen from all methods having more than three commits. This was meant to focus the evaluation on more interesting and challenging histories, but we may have missed certain classes of histories by using random sampling.

Moreover, in both the survey and the field study there is a certain degree of moderator bias, since participants were selected from the authors' personal networks. We claim, though, that it is very unlikely that participants—mostly experienced
In both the evaluation phase and the field study, this was donedue to time constraints, and we acknowledge that our findings may not generalize.In our survey and field study, the participants we recruited through may notbe representative of all developers in practice. We sought to reduce this threat byhaving a fairly large number of participants (42 in the survey and 16 in the fieldstudy).Furthermore, we acknowledge that our experiments were run only on Javaprojects. Although we claim that CodeShovel is easily extensible to other lan-guages, we have only started to implement other language adapters (currentlyJavaScript and Ruby). So far though, we have only evaluated our tool on the Javaprogramming language, which may not generalize to many other languages.9.2 Future WorkThere are a variety of ways for improving CodeShovel in the future. These ideasstem from participants’ ideas in our survey and field study, as well as from our ownevaluation.9.2.1 Interaction with CodeShovelPresently, developers must interact with CodeShovel via a command-line interface.We see a proper user interface as a logical next step for practical usage, a sentimentechoed by many of the developers who took part in our field study (Chapter 8).Some further suggested that CodeShovel should be integrated into the IDE to avoidcontext switches, or to be shown as a timeline with clickable commits. An optionto reorder the commits from old to new was also suggested.53We are currently working on running CodeShovel as a web service with UI:users point the service to their Git repository and are then guided to choose theirmethod of interest. The history of the given method is then shown in the UI (after aprocessing time that was evaluated in previous sections) and the user can navigatethrough the changes. We expect that the completion of this service, and the asso-ciated transformation from command-line tool to UI-based web service will be asignificant step towards practical usage of CodeShovel.9.2.2 Integration of CodeShovel in other softwareSince CodeShovel is an open source Maven project, the CodeShovel Java APIcan already be used in other Java projects. For example, a Java-based Git host-ing service could add a method history feature to its UI which will then triggerCodeShovel procedures on the backend. This process will also be enhanced bysaid web service that is currently in development: rather than the CodeShovel JavaAPI, any platform can then interact with the REST endpoints of our service usingHTTP. This enables integration of CodeShovel in projects not written in Java (buteven Java projects may choose to use the REST API for architectural reasons).9.2.3 More code granularities and change detailsDevelopers who took part in our survey (Section 4) said that being able to nav-igate history at both the method-level and the class-level would be most useful.Currently, CodeShovel only supports method-level navigation but it would be rel-atively straight forward to track non-method blocks (e.g., fields and constructors)and include them with the aggregate of all method histories in a particular class, toprovide support for class-level navigation.We would also like to inspect the body of methods more in order to providemore details on changes that involve changing of the body of methods (by far themost present change type in Section 7.4). So far, we have not interpreted changeswithin method bodies for performance reasons (after all it is what makes relatedtools using ASTs perform considerably slower). 
More investigation and runtimeanalysis may direct us to a strategy to do such interpretations with less performancecosts. We would also like to provide more detailed descriptions of changes to54method metadata (e.g., method annotations in Java). One participant also suggestedcall-site analysis for method changes; i.e. CodeShovel could also analyze wheremethods are called throughout the code base and how this changes over time.9.2.4 Source code statisticsSeveral participants described a desire to see more statistics for method histories;for example, statistics could indicate that a method needs refactoring or one couldinfer from method histories what features were robust and which ones needed con-tinuous repair and improvement. This would move us to a meta-analysis level ofmethod history data that is very interesting.9.2.5 CodeShovel robustnessFinally, CodeShovel currently fails to produce method histories as soon as it en-counters a file revision that cannot be parsed While we would like to deal with thisproblem more gracefully, we can also see from our evaluation that this does notoccur often in practice. Still, it would be a great improvement to skip over such“unparseable” commits and continue history traversal.Although we consider CodeShovel’s measured correctness of 91% a good re-sult, we would certainly like to evaluate our tool with more methods and languages:more language adapter need to be implemented and more methods need to be eval-uated. With the concept of the training phase and the associated unit tests (seeChapter 7) we are certain that we can make CodeShovel produce the correct his-tory of almost any method in the world.55Chapter 10ConclusionSource code histories are valuable sources of information about how a system hasevolved. In this paper, we described a formative survey with 30 industrial and 12academic developers to learn how they use source code history and what challengesthey face when doing so. Through this survey, we learned that existing tools do noteffectively surface the results developers need to answer their development ques-tions. To address this, we built CodeShovel, a tool that uses a combination of ASTparsing and metrics-based string similarity to analyze method-level source codechanges. With an empirical analysis across 10 open-source projects, we demon-strated that CodeShovel can return complete method histories in more than 90%of cases with a median execution time of less than 2 seconds. A field study with16 industrial developers confirmed these results translate to industrial code basesand from these developers we determined that CodeShovel would be useful for awide range of industrial development tasks. We have a wide range of ideas to im-prove CodeShovel in the future. Most importantly, would like to extend our toolto support more programming languages and provide an appropriate user interfacefor practical usage. Further evaluation, both in the lab and in the field, will makeour findings more generalizable and our tool even more accurate. In the end, webelieve the results that CodeShovel returns quickly and reliably will be helpful forfuture developers as they investigate the history of their source code.56Bibliography[1] . Refactoring: Improving the Design of Existing Code. Addison-WesleyLongman Publishing Co., Inc., Boston, MA, USA, 1999. ISBN0-201-48567-2. → page 12[2] I. D. Baxter, A. Yahin, L. Moura, M. Sant’Anna, and L. Bier. Clonedetection using abstract syntax trees. In Proceedings. InternationalConference on Software Maintenance (Cat. No. 
98CB36272), pages368–377, Nov 1998. doi:10.1109/ICSM.1998.738528. → page 27[3] A. W. Bradley and G. C. Murphy. Supporting software history exploration.In Proceedings of the 8th Working Conference on Mining SoftwareRepositories, MSR ’11, pages 193–202, New York, NY, USA, 2011. ACM.ISBN 978-1-4503-0574-7. doi:10.1145/1985441.1985469. URLhttp://doi.acm.org/10.1145/1985441.1985469. → page 10[4] S. Demeyer, S. Ducasse, and O. Nierstrasz. Finding refactorings via changemetrics. In Proceedings of the 15th ACM SIGPLAN Conference onObject-oriented Programming, Systems, Languages, and Applications,OOPSLA ’00, pages 166–177, New York, NY, USA, 2000. ACM. ISBN1-58113-200-X. doi:10.1145/353171.353183. URLhttp://doi.acm.org/10.1145/353171.353183. → pages 10, 12[5] D. Dig and R. Johnson. The role of refactorings in api evolution. InProceedings of the 21st IEEE International Conference on SoftwareMaintenance, ICSM ’05, pages 389–398, Washington, DC, USA, 2005.IEEE Computer Society. ISBN 0-7695-2368-4.doi:10.1109/ICSM.2005.90. URLhttp://dx.doi.org/10.1109/ICSM.2005.90. → page 12[6] S. Ducasse, O. Nierstrasz, and M. Rieger. On the effectiveness of clonedetection by string matching: Research articles. J. Softw. Maint. Evol., 1857(1):37–58, Jan. 2006. ISSN 1532-060X. doi:10.1002/smr.v18:1. URLhttp://dx.doi.org/10.1002/smr.v18:1. → page 26[7] J.-R. Falleri, F. Morandat, X. Blanc, M. Martinez, and M. Monperrus.Fine-grained and accurate source code differencing. In Proceedings of the29th ACM/IEEE International Conference on Automated SoftwareEngineering, ASE ’14, pages 313–324, New York, NY, USA, 2014. ACM.ISBN 978-1-4503-3013-8. doi:10.1145/2642937.2642982. URLhttp://doi.acm.org/10.1145/2642937.2642982. → page 27[8] B. Fluri, M. Wuersch, M. PInzger, and H. Gall. Change distilling: Treedifferencing for fine-grained source code change extraction. IEEE Trans.Softw. Eng., 33(11):725–743, Nov. 2007. ISSN 0098-5589.doi:10.1109/TSE.2007.70731. URLhttp://dx.doi.org/10.1109/TSE.2007.70731. → page 27[9] M. W. Godfrey and L. Zou. Using origin analysis to detect merging andsplitting of source code entities. volume 31, pages 166–181, 02 2005.doi:10.1109/TSE.2005.28. URLdoi.ieeecomputersociety.org/10.1109/TSE.2005.28. → pages 10, 12[10] M. Hashimoto and A. Mori. Diff/ts: A tool for fine-grained structural changeanalysis. In 2008 15th Working Conference on Reverse Engineering, pages279–288, Oct 2008. doi:10.1109/WCRE.2008.44. → page 27[11] A. E. Hassan and R. C. Holt. C-rex : An evolutionary code extractor for c.2004. → page 12[12] H. Hata, O. Mizuno, and T. Kikuno. Historage: Fine-grained version controlsystem for java. In Proceedings of the 12th International Workshop onPrinciples of Software Evolution and the 7th Annual ERCIM Workshop onSoftware Evolution, IWPSE-EVOL ’11, pages 96–100, New York, NY, USA,2011. ACM. ISBN 978-1-4503-0848-9. doi:10.1145/2024445.2024463.URL http://doi.acm.org/10.1145/2024445.2024463. → pages 11, 12, 26[13] Johnson. Substring matching for clone detection and change tracking. InProceedings 1994 International Conference on Software Maintenance, pages120–126, Sep. 1994. doi:10.1109/ICSM.1994.336783. → pages 26, 27[14] H. Kagdi, M. Hammad, and J. I. Maletic. Who can help me with this sourcecode change? In 2008 IEEE International Conference on SoftwareMaintenance, pages 157–166, Sep. 2008.doi:10.1109/ICSM.2008.4658064. → page 1058[15] A. J. Ko, R. DeLine, and G. Venolia. Information needs in collocatedsoftware development teams. 
In Proceedings of the InternationalConference on Software Engineering (ICSE), pages 344–353, 2007. ISBN0-7695-2828-7. doi:10.1109/ICSE.2007.45. → page 1[16] Kodhai and Perumal. Clone detection using textual and metric analysis tofigure out all types of. 2011. → page 26[17] O. Kononenko, O. Baysal, and M. W. Godfrey. Code review quality: Howdevelopers see it. In Proceedings of the International Conference onSoftware Engineering (ICSE), pages 1028–1038, 2016.doi:10.1145/2884781.2884840. URLhttp://doi.acm.org/10.1145/2884781.2884840. → pages 1, 5[18] T. D. LaToza, G. Venolia, and R. DeLine. Maintaining mental models: Astudy of developer work habits. In Proceedings of the InternationalConference on Software Engineering (ICSE), pages 492–501, 2006. → page1[19] T. D. LaToza, G. Venolia, and R. DeLine. Maintaining mental models: Astudy of developer work habits. In Proceedings of the 28th InternationalConference on Software Engineering, ICSE ’06, pages 492–501, New York,NY, USA, 2006. ACM. ISBN 1-59593-375-1.doi:10.1145/1134285.1134355. URLhttp://doi.acm.org/10.1145/1134285.1134355. → page 10[20] Mayrand, Leblanc, and Merlo. Experiment on the automatic detection offunction clones in a software system using metrics. In 1996 Proceedings ofInternational Conference on Software Maintenance, pages 244–253, Nov1996. doi:10.1109/ICSM.1996.565012. → page 26[21] A. Mockus and J. D. Herbsleb. Expertise browser: A quantitative approachto identifying expertise. In Proceedings of the International Conference onSoftware Engineering (ICSE), pages 503–512, 2002. → page 1[22] N. Nagappan and T. Ball. Use of relative code churn measures to predictsystem defect density. In Proceedings of the International Conference onSoftware Engineering (ICSE), pages 284–292, 2005.doi:10.1145/1062455.1062514. URLhttp://doi.acm.org/10.1145/1062455.1062514. → page 1[23] M. Pawlik and N. Augsten. Rted: A robust algorithm for the tree editdistance. Proc. VLDB Endow., 5(4):334–345, Dec. 2011. ISSN 2150-8097.59doi:10.14778/2095686.2095692. URLhttp://dx.doi.org/10.14778/2095686.2095692. → page 27[24] F. V. Rysselberghe and S. Demeyer. Evaluating clone detection techniquesfrom a refactoring perspective. In Proceedings. 19th InternationalConference on Automated Software Engineering, 2004., pages 336–339,Sep. 2004. doi:10.1109/ASE.2004.1342759. → page 26[25] F. V. Rysselberghe, M. Rieger, and S. Demeyer. Detecting move operationsin versioning information. In Conference on Software Maintenance andReengineering (CSMR’06), pages 8 pp.–278, March 2006.doi:10.1109/CSMR.2006.23. → pages 10, 12, 29[26] F. Servant and J. A. Jones. History slicing: Assisting code-evolution tasks.In Proceedings of the ACM SIGSOFT 20th International Symposium on theFoundations of Software Engineering, FSE ’12, pages 43:1–43:11, NewYork, NY, USA, 2012. ACM. ISBN 978-1-4503-1614-9.doi:10.1145/2393596.2393646. URLhttp://doi.acm.org/10.1145/2393596.2393646. → page 10[27] J. Singer. Practices of software maintenance. In Proceedings of theInternational Conference on Software Maintenance (ICSM), pages 139–145,1998. ISBN 0-8186-8779-7. doi:10.1109/ICSM.1998.738502. → page 5[28] M. Sudhamani and L. Rangarajan. Structural similarity detection usingstructure of control statements. Procedia Computer Science, 46:892 – 899,2015. ISSN 1877-0509. 
doi:https://doi.org/10.1016/j.procs.2015.02.159.URLhttp://www.sciencedirect.com/science/article/pii/S1877050915002239.Proceedings of the International Conference on Information andCommunication Technologies, ICICT 2014, 3-5 December 2014 at BolgattyPalace & Island Resort, Kochi, India. → page 26[29] S. D. Thomas Zimmermann, Peter Weisgerber and A. Zeller. Mining versionhistories to guide software changes. In Proceedings of the InternationalConference on Software Engineering (ICSE), pages 563–572, 2004. ISBN0-7695-2163-0. → page 1[30] Q. Tu and M. W. Godfrey. An integrated approach for studying architecturalevolution. In Proceedings 10th International Workshop on ProgramComprehension, pages 127–136, June 2002.doi:10.1109/WPC.2002.1021334. → page 1160[31] G. W. Adamson and J. Boreham. The use of an association measure basedon character structure to identify semantically related words and documenttitles. Information Storage and Retrieval, 10:253–260, 07 1974.doi:10.1016/0020-0271(74)90020-5. → page 27[32] W. Winkler. String comparator metrics and enhanced decision rules in thefellegi-sunter model of record linkage. Proceedings of the Section on SurveyResearch Methods, 01 1990. → page 27[33] T. Zimmermann. Fine-grained processing of cvs archives with apfel. InProceedings of the 2006 OOPSLA Workshop on Eclipse TechnologyeXchange, eclipse ’06, pages 16–20, New York, NY, USA, 2006. ACM.ISBN 1-59593-621-1. doi:10.1145/1188835.1188839. URLhttp://doi.acm.org/10.1145/1188835.1188839. → page 1261Appendix ASurveyA.1 Survey FormThe following pages show the form that was presented to the participants of thesurvey described in Section 4.6224/08/2018 Qualtrics Survey Softwarehttps://ubc.ca1.qualtrics.com/ControlPanel/Ajax.php?action=GetSurveyPrintPreview 3/11Section 1: Source Code History In this section we aim to collect a general understanding how developers use source code history. By source codehistory, we are referring to any activity, system, or tool associated with past changes to source code; commonexamples are the history of a specific file, commit diffs or pull requests.Q1.1 How recently did you last use source code history of any kind?Q1.2 Please describe this most recent activity. How did you use source code history? What were you looking for? Did you find it? Did the tools you used support you in this investigation or could they be improved?Q1.3 In terms of source code granularity, how interested are you in gathering information on source code history at the following levels?* By block we are refering to a group of declarations or statements that we commonly see between curly braces ({})or keywords like begin/end in programming languages.Q1.4 When you use code history, how far in the past do you usually examine? How do you determine how far in the past you want to go?Less than 2 work daysLess than 1 weekLess than 1 monthLess than 1 yearMore than 1 yearcan't remember    Veryinterested Interested NeutralNot veryinterestedNotinterested atall Don't knowProject   Directory/Package   File   Class/Module   Field/Variable   Method/Function   Block*   6324/08/2018 Qualtrics Survey Softwarehttps://ubc.ca1.qualtrics.com/ControlPanel/Ajax.php?action=GetSurveyPrintPreview 4/11Section 2: pull request scenarioSection 2: Pull Request Example  For the following questions, imagine the following situation: you are reviewing a change to a code fragment in a pull request* but you are not certain about what the code is actually doing. Your goal is to better understand the code and what led to the change being made. 
*Pull request  is a common term in version control for the activity of a source code contributor requesting that a project maintainer merges a change into the code base of the project. If you are unfamiliar with this terminology, you can simply assume a simple change to the source code base that you are reviewing.Q2.1 Does this scenario sound familiar to you (i.e. have you encountered this in the past)?Q2.2 Please describe very briefly how you would approach this problem. What kinds of questions would you like to answer? What tools or approaches would you use to answer them?Suppose the change you are reviewing is related to a single method. You want to understand this method better and what led to it being changed.Q2.3 Using source code history, how would you find changes to this method only? Please describe briefly.Q2.4 How well would your strategy cope with more complex structural changes, e.g. method renaming, moving of a method, refactoring?Very familliar Familiar Neutral Not very familiar Not familiar at all Don't know6424/08/2018 Qualtrics Survey Softwarehttps://ubc.ca1.qualtrics.com/ControlPanel/Ajax.php?action=GetSurveyPrintPreview 5/11Q2.5 Using current tooling support, how hard is it generally to trace changes to a specific method?Q2.6 Given your answer to the previous question (Q2.5), what makes this hard or easy? Section 3: Historical Scenario OverviewSection 3: Specific Scenario ‑ Overview We have chosen a specific scenario that illustrates how source code history relates to development in practice. Please read the description below and answer the questions that follow. Please allow a few minutes and click on the links provided to understand the scenario better. The choice of the Java language for the example is arbitrary and does not require Java experience. Example Scenario  Imagine yourself in the dev team of Checkstyle, a popular syntax checker for Java. You are to review this pull request with a change to the method CommonUtils.hasWhitespaceBefore. In order to review this pull request, you want to get a better picture on this method and how it has changed over the past. You decide to look into the history of the file CommonUtils.java as seen here. You discover that this file has a history of 47 revisions in 3 years.Q3.1 In the above version history, how would you identify the commits in which the method of interest has changed? Please describe your strategy briefly.     Very well Well NeutralNot verywellNot well atall Don't knowRenaming of method   Signature changes(parameters, return type)   Move to a different file   Splitting into multiplemethods   Combinations of theprevious   Very hard Hard Neutral Not very hard Not hard at all Don't know6524/08/2018 Qualtrics Survey Softwarehttps://ubc.ca1.qualtrics.com/ControlPanel/Ajax.php?action=GetSurveyPrintPreview 6/11Q3.2 How well do existing tools support identifying these changes? Q3.3 How useful would it be to have support for a more semantic history in this scenario (e.g. history for this method or class only)?Q3.4 How hard would it be to find the first commit for the given method and whether the method was really created then or if it was moved there from somewhere else (e.g. through a file renaming, or through a refactoring)?Section 4: Historical Scenario DetailSection 4: Historical Scenario Detail For the same real world example as above (Checkstyle pull request), we have analyzed the version control history of the method of interest. 
The following descriptions and diff snippets show where and how the method has been changed over the past. Please have a look at these and answer the questions that follow. Example Scenario (details) The history of the file CommonUtils.java shows that a refactoring commit on Aug 28 2015 (46a52f8) renamed the method whitespaceBefore to hasWhitespaceBefore: This is the third­oldest commit in the file’s history. The message of the oldest commit in the file’s history on Aug 26 2015 (cdf3e56) is “Utils class has been splitted to CommonUtils and TokenUtils”. The diff confirms that a file Util.java was split into these two separate files CommonUtils.java and TokenUtils.java and that the method whitespaceBefore came from this file: Very well Well Neutral Not very well Not well at all Don't knowVery useful Useful Neutral Not very useful Not useful at all Don't knowVery hard Hard Neutral Not very hard Not hard at all Don't know6624/08/2018 Qualtrics Survey Softwarehttps://ubc.ca1.qualtrics.com/ControlPanel/Ajax.php?action=GetSurveyPrintPreview 7/11     Inspecting the history of Utils.java reveals 41 revisions throughout the year 2015. However, in the oldest commit from Jan 21 2015 (204c073), the method whitespaceBefore was not present in this file. Searching for the commit that introduced the method reveals a commit from March 15 2015 (1c15b6a) with the message “move all methods from checkstyle.api.Utils to checkstyle.Utils”. Again, this was a refactoring commit that combined two classes with the same name (Utils.java) to one file:    6724/08/2018 Qualtrics Survey Softwarehttps://ubc.ca1.qualtrics.com/ControlPanel/Ajax.php?action=GetSurveyPrintPreview 8/11      The history of the old Utils.java file from which the method came reveals 69 revisions with the first one dating back to Feb 20 2002 (e10faf3). The details of this commit show that the method whitespaceBefore was introduced in this commit for the first time.  6824/08/2018 Qualtrics Survey Softwarehttps://ubc.ca1.qualtrics.com/ControlPanel/Ajax.php?action=GetSurveyPrintPreview 9/11  Q4.1 Consider again the described situation of being faced with a pull request for a change of a method. Howhelpful would you consider the information above for getting a better understanding of the method and its history?Q4.2 How hard would you consider retrieving information on the history of a method with the above level of detail?Q4.3 If a tool could generate information in the fashion of the above on any method or other code unit, how valuable would you consider this tool?Q4.4 What other information that is not in the descriptions above would you consider valuable?Very helpful Helpful Neutral Not very helpful Not helpful at all Don't knowVery hard Hard Neutral Not very hard Not hard at all Don't knowVery valuable Valuable Neutral Not very valuable Not valuable at all Don't know6924/08/2018 Qualtrics Survey Softwarehttps://ubc.ca1.qualtrics.com/ControlPanel/Ajax.php?action=GetSurveyPrintPreview 10/11Section 5: Background InformationBackground InformationQ5.1 How many years have you been programming?Q5.2 How long have you been working as a professional software developer?Q5.3 How many years have you been using source code version control?Q5.4 What is your current job title?Q5.5 What version control systems and tools do you use? Please select one or more options.Q5.6 If you selected IDE/Editor in the previous question, please specify what IDE/Editor (and if you know, whatunderlying version control system) you are using. 
Q4.1 Consider again the described situation of being faced with a pull request for a change of a method. How helpful would you consider the information above for getting a better understanding of the method and its history?
(Very helpful / Helpful / Neutral / Not very helpful / Not helpful at all / Don't know)

Q4.2 How hard would you consider retrieving information on the history of a method with the above level of detail?
(Very hard / Hard / Neutral / Not very hard / Not hard at all / Don't know)

Q4.3 If a tool could generate information in the fashion of the above on any method or other code unit, how valuable would you consider this tool?
(Very valuable / Valuable / Neutral / Not very valuable / Not valuable at all / Don't know)

Q4.4 What other information that is not in the descriptions above would you consider valuable?

Section 5: Background Information

Q5.1 How many years have you been programming?
(< 1 year / 1-3 years / 4-10 years / > 10 years)

Q5.2 How long have you been working as a professional software developer?
(< 1 year / 1-3 years / 4-10 years / > 10 years)

Q5.3 How many years have you been using source code version control?
(< 1 year / 1-3 years / 4-10 years / > 10 years)

Q5.4 What is your current job title?

Q5.5 What version control systems and tools do you use? Please select one or more options.
(Git / Bitbucket / GitKraken / Mercurial / SourceTree / TFS (Team Foundation Server) / CVS / IDE/Editor / Visual Studio Online / SVN / SmartGit/SmartSVN / Github / TortoiseGit/TortoiseSVN / Other (see next question))

Q5.6 If you selected IDE/Editor in the previous question, please specify what IDE/Editor (and if you know, what underlying version control system) you are using. If you selected "Other", please specify what other tools you use.

Q5.7 Do you have any final comments? Do you have any other ideas for tool support or systems to solve the general problems described in this survey and its scenarios? Is there anything else on your mind?

Q5.8 If you are interested in the results of this survey and/or you want to enrol for the $100 (CAD) Amazon gift card lottery, please provide your email address. (Your email address will not be stored with the survey data.)

A.2 Survey Results

The following pages show the results of the survey described in Section 4.

Survey Results Export (CodeShovel), June 25, 2019 5:33 PM MDT

Q1.1 - Q1.1 How recently did you last use source code history of any kind?
Min 1.00, Max 4.00, Mean 1.37, Std Deviation 0.77, Variance 0.60, Count 49
Less than 2 work days: 77.55% (38)
Less than 1 week: 12.24% (6)
Less than 1 month: 6.12% (3)
Less than 1 year: 4.08% (2)
More than 1 year: 0.00% (0)
can't remember: 0.00% (0)
Total: 49

Q1.2 - Q1.2 Please describe this most recent activity. How did you use source code history? What were you looking for? Did you find it? Did the tools you used support you in this investigation or could they be improved?

Figure out who did certain changes
I opened a project I have not been working on for some time, so I used the commit-diff feature to see where I left the project, what changes I have made and where to keep on working.
Looked for changes. Tools that have been used, were useful.
shell, code changes, yes, yes
Used local individual file history to restore a previous iteration of my own code and used line annotations to identify when certain lines were changed last. Regarding the file history, I used only the most fitting version because there were so many revisions that finding the right one was too time consuming without knowing the approximate time where that version was actually in use.
who wrote the code initially / who made the last change and what did he change
I Iteratively fix bugs in code driving a data processing pipeline. I try out a few things, then commit them to the repo when they are tested.
Using git to manage multiple branches that depend on one another. Each branch corresponds to a separate feature, and each feature is submitted for code review independently. I was ensuring that each branch contained all the changes it needed to be atomic. This is possible in git but requires advanced knowledge of git -- I wouldn't say it's a first-class use case.
Reviewed a pull-request, merged a pull-request, changed to branches to see other people's code. I was looking to see that a bug fix was implemented. The tools I used were Git command line, to switch branches, and Bit Bucket online, where line changes were listed in red and green.
The Bit Bucket toolcould certainly be improved. Unlike Github, it is difficult to click into full files and the entire code history. I suppose that the Git command line couldbegin to use plugins, if they do not already do, to implement interesting features.I searched in some diffs between two git commits for reasons why something in the code broke.Tagging the repo for a release. Pushing the tag to origin. Used git CLI.I was looking for the origin of a merge conflict. I was able to find the origin by using standard Git commands like git diff, git show and git ls-files -s.I did some changes to the source code but needed to look at the previous state to resolve some bugs I introduced. I found the problem. I used theGitHub website to look at the history.git log, git diff, git show to find reasons for recent changesI had to do some version updates of dependencies that were done in another project too. So I had a look at the history to check which versions did weuse in the other project. The tool (Bitbucket) I used supported me in this investigation.checked the last commits to identify was was changed recently - and yes found it74Q1.2 Please describe this most recent activity. How did you use source code...Figure out history of changes. Continuous Integration tool was not doing a build anymore. No change was made to source code. So I could concentrateon supicous configuration. Tool support was okay for me.Used git to make and push some changes to the repo. Also, browsed for a change made by specific commit in the history (for 2 month old commit), i.e.,looked at the "git diff" based on the SHA of the commit. Git was good enough for what I wanted and I did get the work done.I reviewed recent changes. I used source code history to identify the changes, verify they were correct, looking for a specific change that might haveintroduced an issue. I confirmed that the problem was not recently introduced. The tools were quite good at helping me find those changes.I was looking to merge branches into the main branch. (There were several disparate branches because each individual team member was working inisolation.) The worst aspect was that all of the deleted garbage files cluttered the history, so I had to filter that out. I used a lot of git diff.Looked through a diff in a pull requestI searched for an old version of a file. I used the bitbucket for that. I found it and the bitbucket interface was great help.I was merging few code changes that I implemented few days ago with my current working branch. I use git and I'm completely satisfied with it so far.I was using github to go check who had been contributing to a enterprise project before and who I could contact for dev support.I was reviewing a pull request in GitHub. It was a small PR and I had no issues with it. But reviewing PR is usually a tedious and boring task, because thismay involve reviewing many dozens of files with unrelated changes (especially at the beginning of a project, you may need to refactor a few thingswhile implementing a functionality). You could group them under commits, but when you review a PR it's easier to review the whole changes instead ofcommit by commit.I used “git log” and “git show” to check what I modified. I was ensuring I didn’t commit the wrong thing. I was also trying to have a overview what Ihave done. Just many commands involved. Better to have an easier ones.Had to look if the stage branch contained all hotfixes from production, and how to name the next stage release. 
Used git on the commandline formerges, but SourceTree too look at the graph (tube map).Looking for which commit a block of code was inserted in. I did find it eventually, but the tools aren't great, since I couldn't find a way to track lineageof a particular line of code, just the last commit which modified it (or my git/hg-fu aren't good enough!)I was using it to identify when a colleague made a change, and also looking to see how commits were made. I was easily able to find it. I was using git-x.I also used history to identify what was changed previously and why. I found it using an annotation addon in intellij.pull/commit/push cycle in git and the equivalent in hg and looking through history to identify what changed recently (since my previous commits) andalso who is associated with some changes that I noticed.I was checking the diffs in my branch against the committed code. A normal diff on the terminal was sufficient.- Check when a change was made - Read code from a pervious version Of course I found what I was looking for. I can't thing of anything to improve.I was looking for an outdated implementation of a functionality we aimed to re-introduce (in a better way)Mainly for the basic tasks pulling, committing and pushing. I did not search for something specific. It's mostly a problem of myself that in addition to acertain improvement or bug fix, I put irrelevant enhancements into commits instead of committing the irrelevant ones separately. So I like theBitbucket function of commenting on certain code parts of a commit and highlighting a particular spot that was responsible for a bug.Looked for an old git commit using IntelliJ. Found it because of good commit messages. IntelliJ helped excellently.75Q1.2 Please describe this most recent activity. How did you use source code...Checking for which ticket the changes in a file have been made. Yes I found it.git1. Embedded Eclipse plugin for GIT. 2. Recover changes overwritten by a team member by accident. 3. Yes. 4. Yes, Yes, visualization could be better,had to walk through all files line by line to compare changes.Analyzed code changes over a period of time to find a bug/introduction of the bug. Tool support was sufficient (git/bitbucket/IDE)I wanted to understand how the solution to a certain problem was implemented. I did find what I was looking for. The tool (Stash) was perfectlyadequate in displaying the added and the changed lines of code.Evolving of software architecture. Understanding what steps a certain component took to get to the shape/position it was in. I used GitHub and Towerto navigate the repository and the module's diffs. Navigating the file history and searching through the history of near-by modules by navigating therepository at different positions in time. No, the architectural evolution and reasoning of the modules under consideration was not reconstructiblethrough observing the repository at different positions in time.I often study the history of my source code to understand how and where other people worked since my last commit/push. I do not need any particularadvanced tool to do that, since I am usually looking at small modifications that can be easily managed by the built-in tools of git/svn.- reviewing a pull request - with the user interface of Atlassian bitbucket - compare the changes to the previous version of the file - yes - yesI was trying to revert a previously deleted feature, which wasn't tagged correctly. Unfortunately it took some time to find the right revision. 
InPHPStorm there is only a history view without further search functionality.I search for a property in a configuration file hosted in a Git repository to understand how and why it was changed. I found the commit in which theproperty was changed. I used ' git blame' and 'git diff' to trace back the changes to the property.Pull requests to review code using Bitbucket. Works fine most of the time, but sometimes changes (especially refactoring/reformating, but evenadding of new methods) get displayed in a confusing way. Diffs of specific files (mostly changes of the last few hours) while developing using IDE.(Mostly to restore old states after realizing "I should have committed that".)Contents of a file contradicted what has been documented elsewhere. Wanted to identify who changed the file at which point and whether there wereany regressions (e.g. due to merges). (result: the discrepancy was intended) Used the Bitbucket Server interface to navigate first the history of a file,looking at various diffs. Color-highlighting of the changes definitely helped. Sometimes the context of what other changes where done in the commitwere a bit sparse.76Q1.3 - Q1.3 In terms of source code granularity, how interested are you in gatheringinformation on source code history at the following levels?Very interestedInterestedNeutralNot very interestedNot interested at all0 5 10 15 20 25ProjectDirectory/PackageFileClass/ModuleField/VariableMethod/FunctionBlock*# Field Minimum Maximum Mean Std Deviation Variance Count77# Field Minimum Maximum Mean Std Deviation Variance Count1 Project 1.00 5.00 2.48 1.28 1.64 462 Directory/Package 1.00 5.00 2.67 1.12 1.26 483 File 1.00 5.00 1.77 1.00 1.01 484 Class/Module 1.00 5.00 1.74 0.78 0.62 475 Field/Variable 1.00 5.00 2.40 1.22 1.48 456 Method/Function 1.00 4.00 1.67 0.72 0.52 467 Block* 1.00 5.00 2.09 1.12 1.26 44Showing rows 1 - 7 of 7# Field Very interested Interested Neutral Not very interested Not interested at all Total1 Project 26.09% 12 34.78% 16 13.04% 6 17.39% 8 8.70% 4 462 Directory/Package 16.67% 8 29.17% 14 31.25% 15 16.67% 8 6.25% 3 483 File 54.17% 26 22.92% 11 16.67% 8 4.17% 2 2.08% 1 484 Class/Module 40.43% 19 48.94% 23 8.51% 4 0.00% 0 2.13% 1 475 Field/Variable 28.89% 13 31.11% 14 15.56% 7 20.00% 9 4.44% 2 456 Method/Function 45.65% 21 43.48% 20 8.70% 4 2.17% 1 0.00% 0 467 Block* 36.36% 16 38.64% 17 6.82% 3 15.91% 7 2.27% 1 4478Q1.4 - Q1.4 When you use code history, how far in the past do you usually examine?How do you determine how far in the past you want to go?Q1.4 When you use code history, how far in the past do you usually examine?...if I see an interesting change i don’t care how far back it was introduced...~ 1-2 Months. Really depends on the project (how many developers are working on it, commit style: micro vs major, developer skills: is an interninvolved?)as far as necessary, it dependsUsually a version number or release date can narrow down the time frame of relevant commits to reasonable levels. Depending on the number ofpossible elements to analyze, I usually adapt the code level and more precisely select individual files or even just parts of a file to look into.i am interested in the history of a code snippet (who did it / what was the code before), it doesn't matter how old it is - what matters is the number ofiterations where the functionality was changedIt's generally for a few reasons. - I either want to figure out why/when a bug was introduced (by bisecting). There's no hard limit on how far back thismight go in the history. 
I pick a point in history that I know was a good spot, and the current buggy point, and search between the two. In larger teams,it's often due to bad merges, and so it's interesting to track persons or features that introduce a regression. - Often I just want to revert a recentlycommitted change, generally in my local history (stuff I haven't pushed). I like staging things, and using "git diff" as a tool to keep track of where I amor what I'm doing at the bleeding edge. -- especially from one day to the other. I'll often try a few things, and rebase/squash into the local commit thatfixes that stuff. - I go by concept/feature for my commits. So sometimes there is no right granularity to look at patches. A refactoring operation, forinstance, is conceptually simple, but ends up touching everything often. I have a vague notion of how things used to work, and so to find a commit inthe past, I'll often rely on both my own commit messages, and I'll go by file. For projects that have very few (large) files, I will have to go by function, orlocal spots, so I use "blame" tools to figure out which commit touched each line last. If I don't remember where I changed something, but I rememberan identifer, or a comment I've typed, I'll use git grep and go from there.Depends on the pace of code development. In a project that sees several commits daily I might look back as far as 50-80 commits. Also I regularly usethe functionality of git blame, which can document changes to a file that occurred arbitrarily long ago, and I find this useful when I want to understandthe context in which a particular line of code was written.Within a few months. I usually check to see if something has been modified that is relevant to the current issue that I am working on. If it is current, Iwant to see other related changes to keep in mind. I don't want to destroy other people's changes, so I want to understand their business logic and seewhy they implemented some code.Depends: On older projects there is no exact time frame, in current projects it's often from now until the last one of my commits (i.e. all commitssomeone else made).1-3 commits.Most of the time, I only need to go back a couple of days. However, if some functionality seems odd or obsolete while reviewing source code, I need toinspect commits that are many months old.Usually the last few changes (most of the time only the last change). To go back I use the commit history.difficult to answer as it depends on the priority / impact of the defect / feature and how crucial the understand of it is ...Usually I examine the last few commits (maybe 3 to 5 last commits). I have a look at the git network and read the commit messages, to find the correctcommit.79Q1.4 When you use code history, how far in the past do you usually examine?...Only couple of Commits which is mostly a couple of days. Sometimes I've been looking for a feature that was/wasn't implemented some months ago.Last 5 - 10 commits, not depending on time.&gt; how far in the past do you usually examine? Usually 2-3 month, rarely anything more than a year (or all the way to the origin). &gt; How do youdetermine how far in the past you want to go? I usually stop when I find what I am looking for. It usually happens to be 2-3 month old. I'll go all the wayto the origin until I find what I am looking for.Usually, I search back one or two weeks. Occasionally I will search back farther than that. 
How far depends upon the goal - if I am looking to find whenan issue was introduced, I might search back years.Not very far at all. I would keep it within the current sprint which is generally 2 weeks. Anything beyond that will be classified as a bug not a work inprogress. This is because the branches are closed once things are merged into master and deployed as part of continuous deployments. This is verydifficult to retrieve after the code is deployed. Unfortunately, we'd end up having to talk to the person who developed the feature rather than relyingon code commits history. It also doesn't help that teams commit tiny changes 3-4 times a day, so it's a lot to sort through.Depends on what I want to do. If I review a pull reuest, I only look at the latest changes. When I need to look up, how something was done in the past, Imight even go back to the beginning of the projectBased on the current problem I try to solve, I go back in history until I find relevant code changes in this part of the codeI don't usually go more than a few weeks before in the history. Also it is very rare that I search commit history based on timestamp.As long as I needI don't have any fixed answer for this, it really depends a lot on the particular case. Sometimes I need to trace back the lifespan of a class until it wascreated (which might get tricky if it was renamed). Sometimes I need to check the very first commit of the project. I usually have to go to the filehistory and select the commit where the change I'm looking for might had happened.Probably days at most a week if I want to ensure my work and have a overview. If using git blame, it can go very far.Phew, as far as needed. Usually as far as the cause of some bug. Sometimes I use git bisect to find that. Sometimes I also use the history for "When didwe launch v1" and that can be some years…Usually a couple of weeks back to try and locate what caused a regression. Sometimes, have to go much further to find a changeset which isunaffected.Time does not matter in distance I go back. The only thing that matters most is changes. I go back only 1-2 changes max.Usually examine up to my last commit since that was the last time that I had a snapshot of the code in my head in a consistent form. If using branchesthen consider the entire branch, especially if am trying to figure out a merge.I go as far as the last commit *I* made to the project, or from where I started working on the project, whichever is more recent. If this does not help inmy search, I look at the commit messages to find something that seems relevant and then explore the diff.As far as the change i'm looking for is in the past. I use tags and commit messages.Most of the time I'm looking for relevant Jira issues in the first place, so that I know in what time frame the changes I'm searching for were made.Until the last release tag.80Q1.4 When you use code history, how far in the past do you usually examine?...Depends on what I'm looking for. Usually not further than 10 commits, because of different branches.Depends, sometimes years if I'm looking for the reason for a code change. This depends on how old the project is.till i find what i am looking forDepends...usually I know who committed changes in the past days as I am constantly checking for updates in the SCM. When I'm uncertain about thehistory I have a look at the merge graph (GIT) and who of my teammembers contributed changes. 
Depending on the (hopefully good commitcomment) I decide which commit to analyse, when searching for a specific change in history e.g. in a class file.At least back to the oldest currently productive codeUsually only one step into the past. Rarely more. Only for very specific investigations one might have to step back a few times or look at a diffcompared to a certain time point when an issue was believed to have been introduced.Starting at a blame I get a rough overview of the age of certain parts of a module. I then try to triage down to issues I am looking into. Often line basedvisualisation is not really helpful for that. One has to jump back and forth in time between often unrelated changes/parts of history.I am usually updating myself on what it has been done since I last worked on the project, so it highly depends on when I last worked on it. However,usually, I look at 3 to 6 days of work.- usually around 1-4 weeks in the past - i move back in time by commitsIt'll be great to have the complete history available all the time.It depends: On a project I'm actively working on I usually look at the code history for the past week. When using an open source project and dependingon the change I'm interested in there are times where I have to look at the commit history years ago.Mostly between a few hours to about two weeks. But for some cases (e.g. to understand why and how specific parts became the way they are today) itcan expand to several years.This largely depends of the goal. On larger projects with multiple developers, I most often start at the last commit I did myself (if any are available)trying to recreate the context in my mind and then go from there forward in time.81Q2.1 - Q2.1 Does this scenario sound familiar to you (i.e. have you encountered this inthe past)?Very familliarFamiliarNeutralNot very familiarNot familiar at all0 2 4 6 8 10 12 14 16 18 20 22# Field Minimum Maximum Mean StdDeviation Variance Count1 Q2.1 Does this scenario sound familiar to you (i.e. have youencountered this in the past)? 1.00 5.00 1.85 1.04 1.09 46Showing rows 1 - 6 of 6# Field ChoiceCount1 Very familliar 45.65% 212 Familiar 39.13% 183 Neutral 2.17% 14 Not very familiar 10.87% 55 Not familiar at all 2.17% 14682Q2.2 - Q2.2 Please describe very briefly how you would approach this problem. Whatkinds of questions would you like to answer? What tools or approaches would you use toanswer them?Q2.2 Please describe very briefly how you would approach this problem. What...I always use the intelliJ "inspect tool" to quickly navigate through the filesFirst, have a look at any linked issues to find problem or feature descriptions, then consult documentation or the pull request creator to get moreinformation. If too unsure, then at least test it locally or on an existing test system whether the intended functionality works as intended. At the end ofthe day, the responsibility of verifying the correct implementation is on the programmer side, not on the reviewer's.first i ask for an informal description from the author. second i ask for a unit test / integration test. normally after understanding the test, oneunderstands the code what was changed (ideally: test failed before the change)The purpose of a patch is often not visible in the code alone. It It is my opinion that good comments (or PR messages) should describe the "why". The"what" and the "how" are essentially the code itself. 
I would want to know: - it is a fix, or a feature improvement - is it worth merging in (PRs canintroduce other bugs) - is it related to something that's been repeatedly breaking. (is it a regression). looking at the code history for the affectedregion might reveal some important things. - is the patch tested properly. does it cover all cases. - are the limitations of the improvement/fixdocumented. is it a silver bullet. - is it linked to a particular issue. does the code submitted refer to that issue.Tools I've used for PR code review, mainly Github and Bitbucket, usually don't make it easy to browse parts of the code that weren't modified in the PR.Yet, if the changes are to a part of the code with which I am unfamiliar, this is necessary to understand the full context of the changes. For this reason Iwill usually pull the feature branch in question to my local copy of the repository, and use my IDE to browse the changes, and the context in which theyoccur, at my leisure. This usually involves switching between e.g. Github, which highlights the changes and allows me to make comments on them, andthe IDE, for semantic browsing of the code.I would like better explanations of the bug/feature that is fixed or implemented. Sometimes context from a comment helps monumentally. Tying a codechange to a ticket in Github, BitBucket, RT, etc would be beneificial in the code-viewer, so that context is always available and it does not have to betracked down.The obvious questions are why was this change made and what is it doing? I would check who made the commit and get in contact (in our casecomment in Bitbucket on the specific pull request). Or look for a test for this function and check the test if this clarifies the reason and function behindthe change.Look at the whole file. Look at corresponding ticket. Ask commiter for intention of change.To solve this problem, I would view the diffs of the relevant commits and try to run the relevant function in the original and the modified form andcompare their results.I would look at the code and follow its execution path (might need to look at other modules/classes). I would add a comment on parts that I don't knowso that the person that created the pull request could answer my questions. The questions will mostly be general, like: Why does it need to bechanged? What other parts of the code will be affected by the change? Are all boundary conditions satisfied? Will it break something? Tools andapproaches: GitHub website for reviews and comments, IDE to look through the code, IRC to ask questions to the contributor.pairing with the developer and find it out :-)1. Checkout the complete code 2. Try to understand the general problem that the code should solve 3. Run & Debug the code, to see what happens atthat specific part where the changes being made Bitbucket, IntelliJ IDEA, Tools that I need to run the code.83Q2.2 Please describe very briefly how you would approach this problem. What...If I don't understand the code that I should review, I assume to contact the developer to get all needed background information. In addition I wouldcheck if there is any ticket related to the changes that provides more details.I would analyze and use an editor to write down my comments. If code is very very complicated I will print out on paper and use a marker.Usually there is a bug report (e.g., JIRA) for a small pull requests (PR) or a separate documentation for the bigger PR (e.g., feature). 
I prefer to get thehigher level picture by reading docs, followed by the actual code.I would read the code and reason about what it is doing. Generally I don't need anything more than an editor to read such code.In a code review, I would be familiar with the code because only teammates with expert domain knowledge should review and give +1 to a PR. I wouldhave to ping the person and ask what the change is for, but the PR will be linked to a JIRA task that I should review before bothering them. PRs areassigned within a small team and we were generally working on top of each other's code and very aware of any buggy code or upcomingenhancements.Comment on the pull request. Either comment on the code directely or regularly on the pr itselfIf I come to this kind of problem, this is mostly a sign of bad code quality. I will then ask the pull request creator what this should do and think aboutpossible refactoring actions and comment them in the pull requestI never had to look for the past changes made to a code to understand it. But I had used git blame command to look for who was the last personchanged some lines of code.I would read through the code and try out the branch locally, in addition to running the tests and follow the flow of the code. Github and editor.The one that triggered this PR should add a description with any information relevant to the reviewers (i.e. why, how). Besides that, the tool could helpby grouping the changes by topic (not by commit, which may not contain final changes).Read the commit messages first, then follow the thought, examine codes. Since commits are likely to be small, it should not take too long to figure outeach commits, then the entire pull request.Ask my IDE to show all callers of that function in my IDE, and then also look at the diff in this PR…I'd really like much more context around the diffs. The other thing I'd like is subcommits to a larger commit; for example, assume a change thatmodifies a function and several call sites. I'd like to be able to group all the actual functional changes in a single subcommit, and then group togetherall the other call site changes and the like. Individual commits don't really work well because they break bisection.I would use two tools, and issue tracker and github diff The issue tracker would tell me what were the design decisions that lead to the change. Githubdiff tells me the previous change with the current code.Look at the comment in the pull request -- who made it, and what issue it is referencing. If the pull request is huge (sigh) then consider the tests first,then iterate with the person in the comments section to figure out what is what and why it is so large. Usually relying on the person who made the pullrequest is my go to strategy -- they have to be accountable for the pull request and in their interest to get it merged, so I expect them to volunteeranswers quickly and to the point.First, I do a diff to understand the relevant sections that I need to review. I then see if the change seems proper. To inspect a code fragment that Ican't reason about, I use cscope to find how it is being used. I trace the calls to that function and try to reason about its usage. If the change does notmake sense to me and seems sketchy, I make notes on the PR and ask for clarification.Everything is in the code. Of course I know what it is doing.Asking the Pull Request author? :D84Q2.2 Please describe very briefly how you would approach this problem. What...Phew ... good question. 
I always ask the pull request creator directly. Of course you can comment with the already existing tools code parts and askwhy was it done the way it was done. But typing takes time and during working hours you do not want to write big explanations. So my priorityapproach would be the Hipchat Call.The PR should be linked to an issue which creates enough semantic context most of the time. If you then still don't understand the code it's eitherbecause the code is bad or you do not know the context well enough. In that case you would need to read the codebase.I'd either talk to the developer directly or comment the file in the pull request.read the code read comments aks committing person test the codeTo understand "why" the code was changed, the easiest way is to ask the developer committed the change and ask for the reason. To answer thequestion "what" was changed in the code I would use a diff tool to compare and analyze the changes.What was changed? Did the behaviour change (was a test modified to match the new behaviour)?Investigate what the whole repository or relevant sub-systems are meant to do, then determine how this PR fits into that picture. Consult the PRdescription or associated design docs as to what the PR is supposed to achieve and then determine whether the code changes seem reasonable in thatcontext.Instead of understanding the evolution of the module/change-set I tend to ask questions or perform a pair review. Reconstructing the knowledge fromplain diffs and the PR's change request is often to cumbersome. I often wish for more Architecture Decision Records I can read to understand whycertain architecture decisions (APIs etc) arrived at the stage they're in.If the change in the code is not self-explanatory, I would expect to find a clar and explanatory message in the commit.Find the bug/feature ticket in which the problem is described and try to understand the business case. Questions are very specific to the problem. - isthere a mockup ( if ui related task )In general the pull request should contain all aspects of what has changed and how the changes should behave. Also it must contain a detailed setup tobuild an exact test environment.I would discuss the pull request either directly with the developer who opened the pull request or in a code review meeting in order to clarify thefunctionality of the code and establish a common understanding of how to use pull request and their descriptions to improve the overall code reviewprocess. Some question would be: Who did this change and when? Was it a single change or did it evolve over time? Why was the change made? Is thechange related to an issue/bug or is it an improvement? In order to answer these questions my first step would be to look at the code history and thenlook at the tools involved in the development process (requirements documents, isse tracking system, ...).Inspect the diff, read the comments and Javadoc, checkout the branch and browse it in the IDE. In more important/complex cases or projects I'm verynew to, talk to the developer or ask questions as comments. Questions can be general like what the responsibility of each class is or target specificchanges and choices made while implementing them.Hopefully there is some textual description of the goal that has been achieved with the code changes. From that functionality I would work backwardsto see what the different changes in themselves actually achieve and what their role is. Questions I would try to answer: - Why were the changesmade? - Any sideeffects? 
- Does it harmonize with the general code-architecture of the project? Toolwise I would do most of this in the web UI of thecode repository (Github, Bitbucket, ...). For larger changes I also run that version locally and step through the different functions using a debugger.85Q2.3 - Q2.3 Using source code history, how would you find changes to this method only?Please describe briefly.Q2.3 Using source code history, how would you find changes to this method o...AnnotateText searchUse multi line history for the method block to identify commits with changes in this area. If that is not enough, expand to file history or check theidentified commits globally to see interconnections between files.git blameThat's hard with line-based history search. I find it painful enough that I don't even bother. I'll narrow by file. Methods can move between files, but theygenerally don't span multiple files. Then, I would probably word grep for commits containing strings in that function -- if I know that an identifierchanged, for instance. Short functions are easy. Long functions are harder. I will often fallback to a line-based blame output. I'll have commits thattouched each line in a file (git/svn/hg blame) last, and work from there. Most often I'm interested in only the latest commit to have touched something -- if that commit was a whitespace fix (e.g. change a line ending), I'd have to recursively look back from that point. If I needed anything fancier, I wouldneed a program to produce snapshots in time, and then track where the function is with an index (like cscope). But that's language specific. It is aninteresting problem. I think searching by method has its uses, but the syntactic scope of where a change was made is not always relevant, or notalways possible to track, so I feel the tools haven't pushed that feature forward too much. In Javascript for instance, you have these closures all overthe place, and they are often anonymous -- so you can't fully rely on names, really to narrow down to only what you want to find. I feel that most SCMsthrow the towel and give you generic "region-aware" history, rather than an informed history search based on program structure. It is interestingthough, that when you are staging commits, the tools will often pull out the name of the function/class in which a hunk is to be applied, even if that lineis outside the hunk. So there is value in extracting context. Gitk will give you the same thing when looking at the history (it will print the name of theobject/class/function before each diff hunk). I'm not sure you can search on it though! -- or at least I don't remember doing this. You could ultimatelylook through the history with a visual tool, and let gitk extract the function name for you, but that's a lot of clicking.I would use git blame to show changes to this file. This would work as long as this method hasn't been moved from one file to another. Github providesa decent UI around this, where they show the list of commits that have made changes to a file. 
However, this requires clicking on each commitseparately, and navigating to the method in question within the list of changes brought about by that commit.I would typically check to see if the file has been changed on a commit, and then git diff those changes to see if the method is included.I would use the git history of the containing file in IntelliJ.Looking at the history of the file where the method is placed in.I would use options to limit the diff output of git log.I would also look at the places where the method is used to see if the change breaks something. Also unit tests come in handy here.git log -L :&lt;funcname&gt;:&lt;file&gt;1. Checkout the branch with the changes (if there is an extra branch for that changes) 2. git diff from branch to develop or git diff between the commitwith the changes and the commit before 3. Have a look at this methodI would use revision history and click through revisions.86Q2.3 Using source code history, how would you find changes to this method o...Run 'git blame' on the file, get the 'git commit SHA' for the function, and 'git show commit_sha'. Most functions tend to be part of the single git commit.Or only 2-3 commits (modified the source code), in which case I'd see all commits one-by-one. I have not encountered the case where a function wasmodified by more than 2-3 commits, say 10, but if that's the case, I would probably look at the git history of the lines I am interested in. I am veryunlikely to see all 10 git commits.This can be more challenging, especially if the code has moved across files. I would need to establish the provenance of the code, then look at changesto the specific files. I would need to then read the code and understand how it changed.This occurs in circumstances where the person responsible for this method has left (this is bad practice, eg. silo'ing). Then we would dig into the codehistory for this one method by looking at changes to the file itself. One method would not exist in isolation because they typically should be no longerthan a screen's length. So generally the entire file and dependent files are also investigated. Things are trickier when the code doesn't link up in git, forexample in microservices the API calls are difficult to connect and trace in git. For this we had a custom tool that linked calls (similar to Google'sDapper but not as nice) where we could trace the history of success/failures. From those metrics we dig into the code history to find a correlatingchange.Have a look at the ticket connected to the PR. If done correctely, it should describe what should be done. However, if the ticket is done sloppy, Iprobably would have to contact the author or maybe the product ownerI would check the diff of the file and scroll to the methodAs I said, I never had to do this. But if I had to do this, I probably would check the log of that particular file along with the changes made to it. git log -pcommand for example.I use GitLens which makes it easy to find the exact commit and, hopefully, read a good summary on it.You would need to check the "Blame" option for the file where the method is and check the commit that generated the last change. Then do the samefor every deeper step you would like to go. You could guess the reason why it was changed with the commit message. 
In any case, it's not trivial.use git blame and see which lines come from which commits.Look at the file history for the last X commits.blame-&gt;move to revision-&gt;blame recursively to rebuild lineageI would search for changes to `this file` using gitx.git log an grep for method if the norm is to mention this mention (unlikely); otherwise git blame (just looked up usage again in a blog since I use it soinfrequently). I would figure out a person who last changed it and talk to the person if I can -- getting to the person is critical for me since I don't wantto waste time making up a story about a change sequence that isn't true in reality (intent is important and is missing entirely from the source coderepos).I use git log -L to review a particular method name. Of course, this relies on the function name and path remaining constant.Use the annotate feature in IntelliJ and klick on the commits in the method.I'd use the bitbucket history of the file the method is in or the history of my IDE to see the changes commit by commit.If the method is still in the same file, I would look at the diff of the file. But this approach is not really leading the way. The method could have movedfrom one file to another or the diff of the file is just too confusing because several places were adjusted at the same time in the course.Don't know how, but would be great to be able to do it easily! I would probably look at commit messages and issues that I would expect to haveimpacts on said method.87Q2.3 Using source code history, how would you find changes to this method o...Don't know, I'd probably check the file history.using a diff toolAswing for a diff of the file the method is assumed to be included. Changes will be shown, if method is there. If method was (re)moved from file, I'llhave a problem...(graphical) diff toolUse `git blame` to see the most recent changes to the relevant lines of code within that method and step through to the most recent change listedthere.Reading the blame of the method and jumping back and forth in history of the method. To the question blow: linting, automatic formatting and tests atdifferent levels would ensure that the change is valid.Unfortunately, it is not easy to directly and explicitly search changes in methods or variables or classes. I would search the diffs of the related files.compare the file diff to the latest commit and find the changes for this function.That´s a good question. Actually i would navigate to the file and open up the file git-history. Then i have to browse different versions to investigate thechanges. It´s not the funniest job but it´s possible.I would try different approaches: * First I would look at the source code history of the file that contains the method in order to understand the changes* If the first action does not result in a better understanding I would search for the method name in the descriptions of source code historyIn IDE "show history" for the file, searching for the commit changing this method (by the latest change date of the specific lines), then using thecommit message that hopefully includes a ticket ID.I would look at the history for the lines of the method, for example using IntelliJ's "Show history for selection"88Q2.4 - Q2.4 How well would your strategy cope with more complex structural changes,e.g. 
method renaming, moving of a method, refactoring?

Per-item summary (Min, Max, Mean, Std Deviation, Variance, Count):
Renaming of method: 1.00, 5.00, 2.34, 1.11, 1.22, 44
Signature changes (parameters, return type): 1.00, 5.00, 2.02, 1.02, 1.04, 45
Move to a different file: 1.00, 5.00, 3.84, 1.17, 1.38, 45
Splitting into multiple methods: 1.00, 5.00, 3.11, 1.07, 1.15, 44
Combinations of the previous: 1.00, 5.00, 3.62, 1.07, 1.14, 42

Response distribution (Very well / Well / Neutral / Not very well / Not well at all, Total):
Renaming of method: 20.45% (9) / 47.73% (21) / 15.91% (7) / 9.09% (4) / 6.82% (3), Total 44
Signature changes (parameters, return type): 33.33% (15) / 46.67% (21) / 6.67% (3) / 11.11% (5) / 2.22% (1), Total 45
Move to a different file: 4.44% (2) / 11.11% (5) / 17.78% (8) / 28.89% (13) / 37.78% (17), Total 45
Splitting into multiple methods: 2.27% (1) / 34.09% (15) / 25.00% (11) / 27.27% (12) / 11.36% (5), Total 44
Combinations of the previous: 2.38% (1) / 11.90% (5) / 33.33% (14) / 26.19% (11) / 26.19% (11), Total 42

Q2.5 - Q2.5 Using current tooling support, how hard is it generally to trace changes to a specific method?
Min 1.00, Max 5.00, Mean 2.89, Std Deviation 1.18, Variance 1.39, Count 45
Very hard: 11.11% (5)
Hard: 33.33% (15)
Neutral: 20.00% (9)
Not very hard: 26.67% (12)
Not hard at all: 8.89% (4)
Total: 45

Q2.6 - Q2.6 Given your answer to the previous question (Q2.5), what makes this hard or easy?

If you use a good IDE with Git Plugins it isn't too hard
Accessing history for a specific part of a file works pretty well. Problems only arise once the changes go beyond the scope of the originally identified section either through involvement of different files or major structural changes. This makes partial file history basically useless if what you are looking for is very old and there were multiple other changes to the file you are working with.
See previous answer. I can only surmise that it is not easy because correctly defining the scope of a method automatically without false positives and false negatives is a hard problem. Each language has its own grammar, there can be syntax errors which would throw off counting scopes, or even preprocessing mish-mashes of various files. Some methods/functions don't have names. The golden standard seems to be to pick the line in the source file which "looks" like some sort of header. (I've now realized at this point that we may not be on the same page as to what would be considered a method).
Most often, methods don't get moved across files, in which case it is easy. In the cases that they do, it becomes hard as the forms a discontinuity in the source code history.
There is not a specific way to track this information with a Git tool. You have to track the file and mentally investigate changes in the file.
Currently using git and this has file based granularity.
History is on file level makes it hard.
Most of the time, Git repositories are used. Git does not track changes but uses snapshots.
It might affect parts of the code that are not obvious to spot
git log -L ....
git or bitbucket show me all changes that being made. So it should be very easy to trace changes.
Refactoring is hard to track.
Without refactoring it's ok.Function can be split into multiple functions or moved to another file. These cases make it hard for 'git blame' based history browsing.It's either easy (the method hasn't undergone complex changes) or challenging (the method has undergone complex changes). When it is challenging, itis VERY difficult.It's very difficult in distributed systems because IntelliJ and other IDEs aren't great at remote debugging. Combine this with microservice architectureand the problem explodes. Calls occur between different repositories and the github search doesn't link these well. However, with good namingpractices and logging behaviour, it can be made more intuitive.This depends on whether the PR is connected to a ticket. If so, it also depends on how detailed the ticket is writtenThe diff in bitbucket and github are pretty good.92Q2.6 Given your answer to the previous question (Q2.5), what makes this har...Certainly git is not designed to do this, but usually developers don't move around methods or functions much i guess. So file level history is goodenough. At least as far as I'm concernedGitLens shows it allLast change is relatively easy: "Blame" and find the file within the commit. The older the change is, the harder it gets. If the method has change file orname, it becomes more complicated. In general, every additional change you have to trace back, more chances you have to get lost on the way.git blame does not show the blames being replaced by subsequent commits. And if the method got moved, or modified too much, git blame will getlostWhen the method moves elsewhere you have to change to that other place… Most of the tooling I'm aware of works on file and line granularity, but mostly ignores function boundaries in changeset metadataIt is hard using gitx. It is more easy generally using annotation (git history in intellij). But if the method is moved then much more difficult.source code repo tools don't have lang semantics, so somehow attempting to capture 'method' in git command line options is just asking for a bad day.The change information at sub-file level is difficult to identify and track. Like the previous questions identified, method changes like renaming andsplitting are difficult to track.Just go through the history of that file and look at the method. Don't know why it should be hardWell, it's not that hard since changes are clearly marked in the history of my IDE if you one commit after another.Commits refer to file changes without any further reference. One would have to tag a piece of code, be it a method, class, or block somehow during thecommit, so that any associated changes could be identified across files and independently of other changes in a commit.You just can't do it AFAIK :DDepends like described in Q2.4. if method is not renamed, moved to another file than it should be possible to trace changes.Sophisticated history support in SCM/SCM integration in IDE (we use git)It's not very hard because it's possible through decent IDE support to step through different git commits but could definitely be made easier.Not knowing the semantics. Not being able to visualize the change of e.g. 
a method renaming or split in a graph.The is no direct or explicit abstraction to method/class levels.Hard: - most of the time a PR/Commit contains more then a single change to the method - no clear way of seeing the impact of a specific change if notexplicitly grouped in a commit - developer needs discipline to commit often and in terms of logical changes Easy: - if a commit contains only therenaming or signature changing of a function it should be easy to trace back the changesIt´s just time consuming but not that hard. I´d prefer a better overview with a timeline with specific filters and stuff. Not a single side-by-side view.The tool I use the most is Git and as far as I know you can only narrow down the search results by specifying a path. Classes, methods, properties arenot supported.93Q2.6 Given your answer to the previous question (Q2.5), what makes this har...The more stable a method is from external perspective (keeping its signature and general purpose) the easier inspecting its history becomes.94Q3.1 - Q3.1 In the above version history, how would you identify the commits in which themethod of interest has changed? Please describe your strategy briefly.Q3.1 In the above version history, how would you identify the commits in wh...Use selective history for the lines in question, which reduces the number of commits to those with changes in these. Obviously, this has the downsidethat once the method moves around more than a few lines, the history is losing track of it because we are only referencing line numbers. Then, thecommit log might give an indication of where the method came from. Continue the search there...git blamethe comment links to an upstream issue. I would read that first. I will state upfront that I don't like the github interface. I would create a temp branch,set to the revision against which the pull request is made. Then I'd run git blame on the file. And spot the revisions affecting lines of the method ofinterest. Then I would inspect these commits. If the function was refactored from a different file, i'll reach a point where the function is added toCommonUtils.java. Then I'd have to git grep to find where it came from (hopefully the removal of the initial function is in the same commit as theaddition). If CommonUtils.java was called something else before, I'd have to find the corresponding delete operation.Click through each commit and search for the method in question.I would take a look at the commit messages and any time-periods where I know people were working on related features where this method isimportant. I would specifically look for the development of that method, where it would have had to have been changed, by finding a relevant gitcommit message.You could diff the first and the latest version and then check the lines of method for commit hashes in which they got changed, then go to thesecommits and check if there are earlier annotations on these lines and so on.Use google and found command: "git log --follow -p -- file"I would run git log -p -- src/main/java/com/puppycrawl/tools/checkstyle/utils/CommonUtils.java and search for all chunks affectingCommonUtils.hasWhitespaceBefore.Above the code field you can select changes from all or certain commits.If "git log -L ..." 
is no option here I would possible do samples of different dates looking at the original file and comparing it (as the method is quiteshort)I would check the commit messages of the above version history to find the correct commits.I would try to figure out of the comments if the method was concerned. If I wan't to be sure I would have to check every commit.I would use 'git blame' to get the last commit/change on the lines I am interested. I would not try to find 'function level' changes from the entire commithistory to CommonUtils.java. As you pointed out, 47 revisions are too many to go through.Usually I would use binary search to walk through the change set. The descriptions could be useful in eliminating unlikely changes.I scrolled down to the bottom of the commits and found the commons-3 commit which looked interested but was actually not directly related. Iclicked on that diff and looked for the same file and saw that it was only libs and constants changed. I then scrolled through all the irrelevant minorsand version bump commits up towards the top. I found the most recent related commit which shows improvements to the whitespace method. Isearched for the file CommonUtils.java (using the browser search for text) because the smaller changes just to the method call were arranged at thetop, making it difficult to find what I was looking for. Ultimately, I'm looking for just the CommonUtils.java file and there's too much garbage noisebecause it's such a commonly used file.95Q3.1 In the above version history, how would you identify the commits in wh...Looked at the commit messages and randomly clicked through commits. Not very easyI would click through each version and check if there is a changegit log -p And then search for the function nameI have no idea how I would do this.In the PR, click "View" file, click "Blame", search for "hasWhitespaceBefore", check commit next to the method. For older commits, click "View blameprior to this change". Only commits older than 29th August 2015 are related to this method. But the method comes from another class (Utils.java), thatwould be more difficult to trace.Open each one of them and manually inspect. ;-(It's annoying, but looking at each commit if stuff changed in that method.Recursively looking up blame on method lines to see when they were last modifiedI would first use a better tool to look at history of the file. Like gitx, then I would just scroll through all the changes to the file if I had no idea why thechange was made.Oh boy. Well, if the git interface allowed me to expand all the diff snippets then I would expand and search for method name. As is, I would have to clickthrough each one (doesn't seem like this method name appears in any of the comments, plus could miss some this way), so for completeness I wouldexpand each diff snippet and search using the browser page text search).Basically, I would dig through the history to find the method in each commit. I don't know if there is any tool that allows me to track the function acrosscommits.Go through every commit and check the methodI don't knooow! Click every single commit? Damn... 
I don't knooow! Click every single commit? Damn... you got me with this one!

Click on each commit and search with Ctrl+F for the method name.

Look at the commit messages and diffs for each.

I'm trying to figure out by looking at the commit description if the change affects the method I'm interested in and check the diff if so.

Split the time into equal parts, compare with revision 24, then revision 12 or 36.

Having a look at the commit comments to get an idea of what the committed change is all about. In Git there is an option to search in comments or search for logged changes in a specific file. If there is a hint in the comment, I would have a deeper look into the specific commit. If not, one would probably have to have a deeper look into every one of the 47 commits to investigate...

1. commit message 2. linked issues 3. diff tool

Look at each commit diff to determine if the method was changed.

Not at all.

I sincerely have no idea. I usually don't find myself in such complex situations. Personally, I would need to search commit-by-commit. However, as I said, I usually don't have to do this, so I never explored more efficient solutions.

Open the different commits and check manually.

I'll have to pick some previous versions to hopefully find commits related to this single method.

Looking at the version history I would try to identify changes related to the 'hasWhitespaceBefore' method by looking at the commit messages or issues. If this does not lead to a positive result I would probably write a simple script in order to step through the source code history and generate a diff of the affected file and only print out the relevant parts for the method 'hasWhitespaceBefore'.

On the web interface: Click through history? Search for "whitespace"/"blanks" in commit messages? Or using the IDE, as described before.

I would first check if the function existed already in the very first commit. If not, attempt some divide and conquer mechanism to see when it was introduced. From there I would go forward through all commits.
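Several of the strategies above name concrete Git commands (git blame, git log --follow, git log -p, git log -L). As a rough sketch only, assuming the checkstyle repository path quoted in one of the responses and using a placeholder line range for the blame example, these commands could be applied to the hasWhitespaceBefore scenario as follows:

    # Full patch history of the file, following renames across commits
    git log --follow -p -- src/main/java/com/puppycrawl/tools/checkstyle/utils/CommonUtils.java

    # History restricted to a single function (Git locates it heuristically by name)
    git log -L :hasWhitespaceBefore:src/main/java/com/puppycrawl/tools/checkstyle/utils/CommonUtils.java

    # Last commit touching each line of the method; 100,120 is a placeholder range,
    # -w ignores whitespace, -M/-C follow moved or copied lines
    git blame -w -M -C -L 100,120 src/main/java/com/puppycrawl/tools/checkstyle/utils/CommonUtils.java

As several responses point out, all of these commands operate on lines and files rather than on the method as a logical unit.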
Q3.2 - How well do existing tools support identifying these changes?
(min = 1.00, max = 5.00, mean = 3.59, std. dev. = 1.06, variance = 1.12, n = 41)
  Very well         2.44%    1
  Well             14.63%    6
  Neutral          26.83%   11
  Not very well    34.15%   14
  Not well at all  21.95%    9
  Total                     41

Q3.3 - How useful would it be to have support for a more semantic history in this scenario (e.g. history for this method or class only)?
(min = 1.00, max = 3.00, mean = 1.57, std. dev. = 0.65, variance = 0.43, n = 44)
  Very useful        52.27%   23
  Useful             38.64%   17
  Neutral             9.09%    4
  Not very useful     0.00%    0
  Not useful at all   0.00%    0
  Total                       44

Q3.4 - How hard would it be to find the first commit for the given method and whether the method was really created then or if it was moved there from somewhere else (e.g. through a file renaming, or through a refactoring)?
(min = 1.00, max = 4.00, mean = 1.88, std. dev. = 0.85, variance = 0.72, n = 42)
  Very hard        38.10%   16
  Hard             40.48%   17
  Neutral          16.67%    7
  Not very hard     4.76%    2
  Not hard at all   0.00%    0
  Total                     42

Q4.1 - Consider again the described situation of being faced with a pull request for a change of a method. How helpful would you consider the information above for getting a better understanding of the method and its history?
(min = 1.00, max = 5.00, mean = 2.21, std. dev. = 1.10, variance = 1.22, n = 42)
  Very helpful        28.57%   12
  Helpful             42.86%   18
  Neutral              9.52%    4
  Not very helpful    16.67%    7
  Not helpful at all   2.38%    1
  Total                        42

Q4.2 - How hard would you consider retrieving information on the history of a method with the above level of detail?
(min = 1.00, max = 5.00, mean = 2.22, std. dev. = 0.92, variance = 0.85, n = 41)
  Very hard        17.07%    7
  Hard             58.54%   24
  Neutral          12.20%    5
  Not very hard     9.76%    4
  Not hard at all   2.44%    1
  Total                     41

Q4.3 - If a tool could generate information in the fashion of the above on any method or other code unit, how valuable would you consider this tool?
(min = 1.00, max = 4.00, mean = 1.74, std. dev. = 0.69, variance = 0.47, n = 43)
  Very valuable         37.21%   16
  Valuable              53.49%   23
  Neutral                6.98%    3
  Not very valuable      2.33%    1
  Not valuable at all    0.00%    0
  Total                          43

Q4.4 - What other information that is not in the descriptions above would you consider valuable?

Information about whether content has only been moved from one class to another or gotten a real change. In the above example, I had to compare the method contents every time because it was marked as new content in the new files. For lines in that file, this may be true, but the method content itself didn't change at all.
I'm a bit torn here. On one hand I can appreciate the value that such a tool could provide. On the other hand, I don't think that in a real scenario I would want to trace such a small function (~10 LOC) back to the project's inception. It seems like more work than it's worth. If the question is "Should I merge this PR?", then I'm really trying to evaluate if the PR is a strict improvement over the previous commit. I'm not convinced going back to the roots of the project is necessary to make that decision. In other words, in what way does the function's provenance help assess whether the PR change is needed? There are other things that need to be done to evaluate the previous PR. For instance, there might be other calls to that function hasWhitespaceBefore that should also be refactored. There might be tests to worry about. I might be interested in finding commits that have modified both a test and the given function -- that might be a more compelling use case. Another thing to note is that the PR changes the semantics of what characters are allowed at the start of the line. isWhitespace() is more inclusive than just checking for ' '. Are callers of the modified functions aware that the semantics have changed?

It would be great if it was part of a git command line :) That is something that I would use to search for a method and its changes for sure.

Who made the changes and why (but this is available in the linked commits so not really missing).

Not just the information on when and what changed but why the changes were made.

There is enough info here.

Short summary of all the places that are calling this method, so I could determine how important it is when scoping out changes.

I don't work on Java a lot. Mostly C or C++. These languages don't have a restriction on giving the same name to the file and the public class in the file. So I never had to face this problem.

I'd love to be able to specify some inputs and outputs to act as tests for a method. If it could show me how each version of the method would function for those; for instance, if I found a bug where it broke on a particular input, I could easily see if it were always broken or if some change added the bug.

I would like to have the issue number and design decisions that required the change.

The actual history of the method itself seems less relevant to me than what the method does right now (why/how would I need/use history; to blame people for bugs?). It seems that one useful bit of history is to find methods that were co-changed with this one. I.e., if there is another method that is a dependency or that somehow uses logic that is similar to this method, then knowing that they were changed together (or not!) would be useful. I.e., either to inform a change I would want to make, or to learn about other parts of the code that are relevant to this method (e.g., if I'm learning about this method/this part of the code). Clearly knowing the people involved is key (for me), which you left out above. Knowing that Bob and Jane made all these changes would be super helpful (because I know Bob's on Slack and Jane left the company, so I know who to ask about the implementation or if I want to change the method -- code ownership comes into play).

If it is not an API, the most important information for me is that the method will still be used.

Maybe it would be interesting to see where it's (been) used (in the past).

The call references to this method, which in turn have been refactored just because of this method change.

Method usage

No answer

Associated test runs/suites and bug reports. Dependency updates which may have triggered source file changes. Discussions of PR reviews going into more detail on the reasoning.

- references / how often and where is the method used before and after the changes

Having the information above would be a big improvement in the daily business.
The commit message associated with every commit. Normally the commit message should include a description of why the change was made and also references to other tools (e.g. issue tracking system, ...).

Who worked on the method within which time spans. Associated issue ids. Changes of signature. History of the Javadoc of this specific method.

If the programming language supports strong typing like this example, I think the signature (parameters and return type) could be retrieved and would help a lot.

Q5.1 - How many years have you been programming?
(min = 2.00, max = 4.00, mean = 3.40, std. dev. = 0.66, variance = 0.43, n = 42)
  < 1 year      0.00%    0
  1-3 years     9.52%    4
  4-10 years   40.48%   17
  > 10 years   50.00%   21
  Total                 42

Q5.2 - How long have you been working as a professional software developer?
(min = 1.00, max = 4.00, mean = 2.78, std. dev. = 0.92, variance = 0.85, n = 41)
  < 1 year      9.76%    4
  1-3 years    26.83%   11
  4-10 years   39.02%   16
  > 10 years   24.39%   10
  Total                 41

Q5.3 - How many years have you been using source code version control?
(min = 2.00, max = 4.00, mean = 3.02, std. dev. = 0.64, variance = 0.40, n = 42)
  < 1 year      0.00%    0
  1-3 years    19.05%    8
  4-10 years   59.52%   25
  > 10 years   21.43%    9
  Total                 42

Q5.4 - What is your current job title?

software engineer
Postdoctoral fellow at UBC -- Data Science Institute
Software Developer
Backend Developer
IT Specialist
CTO
Software Developer
Graduate Student
Principal Consultant
Software Developer
IT Consultant/Fullstack Developer
Diplom-Informatiker
PhD student
PhD Student
Platform Engineer
Software Developer
Software Developer
Node.js Engineer
Software Engineer
Graduate student
Project Lead Software & Consulting
PhD Student
Senior software product engineer
Assistant Professor
Grad Student
Software Developer
Software Developer / UI/UX Designer
Developer
Software Developer
Web Developer
Software Developer Web
Senior Software Developer
senior software consultant and architect
Senior Frontend Engineer
Permanent researcher
Professional Software Developer
Software developer
Software developer
Fullstack Developer

Q5.5 - What version control systems and tools do you use? Please select one or more options.
  Git                           21.39%   40
  Mercurial                      3.21%    6
  CVS                            1.60%    3
  SVN                            6.42%   12
  Github                        18.18%   34
  Bitbucket                     17.11%   32
  SourceTree                     3.74%    7
  IDE/Editor                    16.04%   30
  SmartGit/SmartSVN              0.53%    1
  TortoiseGit/TortoiseSVN        3.74%    7
  GitKraken                      1.07%    2
  TFS (Team Foundation Server)   1.60%    3
  Visual Studio Online           1.60%    3
  Other (see next question)      3.74%    7
  Total                                 187
Q5.6 - If you selected IDE/Editor in the previous question, please specify what IDE/Editor (and if you know, what underlying version control system) you are using. If you selected "Other", please specify what other tools you use.

Qt+Git

I'm confused by the counterpart of this question. I use emacs, mostly. I've used Eclipse, and Sublime, and IDLE for Python in the past. I use vim once in a while, but only to shoot myself in the foot or to insert a bunch of "^qq:" at the top of each file. I've only _heard_ of Visual Studio Online. Looks decent, I'd be keen to try it. What underlying version control system? I'm not sure what that means. Do you mean under which SCM the IDE is developed (I don't know)? For most editors I use, I install plugins for most common SCMs. I have also used CVS, SourceForge, and SourceSafe in the past, but these tools have definitely lost their edge today. I feel ancient now. There's Perforce that comes to mind, but it's not on the list.

- VScode uses git - the IDEA suite of IDEs from JetBrains provides an excellent feature for managing current source code changes called "changelists"

IntelliJ (with git)

- Atom (git) - vi - Eclipse (git) - IntelliJ (git)

Vim

Eclipse (git, svn), IntelliJ IDEA (git, svn)

IntelliJ IDEA

Using Eclipse, IntelliJ, Atom, and VS Code all along with git.

Eclipse with SVN plugin. Will change in the next weeks to Bitbucket.

VS Code (various including git), Visual Studio (TFS, Perforce, git)

IntelliJ, Eclipse; Git but only through command line

VSCode

- VS Code - IntelliJ

Vim, ctags

VS code, GitLens

IntelliJ IDEA

Editor: sublime text

PyCharm, Atom

I use multiple IDEs. Currently at work I am using IntelliJ, at home I use VS Code.

Eclipse with all the version systems.

P4V and Perforce.

IntelliJ

PhpStorm

JetBrains IntelliJ, Visual Studio Code

IntelliJ, Visual Studio, Atom

PHP Storm & Visual Studio Code

IBM ClearCase, IBM RTC

IntelliJ

IntelliJ / Sublime (underlying git)

Terminal and git with various helpers and configs.

IntelliJ IDEA and PHPStorm with Git

IntelliJ IDEA/Atom/Git

IntelliJ

Editor / IDE: - VS Code with GitLens - various IntelliJ products with both its "Local History" function and the Git integration. Other: - the Git CLI itself (not sure if "git" included this)

Q5.7 - Do you have any final comments? Do you have any other ideas for tool support or systems to solve the general problems described in this survey and its scenarios? Is there anything else on your mind?

- informal / flowchart-like documentation is often more helpful than just the code for complex algorithms and structures. - reviewing code with unit/integration tests is easier than just reviewing code. - we always should know who to ask: next to "git blame" sometimes I would like something like "who did something similar in this company?" (based on semantic similarity of code). Often people solve identical problems again because they don't learn from each other. After the first commit a good tool could give a hint: "are you looking for...?"
I do like the idea of being able to express rich queries on evolving code. I think that's a powerful concept, and one that could work across repos (or forks of a repo). I was thinking of use cases as I was filling in the survey, and I thought that it would be pretty cool to automatically merge patches that have been fixed in forked repos. If I find a bug in my own code, I could quickly determine whether it affects "descendent" repos, and vice versa. I could also look at a piece of code (a function, a class, a struct, etc.), and compare it side by side with alternate implementations in other repos. However I think it's too big of a gun for inspecting the localized PR presented as a use case in this survey. I think a use case where a bug was reintroduced might be a more compelling example (it happens a lot!). There are other powerful use cases for a structured historical search during development (and less so patch approval). If I add a parameter to a function I would be interested in seeing what other parts of the code change along with that parameter. E.g.: "show me the code that changes when the signature of method X changes". An obvious consequence of changing a signature is to change the callers of the modified function. A less obvious one is adding appropriate logging for the new parameter -- this is often forgotten, and added after the fact. I hope this is going to be a real tool! Would love to try it. I'm thinking even a prototype implementation of this as a wrapper script to the git commands would be great to have at one's fingertips.

This was a long survey!

Being able to see who revised the method, and comment revisions, would be nice.

I like the idea of being able to deep dive into a method's history; this would also be helpful for internal classes or fields that get moved from one file to another, but methods are definitely the best use case.

The described scenario is hard to solve with existing tools from my point of view. However, thinking about my own situations during the survey, I can't remember that I've needed such detailed information about one method over time. I'm pretty sure that I had these situations in the past but I think they are pretty rare. In other words: it is not something I miss in my daily work but of course it would be very helpful in these rare situations where method tracking is needed.

Merging the call tree with the historical view would be super useful. IDEs do this, for example when you rename a method, but the window is tiny and hard to read. It also only appears for the time when you're doing the refactoring, and the historical view is difficult to retrieve.

I don't know, method-level history is very challenging considering the myriad of programming languages. I personally never had to encounter the problems mentioned in the survey, maybe because the tools that I use don't support this feature.

The history of a method helps to understand a pull request, but I usually don't find the need to do so.

Thinking back, I dig a lot less in code version control systems than I thought. Generally it's to roll back or to find out who did what, or at most figure out when something happened and what issues were involved. I'm not sure I would find rich method-level semantics that useful. But rich metadata that is easier to get at would be helpful.

Use diffs to trace a method through history in a file. Use search to trace a method in different files.

Thank you, I realize now that I use VCS purely as an archive. Even with the pull request, the successful execution of the tests is more important than the actual change.

No

none
It'd be great to generate a method-, class-, file-history/protocol in a human-readable way. Example:
--- Method: ExampleClass::exampleMethodX
- aka: DifferentClass::exampleMethodA, MovedIntoAnotherClass::exampleMethodB
- created: 2017-06-03 by "john doe"
- history:
-- 2017-06-03 13:46: Moved to class DifferentClass (see diff)
-- 2017-06-10 13:46: Return value has changed to Integer (see diff)
-- 2017-06-12 13:46: Added new argument "newArgument" of type String (see diff)

Good idea with the semantic code history. But the example with "hasWhitespaceBefore()" differs from my usual code history challenges - in this case, I wouldn't consider the history shown very helpful.

End of Report

Appendix B

Field Study Transcripts

The following pages show the plain results of the field study described in Section 8.

### BEGIN: LEGEND ###

Main questions:
Q1: Was this method really introduced in this commit?
Q2: Were there any commits in the history that should not be there?
Q3: Do you remember any specific commits that should be there but aren't?
Q4: Is the information produced by CodeShovel helpful? Please rate from 1 (not useful at all) to 5 (very useful). For this question assume there were a good UI for the tool.
Q5: Do you have any further comments? i.e. Do you miss anything? For what questions and challenges could you imagine CS to be useful?

Participant data:
Q6: How long have you worked as a professional software developer?
Q7: How long have you been programming?
Q8: How long have you been using version control?

Result legend:
Q1-Q2-Q3-Q4
(Q1-Q3 are yes or no for each method used in the study)
===
Q5
===
Q6-Q7-Q8
TIME_TAKEN_METHOD_1 - TIME_TAKEN_METHOD_2 - ...

### END: LEGEND ###

============== P1 ==============
yy-nn-nn-4
===
* It would be interesting to see more details about what changed in the body, e.g. in one case an if statement was extracted and it would be nice to be told about that.
* Multichange type is good because it is good to see that multiple things changed in this commit.
* Method-level annotations would be valuable (e.g. @Override)
* More information about refactorings, especially method extraction
* Interpretation of changes of call sites would be useful
* Generally really cool thing! It would be great to make something bigger out of this. I would really like to see method extraction refactorings specifically!
===
4-12-12

============== P2 ==============
yy-nn-nn-3
===
* This definitely needs a UI
* I would want to see the order the other way round, from old to new
* There is definitely a domain and use cases for this, but I can't think of anything specific right now because I'm not a developer at the moment (rather a consultant)
20-26-15
2915-2933

============== P3 ==============
yyyy-nnnn-nnnn-4
===
A UI with a timeline would be great. This way I could easily see who introduced a method (accountability).
If a method is very old, I see that many people were involved and I may then decide to refactor it.
I could also see in the overview what tickets were associated.
Then I could find those tickets and use it for better documentation.
===
2-10-8
3158-3650-4293-4040

============== P4 ==============
yny-nnn-nnn-5
===
I really like your tool! It is much more efficient than the tools I am using. But a good UI is definitely a requirement for practical usage!
In one case the introduction commit was in fact just a copied method. It might be good if the tool would follow the method where it was copied from.
===
3-9-8
1866-1872-1036

============== P5 ==============
yyy-nnn-nnn-5
===
It is really interesting to follow such histories. If a method has a shitty name and is renamed to another shitty name, one can trace that. However, if it was never renamed, this is proof that developers followed conventions from the beginning.
If this tool had a UI it would be extremely useful because you can really focus on moves and other refactoring operations that would not be traceable with conventional Git history. But a UI is definitely a must.
Can you give this tool to me? I could see a potential every-day use, at least when we focus on sprints. It could also be useful for code reviews: how often was a method changed and why? In our team we have a big eye on commit history.
It would also be very useful to extend this tool with metrics. We could derive where there is potential for refactoring. If a method is changed very frequently, there is danger that the method is unstable. This often has consequences.
Statistics would be interesting: e.g. what is before a feature and what is after a feature? Where might I need refactoring?
===
20-35-20
2651-2352-2664

============== P6 ==============
yyy-nnn-nnn-4
===
In one case, the method was copied from another class. History stops when copied. It would be nice to trace the method to where it was copied from.
What are cases where there is only a parameter change but no body change? Maybe something was done wrong if such changes are found?
This would be helpful if I want to find out why we changed something and the documentation isn't good enough.
I don't really see it being too useful for code reviews because these are about the actual changes.
For the UI: what has changed? Timeline of what has changed. Statistics of changes are not so important, I am more interested in the concrete changes. I am interested in commits.
===
4-10-8
2033-1968-2180

============== P7 ==============
yy-nn-nn-4
===
I can really imagine that this tool can be helpful! If you're new to a project and you see a method, its history can really help you. I wouldn't have to click through commits. It'd be really helpful to have a method-dedicated history. This way one can learn more about the codebase in an easy way.
===
1-8-1
781-810

============== P8 ==============
yyn-nnn-nnn-5
===
This tool can be useful when you notice that something is going wrong somewhere. You can better debug and analyze: why does this bug occur and what complications does it have? What was the reason why this code came to be? A method history can help here. We already do what this tool (i.e. its algorithm) is doing, we just do it manually. It would be really cool to have this functionality in the IDE.
===
10-17-14
4122-1500-1400

============== P9 ==============
yyy-nnn-nnn-5
===
We have methods and big refactorings in our services. I can well imagine that it could be useful to trace their histories. Because sometimes when we start with a new microservice and it receives more responsibilities and business logic, we often split classes into multiple. There's really lots of stuff happening. If then there is a time gap of, say, 10 weeks between sprints, things become really hard to debug. Imagine you have a bug on production with a state that is 10 weeks old. Now you have to create a hotfix and you also want to have the logic on your "develop" branch. Often you can't cherry-pick--you'll have to know where the method moved and from where. If I want to create a hotfix on HEAD as well as on the release branch I have to know what happened to the method. Here this tool would be very helpful! Because you never know where it is!
It is especially interesting to track history through refactorings. Other tools like IntelliJ and git-log don't help us here. Other than refactorings, I think the available tools are sufficient for us.
But we often have scenarios where things completely dissolve and are heavily refactored. You'll ask "what kind of monster is this?". This tool looks at the file itself plus the context outside of the file. There is no context switch to reach your goal. Because otherwise I'll end up at "grep". Switching of tools is the most costly activity.
===
13-17-10
814-634-401

============== P10 ==============
yyy-nnn-nnn-4
===
This is very useful for refactorings, legacy code, code understanding for code you're not used to, and code reviews. I would also think on the architecture level: I want to trace where a class was introduced and from where the methods came and why they are there. Source code history is very important if you're new to a codebase.
===
6-13-13
2044-1228-914

============== P11 ==============
yyy-nnn-nnn-5
===
It is important that the tool also tracks method-level annotations. It doesn't do that yet. Especially @deprecated is important. It's interesting how crazily a method has blown up and then became deprecated at some point. I didn't know that it was renamed at some point before I started at the company.
I can imagine that this can be useful! In my example, the "BlackListFilter" was introduced and then removed again. It's interesting to see what the last commit before the introduction and the last commit containing the method look like. I can imagine that I would want to see a method-level history. When we introduce functionality and it matures for a few years and then you are confronted with it and you notice: "wow, this has really grown. But why has it grown so much?". Functionality is tied to methods and that's why method-level history is helpful. Also in order to see how things grow; to see how all these things are added. Using this history you can see that changes that were made a long time ago aren't good any more now. You can discover design problems! Things become more complex and then it becomes hard to follow what's going on.
It would also be useful for refactoring. If this tool were in the IDE and there were a good UI with this history we might use this tool on an everyday basis. But the UI and integration are a requirement because otherwise it's too hard to handle.
We had to learn the hard way how to write commit messages that really help.
## This participant told me one day after the study that he would have used my tool for tracing a test method: which assertions that once were there were removed and why. You can't find anything to answer this question even with good commit messages and history. Nothing! Except you click through all commits. ##
===
2-10-7
773-2021-1775

============== P12 ==============
yyy-nnn-nnn-4
===
This tool is certainly useful. With the last method, for example, it was interesting how extremely bloated it became. You can learn why that came to be. Why does this method do so much now although it didn't really do much in the beginning? What was the intention at the time it was created and what's the intention now? It was 4 lines in the beginning and now it's 25?
It would be good to know more about what changed in the body. Otherwise all the infos I would want were covered: renaming, parameter changes, etc. Really cool, really useful! A good UI would be important though.
===
2-9-8
894-631-1696

============== P13 ==============
yyyy-nnnn-nnnn-3
===
You can see when things were introduced and removed again later. Even if there's no UI the information is useful, esp. if something was moved or renamed. You can see the evolution: when was something introduced, where does it come from.
Some things you can achieve using common git tools but there is definitely room for improvement. Where did refactorings happen? It would be nice if duplicate code would be recognized: what methods already exist?
The UI should have a timeline where you can click on commits to see changes etc. In general, I think there are few cases where this tool is very helpful and many cases where there are only body changes, and in those cases the default git tools are sufficient as well.
===
3-9-9
1944-8053-3896-3048

============== P14 ==============
nyy-nnn-nnn-5
===
Histories are very helpful for onboarding: where do methods come from, how did they come to be? Esp. with complex methods this is helpful. Git blame isn't useful here because formatting commits destroy everything. Also for code reviews this is helpful if you don't know much about the code. But the integration (IDE or Bitbucket) is important. My review of 5 is only valid if the tool is integrated.
===
3-7-6
817-1413-714

============== P15 ==============
yy-nn-nn-3
===
I can't really imagine a concrete case where this is useful. But I haven't worked with big projects yet. I could imagine that it's helpful with bigger codebases. Most often I introduced the methods myself. It's more interesting than useful. The body changes should be more detailed.
===
1-1-1

============== P16 ==============
yn-nn-nn-5
===
This is very useful! Because you tend to trust some commits more blindly than you should, and with others you have doubts and it's unjustified. To track methods is really good because you have the context available. You see changes that are specific to this method.
Here's an analogy: if I watch a series and it's a new season, I want a summary of the previous season. With the combination of good commit messages and this tool you discover the current status of knowledge in your team really well!
It's probably hard to show this information well in a UI, esp. when you change files. You get used to tracing objects in the IDE, e.g. by clicking on a method. The UI should definitely be directly where things happen (i.e. the IDE). If I need this tool then it should definitely be in the IDE, because if it's in the pull request somehow then the pull request is too rich/bloated. If I want to see the code I go to the IDE. I don't want any third resources that take me out of my workflow, that's why this must all be in one place.
===
11-19-17
985-1466
