"Science, Faculty of"@en . "Computer Science, Department of"@en . "DSpace"@en . "UBCV"@en . "Sillito, Jonathan"@en . "2011-02-03T21:45:04Z"@en . "2006"@en . "Doctor of Philosophy - PhD"@en . "University of British Columbia"@en . "Despite significant existing empirical work, little is known about the specific kinds of questions\r\nprogrammers ask when evolving a code base. Understanding precisely what information a\r\nprogrammer needs about the code base as they work is key to determining how to better\r\nsupport the activity of programming. The goal of this research is to provide an empirical\r\nfoundation for tool design based on an exploration of what programmers need to understand\r\nabout a code base and of how they use tools to discover that information. To this end, we\r\nundertook two qualitative studies of programmers performing change tasks to medium to\r\nlarge sized programs. One study involved newcomers working on assigned change tasks to a\r\nmedium-sized code base. The other study involved industrial programmers working on their\r\nown change tasks to code with which they had experience. The focus of our analysis has\r\nbeen on what information a programmer needs to know about a code base while performing\r\na change task and also on how they go about discovering that information. Based on a\r\nsystematic analysis of the data from these user studies as well as an analysis of the support\r\nthat current programming tools provide for these activities, this research makes four key\r\ncontributions: (1) a catalog of 44 types of questions programmers ask, (2) a categorization\r\nof those questions into four categories based on the kind and scope of information needed\r\nto answer a question, (3) a description of important context for the process of answering\r\nquestions, and (4) a description of support that is missing from current programming tools."@en . "https://circle.library.ubc.ca/rest/handle/2429/31065?expand=metadata"@en . "A s k i n g and A n s w e r i n g Quest ions )uring a P r o g r a m m i n g C h a n g e Task. by Jonathan Si l l i to B . S c , T h e Universi ty of Albe r t a , 1998 M . S c , T h e Univers i ty of Albe r t a , 2000 A T H E S I S S U B M I T T E D I N P A R T I A L F U L F I L M E N T O F T H E R E Q U I R E M E N T S F O R T H E D E G R E E O F Doctor of Phi losophy in The Facul ty of Graduate Studies (Computer Science) T h e Univers i ty Of Br i t i sh C o l u m b i a December'2006 \u00C2\u00A9 Jonathan Sil l i to 2006 Abstract Despite significant exist ing empir ical work, l i t t le is known about the specific kinds of questions programmers ask when evolving a code base. Understanding precisely what information a programmer needs about the code base as they work is key to determining how to better support the act ivi ty of programming. The goal of this research is to provide an empirical foundation for tool design based on an exploration of what programmers need to understand about a code base and of how they use tools to discover that information. To this end, we undertook two quali tat ive studies of programmers performing change tasks to medium to large sized programs. One study involved newcomers working on assigned change tasks to a medium-sized code base. T h e other study involved industr ia l programmers working on their own change tasks to code wi th which they had experience. 
The focus of our analysis has been on what information a programmer needs to know about a code base while performing a change task and also on how they go about discovering that information. Based on a systematic analysis of the data from these user studies as well as an analysis of the support that current programming tools provide for these activities, this research makes four key contributions: (1) a catalog of 44 types of questions programmers ask, (2) a categorization of those questions into four categories based on the kind and scope of information needed to answer a question, (3) a description of important context for the process of answering questions, and (4) a description of support that is missing from current programming tools.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgments
Dedication
1 Introduction
1.1 Overview of Our Data Collection and Analysis
1.2 Overview of Contributions
1.3 Overview of Dissertation Contents
2 Related Work
2.1 Program Comprehension
2.1.1 Cognitive Models
2.1.2 Informing Tool Design
2.1.3 Analysis of Questions
2.2 Empirical Studies of Change Tasks
2.3 Summary
3 Study 1: Laboratory-based Investigation
3.1 Study Setup
3.1.1 Change Tasks
3.1.2 Participants
3.2 Initial Observations
3.2.1 Challenges
3.3 Summary
4 Study 2: Industry-based Investigation
4.1 Study Setup
4.1.1 Participants and Change Tasks
4.2 Initial Observations
4.2.1 Production Environments
4.2.2 Interruptions
4.3 Summary
5 Questions in Context
5.1 Questions Asked
5.1.1 Finding Focus Points
5.1.2 Building on Those Points
5.1.3 Understanding a Subgraph
5.1.4 Questions Over Groups of Subgraphs
5.2 Answering Questions
5.2.1 Questions and Sub-Questions
5.2.2 Questions and Tools
5.2.3 From Results to Answers
5.2.4 A More Involved Anecdote
5.2.5 Understanding the Challenges Programmers Face
5.3 Summary
6 Analysis of Tool Support for Answering Questions
6.1 Answering Questions Around Finding Initial Focus Points
6.2 Answering Questions Around Building on a Point
6.3 Answering Questions Around Understanding a Subgraph
6.4 Answering Questions About Groups of Subgraphs
6.5 The Gap
6.6 Summary
7 Discussion
7.1 Implications
7.2 Related Work
7.3 Limitations
7.4 Follow-up Studies
8 Summary
Bibliography
Appendix A Ethics Board Certificate of Approval

List of Tables

3.1 Session number, driver, observer and tasks for study 1
3.2 Summary of tasks for study 1
3.3 Key observations from study 1
4.1 Programming languages and tools used in study 2
6.1 Tools and techniques for answering questions in category 1
6.2 Tools and techniques for answering questions 6 to 11
6.3 Tools and techniques for answering questions 12 to 16
6.4 Tools and techniques for answering questions 17 to 20
6.5 Tools and techniques for answering questions 21 to 27
6.6 Tools and techniques for answering questions 28 to 33
6.7 Tools and techniques for answering questions 34 to 37
6.8 Tools and techniques for answering questions 38 to 44
6.9 Summary of the level of support tools provide

List of Figures

3.1 Screenshot of the Eclipse Development Environment
3.2 Tool usage statistics over all sessions of study one
3.3 Model classes and interfaces from task 1622
3.4 Several revisit patterns from sessions from study 1
4.1 Arrangements of windows and tools observed in study two
5.1 An overview of the four categories of questions
5.2 Two observed patterns of questions
5.3 Issues in moving from questions to answers
5.4 Windows and associated tools
6.1 Package Explorer, Type Hierarchy and Outline View
6.2 Eclipse Search Results View
6.3 SHriMP and Relo Visualizations
6.4 Eclipse Call Hierarchy
6.5 Sample output from the diff command line tools
7.1 Mockup of a search tool illustrating results
7.2 Distribution of question occurrences across studies by category

Acknowledgments

I am grateful to the participants from my two studies, to Eleanor Wynn for her valuable help with the second study, and to Gail Murphy for assistance above and beyond the call of a committee member. This research was funded by the Natural Sciences and Engineering Research Council of Canada (NSERC), IBM and Intel.

Dedication

To Christina Jolayne for her patient and enthusiastic support through far too many years of school.

Chapter 1

Introduction

To better support the activity of programming there has been substantial research on tools to make programming more effective (e.g., [6, 23, 40, 62]). Despite this research and despite the many commercial and research programming tools developed, programming remains a difficult activity. We believe that tools have not helped as much as they might because much of the work on tools is based on intuition or personal experience rather than empirical research results. In fact, software engineering research papers in general seem at times to proceed based on various relatively unsubstantiated assumptions about the support that programmers need from their tools or about the challenges with which programmers need assistance. For instance, here are several examples of assumptions made without empirical backing: "being forced to follow multiple different relationships... results in developers losing their context" [87]; "one of the problems most frequently faced in dealing with legacy software is the location of the code for a specific feature" [108]; and "in design recovery we need to be able to generate a range of views mined from both structural and behavioral information about the code" [75]. Our intention here is not to claim that these statements are necessarily incorrect, nor is it to claim that the tools developed are without value; we simply wish to illustrate that much of this work is not guided by empirical data. We also believe that a more solid, empirical foundation is possible and that such a foundation would be valuable for tool builders. For example, we believe empirical results can produce additional insights beyond intuition and allow researchers to address the right problems and to do so in a manner more likely to be useful to practitioners.

A large body of significant empirical work about programming activities does exist. Some of this work has focused on developing models of program comprehension, which are descriptions of the process a programmer uses to build an understanding of a software system (e.g., [83, 53]). One of the goals of work on program comprehension has been to inform the design of programming tools (e.g., [102, 92]). There have also been several studies about how programmers perform change tasks including how programmers use tools in that context (e.g., [18, 86]). Despite this work many open questions remain.
For example, what does a programmer need to know about a code base when performing a change task to a software system? How does a programmer go about finding that information? Although this is of central importance to the activity of programming and the design of programming tools, surprisingly little is known about the specific questions asked by programmers as they work on realistic change tasks, and how they use tools to answer those questions. For example, we are aware of only a small amount of existing work that considers in detail the questions that programmers ask. While studying programmers performing change tasks, Letovsky observed a recurring behavior of programmers asking questions and conjecturing answers and he detailed some of those questions [57]. Erdos and Sneed proposed seven kinds of questions programmers must answer while performing a change task [17]. This list was based on their personal programming experience and includes questions such as where is a particular subroutine/procedure invoked? Johnson and Erdem studied questions asked to experts on a newsgroup [47]. Similarly, little is known about how programmers answer their questions and the role of tools in that process.

Our aim is to build on this body of existing work with the goal of providing an empirical foundation for tool design based on an exploration of what programmers need to understand and of how they use tools to discover that information. We focus, in particular, on what programmers need to understand about a code base while performing a nontrivial change task to a software system. To this end, we undertook two qualitative studies. In each of these studies we observed programmers making source changes to medium (20 KLOC) to large-sized (over 1 million LOC) code bases. Based on a systematic analysis of the data from these user studies, this dissertation makes four key contributions: (1) a catalog of 44 types of questions programmers ask, (2) a categorization of those questions into four categories based on the kind and scope of information needed to answer a question, (3) an analysis of the process of answering questions which exposed important context for the questions, and (4) an analysis of existing tool support for answering questions including a discussion of the support that is currently missing from tools. These contributions provide a foundation on which to build tools that more effectively support programmers in the process of performing a change task, and in particular gaining an understanding of a code base.

In the remainder of this chapter we present an overview of our studies and our research approach (Section 1.1), describe our four key contributions (Section 1.2) and outline the contents of this dissertation (Section 1.3).

1.1 Overview of Our Data Collection and Analysis

We collected data from two studies: our first study was carried out in a laboratory setting [85] and the second study was carried out in an industrial work setting [84]. Both were observational studies to which we applied qualitative analysis. We refer to the participants in the first study (N1 ... N9) as newcomers as they were working on a code base that was new to them. All nine participants in the first study were computer science graduate students with varying amounts of previous development experience, including experience with the Java programming language. In this first study, pairs of programmers performed change tasks on a moderately sized open-source system assigned by the experimenter.
We chose to study pairs of programmers because we believed that the discussion between the pair as they worked on the change task would allow us to learn what information they were looking for and why particular actions were being taken during the task. This study involved twelve sessions (1.1 ... 1.12) with each participant participating in two or three sessions with different pairings in each session.

The second study entailed 15 sessions (2.1 ... 2.15) carried out with 16 programmers (E1 ... E16) in an industrial setting. With one exception, in this study we studied individual programmers working alone (rather than in pairs) because that was their normal work situation. Participants were observed as they worked on a change task to a software system for which they had responsibility. The systems were implemented in a range of languages and during the sessions the participants used the tools they would normally use. We asked each participant to select the task on which they worked to ensure that the tasks were realistic and because we were interested in observing programmers working on a range of change tasks. They were asked to think aloud while working on the task [101].

Through these studies we have been able to observe a range of programmers working on a range of change tasks, using a range of programming tools. Varying the study situation along these dimensions has been deliberate as we believe that a broad look at the process of asking and answering questions was needed. To structure our data collection and the analysis of our data, we have used a grounded theory approach, which is an emergent process intended to support the production of a theory that "fits" or "works" to explain a situation of interest [25, 96]. Grounded theory analysis revolves around various coding procedures which aim to identify, develop and relate concepts. In this approach, data collection, coding and analysis do not happen strictly sequentially, but are overlapping activities. As data is reviewed and compared (a process referred to as "constant comparison"), important themes or ideas emerge (i.e., categories) that help contribute to an understanding of the situation. As categories emerge, further selective sampling can be performed to gather more information, often with a focus on exploring variation within those categories. Further analysis aims to organize and understand the relationships between the identified categories, possibly producing higher-level categories in the process. The aim here is to build rather than test theory and the specific result of this process is a theoretical understanding of the situation of interest that is grounded in the data collected.

During the sessions of both studies, including the interview portions of those sessions, the experimenter (the author of this dissertation) made field notes. These field notes were later supplemented by transcripts of much of the audio data. Summaries rather than full transcripts were made of several portions of our audio data (in particular from the second study) that did not relate to our analytic interests. Our coding work proceeded from these transcripts, though also with occasional support of the audio data. Three kinds of coding were used in our analysis: open coding, which involves identifying concepts in the data (as opposed to assigning predefined categories); selective coding, which focuses on further developing identified categories; and theoretical coding, which focuses on exploring the relationships between categories.
Coding was primarily performed by the author of this dissertation; however, members of the supervisory committee (Gail Murphy, Kris De Volder and Brian Fisher) supervised this work. Coding of our empirical data proceeded in four major phases, each of which involved a relatively detailed review of nearly all data collected to the point in time at which the coding was performed. The first phase involved open coding of only the data from the first study, and was conducted both during and after that study. The results of this coding are summarized in Section 3.2. The goal of this phase was not to achieve saturation, but simply to help provide a general understanding of how our participants managed their work and help identify central issues for further analysis. Based on this first phase of coding we decided to focus our further analysis on our participants' questions and their activities around those questions.

In the second phase, which began before the second study commenced and continued during and after that study, we used selective coding to identify the questions asked by our participants. This phase also involved "constant comparison" of those questions. In the process we found that many of the questions asked were roughly the same, except for minor situational differences, so we developed generic versions of the questions, which very slightly abstract from the specifics of a particular situation and code base. For example, we observed specific comments from session 2.15 including "I am just looking at the data structures here trying to figure out how to get to the part I need to get to" [E16] and "to fix it I have to somehow get the variable from the field to pass into this function" [E16]. Abstracted versions of the questions this participant was asking are What data can we access from this object? and How can data be passed to (or accessed at) this point in the code? (see questions 16 and 28 discussed in Chapter 5).

The questions identified are low-inference representations of our data which (along with the transcripts) formed the basis for a third phase of theoretical coding focused on further comparisons of the questions asked. In the process we found that many of the similarities and differences could be understood in terms of the amount and type of information required to answer a given question. This observation became the foundation for a categorization of these questions into four categories discussed further in Section 5.1. During this third phase we also selectively coded for activities around answering the identified questions both at the particular question level and at the category level. A final phase of analysis involved a review of both the audio data and the transcripts to ensure the consistency of our coding. This also gave us confidence that no significant questions or other insights relative to our analytic interests were missed, and that the analysis could stop.

Beyond the analysis of our empirical data we have also considered the level of tool support for answering questions provided by a wide range of tools, including relevant tools discussed in the research literature. This investigation drew on our findings around the questions asked and the behavior we observed around answering those questions. The goal here was to explore the level of support today's tools provide for answering the questions we have identified and to increase the generalizability of our results.
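Returning to the example from session 2.15 above, the following minimal Java sketch illustrates the kind of code situation behind questions 16 and 28. The class and method names are hypothetical (they are ours, not drawn from any participant's code base): data held in a field of one object is needed at another point in the code, and the programmer must work out what is reachable and how to route it there.

    // Hypothetical sketch of the situation behind questions 16 and 28;
    // class and method names are invented for illustration only.
    class Selection { /* data the change needs to route elsewhere */ }

    class EditorPanel {
        // Question 16: "What data can we access from this object?"
        // For example, the selection held in this field.
        private Selection currentSelection = new Selection();

        Selection getSelection() { return currentSelection; }
    }

    class Exporter {
        // Question 28: "How can data be passed to (or accessed at) this
        // point in the code?" The fix requires getting a Selection from
        // EditorPanel's field to this method, for example via a parameter
        // or through an accessible getter such as getSelection().
        static void export(Selection selection) {
            // ... write the selection out ...
        }
    }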
1.2 Overview of Contributions

Based on our systematic analysis of the data collected from the two studies, this research makes four key contributions that contribute to an understanding of what programmers need to know about a code base and the role tools play in discovering that information. The work described in this thesis is covered in two publications [85] and [84].

1. An empirically based catalog of the 44 types of questions asked by the participants of the two studies. These are slight abstractions over the many specific questions and situations we have observed. These abstractions have allowed us to compare and contrast the questions asked as well as to gather some simple frequency data for those questions. Example questions include: Which types is this type a part of? which was asked in two different sessions, What in this structure distinguishes these cases? which was asked in three different sessions, and Where should this branch be inserted or how should this case be handled? which was asked in seven different sessions. Although not a complete list of questions, it illustrates the sheer variety of questions asked and provides a basis for an analysis of tool support for answering questions. To our knowledge this is the most comprehensive such list published to date.

2. A categorization of the 44 types of questions into four categories based on the kind of information needed to answer a question: one category groups questions aimed at finding initial focus points, another groups questions that build on initial points, another groups questions that build a model of connected information, and the final category groups questions over one or more models built from the previous category. Although other categorizations of the questions are possible, we have selected this categorization for three reasons. First, the categories show the types and scope of information needed to answer the questions. Second, they capture some intuitive sense of the various levels of questions asked. Finally, the categories make clear various kinds of relationships between questions, as well as certain observed challenges around answering those questions.

3. An analysis of the observed process of asking and answering questions, which provides valuable context for the questions we report. We have shown that many of the questions asked were closely related. For example, some lower-level questions were asked as part of answering higher-level questions. We have shown that the questions a participant asked often mapped imperfectly to questions that could be answered directly using the available tools. For example, at times the questions programmers pose using current tools are more general than their intended questions. We have also shown that programmers, at times, needed to mentally combine result sets or other information from multiple tools to answer their questions. These results contribute to our understanding both about how programmers answer their questions and the challenges they face in doing so.

4. An analysis of the support existing tools (both industry tools and research tools) provide for answering each kind of question. We consider which questions are well supported and which are less well supported. We also generalize this information to demonstrate a gap between the support tools provide and that which programmers need.
In particular we show that programmers need improved support for asking higher-level questions and more precise questions, support for maintaining context and putting information together, and support for abstracting information and working with subgraphs of source code entities and relationships. We hope these results will provide motivation and a foundation for the design of future programming tools.

Our analysis has allowed us to consider our data at different levels, each producing different insights. We believe that these insights will be of value to both research and tool design. To show this, this dissertation also includes a discussion of how tool research and design can benefit from our work (i.e., an exploration of the implications of these results).

1.3 Overview of Dissertation Contents

Chapter 2 (Related Work)
Chapter 2 compares our research to previous work, including work in the area of program comprehension (work that has proposed cognitive models, efforts to use those models to inform tool design and work that analyzes questions asked by programmers) and empirical studies of how programmers manage change tasks.

Chapters 3 and 4 (User Studies)
Chapters 3 and 4 give setup details (tasks, participants, tools used, etc.) for the two studies we performed. The laboratory study is described in Chapter 3. The industry study is described in Chapter 4. These chapters also discuss some initial observations from each study. Our initial observations from the first study are organized around eight key observations and five challenges. Our initial observations from the second study focus on aspects of the situation that differ from that of the first study.

Chapter 5 (Questions in Context)
The results of our analysis of the data collected from our two studies are presented in Chapter 5. Broadly, this presentation is in two parts. The first is in Section 5.1 and presents the 44 types of questions along with our categorization of these questions into four top-level categories. The second is in Section 5.2 and presents context around answering those questions.

Chapter 6 (Analysis of Tool Support)
Further analysis of these questions, including a discussion of how well existing industry and research tools support a programmer in answering these questions, is presented in Chapter 6. Support for each question is rated as full (i.e., well supported), partial (i.e., some support but not complete), or minimal (i.e., little or no tool support). Chapter 6 also contains an overall discussion of the type of tool support that is currently lacking.

Chapters 7 and 8 (Discussion and Summary)
Chapter 7 discusses possible implications of our results, including specific suggestions for programming tool design. Chapter 7 also contains a discussion of the limits of our results and a discussion of possible future studies that could be used to build on our work. We conclude this dissertation with a summary in Chapter 8.

Chapter 2

Related Work

In this chapter, we discuss a range of work related to our research. For each area, we describe the similarities and differences with our work. We cover work in the area of program comprehension including cognitive models, efforts to use those theories to inform tool design, and studies around the analysis of programmers' questions (see Section 2.1).
We also cover previous empirical studies that have looked at how programmers use tools and generally how they carry out change tasks and other programming activities (see Section 2.2). This coverage includes a discussion of studies that use similar research methods. However, a discussion of particular research tools to support various programming activities is not provided here. Instead, a detailed discussion of this work is provided in Chapter 6, where we analyze the support a wide range of research and industry tools provide for answering the questions programmers ask.

2.1 Program Comprehension

Program comprehension or software understanding is a broad field encompassing a large body of work. Some of this work has focused on proposing cognitive models. We discuss this work in Section 2.1.1. Other work has attempted to use these models or theories to inform tool design. This work is discussed in Section 2.1.2. A third category of work in this area, which is most closely related to our work, has focused on analyzing the questions programmers ask. This work is discussed in Section 2.1.3.

2.1.1 Cognitive Models

A cognitive model describes the cognitive processes and information structures used by programmers to form a mental model, which is a programmer's mental representation of the program being maintained. Many cognitive models have been proposed, and there are both similarities and differences between these models [103]. For example, all models rely on the programmer's own knowledge, the source code and available documentation [103]. Disparities between various comprehension models can be explained in terms of differences in experimental factors (programmer characteristics, program characteristics and task characteristics) that influence the comprehension process [91]. In this section we discuss several key cognitive models organized around three categories: top-down program comprehension, bottom-up program comprehension, and models combining multiple strategies.

Theories by Brooks [8, 9], Koenemann and Robertson [53], and Soloway and Ehrlich [88] are all based on a top-down approach. Brooks' model, for example, proposes that programmers comprehend a system by reconstructing knowledge about the domain of the program and by working to map that knowledge to the source code. This process starts with a hypothesis about the top-level goal of the program which is then refined by forming sub-hypotheses. This process then continues recursively (in a depth-first manner) until the hypotheses can be verified. Brooks claims that verification depends on beacons in the code, which are features that give clues that a particular structure or operation is present. Again, experimental support exists for each of these models; however, the generalizability of the results from the supporting experiments may be limited.

According to the bottom-up theory of program comprehension, programmers first read individual statements in the code and then mentally group those statements into higher-level abstractions (capturing control-flow or data-flow, for example). These abstractions are in turn aggregated until this recursive process produces a sufficiently high-level understanding of the program [83]. This process is sometimes referred to as chunking and is driven by the limitations of short-term memory [61]. Two theories that propose a bottom-up approach are Pennington's model [71] and Shneiderman and Mayer's cognitive framework [82].
Pennington claims that programmers use a bottom-up approach to form two kinds of models. The first is the program model, which is a model of the control-flow of the program. The second is the situation model, which is a model of the data-flow and functions of the program. Shneiderman and Mayer's framework considers syntactic knowledge and semantic knowledge separately, with semantic knowledge being built in a bottom-up fashion. Each of these proposed models is supported by empirical evidence, although the evidence is based on experiments involving small programs.

Littman et al. noted that programmers use either a systematic strategy or an as-needed strategy [58]. Their experiments found that programmers need to get both static knowledge (structural information) and causal knowledge (information about runtime interactions between components) and that using an as-needed approach made the second type of information difficult to gain accurately. Letovsky's knowledge-based understanding model proposes that programmers work "opportunistically", using both bottom-up and top-down strategies [56, 54]. A core component of this model is a knowledge base that encompasses a programmer's expertise and background knowledge. Soloway et al. also propose a model in which programmers use a number of different strategies in the process of trying to understand a program including: inquiry episodes (a read, question, conjecture and search cycle), systematic strategies (tracing the flow of the program) and as-needed strategies (studying particularly relevant portions of a program) [89]. Finally, von Mayrhauser and Vans propose a model that they call the integrated metamodel. This model combines four components. Three of these describe the comprehension process: the top-down model (i.e., a domain model), the program model (a control-flow abstraction) and the situation model (a data-flow and functional abstraction). The fourth component of the integrated metamodel is a supporting knowledge base as in Letovsky's model.

In contrast, our work has not focused on proposing new models of program comprehension. Nor have we attempted to validate any of these existing models. Instead we aim to complement this work by filling in important details often abstracted away by theories of comprehension. To do this we have thoroughly analyzed the information that programmers need to discover. We have also analyzed programmers' behavior around discovering that information along with the role that tools play in the answering process.

2.1.2 Informing Tool Design

One of the goals of work in the area of program comprehension has been to inform the design of programming tools (or better documentation methods [89]) to support a programmer with various program understanding processes. As a primary goal of our research has also been to empirically support the design of programming tools, in this section we discuss several efforts in this direction.

Von Mayrhauser and Vans' approach to this is based on their proposed integrated metamodel [102]. Specifically they have used their model (and the supporting empirical studies) to produce a list of tasks and subtasks programmers need to perform as part of understanding a system. For each of these they have suggested associated information needs along with tool capabilities to support those information needs. For example, a programmer building a program model (a task) will need to investigate and revisit code segments (a subtask).
To do this he or she will need access to previously browsed locations (an information need) which could be supported by a tool that provides a history of browsed locations (a tool capability).

Storey et al. present a hierarchy of cognitive issues or design elements to be considered during the design of a software exploration or visualization tool [92]. Their framework is based on a wide range of program comprehension work and is inspired by a similar framework from the domain of hypermedia tools [99]. Elements in this hierarchy represent possible design goals (for example, enhance top-down comprehension) with the leaves capturing more concrete design goals (or features) a tool should have given those higher-level design goals (for example, support goal-directed hypothesis-driven comprehension and provide an adequate overview of the system architecture at various levels of abstraction). Storey et al. suggest that this hierarchy, along with an iterative design, implement and evaluate approach to tool building, can lead to tools that effectively support programmers in exploring code [93].

Work by Walenstein represents a different approach to bridging the gap between cognitive models and tool design [105, 104]. His goal is to provide a more solid theoretical grounding for the design of programming tools based on program comprehension theories as well as other cognitive theories. To this end, Walenstein discusses a theory of cognitive support. While a program comprehension theory describes the process of building a mental model, a theory of cognitive support describes the mental assistance tools can provide. Walenstein claims that such a theory can be used to rationalize tool features in terms of their support for cognition. Example principles that support might be based on include: redistribution (moving cognitive resources to external artifacts) and perceptual substitution (transforming a task into a variant that is cognitively easier).

Our work takes a different approach to influencing the design of tools. Rather than begin with models of cognition or other cognitive theories, we begin with observations about how programmers manage a change task and develop an understanding of the associated activities. In particular we aim to use qualitative studies to fill in details around the specific questions programmers ask and how they use tools to answer those questions. We believe these details provide an important connection between program comprehension theories and programming tool research and design.

2.1.3 Analysis of Questions

Possibly the research that is closest to our work is previous research into the questions programmers ask or the information they need to perform their work. Like our work, most of this research (the work by Erdos and Sneed being the one exception) is based on the analysis of qualitative data collected from studies of programmers performing change tasks.

Letovsky presents observations of programmer activities which he calls inquiries [56, 57]. These may involve a programmer asking a question, conjecturing an answer and then possibly searching through the code and documentation to verify the answer (i.e., the conjecture). Letovsky believes there are five kinds of conjectures (and therefore five kinds of associated questions). The first three kinds or categories of conjectures are: why conjectures (questioning the role of a piece of code), how conjectures (about the method for accomplishing a goal) and what conjectures (what is a variable or function).
The last two categories interact with the first three. They are whether conjectures (concerned with whether or not a routine serves a given purpose) and discrepancy conjectures (questioning perceived discrepancies). The data for Letovsky's taxonomy is from a study of six programmers working on assigned change tasks to a very small program (approximately 250 lines of code). In contrast, we aim to develop a more comprehensive list of questions and we aim to do this based on much larger systems, a range of realistic tasks and in the context of the tools available today.

Erdos and Sneed suggest, based on their personal experience, that seven questions need to be answered for a programmer to maintain a program that is only partially understood: (1) where is a particular subroutine/procedure invoked? (2) what are the arguments and results of a given function? (3) how does control flow reach a particular location? (4) where is a particular variable set, used or queried? (5) where is a particular variable declared? (6) where is a particular data object accessed? and (7) what are the inputs and outputs of a module? [17]. Note that these are seven specific questions, rather than a number of categories as presented by Letovsky. Our work aims to produce a more comprehensive list of questions, but based on empirical results from a range of participants. Our work also aims to consider higher-level questions than those discussed by Erdos and Sneed.

Johnson and Erdem extracted and analyzed questions posted to Usenet newsgroups [47]. These questions were classified as goal-oriented (requesting help to achieve task-specific goals), symptom-oriented (asking why something is going wrong) and system-oriented (requesting information for identifying system objects or functions). By basing this work on newsgroup postings they were looking at questions asked of experts, and they point out that "newsgroup members may have been reluctant to ask questions that should be answerable by examining available code and documentation" [48, page 59]. Our goal has been to identify questions asked during such an examination. Building on this work, Erdem et al. also analyzed questions from the Usenet study just mentioned and questions from a survey of the literature (including the work described above) to develop a model of the questions that programmers ask [16]. In their model, a question is represented based on its topic (the referenced entity), the question type (one of: verification, identification, procedural, motivation, time or location) and the relation type (what sort of information is being asked for). Again, our aim has been to produce a more comprehensive list of questions, including questions at a higher level than those captured in this work.

Herbsleb and Kuwana have empirically studied questions asked by software designers during real design meetings in three organizations [38]. They determined the types of questions asked as well as how frequently they were asked. Based on a separate study they also present questions asked by programmers concerning software requirements [55]. Our approach is similarly empirically based; however, we focus on questions asked while performing a change task to a system, rather than questions asked during design meetings for a new system or during requirements gathering.

2.2 Empirical Studies of Change Tasks

The situation of programmers performing change tasks has been studied from a number of perspectives.
Many of these have, at least partially, explored the use of programming tools. For example, Storey et al. carried out a user study focused on how program understanding tools enhance or change the way that programmers understand programs [95]. In their study thirty participants used various research tools to solve program understanding tasks on a small system. Based on these results, Storey et al. suggest that tools should support multiple strategies (top-down and bottom-up, for example) and should aim to reduce cognitive overhead during program exploration. In contrast to our work, Storey et al.'s work did not attempt to analyze specifically what programmers need to understand.

Murphy et al. report on observations around how programmers are using the Eclipse Java Development Environment [67]. These observations are based on usage data from 41 programmers collected by an Eclipse plugin that monitors a wide range of user actions. Observations include the percentage of programmers using the various tools (the Package Explorer, the Console and the Search Results View were used by the highest percentage), the commands used by the most programmers (Delete, Save and Paste were among those used by the most programmers), and which refactoring commands were used by the most programmers (Rename being the most popular). These results should provide important information for tool builders on how their tools are being used. We used a similarly instrumented version of Eclipse in our first study, which has allowed us to collect some of this same type of data. However, we have collected data over a much shorter duration, which limits the conclusions we can draw. Instead we have focused our analysis efforts on the qualitative data we have collected.

More similar to our study are efforts that qualitatively examine the work practices of programmers. For example, Flor et al. used distributed cognition to study a single pair of programmers performing a straightforward change task [18]. We extend their methods to a larger participant pool and a more involved set of change tasks with the goal of more broadly understanding the challenges programmers encounter. As another example, Singer et al. studied the daily activities of software engineers [86]. We focus more closely on the activities directly involved in performing a change task, producing a complementary study at a finer scale of analysis.

Four recent studies have focused on the use of current development environments (as do our studies). Robillard et al. characterize how programmers who are successful at maintenance tasks typically navigate a code base [77]. Deline et al. report on a formative observational study also focusing on navigation [15]. Our study differs from these in considering more broadly the process of asking and answering questions, rather than focusing exclusively on navigation. Ko et al. report on a study in which Java programmers used the Eclipse development environment to work on five maintenance tasks on a small program [51]. Their intent was to gather design requirements for a maintenance-oriented development environment. Our study differs in focusing on a more realistic situation involving larger code bases and more involved tasks. Our analysis differs in that we aim specifically to understand what questions programmers ask and how they answer those questions. De Alwis and Murphy report on a field study about how software developers experience disorientation when using the Eclipse Java integrated development environment [1].
They analyzed their data using the theory of visual momentum [111], identifying three factors that may lead to disorientation: the absence of connecting navigation context during program exploration, thrashing between displays to view necessary pieces of code, and the pursuit of sometimes unrelated subtasks. In contrast, our analysis has not employed the theory of visual momentum and has focused on questions and answers rather than disorientation.

2.3 Summary

Our work is related to work in the area of program comprehension. Much of this research has focused on proposing and validating cognitive models, that is, models of how programmers understand programs. Other work has attempted to use those models to inform the design of tools. Most closely related to our work in this area are previous efforts around analyzing the questions programmers ask as they perform a change task. Our work aims to build on this previous work by providing a much more comprehensive list of questions, by analyzing programmer behavior around answering those questions and also by analyzing tool support for answering questions. Our work is also related to previous work that empirically studies how programmers perform change tasks or how programming tools are used. We differ from much of this work by studying more realistic situations and by focusing more specifically on the process of asking and answering questions.

Chapter 3

Study 1: Laboratory-based Investigation

The first study we carried out was conducted in a laboratory setting. The goal of this study was to observe programmers performing significant change tasks using state-of-the-practice development tools. Several study design choices were made to make this as realistic as possible. Specifically we used real change tasks, experienced programmers and a non-trivial code base (about 60 KLOC). The study involved nine participants (all of whom were computer science graduate students) and a total of twelve sessions. In each session two participants performed an assigned task as a pair working side-by-side at one computer. We chose to study pairs of programmers because we believed that the discussion between the pair as they worked on the change task would allow us to learn what information they were looking for and why particular actions were being taken during the task, similar to earlier efforts (e.g., [18] and [63]). During each session an audio recording was made of the discussion between the pair of participants, a video of the screen was captured, and a log was made automatically of the events in Eclipse related to navigation and selection. At the end of the session the experimenter (the author of this dissertation), who was present during each session, briefly interviewed the participants about their experience.

In this chapter we present study details for this first study, including setup (Section 3.1), tasks (Section 3.1.1) and participants (Section 3.1.2). We also cover some initial observations that resulted from our analysis of the data from this study only. These initial observations capture the basic activities we observed, organized around eight key observations which are summarized in Table 3.3. Examples of these key observations include Goals were initially narrowly focused, but became more broad in scope as programmers struggled to understand the system sufficiently to perform the task (see Observation 2) and Revisiting entities and relationships was common, but not always straightforward (see Observation 8).
As described earlier (see Section 1.1) we have used a grounded theory approach in analyzing our data. The observations presented here are based on the categories we developed as part of trying to understand the situation under study and before focusing our analysis more closely on the question asking and answering process. In addition to being interesting in their own right, presenting these initial observations provides insights into the way our theoretical understanding of this situation has developed as we applied grounded theory analysis.

We selected complex, non-local changes for our participants to work on. The reason for this choice was to make the situation realistic and to gather data about a challenging situation, one that would likely benefit from tool support. As expected, the participants in this study found the assigned tasks challenging and many participants expressed a feeling of having made little progress during a session or specific parts of a session. In this chapter we present our initial impressions of the major challenges that the participants faced in completing the tasks: gaining a sufficiently broad understanding and ineffective use of tools, for example. Results from a more extensive analysis of the data from both studies, including further discussion about these challenges, are presented in Chapter 5.

3.1 Study Setup

Participants in this study used the Java programming language [28] and the Eclipse Java development environment (version 3.0.1), a widely used IDE that we consider representative of the state-of-the-practice [43]. A screenshot of Eclipse is shown in Figure 3.1 showing several commonly-used views or tools: the package explorer (showing the package, file and class structure), the tabbed source code editor, the content outline view (showing the structure of the currently open file) and the call hierarchy browser. Other commonly-used views not shown in the screenshot include a type hierarchy view, a search results view (showing results from both lexical and static analysis based searches), a breakpoint view and a variable view.

[Figure 3.1: Screenshot of the Eclipse Development Environment.]
Figure 6.3: (A) A SHriMP tree layout showing part of a system's type hierarchy. (B) A Relo visualization showing a number of classes in one package (CH.ifa.draw.figures), along with source code details for one method (setAttribute) and several connections to methods and types outside of that package.

28. How can data be passed to (or accessed at) this point in the code? Minimal support: Runtime visualization tools (e.g., BALSA [10])
29. How is control getting (from here to) here? Partial support: Visualization or browsing tools (e.g., a call hierarchy browser)
30. Why isn't control reaching this point in the code? Partial support: Debugging and slicing techniques (e.g., Whyline [52])
31. Which execution path is being taken in this case? Partial support: Debugging and slicing techniques
32. Under what circumstances is this method called or exception thrown? Minimal support: Debugging and visualization tools
33. What parts of this data structure are accessed in this code? Minimal support: Browsing or overview tools

Table 6.6: A summary of the techniques and tools applicable to answering questions 28 to 33, along with the level of support for each: full, partial or minimal. All of these questions are from the third category and are about data and control flow.

... such as 21 (How are instances of these types created and assembled?) and 34 (How does the system behavior vary over these types or cases?). SHriMP (along with Rigi [65] and SNiFF+ [50]) was featured in an experiment which aimed to determine how tools affect how programmers understand programs [95]. The experiment showed that the amount of information (especially arcs) presented could be overwhelming, even for the relatively small program used (1700 LOC). The participants in our studies faced this challenge as well; being able to clearly see and work with the relevant subgraph(s) of the system was difficult.

Relo is a tool that aims to support program understanding by allowing interactive exploration of code. As a programmer explores relationships found in the code, Relo builds a visualization (which is initially empty) of what has been explored, preserving one kind of context. An example of this is shown in part B of Figure 6.3. Though the process still revolves around lower-level questions and
no direct support is provided for identifying relevant information, the resulting visualization may help a programmer in answering higher-level questions by externalizing more supporting information [114].

Questions 29 (How is control getting (from here to) here?), 33 (What parts of this data structure are accessed in this code?) and 37 (What is the mapping between these UI types and these model types?) all consider two different sets of entities or points in the code and ask about the connections between them. For example, question 29 is about understanding the control flow between two methods. A tool such as the Call Hierarchy Viewer provided in Eclipse (see Figure 6.4) can be used to produce information towards answering this question, but the branching factor is high and we observed that our participants rarely used this viewer beyond two or three calls.

Figure 6.4: Eclipse Call Hierarchy Viewer showing the call hierarchy rooted at a method named getFactory.

The Relo tool described above supports a related feature called Autobrowsing. Autobrowsing tries to model a simple directed exploration activity between two or more selected entities. It effectively does a breadth-first search and adds a source code entity to the visualization that is relevant to all of the selected entities. The process can be repeated and in some simple situations such a feature may help answer these questions about understanding connections.

In summary, tool support for answering questions in this category is limited. Of the thirteen questions, we found that four had partial support, while the rest had only minimal support. We found that often to answer these questions lower-level, possibly less refined versions of these questions (i.e., ones with better tool support) must be asked. The consequences of this include noisier results and the need to mentally put together answers, though visualization tools may make integrating this information easier.

6.4 Answering Questions About Groups of Subgraphs

In this section we discuss tool support for answering questions 34 through 44. These are all high-level questions about one or more subgraphs of a system, including, for example, questions about how two subgraphs are related, or questions about the potential impacts of changes to a subgraph. Like questions discussed in the previous category, for the participants in our studies these questions required multiple lower-level questions to provide the necessary information and even when that information was identified, answering the intended questions could still be difficult. Tables 6.7 and 6.8 list some of the techniques and tools applicable to each of these questions, as well as the level of support provided.

Questions 34 (How does the system behavior vary over these types or cases?), 35 (What are the differences between these files or types?) and 36 (What is the difference between these similar parts of the code (e.g., between sets of methods)?) are about making comparisons between behavior, types or methods. Generally, making comparisons is difficult, especially comparing behavior as needed for answering question 34.

34. How does the system behavior vary over these types or cases? Minimal support: Dynamic visualization
35. What are the differences between these files or types? Partial support: Line-based comparison tools (e.g., diff [20])
36. What is the difference between these similar parts of the code (e.g., between sets of methods)? Partial support: Line-based comparison tools
37. What is the mapping between these UI types and these model types? Partial support: Conceptual module querying [2]

Table 6.7: A summary of the techniques and tools applicable to answering questions 34 to 37, along with the level of support for each: full, partial or minimal. All of these questions are from the fourth category and are about comparing and connecting.

For questions 35 and 36, diff [20], a command line tool designed to show, line by line, the differences between two files, provides partial support (see Figure 6.5). However, in the cases we observed, the differences (from a line by line view) were sufficiently large that diff was of limited help: "the code is different enough that I can't just do a simple diff" [E14].

Figure 6.5: Sample output from the diff command line tools.

Question 37 (What is the mapping between these UI types and these model types?) is an example of a question asked when a programmer develops a (partial) understanding of two related groups of entities and wants to understand the connection between those (the control-flow between them, for example). Baniassad and Murphy have developed a technique and a tool called conceptual module querying, which in some situations may help with identifying these connections [2]. A conceptual module is defined by a list of source code lines and the associated tool allows programmers to ask: how do the conceptual modules relate to each other? If two conceptual modules are defined appropriately (one for each subgraph) then in some cases a query could yield some information of help to a programmer in answering question 37. The conceptual module query tool also allows programmers to ask: how are the conceptual modules related to the other source code? This query may help answer questions around how a subgraph is connected to the rest of the system, for example question 40 (To move this feature into this code what else needs to be moved?).
Questions 34 (How does the system behavior vary over these types or cases?), 35 (What are the differences between these files or types?) and 36 (What is the difference between these similar parts of the code (e.g., between sets of methods)?) are about making comparisons between behavior, types or methods. Making such comparisons is generally difficult, especially comparing behavior as needed to answer question 34.

34  How does the system behavior vary over these types or cases?
    Minimal support: Dynamic visualization
35  What are the differences between these files or types?
    Partial support: Line-based comparison tools (e.g., diff [20])
36  What is the difference between these similar parts of the code (e.g., between sets of methods)?
    Partial support: Line-based comparison tools
37  What is the mapping between these UI types and these model types?
    Partial support: Conceptual module querying [2]

Table 6.7: A summary of the techniques and tools applicable to answering questions 34 to 37, along with the level of support for each: full, partial or minimal. All of these questions are from the fourth category and are about comparing and connecting.

For questions 35 and 36, diff [20], a command line tool designed to show, line by line, the differences between two files, provides partial support (see Figure 6.5). However, in the cases we observed, the differences (from a line-by-line view) were sufficiently large that diff was of limited help: "the code is different enough that I can't just do a simple diff" [E14].

Question 37 (What is the mapping between these UI types and these model types?) is an example of a question asked when a programmer develops a (partial) understanding of two related groups of entities and wants to understand the connection between those groups (the control flow between them, for example). Baniassad and Murphy have developed a technique and tool, called conceptual module querying, that in some situations may help with identifying these connections [2]. A conceptual module is defined by a list of source code lines, and the associated tool allows programmers to ask: how do the conceptual modules relate to each other? If two conceptual modules are defined appropriately (one for each subgraph), then in some cases a query could yield information helpful to a programmer in answering question 37. The conceptual module query tool also allows programmers to ask: how are the conceptual modules related to the other source code? This query may help answer questions around how a subgraph is connected to the rest of the system, for example question 40 (To move this feature into this code what else needs to be moved?).
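The idea of a conceptual module described above can be illustrated with a small sketch; this is our own hypothetical rendering of the underlying data structure and query, not the interface of Baniassad and Murphy's actual tool. A module is simply a named set of source lines, and a relation between two modules is derived from line-level references that cross from one module into the other.

    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    // Hypothetical sketch: a conceptual module is a set of source lines,
    // identified here by "file:lineNumber" strings.
    class ConceptualModuleSketch {

        static class ConceptualModule {
            final String name;
            final Set<String> lines = new HashSet<>();
            ConceptualModule(String name) { this.name = name; }
        }

        // Given line-level references (from -> to), report the references that
        // cross from module a into module b; an empty result suggests the two
        // modules are not directly related in the code.
        static Set<Map.Entry<String, String>> relations(ConceptualModule a,
                                                        ConceptualModule b,
                                                        Map<String, String> references) {
            Set<Map.Entry<String, String>> crossing = new HashSet<>();
            for (Map.Entry<String, String> ref : references.entrySet()) {
                if (a.lines.contains(ref.getKey()) && b.lines.contains(ref.getValue())) {
                    crossing.add(ref);
                }
            }
            return crossing;
        }
    }

Representing references as a simple map (each source line referring to at most one target line) is an oversimplification; the point is only that questions about how two groups of lines relate reduce to set operations over line-level facts.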
38  Where should this branch be inserted or how should this case be handled?
    Minimal support: Visualization and browsing tools
39  Where in the UI should this functionality be added?
    Minimal support: Visualization and browsing tools
40  To move this feature into this code what else needs to be moved?
    Partial support: Conceptual module querying [2]
41  How can we know this object has been created and initialized correctly?
    Minimal support: Visualization and browsing tools
42  What will be (or has been) the direct impact of this change?
    Partial support: Testing and impact analysis techniques (e.g., Chianti [74])
43  What will be the total impact of this change?
    Partial support: Testing and impact analysis techniques (e.g., unit tests)
44  Will this completely solve the problem or provide the enhancement?
    Partial support: Testing techniques

Table 6.8: A summary of the techniques and tools applicable to answering questions 38 to 44, along with the level of support for each: full, partial or minimal. All of these questions are from the fourth category and are about changes and impacts of changes.

Questions 42 (What will be (or has been) the direct impact of this change?), 43 (What will be the total impact of this change?) and 44 (Will this completely solve the problem or provide the enhancement?) are about the impact of (planned) changes to a system. Unit testing and other testing techniques allow programmers to verify some aspects of the impact of changes to a code base. In situations where an extensive test suite is available, testing allows programmers to determine both whether their changes had the desired effect and whether there were any unintended effects [3]. Note, however, that in our studies such a test suite was the exception rather than the rule, and that several of the participants from study two were writing code for an environment that made some types of testing difficult.

Various impact analysis techniques and tools exist to help programmers identify the parts of a system impacted by a change. For example, Fyson and Boldyreff show how this can be done using program understanding techniques to populate a ripple propagation graph [24], and Chianti is a tool that uses a suite of unit tests to generate a list of affected unit tests given a change to a system [74]. These techniques generate candidates for the programmer to investigate and so constitute partial support for understanding the impact of changes made.

Questions 38 (Where should this branch be inserted or how should this case be handled?), 39 (Where in the UI should this functionality be added?) and 41 (How can we know this object has been created and initialized correctly?), as well as several questions discussed previously such as 26 (What is the "correct" way to use or access this data structure?), require understanding a code base at a relatively abstract level. Building this level of understanding based on source code details is quite difficult. Tool support for this activity is limited, and many of the possibly relevant research tools that exist have not been empirically evaluated in a way that supports strong claims about the support provided. Despite this, we briefly describe several approaches to supporting programmers in abstracting information.

Several visualization tools (such as SHriMP, already discussed) allow subtrees and relationships to be collapsed or expanded during exploration. Other tools or techniques exist which aim to support programmers in recovering a code base's design or architecture, which has been viewed as a clustering problem [59], a CSP problem [112], a graph partitioning problem [12], a visualization and composition problem [66], and a graph matching problem [79, 80]. These techniques and tools provide a kind of abstraction over the details of a code base. They aim to support programmers in developing an understanding of the decomposition of a system or its macrostructure and may provide some support for answering questions 26, 38, 39 and 41 to the extent that the answers to these questions can be found along that structure.
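As a deliberately naive illustration of the kind of structural abstraction such tools aim to provide, the following sketch (with made-up class names) collapses classes into their packages to produce a coarse view of a system's decomposition; real design recovery techniques work from much richer inputs, such as dependencies, naming and history.

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import java.util.Map;
    import java.util.TreeMap;

    // Group fully qualified class names by package, giving a coarse,
    // purely structural "decomposition" view of a code base.
    class DecompositionSketch {
        static Map<String, List<String>> byPackage(List<String> qualifiedNames) {
            Map<String, List<String>> packages = new TreeMap<>();
            for (String name : qualifiedNames) {
                int dot = name.lastIndexOf('.');
                String pkg = (dot < 0) ? "(default package)" : name.substring(0, dot);
                packages.computeIfAbsent(pkg, k -> new ArrayList<>()).add(name);
            }
            return packages;
        }

        public static void main(String[] args) {
            // Class names below are invented for the example.
            List<String> classes = Arrays.asList(
                    "com.example.ui.DiagramView",
                    "com.example.ui.DiagramEditor",
                    "com.example.model.Element");
            byPackage(classes).forEach((pkg, members) ->
                    System.out.println(pkg + ": " + members.size() + " class(es)"));
        }
    }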
Another approach to supporting programmers in abstracting information stems from research around solving the concept assignment problem, which is a generalization of the feature location problem discussed above. The concept assignment problem is that of discovering human-oriented (high-level) concepts and assigning them to their realizations within a program [4]. Tool support for this is limited, though several research tools exist, including DESIRE [5], HB-CAS [26] and PAT [31]. These tools use various techniques to tag contiguous regions of code with a term that captures the meaning of that code. Although such an approach may help with some aspects of understanding code at a higher level, many questions in this category (and the previous category) require understanding arbitrary subgraphs, including understanding why things are the way they are and how to use or change things in a way that is consistent with the current code base. Despite some tool support, we believe that developing this level of understanding remains difficult.

6.5 The Gap

In this chapter we have considered techniques and tools for answering each kind of question asked by our participants. The level of support available by category is summarized in Table 6.9, with more details available in the other tables in this chapter.

Question Category                                    Full      Partial   Minimal
1. Finding an initial focus point (5 questions)      3         2         0
2. Building on a point of focus (15 questions)       12        3         0
3. Understanding a subgraph (13 questions)            0         4         9
4. Over groups of subgraphs (11 questions)            0         7         4
Total                                                15 (34%)  16 (36%)  13 (30%)

Table 6.9: Summary of the number of questions with full, partial or minimal support by question category. This information is broken out by question in Tables 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7 and 6.8.

This summary table shows that questions in the first two categories all had at least partial support, with the majority having full support. We found that in most, though not all, situations these questions could be answered relatively directly using today's tools. The situation was quite different for questions in categories three and four. None of these questions had full support, and over half had only minimal support. These questions are at a relatively higher level and require programmers to consider a wider range of information; as a result they are more difficult for tools to support and more difficult for programmers to answer.

Regarding questions in category one and category two, it is important to note that these are often asked in support of higher-level questions, and the particular supporting question asked will be heavily influenced by the tools available to the programmer. In general, as the actual question asked (using the available tools) gets conceptually further from the intended question, the number of questions that have to be asked increases, as does the number of false positives and false negatives; a question asked by a programmer can be supported effectively by a given tool, and yet the programmer may not be well supported in answering their intended question.

In addition to providing information about which questions (or categories of questions) can or cannot be answered directly with today's programming tools, our results also suggest more general limitations with current industry and research tools.
Based on our results, we believe programmers need better or more comprehensive support in three related areas: (1) support for asking higher-level questions and more precise questions, (2) support for maintaining context and putting information together, and (3) support for abstracting information and working with subgraphs at various levels of detail. These areas of support are discussed in more detail below.

1. Support for asking higher-level questions. Many programming tools support only relatively low-level questions, for example questions that are limited to individual entities and just one type of relationship. On the other hand, many of the questions asked go beyond what can be directly asked under these limitations. For example, questions about subgraphs or groups of subgraphs such as 30 (Why isn't control reaching this point in the code?) require considering multiple entities and relationships. In these situations, as we have discussed, programmers map their questions to multiple tools which they believe will produce information that will contribute to answering their questions.

Programmers are often limited in how precise or refined their questions can be. A programmer's questions often have an explicit or implicit context or scope. In question 31 (Which execution path is being taken in this case?) this context is explicit. Question 33 (What parts of this data structure are accessed in this code?) asks about access to a data structure, but only in the context of a certain section of code. Similarly, in session 1.11 participant N2 wanted to learn about the properties of an object in the context of a particular failing case of a large loop. Tools generally provide little or no support for a programmer to specify such context as a way to scope his or her questions, and so programmers ask questions more globally than they intend. Due to this lack of precision and of support for suitably refining queries, the result sets or other information displayed by the various tools include many items irrelevant to the intended question (i.e., false positives relative to the information the participant was seeking), and determining relevance requires additional exploration.

2. Support for maintaining context and putting information together. Most tools treat questions as if they were asked in isolation, though we have shown that often a particular question is part of a larger process. For example, answering a higher-level question may involve multiple lower-level questions, each possibly asked using different tools. Similarly, answering a question may involve gathering information from a code base written in two different languages, each with support from a different set of programming tools. Even when multiple questions are asked using the same tool, the results are presented by that tool in isolation, as largely undifferentiated and unconnected lists. Some tools that we have shown to partially help answer higher-level questions, such as impact analysis tools, simply produce a list of candidate entities to consider; investigating those can be nontrivial and generally requires using other tools. Although earlier in this chapter we discussed several notable exceptions (conceptual module querying, for example), generally tools are designed to answer a specific kind of question targeting a particular programming language or type of artifact.
In cases like these the burden is on the programmer to maintain the context and assemble the information, which can be difficult: "it gets very hard to think in your head how that works" [E14]. Missing is support for bringing information together, as well as support for building toward an answer. We also believe that there are missed opportunities for tools to make use of the larger context to help programmers determine what is relevant to their higher-level questions.

3. Support for abstracting information and working with subgraphs. Questions in categories three and four (that is, questions about understanding subgraphs and questions over one or more subgraphs) are distinguished from those in the other two categories by the volume and variety of information that must be considered to answer them. Answering these questions involves treating a number of entities as a conceptual whole and dealing with information at a number of levels. These distinguishing factors are also the factors that make answering these questions difficult, as working with subgraphs at this level is not well supported by tools. Making comparisons between a number of entities is one example of an operation between subgraphs that is not well supported. As mentioned earlier, E2 spent nearly 30 minutes of session 2.2 merging the changes from one version of a pair of files into a different version of those files, because the changes were sufficiently large that diff (and therefore merge) was of limited help. In some cases a comparison between a system's behavior in two different cases, as in question 34 (How does the system behavior vary over these types or cases?), is even harder. Further, answering questions such as question 38 (Where should this branch be inserted or how should this case be handled?) requires both the right details and an overview of the right aspects of the structure; this requires support for abstracting information in a way that goes beyond collapsing subsystem nodes and composite arcs in a nested graph. Tool support for understanding systems at this level is quite limited.

6.6 Summary

Based on our results and results from previous research papers, we have investigated the support provided by today's research tools for answering the 44 kinds of questions asked by our participants. We found that overall fifteen of those questions had full support, sixteen had partial support and thirteen had minimal or no support. Generally, answering questions from the first two categories was well supported, while answering questions from the third and fourth categories was much less well supported. We also observed that questions in the first two categories are at times asked precisely because there is tool support for answering them. In the process of this investigation, we have identified several areas in which programmers need better support, including support for asking higher-level and more precise questions, support for maintaining context and putting information together, and finally support for abstracting information and working with subgraphs at various levels of detail. Our results suggest that improvements in these areas will move programming tools closer to programmers' questions and the process of answering those questions.

Chapter 7

Discussion

The previous two chapters have presented the results of our analysis of the data collected from our two studies.
This presentation has included a discussion of areas where we have found tool support to be missing for activities around answering questions (see Section 6.5). The goal of this chapter is to provide a brief discussion of these studies and results. First, we discuss the implications of our results in Section 7.1. This discussion includes suggestions for programming tool features to better support the activities we observed around answering those questions. To make these feature suggestions concrete, we describe features for a hypothetical search tool. Second, we discuss our findings in relation to some earlier work (see Section 7.2). Third, we discuss the limitations of our studies, which affect how our results should be interpreted (see Section 7.3). Finally, we discuss possible future studies that could be used to build on our research (see Section 7.4).

7.1 Implications

Our analysis has identified particular questions that cannot easily be answered using current tools. We observed that the most difficult questions to answer are often the higher-level questions: those about one or more subgraphs of a system. One very effective way to get an answer to such a high-level question is to "ask somebody who knows" [48]. People have an ability to make a sketch or provide an explanation that captures just the right details, abstracting and crossing boundaries. Their tools are natural language, metaphors and pictures. Using these, people can relatively effectively answer the types of questions that are the most difficult to answer using current tools. There are limits to people's abilities here, though: "there are lots of different metaphors [used to describe this code base] but it ends up diving into details so quickly that the metaphors are often lost" [E2].

The availability of documentation in our two studies was limited. Based on our results, we believe that improving the ability of programmers to document and communicate about parts of their code would be valuable. This support should focus on subgraphs of the system, including issues around "why" and "how best" kinds of questions.

When neither documentation nor an expert is available to help answer a programmer's questions, he or she relies heavily on tools to help with activities around answering those questions. Our results point to several aspects of the gap between the support programmers need in answering their questions and the support that tools provide (see Section 6.5). Here we explore possible ways that tools can be designed to better support the process of answering questions.

Based on support that we have shown to be missing from current tools, we believe that programmers need tools that support questions or views that cross relationship types and artifact types. Tools need to support more options for scoping queries (possibly integrating techniques such as slicing or feature location) to increase the relevance of the information presented. Tools that present lists or trees of source code entities could show additional information (some source code details, for example) to make it easier for programmers to determine what is relevant and to increase the information content of the view. Search results or other views should maintain context between searches or other actions. Examples of contexts that could be profitably used include: visited entities, previous search results, modified entities and executions of the program. These kinds of contexts could be made explicit and used between tools.
If made explicit, these represent groups of entities that tools could support operations over (intersection, union, comparison and connection, for example) and that tools generally should support working with and seeing together. Finally, in bringing together more information, tools should support flexibly combining appropriate details and overviews of that information.

To make these suggestions more concrete, we next describe features of a hypothetical search tool. The tool is mocked up in Figure 7.1. On the left are the search results.

[Figure 7.1: Mockup of the hypothetical search tool's results view, showing methods in SKTImage and SKTGraphicView that call setImage:, with controls for the level of detail and for limiting the scope of the search.]
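To suggest how the scoping and context features described above might surface in such a tool, the following sketch shows one possible shape for its query interface; all names are hypothetical, and this is not a description of any existing tool. Scopes and result sets are explicit values, so a query like calls:setImage can be limited to a previously collected group of entities, and results from different questions can be intersected or unioned rather than read as isolated lists.

    import java.util.HashSet;
    import java.util.Set;

    // Hypothetical sketch of a scoped search interface; names are illustrative.
    class ScopedSearchSketch {

        // A scope is a named group of entities (visited entities, previous
        // results, modified entities, and so on), carried between queries.
        static class Scope {
            final Set<String> entities = new HashSet<>();
            Scope add(String entity) { entities.add(entity); return this; }
        }

        // Result sets are first-class values that support set operations.
        static class SearchResults {
            final Set<String> entities = new HashSet<>();

            SearchResults union(SearchResults other) {
                SearchResults r = new SearchResults();
                r.entities.addAll(entities);
                r.entities.addAll(other.entities);
                return r;
            }

            SearchResults intersect(SearchResults other) {
                SearchResults r = new SearchResults();
                r.entities.addAll(entities);
                r.entities.retainAll(other.entities);
                return r;
            }
        }

        // A query such as "calls:setImage", limited to the given scope; the
        // index lookup itself is stubbed out, since only the shape of the
        // interface matters here.
        static SearchResults callersOf(String method, Scope scope) {
            SearchResults r = new SearchResults();
            for (String entity : scope.entities) {
                if (callGraphContains(entity, method)) {
                    r.entities.add(entity);
                }
            }
            return r;
        }

        private static boolean callGraphContains(String caller, String callee) {
            return false; // stub: a real tool would consult its call graph here
        }
    }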