UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Summarizing software artifacts Rastkar, Sarah

Abstract

To answer an information need while performing a software task, a software developer sometimes has to interact with a lot of software artifacts. This interaction may involve reading through large amounts of information and many details of artifacts to find relevant information. In this dissertation, we propose the use of automatically generated natural language summaries of software artifacts to help a software developer more efficiently interact with software artifacts while trying to answer an information need. We investigated summarization of bug reports as an example of natural language software artifacts, summarization of crosscutting code concerns as an example of structured software artifacts and multi-document summarization of project documents related to a code change as an example of multi-document summarization of software artifacts. We developed summarization techniques for all the above cases. For bug reports, we used an extractive approach based on an existing supervised summarization system for conversational data. For crosscutting code concerns, we developed an abstractive summarization approach. For multi-document summarization of project documents, we developed an extractive supervised summarization approach. To establish the effectiveness of generated summaries in assisting software developers, the summaries were extrinsically evaluated by conducting user studies. Summaries of bug reports were evaluated in the context of bug report duplicate detection tasks. Summaries of crosscutting code concerns were evaluated in the context of software code change tasks. Multi-document summaries of project documents were evaluated by investigating whether project experts find summaries to contain information describing the reason behind the corresponding code changes. The results show that reasonably accurate natural language summaries can be automatically produced for different types of software artifacts and that the generated summaries are effective in helping developers address their information needs.

Item Media

Item Citations and Data

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International