Summarizing Software Artifacts by Sarah Rastkar B.Sc., Sharif University of Technology, 2001 M.Sc., Sharif University of Technology, 2003 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in The Faculty of Graduate Studies (Computer Science) THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver) May 2013 c© Sarah Rastkar 2013 Abstract To answer an information need while performing a software task, a software developer sometimes has to interact with a lot of software artifacts. This interaction may involve reading through large amounts of information and many details of artifacts to find relevant information. In this dissertation, we propose the use of automatically generated sum- maries of software artifacts to help a software developer more efficiently interact with software artifacts while trying to answer an information need. We investigated summarization of bug reports as an example of natural language software artifacts, summarization of crosscutting code concerns as an example of structured software artifacts and multi-document summa- rization of project documents related to a code change as an example of multi-document summarization of software artifacts. We developed summarization techniques for all the above cases. For bug reports, we used an extractive approach based on an existing super- vised summarization system for conversational data. For crosscutting code concerns, we developed an abstractive summarization approach. For multi- document summarization of project documents, we developed an extractive supervised summarization approach. To establish the effectiveness of generated summaries in assisting soft- ware developers, the summaries were extrinsically evaluated by conducting user studies. Summaries of bug reports were evaluated in the context of bug report duplicate detection tasks. Summaries of crosscutting code con- cerns were evaluated in the context of software code change tasks. Multi- document summaries of project documents were evaluated by investigating whether project experts find summaries to contain information describing the reason behind corresponding code changes. The results show that reasonably accurate natural language summaries can be automatically produced for different types of software artifacts and that the generated summaries are effective in helping developers address their information needs. ii Preface Parts of the research presented in this dissertation has been previously pub- lished in the following articles: 1. Sarah Rastkar, Gail C. Murphy, Gabriel Murray, “Summarizing soft- ware artifacts: a case study of bug reports”, In ICSE’10 : Proceed- ings of the 32nd ACM/IEEE International Conference on Software Engineering-Volume 1, Pages 505-514, 2010, ACM. 2. Sarah Rastkar, “Summarizing software concerns”, In ICSE’10 : Pro- ceedings of the 32nd ACM/IEEE International Conference on Software Engineering-Volume 2, Pages 527-528, 2010, IEEE. 3. Sarah Rastkar, Gail C. Murphy, Alexander W.J. Bradley, “Generating natural language summaries for crosscutting source code concerns”, In ICSM’11 : Proceedings of the 27th IEEE International Conference on Software Maintenance, Pages 103-112, 2011, IEEE. 4. Sarah Rastkar, Gail C. Murphy, “Why did this code change?”, In ICSE’13 : Proceedings of the 35th ACM/IEEE International Con- ference on Software Engineering, New Ideas and Emerging Results (NIER) Track, 2013, to appear. Part of this work involved other collaborators. 
Gabriel Murray contributed in designing the annotation process for the bug report corpus, training clas- sifiers on email and meeting data and providing the code used to compute sentence features for conversation-based data. Alexander Bradley developed the plug-in for integrating crosscutting concern summaries for Java programs into the Eclipse IDE. The UBC Behavioural Research Ethics Board approved the research in the certificates H10-01044 (“Summarization Study”) and H11-00246 (“Soft- ware feature summarization”) and in amendments and renewals to these cer- tificates (H10-01044-A001, H10-01044-A002, H10-01044-A003, H10-01044- A004, H10-01044-A005, H11-00246-A001, H11-00246-A002). iii Table of Contents Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii Acknowledgment . . . . . . . . . . . . . . . . . . . . . . . . . . . . x Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1 Summarizing Natural Language Software Artifacts . . . . . . 4 1.1.1 Approach . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.1.2 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . 6 1.1.3 Related Work . . . . . . . . . . . . . . . . . . . . . . 7 1.2 Summarizing Structured Software Artifacts . . . . . . . . . . 7 1.2.1 Approach . . . . . . . . . . . . . . . . . . . . . . . . . 7 1.2.2 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . 8 1.2.3 Related Work . . . . . . . . . . . . . . . . . . . . . . 9 1.3 Multi-document Summarization of Software Artifacts . . . . 10 1.3.1 Approach . . . . . . . . . . . . . . . . . . . . . . . . . 10 1.3.2 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . 11 1.3.3 Related Work . . . . . . . . . . . . . . . . . . . . . . 11 1.4 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . 12 1.5 Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 2 Background and Related Work . . . . . . . . . . . . . . . . . 15 2.1 Automatic Summarization . . . . . . . . . . . . . . . . . . . 16 2.1.1 Extractive vs. Abstractive Summarization . . . . . . 16 iv Table of Contents 2.1.2 Single Document vs. Multi-document Summarization 17 2.1.3 Generic vs. Domain-Specific Summarization . . . . . 18 2.1.4 Evaluation Methods . . . . . . . . . . . . . . . . . . . 19 2.2 Summarization for Software Artifacts . . . . . . . . . . . . . 20 2.3 Text Analysis in Software Engineering . . . . . . . . . . . . . 21 3 Summarization of Bug Reports . . . . . . . . . . . . . . . . . 23 3.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 3.2 Bug Report Corpus . . . . . . . . . . . . . . . . . . . . . . . 26 3.2.1 Annotation Process . . . . . . . . . . . . . . . . . . . 26 3.2.2 Annotated Bugs . . . . . . . . . . . . . . . . . . . . . 28 3.3 Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 3.3.1 Conversation Features . . . . . . . . . . . . . . . . . . 33 3.4 Analytic Evaluation . . . . . . . . . . . . . . . . . . . . . . . 35 3.4.1 Comparing Base Effectiveness . . . . . . . . . . . . . 36 3.4.2 Comparing Classifiers . . . . . . . . . . . . . . . . . . 37 3.5 Human Evaluation . . . . . . . . 
. . . . . . . . . . . . . . . . 40 3.5.1 Human Judges . . . . . . . . . . . . . . . . . . . . . . 40 3.5.2 Task-based Evaluation . . . . . . . . . . . . . . . . . 41 3.5.3 Threats to Validity . . . . . . . . . . . . . . . . . . . 54 3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 4 Summarization of Crosscutting Code Concerns . . . . . . . 56 4.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 4.2 Using Concern Summaries . . . . . . . . . . . . . . . . . . . 59 4.3 Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 4.3.1 Step 1: Extracting Information . . . . . . . . . . . . . 64 4.3.2 Step 2: Generating Abstract Content . . . . . . . . . 66 4.3.3 Step 3: Producing Sentences for the Summary . . . . 69 4.4 Task-based Evaluation . . . . . . . . . . . . . . . . . . . . . 71 4.4.1 Participants . . . . . . . . . . . . . . . . . . . . . . . 71 4.4.2 Method . . . . . . . . . . . . . . . . . . . . . . . . . . 71 4.4.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . 73 4.4.4 Threats to Validity . . . . . . . . . . . . . . . . . . . 75 4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 5 Multi-document Summarization of Natural Language Soft- ware Artifacts . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 5.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 5.2 Motivating Example . . . . . . . . . . . . . . . . . . . . . . . 81 v Table of Contents 5.3 Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82 5.4 Corpus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 5.5 Exploratory User Study . . . . . . . . . . . . . . . . . . . . . 86 5.5.1 Results . . . . . . . . . . . . . . . . . . . . . . . . . . 89 5.5.2 Threats . . . . . . . . . . . . . . . . . . . . . . . . . . 92 5.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92 6 Discussion and Future Work . . . . . . . . . . . . . . . . . . . 93 6.1 Producing Natural Language Summaries . . . . . . . . . . . 93 6.2 What Should Be Included in a Summary? . . . . . . . . . . 94 6.3 Task-based Evaluation of Summaries . . . . . . . . . . . . . 95 6.4 Improving Summaries by Using Domain-Specific Summariza- tion Approaches . . . . . . . . . . . . . . . . . . . . . . . . . 97 6.5 Summarizing Other Software Artifacts . . . . . . . . . . . . 98 6.6 Multi-document Summarization of Software Artifacts . . . . 98 7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100 Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 Appendices A Bug Report Corpus: Annotator Instructions . . . . . . . . 117 B Task-based Evaluation of Crosscutting Code Concern Sum- maries: Supporting Materials . . . . . . . . . . . . . . . . . . 121 vi List of Tables 3.1 Statistics on summary-worthy bug reports for several open source projects, computed for bug reports created over the 24-month period of 2011-2012. . . . . . . . . . . . . . . . . . 24 3.2 Abstractive summaries generated by annotators. . . . . . . . 28 3.3 The questions asked from an annotator after annotating a bug report. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 3.4 Features key. . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 3.5 Evaluation measures. . . . . . . . . . . . . . . . . . . . . . . . 39 3.6 Paired t-tests results. . . . . . . . . . . . . . . . . . . . . . . . 40 3.7 Questions asked of each participant at the end of a study session. . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . 44 3.8 New bug report used for each task in the user study. . . . . . 48 3.9 List of potential duplicates per task, retrieved by extended BM25F with ‘+’ indicating an actual duplicate. Numbers in parentheses show the length of each bug report. . . . . . . . . 48 3.10 Time (in minutes) to complete each task by each participant. ‘*’ indicates summaries condition. . . . . . . . . . . . . . . . 51 3.11 Accuracy of performing each task by each participant. ‘*’ indicates summaries condition. . . . . . . . . . . . . . . . . . 52 4.1 Concerns used for developing and testing of the summariza- tion approach . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 4.2 Structural relationships between code elements . . . . . . . . 65 4.3 Number of interactions with summaries . . . . . . . . . . . . 73 4.4 Navigation efficiency . . . . . . . . . . . . . . . . . . . . . . . 75 4.5 TLX scores: mental demand and performance . . . . . . . . . 76 5.1 Data used in the exploratory user study . . . . . . . . . . . . 87 vii List of Figures 1.1 An example of the conversational structure of a bug report; the beginning part of bug #540914 from the Mozilla bug repository. The full bug report consists of 15 comments from 7 people. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.2 The 100-word summary for bug #540914 from the Mozilla bug repository generated by the classifier trained on bug re- port data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.3 Crosscutting code concerns . . . . . . . . . . . . . . . . . . . 8 1.4 A part of summary of ‘Undo’ concern in JHotDraw. . . . . . 9 1.5 A summary describing the reason behind a code change pro- duced for a collection of project documents related to the code change. . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 3.1 The beginning part of bug #188311 from the KDE bug repos- itory. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 3.2 A screenshot of the annotation software. The bug report has been broken down into labeled sentences. The annotator enters the abstractive summary in the text box. The numbers in the brackets are sentence labels and serve as links between the abstractive summary and the bug report. For example, the first sentence of the abstractive summary has links to sentences 1.4, 11.1, 11.2, 11.3 from the bug report. . . . . . . 29 3.3 The gold standard summary for bug #188311 from the KDE bug repository. The summary was formed by extracting sen- tences that were linked by two or three human annotators. . . 32 3.4 Features F statistics scores for the bug report corpus. . . . . 36 3.5 ROC plots for BRC, EC and EMC classifiers. . . . . . . . . 37 viii List of Figures 3.6 The tool used by participants in the user study to perform duplicate detection tasks. The top left ‘Bug Triage List’ win- dow shows the list of tasks, each consisting of a new bug report and six potential duplicate bug reports. The new bug report and the selected potential duplicate can be viewed in the bottom left window and the right window respectively. . 45 3.7 The distribution of potential duplicates based on their length. 47 3.8 The average accuracy and time of performing each duplicate detection task under each condition (original bug reports, bug report summaries). . . . . . . . . . . . . . . . . . . . . . . . . 51 4.1 Two methods in Drupal; code elements highlighted in bold font are part of the authorization crosscutting concern. 
The concern is scattered across the codebase and tangled with code of other concerns in the system. . . . . . . . . . . . . . . 57 4.2 A sample output of a concern identification technique for the authorization concern in Drupal. . . . . . . . . . . . . . . . . 58 4.3 The concern summary Eclipse plugin in action. . . . . . . . . 60 4.4 A part of summary of ‘Undo’ concern in JHotDraw. . . . . . 62 4.5 Part of the JHotDraw RDF graph (Method1, Method2 and Method3 belong to the Undo concern). . . . . . . . . . . . . . 67 5.1 A summary describing the motivation behind a code change of interest appearing as a pop-up in a mock-up code develop- ment environment. . . . . . . . . . . . . . . . . . . . . . . . . 80 5.2 F statistics scores for sentence features . . . . . . . . . . . . . 86 5.3 An example of the links between a code change (commit) and the related chain in Mylyn. . . . . . . . . . . . . . . . . . . . 88 A.1 Instructions used by the annotators of the bug report corpus 120 B.1 Description of the jEdit/Autosave task given to the partici- pants in the study . . . . . . . . . . . . . . . . . . . . . . . . 123 B.2 Summary of the ‘Property’ crosscutting concern . . . . . . . . 124 B.3 Summary of the ‘Autosave’ crosscutting concern . . . . . . . 124 B.4 Description of the JHotDraw/Undo task given to the partic- ipants in the study . . . . . . . . . . . . . . . . . . . . . . . . 125 B.5 Summary of the ‘Undo’ crosscutting concern . . . . . . . . . 126 ix Acknowledgment There are no proper words to convey my deep gratitude and respect for my supervisor, Gail Murphy. I feel extremely fortunate to have spent the past few years working under Gail’s supervision. Since our very first meeting, she has constantly amazed and inspired me by her unique mixture of intellect, dedication and empathy. Leading by example, she has taught me to strive for excellence in research; to pay attention to all the details without losing sight of the big picture. This work would have been impossible without you, Gail. Thank you! I would like to thank the members of my supervisory committee, Giuseppe Carenini and Raymond Ng for their support, encouragement and insightful feedback over the last few years and for sharing data and code developed by their research group. Thanks also go to the members of my examining committee, Philippe Kruchten, Lori Pollock and Eric Wohlstadter for their invaluable time and effort put into reading my thesis. Special thanks to my parents for their unconditional love and unwavering support. They always encouraged me in everything I have done, even when it involved moving thousands of miles away from them. Thanks to my brother, Mohammad, for being a source of encouragement, for always having time to listen and for often reminding me that it was going to be over soon. I am indebted to many friends and colleagues for their help and support during my PhD experience. They participated in my pilot studies, provided feedback on my paper drafts and practice talks, assisted with programming tasks or helped me take my mind off work when it was needed most. Many thanks to Albert, Alex, Apple, Baharak, Deepak, Emerson, Gabe, Gias, João, Julius, Justin, Lee, Meghan, Mona, Mona, Nick, Neil, Nima, Peng, Rahul, Reza, Roberto, Robin, Roozbeh, Ron, Sam, Sara, Sara, Seonah, Solmaz, Thomas and Vahid. Thanks also to all who participated in my numerous user studies or helped in annotating the corpora. 
Thanks to all the staff at the Computer Science Department, in par- ticular Michele Ng and Hermie Lam for their support and their infectious positive energy. I would like to acknowledge the National Science and Engineering Re- x Acknowledgment search Council of Canada (NSERC) for funding my research. xi To Mom & Dad. xii Chapter 1 Introduction For a large-scale software project, significant amounts of information about the software and the development process are continuously created and stored in the project’s repositories. This fast changing information comes in the form of different software artifacts including source code, bug reports, documentation, mailing list discussions and wiki entries. As an example of the substantial amount of information stored in a software project’s repos- itories, there are more than 397,000 bug reports in the bug repository of Eclipse, an integrated development environment,1 with more than 24,000 bug reports added to the repository in 2012. Also, more than 260,000 lines of code are added to the Eclipse’s code repository per each major develop- ment cycle (12 months) [65]. A software developer working on a software task typically needs to con- sult information in the project repositories to address her various informa- tion needs. For example, for a newly submitted bug report, the developer may want to know if the problem described in the bug has been already reported [4]. When working on a code change task, the developer may want to know what parts of the code are relevant to the task [7], if similar changes have been made to the code in the past [16], or why the code was previ- ously changed in a certain way [48]. To address these information needs, the developer has to navigate, query and search the software project’s reposito- ries and interact with many software artifacts. Various navigation, search and recommendation tools based on different mining techniques have been developed to help narrow the search space for a developer looking for in- formation. For example, various bug report duplicate detection approaches have been proposed to recommend a list of potential duplicate bug reports to help a developer decide if a new bug report is a duplicate of an existing 1www.eclipse.org, verified 12/12/12 1 Chapter 1. Introduction one (e.g., [91, 96, 106]). Several tools have been developed to recommend similar changes to the code (e.g., [5, 102]) and various techniques have been proposed to facilitate the search and navigation of source code (e.g., [43, 86]). Even when using these techniques, only a subset of the artifacts returned through them addresses the information need of the developer. To figure out which of the returned artifacts contain relevant information, a devel- oper typically has to interact with each artifact. Sometimes a developer can determine relevance based on a quick read of the title of a bug report or a quick scan of a source file. Other times, lengthy discussions in bug reports may need to be read to see if the report includes pertinent infor- mation, substantial amounts of code may need to be investigated to figure out how different code elements are related, or documentation may need to be perused to understand how and why the code has evolved in a certain way. All of this work requires dealing with large amounts of information and many details of the artifacts. 
As an example, a developer using the bug report duplicate recommender built by Sun and colleagues [96] to get a list of potential duplicates for bug #564243 from the Mozilla system2, has to go through a total of 5125 words (237 sentences, approx. 50 paragraphs) to investigate the top six bug reports (#540841, #562782, #549931, #542261, #550573, #541650) on the list. My thesis is that the automatic provision of concise natural language summaries of software artifacts makes it easier for developers to more effi- ciently determine which artifacts address their information needs. A sum- mary represents the most pertinent facts of a single software artifact or a collection of related software artifacts in a shorter form eliding less pertinent details. Once a developer determines if a software artifact contains relevant information based on this alternate representation, she can investigate the artifact in further details. Or, the developer can more efficiently determine an artifact does not require more investigation. The content of most software artifacts is a combination of two different types of information: structured information and natural language infor- mation. Based on their content, software artifacts form a spectrum, where software artifacts with mostly structured information (e.g., code) are at one 2www.mozilla.org, verified 12/12/12 2 Chapter 1. Introduction end and software artifacts with mostly natural language information (e.g, bug reports, documentation) are at the other end. To investigate whether our hypothesis holds across this range, we address both ends of this spec- trum in this dissertation. Because software artifacts from each end of this spectrum contain information with different characteristics, different sum- marization approaches are needed for each of them. In a broad sense, a summarization approach can be either extractive or abstractive [40]. An extractive summary is formed by extracting full sentences from the origi- nal document(s) while an abstractive summary is formed by first creating a semantic representation of the content of the input document(s) and then generating sentences to convey the information in the intermediate semantic representation. We propose the use of extractive approaches for summariz- ing software artifacts with mainly natural language content as the current state of the art in summarizing natural language text relies on sentence extraction [72]. For software artifacts with mostly structured information, we propose the use of abstractive summarization as the precise structure of the content of such artifacts makes it feasible to build an abstract semantic representation. In both cases, even when the content of the input software artifact is mostly structured, we produce a natural language summary as opposed to other alternate forms such as keyword summaries or diagrams. We decided to focus on the natural language format as it provides several benefits. First, we believe natural language text is easy to read and under- stand and is flexible for including domain knowledge. Second, developers do not need training in a new language or formalism to use the produced summaries. Third, we can easily control the size of a natural language sum- mary to be of a particular number of words. Natural language text has been shown to be more effective than diagrams in other applications such as decision making in medical scenarios [53, 101] and understanding product manuals [52]. 3 1.1. 
Summarizing Natural Language Software Artifacts 1.1 Summarizing Natural Language Software Artifacts Many software artifacts contain mostly natural language information, in- cluding requirement documents, bug reports, project’s mailing list discus- sions and wiki entries. We investigate bug reports to explore the problem of summarizing software artifacts with mostly natural language content. Soft- ware developers access bug reports in a project bug repository to help with a number of different tasks, including understanding multiple aspects of par- ticular defects [11, 15] and understanding how previous changes have been made [5, 102]. We hypothesize that summaries of bug reports can provide developers with enough essence of a report to more efficiently determine which bug reports in a repository contain relevant information. 1.1.1 Approach Bug reports often consist mainly of a conversation among multiple people. Figure 1.1 shows an example of the conversational structure of a bug report. Since bug reports are similar to other conversational data, for example email threads and meetings, existing approaches for summarizing such data may be utilized. We thus applied an existing extractive conversation-based sum- marizer, developed by Murray and Carenini [67], to generate summaries of bug reports. This supervised approach relies on training a classifier on a human-annotated corpus. We investigated two classifiers trained on gen- eral conversation data, one trained on email data and the other trained on a combination of email and meeting data. We chose these general classi- fiers to investigate whether they can generate accurate enough summaries for bug reports. To understand whether summaries can be substantially improved by using a domain-specific training set, we also investigated a classifier trained on a bug report corpus we created. The corpus consists of 36 bug reports from four open source projects. We had human annotators create summaries for each of these reports. Using any of the three classifiers, a summary can be produced for a bug report. Figure 1.2 shows a 100-word summary of the bug report shown in Figure 1.1, produced with the classifier trained on our bug report corpus. The sentences in this summary have been 4 1.1. Summarizing Natural Language Software Artifacts Description Bug 540914 - IMAP: New mail often not displayed in folder pane (read & unread folders). Mail not seen until going offline Product: Thunderbird Component: Folder and Message Lists Version: 3.0 Platform: x86 Windows XP Reported: 2010-01-20 12:53 PST by Chuck Modified: 2011-07-31 09:21 PDT (History) CC List: 7 users (show) Chuck 2010-01-20 12:53:17 PST User-Agent:   Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 Build Identifier: 3.0 Setup: IMAP with 150 subscribed folders, frequent checking for new mail, all filtering done at IMAP server.  TB is left running overnight. Problem: New mail not displayed in some (no pattern detected yet) folders.  Those folders may have all mail read, in which case the folder name does NOT change to bold black, and folder names with previously unread mail do NOT change to blue. Other folders (no pattern detected yet) seem to work as expected. Not a problem with TB2. Workaround: I have to manually step through each folder name in the folder pane with the down arrow to "update" the folder's display.  When I do this, the message count and bold or blue color change occurs and I know I have new mail. 
Reproducible: Sometimes Steps to Reproduce: 1.See details field above Actual Results: 1.See details field above Expected Results: New mail should be indicated by either a bold folder name (no previously unread mail) in the folder pane or a blue folder name (previously unread mail) and an updated message count. Comment 1 Comment 2 One shouldn't have to manually refresh the folder by highlighting the folder name to be informed of the new mail. Ludovic 2010-01-21 05:39:21 PST So the mails are in the folder but the folders aren't updated ? Chuck 2010-01-21 10:34:39 PST How could I differentiate? When I highlight a folder that appears to have all mail read and it turns bold black and adds a message count e.g. (1), I also see network traffic indicating it's checked the IMAP server. Figure 1.1: An example of the conversational structure of a bug report; the beginning part of bug #540914 from the Mozilla bug repository. The full bug report consists of 15 comments from 7 people. 5 1.1. Summarizing Natural Language Software Artifacts SUMMARY: IMAP: New mail often not displayed in folder pane (read & unread folders). Mail not seen until going offline Those folders may have all mail read, in which case the folder name does NOT change to bold black, and folder names with previously unread mail do NOT change to blue. New mail should be indicated by either a bold folder name (no previously unread mail) in the folder pane or a blue folder name (previously unread mail) and an updated message count. Below is the log for a ten-minute new mail check that does NOT find any mail in the INBOX, however I DO have new mail on the IMAP server in other subscribed folders - TB never pulls it down unless I manually refresh a folder or restart TB. Figure 1.2: The 100-word summary for bug #540914 from the Mozilla bug repository generated by the classifier trained on bug report data. extracted from the first comment (the description) and the sixth comment of the bug report. 1.1.2 Evaluation We measured the effectiveness of the three classifiers, finding that the first two classifiers, trained on more generic conversational data, can generate reasonably good bug report summaries. We also found that the bug report classifier, having a precision of more than 66%, moderaltely out-performs the other two classifiers in generating summaries of bug reports. To eval- uate whether a precision of 66% produces summaries useful for developers, we conducted two independent human evaluations. In the first evaluation, we had human judges evaluate the goodness of a subset of the summaries produced by the bug report classifier. On a scale of 1 (low) to 5 (high), the arithmetic mean quality ranking of the generated summaries by the human judges was 3.69 (±1.17), which suggests that the generated summaries may be helpful to developers. To determine if the summaries are indeed helpful, in the second human evaluation, we conducted a study in which we had 12 participants work on eight bug report duplicate detection tasks. We found that participants who used summaries could complete duplicate detection tasks in less time without compromising the level of accuracy, confirming that bug report summaries help software developers in performing software tasks. 6 1.2. Summarizing Structured Software Artifacts 1.1.3 Related Work Our proposed approach (described in more details in Chapter 3) is the first attempt at summarizing natural language software artifacts. 
Since the pub- lication of our initial results [84], other efforts have investigated using un- supervised approaches to summarize bug reports [56, 59]. While they claim improvement in the analytical results over our approach, none of them has subjected their work to a human task-based evaluation. 1.2 Summarizing Structured Software Artifacts We focused on the summarization of code as an example of a software ar- tifact which contains mostly structured information, but also some natural language information in the form of comments and identifier names. Pro- grammers must typically perform substantial investigations of code to find parts that are related to the task-at-hand. It has been observed that pro- grammers performing change tasks have particular difficulty handling source code that crosscuts several modules in the code [7]. Such crosscutting code which is scattered across different modules of the system, is often referred to as a concern and is usually implementing a specific feature in the system, e.g., logging or synchronization. Figure 1.3 shows how each module of the system might be intersected with several crosscutting concerns. We chose to focus on summarization support for crosscutting code concerns. We hypoth- esize that a natural language summary of a concern allows a programmer to make quick and accurate decisions about the relevance of the concern to a change task being performed. Our proposed approach automatically produces a summary about a crosscutting concern given a list of methods generated by an existing concern identification technique. The aim of the generated summary is to allow a programmer to make quick and accurate decisions about the relevance of a concern to a change task being performed. 1.2.1 Approach Our abstractive approach to summarize a software concern consists of three steps. First, we extract structural and natural language information from the source code to build a unified semantic representation of the concern 7 1.2. Summarizing Structured Software Artifacts Security Synchronization Logging ConcernsCode modules Figure 1.3: Crosscutting code concerns integrating both types of information. Second, we apply a set of heuristics to the extracted information to find patterns and identify salient code elements in the concern code. Finally, using both the extracted information and the content produced from the heuristics, we generate the sentences that form the summary. Figure 1.4 shows an example summary created by our approach for a concern implementing undo feature in JHotDraw, an open source graphical framework.3 1.2.2 Evaluation To determine if generated summaries can help programmers investigate code relevant to a given change task, we conducted a laboratory study in which eight programmers performed two software change tasks on two different software systems. For the second task performed, the programmers had access to concern summaries. We found that when concern summaries were available, the programmers found it easier to perform the specified task and were more confident about their success in completing the task. We also found evidence that the programmers were able to effectively use the concern summaries to locate code pertinent to the change task. 3www.jhotdraw.org, verified 12/12/12 8 1.2. Summarizing Structured Software Artifacts           Summary of ‘Undo’ Feature Implementation  1: The ‘Undo’ feature is implemented by at least 22 methods [show/hide].  
2: This feature provides ‘undoing’ functionality for ‘Select All Command’, ‘Connected Text Tool’, etc. [show/hide].  3: The implementation of the ‘Undo’ feature highly depends on the following code element(s): 4: 5: • org.jhotdraw.util.UndoableAdapter.undo(). • org.jhotdraw.util.Undoable.undo().   6: All of the methods involved in implementing ‘Undo’: 7: • are named ‘undo’. 8: • override method org.jhotdraw.util.UndoableAdapter.undo(). 9: 10: • override method org.jhotdraw.util.Undoable.undo(). • are a member of a class named ‘UndoActivity’.   [3 other patterns involving all methods]    [6  patterns involving all but one of methods]   11: Around half of the methods involved in implementing ‘Undo’ call one or more of the following methods: 12: • org.jhotdraw.framework.DrawingView.clearSelection(). 13: • org.jhotdraw.util.UndoableAdapter.getAffectedFigures().  [2 other patterns involving about half of methods]   Figure 1.4: A part of summary of ‘Undo’ concern in JHotDraw. 1.2.3 Related Work While our focus is on summarizing concern code which often crosscuts mod- ules defined in the code, other efforts have investigated summarization for localized code, e.g. summarizing a class, a method, or a package. Exam- ples include Haiduc and colleagues [41] who generate term-based summaries for methods and classes consisting of a set of the most relevant terms to describe the class or method. Sridhara and colleagues [95] proposed a tech- nique to generate descriptive natural language summary comments for an arbitrary Java method by exploiting structural and natural language clues in the method. As opposed to our approach, none of this work has evaluated the generated summaries in the context of a software task. 9 1.3. Multi-document Summarization of Software Artifacts 1.3 Multi-document Summarization of Software Artifacts A software artifact is often inter-related to other software artifacts, possibly from different types (e.g., a piece of code related to a bug report) or at different levels of abstraction (e.g., a design diagram related to a requirement document). Sometimes, a developer has to navigate a network of inter-related soft- ware artifacts to address an information need. For example, requirement documents, design diagrams and source code files might need to be inves- tigated to understand how a certain feature has been implemented in the code. As another example, a software developer who is looking for the high- level information on the motivation behind a code change typically has to investigate a hierarchy of project documents starting from a bug report up to inter-related design and requirement documents. Finding and understand- ing motivational information that is spread across multiple natural language documents is time consuming and cognitively demanding and thus seldom happens in the context of a code change. We hypothesize that a concise summary of documents related to the reason behind a code change can make it easier for a developer to under- stand why code was changed in a certain way. We see this problem as a special case of multi-document summarization of natural language software artifacts. Thus we propose the use of multi-document extractive summariza- tion techniques, previously applied to generic natural language documents (e.g., [79]). 
1.3.1 Approach Multi-document summarization techniques often deal with summarizing a set of similar documents that are likely to repeat much the same information while differing in certain parts; for example, news articles published by dif- ferent news agencies covering the same event of interest (e.g., [49, 105]). Our approach is different in that it investigates summarizing a set of documents each at a different level of abstraction. We modeled the set of documents as a hierarchical chain to account for the fact that they are at different 10 1.3. Multi-document Summarization of Software Artifacts As a CONNECT Adopter, I want to process and send large payload sizes of up to 1 GB to meet the requirements of all my use cases (User Story: EST010) -- Create a pilot implementation for streaming large files for the Document Submission service -- When an end to end run of document submission is run, no temporary files should exist in the file system after the transaction has been completed. Figure 1.5: A summary describing the reason behind a code change produced for a collection of project documents related to the code change. levels of abstraction. We took a supervised approach to extract the most important sentences. We created a corpus by asking human annotators to create summaries for 8 different chains. We then trained a classifier on the corpus based on eight sentence-level features we identified. Figure 1.5 shows a summary created by this classifier for a collection of project documents related to a code change. In Section 5.2, we discuss how such a summary can help a developer understand the reason behind a code change. 1.3.2 Evaluation To evaluate whether generated summaries help developers in understanding the motivation behind a code change, we conducted an exploratory user study. We generated summaries for a few change-related chains and asked the developer who had made the change to evaluate if the summary described the motivation behind the code change. The developers participating in the study found the summaries to contain relevant information about the reason behind the code changes and suggested improving the summaries by including more technical details. 1.3.3 Related Work Various approaches have addressed analyzing source code changes to gain insight about past and current states of a software project. Examples in- clude identification (e.g., [34]), impact analysis (e.g., [76]) and visualization (e.g., [103]) of code changes. While these approaches mainly focus on the ‘what’ and ‘how’ of a code change, the approach presented in this paper 11 1.4. Contributions tries to address the ‘why’ behind a code change. 1.4 Contributions The work presented in this dissertation makes the following contributions to the field of software engineering: • It demonstrates end-to-end that reasonably accurate natural language summaries can be automatically produced for different types of soft- ware artifacts: bug reports (as an example of mostly natural language software artifacts), software concerns (as an example of mostly struc- tured software artifacts) and a chain of documents related to a code change (as an example of inter-related natural language software arti- facts) and that the generated summaries are useful for developers. • It presents approaches developed to automatically summarize different types of software artifacts: – An extractive summarization approach to summarize bug reports based on an existing supervised summarization system for con- versational data. 
– A novel abstractive summarization approach to summarize cross- cutting code concerns. – A supervised extractive summarization approach to summarize a chain of project documents related to a code change based on various features burrowed from multi-document summarization. • It demonstrates that summaries generated by the proposed summa- rization approaches help developers in performing software tasks: – It demonstrates that the generated bug report summaries help developers in the context of a particular software task, bug report duplicate detection. – It demonstrates that the generated concern summaries help de- velopers in performing code change tasks. 12 1.5. Organization – It provides initial evidence that developers find summaries gen- erated for a chain of project documents related to a code change to be indicative of the motivation behind the code change. 1.5 Organization In Chapter 2 we give an overview of background information on summariza- tion systems and related work on summarization of software artifacts. In the remainder of the dissertation, we discuss the approaches taken to summarize each different type of software artifact and how the good- ness of summaries were evaluated in each case. In Chapter 3 we present the details of the technique used in summarizing bug reports, including the conversation-based supervised summarization framework used to train three different classifier using three different datasets (Section 3.3) along with the process of creating the bug report corpus (Section 3.2). We describe how three different classifiers are evaluated by comparing their generated sum- maries against human-generated summaries (Section 3.4). We describe the process of using human judges to evaluate the generated summaries by ask- ing them to rank the goodness of summaries against the original bug reports (Section 3.5.1). We provide details of our task-based human evaluation in which we investigated whether provision of bug report summaries helps par- ticipants in performing bug report duplicate detection tasks (Section 3.5.2). In Chapter 4 we discuss the summarization of crosscutting code con- cerns. We present the details of the abstractive summarization approach (Section 4.3) including steps to extract information and build a semantic representation of the concern, find patterns and salient code elements and generate summary sentences. We describe the task-based user study con- ducted to evaluate whether concern summaries help developers in performing software change tasks (Section 4.4). In Chapter 5 we discuss the multi-document summarization of a set of documents related to a code change. We discuss sentence-level features used to identify summary sentences (Section 5.3) and the corpus annotated as the training set (Section 5.4). We also provide details of the human evaluation conducted to investigate if developers find generated summaries of potential 13 1.5. Organization use (Section 5.5). In Chapter 6 we provide a discussion of various decisions made in the de- velopment and evaluation of various summarization techniques and various future research directions. In Chapter 7, we summarize the contributions of the work described in this dissertation. 14 Chapter 2 Background and Related Work With electronically available information growing at an exponential pace, there is substantial interest and a substantial body of work on automatic summarization systems. 
Such systems are designed to alleviate the problem of information overload by providing a concise natural language summary containing the most important information in one or more documents. Over the recent years, numerous summarization systems have been developed for various documents including news (e.g., [63, 80]), medical information (e.g., [27, 28]), email threads (e.g., [38, 81]), meeting transcripts (e.g., [68, 70]), scientific articles (e.g., [64, 77]), etc. In Section 2.1, we present a general overview of summarization approaches. Similarly, in the development of a software system, large amounts of new information in the form of different software artifacts are produced on a continuous basis. Source code, bug reports, documentation, mailing list discussions, wiki entries, etc., are created by developers of the software system on a daily basis. As a result, over the past five years there has been an interest in automatic generation of summaries for various software artifacts in the software engineering community. We give an overview of these techniques in Section 2.2. Summarization of software artifacts is an example of application of text analysis techniques to assist software developers in performing software tasks. Other efforts have considered the use of text analysis to help develop- ers in tasks like traceability, concept location, code search and navigation, duplicate bug report detection, etc. We discuss some of the applications of text analysis techniques in software engineering research in Section 2.3. 15 2.1. Automatic Summarization 2.1 Automatic Summarization In this section, we present a general overview of background information on automatic summarization systems. 2.1.1 Extractive vs. Abstractive Summarization Producing a concise and fluent abstract emulating a human-written sum- mary requires semantic interpretation of input documents and the ability to modify and merge information, which is beyond the state of the art for automatic summarization of natural language documents [72]. Most exist- ing summarization approaches rely on extractive techniques where the main focus is on identifying important sentences that should be included in the summary. In extractive summarization, the task of selecting important sentences can be represented as a binary classification problem, partitioning all sen- tences in the input into summary and non-summary sentences. One way to distinguish between various extractive summarization approaches is based on whether they use unsupervised or supervised classification techniques. The advantage of using an unsupervised technique is that there is no need for a human annotated corpus. Examples of unsupervised summarization systems include topic representation approaches that rely on representing input documents by the topics discussed in the text (e.g., [55, 94]). Another example is graph-based methods (e.g. [30, 66]) in which input text is rep- resented as a graph of inter-related sentences and the importance of each sentence is derived from its centrality in the graph. The main drawback of unsupervised methods is their inability to use any number of features. To tackle this problem, Kupiec and colleagues [50] proposed the use of super- vised machine learning for summarization to provide the freedom to use and combine any desired number of features. A supervised approach relies on the availability of a document/summary corpus. The statistical analysis of the corpus determines how features should be weighted relative to each other. 
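Concretely, a supervised extractive summarizer of this kind can be sketched as a binary sentence classifier. The following minimal example (hypothetical code assuming scikit-learn and a small corpus of sentences labeled as summary-worthy or not; it is not the summarizer used in this dissertation) trains a classifier on a handful of simple sentence features of the kind named next and keeps the top-scoring sentences:

```python
# Minimal sketch of supervised extractive summarization: sentences are
# represented by a few simple features and a classifier is trained on a
# corpus of sentences labeled as summary / non-summary. Hypothetical
# illustration (assumes scikit-learn); not the summarizer used in this work.
from sklearn.linear_model import LogisticRegression

def sentence_features(sentence, relative_position, topic_words):
    """Illustrative features: sentence length, position in the document,
    and the number of topic words the sentence contains."""
    words = sentence.lower().split()
    return [len(words),
            relative_position,
            sum(1 for w in words if w in topic_words)]

def train_summarizer(labeled_documents, topic_words):
    """labeled_documents: list of (sentences, labels) pairs, where labels[i]
    is 1 if annotators selected sentence i for the gold-standard summary."""
    X, y = [], []
    for sentences, labels in labeled_documents:
        for i, sentence in enumerate(sentences):
            X.append(sentence_features(sentence, i / len(sentences), topic_words))
            y.append(labels[i])
    return LogisticRegression(max_iter=1000).fit(X, y)

def summarize(classifier, sentences, topic_words, max_sentences=5):
    """Score every sentence and return the top-scoring ones in document order."""
    features = [sentence_features(s, i / len(sentences), topic_words)
                for i, s in enumerate(sentences)]
    scores = classifier.predict_proba(features)[:, 1]  # P(summary sentence)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:max_sentences])]
```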
Length of a sentence, the position of a sentence in text and the number of topic words in a sentence are all examples of features. Extractive summarization has several drawbacks both in terms of con- 16 2.1. Automatic Summarization tent and linguistic quality. Because full sentences are taken out of input documents to form a summary, unnecessary details may be added along with salient information. Also since sentences are taken out of context, ex- tractive summaries usually lack coherence and referential clarity. To address these problems, there has recently been a focus on moving towards the al- ternative approach of abstractive summarization. Sentence fusion (e.g., [8]) and compression (e.g., [100]) have been introduced to improve the content of extractive summaries by focusing on rewriting techniques. However, a fully abstractive approach involves an intermediate step of building a semantic representation based on input information, then selecting salient content and finally generation of sentences. An abstractive summarization system can be categorized as a text-to-text or data-to-text generation approach. Data-to-text generation approaches have been used to summarize weather forecast [85], engineering [109] and medical data [75]. In this dissertation, we used supervised extractive summarization ap- proaches to summarize natural language software artifacts (Chapter 3 and Chapter 5). We used an abstractive data-to-text approach to summarize code concerns (as an example of structured software artifacts) because the highly structured nature of software code enabled us to build an intermedi- ate semantic representation (Chapter 4). 2.1.2 Single Document vs. Multi-document Summarization While single document summarization systems deal with producing a sum- mary of one document, multi-document summarization involves produc- ing a summary of a collection of (related) documents. Multi-document summarization is typically motivated by the case of summarizing a col- lection of news articles covering the same event of interest [39]. Conse- quently, most multi-document summarization system rely on content sim- ilarity among documents. Different approaches have considered extrac- tive multi-document summarization. Examples include centroid-based (e.g., MEAD [79]), cluster-based (e.g., [105]) and graph-based (e.g., LexPageR- ank [29]) summarization techniques. A centroid is a set of words that are statistically important to a collection of documents. In MEAD, a centroid is defined as a pseudo-document which consists of words with TF-IDF scores 17 2.1. Automatic Summarization above a pre-defined threshold [79]. The idea is that sentences that are more similar to the centroid are more important as they are more representative of the topic of the input documents. In cluster-based techniques, first clus- ters of similar sentences are formed and then clusters, treated as topics, are ranked and as the final step one representative sentence is selected form each main cluster. In graph-based techniques, input documents are represented as a graph of sentences that are related to each other. Each sentence is scored based on its centrality in the graph (e.g. sentences that are similar to many other sentences [29]). 
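To make the centroid idea concrete, the sketch below scores each sentence of a document collection by its cosine similarity to a thresholded TF-IDF centroid. It is a hypothetical simplification in the spirit of MEAD, assuming scikit-learn, rather than the actual MEAD system:

```python
# A simplified illustration of centroid-based multi-document scoring in the
# spirit of MEAD (not the MEAD implementation itself). Assumes scikit-learn.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def centroid_summary(documents, num_sentences=3):
    """documents: list of strings. Returns the sentences most similar to the
    TF-IDF centroid of the whole collection, in their original order."""
    # Naive sentence splitting, for illustration only.
    sentences = [s.strip() for doc in documents
                 for s in doc.split('.') if s.strip()]
    vectorizer = TfidfVectorizer(stop_words='english')
    sentence_vectors = vectorizer.fit_transform(sentences)

    # The centroid: the average TF-IDF vector, keeping only highly weighted
    # terms (a stand-in for MEAD's threshold on term scores).
    centroid = np.asarray(sentence_vectors.mean(axis=0)).ravel()
    positive = centroid[centroid > 0]
    if positive.size:
        centroid[centroid < np.percentile(positive, 75)] = 0.0

    # Sentences closer to the centroid are treated as more representative.
    scores = cosine_similarity(sentence_vectors, centroid.reshape(1, -1)).ravel()
    top = np.argsort(scores)[::-1][:num_sentences]
    return [sentences[i] for i in sorted(top)]
```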
Our approach to summarize a collection of natural language software artifacts (Chapter 5) is different from most other studied cases of multi- document summarization as we investigated cases where there is hierarchical relationships between the artifacts in the input collection and less degree of redundancy in terms of content. 2.1.3 Generic vs. Domain-Specific Summarization Generic summarization does not make any assumption about the content, structure or any other characteristic of input documents. On the other hand, domain-specific summarization deals with documents with particular char- acteristics (e.g., input documents with particular format or content) and utilizes these characteristics to more accurately identify important informa- tion. Examples include summarization of conversational data (e.g., [19, 70]), scientific articles (e.g., [77]) and medical documents (e.g., [3]). Research in the summarization of conversational data [17] ranges from summarization of email threads (e.g., [38, 81, 104]) to summarization of meetings (e.g., [70]) and phone conversations (e.g., [111]). Using the conver- sational features to summarize such data was first proposed by Rambow and colleagues [81]. They showed that using their supervised machine learning approach, best results were achieved when conversational features related to an email thread (e.g., the number of recipients) are added to features used by a generic text summarizer. Carenini and colleagues [38] proposed the use of a novel graph-based representations of an email thread capturing how each individual email mentions other emails. Clue words are defined as reoccurring words in adjacent emails in the graph and sentences contain- 18 2.1. Automatic Summarization ing more clue words are scored higher. We used an idea similar to clue words in multi-document summarization of natural language software arti- facts (Chapter 5) by using the overlap between different artifacts to identify important sentences. In a later work, Murray and Carenini [67] developed a summarizer for conversations in various modalities that uses features inherent to all multi- party conversations. They applied this system to meetings and emails and found that the generic conversation summarizer was competitive with state- of-the-art summarizers designed specifically for meetings or emails. We used this general conversation summarizer to generate summaries of bug report (as our studied case of natural language software artifacts) in Chapter 3. All the approaches proposed in this dissertation to summarize software artifacts fall under the category of domain-specific summarization as they make use of certain structure or particular content of input data. 2.1.4 Evaluation Methods Methods for evaluating the goodness of automatically generated summaries can be categorized either as intrinsic or extrinsic. In intrinsic evaluation techniques, a summary is evaluated in its own right, typically by being compared to a reference set of model summaries. A model summary is often a human-generated summary or a baseline summary (e.g., the first few sentences of a news article). In extrinsic evaluation techniques, it is investigated whether summaries help an end user (or a tool) perform a task better. Intrinsic evaluation involves computing analytical measures like preci- sion, recall, pyramid precision [73] and ROUGE [54]. Extrinsic evaluation often involves a task-based human evaluation to establish that summariza- tion systems are indeed helpful in a the context of a real-life task. 
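As a concrete illustration of the intrinsic measures named above, the following sketch (hypothetical code, not the evaluation scripts used in this work) computes sentence-level precision and recall of a machine-extracted summary against a gold-standard extractive summary:

```python
# Sketch of intrinsic evaluation for extractive summaries: sentence-level
# precision and recall against a gold-standard set of sentence ids.
# Hypothetical example, not the evaluation code used in this dissertation.

def precision_recall(extracted_ids, gold_ids):
    """extracted_ids: ids of sentences chosen by the summarizer.
    gold_ids: ids of sentences chosen by human annotators."""
    extracted, gold = set(extracted_ids), set(gold_ids)
    hits = len(extracted & gold)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

# Example: a 4-sentence machine summary compared to a 5-sentence gold standard.
print(precision_recall([1, 4, 7, 9], [1, 2, 4, 9, 12]))  # (0.75, 0.6, ...)
```

Extrinsic evaluation, in contrast, cannot be reduced to such a computation and instead relies on observing users performing a task, as in the earlier efforts described next.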
Examples of earlier efforts include TIPSTER Text Summarization Evaluation (SUM- MAC) [58] in which summaries were shown to be helpful in the context of the task of judging if a particular document is relevant to a topic of interest. McKeown and colleagues [62] conducted a user study in which participants were given a task of writing reports on specific topics. They found that when summaries were provided people tended to write better reports and 19 2.2. Summarization for Software Artifacts reported higher satisfaction. In this dissertation, we used both intrinsic and extrinsic evaluation tech- niques to assess the summaries generated for software artifacts. We used analytical measures of precision, recall and pyramid precision to intrinsi- cally evaluate bug report summaries against the gold standard summaries in our bug report corpus (Section 3.4). We conducted three user studies to extrinsically evaluate software artifact summaries. Bug reports summaries were evaluated in the context of bug report duplicate detection tasks (Sec- tion 3.5.2). Crosscutting code concern summaries were evaluated in the con- text of code modification tasks (Section 4.4). Multi-document summaries of project documents were evaluated by asking developers participating in the study whether each summary describes the reason behind the corresponding code change (Section 5.5). 2.2 Summarization for Software Artifacts Existing approaches for summarizing software artifacts have mostly focused on bug reports or source code. Since the publication of our work on sum- marizing bug reports [84] (Chapter 3), two separate approaches, both based on unsupervised summarization techniques, have been proposed. Lotufo and colleagues [56] investigated an unsupervised approach based on the PageRank algorithm for extractive summarization of bug reports. With their approach, sentences in a bug report are ranked based on such features as whether the sentence discusses a frequently discussed topic or whether the sentence is similar to the bug reports title and description. Their pa- per reports on an analytic evaluation that shows a 12% improvement over our supervised summarization approach. In a separate work, Mani and colleagues [59] applied four unsupervised summarization approaches to bug reports. They showed that three of these approaches, MMR, DivRank and Grasshopper algorithms, worked at par with our proposed supervised ap- proach. None of these approaches have been subjected to a human task- based evaluation. Other work on summarizing software artifacts has focused on produc- ing summaries of source code. Haiduc and colleagues [41] used techniques 20 2.3. Text Analysis in Software Engineering based on Vector Space Model (VSM) and Latent Semantic Indexing (LSI) to generate term-based summaries for methods and classes. Such a summary contains a set of the most relevant terms to describe a class or method. Sridhara and colleagues [95] proposed a technique to generate descriptive natural language summary comments for an arbitrary Java method by ex- ploiting structural and natural language clues in the method. As the first step, they choose the important or central code statements to be included in the summary comment. Then, for a selected code statement, they use text generation to determine how to express the content in natural language phrases and how to smooth between the phrases and mitigate redundancy. 
Our approach for summarizing code (Chapter 4) differs in targeting auto- mated summarization for non-contiguous blocks of code, namely crosscut- ting code for a concern. We also applied a task-based evaluation rather than relying on human judges of summaries out of the context of use of a summary. 2.3 Text Analysis in Software Engineering The natural language content of software artifacts provides semantic infor- mation necessary to develop and maintain a software system. Consequently, various efforts have considered the use of text analysis to make it easier for developers to benefit from the natural language information contained in various software artifacts. Text analysis borrows techniques from differ- ent areas including information retrieval (IR), natural language processing (NLP), machine learning and the Semantic Web. It also can be integrated with various static and dynamic source code analysis techniques. Text anal- ysis techniques have been used to assist developers in performing various software tasks including concern location [93], source code search and nav- igation [43], traceability [60], duplicate bug report detection [91] and spec- ification extraction [110]. In this section we mainly focus on approaches aimed at extracting semantic information from natural language content of software artifacts (code in particular) since the same techniques can be used in abstractive summarization of software artifacts. Various techniques have been used to extract semantic information from 21 2.3. Text Analysis in Software Engineering natural language content of the code. For example, Tan and colleagues [98] used a combination of part of speech (POS) tagging, semantic role labeling and clustering to extract information from rule-containing comments. An example of such a comment is one that requires a lock to be acquired before calling a function. Zhong and colleagues [110] inferred specifications from API documentations by using Named Entity Recognition and chunk tagging to extract (resource, action) pairs. The extracted information was then used to locate bugs in the code. Fry and colleagues [37] used POS tagging, chunk- ing and pattern matching to analyze comments and identifiers of a method and extract Verb-DO (Direct Object) pairs. This extracted information then was used to identify action-oriented concerns [93] and to generate summary comments for methods [95]. Hill and colleagues [44] extended the Verb-DO extraction technique to extract all phrasal concepts (e.g., noun phrases or propositional phrases) to capture word context of natural language queries and make code searches more efficient. Similar text analysis techniques can be used to extract semantic information from code to be included in a con- cern summary. For example, we used Verb-DO extraction to identify the feature implemented by the concern (Section 4.3). 22 Chapter 3 Summarization of Bug Reports We studied bug report summarization to explore the problem of summa- rizing software artifacts with mostly natural language content. A software project’s bug repository provides a rich source of information for a software developer working on the project. For instance, the developer may consult the repository to understand reported defects in more details, to understand how changes were made on the project in the past, or to communicate with other developers or stakeholders involved in the project [9]. 
In this chapter, we investigate whether concise summaries of bug reports, automatically produced from a complete bug report, would allow a devel- oper to more efficiently investigate information in a bug repository as part of a task. Our approach is based on an existing extraction-based supervised summarization system for conversational data [67]. Using this summariza- tion system, a classifier trained on a human-annotated corpus can be used to generate summaries of bug reports. We start by motivating the need for summarization of bug reports in Section 3.1. In Section 3.2, we discuss the human-annotated bug report corpus we created. In Section 3.3, we present details of our summarization approach. We discuss the evaluation results in Section 3.4 (analytic evaluation) and Section 3.5 (human evaluations). 3.1 Motivation When accessing the project’s bug repository, a developer often ends up look- ing through a number of bug reports, either as the result of a search or a recommendation engine (e.g., [96, 102]). Typically, only a few of the bug reports a developer must peruse are relevant to the task at hand. Some- 23 3.1. Motivation Table 3.1: Statistics on summary-worthy bug reports for several open source projects, computed for bug reports created over the 24-month period of 2011-2012. Project #all bug reports #bug reports longer than 300 words Eclipse Platform 7,641 2,382 (32%) Firefox 10,328 3,310 (32%) Thunderbird 6,225 2,421 (39%) times a developer can determine relevance based on a quick read of the title of the bug report, other times a developer must read the report, which can be lengthy, involving discussions amongst multiple team members and other stakeholders. For example, a developer using the bug report dupli- cate recommender built by Sun and colleagues [96] to get a list of potential duplicates for bug #564243 from the Mozilla system4, is presented with a total of 5125 words (237 sentences) in the top six bug reports on the recom- mendation list. Perhaps optimally, when a bug report is closed, its authors would write a concise summary that represents information in the report to help other developers who later access the report. Given the evolving nature of bug repositories and the limited time available to developers, this optimal path is unlikely to occur. As a result, we investigate the automatic production of summaries to enable generation of up-to-date summaries on-demand and at a low cost. Bug reports vary in length. Some are short, consisting of only a few words. Others are lengthy and include conversations between many devel- opers and users. Figure 3.1 displays part of a bug report from the KDE bug repository5; the entire report consists of 21 comments from 6 people. Developers may benefit from summaries of lengthy bug reports but are unlikely to benefit from summaries of short bug reports. If we target sum- maries of 100 words, a common size requirement for short paragraph-length summaries [74], and we assume a compression rate of 33% or less is likely beneficial, then a bug report must be at least 300 words in length to achieve the 33% compression rate. Table 3.1 shows the number and percentage of 4www.mozilla.org, verified 04/04/12 5bugs.kde.org, verified 04/04/12 24 3.1. Motivation Figure 3.1: The beginning part of bug #188311 from the KDE bug reposi- tory. summary worthy bug reports in three popular and large-scale open source software projects, computed over the two-year period of 2011-2012. 
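A sketch of the filter behind counts like those in Table 3.1 follows; the 300-word threshold comes from targeting 100-word summaries at a compression rate of at most 33%. The field names are hypothetical placeholders for whatever a bug tracker's API returns, and the sketch is an illustration rather than the script used to produce the table.

# Sketch: decide whether a bug report is "summary worthy", i.e., long
# enough that a 100-word summary gives a compression rate of <= 33%.
TARGET_SUMMARY_WORDS = 100
MAX_COMPRESSION_RATE = 1 / 3  # summary at most a third of the report

def report_length(description: str, comments: list) -> int:
    """Length of a bug report = words in its description plus all comments."""
    text = " ".join([description] + list(comments))
    return len(text.split())

def is_summary_worthy(description: str, comments: list) -> bool:
    min_words = TARGET_SUMMARY_WORDS / MAX_COMPRESSION_RATE  # 300 words
    return report_length(description, comments) >= min_words

Applying such a filter to the projects listed in Table 3.1 gives the proportions reported there.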
In all these projects, almost one third of the bug reports created over the 24-month period are longer than 300 words, suggesting that enough lengthy bug reports exist to make the automatic production of summaries worthwhile. (The length of a bug report is the total number of words in its description and comments.) Many existing text summarization approaches could be used to generate summaries of bug reports. Given the strong similarity between bug reports and other conversational data (e.g., email and meeting discussions), we chose to investigate whether an existing supervised approach for generating extractive summaries for conversation-based data [67] can produce accurate summaries for bug reports. The conversational nature of bug reports (Figure 3.1) plays an important role in facilitating the communication involved in resolving bugs, both in open source projects [92] and in collocated teams [9].

3.2 Bug Report Corpus

To be able to train, and judge the effectiveness of, an extractive summarizer on bug reports, we need a corpus of bug reports with good summaries. Optimally, such a corpus would be available in which the summaries were created by those involved with the bug report, as the knowledge these individuals have of the system and the bug should be the best available. Unfortunately, such a corpus does not exist because developers do not spend time writing summaries once a bug is complete, despite the fact that the bug report may be read and referred to in the future. To provide a suitable corpus, we recruited ten graduate students from the Department of Computer Science at the University of British Columbia to annotate a collection of bug reports. On average, the annotators had seven years of programming experience. Half of the annotators had experience programming in industry and four had some experience working with bug reports.

3.2.1 Annotation Process

We had each individual annotate a subset of bugs from four different open-source software projects: Eclipse Platform, Gnome,6 Mozilla and KDE. We chose a diverse set of systems because our goal is to develop a summarization approach that can produce accurate results for a wide range of bug repositories, not just bug reports specific to a single project. The 36 bug reports (nine from each project) were chosen randomly from the pool of all bug reports containing between 5 and 25 comments with two or more people contributing to the conversation. The bug reports have mostly conversational content; we avoided selecting bug reports consisting mostly of long stack traces and large chunks of code as we are targeting bug reports with mainly natural language content. There are a total of 2361 sentences in these 36 bug reports. This corpus size is comparable to the size of the corpora in other domains used to train similar classifiers. For example, the Enron email corpus, used to train a classifier to summarize email threads, contains 39 email threads and 1400 sentences [67]. The reports chosen varied in length: 13 (36%) reports had between 300 and 600 words; 14 (39%) reports had 600 to 900 words; 6 (17%) reports had between 900 and 1200 words; the remaining three (8%) were between 1200 and 1500 words in length.

6www.gnome.org, verified 04/04/12
The bug reports also had different numbers of comments: 14 reports (39%) had between five and nine comments; 11 (31%) reports had 10 to 14 comments; 5 (14%) reports had between 15 to 19 comments; the remaining six (16%) reports had 20 to 25 comments each. The reports also varied when it came to the number of people who contributed to the conversation in the bug report: 16 (44%) bug reports had between 2 and 4 contributors; 16 (44%) other had 5 to 7 contributors; the remaining four (12%) had 8 to 12 contributors. Nine of the 36 bug reports (25%) were enhancements to the target system; the other 27 (75%) were defects. Each annotator was assigned a set of bug reports from those chosen from the four systems. For each bug report, we asked the annotator to write an abstractive summary of the report using their own sentences that was a maximum of 250 words. We limited the length of the abstractive summary to motivate the annotator to abstract the given report. The annotator was then asked to specify how each sentence in the abstractive summary maps (links) to one or more sentences from the original bug report by listing the numbers of mapped sentences from the original report. The motivation be- hind asking annotators to first write an abstractive summary (similar to the technique used to annotate the AMI meeting corpus [20]) was to make sure they had a good understanding of the bug report before mapping sentences. Asking them to directly pick sentences may have had the risk of annotators just selecting sentences that looked important without first reading through the bug report and trying to understand it. Although the annotators did not 27 3.2. Bug Report Corpus Table 3.2: Abstractive summaries generated by annotators. mean stdv #sentences in the summary 5.36 2.43 #words in the summary 99.2 39.93 #linked sentences from the bug report 16.14 9.73 have experience with these specific systems, we believe their experience in programming allowed them to extract the gist of the discussions; no annota- tor reported being unable to understand the content of the bug reports. The annotators were compensated for their work. The annotation instructions used by the annotators has been provided in Appendix A. To aid the annotators with this process, the annotators used a version of BC3 web-based annotation software7 that made it easier for them to manipulate the sentences of the bug report. Figure 3.2 shows an example of part of an annotated bug report; the summary at the top is an abstractive summary written by an annotator with the mapping to the sentences from the original bug report marked. The annotated bug report corpus is publicly available.8 3.2.2 Annotated Bugs On average, the bug reports being summarized comprised 65 sentences. On average, the abstractive summaries created by the annotators comprised just over five sentences with each sentence in the abstractive summaries linked (on average) to three sentences in the original bug report. Table 3.2 provides some overall statistics on the summaries produced by the annotators. A common problem of annotation is that annotators often do not agree on the same summary. This reflects the fact that the summarization is a subjective process and there is no single best summary for a document— a bug report in this paper. To mitigate this problem, we assigned three annotators to each bug report. We use the kappa test to measure the level 7www.cs.ubc.ca/nest/lci/bc3/framework.html, verified 04/04/12 8See www.cs.ubc.ca/labs/spl/projects/summarization.html. 
The corpus contains additional annotations, including an extractive summary for each bug report and labeling of the sentences. 28 3.2 . B u g R ep ort C orp u s Figure 3.2: A screenshot of the annotation software. The bug report has been broken down into labeled sentences. The annotator enters the abstractive summary in the text box. The numbers in the brackets are sentence labels and serve as links between the abstractive summary and the bug report. For example, the first sentence of the abstractive summary has links to sentences 1.4, 11.1, 11.2, 11.3 from the bug report. 29 3.3. Approach of agreement amongst the annotators with regards to bug report sentences that they linked in their abstractive summaries [33]. The result of the kappa test (k value) is 0.41 for our bug report annotations, showing a moderate level of agreement. Table 3.3: The questions asked from an annotator after annotating a bug report. Question Average What was the level of difficulty of summarizing the bug report? 2.67 (±0.86) What was the amount of irrelevant and off-topic discussion in the bug report? 2.11 (±0.66) What was the level of project-specific terminology used in the bug report? 2.68 (±0.83) We asked each annotator, at the end of annotating each bug report, to answer a number of questions (first column in Table 3.3) about properties of the report. They answered each question using a scale of 1 to 5 (with 1 low, 3 medium and 5 high). The second column of Table 3.3 shows the average score computed across all the annotated bug reports. These scores (with standard deviation taken into account) indicate that the annotators did not find the bug reports to be difficult to summarize, to consist of a lot of off-topic discussion or to include much project-specific terminology. Using an ordinal scale, we have not accounted for the subjectivity of human opinions. For example, one annotator’s perception of off-topic discussion in bug reports might be different from the perception of another annotator. A more qualitative study is needed to understand how different people ap- proach an annotation (summarization) task and what might make the task difficult for them. 3.3 Approach The bug report corpus provides a basis on which to experiment with pro- ducing bug report summaries automatically. We produce summaries using binary classifiers that consider 24 sentence features (Section 3.3.1). It is 30 3.3. Approach based on values of these features, computed for each sentence, that it is determined whether the sentence should be included in the summary. To assign a weight to each feature, a classifier first has to be trained on human generated summaries. We set out to investigate two questions: 1. Can we produce good summaries with existing conversation-based clas- sifiers? 2. Can we do better with a classifier specifically trained on bug reports? The existing conversation-based classifiers we chose to investigate are trained on conversational data other than bug reports. The first classifier, which we refer to as EC(Email Classifier), was trained on email threads [67]. We chose this classifier as bug report conversations share similarity with email threads, such as being multi-party and having thread items added at differing intervals of time. This classifier was trained on a subset of the publicly available Enron email corpus [47], which consists of 39 annotated email threads (1400 sentences in total). 
The second classifier, which we refer to as EMC(Email & Meeting Clas- sifier), was trained on a combination of email threads and meetings [67]. We chose this classifier because some of the characteristics of bug reports might be more similar to meetings, such as having concluding comments at the end of the conversation. The meetings part of the training set for EMC is a subset of the publicly available AMI meeting corpus [20], which includes 196 meetings. The EC and EMC classifiers are appealing to use because of their gen- erality. If these classifiers work well for bug reports, it offers hope that other general classifiers might be applicable to software project artifacts without training on each specific kind of software artifacts (which can vary between projects) or on project-specific artifacts, lowering the cost of producing sum- maries. However, unless these classifiers produce perfect summaries, the question of how good of a summary can be produced for bug reports remains open unless we consider a classifier trained on bug reports. Thus, we also chose to train a third classifier, BRC(Bug Report Classifier), using the bug report 31 3.3. Approach SUMMARY: The applet panel should not overlap applets In amarok2-svn I like the the new contextview , but I found the new bottom bar for managing applets annoying , as it covers parts of other applets sometimes , like lyrics one , so that you miss a part of it. Could be handy to have it appear and disappear onmouseover. The applet should end where the toolbar begins. Applets should not be larger than the viewable area, if there’s an applet above it, then the lower applet should get a smaller sizehint, and resize if necessary when it’s the active applet (and therefore the only one on the screen) Basically, no applet should continue on off the screen, it should end at the panel. The bug that is being shown here is the fact that you cannot yet resize your applets, and as such we also don’t set default sizes sanely. You are reporting a bug on a non-completed feature ;) will be fixed in 2.1.1, done locally. Figure 3.3: The gold standard summary for bug #188311 from the KDE bug repository. The summary was formed by extracting sentences that were linked by two or three human annotators. corpus we created. To form the training set for BRC, we combined the three human annotations for each bug report by scoring each sentence of a report based on the number of times it has been linked by annotators. For each sentence, the score is between zero, when it has not been linked by any annotator, and three, when all three annotators have a link to the sentence in their abstractive summary. A sentence is considered to be part of the extractive summary if it has a score of two or more. For each bug report, the set of sentences with a score of two or more (a positive sentence) is called the gold standard summary. For the bug report corpus, gold standard summaries include 465 sentences, which is 19.7% of all the sentences in the corpus, and 28.3% of all words in the corpus. Figure 3.3 shows the gold standard summary for bug 188311 from the KDE bug repository, a portion of the original bug appears earlier in Figure 3.1. As we have only the bug report corpus available for both training and testing the bug report classifier, we use a cross-validation technique when evaluating this classifier. Specifically, we use a leave-one-out procedure so that the classifier used to create a summary for a particular bug report is trained on the remainder of the bug report corpus. 
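The construction of the gold standard summaries and the leave-one-out setup just described can be sketched as follows. This is a schematic illustration under the assumption that each report carries the sentence links of its three annotators; the data structures and function names are ours.

# Sketch: form a gold standard summary from three annotations and set up
# leave-one-out training for the bug report classifier.
from typing import Dict, List, Set

def gold_standard(linked_by: Dict[int, int], min_votes: int = 2) -> Set[int]:
    """linked_by maps a sentence index to the number of annotators (0-3)
    whose abstractive summaries link to it; sentences linked by at least
    two annotators form the gold standard summary."""
    return {idx for idx, votes in linked_by.items() if votes >= min_votes}

def leave_one_out(corpus: List[dict], train_fn, summarize_fn) -> List[Set[int]]:
    """Train on all reports except the held-out one, then summarize it."""
    summaries = []
    for i, report in enumerate(corpus):
        training_set = corpus[:i] + corpus[i + 1:]
        classifier = train_fn(training_set)
        summaries.append(summarize_fn(classifier, report))
    return summaries

Here train_fn and summarize_fn stand in for the classifier training and sentence selection steps described next.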
All three classifiers investigated are logistic regression classifiers. Instead of generating an output of zero or one, these classifiers generate the prob- 32 3.3. Approach ability of each sentence being part of an extractive summary. To form the summary, we sort the sentences into a list based on their probability val- ues in descending order. Starting from the beginning of this list, we select sentences until we reach 25% of the bug report word count.9 The selected sentences form the generated extractive summary. We chose to target sum- maries of 25% of the bug report word count because this value is close to the word count percentage of gold standard summaries (28.3%). All three classifiers were implemented using the Liblinear toolkit [31].10 3.3.1 Conversation Features The classifier framework used to implement EM , EMC and BRC learn based on the same set of 24 different features. The values of these features for each sentence are used to compute the probability of the sentence being part of the summary. The 24 features can be categorized into four major groups. • Structural features are related to the conversational structure of the bug reports. Examples include the position of the sentence in the comment and the position of the sentence in the bug report. • Participant features are directly related to the conversation partici- pants. For example if the sentence is made by the same person who filed the bug report. • Length features include the length of the sentence normalized by the length of the longest sentence in the comment and also normalized by the length of the longest sentence in the bug report. • Lexical features are related to the occurrence of unique words in the sentence. Table 3.4 provides a short description of the features considered. Some de- scriptions in the table refer to Sprob. Informally, Sprob provides the prob- ability of a word being uttered by a particular participant based on the 9A sentence is selected as the last sentence of the summary if the 25% length threshold is reached in the middle or at the end of it. 10www.csie.ntu.edu.tw/~cjlin/liblinear/, verified 04/04/12 33 3.3. Approach intuition that certain words will tend to be associated with one conversa- tion participant due to interests and expertise. Other descriptions refer to Tprob, which is the probability of a turn given a word, reflecting the in- tuition that certain words will tend to cluster in a small number of turns because of shifting topics in a conversation. Full details on the features are provided in [67]. To see which features are informative for generating summaries of bug reports, we perform a feature selection analysis. For this analysis, we com- pute the F statistics score (introduced by Chen and Lin [21]) for each of the 24 features using the data in the bug report corpus. This score is com- monly used to compute the discriminability of features in supervised machine learning. Features with higher F statistics scores are the most informative in discriminating between important sentences, which should be included in the summary, and other sentences, which need not be included in the summary. Figure 3.4 shows the values of F statistics computed for all the features defined in Table 3.4. The results show that the length features (SLEN & SLEN2) are among the most helpful features. Several lexical features are also helpful: CWS11, CENT1, CENT212, SMS13 & SMT14. 
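A minimal sketch of the per-feature F statistic used in this analysis is shown below; it follows the usual two-class definition of Chen and Lin [21], computed from assumed arrays holding a feature's values over summary and non-summary sentences. It is an illustration, not the analysis script used to produce Figure 3.4.

# Sketch: F statistic (Fisher score) of one feature for the two classes
# "sentence is in the gold standard summary" vs. "it is not".
from statistics import mean, variance
from typing import Sequence

def fisher_score(positive: Sequence[float], negative: Sequence[float]) -> float:
    """positive/negative: the feature's values over summary and
    non-summary sentences, respectively (at least two values each)."""
    x_all = mean(list(positive) + list(negative))
    x_pos, x_neg = mean(positive), mean(negative)
    between = (x_pos - x_all) ** 2 + (x_neg - x_all) ** 2
    within = variance(positive) + variance(negative)  # sample variances
    return between / within if within > 0 else 0.0

# Features are then ranked by fisher_score, as in Figure 3.4.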
Some features have very low F statistics because either each sentence by a participant gets the same feature value (e.g., BEGAUTH) or each sentence in a turn gets the same feature value (e.g., TPOSE1). Although a particular feature may have a low F statistics score because it does not discriminate informative versus non-informative sentences on its own, it may well be useful in conjunction with other features [67]. The distribution of F statistics scores for the bug report corpus is dif- ferent from those of the meeting and email corpi [67]. For example MXS and MXT have a relatively high value of F statistics for the email data while both have a relatively low value of F statistics for the bug report 11CWS measures the cohesion of the conversation by comparing the sentence to other turns of the conversation. 12CENT1 & CENT2 measure whether the sentence is similar to the conversation overall. 13SMS measures whether the sentence is associated with some conversation participants more than the others. 14SMT measures whether the sentence is associated with a small number of turns more than the others. 34 3.4. Analytic Evaluation Table 3.4: Features key. Feature ID Description MXS max Sprob score MNS mean Sprob score SMS sum of Sprob scores MXT max Tprob score MNT mean Tprob score SMT sum of Tprob scores TLOC position in turn CLOC position in conversation SLEN word count, globally normalized SLEN2 word count, locally normalized TPOS1 time from beginning of conversation to turn TPOS2 time from turn to end of conversation DOM participant dominance in words COS1 cosine of conversation splits, w/ Sprob COS2 cosine of conversation splits, w/ Tprob PENT entropy of conversation up to sentence SENT entropy of conversation after sentence THISENT entropy of current sentence PPAU time between current and prior turn SPAU time between current and next turn BEGAUTH is first participant (0/1) CWS rough ClueWordScore CENT1 cosine of sentence & conversation, w/ Sprob CENT2 cosine of sentence & conversation, w/ Tprob data. Similarly SLEN2 has a relatively high F statistics score for the bug report data while it has a low value of F statistics for the meeting data. These differences further motivates training a new classifier using the bug report corpus as it may produce better results for bug reports compared to classifiers trained on meeting and email data. 3.4 Analytic Evaluation To compare the EC, EMC and BRC classifiers, we use several measures that compare summaries generated by the classifiers to the gold standard summaries formed from the human annotation of the bug report corpus (Section 3.3). These measures assess the quality of each classifier and en- able the comparison of effectiveness of the different classifiers against each other. In the next section, we report on two human evaluations conducted to investigate the usefulness of summaries generated by a classifier from a human perspective. 35 3.4. Analytic Evaluation  0  0.01  0.02  0.03  0.04  0.05  0.06  0.07  0.08  0.09 SLEN CWS CENT1 CENT2 SMS SLEN2 SMT COS1 COS2 TLOC THISENT DOM MXS MNT MXT CLOC MNS PPAU SPAU SENT PENT BEGAUT TPOSE1 TPOSE2 F - s c o r e Features Bug Report Data Figure 3.4: Features F statistics scores for the bug report corpus. 3.4.1 Comparing Base Effectiveness The first comparison we consider is whether the EC, EMC and BRC clas- sifiers are producing summaries that are better than a random classifier in which a coin toss is used to decide which sentences to include in a summary. 
The classifiers are compared to a random classifier to ensure they provide value in producing summaries. We perform this comparison by plotting the receiver operator characteristic (ROC) curve and then computing the area under the curve (AUROC) [32]. For this comparison we investigate different probability thresholds to generate extractive summaries. As described in Section 3.3, the output of the classifier for each sentence is a value between zero and one showing the probability of the sentence being part of the extractive summary. To plot a point of ROC curve, we first choose a probability threshold. Then we form the extractive summaries by selecting all the sentences with probability values greater than the probability threshold. For summaries generated in this manner, we compute the false positive rate (FPR) and true positive rate (TPR), which are then plotted as a point in a graph. For each summary, TPR measures how many of the sentences present in gold standard summary (GSS) are actually chosen by the classifier. 36 3.4. Analytic Evaluation  0  0.2  0.4  0.6  0.8  1  0  0.2  0.4  0.6  0.8  1 TP Ra te FPRate BRCECEMCRandom Classifier Figure 3.5: ROC plots for BRC, EC and EMC classifiers. TPR = #sentences selected from the GSS #sentences in GSS FPR computes the opposite. FPR = #sentences selected that are not in the GSS #sentences in the bug report that are not in the GSS The area under a ROC curve (AUROC) is used as a measure of the quality of a classifier. A random classifier has an AUROC value of 0.5, while a perfect classifier has an AUROC value of 1. Therefore, to be considered effective, a classifier’s AUROC value should be greater than 0.5, preferably close to 1. Figure 3.5 shows the ROC curves for all the three classifiers. The diag- onal line is representative of a random classifier. The area under the curve (AUROC) for BRC, EC and EMC is equal to 0.723, 0.691 and 0.721 re- spectively, indicating that all these classifiers provide comparable levels of improvement in efficiency over a random classifier. 3.4.2 Comparing Classifiers To investigate whether any of EC, EMC or BRC work better than the other two based on our desired 25% word count summaries, we compared 37 3.4. Analytic Evaluation them using the standard evaluation measures of precision, recall, and f-score. We also used pyramid precision, which is a normalized evaluation measure taking into account the multiple annotations available for each bug report. Precision, recall and f-score F-score combines the values of two other evaluation measures: precision and recall. Precision measures how often a classifier chooses a sentence from the gold standard summaries (GSS) and is computed as follows. precision = #sentences selected from the GSS #selected sentences Recall measures how many of the sentences present in a gold standard summary are actually chosen by the classifier. For a bug report summary, the recall is the same as the TPR used in plotting ROC curves (Sec- tion 3.4.1). As there is always a trade-off between precision and recall, the F-score is used as an overall measure. F -score = 2 · precision · recall precision + recall Pyramid Precision The pyramid evaluation scheme by Nenkova and Passonneau [73] was devel- oped to provide a reliable assessment of content selection quality in summa- rization where there are multiple annotations available. We used the pyra- mid precision scheme of Carenini et. al [19] inspired by Nenkova’s pyramid scheme. 
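Before describing pyramid precision in detail, the ROC construction of Section 3.4.1 can be sketched as follows: sweep a probability threshold, compute a (FPR, TPR) point at each threshold, and integrate with the trapezoidal rule to obtain AUROC. This is an illustrative re-implementation, not the scripts used to produce Figure 3.5, and the function names are ours.

# Sketch: ROC points and AUROC for one classifier over annotated reports.
from typing import List, Tuple

def roc_points(probs: List[float], in_gss: List[bool]) -> List[Tuple[float, float]]:
    """probs[i]: classifier probability that sentence i is extractive;
    in_gss[i]: whether sentence i is in the gold standard summary."""
    points = []
    thresholds = sorted(set(probs), reverse=True)
    for t in [1.1] + thresholds + [-0.1]:
        selected = [p >= t for p in probs]
        tp = sum(s and g for s, g in zip(selected, in_gss))
        fp = sum(s and not g for s, g in zip(selected, in_gss))
        pos, neg = sum(in_gss), len(in_gss) - sum(in_gss)
        points.append((fp / neg if neg else 0.0, tp / pos if pos else 0.0))
    return sorted(points)

def auroc(points: List[Tuple[float, float]]) -> float:
    area = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        area += (x2 - x1) * (y1 + y2) / 2  # trapezoidal rule
    return area

The remaining measure, pyramid precision, is computed directly from the annotator links.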
For each generated summary of a given length, we count the total number of times the sentences in the summary were linked by annotators. Pyramid precision is computed by dividing this number by the maximum possible total for that summary length. For example, if an annotated bug report has 4 sentences with 3 links and 5 sentences with 2 links, the best possi- ble summary of length six consists of four sentences with 3 links and two sentences with 2 links. The total number of links for such a summary is equal to (4× 3) + (2× 2) = 16. The pyramid precision of an automatically 38 3.4. Analytic Evaluation Table 3.5: Evaluation measures. Classifier Pyramid Precision Recall F-Score Precision BRC .66 .57 .35 .40 EC .55 .43 .30 .32 EMC .54 .47 .23 .29 generated summary of length 6 with a total of 8 links is therefore computed as: Pyramid Precision = 8 16 = 0.50 Results Table 3.5 shows the values of precision, recall, F-score, and pyramid precision for each classifier averaged over all the bug reports. To investigate whether there is any statistically significant difference between the performance of the three classifiers, we performed six paired t-tests.15 Table 3.6 shows the p-value for each individual test. These results confirm that the bug report classifier (BRC) out-performs the other two classifiers (EC and EMC) with statistical significance (where significance occurs with p < .025).16 There is no significant difference when comparing the performance of EC and EMC. The results obtained for the EC and EMC classifiers are similar to those produced when the same classifiers are applied to meeting and email data [67]. The results demonstrate that based on standard measures, while classi- fiers trained on other conversation-based data (EC and EMC) can gener- ate reasonably good bug report summaries, a classifier specifically trained on bug report data (BRC) can generate summaries that are better with statistical significance. 15The data conforms to the t-test normality assumption. 16Since every two classifiers are compared based on two measures (F-score & pyramid precision), we used the Bonferroni correction [1] to adjust the confidence interval in order to account for the problem of multiple comparisons. 39 3.5. Human Evaluation Table 3.6: Paired t-tests results. Tested Measure P-Value Pyramid Precision Comparing BRC and EC .00815 Comparing BRC and EMC .00611 Comparing EC and EMC .89114 F-Score Comparing BRC and EC .01842 Comparing BRC and EMC .00087 Comparing EC and EMC .28201 3.5 Human Evaluation The generated summaries are intended for use by software developers. Does a classifier with pyramid precision of 0.66 produce summaries that are use- ful for developers? To investigate whether the summaries are of sufficient quality for human use, we conducted two separate human evaluations. In the first, we evaluated the summaries generated by the BRC classifier with a group of eight human judges. In the second evaluation, we performed a task-based user study in which twelve participants performed a set of eight bug report duplicate detection tasks. For some of the tasks, the participants worked with summaries generated by the BRC classifier instead of interact- ing with original bug reports. We chose to focus on the BRC classifier since it had performed the best based on the earlier measures. 3.5.1 Human Judges Eight of our ten annotators agreed to evaluate a number of machine gen- erated summaries. We asked the eight judges to evaluate a set of eight summaries generated by the BRC classifier. 
Each human judge was as- signed three summaries in such a way that each summary was evaluated by three different judges. The human judges were instructed to read the original bug report and the summary before starting the evaluation process. The generated extractive summaries the judges were asked to evaluate were in the format shown in Figure 1.2 and were 25% of the original bug reports in length. We asked each judge to rank, using a five-point scale with five the 40 3.5. Human Evaluation highest value, each bug report summary based on four statements (mean and standard deviations are provided in parentheses following the statement): 1. The important points of the bug report are represented in the sum- mary. (3.54 ± 1.10) 2. The summary avoids redundancy. (4.00 ± 1.25) 3. The summary does not contain unnecessary information. (3.91 ± 1.10) 4. The summary is coherent. (3.29 ± 1.16) An evaluation of meeting data summarized by multiple approaches uses similar statements to evaluate the goodness of the generated summaries [71]. We ensured in this judging process that the bug reports were assigned to judges who had not annotated the same reports during the annotation pro- cess. We also took care to choose summaries with different values of pyramid precision and F-score so as to not choose only the best examples of gener- ated summaries for judging. The scores suggest that, on average, human judges found summaries to be of reasonable quality. The relatively large values for standard deviation show a wide range of opinions for each ques- tion indicating both the subjectivity of human evaluation and the presence of summaries with different qualities. To determine the actual helpfulness of bug reports summaries, we conducted a task-based evaluation. 3.5.2 Task-based Evaluation According to the conventional measures of pyramid precision and F-score and the scores given by human judges, bug report summaries generated by the BRC classifier are of reasonable quality. Yet still the question of whether summaries can help developers in performing software tasks remains to be answered. To investigate this question, we conducted a task-based evaluation of the usefulness of bug report summaries. The particular task we chose to investigate is bug report duplicate detection: determining whether a newly filed bug report is a duplicate of an existing report in a bug repository. Bug report duplicate detection is performed when a new bug report is filed against a bug repository and has to be triaged and assigned to a 41 3.5. Human Evaluation developer. One step of triage is deciding whether the new bug report is a duplicate of one or more already in the repository. Early determination of duplicates can add information about the context of a problem and can ensure that the same problem does not end up being assigned to multiple developers to solve. Developers use different techniques to retrieve a list of potential duplicates from the bug repository including their memory of bugs they know about in the repository, keyword searches and machine learning and information retrieval approaches (e.g., [91, 96, 106]). In any approach other than memory-based approaches, a developer is presented a list of potential duplicate reports in the repository based on search or mining results. 
The developer must go over the list of retrieved potential duplicate bug reports to determine which one is a duplicate of the new report; this may require significant cognitive activity on the part of the developer as it might involve reading a lot of text both in description and comments of bug reports. Our hypothesis is that concise summaries of original bug reports can help developers save time in performing duplicate detection tasks without compromising accuracy. Our task-based evaluation of bug report summaries involved having 12 subjects complete eight duplicate detection tasks similar to real-world tasks under two conditions: originals and summaries. Each task involved a sub- ject reading a new bug report and deciding for each bug report on a presented list of six potential duplicates whether it is a duplicate of the new bug re- port or not. All the bug reports used in the study were selected from the Mozilla bug repository. We scored the accuracy of a subject’s determination of duplicates against information available in the bug repository. Experimental Method Each subject started the study session by working on two training tasks. Then the subject was presented with eight main tasks in random order. Half (four) of these tasks were performed under originals condition where the subject had access to potential duplicate bug reports in their original form. The other half were performed under summaries condition where the subject had access to 100-word summaries of potential duplicate bug reports, but 42 3.5. Human Evaluation not to their originals. Summaries were generated by the BRC classifier. As opposed to the 25% work count summaries used in the evaluation performed in Section 3.4, we decided to use fixed-size summaries in the task-based user evaluation to make summaries consistent in terms of size. We produced summaries of 100 words, the size of a short paragraph, to make them easy to read and interact with. Figure 3.6 shows a screenshot of the tool used by subjects to perform duplicate detection tasks. Based on the condition a task is performed under, clicking on the title of a potential duplicate in the top left window shows its corresponding bug report in original or summarized format in the right window. The new bug report can be viewed in the bottom left window. For each task, the subject can mark a potential duplicate as not duplicate, maybe duplicate, or duplicate. To complete a task, a subject has to mark all potential duplicates and provide a short explanation for any duplicate or maybe duplicate marking. At most one bug report on the list of potential duplicates can be labeled as duplicate. We put this restriction because, based on information in the bug repository, for each task there is either zero or one actual duplicate on the list of potential duplicates. A subject can mark as many potential duplicate bug reports on the list as maybe duplicate or not duplicate. Six out of eight new bug reports have an actual duplicate appearing on their corresponding list of potential duplicates. Subjects were not informed of this ratio and were only told that “There may or may not be a duplicate on the recommendation list”. Subjects were recommended to limit themselves to 10 minutes per task, but the time limit was not enforced. Each study session was concluded with a semi-structured interview. All 12 users worked on the same set of eight duplicate detection tasks. Each task was performed under each condition (summaries, originals) by 6 different users. 
For each task, the users to whom the task was assigned under a particular condition (e.g., summaries) were randomly selected. A number of questions in the form of a semi-structured interview were asked from each subject at the end of a study session (Table 3.7). During the interview, the subjects discussed the strategy they used in identifying 43 3.5. Human Evaluation Table 3.7: Questions asked of each participant at the end of a study session. Question 1. Did you find it easier or harder to detect duplicates given summaries for the potential duplicates? Why? 2. Did summaries change time spent on determining duplicates? 3. What kind of information in general do you think would be helpful in determining duplicates? 4. Did you find summaries to contain enough information to determine if they represented a duplicate? 5. Did you use attributes (e.g., version, platform, etc.)? How? 44 3.5. H u m an E valu ation Figure 3.6: The tool used by participants in the user study to perform duplicate detection tasks. The top left ‘Bug Triage List’ window shows the list of tasks, each consisting of a new bug report and six potential duplicate bug reports. The new bug report and the selected potential duplicate can be viewed in the bottom left window and the right window respectively. 45 3.5. Human Evaluation duplicate bug reports. Subjects were asked to compare working with and without summaries in terms of time, difficulty, and having access to sufficient information. Computing Task Completion Accuracy A reference solution for each bug duplicate detection task is available in the project’s bug repository. If the actual duplicate of the new bug report is among the six potential duplicates, the solution would be the actual du- plicate marked as a duplicate and the other five marked as not duplicates. To score a subject’s solution for such a task, we compare the marking of each potential duplicate to the marking of the same bug report in the ref- erence solution. If the markings are the same, we give a score of 1. If the potential duplicate is marked as a maybe duplicate (indicating insufficient information to make a decision) we give a score of 0.5. Otherwise we give a score of 0. To aggregate these scores to a single score for the task, we give a weight of five to the score of the actual duplicate. We chose to use this weighting scheme because we wanted the score of the actual duplicate to equally contribute to the final score as the scores of the other five po- tential duplicates. In this case the maximum score (that of the reference solution) would be 10 ((1× 5) + 1 + 1 + 1 + 1 + 1). The score of a solution in which the actual duplicate and one other bug report on the list are marked as a maybe duplicate and everything else is marked as a not duplicate is (0.5 × 5) + 0.5 + 1 + 1 + 1 + 1 = 7. Finally the accuracy of each task is computed by dividing its score by the maximum score or the score of the reference solution which is 10. The accuracy of a task with a score of 7 (like the example above) would then be 0.7. If the actual duplicate is not on the list, the reference solution would be all potential duplicates marked as not duplicates. In this case, because there is no actual duplicate on the list, scores of potential duplicates have all the same weight in computing the total score of the task with 6 being the score of the reference solution. 46 3.5. 
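The scoring scheme just described can be sketched as follows. The marking labels and the position of the actual duplicate are inputs we assume are available from the study data; the names are ours.

# Sketch: score one duplicate detection task against the reference solution.
# Markings are "duplicate", "maybe", or "not"; actual_index is the position
# of the true duplicate among the six candidates, or None if there is none.
from typing import List, Optional

def per_report_score(marking: str, should_be_duplicate: bool) -> float:
    expected = "duplicate" if should_be_duplicate else "not"
    if marking == expected:
        return 1.0
    if marking == "maybe":
        return 0.5
    return 0.0

def task_accuracy(markings: List[str], actual_index: Optional[int]) -> float:
    total, maximum = 0.0, 0.0
    for i, marking in enumerate(markings):
        weight = 5.0 if i == actual_index else 1.0  # actual duplicate weighs 5
        total += weight * per_report_score(marking, i == actual_index)
        maximum += weight
    return total / maximum  # the reference solution scores 1.0

# Example from the text: actual duplicate and one other report marked
# "maybe", everything else "not" -> (0.5*5 + 0.5 + 4*1) / 10 = 0.7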
Human Evaluation  0  2  4  6  8  10  12  14  16  18  20 0-300 300-600 600-1000 1000-3000 Nu mb er of bu g r ep ort s Length (words) Figure 3.7: The distribution of potential duplicates based on their length. Tasks We used the recommender built by Sun and colleagues [96] to generate the list of potential duplicate bug reports. This recommender outperforms previous duplicate detection techniques (e.g., [45, 91, 97]) by comparing bug reports based on an extended version of BM25F, an effective textual similarity measure in information retrieval. The top six bug reports retrieved by this recommender for each new bug report form the list of potential duplicates for each task. The list is sorted with the most similar bug report at the top. The new bug reports for the duplicate detection tasks (listed in Table 3.8) were randomly selected from the pool of all Thunderbird and Firefox bug reports filed in 2010 and labeled as DUPLICATE with the constraint that at least 4 out of 6 corresponding potential duplicate bug reports be longer than 300 words. We made this choice to ensure that the 100-word summaries are visibly shorter than most of the potential duplicate bug reports. The new bug reports have been drawn from different components of the projects. The diagram in Figure 3.7 shows the distribution of length (in words) of potential duplicates (total number of potential duplicates: 8× 6 = 48). The average length of a potential duplicate in our study is 650 words leading to an average compression rate of 15% for a summary length of 100 words. Table 3.9 shows the list of potential duplicates for each task where ‘+’ indicates an actual duplicate. Tasks 2 and 5 do not have an actual dupli- 47 3.5. H u m an E valu ation Table 3.8: New bug report used for each task in the user study. Task Bug Report Title Product Component 1 545792 Read messages reappearing as unread in non-inbox folders Thunderbird Folder and Message Lists 2 546095 Behavior with IMAP attachments utterly broken Thunderbird Message Compose Window 3 548525 IMAP coudn’t be selected when server name is pop.server.com Thunderbird Account Manager 4 550454 problem saving sent messages on IMAP Sent folder when main thunderbird window was closed Thunderbird Message Compose Window 5 562255 Javascript-based Web sites (e.g. Facebook “connec- tions editor”) are able to trap browser close Firefox Security 6 564243 ‘undisclosed-recipients’ not showing in Sent folder Re- cipient column Thunderbird Folder and Message Lists 7 583767 Time meter freezes when adjusting volume on YouTube Firefox General 8 587952 choosing alternative Trash folder results in adding a new wrong one Thunderbird Account Manager Table 3.9: List of potential duplicates per task, retrieved by extended BM25F with ‘+’ indicating an actual duplicate. Numbers in parentheses show the length of each bug report. Task 1 2 3 4 5 6 7 8 539035+ (612) 538803 (309) 538121 (452) 538340+ (209) 540373 (790) 540841 (381) 580795 (581) 558659 (1539) 538756 (275) 544748 (432) 547812+ (465) 543508 (603) 556185 (204) 562782 (801) 571000 (537) 547682 (335) Potential 540846 (264) 543399 (723) 547530 (663) 543746 (640) 541020 (692)) 549931+ (942) 578804+ (319) 542760+ (1071) Duplicates 544655 (1908) 545650 (523) 538115 (415) 549274 (871) 552593 (263) 542261 (167) 542639 (40) 547455 (2615) 540289 (933) 539494 (827) 541256 (307) 544837 (543) 544591 (338) 550573 (2588) 571422 (656) 564173 (68) 540914 (569) 541014 (408) 538125 (576) 540158 (997) 557171 (516) 541650 (246) 577645 (456) 539233 (849)48 3.5. 
Human Evaluation cate among the list of potential duplicates. The length of each potential duplicate, in the number of words, is shown next to it. The data in the table exhibits a recall rate of 75% (6/8: 6 out of 8 tasks have the actual duplicate on the recommendation list) which exceeds the recall rate of all existing duplicate detection techniques that retrieve potential duplicates for a bug report based on its natural language content. To generate a realistic task setting, for each bug report in Table 3.8, we used the date the bug report was created (filed) as a reference and reverted the bug report and all the six duplicate bug reports to their older versions on that date. All the comments that had been made after that date were removed. The attribute values (e.g. Status, Product, Component) were all reverted to the values on that date. Participants All 12 people recruited to participate in the study had at least 5 years (av- erage: 9.9± 4) of experience in programming. This amount of programming experience helped them easily read and understand bug reports which often contain programming terms and references to code. Participants had different amounts of programming experience in an in- dustrial context. 5 participants had 0-2 years, while the other 7 had an average of 7.1 years of experience in programming in industry. The second group had an average of 5.8 years of experience working with issue tracking systems. We decided to choose participants with different backgrounds because although people with more industrial experience may have better perfor- mance working on duplicate detection tasks, it is often novice people who are assigned as triagers for open bug repositories like Mozilla. Results Bug report summaries are intended to help a subject save time perform- ing a bug report duplicate detection task by not having to interact with bug reports in their original format. At the same time it is expected that summaries contain enough information so that the accuracy of duplicate de- tection is not compromised. We investigated the following three questions 49 3.5. Human Evaluation using the data collected during the user study: 1. Do summaries, compared to original bug reports, provide enough in- formation to help users accurately identify an actual duplicate? 2. Do summaries help developers save time working on duplicate detec- tion tasks? 3. Do developers prefer working with summaries over original bug reports in duplicate detection tasks? Accuracy Table 3.11 shows the accuracy of performing each of the 8 tasks by each of the 12 participants. The accuracy scores for tasks performed under the summaries condition have been marked with a ‘*’. The top diagram in Figure 3.8 plots the accuracy of performing each task under each condi- tion. In this figure, each accuracy value is the average of six individual accuracy scores corresponding to six different users performing the task un- der the same condition. On average (computed over all 48 corresponding non-starred accuracy scores in Table 3.11), the accuracy of performing a task under the originals condition is 0.752 (±0.24) while the accuracy of performing a task under the summaries condition (computed over all 48 corresponding starred accuracy scores in Table 3.11) is 0.766 (±0.23). Ap- plying the Mann-Whitney test17 shows that there is no statistically signifi- cant difference between the two groups of accuracy scores (p = 0.38). Thus, using summarized bug reports does not impair the accuracy of bug duplicate detection tasks. 
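A sketch of this comparison using SciPy's implementation of the Mann-Whitney U test is shown below; the two score lists are placeholders for the 48 starred and 48 non-starred accuracy values of Table 3.11, and the wrapper function is ours.

# Sketch: compare task accuracy under the two conditions with Mann-Whitney U.
from scipy.stats import mannwhitneyu

def compare_conditions(originals_scores, summaries_scores, alpha=0.05):
    """Each argument: the 48 task accuracy scores under one condition."""
    stat, p_value = mannwhitneyu(originals_scores, summaries_scores,
                                 alternative="two-sided")
    return stat, p_value, p_value < alpha

# With the study data, p is approximately 0.38, so the difference in
# accuracy between the two conditions is not statistically significant.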
Time to Completion In each study session, we measured the time to complete a task as the difference between the time the task is marked as completed and the time the previous task on the task list was marked as completed. Table 3.10 shows the time each participant took to complete each task, with a ‘*’ indicating the 17We opted to use Mann-Whitney instead of t-test as the data does not pass normality tests. 50 3.5. Human Evaluation  0  0.2  0.4  0.6  0.8  1 1 2 3 4 5 6 7 8 Ac cu rac y Task OriginalsSummaries  0  2  4  6  8  10  12 1 2 3 4 5 6 7 8 Tim e t o c om ple tio n ( mi nu tes ) Task OriginalsSummaries Figure 3.8: The average accuracy and time of performing each duplicate detection task under each condition (original bug reports, bug report sum- maries). Table 3.10: Time (in minutes) to complete each task by each participant. ‘*’ indicates summaries condition. Participant Task p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12 1 7* 8* 8* 13 11 4* 15 12 5* 6 7* 7 2 8* 12 7* 12 4* 3 9 4* 6* 3 11 6* 3 5* 8 9 6* 10 4* 4* 4* 7* 4 4 9 4 11 4* 21 9* 4* 2 7 6 10 5* 2* 4* 5 9* 14 7* 13 6 4* 7 11* 9 6* 5 4* 6 6 9* 5* 24 8* 6 6* 3 10 3* 4* 3 7 7 4 7 11* 3 6* 5* 6* 2* 5 3* 2 8 5 6* 10 12* 8* 7 5* 10 7 5* 6 6* 51 3.5. H u m an E valu ation Table 3.11: Accuracy of performing each task by each participant. ‘*’ indicates summaries condition. Participant Task p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12 1 0.45* 0.35* 0.45* 0.45 0.4 0.4* 0.95 0.5 0.4* 0.55 0.45* 0.4 2 0.92* 0.83 1* 1 1* 0.92 0.92 0.75* 0.83* 0.83 0.83 1* 3 0.75* 0.7 0.95 0.95* 1 0.45* 1* 0.95* 1* 0.75 1 1 4 0.95 1* 0.95 0.9* 1* 1 0.95 0.85 0.95 0.85* 1* 1* 5 1* 0.92 1* 1 1 1* 1 0.75* 0.75 0.83* 1 1* 6 0.45 0.5* 0.75* 0.4 0.45* 0.75 0.5* 0.45 0.7 0.45* 0.7* 0.5 7 0.5 0.5 0.5 0.5* 0.5 0.85* 0.75* 0.9* 0.75* 0.2 0.5* 0.5 8 0.95 1* 0.4 0.7* 0.75* 1 0.95* 0.75 1 0.35* 0.75 1* 52 3.5. Human Evaluation task was performed under the summaries condition. The bottom diagram in Figure 3.8 plots the amount of time for each task under the two conditions averaged over the users. By comparing time to completion across the two conditions we were interested to know if summaries, by being shorter in length, help users to perform duplicate detection tasks faster. Based on our data, on average it took 8.21 (±4.52) minutes to complete a task under the originals condition while it took 5.90 (±2.27) minutes to perform a task under the summaries condition. The difference is statistically significant based on the Mann- Whitney test (p = 0.0071). The comparison of accuracy scores and time to completion across the two conditions show that using summaries in duplicate detection tasks can help developers save time without compromising the quality of performing the tasks. Participant Satisfaction 9 out of 12 (75%) participants (all except p3, p6, p9) preferred working with summaries mentioning that it ‘was less intimidating/daunting’, ‘seemed more clear than full bug reports’, ‘less noisy because they didn’t have the header info’, ‘made them relieved’, ‘easier to read’, ‘seemed to preserve what was needed’ and ‘had less cognitive overload’. The remaining 3 thought they needed to have originals to be sure of the decision they made, mentioning the ideal setting would be to have both summaries and originals available (‘wanted to know what was left out from the summaries’, ‘had problem with loss of context’, ‘wanted to know who was talking’). 
10 out of 12 (83%) participants thought they were faster while working with summaries; the other two (p6 and p12) mentioned that they felt time depended more on task difficulty than on the task condition (summaries vs. originals).

7 out of 12 (58%) participants (p1, p3, p6, p8, p10, p11, p12) mentioned that the 'steps to reproduce' was probably the most important piece of information in comparing two bug reports and deciding if they were duplicates. One way to further improve the quality of produced summaries is by including in the summary an indication of the availability of this information in the original bug report.

3.5.3 Threats to Validity

One of the primary threats to the internal validity of the evaluations we have conducted is the annotation of the bug report corpus by non-experts in the projects. Optimally, we would have had summaries created for the reports by experts in the projects. Summaries created by experts might capture the meaning of the bug reports better than was possible by non-experts. On the other hand, summaries created by experts might rely on knowledge that was not in the bug reports, potentially creating a standard that would be difficult for a classifier to match. By assigning three annotators to each bug report and by using agreement between two to form the gold standard summaries, we have attempted to mitigate the risk of non-expert annotators.

The use of non-project experts in both human evaluations (human judges and task-based evaluation) is a threat to the external validity of our results. This threat has been mitigated by using systems (e.g., Firefox and Thunderbird) that participants were familiar with, at least from a user perspective. In the case of the human judges, one threat is the possibility of them wanting to please the experimenters. In future studies, we will consider interspersing classifier-generated and human-generated summaries to reduce this risk. For the task-based user study, one main threat is the use of general-purpose summaries for a particular task. Optimally, different summaries should exist for different types of tasks involving bug reports. For instance, a bug report duplicate detection task likely needs more information on the symptoms of the problem than on how it should be fixed.

The bug reports in the corpus and the bug reports used in the task-based user study have been chosen to be representative of the intended target of the approach (i.e., lengthy bug reports). There is an external threat that the approach does not apply to bug repositories with mostly shorter reports.

Another threat is the use of different summary lengths for the evaluation of summaries. Summaries as long as 25% of the original bug reports were used for the analytic evaluation and the evaluation by human judges because we wanted to match the length of the gold standard summaries. We used 100-word summaries for the task-based evaluation to make them visibly shorter than the original bug reports and easier to interact with for a participant who has to go over many bug report summaries in the course of a study session.

3.6 Summary

In this chapter, we have investigated the automatic summarization of one kind of software artifact, bug reports, as a representative of software artifacts with mostly natural language content.
Using an extraction-based supervised summarization system for conversational data, we found that existing classifiers, trained on email and meeting data, can produce reasonably accurate summaries for bug reports. We also found that a classifier trained on bug reports produces the best results. The human judges we asked to evaluate report summaries produced by the bug report classifier agree that the generated extractive summaries contain important points from the original report and are coherent. We showed that generated bug report summaries could help developers perform duplicate detection tasks in less time without degrading the accuracy, confirming that bug report summaries help software developers in performing software tasks.

Chapter 4
Summarization of Crosscutting Code Concerns

In this chapter we discuss the summarization of structured software artifacts. Compared to natural language software artifacts, the precise and non-ambiguous nature of structured software artifacts enables us to build a semantic representation of an artifact's content and then generate an abstractive summary based on this semantic representation. As an example of structured software artifacts, we chose to focus on summarizing source code that crosscuts different modules in the system as it is particularly difficult to handle this kind of code, referred to as a crosscutting code concern, while performing a software change task [6]. The summarization approach we developed produces a natural language summary that describes both what the concern is and how the concern is implemented [83]. We start by presenting background information in Section 4.1. We provide an example of a concern summary in use during a software change task in Section 4.2. We describe our approach in Section 4.3 and present the results of an experiment performed to investigate whether summaries help programmers in performing change tasks in Section 4.4.

4.1 Background

In this dissertation, we define a concern as a collection of source code elements often implementing a feature in the system such as logging, synchronization, authentication, etc. A concern is said to be crosscutting when its code is scattered across the code base and possibly tangled with the source code related to other concerns [46]. As an example, Figure 4.1 shows two methods, comment_get_display_ordinal(...) and book_export_html(...), from Drupal,18 an open source content management system.

function comment_get_display_ordinal($cid, $node_type) {
  // Count how many comments (c1) are before $cid (c2).
  $query = db_select('comment', 'c1');
  $query->innerJoin('comment', 'c2', 'c2.nid = c1.nid');
  [...]
  if (!user_access('administer comments')) {
    $query->condition('c1.status', COMMENT_PUBLISHED);
  }
  [...]
  return $query->execute()->fetchField();
}

function book_export_html($nid) {
  if (user_access('access printer-friendly version')) {
    $export_data = array();
    $node = node_load($nid);
    if (isset($node->book)) {
      $tree = book_menu_subtree_data($node->book);
      $contents = book_export_traverse($tree, 'book_node_export');
      [...]
    }
    else {
      throw new NotFoundHttpException();
    }
  }
  else {
    throw new AccessDeniedHttpException();
  }
}

Figure 4.1: Two methods in Drupal; code elements highlighted in bold font are part of the authorization crosscutting concern. The concern is scattered across the codebase and tangled with code of other concerns in the system.

18 drupal.org, verified 12/12/12

The code elements highlighted in bold font, user_access(...)
and AccessDeniedHttpException(...), are part of the authorization crosscutting concern. The concern is scattered across several modules in the system. As an example, user_access(...) is called from 156 different locations in the Drupal codebase. The concern is also tangled with the code of other concerns, for example the (non-crosscutting) concerns implementing 'Count the number of preceding comments' and 'Generate HTML for export' in Figure 4.1.

system_authorized_run(), system_authorized_init(), user_access(), node_access(), authorize_access_allowed(), authorize_access_denied_page(), filter_permission()

Figure 4.2: A sample output of a concern identification technique for the authorization concern in Drupal.

It has been observed that a programmer performing a change task has particular difficulty handling a crosscutting code concern as it typically requires browsing through several modules and reading large subsets of code contributing to the concern to determine if the concern is relevant to the task at hand [6]. In cases where the concern is not pertinent to the task, the developer may lose context and become needlessly distracted [22]. Compared to localized code, crosscutting concerns are harder to understand because developers must reason about code distributed across the software system. Crosscutting concerns can also be harder to implement and change because several locations in the code must be updated as part of the same task [26].

The crosscutting nature of a software concern makes it difficult for developers to cost-effectively document the concern in source code and to keep the documentation consistent. As a result, we investigate the automatic generation of natural language concern summaries to enable generation of on-demand concern documentation. Our summarization approach takes as input a list of methods implementing a crosscutting concern. The list of methods contributing to a concern can be generated by using a concern identification technique. Various concern identification approaches have been proposed in the literature. For example, Marin and colleagues use structural features in the code to mine methods that are called from many different places, which can be seen as a symptom of crosscutting functionality [61]. Breu and colleagues analyze program execution traces for recurring execution patterns [14]. Shepherd and colleagues use natural language processing to mine methods that implement the same feature [93]. Adams and colleagues use the history of code changes to mine code elements that were changed together [2]. Figure 4.2 shows a sample output of a concern identification technique as a list of code elements (methods) belonging to the authorization concern discussed in Figure 4.1.

4.2 Using Concern Summaries

To illustrate how a concern summary, automatically generated by our proposed summarization approach (Section 4.3), can impact a software change task, we consider a programmer who has been asked to add undo support for the change attributes functionality in a drawing program built on the JHotDraw framework.19 The change attributes functionality, implemented by a class named ChangeAttributesCommand, enables a user to change attributes, such as the fill colour or the font, of a figure.
Working in a version of the Eclipse programming environment20 that includes a plug-in we built to support concern summaries, the programmer begins the task by searching for methods named "undo" in the code. From the search results, the programmer decides to investigate the first result, the undo() method in the PolygonScaleHandle class. The programmer notices that this method is highlighted in the environment (see the main editor window in Figure 4.3), indicating that the method is part of a crosscutting concern and a summary of the concern is available. The programmer clicks on the highlighted method, causing the associated Undo concern summary to become visible in the rightmost view in Figure 4.3 (we have reproduced the summary in Figure 4.4 to ease readability; the full generated summary has been provided in Appendix B). Reading the summary, the programmer notes that all of the methods implementing the Undo concern override the UndoableAdapter.undo() method (line 8 in Figure 4.4). Because all methods of the concern override this one method, the programmer hypothesizes that this method provides a mechanism for supporting undo functionality in JHotDraw. The programmer realizes that one of the methods implementing the concern may provide an example of how to use this mechanism; the programmer expands the list of methods contributing to the concern implementation (line 1 in Figure 4.4) and chooses one of them, the undo() method in the InsertImageCommand class, to investigate.

When reading the summary, the programmer had also noted that each concern method is declared by a class named UndoActivity (line 10 in Figure 4.4). As the programmer investigates the InsertImageCommand, the programmer looks for an UndoActivity class, finding it as a static inner class of InsertImageCommand which implements the UndoableAdapter interface. The programmer has now learned, with the help of information in the summary, that implementing the undo feature for the change attributes functionality will involve adding a new static inner class named UndoActivity implementing the UndoableAdapter interface to the class supporting the change attributes command. Three different parts of the summary have helped the programmer quickly determine how to proceed with the change task of interest.

19 JHotDraw601b, jhotdraw.org, verified 08/03/11
20 eclipse.org, verified 20/07/11

Figure 4.3: The concern summary Eclipse plugin in action.

In this scenario, the provided summary explicitly describes the functionality a programmer wants to extend. We believe summaries can also help a programmer understand code that is less directly related to a change task at hand and can help a programmer determine when code is not related to a change task. The laboratory study we describe later in this chapter (Section 4.4) includes a task where the concerns are less directly related to the task-at-hand.

4.3 Approach

Given a set of methods belonging to a concern of interest and the source code for the system with the concern, our approach generates a natural language summary of the software concern code. A generated summary includes information about structural patterns amongst the methods implementing the concern and natural language facts embedded in identifiers in the code. Our approach consists of three steps. First, we extract structural and natural language facts from the code; these extracted facts are represented in an ontology instance.
Next, we apply a set of heuristics to the ontology instance to find patterns and salient code elements (e.g., classes, methods or fields) related to the concern code. In the last step, we generate the sentences comprising the summary from the patterns, salient code elements and information from the ontology instance. We developed the approach iteratively using three example concerns from the JHotDraw, Drupal and Jex systems identified at the top of Table 4.1 and tested it on the other five concerns.21

Summary of 'Undo' Feature Implementation
1: The 'Undo' feature is implemented by at least 22 methods [show/hide].
2: This feature provides 'undoing' functionality for 'Select All Command', 'Connected Text Tool', etc. [show/hide].
3: The implementation of the 'Undo' feature highly depends on the following code element(s):
4: • org.jhotdraw.util.UndoableAdapter.undo().
5: • org.jhotdraw.util.Undoable.undo().
6: All of the methods involved in implementing 'Undo':
7: • are named 'undo'.
8: • override method org.jhotdraw.util.UndoableAdapter.undo().
9: • override method org.jhotdraw.util.Undoable.undo().
10: • are a member of a class named 'UndoActivity'.
[3 other patterns involving all methods]
[6 patterns involving all but one of the methods]
11: Around half of the methods involved in implementing 'Undo' call one or more of the following methods:
12: • org.jhotdraw.framework.DrawingView.clearSelection().
13: • org.jhotdraw.util.UndoableAdapter.getAffectedFigures().
[2 other patterns involving about half of the methods]

Figure 4.4: A part of the summary of the 'Undo' concern in JHotDraw.

Table 4.1: Concerns used for developing and testing of the summarization approach
Concern     System      # Methods  Origin                                       Dev./Test?
Undo        JHotDraw    23         similar to [61]                              Dev.
Save        Drupal      6          manual creation                              Dev.
Anonymous   Jex         8          similar to [89]                              Dev.
MoveFigure  JHotDraw    8          basicMoveBy(...) methods in figures package  Test
DrawPlot    JFreeChart  25         draw(...) methods in plot package            Test
Autosave    jEdit       9          similar to [90]                              Test
Property    jEdit       10         property handling in jEdit class             Test
Marker      jEdit       40         similar to [87]                              Test

4.3.1 Step 1: Extracting Information

We extract both structural and natural language information from the code. We use an ontology to represent extracted information so that it can be handled in a unified and formalized way.

Extracting Structural Information
Information about how methods that belong to a concern interact with each other and with the rest of the code for the system can help a programmer deduce how the functionality of a concern works. To provide this structural information in a summary, we analyze the system's source code to extract structural facts about all of the code elements in the system.

Our current prototype focuses on generating summaries for Java code.22 We use the JayFX23 system to extract structural relationships between various Java source code elements. Table 4.2 lists the structural facts we extract. We ignore facts involving Java library classes and methods. We use an ontology instance based on the SEON Java Ontology scheme24 to store the extracted information [107]. This ontology scheme provides all the concepts needed to model the structural relationships in Table 4.2 and has been shown to be helpful for natural language querying of Java code [108]. Every extracted structural fact is represented in the ontology instance by one or more (subject, predicate, object) triples.
For example, the fact that class c1 extends class c2 (Extends(c1, c2) in Table 4.2) is represented by the triples (c1, hasSuperClass, c2) and (c2, hasSubClass, c1). A convenient way to manipulate an ontology instance, which we use in step two of our approach, is through an RDF graph.25 In an RDF graph, each (subject, predicate, object) triple is represented by an edge from subject to object labeled with predicate. Figure 4.5 shows part of the RDF graph populated with facts extracted from JHotDraw. In this graph, for example, Class1, whose name is UndoActivity, implements Interface1, whose name is UndoableAdapter. This fact is shown with an edge labeled implementsInterface from Class1 to Interface1. We use Jena,26 a Java framework for building Semantic Web applications, to create and process ontology instances.

21 The details for the systems follow: Drupal (drupal.org), Jex (cs.mcgill.ca/~swevo/jex), jEdit (jedit.org) and JFreeChart (jfree.org/jfreechart), all verified 10/03/11.
22 The overall summarization approach can easily be generalized to programming languages other than Java.
23 cs.mcgill.ca/~swevo/jayfx, verified 10/03/11
24 evolizer.org/wiki/bin/view/Evolizer/Features/JavaOntology, verified 10/03/11.
25 http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-data-model, verified 12/12/12
26 jena.sourceforge.net, verified 10/03/11

Table 4.2: Structural relationships between code elements
Calls(m1, m2)          method m1 calls method m2
Reads(m, f)            method m reads field f
Writes(m, f)           method m writes field f
Declares(c, m)         class c declares method m
Declares(c, f)         class c declares field f
Implements(c, i)       class c implements interface i
Extends(c1, c2)        class c1 extends class c2
HasInnerClass(c1, c2)  class c2 is an inner-class of class c1
Overrides(m1, m2)      method m1 overrides method m2
HasParameter(m, p)     method m has parameter p
Returns(m, t)          method m has return type t

Extracting Natural Language Information
In addition to information about how a concern is implemented, a programmer can benefit from information in a concern summary that describes what the concern is about. To provide such information, we extract natural language information from identifiers in the source code. For example, when many methods in a concern share a substring in their name, such as "save" or "undo", the shared substring may be a useful clue to the functionality of that concern. To determine this information, we use the approach of Fry and colleagues to extract the Verb-DO (Direct Object) pairs for each method listed in the concern [37]. A Verb-DO pair is intended to capture the action of a method and the context, or object, to which the action is applied. For example, extraction of a Verb-DO pair (Save, Comment) for a method implies the method may be performing a Save action on a Comment object. As we will describe in the next subsection, having Verb-DO pairs can help in finding patterns amongst methods of a concern that implement the same functionality.

We extend the SEON Java Ontology to include the natural language facts we extract by modeling these facts as additional properties of existing classes in the ontology, namely hasVerb and hasDO, which are added to the method ontology class. Figure 4.5 shows how the natural language facts are represented in the ontology. For instance, Method3, whose name is undo, has Rotate Image Command as its Direct Object (DO).
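To illustrate the representation, the following minimal sketch records a few of the facts shown in Figure 4.5 as triples with Apache Jena; the namespace URI and resource identifiers are illustrative assumptions rather than the actual SEON vocabulary used by the prototype.

// Sketch only: building a small ontology instance with Jena.
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;

public class OntologySketch {
    public static void main(String[] args) {
        String ns = "http://example.org/concern#";  // hypothetical namespace
        Model model = ModelFactory.createDefaultModel();

        Property hasName = model.createProperty(ns, "hasName");
        Property isMethodOf = model.createProperty(ns, "isMethodOf");
        Property implementsInterface = model.createProperty(ns, "implementsInterface");
        Property hasVerb = model.createProperty(ns, "hasVerb");
        Property hasDO = model.createProperty(ns, "hasDO");

        // Structural facts: Class1 (named UndoActivity) implements Interface1 (named UndoableAdapter),
        // and Method1 (named undo) is a method of Class1.
        Resource interface1 = model.createResource(ns + "Interface1").addProperty(hasName, "UndoableAdapter");
        Resource class1 = model.createResource(ns + "Class1")
                .addProperty(hasName, "UndoActivity")
                .addProperty(implementsInterface, interface1);
        Resource method1 = model.createResource(ns + "Method1")
                .addProperty(hasName, "undo")
                .addProperty(isMethodOf, class1);

        // Natural language facts extracted from the method's identifier and context.
        method1.addProperty(hasVerb, "undo").addProperty(hasDO, "Change Attribute Command");

        model.write(System.out, "TURTLE");  // dump the ontology instance
    }
}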
4.3.2 Step 2: Generating Abstract Content

A summary of a concern should include the most pertinent information about the concern, making the concern code easier to understand for a programmer who must interact with the code as part of performing a change task. After extracting structural and natural language facts, we process the facts to generate pertinent content to be included in the concern summary. The processing is performed based on two sets of heuristics: finding similarities between methods of the concern and finding source code elements in the system's code that are important to the implementation of the concern.

The first set of heuristics, which we refer to as path patterns, is based on the observation that similarity among concern methods can provide valuable insight into the what and how of the concern. For example, the fact that all of the methods in the concern have the same verb "save" in their name might indicate that the concern is about implementing the saving of some information in the system. The fact that all methods in a concern override the same method implies the method being overridden may be part of a mechanism for implementing the functionality of interest.

The second set of heuristics is based on the observation that the implementation of a concern is often highly dependent on a few code elements, which we refer to as salient code elements. Salient code elements may include concern methods but also may be classes, interfaces, fields or other methods of the system. These code elements can provide potential good starting points for a programmer who needs to further investigate the concern.

[Figure 4.5 depicts an RDF graph in which the concern methods Method1, Method2 and Method3, all named undo, are methods of classes named UndoActivity; those classes are inner classes of command classes (ChangeAttributeCommand, ResizeWindowCommand, RotateImageCommand) and implement Interface1, named UndoableAdapter; hasVerb and hasDO edges record each method's verb and direct object.]
Figure 4.5: Part of the JHotDraw RDF graph (Method1, Method2 and Method3 belong to the Undo concern).

Path Patterns
We find similarities shared amongst concern methods by finding similar paths involving the concern methods in the RDF graph constructed from the extracted structural and natural language facts. We refer to a set of similar paths as a path pattern. Our approach looks for path patterns centered around a single code element in the RDF graph; we refer to this single code element as the target of a pattern. To extract these patterns we consider all paths in the RDF graph, with a maximum length of d (d = 3 for our current prototype), that start at a concern method and that end at an arbitrary node in the RDF graph. For each of these paths, we check to see how many other concern methods share the same path expression, which is a sequence of node types and edge labels on the path. As an example, the RDF graph in Figure 4.5 includes three concern methods of the Undo concern in JHotDraw, specifically Method1, Method2 and Method3 (all named undo, evident from the outgoing hasName edges). In this graph, the path expression that corresponds to the path from Method1 to Interface1 is "isMethodOf--Class:Class--implementsInterface". The same path expression describes the paths between other concern methods (e.g., Method2 and Method3) and Interface1. Hence, the concern methods are all involved in a pattern with "isMethodOf--Class:Class--implementsInterface" as the path expression and Interface1 as the target node. Our algorithm finds three other such patterns in the RDF graph shown in Figure 4.5. The corresponding four target nodes are highlighted with bold borders in the figure.
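A path pattern like the one just described can be checked with a SPARQL query over the RDF graph. The following is a minimal sketch using Jena's SPARQL support; the namespace, property names and method shown are illustrative assumptions rather than the prototype's actual implementation, and the query only follows the declaring-class/implemented-interface path without intersecting the result with the concern's method list.

// Sketch only: counting methods that reach a given interface via the
// path expression "isMethodOf--Class--implementsInterface".
import org.apache.jena.query.Query;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.query.ResultSet;
import org.apache.jena.rdf.model.Model;

public class PathPatternSketch {
    static int countMatches(Model model, String interfaceUri) {
        String queryText =
            "PREFIX c: <http://example.org/concern#> " +
            "SELECT (COUNT(DISTINCT ?m) AS ?n) WHERE { " +
            "  ?m c:isMethodOf ?cls . " +
            "  ?cls c:implementsInterface <" + interfaceUri + "> . " +
            "}";
        Query query = QueryFactory.create(queryText);
        try (QueryExecution qe = QueryExecutionFactory.create(query, model)) {
            ResultSet results = qe.execSelect();
            return results.hasNext() ? results.next().getLiteral("n").getInt() : 0;
        }
    }
}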
This naive algorithm can extract a lot of path patterns. Since a concern summary is supposed to include the most pertinent information, we must rank the patterns. Our ranking approach considers three factors:

1. the percentage of concern methods involved in the pattern; a pattern in which most members of a concern are involved shows a higher degree of similarity between concern elements and can help to describe the concern in a general and abstract way,

2. the length of the path expression; we believe patterns with shorter path expressions are easier for a programmer to read and understand, and

3. the kind of structural pattern; we believe patterns about the hierarchical structure of the code are easier for a programmer to read and interpret quickly than patterns about calling structure.

We rank patterns first by the percentage of code elements involved in the pattern and then on the length of the path expression. Among patterns with the same percentage and the same path expression length, a calling pattern is ranked lower.

Salient Code Elements
We use the term salience to describe the likely importance of a code element from the system in the implementation of a concern. A salient code element may or may not be a method used to define the concern. We identify potentially salient code elements by applying graph theoretic measures of node centrality to the RDF graph representing the ontology instance of the code.

Centrality describes the relative importance of a node in a graph. A number of different algorithms exist to identify central nodes in a graph (e.g., degree, closeness, or betweenness centrality) [35]. We experimented with different algorithms on the sample concerns used to help define our approach and found that the betweenness centrality measure [51] generates results that are closer to our intuition of a salient code element, namely that the code element is involved in many shortest paths between two concern methods. For example, the salient node in the Undo concern is UndoableAdapter.undo() as it is the common connecting point between different concern elements.
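The following is a minimal sketch of how betweenness centrality could be computed over such a graph to rank candidate salient elements. JGraphT is used here purely as an illustrative choice, not necessarily the library used in the prototype, and the toy graph mirrors the Undo example in which concern methods all connect to one overridden method.

// Sketch only: betweenness centrality over a toy code-element graph.
import org.jgrapht.Graph;
import org.jgrapht.alg.scoring.BetweennessCentrality;
import org.jgrapht.graph.DefaultEdge;
import org.jgrapht.graph.SimpleGraph;

public class SalienceSketch {
    public static void main(String[] args) {
        Graph<String, DefaultEdge> g = new SimpleGraph<>(DefaultEdge.class);
        for (String v : new String[] {"Method1", "Method2", "Method3", "UndoableAdapter.undo"}) {
            g.addVertex(v);
        }
        // The concern methods all connect to the same overridden method.
        g.addEdge("Method1", "UndoableAdapter.undo");
        g.addEdge("Method2", "UndoableAdapter.undo");
        g.addEdge("Method3", "UndoableAdapter.undo");

        BetweennessCentrality<String, DefaultEdge> bc = new BetweennessCentrality<>(g);
        // Higher scores indicate elements lying on many shortest paths between concern methods.
        bc.getScores().forEach((vertex, score) -> System.out.println(vertex + " -> " + score));
    }
}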
4.3.3 Step 3: Producing Sentences for the Summary

The concern summaries we generate consist of four main parts (see Figure 4.4 for an example):

1. Listing: The first part is a sentence stating how many, and which methods, comprise the concern (Figure 4.4, line 1). This sentence is generated by using the following template:
The [concern title] is implemented by at least [concern size] methods [expandable list of the concern elements]

2. Description: The second part of a concern summary is a short description of the concern (Figure 4.4, line 2). This sentence is only generated when the names of the concern methods share the same verb (e.g., Undo). The following is the template for generating the sentence:
This feature provides [Verb]ing functionality for [DO1], [DO2], [expandable list of other DOs]

3. Salient code elements: The third part is generated using the following template and the two most salient code elements identified in the content generation phase (Figure 4.4, lines 3-5):
The implementation of feature [concern title] is highly dependent on the following code elements: [SalientCodeElement1] [SalientCodeElement2]

4. Patterns: The generation of the fourth part involves textual translation of path patterns, starting with those with highest rank (Figure 4.4, lines 6-10 and 11-13). A path pattern is translated to a textual form by using both its target node and its pattern expression. For example, the path pattern in Figure 4.5 with the path expression of "isMethodOf--Class:Class--hasName" and a target node of "UndoActivity" is translated to
is a method of a class named "UndoActivity"
Before including this sentence in the summary, we need to specify the number of concern methods associated with it. In this example, all concern methods are involved in the pattern, resulting in the use of the following template when generating the final sentence:
All of the methods involved in implementing [concern title]
Depending on the number of code elements involved in a path pattern, "All" in the above sentence can be replaced by "All but one", "Around half" (when the percentage is in the 45 to 55% interval), "65%", etc.

4.4 Task-based Evaluation

Concern summaries are intended to help programmers better perform software change tasks. To understand if concern summaries meet this goal, we performed a laboratory study to investigate the following three questions:

Q1 How does a programmer use concern summaries when they are available?
Q2 Can concern summaries help a programmer identify code pertinent to a change task?
Q3 Can concern summaries help a programmer perform a change task more easily?

4.4.1 Participants

Our study involved eight participants (p1 . . . p8). Each participant had at least three years of programming experience, one year of which included Java programming in the Eclipse environment. Two of the participants were professional programmers; the other six were students (five graduate, one senior undergraduate) in the UBC Computer Science Department.

4.4.2 Method

Each participant was asked to perform two software change tasks: adding autosave capabilities to the jEdit27 text editor, a task defined and used in an earlier study [88], and adding undo functionality for a command in the JHotDraw 2D graphical framework,28 similar to the task outlined in Section 4.2. The jEdit task involves changes to different locations in different classes in the code; the JHotDraw task requires changes to only one class. Detailed descriptions of these tasks have been provided in Appendix B.

27 jEdit4.1-pre6, www.jedit.org, verified 08/03/11
28 JHotDraw60b1, www.jhotdraw.org, verified 08/03/11

Four of the participants (p1, p3, p5 and p7) worked on the jEdit task as their first task; the other four (p2, p4, p6 and p8) worked on the JHotDraw task first. A participant had access to concern summaries only for the second task on which he or she worked. We decided to always provide the summaries for the second task rather than the first because a participant who had seen summaries on an earlier task might have tried to extract similar information on her own during a later task performed without summaries.
For the jEdit task, a participant had access to two concern summaries: one for an Autosave concern and the other for a Property concern (summaries in Appendix B); the first concern relates directly to the task whereas the second concern describes functionality that interacts with the autosave functionality. For the JHotDraw task, a participant had access to one concern summary, the Undo concern summary shown partially in Figure 4.4 (full summary in Appendix B); this concern directly related to the given task.

Each participant was given a maximum of 30 minutes to complete each task and was instructed to create a plan for performing the task, identifying the code elements that either needed to be created or changed to complete the task. We chose this time limit as it was sufficient for the participants in the pilot study to create a high level plan of how each task should be performed. Each participant was instructed not to make any changes to the code and not to use the debugger, so as to keep the participant's attention focused on program navigation activities. Before performing the second task, a participant was asked to read a one-page tutorial on how to use the Eclipse plug-in to view a concern summary; this tutorial focused on the tool and not the intent or composition of a summary. Similar to [23], after each task, a participant was asked to complete the NASA TLX questionnaire [42] to assess the participant's views about the difficulty of the task and his or her performance on the task. Each session concluded with a guided interview. Each participant was compensated with a $25 gift card for their time.

We asked each participant to think aloud as she or he performed the change task and collected screen data synched to audio as the participant worked. In addition, the experimenter took notes as a participant worked and during the follow-up interviews, and we collected the plans created by each participant for each task and the results of the two TLX surveys completed by each participant.

4.4.3 Results

We consider each of the three research questions.

Q1 – Use of Concern Summaries

Table 4.3 reports the number of interactions each participant had with a concern summary as he or she performed the second task. This data is drawn from an analysis of the video and audio captured as participants worked. An interaction was considered to have occurred when a participant read the summary or used it to navigate the code.

Table 4.3: Number of interactions with summaries
Task          JHotDraw              jEdit
Participant   p1   p3   p5   p7     p2     p4     p6     p8
# Interact.   9    8    1    7      (4,2)  (4,6)  (3,8)  (2,2)

For the autosave task, the first number in the parenthesis reports the number of interactions with the property summary and the second number with the autosave summary. This data shows that all participants interacted with a concern summary at least once during a task; some participants interacted with a summary more extensively than others. We observed during the sessions that all participants used the summaries to navigate to parts of the code mentioned in the summary. We also observed that participants consulted summaries after long navigation paths through the code, seemingly using the summary as a focal point that could help the participants reorient themselves to the task.

For the jEdit task, all four participants considered the listings of the concern elements in both summaries to be the most useful part of the summary.
The next most useful part they reported was the sentences about salient code elements. None of these four participants found the patterns section of the summary useful.

For the JHotDraw task, three of the four (75%) participants (p1, p3 and p7) reported all parts of the summary except the calling patterns equally useful. These participants reported that the salient nodes described in the summary provided good clues to start investigating the code and the listing of concern elements helped to find an example of how the undo functionality had already been implemented in the code. These participants reported that the patterns which describe facts about the hierarchical structure in the code also helped to figure out how the change should be implemented. Only one participant (p1) reported the patterns describing which code calls each other as useful. One possible reason that patterns involving code calls were seen as less useful is that these patterns are only likely helpful in the later stages of performing the task, and the time provided to most participants was insufficient to reach these stages.

Q2 – Finding Pertinent Code

Table 4.4 reports the ratio of relevant code elements visited by a participant as part of a task to the total number of code elements visited by that participant. We refer to this ratio as the navigation efficiency; a high value suggests a participant spent more effort on task than off task. We consider a code element relevant to the task if: 1) the element must be changed to complete the task or 2) the element provides information needed to complete the task. A list of all the code elements that were counted as relevant for each task in our user study is available online.29

29 cs.ubc.ca/labs/spl/projects/summarization.html

Table 4.4: Navigation efficiency
Task Order    jEdit - JHotDraw
Participant   p1            p3             p5            p7
Task 1        39% (9/23)    100% (11/11)   87% (14/16)   68% (15/22)
Task 2        92% (12/13)   93% (14/15)    95% (19/20)   86% (18/21)
Task Order    JHotDraw - jEdit
Participant   p2            p4             p6            p8
Task 1        81% (29/36)   38% (13/34)    74% (20/27)   54% (22/41)
Task 2        74% (14/19)   89% (17/19)    93% (14/15)   76% (16/21)

The data in Table 4.4 shows that navigational efficiency rose when summaries were available. On average, when summaries were available, navigational efficiency was 87.3% (SD: 8.7) whereas it was 67.6% (SD: 22.44) when no summaries were available. It is interesting to note the much smaller standard deviation, even across tasks, when summaries are available. To investigate the effect of summaries on individual performance, we computed a paired t-test on the navigational efficiency within subjects; this t-test shows that the improvement in the navigational efficiency with summaries is statistically significant at the 95% confidence level (p ≈ 0.045). This data suggests that the presence of summaries did help programmers find code relevant to the task.

Q3 – Ease of Task Completion

The TLX questionnaire provided to each participant after each task asks the participant to rank on a scale of 1 (low) to 10 (high) the mental demand, temporal demand, frustration level, effort and performance (level of success) perceived by the participant for the task. Table 4.5 focuses on the data for the mental demand and performance perception ratings as the former directly relates to the participant's perceived ease of completing a task and the latter relates to the participant's perceived success on the task.
This data shows that, regardless of task, six of the eight (75%) participants (p1, p3, p7, p4, p6, and p8) felt that the task performed with summaries was less demanding than without summaries. Only one participant (p5) reported an increase in mental demand on the task with a summary; the other participant (p2) felt both tasks were equally (and very) mentally demanding. This data also shows that all but one participant (p2), or 87%, felt their level of success (i.e., performance) was better on the task with summaries than without.

Table 4.5: TLX scores: mental demand and performance
Mental Demand
Task Order     jEdit - JHotDraw      JHotDraw - jEdit
Participants   p1   p3   p5   p7    p2   p4   p6   p8
Task 1         8    8    8    9     10   8    9    9
Task 2         3    6    9    6     10   4    5    8
Performance
Task Order     jEdit - JHotDraw      JHotDraw - jEdit
Participants   p1   p3   p5   p7    p2   p4   p6   p8
Task 1         2    6    3    2     7    5    2    6
Task 2         9    8    5    8     7    8    7    7

4.4.4 Threats to Validity

Three major threats to the internal validity of the laboratory study are the likely lack of isomorphism between the two tasks used, the differences in relationships of the concerns to the tasks in the two systems, and likely large differences in abilities of participants. We believe these are acceptable threats for an initial exploratory study.

Three major threats to the external validity of the study results are the use of only two tasks, the limited number of participants and the limited time given to each participant to work on a task. The first threat is mitigated, to some extent, by the use of non-isomorphic tasks as the chance that a task is representative of actual change tasks is higher. By using non-isomorphic tasks on different systems, we also mitigated risks of learning effects from one task to another. The second threat is mitigated by using participants with some level of experience with the programming language and the environment. We chose to accept the risk of the third threat to be able to perform a laboratory study within a reasonable time frame for participants.

4.5 Summary

In this chapter, we have investigated summarization of crosscutting code concerns to explore the problem of summarizing structured software artifacts. We introduced an automated summarization approach for crosscutting concern code that aims to help programmers make judicious choices about what code to investigate as part of a software change task. Our approach is a form of abstractive summarization that produces a description of crosscutting code through a series of natural language sentences. These sentences describe facts extracted from structural and natural language information in the code and patterns detected from the extracted facts. Through a study with eight programmers, we demonstrated that these concern summaries, although less precise than the code itself, can help a programmer find pertinent code related to a change task on a non-trivial system. Through the study, we also determined that the presence of concern summaries made programmers perceive a task as less difficult and perceive their performance on the task as more successful. We view our approach as a proof-of-concept demonstration that natural language summaries of crosscutting code concerns can play a useful role in software development activities.

Chapter 5
Multi-document Summarization of Natural Language Software Artifacts

The information to answer a question a developer has is often located across many artifacts.
To find and understand this information, the developer has to put together various pieces of information from different artifacts. This process, involving manually navigating back and forth between related artifacts, can be tedious and time consuming [36]. A multi-document summary of software artifacts containing relevant information can help a developer more easily address her information needs. As an example, a developer trying to understand the reason behind a change in code often needs to consult a number of project documents related to the change. Generating a summary of these documents is a special case of multi-document summarization of natural language software artifacts. In this chapter, we investigate whether such a summary generated by our proposed multi-document summarization technique can help a developer more easily find the rationale behind a code change.

We begin with providing background information (Section 5.1) and a motivational example (Section 5.2). We then describe our approach (Section 5.3) and provide details on our human-annotated corpus (Section 5.4). Finally we discuss an exploratory user study conducted to investigate if generated summaries provide motivational information behind a code change (Section 5.5).

5.1 Background

A developer working as part of a software development team often needs to understand the reason behind a code change. This information is important when multiple developers work on the same code as the changes they individually make to the code can conflict. Such collisions might be avoided if a developer working on the code knows why particular code was added, deleted or modified. The rationale behind a code change often can be found by consulting a number of project documents. As a starting point, a developer can access a commit message or a bug report with some information on the code change. However, this information often does not provide the context about why the code changed, such as which business objective or feature was being implemented. This higher-level information is often available in a set of project documents, perhaps requirements and design documents or perhaps epics and user stories. But finding and understanding this information by manually navigating the set of related documents can be time consuming and cognitively demanding.

In this chapter, we investigate whether a multi-document summary of project documents related to a code change can help a developer understand the reason behind a code change. We propose a supervised extractive summarization approach in which we identify the most relevant sentences in a set of documents provided as including motivational information about the change [82]. While some efforts have considered helping developers explore the information in the project's repositories to make it easier for them to find the reason behind a code change (e.g., [13]), our approach is the first attempt at providing the answer to the 'why' of a code change. Other approaches that have analyzed source code changes mainly focused on the 'what' and 'how' of a code change. Examples include identification (e.g., [34]), impact analysis (e.g., [76]) and visualization (e.g., [103]) of code changes.
[Figure 5.1 mock-up: the editor shows AdapterDocSubmissionImpl.java with the lines
  NhinDocSubmissionDO bodyDO = new NhinDocSubmissionDO(body, true);
  RegistryResponseType response = new AdapterDocSubmissionOrchImpl().provideAndRegisterDocumentSetB(bodyDO, assertion);
  bodyDO.deleteFilesFromDisk();
  return response;
with the call to deleteFilesFromDisk() highlighted. The last-change log message reads "GATEWAY-905: Added code to remove temporary files during streaming", and a "why?" pop-up summarizes the motivation: "As a CONNECT Adopter, I want to process and send large payload sizes of up to 1 GB to meet the requirements of all my use cases (User Story: EST010)"; "Create a pilot implementation for streaming large files for the Document Submission service"; "When an end to end run of document submission is run, no temporary files should exist in the file system after the transaction has been completed."]
Figure 5.1: A summary describing the motivation behind a code change of interest appearing as a pop-up in a mock-up code development environment.

5.2 Motivating Example

Consider a developer who is working on the open source project CONNECT.30 This project uses an agile development process in a way that makes information relevant to code changes available through explicit links to multiple levels of tasks, stories and epics that were used to organize the work that was performed. Imagine that as part of the task the developer is working on, she needs to make a change to the provideAndRegisterDocumentSetb method of the AdapterDocSubmissionImpl class. Since the developer last looked at the code, a call to deleteFilesFromDisk has been added to this method. The developer wants to know why there is this new dependency before determining how to proceed with her change. She starts by checking the commit message associated with the last modification to the method of interest, which states: "GATEWAY-905: Added code to remove temporary files during streaming". This gives a clue about the added code, but does not say why streaming support has been added. The developer decides to investigate further by following the provided link to GATEWAY-905 in the project's issue repository.31 GATEWAY-905, labeled as a task in the repository and entitled "Create a component to remove temporary streamed files from the file system", still does not provide any high-level reason of why streaming was needed. GATEWAY-905 is linked to GATEWAY-901 which is labeled as a user story and is entitled "Create a pilot implementation for streaming large files for the Document Submission service". Finally, GATEWAY-901 is linked to GATEWAY-473, labeled as a Feature Request and entitled "As a CONNECT Adopter, I want to process and send large payload sizes of up to 1 GB to meet the requirements of all my use cases (User Story: EST010)".

To find this motivational information, the developer has to trace the change to a chain of three inter-linked documents in the issue repository, each at a different level of abstraction, and has to read through the description and comments of those documents. If a summary could be automatically created with this information, it could be made accessible to the developer at the site of the code change.

30 http://connectopensource.org, verified 12/12/12; CONNECT is an open source software and community that promotes IT interoperability in the U.S. healthcare system.
31 https://issues.connectopensource.org, verified 12/12/12

For example, imagine that a developer highlights
a code change of interest in an IDE and the pop-up shown in Figure 5.1 appears, making the information immediately accessible to the developer. This summary can help a developer immediately understand the motivation behind the change with very little effort on the part of the developer.

5.3 Approach

The problem of generating a summary of a set of documents related to a code change is a special case of the more general problem of multi-document summarization of natural language documents. Multi-document summarization techniques often deal with summarizing a set of similar documents that are likely to repeat much the same information while differing in certain parts; for example, news articles published by different news agencies covering the same event of interest (e.g., [63, 80]). Our approach is different in that it investigates summarizing a set of documents each at a different level of abstraction. To account for the fact that each document is at a different level of abstraction, we model the set as a hierarchical chain. For the example discussed in Section 5.2, GATEWAY-473 (a feature request) and the commit message are the top-most and the bottom-most documents in the chain of relevant documents respectively.

Once a chain of relevant documents is formed, we then find the most important sentences to extract to form a summary. We identified a set of eight sentence-level features to locate the most relevant sentences. We chose to use a supervised learning summarization approach in which a classifier scores each sentence of a chain based on the values of the eight sentence features. The highest scored sentences are then extracted to form a summary. The eight features we investigated are:

1. fOverlap. This feature measures the content overlap between a sentence (of a document) and the adjacent documents in a chain. There is often some content overlap in terms of common words between adjacent documents in a chain as one has motivated the creation of the other one (e.g., an epic motivating a user story or a task leading to a commit message). We hypothesize that sentences containing these common words are more likely to explain the motivation behind the corresponding code change and should be scored higher. To locate these sentences we look for overlaps in terms of words between parent and child documents in a chain. This is similar to the idea of clue word score (CWS) previously shown to be an effective feature in summarizing conversational data (e.g., [38, 67]). In document D in a chain of documents relevant to a code change, after word stemming and stop-word removal, overlapScore for word w is computed as follows:

overlapScore(w, D) = IDF(w) × [TF(w, Parent(D)) + TF(w, Child(D))]

where TF and IDF stand for Term Frequency and Inverse Document Frequency respectively. For sentence s in document D, fOverlap is computed by summing up the overlapScores of all the words in the sentence:

fOverlap = Σ_{w ∈ s} overlapScore(w, D)

2. fTF-IDF. For a sentence s in document D, this score is computed as the sum of the TF-IDF scores of all words in the sentence:

fTF-IDF = Σ_{w ∈ s} TF-IDF(w, D)

It is hypothesized that a higher value of fTF-IDF for a sentence indicates that the sentence is more representative of the content of the document and hence is more relevant to be included in the summary.

3. fTitleSim. This feature measures the similarity between each sentence and the title of the enclosing document.
If document titles are chosen carefully, the title of each document in a chain is a good representative of the issue discussed in the document. Hence it is hypothesized that the similarity between a sentence in a document and the title of the document is a good indicator of the relevance of the sentence. Similarity to the title has been previously shown to be helpful in summarizing email threads [104] and bug reports [56]. fTitleSim is computed as the cosine between the TF-IDF vectors of the sentence and the document's title.

4. fCentroidSim. While fTitleSim is a local feature as it computes the similarity of a sentence with the title of the containing document, fCentroidSim is a global feature as it measures the similarity of the sentence with the centroid of the chain. A centroid is a vector that can be considered as representative of all documents in a chain. Centroid-based techniques have extensively been used in multi-document summarization (e.g., [79]). We take the approach of the multi-document summarization system MEAD [78] to compute fCentroidSim. The chain centroid is computed as a vector of words' average TF-IDF scores in all documents in the chain. For each sentence, fCentroidSim is computed as the sum of all centroid values of common words shared by the sentence and the centroid.

5. fDocPos. This feature captures the relative position of a sentence in the enclosing document. The motivation is that it might be the case that opening or concluding sentences are more important than the rest of the sentences. fDocPos is computed by dividing the number of preceding sentences in the document by the total number of sentences in the document.

6. fChainPos. This feature is similar to fDocPos, but it measures the relative position of a sentence in the chain.

7. fDocLen. Usually longer sentences are more informative. fDocLen is the length of the sentence normalized by dividing it by the length of the longest sentence in the document.

8. fChainLen. Similar to fDocLen, with the length of the sentence divided by the length of the longest sentence in the chain.
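To make these definitions concrete, the following is a minimal sketch of how three of the features (fTF-IDF, fTitleSim and fDocPos) might be computed for a single sentence, assuming the TF-IDF vectors have already been built; the class and method names are illustrative and not part of the actual implementation.

// Sketch only: computing a few sentence-level features from TF-IDF vectors.
import java.util.Map;

public class SentenceFeatures {
    // fTF-IDF: sum of the TF-IDF scores of the words in the sentence.
    static double tfIdfFeature(Map<String, Double> sentenceTfIdf) {
        return sentenceTfIdf.values().stream().mapToDouble(Double::doubleValue).sum();
    }

    // fTitleSim: cosine similarity between the sentence and title TF-IDF vectors.
    static double titleSimilarity(Map<String, Double> sentence, Map<String, Double> title) {
        double dot = 0, normS = 0, normT = 0;
        for (Map.Entry<String, Double> e : sentence.entrySet()) {
            dot += e.getValue() * title.getOrDefault(e.getKey(), 0.0);
            normS += e.getValue() * e.getValue();
        }
        for (double v : title.values()) {
            normT += v * v;
        }
        return (normS == 0 || normT == 0) ? 0 : dot / (Math.sqrt(normS) * Math.sqrt(normT));
    }

    // fDocPos: number of preceding sentences divided by the total number of sentences.
    static double documentPosition(int sentenceIndex, int totalSentences) {
        return (double) sentenceIndex / totalSentences;
    }
}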
5.4 Corpus

To investigate if the features discussed in Section 5.3 can be effective in identifying summary-worthy sentences and also to train a classifier, we created a corpus of human-generated summaries of chains of documents related to code changes. As an initial corpus, we identified eight chains in the CONNECT project, each linking together three documents from the project's issue repository and one commit log message. The average length of selected chains is 386±157 words. This is four times the size of the example summary shown in Figure 5.1.

We recruited seven human summarizers (all from the UBC Department of Computer Science) and asked them, for each chain of documents related to a code change, to create a summary explaining the motivation behind the change by highlighting sentences that should be included in such a summary. Similar to the approach taken by Carenini and colleagues [38], human summarizers were asked to distinguish between selected sentences by labeling them as essential for critical sentences that always have to be included in the summary or optional for sentences that provide additional useful information but can be omitted if the summary needs to be kept short. Human summarizers were advised to choose at most one third of the sentences in a chain (whether labeled as essential or optional). Each chain was summarized by three different users.

For each chain, we merged these three summaries into a gold standard summary by extracting sentences with the following sets of labels: {optional, essential}, {essential, essential}, {optional, optional, optional}, {optional, optional, essential}, {optional, essential, essential} and {essential, essential, essential}.

We used the feature analysis technique introduced by Chen and Lin [21] to compute the F-statistics measure for all 8 sentence features based on the gold standard summaries in our corpus. Figure 5.2 shows the F-statistics scores for each feature.

[Plot of F-statistics scores (0 to 0.25) for the eight features; fTitleSim and fOverlap score highest, fChainLen and fChainPos lowest.]
Figure 5.2: F statistics scores for sentence features.

Based on these values, fTitleSim and fOverlap are the two most helpful features in identifying the important sentences. As we hypothesized earlier, fTitleSim is a useful feature because the titles of the documents were well-chosen and well-phrased. fOverlap is a useful feature because the document authors have used repetition and similar phrases to discuss motivation. On the other hand, fChainLen and fChainPos are the two least helpful features. One possible explanation is that each document in the chain contributed almost equally to the final summary.

Removing the two least helpful features, we trained a classifier based on Support Vector Machines (SVMs) using the libSVM32 toolkit. This classifier has a k-fold cross-validation accuracy of over 82%, showing a high level of effectiveness in identifying important sentences.

32 http://csie.ntu.edu.tw/~cjlin/libsvm, verified 12/12/12

5.5 Exploratory User Study

To investigate whether developers find chain summaries to be indicative of the motivation behind a code change, we conducted an exploratory user study. We set out to answer the following two research questions:

• Q1 - Does the chain of related project documents contain information on the reason behind a code change?
• Q2 - Is the summarization approach (Section 5.3) effective in identifying and extracting information on the rationale behind a code change?

We performed the study with project experts who had worked on the code changes since their knowledge of the project and the code change was needed to accurately investigate the above questions. Because of the participation of experts, the study was exploratory and focused on gaining more insight about how the summarization approach can be improved to more effectively identify the information about the reason of a code change.

Table 5.1: Data used in the exploratory user study
Chain   Bug reports in the chain       Length (words)
#1      279709-361014-369290-370635    1743
#2      355974-376233-378002-385127    2334
#3      158921-349510                  3161
#4      333930-370620                  1090

The study was performed in the context of the open source project Eclipse Mylyn.33 We chose this project because the chain of related documents for each code change can be retrieved from the project's bug repository by following available explicit links. The commit message related to the code change includes the identifier of the last bug report in the chain. This bug report is in turn connected to other bug reports in the chain via 'blocks/depends-on' links. Figure 5.3 shows an example of how a code change and its related chain are linked in Mylyn.

We invited a number of active developers of Mylyn to participate in our study. Two of them accepted to take part in a one-hour study session.
For each participant in our study, we identified two chains related to two code changes made by the participant. Table 5.1 shows detailed information about the selected chains. Participant#1 worked on Chain#1 and Chain#2. Participant#2 worked on Chain#3 and Chain#4. Using the classifier trained on our human-annotated corpus, we produced 150-word summaries for the selected chains.

In a study session, for each chain, the participant was first given the links to the related code change (in Eclipse's version control repository) and the bug reports in the chain (in Eclipse's bug repository). To investigate Q1, we asked the participant the following questions:

• Does the chain of bug reports contain information describing why the code was changed? If so, highlight those parts of text that in particular provide information on the reason behind the code change.
• Is the highlighted information complete in describing why the code has changed? Based on your knowledge of the code, were there any other reasons behind the code change that have not been documented in the chain?

Then, the participant was given the automatically generated summary for the chain. To investigate Q2, we asked the participant the following question:

• Does the summary contain information describing why the code was changed (in particular in reference to the information you highlighted in the bug reports)?

The session was concluded by asking the participant the following questions:

• In case of wanting to know the reason behind a code change, would you rather consult the chain or the summary?
• Where else might you look for relevant information?
• Any suggestions on how the summaries can be improved, either in the content or in the way they are presented?

33 www.eclipse.org/mylyn, verified 12/12/12

[Figure 5.3 shows a commit whose message, "NEW-bug 371984: [api] provide a generic storage API", links to Bug #371984 ("provide a generic storage API"), which depends on Bug #358554 ("[api] provide an extensible store for task-related information"), which in turn depends on Bug #345661 ("API for storing task based SCM configuration in the Mylyn context"); the bugs carry matching 'blocks' links in the other direction.]
Figure 5.3: An example of the links between a code change (commit) and the related chain in Mylyn.

5.5.1 Results

Relevance of Chains

Here are the participants' responses when asked whether the chains provided information about the related code changes (Q1):

• Chain#1: The participant (Participant#1) thought the information in the chain was enough to understand the reason behind the code change ("I would say the information is sufficient to understand why the code changed"). At the same time, some information was missing ("The bugs do not describe the design decisions in detail, e.g. why this was implemented as an SPI and what kind of extensibility is expected").
• Chain#2: Participant#1 did not highlight any part of the first bug report in the chain (355974) as he found it irrelevant to the code change. He thought the rest of the chain contained all the information related to the reason of the code change ("I would say the information is complete, there weren't any other reasons to change the code as far as I know").
• Chain#3: Participant#2 considered the first bug report in the chain (158921) to be irrelevant (nothing was highlighted).
He thought the second bug report included the reason behind the code change minus some explanation (“I think it’s complete, except that there is 1 bit of code that was added which include a comment explaining why it was added - this explanation is not in the bug reports”). 89 5.5. Exploratory User Study • Chain#4 : Participant#2 thought the chain included all information needed to understand the reason behind the code change (“I would say it is complete”). Overall, the participants found the chains of bug reports formed by fol- lowing the dependency links to contain information relevant to the code change. In case of Chain#2 and Chain#3, the first bug report in the chain did not provide any relevant information. In case of Chain#1 and Chain#3, the chain was missing some rather low-level detail related to the reason be- hind the code change. Relevance of Summaries Here are the participants responses when asked whether the chain summaries provided information relevant to the code changes (Q2 ): • Chain#1 : The participant (Participant#1 ) found the summary to contain relevant information, but is missing some technical details (“the summary describes the high level requirement for making the change, but misses some of the technical details in bug #4”). • Chain#2 : Participant#1 thought the summary is relevant, but misses part of the information on the reason behind the code change (“the summary is missing the particular problems (identified in the UI re- view) though that prompted the code change”). • Chain#3 : Participant#2 found the chain summary to be not so rel- evant to the code change (“well, it contains the title of the relevant bug, but the rest of it seems unrelated; it doesn’t have any detailed information about this code change”). • Chain#4 : Participant#2 found the summary highly relevant (“I would say the summary is highly relevant; it does a pretty good job of cutting out unimportant text; the one thing I think is missing is the mention of filtering on connectorKind”). Overall, with the exception of the summary of Chain#3, participants found the summaries to contain information related to the code changes. In 90 5.5. Exploratory User Study the case of Chain#3, one of the two bug reports in the chain was considered as irrelevant by Participant#2, contributing to the fact that the summary contained mostly irrelevant information. In all cases, summaries were miss- ing some technical details related to the code change, often to be found in the last bug report in the chain. So while participants appreciated the pres- ence of high level information in the summaries, they wanted to see more low level technical information on the reason of the code change to be included in the summaries. Post-session Questions When asked whether they prefer consulting a chain or its summary when trying to understand the reason behind a code change, both participants showed an interest in working with summaries: Participant#1 : “if I had the summary linked in the development tool and instantly accessible I would probably consult it first (if it also linked to the original source of the information to drill into details).” Participant#2 : “I would love to have a summary like the second one (summary of Chain#4 ), as it can be quite time consuming to sift through many comments on multiple bug reports looking for the relevant ones. 
The mental effort required to do that makes it very difficult to hold on to the details about the code that you have in your head when you start.” When asked where else (other than the chain of bug reports) they might look for relevant information: Participant#1 : “some of that information could probably be found on other linked bug reports that are not in the direct dependency chain; if the information on the bugs is not sufficient I would look for linked artifacts such as wiki pages or mailing list discussions”. Participant#2 : “patches and other files attached to bug reports, com- ments on Gerrit code reviews, bug reports related to other commits to re- lated code and source code comments”. They both also mentioned that they might contact the developer who had made the code change. When asked for suggestions on how to improve the summaries, they both expressed an interest in having summaries linked to the original bug reports. 91 5.6. Summary Participant#2 also mentioned the possibility of highlighting the summary sentences in the original bug reports. Participant#1 suggested providing several levels of detail (e.g. short summary and long summary). 5.5.2 Threats The main threat to the construct validity of our study was that the sum- maries for the Mylyn chains were generated by a classifier trained on a corpus of human-annotated chains from another project (CONNECT). The chains in the corpus are smaller in length compared to chains used in the study (386±157 words compared to 2082±880 words). Moreover, CON- NECT chains have no or very few comments compared to Mylyn chains. Also, while the corpus was annotated by non-experts, the study was con- ducted with project experts as participants. These differences might be one of the reasons of why, according to the study participants, some relevant technical details were not included in the summaries. 5.6 Summary In this chapter, we addressed a special case of multi-document summariza- tion for natural language software artifacts. A multi-document summariza- tion technique was proposed to generate a summary of a chain of project documents related to a code change. In an exploratory user study, profes- sional developers who authored the code changes, found the summaries to be indicative of the reason behind the code change, but missing some of relevant technical details. The participants also pointed out the possibility of including more information (e.g., code comments and bug reports not in the dependency chain) as well as other linked artifacts (e.g., project wiki pages and email discussions) as the input of the summarization approach. 92 Chapter 6 Discussion and Future Work In this chapter, we discuss various decisions we made in developing summa- rization approaches for software artifacts and in evaluating generated sum- maries. We also discuss future directions for extending the work presented in this dissertation. 6.1 Producing Natural Language Summaries In this dissertation, we chose natural language text as the format of auto- matically produced summaries as this format is easy to read and understand and is flexible for including semantic information and domain knowledge. Another alternative is to extract indicative keywords or phrases from an artifact to form a keyword summary (similar to [25, 41]). A keyword sum- mary is typically less informative compared to a paragraph-length natural language summary. 
However, keyword summaries may be more effective in cases where a developer only needs to know the gist of an artifact to answer an information need. A keyword summary for a software artifact can be generated by using topic modeling techniques such as Latent Semantic Indexing (LSI) [24] or Latent Dirichlet Allocation (LDA) [12] to mine latent topics amongst the natural language content of the artifact.

Using graphs or diagrams is another alternative for conveying summary-worthy information. Graphical representations are typically more precise and less ambiguous compared to natural language text. As an example, for a crosscutting code concern, a UML class diagram or an RDF graph (similar to Figure 4.5) could help represent some relationships between parts of the code belonging to the concern, such as method overriding. But, using these diagrams, it is not straightforward to represent the abstract content of a concern summary. For example, the fact that a large fraction of the methods belonging to a concern override the same method cannot be represented in a UML diagram as easily as it can be represented in natural language text.

Different summary formats can complement each other by being used at the same time to represent information in a software artifact summary. For example, for a bug report, a natural language summary can be accompanied by a flowchart visualizing the information in the steps to reproduce section of the bug report. Having access to summary information in different formats may improve the performance of a software developer. For example, while performing a bug report duplicate detection task, a developer may quickly decide that a recommended bug report is not a duplicate based on a keyword summary, or she may decide to switch to a paragraph-length natural language summary if more investigation is needed. Previous efforts have considered the use of a visualization mode as well as natural language text to represent summary information (e.g., [18]). For software artifacts, future research is needed to investigate what summary format would be most beneficial for a software developer trying to address a particular information need and whether different formats can be integrated to achieve higher efficiency.

6.2 What Should Be Included in a Summary?

An automatically generated summary needs to convey the key information in the input. Different factors contribute to determining the important information to be included in a summary. In supervised approaches (e.g., our approaches in summarizing bug reports and chains of project documents), rules to identify important information are learnt based on the analysis of a set of human-generated summaries. In unsupervised approaches (e.g., our approach in summarizing software concerns), a set of heuristics is used to determine the key information of the input.

The other contributing factor in determining summary-worthy information is whether the goal is to generate a general-purpose or a task-specific summary. General-purpose summaries are produced with no assumption made about the purpose of the summary, whereas task-specific summaries are intended to help users in performing a certain task or addressing a particular information need. For example, the summarization approach we used in Chapter 3 to summarize bug reports produces general-purpose summaries that can be used in any software task involving interaction with bug reports.
The bug report summarizer extracts generically informative sentences because it has been trained on a corpus annotated with generic summaries. We took a different approach in multi-document summarization of a chain of project documents (Chapter 5). In this case, generated summaries are intended to help a developer address a specific information need (the reason behind a code change). Therefore, to annotate the corpus, annotators were asked to pick sentences that best described the reason behind a code change. In the case of crosscutting code concerns, we generated general-purpose summaries based on a set of heuristics to describe the what and how of a concern.

While one benefit of general-purpose summaries is the ability to use them in various tasks, it is likely that summaries generated to be used in certain task settings or to answer particular information needs are more effective. For example, it is likely that a bug report duplicate detection task benefits more from a summary that focuses on the symptoms of a bug rather than on how it should be fixed. As another example, a bug report may be referenced during a software change task to understand how a similar fix was performed in the past [102]. In this case, a bug report summary is probably more helpful if it includes more information on the fix. If several task-specific summaries are available for an artifact, either there should be a mechanism to automatically detect the task or users should have the option of choosing the appropriate summary. Future research is needed to investigate whether task-specific summaries are more effective in helping software developers compared to general-purpose summaries.

6.3 Task-based Evaluation of Summaries

In this dissertation, we have used both intrinsic and extrinsic evaluation approaches. For example, bug report summaries were evaluated both intrinsically by comparing them against gold standard summaries (Section 3.4) and extrinsically by using them in a task-based human evaluation (Section 3.5.2). The intrinsic evaluations focus on measuring the quality of the information content of a generated summary, typically by comparing it with a set of reference (often human-written gold standard) summaries. Intrinsic measures are easily reproducible and can be used to compare the performance of a new technique against previous approaches.

However, the ultimate goal in the development of any summarization approach is to help end users perform a task better. A task-based evaluation needs to be performed to establish that automatically generated summaries are effective in the context of a task. Accordingly, as our investigation throughout this dissertation was concerned with whether summaries could help software developers in addressing their information needs, our focus was on extrinsic evaluation of artifact summaries. Bug report and crosscutting concern summaries were extrinsically evaluated in the context of software tasks. The extrinsic evaluation of summaries generated for chains of software documents was conducted by asking project experts whether the summaries answered the question of interest (the reason behind a code change). Instead of software tasks, activities like comprehension, question answering or relevance judging can be used as the context of extrinsic evaluation of software artifact summaries.
For example, summaries of software artifacts can be assessed by developing a set of questions that probe a programmer's understanding of a software artifact via a summary versus an investigation of the artifact itself. This style of evaluation is very common in the natural language processing community, where numerous task-based evaluations have been performed to establish the effectiveness of summarization systems in a variety of tasks. The tasks include judging whether a particular document is relevant to a topic of interest [58], writing reports on a specific topic [62], finding scientific articles related to a research topic [99] and performing a decision audit (determining how and why a given decision was made) [69]. Compared to task-based user studies, such evaluations are typically less expensive to conduct and therefore might facilitate future research in incremental tuning and improvement of the summarization approaches.

It is an open question whether there is a correlation between intrinsic measures and the usefulness of a summary in a task-based setting. For example, the bug report summarizer used in Section 3.3 has a pyramid precision of .65. Summaries generated by this bug report summarizer were shown to be helpful in the context of duplicate bug report detection tasks. Yet it is not clear whether an improvement in the intrinsic measure (e.g., an increase from .65 to .75) would make summaries noticeably more helpful for a developer working on a duplicate detection task. These questions can be investigated as part of future research.

6.4 Improving Summaries by Using Domain-Specific Summarization Approaches

The summarization approaches proposed in this dissertation were shown to generate summaries that help developers. Given this initial evidence, future efforts may focus more on domain-specific features of software artifacts to improve generated summaries.

For example, the accuracy of the bug report summarizer may be improved by augmenting the set of generic features with domain-specific features. For instance, comments made by people who are more active in the project might be more important, and thus should be more likely to be included in the summary. As another example, it was noted by many of the participants in our task-based evaluation that 'steps to reproduce' in a bug report description helped them in determining duplicates. This information has also been identified as important by developers in a separate study [10]. The usefulness of bug report summaries might be improved if an indication of the availability of 'steps to reproduce' information were included in the summary, so that a developer can refer to the original bug report in cases where she needs this information in addition to the summary.

As another example, as part of the annotation process, we also gathered information about the intent of sentences, such as whether a sentence indicated a 'problem', 'suggestion', 'fix', 'agreement', or 'disagreement'. Similar to the approach proposed by Murray and colleagues [68] in using speech acts, this information can be used to train classifiers to map sentences of a bug report to appropriate labels. Then an abstractive summary can be generated by identifying patterns that abstract over multiple sentences. We leave the investigation of generating such abstractive summaries to future research.
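As a rough illustration of the kind of classifier this would involve (not the implementation used in this dissertation), the following sketch maps bug report sentences to intent labels using TF-IDF features and a linear SVM; the label set mirrors the one used in our annotation (Appendix A), and the training data is assumed to be the (sentence, label) pairs gathered during annotation.

# Illustrative sketch: classify bug report sentences by intent
# ('problem', 'suggestion', 'fix', 'agreement', 'disagreement', 'meta').
# Assumes scikit-learn and a list of (sentence, label) pairs from the
# annotated corpus; this is not the dissertation's implementation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def train_intent_classifier(sentences, labels, folds=10):
    """Train a TF-IDF + linear SVM intent classifier over annotated sentences."""
    classifier = make_pipeline(
        TfidfVectorizer(lowercase=True, ngram_range=(1, 2), min_df=2),
        LinearSVC(C=1.0),
    )
    # Cross-validated accuracy gives a first estimate of how learnable
    # the intent labels are before committing to an abstractive pipeline.
    accuracy = cross_val_score(classifier, sentences, labels, cv=folds).mean()
    classifier.fit(sentences, labels)
    return classifier, accuracy

# Example use (hypothetical data):
#   classifier, accuracy = train_intent_classifier(corpus_sentences, corpus_labels)
#   classifier.predict(["Re-installing CDT fixed the problem."])  # expected: 'fix'

Sentences labelled this way could then feed the kind of pattern-based abstraction that Murray and colleagues [68] describe for meeting conversations.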
In the case of crosscutting code concerns, more sophisticated analyses can be performed on the code to extract more data-flow information between concern elements. This information might help alleviate navigations a programmer might otherwise have to perform to learn the same information.

6.5 Summarizing Other Software Artifacts

In this dissertation, we developed techniques to generate summaries for two different types of software artifacts: bug reports as an example of natural language software artifacts and crosscutting code concerns as an example of structured software artifacts. One possibility for future research is to investigate summarization for other types of software artifacts. Examples include the summarization of requirement documents, UML diagrams, a project's wiki pages and test suites. The content of all these software artifacts is a combination of structured and natural language information, with some (e.g., requirement documents) being more natural language and some (e.g., test suites) being more structured. Different summarization techniques (e.g., abstractive vs. extractive) and different representation formats (e.g., natural language vs. graphical representations) might need to be integrated to account for the mixed content of software artifacts. Research is needed to determine what information should be included in summaries and to investigate how developers may benefit from summaries in performing software tasks.

6.6 Multi-document Summarization of Software Artifacts

We investigated a special case of multi-document summarization of software artifacts in Chapter 5 involving a chain of project documents containing information about a code change. However, considering the inter-connected network of software artifacts in a software project, the need for multi-document summarization quickly emerges even when a summary is to be generated for a single software artifact. For example, in the case of bug report summarization, a bug report may be related to other bug reports via explicit dependency links or it may refer to certain modules in the code. Depending on the information need the summary intends to answer, multi-document summarization might be needed to generate a summary for a collection of related software artifacts. Future research may investigate different applications of multi-document summarization of software artifacts to address various information needs. For example, different software artifacts can be related to a software change task, including bug reports, pieces of code, and pages in the project's wiki. A multi-document summary of these software artifacts can serve as a concise summary of the entire task, enabling a developer to understand different aspects of the task (e.g., the what, how and why of the task) at an abstract level.

The main challenge in multi-document summarization of software artifacts is to retrieve and abstractly model the network of software artifacts relevant to the information need. For example, we modeled the input of our multi-document summarization approach in Chapter 5 as a chain of project documents. The chain was formed by following explicit dependency links between documents. However, in many software projects, not all related software artifacts are explicitly linked together. Future research may investigate techniques like text mining and information retrieval (e.g., [57]) to retrieve a set of the most relevant documents to answer an information need.
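As a simple illustration of what such a retrieval step might look like (a sketch only, under the assumption that the text of the candidate artifacts is available as plain strings), related documents can be ranked against a query, such as a commit message, using TF-IDF term weights and cosine similarity:

# Illustrative sketch: rank candidate project documents (bug reports, wiki
# pages, etc.) against an information need using TF-IDF and cosine
# similarity. Assumes scikit-learn; this is not a retrieval method
# evaluated in this dissertation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def most_relevant(query, documents, k=5):
    """Return the indices and scores of the k documents most similar to the query."""
    vectorizer = TfidfVectorizer(stop_words="english")
    document_vectors = vectorizer.fit_transform(documents)
    query_vector = vectorizer.transform([query])
    scores = cosine_similarity(query_vector, document_vectors).ravel()
    ranking = sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in ranking[:k]]

The top-ranked documents would then form the input set for the summarizer in place of an explicitly linked chain.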
Moreover, it is not always possible to put the relevant documents in a total order to form a chain. We leave the investigation of such cases, where the input set of documents can be modeled as a graph, to future research.

Chapter 7

Summary

To answer an information need while performing a software task, a software developer sometimes has to interact with a lot of software artifacts to find relevant information. Sometimes, it only takes the developer a quick glance to determine whether an artifact contains any relevant information. Other times, typically when interacting with a lengthy software artifact, the developer has to read through large amounts of information and many details of the artifact.

In this dissertation, we proposed the use of automatically generated summaries of software artifacts to help a software developer more efficiently interact with them while trying to address an information need. Based on their content, software artifacts form a spectrum where software artifacts with mostly natural language content are at one end and software artifacts with mostly structured content are at the other end. We chose to focus on two different types of software artifacts to address both ends of this spectrum: bug reports as an example of natural language software artifacts and crosscutting code concerns as an example of structured software artifacts. Since addressing an information need might involve making sense of a collection of software artifacts as a whole, we also studied a case of multi-document summarization of natural language software artifacts: a chain of project documents related to a code change.

We developed summarization techniques for each of the three cases mentioned above. For bug reports, we used an extractive approach based on an existing supervised summarization system for conversation-based data. For crosscutting code concerns, we developed an abstractive summarization approach. For multi-document summarization of project documents, we developed an extractive supervised summarization approach. In each case, to establish the effectiveness of generated summaries in assisting software developers, the generated summaries were evaluated extrinsically by conducting user studies. Summaries of bug reports were evaluated in the context of bug report duplicate detection tasks. The results showed that summaries helped participants save time completing a task without degrading the accuracy of task performance. Summaries of crosscutting code concerns were evaluated in the context of software code change tasks. The results showed that summaries helped participants more effectively find pertinent code related to the change tasks and that summaries made programmers perceive a task as less difficult and their performance as more successful. Summaries of chains of project documents were evaluated by a number of project experts who found summaries to contain information describing the reason behind corresponding code changes.

This dissertation makes a number of contributions. For bug reports:

• It demonstrates that it is possible to generate accurate summaries of individual bug reports.

• It reports on the creation of an annotated corpus of 36 bug reports chosen from four different open source software systems. The recent approaches proposed for summarizing bug reports have used this corpus to evaluate automatically generated bug report summaries [56, 59].
• It demonstrates that while existing classifiers trained for other conversation- based genres can work reasonably well, a classifier trained specifically for bug reports scores the highest on standard measures. • It reports on a task-based evaluation that showed bug report sum- maries can help developers working on duplicate detection tasks save time without impairing the accuracy of task performance. For crosscutting code concerns: • It introduces the concept of using generated natural language sum- maries of concern code to aid software evolution tasks. • It introduces a summarization technique to produce such natural lan- guage summaries automatically from a description of the code that contributes to a concern. 101 Chapter 7. Summary • It reports on a tool developed to allow programmers to access and interact with concern summaries for Java programs. • It reports on a task-based evaluation that showed such natural lan- guage summaries can help a programmer find pertinent code when working on software change tasks. For project documents related to a code change: • It proposes the use of multi-document summaries to assist developers in addressing an information need when different pieces of information from different artifacts needs to be put together. • It introduces a summarization technique to generate a summary of a chain of project documents related to a code change. • It reports on a user study in which summaries of chains of project documents were found to be effective in describing the reason behind related code changes as judged by project experts. Overall, the main contribution of this dissertation to the field of software engineering is an end-to-end demonstration that reasonably accurate natu- ral language summaries can be automatically produced for different types of software artifacts: bug reports (as an example of mostly natural language software artifacts), software concerns (as an example of mostly structured software artifacts) and a chain of documents related to a code change (as an example of inter-related natural language software artifacts) and that the generated summaries are effective in helping developers address their information needs. This contribution serves as a proof-of-concept that sum- marization of software artifacts is a promising direction of research in the field of software engineering and that summaries of software artifacts may help developers in a variety of software tasks. Main directions for future work include improving the proposed sum- marization techniques by utilizing domain-specific structure and content of software artifacts. Other possible directions are to develop summarization systems for other types of software artifacts (e.g., project wiki pages, require- ment documents, design diagrams) and to explore multi-document summa- rization of software artifacts when the input is an inter-linked network of 102 Chapter 7. Summary software artifacts of different types (e.g., a network of inter-linked bug re- ports and code elements). 103 Bibliography [1] H. Abdi. Bonferroni and Sidak corrections for multiple comparisons. In N. J. Salkind, editor, Encyclopedia of Measurement and Statistics, 2007. [2] Bram Adams, Zhen Ming Jiang, and Ahmed E. Hassan. Identifying crosscutting concerns using historical code changes. In ICSE ’10: Pro- ceedings of the 32nd ACM/IEEE International Conference on Software Engineering, pages 305–314, New York, NY, USA, 2010. ACM. [3] Stergos Afantenos, Vangelis Karkaletsis, and Panagiotis Stamatopou- los. 
Summarization from medical documents: a survey. Artif. Intell. Med., 33(2):157–177, February 2005. [4] John Anvik, Lyndon Hiew, and Gail C. Murphy. Coping with an open bug repository. In Proceedings of the 2005 OOPSLA Workshop on Eclipse Technology eXchange, pages 35–39, New York, NY, USA, 2005. ACM. [5] B. Ashok, Joseph Joy, Hongkang Liang, Sriram K. Rajamani, Gopal Srinivasa, and Vipindeep Vangala. DebugAdvisor: a recommender system for debugging. In ESEC/FSE ’09: Proceedings of the the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineer- ing, pages 373–382, 2009. [6] Elisa L. A. Baniassad, Gail C. Murphy, and Christa Schwanninger. De- sign pattern rationale graphs: linking design to source. In ICSE ’03: Proceedings of the 25th International Conference on Software Engi- neering, pages 352–362, Washington, DC, USA, 2003. IEEE Computer Society. [7] Elisa L. A. Baniassad, Gail C. Murphy, Christa Schwanninger, and Michael Kircher. Managing crosscutting concerns during software evolution tasks: an inquisitive study. In AOSD’02: Proceedings of 104 Bibliography the first international conference on Aspect-oriented software develop- ment, pages 120–126, 2002. [8] Regina Barzilay and Kathleen R. McKeown. Sentence fusion for mul- tidocument news summarization. Comput. Linguist., 31(3):297–328, September 2005. [9] Dane Bertram, Amy Voida, Saul Greenberg, and Robert Walker. Com- munication, collaboration, and bugs: the social nature of issue track- ing in small, collocated teams. In CSCW’10: Proceeding of the ACM Conference on Computer Supported Cooperative Work, pages 291–300, 2010. [10] Nicolas Bettenburg, Sascha Just, Adrian Schröter, Cathrin Weiss, Rahul Premraj, and Thomas Zimmermann. What makes a good bug report? In SIGSOFT ’08/FSE-16: Proceedings of the 16th ACM SIGSOFT International Symposium on the Foundations of Software Engineering, pages 308–318, New York, NY, USA, 2008. ACM. [11] Nicolas Bettenburg, Rahul Premraj, Thomas Zimmermann, and Sunghun Kim. Duplicate bug reports considered harmful; really? In ICSM’08: Proceedings of the IEEE International Conference on Soft- ware Maintenance, pages 337 –345, 2008. [12] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003. [13] Alexander W. J. Bradley and Gail C. Murphy. Supporting software history exploration. In MSR’11: Proceedings of the IEEE Working Conference on Mining Software Repositories, pages 193–202, 2011. [14] Silvia Breu and Jens Krinke. Aspect mining using event traces. In ASE’04: Proceedings of the International Confefence on Automated Software Engineering, pages 310–315, 2004. [15] Silvia Breu, Rahul Premraj, Jonathan Sillito, and Thomas Zimmer- mann. Information needs in bug reports: improving cooperation be- tween developers and users. In CSCW’10: Proceeding of the ACM Conference on Computer Supported Cooperative Work, pages 301–310, 2010. [16] Shilpa Bugde, Nachiappan Nagappan, Sriram K. Rajamani, and G. Ramalingam. Global Software Servicing: Observational Experi- 105 Bibliography ences at Microsoft. In ICGSE’08: Proceedings of the IEEE Interna- tional Conference on Global Software Engineering, pages 182 –191, 2008. [17] Giuseppe Carenini, Gabriel Murray, and Raymond Ng. Methods for mining and summarizing text conversations. Synthesis Lectures on Data Management, 3(3):1–130, 2011. [18] Giuseppe Carenini, Raymond T. Ng, and Adam Pauls. 
Interactive multimedia summaries of evaluative text. In IUI’06: Proceedings of the 11th international conference on Intelligent user interfaces, pages 124–131, 2006. [19] Giuseppe Carenini, Raymond T. Ng, and Xiaodong Zhou. Summa- rizing emails with conversational cohesion and subjectivity. In ACL- 08: HLT: Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 353–361, New York, NY, USA, 2008. ACM. [20] Jean Carletta, Simone Ashby, Sebastien Bourban, Mike Flynn, Thomas Hain, Jaroslav Kadlec, Vasilis Karaiskos, Wessel Kraaij, Melissa Kronenthal, Guillaume Lathoud, Mike Lincoln, Agnes Lisowska, and Mccowan Wilfried Post Dennis Reidsma. The ami meeting corpus: A pre-announcement. In MLMI’05: Proceedings of Machine Learning for Multimodal Interaction: Second International Workshop, pages 28–39, 2005. [21] Y.-W. Chen and C.-J. Lin. Combining svms with various feature se- lection strategies. In Feature extraction, foundations and applications, pages 315–324. Springer, 2006. [22] Brian de Alwis and Gail C. Murphy. Using visual momentum to ex- plain disorientation in the Eclipse IDE. In VL-HCC’06: Proceedings of the International Conference on Visual Languages and Human-Centric Computing, pages 51–54, 2006. [23] Brian de Alwis, Gail C. Murphy, and Shawn Minto. Creating a cogni- tive metric of programming task difficulty. In Proceedings of the Inter- national Workshop on Cooperative and Human Aspects of Softtware Engineering, pages 29–32, 2008. [24] Scott Deerwester, Susan T. Dumais, George W Furnas, Thomas K Landauer, and Richard Harshman. Indexing by latent semantic 106 Bibliography analysis. Journal of the American society for information science, 41(6):391–407, 1990. [25] Mark Dredze, Hanna M. Wallach, Danny Puller, and Fernando Pereira. Generating summary keywords for emails using topics. In IUI’08: Proceedings of the 13th international conference on Intelli- gent user interfaces, pages 199–206, 2008. [26] Marc Eaddy, Thomas Zimmermann, Kaitin D. Sherwood, Vibhav Garg, Gail C. Murphy, Nachiappan Nagappan, and Alfred V. Aho. Do crosscutting concerns cause defects? IEEE Transactions on Soft- ware Engineering, 34(4):497–515, 2008. [27] Noemie Elhadad, Min-Yen Kan, Judith L. Klavans, and Kathleen McKeown. Customization in a unified framework for summarizing medical literature. Artificial Intelligence in Medicine, 33(2):179–198, February 2005. [28] Noemie Elhadad, Kathleen Mckeown, David Kaufman, and Desmond Jordan. Facilitating physicians’ access to information via tailored text. In In AMIA Annual Symposium, page 07, 2005. [29] Güneş Erkan and Dragomir R. Radev. LexPageRank: Prestige in Multi-Document Text Summarization. In EMNLP’04: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 365–371, 2004. [30] Günes Erkan and Dragomir R. Radev. LexRank: graph-based lexi- cal centrality as salience in text summarization. Journal of Artificial Intelligence Research, 22(1):457–479, December 2004. [31] Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871–1874, 2008. [32] Tom Fawcett. ROC Graphs: Notes and Practical Considerations for Researchers. Technical report, HP Laboratories, 2004. [33] J.L. Fleiss et al. Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5):378–382, 1971. [34] Beat Fluri, Michael Würsch, Martin Pinzger, and Harald Gall. 
Change Distilling: Tree Differencing for Fine-Grained Source Code Change 107 Bibliography Extraction. IEEE Transactions on Software Engineering, 33(11):725 –743, nov. 2007. [35] Linton C. Freeman. Centrality in social networks: Conceptual clarifi- cation. Social Networks, 1(3):215–239, 1979. [36] Thomas Fritz and Gail C. Murphy. Using information fragments to answer the questions developers ask. In ICSE’10: Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 1, pages 175–184, 2010. [37] Zachary P. Fry, David Shepherd, Emily Hill, Lori L. Pollock, and K. Vijay-Shanker. Analysing source code: looking for useful verbdirect object pairs in all the right places. IET Software, 2(1):27–36, 2008. [38] Xiaodong Zhou Giuseppe Carenini, Raymond T. Ng. Summarizing email conversations with clue words. In WWW ’07: Proceedings of the 16th International World Wide Web Conference, 2007. [39] Jade Goldstein, Vibhu Mittal, Jaime Carbonell, and Mark Kantrowitz. Multi-document summarization by sentence extraction. In NAACL- ANLP-AutoSum’00: Proceedings of the NAACL-ANLPWorkshop on Automatic summarization - Volume 4, pages 40–48, 2000. [40] Udo Hahn and Inderjeet Mani. The challenges of automatic summa- rization. Computer, 33(11):29–36, 2000. [41] Sonia Haiduc, Jairo Aponte, Laura Moreno, and Andrian Marcus. On the use of automated text summarization techniques for summarizing source code. In WCRE’10: Proceedings of the 17th IEEE Working Conference on Reverse Engineering, pages 35 –44, 2010. [42] S.G. Hart and L. Staveland. Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. In Human men- tal workload, pages 139–183. P.A. Hancock and N. Meshkati (Eds.), Amsterdam: Elsevier, 1988. [43] Emily Hill, Lori Pollock, and K. Vijay-Shanker. Improving source code search with natural language phrasal representations of method signa- tures. In ASE’11: Proceedings of the 2011 26th IEEE/ACM Interna- tional Conference on Automated Software Engineering, pages 524–527, 2011. 108 Bibliography [44] Emily Hill, Lori L. Pollock, and K. Vijay-Shanker. Automatically capturing source code context of nl-queries for software maintenance and reuse. In ICSE, pages 232–242, 2009. [45] Nicholas Jalbert and Westley Weimer. Automated duplicate detec- tion for bug tracking systems. In DSN’08: Proceedings of The 38th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, pages 52–61, 2008. [46] Gregor Kiczales, John Lamping, Anurag Mendhekar, Chris Maeda, Cristina Videira Lopes, Jean-Marc Loingtier, and John Irwin. Aspect- oriented programming. In ECOOP’97: Proceedings of the European Conference on Object-Oriented Programming, pages 220–242, 1997. [47] Bryan Klimt and Yiming Yang. Introducing the Enron Corpus. In CEAS ’04: Proceedings of the First Conference on Email and Anti- Spam, 2004. [48] A.J. Ko, R. DeLine, and G. Venolia. Information needs in collocated software development teams. In Proceedings of the 29th international conference on Software Engineering, pages 344–353, 2007. [49] B. Kolluru, Y. Gotoh, and H. Christensen. Multi-stage compaction approach to broadcast news summarisation. In Interspeech ’05- Eurospeech: Proceedings of the 9th European Conference on Speech Communication and Technology, pages 69–72, 2005. [50] Julian Kupiec, Jan O. Pedersen, and Francine Chen. A trainable document summarizer. 
In SIGIR’95: Proceedings of the International ACM SIGIR conference on Research and development in information retrieval, pages 68–73, 1995. [51] L. C. Freeman. A set of measures of centrality based on betweenness. Sociometry, 40(1):35–41, 1977. [52] Janice Langan-Fox, Chris Platania-Phung, and Jennifer Waycott. Ef- fects of advance organizers, mental models and abilities on task and recall performance using a mobile phone network. Applied Cognitive Psychology, 20(9):1143–1165, 2006. [53] Anna S. Law, Yvonne Freer, Jim Hunter, Robert H. Logie, Neil Mcin- tosh, and John Quinn. A comparison of graphical and textual pre- sentations of time series data to support medical decision making in 109 Bibliography the neonatal intensive care unit. Journal of Clinical Monitoring and Computing, 19:183–194, 2005. [54] Chin-Yew Lin. Rouge: A package for automatic evaluation of sum- maries. In Stan Szpakowicz Marie-Francine Moens, editor, Text Summarization Branches Out: Proceedings of the ACL-04 Workshop, pages 74–81, Barcelona, Spain, July 2004. Association for Computa- tional Linguistics. [55] Chin-Yew Lin and Eduard Hovy. The automated acquisition of topic signatures for text summarization. In Proceedings of the 18th con- ference on Computational linguistics - Volume 1, COLING ’00, pages 495–501, Stroudsburg, PA, USA, 2000. Association for Computational Linguistics. [56] Rafael Lotufo, Zeeshan Malik, and Krzysztof Czarnecki. Modelling the ‘hurried’ bug report reading process to summarize bug reports. In ICSM’12: Proc. of the 28th IEEE International Conference on Soft- ware Maintenance, 2012. [57] Andrea De Lucia, Fausto Fasano, Rocco Oliveto, and Genoveffa Tor- tora. Recovering traceability links in software artifact management systems using information retrieval methods. ACM Trans. Softw. Eng. Methodol., 16(4), 2007. [58] Inderjeet Mani, David House, Gary Klein, Lynette Hirschman, Therese Firmin, and Beth Sundheim. The TIPSTER SUMMAC text summarization evaluation. In Proceedings of the ninth conference on European chapter of the Association for Computational Linguistics, EACL ’99, pages 77–85, Stroudsburg, PA, USA, 1999. Association for Computational Linguistics. [59] Senthil Mani, Rose Catherine, Vibha Singhal Sinha, and Avinava Dubey. Ausum: approach for unsupervised bug report summariza- tion. In Proceedings of the ACM SIGSOFT 20th International Sym- posium on the Foundations of Software Engineering, FSE ’12, pages 11:1–11:11, New York, NY, USA, 2012. ACM. [60] Andrian Marcus and Jonathan I. Maletic. Recovering documentation- to-source-code traceability links using latent semantic indexing. In ICSE ’03: Proceedings of the 25th International Conference on Soft- ware Engineering, pages 125–135, Washington, DC, USA, 2003. IEEE Computer Society. 110 Bibliography [61] Marius Marin, Arie Van Deursen, and Leon Moonen. Identifying as- pects using fan-in analysis. In WCRE’04: Proceedings of the 11th Working Conference on Reverse Engineering, pages 132–141, 2004. [62] Kathleen McKeown, Rebecca J. Passonneau, David K. Elson, Ani Nenkova, and Julia Hirschberg. Do summaries help? In SIGIR’05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, pages 210–217, 2005. [63] Kathleen R. McKeown, Regina Barzilay, David Evans, Vasileios Hatzi- vassiloglou, Judith L. Klavans, Ani Nenkova, Carl Sable, Barry Schiff- man, and Sergey Sigelman. Tracking and summarizing news on a daily basis with columbia’s newsblaster. 
In HLT’02: Proceedings of the second international conference on Human Language Technology Research, pages 280–285, 2002. [64] Qiaozhu Mei and ChengXiang Zhai. Generating Impact-Based Sum- maries for Scientific Literature. In ACL’08: Proceedings of the Annual Meeting on Association for Computational Linguistics, pages 816–824, 2008. [65] Tom Mens, Juan Fernández-Ramil, and Sylvain Degrandsart. The evolution of Eclipse. In ICSM’08: Proceedings of the 24th International Conference on Software Maintenance, pages 386 –395, 2008. [66] Rada Mihalcea and Paul Tarau. TextRank: Bringing Order into Text. In EMNLP’04: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 404–411, 2004. [67] Gabriel Murray and Giuseppe Carenini. Summarizing spoken and written conversations. In EMNLP’08: Proceedings of the 2008 Con- ference on Empirical Methods on Natural Language Processing, 2008. [68] Gabriel Murray, Giuseppe Carenini, and Raymond Ng. Generating and validating abstracts of meeting conversations: a user study. In INLG’10: Proceedings of the 6th International Natural Language Gen- eration Conference, pages 105–113, 2010. [69] Gabriel Murray, Thomas Kleinbauer, Peter Poller, Tilman Becker, Steve Renals, and Jonathan Kilgour. Extrinsic summarization eval- uation: A decision audit task. ACM Trans. Speech Lang. Process., 6(2):2:1–2:29, October 2009. 111 Bibliography [70] Gabriel Murray, Steve Renals, and Jean Carletta. Extractive summa- rization of meeting recordings. In Interspeech’05-Eurospeech: Proceed- ings of the 9th European Conference on Speech Communication and Technology, pages 593–596, 2005. [71] Gabriel Murray, Steve Renals, Jean Carletta, and Johanna Moore. Evaluating automatic summaries of meeting recordings. In MTSE ’05: Proceedings of the 43rd Annual Meeting of the Association for Com- putational Linguistics, Workshop on Machine Translation and Sum- marization Evaluation, pages 39–52. Rodopi, 2005. [72] Ani Nenkova and Kathleen McKeown. Automatic summarization. Foundations and Trends in Information Retrieval, 5(2-3):103–233, 2011. [73] Ani Nenkova and Rebecca Passonneau. Evaluating content selection in summarization: the pyramid method. In HLT-NAACL ’04: Pro- ceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, 2004. [74] Paul Over, Hoa Dang, and Donna Harman. DUC in context. Infor- mation Processing & Management, 43(6):1506–1520, 2007. [75] Franois Portet, Ehud Reiter, Albert Gatt, Jim Hunter, Somayajulu Sripada, Yvonne Freer, and Cindy Sykes. Automatic generation of textual summaries from neonatal intensive care data. Artificial Intel- ligence, 173(78):789 – 816, 2009. [76] Ranjith Purushothaman and Dewayne E. Perry. Toward understand- ing the rhetoric of small source code changes. IEEE Transactions on Software Engineering, 31(6):511 – 526, 2005. [77] Vahed Qazvinian and Dragomir R. Radev. Scientific paper summariza- tion using citation summary networks. In COLING’08: Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1, pages 689–696, 2008. [78] Dragomir R. Radev, Timothy Allison, Sasha Blair-Goldensohn, John Blitzer, Arda Celebi, Stanko Dimitrov, Elliott Drabek, Ali Hakim, Wai Lam, Danyu Liu, et al. MEAD-a platform for multidocument multilingual text summarization. In LREC’04: Proceedings of the In- ternational Conference on Language Resources and Evaluation, 2004. 112 Bibliography [79] Dragomir R. 
Radev, Hongyan Jing, Ma lgorzata Styś, and Daniel Tam. Centroid-based summarization of multiple documents. Information Processing & Management, 40(6):919–938, 2004. [80] Dragomir R. Radev, Jahna Otterbacher, Adam Winkel, and Sasha Blair-Goldensohn. NewsInEssence: summarizing online news topics. Commununications of ACM, 48(10):95–98, October 2005. [81] Owen Rambow, Lokesh Shrestha, John Chen, and Chirsty Laurid- sen. Summarizing email threads. In HLT-NAACL’04: Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, 2004. [82] Sarah Rastkar and Gail C. Murphy. Why did this code change? In ICSE’13: Proceeding of the 35th International Confernce on Software Engineering, 2013. [83] Sarah Rastkar, Gail C. Murphy, and Alexander W. J. Bradley. Gen- erating natural language summaries for crosscutting source code con- cerns. In Software Maintenance (ICSM), 2011 27th IEEE Interna- tional Conference on, pages 103 –112, 2011. [84] Sarah Rastkar, Gail C. Murphy, and Gabriel Murray. Summarizing software artifacts: a case study of bug reports. In ICSE’10: Proceed- ings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 1, pages 505–514, 2010. [85] Ehud Reiter, Somayajulu Sripada, Jim Hunter, and Ian Davy. Choos- ing words in computer-generated weather forecasts. Artificial Intelli- gence, 167:137–169, 2005. [86] Martin P. Robillard. Automatic generation of suggestions for program investigation. SIGSOFT Softw. Eng. Notes, 30(5):11–20, September 2005. [87] Martin P. Robillard. Tracking concerns in evolving source code: An empirical study. In ICSM’06: Proceedings of the International Confe- fence on Software Maintenance, pages 479–482, 2006. [88] Martin P. Robillard, Wesley Coelho, and Gail C. Murphy. How effec- tive developers investigate source code: An exploratory study. IEEE Transactions on Software Engineering, 30:889–903, 2004. 113 Bibliography [89] Martin P. Robillard and Gail C. Murphy. Concern graphs: finding and describing concerns using structural program dependencies. In ICSE’02: Proceedings of the 24th International Conference on Soft- ware Engineering, pages 406–416, 2002. [90] Martin P. Robillard and Frédéric Weigand-Warr. ConcernMapper: simple view-based separation of scattered concerns. In Proc. of the OOPSLA Workshop on Eclipse Technology eXchange, pages 65–69, 2005. [91] Per Runeson, Magnus Alexandersson, and Oskar Nyholm. Detec- tion of duplicate defect reports using natural language processing. In ICSE’07: Proceedings of the 29th International Conference on Soft- ware Engineering, pages 499–510, 2007. [92] Robert J. Sandusky and Les Gasser. Negotiation and the coordination of information and activity in distributed software problem manage- ment. In GROUP’05: Proceedings of the 2005 international ACM SIG- GROUP conference on Supporting group work, pages 187–196, 2005. [93] David Shepherd, Zachary P. Fry, Emily Hill, Lori Pollock, and K. Vijay-Shanker. Using natural language program analysis to locate and understand action-oriented concerns. In AOSD’07: Proceedings of the 6th international conference on Aspect-oriented software devel- opment, pages 212–224, 2007. [94] H. Grogory Silber and Kathleen F. McCoy. Efficiently computed lex- ical chains as an intermediate representation for automatic text sum- marization. Comput. Linguist., 28(4):487–496, December 2002. [95] Giriprasad Sridhara, Emily Hill, Divya Muppaneni, Lori Pollock, and K. Vijay-Shanker. 
Towards automatically generating summary com- ments for java methods. In ASE’10: Proceedings of the IEEE/ACM in- ternational conference on Automated software engineering, pages 43– 52, 2010. [96] Chengnian Sun, David Lo, Siau-Cheng Khoo, and Jing Jiang. Towards more accurate retrieval of duplicate bug reports. In ASE’11: Proceed- ings of the 26th IEEE/ACM International Conference on Automated Software Engineering, pages 253 –262, nov. 2011. [97] Chengnian Sun, David Lo, Xiaoyin Wang, Jing Jiang, and Siau-Cheng Khoo. A discriminative model approach for accurate duplicate bug 114 Bibliography report retrieval. In ICSE’10: Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 1, pages 45–54, 2010. [98] Lin Tan, Ding Yuan, Gopal Krishna, and Yuanyuan Zhou. /*icom- ment: bugs or bad comments?*/. In SOSP’07: Proceedings of the twenty-first ACM SIGOPS symposium on Operating systems princi- ples, pages 145–158, 2007. [99] Simone Teufel. Task-based evaluation of summary quality: Describing relationships between scientific papers. In Automatic Summarization Workshop, NAACL, pages 12–21, 2001. [100] Jenine Turner and Eugene Charniak. Supervised and unsupervised learning for sentence compression. In ACL’05: Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, pages 290–297, 2005. [101] Marian van der Meulen, Robert H. Logie, Yvonne Freer, Cindy Sykes, Neil McIntosh, and Jim Hunter. When a graph is poorer than 100 words: A comparison of computerised natural language generation, human generated descriptions and graphical displays in neonatal in- tensive care. Applied Cognitive Psychology, 24(1):77–89, 2010. [102] Davor Čubranić and Gail C. Murphy. Hipikat: recommending perti- nent software development artifacts. In ICSE’03: Proceedings of the 25th International Conference on Software Engineering, pages 408– 418, 2003. [103] Lucian Voinea, Alex Telea, and Jarke J. van Wijk. CVSscan: Visual- ization of code evolution. In Softviz’05: Proceedings of the 2005 ACM Symposium on Software Visualization, pages 47–56, 2005. [104] Stephen Wan and Kathy McKeown. Generating overview summaries of ongoing email thread discussions. In COLING’04: Proceedings of the 20th International Conference on Computational Linguistics, pages 549–556, 2004. [105] Xiaojun Wan and Jianwu Yang. Multi-document summarization us- ing cluster-based link analysis. In SIGIR’08: Proceedings of the 31st annual international ACM SIGIR conference on Research and devel- opment in information retrieval, pages 299–306, 2008. 115 [106] Xiaoyin Wang, Lu Zhang, Tao Xie, John Anvik, and Jiasu Sun. An approach to detecting duplicate bug reports using natural language and execution information. In ICSE’08: Proceedings of the 30th In- ternational Conference on Software Engineering, pages 461–470, 2008. [107] Michael Würsch, Giacomo Ghezzi, Matthias Hert, Gerald Reif, and Harald C. Gall. SEON: a pyramid of ontologies for software evolution and its applications. Computing, 94:857–885, 2012. [108] Michael Würsch, Giacomo Ghezzi, Gerald Reif, and Harald C. Gall. Supporting Developers with Natural Language Queries. In Proceedings of the 32nd International Conference on Software Engineering, page to appear. IEEE Computer Society, May 2010. [109] Jin Yu, Ehud Reiter, Jim Hunter, and Chris Mellish. Choosing the content of textual summaries of large time-series data sets. Nat. Lang. Eng., 13(1):25–49, March 2007. [110] Hao Zhong, Lu Zhang, Tao Xie, and Hong Mei. 
Inferring resource specifications from natural language api documentation. In Proc. 24th IEEE/ACM International Conference on Automated Software Engi- neering (ASE 2009), November 2009. [111] X. Zhu and G. Penn. Summarization of spontaneous conversations. In Interspeech ’06-ICSLP: Proceedings of the 9th International Con- ference on Spoken Language Processing, pages 1531–1534, 2006. 116 Appendix A Bug Report Corpus: Annotator Instructions In this appendix, we provide the detailed instructions used by the annotators to annotate our bug report corpus (Section 3.2). 117 Appendix A. Bug Report Corpus: Annotator Instructions General Task Overview: In this task, your goal is to summarize bug reports and to annotate individual sentences for certain phenomena described below. The Data: The bug reports you will be summarizing and annotating are from a few open source software projects. Bug reports are categorized into two groups: Defect: these bug reports report a defect when the system is not working as specified.1. Enhancemnet: these bug reports are used to request a new feature in the system or a modification in the specified behavior of an existing one. 2. Some of the bug reports might have technical details referring to the code base of the underlying project. But you are not required to be familiar with that specific project in order to complete the task. Annotation Details: You will be presented with a bug report consisting of a description and several comments. Before carrying out any annotation, read through the bug report in its entirety to get a feel for the discussion. Once you have made a preliminary read-through of the bug report, you will proceed with annotation in three steps. 1. In the first step we ask you to write a summary of the entire bug report, using the text box at the top of the browser. You will author a single summary for the bug report, not a summary of each individual comment. The summary must be no longer than 250 words. It is therefore important to capture the important information of the bug report in a concise manner. For each sentence in your written summary, we ask you to provide example sentence(s) from the original bug report that are related to the summary sentence. You can do this by indicating the bug report sentence IDs in brackets at the end of each summary sentence. For example: "This error seems to happen when SessionStore is disabled or not correctly initialized.[1.1,1.3]" "A one line code fix has been suggested for solving the issue.[2.2,5.3]" Here the numbers 1.1, 1.3, 2.2, and 5.3 are sentence IDs from the bug report itself. Each sentence in your summary should have at least one such linked sentence at the end of it. There is no limit to how many sentences you may link in this fashion. 2. You will next have to produce another summary by selecting the important sentences in the original text; that is, the sentences from the bug report that you think are most informative or important. You can do this by clicking on the important sentences to highlight them. You can select as many sentences as you feel are necessary. There may be overlap with the sentences you linked in step 1, while you may also select sentences that were not linked in step 1. It is recommended that you at least select one third of all the sentences. 3. In the next step, you will mark individual sentences related to certain speech acts and other phenomena, which we will now define. Speech acts are sentences that perform a particular function. There are six types of 118 Appendix A. 
Bug Report Corpus: Annotator Instructions sentences to annotate. PROBLEM(Prob): These are the sentences that desctibe a problem (an undesired observed behaviour) in the system. The following sentences would both be labeled as 'Prob': "JavaScript cannot be disabled with the new UI." "Scrolling inside iframe is not smooth." 1. SUGGESTION(Sug): In these senteces someone is expressing some idea on the ways in which the system should be modified. These modifications are either needed to fix a problem (in case of 'Defect' bug reports) or to add a requested feature (in case of 'Enhancement' bug reports). SUGGESTION sentences can take the form of a question, especially when the speaker is not sure that the idea or suggestion will be accepted. Examples: "It would be nice to have a warning when a feature.xml contains a plugin that depends on an unresolved plugin." "Haw about applying the Compose > HTML font selection to the Body Tag as a base font?" "Another approach is replacing StyledText.doDeletion() method." 2. FIX(Fix): These sentences are talking about how the problem has been actually fixed or how the requested feature has been added. For example: "This bug was marked as fixed because, on Bidi platforms, we were able to get RTL and LTR to work for Dialog (independent of the parent)" "Re-installing CDT fixed the problem(admittedly the daily build)." 3. AGREEMENT(Agr): These are the sentences when someone is making a comment in agreement with something that someone else has said. For example: "Correct, there is currently no way of undoing Show Filtered Children without selecting a node." 4. DISAGREEMENT(Dis): This is the opposite of 'Agr'. These are the sentences when someone is making a comment in disagreement with something that someone else has said. "I think you are going down the wrong path by assuming the solution is to re-implement TextMergeViewer from scratch." 5. META(Meta): Finally, we also ask you to annotate meta sentences. These are sentences that refer explicitly to this bug report: "If we continue this thread, it should be a new bug report." 6. These six phenomena can be annotated simply by clicking the relevant buttons next to each sentence. Note that a given sentence can receive multiple annotations – for example, it can be both a suggestion and an agreement sentence. When a button is clicked, it will change color to indicate your selection. 4. Finally, you will fill out a questionnaire about the bug report that you have just summarized and annotated. 119 Appendix A. Bug Report Corpus: Annotator Instructions Quotations: You will notice that some comments contain both new text and quoted text. Quoted text is text from a preceding comment and is often indicated by the '>' character. For example: " > If you see items in the location bar after you've deleted the history, don't > forget that bookmarks are now also shown in the popdown. That confuses people > that are used to older versions of Firefox. Is there a way to turn off this behavior for the location bar? Specifically, I don't want to see bookmarks in there. " Because of the presence of quoted text, some sentences will occur in several different comments in the bug report. When linking sentences in your summary, it is not necessary to link every occurrence of a given sentence in the bug report. Also, be aware that sometimes new text occurs in the middle of quoted text. Recap: Write your summary of 250 words or less. Create links between your summary sentences and sentences from the email thread. 
Appendix B

Task-based Evaluation of Crosscutting Code Concern Summaries: Supporting Materials

This appendix provides materials used in the task-based evaluation of crosscutting code concern summaries (Section 4.4), including the task descriptions used by the participants in the study and the concern summaries related to each task.

Task: Autosave

This task involves planning a change to jEdit's autosave feature. You will need to first familiarize yourself with jEdit and one of its features: autosave.

Overview of jEdit, Buffers, and Autosave

1. To run jEdit, right click on org.gjt.sp.jedit JEdit.java and select: Run As -> Java Application. Do not close jEdit.

2. In jEdit, an opened text file is called a Buffer. The following is an extract from the jEdit manual (section 2.1):

"Several files can be opened and edited at once. Each open file is referred to as a buffer. The combo box above the text area selects the buffer to edit. Different emblems are displayed next to buffer names in the list, depending on the buffer's state; a red disk is shown for buffers with unsaved changes, a lock is shown for read-only buffers, and a spark is shown for new buffers which don't yet exist on disk."

3. jEdit has an autosave feature. The following is an extract from the jEdit manual (section 3.3.1):

"The autosave feature protects your work from computer crashes and such. Every 30 seconds, all buffers with unsaved changes are written out to their respective file names, enclosed in hash ("#") characters. For example, program.c will be autosaved to #program.c#."

jEdit will also generate backup files, which are terminated with a tilde (~) character. These have nothing to do with your task in this study; you can completely ignore them.

Saving a buffer automatically deletes the autosave file, so autosave files will only ever be visible in the unlikely event of a jEdit (or operating system) crash. If an autosave file is found while a buffer is being loaded, jEdit will offer to recover the autosaved data. The autosave feature can be configured in the Loading and Saving pane of the Utilities > Global Options dialog box.

4. In the Loading and Saving pane, set the autosave frequency to 5 seconds.

5. Open the file C:\Temp\test.txt.

6. Add a character to the file and do not save the file.

7. Look in C:\Temp. You should see the autosave file.

8. Save the test buffer in jEdit. The autosave file should disappear.

9. Add a character to the test buffer and do not save it. Wait 5 seconds.

10. Kill jEdit using the terminate button on the Eclipse console (the button with the red square).

11. Launch jEdit again. jEdit will attempt to recover the autosave file. Click yes.

ATTENTION: A bug in the code of jEdit will cause the program to hang if you do not click yes or no in the recovery dialog before the autosave interval elapses. To avoid this, just click yes or no before the 5 seconds (or whatever the autosave frequency is) are over. If the program hangs, you can kill it using the terminate button on the console. You do not have to worry about this bug for the study.

From a user perspective, that's all there is to the autosave feature. You can close jEdit now.
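The behaviour described above can be restated in code form: a periodic task writes every buffer with unsaved changes to a companion file wrapped in '#' characters, and saving a buffer deletes that file. The sketch below is framework-independent and purely illustrative; it is not jEdit's implementation, and every type and method name in it is invented for this example.

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.List;
import java.util.Timer;
import java.util.TimerTask;

// Illustrative only: a generic autosave loop mirroring the behaviour described
// in the task overview, not jEdit's actual Autosave/Buffer classes.
class AutosaveSketch {

    interface EditorBuffer {
        boolean isDirty();   // has unsaved changes
        File file();         // the file backing this buffer
        String contents();   // current text in the buffer
    }

    // program.c is autosaved to #program.c# in the same directory.
    static File autosaveFileFor(EditorBuffer b) {
        File f = b.file();
        return new File(f.getParentFile(), "#" + f.getName() + "#");
    }

    // Periodically write out every dirty buffer, e.g., every 30 seconds.
    static Timer start(List<EditorBuffer> buffers, long intervalMillis) {
        Timer timer = new Timer("autosave", true);
        timer.schedule(new TimerTask() {
            @Override public void run() {
                for (EditorBuffer b : buffers) {
                    if (!b.isDirty()) continue;
                    try {
                        Files.writeString(autosaveFileFor(b).toPath(), b.contents());
                    } catch (IOException ignored) {
                        // A real editor would report this to the user.
                    }
                }
            }
        }, intervalMillis, intervalMillis);
        return timer;
    }

    // Saving a buffer deletes its autosave file, so the autosave file is only
    // ever left behind after a crash.
    static void onBufferSaved(EditorBuffer b) {
        autosaveFileFor(b).delete();
    }
}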
Change Request

You are to create a plan for performing the task described below. The plan should include the relevant program elements that need to be changed and how they should be changed.

NOTE: You are not actually required to perform the changes. Rather, you should identify the particular classes and methods to be used and describe any new classes or methods required. Use a text file (in e.g. Wordpad) to record your plan.

Change Task: Modify the application so that users can explicitly disable the autosave feature. The modified version should meet the following requirements:

1. jEdit shall have a checkbox labeled "Enable Autosave" above the autosave frequency field in the Loading and Saving pane of the global options. This checkbox shall control whether the autosave feature is enabled or not.
2. The state of the autosave feature should persist between different executions of the tool.
3. When the autosave feature is disabled, all autosave backup files for existing buffers shall be immediately deleted from disk.
4. When the autosave feature is enabled, all dirty buffers should be saved within the specified autosave frequency.
5. When the autosave feature is disabled, the tool should never attempt to recover from an autosave backup, if for some reason an autosave backup is present. In this case the autosave backup should be left as is.

During the task:

1. You must make no change to the source code. You are not allowed to perform temporary changes, or try out different alternatives.
2. Do not use the debugger.

Expert Knowledge

The starting point: A checkbox should be added to org.gjt.sp.jedit.options.LoadSaveOptionPane to enable/disable the autosave.

Please notify the investigator when you are ready to commence.

Figure B.1: Description of the jEdit/Autosave task given to the participants in the study

The 'Property' feature is implemented by at least 8 methods [show/hide]:

org.gjt.sp.jedit.jEdit.setProperty(String, String)
org.gjt.sp.jedit.jEdit.getProperty(String, String)
org.gjt.sp.jedit.jEdit.getProperty(String, Object[])
org.gjt.sp.jedit.jEdit.getProperties()
org.gjt.sp.jedit.jEdit.propertiesChanged()
org.gjt.sp.jedit.jEdit.usage()
org.gjt.sp.jedit.jEdit.unsetProperty(String)
org.gjt.sp.jedit.jEdit.resetProperty(String)

The implementation of the Property feature highly depends on the following code elements:

org.gjt.sp.jedit.jEdit (class)
org.gjt.sp.jedit.jEdit.props (field)

All the methods involved in implementing Property are a member of org.gjt.sp.jedit.jEdit class. All but one of the methods (7/8) involved in implementing Property call a method that accesses org.gjt.sp.jedit.jEdit.props field.

Figure B.2: Summary of the 'Property' crosscutting concern
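To make the summarized Property API concrete, the sketch below shows how a persistent boolean option such as the task's "Enable Autosave" state might be read and written through the String-keyed methods listed in Figure B.2. This is only an illustration of the summarized interface, not the actual change the task asks participants to plan; the helper class, the property key "autosave.enabled", and the default value are assumptions, and the code presumes the jEdit classes are on the classpath.

import org.gjt.sp.jedit.jEdit;

// Illustrative helper only; "autosave.enabled" is a hypothetical property key,
// not one taken from the jEdit source.
public class AutosaveOption {

    private static final String KEY = "autosave.enabled";

    // getProperty(String, String) is one of the methods listed in the
    // Property concern summary; the "true" default is assumed here.
    public static boolean isEnabled() {
        return "true".equals(jEdit.getProperty(KEY, "true"));
    }

    // setProperty(String, String) and propertiesChanged() are also listed in
    // the summaries; propertiesChanged() is presumably how other components
    // (such as the Autosave timer) learn that a setting has changed.
    public static void setEnabled(boolean enabled) {
        jEdit.setProperty(KEY, String.valueOf(enabled));
        jEdit.propertiesChanged();
    }
}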
The 'Autosave' feature is implemented by at least 9 methods [show/hide]:

org.gjt.sp.jedit.Autosave.stop()
org.gjt.sp.jedit.Autosave.actionPerformed(ActionEvent)
org.gjt.sp.jedit.Autosave.Autosave()
org.gjt.sp.jedit.Autosave.setInterval(int)
org.gjt.sp.jedit.Buffer.recoverAutosave(View)
org.gjt.sp.jedit.Buffer.autosave()
org.gjt.sp.jedit.Buffer.getAutosaveFile()
org.gjt.sp.jedit.jEdit.propertiesChanged()
org.gjt.sp.jedit.buffer.BufferIORequest.autosave()

The implementation of the Autosave feature highly depends on the following code elements:

org.gjt.sp.jedit.Autosave (class)
org.gjt.sp.jedit.Buffer.autosaveFile (field)

About half of the methods (4/9 = 44%) involved in implementing Autosave are members of org.gjt.sp.jedit.Autosave class.

Figure B.3: Summary of the 'Autosave' crosscutting concern

Task: Undo

This task involves planning a change to the JHotDraw program.

Overview of JHotDraw

JHotDraw is a two-dimensional graphics framework for structured drawing editors.

1. To run a sample JHotDraw application, right click on org.jhotdraw.samples.javadraw JavaDrawApp.java and select: Run As -> Java Application.
2. Create a new file (using the File menu).
3. Try to add a few figures.

Change Request

You are to create a plan for performing the task described below. The plan should include the relevant program elements that need to be changed and how they should be changed.

NOTE: You are not actually required to perform the changes. Rather you should identify the particular classes and methods to be used and describe any new classes or methods required. Use a text file (in e.g. Wordpad) to record your plan.

Change Task: In the drawing editor, a user can change attributes of a figure using the Attributes menu. Your task is to implement the undo functionality for changing a figure's attributes.

To check that undo is currently not supported:

1. Change the fill color (or any other applicable attribute) of a figure using the Attributes menu (you have to first select the figure).
2. Try to undo the change by choosing Undo Command from the Edit menu.

During the task:

1. You must make no change to the source code. You are not allowed to perform temporary changes, or try out different alternatives.
2. Do not use the debugger.

Please notify the investigator when you are ready to commence.

Figure B.4: Description of the JHotDraw/Undo task given to the participants in the study

Summary of 'Undo' Feature Implementation

The 'Undo' feature is implemented by at least 22 methods [show/hide]:
org.jhotdraw.standard.SelectAllCommand$UndoActivity.undo()
org.jhotdraw.figures.ConnectedTextTool$UndoActivity.undo()
org.jhotdraw.figures.TextTool$UndoActivity.undo()
org.jhotdraw.standard.SendToBackCommand$UndoActivity.undo()
org.jhotdraw.standard.ConnectionTool$UndoActivity.undo()
org.jhotdraw.figures.FontSizeHandle$UndoActivity.undo()
org.jhotdraw.figures.UngroupCommand$UndoActivity.undo()
org.jhotdraw.standard.ChangeConnectionHandle$UndoActivity.undo()
org.jhotdraw.figures.RadiusHandle$UndoActivity.undo()
org.jhotdraw.figures.PolyLineHandle$UndoActivity.undo()
org.jhotdraw.contrib.PolygonScaleHandle$UndoActivity.undo()
org.jhotdraw.standard.DragTracker$UndoActivity.undo()
org.jhotdraw.figures.InsertImageCommand$UndoActivity.undo()
org.jhotdraw.standard.AlignCommand$UndoActivity.undo()
org.jhotdraw.standard.CutCommand$UndoActivity.undo()
org.jhotdraw.contrib.TextAreaTool$UndoActivity.undo()
org.jhotdraw.standard.DeleteCommand$UndoActivity.undo()
org.jhotdraw.standard.PasteCommand$UndoActivity.undo()
org.jhotdraw.figures.GroupCommand$UndoActivity.undo()
org.jhotdraw.standard.ResizeHandle$UndoActivity.undo()
org.jhotdraw.figures.BorderTool$UndoActivity.undo()
org.jhotdraw.contrib.TriangleRotationHandle$UndoActivity.undo()

This feature provides 'Undoing' functionality for 'Select All Command', 'Connected Text Tool', etc.

The implementation of the Undo feature highly depends on the following code element(s):

org.jhotdraw.util.UndoableAdapter.undo()
org.jhotdraw.util.Undoable.undo()

All of the methods involved in implementing 'Undo':

are named 'undo'
override method org.jhotdraw.util.UndoableAdapter.undo()
override method org.jhotdraw.util.Undoable.undo()
are a member of a class named 'UndoActivity'
appear with a method that overrides org.jhotdraw.util.Undoable.redo()
appear with a method that overrides org.jhotdraw.util.UndoableAdapter.redo()
appear with a method named 'redo'

All but one of the methods involved in implementing 'Undo':

statically-calls org.jhotdraw.util.UndoableAdapter.undo()
are a member of a class that extends-class org.jhotdraw.util.UndoableAdapter
appear with a method that statically-calls org.jhotdraw.util.UndoableAdapter.UndoableAdapter(DrawingView)
appear with a method that calls org.jhotdraw.util.UndoableAdapter.setRedoable(boolean)
appear with a method that calls org.jhotdraw.util.UndoableAdapter.setUndoable(boolean)

All but one of the methods involved in implementing 'Undo':

are a member of a class created by a method named 'createUndoActivity'

More than 2/3 of the methods appear with a method which calls one or more of the following methods:

org.jhotdraw.framework.FigureEnumeration.nextFigure()
org.jhotdraw.util.UndoableAdapter.getAffectedFigures()
org.jhotdraw.framework.FigureEnumeration.hasNextFigure()
org.jhotdraw.util.UndoableAdapter.isRedoable()

Around half of the methods call one or more of the following methods:

org.jhotdraw.framework.DrawingView.clearSelection()
org.jhotdraw.util.UndoableAdapter.getAffectedFigures()
org.jhotdraw.framework.FigureEnumeration.hasNextFigure()
org.jhotdraw.util.UndoableAdapter.getDrawingView()

Figure B.5: Summary of the 'Undo' crosscutting concern
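The summary in Figure B.5 describes a recurring structural pattern rather than a single code location: undo is implemented by nested classes named UndoActivity that extend org.jhotdraw.util.UndoableAdapter, override undo() and redo(), call super.undo(), and are created by a method named createUndoActivity. The sketch below is a hypothetical instance of that pattern, not code from JHotDraw; only the type and method names listed in the summary are taken from the document, and their signatures and return types are assumed.

import org.jhotdraw.framework.DrawingView;
import org.jhotdraw.framework.FigureEnumeration;
import org.jhotdraw.util.Undoable;
import org.jhotdraw.util.UndoableAdapter;

// Hypothetical command-side sketch following the pattern reported in Figure B.5:
// a nested class named UndoActivity that extends UndoableAdapter, overrides
// undo()/redo(), calls super.undo(), and is created by createUndoActivity().
public class ChangeAttributeCommandSketch {

    // Factory method name taken from the summary ("createUndoActivity");
    // its parameter and visibility are assumptions.
    protected Undoable createUndoActivity(DrawingView view) {
        return new UndoActivity(view);
    }

    public static class UndoActivity extends UndoableAdapter {

        public UndoActivity(DrawingView newDrawingView) {
            super(newDrawingView);
            // setUndoable/setRedoable appear in the summary as calls made by
            // methods that accompany undo().
            setUndoable(true);
            setRedoable(true);
        }

        @Override
        public boolean undo() {
            if (!super.undo()) {        // statically-calls UndoableAdapter.undo()
                return false;
            }
            // getAffectedFigures()/hasNextFigure()/nextFigure() are the methods
            // the summary reports as commonly used by these undo() methods; the
            // attribute-restoring logic itself is omitted in this sketch.
            FigureEnumeration fe = getAffectedFigures();
            while (fe.hasNextFigure()) {
                fe.nextFigure(); // restore this figure's saved attribute value (omitted)
            }
            return true;
        }

        @Override
        public boolean redo() {
            return isRedoable(); // re-applying the change is omitted in this sketch
        }
    }
}

A participant planning the change task would presumably add a class of roughly this shape for attribute changes and have the corresponding command create it through its createUndoActivity method, mirroring the existing UndoActivity classes listed above.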
