Investigating the impact of methodological choices on source code maintenance analyses

UBC Theses and Dissertations

Featured Collection

UBC Theses and Dissertations

Investigating the impact of methodological choices on source code maintenance analyses Ahmad, Syed Ishtiaque

Abstract

Many prediction models rely on past data about how a system evolves to learn and anticipate the number of changes and bugs it will have in the future. As a software engineer or data scientist creates these models, they need to make several methodological choices such as deciding on size measurements, whether size should be controlled, from what time range metrics should be obtained, etc. In this work, we demonstrate how different methodological decisions can cause practitioners to reach conclusions that are significantly and meaningfully different. For example, when measuring SLOC from evolving source code of a method, one could decide to use the initial, median, average, final, or a per-change measure of method size. These decisions matter; for instance, one prior study observed better performance of code metrics for defect prediction in general, while another study found negative results when performance was evaluated through a time-based approach. Our result identifies the reason behind this contradiction is due to the age of the methods not being explicitly controlled, where the first six months of a method’s evolution could have provided a better understanding of maintenance. Understanding the impact of these different methodological decisions is especially important given the increasing significance of approaches that use these large datasets for software analysis tasks. This work can impact both practitioners and researchers by helping them understand which of the methodological choices underpinning their analyses are important, and which are not; this can lead to more consistency among research studies and improved decision making for deployed analyses.

Item Metadata

Title	Investigating the impact of methodological choices on source code maintenance analyses
Creator	Ahmad, Syed Ishtiaque
Supervisor	Holmes, Reid
Publisher	University of British Columbia
Date Issued	2021
Description	Many prediction models rely on past data about how a system evolves to learn and anticipate the number of changes and bugs it will have in the future. As a software engineer or data scientist creates these models, they need to make several methodological choices such as deciding on size measurements, whether size should be controlled, from what time range metrics should be obtained, etc. In this work, we demonstrate how different methodological decisions can cause practitioners to reach conclusions that are significantly and meaningfully different. For example, when measuring SLOC from evolving source code of a method, one could decide to use the initial, median, average, final, or a per-change measure of method size. These decisions matter; for instance, one prior study observed better performance of code metrics for defect prediction in general, while another study found negative results when performance was evaluated through a time-based approach. Our result identifies the reason behind this contradiction is due to the age of the methods not being explicitly controlled, where the first six months of a method’s evolution could have provided a better understanding of maintenance. Understanding the impact of these different methodological decisions is especially important given the increasing significance of approaches that use these large datasets for software analysis tasks. This work can impact both practitioners and researchers by helping them understand which of the methodological choices underpinning their analyses are important, and which are not; this can lead to more consistency among research studies and improved decision making for deployed analyses.
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2021-09-10
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0401968
URI	http://hdl.handle.net/2429/79697
Degree (Theses)	Master of Science - MSc
Program (Theses)	Computer Science
Affiliation	Science, Faculty of; Computer Science, Department of
Degree Grantor	University of British Columbia
Graduation Date	2021-11
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace

Open Collections

UBC Theses and Dissertations

UBC Theses and Dissertations

Investigating the impact of methodological choices on source code maintenance analyses Ahmad, Syed Ishtiaque

Abstract

Item Metadata

Item Media

Item Citations and Data

Rights