Advancing intrinsic and non-intrinsic bug classification with NLP, machine learning, and few-shot prompt engineering

UBC Theses and Dissertations

Advancing intrinsic and non-intrinsic bug classification with NLP, machine learning, and few-shot prompt engineering Bhandari, Pragya

Abstract

Bug classification is a pivotal approach to analyzing bugs and streamlining debugging processes. Recent studies have shown that bugs can be categorized into intrinsic and extrinsic types. Intrinsic bugs are traceable to a cause in a project's code, whereas extrinsic bugs lack such traceability and are typically triggered by external failures affecting the project. This thesis presents BuggIn, the first automatic bug classification tool that uses Natural Language Processing (NLP) and Machine Learning (ML) models to classify intrinsic and non-intrinsic bugs. It also introduces few-shot prompt engineering to bug classification tasks by implementing a novel tool based on a Large Language Model (LLM). We conduct four experiments: 1) implementing an automatic bug classifier that trains NLP and ML models solely on text from bug reports; 2) assessing the effectiveness of non-textual bug report features as ML model inputs; 3) replicating the BuggIn pipeline with both textual and non-textual features (i.e., source code and code review metrics); and 4) exploring alternative techniques for low-resource datasets through few-shot prompt engineering with GPT-3. The results are promising: combining features yields a much better classification score than using only textual or only non-textual features. Furthermore, the LLM and few-shot prompt engineering experiments show potential for bug classification, achieving scores comparable to those of the ML-based classifiers with far fewer data records. Overall, this thesis bridges gaps in the literature on classifying intrinsic and non-intrinsic bugs by enhancing the classification model and paving the way for novel methods of solving the task. Moreover, methods like few-shot prompt engineering can be a good alternative to the large-dataset methodologies that have remained prevalent in bug classification research so far.

Item Metadata

Title	Advancing intrinsic and non-intrinsic bug classification with NLP, machine learning, and few-shot prompt engineering
Creator	Bhandari, Pragya
Supervisor	Rodríguez-Pérez, Gema
Publisher	University of British Columbia
Date Issued	2024
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2024-07-12
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0444146
URI	http://hdl.handle.net/2429/88627
Degree (Theses)	Master of Science - MSc
Program (Theses)	Computer Science
Affiliation	Science, Irving K. Barber Faculty of (Okanagan); Computer Science, Mathematics, Physics and Statistics, Department of (Okanagan)
Degree Grantor	University of British Columbia
Graduation Date	2024-09
Campus	UBCO
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace
