UBC Theses and Dissertations

Advancing intrinsic and non-intrinsic bug classification with NLP, machine learning, and few-shot prompt engineering

Bhandari, Pragya

Abstract

Bug classification is a pivotal approach to analyzing bugs and streamlining debugging. Recent studies have shown that bugs can be categorized into intrinsic and extrinsic types. Intrinsic bugs can be traced to a cause in the project's own code, whereas extrinsic bugs lack such traceability and are typically triggered by external failures that affect the project. This thesis presents BuggIn, the first automatic tool for classifying intrinsic and non-intrinsic bugs, which uses Natural Language Processing (NLP) and Machine Learning (ML) models. It also introduces the application of few-shot prompt engineering to bug classification by implementing a novel tool based on a Large Language Model (LLM). We conduct four experiments: 1) implementing an automatic bug classifier by training NLP and ML models solely on the text of bug reports; 2) assessing how effective non-textual bug report features are as ML model inputs; 3) replicating the BuggIn pipeline with both textual and non-textual features (i.e., source code and code review metrics); and 4) exploring alternative techniques for low-resource datasets through few-shot prompt engineering with GPT-3. The results are promising: combining features yields much better classification scores than using only textual or only non-textual features. Furthermore, the experiments involving the LLM and few-shot prompt engineering show potential for bug classification, achieving scores comparable to those of the ML-based classifiers with far fewer data records. Overall, this thesis bridges gaps in the literature on classifying intrinsic and non-intrinsic bugs by enhancing the classification model and by paving the way for novel methods of solving the task.
Moreover, methods such as few-shot prompt engineering can be a good alternative to the methodologies requiring large datasets that have so far been prevalent in bug classification research.
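
The abstract does not give the prompt format used with GPT-3 in experiment 4. As a minimal sketch, assuming a plain-text prompt made of an instruction, k labeled demonstrations, and the unlabeled query, a few-shot prompt for intrinsic vs. non-intrinsic classification could be assembled as below. `BugReport`, `build_few_shot_prompt`, `parse_label`, the prompt wording, and the example reports are all hypothetical illustrations, not the thesis's implementation; the actual call to the LLM is omitted.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical label set and prompt wording; the thesis's actual GPT-3
# prompt template is not given in the abstract.
LABELS = ("intrinsic", "non-intrinsic")
INSTRUCTION = (
    "Classify each bug report as intrinsic (traceable to a cause in the "
    "project's own code) or non-intrinsic (caused by an external failure)."
)


@dataclass
class BugReport:
    """Hypothetical minimal bug report: title, description, optional label."""
    title: str
    description: str
    label: str = ""  # left empty for the unlabeled query


def format_report(report: BugReport) -> str:
    # Render a single report as plain text for the prompt.
    return f"Title: {report.title}\nDescription: {report.description}"


def build_few_shot_prompt(demos: list[BugReport], query: BugReport) -> str:
    """Instruction, then k labeled demonstrations, then the query with an open 'Label:' slot."""
    parts = [INSTRUCTION, ""]
    for demo in demos:
        if demo.label not in LABELS:
            raise ValueError(f"unexpected label: {demo.label!r}")
        parts += [format_report(demo), f"Label: {demo.label}", ""]
    parts += [format_report(query), "Label:"]
    return "\n".join(parts)


def parse_label(completion: str) -> str | None:
    """Map the model's raw completion onto one of LABELS, or None if it matches neither."""
    text = completion.strip().lower()
    for label in LABELS:
        if text.startswith(label):
            return label
    return None


if __name__ == "__main__":
    # Illustrative, made-up demonstrations (not data from the thesis).
    demos = [
        BugReport("Crash on save", "NullPointerException in save() after a local refactor", "intrinsic"),
        BugReport("Build broken", "An upstream dependency removed a public API", "non-intrinsic"),
    ]
    query = BugReport("Tests time out", "CI runner image changed its network settings")
    print(build_few_shot_prompt(demos, query))
    # In practice the completion would come from the LLM; a fixed string stands in here.
    print(parse_label(" Non-intrinsic"))
```

Restricting the output to two fixed label strings and parsing the completion by prefix makes off-format model responses easy to detect (they map to `None`) instead of being silently counted as one class.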

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International