UBC Theses and Dissertations

On the understanding of software engineering related texts via the transfer of prior knowledge

Hadi, Mohammad A

Abstract

Software Engineering (SE) related natural language texts, such as app reviews and crowdsourced Questions-Answers (Q&A), play a pivotal role in helping software engineers and developers gain knowledge about different stages of the software life cycle, such as application development, deployment, and maintenance. Successful classification and clustering of these texts can help developers and engineers quickly understand and process the bulk of this information effectively. Therefore, this thesis focuses on the efficient classification and clustering of SE-related texts using state-of-the-art neural language models and adaptive topic modeling techniques, respectively. An extensive empirical study is performed to understand the strength, effectiveness, and competence of Pre-trained Transformer-based neural language Models (PTMs) for the app review classification task; the study also identifies the best-performing PTMs. Two of the best-performing PTMs have also been pre-trained from scratch on domain-specific data to yield better classification performance. The largest domain-specific app review dataset has been scraped from the Google Play Store for this pre-training. For clustering SE texts, a new online adaptive topic model, the Adaptive Online Bi-term Topic Model (AOBTM), is proposed that can efficiently identify topics from corpora divided into time and version slices. This topic model leverages and adapts the statistical data inferred from the preceding slices to infer latent topics in the latest slice. The approach yields good results for short and noisy SE-related natural language texts.

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International