Open Collections
UBC Theses and Dissertations
On the understanding of software engineering related texts via the transfer of prior knowledge
Hadi, Mohammad A
Abstract
Software Engineering (SE) related natural language texts, such as app reviews and crowdsourced Questions-Answers (Q&A), play a pivotal role in helping software engineers and developers gain knowledge about different stages of the software life cycle, such as application development, deployment, and maintenance. Successful classification and clustering of these texts can help developers and engineers quickly understand and process large volumes of information. This thesis therefore focuses on the efficient classification and clustering of SE-related texts, using state-of-the-art neural language models for classification and adaptive topic modeling techniques for clustering. An extensive empirical study is performed to understand the strength, effectiveness, and competence of Pre-trained Transformer-based neural language Models (PTMs) for the app review classification task; the study also identifies the best-performing PTMs. Two of the best-performing PTMs have also been pre-trained from scratch on domain-specific data to yield better classification performance. The largest domain-specific app review dataset has been scraped from the Google Play Store for this pre-training purpose. For clustering SE texts, a new online adaptive topic model, the Adaptive Online Bi-term Topic Model (AOBTM), is proposed; it can efficiently identify topics from corpora divided into time and version slices. This topic model leverages and adapts the statistical data inferred from the preceding slices to infer latent topics in the latest slice. The approach yields good results for short and noisy SE-related natural language texts.
Item Metadata
Title | On the understanding of software engineering related texts via the transfer of prior knowledge |
Creator | |
Supervisor | |
Publisher | University of British Columbia |
Date Issued | 2023 |
Genre | |
Type | |
Language | eng |
Date Available | 2023-02-27 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
DOI | 10.14288/1.0427284 |
URI | |
Degree | |
Program | |
Affiliation | |
Degree Grantor | University of British Columbia |
Graduation Date | 2023-05 |
Campus | |
Scholarly Level | Graduate |
Rights URI | |
Aggregated Source Repository | DSpace |