Code representation learning with Prüfer sequences
Jinpa, Tenzin
Abstract
An effective and efficient code representation is critical to the success of sequence-to-sequence deep neural network models for a variety of code-understanding tasks, such as code summarization and documentation, which improve developer productivity and reduce software development costs. Unlike natural language, which is unstructured and noisy, program code is intrinsically structured, and a learning model can leverage this property. A significant challenge is to find a sequence representation that captures the structural information in the program code and facilitates the training of the models. In this study, we propose to use the Prüfer sequence of the Abstract Syntax Tree (AST) of a computer program to design a sequential representation scheme that preserves the structural information of the AST. Our representation makes it possible to develop deep-learning models in which signals carried by lexical tokens in the training examples can be exploited automatically and selectively, based on their syntactic role and importance. Unlike other recently proposed approaches, our representation is concise and lossless with respect to the structural information of the AST. To test the efficacy of the Prüfer-sequence-based representation, we designed a code summarization method using a sequence-to-sequence learning model and evaluated it on real-world benchmark datasets. The results of our empirical studies show that the Prüfer-sequence-based representation is highly effective and efficient, significantly outperforming all the recently proposed deep-learning models we used as baselines.
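The abstract builds on the classical Prüfer encoding of a labeled tree: repeatedly remove the lowest-numbered leaf and record its neighbour until two nodes remain, yielding a sequence of n-2 labels that uniquely (losslessly) encodes the tree. As a minimal illustration of that encoding only (not the thesis's actual AST pipeline, which is not detailed here), the following sketch computes the Prüfer sequence of a small labeled tree:

```python
from collections import defaultdict

def prufer_sequence(edges, n):
    """Compute the Prüfer sequence of a labeled tree on nodes 1..n.

    Repeatedly removes the lowest-numbered leaf and records its
    neighbour; the resulting sequence of n-2 labels uniquely
    determines the tree, which is why the encoding is lossless.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seq = []
    for _ in range(n - 2):
        # Find the leaf (degree-1 node) with the smallest label.
        leaf = min(node for node in adj if len(adj[node]) == 1)
        neighbour = next(iter(adj[leaf]))
        seq.append(neighbour)
        # Delete the leaf from the tree.
        adj[neighbour].discard(leaf)
        del adj[leaf]
    return seq

# A star centred at 4 with an extra path 4-5-6:
print(prufer_sequence([(1, 4), (2, 4), (3, 4), (4, 5), (5, 6)], 6))
# → [4, 4, 4, 5]
```

Because the mapping between labeled trees and Prüfer sequences is a bijection, the original tree (here, an AST's shape) can be reconstructed exactly from the sequence, in contrast to flattenings such as pre-order token traversals, which can lose structure.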
Item Metadata
Title |
Code representation learning with Prüfer sequences
|
Creator | |
Supervisor | |
Publisher |
University of British Columbia
|
Date Issued |
2021
|
Description |
An effective and efficient code representation is critical to the success of sequence-to-sequence deep neural network models for a variety of code-understanding tasks, such as code summarization and documentation, which improve developer productivity and reduce software development costs. Unlike natural language, which is unstructured and noisy, program code is intrinsically structured, and a learning model can leverage this property. A significant challenge is to find a sequence representation that captures the structural information in the program code and facilitates the training of the models.
In this study, we propose to use the Prüfer sequence of the Abstract Syntax Tree (AST) of a computer program to design a sequential representation scheme that preserves the structural information of the AST. Our representation makes it possible to develop deep-learning models in which signals carried by lexical tokens in the training examples can be exploited automatically and selectively, based on their syntactic role and importance. Unlike other recently proposed approaches, our representation is concise and lossless with respect to the structural information of the AST. To test the efficacy of the Prüfer-sequence-based representation, we designed a code summarization method using a sequence-to-sequence learning model and evaluated it on real-world benchmark datasets. The results of our empirical studies show that the Prüfer-sequence-based representation is highly effective and efficient, significantly outperforming all the recently proposed deep-learning models we used as baselines.
|
Genre | |
Type | |
Language |
eng
|
Date Available |
2021-11-04
|
Provider |
Vancouver : University of British Columbia Library
|
Rights |
Attribution-NonCommercial-NoDerivatives 4.0 International
|
DOI |
10.14288/1.0402946
|
URI | |
Degree | |
Program | |
Affiliation | |
Degree Grantor |
University of British Columbia
|
Graduation Date |
2022-02
|
Campus | |
Scholarly Level |
Graduate
|
Rights URI | |
Aggregated Source Repository |
DSpace
|