UBC Theses and Dissertations
Towards alleviating human supervision for document-level relation extraction
Feng, Yuxi
Abstract
Many downstream applications have motivated tremendous interest in the automatic construction of knowledge graphs (KGs) by extracting relations from text corpora. Relation extraction (RE) from unstructured data sources is a key component in building large-scale KGs. In this thesis, I focus on document-level relation extraction. A central challenge of document-level RE is the lack of labeled training data, since constructing a large in-domain labeled dataset requires a great deal of human labor. To reduce the human supervision needed for document-level RE, I propose 1) CIFRE, an unsupervised RE method that improves the recall of pipeline-based approaches while maintaining high precision; and 2) DuRE, a semi-supervised RE method for settings where little labeled data is available, which leverages self-training to generate pseudo text. To improve the quality of this pseudo text, I also propose two methods, DuNST and KEST, that improve the controllability and diversity of semi-supervised text generation, addressing the challenges of inadequate unlabeled data, overexploitation, and training deceleration. Comprehensive experiments on real datasets show that the proposed methods significantly outperform all baselines, demonstrating their effectiveness for unsupervised and semi-supervised document-level relation extraction.
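The recall/precision trade-off mentioned above can be made concrete with the standard exact-match metrics computed over extracted relation triples. The sketch below is generic evaluation arithmetic, not the evaluation protocol used in this thesis; the (head, relation, tail) triple format, the relation labels, and the example annotations are hypothetical illustrations.

```python
from typing import Set, Tuple

# Hypothetical representation: a relation fact as a (head entity, relation label, tail entity) triple.
Triple = Tuple[str, str, str]


def precision_recall_f1(predicted: Set[Triple], gold: Set[Triple]) -> Tuple[float, float, float]:
    """Precision, recall and F1 over exact-match triples for one document."""
    true_positives = len(predicted & gold)  # predicted facts that also appear in the gold set
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical gold annotations for a single document.
gold = {
    ("Marie Curie", "award_received", "Nobel Prize in Physics"),
    ("Marie Curie", "spouse", "Pierre Curie"),
    ("Marie Curie", "place_of_birth", "Warsaw"),
    ("Pierre Curie", "place_of_birth", "Paris"),
}

# A conservative extractor: one prediction, and it is correct.
conservative = {("Marie Curie", "spouse", "Pierre Curie")}

# A broader extractor: three correct predictions plus one error.
broader = {
    ("Marie Curie", "spouse", "Pierre Curie"),
    ("Marie Curie", "award_received", "Nobel Prize in Physics"),
    ("Marie Curie", "place_of_birth", "Warsaw"),
    ("Pierre Curie", "place_of_birth", "Warsaw"),
}

print(precision_recall_f1(conservative, gold))
print(precision_recall_f1(broader, gold))
```

The conservative extractor has perfect precision but recovers only one fact in four (recall 0.25), while the broader one accepts a single error in exchange for recall 0.75, which is the kind of trade-off a recall-oriented unsupervised pipeline has to manage.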
Item Metadata
Title | Towards alleviating human supervision for document-level relation extraction
Creator | Feng, Yuxi
Publisher | University of British Columbia
Date Issued | 2024
Language | eng
Date Available | 2024-04-17
Provider | Vancouver : University of British Columbia Library
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International
DOI | 10.14288/1.0441405
Degree Grantor | University of British Columbia
Graduation Date | 2024-05
Scholarly Level | Graduate
Aggregated Source Repository | DSpace