Infrequent discourse relation identification using data programming

UBC Theses and Dissertations

Zeng, Xing

Abstract

Discourse parsing is an important task in natural language processing because it supports a wide range of downstream NLP tasks. While the overall performance of discourse parsing has recently improved considerably, performance on relatively infrequent discourse relations remains low (an F1 score of about 20). To narrow the gap between performance on infrequent and frequent relations, we propose a novel method for discourse relation identification centered on Data Programming (DP), “a paradigm for the programmatic creation of training datasets.” The main idea of our approach is to overcome the shortage of labeled data for infrequent relations by leveraging unlabeled data in addition to labeled data. Our experiments show that the method improves performance on most of the infrequent relations with minimal negative effect on frequent relations.

Item Metadata

Title	Infrequent discourse relation identification using data programming
Creator	Zeng, Xing
Publisher	University of British Columbia
Date Issued	2018
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2018-12-05
Provider	Vancouver : University of British Columbia Library
Rights	Attribution 4.0 International
DOI	10.14288/1.0375383
URI	http://hdl.handle.net/2429/67968
Degree	Master of Science - MSc
Program	Computer Science
Affiliation	Science, Faculty of; Computer Science, Department of
Degree Grantor	University of British Columbia
Graduation Date	2019-02
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by/4.0/
Aggregated Source Repository	DSpace
