UBC Theses and Dissertations


Infrequent discourse relation identification using data programming

Zeng, Xing


Discourse parsing is an important task in natural language processing because it supports a wide range of downstream NLP tasks. Although overall discourse parsing performance has improved considerably in recent years, performance on relatively infrequent discourse relations remains low (an F1 score of roughly 20). To narrow the gap between infrequent and frequent relations, we propose a novel method for discourse relation identification centered on Data Programming (DP), “a paradigm for the programmatic creation of training datasets.” The main idea of our approach is to overcome the scarcity of labeled data for infrequent relations by leveraging unlabeled data alongside labeled data. Our experiments show that the method improves performance on most infrequent relations with minimal negative effect on frequent relations.


Attribution 4.0 International