Open Collections
UBC Theses and Dissertations
Domain adaptation for summarizing conversations Sandu, Oana
Abstract
The goal of summarization in natural language processing is to create abridged and informative versions of documents. A popular approach is supervised extractive summarization: given a training source corpus of documents with sentences labeled with their informativeness, train a model to select sentences from a target document and produce an extract. Conversational text is challenging to summarize because it is less formal, its structure depends on the modality or domain, and few annotated corpora exist.

We use a labeled corpus of meeting transcripts as the source, and attempt to summarize a different target domain, threaded emails. We study two domain adaptation scenarios: a supervised scenario in which some labeled target domain data is available for training, and an unsupervised scenario with only unlabeled data in the target and labeled data available in a related but different domain.

We implement several recent domain adaptation algorithms and perform a comparative study of their performance. We also compare the effectiveness of using a small set of conversation-specific features with a large set of raw lexical and syntactic features in domain adaptation. We report significant improvements of the algorithms over their baselines.

Our results show that in the supervised case, given the amount of email data available and the set of features specific to conversations, training directly in-domain and ignoring the out-of-domain data is best. With only the more domain-specific lexical features, though overall performance is lower, domain adaptation can effectively leverage the lexical features to improve in both the supervised and unsupervised scenarios.
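As a rough illustration of what supervised domain adaptation can look like in this setting, the sketch below shows feature augmentation, a well-known technique in which each feature vector is copied into a shared block and a domain-specific block. This record does not say which algorithms the thesis compares, so the technique is a stand-in chosen for illustration. The function name `augment`, the two per-sentence features, and their values are hypothetical.

```python
from typing import List


def augment(features: List[float], domain: str) -> List[float]:
    """Feature augmentation: [shared copy | source-only copy | target-only copy]."""
    zeros = [0.0] * len(features)
    if domain == "source":
        return features + features + zeros
    if domain == "target":
        return features + zeros + features
    raise ValueError(f"unknown domain: {domain!r}")


# Hypothetical per-sentence features: [relative position, length in words / 100]
meeting_sentence = [0.10, 0.25]  # source domain (meeting transcripts)
email_sentence = [0.80, 0.05]    # target domain (threaded emails)

source_vec = augment(meeting_sentence, "source")
target_vec = augment(email_sentence, "target")
```

A single classifier trained on the augmented vectors from both domains can then learn weights in the shared block for behavior common to meetings and emails, and weights in the domain-specific blocks for behavior that differs between them.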
Item Metadata
Title |
Domain adaptation for summarizing conversations
|
Creator |
Sandu, Oana
|
Publisher |
University of British Columbia
|
Date Issued |
2011
|
Language |
eng
|
Date Available |
2011-04-21
|
Provider |
Vancouver : University of British Columbia Library
|
Rights |
Attribution-NonCommercial-ShareAlike 3.0 Unported
|
DOI |
10.14288/1.0051250
|
Degree Grantor |
University of British Columbia
|
Graduation Date |
2011-05
|
Scholarly Level |
Graduate
|
Aggregated Source Repository |
DSpace
|