Summarization of partial email threads : silver standards and bayesian surprise

UBC Theses and Dissertations

Johnson, Jordon Kent

Abstract

We define and motivate the problem of summarizing partial email threads. This problem introduces the challenge of generating reference summaries for partial threads when extractive human annotation is available only for the threads as a whole. Gold standard annotation intended to summarize a completed email thread may not be equally applicable to each of its partial threads, particularly when the human-selected sentences are not uniformly distributed within the thread. We propose a framework for generating these reference summaries with arbitrary length in an oracular manner by exploiting existing gold standard summaries for completed email threads. We also propose and evaluate two sentence scoring functions that can be used in this "silver standard" framework, and we are making the resulting datasets publicly available. In addition, we apply to partial thread summarization a recent unsupervised method based on Bayesian Surprise that incorporates background knowledge, extend that method with conversational features, and modify the mechanism by which it handles information redundancy. Experiments with our partial thread summarizers indicate comparable or improved performance relative to a state-of-the-art unsupervised full thread summarizer baseline in most cases, and we have identified areas in which potential vulnerabilities in our methods can be avoided or accounted for. Furthermore, our results suggest that the potential benefits of background knowledge to partial thread summarization should be further investigated with larger datasets.

Item Metadata

Title	Summarization of partial email threads : silver standards and bayesian surprise
Creator	Johnson, Jordon Kent
Publisher	University of British Columbia
Date Issued	2018
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2018-04-18
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0365780
URI	http://hdl.handle.net/2429/65468
Degree (Theses)	Master of Science - MSc
Program (Theses)	Computer Science
Affiliation	Science, Faculty of; Computer Science, Department of
Degree Grantor	University of British Columbia
Graduation Date	2018-05
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace
