UBC Theses and Dissertations

Summarization of partial email threads: silver standards and Bayesian surprise

Johnson, Jordon Kent

Abstract

We define and motivate the problem of summarizing partial email threads. This problem introduces the challenge of generating reference summaries for partial threads when extractive human annotation is available only for the threads as a whole. Gold standard annotation intended to summarize a completed email thread may not be equally applicable to each of its partial threads, particularly when the human-selected sentences are not uniformly distributed within the thread. We propose a framework that generates these reference summaries, at arbitrary length and in an oracular manner, by exploiting existing gold standard summaries for completed email threads. We also propose and evaluate two sentence scoring functions that can be used in this "silver standard" framework, and we are making the resulting datasets publicly available. In addition, we apply to partial thread summarization a recent unsupervised method based on Bayesian Surprise that incorporates background knowledge, extend that method with conversational features, and modify the mechanism by which it handles information redundancy. Experiments with our partial thread summarizers indicate comparable or improved performance relative to a state-of-the-art unsupervised full thread summarizer baseline in most cases, and we have identified areas in which potential vulnerabilities in our methods can be avoided or accounted for. Furthermore, our results suggest that the potential benefits of background knowledge to partial thread summarization should be further investigated with larger datasets.

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International