UBC Theses and Dissertations
An epistemological approach to domain-specific multiple biographical document summarization
Tennessy, Blair
Automatic document summarization consists of two tasks: understanding and generation. Understanding is a technique in which relevant content is identified, processed, and annotated. Generation is the process of restating important content in a concise form. As a task for an intelligent system, summarization is a crucial operation: by what process can you succinctly restate pertinent information contained within a set of documents, citing only the essential facts relevant to the query at hand? In this thesis we demonstrate a conceptual approach to multiple biographical document summarization. Specifically, we apply domain-specific semantic and temporal document understanding methods to multi-document biographical summarization. Our purpose is to more fully address the important criteria, routinely cited yet rarely approached, of multi-document summarization. These criteria, namely the discovery and resolution of identical, complementary, or contradictory statements, have been roughly treated using general lexico-semantic methods. We maintain that the general semantically-informed methods previously devised for unrestricted text are not completely suitable to biography summarization; instead, it is our conviction that one must have at least a partial conceptual understanding of the subject's domain in order to reason about the importance and verity of document information. We hold that this is especially true for establishing temporal relationships, which is at the heart of biography understanding and production. What we demonstrate in this thesis is that an extremely coarse approximation to an epistemological system based on concepts is able to satisfy the criteria of a multi-document summarization system in a particular domain. Our methods, while primitive, provide a lower bound on the performance of such a system.