Open Collections

UBC Theses and Dissertations

An epistemological approach to domain-specific multiple biographical document summarization Tennessy, Blair 2006

An Epistemological Approach to Domain-Specific Multiple Biographical Document Summarization

by Blair Tennessy
B.Sc., University of Northern British Columbia, 2003

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Master of Science in THE FACULTY OF GRADUATE STUDIES (Computer Science)

The University of British Columbia
January 2006
© Blair Tennessy, 2006

Abstract

Automatic document summarization consists of two tasks: understanding and generation. Understanding is a technique in which relevant content is identified, processed, and annotated. Generation is the process of restating important content in a concise form. As a task for an intelligent system, summarization is a crucial operation: by what process can you succinctly restate pertinent information contained within a set of documents, citing only the essential facts relevant to the query at hand?

In this thesis we demonstrate a conceptual approach to multiple biographical document summarization. Specifically, we apply domain-specific semantic and temporal document understanding methods to multi-document biographical summarization. Our purpose is to address more fully the important criteria of multi-document summarization, which are routinely cited yet rarely approached. These criteria, namely the discovery and resolution of identical, complementary, or contradictory statements, have been treated only roughly using general lexico-semantic methods. We maintain that the general semantically-informed methods previously devised for unrestricted text are not completely suitable to biography summarization; instead, it is our conviction that one must have at least a partial conceptual understanding of the subject's domain in order to reason about the importance and verity of document information. We hold that this is especially true for establishing temporal relationships, which is at the heart of biography understanding and production.
What we demonstrate in this thesis is that an extremely coarse approximation to an epistemological system based on concepts is able to satisfy the criteria of a multi-document summarization system in a particular domain. Our methods, while primitive, provide a lower bound on the performance of such a system.

Contents

Abstract
Contents
List of Tables
List of Figures
Acknowledgements
1 Introduction
  1.1 Problem Statement
  1.2 Philosophical Underpinnings
  1.3 Characteristics of a Biographical Account
  1.4 Accomplishment and Affiliation
  1.5 Thesis Outline
2 Related Work
  2.1 A Summarization of Summarization
    2.1.1 Single Document Summarization
    2.1.2 Multiple Document Summarization
    2.1.3 Biography Summarization
    2.1.4 DUC 2004 Task 5
  2.2 Evaluation
    2.2.1 ROUGE
    2.2.2 Evaluation Methods at DUC
    2.2.3 Performance of Systems at DUC 2004
  2.3 Other Relevant Work
    2.3.1 TimeML
    2.3.2 Syntactic Simplification
    2.3.3 Legal Text Summarization
  2.4 Critique
    2.4.1 Sentence Extraction
    2.4.2 Anti-Conceptual Approaches as Ultimately Futile
    2.4.3 Critique of ROUGE
3 Corpus
  3.1 Introduction
  3.2 Document Collection
  3.3 Organization
  3.4 Corpus Statistics
  3.5 Training and Testing
4 Approach
  4.1 Overview
  4.2 A Running Example
  4.3 Parsing
  4.4 Semantic Analysis
    4.4.1 An Interface for Annotation
    4.4.2 A Compositional Markup System
    4.4.3 Domain Learning
    4.4.4 Reference Resolution
    4.4.5 Sentence Simplification
    4.4.6 Frame Reduction
  4.5 Verification
  4.6 Generation
    4.6.1 Content Selection
    4.6.2 Content Structuring
    4.6.3 Realization
    4.6.4 Punctuation
5 Evaluation
  5.1 Baseline Summarizers
    5.1.1 Random Summarizers
    5.1.2 MEAD Adaptations
    5.1.3 Training Set Results
  5.2 Evaluation Method
    5.2.1 Measuring System Performance
  5.3 Results
    5.3.1 A Key to the Systems
    5.3.2 Entire Test Set
    5.3.3 Subset of Test Set
  5.4 Discussion and Interpretation
    5.4.1 Loss due to Unknown Concepts
    5.4.2 Poor Event Recall
    5.4.3 Maxwell Bentley
    5.4.4 Kareem Abdul-Jabbar
6 Conclusion
  6.1 Conclusion
  6.2 Future Work
Bibliography
Appendix A Appendix
  A.1 Model Biographies
    A.1.1 Instructions
    A.1.2 Glen's Summaries
    A.1.3 Sarah's Summaries
  A.2 Ontology
    A.2.1 np labels
    A.2.2 vp labels
  A.3 Output Biographies

List of Tables

2.1 Regular expressions and their associated weights for the QueryPhraseMatch feature of University of Michigan's DUC 2004 Task 5 submission
2.2 Document clusters and their associated summary target name
2.3 Results of the human assessment on the seven quality questions for each summary submission for the question "Who is Wilt Chamberlain?"
2.4 The range of averaged quality evaluation scores for the DUC 2004 automatic biography systems
3.1 The sources in the basketball corpus
3.2 The sources in the football corpus
3.3 The sources in the hockey corpus
3.4 Players from the basketball corpus with the most words
3.5 Players from the football corpus with the most words
3.6 Players from the hockey corpus with the most words
4.1 Noun phrase rates for the first document set annotated by Trevor
4.2 Noun phrase rates for the second document set annotated by Trevor
4.3 Noun phrase markup rates for the document set annotated by Glen
4.4 Markup rates for the four document basketball set annotated by Jeanette
4.5 The number of author-annotated training set documents for each sport in the corpus
4.6 The ten most frequent trophies from the basketball corpus
4.7 The ten most frequent positions, along with their variations, from the football corpus
4.8 The ten most frequent leagues, along with their abbreviated forms, from the hockey corpus
4.9 The most frequent underspecified expressions referring to some season
4.10 Some other notable underspecified expressions referring to some season
4.11 The split counts for the basketball corpus
4.12 The split counts for the football corpus
4.13 The split counts for the hockey corpus
4.14 The names of the frames and their descriptions
5.1 The amount of multiple document player clusters for the MEAD variant comparisons

List of Figures

3.1 The distribution of document lengths in the basketball, football, and hockey corpora
3.2 The distribution of mean cluster document lengths in the basketball, football, and hockey corpora
4.1 The parsing stage
4.2 The semantic analysis stage
4.3 The validation stage
4.4 The generation stage
4.5 A screenshot of the annotation interface
4.6 A screenshot of the annotation interface with the ontology menu popup
5.1 ROUGE recall average of the MEAD systems and the random system
5.2 ROUGE precision average of the MEAD systems and the random system
5.3 ROUGE f-measure average of the MEAD systems and the random system
5.4 The difference in performance between the MEAD SportsWords variant against pure Centroid MEAD

Acknowledgements

This thesis is the culmination of my effort and the efforts of others. I acknowledge their contribution here: thank you to Glen Goodvin, Sarah Anderson, Trevor Tennessy, Sam Mason, and Dr. David Poole. I am grateful especially for the patience of my supervisor, Dr. Giuseppe Carenini, and for the steady love and companionship of Jeanette Bautista.

BLAIR TENNESSY
The University of British Columbia
January 2006

Chapter 1 Introduction

1.1 Problem Statement

The automatic summarization of a specific body of knowledge is an important ability for any intelligent system.
It is the ability to condense a topic into a quickly graspable unit composed only of the essential, and thereby establish a compact view of a subject. Concise descriptions of entities, attributes, and events serve to economize discussion, explanation, and investigation. The statement of knowledge in its bare essentials is the goal of the summarizer.

Summarization is the condensed restatement of the essential items of a given topic. Information and knowledge are gathered from many different sources and appear in different forms making different appeals to the senses. This information is then distilled into knowledge, which is a consistent, unified view of the topic. From this vantage point, one is able to discourse upon a subject and to reason about and incorporate new information. Knowledge is the distilled product of process and information; summarization is the communicative device for quickly transmitting that knowledge.

In the last decade, many techniques for document summarization have been devised. Most notably, this work has been applied to online news, and full implementations for summarizing news online are available [54, 55]. Recent work in machine summarization and question answering systems has focused on summarization of knowledge about human beings (appropriately termed "biography summarization").

The purpose of this research is to investigate the technologies and methods underlying the summarization of historical details associated with a particular entity. That is, we wish to focus on the problem of acquiring knowledge about a particular entity (over a specific duration of time) and then generating a historical account of that entity's existence.

1.2 Philosophical Underpinnings

This work proceeds from certain convictions about the nature of man¹ (our subject) and of the metaphysical status of natural language (our medium). We shall state here our philosophical knowledge of both topics. Let us summarize our philosophical principles up front.
Man is differentiated from all other existents as being the rational animal. Man holds knowledge of the universe by means of concepts, which are the units of his epistemological system. These units subsume an unlimited number of concrete particulars. Language is a visual-auditory coding which converts a concept into the mental equivalent of a perceptual concrete: language uses a word to denote a concept. Language is the exclusive domain and tool of concepts [1, 2].

Our present inquiry is concerned with propositions. Aristotle recognized: "A (simple) proposition tells you whether something is or is not true of something else, and specifies the time." Our task is the summarization of historical fact. Aristotle continues: "Any assertion or denial that concerns the present or the past must be either true or false." [3]

Epistemology is the science of identifying and validating knowledge. An epistemological method moves in two complementary directions: integration, which extends knowledge, and differentiation, which intensifies knowledge. We will furnish our system with our own ontology; we will attempt only to intensify our knowledge through distinguishing the units of each abstract ontological class.

¹ By man we mean all members of humanity. Throughout this thesis, we will use the convention of referring to a human with the male pronoun. However, the discussion applies to all people without regard to gender.

1.3 Characteristics of a Biographical Account

Essentially, a biography catalogs the qualities, events, and accomplishments of an individual. Events occur in succession. Time moves forward. If one is to retell the occurrences of a life, one must recall the order in which the events occurred. This is important too for verbal economy: here, context is a product of the past statements. Context must be established, then developed in a sequence of events.
A sentence which duplicates previously established context is unnecessarily prolix; repeated context is therefore superfluous when sequenced together in a text. A sentence which appears out of context throws its entire neighborhood into disarray.

Man, by his nature, is a long-range planner. He acts: he acts toward the satisfaction of his long-term goals, he acts purposefully and deliberately, and he acts with an effect that persists beyond his capacity for action. He is an efficacious being, and is thus concerned with compiling success upon success into a sum that is his life. It is this type of entity that we are concerned with in this work. It follows that an essential biographical summary will cite the significant qualities and monumental accomplishments of an individual.

1.4 Accomplishment and Affiliation

Man can accomplish many things working individually. However, men acting in concert derive immense benefit from their affiliation. Organization is built around the division of labour. Men have qualities and skills that suit them for special types of work. Membership within an organization requires the concept of a role. This is the connection between individual activity and organizational structure.

Like individual men, organizations are efficacious finite entities. Upon the substratum of its membership, an organization is capable of planning, of decision making, of realizing goals. Certainly we could write a "biography" of an organization, but in this work we will view organization as a vehicle for individual accomplishment.

1.5 Thesis Outline

We have pronounced the philosophical foundations for our work. We must follow through with the description of an epistemological system, an implementation approximating this system, and an evaluation of the implementation against baseline systems.
However, before we develop the body of this work, we will review and critique the field of natural language summarization and the special topic of biographical summarization. Our attention is drawn especially to the Document Understanding Conference (DUC) series, in which the automatic summarization efforts of many leading groups are annually compared. This review section finishes with an examination of summary quality evaluation techniques and a short critique of current summarization practices.

We then devote a section to a description of our corpus and summarization domain. The domain of summarization for this work is sport, and we have compiled a large collection of biographical documents concerning athletes from three sports (basketball, football, and hockey). This is an excellent domain because sport is human life in regimented microcosm. The object of the game is known and most of the accomplishments are quantified. Regulated team sports are also relatively modern, so there has been room for inventors and pioneers. There are many "firsts" in this domain, just as there are many instances of one man surpassing the records of another.

We begin our original work with a description of our system. We describe our crude implementation of each subcomponent of the natural language understanding and generation systems. It is not our goal to demonstrate the best or most accurate methods: our system is based on a limited amount of human-annotated data and on machine learning methods which achieve only a limited accuracy. The first car was not a Ferrari; essentially, however, the components are the same. We take solace in the fact that our concrete system is only a rough approximation to the ideal system, yet is complete enough to render wholly original, grammatically correct text.

The penultimate section details the evaluation of our system against baseline methods. An automatic method for gisting evaluation compares the output summaries of the automatic methods against model summaries written by human participants. All methods, human and machine alike, produce their output from a player document cluster. We target 200-word biographies for the twelve players in the testing set. We discuss the aggregate results of the evaluation, and we highlight interesting items while comparing and contrasting the biographies of a few of these target players.

The conclusion summarizes the achievements of this work. We close with a discussion of future work. We comment on the shortcomings of our implementation and suggest improvements to the method. We sketch interface components and behaviours for obtaining training data and for human suggestion and correction of mistakes and misclassifications on the part of the system.

Chapter 2 Related Work

2.1 A Summarization of Summarization

Document summarization is the general task of deriving a short, coherent text from one or more related documents which maximizes the amount of important information covered [49, 48].

The above definition of summarization requires us to answer the question "What is the important and essential information for this set of documents?" (content selection). The requirement that we judiciously use the output space in maximizing summary content demands that we not restate information (redundancy elimination). Finally, as we are interested in writing a factual, consistent, and truthful document, we are required to detect and eliminate the inconsistencies and contradictions from input documents (conflict resolution).

Speaking generally, summarizing a set of documents is the combination of document understanding and generation.
According to this view, summarization properly consists of two subtasks. The first is knowledge extraction/understanding, in which information given in some perceptual form (images, video, audio, text) is reduced to some form from which a system can evaluate the importance of individual pieces of the content. The second is summary generation, in which the accumulated knowledge base is presented and explained within the context of the target reader's knowledge of the subject, their specific query, the subject domain, etc., and is constrained by parameters such as the allowed length of the output, the nature of the medium, etc.

Summarization is a high-level task, a composite of many underlying methods in computational linguistics. A summarization system must first identify and understand relevant content. The supporting technologies of this venture include word classifiers (e.g. part-of-speech taggers [50] and word-sense disambiguation), sentence parsers [63], named entity recognizers [4], information extraction [26], and anaphora resolution [38]. Domain-specific ontologies help guide this process, and offer a way of modularizing the domain-specific requirements [15]. Classification regimes, such as Naive Bayes, Decision Trees, Hidden Markov Models, and Support Vector Machines, have all been used in the labeling and scoring of sentences from the source texts [51, 33, 32].

A summarizer must generate a natural language text, for which the norms of discourse apply. A summarization is characteristically a concise document addressing the central points of the topic within the context of the user's knowledge of the topic at hand. Techniques described in the summarization literature range from extremely simple (pure extractive summarization) [34, 32], through editor-like rewriting for consistency [39], to full document planning [19].

2.1.1 Single Document Summarization

Single document summarization is centered around a single instance of a document.
The product of summarization may be either indicative or informative. An indicative summary explains what the source document is about, while an informative summary serves as a surrogate for the source document. Thus there are two important tasks one typically associates with a single document: the generation of headlines and subheadings (or titles, chapter names, section headings, etc.), and the generation of terse explanatory passages (executive summaries, abstracts, or introductions).

A summary must repeat the definite, substantive statements of the text. It should at least suggest the location (both spatial and temporal) so as to indicate context and domain, and explain the pertinent event. These are the basic requirements, and therefore must be recognized especially when the length of the summary is tightly constrained.

Studies have found inter-human agreement on the basic, central content of a document to be high, especially when the number of choices allowed to the individual deciding which items of content are important is small. However, as the length constraint is weakened, i.e. when the individual is required to pick more items of content for his importance set, the amount of agreement between humans tends to decrease [27]. This is not surprising.

Single document summarization has targeted relatively short summaries. The length is measured in different ways. Headline generation, an extremely terse form of summarization, is on the order of 4 words, and is sometimes measured in and limited to a small number of characters. The DUC 2002 single-document task required from participants a 100-word summary. At the time of writing, news summaries on the second page of a major national newspaper consist of a section (topical location), headline (context), and one compound sentence (explanation).
When working with a single text, researchers have realistically assumed the text to exhibit coherence and consistency; the document appears as the product of a single mind building point by point. Most researchers have assumed that issues such as contradiction and inconsistency become manifest and are a concern only when dealing with multiple documents.

Single document summarization for lengths larger than a headline is typically thought of as the special single-document case of multiple document summarization. Thus, many of the techniques discussed below for multi-document summarization apply to this case.

The predominant approach to summarization, termed selection-based extraction, is the quoting of large units of text from source documents [17, 18, 23]. This class of systems typically operates at the sentence level.

For an extractive system summarizing a document in isolation, certain assumptions are made about the location of the important content. For example, some assume that the importance of content is related to the position of the text in the document (news articles contain the main content at the start). In fact, this is the impetus behind the lead-based baseline summary.

2.1.2 Multiple Document Summarization

The task of machine multi-document summarization has been formulated as the ranking of information in a given set of texts, from which the items judged to have the greatest significance are arranged to form an output document of a given length.

Broadening the scope of the summarization introduces new concerns for the understanding process. Different authors will use different linguistic constructions. Varied grammar and spelling are a concern. Documents may appear in different languages, on different page layouts, under different design regimes. Authors may write from different points of view on a subject, assigning adjectives to entities reflecting appraisals that may conflict with the convictions of fellow authors.
A widely different level of diction and word choice is observed between documents about the same subject. Over time, nomenclature may change and facts may be revised.

The most common and easiest summarization method is extractive summarization, in which whole phrases or sentences from the source texts are copied by rote. This method has the obvious advantage of simplicity and certainly does not improperly rephrase content (though it may erroneously quote out of context). One recognizes quoting as a powerful device of an author. The extractive method has shortcomings, however, for different authors will use different devices, whereas a single text should appear as though a single author or voice has generated the text. Grammar is a problem for extractive summarizers operating at the sub-sentence level (i.e. on phrases rather than full sentences). Rewrite and rephrasing methods for smoothing the text and producing coherent output have been developed [44, 47].

Working with multiple documents also offers many advantages over working with a single text. Usually a human requires multiple documents to compare and contrast, from which one might winnow down a large document set based on some evaluative criteria. One also learns about a new subject area and domain by inferring a domain ontology, which consists of developing entity classes, labeling particular entities, establishing relationships between entities and concepts, etc. This is best accomplished with a large amount of data. Given a set of documents belonging to a given field, one might also infer the standard elements of the typical document for that field.
Automatic document summarizers have approached multiple document summarization as some variation on the following theme: (i) treat each text as a collection of sentences, (ii) cluster these sentences and group them into subsets exhibiting topical similarity, (iii) evaluate and measure the importance or centrality of each topic, (iv) select representatives for each topic, and (v) arrange and rewrite them to form the output, with a view to maximizing coverage. Typically, these sentences are treated in the topic clustering step as if they were independent, in the respect that their position or source document is irrelevant to this summarization subprocess.

Many interesting methods have been investigated for multi-document summarization. These include Spreading Activation [20, 22], Multiple Sequence Alignment [25], and Document Semantic Graphs [36, 24].

Multiple document summarization is qualitatively different from single document summarization. As a rule (there are exceptions), one can take a single document as consistent a priori, as it is written by a single author. However, during the understanding of multiple documents, we may encounter contradictory statements. Further, information from each source may complement the information from another text, providing "the rest of the story."

2.1.3 Biography Summarization

Biography summarization is the task of recounting the qualities, attributes, and events in the life of some person. It is a special subtask of summarization in which the subject of the discourse is a human entity.

In a recent Document Understanding Conference (DUC 2004), special consideration has been given to question answering systems responding to the form "Who is X?" [58]. The input document clusters contain news story text related to the human entity referred to as X. The desired output is a biographical description of the human covering the important events in the life of that human.
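As an aside on the five-step theme described in Section 2.1.2, the following is a minimal Python sketch of that generic pipeline. It is an illustration only and does not reproduce any cited system: the bag-of-words vectors, the greedy clustering rule, the `threshold` value, and the use of cluster size as a proxy for topic importance are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Lowercased word counts; a stand-in for real linguistic analysis."""
    return Counter(re.findall(r"[a-z0-9']+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def summarize(documents, max_sentences=2, threshold=0.5):
    # (i) Treat each text as a collection of sentences.
    sentences = [s.strip() for doc in documents
                 for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]
    vectors = [bag_of_words(s) for s in sentences]

    # (ii) Greedily group sentences into topically similar clusters,
    # comparing each sentence with the first member of each cluster.
    clusters = []  # each cluster is a list of sentence indices
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])

    # (iii) Rank topics by cluster size (an assumed proxy for centrality).
    clusters.sort(key=len, reverse=True)

    # (iv) Take the first member of each cluster as its representative and
    # (v) arrange the representatives in rank order.
    return [sentences[c[0]] for c in clusters[:max_sentences]]
```

Note that, as in the systems described above, the clustering step ignores each sentence's position and source document; only the selection of the first cluster member as representative depends on input order.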
Many of the teams participating in this conference attempted to hand-tailor their systems to this task. These systems attack the same problem as we do in this thesis, so we will give an in-depth look at a selection of the biographical summarization methods of the systems entered in this conference. Note that the difference is that the clusters in this task are collections of news documents, whereas we are performing biographical summarization from biographical documents.

ISI/USC

The ISI/USC team approached biography summarization by identifying the shared components of the typical biography [32]. By annotating a corpus of 130 documents, these researchers distinguished nine classes of phrases (bio, fame, personality, personal, social, education, nationality, scandal, and work). Using two classification tasks (a binary classifier to decide whether sentences qualify for inclusion in a biography summary, and a ten-way classifier for the nine phrase classes and the special class none), and an array of machine learning methods, biography sentences were selected, filtered, and ranked. From here, redundancies were eliminated until a biography of the proper length was found. This approach is an example of pure extractive summarization.

As for shortcomings, the authors conclude, "The summaries generated by the system address the question about the person, though not listing the chronological events occurring in this person's life due to the lack of background information in the news articles themselves."

University of Michigan/MEAD

The system due to the University of Michigan [34] proceeded in a manner similar to, and simpler than, that of ISI. MEAD [18], an extractive summarizer, operates in three steps:

1. Conversion of each sentence into a feature vector. The basic features included in the MEAD platform are Position, Length, and Centroid.
The Position script assigns higher scores to sentences at the beginning of the document, the Length script imposes a minimum word length for each sentence, and the Centroid script scores each sentence for similarity to the cluster's "centroid" using frequency counting and an inverse document frequency database.

2. A combiner step, in which the feature vector is reduced to a scalar value. The default combiner computes the scalar value of a sentence as the sum of the products of the feature value and the feature weight.

3. A re-ranker, in which the score of a sentence is adjusted based on its relationship to the other sentences. This step corresponds to redundancy elimination, and the default system adjusts scores based on cosine similarity between sentences: the score of a candidate sentence is reduced based on its similarity to the accepted sentences.

Working with the MEAD system, the researchers added a new feature, QueryPhraseMatch, that gave a higher ranking to sentences deemed to contain information about the person in question. This is a general regular expression matching script which increments the score of a sentence whenever an expression is matched in that sentence.

The team determined a set of biographical information-indicating regular expressions empirically (that is, they encoded common patterns from sample biographies as regular expressions). Each of these expressions was given an ad hoc weight which boosted the value of the sentence in the extraction phase. An example of such an expression from the paper is X (lives|lived), which, if matched, contributed a weight of 0.5 to the sentence. A sample of expressions excerpted from [34] is given in Table 2.1.

expression                                    weight
X                                             0.25
X grew up                                     1.0
X attended                                    1.0
X (turns|turned) [1-9][0-9]?                  1.0
X, (an?|the|who|whom|whose) [\w ]*[,.]        1.5
X, [1-9][0-9]?(,| years)                      1.5
X began                                       0.35
X (lives|lived)                               0.5
X made                                        0.5

Table 2.1: Regular expressions and their associated weights for the QueryPhraseMatch feature of University of Michigan's DUC 2004 Task 5 submission. This table is given in [34].

Note that many of these expressions bear a resemblance to the ISI phrase categories. Many of these verbs are expected to appear in ISI's biographical categories like bio, education, fame, etc. In this respect, the ISI effort seems more abstract and amenable to machine learning techniques, as typical expressions and their associated weights could easily be found automatically from suitably marked-up text.

Note that this feature works at the regular expression level. The matches must be exact. For example, if the goal of the first expression, X, is to increase the feature value for sentences containing a reference to the biography subject, then at least some of those sentences, namely the ones that match exactly, will be favoured. The general intention is clear: look for sentences in which the referent of X is referred to. Regular expressions do not suffice. One would much rather have text marked up in such a way as to be able to identify a reference to an entity X and then analyze the following verb phrase or parenthetical matter.

We have more to say about the MEAD summarizer later, when we extend the basic platform system with features of our own. MEAD serves as our sentence extractor baseline in the evaluation phase.

Columbia University/DefScriber

The effort from Columbia University built upon their DefScriber software, which is a part of the AQUAINT system [46]. This package combines goal-driven and data-driven techniques. The goal-driven method moves top-down, identifying types of information for definition inclusion (phrases like "X is a Y"). In contrast, the data-driven method is bottom-up, using centroid-based similarity and clustering for topic identification and redundancy issues.
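Returning briefly to the University of Michigan/MEAD description above, the three ideas in it (a weighted regular-expression feature, a linear combiner, and a similarity-based re-ranker) can be sketched in a few lines of Python. This is a hedged illustration, not MEAD's own code or interface: the function names, the `penalty` parameter, and the exact form of the score reduction in `rerank` are assumptions; only the four patterns and their weights are taken from Table 2.1.

```python
import math
import re
from collections import Counter

# A few patterns and weights from Table 2.1; "X" stands for the subject's name.
PATTERNS = [
    (r"X grew up", 1.0),
    (r"X attended", 1.0),
    (r"X (lives|lived)", 0.5),
    (r"X began", 0.35),
]


def query_phrase_match(sentence, name):
    """Sum the weights of every pattern that matches the sentence."""
    score = 0.0
    for pattern, weight in PATTERNS:
        if re.search(pattern.replace("X", re.escape(name)), sentence):
            score += weight
    return score


def combine(features, weights):
    """Linear combiner: sum of feature value times feature weight."""
    return sum(weights[k] * features[k] for k in weights)


def cosine(a, b):
    """Cosine similarity between whitespace-tokenized word-count vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0


def rerank(scored, penalty=1.0):
    """Greedy re-ranking over (sentence, score) pairs.

    Assumed rule: a candidate's score is reduced by `penalty` times its
    highest cosine similarity to any already accepted sentence.
    """
    accepted, remaining = [], list(scored)
    while remaining:
        best = max(
            remaining,
            key=lambda item: item[1] - penalty * max(
                (cosine(item[0], a) for a in accepted), default=0.0),
        )
        remaining.remove(best)
        accepted.append(best[0])
    return accepted
```

The sketch also makes the limitation noted above concrete: `query_phrase_match` only fires when the name string occurs verbatim before the verb, so a sentence beginning "He grew up in Philadelphia" contributes nothing.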
The Columbia system proceeds in four steps which may be summarized thus [39]:

1. Identify and extract relevant sentences containing definitional information for the target.

2. Incrementally cluster extracted sentences using the cosine distance metric. Inverse document frequency (IDF) counts from word-stems are used.

3. Select sentences for the output by maximizing the inclusion of important phrases from definitional clusters.

4. Apply rewrite techniques to improve readability.

For the Task 5 problem, special modifications were made in the first step to match shortened and alternate versions of the target's name, and in the last step, in which the researchers adopted a specific system for rewriting references to people. This system was also developed under McKeown at Columbia [45].

Critically, the authors made note of certain problems with their work. One problem was that discrete named entities with similar names appearing together in text caused a misidentification of the two referents. A sentence about, say, a relative of the biography subject could be imported. After a rewriting of the name, the true subject became the actor of his relative's deeds. The authors gave the example of their system confusing John F. Kennedy (1917-1963) with John F. Kennedy, Jr. (1960-1999), and subsequently titling the latter man "President."

Further, the Columbia project did well at the ROUGE automated scoring evaluation, but these results diverged from the human quality evaluations. The key addition to their system, namely this named entity rewriting technique, did not distinguish their system on the related DUC human quality assessment questions.

CRL/NYU

The CRL/NYU Summarization System for DUC 2004 was entered in the multi-document biographical summarization task [35]. Generally, their system is an extractive model operating at the sentence level, harnessing a module to gauge similarity between sentences.
Similarity guides the representative selection process, as well as collecting sentences with similar structure but different content. The group makes use of anaphora resolution for the biography summarization problem.

The group outlines a list of metrics for estimating the significance of a sentence. These scoring functions vary from the extremely simple (based on position of the sentence in the document, with no regard to paragraph boundaries) through frequency counting methods, and even scoring functions using higher-level symbols, as in scoring with named entities instead of nouns. The team used an extended named entity categorizer developed by the NYU Proteus Group [10].

The scores from each technique are weighted to form a total score. The heaviest features were tf*idf and position, which accounted for 54.7% and 45.0%, respectively, of the composite sentence score for biography summarization. The news headline of the document was also used: each sentence was scored for relevance against the document headline using named entities (this accounted for only 1% of the total sentence score). They found parameter values after training with DUC 2001 and 2002 data.

The CRL/NYU DUC 2004 paper contains some interesting methodological approaches. A problem in multiple document summarization is avoiding repetition of the same idea, and so finding similarity between sentences is important. The authors explain two types of similarity functions to determine necessary or redundant sentences. They make the following assumptions in [35]:

1. When two sentences are similar and have no named entities, the sentence pair has the same content.

2. When two sentences are similar and share named entities, the sentence pair has the same content.

3. When two sentences are similar but have different named entity tokens of the same types, the sentence pair has similar structure but different content.
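The three assumptions above amount to a small decision procedure, sketched below. This is an illustration only and not the CRL/NYU implementation: the function names, the cosine threshold of 0.5, and the representation of named entities as (token, type) pairs are all assumptions made for the example.

```python
import math
from collections import Counter


def cosine(words_a, words_b):
    """Cosine similarity between two bags of content words."""
    a, b = Counter(words_a), Counter(words_b)
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify_pair(words_a, entities_a, words_b, entities_b, threshold=0.5):
    """Apply the three assumptions to a sentence pair.

    words_*    : content words of each sentence (named entities excluded)
    entities_* : set of (token, type) pairs, e.g. ("Player A", "person")
    threshold  : hypothetical cut-off for calling two sentences "similar"
    """
    if cosine(words_a, words_b) < threshold:
        return "not similar"
    if not entities_a and not entities_b:
        return "same content"                             # assumption 1
    if entities_a & entities_b:
        return "same content"                             # assumption 2
    types_a = {kind for _, kind in entities_a}
    types_b = {kind for _, kind in entities_b}
    if types_a and types_a == types_b:
        return "similar structure, different content"     # assumption 3
    return "undetermined"
```

Two sentences that share every content word but name different people fall under the third case, which is how the system separates parallel statements about different entities from true repetition.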
The system then calculates two similarity functions, one based on content words, the other on named entities.

Unlike most reports from DUC 2004, CRL/NYU explicitly addressed some of the temporal aspects of the document set. They argue that the distribution of key sentences is related to the time span of a document set, and proceed to formulate features based on this information. The two features they computed from the document dateline were within a week and over a year.

NYU considers frequency and document frequency of named entity tokens and classes, as well as the variation of the named entity tokens. The named entity classes they use are event, facility, location, organization, person, and product. NYU also makes use of a coreference module to find expressions related to the cluster name.

Officially, the group did not receive good ROUGE scores, yet performed well in the human quality evaluation. In particular, it was the best at questions 2, 4, and 6, which we will review below.

CL Research

The CL Research summarization experiments paper details an interesting, deeper approach to biography summarization [38, 59]. The system described by Litkowski is relatively advanced in terms of the natural language processing techniques employed. The system is based on "massively" tagged XML documents, in which source input text (usually HTML) is cleaned, parsed, and marked up in a discourse representation. Syntactic and semantic attributes of the key elements of the sentences are identified using anaphora resolution and WordNet [37].

The CL Research entry is part of a broader system for discourse analysis of text, and the author mentions this broader system is being used to process encyclopedia articles, historical texts, and news. The system allows for exploration of documents by either "asking fact-based questions, by summarization (generally or topic-based), and by probing the contents of the semantic types of entities, relations, and events."
These tasks are implemented in XPath [62].

With this system, biography summarization is approached by focusing on the target name, then identifying all discourse entities by that name, then evaluating each sentence containing this person according to whether it displays a definitional pattern. This is similar to the approach of Columbia. According to the abstract [38], this method places the system third overall of 14 systems in biography summarization at DUC 2004.

2.1.4 DUC 2004 Task 5

The Document Understanding Conference (DUC) is an annual event hosted by the National Institute of Standards and Technology (NIST) in which summarizer teams participate in "bake-offs" with their best document summarizer efforts. The systems described above were entered in DUC 2004 Task 5, which is a multi-document biographical summary problem: given a question of the form "Who is X?", and a document cluster of news articles related to the person referred to as X, produce a biography-like summary of the information contained within those articles. The clusters and their target names are listed in Table 2.2.

cluster  name                         cluster  name
d132d    Robert Rubin                 d170e    Hugo Chavez
d133c    Stephen Hawking              d171f    Howard Dean
d134h    Desmond Tutu                 d173g    Barbara Boxer
d135g    Brian Jones                  d174b    Chea Sim
d136c    Gene Autry                   d175g    Guenter Grass
d137c    Harry A. Blackmun            d176g    Jerri Nielsen
d139b    Joerg Haider                 d178f    Ralph Nader
d141d    Sir John Gielgud             d179g    Nawaz Sharif
d144c    Jon Postel                   d185g    Henry Lyons
d147d    Mel Carnahan                 d187a    Susan McDougal
d148g    Carole Sund                  d188a    Eric Robert Rudolph
d149d    Louis J. Freeh               d189h    Paul Wellstone
d151h    Alan Greenspan               d190a    Pol Pot
d153h    Kofi Annan                   d191h    John F. Kerry
d154c    Wilt Chamberlain             d192f    Sonia Gandhi
d155c    JFK, Jr.                     d193h    Willie Brown
d156b    Wen Ho Lee                   d194f    George Soros
d157d    John C. Danforth             d195d    Madeleine Albright
d159a    Theodore John Kaczynski      d196a    Juan Antonio Samaranch
d161f    Karl Rove                    d197a    Paul Coverdell
d164g    Mia Hamm                     d199b    Thabo Mbeki
d165h    Jimmy Carter                 d200b    Philip Glass
d166d    Jesse Helms                  d201b    Abdullah Ocalan
d168d    Helmut Kohl                  d202e    Wesley Clark
d169a    Dr. Jack Kevorkian           d211e    Anthony B. "T.J." Solomon

Table 2.2: Document clusters and their associated summary target name.

Let us examine the Wilt Chamberlain cluster. We reproduce here the human-authored model biography summary A derived from the Wilt Chamberlain cluster:

Wilton Norman "Wilt" Chamberlain was a phenomenal 7'1" 280-pound National Basketball Association center of the 1960's and early 1970's. Excelling at Philadelphia's Overbrook High School in track and field as well as basketball, he went to the University of Kansas, on to the Harlem Globetrotters and joined the Philadelphia Warriors in 1959. He was outstanding not only for height and strength, but for agility, setting many new league records and dominating the championship 76'ers in 1967. In 1968 he joined the Los Angeles Lakers and led their championship team of 1972. He retired in 1973. Wilt Chamberlain died on Oct. 12, 1999 at the age of 63 years.

Contrast this summary with two automatic summaries given below (documents produced by systems 62 and 109, respectively):

The following is The Associated Press story filed March 2, 1962, the night Wilt Chamberlain scored 100 points in a game HERSHEY, Pa. (AP) - Wilt Chamberlain set a National Basketball Association scoring record of 100 points tonight as the Philadelphia Warriors defeated the New York Knickerbockers, 169-147. But Russell's Celtics - with the exception of the 1967 Philadelphia 76ers - were consistently loaded with far more talent than the various collections of Warriors, Sixers and Lakers that backed up Chamberlain.
I became a Wilt Chamberlain fan at age 8 when I saw him play for the Harlem Globetrotters, and I remained an unabashed Wilt fan until the day he d

Wilt Chamberlain was a test for a sportswriter. His name was Wilton Norman Chamberlain, and the very least we can say about this man is that there has never been a basketball player like him. Wilt Chamberlain was n't merely in the record book. Before Wilt Chamberlain ever stepped on an NBA floor, the basketball world knew who he was. Chamberlain brought something to basketball it had never seen before: a dominating 7-foot-1 center who was athletic. Yet Chamberlain was more than just a special force on the floor. When Chamberlain died Tuesday at age 63 of an apparent heart attack in his Bel-Air home famous for a stream running through it, Hall was stunned.

Here are the quality assessments for the reproduced machine summaries, as well as quality assessments for the baseline and human summaries:

summary   Q1  Q2  Q3  Q4  Q5  Q6  Q7
62        2   1   1   3   1   2   1
109       4   2   2   1   1   3   2
5         1   1   1   1   1   2   2
A         1   1   1   1   1   1   1
C         2   2   1   1   1   1   1
D         1   1   1   1   1   2   1

Table 2.3: Results of the human assessment on the seven quality questions for each summary submission for the question "Who is Wilt Chamberlain?". The human-written summaries are documents A, C, and D. The baseline summary is 5, which copies the lead from a New York Times article written by Scott Ostler of the San Francisco Chronicle.

Note that summary 62 gained very high marks, and from this assessment, one would expect that the biographical summary would be very near to human performance.

2.2 Evaluation

The evaluative criteria for a summary consist of measuring how fluent the text is and how well the summary reflects the important content of the document set. As with all text, a summary must cohere and be grammatically correct. In addition, a summary must condense document subject matter into statements representative of the main ideas of that content.
A summary is necessarily an abbreviated version of the original document; under a constraint of length, one must maximize the coverage of the topics. This is a discriminating, selective activity: words must be chosen carefully to achieve effect. For a summary of a single document, one desires a text which includes references to the principal entities and the major events or relationships between those entities. For a summary of multiple documents, one requires a text that rephrases the main subject areas over that set of documents.

A human expert is an expensive item. Work done in evaluating a summary consumes a good deal of his time. This effort has little re-use value. It is estimated that nearly 3000 hours were spent in evaluation in the 2001 and 2002 DUC conferences. Thus, an automated scoring mechanism is extremely desirable.

The evaluation of general language issues, such as spelling and grammar, occurs automatically to the reviewer of any text, including a summary. This does not require much knowledge of the particular subject area and topics of the summarized document. The evaluation of the content of a summary, i.e. of the coverage of the topics specific to the documents at hand, does require domain-specific knowledge. Typically one must be an expert in a given field to write a quality summary of the ideas particular to that field. This is like the fact that one must be fluent with two languages to translate a document from the first language to the second, yet a speaker of the second language requires no knowledge of the first language in order to evaluate the grammatical correctness of the translation. In order to evaluate the translation, however, the reader must be able to understand the text in the first language, as he must determine what was stated in the original document.

2.2.1 ROUGE

Automatic systems for the evaluation of translations and summaries have only been invented recently.
The current method for the automatic evaluation of machine summarization has its root in a technique for the automatic evaluation of machine translation. The position for measuring translation performance taken by IBM researchers is: "The closer a machine translation is to a professional human translation, the better it is." An evaluation of a translation thus requires a corpus of quality human translations for reference, and some metric for determining similarity. The essential idea is to use a weighted average of variable length phrase matches against the reference translations [28, 29]. Translations having more n-grams in common with the reference translations are scored higher, and are thus judged to be better translations. (An n-gram is a sequence of n words. A unigram is a single word, a bigram is two consecutive words, etc.) IBM demonstrated a high degree of correlation between this automatic method of scoring and translation quality evaluations performed by human judges. This method (called BLEU) was adopted as the primary evaluation measure in the machine translation community.

The technique of automatic evaluation of translations using n-gram co-occurrence statistics was adapted for summarization by Lin and Hovy [31]. They were able to show a strong correlation on past DUC evaluations between the rankings given by this automatic system (for unigrams and bigrams) and the ratings awarded by human judges. This correspondence has led to the adoption of their system, ROUGE (Recall Oriented Understudy for Gisting Evaluation) [57], as the evaluation system for most of the tasks at the DUC conference in 2004. ROUGE is freely available for research use.

In evaluating the effectiveness of automatic evaluation metrics, two criteria are proposed [31]:

1. Automatic evaluations must correlate highly, positively, and consistently with human assessments. This requirement means that when a human recognizes a good summary, so will the automatic evaluator with high probability.

2. The statistical significance of automatic evaluations must be a good predictor of the statistical significance of human assessments with high reliability. This requirement states that when a significant difference between two summaries is detected by an automatic evaluation system, this will correspond (and positively correlate) to significant differences as evaluated by human assessors. When this occurs with strong reliability (that is, high recall and precision), then the automatic evaluation serves as a good barometer for summarizer progress.

Lin and Hovy found that simple unigram and bigram co-occurrence statistics consistently outperform the BLEU style weighted average of variable length n-gram statistics in summary evaluation. According to their statistical experiments with DUC 2001 data, they found unigram co-occurrence statistics to best suit the automatic scoring process across summary tasks. This method consistently correlated highly with human assessor scores and had high recall and precision in significance tests with such results. This is contrasted with the findings for BLEU-style weighted average variable length n-grams, which did not typically satisfy the criteria of a good automatic scoring system for summaries. The authors forward the argument that, when working in the domain of summarization, extractive methods do not suffer problems in grammar, while translations do. Longer n-gram methods score for grammar rather than content.

2.2.2 Evaluation Methods at DUC

As mentioned above, automatic evaluation methods have only recently been adopted, starting with the 2004 Document Understanding Conference. Prior conferences have required many thousands of man-hours to be spent in evaluation. It is estimated that this segment of the conference accounted for roughly 25-35 percent of the event [30].
Even in the most recent conference, human evaluations were necessary for scoring Task 5 (biography summarization). In this subsection we will review the methods by which the quality of biography summaries was evaluated at these conferences.

For the biography summarization task, assessors were instructed to rank summaries on quality, coverage, and responsiveness [34]. The ROUGE tool was partially used, although many teams found discrepancies between the evaluation performance assessed by the automatic tool and the rankings given by humans in the quality questions.

In terms of evaluation, it is instructive to review the quality metrics. To gauge quality, the assessors answered the following seven questions, ranking summaries from 1 (best) to 5 (worst):

1. Does the summary build from sentence to sentence to a coherent body of information about the topic?

2. If you were editing the summary to make it more concise and to the point, how much useless, confusing or repetitive text would you remove from the existing summary?

3. To what degree does the summary say the same thing over again?

4. How much trouble did you have identifying the referents of noun phrases in this summary? Are there nouns, pronouns or personal names that are not well-specified? For example, a person is mentioned and it is not clear what his role in the story is, or any other entity that is referenced but its identity and relation with the story remains unclear.

5. To what degree do you think the entities (person/thing/event/place/...) were re-mentioned in an overly explicit way, so that readability was impaired? For example, a pronoun could have been used instead of a lengthy description, or a shorter description would have been more appropriate?

6. Are there any obviously ungrammatical sentences, e.g., missing components, unrelated fragments or any other grammar-related problem that makes the text difficult to read?

7. Are there any datelines, system-internal formatting or capitalization errors that can make the reading of the summary difficult?

It is interesting to note that, upon viewing an analysis of the official results from DUC 2004, most systems performed satisfactorily on questions 3-7, many getting a mark between 1 and 2, while the marks on the first two questions were decidedly worse. For these questions, systems routinely scored worse than a 3 on the first question, and the highest average was 2.32 on the second. Given the nature of the first two questions, it seems that more work should be dedicated to biography output planning.

2.2.3 Performance of Systems at DUC 2004

The ranges of values given to the automatic systems by human evaluators for the seven quality questions are tabulated below:

question   range
1          2.90-4.52
2          2.32-4.04
3          1.16-1.98
4          1.24-2.96
5          1.18-2.42
6          1.22-2.76
7          1.30-2.54

Table 2.4: The range of averaged quality evaluation scores for the DUC 2004 automatic biography systems.

In contrast, the human model summaries consistently achieved scores close to 1 on all questions. This table highlights a major concern with short biography summarization. The first two quality questions indicate that most of the automatic summaries were incoherent and included useless information.

2.3 Other Relevant Work

We now review some of the papers from which we drew inspiration for our work.

2.3.1 TimeML

TimeML is a specification language for event and temporal expressions in natural language text. It applies to a range of syntactic and semantic contexts, such as aspectual predication (as in Plante began to wear a mask) and modal subordination (as in Plante could have quit the game). It breaks ground on temporal and elementary causal reasoning [11, 53].

TimeML is an XML-based annotation scheme. There are four main classes of tags in this markup language: EVENT, TIMEX3, SIGNAL, and various LINK tags. We will proceed with a brief overview of each of these data structures.
Broadly, EVENT consumes text corresponding to situations which occur and to predicates describing circumstances in which something holds true. Events may be punctual or last for a period of time. The attributes of an EVENT are an identifier, a class, a tense, and an aspect. We list here the possible classes of event and some corresponding examples (modified from [53]):

1. OCCURRENCE: die, crash, build, merge, sell

2. PERCEPTION: see, hear, watch, feel

3. REPORTING: say, report, announce

4. ASPECTUAL: begin, finish, stop, continue

5. STATE: on board, kidnapped, love

6. I_STATE: believe, intend, want

7. I_ACTION: attempt, try, promise, offer

8. MODAL: might, should, would

Corresponding to each EVENT is at least one MAKEINSTANCE tag (during markup, one should create as many such instances as are warranted by the text). This tag creates event identifiers. All relations indicated by links (the types of links are described below) involve these event instances.

The TIMEX3 tag marks up explicit temporal expressions. These can be fully specified (e.g. March 20, 1948), underspecified (e.g. last Tuesday), or durations (e.g. one day).

The SIGNAL tag consumes regions of text that indicate how temporal objects are related. These are typically function words, including temporal prepositions (e.g. during), connectives (e.g. when), and subordinators (e.g. if). It also encapsulates polarity indicators (e.g. no) and temporal quantification (e.g. two times).

The LINK tags encode relations existing between the temporal elements instantiated by the MAKEINSTANCE tag. There are three types of links: TLINK, for encoding a temporal relationship between two events or between an event and a time; SLINK, for a subordination relationship between two events or between an event and a signal; and ALINK, for representing the relation between an aspectual event and its argument event.

The EVENT type was the major inspiration during the development of verb phrase labels in our project.
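To make the four tag classes concrete, the sketch below parses a small hand-written fragment in the style of TimeML and reads off its links. The sentence, the date, and the identifiers are invented for illustration. The attribute names (eid, eiid, eventID, tid, relType, and so on) follow one common rendering of TimeML and are an assumption here; they should be checked against the version of the specification in use. Placing tense and aspect on EVENT follows the description above.

```python
import xml.etree.ElementTree as ET

# Hand-written, illustrative annotation; not taken from any TimeML corpus.
ANNOTATED = """<TimeML>
The player <EVENT eid="e1" class="ASPECTUAL" tense="PAST" aspect="NONE">began</EVENT>
to <EVENT eid="e2" class="OCCURRENCE" tense="NONE" aspect="NONE">wear</EVENT> a mask
<SIGNAL sid="s1">on</SIGNAL>
<TIMEX3 tid="t1" type="DATE" value="1948-03-20">March 20, 1948</TIMEX3>.
<MAKEINSTANCE eiid="ei1" eventID="e1"/>
<MAKEINSTANCE eiid="ei2" eventID="e2"/>
<TLINK lid="l1" eventInstanceID="ei1" relatedToTime="t1" signalID="s1" relType="IS_INCLUDED"/>
<ALINK lid="l2" eventInstanceID="ei1" relatedToEventInstance="ei2" relType="INITIATES"/>
</TimeML>"""


def read_links(xml_text):
    """Return (event word, relation, target) triples for TLINK and ALINK tags."""
    root = ET.fromstring(xml_text)
    words = {e.get("eid"): e.text for e in root.iter("EVENT")}
    # Links refer to event instances, so map each instance back to its event.
    instances = {m.get("eiid"): m.get("eventID") for m in root.iter("MAKEINSTANCE")}
    times = {t.get("tid"): t.get("value") for t in root.iter("TIMEX3")}

    links = []
    for link in root.iter("TLINK"):
        event = words[instances[link.get("eventInstanceID")]]
        links.append((event, link.get("relType"), times[link.get("relatedToTime")]))
    for link in root.iter("ALINK"):
        event = words[instances[link.get("eventInstanceID")]]
        target = words[instances[link.get("relatedToEventInstance")]]
        links.append((event, link.get("relType"), target))
    return links
```

The indirection through MAKEINSTANCE is the point of interest: links never name an EVENT directly, which is what allows one piece of text to stand for several event instances.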
2.3.2 Syntactic Simplification

Authors load text with many items of information. They do this through grammatical devices, such as compound forms and parentheticals. The technique used to get at the base stuff of these compound sentences is termed syntactic simplification [9, 8].

For sentence extraction researchers, this technique is used to improve the sentence clustering stage. Especially for news stories, the information contained in parentheticals is often only tangentially related to the main clause of the sentence. Capturing appositives and non-restrictive relative clauses and severing them from the base sentence allows one to cluster on the main matter of the sentence free from extraneous information. Also, the parentheticals can be reintroduced at fortuitous locations in the text through a process of reference rewriting [39].

We base our syntactic simplification method on a short handbook of grammar [64]. We also proceed to isolate out parenthetical material as in [9].

2.3.3 Legal Text Summarization

The special task of summarizing legal judgments and verdicts has been given attention [41, 42, 43]. There are a number of special roles and artifacts in this domain, so researchers have developed methods to identify the text passages referring to these items.

In particular, the markup method explained in [41] operates on a form of text representation that is close to ours. They are summarizing legal proceedings with an extractive method. The legal domain contains some special concepts, and the corresponding referring expressions are peculiar and detectable. Aware of this, [41] performs "a more complex layering of two kinds of NER [named entity recognition]." Their system chunks text into noun groups and assigns each group a type and subtype attribute. Our system parses text into phrases and assigns each phrase a class and type attribute.

The first kind of named entity recognition finds general named entities.
These noun groups are tagged with an outside named entity tagger. Example types from this tagger include person, organization, location, date.

The second kind finds domain-specific entities. These noun groups are tagged by handcrafted LT TTT rules [7]. For example, a person type group might receive the more specific subtype committee-lord. Domain specific entities include organization subtypes such as court, and legal artifacts such as act or judgment. This label also includes room for legal roles like appellant and respondent.

In case of labels assigned to noun groups by both taggers, the most specific domain label dominates. These labels are used as features during rhetorical classification and relevance measurement.

2.4 Critique

2.4.1 Sentence Extraction

As we have seen, except for a few notable exceptions, the prevailing method of summarization is sentence extraction. Let us commence a condemnation of the root method common to the foremost groups at the premier "document understanding" conference.

Sentence extraction should be known by its original term: plagiarism. Let us recall the definition [65]:

plagiarism: n. 1. the appropriation or imitation of the language, ideas, and thoughts of another author, and representation of them as one's original work. 2. something appropriated and presented in this manner.

Plagiarized work is a revolting sight when committed by a human. When it is put forward and accepted as a "robust method" of summarization by the predominant researchers in the field, it amounts to a condemnation of the field itself.

Sentence extraction is entrenched. Some have come to identify it with automatic summarization itself. Consider the second sentence from the introduction of [32]: "It [automatic text summarization] is described as selecting a subset of sentences from a document that is in size a small percentage of the original and yet is just as informative."
There are many valid instances of quoting text by rote in performing a summarization task. Sometimes, the point is better made through reference rewriting, as we performed in the preceding quote. However, the recalling of original sections of text from other documents requires explicit attention to context. It must be used for effect, as a secondary point to drive home the point being made by the author of the summary. Quoting is a rhetorical device.

Sentence extraction does not merely not make use of concepts. It is profoundly anti-conceptual. It bears the chief hallmark of the anti-conceptual: the separation of a statement from context. Given our definition of language, it is not a valid approach to the task of summarization.

2.4.2 Anti-Conceptual Approaches as Ultimately Futile

Sentence extraction, and any variant thereof, is the only conceivable method of forming natural language text from a set of input documents from an anti-conceptual approach. Otherwise, one must be generating novel text. Generating text requires both something to say (which requires concepts of entities, attributes, events, and particular instances of these items) and knowledge of grammar (which requires the concepts related to forming sentences and larger units of discourse).

Let us consider a series of simple examples. Let us imagine we are to summarize, in one sentence, the following eight sentences.

Bobby Orr won the Norris Trophy in 1967-68.
Bobby Orr won the Norris Trophy in 1968-69.
Bobby Orr won the Norris Trophy in 1969-70.
Bobby Orr won the Norris Trophy in 1970-71.
Bobby Orr won the Norris Trophy in 1971-72.
Bobby Orr won the Norris Trophy in 1972-73.
Bobby Orr won the Norris Trophy in 1973-74.
Bobby Orr won the Norris Trophy in 1974-75.

There are eight explicit facts stated here. Summarizing them into a single sentence should be a simple exercise for a reader. Consider a pure sentence extractor.
It will pick one of these sentences, leading to an incomplete account of the text. A method which attempts to dispose of some extraneous content (e.g. [44]) does not help.

Let us consider another example. The sentences above are relatively self-contained; selecting any of them would at least establish one Norris win in one hockey season by Bobby Orr. The time (the context) is explicit. The context of events is not always so spelled out.

Bobby Orr won the Norris Trophy in 1967-68.
Bobby Orr won the Norris Trophy in the following season.

Now, should a sentence extractor pick the second sentence, a reader would not be able to resolve the phrase "the following season," and might only retain the fact that, in some season, Bobby Orr won the Norris Trophy. This is a simple example of context dropping. It would have been better simply to prune off the preposition phrase altogether.

Context dropping will insidiously undermine the unity of a work. Without attention to reference (without identifying the meaning of a specific phrase by establishing its ontological place in one's hierarchy of concepts) one is liable to disregard the time and place and other essential information conditioning the statement. In reassembling this content, one will either (quote sentences that) make anaphoric reference to the wrong referent introduced in a previous (quoted) sentence, or (quote sentences that) "access" a reference that was never introduced.

Epistemology is concerned with identifying and verifying the facts of existence. All validation occurs in terms of concepts. All reasoning relies on concepts. An approach without concepts is therefore unable to determine contradiction between statements. Using an anti-conceptual approach, it is possible to arrange the quoted statements into something resembling a contradiction. Thus, an anti-conceptual method evades the fundamental fact that a contradiction cannot exist [3].
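Returning to the eight Norris Trophy sentences, the sketch below shows what even a minimal notion of "consecutive hockey seasons" buys over extraction. It is an illustration written for this passage, not the system developed in this thesis: the regular expression, the function names, and the output wording are assumptions, and the input is restricted to sentences of the exact form "<fact> in YYYY-YY."

```python
import re
from collections import defaultdict

# Matches sentences such as "Bobby Orr won the Norris Trophy in 1967-68."
SEASON = re.compile(r"^(?P<fact>.+) in (?P<start>\d{4})-(?P<end>\d{2})\.$")


def label(year):
    """Render the season that starts in `year`, e.g. 1967 -> '1967-68'."""
    return f"{year}-{(year + 1) % 100:02d}"


def merge_seasons(sentences):
    """Collapse repeated facts over consecutive seasons into one sentence each."""
    starts_by_fact = defaultdict(list)
    for sentence in sentences:
        match = SEASON.match(sentence)
        if match:
            starts_by_fact[match["fact"]].append(int(match["start"]))

    summaries = []
    for fact, starts in starts_by_fact.items():
        # Group the starting years into runs of consecutive seasons.
        runs = []
        for year in sorted(set(starts)):
            if runs and year == runs[-1][1] + 1:
                runs[-1][1] = year
            else:
                runs.append([year, year])
        spans = [
            f"in {label(first)}" if first == last
            else f"in each season from {label(first)} to {label(last)}"
            for first, last in runs
        ]
        summaries.append(f"{fact} " + " and ".join(spans) + ".")
    return summaries
```

Given the eight sentences, the function produces a single sentence covering 1967-68 through 1974-75. The merge is only possible because the code treats a season as an ordered unit of time, which is a concept and not a string.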
Consider:

Bobby Orr scored exactly 1 assist in his first professional hockey game.
Bobby Orr scored exactly 2 assists in his first professional hockey game.

Let us have no restriction on the length of the summary. Most summarizers will take blind advantage of the length and copy both propositions. One can only recognize the contradiction when it is clear that an entity cannot simultaneously perform a certain act both exactly once and exactly twice within the identical region in time.

So much for our short refutation and proof that one cannot approach summarization, and more generally, any method of acquiring knowledge without the device of the concept (for knowledge is retained in terms of, and only in terms of, concepts). If one cannot even identify the meaning of a statement, one cannot accomplish the subsequent task of verifying that statement.

One cannot be harsh enough when the good part of the research community is approaching a task while explicitly disregarding the nature of that task. They are proceeding according to inessentials: they have an automatic method for cobbling together an eclectic disintegration of other people's words. I state here that their project must be abandoned, and that plagiarism should be held in the contempt it deserves.

The current field of researchers is trying to do too much with too little. They are attempting "general" or "domain independent" summarizers without any experience in any particular domain. This is a mere velleity. There is no such thing as "domain independent" summarization, for summarization requires familiarity with the domain of whatever subject matter one is summarizing. As Polus says, "experience produces art, but inexperience chance."

We should be careful in talking about truthful or contradictory statements from sentence extractors.
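The assist example can be made mechanical once each sentence is reduced to a subject, a context, and an exact count. The sketch below is an illustration under strong assumptions: the pattern only covers sentences of the form "<subject> scored exactly N assist(s) in <context>.", and the function name is hypothetical.

```python
import re

# Matches e.g. "Bobby Orr scored exactly 1 assist in his first professional hockey game."
EXACT_COUNT = re.compile(
    r"^(?P<subject>.+?) scored exactly (?P<count>\d+) assists? in (?P<context>.+)\.$"
)


def find_contradictions(sentences):
    """Report (subject, context) pairs that are given two different exact counts."""
    first_count = {}
    conflicts = []
    for sentence in sentences:
        match = EXACT_COUNT.match(sentence)
        if not match:
            continue
        key = (match["subject"], match["context"])
        count = int(match["count"])
        # The same entity cannot do something exactly m and exactly n times
        # (m != n) within the identical region in time.
        if key in first_count and first_count[key] != count:
            conflicts.append((key, first_count[key], count))
        first_count.setdefault(key, count)
    return conflicts
```

A sentence extractor compares the two strings and sees two near-duplicates. The check above flags them because it compares what they assert about the same entity in the same context.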
The documents produced by a concept-free summarization architecture have the same status regarding truth as the recitations of a parrot: no process exists which formulates propositions. The strings of symbols produced by a sentence extractor are not linguistic, just as the strings of sounds produced by a parrot are not. Properly speaking, the product of the non-conceptual summarizer cannot be regarded as true or false, consistent or contradictory.

2.4.3 Critique of ROUGE

ROUGE is a limited method of evaluating summary quality. First, it models a document as a collection of independent sentences. The scores for any permutation of the sentences from a peer document will be the same. It is obvious that the quality of a biography is not the same as the quality of that same biography read in reverse.

Second, ROUGE operates on n-grams, which are delimited by whitespace and correspond to runs of words. ROUGE has many variations: vanilla n-grams (sequences of words), skip-grams, Basic Elements, etc. These divisions of the sentence are artificial and mismatched with the grammatical structure: they cross phrase boundaries.

Let us examine some simple ways in which ROUGE options may lead our evaluation astray. Let us assume we are dealing with a model document with one sentence:

    Bobby Orr was a defenseman.

In the stopword mode, certain words, deemed too common and thus unimportant, are cut. One of these words is "not". This word is the difference between the assertion of a fact and its denial. Yet, if it is cut, we have ROUGE agreeing that the denial is a perfect summary of the assertion. Thus

    Bobby Orr was not a defenseman.

is a perfect match to our model document. Then again, so is

    Bobby Orr was not not a defenseman.

Next, words must match. Using a synonym for defenseman, such as defenceman, defender, or blueliner, will decrease the ROUGE value. For example, the sentence

    Bobby Orr was a blueliner.
will get a unigram score of 0.8 on the recall and precision measures (four out of the five terms match). Note that the denial above, even when "not" is not removed as a stopword, still beats the paraphrase on all measures in the unigram instance: it recalls all five model terms, and five of its six terms match.

Multi-word synonyms, and even broken compound words or the addition of a hyphen, play havoc with ROUGE. In an attempt to get around this problem, especially for verb inflections, a version of the Porter word stemmer² is included in ROUGE.

² A stemmer is a common preprocessing step in tasks like information retrieval; the stem is the common morphological root of inflected, derivative words.

In order to test ROUGE's sensitivity to simple forms of paraphrase, we wrote a simple text generator which rendered the same information into English text in a number of equivalent forms. A number of documents were generated by randomly selecting a form for each piece of information. Unsurprisingly, as the degree of paraphrase increases, the average ROUGE score decreases.

ROUGE is stuck in roughly the same mindset as the rest of the summarization field. It treats text as a bunch of perceptual symbols. It disregards boundaries and the sequencing of text. It disregards function words. It is obvious that a system which is insensitive to the order of the sentences in the input is also insensitive to the chronological order of the statements of a document. It must also be insensitive to the development of context.

We will further explore the proper methods of evaluating a summarizer once we have judiciously separated the work into abstract tasks. It suffices to mention here that a proper evaluation of a summarizer proceeds according to the abstract stages of the underlying epistemological system, and of the summarizer proper. It cannot be a concrete-bound comparison of runs of whitespace-delimited strings of characters, although such a comparison will verify that we are speaking with the same terms.
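The unigram arithmetic in this critique can be checked with a few lines of code. What follows is only a minimal sketch of unigram recall and precision, not the ROUGE toolkit itself: the function names `tokenize` and `unigram_scores` are hypothetical, the tokenizer simply lowercases and drops periods, and the stopword set containing only "not" is an assumption made for illustration.

```python
from collections import Counter


def tokenize(sentence, stopwords=frozenset()):
    """Lowercase, drop periods, split on whitespace, remove stopwords."""
    words = sentence.lower().replace(".", " ").split()
    return [w for w in words if w not in stopwords]


def unigram_scores(model, peer, stopwords=frozenset()):
    """Return (recall, precision) of peer unigrams against the model."""
    model_counts = Counter(tokenize(model, stopwords))
    peer_counts = Counter(tokenize(peer, stopwords))
    # Clipped overlap: each model unigram can be matched at most once.
    overlap = sum((model_counts & peer_counts).values())
    recall = overlap / sum(model_counts.values())
    precision = overlap / sum(peer_counts.values())
    return recall, precision


MODEL = "Bobby Orr was a defenseman."

# Paraphrase: four of five terms match on both measures.
print(unigram_scores(MODEL, "Bobby Orr was a blueliner."))
# Denial with "not" kept: full recall, five of six terms for precision.
print(unigram_scores(MODEL, "Bobby Orr was not a defenseman."))
# Denial with "not" stopped (assumed stopword set): a perfect match.
print(unigram_scores(MODEL, "Bobby Orr was not a defenseman.", {"not"}))
```

Under these assumptions the denial outscores the faithful paraphrase whether or not "not" is stopped, which is the behaviour criticized above.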
Chapter 3

Corpus

3.1 Introduction

Multiple biographical document summarization requires a set of biographical documents with multiple different entries for each human subject. This organized collection of documents is known as a corpus.

The automatic summarization of biographical documents is a problem that has been only recently approached in earnest. Human endeavour is so varied and broad that a comprehensive collection with multiple biographies would be immense. The field is not yet mature enough to support such a collection, so we are required to choose a particular field of human action and to find and collect a large number of biographical documents belonging to that field.

We decided to study athletes and sport for many reasons. The primary reason, however, is that sport is an idealization of human activity. An ideal abstracts out and concentrates the essentials. The regulation in sport is a boon to our work: sharply delineated time periods, quantified accomplishments, formal changes of player status.

We want a domain which will exhibit the challenges of multi-document summarization yet will not require such a complex process of reasoning as to disqualify a maturing system from partial success. We want texts which contain errors and which confirm and conflict with each other; we want blemished texts from sources prone to "noise." We want a domain which is familiar and interesting to both the author and the other human participants in this work (annotators, model biography writers, readers). We want a domain in which there is interaction between many of the biographical subjects: this allows us to link the tasks involved in summarizing a small set of documents to the tasks of learning about the subject's domain. Our collection of athlete biographies fulfills all of these desires.

The two prominent multiple document biographical summarization corpora (the DUC 2004 Task 5 corpus, and the corpus described in [32]) feature large clusters but few subjects.
For example, the corpus from [32] contains ten subjects with roughly twelve documents per subject. We aim at smaller cluster sizes but vastly more subjects.

3.2 Document Collection

The body of biography text that we work with was collected from a number of internet sites. To our knowledge, these texts have not been used before in any natural language processing effort. The names of the sources for our content, as well as their base internet page, are given in Table 3.1, Table 3.2, and Table 3.3.

    source name       url
    Attic             attic
    Hickok            hickok
    HitRunScore       hitrunscore
    HoopHall          hoophall
    Trivia Library    trivialibrary
    Wikipedia         wiki

    Table 3.1: The sources in the basketball corpus.

The content of these pages is encoded in HTML. We extract the textual content of each player biography, removing items such as tables, section headings, etc. The derivative is a text consisting of a title and a sequence of paragraphs.

    source name         url
    Hickok              hickok
    Pro Football HOF    profootball
    Thinkquest          tq
    Wikipedia           wiki

    Table 3.2: The sources in the football corpus.

    source name            url
    Legends of Hockey      loh
    Hockey Fans            hf
    Couch Potato Hockey    cph
    Wikipedia              wiki
    Hickok                 hickok

    Table 3.3: The sources in the hockey corpus.

These texts were written by a number of different human authors. They contain a number of mistakes and inconsistencies, including errors of punctuation, spelling, grammar, and fact. The biographies of some sources are professionally written, while others are the work of devoted fans.

We will have an opportunity throughout the remainder of this thesis to examine the various types of errors made by the original authors of the text. We will focus especially on the sensitivity of our system to the various types of these innate errors, and whether we can recognize and correct such mistakes. We develop the system under the assumption that the text is correct; this is the general character of the corpus.
But we must eventually confront the fact that errors exist in the text and, more importantly, some of these errors are within our system's ability to fix.

3.3 Organization

Our corpus consists of biographies from three team sports: basketball, football, and hockey. We further subdivide the sport corpus according to the internet sources for each sport. Therefore, each sport is initially arranged as a corpus consisting of an index of sources. These sources contain an index of their biographies. The biographies are initially extracted from the source internet site, and the document index is built automatically.

The hockey source index, which contains five sources, is given below:

    <corpus sport="hockey">
      <source name="Legends of Hockey" target="loh" index=""/>
      <source name="Hockey Fans" target="hf" index=""/>
      <source name="Couch Potato Hockey" target="cph" index=""/>
      <source name="Wikipedia" target="wiki" index=""/>
      <source name="Hickok" target="hickok" index=""/>
    </corpus>

An abbreviated version of the document index from the wiki source of the hockey corpus is given below:

    <biographies>
      <bio name="Sid Abel" target="Sid_Abel"/>
      <bio name="Jack Adams" target="Jack_Adams"/>
      <bio name="George Armstrong" target="George_Armstrong"/>
      <bio name="Ace Bailey" target="Ace_Bailey"/>
    </biographies>

Here is an example paragraph-segmented document:

    <document>
      <title>Sid Abel</title>
      <p>
        Sid Abel played professional hockey from 1938 until 1954. He also
        served as a coach, scout, and colour commentator.
      </p>
      <p>
        Abel was a member of the Production Line with Gordie Howe and Ted
        Lindsay. He won three Stanley Cups with the Detroit Red Wings during
        his NHL career.
      </p>
    </document>

Additionally, each sport contains an index of the biography subjects contained within the sources. We will refer to a collection of biographies about a given player as a player cluster or simply cluster. This clustering was accomplished automatically with a correction pass by the author.
An example cluster index with one player cluster is given below:

    <players>
      <player firstname="Sid" surname="Abel" bios="4">
        <bio source="hickok" target="Sid_Abel" name="Sid Abel"/>
        <bio source="cph" target="Sid_Abel" name="Sid Abel"/>
        <bio source="wiki" target="Sid_Abel" name="Sid Abel"/>
        <bio source="loh" target="SidneyGeraldAbel" name="Sidney Gerald (Sid) Abel"/>
      </player>
    </players>

3.4 Corpus Statistics

We graph in Figure 3.1 the size of each corpus and give the distributions of the document lengths, in words. We plot histograms with a bucket size of 25 and a range of 1000 words. Most of the documents are less than 1000 words, but there are a few long documents in our corpus. The longest document is a 2,528 word wiki entry on the hockey great Mario Lemieux.

We provide histograms in Figure 3.2 of the average biography document length (in words) for each player cluster from each sport.

    [Figure 3.1: The distribution of document lengths in the basketball, football, and hockey corpora. Three histograms, one per sport; horizontal axis: length in words (0 to 1000).]

    [Figure 3.2: The distribution of mean cluster document lengths in the basketball, football, and hockey corpora. Three histograms, one per sport; horizontal axis: average cluster document length in words (0 to 1000).]

For each sport, we list the top ten players with the most words counted in their cluster:

    player                 words   bios   words/bio
    Michael Jordan          3069      3      1023.0
    Wilt Chamberlain        2725      5       545.0
    Sam Barry               2152      3       717.3
    Bob Cousy               2010      5       402.0
    Jerry Lucas             1982      3       660.7
    Kareem Abdul-Jabbar     1971      5       394.2
    George Mikan            1865      4       466.3
    Pete Maravich           1842      4       460.5
    Hakeem Olajuwon         1801      3       600.3
    Bill Russell            1752      4       438.0

    Table 3.4: Players from the basketball corpus with the most words.
    player                 words   bios   words/bio
    Jim Thorpe              3109      3      1036.3
    Joe Namath              2099      4       524.8
    George Halas            1956      3       652.0
    Walter Camp             1921      3       640.3
    Dan Marino              1855      4       463.8
    Paul Hornung            1775      4       443.8
    Red Grange              1639      4       409.8
    Joe Montana             1583      4       395.8
    Vince Lombardi          1555      4       388.8
    Al Davis                1478      4       369.5

    Table 3.5: Players from the football corpus with the most words.

    player                 words   bios   words/bio
    Mario Lemieux           5826      5      1165.2
    Wayne Gretzky           5626      4      1406.5
    Gordie Howe             3860      5       772.0
    Guy Lafleur             3633      5       726.6
    Bobby Hull              2983      5       596.6
    Maurice Richard         2918      4       729.5
    Bobby Orr               2827      4       706.8
    Phil Esposito           2782      5       556.4
    Ray Bourque             2693      5       538.6
    Mike Bossy              2657      5       531.4

    Table 3.6: Players from the hockey corpus with the most words.

3.5 Training and Testing

We split the corpus into a training set and a testing set. The training set consists of a large set of documents on which we make observations, mine for knowledge, and label for learning purposes. We also use the training set for word frequency information, which we apply in adapting the baseline summarizers to the sports domain.

There are two special subsets of the training set. The first is the human annotated documents. The second is the development set, which was used during the construction of the system. This second set consists of a few player clusters, mostly from the hockey corpus. The development players given the most attention were legends familiar to the author: Bobby Orr, Bobby Hull, Jacques Plante, Sid Abel, and Rick Barry.

The testing set is considered off-limits. Once we have completed development of our system using the training set, we will apply our methods to produce summaries from the testing set.

The two sets are formed from the player cluster indices from each sport. We are practicing multi-document summarization, so players with a single biography document automatically qualify for the training set. The remainder are randomly split, with 95% going to the training set and 5% making the test set.
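The splitting rule just described can be sketched in a few lines. This is an illustration under stated assumptions rather than the procedure actually used: the function name `split_clusters`, the dictionary representation of the cluster index, the fixed seed, and the choice to draw a fixed 5% of the multi-document clusters (rather than, say, sampling each cluster independently or splitting per sport) are all hypothetical.

```python
import random


def split_clusters(clusters, test_fraction=0.05, seed=0):
    """Split player clusters into (train, test) lists of cluster names.

    clusters maps a player name to the list of that player's biographies.
    Single-biography clusters always go to the training set; the rest
    are shuffled and a fixed fraction is held out for testing.
    """
    rng = random.Random(seed)  # assumed seed, for repeatability only
    train = [name for name, bios in clusters.items() if len(bios) < 2]
    multi = sorted(name for name, bios in clusters.items() if len(bios) >= 2)
    rng.shuffle(multi)
    n_test = round(len(multi) * test_fraction)
    test = multi[:n_test]
    train.extend(multi[n_test:])
    return train, test


# Hypothetical miniature cluster index.
example = {
    "Sid Abel": ["hickok", "cph", "wiki", "loh"],
    "Jack Adams": ["wiki"],
    "Bobby Orr": ["wiki", "loh", "hf", "hickok"],
}
train_names, test_names = split_clusters(example, test_fraction=0.5)
print(train_names, test_names)
```

Sorting the names before shuffling keeps the split independent of dictionary ordering, so the same seed yields the same held-out players.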
The resulting testing set contained twelve players: five basketball players, three football players, and four hockey players. Their cluster names and the names and sources of their biographies are given in the tables below:

    Neil Johnston
    source          biography name                       words
    hickok          Neil Johnston                          201
    hoophall        Donald Neil Johnston                   227

    Kareem Abdul-Jabbar
    source          biography name                       words
    attic           Kareem Abdul-Jabbar                    277
    trivialibrary   Kareem Abdul-Jabbar                    375
    wiki            Kareem Abdul-Jabbar                    673
    hoophall        Kareem Abdul-Jabbar                    400
    hickok          Kareem Abdul-Jabbar                    490

    Tom Gola
    source          biography name                       words
    hoophall        Thomas J. Gola                         287
    hickok          Tom Gola                               290

    Bob Cousy
    source          biography name                       words
    hoophall        Robert J. Cousy                        287
    trivialibrary   Bob Cousy                              327
    wiki            Bob Cousy                              358
    hickok          Bob Cousy                              523
    attic           Bob Cousy                              509

    Joe Dumars
    source          biography name                       words
    hickok          Joe Dumars                             157
    wiki            Joe Dumars                             237
    hitrunscore     Joe Dumars                             317

    Alan Page
    source          biography name                       words
    hickok          Alan Page                              220
    profootball     Alan Page                              283
    wiki            Alan Page                              473

    James Lofton
    source          biography name                       words
    hickok          James Lofton                           283
    profootball     James Lofton                           386

    Jim Brown
    source          biography name                       words
    tq              Jim Brown                              171
    hickok          Jim Brown                              384
    wiki            Jim Brown                              311
    profootball     Jim Brown                              298

    Alex Delvecchio
    source          biography name                       words
    wiki            Alex Delvecchio                        162
    cph             Alex Delvecchio                        792
    loh             Alexander Peter (Alex) Delvecchio      937

    Eddie Shore
    source          biography name                       words
    wiki            Eddie Shore                            342
    loh             Edward William (Eddie) Shore           858
    hf              Eddie Shore                            376

    Maxwell Bentley
    source          biography name                       words
    hickok          Maxwell Bentley                        178
    cph             Max Bentley                            544
    loh             Maxwell Herbert Lloyd Bentley          740

    Bobby Clarke
    source          biography name                       words
    hickok          Bobby Clarke                           222
    loh             Robert Earle (Bobby) Clarke           1551
    wiki            Bobby Clarke                           268

After we formed the testing set, we enlisted two teams of authors to write a short biography (at most 200 words) for each of the twelve test players.
Using only factual information directly available from a player cluster, and their general knowledge of sport, they were to perform the same summarization task as our automatic methods. We use these extra human-authored documents as the model set in the evaluation phase. The model documents are reproduced in the appendix.

The document specifying biography authoring directions is also attached in the appendix. The writing instructions are open-ended: basically, the authors are allowed a single paragraph of at most 200 words in which to write a biographical summary.

We will comment further on whether 200 words was a good length to target, whether the instructions were too open, etc., when we review the evaluation results.

Chapter 4

Approach

4.1 Overview

Our system is implemented as a number of successive markup stages. Understanding a document is seen as proceeding from raw text to a collection of frames. These frames represent the facts stated in the document. Generating a document is seen as proceeding from a collection of frames to a realization in English. This surface realization, a summary, is a condensed encoding of the collection of frames.

The first stage of understanding is the identification of the grammatical structure. The unit of this stage, the sentence, is broken into a hierarchy of contiguous sequences. This structure is called a parse tree. Next, the unit of the parse tree, the phrase, is labeled with a semantic class from an ontology. Next, we pass through a resolution stage, which operates on the entire document. In this stage, the particular referent of each phrase is decided. We then simplify sentences into independent clauses. Finally, we recover a frame representation of the document. This identification of the propositional content is the final stage of understanding an isolated document.

We arrive at the knowledge base by deciding which frames to accept.
The cardinal concern is to build a consistent and unified knowledge base free of contradiction; the knowledge base need not consist completely of true facts, but it cannot admit two opposing facts. In this respect, our system remains gullible, as we do not address rhetorical devices such as hyperbole.

Generation is similar to understanding in reverse. We choose a subset of facts. We plan which facts to combine into a sentence. We then compose the grammatical structure and instantiate references. Finally, text is punctuated and the realization is completed.

As a guide for this chapter, we employ a flowchart diagram of the understanding process and the generating process. We break the chart down into four stages corresponding to grammar, semantic analysis, validation, and generation.

    [Figure 4.1: The parsing stage. Flowchart: text extraction, sentence segmentation, parsing, premarking/expanding. The input is HTML; each step outputs XML.]

    [Figure 4.2: The semantic analysis stage. Flowchart: phrase marking, reference resolution, sentence simplification, frame reduction, informed by domain knowledge. Each step consumes and outputs XML.]

    [Figure 4.3: The validation stage. Flowchart: frame harmonization, operating on XML.]

    [Figure 4.4: The generation stage. Flowchart: event selection, document planning, generation, punctuation. Each step passes XML to the next; the final output is text.]

4.2 A Running Example

We shall proceed to discuss each stage of the system. During this discussion, we will be applying the techniques to a short example biography about hockey legend Bobby Orr written by the author. We will also have occasion to cite examples encountered by the author which will highlight the difficulties and shortcomings with our system.

Let us start with the single paragraph text biography of Bobby Orr:

    <document>
    <p>
    Robert Gordon Orr was born in Parry Sound, Ontario, Canada, on March 20,
    1948. Bobby Orr began his junior career with the Oshawa Generals at the
    age of 14. He set a record for goals by a defenseman with Oshawa. At 18
    he signed with the Boston Bruins of the NHL.
    He won the Calder Trophy as top rookie in his first season. A fast,
    skilled player, Orr revolutionized the role of the defenseman. He won
    eight consecutive Norris Trophies (1968-1975) as best defenseman. He
    added three Hart Trophies for most valuable player, and he won two Art
    Ross Trophies as scoring leader. Orr led the Bruins to the Stanley Cup
    in 1970 and 1972, winning the Conn Smythe as playoff MVP on both
    occasions. Incredibly talented, Orr had the ability to dominate play on
    the ice. However, multiple knee injuries would limit his career. Despite
    his injuries, Orr led Canada to the 1976 Canada Cup championship and won
    the tournament MVP award. After signing with the Chicago Black Hawks in
    1976, Orr played only a handful of games. He retired in 1979 and was
    immediately inducted into the Hockey Hall of Fame.
    </p>
    </document>

This document contains references to a number of people, organizations, trophies, and times. Moving forward in time, it develops the hockey life of Bobby Orr by citing the major events he participated in, as well as the qualities and skills he was known for. One would be hard-pressed to find a more complete summary of Orr's hockey career in the same amount of space. The length is 193 words.

4.3 Parsing

The first stage of document understanding is a separation of the text into sentences (see Figure 4.1). The author implemented a simple rule-based sentence segmenter (see [6] for a general technique). Each sentence is further tokenized into words and punctuation. We divide sentences using the s tag, and enclose words and punctuation in the w tag, which has a lex attribute.
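The markup produced by this stage can be sketched as follows. This is not the author's segmenter: it is a minimal illustration that ends a sentence at any whitespace-delimited token closing with ".", "?", or "!", unless the token appears in a small abbreviation list. The names `split_sentences`, `tokenize`, and `mark_up`, the abbreviation list, and the tokenizing regular expression are all assumptions; only the s and w tags and the lex attribute come from the text.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical abbreviation list; a real segmenter would need many more rules
# (initials such as "J." would be split incorrectly by this sketch).
ABBREVIATIONS = {"mr.", "mrs.", "dr.", "st.", "jr."}


def split_sentences(paragraph):
    """Split a paragraph at tokens ending in sentence-final punctuation."""
    sentences, current = [], []
    for token in paragraph.split():
        current.append(token)
        if token[-1] in ".?!" and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences


def tokenize(sentence):
    """Separate words (allowing internal hyphens/apostrophes) from punctuation."""
    return re.findall(r"\w+(?:[-']\w+)*|[^\w\s]", sentence)


def mark_up(paragraph):
    """Build a <p> element holding one <s> per sentence and one <w> per token."""
    p = ET.Element("p")
    for sentence in split_sentences(paragraph):
        s = ET.SubElement(p, "s")
        for token in tokenize(sentence):
            ET.SubElement(s, "w", lex=token)
    return p


text = ("Robert Gordon Orr was born in Parry Sound, Ontario, Canada, "
        "on March 20, 1948. Bobby Orr began his junior career with the "
        "Oshawa Generals at the age of 14.")
paragraph = mark_up(text)
print(ET.tostring(paragraph, encoding="unicode"))
```

Building the tree with `xml.etree.ElementTree` rather than by string concatenation keeps the output well-formed even when a token contains characters such as quotation marks or ampersands.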
Let us segment the first few sentences of the Orr biography:

    <document>
    <p>
    <s>
      <w lex="Robert"/> <w lex="Gordon"/> <w lex="Orr"/> <w lex="was"/>
      <w lex="born"/> <w lex="in"/> <w lex="Parry"/> <w lex="Sound"/>
      <w lex=","/> <w lex="Ontario"/> <w lex=","/> <w lex="Canada"/>
      <w lex=","/> <w lex="on"/> <w lex="March"/> <w lex="20"/>
      <w lex=","/> <w lex="1948"/> <w lex="."/>
    </s>
    <s>
      <w lex="Bobby"/> <w lex="Orr"/> <w lex="began"/>
    </s>
    </p>
    </document>

Next, we parse the text (see Figure 4.1). This is a hierarchical sequence labeling task which also assigns the part-of-speech tag for each word. There are four main phrase groups: noun phrase (np), adjective phrase (adjp), verb phrase (vp), and preposition phrase (pp).

We use the Charniak parser [5], which is a maximum-entropy-inspired parser. A freely-available parser, it was trained on a Penn Treebank corpus of general news text. This is the only natural language processing component in our system not written by the author.

There were some serious issues with the parser, stemming from the fact that it is a probabilistic method that we could not train on our corpus. First, the parser did not usually produce the correct parse as the most likely parse. However, the parser is able to produce the top n parses. After some experimentation by the author, it was found that the correct parse was usually within the top twenty parses. From this point on, we actually operate on twenty parse versions for each sentence.

Second, the parser had notorious difficulties with event nominals. For example, in the sports of hockey and basketball, there is a statistic named assist. In the vast majority of cases, this word is the head of a noun phrase. However, the parser nearly always marked it as a verb, and produced some very odd parse contortions on the output. We couldn't train the parser, which should have fixed the problem given enough input data. What we do instead is a simple word substitution before parsing.
This "tricks" the parser toward the desired parse. One might consider it a pre-tagging of the part-of-speech for a small set of words. After parsing, we convert the symbol back to the correct lexeme. For example, the word assist is converted to bassist before parsing, which forces a noun phrase.

Let us parse the first Orr sentence. Note that this is the author's Charniak-like parse: the Charniak parser gets confused by the word "Parry", which is obviously part of a name for a location, but must have been marked as a verb in the training corpus for the parser.

    <document>
    <p>
    <sl>
    <s>
      <np>
        <w lex="Robert" pos="nnp"/>
        <w lex="Gordon" pos="nnp"/>
        <w lex="Orr" pos="nnp"/>
      </np>
      <vp>
        <w lex="was" pos="aux"/>
        <vp>
          <w lex="born" pos="vbd"/>
          <pp>
            <w lex="in" pos="in"/>
            <np>
              <np>
                <w lex="Parry" pos="nnp"/>
                <w lex="Sound" pos="nnp"/>
              </np>
              <w lex="," pos=","/>
              <np>
                <w lex="Ontario" pos="nnp"/>
              </np>
              <w lex="," pos=","/>
              <np>
                <w lex="Canada" pos="nnp"/>
              </np>
              <w lex="," pos=","/>
            </np>
          </pp>
          <pp>
            <w lex="on" pos="in"/>
            <np>
              <np>
                <w lex="March" pos="nnp"/>
              </np>
              <w lex="20" pos="cd"/>
              <w lex="," pos=","/>
              <np>
                <w lex="1948" pos="cd"/>
              </np>
            </np>
          </pp>
        </vp>
      </vp>
      <w lex="." pos="."/>
    </s>
    </sl>
    </p>
    </document>

We then perform some cleanup of the parse. We transform back any of the special word substitutions. We also compute some simple features using regular expressions, and then scan for known named locations.

The output of the Charniak parser is difficult to work with.
For example, compound noun phrases are generally surrounded with just one noun phrase marker:

    <np>
      <w lex="Doug" pos="nnp"/>
      <w lex="Harvey" pos="nnp"/>
      <w lex="," pos=","/>
      <w lex="Bobby" pos="nnp"/>
      <w lex="Orr" pos="nnp"/>
      <w lex="," pos=","/>
      <w lex="and" pos="cc"/>
      <w lex="Paul" pos="nnp"/>
      <w lex="Coffey" pos="nnp"/>
    </np>

However, we want to separate the grammatical devices (in this case, conjunction) from the individual phrases they combine (in this case, a number of naming phrases). Thus we "expand" such phrases. They are easy to spot as we have a knowledge of grammar.

    <np>
      <np>
        <w lex="Doug" pos="nnp"/>
        <w lex="Harvey" pos="nnp"/>
      </np>
      <w lex="," pos=","/>
      <np>
        <w lex="Bobby" pos="nnp"/>
        <w lex="Orr" pos="nnp"/>
      </np>
      <w lex="," pos=","/>
      <w lex="and" pos="cc"/>
      <np>
        <w lex="Paul" pos="nnp"/>
        <w lex="Coffey" pos="nnp"/>
      </np>
    </np>

We expand genitive phrases with pronouns. We also isolate number forms. These include cardinals, ordinals, fractions, and digit-dash combinations. We look for these items and enclose them in a noun phrase along with their value. For example, we expand

    <np>
      <w lex="his" pos="prp$"/>
      <w lex="first" pos="jj"/>
      <w lex="season" pos="nn"/>
    </np>

into

    <np>
      <np>
        <w lex="his" pos="prp$"/>
      </np>
      <np class="number" type="ord" value="1">
        <w lex="first" pos="jj"/>
      </np>
      <w lex="season" pos="nn"/>
    </np>

The last stage is the pre-marking of locations. This is our first use of feedback from later stages. At some point, we will determine the structure of team names and find the particular sports teams and their associated location names. These location names are fed back to this stage, and we pre-mark the longest matching subsequence within any noun phrase.
Thus the output

    <np>
      <w lex="Chicago" pos="nnp"/>
      <w lex="Black" pos="nnp"/>
      <w lex="Hawks" pos="nnps"/>
    </np>

becomes

    <np>
      <np class="location" type="city">
        <w lex="Chicago" pos="nnp"/>
      </np>
      <w lex="Black" pos="nnp"/>
      <w lex="Hawks" pos="nnps"/>
    </np>

4.4 Semantic Analysis

We now have a nested grouping of phrases for each sentence. Each phrase has a meaning, so we must determine what each phrase refers to. The general approach is to first classify syntactic phrases according to an ontology (later, once we have enough data for a specific sport, we will subdivide the abstract classes of this ontology). In terms of text representation, this classification corresponds to labeling each phrase with an element from the ontology.

Our corpus contains many documents from a related field. Many of the phrases exhibit the same patterns, albeit with different words corresponding to different but related referents (they are related through belonging to the same semantic class). Our goal is to mark the entire corpus with semantic labels through an automatic method.

To accomplish this, we first hand-annotate a small set of development documents. We then apply machine learning methods to induce the labels for phrases. These phrases might contain words which are not observed in the annotation data. For example, various qualities are ascribed to a person through a linking verb. One need not know all the words referring to particular qualities in order to identify new qualities. We can move inductively from known patterns to absorb new expressions.

In order to deal on the level of concepts, which are groups containing two or more related particulars¹, we are required to reify these abstractions with a concrete label. Before describing the process of labeling our noun groups with these labels, we will describe the steps that we took in developing our sport ontology.

During the development of this system, we created and refined a sport ontology.
We began by labeling the noun phrase groupings with a class attribute. We started initially with some rough classes: player, org, statistic, time, artifact, quality, other. We also had a special class, error, which was intended as a marker for an incorrectly parsed phrase.

As time went on, we collected a number of phrases under each class by labeling parses of sentences from our development set. It became clear that there were too few classes, and that a reorganization and refinement of our concepts was impending.

We extended the noun phrase ontology to include a type attribute. The type further specifies the class, and subdivides each class into a set of more precisely defined concepts. For example, the org class contained teams, leagues, schools, and companies. We split the class into these subtypes, as well as a special label called other which caught organizations not belonging to these subtypes.

We turned to the marking of verb phrases. Our main inspiration for the class level attribute was the event types listed in the TimeML work [53]. The ontology in its current organization, which we use for the remainder of our project, is listed in the appendix.

¹ This is not a complete definition of concept, but it underlines the fact that a concept is an abstract grouping of more than one particular.

The noun phrase class attribute is one of player, body, quality, time, event, statistic, org, location, sport, award, artifact, draft, and other. There are also linguistic labels such as quantifier, structure, and error. The verb phrase class attribute is one of the TimeML event classes or the special linguistic label structure.

The author also wrote a short definition and gave a few examples of particular phrases, within context, for each semantic label. This provided each term with an identity and thus allowed human annotators to decide on labels for phrases.

Ultimately, classification according to the ontology is the first part of the reference resolution process.
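As an illustration of the two-level labeling, the following sketch attaches class and type attributes to a parsed noun phrase and rejects labels outside the ontology. The class names are those listed above; everything else is an assumption made for the example, including the function name `label_phrase` and the exact spellings of the org subtypes (the text names teams, leagues, schools, and companies but does not give the label strings).

```python
import xml.etree.ElementTree as ET

# Noun phrase classes and linguistic labels as listed in the text.
NOUN_CLASSES = {
    "player", "body", "quality", "time", "event", "statistic", "org",
    "location", "sport", "award", "artifact", "draft", "other",
}
LINGUISTIC_LABELS = {"quantifier", "structure", "error"}

# Hypothetical spellings of the org subtypes; types of the other classes
# are not restricted in this sketch.
TYPES = {"org": {"team", "league", "school", "company", "other"}}


def label_phrase(phrase, cls, type_=None):
    """Set the class (and optional type) attribute on a phrase element."""
    if cls not in NOUN_CLASSES | LINGUISTIC_LABELS:
        raise ValueError(f"unknown class: {cls}")
    if type_ is not None and cls in TYPES and type_ not in TYPES[cls]:
        raise ValueError(f"unknown type for {cls}: {type_}")
    phrase.set("class", cls)
    if type_ is not None:
        phrase.set("type", type_)
    return phrase


phrase = ET.fromstring('<np><w lex="Chicago" pos="nnp"/></np>')
label_phrase(phrase, "location", "city")
print(ET.tostring(phrase, encoding="unicode"))
```

Validating against a fixed vocabulary at labeling time is one simple way to keep annotator typos from entering the training data.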
4.4.1 An Interface for Annotation

Our machine learning methods require training data. To acquire this data, we implemented an interface which allows human operators to mark parsed data with the class and type attributes for each phrase. Four collections, consisting of ten documents each, were randomly selected from the document set. We exported the top parses for each sentence of each document to these annotation packages.

An initial classifier was derived from annotated parses marked by the author. We marked each text using our markup system described below. This helps to mark the "easy" phrases in the annotation packages, allowing the human annotators to concentrate on new or difficult examples. For example, human marking of personal pronouns distracts the operator and contributes little in terms of overall effect on the classifier.

The interface (see Figure 4.5) contains a main document panel and a control panel. The document panel displays colour-coded text with the accompanying phrase label in a smaller font directly below the phrase. The control panel allows the user to switch between noun phrase and verb phrase markup modes. The control panel also contains a slider for controlling the phrase depth.

    [Figure 4.5: A screenshot of the annotation interface. The document panel shows a passage about Esposito with labels such as "player" and "description" printed beneath the marked phrases; the control panel offers noun phrase and verb phrase modes and a level selector (1, 2, 3).]

The label for each phrase is set by the user (see Figure 4.6). The annotator picks a phrase by right clicking on the text of the phrase. This pops up a menu containing the two-level ontology (class and type).
The interface was implemented in Java using the Swing Toolkit [60] and makes use of the Xerces XML processing engine [61]. It is platform-independent, and was used by annotators on a few operating systems.

We timed the annotators during their task. Each document accumulates the amount of time during which it is open for annotation. This measure is inaccurate if the annotator takes a break and leaves the interface open. For each document from each annotated document set, we tabulate the time (in seconds), the number of phrases marked, and the time per phrase.

The documents for annotation were drawn from the entire document set, so a few documents from the test set were annotated by our participants (we created the annotation packages before establishing our test set). It is important to note that we did not make use of any annotation from the test set documents during the development of our system (the test set is off-limits). The only information that we gained from these documents was the human annotator timing measurements.

[Figure 4.6: A screenshot of the annotation interface with the ontology menu popup.]

We are interested here in the markup rates for the users of our interface. We see that a package of ten documents requires a few hours of annotation time. Our system trains on roughly twenty to thirty documents per sport, so considerable effort is involved in creating the annotated development corpus.
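The seconds-per-phrase figure (s.p.p.) reported in the timing tables is simply the accumulated time divided by the number of phrases marked. The short sketch below recomputes it for three rows and the totals row of Table 4.1; the function name is ours and is not part of the annotation interface.

```python
def seconds_per_phrase(seconds, phrases):
    """Return annotation time per phrase, rounded to two decimals."""
    if phrases <= 0:
        raise ValueError("phrases must be positive")
    return round(seconds / phrases, 2)

# (biography, time in seconds, phrases marked), copied from Table 4.1
rows = [
    ("David (Sweeney) Schriner", 620, 119),
    ("Lou Groza", 696, 100),
    ("Jackie Smith", 557, 144),
]
rates = {name: seconds_per_phrase(t, n) for name, t, n in rows}

# Totals row of Table 4.1: 7933 seconds over 989 phrases
total_rate = seconds_per_phrase(7933, 989)
```

Because the timer keeps running while a document is open, a single long break inflates the rate for that document, so the totals row is a more stable indicator than any individual row.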
The number of annotated documents, together with the number of noun phrases and verb phrases marked for each sport, is given in Table 4.5.

biography                     time (s)   phrases   s.p.p. (seconds per phrase)
David (Sweeney) Schriner           620       119     5.21
Dan Fouts                         1714       104    16.48
Lou Groza                          696       100     6.96
Viacheslav Fetisov                 740       112     6.61
Chuck Rayner                       483        52     9.29
Wayne Embry                        544        89     6.11
Brud Holland                       856        53    16.15
George Blanda                     1204       131     9.19
J. D. (Jack) Ruttan                519        85     6.11
Jackie Smith                       557       144     3.87
totals                            7933       989     8.02

Table 4.1: Noun phrase rates for the first document set annotated by Trevor.

biography                     time (s)   phrases   s.p.p. (seconds per phrase)
John McKay                         451       105     4.30
Elvin E. Hayes                     967       190     5.09
Tom Barrasso                      1023       151     6.77
Frederick Joseph (Bun) Cook       1406       242     5.81
Wayne Millner                      587       127     4.62
Scotty Pippen                      771       159     4.85
Bobby Bell                         483       119     4.06
Lavell Edwards                     390        94     4.15
Gordon Roberts                     462       116     3.98
Dave Christian                     479       118     4.06
totals                            7019      1421     4.94

Table 4.2: Noun phrase rates for the second document set annotated by Trevor.

Interface Enhancements

This interface is an obvious candidate for enhancement. A mixed-initiative setup, which would embed the parser and classifier in the system in order to react to user corrections, would be a great advantage. In many cases, when the user changes the label of a phrase, subsequent phrases must also be relabeled (most likely other forms of referring expressions involving many of the same words). Such an interface would update its model upon correction, fixing any subsequent mistakes. The author did toy with a simple mixed-initiative system under a number of varied behaviours and found that the helper sped up the annotation process.

Other improvements that could be made to the interface would mainly deal with directing the annotator's attention toward problematic examples. Examples for which the system is not confident in its label assignment require user attention.
Also, if we use the classifier to rank the most likely label choices, one could arrange the ontology menu on a per-phrase basis so that these alternatives are close at hand.

biography                     time (s)   phrases   s.p.p. (seconds per phrase)
John McKay                        1885       102    18.48
Elvin E. Hayes                    1336       178     7.51
Tom Barrasso                       627       142     4.42
Frederick Joseph (Bun) Cook       1081       223     4.85
Wayne Millner                      606       112     5.41
Scotty Pippen                      898       139     6.46
Bobby Bell                         522       113     4.62
Lavell Edwards                     455        73     6.23
Gordon Roberts                     416        99     4.20
Dave Christian                     394       100     3.94
totals                            8220      1281     6.42

Table 4.3: Noun phrase markup rates for the document set annotated by Glen.

biography                     time (s)   phrases   s.p.p. (seconds per phrase)
Kareem Abdul-Jabbar               2372       200    11.86
Rick Barry                        2662       236    11.28
Wilt Chamberlain                  3327       345     9.64
Bob Cousy                         2400       311     7.72
Julius Erving                     2648       294     9.01
Walt Frazier                      1325       295     4.49
John Havlicek                     1790       272     6.58
Michael Jordan                    4456       722     6.17
Hank Luisetti                     1599       269     5.94
Oscar Robertson                    963       160     6.02
totals                           23542      3104     7.58

Table 4.4: Markup rates for the four document basketball set annotated by Jeanette. Both noun phrases and verb phrases were marked in this document set. Jeanette reported that she marked the documents in the order shown here, and we see that she quickly improves her markup rate.

sport         docs     np     vp
basketball      16   2701    842
football        21   2839    748
hockey          36   5726   1678

Table 4.5: The number of author-annotated training set documents for each sport in the corpus.

4.4.2 A Compositional Markup System

We assume that the label for each phrase may be decided directly from the components it contains. Under this assumption, we implement a recursive markup system. Given a phrase node, we first recurse on all subphrases, setting the semantic label for those subphrases. We then determine the label for the current node to be the label corresponding to the maximum value of the product of the probability for the label from each contained symbol.

The symbols used for noun phrase markup are words, noun phrases, and preposition phrases with noun phrase complement. In the case of words, we omit words tagged with certain parts-of-speech (e.g. determiners, certain types of punctuation, etc.), and we transform the lexeme to lower case except when tagged as a proper noun. The noun phrase symbols are the class and type pair, as well as a binary feature indicating whether the phrase has a possessive marker. The preposition phrase symbols are the preposition plus the noun phrase symbol of the preposition complement.

The symbols used for verb phrases are words, noun phrases, verb phrases, and preposition phrases. Again, we reject words with certain part-of-speech types. The verb phrase symbols are made up of the class and type attributes.

We invoke the naive simplifying assumption, in which features are assumed independent. Further, in English noun phrases, the most significant word of the basic phrase is the last, so we boost this "head" symbol by raising its contribution to the third power (this ad hoc value was found to increase the accuracy of the classifier in the holdout assessment described below). We will call this method "Naive Product." We assign the label l by maximizing over all semantic labels using features n_i computed from the node N:

$$ l = \arg\max_{l} \, p(l \mid N) = \arg\max_{l} \prod_{n_i \in N} p(l \mid n_i) \qquad (4.1) $$

We compute the probabilities from our human-annotated data. We smooth each feature using simple add-one smoothing.

Let us consider the markup of the phrase "the Boston Bruins of the NHL." The parsed, pre-marked noun phrase is:

<np>
  <np>
    <w lex="the" pos="dt"/>
    <np class="location" type="city">
      <w lex="Boston" pos="nnp"/>
    </np>
    <w lex="Bruins" pos="nnps"/>
  </np>
  <pp>
    <w lex="of" pos="in"/>
    <np>
      <w lex="the" pos="dt"/>
      <w lex="NHL" pos="nnp"/>
    </np>
  </pp>
</np>

1. The algorithm starts on the top noun phrase. It recurses on the noun phrase "the Boston Bruins", then the preposition phrase "of the NHL."

2. Within the Bruins phrase, the probability for each class and type pair (see Footnote 2) is computed as the product of the probability of that label given a noun subphrase of type location/city, and the probability of that label given the plural proper noun Bruins. We observe many instances in the training data of an org/team containing a location/city subphrase. We also find the word Bruins exclusively within an org/team phrase. We set the label for the phrase to org/team.

3. Within the preposition phrase, we recurse on the complement "the NHL." The abbreviation NHL is observed overwhelmingly in phrases of type org/league, so this label is assigned.

4. Finally, we are ready to tag the top noun phrase. There are two symbols: the noun phrase of type org/team, and the preposition phrase whose symbol is the preposition "of" plus org/league. These two symbols are both typically found in the org/team class, so this label is assigned.

Footnote 2: For economy of discussion, we will write these pairs in the form class/type.

We have decorated the tree as follows:

<np class="org" type="team">
  <np class="org" type="team">
    <w lex="the" pos="dt"/>
    <np class="location" type="city">
      <w lex="Boston" pos="nnp"/>
    </np>
    <w lex="Bruins" pos="nnps"/>
  </np>
  <pp>
    <w lex="of" pos="in"/>
    <np class="org" type="league">
      <w lex="the" pos="dt"/>
      <w lex="NHL" pos="nnp"/>
    </np>
  </pp>
</np>

This was an easy example, and it suggests that our technique will perform well on named entities (we will confirm this below). Even if, for example, we could not so surely label the first noun phrase (replace "Boston Bruins" with "Bogusville Bogarts"), we would still label the top phrase as org/team, since there is an overwhelming number of training examples of teams with the league specified in an attached preposition phrase.

Once we have made a markup pass over the training set, we help bootstrap the process by feeding back terms which occur frequently over the corpus within a certain label.
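The recursive Naive Product markup described above can be sketched in a few lines. This is an illustrative reconstruction and not the thesis implementation: the tree encoding (nested dictionaries), the set of skipped part-of-speech tags, the toy training counts, and all function names are assumptions. The possessive-marker feature is omitted, and the product of Equation 4.1 is evaluated as a sum of logarithms, which selects the same label.

```python
import math
from collections import defaultdict

SKIP_POS = {"dt", ",", "."}        # assumed set of ignored part-of-speech tags
PROPER_POS = {"nnp", "nnps"}       # proper nouns keep their capitalization

class NaiveProduct:
    def __init__(self, labels, head_power=3):
        self.labels = list(labels)
        self.head_power = head_power   # the ad hoc exponent for the head symbol
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, symbol, label, times=1):
        self.counts[symbol][label] += times

    def prob(self, label, symbol):
        # add-one smoothed estimate of p(label | symbol)
        seen = self.counts.get(symbol, {})
        return (seen.get(label, 0) + 1) / (sum(seen.values()) + len(self.labels))

    def classify(self, symbols, head=None):
        def log_score(label):
            return sum((self.head_power if s == head else 1)
                       * math.log(self.prob(label, s)) for s in symbols)
        return max(self.labels, key=log_score)

def mark(node, model):
    """Label subphrases first, then the node itself; returns the node's label."""
    if "label" in node:                # pre-marked phrases are kept as they are
        return node["label"]
    symbols, head = [], None
    for child in node["children"]:
        if child["kind"] == "w":
            if child["pos"] in SKIP_POS:
                continue
            lex = child["lex"] if child["pos"] in PROPER_POS else child["lex"].lower()
            head = ("w", lex)          # the last kept word is treated as the head
            symbols.append(head)
        elif child["kind"] == "np":
            symbols.append(("np", mark(child, model)))
        elif child["kind"] == "pp":
            prep, complement = child["children"]
            symbols.append(("pp", prep["lex"].lower(), mark(complement, model)))
    node["label"] = model.classify(symbols, head)
    return node["label"]

def w(lex, pos):
    return {"kind": "w", "lex": lex, "pos": pos}

def make_np(*children, label=None):
    node = {"kind": "np", "children": list(children)}
    if label:
        node["label"] = label
    return node

# Toy counts standing in for the human-annotated data (invented for illustration).
model = NaiveProduct(["org/team", "org/league", "location/city"])
for symbol, label in [(("w", "Bruins"), "org/team"),
                      (("np", "location/city"), "org/team"),
                      (("w", "NHL"), "org/league"),
                      (("np", "org/team"), "org/team"),
                      (("pp", "of", "org/league"), "org/team")]:
    model.observe(symbol, label, times=3)

# "the Boston Bruins of the NHL", with "Boston" pre-marked as location/city
tree = make_np(
    make_np(w("the", "dt"), make_np(w("Boston", "nnp"), label="location/city"),
            w("Bruins", "nnps")),
    {"kind": "pp", "children": [w("of", "in"),
                                make_np(w("the", "dt"), w("NHL", "nnp"))]})
top_label = mark(tree, model)
```

With these toy counts the walk-through above is reproduced: the inner phrase and the top phrase both receive org/team, and the complement of the preposition receives org/league.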
Items like proper nouns (player names, team names, locations, etc.), as well as many head nouns (generic terms, roles, body parts, etc.), are gained from the corpus. This is a simple example of utilizing the large corpus for domain learning.

We also tried a system operating under the same assumptions using Support Vector Machines (SVM). We used the freely available SVMlight package [52].

Critique

Our basic assumption, that the semantic label of a phrase is the composite of its components, admits some obvious errors. For example, consider the phrase "the first round". If one were asked to say whether it was the first round of a draft or the first round of the playoffs, one could fairly object that not enough information was given to tell either way. One would probably reply: use it in a sentence, and I shall say the sense. This information must come from surrounding context.

This assumption is especially dangerous for plural pronouns. Considered out of context, the referent of the word "they" could belong to any number of classes. On the other hand, the system will consistently mark "he" as some particular person (of course, there are referents from other classes which may be referred to with this pronoun).

We have explained that our assumption is incorrect. However, it holds in many cases that the meaning of a phrase is the composite of its constituents.

We must also critique our counting method. Unfortunately, we do not separate the sports, and thus create a single counts file. This has some drawbacks, such as the confounding of statistic names or position names. For example, an end in football is a position (in fact, there are many variations, such as tight end, defensive end, etc.). The term occurs many times in the annotation data for football biographies. In basketball and hockey, however, end is not a position, and is used for event aspect to signal termination (as in "came to an end").
Because of our counting method, phrases from hockey and basketball containing the word "end" are generally marked with the wrong class.

Performance

We tested our system after annotating a number of documents. Intersecting the marked documents with the training documents, we obtained a training set for the annotation task. We test by singling out one of these documents (the holdout document), deriving a model from the remainder, and marking the holdout document. We then compare the human-marked holdout document with the automatically annotated document. We performed this for noun phrase markup over each document in the annotated testing set of the hockey corpus. We found that the Naive Product system achieves 72.54% accuracy, while the SVM system achieves 67.26% accuracy.

It is important to note that this crucial segment of our overall understanding pipeline is only working in the neighborhood of 70% accuracy. Every subsequent step depends upon this shaky foundation. However, certain important items, especially entities referenced with a proper name, enjoy a much higher accuracy. For example, player references are marked with 85.93% recall and 94.44% precision. Teams are marked with 92.72% recall and 84.34% precision. These numbers become even better once we have fed back the terms which we are confident belong to a certain label.

4.4.3 Domain Learning

After marking the sport biographies according to a general sport ontology, we pause from the markup of documents in order to form a more precise knowledge of the particulars of each sport (see Figure 4.2). This stage pertains to the primary elements of the ontology (the entities of the domain and their attributes) rather than events and relationships. Before we resolve references, whether to particulars or to abstract concepts, we must separate out all these entities and give them identifiers: we must determine the units of each class.
We must examine the ways in which non-pronominal referring expressions are written. This task is largely concerned with the structure of names.

We marked the entire corpus with semantic labels in the previous stage of semantic analysis. We now pass over the player clusters, gaining counts for the particular structures of each semantic class. Recall that we are operating on twenty parses for each sentence, so these structures might have multiple interpretations (different part-of-speech tags, different subphrase sequencings) within each parse. It is also possible to have completely different phrase boundaries between the different parse versions of the same sentence.

Let us focus here on runs of proper nouns. We are interested in determining the proper names for each entity. We do this for players, nicknames, locations, teams, leagues, schools, team groupings, games, and trophies. Basically, we take a lexical approach to determining which naming expressions corefer.

Entities are referred to by name in a plethora of referring expressions. For example, a particular player might be referenced by his full name, including his titles (Robert Gordon Orr, O.C.). Alternatively, he might be referenced by his full common name (Bobby Orr), his last name (Orr), or his first name (Bobby). There are other ways, too, such as nicknames. Spelling irregularities also manifest themselves here. One author might write Bobbie (which is phonologically equivalent and a common alternate spelling), or might err and write, say, Bobbu.

Determining the ways in which a player (more generally, an entity or a concept) is referenced is important for understanding and for generation. Keeping count of the standard ways of referring to a person aids our reference instantiator: generating an unambiguous reference to a particular entity is the paramount concern of that subsystem.

Understanding a naming reference can be seen as a further expansion of the phrase.
We break the name into components, labeling each component with an identifier. For example, the phrase Bobby Orr is broken into a first name and a last name. The process on names can be extended for things like titles, interceding nicknames, letters, honors, etc.

In general, this is common to all naming expressions. A name consists of a specific term (Bobby) and a general term (Orr). This holds for locations (town, province, country), teams (name, location), leagues (level, sport), schools (location, type), etc. Our task is to determine the specific and generic terms of a name. We must also cluster the various referring expressions and attach them to an entity identifier.

We encoded some elementary information, such as common short forms of a name (e.g. Robert and Bobby). We gave some generic terms (e.g. University, High School, etc., in the case of schools) to help identify the individual units of each class. It was not our goal to implement a general system which infers the structures of names or naming expressions from a large corpus, although automatic techniques are conceivable.

Further, we clustered abstract items, such as roles (coach, captain, commentator), skills (speed, strength), and body parts (knee, nose). These are not named using proper nouns. However, they exhibit similar naming phenomena. Basically, there are generic terms, which we assume to appear as the head noun, and there are specific modifiers, which we assume to appear before the noun (we are speaking of English). Consider the terms head coach, assistant coach, and goaltending coach. Some concepts are specific enough to be contained in a single word (such as defenseman).

We are also responsible for determining the synonymous terms. For all the classes, we find synonymous terms by employing a few simple techniques. The first is a simple edit-distance measure between two strings, known as Levenshtein distance. We also use prefix matching and term cooccurrence.
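As a rough illustration of the first two techniques, the sketch below groups word forms whose Levenshtein distance is small or which share a sufficiently long prefix. The thresholds (`max_edits`, `min_prefix`), the greedy grouping strategy, and the function names are assumptions made for this example; the thesis does not state its tolerances, and term cooccurrence is not modelled here.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def shared_prefix(a, b):
    """Length of the common prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def similar(a, b, max_edits=2, min_prefix=5):
    a, b = a.lower(), b.lower()
    return levenshtein(a, b) <= max_edits or shared_prefix(a, b) >= min_prefix

def group_terms(terms, **thresholds):
    """Greedily place each term in the first group containing a similar member."""
    groups = []
    for term in terms:
        for group in groups:
            if any(similar(term, member, **thresholds) for member in group):
                group.append(term)
                break
        else:
            groups.append([term])
    return groups

groups = group_terms(["defenseman", "defenceman", "defender", "goaltender"])
```

A greedy single pass like this depends on the order of the input and can chain dissimilar forms together through intermediate members, so the thresholds would need tuning against real corpus counts.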
We decide on the primary forms for each term by selecting the forms which occur most frequently. We will use these standard terms, and only these terms, on the generation side. We will call this "term normalization."

Let us consider the manifold ways in which an author may refer to the defenseman position. The terms the system found were defenseman, defensemen, defender, defenders, defenceman, defencemen. These forms are obviously close in terms of edit distance. One alternate form that the author observed, but which was not captured by the synonym finder, is blueliner. The term reflects the facts that the defensemen line up on their own blue line at the drop of the puck at center ice, that the defensive zone is demarcated by the blue line, and that the defenseman holds the blue line in the opposition zone.

The individual items discovered by this domain learning process may be thought of as the full extension of our ontology down to the individual concretes. We have reached bottom.

We have an opportunity here to identify errors. For example, consider the ways in which the hockey team St. Catharines Tee Pees is referenced. First, notice that the team name is Tee Pees, and that the team is from St. Catharines (in Ontario, Canada). There are a few ways in which a full name reference may differ. First, one could misspell St. Catharines. Indeed, it is usually mistaken as St. Catherines. Next, the team name (perhaps due to its rhyming name) may be condensed into a single word, perhaps with a change in capitalization. In fact, the following forms are observed: Tee Pees, TeePees, Teepees, and even Tepees. Our system captures all these variations (alternates and spelling errors alike) and recognizes them as referring to one team entity. Again, the spelling is normalized in the generation phase, when we reference only with the most frequent form.

As an example of the particulars of some abstract classes found by our system, we tabulate a few "top ten" lists below.
name       species        count
NBA        championship    3442
NCAA       championship    2110
World      championship     564
NIT        championship     477
AAU        championship     457
Wade       trophy           396
Big        title            353
NBL        championship     343
Olympic    medal            342
European   championship     252

Table 4.6: The ten most frequent trophies from the basketball corpus.

variations                     forms                          count
head, assistant                coach, coaches                 16486
                               quarterback, quarterbacks       7256
defensive, tight               ends, end                       5865
defensive, offensive           tackle                          4517
running, defensive             back, backs, backup             4103
middle                         guard                           3251
middle, defensive, offensive   lineman, linemen, linebacker    3111
                               assistant                       2747
wide                           receiver                        2697
                               halfback                        2353

Table 4.7: The ten most frequent positions, along with their variations, from the football corpus.

name                                 abbreviation   count
National Hockey League               NHL             2684
World Hockey Association             WHA              860
National Hockey Association          NHA              729
Pacific Coast Hockey Association     PCHA             650
American Hockey League               AHL              608
Ontario Hockey Association           OHA              480
Western Canada Hockey League         WCHL             394
Western Hockey League                WHL              374
Quebec Major Junior Hockey League    QMJHL            372
International Hockey League          IHL              320

Table 4.8: The ten most frequent leagues, along with their abbreviated forms, from the hockey corpus.

4.4.4 Reference Resolution

Up to now, the unit of analysis has been subsentential. A discourse, however, is a collection of related sentences which is only understood as a relation amongst the individual points made in the component sentences. We must now perform a "reading" of the text. We must hold context, which is the sum of all the earlier points conditioning the sentence we are about to understand. We approximate this by keeping track of the last item resolved for each semantic class.

We now turn to some related issues. We describe the resolution of particulars of the semantic classes. We then investigate the identification of the temporal position and range of a statement. We will term this "temporal reasoning."
Resolution Method

We resolve most phrases simply by matching them exactly or matching within a small edit-distance tolerance. In resolving references to named entities, we utilize the knowledge of naming phrases gained in the domain learning stage. In resolving references to concepts, we use exact matching against the synonymous word forms of each subtype of the semantic class.

We resolve references to players, organizations, awards, and times using a focus approach. After every sentence we update access lists which contain information about the most recently accessed referents. We also track the player focus. We set this to the last player resolved as a verb phrase subject or object (a player in the subject position is preferred).

Resolution using some edit-distance tolerance can overcome some spelling mistakes. The most frequent mistake we observe is misspelled names, which is handled in tandem by this stage and the domain learning stage.

The resolution of times is especially important. We detail our method for dealing with explicit and implicit times in the next section.

Reasoning about Temporal Location

In a biography, which is necessarily a document containing multiple episodes arranged according to the development of a person's life, the temporal location and duration for each episode must be treated. For a biographical summarizer, which is primarily concerned with identifying these episodes, the temporal location and duration of each episode must be decided.

Temporal information is written in the text in a number of ways. These items are either explicit, as in March 20, 1948, or are implicit, and therefore require some method to infer the time period they refer to.

Our underlying assumption is that the text tends to develop forward in time. This does not necessarily mean that, in the course of reading a document, all statements read correspond to events that happened prior to the events left to be read.
For example, authors may state the major accomplishments of a person in the first paragraph, then start in on an account of a life by citing his birth, the conditions in which a person was raised, etc.

Abstractly, we wish to resolve each noun phrase marked with the time class to some point or region in time. In the case of a time point, we set the value attribute. In the case of a time range, we set the duration attribute.

Our simple method first identifies explicit times. These are easily found using regular expressions.

Computing Explicit Times

Let us discuss the structures and methods for our time classes. Our methods depend upon the parse structures produced by the parser. Again, we maintain that these structures are compositional, and we will move from the basic elements up to the larger structures.

Years are the easiest types to spot. The documents we are dealing with concern organized sports, which have only been played on a large scale in the 19th, 20th, and 21st centuries. We are looking for noun phrases marked as a cardinal number with a value attribute between 1800 and 2099.

Seasons and ranges of years look roughly the same. They contain a four-digit year, a dash (or slash), and another year. The second year is usually abbreviated. However, sometimes seasons take the same form as years. We must also disambiguate between the two.

Months too are single-word noun phrases, but they contain a named division of the calendar. Being common, they may also be abbreviated. We map the names of the months to their index in the year.

Now we get into more complex phrases. Our first example is a day, which contains a month, a number, and a year. We have recursively determined the values for the month and the year, and so we simply find the day (of the month) and form the value.
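The explicit time types described above can be approximated with a few regular expressions. The sketch below is only an illustration: it works on raw strings rather than on the parse structures the thesis uses, and the function name, the returned dictionary layout, and the expansion rule for abbreviated season years are assumptions. The day value follows the day/month/year form used in the document's markup (for example 20/3/1948).

```python
import re

MONTH_NAMES = ["january", "february", "march", "april", "may", "june", "july",
               "august", "september", "october", "november", "december"]
MONTHS = {name: i for i, name in enumerate(MONTH_NAMES, start=1)}
MONTHS.update({name[:3]: i for name, i in list(MONTHS.items())})  # abbreviations

YEAR = re.compile(r"^(\d{4})$")
SEASON = re.compile(r"^(\d{4})[-/](\d{4}|\d{2})$")
DAY = re.compile(r"^([A-Za-z]+)\.?\s+(\d{1,2}),\s*(\d{4})$")

def in_range(year):
    # organized sport on a large scale: 19th to 21st centuries
    return 1800 <= year <= 2099

def parse_time(text):
    """Return a dict describing an explicit time, or None if none is recognized."""
    text = text.strip()
    m = YEAR.match(text)
    if m and in_range(int(m.group(1))):
        return {"type": "year", "value": m.group(1)}
    m = SEASON.match(text)
    if m and in_range(int(m.group(1))):
        start, second = int(m.group(1)), m.group(2)
        end = int(second)
        if len(second) == 2:                 # abbreviated second year, e.g. 1967-68
            end = (start // 100) * 100 + end
            if end < start:                  # e.g. 1999-00 crosses a century
                end += 100
        return {"type": "season", "start": start, "end": end}
    m = DAY.match(text)
    if m and m.group(1).lower() in MONTHS and in_range(int(m.group(3))):
        month = MONTHS[m.group(1).lower()]
        return {"type": "day",
                "value": f"{int(m.group(2))}/{month}/{m.group(3)}"}
    key = text.rstrip(".").lower()
    if key in MONTHS:
        return {"type": "month", "value": str(MONTHS[key])}
    return None
```

For instance, `parse_time("March 20, 1948")` yields a day with value `20/3/1948`, while `parse_time("1700")` is rejected because it falls outside the assumed range. A bare four-digit number remains ambiguous between a year and a season, which this sketch does not attempt to resolve.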
An example of a day is:

<np class="time" type="day" value="20/3/1948">
  <np class="time" type="month" value="3">
    <w lex="March" pos="nnp"/>
  </np>
  <np class="number" type="d" value="20">
    <w lex="20" pos="cd"/>
  </np>
  <w lex="," pos=","/>
  <np class="time" type="year" value="1948">
    <w lex="1948" pos="cd"/>
  </np>
</np>

Durations are approached in the same way: a time phrase with a cardinal number subphrase receives a duration attribute with the value inherited from the number subphrase:

<np class="time" type="game" duration="6">
  <np class="number" type="d" value="6">
    <w lex="six" pos="cd"/>
  </np>
  <w lex="games" pos="nns"/>
</np>

count   phrase
2081    the season
2026    that year
1866    the following season
1542    the following year
1524    the year
1340    the next year
1256    the next season
1194    a year
1112    that season
 836    his first season
 732    his first year
 693    the regular season

Table 4.9: The most frequent underspecified expressions referring to some season.

Age and the Birth Event

The most general and relevant information about a person is his time of birth. Knowledge of this fact enables us to compute his age. Let us illustrate how potent this ability is by examining two sentences from our Bobby Orr biography:

<document>
  <p>
    Robert Gordon Orr was born in Parry Sound, Ontario, Canada, on March 20, 1948. At 18 he signed with the Boston Bruins of the NHL.
  </p>
</document>

We have already computed the value of Orr's birthday. But when (roughly, in what range of dates) did Orr sign with the Bruins? The information is not explicit, but can be inferred in this standalone text. The computation of the time position is our first example of inferential computation of the temporal location of a statement; in the case where another time is mentioned in the same statement, we arrive at our first check of validity and consistency of events.

This illustrates that we must first understand the birth event (the first sentence) prior to attaching a time to the second sentence.
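The inference asked for above is a small piece of date arithmetic: a person is a given age from that birthday until the day before the next one. The sketch below computes this range for the Orr example; the function names and the choice to return an inclusive pair of dates are assumptions for illustration, not the thesis's representation.

```python
from datetime import date, timedelta

def anniversary(birth, years):
    """Return the birthday falling `years` years after `birth`."""
    try:
        return birth.replace(year=birth.year + years)
    except ValueError:
        # born on 29 February and the target year is not a leap year
        return date(birth.year + years, 3, 1)

def age_interval(birth, age):
    """Inclusive range of dates on which a person born on `birth` is `age`."""
    start = anniversary(birth, age)
    end = anniversary(birth, age + 1) - timedelta(days=1)
    return start, end

# "Robert Gordon Orr was born ... on March 20, 1948. At 18 he signed ..."
start, end = age_interval(date(1948, 3, 20), 18)
```

Here `start` is 20 March 1966 and `end` is 19 March 1967, so the signing event is located somewhere in that twelve-month window. If the same statement also carried an explicit time, it could be checked against this interval for consistency.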
Unfortunately, we apply the resolution stage to the entire document and only later perform the frame reduction stage, which would recognize the event and permit the inference (see Figure 4.2). Equivalently, we need to extract the proposition concerning the time of Orr's birth before fully resolving the relative time of the second statement; the problem is that we perform the document markup on the whole document in discrete stages, whereas a human resolves and recovers propositional content as he reads in one pass.

Relative Times

In addition to domain-independent, general human concerns such as statements relative to birth, we are interested in domain-dependent, sport-specific inferences. Table 4.9 and Table 4.10 summarize the non-explicit season type phrases from the hockey corpus, as well as some other notable entries. We see that the ordinal adjectives first, following, and next are very important. Luckily, these simply indicate the successor year to the previously mentioned year; they are easy to compute, given proper resolution of the year before. (One should also note that season and year are in most cases synonyms for a sport season. It seems that season is slightly preferred over year.)

count   phrase
 459    that same year
 383    his second season
 240    his rookie year
 220    his final season
 200    his second year
 182    his rookie season
 180    his final year
 138    the previous season

Table 4.10: Some other notable underspecified expressions referring to some season.

Some events are milestones and epochal. A major change of state serves as a fixed point of time, a point of reference for measuring time. In some sense, a player is "born" into a league as a rookie, just as an athlete achieves professional status. Special adjectives are common to certain milestone seasons in a player's career. The first season uses first, rookie, debut, etc. The second season is sometimes termed sophomore. The last season is called last or final.
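The successor-year rule for relative expressions can be sketched as a single pass that carries the most recently resolved year forward. This is a minimal illustration under stated assumptions: only the words following and next are treated as successor signals, other expressions (including the milestone adjectives) are left unresolved, and the function names and input format are invented for the example.

```python
SUCCESSOR_WORDS = {"following", "next"}

def resolve_relative_year(phrase, previous_year):
    """Return previous_year + 1 for successor expressions, else None."""
    if previous_year is None:
        return None
    if any(word in SUCCESSOR_WORDS for word in phrase.lower().split()):
        return previous_year + 1
    return None

def resolve_sequence(mentions):
    """Resolve a list of time mentions (explicit int years or phrases) in order."""
    last, resolved = None, []
    for mention in mentions:
        if isinstance(mention, int):       # explicit year: becomes the context
            year = mention
        else:
            year = resolve_relative_year(mention, last)
        if year is not None:
            last = year
        resolved.append(year)
    return resolved

years = resolve_sequence([1966, "the following season", "the next year",
                          "his final season"])
```

The example chains two successor expressions from the explicit year 1966 and leaves "his final season" unresolved, since resolving it requires knowledge of an event rather than of the preceding time.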
We do not model these milestone expressions in the current instantiation of our approach; again, they are resolved when one comprehends events like the joining of a league, attaining a certain status, etc.

Unmodelled Temporal Signals

We have left some necessary temporal items untreated. For example, we miss "markers" into time regions. These usually appear as compound phrases which further specify the location in time. For example, consider what is meant by "the start of the 1967-68 season." This phrase refers to the earliest point of whatever is meant by "the 1967-68 season."

Another major shortcoming is that we have ignored preposition phrases. These phrases signal important temporal information, including the precedence of events.

Finally, time is also communicated by other classes, especially descriptions of players. For example, consider rookie, freshman, senior, veteran, etc. These depend on the resolution of other classes, such as organizations and events. For example, a freshman is a first-year student in a college. This resolution is beyond our current method.

4.4.5 Sentence Simplification

Syntactic simplification is the third phase of semantic analysis (see Figure 4.2). Our goal is now to simplify each sentence into independent clauses. It is on these individual units that we will recover the corresponding frame. Thus, we wish to reduce a sentence to a number of propositions.

A sentence may be a compound of many independent clauses, verb phrases, or subordinate clauses. This loading of information into the sentence is governed by the construction rules of grammar. Further, the choice of content to fuse into a sentence is governed by conceptual knowledge: only related content should be united into a single statement.

In its simple form, a sentence consists of an independent clause; a sentence relates a subject and a predicate through a verb phrase. However, many ideas are closely related, and events can share a context.
The process of combining content to eliminate redundant, simple statements while preserving their individual meaning is fraught with subtle difficulties. To a human reading a text, aware of the liability to err, the correction of grammatical errors is largely automatic, even to the point where one is not consciously aware of the error. A human is conscious of the intended, obvious meaning; perceiving this, he does not pause for a formal analysis of syntax. The reader can reconstruct the meaning of a grammatical mistake, just as he can determine the correct word from a spelling mistake. We observe some repeated types of grammatical mistakes in the text. This simplifi-cation stage is where they become explicit. For example, the subject of hanging material at the beginning of sentences is the subject of the main clause. One mistake that is often made is the omission of the (hanging) subject from the main clause. Another type is a phrase in which the subject of the hanging clause participates as a genitive in the subject of the main clause. An example of the latter type of error is "Born in Parry Sound, Orr's abilities were well-known in the town." Our simplifier rewrites the hanging material as "Orr's abilities 76 were born in Parry Sound." The sentence simplification stage must take place after the semantic analysis. It is not a purely grammatical technique. For example, we make use of semantic labels in order to disambiguate true parenthetical matter from badly punctuated text, badly parsed sentences, etc. We also use semantic information during rewriting of subordinated material in order to generate the correct preposition (e.g. time/age requires an at or by, while time/year requires an in). We do not attempt to fix errors like the grammatical mistakes associated with hang-ing phrases. 
However, in cases where the true subject is present (as in the genitive phrase variation on the hanging material error), it is an artifact of the graph search method of the frame reduction stage, described in the next section, which allows us to recover the intended meaning.

Let us now commence with a description of the grammatical constructions we target. We will cite the points at which conceptual knowledge is required to form simpler sentences.

Compound Sentences

The first, and probably easiest, sentence simplification is to divide compound sentences, which are two or more independent clauses joined together by a conjoiner. Two examples are the ", and" construction and the ";" punctuation symbol. This does not require any markup ability, only the correct parse.

Apposition

Another construction which is abundant and relatively easy to spot is apposition. The form is a definite noun phrase which contains subchildren, one of which is a reference (usually a name), the other a description. It is generally constructed using the double-comma punctuation symbol.

Our method is to capture the description, relate the definite reference to the description (through the cased form of the auxiliary verb "was"), and then reduce the entire phrase in the original sentence to the definite reference.

Dependent Clauses

We detach non-restrictive subordinate clauses from the main sentence. We spot these by assuming that the grammatical convention of punctuating non-restrictive clauses with commas is observed.

Verb Phrase Constructions

Verb phrases may be conjoined. These multiple verb phrases all apply to the subject of the sentence. We could also break compound objects and possibly compound complements if so inclined, but we only break conjoined verb phrases.

An oft-observed grammatical mistake is to attach a hanging preposition phrase to a compound verb phrase containing a second preposition phrase with conflicting information.
Consider "In 1967 Orr won the Calder and won the Norris in 1968." It is our conviction that the hanging material "In 1967" applies to the entire conjoined verb phrase. A dogmatic rewrite forms two sentences, the latter written "In 1967 Orr won the Norris in 1968." Events with different temporal context but conjoined within a verb phrase should contain their preposition phrases within their respective phrases. If one wishes to emphasize the time with a hanging preposition, one would be better off using a compound sentence construction, as in "In 1967 Orr won the Calder, and in 1968 Orr won the Norris."

We rewrite conjoined verb phrases according to our conviction. This introduces conflicting time values for events. We correct for this type of mistake (during the frame reduction stage described in the next section) by favouring a time found within a verb phrase over a time found hanging before the verb phrase.

Results and Use

Sentence simplification is our term for reducing compound grammatical structures into single independent clauses. We are separating the matter of analyzing individual propositions (which requires conceptual domain knowledge) from the relatively universal problem of relating the events through syntax.

The problem of relating events is one of understanding concepts of grammar. Certain function words, especially prepositions and subordinating conjunctions, serve to relate events. For example, consider the sentence:

    After he retired in 1979, Orr became involved in advertising.

The hanging phrase temporally connects two events. Simplifying, we get:

    Orr retired in 1979.

The independent clause of the sentence is:

    Orr became involved in advertising.

These sentences should be separated thus, and further, the temporal precedence should be noted. We have remarked above that we do not mark preposition phrases. We have not developed a markup scheme for capturing relations indicated by grammar and function words.
For example, temporal precedence and causal information stated in these ways is not kept. Critically, our method does not attend to temporal signals like "before" and "after". Clauses are simplified according to their position in the sentence, and rewritten in this order. For example, we would get the same output for the above sentence if we swapped "After he retired" with "Before he retired". This is one reason why we perform the resolution stage before the syntactic simplification: the order of events might get reversed, which could cause an error during the resolution of relative temporal phrases. The ordering of rewritten sentences and the establishment of the temporal and causal relationships between these rewrites deserve more development effort.

There are some potent uses for the output from the syntactic simplification stage. First and foremost, we observe that the parser stage produces bad results, especially as the length of the sentence increases. Recall that we are running our system on the top twenty parses. It is possible to take the most frequent reduced forms of the sentence and re-parse these smaller sentences, leading to a greater chance of getting the correct individual parses within the top twenty bracket.

Another use for this technique is for gaining information about what ideas and concepts are related, and in what ways these relationships are expressed in terms of grammar (apposition for certain types of information, subordination for others). Good authors do not arbitrarily group independent clauses in compound sentences. Thus, we may assume that event classes which co-occur frequently in compound sentences hint at related ideas. We can then use this knowledge when we face the problem of forming complex statements from independent clauses in the generation stage.
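The preference described earlier in this section, in which a time found inside a verb phrase overrides a time hanging before the conjoined verb phrase, can be sketched as follows. This is only a minimal illustration in Python and not the thesis implementation: the Clause record and the function distribute_hanging_time are hypothetical names, and the conjoined verb phrases are assumed to have been split and labeled already.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Clause:
    """One conjoined verb phrase, already split out (hypothetical record)."""
    subject: str
    predicate: str
    inner_time: Optional[str] = None  # time found inside the verb phrase


def distribute_hanging_time(
    clauses: List[Clause], hanging_time: Optional[str]
) -> List[Tuple[str, str, Optional[str]]]:
    """Return one (subject, predicate, time) triple per simplified sentence.

    The hanging time applies to every conjoined verb phrase, but a time
    found inside a verb phrase takes priority when the two conflict.
    """
    resolved = []
    for clause in clauses:
        if clause.inner_time is not None:
            time = clause.inner_time
        else:
            time = hanging_time
        resolved.append((clause.subject, clause.predicate, time))
    return resolved


# "In 1967 Orr won the Calder and won the Norris in 1968."
example = [
    Clause("Orr", "won the Calder"),
    Clause("Orr", "won the Norris", inner_time="1968"),
]
for triple in distribute_hanging_time(example, hanging_time="1967"):
    print(triple)
```

Under these assumptions the first clause inherits the hanging year, while the second keeps the year stated inside its own verb phrase.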
We list here, for each source of each sport in the corpus, the number of sentence parses, the number of sentences after simplification, and the ratio of the latter to the former:

source          sentences   splits   ratio
attic                8721    11888   1.363
hickok              51188    88364   1.726
hitrunscore          2340     3440   1.470
hoophall            56962    88909   1.561
trivialibrary        3000     4233   1.411
wiki                26771    40681   1.520

Table 4.11: The split counts for the basketball corpus.

source          sentences   splits   ratio
hickok             104098   186952   1.796
profootball         70223   105587   1.504
tq                   7680    11380   1.482
wiki                36160    55226   1.527

Table 4.12: The split counts for the football corpus.

source          sentences   splits   ratio
hf                   9220    15686   1.701
cph                 23094    37463   1.622
wiki                28330    43461   1.534
hickok              26218    45294   1.728
loh                128576   211514   1.645

Table 4.13: The split counts for the hockey corpus.

4.4.6 Frame Reduction

The process of frame reduction is the last phase of semantic analysis (see Figure 4.2). This process carries the individual propositions into a knowledge representation. We perform this reduction first on a document in isolation: we are recovering the facts stated by the author of the document, regardless of whether we accept them. This stage is free of any evaluation of the veracity of the statements; verification and acceptance of the information occurs later.

As with the identification of the primary elements of the domain, we are required to determine the dependent, secondary elements relating the domain primaries. The first domain learning task was extending the ontology corresponding to noun phrases, and now we are extending the ontology corresponding to verb phrases. This is the appropriate point at which to discuss the elaboration of the event ontology. We establish abstract types of events, identifying the necessary components and the extra, optional components.

We then turn to the process of mapping the semantically-decorated syntactic representations of the raw text into a knowledge representation.
Just as we understand the meaning of particular noun phrases by establishing what particulars (existents or concepts) are referred to, we understand the meaning of an entire sentence by establishing what event or events are referred to. We establish particular instances of events, requiring at least the basic components of the event and striving to capture any extra components.

The Event Ontology

Let us return to the process of domain learning. Earlier, we identified a number of primary objects for each sport. We did this by counting and comparing word forms contained within each semantically labeled noun phrase. We now discuss the corresponding process as applied to the secondary elements of the ontology. Events relate members of conceptual classes. We identify the secondary objects by counting and comparing the syntactic forms in which noun phrases are reduced to their semantic labels.

We perform frame reduction on a limited scale. We identify the most important relations between concepts in the domain. We do this by counting the occurrence of the various verb phrase forms, reducing each constituent subphrase of these verb forms to its semantic label. We then assume that the most frequently occurring forms are the most important.

Let us examine the most frequent forms as mined from the wiki hockey source. We give examples of conforming text, and we omit the linking verb forms. For brevity, we present this in a simple parse form in which square brackets represent noun phrases, braces indicate verb phrases, and dash-parenthesis signal preposition phrases.

[player/definite] {occurrence/award [award/trophy]}
    Trottier won six Stanley Cups.

[player/definite] {state/copula {occurrence/award -(into [award/honor]) -(in [time/year])} }
    Mikita was inducted into the Hockey Hall of Fame in 1983.

[player/definite] {occurrence/play -(for [org/team])}
    Ron Francis played for the Toronto Maple Leafs.
[player/definite] {state/copula {occurrence/birth -(in [location/city])} }
    Tim Horton was born in Cochrane, Ontario.

[player/definite] {occurrence/play [time/season]}
    Lemieux missed the 1993-94 season.

[player/definite] {occurrence/death -(on [time/day])}
    Moose Goheen died on November 13, 1979.

[player/definite] {state/copula {occurrence/award -(to [org/team])} }
    He was named to the All-Star team.

[player/definite] {state/copula {occurrence/birth [time/day] -(in [location/city])} }
    Chris Chelios was born January 25, 1962, in Chicago, Illinois.

[player/definite] {state/copula {occurrence/birth -(on [time/day])} }
    Dale Hawerchuk was born on April 4, 1963.

[player/definite] {occurrence/retire -(in [time/year])}
    Henri retired in 1975.

[player/definite] {occurrence/play -(in [time/game])}
    Cournoyer played in six All-Star Games.

[player/definite] {occurrence/score [statistic/point]}
    He recorded 103 points. He added 12 points.

[player/definite] {occurrence/lead [org/team] -(to [award/trophy])}
    Howe led Detroit to four Stanley Cups.

[player/definite] {state/copula {occurrence/award -(to [award/honor]) -(in [time/year])} }
    Bower was elected to the Hockey Hall of Fame in 1976.

[player/definite] {occurrence/score [statistic/goal]}
    He scored 507 goals.

[player/definite] {state/copula {occurrence/trade -(to [org/team])} }
    Dionne was then traded to the Los Angeles Kings.

[player/definite] {occurrence/award [award/trophy] -(in [time/season])}
    Bill won the Stanley Cup in 1944 and 1946.

[player/definite] {state/copula {i_state/belief -(as [player/description])} }
    Keon is remembered as one of the Maple Leafs most productive offensive stars of the 1960s. He was known as a strong offensive threat.

[player/definite] {occurrence/award [award/trophy] -(in [time/year])}
    He won the Canada Cup in 1976.

[player/definite] {occurrence/award [award/trophy] -(as [player/description])}
    He won the Art Ross Memorial Trophy as the league's leading scorer.
[player/definite] {state/copula {i_state/belief {structure/to {state/copula [player/description]} } } }
    LaFontaine is considered to be one of the most classy and graceful players of all time.

-(in [time/year]) [player/definite] {occurrence/join team [org/team]}
    In 1922 he joined the Toronto St. Pats.

We capture a limited number of frames. These frames represent qualities of an entity and occurrences of events. The frames and their descriptions are listed in Table 4.14.

It could be argued that we have already identified the frames when we singled out the various types of verb phrase labels in the ontology. However, there are far more verb phrase semantic classes than frame types. Moreover, many of the verb phrase classes may reduce to one event (for example, verb phrases labeled state/copula and i_state/belief may both reduce to the event asserting a certain role of a person), just as a single verb phrase class may consume text which could be one of many form types (for example, the occurrence/award label might relate a player and an award, or a team and an award).
Isolating the factual information ("what is said") from the form ("how it is said") is important for generation. Much like recovering a standard term name for each primary, recovering the standard ways in which a fact is verbalized allows us to refer to the fact in a concise, consistent form. We are able to recover syntactic forms for generation.

name            description
birth           a person comes into existence
skill           a skill or quality attributed to a person
copula          a role or description is asserted of a person
nickname        an informal name attributed to a person
measure         a physical measurement attributed to a person
draft select    a person's playing rights are claimed by a team
join team       a person joins a team organization
sign            a person signs a contract with a team organization
teamed          a person is teamed with other specific people
trade           a person's playing rights are transferred between team organizations
leave team      a person leaves a team organization
lead team       a person leads a team organization to some accomplishment
lead league     a person leads a league organization in some statistic
play            a person is (in)active with an organization
award           a person wins an award
statistic       a person performs some quantified sport event
retire          a person quits a sport
jersey retire   a person's jersey number is retired
injury          a person suffers an injury
death           a person ceases to exist

Table 4.14: The names of the frames and their descriptions.

Recall our standardization on the most frequent lexical forms for generation of references to domain primaries. We standardize references to events in an analogous manner: counting the specific syntactic arrangements of the event components, we settle on the most common forms. These syntactic forms are then converted into a slightly different representation and provided as input to the generation system (we will revisit this in the section on realization).
We also permit some synonymous verb lexemes, which are generated according to the distribution observed in the training corpus.

Graph Search Method for Reduction

We reduce the semantically decorated syntactic structures to frames using a simple graph search method and some logic to decide which frame slot a matched element is assigned. Basically, each phrase type is mapped to a function. This function operates on a parsed sentence and returns a set of frames. Each sentence is searched for its phrases, and then is passed to the function(s) associated with its phrase types. This function is responsible for instantiating frames.

For example, the verb phrase label occurrence/birth is associated with the function reduce-birth. This function first looks for the subject of the birth; our approximation is to recursively search for the first definite reference to a person. We then require a time or a location, so phrases matching these semantic types are recursively searched for under the occurrence/birth verb phrase. Finally, if this information is successfully acquired, an event frame is formed and logged.

Now let us return to the rewrite of the hanging material error in the previous section. The rewritten sentence was "Orr's abilities were born in Parry Sound." Since we search recursively for a definite person phrase contained within the subject position, we do in fact find Orr as the subject of the birth. We find a location under the verb phrase, so the event is logged.

Additionally, some logic (operating on prepositions and syntactic structure) was added by the author in cases like the trade event, where one must distinguish between the trading team and the receiving team.

Each function logs a number of events, but these events are not necessarily added to the output list. Because we are operating on multiple parses, we set a threshold value which requires the identical fact to be extracted from multiple versions of the parse.
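The recursive search and the multi-parse threshold described above can be sketched roughly as follows. This is a hedged illustration rather than the thesis code: the nested-tuple tree encoding, the REDUCERS table, and the functions walk, find_first, text_of, reduce_birth and extract are all hypothetical names chosen for the example, and the three toy parses stand in for the top-ranked parser outputs.

```python
from collections import Counter

# A phrase is (label, [children]); a leaf is (label, "surface text").


def walk(node):
    """Depth-first, left-to-right traversal of a parse tree."""
    yield node
    label, body = node
    if isinstance(body, list):
        for child in body:
            yield from walk(child)


def find_first(node, prefix):
    """First node at or below `node` whose label starts with `prefix`."""
    for candidate in walk(node):
        if candidate[0].startswith(prefix):
            return candidate
    return None


def text_of(node):
    label, body = node
    return body if isinstance(body, str) else " ".join(text_of(c) for c in body)


def reduce_birth(sentence):
    """Form a birth frame if a person and a time or location are found."""
    frames = []
    subject = find_first(sentence, "player/definite")
    for phrase in walk(sentence):
        if phrase[0] != "occurrence/birth":
            continue
        location = find_first(phrase, "location/")
        time = find_first(phrase, "time/")
        if subject is not None and (location is not None or time is not None):
            frames.append((
                "birth",
                text_of(subject),
                text_of(location) if location is not None else None,
                text_of(time) if time is not None else None,
            ))
    return frames


# Each phrase type is mapped to a reducing function (only one shown here).
REDUCERS = {"occurrence/birth": reduce_birth}


def extract(parses, threshold):
    """Keep only frames recovered from at least `threshold` parses."""
    counts = Counter()
    for parse in parses:
        found = set()
        for label in {node[0] for node in walk(parse)}:
            reducer = REDUCERS.get(label)
            if reducer is not None:
                found.update(reducer(parse))
        counts.update(found)  # each frame counts once per parse
    return [frame for frame, n in counts.items() if n >= threshold]


subject_np = ("np", [("player/definite", "Orr"), ("w", "'s"), ("w", "abilities")])
place = ("location/city", "Parry Sound")
parse_a = ("s", [subject_np, ("state/copula", [("w", "were"), ("occurrence/birth", [
    ("w", "born"), ("pp", [("w", "in"), place])])])])
parse_b = ("s", [subject_np, ("state/copula", [("w", "were"), ("occurrence/birth", [
    ("w", "born"), ("pp", [("w", "in"), ("np", [place])])])])])
parse_c = ("s", [subject_np, ("state/copula", [("w", "were"), ("w", "born"),
                                               ("pp", [("w", "in"), place])])])

print(extract([parse_a, parse_b, parse_c], threshold=2))
```

Because the search is recursive, the location is found whether the preposition phrase attaches directly to the verb phrase or sits one level deeper, which is the kind of parse variation the text says motivated this style of search.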
Any fact not meeting this threshold is pruned from the output.

This graph search method is only an approximation. Ideally, we would align a number of syntactic trees with their corresponding frame representations, then automatically learn the mappings between the two. This would be an interesting addition to our interface: users would read a sentence, pick frames, then click and drag the phrases to their corresponding frame slots.

We decided on this recursive tree search style because of the multiple versions of each parse. The most common problem is that a preposition phrase which should attach to the verb phrase is actually attached to some noun phrase, which is in turn the complement of a preposition phrase attached to the verb phrase (the parser favours deep parses).

There are cases in which the search method introduces error, and the main source of these errors is badly parsed text. Identifying a single correct parse would greatly simplify this component of the semantic analyzer.

Critique

Shamefully, we do not yet handle polarity. Our system will falsely recognize a positive award event in:

    Gretzky did not win the Calder.

Another source of error has to do with special function verbs. For example, we do not deal with the modal class of verbs, so the system also recovers an award event from the following sentence:

    Gretzky would have won the Calder.

The same event is captured for both statements. Both statements do refer to the event:

    Gretzky won the Calder.

The first states that this particular event did not occur. The second was probably rewritten from a sentence like "Gretzky would have won the Calder, but ...", in which case the statement refers to an event that was hypothetically sure to take place if not for the second condition. This accounts for some of the false positive events produced by our system.
We did not devote much time to analyzing modifications to a verb phrase since generally a biography is full of unmodified occurrences of events.

4.5 Verification

We are now in a position to decide whether or not to adopt an author's statement as a belief. It is at this point that we can properly decide whether two statements mean the same thing, or are in conflict, or are unrelated.

We implement some simple rules which detect some limited forms of contradiction. For example, we cannot accept multiple accounts of the statistics for a particular player within the same period of time. We also encoded a simple domain rule in hockey: (within any period of time) the sum of the goals and assists is the number of points. Any recovered proposition not conforming to this rule is dropped from consideration.

In order to deal with the rough, inexact times produced by our system and inherent in differing accounts, we clustered certain types of events to within a small time variation. We then pick the most specific delegate from these clusters as the representative on the output. For example, a join team event is less specific than a draft (when the teams are the same and the event times are roughly the same), so we prefer the draft event. This eliminates redundant events.

The other purpose of the frame harmonizer is to determine the most specific time of an event. For example, we would rather know the day of birth than the year of birth for a person.

Finally, we place some other simple constraints on the accepted events. When we decide the time of birth for a person, we require any other event involving the person to occur at least two years after this event.

The output of the frame harmonizer is a collection of time-ordered events gained from all the documents from a player cluster. Generally, knowledge is a unified, wholesome view incorporating all that one learns.
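The simple consistency rules of this section (the hockey points rule, the preference for the most specific time, and the two-year gap after birth) can be illustrated with a short sketch. The record layout, the granularity ranking and all function names below are assumptions made for the example, and the numbers are invented purely to exercise the checks.

```python
from typing import Dict, List, Tuple

# Hypothetical ranking of time granularities, least to most specific.
GRANULARITY_RANK = {"year": 0, "month": 1, "day": 2}


def conforms_to_points_rule(claim: Dict[str, int]) -> bool:
    """Hockey domain rule: goals plus assists must equal points."""
    return claim["goals"] + claim["assists"] == claim["points"]


def most_specific_time(times: List[Tuple[str, str]]) -> Tuple[str, str]:
    """Pick the most specific (granularity, value) pair from a cluster."""
    return max(times, key=lambda t: GRANULARITY_RANK[t[0]])


def occurs_after_birth(birth_year: int, event_year: int,
                       min_gap_years: int = 2) -> bool:
    """Other events must fall at least `min_gap_years` after the birth."""
    return event_year - birth_year >= min_gap_years


# Invented statistic claims for one player and one period.
claims = [
    {"goals": 30, "assists": 50, "points": 80},
    {"goals": 30, "assists": 50, "points": 85},  # violates the rule
]
accepted = [c for c in claims if conforms_to_points_rule(c)]
print(accepted)
print(most_specific_time([("year", "1948"), ("day", "20/3/1948")]))
print(occurs_after_birth(1948, 1949), occurs_after_birth(1948, 1966))
```

In this sketch a nonconforming statistic claim is simply dropped, mirroring the statement that such propositions are removed from consideration.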
We should not limit ourselves to the relatively small subset of events present in a player cluster. We should instead attempt to harmonize the entire corpus. This would only reinforce our claims to knowledge and aid us in weeding out spurious claims introduced by our probabilistic foundation.

We do not currently unify our entire knowledge base, although certainly this is the destination for our work. The author believes that the system could attempt to recheck and revise the semantic analysis of sentences which produce events that are in direct conflict with the sure facts backed by a large amount of evidence. Also, since it would be comparatively easy to extract a large number of propositions about certain types of events (statistics, award wins) from tabular data available on the internet, we have a quick way to determine the accuracy of our epistemological system for this subset of events. At this point, we have not carried out a check of the accuracy.

4.6 Generation

Our generation system is extremely simple. We have captured all the necessary information (frames, syntactic forms, standardized terms) during the understanding phases. Now we pick our facts, conjoin them into sentences, and realize the surface text.

4.6.1 Content Selection

Our overriding content selection concern is: state facts directly related to the biographical subject. Only those events with a reference to the biographical subject are passed on for consideration.

Next, we must be concerned with length, so we measure the importance of each frame. We do this by computing the frame inverse document frequency (IDF) over the training set. We form the ratio of the total number of documents to the number of documents containing at least one occurrence of the frame type. The event types we have captured were selected because they were frequently occurring, so it seems odd now to "invert" their importance.
However, this IDF value favours items like trades, drafts, and retirements over statistics and descriptions. Intuitively, we also want to highlight facts with more information, so we add the logarithm of the number of nodes (roughly, the number of individual items of information) contained in the event frame.

This is an ad hoc ranking method inspired by IDF term weighting. It is too simple, and the first improvement that comes to mind is a weighting that considers each branch of the event frame according to frequency information within the ontology. For example, two award events may be of equal importance under our ranker, whereas one event might concern the Stanley Cup (700 references), the other the Challenge Cup (3 references). In hockey, winning the Stanley Cup is more prestigious, and this fact should be reflected in the rank.

Now the events are ranked. We now need to generate a biography of at most 200 words. We are not exactly sure how many words will be realized for each subset of the ranked events, so we search for the largest ranked subset of events with a realization of at most 200 words. Basically, we start with a full event stack, then iteratively realize a biography and count the words; we end if we meet the length constraint, otherwise we pop the lowest ranked event from the stack and repeat.

4.6.2 Content Structuring

We arrange our content according to a simple schema: a birth is mentioned first, followed by the player skills, nickname, and body measurements. Next, any other type of event except death is mentioned. Finally, a death event is cited. All these events are ordered according to their time attribute.

The content is fused into sentences in a simple way. We take two adjacent sentences and attempt to conjoin the two. If the sentences share the same subject, we form a conjoined verb phrase, otherwise we form a conjoined sentence with the ", and" construction.
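The ranking and length-constrained selection of Section 4.6.1 can be sketched as below. This is a minimal reading of the description, under stated assumptions: the score is taken literally as the document ratio plus the natural logarithm of the node count (a conventional IDF would also take the logarithm of the ratio), the event dictionaries and the realize callback are hypothetical stand-ins for the real frames and realizer, and the toy data is invented.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Set


def frame_idf(doc_frame_types: List[Set[str]]) -> Dict[str, float]:
    """Ratio of total documents to documents containing each frame type."""
    total = len(doc_frame_types)
    doc_freq = Counter(t for types in doc_frame_types for t in types)
    return {t: total / n for t, n in doc_freq.items()}


def score(event: dict, idf: Dict[str, float]) -> float:
    """IDF of the frame type plus the log of the frame's node count."""
    return idf[event["type"]] + math.log(event["nodes"])


def select_events(events: List[dict], idf: Dict[str, float],
                  realize: Callable[[List[dict]], str],
                  max_words: int = 200) -> List[dict]:
    """Pop the lowest-ranked event until the realization fits."""
    stack = sorted(events, key=lambda e: score(e, idf), reverse=True)
    while stack:
        if len(realize(stack).split()) <= max_words:
            break
        stack.pop()  # the lowest-ranked event sits at the end
    return stack


# Invented training documents, each reduced to its set of frame types.
docs = [{"birth", "award", "trade"}, {"birth", "award"},
        {"birth", "statistic"}, {"birth", "award"}]
idf = frame_idf(docs)

events = [
    {"type": "birth", "nodes": 3, "text": "X was born in Y"},
    {"type": "trade", "nodes": 2, "text": "X was traded to Z"},
    {"type": "award", "nodes": 2, "text": "X won the W"},
]


def toy_realize(selected: List[dict]) -> str:
    return " ".join(e["text"] for e in selected)


chosen = select_events(events, idf, toy_realize, max_words=10)
print([e["type"] for e in chosen])
```

The selected events would still need to be reordered by the schema of Section 4.6.2 before realization; the sketch only covers ranking and trimming.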
4.6.3 Realization

Our realization method instantiates references to events and qualities from the frames picked by the content selector. The specific information concerning the biographical content has been ascertained. The grammatical structure governing the construction of sentences is determined either by a human expert (canned text), or by some automatic method. The third ingredient is the noun phrase reference writing strategy; surface form references are decided within context. As in reference resolution, past references condition the form of the current references.

Our basic method of realization for a frame is to fill a general form with particulars. We replace the abstract classes of the general form with the particular entities from the frame data.

The learning of these forms is essentially the reverse process. We proceed from particular surface sentences. We use our markup system to classify the phrases into their abstract categories. We can then mine the corpus for common abstract grammatical organizations of event types. Here we can gain alternate structural forms (paraphrase rules) and find synonymous terms, all of which will allow our output to appear varied.

Overall, we arrange text according to a schema. This is an XML document which will be filled with events from the player event cluster. Just like sentence surface forms, these schemas (document level plans) can also be learned.

We limit our realizer to produce only simple sentences, conjoined sentences, and compound verb phrases. Let us examine here the instantiation of English text for the frame corresponding to the birth of Bobby Orr.
The syntactic form for the birth event is given below:

    <parse>
      <s>
        <np class="player" type="definite" fill="event/player"/>
        <vp class="structure" type="past" voice="passive">
          <w lex="was" pos="aux"/>
          <vp class="occurrence" type="birth">
            <w lex="born" pos="vbd"/>
            <pp>
              <np class="location" fill="event/location"/>
            </pp>
            <pp>
              <np class="time" fill="event/time"/>
            </pp>
          </vp>
        </vp>
      </s>
    </parse>

The frame representation for Orr's birth is:

    <event type="birth">
      <player id="217"/>
      <location type="city" id="136"/>
      <time type="day" value="20/3/1948"/>
    </event>

Our realizer is given the form and the event data. The fill attribute indicates the path of the data item to be realized. This item may either be found in the specific event, or may refer to some other item of knowledge in the ontology. Ultimately, this filling function will replace the abstract unfilled fields with the corresponding identifiers from the ontology.

We fill the form by copying across attributes from the event (or ontology) into the syntax tree. Thus, after filling the information from the event, we arrive at the intermediate tree:

    <parse>
      <s>
        <np class="player" id="217" type="definite"/>
        <vp class="structure" type="past" voice="passive">
          <w lex="was" pos="aux"/>
          <vp class="occurrence" type="birth">
            <w lex="born" pos="vbd"/>
            <pp>
              <np class="location" id="136" type="city"/>
            </pp>
            <pp>
              <np class="time" type="day" value="20/3/1948"/>
            </pp>
          </vp>
        </vp>
      </s>
    </parse>

We next move to realizing the surface forms for the noun phrases. Assume that we are generating this sentence as the sole sentence in a document. We fully realize the references in named form. We also generate the correct prepositions for each prepositional attachment. In general, we might accomplish this by associating prepositions with their complements (i.e. by mining these associations over our corpus). Here, the author has encoded the rules for the various phrase classes under each prepositional phrase.
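The attribute-copying step that produces the intermediate tree can be sketched with Python's standard xml.etree.ElementTree module. This is an illustrative assumption about how the fill path might be resolved, not the thesis implementation: fill_form is a hypothetical name, only paths rooted at the event are handled, and lookups into the wider ontology are left untouched.

```python
import xml.etree.ElementTree as ET

FORM = """
<parse><s>
  <np class="player" type="definite" fill="event/player"/>
  <vp class="structure" type="past" voice="passive">
    <w lex="was" pos="aux"/>
    <vp class="occurrence" type="birth">
      <w lex="born" pos="vbd"/>
      <pp><np class="location" fill="event/location"/></pp>
      <pp><np class="time" fill="event/time"/></pp>
    </vp>
  </vp>
</s></parse>
"""

EVENT = """
<event type="birth">
  <player id="217"/>
  <location type="city" id="136"/>
  <time type="day" value="20/3/1948"/>
</event>
"""


def fill_form(form: ET.Element, event: ET.Element) -> ET.Element:
    """Copy attributes from the event into nodes carrying a fill path."""
    for node in form.iter():
        path = node.get("fill")
        if path is None:
            continue
        root_name, _, child_name = path.partition("/")
        # Only "event/<slot>" paths are resolved in this sketch.
        source = event.find(child_name) if root_name == event.tag else None
        if source is None:
            continue  # e.g. an ontology reference, not modelled here
        node.attrib.update(source.attrib)
        del node.attrib["fill"]
    return form


filled = fill_form(ET.fromstring(FORM), ET.fromstring(EVENT))
for np in filled.iter("np"):
    print(sorted(np.attrib.items()))
```

Resolving the copied identifiers (such as id="217") into surface words is a separate step that would consult the ontology.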
    <parse>
      <s>
        <np class="player" type="definite">
          <w lex="Robert" pos="nnp"/>
          <w lex="Gordon" pos="nnp"/>
          <w lex="Orr" pos="nnp"/>
        </np>
        <vp class="structure" type="past" voice="passive">
          <w lex="was" pos="aux"/>
          <vp class="occurrence" type="birth">
            <w lex="born" pos="vbd"/>
            <pp>
              <w lex="in" pos="in"/>
              <np class="location" type="city">
                <np class="location" type="city">
                  <w lex="Parry" pos="nnp"/>
                  <w lex="Sound" pos="nnp"/>
                </np>
                <w lex="," pos=","/>
                <np class="location" type="state/prov">
                  <w lex="Ontario" pos="nnp"/>
                </np>
                <w lex="," pos=","/>
                <np class="location" type="country">
                  <w lex="Canada" pos="nnp"/>
                </np>
                <w lex="," pos=","/>
              </np>
            </pp>
            <pp>
              <w lex="on" pos="in"/>
              <np class="time" type="day">
                <np class="time" type="month">
                  <w lex="March" pos="nnp"/>
                </np>
                <w lex="20" pos="cd"/>
                <w lex="," pos=","/>
                <np class="time" type="year">
                  <w lex="1948" pos="cd"/>
                </np>
                <w lex="," pos=","/>
              </np>
            </pp>
          </vp>
        </vp>
      </s>
    </parse>

4.6.4 Punctuation

The final act of realization is the observance of the rules of punctuation. We begin all sentences with a capitalized word, and we conclude all sentences with terminating punctuation. Other punctuation rules include the absorption of commas into periods, periods inside quotations, etc.³

Let us continue with our example. For our example, in which the date at the end deserves a separating comma, we must absorb the comma into the final period.
Other examples of punctuation-related concerns include double-comma punctuation at the beginning of sentences (mainly used for apposition) and the absorption of periods into quotation marks, parentheses, etc. We should also generate the proper types of quotes, taking care for nested quotes. We arrive at the following output:

    Robert Gordon Orr was born in Parry Sound, Ontario, Canada, on March 20, 1948.

And voila!, we have expressed our knowledge about the birth of a hockey legend in the form of an English sentence.

³ The author believes the inverse process (expansion and classification of punctuation) to be an important but largely untreated and unacknowledged part of the document understanding process. The difficulty lies in the fact that symbols are omitted from the surface form, and thus must be re-introduced... a problem that seems similar in spirit to elliptical phrases.

Chapter 5 Evaluation

5.1 Baseline Summarizers

We implemented a number of baseline extractive summarizers. Our baseline summarizers are variations on a random summarizer and the MEAD platform.

5.1.1 Random Summarizers

We created two random extractive summarizers. The first random summarizer collects all the sentences from the source documents and picks uniformly until the length constraint is met. This system we call random. The second random summarizer also considers the length of the source document for each sentence.
We assume that better summary sentences are contained within shorter biographies. We randomly select sentences first by randomly picking a source document, then by randomly selecting a sentence from that document. The source document is picked with probability proportional to the inverse of the document length in words. We call this system random ld ("length distributed").

5.1.2 MEAD Adaptations

MEAD is our baseline summarization engine. It has been successfully used and adapted for multiple news-document biographical summarization in DUC 2004 Task 5. The summarizer in its basic, domain-independent form can be used to generate multiple biographical document summaries. Also, we extend the summarizer by implementing feature scripts tailored to general biographical documents and sports biographical documents.

We run the basic MEAD system with the Centroid feature script. Our two variations incorporate this feature, as well as features designed for biographies. We extend MEAD with a general biographical document feature computation script (ParagraphPosition) and a sports-specific feature script (SportsWords).

ParagraphPosition

Roughly, the first script operates similarly to the standard Position feature script provided with MEAD. Position is suited to news documents, which typically contain most of the important, summarization-worthy sentences near the beginning of the text. However, a biographical document usually contains important information throughout the document.

We write two variations on this script, operating with the assumption that the main points of a life are detailed at the head or the tail of each paragraph. The first operates like Position. Position scores the sentence as the inverse of the square root of the absolute sentence index. Instead, we score the sentence relative to its index in the paragraph. This favours the first sentence of each paragraph, then decays as the paragraph develops.
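The two ParagraphPosition variants (the paragraph-relative score just described, and the U-shaped score introduced in the text that follows) can be written down as a small sketch. The function names are hypothetical, sentence indices are assumed to start at 1, and for the U-shaped variant the distances to the first and last sentence are offset by one so that the boundary sentences score 1 instead of dividing by zero; the text does not state how that case is handled.

```python
import math
from typing import List


def position(sentence_index: int) -> float:
    """Position-style score: inverse square root of the 1-based index."""
    return 1.0 / math.sqrt(sentence_index)


def paragraph_position_head(index_in_paragraph: int) -> float:
    """First variant: the same decay, restarted at every paragraph."""
    return position(index_in_paragraph)


def paragraph_position_u(index_in_paragraph: int,
                         paragraph_length: int) -> float:
    """Second variant: favour both the head and the tail of a paragraph."""
    from_start = index_in_paragraph                       # 1 for the first
    from_end = paragraph_length - index_in_paragraph + 1  # 1 for the last
    return max(1.0 / from_start, 1.0 / from_end)


def score_paragraph(paragraph_length: int) -> List[float]:
    return [paragraph_position_u(i, paragraph_length)
            for i in range(1, paragraph_length + 1)]


print(score_paragraph(5))
```

For a five-sentence paragraph the scores fall from the first sentence to the middle and rise again towards the last, which is the U-like profile the text describes.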
The second favours the sentences towards the beginning and end of the paragraph. We take the maximum of the inverses of the difference between the sentence index and the initial sentence, and the difference between the sentence index and the final sentence. This gives a U-like score to the sentences in the paragraph. We chose this version of ParagraphPosition for use in the evaluation.

SportsWords

The SportsWords script relies on word frequencies computed from the training set. We focus on the training set player clusters with more than one document. We think of the shortest biography document within each cluster as representing a "model" biography, and the remainder are thought of as input (peer) documents. We assume that a word which occurs in both the model and a peer is biography-worthy. Filtering out stop words, we weigh each term according to this assumption. The feature script computes the feature value of a sentence as the sum of the weights of each word in that sentence. The term weight of each word is the product of the number of model occurrences and the ratio of model occurrences to peer occurrences.

Originally, we tried computing special inverse document frequency (IDF) scores over the training set. MEAD uses this information in computing its Centroid score. However, computation of a new IDF database from the sport training set was found to lower the ROUGE results of the resulting summaries. The additional weighting of the terms in SportsWords has a positive effect on ROUGE scores.

5.1.3 Training Set Results

We ran the MEAD summarizers over the training set. We also ran the random system which randomly selects sentences from the input documents. The random system was run 8 times on each player cluster, and we averaged the random performance over these runs.

Again, we held out the shortest document for every multiple document player cluster. This document was considered the model biography.
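The baseline scoring rules of Sections 5.1.1 and 5.1.2 can be sketched in a few lines of Python. This is only an illustration and not the feature scripts actually run with MEAD: the function names are hypothetical, sentence indices are taken to be 1-based, the U-shaped score offsets each distance by one so that the first and last sentences do not divide by zero, and "picked according to the inverse of the document length" is read as "with probability proportional to the inverse length". All of these are assumptions.

```python
import math
import random
from collections import Counter


def position_score(index):
    """Position-style score: inverse square root of a 1-based sentence index."""
    return 1.0 / math.sqrt(index)


def paragraph_position_u(index_in_paragraph, paragraph_length):
    """U-shaped score: high at the head and the tail of a paragraph.

    Assumption: distances are counted from 1, so the first and last
    sentences both score 1.0 instead of dividing by zero.
    """
    from_head = index_in_paragraph
    from_tail = paragraph_length - index_in_paragraph + 1
    return max(1.0 / from_head, 1.0 / from_tail)


def sports_word_weights(model_tokens, peer_tokens, stop_words):
    """Weight = model occurrences * (model occurrences / peer occurrences).

    Only words seen in both the model and the peer text receive a weight.
    """
    model_counts = Counter(t for t in model_tokens if t not in stop_words)
    peer_counts = Counter(t for t in peer_tokens if t not in stop_words)
    weights = {}
    for word, m in model_counts.items():
        p = peer_counts.get(word, 0)
        if p > 0:
            weights[word] = m * (m / p)
    return weights


def sports_words_score(sentence_tokens, weights):
    """Feature value of a sentence: sum of the weights of its words."""
    return sum(weights.get(t, 0.0) for t in sentence_tokens)


def pick_length_distributed(documents, rng):
    """random ld: choose a document with probability proportional to
    1 / (length in words), then choose one of its sentences uniformly.

    Each document is a non-empty list of sentence strings.
    """
    doc_weights = [1.0 / sum(len(s.split()) for s in doc) for doc in documents]
    doc = rng.choices(documents, weights=doc_weights, k=1)[0]
    return rng.choice(doc)
```

For a five-sentence paragraph, `paragraph_position_u` gives the first and last sentences a score of 1 and the middle sentence a score of 1/3, which is the U-like profile described above. In practice the model and peer counts would be accumulated over all training clusters rather than over a single pair of texts.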
We then summarized the other documents from the player cluster and measured the performance with ROUGE.

We present one set of graphs (Figure 5.1, Figure 5.2, and Figure 5.3) comparing all MEAD versions against the random baseline, and one graph (Figure 5.4) comparing the MEAD SportsWords extension against the pure Centroid MEAD. Generally, the SportsWords addition is the best system across the training set.

length range    documents
100-149         90
150-199         106
200-249         69
250-299         65

Table 5.1: The number of multiple document player clusters for each range of lengths in the graphs below.

Figure 5.1: ROUGE recall average of the MEAD systems and the random system (x-axis: summary length, 100 to 300; series: mead, mead pp, mead sw, random avg).

Figure 5.2: ROUGE precision average of the MEAD systems and the random system (x-axis: summary length, 100 to 300; series: mead, mead pp, mead sw, random avg).

Figure 5.3: ROUGE f-measure average of the MEAD systems and the random system (x-axis: summary length, 100 to 300; series: mead, mead pp, mead sw, random avg).

5.2 Evaluation Method

Let us now turn to the evaluation of the fully-developed automatic systems on the testing set. We evaluate our system against a number of baseline summarizers. We measure co-occurrence of n-grams between automatic methods and two human-written model biographies for each player in the testing set.

We are comparing systems using ROUGE. This is an automatic method of gisting evaluation based on co-occurrence of terms. It correlates well with human rankings and assessments on past DUC data.

Before we discuss the results of our evaluation, it should be noted that ROUGE is not completely suitable for evaluating the quality of our work. We believe our system is at a fundamental disadvantage in this test, for certain characteristics of a biography cannot be measured due to the nature of the ROUGE system.
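To make the n-gram co-occurrence measure concrete, here is a minimal sketch of an n-gram recall, precision, and f-measure computation against a single model text. It is a simplification for illustration only: the function names are hypothetical, tokenization is plain lowercased whitespace splitting (an assumption), and the options of the actual ROUGE toolkit are not modeled.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams (as tuples) occurring in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return (recall, precision, f-measure) of n-gram co-occurrence.

    Matches are clipped: an n-gram is counted at most as many times as it
    occurs in the reference.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # multiset intersection
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    total = recall + precision
    f_measure = 2 * recall * precision / total if total else 0.0
    return recall, precision, f_measure


if __name__ == "__main__":
    model = "bentley was traded to the toronto maple leafs"
    peer = "bentley was traded to toronto"
    print(rouge_n(peer, model, n=1))
    print(rouge_n(peer, model, n=2))
```

In this toy pair every candidate unigram occurs in the model, so unigram precision is 1 while recall is 5/8; the bigram scores are lower because "to toronto" never occurs in the model text.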
Our system is capable of understanding event context. In particular, it can roughly situate events in time. This allows it to sequence the events of a life in the order in which those events occurred. We have recognized this as an essential characteristic of an historical account. However, ROUGE scores are independent of sentence order.

Figure 5.4: The difference in performance between the MEAD SportsWords variant and pure Centroid MEAD (x-axis: summary length, 100 to 300). At the 200 word mark, we expect a 1 to 2 point gain in recall and a 1 to 1.5 point gain in precision with the SportsWords variant.

5.2.1 Measuring System Performance

Properly, our system should be measured on a number of related tasks. These tasks correspond to the major components of the extraction system and the generation system.

We are dealing with things that happened in the past: necessarily, our statements are either true or false. A proper evaluation would first determine whether the yield of events produced by an understanding method corresponds to reality. We should give a recall, precision, and fallout measure of the natural language information extraction system. Basically, we measure the performance of the underlying epistemological system: in terms of a common knowledge representation, compare the facts stated by a document (say, have a human produce a document from this set of facts) against those the system determines to be stated. In fact, this was the focus for the Message Understanding Conference (MUC), the predecessor to DUC.

The next task is validation: determining from a set of facts (obtained from multiple documents), which of the facts are admissible and correspond to the true state of the world. As we have seen, our frame harmonizer performs this task, and eliminates certain types of impossibilities and contradictions. This again may be tested using the evaluation techniques of MUC.
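The recall, precision, and fallout measures proposed above can be illustrated over sets of facts. The sketch below is hypothetical and was not part of our evaluation: facts are represented as plain tuples, the "stated" set stands in for a human annotation of the document, and fallout is computed in the usual information-retrieval sense, which assumes a finite universe of candidate facts the extractor could have reported.

```python
def extraction_scores(stated, extracted, candidates):
    """Recall, precision, and fallout of an extracted fact set.

    stated:     facts the document really asserts (human judgement)
    extracted:  facts the system claims the document asserts
    candidates: every fact the system could have reported; assumed to
                contain both of the other sets
    """
    stated, extracted, candidates = set(stated), set(extracted), set(candidates)
    true_positives = extracted & stated
    false_positives = extracted - stated
    not_stated = candidates - stated

    recall = len(true_positives) / len(stated) if stated else 0.0
    precision = len(true_positives) / len(extracted) if extracted else 0.0
    fallout = len(false_positives) / len(not_stated) if not_stated else 0.0
    return recall, precision, fallout


if __name__ == "__main__":
    # Facts taken from the Bentley model biography; the 1948 trade and the
    # 1967 induction are invented here purely as spurious candidates.
    stated = {
        ("traded_to", "Toronto Maple Leafs", 1947),
        ("traded_to", "New York Rangers", 1953),
        ("inducted", "Hall of Fame", 1966),
    }
    extracted = {
        ("traded_to", "Toronto Maple Leafs", 1947),
        ("traded_to", "New York Rangers", 1953),
        ("traded_to", "Toronto Maple Leafs", 1948),
    }
    candidates = stated | extracted | {("inducted", "Hall of Fame", 1967)}
    print(extraction_scores(stated, extracted, candidates))
```

Here the extractor recovers two of the three stated facts and reports one spurious fact out of two possible spurious candidates, so recall and precision are both 2/3 and fallout is 1/2.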
Summarization is simply an application, and properly concerns only the methods of selecting and efficiently rendering a collection of facts into natural language. All applications (summarization, document correction, question answering, translation, etc.) make use of the same basic epistemological system.

Summarization is interested in choosing content for the summary. We should measure how the subset of selected facts grows as the length increases. Obviously it need not grow one fact per cut, as many facts may be asserted in a single statement. We should compare these nested subsets of facts against those chosen by humans.

Summarization is also interested in methods for rendering facts into language. This includes questions such as: how and when may we conjoin many facts together into a single statement, how can we do this without any loss of information, and when is "lossy compression" of the text permitted? Summarizing a set of facts, we could measure compression rates between, say, a canonical form for every individual fact, and the smallest summarized form produced by the summarizer. We would also have to verify that the rendering preserves the factual information with little ambiguity in meaning; in the case of lossy compression, we must verify that the rendering preserves the meaning and quantify the amount of compression.

Beyond the rendering of fact into language are aesthetic matters. Generating interesting narratives and colouring the summary seem to appeal to subjective tastes, so we will not suggest any method of measuring these concerns.

It is obvious then that summarization, as an application of knowledge of truth and of grammar, is not receiving precise measurement with tools such as ROUGE. This is because the task of summarization has not been separated into its component parts.
We are conducting a test with ROUGE here simply to find out whether we are at least picking out the right classes and instantiating references to those classes using the same terms as the human writers.

5.3 Results

Let us now present the results of the ROUGE evaluations. We give the ROUGE scores averaged over the entire testing set. We also give the ROUGE scores for a subset of the testing set for which the average random performance is appreciably lower than the MEAD performance.

We evaluate all systems using three ROUGE instances: unigram (1), bigram (2), and longest-common-subsequence (L). The recall (R), precision (P), and f-measure (F) scores are tabulated for each ROUGE instance. For brevity, the values given in the tables are rounded to the nearest thousandth, and the decimal points are omitted (so a score of 0.354 appears as 354). We also provide the standard deviation for each measure. Higher values indicate a higher degree of co-occurrence, which is thought to indicate a better summary.

The extractive summarizer baseline enjoys an immense advantage over our system in this test. Many of the biographies that it extracts from are roughly of the same length as the target biography. This fact is most evident for the players where the average random sentence extractor performance rivals that of MEAD. For example, the Neil Johnston cluster contains two biographies, both slightly over 200 words. The average random f-measure performance for ROUGE-L is 456, while the MEAD values are 398, 435, and 461.

The subset of the test set contains Maxwell Bentley, Bob Cousy, Alan Page, Alex Delvecchio, Kareem Abdul-Jabbar, Bobby Clarke, and Eddie Shore. This subset of players levels the playing field.

5.3.1 A Key to the Systems

The two human authors are labeled glen and sarah. For each player cluster, we test all the automatic summarizers against the two model documents produced by these authors.
We then take out the sarah biography from the model set and place it in the peer set, comparing the automatic systems plus sarah against the glen biography. We then repeat this comparison, this time adding glen to the peer set and comparing against the sarah biography.

There are three variations on the MEAD platform. The mead system is MEAD run with just the Centroid feature. The mead pp system is MEAD with the Centroid and our ParagraphPosition feature. The mead sw system is MEAD with the added domain information learned from the corpus (the SportsWords feature). Our automatic system is called autobio.

5.3.2 Entire Test Set

against both:
                   1                2                L
system        R    P    F      R    P    F      R    P    F
autobio     354  330  339    102   95   97    334  311  320
mead        409  355  379    162  138  148    381  331  353
mead pp     424  378  398    171  149  159    396  353  372
mead sw     429  375  399    166  143  153    403  352  375
random      372  331  349    119  104  111    353  315  331
random ld   375  373  373    129  127  128    355  353  353

against glen:
                   1                2                L
system        R    P    F      R    P    F      R    P    F
autobio     382  354  363    112  103  106    364  338  347
mead        424  373  394    171  148  158    395  348  367
mead pp     438  395  413    177  156  165    410  369  386
mead sw     438  385  408    178  154  164    414  364  385
sarah       421  428  421    158  160  157    393  400  393
random      381  339  356    128  111  118    363  323  340
random ld   385  385  383    140  138  138    366  366  364

against sarah:
                   1                2                L
system        R    P    F      R    P    F      R    P    F
autobio     326  306  313     91   87   88    305  285  292
mead        393  337  361    153  127  138    367  314  337
mead pp     411  361  382    167  142  152    385  336  357
mead sw     417  365  387    154  133  141    390  340  362
glen        428  421  421    160  158  157    399  393  393
random      363  324  341    109   97  102    344  306  322
random ld   364  361  360    118  116  116    344  341  341

5.3.3 Subset of Test Set

against both:
                   1                2                L
system        R    P    F      R    P    F      R    P    F
autobio     366  332  347    110  100  104    348  315  329
mead        354  326  338    119  108  113    325  300  310
mead pp     363  346  352    125  116  120    336  320  326
mead sw     409  364  384    147  129  137    381  340  359
random      315  307  310     84   82   83    295  288  291
random ld   339  364  350    105  112  108    318  342  329

against glen:
                   1                2                L
system        R    P    F      R    P    F      R    P    F
autobio     395  359  374    122  112  116    375  341  356
mead        363  340  348    120  113  115    329  310  316
mead pp     379  365  369    128  120  123    348  336  339
mead sw     410  366  385    150  133  140    385  344  362
sarah       402  407  401    130  128  128    371  376  370
random      318  308  311     86   82   84    298  290  292
random ld   343  370  354    108  117  112    321  347  331

against sarah:
                   1                2                L
system        R    P    F      R    P    F      R    P    F
autobio     335  305  317     97   88   92    320  290  302
mead        341  311  324    115  103  108    318  289  302
mead pp     342  326  332    119  112  115    320  305  311
mead sw     404  362  380    142  126  133    377  336  353
glen        407  402  401    128  130  128    374  370  369
random      313  305  308     83   80   81    294  286  289
random ld   335  359  346    102  108  104    316  338  326

5.4 Discussion and Interpretation

It is obvious that our system is performing worse than the MEAD variants in terms of the ROUGE evaluation. However, our system is batting in the same ballpark, especially for biography clusters where our system is able to recover enough events to fill the output. It is important to remember that ROUGE is only measuring whether (runs of) terms appear in both the model biographies and the candidates. This evaluation verifies that we are generating text using the same consecutive runs of words as the human model writers.

When given enough input text, our system does in fact perform as well as the MEAD summarizers at this test. The testing subset reflects this fact. Our entire test set scores are dragged down quite a bit by a few clusters. For instance, the length of the biography for Jim Brown was around 120 words. The short biography loses out on both recall and precision scores: our system has a unigram f-measure of 272, while the MEAD variants score around 400. As we have observed, the small clusters are especially biased towards the sentence extractors.

This evaluation does not measure a major accomplishment of our technique, namely the linear sequencing of content in time. Also, it doesn't verify to what extent we are stating factually correct information.
When one reads through the biographies, one finds the MEAD outputs to be rife with broken anaphora. The text hops from one place and time to another in a haphazard manner.

There were some players for which our system did outperform the extractive summarizer. For example, the ROUGE-L recall score for Maxwell Bentley for our system was 419, while the best MEAD score was 305. There are other players for which our system lagged behind MEAD. For example, the ROUGE-L recall score for Kareem Abdul-Jabbar for our system was 269, while the best MEAD score was 352.

After discussing two critical problems with our system which led to lower ROUGE performance, we examine the two aforementioned player clusters in depth. This will be our opportunity to give at least an informal discussion contrasting the qualitative aspects of MEAD and our system.

5.4.1 Loss due to Unknown Concepts

The main limitation of our knowledge extractor is that it knows only sports concepts. Most of the players studied here have qualities or engaged in things "outside" our conceptual knowledge. This exposes not a limit of our system, but only that our current ontology has a certain conceptual boundary past which we have no ability to understand or represent particulars. Past this horizon we have no authority with which to speak, and thus we cannot author any text for that region.

For example, Bobby Clarke was a diabetic and an athlete. He had a special diet which involved a number of odd foods and eating behaviours. Although our system has a cursory knowledge of the human body (i.e. it can recognize some parts and some forms of injury), it does not have a general knowledge of physiology or of disease. Moreover, it does not know about food, nor the role of energy in sustaining a human, nor of the importance of sugar for a diabetic. Bobby Clarke's diet was singular, special, and famous.
The human biographers both allotted a large chunk of the output space to his eating habits, but our automatic biographer systematically neglects these things because it cannot conceptualize them.

These "outside" concepts are prevalent in the testing corpus. Kareem Abdul-Jabbar was a devout Muslim who changed his name from Ferdinand Lewis Alcindor; Alan Page went on to become a lawyer and a justice with the Minnesota Supreme Court; Jim Brown quit football in the prime of his career to become an actor; etc. Generally, our system will lose out on ROUGE whenever non-sport information is included in a model biography.

It is also interesting to note that, in the case of Alan Page, the Minnesota Supreme Court was actually misclassified as a team (there were plenty of teams from Minnesota, and not enough Supreme Courts labeled as org/other in the training data). The organization does get mentioned in our system's biography. This is an error, and one which will get us ROUGE points for the wrong reasons.

To marginalize the "outside concepts" problem, we could have specified in the summarization problem statement that we wanted a biography that had only to do with the person qua professional athlete. The ultimate solution to the problem is to study more domains: our system should read more text from the many fields of human endeavour in order to gain a more extensive knowledge of what man is.

5.4.2 Poor Event Recall

The other discrepancy is most apparent in summarizations of small player clusters. We had a couple of players who had small clusters with short documents. In this case, we saw it would be difficult for an extractive summarizer not to pick sentences which score high on ROUGE (contrast the average random extractor performance against that of the MEAD variants). Human authors efficiently pack information into sentences, so when the source biographies are roughly the same length as the target, nearly every sentence is suitable for extraction.
On the other hand, our system is stretched to find enough information to fill the entire output, and is not as good at compacting information together into sentences as a human. This helps account for the striking discrepancies in performance on the smaller player clusters.

This problem could have been avoided with a more judicious choice of testing set or with a different summary target length (dependent on the cluster size itself). For example, we could have either selected player clusters which contained at least some threshold of words, or we could have set a target length as a fraction of the overall cluster size. In fact, the solution of targeting smaller lengths would have focused our efforts on the application (summarization), and probably would have demonstrated the superiority of our system. It is this direction that the author would like to investigate further.

5.4.3 Maxwell Bentley

Our performance on the Maxwell Bentley cluster is encouraging. Let us cite the ROUGE scores here:

against both:
                       1                2                L
system            R    P    F      R    P    F      R    P    F
autobio         430  324  369    162  122  139    419  316  360
mead            373  327  348    120  105  112    305  268  285
mead pp         300  315  307    094  098  096    253  266  259
mead sw         316  321  318    089  090  089    284  289  287
random avg      287  278  282    061  059  060    264  256  260
random ld avg   315  325  320    076  078  077    293  303  298

We now cite the model biography written by Glen:

Maxwell Bentley

The small 150-pound Bentley played in the NHL from 1940-1954 for the Chicago Blackhawks, Toronto Maple Leafs, and New York Rangers. When Max was young he was diagnosed with a weak heart and it was recommended that he never play hockey. Ignoring his doctor's advice, the pale and gaunt Bentley made it onto the Chicago Blackhawks where he played on the potent "Pony Line" with brother Doug and Bill Mosienko. In Chicago he won the scoring title two times, edging out legendary Maurice Richard by one point in the 1946-47 season.
In 1947 he was traded to the Toronto Maple Leafs where he reached the peak of his career and led the Leafs to three Stanley Cups. He was traded to the New York Rangers in 1953 and retired following that season. Remembered for his aggressive play, constant motion, and accurate shot, Maxwell Bentley was inducted into the NHL Hall of Fame in 1966 and passed away in 1984.

and the model biography written by Sarah:

Dipsy-Doodle-Dandy of Deslisle

Named for his stick handling ability and hometown roots, this physically weak hearted player went from averaging one to two points per game for the Drumheller Miners to leading the NHL in playoff goals, assists, and points in the 1950-51 season. In the decade leading up to his success with the Maple Leafs, Max was traded from the Miners to the Saskatoon Quakers, then to the Chicago Blackhawks where he, after only three seasons, won the Lady Byng Trophy for ranking third in the scoring race. From Chicago, Max served in the military and led the scoring in the City Senior League for the Calgary Currie Army team. Once back with the Blackhawks, Max was a member of "the Pony Line" and won the scoring title two seasons straight. In 1953, following his glamorous stint with the Maple Leafs, Max finished off his NHL career with the New York Rangers. From there, he retired after a few years of coaching. In addition to winning the Lady Byng Trophy, he was also awarded the Hart Trophy, voted first and second all star teams, was leading scorer two years straight, and was inducted into the Hall of Fame in 1966.

Here is the MEAD SportsWords biography, which obtains the highest ROUGE-L composite f-measure score of the three MEAD systems:

Maxwell Bentley

Max spent the first 2 years of his hockey career with the Drumheller Miners where he averaged close to a goal a game and nearly 2 points per game.
In just his third season, 1942-43, Bentley tallied 70 points on 26 goals which placed him third in the scoring race, just two back of Bill Cowley, and 3 points behind his brother Doug who won the scoring title that year. For the next two seasons, Max served with the Military and played with the Calgary Currie Army team of the City Senior League where he led the league in 1943-44 in goals (18) and points (31) in only 15 games. That same season and the next (edging out Maurice Richard by one point in the last game of the season), Max led the league in scoring, and had the Art Ross Trophy been around, he would have won it both times. In his first year Max played on a line with brother Doug and Mush March, but the following season the coach put Bill Thoms on the line as a policeman for the two high scorers.

And here is our biography:

Maxwell Bentley

Maxwell Bentley was known for his skating, scoring, and stickhandling. Bentley was nicknamed "Dipsy-Doodle Dandy." Bentley weighed 155 pounds. Bentley played 2 seasons with the Saskatoon Quakers and played 2 seasons with the Drumheller Miners. Bentley was teamed on the Pony Line and suffered a kidney, stomach, and throat injury. He became a star and joined the Chicago Black Hawks in 1940. He played 1940-41 with the Providence Reds and won the Art Ross Trophy in 1942, 1945, 1945-46, and 1946-47. He earned the Lady Byng Trophy in 1942, 1943, 1946, and 1946-47 and won the Hart Trophy in 1942, 1943, and 1946. Bentley led the City Senior League in goals and points in 1943-44 and played 1944-45 with the Calgary Currie Army. Bentley was named an All-Star in 1946 and 1947 and was traded to the Toronto Maple Leafs on November 2, 1947. He led the National Hockey League in goals, assists, and points in 1950-51 and was traded to the New York Rangers on August 11, 1953. He played 1953-54 with New York and retired in 1953-54. He was inducted into the Hockey Hall of Fame in 1966.
He died on January 19, 1984.

Although our summary is mechanical and uninteresting to read, it still develops forward in time. The dates of his trades, retirement, and death are specific and agree with the human model summaries.

In contrast, a reader will have a hard time resolving the MEAD biography. It seems the reference "the league" refers to the NHL (and not the City Senior League), because the same sentence references the legendary Maurice Richard. The last sentence ("In his first year...") seems like it should come before the second sentence ("In just his third season...").

Our summary contains some factual errors. The references to seasons are inconsistent, sometimes using a year form, sometimes the range-of-years form. Reading the extracted sentence from the MEAD summary, I believe that some of the Art Ross Trophy wins didn't occur.

It is interesting to note that today's Chicago Blackhawks modified their name from Chicago Black Hawks decades after Bentley played for the team. This is an example where we are penalized in ROUGE because our reference contains three grams with only one (Chicago) in common with the modern team reference expression used by the human authors.

5.4.4 Kareem Abdul-Jabbar

We did not perform well on the Kareem Abdul-Jabbar cluster. We could only generate 170 words. Here are the ROUGE scores:

against both:
                       1                2                L
system            R    P    F      R    P    F      R    P    F
autobio         304  315  309    092  095  093    269  279  274
autobio relax   347  307  326    100  089  094    313  276  293
mead            330  333  331    105  106  105    304  307  305
mead pp         378  345  361    118  108  112    347  317  331
mead sw         373  335  353    114  102  107    352  316  333
random avg      261  264  262    059  060  060    246  249  247
random ld avg   263  308  283    068  079  073    245  288  264

This is the model biography written by Glen:

Kareem Abdul-Jabbar

The basketball world may never see another player dominate the sport for as long as Kareem Abdul-Jabbar did.
Formerly Lew Alcindor before adopting the Islamic faith in 1968, Abdul-Jabbar stood at 7-foot-2 and weighed in at 235 pounds. He had national attention as early as high school and faced scrutiny from critics throughout his high school and college years. In college, he played for UCLA where he led the team to three consecutive NCAA championships. He played his rookie year in the NBA for the Milwaukee Bucks and won the Rookie of the Year award for the 1969-1970 season. He played several seasons with the Bucks and led them to their first NBA title. He was then traded to the Los Angeles Lakers where he teamed up with fellow superstar Magic Johnson to win five more NBA titles. Abdul-Jabbar could dominate his opponents with his ability to score, rebound, pass, defend, and block shots. He is most well-known for his famous "Skyhook" shot that was extremely hard to defend against. At the time of his retirement Kareem Abdul-Jabbar had won six NBA MVP awards and two playoff MVP awards. He still holds many of the NBA's records including most seasons played (20), most games (1560), most minutes played (57,448), most points (38,387), most field goals (15,837), and most blocked shots (3189).

and this is the model biography written by Sarah:

Kareem "The Dream" Abdul-Jabbar

Born Ferdinand Lewis Alcindor. Kareem dominated every level of play that he ever attempted. Three time All American Prep School, he lost once during 117 games. At UCLA freshman are not allowed to play on the varsity team. Kareem led his team to an undefeated season and a victory against his own number one ranked team. With the UCLA Bruins he led them to eighty eight victories with only three losses. These years produced three national championships, a forty seven game winning streak and three MVP awards. In his scholastic efforts Kareem boasted an IQ of 131. Joining the NBA team, the Milwaukee Bucks, he grabbed up the rookie of the year award.
In his second year of play he led the Milwaukee Bucks to a national title. In 1968 Kareem shocked the sporting world with his conversion to Islam. He was unhappy with the publicity this brought him but it never affected his game play. In 1975 he was traded to the Los Angeles Lakers where he spent the next fourteen years. During this time period he led his team to five NBA championship titles. The game may never see another player like Kareem.

Here is the MEAD SportsWords biography, which obtains the highest ROUGE-L composite f-measure score of the three MEAD systems:

Kareem Abdul-Jabbar

Alcindor won the NBA Rookie of the Year award in 1969-70, when he scored 2,361 points and had 1,190 rebounds, and he won the first of his six most valuable player Awards in 1970-71, when he was the league's top scorer with 31.7 points per game. Under Hall of Fame coach John Wooden, Abdul-Jabbar guided the UCLA Bruins to a three-year mark of 88-2, three consecutive NCAA titles (1967-69) and was the first and only player to be named the NCAA Tournament's Most Outstanding Player three times. In only his first of a stellar 20 year career, one that saw Abdul-Jabbar play no less than 65 games a season, the smooth and competitive seven-footer was named NBA Rookie of the Year after averaging 28.8 ppg and 14.5 rebounds for the Milwaukee Bucks. Upon his retirement in 1989, Abdul-Jabbar stood on top of the heap in nine NBA statistical categories, including points scored (38,387), seasons played (20), playoff scoring (5,762), MVP awards (6), minutes played (57,446), games played (1,560), field goals made and attempted (15,837 of 28,307) and blocked shots (3,189).

And here is our biography:

Kareem Abdul-Jabbar

Kareem Abdul-Jabbar was born in New York, United States on April 16, 1947. Abdul-Jabbar was known for his scoring, agility, and hook. He was nicknamed "Big O" and "Skyhook." Abdul-Jabbar stood 7-foot-2.
Abdul-Jabbar was teamed with Wilt Chamberlain and was teamed with Magic Johnson with the Los Angeles Lakers. Abdul-Jabbar was a center and was a freshman. Abdul-Jabbar was a professional and was a player. Abdul-Jabbar was a scoring scorer and was a major factor. Abdul-Jabbar became a force and was named an All-American in 3 games. Abdul-Jabbar played 5 seasons with the Milwaukee Bucks and played 14 seasons with Los Angeles. Abdul-Jabbar captured the Podoloff Cup in 1968, 1971, 1972, and 1974 and won the NCAA Championship in 1968. He won the NBA Championship in 1969, 1980, 1982, 1985, 1987, and 1988 and joined the Bucks in 1971. Abdul-Jabbar was traded by Milwaukee to the Lakers in 1975 and was named an All-Star in 5 seasons. Abdul-Jabbar captured the MVP Award in 1989 and retired in 1989.

The MEAD biography is actually quite good, except for a few redundant references to the Rookie of the Year win. The entire biography consists of four very full sentences. The SportsWords script adds weight for each matched term in a sentence, which favours long sentences containing many frequent sports terms. Although it leaps over the bulk of his career, the biography reads coherently. The last three sentences are extracted from the same source biography.

Kareem Abdul-Jabbar devastates our system on many levels. We could not pick out enough events. Our system feebly renders everything it knows, much of which consists of (usually unimportant) player descriptions about Kareem Abdul-Jabbar. Part of the problem is that there were fewer marked basketball documents in the development corpus, leading to lower markup accuracy in basketball, leading to fewer extracted events. Another problem is that we didn't comprehend any record set type event, which is pertinent here.

Alcindor/Abdul-Jabbar exposes a fundamental problem with the way our system models reference. Our system thinks of a human entity as having a set name with minor variations, but a name is simply a way of referring to someone.
It is an attribute, and it may be changed, sometimes drastically so. Our system regards Ferdinand Lewis Alcindor and Kareem Abdul-Jabbar as two completely different people. Especially here, all the information pertaining to the early career of the basketball legend (the period of time in which his name was commonly Lew Alcindor) is pruned off at the content selection stage. Thus we miss most of the events from this period, and with those events we lose important proper noun phrases.

How important is recognizing that the two names refer to the same person? As a test, the author relaxed the filtering stage from the content selector, then regenerated the Kareem Abdul-Jabbar biography. The biography produced was:

Kareem Abdul-Jabbar

Kareem Abdul-Jabbar was born in New York, United States on April 16, 1947. He was known for his scoring, agility, and hook. Abdul-Jabbar was nicknamed "Big O" and "Skyhook." Lew Alcindor weighed 235 pounds, and he stood 7-foot-2. Alcindor stood 7-foot-2. Alcindor was teamed with Oscar Robertson with the Milwaukee Bucks, and he was teamed with Wilt Chamberlain. Abdul-Jabbar was teamed with Magic Johnson with the Los Angeles Lakers and became a force. Alcindor played 1965 and 1969 with the UCLA Bruins, and he was named an All-American in 3 games. Abdul-Jabbar played 5 seasons with Milwaukee and played 14 seasons with the Lakers. Alcindor won the NCAA Championship in 1967 and 1969, and he won the Podoloff Cup in 1968, 1971, 1972, and 1974. Abdul-Jabbar earned the NCAA Championship in 1968 and earned the NBA Championship in 1969, 1980, 1982, 1985, 1987, and 1988. Alcindor captured the NBA Championship in 1970 and 1970-71, and he joined Milwaukee in 1971. Abdul-Jabbar was traded by Milwaukee to Los Angeles in 1975 and was named an All-Star in 5 seasons. He earned the MVP Award in 1989 and retired in 1989. Alcindor was named an All-American in 3 seasons.

This biography was scored with ROUGE.
The scores are tabled above in the autobio relax row. The biography now contains many redundant statements, but the scores are better.

Chapter 6 Conclusion

6.1 Conclusion

We have demonstrated a multiple biographical document summarization system that is able to address the concerns of multiple document summarization. This method has many positive characteristics and many directions (and many well-defined tasks) in which to grow and flourish. This system is a first attempt, a baseline for our approach: each component is functional, but only barely so. The purpose was to show that, by applying unsophisticated techniques to a simple domain, we are able to understand unrestricted biographical text, to recover some basic propositions from the text, to validate these statements according to domain rules, to compare the claims of multiple authors, and to produce new biographical documents exhibiting the essential characteristics of such a text. It will be interesting to see the system's performance increase as better, more sophisticated methods are swapped in.

We have argued against sentence extraction. Multi-document plagiarism is headed for a dead end. The basic philosophic argument is that a method limited to the perceptual level cannot adequately deal with conceptual material. As we have stated before, there are only two methods of generating text: forming an original statement, or copying second-hand the work of another. The former requires an involved process of domain learning and fact-finding before an original statement can be conceived; it requires a process which builds toward a unified, consistent set of facts and forms. The latter requires a method to pick sentences.

To be original, one need not invent new terms. In fact, one may be original using the same words and forms as other authors. For originality is not always found in the words or in the forms, but in the piece.
For machine summarization, it is enough to form original documents in clean, simple language using clear, standard terms.

The fundamental distinction between our system and extractive regimes is that our system can err, while an extractive system writes only derivative statements. When one plagiarizes statements without comprehending the meaning, those quoted statements become arbitrary. The bond to reality, the status as truth or falsehood, is obliterated. The false is better than the arbitrary: something which is false at least has some relation to the facts of reality, and so one can determine where in the process the error occurred. This point is apparent in our system. One can say: here is the error, you have mistaken (mistagged) a city for a team. Or, you have misidentified the referent for this particular phrase. Or, you have determined two terms to be synonymous that are not. Or, you have believed an author who was consistent but wrong.

6.2 Future Work

We have learned about a species of man (the athlete). We have an intense knowledge of three sports. The development of the ontology (the differentiation of the original base concepts) corresponds to an intensification of knowledge about three sports.

Our system is not complete. We have examined shortcomings in method throughout the development of our approach. The main problem is that we do not pick one single parse of each sentence as correct. Most of the spurious events arise from the multiple parses. This is the first place that the author intends to improve. The proper parse tree, with complete semantic decoration, must be singled out. This is also necessary for efficiency.

This requires that we refine our semantic labeling technique. A first step is to maximize over the assignment of semantic labels to the entire sentence. A second step is to enlarge the feature set; in particular, distributions of the numerical values observed for the various semantic classes would aid classification.
A third step is to abstract out sport-specific matter (the end position in football), general sport information (the coach position, a game), and general human information (like parts of the body, or events common to a human life, like birth). A parallel fourth step is to use class-wide information in the classifier: currently, we see every semantic class/type pair as independent, although arranging them in the two-level hierarchy clearly implies that types from the same class are related.

We developed a simple user interface for the annotation of parse data. This was our source of training data, and the author has every intention of using it to mark other types of text. However, annotation is tedious and lonesome. We must back it up with an epistemological system which is striving to learn: we need to interface it with the parser and semantic analyzer so that the human teacher is consulted as an expert. The interface must behave greedily, presenting to the user the examples which are the most obscure and problematic to its current markup model.

Further, for lack of marked data, we did not quantify how well some of our subsystems were performing. For example, we are not sure how well the reference resolution system is performing. Adding an interface method to graphically link coreferring expressions together would help us gain marked data for this subtask. It may also be a rewarding task for the user to link together their annotated phrases.

The author is disappointed that so much energy was devoted to the epistemological system and comparatively little to the generation side of summarization. The content selection module underutilizes the ontology, and the content planner forms a dry sentence from unrelated components. For selecting content, we could employ a technique similar to that of the SportsWords MEAD variant.
Using the multi-document training players, we could assume that the shortest biography is a good target, and then we could automatically determine importance measures for the frame types and their constituents. For planning content, we should simply observe which items are typically fused together into complex sentences.

The ability to climb all the way up to a knowledge representation, then to scale down to original surface text, is intriguing. Now we can come full circle: we can understand what we write. We can now mimic the human authoring process: generate text, suspend knowledge about what we meant or intended to say, read back the text, and determine whether we can unambiguously recover what we meant to say.

The author believes that one of the most exciting applications of this method is the problem of updating documents. We term this document contemporization. For example, at the time of writing, Marcel Dionne is the current all-time team leader of the Los Angeles Kings in goals. This fact is stated in a number of biographical documents about Dionne. However, it is expected that Luc Robitaille will soon surpass Dionne's mark. When this occurs, the statements in the Dionne biographies will become false and will be in need of updating. The fact that documents become obsolescent is a most convincing argument that summarization must not be confined to a limited cluster of documents.

Summarization is the next step for information retrieval. Combined with a graphical user interface, it has amazing potential for indexing and navigating related fields of documents. It would be interesting to interface the epistemological system so that a user could trace back a statement made by the summarizer to the place(s) in the original document(s) from which the fact was derived.

Bibliography

[1] Rand, Ayn. Introduction to Objectivist Epistemology, Meridian, 1990.
[2] Peikoff, Leonard. Objectivism: The Philosophy of Ayn Rand, Meridian, 1993.
[3] Bambrough, Renford. The Philosophy of Aristotle, Signet Classic, 2003.
[4] Bikel, Daniel M., Schwartz, Richard, Weischedel, Ralph M. An Algorithm that Learns What's in a Name, Machine Learning, 1999.
[5] Charniak, Eugene. A maximum-entropy-inspired parser, Brown University Technical Report CS99-12, 1999.
[6] Palmer, David D., Hearst, Marti A. Adaptive Sentence Boundary Disambiguation, Proceedings of the Applied Natural Language Processing Conference, Stuttgart, October 1994.
[7] Grover, C., Mikheev, A., Moens, M. LT TTT - A flexible tokenisation tool, Proceedings of the Second International Conference on Language Resources and Evaluation, 2000.
[8] Siddharthan, Advaith. Syntactic Simplification and Text Cohesion, PhD thesis, University of Cambridge, 2003.
[9] Siddharthan, Advaith, Nenkova, Ani, McKeown, Kathleen. Syntactic Simplification for Improving Content Selection in Multi-Document Summarization, 20th International Conference on Computational Linguistics (COLING 2004), Geneva, Switzerland, 2004.
[10] New York University. The Proteus Project, 2004.
[11] Ferro, L., Mani, I., Sundheim, B., Wilson, G. TIDES Temporal Annotation Guidelines Draft - Version 1.02, MITRE Technical Report, 2001.
[12] Mani, I., Wilson, G. Robust Temporal Processing of News, Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics, 2000.
[13] Mani, I., Schiffman, B., Zhang, J. Inferring Temporal Ordering of Events in News, Proceedings of the Human Language Technology Conference, 2003.
[14] Kim, Sanghee, Alani, Harith, Hall, Wendy, Lewis, Paul H., Millard, David E., Shadbolt, Nigel R., Weal, Mark J. Artequakt: Generating Tailored Biographies with Automatically Annotated Fragments from the Web, Proceedings of Workshop on Semantic Authoring, Annotation and Knowledge Markup, the 15th European Conference on Artificial Intelligence, pp. 1-6, Lyon, France, 2002.
[15] Kim, Sanghee, Alani, Harith, Hall, Wendy, Lewis, Paul H., Millard, David E., Shadbolt, Nigel R., Weal, Mark J. Automatic Ontology-based Knowledge Extraction and Tailored Biography Generation from the Web, IEEE Intelligent Systems 18(1), pp. 14-21, 2003.
[16] Geurts, Joost, Bocconi, Stefano, van Ossenbruggen, Jacco, Hardman, Lynda. Towards Ontology-driven Discourse: From Semantic Graphs to Multimedia Presentations, Proceedings of the Second International Semantic Web Conference, 2003.
[17] Luhn, H.P. The Automatic Creation of Literature Abstracts, IBM Journal of Research and Development, 1958.
[18] Radev, Dragomir, Blair-Goldensohn, Sasha, Zhang, Zhu. Experiments in single and multi-document summarization using MEAD, DUC 01 Conference Proceedings, 2001.
[19] Carenini, Giuseppe, Ng, Raymond, Pauls, Adam. Multi-Document Summarization of Evaluative Text, EACL (submitted), 2006.
[20] Mani, Inderjeet, Bloedorn, Eric. Multi-document Summarization by Graph Search and Matching, Proceedings of the 15th National Conference on Artificial Intelligence (AAAI-97), Providence, RI, pp. 622-628, 1997.
[21] Utiyama, Masao, Hasida, Koiti. Multi-Topic Multi-Document Summarization, Proceedings of COLING (2000), pp. 892-898, 2000.
[22] Carbonell, Jaime G., Goldstein, Jade. The use of MMR, diversity-based reranking for reordering documents and producing summaries, Research and Development in Information Retrieval, pp. 335-336, 1998.
[23] Hovy, Eduard, Lin, Chin-Yew. Automated Text Summarization in SUMMARIST, Advances in Automatic Text Summarization, 1999.
[24] Leskovec, Jure, Grobelnik, Marko, Milic-Frayling, Natasa. Learning Sub-Structures of Document Semantic Graphs for Document Summarization, Proceedings of LinkKDD 2004, August, Seattle, WA, 2004.
[25] Lacatusu, V. Finley, Maiorano, Steven J., Harabagiu, Sanda M. Multi-Document Summarization using Multiple-Sequence Alignment, 2004.
[26] Appelt, Douglas E., Israel, David J. Introduction to Information Extraction Technology, IJCAI-99, 1999.
[27] Jing, Hongyan, Barzilay, Regina, McKeown, Kathleen, Elhadad, Michael. Summarization Evaluation Methods: Experiments and Analysis, AAAI Symposium, 1998.
[28] Papineni, Kishore, Roukos, Salim, Ward, Todd, Zhu, Wei-Jing. BLEU: A Method for Automatic Evaluation of Machine Translation, Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 2002.
[29] NIST. Automatic Evaluation of Machine Translation Quality Using N-gram Co-Occurrence Statistics, Automatic Evaluation of MT Quality, NIST.
[30] Lin, Chin-Yew. Cross-Domain Study of N-grams, Co-Occurrence Metrics - A Case in Summarization, MT Summit IX, 2003.
[31] Lin, Chin-Yew, Hovy, Eduard. Automatic Evaluation of Summaries Quality Using N-gram Co-Occurrence Statistics, Proceedings of the Human Technology Conference, 2003.
[32] Zhou, Liang, Ticrea, Miruna, Hovy, Eduard. Multi-Document Biography Summarization, Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, 2004.
[33] Conroy, John M., Schlesinger, Judith D., Goldstein, Jade, O'Leary, Dianne P. Left-Brain / Right-Brain Multi Document Summarization, DUC 04 Conference Proceedings, 2004.
[34] Erkan, Gunes, Radev, Dragomir R. The University of Michigan at DUC 2004, DUC 04 Conference Proceedings, 2004.
[35] Nobata, Chikashi, Sekine, Satoshi. CRL/NYU Summarization System at DUC 2004, DUC 04 Conference Proceedings, 2004.
[36] Vanderwende, Lucy, Banko, Michele, Menezes, Arul. Event-Centric Summary Generation, DUC 04 Conference Proceedings, 2004.
[37] Miller, George. WordNet: A Lexical Database for English, Communications of the ACM 38(11), pp. 39-41, 1995.
[38] Litkowski, Kenneth C. Summarization Experiments in DUC 2004, DUC 04 Conference Proceedings, 2004.
[39] Blair-Goldensohn, Sasha, Evans, David, Hatzivassiloglou, Vasileios, McKeown, Kathleen, Nenkova, Ani, Passonneau, Rebecca, Schiffman, Barry, Schlaikjer, Andrew, Siddharthan, Advaith, Siegelman, Sergey. Columbia University at DUC 2004, DUC 04 Conference Proceedings, 2004.
[40] Reiter, Ehud, Dale, Robert. Building Natural Language Generation Systems, Studies in Natural Language Processing, Cambridge University Press, 2000.
[41] Hachey, B., Grover, C. Extractive Summarization of Legal Texts, Artificial Intelligence and Law: Special Issue on E-government, 2003.
[42] Moens, M. F., Uyttendaele, C., Dumortier, J. Abstracting of Legal Cases: The SALOMON Experience, The Sixth International Conference on Artificial Intelligence and Law, pp. 114-122, 1997.
[43] Farzindar, Atefeh, Lapalme, Guy. Legal Texts Summarization by Exploration of the Thematic Structures and Argumentative Roles, Text Summarization Branches Out Conference held in conjunction with The Association for Computational Linguistics 2004 (ACL'04), pp. 27-38, Barcelona, Spain, July 2004.
[44] Knight, Kevin, Marcu, Daniel. Summarization beyond sentence extraction: a probabilistic approach to sentence compression, ACM Journal of Artificial Intelligence, Volume 139, Issue 1 (July 2002), pp. 91-107, 2002.
[45] Nenkova, A., McKeown, K. References to Named Entities: A Corpus Study, Proceedings of the NAACL-HLT 03, 2003.
[46] Blair-Goldensohn, S., McKeown, K., Schlaikjer, A. A Hybrid Approach for QA Track Definitional Questions, Proceedings of the 12th Text Retrieval Conference TREC, 2003.
[47] Barzilay, Regina, McKeown, Kathleen, Elhadad, Michael. Information Fusion in the Context of Multi-Document Summarization, Proceedings of the 37th Association for Computational Linguistics, Maryland, 1999.
[48] Radev, Dragomir, McKeown, Kathleen. Generating Summaries of Multiple News Articles, Proceedings of the Eighteenth Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR), pp. 74-82, Seattle, WA, 1995.
[49] Mani, I., Maybury, M.T. Advances in Automatic Text Summarization, MIT Press, 1999.
[50] Brill, Eric. Transformation-based Error-driven Learning and Natural Language Processing: A case study in part of speech tagging, Computational Linguistics, 1995.
[51] Chang, Chih-Chung, Lin, Chih-Jen. LIBSVM - A Library for Support Vector Machines, cjlin/libsvm/.
[52] Joachims, Thorsten. Making large-Scale SVM Learning Practical, Advances in Kernel Methods - Support Vector Learning, B. Scholkopf and C. Burges and A. Smola (ed.), MIT Press, 1999.
[53] Pustejovsky, James. TimeML: Markup Language for Temporal and Event Expressions, 2004.
[54] Columbia University. Columbia Newsblaster, 2004.
[55] University of Michigan. News-in-Essence, 2004.
[56] Wikimedia Foundation, Inc. Wikipedia, the free encyclopedia, 2004.
[57] Lin, Chin-Yew. ROUGE: Recall Oriented Understudy for Gisting Evaluation, cyl/ROUGE, 2005.
[58] NIST. Document Understanding Conference, 2004.
[59] Litkowski, Kenneth. CL Research KM, 2005.
[60] Sun Microsystems. Java Swing, 2005.
[61] Apache Foundation. Xerces, 2005.
[62] Simpson, John E. XPath and XPointer: Locating Content in XML Documents, O'Reilly, 2002.
[63] Jurafsky, Daniel, Martin, James H. Speech and Language Processing, Prentice-Hall, 2000.
[64] Hopper, Vincent F., Gale, Cedric, Foote, Ronald C. A Pocket Guide to Correct Grammar, Barron's Educational Series, 1997.
[65] Random House. Webster's Encyclopedic Unabridged Dictionary of the English Language, Portland House, 1989.

Appendix A

A.1 Model Biographies

We procured two full sets of biographies for the players in our testing set. These model biographies were produced by human authors. We reproduce here our model writing instructions document.
We then reproduce the model document sets.

A.1.1 Instructions

You are given a collection of biographical documents related to a specific athlete. You are to read and comprehend these documents. You are to produce a biographical summary of no more than 200 words. The target is a single paragraph. Include the information that you feel is most relevant to the biographical subject. It is assumed that you are familiar with sport in general, and have possibly read outside biographies related to the subject. However, please strive to include only that information (facts, events, etc.) directly available from the collection of documents.

Your output target should contain nearly 200 words without going over that limit. We define a word to be a string of characters separated by whitespace or breaking punctuation (contractions and hyphenated words are considered to count as a single word). Do not count punctuation as a word.

Please return your summaries, along with your completed questionnaire, to the author of this document. Thank you for participating in this experiment.

A.1.2 Glen's Summaries

Jim Brown

Jim Brown is considered by many to be the greatest running back in NFL history. An exceptional all-around athlete, he not only excelled in football, but professional fighting, baseball, basketball, lacrosse, and track as well. Brown played fullback for the Cleveland Browns from 1957 to 1966. In his rookie year he gained 942 yards on 202 carries and was unanimously named rookie of the year. During his stay in Cleveland, he was named to the All-Pro team eight times and won the MVP three times. He rushed for more than 1000 yards seven times and led the league in rushing every season he played except one. When Brown announced his retirement at the height of his career at age 29 to pursue an acting career he had a total of 12,312 yards gained and numerous entries in the NFL's record book.
After his retirement he appeared in a few motion pictures, including 1967's "The Dirty Dozen". Brown has inducted into the NFL Hall of Fame in 1971 and currently works with young adults caught up in the gang scene in Los Angeles, California.

James Lofton

James Lofton spent his NFL career with the Green Bay Packers, Los Angeles Raiders, Buffalo Bills, Los Angeles Rams, and Philadelphia Eagles. An All-American wide receiver from Stanford University, Lofton was also an accomplished track star in university. As the first round draft pick by the Packers in 1978, Lofton was named rookie of the year after catching 46 passes for 818 yards and 6 touchdowns. Lofton was the first NFL player to score a touchdown in the 70s, 80s, and 90s. Many of these touchdowns were the result of Lofton simply outrunning the competition using his great speed. His excellent play earned him several Pro Bowl appearances and he was only the fifth NFL player to gain more than 1000 receiving yards. After James Lofton retired he was the NFL's all-time leader in reception yardage, but this record has since been surpassed by Jerry Rice. After spending several years in broadcasting for a number of different networks, Lofton became the receiving coach for the San Diego Chargers in 2002.

Maxwell Bentley

The small 150-pound Bentley played in the NHL from 1940-1954 for the Chicago Blackhawks, Toronto Maple Leafs, and New York Rangers. When Max was young he was diagnosed with a weak heart and it was recommended that he never play hockey. Ignoring his doctor's advice, the pale and gaunt Bentley made it onto the Chicago Blackhawks where he played on the potent "Pony Line" with brother Doug and Bill Mosienko. In Chicago he won the scoring title two times, edging out legendary Maurice Richard by one point in the 1946-47 season. In 1947 he was traded to the Toronto Maple Leafs where he reached the peak of his career and led the Leafs to three Stanley Cups.
He was traded to the New York Rangers in 1953 and retired following that season. Remembered for his aggressive play, constant motion, and accurate shot, Maxwell Bentley was inducted into the NHL Hall of Fame in 1966 and passed away in 1984.

Bob Cousy

Bob Cousy, nicknamed the "Cooz", the "Houdini of Harlem", and the "Mobile Magician", is a former basketball star who played for the Boston Celtics from 1951-63 and the Cincinnati Royals from 1969-70. The Cooz is best remembered for bewildering his opponents with his amazing dribbling, passing skills, and play-making abilities. Each year that he played in the NBA he was chosen as a member for the all-star team. He lead the league in assists eight years in a row, was the all-star MVP in 1954 and 1957, and was the most valuable player for the 1957 NBA regular season. As a member of the Celtics he also captured six NBA titles (1957, 1959-63). When Cousy retired in 1963, he held the NBA record for most assists with 6949, was second in games played with 917, and was fourth in scoring with 16,955 points. Cousy was also one of ten players named to the NBA's silver anniversary team in 1971. Bob Cousy coached Boston College from 1964-69 and then returned to the NBA as a player-coach for the Cincinnati Royals in 1970. He resigned suddenly in 1973 and became television commentator for Boston Celtics games.

Alan Page

Alan Page was a star defensive lineman in the NFL from 1967-81 and later went on to have a distinguished legal career. Starting as a defensive end in college, Page was drafted to the Minnesota Vikings in 1967 as a defensive tackle. He became a starter just four games into his rookie year and was named defensive player of the year in both 1971 and 1973. Page had his best seasons in Minnesota, earning All-American league six times and being named the NFL's most valuable player in 1971, the first defensive player to ever earn such an honour.
He was released by the Vikings in 1978 but was quickly picked up by Chicago, where he remained a starter until he retired in 1981. He is most remembered for his speed and quickness on the field which helped him achieve a career total 24 opponents' fumbles, 28 blocked kicks, 164 sacks, and 1431 tackles. After his retirement, Page became a lawyer and was elected to the Minnesota Supreme Court in 1992. He was inducted into the NFL Hall of Fame in 1988.

Alex Delvecchio

Alex Delvecchio was a talented and classy star in the NHL from 1951-73. Delvecchio, or "Fats" as his teammates referred to him on account of his round face, spent his entire career on the Detroit Red Wings from their glory days in the 1950s to their dismal 1970s. Alex is most remembered as part of the famed "Production Line" centering Gordie Howe and Ted Lindsay. During his days in Detroit Alex captured 3 Stanley Cups and was awarded three Lady Byng trophies for most gentlemanly player in the NHL. By the time of Delvecchio's retirement in 1973 he was second only to Gordie Howe with 1549 games played, 825 assists, and 1281 points. After his retirement he became the intermittent head coach and General Manager of the Red Wings until 1977. Upon leaving the NHL Delvecchio achieved some success in business. Alex "Fats" Delvecchio was inducted into the Hall of Fame in 1977 and his number 10 jersey was retired to the rafters of Joe Louis Arena in 1991.

Neil Johnston

Neil Johnston was one of the NBA's most prolific scorers. He played college basketball at Ohio State for two seasons and also signed with baseball's Philadelphia Phillies as a pitcher. Luckily for basketball fans, an injury forced Johnston to quit baseball and he was signed by the Philadelphia Warriors in 1951. Neil "Gabby" Johston led the NBA in scoring three years in a row and led in field goal accuracy three years in a row. He also led the NBA in rebounds for one year.
The 6'8", 200 pound center was most notorious for his virtually unstoppable hook shot. When a serious knee injury forced Johnston to retire in 1959, he had appeared in six All-Star games and led the Warriors to the 1956 NBA championship. In 1961, Neil Johnston became the coach for the Warriors. Two years later he was fired after a mediocre 53-50 record as coach.

Kareem Abdul-Jabbar

The basketball world may never see another player dominate the sport for as long as Kareem Abdul-Jabbar did. Formerly Lew Alcindor before adopting the Islamic faith in 1968, Abdul-Jabbar stood at 7-foot-2 and weighed in at 235 pounds. He had national attention as early as high school and faced scrutiny from critics throughout his high school and college years. In college, he played for UCLA where he led the team to three consecutive NCAA championships. He played his rookie year in the NBA for the Milwaukee Bucks and won the Rookie of the Year award for the 1969-1970 season. He played several seasons with the Bucks and led them to their first NBA title. He was then traded to the Los Angeles Lakers where he teamed up with fellow superstar Magic Johnson to win five more NBA titles. Abdul-Jabbar could dominate his opponents with his ability to score, rebound, pass, defend, and block shots. He is most well-known for his famous "Skyhook" shot that was extremely hard to defend against. At the time of his retirement Kareem Abdul-Jabbar had won six NBA MVP awards and two playoff MVP awards. He still holds many of the NBA's records including most seasons played (20), most games (1560), most minutes played (57,446), most points (38,387), most field goals (15,837), and most blocked shots (3189).

Joe Dumars

Joe Dumars was a consistent all-around basketball player for the Detroit Pistons from 1986-1999. Dumars was drafted in the first round by the Pistons and spent one year as a backup until becoming a starter in 1987.
He was soon recognized as one of the finest defensive players in the league. After working on his shooting, Joe Dumars also became an offensive threat by averaging 20-24 points per game during the peak of his career. He was named to the NBA's All-Rookie Team in 1986 and to the NBA's all-defensive team in 1989, 1990, 1992, and 1993. He won two NBA championships with the Pistons and is also known for coach Chuck Daly's "Jordan Rules" defensive playbook which forced the Chicago Bulls to change their offensive strategy to include less emphasis on Michael Jordan and more emphasis on the other members of the team. He retired in 1999, having spent his entire career with the Detroit Pistons. He leads the Pistons in games played (1018) and three-point field goals (990) and is second in points (16401), assists (4612), and steals (902). Joe Dumars is currently the Piston's President of Basketball Operations.

Tom Gola

Tom Gola was a star basketball player for Philadelphia at all levels. He starred at LaSalle High School and shocked the basketball world by opting to play college basketball for his local LaSalle University in Philadelphia. At LaSalle he was an instant star, playing every offensive position and leading LaSalle to 1952 NIT and 1954 NCAA championships. His college coach nicknamed him "Mr All-Around" for his ability to single-handedly control a game by himself. During a ten year NBA career with the Philadelphia Warriors, San Francisco Warriors, and New York Knicks he won one NBA title with Philadelphia and scored 7871 points (11.3 ppg). After retiring, Gola coached LaSalle to a 23-1 record in 1968-69 and the team was ranked second in the nation. Tom Gola is one of only two players to ever win NIT, NCAA, and NBA championships.

Bobby Clark

Bobby Clark was the leader of the NHL's Philadelphia Flyers from 1969-84. Originally from Flin Flon, Manitoba, Bobby Clark was drafted by the Flyers in spite of his diabetic condition.
His diet antics would later become famous: two cans of Coca-Cola and three spoonfuls of sugar before a game, two bottle of orange juice between periods, and plenty of chocolate bars on hand during the periods. Bobby helped lead the Flyers, or "Broad Street Bullies" as there were also known due to their aggressive style of play, to back-to-back Stanley Cups in 1974 and 1975. The fiery scrappy redhead with plenty of missing teeth was named the league's most valuable player three times (1973, 1975, 1976) and finished his fifteen season career with the Flyers with 358 goals and 852 assists. In addition to his strong regular shifts, Clark was a top notch penalty killer and was a regular on the Flyers' power play unit. He was also one of the NHL's best face-off men. After retiring from the league as a player, Bobby Clark went on to work as an assistant coach for the Flyers from 1979-1982. He is currently the president and general manager of the Philadelphia Flyers.

Eddie Shore

Eddie "The Edmonton Express" Shore was an offensive defenseman who played for the NHL's Boston Bruins for the majority of his career. Shore was best known for his end-to-end rushes, crushing body-checks, and nasty disposition. He was also the first defenseman to be a scoring threat on the ice. In the 1928-29 season, Eddie Shore led the Bruins to their first ever Stanley Cup. Ten years later he won another Stanley Cup with the Bruins. Shore was named to the NHL All-Star team eight out of nine years that the team was selected and he was the only NHL defenseman to win four Hart Trophies for the NHL's most valuable player. After his retirement Shore purchased the American Hockey League's Springfield Indians and operated the team for thirty-five years. Today, the award for the AHL's best defenseman is honoured with his name. Eddie Shore was elected into the Hockey Hall of Fame in 1947 and passed away in 1985.

A.1.3 Sarah's Summaries

Jim Brown

James Nathaniel Brown was born February 17, 1936. Known for his domination on the football field as a running back in the NFL. An all around athlete in high school, Brown earned thirteen letters at Manhasset High School in football, basketball, baseball, lacrosse and track and field. He turned down the New York Yankees and became an All American at Syracuse University in both football and lacrosse. At six foot two and two hundred and twenty eight pounds he was an amazing sprinter. He joined the Cleveland Browns in 1957 and was named Rookie of the Year, gaining 942 yards on 202 carries. Over the next eight years he rushed 1000 yards seven times. He shocked the football world when at twenty nine he was retiring from football to focus on acting. He went on to perform in such movies as "The Dirty Dozen."

James Lofton

He was All-American at Stanford and an accomplished track and field star too. Played wide receiver and was the number one draft pick of the Green Bay Packers as there first round draft choice. He was awarded Rookie of the Year, catching forty six passes for eight hundred and eighteen yards and six touchdowns. In 1986 he had his football career put on hold. He was accused of sexual assault and suspended by the league on the final game of the season. He was then traded to the Oakland Raiders. After a moderate showing over two years he was traded to the Buffalo Bills in 1989. Unlike most players who slump under controversy, Lofton made good on his time in Buffalo becoming a league dominator. When he retired, he was the all time leader in reception yardage, surpassed only by the great Jerry Rice. Over the next years he took the broadcasting chair at CNN, FOX and NBC. In 2002 he took a position as the San Diego Chargers receivers coach.
Dipsy-Doodle-Dandy of Deslisle

Named for his stick handling ability and hometown roots, this physically weak hearted player went from averaging one to two points per game for the Drumheller Miners to leading the NHL in playoff goals, assists, and points in the 1950-51 season. In the decade leading up to his success with the Maple Leafs, Max was traded from the Miners to the Saskatoon Quakers, then to the Chicago Blackhawks where he, after only three seasons, won the Lady Byng Trophy for ranking third in the scoring race. From Chicago, Max served in the military and led the scoring in the City Senior League for the Calgary Currie Army team. Once back with the Blackhawks, Max was a member of "the Pony Line" and won the scoring title two seasons straight. In 1953, following his glamorous stint with the Maple Leafs, Max finished off his NHL career with the New York Rangers. From there, he retired after a few years of coaching. In addition to winning the Lady Byng Trophy, he was also awarded the Hart Trophy, voted first and second all star teams, was leading scorer two years straight, and was inducted into the Hall of Fame in 1966.

Bob Cousy

This "Houdini of the Hardwood" was drafted to the Celtics by default his first year after becoming and All-American and winning NCAA championship for Holy Cross. After the team he had originally been chosen for folded, his name was drawn third out of three by the Celtics in 1950. Cousy ranked fourth in assists in the league in his rookie year, followed by second in his second year, then led the league for the remainder of his career. He was a first team all star ten times and was one of ten players named to the NBA's silver anniversary team in 1971. He was voted the All-Star game Most Valuable Player in 1954 and 1957 and led the team to six titles. At the end of his playing career, Cousy went on to coach Boston College to 114 wins and five appearances in the National Invitation Tournament in six seasons.
His final victory lap involved coaching and playing for NBA's Cincinnati Royals, although he was only able to play in seven games during the season. After that, Cousy remained involved in the sport, commentating and was named one of the 100 greatest athletes by ESPN.

Alan Page

One of four consensus All-Americans from Notre Dame in 1966, Alan Page was a defensive end for the national champions that year. Aggression and intelligence made up for Page's size disadvantage and he was signed to defensive tackle when he was picked (second) by the Minnesota Vikings in 1967. NFL's Player of the Year had never been won by a defensive player before it was awarded to Page in 1971. That year, he was also given the honour of United Press International award as National Football Conference Player of the Year. Although released from the Vikings as the result of an intentional weight loss program, Page didn't miss a game in his fifteen pro seasons. He was signed by the Chicago Bears and put on the starting line before the start of the following game. He started for them for four years before retiring. Throughout his career, Page played in four superbowls, and won four NFL/NFC title games. He earned All-Pro honours six times, voted to nine straight Pro Bowls, and was named to ten all-conference teams. Page earned a law degree while playing pro football and was inducted into the International Scholar-Athlete Hall of Fame in 2002.

Alex Delvecchio

"Best Supporting Team Mate" for the better part of twenty four seasons, Alex Delvecchio holds the record for number games played for one specific team. His grace on the ice kept him with the Detroit Red Wings for his entire professional career, including positions as head coach and general manager. "Fats" (as he was called by teammates) centred Gordie Howe and Ted Lindsay on "the Production Line" for the Red Wings during their glory days. He played an integral part in the team's success at the Stanley Cup Finals three times.
Easily the most durable player in the league, Delvecchio's only major injury was a twisted ankle that took him out for twenty two games in the 1956-57 season. Throughout his career Delvecchio was awarded the Lady Byng Trophy three times, voted to the NHL All Star team twice, and made 13 All Star game appearances. He was voted the captain of the team in 1962 and remained captain until his retirement in 1974. Alex was only second to teammate Gordie Howe in points, assists, and games played throughout his stint with the Red Wings. In 1977, he was inducted into the Hockey Hall of Fame and in 1991 his number 10 was retired by the Detroit Red Wings.

Neil Johnston

While baseball was his passion, basketball was Johnston's career. His sore arm and bad luck in baseball turned Neil Johnston away from baseball and onto the court for the Philadelphia Warriors in 1951. After playing as a substitute in his first year, he was moved to become a starter for his remaining seven seasons. During this time he averaged more than 20 points per game and led the NBA scoring for three years in a row. Famous for his "virtually unstoppable hook shot", Johnston was an NBA first-team all star four times. He teamed up with fellow Hall of Famers Arizin and Gola to lead the Warriors to the NBA championship in 1956. He appeared in six all star games and, in 1953-54, celebrated five of the six top individual game highs, and led the NBA against Syracuse by 50 points. A knee injury in 1959 ended his playing career, but Johnston went on to coach the Warriors, winning ninety-five games in two seasons. Johnston attempted to play ball again in 1961 as a player-coach for the Pittsburg Condors in the American Basketball League, but was only able to play in five games as a result of his pre-existing knee injury.

Kareem "The Dream" Abdul-Jabbar

Born Ferdinand Lewis Alcindor. Kareem dominated every level of play that he ever attempted.
Three time All American Prep School, he lost once during 117 games. At UCLA freshman are not allowed to play on the varsity team. Kareem led his team to an undefeated season and a victory against his own number one ranked team. With the UCLA Bruins he led them to eighty eight victories with only three losses. These years produced three national championships, a forty seven game winning streak and three MVP awards. In his scholastic efforts Kareem boasted an IQ of 131. Joining the NBA team, the Milwaukee Bucks, he grabbed up the rookie of the year award. In his second year of play he led the Milwaukee Bucks to a national title. In 1968 Kareem shocked the sporting world with his conversion to Islam. He was unhappy with the publicity this brought him but it never affected his game play. In 1975 he was traded to the Los Angeles Lakers where he spent the next fourteen years. During this time period he led his team to five NBA championship titles. The game may never see another player like Kareem.

Joe Dumars

Joe Dumars led his teams as a pillar of teamwork and sportsmanship. Born May 24, 1963 in Shreveport, Louisiana. He played shooting guard and point guard. Drafted first round from McNeese State University to the Detroit Pistons. He spent his entire career in Detroit, from 1985 to 1999. He helped his team win two NBA Championships in 1989 and 1990 and earned the 1989 Finals MVP. The next year, along with Dennis Rodman, he was a pillar of coach Chuck Daly's "Jordan Rules" defensive strategy which forced the dominating Chicago Bulls to make more of a team effort and less of a Michael Jordan spectacle. He was a five time All-Star and a four time All-Defensive first team selection. His #4 jersey was retired by the Pistons in 2000. He continues to work for the Pistons to this day becoming the President of Basketball Operations.
Eager to prove himself he worked to build a strong team and 2004 he brought Detroit another NBA championship title with a team that will contend for many more.

Gola

"Mr All-Around" was a Philadelphia born star. Opting to stay local to play college ball, Gola was voted MVP in his freshman year when LaSalle won the National Invitational Tournament. The next three seasons showed him a concensus All-American and tournament MVP when his team won the 1954 NCAA championship. He still holds the NCAA record for rebounds. One of only two players to play on NIT, NCAA, and NBA championship teams, Gola led the Philadelphia Warriors to win the league championship. After two years of military service, he returned to the Warriors as a defenceman in 1957. Gola finished his playing career with the New York Knicks in the 1965-66 season after being traded mid-season from the moving Warriors. At the end of his ten year professional career, he returned to his roots and coached LaSalle to a 23-1 record and led them to be ranked second nationally. Gola then went to work for his city as a member of the Pennsylvania State Legislature, then as a comptroller of the city where he grew up.

Bobby Clarke

Robert Earle Clarke was born August 13, 1949 in Flin Flon, Manitoba. Clarke played in hockey games since the age of eight and was a special case to hockey being a diabetic since a very young age. He was famous for his diet of two cans of soda pre-game, two glasses of orange juice at intermissions and glucose gum hidden in his uniform. This was to prevent his blood sugar from dropping too low during physical game play. He played for fifteen seasons in the NHL and scored 358 goals and 852 assists and all for the Flyers. He won the defenseman's award, the Hart Trophy, three times. His coach once said, "he is the number player in the game at helping his team." Continuing to work as a assistant coach for the Flyers, he became the General Manager and President a few years later.
He has also been General Manager of the Minnesota North Stars and Florida Panthers before returning to the Flyers once again in 1994.

Eddie "The Edmonton Express" Shore

Eddie Shore was born on November 25, 1902 in Fort Qu'Apelle, Saskatchewan, Canada. He played very little hockey in his youth but worked hard for years and eventually moved up to a spot on the WHL team the Regina Caps. When the league folded in 1927 he was bought up by the Boston Bruins. In his rookie year he scored twelve goals and scored eighteen total points, then an unheard of number for a defenseman. He was no softie either, racking up an NHL Record 165 penalty minutes in his second season with the Boston Bruins. He frequently ran over players and skated in a trademark crouch so that is was difficult to knock him over. During one practice a team mate head butted Eddie and severed his ear from his skull. He watched with a mirror and no anesthetic as doctors sew the ear back to his cranium. Eddie made the All-Star Team eight times won the NHL defenseman's trophy, the Hart Trophy, four times. As a member of the Bruins he won two Stanley Cups and was inducted into the Hockey Hall of Fame in 1947.

A.2 Ontology

We reproduce here the semantic labels for the noun phrases and the verb phrases.

A.2.1 np labels

player type: definition / description
  definite: a reference to one particular person.
  nickname: a special, informal name for a player
  position: a position or role played.
  description: not any one particular player or group of players; qualifies the player(s) on which it is predicated.

body type: definition / description
  part: a part of the human body.
  injury: some problem with the body which decreases performance.
  repair: something used to repair the body, provide mobility, etc.
  measure: a measure of the physical body or other body features
  other

quality type: definition / description
  skill: a particular skill or ability
  behaviour: style of play, methods, etc.
  personal: prestige, notoriety, fame, etc.
  other

time type: definition / description
  season: A unit of time in sports, roughly occurring about the same time every year, in which teams play games according to a schedule, usually building towards a large tournament. This term might have the same written form as a human year (below).
  tournament: A unit of time in sports in which teams play each other, advancing by some elimination rules (usually towards a final match).
  series: A unit of time in sports in which teams play each other, advancing by some elimination rules (usually towards a final match).
  game: a unit of time in sports in which two teams face each other, with some outcome relating the performance of the two teams.
  practice: a session in which players or teams develop skills and strategy
  period: a division of a game which lasts until some condition is met, for example, time reached zero, the ref called time, etc.
  age: a time relative to a person
  career: a division of time, in which a player was involved with a sport.
  history: the past of a sport
  year: human (roughly: solar) year or years
  human season: a division of the year
  month: a division of the year
  day: a unit of time in which the sun rises and sets, roughly 24 hours; also includes parts like afternoon, night, etc.
  hour: 60 minutes of time
  minute: 60 seconds of time
  second: a small unit of time
  point in time: some other moment in time
  period of time: some other region in time

event type: definition / description
  join: the event in which some entity becomes a member of an organization
  trade: the event of ending active involvement in a sport, and includes the event in which a player re-activates
  operation: an operation or treatment of the human body
  retire: the event of ending active involvement in a sport, and includes the event in which a player re-activates
  repetitions: the repetition of an event
  decision: the resolution to some course of action
  aspiration: a goal, dream, something to accomplish
  effort: a course of action toward achieving a goal
  future: things to come
  a play: the making of some particular game play (not included under statistic)
  a state: some state of the world, fact, etc.
  effect: the result of an action or actions
  war: battles, fighting between groups
  other

statistic type: definition / description
  goal: scoring on net (hockey)
  assist: involvement in / set up of a goal.
  point: (in hockey) either a goal or an assist
  gaa: (in hockey for goalies) goals-against average
  touchdown: having possession of the football in the opposing team's goal (football)
  reception: a completed pass (football)
  yards: count of yards the play has advanced
  pass: count of passes thrown
  field goal: kicking through the uprights
  interception: catching an opposition team's pass
  field goal: a basket from the field
  rebound: acquiring control of the ball from an opposition attempt
  free throw: a free shot after a foul
  three pointer: making a basket outside the three point line
  win/loss: game outcome, usually applies to a team, or a special player such as a goalie, pitcher, coach, etc. This can include a tie.
  shutout: holding a team to no production
  rank: comparative position according to some index
  composite: a grouping or multiple statistics
  other

org type: definition / description
  team: a coalition of players, coaches, management, and staff, which exists to compete and play contests with other teams (sometimes referred to by home city or nation)
  league: an organization governing play of teams, rules, etc.
  school: educational institution
  company: economic venture
  military: fighting forces
  media: a publisher, publication, etc.
  other

location type: definition / description
  city: a center of people, trade, and customs
  state/prov: the political units of a country
  country: a geographical region under a political system
  continent: a significant geographical area of land
  arena: a building facilitating play of the sport
  field: The field or location on the field.
  other

sport type: definition / description
  hockey: The sport of hockey
  basketball: The sport of basketball
  football: The sport of football
  other

award type: definition / description
  trophy: A trophy or title awarded for merit
  honor: An honor or distinction
  record: the top recorded measure of performance, usually over a period of time
  other

artifact type: definition / description
  equipment: the gear and other items special to the game
  rule: a rule of a sport
  team grouping: a subgrouping of a team, like the defensive team, coaching staff, a line, special teams, etc.
  contract: a multi-party agreement, playing rights, etc.
  money: tradable economic unit, a human estimate of value.
  other

draft type: definition / description
  draft event: the event in which certain players are exclusively selected by teams according to some eligibility criteria, in some order
  round: a division of a draft in which teams select eligible players in some order
  pick: The place in the draft order in which a player is selected.
  other

quantifier type: definition / description
  none: no part of
  part: a division of a whole, part of a set, etc.
  all: everything, all of, the entire whole
  multiples: individual units
  position: a place within some region, usually time
  other

structure type: definition / description
  d: integer digit (possibly spelled out)
  ord: ordinal number
  d.d: number with a decimal place
  d-d: a combination of digits and dashes

A.2.2 vp labels

occurrence type: definition / description
  play: The act of playing a sport
  score: The act of making a point or other statistic
  performance: Another in-game action
  outcome: winning, losing, tieing, etc, usually relates two teams; not the same as an award event.
  injure: an injury action
  lead: The act of leading.
  teamed: Being put on a line, a team grouping, etc.
  wear: to wear something: equipment, jersey, etc.
  penalty: Causing a foul, receiving a penalty, being sanctioned, etc.
  change position: To switch to another position
  award: Winning an award or attaining an honor
  set record: the setting or breaking of a record.
  prevail: prevailing over another player (as in a scoring race)
  create: to originate something, bring it into existence
  draft select: a team claiming a player's playing rights
  join team: Joining or leaving a team
  trade: An event in which teams exchange players or items
  retire: Deactivating (or reactivating), quitting from play
  join: some entity becoming a member of an organization
  offer: to make an offer
  create org: to create an organization
  leave: some entity discontinuing membership in an organization
  birth: A person comes into existence
  naming: Receiving a name, nickname, etc.
  pay: Transferring money
  work: Doing a job
  death: A person ceases to exist
  get: To receive or extract something (not a play, not a trophy or honor)
  use: to use, employ
  come: Arriving
  travel: To go somewhere
  other

state type: definition / description
  copula: Verb linking subject and predicate, typically the verb "was".
  measure: to be measured at a certain height, weight, etc.
  change: Modification of the state of things.
  cause: Effecting a new state.
  require: A state demanding certain actions or preconditions.
  limit: A state that curtails or prevents certain actions.
  blessed: To be endowed with potent natural abilities.
  improve: Improvement or development.
  perfect: To take some behaviour to its ideal or pinnacle, to master.
  degrade: opposite of improve
  rank: to hold some position or status in a statistic, etc.
  exist: to be, to emerge, to establish, etc.
  maintain: to keep a certain state of affairs.
  include: to be a part in something
  consist: to be composed or made up of components
  permit: to be permitted or allowed
  lack: to lack some quality, missing the presence of something
  other

reporting type: definition / description
  statement: Saying something
  other

perception type: definition / description
  visual: Seeing something
  audible: Hearing something
  touch: Feeling something (with the body)
  smell: Smelling something
  taste: Tasting something

modal type: definition / description
  could: The verb could
  should: The verb should
  would: The verb would
  will: The verb will
  might: The modal might
  may: The modal may
  can: The modal can
  other

aspectual type: definition / description
  begin: Initiation, or the start of the something
  continue: the continuing of the something
  precede: Indicates a precedence of events
  follow: Indicates a precedence of events
  repeat: The repetition of an event
  end: Culmination, termination, or the end of the something
  other

i state type: definition / description
  belief: A personal conviction
  desire: A future action or state thought to be of value by a player.
  admire: Having admiration
  enjoy: Having joy (enjoyment), happiness, or sadness.
  experience: Personally experiencing, undergoing something. Sometimes projected onto teams, leagues, etc.
  recognize: To gain awareness of, to take note, to arrive at an idea
  other

i action type: definition / description
  attempt: Taking action towards a goal
  decide: Deciding, ruling
  suggest: to suggest, recommend, advise, etc.
  ask: To require something from someone
  promise: Suggesting a commitment to future action
  commit: Holding a commitment to certain course of action or behaviour
  attain: Reaching a personal goal
  other

structure type: definition / description
  to: Infinitive marker
  conj: Conjunction of verb phrases
  other

A.3 Output Biographies

Here are the biographies produced by our system. We have kept the "relaxed" version of the Kareem Abdul-Jabbar biography.

Jim Brown

Jim Brown was born in Saint Simons Island, Georgia, United States on February 17, 1936. He was known for his durability, jump, all-around, and exceptional ability. He was a running back and was an only rusher. He was a fullback and was a star. Brown was a american player and was a exceptional athlete. He was a professional and was a member. Brown played 1966 and 1957 with the Cleveland Browns and recorded more than 1,000 yards in 1963. Brown scored 4 touchdowns in 1957 and joined Cleveland in 1957. He led the National Football League in yards in 1961 and was named an All-America in 1965. Brown scored 2,499 yards, 262 passes, and 20 touchdowns in 1966.

James Lofton

James Lofton was born on July 5, 1956. He was known for his speed, athletic ability, and jump. Lofton weighed 190 pounds and stood 6-foot-3. He was a wide receiver and played 1 game and 1993 with the Philadelphia Eagles. Lofton recorded 41 passes, 759 yards, and 8 touchdowns in 13 games and scored 100 yards in 3 games. He scored 1,216 yards, 68 passes, and 16 touchdowns in 1977 and was a wide receiver at Stanford University in 1977. He was named an All-America in 1978 and was drafted in the first round by the Green Bay Packers in 1978. He recorded 1 goal, 43 yards, and 1 touchdown in 1982 and led the National Football League in yards, points, and touchdowns in 1982. He led the NFL in yards, points, and touchdowns in 1983 and led the NFL in yards, points, and touchdowns in 1983.
Lofton was traded to the Los Angeles Raiders in 1987 and signed with the Buffalo Bills in 1989. He became a free agent in 1992 and recorded 7.7 gaas, 246 yards, 32 attempts, and 1 touchdown in 1993. He retired in 1993 and was a wide receiver in July 1956.

Maxwell Bentley

Maxwell Bentley was known for his skating, scoring, and stickhandling. Bentley was nicknamed "Dipsy-Doodle Dandy." Bentley weighed 155 pounds. Bentley played 2 seasons with the Saskatoon Quakers and played 2 seasons with the Drumheller Miners. Bentley was teamed on the Pony Line and suffered a kidney, stomach, and throat injury. He became a star and joined the Chicago Black Hawks in 1940. He played 1940-41 with the Providence Reds and won the Art Ross Trophy in 1942, 1945, 1945-46, and 1946-47. He earned the Lady Byng Trophy in 1942, 1943, 1946, and 1946-47 and won the Hart Trophy in 1942, 1943, and 1946. Bentley led the City Senior League in goals and points in 1943-44 and played 1944-46 with the Calgary Currie Army. Bentley was named an All-Star in 1946 and 1947 and was traded to the Toronto Maple Leafs on November 2, 1947. He led the National Hockey League in goals, assists, and points in 1950-51 and was traded to the New York Rangers on August 11, 1953. He played 1953-54 with New York and retired in 1953-54. He was inducted into the Hockey Hall of Fame in 1966. He died on January 19, 1984.

Bob Cousy

Bob Cousy was born in New York, United States on August 9, 1928. Cousy was known for his scoring, passing, ball-handling, hands, and playmaking. Cousy was nicknamed "Mr. Basketball." He was a great, outstanding player and became a commentator. Cousy played 1974 and 1969 with the Cincinnati-Kansas City Royals and played 7 games, 1969, and 1974 with the Cincinnati Royals. Cousy won the NBA Championship in 1947, 1957, 1998, and 1999 and was named an All-Star in 13 seasons. Cousy was named an All-America in 3 seasons and scored 16,960 points in 1953.
He led the National Basketball Association in assists in 1953-54 and led the NBA in assists in 1954-55. He led the NBA in assists in 1955-56 and scored 16,960 points in 1956. Cousy led the NBA in assists in 1956-57 and won the NCAA Championship in 1957. Cousy led the NBA in assists in 1957-58 and led the NBA in assists in 1958-59. He recorded 28 assists in 1959 and led the NBA in assists in 1959-60. He retired in 1963 and joined the Royals in 1970. He was a player with the Mobile Magician in 1971 and scored 5 points in 1972. He retired in 1973.

Alan Page

Alan Page was born in Canton, China on August 7, 1945. He was known for his speed and quickness. Page was a defensive end and was a defensive lineman with the Minnesota Vikings. Page was a children and was a player with the Minnesota Supreme Court. Page became a defensive player and was a member with Minnesota. He played 1976, 1974, 1969, 1973, 1 season, 5 games, and 4 games with the Vikings and played 1981 and 238 games with the Chicago Bears. He was named an All-American in 1966 and was named an All-America in 1966. Page was picked in the first round by Minnesota in 1967 and notched in 1971. Page was named an All-Pro in 1968 and 1976 and was inducted into the College Football Hall of Fame in 1979. Page became a player in 1979 and retired in 1981. He was an attorney general in 1985 and joined the New York Giants as a lineman in 1986. Page was inducted into the Pro Football Hall of Fame in 1988.

Alex Delvecchio

Alex Delvecchio was born in Ontario, Canada on December 4, 1931. Delvecchio was known for his accomplishments, skating, punch, scoring, totals, and playmaking. He was nicknamed "The Production Line," "Big M," and "Fats." He played 1 game and 1947-48 with the Fort William Rangers and was teamed on the Production Line. Delvecchio was teamed with Gordie Howe and was a forward. Delvecchio joined the Indianapolis Capitals in 6 games and scored 16 goals and 8 assists in 1948-49.
Delvecchio led the Ontario Hockey Association in assists in 1950-51 and captured the Stanley Cup in 1951, 1951-52, 1952, and 1954-55. Delvecchio played 24 seasons and 1951-52 with the Capitals and was named an All-Star in 1973, 1952-53, and 1954. He retired in 1962 and led the National Hockey League in assists in 1965-66. He notched 700 goals and 1 point in 1970 and won the Lester Patrick Trophy in 1973 and 1974. Delvecchio joined the Detroit Red Wings as a player in 1973 and retired in 1973. He was inducted into the Hockey Hall of Fame in 1977 and joined Detroit on December 17, 1976. Delvecchio's jersey was retired by Detroit in 1991, and he retired in 1991.

Neil Johnston

Neil Johnston was known for his scoring and hook. He stood 6-foot-8. He was named an All-Star in 6 games and suffered a arm injury. He was teamed with Tom Gola with the Philadelphia Warriors and was teamed with Paul Arizin and he with Philadelphia. Johnston became a coach with Philadelphia and was a professional. He was a player and became a scorer. Johnston was a substitute and played 2 seasons, 95 games, and 1960 with the Warriors. He signed with the Philadelphia Phillies in 1949 and joined the Warriors in 1951. He led the National Basketball Association in a record and field goals in 1952-53 and notched more than 20 points in 1953-55. He led the NBA in a record, points, and field goals in 1953-54 and was a rebounder in 1954-55. He led the NBA in a record and field goals in 1954-55 and suffered a knee injury in 1955. Johnston was named an All-NBA in 4 seasons and won the NBA Championship in 1955-56 and 1956. Johnston led the NBA in a record and field goals in 1956-57 and suffered a knee injury in 1958-59. Johnston became a player-coach with the Pittsburgh Condors in 1961 and recorded a 53 record in 1962-63.

Kareem Abdul-Jabbar

Kareem Abdul-Jabbar was born in New York, United States on April 16, 1947. He was known for his scoring, agility, and hook.
Abdul-Jabbar was nicknamed "Big O" and "Skyhook." Lew Alcindor weighed 235 pounds, and he stood 7-foot-2. Alcindor stood 7-foot-2. Alcindor was teamed with Oscar Robertson with the Milwaukee Bucks, and he was teamed with Wilt Chamberlain. Abdul-Jabbar was teamed with Magic Johnson with the Los Angeles Lakers and became a force. Alcindor played 1965 and 1969 with the UCLA Bruins, and he was named an All-American in 3 games. Abdul-Jabbar played 5 seasons with Milwaukee and played 14 seasons with the Lakers. Alcindor won the NCAA Championship in 1967 and 1969, and he won the Podoloff Cup in 1968, 1971, 1972, and 1974. Abdul-Jabbar earned the NCAA Championship in 1968 and earned the NBA Championship in 1969, 1980, 1982, 1985, 1987, and 1988. Alcindor captured the NBA Championship in 1970 and 1970-71, and he joined Milwaukee in 1971. Abdul-Jabbar was traded by Milwaukee to Los Angeles in 1975 and was named an All-Star in 5 seasons. He earned the MVP Award in 1989 and retired in 1989. Alcindor was named an All-American in 3 seasons.

Joe Dumars

Joe Dumars was born in Shreveport, Louisiana, United States on May 24, 1963. Dumars was known for his all-around skill. He was nicknamed "Bad Boys." Dumars weighed 195 pounds and stood 6-foot-3. Dumars was a guard with the Detroit Pistons and was a guard. Dumars was a selection and was a player with the Pistons. Dumars was a player and became a good scorer. He was a leader and was a member. He played 14 seasons, 1998-99, 1999, and 1985 with Detroit and was drafted in the first round by Detroit in 1985. He scored 24 assists and 109 points in 1989 and captured the NBA Championship in 1989, 1990, 2000, and 2004. Dumars was named an All-Rookie in 1 season, and his jersey was retired by Detroit in 2000. Dumars retired in 2000 and became a president in 2000-2001. He was a president in 2005.

Tom Gola

Tom Gola was known for his speed and hands. He was nicknamed "Mr. All-Around." He weighed 220 pounds and stood 6-foot-6.
He was teamed with Paul Arizin and Neil Johnston with the Philadelphia Warriors and was a passer. He was a hero and was a player with Philadelphia. Gola was a player and was an instant star. Gola was a strong rebounder and scored 2,962 assists and 7,871 points in 698 games. Gola recorded 2,462 points and 2,201 rebounds in 1953 and captured the NBA Championship in 1953, 1955-56, and 1960. He won the NCAA Championship in 1953 and 1954 and was named an All-American in 3 seasons. He scored 10.8 points and 18.7 rebounds in 1955 and scored 20.9 points and 18.7 rebounds in 1955. He was named an All-America in 1955, 1960, and 1955 and joined Philadelphia in 1955. Gola joined the San Francisco Warriors in 1957 and was named an All-NBA in 1958 and 1958. He retired in 1965-66.

Bobby Clarke

Bobby Clarke weighed 180 pounds and stood 5-foot-10. He joined the Flin Flon Bombers at age 8 and played age 8 and 1967-68 with Flin Flon. He was a general manager with the Philadelphia Flyers and was a president with the Flyers. Clarke was a center and was a checkers with Philadelphia. He was a checkers and was a leader with the Flyers. Clarke was a faceoff and was a men. Clarke recorded 117 assists, 51 goals, and 168 points in 1967 and recorded 117 assists, 51 goals, and 168 points in 1967-68. He recorded 86 assists, 51 goals, and 137 points in 1968 and earned the Stanley Cup in 1968, 1969, 1974, 1975, and 1982. He won the Bill Masterton Memorial Trophy in 1969 and was picked in the second round by the Flyers in 1969. He scored 36 assists and 27 goals in 1969-70 and captured the Hart Trophy in 1973, 1975, and 1976. Clarke scored 358 goals and 852 assists in 1984 and retired in 1984. He was a general manager in 1990 and joined the Flyers in 1994.

Eddie Shore

Eddie Shore was born in Saskatchewan, Canada on November 25, 1902. He was known for his slapshot, scoring, and playmaking. He was nicknamed "The Edmonton Express."
Shore played 35 seasons with the Springfield Indians and was teamed with Lionel Hitchman. He became a manager and was a defenseman with the Boston Bruins. Shore became a tough defenseman and was a forward with the Toronto Maple Leafs. Shore was an end and was a valuable player. He was a scoring threat and was a professional. Shore joined the Regina Capitals as a forward in 1924 and recorded 12 goals in 1925. Shore won the Stanley Cup in 1925 and won the Hart Trophy in 1927, 1930-31, 1933, 1935, 1936, and 1938. Shore was named an All-Star in 8 seasons and retired in December 1933. He retired in 1939-40 and was inducted into the Hockey Hall of Fame in 1947. He was a player in 1985. Shore died in 1985.

