UBC Theses and Dissertations

Multilingual machine translation: a case study of Spanish-English reflexives
Sharp, Randall Martin, 2007


Full Text

Multilingual Machine Translation: A Case Study of Spanish-English Reflexives

by

Randall Martin Sharp

B.Sc., Simon Fraser University, 1977
M.Sc., University of British Columbia, 1985

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Doctor of Philosophy

in

THE FACULTY OF GRADUATE STUDIES

(Interdisciplinary Studies)

The University of British Columbia

August 2007

© Randall Martin Sharp, 2007

Abstract

This dissertation describes a formalism for multilingual machine translation and its application to the translation of reflexive constructions between Spanish and English. The language description is principle-based, with linguistic rules divided between language-specific components and shared language-independent components. The principles define head projection and the licensing of attachments to the projection. Rules create linguistic objects as tree structures, where nodes are represented as feature structures (attribute-value pairs). Feature unification, augmented with negative, disjunctive, and conditional constraints, defines the feature content of the objects. Translation is accomplished by defining levels of representation based on morphological, syntactic, and semantic (predicate-argument) structures, and by defining structural transformations between levels. Language transfer is based on the Principle of Semantic Compatibility, in which semantic representations (trees and features) across languages are unifiable with each other. Using this formalism, rule components for English and Spanish are defined which maximize rule-sharing; only ten phrase structure rules are required for a substantial subset of the English and Spanish languages. All grammatical relations are expressed in terms of local trees (root node and immediate daughters).
Arguments and modifiers are licensed in canonical position, and displaced arguments may appear either in subject or clitic position. A displaced argument is associated with its predicate not by movement but by unifying features within a chain structure spanning all of the local trees between the argument and the predicate.

Given this linguistic model, an implementation of reflexives is described in which a reflexive pronoun discharges either an internal or an external argument. If an internal argument is discharged, a personal reflexive construction results, as in Juan se vio 'Juan saw himself'. If an external argument is discharged, a nonpersonal reflexive construction results, as in Se venden periódicos 'Newspapers are sold'. In both personal and nonpersonal reflexive constructions, the sentential subject, whether lexical or null, binds the reflexive. In the nonpersonal reflexive construction, a pleonastic null subject binds the reflexive. A Reflexive Principle, implemented over local trees, determines when a pronoun is reflexive and when not, depending on the binding potential of an undischarged external argument.

Contents

Abstract
Contents
List of Tables
List of Figures
Acknowledgements
Dedication
1 Introduction
2 Formalizing Multilingual Machine Translation
  2.1 Overview of Machine Translation Strategies
    2.1.1 Direct Approach
    2.1.2 Interlingual Approach
    2.1.3 Transfer Approach
      Bilingual Transfer
      Multilingual Transfer
    2.1.4 Principle-Based Approach
    2.1.5 The Relation between Data and Algorithms
  2.2 Multilingual Machine Translation Methodology
  2.3 Data Structures
    2.3.1 Feature Structures
      Definitions
      MMT Notation
    2.3.2 Trees
      Definitions
      MMT Notation
  2.4 Feature Set Descriptions
    2.4.1 Positive Constraints
    2.4.2 Nonpositive Constraints
      Negative Constraints
      Disjunctive Constraints
    2.4.3 Conjoined Feature Sets
    2.4.4 Constraint Resolution
  2.5 Context-Free Grammars
    2.5.1 Formal Definition
    2.5.2 Simplifications
  2.6 Rule Systems
    2.6.1 Lexicon
      Lexical Rules (L-Rules)
      Lexical Feature Rules (LF-Rules)
    2.6.2 Generators
      Generator Rules (G-Rules)
      Generator Feature Rules (GF-Rules)
    2.6.3 Tree Transformers
      Transformer Rules (T-Rules)
        Type 1: Copy
        Type 2a: Contraction
        Type 2b: Expansion
        Type 3: Compression
        Type 4: Extraction
      Transformer Feature Rules (TF-Rules)
    2.6.4 Multiword Transformers
      Multiword Rules (MW-Rules)
        Textual/Morphological Level
        Morphological/Syntactic Level
      Multiword Feature Rules (MWF-Rules)
  2.7 Conclusion
3 Core Linguistic Structures
  3.1 Projection Structure
    3.1.1 Projection Licensing Conditions
    3.1.2 Projection Features
    3.1.3 Projection Rules
  3.2 Phonological Structure
    3.2.1 English A/An Allomorphy
    3.2.2 Spanish
      Coordinator Alternations
      Definite Article Alternation
  3.3 Morphological Structure
    3.3.1 Morphological Features
    3.3.2 Morphological Rules
      English Morphological Rules
      Spanish Morphological Rules
  3.4 Syntactic Structure
    3.4.1 Syntactic Features
    3.4.2 Syntactic Rules
      Binary Branching
      Headless IP
      Extended Projections
  3.5 Semantic Structure
    3.5.1 Semantic Features
    3.5.2 Semantic Rules
  3.6 Clitics
4 Phrasal Structures
  4.1 Argument Structure
    4.1.1 Feature Content
      Zero-Argument Heads
      One-Argument Heads
        External Only
        Internal Only
          Nouns:
          Prepositions:
          Adjectives:
          Closed Class Categories:
      Two-Argument Heads
      Three-Argument Heads
    4.1.2 Complement Licensing Rules
  4.2 Chain Structure
    4.2.1 Chain Creation
    4.2.2 Chain Transmission
    4.2.3 Specifier Licensing Rule
    4.2.4 Clitic Licensing Rules
    4.2.5 Null Subject
    4.2.6 Subject of Infinitivals
    4.2.7 Saturation
      Optionality
      Saturation of Argument Structure
      Saturation of Chain Structure
        Internal Arguments
        External Argument of a Finite IP
        External Argument of an Infinitival IP
        External Argument of an Exceptional Case-Marked IP
    4.2.8 Nominalization
      Pforms
      Suppression of Agent in Agentive Nominals
    4.2.9 Passivization
      Internalization of External Argument
      Internalization of Internal Argument
      Passivization of Double-Object Verbs
      Passivization of ECM Verbs
  4.3 Modifier Structure
    4.3.1 Modifier Features
    4.3.2 Modifier Licensing Rules
  4.4 Summary of Rules
5 Reflexivity
  5.1 Morphological Forms
    5.1.1 English
    5.1.2 Spanish
  5.2 Types of Reflexives
    5.2.1 Personal Reflexives
      Referential Reflexive
      Obligatory Reflexive
      Dative Reflexive
      Aspectual Reflexive (Spanish)
    5.2.2 Nonpersonal Reflexives
      Impersonal Reflexive
      Passive Reflexive
      Inchoative Reflexive
      Middle Reflexive
    5.2.3 Spurious se (Spanish)
    5.2.4 Emphatic Reflexives
  5.3 Analysis of Reflexives
    5.3.1 Binding
      Coreferents
      Command Relation
      Domain
    5.3.2 Argument Discharge
      Personal Reflexives
      Nonpersonal Reflexives
      Object Agreement
      Externalization of Object
    5.3.3 Infinitival Clauses
      Internal Reflexive Argument
      External Reflexive Argument
      Clitic Climbing
    5.3.4 Local Binding Structures
  5.4 Implementation of Reflexives
    5.4.1 Implementing the Reflexive Principle
      English Reflexives
      Spanish Reflexives
        Personal Reflexives
        Nonpersonal Reflexives
    5.4.2 Transforming Reflexive Constructions
      Personal Reflexives
        Referential Reflexive
        Obligatory Reflexive
        Aspectual Reflexive
      Nonpersonal Reflexives
        Impersonal Reflexive
        Passive Reflexive
      Spurious se
    5.4.3 Ambiguities in Reflexive Constructions
6 The Translation Process
  6.1 Textual Input/Output
  6.2 Morphological Processing
    6.2.1 Textual/Morphological Level Processing
    6.2.2 Morphological Level Processing
      Morphological Analysis
      Morphological Synthesis
    6.2.3 Morphological/Syntactic Level Processing
      Normalization
      Disambiguation
      Compound Nouns
  6.3 Syntactic Level Parsing
  6.4 Tree Transformation
    6.4.1 Analysis: Syntactic to Intermediate Level
    6.4.2 Analysis: Intermediate to Transfer Level
    6.4.3 Transfer
      Reflexivity in Transfer
        Personal Reflexives
        Nonpersonal Reflexives
      Complex Transfer
      Terminology Transfer
    6.4.4 Synthesis: Transfer to Intermediate Level
      Obligatory Reflexive Pronoun
      Nonreferential External Argument Pronoun
    6.4.5 Synthesis: Intermediate to Syntactic Level
      Generation of Grammatical Formatives
      Generation of Displaced Arguments
        Subject
        Clitics
      Generation of Internal Arguments and Modifiers
  6.5 Robust Processing
7 Evaluation
  7.1 MMT as Machine Translation Formalism
    7.1.1 MMT and HPSG
    7.1.2 Less Linguistically Based Machine Translation Systems
  7.2 MMT and Linguistic Analysis
    7.2.1 Transformational Grammars
    7.2.2 Empty Categories
    7.2.3 Semantic Properties
    7.2.4 Ambiguity
    7.2.5 Treatment of Translational Divergences
  7.3 MMT and Reflexivity
  7.4 Comparative Analysis of MMT
8 Conclusion
Bibliography
Appendix A MMT Grammar System
  A.1 Common Core Component
  A.2 English-Specific Rules
  A.3 Spanish-Specific Rules
Appendix B Comparative Machine Translation Output
Author Index
Subject Index

List of Tables

3.1 Table of Spanish Verb Inflections
3.2 Table of Syntactic Categories
3.3 Table of Grammatical Formatives
3.4 Table of English Determiners
3.5 Table of Semantic Classes
3.6 Table of Spanish Clitics
5.1 Table of English Reflexive Pronouns
5.2 Table of Spanish Reflexive Clitics

List of Figures

3.1 List of Top Level Features
3.2 List of Projectional Features
3.3 List of Phonological Features
3.4 List of Morphological Features
3.5 List of Syntactic Features
3.6 List of Semantic Features
4.1 List of Argument Structure Features
4.2 List of Chain Structure Features
4.3 List of Modifier Structure Features

Acknowledgements

People that have helped in one way or another, directly or indirectly, knowingly or unknowingly:

• Michael Rochemont, for suggesting I write this dissertation, and, once again, for supporting me fully, if briefly.
• Rose-Marie Déchaine, for being such an encouraging, positive, critical, and challenging thesis supervisor; I could not have done it without her persistence and insistence.
• Richard Rosenberg, for welcoming me back to UBC and suggesting the Individual Interdisciplinary Studies Graduate Program as the place to pursue a doctorate degree.
• Rhodri Windsor-Liscomb, for listening to my story, and then accepting me into the Individual Interdisciplinary Studies Graduate Program.
• Harvey Abramson, who showed me the art of unification and the joy of sushi.
• Herr Professor Doktor Johann "Hans" Haller, who believed in me and what I was doing.
• Oliver Streiter, for having the same passion as I.

Venues that have been especially conducive to writing this thesis in a comfortable atmosphere:

• Green College at UBC
• Boston Public Library at Copley Square
• Perlita BIO, Club Cascadas, Cabo San Lucas, Mexico

RANDALL MARTIN SHARP
The University of British Columbia
July 2001

To the memory of my mother, my first linguistics teacher

Chapter 1

Introduction

In the late 1970's, the European Commission embarked on a Community-wide project called EUROTRA to develop a prototype for researching and developing machine translation. At that time, nine of the major European languages (English, German, French, Spanish, Portuguese, Italian, Danish, Dutch, Greek) comprising the twelve member countries were designated for implementation. A prototype software using the Prolog programming language was developed and distributed to all participating research institutes. That prototype was called <C,A>,T (Constructors, Atoms, Translators) (Arnold et al. 1985; Arnold et al. 1986; Arnold and des Tombe 1987). The <C,A>,T framework was later replaced by the so-called E-Framework (E=Engineering) (Bech et al. 1988; Bech and Nygaard 1991). At the same time, the <C,A>,T framework and software underwent substantial revision and transformation at the Institut für Angewandte Informationsforschung (IAI) in Saarbrücken, Germany, under the leadership of Prof. Dr. Johann Haller, where it became the CAT2 system (Sharp 1988, Sharp 1991, Sharp 1997; Sharp and Streiter 1992, Sharp and Streiter 1995). Considerable research was undertaken in demonstrating the multilinguality and practicality of CAT2 (e.g. see Hong (1998)¹).
The research presented in this dissertation takes the CAT2 framework one step further. I call the new and improved version MMT: Multilingual Machine Translation.

The goal of this dissertation is four-fold. The first is to elaborate on the formalism behind MMT and demonstrate its suitability as a methodology for performing multilingual machine translation. The mechanism within MMT that most significantly contributes to multilinguality is the facility to share rule components among languages, in the same way that a computer program calls a common subroutine from various points in the program. The same modularity afforded by subroutining in engineering is carried over to linguistic description. For example, a given rule for, say, phrase structures, is shared by more than one language component. By maximizing such rule sharing, the system approaches a principle-based methodology for formalizing machine translation, applicable in principle to any number of languages.

¹ Hong (1998) lists English, German, French, Russian, Dutch, Spanish, Arabic, Chinese, and Korean as languages partially implemented using the CAT2 formalism.

Given the facility for rule sharing, the second goal applies that facility to the problem of linguistic description. The second goal is to implement a principle-based approach to grammar description such that rule sharing, or grammar sharing, among the language components is maximized as much as possible. Not only the grammatical components become sharable between languages but so do the transformation components, i.e. those parts of the translation methodology that transform source language objects into target language objects.
Since grammar sharing, in the form of a "Universal Grammar", is at the heart of principle-based approaches within linguistic theory, the linguistic implementation developed here borrows heavily and freely from the Chomskyan school of Principles and Parameters and also from the, sometimes acrimoniously competing, paradigm of Head-Driven Phrase Structure Grammar (HPSG). Thus, my second goal is to show that concepts from both of these frameworks can be successfully amalgamated into a powerful tool for linguistic research and development.

A third goal is to demonstrate how one can implement significant subsets of natural language grammars within the MMT formalism. I have chosen Spanish and English as representative languages, partly for historical reasons (a Spanish component to CAT2 was extensively developed at the Universidad Nacional Autónoma de México in Mexico City over a four-year period), and partly for practical reasons; English and Spanish share enough common features of grammar to demonstrate the efficacy of the shared-grammar concept. That also makes it relatively easy to develop a nontrivial working translation system capable of producing translations, in either direction, that are of sufficient quality to make them useful in practice. It is claimed that the bilingual treatment presented here is easily and efficiently extended to other (probably largely Indo-European) languages, supporting the claim to multilinguality.²

² This view differs slightly from that of Streiter (1996), who claims that a contrastive analysis of languages is required in order to produce a running machine translation system. Here, it is advocated that a comparative analysis, beginning first with two languages and abstracting out the common universal principles of structure, extrapolates readily to functional multilingual machine translation.

The fourth goal of this dissertation arose during the investigation and implementation of Spanish, where it was observed that the Spanish clitic se occurs with relatively high frequency in expository texts. Since one interpretation of se is clearly reflexive, as illustrated in (1a), the question arose as to whether se is also reflexive in sentences such as (1b):

(1) a. Juan se vio
       'Juan saw himself'
    b. Se venden casas
       'Houses are sold' (= 'Houses for sale')

Most researchers have claimed that se is reflexive in (1a) but not in (1b). I dispute that claim, and my fourth goal is to show that se can indeed be analyzed as reflexive in both (1a) and (1b). I then show how that claim is implemented in the MMT formalism.

The focus on reflexives exercises all components of the system. For one, English reflexive pronouns are easily distinguished from nonreflexive pronouns in the lexicon by ending in -self/selves. Spanish reflexive clitics, on the other hand, have the same form as nonreflexive clitics, forcing disambiguation to occur in the syntax. Both English and Spanish reflexives have the property of realizing a verbal argument, but English discharges that argument as a complement adjoined to the verbal projection, whereas Spanish discharges the argument adjoined to the inflectional projection. The translation of reflexives is crucially dependent on whether the reflexive discharges an internal or external argument. Thus, the study of reflexives provides a rich testbed for research in the many facets of linguistically-based machine translation.

The dissertation is organized as follows. Chapter 2 presents the MMT formalism. Chapter 3 presents the implementation of the core linguistic properties of the linguistic structures generated by the formalism, such as projection structures, morphology, syntax and semantics.
Chapter 4 covers the relations between phrases in phrase structures, characterizing those relations as either arguments or modifiers. Chapter 5 presents the analysis and implementation of reflexives in English and Spanish. Chapter 6 demonstrates how the translation engine works by illustrating the different stages an input text undergoes on its way to the translated output text. Chapter 7 presents an evaluation of the research and compares it to related work by other researchers. Chapter 8 presents concluding remarks, and briefly mentions those areas where further research would be next directed.

Chapter 2

Formalizing Multilingual Machine Translation

This chapter describes the computational system for formalizing the analysis, translation, and synthesis of natural language texts. In other words, this chapter presents a description of a formalism for performing machine translation. Since the linguistic perspective is multilingual, the formalism defines multilingual machine translation, or MMT. The MMT model comprises:

• a formalism for describing linguistic structures and mapping relations between structures;
• a set of processing strategies for reading texts, analyzing their linguistic structure, and translating them into other languages.

The first aspect of MMT is fully described in this chapter. The second aspect is fully described in Chapter 6, "The Translation Process".

Viewed as a software tool, the MMT program provides a linguistic development environment for designing, testing and executing MMT rules. Written entirely in Prolog,¹ MMT consists of a command shell for issuing commands, for example for loading and unloading rule files, for inputting text, and for creating, deleting, displaying and translating linguistic objects. In addition, a debugging tool allows for tracing rule application during grammar development.
Although I have programmed these features into the MMT software system, I refrain from describing them here, as they are irrelevant for the topic of this dissertation.²

¹ I have consistently used SICStus Prolog, although no special SICStus-defined constructs have been used to my knowledge.

Section 2.1 begins with an overview of machine translation as a "black box", and discusses different ways of viewing the components within it, culminating in the configuration underlying the MMT framework. The remainder of the chapter presents the MMT formalism, beginning in Section 2.2 with the notion of levels of representation, followed in Section 2.3 with a discussion of the two types of data structures found in MMT, tree structures and feature structures. Section 2.4 introduces the mechanism of feature unification and constraint resolution, which combine features within feature structures, and which render the MMT formalism unification-based, similar to other formalisms within computational linguistics such as LFG (Lexical Functional Grammar) (Bresnan 1982; Bresnan 2001), FUG (Functional Unification Grammar) (Kay 1983; Kay 1984), and HPSG (Head-Driven Phrase Structure Grammar) (Pollard and Sag 1994), among others. Section 2.5 describes context-free grammars, the basis for the grammar rules that form the linguistic structures in MMT. Section 2.6 covers the different rule systems in MMT that define the lexicons, grammar rules, and transformational rules of an MMT system.

2.1 Overview of Machine Translation Strategies

Viewed as a black box, a machine translation system takes a text in a source language S and translates it as text in a target language T, as illustrated below:

(1) Text_S → [Machine Translation System] → Text_T

The text can be a word, phrase, sentence or multiple sentences, although we will limit the range of the text in this research to a single sentence.
Within the black box, the MT system may utilize various linguistically-based strategies for performing translation.³ Three prominent approaches to MT have been developed, the direct translation approach, the interlingua approach, and the transfer approach. These are discussed below.

² A complete description of CAT2, the forerunner of MMT, is given in Sharp (1994). The command shells of CAT2 and MMT are virtually identical.
³ Less linguistic or nonlinguistic strategies, such as Example-Based MT and Statistics-Based MT, are not described in this dissertation. See Chapter 7, §7.1 for discussion.

2.1.1 Direct Approach

The direct approach is the earliest and most primitive strategy for MT. It performs a word-for-word translation, with possible rearrangement of words in the target language. The components within the black box in a direct approach are illustrated in the diagram below (adapted from Hutchins and Somers (1992:72)):

(2) Text_S → [Machine Translation System: Analysis → R1 → Transfer → R2 → Reordering] → Text_T

The first stage is a morphological analysis of word forms, in which inflected words are reduced to their uninflected forms. The resulting representation, R1, is passed to the transfer stage, which replaces the source language words with target language words. The result of transfer, R2, enters the final stage in which word forms are reordered according to the properties of the target language. For example, at this stage an adjective-noun sequence in a source language such as English (e.g. public telephone) would be re-ordered into a noun-adjective sequence in a target language such as Spanish (teléfono público).

Since the direct approach is specific to one source language and one target language, the components can be tailored to fit exactly that language pair and that direction.
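
As an informal illustration, the three stages of the direct approach described above can be sketched in a few lines of Python. This is not part of MMT or of any system cited here; the lexicon entries, the category tags ADJ and N, and the single reordering rule are assumptions chosen only to reproduce the public telephone example.

```python
# Toy sketch of the direct approach: analysis -> transfer -> reordering.
# All table entries and tags are illustrative assumptions, not thesis data.

# Stage 1 table: word form -> (uninflected form, category tag)
ANALYSIS = {
    "public": ("public", "ADJ"),
    "telephone": ("telephone", "N"),
}

# Stage 2 table: English base form -> Spanish word
TRANSFER = {
    "public": "público",
    "telephone": "teléfono",
}


def analyse(words):
    """Stage 1: look up each word's base form and category (representation R1)."""
    return [ANALYSIS[w.lower()] for w in words]


def transfer(r1):
    """Stage 2: replace source words with target words, keeping the tags (R2)."""
    return [(TRANSFER[base], tag) for base, tag in r1]


def reorder(r2):
    """Stage 3: turn each adjective-noun pair into noun-adjective order."""
    out = list(r2)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "ADJ" and out[i + 1][1] == "N":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair just swapped
        else:
            i += 1
    return [word for word, _ in out]


def translate_direct(text):
    return " ".join(reorder(transfer(analyse(text.split()))))


print(translate_direct("public telephone"))  # prints: teléfono público
```

Every table in the sketch is tied to the English-to-Spanish direction, which is the lack of generality across language pairs noted for this approach.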
According to Hutchins (1986:54):

    the basic assumption is that the vocabulary and syntax of SL [source language] texts need not be analysed any more than strictly necessary for the resolution of ambiguities, the correct identification of appropriate TL [target language] expressions and the specification of TL word order. Thus if the sequence of SL words is sufficiently close to an acceptable sequence of TL words, then there is no need to identify the syntactic structure of the SL text.

Hutchins then goes on to name some of the earlier systems that were based on the direct approach, incorporating varying degrees of syntactic analysis.

Because of the impoverished syntactic and semantic analysis inherent in the direct approach, and the lack of generality across language pairs, the direct approach has been replaced by "indirect" approaches, either the interlingua-based or the transfer-based approach.

2.1.2 Interlingual Approach

The interlingual approach to MT is based on the notion of an interlingua, i.e. a representation of the meaning of a sentence independent of any specific language. Inside the black box, the interlingua-based approach has the following structure:

(3) Text_S → [Machine Translation System: Analysis → R → Synthesis] → Text_T

The result of analyzing a source text Text_S is an interlingual representation R, a language-independent representation of the meaning of Text_S. Since this intermediate representation is language-independent, it can be used as the input to synthesis for any target language to produce the target text Text_T. For this reason, the interlingual model achieves the most benefit in multilingual MT systems. The inclusion of any additional language entails the construction of the analysis and/or synthesis module for that language, without regard to the other languages in the system.
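
The modularity claimed for the interlingual model can be made concrete with a small, word-level Python sketch. It is an illustration under stated assumptions rather than a description of any actual interlingual system: the function names, the one-word lexicons and the concept label are hypothetical. The second part compares module counts, using the elementary fact that a design with one component per ordered language pair needs n(n-1) components for n languages.

```python
# Word-level toy interlingua: each language maps its words to shared concept labels.
# Function names, lexicon entries and the concept label are illustrative assumptions.

ANALYSIS = {}   # language -> {word: concept label}
SYNTHESIS = {}  # language -> {concept label: word}


def add_language(lang, lexicon):
    """Register one language; the tables of the other languages are untouched."""
    ANALYSIS[lang] = dict(lexicon)
    SYNTHESIS[lang] = {concept: word for word, concept in lexicon.items()}


def translate(word, source, target):
    """Analysis yields the representation R; synthesis realises R in the target."""
    r = ANALYSIS[source][word]
    return SYNTHESIS[target][r]


add_language("en", {"dog": "CANINE-1"})
add_language("es", {"perro": "CANINE-1"})
add_language("de", {"Hund": "CANINE-1"})  # a third language needs only its own tables


def interlingual_modules(n):
    """One analysis module and one synthesis module per language."""
    return 2 * n


def pairwise_modules(n):
    """One pair-specific component per ordered (source, target) pair."""
    return n * (n - 1)


print(translate("dog", "en", "es"))    # prints: perro
print(translate("perro", "es", "de"))  # prints: Hund
print(interlingual_modules(9), pairwise_modules(9))  # prints: 18 72
```

For nine languages the interlingual design needs 18 modules against 72 pair-specific components, which is why the benefit grows with the number of languages.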
The diagram in (3) illustrates graphically the relationship between sound (represented as text) and meaning (represented by R) in the Saussurean sense: the sound (text) is highly idiosyncratic and language-specific, while the meaning is universal and language-independent. The analysis procedure relates sound to meaning, and the synthesis procedure relates meaning to sound.

The difficulty in the interlingual system is finding the appropriate set of language-independent constructs that will be valid for a diverse set of natural languages. This includes finding a common structural representation and a common lexical representation which does not depend on the language being analyzed or the language being synthesized.

An early attempt at employing an interlingua was the DLT (Distributed Language Translation) system (Witkam 1983; Schubert 1988), which utilized Esperanto as the representation language for R. Although morphologically regular, Esperanto demonstrates the same syntactic and semantic ambiguity and vagueness as any natural language. This meant that the problem of translating between two natural languages in DLT was doubly compounded by first having to translate the source language text to an Esperanto text, and then translating the Esperanto text to the target language. Esperanto was no more appropriate as an interlingua than any other language.

Most interlinguas currently being investigated are representations of concepts. For example, the UNITRAN system (Dorr 1987; Dorr 1993) utilizes Jackendoff's (1983, 1990) Lexical Conceptual Structure (LCS) as an interlingual representation of meaning, reminiscent of the conceptual dependency representations of Schank (1972). As an example of the interlingual representation in UNITRAN, the following sentence in (4a) has the LCS representation in (4b) (Dorr 1993:122):

(4) a. John believes in unicorns
    b. [Event BE_Perc ([Thing JOHN],
                       [Path IN_Perc ([Position AT_Perc ([Thing JOHN], [Thing UNICORNS])])],
                       [Manner BELIEVINGLY])]

The constructs employed in Dorr's interlingua, based on Jackendoff's LCS, are taken from a set of types, primitives, and fields. The types define a set of ontological categories ("conceptual constituents") that describe the types of entities that may occur in the world, and include such things as Thing, Event, State, Position, Path, Property, Location, Time, Manner, Intensifier, Purpose, and Amount.⁴ The primitives are the individual tokens of specific types; for example GO, BE, and CAUSE are primitives of type Event, and JOHN, UNICORN, WEATHERVANE, etc., of type Thing. The fields are semantic labels that differentiate varying domains for a specific primitive, such as Perceptional (as in BE_Perc above), Locational, Possessional, Temporal, Identificational, Circumstantial, and Existential (Dorr 1993:98). The following sentences illustrate how the verbs go/change, be, and keep change meaning depending on whether they express concepts of spatial location and motion, possession, ascription of properties, or scheduling of activities (from Jackendoff 1990:25):

(5) a. Spatial location and motion
       i. The bird went from the ground to the tree
       ii. The bird is in the tree
       iii. Harry kept the bird in the cage
    b. Possession
       i. The inheritance went to Philip
       ii. The money is Philip's
       iii. Susan kept the money
    c. Ascription of properties
       i. The light went/changed from green to red
       ii. The light is red
       iii. Sam kept the crowd happy
    d. Scheduling of activities
       i. The meeting was changed from Tuesday to Monday
       ii. The meeting is on Monday
       iii. Let's keep the trip on Saturday

⁴ Dorr does not use all of Jackendoff's categories, such as Amount, and renames some, such as Position for Jackendoff's Place (Dorr 1993:99fn2). The exact set of types employed by Jackendoff or Dorr is not important here.

As in any ontological system, a major problem is denoting and delimiting the set of primitive symbols comprising the system. In LCS all lexical knowledge must be universally representable in terms of a finite set of types, primitives and fields.⁵ Sometimes the labels can be obtuse, such as Dorr's Manner primitives LIKINGLY, DANCINGLY, GIFTINGLY, WRITINGLY, SWIMMINGLY, etc. (Dorr 1993:100), or a Thing primitive such as KNIFE-WOUND, a conceptual component of the sentence I stabbed John and the Spanish equivalent Yo le di puñaladas a Juan 'I to-him gave knife-wounds to Juan'. As Trujillo (1999:171) mentions, "[s]ome types include a large number of primitives in an almost one-to-one relationship with the open class words in the language." If this is the case, then there may be little advantage to an interlingual system over a transfer system (described in the next section), in which words and their translations are directly related individually. Furthermore, words in one language often correspond to primitives for which no direct translation is available in a target language, forcing an unnatural classification of primitives. This is evidenced in the sentences below, in which the Manner primitive LIKINGLY is introduced into the LCS in (6a) in order to equate the German manner adverb gern in (6b) with the English construction like+to+V_infin in (6c):

(6) a. [Event BE_Circ ([Thing I],
                       [Position AT_Circ ([Thing I], [Event SWIM])],
                       [Manner LIKINGLY])]
    b. Ich schwimme gern
       'I swim gladly'
    c. I like to swim

⁵ Jackendoff also introduces features which may attach to primitives, such as [±distributive], [±contact], [±attachment], etc. These allow a further level of delineation, e.g. ON[−dist] for on the floor versus ON[+dist] for all over the floor (Jackendoff 1990:104-105).

Other interlingual systems include the (now defunct) ROSETTA system (Rosetta 1994), whose interlingua was based on Montague semantics, KBMT (Knowledge-Based Machine Translation) (Nirenburg et al. 1992), which incorporates linguistic as well as domain and real-world knowledge into the interlingua representation, and the KANT and KANTOO systems (Nyberg and Mitamura 1992; Nyberg and Mitamura 2000), which rely on restricted subsets of the vocabularies and grammars, referred to as controlled language (Mitamura and Nyberg 1995).

2.1.3 Transfer Approach

In the transfer approach, the MT system separates the task into the three major subtasks illustrated below:

(7) Text_S → [Machine Translation System: Analysis → R_S → Transfer → R_T → Synthesis] → Text_T

The first stage analyzes the source text and produces representation R_S. In the transfer stage, R_S is transformed to R_T, a representation expressed in terms of the target language. R_T is then used as input to the synthesis stage, which produces as output the target text.

Transfer must perform two different types of transformations, lexical and structural. Lexical transfer equates words in the source language to words in the target language, such as dog→perro in English-to-Spanish, dog→Hund in English-to-German, and so on. (In interlingual systems, this correspondence would be achieved by assigning dog, perro, and Hund an appropriate concept label such as CANINE-1.) Structural transfer transforms all other structures larger than the word.

R_S and R_T are generally semantic (=meaning) representations, since syntactic representations of translationally equivalent sentences are rarely isomorphic.
As a simple example, the English sentence I looked for a book might have the syntactic structure in (8a) whereas the Spanish equivalent has the structure in (8b):

(8) a. [Tree diagram: syntactic structure of the English sentence I looked for a book]
    b. [Tree diagram: syntactic structure of the Spanish sentence (yo) busqué un libro, in which V busqué takes the object NP consisting of Det un and N libro]

First, the English sentence contains an explicit subject as opposed to Spanish where pronominal subjects are usually null, Spanish being a pro-drop language. Second, the English structure includes the prepositional phrase6 for a book but the Spanish structure does not, hence a structural transformation would be required. If we change the sentence to future tense, another transformation would be required to eliminate the English auxiliary will, at the same time changing the inflection of the Spanish verb to buscaré. Given the typically high number of structural divergences between languages for any given sentence, syntactic structures are not generally used in transfer-based MT systems.

Instead, it is more common to base the representations $R_S$ and $R_T$ on some aspect of meaning, such as predicate-argument-modifier structure (hereinafter referred to as just predicate-argument structure), i.e. a structure in which a predicate (e.g. a verb), its thematic arguments or θ-roles (e.g. AGENT, THEME, etc.), and its modifiers (e.g. adverbials, relative clauses, etc.) are configured together, abstracting away from the actual syntactic relations between them. The working hypothesis is that predicate-argument structure reflects one aspect of structured meaning (von Stechow 1982; Cresswell 1985), which is hypothesized to be invariant across languages.
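The idea that a predicate-argument structure abstracts away from syntax can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout, the function name `lexical_transfer`, and the three-entry toy lexicon are hypothetical and are not taken from any system discussed here.

```python
# Hypothetical toy lexicon relating English and Spanish lexical items.
LEXICON_EN_ES = {"I": "yo", "look": "buscar", "book": "libro"}


def lexical_transfer(pred_arg_structure, lexicon):
    """Replace each lexical item by its target-language equivalent.

    The role labels (PRED, AGENT, THEME) are left untouched, so the
    source and target structures have exactly the same shape.
    """
    return {role: lexicon[item] for role, item in pred_arg_structure.items()}


# "I looked for a book": the preposition "for" and the overt subject position
# belong to syntax and do not appear at this level of representation.
english = {"PRED": "look", "AGENT": "I", "THEME": "book"}
spanish = lexical_transfer(english, LEXICON_EN_ES)
print(spanish)
```

Because both structures carry the same role labels, the syntactic differences between the two sentences (the English preposition, the optional Spanish subject) do not have to be handled at this level; they are left to analysis and synthesis.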
To capture the correlation of semantic representations across languages, I propose the PRINCIPLE OF SEMANTIC CONSISTENCY, informally defined in (9):

(9) Principle of Semantic Consistency: If construct $C_S$ in language $L_S$ has the same meaning as construct $C_T$ in language $L_T$, then the semantic representation of $C_S$ is consistent with the semantic representation of $C_T$.

6 This ignores the other possibility that look-for is a verb-particle construction, perhaps having the structure in (i): [V look for], which also would require a structural transformation.

Semantic consistency between two constructs entails that the meanings of the constructs are compatible with each other. At the thematic level, for example, if one construct is semantically analyzed as an AGENT, then the corresponding construct in the target language is also analyzed as an AGENT and not, say, a THEME or GOAL. At the lexical level, if a noun refers to an animate being in one language, then its translation in any target language also refers to an animate being. The PRINCIPLE OF SEMANTIC CONSISTENCY is implemented by the operation of unification, discussed later.

The transfer methodology converts the left-hand syntactic structure in (10) to the adjacent predicate-argument structure. The transfer stage transforms the predicate-argument structure to a semantically consistent predicate-argument structure, which synthesis then transforms to the syntactic structure on the right.

(10) [Diagram: English syntactic structure → predicate-argument structure (AGENT I, PRED look, THEME book) → predicate-argument structure (AGENT yo, PRED buscar, THEME libro) → Spanish syntactic structure]

Bilingual Transfer

The transfer-based approach is inherently bilingual, as opposed to multilingual, since it entails transfer rules between two specific languages.
In most industrial environments, a multilingual configuration would consist of multiple independent bilingual transfer systems, arranged as follows:

(11) English_1 Analysis → Transfer → Spanish Synthesis
     English_2 Analysis → Transfer → German Synthesis
     English_n Analysis → Transfer → Japanese Synthesis

Since each system in (11) is independent of the others, the components can be tailored to the specific language pair involved. For example, the English_1 analysis module can produce a representation $R_{E_1}$ which is tailored for translation to Spanish, while English_2 produces $R_{E_2}$ tailored for German, English_n for Japanese, etc. Many commercial MT systems are designed along these lines, since it allows for language-specific rule optimization (as well as language-pair merchandising). For example, in the English-German system, the structure of the English sentence could resemble the German structure in order to make transfer into German easier, whereas in the English-Spanish system, a Spanish-like structure might be adopted.

The disadvantage to this approach is that there is no necessary commonality between the various English modules, so that extensions to one English component will not be automatically reflected in any other English component. Over time, the English analysis modules become increasingly divergent, thereby increasing the cost and effort of maintenance. The same applies to multiple synthesis components for the same target languages, for example English-to-Spanish, German-to-Spanish, Japanese-to-Spanish, etc., as illustrated below:

(12) English Analysis → Transfer → Spanish_1 Synthesis
     German Analysis → Transfer → Spanish_2 Synthesis
     Japanese Analysis → Transfer → Spanish_n Synthesis

If, as in (12), each Spanish synthesis module is independent of the others, maintenance of one will not be automatically reflected in any of the others, leading to divergent Spanish synthesis modules.

Multilingual Transfer

There are a number of ways in which we can combine these multiple independent systems into one integrated multilingual system. The first is to recognize that if the same analysis module for language X could be used for all systems which have X as the source language, then we have achieved a first measure of multilinguality. For example, if English analysis is defined once, independent of the target language, then English analysis is no longer tied to one bilingual MT system, but is now (at least potentially) multilingual in its orientation. This is diagrammed below:

(13) English Analysis → Transfer → Spanish Synthesis
     English Analysis → Transfer → German Synthesis
     English Analysis → Transfer → Japanese Synthesis
     [a single English Analysis module feeds all three Transfer modules]

Similarly, if multiple source languages all translate into the same target language, for example English-to-Spanish, German-to-Spanish, Japanese-to-Spanish, etc., then by making the synthesis module common to all of the Spanish systems, we achieve a still higher degree of multilinguality.

(14) English Analysis → Transfer → Spanish Synthesis
     German Analysis → Transfer → Spanish Synthesis
     Japanese Analysis → Transfer → Spanish Synthesis
     [all three Transfer modules feed a single Spanish Synthesis module]

In (13), the analysis of an English text is carried out independently of the target language in which the translation will be performed, and similarly in (14), the synthesis of a Spanish text is independent of the originating language. The burden of translation is localized to the Transfer module. This is the approach taken in MMT, as well as numerous other multilingual transfer-based MT systems.
The next logical move conflates the analysis and synthesis modules for one language into one combined module. In the configuration in (13), the analysis module for English may be completely independent of the synthesis module for English. While potentially desirable from the perspective of runtime optimization, this suffers from a problem similar to that of multiple analysis modules for the same language, i.e., redundancy of data leading to increased maintenance costs. For example, the treatment of negation must be encoded in the analysis module in order to recognize negation in source texts, but must also be encoded in the synthesis module in order to properly generate negation in target texts. Yet the linguistic structures involving negation should be applicable regardless of the direction of processing. Once the language modules refer to static descriptions of the language and are not restricted to analysis-only or synthesis-only operations, the labels 'Analysis' and 'Synthesis' can be removed, and the direction of processing can now be made reversible, as illustrated in (15):

(15) English ↔ Transfer ↔ Spanish
     English ↔ Transfer ↔ German
     English ↔ Transfer ↔ Japanese
     [a single English module is connected to all three Transfer modules]

The English module is used in analysis to convert an English text to $R_E$, which is then transferred to the Spanish $R_S$, from which the Spanish module synthesizes a Spanish text. Conversely, the same Spanish module is used to analyze a Spanish text yielding $R_S$, which is transferred to the English $R_E$, from which the English module synthesizes the English translation.

This results in one language module for English, one for Spanish, one for German, etc., rather than multiple modules depending on target language or directionality of processing. Directionality becomes a product of the procedures that convert structures from one representation to the next, and not of the definition of the structures themselves.
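The maintenance argument can be made concrete with a little arithmetic. The sketch below counts language-specific analysis and synthesis modules only (transfer modules are left out), and it assumes that every ordered pair among n languages is served by the system; that full-coverage assumption and the function name `language_module_counts` are illustrative and do not come from the text.

```python
def language_module_counts(n):
    """Count analysis/synthesis modules for n languages under three designs.

    Assumes every ordered (source, target) pair of distinct languages is served.
    """
    ordered_pairs = n * (n - 1)
    return {
        # One analysis and one synthesis module per bilingual system.
        "independent_bilingual": 2 * ordered_pairs,
        # One shared analysis and one shared synthesis module per language.
        "shared_analysis_and_synthesis": 2 * n,
        # One reversible module per language.
        "reversible": n,
    }


print(language_module_counts(4))
```

Under these assumptions, English, Spanish, German, and Japanese would need 24 separately maintained analysis and synthesis modules in the fully independent design, 8 once analysis and synthesis are each shared, and 4 when each language module is reversible.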
2.1.4 Principle-Based Approach

Whereas the approach described in the previous section concentrated on the integration of modules for one language, a far more interesting and ambitious approach lies in integrating the descriptions of multiple languages into one multilingual description of language. Such is the direction taken in the study of grammar which posits the notion of a Universal Grammar (UG), where linguistic structures are derived from universal principles. By modelling the universal principles, the task of describing the grammars of multiple languages becomes reduced to describing how the universal principles apply to each language.

For example, cross-linguistically a noun phrase is headed by (i.e. contains as primary obligatory component) a noun, a verb phrase by a verb, etc. Such facts should not require repeating for every language, and instead form the basis of a principle of phrase structure as elaborated in X-bar theory (discussed further in Chapter 3). Language variation is accounted for by, first, defining an idiosyncratic lexicon for each language, and second, by augmenting the principles with a system of parameters which control how the principles apply. An example would be a head parameter that determines if objects follow verbs, as in English and Spanish, or precede verbs, as in German and Japanese. The Principles and Parameters framework in theoretical syntax (formerly known as Government-Binding (GB) theory (Chomsky 1981; Culicover 1997)) has arguably provided the greatest insight into how the interaction of parametrized principles might explain grammatical phenomena universally, and has formed the basis of a number of principle-based implementations (Wehrli 1984; Correa 1988; Fong 1991), including many research projects dedicated to the implementation of machine translation (Sharp 1985; Sharp 1986; Dorr 1987; Crocker 1988).
In MMT, language-universal properties are modelled by defining a core set of principles (i.e. rules which apply universally), and then defining individual language components such that they reference language-specific rules in addition to the core principles. This is diagrammed below:

(16) [Diagram: the English component is connected through Transfer modules to the Spanish, German, and Japanese components; every language component is also linked to a shared Core Principles component]

The English component contains the lexicon for English along with English-specific rules of grammar (for example, a rule of Do-Support for sentences such as We do not support that opinion), and a link to the core principles of language, such as the principle of phrase structure. Similarly, the Spanish component contains its own lexicon and language-specific rules, plus the link to the core principles, as do German, Japanese, and any other language which becomes part of the system. In effect, the core principles are shared among all language components, and it is this sharing of core principles that constitutes a fundamental property of the MMT system, making it more suitable for multilingual machine translation.

2.1.5 The Relation between Data and Algorithms

In earlier periods of natural language processing, there was no formal separation between the linguistic data and the algorithms that processed the data; the data and the algorithms were combined in one large computer program. In more recent periods, data and algorithms are separated. The data becomes the set of rules describing the linguistic knowledge, and the algorithms make up the rule interpreter or engine which applies the rules. This allows linguistic development and refinement to be carried out independently of the algorithms that process the data, and similarly algorithmic refinement can occur independently of the development of the linguistic data.
This is also the case in MMT, where linguistic data is separate from the rule interpreter. The rule interpreter, written in the Prolog programming language, takes advantage of Prolog unification, automatic backtracking, and deductive logic evaluation to derive the translation of a text from a given input. The various components of the MMT rule interpreter, including parsing and transformation, are described in Chapter 6.

The linguistic data in MMT is expressed in a uniform notation which can be interpreted by the rule interpreter. Each of the components making up the analysis, transfer, and synthesis stages is expressed using the same notational devices (tree and feature structures). The MMT user notation, together with the procedures that operate on the structures created by MMT, constitute the MMT formalism, described in the remainder of this chapter.

2.2 Multilingual Machine Translation Methodology

The basic methodology of MMT is to describe the translation process in terms of levels of representation of a text and transitions between levels. A source text $\text{Text}_S$ in source language $\mathcal{S}$ is transformed from its surface representation (a string of words) to an abstract level of representation $L_1$. The representation $L_1$ is then transformed to a second level of representation $L_2$, and so on until the final abstract level of representation $L_n$ is achieved. From there, the target text $\text{Text}_T$ is produced in the target language $\mathcal{T}$. This is diagrammed in (17):

(17) $\text{Text}_S \leftrightarrow L_1 \leftrightarrow L_2 \leftrightarrow \cdots \leftrightarrow L_k \leftrightarrow L_{k+1} \leftrightarrow \cdots \leftrightarrow L_{n-1} \leftrightarrow L_n \leftrightarrow \text{Text}_T$
     [Analysis spans $\text{Text}_S$ to $L_k$; Transfer spans $L_k$ to $L_{k+1}$; Synthesis spans $L_{k+1}$ to $\text{Text}_T$]

In a transfer-based model, the transfer phase occurs between two levels, $L_k$ and $L_{k+1}$, as indicated above.
The analysis phase encompasses all of the stages preceding $L_k$, and the synthesis phase encompasses all of the stages following $L_{k+1}$. (In the interlingua-based approach, $L_k$ would represent the interlingua structure, and synthesis would transform $L_k$ to the target text.)

Since the directionality is reversible, the information structure of the levels is symmetrical. That is, $L_1$ will have the same information structure as that of $L_n$, $L_2$ will have the same information structure as $L_{n-1}$, and so forth. This allows analysis to begin at $\text{Text}_S$ or at $\text{Text}_T$. As a further consequence, all languages have the same number of levels of representation.

The first fundamental property of the MMT formalism is that it describes the translation of languages by means of two constructs, generators and transformers. In MMT, each level $L_i$, $1 \le i \le n$, is described by what is referred to as a generator $G_i$, which is a set of rules that defines ("generates") all and only the well-formed structures at $L_i$. For each adjacent pair $L_i$ and $L_{i+1}$, a transformer ${}_iT_{i+1}$ describes the mapping of structures between $L_i$ and $L_{i+1}$. This is illustrated as follows:

(18) Generators: $G_1,\ G_2,\ \ldots,\ G_k,\ G_{k+1},\ \ldots,\ G_{n-1},\ G_n$ (each $G_i$ defines level $L_i$)
     Levels: $\text{Text}_S \leftrightarrow L_1 \leftrightarrow L_2 \leftrightarrow \cdots \leftrightarrow L_k \leftrightarrow L_{k+1} \leftrightarrow \cdots \leftrightarrow L_{n-1} \leftrightarrow L_n \leftrightarrow \text{Text}_T$
     Transformers: ${}_1T_2,\ {}_2T_3,\ \ldots,\ {}_{k-1}T_k,\ {}_kT_{k+1},\ {}_{k+1}T_{k+2},\ \ldots,\ {}_{n-2}T_{n-1},\ {}_{n-1}T_n$ (each ${}_iT_{i+1}$ maps between $L_i$ and $L_{i+1}$)

The generator $G_1$ describes structures at level $L_1$, generator $G_2$ describes structures at $L_2$, and transformer ${}_1T_2$ describes the mapping between an $L_1$ structure and an $L_2$ structure, and so on for the other generators and transformers. The generators $G_1$ to $G_k$ describe the levels of representation for one language, $\mathcal{L}_S$, just as the generators $G_{k+1}$ to $G_n$ describe the levels for language $\mathcal{L}_T$.
The transfer of structures from language $\mathcal{L}_S$ to language $\mathcal{L}_T$ occurs where $G_k$ and $G_{k+1}$, the transfer levels, interface; that is, transfer is defined by the transformer ${}_kT_{k+1}$. (The transformer ${}_kT_{k+1}$ in effect constitutes the transfer component illustrated in the diagrams in section 2.1.)

The generators that define a language, i.e., $G_1$ to $G_k$ for language $\mathcal{S}$ and $G_{k+1}$ to $G_n$ for language $\mathcal{T}$, are named and have a fixed interpretation. Generator $G_1$ (and by symmetry $G_n$) defines the morphological level, the internal structure of words. For English, this generator is given the name MLEN ("Morphological Level for ENglish"); for Spanish, its name is MLES ("Morphological Level for ESpañol"). Generator $G_2$ (and $G_{n-1}$) defines the syntactic level, the structure of phrases within a sentence. The English syntactic level is named SLEN, and the Spanish level SLES. All other generators $G_3$ to $G_k$ (and $G_{k+1}$ to $G_{n-2}$) define what are referred to as relational levels. For example, the level at which transfer takes place is a relational level based on the predicate-argument structure of the constituents. The transfer level generator for English is named TLEN, and for Spanish TLES.

The current implementation incorporates one additional relational level between the syntactic level and the transfer level. This is an intermediate level of representation in which null or covert constituents, such as implicit arguments and so-called "empty categories", are made overt. This facilitates the transformation of objects from the syntactic level to the transfer level and vice versa. For English, the generator of this level is labeled ILEN, and for Spanish ILES.
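The naming scheme and the symmetry of levels can be sketched in Python as follows. The generator names come from the text above; the function names and the idea of deriving a name by concatenating a level prefix with a language code are assumptions made only for this illustration.

```python
# Level prefixes in analysis order: morphological, syntactic,
# intermediate, and transfer level.
LEVEL_PREFIXES = ["ML", "SL", "IL", "TL"]


def generator_names(language_code):
    """Generator names for one language, e.g. 'EN' -> MLEN, SLEN, ILEN, TLEN."""
    return [prefix + language_code for prefix in LEVEL_PREFIXES]


def generator_sequence(source, target):
    """Generators G_1 .. G_n traversed when translating source -> target."""
    return generator_names(source) + generator_names(target)[::-1]


def transformer_pairs(sequence):
    """Adjacent generator pairs, one per transformer between two levels."""
    return list(zip(sequence, sequence[1:]))


en_to_es = generator_sequence("EN", "ES")
print(en_to_es)
print(transformer_pairs(en_to_es))
```

With four levels per language, n is 8 and k is 4, so the fourth adjacent pair, (TLEN, TLES), corresponds to the transfer transformer. Reversing the sequence gives the Spanish-to-English direction, which reflects the symmetry requirement that the first and last levels carry the same kind of information.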
The sequence of named generators in a translation system for English and Spanish, and the levels of representation they generate, is illustrated in (19) below:

(19) $\text{Text}_{EN}$ ↔ Morphological Level (MLEN) ↔ Syntactic Level (SLEN) ↔ Intermediate Level (ILEN) ↔ Transfer Level (TLEN) ↔ Transfer Level (TLES) ↔ Intermediate Level (ILES) ↔ Syntactic Level (SLES) ↔ Morphological Level (MLES) ↔ $\text{Text}_{ES}$

The second fundamental property of MMT is that it is principle-based. This means that the generators and transformers that define the languages share common properties, the core principles of grammar. For morphological structure, a core morphological generator ML is defined, with rules that are shared by all of the language-specific morphological generators, here MLEN and MLES. Similarly, the core syntactic generator SL is shared by the language-specific generators SLEN and SLES, and so on for the other generators. This is illustrated in (20) below:

(20) [Diagram: the levels of (19) for $\text{Text}_{EN}$ and $\text{Text}_{ES}$, with each pair of language-specific generators linked to a shared core generator: ML for the morphological level, SL for the syntactic level, and likewise for the intermediate and transfer levels]

The transformers, like the generators, also share common properties, since the types of structures being transformed between levels can often be generalized across languages. Thus the transformer between the syntactic and the intermediate relational levels of English, ${}_{SLEN}T_{ILEN}$, shares common properties with the Spanish equivalent, ${}_{SLES}T_{ILES}$. This commonality is encoded in the core transformer component ${}_{SL}T_{IL}$; similarly for the other transformers.

The third fundamental property of MMT is that it uses a uniform notation to describe the generators and transformers, and is based on two types of data structures, trees and feature structures. These are described next.

2.3 Data Structures

The two types of data structures in MMT are tree structures and feature structures.
Tree structures symbolize the linear and hierarchical relations between words and phrases, whereas feature structures describe the set of properties that characterize words and phrases, such as their phonological, morphological, syntactic, and semantic properties. These two data structures are described below, beginning with feature structures.

2.3.1 Feature Structures

First, I present the relevant definitions of features and feature structures. Then I present the MMT notation used to represent feature structures.

Definitions

A feature is defined as follows:

Definition 2.1 A feature is a relation between an attribute A and a value V, represented as A = V.

For example, the following feature:

(21) cat = n

expresses the relation that the attribute cat (category) has the value n (noun).

Definition 2.2 A feature set is an unordered partial function from attributes to their values.

Features within a feature set are unordered, and for every attribute in the feature set, there is one and only one value. Feature sets are commonly represented in unification-based grammar formalisms as an attribute-value matrix. For example, the feature sets in (22) consist of the three features for 'person', 'number', and 'gender' that together express the notion '3rd person singular masculine'. Because features are unordered, the feature sets in (22) are all equivalent:

(22) a. [per = 3, num = sing, gend = masc]
     b. [num = sing, per = 3, gend = masc]
     c. [gend = masc, num = sing, per = 3]

The property of a feature set being a partial function entails, first, that an attribute may only have one value. Hence, the following two feature sets are illicit because the attribute num occurs twice in each (regardless of its values):

(23) a. [num = sing, num = plur]
     b. [num = sing, num = sing]

Second, the function being partial entails that a feature set can consist of zero or more features.
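A feature set in this sense behaves much like a dictionary. The following Python sketch builds feature sets from attribute-value pairs and rejects a repeated attribute; the helper name `make_feature_set` and the choice of a plain `dict` are assumptions for illustration only and say nothing about how MMT itself stores feature sets.

```python
def make_feature_set(features):
    """Build a feature set from (attribute, value) pairs.

    A feature set is a partial function, so an attribute that occurs
    twice is rejected, whether or not the two values agree.
    """
    feature_set = {}
    for attribute, value in features:
        if attribute in feature_set:
            raise ValueError(f"attribute {attribute!r} occurs more than once")
        feature_set[attribute] = value
    return feature_set


# The same three features listed in three different orders.
a = make_feature_set([("per", 3), ("num", "sing"), ("gend", "masc")])
b = make_feature_set([("num", "sing"), ("per", 3), ("gend", "masc")])
c = make_feature_set([("gend", "masc"), ("num", "sing"), ("per", 3)])
print(a == b == c)

# A feature set may also contain no features at all.
empty = make_feature_set([])
```

Dictionary equality ignores the order in which keys were inserted, which matches the requirement that features within a feature set are unordered.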
The feature sets in (24) below are partial feature sets, each one increasingly informative:

(24) a. [ ]
     b. [num = sing]
     c. [per = 3, num = sing]
     d. [per = 3, num = sing, gend = masc]

A feature set is total when it contains all of the features that can be defined for it. A total feature set representing agreement, for example, is that in (24d), assuming that agreement is defined as the features for person, number, and gender, and no others. In MMT, total feature sets are defined in feature declarations, which delineate all and only the features that may occur within any feature set.

MMT Notation

In MMT user notation, a feature structure such as that in (24d) is represented as a list of comma-separated features enclosed in curly brackets, as in:

(25) {per=3, num=sing, gend=masc}

This list is further compiled by the MMT rule compiler into a standard Prolog list structure with a trailing tail variable to accommodate additional features through unification:

(26) [per=3, num=sing, gend=masc|_]

However, in general the more customary notation in (24) will be used, with the understanding that all feature structures in the MMT user notation are expressed as in (25), and that they are further compiled into Prolog list structures like that in (26).

Attribute names such as per, num and gend are represented as atomic symbols, i.e. indivisible character strings. Values, however, are expressed as either atomic symbols or feature sets; alternatively, a value may be expressed as a variable, which may become instantiated through unification (see §2.4) to either an atomic symbol or a feature set. For example, in the following feature set:
For example, in the following feature set: f e a t = n p e r = 3 num = s i n g gend = fem [ c a s e = X the value of the attr ibute ca t is atomic, the value of agr is a feature set, and the value of case is a variable, denoted, as in Pro log, by a character str ing that begins wi th a capital letter. (27) a g r CHAPTER 2. FORMALIZING MULTILINGUAL MACHINE TRANSLATION 23 2.3.2 Trees The l inguist ic object that is created at any given level is represented as a tree structure, the uniform data structure for l inguist ic objects in M M T . Tha t is, morphological objects, syntactic objects, and relational objects are all represented as tree structures. Similar ly, the l inguistic constructs in each generator G are expressed as (partial) tree descriptions, and each transformer T describes a set of tree-to-tree transformations. Definitions Standard definitions regarding trees as special types of graphs are adopted here and summarized below (Aho and U l lman (1972); Wa l l (1972:144-152); Carnie (2002:ch.3)): Definition 2.3 A graph Q is defined as the tuple < N, B > where: 1. TV is a nonempty finite set {n\, ...,nm} whose elements rii are nodes (or vertices), and 2'. B is a set {b\, -..,bn} of branches (or edges or arcs) whose elements bi are pairs of nodes from (i.e., a branch connects exactly two nodes, e.g. (rij, n^))-For convenience, each node in a graph is given a label, such as A , B, etc.; such graphs are often called labeled graphs. Labe led graphs are represented by labeled nodes w i th lines as branches connecting the nodes. A simple labeled graph Q consisting of the three nodes P, Q, and R is defined in (28a) and i l lustrated graphical ly in (28b); a more common notat ion is shown in (28c): (28) a. G= <{P,Q,R}, {(P,Q), (Q,R), (R,P)}> b. P Q c. P Q \/ R Definition 2.4 A path is a list of two or more nodes [n\,... ,nm] where each is in N and either: 1: (n\,nm) is a branch in B (i.e. there is a branch connecting n\ and nm), or 2. 
(ni,ri2) is a branch in B and there is a path from ri2 to nm. Definition 2.5 A cycle is a path that begins at node ni and ends at the same node rii. CHAPTER 2. FORMALIZING MULTILINGUAL MACHINE TRANSLATION 24 The graph in (28) is cyclic since there is a path from any node back to the same node; e.g., [P,Q,R, P\-Definition 2.6 A n acyclic graph is a graph Q which does not contain any cycles. If any one branch in the graph in (28) is removed, the graph becomes acycl ic: (29) a. P Q b. P Q c. P Q \/ \ / R R R Definition 2.7 A tree is an acyclic graph wi th one node designated as the root. By convention, the root node is the top-most node in the tree. For example, the acyclic graph in (29a) can be represented by each of the trees in (30), tak ing each node in turn as the root: (30) a. P b. R c. Q R / \ R I Q P I Q P Definition 2.8 A node a in graph Q dominates node /? if either: 1. (a, /3) is a branch of Q, or 2. there is a node 7 such that (7,/?) is a branch in Q and a dominates 7. Definition 2.9 A node a in graph Q immediately dominates node j3 if (a,/3) is a branch in Q. Definition 2.10 Two (or more) nodes that are immediately dominated by the same node are sisters of each other, and daughters of the immediately dominat ing (— mother) node. Definition 2.11 A node that dominates no other node is called a leaf. Any given node immediately dominates zero or more nodes, and al l nodes except the root are immediately dominated by exactly one node. The root node dominates al l other nodes, and no node dominates the root. To il lustrate, the node A in the tree in (31) below is the root node, dominat ing B, C , D and E, and immediately dominat ing B and C . Nodes B, D and E are leaf nodes. Nodes B and C are sisters, as are D and E: CHAPTER 2. FORMALIZING MULTILINGUAL MACHINE TRANSLATION 25 (31) B C D E Definition 2.12 A local tree is a tree consisting of a node a , called the local root, together w i th the nodes that a immediately dominates. 
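The relations defined above can be checked mechanically on the tree in (31). The Python sketch below stores a tree as a mapping from each mother node to its list of daughters; this encoding and all function names are assumptions chosen for illustration, not MMT's own representation.

```python
# Tree (31): A immediately dominates B and C; C immediately dominates D and E.
TREE_31 = {"A": ["B", "C"], "C": ["D", "E"]}


def daughters(tree, node):
    return tree.get(node, [])


def immediately_dominates(tree, a, b):
    return b in daughters(tree, a)


def dominates(tree, a, b):
    # Equivalent to Definition 2.8, but walking downward from a.
    return any(d == b or dominates(tree, d, b) for d in daughters(tree, a))


def nodes(tree):
    seen = []
    for mother, kids in tree.items():
        for node in [mother] + kids:
            if node not in seen:
                seen.append(node)
    return seen


def leaves(tree):
    # A leaf dominates no other node, i.e. it has no daughters.
    return [n for n in nodes(tree) if not daughters(tree, n)]


def sisters(tree, node):
    for kids in tree.values():
        if node in kids:
            return [k for k in kids if k != node]
    return []


def local_tree(tree, node):
    # The local root together with the nodes it immediately dominates.
    return (node, daughters(tree, node))


print(leaves(TREE_31), sisters(TREE_31, "D"), local_tree(TREE_31, "C"))
```

Run on (31), the helpers agree with the prose: A dominates every other node, B, D and E are the leaves, and the two local trees are rooted at A and C.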
In (31), the nodes A, B, and C form a local tree with A the local root; similarly, the nodes C, D, and E form a local tree with C the local root.

The importance of a local tree is that each rule of grammar describes a local tree, and conversely any local tree corresponds to a rule of grammar. When we look at the rules of grammar in the next chapter, we will observe that all relations between nodes are expressed within the local tree, i.e. between mother and daughter nodes or between sister nodes. This applies not only to the unification of features between local head and local root, but also to the constraints holding between the features of sister nodes, as in agreement relations (e.g. subject-verb, determiner-noun, noun-adjective) and licensing relations (specifier, complement, modifier, and clitic licensing, described in the next chapter). Having the local tree as the domain over which all of these relations hold strengthens the significance of the local tree and the restrictions it places on the feature content of the nodes and on the rules constraining that content.

As mentioned at the beginning of this subsection, each linguistic object is represented as a tree, i.e. a root node dominating zero or more subtrees (zero if the tree is itself just a single node). However, instead of labeled nodes as they appear in the above tree diagrams, each node in the tree is realized as a feature set. For example, the linguistic structure in (32a) would be (minimally) represented in MMT as the tree in (32b):

(32) a. [Tree: S with daughters NP and VP; VP with daughters V and NP]
     b. [The same tree with each node realized as a feature set: [cat = s] with daughters [cat = np] and [cat = vp]; [cat = vp] with daughters [cat = v] and [cat = np]]

MMT Notation

In MMT notation, a tree representation is formally specified as follows:

(33) Root.[Subtree_1, Subtree_2, ..., Subtree_n]

where Root is a feature set representing the root node of the tree, and each Subtree_i, $0 < i \le n$, is a tree representation.
When no subtrees are present under a root, as in a leaf node, the tree is represented as follows:

(34) Root.[]

The tree in (32b) would be represented in MMT user notation as follows:

(35) {cat=s}.[{cat=np}.[], {cat=vp}.[{cat=v}.[], {cat=np}.[]]]

and the local tree, headed by the node labeled S, would be expressed as in (36):

(36) {cat=s}.[{cat=np}, {cat=vp}]

2.4 Feature Set Descriptions

Whereas linguistic objects contain feature sets in their nodes, the nodes of generators and transformers contain feature set descriptions, consisting of a set of feature descriptions. A feature description is a constraint on what value an attribute can have and what value an attribute is proscribed from having. When a feature set description in a rule is applied to a feature set in an object, a constraint satisfaction process is initiated which either succeeds if all constraints can be successfully applied to the object, or fails if any one constraint fails to apply. We indicate the application of a rule's feature set description (FSD) to an object feature set (FS) as follows:

(37) $\mathrm{FSD} \sqcup \mathrm{FS} \Rightarrow \mathrm{FS}'$ (first version)

That is, a feature set description FSD applied to a feature set FS yields the updated feature set FS′. (This will be slightly revised in §2.4.2.) If FSD fails to apply to FS, then the constraint fails. This is annotated as follows:

(38) $\mathrm{FSD} \sqcup \mathrm{FS} \Rightarrow \mathrm{FAIL}$

Constraint satisfaction is an extension of the notion of feature unification (Shieber 1986), where a feature set description is combined with a feature set such that the resulting feature set contains all and only the features of both sets, and no attribute in the resulting feature set contains conflicting values.
For example, the equation in (39) illustrates the unification of a feature set description with an object feature set to yield the updated object feature set on the right side of the equation:

(39) Feature Set Description: [cat = n, agr = [per = 3, num = sing], case = X]
     ⊔ Object Feature Set: [case = nom, agr = [num = sing, gend = masc]]
     ⇒ Updated Object Feature Set: [cat = n, agr = [per = 3, num = sing, gend = masc], case = nom]

The feature description [cat = n] is added to the object feature set, and the feature description [per = 3] is added to the object feature set value of agr. The variable X is instantiated to the value nom.

The kinds of constraints considered here are divided into two major types called positive constraints and nonpositive constraints. The nonpositive constraints further subdivide into negative constraints and disjunctive constraints. These constraint types are elaborated in the following sections.

Feature set descriptions may contain multiple instances of a given feature. A feature set description may specify a positive, a negative, and a disjunctive constraint (or any number of them) for the same feature, as in the following:

(40) [agr = [num = sing], agr ≠ [per = 1], agr = {[per = 3, gend = masc] ∨ [gend = neut]}]

When applied to an object feature set containing the feature agr, the feature set description in (40) states that agr contains the feature [num = sing], does not contain the feature [per = 1], and contains either the feature set [per = 3, gend = masc] or the feature [gend = neut]. The object feature set that results from this constraint evaluation will contain just a single feature for agr with a value determined by the resolution of the three constraints.

2.4.1 Positive Constraints

A positive constraint is satisfied by unification, as illustrated in (39) above. Some important properties of unification are listed below:

1. Unification is additive.
In (39), the feature description [cat = n] in the feature set description is added to the object feature set; similarly, the feature description [per = 3] is added to agr in the object feature set.

2. Unification is idempotent. Unifying identical features does not alter the feature set; this is feature matching. In (39), the feature description [num = sing] in the feature set description for agr unifies with the feature [num = sing] in the object feature set. The unification of like features is not cumulative; that is:

(41) [num = sing] ⊔ [num = sing] ⇏ [num = sing, num = sing]

3. Unification instantiates variables. The variable X in the feature set description in (39) becomes instantiated to the value nom through unification with the object feature set. Any other variables named X in the feature set would also simultaneously become instantiated to nom.

4. Unification is information-preserving. Information is never destroyed, removed, hidden or deleted by the process of unification; unification only adds information to an object feature set (or confirms what is already in the feature set).

Unification fails if two values for the same attribute cannot unify. The unification of the feature sets in (42) fails, since the atomic values sing and plur are not identical:

(42) [num = sing] ⊔ [num = plur] ⇒ FAIL

Similarly, unification fails in (43), since an atomic symbol (expl) can only unify with an identical atomic symbol, not with a feature set:

(43) [pron = expl] ⊔ [pron = [cat = n, type = personal]] ⇒ FAIL

Positive constraints are found in all unification-based grammar formalisms, such as LFG (Lexical Functional Grammar) (Bresnan 1982; Bresnan 2001), FUG (Functional Unification Grammar) (Kay 1983; Kay 1984), HPSG (Head-Driven Phrase Structure Grammar) (Pollard and Sag 1994), and many others.
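The properties listed above can be made concrete with a minimal sketch of positive unification. It is not the MMT implementation: feature sets are modelled as nested Python dictionaries with atomic string values, logic variables such as X in (39) are left out, and `unify` and `FAIL` are names assumed here for illustration.

```python
FAIL = None  # sentinel for a failed unification (an empty dict is a valid feature set)


def unify(fsd, fs):
    """Unify a feature set description with an object feature set.

    Both arguments are nested dicts with atomic string values.
    Returns the updated feature set as a new dict, or FAIL on a clash.
    """
    result = dict(fs)
    for attr, val in fsd.items():
        if attr not in result:
            result[attr] = val                      # additive: new information is added
        elif isinstance(val, dict) and isinstance(result[attr], dict):
            sub = unify(val, result[attr])          # recurse into embedded feature sets
            if sub is FAIL:
                return FAIL
            result[attr] = sub
        elif val != result[attr]:
            return FAIL                             # atom clash, or atom against feature set
        # identical atoms: nothing changes (idempotent feature matching)
    return result


# (39) without the variable: cat and per are added, num is merely confirmed.
updated = unify({"cat": "n", "agr": {"per": "3", "num": "sing"}},
                {"case": "nom", "agr": {"num": "sing", "gend": "masc"}})
print(updated)

# (42) and (43): conflicting atoms, and an atom against a feature set.
print(unify({"num": "sing"}, {"num": "plur"}) is FAIL)
print(unify({"pron": "expl"}, {"pron": {"cat": "n", "type": "personal"}}) is FAIL)
```

Because the function only ever adds attributes or confirms existing ones, the information-preserving property holds by construction in this sketch.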
Positive constraints constrain features to have the specified values, and fail if the features have conflicting values, or more precisely, fail if the specified values conflict with any other constraints already on the values. This leads directly to the concepts of nonpositive constraints implemented in MMT, the negative and disjunctive constraints.

2.4.2 Nonpositive Constraints

The unification of a feature set description with an object feature set (i.e., the application of positive constraints) yields an updated object feature set. MMT also provides for the expression of nonpositive constraints, which either negatively or disjunctively delimit the set of features in the object feature set. Nonpositive constraint satisfaction goes beyond simple unification as described in the previous section. In particular, a nonpositive constraint might not be immediately resolvable, as opposed to positive constraints which are always immediately resolvable by unification. For example, the negative constraint in (44):

(44) [cat ≠ n]

indicates that the value of the attribute cat may not unify with the value n. When (44) is applied to a feature set that does not contain a feature for cat, such as (45), the constraint cannot be resolved, as there is no value for cat in (45) against which the negative constraint can be verified:

(45) [agr = [per = 3, num = sing]]

In this case, the constraint becomes pending until such time as unification succeeds in adding a feature for cat to the feature set; once the feature set has a value for cat, the negative constraint can be evaluated, which will either succeed or fail depending on whether the value of cat is n or not. The same phenomenon occurs with the disjunctive constraint, as described below.

Consequently, every object feature set FS can be considered to consist of two parts, one describing the positive features POS and one describing any pending, i.e.
unresolved (or not yet resolved), nonpositive constraints NPOS on the feature set:

(46) FS = ⟨POS | NPOS⟩

Every feature set description FSD interacts with these two parts as shown below:

(47) FSD ⊔ ⟨POS | NPOS⟩ ⇒ ⟨POS′ | NPOS′⟩ (final version)

That is, a feature set description within a rule, when applied to an object feature set consisting of positive and nonpositive constraints, yields an updated object feature set consisting of updated positive and nonpositive constraints. Given this notation, we can now describe the application of the constraint in (44) to the object feature set in (45) as follows:

(48) [cat ≠ n] ⊔ ⟨[agr = [per = 3, num = sing]] | { }⟩ ⇒ ⟨[agr = [per = 3, num = sing]] | {[cat ≠ n]}⟩

The feature set description consists of the single negative constraint [cat ≠ n]. The object feature set consists of two parts: the positive features for agr, and an empty set of nonpositive constraints, indicating that there are currently no pending constraints on the feature set. After the constraint application, the updated positive portion remains unchanged, but the updated nonpositive portion now contains the unresolved negative constraint.

The next two sections describe the two types of nonpositive constraints, beginning with negative constraints and followed by disjunctive constraints.

Negative Constraints

In a negative constraint, the value of an attribute is constrained to not have a specific value. The constraint succeeds if the feature set already has a value different from that specified in the constraint, or it fails if the feature set already contains the proscribed value. If its value has not yet been defined, or not sufficiently defined, the negative constraint becomes a pending nonpositive constraint on the feature set, evaluated each time subsequent constraints are applied to the feature set. Consider the three negative constraints in (49):

(49) a.
[cat ≠ n]
     b. [agr ≠ [per = 3, num = sing]]
     c. [subjref ≠ X, objref = X]

(In MMT user notation, the operator ≠ is written as '~=', as in {cat~=n}.) In (49a), the constraint states that the attribute cat may not have as value the atomic symbol n. The effect of this constraint on an object feature set is illustrated by the four constraint equations in (50):

(50) a. [cat ≠ n] ⊔ [cat = n] ⇒ FAIL
     b. [cat ≠ n] ⊔ [cat = v] ⇒ [cat = v]
     c. [cat ≠ n] ⊔ [cat = [n = +, v = −]] ⇒ [cat = [n = +, v = −]]
     d. [cat ≠ n] ⊔ ⟨[agr = [num = sing]] | { }⟩ ⇒ ⟨[agr = [num = sing]] | {[cat ≠ n]}⟩

In (50a), the object feature set already contains the feature [cat = n], so the constraint fails. In (50b), the object feature set contains a value for cat different from that specified in the constraint, so the constraint succeeds; the resulting object feature set remains unchanged. In (50c), the value of the object's cat attribute is a feature set, which is never unifiable with an atomic symbol, so the constraint succeeds, and the object feature set remains unchanged. In (50d), the object feature set does not contain any value for cat. After unification, the constraint is added to the object feature set as a pending constraint; the value of cat is free to unify with anything except the atomic value n.

Consider now the negative constraint in (49b), repeated here:

(49) b. [agr ≠ [per = 3, num = sing]]

This constraint states that the value of agr may not simultaneously contain the two features [per = 3] and [num = sing]. In other words, it expresses the notion "not 3rd person singular", characteristic of English present tense verb forms which appear in their base (=uninflected) form, as shown in the table below for the verb listen:

(51) 1st person singular    I listen
     2nd person singular    you listen
     3rd person singular    he/she/it listens
     1st person plural      we listen
     2nd person plural      you listen
     3rd person plural      they listen

The verb form listens is unique to 3rd person singular agreement; all others are "not 3rd person singular", so it is convenient to be able to express that property using the negative constraint in (49b). Note how this constraint interacts with the object feature sets in the following equations:

(52) a. [agr ≠ [per = 3, num = sing]] ⊔ [agr = [per = 3, num = sing, gend = masc]] ⇒ FAIL
     b. [agr ≠ [per = 3, num = sing]] ⊔ [agr = [per = 1, num = sing]] ⇒ [agr = [per = 1, num = sing]]
     c. [agr ≠ [per = 3, num = sing]] ⊔ [agr = [per = 3]] ⇒ ⟨[agr = [per = 3]] | {[agr ≠ [num = sing]]}⟩

In (52a), the object feature set includes both of the features proscribed by the negative constraint, so the constraint fails. In (52b), the object feature set contains the feature [per = 1], which does not unify with [per = 3] in the negative constraint, hence the constraint is satisfied; the object feature set does not contain both [per = 3] and [num = sing]. In (52c), the object feature set contains the feature [per = 3], which is specified in the negative constraint, but it does not contain the feature [num = sing]. The constraint succeeds but a new constraint is added to the list of pending constraints, prohibiting the object feature set from additionally acquiring the feature [num = sing]. The resulting feature set on a verb is compatible with a 3rd person plural pronoun, e.g. they listen, but not with a 3rd person singular pronoun such as *he listen.⁷

⁷ The situation is complicated by the fact that he listen is licit in subjunctive mood, as in She asked that he listen to her. Thus the constraint on subject-verb agreement also has to take into consideration the grammatical mood of the utterance.

Finally, the negative constraint in (49c), repeated here:

(49) c. [subjref ≠ X, objref = X]

illustrates how Prolog logic variables, e.g. X, propagate the effects of constraints throughout an entire object feature set. The constraint in (49c) is intended to have the interpretation that a subject's referential value is not the same as the object's referential value. (Such a rule might appear within conditions on non-coreference. This is a simplification for the purpose of discussing the behavior of negative constraints; the actual representation of reference and conditions on reference are detailed in Chapter 3.) The constraint states that whatever value the subject's reference has, it cannot unify with the value of the object's reference, and conversely, whatever value the object reference has, the subject's reference value cannot unify with it. Consider the negative constraint equations in (53):

(53) a.
[subjref ≠ X, objref = X] ⊔ [subjref = [index = 22, per = 3], objref = [index = 7, per = 3]] ⇒ [subjref = [index = 22, per = 3], objref = [index = 7, per = 3]]
     b. [subjref ≠ X, objref = X] ⊔ [subjref = [per = 3], objref = [index = 7, per = 3]] ⇒ ⟨[subjref = [per = 3], objref = [index = 7, per = 3]] | {[subjref ≠ [index = 7]]}⟩

In (53a), the variable X is instantiated to [index = 7, per = 3] by virtue of the positive constraint portion of the equation, as shown below:

(54) [objref = X] ⊔ [objref = [index = 7, per = 3]] ⇒ [objref = [index = 7, per = 3]]

The negative constraint portion now reduces to the following:

(55) [subjref ≠ [index = 7, per = 3]] ⊔ [subjref = [index = 22, per = 3]] ⇒ [subjref = [index = 22, per = 3]]

which, similar to equation (52b) above, is clearly satisfied, since [index = 7] does not unify with [index = 22]. Hence the negative constraint succeeds, and the result leaves the object feature set unchanged. In (53b), the variable X is again instantiated to the feature set [index = 7, per = 3], and the negative constraint equation is reduced to the following:

(56) [subjref ≠ [index = 7, per = 3]] ⊔ [subjref = [per = 3]] ⇒ ⟨[subjref = [per = 3]] | {[subjref ≠ [index = 7]]}⟩

The result is a pending negative constraint on the value of subjref; it may not come to contain the feature [index = 7].

Disjunctive Constraints

The second type of nonpositive constraint is the disjunctive constraint, where the value of an attribute is constrained to be one of a set of alternatives. A disjunctive constraint is displayed in one of the following two equivalent forms:

(57) a. attr = (val_1 ∨ val_2 ∨ ... ∨ val_n)
     b. attr = {val_1 ∨ val_2 ∨ ... ∨ val_n}

where each disjunctive value val_i is either an atomic symbol or a feature set description (or a variable).
The first form in (57a) will generally be used for atomic-valued disjuncts, and the second form in (57b) for feature set-valued disjuncts. (In the MMT implementation, disjunction is represented uniformly by a semicolon, as in {attr=(val1;val2;...;valn)}.)

If an object feature set unifies with only one of the disjuncts, the disjunctive constraint is immediately satisfied, as illustrated by the following constraint equations:

(58) a. [cat = (noun ∨ verb ∨ adj)] ⊔ [cat = verb] ⇒ [cat = verb]
     b. [cat = (noun ∨ verb ∨ adj)] ⊔ [cat = prep] ⇒ FAIL
     c. [pron = (expl ∨ [cat = n])] ⊔ [pron = [per = 3]] ⇒ [pron = [per = 3, cat = n]]

The resolution of disjunctive constraints involving atomic symbols, as in (58a,b), is straightforward: if the atomic value in the object is one of the values in the disjunctive constraint, the constraint succeeds, as in (58a), and if it is not one of the disjunctive values, the constraint fails, as in (58b). Distinguishing between an atomic value and a feature set is also straightforward, as shown in (58c), where the value of pron is either the atomic symbol expl or the feature [cat = n]. Since the object value of pron is the feature [per = 3], only the disjunct [cat = n] satisfies the constraint.

Distinguishing between two (or more) feature sets in a disjunctive constraint is more complex, but the general rule for constraint satisfaction applies: if only one of the disjuncts unifies with the object feature set, the constraint is satisfied with the object feature set having the result of the unification. A simple example is the following:

(59) [agr = {[per = 3, num = sing] ∨ [per = 3, num = plur, gend = masc]}] ⊔ [agr = [num = plur, case = nom]] ⇒ [agr = [num = plur, case = nom, per = 3, gend = masc]]

The feature [num = plur] in the second disjunct above is sufficient to satisfy the constraint, with the result being the unification of the second disjunct with the object feature set. In cases where there is insufficient content in the object feature set to resolve the disjunctive constraint, the unresolved disjunctive portion of the constraint becomes pending in the object feature set, as illustrated below:

(60) [agr = {[per = 3, num = sing] ∨ [per = 3, num = plur, gend = masc] ∨ [per = 2, num = plur, gend = fem]}] ⊔ [agr = [case = nom, num = plur]] ⇒ ⟨[agr = [case = nom, num = plur]] | {[agr = {[per = 3, gend = masc] ∨ [per = 2, gend = fem]}]}⟩

The disjunctive constraint in (60) contains three disjuncts. The first cannot unify with the object feature set because of conflicting values for the attribute num. That disjunct is eliminated from the equation, but the two remaining disjuncts cannot be resolved until the object feature set acquires a value for per or gend. So the two disjuncts form a pending disjunctive constraint in the object feature set.

2.4.3 Conjoined Feature Sets

Any given feature set description consists of a conjunction of individual feature descriptions. For example, the feature set description in (61a) is equivalent to the conjunction of individual feature descriptions in (61b):

(61) a. [p = x, ...]
     b. [p = x] ∧ [...] ∧ [...]

Formalizing this property by introducing the conjunction operator '∧' (logical AND) allows one to do two things that are not possible with the means available so far. First, it allows one of the conjuncts to be represented by a logical variable. This makes it possible to model shared (or "reentrant") feature structures, typically annotated with numbered boxes in most unification-based grammar formalisms, as in the following (from Shieber 1986:19):

(62) [agreement: [1] [number: singular, person: third], subject: [agreement: [1]]]

In (62), the values of the two agreement attributes are shared, indicated by the symbol [1]. Furthermore, that value is constrained to unify with features signifying 3rd person singular agreement. In MMT, this feature set description would be expressed as follows:

(63) [agreement = X ∧ [number = singular, person = third], subject = [agreement = X]]

This feature set description uses the conjunction operator to conjoin the variable X to the feature set for 3rd person singular. In MMT user notation, the '∧' symbol is replaced by the ampersand '&', so that (63) is represented as:

(64) {agreement=X&{number=singular,person=third},subject={agreement=X}}

While conjunction resembles unification (it has the same effect as unification in (61)), it differs in that the conjunction operator is a notational component of a feature set description, like disjunction and negation. Unification, on the other hand, is the operation that combines a feature set description with an object feature set, resolving constraints and instantiating variables.

The second function of the conjunction operator allows one of the conjuncts to be represented as a disjunction of feature sets. In this way, a feature set description can be separated into an invariable part and a disjunctive part.
For example, the German definite article der can occur in six configurations, shown in (65) and exemplified in (66):

(65) German der (definite article)

     Number     Gender      Case         Example
     singular   masculine   nominative   (66a)
     singular   feminine    dative       (66b)
     singular   feminine    genitive     (66c)
     plural     masculine   genitive     (66d)
     plural     feminine    genitive     (66e)
     plural     neuter      genitive     (66f)

(66) a. Gestern ist der Zug angekommen
        yesterday is SG.MASC.NOM train arrived
        'Yesterday the train arrived'
     b. Ich gab der Frau ein Geschenk
        I gave SG.FEM.DAT woman a gift
        'I gave the woman a gift'
     c. Die Geduld der Mutter ist grenzenlos
        the patience SG.FEM.GEN mother is limitless
        'The mother's patience is boundless'
     d. Das Projekt der Männer beeindruckte uns
        the project PL.MASC.GEN men impressed us
        'The men's project impressed us'
     e. Das Projekt der Frauen beeindruckte uns
        the project PL.FEM.GEN women impressed us
        'The women's project impressed us'
     f. Das Projekt der Kinder beeindruckte uns
        the project PL.NEU.GEN children impressed us
        'The children's project impressed us'

One method of specifying the lexical entries for der is to write six separate entries, one for each of the combinations of number, gender and case. However, using conjoined feature sets, these can be collapsed into one rule, each time factoring out common information as the invariable part and disjunctive feature sets for the remainder.
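The factoring idea can be sketched in code before looking at the MMT notation for it. The fragment below is an assumption-laden illustration, not MMT syntax: the agreement features are flattened into a single dictionary, a nested "alts" key (a hypothetical device) stands for a disjunction conjoined with the features beside it, and plural gender is simply left unspecified so that it is compatible with any gender.

```python
# Invariable part shared by every reading of "der".
INVARIANT = {"word": "der", "cat": "art", "per": "3"}

# Disjunctive part. Each alternative is a partial feature set; an "alts" list
# inside it is a further disjunction conjoined with that alternative.
ALTERNATIVES = [
    {"num": "sing", "alts": [
        {"gend": "masc", "case": "nom"},
        {"gend": "fem", "alts": [{"case": "dat"}, {"case": "gen"}]},
    ]},
    {"num": "plur", "case": "gen"},   # gender unspecified
]


def expand(base, alts):
    """Yield flat feature sets: base conjoined with one choice from each disjunction."""
    for alt in alts:
        merged = {**base, **{k: v for k, v in alt.items() if k != "alts"}}
        if "alts" in alt:
            yield from expand(merged, alt["alts"])
        else:
            yield merged


ENTRIES = list(expand(INVARIANT, ALTERNATIVES))


def licensed(num, gend, case):
    """True if some expanded entry is compatible with the configuration.

    An attribute missing from an entry counts as unspecified and matches anything.
    """
    config = {"num": num, "gend": gend, "case": case}
    return any(all(entry.get(k, v) == v for k, v in config.items())
               for entry in ENTRIES)


for entry in ENTRIES:
    print(entry)
print(licensed("plur", "neut", "gen"))   # a row of table (65)
print(licensed("sing", "masc", "gen"))   # not a row of table (65)
```

Under these assumptions the single factored description expands to four flat entries that together license exactly the six rows of the table, because the plural entry leaves gender open.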
The entry would be specified as follows:

(67) [word = der, cat = art, agr = [per = 3] ∧ {[num = sing] ∧ {[gend = masc, case = nom] ∨ [gend = fem, case = (dat ∨ gen)]} ∨ [num = plur, case = gen]}]

In other words, der is a 3rd person article, which is specified for either singular or plural number, and if singular then its agreement features include masculine nominative or feminine dative or genitive, and if plural then its case is genitive and its gender is unspecified, i.e., it unifies with any gender.

2.4.4 Constraint Resolution

Nonpositive, i.e. negative and disjunctive, constraints are costlier to process than positive constraints. When multiple pending nonpositive constraints accumulate in an object feature set, they may be resolvable by mutual interaction with each other, or at least able to reduce to fewer and/or simpler constraints, thus reducing processing costs. For example, assume that the negative constraint in (68) is applied to an object feature set containing a pending disjunctive constraint. The result is the resolution of both constraints:

(68) [agr ≠ [per = 3]] ⊔ ⟨[agr = [case = nom, num = plur]] | {[agr = {[per = 3, gend = masc] ∨ [per = 2, gend = fem]}]}⟩ ⇒ [agr = [case = nom, num = plur, per = 2, gend = fem]]

The negative constraint in (68) is compatible with only one of the disjuncts in the pending disjunctive constraint, since the negative constraint proscribes the agr attribute from having the feature [per = 3]. The disjunct specifying an agr value of [per = 3, gend = masc] is thereby excluded, leaving only the second disjunct to unify with the object. The object now satisfies the negative constraint, and resolves the disjunctive constraint. The resulting object set no longer has any pending constraints, as all constraints have been resolved.

Complex combinations of disjunctive and negative constraints can interact in such a way that many if not all constraints are resolved during unification.
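The interaction in (68) can be sketched with a small resolution loop. This is a simplified illustration rather than the MMT algorithm: it works on the flat content of a single attribute such as agr, with atomic values only and no logic variables, and the names `apply_constraints`, `check_neg`, and `unify_flat` are hypothetical. A negative constraint is a set of features that may not all be present together; a disjunctive constraint is a list of alternative feature sets.

```python
FAIL = None  # sentinel for constraint failure


def unify_flat(a, b):
    """Unify two flat feature sets with atomic values; FAIL on a clash."""
    for attr in a.keys() & b.keys():
        if a[attr] != b[attr]:
            return FAIL
    return {**b, **a}


def check_neg(neg, pos):
    """Evaluate a negative constraint: pos must not contain every feature of neg.

    Returns "ok" (satisfied), "fail" (violated), or the features of neg that
    are still undecided, which remain as a pending negative constraint.
    """
    rest = {}
    for attr, val in neg.items():
        if attr in pos:
            if pos[attr] != val:
                return "ok"              # a differing value settles the matter
        else:
            rest[attr] = val
    return rest if rest else "fail"


def apply_constraints(constraints, pos, pending=()):
    """Apply nonpositive constraints to (pos, pending); return (pos', pending') or FAIL."""
    pos = dict(pos)
    todo = list(pending) + list(constraints)
    while True:
        negs = [body for kind, body in todo if kind == "neg"]
        remaining, progress = [], False
        for kind, body in todo:
            if kind == "neg":
                verdict = check_neg(body, pos)
                if verdict == "fail":
                    return FAIL
                if verdict != "ok":
                    remaining.append(("neg", verdict))
            else:  # kind == "or": keep disjuncts that unify and violate no negative constraint
                alive = []
                for disjunct in body:
                    merged = unify_flat(disjunct, pos)
                    if merged is not FAIL and all(check_neg(n, merged) != "fail" for n in negs):
                        alive.append(disjunct)
                if not alive:
                    return FAIL
                if len(alive) == 1:
                    pos = unify_flat(alive[0], pos)   # resolved: unify the survivor
                    progress = True
                else:
                    remaining.append(("or", alive))
        todo = remaining
        if not progress:
            return pos, todo


# (68): a pending disjunction is resolved by a newly applied negative constraint.
pending = [("or", [{"per": "3", "gend": "masc"}, {"per": "2", "gend": "fem"}])]
print(apply_constraints([("neg", {"per": "3"})], {"case": "nom", "num": "plur"}, pending))
```

The loop repeats whenever a disjunction is resolved, since the features it adds may in turn decide negative constraints that were still pending. Unlike the display in (60), this sketch keeps surviving disjuncts unreduced rather than removing the features already confirmed in the object.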
The strategy of constraint satisfaction and resolution ensures that at no time will the object contain conflicting features.

2.5 Context-Free Grammars

MMT is a rule-based formalism. That is, the lexicons, generators, and transformers are defined by rules. The rules are responsible for generating linguistic objects and for transforming objects in the process of translation. The term 'generate' stems from the notion of generative grammar:

    The basic concept of generative grammar is simply a system of rules that defines in a formally precise way (i.e. 'generates') a set of sequences (strings over some vocabulary of words or 'formatives') that represent the well-formed sentences of a given language. (Sag and Wasow 1999:411)

The generative grammars that have been proposed for natural languages belong to the class of context-free grammars (CFGs). This section begins with a formal definition of a CFG, and then proceeds to show how simplifications to the formal definition can be made to describe the types of CFGs that are the basis of MMT generators.

2.5.1 Formal Definition

A context-free grammar 𝒢 for language ℒ is defined as the following tuple (Aho and Ullman 1972:85):

Definition 2.13 𝒢 = ⟨𝒩, 𝒱, 𝒫, S⟩, where
• 𝒩 is a finite set of nonterminal symbols,
• 𝒱 is a finite set of terminal symbols (the vocabulary),
• 𝒫 is a finite set of phrase structure rules (also known as production rules), and
• S is the start symbol, a designated member of 𝒩 which defines the set of sentences of ℒ.

A phrase structure rule is defined as follows:

Definition 2.14 A → α is a phrase structure rule, where A ∈ 𝒩, and α ∈ {𝒩 ∪ 𝒱}+.⁸

⁸ This is interpreted as follows: The symbol A is a member of the set 𝒩 of nonterminal symbols, and α is a sequence of one or more symbols taken from the combined set of nonterminal and terminal symbols.

A simple CFG is illustrated below:

(69) 𝒢 = ⟨ 𝒩: {S, NP, VP, N, V},
          𝒱: {saw, he, himself},
          𝒫: {S → NP VP, VP → V NP, NP → N, N → he ∨ himself, V → saw},
          S: S ⟩

This grammar will recognize the string 'he saw himself' (among others) as being a sentence of the language and having the following hierarchical phrase structure, which graphically illustrates the derivational history of the phrase structure rules that were applied to generate this sentence:

(70) [S [NP [N he]] [VP [V saw] [NP [N himself]]]]

2.5.2 Simplifications

The definition of a context-free grammar given in 2.13 can be simplified in a number of ways. For one, since the left-hand side of each phrase structure rule in 𝒫 is a symbol that belongs to the set 𝒩 of nonterminal symbols, the set 𝒩 is derivable from 𝒫, hence 𝒩 is redundant and eliminable. In other words, by Definition 2.14 for the phrase structure rule, there can be no rule X → α where X is not a symbol in 𝒩. Using the grammar in (69) as an illustration, the set of phrase structure rules has the symbols S, VP, NP, N, and V as left-hand side symbols, which is exactly the extension of the defined set of nonterminal symbols {S, NP, VP, N, V}. In principle, there could exist a nonterminal W within 𝒩 which no rule expands, but then W would be an ineffectual nonterminal symbol and could be eliminated from 𝒩 without affecting the language coverage of the grammar. For example, if the symbol PP exists in 𝒩 but no phrase structure rule has PP on its left-hand side, the symbol PP could be removed from 𝒩. Hence, the set 𝒩 need not be stipulated and so can be eliminated from the definition of a grammar.
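The redundancy of the nonterminal set can be checked mechanically for the grammar in (69). The sketch below encodes each phrase structure rule as a (left-hand side, right-hand side) pair; this encoding and the function names are assumptions made for illustration.

```python
# Phrase structure rules of (69), one pair per expansion.
RULES = [
    ("S", ["NP", "VP"]),
    ("VP", ["V", "NP"]),
    ("NP", ["N"]),
    ("N", ["he"]),
    ("N", ["himself"]),
    ("V", ["saw"]),
]

# The nonterminal set as stipulated in (69).
DECLARED = {"S", "NP", "VP", "N", "V"}


def nonterminals(rules):
    """Derive the nonterminal set from the rules: every left-hand side symbol."""
    return {lhs for lhs, _ in rules}


def ineffectual(declared, rules):
    """Declared nonterminals that no rule expands, hence removable."""
    return declared - nonterminals(rules)


print(nonterminals(RULES) == DECLARED)        # the stipulated set adds nothing
print(ineffectual(DECLARED | {"PP"}, RULES))  # a declared but unexpanded PP
```

The first check confirms that the declared set coincides with the set of left-hand side symbols, and the second picks out PP as the kind of ineffectual symbol discussed above.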
Second, the phrase structure rules in (69) contain certain nonterminal symbols, sometimes called preterminal symbols (cf. Chomsky's (1965:84) preterminal strings), that expand to the terminal symbols, i.e. the vocabulary items. An example is the phrase structure rule V → saw. Clearly it is redundant to include the terminal symbols as components of the vocabulary set 𝒱 and to specify them again as expansions of preterminal symbols in the set of phrase structure rules 𝒫. Given a lexicon of tens or hundreds of thousands of entries, it is also highly inefficient. The solution is made available by recognizing that the terminal symbols in 𝒱 specify feature sets. This means that instead of atomic values for the terminal symbols, as in {saw, he, himself}, the symbols specify feature sets, incorporating features for both the terminal and the preterminal symbols, as in { [word = saw, cat = v], [word = he, cat = n], [word = himself, cat = n] }. The preterminal phrase structure rules in the set 𝒫 can then be eliminated.

Third, the start symbol S restricts the category of the topmost node of the sentence. However, natural language constructs are not limited to the sentence. Typical examples of nonsentence occurrences in natural language texts include titles, captions, and short-answer responses, which are often realized as noun phrases, prepositional phrases, adverbials, etc.⁹ A grammar that processes natural language has to be able to identify any construction as belonging to the language, but it needn't restrict that construction to a single syntactic category. For this reason, S, the start symbol, is irrelevant, and no restrictions are placed on the category value of the topmost node.

We are left with the following view of a grammar:

(71) 𝒢 = ⟨𝒱, 𝒫⟩

where 𝒢 contains the terminal symbols 𝒱 and the phrase structure rules 𝒫.
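A grammar in the reduced form of (71) can be sketched as a lexicon of feature sets plus the remaining phrase structure rules, with no start symbol. The recognizer below is a plain chart-based illustration (CKY style, restricted to unary and binary rules) and is not how MMT generators operate; lowercase category names and the function name `categories` are assumptions. It reports every category that spans the input, so sentence fragments are accepted alongside full sentences.

```python
# Terminal symbols as feature sets; the preterminal rules of (69) are gone.
LEXICON = [
    {"word": "saw", "cat": "v"},
    {"word": "he", "cat": "n"},
    {"word": "himself", "cat": "n"},
]

# Remaining phrase structure rules: (mother, daughters).
RULES = [("s", ("np", "vp")), ("vp", ("v", "np")), ("np", ("n",))]


def close_unary(cats):
    """Add every category reachable from cats through unary rules."""
    changed = True
    while changed:
        changed = False
        for lhs, rhs in RULES:
            if len(rhs) == 1 and rhs[0] in cats and lhs not in cats:
                cats.add(lhs)
                changed = True
    return cats


def categories(words):
    """Return the set of categories that span the whole word sequence."""
    n = len(words)
    if n == 0:
        return set()
    chart = {}
    for i, word in enumerate(words):
        lexical = {entry["cat"] for entry in LEXICON if entry["word"] == word}
        chart[i, i + 1] = close_unary(lexical)
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cats = set()
            for k in range(i + 1, j):            # every split point
                for lhs, rhs in RULES:
                    if len(rhs) == 2 and rhs[0] in chart[i, k] and rhs[1] in chart[k, j]:
                        cats.add(lhs)
            chart[i, j] = close_unary(cats)
    return chart[0, n]


print(categories(["he", "saw", "himself"]))  # a full sentence
print(categories(["saw", "himself"]))        # a verb phrase fragment
print(categories(["himself"]))               # a noun (phrase) fragment
```

Since nothing singles out one category as the goal, the caller decides what to do with the categories returned, which mirrors the point that the topmost node is unrestricted.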
In the terminology of MMT, the terminal symbols are defined by the lexicon, and the phrase structure rules by the generator. The terminal symbols, as vocabulary, are particular to each language. The phrase structure rules, however, share a substantial degree of commonality across languages, as captured for example in the principle of phrase structure, one of the principles of UG. Hence, the grammatical description of a given language ℒ consists of the set of principles of UG plus any idiosyncratic rules of ℒ that (as yet) appear to fall outside the bounds of UG. For any language ℒ, we can characterize its grammatical system 𝒢 as follows:

(72) 𝒢_ℒ = ⟨𝒱_ℒ, 𝒫_ℒ ∪ 𝒫_UG⟩ for ℒ ∈ {English, Spanish, ...}

That is, the grammar of language ℒ consists of its vocabulary 𝒱_ℒ, the rules 𝒫_ℒ that pertain specifically to language ℒ, and the rules (principles) of grammar common to all languages 𝒫_UG.

⁹ Short-answer responses in a discourse context could be analyzed as elided sentences. For example, the response in (b) to the question in (a) could be analyzed as the structure in (c), where the bracketed material has been elided:
(i) a. Who threatened Iraq?
    b. Bush.
    c. Bush [threatened Iraq]
This analysis might provide support for retaining S as a start symbol, even in the case of sentence fragments like (b). However, analyzing book titles, for example The Andromeda Strain or For Whom the Bell Tolls, would still require relaxing any constraint on the categorial requirement of the texts being processed, and support the elimination of the start symbol as a formal property of grammar.

The next section addresses the MMT rule system in which the vocabulary and phrase structure rule components of an MMT translation system are defined.

2.6 Rule Systems

An MMT translation system consists of, for each language implemented, a lexicon, several generators, and several transformers.
Each of these components is defined by a set of rules, for which a well-defined rule notation is developed. The generators and transformers share a common core of principles, and these principles are also expressed in the same rule notat ion. Th is section describes the rule notat ion for each of the component types. Each component is defined by two rule types, one for tree structures and one for feature structures. The table below shows the rule types that are defined for each component: R u l e T y p e C o m p o n e n t D e s c r i p t i o n L lexicon lexical entries L F lexicon lexical features G generator phrase structures G F generator features in phrase structures T transformer tree transformations T F transformer features in tree transformations M W transformer mul t iword transformations M W F transformer features in mult iword transformations Lexicons are defined by lexical entry rules (L-rules) and lexical feature rules (LF-rules). Gen -erators are defined by phrase structure rules (G-rules) and phrase structure feature rules (GF-rules). Transformers are defined by either tree transformation rules (T-rules) and tree feature transformation rules (TF-rules), or by mult iword rules (MW-rules) and mult iword feature rules (MWF-rules) , depending on the levels of representation involved in the transformation. The rules making up the common core components may, in pr inciple, be of any rule type. In practice, common core components do not include lexical entry rules (L-rules), which contain idiosyncratic language-specific data. CHAPTER 2. FORMALIZING MULTILINGUAL MACHINE TRANSLATION 43 A n important characteristic of all M M T rules is that each rule maps onto a part ia l tree, the form of which is repeated here: (74) a. Root.. 
[ Subtree_1, Subtree_2, ..., Subtree_n ]
     b. Root.[]

Not only do trees have the form in (74) but so do the MMT rules, with the principal difference being that the node descriptions in a rule are feature set descriptions, not feature sets, and therefore utilize the full constraint resolution logic provided. Thus if the node descriptions in a rule unify with the nodes in an object, the rule is deemed to be applicable to the object, which is modified according to the conditions stated in the rule.

The following subsections provide detailed descriptions of the rule types for each of the components: the lexicon (§2.6.1), the generators (§2.6.2), and the transformers (§2.6.3).

2.6.1 Lexicon

The lexicon supplies the words of a language. When a source text is presented for analysis, the words in the text are taken either directly or derivatively (to be clarified presently) from the source language lexicon. Similarly, when the transfer structure of a source language text is translated to a target language, the target language words are taken from the target lexicon.

The lexicon is organized as a set of lexical entry rules (L-rules). As the lexicon is compiled, lexical redundancy rules, in the form of lexical feature rules (LF-rules), supply additional features through unification with the entry. The next two subsections describe these two rule types.

Lexical Rules (L-Rules)

Each lexical rule specifies a set of properties for a lexical entry using the feature set notation described in §2.4. One feature is designated as the lexeme (Matthews 1991:26), which serves as the index into the lexicon, much as a dictionary form is used as an index entry into a standard dictionary. Thus the word forms open, opens, opening, and opened are related to the lexeme OPEN. In MMT, the attribute name lex is used to denote the lexeme. For example, the lexical entries for the words door and open would be (minimally) represented as follows:

(75) a.
b.

Morphological processes generate the regular inflected forms such as doors, opens, opened, opening, as well as derived forms such as opener, without having to add them as separate entries in the lexicon. Irregular forms, such as those involving vowel changes (e.g. give/gave, buy/bought) or suppletions (go/went, be/is/am/are/was/were), are encoded as separate lemmas pertaining to the same lexeme. Their representation in MMT utilizes disjunctive feature sets for the lemmas, conjoined to the feature set containing the lexeme, as follows:

(76) a. [lex = give, Verb] ∧ ( [lemma = give, Base Form] ∨ [lemma = gave, Past Tense] )
     b. [lex = buy, Verb] ∧ ( [lemma = buy, Base Form] ∨ [lemma = bought, Past Tense] )
     c. [lex = go, Verb] ∧ ( [lemma = go, Base Form] ∨ [lemma = went, Past Tense] )
     d. [lex = be, Verb] ∧ ( [lemma = be, Base Form]
                            ∨ [lemma = is, 3rd Person Singular Present Tense]
                            ∨ [lemma = am, 1st Person Singular Present Tense]
                            ∨ [lemma = are, (Plural ∨ 2nd Person Singular) Present Tense]
                            ∨ [lemma = was, (1st ∨ 3rd Person) Singular Past Tense]
                            ∨ [lemma = were, (Plural ∨ 2nd Person Singular) Past Tense] )

The properties written in italics in the original, e.g. Verb, Past Tense, etc., are shorthand annotations for the actual features that encode this information, as described in Chapter 3. Properties common to all the lemmas, such as being a transitive or intransitive verb, would be encoded in the invariant portion of the feature set containing the lexeme.

Lemmas not only specify irregular inflectional forms, such as buy versus bought, but also irregular derivational forms. For example, many verbs allow the suffix -ation to form process nominals, such as alter → alteration, adapt → adaptation, etc. Similarly, agentive nominals are often formed by adding the suffix -er or -or, as in attack → attacker and collect → collector.
However, the verb rebel is irregular; instead of *rebelation we have rebellion, and instead of *rebeler we have simply rebel. These irregular forms are also encoded as lemmas conjoined to the lexeme REBEL:

(77) [lex = rebel] ∧ ( [lemma = rebel, Verb]
                      ∨ [lemma = rebellion, Noun (Process)]
                      ∨ [lemma = rebel, Noun (Agentive)] )

Each word occurring in the source text must be found in the lexicon as a lexeme or as a lemma, or be morphologically analyzed as being a derivative of a lexeme or lemma. Any other word will be treated as unknown.

The feature set notation allows for any arbitrary organization of lexical information that can be expressed in the form of attribute-value pairs. In this implementation, lexical properties of a given lexeme are organized into the following features:

(78) lex    = Lexeme
     proj   = Projection Structure
     phon   = Phonological Structure
     morph  = Morphological Structure
     syn    = Syntactic Structure
     sem    = Semantic Structure
     argstr = Argument Structure
     chain  = Chain Structure
     modstr = Modifier Structure

where Projection Structure, Phonological Structure, etc., are complex feature set descriptions. The specific content of these descriptions is presented in the following chapter, where we will see that all of these features (except the lexeme) form the content of not only leaf nodes but also all nodes in an MMT tree structure.

Since lexical rules define lexical entries, which become the leaf nodes in tree structures, the representation of lexical rules coincides with the representation of trees rooted by a terminal node. Thus, (79) below shows the MMT user notation for the lexical rule in (76a):

(79) {lex=give,...} & ({lemma=give,...};{lemma=gave,...}).
[ ]

Lexical Feature Rules (LF-Rules)

Lexical feature rules provide a means of introducing default features into a lexical entry before it is stored in the lexicon. Lexical feature rules can be thought of as lexical redundancy rules:

    The general aim of such rules is to minimize the information that needs to be included in the dictionary entries for particular items - or, more generally still, to eliminate lexical redundancy (since information which is predictable from a general rule or principle is clearly not idiosyncratic to specific lexical items, and hence should not be included in particular lexical entries). (Radford 1988:355)

Lexical redundancy rules may supply morphological, phonological, syntactic, or semantic properties of lexical entries, based on a priori content in the entry. For example:

morphology: Verbs whose lexemes end in -ate allow process nominalizations ending in -ation, such as annihilate → annihilation, procrastinate → procrastination, investigate → investigation. Only the exceptions need to be marked explicitly in the lexicon, by effectively blocking the lexical redundancy rule; for example, berate ↛ beration, hate ↛ hation.

phonology: The distinction in the use of the indefinite article a vs. an is determined by the phonological sound of the immediately following word. Words beginning with a vowel occur with an, e.g. an apple, an open door. All others occur with a. This property can be set by a lexical redundancy rule based on the orthography of the lexeme, overridden by explicit marking in the lexicon where necessary; cf. a honeybee, a hotel vs. an honor.

semantics: Verb classes based on semantic distinctions may determine their syntactic subcategorization properties. Radford (1988:355) presents the following examples:
• mandative predicates take subjunctive complements.
• interrogative and dubitative predicates take interrogative complements.
• desiderative and emotive predicates take infinitival complements with for.
• cognitive and assertive predicates take declarative (indicative) complements.
If the semantic class is stated in the lexical entry, a lexical redundancy rule can introduce the subcategorization frame automatically. This presupposes, however, a clear delineation of the semantic classes of a verb.

In the implementation developed here, lexical feature rules are defined that supply morphological and phonological features, as described above. Semantic redundancy rules have not yet been implemented, but could be once a complete set of semantic classes encompassing all implemented languages has been defined.

As each lexical rule is read, the entire set of lexical feature rules is evaluated on the lexical rule. If a lexical feature rule unifies with the lexical rule, the lexical rule is updated, and processing continues to the next lexical feature rule. If unification of the lexical feature rule with the lexical rule fails, no change is made to the lexical rule. After all lexical feature rules have been processed on the lexical rule, the updated lexical rule is stored in the lexicon as the lexical entry.

For example, a lexical feature rule applying the morphological redundancy rule above for verbs ending in -ate is given in (80). Only lexical rules whose lexeme ends in -ate, and whose base category is identified as a verb, e.g. procrastinate, are affected by this lexical feature rule. (The underscore '_' represents a string of one or more alphanumeric characters; the symbol '+' is the MMT string concatenation operator.)
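The compilation step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the MMT implementation: feature sets are modelled as nested dictionaries, the lexeme pattern _+ate is approximated by a regular expression, and the names unify, compile_entry, LF_RULES, and the feature proc_nom are hypothetical.

```python
import re

def unify(entry, features):
    """Return entry merged with features, or None if any value clashes."""
    merged = dict(entry)
    for attr, value in features.items():
        if attr not in merged:
            merged[attr] = value
        elif isinstance(merged[attr], dict) and isinstance(value, dict):
            inner = unify(merged[attr], value)
            if inner is None:
                return None
            merged[attr] = inner
        elif merged[attr] != value:
            return None
    return merged

# Each rule pairs a lexeme pattern with the features it supplies.
# "[A-Za-z0-9]+ate" approximates the MMT pattern _+ate (assumption);
# "proc_nom" is a hypothetical name for the process nominalization feature.
LF_RULES = [
    (r"[A-Za-z0-9]+ate", {"morph": {"basecat": "v", "proc_nom": "-ation"}}),
]

def compile_entry(entry, rules=LF_RULES):
    """Evaluate every lexical feature rule on one lexical rule, in order."""
    for pattern, features in rules:
        if re.fullmatch(pattern, entry["lex"]) is None:
            continue                 # lexeme does not match: rule is skipped
        updated = unify(entry, features)
        if updated is not None:      # unification succeeded: entry is updated
            entry = updated
        # on failure the entry is left unchanged
    return entry

procrastinate = {"lex": "procrastinate", "morph": {"basecat": "v"}}
# An exception blocks the rule by already carrying a value for the feature.
hate = {"lex": "hate", "morph": {"basecat": "v", "proc_nom": "blocked"}}
climate = {"lex": "climate", "morph": {"basecat": "n"}}
```

In this sketch, compile_entry(procrastinate) gains the -ation feature, while hate and climate come back unchanged because unification fails on a feature they already carry.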
(80) [lex = _+ate, morph = [basecat = v, Process Nominalization = -ation]]

An example of a phonological lexical feature rule which defines the first segment of a word as having a vowel-initial sound is the following:

(81) [lex = (a;e;i;o;u)+_, phon = [seg = vowel]]

Exceptions must be listed explicitly in the lexicon, such as the entries for union and euphoria:

(82) a. [lex = union, phon = [seg = cons]]
     b. [lex = euphoria, phon = [seg = cons]]

Although the lexemes in (82) begin with a vowel, the lexical feature rule in (81) will not affect the lexical rules because the lexical rules already contain a value for the segment. Hence the lexical rules remain unchanged (for this lexical feature rule).

As with lexical rules, the MMT user notation of a lexical feature rule conforms to the structure of a leaf node. The rule in (81), for example, would be written as follows:

(83) {lex=(a;e;i;o;u)+_, phon={seg=vowel}}.[ ]

2.6.2 Generators

Generators define the linguistic objects represented as tree structures, or more precisely, local tree structures, i.e. trees consisting of a root node and its immediate daughters. The two rule types that make up the generator of a level are generator rules (G-rules) and generator feature rules (GF-rules). The generator rules construct the hierarchical structure, and the generator feature rules operate on the node content, i.e. the features within the hierarchical structure.

Generators define the structures at the morphological, syntactic, and relational levels of representation. At the morphological level, generator rules define how a word may be constructed out of morphological components, such as stems and affixes.
For example, the word consultations might be analyzed structurally as in (84):

(84) consultations    (-s pluralization)
          |
     consultation     (-ation nominalization)
          |
     consult          (root lexeme)

The morphological generator defines how the noun consultations is derived from the lexeme CONSULT through morphological analysis, and at the same time how and under what conditions the lexeme CONSULT can yield the noun consultations, through morphological synthesis.

At the syntactic level, generator rules are a variation of standard phrase structure rules, for example S → NP VP. The syntactic parser uses generator rules when it constructs tree structures. A major difference between context-free rules and generator rules is that generator rules, like all other rules in MMT, map onto a local tree structure. Consequently, the format of the generator rules, and of the generator feature rules that apply constraints to the objects, coincides with the format of trees in (85):

(85) Root.[ Subtree_1, Subtree_2, ..., Subtree_n ]

This is elaborated upon in the following sections.

Generator Rules (G-Rules)

A generator rule, in describing a local tree, is composed of an expression for the local root, and an expression for the possible configurations of immediate daughters under the root, called the body. The formal description of generator rules is presented here in Backus-Naur Form (BNF).[10] Thus the format of a generator rule and its components is expressed in BNF as follows:

(86) a. G-RULE := ROOT.[BODY]
     b. ROOT   := FSD             (Feature Set Description)
     c. BODY   := FSD
     d. BODY   := (BODY, BODY)    (adjacency)
     e. BODY   := (BODY; BODY)    (alternation)
     f. BODY   := ?BODY           (optionality)
     g. BODY   := *BODY           (zero or more)
     h. BODY   := +BODY           (one or more)

where ROOT is defined as a feature set description (FSD), a period separates the root from the body, and BODY, enclosed in square brackets, is a regular expression over the root nodes of the immediate daughters. Minimally, BODY is defined as a single FSD (86c), but may also be a sequence of expressions (86d) or alternating expressions (86e). An expression is optional if preceded by '?' (86f), or an expression may be repeated zero or more times using '*' (Kleene star) (86g), or one or more times using '+' (86h).

An example of a phrase structure rule typical of grammars in the 1960s is shown in (87a), and its rendition in MMT user notation in (87b):

(87) a. S → NP (AUX) VP (PP ∨ ADVP)*
     b. {cat=s}.[ {cat=np}, ?{cat=aux}, {cat=vp}, *({cat=pp};{cat=advp}) ]

[10] BNF was invented by John Backus and Peter Naur for describing the ALGOL 60 programming language, and has been accepted as the standard notation for formally describing the syntax of a language. The symbol ':=' is interpreted as "is defined as".

Generator Feature Rules (GF-Rules)

Generator feature rules map onto partial trees constructed by generator rules and apply feature constraints to the nodes in the tree. Whereas generator rules constrain the structural well-formedness of a tree, the generator feature rules constrain the featural well-formedness of the nodes in the tree. A generator feature rule cannot modify the hierarchical or linear structure of the tree in any way, only the feature content of its nodes.

As each tree is constructed, the entire set of generator feature rules is applied to the tree. If all generator feature rules are successfully applied, the tree is well-formed; if any generator feature rule fails, the system backtracks to another generator rule (if any) in order to construct an alternative tree, and then re-applies the set of generator feature rules.
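The regular-expression reading of a generator rule body can be illustrated with a small Python matcher. This is a sketch under simplifying assumptions rather than the MMT parser: nodes are flat dictionaries, a feature set description matches a node when all of its attribute-value pairs are present (a stand-in for full unification), and the tuple encoding and the names subsumes, match, licenses, and S_RULE are hypothetical.

```python
def subsumes(desc, node):
    """True if every attribute-value pair of desc is present in node."""
    return all(node.get(attr) == value for attr, value in desc.items())

def match(body, nodes, i=0):
    """Yield every index j such that body matches the daughters nodes[i:j]."""
    kind = body[0]
    if kind == "fsd":                      # a single feature set description
        if i < len(nodes) and subsumes(body[1], nodes[i]):
            yield i + 1
    elif kind == "seq":                    # adjacency: (BODY, BODY)
        def walk(parts, pos):
            if not parts:
                yield pos
                return
            for mid in match(parts[0], nodes, pos):
                yield from walk(parts[1:], mid)
        yield from walk(body[1:], i)
    elif kind == "alt":                    # alternation: (BODY; BODY)
        for option in body[1:]:
            yield from match(option, nodes, i)
    elif kind == "opt":                    # optionality: ?BODY
        yield i
        yield from match(body[1], nodes, i)
    elif kind == "star":                   # zero or more: *BODY
        yield i
        for mid in match(body[1], nodes, i):
            if mid > i:                    # guard against empty repetitions
                yield from match(body, nodes, mid)
    elif kind == "plus":                   # one or more: +BODY
        for mid in match(body[1], nodes, i):
            yield from match(("star", body[1]), nodes, mid)

def licenses(rule, root, daughters):
    """True if the rule describes the local tree (root, daughters)."""
    root_desc, body = rule
    return subsumes(root_desc, root) and any(
        end == len(daughters) for end in match(body, daughters))

def cat(c):
    return ("fsd", {"cat": c})

# The rule in (87b), in the hypothetical tuple encoding used here.
S_RULE = ({"cat": "s"},
          ("seq", cat("np"), ("opt", cat("aux")), cat("vp"),
           ("star", ("alt", cat("pp"), cat("advp")))))

def tree(*cats):
    return [{"cat": c} for c in cats]
```

Under these assumptions, licenses(S_RULE, {"cat": "s"}, tree("np", "aux", "vp", "pp", "advp")) holds, whereas a daughter sequence lacking the obligatory VP is rejected.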
Since a generator feature rule describes some portion of a tree structure, the generator feature rule has one of two formats, the first for a root and its daughters (88a), and the second for a leaf node (88b):

(88) a. GF-RULE := ROOT.[GF-BODY]
     b. GF-RULE := ROOT.[]
     c. GF-BODY := BODY
     d. GF-BODY := ROOT.[GF-BODY]
     e. GF-BODY := ?
     f. GF-BODY := *
     g. GF-BODY := +

where ROOT is, as before, a feature set description, and GF-BODY is a regular expression over the body of a generator feature rule. The Backus-Naur Form for BODY in (88c) has the same form as a generator rule body, presented in (86). The expression in (88d) permits a generator feature rule to describe not just a local tree but, because of the recursive reference to GF-BODY, a tree to any desired depth. The expression in (88e) allows for an optional nonspecific feature set, and is equivalent to the expression ?{}. The expressions in (88f,g), equivalent to *{} and +{}, allow zero or more, or one or more, nonspecific feature sets, respectively.

Generator feature rules differ from generator rules in two important ways. The first concerns the depth of the tree described by the rule, and the second concerns the nature of the node descriptions in each rule.

As to the first, a generator rule defines a local root and its immediate daughters, whereas generator feature rules, by virtue of the recursivity in (88d), may describe a root node and any of its constituents to any depth. For example, the highlighted portion of the tree in (89a) can be described by the generator feature rule in (89b) (formatted to more clearly show the linear and hierarchical relationships):

(89) a. [Tree diagram: S with daughters NP and a second node whose daughters are NP and VP]
     b. {cat=s}.[ {cat=np},
                  {cat=s}.[ {cat=np},
                            {cat=vp} ] ]

This allows generator feature rules to target nonlocal trees, rather than strictly local trees as do generator rules.

Second, node descriptions in generator feature rules express conditional constraints, which are denoted as follows:

(90) P » Q

analogous to the logical statement "if P then Q". The condition P is a feature set description called the condition set. If P unifies with the node of the tree being mapped, then the consequence set Q must also unify. If the unification of Q fails, then the constraint fails, hence the generator feature rule fails and the system backtracks. If P does not unify with the node, no change to the node occurs, and the generator feature rule is deemed not to apply.

A feature set description without the '»' conditionality operator is interpreted as a condition set with no consequence set, or, equivalently, as the conditional constraint P » {}. For example, the root of the generator feature rule in (89b) is equivalent to the following:

(91) {cat=s} » {}

As a hypothetical example of a generator feature rule using the conditional constraint, assume the generator rule in (92a) generates the local tree in (92b), corresponding to the familiar phrase structure rule S → NP VP:

(92) a. {cat=s}.[ {cat=np}, {cat=vp} ]
     b. a local tree with root [cat = s] and daughters [cat = np] and [cat = vp]

The following generator feature rule could enforce subject-verb agreement on this structure:

(93) {}.[ {cat=np} » {agr=X}, {cat=vp} » {agr=X} ]

When this (or any) generator feature rule maps onto a tree structure, all of the condition sets in the rule must unify with the corresponding feature sets in the tree in order for the rule to be applicable. (Note that unification of condition sets may also introduce new constraints into the tree, which if satisfied become a part of the object.)
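The two-phase behaviour of conditional constraints can be sketched in Python. The sketch makes several simplifying assumptions that are not part of MMT: nodes are flat dictionaries, agreement is a single atomic value rather than a complex feature set, a rule covers exactly one local tree, and the names Var, unify, apply_gf_rule, and AGREEMENT are hypothetical.

```python
from copy import deepcopy

class Var:
    """A named variable shared by the node descriptions of one rule."""
    def __init__(self, name):
        self.name = name

def unify(desc, node, bindings):
    """Unify a flat description into node in place; False on a clash."""
    for attr, value in desc.items():
        if isinstance(value, Var):
            if value.name not in bindings:
                if attr in node:           # first use instantiates the variable
                    bindings[value.name] = node[attr]
                continue
            value = bindings[value.name]
        # setdefault adds the feature if absent, else returns the old value
        if node.setdefault(attr, value) != value:
            return False
    return True

def apply_gf_rule(rule, root, daughters):
    """Return (status, tree); status is 'n/a', 'fail', or 'ok'."""
    root_desc, daughter_descs = rule
    if len(daughter_descs) != len(daughters):
        return "n/a", None
    descs = [root_desc] + list(daughter_descs)
    nodes = [deepcopy(root)] + [deepcopy(d) for d in daughters]
    bindings = {}
    # Phase 1: every condition set P must unify, else the rule does not apply.
    for (condition, _), node in zip(descs, nodes):
        if not unify(condition, node, bindings):
            return "n/a", None
    # Phase 2: every consequence set Q must unify, else the rule fails
    # (at which point the system described in the text would backtrack).
    for (_, consequence), node in zip(descs, nodes):
        if not unify(consequence, node, bindings):
            return "fail", None
    return "ok", (nodes[0], nodes[1:])

# The rule in (93): {}.[ {cat=np} » {agr=X}, {cat=vp} » {agr=X} ]
AGREEMENT = (({}, {}),
             [({"cat": "np"}, {"agr": Var("X")}),
              ({"cat": "vp"}, {"agr": Var("X")})])

s = {"cat": "s"}
np_3sg = {"cat": "np", "agr": "3sg"}
vp_3sg = {"cat": "vp", "agr": "3sg"}
vp_3pl = {"cat": "vp", "agr": "3pl"}
```

Here apply_gf_rule(AGREEMENT, s, [np_3sg, vp_3sg]) reports 'ok', the mismatched VP yields 'fail', and a local tree whose daughters are not NP and VP yields 'n/a', leaving the tree untouched.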
In (93), the first feature set description to be unified is the root. In this case, the feature set description is treated as a condition set only, which is trivially satisfied by unification with the root of the tree, [cat = s]. The next condition set to be mapped is the feature set description [cat = np], which, like the next condition set, [cat = vp], is also satisfied by unification with the corresponding node in the tree. Hence, the generator feature rule is deemed applicable, and all consequence sets must now be satisfied. In this case, the feature set description [agr = X] unifies with the agreement features in the NP, thereby instantiating the variable X with the NP's agreement features. The agreement features on the VP node must now also unify with X in order for the generator feature rule to succeed. In this way, the agreement (= unification) of features between two (or more) nodes can be controlled by the conditional constraints in a generator feature rule.

2.6.3 Tree Transformers

This section describes the rules that effect a transformation of one tree structure to another. This occurs when an object passes from the syntactic level to a relational level or vice versa, and between relational levels. Analogous to the other rule types, transformer rules are divided into two groups, tree transformation rules and tree feature transformation rules. Tree transformation rules transform partial source structures to partial target structures, and feature transformation rules transform the features from the source to the target tree.

Every transformer rule has one of the following general formats:

(94) a. LHS <=> RHS
     b. LHS => RHS
     c. LHS <= RHS

where LHS (left-hand side) and RHS (right-hand side) are tree descriptions that map onto linguistic objects at specific levels of representation.
For example, in the transfer stage between Spanish and English, LHS maps onto a partial tree at the Spanish transfer level and RHS indicates how to construct a partial tree at the English transfer level. In English to Spanish translation, the reverse holds; RHS maps onto the English transfer object and LHS directs the construction of the Spanish target object.

The format in (94a) specifies a bidirectional rule, indicated by the operator '<=>', meaning that the rule applies in both analysis and synthesis. The rules in (94b,c) are unidirectional. They are only activated when translating in one direction. For example, the rule in (94b) would apply only when translating from Spanish to English, whereas (94c) would apply only when translating from English to Spanish.

Transformer Rules (T-Rules)

Syntactic structures are rich both in content words and in grammatical or function words, such as auxiliary verbs, modals, articles, fixed prepositions, subordinate conjunctions, etc. When a syntactic structure is transformed to the predicate-argument structure, the function words are removed, leaving just the predicate, its arguments and any modifiers. In the process, the tree is collapsed in size, and becomes more amenable to direct transformation into other languages. Conversely, when a relational structure is transformed to a syntactic structure, the function words appropriate to the target language are inserted into the structure.

The transformer rules specify how a source tree is transformed to a target tree. The strategy underlying the transformation process is to recursively decompose the source tree into its subtrees until it reaches the leaves, and then to transform each source subtree to a target subtree, recomposing the collection of transformed target subtrees to yield the complete target tree.
For example, given the sentence in (95), the relation between its syntactic structure (using labels familiar from Government-Binding) and its relational structure is illustrated in (96):

(95) A report had been seen by the press

(96) [Figure: the syntactic structure of (95) on the left, including the determiner-noun pairs a report and the press, paired with its relational structure on the right, consisting of ARG1 report, PRED see, and ARG0 press]

The relational structure, considerably smaller and more efficient to process, contains only the content words report, see, and press. The feature equivalent of each of the function words is converted to feature values on the nodes of the content words over which the function words have scope. Thus, the first determiner, a, has scope over the noun report, the auxiliaries had and been have scope over the verb seen, and the preposition by and the determiner the have scope over the noun press. The feature structures on the content words report, see, and press in the relational structure on the right-hand side of (96) include those functional values (the exact formulation of which follows the descriptions in the next chapter):

(97) a. [lex = report, cat = n, det = indef, pform = nil]
     b. [lex = see, cat = v, tense = past, aspect = [perf = +, prog = -], voice = passive]
     c. [lex = press, cat = n, det = [def = art], pform = by]

(97a) signifies that report is accompanied by an indefinite article, and it does not include a functional preposition, as indicated by the feature [pform = nil]. In (97b), the verbal construction headed by see has past tense, perfective but not progressive aspect, and passive voice. (97c) signifies that press is accompanied by a definite article and the functional preposition by.

When the relational structure is transformed to a syntactic structure, the same process applies in reverse. The feature content of the relational tree guides the construction of the syntactic tree.
The discussion of the various types of transformer rules in the paragraphs ahead describes this process.

When transforming a structure, a transformer rule copies the structure or rearranges it in various ways. The table below shows the types of transformer rules in MMT, and whether they are used in analysis and/or synthesis:

Type      Name          Analysis / Synthesis
Type 1    Copy          Analysis and Synthesis
Type 2a   Contraction   Analysis
Type 2b   Expansion     Synthesis
Type 3    Compression   Analysis
Type 4    Extraction    Synthesis

The following paragraphs describe each of these transformer rule types.

Type 1: Copy

Four different copy operations are considered here:
1. A leaf node is transformed to another leaf node
2. A nonleaf structure is transformed to a leaf node
3. A leaf node is transformed to a nonleaf structure
4. A nonleaf structure is transformed to another nonleaf structure

First, the simplest transformer rule copies a leaf node to a leaf node. For example, the following Type 1 copy rule translates the Spanish word mujer to the English word woman, and vice versa:

(99) {lex=mujer} <=> {lex=woman}

Thousands of such rules comprise the transfer level.

The second type transforms a nonleaf structure to a leaf node. This occurs when two or more words in the source language are translated as a single word in the target language. For example, the Spanish phrase hacia arriba, represented as a local tree consisting of the two words hacia and arriba, can be translated to the English word upward, as in un movimiento hacia arriba 'an upward movement'. The transformer rule that expresses this relation is stated as follows:

(100) {}.[ {lex=hacia}, {lex=arriba} ] <=> {lex=upward}

The third type is the reverse of the previous example. A leaf node, e.g. upward, is transformed to a nonleaf structure, e.g. the phrase hacia arriba.
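The first three copy operations, together with the direction operators in (94), can be illustrated with a small lookup sketch in Python. It assumes a drastically reduced representation that is not MMT's: a leaf is a lexeme string, a local tree is a tuple of lexemes, and all features other than lex are ignored. The function name transfer and the unidirectional entry for dama are hypothetical and included for illustration only.

```python
# Each rule is (LHS, operator, RHS), with Spanish on the left and
# English on the right, following the convention described for (94).
RULES = [
    ("mujer", "<=>", "woman"),                # (99): leaf <=> leaf
    (("hacia", "arriba"), "<=>", "upward"),   # (100): local tree <=> leaf
    ("dama", "=>", "woman"),                  # hypothetical unidirectional rule
]

def transfer(obj, source_lang):
    """Return the first target object licensed by a rule, or None."""
    for lhs, op, rhs in RULES:
        # '<=>' and '=>' fire when translating Spanish to English
        if source_lang == "es" and op in ("<=>", "=>") and lhs == obj:
            return rhs
        # '<=>' and '<=' fire when translating English to Spanish
        if source_lang == "en" and op in ("<=>", "<=") and rhs == obj:
            return lhs
    return None
```

With these entries, transfer(("hacia", "arriba"), "es") gives "upward" and transfer("upward", "en") gives the two-word local tree back, while the '=>' rule is never consulted in the English to Spanish direction.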
The fourth type transforms a nonleaf tree structure to a nonleaf tree structure. In this case, T-markers prefix those subtrees which are to be recursively transformed. For example, the following rule transforms a local tree consisting of two subtrees, one an adjective phrase prefixed with the T-marker a and one a noun prefixed with b, to a target tree where the constituents are reversed:

(101) {}.[ a:{cat=ap}, b:{cat=n} ] <=> {}.[ b:{cat=n}, a:{cat=ap} ]

Such a transformer rule might be employed to translate tree structures such as those in (102) below, where English adjective-noun combinations are reversed in Spanish:

(102) [NP [AP ugly] [N duckling]]     [NP [N patito] [AP feo]]

The T-markers a and b mark the subtrees that are to be recursively transformed. (T-markers within a rule must be unique atomic symbols.) The T-marked adjective phrase, for example, will be recursively translated by a simple Type 1 copy rule, as will the T-marked noun.

2.6.3.1.2 Type 2a: Contraction

A tree is contracted when a subtree in a source structure becomes the root of the tree in the target structure. This type of transformer rule is used to remove function words (grammatical formatives) from the structure. This occurs, for example, when the syntactic tree for the press is transformed to a relational tree containing just the noun press, removing the grammatical formative the, or when the functional preposition by in the phrase by the press is removed. In this case, a Type 2a contraction rule first converts the PP to a DP, and then recursively transforms the DP to N. This is depicted in (103):

(103) [Figure: the PP by the press contracted to the DP the press, and the DP contracted to the N press]

The transformer rule that effects a Type 2a contraction has the property that one of the subtrees on the source side is T-marked, and that T-marker is the only element on the target side of the rule. This eliminates the other nodes on the source side.
Since the rule fires whenever the source side matches the source object, it is crucial that the identifying features on the source side restrict the application of the rule to only those structures that should undergo a Type 2a contraction. If one daughter node is identified as a function word, then it can be eliminated and the sister node T-marked for contraction. For example, the Type 2a contraction rule in (104) transforms the PP structure in (103) to a DP, and the DP to an N, assuming that both by and the have the feature [syn = [gramm = +]]:

(104) {}.[ {syn = {gramm = +}}, a:{} ] => a:{}

When (104) applies to the structure in (103), the T-marker a marks the DP for transformation. The same rule recursively applies to the DP to mark the N for transformation. A Type 1 copy rule then copies the lexical N node to a target lexical N node, which becomes the result of both applications of the Type 2a contraction rule in (104).

2.6.3.1.3 Type 2b: Expansion

Type 2b expansion rules are the opposite of Type 2a contraction rules. That is, reversing the direction of a Type 2a contraction rule yields a Type 2b expansion rule. However, reversing the rule in (104) would result in massive overgeneration, unless restrictive features are added to the source side of the rule, the right-hand side in (104). Hence the counterpart to the contraction rule is an expansion rule that expands a specific category. For example, the Type 2b expansion rule in (105a) (reading right-to-left) expands a noun, T-marked a, to a structure containing a determiner followed by the noun, where the determiner matches the noun's det feature. The expansion rule in (105b) expands any subtree to a PP whose preposition has a lexeme value equal to the pform in the source tree:

(105) a. {}.[ {lex=det, cat=d, det=T}, a:{} ] <= a:{cat=n,det=T}
      b. {}.[ {lex=P, cat=p}, a:{} ] <= a:{pform=P}

The first rule, (105a), transforms the noun press to a DP headed by the definite article the:

(106) [N press]  →  [DP [D the] [N press]]

The second rule, (105b), applies to the DP node created above, assuming it has been passed the feature [pform = by], to create the PP:

(107) [DP[pform = by] [D the] [N press]]  →  [PP [P by] [DP [D the] [N press]]]

If no preposition exists whose lexeme has the value of pform, the transformer rule fails. So if a noun has the feature [pform = nil], no PP will be generated.

2.6.3.1.4 Type 3: Compression

Type 3 compression rules compress, or flatten, a phrasal structure so that the phrasal node immediately dominates its head node, the head's arguments (if any), and the head's modifiers (if any). For example, a Type 3 compression rule converts the following structure on the left to the structure on the right, where the head is act, its modifier is random, and its argument is kindness, the preposition of having been deleted by a Type 2a contraction rule:

(108) [Figure: on the left, the syntactic structure of the noun phrase random act of kindness, containing the PP of kindness (P of, NP kindness); on the right, the flat structure MOD random, PRED act, ARG kindness]

Type 3 compression rules comprise two subtypes, root compression and nonroot compression. These are presented in (109):

(109) a. ROOT^{proj = {max = +}}.[ a:{}, b:{} ] => ROOT.a#b
      b. {proj = {max = -}}.[ c:{}, d:{} ] => c#d

Root compression retains the root, a maximal projection, in the target structure. It merges the two T-marked objects into one linear sequence. Nonroot compression does the same, but the local root, an intermediate projection, is not retained. Compression rules use the compression operator '#' to separate the two T-marked subtrees.

We can follow the operation of Type 3 compression rules in the translation of the noun phrase in (108).
The transformation of the left-hand structure begins with a root compression rule, since the output must also contain a root dominating its constituents. The rule T-marks the adjective and the N' node. The adjective is transformed by application of a Type 1 copy rule. The N', being a nonmaximal projection, is transformed by application of the Type 3 nonroot compression rule. In this case, the T-marker c marks the head noun act and d T-marks the argument of kindness. The head noun is transformed to a head noun by a Type 1 copy rule. The argument is transformed to the lexical node kindness by a Type 2a contraction rule, since of, selected by the noun act, is marked as a grammatical formative and deleted. The result is the concatenation of c and d, together concatenated to a, to yield the body of the structure on the right-hand side of (108).

Type 4: Extraction

A Type 4 extraction rule undoes in synthesis what a Type 3 compression rule does in analysis: it takes a flat structure and creates a hierarchical structure from it. One constituent from the flat structure is extracted from the list, leaving a residual structure. The extracted constituent is transformed and positioned as an immediate daughter of the root, creating a binary structure. The residual structure is recursively transformed with Type 4 extraction rules until only a single structure remains, which is then copied over with a simple Type 1 copy rule.

The Type 4 extraction rule is demonstrated by its application to the structure on the left in (110). Given that this structure represents a clause, the first Type 4 extraction rule extracts a constituent marked with case = nom. That T-marked structure is transformed into the DP the report and positioned as the daughter of an IP node. A provisional sister node is created, indicated as [I].
When the residual structure, shown inside the box on the right, is transformed, it will replace [I]:

(110) [Tree diagrams. Left: the flat clause structure with daughters ARG1 report (case = nom), PRED see, and ARG0 press. Right: an IP whose daughters are the DP the report and the provisional node [I]; the boxed residual structure contains PRED see and ARG0 press.]

Type 4 extraction applies again to the T-marked residual structure and extracts the argument. Assuming the extracted argument is a noun with features for a definite article and a pform value of by, its transformation as a PP by a Type 2b expansion rule is right-adjoined to a provisional node, indicating where the transformation of the residual structure will appear:

(111) [Tree diagrams. Left: the residual structure with daughters PRED see and ARG0 press. Right: a VP whose daughters are the provisional node [V] and the PP [PP [P by] [DP [D the] [N press]]]; the boxed residual structure contains PRED see.]

The residual structure in (111) is transformed by a Type 1 copy rule and replaces the [V] node. Type 2b expansion rules will add auxiliary verbs to the VP structure, and another Type 2b expansion rule creates the headless IP structure, attaching the expanded VP to the [I] node in (110), and the transformation is complete.

The Type 4 extraction rule that performs the first transformation in (110) is shown in (112); Type 4 extraction is indicated by the extraction operator '<#' separating the T-marked constituents:

(112) ROOT^{Root Features}.a:{case=nom} <# b:{Residual Root Features} => ROOT.[ a, b ]

The root node on the left-hand side is copied to the target root position. The Root Features specify the type of constituent this rule applies to. The T-marker a extracts a constituent having the feature case = nom, and places its transformation in the left daughter position on the target side of the rule. The T-marker b marks the residual structure, specifying in Residual Root Features the features that must appear on the root of the residual structure. A subsequent transformer rule transforms the residual structure and positions it in the right daughter position on the target side.
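The relationship between Type 3 compression and Type 4 extraction can be sketched in a few lines of Python. This is only an illustrative model, not the MMT implementation: the `Node` class, the `maximal` flag standing in for proj = {max = ±}, and the functions `flatten`, `compress`, and `extract` are all hypothetical names. The sketch also assumes that function words have already been removed by contraction, that constituents are picked by label rather than by features such as case = nom, and that the extracted constituent is always placed in the left daughter position.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    maximal: bool = True  # stands in for proj = {max = +/-}
    children: List["Node"] = field(default_factory=list)


def flatten(node: Node) -> List[Node]:
    """Nonroot compression: splice the daughters of a non-maximal node into one sequence."""
    if not node.children:
        return [Node(node.label, node.maximal)]  # leaf: plain copy
    if node.maximal:
        return [compress(node)]  # an embedded maximal projection keeps its own root
    spliced: List[Node] = []
    for child in node.children:
        spliced.extend(flatten(child))
    return spliced


def compress(root: Node) -> Node:
    """Root compression: keep the root and make all flattened material its daughters."""
    daughters: List[Node] = []
    for child in root.children:
        daughters.extend(flatten(child))
    return Node(root.label, root.maximal, daughters)


def extract(label: str, items: List[Node], picks: List[str], maximal: bool = True) -> Node:
    """Extraction: peel off one constituent per step and recurse on the residual list."""
    if len(items) == 1:
        return items[0]  # a single structure remains: plain copy
    index = next(i for i, item in enumerate(items) if item.label == picks[0])
    residual = items[:index] + items[index + 1:]
    return Node(label, maximal, [items[index], extract(label, residual, picks[1:], False)])


# "random act (of) kindness" after contraction has removed "of"
tree = Node("N", True, [Node("random"), Node("N", False, [Node("act"), Node("kindness")])])
flat = compress(tree)
rebuilt = extract("N", flat.children, ["random", "act"])
```

Here `flat` has the three daughters random, act, and kindness directly under the root, and `rebuilt` restores the binary structure, which mirrors the statement that extraction undoes in synthesis what compression does in analysis.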
Transformer Feature Rules (TF-Rules)

Transformer feature rules are applied after a transformer rule has successfully transformed a source object to a target object. The format of a transformer feature rule defines, as expected, partial tree objects on the left-hand and right-hand sides of the arrow:

(113) LHS <=> RHS

where LHS and RHS describe partial trees in exactly the same format that generator feature rules specify trees. (It is also possible to specify unidirectional transformer feature rules using the '=>' and '<=' operators.) The definition of LHS and RHS in Backus-Naur Form notation is the following:

(114) LHS ::= GF-RULE
      RHS ::= GF-RULE

(See (88) on page 50 for the description of GF-RULE.)

The function of transformer feature rules is to transfer features from a source object to a target object. The features may be copied from the source object, or they may be altered, depending on the requirements of the target. An example of feature copying copies the grammatical number of nouns from the source to the target. Such a transformer feature rule might be specified as follows:

(115) [ cat = n, num = X ] <=> [ cat = n, num = X ]

This rule contains only condition sets but no consequence sets, so if unification fails, either because the source object is not a noun or because the target already has a nonunifying value for number, the rule is not applied. The rule ensures that, for example, the Spanish singular noun casa is translated as house, and the plural noun casas as houses. But the rule will not apply if the source word is a Spanish plural noun that translates as a singular mass noun in English, such as muebles 'furniture', consejos 'advice', informaciones 'information', etc. Transformer feature rules like (115) are used to copy person, tense, aspect, and many other features by default from the source object to the target object.
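The default-copying behaviour of a rule like (115) can be illustrated with a small Python sketch. It is a simplification under stated assumptions: feature structures are modelled as flat dictionaries, and `unify` and `copy_number` are hypothetical helper names, not part of MMT. The point is only that a failed unification leaves the target untouched rather than raising an error.

```python
from typing import Dict, Optional

Features = Dict[str, str]


def unify(a: Features, b: Features) -> Optional[Features]:
    """Unify two flat feature sets; return None if any attribute has clashing values."""
    merged = dict(a)
    for key, value in b.items():
        if key in merged and merged[key] != value:
            return None
        merged[key] = value
    return merged


def copy_number(source: Features, target: Features) -> Features:
    """Copy num from source noun to target noun; if unification fails, the rule does not apply."""
    if source.get("cat") != "n" or "num" not in source:
        return target
    result = unify(target, {"cat": "n", "num": source["num"]})
    return target if result is None else result


# casa (singular) -> house: number is copied
house = copy_number({"lex": "casa", "cat": "n", "num": "sg"}, {"lex": "house", "cat": "n"})

# muebles (plural) -> furniture, lexically singular: the rule does not apply
furniture = copy_number(
    {"lex": "mueble", "cat": "n", "num": "pl"},
    {"lex": "furniture", "cat": "n", "num": "sg"},
)
```

In the second call the target already carries a nonunifying value for number, so it is returned unchanged, which corresponds to the muebles 'furniture' case described above.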
A transformer feature rule that changes feature values is illustrated in (116), in which Spanish conditional tense on a verb, which does not exist in English, is converted to future tense with a feature for modality:

(116) [ cat = v, tense = conditional ] <=> [ cat = v, tense = future, modal = possibility ]

The rule in (116) would allow for the following translation:

(117) Le          dije      que   iría
      CL.DAT.3SG  told.1SG  that  go.COND.1SG
      'I told him that I would go'

With a more refined semantic representation of tense and modality, such transformer feature rules could be removed. The example here simply shows how features can be altered between source and target objects using transformer feature rules.

2.6.4 Multiword Transformers

We have so far considered rules that cover the transformations of tree objects. There are also two transitions that a text undergoes between the input and the syntactic level. The first occurs between the initial input object and the morphological level, and the second between the morphological level and the syntactic level. (In synthesis, the same two transitions are visited, but in reverse order.) The transformation rules between these levels treat relationships between adjacent words, and for this reason are referred to as multiword (MW) transformation rules. Multiword transformers consist of multiword rules and multiword feature rules. These two rule types are described in the following subsections.

Multiword Rules (MW-Rules)

Multiword rules are defined between the textual or input level and the morphological level, and between the morphological level and the syntactic level. The rules have the same form, but have different functions, as described below.
Textual/Morphological Level

The sequence New York can either be considered two words, New and York, or, more likely, a single term that consists of two word strings separated by a space. Unfortunately, it is not possible to define a lexical entry consisting of two or more words; that is, the following feature set cannot appear in the lexicon:¹¹

(118) [ lex = 'New York', cat = n ]

even though New York behaves in all respects as if it were a single word like Vancouver or Ottawa.

¹¹ The entry is syntactically valid, but because MMT tokenizes each space-separated word in the input, it would never be able to recognize the sequence New followed by York as a single token.

To get around this problem, multiword rules are defined which list all such collocations, converting them to entities which can be defined in the lexicon. For New York, this is achieved with the following multiword rule:

(119) ( [word = 'New'], [word = 'York'] ) <=> [word = 'New-York']

The lexical entry is then coded as:

(120) [ lex = 'New-York', cat = n ]

The same technique can be applied to all fixed multiword phrases, such as proper names (Hong Kong, World Trade Organization, Palestine Liberation Front), compound expressions (ad hoc, a la mode, status quo), and idiomatic expressions (up to date, little by little, for the time being, etc.). Since these words are always uninflected, they can be recognized directly after textual input occurs, and so are defined as transformation rules between the input level and the morphological level. Because the rule is bidirectional, it will also convert 'New-York' to the proper two-word form New York on output.

Another use of multiword rules at the textual/morphological level is to replace contractions with their non-contracted form, for example we'll to we will.
This particular contraction can be generalized to all words with the 'll contraction, using the following multiword rule:

(121) ( {word=X}, {word='}, {word=ll} ) => ( {word=X}, {word=will,cat=v} )

The variable X unifies with whatever word precedes the apostrophe. The right-hand side of the rule creates two words, one being the instantiation of X, the second the word will. It is coded as a verb to distinguish it from the noun will, as in the will to survive. As a matter of preference, this rule is defined for analysis only, as indicated by the unidirectional arrow '=>'. For example, it will convert 'You'll be sorry' on input to 'You will be sorry', but on output it suppresses the contraction, in keeping with the convention of not utilizing contractions in expository texts.

Morphological/Syntactic Level

Once morphological analysis has processed all of the words in the input text, they are passed to the parser for syntactic analysis. Before the string of words is parsed, however, they pass through the morphological/syntactic transformation level. Multiword rules at this level can modify the input to the parsing. For example, compound nouns can be recognized and converted to single-word forms, such as English barn owl, credit card, trade union, etc., and Spanish palabra clave 'keyword', cese de fuego 'ceasefire', patrón oro 'gold standard', etc. The multiword rule for the compound palabra clave 'keyword' is shown in (122):

(122) ( [lex = palabra, num = NUM], [lex = clave, word = clave] ) <=> [lex = 'palabra-clave', num = NUM]

The grammatical number on palabra is copied to the lexeme PALABRA-CLAVE, so that palabra clave reflects singular number, and palabras clave plural. Synthesis performs the reverse transformation, separating the lexeme PALABRA-CLAVE into PALABRA and CLAVE.
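The difference between the bidirectional rule (119) and the analysis-only rule (121) at the textual/morphological level can be sketched as follows. This is an illustrative approximation, not MMT code: tokens are plain strings rather than feature sets (so the cat = v marking on will is omitted), the tokenizer is assumed to have already split we'll into three tokens, and `COLLOCATIONS`, `analyze`, and `synthesize` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Bidirectional collocation rules in the spirit of (119); the table is illustrative.
COLLOCATIONS: Dict[Tuple[str, ...], str] = {("New", "York"): "New-York"}


def analyze(tokens: List[str]) -> List[str]:
    """Apply multiword rules in the analysis direction."""
    out: List[str] = []
    i = 0
    while i < len(tokens):
        # Analysis-only rule in the spirit of (121): X ' ll => X will
        if out and tokens[i:i + 2] == ["'", "ll"]:
            out.append("will")
            i += 2
            continue
        for sequence, joined in COLLOCATIONS.items():
            if tuple(tokens[i:i + len(sequence)]) == sequence:
                out.append(joined)
                i += len(sequence)
                break
        else:
            out.append(tokens[i])
            i += 1
    return out


def synthesize(tokens: List[str]) -> List[str]:
    """Apply only the bidirectional rules in reverse; contractions are not restored."""
    split = {joined: list(sequence) for sequence, joined in COLLOCATIONS.items()}
    out: List[str] = []
    for token in tokens:
        out.extend(split.get(token, [token]))
    return out


analyzed = analyze(["You", "'", "ll", "love", "New", "York"])
restored = synthesize(analyzed)
```

Because only the collocation table is consulted in `synthesize`, the output keeps will in its full form while New-York is split back into two words, matching the behaviour described for the two arrow types.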
The grammatical number of the lexeme is copied to the head noun, where morphological synthesis will generate either palabra or palabras, depending on the value of the number.

Multiword Feature Rules (MWF-Rules)

At the morphological/syntactic level, multiword feature rules apply feature constraints to the sequence of tokens after multiword rules have applied and before parsing begins. They are primarily used to disambiguate the parts of speech, or syntactic category, of words based on the surrounding context. For example, the word report may be a noun or verb, but if preceded by an article, such as the report, a report, or my report, it can only be a noun. Multiword feature rules permit the exclusion of the verb reading in this context, improving considerably the efficiency of parsing. The multiword feature rule in (123) would achieve this effect:

(123) ( [cat = d, type = article], [cat ≠ v] )

The rule in (123) states that in any two-word sequence, if the first word is a determiner of type article, the second cannot have the category of a verb. If the context does not apply, the constraint cannot apply, and the parser must perform the disambiguation.

Multiword feature rules are not defined at the textual/morphological level, since the feature structures have no features other than the words themselves.

2.7 Conclusion

This chapter has covered the characteristics of the MMT formalism. It presented the notion of levels of representation, and described how levels are defined by generators, and how transitions between levels are defined by transformers. It emphasized that common core properties of all generators and transformers, regardless of language, may be factored out and made available to each component, simulating the notion of a Universal Grammar.
It defined linguistic objects in terms of tree structures and feature structures, and presented the rule formalism as expressions of constraints on feature structures within partial trees.

Given this formalism, a specific instantiation of linguistic knowledge is described, which is the topic of the next chapter. This instantiation is not intended to cover all of the major morphological, syntactic, and semantic properties of English or Spanish, but rather to investigate how a particular subset of properties can be described using the formal system. Within that descriptive mechanism, I investigate specific properties of reflexive constructions and how they can be expressed and translated. This is the topic of Chapter 5.

Chapter 3

Core Linguistic Structures

Chapter 2 presented the MMT formalism used to encode linguistic knowledge. The formalism provides for the construction of tree structures and, within the nodes of a tree, feature structures. The feature logic allows for the resolution of constraint equations involving positive, negative, and disjunctive constraints, as directed by rules within generators and translators.

The current chapter presents in detail the specific encoding of linguistic knowledge that has been developed for this particular application of multilingual machine translation. This knowledge is compartmentalized into the set of feature structures shown in Figure 3.1, which every node within a tree object contains (except for lex, the lexeme, which only has relevance to leaf nodes).
Thus every node contains the features proj, phon, morph, syn, and sem, describing the node's projection structure, phonological structure, morphological structure, syntactic structure, and semantic structure, respectively.

    lex    = Lexeme
    proj   = Projection Structure
    phon   = Phonological Structure
    morph  = Morphological Structure
    syn    = Syntactic Structure
    sem    = Semantic Structure
    argstr = Argument Structure
    chain  = Chain Structure
    modstr = Modifier Structure

Figure 3.1: List of Top Level Features

These constitute the core linguistic feature structures, and are described in §3.1 through §3.5 in this chapter. Following the semantic structure, I devote §3.6 to a description of clitic structures in Spanish, since they figure prominently in the discussion of argument structure and reflexivity.

The information that relates the given node to other nodes in the tree is represented by the three additional features argstr, representing the argument structure of the current node, chain, containing information on displaced arguments, and modstr, comprising the modifier structure. Because the argument structure plays such a crucial role in the analysis of constituent structure, particularly with respect to reflexives, I defer the description of the argument structure and chain structure to the next chapter. The description of the modifier structure is also covered in that chapter.

The distinction between the various feature structure groupings is not always transparent. Thus, morphological features, such as tense and number, play a significant role in determining well-formed syntactic structures. Similarly, semantic features, such as animacy,
affect the range of syntactic structures that may be generated, particularly in Spanish, where animate direct objects are regularly preceded by the preposition 'a', and inanimate objects lack 'a', as shown in the following:

(1) Yo vi *(a) la niña 'I saw the child'
(2) Yo vi (*a) la bicicleta 'I saw the bicycle'

In (1), 'a' preceding the animate object la niña 'the child' is licit, whereas omitting 'a' is illicit. Conversely in (2), 'a' preceding the inanimate object bicicleta 'bicycle' is illicit, whereas omitting 'a' is licit.

The following descriptions summarize the content of the core feature structures, and help to determine which feature structure a given feature is assigned to:

Projection Structure: the status of each node as a minimal projection, maximal projection, neither, or both.

Phonological Structure: the features that pertain specifically to the phonology (sound properties) of a lexical item.

Morphological Structure: the features that pertain to the inflectional or derivational properties of a word, and that affect the word's orthography. Phrasal nodes do not contain morphological features.

Syntactic Structure: the features that describe a constituent structure's part of speech and the agreement properties between constituents. Syntactic features are language-specific, and thus may vary across languages (although some are universal).

Semantic Structure: the features that describe a constituent structure's semantic properties, including its semantic role (i.e. as argument, modifier, or predicate), nominal reference, and intrinsic semantic properties such as animacy. Semantic features, as a representation of meaning, are language-universal; that is, the semantic features borne by a constituent are equivalent (or more accurately, unifiable) across languages.
These five feature structures are covered in the next five sections.

3.1 Projection Structure

A noun phrase is a projection of a noun, just as a verb phrase is a projection of a verb, a prepositional phrase a projection of a preposition, and so on. Thus projections are closely tied to the notion of phrase structure. The phrase structure rules presented in Chapter 2 followed the traditional form, using rules such as NP -> DET N. However, a more principled approach to the description of phrase structure was proposed by Chomsky (1970) and developed more explicitly by Jackendoff (1977) into X-bar theory, in which a lexical head X projected a phrasal structure according to a radically reduced set of phrase structure rules. A number of alternative formulations have been presented (e.g. Stuurman (1985)), but the most standard form is that in (3):

(3) a. X'' -> (SPEC) X'
    b. X' -> X COMP*

where X is a lexical category (e.g. N (noun), V (verb), A (adjective), etc.), X' is the first projection dominating X and zero or more complements of X (i.e., subcategorized constituents of X), and X'' is the maximal projection of X, dominating an optional specifier of X and the X' projection.

Stowell (1981) constrained the SPEC and COMP constituents to each be maximal projections, so that (3), in effect, becomes (4):

(4) a. X'' -> (Y'') X'
    b. X' -> X Z''*

with Y and Z the categories of the specifier and complement(s), respectively.

Stowell (1981) initiated the program to eliminate X-bar rules, recognizing that (4) contains redundancies that are anathema to a principled theory of phrase structure. For example, the complements of a head must necessarily be specified within the head itself as subcategorization properties, making it redundant to also refer to them in phrase structure rules such as (4b).
The specifier of a head is also categorially dependent on the head; for example, it was suggested (Chomsky 1970:210) that the determiner is a specifier of N, auxiliaries are the specifier of V, qualifying adverbs, e.g. very, are the specifier of A, and that NP is the specifier of the sentence S, making the rule in (4a) also redundant. The order of constituents (in particular whether complements precede or follow the head, but also presumably applicable to the specifier before or after X'; see, e.g., the Projection of the Specifier Relation in Webelhuth (1992:33; 1995:35)) was assumed to be parametrically determined, and hence the order of constituents given in (4) was also derivable from a parametric principle of phrase structure. For example, complements in English are assumed to follow the head, but German and Japanese complements precede the head (at least those of predicative categories such as the verb), as illustrated in the following:

(5) a. Ich weiß, dass der Mann dem Kind ein Buch gegeben hat
       I know that the man the child a book given has
       'I know that the man gave a book to the child'
    b. Sannin-ni omiyage-o ageta
       three.people-DAT presents-ACC gave
       'I gave presents to three people' (Nagao 1989:76)

The result is that the phrase structure component of grammar does not require the rules in (4) but rather relies on more basic principles that derive phrase structures on the basis of the lexical properties of heads of phrases.

The fundamental principle governing the hierarchical representation of phrases is encapsulated in a single rule governing head projection, referred to by Speas (1990:43) as PROJECT ALPHA:

(6) PROJECT ALPHA: A word of syntactic category X is dominated by an uninterrupted sequence of X nodes,

leading to potential structures such as:

(7) X
    |
    X

In other words, if a lexical node whose syntactic category is X projects, it must be immediately dominated by a node whose syntactic category is also X. And if that node projects, it too must be immediately dominated by a node of category X, and so on. The topmost X is called the maximal projection, and the bottom-most X, instantiating the lexical head, the minimal projection. Any node X between the maximal and the minimal projection is called an intermediate projection. The sequence of X nodes from the minimal to the maximal projection is called by Speas the projection chain. Each X in a projection chain shares all syntactic category features, including (but not limited to) the part of speech (N, V, A, etc.). In MMT, the lexical node's entire syntactic and semantic feature structures are shared throughout the projection chain.

X projects whenever X combines with another constituent Y which is related to X to form a "larger" X. Thus, Y merges with X in one of the two configurations below when X projects:

(8) a. [X X Y]
    b. [X Y X]

Y either follows X or precedes X; those are the only options available. This process of merging X and Y is similar to the primitive operation of MERGE (Chomsky 1995b:226) in the Minimalist Program, where two objects X and Y are replaced by K = {Z, {X, Y}}, with Z the label of K. By deduction (Chomsky 1995b:243ff), the label Z is equivalent to either the head of X or the head of Y. The principle of PROJECT ALPHA combines merging and labeling in one operation, and as such is a simplification of the theory of phrase structure. Furthermore, the two objects that PROJECT ALPHA merges must be adjacent, whereas no such restriction is specified in MERGE.
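As a rough illustration of how PROJECT ALPHA combines merging and labeling in a single step, and of the adjacency requirement that distinguishes it from MERGE, consider the following Python sketch. It is a toy model under stated assumptions rather than the MMT rule system: `Node`, `word`, and `project` are hypothetical names, only the category label is shared along the projection chain, and adjacency is checked with word-position spans, which is a modelling choice made here for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Node:
    cat: str                       # syntactic category shared along the projection chain
    span: Tuple[int, int]          # word positions covered, end exclusive (an assumption)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def word(cat: str, position: int) -> Node:
    """A lexical node, i.e. a minimal projection covering one word."""
    return Node(cat, (position, position + 1))


def project(head: Node, other: Node, head_first: bool) -> Node:
    """Merge head with an adjacent constituent and label the result with the head's category."""
    first, second = (head, other) if head_first else (other, head)
    if first.span[1] != second.span[0]:
        raise ValueError("only adjacent objects can be merged")
    return Node(head.cat, (first.span[0], second.span[1]), first, second)


# "the man": D takes the N as its complement, as in configuration (8a)
the = word("D", 0)
man = word("N", 1)
dp = project(the, man, head_first=True)
```

The resulting node `dp` carries the category D and covers both words, while `man` remains a node with no projection above it other than the D chain it merged into.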
Both PROJECT ALPHA and MERGE combine two objects, giving rise to strictly binary branching, although conceivably more than two could be acted upon. Assumptions based on the prevalence of minimality, however, favor pairs of objects, as noted by Chametzky:

    ... why should MERGE be limited only to pairs? From one point of view, this is minimal: any less is impossible (because there is no forming a new syntactic object from a single syntactic object), and any more is unnecessary (because a new object can be formed from two objects). (Chametzky 2000:124)

Thus, PROJECT ALPHA is the single principle that merges two objects to form a phrasal projection. In fact, a projection is only formed when PROJECT ALPHA takes place. In other words, there are no vacuous projections. Consequently, structures such as (9a) that were generable by the classic X-bar system, and which would be represented in terms of head projection in (9b), are ruled out as a possible projection structure:

(9) a. X''      b. X
       |           |
       X'          X
       |           |
       X           X

This is entirely in line with Chomsky's notion of Bare Phrase Structure (Chomsky 1995a), in which a constituent contains only as much structure as required and no more. Thus, the X-bar-compliant structure of the determiner phrase in (10a) is replaced by its bare phrase structure equivalent in (10b):

(10) a. [Tree diagram: the X-bar-compliant structure of the man, including nonbranching intermediate projections of D and N.]
     b. [D [D the] [N man]]

(Here and throughout, I adopt the DP Hypothesis in which the determiner D, a functional category, selects an NP as its complement (Hudson 1984; Abney 1987), in contrast to earlier analyses in which the DP was treated as a specifier of the NP.) Note that in (10b) the N node for man is both a maximal and a minimal projection, and that there are no intermediate projections in either the N projection or the D projection.

3.1.1 Projection Licensing Conditions

A constituent X combines with Y if and only if X licenses Y. There are four licensing conditions that allow X to combine with Y. These are listed below:

(11) X licenses Y iff
     i. Y is a complement of X,
     ii. Y is a specifier of X,
     iii. Y is a modifier of X, or
     iv. Y is a clitic attached to X.

Complement: X licenses Y as a complement when X subcategorizes for Y. Subcategorization, a property of lexical items, is discussed more fully in the next chapter. When Y unifies with one of the argument specifications in X's subcategorization frame, Y is said to discharge that argument. Y as complement merges with X to form a projection of X:

(12) a. [X X Y]
     b. [X Y X]

As head-initial languages, English and Spanish have heads that precede their complements, so that verbs, nouns, prepositions, and determiners combine with their complement(s) to form structures like that in (12a). Japanese, a head-final language, produces projections like (12b).¹ German has properties of both head-initial and head-final languages, and hence generates both structures.

¹ However, demonstrative determiners in Japanese, e.g. ano 'that', appear to display the head-initial property, as seen in (a) below. Similarly, the possessive particle -no, if treated as the head of D, analogous to English 's in Yoko's friend, appears to display the head-initial property, as illustrated in (b):
(i) a. ano hito 'that man': [D [D ano 'that'] [N hito 'man']]
    b. Yoko-no tomodachi 'Yoko's friend': [Tree diagram: a D projection containing Yoko, the particle -no as D, and the N tomodachi 'friend'.]

Specifier: X licenses Y as a specifier if (following Fukui (1995)) X is a functional category that licenses Y. The two functional categories that admit specifiers in this implementation are I (inflection) and C (complementizer), giving rise to the inflectional projection IP and the complementizer projection CP, respectively.² The specifier of I, if present, is the subject of the sentence, and is licensed by either discharging an external argument from the chain structure (see §4.2), or by being a pleonastic pronoun (it or there in English, es in German, il in French; Spanish and Japanese do not have lexical pleonastic pronouns, presumably related to the pro-drop status of these languages). The structure of the IP including the specifier is shown in (13):

(13) [I (=IP) D I]

In (13), the category labeled D, as specifier of I, is the maximal projection of the subject of the sentence. The specifier of C (complementizer), if present, is in English and Spanish a fronted wh-phrase (quien 'who', que 'what', cuanto 'how much/many', etc.).³ That is, a complementizer licenses a specifier if the specifier has the [+wh] feature. The following illustrates the specifier licensing structures for the C projection in English and Spanish:

(14) [C (=CP) Y C]

In (14), Y is any maximal projection that has the [+wh] feature. (Wh-phrases, and hence specifiers of C, are not further treated in this dissertation.)

² Rizzi (1997) decomposes the CP projection into the four functional categories Force, Top (Topic), Foc (Focus), and Fin (Finite). Additional functional categories have been proposed to replace the I projection with Tense and Agr(eement) projections (cf. Cinque (1999); Rizzi (2004)). For the purposes of the present implementation, it suffices to recognize the functional categories I and C.

³ In German, the specifier of C is customarily treated as the locus of the initial phrase preceding the finite verb in main clauses, the so-called Vorfeld; it need not be a wh-phrase. For example, in the sentence Der Mann hat die Akten gestohlen 'The man stole the files', the phrase Der Mann 'the man' as Vorfeld is in the specifier of C, followed by the verb hat 'has' in second position, possibly occupying the head of the C projection, followed by the rest of the sentence die Akten gestohlen 'stolen the files'. See den Besten (1983), Haider and Prinzhorn (1985), Kosmeijer (1991), Haider et al. (1995), and Thrainsson et al. (1996) for analysis. Since German is not implemented in this study, the implementational details are not included here.

Modifier: X licenses Y as a modifier⁴ if X satisfies the constraints of a modificand as specified in Y's modifier structure (see §4.3). That is, Y selects the projections it may modify, following Pollard and Sag (1987:55-57). For example, adjectives typically modify nouns, as in the examples below:

(15) a. Prenominal adjectives:
        English     white houses
        German      weiße Häuser
        Japanese    shiroi uchi
     b. Postnominal adjectives:
        Spanish     casas blancas
        French      maisons blanches
        Portuguese  casas brancas
        Italian     case bianche

The adjective white, and all of its translations in (15), specifies within its modifier structure that it may modify a noun. Modification may be to the left, as in (15a) for English, German, and Japanese, or to the right, as in (15b) for Spanish, French, Portuguese, and Italian. The licensing structures involving modifiers for English and Spanish are illustrated below:

(16) a. [N [A white] [N houses]]
     b. [N [N casas] [A blancas]]

Other modifiers typically include adverb phrases, prepositional phrases, relative clauses, and quantifying phrases.

⁴ The term modifier is often referred to as an adjunct, a term which I avoid because of its possible ambiguity. A modifier cannot be an argument of a head. An adjunct participates in the operation of adjunction, a general process whereby two (or more) elements combine to form a single structure. Adjuncts, as participants in the process of adjunction, may be arguments or modifiers, depending on the context (see Chametzky (2000:138-145) for a discussion of the distinction between adjuncts (i.e. modifiers in the current analysis) and adjunction structures).

Clitic: X licenses Y as a clitic in the special case where Y, a Spanish clitic of category D (determiner), attaches to X, a projection of tensed I (inflection). The structural configuration for clitic licensing is illustrated in (17):

(17) [Tree diagram: the clitic D attached to a projection of tensed I.]

A fuller discussion of clitics is presented in §3.6.

Following Stowell (1981), the constituent that merges with the X projection must itself be a maximal projection. Complements, specifiers, modifiers, and clitics are all maximal projections. This precludes the possibility of head adjunction structures, in which a lexical head Y, which is not a maximal projection, adjoins to another lexical head X, forming a projection of X which is itself treated as a lexical head, as in:

(18) [X⁰ Y⁰ X⁰]

Such structures have been suggested in the Minimalist Program and earlier for verbs that "raise" from a base position as head of a VP and adjoin to functional heads, collecting their morphological affixes as they raise (or checking their features, in Minimalist terminology), and continuing up the structure to some target functional head. In this study, verbal affixes are analyzed in the morphology; the verb remains in its base or canonical position, unless rules apply to position the verb in a noncanonical position. (No such rules have been defined in this study, but an analysis of verb fronting in question formation, or verb-second phenomena in German main clauses, would entail the addition of such rules.)

3.1.2 Projection Features

Projection levels in the X-bar theory were graphically represented by bars (X̄, X̿) or by primes (X', X'').
In a feature-based system, these notations are awkward and unnecessary; furthermore, they do not allow for the possibility that a node level may be both a maximal and a minimal projection. The notation adopted here is to express the projection level of any node in terms of two binary features, [±min] and [±max] (Kitagawa (1986), adapting an idea of Muysken (1982)). A minimal projection, i.e. a lexical head, has the feature [+min] and all other projection levels have the feature [−min]. Similarly, a maximal projection has the feature [+max] and all other projection levels have the feature [−max]. All intermediate projections have the feature combination [−min, −max], and all projections that are simultaneously minimal and maximal projections have the feature combination [+min, +max].

In order to ensure the integrity of a head and its projections, each node in the projection chain shares a unique projection index (called a lexical index in Speas (1990:44)). The projection index is a unique integer generated automatically for each word in the input. For example, the following sentence might receive the indices indicated:

(19) the₁ reports₂ have₃ been₄ read₅ by₆ the₇ committee₈

(The projection index on the nouns reports and committee will also become the referential index of those nouns; see page 110.)

When viewing a tree structure, it is possible to identify the local head of a projection as that node whose projection index is the same as that of the maximal projection. In the structure in (20a) it is potentially ambiguous which local root is the projection of which local head. By associating the projection index with each projection chain, the ambiguity disappears. To emphasize this association, projection chains in this work are represented graphically as vertical projections as in (20b):

(20) a.
          V
         / \
        V   V
        |  / \
     have V   V
          |   |
       been   read

     b.   V₃
          | \
          V₃ V₄
          |  | \
       have  V₄ V₅
             |  |
          been  read

The subscripts 3, 4, and 5 in (20b) represent the projection indices of the three V projections. In general, these indices are not shown in tree diagrams, since the vertical structures show projections which have equivalent indices.

The MMT representation of the min/max features and the projection index is located within the top-level attribute proj. Every node in a tree structure contains the feature set shown in Figure 3.2, with values for min and max set according to the hierarchical position of the node within its projection. The projection index is the value assigned to the word which defines the minimal projection.

proj = [ min   = ±
         max   = ±
         index = Projection Index ]

Figure 3.2: List of Projectional Features

3.1.3 Projection Rules

Given that phrase structure is an instance of PROJECT ALPHA along with the licensing conditions that allow a maximal projection to merge with a head, the rules for generating phrase structures are reduced to just two, one for licensing to the left and one for licensing to the right. That is, the entire phrase structure component could consist of the two rules in (21), corresponding respectively to the two local trees in (22):

(21) a. X[−min] → X[−max] Y[+max]
     b. X[−min] → Y[+max] X[−max]

(22) a.    X          b.    X
          / \              / \
         X   YP          YP   X

The first rule in (21a) states that a nonminimal projection of X dominates a nonmaximal projection of X, followed by a maximal projection of Y. The second rule in (21b) simply reverses the constituents. These two rules form all phrase structures, and as such belong in the common core component of grammar; i.e., they comprise the basic principle of phrase structure. We can represent this principle in MMT by the following two generator rules:

(23) a.
        [proj = [min = −, index = INDEX]] → [proj = [max = −, index = INDEX]] + [proj = [max = +]]
     b. [proj = [min = −, index = INDEX]] → [proj = [max = +]] + [proj = [max = −, index = INDEX]]

The projection index, located in the local root and the local head, is assigned the variable INDEX, establishing the projection chain. Since projection indices are unique to a projection, the projection index on the attaching maximal projection is not indicated; its value is irrelevant for this rule.

The rules in (23) correspond to projections in which the head licenses a complement, a specifier, a modifier, or a clitic. As licensing depends on the content of the argument structure, chain structure, and modifier structure, the two rules in (23) are compiled out into a slightly larger set of licensing-specific rules, which are elaborated in the next chapter. Anticipating that expansion, we introduce a division of labor into the phrase structure component. Generator rules define the structural licensing requirements, based on the notion of attachment to a projection chain, and generator feature rules define the feature requirements, applicable to all environments. The generator rules are presented in (24), and the generator feature rules (refined later in this chapter) are presented in (25):

(24) G-Rules: Phrase Structure Grammar
     a. [proj = [index = INDEX]] → [proj = [index = INDEX]] + [ ]
     b. [proj = [index = INDEX]] → [ ] + [proj = [index = INDEX]]

(25) GF-Rules: Projection Features (Version 1)
     a. root:        [proj = [index = INDEX]] > [proj = [min = −]]
        head:        [proj = [index = INDEX]] > [proj = [max = −]]
        attachment:  [ ] > [proj = [max = +]]
     b. root:        [proj = [index = INDEX]] > [proj = [min = −]]
        attachment:  [ ] > [proj = [max = +]]
        head:        [proj = [index = INDEX]] > [proj = [max = −]]

Each generator rule in (24) builds a binary structure, with one daughter as the head of the projection chain and the other daughter, the attaching constituent, a nonspecific feature set.
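The interaction of the two rule types can be pictured with a small Python sketch. This is only an informal illustration under stated assumptions, not the MMT rule formalism: the names Node, set_feature, attach, and enforce_projection_features are hypothetical, features are modelled as optional booleans, and unification is reduced to a simple clash check. A generator step builds a binary local tree whose root shares the head's projection index, and a feature step then marks the root as [−min], the head as [−max], and the attaching constituent as [+max].

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Tree node holding the three proj features (hypothetical model)."""
    index: Optional[int] = None      # projection index; None = unspecified
    is_min: Optional[bool] = None    # True = [+min], False = [-min], None = unset
    is_max: Optional[bool] = None    # True = [+max], False = [-max], None = unset
    word: Optional[str] = None
    daughters: List["Node"] = field(default_factory=list)


def set_feature(node: Node, name: str, value: bool) -> None:
    """Set a binary feature, failing if it already has the opposite value."""
    current = getattr(node, name)
    if current is not None and current != value:
        raise ValueError(f"feature clash on {name} for index {node.index}")
    setattr(node, name, value)


def enforce_projection_features(root: Node) -> None:
    """Feature step: the daughter sharing the root's index is the head."""
    for daughter in root.daughters:
        if daughter.index == root.index:
            set_feature(daughter, "is_max", False)   # local head is nonmaximal
        else:
            set_feature(daughter, "is_max", True)    # attachment is maximal
    set_feature(root, "is_min", False)               # local root is nonminimal


def attach(head: Node, attachment: Node, side: str) -> Node:
    """Generator step: binary local tree; side is where the attachment goes."""
    root = Node(index=head.index)
    root.daughters = [attachment, head] if side == "left" else [head, attachment]
    enforce_projection_features(root)
    return root


# 'white houses': the adjective attaches to the left of the noun head.
white = Node(index=1, is_min=True, word="white")
houses = Node(index=2, is_min=True, word="houses")
noun_projection = attach(houses, white, side="left")
```

In this sketch the adjective ends up as [+min, +max], a projection that is simultaneously minimal and maximal, while the noun head is [+min, −max] and the new root is [−min] with its max value still open until it either projects further or attaches elsewhere.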
The generator feature rules in (25) enforce the projection features in each of the nodes through conditional constraints. Each generator feature rule ensures that for every binary branching structure constructed by the generator rule, the local root is always nonminimal, the local head is nonmaximal, and the attaching structure is a maximal projection. Later, we will see how these generator feature rules are augmented to include all of the features that participate in a projection chain.

3.2 Phonological Structure

Since the application is text-oriented as opposed to speech-oriented, phonological features play only a minimal role in syntactic analysis. Nevertheless, we can conceive of a phon feature as containing any features that may become relevant to phonological processing. At present, the only phonological feature that has a bearing on lexical selection is the initial phonological segment of the word. This information is recorded in the phonological structure shown in Figure 3.3. The feature seg, in this implementation, refers to the first segment of the lexeme.

[phon = [seg = Segment]]

Figure 3.3: List of Phonological Features

In English, it is sufficient to know whether the word begins with a consonant or a vowel. Spanish is slightly more complex, in that the segment feature indicates whether an initial vowel is a stressed a, which affects determiner selection, or an i or o, which affects coordinator selection. These properties are described below.

3.2.1 English A/An Allomorphy

In English, the indefinite article a may only be followed by a word whose initial segment is a consonant, whereas an must be followed by a vowel-initial word. This is illustrated in (26):

(26) a. a door
     b. an open door
     c. an apple
     d. a rotten apple

The entry for door consists of a consonant-initial segment, as does the word rotten. The entry for open consists of a vowel-initial segment, as does apple.
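The dependence of the article on the seg value of the adjacent word can be sketched in a few lines of Python. This is a hedged illustration only: the LEXICON dictionary and the functions indefinite_article and with_article are hypothetical names, and the lexicon is reduced to the four words of (26), each recording nothing but its initial segment.

```python
# Hypothetical mini-lexicon: each entry records only the initial segment (seg).
LEXICON = {
    "door": "cons",
    "open": "vowel",
    "apple": "vowel",
    "rotten": "cons",
}


def indefinite_article(next_word, lexicon=LEXICON):
    """Choose 'an' before a vowel-initial word, 'a' before a consonant-initial one."""
    seg = lexicon[next_word]
    return "an" if seg == "vowel" else "a"


def with_article(words, lexicon=LEXICON):
    """Prefix a phrase with the article selected by its first word only."""
    return " ".join([indefinite_article(words[0], lexicon), *words])


# The four phrases of (26), built from their article-less word sequences.
phrases = [with_article(p) for p in (["door"], ["open", "door"],
                                     ["apple"], ["rotten", "apple"])]
```

Only the word immediately following the article is consulted, so inserting rotten before apple changes the article even though the noun is unchanged.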
Every word in English has a feature signifying either a consonant or a vowel as its initial segment. The value is automatically set using lexical feature (LF) rules when the lexicon is compiled. An example of a lexical feature rule setting the default value for words beginning with a vowel is shown in (27):

(27) LF-Rule: English Vowel Segment
     [ lex  = (a;e;i;o;u)+_
       phon = [seg = vowel] ]

A subsequent lexical feature rule would set the default value of all other initial segment values to a consonant:

(28) LF-Rule: English Consonant Segment
     [ phon = [seg = cons] ]

Exceptions such as honest, which is phonologically vowel-initial in spite of not orthographically beginning with a vowel, and euphoria, which is phonologically consonant-initial in spite of orthographically beginning with a vowel, are hard-coded in the lexicon, as in (29):

(29) a. [ lex  = honest
          phon = [seg = vowel] ]
     b. [ lex  = euphoria
          phon = [seg = cons] ]

Since the a/an distinction is purely phonological (that is, article selection is based solely on the phonological properties of the adjacent word, without regard to any morphological, syntactic or semantic properties), it is most appropriately handled at the phonological interface, i.e. the boundary between the auditory, or in the present case textual, level and the morphology. At the morphological level and beyond, an behaves identically to a. Consequently, we can convert an to a on input without incurring any disruption in the further analysis. This conversion is accomplished with the following multiword rule at the textual/morphological level for English:

(30) MW-Rule: English an/a Conversion
     [word = an] ⇒ [word = a]

This has two advantages. For one, an entry for an in the lexicon is no longer required.
For another, it allows either an or a to precede any word in the input text, regardless of its segment value; the wrong article often appears when a text is revised and a word is inserted or deleted between the article and the next word.

In synthesis, however, we always want to produce the correct article. The following multiword rule converts a to an in the context of the following word having a vowel-initial segment:

(31) MW-Rule: English a/an Conversion
     ([word = an], X) ⇐ ([word = a], X ∧ [phon = [seg = vowel]])

In summary, phonological processing in English entails two lexical feature rules, (27) and (28), for setting initial segment values, and two multiword rules for transforming an to a on input (30) and a to an on output (31). No other rules of English grammar at any level make reference to phonological structure.

3.2.2 Spanish

Spanish also has word adjacency conditions that are affected by phonological properties. One concerns the selection of the proper coordinator for and and or, and one concerns the two allomorphs of the feminine singular definite article la and el. These are described in turn below.

Coordinator Alternations

In Spanish, the coordinator e 'and' precedes a word beginning with the segment /i/, otherwise the coordinator y is used, as illustrated in (32):

(32) a. grupos e individuos 'groups and individuals'
     b. señores y señoras 'ladies and gentlemen'

Similarly, the coordinator u 'or' prec