UBC Theses and Dissertations

A model of grammar based on principles of government and binding Sharp, Randall Martin 1985

A MODEL OF GRAMMAR BASED ON PRINCIPLES OF GOVERNMENT AND BINDING

by

RANDALL MARTIN SHARP

B.Sc., Simon Fraser University, 1977

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

in

THE FACULTY OF GRADUATE STUDIES

Department of Computer Science

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA

October, 1985

© Randall Martin Sharp, 1985

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at The University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the Head of my Department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Computer Science
The University of British Columbia
2075 Wesbrook Place
Vancouver, Canada V6T 1W5

Date: October, 1985

Abstract

This thesis describes an implementation of a model of natural language grammar based on current theories of transformational grammar, collectively referred to as Government and Binding (GB) theory. A description is presented of the principles of GB, including X-bar syntax and the theories of Case, Theta, Binding, Bounding, and Government. The principles, in effect, constitute an embodiment of "universal grammar" (UG), i.e. the abstract characterization of the innately endowed human language faculty. Associated with the principles is a set of parameters that alter the effect of the principles. The "core grammar" of a specific language is an instantiation of UG with the parameters set in a particular way.
To demonstrate the cross-linguistic nature of the theory, a subset of the "core grammars" of Spanish and English is implemented, including their parametric values and certain language-specific transformations required to characterize grammatical sentences. Sentences in one language are read in and converted through a series of reverse transformations to a base representation in the target language. To this representation, transformations are applied that produce a set of output sentences. The well-formedness of these sentences is verified by the general principles of UG as controlled by the parameters. Any that fail to meet the conditions are rejected so that only grammatical sentences are displayed. The model is written in the Prolog programming language.

Table of Contents

1. INTRODUCTION
2. LINGUISTIC FRAMEWORK
   2.1 Rule Systems
   2.2 Systems of Principles
       2.2.1 X-Bar Theory
       2.2.2 Theta Theory
       2.2.3 Case Theory
       2.2.4 Government Theory
       2.2.5 Binding Theory
       2.2.6 Bounding Theory
       2.2.7 Control Theory
3. REPRESENTATIONS
   3.1 Lexicon
       3.1.1 Dictionary
       3.1.2 Suffix Table
       3.1.3 Contraction Table
       3.1.4 Collocation Table
   3.2 Phrase Structure
       3.2.1 COMP
       3.2.2 INFL
       3.2.3 VP
       3.2.4 NP, PP, AP
       3.2.5 S-Adjunction
       3.2.6 Example
   3.3 Empty Categories
   3.4 Grammar
       3.4.1 Phrase Structure Rules
4. TRANSFORMATIONS
   4.1 Move Alpha
   4.2 Move Affix
   4.3 Null Subject
   4.4 Inversion
       4.4.1 Subject-Auxiliary Inversion
       4.4.2 Have/Be Raising
       4.4.3 Do Support
       4.4.4 Verb Fronting
   4.5 It Insertion
   4.6 Complementizer Insertion
   4.7 Modal Insertion
   4.8 Successive Cyclicity and Order of Transformations
5. CONDITIONS ON REPRESENTATIONS
   5.1 Extended Projection Principle and Theta-Criterion
   5.2 Doubly-Filled COMP Filter
   5.3 Case Filter
   5.4 Binding Conditions
   5.5 Empty Category Principle
6. EXECUTION
   6.1 Sentence Input
   6.2 Morphological Analysis
   6.3 Parsing
   6.4 Reverse Transformations
       6.4.1 Rinversion
       6.4.2 Percolate
       6.4.3 Rmove-Affix
       6.4.4 Theta-Extraction
       6.4.5 Do-Support
       6.4.6 It-Insertion
       6.4.7 Null-Subject
       6.4.8 Rmove-Alpha
       6.4.9 WH-Feature-Extraction
   6.5 Translation
   6.6 Transformations
       6.6.1 Set-TNS
       6.6.2 Move-Alpha
       6.6.3 It-Insertion
       6.6.4 Null-Subject
       6.6.5 Do-Support
       6.6.6 Move-Affix
       6.6.7 COMP-Insertion
       6.6.8 Modal-Insertion
       6.6.9 Inversion
   6.7 PF-Generation
   6.8 Sentence Output
   6.9 Examples
       6.9.1 Transformational Phase
       6.9.2 Well-formedness Phase
7. EVALUATION AND DISCUSSION
   7.1 Representations
       7.1.1 Lexicon
       7.1.2 Phrase Structure
       7.1.3 Configurationality
       7.1.4 Well-Formedness Conditions
   7.2 Execution
       7.2.1 Relation to Logic Grammars
       7.2.2 Parsing
       7.2.3 Translation
       7.2.4 Generation
8. Conclusion
Bibliography
Appendix A. EXAMPLES
Appendix B. Prolog Code - English Lexicon
Appendix C. Prolog Code - Spanish Lexicon
Appendix D. Prolog Code - EXEC
Appendix E. Prolog Code - READINPUT
Appendix F. Prolog Code - MORPH
Appendix G. Prolog Code - Core Grammars
Appendix H. Prolog Code - XBAR
Appendix I. Prolog Code - RTRANSFORM
Appendix J. Prolog Code - TRANSLATE
Appendix K. Prolog Code - TRANSFORM
Appendix L. Prolog Code - GENERATE

Acknowledgements

I wish to acknowledge the tremendous support and guidance I received from my supervisor, Dr.
Michael Rochemont, who first introduced me to syntax and linguistic endeavors. Words cannot express the measure of my gratitude. I would also like to thank Dr. Harvey Abramson for technical assistance, Dr. Karl Kobbervig for assistance with Spanish, the graduate students in the Department of Linguistics for their encouragement and spirit, and various friends who have helped in their own special way, including Donald Acton (a wizard), Gladys Wong (bless her heart), Maria Perez, and Michael Tropp.

Chapter 1

INTRODUCTION

One of the aims of linguistic inquiry is the analysis of human languages and the underlying systems of grammar. An increasingly prevalent trend among modern linguists is the construction of a theory of grammar that is appropriate for the description of all human languages, rather than merely descriptive of any one language or family of languages. A significant factor in the formulation of a general theory of grammar is the degree to which it provides an explanation for the empirical facts of the language. The explanatory power of a theory is important to those researchers who, for example, are curious about how language is acquired by children.

In the development of a theory of natural grammar, there is a clear preference away from rule-oriented descriptions of grammar to principle-based descriptions. This has a number of consequences. Firstly, it simplifies the descriptions of natural grammars, since the interaction of only a few principles can account for a wide range of data that would otherwise require numerous distinct rules. Secondly, it contributes very strongly to the concept of a genetically realized "language faculty", where the principles of a universal grammar (UG) are somehow inherent in the biological endowment of human beings and serve to account for the rapidity with which normal children acquire language. Thirdly, it reduces the variety of possible grammars by restricting the applications of rules.
According to some linguists, transformational generative grammar still holds the best promise of a concise, encompassing principle-based theory of natural grammar. Transformational theory has undergone considerable change since the publication of Chomsky's Syntactic Structures (1957) and the subsequent enrichment generally referred to as the "Extended Standard Theory." In particular, the emphasis on the principles of UG rather than rules has effectively reduced the set of transformations to the single rule Move-α, where α arbitrarily selects across the set of grammatical categories, and ill-formed results are filtered out by virtue of the principles. The principles act as conditions on the well-formedness of linguistic representations. Cross-linguistic description can now be attempted by specifying parameters of variation to the principles that would account for language differences. The grammar of any one particular language, referred to as the "core grammar", would be represented solely by the settings of the parameters. The core grammars in conjunction with UG, modified by idiosyncrasies of markedness, determine the set of actual grammars. This holds conceptually at the individual level as well as the language family level.

This discussion of grammars is restricted to issues relating to syntax, although clearly other aspects, such as semantics, morphology, pragmatics, phonology, etc., combine to yield the empirical facts of human language. These other aspects may also be expressed in terms of basic principles and rule systems, not necessarily the same as those for syntax. The principle of the autonomy of syntax (Chomsky 1977a) maintains a conceptual separation among these components, interrelated though they are, and allows for independent investigation of each area.
This thesis looks at a particular subset of some of the principles and rules involved in a theory of transformational grammar drawing primarily on the theory of government and binding (GB) as developed by Chomsky (1981) and several other researchers. A model is constructed that attempts to simulate the effects of UG, modified by parametric variation and combined with language-specific rules, in the context of parsing, generation, and translation of natural language. The primary goal of the research is to investigate the feasibility of representing the theory computationally, thereby formalizing it in some, albeit prototypical, sense. Because of their inherent similarities, the languages of Spanish and English were chosen to demonstrate the model. This minimizes the variability one might expect in two distinct language grammars, and makes the computational realization more tractable.

If the endeavor succeeds, under some reasonable criteria for success, then several possibilities are opened up. The formalization of UG can be improved, extending the range of grammatical constructions. The core grammars can be improved to further develop the coverage of Spanish and English sentences. Other core grammars could be added, which would have the dual effect of verifying the principles and their realization, and of enhancing our understanding of the theory of UG so that fundamental refinements to it may be made. Finally, in the area of applications research, the model may contribute to the eventual realization of a language processing device, not only for translation, but also for the intelligent recognition and production of written and spoken communication.

The computational system described here makes use of Prolog, a logic programming language specifically designed for the manipulation of non-numeric symbolic data.
Numerous natural language applications have been developed in Prolog (Dahl 1983, Schwind 1984), though none are known to exist that are based on the concepts underlying the current work.

The thesis is organized as follows: Chapter 2 presents the linguistic framework underlying the theory of UG. Chapter 3 covers the representations in Prolog of the linguistic structures. Chapter 4 describes in detail the various transformations being modeled. Chapter 5 concerns the conditions that determine well-formed structures and how they are modeled. Chapter 6 presents a breakdown of the overall system, including the stages of parsing, translation, and generation. Chapter 7 evaluates the work and provides a discussion of some of the issues that have arisen in the construction of the model. Chapter 8 concludes the study with a summary of results and areas for future research. Appendix A contains sample dialogues illustrating the performance of the system, and the remaining appendices contain the entire Prolog program.

Chapter 2

LINGUISTIC FRAMEWORK

The description of natural grammars can be viewed from two perspectives. One is the rule-based approach. We use rules in learning, for example, that past tense is normally formed by adding -ed to words. We hear children follow the rules, and learn the exceptions, often by over-application of the rule and subsequent retreat. Syntactic rules are learned in forming questions, and we follow semantic rules when we learn meanings. An on-going activity among many researchers in linguistics is the description of these rules at the various levels.

A second approach is principle-based, where fundamental principles of grammar account for the well-formedness of linguistic constructions. They hold of the applications of rules and have the effect of constraining the set of possible rule derivations. Principles needn't be learned; they are somehow inherent in the mechanism of language.
Since the mid-50s, following Chomsky's Syntactic Structures (1957), transformational generative grammar has been rigorously studied from a formal perspective by many linguists (e.g. Peters and Ritchie (1973), Lasnik and Kupin (1977), Wexler and Culicover (1980)). The evolution in this enterprise has been a shift from rule descriptions to descriptions of general principles, motivated in large part by the desire for a theory of grammar that adequately explains, not just describes, linguistic phenomena. Out of this endeavor arises a theory of universal grammar (UG), i.e. that grammar that represents the linguistic knowledge common to all human languages. This in turn leads to a theory of language acquisition and helps explain why language is learned as fast as it is. As a consequence, the rule component of grammar diminishes, reducing the number of rules that must be explicitly learned. In particular, all the rules that seem to involve movement, such as Raising-to-Subject, Extraposition, WH-Fronting, etc., reduce to a single rule, Move-α, i.e. "move any category anywhere." Over-applications of this rule are ruled out by the principles of grammar.

Chomsky (1981) provides a breakdown of the basic rule systems and systems of principles. The components of the rule systems are the following:

(1) (i) lexicon
    (ii) syntax
         (a) categorial component
         (b) transformational component
    (iii) PF-component
    (iv) LF-component

The systems of principles fall in the domain of the following areas:

(2) (i) X-Bar Theory
    (ii) Theta Theory
    (iii) Case Theory
    (iv) Binding Theory
    (v) Bounding Theory
    (vi) Government Theory
    (vii) Control Theory

We look in some detail below at each of the different components of these systems.

2.1 Rule Systems

The lexicon contains the vocabulary of the language, arranged as a list of lexical entries, with each entry containing syntactic, semantic, phonological, and morphological information.
In addition, the lexicon contains rules for creating new lexical entries, as in Word-Formation Rules (e.g. creating adverbs from adjectives by adding -ly) and Restructuring Rules, as in idioms (e.g. take-advantage-of), the restructuring verbs in Romance (Rizzi 1982, Rouveret and Vergnaud 1980, Burzio 1981), and the amalgamation of prepositions with verbs (e.g. eat-up your granda). The morphological rules for inflections of verbs, nouns and adjectives are also included in the lexicon.

The syntax is responsible for generating structural representations of sentences. It consists of two components: the categorial component and the transformational component.

The categorial component expresses phrase structure. It relies on the principles of X-bar theory to organize phrasal categories and to reduce, if not eliminate, distinct rules of phrase structure. The phrasal categories themselves are projected directly from the lexicon as properties of lexical entries. Specifiers and modifiers position themselves within the structures according to categorial affinities, represented as features, with ordering being largely idiosyncratic to different languages. The interaction of the lexicon and categorial component, together referred to as the base component, generates base structures ("D-structures" or "deep structures"), i.e. representations in which the semantic participants are expressed directly in syntactic structures.

The transformational component transforms the base structure into surface-like structures, or S-structures, which obey the principles of well-formed surface representation. Some transformations are language-specific, such as the English rule of Do Support (see 4.4.3). Others are applied generally, such as Move-α and the Move-Affix rule for "attaching" tense information to a verb.
The S-structure is interpreted phonologically by the PF-component and logically by the LF-component, giving rise to phonological form (PF) and logical form (LF), respectively. Each has its own rules, such as the PF "wanna-construction" rule (Chomsky and Lasnik 1978), which converts (3a) to (3b):

(3) a. I want to win.
    b. I wanna win.

and the LF-rule of quantifier-raising (Chomsky 1981), which converts (4a) to (4b):

(4) a. She met a friend of Judy's.
    b. for some x, x a person who likes Judy, she met x

While the rules and principles that make up PF and LF are interesting areas of investigation and share some overlap with those of the syntax,¹ it is not the intention to treat these in this study.

¹ See Dresher's (1981, 1984) work, which emphasizes an approach to explanation in phonology based on general principles.

The relationship between these levels of representation is diagrammed below:

    D-structure
         |
    S-structure
      /     \
    PF       LF

D-structure is base-generated; S-structure is related to D-structure by the rule Move-α, and S-structure maps to representations of sound (PF) and meaning (LF) by other interpretive subsystems, possibly involving rules like Move-α.

2.2 Systems of Principles

The principles of grammar determine the well-formedness of the various representations. They act as filters on overgeneration by the rule systems. They interact in interesting ways to account for all and only grammatically correct constructions.

The theories listed in (2) represent independent areas that together form a general theory of UG. X-bar theory develops the representation of phrase structures. Theta theory refers to the thematic or semantic relations that hold between lexical items. Case theory concerns the assignment of Case (e.g. nominative, accusative, etc.) to noun phrases. Binding theory specifies the relations that exist between noun phrases. Bounding theory places restrictions on the application of Move-α.
Government theory refers to dominance relations that exist between categories. Control theory identifies the reference of PRO, a pronominal element with no phonetic content.

Associated with each theory are one or more principles that hold of grammatical constructions. For example, a principle of Case Theory, the Case Filter, states that all phonetically-realized noun phrases must have Case. This would appear to be a fundamental principle of grammar irrespective of any particular language. The principles may be subject to some degree of variation, and it is this variation that accounts for language differences. With Case Theory, English seems to have an adjacency requirement, such that nominal elements with objective Case must be adjacent to their Case assigner, as illustrated by the following two sentences:²

(5) a. John put the book on the table.
    b. *John put on the table the book.³

This requirement of adjacency may not hold in other languages.

It is postulated, then, that UG is described in terms of the fundamental principles of grammar, and that individual grammars ("core grammars") are described by the particular settings of the available parameters of variation. By exposure to linguistic data from the environment, a child learns the values of the parameters, and a core grammar becomes established. Exceptions to the consequences of core grammar do arise, and these are covered under a theory of markedness.

Each of these theories is described more completely below. However, much investigation is currently going on in all areas, and new analyses are continually being presented. Consequently the theories are in a constant state of flux, and any implementation is bound to reflect out-of-date concepts.
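The interplay between a universal principle and a language-specific parameter, as in the Case Filter and the English adjacency requirement of (5), can be illustrated with a small sketch. This is only a hedged illustration in Python, not the Prolog of the thesis; the function name, the `adjacency_required` parameter, and the flat list of category labels are all assumptions introduced here, and the verb is treated as the only Case assigner.

```python
# Hypothetical sketch: the names and the flat list encoding are illustrative
# assumptions, not the representation used in the thesis.

def passes_case_filter(complements, adjacency_required):
    """Check the overt NP complements of a verb against the Case Filter.

    complements: category labels following the verb, in surface order.
    adjacency_required: parameter setting; True models the English value.
    """
    for position, category in enumerate(complements):
        if category != "NP":
            continue
        # Simplification: the verb assigns objective Case to an NP complement,
        # but under the adjacency setting only to the NP right next to it.
        has_case = position == 0 or not adjacency_required
        if not has_case:
            return False
    return True


print(passes_case_filter(["NP", "PP"], adjacency_required=True))   # (5a)
print(passes_case_filter(["PP", "NP"], adjacency_required=True))   # (5b)
print(passes_case_filter(["PP", "NP"], adjacency_required=False))  # other setting
```

Under these assumptions the principle itself (every overt NP needs Case) stays fixed, and only the parameter value decides whether the order in (5b) is rejected.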
By virtue of the modular nature of the principles, it is possible to make some changes in one area without necessarily having to alter substantially other areas, analogous to the situation in computer programming where the concepts of structured programming and modular design are now considered standard practice. By the same token, a small change in one area could have proliferating effects throughout the grammatical system, as Chomsky observes.

² As is customary, ungrammatical sentences are prefixed with "*".

³ Sentence (b) may be derived by a rule of Heavy NP Shift, where the base-generated NP direct object is moved to clause-final position, usually with a comma inserted to indicate an intonation break, as in:
(i) John put t on the table, the book.
The various idiosyncratic properties of sentences like (i) can only be accounted for by assuming that such cases are derived, not canonical as with (5a).

2.2.1 X-Bar Theory

X-Bar theory arises from the specification of feature types shared naturally among lexical items. Different hypotheses have been proposed as to the most appropriate classification of features; Chomsky's (1972) classification is that adopted here, although Jackendoff (1977) presents a detailed alternative. Lexical items can be distinguished as to their substantive qualities, denoted by the feature [±N] for nominal or non-nominal, and their predicative qualities, [±V], for verbal or nonverbal. The four possible combinations are associated with the four major lexical categories: noun (N) = [+N, -V], verb (V) = [-N, +V], adjective (A) = [+N, +V], and preposition (P) = [-N, -V]. Thus, generalizations may be made in terms of feature content, such as one that refers to only [-N] categories (i.e. the class of English Case assigners).

The major generalization is that phrase structures are projections of a lexical head X with features [±N, ±V], such that they follow the context-free rule:

(6) Xⁱ⁺¹ → C₁ ... Cⱼ Xⁱ Cₖ ... Cₙ

where X is one of N, V, A, or P, and the Cᵢ, for i = 1 to n, are possible adjuncts. X⁰ refers to the lexical head, X¹ dominates X⁰, X² dominates X¹, etc. X^max is considered to be the maximal projection of X, for some value of max, typically 2. Maximal projections will often be denoted by their more familiar form NP, VP, etc.

The first level of the structure, X¹ (more commonly denoted X̄ in the literature), dominates the lexical head X⁰ (or just X) and its complement, if any. This is represented by the phrase structure rule:

(7) X¹ → X Complement

The complement to the head is determined by the semantic requirements of the head. For example, the verb put takes a complement of two arguments: a noun phrase (object) and a prepositional phrase (location). Both are required to satisfy the semantic interpretability of the verb, as illustrated by the following sentences:

(8) a. John put the book on the table.
    b. *John put the book.
    c. *John put on the table.

The verb put, then, is subcategorized as taking an NP and a PP. This subcategorization information, or frame, is contained in its lexical entry. Other categories besides V can also subcategorize arguments:

(9) Adjectives:
    a. anxious that you might forget (= S̄)
    b. satisfied with his job (= PP)

(10) Prepositions:
    a. behind every man (= NP)
    b. towards the sun (= NP)
    c. *away the house

(11) Nouns:
    a. pride in his work (= PP)
    b. belief that ghosts are real (= S̄)
    c. purchase of a book (= NP)

(10c) illustrates that not all prepositions are transitive; an NP following such a preposition will not receive Case and is therefore excluded by the Case Filter. In (11b) and (11c), the nouns are related to transitive verbs (believe, purchase) which subcategorize equivalent phrases. The of is inserted transformationally in (11c) (through an Of-Insertion rule (Chomsky 1981) and analogous rules in other languages) in order to satisfy the Case Filter.
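The notion of a subcategorization frame stored in the lexical entry, as in examples (8) and (9) above, can be sketched as a simple lookup. The following is a minimal Python illustration, not the Prolog lexicon of the thesis; the `SUBCAT` table and the function `satisfies_frame` are hypothetical names, and frames are simplified to obligatory, ordered category lists.

```python
# Hypothetical frames drawn from examples (8)-(10) in the text.
# The table layout and the function name are illustrative assumptions.
SUBCAT = {
    "put": ("NP", "PP"),    # (8a) John put the book on the table.
    "satisfied": ("PP",),   # (9b) satisfied with his job
    "behind": ("NP",),      # (10a) behind every man
}


def satisfies_frame(head, complements):
    """Return True when the complement categories match the head's frame exactly."""
    return SUBCAT.get(head) == tuple(complements)


print(satisfies_frame("put", ["NP", "PP"]))  # (8a)
print(satisfies_frame("put", ["NP"]))        # (8b)
print(satisfies_frame("put", ["PP"]))        # (8c)
```

Only the first call succeeds, mirroring the judgments in (8). A fuller treatment would have to allow optional complements, which this sketch deliberately leaves out.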
At the next level, X² (= X̿), are the specifiers to the X¹ constituent. For nouns, these include determiners (e.g. the articles the, a), quantifier phrases (e.g. some, a few), and possessives (e.g. Alan's, my sister-in-law's). For verbs, these include the auxiliaries (have, be). For adjectives and prepositional phrases, these include adverbial phrases (e.g. very satisfied, right behind). This level is represented by the rule:

(12) X² → Specifier X¹

In general, specifiers may be optional, depending on semantic and pragmatic considerations.

Modifying phrases may also be included in phrasal structures, and these are typically assumed to adjoin at the X¹ level, at the same time introducing recursivity. Thus the phrase a bright young student of physics with long hair eager to please whomever would be represented by the following phrase structure (given here as a labeled bracketing):

[N² [art a]
    [N¹ [A² bright]
        [N¹ [A² young]
            [N¹ [N¹ [N¹ [N⁰ student] [N² [of] physics]]
                    [P² with long hair]]
                [A² eager to please whomever]]]]]

The additional phrase structure rules for recursive expressions are:

(13) a. X¹ → Y² X¹
     b. X¹ → X¹ Y²

where Y² represents adjunct(s) of any phrasal category and, as shown, may occur on either side of category X¹. The basic set of phrase structure rules, then, is summarized below:

(14) a. X² → (Specifier) X¹
     b. X¹ → (Adjunct) X¹
     c. X¹ → X¹ (Adjunct)
     d. X¹ → X (Complement)

A characteristic of English grammar is that the specifiers of all the lexical categories always occur before the X¹ constituent, and complements always follow the head. This isn't so for all languages. In German, for example, verbal complements generally precede the verb. In the Dravidian language group, verbal specifiers follow the verb, while nominal specifiers precede the noun, as in English; in Kikuyu, a Bantu language of East Africa, nominal
Therefore, it would appear that the order of constituents is subject to a degree of parametric variation. But the basic constituent structure follows the pattern given in (14) above. (Consideration of phrase structure syntax for "free word order" languages like Japanese, and how the X - b a r analysis might differ, is given in (Chomsky 1981).) In addition to the X - b a r system for lexical categories, a rule is also required for the derivation of the clause. This has traditionally taken the form: (15) S —> N P A U X V P where N P is the subject, V P the predicate, and A U X contains tense and modality information. Chomsky (1981) gives the revised rule: (16) S —> N P I N F L V P where I N F L , suggestive of "inflection", contains a tense feature [ ± T N S ] (Chomsky 1980) and, i f [ + T N S ] , agreement features, abbreviated A G R , consisting of person, number and gender. That a clause may be finite ([ + TNS]) or infinitival ( [ - T N S ] ) is illustrated by the sentences of (17) with the underlying structure in (18): (17) a. 1 believe (that) Beatrice is my friend, b. I believe Beatrice to be my friend. (18) [ I believe [ Beatrice [ ± T N S ] be my friend ]] In (18), the verb believe is subcategorized simply as taking a clause, which may be tensed (17a) or untensed (17b). A n alternative analysis would be to state that believe takes either a subordinate clause or a " t o - V P " complement However, this misses the generalization that these two statements appear to express the same proposition, varying only as to tense of the embedded clause. Many verbs are of this type, so that an unnecessarily large lexicon would be required to express the two possible complements. 
Furthermore, the principles of GB rely heavily on the assumption that verbs taking clausal complements select either a tensed or non-tensed form, subject to other considerations, and that the subject of the non-tensed form normally must be non-lexical PRO, as in (19a) with internal structure (b), unless lexical NPs are allowed, as in (17), by exceptional mechanisms, such as Exceptional Case-Marking (cf. 2.2.3 below).

(19) a. They prepared to flee.
     b. [ They prepared [ PRO to flee ]]

In addition, it is assumed that another rule exists for introducing clauses with a complementizer, COMP (Chomsky 1977b), such as that for tensed clauses and for for infinitivals:

(20) a. I said (that) we were leaving now.
     b. He would prefer (for) you to read it out loud.

This rule is expressed as:

(21) S̄ → COMP S

suggesting that S is the head of S̄, although this will be altered below. It is reasonable to assume, then, that lexical items that subcategorize for a clause actually specify S̄ as their argument, with a rule introduced to delete the complementizer for those verbs which seem to optionally take it (cf. 4.6).

Rule (16) for S appears to deviate from the schema presented under the X-bar system (14). However, it has been suggested that INFL is the head of S (Chomsky 1981), subcategorizing the predicate phrase, and that the subject NP is the specifier. The modified rule for the clause S then becomes:

(22) a. INFL² → N² INFL¹
     b. INFL¹ → INFL V²

Similarly, an argument is presented (Stowell 1981, Chomsky 1985) that COMP is the head of S̄. We may then consider the specifier of COMP to be a wh-phrase and INFL² the complement, thus converting rule (21) to rule (23):

(23) a. COMP² → wh-phrase COMP¹
     b. COMP¹ → COMP INFL²

Under these new representations of the rules for S and S̄, the X-bar system in (14) is now adequate to represent any phrasal category X, where X is one of the set {N, A, V, P, COMP, INFL}.

2.2.2 Theta Theory

Theta Theory concerns the thematic properties of lexical items. It entails such notions as "agent-of-action", "goal", "instrument", etc., as developed in the theories of Gruber (1965), Fillmore (1968), Jackendoff (1972), and others. Lexical items assign thematic roles, or θ-roles, to phrasal constituents as part of their semantic description. For example, the verb put assigns the θ-role "patient" to its direct object NP and "location" to the locative PP. In addition, it assigns the θ-role of "agent" to an NP which typically assumes the grammatical function of subject.

A fundamental principle of grammar, the Projection Principle, expresses the notion that the θ-marking properties of lexical items must be represented categorially at each syntactic level, i.e. at D-structure, S-structure and LF. θ-role assignment and subcategorization are evidently closely linked concepts, as it appears that any phrase which is subcategorized is also assigned a θ-role. The reverse is not true. When not assigned a θ-role, the subject must still appear (in English) as a pleonastic element, void of semantic content (e.g. it, in It seems that Donald is very happy). However, subcategorized objects are never pleonastic; they always have a θ-role.

The definition of the term "argument", introduced in the previous section, is extended to include not only the subcategorized phrases, referred to as "internal arguments," but also any phrase which receives a θ-role but is not subcategorized for, referred to as "external arguments" (Williams 1981). A principle of UG formalizes this relationship, known as the θ-Criterion.
Stated simply, it is:

(24) θ-Criterion: Each argument bears one and only one θ-role, and each θ-role is assigned to one and only one argument.

The Projection Principle and θ-Criterion together state the primary conditions on phrase structure, configured in terms of X-bar syntax. Specific phrase structure rules are eliminated, since lexical items directly determine, through subcategorization and θ-marking features, the required constituents. Order of constituents may derive from X-bar parameters and possibly from an adjacency requirement for Case assignment (cf. next section).

Additionally, clauses must have subjects. The subject may be the phonetically null PRO, such as the embedded subject in (19), or, if the subject is not θ-marked, it must be pleonastic it or there (il in French), as in:

(25) a. It is raining.
     b. It is likely that no one will leave.
     c. There seem to be problems with this analysis.

Spanish and other languages do not show pleonastic subjects, but evidence suggests that the subject position is nevertheless present. The Projection Principle and the requirement that clauses have subjects are together referred to as the Extended Projection Principle.

Since thematic relations express semantic properties, they are not described in any detail here, as we are primarily concerned with issues of syntax. The relevant issue, though, is the one-to-one correspondence between the θ-roles assigned by a lexical item and the syntactic structures to which they are assigned. Since subcategorization entails θ-marking by the θ-Criterion, it is sufficient to note the subcategorized arguments; the assignment of θ-roles to the arguments follows implicitly. θ-role assignment to the subject must be explicit, however, since the subject position is present (by the Extended Projection Principle) whether or not the position is assigned a θ-role by the predicate.
If by default the predicate assigns a θ-role to the subject, then only those predicates that do not assign a θ-role need an explicit feature to that effect. These include the weather verbs (rains, snows, etc., although the subjects of such verbs have "quasi-argument" status (Chomsky 1981)), verbs like seem and appear, and predicate adjectives like likely, probable, apparent, etc. In this system, the feature [-THETA] is associated with predicates that do not θ-mark the subject. The non-θ-marked subjects in English are replaced by pleonastic elements; in Spanish, they remain unfilled.

2.2.3 Case Theory

Case Theory refers to the assignment of abstract Case, such as nominative, accusative, etc., to nouns. It is not to be confused with Fillmore's cases (Fillmore 1968), which are related to thematic roles, covered in the previous section. Morphologically, the English language shows three Case inflections, exemplified by the pronouns I, me, my, for nominative, objective (= accusative), and genitive Case, respectively. Non-pronominals are not distinguished as to Case, except for the affix 's on genitives, as in Noam's book. Other languages have richer Case inflections: Latin has six Cases, Finnish has fifteen, etc.

Case is assigned to a noun phrase by virtue of the structural position in which it occurs. Thus a transitive verb "governs" its direct object (in the traditional sense of government (Lyons 1968), to be formalized further in Section 2.2.4), assigning it objective Case. Similarly a (transitive) preposition assigns Case to the object it governs. Nominative Case is assigned to the subject of tensed clauses, i.e. in configurations where the subject NP is governed by [+TNS];⁴ the subject of infinitives is generally not Case-marked, as shown in (26), though the presence of the prepositional for complementizer will Case-mark the subject, as in (20b), as will other exceptional means noted below.
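The structural view of Case assignment described above can be sketched as a small Prolog table. This is only an illustration, not the thesis's implementation: the predicate names `assigns_case/2` and `case_of/2` and the governor terms are hypothetical, and the Case values given for prepositions and for the "for" complementizer are assumptions, since the text does not name them.

```prolog
% assigns_case(?Governor, ?Case): hypothetical table of Case assigners.
assigns_case(v(transitive), objective).    % transitive verb -> direct object
assigns_case(p,             objective).    % assumption: Case value not named in the text
assigns_case(infl(tns(+)),  nominative).   % [+TNS] -> subject of a tensed clause
assigns_case(comp(for),     objective).    % assumption: prepositional "for" complementizer

% case_of(+Governor, -Case): the Case an NP receives from Governor,
% or the atom 'none' when the governor is not a Case assigner
% (e.g. the [-TNS] INFL of an infinitival clause).
case_of(Governor, Case) :-
    (   assigns_case(Governor, C)
    ->  Case = C
    ;   Case = none
    ).
```

A query such as `case_of(infl(tns(-)), C)` yields `C = none`, which corresponds to the statement that the subject of an infinitive is generally not Case-marked.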
The influence of [±TNS] on the subject position is illustrated by the following example:

(26) a. It seems [ [ Heather [+TNS] has left ]]
     b. *It seems [ [ Heather [-TNS] to have left ]]

In (a), Heather is assigned nominative Case by [+TNS] in the embedded clause, whereas in (b) the embedded clause is infinitival so that Heather is not Case-marked, and the sentence is ungrammatical. To account for this and related phenomena, the Case Filter is postulated as a fundamental principle of grammar:

(27) Case Filter: *NP if NP is lexical and has no Case

Thus if a lexical NP is in any structural configuration in which Case is not assigned, such as the subject of an infinitival clause, the structure will be ungrammatical. The transformation Move-α, described later, must be invoked to move the NP to a location where Case is assigned.

⁴ Chomsky (1981) treats AGR, the agreement feature complex, as the nominative Case-assigning agent. Since AGR accompanies [+TNS] but not [-TNS], the choice may not be relevant, at least at the level of detail studied here.

Certain cases arise where an infinitival clause does contain a lexical subject. In these instances the matrix verb, if transitive, may assign Case across the S̄ boundary to the embedded subject. The S̄ node is effectively deleted, allowing the embedded subject to be governed directly by the matrix verb, which thus assigns it Case.⁵ This is referred to as Exceptional Case-Marking (ECM), an idiosyncratic feature of English grammar. It is a property of certain verbs that take either NP arguments (i.e. are transitive) or clausal arguments. An example is the verb believe:

(28) a. I believe him.
     b. I believe him to be telling the truth.
     c. I believe (that) he is telling the truth.

In (a), an NP direct object is selected and assigned objective Case. In (b), a clausal complement is selected, and Case is assigned by ECM to the subject of the infinitival clause.
In (c), nominative Case is assigned in the normal way to the embedded subject since the clause is tensed. For languages that do not have the ECM property, such as Spanish, only (28a,c) have grammatical equivalents, as in (29a,c):

(29) a. Yo lo creo (a él).
     b. *Yo lo creo decir la verdad.
     c. Yo creo que él dice la verdad.

⁵ S̄-deletion presents a problem given the formal definition of government (Chomsky 1981:250), since a maximal projection will block government of an element by a higher predicate. Taking S to be a maximal projection whose head is INFL, as assumed here, there is no way in which the specifier of INFL can be governed without crossing S. On the other hand, if S is the head of S̄, then S is no longer a maximal projection, and after S̄-deletion the predicate may govern without crossing any maximal projection. A third possibility is to allow S̄-deleting predicates to subcategorize for either S̄ or S (i.e. COMP² or INFL²), preserving the formalism that arguments are always maximal projections, but this is less desirable from the aspect of learnability. As this is still an issue under investigation, the stipulation is simply made that predicates with the S̄-deletion property may govern an embedded subject.

2.2.4 Government Theory

The theory of government enters into many areas since it relates directly to properties that hold over structural configurations. Thus a lexical head governs the phrases that it subcategorizes, Case is assigned under government (as noted in the previous section), and the binding principles are defined in terms of government. The following definition of government is taken from Chomsky (1982:19):

(30) α governs β if
     (i) α = X⁰ (i.e. α is the head of phrase X)
     (ii) α c-commands β
     (iii) β is not protected by a maximal projection

The definition of c-command (= "constituent command" (Reinhart 1976)), adapted from Chomsky (1981:166), is given as:

(31) α c-commands β if
     (i) α does not contain β
     (ii) all the maximal projections that contain α also contain β

(31) allows the head of a phrase to c-command its complement from the X¹ level, its specifier from the X² level, and an adjunct attached at the level of the maximal projection, as in stylistic inversion of the subject, attaching to VP, in Romance languages. These three configurations are illustrated below:

            X²
           /  \
         W²    X²
              /  \
            Y²    X¹
                 /  \
               X⁰    Z²

In the diagram, the head X⁰ c-commands its complement Z², its specifier Y², and the adjunct W² adjoined to X². As to the notion of protection, β is protected from α by a maximal projection if the maximal projection includes β but not α. The relationship between government and c-command is illustrated below with the clause They ran towards me:

    INFL²
    ├─ NP  they
    └─ INFL¹
       ├─ INFL⁰  e
       └─ V²
          └─ V¹
             ├─ V⁰  ran
             └─ P²
                └─ P¹
                   ├─ P⁰  towards
                   └─ N²
                      └─ N¹
                         └─ N⁰  me

The verb V⁰ c-commands and governs the preposition P⁰, which c-commands and governs N⁰. V⁰ c-commands N⁰ but does not govern it since it is protected by the maximal projection P². P⁰ does not c-command V⁰, and N⁰ does not c-command anything. INFL⁰ c-commands and governs the subject NP.

The first condition for government (30i) allows any category within the X-bar system to be a potential governor. This originally included only those elements with features [±N, ±V], but under the system developed in Section 2.2.1 now also includes INFL and COMP.

2.2.5 Binding Theory

Binding theory establishes coreference relations among noun phrases. It is assumed, for the purpose of coreferencing, that every NP is assigned an index, and that if two NPs have the same index, they are intended to refer to the same entity.
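Returning briefly to Section 2.2.4, the definitions of c-command (31) and government (30) can be tried out on the tree for They ran towards me. The following is a hedged sketch, not the thesis's code: all predicate names are hypothetical, and two refinements are assumptions added so that the sketch reproduces the judgments stated in the text. First, β must not contain α either (otherwise N⁰ would c-command N¹). Second, a head is not treated as protected by its own maximal projection (otherwise V⁰ could not govern P⁰). `forall/2` is assumed to be available, as in SWI-Prolog.

```prolog
% parent(Child, Parent): the tree for "They ran towards me".
parent(np_they, infl2).  parent(infl1, infl2).
parent(infl0, infl1).    parent(v2, infl1).
parent(v1, v2).
parent(v0, v1).          parent(p2, v1).
parent(p1, p2).
parent(p0, p1).          parent(n2, p1).
parent(n1, n2).
parent(n0, n1).

node(infl2).
node(N) :- parent(N, _).

maximal(infl2). maximal(np_they). maximal(v2). maximal(p2). maximal(n2).
head(infl0). head(v0). head(p0). head(n0).

% proj(Head, MaxProjection): assumption used only by protected/2.
proj(infl0, infl2). proj(v0, v2). proj(p0, p2). proj(n0, n2).

% dominates(+A, +B): A properly contains B.
dominates(A, B) :- parent(B, A).
dominates(A, B) :- parent(B, C), dominates(A, C).

% (31), plus the assumption that B does not contain A.
c_commands(A, B) :-
    node(A), node(B), A \== B,
    \+ dominates(A, B),
    \+ dominates(B, A),
    forall(( maximal(M), dominates(M, A) ), dominates(M, B)).

% B is protected from A by a maximal projection M that includes B
% but not A; M is not counted when it is B's own projection.
protected(B, A) :-
    maximal(M),
    dominates(M, B),
    \+ dominates(M, A),
    \+ proj(B, M).

% (30)
governs(A, B) :-
    head(A),
    c_commands(A, B),
    \+ protected(B, A).
```

Under these assumptions, `governs(v0, p0)` and `governs(p0, n0)` succeed, while `governs(v0, n0)` fails because P² protects N⁰, as stated for the diagram.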
Indices are assigned via some indexing rule which may apply at lexical insertion time or within some process at LF, where interpretation occurs. A natural condition on indexing is that NPs sharing an index must agree in person, number and gender. The notion of binding is given by the following definition:

(32) α binds β if
     (i) α and β have the same index
     (ii) α c-commands β

If an element is not bound, it is free.

Three classes of NPs are identified: anaphors, pronominals, and names.⁶ Anaphors have no independent referential qualities and must take their reference from another NP, the antecedent, within the sentence. They include reflexives (e.g. himself) and reciprocals (e.g. each other). Pronouns (e.g. he, me, etc.) also have antecedents, but the antecedent may appear either within the sentence or outside it. Names include all other (lexical) NPs (e.g. John, dogs, honesty, etc.).

Chomsky (1982) assigns the features [±ANAPHOR] and [±PRONOUN] to each type of noun. Anaphors are marked [+ANAPHOR, -PRONOUN], pronouns [-ANAPHOR, +PRONOUN], and names [-ANAPHOR, -PRONOUN]. Additionally, non-lexical NPs, both base-generated ones and traces that arise through movement, also contain these features. The trace left by movement of an NP to an argument position (A-position), as in passives and with raising predicates like seem and likely, patterns with lexical anaphors, while movement to non-argument (= Ā) positions, such as wh-movement to COMP, leaves behind a variable trace that patterns with lexical names. The null subjects that occur in pro-drop languages pattern after lexical pronouns. The non-lexical PRO contains the features [+ANAPHOR, +PRONOUN] and has no lexical counterpart.

⁶ Chomsky uses the term R-expression ("referential expression") for name.

The binding conditions for these three classes are stated as follows:

(33) Condition A: Anaphors must be bound in their governing category.
     Condition B: Pronouns must be free in their governing category.
     Condition C: Names must be free in all governing categories.

The definition of governing category is:

(34) α is the governing category for β if and only if α is the minimal S or NP containing β and a governor of β.⁷

Since PRO is both pronominal and anaphoric, it follows from the binding conditions that it cannot have a governing category and therefore must be in an ungoverned position.

The sentences in (35) illustrate the binding conditions for lexical NPs, given the possible indexing combinations:

(35) a. [ Jane_i said [ Wendy_j saw her_k in the mirror ]]
     b. [ She_i said [ Wendy_j saw herself_k in the mirror ]]

In (a), if j=k, then her is bound to Wendy, violating Condition B for pronouns. Therefore j≠k, i.e. Wendy and her must refer to distinct persons. Her may be coreferential with Jane if i=k; nothing demands or prohibits it. In (b), she and Wendy cannot refer to each other since Wendy must not be bound anywhere, by Condition C. Wendy and herself must be coreferential, by Condition A, which therefore implies that she and herself cannot be coindexed because i and j must be distinct.

⁷ A more precise definition of governing category includes the notion of accessible SUBJECT (Chomsky 1981:212), although (34) is satisfactory for current purposes.

2.2.6 Bounding Theory

Bounding theory refers to the bounds placed on movement under the transformation Move-α. Move-α obeys the condition of subjacency, where a single application of Move-α cannot cross more than one bounding category. For English, NP and S are considered bounding categories; for Spanish, NP and S̄ are bounding categories. Subjacency appears to hold of movement-to-COMP, movement-to-subject, extraposition, and inversion of subject to post-verbal position, as in There arrived three men from England and stylistic inversion in the Romance languages.
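The subjacency condition just stated lends itself to a small parameterized check. The sketch below is illustrative only and is not taken from the thesis: `bounding/2` and `subjacent/2` are hypothetical names, a single application of Move-α is represented simply as the list of category labels it crosses, and the Spanish parameter is assumed to be NP and S̄ (written `sbar`). `include/3` is assumed to be available, as in SWI-Prolog.

```prolog
% bounding(?Language, ?Category): the bounding-category parameter.
bounding(english, np).
bounding(english, s).
bounding(spanish, np).
bounding(spanish, sbar).   % assumption: S-bar is the clausal bounding node in Spanish

% subjacent(+Language, +Crossed): one application of Move-alpha that
% crosses the nodes listed in Crossed crosses at most one bounding
% category of Language.
subjacent(Language, Crossed) :-
    include(bounding(Language), Crossed, BoundingCrossed),
    length(BoundingCrossed, N),
    N =< 1.
```

For example, a step from a base position to the nearest COMP crosses `[s]` and is allowed in English, whereas a single step crossing `[s, sbar, s]` is ruled out in English but crosses only one bounding category under the Spanish setting.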
The execution of Move-α in relation to subjacency is illustrated in the following example:

(36) Who do you believe (that) Renate saw?

(37) a. [S̄ e [S you believe [S̄ e [S Renate see who ]]]]
     b. [S̄ e [S you believe [S̄ who_i [S Renate see t_i ]]]]
     c. [S̄ who_i [S you believe [S̄ t_i [S Renate see t_i ]]]]

(37a) is the D-structure of (36). In (b), who moves to COMP, leaving a coindexed trace in base position. Only one S node is crossed. In the next cycle (c), who moves to the matrix COMP, again crossing only one S node. If who had moved directly from its base position to the matrix COMP, it would have crossed both the embedded S and the matrix S node, which would be a violation of subjacency.

Clearer evidence for subjacency as a condition on movement is shown by the sentences in (38) with the D-structure in (39):

(38) a. It seems to be certain that John will win.
     b. It seems that John is certain t to win.
     c. John seems t to be certain t to win.
     d. *John seems (that) it is certain t to win.

(39) [S1 NP1 seem [S2 NP2 be certain [S̄ [S3 NP3 win ]]]]

Given (39), (38a) results from no movement and NP3 = John. (38b) results from movement from NP3 to NP2, and (38c) from a cyclic movement through NP2 to the matrix subject, NP1. (38d) is ungrammatical because John has moved from its base position directly to the matrix position, crossing two bounding categories, S2 and S3.⁸

⁸ It may be noted that (38d) seems to be a violation of the binding theory for Condition A for NP-traces, since the trace does not appear to be bound in its governing category. However, given the modifications to the binding theory which include the notion of accessible SUBJECT, the trace in (38d) is bound; therefore, the only factor that accounts for its ungrammaticality is subjacency.

Subjacency is also demonstrated for NP nodes by the following example:

(40) a. [S̄ who_i [S do you believe [S̄ t_i [S Renate saw t_i ]]]]
     b. *[S̄ who_i [S do you believe the claim [S̄ t_i that [S Renate saw t_i ]]]]

In (a), who moves by successive cyclicity from base to embedded COMP to matrix COMP. However, in (b), who moves from base to embedded COMP but then must cross an NP and the matrix S in order to get to the matrix COMP, crossing two bounding categories.

Evidence in favor of S̄ being a bounding category for Spanish comes from the interaction of wh-movement and verb fronting. The details of this interaction are covered in Section 4.4.4, Verb Fronting.

2.2.7 Control Theory

Control theory refers to the identification of PRO, illustrated by the paradigm sentences below:

(41) a. Jim persuaded Gail [ PRO to leave ]
     b. Jim promised Gail [ PRO to leave ]
     c. It is unclear [ what [ PRO to do ]]
     d. John said [ PRO to leave ]
     e. It is unclear to me [ what [ PRO to do ]]

In (a), Gail controls PRO and identifies it. In (b), PRO is controlled by Jim. The determination of control is evidently a property of the matrix verb; persuade is an object-control verb, while promise is subject-control. In (c), PRO is uncontrolled and therefore arbitrary in reference. Uncontrolled PRO often arises from the unavailability of an antecedent, as in (41c), but in other cases must be stipulated by the main verb, as in (d), where say requires that PRO be arbitrary and specifically not be identified with the subject, in contrast to (e), where PRO may optionally refer to me.

The identification of PRO is relevant to determining its reference at LF, and thus is not of significance in the syntax. Furthermore, identification should be similar across languages for the equivalent lexical item, so that translation poses no special problems for determining control. Therefore control theory is not implemented in the current model.

Chapter 3

REPRESENTATIONS

This chapter covers the Prolog representations of the linguistic structures.
It includes descriptions of the lexicon, phrasal structure, empty categories, and grammar rules. The implementation of the transformational rules for mapping between D- and S-structure is given in Chapter 4, and conditions that determine the well-formedness of structures are covered in Chapter 5.

3.1 Lexicon

The lexicon consists of the dictionary and tables for generating new entries into the dictionary. The tables include the suffix table, contraction table, and collocation table.

3.1.1 Dictionary

The dictionary is a database of unit clauses of the form:

(1) dict(Lang,Word,Cat,LexData).

where
    Lang    = 'english' or 'spanish'
    Word    = the lexical entry
    Cat     = the lexical category, one of:
                n     = noun
                v     = verb
                a     = adjective
                p     = preposition
                art   = article
                perf  = perfective
                prog  = progressive
                comp  = complementizer
                infl  = modal
                stem# = numbered stem for regular orthographic stem-changing verbs (Spanish only)
    LexData = list of lexical data, which includes one or more of the following terms:
                f       = feature list
                subcat  = subcategorization frame
                root    = root word, for inflected entries
                spanish = Spanish equivalent lexical item(s)
                english = English equivalent lexical item(s)
                expr    = idiomatic phrase headed by the given lexical entry

The feature list f may include the following features:

    per   = person (= per(1), per(2), per(3))
    pl    = number (pl(-) = singular, pl(+) = plural)
    fem   = gender (fem(-) = masculine, fem(+) = feminine)
    tns   = tense (= tns(pres), tns(past), tns(fut); infinitive = tns(-))
    perf  = perfective aspect (= perf(+))
    prog  = progressive aspect (= prog(+))
    wh    = wh feature (= wh(+))
    theta = indicates a non-θ-marked subject (= theta(-))⁹

The following terms may also be included in LexData for root verbs:

    irr   = list of irregular inflected forms
    perf  = irregular perfective form
    prog  = irregular progressive form
    stem# = stem for orthographic stem changes
    sdel  = S̄-deletion feature (= sdel(+))

Nouns may include the following terms in LexData:

    pl     = irregular plural form
    fem    = irregular feminine form
    proper = feature for indicating proper nouns (= proper(+))
    pro    = pronominal feature (= pro(+))

The following are examples of dictionary entries:

    dict(english,woman,n,[pl(women),spanish(mujer)]).
    dict(english,women,n,[f(pl(+)),root(woman)]).
    dict(english,i,n,[f(per(1)),pro(+),proper(+),spanish(yo)]).
    dict(english,go,v,[subcat(p),subcat(comp),perf(gone),
                       irr([goes,went]),spanish(ir)]).
    dict(english,went,v,[f(tns(past)),root(go)]).
    dict(english,seem,v,[f(theta(-)),sdel(+),subcat(comp),
                         spanish(parecer)]).
    dict(english,which,art,[f(wh(+)),spanish(cual)]).

⁹ As mentioned in Section 2.2.2, the 'theta(-)' feature, applicable to θ-marking predicates, is included as a convenience in determining the thematic status of subjects. When percolated to INFL² by θ-Extraction (see 6.4.4), the feature will cause movement to or from the subject position by Move-α or the reverse of Move-α, respectively. This highly simplified mechanism is in lieu of one which indicates θ-role assignment by means of some notation such as 'agent=[NP,S]', i.e. the θ-role of agent is assigned to the NP immediately dominated by S.

Default features pertain to lexical entries in order to reduce the size of the lexicon. If a default feature does not apply, an explicit feature is required to override it. Default features are the following:

(2) a. Nouns:    per(3), pl(-) (i.e. 3rd person singular)
                 fem(-) (Spanish nouns default to masculine)
    b. Articles: pl(-), fem(-) (i.e. singular masculine, Spanish only)
    c. Verbs:    tns(-) (Spanish only)

English verbs that appear uninflected may actually be the present tense of first or second person, or third person plural, as in:

(3) a. I win/We win.
    b. You win.
    c. They win.
    d. NP to win.

When analyzing an apparently uninflected verb, agreement features will be taken from the subject unless the infinitive marker to is present, as in (d), in which case no agreement features are associated with the verb.
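The interaction between explicit and default features described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the thesis's lookup code: the predicates `default/3`, `explicit/2` and `feature/4` are hypothetical, only the noun defaults from (2a) are encoded, and the `f(...)` term is allowed to hold either a single feature or a list, since both shapes occur in the sample entries.

```prolog
% Sample entries copied from the dictionary examples.
dict(english, woman, n, [pl(women), spanish(mujer)]).
dict(english, women, n, [f(pl(+)), root(woman)]).
dict(english, i,     n, [f(per(1)), pro(+), proper(+), spanish(yo)]).

% default(?Lang, ?Cat, ?Feature): noun defaults from (2a).
default(_,       n, per(3)).
default(_,       n, pl(-)).
default(spanish, n, fem(-)).

% explicit(+LexData, ?Feature): Feature is stated in the f(...) term.
explicit(LexData, Feature) :-
    member(f(Fs), LexData),
    (   is_list(Fs)
    ->  member(Feature, Fs)
    ;   Feature = Fs
    ).

% feature(+Lang, +Word, +Cat, ?Feature): an explicit feature, or a
% default that no explicit feature of the same name overrides.
feature(Lang, Word, Cat, Feature) :-
    dict(Lang, Word, Cat, LexData),
    (   explicit(LexData, Feature)
    ;   default(Lang, Cat, Feature),
        functor(Feature, Name, 1),
        functor(Probe, Name, 1),
        \+ explicit(LexData, Probe)
    ).
```

With these clauses, `women` is `pl(+)` by its explicit feature and `per(3)` by default, while `i` is `per(1)` and the default `per(3)` is suppressed.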
No such action is required with Spanish verbs since the Spanish inflection system is generally sufficient to fully identify agreement features.

Since the translations performed in this model are for equivalent lexical items, it is theoretically possible to eliminate duplicate features by placing the common features in a language-neutral lexicon. The language-specific dictionary would then contain any idiosyncratic features along with pointers to the corresponding common entry. For example, the verb believe and its Spanish equivalent creer could be represented somewhat as follows:

(4) dict(english,believe,v,[neutral(x123),sdel(+),...]).
    dict(spanish,creer,v,[neutral(x123),...]).
    dict(neutral,x123,v,[subcat(n),subcat(comp),
                         english(believe),spanish(creer),...]).

Both believe and creer share the same subcategorization frames as shown in the entry for 'x123', a unique "system-generated" identifier pointed to by the equivalent verbs. Other features in common, such as θ-marking properties and selectional restrictions, would also be included in the lexical data for 'x123'. Only the language-specific information (e.g. 'sdel(+)' for believe but not for creer) would be included in the language-specific entries.

This relational approach to the lexicon, as well as being theoretically attractive, would simplify the addition of other languages to the system, provided the lexical forms have a high degree of equivalence to the neutral forms in the lexicon. The overall size of the combined lexicons would certainly be reduced. However, access time would approximately double, since at least two entry look-ups would be required to analyze each word.

In this system, a combined approach is taken. Subcategorization information is included only for English lexical items; the Spanish equivalents have a pointer to the English entry for accessing that information. All other common features, e.g. person, are duplicated in the entry for each language.
For example, believe and creer have the following entries:

(5) dict(english,believe,v,[subcat(n),subcat(comp),
                            sdel(+),spanish(creer)]).
    dict(spanish,creer,v,[prog(creyendo),english(believe)]).

3.1.2 Suffix Table

The suffix table is used in the morphological analysis of inflected words. The structure of the table is a unit clause of the form

(6) suffixtable(Lang,Suffix,RootEnding,Cat,Features).

where
    Lang       = 'english' or 'spanish'
    Suffix     = one or more letters in the suffix
    RootEnding = zero or more letters to be added to the word after the suffix is removed in order to form the root word
    Cat        = lexical category
    Features   = features associated with the suffix.

Below are some examples of the suffix table:

    suffixtable(english,ies,y,n,[pl(+)]).
    suffixtable(english,s,"",n,[pl(+)]).
    suffixtable(english,ies,y,v,[pl(-),per(3),tns(pres)]).
    suffixtable(spanish,a,o,n,[fem(+)]).
    suffixtable(spanish,a,ar,v,[pl(-),per(3),tns(pres)]).
    suffixtable(spanish,iendo,ir,v,[prog(+)]).

3.1.3 Contraction Table

The contraction table is used to split a contraction into its constituent words during lexical analysis, and to form contractions during lexical generation. The Spanish lexicon contains the following contraction table:

    contraction(spanish,al,[a,el]).
    contraction(spanish,del,[de,el]).

English contractions are optional and include entries such as:

    contraction(english,i'm,[i,am]).
    contraction(english,i've,[i,have]).

3.1.4 Collocation Table

The collocation table includes groups of words that distributionally behave as a single unit. The collocated unit appears as a single entry in the dictionary. The collocation table for English includes the following:

    collocation(english,[how,much],how_much).
    collocation(english,[how,many],how_many).

The corresponding dictionary entries are:

    dict(english,how_much,n,[f([wh(+)]),spanish(cuanto)]).
    dict(english,how_many,n,[f([wh(+),pl(+)]),spanish(cuantos)]).
    dict(english,how_much,art,[f([wh(+),pl(-)]),spanish(cuanto)]).
    dict(english,how_many,art,[f([wh(+),pl(+)]),spanish(cuantos)]).

Some of the entries in the Spanish collocation table for prepositions include:

    collocation(spanish,[a,el,lado,de],al_lado_de).
    collocation(spanish,[acerca,de],acerca_de).
    collocation(spanish,[alrededor,de],alrededor_de).
    collocation(spanish,[antes,de],antes_de).
    collocation(spanish,[despues,de],despues_de).

Note that the dictionary entries could be any uniquely coded item, such as x123 in place of how_many. The latter is used only to facilitate human scanning of the dictionary. Coded entries would be reasonable for phrases such as How do you do, thank you, and similar invariant expressions. The collocation table, when extended in this way, effectively becomes a phrasal dictionary.

3.2 Phrase Structure

The representation of phrase structure employed here is a modification of the X-bar system detailed in 2.2.1(14). The problem with that representation is that the left-recursive production in (14c) prevents a direct encoding of the rule in Prolog. Therefore the adjuncts in that rule are attached at the X² level rather than at the X¹ level. Although this leads to structural misrepresentations, the effect on translation is nil, at least for the range of Spanish and English sentences handled by this system. The resulting phrase structure rules are:

(7) a. X² -> (Spec) X¹ (Post-Adjuncts)
    b. X¹ -> (Pre-Adjunct) X (Complement)

The list representation for the phrase X² has the following general form:

(8) [Node, Specifier, [Pre-Adjunct, X, Argument1,...], Post-Adjunct1,...]

The first element of the list, Node, is itself a two-element list which specifies the category of the phrase and the list of features as percolated from its constituents. The second element is the specifier, if it exists; otherwise it is simply the letter "e" (= empty).
The third element is the phrasal structure at the X¹ level, which consists of an adjunct, or "e" if none, followed by the lexical head X, followed by zero or more arguments that complement the head. Following the third element are zero or more adjunct structures.

In the following subsections, the list representation is shown for each value of X as an instantiation of the pattern in (8). A complete example of a clause is presented in 3.2.6.

3.2.1 COMP

The clause is represented by a COMP phrase (= S̄), which has the following list representation:

(9) [COMP, WH, [e, Complementizer, INFL], Mode]

where
    WH is either empty (= e) or a wh-phrase;
    Complementizer is either empty or lexical;
    INFL is the INFL² (= S) complement to COMP;
    Mode is the mode of the sentence, i.e. declarative or interrogative.

Mode information is attached as an adjunct to the most deeply-embedded COMP phrase. It is represented in this system as a two-element list of the form:

(10) [mode, MODE]

where MODE is either "decl" or "ques".

3.2.2 INFL

The INFL phrase (= S) is represented by the following list structure:

(11) [INFL, Subj, [e, Modal, VP]]

where
    Subj is the NP subject of the clause;
    Modal is a modal verb, the English infinitive marker to, or empty;
    VP is the predicate phrase of the clause.

There are no pre- or post-adjuncts to INFL.

3.2.3 VP

The VP predicate phrase is a verb phrase with the following representation:

(12) [VP, P1, [P2, V, Argument1,...], Adjunct1,...]

where
    P1 is either an auxiliary verb (have/be in English, haber/estar in Spanish) or empty;
    P2 is either a second auxiliary or empty;
    V is the lexical verb;
    Argument1,... are zero or more arguments to the verb;
    Adjunct1,... are zero or more adjuncts to the verb phrase.
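Because phrases are plain nested lists, the pattern in (8), and its VP instance in (12), can be taken apart by unification alone. The sketch below illustrates this; `phrase_parts/7` is a hypothetical helper, not a predicate named in the thesis, and the sample VP for "has been buying books" is built by hand with empty feature lists as a simplifying assumption.

```prolog
% phrase_parts(+Phrase, -Cat, -Spec, -Pre, -Head, -Args, -PostAdjuncts)
% Decomposes a phrase of the general form
%   [[Cat,Features], Spec, [Pre, Head | Args] | PostAdjuncts].
phrase_parts([[Cat, _Features], Spec, [Pre, Head | Args] | Post],
             Cat, Spec, Pre, Head, Args, Post).

% Hand-built VP for "has been buying books" (features left empty):
% P1 = has, P2 = been, V = buying, one NP argument, no adjuncts.
sample_vp([[v, []], has,
           [been, buying, [[n, []], e, [e, books]]]]).
```

A query such as `sample_vp(VP), phrase_parts(VP, Cat, P1, P2, V, Args, Adjuncts)` binds `P1 = has`, `P2 = been` and `V = buying`, with one NP in `Args` and an empty adjunct list.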
The second auxiliary, which in English is always a form of be, is used in conjunction with the first auxiliary to indicate perfect progressive, perfect passive, or progressive passive aspect As there are only two auxiliary positions in (12), it is not possible to generate a perfect progressive passive form. 1 0 The following sentences illustrate the various combinations: 1 0 This is probably an invalid restriction, given a sentence such as The Bible has been being translated for thousands of years, which is acceptable to some speakers. The allowance for three auxiliary forms could be provided as a matter of execution by treating the passive be as a main verb, subcategorizing for a "de-verbalized" adjective. This has the 31 (13) a. Dorothy has bought a book. b. Dorothy is buying a book. c. Dorothy has been buying books. d. A book was bought (by Dorothy). e. A book has been bought (by Dorothy). f. * A book has been being bought (by Dorothy). The arguments to the verb depend on the subcategorization frame indicated in the lexical entry for the verb; the adjuncts consist of non-subcategorized phrases, here limited to prepositional phrases. 3.2.4 NP, PP, AP Noun phrases, prepositional phrases, and adjective phrases complete the set of phrase structures represented under the X - b a r system. They have the same general configuration as in (8). For N P , the specifier may be an article or quantifier word. N o pre-adjuncts are provided for NPs , PPs, or APs. NPs may include any prepositional phrase as post-adjunct Subcategorization is limited to adjectives taking clausal arguments and prepositions taking N P arguments. 3.2.5 S- Adjunction Adjunction to S ( = I N F L 2 ) occurs i n inversion, such as Subject-Auxiliary Inversion (SAI) in English (cf. 4.4.1) or Verb Fronting in Spanish (cf. 4.4.4). The inverted material is contained in a term, 'Adjunct', where 'Adjunct' is either the English auxiliary or a functor containing the Spanish restructured verb, i.e. 
a list of one or more verbal elements — see 4.4. The inverted material is adjoined to I N F L 2 , as shown in the tree diagram below: 1 0(cont'd) theoretical drawback of treating passive be sometimes as an auxiliary, as in (13d,e), and sometimes as a main verb, as above. 32 INFL 2 / / / / / Adjunct INFL 2 / / / / N 2 INFL 1 \ \ \ \ INFL 0 V 2 As a list structure, it has the following representation: (14) [INFL, Adjunct, [INFL, Subj, [e, Modal, VP]]] 3.2.6 Example The complete S-structure list representation is given in (16) for the sentence in (15). Indentation and spacing are added to improve readability. (15) The man had seen many things from his window. (16) [[comp,FC], e, [e, e, [[infLFI], [[n,FNl], the, [e, man]], [e, e, [[v,FV], had, [e, seen, [[n,FN2], many, [e, things]]], [[p,FP], e, [e, from, [[n,FN3], his, [e, window]]]]]], [mode.decl]]] The features, prefixed with F- (e.g. FC), are not shown in detail, but indicate such things as person, number, gender, tense, etc. 3.3 Empty Categories Empty categories (ECs) are either base-generated, such as PRO and the null subject pronoun pro ("little pro"), or result from Move-a, where the moved element, a, leaves behind a trace t coindexed with a. The ECs PRO, pro, NP-trace, NP-variable, and PP-variable have 33 the following general representation: (17) [ N O D E , e ] where N O D E is either N P or PP. The features contained within N O D E distinguish between the different types of ECs. P R O contains no features.11 pro contains agreement features and [ + P R O N O U N ] . NP-traces have agreement features, an index, and the features [ + A N A P H O R , - P R O N O U N ] . Variables have the [ + W H ] feature, an index, and [ - A N A P H O R - P R O N O U N ] . For NP-variables, agreement features are also included. The representations of the ECs are summarized below: (18) a. P R O [ [ n l ] ] , e ] b. pro [ [n,[pro( + ) , P E R , P L , F E M ] ] , e ] c. 
NP-trace     [[n,[indx(I),ana(+),pro(-),PER,PL,FEM]], e]
     d. NP-variable  [[n,[indx(I),wh(+),ana(-),pro(-),PER,PL,FEM]], e]
     e. PP-variable  [[p,[indx(I),wh(+),ana(-),pro(-)]], e]

The agreement features PER, PL, and FEM are set appropriately.

3.4 Grammar

In keeping with the conceptualization of a UG and language-specific core grammars, the implementation of the grammar has been divided into two components. The UG component is language-independent and consists of the phrase structure rules utilizing the X-bar system, together with the transformational component for relating surface structures to D-structures. The core grammars, one for Spanish and one for English, contain the language-specific properties that interact with UG in the processing of sentences.

[11] This is counter to current GB theory, which holds that PRO has agreement features but no phonological content. Values are assigned to the features as a consequence of control theory. As this is not implemented here, and the agreement features serve no other purpose in this model, they are absent from the feature list.

3.4.1 Phrase Structure Rules

The phrase structure rules are written in the grammar rule notation for Prolog (Clocksin and Mellish 1981). They follow the rules in (7) for expanding the X2 and X1 phrasal categories along with their associated specifiers, complements, and adjuncts. The basic grammar rules for parsing any phrase of type X are the following:

(19) a. x2(L, C, [[C,[]],Spec,X1|Post]) -->
            specifier(L, C, Spec),
            x1(L, C, X1),
            postadjunct(L, C, Post).

     b. x1(L, C, [Pre,Head|Post]) -->
            preadjunct(L, C, Pre),
            x(L, C, Head, Args),
            complements(L, Args, Post).

     c. x(L, C, W, Args) -->
            [W],
            { extract(L,W,C,_,subcat,Args) }.

In (a), the x2 clause states that for language L and category C, the X2 phrase has the phrase structure indicated by the third parameter.
To form the phrase structure, a specifier is parsed, followed by the X1 phrase, followed by any post-adjuncts. The x1 clause (b) is similar; a pre-adjunct is parsed, followed by the lexical head, followed by the complements to the head. In (c), the lexical head is indicated by the single token W, taken from the input string. The extract clause extracts the subcategorization frame, indicated by 'Args', by referencing the dictionary entry for W. The subcategorization frame is used by the complements clause in (b) to continue the parsing.

When C = COMP, 'Head', i.e. the complementizer, may be empty. Similarly, when C = INFL, the modal may not be present. Therefore, the following two phrase structure rules are added; the argument of COMP is set to INFL, and the argument of INFL is set to VP:

(20) x(_, comp, e, [infl]) --> [].
     x(_, infl, e, [v]) --> [].

The specifiers are represented by the general rule:

(21) specifier(L, C, Tree) --> Spec.

with L, C, and Tree being the language, category, and parse tree, respectively. The particular values of Spec for different Cs are:

(22) a. for C = COMP, Spec = wh-phrase (i.e. NP or PP with feature [+WH])
     b. for C = INFL, Spec = NP
     c. for C = N, Spec = article
     d. for C = V, Spec = perfective or progressive auxiliary
     e. for C = anything else, Spec is empty.

In (a), the wh-NP or wh-PP must be lexical during parsing, although in the transition from S-structure to D-structure, or vice versa, it may be interpreted as a wh-trace, depending on circumstances. In (b), subjects of clauses are always present, by the Extended Projection Principle; therefore, the NP is lexical or an EC, never just "e" as for the other categories. (c) and (d) are, of course, optional and may be empty.

The complements of a lexical head are presented as a list of lexical categories. The list may be empty if no arguments are subcategorized for.
Or the list may indicate optional categories, as for the verb eat, which may optionally take an NP object:

(23) a. Philip eats.
     b. Philip eats tacos.

Optional arguments are enclosed in braces, e.g. {n}. The dictionary entry for eat and its Spanish equivalent comer would contain the following:

     dict(english, eat, v, [subcat({n}), ..., spanish(comer)]).
     dict(spanish, comer, v, [english(eat), ...]).

The Spanish complement is taken from the English entry, as described in 3.1.1.

Since the course of parsing is dependent on the syntactic properties (i.e. subcategorization frames) of the lexical items, the arguments are parsed using the following rules, where the first argument is the language, the second is the list of arguments from the call to x (19c), and the third argument is the list of phrase structures built for each argument:

(24) complements(_, [], []) --> [].
     complements(L, [A|As], [Arg|Args]) -->
         parseArg(L, A, Arg),
         complements(L, As, Args).

The first clause handles the empty list. The second parses the argument 'A', forming the structure 'Arg', and recursively calls complements on the remainder of the list, 'As'. The call to parseArg is simply a call to x2, or to optionalArg, which also calls x2 but succeeds even if the X2 constituent is not found.

(25) parseArg(L, {Arg}, Tree) --> optionalArg(L, Arg, Tree).
     parseArg(L, Arg, Tree) --> x2(L, Arg, Tree).

     optionalArg(L, Arg, Tree) --> x2(L, Arg, Tree).
     optionalArg(_, _, []) --> [].

Note that optional arguments are not the same as non-lexical arguments. If an argument is required by the subcategorization frame, it must exist either as a lexical constituent or as its trace. However, before assuming that an unfound argument is non-lexical, all subcategorization frames must be tried.
If none results in a successful parse, then the possibility exists that the argument is non-lexical. This only applies to subcategorized NPs and PPs, as these are the only constituents that move in this model. For example, the verb believe subcategorizes for either an NP or COMP:

(26) dict(english, believe, v, [subcat(n), subcat(comp), ...]).

During parsing, an NP argument will be searched for first. If it is not found, then a clause will be searched for. If that is not found, then because NP is a possible argument of believe (determined by re-scanning its list of subcategorization frames), an empty NP will be inserted into the parse tree, and the parse continues. To accommodate empty arguments, the additional x1 clause is added:

(27) x1(L, C, [Pre,Head|Post]) -->
         preadjunct(L, C, Pre),
         x(L, C, Head, Args),
         emptyArgs(L, Args, Post).

The emptyArgs clause will replace either an NP or PP argument with an EC, if subcategorized for.

The only pre-adjunct considered in this model is the second auxiliary to the verb. The grammar rules for this are simply:

(28) preadjunct(L, v, W) --> [W], { dict(L,W,prog,_) }.
     preadjunct(_, _, e) --> [].

Post-adjuncts may be lexical prepositional phrases (not ECs) for either NPs or VPs, or else null. Also, sentence punctuation is considered a post-adjunct to the COMP.

The remaining grammar rule is the rule for inversion. Since inversion for SAI and Verb Fronting both involve adjunction to S, an additional rule at the X2 level for INFL is added:

(29) x2(L, infl, [[infl,[]],Adjunct,INFL]) -->
         adjunct(L, Adjunct),
         parse_inversion(L, Adjunct, INFL).

The rules for adjunct and parse_inversion are both language-specific, so they are included in the core grammar section. For English, SAI involves adjunction of a modal or auxiliary. (Do is included as a modal verb, as well as a main verb. Also, the infinitive marker to is included as a modal, since modal verbs may not co-occur with infinitives.)
Spanish inversion is considerably more complicated, since inversion involves verb restructuring. The modal, both auxiliaries, and main verb, or some restricted combination of them, may be reanalyzed as a single verbal element which inverts. Thirteen combinations are allowed that maintain the ordering of elements; that is, if the main verb inverts, for example, then any and all preceding verbal elements must invert. Similarly for the auxiliaries and any elements preceding them. Also, following Torrego (1984), if perfective haber or one of its inflected forms is found inverted, then its participle must also be inverted. To allow for the possibility of all four elements inverting, an adjunct structure is created as a four-element list, where each element may either be lexically filled or empty, with the proviso that the adjunct must not be completely empty (i.e. not = [e,e,e,e]). That is, if an adjunct is formed, then at least one element has inverted.

Chapter 4

TRANSFORMATIONS

The transformations considered in this model fall into two general categories. One is the set of language-independent rules, i.e. Move-α and Move Affix. These are represented in the UG component as general movement rules, although parameter settings within the core grammars will influence their application. The other category includes idiosyncratic rules such as Do Support for English and pro-drop for Romance languages. Each transformation is described in the following sections, presenting first the motivation for the rule, and second its implementation. The last section deals with the order of transformations within the transformational cycle.

4.1 Move Alpha

The fundamental transformation under current theories of transformational grammar is Move-α, i.e. "move any category anywhere". The principles of UG effectively determine all and only those applications of Move-α that result in grammatical sentences, subject to parametric variation.
When a phrase moves, it forms an association chain with its base position. Each position to which the phrase moves becomes a member of the chain and inherits the index that was assigned to the base element at lexical insertion time. A θ-role is assigned to the base position in the chain, and for lexical NPs Case will be assigned to the initial position. The movement obeys the bounding condition of subjacency, and all non-initial positions must be properly governed, as required by the Empty Category Principle, a well-formedness condition covered in Section 5.5.

A basic result that falls out of the Projection Principle and the θ-Criterion is that movement is always to a non-θ-marked (= θ̄) position (Borer 1979, 1981). If movement were possible to another θ-marked position, the moved phrase would receive two θ-roles, in violation of the θ-Criterion. θ̄ positions include COMP, the subject of INFL if not θ-marked by the predicate phrase, and adjuncts. In this model, the landing sites for movement are taken to be empty non-θ-marked subjects and COMP. Only NPs may move to subject position, by the structure-preserving hypothesis (Emonds 1976, but see Stowell 1981), and only categories with the feature [+WH] may move to COMP (Chomsky 1981:115).

The process of moving an element out of base position and the reverse operation of restoring a moved element are logically just inverse operations of each other. However, they are treated separately here, since the goals of the two operations are quite different; one, the latter, is concerned with relating a phrase at surface structure with its base θ-marked position, while the other relates a base position with possibly many valid θ̄ surface positions. The details of these two operations are covered more fully in 6.4.8 and 6.6.2.

4.2 Move Affix

Move Affix, or Affix-Hopping (Chomsky 1957, Baker 1978), affixes inflectional markers to verbal elements.
At D-structure, tense and agreement make up the set of features in INFL (Chomsky 1981). Considering [+TNS] as an affix, it "moves" from INFL to the first verbal element, which may be a modal, auxiliary, or main verb. AGR is assumed to be associated with [+TNS] (but not with [-TNS]), so it too moves with [+TNS], thereby fully determining the morphology of the verbal element to which it attaches. The perfective auxiliary (have/haber) is associated with a perfect participle, which affixes to the following verb form. Similarly, the progressive auxiliary (be/estar) supplies a progressive participle, and the passive be/ser a passive participle (although this form of passive is not often used in Spanish, and is not implemented in this model). The effects of these rules are exemplified below:

(1) a. Dana [+TNS(past)] sit at home.
    b. Dana sit+TNS(past) at home.
    c. Dana sat at home.

(2) a. Michael [+TNS(pres)] have+en leave.
    b. Michael have+TNS(pres) leave+en.
    c. Michael has left.

(3) a. They [+TNS(pres)] be+ing visit friends.
    b. They be+TNS(pres) visit+ing friends.
    c. They are visiting friends.

(4) a. He [+TNS(past)] can have+en be+en see from the window.
    b. He can+TNS(past) have be+en see+en from the window.
    c. He could have been seen from the window.

The sentences of (a) are in base form, with participles associated with their auxiliaries. In (b), all affixes have moved to right-adjacent verbal elements. The resulting surface forms are shown in (c).

In the reverse process, the inflectional features of the lexical items are combined and merged with the features in INFL. The inflected forms are then replaced by their root forms. Validity tests on the combined features ensure that the appropriate form of the item has been taken. For example, the word bought may be past tense or a perfect participle, depending on the context. The reverse transformation will select the correct choice.
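
The forward direction of affix movement shown in (1)-(4) can be sketched in a few lines of Prolog. This is a minimal illustration only, not the implementation used in this model: the predicate names affix_hop/2, hop/3 and attach/3, and the terms infl(Affix) and v(Root, Supplies), are assumptions introduced for the example. Each verbal element records the affix it supplies to the element on its right, and INFL supplies tense to the first verbal element.

```prolog
% Hypothetical sketch of forward affix movement (not the thesis code).
% Base form:  [infl(Affix) | Verbs], where each verbal element is
%             v(Root, Supplies) and Supplies is the affix the element
%             hands to the verbal element on its right (or none).
% Surface:    a list of Root or Root+Affix terms.

% affix_hop(+Base, -Surface)
affix_hop([infl(Affix)|Verbs], Surface) :-
    hop(Verbs, Affix, Surface).

% hop(+Verbs, +Incoming, -Surface)
% Incoming is the affix arriving from the left neighbour.
hop([], none, []).                      % no affix may be left stranded
hop([v(Root, Supplies)|Rest], Incoming, [Form|Forms]) :-
    attach(Root, Incoming, Form),
    hop(Rest, Supplies, Forms).

% attach(+Root, +Affix, -Form)
attach(Root, none, Root).
attach(Root, Affix, Root+Affix) :-
    Affix \== none.
```

For (4a), the query affix_hop([infl(tns(past)), v(can,none), v(have,en), v(be,en), v(see,none)], S) yields S = [can+tns(past), have, be+en, see+en], which corresponds to (4b). A base form whose last verbal element still supplies an affix fails, since there is no right-adjacent element for the affix to attach to.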
4.3 Null Subject

A characteristic of many languages (e.g. Spanish, Italian, Hungarian, Chinese, Japanese) that falls under the rubric of "pro-drop" is the possibility of "missing" subjects of tensed clauses, as in the following Spanish examples:

(5) a. (Yo) voy a la playa. ('I'm going to the beach.')
    b. (Tú) vas a la playa. ('You [informal] are going to the beach.')
    c. (Él/Ella/Ud.) va a la playa. ('He/She/You are going to the beach.')
    d. (Nosotros) vamos a la playa. ('We're going to the beach.')
    e. (Ellos/Ellas/Uds.) van a la playa. ('They/You [plural] are going to the beach.')

In all cases the parenthesized subject pronoun is optional. It appears that, following Taraldsen (1978), when the inflectional system is sufficiently rich to identify the subject, the subject may delete. This connects the presence of AGR in INFL (obligatory with [+TNS] but absent from [-TNS]) with the possibility of a null subject if AGR is sufficiently specified.[12] English AGR is not sufficiently rich, so empty subjects of finite declarative clauses do not occur.

The empty subject in these cases is pro, an NP with [+PRONOUN] and agreement features. It is governed by AGR, as is the subject of any finite clause, and thereby may receive nominative Case.

In analyzing Spanish sentences, a null subject may indicate any one of the ECs. If the verb is inflected, then the subject is either a wh-trace or pro; it cannot be PRO because it is governed by [+TNS], and it cannot be an NP-trace because, as subject, it could not be bound in its governing category, in violation of binding condition A. The distinction between pro and wh-trace is readily apparent if the verbal inflection indicates first or second person, in which case the appropriate lexical pronoun can replace the EC under recoverability of deletion. Otherwise the status of the EC will depend on whether there exists a wh-element to bind the variable.
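
The case analysis just described for inflected verbs can be laid out as a small decision table. The following is a deliberately simplified sketch, not the code of this model: the predicate null_subject/4 and the atoms tensed, wh_binder, no_binder, pro and wh_trace are names assumed here for illustration, and the sketch takes the presence of a wh-binder as already known, whereas in practice that depends on the rest of the analysis.

```prolog
% Hypothetical sketch: classify the empty subject of an inflected verb.
% null_subject(+Inflection, +Person, +Binder, -EC)
%   Inflection : tensed (only the inflected case is covered here)
%   Person     : 1, 2 or 3, read off the verbal inflection
%   Binder     : wh_binder or no_binder (assumed to be known)
%   EC         : pro or wh_trace

% First or second person inflection identifies the subject as pro.
null_subject(tensed, Person, _Binder, pro) :-
    first_or_second(Person).
% Third person: the choice depends on whether a wh-element
% is available to bind a variable.
null_subject(tensed, 3, wh_binder, wh_trace).
null_subject(tensed, 3, no_binder, pro).

first_or_second(1).
first_or_second(2).
```

PRO and NP-trace never appear as results, reflecting the government and binding arguments given above for subjects of inflected verbs.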
In generating Spanish sentences, a subject pronoun may be replaced by an EC, retaining its features, which include person, number, possibly gender, and [+PRONOUN]. English, not being a pro-drop language, is barred from the possibility of null subjects.

4.4 Inversion

The inversion transformation for English is called Subject-Auxiliary Inversion (SAI), or Subject Helping-Verb Inversion (Baker 1978). To fully account for the empirical facts of SAI in English, some additional transformations are required, namely Do Support and Have/Be Raising. These are described below, followed by the counterpart inversion rule in Spanish, Verb Fronting.

[12] Huang (1982) notes that Chinese has no AGR feature set, yet is a pro-drop language; he relates the identification of the empty subject not with AGR but with a higher lexical subject.

4.4.1 Subject-Auxiliary Inversion

SAI is an obligatory rule for English yes/no and wh-questions, as demonstrated below:

(6) a. Can we still be friends?
    b. Has it ever happened before?
    c. Are you lonesome tonight?
    d. Do you really want to hurt me?

(7) a. How can I be sure?
    b. Where have all the flowers gone?
    c. Who are you kidding?
    d. Why do fools fall in love?

In the (a) examples, the modal verb inverts with the subject. In (b), the perfective auxiliary inverts; in (c) the progressive auxiliary inverts, and in (d) the "helping verb" do is inserted, then inverted. In structural terms, the contents of the INFL head (= "AUX") become an adjunct structure that attaches at the INFL2 level:

(8) [INFL, Adjunct, [INFL, Subj, [e, e, VP]]]

This is diagrammed below:

(a) base

        INFL2
       /     \
     N2      INFL1
               \
               INFL0
               can / has / are / do

(b) surface

               INFL2
              /     \
  can / has /       INFL2
  are / do         /     \
                 N2      INFL1
                           \
                           INFL0
                             e

The implementation of this rule is straightforward and reversible.
From a surface structure (b) in which an adjunct was parsed, the adjunct is restored to the head of the INFL, as in (a), which must, of course, have been empty, given the ungrammaticality of sentences like Should John can go?. At generation, inversion of INFL occurs (at the matrix level only) merely by virtue of the sentence being a question.

4.4.2 Have/Be Raising

When no modal is present in INFL to undergo inversion, the first auxiliary inverts, as in (6b,c) and (7b,c). It "raises" into INFL, where SAI then applies. The structures are reversible:

(a) base

        INFL2
       /     \
     N2      INFL1
            /     \
        INFL0      V2
          e       /  \
              AUX1    V1
            have/be  /  \
                 AUX2    V0

(b) surface

        INFL2
       /     \
     N2      INFL1
            /     \
        INFL0      V2
       have/be    /  \
              AUX1    V1
                e    /  \
                 AUX2    V0

4.4.3 Do Support

If SAI inverts a null AUX, i.e. one void of a modal or auxiliary verb, the rule of Do Support is activated. The grammatical formative do, distinct from main verb do although with similar morphology, is inserted into the inverted AUX position. However, it does not apply in wh-questions with empty subjects (Who saw David?), although SAI still takes place, even if vacuously.

The current implementation departs from the traditional order by scheduling Do Support before SAI. This allows Move Affix to attach tense onto do before inversion takes place. The Do Support transformation is provided with a "look-ahead" capability to detect if Do Support will be required. Do Support is illustrated below:

(a) base

        INFL2
       /     \
     N2      INFL1
            /     \
        INFL0      V2
          e       /  \
              AUX1    V1
                e    /  \
                 AUX2    V0
                   e

(b) surface

        INFL2
       /     \
     N2      INFL1
            /     \
        INFL0      V2
          do      /  \
              AUX1    V1
                e    /  \
                 AUX2    V0
                   e

4.4.4 Verb Fronting

Verb Fronting, or V-Preposing (Torrego 1984), is a characteristic of Spanish questions that fronts one or more [+V] elements to the head of the clause, as in the examples below:

(9) a. Debía Juan haber estado cantando tan fuerte?
('Should John have been singing so loudly?')
    b. Debía haber estado Juan cantando tan fuerte?
    c. Debía haber estado cantando Juan tan fuerte?

In (a), only the modal verb undergoes fronting. In (b), the modal and perfect auxiliary with its participle invert (perfectives may not be separated from their participle; cf. *Debía haber Juan estado cantando tan fuerte?). In (c), the modal, both auxiliaries, and main verb are fronted.

As noted by Torrego, verb fronting is optional for yes/no questions but obligatory in clauses containing certain wh-phrases or their trace in COMP:

(10) a. Qué querían esos dos? ('What did those two want?')
     b. *Qué esos dos querían?

(11) a. Con quién sabía Juan que había admitido Ana que había hablado Pedro? ('Who did John know that Ana had admitted that Peter had spoken with?')
     b. Con quién sabía Juan que había admitido Ana que Pedro había hablado?
     c. *Con quién sabía Juan que Ana había admitido que Pedro había hablado?

In (11), the wh-phrase moves from the lowest embedded clause to the matrix COMP. In (11a), the wh-phrase passes through each COMP position, forcing inversion. Example (11b) is grammatical and provides evidence for S as a bounding category for Move-α. The wh-phrase moves directly to the second COMP position, bypassing the lowest COMP. Since the lowest COMP does not contain the wh-trace, no inversion is required. Example (11c) is a subjacency violation, as the wh-phrase crosses two S nodes, evident from the fact that inversion has only occurred in the highest clause containing the wh-phrase.[13]

Several combinations of inversion are thus possible, so the V-preposing clause simply lists all patterns of base structure, surface structure and adjunct in the format shown:

[13] Not all linguists agree with Torrego's grammaticality judgments of the sentences in (11) nor with the conclusion that S is a bounding node for Spanish.
Her analysis is adopted here, although other analyses may prove to be preferable.

(12) v_prepose(Base, Surface, Adjunct).

Each argument to v_prepose consists of a four-element list representing the contents of the modal, first auxiliary, second auxiliary and main verb. When mapping from surface to base structure, the surface-adjunct pair directly unifies with one of the combinations to derive the base structure. On generation, the base structure will, through backtracking, unify with one or more possible surface structures, giving rise to multiple transformations. For example, if a modal and main verb are found inverted during parsing, it will match the first entry below:

(13) v_prepose([I,e,e,V], [e,e,e,e], [I,e,e,V]).
     v_prepose([I,e,e,V], [e,e,e,V], [I,e,e,e]) :- !.

During generation, the first entry will cause the modal and main verb to invert; subsequent backtracking to v_prepose will select the second entry, inverting only the modal.

4.5 It Insertion

In the pro-drop languages, null subjects of clauses that are not θ-marked remain null. In English, they must be lexicalized. The non-referential pronoun it serves this function. (There is the other non-referential element, but its characteristics are more complex (Chomsky 1981, Safir 1982, Williams 1984).) It Insertion is diagrammed below:

(a) base

        INFL2
       /     \
     N2      INFL1
      e     /     \
        INFL0      V2 [-THETA]

(b) surface

        INFL2 [PER(3),-PL]
       /     \
     N2      INFL1
     it     /     \
        INFL0      V2 [-THETA]

In going from base to surface form, an empty subject of a non-θ-marked clause has it inserted, with features [PER(3),-PL] (= 3rd person singular) merged into the INFL features for determining verbal inflection. Going from surface to base form, if it appears as subject of a non-θ-marked clause, it deletes. (This is not always correct if it is in fact referential, as in the sentence It seems to behave intelligently.
Some amount of backtracking or lookahead, ultimately relying on satisfaction of the θ-Criterion, would be required in order to correctly handle this situation.)

4.6 Complementizer Insertion

It is assumed that complementizers are optionally selected, depending on specific lexical characteristics and other conditions such as embedded tense. When introducing tensed embedded clauses in English, the complementizer seems to be freely selected (or deleted, depending on the direction from which the problem is approached):

(14) I know (that) you think (that) I'm crazy.

Similarly, for may be optional (in some dialects) in infinitival clauses:

(15) She would prefer (for) you to leave immediately.

The complementizer for takes on the same Case-assigning properties as the preposition for, so its presence in COMP is closely connected with the Case Filter. It may not delete, for example, in the sentence:

(16) I'm eager for you to take part.

since you would not be assigned Case (although note the counterexample for want below).

Choice of complementizer is determined in part by lexical properties, presumably as part of the subcategorization frame that specifies clausal arguments. For example, wonder may take whether as a complementizer (or other [+WH] phrase) but not that:

(17) a. I wonder whether Jennifer knows yet.
     b. I wonder who Jennifer knows.
     c. *I wonder that Jennifer knows yet.

The for-complementizer is not permitted by wonder, although other verbs obligatorily require it:

(18) a. They waited for him to leave.
     b. *They waited that he would leave.
     c. I want (for) David to come with me.
     d. *I want (that) David come with me.

(The for in (18c) is assumed to delete when immediately adjacent to the verb after assigning Case to the embedded subject, another marked property of want; cf. I want very much for David to come with me.) However, it is assumed that in the unmarked case, clauses may be tensed or infinitival, with choice of complementizer determined accordingly.
In this model, these peculiarities are avoided by allowing the that-complementizer to optionally appear in any tensed embedded clause. In a more complete system, some form of selectional restrictions on the subcategorized clause is necessary, as discussed in Chapter 8.

4.7 Modal Insertion

In English, future tense may be indicated by the modal will, while Spanish has a regular inflection for future tense:

(19) a. George will arrive tomorrow at seven.
     b. Jorge llegará mañana a las siete.

Similarly for conditional tense, where English sometimes uses the modal would and Spanish inflects:

(20) a. If I could, I would buy a house.
     b. Si pudiera, me compraría una casa.

In this model, only future tense is analyzed, since conditional tense generally involves a subordinate clause in the subjunctive mood (cf. (20b) above), which is not considered here. When will is encountered in analyzing an English sentence, the modal is deleted, leaving behind the tense feature in the feature set for INFL. In generating English sentences, will is inserted into the head position of INFL if future tense is specified.

In the case of untensed clauses in English, the infinitive marker to is used. Since to indicates [-TNS], it may not co-occur with a modal, which always carries the [+TNS] feature. Therefore, the position occupied by a modal in tensed clauses is filled with to in infinitival clauses. As with will, it deletes during analysis of sentences, leaving behind [-TNS] as a feature, and is inserted on generation.

4.8 Successive Cyclicity and Order of Transformations

The cyclic nature of transformations is relevant to the application of Move-α, where long-distance movement can only be explained by assuming a series of successive movements, given the bounding condition of subjacency.
This was illustrated in Section 4.1, Move Alpha, and in Section 4.4.4, Verb Fronting, for wh-movement, and applies as well to NP-movement, as in the sentence:

(21) Thatcher seems to be likely to win.

with the corresponding S-structure representation:

(22) [S' [S1 Thatcher_i seems [S' [S2 t_i to be likely [S' [S3 t_i to win ]]]]]]

The subject, Thatcher, moves from the lowest embedded clause S3, where it receives its θ-role, to the matrix subject via the non-θ-marked subject of the intermediate clause S2. Direct movement from S3 to S1 would violate subjacency.

Successive cyclicity does not necessarily enter into the operation of the other transformations, which are all local to the clause anyway. However, since it is required for Move-α, all transformations are defined in terms of the cycle, that is, as having their effect within the boundaries of a single clause, i.e. COMP2 phrase. This facilitates the execution of a transformational cycle by recursively calling the set of transformations on each COMP2 phrase. In analyzing sentences, the cycle begins at the matrix clause; in generating sentences it begins at the lowest clause.

Within one cycle, the transformations are performed sequentially. The results of a transformation may have a bearing on the functioning of subsequent ones, so that ordering becomes significant.[14] As a consequence, the transformational cycle may take only limited advantage of concurrent processing in mapping from one level of representation to another. However, the conditions on well-formedness of representations could be applied simultaneously.

[14] An alternative would be to introduce filters (Chomsky and Lasnik 1977) to eliminate mis-applied transformations. In that case, ordering would be irrelevant, but a large number of filters might be required, possibly incurring excessive computational overhead in their execution. The result effectively imposes an implicit rule ordering, which is made explicit here and minimizes the reliance on filters.
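
The control structure described above, with sequential transformations inside one cycle and recursion over embedded clauses, can be sketched as follows. This is a toy illustration under stated assumptions, not the code of this model: clauses are reduced to clause(Name, Sub) terms with at most one embedded clause, the predicate names cycle/4, cycle_sub/4 and apply_all/3 are invented here, and "applying" a transformation merely records a Name-Transformation pair so that the order of application can be inspected.

```prolog
:- use_module(library(lists)).   % append/3

% Hypothetical sketch of the transformational cycle.
% cycle(+Direction, +Clause, +Transformations, -Log)
%   Direction : analyze (matrix clause first) or generate (lowest first)
%   Clause    : clause(Name, Sub), where Sub is none or another clause
%   Log       : list of Name-Transformation pairs in order of application

cycle(analyze, clause(Name, Sub), Ts, Log) :-
    apply_all(Ts, Name, Here),            % this clause first
    cycle_sub(analyze, Sub, Ts, Below),   % then the embedded clause
    append(Here, Below, Log).
cycle(generate, clause(Name, Sub), Ts, Log) :-
    cycle_sub(generate, Sub, Ts, Below),  % embedded clause first
    apply_all(Ts, Name, Here),            % then this clause
    append(Below, Here, Log).

% cycle_sub(+Direction, +Sub, +Transformations, -Log)
cycle_sub(_, none, _, []).
cycle_sub(Dir, clause(Name, Sub), Ts, Log) :-
    cycle(Dir, clause(Name, Sub), Ts, Log).

% apply_all(+Transformations, +ClauseName, -Log)
% Transformations are taken strictly in the order given.
apply_all([], _, []).
apply_all([T|Ts], Name, [Name-T|Rest]) :-
    apply_all(Ts, Name, Rest).
```

With the two-clause structure clause(s1, clause(s2, none)) and the list [move_alpha, move_affix], the generate direction logs [s2-move_alpha, s2-move_affix, s1-move_alpha, s1-move_affix], while the analyze direction logs the s1 pairs before the s2 pairs. The caller is responsible for passing the transformations in the order appropriate to the direction.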
Because of the similarity between Spanish and English, the transformational cycles for the two languages have much in common. Ordering is the same for related transformations, and some, such as Move-α and Move Affix, are general enough to be described independently of the language. The overall effect is to reduce the size and complexity of the transformational component, while maintaining relatively high cross-linguistic generality.

For English, the transformational cycle includes the following:

(23) a. Move-α
     b. It Insertion
     c. Do Support
     d. Move Affix
     e. COMP Insertion
     f. Modal Insertion
     g. Have/Be Raising
     h. SAI

Move-α occurs first, if at all, to move phrases into θ̄ positions. If a non-θ-marked subject position remains empty, It Insertion applies to lexicalize the subject. COMP Insertion and Modal Insertion may occur at any time. Before Move Affix applies to inflect verbs, Do Support must occur, in case inversion (or "Not" Insertion, if negation were implemented) is called for. After the tense has been affixed, SAI may move the inflected auxiliary. Have/Be Raising is a convenience in the specification of SAI, so it precedes SAI.

The reverse-transformational cycle is exactly the reverse of the above, with the exception that COMP Insertion and Modal Insertion are not performed; that is, COMP "deletion" and Modal "deletion" are not necessary in the reconstruction of D-structure from surface structure. They are effectively deleted during the translation stage. Further, some auxiliary actions are performed, such as θ-extraction (Section 6.4.4), that do not have counterparts in the transformational cycle.

For Spanish, the transformational cycle includes:

(24) a. Move-α
     b. Null Subject
     c. Move Affix
     d. COMP Insertion
     e. Verb Fronting

Move-α applies first, after which the Null Subject option may apply.
Verb Fronting is applied to the matrix clause of yes/no questions and to any clause in which Move-α has moved a wh-phrase to COMP. However, before Verb Fronting occurs, Move Affix generates the verbal inflections. As in English, COMP Insertion may occur at any time. The reverse transformational cycle performs these transformations in reverse order, again bypassing COMP Insertion.

Chapter 5

CONDITIONS ON REPRESENTATIONS

Representations of the different levels of linguistic structures (i.e. D-structure, S-structure, LF, etc.) must satisfy certain well-formedness conditions to meet standards of grammaticality. In the previous chapters we've covered the general representations of D-structure and S-structure and the rules for mapping between the two. We saw that D-structure was unique to a sentence and that multiple S-structures were related to the D-structure through the basic transformation Move-α, the other transformations being highly idiosyncratic and relatively uninteresting. Move-α was seen to obey the principle of subjacency, a condition on movement rather than on representation. See Huang (1984) and Chomsky (1982) for argumentation supporting this viewpoint.

In this chapter we look at the conditions that determine the well-formedness of D- and S-structures, and how these conditions are implemented in Prolog. Certain conditions are expressed as filters on over-generation, specifically designed to filter out ill-formed S-structure constructions. Only two such filters are modeled here, the Doubly-Filled COMP Filter and the Case Filter. Other well-formedness conditions are expressions of the principles of grammaticality, also implemented in the form of filters. These include the Binding Conditions and the Empty Category Principle. The Extended Projection Principle and θ-Criterion, discussed first, are satisfied by virtue of the structural design of the model; no explicit filtering mechanisms are used.
5.1 Extended Projection Principle and Theta-Criterion

The Extended Projection Principle and θ-Criterion interact to constrain both S-structure and D-structure. The two principles guarantee conservation of θ-roles among the different levels of representation within the X-bar system of phrase structure. This correspondence is maintained by the grammar rules, which are driven by subcategorization frames within the lexicon, and by the fact that lexical categories in θ-bar positions may be related to one and only one empty θ-position. Elements θ-marked at D-structure move within the sentence structure, leaving coindexed traces in each occupied location. The indexed chain so formed contains at most one lexical phrase ("at most" since it may consist of only non-lexical PRO), and each structural position is represented in only one chain. This ensures that a definable path, not necessarily unique, exists from S-structure back to D-structure, and that the thematic properties expressed at D-structure are present in any S-structure configuration. Given that only grammatically correct sentences are analyzed by this system, the well-formedness of the representations is maintained with respect to the Extended Projection Principle and the θ-Criterion.

5.2 Doubly-Filled COMP Filter

In the course of wh-movement, the COMP node may contain a wh-phrase in specifier position and a lexical complementizer in head position. Such a configuration is ruled out at S-structure by the Doubly-Filled COMP Filter (DFCF) (Chomsky and Lasnik 1977):

(1) *[COMP α β]

where α is assumed to contain a wh-phrase and β a complementizer, as in:

(2) *...[the man [S' [COMP who_i that] [S you met t_i]]]

Either α or β (or both) must delete in order for the clause to be grammatical.
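The configuration in (1) can be sketched as a small self-contained Prolog check. This is a minimal illustration only: it assumes the nested-list encoding of phrases used elsewhere in this model, with a COMP phrase of the form [[comp,Features], Specifier, [PreHead, Head | Rest]], and it assumes that the atom e marks an empty position. The names comp_ok and filled are hypothetical and are not taken from the model.

```prolog
% Sketch only (hypothetical names, assumed encoding).
% A COMP phrase is taken to be [[comp,Features], Spec, [Pre, Head | Rest]].

% filled(+X): X is neither the bare atom e nor a phrase of the form [Node, e].
filled(X) :-
    X \== e,
    X \= [_, e].

% comp_ok(+Comp): succeeds unless the specifier and the head are both filled.
comp_ok([[comp,_], Spec, [_, Head | _]]) :-
    \+ ( filled(Spec), filled(Head) ).
```

Under these assumptions, a COMP with who in the specifier and that in the head is rejected, while a COMP with only one of the two filled is accepted.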
We might have achieved the same result by stipulating that a wh-phrase moves to the head of COMP, thereby preventing a lexical complementizer from simultaneously appearing in head position. However, a couple of problems result from this analysis. One is that there exist languages for which the DFCF does not hold (Pesetsky 1982). Another is that wh-movement through COMP is assumed to leave a trace in COMP, even though a complementizer may be present in certain circumstances:

(3) Who_i do you think [S' [COMP t_i that] [S … saw t_i]]

Chomsky (1981) proposes that this is also ruled out by the DFCF, and that either the trace or the complementizer that must delete. While it might be agreed that the trace is deletable, especially if the DFCF is a filter in the PF component, the analogue to that in Spanish (que), for example, is not deletable:

(4) a. A quién_i piensas [S' [COMP t_i que] [S vio Guillermo t_i]]
    b. *A quién_i piensas [S' [COMP t_i] [S vio Guillermo t_i]]

Therefore the head position of COMP must be reserved for a complementizer for those languages that require it, and the specifier position of COMP may still be used to contain a wh-phrase, preserving the derivational history of movement in the syntax. The DFCF, then, becomes a condition on the representation of S-structure, where the COMP may not contain α and β, both lexical. As a Prolog clause, this is stated simply as:

(5) dbl_comp_filter([[comp,_], WH, [_,COMP,_]]) :-
        lexical(WH), lexical(COMP), !, fail.

The argument to dbl_comp_filter is the S-structure of the clause, a COMP2 phrase. The specifier is WH and the head is COMP, all other constituents being immaterial. If both WH and COMP are lexical, where lexical is defined as:

(6) lexical([X,e]) :- !, fail.

i.e. not empty, then dbl_comp_filter fails and the S-structure is rejected.

5.3 Case Filter

The Case Filter basically states that all lexical NPs must have Case.
This is formulated in terms of the chain C = (α₁, ..., αₙ), where each αᵢ is an NP in an argument position (A-position) and αᵢ binds αᵢ₊₁, thus sharing an index. Furthermore, for all i > 1, αᵢ is an EC. The head of the chain, α₁, is either in a Case-marked position or PRO, and no other αᵢ in the chain may be Case-marked. The head of the chain, then, must either be lexical or PRO. The chain is termed an A-chain since it involves only A-positions.

The implementation of the Case Filter involves inspecting every A-position and verifying that lexical NPs in these positions are Case-marked. The A-positions of interest are the subjects of clauses and objects of verbs. Objects of prepositions are always assigned Case by the preposition, and may not move to another Case-assigned position;[15] therefore, they do not need explicit verification.

The subject of a clause may receive Case in one of two ways: through the mechanism of Exceptional Case-Marking (ECM) or through [+TNS] in INFL, the former assigning objective Case, the latter nominative. If the subject is lexical, it must be in one of these two environments. If it is not lexical, the Case Filter does not apply; other principles will determine the status of the EC.

Exceptional Case-Marking is an idiosyncratic feature of English that allows for sentences such as:

(7) a. I believe (that) John is my friend.
    b. I believe John to be my friend.

In (a), John receives nominative Case from the tensed embedded INFL. In (b), nominative Case is not assigned since the embedded clause is infinitival, which normally prevents lexical NPs from appearing in subject position, as in:

(8) *I think John to be my friend.

The transitive verb believe "deletes" the subcategorized S' node and thus is able to directly govern the embedded subject, assigning it Case by virtue of transitivity.
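The chain condition stated at the start of this section can be illustrated with a short Prolog sketch. It is not the model's representation: a chain is assumed here to be a list of link(Kind, Case) terms, head first, where Kind is one of lexical, pro (standing for PRO), or ec, and Case is cased or uncased. All of these names are hypothetical.

```prolog
% Sketch only: chains as lists of link(Kind, Case), head first.
% Kind: lexical | pro | ec.   Case: cased | uncased.   (Assumed encoding.)

% chain_ok(+Chain): the head is a Case-marked lexical NP or is PRO,
% and every later link is an empty category without Case.
chain_ok([Head | Tail]) :-
    head_ok(Head),
    tail_ok(Tail).

head_ok(link(lexical, cased)).   % lexical head in a Case-marked position
head_ok(link(pro, uncased)).     % PRO heads a chain without Case

tail_ok([]).
tail_ok([link(ec, uncased) | Rest]) :-
    tail_ok(Rest).
```

On this encoding, a lexical subject binding a single Caseless trace forms a well-formed chain, whereas a chain whose trace also sits in a Case-marked position, or a lexical NP with no Case at all, is rejected.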
The Case Filter is implemented by separating the analysis of Case into two stages, one for analyzing the subject position, the other for object position:

(9) case_filter(L, Sstructure) :-
        subj_position(L, Sstructure),
        obj_position(L, Sstructure).

[15] Restructuring of a preposition with the verb may occur in certain cases, such as This task should not have been taken on. The restructured verb take on assigns Case, not the preposition. However, this model does not contain cases of such restructuring.

The two configurations in which Case is assigned to the subject position are represented by the following Prolog clauses:

(10) subj_pos(L, [[v,_],_,[_,V,[[comp,_],e,[e,e,[[infl,FI],NP,_]]]]]) :-
         lexical(NP), member(tns(-),FI), !, ecm(L,V).
     subj_pos(L, [[infl,FI],NP,_]) :-
         lexical(NP), !, not member(tns(-),FI).

In the first subj_pos statement, the configuration for ECM is presented, where a VP dominates a clause COMP which contains no wh-phrase, trace, or complementizer. Then, if the subject is lexical and INFL contains the feature [-TNS], ECM must obligatorily apply. (The cut symbol (!) prevents backtracking to the next subj_pos statement.) The ecm statement is defined as:

(11) ecm(english,V) :-
         is_sdeleter(english,V,v),
         is_transitive(english,V).

where is_sdeleter scans the lexical entry of the verb for the feature 'sdel(+)', and is_transitive checks verbs for subcategorization of NP. ecm is present only in the core grammar for English; Spanish has no ECM properties.

In the second subj_pos statement, a lexical subject not in an ECM environment must be in a tensed clause, i.e. the features of INFL must not contain [-TNS]. Appropriate Prolog clauses are added to recursively search through the phrase structure for the two subject Case-marking configurations.
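The recursive search mentioned above is not shown in the text. One way such a search might look is sketched below, assuming only that phrases are nested Prolog lists as in (10). The predicates subphrase and untensed_lexical_subject are hypothetical names introduced for this illustration, and the test for a non-empty subject is simplified to a comparison with the atom e.

```prolog
% Sketch only (hypothetical helpers; phrases are nested lists as in (10)).

% subphrase(+Phrase, ?Sub): Sub is Phrase itself or any list nested inside it.
subphrase(P, P) :-
    is_list(P).
subphrase(P, Sub) :-
    is_list(P),
    member(Child, P),
    subphrase(Child, Sub).

% untensed_lexical_subject(+S, -NP): NP is a non-empty subject of some
% INFL phrase inside S whose feature list contains tns(-).
untensed_lexical_subject(S, NP) :-
    subphrase(S, [[infl,FI], NP, _]),
    NP \== e,
    memberchk(tns(-), FI).
```

Each subject found this way is a candidate for the ECM check: under the assumptions above, a structure for I believe John to be my friend would yield the embedded subject John, which must then be licensed by an S'-deleting transitive verb.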
Objective Case is always assigned by transitive verbs, as for prepositions, except in the passive construction, where the transitive verb is "de-transitivized" and fails to assign Case to its object. Therefore if the object of a transitive verb is lexical but the phrase is marked for passive, the configuration will be ruled out. The Prolog clause is defined as:

(12) obj_pos([[v,FV],_,[_,_,[[n,_],_,_]|_]]) :-
         not member(pass(+),FV).

This situation occurs if, for example, Move-α fails to move the NP out of object position, as in:

(13) *(It) was taken a book from the shelf.

The NP a book must obligatorily move to subject position, where it will receive Case from the tensed INFL.

5.4 Binding Conditions

The relation between an NP and a possible antecedent is expressed by the binding conditions (and by the theory of control for PRO). As stated in Section 2.2.5, the binding conditions hold over lexical NPs and non-lexical NPs, with NP-traces conforming to Condition A for anaphors, pro following Condition B for pronouns, and variables conforming to Condition C for names. Although in principle all NPs, whether lexical or non-lexical, receive an index whose value is determined by the binding conditions, only NPs that participate in movement are explicitly checked for valid binding, since only they are given indices in this system. The binding of all other (static) NPs determines their reference for interpretation purposes, a semantic operation at LF, and is therefore not of interest here. Consequently, the implementation of the binding conditions is limited to Condition A for NP-traces and Condition C for variable traces, expressed by the following top-level Prolog statement:

(14) binding_conditions(L, Sstructure) :-
         condition_A(L, Sstructure),
         condition_C(L, Sstructure).

For Condition A, an NP-trace may occur either in subject position or in object position.
If the subject is an NP-trace, then it must be governed by a predicate with the S'-deletion property. It may not be governed by [+TNS], thereby preventing a sentence such as:

(15) John seems [ t has left ]

(Note that (15) is more properly a Case Filter violation since the chain of John and its trace is receiving Case twice. However, the implementation of the Case Filter only applies to lexical NPs and variables, not NP-traces.) The Prolog statement of Condition A for subject position is given in (16):

(16) condition_A(L, [[X,_],_,[_,XW,[[comp,_],WH,[e,COMP,[[infl,FI],NP,_]]]]]) :-
         is_nptrace(NP), !,
         WH=e, COMP=e,
         member(tns(-), FI),
         is_sdeleter(L, XW, X).

No explicit index matching is performed, since in this model if an NP-trace is in subject position, it can only be bound to an antecedent in the next higher subject position, the only possible available A-position. Empty object positions as landing sites for an antecedent are excluded by the Projection Principle, and more distant subject positions are excluded by subjacency.

For an NP-trace in object position, its antecedent must be the subject, a result of passive formation. Here, explicit index matching is required. Otherwise, the D-structure in (17a) could generate the sentence in (17b), in which movement obeys subjacency and the trace is properly governed, but the trace is not bound in its governing category, violating Condition A:

(17) a. [ e be likely [ Mary [-TNS,+PERF] see John ]]
     b. John_i is likely Mary to have seen t_i

The statement for this condition is given below, where INFL represents the governing category for the trace in object position:

(18) condition_A(L, [[infl,_],NP1,[_,_,[[v,_],_,[_,_|Args]]]]) :-
         member(NP2, Args), is_nptrace(NP2), !,
         index(NP1, I), index(NP2, J),
         I == J.

In (18), each argument of the verb is checked for the presence of an NP-trace.
If none is found, Condition A succeeds by default. Otherwise the index of the trace must match the index of the clausal subject. As these are the only two possibilities for an NP-trace to occur, a further check is made to verify that no other EC in the structure has the features of an NP-trace, i.e. [+ANAPHOR, -PRONOUN].

As for Condition C, a violation will occur if a variable is bound to a c-commanding phrase in an A-position. As with Condition A, object positions are excluded as possible antecedents by the Projection Principle, leaving subjects as the only possible candidates. A violation could arise in the following instance, where (19a) gives the D-structure and (19b) a possible derivation:

(19) a. [ e seem [ e [ who like Madonna ]]]
     b. *[ Who_i seems [ [ t_i likes Madonna ]]]
     c. [ Who_i [ does it seem [ t_i [ t_i likes Madonna ]]]]

In (b), who moves to COMP by wh-movement, and in the next cycle to the θ-bar subject of the matrix clause. The variable in the embedded subject position is now bound to the matrix subject, violating Condition C. Compare this with the grammatical sentence in (c), where the embedded subject trace is bound to who in COMP, an A-bar position, and therefore not in violation of the binding conditions.

The relevant statements that constitute the basic formulation of Condition C are given below. As with the implementation of the other well-formedness conditions, other statements not shown are also involved in recursively searching clausal constituents for binding condition violations.

(20) condition_C(L, [[comp,_],_,[_,_,[[infl,_],NP,[_,_|VP]]]]) :-
         var_free(NP, VP).

     var_free(NP, Args) :- index(NP, I), !, free(I, Args).
     var_free(_, _).

     free(I, [H|T]) :- free_arg(I, H), free_arg2(I, H), free(I, T).
     free(_, []).

     free_arg(I, X) :- not (is_variable(X), index(X, J), I == J).
The var_free statement determines if the subject has an index; if so, then any embedded variable must not be bound with the same index. The free statement makes sure that each argument of the verb does not share the index of the subject (checked by free_arg), and that no embedded phrase is bound to the subject (checked by free_arg2). The free_arg statement will succeed unless the phrase under inspection is a variable whose index matches that of the subject.

5.5 Empty Category Principle

The ECP (or Extended ECP (Chomsky 1982)) states the conditions on empty categories (ECs):

(21) Extended Empty Category Principle
     (i)  An EC is a trace if and only if it is properly governed.
     (ii) An EC is PRO if and only if it is ungoverned.

NP-traces and wh-traces must be properly governed by (i), where proper government is a more restricted notion of government than that presented in Section 2.2.4. The EC pro is not subject to the ECP but rather depends on Condition B of the binding conditions, being a pronominal, and on its identifiability; that is, if deleted its identification must be recoverable. In Spanish, the verbal inflection is generally sufficient to identify the missing pronoun. (Note that this applies to pro in subject position, not object pro, which identifies with a clitic.) The distribution of the remaining EC, PRO, is described by (ii) of the ECP, which restricts it to the subject of infinitival clauses.

Certain configurations are classed as proper governors, possibly subject to parametric variation. Thus, for English, [+TNS] governs the subject but does not properly govern it. For Spanish, [+TNS] is included in the set of proper governors but prepositions are not.[16] The "core grammar" components of the system contain the following Prolog unit clauses for defining the proper governors:

(22) proper_governor(english,v).
     proper_governor(english,p).
     proper_governor(spanish,v).
     proper_governor(spanish,infl).

[16] The inclusion of [+TNS] as a proper governor for languages such as Spanish was proposed by Chomsky (1981) but later rejected. It is included here as a proper governor since it simplifies the implementation for the effects we wish to achieve.

In addition to the above categories, a wh-phrase or its trace in COMP is implicitly included as a proper governor.

With these descriptions of proper government, the ECP is implemented by enumerating all the structural configurations in which an EC may occur and verifying that the position meets the ECP. These are listed below, and include (1) proper government by an S'-deleting predicate, (2) proper government by a wh-element (overt or trace) in COMP, (3) proper government by [+TNS] (for Spanish), (4) proper government by V or P of its arguments, and (5) proper government by a lexical head of any adjuncts.[17] The latter case arises, for example, in wh-movement of a non-argument, as in (23):

(23) At which dance did Kevin meet Marlene t?

The wh-phrase is not subcategorized by meet, but the trace is nevertheless properly governed.

[17] This is perhaps better described as antecedent government, not lexical government (Lasnik and Saito 1984).

The principal statements of the ECP are the following:

(24) ecp(L, [[X,_],_,[_,XW,[[comp,_],e,[e,e,[[infl,_],NP,_]]]]]) :-
         not lexical(NP),
         is_sdeleter(L, XW, X),
         not is_PRO(NP).

     ecp(L, [[comp,_],WH,[e,e,[[infl,_],NP,_]]]) :-
         index(WH,_),
         not is_PRO(NP).

     ecp(L, [[infl,FI],NP,_]) :-
         not lexical(NP),
         not member(tns(-), FI),
         proper_governor(L, infl).

     ecp(L, [[X,_],_,[_,_|Args]]) :-
         member(XP, Args), is_trace(XP), !,
         proper_governor(L, X).

     ecp(L, [[X,_],_,_|PostAdj]) :-
         member(XP, PostAdj), is_trace(XP), !,
         proper_governor(L, X).

Chapter 6
EXECUTION

The overall purpose of the system is to read grammatical sentences in a source language, convert them to D-structures, translate the D-structures into a target language, and then generate grammatical sentences in the target language, relying on the principles of transformational grammar. Since many transformations are optional, the system will perform all possible transformations on a single D-structure, producing an output sentence for each option. Likewise, a sentence in the source language may have multiple translations, and for each translation the complete set of transformational options is applied. The high-level operation of the system is defined by the following Prolog program:

(1) model :-
        readinput(SL, TL, PF1),
        gb(SL, PF1, TL, PF2),
        printout(PF2).

    gb(SL, PF1, TL, PF2) :-
        morph(SL, PF1, PF1a),
        parse(SL, PF1a, Sstructure1),
        rtransformation(SL, Sstructure1, Dstructure1),
        translation(SL, Dstructure1, TL, Dstructure2),
        transformation(TL, Dstructure2, Sstructure2),
        pf_generation(TL, Sstructure2, PF2).

Following input of the source language, target language, and source sentence, the gb predicate derives the target sentence, which is then printed. The six clauses of gb have the following interpretation:

1. Perform a morphological analysis on the source sentence PF1, converting it to a list of lexical tokens PF1a suitable for parsing.
2. Parse the list of tokens, creating an S-structure representation.
3. Perform reverse-transformations on the S-structure, converting it to D-structure representation.
4. Translate the source language D-structure into a D-structure in the target language.
5. Perform transformations on the D-structure to produce an S-structure.
   It is in this stage that the well-formedness conditions apply to filter out ungrammatical S-structures.
6. Generate a list of tokens PF2 from the S-structure.

The implementation uses the built-in Prolog predicate bagof in step 4 to collect all possible translations, which are passed one at a time to the transformation routine. The bagof predicate is again used in step 5 to produce multiple transformations on each of the translations. The eight clauses of the model are described below.

6.1 Sentence Input

The three-argument readinput clause prompts for the source language, target language, and sentence in the source language. The sentence is represented as a list of words, or tokens, separated by commas. Final punctuation is also represented as a token. All upper-case characters are converted to lower-case. The procedure for reading each character, combining characters into words, and stringing words into a list, is taken directly from Clocksin and Mellish (1981:87-88). The following illustrates a typical dialogue:

    Enter source language: spanish.
    Enter target language: english.
    Enter sentence in spanish: Quién dijiste que salió?

As a result of the above dialogue, the readinput clause would have its three arguments instantiated as follows:

(2) readinput(spanish, english, [quién,dijiste,que,salió,?]).

(Note that diacritical marks appear over the letters in the examples here and elsewhere to improve readability. In practice, given the English keyboards available, diacritics immediately follow the letters over which they are to appear, as can be seen in the examples in Appendix A.)

6.2 Morphological Analysis

The morph clause analyzes the list provided by readinput and updates the dictionary if necessary. It also analyzes contractions and collocations, forming a new list of tokens to be parsed.
The operation of morph is defined by the following three clauses:

(3) morph(SL, Surface, Newsurface) :-
        contract(SL, Surface, Surface1),
        collocate(SL, Surface1, Newsurface),
        morphwords(SL, Newsurface).

The first clause, contract, uses the contraction table (cf. 3.1.3) to split a contraction into its constituent words. It constructs an output list of tokens identical to the input list except that any contractions are expanded, or "de-contracted". For example, the Spanish sentence Yo pongo los libros al lado del radio 'I put the books beside the radio' would evaluate to the following:

(4) contract(spanish,
        [yo,pongo,los,libros,al,lado,del,radio,.],
        [yo,pongo,los,libros,a,el,lado,de,el,radio,.]).

The second clause, collocate, forms collocations on the words supplied by the contract routine. It uses the collocation table (cf. 3.1.4) to generate a new list of tokens that includes collocated entries. Using the same example as above, the collocate clause would evaluate to:

(5) collocate(spanish,
        [yo,pongo,los,libros,a,el,lado,de,el,radio,.],
        [yo,pongo,los,libros,al_lado_de,el,radio,.]).

The third clause, morphwords, performs a morphological analysis on each token in the list. Three options apply: (1) the token is already in the dictionary, in which case processing continues with the next token; (2) a suffix analysis is undertaken on the token to see if it is an inflected form of an existing lexical entry; or (3) a prompt is issued on the computer terminal to allow the addition of the word to the dictionary.

The suffix analysis uses the suffix table (cf. 3.1.2) to determine if the word is a regular inflection of a root word. For each entry in the suffix table, an attempt is made to split the word into a stem segment followed by the suffix, with letters appended to the stem if present in the suffix table. If the (appended) stem is in the dictionary, then the current token is added with the features specified for it in the table.
If not, the next entry in the suffix table is selected, continuing in this manner until a match is found or the table is exhausted. Once an inflected form is added to the dictionary it remains until the computer session is terminated.

An example of suffix analysis is given below for the Spanish word viviendo 'living'. A segment of the suffix table includes the following entries:

(6) suffixtable(spanish,ando,ar,v,[prog(+)]).
    suffixtable(spanish,iendo,er,v,[prog(+)]).
    suffixtable(spanish,iendo,ir,v,[prog(+)]).

The root entry in the dictionary contains:

(7) dict(spanish,vivir,v,[english(live),...]).

The first entry in the suffix table, -ando, fails to match with the ending of viviendo. The second entry succeeds, but when the stem viv is appended with -er, the result viver is not listed in the dictionary, so the next entry in the table is chosen. This also succeeds, and since the result vivir does exist, the dictionary is updated with:

(8) dict(spanish,viviendo,v,[f(prog(+)),root(vivir)]).

If the suffix analysis fails, an opportunity is presented for including the word in the dictionary, either permanently or just for the duration of the current session. The lexical category and relevant features are requested, and the dictionary is appropriately updated.

6.3 Parsing

The parse routine converts the string of tokens into a phrase marker representing the S-structure of a clause. The clause is represented by a COMP2 phrase, and parsed by calling the x2 routine with X=comp:

(9) parse(L, PF, Sstructure) :- x2(L, comp, Sstructure, PF, []).

COMP subcategorizes an INFL phrase, as indicated in the lexical entry for complementizers, or by default, and INFL subcategorizes a VP. So parsing continues from one phrase to another as directed by the subcategorization information for each lexical head.
Parsing will succeed as long as the subcategorization requirements are met, with ECs inserted if necessary. The main purpose of the parsing component in this model, then, is simply to satisfy the Extended Projection Principle without regard to other rules or principles of grammar, such as subject-verb agreement. The following is the instantiation of the parse statement after parsing the sentence Who did you say you saw?

(10) parse(english, [who,did,you,say,you,saw,?],
       [[comp,[]],
        [[n,[]],e,[e,who]],
        [e,e,
         [[infl,[]],did,
          [[infl,[]],
           [[n,[]],e,[e,you]],
           [e,e,
            [[v,[]],e,
             [e,say,
              [[comp,[]],e,
               [e,e,
                [[infl,[]],
                 [[n,[]],e,[e,you]],
                 [e,e,
                  [[v,[]],e,
                   [e,saw,[[n,[]],e]]]]]]]]]]]]],
        [mode,ques]]).

6.4 Reverse Transformations

The function of rtransformation is to convert the S-structure to D-structure. As D-structure is a pure representation of the θ-marking properties of the clause, a function of rtransformation is to return all lexical phrases in θ-bar positions to their base-generated θ positions. A second function is to determine the feature specifications of each phrase by percolating and merging features from lexical constituents, validating feature agreement in the process. The other functions reverse the effects of the various other transformations that are required in forming well-formed surface structures.

For the most part, reverse-transformations are, in fact, transformations performed in reverse. That is, if a transformation inserts a lexical item, such as pleonastic it, the reverse-transformation will remove it. If Do Support is required before inversion takes place, the reverse-transformation sequence is to undo the inversion, then remove the do-formative.
Consequently, the components that perform transformations and reverse-transformations call upon many of the same routines but with interchanged parameters. The transformational cycle is also reversed in rtransformation, so that all reverse-transformations are performed on the matrix clause first, then on each embedded clause in descending order.

As a recursive function, rtransformation calls all the reverse-transformations for one cycle. It operates functionally on various stages of S-structure, with each subroutine converting the previous S-structure to a modified S-structure, passing it to the next subroutine, until the D-structure representation is obtained for that cycle. Then the process repeats on the embedded clause. The operation of rtransformation is given by the following:

(11) rtransformation(L, COMP1, COMP11) :-
         rinversion(L, COMP1, COMP2),
         percolate(L, COMP2, COMP3),
         rmove_affix(L, COMP3, COMP4),
         theta_extraction(COMP4, COMP5),
         do_support(L, _, _, COMP6, COMP5),
         it_insertion(L, COMP7, COMP6),
         null_subject(L, COMP8, COMP7),
         rmove_alpha(COMP8, COMP9),
         wh_feature_extraction(COMP9, COMP10),
         rtransform_lower(L, COMP10, COMP11).

Each function is described below, although some, such as Do Support, It Insertion, and Null Subject, are simply the reverse of the transformations described in Chapter 4. If the structural configuration for these transformations is present in the surface structure, the corresponding structural change occurs. Otherwise, the structure remains unchanged. Notice that the parameters are reversed in these three transformations.

6.4.1 Rinversion

The rinversion clause for English reverses the effect of SAI, then reverses Have/Be Raising. If an adjunct has been parsed, SAI will return the contents of the adjunct to the base INFL head position.
Then if the head of INFL is a perfective or progressive auxiliary, it bumps the current auxiliary to second auxiliary position, and the auxiliary in INFL is moved into the primary auxiliary position, leaving the head of INFL empty. The rinversion statement for English is defined as:

(12) rinversion(english, COMP1, COMP3) :-
         sai(_, _, COMP2, COMP1),
         havebe_raising(COMP3, COMP2).

The parameters are reversed in each clause, with the input of the reverse transformation being the same as the output of the "forward" transformation.

For Spanish, rinversion is defined in terms of the verb-preposing transformation. If an adjunct has been parsed, its contents are restored to their base position according to the pattern defined by the surface structure and adjunct pair, as described in Section 4.4.4. Otherwise the structure remains unchanged. The clause that defines rinversion for Spanish is the following:

(13) rinversion(spanish, COMP1, COMP2) :-
         v_preposing(COMP2, COMP1), !.

The v_preposing clause simply forms the four-element lists from the base structure, surface structure and adjunct, for use by the pattern-matching unit clauses in v_prepose (cf. 4.4.4). The arguments to v_preposing are given in reverse order.

6.4.2 Percolate

The node position of each phrasal category contains the name of the phrase and a list of features for that phrase. Initially, the list is null. The function of percolate is to assign to the list the merged set of features of the head and specifier, which must of course agree. Percolation is carried out on all phrasal categories except VP, which has its own "percolation" routine, rmove_affix. Percolation for COMP, INFL, PP, and AP is trivial, since in the latter two cases there are no specifiers with features to percolate (in this model), and in the former two the head contributes no features that would clash with the specifier.
The primary function of percolate, then, is to establish agreement between head nouns and articles with respect to person, number, and, for Spanish, gender. Subject-verb agreement occurs within rmove_affix. Another function of percolate is to replace inflected words (other than verbs) with their root form in preparation for translation. An illustration of percolate is shown below for the sentence I saw the ships.

(14) percolate(english,
         [[comp,[]],e,[e,e,
           [[infl,[]],
             [[n,[]],e,[e,i]],
             [e,e,
               [[v,[]],e,[e,saw,
                 [[n,[]],the,[e,ships]]]]]]],
           [mode,decl]],
         [[comp,[]],e,[e,e,
           [[infl,[per(1),pl(-)]],
             [[n,[per(1),pl(-)]],e,[e,i]],
             [e,e,
               [[v,[]],e,[e,saw,
                 [[n,[per(3),pl(+)]],the,[e,ship]]]]]]],
           [mode,decl]]).

6.4.3 Rmove-Affix

The rmove_affix clause moves tense information from the VP to INFL, undoing the effect of affix-hopping. After the first stage of rmove_affix has applied, the verb will be in its uninflected form, and the tense and agreement features of the verb will have percolated to the V2 level. A final step guarantees subject-verb agreement by merging the features of V2 with those in INFL2.

Tense information includes the tense itself (past, present or future), aspect (perfective or progressive), and voice (passive or, by default, active). These features are determined from the morphology of the verbal elements. A clause within rmove_affix lists all the combinations of auxiliaries with main verb and extracts the relevant features. It also ensures that the appropriate affixes are present for each aspect. Thus a perfective auxiliary requires the next verbal element to have the feature [+PERF]; a progressive auxiliary requires a following verbal with [+PROG]; and passive requires perfective morphology, i.e. [+PERF].

One consequence of the passive voice is that the subject is not θ-marked.
Therefore, if passive morphology is detected, the feature [-THETA] is added to the feature list.

An additional function of rmove_affix is to assign a default tense if none is indicated. Since Spanish exhibits inflection for every tensed form, the absence of an explicit feature means that the first verbal element is infinitival. In English, absence of tense indicates present tense, with person and number features taken from the subject.

6.4.4 Theta-Extraction

Non-θ-marked subjects of clauses will be returned to their base position by the rmove_alpha function. If a predicate contains the [-THETA] feature, the reverse movement will be performed. In anticipation of this action, the [-THETA] feature is transferred up to the INFL feature set. The reverse movement will then be triggered simply by inspecting the features in INFL.

For verbs with this feature (e.g. seem), the transfer will be simultaneous with the transfer of tense and agreement features to INFL during rmove_affix. However, for adjectives (e.g. probable), the feature must be explicitly searched for and, if found, transferred to INFL. This is because the AP structure is embedded within the VP as a subcategorized phrase, and rmove_affix only moves features from the VP. The theta_extraction routine is defined below. The first parameter is the current S-structure and the second is the resulting base structure:

(15) theta_extraction(
         [[comp,FC],WH,[e,COMP,
             [[infl,FI],NP,[e,INFL,
                 [[v,FV],P1,[P2,V,
                     [[a,FA],e,AP]]|PostV]]]]|PostC],
         [[comp,FC],WH,[e,COMP,
             [[infl,[theta(-)|FI]],NP,[e,INFL,
                 [[v,FV],P1,[P2,V,
                     [[a,FA],e,AP]]|PostV]]]]|PostC]) :-
         member(theta(-), FA), !.

6.4.5 Do-Support

The do_support function deletes the "quasi-modal" do from the head of INFL, as described in Section 4.4.3. As a reverse-transformation, the parameters are listed in reverse order.
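The practice of running a transformation backwards by listing its parameters in reverse order can be illustrated with a small sketch. It is not the thesis code: INFL is reduced here to a two-field term infl(Features, Head), the predicate names do_support_fwd and rdo_support are hypothetical, and the additional arguments that the real do_support takes (language, matrix flag, mode) are omitted.

```prolog
% A minimal sketch under the assumptions stated above.
% Forward direction (generation): an empty INFL head receives the quasi-modal do.
do_support_fwd(infl(Features, e), infl(Features, do)).

% Reverse direction (parsing): the same relation with its arguments swapped,
% so a surface do is mapped back to an empty head.
rdo_support(Surface, Base) :-
    do_support_fwd(Base, Surface).

% Example query:
% ?- rdo_support(infl([tns(past)], do), Base).
% Base = infl([tns(past)], e).
```

Because the forward clause is a pure relation over its two structures, no separate "undo" logic is needed; unification does the work in either direction.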
6.4.6 It-Insertion

The it_insertion function replaces it with an empty NP if the [-THETA] feature is present in INFL, as described in Section 4.5.

6.4.7 Null-Subject

The null_subject function has no effect in English as a reverse-transformation. For Spanish, an attempt will be made to lexicalize a null subject. If the verbal inflection indicates first or second person, the subject is unambiguously determined, and the EC is appropriately replaced. A third person inflection may indicate a variable, so a null subject in this case remains an EC, pending later analysis. See Section 4.3.

6.4.8 Rmove-Alpha

By the θ-Criterion and the Projection Principle, movement may only be to a θ̄ (non-θ) position. For the cases of Move-α under consideration in this model, the only θ̄ positions are COMP and the subject of non-θ-marking predicates. The rmove_alpha clause consists of (1) movement from COMP, followed by (2) movement from subject:

(16) rmove_alpha(COMP1, COMP3) :-
         rmove_comp(COMP1, COMP2),
         rmove_subj(COMP2, COMP3).

For movement from COMP, the phrase must be either an NP or PP with the feature [+WH]. Possible destinations are (1) an empty subcategorized argument, (2) an empty NP object of a preposition, (3) an empty subject, or (4) an embedded COMP. The options are taken in that order. Options (1) and (2), and (3) for English, are required in order to satisfy the θ-Criterion. ECs in these positions must be filled. For option (2), the PP may be a subcategorized argument or an adjunct.

Option (3) poses difficulty for Spanish because of the null subject parameter. Substituting a wh-NP for an EC subject is premature if an embedded obligatory position remains empty. Unfortunately the current system does not have adequate backtracking ability to recover from this situation, so input sentences with this characteristic are avoided. In the case that a sentence has two null subjects (e.g. Quién piensa que salió?
'Who thinks (he/she) left?'), the first will be replaced by the NP in COMP. For both English and Spanish, the wh-NP must agree in features with the verb, which for English represents a grammaticality check, while for Spanish it helps to further distinguish between a null subject and a variable subject.

If none of options (1-3) are available, then for NPs, option (4) is obligatory (assuming that NPs must always receive a θ-role). For PPs, option (4) will be selected unless no embedded COMP is available, in which case an adjunct position to the VP will be created to contain the wh-PP. (NPs may also have PP adjuncts, but this possibility is ignored here.)

Movement from subject takes place if a lexical NP occupies a non-θ-marked subject, indicated by the [-THETA] feature. If the clause is marked for passive, then movement must be to an empty object. Otherwise an empty lower subject must be available as a destination.

6.4.9 WH-Feature-Extraction

The purpose of this function is to merge a [+WH] feature into the feature list for a PP if its NP object contains the [+WH] feature. The need for this arises in sentences such as:

(17) a. To whom did they give glass beads?
     b. Who(m) did they give glass beads to?

In both cases, the NP object who(m) has the feature [+WH] but the preposition to does not. In order for the prepositional phrase to whom to move to COMP, it must have the [+WH] feature. The wh_feature_extraction routine finds and extracts the features of the NP objects of all PPs. If the features of NP contain [+WH], then [+WH] is merged with the features of PP. The part of the function that performs this action is shown below:

(18) wh_feature_extraction([[p,F],e,[e,P,NP]],
                           [[p,[wh(+)|F]],e,[e,P,NP]]) :-
         features(NP, FNP),
         member(wh(+), FNP),

6.5 Translation

Translation is a complex process that involves many factors, including problems of syntax, semantics, intensionality, causality, etc.
Accurate high-quality translation of general text by machine is not yet available, and is an advanced skill even at the human level. Our interest in this system is in studying the applicability of a model of UG, augmented by the effects of parametric variation, to achieve grammatically correct translations. Subtleties of meaning are not of interest, but rather the comparative analyses of sentences that express essentially identical propositions.

Therefore we restrict attention to achieving sentence translations based on the semantic equivalence between lexical elements. For example, the English verb put is semantically equivalent to the Spanish poner; both subcategorize for and assign θ-roles to identical elements. We disregard other possible translations (e.g. meter, echar, colocar) as well as the myriad other expressions which use put (e.g. hard put to do something, put up with something, put to sleep, etc.). Similarly for nouns and adjectives where a single translation suffices; an English car translates to a Spanish coche, ignoring carro, auto, etc., and the adjective certain simply translates to cierto, ignoring seguro, etc.

Multiple translations are provided for some lexical items, especially prepositions, although in general only one will be correct in a given context. As it is often very difficult to distinguish the correct choice by automatic means, this system indiscriminately uses all the translations provided. Typical examples are the Spanish preposition de which means of or from, the English for which translates to por or para, and the Spanish possessive pronoun su which means his, her, its, their, and your (formal form). Consequently, some anomalous translations result from input sentences such as the following:

     Juan salió con su amigo de España.
(19) a. John left with his friend from Spain.
     b. John left with her friend from Spain.
     c. John left with its friend from Spain.
     d. John left with their friend from Spain.
     e. John left with your friend from Spain.
     f. John left with his friend of Spain.
     g. John left with her friend of Spain.
     h. John left with its friend of Spain.
     i. John left with their friend of Spain.
     j. John left with your friend of Spain.

Two methods are provided for translating phrases and idioms, since literal translations are almost always totally inappropriate. One is by using the collocation table, as mentioned in Section 3.1.4. The other is by storing the idiomatic expression in the lexical entry of the leading word. A typical example is kick the bucket, where kick contains as part of its lexical definition the expression the bucket. Whenever kick is encountered in the input sentence, a routine check is made first to see if an idiomatic expression can be matched with the subsequent wording. Examples of dictionary entries with multiple translations and idiomatic expressions are shown below:

(20) dict(spanish, de, p, [english([of,from])]).
     dict(english, kick, v,
          [expr([[the,bucket],spanish([morir,expr([estirar,la,pata])])]),
           subcat(n),
           spanish(patear)]).

In the entry for kick above, the first 'expr' term contains a list of two items; the first is a list of words that form the idiomatic usage, and the second is the Spanish equivalent, shown here as a list of two possible translations. One is the simple word morir 'die'; the second is itself an idiomatic expression, estirar la pata. If kick is not being used in its idiomatic form, then the Spanish patear is the translation, subcategorizing for an NP.

Translation takes place between D-structures, since D-structures directly represent the thematic relations in a sentence. In fact the primary motivation for recreating D-structure from S-structure is just to simplify the translation stage, which would otherwise involve an analysis of movement chains.
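The fan-out seen in example (19) follows directly from letting Prolog backtrack over every listed translation. The sketch below is a simplified illustration rather than the thesis code: it works word by word on a flat list instead of on D-structure phrases, and the gloss/2 table, the unaccented word forms, and the predicate names are all hypothetical.

```prolog
% Hypothetical flattened dictionary: Spanish word -> list of English renderings.
gloss(juan,   [john]).
gloss(salio,  [left]).
gloss(con,    [with]).
gloss(su,     [his, her, its, their, your]).
gloss(amigo,  [friend]).
gloss(de,     [of, from]).
gloss(espana, [spain]).

% one_of(?X, +List): X is an element of List (a local member/2,
% defined here so the sketch does not depend on any library).
one_of(X, [X|_]).
one_of(X, [_|Rest]) :- one_of(X, Rest).

% translate_words(+Source, -Target): pick one rendering per word.
% Every combination of alternatives is reachable on backtracking.
translate_words([], []).
translate_words([W|Ws], [T|Ts]) :-
    gloss(W, Alternatives),
    one_of(T, Alternatives),
    translate_words(Ws, Ts).

% Example query:
% ?- findall(T, translate_words([juan,salio,con,su,amigo,de,espana], T), All),
%    length(All, N).
% N = 10.
```

Five renderings of su combined with two of de give the ten outputs listed in (19); nothing in the sketch ranks or filters them, which mirrors the indiscriminate behaviour described above.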
The strategy is to translate the sentence phrase by phrase, independently of each other, starting at the matrix COMP2 level. The head of each phrase is translated first and then, for nouns, inflected according to the feature specifications of the phrase. (Verbal inflections are applied later by the Move Affix transformation.) Specifiers are translated next and inflected to match the head. All subcategorized phrases and adjunct phrases are translated similarly.

The translation function, recursively called for each maximal phrase structure, is shown below. The first argument to translate is the source language, the second argument is the list structure of the sentence, the third argument is the target language, and the fourth is the translated list structure:

(21) translate(SL, [[C,F1],Spec1,HD1|Post1],
               TL, [[C,F2],Spec2,HD2|Post2]) :-
         transl_x1(SL, C, F1, HD1, TL, F2, HD2),
         transl_spec(SL, C, Spec1, TL, F2, Spec2),
         transl_comp(SL, Post1, TL, Post2).

     transl_x1(SL, C, F1, [Pre,HD1|Complement1],
               TL, F2, [Pre,HD2|Complement2]) :-
         transl_head(SL, C, F1, HD1, TL, F2, HD2),
         transl_comp(SL, Complement1, TL, Complement2).

The example in (23) illustrates how the list structures would appear in translating sentence (22a) to (22b). Unessential detail has been omitted, and phrases are indented to improve clarity:

(22) a. How many books had Mary put beside her bed?
     b. Cuántos libros había puesto María al lado de su cama?
(23) [COMP
       [INFL: TNS(past), +PERF, PER(3), -PL   e
         [NP: PER(3), -PL   mary]
         [VP: TNS(past), +PERF   put
           [NP: PER(3), +PL   how-many book]
           [PP   beside
             [NP: PER(3), -PL   her bed]]]]
       MODE: ques]

     [COMP
       [INFL: TNS(past), +PERF, PER(3), -PL   e
         [NP: PER(3), -PL   maria]
         [VP: TNS(past), +PERF   poner
           [NP: PER(3), +PL   cuantos libros]
           [PP   al_lado_de
             [NP: PER(3), -PL   su cama]]]]
       MODE: ques]

6.6 Transformations

The transformational stage transforms each translated D-structure into a set (a "bag" in Prolog terminology) of S-structures, using the transformations described in Chapter 4. Each S-structure is passed through the set of well-formedness conditions described in Chapter 5. Only the S-structures that satisfy all the conditions are displayed as output. The high level transformational function is shown below:

(24) transformation(L, Dstructure, Sstructure) :-
         transform(L, matrix(+), _, Dstructure, Sstructure),
         dbl_comp_filter(Sstructure),
         case_filter(L, Sstructure),
         binding_conditions(L, Sstructure),
         ecp(L, Sstructure).

The D-structure is transformed into S-structure by the transform function. The 'matrix(+)' parameter indicates that a matrix clause is being passed initially, since transform is recursively called on each embedded clause. Following the transformation, the well-formedness of the S-structure is validated against the Doubly-filled COMP Filter, the Case Filter, the Binding Conditions, and the ECP. Failure to pass any of the conditions results in automatic backtracking into transform to try another transformational option.

The transform function, shown below, performs the transformational cycle on each clause. The first statement, transform_lower, recursively descends the list structure looking for an embedded COMP2 phrase. If it finds one it calls transform on that clause, which itself calls transform_lower as its first step.
In this way the structure is completely transformed beginning with the most deeply embedded clause and ending with the matrix clause.

(25) transform(L, Matrix, Mode, COMP1, COMP11) :-
         transform_lower(L, matrix(-), Mode, COMP1, COMP2),
         set_tns(Matrix, COMP2, COMP3),
         move_alpha(L, COMP3, COMP4),
         it_insertion(L, COMP4, COMP5),
         null_subject(L, COMP5, COMP6),
         do_support(L, Matrix, Mode, COMP6, COMP7),
         move_affix(L, COMP7, COMP8),
         comp_insertion(L, Matrix, COMP8, COMP9),
         modal_insertion(L, COMP9, COMP10),
         inversion(L, Matrix, Mode, COMP10, COMP11).

The 'Mode' variable unifies with the mode of the clause, as either declarative or question. It is returned by transform_lower since the mode is attached as an adjunct to the lowest COMP2 phrase. The mode influences Do Support and Inversion, in that a question mode will trigger these transformations. Each of the remaining functions of transform is described below.

6.6.1 Set-TNS

Each clause within the D-structure retains the tense specification of the original source sentence. Assuming that the tense of every clause other than the matrix clause may be either [+TNS] or [-TNS] (ignoring marked lexical exceptions), the purpose of the set_tns function is to alternate each tense specification. During the transformational phase the original tense of each clause is maintained. Then when backtracking occurs, the tense will alternate. If the tense specification is [+TNS] (i.e. one of [TNS(past)], [TNS(pres)], or [TNS(fut)]), it changes to [-TNS], i.e. infinitive. If the original tense is infinitive, then, for the sake of making a reasonable choice blindly, [TNS(pres)] is selected, along with [PER(3)] and [-PL] for agreement features. The matrix clause remains [+TNS] and does not alternate.

6.6.2 Move-Alpha

While technically Move-α moves "any category anywhere", its operation is limited here to movement to COMP or movement to subject, the latter restricted to θ̄ (non-θ) subjects.
Additionally, α is restricted to either NP or PP with the feature [+WH] in the case of movement to COMP, and only NP in the case of movement to subject. Five move_alpha statements are included to handle the various possibilities. They are summarized below:

(26) a. No movement
     b. NP movement to subject - subject must be empty and θ̄
     c. NP movement to COMP - NP must have the feature [+WH]
     d. PP movement to COMP - PP must have the feature [+WH]
     e. NP movement to subject then to COMP - same conditions as (b) and (c)

A general-purpose move routine is called by move_alpha to do the actual move. The layout of move is as follows:

(27) move(L, Alpha, Level, X, F, BINFL, INFL).

move is provided with the category 'Alpha' (= NP or PP) and the features 'F' to be assigned to the trace (i.e. [±ANAPHOR, ±PRONOUN]), depending on its destination. The routine recursively searches top-down through each phrase within the base INFL2 structure 'BINFL' until it finds a phrase 'X' of category 'Alpha'. It extracts 'X' from 'BINFL', replacing it with an EC to which are given the features 'F'. The modified structure is returned in 'INFL'. On backtracking, move will eventually find and pass back all c-commanded phrases of category 'Alpha'.

An index is also generated as a Prolog logical variable and added to the features of 'X' and the EC. The variable is never instantiated but retains its Prolog-generated unique value. Only the trace and the antecedent will share the same value.

To implement subjacency, the move function keeps track of the number of bounding nodes encountered. The core grammars each have a definition of the bounding categories for that language, as given in (28) below:

(28) bounding_cat(english, infl).
     bounding_cat(english, n).
     bounding_cat(spanish, comp).
     bounding_cat(spanish, n).
This explains why the language 'L' is passed as a parameter to move. The 'Level' parameter, initially set to zero by move_alpha, is incremented by one for each bounding category crossed. The function fails when 'Level' reaches two. The examples given in Section 6.9 illustrate the processing of the move_alpha transformation.

6.6.3 It-Insertion

As described in Section 4.5, an empty θ̄ subject is obligatorily replaced with it, including features [PER(3), -PL]. If the subject position is already filled and/or is θ-marked, no action is taken.

6.6.4 Null-Subject

The null_subject transformation for Spanish optionally deletes a lexical subject if it has the feature [+PRONOUN]. English has no null subject option. However, in the cases where a Spanish sentence is input with a null subject that does not become lexicalized, the generation of the sentence in English must lexicalize it (assuming it is not PRO, i.e. not in the environment of [-TNS]). Third person English pronouns are selected one at a time as valid options, re-instantiating during backtracking.

6.6.5 Do-Support

Following the description in Section 4.4.3, the matrix clause of a question in English will have do inserted into modal position as long as the subject is lexical, and no other modal or auxiliary is specified. That is, the matrix clause must not be in future tense, have perfect or progressive aspect, or be in the passive voice.

6.6.6 Move-Affix

The function of move_affix is to implement affix-hopping, at the same time generating auxiliaries as indicated by the presence of any aspectual features. The function is defined at the top level as follows:

(29) move_affix(L,
         [[comp,FC],WH,[e,COMP,
             [[infl,FI],NP,[e,BINFL,BVP]]]|Post],
         [[comp,FC],WH,[e,COMP,
             [[infl,FI],NP,[e,INFL,VP]]]|Post]) :-
         affix_infl(L, FI, BINFL, FV, INFL),
         affix_vp(L, FV, BVP, VP).

     affix_infl(_, F, e, F, e).
     affix_infl(L, F, SourceW, [tns(-)|F], TargetW) :-
         dict(L, SourceW, infl, Lex),
         not member(root(_), Lex),
         gen_verb(L, infl, F, SourceW, Lex, TargetW).

     affix_vp(L, FI, [[v,FV],_,[_,BV|Args]|Post],
                     [[v,FV],PERF,[PROG,V|Args]|Post]) :-
         affix_to_perf(L, FI, PERF),
         affix_to_prog(L, FI, PROG),
         affix_to_verb(L, FI, BV, V).

The first action is to affix the tense to a modal. The lexical entry of the uninflected modal form is retrieved from the dictionary and then appropriately inflected (in the gen_verb routine, not shown) according to the tense and agreement features in INFL. English modals show minimal inflection, but Spanish verbs acting as modals (poder for can, deber for should, etc.) show normal verbal inflections. If a modal ends up taking the tense, then the feature [-TNS] is passed to the affix_vp routine to suppress tense inflection of the following verbal elements.

The affix_vp routine generates auxiliary verbs and selects the appropriately inflected main verb. If the INFL features specify [+PERF], then a perfective auxiliary is created and inflected depending on the prevailing tense feature. Similarly with the progressive auxiliary if [+PROG] is specified. Finally the main verb root form is taken from the dictionary and inflected. If the verb specifies irregular forms, then these are inspected first to see if a match can be found. If not, then if the verb specifies stem forms, such as the preterite stem tuv- for the Spanish verb tener 'to have', these will be scanned for a match. Otherwise the verb will be automatically inflected by running through the suffix table looking for a matching entry. For English verbs, if the feature specifications include [TNS(pres)] and either [+PL], [PER(2)], or [PER(1)], then the infinitive form is selected, if an irregular form has not already been found.
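The lookup order for the main verb (irregular forms first, then irregular stems, then the regular suffix table) can be sketched as three ordered clauses. This is an illustration only: the predicate names and all four lexical facts are hypothetical, person and number are ignored (only a third person singular past form is modelled), and the thesis's own gen_verb routine is not shown in the text.

```prolog
% Hypothetical lexical data for the sketch.
irregular(see, past, saw).          % complete irregular form
stem(tener, past, tuv).             % irregular stem (preterite tuv-)
stem_suffix(spanish, past, o).      % ending attached to an irregular stem
suffix(english, past, ed).          % regular suffix table

% inflect(+Lang, +Root, +Tense, -Form)
% 1. An irregular form, if listed, wins outright.
inflect(_, Root, Tense, Form) :-
    irregular(Root, Tense, Form), !.
% 2. Otherwise an irregular stem is combined with its ending.
inflect(Lang, Root, Tense, Form) :-
    stem(Root, Tense, Stem), !,
    stem_suffix(Lang, Tense, Suffix),
    atom_concat(Stem, Suffix, Form).
% 3. Otherwise the root is inflected from the regular suffix table.
inflect(Lang, Root, Tense, Form) :-
    suffix(Lang, Tense, Suffix),
    atom_concat(Root, Suffix, Form).

% Example queries:
% ?- inflect(english, see, past, F).     % F = saw
% ?- inflect(spanish, tener, past, F).   % F = tuvo
% ?- inflect(english, walk, past, F).    % F = walked
```

The cuts commit to the first source of information that applies, so a verb with a listed irregular form never falls through to the suffix table.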
6.6.7 COMP-Insertion

As described in Section 4.6, the comp_insertion routine for English optionally inserts the that-complementizer into non-infinitival embedded clauses. For Spanish, the insertion of que is obligatory; the "cut" prevents backtracking. The comp_insertion routine is defined as follows:

(30) comp_insertion(english, matrix(-),
         [[comp,FC],WH,[e,e,INFL]|Post],
         [[comp,FC],WH,[e,that,INFL]|Post]) :-
         features(INFL, FI),
         not member(tns(-), FI).

     comp_insertion(spanish, matrix(-),
         [[comp,FC],WH,[e,e,INFL]|Post],
         [[comp,FC],WH,[e,que,INFL]|Post]) :-
         features(INFL, FI),
         not member(tns(-), FI), !.

     comp_insertion(_, _, COMP, COMP).

6.6.8 Modal-Insertion

In English, the infinitive marker to is inserted into modal position if the INFL features contain [-TNS]. If future tense is indicated, then will is inserted. No modal insertion occurs in Spanish.

6.6.9 Inversion

Inversion in English involves the transformations of Have/Be Raising (if relevant) followed by SAI. The code for inversion is the following:

(31) inversion(english, Matrix, Mode, COMP1, COMP3) :-
         havebe_raising(COMP1, COMP2),
         sai(Matrix, Mode, COMP2, COMP3).

Have/Be Raising occurs only if an auxiliary is available for raising, and SAI occurs at the matrix level for Mode=ques.

For Spanish, inversion involves the Verb Fronting transformation, obligatory for wh-phrases in any COMP node, and also enforced at the matrix level for Mode=ques. The code for Spanish inversion is the following:

(32) inversion(spanish, _, _, BaseCOMP, SurfaceCOMP) :-
         member([comp,FC], BaseCOMP),
         member(wh(+), FC),
         v_preposing(BaseCOMP, SurfaceCOMP).

     inversion(spanish, matrix(+), ques, BaseCOMP, SurfaceCOMP) :-
         !, v_preposing(BaseCOMP, SurfaceCOMP).

     inversion(spanish, _, _, COMP, COMP).

6.7 PF-Generation

As shown by the diagram in Section 2.1, S-structure maps directly to PF and LF.
Here, we consider the mapping to PF, which is for the most part merely a stringing out of the terminal nodes of the S-structure tree into a linear list form. However, in a speech synthesis system this component would take on major proportions. The PF-generation process is defined by the following Prolog statement:

(33) pf_generate(L, Sstructure, Surface) :-
         readleaves(Sstructure, List1),
         delete_e(List1, List2),
         collocate(L, List3, List2),
         contract(L, Surface, List3).

The head of the clause states that for language L, 'Sstructure' converts to 'Surface' form as a result of passing 'Sstructure' through the four clauses. The first, readleaves, simply traverses the tree in a depth-first left-to-right fashion, putting each terminal symbol into a list 'List1'. The second clause, delete_e, removes all occurrences of the empty marker "e" from the list. This is included as a separate clause, since the adjunct term created during Verb Fronting may contain "e"s; the contents of the adjunct are not analyzed within readleaves. The collocate clause is identical to that used for forming collocations during morphological analysis (cf. 6.2), but with the parameters reversed. The same applies to the fourth clause, contract, which forms contractions. The result is a list of tokens in surface form, suitable for display.

6.8 Sentence Output

The final stage of processing is displaying the list of tokens on the output device. Each token in the list is printed, separated by a blank character. The first character of the list is capitalized, and final punctuation is printed depending on the value of the last token in the list: 'decl' results in a period being displayed; 'ques' converts to the question mark.

6.9 Examples

Two examples are presented in this section. The first illustrates the transformational phase from D-structure to one possible S-structure.
The other illustrates how the well-formedness conditions restrict the set of generated S-structures to only grammatically correct ones.

6.9.1 Transformational Phase

In this example, a D-structure passes through the various transformations on its way to S-structure. For each applicable transformation, the resulting phrase structure is indicated. The Move-α transformations shown are only one of its many optional applications, including no movement. The COMP-Insertion step shows that-insertion, an optional transformation.

Example: What did John say that Mary saw?

1. D-structure
   [S̄1 e [S1 John [ say [S̄2 e [S2 Mary [ see [ what ]]]]]]]
2. S2 Cycle - Move-α
   [S̄1 e [S1 John [ say [S̄2 what [S2 Mary [ see [ t ]]]]]]]
3. S2 Cycle - Move Affix
   [S̄1 e [S1 John [ say [S̄2 what [S2 Mary [ saw [ t ]]]]]]]
4. S1 Cycle - Move-α
   [S̄1 what [S1 John [ say [S̄2 t [S2 Mary [ saw [ t ]]]]]]]
5. S1 Cycle - Do Support
   [S̄1 what [S1 John [INFL do ] [ say [S̄2 t [S2 Mary [ saw [ t ]]]]]]]
6. S1 Cycle - Move Affix
   [S̄1 what [S1 John [INFL did ] [ say [S̄2 t [S2 Mary [ saw [ t ]]]]]]]
7. S1 Cycle - Inversion
   [S̄1 what [S1 [INFL did ] John [ say [S̄2 t [S2 Mary [ saw [ t ]]]]]]]

6.9.2 Well-formedness Phase

In this example, a D-structure is given which then undergoes the series of transformations resulting in an S-structure. Backtracking occurs, deriving alternate S-structures depending on the transformational options taken. For each S-structure generated, its status with respect to the well-formedness conditions is given. In some cases more than one condition disqualifies the structure, although in practice the first failure will cause backtracking. Conceptually the conditions apply in parallel, illustrating the modular nature of the theory; failure in any one component is sufficient to cause termination of the remaining components.
Subjacency violations are included among the list of S-structures for illustrative purposes only; they are actually filtered out within the application of Move-α, not as a condition on S-structure.

The sentence in (34) below represents one of the many possible valid S-structures for the D-structure in (35), including those that represent echo questions and yes/no questions. Also, it will be observed that a wh-phrase in subject position will appear identical to (i.e. have the same PF representation as) a wh-phrase in COMP binding a variable in subject position, as in items 10 and 18 below. The S-structures are listed in the order in which they are generated. Only the clause boundaries are shown. Traces are indicated by t, and e represents the empty complementizer.

(34) Who seems to have left?
(35) [S̄ e [S e [+TNS] seem [S̄ e [S who [±TNS, +PERF] leave ]]]]

Select [-TNS] in embedded clause:

 1. [ e [ does it seem [ e [ who to have left ]]]]
    Fails: Case Filter
 2. [ e [ does who_i seem [ e [ t_i to have left ]]]]
 3. [ who_i [ t_i seems [ e [ t_i to have left ]]]]
 4. [ who_i [ does it seem [ e [ t_i to have left ]]]]
    Fails: Subjacency
 5. [ e [ does it seem [ who_i [ t_i to have left ]]]]
    Fails: Case Filter
 6. [ e [ does who_i seem [ t_i [ t_i to have left ]]]]
    Fails: Case Filter, Binding Condition C
 7. [ who_i [ t_i seems [ t_i [ t_i to have left ]]]]
    Fails: Case Filter, Binding Condition C
 8. [ who_i [ does it seem [ t_i [ t_i to have left ]]]]
    Fails: Case Filter

Select [+TNS] in embedded clause:

 9. [ e [ does it seem [ that [ who_i has left ]]]]
10. [ e [ does it seem [ e [ who_i has left ]]]]
11. [ e [ does who_i seem [ that [ t_i has left ]]]]
    Fails: Case Filter, Binding Condition A, ECP
12. [ e [ does who_i seem [ e [ t_i has left ]]]]
    Fails: Case Filter, Binding Condition A
13. [ who_i [ t_i seems [ that [ t_i has left ]]]]
    Fails: Case Filter, Binding Condition A, ECP
14. [ who_i [ t_i seems [ e [ t_i has left ]]]]
    Fails: Case Filter, Binding Condition A
15. [ who_i [ does it seem [ that [ t_i has left ]]]]
    Fails: Subjacency
16. [ who_i [ does it seem [ e [ t_i has left ]]]]
    Fails: Subjacency
17. [ e [ does it seem [ who_i that [ t_i has left ]]]]
    Fails: Doubly-filled COMP Filter, ECP
18. [ e [ does it seem [ who_i e [ t_i has left ]]]]
19. [ e [ does who_i seem [ t_i that [ t_i has left ]]]]
    Fails: Case Filter, Binding Condition C, ECP
20. [ e [ does who_i seem [ t_i e [ t_i has left ]]]]
    Fails: Case Filter, Binding Condition C
21. [ who_i [ t_i seems [ t_i that [ t_i has left ]]]]
    Fails: Case Filter, Binding Condition C, ECP
22. [ who_i [ t_i seems [ t_i e [ t_i has left ]]]]
    Fails: Case Filter, Binding Condition C
23. [ who_i [ does it seem [ t_i that [ t_i has left ]]]]
    Fails: ECP
24. [ who_i [ does it seem [ t_i e [ t_i has left ]]]]

Chapter 7

EVALUATION AND DISCUSSION

The foregoing chapters have been descriptive of the model in terms of its capabilities. This chapter reviews the capabilities in light of its limitations. The concerns center around the representations that have been chosen and the procedures used to manipulate and transform the representations. Little attention will be paid to analyzing the theoretical constructs on which the model is based.

7.1 Representations

As pointed out in Section 2.1, the description of a grammatical system relies on the lexicon and the rules and principles involved in combining lexical items. The role of specific phrase structure rules is minimized, generalizing to a single phrase structure rule system based on X-bar theory, with lexical entries directly determining the course of a derivation. This is in accord with attempts to reduce if not eliminate phrase structure rules (Chomsky 1981; Stowell 1981), in contrast to alternative approaches to extend the phrase structure component, as in Generalized Phrase Structure Grammar (Gazdar and Pullum 1982).
The power of the system, then, is a function primarily of the complexity of the lexicon and the principles that determine well-formed constructions in accordance with the X-bar system of phrase structure.

7.1.1 Lexicon

The dictionary component of the lexicon consists of entries keyed by lexical item and consisting of feature descriptions. The definition of a lexical entry, given in 3.1.1(1), indicates a lexical category, but this is merely an abbreviation for some combination of the features [±N, ±V] which could just as well be included among the entry's other features. One advantage to representing categories as features is that subcategorization information would not be restricted to the set of categories but rather could specify from one to any number of features. A couple of examples will illustrate this.

Chomsky's (1981) conception of the clause is defined as in (1) (= 2.2.1(16)):

(1) S → NP INFL VP

where VP is actually a general predicate phrase, i.e. any [+V] constituent, thus allowing for either VP or AP, as in She is upset. (The copula is added as an adjunct, perhaps transformationally, in order to provide an attachment site for tense during Affix Movement.) This overcomes the necessity of the θ-Extraction routine (cf. 6.4.4) and provides a more general representation for the clause.[18] The INFL category could then be defined as subcategorizing for [+V] rather than VP.

In the other direction, subcategorization by feature specification can be used to restrict the allowable constituents. For example, the verb put requires its PP to be of type [+LOC] (i.e. locative), so that the subcategorization frame for put might specify [+N, -V], [-N, -V, +LOC] for the NP and PP, respectively. Similar techniques could be used in selecting complementizers, embedded tense of clausal arguments, and selectional restrictions on subcategorized arguments in general.
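The feature-based alternative discussed above can be made concrete with a short sketch. It is hypothetical and is not part of the implemented system: the predicate names, the encoding of [±N, ±V] as n(+), v(-) and so on, and the frames for infl and put are assumptions chosen to mirror the two examples in the text.

```prolog
% Major categories expressed as bundles of the features [±N, ±V].
category_features(n, [n(+), v(-)]).
category_features(v, [n(-), v(+)]).
category_features(a, [n(+), v(+)]).
category_features(p, [n(-), v(-)]).

% Hypothetical subcategorization frames stated as feature requirements,
% one requirement list per subcategorized phrase.
subcat(infl, [[v(+)]]).
subcat(put,  [[n(+), v(-)], [n(-), v(-), loc(+)]]).

% present(+Feature, +List): Feature occurs in List.
present(F, [F|_]) :- !.
present(F, [_|Rest]) :- present(F, Rest).

% has_features(+Required, +Actual): every required feature is present.
has_features([], _).
has_features([F|Fs], Actual) :-
    present(F, Actual),
    has_features(Fs, Actual).

% satisfies(+Head, +Phrases): each phrase (given as its feature list)
% meets the corresponding requirement in the head's frame.
satisfies(Head, Phrases) :-
    subcat(Head, Frame),
    match_frame(Frame, Phrases).

match_frame([], []).
match_frame([Required|Rs], [Phrase|Ps]) :-
    has_features(Required, Phrase),
    match_frame(Rs, Ps).

% Example queries:
% ?- category_features(a, AP), satisfies(infl, [AP]).            % succeeds
% ?- category_features(n, NP), satisfies(infl, [NP]).            % fails
% ?- satisfies(put, [[n(+),v(-)], [n(-),v(-),loc(+)]]).          % succeeds
% ?- satisfies(put, [[n(+),v(-)], [n(-),v(-)]]).                 % fails
```

Under this encoding INFL accepts both VP and AP because each carries v(+), while put rejects a PP that lacks loc(+). The cost, as the following paragraph notes, is that every lookup involves a feature search rather than a direct match on a category name.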
Nevertheless, the present system makes use of category names to simplify the retrieval of lexical items. No internal search is required to match features, so access time is reduced. It also facilitates the description and access of specifiers and adjuncts by associating them with category names, not features.

[Footnote 18] Even more generality can be obtained by adopting the rule in (i) for the clause as proposed by Williams (1984):

(i) S → NP INFL X²

where X may be any of the major lexical categories, as in the following examples:

(ii) a. Mary is a woman. (X² = NP)
     b. Mary is hungry. (X² = AP)
     c. Mary is at home. (X² = PP)
     d. Mary is eating. (X² = VP)

Furthermore, it is not clear how the minor categories should be represented in terms of features. Does an article include the features [+N, -V] to indicate affinity with pure pronominals? Evidence suggests that it might, since articles do not modify [+V] categories and they show morphological Case in some languages (e.g. German) by virtue of having the [+N] feature. But articles are prohibited from occupying the head of an NP, so some additional features are required to distinguish the two. Without creating the necessary features to characterize this distinction, articles are simply given the name ART.

The number of features used in this system is in fact limited to that required to achieve a minimum level of descriptive accuracy, principally those of person, number, and gender. The inadequacy of this set is abundantly demonstrated by the range of ungrammatical and anomalous sentences accepted and generated. It seems clear that, for example, some feature regarding animacy ([±ANIMATE]) is required to distinguish between those entities that possess volition and can initiate action, and therefore serve as (thematic) agents of an action, and those that cannot. For example, consider the sentences in (2):

(2) a. John opened the door.
    b. The key opened the door.
Structurally identical, they differ thematically; John acts as agent in (2a), whereas the key in (2b) has the role of instrument, although both are represented grammatically as the subject.

The problem becomes apparent in the grammar of Spanish. Spanish makes use of the personal a to distinguish direct objects with human(like) or animate qualities, as in (3):

(3) a. Juan vio el libro. ('John saw the book.')
    b. Juan vio a María. ('John saw Mary.')
    c. Juan vio el perro. ('John saw the dog.')
    d. Juan vio al perro. ('John saw the dog.')

The quasi-preposition a is obligatory in (3b) and obligatorily absent in (3a); (3c) expresses an objective event while (3d) imputes a sentiment of, say, affection or familiarity. Since the current system lacks a feature of animacy and therefore cannot accommodate the use of the personal a, this leads to incorrect analyses, such as the sentence (4a) which in this model translates to (4b):

(4) a. Quién vio la película? ('Who saw the film?')
    b. Who did the film see?

The analysis of (4a) involves verb fronting, such that the subject is taken to be la película, inverted with the verb vio. Quién then assumes the role of direct object.

Other features commonly associated with lexical items but not implemented include the count/mass distinction for nouns, locative/directional for prepositions, predicative/non-predicative for adjectives, etc. At this time there appears to be no generally accepted theory as to the nature and number of semantic classes required, nor whether strictly binary features are sufficient as opposed to n-ary features or gradations along a continuum.

7.1.2 Phrase Structure

The representation of a phrase according to the metarules of X-bar theory is convenient and theoretically attractive. It provides the framework in which to express linguistic configurations as generalized tree structures (or as a linearization of structures in free phrase order languages such as Latin and Japanese).
The choice of trees follows from a long tradition in linguistics, and computer science provides a large store of knowledge in manipulating trees as lists, efficiently traversing them through recursive programming.

The characteristics of a phrase, its features, are naturally associated with its structural representation, insofar as phrase descriptions are independent of each other. Gender, for example, is attached to a noun phrase independent of the predicating verb phrase, although semantic notions such as selectional restrictions may impose further constraints on what may be predicated, i.e. which features are selected and which prohibited. Some cross-phrasal dependencies may also be expressed structurally, such as the agreement of a subject and verb. In this case the abstract but motivated creation of an INFL phrase containing both subject and verb provides the unifying mechanism.

In other cases, the association is not as direct. The mode of a sentence is variously represented orthographically (punctuation), lexically (particles), phonetically (intonation), or pragmatically (intention). In the present model, the mode, attached as a structural adjunct to the lowest COMP node, is passed up as an external flag to affect the initial structure of the sentence in the form of inversion. (That is, a question mark forces inversion.) This mechanism is expedient for present purposes but almost certainly inadequate as a general representation of mode. (Note that this is very similar to the abstract Q morpheme as an adjunct to S (Baker 1978).) Alternatively, features of mode may be clausally related and reside in INFL, partaking in some form of affix movement (and reverse affix movement) along with features of tense, voice, aspect, and mood. Presence or absence of the feature will then control inversion.

Some structural possibilities not implemented include relative clauses, coordination, adverbial phrases, and adjectival modifiers.
The present system could be extended without a great deal of effort to include some aspect of these, although they are not trivial extensions. Furthermore, translation imposes increased effort in having to work out idiosyncrasies in each language. However, the general approach is fairly clear. Relative clauses, as a post-adjunct to an NP, could be defined as an S in which some form of wh-movement has applied to leave a gap in the clause. Some adjustment to the definition of the principles would be necessary, particularly affecting the usage of governing categories, which are not explicitly defined in this system. An approach to coordination is presented in Fong and Berwick (1985), based on a technique of linearizing the conjoined phrases and finding equivalences within the linearization. Adverbial phrases of manner (e.g. lentamente 'slowly') could be defined as pre- or post-adjuncts to INFL. Adverbials of degree (e.g. muy 'very') may be specifiers of adverbials or adjectival phrases. Other adverbials of time and place (e.g. ahora 'now', aquí 'here') can attach as post-adjuncts to the verb.

Adjectival modifiers may appear to be relatively simple to implement, but in fact present special problems. Predicate adjectives such as small, red, sincere may be positioned prenominally in English, postnominally in Spanish, or in copular phrases in either language:

(5) a. a small apple; the apple is small
       una manzana pequeña; la manzana es pequeña
    b. the red houses; the houses are red
       las casas rojas; las casas son rojas
    c. a sincere man; the man is sincere
       un hombre sincero; el hombre es sincero

A single lexical entry would appear to suffice for each of these adjectives, and its position may be defined in terms of X-bar as an NP pre-adjunct, post-adjunct, or as a complement of the copula verb, agreeing in number and gender with the noun. However, not all predicate adjectives may be nominal modifiers, as illustrated by the following examples:

(6) a. The boy is afraid.
    b. *He is an afraid boy.
    c. *He is a boy afraid.
    d. He is a boy afraid of the dark.

(7) a. The woman is fond of her nephew.
    b. *She is a fond of her nephew woman.
    c. She is a woman fond of her nephew.

The simple adjective afraid may not be positioned prenominally as in (6b), nor postnominally as in (6c) unless it is part of an adjectival phrase as in (6d). Similarly, the phrase fond of her nephew may not appear prenominally in (7b) but because of its transitivity, like (6d), must appear postnominally, as in (7c). Moreover, descriptive adjectives in Spanish may often appear on either side of the noun with differing semantic connotations:

(8) a. un hombre pobre ('a poor man', i.e. without money)
    b. un pobre hombre ('a poor [unfortunate, wretched] man')

But adjectives modified by adverbials must always follow the noun:

(9) a. un hombre muy pobre ('a very poor man')
    b. *un muy pobre hombre

Some (ad hoc) extension to feature types is required to resolve these lexical differences. On account of considerations such as these, the implementation of adjective phrases has not been attempted, other than those that introduce clausal arguments (e.g. likely, probable) and whose distribution is therefore invariant.

Additional extensions to phrase structure representation include the variety of Move-α operations, e.g. Extraposition with It Insertion (e.g. It amazed me that Fred won a prize), Topicalization (e.g. This picture, I like), Heavy NP Shift (e.g. We have with us this evening a very special guest), Stylistic Inversion in Spanish (e.g. Canta bien la chica 'The girl sings well'), and Passivization. Some machinery has been included in the model to handle the latter, but the major problem here lies more in translation. True Spanish passives are rendered with the auxiliary ser 'be' and the past participle, which agrees in number and gender with the subject:

(10) a. El juguete fue roto ayer. ('The toy was broken yesterday.')
     b. Las reuniones fueron organizadas por los obreros. ('The meetings were organized by the workers.')

The ser auxiliary has not been added, nor have participles that, like adjectives, require agreement. More commonly, passive voice is expressed with the reflexive pronoun se, as in:

(11) a. Se prohibe fumar. ('Smoking is prohibited.')
     b. Esas casas se vieron por los agentes. ('Those houses were seen by the agents.')
     c. Se venden tamales aquí. ('Tamales are sold here.')

Reflexives, along with all other object pronoun clitics, have not been implemented here. Relevant theoretical discussion of cliticization within the GB framework may be found in Jaeggli (1982) and Borer (1984).

7.1.3 Configurationality

The primary design goal of the system is to construct a model of grammatical knowledge that achieves cross-linguistic generality. It emphasizes the representation of a universal grammar as an embodiment of general principles, and individual core grammars as parametrized variations to the principles. For a not insignificant subset of the grammars of English and Spanish, this goal has been achieved.

An obvious question that arises is to what extent this system will accommodate the grammars of other languages, especially non-Indo-European languages. To the extent that a language is configurational in terms of embedded phrase structure (Chomsky 1981), e.g. English, the model would ideally require only minor, if any, revision, apart from establishing the lexicon and the parameters. Unfortunately, the analysis component of the system, i.e. the parser, is designed for strictly top-down left-to-right processing, imposed by the Prolog execution of the grammar rules. This presupposes head-initial grammars, such as English and Spanish, and other so-called SVO languages. This strategy falls apart with German, for example, an SOV language.
Since the object is an argument of the verb and the verb is in final position, an indeterminate amount of lexical material must be passed before the verb is recognized and its complement determined. This suggests that ultimately a bottom-up approach may prove to be more general and preferable, using some technique such as "chart" parsing (Kaplan 1973), as in the Q-system (Colmerauer 1970), Prolog's predecessor.

As to so-called non-configurational languages (e.g. Japanese), a bottom-up approach to parsing may again be preferable. The verb will indicate the argument structure that must be satisfied, with order of arguments relatively freely distributed in pre-verbal position. A mechanism, possibly incorporated as a reverse-transformation, could map a non-configurational structure into a configurational D-structure in which thematic relations are structurally represented as in configurational languages. The same alterations to the parsing strategy required for German may accordingly hold for Japanese, and by extension English, Spanish, etc., at least as far as the satisfaction of the θ-Criterion and Extended Projection Principle is concerned.

Since the present model is configurationally based, the statements of the ECP, binding conditions, and Case Filter are all expressed in terms of structural configurations, and a language so represented should require minimal change in the model. At most, new structural configurations would be necessary to express the new environment(s) where the conditions may hold. Once again, if non-configurational languages can be equivalently represented configurationally, then no substantial changes or additions would be required in the statement of the principles, apart from those motivated by ongoing empirical investigations.

7.1.4 Well-Formedness Conditions

The well-formedness conditions as implemented here act as output filters on S-structures.
An obvious improvement to the time efficiency is to execute the conditions in parallel, given parallel machine architecture. If any one condition fails it halts all ongoing tasks, rather than the current situation where a structure may sequentially pass several conditions, possibly incurring substantial computation, before being rejected. This also motivates the use of filters to reduce the rule component, since filters may operate in parallel, but rules are necessarily serial operations.

Ideally, though, the model would be constructed in such a way that only valid S-structures are generated in the first place. This suggests a closer integration of the principles within the rules of grammar, as subjacency is (necessarily) implemented. As an example, take Move-α, which includes the following preconditions (with subjacency being a condition on execution):

(12) Move-α Preconditions
     (i) α agrees with its destination
     (ii) the destination is an empty θ-bar position
     (iii) α and its trace satisfy the binding conditions
     (iv) the trace would be properly governed

Condition (i) requires a correspondence between the lexical category of α and its destination, and further that if the destination is COMP then α includes [+WH]. The latter is implemented as a filter after Move-α has applied (which leads to some unnecessary computation, as phrases are brought one at a time to COMP only to be rejected by the filter). Condition (ii) is trivial. Conditions (iii, iv) implement the principles of binding and government as output filters. One advantage to the separation is that it localizes the definition of the principles to a single component rather than scattering them among different rules, which would result in, for example, an ECP condition on NP-traces in one rule, an ECP condition on wh-traces in another, and a disjoint ECP condition on PRO. The disadvantage is in the additional computation that takes place before an invalid movement is rejected, as in condition (i) for wh-movement.

In a sense, this integration has been achieved in the expression of c-command, a recurring notion in GB theory. The phrase structure design along with top-down recursion effectively implements the concept of c-command, obviating the necessity of a special predicate such as "c-command(X,Y)". Any material contained in the specifier, complement, or adjunct, and no other, is inherently c-commanded by the head. The definitions of government and binding, as implemented, require no reference to c-command as a precondition; it is inherent in the structure itself.

But top-down recursion is not always satisfactory, at least not for the binding relation. In a recursive descent of phrase structure, government of an argument, including ECs, may be locally determined by isolating the governor and its c-commanded phrases. Thus the operations that rely on government, i.e. assignment of Case and θ-roles and validation of the ECP, can be performed within the recursive descent. However, binding is expressed as a passive operation. That is, rather than determining if α binds β, the conditions depend on whether β is bound/not bound by α, suggesting an upward search rather than a downward search. For example, a name (or variable) must not be bound by a c-commanding phrase (= Condition C). This can be implemented in one of two ways. Either a search is made upward from a name, halting when a co-indexed antecedent is found or the top node is reached, or a search is made from every NP downward to see if by chance it binds a name. The latter is the approach taken in the present model, which suffers from the inefficiency of having to search the entire length of every downward path from every NP just in case it binds a name (or variable, as it were, in this model).
An interference in integrating principles with rules arises from exceptional government phenomena such as S-deletion. S-deletion affects the bounding category for the application of the binding conditions. But since S is a cyclic node in terms of the transformational cycle, a procedure within the cycle has no access to structural information above S to determine if S-deletion is in effect. Therefore the satisfaction of the binding conditions is sometimes relevant at the current cycle and other times at the next higher cycle. The same applies in all cases where government of the subject is crucial. Consequently S-deletion is a major contributor to the decision to implement the principles as well-formedness conditions on output.

7.2 Execution

7.2.1 Relation to Logic Grammars

The use of Prolog in natural language processing stems from the introduction of logic grammars in 1975 (Colmerauer 1975). Since the logic grammar formalism is a part of the Prolog language, grammars can be written in concise, familiar notation and be interpreted directly by Prolog. Several notable logic grammar formalisms include Metamorphosis Grammars (Colmerauer 1975), Definite Clause Grammars (Pereira and Warren 1980), Extraposition Grammars (Pereira 1981), Modifier Structure Grammars (Dahl and McCord 1983), and Definite Clause Translation Grammars (Abramson 1984). Prolog is designed primarily for top-down backtracking analysis of sentences, although a bottom-up Prolog parser has been designed (Stabler 1983) to implement the deterministic parser of Marcus (1980).

In contrast to the grammars mentioned above, this implementation has a very small grammar rule component. All phrase structures are parsed using the X-bar metarule system, guided by the lexicon directly with minimal backtracking. Arguments are not used in the rules except to name the language and category of the phrase undergoing the parse, and to return its parse tree.
Validation only occurs during the reverse-transformation stage. This is not to say that validation should not occur during parsing, but only that this system maintains a separation of function for the sake of modular design.

As the system is designed to accept and process only grammatical sentences, the validations that are performed occur primarily as a side effect of establishing the sentence's D-structure. For example, in restoring the tense feature to INFL, a check is made to ensure that tense is specified once and only once, preventing sentences such as:

(13) a. Paul can leaves for Costa Rica tomorrow.
     b. She drink mineral water.

where (a) is overspecified for tense and (b) is underspecified. Similarly, the percolation routine contains consistency checks for subject-verb agreement and article-noun agreement as a side effect of feature identification. Sentences containing errors of these sorts, such as (13), will parse successfully but fail in the reverse-transformation component.

Apart from the logic grammar formalism, the derivation of D-structure from S-structure (and vice versa) is largely a functional process, for which a functional programming language (such as LISP) is entirely adequate. The output of one function (a transformation) is input to a following function, with often no non-determinism involved. The unification provided by Prolog is mostly used as a pattern-matching device for phrasal descriptions, not for instantiation of logical variables, except of course for output variables from function application. As such, the power of Prolog may be under-utilized, but its relative perspicuity, especially in grammar notation, over LISP-like languages (as noted by Pereira and Warren (1980)) makes it an attractive programming language for natural language research.

7.2.2 Parsing

The area where parsing shows difficulty is in relation to empty and optional arguments. Consider the two sentences below:

(14) a. What did you eat?
     b. Where did you eat?

The verb eat is transitive in (a) and intransitive in (b), but the parser is unable to determine which is applicable at the time the verb is being retrieved from the lexicon. The part of the parse tree that contains the wh-phrase is not accessible at this point. If it assumes an intransitive use then (a) will fail the θ-Criterion when reverse-Move-α attempts to relocate the wh-NP to a Case-marked, hence θ, position. What is required is a strategy similar to that taken in (b) where a position is actually created to contain the wh-element, only instead of a θ-bar adjunct, a θ argument is created, provided the verb subcategorizes for a phrase of that type and that its subcategorization frame is not already satisfied.

The verb be has particular difficulty with ECs, since as a main verb it may take NP, PP, or AP, as in:

(15) a. Who are you t ?
     b. Where are you t ?
     c. How are you t ?

The parser defaults the EC to an NP for the sake of creating an argument position, so that (b, c) cannot be recognized. A further problem arises when the argument is incorrectly instantiated, as in:

(16) What is that behind you?

The phrase behind you is parsed as a PP argument of is, so no EC is created. The wh-NP what will then fail to find a θ position to occupy.

(In passing, an oversight in design adds to be's problems as a main verb. It fails to participate in Have/Be Raising, resulting in the following anomalous translation:

(17) a. Dónde estás? ('Where are you?')
     b. Where do you be?

It also causes the following sentence to be unrecognizable since the parser fails to find a main verb in canonical position:

(18) Is he likely to be late?

This is easily corrected by modifying the Have/Be Raising rule appropriately.)

The pro-drop parameter as implemented fails to adequately distinguish pro from PRO in cases where pro represents a third person pronoun.
When the parser reads sentence (19a) it converts it to the D-structure in (b) with an EC in the position labelled NP:

(19) a. Es posible que e sale hoy. ('It is possible that he/she/it is leaving today.')
     b. [ e ser posible [ NP [±TNS] salir hoy ]]
     c. Es posible e salir hoy. ('It is possible to leave today.')

Since the null subject routine cannot distinguish the correct pronoun for the EC in (a), it leaves its features empty. When translated to English, the NP will be lexicalized to each of the possible pronouns. However, when given sentence (c), it will also create D-structure (b). Since the tense feature freely alternates (in the set_tns routine), [+TNS] will cause the NP to appear as pro, as in (a), resulting in lexicalized pronouns being generated. But the EC in (c) is PRO, not pro, so that the English sentences of (a) cannot be valid translations of (c). The proper way to resolve this is to provide the ECs with the appropriate [+ANAPHOR, ±PRONOUN] features, either at the time of parsing when the EC is created or as a subfunction of percolation, when other features are determined.

7.2.3 Translation

The translation component is primitive in comparison to developed machine translation systems (see Slocum (1985) for a review). This is largely because no semantic analysis occurs in the translation. The system is probably best characterized as an interlingual system (Slocum 1985) in which a source language sentence is converted into an essentially language-independent representation, i.e. the D-structure. The language-independent character derives from the equivalence of lexical items in the dictionary. For example, the verb eat has an equivalent word in Spanish, comer, in which subcategorization frames and thematic roles are identical.
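As a rough illustration of lexical equivalence in an interlingual representation, the Python sketch below maps the words of a small tree to shared concept keys and back out into another language. It is not the thesis's Prolog dictionary: the LEXICON layout, the concept keys EAT and BREAD, the bread/pan entry, the nested-list tree encoding, and all function names are hypothetical.

```python
# Hypothetical bilingual lexicon: one entry per shared concept, with the
# subcategorization frame stored once for both languages.
LEXICON = {
    "EAT": {"forms": {"english": "eat", "spanish": "comer"},
            "subcat": ["NP"]},
    "BREAD": {"forms": {"english": "bread", "spanish": "pan"},
              "subcat": []},
}

# Reverse index from (language, word) to concept key.
FORM_TO_CONCEPT = {
    (language, form): concept
    for concept, entry in LEXICON.items()
    for language, form in entry["forms"].items()
}


def map_leaves(tree, fn):
    """Apply fn to every leaf word; trees are [label, child, ...] lists."""
    if isinstance(tree, str):
        return fn(tree)
    label, *children = tree
    return [label, *(map_leaves(child, fn) for child in children)]


def to_interlingua(tree, language):
    """Replace known words by concept keys; unknown words pass through."""
    return map_leaves(
        tree, lambda word: FORM_TO_CONCEPT.get((language, word), word))


def lexicalize(tree, language):
    """Replace concept keys by the word form of the target language."""
    return map_leaves(
        tree,
        lambda key: LEXICON[key]["forms"][language] if key in LEXICON else key)


def translate(tree, source, target):
    """Lexical transfer through the shared concept-level tree."""
    return lexicalize(to_interlingua(tree, source), target)


english_vp = ["VP", ["V", "eat"], ["NP", "bread"]]
spanish_vp = ["VP", ["V", "comer"], ["NP", "pan"]]
```

In this sketch the English and Spanish trees reduce to the same concept-level tree, so translation in either direction is a lookup over leaves. Nothing here handles words whose structural or thematic properties differ across the two languages.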
Therefore the D-structure containing eat will be identical to that for the Spanish comer and (presumably) for any language with an equivalent concept. Hence the language-independent nature of the interlingual representation.

The translation of lexical items itself, then, is simply a lexical transfer process based on the correspondences established in the (bilingual) lexicon. Repeating what was said in Section 6.5, no attempt is made to provide for the range of translations for each lexical item. This avoids the task of handling the "17 words for snow in Eskimo" problem, which otherwise would require some circumlocutory translations.

Lexical equivalence needn't be restricted to single lexical items. Thus, the English verb to feed is translated in Spanish as dar de comer ('give to eat'); typewriter is translated as máquina de escribir ('machine to write'). The mechanisms exist to effect this form of translation.

An area that does not lend itself well to translation in this system involves verbs in which the grammatical functions of the thematic roles are reversed. A typical example is the verb like in English, generally translated using the Spanish gustar, as in:

(20) a. Mary does not like pastries.
     b. A María no le gustan los pasteles.

The direct object in English becomes the subject in Spanish. Subcategorizations are identical, but the θ-role of "agent" in Spanish is assigned to the NP dominated by the VP, not S as in English. As θ-theory is not developed in this system to the point of specifying actual θ-roles, the distinction cannot be made between like and gustar, nor between verbs of similar nature, e.g. olvidar 'forget', perder 'lose', etc. Two possible solutions are (1) to retranslate to a word that does have equivalent functional and thematic structure, and (2) to implement θ-role assignment.

Solution 1 for like/gustar is to retranslate gustar to please ("The pastries are not pleasing to Mary."), but like has no equivalent. Besides being a poor translation in terms of normal usage, some means are required to deal with items which simply have no strict translation. Complications immediately arise when the structural and thematic properties diverge, as in:

(21) a. I am hungry.
     b. Tengo hambre. ('I have hunger.')

The predicate adjectival usage in (a) is replaced by a transitive verb in (b). The translation component must have the ability to make structural alterations with corresponding θ-role changes, as required.

Solution 2 is obviously preferable, since it imparts a degree of semantic information into an existing syntactic structure. The necessity of explicit subcategorization information becomes doubtful given a well-defined theory and notation of θ-roles.

A major problem in translation is ambiguity in all its various forms. No attempt has been made in this system at disambiguation since it requires contextual information not available in the surface sentence. For example, the sentence:

(22) Time flies like an arrow.

has many possible syntactic structures depending on the specific meanings of each item and the context (sometimes pragmatically absurd) in which they occur. As mentioned earlier, some assumptions are made regarding the ambiguity of PP-attachments, exemplified by the classical sentence:

(23) a. I saw a man in the park with the telescope.
     b. Yo vi a un hombre en el parque con telescopio.

Apart from the lexical ambiguity of the verb saw (the further ambiguity of saw as verb or noun is resolved by the Extended Projection Principle), the correct attachment of the two PPs is multiply ambiguous. The PP in the park may modify man or saw, and the PP with the telescope may modify man, saw, or park. However, disambiguation in (23) turns out to be unnecessary. Sentence (a), ambiguous in English, remains ambiguous in Spanish (b).
In other instances, disambiguation becomes essential, as for (24a) below which has either (b) or (c) as its translation:

(24) a. The man told the woman that he loved the story.
     b. El hombre le dijo a la mujer que le gustaba el cuento.
     c. El hombre contó el cuento a la mujer a quien quería.

Sentence (b) has the sense of "The story that the man loved was told to the woman (by the man)", whereas (c) has the sense "The story was told (by the man) to the woman whom he loved." In this example, not only are the structural configurations radically different, but they lead to quite distinct lexical translations; tell, when subcategorizing for a clause, translates to decir, (b), but when used in the sense of recounting, translates to contar, (c). Further, the translation of the verb love depends on the characteristics of the direct object. When animate, it translates to querer, otherwise gustar. Without the benefit of context (or intonation patterns) there is no way of determining which of (b) or (c) is the correct translation of (a).

Another area which receives minimal treatment is the specification of time. The system recognizes tense, aspect, and voice, clearly, especially in Spanish, but makes no correspondence of the usage patterns between English and Spanish. Whereas English has one simple past tense, Spanish has two, the imperfect and the preterit. The imperfect is often translated into past progressive. Present progressive in English is commonly expressed by simple present in Spanish. Other constructions will use the perfective aspect in one language but not in the other. An adequate theory of temporal representation is required (see, e.g., Hornstein (1981) and Yip (1985)) and an understanding of how different languages express equivalent time references.

7.2.4 Generation

Little more will be stated here on generation than what has already been covered under parsing, since the two operations are ideally inverse operations of each other.
In fact it is interesting to consider the possibility of using a single predicate function to achieve both, i.e.:

(25) parse_gen(L, Surface, Deep).

where either 'Surface' is supplied, resulting in the D-structure 'Deep', or 'Deep' is supplied giving 'Surface' or a set of 'Surface's. In general, such reversible predicates are not feasible for complex relations such as this, although certain sub-processes may be handled as such. In the transformational cycle, for example, many transformations are reversible, such as Do Support and It Insertion. Others are nearly so, such as the Inversion transformations, where Have/Be Raising occurs before SAI in one direction, and the opposite order is performed in the reverse direction. However, other processes are specifically designed to be unidirectional, and it is difficult to see how they may generalize to bidirectionality. For example, the Move-α transformation assigns indices and features, but the reverse movement ignores such assignments; it has been optimized, in a sense, to remove them from consideration, and in fact they are not even present at the time of the reverse-Move-α operation.

The use of reversible predicates would most certainly reduce the overall size of the transformational component, but possibly at the expense of increased processing time. This remains an area for further investigation.

Chapter 8

Conclusion

This thesis has described the implementation of a model of natural language that reflects current theories of transformational grammar. The fundamental motivation behind the theory is to develop a description of natural language grammars that achieves a high degree of explanatory capacity. A number of basic principles are formulated to this end which provide for a concise, modular, cross-linguistic account of grammatical structure. The principles in effect represent the embodiment of a universal grammar.
Specific language grammars are instantiations of this universal grammar, with language variations attributed to parametrizations of the principles.

As there exists little published research on systems whose design is based on principles of GB, it is difficult to do a comparative analysis. Principal among the available work is that of Berwick and Weinberg (1984) on the computational complexity of GB-oriented systems, that of Wehrli (1983, 1984) on a GB parser of French, and current developments at MIT (Barton 1984; Dorr 1985) in parsing and translation in the context of GB theory. The present study represents a further contribution to a growing body of practical GB systems.

A subset of the grammars of English and Spanish is presented, including language-specific rule applications along with the parameters that affect general rules. The conditions on well-formedness are represented as general conditions, again with simple parametric values determining the effect. Translation is simplified by relating equivalent D-structures, allowing for some degree of structural differences in the representations. The equivalence reduces the size of the lexicons, since identical features between lexical items need be represented only once.

The lack of semantics is intentional, since the principles modeled here relate primarily to syntax. The analysis of logical form (LF) becomes the next major area of research, incorporating new rules and principles of LF and borrowing from existing ones where motivated. At the same time, further extensions to the syntactic component will expand the range of analyzable surface constructions (PF). With adequate representations of PF and LF, and sufficient memory to record their discourse history, it becomes possible to analyze discourse in terms of knowledge networks and inference rules.
This of course leads to a more refined sense of the term "natural language understanding", in which a higher degree of intelligence is demonstrated. This is one goal among many in the combined study of linguistics, artificial intelligence, and cognitive science.

Bibliography

Abramson, H. (1984) "Definite Clause Translation Grammars," in Proceedings of the 1984 International Symposium on Logic Programming, Atlantic City, NJ, 233-240.
Baker, C.L. (1978) Introduction to Generative-Transformational Syntax, Englewood Cliffs, NJ: Prentice-Hall, Inc.
Barton, G.E. (1984) "Toward a Principle-Based Parser," AI Memo No. 788, MIT AI Laboratory, Cambridge, MA.
Berwick, R. and Weinberg, A. (1984) The Grammatical Basis of Linguistic Performance, Cambridge, MA: MIT Press.
Borer, H. (1979) "Empty Subjects in Modern Hebrew and Constraints on Thematic Relations," in Proceedings of the Tenth Annual Meeting of the North Eastern Linguistic Society, Ottawa, Ontario.
Borer, H. (1981) "On the Definition of Variables," Journal of Linguistic Research 1.3:16-40.
Borer, H. (1984) Parametric Syntax: Case Studies in Semitic and Romance Languages, Dordrecht: Foris Publications.
Burzio, L. (1981) Intransitive Verbs and Italian Auxiliaries, Ph.D. dissertation, MIT, Cambridge, MA.
Chomsky, N. (1957) Syntactic Structures, The Hague: Mouton.
Chomsky, N. (1972) "Remarks on Nominalization," in Studies on Semantics in Generative Grammar, The Hague: Mouton.
Chomsky, N. (1977a) Essays on Form and Interpretation, New York: North-Holland.
Chomsky, N. (1977b) "On Wh-Movement," in P.W. Culicover, T. Wasow, and A. Akmajian (Eds.), Formal Syntax, New York: Academic Press.
Chomsky, N. (1980) Rules and Representations, New York: Columbia University Press.
Chomsky, N. (1981) Lectures on Government and Binding, Dordrecht: Foris Publications.
Chomsky, N. (1982) Some Concepts and Consequences of the Theory of Government and Binding, Cambridge, MA: MIT Press.
Chomsky, N. (1985) "Barriers," unpublished ms., MIT, Cambridge, MA.
Chomsky, N. and Lasnik, H. (1977) "Filters and control," Linguistic Inquiry 8:425-504.
Chomsky, N. and Lasnik, H. (1978) "A Note on Contraction," Linguistic Inquiry 9:268-274.
Clocksin, W.F. and Mellish, C.S. (1981) Programming in Prolog, Berlin: Springer-Verlag.
Colmerauer, A. (1970) "Les systemes-Q ou un formalisme pour analyser et synthetiser des phrases sur ordinateur," Internal Publication 43, Departement d'Informatique, Universite de Montreal, Quebec.
Colmerauer, A. (1975) "Les grammaires de metamorphose," Groupe d'Intelligence Artificielle, Universite de Marseille-Luminy. Appears as "Metamorphosis Grammars" in L. Bolc (Ed.), Natural Language Communication with Computers, Berlin: Springer, 1978.
Dahl, V. (1983) "Current Trends in Logic Grammars," Technical Report 83-2, Computer Sciences Department, Simon Fraser University, Burnaby, BC.
Dahl, V. and McCord, M. (1983) "Treating Coordination in Logic Grammars," American Journal of Computational Linguistics 9:69-91.
Dorr, B. (1985) "Toward a Principle-Based Translator," AI Working Paper 274, MIT AI Laboratory, Cambridge, MA.
Dresher, B.E. (1981) "Abstractness and explanation in phonology," in N. Hornstein and D. Lightfoot (Eds.), Explanation in Linguistics, London: Longman.
Dresher, B.E. (1984) "Can a computer learn accent systems?", talk presented May 1 at the Artificial Intelligence and Formal Linguistics Discussion Group, Simon Fraser University, Burnaby, BC.
Emonds, J. (1976) A Transformational Approach to English Syntax: Root, Structure-Preserving and Local Transformations, New York: Academic Press.
Fillmore, C. (1968) "The Case for Case," in E. Bach and R. Harms (Eds.), Universals in Linguistic Theory, New York: Holt, Rinehart, and Winston.
Fong, S. and Berwick, R. (1985) "New Approaches to Parsing Conjunctions Using Prolog," in Proceedings of the Ninth International Joint Conference on Artificial Intelligence, 870-876.
Gazdar, G. and Pullum, G. (1982) "Generalized Phrase Structure Grammar: A Theoretical Synopsis," Indiana University Linguistics Club, Bloomington, IN.
Gruber, J.S. (1965) Studies in Lexical Relations, Ph.D. dissertation, MIT, Cambridge, MA.
Hornstein, N. (1981) "The study of meaning in natural language," in N. Hornstein and D. Lightfoot (Eds.), Explanation in Linguistics, London: Longman.
Huang, C.-T.J. (1982) Logical Relations in Chinese and the Theory of Grammar, Ph.D. dissertation, MIT, Cambridge, MA.
Huang, C.-T.J. (1984) "On the Distribution and Reference of Empty Pronouns," Linguistic Inquiry 15:531-574.
Jackendoff, R.S. (1972) Semantic Interpretation in Generative Grammar, Cambridge, MA: MIT Press.
Jackendoff, R.S. (1977) X̄ Syntax: A Study of Phrase Structure, Cambridge, MA: MIT Press.
Jaeggli, O. (1982) Topics in Romance Syntax, Dordrecht: Foris Publications.
Kaplan, R. (1973) "A General Syntactic Processor," in R. Rustin (Ed.), Natural Language Processing, New York: Algorithmic Press.
Lasnik, H. and Kupin, J. (1977) "A Restrictive Theory of Transformational Grammar," Theoretical Linguistics 4:173-196.
Lasnik, H. and Saito, M. (1984) "On the Nature of Proper Government," Linguistic Inquiry 15:235-289.
Lightfoot, D. (1982) The Language Lottery: Towards a Biology of Grammars, Cambridge, MA: MIT Press.
Lyons, J. (1968) Introduction to Theoretical Linguistics, London: Cambridge University Press.
Marcus, M. (1980) Theory of Syntactic Recognition for Natural Language, Cambridge, MA: MIT Press.
Pereira, F. (1981) "Extraposition Grammars," American Journal of Computational Linguistics 7:243-255.
Pereira, F. (1984) "C-Prolog User's Manual," SRI International, Menlo Park, CA.
Pereira, F. and Warren, D. (1980) "Definite Clause Grammars for Language Analysis - A Survey of the Formalism and a Comparison with Augmented Transition Networks," Artificial Intelligence 13:231-278.
Pesetsky, D. (1982) "Complementizer-Trace Phenomena and the Nominative Island Condition," Linguistic Review 1:297-343.
Peters, S. and Ritchie, R.W. (1973) "On the Generative Power of Transformational Grammars," Information Sciences 6:49-83.
Reinhart, T. (1976) The Syntactic Domain of Anaphora, Ph.D. dissertation, MIT, Cambridge, MA.
Rizzi, L. (1982) Issues in Italian Syntax, Dordrecht: Foris Publications.
Rouveret, A. and Vergnaud, J.-R. (1980) "Specifying Reference to the Subject," Linguistic Inquiry 11:97-202.
Safir, K. (1982) Syntactic Chains and the Definiteness Effect, Ph.D. dissertation, MIT, Cambridge, MA.
Schwind, C.B. (1984) "Logic Based Natural Language Processing," Proceedings of the Natural Language Understanding and Logic Programming Workshop, IRISA, Rennes, France.
Slocum, J. (1985) "A Survey of Machine Translation: Its History, Current Status, and Future Prospects," Computational Linguistics 11:1-17.
Stabler, E. (1983) "Deterministic and Bottom-Up Parsing in PROLOG," in Proceedings of the AAAI, 383-386.
Stowell, T.A. (1981) Origins of Phrase Structure, Ph.D. dissertation, MIT, Cambridge, MA.
Taraldsen, T. (1978) "On the NIC, Vacuous Application and the That-Trace Filter," mimeographed ms., MIT; distributed by The Indiana University Linguistics Club, Bloomington, IN.
Torrego, E. (1984) "On Inversion in Spanish and Some of its Effects," Linguistic Inquiry 15:103-129.
Wehrli, E. (1983) "A Modular Parser for French," in Proceedings of the Eighth International Joint Conference on Artificial Intelligence, 686-689.
Wehrli, E. (1984) "A Government-Binding Parser for French," Working Paper No. 48, Institut pour les Etudes Semantiques et Cognitives, Universite de Geneve, Switzerland.
Wexler, K. and Culicover, P. (1980) Formal Principles of Language Acquisition, Cambridge, MA: MIT Press.
Williams, E. (1981) "Argument Structure and Morphology," The Linguistic Review 1:81-114.
Williams, E. (1984) "There-Insertion," Linguistic Inquiry 15:131-151.
Yip, K. (1985) "Tense, Aspect, and the Cognitive Representation of Time," in Proceedings of the 23rd Annual Meeting of the Association for Computational Linguistics, 18-26.

Appendix A. EXAMPLES

The examples shown here illustrate the properties of the model. Sentences are input in one language and converted to a D-structure representation. The D-structures are then translated, and all allowable S-structures are generated from the result. The working vocabulary is fairly small, but the intention is to model representative lexical items rather than to provide broad coverage of the languages.

The examples were produced on a VAX 11/750 running under UNIX 4.2 BSD at the University of British Columbia. The program is written in C-Prolog 1.5 (Pereira 1984). A minor amount of post-editing has been done to facilitate inclusion of these examples in the appendix. Diacritics have been eliminated except where ambiguity might result (e.g. hablo vs. hablo'). In the examples, a diacritic follows the vowel over which it applies.

As noted in the text, the sentences generated from questions in the source language include echo questions, yes/no questions, and wh questions, as illustrated in some of the examples below, such as Example 3. No attempt is made to limit the form of the output to that of the input sentence.

Example 1.
Enter source language: english.
Enter target language: spanish.
Enter sentence in english: John said he thought his brother had seen the film.
Juan dijo que e'l pensaba que su hermano habia visto la pelicula.
Juan dijo que pensaba que su hermano habia visto la pelicula.

Example 2.
Enter source language: spanish.
Enter target language: english.
Enter sentence in spanish: creo que eres mi amigo.
I believe that you are my friend.
I believe you are my friend.
I believe you to be my friend.

Example 3.
Enter source language: spanish.
Enter target language: english.
Enter sentence in spanish: quie'n parece haber salido?
Does who seem to have left?
Who seems to have left?
Does it seem that who has left?
Does it seem who has left?
Does it seem who has left?
Who does it seem has left?

Example 4.
Enter source language: spanish.
Enter target language: english.
Enter sentence in spanish: es probable que la mujer hablara' antes de la reunion.
It is likely that the woman will speak before the meeting.
It is likely the woman will speak before the meeting.
The woman is likely to speak before the meeting.
It is probable that the woman will speak before the meeting.
It is probable the woman will speak before the meeting.

Example 5.
Enter source language: english.
Enter target language: spanish.
Enter sentence in english: should mary know the truth about her father?
Debe saber Maria la verdad acerca de su padre?
Debe Maria saber la verdad acerca de su padre?

Example 6.
Enter source language: spanish.
Enter target language: english.
Enter sentence in spanish: parece que juan ha estirado la pata.
It seems that John has died.
It seems John has died.
John seems to have died.
It seems that John has kicked the bucket.
It seems John has kicked the bucket.
John seems to have kicked the bucket.

Example 7.
Enter source language: english.
Enter target language: spanish.
Enter sentence in english: where did you put my book?
Pusiste tu' mi libro do'nde?
Pusiste mi libro do'nde?
Do'nde pusiste tu' mi libro?
Do'nde pusiste mi libro?

Example 8.
Enter source language: english.
Enter target language: spanish.
Enter sentence in english: when did my sister put this book on the table?
Puso mi hermana este libro en la mesa cua'ndo?
Cua'ndo puso mi hermana este libro en la mesa?
Puso mi hermana este libro sobre la mesa cua'ndo?
Cua'ndo puso mi hermana este libro sobre la mesa?

Example 9.
Enter source language: english.
Enter target language: spanish.
Enter sentence in english: mary is likely to come to the meeting after the film.
Es probable que Maria viene a la reunion despues de la pelicula.

Example 10.
Enter source language: spanish.
Enter target language: english.
Enter sentence in spanish: despues de cua'l pelicula salieron?
Did they leave after which film?
Which film did they leave after?
After which film did they leave?

Example 11.
Enter source language: english.
Enter target language: spanish.
Enter sentence in english: which film did they leave after?
Salieron ellos despues de cua'l pelicula?
Salieron despues de cua'l pelicula?
Despues de cua'l pelicula salieron ellos?
Despues de cua'l pelicula salieron?

Example 12.
Enter source language: spanish.
Enter target language: english.
Enter sentence in spanish: quiero su libro.
I want his book.
I want her book.
I want its book.
I want their book.
I want your book.

Example 13.
Enter source language: spanish.
Enter target language: english.
Enter sentence in spanish: salio'?
Did he leave?
Did she leave?
Did it leave?

Example 14.
Enter source language: english.
Enter target language: spanish.
Enter sentence in english: how much time do we have?
Tenemos nosotros cua'nto tiempo?
Tenemos cua'nto tiempo?
Cua'nto tiempo tenemos nosotros?
Cua'nto tiempo tenemos?

Example 15.
Enter source language: english.
Enter target language: spanish.
Enter sentence in english: who did John know mary had said some man had spoken with?
Sabia Juan que Maria habia dicho que algun hombre habia hablado con quie'n?
Con quie'n sabia Juan que habia dicho Maria que algun hombre habia hablado?
Con quie'n sabia Juan que habia dicho Maria que habia hablado algun hombre?

Example 16.
Enter source language: spanish.
Enter target language: english.
Enter sentence in spanish: quie'n vio la pelicula?
Did the film see who?
Who did the film see?

Appendix B. Prolog Code - English Lexicon

/**********************************************************/
/*                                                        */
/*   LEXICON: English                                     */
/*                                                        */
/**********************************************************/

/**********************************************************/
/*   MORPHOLOGY                                           */
/**********************************************************/

suffixtable(english,ies,y,n,[pl(+)]).
suffixtable(english,ves,f,n,[pl(+)]).
suffixtable(english,ves,fe,n,[pl(+)]).
suffixtable(english,es,"",n,[pl(+)]).
suffixtable(english,s,"",n,[pl(+)]).
suffixtable(english,ies,y,v,[pl(-),per(3),tns(pres)]).
suffixtable(english,s,"",v,[pl(-),per(3),tns(pres)]).
suffixtable(english,ied,y,v,[tns(past)]).
suffixtable(english,ed,e,v,[tns(past)]).
suffixtable(english,ed,"",v,[tns(past)]).
suffixtable(english,en,e,v,[perf(+)]).
suffixtable(english,en,"",v,[perf(+)]).
suffixtable(english,ing,e,v,[prog(+)]).
suffixtable(english,ing,"",v,[prog(+)]).

/**********************************************************/
/*   FUNCTION WORDS                                       */
/**********************************************************/

dict(english,a,art,[f(pl(-)),spanish(un)]).
dict(english,an,art,[f(pl(-)),spanish(un)]).
dict(english,another,art,[f(pl(-)),spanish(otro)]).
dict(english,each,art,[f(pl(-)),spanish(cada)]).
dict(english,every,art,[f(pl(-)),spanish(cada)]).
dict(english,her,art,[spanish(su)]).
dict(english,his,art,[spanish(su)]).
dict(english,how_many,art,[f([wh(+),pl(+)]),
     spanish('cua''ntos')]).
dict(english,how_much,art,[f([wh(+),pl(-)]),
     spanish('cua''nto')]).
dict(english,its,art,[spanish(su)]).
dict(english,many,art,[f(pl(+)),spanish(muchos)]).
dict(english,my,art,[spanish(mi)]).
dict(english,no,art,[spanish(ningun)]).
dict(english,other,art,[f(pl(+)),spanish(otros)]).
dict(english,our,art,[spanish(nuestro)]).
dict(english,several,art,[f(pl(+)),spanish(varios)]).
dict(english,some,art,[spanish(algun)]).
dict(english,that,art,[f(pl(-)),spanish(ese)]).
dict(english,the,art,[spanish(el)]).
dict(english,their,art,[spanish(su)]).
dict(english,these,art,[f(pl(+)),spanish(estos)]).
dict(english,this,art,[f(pl(-)),spanish(este)]).
dict(english,those,art,[f(pl(+)),spanish(esos)]).
dict(english,what,art,[f(wh(+)),
     spanish(['que''','cua''l'])]).
dict(english,which,art,[f(wh(+)),spanish('cua''l')]).
dict(english,your,art,[spanish(tu)]).

dict(english,had,perf,[f(tns(past)),root(have)]).
dict(english,has,perf,[f([tns(pres),per(3),pl(-)]),
     root(have)]).
dict(english,have,perf,[spanish(haber)]).

dict(english,be,prog,[spanish(estar)]).
dict(english,been,prog,[f(perf(+)),root(be)]).
dict(english,being,prog,[f(prog(+)),root(be)]).
dict(english,is,prog,[f([tns(pres),per(3),pl(-)]),root(be)]).
dict(english,am,prog,[f([tns(pres),per(1),pl(-)]),root(be)]).
dict(english,are,prog,[f([tns(pres),pl(+)]),root(be)]).
dict(english,are,prog,[f([tns(pres),per(2)]),root(be)]).
dict(english,was,prog,[f([tns(past),per(3),pl(-)]),root(be)]).
dict(english,was,prog,[f([tns(past),per(1),pl(-)]),root(be)]).
dict(english,were,prog,[f([tns(past),pl(+)]),root(be)]).
dict(english,were,prog,[f([tns(past),per(2)]),root(be)]).

dict(english,that,comp,[subcat(infl),spanish(e)]).
dict(english,for,comp,[f(tns(-)),subcat(infl),spanish(e)]).

dict(english,he,n,[f(pro(+)),spanish('e''l')]).
dict(english,how_much,n,[f(wh(+)),spanish('cua''nto')]).
dict(english,how_many,n,[f([wh(+),pl(+)]),
     spanish('cua''ntos')]).
dict(english,i,n,[f([per(1),pro(+)]),proper(+),spanish(yo)]).
dict(english,it,n,[f(pro(+)),spanish('e''l')]).
dict(english,she,n,[f(pro(+)),spanish(ella)]).
dict(english,that,n,[spanish(eso)]).
dict(english,these,n,[f(pl(+)),spanish('e''stos')]).
dict(english,they,n,[f([pl(+),pro(+)]),spanish(ellos)]).
dict(english,this,n,[spanish(esto)]).
dict(english,those,n,[f(pl(+)),spanish('e''sos')]).
dict(english,we,n,[f([per(1),pl(+),pro(+)]),
     spanish(nosotros)]).
dict(english,what,n,[f([wh(+),per(3),pl(-)]),
     spanish('que''')]).
dict(english,which,n,[f([wh(+),per(3),pl(-)]),
     spanish('cua''l')]).
dict(english,who,n,[f([wh(+),per(3),pl(-)]),
     spanish('quie''n')]).
dict(english,whom,n,[f([wh(+),per(3),pl(-)]),
     spanish('quie''n')]).
dict(english,you,n,[f([per(2),pl(-),pro(+)]),
     spanish('tu''')]).

dict(english,can,infl,[f(tns(pres)),subcat(v),irr(could),
     spanish(poder)]).
dict(english,could,infl,[f(tns(past)),root(can)]).
dict(english,should,infl,[f(tns(pres)),subcat(v),
     spanish(deber)]).
dict(english,did,infl,[f(tns(past)),root(do)]).
dict(english,do,infl,[f(tns(pres)),irr([does,did]),subcat(v)]).
dict(english,does,infl,[f([tns(pres),per(3),pl(-)]),root(do)]).
dict(english,to,infl,[f(tns(-)),subcat(v),spanish(e)]).
dict(english,will,infl,[f(tns(fut)),subcat(v),spanish(e)]).

dict(english,about,p,[subcat(n),spanish(acerca_de)]).
dict(english,above,p,[subcat(n),spanish([encima_de,sobre])]).
dict(english,after,p,[subcat(n),spanish(despues_de)]).
dict(english,against,p,[subcat(n),spanish(contra)]).
dict(english,among,p,[subcat(n),spanish(entre)]).
dict(english,around,p,[subcat(n),spanish(alrededor_de)]).
dict(english,at,p,[subcat(n),spanish(a)]).
dict(english,before,p,[subcat(n),spanish(antes_de)]).
dict(english,behind,p,[subcat(n),spanish(detras_de)]).
dict(english,below,p,[subcat(n),spanish(debajo_de)]).
dict(english,beneath,p,[subcat(n),spanish([bajo,debajo_de])]).
dict(english,beside,p,[subcat(n),spanish(al_lado_de)]).
dict(english,between,p,[subcat(n),spanish(entre)]).
dict(english,by,p,[subcat(n),spanish(por)]).
dict(english,during,p,[subcat(n),spanish(durante)]).
dict(english,for,p,[subcat(n),spanish([por,para])]).
dict(english,from,p,[subcat(n),spanish(de)]).
dict(english,in,p,[subcat(n),spanish(en)]).
dict(english,inside,p,[subcat(n),spanish(dentro_de)]).
dict(english,into,p,[subcat(n),spanish(dentro)]).
dict(english,near,p,[subcat(n),spanish(cerca_de)]).
dict(english,of,p,[subcat(n),spanish(de)]).
dict(english,on,p,[subcat(n),spanish([en,sobre])]).
dict(english,outside,p,[subcat(n),spanish(fuera_de)]).
dict(english,over,p,[subcat(n),spanish(sobre)]).
dict(english,through,p,[subcat(n),spanish(por)]).
dict(english,to,p,[subcat(n),spanish(a)]).
dict(english,toward,p,[subcat(n),spanish(hacia)]).
dict(english,towards,p,[subcat(n),spanish(hacia)]).
dict(english,under,p,[subcat(n),spanish(debajo_de)]).
dict(english,when,p,[f(wh(+)),spanish('cua''ndo')]).
dict(english,where,p,[f(wh(+)),spanish('do''nde')]).
dict(english,with,p,[subcat(n),spanish(con)]).
dict(english,without,p,[subcat(n),spanish(sin)]).

/**********************************************************/
/*   CONTRACTIONS                                         */
/**********************************************************/

contraction(english,'i''m',[i,am]).
contraction(english,'I''m',['I',am]).
contraction(english,'i''ve',[i,have]).
contraction(english,'I''ve',['I',have]).

/**********************************************************/
/*   COLLOCATIONS                                         */
/**********************************************************/

collocation(english,[how,many],how_many).
collocation(english,[how,much],how_much).

/**********************************************************/
/*   DICTIONARY                                           */
/**********************************************************/

dict(english,am,v,[f([tns(pres),per(1),pl(-)]),root(be)]).
dict(english,are,v,[f([tns(pres),per(2)]),root(be)]).
dict(english,are,v,[f([tns(pres),pl(+)]),root(be)]).
dict(english,be,v,[subcat(a),subcat(n),subcat(p),perf(been),
     prog(being),irr([is,am,are,was,were]),
     spanish([estar,ser])]).
dict(english,bed,n,[spanish(cama)]).
dict(english,been,v,[f(perf(+)),root(be)]).
dict(english,being,v,[f(prog(+)),root(be)]).
dict(english,believe,v,[subcat(comp),subcat(n),sdel(+),
     spanish(creer)]).
dict(english,book,n,[spanish(libro)]).
dict(english,bought,v,[f([perf(+),tns(past)]),root(buy)]).
dict(english,boy,n,[spanish(muchacho)]).
dict(english,brother,n,[spanish(hermano)]).
dict(english,bucket,n,[spanish(cubeta)]).
dict(english,buy,v,[subcat(n),irr(bought),perf(bought),
     spanish(comprar)]).
dict(english,came,v,[f(tns(past)),root(come)]).
dict(english,certain,a,[f(theta(-)),sdel(+),subcat(comp),
     spanish(cierto)]).
dict(english,come,v,[f(perf(+)),root(come)]).
dict(english,come,v,[irr(came),perf(come),subcat(p),
     spanish(venir)]).
dict(english,cost,v,[f([perf(+),tns(past)]),root(cost)]).
dict(english,cost,v,[irr(cost),perf(cost),subcat(n),
     spanish(costar)]).
dict(english,did,v,[f(tns(past)),root(do)]).
dict(english,die,v,[perf(died),prog(dying),spanish(morir)]).
dict(english,died,v,[f([tns(past),perf(+)]),root(die)]).
dict(english,do,v,[irr([does,did]),perf(done),subcat(n),
     spanish(hacer)]).
dict(english,does,v,[f([tns(pres),per(3),pl(-)]),root(do)]).
dict(english,done,v,[f(perf(+)),root(do)]).
dict(english,dying,v,[f(prog(+)),root(die)]).
dict(english,expect,v,[subcat(comp),spanish(suponer)]).
dict(english,father,n,[spanish(padre)]).
dict(english,film,n,[spanish(pelicula)]).
dict(english,firm,n,[spanish(empresa)]).
dict(english,floor,n,[spanish(piso)]).
dict(english,friend,n,[spanish(amigo)]).
dict(english,gave,v,[f(tns(past)),root(give)]).
dict(english,girl,n,[spanish(muchacha)]).
dict(english,give,v,[subcat([n,p]),irr([gave]),spanish(dar)]).
dict(english,go,v,[subcat(p),subcat(comp),perf(gone),
     irr([goes,went]),spanish(ir)]).
dict(english,goes,v,[f([tns(pres),per(3),pl(-)]),root(go)]).
dict(english,gone,v,[f(perf(+)),root(go)]).
dict(english,had,v,[f([perf(+),tns(past)]),root(have)]).
dict(english,has,v,[f([tns(pres),per(3),pl(-)]),root(have)]).
dict(english,have,v,[subcat(n),irr([has,had]),perf(had),
     spanish(tener)]).
dict(english,house,n,[spanish(casa)]).
dict(english,is,v,[f([tns(pres),per(3),pl(-)]),root(be)]).
dict(english,john,n,[proper(+),spanish(juan)]).
dict(english,kick,v,[expr([[the,bucket],spanish([morir,
     expr([estirar,la,pata])])]),subcat(n),
     perf(kicked),spanish(patear)]).
dict(english,kicked,v,[f([perf(+),tns(past)]),root(kick)]).
dict(english,knew,v,[f(tns(past)),root(know)]).
dict(english,know,v,[subcat(comp),subcat(n),sdel(+),
     perf(known),irr(knew),spanish(saber)]).
dict(english,known,v,[f(perf(+)),root(know)]).
dict(english,leave,v,[irr(left),perf(left),spanish(salir)]).
dict(english,left,v,[f([perf(+),tns(past)]),root(leave)]).
dict(english,likely,a,[f(theta(-)),subcat(comp),sdel(+),
     spanish(probable)]).
dict(english,made,v,[f([perf(+),tns(past)]),root(make)]).
dict(english,make,v,[perf(made),irr(made),subcat(n),
     spanish(hacer)]).
dict(english,man,n,[pl(men),spanish(hombre)]).
dict(english,mary,n,[proper(+),spanish(maria)]).
dict(english,meeting,n,[spanish(reunion)]).
dict(english,men,n,[f(pl(+)),root(man)]).
dict(english,mother,n,[spanish(madre)]).
dict(english,movie,n,[spanish(cine)]).
dict(english,paw,n,[spanish(pata)]).
dict(english,possible,a,[f(theta(-)),subcat(comp),
     spanish(posible)]).
dict(english,probable,a,[f(theta(-)),subcat(comp),
     spanish(probable)]).
dict(english,put,v,[f([perf(+),tns(past)]),root(put)]).
dict(english,put,v,[subcat([n,p]),irr(put),perf(put),
     prog(putting),spanish(poner)]).
dict(english,putting,v,[f(prog(+)),root(put)]).
dict(english,said,v,[f([perf(+),tns(past)]),root(say)]).
dict(english,saw,v,[f(tns(past)),root(see)]).
dict(english,say,v,[subcat(comp),irr(said),perf(said),
     spanish(decir)]).
dict(english,see,v,[subcat(n),prog(seeing),perf(seen),
     irr(saw),spanish(ver)]).
dict(english,seeing,v,[f(prog(+)),root(see)]).
dict(english,seem,v,[f(theta(-)),sdel(+),subcat(comp),
     spanish(parecer)]).
dict(english,seen,v,[f(perf(+)),root(see)]).
dict(english,sell,v,[irr(sold),perf(sold),subcat(n),
     spanish(vender)]).
dict(english,side,n,[spanish(lado)]).
dict(english,sister,n,[spanish(hermana)]).
dict(english,sleep,v,[irr([slept]),perf(slept),
     spanish(dormir)]).
dict(english,slept,v,[f([perf(+),tns(past)]),root(sleep)]).
dict(english,sold,v,[f([perf(+),tns(past)]),root(sell)]).
dict(english,spain,n,[proper(+),spanish(espana)]).
dict(english,speak,v,[irr(spoke),perf(spoken),subcat(p),
     spanish(hablar)]).
dict(english,spoke,v,[f(tns(past)),root(speak)]).
dict(english,spoken,v,[f(perf(+)),root(speak)]).
dict(english,stretch,v,[subcat(n),spanish(estirar)]).
dict(english,table,n,[spanish(mesa)]).
dict(english,thing,n,[spanish(cosa)]).
dict(english,think,v,[subcat(comp),irr([thought]),
     perf(thought),spanish(pensar)]).
dict(english,thought,v,[f([perf(+),tns(past)]),root(think)]).
dict(english,time,n,[spanish(tiempo)]).
dict(english,truth,n,[spanish(verdad)]).
dict(english,want,v,[perf(wanted),subcat(comp),subcat(n),
     sdel(+),spanish(querer)]).
dict(english,wanted,v,[f([perf(+),tns(past)]),root(want)]).
dict(english,was,v,[f([tns(past),per(1),pl(-)]),root(be)]).
dict(english,was,v,[f([tns(past),per(3),pl(-)]),root(be)]).
dict(english,went,v,[f(tns(past)),root(go)]).
dict(english,were,v,[f([tns(past),per(2)]),root(be)]).
dict(english,were,v,[f([tns(past),pl(+)]),root(be)]).
dict(english,woman,n,[pl(women),spanish(mujer)]).
dict(english,women,n,[f(pl(+)),root(woman)]).
dict(english,written,v,[f(perf(+)),root(write)]).
dict(english,wrote,v,[f(tns(past)),root(write)]).

Appendix C. Prolog Code - Spanish Lexicon

/**********************************************************/
/*                                                        */
/*   LEXICON: Spanish                                     */
/*                                                        */
/**********************************************************/

/**********************************************************/
/*   MORPHOLOGY                                           */
/**********************************************************/

suffixtable(spanish,es,"",n,[pl(+)]).
suffixtable(spanish,s,"",n,[pl(+)]).
suffixtable(spanish,a,o,n,[fem(+)]).
suffixtable(spanish,as,o,n,[fem(+),pl(+)]).

suffixtable(spanish,o,ar,v,[per(1),pl(-),tns(pres)]).
suffixtable(spanish,as,ar,v,[per(2),pl(-),tns(pres)]).
suffixtable(spanish,a,ar,v,[per(3),pl(-),tns(pres)]).
suffixtable(spanish,amos,ar,v,[per(1),pl(+),tns(pres)]).
suffixtable(spanish,an,ar,v,[per(3),pl(+),tns(pres)]).
suffixtable(spanish,abas,ar,v,[per(2),pl(-),tns(past)]).
suffixtable(spanish,aba,ar,v,[per(3),pl(-),tns(past)]).
s u f f i x t a b l e ( S p a n i s h , a b a , a r , v , [ p e r ( l ) , p l ( - ) , t n s ( p a s t ) ] ) . suffixtable(Spanish,abamos,ar,v,[per(1),pl(+),tns(past)]). suffixtable(Spanish,aban,ar,v,[per(3),pi(+),tns(past)]). s u f f i x t a b l e ( S p a n i s h , ' e ' ' ' , a r , v , [ p e r ( l ) , p l ( - ) , t n s ( p a s t ) ] ) . suffixtable(spanish,aste,ar,v,[per(2),pl(-),tns(past)]). suffixtable(Spanish,'o'" ,ar,v,[per(3),pl(-),tns(past)]). suffixtable(spanish,amos,ar,v,[per(l),pl(+),tns(past)])• suffixtable(spanish,aron,ar,v,[per(3),pl(+),tns(past)]). suffixtable(Spanish,o,er,v,[per(1),pl(-),tns(pres)]). s u f f i x t a b l e ( s p a n i s h , e s , e r , v , [ p e r ( 2 ) , p l ( - ) , t n s ( p r e s ) ] ) . suffixtable(Spanish,e,er,v,[per(3),pl(-),tns(pres)]). suffixtable(spanish,emos,er,v,[per(1),pl(+),tns(pres)]). suffixtable(Spanish,en,er,v,[per(3),pl(+),tns(pres)]). s u f f i x t a b l e ( S p a n i s h , i a s , e r , v , [ p e r ( 2 ) , p l ( - ) , t n s ( p a s t ) ] ) . s u f f i x t a b l e ( S p a n i s h , i a , e r , v , [ p e r ( 3 ) , p l ( - ) , t n s ( p a s t ) ] ) . s u f f i x t a b l e ( S p a n i s h , i a , e r , v , [ p e r ( l ) , p l ( - ) , t n s ( p a s t ) ] ) . suffixtable(Spanish,iamos,er,v,[per(1),pl(+),tns(past)]). suffixtable(Spanish,ian,er,v,[per(3),pl(+),tns(past)]). s u f f i x t a b l e ( S p a n i s h , ' i ' ' ' , e r , v , [ p e r ( 1 ) , p l ( - ) , t n s ( p a s t ) ] ) . suffixtable(Spanish,iste,er,v,[per(2),pl(-),tns(past)]). s u f f i x t a b l e ( S p a n i s h , ' i o ' 1 1 , e r , v , [ p e r ( 3 ) , p l ( - ) , t n s ( p a s t ) ] ) . suffixtable(Spanish,imos,er,v,[per(l),pl(+),tns(past)]). suffixtable(Spanish,ieron,er,v,[per(3),pl(+),tns(past)]). suffixtable(Spanish,o,ir,v,[per(l),pl(-),tns(pres)]). suffixtable(spanish,es,ir,v,[per(2),pl(-),tns(pres)]). suffixtable(Spanish,e,ir,v,[per(3),pl(-),tns(pres)]). 123 suffixtable(Spanish,imos,ir,v,[per(l),pl(+),tns(pres)]). 
suffixtable(spanish,en,ir,v,[per(3),pl(+),tns(pres)]).
suffixtable(spanish,ias,ir,v,[per(2),pl(-),tns(past)]).
suffixtable(spanish,ia,ir,v,[per(3),pl(-),tns(past)]).
suffixtable(spanish,ia,ir,v,[per(1),pl(-),tns(past)]).
suffixtable(spanish,iamos,ir,v,[per(1),pl(+),tns(past)]).
suffixtable(spanish,ian,ir,v,[per(3),pl(+),tns(past)]).
suffixtable(spanish,'i''',ir,v,[per(1),pl(-),tns(past)]).
suffixtable(spanish,iste,ir,v,[per(2),pl(-),tns(past)]).
suffixtable(spanish,'io''',ir,v,[per(3),pl(-),tns(past)]).
suffixtable(spanish,imos,ir,v,[per(1),pl(+),tns(past)]).
suffixtable(spanish,ieron,ir,v,[per(3),pl(+),tns(past)]).
suffixtable(spanish,'e''',"",v,[per(1),pl(-),tns(fut)]).
suffixtable(spanish,'a''s',"",v,[per(2),pl(-),tns(fut)]).
suffixtable(spanish,'a''',"",v,[per(3),pl(-),tns(fut)]).
suffixtable(spanish,'ara''mos',ar,v,[per(1),pl(+),tns(fut)]).
suffixtable(spanish,'e''mos',"",v,[per(1),pl(+),tns(fut)]).
suffixtable(spanish,'a''n',"",v,[per(3),pl(+),tns(fut)]).
suffixtable(spanish,ado,ar,v,[perf(+)]).
suffixtable(spanish,ido,er,v,[perf(+)]).
suffixtable(spanish,ido,ir,v,[perf(+)]).
suffixtable(spanish,ando,ar,v,[prog(+)]).
suffixtable(spanish,iendo,er,v,[prog(+)]).
suffixtable(spanish,iendo,ir,v,[prog(+)]).
suffixtable(spanish,o,er,infl,[per(1),pl(-),tns(pres)]).
suffixtable(spanish,es,er,infl,[per(2),pl(-),tns(pres)]).
suffixtable(spanish,e,er,infl,[per(3),pl(-),tns(pres)]).
suffixtable(spanish,emos,er,infl,[per(1),pl(+),tns(pres)]).
suffixtable(spanish,en,er,infl,[per(3),pl(+),tns(pres)]).
suffixtable(spanish,o,"",stem1,[per(1),pl(-),tns(pres)]).
suffixtable(spanish,as,"",stem1,[per(2),pl(-),tns(pres)]).
suffixtable(spanish,a,"",stem1,[per(3),pl(-),tns(pres)]).
suffixtable(spanish,an,"",stem1,[per(3),pl(+),tns(pres)]).
suffixtable(spanish,o,"",stem2,[per(1),pl(-),tns(pres)]).
suffixtable(spanish,es,"",stem2,[per(2),pl(-),tns(pres)]).
suffixtable(spanish,e,"",stem2,[per(3),pl(-),tns(pres)]).
suffixtable(spanish,en,"",stem2,[per(3),pl(+),tns(pres)]).
suffixtable(spanish,e,"",stem3,[per(1),pl(-),tns(past)]).
suffixtable(spanish,iste,"",stem3,[per(2),pl(-),tns(past)]).
suffixtable(spanish,o,"",stem3,[per(3),pl(-),tns(past)]).
suffixtable(spanish,imos,"",stem3,[per(1),pl(+),tns(past)]).
suffixtable(spanish,eron,"",stem3,[per(3),pl(+),tns(past)]).
suffixtable(spanish,'i''',"",stem4,[per(1),pl(-),tns(past)]).
suffixtable(spanish,iste,"",stem4,[per(2),pl(-),tns(past)]).
suffixtable(spanish,'io''',"",stem4,[per(3),pl(-),tns(past)]).
suffixtable(spanish,imos,"",stem4,[per(1),pl(+),tns(past)]).
suffixtable(spanish,ieron,"",stem4,[per(3),pl(+),tns(past)]).
suffixtable(spanish,'e''',"",stem5,[per(1),pl(-),tns(fut)]).
suffixtable(spanish,'a''s',"",stem5,[per(2),pl(-),tns(fut)]).
suffixtable(spanish,'a''',"",stem5,[per(3),pl(-),tns(fut)]).
suffixtable(spanish,'e''mos',"",stem5,[per(1),pl(+),tns(fut)]).
suffixtable(spanish,'a''n',"",stem5,[per(3),pl(+),tns(fut)]).
suffixtable(spanish,a,"",art,[fem(+)]).
suffixtable(spanish,a,o,art,[fem(+)]).
suffixtable(spanish,as,"",art,[fem(+),pl(+)]).
suffixtable(spanish,as,os,art,[fem(+),pl(+)]).
suffixtable(spanish,es,"",art,[pl(+)]).
suffixtable(spanish,os,"",art,[pl(+)]).
suffixtable(spanish,s,"",art,[pl(+)]).

/**********************************************************/
/*  FUNCTION WORDS                                        */
/**********************************************************/

dict(spanish,algun,art,[pl(algunos),english(some)]).
dict(spanish,algunos,art,[f(pl(+)),root(algun)]).
dict(spanish,cada,art,[f(fem(_)),english([each,every])]).
dict(spanish,'cua''l',art,[f([wh(+),fem(_),pl(-)]),
     pl('cua''les'),english(which)]).
dict(spanish,'cua''les',art,[f([wh(+),fem(_),pl(+)]),
     root('cua''l')]).
dict(spanish,'cua''nto',art,[f([wh(+),pl(-)]),
     english(how_much)]).
dict(spanish,'cua''ntos',art,[f([wh(+),pl(+)]),
     english(how_many)]).
dict(spanish,el,art,[fem(la),pl(los),english(the)]).
dict(spanish,esa,art,[f(fem(+)),root(ese)]).
dict(spanish,ese,art,[fem(esa),english(that)]).
dict(spanish,esos,art,[f(pl(+)),english(those)]).
dict(spanish,esta,art,[f(fem(+)),root(este)]).
dict(spanish,este,art,[fem(esta),pl(estos),english(this)]).
dict(spanish,estos,art,[f(pl(+)),english(these)]).
dict(spanish,la,art,[f(fem(+)),root(el)]).
dict(spanish,las,art,[f([fem(+),pl(+)]),root(el)]).
dict(spanish,los,art,[f(pl(+)),root(el)]).
dict(spanish,mi,art,[f(fem(_)),english(my)]).
dict(spanish,muchos,art,[f(pl(+)),english(many)]).
dict(spanish,ningun,art,[english(no)]).
dict(spanish,nuestro,art,[english(our)]).
dict(spanish,otro,art,[english(another)]).
dict(spanish,otros,art,[f(pl(+)),english(other)]).
dict(spanish,'que''',art,[f(wh(+)),english(what)]).
dict(spanish,su,art,[f(fem(_)),english([his,her,its,their,
     your])]).
dict(spanish,tu,art,[f(fem(_)),english(your)]).
dict(spanish,un,art,[english(a)]).
dict(spanish,unos,art,[f(pl(+)),english(some)]).
dict(spanish,varios,art,[f(pl(+)),english(several)]).

dict(spanish,que,comp,[english(that)]).

dict(spanish,haber,perf,[english(have)]).
dict(spanish,he,perf,[f([tns(pres),per(1),pl(-)]),
     root(haber)]).
dict(spanish,has,perf,[f([tns(pres),per(2),pl(-)]),
     root(haber)]).
dict(spanish,ha,perf,[f([tns(pres),per(3),pl(-)]),
     root(haber)]).
dict(spanish,hemos,perf,[f([tns(pres),per(1),pl(+)]),
     root(haber)]).
dict(spanish,han,perf,[f([tns(pres),per(3),pl(+)]),
     root(haber)]).
dict(spanish,habias,perf,[f([tns(past),per(2),pl(-)]),
     root(haber)]).
dict(spanish,habia,perf,[f([tns(past),per(3),pl(-)]),
     root(haber)]).
dict(spanish,habia,perf,[f([tns(past),per(1),pl(-)]),
     root(haber)]).
dict(spanish,habiamos,perf,[f([tns(past),per(1),pl(+)]),
     root(haber)]).
dict(spanish,habian,perf,[f([tns(past),per(3),pl(+)]),
     root(haber)]).
dict(spanish,'habre''',perf,[f([tns(fut),per(1),pl(-)]),
     root(haber)]).
dict(spanish,'habra''s',perf,[f([tns(fut),per(2),pl(-)]),
     root(haber)]).
dict(spanish,'habra''',perf,[f([tns(fut),per(3),pl(-)]),
     root(haber)]).
dict(spanish,'habre''mos',perf,[f([tns(fut),per(1),pl(+)]),
     root(haber)]).
dict(spanish,'habra''n',perf,[f([tns(fut),per(3),pl(+)]),
     root(haber)]).

dict(spanish,estar,prog,[english(be)]).
dict(spanish,estoy,prog,[f([tns(pres),per(1),pl(-)]),
     root(estar)]).
dict(spanish,'esta''s',prog,[f([tns(pres),per(2),pl(-)]),
     root(estar)]).
dict(spanish,'esta''',prog,[f([tns(pres),per(3),pl(-)]),
     root(estar)]).
dict(spanish,estamos,prog,[f([tns(pres),per(1),pl(+)]),
     root(estar)]).
dict(spanish,estan,prog,[f([tns(pres),per(3),pl(+)]),
     root(estar)]).
dict(spanish,estado,prog,[f([perf(+)]),root(estar)]).
dict(spanish,estabas,prog,[f([tns(past),per(2),pl(-)]),
     root(estar)]).
dict(spanish,estaba,prog,[f([tns(past),per(3),pl(-)]),
     root(estar)]).
dict(spanish,estaba,prog,[f([tns(past),per(1),pl(-)]),
     root(estar)]).
dict(spanish,estabamos,prog,[f([tns(past),per(1),pl(+)]),
     root(estar)]).
dict(spanish,estaban,prog,[f([tns(past),per(3),pl(+)]),
     root(estar)]).
dict(spanish,'estare''',prog,[f([tns(fut),per(1),pl(-)]),
     root(estar)]).
dict(spanish,'estara''s',prog,[f([tns(fut),per(2),pl(-)]),
     root(estar)]).
dict(spanish,'estara''',prog,[f([tns(fut),per(3),pl(-)]),
     root(estar)]).
dict(spanish,'estare''mos',prog,[f([tns(fut),per(1),pl(+)]),
     root(estar)]).
dict(spanish,'estara''n',prog,[f([tns(fut),per(3),pl(+)]),
     root(estar)]).

dict(spanish,'cua''l',n,[f([wh(+),pl(-)]),english(which)]).
dict(spanish,'cua''les',n,[f([wh(+),pl(+)]),root('cua''l')]).
dict(spanish,'cua''nto',n,[f([wh(+),pl(-)]),
     english(how_much)]).
dict(spanish,'cua''ntos',n,[f([wh(+),pl(+)]),
     english(how_many)]).
dict(spanish,'e''l',n,[f(pro(+)),fem(ella),pl(ellos),
     english([he,it])]).
dict(spanish,ella,n,[f([fem(+),pro(+)]),english(she)]).
dict(spanish,ellos,n,[f([pl(+),pro(+)]),english(they)]).
dict(spanish,'e''sa',n,[f(fem(+)),root(eso)]).
dict(spanish,'e''so',n,[english(that)]).
dict(spanish,'e''sos',n,[f(pl(+)),english(those)]).
dict(spanish,'e''sto',n,[english(this)]).
dict(spanish,'e''stos',n,[f(pl(+)),english(these)]).
dict(spanish,nosotros,n,[f([per(1),pl(+),pro(+)]),
     english(we)]).
dict(spanish,'que''',n,[f(wh(+)),english(what)]).
dict(spanish,'quie''n',n,[f([wh(+),pl(-)]),english(who)]).
dict(spanish,'quie''nes',n,[f([wh(+),pl(+)]),english(who)]).
dict(spanish,'tu''',n,[f([pl(-),per(2),pro(+)]),
     english(you)]).
dict(spanish,yo,n,[f([pl(-),per(1),pro(+)]),english(i)]).

dict(spanish,poder,infl,[english(can),stem2(pued),stem3(pud),
     stem5(podr),irr(pudieron)]).
dict(spanish,podr,stem5,[f(tns(fut)),root(poder)]).
dict(spanish,pud,stem3,[f(tns(past)),root(poder)]).
dict(spanish,pudieron,infl,[f([tns(past),per(3),pl(+)]),
     root(poder)]).
dict(spanish,pued,stem2,[f(tns(pres)),root(poder)]).
dict(spanish,deber,infl,[english(should)]).
dict(spanish,a,infl,[f(tns(-)),english(to)]).

dict(spanish,a,p,[english([to,at])]).
dict(spanish,al_lado_de,p,[english(beside)]).
dict(spanish,acerca_de,p,[english(about)]).
dict(spanish,antes_de,p,[english(before)]).
dict(spanish,alrededor_de,p,[english(around)]).
dict(spanish,bajo,p,[english(beneath)]).
dict(spanish,cerca_de,p,[english(near)]).
dict(spanish,con,p,[english(with)]).
dict(spanish,contra,p,[english(against)]).
dict(spanish,'cua''ndo',p,[f(wh(+)),english(when)]).
dict(spanish,de,p,[english([from,of])]).
dict(spanish,debajo_de,p,[english([below,beneath,under])]).
dict(spanish,dentro_de,p,[english(inside)]).
dict(spanish,despues_de,p,[english(after)]).
dict(spanish,detras_de,p,[english(behind)]).
dict(spanish,'do''nde',p,[f(wh(+)),english(where)]).
dict(spanish,durante,p,[english(during)]).
dict(spanish,en,p,[english(on)]).
dict(spanish,encima_de,p,[english(above)]).
dict(spanish,entre,p,[english([among,between])]).
dict(spanish,fuera_de,p,[english(outside)]).
dict(spanish,hacia,p,[english(toward)]).
dict(spanish,para,p,[english(for)]).
dict(spanish,por,p,[english([for,by,through])]).
dict(spanish,sin,p,[english(without)]).
dict(spanish,sobre,p,[english([over,on,above])]).

/**********************************************************/
/*  CONTRACTIONS                                          */
/**********************************************************/

contraction(spanish,al,[a,el]).
contraction(spanish,del,[de,el]).

/*  COLLOCATIONS  */

collocation(spanish,[a,el,lado,de],al_lado_de).
collocation(spanish,[acerca,de],acerca_de).
collocation(spanish,[alrededor,de],alrededor_de).
collocation(spanish,[antes,de],antes_de).
collocation(spanish,[cerca,de],cerca_de).
collocation(spanish,[debajo,de],debajo_de).
collocation(spanish,[dentro,de],dentro_de).
collocation(spanish,[despues,de],despues_de).
collocation(spanish,[detras,de],detras_de).
collocation(spanish,[encima,de],encima_de).
collocation(spanish,[fuera,de],fuera_de).

/**********************************************************/
/*  DICTIONARY                                            */
/**********************************************************/

dict(spanish,'se''',v,[f([tns(pres),per(1),pl(-)]),
     root(saber)]).
dict(spanish,amigo,n,[english(friend)]).
dict(spanish,cama,n,[f(fem(+)),english(bed)]).
dict(spanish,carta,n,[f(fem(+)),english(letter)]).
dict(spanish,casa,n,[f(fem(+)),english(house)]).
dict(spanish,cierto,a,[f(theta(-)),english(certain)]).
dict(spanish,cine,n,[english(movie)]).
dict(spanish,comprar,v,[english(buy)]).
dict(spanish,cosa,n,[f(fem(+)),english(thing)]).
dict(spanish,costar,v,[stem1(cuest),english(cost)]).
dict(spanish,creer,v,[prog(creyendo),english(believe)]).
dict(spanish,cubeta,n,[f(fem(+)),english(bucket)]).
dict(spanish,cuest,stem1,[f(tns(pres)),root(costar)]).
dict(spanish,d,stem4,[f(tns(past)),root(dar)]).
dict(spanish,dar,v,[irr([doy,dio]),stem4(d),english(give)]).
dict(spanish,decir,v,[english(say),perf(dicho),
     prog(diciendo),irr(digo),stem2(dic),stem3(dij)]).
dict(spanish,dic,stem2,[f(tns(pres)),root(decir)]).
dict(spanish,dicho,v,[f(perf(+)),root(decir)]).
dict(spanish,diciendo,v,[f(prog(+)),root(decir)]).
dict(spanish,digo,v,[f([tns(pres),per(1),pl(-)]),
     root(decir)]).
dict(spanish,dij,stem3,[f(tns(past)),root(decir)]).
dict(spanish,dio,v,[f([tns(past),per(3),pl(-)]),root(dar)]).
dict(spanish,dormir,v,[english(sleep),prog(durmiendo),
     stem2(duerm),stem4(durm)]).
dict(spanish,doy,v,[f([per(1),pl(-),tns(pres)]),root(dar)]).
dict(spanish,duerm,stem2,[f(tns(pres)),root(dormir)]).
dict(spanish,durm,stem4,[f(tns(past)),root(dormir)]).
dict(spanish,durmiendo,v,[f(prog(+)),root(dormir)]).
dict(spanish,empresa,n,[f(fem(+)),english(firm)]).
dict(spanish,era,v,[f([tns(past),per(1),pl(-)]),root(ser)]).
dict(spanish,era,v,[f([tns(past),per(3),pl(-)]),root(ser)]).
dict(spanish,eramos,v,[f([tns(past),per(1),pl(+)]),
     root(ser)]).
dict(spanish,eran,v,[f([tns(past),per(3),pl(+)]),root(ser)]).
dict(spanish,eras,v,[f([tns(past),per(2),pl(-)]),root(ser)]).
dict(spanish,eres,v,[f([tns(pres),per(2),pl(-)]),root(ser)]).
dict(spanish,es,v,[f([tns(pres),per(3),pl(-)]),root(ser)]).
dict(spanish,escribir,v,[english(write)]).
dict(spanish,espana,n,[proper(+),english(spain)]).
dict(spanish,estirar,v,[expr([[la,pata],english([die,
     expr([kick,the,bucket])])]),english(stretch)]).
dict(spanish,fu,stem3,[f(tns(past)),root(ir)]).
dict(spanish,fue,v,[f([tns(past),per(3),pl(-)]),root(ir)]).
dict(spanish,fui,v,[f([tns(past),per(1),pl(-)]),root(ir)]).
dict(spanish,hablar,v,[english(speak)]).
dict(spanish,hacer,v,[english(do),perf(hecho),
     irr([hago,hizo,hicieron]),stem3(hic)]).
dict(spanish,hago,v,[f([tns(pres),per(1),pl(-)]),
     root(hacer)]).
dict(spanish,hecho,v,[f(perf(+)),root(hacer)]).
dict(spanish,hermana,n,[english(sister),f(fem(+))]).
dict(spanish,hermano,n,[english(brother)]).
dict(spanish,hic,stem3,[f(tns(past)),root(hacer)]).
dict(spanish,hicieron,v,[f([tns(past),per(3),pl(+)]),
     root(hacer)]).
dict(spanish,hicimos,v,[f([tns(past),pl(+),per(1)]),
     root(hacer)]).
dict(spanish,hizo,v,[f([tns(past),per(3),pl(-)]),
     root(hacer)]).
dict(spanish,hombre,n,[english(man)]).
dict(spanish,ir,v,[english(go),irr([voy,vamos,fui,fue]),
     stem1(v),stem3(fu),prog(yendo)]).
dict(spanish,juan,n,[proper(+),english(john)]).
dict(spanish,lado,n,[english(side)]).
dict(spanish,libro,n,[english(book)]).
dict(spanish,madre,n,[f(fem(+)),english(mother)]).
dict(spanish,maria,n,[proper(+),english(mary)]).
dict(spanish,mesa,n,[f(fem(+)),english(table)]).
dict(spanish,morir,v,[english(die),perf(muerto),stem2(muer),
     stem4(mur)]).
dict(spanish,muchacha,n,[f(fem(+)),english(girl)]).
dict(spanish,muchacho,n,[english(boy)]).
dict(spanish,muer,stem2,[f(tns(pres)),root(morir)]).
dict(spanish,muerto,v,[f(perf(+)),root(morir)]).
dict(spanish,mujer,n,[f(fem(+)),pl(mujeres),english(woman)]).
dict(spanish,mur,stem4,[f(tns(past)),root(morir)]).
dict(spanish,padre,n,[english(father)]).
dict(spanish,parecer,v,[f(theta(-)),sdel(+),english(seem),
     irr([parezco])]).
dict(spanish,parezco,v,[f([per(1),pl(-),tns(pres)]),
     root(parecer)]).
dict(spanish,pata,n,[f(fem(+)),english(paw)]).
dict(spanish,patear,v,[english(kick)]).
dict(spanish,pelicula,n,[f(fem(+)),english(film)]).
dict(spanish,pensar,v,[english(think),stem1(piens)]).
dict(spanish,piens,stem1,[f(tns(pres)),root(pensar)]).
dict(spanish,piso,n,[english(floor)]).
dict(spanish,poner,v,[english(put),perf(puesto),irr([pongo,
     pusimos,pusieron]),stem3(pus)]).
dict(spanish,pongo,v,[f([per(1),pl(-),tns(pres)]),
     root(poner)]).
dict(spanish,posible,a,[f(theta(-)),english(possible)]).
dict(spanish,probable,a,[f(theta(-)),english([likely,
     probable])]).
dict(spanish,puesto,v,[f(perf(+)),root(poner)]).
dict(spanish,pus,stem3,[f(tns(past)),root(poner)]).
dict(spanish,pusieron,v,[f([tns(past),per(3),pl(+)]),
     root(poner)]).
dict(spanish,pusimos,v,[f([tns(past),per(1),pl(+)]),
     root(poner)]).
dict(spanish,querer,v,[english(want),stem2(quier)]).
dict(spanish,quier,stem2,[f(tns(pres)),root(querer)]).
dict(spanish,reunion,n,[f(fem(+)),english(meeting)]).
dict(spanish,saber,v,[english(know),irr('se'''),
     stem5(sabr)]).
dict(spanish,sabr,stem5,[f(tns(fut)),root(saber)]).
dict(spanish,sal,stem4,[f(tns(past)),root(salir)]).
dict(spanish,saldr,stem5,[f(tns(fut)),root(salir)]).
dict(spanish,salgo,v,[f([tns(pres),per(1),pl(-)]),
     root(salir)]).
dict(spanish,salir,v,[english(leave),irr(salgo),stem4(sal),
     stem5(saldr)]).
dict(spanish,ser,v,[english(be),irr([soy,eres,es,somos,son,
     era,eras,eramos,eran])]).
dict(spanish,somos,v,[f([tns(pres),per(1),pl(+)]),root(ser)]).
dict(spanish,son,v,[f([tns(pres),per(3),pl(+)]),root(ser)]).
dict(spanish,soy,v,[f([tns(pres),per(1),pl(-)]),root(ser)]).
dict(spanish,suponer,v,[english(expect),perf(supuesto),
     irr([supongo])]).
dict(spanish,supongo,v,[f([per(1),pl(-),tns(pres)]),
     root(suponer)]).
dict(spanish,tendr,stem5,[f(tns(fut)),root(tener)]).
dict(spanish,tener,v,[english(have),irr([tengo,tuvimos,
     tuvieron]),stem2(tien),stem3(tuv),stem5(tendr)]).
dict(spanish,tengo,v,[f([tns(pres),per(1),pl(-)]),
     root(tener)]).
dict(spanish,tiempo,n,[english(time)]).
dict(spanish,tien,stem2,[f(tns(pres)),root(tener)]).
dict(spanish,tuv,stem3,[f(tns(past)),root(tener)]).
dict(spanish,tuvieron,v,[f([tns(past),per(3),pl(+)]),
     root(tener)]).
dict(spanish,tuvimos,v,[f([tns(past),per(1),pl(+)]),
     root(tener)]).
dict(spanish,v,stem1,[f(tns(pres)),root(ir)]).
dict(spanish,v,stem4,[f(tns(past)),root(ver)]).
dict(spanish,vamos,v,[f([tns(pres),per(1),pl(+)]),root(ir)]).
dict(spanish,vender,v,[english(sell)]).
dict(spanish,vendr,stem5,[f(tns(fut)),root(venir)]).
dict(spanish,vengo,v,[f([per(1),pl(-),tns(pres)]),
     root(venir)]).
dict(spanish,venir,v,[irr([vengo,vinieron,viniendo]),
     stem2(vien),stem3(vin),stem5(vendr),english(come)]).
dict(spanish,veo,v,[f([tns(pres),per(1),pl(-)]),root(ver)]).
dict(spanish,ver,v,[english(see),perf(visto),irr([veo,vio]),
     stem4(v)]).
dict(spanish,verdad,n,[f(fem(+)),english(truth)]).
dict(spanish,vien,stem2,[f(tns(pres)),root(venir)]).
dict(spanish,vin,stem3,[f(tns(past)),root(venir)]).
dict(spanish,viniendo,v,[f(prog(+)),root(venir)]).
dict(spanish,vinieron,v,[f([tns(past),per(3),pl(+)]),
     root(venir)]).
dict(spanish,vio,v,[f([tns(past),per(3),pl(-)]),root(ver)]).
dict(spanish,visto,v,[f(perf(+)),root(ver)]).
dict(spanish,voy,v,[f([tns(pres),per(1),pl(-)]),root(ir)]).
dict(spanish,yendo,v,[f(prog(+)),root(ir)]).

Appendix D. Prolog Code - EXEC

/**********************************************************/
/*                                                        */
/*  EXECUTIVE                                             */
/*                                                        */
/**********************************************************/

gb :-
     readinput(SL,TL,Surface), !,
     morph(SL,Surface,NewSurface), !,
     parse(SL,NewSurface,Sstructure),
     write('...Parsed'), nl, !,
     rtransformation(SL,Sstructure,Dstructure),
     write('...R-transformed'), nl,
     show(ds,'D-Structure =',Dstructure), !,
     bagof(TDstructure,
           translate(SL,Dstructure,TL,TDstructure),
           TDstructures),
     write('...Translated'), nl, !,
     transform_and_generate(TL,TDstructures).

/**********************************************************/
/*  CATEGORIAL ATTRIBUTES                                 */
/**********************************************************/

is_trace([[X,F],e])    :- member(indx(_),F).
is_nptrace([[n,F],e])  :- member(ana(+),F).
is_variable([[X,F],e]) :- member(ana(-),F), member(pro(-),F).
is_pro([[n,FN],e])     :- member(pro(+),FN),
                          not member(indx(_),FN).
is_PRO([[n,[]],e]).

is_sdeleter(L,W,C) :- extractLocal(L,W,C,_,sdel,[+]).
is_transitive(L,V) :- extract(L,V,v,_,subcat,[n]).

lexical([_,e]) :- !, fail.
lexical(e) :- !, fail.
lexical(_).

index([[X,[indx(I)|_]]|_],I).
features([[_,F]|_],F).

/**********************************************************/
/*  AUXILIARY PREDICATES                                  */
/**********************************************************/

/* Extract Args from current word and Root word only */
extractLocal(L,W,C,RW,Funct,Args) :-
     dict(L,W,C,Lexinfo),
     extractFunc(Lexinfo,Funct,AW),
     extractRoot(L,W,C,Lexinfo,RW,_,Funct,AR),
     combineF(AW,AR,Args).

/* Extract Args from current word, Root, and English root */
extract(L,W,C,RW,Funct,Args) :-
     dict(L,W,C,Lexinfo),
     extractFunc(Lexinfo,Funct,AW),
     extractRoot(L,W,C,Lexinfo,RW,RLexinfo,Funct,AR),
     extractEngl(L,C,RLexinfo,Funct,AE),
     combineF(AW,AR,Ax),
     combineF(Ax,AE,Args).

/* Extract Args from given lexical info */
extractFunc(Lexinfo,Funct,Args) :-
     X =.. [Funct,A],
     member(X,Lexinfo),
     makelist(A,Args).
extractFunc(Lexinfo,Funct,[]) :-
     X =.. [Funct,_],
     not member(X,Lexinfo).

/* Extract Args from root word, if one exists */
extractRoot(L,_,C,Lexinfo,Root,RLexinfo,Funct,Args) :-
     member(root(Root),Lexinfo), !,
     dict(L,Root,C,RLexinfo),
     not member(root(_),RLexinfo),
     extractFunc(RLexinfo,Funct,Args).
extractRoot(_,W,_,Lex,W,Lex,_,[]).

/* For Spanish words, extract Args from its English
   translation, which must exist. */
extractEngl(spanish,C,RLexinfo,Funct,Args) :-
     member(english(expr([W|Expr])),RLexinfo),
     dict(english,W,C,ELexinfo),
     member(expr([Expr|Expr_info]),ELexinfo),
     extractFunc(Expr_info,Funct,Args).
extractEngl(spanish,C,RLexinfo,Funct,Args) :-
     member(english(EW),RLexinfo),
     makelist(EW,EWlist),
     member(EW1,EWlist),
     dict(english,EW1,C,ELexinfo),
     not member(root(_),ELexinfo),
     extractFunc(ELexinfo,Funct,Args).
extractEngl(english,_,_,_,[]).

get_expr(L,C,expr([W|Expr]),Args,[W|S1],S2) :-
     rootword(L,W,C,RW),
     dict(L,RW,C,Lexinfo),
     member(expr([Expr|Expr_info]),Lexinfo),
     expr_member(Expr,S1,S2),
     extractEngl(L,C,Expr_info,subcat,Args).

expr_member([],S,S).
expr_member([W|X],[W|Y],Z) :- expr_member(X,Y,Z).

rootword(L,W,C,RootW) :-
     dict(L,W,C,Lexinfo),
     member(root(R),Lexinfo), !,
     RootW = R.
rootword(L,W,C,W) :- dict(L,W,C,_).

combineF([],F,F) :- !.
combineF(F,[],F) :- !.
combineF(WF,[R1|Rs],F) :-            /* if R1 already in WF, */
     mbrtype(R1,WF),                 /* WF takes precedence  */
     combineF(WF,Rs,F).
combineF(WF,[R1|Rs],[R1|F]) :-       /* add R1 to WF if not  */
     not mbrtype(R1,WF),             /* already in WF        */
     combineF(WF,Rs,F).

mbrtype(R,W) :-                      /* for 'f' features */
     R =.. [RA,_], !,
     X =.. [RA,_],
     member(X,W).
mbrtype(R,W) :- member(R,W).         /* for 'subcat' */

must_appear([],_) :- !.              /* ALL members of Arg1 */
must_appear([F1|Fs],F2) :-           /* must be in Arg2     */
     member(F1,F2), !,
     must_appear(Fs,F2).

must_agree([],_).                    /* If Arg1 is in Arg2, */
must_agree([F1|Fs],F2) :-            /* it must agree       */
     featureagree(F1,F2),
     must_agree(Fs,F2).

featureagree(_,[]).
featureagree(F1,[F2|_]) :-
     F1 =.. [A1,V1],
     F2 =.. [A1,V2], !,
     V1 = V2.
featureagree(F1,[_|F2]) :- featureagree(F1,F2).

feature_union(F1,F2,F3) :- must_agree(F1,F2), union(F1,F2,F3).

get(Ftype,Flist,[F]) :- F =.. [Ftype,_], member(F,Flist), !.
get(_,_,[]).

/**********************************************************/
/*  GENERAL PURPOSE PREDICATES                            */
/**********************************************************/

member(X,[X|_]).
member(X,[_|Y]) :- member(X,Y).

union(X,[],X).
union([],X,X).
union([X|R],Y,Z) :- member(X,Y), !, union(R,Y,Z).
union([X|R],Y,[X|Z]) :- union(R,Y,Z).

append([],L,L).
append([X|L1],L2,[X|L3]) :- append(L1,L2,L3).

subtract(L,[],L) :- !.
subtract([H|T],L,U) :- member(H,L), !, subtract(T,L,U).
subtract([H|T],L,[H|U]) :- !, subtract(T,L,U).
subtract(_,_,[]).
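
/* Illustrative queries for the feature predicates above. This is
   an added sketch, not part of the original listing; it assumes
   the clauses of combineF, mbrtype, feature_union, must_agree,
   featureagree, union and member are loaded as given above.

   ?- combineF([tns(past)],[tns(pres),per(3)],F).
      F = [per(3),tns(past)]    the word's tns(past) takes
                                precedence over the root's
                                tns(pres); per(3) is added

   ?- feature_union([per(3)],[per(3),pl(-)],F).
      F = [per(3),pl(-)]        the features agree, so the
                                union is formed

   ?- feature_union([per(1)],[per(3)],F).
      fails                     per(1) and per(3) disagree
*/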
makelist([],[]) :- !.
makelist([H|T],[H|T]) :- !.
makelist(X,[X]).

/**********************************************************/
/*  PRETTY-PRINTING                                       */
/**********************************************************/

show(Type,Label,Tree) :-
     showon(Type),
     nl, display(Label), nl,
     prettyprint(Tree,0), nl, nl.
show(_,_,_).

prettyprint(e,_).          /* don't print empty categories */
prettyprint([],_).
prettyprint([[X,F]|T],I) :-
     nl, tab(I), display(X), display(' '), display(F),
     J is I+2, ppx(T,J).
prettyprint([H|T],I) :-
     tab(I), prettyprint(H,I), prettyprint(T,I).
prettyprint(X,I) :- nl, tab(I), display(X).

ppx([],_).
ppx([H|T],I) :- prettyprint(H,I), ppx(T,I).

Appendix E. Prolog Code - READINPUT

/**********************************************************/
/*                                                        */
/*  READINPUT                                             */
/*                                                        */
/**********************************************************/

readinput(SLang,TLang,Surface) :-
     display('Enter source language: '), read(Input),
     readinput2(Input,SLang,TLang,Surface).

readinput2('ds',SLang,TLang,Surface) :-
     assert(showon(ds)),
     readinput(SLang,TLang,Surface).
readinput2('nds',SLang,TLang,Surface) :-
     retract(showon(ds)),
     readinput(SLang,TLang,Surface).
readinput2(SLang,SLang,TLang,Surface) :-
     display('Enter target language: '), read(TLang),
     display('Enter sentence in '), display(SLang),
     display(': '),
     read_in(Surface).

/**********************************************************/
/* Read sentence: Clocksin & Mellish p.87-88 */

read_in([W|Ws]) :-
     get0(C), readword(C,W,C1), restsent(W,C1,Ws).

readword(C,W,C1) :-
     single_character(C), !, name(W,[C]), get0(C1).
readword(C,W,C2) :-
     in_word(C,NewC), !,
     get0(C1), restword(C1,Cs,C2),
     name(W,[NewC|Cs]).
readword(_,W,C2) :- get0(C1), readword(C1,W,C2).

restword(C,[NewC|Cs],C2) :-
     in_word(C,NewC), !,
     get0(C1), restword(C1,Cs,C2).
restword(C,[],C).

restsent(W,_,[]) :- lastword(W), !.
restsent(_,C,[W1|Ws]) :-
     readword(C,W1,C1), restsent(W1,C1,Ws).

in_word(C,C) :- C>96, C<123.              /* a ... z */
in_word(C,L) :- C>64, C<91, L is C+32.    /* A ... Z */
in_word(C,C) :- C>47, C<58.               /* 0 1...9 */
in_word(39,39).                           /* ' */

lastword('.').
lastword('?').

single_character(63).
single_character(46).

Appendix F. Prolog Code - MORPH

/**********************************************************/
/*                                                        */
/*  MORPHOLOGICAL ANALYSIS                                */
/*                                                        */
/**********************************************************/

morph(L,[],[]).
morph(L,Surface,Newsurface) :-
     contract(L,Surface,Surface1),
     collocate(L,Surface1,Newsurface),
     morph1(L,Newsurface).

morph1(_,[]).
morph1(L,[W|S]) :- morph_word(L,W), morph1(L,S).

morph_word(_,'.').
morph_word(_,'?').
morph_word(L,W) :- dict(L,W,_,_).
morph_word(L,W) :- suffixanalysis(L,W).
morph_word(L,W) :- update_dict(L,W).
morph_word(_,_).

contract(_,[],[]).
contract(L,[A|S1],[B,C|S2]) :-
     contraction(L,A,[B,C]), contract(L,S1,S2).
contract(L,[X|S1],[X|S2]) :- contract(L,S1,S2).

collocate(_,[],[]).
collocate(L,[A,B,C,D|S1],[E|S2]) :-
     collocation(L,[A,B,C,D],E), collocate(L,S1,S2).
collocate(L,[A,B|S1],[C|S2]) :-
     collocation(L,[A,B],C), collocate(L,S1,S2).
collocate(L,[X|S1],[X|S2]) :- collocate(L,S1,S2).

/**********************************************************/

suffixanalysis(L,W) :-
     suffixtable(L,Sfx,RootEnd,SfxC,SfxF),
     split(Stem,Sfx,W),
     split(Stem,RootEnd,Root),
     addinfl(L,W,Root,SfxC,SfxF), !.

/* category type: stem_ */
addinfl(spanish,W,Root,Stem,SfxF) :-
     name(Stem,[S,T,E,M|_]), name(stem,[S,T,E,M]),
     rootword(spanish,Root,Stem,RootW), !,
     dict(spanish,RootW,C,_),     /* root is modal or verb */
     addinfl(spanish,W,RootW,C,SfxF).
addinfl(L,W,Root,C,SfxF) :-
     extractLocal(L,Root,C,Root,f,RootF),
     combineF(SfxF,RootF,F),
     adddict(L,W,Root,C,F), !, fail.

split(Stem,[],Stem) :- !.   /* for stems with null endings */
split(Stem,Sfx,W) :-
     var(Stem), !,                  /* Find Stem, given W */
     name(W,Wlist), name(Sfx,Sfxlist),
     append(Stemlist,Sfxlist,Wlist),
     name(Stem,Stemlist).
split(Stem,Sfx,W) :-                /* Find W, given Stem */
     name(Stem,Stemlist), name(Sfx,Sfxlist),
     append(Stemlist,Sfxlist,Wlist),
     name(W,Wlist).
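
/* Worked example of the two split calls in suffixanalysis, added
   as an illustrative sketch and not part of the original listing.
   It assumes the Spanish lexicon entry
   suffixtable(spanish,amos,ar,v,[per(1),pl(+),tns(pres)]).
   For entries whose root ending is "", it further assumes a
   Prolog in which "" denotes the empty list, so that the first
   clause of split applies.

   ?- split(Stem,amos,hablamos).
      Stem = habl               strip the suffix from the word

   ?- split(habl,ar,Root).
      Root = hablar             attach the root ending to the stem
*/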
adddict(L,W,RootW,C,F) :-
     uniqueadd(L,W,C,RootW,F),
     assert(dict(L,W,C,[f(F),root(RootW)])),
     display('Added:'), nl,
     display(dict(L,W,C,[f(F),root(RootW)])), nl.

uniqueadd(L,W,C,_,_) :- not dict(L,W,C,_).
uniqueadd(L,W,C,RootW,_) :-
     dict(L,W,C,Lexinfo),
     not member(root(RootW),Lexinfo).
uniqueadd(L,W,C,_,F) :-
     dict(L,W,C,Lexinfo),
     member(f(F1),Lexinfo),
     makelist(F1,F1list),
     not union(F1list,[],F).

/**********************************************************/

update_dict(L,W) :-
     not dict(L,W,_,_),
     display('Do you want to add "'), display(W),
     display('"? '),
     read(Reply),
     process_reply(L,Reply,W).

process_reply(L,y,W) :-           /* Yes, add to dictionary */
     display('Enter category: '), read(C),
     display('Enter lexical info: '), read(Lexinfo),
     assert(dict(L,W,C,Lexinfo)),
     display('Perm add:'), nl,
     display(dict(L,W,C,Lexinfo)), nl,
     tell(L),
     write(dict(L,W,C,Lexinfo)), put(46), nl,
     tell(user).
process_reply(L,t,W) :-           /* Temp add to dictionary */
     display('Enter category: '), read(C),
     display('Enter lexical info: '), read(Lexinfo),
     assert(dict(L,W,C,Lexinfo)),
     display('Temp add:'), nl,
     display(dict(L,W,C,Lexinfo)), nl.
process_reply(L,r,W) :-           /* Add Root to dictionary */
     display('Enter root: '), read(Root),
     display('Enter category: '), read(C),
     display('Enter lexical info: '), read(Lexinfo),
     assert(dict(L,Root,C,Lexinfo)),
     display('Root add:'), nl,
     display(dict(L,Root,C,Lexinfo)), nl,
     tell(L),
     write(dict(L,Root,C,Lexinfo)), put(46), nl,
     tell(user),
     morph_word(L,W).             /* now re-morph word */
process_reply(_,_,_) :- abort.

Appendix G. Prolog Code - Core Grammars

/**********************************************************/
/*                                                        */
/*  CORE GRAMMAR: English                                 */
/*                                                        */
/**********************************************************/

null_subject(english,               /* Base to Surface */
     [[comp,FC],WH,[e,COMP,
          [[infl,FI],BaseNP,INFL]]|Post],
     [[comp,FC],WH,[e,COMP,
          [[infl,FI],[[n,FN],e,[e,N]],INFL]]|Post]) :-
     not var(BaseNP),
     not lexical(BaseNP),
     not is_trace(BaseNP),
     not member(tns(-),FI), !,      /* EC is not PRO */
     english_pronoun(N),
     assign_features(english,N,FI,FN).
null_subject(english,COMP,COMP).              /* Surface to Base */

assign_features(L,N,FI,FN) :-
    extractLocal(L,N,n,_,f,F1),
    defaultF(L,n,F1,FN), !,
    must_agree(FI,FN).

english_pronoun(he).
english_pronoun(she).
english_pronoun(it).
english_pronoun(they).

/* Exceptional Case Marking */
ecm(english,V) :-                             /* verb must be:  */
    is_sdeleter(english,V,v),                 /* sbar-deleter   */
    is_transitive(english,V).                 /* & transitive   */

proper_governor(english,v).
proper_governor(english,p).

bounding_cat(english,infl).
bounding_cat(english,n).

/**********************************************************/
/*                   INSERTION RULES                      */
/**********************************************************/

it_insertion(english,                         /* base to surface */
    [[comp,FC],WH,[e,COMP,
        [[infl,BaseFI],[[n,[]],e],INFL]]|Post],
    [[comp,FC],WH,[e,COMP,
        [[infl,FI],[[n,FN],e,[e,it]],INFL]]|Post]) :-
    var(FN),
    member(theta(-),BaseFI),
    makelist([per(3),pl(-)],FN),
    union(BaseFI,FN,FI), !.
it_insertion(english,                         /* surface to base */
    [[comp,FC],WH,[e,COMP,
        [[infl,FI],[[n,[]],e],INFL]]|Post],
    [[comp,FC],WH,[e,COMP,
        [[infl,FI],[[n,FN],e,[e,it]],INFL]]|Post]) :-
    not var(FN),
    member(theta(-),FI), !.
it_insertion(_,COMP,COMP).

/* if non-matrix COMP is empty, and clause is tensed, */
/* assign a default lexical COMP.                     */
comp_insertion(english,matrix(-),
    [[comp,FC],WH,[e,e,INFL]|Post],
    [[comp,FC],WH,[e,that,INFL]|Post]) :-
    features(INFL,FI),
    not member(tns(-),FI).
comp_insertion(english,_,COMP,COMP).

modal_insertion(english,
    [[comp,FC],WH,[e,COMP,
        [[infl,FI],NP,[e,e,VP]]]|Post],
    [[comp,FC],WH,[e,COMP,
        [[infl,FI],NP,[e,to,VP]]]|Post]) :-
    member(tns(-),FI), !.
modal_insertion(english,
    [[comp,FC],WH,[e,COMP,
        [[infl,FI],NP,[e,e,VP]]]|Post],
    [[comp,FC],WH,[e,COMP,
        [[infl,FI],NP,[e,will,VP]]]|Post]) :-
    member(tns(fut),FI), !.
modal_insertion(_,COMP,COMP).

do_support(english,matrix(+),ques,
    [[comp,FC],WH,[e,COMP,
        [[infl,FI],NP,[e,e,VP]]]|Post],
    [[comp,FC],WH,[e,COMP,
        [[infl,FI],NP,[e,do,VP]]]|Post]) :-
    lexical(NP),
    not member(tns(fut),FI),
    not member(perf(+),FI),
    not member(prog(+),FI),
    not member(pass(+),FI), !.
do_support(_,_,_,COMP,COMP).

/**********************************************************/
/*                   INVERSION RULES                      */
/**********************************************************/

adjunct(english,W) --> [W], { dict(english,W,infl,_), !, not W=to }.
adjunct(english,W) --> [W], { dict(english,W,perf,_) }.
adjunct(english,W) --> [W], { dict(english,W,prog,_) }.

inversion(english,Matrix,Mode,BaseCOMP,SurfaceCOMP) :-
    havebe_raising(BaseCOMP,COMP1),
    sai(Matrix,Mode,COMP1,SurfaceCOMP).

rinversion(english,SurfaceCOMP,BaseCOMP) :-
    sai(_,_,COMP1,SurfaceCOMP),
    havebe_raising(BaseCOMP,COMP1).

/* Subject-AUX-Inversion */
sai(matrix(+),ques,
    [[comp,FC],WH,[e,COMP,
        [[infl,FI],NP,[e,INFL,VP]]]|Post],
    [[comp,FC],WH,[e,COMP,
        [[infl,FI],INFL,
            [[infl,FI],NP,[e,e,VP]]]]|Post]) :- !.
sai(_,_,COMP,COMP).

havebe_raising(
    [[comp,FC],WH,[e,COMP,
        [[infl,FI],NP,[e,e,
            [[v,FV],P1,[P2,V|Args]|PostV]]]]|Post],
    [[comp,FC],WH,[e,COMP,
        [[infl,FI],NP,[e,P1,
            [[v,FV],P2,[e,V|Args]|PostV]]]]|Post]) :-
    ( dict(english,P1,perf,_) ; dict(english,P1,prog,_) ), !.
havebe_raising(COMP,COMP).

/**********************************************************/
/*                    DEFAULT FORMS                       */
/**********************************************************/

default_tns(english,FI,FV,_,FI) :-
    member(tns(_),FI), not member(tns(_),FV).
default_tns(english,FI,FV,_,FI) :-
    member(tns(_),FV), not member(tns(_),FI).
default_tns(english,FI,FV,V,[tns(pres)|FI]) :-
    not member(tns(_),FI),
    not member(tns(_),FV),
    not V=be.

/* Default Lexical Features */
defaultF(english,n,F1,F3) :-                  /* NOUNS */
    default(per,3,F1,F2),
    default(pl,-,F2,F3), !.
defaultF(english,_,F,F).

/**********************************************************/
/*           Generate inflected auxiliaries               */
/**********************************************************/

gen_perf(english,_,Lexinfo,W) :- member(perf(W),Lexinfo).
gen_perf(english,V,_,W) :- split(Stem,e,V), split(Stem,en,W).
gen_perf(english,V,_,W) :- split(V,en,W).

gen_prog(english,_,Lexinfo,W) :- member(prog(W),Lexinfo).
gen_prog(english,V,_,W) :- split(Stem,e,V), split(Stem,ing,W).
gen_prog(english,V,_,W) :- split(V,ing,W).

/**********************************************************/
/*                                                        */
/*                CORE GRAMMAR: Spanish                   */
/*                                                        */
/**********************************************************/

null_subject(spanish,                         /* surface to base */
    [[comp,FC],WH,[e,COMP,
        [[infl,FI],[[n,FN],e,[e,N]],INFL]]|Post],
    [[comp,FC],WH,[e,COMP,
        [[infl,FI],NP,INFL]]|Post]) :-
    var(N),
    not lexical(NP),
    ( member(per(1),FI) ; member(per(2),FI) ),
    spanish_pronoun(N),
    assign_features(spanish,N,FI,FN).
null_subject(spanish,COMP,COMP).
null_subject(spanish,                         /* base to surface */
    [[comp,FC],WH,[e,COMP,
        [[infl,FI],[[n,FN],e,[e,N]],INFL]]|Post],
    [[comp,FC],WH,[e,COMP,
        [[infl,FI],[[n,FN],e],INFL]]|Post]) :-
    not var(N),
    member(pro(+),FN).                        /* delete if lexical pronoun */

spanish_pronoun(yo).
spanish_pronoun(nosotros).
spanish_pronoun('tu''').

proper_governor(spanish,v).
proper_governor(spanish,infl).

bounding_cat(spanish,comp).
bounding_cat(spanish,n).

/**********************************************************/
/*                   INSERTION RULES                      */
/**********************************************************/

/* if non-matrix COMP is empty, and clause is tensed, */
/* assign a default lexical COMP.                     */
comp_insertion(spanish,matrix(-),
    [[comp,FC],WH,[e,e,INFL]|Post],
    [[comp,FC],WH,[e,que,INFL]|Post]) :-
    features(INFL,FI),
    not member(tns(-),FI), !.
comp_insertion(spanish,_,COMP,COMP).

/**********************************************************/
/*                 RESTRUCTURING RULES                    */
/**********************************************************/

adjunct(spanish,expr(W)) --> restructure(W), { !, not W=[e,e,e,e] }.

restructure([e,e,e,V]) --> [V],
    { dict(spanish,V,v,Data), member(root(_),Data) }.
restructure([e,e,P2,V]) --> [P2,V],
    { dict(spanish,P2,prog,Data), member(root(_),Data),
      dict(spanish,V,v,_) }.
restructure([e,e,P2,e]) --> [P2],
    { dict(spanish,P2,prog,Data), member(root(_),Data) }.
restructure([e,P1,e,V]) --> [P1,V],
    { dict(spanish,P1,perf,Data), member(root(_),Data),
      dict(spanish,V,v,_) }.
restructure([e,P1,P2,V]) --> [P1,P2,V],
    { dict(spanish,P1,perf,Data), member(root(_),Data),
      dict(spanish,P2,prog,_), dict(spanish,V,v,_) }.
restructure([e,P1,P2,e]) --> [P1,P2],
    { dict(spanish,P1,perf,Data), member(root(_),Data),
      dict(spanish,P2,prog,_) }.
restructure([I,e,e,V]) --> [I,V],
    { dict(spanish,I,infl,Data), member(root(_),Data),
      dict(spanish,V,v,_) }.
restructure([I,e,P2,V]) --> [I,P2,V],
    { dict(spanish,I,infl,Data), member(root(_),Data),
      dict(spanish,P2,prog,_), dict(spanish,V,v,_) }.
restructure([I,e,P2,e]) --> [I,P2],
    { dict(spanish,I,infl,Data), member(root(_),Data),
      dict(spanish,P2,prog,_) }.
restructure([I,P1,e,V]) --> [I,P1,V],
    { dict(spanish,I,infl,Data), member(root(_),Data),
      dict(spanish,P1,perf,_), dict(spanish,V,v,_) }.
restructure([I,P1,P2,V]) --> [I,P1,P2,V],
    { dict(spanish,I,infl,Data), member(root(_),Data),
      dict(spanish,P1,perf,_), dict(spanish,P2,prog,_),
      dict(spanish,V,v,_) }.
restructure([I,P1,P2,e]) --> [I,P1,P2],
    { dict(spanish,I,infl,Data), member(root(_),Data),
      dict(spanish,P1,perf,_), dict(spanish,P2,prog,_) }.
restructure([I,e,e,e]) --> [I],
    { dict(spanish,I,infl,Data), member(root(_),Data) }.

/**********************************************************/
/*                   INVERSION RULES                      */
/**********************************************************/

parse_inversion(spanish,expr([_,_,_,V]),
    [[infl,[]],NP,[e,e,
        [[v,[]],e,[e,e|Args]|PostV]]]) -->
    { lexical(V),
      extractLocal(spanish,V,v,_,f,FV) },
    parse_subject(FV,NP),
    parse_inversion2(V,Args),
    postadjunct(spanish,v,PostV).
parse_inversion(L,_,INFL) --> x2(L,infl,INFL).

parse_inversion2(V,Args) -->
    { extract(spanish,V,v,_,subcat,SUBCAT) },
    arguments(spanish,SUBCAT,Args).
parse_inversion2(V,Args) -->
    { extract(spanish,V,v,_,subcat,SUBCAT) },
    emptyArgs(spanish,SUBCAT,Args).

parse_subject(FV,NP) -->
    x2(spanish,n,NP),
    { percolate(spanish,NP,[[n,FN]|_]),
      must_agree(FN,FV) }.
parse_subject(FV,[[n,[]],e,[e,N]]) --> [],
    { ( member(per(1),FV) ; member(per(2),FV) ), !,
      spanish_pronoun(N),
      assign_features(spanish,N,FV,_) }.
parse_subject(_,[[n,[]],e]) --> [].

inversion(spanish,_,_,BaseCOMP,SurfaceCOMP) :-
    member([comp,FC],BaseCOMP),
    member(wh(+),FC), !,                      /* oblig for wh- in COMP */
    v_preposing(BaseCOMP,SurfaceCOMP).
inversion(spanish,matrix(+),ques,BaseCOMP,SurfaceCOMP) :-
    !, v_preposing(BaseCOMP,SurfaceCOMP).
inversion(spanish,_,_,COMP,COMP).

rinversion(spanish,SurfaceCOMP,BaseCOMP) :-
    v_preposing(BaseCOMP,SurfaceCOMP), !.
rinversion(spanish,COMP,COMP).

v_preposing(
    [[comp,FC],WH,[e,COMP,
        [[infl,FI],NP,[e,BaseINFL,
            [[v,FV],BasePERF,
                [BasePROG,BaseVERB|Args]|PostV]]]]|PostC],
    [[comp,FC],WH,[e,COMP,
        [[infl,FI],expr(Adjunct),
            [[infl,FI],NP,[e,INFL,
                [[v,FV],PERF,
                    [PROG,VERB|Args]|PostV]]]]]|PostC]) :-
    makelist([BaseINFL,BasePERF,BasePROG,BaseVERB],Base),
    makelist([INFL,PERF,PROG,VERB],Surface),
    v_prepose(Base,Surface,Adjunct).

v_prepose([e,e,e,V],  [e,e,e,e], [e,e,e,V])  :- !.
v_prepose([e,e,P2,V], [e,e,e,e], [e,e,P2,V]).
v_prepose([e,e,P2,V], [e,e,e,V], [e,e,P2,e]) :- !.
v_prepose([e,P1,e,V], [e,e,e,e], [e,P1,e,V]) :- !.
v_prepose([e,P1,P2,V],[e,e,e,e], [e,P1,P2,V]).
v_prepose([e,P1,P2,V],[e,e,e,V], [e,P1,P2,e]) :- !.
v_prepose([I,e,e,V],  [e,e,e,e], [I,e,e,V]).
v_prepose([I,e,e,V],  [e,e,e,V], [I,e,e,e])  :- !.
v_prepose([I,e,P2,V], [e,e,e,e], [I,e,P2,V]).
v_prepose([I,e,P2,V], [e,e,e,V], [I,e,P2,e]).
v_prepose([I,e,P2,V], [e,e,P2,V],[I,e,e,e])  :- !.
v_prepose([I,P1,e,V], [e,e,e,e], [I,P1,e,V]).
v_prepose([I,P1,e,V], [e,P1,e,V],[I,e,e,e])  :- !.
v_prepose([I,P1,P2,V],[e,e,e,e], [I,P1,P2,V]).
v_prepose([I,P1,P2,V],[e,e,e,V], [I,P1,P2,e]).
v_prepose([I,P1,P2,V],[e,P1,P2,V],[I,e,e,e]) :- !.

/**********************************************************/
/*                    DEFAULT FORMS                       */
/**********************************************************/

default_tns(spanish,FI,FV,_,FI) :-
    member(tns(_),FI), not member(tns(_),FV).
default_tns(spanish,FI,FV,_,FI) :-
    member(tns(_),FV), not member(tns(_),FI).
default_tns(spanish,FI,FV,_,[tns(-)|FI]) :-
    not member(tns(_),FI), not member(tns(_),FV).

/* Default Lexical Features */
defaultF(spanish,n,F1,F4) :-                  /* NOUNS */
    default(per,3,F1,F2),
    default(pl,-,F2,F3),
    default(fem,-,F3,F4), !.
defaultF(spanish,art,F1,F3) :-                /* ARTICLES */
    default(pl,-,F1,F2),
    default(fem,-,F2,F3), !.
defaultF(spanish,_,F,F).

/**********************************************************/
/*           Generate inflected auxiliaries               */
/**********************************************************/

gen_perf(spanish,_,Lexinfo,W) :- member(perf(W),Lexinfo).
gen_perf(spanish,V,_,W) :- split(Stem,ar,V), split(Stem,ado,W).
gen_perf(spanish,V,_,W) :- split(Stem,er,V), split(Stem,ido,W).
gen_perf(spanish,V,_,W) :- split(Stem,ir,V), split(Stem,ido,W).

gen_prog(spanish,_,Lexinfo,W) :- member(prog(W),Lexinfo).
gen_prog(spanish,V,_,W) :- split(Stem,ar,V), split(Stem,ando,W).
gen_prog(spanish,V,_,W) :- split(Stem,er,V), split(Stem,iendo,W).
gen_prog(spanish,V,_,W) :- split(Stem,ir,V), split(Stem,iendo,W).

/**********************************************************/
/*                 AUXILIARY PREDICATES                   */
/**********************************************************/

default(Feature,Value,Fin,Fout) :-
    F1 =.. [Feature,Value],
    F2 =.. [Feature,_],
    default2(F1,F2,Fin,Fout).

default2(_,F2,Fin,Fin) :- member(F2,Fin), !.
default2(F1,_,Fin,[F1|Fin]).

Appendix H. Prolog Code - XBAR

/**********************************************************/
/*                                                        */
/*                     X-BAR SYSTEM                       */
/*                                                        */
/**********************************************************/

parse(L,Surface,Sstructure) :-
    x2(L,comp,Sstructure,Surface,[]).

x2(_,_,_) --> ['?'], {!,fail}.
x2(_,_,_) --> ['.'], {!,fail}.
x2(L,infl,[[infl,[]],ADJU,INFL]) -->          /* Inversion */
    adjunct(L,ADJU),
    parse_inversion(L,ADJU,INFL).
x2(L,C,[[C,[]],Spec,HD|Post]) -->
    specifier(L,C,Spec),
    x1(L,C,HD),
    postadjunct(L,C,Post).

x1(L,C,[Pre,HD|Complement]) -->
    preadjunct(L,C,Pre),
    x(L,C,HD,Args),
    arguments(L,Args,Complement).
x1(L,C,[Pre,HD|Complement]) -->
    preadjunct(L,C,Pre),
    x(L,C,HD,Args),
    emptyArgs(L,Args,Complement).

x(L,C,HD,Args) --> get_expr(L,C,HD,Args).
x(L,C,W,Args) --> [W], { extract(L,W,C,_,subcat,Args) }.
x(_,comp,e,[infl]) --> [].
x(_,infl,e,[v]) --> [].

/**********************************************************/
/*                SPECIFIER STRUCTURES:                   */
/**********************************************************/

specifier(L,comp,WH) --> wh_phrase(L,WH).
specifier(L,infl,NP) --> x2(L,n,NP).
specifier(_,infl,[[n,[]],e]) --> [], {!}.
specifier(L,n,W) --> [W], { dict(L,W,art,_) }.
specifier(L,v,W) --> [W], { dict(L,W,perf,_) }.
specifier(L,v,W) --> [W], { dict(L,W,prog,_) }.
specifier(_,_,e) --> [].                      /* optional */

/**********************************************************/
/*                 ADJUNCT STRUCTURES:                    */
/**********************************************************/

preadjunct(L,v,W) --> [W], { dict(L,W,prog,_) }.
preadjunct(_,_,e) --> [].                     /* optional */

postadjunct(_,n,[]) --> [].                   /* optional for NPs */
postadjunct(L,n,[PP|ADJU]) -->
    {!}, x2(L,p,PP), {!}, postadjunct(L,n,ADJU).
postadjunct(L,v,[PP|ADJU]) -->
    x2(L,p,PP), {!}, postadjunct(L,v,ADJU).
postadjunct(_,comp,[[mode,decl]]) --> ['.'], {!}.
postadjunct(_,comp,[[mode,ques]]) --> ['?'], {!}.
postadjunct(_,_,[]) --> [].                   /* optional */

/**********************************************************/
/*                 ARGUMENT STRUCTURES:                   */
/**********************************************************/

arguments(_,[],[]) --> [].
arguments(L,[A1|As],[ARG1|ARGS]) -->
    parseArg(L,A1,ARG1),
    arguments(L,As,ARGS).

parseArg(L,{Arg},TREE) --> optionalArg(L,Arg,TREE).
parseArg(L,Arg,TREE) --> x2(L,Arg,TREE).

optionalArg(L,Arg,TREE) --> parseArg(L,Arg,TREE).
optionalArg(_,_,[]) --> [].

emptyArgs(L,[A1|As],[Arg1|Args]) -->
    parseArg(L,A1,Arg1),
    emptyArgs(L,As,Args).
emptyArgs(L,[n|As],[[[n,[]],e]|Args]) --> emptyArgs2(L,As,Args).
emptyArgs(L,[p|As],[[[p,[]],e]|Args]) --> emptyArgs2(L,As,Args).

emptyArgs2(_,[],[]) --> [].
emptyArgs2(L,[A1|As],[Arg1|Args]) -->
    parseArg(L,A1,Arg1),
    emptyArgs2(L,As,Args).

/**********************************************************/
/*                      WH-PHRASE                         */
/**********************************************************/

wh_phrase(L,TREE) --> x2(L,n,TREE), { wh_test(L,TREE) }.
wh_phrase(L,TREE) --> x2(L,p,TREE), { wh_test(L,TREE) }.

wh_test(L,[[p,_],_,[_,_,TREE]]) :-            /* test for wh-NP */
    percolate(L,TREE,TREE1),                  /* within PP      */
    features(TREE1,F),
    member(wh(+),F).
wh_test(L,TREE) :-
    percolate(L,TREE,TREE1),
    features(TREE1,F), !,
    member(wh(+),F).

Appendix I. Prolog Code - RTRANSFORM

/**********************************************************/
/*                                                        */
/*               REVERSE-TRANSFORMATIONS                  */
/*                                                        */
/**********************************************************/

rtransformation(L,COMP1,COMP11) :-
    rinversion(L,COMP1,COMP2),
    percolate(L,COMP2,COMP3),
    rmove_affix(L,COMP3,COMP4),
    theta_extraction(COMP4,COMP5),
    do_support(L,_,_,COMP6,COMP5),
    it_insertion(L,COMP7,COMP6),
    null_subject(L,COMP8,COMP7),
    rmove_alpha(COMP8,COMP9),
    wh_feature_extraction(COMP9,COMP10),
    rtransform_lower(L,COMP10,COMP11).

rtransform_lower(L,[[comp,FC],WH,[e,COMP|Args1]|Post],
                   [[comp,FC],WH,[e,COMP|Args2]|Post]) :-
    rtransform2(L,Args1,Args2).

rtransform2(_,[],[]).
rtransform2(L,[H1|T1],[H2|T2]) :-
    rtransform3(L,H1,H2),
    rtransform2(L,T1,T2).

rtransform3(L,[[comp,FC1]|COMP1],[[comp,FC2]|COMP2]) :-
    rtransformation(L,[[comp,FC1]|COMP1],[[comp,FC2]|COMP2]).
rtransform3(L,[[CP,FC],SC,[PC,C|Args1]|Post],
              [[CP,FC],SC,[PC,C|Args2]|Post]) :-
    rtransform2(L,Args1,Args2).

/**********************************************************/
/*                 FEATURE PERCOLATION                    */
/**********************************************************/

percolate(_,X,X) :- not lexical(X).           /* empty NP or PP */
percolate(L,[[v,[]],P1,[P2,V|Args1]|PostV1],  /* bypass VP */
            [[v,[]],P1,[P2,V|Args2]|PostV2]) :-
    perc_compl(L,Args1,Args2),
    perc_compl(L,PostV1,PostV2).
percolate(L,[[C,[]],Spec1,HD1|Post1],
            [[C,F],Spec2,HD2|Post2]) :-
    perc_x1(L,C,HD1,HD2,FHD),
    perc_specword(L,C,Spec1,Spec2,FSC),
    feature_union(FSC,FHD,F),
    perc_compl(L,Post1,Post2).

perc_x1(L,C,[e,HD1|Complement1],
            [e,HD2|Complement2],F) :-
    perc_word(L,C,HD1,HD2,F),
    perc_compl(L,Complement1,Complement2).

perc_compl(_,[],[]) :- !.
perc_compl(_,[[mode,M]],[[mode,M]]) :- !.
perc_compl(L,[H1|T1],[H2|T2]) :-
    perc_compl2(L,H1,H2),
    perc_compl(L,T1,T2).

perc_compl2(_,[[comp,FC]|COMP],[[comp,FC]|COMP]) :- !.
perc_compl2(L,X,Y) :- percolate(L,X,Y).

perc_word(_,_,e,e,[]) :- !.                   /* empty word */
perc_word(L,_,[[X,_]|Y],[[X,F]|Z],F) :-       /* phrase */
    !, percolate(L,[[X,_]|Y],[[X,F]|Z]), !.
perc_word(L,C,expr([W|T]),expr([RW|T]),F) :-  /* expr'n */
    !, extractLocal(L,W,C,RW,f,F1),
    defaultF(L,C,F1,F).
perc_word(L,C,W,RW,F) :-                      /* lexical word */
    !, extractLocal(L,W,C,RW,f,F1),
    defaultF(L,C,F1,F).

perc_specword(_,_,e,e,[]) :- !.
perc_specword(L,n,W,RW,F) :- perc_word(L,art,W,RW,F).
perc_specword(L,comp,W,RW,F) :- perc_word(L,_,W,RW,F).
perc_specword(L,infl,W,RW,F) :- perc_word(L,_,W,RW,F).

/**********************************************************/
/*               REVERSE AFFIX MOVEMENT                   */
/**********************************************************/

rmove_affix(L,
    [[comp,FC],WH,[e,COMP,
        [[infl,FI],NP,[e,INFL,
            [[v,[]],PERF,[PROG,V|Args]|PostV]]]]|PostC],
    [[comp,FC],WH,[e,COMP,
        [[infl,BaseFI],NP,[e,INFL,
            [[v,FV],e,[e,BaseV|Args]|PostV]]]]|PostC]) :-
    rmove_participle(L,PERF,PROG,V,BaseV,FV),
    default_tns(L,FI,FV,V,FI1),               /* language-dependent */
    union(FI1,FV,FI2),                        /* merge VP&INFL feat. */
    features(NP,FNP),
    feature_union(FNP,FI2,BaseFI).            /* merge NP&INFL feat. */

/* expressions */
rmove_participle(L,PERF,PROG,expr([H|T]),expr([BaseH|T]),FV) :-
    !, rmove_participle(L,PERF,PROG,H,BaseH,FV).
/* no auxs, V infinitive */
rmove_participle(L,e,e,V,V,F) :-
    dict(L,V,v,Lexinfo),
    not member(root(_),Lexinfo),
    extractFunc(Lexinfo,f,F).
/* no auxs, V inflected */
rmove_participle(L,e,e,V,BaseV,F) :-
    !, extractLocal(L,V,v,BaseV,f,FV),
    member(tns(_),FV),
    subtract(FV,[perf(+)],F).                 /* V=[tns(past),perf(+)] */
/* P2 progressive */
rmove_participle(L,e,P2,V,BaseV,[prog(+)|F]) :-
    extractLocal(L,P2,prog,_,f,F),
    extractLocal(L,V,v,BaseV,f,[prog(+)]).
/* perfective */
rmove_participle(L,P1,e,V,BaseV,[perf(+)|F]) :-
    extractLocal(L,P1,perf,_,f,F),
    extractLocal(L,V,v,BaseV,f,FV),
    member(perf(+),FV).
/* P1 progressive */
rmove_participle(L,P1,e,V,BaseV,[prog(+)|F]) :-
    extractLocal(L,P1,prog,_,f,F),
    extractLocal(L,V,v,BaseV,f,[prog(+)]).
/* passive */
rmove_participle(L,P1,e,V,BaseV,[pass(+),theta(-)|F]) :-
    extractLocal(L,P1,prog,_,f,F),
    extractLocal(L,V,v,BaseV,f,FV),
    member(perf(+),FV).
/* perfect progressive */
rmove_participle(L,P1,P2,V,BaseV,[perf(+),prog(+)|F]) :-
    extractLocal(L,P1,perf,_,f,F),
    extractLocal(L,P2,prog,_,f,FP),
    member(perf(+),FP),
    extractLocal(L,V,v,BaseV,f,[prog(+)]).
/* perfect passive */
rmove_participle(L,P1,P2,V,BaseV,[perf(+),pass(+),theta(-)|F]) :-
    extractLocal(L,P1,perf,_,f,F),
    extractLocal(L,P2,prog,_,f,FP),
    member(perf(+),FP),
    extractLocal(L,V,v,BaseV,f,FV),
    member(perf(+),FV).
/* passive progressive */
rmove_participle(L,P1,P2,V,BaseV,[prog(+),pass(+),theta(-)|F]) :-
    extractLocal(L,P1,prog,_,f,F),
    extractLocal(L,P2,prog,_,f,[prog(+)]),
    extractLocal(L,V,v,BaseV,f,FV),
    member(perf(+),FV).

/**********************************************************/
/*             Move theta feature to INFL                 */
/**********************************************************/

theta_extraction(
    [[comp,FC],WH,[e,COMP,
        [[infl,FI],NP,[e,INFL,
            [[v,FV],P1,[P2,V,[[a,FA],e,AP]]|PostV]]]]|PostC],
    [[comp,FC],WH,[e,COMP,
        [[infl,[theta(-)|FI]],NP,[e,INFL,
            [[v,FV],P1,[P2,V,[[a,FA],e,AP]]|PostV]]]]|PostC]) :-
    member(theta(-),FA), !.
theta_extraction(COMP,COMP).

/**********************************************************/
/*                 REVERSE MOVE ALPHA                     */
/**********************************************************/

rmove_alpha(COMP1,COMP3) :-
    rmove_comp(COMP1,COMP2),
    rmove_subj(COMP2,COMP3).

rmove_comp([[comp,_],WH,[e,COMP,INFL1]|Post],
           [[comp,[]],e,[e,COMP,INFL2]|Post]) :-
    lexical(WH),
    rmove_comp2(WH,INFL1,INFL2).
rmove_comp(COMP,COMP).

/* COMP to Argument */
rmove_comp2(WH,
    [[infl,FI],NP,[e,INFL,
        [[v,FV],P1,[P2,V|Args1]|Post]]],
    [[infl,FI],NP,[e,INFL,
        [[v,FV],P1,[P2,V|Args2]|Post]]]) :-
    rmove_to_obj(WH,Args1,Args2).
/* NP COMP to PP-adj */
rmove_comp2(WH,
    [[infl,FI],NP,[e,INFL,
        [[v,FV],P1,VP|PostV1]]],
    [[infl,FI],NP,[e,INFL,
        [[v,FV],P1,VP|PostV2]]]) :-
    rmove_pp(WH,PostV1,PostV2).
/* NP COMP to PP-arg */
rmove_comp2(WH,
    [[infl,FI],NP,[e,INFL,
        [[v,FV],P1,[P2,V|Args1]|PostV]]],
    [[infl,FI],NP,[e,INFL,
        [[v,FV],P1,[P2,V|Args2]|PostV]]]) :-
    rmove_pp(WH,Args1,Args2).
/* NP COMP to Subject */
rmove_comp2([[n,FN]|NN],
    [[infl,FI1],NP,INFL],
    [[infl,FI2],[[n,FN]|NN],INFL]) :-
    not lexical(NP),
    features(WH,F),
    feature_union(F,FI1,FI2).
/* COMP to COMP */
rmove_comp2(WH,
    [[comp,FC],e,[e,COMP,INFL]|Post],
    [[comp,FC],WH,[e,COMP,INFL]|Post]).
/* recursive search */
rmove_comp2(WH,
    [[X,F],A1,[A2,XW|Args1]|Post],
    [[X,F],A1,[A2,XW|Args2]|Post]) :-
    rmove_comp3(WH,Args1,Args2).
/* PP COMP added as adj */
rmove_comp2([[p,FP]|PP],
    [[infl,FI],NP,[e,INFL,
        [[v,FV],P1,VP|PostV]]],
    [[infl,FI],NP,[e,INFL,
        [[v,FV],P1,VP,[[p,FP]|PP]|PostV]]]).

rmove_comp3(WH,[H1|T],[H2|T]) :- rmove_comp2(WH,H1,H2).
rmove_comp3(WH,[H|T1],[H|T2]) :- rmove_comp3(WH,T1,T2).

rmove_to_obj(WH,[H|T],[WH|T]) :-
    not lexical(H),
    member([C,_],H),
    member([C,_],WH), !.
rmove_to_obj(WH,[H|T1],[H|T2]) :- rmove_to_obj(WH,T1,T2).

rmove_pp([[n,FN]|NP],
    [[[p,FP],e,[e,P,PNP]]|Post],
    [[[p,FP],e,[e,P,[[n,FN]|NP]]]|Post]) :-
    not lexical(PNP).
rmove_pp(NP,[X|P1],[X|P2]) :- rmove_pp(NP,P1,P2).

rmove_subj(
    [[comp,FC],WH,[e,COMP,
        [[infl,FI],NP,[e,INFL,VP1]]]|Post],
    [[comp,FC],WH,[e,COMP,
        [[infl,BaseFI],[[n,[]],e],[e,INFL,VP2]]]|Post]) :-
    lexical(NP),
    member(theta(-),FI), !,
    combineF([per(3),pl(-)],FI,BaseFI),
    rmove_lower_subj(NP,VP1,VP2).
rmove_subj(COMP,COMP).

rmove_lower_subj(NP,                          /* SUBJ to OBJ */
    [[v,FV],P1,[P2,V|Arg1]|PostV],
    [[v,FV],P1,[P2,V|Arg2]|PostV]) :-
    member(pass(+),FV), !,
    rmove_obj(NP,Arg1,Arg2).
rmove_lower_subj([[n,_]|NP],                  /* SUBJ to SUBJ */
    [[comp,FC],WH,[e,COMP,
        [[infl,FI],[[n,[]],e],INFL]]|Post],
    [[comp,FC],WH,[e,COMP,
        [[infl,FI],[[n,[]]|NP],INFL]]|Post]).
rmove_lower_subj(NP,
    [[X,F],A1,[A2,XP,Arg1]|PostX],
    [[X,F],A1,[A2,XP,Arg2]|PostX]) :-
    rmove_lower_subj(NP,Arg1,Arg2).

rmove_obj(NP,BaseNP,NP) :- not lexical(BaseNP).
rmove_obj(NP,[H1|T],[H2|T]) :- rmove_obj(NP,H1,H2).
rmove_obj(NP,[H|T1],[H|T2]) :- rmove_obj(NP,T1,T2).

/**********************************************************/
/*        Move WH(+) in [NP,PP] to PP features            */
/**********************************************************/

wh_feature_extraction([[p,F],e,[e,P,NP]],
                      [[p,[wh(+)|F]],e,[e,P,NP]]) :-
    features(NP,FNP),
    member(wh(+),FNP), !.
wh_feature_extraction([H1|T1],[H2|T2]) :-
    wh_feature_extraction(H1,H2),
    wh_feature_extraction(T1,T2), !.
wh_feature_extraction(X,X).

Appendix J. Prolog Code - TRANSLATE

/**********************************************************/
/*                                                        */
/*                      TRANSLATE                         */
/*                                                        */
/**********************************************************/

/* X2 --> Specifier X1 Postadjuncts */
translate(SL,[[C,F1],Spec1,HD1|Post1],
          TL,[[C,F2],Spec2,HD2|Post2]) :-
    transl_x1(SL,C,F1,HD1,TL,F2,HD2),
    transl_spec(SL,C,Spec1,TL,F2,Spec2),
    transl_comp(SL,Post1,TL,Post2).

/* X1 --> Preadjunct X Complements */
transl_x1(SL,C,F1,[Pre,HD1|Complement1],
          TL,F2,[Pre,HD2|Complement2]) :-
    transl_head(SL,C,F1,HD1,TL,F2,HD2),
    transl_comp(SL,Complement1,TL,Complement2).

/**********************************************************/

transl_head(_,_,F,e,_,F,e) :- !.              /* empty head */
transl_head(_,comp,_,_,_,[],e) :- !.          /* COMP --> e */
/* expression head */
transl_head(SL,C,F1,expr([SLW|Expr]),TL,F2,TLW) :-
    extractLocal(SL,SLW,C,_,expr,[Expr|Expr_info]),
    member(TLwords,Expr_info),
    TLwords =.. [TL,Words],
    makelist(Words,Wlist), !,
    pick_headlist(C,Wlist,F1,TL,F2,TLW).
/* all others */
transl_head(SL,C,F1,SourceW,TL,F2,TargetW) :-
    extractLocal(SL,SourceW,C,_,TL,TLwords), !,
    pick_headlist(C,TLwords,F1,TL,F2,TargetW).

pick_headlist(C,[W1|_],F1,TL,F2,W2) :-
    pick_headword(C,W1,F1,TL,F2,W2).
pick_headlist(C,[_|Ws],F1,TL,F2,W2) :-
    pick_headlist(C,Ws,F1,TL,F2,W2).

pick_headword(_,e,F,_,F,e) :- !.
pick_headword(_,expr(E),_,_,[],expr(E)) :- !.
pick_headword(n,W1,F1,TL,F2,W3) :-            /* transl head noun */
    get(pl,F1,NUM),
    adjust(pl,n,TL,W1,NUM,W2),                /* adjust for num */
    extractLocal(TL,W1,n,_,f,F2a),
    defaultF(TL,n,F2a,F2b),
    combineF(F1,F2b,F2),
    capitalize_proper(TL,W2,W3), !.
pick_headword(infl,W,F,_,F,W) :- !.
pick_headword(C,W,F1,TL,F2,W) :-              /* transl other heads */
    extractLocal(TL,W,C,_,f,F2a),
    defaultF(TL,C,F2a,F2b),
    combineF(F2b,F1,F2), !.

/**********************************************************/
/*                Translate specifier                     */
/**********************************************************/

transl_spec(_,_,X,_,_,X) :- not lexical(X), !.    /* empty */
transl_spec(SL,n,SourceW,TL,F,TargetW) :-         /* article */
    extractLocal(SL,SourceW,art,_,TL,TLwords), !,
    pick_speclist(TLwords,TL,F,TargetW).
/* translate subject NP (infl spec) */
transl_spec(SL,infl,[[n,F1]|NP1],TL,_,[[n,F2]|NP2]) :-
    translate(SL,[[n,F1]|NP1],TL,[[n,F2]|NP2]).

pick_speclist([W1|_],TL,F2,W2) :- pick_specword(W1,TL,F2,W2).
pick_speclist([_|Ws],TL,F2,W2) :- pick_speclist(Ws,TL,F2,W2).

pick_specword(W,english,_,W).
pick_specword(W1,spanish,F2,W3) :-
    get(fem,F2,GEN),
    adjust(fem,art,spanish,W1,GEN,W2),        /* adjust gender */
    morph_word(spanish,W2),
    get(pl,F2,NUM),
    adjust(pl,art,spanish,W2,NUM,W3), !.      /* adjust number */

/**********************************************************/
/*               Translate complements                    */
/**********************************************************/

transl_comp(_,e,_,e).                         /* empty */
transl_comp(_,[],_,[]).                       /* no complements */
transl_comp(_,[[mode,M]],_,[[mode,M]]) :- !.
transl_comp(SL,[H1|T1],TL,[H2|T2]) :-
    translate(SL,H1,TL,H2),
    transl_comp(SL,T1,TL,T2).

/**********************************************************/
/*                Auxiliary functions                     */
/**********************************************************/

capitalize_proper(L,W,Wout) :-
    dict(L,W,n,Lexinfo),
    member(proper(+),Lexinfo),
    capitalize(W,Wout).
capitalize_proper(_,W,W).

adjust(_,_,_,W,[],W).                         /* No inflection */
adjust(pl,_,_,W,[pl(-)],W).                   /* Not plural */
adjust(fem,_,_,W,[fem(-)],W).                 /* Not feminine */
adjust(pl,C,TL,W,[pl(+)],Target) :-           /* pluralize */
    dict(TL,W,C,Lexinfo),
    test_f(pl(+),TL,W,C,Lexinfo,Target).
adjust(fem,C,TL,W,[fem(+)],Target) :-         /* feminize */
    dict(TL,W,C,Lexinfo),
    test_f(fem(+),TL,W,C,Lexinfo,Target).

test_f(Fitem,L,Root,C,_,Root) :-              /* already inflected */
    extractLocal(L,Root,C,_,f,F),
    member(Fitem,F), !.
test_f(Fitem,_,Root,_,Lexinfo,W) :-           /* otherwise: inflect */
    Fitem =.. [Ftype,_],
    get(Ftype,Lexinfo,Fpred),
    apply(Ftype,Root,Fpred,W).

apply(pl,Root,[],Target) :-                   /* default pl: add 's' */
    split(Root,s,Target).
apply(fem,Root,[],Target) :-                  /* default fem:  */
    split(Stem,os,Root),                      /* delete "-os"  */
    split(Stem,as,Target).                    /* add "-as"     */
apply(fem,Root,[],Target) :-
    split(Stem,o,Root),                       /* delete "-o"   */
    split(Stem,a,Target).                     /* add "-a"      */
apply(fem,Root,[],Target) :-
    split(Root,a,Target).                     /* add "-a"      */
apply(F,_,[Fpred],Target) :-                  /* Else use inflection */
    Fpred =.. [F,Target].                     /* from Lexinfo        */

Appendix K. Prolog Code - TRANSFORM

/**********************************************************/
/*                                                        */
/*                   TRANSFORMATIONS                      */
/*                                                        */
/**********************************************************/

transform_and_generate(_,[]).
transform_and_generate(L,[Dstructure|T]) :-
    bagof(Sstructure,
          transformation(L,Dstructure,Sstructure),
          Sstructures),
    generate_and_print(L,Sstructures),
    transform_and_generate(L,T).
transformation(L,Dstructure,Sstructure) :-
    transform(L,matrix(+),_,Dstructure,Sstructure),
    dbl_comp_filter(Sstructure),
    case_filter(L,Sstructure),
    binding_conditions(L,Sstructure),
    ecp(L,Sstructure),
    write('...Transformed'), nl.

transform(L,Matrix,Mode,COMP1,COMP11) :-
    transform_lower(L,matrix(-),Mode,COMP1,COMP2),
    set_tns(Matrix,COMP2,COMP3),
    move_alpha(L,COMP3,COMP4),
    it_insertion(L,COMP4,COMP5),
    null_subject(L,COMP5,COMP6),
    do_support(L,Matrix,Mode,COMP6,COMP7),
    move_affix(L,COMP7,COMP8),
    comp_insertion(L,Matrix,COMP8,COMP9),
    modal_insertion(L,COMP9,COMP10),
    inversion(L,Matrix,Mode,COMP10,COMP11).

/**********************************************************/
/*                Transform Lower Cycle                   */
/**********************************************************/

transform_lower(_,_,Mode,[[comp,F]|COMP],[[comp,F]|COMP]) :-
    member([mode,Mode],COMP), !.
transform_lower(L,Matrix,Mode,
    [[C,F],Spec,[Pre,HD|Args1]|Post],
    [[C,F],Spec,[Pre,HD|Args2]|Post]) :-
    transf_lower_args(L,Matrix,Mode,Args1,Args2).

transf_lower_args(L,Matrix,Mode,
    [[[comp,F1]|COMP1]|T],
    [[[comp,F2]|COMP2]|T]) :-
    !, transform(L,Matrix,Mode,[[comp,F1]|COMP1],[[comp,F2]|COMP2]).
transf_lower_args(L,Matrix,Mode,[H1|T],[H2|T]) :-
    transform_lower(L,Matrix,Mode,H1,H2).

/**********************************************************/
/*                       Set TNS                          */
/**********************************************************/

set_tns(_,COMP,COMP).          /* leave clause as is first time */
set_tns(matrix(-),
    [[comp,F],WH,[e,COMP,
        [[infl,F1],NP,INFL]]|Post],
    [[comp,F],WH,[e,COMP,
        [[infl,F2],NP,INFL]]|Post]) :-
    alternate_tns(NP,F1,F2), !.

alternate_tns(NP,F1,[tns(pres)|F5]) :-
    member(tns(-),F1),
    subtract(F1,[tns(-)],F2),
    features(NP,FNP),
    union(FNP,F2,F3),
    default(per,3,F3,F4),
    default(pl,-,F4,F5).
alternate_tns(_,F1,[tns(-)|F2]) :-
    subtract(F1,[tns(_)],F2).

/**********************************************************/
/*                     MOVE ALPHA                         */
/**********************************************************/

move_alpha(_,COMP,COMP).                      /* no movement */
/* NP movement to subject */
move_alpha(L,
    [[comp,FC],WH,[e,COMP,
        [[infl,BaseFI],BaseNP,BaseINFL]]|Post],
    [[comp,FC],WH,[e,COMP,
        [[infl,FI],NP,INFL]]|Post]) :-
    not lexical(BaseNP),
    member(theta(-),BaseFI),
    move(L,n,0,NP,[ana(+)],BaseINFL,INFL),
    features(NP,F),
    combineF(F,BaseFI,FI).
/* NP movement to SUBJ, then COMP */
move_alpha(L,
    [[comp,FC],e,[e,COMP,
        [[infl,BaseFI],BaseNP,BaseINFL]]|Post],
    [[comp,[wh(+)|FC]],WH,[e,COMP,
        [[infl,FI],NP,INFL]]|Post]) :-
    not lexical(BaseNP),
    member(theta(-),BaseFI),
    ( (bounding_cat(L,infl), Level=1) ; Level=0 ),
    move(L,n,Level,WH,[ana(+)],BaseINFL,INFL),
    features(WH,F),
    member(wh(+),F),
    combineF(F,BaseFI,FI),
    index(WH,I),
    makelist([[n,[indx(I),pro(-),ana(-)]],e],NP).
/* NP movement to COMP */
move_alpha(L,
    [[comp,FC],e,[e,e,BaseINFL]|Post],
    [[comp,[wh(+)|FC]],WH,[e,e,INFL]|Post]) :-
    move(L,n,0,WH,[pro(-),ana(-)],BaseINFL,INFL),
    features(WH,F),
    member(wh(+),F).
/* PP movement to COMP */
move_alpha(L,
    [[comp,FC],e,[e,e,BaseINFL]|Post],
    [[comp,[wh(+)|FC]],WH,[e,e,INFL]|Post]) :-
    move(L,p,0,WH,[pro(-),ana(-)],BaseINFL,INFL),
    features(WH,F),
    member(wh(+),F).

move(_,_,2,_,_,_,_) :- !, fail.
move(L,Alpha,_,[[Alpha,[indx(I)|F]],Spec,HD],ANAPRO,
     [[Alpha,F],Spec,HD],
     [[Alpha,NewF],e]) :-
    not member(indx(_),F),
    append([indx(I)|F],ANAPRO,NewF).
move(L,Alpha,_,[[Alpha,F],Spec,HD],ANAPRO,
     [[Alpha,F],Spec,HD],
     [[Alpha,NewF],e]) :-
    member(indx(I),F),
    append(F,ANAPRO,NewF).
move(L,Alpha,Level1,XP,ANAPRO,[[C,F]|T1],[[C,F]|T2]) :-
    bounding_cat(L,C),
    Level2 is Level1+1, !,
    move(L,Alpha,Level2,XP,ANAPRO,T1,T2).
move(L,Alpha,Level,XP,ANAPRO,[[C,F]|T1],[[C,F]|T2]) :-
    !, move(L,Alpha,Level,XP,ANAPRO,T1,T2).
move(L,Alpha,Level,NP,ANAPRO,[H1|T],[H2|T]) :-
    move(L,Alpha,Level,NP,ANAPRO,H1,H2).
move(L,Alpha,Level,NP,ANAPRO,[H|T1],[H|T2]) :-
    move(L,Alpha,Level,NP,ANAPRO,T1,T2).
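The move clauses above carry a level counter that is incremented each time the search enters a bounding category for the language in question, and the search fails once the counter reaches 2. The following toy sketch isolates that counting idea. It assumes a simplified representation in which the path from the landing site to the extraction site is just a list of category labels; subjacent/2 and crossed/4 are hypothetical names introduced for this illustration and do not appear in the thesis code.

```prolog
% Bounding categories, as given in the core grammars.
bounding_cat(english, infl).
bounding_cat(english, n).
bounding_cat(spanish, comp).
bounding_cat(spanish, n).

% subjacent(+L, +Path): the movement crosses fewer than two
% bounding categories of language L along Path.
% Assumption: Path is a flat list of category labels.
subjacent(L, Path) :-
    crossed(L, Path, 0, N),
    N < 2.

% crossed(+L, +Path, +Count0, -Count): count bounding categories.
crossed(_, [], N, N).
crossed(L, [C|Cs], N0, N) :-
    (   bounding_cat(L, C)
    ->  N1 is N0 + 1
    ;   N1 = N0
    ),
    crossed(L, Cs, N1, N).
```

Under these assumptions, the path [infl,v,n] crosses two bounding categories in English but only one in Spanish, so the same extraction is blocked by the English parameter setting and allowed by the Spanish one.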

/**********************************************************/
/*  AFFIX-HOPPING                                         */
/**********************************************************/

move_affix(L,[[comp,FC],WH,[e,COMP,[[infl,FI],NP,[e,BaseINFL,BaseVP]]]|Post],
             [[comp,FC],WH,[e,COMP,[[infl,FI],NP,[e,INFL,VP]]]|Post]) :-
    affix_infl(L,FI,BaseINFL,FV,INFL),
    affix_vp(L,FV,BaseVP,VP), !.

affix_infl(_,F,e,F,e).
affix_infl(L,F,SourceW,[tns(-)|F],TargetW) :-
    dict(L,SourceW,infl,Lexinfo),
    not member(root(_),Lexinfo),
    gen_verb(L,infl,F,SourceW,Lexinfo,TargetW).

affix_vp(L,FI,[[v,FV],_,[_,BaseV|Args]|PostV],
              [[v,FV],PERF,[PROG,V|Args]|PostV]) :-
    affix_to_perf(L,FI,PERF),
    affix_to_prog(L,FI,PROG),
    affix_to_verb(L,FI,BaseV,V).

affix_to_perf(L,FI,W) :-
    member(perf(+),FI),
    create(perf,L,FI,W).
affix_to_perf(L,FI,W) :-
    ( member(prog(+),FI) ; member(pass(+),FI) ),
    create(prog,L,FI,W).
affix_to_perf(_,_,e).

affix_to_prog(L,FI,W) :-
    member(perf(+),FI),
    ( member(prog(+),FI) ; member(pass(+),FI) ),
    extractLocal(L,W,prog,_,f,[perf(+)]).
affix_to_prog(L,FI,W) :-
    member(prog(+),FI),
    member(pass(+),FI),
    extractLocal(L,W,prog,_,f,[prog(+)]).
affix_to_prog(_,_,e).

create(C,L,F,W) :-                   /* If tns(-), select infinitive form */
    member(tns(-),F),
    rootword(L,_,C,W).
create(C,L,F,W) :-                   /* otherwise: select inflected form */
    not member(tns(-),F),
    get(per,F,PER),
    get(pl,F,NUM),
    get(tns,F,TENSE),
    append(PER,NUM,PERNUM),
    create2(L,C,TENSE,PERNUM,W).

create2(L,C,TENSE,PERNUM,W) :-
    dict(L,W,C,Lexinfo),
    member(f(F),Lexinfo),
    makelist(F,WF),
    must_appear(TENSE,WF),
    must_agree(PERNUM,WF).
create2(english,C,[tns(pres)],_,RW) :-
    rootword(english,_,C,RW).

/* If expression, affix to first element */
affix_to_verb(L,FI,expr([BaseW|Expr]),expr([W|Expr])) :-
    affix_to_verb(L,FI,BaseW,W).
affix_to_verb(L,FI,BaseW,W) :-
    dict(L,BaseW,v,Lexinfo),
    not member(root(_),Lexinfo),
    affix_verb2(L,FI,Lexinfo,BaseW,W).

affix_verb2(L,FI,Lexinfo,BaseW,W) :-
    member(pass(+),FI),
    gen_perf(L,BaseW,Lexinfo,W).
affix_verb2(L,FI,Lexinfo,BaseW,W) :-
    member(prog(+),FI),
    gen_prog(L,BaseW,Lexinfo,W).
affix_verb2(L,FI,Lexinfo,BaseW,W) :-
    member(perf(+),FI),
    gen_perf(L,BaseW,Lexinfo,W).
affix_verb2(_,FI,_,W,W) :- member(tns(-),FI).
affix_verb2(english,FI,_,W,W) :- member(tns(fut),FI).
affix_verb2(L,FI,Lexinfo,BaseW,W) :-
    gen_verb(L,v,FI,BaseW,Lexinfo,W).

/**********************************************************/

gen_verb(L,C,FI,BaseW,Lexinfo,W) :-          /* irregular forms */
    member(irr(IRR),Lexinfo),
    makelist(IRR,IRRlist),
    get_irr(L,C,FI,BaseW,IRRlist,W).
gen_verb(L,_,FI,BaseW,Lexinfo,W) :-          /* stem forms */
    member(StemCat,Lexinfo),
    StemCat =.. [StemNum,St],
    name(StemNum,[S,T,E,M|_]),
    name(stem,[S,T,E,M]),
    makelist(St,Stem),
    get_stem(StemNum,L,FI,BaseW,Stem,W).
gen_verb(english,C,FI,BaseW,_,BaseW) :-      /* default form */
    member(tns(pres),FI),
    ( member(pl(+),FI) ; member(per(2),FI) ; member(per(1),FI) ).
gen_verb(english,infl,FI,BaseW,_,BaseW).
gen_verb(L,C,FI,BaseW,_,W) :-                /* generate form */
    suffixtable(L,Sfx,Stemend,C,SfxF),
    makelist(SfxF,SfxList),
    must_appear(SfxList,FI),
    split(Stem,Stemend,BaseW),
    split(Stem,Sfx,W),
    morph_word(L,W).

get_irr(L,C,FI,BaseW,[W1|Ws],W1) :-
    dict(L,W1,C,Lexinfo),
    member(root(BaseW),Lexinfo),
    member(f(F),Lexinfo),
    makelist(F,Flist),
    member(tns(_),Flist),
    must_agree(Flist,FI).
get_irr(L,C,FI,BaseW,[_|Ws],W) :-
    get_irr(L,C,FI,BaseW,Ws,W).

get_stem(Stemtype,L,FI,BaseW,[Stem|_],W) :-
    get_proper_stem(Stemtype,L,FI,BaseW,Stem),
    create_stem(Stemtype,L,FI,Stem,W).
get_stem(Stemtype,L,FI,BaseW,[_|Stems],W) :-
    get_stem(Stemtype,L,FI,BaseW,Stems,W).

get_proper_stem(C,L,FI,BaseW,Stem) :-
    dict(L,Stem,C,Lexinfo),
    member(root(BaseW),Lexinfo),
    member(f(F),Lexinfo),
    makelist(F,Flist),
    must_appear(Flist,FI).

create_stem(Stemtype,L,FI,Stem,W) :-
    suffixtable(L,Sfx,_,Stemtype,SfxF),
    makelist(SfxF,SfxList),
    must_appear(SfxList,FI),
    split(Stem,Sfx,W).
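
The last gen_verb clause above builds a regular form by stripping a stem ending from the base word and attaching a suffix whose features all appear in the INFL feature list. A rough standalone sketch of that pattern follows. It assumes that split/3 behaves like atom concatenation, and the predicates suffix_rule/3, has_all/2, and inflect/3, along with the three table entries, are hypothetical names and data invented for this example rather than taken from the thesis.

```prolog
% Hypothetical suffix table: suffix_rule(StemEnd, Suffix, RequiredFeatures).
suffix_rule(e,  ed, [tns(past)]).
suffix_rule('', ed, [tns(past)]).
suffix_rule('', s,  [tns(pres), per(3), pl(-)]).

% has_all(+Required, +Feats): every required feature occurs in Feats.
has_all([], _).
has_all([F|Fs], Feats) :-
    memberchk(F, Feats),
    has_all(Fs, Feats).

% inflect(+Base, +Feats, -Word): strip StemEnd from Base, then add Suffix.
inflect(Base, Feats, Word) :-
    suffix_rule(StemEnd, Sfx, Required),
    has_all(Required, Feats),
    atom_concat(Stem, StemEnd, Base),
    atom_concat(Stem, Sfx, Word).
```

With this toy table the first solution of inflect(love, [tns(past)], W) is W = loved, and inflect(walk, [tns(pres), per(3), pl(-)], W) gives W = walks. The sketch has no counterpart to the morph_word(L,W) check in the original clause, so solutions found on backtracking are not filtered.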

/**********************************************************/
/*  Doubly-Filled COMP Filter                             */
/**********************************************************/

dbl_comp_filter([[comp,_],WH,[e,COMP,_]|_]) :-
    lexical(WH), lexical(COMP), !, fail.
dbl_comp_filter([[infl,_],_,[[infl,_],_,[_,_,X]]]) :-
    !, dbl_comp_filter(X).
dbl_comp_filter([[_,_],_,[_,_|X]|_]) :-
    !, dbl_comp_filter2(X).
dbl_comp_filter([[_,_],e]).

dbl_comp_filter2([]).
dbl_comp_filter2([H|T]) :- dbl_comp_filter(H), dbl_comp_filter2(T).

/**********************************************************/
/*  Case Filter                                           */
/**********************************************************/

case_filter(L,COMP) :- subj_pos(L,COMP), !, obj_pos(COMP), !.

subj_pos(L,[[v,_],_,[_,V,                                /* ECM */
            [[comp,_],WH,[e,COMP,                        /* (with inversion) */
             [[infl,_],_,
              [[infl,FI],NP,[_,_,VP]]]]|PostC]]|PostV]) :-
    ( lexical(NP) ; is_pro(NP) ),
    member(tns(-),FI), !,
    WH = e, COMP = e,
    ecm(L,V),
    subj_pos(L,VP).
subj_pos(L,[[v,_],_,[_,V,                                /* ECM */
            [[comp,_],WH,[e,COMP,                        /* (no inversion) */
             [[infl,FI],NP,[_,_,VP]]]|_]]|_]) :-
    ( lexical(NP) ; is_pro(NP) ),
    member(tns(-),FI), !,
    WH = e, COMP = e,
    ecm(L,V),
    subj_pos(L,VP).
subj_pos(L,[[infl,FI],NP,[_,_,VP]]) :-                   /* Variables: Case */
    is_variable(NP), !,
    not member(tns(-),FI),
    subj_pos(L,VP).
subj_pos(L,[[infl,_],_,                                  /* bypass inverted INFL */
            [[infl,FI]|INFL]]) :-
    !, subj_pos(L,[[infl,FI]|INFL]).
subj_pos(L,[[infl,FI],NP,[_,_,VP]]) :-                   /* Case from +TNS */
    ( lexical(NP) ; is_pro(NP) ), !,
    not member(tns(-),FI),
    subj_pos(L,VP).
subj_pos(L,[[_,_],_,[_,_|Args]|_]) :- subj_pos2(L,Args).
subj_pos(L,[[_,_],e]).

subj_pos2(_,[]).
subj_pos2(L,[H|T]) :- subj_pos(L,H), subj_pos2(L,T).

obj_pos([[v,FV],_,[_,_,[[n,_],_,_]|Args]|_]) :-
    not member(pass(+),FV), !,
    obj_pos2(Args).
obj_pos([[infl,FI],_,[[infl,FI],NP,[_,_,VP]]]) :-
    obj_pos(VP).
obj_pos([[_,_],_,[_,_|Args]|_]) :-
    obj_pos2(Args).
obj_pos([[_,_],e]).

obj_pos2([]).
obj_pos2([H|T]) :- obj_pos(H), obj_pos2(T).

/**********************************************************/
/*  BINDING CONDITIONS                                    */
/**********************************************************/

binding_conditions(L,COMP) :- condition_A(L,COMP), !, condition_C(COMP), !.

condition_A(L,[[X,_],_,[_,XW,                            /* S-deletion */
               [[comp,_],WH,[e,COMP,                     /* (with inversion) */
                [[infl,_],_,
                 [[infl,FI],NP,[_,_|VP]]]]|_]]|_]) :-
    is_nptrace(NP), !,
    WH = e, COMP = e,
    member(tns(-),FI), !,
    is_sdeleter(L,XW,X),
    condition_A2(L,VP).
condition_A(L,[[X,_],_,[_,XW,                            /* S-deletion */
               [[comp,_],WH,[e,COMP,                     /* (no inversion) */
                [[infl,FI],NP,[_,_|VP]]]|_]]|_]) :-
    is_nptrace(NP), !,
    WH = e, COMP = e,
    member(tns(-),FI),
    is_sdeleter(L,XW,X),
    condition_A2(L,VP).
condition_A(L,[[infl,_],NP1,[_,_,                        /* obj bound to subj */
               [[v,_],_,[_,_|Args]|_]]]) :-
    member(NP2,Args),
    is_nptrace(NP2), !,
    index(NP1,I), index(NP2,J),
    I == J, !,
    condition_A2(L,Args).
condition_A(_,NP) :-                                     /* otherwise NP-trace invalid */
    is_nptrace(NP), !, fail.
condition_A(L,[[infl,_],_,[[infl,FI]|INFL]]) :-
    !, condition_A(L,[[infl,FI]|INFL]).
condition_A(L,[[_,_],NP,[_,_|Args]|Post]) :-
    !, condition_A(L,NP),
    condition_A2(L,Args),
    condition_A2(L,Post).
condition_A(_,_).

condition_A2(L,[]).
condition_A2(L,[A|As]) :- condition_A(L,A), condition_A2(L,As).

condition_C([[comp,_],_,[_,_,
             [[infl,_],_,
              [[infl,_],NP,[_,_|VP]]]]|_]) :-
    !, var_free(NP,VP),
    condition_C2(VP).
condition_C([[comp,_],_,[_,_,
             [[infl,_],NP,[_,_|VP]]]|_]) :-
    var_free(NP,VP),
    condition_C2(VP).

var_free(NP,Args) :- index(NP,I), !, free(I,Args).
var_free(_,_).

free(I,[H|T]) :- free_arg(I,H), free_arg2(I,H), free(I,T).
free(_,[]).

free_arg(I,X) :- not (is_variable(X), index(X,J), I == J).

free_arg2(I,[[infl,_],_,[[infl,_]|INFL]]) :-
    !, free_arg2(I,[[infl,_]|INFL]).
free_arg2(I,[[infl,_],NP,[_,_|Args]|_]) :-
    !, free_arg(I,NP), free(I,Args).
free_arg2(I,[[_,_],_,[_,_|Args]|_]) :-
    !, free(I,Args).
free_arg2(_,_).
condition_C2([H|T]) :- condition_C3(H), condition_C2(T).
condition_C2([]).

condition_C3([[comp,_]|COMP]) :-
    !, condition_C([[comp,_]|COMP]).
condition_C3(_).

/**********************************************************/
/*  ECP - Empty Category Principle                        */
/**********************************************************/

ecp(_,[mode,_]) :- !.
ecp(L,[[X,_],_,[_,XW,                                    /* S-deletion */
       [[comp,_],e,[e,e,                                 /* (with inversion) */
        [[infl,_],_,
         [[infl,_],NP,[_,_,VP]]]]|_]]|PostX]) :-
    not lexical(NP),
    is_sdeleter(L,XW,X), !,
    not is_PRO(NP),
    ecp(L,VP),
    ecp2(L,PostX), !.
ecp(L,[[X,_],_,[_,XW,                                    /* S-deletion */
       [[comp,_],e,[e,e,                                 /* (no inversion) */
        [[infl,_],NP,[_,_,VP]]]|_]]|PostX]) :-
    not lexical(NP),
    is_sdeleter(L,XW,X), !,
    not is_PRO(NP),
    ecp(L,VP),
    ecp2(L,PostX), !.
ecp(L,[[comp,_],WH,[e,e,                                 /* WH in COMP */
       [[infl,_],_,                                      /* (with inversion) */
        [[infl,_],NP,[_,_,VP]]]]|Post]) :-
    index(WH,_), !,
    not is_PRO(NP),
    ecp(L,VP), !.
ecp(L,[[comp,_],WH,[e,e,                                 /* WH in COMP */
       [[infl,_],NP,[_,_,VP]]]|Post]) :-                 /* (no inversion) */
    index(WH,_), !,
    not is_PRO(NP),
    ecp(L,VP), !.
ecp(L,[[infl,FI],NP,[_,_,VP]]) :-                        /* +TNS */
    not lexical(NP),
    not member(tns(-),FI), !,
    proper_governor(L,infl),
    ecp(L,VP), !.
ecp(L,[[X,_],_,[_,_|Args]|PostX]) :-                     /* head */
    member(XP,Args),
    is_trace(XP), !,
    proper_governor(L,X),
    ecp2(L,Args),
    ecp2(L,PostX), !.
ecp(L,[[X,_],_,_|PostX]) :-                              /* adjunct */
    member(XP,PostX),
    is_trace(XP), !,
    proper_governor(L,X).
/* ECP: embedded clauses */
ecp(L,[[infl,_],_,[[infl,_],NP,[_,_,VP]]]) :-
    ecp(L,VP), !.
ecp(L,[[_,_],_,[_,_|Args]|PostX]) :-
    ecp2(L,Args),
    ecp2(L,PostX), !.
ecp(_,[[_,_],e]).

ecp2(_,[]).
ecp2(L,[H|T]) :- ecp(L,H), ecp2(L,T).

Appendix L. Prolog Code - GENERATE

/**********************************************************/
/*  GENERATE                                              */
/**********************************************************/

generate_and_print(_,[]).
generate_and_print(L,[Sstructure|T]) :-
    generate(L,Sstructure,Surface),
    printout(Surface),
    generate_and_print(L,T).

generate(L,TStructure,Tsurface) :-
    readleaves(TStructure,Wordlist1),
    delete_e(Wordlist1,Wordlist2),
    collocate(L,Wordlist3,Wordlist2),
    contract(L,Tsurface,Wordlist3).

readleaves([],[]).
readleaves([mode,X],[X]).
readleaves([[_,F]|X],S) :- makelist(_,F), readleaves(X,S).
readleaves([H|T],S) :- readleaves(H,S1), readleaves(T,S2), append(S1,S2,S).
readleaves(expr(Expr),Expr).
readleaves(W,[W]).

delete_e([],[]).
delete_e([e|T1],T2) :- delete_e(T1,T2).
delete_e([H|T1],[H|T2]) :- delete_e(T1,T2).

/**********************************************************/
/*  PRINTOUT                                              */
/**********************************************************/

printout([H|T]) :-
    capitalize(H,HC),
    display(HC),
    display_sent(T), nl.
printout(X) :-
    display('Invalid string to printout: '), nl,
    display(X), abort.

capitalize(H,HC) :-
    name(H,[HH|HT]),
    HH>96, HH<123,
    HHC is HH-32,
    name(HC,[HHC|HT]).
capitalize(H,H).

display_sent([decl]) :- display('.').
display_sent([ques]) :- display('?').
display_sent([H|T]) :-
    display(' '), display(H), display_sent(T).
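
The first two steps of generate above (reading the leaves of the S-structure and deleting empty categories) can be tried in isolation with the sketch below. It is a simplified approximation rather than the thesis code: the predicates leaves/2, drop_e/2, surface_words/2, and toy_tree/1 are hypothetical names, a node label is assumed to be a two-element list [Category, FeatureList], is_list/1 replaces the makelist check, and the expr(...) case, collocate, and contract are left out.

```prolog
:- use_module(library(lists)).               % append/3

% leaves(+Tree, -Words): collect the leaves of a nested-list tree.
leaves([], []) :- !.
leaves([mode, M], [M]) :- !.                 % sentence mode marker
leaves([[_Cat, Feats]|Rest], Ws) :-          % skip the [Category, Features] label
    is_list(Feats), !,
    leaves(Rest, Ws).
leaves([H|T], Ws) :- !,
    leaves(H, W1),
    leaves(T, W2),
    append(W1, W2, Ws).
leaves(W, [W]).                              % a word, or the empty category e

% drop_e(+Words, -Overt): remove every empty category e.
drop_e([], []).
drop_e([e|T], Overt) :- !, drop_e(T, Overt).
drop_e([H|T], [H|Overt]) :- drop_e(T, Overt).

surface_words(Tree, Overt) :-
    leaves(Tree, Ws),
    drop_e(Ws, Overt).

% Hypothetical S-structure for "John slept", invented for this example.
toy_tree([[comp,[]], e,
          [e, e,
           [[infl,[tns(past)]],
            [[n,[indx(1)]], e, [e, john]],
            [e, e, [[v,[]], e, [e, slept]]]]],
          [mode, decl]]).
```

With these definitions the query toy_tree(T), surface_words(T, Ws) binds Ws to [john, slept, decl], which is the shape of list that printout expects: words followed by the mode marker.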
